nvidia 384.98 and display orientation changing when X restarts with a layout-less xorg.conf

Sorry this was misposted at the “Linux Graphics Debugger” section here:

Reposting to the correct location.

I posed the following question on the elrepo mailing list, looking for feedback on multiple displays changing orientation when you log out (or when X restarts).

http://lists.elrepo.org/pipermail/elrepo/2017-December/003974.html

Based on additional testing, we believe this is a driver issue introduced in 375. We’ve tested 375.66, 384.90, and 384.98 on RHEL 7.4, and 367.57 and 384.98 on RHEL 7.3.

At first I thought it was something with Xorg 1.19 (introduced in RHEL 7.4); however, testing with 7.3 and 384.98 shows it’s not isolated to 1.19.

If you read the above URL, you’ll see the key question is what the expected behavior is with a layout-less xorg.conf (generated by elrepo’s nvidia packages): should the layout remain static, assuming the cabling doesn’t change? This appears to be the case pre-375.xx; however, I don’t know whether that was just luck (really lucky) or whether it’s something that isn’t expected/supported.

Update 12/12/17

See the attached nvidia-bug-report files.
nvidia-bug-report.log.gz (230 KB)
nvidia-bug-report.log.old.gz (230 KB)

I presume you’re using GNOME as a desktop; in that case monitor config is handled by mutter and stored in ~/.config/monitors.xml. So this should be persistent unless the driver changes the connector names on X restart. It’s a bit strange that whether this works depends on the driver version.

We use GNOME for the initial login but switch to fvwm later via ~/.xsession. We see this in both environments and don’t believe it’s GNOME, fvwm, or mutter related.

With mutter I believe that is true for non-RHEL mutter. However, RHEL7 (fixed in 7.2 and included with 7.3) mutter will not reorganize the display layout if one already exists. You can see this from the centos git repo with the patch that applied this (I don’t believe it’s in upstream mutter):

https://git.centos.org/blob/rpms!mutter.git/c7/SOURCES!0001-monitor-config-Consider-external-layout-before-defau.patch

https://bugzilla.redhat.com/show_bug.cgi?id=1290448

But we don’t have a ~/.config/monitors.xml or a system level monitors.xml.

I did at first want to blame mutter, thinking it was a regression, but in additional testing at runlevel 3 (multi-user.target in systemd terms) with just ‘startx’ we still see this.

Also see the incoming ‘startx -- -logverbose 6’ + nvidia-bug-report logs.

Ok, that clarifies it a bit. Use
Option "ModeDebug" "true"
in the Device section of xorg.conf to get detailed info in the X logs about which modes the driver is setting.
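To compare what the driver did across two X starts, something like the following might help. It is only a rough sketch: it assumes the driver logs lines containing "Setting mode", and the log paths in the usage comment (/var/log/Xorg.0.log and the .old copy that X keeps from the previous start) are assumptions that depend on how X is launched. The function names are made up for this example.

```shell
#!/bin/sh
# Print the "Setting mode" lines from an X log with the leading
# "[  timestamp]" prefix removed, so that two runs can be compared.
mode_lines() {
    # $1: path to an X log file
    grep 'Setting mode' "$1" | sed 's/^\[[^]]*] *//'
}

# Print "same" if both logs set the same modes in the same order,
# otherwise print "changed".
compare_layout() {
    # $1, $2: paths to the two X logs to compare
    a=$(mode_lines "$1")
    b=$(mode_lines "$2")
    if [ "$a" = "$b" ]; then
        echo "same"
    else
        echo "changed"
    fi
}

# Example (paths are assumptions, adjust to your setup):
#   compare_layout /var/log/Xorg.0.log.old /var/log/Xorg.0.log
```

If it reports "changed", diffing the output of mode_lines for the two files shows which connectors swapped places.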

It appears to be swapping DFP-0 and DFP-1

See logs.

I think there’s more swapping when we have four displays connected.
nvidia-bug-report.log.gz (228 KB)
nvidia-bug-report.log.old.gz (228 KB)

Indeed, mode detection runs in order from DFP-0 to DFP-2 every time, but in the end the driver selects a random order when setting the modes. Did 384.90 still work? Have you checked whether the 387 series is also hit by this?

No, we see this display change on 375.66, 384.90, and 384.98 on RHEL 7.4.
Testing with 387.34, we still see the display changing.

The only combination where we didn’t see the issue was 367.57 on RHEL 7.3. After upgrading from 367 to 384.98 on 7.3, we see the same problem.

See attached logs for 387.34

I don’t know if it’s related, but when sitting at runlevel 3 the displays that are mirrored change after logging out as well. I believe you’ll see that via the “Valid display device” “(boot)” entries.
nvidia-bug-report.log.gz (233 KB)
nvidia-bug-report.log.old.gz (233 KB)

I think that’s a hint. Seems like those are indistinguishable at some low level due to changes in the driver. The EDID and hashes of those monitors differ, though.

Just an idea: did you try setting the nvidia-drm.modeset=1 kernel parameter?

I believe this might have fixed our issue. At least with 387.

Would you recommend the same setting for the long-lived 384 driver? If so, we can test that tomorrow.

Sure. It’s the next-gen modesetting anyway, and it has matured since the 37X driver series. It can have adverse effects, though, like gdm thinking it can run a Wayland session on it.

Any idea why this setting would fix this issue and what was new in 375 (at least 375.66) that would require this?

Not really, but around 367/370 the Pascal series was introduced, which seems to have required changes in modesetting, and those changes seem to have introduced sometimes subtle regressions for older hardware. The alternative KMS implementation that’s enabled with the parameter is often not affected by those regressions.

Thanks for the help. nvidia-drm.modeset=1 also works on the long-lived 384.98 driver.
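
For anyone finding this later, here is a rough sketch of how the parameter could be checked on a running system. The has_modeset helper is just a name made up for this example, and the grubby line in the comment is one common way to add a kernel argument on RHEL 7, not necessarily how it was done in this thread. The check accepts both the dash and underscore spellings, since the kernel treats them the same in module parameter names.

```shell
#!/bin/sh
# Return success if the given kernel command line string contains
# nvidia-drm.modeset=1 (or the equivalent underscore spelling).
has_modeset() {
    # $1: a kernel command line, e.g. the contents of /proc/cmdline
    case " $1 " in
        *" nvidia-drm.modeset=1 "*|*" nvidia_drm.modeset=1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# One way to add the parameter on RHEL 7 (run as root, then reboot):
#   grubby --update-kernel=ALL --args="nvidia-drm.modeset=1"

# Report the state of the running kernel, if /proc/cmdline is readable.
if [ -r /proc/cmdline ]; then
    if has_modeset "$(cat /proc/cmdline)"; then
        echo "nvidia-drm.modeset=1 is on the kernel command line"
    else
        echo "nvidia-drm.modeset=1 is not on the kernel command line"
    fi
fi
```

Once the module is loaded, /sys/module/nvidia_drm/parameters/modeset (readable as root) should show Y when modesetting is active.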