387.12, 387.22: [drm:nvidia_drm_gem_fence_attach [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Faile...

My log is flooded by the following:

[ 3486.059343] [drm:nvidia_drm_gem_fence_attach [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to lookup gem object for fence context: 0x00000000
[ 3486.072668] [drm:nvidia_drm_gem_fence_attach [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to lookup gem object for fence context: 0x00000000
[ 3486.099312] [drm:nvidia_drm_gem_fence_attach [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to lookup gem object for fence context: 0x00000000
[ 3486.112654] [drm:nvidia_drm_gem_fence_attach [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to lookup gem object for fence context: 0x00000000
[ 3486.125981] [drm:nvidia_drm_gem_fence_attach [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to lookup gem object for fence context: 0x00000000
[ 3486.139313] [drm:nvidia_drm_gem_fence_attach [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to lookup gem object for fence context: 0x00000000
[ 3486.152640] [drm:nvidia_drm_gem_fence_attach [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to lookup gem object for fence context: 0x00000000
[ 3486.165969] [drm:nvidia_drm_gem_fence_attach [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to lookup gem object for fence context: 0x00000000
[ 3486.179299] [drm:nvidia_drm_gem_fence_attach [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to lookup gem object for fence context: 0x00000000
[ 3486.192627] [drm:nvidia_drm_gem_fence_attach [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to lookup gem object for fence context: 0x00000000

I have a dual-head configuration; the right monitor is attached to the nvidia card, the left monitor is attached to an intel connector, and I use “reverse prime” with PRIME sync enabled.

To reproduce:

  • I start Borderlands 2 on the rightmost monitor
  • While it runs, I press a hotkey that toggles ForceFullCompositionPipeline on/off
  • Xorg crashes
  • I log back into the Plasma session
  • The log is flooded with those messages.

Disabling the leftmost screen temporarily fixes it; as soon as I reactivate it, the flood restarts.
Likewise, disabling “PRIME Synchronization” temporarily fixes it; as soon as I re-enable it, the flood continues.

Reverting to 384.90 seems to fix it.
nvidia-bug-report.log.gz (295 KB)

I was able to reproduce it even without running any game, just from the Plasma desktop (with the OpenGL compositor running anyway).
This time I tried with a new Xorg server: 1.19.5.

As a side note, and this surprised me, removing the nvidia, nvidia_modeset, and nvidia_drm modules and reprobing them does NOT fix it; the only way seems to be rebooting the machine.

For reference, the script I use to toggle ForceFullCompositionPipeline is:

koko@Gozer# cat scripts/nvidia.compositionpipeline.switch.sh 

#!/bin/bash

function osd {
    killall aosd_cat
    echo "$1" | aosd_cat -n "Sans Bold 15" -x 0 -y 0 -p 0 -t 0 -b 255 -s 255 -d 10 -R yellow -u 5000 &
    echo "$1" | aosd_cat -n "Sans Bold 15" -x 0 -y 0 -p 2 -t 0 -b 255 -s 255 -d 10 -R yellow -u 5000 &
}

killall aosd_cat

if nvidia-settings -t -q CurrentMetaMode | grep 'ForceCompositionPipeline=On' &>/dev/null; then
    nvidia.compositionpipeline.disable.sh &
    osd "FFCP=OFF"
else
    nvidia.compositionpipeline.enable.sh &
    osd "FFCP=ON"
fi

koko@Gozer# cat scripts/nvidia.compositionpipeline.enable.sh 
#!/bin/bash
sh -c "nvidia-settings --assign CurrentMetaMode=\"$(nvidia-settings -t -q CurrentMetaMode |tr -d "\n"|sed 's/ViewPortIn=/ForceFullCompositionPipeline=On, ViewPortIn=/g'|sed 's/.*:://'|sed 's/^ *//;s/ *$//')\""

koko@Gozer# cat scripts/nvidia.compositionpipeline.disable.sh
#!/bin/bash
sh -c "nvidia-settings --assign CurrentMetaMode=\"$(nvidia-settings -t -q CurrentMetaMode |tr -d "\n"|sed 's/.*:://'|sed 's/^ *//;s/ *$//'|sed "s/CompositionPipeline=On/CompositionPipeline=Off/g")\""
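To make the sed pipeline easier to follow, here is a minimal sketch of the same transformation run against a made-up MetaMode string (the DPY name and geometry are illustrative, not taken from my machine):

```shell
# Sample `nvidia-settings -t -q CurrentMetaMode` output; the part
# after "::" is the actual MetaMode string we want to reassign.
sample='current :: DPY-1: 1280x1024 @1280x1024 +0+0 {ViewPortIn=1280x1024, ViewPortOut=1280x1024+0+0}'

# Same steps as the enable script: inject the attribute before
# ViewPortIn, strip everything up to "::", then trim surrounding spaces.
result=$(printf '%s' "$sample" \
    | sed 's/ViewPortIn=/ForceFullCompositionPipeline=On, ViewPortIn=/g' \
    | sed 's/.*:://' \
    | sed 's/^ *//;s/ *$//')
echo "$result"
```

The resulting string is what gets passed back via `nvidia-settings --assign CurrentMetaMode=...`.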

I can reproduce the bug even with 387.22, Linux 4.13.9, and xorg-server 1.19.5.

nvidia-bug-report.log.gz (249 KB)


…And I found that it is triggered by the following, too:

xrandr --output DVI-D-0 --mode 1280x1024 --scale-from 1450x1160 --refresh 75 ; sleep 5 ; xrandr --output DVI-D-0 --mode 1280x1024 --scale-from 1280x1024 --refresh 75

Have you found any other useful conclusions on this issue?
I also encounter those errors after an Xorg crash, which happens after the second resume from sleep.

Well, on my main PC I just switched to AMD.
On another I switched to Intel as the primary adapter and use the nvidia card just as a 3D accelerator via PRIME render offload.
https://wiki.archlinux.org/title/PRIME#PRIME_render_offload
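For completeness, the offload setup on that machine boils down to a small wrapper in the style of the Arch wiki page above; the environment variables are NVIDIA's documented render-offload switches, while the wrapper name is my own:

```shell
#!/bin/sh
# prime-run: run a single program on the nvidia GPU via PRIME render offload.
export __NV_PRIME_RENDER_OFFLOAD=1        # route this process to the offload GPU
export __GLX_VENDOR_LIBRARY_NAME=nvidia   # pick the nvidia GLX vendor library
exec "$@"
```

Used as e.g. `prime-run glxinfo | grep "OpenGL renderer"` to verify which GPU renders; everything else stays on the intel adapter.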

Maybe not the answer you expected, but it solved the problem.

Thanks for your quick answer and the Arch link! I have been struggling with those error messages for several months!
Unfortunately I have to use the 390.144 drivers.
I always get those messages after the second resume from sleep (no hibernation, no swap). Do you have any clue what this is about? Or where I should look? The source code? Google says you are the only expert on this.

Sorry, but I don’t have a solution for your issue.
I hope a developer will finally step in with an answer, but I wouldn’t hold my breath, you know.

I didn’t expect a solution, of course! I will try “reverse PRIME”; maybe that’s a promising direction. If you recall any further useful ideas, I’d appreciate it very much! Thank you!

Reverse PRIME is what gave me trouble, if you read the first post.
PRIME offloading is what worked around it, but I don’t know whether it is available for the drivers you use.

Exactly! I think this find of yours is interesting!

In my sddm/scripts/Xsetup I also perform something like:

for next in $(xrandr --listmonitors | grep -E " [0-9]+:." | cut -d" " -f6); do
    [ -z "$current" ] && current=$next && continue
    xrandr --output $current --auto --output $next --auto --right-of $current
    current=$next
done

to lay out several screens next to each other at login. Probably sddm also runs Xsetup on every resume, and therefore xrandr runs at every resume and causes Xorg to crash!
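For anyone puzzled by the `cut -d" " -f6` above, here is the parsing step in isolation, run against a made-up `xrandr --listmonitors` listing (connector names and geometries are illustrative):

```shell
# Two fake monitor lines in `xrandr --listmonitors` format; note the
# double space before the connector name, which makes it field 6 for cut
# when splitting on single spaces.
listing='Monitors: 2
 0: +*DVI-D-0 1280/376x1024/301+0+0  DVI-D-0
 1: +HDMI-0 1920/531x1080/299+1280+0  HDMI-0'

names=$(printf '%s\n' "$listing" | grep -E " [0-9]+:." | cut -d" " -f6)
echo "$names"
```

The grep keeps only the numbered monitor lines (the "Monitors: 2" header has no colon after its digits), and cut then extracts one connector name per line.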

Now, reverse PRIME was just a guess, one thing I hadn’t tried yet.

Missing EDID

Did you also experience those countless warnings about a missing EDID for CRT-0/VGA-0?

I thought about adding an EDID file to the initramfs to solve this at early boot time.
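The mechanism I had in mind for that is the kernel's EDID firmware override; a sketch, where the connector name and file path are assumptions for my setup, not tested values:

```text
# Kernel command line (kernel >= 4.15; older kernels use the
# drm_kms_helper.edid_firmware module parameter instead):
drm.edid_firmware=VGA-1:edid/my-monitor.bin

# And ship the file in the initramfs, e.g. in mkinitcpio.conf:
FILES=(/usr/lib/firmware/edid/my-monitor.bin)
```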

Sorry, I moved on years ago, but missing EDID doesn’t sound familiar.

:D Yes, I know. I'm just trying to gather some parallels/causalities. Anyway, thank you!
Best regards