Modprobe and rmmod without an actual USB device succeeds for me on R21.4, so reproducing the problem is easier with an actual device. The lsmod given was prior to g_ether modprobe, so I’m curious if after module load (with an actual device attached) the lsmod shows any “Used by” dependency?
This is interesting, I had originally assumed that ifconfig showed usb0 because you had an actual device, and my own Jetson is not failing. It just seems strange to see this in ifconfig without hardware, but perhaps the driver does this because it is hotplug over USB. ifconfig shows the device on my system as well, without an actual device.
I’ve recreated the steps for modprobe and rmmod without any error using R21.4. Since USB is involved, I tried this with the default USB2 driver (tegra-ehci) and again with the USB3 driver (tegra-xhci) loaded. I see no difference and no error.
What comes to mind is that my install has been updated via:
apt update
apt-get upgrade
Perhaps this is from an older driver mix. Can you run the above update/upgrade and see if the problem persists? You can run this before and after to be sure nVidia-specific hardware files are properly in place:
sha1sum -c /etc/nv_tegra_release
Also, is there anything at all customized about the system, e.g., kernel rebuilds, boot loader argument changes, etc?
We have a customized system with our own file system. There are no apt or apt-get commands on our system.
We used the kernel and bootloader from 21.4 only. There is no change in the boot arguments.
Why run apt update? Is there any update after the 21.4 release? Can you try with the kernel and modules compiled from the 21.4 release package only?
I know an unmodified R21.4 works correctly for load/unload of the module. The only thing about my working test system which is not default is that it ran through apt-get upgrade…the kernel itself is not changed. So that implies that what is different is outside of the kernel itself or its modules. Upgrade was an attempt to make our two systems match and guarantee it wasn’t an issue from a bug which had been fixed in related packages.
I did not know your system was custom, and so it seemed as if the difference was either hardware failure (which is unlikely since it otherwise works) or a software issue outside of the kernel (our kernels are the same). If it is a software issue, then it could be configuration rather than corrupt or altered files…mostly due to kernel boot parameters because the issue involves load/unload of a module not in use (and boot parameters are kernel configuration). We know this isn’t the issue because we have both tried with unaltered kernel parameters on matching kernels.
So it comes down to something in your file system outside of the kernel or its configuration. Quite possibly something in kernel code has a weakness where it should gracefully handle non-kernel missing or broken support packages, but instead does not and ends up dumping state as a last resort.
The message “Unable to handle kernel NULL pointer dereference” is a very common error, almost always being something which could have been avoided through testing of pointers before using them. Something such as kgdb could be used to obtain a backtrace and point directly at the offending code (which requires it to be run on your system, else the NULL pointer dereference will never occur). Alternatively, if you are able to merge just parts of modified file system back in and re-try until the error no longer occurs, you could track down the real cause without kernel debugging. There are perhaps other ways of debugging this, but time and expertise required increases substantially.
NOTE: I will reflash a Jetson in the next few hours and try without updated packages. It’ll take a bit of time because I will clone the system before the flash.
I just reflashed with a “pure” R21.4, no updates of any kind. This system succeeds with modprobe and rmmod of g_ether, so this strongly suggests the custom file system has something in its environment which the module or module loader depends on. If you developed the custom system in stages, I’d suggest re-loading each stage of development and testing at each stage. You can clone your existing work to save it if you desire. Clone info:
http://elinux.org/Jetson/Cloning
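For testing at each stage, a small load/unload check can make the comparison repeatable. This is a minimal sketch, not a definitive test: the function names log_has_oops and check_gadget_cycle are hypothetical, the two message strings are the ones quoted in this thread, and it assumes root plus a dmesg that supports -c.

```shell
#!/bin/sh
# Reads kernel log text on stdin; succeeds if it contains either of the
# failure messages quoted in this thread.
log_has_oops() {
    grep -q -e 'Unable to handle kernel NULL pointer dereference' \
            -e 'BUG: scheduling while atomic'
}

# Hypothetical helper: load and unload one gadget module, then scan the log.
# Needs root. Defaults to g_ether.
check_gadget_cycle() {
    module="${1:-g_ether}"
    dmesg -c > /dev/null          # clear old messages so only this run is scanned
    modprobe "$module" || return 1
    rmmod "$module"               # may be killed by the kernel if it faults
    if dmesg | log_has_oops; then
        echo "FAIL: kernel reported an error while cycling $module" >&2
        return 1
    fi
    echo "OK: $module loaded and unloaded cleanly"
}
```

Running check_gadget_cycle g_ether after restoring each stage of the custom file system should show which stage first reports FAIL.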
The difference is in the file system init script /etc/init/nv.conf, specifically the following lines:
machine=`cat /sys/devices/soc0/machine`
if [ "${machine}" = "jetson-tk1" ] ; then
    echo 4 > /sys/class/graphics/fb0/blank
    if [ -e /sys/devices/platform/tegra-otg/enable_device ] ; then
        echo 0 > /sys/devices/platform/tegra-otg/enable_device
    fi
    if [ -e /sys/devices/platform/tegra-otg/enable_host ] ; then
        echo 1 > /sys/devices/platform/tegra-otg/enable_host
    fi
fi
Run the following command and then try to load and unload the module; it will crash:
echo 1 > /sys/devices/platform/tegra-otg/enable_device
Why does enabling device mode cause a crash during module unload?
The OTG port has two mutually exclusive modes, host or device. Normal boot runs as a host, normal recovery mode runs as a device. Device mode could be used in normal boot, but so far I don’t know of any drivers making use of this. The point of mentioning this is that the physical hardware cannot be put in both host and device modes at the same time…if you enable device while still in host mode, or if you enable host while in device mode, there will be a failure. Granted, it would be nice if the error were gracefully detected and did not cause an OOPS, but it is still an error to attempt both modes at once.
Change the order so that echo 0 to enable_host runs first, before echo 1 to enable_device. The following works:
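A minimal sketch of that ordering, using the sysfs files quoted elsewhere in this thread. The OTG_DIR variable and the otg_set_mode function name are hypothetical, added only so the sequence can be reused (or tried against a scratch directory); writing to the real sysfs files needs root.

```shell
#!/bin/sh
# Defaults to the tegra-otg sysfs directory quoted in this thread.
# OTG_DIR is a hypothetical override for trying this on a scratch directory.
OTG_DIR="${OTG_DIR:-/sys/devices/platform/tegra-otg}"

# otg_set_mode host|device   (hypothetical helper name)
otg_set_mode() {
    case "$1" in
        device) off=enable_host;   on=enable_device ;;
        host)   off=enable_device; on=enable_host ;;
        *) echo "usage: otg_set_mode host|device" >&2; return 2 ;;
    esac
    # Disable the opposite role first so both roles are never enabled at once.
    echo 0 > "$OTG_DIR/$off" || return 1
    echo 1 > "$OTG_DIR/$on"
}
```

For example, otg_set_mode device writes 0 to enable_host and then 1 to enable_device; otg_set_mode host does the reverse.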
NOTE: As long as the “no two modes at once” rule is followed, you can then modprobe and rmmod g_ether without error. I tested this in both host and device modes.
I did this again and did end up with the segmentation fault (all packages were updated on this system), so it is repeatable. I’m not sure why it succeeded on my previous test, although the package “update” state would likely have been different. I’ll see what I can find out.
For reference, the test with repeatable OOPS, L4T R21.4, default kernel, all apt-get update and apt-get upgrade current (no USB connected to the OTG port, only HUB/keyboard/mouse on the other port):
For those interested, here is some added information found using a kgdboc debug version of the kernel.
rmmod g_ether while in device mode:
Entering kdb (current=0xea0c6080, pid 1359) on processor 0 due to Keyboard Entry
[0]kdb> 825.354009] ---[ end trace 65222b331a2d7cb4 ]---
[18825.368405] note: rmmod[11929] exited with preempt_count 1
[18825.373989] BUG: scheduling while atomic: rmmod/11929/0x40000002
[18825.379994] Modules linked in: g_ether(-) libcomposite configfs dm_crypt dm_mod rfcomm bnep bluetooth rfkill nvhost_vi
[18825.390843] CPU: 0 PID: 11929 Comm: rmmod Tainted: G D 3.10.40-gdacac96_dbg1 #1
[18825.399062] [<c001746c>] (unwind_backtrace+0x0/0x138) from [<c001368c>] (show_stack+0x18/0x1c)
[18825.407681] [<c001368c>] (show_stack+0x18/0x1c) from [<c02d80d8>] (dump_stack+0x1c/0x20)
[18825.415782] [<c02d80d8>] (dump_stack+0x1c/0x20) from [<c00a5548>] (__schedule_bug+0x58/0x68)
[18825.424229] [<c00a5548>] (__schedule_bug+0x58/0x68) from [<c0876e34>] (__schedule+0xb0/0x690)
[18825.432756] [<c0876e34>] (__schedule+0xb0/0x690) from [<c00a8358>] (__cond_resched+0x2c/0x3c)
[18825.441281] [<c00a8358>] (__cond_resched+0x2c/0x3c) from [<c08777b8>] (_cond_resched+0x4c/0x54)
[18825.449978] [<c08777b8>] (_cond_resched+0x4c/0x54) from [<c014989c>] (unmap_page_range+0x158/0x174)
[18825.459019] [<c014989c>] (unmap_page_range+0x158/0x174) from [<c0149904>] (unmap_single_vma+0x4c/0x54)
[18825.468315] [<c0149904>] (unmap_single_vma+0x4c/0x54) from [<c014af3c>] (unmap_vmas+0x4c/0x6c)
[18825.476921] [<c014af3c>] (unmap_vmas+0x4c/0x6c) from [<c01508b0>] (exit_mmap+0xcc/0x200)
[18825.485015] [<c01508b0>] (exit_mmap+0xcc/0x200) from [<c006c8fc>] (mmput+0x58/0xfc)
[18825.492673] [<c006c8fc>] (mmput+0x58/0xfc) from [<c00739c8>] (exit_mm+0x188/0x190)
[18825.500242] [<c00739c8>] (exit_mm+0x188/0x190) from [<c0074954>] (do_exit+0x230/0x43c)
[18825.508159] [<c0074954>] (do_exit+0x230/0x43c) from [<c001fbc8>] (do_page_fault+0x0/0x324)
[18825.516421] [<c001fbc8>] (do_page_fault+0x0/0x324) from [<00000044>] (0x44)
[18825.555583] BUG: scheduling while atomic: rmmod/11929/0x40000002
[18825.561710] Modules linked in: g_ether(-) libcomposite configfs dm_crypt dm_mod rfcomm bnep bluetooth rfkill nvhost_vi
[18825.572748] CPU: 0 PID: 11929 Comm: rmmod Tainted: G D W 3.10.40-gdacac96_dbg1 #1
[18825.580979] [<c001746c>] (unwind_backtrace+0x0/0x138) from [<c001368c>] (show_stack+0x18/0x1c)
[18825.589801] [<c001368c>] (show_stack+0x18/0x1c) from [<c02d80d8>] (dump_stack+0x1c/0x20)
[18825.597944] [<c02d80d8>] (dump_stack+0x1c/0x20) from [<c00a5548>] (__schedule_bug+0x58/0x68)
[18825.606443] [<c00a5548>] (__schedule_bug+0x58/0x68) from [<c0876e34>] (__schedule+0xb0/0x690)
[18825.615055] [<c0876e34>] (__schedule+0xb0/0x690) from [<c00a8358>] (__cond_resched+0x2c/0x3c)
[18825.623701] [<c00a8358>] (__cond_resched+0x2c/0x3c) from [<c08777b8>] (_cond_resched+0x4c/0x54)
[18825.632652] [<c08777b8>] (_cond_resched+0x4c/0x54) from [<c014989c>] (unmap_page_range+0x158/0x174)
[18825.641792] [<c014989c>] (unmap_page_range+0x158/0x174) from [<c0149904>] (unmap_single_vma+0x4c/0x54)
[18825.651172] [<c0149904>] (unmap_single_vma+0x4c/0x54) from [<c014af3c>] (unmap_vmas+0x4c/0x6c)
[18825.659833] [<c014af3c>] (unmap_vmas+0x4c/0x6c) from [<c01508b0>] (exit_mmap+0xcc/0x200)
[18825.668023] [<c01508b0>] (exit_mmap+0xcc/0x200) from [<c006c8fc>] (mmput+0x58/0xfc)
[18825.675854] [<c006c8fc>] (mmput+0x58/0xfc) from [<c00739c8>] (exit_mm+0x188/0x190)
[18825.683533] [<c00739c8>] (exit_mm+0x188/0x190) from [<c0074954>] (do_exit+0x230/0x43c)
[18825.691664] [<c0074954>] (do_exit+0x230/0x43c) from [<c001fbc8>] (do_page_fault+0x0/0x324)
[18825.700036] [<c001fbc8>] (do_page_fault+0x0/0x324) from [<00000044>] (0x44)
[18825.708894] BUG: scheduling while atomic: rmmod/11929/0x40000002
[18825.715032] Modules linked in: g_ether(-) libcomposite configfs dm_crypt dm_mod rfcomm bnep bluetooth rfkill nvhost_vi
[18825.726460] CPU: 0 PID: 11929 Comm: rmmod Tainted: G D W 3.10.40-gdacac96_dbg1 #1
[18825.734813] [<c001746c>] (unwind_backtrace+0x0/0x138) from [<c001368c>] (show_stack+0x18/0x1c)
[18825.743535] [<c001368c>] (show_stack+0x18/0x1c) from [<c02d80d8>] (dump_stack+0x1c/0x20)
[18825.751717] [<c02d80d8>] (dump_stack+0x1c/0x20) from [<c00a5548>] (__schedule_bug+0x58/0x68)
[18825.760364] [<c00a5548>] (__schedule_bug+0x58/0x68) from [<c0876e34>] (__schedule+0xb0/0x690)
[18825.768955] [<c0876e34>] (__schedule+0xb0/0x690) from [<c00a8358>] (__cond_resched+0x2c/0x3c)
[18825.777554] [<c00a8358>] (__cond_resched+0x2c/0x3c) from [<c08777b8>] (_cond_resched+0x4c/0x54)
[18825.786358] [<c08777b8>] (_cond_resched+0x4c/0x54) from [<c014989c>] (unmap_page_range+0x158/0x174)
[18825.795518] [<c014989c>] (unmap_page_range+0x158/0x174) from [<c0149904>] (unmap_single_vma+0x4c/0x54)
[18825.805006] [<c0149904>] (unmap_single_vma+0x4c/0x54) from [<c014af3c>] (unmap_vmas+0x4c/0x6c)
[18825.813775] [<c014af3c>] (unmap_vmas+0x4c/0x6c) from [<c01508b0>] (exit_mmap+0xcc/0x200)
[18825.821969] [<c01508b0>] (exit_mmap+0xcc/0x200) from [<c006c8fc>] (mmput+0x58/0xfc)
[18825.829795] [<c006c8fc>] (mmput+0x58/0xfc) from [<c00739c8>] (exit_mm+0x188/0x190)
[18825.837448] [<c00739c8>] (exit_mm+0x188/0x190) from [<c0074954>] (do_exit+0x230/0x43c)
[18825.845404] [<c0074954>] (do_exit+0x230/0x43c) from [<c001fbc8>] (do_page_fault+0x0/0x324)
[18825.853832] [<c001fbc8>] (do_page_fault+0x0/0x324) from [<00000044>] (0x44)
When it dropped into gdb:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 11929]
__raw_spin_lock_irqsave (lock=0x44) at include/linux/spinlock_api_smp.h:119
119 do_raw_spin_lock_flags(lock, &flags);
(gdb) l
114 * that interrupts are not re-enabled during lock-acquire:
115 */
116 #ifdef CONFIG_LOCKDEP
117 LOCK_CONTENDED(lock, do_raw_spin_trylock, do_raw_spin_lock);
118 #else
119 do_raw_spin_lock_flags(lock, &flags);
120 #endif
121 return flags;
122 }
123
(gdb) bt
#0 __raw_spin_lock_irqsave (lock=0x44) at include/linux/spinlock_api_smp.h:119
#1 _raw_spin_lock_irqsave (Cannot access memory at address 0x800
lock=0x44) at kernel/spinlock.c:145
#2 0xbf0677c8 in ?? ()
Cannot access memory at address 0x800
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) f 1
#1 _raw_spin_lock_irqsave (lock=0x44) at kernel/spinlock.c:145
145 return __raw_spin_lock_irqsave(lock);
(gdb) l
140 #endif
141
142 #ifndef CONFIG_INLINE_SPIN_LOCK_IRQSAVE
143 unsigned long __lockfunc _raw_spin_lock_irqsave(raw_spinlock_t *lock)
144 {
145 return __raw_spin_lock_irqsave(lock);
146 }
147 EXPORT_SYMBOL(_raw_spin_lock_irqsave);
148 #endif
149
(gdb) f 0
#0 __raw_spin_lock_irqsave (lock=0x44) at include/linux/spinlock_api_smp.h:119
119 do_raw_spin_lock_flags(lock, &flags);
(gdb) l
114 * that interrupts are not re-enabled during lock-acquire:
115 */
116 #ifdef CONFIG_LOCKDEP
117 LOCK_CONTENDED(lock, do_raw_spin_trylock, do_raw_spin_lock);
118 #else
119 do_raw_spin_lock_flags(lock, &flags);
120 #endif
121 return flags;
122 }
123
(gdb) p lock
$1 = (raw_spinlock_t *) 0x44
(gdb) p flags
$2 = 2684420243
So…there is a spinlock issue, the exact cause of which is not yet known.
Additional data (while in device mode, I tested modprobe and rmmod of the g_ether dependencies libcomposite and configfs, which did not trigger any error…the error occurred only on rmmod of g_ether…this is a separate OOPS from the previous OOPS):
Just an update. The spinlock issue is independent of a specific gadget driver, e.g., g_ether and g_mass_storage will fail in the same way if removed while in device mode. Some of the important information regarding spinlocks is being optimized out even with debug settings, so it is slow tracing this.
The failure is showing itself in function composite_disconnect, but so far I haven’t pinpointed in the call stack where the spinlock issue actually starts, which is passed as a pointer in a number of structs kgdboc does not have access to. The reset_queues function in tegra_udc.c is suspicious, as it unlocks a spinlock and calls composite_disconnect passing data which is used to re-lock in composite_disconnect. It seems that this momentary unlock/relock has a strong chance to be the issue, but I can’t verify yet (and even if it is verified, there may be a reason why there was an unlock/relock which would complicate the fix).
The root cause should be that cdev is NULL in composite_disconnect(): the gadget is disconnected twice (composite_disconnect() is called twice), and by the second call the gadget has already been unbound. This is why the log shows:
[ 61.683180] Unable to handle kernel NULL pointer dereference at virtual address 00000044
(cdev == NULL, so &cdev->lock is presumably just the offset of lock within the struct, which matches lock=0x44 in the gdb session above.)
In usb_gadget_remove_driver(), udc->driver->disconnect() calls composite_disconnect() first, and then usb_gadget_udc_stop() calls composite_disconnect() again, but only when VBUS is on:
(usb_gadget_udc_stop() -> tegra_vbus_session() -> reset_queues() -> composite_disconnect())
In host mode, VBUS is off by default and the driver waits for a hardware interrupt, which is why we didn’t see this issue in host mode.
When switching to device mode (echo 1 > /sys/devices/platform/tegra-otg/enable_device), VBUS is turned on:
[ 2611.960608] otg state changed: HOST → SUSPEND
[ 2611.964593] tegra-ehci tegra-ehci.0: remove, state 4
[ 2611.964604] usb usb3: USB disconnect, device number 1
[ 2611.965315] tegra-ehci tegra-ehci.0: USB bus 3 deregistered
[ 2611.965457] otg state changed: SUSPEND → PERIPHERAL
[ 2611.965462] tegra_udc: tegra_vbus_session(1559) turn VBUS state from off to on
…
If the device is unplugged from the USB port, that triggers an interrupt and turns VBUS off. In this case the tegra udc driver will not call composite_disconnect() twice, because VBUS is off.
This issue should only happen in the rmmod case.
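Since the second call depends on the port having switched to PERIPHERAL, one way to see which state the OTG port is in before running rmmod is to look at the last "otg state changed" line in the kernel log. This is only a sketch: the function name last_otg_state is hypothetical, and it assumes the message format shown in the log excerpt above, with the new state as the last word on the line.

```shell
#!/bin/sh
# Reads kernel log text on stdin and prints the most recent OTG state,
# e.g. PERIPHERAL. Prints nothing if no state change was logged.
last_otg_state() {
    awk '/otg state changed:/ { state = $NF }
         END { if (state != "") print state }'
}
```

Usage would be something like dmesg | last_otg_state. If it prints PERIPHERAL on an unpatched kernel, rmmod of a gadget module is expected to hit the double disconnect described above.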
PratikPatel,
Can you try the patch below and see if the issue goes away?
--- a/drivers/usb/gadget/composite.c
+++ b/drivers/usb/gadget/composite.c
@@ -1520,6 +1520,9 @@
 	/* REVISIT: should we have config and device level
 	 * disconnect callbacks?
 	 */
+	if (!cdev)
+		return;
+
 	spin_lock_irqsave(&cdev->lock, flags);
 	if (cdev->config)
 		reset_config(cdev);
I am facing a strange issue with a Jetson board on R21.4. USB OTG works well for me in host mode; I connected a thumb drive and tested it. But when I tried to test device mode, nothing happened (it does not work for me).
Before testing device mode I followed these steps:
echo 0 > /sys/devices/platform/tegra-otg/enable_host
echo 1 > /sys/devices/platform/tegra-otg/enable_device
# Switched from host mode to device mode
# Then tried mass storage
modprobe g_mass_storage file=/dev/mmcblk0p1
In another boot I also tried g_ether:
modprobe g_ether
On the PC side nothing happens. I tried with different cables and different PCs. Am I missing something?
The gadget interface is just a stub; you still have to write more code. As an example, a mass storage device (or any other device) reports the information which “lsusb -v” would show for that device, including vendor ID and product ID. When you set up mass storage, your backing store is set up, but none of the vendor-specific information is. On your PC you might plug in a mass storage device (like a flash card reader) and see what lsusb -v shows for it…technical details like bulk versus isochronous modes are set, but everything else you need to add.
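To make that comparison on the PC easier, the vendor and product ID lines can be pulled out of the lsusb -v output. This is a rough sketch; the function name usb_ids is hypothetical, and it assumes the usual lsusb -v layout where the field name idVendor or idProduct is the first word on the line and the hex ID is the second.

```shell
#!/bin/sh
# Reads "lsusb -v" output on stdin and prints one line per idVendor/idProduct
# field, in the form "idVendor 0x...." or "idProduct 0x....".
usb_ids() {
    awk '$1 == "idVendor" || $1 == "idProduct" { print $1, $2 }'
}
```

Run it as lsusb -v 2>/dev/null | usb_ids with the card reader attached, and again with the Jetson attached in device mode, then compare what shows up.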