Both cards are the 64 GB model. The same problem was also observed on some ADATA cards. The SD cards tested appear to work in other systems without any issue.
The same error is reproducible on the Jetson DevKit (L4T 28.1), but only with an SD card extender connected to the DevKit. The SD-to-SD extender used with the DevKit is similar to one listed on Amazon.ca.
My questions:
Is there any advice on how to debug this problem?
What is the recommended method to qualify the SD card interface with the TX2? Are there any options available in software?
Are there available APIs to tune the SD card interface, similar to the ones available for the USB interface?
The card appears in lsblk, but it seems to take too long to attach properly to the TX2 because our line in /etc/fstab does not mount the drive on boot (subsequent sudo mount -a works).
Unfortunately this is not our top priority at the moment and I’m a bit too busy to give much time to this. Is there anything you guys can look into internally? It would be extremely helpful.
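As a stopgap for the late-attach case described above (card visible in lsblk, but too late for the fstab entry), one option is to poll for the device node after boot and only then run mount -a. The following is a rough sketch, not a fix for the underlying tuning failure. The function names, the timeout values, and the device path in the usage note are all assumptions for illustration.

```python
import os
import subprocess
import time


def wait_for_path(path, timeout_s=30.0, poll_s=0.5):
    """Return True once `path` exists, or False if `timeout_s` elapses first."""
    deadline = time.monotonic() + timeout_s
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)


def mount_when_ready(device, timeout_s=30.0):
    """Wait for `device` to appear, then run `mount -a` to apply /etc/fstab."""
    if not wait_for_path(device, timeout_s=timeout_s):
        # The card never enumerated; nothing to mount.
        return False
    # `mount -a` mounts every fstab entry that is not already mounted.
    # It needs root, so run this from a root-owned boot-time service.
    return subprocess.run(["mount", "-a"]).returncode == 0
```

Calling mount_when_ready("/dev/mmcblk2p1") as root from a boot-time script would cover the "shows up late" case; the device name here is hypothetical, so check lsblk for the real one. It does nothing for the case where the card never enumerates at all.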
If we don’t have that card, we may not be able to proceed with debugging. I’ll ask the internal team to help check.
Please keep following up on this thread when you are available.
Here are more dmesg logs (pretty much complete, I believe). I’m not sure what additional error logs you want, but I’ve included some relevant lines from syslog.
With respect to replicating the issue, everything I’ve seen online about it seems to come down to the card being UHS (specifically I’ve seen people reporting trouble with UHS-I, but I’d guess any UHS card will produce the problem). The SDHCI driver/module seems to recognize the device without issue, but repeatedly fails to “tune the hardware.” As I’ve mentioned previously, the behavior seems fairly non-deterministic. Sometimes it mounts with no problem (fstab and all), sometimes it comes up (i.e. in lsblk) but seemingly too late for fstab, and sometimes it never shows up at all, as if it weren’t plugged in.
As an aside, we’re using the TX2 with a custom PCB. We’ve used TX1 with this PCB previously and had no issues with SD cards, which makes me think the PCB is not to blame (we had device tree problems initially, but solved those).
Syslog:
Jan 28 14:47:45 new_unit_placeholer kernel: [ 3.599472] mmc2: tuning execution failed
Jan 28 14:47:45 new_unit_placeholer kernel: [ 3.599482] mmc2: error -5 whilst initialising SD card
Jan 28 14:47:45 new_unit_placeholer kernel: [ 3.971597] sdhci: Tuning procedure failed, falling back to fixed sampling clock
Jan 28 14:47:45 new_unit_placeholer kernel: [ 3.971603] mmc2: tuning execution failed
Jan 28 14:47:45 new_unit_placeholer kernel: [ 3.971615] mmc2: error -5 whilst initialising SD card
Jan 28 14:47:45 new_unit_placeholer kernel: [ 4.344619] sdhci: Tuning procedure failed, falling back to fixed sampling clock
Jan 28 14:47:45 new_unit_placeholer kernel: [ 4.344626] mmc2: tuning execution failed
Jan 28 14:47:45 new_unit_placeholer kernel: [ 4.344637] mmc2: error -5 whilst initialising SD card
Jan 28 14:47:45 new_unit_placeholer kernel: [ 4.623480] mmc2: error -110 whilst initialising SD card
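For reading logs like the excerpt above: the kernel reports negated Linux errno values, so -5 is EIO (I/O error) and -110 is ETIMEDOUT (timed out). To compare how often each failure occurs across units or reboots, a small script along these lines could tally them. This is only a sketch; the function name and the output labels are made up for illustration, and the patterns match only the message formats shown in this excerpt.

```python
import re
from collections import Counter

# Linux errno values seen in the excerpt (the kernel logs them negated).
ERRNO_NAMES = {5: "EIO", 110: "ETIMEDOUT"}

INIT_ERROR = re.compile(r"(mmc\d+): error (-\d+) whilst initialising SD card")
TUNING_FAILED = re.compile(r"(mmc\d+): tuning execution failed")


def summarize_mmc_errors(lines):
    """Count tuning failures and init errors per (host, kind) in log lines."""
    counts = Counter()
    for line in lines:
        m = INIT_ERROR.search(line)
        if m:
            host, code = m.group(1), -int(m.group(2))
            # Fall back to the raw number for codes not in the table.
            counts[(host, ERRNO_NAMES.get(code, f"errno {code}"))] += 1
            continue
        m = TUNING_FAILED.search(line)
        if m:
            counts[(m.group(1), "tuning failed")] += 1
    return counts
```

Feeding it the lines of a syslog file (for example, `summarize_mmc_errors(open(path))`) gives per-host counts that can be compared before and after a change.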
Could you also reproduce this issue on the devkit?
As you know, the patch from topic 1031139 was already a fix for UHS mode, so I don’t think this issue would occur with an arbitrary card.
Anyway, we will run the test with all the UHS cards we have. Thanks.
I actually cannot reproduce this on our devkit. It seems that this only happens on our custom board, which is strange because as I mentioned we didn’t have trouble with TX1 on this board.
To test our card with the devkit I needed to use a micro → full SD card adapter. Not sure if that would make any difference.
Do you have any idea as to what the issue could be now? Seems like it may not be drivers or the card, so does that leave us with a device tree problem? Something strange with our board + TX2? Are there differences between TX1 and TX2 SD card pinouts?
diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index ff48515..956cb9d 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -1857,12 +1857,11 @@
 	/* Keep clock gated for at least 10 ms, though spec only says 5 ms */
 	mmc_delay(10);
 	host->ios.clock = clock;
-	host->skip_host_clkgate = false;
 	mmc_set_ios(host);
 	/* Wait for at least 1 ms according to spec */
 	mmc_delay(1);
-
+	host->skip_host_clkgate = false;
 	/*
 	 * Failure to switch is indicated by the card holding
 	 * dat[0:3] low
I’ve been attempting to roll out the patch to our fleet of devices, and my earlier confidence seems to have been premature. On the device I patched in house, the problem seemed completely fixed (the SD card came up every time across many reboots), but I’m not getting the same results now that I’ve patched more devices and seen more reboots. I’m seeing the output below repeatedly in dmesg on patched devices, along with the same non-deterministic behavior of the card mounting or appearing in lsblk. It may be happening less often now, but any occurrence at all is unworkable for us. Please advise.