installing nvme drive?

Skypuppy · May 15, 2017, 1:53am

Can one install an nvme drive on the TX2? There is an m.2 slot on it, or are there several designations for the phrase “m.2?”

linuxdev · May 15, 2017, 3:28am

It is an m.2 “Key e”. m.2 has several different standard layouts, the one which is used is labeled by its “key”. See:
https://en.wikipedia.org/wiki/M.2

At that URL you will see these interfaces for key e:

PCIe ×2, USB 2.0, I2C, SDIO, UART and PCM "WiFi/Bluetooth cards"

Sorry, it isn’t for NVMe unless the NVMe is PCIe and key e.

gyuhyong · May 15, 2017, 3:52am

so is there any ssd that can be installed into the onboard pcie slot on the Jetson tx2?

snarky · May 15, 2017, 5:05am

Yes, there exist SSDs that can be put in the PCIe slot.

You can also put a SSD on the SATA interface.

Some third party carrier boards also provide a M.2 Key M interface, which is what NVMe SSDs typically use.

linuxdev · May 15, 2017, 1:39pm

If you do test out an NVMe drive be sure this is enabled in the kernel:

CONFIG_BLK_DEV_NVME=y

…e.g., to see if the existing kernel has this feature enabled:

gunzip < /proc/config.gz | egrep CONFIG_BLK_DEV_NVME
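
A small helper can make that check less ambiguous, since a loose pattern such as "NVME" also matches unrelated options like CONFIG_NVMEM. This is only a sketch: the function name `kconfig_state` is made up for illustration, and it assumes the usual kernel config text format ("OPTION=value" or "# OPTION is not set").

```shell
# Hypothetical helper: report how one kernel option is set.
# Reads kernel config text on stdin; prints the value ("y", "m", ...),
# "not set" for an explicit "# OPTION is not set" line, or "absent".
kconfig_state() {
    awk -v opt="$1" '
        $0 ~ ("^" opt "=")           { sub(/^[^=]*=/, ""); print; found = 1; exit }
        $0 == "# " opt " is not set" { print "not set"; found = 1; exit }
        END { if (!found) print "absent" }
    '
}

# Example use on a running system (requires /proc/config.gz to exist):
#   gunzip < /proc/config.gz | kconfig_state CONFIG_BLK_DEV_NVME
```

A "y" means the driver is built into the kernel, "m" means it is a loadable module, and anything else means the driver is not available.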

Skypuppy · May 25, 2017, 10:19am

Gave up on nvme Key E version after an hour of searching. :) So, I’ll use the pcie slot.
Already had an adapter board and just ordered an nvme drive for it. Should be here in a couple of days. Then, I have to find out how to boot from the emmc (since there is no other choice?) but have root start from the nvme drive. Assuming that is possible, how difficult is the next step of duplicating all the work already done on the emmc (which is nearly full) over onto the nvme drive? Something using “dd,” I hope?
Wonder why they went with this “flash” procedure instead of just normal disk routines. Sure complicates this beast.
I’ve seen mention of the combo boot methods in here somewhere but I don’t recall if there were specific instructions on how to do so.

linuxdev · May 25, 2017, 2:34pm

If you edit “/boot/extlinux/extlinux.conf” you will find the APPEND key/value pair is for passing kernel command line arguments at the moment when Linux takes over from the boot loader. The “root=/dev/mmcblk0p1” is for eMMC. If the device to use were instead “/dev/sda1” (which is just an example, an NVMe drive might differ in naming), then you would use “root=/dev/sda1”. In this latter case, if sda1 has a valid root partition on it, then control will transfer from eMMC to now go to sda1 instead.
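
After a reboot it can be worth confirming which root device the kernel was actually told to use. The sketch below pulls the "root=" argument out of a kernel command line. The function name `root_arg` is hypothetical, and printing the last occurrence assumes that a later "root=" overrides an earlier one.

```shell
# Hypothetical helper: print the value of the last root= argument
# found in a kernel command line supplied on stdin.
root_arg() {
    tr ' ' '\n' | sed -n 's/^root=//p' | tail -n 1
}

# Example use on a running system:
#   root_arg < /proc/cmdline
```

If this still prints /dev/mmcblk0p1 after editing extlinux.conf, the edited entry was not the one that booted.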

Note that where boot configuration files are and where the root partition is can be separate. The flash command line (versus kernel APPEND arguments) alters where boot files are searched for, and not just root partition. If you have a serial console and have added an entry for sda1 without deleting the original entry then it is possible to select either sda1 or eMMC via serial console. One could remove sda1 and still use eMMC for a rescue system. If instead you flashed to sda1 (moving boot files and not just root partition), then not having sda1 connected would likely make the system unbootable (you could probably get around that with some work, but you wouldn’t simply use serial console to select eMMC for rescue).

Most people are better off adjusting extlinux.conf for the different Linux rootfs partition instead of changing flash.sh parameters to achieve an alternate root partition.

Skypuppy · May 25, 2017, 5:43pm

Thanks, Linuxdev. That is almost exactly how I do it on RPi’s and Beaglebone Blacks and even once with a desktop. (with some name changes to protect the guilty. Sometimes I wish there weren’t so many Linux versions or that their developers would ALL stick to the same name/file conventions. But that’s a different rant.)

So, easiest & safest method:

check/set the kernel config to ensure nvme is turned on.
boot into emmc normally
sudo dd if=/dev/mmcblk0p11 of=/dev/sda1
change /boot/extlinux/extlinux.conf to set “root=/dev/sda1” or the appropriate device name
reboot and hold breath.

Sound like a good recipe?

I like the idea of the “dual” boot. Maybe I could move it next to a desktop just to do that – but would I have to use the serial console for every boot after that in order to choose? Or could I make one the nvme the default with the failover being emmc?

linuxdev · May 25, 2017, 8:00pm

The root partition is “/dev/mmcblk0p1”; you won’t get a root partition by copying mmcblk0p11 (perhaps the extra “1” was a typo).

The steps are basically good, but I prefer to create a second boot entry in extlinux.conf instead of altering the original…then picking this with a serial console at boot time by interrupting U-Boot. This way if something goes wrong you can just boot like normal. Should it turn out results are good, then you can set the “default” in extlinux.conf to point at your alternate entry…the original would still be available by serial console if desired. If something does go wrong and you’ve changed the default, then you might have some work ahead of you in U-Boot to try and not lose your install…but that would require you to have a serial console, so there you are back to serial console again.

Very quickly after boot starts you can interrupt U-Boot with any keystroke. This gets you to its command line. If that passes by without hitting a keystroke you can again interrupt U-Boot, but it is to pick a kernel (numbered 1-to-n). Should you hit a key too soon and get to the command line you just run the command “boot” and then interrupt…this would again place you in kernel select.

If you do not have a serial console, then notice at the top of extlinux.conf is key/value pair “DEFAULT”. On the boot entry, note “LABEL” (not “MENU LABEL”…this is for command line labels…LABEL must be unique among all entries, MENU LABEL is just for identifying to a human). If no entry is picked then the entry with “DEFAULT” automatically runs (e.g., “primary”).
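
As a rough illustration of adding a second entry rather than editing the original, the sketch below prints a copy of the “primary” entry with a new LABEL, a new MENU LABEL, and a different “root=”. Everything here is an assumption for illustration: the function name `clone_boot_entry`, the label “nvme”, the device name, and the expectation that the existing entry is labeled “primary” and carries a “root=” argument on its APPEND line. It writes to stdout only, so the result can be reviewed before anything is appended to extlinux.conf.

```shell
# Hypothetical helper: copy the "primary" entry of an extlinux.conf
# (read from stdin) under a new label with a different root device.
#   $1 = new LABEL (must be unique), $2 = new root device
clone_boot_entry() {
    awk -v label="$1" -v root="$2" '
        # Each boot entry starts at its LABEL line.
        $1 == "LABEL" { copying = ($2 == "primary") }
        copying {
            line = $0
            if ($1 == "LABEL")
                line = "LABEL " label
            if ($1 == "MENU" && $2 == "LABEL")
                sub(/MENU LABEL .*/, "MENU LABEL " label " (alternate root)", line)
            if ($1 == "APPEND")
                sub(/root=[^ ]+/, "root=" root, line)
            print line
        }
    '
}

# Example use (prints the new entry; nothing is modified):
#   clone_boot_entry nvme /dev/nvme0n1p1 < /boot/extlinux/extlinux.conf
```

Leaving “DEFAULT primary” untouched keeps the eMMC entry as the fallback until the new entry has been shown to boot.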

The extlinux.conf and files in “/boot” which are referred to are determined before the Linux kernel ever loads…if you flashed to eMMC originally (this is default), then the “/boot” of an SD card or SATA drive are ignored once the kernel loads. If you flashed to mmcblk1p1, then the SD card must contain all of the “/boot” files, and eMMC’s extlinux.conf would be ignored. When you talk about failover what you really are talking about is within the U-Boot environment…U-Boot looks for ext4 partitions in a certain order based on environment variables…if U-Boot does not find a Linux partition, then it moves on to testing the next device for a partition. The first partition it finds with configuration is the one it uses.

You could alter the order of searching for partitions (just go to U-Boot command line via serial console and type “help”…note the printenv and related commands to explore…just don’t “save” a change and next reboot will forget any mistakes you made, e.g., experimenting with boot order variables). However, if you have a valid but non-bootable partition, it probably won’t move on to check the next partition…it’ll likely just sit there. So the trick if the boot partition is messed up is to put a correct partition with boot information on another device, and then edit that environment variable to point at this partition before eMMC. Note that a removable device allows you to skip that device because no partition was found…you can’t remove eMMC, so if the boot order checks first for eMMC and an ext4 partition is found but set up wrong, then you are stuck without editing that environment to go to a new device.

One option would be to set first search for SD, then SATA, then eMMC (or SATA then eMMC if you want your SD card to be data and to not interrupt boot when ext4 formatted). Under that scheme, if SD card is not plugged in, then it tries SATA; if SATA is not plugged in, then it tries eMMC. Take a look at the printenv variables (they expand like macros or get substituted directly…see macro “boot_cmd”) and watch how they expand and substitute…lots of possibilities there.

Skypuppy · May 27, 2017, 10:21pm

Okay, I’m confused. Yes, again.

Have the NVME drive installed in the PCI slot. The kernel sees it. config.gz shows NVME is enabled for the kernel. hwinfo shows the kernel can read it, too, and even tell me correctly that it is a Toshiba etc. However, none of the system utilities can utilize it, not gparted, not testdisk, etc. When I try to mount it under /nvme, I get the error that it’s not a block device! How strange. Running “ls -al” on /dev/nvme* shows it as marked as a character device rather than a block device. How weird is that?
Btw, the nvme drive is straight from the factory so I am unsure about formatting or other initialization issues, hoping that whatever quirks come from the TX2 would be resolved by initializing/formatting it on the TX2. Maybe I should init it in another Linux desktop instead?

So I initialized the nvme in a desktop and it all went fine. Now, back in the TX2, the kernel no longer sees it and hwinfo doesn’t show any info for it either.

Think I’ll take a break and come back to it.

linuxdev · May 27, 2017, 10:51pm

There wouldn’t be any default partitioning (that’d be more of a USB thumb drive or SD card thing). It is odd it would show as a character device, though I’ve never had an NVMe drive, so I don’t know. However, does “sudo gdisk /dev/nvme” (or whatever device special file name it has) work? Also, what is shown with “sudo lsblk”?

Skypuppy · May 28, 2017, 12:40am

For some strange reason, it appears that Ubuntu approaches an nvme as both a character and a block device (didn’t we drop that concept back in 1492?) Anyway, the nvme disk does not show up at all since the ‘disk’ was formatted and then reinserted back into the TX2.

lsblk shows the same now as before the nvme was initialized:

NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
mmcblk0rpmb 179:8 0 4M 0 disk
mmcblk0 179:0 0 29.1G 0 disk
├─mmcblk0p1 179:1 0 28G 0 part /
├─mmcblk0p2 179:2 0 4M 0 part
├─mmcblk0p3 179:3 0 256K 0 part
├─mmcblk0p4 179:4 0 256K 0 part
├─mmcblk0p5 179:5 0 3M 0 part
├─mmcblk0p6 179:6 0 2K 0 part
├─mmcblk0p7 179:7 0 604K 0 part
├─mmcblk0p8 259:0 0 500K 0 part
├─mmcblk0p9 259:1 0 2M 0 part
├─mmcblk0p10 259:2 0 6M 0 part
├─mmcblk0p11 259:3 0 2M 0 part
├─mmcblk0p12 259:4 0 128M 0 part
├─mmcblk0p13 259:5 0 32M 0 part
├─mmcblk0p14 259:6 0 64M 0 part
├─mmcblk0p15 259:7 0 256K 0 part
├─mmcblk0p16 259:8 0 256M 0 part
└─mmcblk0p17 259:9 0 647.2M 0 part

What’s equally odd is config.gz remains the same as well:

root@tegra-ubuntu:/dev# gunzip < /proc/config.gz | egrep NVME
CONFIG_BLK_DEV_NVME=y
# CONFIG_NVMEM is not set

and hwinfo from last boot shows:

P: /devices/10003000.pcie-controller/pci0000:00/0000:00:01.0/0000:01:00.0
E: DEVPATH=/devices/10003000.pcie-controller/pci0000:00/0000:00:01.0/0000:01:00.0
E: DRIVER=nvme
E: ID_PCI_CLASS_FROM_DATABASE=Mass storage controller
E: ID_PCI_INTERFACE_FROM_DATABASE=NVM Express
E: ID_PCI_SUBCLASS_FROM_DATABASE=Non-Volatile memory controller
E: ID_VENDOR_FROM_DATABASE=OCZ Technology Group, Inc.
E: MODALIAS=pci:v00001B85d00006018sv00001B85sd00006018bc01sc08i02
E: PCI_CLASS=10802
E: PCI_ID=1B85:6018
E: PCI_SLOT_NAME=0000:01:00.0
E: PCI_SUBSYS_ID=1B85:6018
E: SUBSYSTEM=pci
E: USEC_INITIALIZED=14736271
E: net.ifnames=0

P: /devices/10003000.pcie-controller/pci0000:00/0000:00:01.0/0000:01:00.0/nvme/nvme0
N: nvme0
E: DEVNAME=/dev/nvme0
E: DEVPATH=/devices/10003000.pcie-controller/pci0000:00/0000:00:01.0/0000:01:00.0/nvme/nvme0
E: MAJOR=237
E: MINOR=0
E: SUBSYSTEM=nvme
E: USEC_INITIALIZED=14736946
E: net.ifnames=0

P: /devices/10003000.pcie-controller/pci0000:00/0000:00:01.0/pci_bus/0000:01
E: DEVPATH=/devices/10003000.pcie-controller/pci0000:00/0000:00:01.0/pci_bus/0000:01
E: OF_COMPATIBLE_N=0
E: OF_FULLNAME=/pcie-controller@10003000/pci@1,0
E: OF_NAME=pci
E: OF_TYPE=pci
E: SUBSYSTEM=pci_bus
E: USEC_INITIALIZED=14736078
E: net.ifnames=0

************* but the same above nvme sections are NOT in the hwinfo report now! **********
only this line is the same between the two:
nvme: module = nvme
in the devices and module sections up top.

Maybe if I boot again, it will come back. :)

EDIT: quick follow up:
Oh, God! I rebooted and all the hwinfo pcie driver info is indeed back. But lsblk, gparted, and testdisk still don’t see the nvme drive. And “nvme0” is back in the /dev directory, and still as character device:
nvidia@tegra-ubuntu:/dev$ ls -al nvme*
crw------- 1 root root 237, 0 May 27 19:43 nvme0

I am completely lost at this point. And the emmc area is almost full.

Skypuppy · May 28, 2017, 1:00am

What lets me know something is up with the TX2 version of Ubuntu is when I initialized the nvme drive, it was on a desktop also running 16.04 and it had never seen an nvme drive before but that box handled it just fine with no hiccups or problems. Nothing at all like the groping we’re having to do with the TX2 version of Ubuntu.

linuxdev · May 28, 2017, 1:29am

Was a front end for kernel config used, e.g., “make nconfig” or “make menuconfig”? Or was the .config directly edited? It would be possible for a dependency to be missing if directly edited, though I’m kind of grasping at that. Since the driver is integrated and not a module there won’t be any module issues…but what is the current “uname -r”? If “uname -r” changed, then perhaps a feature which was required as a module is not where it needs to be (a dependency).

FYI, hwinfo admits it cannot find all devices. Also, since you integrated this feature, you will not get any of the messages you might get if a module version of the driver were inserted or removed. Do look at dmesg and see if there are any nvme notes, e.g.:

dmesg | egrep -i "(nvme|disk|block)"

Do you have file “/sys/bus/pci/rescan” (I’m not where I can verify if the TX2 has this file or not)? If so, try:

echo 1 | sudo tee /sys/bus/pci/rescan

What is the output of:

sudo find /dev -name 'nvme*'

Also, this is PCIe, so what is the output of “lspci”? If the drive shows up there, try “lspci -vvv” and show just the output for the drive.

Finally, what happens if you open the device special file with “sudo gdisk”?

Skypuppy · May 28, 2017, 2:23am

Come to think of it, one of the JetsonHacks did modify the kernel so that the realsense R200 could be utilized. Maybe that has something to do with it. I could experiment with reflashing and NOT do the R200 hack and see if the nvme drive works. Sure would like to be able to dd all the emmc before I did that so I could do a complete image restore. There’s probably 30 or more hours work in this image.

For your other requests:
dmesg output:

nvidia@tegra-ubuntu:~$ dmesg | egrep -i "(nvme | disk | block)"
[ 1.665005] nvmap_heap_init: nvmap_heap_init: created heap block cache
[ 14.794438] nvme 0000:01:00.0: enabling device (0000 -> 0002)
[ 14.948087] nvme 0000:01:00.0: Failed status: ffffffff, reset controller <----------------
[ 75.521172] nvme 0000:01:00.0: Timeout I/O 1 QID 0 <----------------
[ 75.526076] nvme 0000:01:00.0: I/O 1 QID 0 timeout, reset controller <----------------
[ 135.512644] nvme 0000:01:00.0: Timeout I/O 1 QID 0
[ 196.504516] nvme 0000:01:00.0: Timeout I/O 1 QID 0
[ 256.517911] nvme 0000:01:00.0: Timeout I/O 1 QID 0
[ 317.504265] nvme 0000:01:00.0: Timeout I/O 1 QID 0
[ 377.512954] nvme 0000:01:00.0: Timeout I/O 1 QID 0
[ 438.503935] nvme 0000:01:00.0: Timeout I/O 1 QID 0
[ 498.518220] nvme 0000:01:00.0: Timeout I/O 1 QID 0
[ 559.502780] nvme 0000:01:00.0: Timeout I/O 1 QID 0
[ 619.512464] nvme 0000:01:00.0: Timeout I/O 1 QID 0
[ 680.504996] nvme 0000:01:00.0: Timeout I/O 1 QID 0
[ 740.520705] nvme 0000:01:00.0: Timeout I/O 1 QID 0
[ 801.503983] nvme 0000:01:00.0: Timeout I/O 1 QID 0
[ 849.736068] CPU: 4 PID: 2585 Comm: nvme Not tainted 4.4.15 #1
[ 861.510958] nvme 0000:01:00.0: Timeout I/O 1 QID 0
[ 922.501757] nvme 0000:01:00.0: Timeout I/O 1 QID 0
[ 982.516410] nvme 0000:01:00.0: Timeout I/O 1 QID 0
[ 1043.499129] nvme 0000:01:00.0: Timeout I/O 1 QID 0
[ 1103.508484] nvme 0000:01:00.0: Timeout I/O 1 QID 0
[ 1164.501321] nvme 0000:01:00.0: Timeout I/O 1 QID 0
[ 1224.517741] nvme 0000:01:00.0: Timeout I/O 1 QID 0
[ 1285.505808] nvme 0000:01:00.0: Timeout I/O 1 QID 0
[ 1345.509651] nvme 0000:01:00.0: Timeout I/O 1 QID 0
[ 1405.513264] nvme 0000:01:00.0: Timeout I/O 1 QID 0
[ 1466.500716] nvme 0000:01:00.0: Timeout I/O 1 QID 0
[ 1526.516032] nvme 0000:01:00.0: Timeout I/O 1 QID 0
[ 1587.499255] nvme 0000:01:00.0: Timeout I/O 1 QID 0
[ 1647.506409] nvme 0000:01:00.0: Timeout I/O 1 QID 0
[ 1708.501479] nvme 0000:01:00.0: Timeout I/O 1 QID 0
[ 1768.512526] nvme 0000:01:00.0: Timeout I/O 1 QID 0
[ 1829.495514] nvme 0000:01:00.0: Timeout I/O 1 QID 0
[ 1889.502474] nvme 0000:01:00.0: Timeout I/O 1 QID 0
[ 1950.493397] nvme 0000:01:00.0: Timeout I/O 1 QID 0
[ 2010.508330] nvme 0000:01:00.0: Timeout I/O 1 QID 0
[ 2071.491641] nvme 0000:01:00.0: Timeout I/O 1 QID 0
[ 2131.501967] nvme 0000:01:00.0: Timeout I/O 1 QID 0

************ rescan
no output but no fail message either

************ find

crw------- 1 root root 237, 0 May 27 20:38 nvme0

(still only the character file)

***************** lspci

01:00.0 Non-Volatile memory controller: OCZ Technology Group, Inc. Device 6018 (rev 01) (prog-if 02 [NVM Express])
Subsystem: OCZ Technology Group, Inc. Device 6018
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Interrupt: pin A routed to IRQ 379
Region 0: Memory at 50100000 (64-bit, non-prefetchable) [disabled]
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [50] MSI: Enable- Count=1/8 Maskable- 64bit+
Address: 0000000000000000 Data: 0000
Capabilities: [70] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop- FLReset-
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
LnkCap: Port #0, Speed 8GT/s, Width x4, ASPM L1, Exit Latency L0s unlimited, L1 <4us
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+, OBFF Not Supported
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [b0] MSI-X: Enable- Count=8 Masked-
Vector table: BAR=0 offset=00002000
PBA: BAR=0 offset=00003000
Capabilities: [100 v2] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
Capabilities: [178 v1] #19
Capabilities: [198 v1] Latency Tolerance Reporting
Max snoop latency: 0ns
Max no snoop latency: 0ns
Capabilities: [1a0 v1] L1 PM Substates
L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- L1_PM_Substates+
PortCommonModeRestoreTime=255us PortTPowerOnTime=400us
Kernel driver in use: nvme

So it’s failing during kernel initializations, even though the driver is there. Beyond that, I can conclude nothing from the lspci and other outputs because I’m not that deep into pci hardware internals. Sorry. Did you need the lspci info about the pci driver, too, instead of just the nvme driver?

linuxdev · May 28, 2017, 5:09am

There is a good explanation of clone and restore for the TK1. Mostly this applies, but the actual flash and clone commands are explained separately. See this first:
http://elinux.org/Jetson/Cloning

Then for the actual clone commands from a TX2 see this:
https://devtalk.nvidia.com/default/topic/1000105/jetson-tx2/tx2-cloning/

The log notes show a troubling I/O error. lspci shows the drive is PCIe v3 capable; the slot it connects to is of course PCIe v2, which it should throttle back to. Unfortunately it throttles back to PCIe v1 speeds (the LnkSta of 2.5GT/s is actual speed where 2.5 is PCIe v1). Despite throttling back it does have advanced error reporting and there is no first error (a NULL pointer means there is no list of errors, though not all errors are designed to report). Additionally, it does report it found the nvme driver, so the PCIe end should be functional.
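
To compare capable versus negotiated link speed without reading the whole dump, something like the following could be piped after “lspci -vvv”. It is only a sketch: the function name `pcie_link_report` is invented here, and it assumes the LnkCap/LnkSta line layout seen in the outputs in this thread. The speed mapping is 2.5GT/s for PCIe v1, 5GT/s for v2, and 8GT/s for v3.

```shell
# Hypothetical helper: summarize capable vs. current PCIe link speed
# from "lspci -vvv" text supplied on stdin.
pcie_link_report() {
    awk '
        function gen(speed) {
            if (speed == "2.5GT/s") return "PCIe v1"
            if (speed == "5GT/s")   return "PCIe v2"
            if (speed == "8GT/s")   return "PCIe v3"
            return "unknown"
        }
        /LnkCap:/ || /LnkSta:/ {
            kind = ($1 == "LnkCap:") ? "capable" : "current"
            # The speed is the field following the word "Speed".
            for (i = 1; i < NF; i++)
                if ($i == "Speed") {
                    speed = $(i + 1)
                    sub(/,$/, "", speed)
                    print kind ": " speed " (" gen(speed) ")"
                }
        }
    '
}

# Example use for the device at 01:00.0:
#   sudo lspci -vvv -s 01:00.0 | pcie_link_report
```
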

It has me curious as to why there is I/O timeout since PCIe is functioning (though not at a very good level). This would of course explain why programs like partitioning programs can’t access the device. It would be interesting to see what the “lspci -vvv” shows from a desktop host where it works. Incidentally, if you surround your lspci output with the mouse highlight and then click on the “Code Block” icon (“</>”) it’ll preserve whitespace and add scroll bars.

Skypuppy · May 28, 2017, 8:21am

Let’s see if this works. :) Is this everything you asked for, @linuxdev? It’s all in a file but I don’t see a way to #include a file in this forum, and I might have missed something in all the copy and pasting. Skypuppy

This entire output came from the desktop Ubuntu machine, where the nvme drive works just fine.

Your requests: ----------------------------------------------------------->

gdisk /dev/nvme0 gives error, character device.

*************************************************
gdisk -l /dev/nvme0n1 yields:
   ***
GPT fdisk (gdisk) version 1.0.1

Partition table scan:
  MBR: protective
  BSD: not present
  APM: not present
  GPT: present

Found valid GPT with protective MBR; using GPT.
Disk /dev/nvme0n1: 500118192 sectors, 238.5 GiB
Logical sector size: 512 bytes
Disk identifier (GUID): 0ED8DAE4-744E-4FA9-AA99-2DD285BC15DA
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 500118158
Partitions will be aligned on 2048-sector boundaries
Total free space is 2669 sectors (1.3 MiB)

Number  Start (sector)    End (sector)  Size       Code  Name
   1            2048       500117503   238.5 GiB   8300

***************************************
gdisk -l /dev/nvme0n1p1 yields:
   ***
GPT fdisk (gdisk) version 1.0.1

Partition table scan:
  MBR: not present
  BSD: not present
  APM: not present
  GPT: not present

Creating new GPT entries.
Disk /dev/nvme0n1p1: 500115456 sectors, 238.5 GiB
Logical sector size: 512 bytes
Disk identifier (GUID): 670C2838-DC0F-4C0D-9DDB-9373B77BAA62
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 500115422
Partitions will be aligned on 2048-sector boundaries
Total free space is 500115389 sectors (238.5 GiB)

**********************************************************
lsblk yields:
   ***
NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sr0          11:0    1  1024M  0 rom 
sda           8:0    0   1.8T  0 disk
├─sda2        8:2    0   1.8T  0 part /
├─sda3        8:3    0  15.9G  0 part [SWAP]
└─sda1        8:1    0   512M  0 part /boot/efi
nvme0n1     259:0    0 238.5G  0 disk
└─nvme0n1p1 259:2    0 238.5G  0 part

************************************************************
lspci -vvv:
   ***

*************************************************************
02:00.0 Non-Volatile memory controller: OCZ Technology Group, Inc. Device 6018 (rev 01) (prog-if 02 [NVM Express])
        Subsystem: OCZ Technology Group, Inc. Device 6018
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 42
        Region 0: Memory at fe400000 (64-bit, non-prefetchable) 
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] MSI: Enable- Count=1/8 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [70] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+
                DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- FLReset-
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 8GT/s, Width x4, ASPM L1, Exit Latency L0s unlimited, L1 <4us
                        ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+, OBFF Not Supported
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
                LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [b0] MSI-X: Enable+ Count=8 Masked-
                Vector table: BAR=0 offset=00002000
                PBA: BAR=0 offset=00003000
        Capabilities: [100 v2] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
        Capabilities: [178 v1] #19
        Capabilities: [198 v1] Latency Tolerance Reporting
                Max snoop latency: 0ns
                Max no snoop latency: 0ns
        Capabilities: [1a0 v1] L1 PM Substates
                L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- L1_PM_Substates+
                          PortCommonModeRestoreTime=255us PortTPowerOnTime=400us
        Kernel driver in use: nvme
        Kernel modules: nvme

Skypuppy · May 28, 2017, 4:26pm

I wonder if the TX2 has a problem with GPT?

linuxdev · May 28, 2017, 6:10pm

One reason for looking at this with gdisk is that gdisk is designed for GPT partitions…fdisk was created when only old style BIOS existed and UEFI had not yet been invented (UEFI got around BIOS partition limitations and can deal with many more than four primary partitions). Jetsons use GPT partitions, though they can also use BIOS partitions.

I’ve never owned an NVMe drive, so I had to research a bit. Turns out the NVMe driver has a character device in addition to the block device. The character device belongs to the controller, and the block devices are the namespaces that controller makes available. The number in “nvme0” is the controller, with nvme0 being the first NVMe drive; “/dev/nvme0” itself is the character device. The “n” appended to the name is the namespace, and namespaces are numbered starting at 1, so “nvme0n1” is the first block device namespace provided by the first NVMe drive. This is what you’d partition. “nvme0n1p1” would be the first partition of the first NVMe drive’s first namespace. You could format nvme0n1p1 if it shows up.
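
To keep the three kinds of names apart, a tiny classifier along these lines may help. It is a sketch based on the usual Linux naming as I understand it (nvmeX for the controller's character device, nvmeXnY for a namespace block device, nvmeXnYpZ for a partition); the function name `nvme_node_kind` is made up for illustration.

```shell
# Hypothetical helper: classify an NVMe device node by its name only.
nvme_node_kind() {
    name=${1##*/}    # drop any leading directory such as /dev/
    if   printf '%s\n' "$name" | grep -Eq '^nvme[0-9]+n[0-9]+p[0-9]+$'; then
        echo partition
    elif printf '%s\n' "$name" | grep -Eq '^nvme[0-9]+n[0-9]+$'; then
        echo namespace
    elif printf '%s\n' "$name" | grep -Eq '^nvme[0-9]+$'; then
        echo controller
    else
        echo unknown
    fi
}

# Example use:
#   for node in /dev/nvme*; do echo "$node: $(nvme_node_kind "$node")"; done
```

On the TX2 in this thread only a “controller” node would be listed, which matches the missing block device.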

Since you found file “/dev/nvme0” on your Jetson, but not the namespaces, I suspect there is a kernel (or other) config to support namespaces. I am going to guess that NVMe control commands go through the character device (nvme0), and I/O goes through nvme0n1, and thus the I/O error (a case of the error message not being explicit…perhaps a better error message would be the original I/O error supplemented by an additional note that the block device was not found). Or perhaps firmware and/or user space support is missing if it isn’t a missing kernel config.

Try installing package “nvme-cli”. Also, does anyone know if there is any kind of firmware required for this NVMe drive to have its block device visible? Does a JTX2 require anything in the device tree for NVMe block devices?

Skypuppy · May 28, 2017, 9:11pm

And narrowing down the problem a bit is the fact that even when the nvme is formatted and partitioned on another computer, the TX2 can see it as the character device but fails to see it as the block device.

Is ANYONE else using an nvme drive on their TX units?

Am I the only one who is running out of space on the emmc in the first week of ownership? :) :)

Btw, I haven’t done any kernel configs since the 1980’s and it has changed enormously. Thank God. So I’m a bit leery of monkeying around with the kernel, especially since I have no way to make a backup image of the emmc. I can’t even get iscsi to cooperate. :( I’m also concerned that if I connect a multi-gig SATA drive to the TX2, the combinations will draw too much power and overwhelm something in the TX2, possibly destroying something (let the magic smoke out.)

Thanks.