Jetson Xavier cloning

yairhav · December 3, 2019, 2:02pm

Is there any tool or procedure for cloning the entire eMMC device?

jchaves · December 3, 2019, 7:51pm

Hi yairhav,

Typically, the procedure I use is as follows:

Apply the patch in https://devtalk.nvidia.com/default/topic/1039548/jetson-agx-xavier/xavier-cloning/2/?offset=29#5330276 to the flash.sh script, then:
```
# Save the cloned APP partition file on the host computer
sudo ./flash.sh -r -k APP -G backup.img jetson-xavier mmcblk0p1
# Use the raw clone as the system image to flash
sudo cp backup.img.raw bootloader/system.img
# Restore the cloned APP partition; don't forget the "-r" option
sudo ./flash.sh -r -k APP jetson-xavier mmcblk0p1
```
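The clone and restore commands above can be wrapped in two small bash functions so that the raw file is checked before it is copied over system.img. This is a sketch, not part of the original procedure: it assumes you run it from the Linux_for_Tegra directory with the cloning patch already applied and the Xavier in recovery mode. The `DRY_RUN=1` switch and the function names are hypothetical, added here only so the commands can be previewed.

```shell
# Sketch: flash.sh clone/restore steps as bash functions.
# Assumptions: run from Linux_for_Tegra, flash.sh already patched for cloning,
# Xavier connected in recovery mode. DRY_RUN=1 is a switch invented for this
# sketch: it prints each command instead of running it.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Read the APP partition into NAME (sparse) and NAME.raw (raw).
clone_app() {
  local name="${1:-backup.img}"
  run sudo ./flash.sh -r -k APP -G "$name" jetson-xavier mmcblk0p1
}

# Flash a previously saved raw clone back to the APP partition.
restore_app() {
  local name="${1:-backup.img}"
  if [ "${DRY_RUN:-0}" != "1" ] && [ ! -f "$name.raw" ]; then
    echo "restore_app: $name.raw not found" >&2
    return 1
  fi
  run sudo cp "$name.raw" bootloader/system.img || return 1
  # -r tells flash.sh to reuse bootloader/system.img instead of rebuilding it
  run sudo ./flash.sh -r -k APP jetson-xavier mmcblk0p1
}
```

For example, `DRY_RUN=1 restore_app backup.img` prints the copy and flash commands without touching anything, while `restore_app backup.img` refuses to continue if `backup.img.raw` is missing.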

A word of caution: this procedure only supports cloning the APP partition for now. Support for cloning other partitions in the eMMC is still under development by NVIDIA. If you also need to restore the device tree and kernel image, run these commands as well:

sudo ./flash.sh -r -k kernel -K kernel/Image jetson-xavier mmcblk0p1
sudo ./flash.sh -r -k kernel-dtb -d kernel/dtb/<device-tree.dtb> jetson-xavier mmcblk0p1

Andrey1984 · January 31, 2020, 10:50pm

For cloning the eMMC, you may also want to try the dd method, which creates a raw image of the APP partition (mmcblk0p1) that is then converted into a deployable sparse image.
First, stop the disk on the Xavier by remounting all filesystems read-only:

sudo su
echo u > /proc/sysrq-trigger

Second, take the raw image:

dd if=/dev/mmcblk0p1 of=/path/testimage.raw

Alternatively, the image can be taken over the network, e.g., with ssh:

dd if=/dev/mmcblk0p1 | ssh user@hostpc dd of=/data/testimage.raw

or with netcat, where <ip_address> is the host PC and the receiver is started first:

sender (Jetson):	sudo dd if=/dev/mmcblk0p1 | netcat <ip_address> <port>
receiver (host PC):	netcat -l -p <port> > your_image_file

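Third, make the sparse image on the host PC (mksparse is in Linux_for_Tegra/bootloader; running it there puts system.img where flash.sh -r expects it):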

./mksparse -v --fillpattern=0 ~/testimage.raw system.img

Fourth, deploy the sparse image to other devices:

sudo ./flash.sh -r jetson-xavier mmcblk0p1
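For the dd steps above, a slightly more defensive variant uses a larger block size, flushes the output, and compares the copy with its source. This is a minimal bash sketch assuming GNU dd and coreutils (as on the Jetson's Ubuntu); the function names are illustrative, and the comparison only makes sense when the source is no longer changing, i.e., after the `echo u` step.

```shell
# Copy SRC to DEST with dd, then verify the copy byte for byte.
# Assumes GNU dd/coreutils (status=none, conv=fsync). On the Jetson, SRC would
# be /dev/mmcblk0p1, read only after the "echo u" step.
capture_image() {
  local src="$1" dest="$2"
  # bs=4M: fewer, larger reads than dd's 512-byte default.
  # conv=fsync: flush DEST to storage before dd exits.
  dd if="$src" of="$dest" bs=4M conv=fsync status=none || return 1
  cmp -s "$src" "$dest"
}

# Print just the SHA-256 digest of a file or device, so the value computed on
# the Jetson can be compared with the one computed on the host.
image_sha256() {
  sha256sum "$1" | cut -d ' ' -f 1
}
```

On the Jetson this would be `capture_image /dev/mmcblk0p1 /path/testimage.raw`. For the ssh or netcat variants, run `image_sha256 /dev/mmcblk0p1` on the Jetson and `image_sha256` on the received file on the host; matching digests mean the transfer was intact.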

permanent url: https://elinux.org/Jetson/AGX_Xavier_Alternative_II_For_Cloning

linuxdev · February 1, 2020, 8:56pm

The interesting thing about the sysrq method of cloning is that it avoids the worry of a disk write happening in the middle of a read. You don’t have to worry about the image being corrupted, so this is basically as good as a clone from a system in recovery mode…except that the clone is of the running state instead of the shutdown state. The “sysrq” methods are often underrated, and developers not familiar with sysrq will find them worth studying.

For those interested, see:
https://en.wikipedia.org/wiki/Magic_SysRq_key
https://www.kernel.org/doc/html/v4.11/admin-guide/sysrq.html

Note that individual sysrq functions can be enabled or disabled via a numeric mask. For example, you might want umount available but need to protect against excessive write wear (say, because the disk is an SSD in a kiosk the public has access to); in that case you could disable the sync function. To see the current mask:

cat /proc/sys/kernel/sysrq

A value of “1” allows all functions (the kernel.org URL above lists the mask bits). Not all architectures support every sysrq function, but as a developer you probably want this to be “1”. On a shipped unit where you don’t want to allow special interaction, e.g., a kiosk, you probably want this to be “0”.

To customize this, add the following line to “/etc/sysctl.conf” (after a reboot, “cat /proc/sys/kernel/sysrq” will reflect it):

kernel.sysrq=1
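The mask is a sum of bit values listed on the kernel.org page above; the two that matter for cloning are 16 (sync) and 32 (remount read-only). Below is a small bash sketch that decodes a mask value; the function names are hypothetical. Note that, per the same kernel documentation, the mask governs keyboard invocation only: writes to /proc/sysrq-trigger by root are always allowed.

```shell
# Decode a kernel.sysrq value. Bit values from the kernel sysrq documentation:
#   16 = sync, 32 = remount read-only; a value of 1 enables everything.
sysrq_key_allows() {
  local mask="$1" bit="$2"
  [ "$mask" -eq 1 ] || [ $(( mask & bit )) -ne 0 ]
}

# Are both keys used for cloning (s = sync, u = remount read-only) enabled?
# Defaults to the live value in /proc/sys/kernel/sysrq.
sysrq_clone_keys_ready() {
  local mask
  mask="${1:-$(cat /proc/sys/kernel/sysrq)}"
  sysrq_key_allows "$mask" 16 && sysrq_key_allows "$mask" 32
}
```

With no argument, `sysrq_clone_keys_ready` checks the running kernel; passing a value such as `sysrq_clone_keys_ready 48` checks a candidate mask (16 + 32) before putting it in /etc/sysctl.conf.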

A special note about the “echo u > /proc/sysrq-trigger” step from @Andrey1984: this is why you can get a clone without corruption. The system keeps running, but only in RAM…the disk itself is now mounted read-only. There is one caveat: a sysrq umount is immediate, so you could lose data that is in the process of being written. To get around that, sync before the umount:

sudo su
echo s > /proc/sysrq-trigger
echo s > /proc/sysrq-trigger
echo u > /proc/sysrq-trigger
...

The “s” calls sync, and the logs will show “Emergency Sync”. The reason for calling it twice is that although a sync starts immediately, you don’t know when it has finished. The second sync only starts running after the first completes, so its only purpose is to show that the first one finished…the second would not have run if the first had not completed. If a program is actively writing, though, more writing and caching could start in the time it takes to echo ‘u’ for umount. Even so, the odds of getting a non-corrupt image this way are extremely high.
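The kernel log can also confirm that the emergency syncs and the remount actually completed. This is a minimal bash sketch, assuming the kernel prints “Emergency Sync complete” and “Emergency Remount complete” as mainline kernels do; check the exact wording with dmesg on your L4T kernel. The function names are hypothetical, and any earlier sysrq use since boot is counted too.

```shell
# Count kernel log lines containing a fixed string. Reads stdin, so it can be
# fed from `dmesg` on the Jetson or from a saved log. The message texts used
# below are an assumption based on mainline kernels.
count_log_lines() {
  # grep -c prints 0 and exits 1 when nothing matches; keep the 0.
  grep -F -c -- "$1" || true
}

# Succeed once the log shows at least two completed emergency syncs and one
# completed emergency remount.
sysrq_sequence_done() {
  local log syncs remounts
  log=$(cat)
  syncs=$(printf '%s\n' "$log" | count_log_lines 'Emergency Sync complete')
  remounts=$(printf '%s\n' "$log" | count_log_lines 'Emergency Remount complete')
  [ "$syncs" -ge 2 ] && [ "$remounts" -ge 1 ]
}
```

After the s, s, u sequence, `dmesg | sysrq_sequence_done && echo "syncs and remount completed"` reports whether the log shows all three steps finishing.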

You do not want to sync solid state memory unless it is necessary, because it causes wear. The cache inside such a disk is designed to help with wear leveling, and each time the disk actually flushes its cache there is wear. However, the limits are rather high on modern SSD technology, and a flush has to happen on every “normal” shutdown anyway. When syncing twice in a row as an umount precaution, if no additional data was cached in the time it takes to trigger the second sync, there is nothing to flush, and the second sync causes no wear. Just don’t sync unless it is really needed, and don’t make sync available on public devices where security is a concern.

I’ve not found any JTAG debuggers that work with Jetsons, so some people doing kernel debugging might be interested in kdb and kgdb for software-based debugging of parts of the kernel. Sysrq is how you enter the right states, so if you are interested in kdb/kgdb, check out sysrq first.

You can also force dumps of information about what the kernel is currently doing. In the case of a hard lockup with no information on the serial console, or if a serial console was not available and the situation is hard to reproduce, a forced dump can gather that information.

NOTE: I like having this enabled on the development PC as well if there is a chance of something locking up the PC. Then I can sync and umount before powering off. A number of people here have had filesystem corruption in the past due to improper shutdown; sysrq is a way to avoid that even if the system is locked up. If you have a keyboard directly connected, the “echo” command is not the only way to run the sysrq commands. As an example, this syncs twice and umounts:

ALT-SYSRQ-s
ALT-SYSRQ-s
ALT-SYSRQ-u

(This means hold down the Alt key, then hold down the SysRq key…the same button as PrtScn…then tap the ‘s’ or ‘u’ key and release. Monitor “dmesg --follow” and try ALT-SYSRQ-s once just to see it.)

Andrey1984 · February 10, 2020, 8:18pm

@linuxdev, thank you for the extended explanation!
Is it right that when using the mksparse image for deployment, the target device’s JetPack version must match that of the source device the image was taken from?
If so, is that requirement lifted for raw dd cloning without mksparse conversion?
Thanks

linuxdev · February 11, 2020, 11:26pm

This is correct…other than minor bug fix releases, the version for the content on the root file system must match the JetPack used for flashing.

“mksparse” is just what I call a “poor man’s compression” for the filesystem…it records the empty regions rather than actually copying the millions of empty locations on the filesystem partition. The mksparse tool won’t care which raw rootfs release is being converted to sparse, but the release versions must still match between the raw filesystem and the flashing tool. Think of “mksparse” as something like “gzip”, except it is a one-way trip (only the system being flashed knows how to unzip…the mksparse content seems incompatible with the open source sparse image tools).
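To get a feel for the “poor man’s compression” effect, you can estimate how much of a raw clone is just runs of zeros with GNU cp’s hole support. This is a swapped-in technique, not mksparse: it produces an ordinary sparse file, not a flashable sparse image, and the numbers only approximate what a zero fill pattern saves. It is a bash sketch; the function name is hypothetical, and the temporary copy is created next to the raw file unless a directory is given, so it needs free space there for the non-zero data.

```shell
# Estimate how much of a raw image is zero runs, using GNU cp's hole support.
# NOT mksparse: the result is an ordinary sparse file, not a flashable image.
# Prints "<apparent bytes> <allocated bytes>" for the sparse copy.
sparse_estimate() {
  local raw="$1" dir="${2:-}" tmp apparent allocated
  [ -n "$dir" ] || dir=$(dirname "$raw")
  tmp=$(mktemp -p "$dir") || return 1
  # --sparse=always turns runs of zero bytes into holes in the copy;
  # --reflink=never forces a real copy so the holes are actually created.
  if ! cp --sparse=always --reflink=never "$raw" "$tmp"; then
    rm -f "$tmp"
    return 1
  fi
  apparent=$(stat -c %s "$tmp")
  # du --block-size=1 reports bytes actually allocated on disk.
  allocated=$(du --block-size=1 "$tmp" | cut -f1)
  rm -f "$tmp"
  echo "$apparent $allocated"
}
```

For example, `sparse_estimate ~/testimage.raw` prints the full partition size followed by roughly how many bytes hold real data; the difference is the empty space that does not need to be transferred.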

A “dd” clone is no different from a recovery mode clone, except there may be issues with reading a live filesystem while it is changing. The filesystem may also have different content while running (e.g., temp files) than in a recovery mode clone, which is taken from a system that has been shut down.

Both “dd” images and cloned images can be passed through mksparse, and when restored, the content will be preserved. Preserved content must be compatible with the rest of the flashed software, e.g., you can’t restore a JetPack 3.3 clone (by any method) into a JetPack 4.3 system.
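One way to check the release match before flashing is to loop-mount the raw clone read-only and compare its /etc/nv_tegra_release with the copy in the flashing tree. This is a bash sketch under two assumptions you should verify on your release: that the first line of nv_tegra_release has the R32-style format shown in the comment, and that the flashing tree’s copy is at Linux_for_Tegra/rootfs/etc/nv_tegra_release. It compares the full revision, so it is stricter than the “minor bug fix releases” allowance above. The function names are hypothetical.

```shell
# Extract "R<major>.<revision>" from an nv_tegra_release file whose first line
# looks like: "# R32 (release), REVISION: 4.3, GCID: ..., BOARD: ..."
l4t_release() {
  sed -n '1s/^# R\([0-9]*\) (release), REVISION: \([0-9.]*\),.*/R\1.\2/p' "$1"
}

# Succeed only if both files yield the same, non-empty release string.
same_l4t_release() {
  local a b
  a=$(l4t_release "$1")
  b=$(l4t_release "$2")
  [ -n "$a" ] && [ "$a" = "$b" ]
}
```

Usage on the host: `sudo mount -o loop,ro my_backup.img.raw /mnt`, then `same_l4t_release /mnt/etc/nv_tegra_release rootfs/etc/nv_tegra_release && echo "releases match"` from Linux_for_Tegra, and finally `sudo umount /mnt`.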

Andrey1984 · February 12, 2020, 3:57am

Hi @linuxdev,
Thank you for your response.

Could you remind me of the steps for the recovery mode clone, please?

linuxdev · February 12, 2020, 9:57pm

This depends on the release, but for current releases, with the Jetson in recovery mode and the USB-C cable connected:

sudo ./flash.sh -r -k APP -G my_backup.img jetson-xavier mmcblk0p1

Make sure you keep the “my_backup.img.raw” file (you get both the raw and sparse files) if you ever want to use the image in any way other than flashing. I throw away the sparse file (you can always use mksparse with “0” fill if you want a sparse from the raw, but you cannot do the reverse since open source sparse tools do not work on this).

Jaime_Element · March 11, 2020, 7:23pm

Hey guys, thanks for the information here; I had been looking for this solution for a while. This should also work on the TX2, correct? We currently have a slightly different procedure, but if we can use this for both, it would be good to standardize it in our workflow.

linuxdev · March 11, 2020, 11:44pm

This should work on any of the eMMC Jetsons. In some cases there are minor differences between major release versions, but what you see above should be valid for any of the R32.x releases.

When working with clones, always use a rootfs image only from the same JetPack/SDKM release that is doing the flashing. Also beware that other binary partitions may have signing steps that prevent a dd copy from working when flashed directly to the Jetson.

koja.gafur · September 9, 2020, 4:05pm

Hi, sorry for my bad English.
Could you describe the exact steps for cloning the eMMC image of a Jetson Xavier?

Andrey1984 · September 9, 2020, 4:18pm

Are you looking to clone the full eMMC, or just the APP partition?
Is it for storing as a backup archive, or to transfer to another device?
Typically, either dd or the flash.sh method works for cloning.
Steps: Jetson/AGX Xavier Alternative II For Cloning (eLinux.org)
Also, please write in English so that everyone can benefit, as this is an English-speaking forum.

linuxdev · September 9, 2020, 5:04pm

See:
https://translate.google.com/

Translated:
I work in a hospital, and a Jetson AGX does body-position analytics for a tomograph. It may start to freeze under the load, so we bought a new one, but now we need to copy the system image from the old unit to the new one, probably the whole memory image; there is a lot of it in total.

To restore from a clone, you must flash it with the same L4T release that originally created the image (the image carries that release version). In most L4T releases a clone can be extracted via:
sudo ./flash.sh -r -k APP -G my_backup.img jetson-xavier mmcblk0p1
(the Xavier would need to be connected in recovery mode)

The result would be two clones: “my_backup.img”, a “sparse” clone which can be thrown away if you want to do anything more than simply flashing it, and also a “raw” clone, “my_backup.img.raw”. This latter “raw” clone is quite useful.

The clone is the actual operating system and does not include the many smaller partitions, but those partitions are used for boot and are standardized (this might be modified for a custom carrier board). Because of signing, you may find there is no purpose in cloning more than the rootfs (and “my_backup.img.raw” is the rootfs).

Andrey1984 · September 9, 2020, 8:29pm

Another experimental method:

Boot the Jetson.
Attach a microSD card, external USB disk, or NVMe drive for the backup.
Run the steps below on the Jetson:

# stop disks (remount all filesystems read-only)
sudo su
echo u > /proc/sysrq-trigger
# remount the microSD card writable (it must already be mounted)
mount --options remount,rw /dev/mmcblk1p1
dd if=/dev/mmcblk0p1 of=/path/to/microsd-mountpoint/testimage.raw

reboot
At this point, you have a clone that can later be restored to the new device.
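Before running dd to removable storage, check that the destination has room for the whole partition. Also note that a FAT32-formatted card cannot hold a single file larger than 4 GiB, so a full APP image needs a filesystem such as ext4 or exFAT. This is a bash sketch; the function name is hypothetical. The partition size would come from `blockdev --getsize64 /dev/mmcblk0p1` (run as root).

```shell
# Succeed if DIR's filesystem has at least NEED bytes free.
has_room_for() {
  local need="$1" dir="$2" avail
  # df --output=avail -B1 prints a header line, then the free bytes.
  avail=$(df --output=avail -B1 "$dir" | tail -n 1 | tr -d ' ')
  [ "$avail" -ge "$need" ]
}
```

For example: `has_room_for "$(blockdev --getsize64 /dev/mmcblk0p1)" /path/to/microsd-mountpoint || echo "not enough space"`.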

koja.gafur · September 10, 2020, 1:49am

Thanks! How do I restore a clone on another device?

Andrey1984 · September 10, 2020, 3:01am

It depends on whether the flash.sh method or the experimental dd method was used.

koja.gafur · September 10, 2020, 3:07am

I don’t understand; it’s just the same, only a new device.

Andrey1984 · September 10, 2020, 10:43am

I suggest running this on the old Jetson:

# stop disks
sudo su
echo u > /proc/sysrq-trigger
# remount the microSD card/USB drive writable
mount --options remount,rw /dev/<usb-or-sd-partition> # e.g. sda1, or mmcblk1p1 for the SD card
dd if=/dev/mmcblk0 of=/path/to/file-on-usb-device

Then run this on the new Jetson:

sudo su
echo u > /proc/sysrq-trigger
# remount the eMMC writable:
mount --options remount,rw /dev/mmcblk0
# then overwrite the eMMC with:

dd if=/path-to-the-raw-file of=/dev/mmcblk0
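After the dd on the new Jetson, you can compare the target against the image over the image’s length (the eMMC may be larger than the file). This is a minimal bash sketch, and the function name is hypothetical. Run `sync` first. Because reads may be served from the page cache, a match confirms the data written through the block device rather than guaranteeing what is on the flash media.

```shell
# Compare the first N bytes of TARGET with IMAGE, N = IMAGE's size, since
# the target device may be larger than the image file.
verify_written() {
  local img="$1" target="$2" size
  size=$(stat -c %s "$img") || return 1
  cmp -s -n "$size" "$img" "$target"
}
```

For example: `sync; verify_written /path-to-the-raw-file /dev/mmcblk0 && echo "eMMC matches image"`.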

Otherwise, use the method proposed by @linuxdev.

koja.gafur · October 21, 2020, 9:05am

After restoring the image (dd if=/media/image.raw of=/dev/mmcblk0), the Jetson AGX Xavier does not boot.

koja.gafur · October 21, 2020, 9:12am

Can you tell me step by step how to do this?