How would you set up a CI/CD (Continuous Integration/Deployment) on a Jetson TX2

Dear Community,

We are a Switzerland-based start-up building a product to measure city cleanliness. Each unit is based on a set of three Jetson TX2 modules.

We have successfully validated both a prototype and a beta version of the unit, which now raises the big question of production quality and maintenance. Indeed, each unit we deliver to a client contains three Jetsons that need to be maintained and updated remotely and as continuously as possible.

I have seen two interesting posts touching on these topics:

My question would be: “Has anyone set up a continuous deployment pipeline on a Jetson TX2? And how?”

I am also curious to get details on which parts of the system you would continuously integrate and deploy, e.g.:

  • the application itself, of course, which happens to be in Python in our case
  • updates of the core libraries, e.g. TensorFlow, CUDA
  • updates of Python itself
  • updates of the OS, which is the part that would not be done remotely from what I read, since it requires flashing

Thanks,

Emmanuel
-Cortexia.ch

Hi,

Are you looking for an OTA solution for TX2?

You need to reflash the OS for any update of the CUDA-related packages (e.g. CUDA, cuDNN, TensorFlow, …) due to the GPU driver.
As a result, it’s recommended to fix the library versions for your product.

Thanks.

Hi,

An Over-The-Air solution would be a good starting point indeed, but I understand that reflashing the OS is required, and that goes through the USB cable.

What I am really looking for is experience in automatic upgrade / deployment.

Cheers,
Emmanuel

Hi,

Some software updates also require system re-flashing.

Our GPU driver is embedded in the Jetson OS (L4T).
Once the CUDA package is updated, you will need a newer driver to avoid incompatibility errors.

It’s still recommended to fix (pin) the library/software versions for your product.
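
For example, one common way to pin package versions on Ubuntu/L4T is “apt-mark hold”. This is just a sketch; which packages you hold depends on what your JetPack release actually ships:

# List CUDA/cuDNN related packages currently installed (names vary by JetPack release)
dpkg -l | grep -Ei 'cuda|cudnn'

# Hold the packages you depend on so "apt-get upgrade" will not touch them
sudo apt-mark hold <package-name>

# Show which packages are currently held
apt-mark showhold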

Thanks.

This is not an answer, but something to think about. I recommend you categorize all possible updates you might intend to perform. This can be updated and details added later if you mention each upgrade you are interested in. It is a starting point.


Along those lines, disable any annoying messages about dist upgrades. Edit the file “/etc/update-manager/release-upgrades”. Note this line:

Prompt=lts

You never ever want to use the Ubuntu mechanism for do-release-upgrade. If you change it to this, then it’ll only prompt for ordinary package updates for console users:

Prompt=normal

If you don’t want command line logins to see any kind of available-updates check at all:

Prompt=never

Comment: Having automatic checks run when the network might be down implies this should be “never”, and then you would only manually run something like “sudo apt update; sudo apt-get upgrade”. If this goes to an end customer and you want to be the only one doing upgrades, or if you have some schedule or routine in place, this should also be “never”.
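
If you script your provisioning, a minimal sketch of setting this non-interactively (assuming the file already contains a “Prompt=” line) could be:

# Disable release-upgrade prompts entirely (use "normal" instead if you want ordinary package prompts)
sudo sed -i 's/^Prompt=.*/Prompt=never/' /etc/update-manager/release-upgrades

# Confirm the change
grep '^Prompt=' /etc/update-manager/release-upgrades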


Any upgrade which has an effect on the boot environment needs to be immediately marked “dangerous” to people who might be doing an upgrade. “sudo apt update” and “sudo apt-get upgrade” are not dangerous, but still need some comment.

Before you do any kind of apt operation which will actually update (as opposed to merely observing) you should first verify the NVIDIA-specific files are in place:

sha1sum -c /etc/nv_tegra_release

…correct any failures before the apt operation. Then do your apt operations, and once again check “sha1sum -c /etc/nv_tegra_release”. Don’t reboot until you’ve checked this.
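
As a sketch only (not an official procedure), a small wrapper that refuses to upgrade unless the NVIDIA files verify, and re-checks them afterwards, might look like this:

#!/bin/bash
# Hypothetical upgrade wrapper: verify NVIDIA files, upgrade, verify again before any reboot.
set -e

echo "Checking NVIDIA-specific files before upgrade..."
sha1sum -c /etc/nv_tegra_release

sudo apt update
sudo apt-get upgrade -y

echo "Re-checking NVIDIA-specific files after upgrade..."
sha1sum -c /etc/nv_tegra_release || echo "WARNING: correct the listed files before rebooting."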

There is only one place where this really matters in practice. See what is listed from:

# grep libglx /etc/nv_tegra_release
28dac9361b6fca4c80b4d33450c07d6567fda1e2 */usr/lib/xorg/modules/extensions/libglx.so
28dac9361b6fca4c80b4d33450c07d6567fda1e2 */usr/lib/aarch64-linux-gnu/tegra/libglx.so

The only failure you might get, and one which is very rare (other than perhaps on the first apt-get upgrade after a flash), is this file:

/usr/lib/xorg/modules/extensions/libglx.so

…and note that this is the same file in a different location, and is always safe during any apt operation:

/usr/lib/aarch64-linux-gnu/tegra/libglx.so

You can fix this simply by copying the aarch64 version into the xorg module location.
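
A sketch of that copy (paths as listed above; re-run the checksum verification afterwards):

sudo cp /usr/lib/aarch64-linux-gnu/tegra/libglx.so /usr/lib/xorg/modules/extensions/libglx.so
sha1sum -c /etc/nv_tegra_release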

If this file is wrong, then GUI login will break. The rest of the system will be ok, so ssh and CTRL-ALT-F2 will be available to fix this.

FYI, if you look at the driver package used by a PC to flash a Jetson (the content of the “Linux_for_Tegra/” subdirectory, present either from the driver package download or from JetPack, which is a front end to the driver package), then the “Linux_for_Tegra/nv_tegra/” directory archive “nvidia_drivers.tbz2” is that content (the “apply_binaries.sh” step unpacks this). If you are ever in doubt, then you could copy this file to “/” of the Jetson and unpack it as root to put everything named in “/etc/nv_tegra_release” back in place.
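
As a sketch, assuming you have copied “nvidia_drivers.tbz2” from the host’s “Linux_for_Tegra/nv_tegra/” directory to the Jetson (the source path below is just an example):

# On the Jetson: unpack over "/" as root, refresh the linker cache, then verify
cd /
sudo tar xpjf /path/to/nvidia_drivers.tbz2
sudo ldconfig
sha1sum -c /etc/nv_tegra_release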


Never mix and match between two L4T releases. If in doubt, see the current “head -n 1 /etc/nv_tegra_release”. Any kind of scripted upgrade which is L4T version-dependent should check this. Mixing anything between two incompatible releases will cause failure.
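
A scripted upgrade could guard on that string; here is a minimal sketch (the expected value shown is only an example, substitute your actual release line):

#!/bin/bash
# Hypothetical guard: abort if this board is not on the L4T release the update was built for.
EXPECTED="# R28 (release), REVISION: 2.1"   # example only; use your real release string

CURRENT="$(head -n 1 /etc/nv_tegra_release)"
if [[ "$CURRENT" != "$EXPECTED"* ]]; then
    echo "L4T release mismatch: found '$CURRENT', expected '$EXPECTED'. Aborting." >&2
    exit 1
fi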


Categorize your updates to one of:

  • Rootfs and regular package updates.
  • Device tree updates.
  • Boot environment updates. Note: In older releases the device tree was independent of the boot environment. In semi-recent releases a device tree update is a boot environment update. You should consider a device tree update to be in both the boot and device tree categories.
  • Kernel update. Note: The base kernel Image is part of the boot environment, but since modules are not depended upon for boot, modules are not really part of the boot environment unless you've customized things.

Consider having a reference clone. Update a test TX2, and if it succeeds, then either clone it or rsync to an existing clone. Then do your update.
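
For reference, a clone of a TX2’s root filesystem (APP) partition is normally made from the host PC with the Jetson in recovery mode. A sketch of the usual command, run from the “Linux_for_Tegra/” directory of the matching L4T release (the image name is just an example):

# On the host PC, with the TX2 connected over USB and in recovery mode:
sudo ./flash.sh -r -k APP -G my_reference_clone.img jetson-tx2 mmcblk0p1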


Never mix a clone from one release with another release.


If you update the rootfs, then you can more or less use any mechanism published by Ubuntu to do the update, so long as it doesn’t touch the kernel or boot structure. “rsync” is something you can use.
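
As a rough sketch of the rsync approach, assuming a raw (loop-mountable) clone image, root SSH access to a known-good test unit, and excludes that leave the boot files and pseudo-filesystems alone (all names here are examples):

# Refresh a reference clone from a known-good, already-updated test unit.
sudo mount -o loop my_reference_clone.img.raw /mnt/clone
sudo rsync -aAX --delete \
    --exclude='/boot' --exclude='/proc/*' --exclude='/sys/*' \
    --exclude='/dev/*' --exclude='/run/*' --exclude='/tmp/*' \
    root@test-tx2:/ /mnt/clone/
sudo umount /mnt/clone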


Be careful before any clone restore to examine the content of the clone’s “/etc/udev/rules.d/”. There are times when, for example, an Ethernet MAC address is added directly to a udev rule as a way of renaming a specific interface. Recent releases don’t do this, but if there is a customization which is specific to a serial number or MAC address that differs from one physical Jetson to another, then you’ll have to account for that or remove the rule.

A second example for care with cloning is that the password and other files related to login will be from whatever you cloned.

Related to this, consider that host SSH keys and individual login SSH keys are also cloned…restoring with those may not be what you want.

Consider preserving any WiFi setup information.


As per the last statement, you might find it useful to note down certain information before a risky operation…or even upon first deployment. This is especially useful on any dev Jetson. A suggested list of things to save a copy of (a sketch of a script that collects these follows the list):

  • ifconfig
  • route
  • sudo gdisk -l /dev/mmcblk0
  • uname -r
  • A copy of "/proc/config.gz".
  • A copy of the current running device tree:
    dtc -I fs -O dts -o extracted.dts /proc/device-tree
    
  • A copy of "/proc/cmdline". Note: This is mostly if you need to debug. The "Chosen" of the device tree is mostly how this is created, although "/boot/extlinux/extlinux.conf" can take part. "extlinux.conf" is not used for most changes these days.
  • A recursive copy with preserved file permissions for "/etc/ssh/".
  • Make sure no standard account (meaning "ubuntu" and "nvidia") has a default password if this ever touches any public network for even a few minutes.
  • Output of "head -n 1 /etc/nv_tegra_release".
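
A rough sketch of a script that gathers the items above into a dated directory (the destination path is just an example, and it assumes the “dtc” tool from the device-tree-compiler package is installed):

#!/bin/bash
# Hypothetical snapshot script: save key system information before a risky operation.
OUT="/home/nvidia/sysinfo-$(date +%Y%m%d-%H%M%S)"   # example destination
mkdir -p "$OUT"

ifconfig                        > "$OUT/ifconfig.txt"
route -n                        > "$OUT/route.txt"
sudo gdisk -l /dev/mmcblk0      > "$OUT/gdisk.txt"
uname -r                        > "$OUT/uname.txt"
cp /proc/config.gz                "$OUT/"
dtc -I fs -O dts -o "$OUT/extracted.dts" /proc/device-tree
cat /proc/cmdline               > "$OUT/cmdline.txt"
sudo cp -rp /etc/ssh              "$OUT/etc_ssh"
head -n 1 /etc/nv_tegra_release > "$OUT/nv_tegra_release.txt"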

Be especially aware that recent L4T releases sign device trees. “dd” can put these in place, but if the signature is wrong, then reboot will fail. Along those lines, even if you do have a correct signature, be absolutely certain the partition itself is large enough to hold your edited and signed version of a “dd” emplaced device tree.

I recently dived into this field and didn’t think it would have so many variants. After exploring some of the best CI/CD tools, I noticed that Continuous Delivery and Continuous Deployment are often treated as two terms for the same process. However, there is a significant difference: Continuous Delivery refers to delivering code to the testing teams, while Continuous Deployment refers to deploying code to the production environment.