sdkmanager 0.9.9 flash AGX failed

Hi,

I encountered the following error when flashing the Xavier AGX board with the new DRIVE Software 8.0 using sdkmanager 0.9.9 beta.
The following section of the log shows the reported error. The full log is attached.
Could you help?

... ... 
2019-02-01 10:27:37.601 - info: ./bootburn.sh -b e3550b03-t194a -B qspi -x /dev/ttyUSB3
2019-02-01 10:27:38.886 - info: Execute Xavier Script
2019-02-01 10:27:38.886 - info: Successfully acquired lock over /var/lock/LCK..bootburn
2019-02-01 10:27:38.886 - info: Successfully acquired lock over /var/lock/LCK..ttyUSB3
2019-02-01 10:27:38.886 - info: Read skuinfo from InfoRom...
[b]2019-02-01 10:27:40.745 - error: /mnt/home2/davyhuang/work/nvidia/nvidia_sdk/DRIVE/Linux/5.0.13.2/SW/DriveSDK/drive-t186ref-foundation/tools/host/flashtools/bootburn/../bootburn_t19x//bootburn.sh: line 774: array: bad array subscript
2019-02-01 10:27:41.693 - error: /mnt/home2/davyhuang/work/nvidia/nvidia_sdk/DRIVE/Linux/5.0.13.2/SW/DriveSDK/drive-t186ref-foundation/tools/host/flashtools/bootburn/../bootburn_t19x//bootburn.sh: line 774: array: bad array subscript[/b]
2019-02-01 10:27:41.693 - info: Disabling SIGINT <Ctrl+C> temporarily
2019-02-01 10:27:44.104 - info: Setting Tegra-A on hold... Done
2019-02-01 10:27:46.795 - info: Setting Tegra-B on hold... Done
2019-02-01 10:27:52.343 - info: Setting Tegra-A in recovery... Done
2019-02-01 10:27:58.092 - info: Setting Tegra-B in recovery... Done
2019-02-01 10:27:58.093 - info: Enabling SIGINT <Ctrl+C>
2019-02-01 10:28:03.257 - info:
2019-02-01 10:28:03.258 - info:  ------------ Stack Trace ------------
2019-02-01 10:28:03.258 - info: stack frame 0 - 287 AbnormalTermination /mnt/home2/davyhuang/work/nvidia/nvidia_sdk/DRIVE/Linux/5.0.13.2/SW/DriveSDK/drive-t186ref-foundation/tools/host/flashtools/bootburn_t19x/bootburn_lib.sh
2019-02-01 10:28:04.148 - info: stack frame 1 - 1084 GetTargetECID /mnt/home2/davyhuang/work/nvidia/nvidia_sdk/DRIVE/Linux/5.0.13.2/SW/DriveSDK/drive-t186ref-foundation/tools/host/flashtools/bootburn_t19x/bootburn_lib.sh
2019-02-01 10:28:04.148 - info: stack frame 2 - 340 source /mnt/home2/davyhuang/work/nvidia/nvidia_sdk/DRIVE/Linux/5.0.13.2/SW/DriveSDK/drive-t186ref-foundation/tools/host/flashtools/bootburn_t19x/bootburn_active.sh
2019-02-01 10:28:04.148 - info: stack frame 3 - 1014 main /mnt/home2/davyhuang/work/nvidia/nvidia_sdk/DRIVE/Linux/5.0.13.2/SW/DriveSDK/drive-t186ref-foundation/tools/host/flashtools/bootburn/../bootburn_t19x//bootburn.sh
... ...

SDKM_logs_DRIVE_Software_8.0_Linux_for_DRIVE_AGX_Developer_Kit_02012019_1021.zip (530 KB)

Dear davyhuang,
It looks like you were able to install successfully on the host. Could you confirm that you have followed the requirements in https://developer.download.nvidia.com/sdkmanager/secure/clients/sdkmanager-0.9.9.2351/SDKManagerUserGuide.pdf?nc6YkjJdbkf0j9HvVfCWDQXMDNiTyO3ICseNsvHHQmoz2oAvkU2Agffhx9TMWnoSXIkCS5YEisoVKdsow45Xc3vHNFxCkXMsL78odJQSYD1PQmIGrEBEO3UdBBFVfCZZEs1MZ2-qOXyv5TZkgxcnAfEP22112khyHw7UKUl-uwyEJ9OLKns and that you don’t have internet/proxy issues?

Hi SivaRama,

Yes, I followed those requirements. The host-side installation and most of the target-side installation completed OK (so the Internet connection shouldn’t be the problem).

I reran the process. This time it got further but failed at the highlighted lines below during the “Flash” step. The failure looks related to AbnormalTermination and “Failed to get UID of chip”.

A new full log is attached for your reference.

Thanks for the help !

===============================

...
2019-02-04 11:41:08.482 - info: Generating Flashing-RCM Images
2019-02-04 11:41:12.778 - info: Sending bct and prerequisite binaries
2019-02-04 11:41:12.905 - info: ==> /mnt/home2/davyhuang/work/nvidia/nvidia_sdk/DRIVE/Linux/5.0.13.2/SW/DriveSDK/drive-t186ref-foundation/tools/host/flashtools/bootburn_t19x/_temp_dump/TegraALog_i23oJM0ARq.txt <==
2019-02-04 11:41:12.906 - info:
2019-02-04 11:41:12.906 - info:  ------------ Stack Trace ------------
<b>2019-02-04 11:41:12.906 - info: stack frame 0 - 287 AbnormalTermination /mnt/home2/davyhuang/work/nvidia/nvidia_sdk/DRIVE/Linux/5.0.13.2/SW/DriveSDK/drive-t186ref-foundation/tools/host/flashtools/bootburn_t19x/bootburn_lib.sh</b>
2019-02-04 11:41:12.906 - info: stack frame 1 - 1084 GetTargetECID /mnt/home2/davyhuang/work/nvidia/nvidia_sdk/DRIVE/Linux/5.0.13.2/SW/DriveSDK/drive-t186ref-foundation/tools/host/flashtools/bootburn_t19x/bootburn_lib.sh
2019-02-04 11:41:12.906 - info: stack frame 2 - 340 source /mnt/home2/davyhuang/work/nvidia/nvidia_sdk/DRIVE/Linux/5.0.13.2/SW/DriveSDK/drive-t186ref-foundation/tools/host/flashtools/bootburn_t19x/bootburn_active.sh
2019-02-04 11:41:12.907 - info: stack frame 3 - 1014 main /mnt/home2/davyhuang/work/nvidia/nvidia_sdk/DRIVE/Linux/5.0.13.2/SW/DriveSDK/drive-t186ref-foundation/tools/host/flashtools/bootburn/../bootburn_t19x//bootburn.sh
2019-02-04 11:41:12.907 - info: -------------------------------------
2019-02-04 11:41:12.907 - info:
[b]2019-02-04 11:41:12.908 - info: /mnt/home2/davyhuang/work/nvidia/nvidia_sdk/DRIVE/Linux/5.0.13.2/SW/DriveSDK/drive-t186ref-foundation/tools/host/flashtools/bootburn_t19x/bootburn_lib.sh: line 291: [: too many arguments
2019-02-04 11:41:12.908 - info: error-tool-tegrarcm-chipinfo -- Failed to get UID of chip[/b]
2019-02-04 11:41:12.908 - info:
...
...
2019-02-04 11:41:13.085 - info: Generating Flashing-RCM Images
2019-02-04 11:41:13.085 - info: Sending bct and prerequisite binaries
2019-02-04 11:41:13.085 - info: bootburn flashing failed! error code = 50
2019-02-04 11:41:13.085 - info: exit status 50
2019-02-04 11:41:13.085 - info: [ Component Install Finished with Error ]
2019-02-04 11:41:13.085 - error: NV_FLASH_XAVIER_PDKFLASH_PARALLEL_COMP command ./pdk_flash.sh -b e3550b03-t194-es finished with error
2019-02-04 11:41:13.085 - info:
2019-02-04 11:41:13.085 - info: [ 628.00 KB used. Disk Avail: 79.98 GB ]
2019-02-04 11:41:13.085 - info: [ NV_FLASH_XAVIER_PDKFLASH_PARALLEL_COMP Install took 39s ]
2019-02-04 11:41:13.085 - info: command ./pdk_flash.sh -b e3550b03-t194-es finished with error
2019-02-04 11:41:13.085 - info: cmd finished failure SDKM_END_CODE_FAILURE_149e4fc3-0ae9-40c3-b0d1-c2449d461e33
2019-02-04 11:41:13.085 - error: command terminated with error

SDKM_logs_DRIVE_Software_8.0_Linux_for_DRIVE_AGX_Developer_Kit_02042019_1140.zip (527 KB)
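
The `[: too many arguments` line in the log above is the message bash prints when a `[ ... ]` test receives more words than it expects, which typically happens when an unquoted variable expands to several words. The sketch below only reproduces that class of error. The variable `uid` and its value are made up for illustration and are not taken from bootburn_lib.sh.

```shell
#!/usr/bin/env bash
# Illustration only: 'uid' and its value are hypothetical, not from bootburn_lib.sh.
uid="two words"

# Unquoted: the shell splits $uid into two words, so '[' receives
# 'two words = x' (four arguments) and reports an error on stderr.
unquoted_err=$( { [ $uid = "x" ]; } 2>&1 )
echo "unquoted test printed: $unquoted_err"

# Quoted: '[' receives exactly three arguments and evaluates normally.
if [ "$uid" = "x" ]; then
    quoted_result="equal"
else
    quoted_result="not equal"
fi
echo "quoted test result: $quoted_result"
```

If the script really hits this case, the value it tried to compare (possibly the chip UID output) may have been empty or multi-word, which would be consistent with the "Failed to get UID of chip" line that follows, though the log alone does not prove it.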

Dear Davyhuang,
I will look into this issue and get back to you. Could you share the output of deviceQuery (/usr/local/cuda/samples/1_Utilities/deviceQuery/deviceQuery)?

Dear Davyhuang,
I can see `exec_command: [ -e /dev/ttyUSB3 ] || echo Failed to flash device: detected no board connected to your host with USB cable. >&2` at line 5 of NV_FLASH_XAVIER_PDKFLASH_A_COMP.log. Could you confirm that you have connected the debug USB port of the board to the host with a USB A-A cable?
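
The check quoted from the log can be run by hand before starting a flash. This is a minimal sketch that wraps the same `[ -e ... ]` test in a function. The function name `check_serial_device` is hypothetical and not part of SDK Manager, and /dev/ttyUSB3 is simply the port used in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: mirrors the '[ -e /dev/ttyUSB3 ]' test seen in the log.
check_serial_device() {
    dev="$1"
    if [ -e "$dev" ]; then
        echo "found $dev"
        return 0
    fi
    # Same wording as the message in NV_FLASH_XAVIER_PDKFLASH_A_COMP.log.
    echo "Failed to flash device: detected no board connected to your host with USB cable." >&2
    return 1
}

# Usage (adjust the port for your host):
#   check_serial_device /dev/ttyUSB3
```

Note that this only shows the device node exists on the host; it does not show that the board answers on that port.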

./deviceQuery Starting…

CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 35
→ CUDA driver version is insufficient for CUDA runtime version
Result = FAIL

Please note that my host Ubuntu PC doesn’t have an NVIDIA GPU card installed. It has Intel onboard graphics.

Yes, a USB cable is connected. I was able to use this cable to install/flash DRIVE SW 1.0 a few months ago. I can also use minicom --device /dev/ttyUSB3 to access the AURIX terminal shell, and it’s still responsive.

Does it help if you run ‘aurixreset’ in minicom (and then disconnect) before kicking off the sdkmanager?

Hi,

we are aware of this issue and are working on it intensively. We will come back to you as soon as we have a fix for it.

Fabian

I tried issuing ‘aurixreset’ in the AURIX shell, waited until Tegra rebooted, exited minicom, then retried sdkmanager. It still failed the same way.

Note that I also set the MCU switch to “PRG” instead of “RUN” before I ran sdkmanager - this was successful before.

Hi,

as I said, we are aware of this issue.

For now you are only able to flash your target once, then you would need to wipe your host / use a different host / use a clean VM to flash again - those are the only options for now.

Solution is in progress.

Fabian

Dear FabianWeise,

I have a similar problem when flashing the AGX Xavier using sdkmanager Beta 0.9.9.2351.

The flashing failed and SDK Manager displayed an error message:
“Flash operation timeout exceeded”
I have tried flashing multiple times, but the same error occurs each time.

The displayed message and the full log are attached.

I would be grateful if you could tell us what we should check.
Any information would help greatly.

Thanks.


SDKM_logs_DRIVE_Software_8.0_Linux_for_DRIVE_AGX_Developer_Kit_02062019_0959.zip (652 KB)

Hi guys,

please turn your DRIVE AGX platforms over and find the one or two white labels.

We need a photo of these, so please be so kind as to provide one.

Fabian

Mine is attached. Thanks !
AGX_Lable.pdf (12.7 MB)

Davy,

from a brand new host, can you try to reproduce the issue again and give me the steps to follow?

Fabian

A picture of the two white labels on the AGX is attached.

Thanks.

Thanks for the picture.

We have staffed a whole team with some of our best engineers working on your issue right now. Please be patient and we will come back as soon as we have it fixed.

Fabian

Hi Davy,
Hi Atsutaka,

  1. Please flash each Xavier individually and send us all the logs:
     > ./bootburn.sh -b e3550b03-t194a -B qspi -x /dev/ttyUSB3
     > ./bootburn.sh -b e3550b03-t194b -B qspi -x /dev/ttyUSB3
  2. What is your HW configuration (OS, `uname -a`, etc.)?
  3. During failure, what does `dmesg` output?
  4. How consistently and on how many boards does your issue occur?
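
The host details requested in items 2 and 3 can be gathered into one file to attach to a reply. A minimal sketch, assuming a Linux host; the function name `collect_host_info` and the output file name are hypothetical, and `dmesg` may need root on hosts where kernel log access is restricted.

```shell
#!/usr/bin/env bash
# Hypothetical helper: writes kernel, OS and recent kernel-log info to one file.
collect_host_info() {
    out="$1"
    {
        echo "== uname -a =="
        uname -a
        echo "== OS release =="
        if [ -r /etc/os-release ]; then
            cat /etc/os-release
        else
            echo "(no /etc/os-release on this host)"
        fi
        echo "== last 50 lines of dmesg =="
        # If dmesg is restricted or missing, its error text is recorded instead.
        dmesg 2>&1 | tail -n 50
    } > "$out"
}

# Usage, right after a failed flash attempt:
#   collect_host_info host_info.log
```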

Here is a workaround that I would ask you to try. We suspect that your host kernel, ‘linux 4.15.0-45-generic’, is causing the problem, so downgrading may ‘hot-fix’ your issue:

  1. Downgrade your Linux host kernel to "linux 4.15.0-32"
  2. Reboot and try again.
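
Before downgrading, it may help to confirm which kernel the host is actually running. A minimal sketch, assuming the only release of interest is the one named above (4.15.0-45-generic); the function name `kernel_is_suspect` is hypothetical, and the check says nothing about whether other kernel releases are affected.

```shell
#!/usr/bin/env bash
# Release string taken from the post above; treat it as an assumption, not a confirmed cause.
SUSPECT_KERNEL="4.15.0-45-generic"

# Hypothetical helper: succeeds only if the given release matches the suspect one.
kernel_is_suspect() {
    [ "$1" = "$SUSPECT_KERNEL" ]
}

running="$(uname -r)"
if kernel_is_suspect "$running"; then
    echo "Host runs $running: the downgrade workaround applies."
else
    echo "Host runs $running: not the kernel named in the workaround."
fi
```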

Many thanks in advance.

Fabian

Hi FabianWeise,

My HW configuration is:

  • OS: Ubuntu 16.04.5 LTS
  • uname -a: 4.13.0-36-generic #40~16.04.1-Ubuntu SMP Fri Feb 16 23:25:58 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

I will try the steps you mentioned above soon!

Hi Fabian,

It looks like I have made some progress on this issue:

  1. I first flashed Tegra A as you instructed. It failed the same way as before. Then I ran uname and dmesg. Please see the attached log “tegra1.log”.
  2. Next I flashed Tegra B. I believe it passed OK. Please see log “tegra2.log” attached.
  3. Next I repeated step 1 to flash Tegra A again. I believe it also passed OK. Please see log “tegra1_again.log”.

Later, I went back and ran sdkmanager again. It still fails the same way regardless of whether I flash A+B in parallel or flash B first.

Can you please confirm that everything looks OK in the attached logs?

To answer your other question: I’ve only tested this on one board.

Thanks,
tegra1.log (112 KB)
tegra2.log (23.6 KB)
tegra1_again.log (24.6 KB)