Is the available global memory on the TX2 less than on the TX1?

Hello.

I am interested in the memory addressing scheme used on the TX2.
A question came up while I was looking at the CUDA device information reported by a device query: the Total Global Memory value.
The TX2 has twice as much memory as the TX1 (8 GB vs. 4 GB), yet the query below reports even less total global memory on the TX2 than on the TX1. Why does it show this output?

I flashed the TX2 with JetPack 3.0 downloaded from the NVIDIA website.

I would like to be able to use the full 8 GB of memory on the TX2, because the project I am working on needs a large amount of memory. Does anyone have any ideas?

TX1

root@tegra-ubuntu:~/test# ./query
CUDA Device Query...
There are 1 CUDA devices.

CUDA Device #0
Major revision number:         5
Minor revision number:         3
Name:                          NVIDIA Tegra X1
Total global memory:           4188778496
Total shared memory per block: 49152
Total registers per block:     32768
Warp size:                     32
Maximum memory pitch:          2147483647
Maximum threads per block:     1024
Maximum dimension 0 of block:  1024
Maximum dimension 1 of block:  1024
Maximum dimension 2 of block:  64
Maximum dimension 0 of grid:   2147483647
Maximum dimension 1 of grid:   65535
Maximum dimension 2 of grid:   65535
Clock rate:                    998400
Total constant memory:         65536
Texture alignment:             512
Concurrent copy and execution: Yes
Number of multiprocessors:     2
Kernel execution timeout:      Yes

Press any key to exit...

TX2

root@tegra-ubuntu:~/test# ./query
CUDA Device Query...
There are 1 CUDA devices.

CUDA Device #0
Major revision number:         6
Minor revision number:         2
Name:                          GP10B
Total global memory:           3940433920
Total shared memory per block: 49152
Total registers per block:     32768
Warp size:                     32
Maximum memory pitch:          2147483647
Maximum threads per block:     1024
Maximum dimension 0 of block:  1024
Maximum dimension 1 of block:  1024
Maximum dimension 2 of block:  64
Maximum dimension 0 of grid:   2147483647
Maximum dimension 1 of grid:   65535
Maximum dimension 2 of grid:   65535
Clock rate:                    1300500
Total constant memory:         65536
Texture alignment:             512
Concurrent copy and execution: Yes
Number of multiprocessors:     2
Kernel execution timeout:      No

Press any key to exit...

P.S. The different "Kernel execution timeout" setting on the TX2 is intentional.

Hi,

Did you run our CUDA deviceQuery sample code?
Our sample code shows 8 GB of memory on the TX2:

Device 0: "GP10B"
  CUDA Driver Version / Runtime Version          8.5 / 8.0
  CUDA Capability Major/Minor version number:    6.2
  Total amount of global memory:                 7854 MBytes (8235577344 bytes)
  ( 2) Multiprocessors, (128) CUDA Cores/MP:     256 CUDA Cores

Thank you for the reply…

The deviceQuery sample in the ‘NVIDIA_CUDA-8.0_Samples’ folder reports the memory correctly, as in your answer. Thank you. :)

So it seems the earlier result was not caused by a device defect or by the memory not being recognized.
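
Since the sample reports it correctly, the difference presumably comes from how my own query program reads or prints the value. For anyone hitting the same thing, here is a minimal sketch (the file name query_mem.cu is just illustrative, not the program I used above): cudaDeviceProp::totalGlobalMem is a size_t, and if that value is stored or printed through a 32-bit integer, an 8 GB total wraps around to a number close to the one I posted above.

// query_mem.cu -- minimal sketch, not the original program shown above.
// cudaDeviceProp::totalGlobalMem is a size_t (64-bit here), so keep it
// in a 64-bit type; forcing it into 32 bits wraps an 8 GB value.
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess || count == 0) {
        fprintf(stderr, "No CUDA device found: %s\n", cudaGetErrorString(err));
        return 1;
    }

    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);

    // Correct: print through a 64-bit type.
    printf("Name:                %s\n", prop.name);
    printf("Total global memory: %llu bytes (%.0f MiB)\n",
           (unsigned long long)prop.totalGlobalMem,
           prop.totalGlobalMem / (1024.0 * 1024.0));

    // Wrong: a 32-bit variable silently truncates values above 4 GB.
    unsigned int truncated = (unsigned int)prop.totalGlobalMem;
    printf("Same value forced into 32 bits: %u bytes\n", truncated);

    return 0;
}

Compiled with nvcc query_mem.cu -o query_mem on JetPack 3.0 (CUDA 8.0), the first printf should report roughly 8 GB on the TX2, in line with the official deviceQuery output.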

So is the 8 GB of system memory “shared” with the CUDA engines, or do the CUDA engines have 8 GB of their own?

@Skypuppy

As far as I know, all Jetson boards use shared memory; the Jetson GPU module does not have its own dedicated memory.
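
As a rough illustration of what the shared memory means in practice: the integrated GPU and the CPU address the same physical DRAM, so a single managed allocation can be touched by both sides without an explicit cudaMemcpy. Below is a minimal sketch, assuming CUDA 8.0 on the TX2; the kernel and the buffer size are made up for the example.

// shared_mem_demo.cu -- minimal sketch for a Jetson (CUDA 8.0 assumed).
// The CPU and the integrated GPU share the same physical DRAM, so one
// managed allocation is visible to both; there is no separate device copy.
#include <stdio.h>
#include <cuda_runtime.h>

__global__ void scale(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}

int main(void)
{
    const int n = 1 << 20;                 // 1M floats, about 4 MiB
    float *data = NULL;

    // One allocation, usable from both the CPU and the GPU.
    if (cudaMallocManaged(&data, n * sizeof(float)) != cudaSuccess) {
        fprintf(stderr, "cudaMallocManaged failed\n");
        return 1;
    }

    for (int i = 0; i < n; ++i)            // CPU writes
        data[i] = 1.0f;

    scale<<<(n + 255) / 256, 256>>>(data, n, 2.0f);   // GPU updates in place
    cudaDeviceSynchronize();               // required before the CPU reads again

    printf("data[0] = %f (expected 2.0)\n", data[0]);
    cudaFree(data);
    return 0;
}

The flip side is that the GPU competes with the OS and CPU processes for the same 8 GB pool, which is presumably why deviceQuery reports 7854 MBytes rather than the full 8 GB: part of system RAM is reserved.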

Thanks, @vii22, but well, rats.