More Error information:
Memory need is 1931264
GPU is 0
…
…
Memory need is 860016
GPU is 0
Memory need is 55041024
GPU is 0
Memory need is 20952
GPU is 0
Memory need is 55041024
GPU is 0
syncedmem.cpp:73 Check failed: error == cudaSuccess (2 vs. 0) out of memory.
My modified syncedmem.cpp is:
inline void SyncedMemory::to_gpu() {
  check_device();
#ifndef CPU_ONLY
  switch (head_) {
  case UNINITIALIZED:
    std::cout << "Memory need is " << size_ << "\n";
    std::cout << "GPU is " << gpu_ptr_ << "\n";
    CUDA_CHECK(cudaMalloc(&gpu_ptr_, size_));
    caffe_gpu_memset(size_, 0, gpu_ptr_);
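The Total byte / Free byte lines in the second log below were presumably produced with cudaMemGetInfo(), which reports the device's free and total memory in bytes; a minimal sketch of that extra logging, placed next to the cudaMalloc() call:

```cpp
// Sketch: query device memory before allocating. cudaMemGetInfo() fills
// its first argument with free bytes and its second with total bytes.
size_t free_b = 0, total_b = 0;
CUDA_CHECK(cudaMemGetInfo(&free_b, &total_b));
std::cout << "Total byte is " << total_b << "\n";
std::cout << "Free byte is " << free_b << "\n";
```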
More Error information:
Memory need is 1931264
GPU is 0
Total byte is 8235577344
Free byte is 4501069824
…
…
Memory need is 860016
GPU is 0
Total byte is 8235577344
Free byte is 4081430528
Memory need is 55041024
GPU is 0
Total byte is 8235577344
Free byte is 4079366144
Memory need is 20952
GPU is 0
Total byte is 8235577344
Free byte is 4024004608
Memory need is 55041024
GPU is 0
Total byte is 8235577344
Free byte is 4024004608
syncedmem.cpp:78 Check failed: error == cudaSuccess (2 vs. 0) out of memory.
Just a thought…the laptop will likely use memory from the video device, but Jetsons must use main system memory. Try enabling swap in the kernel if not already enabled (check “/proc/config.gz” for “CONFIG_SWAP=y”), then add an SD card or SATA disk and create a swap file or format a partition as swap (the “swapon” command can point at either a loopback swap formatted file or a partition formatted for swap…see “man mkswap”).
There are other requirements for GPU memory, but adding swap might take some pressure off of physical RAM from other programs and make more available to GPU.
I think it does require contiguous…this is one of those “other requirements”. Swapping out other use of RAM may lead to a bit more being available, but kernel command line options may be needed if larger amounts are failing for reason of not being contiguous. It is easy to try swap and not bother with kernel command line options to test out if that does the job…you’d probably still need swap anyway.
If you make sure to start your GPU process as early as possible in boot, and pre-allocate all memory that you will need, then you don’t need any VM or additional problems.
This is a thing that’s different about embedded compared to desktop PCs – you can have full control, but you also have very fixed resources that you have to know how to manage.
This is very similar to a game console target, TBH.
# Create a swapfile for Ubuntu at the current directory location
fallocate -l 8G swapfile
# List out the file
ls -lh swapfile
# Change permissions so that only root can use it
chmod 600 swapfile
# List out the file
ls -lh swapfile
# Set up the Linux swap area
mkswap swapfile
# Now start using the swapfile
sudo swapon swapfile
# Show that it's now being used
swapon -s
I tried your method, but it does not work. The error “F0616 03:17:41.486484 2017 syncedmem.cpp:56] Check failed: error == cudaSuccess (2 vs. 0) out of memory” still occurs.
I ran your code; it can allocate ~7 GB at maximum, but if I change ONE_MBYTE to ONE_GBYTE, it fails. I guess it can’t allocate a large contiguous block of system memory.
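The allocation test referred to above is not shown in the thread; a hedged sketch of what such a probe might look like (ONE_MBYTE here is an assumption standing in for the constant in the original code, and the program requires the CUDA runtime):

```cpp
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

int main() {
    const size_t ONE_MBYTE = 1 << 20;  // raising this to 1 GB demands one contiguous region
    std::vector<void*> blocks;
    size_t total = 0;
    void* p = nullptr;
    // Grab chunks until the driver refuses. Many small chunks can succeed
    // where one large chunk fails if free memory is fragmented.
    while (cudaMalloc(&p, ONE_MBYTE) == cudaSuccess) {
        blocks.push_back(p);
        total += ONE_MBYTE;
    }
    printf("Allocated %zu bytes before failure\n", total);
    for (void* q : blocks) cudaFree(q);
    return 0;
}
```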
Is there any way to move the memory address pointer so that the CPU allocates from the first 4 GB memory bank and the GPU allocates from the second 4 GB bank?
That way we could avoid the 4 GB split limitation.
For example, if cudaMalloc()/cudaFree() sees physical memory starting from the same address as the OS’s malloc()/free(), then we cannot get around the limitation. What if each device (GPU and CPU) could start allocating memory from a different starting point?
Is it possible to implement that? I know it may require tweaking the OS where the memory management happens…
By the way, our application needs 2 GB allocated for the GPU and the rest for the CPU threads.
This limitation comes from our CUDA driver. All memory used by the GPU needs to go through the CUDA driver.
By the way, on desktop GPUs it is possible to allocate memory with malloc() and then register it with the GPU via cudaHostRegister().
But cudaHostRegister() is not supported on the Jetson platform,
since on ARM platforms the caching attribute of an existing allocation can’t be changed on the fly.
Please wait for our next release.
Sorry for the inconvenience.