Dynamic parallelism detected inappropriately

Hi,

I’m having a strange issue where my project thinks it needs to link to cudadevrt.lib and use dynamic parallelism whenever I call cudaMalloc from a __global__ function.

This is not normal CUDA behaviour: I can create a test project that calls cudaMalloc from the GPU without triggering dynamic parallelism. But whenever I add the cudaMalloc call to my existing (large) project, this happens. I see it when I build the project and the linker insists that I link to cudadevrt.lib, and I also see it when I try to debug and am told “CUDA Dynamic Parallelism debugging is not supported in preemption mode. Breakpoints will be disabled.”

I have no idea what’s causing this. I verified that it does not happen in a new project containing only this simple code:

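#include <cuda_runtime.h>

__global__ void Test()
{
	void* ptr;
	cudaMalloc(&ptr, 100);   // device-side allocation (intentionally not freed in this repro)
}

int main()
{
	Test<<<1, 1>>>();
	cudaDeviceSynchronize(); // wait for the kernel so it runs before the process exits
	return 0;
}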

Has anyone else had this issue before? I can’t see why this is happening in my existing project: simply adding a cudaMalloc call to the main __global__ function of my project triggers the Dynamic Parallelism mode. Any ideas?

If you want to use the runtime API on the device, you are supposed to link against the device runtime (-lcudadevrt).
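As a sketch, assuming a single source file named test.cu (the file name is hypothetical), a command-line build that links the device runtime looks like this. The device runtime also requires relocatable device code (-rdc=true). In a Visual Studio CUDA project the equivalents should be the “Generate Relocatable Device Code” option under CUDA C/C++ > Common and adding cudadevrt.lib under Linker > Input > Additional Dependencies.

```shell
# Compile with relocatable device code, which the device runtime requires,
# and link the device runtime library (cudadevrt.lib on Windows, libcudadevrt.a on Linux).
nvcc -rdc=true -o test test.cu -lcudadevrt
```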

The fact that you found a case where it seems you didn’t have to do this is beside the point. (That is, in fact, another bug.)

This is not “triggering the Dynamic Parallelism mode”. This is simply what is required to use the runtime API calls inside of device code.

Regarding this:

“I also see it when I try to debug and get told ‘CUDA Dynamic Parallelism debugging is not supported in preemption mode.’”

You might want to file a bug against Nsight Visual Studio Edition (VSE), with a sample project that links the device runtime but performs no device-side kernel launches.

Here’s a minimal working example of the issue. It turns out the trigger is calling cudaFree from device code:

#include <stdio.h>
#include <math.h>
#include <cuda_runtime.h>
#include <curand_kernel.h>
#include <curand.h>
#include "device_launch_parameters.h"

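__global__ void Test()
{
	void* ptr;
	cudaMalloc(&ptr, 100);   // device runtime API call
	cudaFree(ptr);           // per this report, adding this call triggers the debugger message
}

int main()
{
	Test<<<1, 1>>>();
	cudaDeviceSynchronize(); // wait for the kernel to finish before exiting
	return 0;
}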

If you try to debug the __global__ function Test, you will get the error “CUDA Dynamic Parallelism debugging is not supported in preemption mode. Breakpoints will be disabled.”

Is it illegal to call cudaFree from a kernel? Or is this only possible when dynamic parallelism is used?
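For reference, calling cudaFree from a kernel is legal: device-side cudaMalloc and cudaFree belong to the device runtime API, which is why the build must use -rdc=true and link cudadevrt, as noted earlier in the thread. The following is a minimal sketch of the allocate/use/free pattern with the return codes checked on both device and host, assuming a GPU and toolkit that support the device runtime. The kernel and helper names (allocAndFree, runAllocDemo) are hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Kernel: allocate with the device-side cudaMalloc, check its status,
// use the memory, then release it with the device-side cudaFree.
// Assumed build: nvcc -rdc=true alloc.cu -lcudadevrt
__global__ void allocAndFree(int* status)
{
    int* ptr = nullptr;
    // The device-side cudaMalloc takes void**, so the pointer is cast explicitly.
    cudaError_t err = cudaMalloc(reinterpret_cast<void**>(&ptr), 100 * sizeof(int));
    if (err != cudaSuccess || ptr == nullptr) {
        *status = 1;  // allocation failed, e.g. the device heap is exhausted
        return;
    }
    ptr[0] = 42;      // use the allocation
    *status = 0;
    cudaFree(ptr);    // pair every device-side cudaMalloc with a device-side cudaFree
}

// Host helper: launches the kernel and returns 0 on success, non-zero otherwise.
int runAllocDemo()
{
    int* dStatus = nullptr;
    if (cudaMalloc(&dStatus, sizeof(int)) != cudaSuccess) return -1;

    allocAndFree<<<1, 1>>>(dStatus);
    // Launch errors are reported by cudaGetLastError, execution errors by the sync.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        std::printf("CUDA error: %s\n", cudaGetErrorString(err));
        cudaFree(dStatus);
        return -1;
    }

    int hStatus = -1;
    cudaMemcpy(&hStatus, dStatus, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dStatus);
    return hStatus;
}

int main()
{
    int status = runAllocDemo();
    std::printf("kernel status: %d\n", status);
    return status == 0 ? 0 : 1;
}
```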

Note that the following code, which uses the new/delete operators instead, debugs fine. What’s the difference here? Is this a bug?

#include <stdio.h>
#include <math.h>
#include <cuda_runtime.h>
#include <curand_kernel.h>
#include <curand.h>
#include "device_launch_parameters.h"

class MyClass
{
public:
	int x;
};

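__global__ void Test()
{
	MyClass* obj = new MyClass();  // allocated from the device heap; no device runtime needed
	obj->x = 4;
	delete obj;
}

int main()
{
	Test<<<1, 1>>>();
	cudaDeviceSynchronize(); // wait for the kernel to finish before exiting
	return 0;
}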
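One possible difference: the CUDA Programming Guide describes device-side cudaMalloc/cudaFree as mapping onto device-side malloc()/free(), drawing from the same device heap as new/delete. If the goal is only to avoid the cudadevrt dependency, plain device malloc/free is therefore an option. A minimal sketch follows; that it also avoids the Nsight message in this setup is an untested assumption, and the names allocWithMalloc and runMallocDemo are hypothetical.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Kernel: same allocate/use/free pattern, but with device-side malloc/free,
// which do not require -rdc=true or cudadevrt (assumed workaround, untested here).
__global__ void allocWithMalloc(int* status)
{
    int* ptr = static_cast<int*>(malloc(100 * sizeof(int)));
    if (ptr == nullptr) {
        *status = 1;  // device malloc signals failure by returning NULL
        return;
    }
    ptr[0] = 4;       // use the allocation
    *status = 0;
    free(ptr);        // memory from device malloc must be freed with device free
}

// Host helper: returns 0 on success, non-zero otherwise.
int runMallocDemo()
{
    // Optional: raise the device heap limit (8 MB by default) before launching
    // any kernel that allocates from it.
    cudaDeviceSetLimit(cudaLimitMallocHeapSize, 16 * 1024 * 1024);

    int* dStatus = nullptr;
    if (cudaMalloc(&dStatus, sizeof(int)) != cudaSuccess) return -1;

    allocWithMalloc<<<1, 1>>>(dStatus);
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        std::printf("CUDA error: %s\n", cudaGetErrorString(err));
        cudaFree(dStatus);
        return -1;
    }

    int hStatus = -1;
    cudaMemcpy(&hStatus, dStatus, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dStatus);
    return hStatus;
}

int main()
{
    int status = runMallocDemo();
    std::printf("kernel status: %d\n", status);
    return status == 0 ? 0 : 1;
}
```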

It seems this is a bug. I’m trying to submit a bug report, but I always get the message “An error occurred while processing your request.” Is the bug reporting system down?

It’s not down, it’s just hard to use.

File an empty bug report first. Then add content. It may be simpler just to link to this thread.

Thanks, I tried this, but even with only the minimal fields completed it fails. I will try again in a couple of days, as it seems to be down.

Hi

Despite my reporting this bug in May last year and providing a repeatable example of the problem, there has been no proper response from NVIDIA. This is disappointing, and it feels like the Windows platform is not being properly supported. My application needs .NET integration, so I can’t move to Linux to avoid this.

Any suggestions on how to escalate this bug?

Thanks

What is the bug number? I can’t make any promises, but I can take a look at it.

It is 1938562. Thank you, I appreciate your time and assistance.

I’ve put a question internally into the bug. If I get anything meaningful to report, I will report back.