cudaGraphicsMapResources() and cuCtxCreate() incompatible?

SanBase · April 6, 2018, 7:34pm

Hi! I’m trying to create an application that generates DX11(actually DX12) texture using CUDA.
The build-in version of the cuda program works, but I need to read and compile PTX files on the fly. For this, I should create a cuda context by cuCtxCreate(), because the functions cuModuleLoad() and cuModuleGetFunction() cannot work without it. But in this case, the application crashing on cudaGraphicsMapResources().

Diagnostic: “getLastCudaError() CUDA error : cudaGraphicsMapResources(3) failed : (33) invalid resource handle.”

The resource was created by DX11 before CUDA was initialized, that’s true, but the app perfectly works without cuCtxCreat. As I can guess, when I run cuCtxCreate(), the cuda starts new process and my DX11 resources and cuda modules are in different address spaces. So, what should I do in this situation?

Some technical details:
cuda 9.1, the system includes AMD R9, Titan V and Titan X. The application running on Titan V.

Robert_Crovella · April 6, 2018, 8:00pm

Perform your driver API context creation before you hit any runtime API calls in your code. The first runtime API call will then “pick up” the driver API context you have created, and use that as the context for the runtime API activity, as well.

The reverse does not work. cuCtxCreate() will not “pick up” a previously created context in use/created by the runtime API. In this reverse scenario, the context associated with the runtime API activity would be accessed in the driver API code via the so-called “Primary” context.

SanBase · April 6, 2018, 8:15pm

Do you mean that I would have called cuCtxCreate() immediately after the DХ11 device was created? I’m afraid that this is almost impossible, so this is actually DX12 app, and the DX11 resources is created by D3D11On12CreateDevice->CreateWrappedResource etc… The textures are created quite elsewhere in the program and much earlier than cuda starting to work.

Robert_Crovella · April 6, 2018, 8:57pm

I was suggesting that before you call any cuda runtime API call in your program, you call your cuda driver API cuCtxCreate().

If you don’t know which calls belong to the cuda runtime API and which belong to the driver API, please review the documentation. Both APIs are documented.

[url]CUDA Runtime API :: CUDA Toolkit Documentation
[url]CUDA Driver API :: CUDA Toolkit Documentation

note that the driver API also has driver-API native graphics interoperability calls:

[url]CUDA Driver API :: CUDA Toolkit Documentation
[url]CUDA Driver API :: CUDA Toolkit Documentation

this might be a “cleaner” way to go.

Whether that will fix the issue in your actual program, I don’t know. I don’t have a test case to work with.

If this is “impossible” for whatever reason, then please disregard my comments.

SanBase · April 6, 2018, 10:56pm

I run cuCtxCreate first. This is constructor of my CCUDA class:
CCuda::CCuda()
{
int device;
cudaD3D11GetDevice(&device, vAdapters[TITANV].pAdapter);

cudaSetDevice(device); // just in case
checkCudaErrors(cuDeviceGet(&cuDevice, device));

checkCudaErrors(cuCtxCreate(&m_Context, 0, cuDevice));

}

IMHO: The problem is not this. The problem is that the resources that I try to map belong to a different address space.

From Cuda programming guide:
“A CUDA context is analogous to a CPU process. All resources and actions performed within the driver API are encapsulated inside a CUDA context, and the system automatically cleans up these resources when the context is destroyed. Besides objects such as modules and texture or surface references, each context has its own distinct address space. As a result, CUdeviceptr values from different contexts reference different memory locations.”

I double checked my code - I create the DX11 texture after cuCtxCreate():
p_CUDA = new CCuda();
…
hr = m_Shared11Device->CreateTexture2D(&pDesc, NULL, &tmp_texture);
…
cudaGraphicsD3D11RegisterResource(&m_texture.cudaResource, tmp_texture, cudaGraphicsRegisterFlagsNone);
cudaMallocPitch(&c_texture.cudaLinearMemory, &c_texture.pitch, c_texture.width * c_texture.byte_size, c_texture.height);
…

ppResources = m_textures[i].cudaResource;
cudaGraphicsMapResources(1, &ppResources, stream);  // crash here

I do not know CUDA well enough to solve this problem, therefore I ask the question here.

Robert_Crovella · April 6, 2018, 11:19pm

I actually do it. This is constructor of my CCUDA class:
CCuda::CCuda()
{
int device;
cudaD3D11GetDevice(&device, vAdapters[TITANV].pAdapter);
cudaSetDevice(device); // just in case
checkCudaErrors(cuDeviceGet(&cuDevice, device));

checkCudaErrors(cuCtxCreate(&m_Context, 0, cuDevice));
}

Actually, you don’t.

This is a cuda runtime API call:

cudaD3D11GetDevice(&device, vAdapters[TITANV].pAdapter);
^^^^

so is this:

cudaSetDevice(device); // just in case
^^^^

They are both documented here:

http://docs.nvidia.com/cuda/cuda-runtime-api/index.html

this call, in the constructor, appears later:

checkCudaErrors(cuCtxCreate(&m_Context, 0, cuDevice));

so you actually have 2 CUDA runtime API calls preceding the call to cuCtxCreate(). That will definitely result in a “new”, separate, different context being created. You should not do that, if you want the runtime API to share the context created by cuCtxCreate().

I have no idea what the problem in your application is. I was responding to this:

That statement is correct. If you initialize the CUDA runtime API (for example via cudaSetDevice()), then call cuCtxCreate() from the driver API, the context created by the driver API will be in a “different address space”. If you don’t want that specific behavior, I’m quite confident in my previous statements about it.

http://docs.nvidia.com/cuda/cuda-runtime-api/driver-vs-runtime-api.html#driver-vs-runtime-api

But whether that is actually giving rise to the problem in your application, I can’t say. “invalid resource handle” could be due to a variety of things. Ultimately, I wouldn’t be able to make any such statement without a test case.

That is certainly true if we are talking about two separate contexts. But I have already indicated to you how to get the runtime API to share the context that has been previously created by the driver API. As near as I can tell, if your claim is that your CCuda constructor is enforcing this, then I wouldn’t agree with that. It doesn’t do what I think needs to be done.

If you’re having trouble with CUDA runtime API/CUDA driver API interoperability in your program, then you may wish to simply convert your program to use the CUDA driver API entirely. As I’ve already pointed out, there are corresponding graphics API interoperability routines (API) within the driver API, that correspond to the ones you have indicated so far in the runtime API.

SanBase · April 6, 2018, 11:51pm

The simplest question is:
Is it possible to modify ~CUDA\Samples\2_Graphics\simpleD3D11Texture to work with PTX?
It would be sufficient for me to understand the principle that it is necessary to change in this sample to implement this.

Тhank you in advance.
P.S.

Hmm… How I can run cuCtxCreate(&m_Context,0,cuDevice) first, if I need to have cuDevice?

Robert_Crovella · April 7, 2018, 12:18am

Yes, it should be possible.

The vectorAdd and vectorAddDrv sample code pair shows how a runtime API application may be realized using the driver API, including using PTX kernels. Apart from that, for the remainder of the simpleD3D11Texture app, there should be approximately a 1:1 correspondence between the cuda runtime API calls and their corresponding cuda driver API calls. This may require some study and effort on your part.

perhaps by using the corresponding driver API call, to retrieve the D3D11 device, rather than by using a cuda runtime API call to do so:

http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__D3D11.html#group__CUDA__D3D11

CUresult cuD3D11GetDevice ( CUdevice* pCudaDevice, IDXGIAdapter* pAdapter )
Gets the CUDA device corresponding to a display adapter.

SanBase · April 7, 2018, 12:52am

Thank you! I started doing this job and almost converted my application into the driver API. Only one function have no analogues:
cudaMemcpy2DToArray(out_array, 0, 0, m_textures[0].cudaLinearMemory, m_textures[0].pitch,
m_textures[0].width * m_textures[0].byte_size, m_textures[0].height, cudaMemcpyDeviceToDevice);

(I borrowed this code from “simpleD3D11Texture”).
Could you advise something?

Robert_Crovella · April 7, 2018, 2:25am

cuMemcpy2D can transfer to a destination that is a cuda Array type:

[url]CUDA Driver API :: CUDA Toolkit Documentation