cuDNN 6.0 half precision not working on Titan X Pascal: cudnnFindConvolutionForwardAlgorithm fails

I am using Torch7 to train my networks. Everything works fine except when I run the training in half precision, in which case I get the following error:

...e/marcel/torch/install/share/lua/5.1/threads/threads.lua:179: [thread 1 endcallback] /home/marcel/torch/install/share/lua/5.1/nn/Container.lua:67: 
In 1 module of nn.Sequential:
In 1 module of nn.ConcatTable:
/home/marcel/torch/install/share/lua/5.1/cudnn/find.lua:483: cudnnFindConvolutionForwardAlgorithm failed, sizes:  convDesc=[mode : CUDNN_CROSS_CORRELATION datatype : CUDNN_DATA_FLOAT] hash=-dimA9,3,368,1224 -filtA13,3,3,3 9,13,184,612 -padA1,1 -convStrideA2,2 CUDNN_DATA_FLOAT
stack traceback:
	[C]: in function 'error'
	/home/marcel/torch/install/share/lua/5.1/cudnn/find.lua:483: in function 'forwardAlgorithm'
	...torch/install/share/lua/5.1/cudnn/SpatialConvolution.lua:190: in function <...torch/install/share/lua/5.1/cudnn/SpatialConvolution.lua:186>
	[C]: in function 'xpcall'
	/home/marcel/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
	/home/marcel/torch/install/share/lua/5.1/nn/ConcatTable.lua:11: in function </home/marcel/torch/install/share/lua/5.1/nn/ConcatTable.lua:9>
	[C]: in function 'xpcall'
	/home/marcel/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
	/home/marcel/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
	./trainManager.lua:156: in function 'opfunc'
	...
	...e/marcel/torch/install/share/lua/5.1/threads/threads.lua:174: in function 'dojob'
	...e/marcel/torch/install/share/lua/5.1/threads/threads.lua:223: in function 'addjob'
	./trainManager.lua:44: in function 'trainEpoch'
	main.lua:161: in main chunk
	[C]: in function 'dofile'
	[string "_RESULT={dofile("main.lua")}"]:1: in main chunk
	[C]: in function 'xpcall'
	/home/marcel/torch/install/share/lua/5.1/trepl/init.lua:661: in function 'repl'
	...rcel/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:199: in main chunk
	[C]: at 0x00406670

I tried it with a batch size of 1 and a fairly small input size, so it should not be a memory issue.
My Torch installation is up to date, so I now suspect there may be a bug in the cuDNN 6.0 library. Does anyone know why this error occurs or how to solve it?
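
As a rough sanity check on the memory claim, the tensor sizes printed in the error message can be tallied directly. The sketch below is plain Python arithmetic, not Torch code. It uses the values from the hash string as logged (note that the first dimension there is 9) and assumes 2 bytes per element for half precision. It counts only the input, filter, and output tensors, not any workspace cuDNN may request while searching for an algorithm, so it is an estimate rather than a full accounting.

```python
# Values copied from the error message:
#   -dimA9,3,368,1224  -filtA13,3,3,3  9,13,184,612  -padA1,1  -convStrideA2,2
batch, in_channels, height, width = 9, 3, 368, 1224
out_channels, kernel, pad, stride = 13, 3, 1, 2

BYTES_PER_HALF = 2  # assumption: fp16 storage, 2 bytes per element


def conv_out(size, kernel, pad, stride):
    """Standard output-size formula for a strided, padded convolution."""
    return (size + 2 * pad - kernel) // stride + 1


out_h = conv_out(height, kernel, pad, stride)
out_w = conv_out(width, kernel, pad, stride)

input_bytes = batch * in_channels * height * width * BYTES_PER_HALF
filter_bytes = out_channels * in_channels * kernel * kernel * BYTES_PER_HALF
output_bytes = batch * out_channels * out_h * out_w * BYTES_PER_HALF

MIB = 1024 * 1024
print("output size: %d x %d" % (out_h, out_w))
print("input  tensor: %.1f MiB" % (input_bytes / MIB))
print("filter tensor: %d bytes" % filter_bytes)
print("output tensor: %.1f MiB" % (output_bytes / MIB))
```

The computed output size matches the 184 x 612 reported in the log, and the three tensors together come to roughly 50 MiB, which is small next to the memory of a Titan X Pascal.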