System wide atomics

According to NVIDIA's documentation, system-wide atomics were enabled with Pascal and CUDA 8, so I am very surprised to see that they do not work on the Jetson TX2, which is such a good candidate for these features. Is there any reason why they were not enabled?

Surprisingly, neither the product website nor any of the documentation mentions this. I believe this limitation should be highlighted in the documentation for developers.

Hi,

Thanks for your question.
We will look into this issue and get back to you later.

Hi,

Sorry for keeping you waiting.

Currently, the atomic[Op]_system functions (for example, atomicAdd_system) are only available on 64-bit (x86-64) Linux.
Loading a CUDA module that uses those _system atomics will fail on other systems.

Thanks and sorry for the inconvenience.

Yes, I realized this. My question is why. Is it an ARM limitation, and is there a chance of getting this feature in the future? Thanks.

Also, this limitation should be stated in the user guide for one simple reason: the code compiles correctly, but at run time the mallocs fail. To a developer it appears that something else is wrong. I had to spend a lot of time debugging before I understood that the mallocs were failing because the application used system-wide atomics.
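
For anyone who hits the same symptom, here is a minimal sketch of the situation, not my actual application. It assumes a kernel that calls atomicAdd_system on managed memory and wraps every runtime call in a CUDA_CHECK macro. CUDA_CHECK and the kernel name bump are hypothetical names I made up for illustration. The thread does not say which error code the runtime returns on an unsupported platform, so the sketch only prints whatever cudaGetErrorString reports for the first call that fails.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not part of the CUDA API): print the runtime's
// error string for a failed call and exit, so the failure is not silent.
#define CUDA_CHECK(call)                                                 \
    do {                                                                 \
        cudaError_t err_ = (call);                                       \
        if (err_ != cudaSuccess) {                                       \
            std::fprintf(stderr, "%s failed at %s:%d: %s\n", #call,      \
                         __FILE__, __LINE__, cudaGetErrorString(err_));  \
            std::exit(EXIT_FAILURE);                                     \
        }                                                                \
    } while (0)

// Every thread increments one counter with a system-scope atomic.
// Requires compute capability 6.0 or higher (e.g. nvcc -arch=sm_60).
__global__ void bump(int *counter)
{
    atomicAdd_system(counter, 1);
}

int main()
{
    int *counter = nullptr;

    // First runtime call of the program. In my case the allocation was
    // where the failure showed up, even though the cause was the kernel.
    CUDA_CHECK(cudaMallocManaged(&counter, sizeof(int)));
    *counter = 0;

    bump<<<4, 64>>>(counter);             // 4 blocks of 64 threads
    CUDA_CHECK(cudaGetLastError());       // launch errors
    CUDA_CHECK(cudaDeviceSynchronize());  // execution errors

    std::printf("counter = %d\n", *counter);
    CUDA_CHECK(cudaFree(counter));
    return 0;
}
```

On a platform where the _system atomics are supported, this should print counter = 256 (4 blocks of 64 threads, one increment each). Where they are not supported, the point is only that checking the return value of the very first runtime call gives you an error string to start from, instead of a null pointer from an allocation that looks unrelated to the atomics.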

Hi,

Sorry for the inconvenience.
We will add this information to our documentation.

System-wide atomics are quite complex and require some kernel driver handling, which has not been done for TX2.

Hi,

FYI, we have added the supported-architecture information to the CUDA documentation.
It will be available in our next release.

Thanks for your feedback.

Thank you. Is there any plan for a driver release that will support system-wide atomics, or is there no chance for the TX2?