Hi there.
We plan to use the DRIVE AGX (Pegasus) platform for real-time computing. Real-time in our case means that the system always produces the same output given the same starting conditions and inputs.
When running a program on a GPU there is one fundamental problem: none of the GPU management systems known to us allows evicting a running GPU kernel before it finishes, i.e., none of them supports preemptive GPU scheduling [1][2][3]. Preemptive scheduling is the bread and butter of determinism and real-time behavior. Another important issue for determinism and real-time is memory management: preferably, all memory is statically pre-allocated at startup and then only reused during runtime.
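To make the static pre-allocation point concrete, here is a minimal sketch of the pattern we mean, in plain Python rather than CUDA (the class and its names are our own illustration, not an existing API): all buffers are created once at startup, and the run-time loop only borrows and returns them, never allocating.

```python
from array import array

class StaticBufferPool:
    """Illustrative pool: every buffer exists before the control loop starts."""

    def __init__(self, num_buffers, buffer_len):
        # All allocation happens here, once, at initialization time.
        self._free = [array("d", [0.0] * buffer_len) for _ in range(num_buffers)]

    def acquire(self):
        # If the pool is exhausted this raises IndexError instead of
        # allocating, so the memory footprint stays fixed and overruns
        # fail loudly rather than silently growing the heap.
        return self._free.pop()

    def release(self, buf):
        self._free.append(buf)

pool = StaticBufferPool(num_buffers=4, buffer_len=1024)
buf = pool.acquire()   # no allocation happens here
buf[0] = 3.14          # do work with the borrowed buffer
pool.release(buf)      # return it for reuse in the next cycle
```

The same discipline applies to GPU memory: allocate device buffers once (e.g. with a single batch of `cudaMalloc` calls at startup) and reuse them, so allocation latency and fragmentation cannot perturb the run-time schedule.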
We would like to know how NVIDIA plans to tackle these challenges on, e.g., the NVIDIA DrivePX Pegasus.
If you have a working example we would greatly appreciate it. When we talk about determinism, we mean two types:
- Timing determinism: the execution time of GPU kernels is NOT deterministic, if for no other reason than the CPU-GPU interaction being non-deterministic.
- Result determinism: floating-point (and even fixed-point or integer) arithmetic is not associative; ((a+b)+c) may not equal (a+(b+c)), so the result of a set of operations depends on the order in which they are performed. Although we are not sure whether this leads to non-determinism within a single CUDA/GPU setup (i.e., slightly different results every time you run a kernel), every time you change the GPU or the CUDA version you run the risk of getting different results.
[1] File | Thesis | ID: dj52w8968 | Carolina Digital Repository
[2] gpgpu - GPU and determinism - Stack Overflow
[3] https://cs.unc.edu/~tamert/papers/ecrts18b.pdf