I built gdrcopy on a compute node with two Titan X GPUs and two CPUs. After installing gdrcopy, I ran the test programs. The results are as follows:
$ ./validate
buffer size: 327680
$ ./copybw
testing size: 4096
rounded size: 65536
device ptr: 206c80000
closing gdrdrv
However, according to NVIDIA's announcement, the results should be:
$ ./validate
buffer size: 327680
check 1: direct access + read back via cuMemcpy D->H
check 2: gdr_copy_to_bar() + read back via cuMemcpy D->H
check 3: gdr_copy_to_bar() + read back via gdr_copy_from_bar()
check 4: gdr_copy_to_bar() + read back via gdr_copy_from_bar() + extra_dwords=5
$ ./copybw
testing size: 4096
rounded size: 65536
device ptr: 5046c0000
bar_ptr: 0x7f8cff410000
info.va: 5046c0000
info.mapped_size: 65536
info.page_size: 65536
page offset: 0
user-space pointer:0x7f8cff410000
BAR writing test…
BAR1 write BW: 9549.25MB/s
BAR reading test…
BAR1 read BW: 1.50172MB/s
unmapping buffer
unpinning buffer
closing gdrdrv
What is the problem?