Hi,
I use these nvprof options:
nvprof -o analysis.prof_ROW.%h.%p.%q{OMPI_COMM_WORLD_RANK} --system-profiling on --print-gpu-trace --print-api-trace
with my mpirun command, and the run takes about 14 minutes.
When I try the following options instead, nvprof runs for hours and still produces no output:
nvprof -o analysis.prof_ROW.%h.%p.%q{OMPI_COMM_WORLD_RANK}.nvprof2 --aggregate-mode on --metrics l2_utilization,texture_utilization,system_utilization,dram_utilization,dram_read_throughput,dram_read_transactions,dram_write_throughput,dram_write_transactions,gld_efficiency,gld_throughput,gld_transactions,gld_transactions_per_request,global_cache_replay_overhead,gst_efficiency,gst_throughput,gst_transactions,gst_transactions_per_request,l1_cache_global_hit_rate,l1_cache_local_hit_rate,l1_shared_utilization,l2_atomic_throughput,l2_atomic_transactions,l2_l1_read_throughput,l2_l1_write_throughput,ldst_executed,local_memory_overhead,shared_efficiency,shared_store_throughput,sm_efficiency,sysmem_utilization,sysmem_write_throughput,tex_cache_throughput,tex_fu_utilization,tex_utilization,warp_execution_efficiency
I want to get information on the memory bandwidth. Any suggestions? Thanks.
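Since only the memory bandwidth is of interest, one thing that might be worth trying is a much narrower metric list. Below is a minimal sketch, assuming the long runtime comes from nvprof replaying kernels to collect so many metrics (this is an assumption, not something confirmed here). The two-metric subset, the variable names METRICS, OUT and CMD, and the ".bw" output suffix are illustrative choices only. The snippet just builds and prints the nvprof part of the command; it would still have to be placed in the mpirun command as before.

```shell
#!/bin/sh
# Hypothetical narrowed metric list: only the two DRAM throughput
# metrics taken from the long list in the post above.
METRICS="dram_read_throughput,dram_write_throughput"

# Same output naming pattern as in the post; the ".bw" suffix is an
# illustrative choice. Single quotes keep %q{...} literal for nvprof.
OUT='analysis.prof_ROW.%h.%p.%q{OMPI_COMM_WORLD_RANK}.bw'

# Assemble the nvprof prefix; nothing is executed here.
CMD="nvprof -o $OUT --metrics $METRICS"

# Print the command so it can be checked before being used with mpirun.
echo "$CMD"
```

If a run with this shorter list finishes in a reasonable time, further metrics from the original list could be added back a few at a time to see where the runtime starts to grow.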
YAH