@little_jimmy, I use no shared memory. None of my variables are declared `__shared__`, and I don't pass the third kernel launch argument (the dynamic shared memory size).
Here’s the ptxas output:
ptxas info : 73 bytes gmem
ptxas info : Function properties for _Z21SignalAUR_ProcessTickR14GpuAURSignal_tRK9GpuTick_t
32 bytes stack frame, 32 bytes spill stores, 32 bytes spill loads
ptxas info : Function properties for _Z26Base_UpdateSignalsR17GpuBase_tRK9GpuTick_tRK15TConfig_t
32 bytes stack frame, 28 bytes spill stores, 28 bytes spill loads
ptxas info : Function properties for cudaMalloc
16 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z9UpdateStatusR15GpuStatManager_tRK9GpuTick_tRK12IConfig_t
8 bytes stack frame, 8 bytes spill stores, 8 bytes spill loads
ptxas info : Function properties for _Z21SignalAUF_ProcessTimeR14GpuAUFSignal_td
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _ZN4dim3C1Ejjj
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for cudaOccupancyMaxActiveBlocksPerMultiprocessor
32 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z28StatManager_ProcessEventR13GpuEvtData_tR15GpuStatManager_tRK12IConfig_tP14GpuEvtList_tRK9GpuTick_t
48 bytes stack frame, 48 bytes spill stores, 48 bytes spill loads
ptxas info : Function properties for _Z19SignalAUD_ProcessTimeR12GpuAUDSignal_td
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z25Base_ProcessRequestR17GpuBase_tRK9GpuTick_t11ReqAction
16 bytes stack frame, 16 bytes spill stores, 16 bytes spill loads
ptxas info : Function properties for cudaDeviceGetAttribute
16 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z23CalcActiveR10GpuReq_tRK9GpuTick_tdP13GpuEvtData_t
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z19SignalAG_ProcessTimeR12GpuAGSignal_td
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z6OnEventR13GpuEvtData_tR17GpuBase_tRK12IConfig_tP14GpuEvtList_tRK9GpuTick_t
16 bytes stack frame, 16 bytes spill stores, 16 bytes spill loads
ptxas info : Function properties for _Z19SignalBF_ProcessTickR12GpuBFSignal_tRK9GpuTick_t
96 bytes stack frame, 92 bytes spill stores, 92 bytes spill loads
ptxas info : Function properties for _Z5roundf
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z19SignalBUG_ProcessTickR12GpuBUGSignal_tRK9GpuTick_t
8 bytes stack frame, 8 bytes spill stores, 8 bytes spill loads
ptxas info : Compiling entry function '_Z11DerivedjPK9GpuTick_tP16GpuDerived_tP15GpuStatManager_tP14GpuEvtList_t15TConfig_t17SConfig_t12IConfig_t' for 'sm_35'
ptxas info : Function properties for _Z11DerivedjPK9GpuTick_tP16GpuDerived_tP15GpuStatManager_tP14GpuEvtList_t15TConfig_t17SConfig_t12IConfig_t
7728 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 76 registers, 8000 bytes cumulative stack size, 448 bytes cmem[0]
ptxas info : Function properties for cudaGetDevice
8 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z26CalcPassiveR10GpuReq_tRK9GpuTick_tdP13GpuEvtData_t
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z19SignalDD_ProcessTickR12GpuDDSignal_tRK9GpuTick_t
24 bytes stack frame, 24 bytes spill stores, 24 bytes spill loads
ptxas info : Function properties for _Z20ProcessEventsR17GpuBase_tRK9GpuTick_tS3_RK17SConfig_tRK12IConfig_tP14GpuEvtList_t
208 bytes stack frame, 64 bytes spill stores, 64 bytes spill loads
ptxas info : Function properties for _Z19SignalDF_ProcessTickR12GpuDFSignal_tP11AnySignal_t
32 bytes stack frame, 28 bytes spill stores, 28 bytes spill loads
ptxas info : Function properties for _Z3maxdd
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _ZSt3absd
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z19SignalDS_ProcessTickR12GpuDSSignal_tP11AnySignal_t
40 bytes stack frame, 40 bytes spill stores, 40 bytes spill loads
ptxas info : Function properties for _Z23Derived_ProcessTickR16GpuBase_tRK9GpuTick_tS3_
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for cudaFuncGetAttributes
16 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z8ClosePendingEventsR17GpuBase_tRK9GpuTick_t
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z13IsTimedOutdRK17SConfig_t
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z16OnEvtRejectR17GpuBase_t
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z21SignalDUG_ProcessTickR14GpuDUGSignal_tRK9GpuTick_t
24 bytes stack frame, 24 bytes spill stores, 24 bytes spill loads
ptxas info : Function properties for _ZN4dim3C2Ejjj
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
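
As a cross-check on the shared memory point: ptxas reports static shared memory as `N bytes smem` on the `Used` line, and the `Used` line for my kernel has no such entry, only registers, cumulative stack size and `cmem[0]`. Below is a minimal sketch of pulling those numbers out of a `Used` line. `ptxas_bytes` is a hypothetical helper I made up for illustration, not part of any CUDA tool, and it assumes the `<number> bytes <resource>` layout seen in the output above.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of any CUDA tool): returns the byte count a
// ptxas info line reports for `resource` (e.g. "smem", "cmem[0]",
// "cumulative stack size"), or 0 when the resource is not listed at all.
// Assumes the "<number> bytes <resource>" layout shown in the output above.
std::size_t ptxas_bytes(const std::string& line, const std::string& resource) {
    const std::string key = " bytes " + resource;
    const std::size_t pos = line.find(key);
    if (pos == std::string::npos) {
        return 0;  // resource absent from this line
    }

    // Walk back over the digits that sit immediately before " bytes ...".
    std::size_t start = pos;
    while (start > 0 && line[start - 1] >= '0' && line[start - 1] <= '9') {
        --start;
    }
    if (start == pos) {
        return 0;  // no number in front of the key
    }
    return std::stoul(line.substr(start, pos - start));
}
```

Fed the `Used 76 registers, 8000 bytes cumulative stack size, 448 bytes cmem[0]` line from above, this gives 0 for `"smem"`, 8000 for `"cumulative stack size"` and 448 for `"cmem[0]"`. Note that this only covers static shared memory; dynamic shared memory comes from the launch configuration and never shows up in ptxas output.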