bandwidthTest failure: bandwidthTest out of memory error
I saw the other post like this, but dropping the memory range increments doesn't help.

I ran deviceQuery and my card seems fine (see below). When I then run bandwidthTest, I get the error below. I tried various ranges with no joy. Any idea why this doesn't work? The other post also mentioned that, for their card, you could use memory mapping instead to get around it, and that this was faster. Can I do that with my card? All drivers, SDKs, etc. installed fine and everything seemed to compile OK.

Many thanks.

Russ


./bandwidthTest --mode=range --start=1024 --end=102400 --increment=1024
[bandwidthTest] starting...

./bandwidthTest Starting...

Running on...

Device 0: GeForce 8600M GT
Range Mode

bandwidthTest.cu(761) : CUDA Runtime API error 2: out of memory.



[deviceQuery] starting...

./deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

Found 1 CUDA Capable device(s)

Device 0: "GeForce 8600M GT"
CUDA Driver Version / Runtime Version 4.1 / 4.1
CUDA Capability Major/Minor version number: 1.1
Total amount of global memory: 256 MBytes (268238848 bytes)
( 4) Multiprocessors x ( 8) CUDA Cores/MP: 32 CUDA Cores
GPU Clock Speed: 1.04 GHz
Memory Clock rate: 650.00 Mhz
Memory Bus Width: 128-bit
Max Texture Dimension Size (x,y,z) 1D=(8192), 2D=(65536,32768), 3D=(2048,2048,2048)
Max Layered Texture Size (dim) x layers 1D=(8192) x 512, 2D=(8192,8192) x 512
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 8192
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 2147483647 bytes
Texture alignment: 256 bytes
Concurrent copy and execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Concurrent kernel execution: No
Alignment requirement for Surfaces: Yes
Device has ECC support enabled: No
Device is using TCC driver mode: No
Device supports Unified Addressing (UVA): No
Device PCI Bus ID / PCI location ID: 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 4.1, CUDA Runtime Version = 4.1, NumDevs = 1, Device = GeForce 8600M GT
[deviceQuery] test results...
PASSED

> exiting in 3 seconds: 3...2...1...done!

#1
Posted 02/17/2012 01:59 PM   
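On the memory-mapping question: deviceQuery above reports "Support host page-locked memory mapping: Yes", so the 8600M GT should be able to use mapped (zero-copy) page-locked host memory, where a kernel reads and writes host RAM directly instead of a cudaMalloc'd device buffer. Below is a minimal sketch, assuming device 0 and a CUDA 4.x or later toolkit; the kernel fillIndex and the helper blocksFor are hypothetical names for illustration. Note that this only relieves pressure on the card's 256 MB of VRAM in your own code; it does not change what bandwidthTest itself allocates, and every access crosses PCIe, so it tends to suit data touched once rather than data read repeatedly.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Report a failing CUDA runtime call and return 1 (only usable inside main).
#define CHECK(call)                                                     \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,     \
                         cudaGetErrorString(err_));                     \
            return 1;                                                   \
        }                                                               \
    } while (0)

// Hypothetical helper: number of blocks needed to cover n elements (rounds up).
int blocksFor(int n, int threadsPerBlock) {
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// Hypothetical kernel: each thread writes its index directly into mapped host memory.
__global__ void fillIndex(int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = i;
}

int main() {
    const int n = 1 << 20;  // 1M ints = 4 MB of page-locked host memory, no VRAM buffer

    cudaDeviceProp prop;
    CHECK(cudaGetDeviceProperties(&prop, 0));
    if (!prop.canMapHostMemory) {
        std::printf("Device 0 cannot map host memory\n");
        return 1;
    }
    // Must be called before the runtime creates a context on the device.
    CHECK(cudaSetDeviceFlags(cudaDeviceMapHost));

    int *hostBuf = NULL;   // host pointer to the page-locked allocation
    int *devAlias = NULL;  // device pointer aliasing the same host memory
    CHECK(cudaHostAlloc((void **)&hostBuf, n * sizeof(int), cudaHostAllocMapped));
    CHECK(cudaHostGetDevicePointer((void **)&devAlias, hostBuf, 0));

    const int threads = 256;
    fillIndex<<<blocksFor(n, threads), threads>>>(devAlias, n);
    CHECK(cudaGetLastError());       // catches launch-configuration errors
    CHECK(cudaDeviceSynchronize());  // kernel writes are visible to the host after this

    std::printf("hostBuf[%d] = %d\n", n - 1, hostBuf[n - 1]);  // prints 1048575
    CHECK(cudaFreeHost(hostBuf));
    return 0;
}
```

The canMapHostMemory check mirrors the "Support host page-locked memory mapping" line of deviceQuery, so the same program exits cleanly on cards that lack the feature.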
BTW, I am running a MacBook Pro:

Model Name: MacBook Pro
Model Identifier: MacBookPro3,1
Processor Name: Intel Core 2 Duo
Processor Speed: 2.6 GHz
Number Of Processors: 1
Total Number Of Cores: 2
L2 Cache: 4 MB
Memory: 4 GB
Bus Speed: 800 MHz
Boot ROM Version: MBP31.0070.B07
SMC Version (system): 1.18f5
Serial Number (system): W87472B3XA9
Hardware UUID: 00000000-0000-1000-8000-001EC20708A1
Sudden Motion Sensor:
State: Enabled

NVIDIA GeForce 8600M GT:

Chipset Model: GeForce 8600M GT
Type: GPU
Bus: PCIe
PCIe Lane Width: x16
VRAM (Total): 256 MB
Vendor: NVIDIA (0x10de)
Device ID: 0x0407
Revision ID: 0x00a1
ROM Revision: 3175
Displays:
Color LCD:
Resolution: 1920 x 1200
Pixel Depth: 32-Bit Color (ARGB8888)
Main Display: Yes
Mirror: Off
Online: Yes
Built-In: Yes
Display Connector:
Status: No Display Connected

#2
Posted 02/17/2012 02:00 PM   
I seem to have sorted this somehow. Running plain ./bandwidthTest gives:
[bandwidthTest] starting...

./bandwidthTest Starting...

Running on...

Device 0: GeForce 8600M GT
Quick Mode

Host to Device Bandwidth, 1 Device(s), Paged memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 130.1

Device to Host Bandwidth, 1 Device(s), Paged memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 438.1

bandwidthTest.cu(895) : CUDA Runtime API error 2: out of memory.


So it is still a failure. But if I change the parameters to the following, it now works.

./bandwidthTest --mode=range --start=1024 --end=102400 --increment=1024
[bandwidthTest] starting...

./bandwidthTest Starting...

Running on...

Device 0: GeForce 8600M GT
Range Mode

Host to Device Bandwidth, 1 Device(s), Paged memory
Transfer Size (Bytes) Bandwidth(MB/s)
1024 20.3
2048 0.6
3072 92.1
4096 3.0
5120 116.8
6144 148.3
7168 165.1
8192 179.6
9216 190.7
10240 203.5
11264 216.6
12288 229.8
13312 236.4
14336 249.0
15360 251.3
16384 265.7
17408 269.9
18432 279.0
19456 285.9
20480 285.5
21504 299.4
22528 263.9
23552 301.9
24576 315.4
25600 313.8
26624 308.5
27648 328.4
28672 399.8
29696 558.6
30720 581.3
31744 580.0
32768 572.3
33792 603.5
34816 613.7
35840 626.0
36864 634.6
37888 468.6
38912 635.4
39936 606.5
40960 662.1
41984 654.2
43008 676.8
44032 680.6
45056 617.4
46080 695.3
47104 668.5
48128 693.3
49152 621.7
50176 688.5
51200 687.7
52224 714.6
53248 573.2
54272 722.9
55296 742.7
56320 729.8
57344 715.8
58368 702.8
59392 761.3
60416 768.2
61440 768.9
62464 779.7
63488 741.1
64512 742.1
65536 771.6
66560 759.3
67584 873.3
68608 878.3
69632 874.9
70656 815.8
71680 884.3
72704 892.4
73728 893.4
74752 879.0
75776 842.3
76800 874.0
77824 895.3
78848 897.3
79872 920.0
80896 880.7
81920 899.0
82944 910.3
83968 920.4
84992 905.6
86016 936.4
87040 886.8
88064 953.3
89088 932.6
90112 955.9
91136 938.6
92160 902.4
93184 853.7
94208 967.1
95232 953.0
96256 973.5
97280 961.4
98304 979.6
99328 958.8
100352 922.9
101376 979.5
102400 895.1

Device to Host Bandwidth, 1 Device(s), Paged memory
Transfer Size (Bytes) Bandwidth(MB/s)
1024 20.5
2048 66.9
3072 96.4
4096 104.2
5120 150.2
6144 167.9
7168 197.0
8192 223.2
9216 233.1
10240 256.3
11264 274.0
12288 285.1
13312 307.4
14336 315.7
15360 332.2
16384 344.2
17408 355.5
18432 374.8
19456 388.2
20480 393.8
21504 403.7
22528 410.0
23552 427.8
24576 435.6
25600 453.0
26624 451.8
27648 453.8
28672 467.4
29696 474.4
30720 485.0
31744 480.5
32768 500.8
33792 498.1
34816 509.3
35840 517.9
36864 502.2
37888 526.0
38912 526.4
39936 539.5
40960 532.2
41984 549.2
43008 552.0
44032 559.2
45056 556.6
46080 564.8
47104 575.9
48128 567.3
49152 575.2
50176 577.9
51200 569.8
52224 533.8
53248 529.0
54272 526.0
55296 533.2
56320 588.3
57344 600.3
58368 556.6
59392 602.6
60416 604.0
61440 512.2
62464 567.3
63488 617.2
64512 543.5
65536 581.9
66560 118.1
67584 346.0
68608 644.6
69632 651.0
70656 644.2
71680 651.7
72704 580.2
73728 654.1
74752 654.6
75776 661.2
76800 641.4
77824 666.8
78848 663.1
79872 670.5
80896 667.4
81920 662.1
82944 681.9
83968 680.9
84992 680.6
86016 684.2
87040 685.4
88064 677.8
89088 681.9
90112 661.1
91136 624.4
92160 667.4
93184 700.8
94208 685.8
95232 699.2
96256 695.4
97280 706.6
98304 707.0
99328 704.8
100352 708.4
101376 709.3
102400 708.2

Device to Device Bandwidth, 1 Device(s)
Transfer Size (Bytes) Bandwidth(MB/s)
1024 191.3
2048 519.7
3072 747.7
4096 971.9
5120 1193.5
6144 1391.9
7168 1569.6
8192 1709.7
9216 1865.9
10240 2091.7
11264 2112.6
12288 2278.1
13312 2351.0
14336 2449.8
15360 2604.6
16384 2606.3
17408 2707.7
18432 2918.0
19456 3055.8
20480 3121.2
21504 3307.7
22528 3424.6
23552 3410.6
24576 3477.0
25600 3537.0
26624 3700.0
27648 3767.6
28672 3814.7
29696 3993.7
30720 3985.8
31744 4057.7
32768 4214.8
33792 4289.1
34816 4330.5
35840 4344.6
36864 4396.3
37888 4547.5
38912 4589.1
39936 4741.8
40960 4661.8
41984 4831.9
43008 4785.3
44032 4893.7
45056 4969.6
46080 5040.5
47104 5086.3
48128 5179.9
49152 5138.9
50176 5224.9
51200 5304.6
52224 5315.6
53248 5411.5
54272 5426.7
55296 5544.9
56320 5568.9
57344 5688.1
58368 5645.9
59392 5778.7
60416 5794.2
61440 5870.6
62464 5884.5
63488 5919.3
64512 5998.8
65536 6035.6
66560 5985.6
67584 6078.6
68608 6188.5
69632 6191.8
70656 6238.2
71680 6246.3
72704 6296.9
73728 6407.0
74752 6467.7
75776 6429.3
76800 6509.7
77824 6533.3
78848 6545.6
79872 6605.7
80896 6606.1
81920 6621.7
82944 6669.2
83968 6764.3
84992 6743.8
86016 6781.7
87040 6795.0
88064 6943.2
89088 6870.3
90112 6887.8
91136 6904.1
92160 7013.7
93184 7006.7
94208 6990.2
95232 7114.9
96256 7083.1
97280 7126.8
98304 7019.7
99328 7141.6
100352 7156.6
101376 7162.7
102400 7130.3

[bandwidthTest] test results...
PASSED

> exiting in 3 seconds: 3...2...1...done!



Unfortunately I am not sure exactly what I did to fix the issue, so I'm afraid this won't help anyone greatly. It 'might' be that I hadn't added the CUDA/bin directory to PATH and the lib directory to DYLD_LIBRARY_PATH, although I'm not sure. I did also reboot the machine, so if you get into this scenario, try all of the above.

#3
Posted 02/21/2012 07:11 PM   
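The pattern in the quick-mode run (32 MB host-to-device and device-to-host transfers succeed, then "out of memory", while the 1 KB to 100 KB range run passes) suggests the card had too little free VRAM left for the next phase. Assuming, without having checked what bandwidthTest.cu line 895 actually does, that the failing step is the device-to-device test and that it keeps two 33554432-byte device buffers alive at once, you can compare that requirement with what cudaMemGetInfo reports before running the test. This is a sketch; bytesNeeded is a hypothetical helper, and the free figure will move with whatever else (the display, other applications) is using the GPU at that moment.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: device bytes needed for `buffers` allocations of `bytesEach`.
size_t bytesNeeded(size_t bytesEach, int buffers) {
    return bytesEach * static_cast<size_t>(buffers);
}

int main() {
    // Quick-mode transfer size taken from the output above (32 MiB).
    const size_t quickModeBytes = 33554432;
    // ASSUMPTION: the phase that failed keeps two such device buffers alive at once.
    const size_t needed = bytesNeeded(quickModeBytes, 2);

    size_t freeBytes = 0, totalBytes = 0;
    // Initializes a CUDA context if none exists, so "free" already excludes its overhead.
    cudaError_t err = cudaMemGetInfo(&freeBytes, &totalBytes);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMemGetInfo failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    std::printf("free %zu MB of %zu MB; this phase needs about %zu MB\n",
                freeBytes >> 20, totalBytes >> 20, needed >> 20);
    if (freeBytes < needed) {
        std::printf("Not enough free VRAM: close other GPU users (e.g. a VM) "
                    "or test smaller sizes with --mode=range.\n");
    }
    return 0;
}
```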
Last update, FYI: I am pretty sure the reason my app was failing occasionally was other running apps using up the card's memory!

I tend to run Windows under VMware concurrently. As soon as I shut down VMware, the app ran fine every time. Obvious, I guess, but I suspect I am not the only one to miss this point. Hope this helps others.

#4
Posted 02/27/2012 09:17 AM   
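Since other applications (VMware here) can take VRAM at any time, code that allocates large device buffers on a small card may want to degrade gracefully instead of aborting. One possible approach, sketched below and not taken from bandwidthTest, is to retry cudaMalloc with a halved size whenever it returns cudaErrorMemoryAllocation. The helper mallocWithFallback and the 64 MB / 1 MB limits are illustrative assumptions.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: try to cudaMalloc `requested` bytes; on out-of-memory,
// halve the request until it succeeds or falls below `minimum`.
// Returns the number of bytes obtained, or 0 (with *ptr == NULL) on failure.
size_t mallocWithFallback(void **ptr, size_t requested, size_t minimum) {
    *ptr = NULL;
    for (size_t size = requested; size > 0 && size >= minimum; size /= 2) {
        cudaError_t err = cudaMalloc(ptr, size);
        if (err == cudaSuccess) return size;
        if (err != cudaErrorMemoryAllocation) break;  // not an OOM: retrying won't help
        cudaGetLastError();                           // clear the recorded OOM error
    }
    *ptr = NULL;
    return 0;
}

int main() {
    void *buf = NULL;
    // Illustrative limits: ask for 64 MB, accept anything down to 1 MB.
    size_t got = mallocWithFallback(&buf, 64u << 20, 1u << 20);
    if (got == 0) {
        std::fprintf(stderr, "could not allocate even 1 MB of device memory\n");
        return 1;
    }
    std::printf("allocated %zu bytes\n", got);
    cudaFree(buf);
    return 0;
}
```

Halving keeps the number of retries logarithmic in the requested size; the trade-off is that the caller must be ready to process its data in smaller chunks whenever less than the requested amount comes back.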