Fermi and partition camping
Is partition camping an issue for Fermi cards? I've noticed that for the matrix transpose example in the SDK, there's not much performance difference between the coarse-grained and fine-grained transposes.
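(For context, the SDK transpose sample's documented fix for partition camping on GT200 is diagonal block reordering: remapping block indices so that concurrently running blocks write to different partitions. A minimal sketch of the idea, simplified to one element per thread and assuming a square matrix and square grid with width a multiple of the tile size; names and constants are illustrative, not the exact SDK code:

[code]
#define TILE_DIM 32

// Tiled transpose: blocks scheduled together share blockIdx.y and all
// write the same narrow band of output columns, so their writes can
// land in the same memory partition (camping).
__global__ void transposeTiled(float *out, const float *in, int width)
{
    __shared__ float tile[TILE_DIM][TILE_DIM + 1];  // +1 avoids bank conflicts

    int x = blockIdx.x * TILE_DIM + threadIdx.x;
    int y = blockIdx.y * TILE_DIM + threadIdx.y;
    tile[threadIdx.y][threadIdx.x] = in[y * width + x];
    __syncthreads();

    x = blockIdx.y * TILE_DIM + threadIdx.x;  // transposed block coordinates
    y = blockIdx.x * TILE_DIM + threadIdx.y;
    out[y * width + x] = tile[threadIdx.x][threadIdx.y];
}

// Diagonal reordering: reinterpret blockIdx so blocks launched together
// spread their writes across all partitions (assumes gridDim.x == gridDim.y).
__global__ void transposeDiagonal(float *out, const float *in, int width)
{
    __shared__ float tile[TILE_DIM][TILE_DIM + 1];

    int by = blockIdx.x;
    int bx = (blockIdx.x + blockIdx.y) % gridDim.x;

    int x = bx * TILE_DIM + threadIdx.x;
    int y = by * TILE_DIM + threadIdx.y;
    tile[threadIdx.y][threadIdx.x] = in[y * width + x];
    __syncthreads();

    x = by * TILE_DIM + threadIdx.x;
    y = bx * TILE_DIM + threadIdx.y;
    out[y * width + x] = tile[threadIdx.x][threadIdx.y];
}
[/code]

On GT200 the diagonal version is markedly faster; the question here is whether the gap persists on Fermi.)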

#1
Posted 06/16/2010 12:14 AM   
No, Fermi changed the design of the memory controllers and should not be affected by partition camping. Hooray!

#2
Posted 06/16/2010 09:10 AM   
Does anyone have a technical explanation for this?

Because any way you put it, each memory controller has a bandwidth limit, and if ONE controller serves all memory accesses, there should be a huge penalty, unless the partitioning is done dynamically and at fine granularity (which is really improbable).
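(As a back-of-the-envelope illustration, assuming a GTX 480 for the numbers: its 384-bit bus is six 64-bit memory controllers with ~177 GB/s aggregate bandwidth, so if every access landed in a single partition, throughput would be capped near 177 / 6 ≈ 30 GB/s, a 6x penalty.)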
Parallelis.com, Parallel-computing technologies and benchmarks. Current Projects: OpenCL Chess & OpenCL Benchmark

#3
Posted 08/09/2010 04:16 PM   
I can't really go into details, but there is no longer a linear mapping between addresses and partitions, so typical access patterns are unlikely to all fall into the same partition.

#4
Posted 08/09/2010 05:15 PM   
[quote name='iAPX' post='1101471' date='Aug 9 2010, 05:16 PM']Did someone have technical explanation about it?[/quote]

If you are curious about how this can be done, you can look at papers from the 80's or 90's about memory interleaving schemes, like skewing or random interleaving...
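(To make that concrete, a toy sketch of linear vs. skewed interleaving; the constants and the scheme are purely illustrative, since NVIDIA's actual Fermi mapping is undisclosed:

[code]
#define NUM_PARTITIONS 6    // illustrative: six 64-bit controllers
#define LINE_BYTES     256  // illustrative partition granularity

// Linear mapping (GT200-style): consecutive 256B lines cycle through the
// partitions, so any access stride of NUM_PARTITIONS * LINE_BYTES
// repeatedly hits a single partition.
unsigned linearPartition(unsigned long addr)
{
    return (addr / LINE_BYTES) % NUM_PARTITIONS;
}

// Skewed mapping: add a row-dependent offset so that walking down a
// column of a 2D array spreads across partitions instead of camping.
unsigned skewedPartition(unsigned long addr, unsigned long rowPitch)
{
    unsigned long line = addr / LINE_BYTES;
    unsigned long row  = addr / rowPitch;
    return (line + row) % NUM_PARTITIONS;
}
[/code]

Random interleaving achieves the same effect by hashing address bits, e.g. XORing a few high bits into the partition index.)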
#5
Posted 08/10/2010 08:59 AM   
I know some, but they only handle [b]typical access patterns[/b]; they don't guarantee that there won't be partition camping in any particular case!

Avoiding it in the typical case is relatively easy, but claiming it's immune is wrong, from my point of view, unless you dynamically MOVE memory contents and remap them based on a load analysis of each memory controller. That's my point.

Parallelis.com, Parallel-computing technologies and benchmarks. Current Projects: OpenCL Chess & OpenCL Benchmark

#6
Posted 08/11/2010 10:47 PM   
[quote name='Simon Green' post='1074283' date='Jun 16 2010, 02:10 AM']No, Fermi changed the design of the memory controllers and should not be affected by partition camping. Hooray![/quote]
It seems that Fermi is still affected by partition camping, although to a lesser degree. Here is the performance I get in transpose:
[center][img]http://img84.imageshack.us/img84/4206/transposegtx480.png[/img][/center]

#7
Posted 08/14/2010 03:49 PM   
Interesting...

Is this for 32-bit data, 1 output per thread? That would suggest that conflicts appear with a stride of 1536 B. (Seems to match GT200-like partitions handling blocks of 256 B each: 1536 B / 256 B = 6 partitions, consistent with the GTX 480's six 64-bit memory controllers...)

Did you try with other cache configurations (16K/48K) or caching policies (-Xptxas -dlcm...)? Just to be sure to rule out any cache-related effect...
#8
Posted 08/14/2010 06:56 PM   
[quote name='Sylvain Collange' post='1103735' date='Aug 14 2010, 11:56 AM']Is this for 32-bit data, 1 output per thread? That would suggest that conflicts appear with a stride of 1536B. (Seems to match nicely GT200-like partitions, handling blocks of 256B each...)

Did you try with other cache configurations (16K/48K) or caching policies (-Xptxas -dlcm...)? Just to be sure to rule out any cache-related effect...[/quote]
It is for 32-bit data, 4 outputs per thread. But I see the same effect when computing 1 output per thread and when compiling with "-Xptxas -dlcm=cg".
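(For reference, on Fermi -dlcm=cg makes global loads bypass L1 and cache in L2 only, versus the default -dlcm=ca which caches in both; a typical compile line, file name illustrative:

[code]
nvcc -arch=sm_20 -Xptxas -dlcm=cg transpose.cu
[/code]

so this rules out an L1 effect, though not L2.)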

#9
Posted 08/14/2010 07:08 PM   