GTX480 to C2050 hack or unlocking TCC-mode on GeForce
[b]EDIT[/b]: Check out the [url="http://forums.nvidia.com/index.php?showtopic=195864&st=0&p=1210250&#entry1210250"]post below[/url] for instructions!

This is a follow-up on the somewhat frustrating problems I've been experiencing lately with CUDA, in particular for our pretty complex CUDA + DirectX interop + Multi-GPU setup, more info on that here: http://forums.nvidia.com/index.php?showtopic=193894

To recap a little from that topic.. I have a workstation with dual identical NVIDIA GeForce GTX480 graphics cards, Windows 7 x64. Our software consists of a graphics engine (DirectX 10) and a physics engine (CUDA) and some other misc. subsystems. Physics runs in its own CUDA context assigned to the secondary GTX480 card (separate CPU thread), generally at a rate of 1000Hz and higher. The primary card is used for DirectX 10 rendering, but also runs a CUDA context for DirectX<->CUDA interop in order to "bridge" vertex data from DirectX/primary onto the secondary card (and back).

Whenever the load on DirectX (primary card) gets high enough that the rendering rate drops below roughly 10Hz, some CUDA runtime calls on the secondary card (physics) crawl to a halt, even though the secondary card basically runs standalone from the rest of the application. I disabled the interop calls on the primary card and made sure there were no stray CPU<->GPU copies happening on the secondary card, but that made no difference. I then verified the problem by running some SDK samples on the secondary card in parallel with our software (with the physics completely disabled) and saw their execution times get worse as well. I narrowed the problem down to cudaMemcpy* calls taking up excessive CPU time for some reason. I figure this shouldn't be happening, but it may be down to the way DirectX works somewhere deep in kernel land.

Following the CUDA 4.0rc release, I read about the Tesla TCC performance driver and the various "exclusive" compute modes, and I figured this might be the solution to my problem, since TCC mode drops WDDM entirely. Unfortunately... I don't have a Tesla card at my disposal.

When you compare the Fermi cards in the consumer (GeForce) and HPC (Tesla) markets, and specifically the features that were introduced into CUDA 4.0 exclusively for the Tesla cards, it looks like NVIDIA may be pulling some kind of premium Tesla lock-in for whatever reason. This wouldn't be the first time that cards are deliberately crippled. My GeForce GTX480 runs on the same GF100 architecture as a Tesla C2050, despite lacking ECC-memory and probably some other high grade components, but this really shouldn't prevent me from using these features, right? Right.

So I dig up some low-level skills and naively figure that I could probably modify the firmware/softstraps to get the driver to detect my secondary card as a Tesla C2050 instead of a GTX480, use TCC-mode with high performance and live happily ever after. Turns out this assumption was true.

I got the secondary card to a state where it is now detected as a Tesla C2050, and I can use nvidia-smi to trigger TCC-mode (after which the card is no longer accessible through normal programs). I ran bandwidthTest to verify that the card was still working correctly, and noticed an immediate increase in performance:

Primary card, regular GTX480:
[code]
Device 0: GeForce GTX 480
Quick Mode

Host to Device Bandwidth, 1 Device(s), Paged memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 3377.4

Device to Host Bandwidth, 1 Device(s), Paged memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 3534.5

Device to Device Bandwidth, 1 Device(s)
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 119200.9
[/code]

Secondary card, GTX480 rigged with C2050 firmware:
(Note that the performance of the secondary card used to be identical to the first.)
[code]
Device 1: Tesla C2050
Quick Mode

Host to Device Bandwidth, 1 Device(s), Paged memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 5081.0

Device to Host Bandwidth, 1 Device(s), Paged memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 5037.8

Device to Device Bandwidth, 1 Device(s)
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 119437.2
[/code]
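To put numbers on the difference between the two bandwidthTest runs above, here is a minimal Python sketch that computes the relative change per transfer direction. The figures are copied from the two outputs; the dictionary keys and the helper name `speedup_percent` are only illustrative.

```python
# Bandwidth figures (MB/s) copied from the two bandwidthTest outputs above.
stock = {
    "host_to_device": 3377.4,
    "device_to_host": 3534.5,
    "device_to_device": 119200.9,
}
rigged = {
    "host_to_device": 5081.0,
    "device_to_host": 5037.8,
    "device_to_device": 119437.2,
}


def speedup_percent(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


gains = {name: speedup_percent(stock[name], rigged[name]) for name in stock}

for name, gain in gains.items():
    print(f"{name}: {gain:+.1f}%")
```

The host<->device transfers come out roughly 40 to 50 percent faster, while device-to-device bandwidth is essentially unchanged.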

I then tried our own software and noticed there were no longer any performance issues going on with the cudaMemcpy* calls, and the physics engine ran as expected this time around! Mind you, no stability problems, BSODs or strange issues so far.

What irritates me most is that all these features are exclusively for Tesla while they obviously work on their GeForce counterparts as well. This leads me to believe that there is some deliberate crippling going on here, perhaps for commercial reasons or whatever.

I understand that these modifications are risky, highly experimental, unsupported and will probably result in a halt-and-catch-fire, but for the sake of documentation for those who are interested in the firmware modifications I'll probably post a follow-up on this post explaining more details.
Contact me at http://ijsf.nl/

#1
Posted 03/18/2011 08:30 PM   
I would be quite interested in this as well.

Did your double precision FP performance go up as well? (You can use [url="http://cuda-z.sourceforge.net/"]CUDA-Z[/url] to test)

i7-920 @ 4 GHz (20x200) 1.41250 VCore, 1.45 QPI/UC, 2.02V PLL

24 GB DDR3-1600(2x CMX12GX3M3A1333C9)

eVGA GTX580 SC (980/1960/2350 @ 1.213V)

Mushkin Callisto Deluxe 60 GB

A-DATA S599 120 GB

Western Digital Black 1TB x2 (RAID 0)

Western Digital Blue 500GB x4 (RAID 5)

HP LP3065 + Samsung 245BW

Antec Earthwatts 750W



Cooling: Swiftech MCP35X + MCR420 + MicroRes V2 + Apogee XT + Danger Den GTX580 GPU block + Large tower fan.

#2
Posted 03/18/2011 09:44 PM   
Thanks. I'll do a quick CUDA-Z benchmark tomorrow using the primary card as reference.

P.S. I think I also remember nvidia-smi showing 2 DMA copy engines for the C2050, not sure about that though. Will check.

#3
Posted 03/19/2011 12:46 AM   
I seriously doubt the higher double precision rate will be available just because you are using the Windows TCC driver. The faster host<->device memory bandwidth you get with a GTX 480 using the TCC driver can be achieved in Linux without any driver hacking at all. The bottleneck is the overhead of WDDM for compute tasks, which Linux naturally avoids for all cards, both GeForce and Tesla.

The other limitations on the GTX 480 relative to the Tesla cards, like fast double precision and bidirectional DMA, are almost certainly imposed by on-card firmware and not OS drivers. (However, if you discover I'm wrong, then there will be a lot of happy GTX 400 and 500 series users...)

#4
Posted 03/19/2011 04:38 PM   
Here is some additional benchmark/query information for the two cards.

CUDA-Z 0.5.95 - Primary GTX480 stock:
[code]
Core Information
----------------
Name: GeForce GTX 480
Compute Capability: 2.0
Clock Rate: 1401 MHz
Multiprocessors: 15
Warp Size: 32
Regs Per Block: 32768
Threads Per Block: 1024
Watchdog Enabled: No
Threads Dimentions: 1024 x 1024 x 64
Grid Dimentions: 65535 x 65535 x 65535

Memory Information
------------------
Total Global: 1471.56 MB
Shared Per Block: 48 KB
Pitch: 2.09715e+06 KB
Total Constant: 64 KB
Texture Alignment: 512
GPU Overlap: Yes

Performance Information
-----------------------
Memory Copy
Host Pinned to Device: 5683.38 MB/s
Host Pageable to Device: 3002.56 MB/s
Device to Host Pinned: 5688.28 MB/s
Device to Host Pageable: 3356.25 MB/s
Device to Device: 58350.7 MB/s
GPU Core Performance
Single-precision Float: 1.26361e+06 Mflop/s
Double-precision Float: 168172 Mflop/s
32-bit Integer: 671633 Miop/s
24-bit Integer: 670834 Miop/s
[/code]

CUDA-Z 0.5.95 - Secondary GTX480 rigged:
[code]
Core Information
----------------
Name: Tesla C2050
Compute Capability: 2.0
Clock Rate: 1401 MHz
Multiprocessors: 15
Warp Size: 32
Regs Per Block: 32768
Threads Per Block: 1024
Watchdog Enabled: No
Threads Dimentions: 1024 x 1024 x 64
Grid Dimentions: 65535 x 65535 x 65535

Memory Information
------------------
Total Global: 1535.69 MB
Shared Per Block: 48 KB
Pitch: 2.09715e+06 KB
Total Constant: 64 KB
Texture Alignment: 512
GPU Overlap: Yes

Performance Information
-----------------------
Memory Copy
Host Pinned to Device: 5736.88 MB/s
Host Pageable to Device: 4452.88 MB/s
Device to Host Pinned: 5737.86 MB/s
Device to Host Pageable: 5012.45 MB/s
Device to Device: 57671.8 MB/s
GPU Core Performance
Single-precision Float: 1.25623e+06 Mflop/s
Double-precision Float: 168116 Mflop/s
32-bit Integer: 670805 Miop/s
24-bit Integer: 670030 Miop/s
[/code]

deviceQuery 4.0 - Primary GTX480 stock:
[code]
Device 0: "GeForce GTX 480"
CUDA Driver Version: 4.0
CUDA Runtime Version: 4.0
CUDA Capability Major/Minor version number: 2.0
Total amount of global memory: 1543045120 bytes
(15) Multiprocessors x (32) CUDA Cores/MP: 480 CUDA Cores
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 32768
Warp size: 32
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 65535
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Clock rate: 1.40 GHz
Concurrent copy and execution: Yes
# of Asynchronous Copy Engines: 1
Run time limit on kernels: No
Integrated: No
Support host page-locked memory mapping: Yes
Compute mode: Default (multiple host threads can use this device simultaneously)
Concurrent kernel execution: Yes
Device has ECC support enabled: No
Device is using TCC driver mode: No
[/code]

deviceQuery 4.0 - Secondary GTX480 rigged:
[code]
Device 1: "Tesla C2050"
CUDA Driver Version: 4.0
CUDA Runtime Version: 4.0
CUDA Capability Major/Minor version number: 2.0
Total amount of global memory: 1610285056 bytes
(15) Multiprocessors x (32) CUDA Cores/MP: 480 CUDA Cores
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 32768
Warp size: 32
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 65535
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Clock rate: 1.40 GHz
Concurrent copy and execution: Yes
# of Asynchronous Copy Engines: 2
Run time limit on kernels: No
Integrated: No
Support host page-locked memory mapping: Yes
Compute mode: Default (multiple host threads can use this device simultaneously)
Concurrent kernel execution: Yes
Device has ECC support enabled: No
Device is using TCC driver mode: Yes
[/code]

The immediate performance boost I mentioned earlier evidently only affects pageable memory; pinned transfer rates are about the same on both cards, and double precision rates are identical. Do check out the additional asynchronous copy engine, though (bidirectional DMA?).
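
The same comparison can be run across the CUDA-Z figures. The sketch below is only an illustration: the values are copied from the two reports above, and the 5% noise threshold and the name `changed_metrics` are arbitrary assumptions.

```python
# MB/s and Mflop/s figures copied from the two CUDA-Z reports above.
stock = {
    "Host Pinned to Device": 5683.38,
    "Host Pageable to Device": 3002.56,
    "Device to Host Pinned": 5688.28,
    "Device to Host Pageable": 3356.25,
    "Double-precision Float": 168172.0,
}
rigged = {
    "Host Pinned to Device": 5736.88,
    "Host Pageable to Device": 4452.88,
    "Device to Host Pinned": 5737.86,
    "Device to Host Pageable": 5012.45,
    "Double-precision Float": 168116.0,
}

NOISE_THRESHOLD = 0.05  # assumption: ignore changes smaller than 5%


def changed_metrics(before, after, threshold=NOISE_THRESHOLD):
    """Return {metric: after/before ratio} for metrics that moved more than the threshold."""
    result = {}
    for name, old in before.items():
        ratio = after[name] / old
        if abs(ratio - 1.0) > threshold:
            result[name] = ratio
    return result


significant = changed_metrics(stock, rigged)
for name, ratio in significant.items():
    print(f"{name}: x{ratio:.2f}")
```

Only the two pageable transfer metrics pass the threshold.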

I may do some additional forensic research on the firmwares and see what happens.

#5
Posted 03/19/2011 08:09 PM   
[quote name='seibert' date='19 March 2011 - 11:38 AM' timestamp='1300552720' post='1210054']
The other limitations on the GTX 480 relative to the Tesla cards, like fast double precision and bidirectional DMA, are almost certainly imposed by on-card firmware and not OS drivers. (However, if you discover I'm wrong, then there will be a lot of happy GTX 400 and 500 series users...)
[/quote]

As I understand it, he is modifying the firmware so that the card is detected as a C2050. Since the driver doesn't seem to unlock faster double-precision FP, I guess I have to agree with you that it is not in the driver :(

But I'm still interested in his technique for modifying the card identification, since it may give hints on how to uncripple the rest of the features.

#6
Posted 03/19/2011 09:49 PM   
Alright, so here's a quick tutorial on how I modified my GTX480 firmware (PCI Expansion ROM). Please understand that this is UNSUPPORTED, UNTESTED and MAY VOID YOUR WARRANTY, so proceed AT YOUR OWN RISK. Changing your firmware (and especially the softstraps) can potentially render your card useless, in which case you may have to resort to hardware modifications.

Note that this tutorial assumes a dual-card setup like mine: a card in TCC mode no longer provides graphics output, so you need the primary card for display, and you can also use it to recover easily from a broken firmware on the secondary card.

A short rundown of my own workstation:
[code]
Operating System: Windows 7 Professional, 64-bit
Driver version: 270.32
CPU: Intel i7 920 @ 2.67GHz
Bus: PCI Express x16 Gen2

Primary card: Club3D GeForce GTX 480 1536MB GDDR5 PCI E 2.0
Secondary card: Club3D GeForce GTX 480 1536MB GDDR5 PCI E 2.0
[/code]
Recommended firmware modification tools (you're advised to check out the documentation of each of these):
[list]
[*][b]NVIDIA Firmware Update Utility v5.95[/b]: http://downloads.guru3d.com/NVFlash-5.95.0.1-download-2590.html
This (official) nvflash tool works under Windows and allows you to do firmware manipulation. (Has a couple of interesting undocumented features as well.)
[*][b]NVIDIA BIOS Editor v6.01[/b]: http://www.mvktech.net/content/view/4875/143/
This (unofficial) tool is called NiBiTor and has some basic editing functionality for NVIDIA firmwares.
[*][b]Your favourite hex editor[/b] (I prefer HxD)
[/list]
In short, the goal of this firmware modification is to change the PCI Device ID of the card so that the NVIDIA driver detects it as a Tesla-series card, enabling additional functionality that's otherwise disabled. Specifically, I want to change my Device ID from 06C0 (GeForce GTX 480) to 06D1 (Tesla C2050). Conveniently, I have an HP C2050 firmware (version 70.00.2B.00.0E) lying around to do some comparisons.

Let's query the devices:
[code]
> nvflash -a

NVIDIA Firmware Update Utility (Version 5.95)

NVIDIA display adapters present in system:
<0> GeForce GTX 480 (10DE,06C0,10DE,075F) H:--:NRM B:02,PCI,D:00,F:00
<1> GeForce GTX 480 (10DE,06C0,10DE,075F) H:--:NRM B:03,PCI,D:00,F:00
[/code]
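If you want to check the adapter list from a script, the lines can be parsed as in the minimal Python sketch below. It assumes the four hex values in parentheses are vendor ID, device ID, subsystem vendor ID and subsystem device ID (which matches the "Subsystem ID : 10DE-075F" line nvflash prints when saving the image); `parse_adapters` is a hypothetical helper, not part of nvflash.

```python
import re

# Matches lines such as:
# <0> GeForce GTX 480 (10DE,06C0,10DE,075F) H:--:NRM B:02,PCI,D:00,F:00
ADAPTER_RE = re.compile(
    r"^<(\d+)>\s+(.+?)\s+"
    r"\(([0-9A-Fa-f]{4}),([0-9A-Fa-f]{4}),([0-9A-Fa-f]{4}),([0-9A-Fa-f]{4})\)"
)


def parse_adapters(text):
    """Extract index, name and PCI IDs from `nvflash -a` style output."""
    adapters = []
    for line in text.splitlines():
        match = ADAPTER_RE.match(line.strip())
        if match is None:
            continue  # banner and header lines do not match
        index, name, vendor, device, sub_vendor, sub_device = match.groups()
        adapters.append({
            "index": int(index),
            "name": name,
            "vendor_id": int(vendor, 16),
            "device_id": int(device, 16),
            "subsystem_vendor_id": int(sub_vendor, 16),
            "subsystem_device_id": int(sub_device, 16),
        })
    return adapters


sample = """NVIDIA display adapters present in system:
<0> GeForce GTX 480 (10DE,06C0,10DE,075F) H:--:NRM B:02,PCI,D:00,F:00
<1> GeForce GTX 480 (10DE,06C0,10DE,075F) H:--:NRM B:03,PCI,D:00,F:00
"""
adapters = parse_adapters(sample)
for adapter in adapters:
    print(adapter["index"], adapter["name"], f"{adapter['device_id']:04X}")
```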
The card I'm interested in is the one hanging on bus id 3 or index 1, so let's save the firmware to a file called firmware.rom:
[code]
> nvflash --index=1 -b firmware.rom

NVIDIA Firmware Update Utility (Version 5.95)

Adapter: GeForce GTX 480 (10DE,06C0,10DE,075F) H:--:NRM B:03,PCI,D:00,F:00

The display may go *BLANK* on and off for up to 10 seconds during access to the
EEPROM depending on your display adapter and output device.

Identifying EEPROM...
EEPROM ID (C2,2011) : MX MX25L1005 2.7-3.6V 1024Kx1S, page
Reading adapter firmware image...
Image Size : 62464 bytes
Version : 70.00.21.00.02
~CRC32 : D040F75A
Subsystem ID : 10DE-075F
Hierarchy ID : Normal Board
Chip SKU : 375-0
Project : 1022-0000
CDP : N/A
Build Date : 04/14/10
Modification Date : 04/14/10
Saving of image completed.
[/code]
Open up the firmware with your hex editor, as we will be changing the following:
[list]
[*]Softstraps. The firmware contains a mechanism called softstraps, explained below, which allows the firmware to override certain chip settings (hardstraps) including the PCI Device ID. Manipulation of softstraps is done by using a combination of two sets of 32-bit AND + OR masks. Nvflash has an option to change the straps, which we will be using, though we will first need to read out the original straps from the firmware:
AND mask 0 location: 00000058, little endian
OR mask 0 location: 0000005C, little endian
AND mask 1 location: 00000060, little endian
OR mask 1 location: 00000064, little endian
(Additionally, 00000068 and 0000006C contain the checksums for the softstraps.)
[*]Regular PCI Device ID, location: 0000018E, little endian
[*]For the sake of authenticity, the board ID/boot strings and firmware versions will also be modified.
Board boot string location: 00000086
Board ID string location: 00000122
Firmware version location: 00000238, little endian
[/list]
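Reading the four softstrap masks out of a saved image can be sketched in Python as follows. The offsets are the ones listed above and apply to this particular GTX 480 image; other firmware versions may place them elsewhere. `read_softstraps` is a hypothetical helper, and the demo buffer is synthetic, filled with the GTX 480 mask values quoted later in this post.

```python
import struct

SOFTSTRAP_OFFSET = 0x58  # AND 0, OR 0, AND 1, OR 1 follow each other from here


def read_softstraps(rom):
    """Return the four 32-bit little-endian softstrap masks from a ROM image."""
    and0, or0, and1, or1 = struct.unpack_from("<4I", rom, SOFTSTRAP_OFFSET)
    return {"and0": and0, "or0": or0, "and1": and1, "or1": or1}


# Synthetic stand-in for firmware.rom so the sketch runs without the real file.
demo_rom = bytearray(0x70)
struct.pack_into("<4I", demo_rom, SOFTSTRAP_OFFSET,
                 0x7FFC3FFF, 0x00004000, 0x7FF1FFFF, 0x80020000)

straps = read_softstraps(bytes(demo_rom))
for name, value in straps.items():
    print(f"{name}: 0x{value:08X}")
```

For a real image, pass the result of `open("firmware.rom", "rb").read()` instead of the demo buffer.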
I'll start by modifying the firmware's PCI Device ID value at 0000018E, located right after the PCI Vendor ID for NVIDIA (0x10DE) within the PCI block:
[code]
Little endian!

00000180: 91 DF AA 8C 9A F2 F5 FF 50 43 49 52 DE 10 C0 06

(NEW)
00000180: 91 DF AA 8C 9A F2 F5 FF 50 43 49 52 DE 10 D1 06
[/code]
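The same edit can be done from a script instead of a hex editor. This is a minimal Python sketch under the assumption that the "PCIR" signature sits at 0x188, as in the dump above; `patch_device_id` is a hypothetical helper, and it refuses to touch the image if the signature or the NVIDIA vendor ID is not where it expects them.

```python
import struct

PCIR_OFFSET = 0x188     # position of the "PCIR" signature in this particular image
VENDOR_NVIDIA = 0x10DE


def patch_device_id(rom, new_device_id, pcir_offset=PCIR_OFFSET):
    """Return (patched_image, old_device_id); IDs are 16-bit little endian."""
    image = bytearray(rom)
    if bytes(image[pcir_offset:pcir_offset + 4]) != b"PCIR":
        raise ValueError("PCIR signature not found at the expected offset")
    vendor, old_device_id = struct.unpack_from("<HH", image, pcir_offset + 4)
    if vendor != VENDOR_NVIDIA:
        raise ValueError(f"unexpected vendor ID 0x{vendor:04X}")
    struct.pack_into("<H", image, pcir_offset + 6, new_device_id)
    return bytes(image), old_device_id


# Synthetic image: zero padding followed by the 16 bytes shown at 00000180 above.
row = bytes.fromhex("91 DF AA 8C 9A F2 F5 FF 50 43 49 52 DE 10 C0 06")
demo_rom = bytes(0x180) + row

patched, old_id = patch_device_id(demo_rom, 0x06D1)
print(f"old device ID: {old_id:04X}")
print("new bytes at 0x18E:", patched[0x18E:0x190].hex())
```

The image checksum still has to be fixed afterwards, exactly as with the manual edit.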
Next up, I'll change the board boot string at 00000086 and board ID string at 00000122 into their C2050 counterparts:
[code]
00000086: GF100 P1022 SKU 0000 VGA BIOS
00000122: GF100 Board - 10220000

(NEW)
00000086: GF100 P1030 SKU 0200 VGA BIOS
00000122: GF100 Board - 10300200
[/code]
Then, the firmware version:
[code]
Little endian!

00000230: 00 00 00 00 00 00 00 00 00 21 00 70 02 00 00 00

(NEW)
00000230: 00 00 00 00 00 00 00 00 00 2B 00 70 0E 00 00 00
[/code]
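To see how those bytes map to the version string nvflash reports, here is a small Python sketch. The layout (a 32-bit little-endian word at 0x238 followed by one extra byte) is inferred from the two dumps above and is an assumption, not a documented format; `firmware_version` is a hypothetical helper.

```python
import struct

VERSION_OFFSET = 0x238


def firmware_version(rom, offset=VERSION_OFFSET):
    """Decode the version as AA.BB.CC.DD.EE from a 32-bit LE word plus one byte."""
    word, extra = struct.unpack_from("<IB", rom, offset)
    parts = list(word.to_bytes(4, "big")) + [extra]
    return ".".join(f"{part:02X}" for part in parts)


# Synthetic images built from the two rows shown at 00000230 above.
old_row = bytes.fromhex("00 00 00 00 00 00 00 00 00 21 00 70 02 00 00 00")
new_row = bytes.fromhex("00 00 00 00 00 00 00 00 00 2B 00 70 0E 00 00 00")

old_version = firmware_version(bytes(0x230) + old_row)
new_version = firmware_version(bytes(0x230) + new_row)
print(old_version, "->", new_version)
```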
That's it for the firmware image modifications, so be sure to save your modified firmware. The checksum of the modified firmware image still needs to be recalculated. You can do this by opening your modified firmware in NiBiTor (ignore the warnings about unknown device IDs) and saving it to another file. The "Integrity" icon should be green for your final firmware file. (The checksum can be found in the Adv. Info tab, if you're interested.)
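
As a rough sanity check outside NiBiTor, you can test whether an image sums to zero. The sketch below ASSUMES the image follows the legacy PCI option ROM convention (0x55 0xAA header, size in 512-byte units in byte 2, all bytes summing to 0 modulo 256). The post does not say how NiBiTor computes or stores its checksum, so treat this purely as an illustration; the helper names are hypothetical.

```python
def looks_like_option_rom(image):
    """True if the image starts with the legacy option ROM signature 55 AA."""
    return len(image) >= 3 and image[0] == 0x55 and image[1] == 0xAA


def declared_size(image):
    """Image size in bytes according to header byte 2 (units of 512 bytes)."""
    return image[2] * 512


def checksum_residue(image):
    """8-bit sum of the declared image; 0 means the checksum is consistent."""
    return sum(image[:declared_size(image)]) & 0xFF


# Synthetic 512-byte image: 0x55 + 0xAA + 0x01 is 0x100, so it sums to 0 modulo 256.
demo = bytearray(512)
demo[0:3] = b"\x55\xAA\x01"
clean_residue = checksum_residue(demo)

# Changing any byte breaks the sum until some other byte compensates for it.
tampered = bytearray(demo)
tampered[0x10] = 0x06
tampered_residue = checksum_residue(tampered)
compensation = (-tampered_residue) & 0xFF  # value to add somewhere to restore a zero sum

print(clean_residue, tampered_residue, hex(compensation))
```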

We're now ready to flash the modified firmware onto the secondary card. I'll be changing the softstraps later on by using nvflash separately, which makes it a lot easier. Let's use nvflash with a couple of options to make it clear that we totally want it to override all kinds of settings we really shouldn't be overriding and flash the modified firmware to the card. We'll also perform a full erase of the EEPROM first just to be sure:
[code]
> nvflash --index=1 --eraseeeprom

> nvflash --index=1 --overridesub --overrideboard --auto --noconfirm -5 -6 firmware-new.rom

NVIDIA Firmware Update Utility (Version 5.95)

Checking for matches between display adapter(s) and image(s)...

Adapter: GeForce GTX 480 (10DE,06C0,10DE,075F) H:--:NRM B:03,PCI,D:00,F:00

WARNING: None of the firmware image compatible PCI Device ID's
match the PCI Device ID of the adapter.
Adapter PCI Device ID: 06C0
Firmware image PCI Device ID: 06D1
PCI Device ID override confirmation skipped.
Overriding GPU mismatch
Current - Version:70.00.21.00.02 ID:10DE:06C0:10DE:075F
GF100 Board - 10220000 (Normal Board)
Replace with - Version:70.00.2B.00.0E ID:10DE:06D1:10DE:075F
GF100 Board - 10300200 (Normal Board)
The display may go *BLANK* on and off for up to 10 seconds or more during the up
date process depending on your display adapter and output device.

Identifying EEPROM...
EEPROM ID (C2,2011) : MX MX25L1005 2.7-3.6V 1024Kx1S, page
NOTE: Preserving board settings in preservation slot 6
NOTE: Preserving board settings in preservation slot 7
Clearing original firmware image...
.
Storing updated firmware image...
..
Verifying update...
Update successful.
[/code]
Note that we're using some undocumented features here (-5 -6 disables a couple of security checks for the overrides). If this doesn't work for you, you may want to check out -h or try these undocumented options:
[code]
--eraseeeprom erases all data from the EEPROM
--debug gives a load of debug information during the upgrade
--refreshstraps refreshes the softstraps on the card
[/code]
Now it's time to change the softstraps. Let's first do a binary comparison of the two PCI Device IDs:
[code]
GTX480  06C0  0000011011000000
C2050   06D1  0000011011010001
[/code]
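The differing bits can also be found with an XOR, as in this short Python sketch (`differing_bits` is just an illustrative name):

```python
GTX480_ID = 0x06C0
C2050_ID = 0x06D1


def differing_bits(a, b):
    """Return the bit positions (0 = least significant) where a and b differ."""
    diff = a ^ b
    return [bit for bit in range(diff.bit_length()) if (diff >> bit) & 1]


print(f"GTX480 {GTX480_ID:04X} {GTX480_ID:016b}")
print(f"C2050  {C2050_ID:04X} {C2050_ID:016b}")
changed = differing_bits(GTX480_ID, C2050_ID)
print("differing bits:", changed)
```

Only bits 0 and 4 differ, and both need to go from 0 to 1.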
I read out my AND/OR masks and compare them to the masks of the C2050 firmware. Turns out AND/OR 0 are different, but AND/OR 1 are identical:
[code]
Hex:     AND mask 0   OR mask 0    AND mask 1   OR mask 1    CHECKSUM!
C2050    0x6FFC03FF   0x10000400   0x7FF1FFFF   0x80020000
GTX480   0x7FFC3FFF   0x00004000   0x7FF1FFFF   0x80020000
(ignore first bit! always set to zero)
[/code]
Softstraps in the firmware are applied over hardstraps on the card, where the AND and OR masks control how the hardstraps are modified. Masks should always be below or equal to 0x7FFFFFFF. The mechanism works as follows:
[code]
( ( [hardstraps] & [AND mask] ) | [OR mask] ) = final straps
[/code]
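As a worked example of the formula, here is a minimal Python sketch. The masks are the GTX 480 values from the table above; the hardstrap inputs are made-up test values, since the real hardstraps of the card are not known here.

```python
def apply_softstraps(hardstraps, and_mask, or_mask):
    """final straps = (hardstraps & AND mask) | OR mask"""
    return (hardstraps & and_mask) | or_mask


GTX480_AND0 = 0x7FFC3FFF
GTX480_OR0 = 0x00004000

# Hypothetical hardstrap values, chosen only to show what the masks do.
all_ones = apply_softstraps(0xFFFFFFFF, GTX480_AND0, GTX480_OR0)
all_zeros = apply_softstraps(0x00000000, GTX480_AND0, GTX480_OR0)

print(f"hardstraps all ones  -> 0x{all_ones:08X}")
print(f"hardstraps all zeros -> 0x{all_zeros:08X}")
```

A bit that is 0 in the AND mask is forced off regardless of the hardstraps, and a bit that is 1 in the OR mask is forced on.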
In practice, the AND mask allows you to disable certain hardstraps while the OR mask allows you to enable specific straps.

So far, I've managed to figure out the functionality of a few of the strap bits (any additional information is welcome!).
[code]
straps 0:
-xx+xxxx xxxxxxxx xx++++xx xxxxxxxx
   ^                ^^^^
   |                ||||-pci dev id[0]
   |                |||--pci dev id[1]
   |                ||---pci dev id[2]
   |                |----pci dev id[3]
   |---------------------pci dev id[4]

- cannot be set, always 0
[/code]
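Reading the diagram with the rightmost character as bit 0, the five device ID bits sit at strap bits 10 to 13 and 28. A small sketch of that mapping, assuming this reading of the diagram is right (the function names are made up for illustration):

```python
# Strap-0 bit positions for PCI Device ID bits 0..4, read off the diagram above
# (bit 0 is the rightmost character of the diagram).
DEVID_STRAP_BITS = (10, 11, 12, 13, 28)

def devid_bits_to_straps(devid_low5: int) -> int:
    """Spread the low five device-ID bits onto their strap-0 positions."""
    value = 0
    for id_bit, strap_bit in enumerate(DEVID_STRAP_BITS):
        if devid_low5 >> id_bit & 1:
            value |= 1 << strap_bit
    return value

def straps_to_devid_bits(straps0: int) -> int:
    """Collect the five device-ID bits back out of a strap-0 value."""
    value = 0
    for id_bit, strap_bit in enumerate(DEVID_STRAP_BITS):
        if straps0 >> strap_bit & 1:
            value |= 1 << id_bit
    return value

print(f"0x{devid_bits_to_straps(0x06D1 & 0x1F):08X}")  # 0x10000400
```

The result matches OR mask 0 of the C2050 firmware, which is a decent sanity check on the bit positions.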
So in my case, it's just a matter of ensuring that bits 0 and 4 of the PCI Device ID in straps 0 are set to 1, which I can do by adding the corresponding bits to OR mask 0. I'll start from the original OR mask of my GTX 480, keep the bit that is already set there (I don't know what it does, so I'll leave it alone to be safe) and enable the two bits for the PCI Device ID:
[code]
OR mask 0:
GTX480 -0000000 00000000 01000000 00000000
NEW    -0010000 00000000 01000100 00000000
[/code]
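Be sure to work this out for your own card. I also clear those same two bits in AND mask 0 (0x7FFC3FFF becomes 0x6FFC3BFF) and clear the MSB of OR mask 1 so that it stays within 0x7FFFFFFF. Once you have your masks, use nvflash to apply them: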
[code]
(--straps [AND mask 0] [OR mask 0] [AND mask 1] [OR mask 1])
>nvflash --index=1 --straps 0x6FFC3BFF 0x10004400 0x7FF1FFFF 0x00020000
[/code]
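The mask values in the command can be derived mechanically from the original GTX 480 masks. The sketch below is an assumption-laden illustration: `retarget_masks` is a made-up helper, it only handles ID bits that must change from 0 to 1, and it assumes everything above ID bit 4 is identical between the two IDs (true for 06C0 and 06D1):

```python
DEVID_STRAP_BITS = (10, 11, 12, 13, 28)  # strap-0 positions of device-ID bits 0..4
MASK_LIMIT = 0x7FFFFFFF

def retarget_masks(and0, or0, and1, or1, old_id, new_id):
    """Clear each device-ID bit that must become 1 in AND mask 0 and set it in OR mask 0."""
    for id_bit, strap_bit in enumerate(DEVID_STRAP_BITS):
        want = new_id >> id_bit & 1
        have = old_id >> id_bit & 1
        if want and not have:
            and0 &= ~(1 << strap_bit)
            or0 |= 1 << strap_bit
    # Clear the MSB of every mask so each stays within 0x7FFFFFFF.
    return tuple(m & MASK_LIMIT for m in (and0, or0, and1, or1))

masks = retarget_masks(0x7FFC3FFF, 0x00004000, 0x7FF1FFFF, 0x80020000,
                       old_id=0x06C0, new_id=0x06D1)
print("nvflash --index=1 --straps " + " ".join(f"0x{m:08X}" for m in masks))
```

For the GTX 480 masks this prints the same four values as the command above.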
nvflash will directly change the softstraps on the card's firmware. This means that if you save the firmware back to a file, you should be able to see that the softstraps at the appropriate locations in the firmware have changed (including the checksums at 00000068).

The secondary card should now contain the modified firmware + modified softstraps, but in order to see the changes, you'll now have to reboot your computer. The next boot in Windows will likely result in the secondary card being detected as new hardware, after which Windows will attempt to install the "appropriate" drivers (never works for me). Make sure that you immediately re-install the latest NVIDIA drivers (you don't need the special Tesla drivers) and do another reboot after the installation is complete.

Run nvflash again to verify that the PCI Device ID has indeed changed:
[code]
> nvflash -a

NVIDIA Firmware Update Utility (Version 5.95)

NVIDIA display adapters present in system:
<0> GeForce GTX 480 (10DE,06C0,10DE,075F) H:--:NRM B:02,PCI,D:00,F:00
<1> Tesla C2050 (10DE,06D1,10DE,075F) H:--:NRM B:03,PCI,D:00,F:00
[/code]
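If you'd rather check the listing from a script, the adapter lines are easy to parse. This is a rough sketch that assumes the first two fields in the parentheses are the vendor ID and device ID, as the output above suggests; `parse_adapters` is a hypothetical helper, not an nvflash feature:

```python
import re

ADAPTER_RE = re.compile(r"<(\d+)>\s+(.+?)\s+\(([0-9A-F]{4}),([0-9A-F]{4}),")

def parse_adapters(listing: str):
    """Return (index, name, vendor_id, device_id) for each adapter line."""
    return [(int(i), name, vendor, device)
            for i, name, vendor, device in ADAPTER_RE.findall(listing)]

sample = """\
<0> GeForce GTX 480 (10DE,06C0,10DE,075F) H:--:NRM B:02,PCI,D:00,F:00
<1> Tesla C2050 (10DE,06D1,10DE,075F) H:--:NRM B:03,PCI,D:00,F:00
"""
for index, name, vendor, device in parse_adapters(sample):
    print(index, name, device)
```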
If all went well, you should now be able to enable TCC by using the nvidia-smi tool located at C:\Program Files\NVIDIA Corporation\NVSMI. Note that with the 270.32 (CUDA 4.0rc) drivers, nvidia-smi is broken for me (known issue) and I had to grab the nvidia-smi tool from 263.06 (Tesla) in order to get things running:
[code]
> nvidia-smi -g 0

==============NVSMI LOG==============

Timestamp : 03/20/2011 12:53:07 AM
Driver Version : 270.32

GPU 0:
Product Name : Tesla C2050
PCI Device/Vendor ID : 6d110de
PCI Location ID : 0:3:0
Board Serial : 6182738065
Display : Not connected
Temperature : 46 C
Utilization
GPU : 0%
Memory : 0%
...

> nvidia-smi -g 0 -dm 1
TCC enabled for device 0
[/code]
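The "PCI Device/Vendor ID" field in that log packs both IDs into one hex number. A minimal sketch for splitting it, assuming the device ID occupies the upper 16 bits and the vendor ID the lower 16 (which agrees with the nvflash listing); the helper name is made up:

```python
def split_device_vendor(field: str) -> tuple[int, int]:
    """Split a combined ID such as '6d110de' into (device_id, vendor_id)."""
    combined = int(field, 16)
    return combined >> 16, combined & 0xFFFF

device, vendor = split_device_vendor("6d110de")
print(f"device {device:04X}, vendor {vendor:04X}")  # device 06D1, vendor 10DE
```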
Keep in mind that you may have to reboot in order for TCC to be enabled. If everything goes smoothly, your Tesla card should not be showing up in the NVIDIA Control Panel, and the deviceQuery SDK sample should be outputting the following information:

[code]
Device 1: "Tesla C2050"
CUDA Driver Version: 4.0
CUDA Runtime Version: 4.0
...
Device is using TCC driver mode: Yes
[/code]
Congratulations, you're now running your secondary card in TCC! Make sure that both of your cards are still functioning correctly, and don't forget to post a response!

P.S. If you're interested in helping to figure out the meaning of the strap bits, one way of doing this is by trial and error: set all strap bits to 0 by using the AND mask (0x00000000), then add the appropriate PCI Device ID bits through the OR mask to make sure your card is still detected properly, and try setting a single bit to 1. Reboot, perform a test/query (CUDA-Z or deviceQuery) and see what changes, then start over again and continue with the next bit.
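The mask pairs for such a sweep could be generated along these lines. This is only a sketch of the bookkeeping under the assumptions above (ID bits at strap bits 10 to 13 and 28, bit 31 never set); `probe_masks` is a hypothetical helper and nothing here talks to the card:

```python
DEVID_STRAP_BITS = (10, 11, 12, 13, 28)  # strap-0 positions of device-ID bits 0..4

def probe_masks(devid_or_bits: int):
    """Yield (bit, AND mask 0, OR mask 0) for each strap bit to test."""
    for bit in range(31):  # bit 31 cannot be set
        if bit in DEVID_STRAP_BITS:
            continue  # these bits carry the device ID, leave them alone
        yield bit, 0x00000000, devid_or_bits | (1 << bit)

# 0x10000400 holds the ID bits of the C2050 (06D1) as found in its OR mask 0.
for bit, and0, or0 in probe_masks(0x10000400):
    print(f"bit {bit:2d}: AND 0x{and0:08X}  OR 0x{or0:08X}")
```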

[b]EDIT[/b]: Added instructions for working softstraps modifications. Added softstraps documentation.

Contact me at http://ijsf.nl/

#7
Posted 03/20/2011 12:10 AM   
Thanks! I am archiving your post because I think that the Ministry of Truth will have this redacted ASAP.

#8
Posted 03/20/2011 01:16 AM   
Thank you for a well written guide!

Just wanted to point out one thing: the memory is little-endian, so your byte/word orders are a little bit confusing to figure out at first. (I'm a long time electrical engineer, so I'm comfortable with hex, although I haven't really used any real hex editors, so my apologies if I was too presumptuous about how these things should be displayed)

Anyways, my hex below is in actual word order with the MSB on the left.

Just for your reference

GTX580 (in my sig):

AND MASK 0: 0xFFFFFFFF
OR MASK 0: 0x00000000

AND MASK 1: 0x7FFFFFFF
OR MASK 1: 0x80000000

Quadro 6000:

AND MASK 0: 0x7FFC3FFF
OR MASK 0: 0x00004000

AND MASK 1: 0x7FF0FFFF
OR MASK 1: 0x80030000

I'll be working on converting my GTX580 to a Quadro 6000 tomorrow.

Edit: Fix endianness
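To make the byte-order point concrete: the four bytes a hex editor shows for the GTX 480 AND mask 0 are FF 3F FC 7F, and reading them as a little-endian 32-bit word gives the value with the MSB on the left. A minimal Python illustration:

```python
raw = bytes.fromhex("FF 3F FC 7F")  # byte order as stored in the firmware file
value = int.from_bytes(raw, "little")
print(f"0x{value:08X}")  # 0x7FFC3FFF

# Going the other way reproduces the on-disk byte order.
assert value.to_bytes(4, "little") == raw
```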

i7-920 @ 4 GHz (20x200) 1.41250 VCore, 1.45 QPI/UC, 2.02V PLL

24 GB DDR3-1600(2x CMX12GX3M3A1333C9)

eVGA GTX580 SC (980/1960/2350 @ 1.213V)

Mushkin Callisto Deluxe 60 GB

A-DATA S599 120 GB

Western Digital Black 1TB x2 (RAID 0)

Western Digital Blue 500GB x4 (RAID 5)

HP LP3065 + Samsung 245BW

Antec Earthwatts 750W



Cooling: Swiftech MCP35X + MCR420 + MicroRes V2 + Apogee XT + Danger Den GTX580 GPU block + Large tower fan.

#9
Posted 03/20/2011 03:34 AM   
[quote name='ijsfz' date='19 March 2011 - 07:10 PM' timestamp='1300579804' post='1210250']
OFFSET HEX BINARY
00000058: FF 3F FC 7F 11111111 [b]00111111[/b] 11111100 01111111 <--- GTX480 (ID 06C0) AND mask 0
00000058: FF 03 FC 6F 11111111 [b]00000011[/b] 11111100 01101111 <--- C2050 (ID 06D1) AND mask 0
[b]||||[/b] |
(NEW) 3B 6F [b]00111011[/b] 01101111
[b] ^ ^ [/b] ^
[/quote]

I'm confused by this step, you seem to be ANDing the two AND masks together, but the middle 3 digits should turn out 0 no?


#10
Posted 03/20/2011 03:58 AM   
[quote name='hocheung20' date='20 March 2011 - 04:58 AM' timestamp='1300593486' post='1210320']
I'm confused by this step, you seem to be ANDing the two AND masks together, but the middle 3 digits should turn out 0 no?
[/quote]

No, in my case I just took the GTX 480 AND mask and put the bits to 0 that would later be OR'ed to 1 (through the OR mask). As far as I can tell, the firmware takes whatever the strap values are, applies the AND mask, and then applies the OR mask, so for case 0:

(((hard straps 0) & AND mask 0) | OR mask 0)

You seem to want to go from a GTX 580 (1080) to a Quadro 6000 (06D8 or 06DC). I'm not sure how far the soft straps will allow you to go here.


#11
Posted 03/20/2011 12:41 PM   
Unfortunately I'm having a bit of trouble finding an explanation as to why the values I used before are working. I've tried several other combinations of masks, but they all result in the device being detected as 06C0 (GTX 480).

The combinations I've tried for AND mask 0:
[code]
11111111 00111011 11111100 01101111 works

11111111 11111111 11111100 01101111 fails
11111111 00110011 11111100 01101111 fails
11111111 00100011 11111100 01101111 fails
11111111 00011011 11111100 01101111 fails
11111111 00000011 11111100 01101111 fails
11111111 00000000 11111100 01101111 fails
[/code]
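For comparison with the hex values used elsewhere in the thread, the rows above can be converted to 32-bit masks. This sketch assumes each row lists the four bytes in file (little-endian) order, which fits the working row matching the 0x6FFC3BFF used with nvflash; `row_to_mask` is a made-up helper:

```python
def row_to_mask(row: str) -> int:
    """Convert four space-separated binary bytes (file order) to a 32-bit mask."""
    raw = bytes(int(group, 2) for group in row.split())
    return int.from_bytes(raw, "little")

working = "11111111 00111011 11111100 01101111"
mask = row_to_mask(working)
print(f"0x{mask:08X}")  # 0x6FFC3BFF

# Strap bits that this AND mask forces to 0 before the OR mask is applied.
cleared = [bit for bit in range(32) if not mask >> bit & 1]
print(cleared)  # [10, 14, 15, 16, 17, 28, 31]
```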

I suspect that the two 32-bit values located at 0x68 and 0x6C serve as some kind of checksum for the softstraps. They seem to change along with the softstraps across different firmwares.

[b]EDIT: You might as well save yourself the trouble and just use nvflash's straps option to change the straps properly, instead of editing the straps in the firmware.[/b] You can still read out the strap values from your firmware though and probably use that as a guideline to do the modifications. As pointed out before, they're all little endian.

[code]
Note: --straps (AND mask 0) (OR mask 0) (AND mask 1) (OR mask 1), and all masks should be below 0x7FFFFFFF

>nvflash --index=1 --straps 0x6FFC3BFF 0x10004400 0x7FF1FFFF 0x00020000

NVIDIA Firmware Update Utility (Version 5.95)

Adapter: GeForce GTX 480 (10DE,06C0,10DE,075F) H:--:NRM B:03,PCI,D:00,F:00

The display may go *BLANK* on and off for up to 10 seconds during access to the
EEPROM depending on your display adapter and output device.

Identifying EEPROM...
EEPROM ID (C2,2011) : MX MX25L1005 2.7-3.6V 1024Kx1S, page
Reading adapter firmware image...
Erasing EEPROM...
.
Storing updated firmware image...

Verifying update...
Update successful.
[/code]

To verify, you can then download the firmware with nvflash (-b) and the straps should've been changed properly.
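Reading the masks back out of a saved image takes only a few lines. The sketch below uses the offsets from the guide (0x58 onwards, four consecutive little-endian 32-bit words); the helper name and the in-memory test image are illustrative assumptions, and the exact values stored by nvflash in a real dump may differ:

```python
import struct

# Offsets as given in the guide; each mask is a 32-bit little-endian value.
SOFTSTRAP_OFFSET = 0x58
MASK_NAMES = ("AND mask 0", "OR mask 0", "AND mask 1", "OR mask 1")

def read_softstraps(image: bytes) -> dict[str, int]:
    """Decode the four softstrap masks from a firmware image."""
    values = struct.unpack_from("<4I", image, SOFTSTRAP_OFFSET)
    return dict(zip(MASK_NAMES, values))

# Synthetic image for illustration only; a real dump would come from
# open("firmware.rom", "rb").read() after saving with nvflash -b.
fake = bytearray(0x70)
struct.pack_into("<4I", fake, SOFTSTRAP_OFFSET,
                 0x6FFC3BFF, 0x10004400, 0x7FF1FFFF, 0x00020000)
for name, value in read_softstraps(bytes(fake)).items():
    print(f"{name}: 0x{value:08X}")
```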


#12
Posted 03/21/2011 02:11 PM   
Alright, I managed to find a proper way to flash the softstraps by using nvflash and found out where the relevant PCI Device ID bits are encoded in the softstraps. Be sure to take a look at the updated guide!


#13
Posted 03/22/2011 07:20 PM   
[quote name='ijsfz' date='22 March 2011 - 02:20 PM' timestamp='1300821614' post='1211767']
Alright, I managed to find a proper way to flash the softstraps by using nvflash and found out where the relevant PCI Device ID bits are encoded in the softstraps. Be sure to take a look at the updated guide!
[/quote]

I wasn't having much luck changing my device ID manually through the firmware. Maybe it was the checksum issue you were talking about. I also noticed the strap flashing option on nvflash. I will give it a go tonight.

Any ideas on what strap 1 does?


#14
Posted 03/22/2011 09:42 PM   
I'll try to find out sometime tomorrow. If you do try, don't forget to clear the MSB of the masks so they stay below 0x7FFFFFFF.


#15
Posted 03/22/2011 11:40 PM   