PyCUDA -> C/CUDA
Hello,

We are looking for professional software developers who can translate existing PyCUDA code to pure C/CUDA or, preferably, to OpenCL. Our working source code comprises approximately five thousand lines of PyCUDA and implements an image processing application. The emphasis is on platform-independent, robust, and fast execution. In particular, the implementation should also run on a CPU and make use of common multicore architectures. As we plan to commercialize the application, the code should not use any proprietary libraries. The code should be developed with unit tests and contain enough documentation to understand and modify the source code.

Some background on the project: we are two PhD students (plus an MBA for the business side) working in advanced image processing and computational photography. We are thinking of founding a startup to commercialize one of our latest developments. Since our financial means are limited, we are trying to find out how much of the code development can be outsourced and how much we have to do ourselves. Unfortunately, both of us lack expertise in professional software development, which is why we are seeking professional assistance to lay the foundation for a successful company.

Requirements and features that the implementation should support:
- Intelligent GPU memory handling: images larger than the available memory should be processable
  - This can be achieved by batching the image
  - The batch size should depend on the available memory
  - To avoid performance loss, the next batch should already be transferred asynchronously while the current batch is being processed
- In absence of any GPU, code should run on CPU through C++ implementation
- Should ideally be easily portable to OpenCL
- Should make use of multicore architecture of modern CPUs
- It is allowed to specify hardware requirements, which however should be as low as possible
- C++ implementation should/can use Intel MKL for maximum performance
- Code should be robust and handle errors
- Code should run in single and double precision
- Code should be easily expandable
- Results of example cases should be reproduced
- Code should follow unit testing paradigm
- Code should follow consistent style, e.g. Google C++ style guide
- Class structure of python template does not have to be followed strictly
- Documentation
- Code repository should be used
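
To illustrate the batching requirement from the list above, here is a minimal sketch in plain C++ so that it runs without a GPU. It is only one possible arrangement under stated assumptions, not the existing PyCUDA code: the names `RowsPerBatch`, `LoadBatch`, `ProcessBatch`, and `ProcessImage` are hypothetical, the memory budget is passed in by hand, and the "kernel" simply sums pixels. On a GPU, the budget could presumably come from `cudaMemGetInfo`, and the prefetch could be a `cudaMemcpyAsync` on a separate stream instead of `std::async`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <stdexcept>
#include <vector>

// Rows per batch such that two buffers (the batch being processed plus the
// prefetched one) fit into `budget_bytes`. Returns 0 if not even one row fits.
template <typename T>
std::size_t RowsPerBatch(std::size_t budget_bytes, std::size_t row_width) {
  const std::size_t row_bytes = row_width * sizeof(T);
  return row_bytes == 0 ? 0 : budget_bytes / (2 * row_bytes);
}

// Hypothetical stand-in for a host-to-device copy: copies rows [begin, end).
template <typename T>
std::vector<T> LoadBatch(const std::vector<T>& image, std::size_t row_width,
                         std::size_t begin_row, std::size_t end_row) {
  assert(begin_row <= end_row);
  return std::vector<T>(image.begin() + begin_row * row_width,
                        image.begin() + end_row * row_width);
}

// Hypothetical stand-in for the real kernel: sums the pixels of one batch.
template <typename T>
T ProcessBatch(const std::vector<T>& batch) {
  T sum = 0;
  for (T value : batch) sum += value;
  return sum;
}

// Processes a row-major image batch by batch. While one batch is processed,
// the next one is already being loaded on another thread.
template <typename T>
T ProcessImage(const std::vector<T>& image, std::size_t row_width,
               std::size_t budget_bytes) {
  if (row_width == 0 || image.size() % row_width != 0) {
    throw std::invalid_argument("image size is not a multiple of row width");
  }
  const std::size_t rows = image.size() / row_width;
  const std::size_t step = RowsPerBatch<T>(budget_bytes, row_width);
  if (step == 0) {
    throw std::invalid_argument("memory budget too small for a single row");
  }

  // Starts loading the batch that begins at row `begin` in the background.
  auto prefetch = [&image, row_width, step, rows](std::size_t begin) {
    const std::size_t end = std::min(begin + step, rows);
    return std::async(std::launch::async, [&image, row_width, begin, end] {
      return LoadBatch(image, row_width, begin, end);
    });
  };

  T total = 0;
  if (rows == 0) return total;
  std::future<std::vector<T>> next = prefetch(0);
  for (std::size_t begin = 0; begin < rows; begin += step) {
    std::vector<T> current = next.get();                     // wait for transfer
    if (begin + step < rows) next = prefetch(begin + step);  // overlap next copy
    total += ProcessBatch(current);                          // compute meanwhile
  }
  return total;
}
```

Because everything is templated on the pixel type, the same code path serves `float` and `double`, for example `ProcessImage<float>(pixels, width, budget)`. With GCC or Clang on Linux it may need `-pthread` in addition to `-std=c++17`.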

Thank you very much for your efforts and best regards,
Cheers

Lui

#1
Posted 01/29/2012 04:25 PM   
Shoot me an email to discuss, see signature below.

Also, take a look at the following for background before we chat:

* AccelerEyes consulting: http://accelereyes.com/products/consulting
* ArrayFire: http://accelereyes.com/products/arrayfire

-John

John Melonakos (john.melonakos@accelereyes.com)

#2
Posted 01/29/2012 05:08 PM   
Feel free to contact me at erich at royal-caliber dot com to discuss further. I think my company will be able to help you.

#3
Posted 02/02/2012 06:05 PM   