# GPU/CUDA technology for RAW image decoding and processing



## Neutral (Jan 23, 2015)

I am wondering when image processing software (LR, DXO, C1, etc.) will start using the GPU for image processing.
This could drastically increase performance and processing capability, and could make it practical to implement more complicated and more resource-demanding algorithms, especially with NVIDIA CUDA, where more than 1500 processors on the latest NVIDIA cards can be used for processing instead of just 4 or 8 cores on the main CPU.
Whoever implements this first could have a great advantage over competitors.

This question comes to mind each time there is news about a major new software release, e.g. now, with the information that Lightroom 6 is coming soon.

Adobe has previously talked about the difficulty of implementing parallel processing, but this does not reflect current realities.

A simple web search shows that there are patents for GPU image processing, as well as a number of implementations and API libraries that use CUDA for image and video processing, including RAW file processing, and some implementations that deliver amazing processing speed.

Here are some references:

1. GPU Raw image processing patent US 8098964 B2
http://www.google.ca/patents/US8098964

2. http://www.ximea.com/de/technology-news/gpu

3. http://on-demand.gputechconf.com/siggraph/2013/presentation/SG3108-GPU-Programming-Video-Image-Processing.pdf


----------



## Lawliet (Jan 23, 2015)

Neutral said:


> I am wondering when image processing software (LR, DXO, C1, etc.) will start using the GPU for image processing.


For C1 that happened quite a while ago, with continuing improvements over recent years/releases.


----------



## Neutral (Jan 23, 2015)

Lawliet said:


> Neutral said:
> 
> 
> > I am wondering when image processing software (LR, DXO, C1, etc.) will start using the GPU for image processing.
> ...



Yes, you are right, C1 and DXO use that to some extent (with GPU acceleration enabled), but I am not sure how efficiently they utilize the full GPU power. My understanding is that it is used mostly for image rendering and less for RAW processing.
But both are actually pretty fast on my laptop with an NVIDIA GTX 780M card.

Here are some interesting test results for C1:
http://diglloyd.com/blog/2014/20140117_1-CaptureOnePro-GPU.html

But LR is still not using GPU acceleration, and I am not sure whether it will be used in LR6.


----------



## iKenndac (Jan 23, 2015)

Neutral said:


> ...when it is possible to utilize more than 1500 processors on latest NVIDIA cards for processing instead of just 4 or 8 cores on main CPU.



Just to level expectations - just because modern GPUs have 1500+ cores doesn't mean that you'll gain a 375x increase in performance over a quad core machine by utilising them.

GPU cores are highly specialised in the things they do well. Additionally, there's quite a large overhead in getting your data into the GPU for them to work on it to begin with, then getting it out again at the other side. It's not just a case of enabling CUDA or OpenCL in your app and watching the numbers fly. 

Of course, Adobe can and should be using these technologies for both RAW decoding and their entire pipeline. Remember, RAW decoding is only part of it — they decode the RAWs, then they individually apply every edit you've done to get the final output. This likely means paging data in and out of the GPU multiple times, and you need to do a lot of work to ensure you're taking the most efficient path — perhaps one adjustment is actually really fast on the CPU already and the overhead of getting everything into the GPU isn't worth it in that instance.
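
A minimal sketch of that per-adjustment decision, in Python. All timings and step names below are made up for illustration, not measurements of any real product. It also assumes each step pays the full upload and download cost by itself, which is the worst case; a real pipeline would try to keep data on the GPU across consecutive steps.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    cpu_ms: float  # hypothetical time to run the step on the CPU
    gpu_ms: float  # hypothetical time for the GPU computation alone


def gpu_total_ms(step, upload_ms, download_ms):
    """GPU cost for one step, including moving the data in and out."""
    return upload_ms + step.gpu_ms + download_ms


def choose_device(step, upload_ms, download_ms):
    """Pick the GPU only when it wins after transfer overhead is counted."""
    if gpu_total_ms(step, upload_ms, download_ms) < step.cpu_ms:
        return "gpu"
    return "cpu"


# Made-up numbers: a cheap adjustment and an expensive one.
steps = [
    Step("exposure", cpu_ms=2.0, gpu_ms=0.1),
    Step("denoise", cpu_ms=400.0, gpu_ms=20.0),
]

plan = {s.name: choose_device(s, upload_ms=5.0, download_ms=5.0) for s in steps}
print(plan)
```

With these assumed figures the cheap adjustment stays on the CPU, because the transfers cost more than the work itself, while the expensive one goes to the GPU.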

LR does a good job of faking speed by creating previews, but the main performance problems are while you're modifying edits. Doing this well is a big task, and I really hope the delays are because they're taking the time to do it right. Bash Apple's Aperture all you want, but their imaging pipeline absolutely screams since it's a mature GPU-accelerated API.


----------



## PavelR (Jan 23, 2015)

Zoner Photo Studio can utilize CUDA - see http://www.zoner.com/en/system-requirements
The application says that enabling GPU calculations on my system (8-core AMD CPU + Quadro 6000 GPU) increases speed 12x.
Unfortunately I do not know how to measure the real-life speed boost, because I do not know which operations utilize CUDA processing...


----------



## Neutral (Jan 23, 2015)

iKenndac said:


> Neutral said:
> 
> 
> > ...when it is possible to utilize more than 1500 processors on latest NVIDIA cards for processing instead of just 4 or 8 cores on main CPU.
> ...



Sure, I was not saying that using an additional 1500 CUDA cores compared to 4 main CPU cores would give a 375x increase in performance. A single CUDA core and a single main CPU core have different processing power and different resources available to them, and their main target applications are different.

What I meant is that using the full CUDA processing resources, with all available CUDA cores, could provide a drastic improvement in performance for RAW image processing. It could be especially useful for image denoising, which could be split into a huge number of parallel processes, with a separate process for each small image block.
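
A rough sketch of the block-splitting idea, in plain Python. The "denoiser" here is only a placeholder that replaces each block with its mean, and the thread pool merely stands in for the many GPU cores; this is not how DXO or any real denoiser works, and real algorithms usually need overlapping neighbourhoods rather than fully independent blocks.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(height, width, block):
    """Return (top, left, bottom, right) bounds that tile the whole image."""
    return [
        (top, left, min(top + block, height), min(left + block, width))
        for top in range(0, height, block)
        for left in range(0, width, block)
    ]


def smooth_block(image, bounds):
    """Placeholder 'denoise': compute the block mean, reading only that block."""
    top, left, bottom, right = bounds
    values = [image[y][x] for y in range(top, bottom) for x in range(left, right)]
    return bounds, sum(values) / len(values)


def denoise(image, block=2):
    """Process every block independently, then write the results back."""
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    blocks = split_into_blocks(height, width, block)
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda b: smooth_block(image, b), blocks)
        for (top, left, bottom, right), mean in results:
            for y in range(top, bottom):
                for x in range(left, right):
                    out[y][x] = mean
    return out


image = [
    [0, 2, 10, 10],
    [2, 4, 10, 10],
    [1, 1, 5, 7],
    [1, 1, 7, 9],
]
result = denoise(image, block=2)
print(result)
```

Because no block reads another block's pixels, the blocks can be processed in any order or all at once, which is the property that makes this kind of work a good fit for thousands of GPU cores.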

On my laptop, DXO PRIME denoising takes about 150 seconds for an A7R file, 80 seconds for a 1DX file and ~23 seconds for an a7S RAW file. During this process all 4 main CPU cores (8 threads) are loaded up to 100%, the CPU clock boosts up to 3.4 GHz and the CPU temperature jumps to 100C. Using the 1536 cores of the NVIDIA GTX 780M could provide a drastic performance improvement and reduce the load on the main CPU. Even 10x better performance would result in 15 sec for the A7R, 8 sec for the 1DX and about 2 sec for the a7S with PRIME denoising, and I believe it could be much better still when fully utilizing all CUDA cores.
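
The arithmetic behind those projected times can be checked with a few lines of Python. The measured times are the ones quoted above; the speedup factors are assumptions, not measured GPU results.

```python
# Measured PRIME denoising times on the CPU, in seconds (from the post above).
measured_s = {"A7R": 150, "1DX": 80, "a7S": 23}


def projected_times(measured, speedup):
    """Divide each measured time by an assumed overall speedup factor."""
    return {camera: seconds / speedup for camera, seconds in measured.items()}


# 10x is the figure used in the post; 2x and 5x are extra hypothetical cases.
for speedup in (2, 5, 10):
    times = projected_times(measured_s, speedup)
    summary = ", ".join(f"{cam}: {t:.1f} s" for cam, t in times.items())
    print(f"{speedup}x -> {summary}")

at_10x = projected_times(measured_s, 10)
```

This treats the speedup as applying to the whole job, so any part of the work that stays on the CPU or is spent on transfers would make the real gain smaller.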

When I started this topic I provided several links to very interesting presentations on the subject: one is an NVIDIA presentation and the other covers a very impressive real-time embedded image processing implementation. Those papers show what level of performance improvement can be achieved using CUDA for image and video processing directly from RAW files.

Benchmark results for the Fastvideo industrial camera implementation with real-time processing are really amazing:
http://www.fastcompression.com
http://on-demand.gputechconf.com/gtc/2014/presentations/S4728-gpu-image-processing-camera-apps.pdf
====== 
Final Benchmark on GPU (Titan)
CMOSIS image sensor CMV20000, 5120x3840 (~20mpx), 12-bit, 30 fps
GeForce GTX Titan GPU
Host to device transfer ~1.5 ms
Demosaic ~3.1 ms
JPEG encoding (90%, 4:2:0) ~7.8 ms
Device to Host transfer ~1.3 ms
Total: ~13.7 ms
P.S. This is the benchmark for PCIE camera CB-200 from XIMEA
------- 
Solution for Photo Hosting
Task description: load-decode-resize-encode-store
Image load ~1.5 ms for 2048x2048 jpg image
JPEG decoding ~3.4 ms
Downsize to 1024x1024 with bicubic algorithm ~0.7 ms
JPEG encoding (quality 90%, 4:4:4) ~3.4 ms
Image store ~1.0 ms
GPU processing time ~7.5 ms
Total time ~10 ms
-------------------- 
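
As a quick sanity check of the figures quoted above, the stage times can be summed and compared with the 30 fps frame budget. This sketch assumes the stages run strictly one after another with no overlap. Treating decode, resize and encode as the "GPU processing" part of the photo hosting task is an inference from the numbers adding up to 7.5 ms, not something the source states.

```python
# Stage times in milliseconds, copied from the quoted benchmark.
camera_ms = {
    "host_to_device": 1.5,
    "demosaic": 3.1,
    "jpeg_encode": 7.8,
    "device_to_host": 1.3,
}
hosting_ms = {
    "image_load": 1.5,
    "jpeg_decode": 3.4,
    "resize": 0.7,
    "jpeg_encode": 3.4,
    "image_store": 1.0,
}

camera_total = sum(camera_ms.values())
frame_budget_ms = 1000 / 30           # time available per frame at 30 fps
max_fps = 1000 / camera_total         # upper bound if stages do not overlap

# Assumed split: only these three stages run on the GPU.
gpu_stages = ("jpeg_decode", "resize", "jpeg_encode")
hosting_gpu = sum(hosting_ms[name] for name in gpu_stages)
hosting_total = sum(hosting_ms.values())

print(f"camera pipeline: {camera_total:.1f} ms per frame")
print(f"frame budget at 30 fps: {frame_budget_ms:.1f} ms")
print(f"sequential upper bound: {max_fps:.0f} fps")
print(f"photo hosting: {hosting_gpu:.1f} ms on GPU, {hosting_total:.1f} ms total")
```

Under these assumptions the camera pipeline uses well under half of the per-frame budget at 30 fps, and both quoted totals are consistent with their listed stages.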

I wish to see that level of performance in products that I currently use, especially in some new LR release.

And more info in the NVIDIA presentation:
http://on-demand.gputechconf.com/siggraph/2013/presentation/SG3108-GPU-Programming-Video-Image-Processing.pdf


----------



## Don Haines (Jan 23, 2015)

Already there....

Autopano Giga (Image stitching software) uses GPU cores to accelerate image processing. It makes a HUGE increase in speed...


----------



## Neutral (Jan 23, 2015)

Don Haines said:


> Already there....
> 
> Autopano Giga (Image stitching software) uses GPU cores to accelerate image processing. It makes a HUGE increase in speed...



Autopano Giga ( http://www.kolor.com/image-stitching-software-autopano-giga-photo-stitching-feature.html ) seems to be a very attractive product; it looks much superior to stitching in Photoshop.

Are you using it, and if so, what is your feedback on it?
It is not cheap, but it seems the price is justified by real value.


----------



## Mt Spokane Photography (Jan 23, 2015)

It might be a worthwhile upgrade once images get large enough for it to make a difference. It would be for displaying images, so it's very useful for video, where images need to be displayed very quickly. For stills, I'm not sure whether 1/2 sec or 1/20 sec to render an image makes a big difference to my post-processing. Software like Lightroom pre-renders the images, so they display in an eyeblink most of the time. It takes 1/2 sec or less to display them.


----------

