We’ve been having some fun lately with an NVidia GTX690 card in Sundog’s office. Its dual Kepler GPUs can really do wonders for the FFT calculations that power Triton’s simulation of thousands of waves at once. Developing with it revealed a big bottleneck we never knew about: copying data from the GPU back to the CPU to power our spray effects and height queries. By simply adjusting the timing of those copies, we improved performance by 185% when using CUDA on this card. We also found some other optimizations, such as avoiding copies up to the GPU entirely in most cases, and those optimizations also help our OpenCL and DirectX11 code paths.
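To give a feel for what "adjusting the timing" of a readback can look like, here is a minimal sketch of one common pattern: start the GPU-to-CPU copy in one frame and only read the result in the next, so the CPU never waits on it. This is not Triton's actual code. The `DeferredReadback` class and its methods are hypothetical names, and a plain `std::vector` stands in for the asynchronous device-to-host copy (in CUDA, something like `cudaMemcpyAsync` on its own stream).

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper, not part of Triton: hands the CPU the wave heights
// copied during the PREVIOUS frame, so it never stalls on the current copy.
class DeferredReadback {
public:
    // Call once at the start of each frame. Publishes the copy issued last
    // frame (assumed complete by now) as the data the CPU may read.
    void beginFrame() {
        if (hasInFlight_) {
            ready_ = std::move(inFlight_);
            hasReady_ = true;
            hasInFlight_ = false;
        }
    }

    // Stand-in for enqueueing an asynchronous device-to-host copy of this
    // frame's heights. A real renderer would return here without waiting.
    void issueCopy(std::vector<float> gpuHeights) {
        inFlight_ = std::move(gpuHeights);
        hasInFlight_ = true;
    }

    // False until at least one full frame has passed since the first copy.
    bool hasData() const { return hasReady_; }

    // Heights from the most recently completed copy (one frame old).
    const std::vector<float>& heights() const {
        assert(hasReady_ && "no completed readback yet");
        return ready_;
    }

private:
    std::vector<float> inFlight_;
    std::vector<float> ready_;
    bool hasInFlight_ = false;
    bool hasReady_ = false;
};
```

The trade-off in this sketch is that spray and height queries see data that is one frame old, which is usually hard to notice at high frame rates.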

At left is a screenshot of Triton’s FFT-fueled infinite ocean rendering at over 300 frames per second, as measured by FRAPS. That works out to just over 3 milliseconds per frame, leaving plenty of time for everything else your game or simulation needs to do.
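The frame-time arithmetic above is easy to reproduce. As a small sketch, the C++ helpers below convert a frame rate to milliseconds per frame and work out what is left of a frame budget. The function names are made up for this example, and the 60 FPS target in the note that follows is an assumption for illustration, not a figure from our tests.

```cpp
#include <cassert>

// Convert a frame rate in frames per second to milliseconds per frame.
inline double frameTimeMs(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Milliseconds left in each frame at `targetFps` once the ocean, which
// renders at `oceanFps` on its own, has taken its share.
inline double remainingBudgetMs(double targetFps, double oceanFps) {
    return frameTimeMs(targetFps) - frameTimeMs(oceanFps);
}
```

`frameTimeMs(300.0)` comes to roughly 3.33 ms. If you were targeting 60 FPS, `remainingBudgetMs(60.0, 300.0)` would leave roughly 13.3 ms of each frame for the rest of your application.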

These performance gains are available now in Triton 1.51, which you can get from our download page.

While we were in performance tuning mode, we came up with some interesting benchmarks using the GTX690:

FFT Method                   Framerate (FPS)
FFTSS library                30
Intel IPP                    70
CUDA                         300
DirectX11 Compute Shaders    270
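
To put those numbers in relative terms, here is a short sketch that expresses each result as a speedup over a chosen baseline. The frame rates are the ones from the table above. The `Benchmark` struct and the function names are only for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One row of the benchmark table.
struct Benchmark {
    std::string method;
    double fps;
};

// Frame rates measured on the GTX690, copied from the table.
inline std::vector<Benchmark> gtx690Benchmarks() {
    return {
        {"FFTSS library", 30.0},
        {"Intel IPP", 70.0},
        {"CUDA", 300.0},
        {"DirectX11 Compute Shaders", 270.0},
    };
}

// How many times faster `fps` is than `baselineFps`.
inline double speedup(double fps, double baselineFps) {
    assert(baselineFps > 0.0);
    return fps / baselineFps;
}
```

With FFTSS as the baseline, CUDA comes out 10 times faster and DirectX11 compute shaders 9 times faster. Against Intel IPP, CUDA is roughly 4.3 times faster.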

Since Triton uses CUDA on NVidia systems and OpenCL on ATI systems, we couldn’t do an apples-to-apples comparison of OpenCL and CUDA on the same card. But we do see 170 FPS with Triton 1.51 on an ATI HD5850, which is a mid-range single-GPU card, so I think OpenCL performance is at least on par with CUDA at this point.

Triton 1.51 also improves our calculation of spray and foam effects, and fixes a bug. Between those enhancements and the performance improvements, we recommend this update for everyone.