NVIDIA RTX 5080 / RTX 5090 Compute Benchmarks

Benchmarks for a future article.

Compare your own system(s) to this result file with the Phoronix Test Suite by running the command: phoronix-test-suite benchmark 2501297-PTS-NVIDIART00
Runs:
RTX 5090: run January 24, test duration 1 Hour
RTX 5080: run January 28, test duration 2 Hours, 17 Minutes
NVIDIA RTX 5080: run January 28, test duration 2 Hours, 15 Minutes
Average test duration: 1 Hour, 51 Minutes



System details (shared by all three runs unless noted):
Processor: Intel Core Ultra 9 285K @ 5.10GHz (24 Cores)
Motherboard: ASUS ROG MAXIMUS Z890 HERO (1203 BIOS)
Chipset: Intel Device ae7f
Memory: 2 x 16GB DDR5-6400MT/s Micron CP16G64C38U5B.M8D1
Disk: 1000GB Western Digital WDS100T1X0E-00AFY0 + 4001GB Western Digital WD_BLACK SN850X 4000GB
Graphics: ASUS NVIDIA GeForce RTX 5090 32GB (RTX 5090); ASUS NVIDIA GeForce RTX 5080 16GB (RTX 5080, NVIDIA RTX 5080)
Audio: Intel Device 7f50
Monitor: ASUS VP28U
Network: Realtek Device 8126 + Intel I226-V + Intel Wi-Fi 7
OS: Ubuntu 24.10
Kernel: 6.11.0-13-generic (x86_64) (RTX 5090); 6.11.0-14-generic (x86_64) (RTX 5080, NVIDIA RTX 5080)
Desktop: GNOME Shell 47.0
Display Server: X Server 1.21.1.13
Display Driver: NVIDIA 570.86.10
OpenGL: 4.6.0
OpenCL: OpenCL 3.0 CUDA 12.8.51 + OpenCL 3.0 (RTX 5090); OpenCL 3.0 CUDA 12.8.51 (RTX 5080, NVIDIA RTX 5080)
Compiler: GCC 14.2.0 (RTX 5090); GCC 14.2.0 + CUDA 12.8 (RTX 5080, NVIDIA RTX 5080)
File-System: ext4
Screen Resolution: 3840x2160

Kernel Details: RTX 5090: nouveau.modeset=0 - Transparent Huge Pages: madvise; RTX 5080: Transparent Huge Pages: madvise; NVIDIA RTX 5080: Transparent Huge Pages: madvise
Compiler Details: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2,rust --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-14-zdkDXv/gcc-14-14.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-14-zdkDXv/gcc-14-14.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Details: Scaling Governor: intel_pstate performance (EPP: default) - CPU Microcode: 0x114 - Thermald 2.5.8
Graphics Details: RTX 5090: BAR1 / Visible vRAM Size: 32768 MiB - vBIOS Version: 98.02.2e.00.03; RTX 5080: BAR1 / Visible vRAM Size: 16384 MiB - vBIOS Version: 98.03.3b.00.01; NVIDIA RTX 5080: BAR1 / Visible vRAM Size: 16384 MiB - vBIOS Version: 98.03.3b.00.01
OpenCL Details: GPU Compute Cores: RTX 5090: 21760; RTX 5080: 10752; NVIDIA RTX 5080: 10752
Security Details: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + reg_file_data_sampling: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; RSB filling; PBRSB-eIBRS: Not affected; BHI: Not affected + srbds: Not affected + tsx_async_abort: Not affected

Result Overview (Phoronix Test Suite chart): relative performance of RTX 5090, RTX 5080 and NVIDIA RTX 5080 on a 100% to 211% scale, per suite: NCNN, VkResample, ProjectPhysX OpenCL-Benchmark, FluidX3D, Hashcat, Chaos Group V-RAY, Blender, clpeak, IndigoBench, SHOC Scalable HeterOgeneous Computing, Waifu2x-NCNN Vulkan, RealSR-NCNN.
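
The Result Overview compares the runs on a single relative scale even though some tests are "fewer is better" and others "more is better". The sketch below is a minimal illustration of that idea, not the Phoronix Test Suite's own code: it inverts "fewer is better" results so that a ratio above 1.0 always means the RTX 5090 run was faster than the RTX 5080 run, then averages a handful of values copied from this page with a geometric mean. The helper names and the choice of tests are hypothetical.

```python
from math import prod


def relative_perf(value: float, baseline: float, higher_is_better: bool) -> float:
    """Speed of `value` relative to `baseline`; > 1.0 means `value` is faster."""
    return value / baseline if higher_is_better else baseline / value


def geometric_mean(ratios: list[float]) -> float:
    """Geometric mean, the usual way to average speed ratios."""
    return prod(ratios) ** (1.0 / len(ratios))


# (RTX 5090, RTX 5080, higher_is_better), copied from the results on this page
results = {
    "NCNN FastestDet (ms)": (11.01, 43.00, False),
    "Blender Classroom, NVIDIA CUDA (s)": (8.38, 14.08, False),
    "FluidX3D FP32-FP32 (MLUPs/s)": (9524, 5174, True),
    "clpeak Global Memory Bandwidth (GBPS)": (1562.97, 849.23, True),
}

ratios = {
    name: relative_perf(rtx5090, rtx5080, higher)
    for name, (rtx5090, rtx5080, higher) in results.items()
}
for name, ratio in ratios.items():
    print(f"{name}: RTX 5090 is {ratio:.2f}x the RTX 5080")
print(f"Geometric mean: {geometric_mean(list(ratios.values())):.2f}x")
```

The geometric mean is used because it treats a 2x gain and a 2x loss symmetrically, which an arithmetic mean of ratios does not.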

Results table: the per-test values for RTX 5090, RTX 5080 and NVIDIA RTX 5080 are listed individually in the sections that follow.

NCNN

NCNN 20241226 - Target: Vulkan GPU - Model: FastestDet (ms, Fewer Is Better)
RTX 5090: 11.01 (MIN: 5.09 / MAX: 92.36)
NVIDIA RTX 5080: 32.48 (MIN: 5.11 / MAX: 96.35)
RTX 5080: 43.00 (MIN: 5.09 / MAX: 96.19)
Note: (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN 20241226 - Target: Vulkan GPU - Model: vision_transformer (ms, Fewer Is Better)
RTX 5090: 62.76 (MIN: 40.3 / MAX: 105.61)
RTX 5080: 107.87 (MIN: 40.48 / MAX: 118.93)
NVIDIA RTX 5080: 108.87 (MIN: 45.11 / MAX: 120.3)
Note: (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN 20241226 - Target: Vulkan GPU - Model: regnety_400m (ms, Fewer Is Better)
RTX 5090: 47.90 (MIN: 21.96 / MAX: 421.33)
NVIDIA RTX 5080: 155.22 (MIN: 21.96 / MAX: 495.67)
RTX 5080: 165.97 (MIN: 21.95 / MAX: 492.35)
Note: (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN 20241226 - Target: Vulkan GPU - Model: squeezenet_ssd (ms, Fewer Is Better)
RTX 5090: 22.02 (MIN: 7.41 / MAX: 92.66)
NVIDIA RTX 5080: 68.01 (MIN: 7.27 / MAX: 102.26)
RTX 5080: 68.02 (MIN: 7.34 / MAX: 99.35)
Note: (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN 20241226 - Target: Vulkan GPU - Model: yolov4-tiny (ms, Fewer Is Better)
RTX 5090: 39.34 (MIN: 15.92 / MAX: 49.04)
RTX 5080: 44.09 (MIN: 14.62 / MAX: 49.42)
NVIDIA RTX 5080: 44.55 (MIN: 13.75 / MAX: 49.72)
Note: (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN 20241226 - Target: Vulkan GPU - Model: mobilenetv2-yolov3 (ms, Fewer Is Better)
RTX 5090: 42.44 (MIN: 8.34 / MAX: 76.17)
NVIDIA RTX 5080: 71.02 (MIN: 8.7 / MAX: 81.03)
RTX 5080: 71.27 (MIN: 8.2 / MAX: 79.42)
Note: (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN 20241226 - Target: Vulkan GPU - Model: resnet50 (ms, Fewer Is Better)
RTX 5090: 28.58 (MIN: 10.02 / MAX: 89.48)
RTX 5080: 44.76 (MIN: 9.99 / MAX: 92.95)
NVIDIA RTX 5080: 57.57 (MIN: 9.95 / MAX: 94)
Note: (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN 20241226 - Target: Vulkan GPU - Model: alexnet (ms, Fewer Is Better)
RTX 5090: 8.74 (MIN: 3.21 / MAX: 22.23)
NVIDIA RTX 5080: 17.60 (MIN: 3.2 / MAX: 23.03)
RTX 5080: 18.54 (MIN: 3.21 / MAX: 22.49)
Note: (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN 20241226 - Target: Vulkan GPU - Model: resnet18 (ms, Fewer Is Better)
RTX 5090: 10.93 (MIN: 4.48 / MAX: 44.35)
NVIDIA RTX 5080: 16.31 (MIN: 4.4 / MAX: 44.98)
RTX 5080: 17.42 (MIN: 4.48 / MAX: 44.53)
Note: (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN 20241226 - Target: Vulkan GPU - Model: vgg16 (ms, Fewer Is Better)
RTX 5090: 37.05 (MIN: 22.53 / MAX: 46.22)
RTX 5080: 42.01 (MIN: 23.96 / MAX: 46.31)
NVIDIA RTX 5080: 42.47 (MIN: 25.67 / MAX: 47.21)
Note: (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN 20241226 - Target: Vulkan GPU - Model: googlenet (ms, Fewer Is Better)
RTX 5090: 13.92 (MIN: 7.49 / MAX: 98.37)
RTX 5080: 36.58 (MIN: 7.53 / MAX: 102.56)
NVIDIA RTX 5080: 42.54 (MIN: 7.59 / MAX: 104.85)
Note: (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN 20241226 - Target: Vulkan GPU - Model: blazeface (ms, Fewer Is Better)
RTX 5090: 2.67 (MIN: 2.4 / MAX: 41.21)
NVIDIA RTX 5080: 4.08 (MIN: 2.36 / MAX: 52.23)
RTX 5080: 6.42 (MIN: 2.36 / MAX: 52.25)
Note: (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN 20241226 - Target: Vulkan GPU - Model: efficientnet-b0 (ms, Fewer Is Better)
RTX 5090: 27.44 (MIN: 6.34 / MAX: 109.98)
RTX 5080: 63.39 (MIN: 6.28 / MAX: 114.02)
NVIDIA RTX 5080: 65.77 (MIN: 6.28 / MAX: 115.25)
Note: (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN 20241226 - Target: Vulkan GPU - Model: mnasnet (ms, Fewer Is Better)
RTX 5090: 9.80 (MIN: 3.75 / MAX: 63.5)
NVIDIA RTX 5080: 29.33 (MIN: 3.68 / MAX: 66.87)
RTX 5080: 37.59 (MIN: 3.7 / MAX: 66.4)
Note: (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN 20241226 - Target: Vulkan GPU - Model: shufflenet-v2 (ms, Fewer Is Better)
RTX 5090: 4.53 (MIN: 3.88 / MAX: 57.59)
NVIDIA RTX 5080: 9.75 (MIN: 3.91 / MAX: 74.73)
RTX 5080: 10.82 (MIN: 3.9 / MAX: 74.52)
Note: (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN 20241226 - Target: Vulkan GPU - Model: mobilenet-v3 (ms, Fewer Is Better)
RTX 5080: 4.22 (MIN: 4.04 / MAX: 4.96)
RTX 5090: 4.68 (MIN: 4.08 / MAX: 57.94)
NVIDIA RTX 5080: 6.00 (MIN: 4.05 / MAX: 79)
Note: (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN 20241226 - Target: Vulkan GPU - Model: mobilenet-v2 (ms, Fewer Is Better)
RTX 5090: 10.21 (MIN: 3.85 / MAX: 64.73)
NVIDIA RTX 5080: 33.96 (MIN: 3.85 / MAX: 68.9)
RTX 5080: 41.52 (MIN: 3.78 / MAX: 68.66)
Note: (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN 20241226 - Target: Vulkan GPU - Model: mobilenet (ms, Fewer Is Better)
RTX 5090: 42.44 (MIN: 8.34 / MAX: 76.17)
NVIDIA RTX 5080: 71.02 (MIN: 8.7 / MAX: 81.03)
RTX 5080: 71.27 (MIN: 8.2 / MAX: 79.42)
Note: (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

Llama.cpp

Llama.cpp b4397 - Backend: NVIDIA CUDA - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Text Generation 128 (Tokens Per Second, More Is Better)
NVIDIA RTX 5080: 97.03
RTX 5080: 97.00
Note: (CXX) g++ options: -O3

Llama.cpp b4397 - Backend: NVIDIA CUDA - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Text Generation 128 (Tokens Per Second, More Is Better)
RTX 5080: 101.93
NVIDIA RTX 5080: 101.84
Note: (CXX) g++ options: -O3

Llama.cpp b4397 - Backend: NVIDIA CUDA - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Text Generation 128 (Tokens Per Second, More Is Better)
NVIDIA RTX 5080: 105.25
RTX 5080: 105.09
Note: (CXX) g++ options: -O3

Chaos Group V-RAY

This is a test of Chaos Group's V-RAY benchmark. V-RAY is a commercial renderer that integrates with content-creation software such as SketchUp and 3ds Max. The V-RAY benchmark is standalone and supports CPU as well as NVIDIA CUDA- and RTX-based rendering. Learn more via the OpenBenchmarking.org test page.

Chaos Group V-RAY 6.0 - Mode: NVIDIA CUDA GPU (vpaths, More Is Better)
RTX 5090: 4851
NVIDIA RTX 5080: 2528
RTX 5080: 2528

Chaos Group V-RAY 6.0 - Mode: NVIDIA RTX GPU (vpaths, More Is Better)
RTX 5090: 11923
RTX 5080: 7388
NVIDIA RTX 5080: 7372

IndigoBench

This is a test of Indigo Renderer's IndigoBench benchmark. Learn more via the OpenBenchmarking.org test page.

IndigoBench 4.4 - Acceleration: OpenCL GPU - Scene: Bedroom (M samples/s, More Is Better)
RTX 5090: 42.76
RTX 5080: 26.29
NVIDIA RTX 5080: 26.27

IndigoBench 4.4 - Acceleration: OpenCL GPU - Scene: Supercar (M samples/s, More Is Better)
RTX 5090: 92.70
RTX 5080: 68.35
NVIDIA RTX 5080: 68.32

NAMD CUDA

NAMD is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. NAMD was developed by the Theoretical and Computational Biophysics Group in the Beckman Institute for Advanced Science and Technology at the University of Illinois at Urbana-Champaign. This version of the NAMD test profile uses CUDA GPU acceleration. Learn more via the OpenBenchmarking.org test page.

NAMD CUDA 2.14 - ATPase Simulation - 327,506 Atoms (days/ns, Fewer Is Better)
RTX 5090: 0.05810

ATPase Simulation - 327,506 Atoms

RTX 5080: The test run did not produce a result. E: FATAL ERROR: No simulation config file specified on command line.

NVIDIA RTX 5080: The test run did not produce a result. E: FATAL ERROR: No simulation config file specified on command line.

Blender

Blender 4.3 - Blend File: Barbershop - Compute: NVIDIA CUDA (Seconds, Fewer Is Better)
RTX 5090: 35.14
NVIDIA RTX 5080: 58.07
RTX 5080: 58.30

Llama.cpp

Llama.cpp b4397 - Backend: NVIDIA CUDA - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 2048 (Tokens Per Second, More Is Better)
RTX 5080: 3705.87
NVIDIA RTX 5080: 3687.77
Note: (CXX) g++ options: -O3

VkResample

VkResample is a Vulkan-based image upscaling library built on VkFFT. The test upscales a sample 4K image to 8K using Vulkan GPU acceleration. Learn more via the OpenBenchmarking.org test page.

VkResample 1.0 - Upscale: 2x - Precision: Double (ms, Fewer Is Better)
RTX 5090: 103.51
NVIDIA RTX 5080: 212.31
RTX 5080: 212.35
Note: (CXX) g++ options: -O3

Blender

Blender 4.3 - Blend File: Barbershop - Compute: NVIDIA OptiX (Seconds, Fewer Is Better)
RTX 5090: 24.33
NVIDIA RTX 5080: 39.09
RTX 5080: 39.48

FluidX3D

FluidX3D 3.0 - Test: FP32-FP32 (MLUPs/s, More Is Better)
RTX 5090: 9524
NVIDIA RTX 5080: 5174
RTX 5080: 5174

Blender

Blender 4.3 - Blend File: Pabellon Barcelona - Compute: NVIDIA CUDA (Seconds, Fewer Is Better)
RTX 5090: 17.35
NVIDIA RTX 5080: 31.54
RTX 5080: 31.55

Llama.cpp

Llama.cpp b4397 - Backend: NVIDIA CUDA - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 2048 (Tokens Per Second, More Is Better)
NVIDIA RTX 5080: 6630.67
RTX 5080: 6627.59
Note: (CXX) g++ options: -O3

clpeak

Clpeak is designed to test the peak capabilities of OpenCL devices. Learn more via the OpenBenchmarking.org test page.

clpeak 1.1.2 - OpenCL Test: Transfer Bandwidth enqueueReadBuffer (GBPS, More Is Better)
RTX 5080: 13.95
RTX 5090: 13.83
NVIDIA RTX 5080: 13.70
Note: (CXX) g++ options: -O3

clpeak 1.1.2 - OpenCL Test: Transfer Bandwidth enqueueWriteBuffer (GBPS, More Is Better)
RTX 5080: 18.64
NVIDIA RTX 5080: 18.49
RTX 5090: 18.41
Note: (CXX) g++ options: -O3

Llama.cpp

Llama.cpp b4397 - Backend: NVIDIA CUDA - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 2048 (Tokens Per Second, More Is Better)
NVIDIA RTX 5080: 6653.97
RTX 5080: 6642.52
Note: (CXX) g++ options: -O3

Llama.cpp b4397 - Backend: NVIDIA CUDA - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 1024 (Tokens Per Second, More Is Better)
NVIDIA RTX 5080: 3832.03
RTX 5080: 3831.53
Note: (CXX) g++ options: -O3

RealSR-NCNN

RealSR-NCNN is an NCNN neural network implementation of the RealSR project, accelerated using the Vulkan API. RealSR stands for Real-World Super Resolution via Kernel Estimation and Noise Injection. NCNN, developed by Tencent, is a high-performance neural network inference framework optimized for mobile and other platforms. This test profile times how long it takes to upscale a sample image by a factor of 4 with Vulkan. Learn more via the OpenBenchmarking.org test page.

RealSR-NCNN 20200818 - Scale: 4x - TAA: Yes (Seconds, Fewer Is Better)
RTX 5090: 13.48
NVIDIA RTX 5080: 18.64
RTX 5080: 18.68

NAMD CUDA

NAMD CUDA 3.0.1 - Input: STMV with 1,066,628 Atoms (ns/day, More Is Better)
RTX 5080: 4.02164
NVIDIA RTX 5080: 4.01155

FluidX3D

FluidX3D 3.0 - Test: FP32-FP16S (MLUPs/s, More Is Better)
RTX 5090: 18499
RTX 5080: 10276
NVIDIA RTX 5080: 10275

FluidX3D 3.0 - Test: FP32-FP16C (MLUPs/s, More Is Better)
RTX 5090: 19140
NVIDIA RTX 5080: 10330
RTX 5080: 10328
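
FluidX3D's lattice Boltzmann kernels are usually described as memory-bound, so MLUPs/s (million lattice updates per second) can be turned into an implied memory traffic figure. The sketch below assumes the benchmark uses the D3Q19 velocity set and that every cell update loads and stores all 19 density distribution functions once plus one flag byte, which gives 153 bytes per cell with FP32 storage and 77 bytes with FP16 storage. Those byte counts are assumptions of this example, not figures reported on this page, and the helper names are hypothetical.

```python
def bytes_per_cell(bytes_per_ddf: int, velocities: int = 19, flag_bytes: int = 1) -> int:
    """Assumed traffic per lattice update: each DDF loaded and stored once, plus flags."""
    return velocities * bytes_per_ddf * 2 + flag_bytes


def implied_bandwidth_gbs(mlups: float, bytes_cell: int) -> float:
    """Convert million lattice updates per second into GB/s of memory traffic."""
    return mlups * 1e6 * bytes_cell / 1e9


# (MLUPs/s, bytes per stored DDF) from the FluidX3D results on this page
runs = {
    "RTX 5090 FP32-FP32": (9524, 4),
    "RTX 5090 FP32-FP16S": (18499, 2),
    "RTX 5080 FP32-FP32": (5174, 4),
    "RTX 5080 FP32-FP16S": (10276, 2),
}
for name, (mlups, ddf_bytes) in runs.items():
    bandwidth = implied_bandwidth_gbs(mlups, bytes_per_cell(ddf_bytes))
    print(f"{name}: ~{bandwidth:.0f} GB/s implied")
```

Under these assumptions the RTX 5090 runs imply roughly 1,424 to 1,457 GB/s and the RTX 5080 runs roughly 791 to 792 GB/s, in the same range as the clpeak Global Memory Bandwidth results (1562.97 and 849.23 GBPS), which is consistent with FP16 storage nearly doubling MLUPs/s.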

Blender

Blender 4.3 - Blend File: Fishy Cat - Compute: NVIDIA CUDA (Seconds, Fewer Is Better)
RTX 5090: 8.92
RTX 5080: 14.40
NVIDIA RTX 5080: 14.43

Llama.cpp

Llama.cpp b4397 - Backend: NVIDIA CUDA - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 1024 (Tokens Per Second, More Is Better)
NVIDIA RTX 5080: 7304.83
RTX 5080: 7292.84
Note: (CXX) g++ options: -O3

Blender

Blender 4.3 - Blend File: Junkshop - Compute: NVIDIA CUDA (Seconds, Fewer Is Better)
RTX 5090: 8.99
NVIDIA RTX 5080: 12.94
RTX 5080: 13.80

Blender 4.3 - Blend File: Classroom - Compute: NVIDIA CUDA (Seconds, Fewer Is Better)
RTX 5090: 8.38
NVIDIA RTX 5080: 14.05
RTX 5080: 14.08

Llama.cpp

Llama.cpp b4397 - Backend: NVIDIA CUDA - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 1024 (Tokens Per Second, More Is Better)
NVIDIA RTX 5080: 7331.65
RTX 5080: 7327.35
Note: (CXX) g++ options: -O3

Hashcat

Hashcat is an open-source, advanced password recovery tool supporting GPU acceleration with OpenCL, NVIDIA CUDA, and Radeon ROCm. Learn more via the OpenBenchmarking.org test page.

Hashcat 6.2.4 - Benchmark: MD5 (H/s, More Is Better)
RTX 5090: 106848250000 (SE +/- 102551750000.00, N = 2)
NVIDIA RTX 5080: 99815800000
RTX 5080: 99435000000
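
The RTX 5090 MD5 result carries a standard error almost as large as its mean. If the Phoronix Test Suite computes SE as the sample standard deviation (n - 1 denominator) divided by the square root of N, which is an assumption here, then with N = 2 the SE equals half the gap between the two runs, and the individual runs are simply the mean minus SE and the mean plus SE. The sketch below applies that arithmetic; `two_run_values` is a hypothetical helper, not part of any benchmark tool.

```python
import math
import statistics


def two_run_values(mean: float, se: float) -> tuple[float, float]:
    """Recover both samples when N = 2 and SE = stdev / sqrt(2), stdev using n - 1.

    For samples a and b, stdev = |a - b| / sqrt(2), so SE = |a - b| / 2.
    """
    return mean - se, mean + se


low, high = two_run_values(106_848_250_000, 102_551_750_000)
# Sanity check: recompute the standard error from the recovered pair.
se_check = statistics.stdev([low, high]) / math.sqrt(2)
print(f"Implied runs: {low / 1e9:.2f} GH/s and {high / 1e9:.2f} GH/s")
print(f"Recomputed SE: {se_check:.0f} H/s")
```

Under that assumption the two RTX 5090 MD5 runs would have been about 4.30 GH/s and 209.40 GH/s, so the reported average sits between two very different measurements.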

Llama.cpp

Llama.cpp b4397 - Backend: NVIDIA CUDA - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 512 (Tokens Per Second, More Is Better)
RTX 5080: 3878.92
NVIDIA RTX 5080: 3849.78
Note: (CXX) g++ options: -O3

Blender

Blender 4.3 - Blend File: Pabellon Barcelona - Compute: NVIDIA OptiX (Seconds, Fewer Is Better)
RTX 5090: 7.00
NVIDIA RTX 5080: 10.39
RTX 5080: 10.60

Hashcat

Hashcat 6.2.4 - Benchmark: SHA-512 (H/s, More Is Better)
RTX 5090: 8900400000
NVIDIA RTX 5080: 4265500000
RTX 5080: 4243000000

Hashcat 6.2.4 - Benchmark: SHA1 (H/s, More Is Better)
RTX 5090: 68852500000
NVIDIA RTX 5080: 32633100000
RTX 5080: 32522500000

Blender

Blender 4.3 - Blend File: Classroom - Compute: NVIDIA OptiX (Seconds, Fewer Is Better)
RTX 5090: 6.16
NVIDIA RTX 5080: 9.43
RTX 5080: 9.47

ProjectPhysX OpenCL-Benchmark

ProjectPhysX OpenCL-Benchmark 1.6 - Operation: Memory Bandwidth Coalesced Write (GB/s, More Is Better)
RTX 5090: 1687.49
NVIDIA RTX 5080: 911.96
RTX 5080: 911.11
Note: (CXX) g++ options: -std=c++17 -pthread -lOpenCL

ProjectPhysX OpenCL-Benchmark 1.6 - Operation: Memory Bandwidth Coalesced Read (GB/s, More Is Better)
RTX 5090: 1596.24
NVIDIA RTX 5080: 910.07
RTX 5080: 909.66
Note: (CXX) g++ options: -std=c++17 -pthread -lOpenCL

ProjectPhysX OpenCL-Benchmark 1.6 - Operation: INT8 Compute (TIOPs/s, More Is Better)
RTX 5090: 41.80
RTX 5080: 20.37
NVIDIA RTX 5080: 20.17
Note: (CXX) g++ options: -std=c++17 -pthread -lOpenCL

ProjectPhysX OpenCL-Benchmark 1.6 - Operation: INT16 Compute (TIOPs/s, More Is Better)
RTX 5090: 54.02
NVIDIA RTX 5080: 27.26
RTX 5080: 27.22
Note: (CXX) g++ options: -std=c++17 -pthread -lOpenCL

ProjectPhysX OpenCL-Benchmark 1.6 - Operation: INT32 Compute (TIOPs/s, More Is Better)
RTX 5090: 61.76
RTX 5080: 30.21
NVIDIA RTX 5080: 30.20
Note: (CXX) g++ options: -std=c++17 -pthread -lOpenCL

ProjectPhysX OpenCL-Benchmark 1.6 - Operation: INT64 Compute (TIOPs/s, More Is Better)
RTX 5090: 4.396
NVIDIA RTX 5080: 4.300
RTX 5080: 4.276
Note: (CXX) g++ options: -std=c++17 -pthread -lOpenCL

ProjectPhysX OpenCL-Benchmark 1.6 - Operation: FP16 Compute (TFLOPs/s, More Is Better)
RTX 5090: 122.91
NVIDIA RTX 5080: 59.47
RTX 5080: 59.47
Note: (CXX) g++ options: -std=c++17 -pthread -lOpenCL

ProjectPhysX OpenCL-Benchmark 1.6 - Operation: FP32 Compute (TFLOPs/s, More Is Better)
RTX 5090: 117.85
NVIDIA RTX 5080: 58.12
RTX 5080: 57.85
Note: (CXX) g++ options: -std=c++17 -pthread -lOpenCL

ProjectPhysX OpenCL-Benchmark 1.6 - Operation: FP64 Compute (TFLOPs/s, More Is Better)
RTX 5090: 1.95
NVIDIA RTX 5080: 0.95
RTX 5080: 0.95
Note: (CXX) g++ options: -std=c++17 -pthread -lOpenCL
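
The FP32 Compute results scale almost exactly with the GPU Compute Cores listed in the OpenCL details (21760 for the RTX 5090, 10752 for the RTX 5080). Assuming each core retires one FP32 fused multiply-add, counted as 2 FLOPs, per clock, the measured TFLOPs imply an average shader clock during the test. The sketch below does that arithmetic; the 2-FLOPs-per-core-per-cycle figure is an assumption of this example, and the resulting clock is an estimate rather than a reported value.

```python
def implied_clock_ghz(tflops: float, cores: int, flops_per_core_cycle: int = 2) -> float:
    """Clock (GHz) needed to reach `tflops` if every core does one FMA per cycle."""
    return tflops * 1e12 / (cores * flops_per_core_cycle) / 1e9


# (FP32 TFLOPs/s from ProjectPhysX OpenCL-Benchmark, GPU Compute Cores)
gpus = {
    "RTX 5090": (117.85, 21760),
    "RTX 5080": (57.85, 10752),
}
for name, (tflops, cores) in gpus.items():
    print(f"{name}: ~{implied_clock_ghz(tflops, cores):.2f} GHz implied")

# Compare how core count and measured FP32 throughput scale between the two GPUs.
core_ratio = 21760 / 10752
fp32_ratio = 117.85 / 57.85
print(f"Core ratio {core_ratio:.3f} vs FP32 ratio {fp32_ratio:.3f}")
```

Both GPUs land near 2.7 GHz under this model, and the core-count ratio (about 2.02) is close to the measured FP32 ratio (about 2.04).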

VkResample

VkResample 1.0 - Upscale: 2x - Precision: Single (ms, Fewer Is Better)
RTX 5090: 5.648
NVIDIA RTX 5080: 10.276
RTX 5080: 10.278
Note: (CXX) g++ options: -O3
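
VkResample reports the time for one 2x upscale, so the result can also be expressed as output pixel throughput. The sketch below assumes the 4K input is 3840x2160 and the 8K output is 7680x4320; the test description does not give exact dimensions, so the resolution constants are assumptions of this example.

```python
# Assumed 8K UHD output resolution (the test description only says "4K to 8K").
OUT_WIDTH, OUT_HEIGHT = 7680, 4320


def gigapixels_per_second(ms_per_upscale: float) -> float:
    """Output pixels produced per second, in billions, for one upscale time in ms."""
    return OUT_WIDTH * OUT_HEIGHT / (ms_per_upscale / 1000) / 1e9


# (Single precision ms, Double precision ms) from the VkResample results on this page
timings = {
    "RTX 5090": (5.648, 103.51),
    "RTX 5080": (10.278, 212.35),
}
for name, (single_ms, double_ms) in timings.items():
    print(
        f"{name}: single {gigapixels_per_second(single_ms):.2f} Gpx/s, "
        f"double {gigapixels_per_second(double_ms):.2f} Gpx/s, "
        f"double/single time ratio {double_ms / single_ms:.1f}"
    )
```

The double-to-single time ratio shows how much more expensive FP64 resampling is on these GPUs, independent of the exact resolution assumed.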

clpeak

clpeak 1.1.2 - OpenCL Test: Double-Precision Compute (GFLOPS, More Is Better)
RTX 5090: 1976.90
NVIDIA RTX 5080: 962.45
RTX 5080: 962.39
Note: (CXX) g++ options: -O3

Blender

Blender 4.3 - Blend File: Junkshop - Compute: NVIDIA OptiX (Seconds, Fewer Is Better)
RTX 5090: 5.66
NVIDIA RTX 5080: 7.77
RTX 5080: 7.84

Hashcat

Hashcat 6.2.4 - Benchmark: 7-Zip (H/s, More Is Better)
RTX 5090: 3272300
NVIDIA RTX 5080: 1664500
RTX 5080: 1657000

Blender

Blender 4.3 - Blend File: BMW27 - Compute: NVIDIA CUDA (Seconds, Fewer Is Better)
RTX 5090: 4.72
NVIDIA RTX 5080: 7.29
RTX 5080: 7.34

Blender 4.3 - Blend File: Fishy Cat - Compute: NVIDIA OptiX (Seconds, Fewer Is Better)
RTX 5090: 4.55
RTX 5080: 7.15
NVIDIA RTX 5080: 7.19

Llama.cpp

Llama.cpp b4397 - Backend: NVIDIA CUDA - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 512 (Tokens Per Second, More Is Better)
RTX 5080: 7804.82
NVIDIA RTX 5080: 7799.24
Note: (CXX) g++ options: -O3

Llama.cpp b4397 - Backend: NVIDIA CUDA - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 512 (Tokens Per Second, More Is Better)
RTX 5080: 7865.65
NVIDIA RTX 5080: 7864.88
Note: (CXX) g++ options: -O3
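
This page lists llama.cpp results only for the two RTX 5080 runs, whose system details are identical, so these results are a convenient way to gauge run-to-run consistency. The sketch below computes the percent difference between the two runs for each llama.cpp test, using the mean of the two values as the reference; that metric is this example's choice, not something the Phoronix Test Suite reports.

```python
# (RTX 5080, NVIDIA RTX 5080) in tokens per second, copied from the Llama.cpp b4397 results.
# TG128 = Text Generation 128, PPnnnn = Prompt Processing nnnn.
pairs = {
    "Llama-3.1-Tulu-3-8B TG128": (97.00, 97.03),
    "Mistral-7B TG128": (101.93, 101.84),
    "granite-3.0-3b TG128": (105.09, 105.25),
    "granite-3.0-3b PP2048": (3705.87, 3687.77),
    "Llama-3.1-Tulu-3-8B PP2048": (6627.59, 6630.67),
    "Mistral-7B PP2048": (6642.52, 6653.97),
    "granite-3.0-3b PP1024": (3831.53, 3832.03),
    "Llama-3.1-Tulu-3-8B PP1024": (7292.84, 7304.83),
    "Mistral-7B PP1024": (7327.35, 7331.65),
    "granite-3.0-3b PP512": (3878.92, 3849.78),
    "Llama-3.1-Tulu-3-8B PP512": (7804.82, 7799.24),
    "Mistral-7B PP512": (7865.65, 7864.88),
}


def percent_difference(a: float, b: float) -> float:
    """Absolute difference as a percentage of the two values' mean."""
    return abs(a - b) / ((a + b) / 2) * 100


diffs = {test: percent_difference(a, b) for test, (a, b) in pairs.items()}
# Largest disagreement first.
for test, diff in sorted(diffs.items(), key=lambda item: item[1], reverse=True):
    print(f"{test}: {diff:.2f}%")
```

Every pair agrees to within 1%, with the granite-3.0-3b prompt-processing tests showing the largest spread.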

SHOC Scalable HeterOgeneous Computing

This is the CUDA and OpenCL version of Vetter's Scalable HeterOgeneous Computing (SHOC) benchmark suite. SHOC provides a number of benchmark programs for evaluating the performance and stability of compute devices. Learn more via the OpenBenchmarking.org test page.

SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: Texture Read Bandwidth (GB/s, More Is Better)
RTX 5090: 2870.68
NVIDIA RTX 5080: 2734.06
RTX 5080: 2728.64
Note: (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

Hashcat

Hashcat 6.2.4 - Benchmark: TrueCrypt RIPEMD160 + XTS (H/s, More Is Better)
RTX 5090: 2776000
RTX 5080: 1294700
NVIDIA RTX 5080: 1294200

NAMD CUDA

NAMD CUDA 3.0.1 - Input: ATPase with 327,506 Atoms (ns/day, More Is Better)
RTX 5080: 14.57
NVIDIA RTX 5080: 14.44
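
NAMD CUDA 2.14 reports the RTX 5090 ATPase result in days/ns (fewer is better), while NAMD CUDA 3.0.1 reports the RTX 5080 ATPase results in ns/day (more is better); the RTX 5080 runs did not produce a NAMD 2.14 result. The two units are reciprocals, so a one-line conversion puts the numbers side by side. Because the NAMD versions differ, the converted figures are only a rough comparison, and the helper name is hypothetical.

```python
def days_per_ns_to_ns_per_day(days_per_ns: float) -> float:
    """days/ns and ns/day are reciprocals of each other."""
    return 1.0 / days_per_ns


rtx5090_ns_per_day = days_per_ns_to_ns_per_day(0.05810)  # NAMD CUDA 2.14, RTX 5090 run
rtx5080_ns_per_day = 14.57  # NAMD CUDA 3.0.1, RTX 5080 run
print(f"RTX 5090: {rtx5090_ns_per_day:.2f} ns/day (NAMD 2.14)")
print(f"RTX 5080: {rtx5080_ns_per_day:.2f} ns/day (NAMD 3.0.1)")
```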

RealSR-NCNN

RealSR-NCNN 20200818 - Scale: 4x - TAA: No (Seconds, Fewer Is Better)
RTX 5090: 4.630
NVIDIA RTX 5080: 5.153
RTX 5080: 5.221

Blender

Blender 4.3 - Blend File: BMW27 - Compute: NVIDIA OptiX (Seconds, Fewer Is Better)
RTX 5090: 2.92
NVIDIA RTX 5080: 4.10
RTX 5080: 4.12
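
Every Blender scene was rendered with both the NVIDIA CUDA and the NVIDIA OptiX backends, so each scene's OptiX speedup is the CUDA render time divided by the OptiX render time. The sketch below uses the RTX 5090 times from this page; the RTX 5080 times can be substituted the same way.

```python
# RTX 5090 render times in seconds from the Blender 4.3 results: (NVIDIA CUDA, NVIDIA OptiX)
rtx5090_times = {
    "Barbershop": (35.14, 24.33),
    "Pabellon Barcelona": (17.35, 7.00),
    "Fishy Cat": (8.92, 4.55),
    "Junkshop": (8.99, 5.66),
    "Classroom": (8.38, 6.16),
    "BMW27": (4.72, 2.92),
}

# Lower time is better, so CUDA time / OptiX time > 1 means OptiX was faster.
speedups = {scene: cuda / optix for scene, (cuda, optix) in rtx5090_times.items()}
for scene, speedup in speedups.items():
    print(f"{scene}: OptiX {speedup:.2f}x faster than CUDA")
```

On these numbers OptiX is faster on every scene, with Pabellon Barcelona benefiting the most.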

Waifu2x-NCNN Vulkan

Waifu2x-NCNN is an NCNN neural network implementation of the Waifu2x converter project, accelerated using the Vulkan API. NCNN, developed by Tencent, is a high-performance neural network inference framework optimized for mobile and other platforms. This test profile times how long it takes to increase the resolution of a sample image with Vulkan. Learn more via the OpenBenchmarking.org test page.

Waifu2x-NCNN Vulkan 20200818 - Scale: 2x - Denoise: 3 - TAA: Yes (Seconds, Fewer Is Better)
RTX 5090: 2.299
NVIDIA RTX 5080: 2.733
RTX 5080: 2.930

clpeak

clpeak 1.1.2 - OpenCL Test: Global Memory Bandwidth (GBPS, More Is Better)
RTX 5090: 1562.97
NVIDIA RTX 5080: 849.74
RTX 5080: 849.23
Note: (CXX) g++ options: -O3

SHOC Scalable HeterOgeneous Computing

SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: FFT SP (GFLOPS, More Is Better)
RTX 5090: 4398.39
NVIDIA RTX 5080: 2381.92
RTX 5080: 2381.80
Note: (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: GEMM SGEMM_N (GFLOPS, More Is Better)
RTX 5090: 35937.2
NVIDIA RTX 5080: 19013.1
RTX 5080: 18948.1
Note: (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: Triad (GB/s, More Is Better)
RTX 5080: 27.85
RTX 5090: 27.83
NVIDIA RTX 5080: 27.54
Note: (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: Bus Speed Readback (GB/s, More Is Better)
RTX 5090: 28.69
NVIDIA RTX 5080: 28.56
RTX 5080: 28.38
Note: (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: Bus Speed Download (GB/s, More Is Better)
RTX 5080: 28.79
RTX 5090: 28.79
NVIDIA RTX 5080: 28.79
Note: (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: S3D (GFLOPS, More Is Better)
RTX 5090: 1117.54
NVIDIA RTX 5080: 592.12
RTX 5080: 591.44
Note: (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: Reduction (GB/s, More Is Better)
  RTX 5080: 858.04
  NVIDIA RTX 5080: 857.86
  RTX 5090: 837.21
  Notes: (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi
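One simple way to read the SHOC OpenCL results above is to divide each RTX 5090 figure by the matching RTX 5080 figure. The Python sketch below does that with the values from these results. It uses the run labelled "RTX 5080"; the "NVIDIA RTX 5080" run differs only slightly. The dictionary layout and the `speedups` helper are illustrative only and are not part of SHOC or the Phoronix Test Suite.

```python
# SHOC OpenCL results copied from the results above (all are higher-is-better).
# Units: GFLOPS for FFT SP, GEMM SGEMM_N and S3D; GB/s for the others.
# Each tuple is (RTX 5090, RTX 5080).
SHOC_RESULTS = {
    "FFT SP": (4398.39, 2381.80),
    "GEMM SGEMM_N": (35937.2, 18948.1),
    "Triad": (27.83, 27.85),
    "Bus Speed Readback": (28.69, 28.38),
    "Bus Speed Download": (28.79, 28.79),
    "S3D": (1117.54, 591.44),
    "Reduction": (837.21, 858.04),
}


def speedups(results):
    """Return {benchmark: RTX 5090 / RTX 5080} for higher-is-better results."""
    return {name: rtx5090 / rtx5080 for name, (rtx5090, rtx5080) in results.items()}


if __name__ == "__main__":
    for name, ratio in speedups(SHOC_RESULTS).items():
        print(f"{name:20s} {ratio:.2f}x")
```

With these numbers, FFT SP, GEMM SGEMM_N and S3D come out at about 1.85x, 1.90x and 1.89x. Triad, the two bus-speed tests and Reduction stay within a few percent of 1.0x.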

clpeak

Clpeak is designed to test the peak capabilities of OpenCL devices. Learn more via the OpenBenchmarking.org test page.

clpeak 1.1.2 - OpenCL Test: Single-Precision Compute (GFLOPS, More Is Better)
  RTX 5090: 121415.53
  NVIDIA RTX 5080: 59305.85
  RTX 5080: 59218.89
  Notes: (CXX) g++ options: -O3

clpeak 1.1.2 - OpenCL Test: Integer Compute (GIOPS, More Is Better)
  RTX 5090: 62151.94
  NVIDIA RTX 5080: 30173.73
  RTX 5080: 30128.04
  Notes: (CXX) g++ options: -O3

clpeak 1.1.2 - OpenCL Test: Integer 24-bit Compute (GIOPS, More Is Better)
  RTX 5090: 61843.11
  RTX 5080: 30086.50
  NVIDIA RTX 5080: 30059.43
  Notes: (CXX) g++ options: -O3
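In general terms, a peak-throughput figure such as "Single-Precision Compute" is a rate: the number of arithmetic operations a kernel is known to perform, divided by the measured kernel time. The Python sketch below shows that arithmetic under the common convention that one fused multiply-add counts as two floating-point operations. The `gflops` function, its parameters and the example figures are hypothetical. They are not taken from clpeak's source, and clpeak's own kernels and bookkeeping may differ.

```python
def gflops(work_items: int, fma_per_item: int, elapsed_s: float) -> float:
    """Throughput in GFLOPS, counting each fused multiply-add as 2 FLOPs.

    Hypothetical helper: work_items is the number of kernel work-items,
    fma_per_item the multiply-adds each one performs, elapsed_s the
    measured kernel time in seconds.
    """
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    total_flops = work_items * fma_per_item * 2  # 2 FLOPs per FMA
    return total_flops / elapsed_s / 1e9         # FLOP/s -> GFLOPS
```

For example, `gflops(1_048_576, 4_096, 0.001)` returns about 8590 GFLOPS. That is 2^33 floating-point operations completed in one millisecond.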

SHOC Scalable HeterOgeneous Computing

SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: MD5 Hash (GHash/s, More Is Better)
  RTX 5090: 142.41
  RTX 5080: 69.56
  NVIDIA RTX 5080: 69.39
  Notes: (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

clpeak

clpeak 1.1.2 - OpenCL Test: Kernel Latency (µs, Fewer Is Better)
  RTX 5080: 4.87
  NVIDIA RTX 5080: 4.87
  RTX 5090: 5.15
  Notes: (CXX) g++ options: -O3
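The clpeak results mix higher-is-better throughput numbers with one fewer-is-better number (kernel latency). Any combined comparison therefore has to invert the latency ratio before averaging. The Python sketch below computes a geometric mean of RTX 5090 versus RTX 5080 ratios for the clpeak results in this section, using the run labelled "RTX 5080". It is an illustration only. The names are hypothetical, and it is not how OpenBenchmarking.org computes its own summaries.

```python
import math

# (RTX 5090, RTX 5080, higher_is_better) copied from the clpeak results above.
CLPEAK_RESULTS = {
    "Single-Precision Compute": (121415.53, 59218.89, True),
    "Integer Compute": (62151.94, 30128.04, True),
    "Integer 24-bit Compute": (61843.11, 30086.50, True),
    "Kernel Latency": (5.15, 4.87, False),
}


def relative_performance(rtx5090, rtx5080, higher_is_better):
    """Ratio oriented so that values above 1.0 favour the RTX 5090."""
    return rtx5090 / rtx5080 if higher_is_better else rtx5080 / rtx5090


def geometric_mean(values):
    """Geometric mean computed in log space to avoid overflow on long lists."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


if __name__ == "__main__":
    ratios = {name: relative_performance(*row) for name, row in CLPEAK_RESULTS.items()}
    for name, ratio in ratios.items():
        print(f"{name:26s} {ratio:.3f}")
    print(f"Geometric mean: {geometric_mean(ratios.values()):.3f}")
```

The three throughput ratios land just above 2.0. The latency ratio is about 0.95 because the RTX 5090 measured slightly higher latency. As a result, the combined geometric mean comes out near 1.7. Working in log space is a design choice: the sum cannot overflow or underflow when many ratios are combined.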
