NVIDIA RTX 5080 / RTX 5090 Compute Benchmarks

Benchmarks for a future article.

Compare your own system(s) to this result file with the Phoronix Test Suite by running the command: phoronix-test-suite benchmark 2501297-PTS-NVIDIART00

Run Management

Result Identifier    Date          Test Duration
RTX 5090             January 24    1 Hour
RTX 5080             January 28    2 Hours, 17 Minutes
NVIDIA RTX 5080      January 28    2 Hours, 15 Minutes
Average                            1 Hour, 51 Minutes

System Details

RTX 5090:
  Processor: Intel Core Ultra 9 285K @ 5.10GHz (24 Cores)
  Motherboard: ASUS ROG MAXIMUS Z890 HERO (1203 BIOS)
  Chipset: Intel Device ae7f
  Memory: 2 x 16GB DDR5-6400MT/s Micron CP16G64C38U5B.M8D1
  Disk: 1000GB Western Digital WDS100T1X0E-00AFY0 + 4001GB Western Digital WD_BLACK SN850X 4000GB
  Graphics: ASUS NVIDIA GeForce RTX 5090 32GB
  Audio: Intel Device 7f50
  Monitor: ASUS VP28U
  Network: Realtek Device 8126 + Intel I226-V + Intel Wi-Fi 7
  OS: Ubuntu 24.10
  Kernel: 6.11.0-13-generic (x86_64)
  Desktop: GNOME Shell 47.0
  Display Server: X Server 1.21.1.13
  Display Driver: NVIDIA 570.86.10
  OpenGL: 4.6.0
  OpenCL: OpenCL 3.0 CUDA 12.8.51 + OpenCL 3.0
  Compiler: GCC 14.2.0
  File-System: ext4
  Screen Resolution: 3840x2160

RTX 5080 / NVIDIA RTX 5080 (only fields that differ from the RTX 5090 run are listed; the source table does not make clear which of the two runs each change applies to):
  Graphics: ASUS NVIDIA GeForce RTX 5080 16GB
  Kernel: 6.11.0-14-generic (x86_64)
  OpenCL: OpenCL 3.0 CUDA 12.8.51
  Compiler: GCC 14.2.0 + CUDA 12.8

Kernel Details
  RTX 5090: nouveau.modeset=0 - Transparent Huge Pages: madvise
  RTX 5080: Transparent Huge Pages: madvise
  NVIDIA RTX 5080: Transparent Huge Pages: madvise

Compiler Details
  --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2,rust --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-14-zdkDXv/gcc-14-14.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-14-zdkDXv/gcc-14-14.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v

Processor Details
  Scaling Governor: intel_pstate performance (EPP: default) - CPU Microcode: 0x114 - Thermald 2.5.8

Graphics Details
  RTX 5090: BAR1 / Visible vRAM Size: 32768 MiB - vBIOS Version: 98.02.2e.00.03
  RTX 5080: BAR1 / Visible vRAM Size: 16384 MiB - vBIOS Version: 98.03.3b.00.01
  NVIDIA RTX 5080: BAR1 / Visible vRAM Size: 16384 MiB - vBIOS Version: 98.03.3b.00.01

OpenCL Details
  RTX 5090: GPU Compute Cores: 21760
  RTX 5080: GPU Compute Cores: 10752
  NVIDIA RTX 5080: GPU Compute Cores: 10752

Security Details
  gather_data_sampling: Not affected
  itlb_multihit: Not affected
  l1tf: Not affected
  mds: Not affected
  meltdown: Not affected
  mmio_stale_data: Not affected
  reg_file_data_sampling: Not affected
  retbleed: Not affected
  spec_rstack_overflow: Not affected
  spec_store_bypass: Mitigation of SSB disabled via prctl
  spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
  spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; RSB filling; PBRSB-eIBRS: Not affected; BHI: Not affected
  srbds: Not affected
  tsx_async_abort: Not affected

[Chart: Result Overview (Phoronix Test Suite). Relative performance of RTX 5090, RTX 5080, and NVIDIA RTX 5080 on a 100% to 211% scale for NCNN, VkResample, ProjectPhysX OpenCL-Benchmark, FluidX3D, Hashcat, Chaos Group V-RAY, Blender, clpeak, IndigoBench, SHOC Scalable HeterOgeneous Computing, Waifu2x-NCNN Vulkan, and RealSR-NCNN.]
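As a rough illustration of how relative-performance figures like those in the overview can be derived, the sketch below computes per-test speedups of the RTX 5090 over the RTX 5080 run and their geometric mean. The four results are copied from the per-test sections of this file. The helper names (`speedup`, `geometric_mean`) and the choice of tests are assumptions made for this example, and this is not necessarily how the Phoronix Test Suite computes its own chart.

```python
from math import prod

# (test, RTX 5080 result, RTX 5090 result, higher_is_better)
# Values are copied from the per-test results in this result file.
RESULTS = [
    ("Blender 4.3 Barbershop CUDA (Seconds)", 58.30, 35.14, False),
    ("FluidX3D 3.0 FP32-FP32 (MLUPs/s)", 5174, 9524, True),
    ("Hashcat 6.2.4 SHA-512 (H/s)", 4243000000, 8900400000, True),
    ("Chaos Group V-RAY 6.0 CUDA (vpaths)", 2528, 4851, True),
]


def speedup(baseline, candidate, higher_is_better):
    """Return how many times faster `candidate` is than `baseline`.

    For "fewer is better" metrics (e.g. seconds) the ratio is inverted so
    that a value above 1.0 always means the candidate is faster.
    """
    if higher_is_better:
        return candidate / baseline
    return baseline / candidate


def geometric_mean(values):
    """Geometric mean, the usual way to average ratios."""
    values = list(values)
    return prod(values) ** (1.0 / len(values))


ratios = {name: speedup(b, c, hib) for name, b, c, hib in RESULTS}
overall = geometric_mean(ratios.values())

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
print(f"Geometric mean over {len(ratios)} tests: {overall:.2f}x")
```

The geometric mean is used rather than the arithmetic mean because the inputs are ratios; a different selection of tests would give a different overall figure.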

[Summary table: every result for RTX 5090, RTX 5080, and NVIDIA RTX 5080 in a single grid. The same values are listed test by test in the sections that follow.]

NCNN

NCNN 20241226 - Target: Vulkan GPU - Model: FastestDet (ms, Fewer Is Better)
  NVIDIA RTX 5080: 32.48 (MIN: 5.11 / MAX: 96.35)
  RTX 5080: 43.00 (MIN: 5.09 / MAX: 96.19)
  RTX 5090: 11.01 (MIN: 5.09 / MAX: 92.36)

NCNN 20241226 - Target: Vulkan GPU - Model: vision_transformer (ms, Fewer Is Better)
  NVIDIA RTX 5080: 108.87 (MIN: 45.11 / MAX: 120.3)
  RTX 5080: 107.87 (MIN: 40.48 / MAX: 118.93)
  RTX 5090: 62.76 (MIN: 40.3 / MAX: 105.61)

NCNN 20241226 - Target: Vulkan GPU - Model: regnety_400m (ms, Fewer Is Better)
  NVIDIA RTX 5080: 155.22 (MIN: 21.96 / MAX: 495.67)
  RTX 5080: 165.97 (MIN: 21.95 / MAX: 492.35)
  RTX 5090: 47.90 (MIN: 21.96 / MAX: 421.33)

NCNN 20241226 - Target: Vulkan GPU - Model: squeezenet_ssd (ms, Fewer Is Better)
  NVIDIA RTX 5080: 68.01 (MIN: 7.27 / MAX: 102.26)
  RTX 5080: 68.02 (MIN: 7.34 / MAX: 99.35)
  RTX 5090: 22.02 (MIN: 7.41 / MAX: 92.66)

NCNN 20241226 - Target: Vulkan GPU - Model: yolov4-tiny (ms, Fewer Is Better)
  NVIDIA RTX 5080: 44.55 (MIN: 13.75 / MAX: 49.72)
  RTX 5080: 44.09 (MIN: 14.62 / MAX: 49.42)
  RTX 5090: 39.34 (MIN: 15.92 / MAX: 49.04)

NCNN 20241226 - Target: Vulkan GPU - Model: mobilenetv2-yolov3 (ms, Fewer Is Better)
  NVIDIA RTX 5080: 71.02 (MIN: 8.7 / MAX: 81.03)
  RTX 5080: 71.27 (MIN: 8.2 / MAX: 79.42)
  RTX 5090: 42.44 (MIN: 8.34 / MAX: 76.17)

NCNN 20241226 - Target: Vulkan GPU - Model: resnet50 (ms, Fewer Is Better)
  NVIDIA RTX 5080: 57.57 (MIN: 9.95 / MAX: 94)
  RTX 5080: 44.76 (MIN: 9.99 / MAX: 92.95)
  RTX 5090: 28.58 (MIN: 10.02 / MAX: 89.48)

NCNN 20241226 - Target: Vulkan GPU - Model: alexnet (ms, Fewer Is Better)
  NVIDIA RTX 5080: 17.60 (MIN: 3.2 / MAX: 23.03)
  RTX 5080: 18.54 (MIN: 3.21 / MAX: 22.49)
  RTX 5090: 8.74 (MIN: 3.21 / MAX: 22.23)

NCNN 20241226 - Target: Vulkan GPU - Model: resnet18 (ms, Fewer Is Better)
  NVIDIA RTX 5080: 16.31 (MIN: 4.4 / MAX: 44.98)
  RTX 5080: 17.42 (MIN: 4.48 / MAX: 44.53)
  RTX 5090: 10.93 (MIN: 4.48 / MAX: 44.35)

NCNN 20241226 - Target: Vulkan GPU - Model: vgg16 (ms, Fewer Is Better)
  NVIDIA RTX 5080: 42.47 (MIN: 25.67 / MAX: 47.21)
  RTX 5080: 42.01 (MIN: 23.96 / MAX: 46.31)
  RTX 5090: 37.05 (MIN: 22.53 / MAX: 46.22)

NCNN 20241226 - Target: Vulkan GPU - Model: googlenet (ms, Fewer Is Better)
  NVIDIA RTX 5080: 42.54 (MIN: 7.59 / MAX: 104.85)
  RTX 5080: 36.58 (MIN: 7.53 / MAX: 102.56)
  RTX 5090: 13.92 (MIN: 7.49 / MAX: 98.37)

NCNN 20241226 - Target: Vulkan GPU - Model: blazeface (ms, Fewer Is Better)
  NVIDIA RTX 5080: 4.08 (MIN: 2.36 / MAX: 52.23)
  RTX 5080: 6.42 (MIN: 2.36 / MAX: 52.25)
  RTX 5090: 2.67 (MIN: 2.4 / MAX: 41.21)

NCNN 20241226 - Target: Vulkan GPU - Model: efficientnet-b0 (ms, Fewer Is Better)
  NVIDIA RTX 5080: 65.77 (MIN: 6.28 / MAX: 115.25)
  RTX 5080: 63.39 (MIN: 6.28 / MAX: 114.02)
  RTX 5090: 27.44 (MIN: 6.34 / MAX: 109.98)

NCNN 20241226 - Target: Vulkan GPU - Model: mnasnet (ms, Fewer Is Better)
  NVIDIA RTX 5080: 29.33 (MIN: 3.68 / MAX: 66.87)
  RTX 5080: 37.59 (MIN: 3.7 / MAX: 66.4)
  RTX 5090: 9.80 (MIN: 3.75 / MAX: 63.5)

NCNN 20241226 - Target: Vulkan GPU - Model: shufflenet-v2 (ms, Fewer Is Better)
  NVIDIA RTX 5080: 9.75 (MIN: 3.91 / MAX: 74.73)
  RTX 5080: 10.82 (MIN: 3.9 / MAX: 74.52)
  RTX 5090: 4.53 (MIN: 3.88 / MAX: 57.59)

NCNN 20241226 - Target: Vulkan GPU - Model: mobilenet-v3 (ms, Fewer Is Better)
  NVIDIA RTX 5080: 6.00 (MIN: 4.05 / MAX: 79)
  RTX 5080: 4.22 (MIN: 4.04 / MAX: 4.96)
  RTX 5090: 4.68 (MIN: 4.08 / MAX: 57.94)

NCNN 20241226 - Target: Vulkan GPU - Model: mobilenet-v2 (ms, Fewer Is Better)
  NVIDIA RTX 5080: 33.96 (MIN: 3.85 / MAX: 68.9)
  RTX 5080: 41.52 (MIN: 3.78 / MAX: 68.66)
  RTX 5090: 10.21 (MIN: 3.85 / MAX: 64.73)

NCNN 20241226 - Target: Vulkan GPU - Model: mobilenet (ms, Fewer Is Better)
  NVIDIA RTX 5080: 71.02 (MIN: 8.7 / MAX: 81.03)
  RTX 5080: 71.27 (MIN: 8.2 / MAX: 79.42)
  RTX 5090: 42.44 (MIN: 8.34 / MAX: 76.17)

All NCNN results: (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

Llama.cpp

Llama.cpp b4397 - Backend: NVIDIA CUDA - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Text Generation 128 (Tokens Per Second, More Is Better)
  NVIDIA RTX 5080: 97.03
  RTX 5080: 97.00

Llama.cpp b4397 - Backend: NVIDIA CUDA - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Text Generation 128 (Tokens Per Second, More Is Better)
  NVIDIA RTX 5080: 101.84
  RTX 5080: 101.93

Llama.cpp b4397 - Backend: NVIDIA CUDA - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Text Generation 128 (Tokens Per Second, More Is Better)
  NVIDIA RTX 5080: 105.25
  RTX 5080: 105.09

All three results: (CXX) g++ options: -O3

Chaos Group V-RAY

This is a test of Chaos Group's V-RAY benchmark. V-RAY is a commercial renderer that can integrate with various creator software products like SketchUp and 3ds Max. The V-RAY benchmark is standalone and supports CPU and NVIDIA CUDA/RTX based rendering. Learn more via the OpenBenchmarking.org test page.

Chaos Group V-RAY 6.0 - Mode: NVIDIA CUDA GPU (vpaths, More Is Better)
  NVIDIA RTX 5080: 2528
  RTX 5080: 2528
  RTX 5090: 4851

Chaos Group V-RAY 6.0 - Mode: NVIDIA RTX GPU (vpaths, More Is Better)
  NVIDIA RTX 5080: 7372
  RTX 5080: 7388
  RTX 5090: 11923

IndigoBench

This is a test of Indigo Renderer's IndigoBench benchmark. Learn more via the OpenBenchmarking.org test page.

IndigoBench 4.4 - Acceleration: OpenCL GPU - Scene: Bedroom (M samples/s, More Is Better)
  NVIDIA RTX 5080: 26.27
  RTX 5080: 26.29
  RTX 5090: 42.76

IndigoBench 4.4 - Acceleration: OpenCL GPU - Scene: Supercar (M samples/s, More Is Better)
  NVIDIA RTX 5080: 68.32
  RTX 5080: 68.35
  RTX 5090: 92.70

NAMD CUDA

NAMD is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. NAMD was developed by the Theoretical and Computational Biophysics Group in the Beckman Institute for Advanced Science and Technology at the University of Illinois at Urbana-Champaign. This version of the NAMD test profile uses CUDA GPU acceleration. Learn more via the OpenBenchmarking.org test page.

NAMD CUDA 2.14 - ATPase Simulation - 327,506 Atoms (days/ns, Fewer Is Better)
  RTX 5090: 0.05810

ATPase Simulation - 327,506 Atoms

RTX 5080: The test run did not produce a result. E: FATAL ERROR: No simulation config file specified on command line.

NVIDIA RTX 5080: The test run did not produce a result. E: FATAL ERROR: No simulation config file specified on command line.
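The NAMD CUDA 2.14 result above is reported in days/ns (fewer is better), while the NAMD CUDA 3.0.1 results elsewhere in this file use ns/day (more is better). The two units are reciprocals, so a minimal sketch of the conversion looks like the following. The function name is an assumption made for this example. Because the two test profiles use different NAMD versions, the converted figure is a unit change only, not a like-for-like comparison with the 3.0.1 numbers.

```python
def days_per_ns_to_ns_per_day(days_per_ns: float) -> float:
    """Convert a NAMD days/ns figure into ns/day (its reciprocal)."""
    if days_per_ns <= 0:
        raise ValueError("days/ns must be positive")
    return 1.0 / days_per_ns


# RTX 5090, NAMD CUDA 2.14, ATPase Simulation - 327,506 Atoms (from this file).
rtx_5090_days_per_ns = 0.05810
rtx_5090_ns_per_day = days_per_ns_to_ns_per_day(rtx_5090_days_per_ns)

print(f"{rtx_5090_days_per_ns} days/ns = {rtx_5090_ns_per_day:.2f} ns/day")
```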

Blender

Blender 4.3 - Blend File: Barbershop - Compute: NVIDIA CUDA (Seconds, Fewer Is Better)
  NVIDIA RTX 5080: 58.07
  RTX 5080: 58.30
  RTX 5090: 35.14

Llama.cpp

Llama.cpp b4397 - Backend: NVIDIA CUDA - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 2048 (Tokens Per Second, More Is Better)
  NVIDIA RTX 5080: 3687.77
  RTX 5080: 3705.87
  (CXX) g++ options: -O3

VkResample

VkResample is a Vulkan-based image upscaling library based on VkFFT. The sample input file is upscaling a 4K image to 8K using Vulkan-based GPU acceleration. Learn more via the OpenBenchmarking.org test page.

VkResample 1.0 - Upscale: 2x - Precision: Double (ms, Fewer Is Better)
  NVIDIA RTX 5080: 212.31
  RTX 5080: 212.35
  RTX 5090: 103.51
  (CXX) g++ options: -O3

Blender

Blender 4.3 - Blend File: Barbershop - Compute: NVIDIA OptiX (Seconds, Fewer Is Better)
  NVIDIA RTX 5080: 39.09
  RTX 5080: 39.48
  RTX 5090: 24.33

FluidX3D

FluidX3D 3.0 - Test: FP32-FP32 (MLUPs/s, More Is Better)
  NVIDIA RTX 5080: 5174
  RTX 5080: 5174
  RTX 5090: 9524

Blender

Blender 4.3 - Blend File: Pabellon Barcelona - Compute: NVIDIA CUDA (Seconds, Fewer Is Better)
  NVIDIA RTX 5080: 31.54
  RTX 5080: 31.55
  RTX 5090: 17.35

Llama.cpp

Llama.cpp b4397 - Backend: NVIDIA CUDA - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 2048 (Tokens Per Second, More Is Better)
  NVIDIA RTX 5080: 6630.67
  RTX 5080: 6627.59
  (CXX) g++ options: -O3

clpeak

Clpeak is designed to test the peak capabilities of OpenCL devices. Learn more via the OpenBenchmarking.org test page.

clpeak 1.1.2 - OpenCL Test: Transfer Bandwidth enqueueReadBuffer (GBPS, More Is Better)
  NVIDIA RTX 5080: 13.70
  RTX 5080: 13.95
  RTX 5090: 13.83
  (CXX) g++ options: -O3

clpeak 1.1.2 - OpenCL Test: Transfer Bandwidth enqueueWriteBuffer (GBPS, More Is Better)
  NVIDIA RTX 5080: 18.49
  RTX 5080: 18.64
  RTX 5090: 18.41
  (CXX) g++ options: -O3

Llama.cpp

Llama.cpp b4397 - Backend: NVIDIA CUDA - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 2048 (Tokens Per Second, More Is Better)
  NVIDIA RTX 5080: 6653.97
  RTX 5080: 6642.52
  (CXX) g++ options: -O3

Llama.cpp b4397 - Backend: NVIDIA CUDA - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 1024 (Tokens Per Second, More Is Better)
  NVIDIA RTX 5080: 3832.03
  RTX 5080: 3831.53
  (CXX) g++ options: -O3

RealSR-NCNN

RealSR-NCNN is an NCNN neural network implementation of the RealSR project and accelerated using the Vulkan API. RealSR is the Real-World Super Resolution via Kernel Estimation and Noise Injection. NCNN is a high performance neural network inference framework optimized for mobile and other platforms developed by Tencent. This test profile times how long it takes to increase the resolution of a sample image by a scale of 4x with Vulkan. Learn more via the OpenBenchmarking.org test page.

RealSR-NCNN 20200818 - Scale: 4x - TAA: Yes (Seconds, Fewer Is Better)
  NVIDIA RTX 5080: 18.64
  RTX 5080: 18.68
  RTX 5090: 13.48

NAMD CUDA

NAMD CUDA 3.0.1 - Input: STMV with 1,066,628 Atoms (ns/day, More Is Better)
  NVIDIA RTX 5080: 4.01155
  RTX 5080: 4.02164

FluidX3D

FluidX3D 3.0 - Test: FP32-FP16S (MLUPs/s, More Is Better)
  NVIDIA RTX 5080: 10275
  RTX 5080: 10276
  RTX 5090: 18499

FluidX3D 3.0 - Test: FP32-FP16C (MLUPs/s, More Is Better)
  NVIDIA RTX 5080: 10330
  RTX 5080: 10328
  RTX 5090: 19140

Blender

Blender 4.3 - Blend File: Fishy Cat - Compute: NVIDIA CUDA (Seconds, Fewer Is Better)
  NVIDIA RTX 5080: 14.43
  RTX 5080: 14.40
  RTX 5090: 8.92

Llama.cpp

Llama.cpp b4397 - Backend: NVIDIA CUDA - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 1024 (Tokens Per Second, More Is Better)
  NVIDIA RTX 5080: 7304.83
  RTX 5080: 7292.84
  (CXX) g++ options: -O3

Blender

Blender 4.3 - Blend File: Junkshop - Compute: NVIDIA CUDA (Seconds, Fewer Is Better)
  NVIDIA RTX 5080: 12.94
  RTX 5080: 13.80
  RTX 5090: 8.99

Blender 4.3 - Blend File: Classroom - Compute: NVIDIA CUDA (Seconds, Fewer Is Better)
  NVIDIA RTX 5080: 14.05
  RTX 5080: 14.08
  RTX 5090: 8.38

Llama.cpp

Llama.cpp b4397 - Backend: NVIDIA CUDA - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 1024 (Tokens Per Second, More Is Better)
  NVIDIA RTX 5080: 7331.65
  RTX 5080: 7327.35
  (CXX) g++ options: -O3

Hashcat

Hashcat is an open-source, advanced password recovery tool supporting GPU acceleration with OpenCL, NVIDIA CUDA, and Radeon ROCm. Learn more via the OpenBenchmarking.org test page.

Hashcat 6.2.4 - Benchmark: MD5 (H/s, More Is Better)
  NVIDIA RTX 5080: 99815800000
  RTX 5080: 99435000000
  RTX 5090: 106848250000
  SE +/- 102551750000.00, N = 2

Llama.cpp

Llama.cpp b4397 - Backend: NVIDIA CUDA - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 512 (Tokens Per Second, More Is Better)
  NVIDIA RTX 5080: 3849.78
  RTX 5080: 3878.92
  (CXX) g++ options: -O3

Blender

Blender 4.3 - Blend File: Pabellon Barcelona - Compute: NVIDIA OptiX (Seconds, Fewer Is Better)
  NVIDIA RTX 5080: 10.39
  RTX 5080: 10.60
  RTX 5090: 7.00

Hashcat

Hashcat 6.2.4 - Benchmark: SHA-512 (H/s, More Is Better)
  NVIDIA RTX 5080: 4265500000
  RTX 5080: 4243000000
  RTX 5090: 8900400000

Hashcat 6.2.4 - Benchmark: SHA1 (H/s, More Is Better)
  NVIDIA RTX 5080: 32633100000
  RTX 5080: 32522500000
  RTX 5090: 68852500000

Blender

Blender 4.3 - Blend File: Classroom - Compute: NVIDIA OptiX (Seconds, Fewer Is Better)
  NVIDIA RTX 5080: 9.43
  RTX 5080: 9.47
  RTX 5090: 6.16

ProjectPhysX OpenCL-Benchmark

ProjectPhysX OpenCL-Benchmark 1.6 - Operation: Memory Bandwidth Coalesced Write (GB/s, More Is Better)
  NVIDIA RTX 5080: 911.96
  RTX 5080: 911.11
  RTX 5090: 1687.49

ProjectPhysX OpenCL-Benchmark 1.6 - Operation: Memory Bandwidth Coalesced Read (GB/s, More Is Better)
  NVIDIA RTX 5080: 910.07
  RTX 5080: 909.66
  RTX 5090: 1596.24

ProjectPhysX OpenCL-Benchmark 1.6 - Operation: INT8 Compute (TIOPs/s, More Is Better)
  NVIDIA RTX 5080: 20.17
  RTX 5080: 20.37
  RTX 5090: 41.80

ProjectPhysX OpenCL-Benchmark 1.6 - Operation: INT16 Compute (TIOPs/s, More Is Better)
  NVIDIA RTX 5080: 27.26
  RTX 5080: 27.22
  RTX 5090: 54.02

ProjectPhysX OpenCL-Benchmark 1.6 - Operation: INT32 Compute (TIOPs/s, More Is Better)
  NVIDIA RTX 5080: 30.20
  RTX 5080: 30.21
  RTX 5090: 61.76

ProjectPhysX OpenCL-Benchmark 1.6 - Operation: INT64 Compute (TIOPs/s, More Is Better)
  NVIDIA RTX 5080: 4.300
  RTX 5080: 4.276
  RTX 5090: 4.396

ProjectPhysX OpenCL-Benchmark 1.6 - Operation: FP16 Compute (TFLOPs/s, More Is Better)
  NVIDIA RTX 5080: 59.47
  RTX 5080: 59.47
  RTX 5090: 122.91

ProjectPhysX OpenCL-Benchmark 1.6 - Operation: FP32 Compute (TFLOPs/s, More Is Better)
  NVIDIA RTX 5080: 58.12
  RTX 5080: 57.85
  RTX 5090: 117.85

ProjectPhysX OpenCL-Benchmark 1.6 - Operation: FP64 Compute (TFLOPs/s, More Is Better)
  NVIDIA RTX 5080: 0.95
  RTX 5080: 0.95
  RTX 5090: 1.95

All OpenCL-Benchmark results: (CXX) g++ options: -std=c++17 -pthread -lOpenCL
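One way to read the compute figures above is to set the measured RTX 5090 / RTX 5080 throughput ratio against the ratio of GPU compute cores reported in the OpenCL Details (21760 vs. 10752). The sketch below does that arithmetic for a few operations, using the RTX 5080 and RTX 5090 values from this file. The variable names are assumptions made for this example, and it ignores clock speed and memory differences, so it is only a rough sanity check.

```python
# GPU Compute Cores as reported in the OpenCL Details of this result file.
CORES = {"RTX 5080": 10752, "RTX 5090": 21760}

# ProjectPhysX OpenCL-Benchmark 1.6 results: (RTX 5080, RTX 5090).
MEASURED = {
    "INT32 Compute (TIOPs/s)": (30.21, 61.76),
    "INT64 Compute (TIOPs/s)": (4.276, 4.396),
    "FP16 Compute (TFLOPs/s)": (59.47, 122.91),
    "FP32 Compute (TFLOPs/s)": (57.85, 117.85),
    "FP64 Compute (TFLOPs/s)": (0.95, 1.95),
}

core_ratio = CORES["RTX 5090"] / CORES["RTX 5080"]
measured_ratios = {op: hi / lo for op, (lo, hi) in MEASURED.items()}

print(f"Core-count ratio: {core_ratio:.3f}x")
for op, ratio in measured_ratios.items():
    # Relative difference between measured scaling and core-count scaling.
    delta = (ratio / core_ratio - 1.0) * 100.0
    print(f"{op}: {ratio:.3f}x ({delta:+.1f}% vs. core ratio)")
```

In this data, most operations scale close to the core-count ratio, while INT64 is the clear exception, with the two cards nearly level.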

VkResample

VkResample 1.0 - Upscale: 2x - Precision: Single (ms, Fewer Is Better)
  NVIDIA RTX 5080: 10.276
  RTX 5080: 10.278
  RTX 5090: 5.648
  (CXX) g++ options: -O3

clpeak

clpeak 1.1.2 - OpenCL Test: Double-Precision Compute (GFLOPS, More Is Better)
  NVIDIA RTX 5080: 962.45
  RTX 5080: 962.39
  RTX 5090: 1976.90
  (CXX) g++ options: -O3

Blender

Blender 4.3 - Blend File: Junkshop - Compute: NVIDIA OptiX (Seconds, Fewer Is Better)
  NVIDIA RTX 5080: 7.77
  RTX 5080: 7.84
  RTX 5090: 5.66

Hashcat

Hashcat 6.2.4 - Benchmark: 7-Zip (H/s, More Is Better)
  NVIDIA RTX 5080: 1664500
  RTX 5080: 1657000
  RTX 5090: 3272300

Blender

Blender 4.3 - Blend File: BMW27 - Compute: NVIDIA CUDA (Seconds, Fewer Is Better)
  NVIDIA RTX 5080: 7.29
  RTX 5080: 7.34
  RTX 5090: 4.72

Blender 4.3 - Blend File: Fishy Cat - Compute: NVIDIA OptiX (Seconds, Fewer Is Better)
  NVIDIA RTX 5080: 7.19
  RTX 5080: 7.15
  RTX 5090: 4.55

Llama.cpp

Llama.cpp b4397 - Backend: NVIDIA CUDA - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 512 (Tokens Per Second, More Is Better)
  NVIDIA RTX 5080: 7799.24
  RTX 5080: 7804.82
  (CXX) g++ options: -O3

Llama.cpp b4397 - Backend: NVIDIA CUDA - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 512 (Tokens Per Second, More Is Better)
  NVIDIA RTX 5080: 7864.88
  RTX 5080: 7865.65
  (CXX) g++ options: -O3

SHOC Scalable HeterOgeneous Computing

This is the CUDA and OpenCL version of Vetter's Scalable HeterOgeneous Computing (SHOC) benchmark suite. SHOC provides a number of different benchmark programs for evaluating the performance and stability of compute devices. Learn more via the OpenBenchmarking.org test page.

SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: Texture Read Bandwidth (GB/s, More Is Better)
  NVIDIA RTX 5080: 2734.06
  RTX 5080: 2728.64
  RTX 5090: 2870.68
  (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

Hashcat

Hashcat 6.2.4 - Benchmark: TrueCrypt RIPEMD160 + XTS (H/s, More Is Better)
  NVIDIA RTX 5080: 1294200
  RTX 5080: 1294700
  RTX 5090: 2776000

NAMD CUDA

NAMD CUDA 3.0.1 - Input: ATPase with 327,506 Atoms (ns/day, More Is Better)
  NVIDIA RTX 5080: 14.44
  RTX 5080: 14.57

RealSR-NCNN

RealSR-NCNN 20200818 - Scale: 4x - TAA: No (Seconds, Fewer Is Better)
  NVIDIA RTX 5080: 5.153
  RTX 5080: 5.221
  RTX 5090: 4.630

Blender

Blender 4.3 - Blend File: BMW27 - Compute: NVIDIA OptiX (Seconds, Fewer Is Better)
  NVIDIA RTX 5080: 4.10
  RTX 5080: 4.12
  RTX 5090: 2.92

Waifu2x-NCNN Vulkan

Waifu2x-NCNN is an NCNN neural network implementation of the Waifu2x converter project and accelerated using the Vulkan API. NCNN is a high performance neural network inference framework optimized for mobile and other platforms developed by Tencent. This test profile times how long it takes to increase the resolution of a sample image with Vulkan. Learn more via the OpenBenchmarking.org test page.

Waifu2x-NCNN Vulkan 20200818 - Scale: 2x - Denoise: 3 - TAA: Yes (Seconds, Fewer Is Better)
  NVIDIA RTX 5080: 2.733
  RTX 5080: 2.930
  RTX 5090: 2.299

clpeak

clpeak 1.1.2 - OpenCL Test: Global Memory Bandwidth (GBPS, More Is Better)
  NVIDIA RTX 5080: 849.74
  RTX 5080: 849.23
  RTX 5090: 1562.97
  (CXX) g++ options: -O3

SHOC Scalable HeterOgeneous Computing

OpenBenchmarking.orgGFLOPS, More Is BetterSHOC Scalable HeterOgeneous Computing 2020-04-17Target: OpenCL - Benchmark: FFT SPNVIDIA RTX 5080RTX 5080RTX 509090018002700360045002381.922381.804398.391. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

OpenBenchmarking.orgGFLOPS, More Is BetterSHOC Scalable HeterOgeneous Computing 2020-04-17Target: OpenCL - Benchmark: GEMM SGEMM_NNVIDIA RTX 5080RTX 5080RTX 50908K16K24K32K40K19013.118948.135937.21. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

OpenBenchmarking.orgGB/s, More Is BetterSHOC Scalable HeterOgeneous Computing 2020-04-17Target: OpenCL - Benchmark: TriadNVIDIA RTX 5080RTX 5080RTX 509071421283527.5427.8527.831. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

OpenBenchmarking.orgGB/s, More Is BetterSHOC Scalable HeterOgeneous Computing 2020-04-17Target: OpenCL - Benchmark: Bus Speed ReadbackNVIDIA RTX 5080RTX 5080RTX 509071421283528.5628.3828.691. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

OpenBenchmarking.orgGB/s, More Is BetterSHOC Scalable HeterOgeneous Computing 2020-04-17Target: OpenCL - Benchmark: Bus Speed DownloadNVIDIA RTX 5080RTX 5080RTX 509071421283528.7928.7928.791. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

OpenBenchmarking.orgGFLOPS, More Is BetterSHOC Scalable HeterOgeneous Computing 2020-04-17Target: OpenCL - Benchmark: S3DNVIDIA RTX 5080RTX 5080RTX 50902004006008001000592.12591.441117.541. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing 2020-04-17
Target: OpenCL - Benchmark: Reduction
GB/s, More Is Better
  NVIDIA RTX 5080: 857.86
  RTX 5080: 858.04
  RTX 5090: 837.21
1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

clpeak

Clpeak is designed to test the peak capabilities of OpenCL devices. Learn more via the OpenBenchmarking.org test page.

clpeak 1.1.2
OpenCL Test: Single-Precision Compute
GFLOPS, More Is Better
  NVIDIA RTX 5080: 59305.85
  RTX 5080: 59218.89
  RTX 5090: 121415.53
1. (CXX) g++ options: -O3

clpeak 1.1.2
OpenCL Test: Integer Compute
GIOPS, More Is Better
  NVIDIA RTX 5080: 30173.73
  RTX 5080: 30128.04
  RTX 5090: 62151.94
1. (CXX) g++ options: -O3

clpeak 1.1.2
OpenCL Test: Integer 24-bit Compute
GIOPS, More Is Better
  NVIDIA RTX 5080: 30059.43
  RTX 5080: 30086.50
  RTX 5090: 61843.11
1. (CXX) g++ options: -O3

SHOC Scalable HeterOgeneous Computing

SHOC Scalable HeterOgeneous Computing 2020-04-17
Target: OpenCL - Benchmark: MD5 Hash
GHash/s, More Is Better
  NVIDIA RTX 5080: 69.39
  RTX 5080: 69.56
  RTX 5090: 142.41
1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

clpeak

clpeak 1.1.2
OpenCL Test: Kernel Latency
us, Fewer Is Better
  NVIDIA RTX 5080: 4.87
  RTX 5080: 4.87
  RTX 5090: 5.15
1. (CXX) g++ options: -O3
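
To read the SHOC and clpeak figures above as relative performance, the following is a minimal Python sketch that divides the RTX 5090 value by the "NVIDIA RTX 5080" value for a selection of these results. The helper name `relative_performance` and the `results` layout are illustrative assumptions and not part of the Phoronix Test Suite; the numbers are copied from the results in this section, and "Fewer Is Better" results are inverted so that a ratio above 1.0 always means the RTX 5090 is ahead.

```python
# Values copied from the OpenCL results in this section.
# Each entry: (test name, "NVIDIA RTX 5080" value, "RTX 5090" value, higher_is_better)
results = [
    ("SHOC FFT SP (GFLOPS)", 2381.92, 4398.39, True),
    ("SHOC GEMM SGEMM_N (GFLOPS)", 19013.1, 35937.2, True),
    ("SHOC Triad (GB/s)", 27.54, 27.83, True),
    ("SHOC S3D (GFLOPS)", 592.12, 1117.54, True),
    ("SHOC Reduction (GB/s)", 857.86, 837.21, True),
    ("SHOC MD5 Hash (GHash/s)", 69.39, 142.41, True),
    ("clpeak Single-Precision Compute (GFLOPS)", 59305.85, 121415.53, True),
    ("clpeak Integer Compute (GIOPS)", 30173.73, 62151.94, True),
    ("clpeak Kernel Latency (us)", 4.87, 5.15, False),
]


def relative_performance(rtx5080, rtx5090, higher_is_better):
    """Hypothetical helper: RTX 5090 performance relative to RTX 5080.

    A result above 1.0 means the RTX 5090 is ahead. For "Fewer Is Better"
    metrics such as latency, the ratio is inverted.
    """
    if higher_is_better:
        return rtx5090 / rtx5080
    return rtx5080 / rtx5090


ratios = {
    name: relative_performance(v5080, v5090, hib)
    for name, v5080, v5090, hib in results
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```

Under these assumptions, the compute-bound tests (FFT, SGEMM, S3D, MD5 Hash, clpeak compute) come out well above 1.0, while the bus-limited Triad result stays close to 1.0 and Reduction and Kernel Latency fall slightly below it.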

88 Results Shown

NCNN:
  Vulkan GPU - FastestDet
  Vulkan GPU - vision_transformer
  Vulkan GPU - regnety_400m
  Vulkan GPU - squeezenet_ssd
  Vulkan GPU - yolov4-tiny
  Vulkan GPU - mobilenetv2-yolov3
  Vulkan GPU - resnet50
  Vulkan GPU - alexnet
  Vulkan GPU - resnet18
  Vulkan GPU - vgg16
  Vulkan GPU - googlenet
  Vulkan GPU - blazeface
  Vulkan GPU - efficientnet-b0
  Vulkan GPU - mnasnet
  Vulkan GPU - shufflenet-v2
  Vulkan GPU - mobilenet-v3
  Vulkan GPU - mobilenet-v2
  Vulkan GPU - mobilenet
Llama.cpp:
  NVIDIA CUDA - Llama-3.1-Tulu-3-8B-Q8_0 - Text Generation 128
  NVIDIA CUDA - Mistral-7B-Instruct-v0.3-Q8_0 - Text Generation 128
  NVIDIA CUDA - granite-3.0-3b-a800m-instruct-Q8_0 - Text Generation 128
Chaos Group V-RAY:
  NVIDIA CUDA GPU
  NVIDIA RTX GPU
IndigoBench:
  OpenCL GPU - Bedroom
  OpenCL GPU - Supercar
NAMD CUDA
Blender
Llama.cpp
VkResample
Blender
FluidX3D
Blender
Llama.cpp
clpeak:
  Transfer Bandwidth enqueueReadBuffer
  Transfer Bandwidth enqueueWriteBuffer
Llama.cpp:
  NVIDIA CUDA - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 2048
  NVIDIA CUDA - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 1024
RealSR-NCNN
NAMD CUDA
FluidX3D:
  FP32-FP16S
  FP32-FP16C
Blender
Llama.cpp
Blender:
  Junkshop - NVIDIA CUDA
  Classroom - NVIDIA CUDA
Llama.cpp
Hashcat
Llama.cpp
Blender
Hashcat:
  SHA-512
  SHA1
Blender
ProjectPhysX OpenCL-Benchmark:
  Memory Bandwidth Coalesced Write
  Memory Bandwidth Coalesced Read
  INT8 Compute
  INT16 Compute
  INT32 Compute
  INT64 Compute
  FP16 Compute
  FP32 Compute
  FP64 Compute
VkResample
clpeak
Blender
Hashcat
Blender:
  BMW27 - NVIDIA CUDA
  Fishy Cat - NVIDIA OptiX
Llama.cpp:
  NVIDIA CUDA - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 512
  NVIDIA CUDA - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 512
SHOC Scalable HeterOgeneous Computing
Hashcat
NAMD CUDA
RealSR-NCNN
Blender
Waifu2x-NCNN Vulkan
clpeak
SHOC Scalable HeterOgeneous Computing:
  OpenCL - FFT SP
  OpenCL - GEMM SGEMM_N
  OpenCL - Triad
  OpenCL - Bus Speed Readback
  OpenCL - Bus Speed Download
  OpenCL - S3D
  OpenCL - Reduction
clpeak:
  Single-Precision Compute
  Integer Compute
  Integer 24-bit Compute
SHOC Scalable HeterOgeneous Computing
clpeak