nvidia RTX 5080 rtx 5090 compute benchmarks

Benchmarks for a future article.

Compare your own system(s) to this result file with the Phoronix Test Suite by running the command: phoronix-test-suite benchmark 2501297-PTS-NVIDIART00
Run Management

Result Identifier | Date | Test Duration
RTX 5090 | January 24 | 1 Hour
RTX 5080 | January 28 | 2 Hours, 17 Minutes
NVIDIA RTX 5080 | January 28 | 2 Hours, 15 Minutes
Average | | 1 Hour, 51 Minutes



nvidia RTX 5080 rtx 5090 compute benchmarks

System configuration (RTX 5090 run):
  Processor: Intel Core Ultra 9 285K @ 5.10GHz (24 Cores)
  Motherboard: ASUS ROG MAXIMUS Z890 HERO (1203 BIOS)
  Chipset: Intel Device ae7f
  Memory: 2 x 16GB DDR5-6400MT/s Micron CP16G64C38U5B.M8D1
  Disk: 1000GB Western Digital WDS100T1X0E-00AFY0 + 4001GB Western Digital WD_BLACK SN850X 4000GB
  Graphics: ASUS NVIDIA GeForce RTX 5090 32GB
  Audio: Intel Device 7f50
  Monitor: ASUS VP28U
  Network: Realtek Device 8126 + Intel I226-V + Intel Wi-Fi 7
  OS: Ubuntu 24.10
  Kernel: 6.11.0-13-generic (x86_64)
  Desktop: GNOME Shell 47.0
  Display Server: X Server 1.21.1.13
  Display Driver: NVIDIA 570.86.10
  OpenGL: 4.6.0
  OpenCL: OpenCL 3.0 CUDA 12.8.51 + OpenCL 3.0
  Compiler: GCC 14.2.0
  File-System: ext4
  Screen Resolution: 3840x2160

Changed values listed for the RTX 5080 and NVIDIA RTX 5080 runs:
  Graphics: ASUS NVIDIA GeForce RTX 5080 16GB
  Kernel: 6.11.0-14-generic (x86_64)
  OpenCL: OpenCL 3.0 CUDA 12.8.51
  Compiler: GCC 14.2.0 + CUDA 12.8

Kernel Details
  - RTX 5090: nouveau.modeset=0 - Transparent Huge Pages: madvise
  - RTX 5080: Transparent Huge Pages: madvise
  - NVIDIA RTX 5080: Transparent Huge Pages: madvise

Compiler Details
  - --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2,rust --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-14-zdkDXv/gcc-14-14.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-14-zdkDXv/gcc-14-14.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v

Processor Details
  - Scaling Governor: intel_pstate performance (EPP: default)
  - CPU Microcode: 0x114
  - Thermald 2.5.8

Graphics Details
  - RTX 5090: BAR1 / Visible vRAM Size: 32768 MiB - vBIOS Version: 98.02.2e.00.03
  - RTX 5080: BAR1 / Visible vRAM Size: 16384 MiB - vBIOS Version: 98.03.3b.00.01
  - NVIDIA RTX 5080: BAR1 / Visible vRAM Size: 16384 MiB - vBIOS Version: 98.03.3b.00.01

OpenCL Details
  - RTX 5090: GPU Compute Cores: 21760
  - RTX 5080: GPU Compute Cores: 10752
  - NVIDIA RTX 5080: GPU Compute Cores: 10752

Security Details
  - gather_data_sampling: Not affected
  - itlb_multihit: Not affected
  - l1tf: Not affected
  - mds: Not affected
  - meltdown: Not affected
  - mmio_stale_data: Not affected
  - reg_file_data_sampling: Not affected
  - retbleed: Not affected
  - spec_rstack_overflow: Not affected
  - spec_store_bypass: Mitigation of SSB disabled via prctl
  - spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
  - spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; RSB filling; PBRSB-eIBRS: Not affected; BHI: Not affected
  - srbds: Not affected
  - tsx_async_abort: Not affected

Result Overview (Phoronix Test Suite): relative-performance chart for RTX 5090, RTX 5080, and NVIDIA RTX 5080 on a 100% to 211% axis, covering NCNN, VkResample, ProjectPhysX OpenCL-Benchmark, FluidX3D, Hashcat, Chaos Group V-RAY, Blender, clpeak, IndigoBench, SHOC Scalable HeterOgeneous Computing, Waifu2x-NCNN Vulkan, and RealSR-NCNN.

nvidia RTX 5080 rtx 5090 compute benchmarks

Test | RTX 5090 | RTX 5080 | NVIDIA RTX 5080
ncnn: Vulkan GPU-v2-v2 - mobilenet-v2 | 10.21 | 41.52 | 33.96
ncnn: Vulkan GPU - FastestDet | 11.01 | 43 | 32.48
ncnn: Vulkan GPU - mnasnet | 9.8 | 37.59 | 29.33
ncnn: Vulkan GPU - regnety_400m | 47.9 | 165.97 | 155.22
ncnn: Vulkan GPU - squeezenet_ssd | 22.02 | 68.02 | 68.01
ncnn: Vulkan GPU - googlenet | 13.92 | 36.58 | 42.54
ncnn: Vulkan GPU - blazeface | 2.67 | 6.42 | 4.08
ncnn: Vulkan GPU - efficientnet-b0 | 27.44 | 63.39 | 65.77
ncnn: Vulkan GPU - shufflenet-v2 | 4.53 | 10.82 | 9.75
hashcat: TrueCrypt RIPEMD160 + XTS | 2776000 | 1294700 | 1294200
ncnn: Vulkan GPU - alexnet | 8.74 | 18.54 | 17.6
hashcat: SHA1 | 68852500000 | 32522500000 | 32633100000
hashcat: SHA-512 | 8900400000 | 4243000000 | 4265500000
opencl-benchmark: INT8 Compute | 41.795 | 20.368 | 20.169
opencl-benchmark: FP16 Compute | 122.914 | 59.468 | 59.469
clpeak: Integer Compute | 62151.94 | 30128.04 | 30173.73
clpeak: Integer 24-bit Compute | 61843.11 | 30086.5 | 30059.43
clpeak: Double-Precision Compute | 1976.9 | 962.39 | 962.45
opencl-benchmark: FP64 Compute | 1.951 | 0.95 | 0.95
shoc: OpenCL - MD5 Hash | 142.407 | 69.5635 | 69.3868
vkresample: 2x - Double | 103.505 | 212.348 | 212.313
clpeak: Single-Precision Compute | 121415.53 | 59218.89 | 59305.85
opencl-benchmark: INT32 Compute | 61.759 | 30.21 | 30.197
opencl-benchmark: FP32 Compute | 117.847 | 57.851 | 58.116
ncnn: Vulkan GPU - resnet50 | 28.58 | 44.76 | 57.57
opencl-benchmark: INT16 Compute | 54.018 | 27.22 | 27.255
hashcat: 7-Zip | 3272300 | 1657000 | 1664500
v-ray: NVIDIA CUDA GPU | 4851 | 2528 | 2528
shoc: OpenCL - GEMM SGEMM_N | 35937.2 | 18948.1 | 19013.1
shoc: OpenCL - S3D | 1117.54 | 591.436 | 592.115
fluidx3d: FP32-FP16C | 19140 | 10328 | 10330
opencl-benchmark: Memory Bandwidth Coalesced Write | 1687.49 | 911.11 | 911.96
shoc: OpenCL - FFT SP | 4398.39 | 2381.8 | 2381.92
fluidx3d: FP32-FP32 | 9524 | 5174 | 5174
clpeak: Global Memory Bandwidth | 1562.97 | 849.23 | 849.74
vkresample: 2x - Single | 5.648 | 10.278 | 10.276
blender: Pabellon Barcelona - NVIDIA CUDA | 17.35 | 31.55 | 31.54
fluidx3d: FP32-FP16S | 18499 | 10276 | 10275
opencl-benchmark: Memory Bandwidth Coalesced Read | 1596.24 | 909.66 | 910.07
ncnn: Vulkan GPU - vision_transformer | 62.76 | 107.87 | 108.87
blender: Classroom - NVIDIA CUDA | 8.38 | 14.08 | 14.05
ncnn: Vulkan GPUv2-yolov3v2-yolov3 - mobilenetv2-yolov3 | 42.44 | 71.27 | 71.02
ncnn: Vulkan GPU - mobilenet | 42.44 | 71.27 | 71.02
blender: Barbershop - NVIDIA CUDA | 35.14 | 58.3 | 58.07
indigobench: OpenCL GPU - Bedroom | 42.759 | 26.293 | 26.272
blender: Barbershop - NVIDIA OptiX | 24.33 | 39.48 | 39.09
blender: Fishy Cat - NVIDIA CUDA | 8.92 | 14.4 | 14.43
v-ray: NVIDIA RTX GPU | 11923 | 7388 | 7372
ncnn: Vulkan GPU - resnet18 | 10.93 | 17.42 | 16.31
blender: Fishy Cat - NVIDIA OptiX | 4.55 | 7.15 | 7.19
blender: BMW27 - NVIDIA CUDA | 4.72 | 7.34 | 7.29
blender: Classroom - NVIDIA OptiX | 6.16 | 9.47 | 9.43
blender: Junkshop - NVIDIA CUDA | 8.99 | 13.8 | 12.94
blender: Pabellon Barcelona - NVIDIA OptiX | 7 | 10.6 | 10.39
ncnn: Vulkan GPU-v3-v3 - mobilenet-v3 | 4.68 | 4.22 | 6
blender: BMW27 - NVIDIA OptiX | 2.92 | 4.12 | 4.1
blender: Junkshop - NVIDIA OptiX | 5.66 | 7.84 | 7.77
realsr-ncnn: 4x - Yes | 13.484 | 18.676 | 18.64
indigobench: OpenCL GPU - Supercar | 92.7 | 68.346 | 68.317
waifu2x-ncnn: 2x - 3 - Yes | 2.299 | 2.93 | 2.733
ncnn: Vulkan GPU - vgg16 | 37.05 | 42.01 | 42.47
ncnn: Vulkan GPU - yolov4-tiny | 39.34 | 44.09 | 44.55
realsr-ncnn: 4x - No | 4.63 | 5.221 | 5.153
hashcat: MD5 | 106848250000 | 99435000000 | 99815800000
clpeak: Kernel Latency | 5.15 | 4.87 | 4.87
shoc: OpenCL - Texture Read Bandwidth | 2870.68 | 2728.64 | 2734.06
opencl-benchmark: INT64 Compute | 4.396 | 4.276 | 4.3
shoc: OpenCL - Reduction | 837.207 | 858.042 | 857.861
clpeak: Transfer Bandwidth enqueueReadBuffer | 13.83 | 13.95 | 13.7
clpeak: Transfer Bandwidth enqueueWriteBuffer | 18.41 | 18.64 | 18.49
shoc: OpenCL - Triad | 27.8329 | 27.8462 | 27.5438
shoc: OpenCL - Bus Speed Readback | 28.6895 | 28.3798 | 28.5572
namd-cuda: ATPase with 327,506 Atoms | - | 14.57254 | 14.43603
llama-cpp: NVIDIA CUDA - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 512 | - | 3878.92 | 3849.78
llama-cpp: NVIDIA CUDA - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 2048 | - | 3705.87 | 3687.77
namd-cuda: STMV with 1,066,628 Atoms | - | 4.02164 | 4.01155
llama-cpp: NVIDIA CUDA - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 2048 | - | 6642.52 | 6653.97
llama-cpp: NVIDIA CUDA - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 1024 | - | 7292.84 | 7304.83
llama-cpp: NVIDIA CUDA - granite-3.0-3b-a800m-instruct-Q8_0 - Text Generation 128 | - | 105.09 | 105.25
llama-cpp: NVIDIA CUDA - Mistral-7B-Instruct-v0.3-Q8_0 - Text Generation 128 | - | 101.93 | 101.84
llama-cpp: NVIDIA CUDA - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 512 | - | 7804.82 | 7799.24
llama-cpp: NVIDIA CUDA - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 1024 | - | 7327.35 | 7331.65
llama-cpp: NVIDIA CUDA - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 2048 | - | 6627.59 | 6630.67
llama-cpp: NVIDIA CUDA - Llama-3.1-Tulu-3-8B-Q8_0 - Text Generation 128 | - | 97 | 97.03
llama-cpp: NVIDIA CUDA - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 1024 | - | 3831.53 | 3832.03
llama-cpp: NVIDIA CUDA - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 512 | - | 7865.65 | 7864.88
shoc: OpenCL - Bus Speed Download | 28.7867 | 28.7868 | 28.7864
namd-cuda: ATPase Simulation - 327,506 Atoms | 0.05810 | - | -
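For readers who want to turn these raw numbers into relative performance, the following is a minimal Python sketch, not the method the Phoronix Test Suite itself uses. It takes four RTX 5090 / RTX 5080 result pairs from this file, inverts the ratio for "Fewer Is Better" tests, and combines the ratios with a geometric mean. The selection of tests and the helper names `speedup` and `geometric_mean` are illustrative assumptions.

```python
from math import exp, log

# (test, higher_is_better, RTX 5090 result, RTX 5080 result)
# Values are copied from this result file; the choice of tests is arbitrary.
ROWS = [
    ("clpeak: Single-Precision Compute (GFLOPS)", True, 121415.53, 59218.89),
    ("hashcat: SHA1 (H/s)", True, 68852500000, 32522500000),
    ("blender: Classroom - NVIDIA CUDA (s)", False, 8.38, 14.08),
    ("vkresample: 2x - Single (ms)", False, 5.648, 10.278),
]


def speedup(higher_is_better, candidate, baseline):
    """Return how many times faster `candidate` is than `baseline`."""
    if higher_is_better:
        return candidate / baseline
    # For time-like metrics a smaller value is better, so invert the ratio.
    return baseline / candidate


def geometric_mean(values):
    """Geometric mean of positive numbers, computed via logarithms."""
    values = list(values)
    return exp(sum(log(v) for v in values) / len(values))


ratios = {name: speedup(hib, r5090, r5080) for name, hib, r5090, r5080 in ROWS}
overall = geometric_mean(ratios.values())

for name, ratio in ratios.items():
    print(f"{name}: RTX 5090 is {ratio:.2f}x the RTX 5080")
print(f"Geometric mean over {len(ratios)} tests: {overall:.2f}x")
```

The geometric mean is used here because it treats a 2x gain and a 0.5x loss symmetrically, which an arithmetic mean of ratios does not.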

NCNN

NCNN 20241226 - Target: Vulkan GPU-v2-v2 - Model: mobilenet-v2 (ms, Fewer Is Better)
  RTX 5090: 10.21 (MIN: 3.85 / MAX: 64.73)
  NVIDIA RTX 5080: 33.96 (MIN: 3.85 / MAX: 68.9)
  RTX 5080: 41.52 (MIN: 3.78 / MAX: 68.66)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN 20241226 - Target: Vulkan GPU - Model: FastestDet (ms, Fewer Is Better)
  RTX 5090: 11.01 (MIN: 5.09 / MAX: 92.36)
  NVIDIA RTX 5080: 32.48 (MIN: 5.11 / MAX: 96.35)
  RTX 5080: 43.00 (MIN: 5.09 / MAX: 96.19)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN 20241226 - Target: Vulkan GPU - Model: mnasnet (ms, Fewer Is Better)
  RTX 5090: 9.80 (MIN: 3.75 / MAX: 63.5)
  NVIDIA RTX 5080: 29.33 (MIN: 3.68 / MAX: 66.87)
  RTX 5080: 37.59 (MIN: 3.7 / MAX: 66.4)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN 20241226 - Target: Vulkan GPU - Model: regnety_400m (ms, Fewer Is Better)
  RTX 5090: 47.90 (MIN: 21.96 / MAX: 421.33)
  NVIDIA RTX 5080: 155.22 (MIN: 21.96 / MAX: 495.67)
  RTX 5080: 165.97 (MIN: 21.95 / MAX: 492.35)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN 20241226 - Target: Vulkan GPU - Model: squeezenet_ssd (ms, Fewer Is Better)
  RTX 5090: 22.02 (MIN: 7.41 / MAX: 92.66)
  NVIDIA RTX 5080: 68.01 (MIN: 7.27 / MAX: 102.26)
  RTX 5080: 68.02 (MIN: 7.34 / MAX: 99.35)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN 20241226 - Target: Vulkan GPU - Model: googlenet (ms, Fewer Is Better)
  RTX 5090: 13.92 (MIN: 7.49 / MAX: 98.37)
  RTX 5080: 36.58 (MIN: 7.53 / MAX: 102.56)
  NVIDIA RTX 5080: 42.54 (MIN: 7.59 / MAX: 104.85)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN 20241226 - Target: Vulkan GPU - Model: blazeface (ms, Fewer Is Better)
  RTX 5090: 2.67 (MIN: 2.4 / MAX: 41.21)
  NVIDIA RTX 5080: 4.08 (MIN: 2.36 / MAX: 52.23)
  RTX 5080: 6.42 (MIN: 2.36 / MAX: 52.25)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN 20241226 - Target: Vulkan GPU - Model: efficientnet-b0 (ms, Fewer Is Better)
  RTX 5090: 27.44 (MIN: 6.34 / MAX: 109.98)
  RTX 5080: 63.39 (MIN: 6.28 / MAX: 114.02)
  NVIDIA RTX 5080: 65.77 (MIN: 6.28 / MAX: 115.25)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN 20241226 - Target: Vulkan GPU - Model: shufflenet-v2 (ms, Fewer Is Better)
  RTX 5090: 4.53 (MIN: 3.88 / MAX: 57.59)
  NVIDIA RTX 5080: 9.75 (MIN: 3.91 / MAX: 74.73)
  RTX 5080: 10.82 (MIN: 3.9 / MAX: 74.52)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

Hashcat

Hashcat is an open-source, advanced password recovery tool supporting GPU acceleration with OpenCL, NVIDIA CUDA, and Radeon ROCm. Learn more via the OpenBenchmarking.org test page.

Hashcat 6.2.4 - Benchmark: TrueCrypt RIPEMD160 + XTS (H/s, More Is Better)
  RTX 5090: 2776000
  RTX 5080: 1294700
  NVIDIA RTX 5080: 1294200

NCNN

NCNN 20241226 - Target: Vulkan GPU - Model: alexnet (ms, Fewer Is Better)
  RTX 5090: 8.74 (MIN: 3.21 / MAX: 22.23)
  NVIDIA RTX 5080: 17.60 (MIN: 3.2 / MAX: 23.03)
  RTX 5080: 18.54 (MIN: 3.21 / MAX: 22.49)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

Hashcat

Hashcat 6.2.4 - Benchmark: SHA1 (H/s, More Is Better)
  RTX 5090: 68852500000
  NVIDIA RTX 5080: 32633100000
  RTX 5080: 32522500000

Hashcat 6.2.4 - Benchmark: SHA-512 (H/s, More Is Better)
  RTX 5090: 8900400000
  NVIDIA RTX 5080: 4265500000
  RTX 5080: 4243000000

ProjectPhysX OpenCL-Benchmark

ProjectPhysX OpenCL-Benchmark 1.6 - Operation: INT8 Compute (TIOPs/s, More Is Better)
  RTX 5090: 41.80
  RTX 5080: 20.37
  NVIDIA RTX 5080: 20.17
  1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

ProjectPhysX OpenCL-Benchmark 1.6 - Operation: FP16 Compute (TFLOPs/s, More Is Better)
  RTX 5090: 122.91
  NVIDIA RTX 5080: 59.47
  RTX 5080: 59.47
  1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

clpeak

Clpeak is designed to test the peak capabilities of OpenCL devices. Learn more via the OpenBenchmarking.org test page.

clpeak 1.1.2 - OpenCL Test: Integer Compute (GIOPS, More Is Better)
  RTX 5090: 62151.94
  NVIDIA RTX 5080: 30173.73
  RTX 5080: 30128.04
  1. (CXX) g++ options: -O3

clpeak 1.1.2 - OpenCL Test: Integer 24-bit Compute (GIOPS, More Is Better)
  RTX 5090: 61843.11
  RTX 5080: 30086.50
  NVIDIA RTX 5080: 30059.43
  1. (CXX) g++ options: -O3

clpeak 1.1.2 - OpenCL Test: Double-Precision Compute (GFLOPS, More Is Better)
  RTX 5090: 1976.90
  NVIDIA RTX 5080: 962.45
  RTX 5080: 962.39
  1. (CXX) g++ options: -O3

ProjectPhysX OpenCL-Benchmark

ProjectPhysX OpenCL-Benchmark 1.6 - Operation: FP64 Compute (TFLOPs/s, More Is Better)
  RTX 5090: 1.95
  NVIDIA RTX 5080: 0.95
  RTX 5080: 0.95
  1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

SHOC Scalable HeterOgeneous Computing

The CUDA and OpenCL version of Vetter's Scalable HeterOgeneous Computing benchmark suite. SHOC provides a number of different benchmark programs for evaluating the performance and stability of compute devices. Learn more via the OpenBenchmarking.org test page.

SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: MD5 Hash (GHash/s, More Is Better)
  RTX 5090: 142.41
  RTX 5080: 69.56
  NVIDIA RTX 5080: 69.39
  1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

VkResample

VkResample is a Vulkan-based image upscaling library based on VkFFT. The sample input file is upscaling a 4K image to 8K using Vulkan-based GPU acceleration. Learn more via the OpenBenchmarking.org test page.

VkResample 1.0 - Upscale: 2x - Precision: Double (ms, Fewer Is Better)
  RTX 5090: 103.51
  NVIDIA RTX 5080: 212.31
  RTX 5080: 212.35
  1. (CXX) g++ options: -O3

clpeak

clpeak 1.1.2 - OpenCL Test: Single-Precision Compute (GFLOPS, More Is Better)
  RTX 5090: 121415.53
  NVIDIA RTX 5080: 59305.85
  RTX 5080: 59218.89
  1. (CXX) g++ options: -O3
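
As a rough cross-check, the single-precision clpeak results can be compared against the GPU Compute Cores counts reported in this file's OpenCL Details (21760 for the RTX 5090, 10752 for the RTX 5080). The Python sketch below is only illustrative arithmetic: it assumes throughput per reported core is a meaningful figure and ignores clock speeds, which this result file does not list.

```python
# GPU Compute Cores as reported in the OpenCL Details of this result file.
CORES = {"RTX 5090": 21760, "RTX 5080": 10752}

# clpeak 1.1.2 Single-Precision Compute results in GFLOPS (RTX 5090 and RTX 5080 runs).
GFLOPS = {"RTX 5090": 121415.53, "RTX 5080": 59218.89}

# Throughput per reported compute core for each card.
per_core = {gpu: GFLOPS[gpu] / CORES[gpu] for gpu in CORES}

# How the core-count ratio compares with the measured throughput ratio.
core_ratio = CORES["RTX 5090"] / CORES["RTX 5080"]
perf_ratio = GFLOPS["RTX 5090"] / GFLOPS["RTX 5080"]

for gpu, value in per_core.items():
    print(f"{gpu}: {value:.2f} GFLOPS per compute core")
print(f"Core-count ratio: {core_ratio:.3f}, measured throughput ratio: {perf_ratio:.3f}")
```

If the two ratios are close, the test is scaling roughly with the number of compute cores rather than being limited by something else such as memory bandwidth.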

ProjectPhysX OpenCL-Benchmark

ProjectPhysX OpenCL-Benchmark 1.6 - Operation: INT32 Compute (TIOPs/s, More Is Better)
  RTX 5090: 61.76
  RTX 5080: 30.21
  NVIDIA RTX 5080: 30.20
  1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

ProjectPhysX OpenCL-Benchmark 1.6 - Operation: FP32 Compute (TFLOPs/s, More Is Better)
  RTX 5090: 117.85
  NVIDIA RTX 5080: 58.12
  RTX 5080: 57.85
  1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

NCNN

NCNN 20241226 - Target: Vulkan GPU - Model: resnet50 (ms, Fewer Is Better)
  RTX 5090: 28.58 (MIN: 10.02 / MAX: 89.48)
  RTX 5080: 44.76 (MIN: 9.99 / MAX: 92.95)
  NVIDIA RTX 5080: 57.57 (MIN: 9.95 / MAX: 94)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

ProjectPhysX OpenCL-Benchmark

ProjectPhysX OpenCL-Benchmark 1.6 - Operation: INT16 Compute (TIOPs/s, More Is Better)
  RTX 5090: 54.02
  NVIDIA RTX 5080: 27.26
  RTX 5080: 27.22
  1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

Hashcat

Hashcat 6.2.4 - Benchmark: 7-Zip (H/s, More Is Better)
  RTX 5090: 3272300
  NVIDIA RTX 5080: 1664500
  RTX 5080: 1657000

Chaos Group V-RAY

This is a test of Chaos Group's V-RAY benchmark. V-RAY is a commercial renderer that can integrate with various creator software products like SketchUp and 3ds Max. The V-RAY benchmark is standalone and supports CPU and NVIDIA CUDA/RTX based rendering. Learn more via the OpenBenchmarking.org test page.

Chaos Group V-RAY 6.0 - Mode: NVIDIA CUDA GPU (vpaths, More Is Better)
  RTX 5090: 4851
  NVIDIA RTX 5080: 2528
  RTX 5080: 2528

SHOC Scalable HeterOgeneous Computing

SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: GEMM SGEMM_N (GFLOPS, More Is Better)
  RTX 5090: 35937.2
  NVIDIA RTX 5080: 19013.1
  RTX 5080: 18948.1
  1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: S3D (GFLOPS, More Is Better)
  RTX 5090: 1117.54
  NVIDIA RTX 5080: 592.12
  RTX 5080: 591.44
  1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

FluidX3D

FluidX3D 3.0 - Test: FP32-FP16C (MLUPs/s, More Is Better)
  RTX 5090: 19140
  NVIDIA RTX 5080: 10330
  RTX 5080: 10328

ProjectPhysX OpenCL-Benchmark

ProjectPhysX OpenCL-Benchmark 1.6 - Operation: Memory Bandwidth Coalesced Write (GB/s, More Is Better)
  RTX 5090: 1687.49
  NVIDIA RTX 5080: 911.96
  RTX 5080: 911.11
  1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

SHOC Scalable HeterOgeneous Computing

SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: FFT SP (GFLOPS, More Is Better)
  RTX 5090: 4398.39
  NVIDIA RTX 5080: 2381.92
  RTX 5080: 2381.80
  1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

FluidX3D

FluidX3D 3.0 - Test: FP32-FP32 (MLUPs/s, More Is Better)
  RTX 5090: 9524
  NVIDIA RTX 5080: 5174
  RTX 5080: 5174

clpeak

clpeak 1.1.2 - OpenCL Test: Global Memory Bandwidth (GBPS, More Is Better)
  RTX 5090: 1562.97
  NVIDIA RTX 5080: 849.74
  RTX 5080: 849.23
  1. (CXX) g++ options: -O3

VkResample

VkResample 1.0 - Upscale: 2x - Precision: Single (ms, Fewer Is Better)
  RTX 5090: 5.648
  NVIDIA RTX 5080: 10.276
  RTX 5080: 10.278
  1. (CXX) g++ options: -O3

Blender

Blender 4.3 - Blend File: Pabellon Barcelona - Compute: NVIDIA CUDA (Seconds, Fewer Is Better)
  RTX 5090: 17.35
  NVIDIA RTX 5080: 31.54
  RTX 5080: 31.55

FluidX3D

FluidX3D 3.0 - Test: FP32-FP16S (MLUPs/s, More Is Better)
  RTX 5090: 18499
  RTX 5080: 10276
  NVIDIA RTX 5080: 10275

ProjectPhysX OpenCL-Benchmark

ProjectPhysX OpenCL-Benchmark 1.6 - Operation: Memory Bandwidth Coalesced Read (GB/s, More Is Better)
  RTX 5090: 1596.24
  NVIDIA RTX 5080: 910.07
  RTX 5080: 909.66
  1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

NCNN

NCNN 20241226 - Target: Vulkan GPU - Model: vision_transformer (ms, Fewer Is Better)
  RTX 5090: 62.76 (MIN: 40.3 / MAX: 105.61)
  RTX 5080: 107.87 (MIN: 40.48 / MAX: 118.93)
  NVIDIA RTX 5080: 108.87 (MIN: 45.11 / MAX: 120.3)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

Blender

Blender 4.3 - Blend File: Classroom - Compute: NVIDIA CUDA (Seconds, Fewer Is Better)
  RTX 5090: 8.38
  NVIDIA RTX 5080: 14.05
  RTX 5080: 14.08

NCNN

NCNN 20241226 - Target: Vulkan GPUv2-yolov3v2-yolov3 - Model: mobilenetv2-yolov3 (ms, Fewer Is Better)
  RTX 5090: 42.44 (MIN: 8.34 / MAX: 76.17)
  NVIDIA RTX 5080: 71.02 (MIN: 8.7 / MAX: 81.03)
  RTX 5080: 71.27 (MIN: 8.2 / MAX: 79.42)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN 20241226 - Target: Vulkan GPU - Model: mobilenet (ms, Fewer Is Better)
  RTX 5090: 42.44 (MIN: 8.34 / MAX: 76.17)
  NVIDIA RTX 5080: 71.02 (MIN: 8.7 / MAX: 81.03)
  RTX 5080: 71.27 (MIN: 8.2 / MAX: 79.42)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

Blender

Blender 4.3 - Blend File: Barbershop - Compute: NVIDIA CUDA (Seconds, Fewer Is Better)
  RTX 5090: 35.14
  NVIDIA RTX 5080: 58.07
  RTX 5080: 58.30

IndigoBench

This is a test of Indigo Renderer's IndigoBench benchmark. Learn more via the OpenBenchmarking.org test page.

IndigoBench 4.4 - Acceleration: OpenCL GPU - Scene: Bedroom (M samples/s, More Is Better)
  RTX 5090: 42.76
  RTX 5080: 26.29
  NVIDIA RTX 5080: 26.27

Blender

Blender 4.3 - Blend File: Barbershop - Compute: NVIDIA OptiX (Seconds, Fewer Is Better)
  RTX 5090: 24.33
  NVIDIA RTX 5080: 39.09
  RTX 5080: 39.48

Blender 4.3 - Blend File: Fishy Cat - Compute: NVIDIA CUDA (Seconds, Fewer Is Better)
  RTX 5090: 8.92
  RTX 5080: 14.40
  NVIDIA RTX 5080: 14.43

Chaos Group V-RAY

Chaos Group V-RAY 6.0 - Mode: NVIDIA RTX GPU (vpaths, More Is Better)
  RTX 5090: 11923
  RTX 5080: 7388
  NVIDIA RTX 5080: 7372

NCNN

NCNN 20241226 - Target: Vulkan GPU - Model: resnet18 (ms, Fewer Is Better)
  RTX 5090: 10.93 (MIN: 4.48 / MAX: 44.35)
  NVIDIA RTX 5080: 16.31 (MIN: 4.4 / MAX: 44.98)
  RTX 5080: 17.42 (MIN: 4.48 / MAX: 44.53)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

Blender

Blender 4.3 - Blend File: Fishy Cat - Compute: NVIDIA OptiX (Seconds, Fewer Is Better)
  RTX 5090: 4.55
  RTX 5080: 7.15
  NVIDIA RTX 5080: 7.19

Blender 4.3 - Blend File: BMW27 - Compute: NVIDIA CUDA (Seconds, Fewer Is Better)
  RTX 5090: 4.72
  NVIDIA RTX 5080: 7.29
  RTX 5080: 7.34

Blender 4.3 - Blend File: Classroom - Compute: NVIDIA OptiX (Seconds, Fewer Is Better)
  RTX 5090: 6.16
  NVIDIA RTX 5080: 9.43
  RTX 5080: 9.47

Blender 4.3 - Blend File: Junkshop - Compute: NVIDIA CUDA (Seconds, Fewer Is Better)
  RTX 5090: 8.99
  NVIDIA RTX 5080: 12.94
  RTX 5080: 13.80

Blender 4.3 - Blend File: Pabellon Barcelona - Compute: NVIDIA OptiX (Seconds, Fewer Is Better)
  RTX 5090: 7.00
  NVIDIA RTX 5080: 10.39
  RTX 5080: 10.60

NCNN

NCNN 20241226 - Target: Vulkan GPU-v3-v3 - Model: mobilenet-v3 (ms, Fewer Is Better)
  RTX 5080: 4.22 (MIN: 4.04 / MAX: 4.96)
  RTX 5090: 4.68 (MIN: 4.08 / MAX: 57.94)
  NVIDIA RTX 5080: 6.00 (MIN: 4.05 / MAX: 79)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

Blender

Blender 4.3 - Blend File: BMW27 - Compute: NVIDIA OptiX (Seconds, Fewer Is Better)
  RTX 5090: 2.92
  NVIDIA RTX 5080: 4.10
  RTX 5080: 4.12

Blender 4.3 - Blend File: Junkshop - Compute: NVIDIA OptiX (Seconds, Fewer Is Better)
  RTX 5090: 5.66
  NVIDIA RTX 5080: 7.77
  RTX 5080: 7.84

RealSR-NCNN

RealSR-NCNN is an NCNN neural network implementation of the RealSR project and accelerated using the Vulkan API. RealSR is the Real-World Super Resolution via Kernel Estimation and Noise Injection. NCNN is a high performance neural network inference framework optimized for mobile and other platforms developed by Tencent. This test profile times how long it takes to increase the resolution of a sample image by a scale of 4x with Vulkan. Learn more via the OpenBenchmarking.org test page.

RealSR-NCNN 20200818 - Scale: 4x - TAA: Yes (Seconds, Fewer Is Better)
  RTX 5090: 13.48
  NVIDIA RTX 5080: 18.64
  RTX 5080: 18.68

IndigoBench

IndigoBench 4.4 - Acceleration: OpenCL GPU - Scene: Supercar (M samples/s, More Is Better)
  RTX 5090: 92.70
  RTX 5080: 68.35
  NVIDIA RTX 5080: 68.32

Waifu2x-NCNN Vulkan

Waifu2x-NCNN is an NCNN neural network implementation of the Waifu2x converter project and accelerated using the Vulkan API. NCNN is a high performance neural network inference framework optimized for mobile and other platforms developed by Tencent. This test profile times how long it takes to increase the resolution of a sample image with Vulkan. Learn more via the OpenBenchmarking.org test page.

Waifu2x-NCNN Vulkan 20200818 - Scale: 2x - Denoise: 3 - TAA: Yes (Seconds, Fewer Is Better)
  RTX 5090: 2.299
  NVIDIA RTX 5080: 2.733
  RTX 5080: 2.930

NCNN

NCNN 20241226 - Target: Vulkan GPU - Model: vgg16 (ms, Fewer Is Better)
  RTX 5090: 37.05 (MIN: 22.53 / MAX: 46.22)
  RTX 5080: 42.01 (MIN: 23.96 / MAX: 46.31)
  NVIDIA RTX 5080: 42.47 (MIN: 25.67 / MAX: 47.21)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN 20241226 - Target: Vulkan GPU - Model: yolov4-tiny (ms, Fewer Is Better)
  RTX 5090: 39.34 (MIN: 15.92 / MAX: 49.04)
  RTX 5080: 44.09 (MIN: 14.62 / MAX: 49.42)
  NVIDIA RTX 5080: 44.55 (MIN: 13.75 / MAX: 49.72)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

RealSR-NCNN

RealSR-NCNN 20200818 - Scale: 4x - TAA: No (Seconds, Fewer Is Better)
  RTX 5090: 4.630
  NVIDIA RTX 5080: 5.153
  RTX 5080: 5.221

Hashcat

Hashcat 6.2.4 - Benchmark: MD5 (H/s, More Is Better)
  RTX 5090: 106848250000
  NVIDIA RTX 5080: 99815800000
  RTX 5080: 99435000000
  SE +/- 102551750000.00, N = 2

clpeak

clpeak 1.1.2 - OpenCL Test: Kernel Latency (us, Fewer Is Better)
  RTX 5080: 4.87
  NVIDIA RTX 5080: 4.87
  RTX 5090: 5.15
  1. (CXX) g++ options: -O3

SHOC Scalable HeterOgeneous Computing

SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: Texture Read Bandwidth (GB/s, More Is Better)
  RTX 5090: 2870.68
  NVIDIA RTX 5080: 2734.06
  RTX 5080: 2728.64
  1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

ProjectPhysX OpenCL-Benchmark

ProjectPhysX OpenCL-Benchmark 1.6 - Operation: INT64 Compute (TIOPs/s, More Is Better)
  RTX 5090: 4.396
  NVIDIA RTX 5080: 4.300
  RTX 5080: 4.276
  1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

SHOC Scalable HeterOgeneous Computing

SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: Reduction (GB/s, More Is Better)
  RTX 5080: 858.04
  NVIDIA RTX 5080: 857.86
  RTX 5090: 837.21
  1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

clpeak

clpeak 1.1.2 - OpenCL Test: Transfer Bandwidth enqueueReadBuffer (GBPS, More Is Better)
  RTX 5080: 13.95
  RTX 5090: 13.83
  NVIDIA RTX 5080: 13.70
  1. (CXX) g++ options: -O3

clpeak 1.1.2 - OpenCL Test: Transfer Bandwidth enqueueWriteBuffer (GBPS, More Is Better)
  RTX 5080: 18.64
  NVIDIA RTX 5080: 18.49
  RTX 5090: 18.41
  1. (CXX) g++ options: -O3

SHOC Scalable HeterOgeneous Computing

SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: Triad (GB/s, More Is Better)
  RTX 5080: 27.85
  RTX 5090: 27.83
  NVIDIA RTX 5080: 27.54
  1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: Bus Speed Readback (GB/s, More Is Better)
  RTX 5090: 28.69
  NVIDIA RTX 5080: 28.56
  RTX 5080: 28.38
  1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

NAMD CUDA

NAMD CUDA 3.0.1 - Input: ATPase with 327,506 Atoms (ns/day, More Is Better)
  RTX 5080: 14.57
  NVIDIA RTX 5080: 14.44

Llama.cpp

Llama.cpp b4397 - Backend: NVIDIA CUDA - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 512 (Tokens Per Second, More Is Better)
  RTX 5080: 3878.92
  NVIDIA RTX 5080: 3849.78
  1. (CXX) g++ options: -O3

Llama.cpp b4397 - Backend: NVIDIA CUDA - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 2048 (Tokens Per Second, More Is Better)
  RTX 5080: 3705.87
  NVIDIA RTX 5080: 3687.77
  1. (CXX) g++ options: -O3

NAMD CUDA

NAMD CUDA 3.0.1 - Input: STMV with 1,066,628 Atoms (ns/day, More Is Better)
  RTX 5080: 4.02164
  NVIDIA RTX 5080: 4.01155

Llama.cpp

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b4397Backend: NVIDIA CUDA - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 2048NVIDIA RTX 5080RTX 5080140028004200560070006653.976642.521. (CXX) g++ options: -O3

Llama.cpp b4397 - Backend: NVIDIA CUDA - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 1024
Tokens Per Second, More Is Better
  NVIDIA RTX 5080: 7304.83
  RTX 5080: 7292.84
1. (CXX) g++ options: -O3

Llama.cpp b4397 - Backend: NVIDIA CUDA - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Text Generation 128
Tokens Per Second, More Is Better
  NVIDIA RTX 5080: 105.25
  RTX 5080: 105.09
1. (CXX) g++ options: -O3

Llama.cpp b4397 - Backend: NVIDIA CUDA - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Text Generation 128
Tokens Per Second, More Is Better
  RTX 5080: 101.93
  NVIDIA RTX 5080: 101.84
1. (CXX) g++ options: -O3

Llama.cpp b4397 - Backend: NVIDIA CUDA - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 512
Tokens Per Second, More Is Better
  RTX 5080: 7804.82
  NVIDIA RTX 5080: 7799.24
1. (CXX) g++ options: -O3

Llama.cpp b4397 - Backend: NVIDIA CUDA - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 1024
Tokens Per Second, More Is Better
  NVIDIA RTX 5080: 7331.65
  RTX 5080: 7327.35
1. (CXX) g++ options: -O3

Llama.cpp b4397 - Backend: NVIDIA CUDA - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 2048
Tokens Per Second, More Is Better
  NVIDIA RTX 5080: 6630.67
  RTX 5080: 6627.59
1. (CXX) g++ options: -O3

Llama.cpp b4397 - Backend: NVIDIA CUDA - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Text Generation 128
Tokens Per Second, More Is Better
  NVIDIA RTX 5080: 97.03
  RTX 5080: 97.00
1. (CXX) g++ options: -O3

Llama.cpp b4397 - Backend: NVIDIA CUDA - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 1024
Tokens Per Second, More Is Better
  NVIDIA RTX 5080: 3832.03
  RTX 5080: 3831.53
1. (CXX) g++ options: -O3

Llama.cpp b4397 - Backend: NVIDIA CUDA - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 512
Tokens Per Second, More Is Better
  RTX 5080: 7865.65
  NVIDIA RTX 5080: 7864.88
1. (CXX) g++ options: -O3
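
Both systems in the Llama.cpp results above use an RTX 5080, so the gap between the two figures in each test gives a rough sense of run-to-run variation. The following is a minimal sketch of that arithmetic, not part of the Phoronix Test Suite; the helper name `spread_percent` and the shortened test labels are hypothetical, and each pair is listed in the order the graph shows it (higher value first).

```python
# Tokens per second from the Llama.cpp b4397 results above.
# Each entry: (short test label, higher run, lower run).
results = [
    ("granite PP 512", 3878.92, 3849.78),
    ("granite PP 1024", 3832.03, 3831.53),
    ("granite PP 2048", 3705.87, 3687.77),
    ("granite TG 128", 105.25, 105.09),
    ("Mistral-7B PP 512", 7865.65, 7864.88),
    ("Mistral-7B PP 1024", 7331.65, 7327.35),
    ("Mistral-7B PP 2048", 6653.97, 6642.52),
    ("Mistral-7B TG 128", 101.93, 101.84),
    ("Tulu-3-8B PP 512", 7804.82, 7799.24),
    ("Tulu-3-8B PP 1024", 7304.83, 7292.84),
    ("Tulu-3-8B PP 2048", 6630.67, 6627.59),
    ("Tulu-3-8B TG 128", 97.03, 97.00),
]


def spread_percent(a, b):
    """Gap between two runs as a percentage of the slower run."""
    low, high = min(a, b), max(a, b)
    return (high - low) / low * 100.0


spreads = {name: spread_percent(a, b) for name, a, b in results}
worst = max(spreads, key=spreads.get)

# Print the tests from largest to smallest gap.
for name, pct in sorted(spreads.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {pct:.2f}%")
```

On these figures every gap is under 1 percent, with the largest on the granite Prompt Processing 512 test.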

SHOC Scalable HeterOgeneous Computing

SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: Bus Speed Download
GB/s, More Is Better
  RTX 5080: 28.79
  RTX 5090: 28.79
  NVIDIA RTX 5080: 28.79
1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

NAMD CUDA

NAMD is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. NAMD was developed by the Theoretical and Computational Biophysics Group in the Beckman Institute for Advanced Science and Technology at the University of Illinois at Urbana-Champaign. This version of the NAMD test profile uses CUDA GPU acceleration. Learn more via the OpenBenchmarking.org test page.

NAMD CUDA 2.14 - ATPase Simulation - 327,506 Atoms
days/ns, Fewer Is Better
  RTX 5090: 0.05810

ATPase Simulation - 327,506 Atoms

RTX 5080: The test run did not produce a result. E: FATAL ERROR: No simulation config file specified on command line.

NVIDIA RTX 5080: The test run did not produce a result. E: FATAL ERROR: No simulation config file specified on command line.
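
The NAMD CUDA 2.14 profile reports days/ns (fewer is better), while the NAMD CUDA 3.0.1 results in this file report ns/day (more is better). The two units are reciprocals of each other, which the small sketch below illustrates. It only converts units: the function name is hypothetical, and because the figures come from different NAMD versions and test profiles, the converted RTX 5090 number is not necessarily comparable to the RTX 5080 results.

```python
def days_per_ns_to_ns_per_day(days_per_ns: float) -> float:
    """Convert a days/ns figure to ns/day by taking the reciprocal."""
    if days_per_ns <= 0:
        raise ValueError("days/ns must be positive")
    return 1.0 / days_per_ns


# RTX 5090, NAMD CUDA 2.14, ATPase Simulation - 327,506 Atoms (from the result above).
rtx_5090_days_per_ns = 0.05810
rtx_5090_ns_per_day = days_per_ns_to_ns_per_day(rtx_5090_days_per_ns)

print(f"{rtx_5090_days_per_ns} days/ns = {rtx_5090_ns_per_day:.2f} ns/day")
```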

88 Results Shown

NCNN:
  Vulkan GPU-v2-v2 - mobilenet-v2
  Vulkan GPU - FastestDet
  Vulkan GPU - mnasnet
  Vulkan GPU - regnety_400m
  Vulkan GPU - squeezenet_ssd
  Vulkan GPU - googlenet
  Vulkan GPU - blazeface
  Vulkan GPU - efficientnet-b0
  Vulkan GPU - shufflenet-v2
Hashcat
NCNN
Hashcat:
  SHA1
  SHA-512
ProjectPhysX OpenCL-Benchmark:
  INT8 Compute
  FP16 Compute
clpeak:
  Integer Compute
  Integer 24-bit Compute
  Double-Precision Compute
ProjectPhysX OpenCL-Benchmark
SHOC Scalable HeterOgeneous Computing
VkResample
clpeak
ProjectPhysX OpenCL-Benchmark:
  INT32 Compute
  FP32 Compute
NCNN
ProjectPhysX OpenCL-Benchmark
Hashcat
Chaos Group V-RAY
SHOC Scalable HeterOgeneous Computing:
  OpenCL - GEMM SGEMM_N
  OpenCL - S3D
FluidX3D
ProjectPhysX OpenCL-Benchmark
SHOC Scalable HeterOgeneous Computing
FluidX3D
clpeak
VkResample
Blender
FluidX3D
ProjectPhysX OpenCL-Benchmark
NCNN
Blender
NCNN:
  Vulkan GPUv2-yolov3v2-yolov3 - mobilenetv2-yolov3
  Vulkan GPU - mobilenet
Blender
IndigoBench
Blender:
  Barbershop - NVIDIA OptiX
  Fishy Cat - NVIDIA CUDA
Chaos Group V-RAY
NCNN
Blender:
  Fishy Cat - NVIDIA OptiX
  BMW27 - NVIDIA CUDA
  Classroom - NVIDIA OptiX
  Junkshop - NVIDIA CUDA
  Pabellon Barcelona - NVIDIA OptiX
NCNN
Blender:
  BMW27 - NVIDIA OptiX
  Junkshop - NVIDIA OptiX
RealSR-NCNN
IndigoBench
Waifu2x-NCNN Vulkan
NCNN:
  Vulkan GPU - vgg16
  Vulkan GPU - yolov4-tiny
RealSR-NCNN
Hashcat
clpeak
SHOC Scalable HeterOgeneous Computing
ProjectPhysX OpenCL-Benchmark
SHOC Scalable HeterOgeneous Computing
clpeak:
  Transfer Bandwidth enqueueReadBuffer
  Transfer Bandwidth enqueueWriteBuffer
SHOC Scalable HeterOgeneous Computing:
  OpenCL - Triad
  OpenCL - Bus Speed Readback
NAMD CUDA
Llama.cpp:
  NVIDIA CUDA - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 512
  NVIDIA CUDA - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 2048
NAMD CUDA
Llama.cpp:
  NVIDIA CUDA - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 2048
  NVIDIA CUDA - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 1024
  NVIDIA CUDA - granite-3.0-3b-a800m-instruct-Q8_0 - Text Generation 128
  NVIDIA CUDA - Mistral-7B-Instruct-v0.3-Q8_0 - Text Generation 128
  NVIDIA CUDA - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 512
  NVIDIA CUDA - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 1024
  NVIDIA CUDA - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 2048
  NVIDIA CUDA - Llama-3.1-Tulu-3-8B-Q8_0 - Text Generation 128
  NVIDIA CUDA - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 1024
  NVIDIA CUDA - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 512
SHOC Scalable HeterOgeneous Computing
NAMD CUDA