NVIDIA GPU Compute Benchmarks

Benchmarks for a future article.

Compare your own system(s) to this result file with the Phoronix Test Suite by running the command: phoronix-test-suite benchmark 2106089-IB-NVIDIACOM84

Run Management

Result Identifier    Date Run        Test Duration
RTX 2060 SUPER       June 03 2021    7 Hours, 7 Minutes
RTX 2070             June 06 2021    7 Hours, 8 Minutes
RTX 2070 SUPER       June 05 2021    6 Hours, 3 Minutes
RTX 2080             June 03 2021    6 Hours, 9 Minutes
RTX 2080 SUPER       June 05 2021    5 Hours, 28 Minutes
RTX 2080 Ti          June 06 2021    4 Hours, 47 Minutes
TITAN RTX            June 04 2021    4 Hours, 45 Minutes
RTX 3060             June 04 2021    5 Hours, 58 Minutes
RTX 3060 Ti          June 05 2021    5 Hours, 10 Minutes
RTX 3070             June 04 2021    4 Hours, 37 Minutes
RTX 3070 Ti          June 02 2021    4 Hours, 49 Minutes
RTX 3080             June 03 2021    4 Hours, 1 Minute
RTX 3080 Ti          June 03 2021    4 Hours, 36 Minutes
RTX 3090             June 02 2021    4 Hours, 22 Minutes
Average                              5 Hours, 21 Minutes
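
The closing 5 Hours, 21 Minutes entry in the run listing is not tied to any one GPU; it is consistent with the mean of the fourteen per-GPU test durations. The following is a minimal sketch that checks this arithmetic. The helper names (`to_minutes`, `mean_minutes`, `format_duration`) are hypothetical and not part of the Phoronix Test Suite, and truncating the mean to whole minutes is an assumption about how the page rounds.

```python
# Per-GPU test durations from the run listing, as (hours, minutes).
DURATIONS = {
    "RTX 2060 SUPER": (7, 7),
    "RTX 2070": (7, 8),
    "RTX 2070 SUPER": (6, 3),
    "RTX 2080": (6, 9),
    "RTX 2080 SUPER": (5, 28),
    "RTX 2080 Ti": (4, 47),
    "TITAN RTX": (4, 45),
    "RTX 3060": (5, 58),
    "RTX 3060 Ti": (5, 10),
    "RTX 3070": (4, 37),
    "RTX 3070 Ti": (4, 49),
    "RTX 3080": (4, 1),
    "RTX 3080 Ti": (4, 36),
    "RTX 3090": (4, 22),
}


def to_minutes(hours, minutes):
    """Convert an (hours, minutes) pair to total minutes."""
    return hours * 60 + minutes


def mean_minutes(durations):
    """Mean duration in whole minutes (truncated; an assumed rounding rule)."""
    total = sum(to_minutes(h, m) for h, m in durations.values())
    return total // len(durations)


def format_duration(total_minutes):
    """Render minutes in the listing's 'N Hours, M Minutes' style."""
    hours, minutes = divmod(total_minutes, 60)
    hour_word = "Hour" if hours == 1 else "Hours"
    minute_word = "Minute" if minutes == 1 else "Minutes"
    return f"{hours} {hour_word}, {minutes} {minute_word}"


if __name__ == "__main__":
    print(format_duration(mean_minutes(DURATIONS)))  # prints "5 Hours, 21 Minutes"
```

The fourteen runs total 4,500 minutes, so the mean is about 321.4 minutes; truncating or rounding to the nearest minute both give 5 hours, 21 minutes.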


NVIDIA GPU Compute Benchmarks Suite 1.0.0
Type: System
Test suite extracted from NVIDIA GPU Compute Benchmarks.

pts/hashcat-1.0.0 -m 0 Benchmark: MD5
pts/hashcat-1.0.0 -m 100 Benchmark: SHA1
pts/hashcat-1.0.0 -m 1700 Benchmark: SHA-512
pts/hashcat-1.0.0 -m 11600 Benchmark: 7-Zip
pts/hashcat-1.0.0 -m 6211 Benchmark: TrueCrypt RIPEMD160 + XTS
pts/fahbench-1.0.2
pts/namd-cuda-1.1.1 ATPase Simulation - 327,506 Atoms
pts/mixbench-1.1.1 mixbench-cuda-ro SPGFLOPS Backend: NVIDIA CUDA - Benchmark: Single Precision
pts/mixbench-1.1.1 mixbench-cuda-ro DPGFLOPS Backend: NVIDIA CUDA - Benchmark: Double Precision
pts/mixbench-1.1.1 mixbench-cuda-ro HPGFLOPS Backend: NVIDIA CUDA - Benchmark: Half Precision
pts/mixbench-1.1.1 mixbench-cuda-ro GIOPS Backend: NVIDIA CUDA - Benchmark: Integer
pts/octanebench-1.3.0 Total Score
pts/luxcorerender-1.3.0 DLSC/LuxCoreScene/render.cfg -D renderengine.type PATHOCL -D opencl.native.threads.count 0 -D context.cuda.optix.enable 0 Scene: DLSC - Acceleration: GPU
pts/luxcorerender-1.3.0 RainbowColorsAndPrism/LuxCoreScene/render.cfg -D renderengine.type PATHOCL -D opencl.native.threads.count 0 -D context.cuda.optix.enable 0 Scene: Rainbow Colors and Prism - Acceleration: GPU
pts/luxcorerender-1.3.0 LuxCore2.1Benchmark/LuxCoreScene/render.cfg -D renderengine.type PATHOCL -D opencl.native.threads.count 0 -D context.cuda.optix.enable 0 Scene: LuxCore Benchmark - Acceleration: GPU
pts/luxcorerender-1.3.0 OrangeJuice/LuxCoreScene/render.cfg -D renderengine.type PATHOCL -D opencl.native.threads.count 0 -D context.cuda.optix.enable 0 Scene: Orange Juice - Acceleration: GPU
pts/luxcorerender-1.3.0 DanishMood/LuxCoreScene/render.cfg -D renderengine.type PATHOCL -D opencl.native.threads.count 0 -D context.cuda.optix.enable 0 Scene: Danish Mood - Acceleration: GPU
pts/arrayfire-1.1.0 blas_opencl Test: BLAS OpenCL
pts/arrayfire-1.1.0 cg_opencl Test: Conjugate Gradient OpenCL
pts/clpeak-1.0.1 --global-bandwidth OpenCL Test: Global Memory Bandwidth
pts/clpeak-1.0.1 --compute-sp OpenCL Test: Single-Precision Float
pts/clpeak-1.0.1 --compute-dp OpenCL Test: Double-Precision Double
pts/clpeak-1.0.1 --compute-integer OpenCL Test: Integer Compute INT
pts/vkpeak-1.0.2 fp32-scalar
pts/vkpeak-1.0.2 fp32-vec4
pts/vkpeak-1.0.2 fp16-scalar
pts/vkpeak-1.0.2 fp16-vec4
pts/vkpeak-1.0.2 fp64-scalar
pts/vkpeak-1.0.2 fp64-vec4
pts/vkpeak-1.0.2 int32-scalar
pts/vkpeak-1.0.2 int32-vec4
pts/vkpeak-1.0.2 int16-scalar
pts/vkpeak-1.0.2 int16-vec4
pts/plaidml-1.0.4 --no-fp16 --no-train resnet50 OPENCL FP16: No - Mode: Inference - Network: ResNet 50 - Device: OpenCL
pts/plaidml-1.0.4 --no-fp16 --no-train vgg16 OPENCL FP16: No - Mode: Inference - Network: VGG16 - Device: OpenCL
pts/plaidml-1.0.4 --no-fp16 --no-train vgg19 OPENCL FP16: No - Mode: Inference - Network: VGG19 - Device: OpenCL
pts/lczero-1.5.1 -b opencl Backend: OpenCL
pts/cl-mem-1.0.1 READ Benchmark: Read
pts/cl-mem-1.0.1 WRITE Benchmark: Write
pts/cl-mem-1.0.1 COPY Benchmark: Copy
pts/viennacl-1.1.0 dense_blas-bench-opencl Test: OpenCL BLAS - sCOPY
pts/viennacl-1.1.0 dense_blas-bench-opencl Test: OpenCL BLAS - sAXPY
pts/viennacl-1.1.0 dense_blas-bench-opencl Test: OpenCL BLAS - dCOPY
pts/viennacl-1.1.0 dense_blas-bench-opencl Test: OpenCL BLAS - dAXPY
pts/viennacl-1.1.0 dense_blas-bench-opencl Test: OpenCL BLAS - dDOT
pts/viennacl-1.1.0 dense_blas-bench-opencl Test: OpenCL BLAS - dGEMM-NN
pts/viennacl-1.1.0 dense_blas-bench-opencl Test: OpenCL BLAS - dGEMM-NT
pts/viennacl-1.1.0 dense_blas-bench-opencl Test: OpenCL BLAS - dGEMM-TN
pts/shoc-1.2.0 -opencl -benchmark DeviceMemory Target: OpenCL - Benchmark: Texture Read Bandwidth
pts/shoc-1.2.0 -opencl -benchmark FFT Target: OpenCL - Benchmark: FFT SP
pts/shoc-1.2.0 -opencl -benchmark GEMM Target: OpenCL - Benchmark: GEMM SGEMM_N
pts/shoc-1.2.0 -opencl -benchmark MD5Hash Target: OpenCL - Benchmark: MD5 Hash
pts/shoc-1.2.0 -opencl -benchmark S3D Target: OpenCL - Benchmark: S3D
pts/indigobench-1.1.0 --gpuonly --scenes supercar Acceleration: OpenCL GPU - Scene: Supercar
pts/indigobench-1.1.0 --gpuonly --scenes bedroom Acceleration: OpenCL GPU - Scene: Bedroom
pts/blender-1.9.0 -b ../bmw27_gpu.blend -o output.test -x 1 -F JPEG -f 1 CUDA Blend File: BMW27 - Compute: CUDA
pts/blender-1.9.0 -b ../bmw27_gpu.blend -o output.test -x 1 -F JPEG -f 1 OPTIX Blend File: BMW27 - Compute: NVIDIA OptiX
pts/blender-1.9.0 -b ../classroom_gpu.blend -o output.test -x 1 -F JPEG -f 1 CUDA Blend File: Classroom - Compute: CUDA
pts/blender-1.9.0 -b ../classroom_gpu.blend -o output.test -x 1 -F JPEG -f 1 OPTIX Blend File: Classroom - Compute: NVIDIA OptiX
pts/blender-1.9.0 -b ../fishy_cat_gpu.blend -o output.test -x 1 -F JPEG -f 1 CUDA Blend File: Fishy Cat - Compute: CUDA
pts/blender-1.9.0 -b ../fishy_cat_gpu.blend -o output.test -x 1 -F JPEG -f 1 OPTIX Blend File: Fishy Cat - Compute: NVIDIA OptiX
pts/blender-1.9.0 -b ../pavillon_barcelone_gpu.blend -o output.test -x 1 -F JPEG -f 1 CUDA Blend File: Pabellon Barcelona - Compute: CUDA
pts/blender-1.9.0 -b ../pavillon_barcelone_gpu.blend -o output.test -x 1 -F JPEG -f 1 OPTIX Blend File: Pabellon Barcelona - Compute: NVIDIA OptiX
pts/blender-1.9.0 -b ../barbershop_interior_gpu.blend -o output.test -x 1 -F JPEG -f 1 CUDA Blend File: Barbershop - Compute: CUDA
pts/blender-1.9.0 -b ../barbershop_interior_gpu.blend -o output.test -x 1 -F JPEG -f 1 OPTIX Blend File: Barbershop - Compute: NVIDIA OptiX
pts/v-ray-1.3.0 -m vray-gpu-cuda Mode: NVIDIA CUDA GPU
pts/v-ray-1.3.0 -m vray-gpu-rtx Mode: NVIDIA RTX GPU
pts/vkresample-1.0.0 -u 2 -p 0 Upscale: 2x - Precision: Single
pts/realsr-ncnn-1.0.0 -s 4 -x Scale: 4x - TAA: Yes
pts/realsr-ncnn-1.0.0 -s 4 Scale: 4x - TAA: No
pts/waifu2x-ncnn-1.0.0 -s 2 -n 3 -x Scale: 2x - Denoise: 3 - TAA: Yes
pts/betsy-1.0.0 --codec=etc1 --quality=2 Codec: ETC1 - Quality: Highest
pts/betsy-1.0.0 --codec=etc2_rgb --quality=2 Codec: ETC2 RGB - Quality: Highest
pts/redshift-1.0.1
pts/viennacl-1.1.0 dense_blas-bench-opencl Test: OpenCL BLAS - sDOT
pts/viennacl-1.1.0 dense_blas-bench-opencl Test: OpenCL BLAS - dGEMV-N
pts/viennacl-1.1.0 dense_blas-bench-opencl Test: OpenCL BLAS - dGEMV-T
pts/viennacl-1.1.0 dense_blas-bench-opencl Test: OpenCL BLAS - dGEMM-TT