NVIDIA GPU Compute OpenCL CUDA Benchmarks

Benchmarks run by Michael Larabel for a future article.

Compare your own system(s) to this result file with the Phoronix Test Suite by running the command: phoronix-test-suite benchmark 2009112-FI-ROCMGPUCO10

Run Management

Result Identifier     Date Run             Test Duration
GTX 1060              September 08 2020    1 Hour, 10 Minutes
GTX 1070              September 07 2020    1 Hour, 6 Minutes
GTX 1070 Ti           September 06 2020    1 Hour, 11 Minutes
GTX 1080              September 07 2020    1 Hour, 4 Minutes
GTX 1650              September 09 2020    1 Hour, 10 Minutes
GTX 1650 SUPER        September 08 2020    1 Hour, 8 Minutes
GTX 1660              September 09 2020    1 Hour, 5 Minutes
GTX 1660 SUPER        September 07 2020    1 Hour, 5 Minutes
GTX 1660 Ti           September 09 2020    1 Hour, 3 Minutes
RTX 2060              September 08 2020    1 Hour, 3 Minutes
RTX 2060 SUPER        September 09 2020    1 Hour, 1 Minute
RTX 2070              September 09 2020    1 Hour, 1 Minute
RTX 2070 SUPER        September 08 2020    59 Minutes
RTX 2080              September 08 2020    1 Hour
RTX 2080 SUPER        September 08 2020    59 Minutes
RTX 2080 Ti           September 07 2020    1 Hour
TITAN RTX             September 06 2020    1 Hour, 1 Minute
Average (all runs)                         1 Hour, 4 Minutes
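
The last duration in the run list (1 Hour, 4 Minutes) is not tied to a single GPU and appears to be the average across the 17 runs. The following is a minimal Python sketch for checking that reading, under the assumption that it is a plain arithmetic mean rounded to the nearest minute. The helper names `to_minutes` and `format_minutes` are hypothetical and are not part of the Phoronix Test Suite.

```python
import re

# Test durations copied from the run list above (result identifier -> duration).
DURATIONS = {
    "GTX 1060": "1 Hour, 10 Minutes",
    "GTX 1070": "1 Hour, 6 Minutes",
    "GTX 1070 Ti": "1 Hour, 11 Minutes",
    "GTX 1080": "1 Hour, 4 Minutes",
    "GTX 1650": "1 Hour, 10 Minutes",
    "GTX 1650 SUPER": "1 Hour, 8 Minutes",
    "GTX 1660": "1 Hour, 5 Minutes",
    "GTX 1660 SUPER": "1 Hour, 5 Minutes",
    "GTX 1660 Ti": "1 Hour, 3 Minutes",
    "RTX 2060": "1 Hour, 3 Minutes",
    "RTX 2060 SUPER": "1 Hour, 1 Minute",
    "RTX 2070": "1 Hour, 1 Minute",
    "RTX 2070 SUPER": "59 Minutes",
    "RTX 2080": "1 Hour",
    "RTX 2080 SUPER": "59 Minutes",
    "RTX 2080 Ti": "1 Hour",
    "TITAN RTX": "1 Hour, 1 Minute",
}


def to_minutes(text: str) -> int:
    """Convert '1 Hour, 10 Minutes', '59 Minutes' or '1 Hour' to whole minutes."""
    hours = re.search(r"(\d+)\s+Hour", text)
    minutes = re.search(r"(\d+)\s+Minute", text)
    total = int(hours.group(1)) * 60 if hours else 0
    total += int(minutes.group(1)) if minutes else 0
    return total


def format_minutes(total: int) -> str:
    """Format whole minutes in the same style the result page uses."""
    hours, minutes = divmod(total, 60)
    parts = []
    if hours:
        parts.append(f"{hours} Hour" + ("" if hours == 1 else "s"))
    if minutes:
        parts.append(f"{minutes} Minute" + ("" if minutes == 1 else "s"))
    return ", ".join(parts) or "0 Minutes"


# Assumption: the summary value is an arithmetic mean rounded to the nearest minute.
mean_minutes = sum(to_minutes(d) for d in DURATIONS.values()) / len(DURATIONS)
print(format_minutes(round(mean_minutes)))
```

The durations sum to 1086 minutes over 17 runs, about 63.9 minutes per run, which rounds to the listed 1 Hour, 4 Minutes.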



NVIDIA GPU Compute OpenCL CUDA Benchmarks Suite 1.0.0 (System)
Test suite extracted from NVIDIA GPU Compute OpenCL CUDA Benchmarks.

pts/lczero-1.5.0 -b opencl Backend: OpenCL
pts/clpeak-1.0.1 --compute-sp OpenCL Test: Single-Precision Float
pts/cl-mem-1.0.1 READ Benchmark: Read
pts/clpeak-1.0.1 --global-bandwidth OpenCL Test: Global Memory Bandwidth
pts/clpeak-1.0.1 --compute-dp OpenCL Test: Double-Precision Double
pts/mixbench-1.1.1 mixbench-cuda-ro HPGFLOPS Backend: NVIDIA CUDA - Benchmark: Half Precision
pts/mixbench-1.1.1 mixbench-ocl-ro DPGFLOPS Backend: OpenCL - Benchmark: Double Precision
pts/cl-mem-1.0.1 WRITE Benchmark: Write
pts/mixbench-1.1.1 mixbench-ocl-ro SPGFLOPS Backend: OpenCL - Benchmark: Single Precision
pts/luxcorerender-cl-1.2.0 RainbowColorsAndPrism/LuxCoreScene/render.cfg Scene: Rainbow Colors and Prism
pts/mixbench-1.1.1 mixbench-cuda-ro DPGFLOPS Backend: NVIDIA CUDA - Benchmark: Double Precision
pts/plaidml-1.0.4 --no-fp16 --no-train mobilenet OPENCL FP16: No - Mode: Inference - Network: Mobilenet - Device: OpenCL
pts/mixbench-1.1.1 mixbench-cuda-ro SPGFLOPS Backend: NVIDIA CUDA - Benchmark: Single Precision
pts/financebench-1.0.0 Black-Scholes/OpenCL/blackScholesAnalyticEngine.exe Benchmark: Black-Scholes OpenCL
pts/rodinia-1.3.1 OCL_PARTICLEFILTER Test: OpenCL Particle Filter
pts/octanebench-1.2.1 Total Score
pts/arrayfire-1.1.0 cg_opencl Test: Conjugate Gradient OpenCL
pts/plaidml-1.0.4 --no-fp16 --no-train imdb_lstm OPENCL FP16: No - Mode: Inference - Network: IMDB LSTM - Device: OpenCL
pts/fahbench-1.0.1
pts/plaidml-1.0.4 --no-fp16 --no-train densenet201 OPENCL FP16: No - Mode: Inference - Network: DenseNet 201 - Device: OpenCL
pts/neatbench-1.0.4 gpu Acceleration: GPU
pts/luxcorerender-cl-1.2.0 DLSC/LuxCoreScene/render.cfg Scene: DLSC
pts/mandelgpu-1.3.1 0 1 OpenCL Device: GPU
pts/luxcorerender-cl-1.2.0 LuxCore2.1Benchmark/LuxCoreScene/render.cfg Scene: LuxCore Benchmark
pts/plaidml-1.0.4 --fp16 --no-train mobilenet OPENCL FP16: Yes - Mode: Inference - Network: Mobilenet - Device: OpenCL
pts/luxcorerender-cl-1.2.0 Food/LuxCoreScene/render.cfg Scene: Food
pts/cl-mem-1.0.1 COPY Benchmark: Copy
pts/plaidml-1.0.4 --no-fp16 --train mobilenet OPENCL FP16: No - Mode: Training - Network: Mobilenet - Device: OpenCL
pts/gromacs-gpu-1.1.0 Water Benchmark
pts/mixbench-1.1.1 mixbench-ocl-ro GIOPS Backend: OpenCL - Benchmark: Integer
pts/mixbench-1.1.1 mixbench-cuda-ro GIOPS Backend: NVIDIA CUDA - Benchmark: Integer
pts/clpeak-1.0.1 --compute-integer OpenCL Test: Integer Compute INT
pts/namd-cuda-1.1.0 ATPase Simulation - 327,506 Atoms
pts/viennacl-1.0.0 OpenCL LU Factorization