gpu_compute_nvidia0_run1
Intel Core i7-12700K testing with a Gigabyte B660 DS3H DDR4 (F4 BIOS) and Gigabyte NVIDIA GeForce RTX 3070 8GB on Ubuntu 22.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2304282-NE-GPUCOMPUT72&grs .
gpu_compute_nvidia0_run1

gpu_compute_0_1:
  Processor: Intel Core i7-12700K @ 4.90GHz (12 Cores / 20 Threads)
  Motherboard: Gigabyte B660 DS3H DDR4 (F4 BIOS)
  Chipset: Intel Device 7aa7
  Memory: 64GB
  Disk: 2000GB PNY CS2130 2TB SSD + 4 x 6001GB Western Digital WD60EZAZ-00S
  Graphics: Gigabyte NVIDIA GeForce RTX 3070 8GB
  Audio: Realtek ALC897
  Monitor: BenQ PD2700U
  Network: Realtek RTL8111/8168/8411
  OS: Ubuntu 22.04
  Kernel: 5.15.0-71-generic (x86_64)
  Desktop: GNOME Shell 42.5
  Display Server: X Server 1.21.1.4
  Display Driver: NVIDIA 530.41.03
  OpenGL: 4.6.0
  OpenCL: OpenCL 3.0 CUDA 12.1.98
  Vulkan: 1.3.236
  Compiler: GCC 11.3.0 + CUDA 11.5
  File-System: ext4
  Screen Resolution: 3840x2160

- Transparent Huge Pages: madvise
- __GLX_VENDOR_LIBRARY_NAME=nvidia NVM_CD_FLAGS=
- --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-11-xKiWfi/gcc-11-11.3.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-xKiWfi/gcc-11-11.3.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
- Scaling Governor: intel_pstate powersave (EPP: balance_performance)
- CPU Microcode: 0x2c
- Thermald 2.4.9
- BAR1 / Visible vRAM Size: 256 MiB
- vBIOS Version: 94.04.46.00.e3
- GPU Compute Cores: 5888
- Python 3.10.6
- itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced IBRS IBPB: conditional RSB filling PBRSB-eIBRS: SW sequence + srbds: Not affected + tsx_async_abort: Not affected
gpu_compute_nvidia0_run1

Test                                            gpu_compute_0_1
neatbench: GPU                                  3070
mandelgpu: GPU                                  399785618.2
indigobench: OpenCL GPU - Supercar              70.557
indigobench: OpenCL GPU - Bedroom               25.881
blender: Pabellon Barcelona - NVIDIA OptiX      14.15
blender: Barbershop - NVIDIA OptiX              49.68
blender: Fishy Cat - NVIDIA OptiX               10.45
blender: Classroom - NVIDIA OptiX               12.65
blender: BMW27 - NVIDIA OptiX                   5.33
viennacl: OpenCL BLAS - dGEMM-TN                330
viennacl: OpenCL BLAS - dGEMV-T                 332
viennacl: OpenCL BLAS - dGEMV-N                 176
viennacl: OpenCL BLAS - dDOT                    402
viennacl: OpenCL BLAS - dAXPY                   400
viennacl: OpenCL BLAS - dCOPY                   377
viennacl: OpenCL BLAS - sDOT                    328
viennacl: OpenCL BLAS - sAXPY                   361
viennacl: OpenCL BLAS - sCOPY                   285
viennacl: CPU BLAS - dGEMM-TN                   67.8
viennacl: CPU BLAS - dGEMM-NT                   59.0
viennacl: CPU BLAS - dDOT                       38.6
viennacl: CPU BLAS - dAXPY                      37.2
viennacl: CPU BLAS - dCOPY                      34.3
viennacl: CPU BLAS - sDOT                       51.0
viennacl: CPU BLAS - sAXPY                      48.0
viennacl: CPU BLAS - sCOPY                      45.8
financebench: Black-Scholes OpenCL              9.801
luxcorerender: Rainbow Colors and Prism - GPU   35.97
luxcorerender: LuxCore Benchmark - GPU          12.50
luxcorerender: Orange Juice - GPU               13.98
luxcorerender: Danish Mood - GPU                9.93
luxcorerender: DLSC - GPU                       16.43
arrayfire: Conjugate Gradient OpenCL            2.110
rodinia: OpenCL Particle Filter                 6.108
lczero: OpenCL                                  11591
clpeak: Global Memory Bandwidth                 390.85
clpeak: Double-Precision Double                 354.00
clpeak: Single-Precision Float                  19632.77
clpeak: Integer Compute INT                     10137.57
fahbench:                                       253.7659
octanebench: Total Score                        405.758304
vkresample: 2x - Single                         17.404
vkresample: 2x - Double                         219.146
namd-cuda: ATPase Simulation - 327,506 Atoms    0.34927
cl-mem: Write                                   388.7
cl-mem: Read                                    395.0
cl-mem: Copy                                    295.9
libplacebo: hdr_peakdetect                      1858.13
libplacebo: polar_nocompute                     401.15
libplacebo: deband_heavy                        683.98
mixbench: NVIDIA CUDA - Single Precision        20980.61
mixbench: NVIDIA CUDA - Double Precision        294.24
mixbench: NVIDIA CUDA - Half Precision          22037.60
mixbench: OpenCL - Single Precision             21538.33
mixbench: OpenCL - Double Precision             298.80
mixbench: NVIDIA CUDA - Integer                 9812.41
mixbench: OpenCL - Integer                      11162.25
hashcat: TrueCrypt RIPEMD160 + XTS              962500
hashcat: SHA-512                                3647233333
hashcat: 7-Zip                                  1244967
hashcat: SHA1                                   25076066667
hashcat: MD5                                    80251200000
vkfft:                                          33111
waifu2x-ncnn: 2x - 3 - Yes                      4.104
realsr-ncnn: 4x - Yes                           48.868
realsr-ncnn: 4x - No                            8.346
vkpeak: int16-vec4                              9838.10
vkpeak: int16-scalar                            7436.90
vkpeak: int32-vec4                              11206.15
vkpeak: int32-scalar                            11252.47
vkpeak: fp64-vec4                               354.1
vkpeak: fp64-scalar                             353.66
vkpeak: fp16-vec4                               22221.79
vkpeak: fp16-scalar                             11276.80
vkpeak: fp32-vec4                               14920.77
vkpeak: fp32-scalar                             11271.30
viennacl: CPU BLAS - dGEMM-TT                   62.6
viennacl: CPU BLAS - dGEMM-NN                   59.6
viennacl: CPU BLAS - dGEMV-T                    39.0
viennacl: CPU BLAS - dGEMV-N                    30.94
libplacebo: av1_grain_lap                       165.02
libplacebo: hdr_lut                             400.45
waifu2x-ncnn: 2x - 3 - No
NeatBench 5, Acceleration: GPU (FPS, More Is Better): gpu_compute_0_1 = 3070; SE +/- 0.00, N = 3
MandelGPU 1.3pts1, OpenCL Device: GPU (Samples/sec, More Is Better): gpu_compute_0_1 = 399785618.2; SE +/- 2377934.88, N = 3; (CC) gcc options: -O3 -lm -ftree-vectorize -funroll-loops -lglut -lOpenCL -lGL
IndigoBench 4.4, Acceleration: OpenCL GPU - Scene: Supercar (M samples/s, More Is Better): gpu_compute_0_1 = 70.56; SE +/- 0.02, N = 3
IndigoBench 4.4, Acceleration: OpenCL GPU - Scene: Bedroom (M samples/s, More Is Better): gpu_compute_0_1 = 25.88; SE +/- 0.01, N = 3
Blender 3.5, Blend File: Pabellon Barcelona - Compute: NVIDIA OptiX (Seconds, Fewer Is Better): gpu_compute_0_1 = 14.15; SE +/- 0.01, N = 3
Blender 3.5, Blend File: Barbershop - Compute: NVIDIA OptiX (Seconds, Fewer Is Better): gpu_compute_0_1 = 49.68; SE +/- 0.09, N = 3
Blender 3.5, Blend File: Fishy Cat - Compute: NVIDIA OptiX (Seconds, Fewer Is Better): gpu_compute_0_1 = 10.45; SE +/- 0.01, N = 3
Blender 3.5, Blend File: Classroom - Compute: NVIDIA OptiX (Seconds, Fewer Is Better): gpu_compute_0_1 = 12.65; SE +/- 0.01, N = 3
Blender 3.5, Blend File: BMW27 - Compute: NVIDIA OptiX (Seconds, Fewer Is Better): gpu_compute_0_1 = 5.33; SE +/- 0.02, N = 3
ViennaCL 1.7.1, Test: OpenCL BLAS - dGEMM-TN (GFLOPs/s, More Is Better): gpu_compute_0_1 = 330; SE +/- 0.00, N = 2; (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1, Test: OpenCL BLAS - dGEMV-T (GB/s, More Is Better): gpu_compute_0_1 = 332; SE +/- 0.67, N = 3; (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1, Test: OpenCL BLAS - dGEMV-N (GB/s, More Is Better): gpu_compute_0_1 = 176; SE +/- 0.33, N = 3; (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1, Test: OpenCL BLAS - dDOT (GB/s, More Is Better): gpu_compute_0_1 = 402; SE +/- 0.00, N = 3; (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1, Test: OpenCL BLAS - dAXPY (GB/s, More Is Better): gpu_compute_0_1 = 400; SE +/- 0.00, N = 3; (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1, Test: OpenCL BLAS - dCOPY (GB/s, More Is Better): gpu_compute_0_1 = 377; SE +/- 0.00, N = 3; (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1, Test: OpenCL BLAS - sDOT (GB/s, More Is Better): gpu_compute_0_1 = 328; SE +/- 0.67, N = 3; (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1, Test: OpenCL BLAS - sAXPY (GB/s, More Is Better): gpu_compute_0_1 = 361; SE +/- 0.00, N = 3; (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1, Test: OpenCL BLAS - sCOPY (GB/s, More Is Better): gpu_compute_0_1 = 285; SE +/- 0.33, N = 3; (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1, Test: CPU BLAS - dGEMM-TN (GFLOPs/s, More Is Better): gpu_compute_0_1 = 67.8; SE +/- 2.12, N = 3; (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1, Test: CPU BLAS - dGEMM-NT (GFLOPs/s, More Is Better): gpu_compute_0_1 = 59.0; SE +/- 1.39, N = 3; (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1, Test: CPU BLAS - dDOT (GB/s, More Is Better): gpu_compute_0_1 = 38.6; SE +/- 0.07, N = 3; (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1, Test: CPU BLAS - dAXPY (GB/s, More Is Better): gpu_compute_0_1 = 37.2; SE +/- 0.07, N = 3; (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1, Test: CPU BLAS - dCOPY (GB/s, More Is Better): gpu_compute_0_1 = 34.3; SE +/- 0.06, N = 3; (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1, Test: CPU BLAS - sDOT (GB/s, More Is Better): gpu_compute_0_1 = 51.0; SE +/- 0.03, N = 3; (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1, Test: CPU BLAS - sAXPY (GB/s, More Is Better): gpu_compute_0_1 = 48.0; SE +/- 0.07, N = 3; (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1, Test: CPU BLAS - sCOPY (GB/s, More Is Better): gpu_compute_0_1 = 45.8; SE +/- 0.15, N = 3; (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
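
For a rough sense of scale, the ViennaCL OpenCL BLAS and CPU BLAS figures above can be compared kernel by kernel. The following is a minimal Python sketch, assuming the values are copied by hand from this report; the dictionary names and the `speedup` helper are illustrative and not part of the Phoronix Test Suite or ViennaCL.

```python
# ViennaCL 1.7.1 results copied from this report (gpu_compute_0_1).
# dGEMM-TN is reported in GFLOPs/s; the other kernels are in GB/s.
opencl_blas = {"dGEMM-TN": 330, "dDOT": 402, "dAXPY": 400, "dCOPY": 377,
               "sDOT": 328, "sAXPY": 361, "sCOPY": 285}
cpu_blas = {"dGEMM-TN": 67.8, "dDOT": 38.6, "dAXPY": 37.2, "dCOPY": 34.3,
            "sDOT": 51.0, "sAXPY": 48.0, "sCOPY": 45.8}


def speedup(kernel):
    """Ratio of the OpenCL result to the CPU result for one kernel (hypothetical helper)."""
    return opencl_blas[kernel] / cpu_blas[kernel]


for kernel in opencl_blas:
    print(f"{kernel:9s} {speedup(kernel):5.1f}x")
```

Only kernels that appear in both groups are included, and each ratio compares like units, so the ratios themselves are unitless.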
FinanceBench 2016-07-25, Benchmark: Black-Scholes OpenCL (ms, Fewer Is Better): gpu_compute_0_1 = 9.801; SE +/- 0.005, N = 3; (CXX) g++ options: -O3 -march=native -fopenmp
LuxCoreRender 2.6, Scene: Rainbow Colors and Prism - Acceleration: GPU (M samples/sec, More Is Better): gpu_compute_0_1 = 35.97; SE +/- 0.02, N = 3; MIN: 23.02 / MAX: 47.98
LuxCoreRender 2.6, Scene: LuxCore Benchmark - Acceleration: GPU (M samples/sec, More Is Better): gpu_compute_0_1 = 12.50; SE +/- 0.06, N = 3; MIN: 2.03 / MAX: 16.47
LuxCoreRender 2.6, Scene: Orange Juice - Acceleration: GPU (M samples/sec, More Is Better): gpu_compute_0_1 = 13.98; SE +/- 0.02, N = 3; MIN: 8.24 / MAX: 19.03
LuxCoreRender 2.6, Scene: Danish Mood - Acceleration: GPU (M samples/sec, More Is Better): gpu_compute_0_1 = 9.93; SE +/- 0.09, N = 3; MIN: 1.96 / MAX: 13.27
LuxCoreRender 2.6, Scene: DLSC - Acceleration: GPU (M samples/sec, More Is Better): gpu_compute_0_1 = 16.43; SE +/- 0.03, N = 3; MIN: 12.61 / MAX: 17.13
ArrayFire 3.7, Test: Conjugate Gradient OpenCL (ms, Fewer Is Better): gpu_compute_0_1 = 2.110; SE +/- 0.001, N = 3; (CXX) g++ options: -rdynamic
Rodinia 3.1, Test: OpenCL Particle Filter (Seconds, Fewer Is Better): gpu_compute_0_1 = 6.108; SE +/- 0.009, N = 3; (CXX) g++ options: -O2 -lOpenCL
LeelaChessZero 0.28, Backend: OpenCL (Nodes Per Second, More Is Better): gpu_compute_0_1 = 11591; SE +/- 123.11, N = 4; (CXX) g++ options: -flto -pthread
clpeak 1.1.2, OpenCL Test: Global Memory Bandwidth (GBPS, More Is Better): gpu_compute_0_1 = 390.85; SE +/- 0.04, N = 3; (CXX) g++ options: -O3
clpeak 1.1.2, OpenCL Test: Double-Precision Double (GFLOPS, More Is Better): gpu_compute_0_1 = 354.00; SE +/- 0.18, N = 3; (CXX) g++ options: -O3
clpeak 1.1.2, OpenCL Test: Single-Precision Float (GFLOPS, More Is Better): gpu_compute_0_1 = 19632.77; SE +/- 1.04, N = 3; (CXX) g++ options: -O3
clpeak 1.1.2, OpenCL Test: Integer Compute INT (GIOPS, More Is Better): gpu_compute_0_1 = 10137.57; SE +/- 95.68, N = 3; (CXX) g++ options: -O3
FAHBench 2.3.2 (Ns Per Day, More Is Better): gpu_compute_0_1 = 253.77; SE +/- 0.21, N = 3
OctaneBench 2020.1, Total Score (Score, More Is Better): gpu_compute_0_1 = 405.76
VkResample 1.0, Upscale: 2x - Precision: Single (ms, Fewer Is Better): gpu_compute_0_1 = 17.40; SE +/- 0.01, N = 3; (CXX) g++ options: -O3
VkResample 1.0, Upscale: 2x - Precision: Double (ms, Fewer Is Better): gpu_compute_0_1 = 219.15; SE +/- 0.30, N = 3; (CXX) g++ options: -O3
NAMD CUDA 2.14, ATPase Simulation - 327,506 Atoms (days/ns, Fewer Is Better): gpu_compute_0_1 = 0.34927; SE +/- 0.00036, N = 3
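
The NAMD result is reported in days/ns, where fewer is better. Readers who prefer a throughput figure can invert it to ns/day. A minimal Python sketch, assuming only that ns/day is the reciprocal of days/ns; the `ns_per_day` helper name is hypothetical.

```python
def ns_per_day(days_per_ns):
    """Invert a days/ns figure into ns/day (hypothetical helper name)."""
    if days_per_ns <= 0:
        raise ValueError("days/ns must be positive")
    return 1.0 / days_per_ns


namd_days_per_ns = 0.34927  # NAMD CUDA 2.14 result copied from this report
print(f"{ns_per_day(namd_days_per_ns):.3f} ns/day")
```

The converted figure should not be compared directly with the FAHBench Ns Per Day result, since the two tests run different software and workloads.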
cl-mem 2017-01-13, Benchmark: Write (GB/s, More Is Better): gpu_compute_0_1 = 388.7; SE +/- 0.03, N = 3; (CC) gcc options: -O2 -flto -lOpenCL
cl-mem 2017-01-13, Benchmark: Read (GB/s, More Is Better): gpu_compute_0_1 = 395.0; SE +/- 0.03, N = 3; (CC) gcc options: -O2 -flto -lOpenCL
cl-mem 2017-01-13, Benchmark: Copy (GB/s, More Is Better): gpu_compute_0_1 = 295.9; SE +/- 0.09, N = 3; (CC) gcc options: -O2 -flto -lOpenCL
Libplacebo 5.229.1, Test: hdr_peakdetect (FPS, More Is Better): gpu_compute_0_1 = 1858.13; SE +/- 61.85, N = 3
Libplacebo 5.229.1, Test: polar_nocompute (FPS, More Is Better): gpu_compute_0_1 = 401.15; SE +/- 7.98, N = 3
Libplacebo 5.229.1, Test: deband_heavy (FPS, More Is Better): gpu_compute_0_1 = 683.98; SE +/- 0.06, N = 3
Mixbench 2020-06-23, Backend: NVIDIA CUDA - Benchmark: Single Precision (GFLOPS, More Is Better): gpu_compute_0_1 = 20980.61; SE +/- 61.49, N = 3; (CXX) g++ options: -lm -lstdc++ -lOpenCL -lrt -O2
Mixbench 2020-06-23, Backend: NVIDIA CUDA - Benchmark: Double Precision (GFLOPS, More Is Better): gpu_compute_0_1 = 294.24; SE +/- 0.00, N = 3; (CXX) g++ options: -lm -lstdc++ -lOpenCL -lrt -O2
Mixbench 2020-06-23, Backend: NVIDIA CUDA - Benchmark: Half Precision (GFLOPS, More Is Better): gpu_compute_0_1 = 22037.60; SE +/- 4.30, N = 3; (CXX) g++ options: -lm -lstdc++ -lOpenCL -lrt -O2
Mixbench 2020-06-23, Backend: OpenCL - Benchmark: Single Precision (GFLOPS, More Is Better): gpu_compute_0_1 = 21538.33; SE +/- 8.31, N = 3; (CXX) g++ options: -lm -lstdc++ -lOpenCL -lrt -O2
Mixbench 2020-06-23, Backend: OpenCL - Benchmark: Double Precision (GFLOPS, More Is Better): gpu_compute_0_1 = 298.80; SE +/- 0.00, N = 3; (CXX) g++ options: -lm -lstdc++ -lOpenCL -lrt -O2
Mixbench 2020-06-23, Backend: NVIDIA CUDA - Benchmark: Integer (GIOPS, More Is Better): gpu_compute_0_1 = 9812.41; SE +/- 1.70, N = 3; (CXX) g++ options: -lm -lstdc++ -lOpenCL -lrt -O2
Mixbench 2020-06-23, Backend: OpenCL - Benchmark: Integer (GIOPS, More Is Better): gpu_compute_0_1 = 11162.25; SE +/- 22.27, N = 3; (CXX) g++ options: -lm -lstdc++ -lOpenCL -lrt -O2
Hashcat 6.2.4, Benchmark: TrueCrypt RIPEMD160 + XTS (H/s, More Is Better): gpu_compute_0_1 = 962500; SE +/- 2967.04, N = 3
Hashcat 6.2.4, Benchmark: SHA-512 (H/s, More Is Better): gpu_compute_0_1 = 3647233333; SE +/- 1386041.53, N = 3
Hashcat 6.2.4, Benchmark: 7-Zip (H/s, More Is Better): gpu_compute_0_1 = 1244967; SE +/- 352.77, N = 3
Hashcat 6.2.4, Benchmark: SHA1 (H/s, More Is Better): gpu_compute_0_1 = 25076066667; SE +/- 24708995.21, N = 3
Hashcat 6.2.4, Benchmark: MD5 (H/s, More Is Better): gpu_compute_0_1 = 80251200000; SE +/- 20201320.09, N = 3
VkFFT 1.1.1 (Benchmark Score, More Is Better): gpu_compute_0_1 = 33111; SE +/- 351.06, N = 5; (CXX) g++ options: -O3
Waifu2x-NCNN Vulkan 20200818, Scale: 2x - Denoise: 3 - TAA: Yes (Seconds, Fewer Is Better): gpu_compute_0_1 = 4.104; SE +/- 0.008, N = 3
RealSR-NCNN 20200818, Scale: 4x - TAA: Yes (Seconds, Fewer Is Better): gpu_compute_0_1 = 48.87; SE +/- 0.08, N = 3
RealSR-NCNN 20200818, Scale: 4x - TAA: No (Seconds, Fewer Is Better): gpu_compute_0_1 = 8.346; SE +/- 0.098, N = 15
vkpeak 20210424, int16-vec4 (GIOPS, More Is Better): gpu_compute_0_1 = 9838.10; SE +/- 0.48, N = 3
vkpeak 20210424, int16-scalar (GIOPS, More Is Better): gpu_compute_0_1 = 7436.90; SE +/- 0.52, N = 3
vkpeak 20210424, int32-vec4 (GIOPS, More Is Better): gpu_compute_0_1 = 11206.15; SE +/- 0.64, N = 3
vkpeak 20210424, int32-scalar (GIOPS, More Is Better): gpu_compute_0_1 = 11252.47; SE +/- 0.84, N = 3
vkpeak 20210424, fp64-vec4 (GFLOPS, More Is Better): gpu_compute_0_1 = 354.1; SE +/- 0.03, N = 3
vkpeak 20210424, fp64-scalar (GFLOPS, More Is Better): gpu_compute_0_1 = 353.66; SE +/- 0.02, N = 3
vkpeak 20210424, fp16-vec4 (GFLOPS, More Is Better): gpu_compute_0_1 = 22221.79; SE +/- 0.48, N = 3
vkpeak 20210424, fp16-scalar (GFLOPS, More Is Better): gpu_compute_0_1 = 11276.80; SE +/- 20.79, N = 3
vkpeak 20210424, fp32-vec4 (GFLOPS, More Is Better): gpu_compute_0_1 = 14920.77; SE +/- 46.41, N = 3
vkpeak 20210424, fp32-scalar (GFLOPS, More Is Better): gpu_compute_0_1 = 11271.30; SE +/- 26.89, N = 3
ViennaCL 1.7.1, Test: CPU BLAS - dGEMM-TT (GFLOPs/s, More Is Better): gpu_compute_0_1 = 62.6; SE +/- 2.29, N = 3; (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1, Test: CPU BLAS - dGEMM-NN (GFLOPs/s, More Is Better): gpu_compute_0_1 = 59.6; SE +/- 2.10, N = 3; (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1, Test: CPU BLAS - dGEMV-T (GB/s, More Is Better): gpu_compute_0_1 = 39.0; SE +/- 4.17, N = 3; (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1, Test: CPU BLAS - dGEMV-N (GB/s, More Is Better): gpu_compute_0_1 = 30.94; SE +/- 11.06, N = 3; (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
Libplacebo 5.229.1, Test: av1_grain_lap (FPS, More Is Better): gpu_compute_0_1 = 165.02; SE +/- 6.87, N = 3
Libplacebo 5.229.1, Test: hdr_lut (FPS, More Is Better): gpu_compute_0_1 = 400.45; SE +/- 17.07, N = 3
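
Each result above carries a standard error (SE) and a run count (N). One way to spot noisy results is to express the SE as a percentage of the reported value. The following is a minimal Python sketch, assuming a handful of values copied by hand from this report; the `relative_se_percent` helper and the 5% threshold are illustrative choices, not something the Phoronix Test Suite prescribes.

```python
def relative_se_percent(mean, se):
    """Standard error as a percentage of the reported result (hypothetical helper)."""
    return 100.0 * se / mean


# (test, reported result, SE) copied from this report
samples = [
    ("ViennaCL CPU BLAS - dGEMV-N", 30.94, 11.06),
    ("ViennaCL CPU BLAS - dGEMV-T", 39.0, 4.17),
    ("Libplacebo hdr_lut", 400.45, 17.07),
    ("clpeak Global Memory Bandwidth", 390.85, 0.04),
]

THRESHOLD = 5.0  # percent; an arbitrary cut-off chosen for illustration

noisy = [name for name, mean, se in samples
         if relative_se_percent(mean, se) > THRESHOLD]

for name in noisy:
    print(name)
```

With this threshold, only the two ViennaCL CPU BLAS dGEMV results are flagged; their SE is large relative to the reported value, so those two figures deserve extra caution.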
Phoronix Test Suite v10.8.4