gpu_compute_nvidia1_run0
Intel Core i7-12700K testing with a Gigabyte B660 DS3H DDR4 (F4 BIOS) and Gigabyte NVIDIA GeForce RTX 3070 8GB on Ubuntu 22.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2304284-NE-GPUCOMPUT95
gpu_compute_nvidia1_run0: system details for gpu_compute_1_0

Processor: Intel Core i7-12700K @ 4.90GHz (12 Cores / 20 Threads)
Motherboard: Gigabyte B660 DS3H DDR4 (F4 BIOS)
Chipset: Intel Device 7aa7
Memory: 64GB
Disk: 2000GB PNY CS2130 2TB SSD + 4 x 6001GB Western Digital WD60EZAZ-00S
Graphics: Gigabyte NVIDIA GeForce RTX 3070 8GB
Audio: Realtek ALC897
Monitor: BenQ PD2700U
Network: Realtek RTL8111/8168/8411
OS: Ubuntu 22.04
Kernel: 5.15.0-70-generic (x86_64)
Desktop: GNOME Shell 42.5
Display Server: X Server 1.21.1.4
Display Driver: NVIDIA 530.41.03
OpenGL: 4.6.0
OpenCL: OpenCL 3.0 CUDA 12.1.98
Vulkan: 1.3.236
Compiler: GCC 11.3.0 + CUDA 11.5
File-System: ext4
Screen Resolution: 3840x2160

- Transparent Huge Pages: madvise
- __GLX_VENDOR_LIBRARY_NAME=nvidia NVM_CD_FLAGS=
- --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-11-xKiWfi/gcc-11-11.3.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-xKiWfi/gcc-11-11.3.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
- Scaling Governor: intel_pstate powersave (EPP: balance_performance)
- CPU Microcode: 0x2c
- Thermald 2.4.9
- BAR1 / Visible vRAM Size: 256 MiB
- vBIOS Version: 94.04.46.00.e3
- GPU Compute Cores: 5888
- Python 3.10.6
- itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced IBRS IBPB: conditional RSB filling PBRSB-eIBRS: SW sequence + srbds: Not affected + tsx_async_abort: Not affected
gpu_compute_nvidia1_run0: result summary

Test | gpu_compute_1_0
vkpeak: fp32-scalar | 11248.76
vkpeak: fp32-vec4 | 14887.96
vkpeak: fp16-scalar | 11255.49
vkpeak: fp16-vec4 | 22223.36
vkpeak: fp64-scalar | 352.99
vkpeak: fp64-vec4 | 353.36
vkpeak: int32-scalar | 11217.56
vkpeak: int32-vec4 | 11170.38
vkpeak: int16-scalar | 7409.34
vkpeak: int16-vec4 | 9837.29
realsr-ncnn: 4x - No | 8.220
realsr-ncnn: 4x - Yes | 48.861
waifu2x-ncnn: 2x - 3 - Yes | 4.162
vkfft: | 32563
hashcat: MD5 | 79917100000
hashcat: SHA1 | 25055033333
hashcat: 7-Zip | 1244700
hashcat: SHA-512 | 3652700000
hashcat: TrueCrypt RIPEMD160 + XTS | 957867
mixbench: OpenCL - Integer | 11118.02
mixbench: NVIDIA CUDA - Integer | 9818.92
mixbench: OpenCL - Double Precision | 298.81
mixbench: OpenCL - Single Precision | 21840.61
mixbench: NVIDIA CUDA - Half Precision | 22097.96
mixbench: NVIDIA CUDA - Double Precision | 295.70
mixbench: NVIDIA CUDA - Single Precision | 21050.32
cl-mem: Copy | 296.1
cl-mem: Read | 395.1
cl-mem: Write | 388.5
namd-cuda: ATPase Simulation - 327,506 Atoms | 0.34911
vkresample: 2x - Double | 217.287
vkresample: 2x - Single | 17.394
octanebench: Total Score | 406.193104
fahbench: | 253.3261
clpeak: Integer Compute INT | 10064.62
clpeak: Single-Precision Float | 19624.76
clpeak: Double-Precision Double | 354.21
clpeak: Global Memory Bandwidth | 390.84
lczero: OpenCL | 12008
rodinia: OpenCL Particle Filter | 6.116
arrayfire: Conjugate Gradient OpenCL | 2.107
luxcorerender: DLSC - GPU | 16.40
luxcorerender: Danish Mood - GPU | 10.06
luxcorerender: Orange Juice - GPU | 14.02
luxcorerender: LuxCore Benchmark - GPU | 12.46
luxcorerender: Rainbow Colors and Prism - GPU | 35.86
financebench: Black-Scholes OpenCL | 9.806
viennacl: CPU BLAS - sCOPY | 45.6
viennacl: CPU BLAS - sAXPY | 47.8
viennacl: CPU BLAS - sDOT | 50.8
viennacl: CPU BLAS - dCOPY | 34.2
viennacl: CPU BLAS - dAXPY | 36.9
viennacl: CPU BLAS - dDOT | 38.4
viennacl: CPU BLAS - dGEMV-N | 40.4
viennacl: CPU BLAS - dGEMV-T | 43.0
viennacl: CPU BLAS - dGEMM-NN | 59.3
viennacl: CPU BLAS - dGEMM-NT | 58.7
viennacl: CPU BLAS - dGEMM-TN | 62.2
viennacl: CPU BLAS - dGEMM-TT | 62.1
viennacl: OpenCL BLAS - sCOPY | 286
viennacl: OpenCL BLAS - sAXPY | 361
viennacl: OpenCL BLAS - sDOT | 329
viennacl: OpenCL BLAS - dCOPY | 377
viennacl: OpenCL BLAS - dAXPY | 400
viennacl: OpenCL BLAS - dDOT | 402
viennacl: OpenCL BLAS - dGEMV-N | 176
viennacl: OpenCL BLAS - dGEMV-T | 333
viennacl: OpenCL BLAS - dGEMM-NN | 332
viennacl: OpenCL BLAS - dGEMM-TN | 330
blender: BMW27 - NVIDIA OptiX | 5.36
blender: Classroom - NVIDIA OptiX | 12.65
blender: Fishy Cat - NVIDIA OptiX | 10.46
blender: Barbershop - NVIDIA OptiX | 49.78
blender: Pabellon Barcelona - NVIDIA OptiX | 14.21
indigobench: OpenCL GPU - Bedroom | 25.861
indigobench: OpenCL GPU - Supercar | 70.622
mandelgpu: GPU | 220704201.9
neatbench: GPU | 3070
vkpeak 20210424 | fp32-scalar | 11248.76 GFLOPS, More Is Better | SE +/- 30.07, N = 3
vkpeak 20210424 | fp32-vec4 | 14887.96 GFLOPS, More Is Better | SE +/- 8.69, N = 3
vkpeak 20210424 | fp16-scalar | 11255.49 GFLOPS, More Is Better | SE +/- 0.67, N = 3
vkpeak 20210424 | fp16-vec4 | 22223.36 GFLOPS, More Is Better | SE +/- 0.65, N = 3
vkpeak 20210424 | fp64-scalar | 352.99 GFLOPS, More Is Better | SE +/- 0.32, N = 3
vkpeak 20210424 | fp64-vec4 | 353.36 GFLOPS, More Is Better | SE +/- 0.01, N = 3
vkpeak 20210424 | int32-scalar | 11217.56 GIOPS, More Is Better | SE +/- 0.34, N = 3
vkpeak 20210424 | int32-vec4 | 11170.38 GIOPS, More Is Better | SE +/- 0.54, N = 3
vkpeak 20210424 | int16-scalar | 7409.34 GIOPS, More Is Better | SE +/- 4.84, N = 3
vkpeak 20210424 | int16-vec4 | 9837.29 GIOPS, More Is Better | SE +/- 0.66, N = 3
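Each result is reported as a mean with a standard error (SE) over N runs. One way to compare run-to-run noise across tests with very different magnitudes is to express the SE as a percentage of the mean. The following is a minimal sketch of that arithmetic, not part of the Phoronix Test Suite output; the function name `relative_se_percent` is hypothetical, and the values are copied from three of the vkpeak results above.

```python
# (mean, standard error) pairs copied from the vkpeak results above.
reported = {
    "fp32-scalar": (11248.76, 30.07),
    "fp32-vec4": (14887.96, 8.69),
    "fp64-scalar": (352.99, 0.32),
}


def relative_se_percent(mean: float, se: float) -> float:
    """Return the standard error as a percentage of the mean."""
    if mean == 0:
        raise ValueError("mean must be non-zero")
    return 100.0 * se / mean


for name, (mean, se) in reported.items():
    # Print each test with its SE relative to the reported mean.
    print(f"{name}: {relative_se_percent(mean, se):.3f}% of mean")
```

With only N = 3 runs per test, these percentages are a rough indication of stability rather than a precise noise estimate.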
RealSR-NCNN 20200818 | Scale: 4x - TAA: No | 8.220 Seconds, Fewer Is Better | SE +/- 0.054, N = 3
RealSR-NCNN 20200818 | Scale: 4x - TAA: Yes | 48.86 Seconds, Fewer Is Better | SE +/- 0.03, N = 3
Waifu2x-NCNN Vulkan 20200818 | Scale: 2x - Denoise: 3 - TAA: Yes | 4.162 Seconds, Fewer Is Better | SE +/- 0.012, N = 3
VkFFT 1.1.1 | 32563 Benchmark Score, More Is Better | SE +/- 361.86, N = 9 | 1. (CXX) g++ options: -O3
Hashcat 6.2.4 | Benchmark: MD5 | 79917100000 H/s, More Is Better | SE +/- 29937490.43, N = 3
Hashcat 6.2.4 | Benchmark: SHA1 | 25055033333 H/s, More Is Better | SE +/- 23458994.96, N = 3
Hashcat 6.2.4 | Benchmark: 7-Zip | 1244700 H/s, More Is Better | SE +/- 3564.17, N = 3
Hashcat 6.2.4 | Benchmark: SHA-512 | 3652700000 H/s, More Is Better | SE +/- 2598076.21, N = 3
Hashcat 6.2.4 | Benchmark: TrueCrypt RIPEMD160 + XTS | 957867 H/s, More Is Better | SE +/- 448.45, N = 3
Mixbench 2020-06-23 | Backend: OpenCL - Benchmark: Integer | 11118.02 GIOPS, More Is Better | SE +/- 0.31, N = 3
Mixbench 2020-06-23 | Backend: NVIDIA CUDA - Benchmark: Integer | 9818.92 GIOPS, More Is Better | SE +/- 0.00, N = 3
Mixbench 2020-06-23 | Backend: OpenCL - Benchmark: Double Precision | 298.81 GFLOPS, More Is Better | SE +/- 0.00, N = 3
Mixbench 2020-06-23 | Backend: OpenCL - Benchmark: Single Precision | 21840.61 GFLOPS, More Is Better | SE +/- 9.46, N = 3
Mixbench 2020-06-23 | Backend: NVIDIA CUDA - Benchmark: Half Precision | 22097.96 GFLOPS, More Is Better | SE +/- 4.22, N = 3
Mixbench 2020-06-23 | Backend: NVIDIA CUDA - Benchmark: Double Precision | 295.70 GFLOPS, More Is Better | SE +/- 0.03, N = 3
Mixbench 2020-06-23 | Backend: NVIDIA CUDA - Benchmark: Single Precision | 21050.32 GFLOPS, More Is Better | SE +/- 14.82, N = 3
1. (CXX) g++ options: -lm -lstdc++ -lOpenCL -lrt -O2
cl-mem 2017-01-13 | Benchmark: Copy | 296.1 GB/s, More Is Better | SE +/- 0.23, N = 3
cl-mem 2017-01-13 | Benchmark: Read | 395.1 GB/s, More Is Better | SE +/- 0.03, N = 3
cl-mem 2017-01-13 | Benchmark: Write | 388.5 GB/s, More Is Better | SE +/- 0.10, N = 3
1. (CC) gcc options: -O2 -flto -lOpenCL
NAMD CUDA 2.14 | ATPase Simulation - 327,506 Atoms | 0.34911 days/ns, Fewer Is Better | SE +/- 0.00105, N = 3
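NAMD reports its result in days/ns (fewer is better), while molecular dynamics throughput is often quoted the other way round, in ns/day. The conversion is a simple reciprocal. The sketch below illustrates it with the value reported above; the function name `days_per_ns_to_ns_per_day` is hypothetical and not part of NAMD or the Phoronix Test Suite.

```python
def days_per_ns_to_ns_per_day(days_per_ns: float) -> float:
    """Invert a days/ns figure into ns/day; the input must be positive."""
    if days_per_ns <= 0:
        raise ValueError("days/ns must be positive")
    return 1.0 / days_per_ns


# Value reported above for the ATPase simulation (327,506 atoms).
namd_days_per_ns = 0.34911
print(f"{days_per_ns_to_ns_per_day(namd_days_per_ns):.3f} ns/day")
```

The converted figure only changes the unit. It should not be read against ns/day numbers from other tools in this result file, since those simulate different systems.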
VkResample 1.0 | Upscale: 2x - Precision: Double | 217.29 ms, Fewer Is Better | SE +/- 0.22, N = 3
VkResample 1.0 | Upscale: 2x - Precision: Single | 17.39 ms, Fewer Is Better | SE +/- 0.00, N = 3
1. (CXX) g++ options: -O3
OctaneBench 2020.1 | Total Score | 406.19 Score, More Is Better
FAHBench 2.3.2 | 253.33 Ns Per Day, More Is Better | SE +/- 0.02, N = 3
clpeak 1.1.2 | OpenCL Test: Integer Compute INT | 10064.62 GIOPS, More Is Better | SE +/- 30.77, N = 3
clpeak 1.1.2 | OpenCL Test: Single-Precision Float | 19624.76 GFLOPS, More Is Better | SE +/- 4.35, N = 3
clpeak 1.1.2 | OpenCL Test: Double-Precision Double | 354.21 GFLOPS, More Is Better | SE +/- 0.24, N = 3
clpeak 1.1.2 | OpenCL Test: Global Memory Bandwidth | 390.84 GBPS, More Is Better | SE +/- 0.04, N = 3
1. (CXX) g++ options: -O3
LeelaChessZero 0.28 | Backend: OpenCL | 12008 Nodes Per Second, More Is Better | SE +/- 131.06, N = 4 | 1. (CXX) g++ options: -flto -pthread
Rodinia 3.1 | Test: OpenCL Particle Filter | 6.116 Seconds, Fewer Is Better | SE +/- 0.005, N = 3 | 1. (CXX) g++ options: -O2 -lOpenCL
ArrayFire 3.7 | Test: Conjugate Gradient OpenCL | 2.107 ms, Fewer Is Better | SE +/- 0.004, N = 3 | 1. (CXX) g++ options: -rdynamic
LuxCoreRender 2.6 | Scene: DLSC - Acceleration: GPU | 16.40 M samples/sec, More Is Better | SE +/- 0.03, N = 3 | MIN: 12.54 / MAX: 17.11
LuxCoreRender 2.6 | Scene: Danish Mood - Acceleration: GPU | 10.06 M samples/sec, More Is Better | SE +/- 0.08, N = 15 | MIN: 1.73 / MAX: 13.5
LuxCoreRender 2.6 | Scene: Orange Juice - Acceleration: GPU | 14.02 M samples/sec, More Is Better | SE +/- 0.05, N = 3 | MIN: 7.68 / MAX: 19.2
LuxCoreRender 2.6 | Scene: LuxCore Benchmark - Acceleration: GPU | 12.46 M samples/sec, More Is Better | SE +/- 0.06, N = 3 | MIN: 2.04 / MAX: 16.44
LuxCoreRender 2.6 | Scene: Rainbow Colors and Prism - Acceleration: GPU | 35.86 M samples/sec, More Is Better | SE +/- 0.13, N = 3 | MIN: 22.99 / MAX: 47.98
FinanceBench 2016-07-25 | Benchmark: Black-Scholes OpenCL | 9.806 ms, Fewer Is Better | SE +/- 0.003, N = 3 | 1. (CXX) g++ options: -O3 -march=native -fopenmp
ViennaCL 1.7.1 | Test: CPU BLAS - sCOPY | 45.6 GB/s, More Is Better | SE +/- 0.83, N = 12
ViennaCL 1.7.1 | Test: CPU BLAS - sAXPY | 47.8 GB/s, More Is Better | SE +/- 0.84, N = 12
ViennaCL 1.7.1 | Test: CPU BLAS - sDOT | 50.8 GB/s, More Is Better | SE +/- 0.81, N = 12
ViennaCL 1.7.1 | Test: CPU BLAS - dCOPY | 34.2 GB/s, More Is Better | SE +/- 0.60, N = 12
ViennaCL 1.7.1 | Test: CPU BLAS - dAXPY | 36.9 GB/s, More Is Better | SE +/- 0.64, N = 12
ViennaCL 1.7.1 | Test: CPU BLAS - dDOT | 38.4 GB/s, More Is Better | SE +/- 0.62, N = 12
ViennaCL 1.7.1 | Test: CPU BLAS - dGEMV-N | 40.4 GB/s, More Is Better | SE +/- 1.38, N = 12
ViennaCL 1.7.1 | Test: CPU BLAS - dGEMV-T | 43.0 GB/s, More Is Better | SE +/- 0.70, N = 12
ViennaCL 1.7.1 | Test: CPU BLAS - dGEMM-NN | 59.3 GFLOPs/s, More Is Better | SE +/- 1.02, N = 12
ViennaCL 1.7.1 | Test: CPU BLAS - dGEMM-NT | 58.7 GFLOPs/s, More Is Better | SE +/- 0.81, N = 12
ViennaCL 1.7.1 | Test: CPU BLAS - dGEMM-TN | 62.2 GFLOPs/s, More Is Better | SE +/- 0.83, N = 12
ViennaCL 1.7.1 | Test: CPU BLAS - dGEMM-TT | 62.1 GFLOPs/s, More Is Better | SE +/- 0.81, N = 12
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1 | Test: OpenCL BLAS - sCOPY | 286 GB/s, More Is Better | SE +/- 0.33, N = 3
ViennaCL 1.7.1 | Test: OpenCL BLAS - sAXPY | 361 GB/s, More Is Better | SE +/- 0.00, N = 3
ViennaCL 1.7.1 | Test: OpenCL BLAS - sDOT | 329 GB/s, More Is Better | SE +/- 0.33, N = 3
ViennaCL 1.7.1 | Test: OpenCL BLAS - dCOPY | 377 GB/s, More Is Better | SE +/- 0.00, N = 3
ViennaCL 1.7.1 | Test: OpenCL BLAS - dAXPY | 400 GB/s, More Is Better | SE +/- 0.00, N = 3
ViennaCL 1.7.1 | Test: OpenCL BLAS - dDOT | 402 GB/s, More Is Better | SE +/- 0.00, N = 3
ViennaCL 1.7.1 | Test: OpenCL BLAS - dGEMV-N | 176 GB/s, More Is Better | SE +/- 0.00, N = 3
ViennaCL 1.7.1 | Test: OpenCL BLAS - dGEMV-T | 333 GB/s, More Is Better | SE +/- 0.33, N = 3
ViennaCL 1.7.1 | Test: OpenCL BLAS - dGEMM-NN | 332 GFLOPs/s, More Is Better | SE +/- 1.00, N = 2
ViennaCL 1.7.1 | Test: OpenCL BLAS - dGEMM-TN | 330 GFLOPs/s, More Is Better
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
Blender 3.5 | Blend File: BMW27 - Compute: NVIDIA OptiX | 5.36 Seconds, Fewer Is Better | SE +/- 0.03, N = 3
Blender 3.5 | Blend File: Classroom - Compute: NVIDIA OptiX | 12.65 Seconds, Fewer Is Better | SE +/- 0.01, N = 3
Blender 3.5 | Blend File: Fishy Cat - Compute: NVIDIA OptiX | 10.46 Seconds, Fewer Is Better | SE +/- 0.01, N = 3
Blender 3.5 | Blend File: Barbershop - Compute: NVIDIA OptiX | 49.78 Seconds, Fewer Is Better | SE +/- 0.03, N = 3
Blender 3.5 | Blend File: Pabellon Barcelona - Compute: NVIDIA OptiX | 14.21 Seconds, Fewer Is Better | SE +/- 0.04, N = 3
IndigoBench 4.4 | Acceleration: OpenCL GPU - Scene: Bedroom | 25.86 M samples/s, More Is Better | SE +/- 0.02, N = 3
IndigoBench 4.4 | Acceleration: OpenCL GPU - Scene: Supercar | 70.62 M samples/s, More Is Better | SE +/- 0.04, N = 3
MandelGPU 1.3pts1 | OpenCL Device: GPU | 220704201.9 Samples/sec, More Is Better | SE +/- 391525.78, N = 3 | 1. (CC) gcc options: -O3 -lm -ftree-vectorize -funroll-loops -lglut -lOpenCL -lGL
NeatBench 5 | Acceleration: GPU | 3070 FPS, More Is Better | SE +/- 0.00, N = 3
Phoronix Test Suite v10.8.4