mighty-3090x2-2
AMD Ryzen Threadripper 3990X 64-Core testing with a Gigabyte TRX40 DESIGNARE (F5 BIOS) and NVIDIA GeForce RTX 3090 24GB on Ubuntu 20.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2303302-NE-MIGHTY30974&grt .

mighty-3090x2-2 system details (RTX 3090 x 2)

Processor:          AMD Ryzen Threadripper 3990X 64-Core @ 2.90GHz (64 Cores / 128 Threads)
Motherboard:        Gigabyte TRX40 DESIGNARE (F5 BIOS)
Chipset:            AMD Starship/Matisse
Memory:             256GB
Disk:               2048GB ADATA SX8200PNP + 3 x 2048GB SPCC M.2 PCIe SSD + 5 x 14001GB Western Digital WUH721414AL
Graphics:           NVIDIA GeForce RTX 3090 24GB
Audio:              NVIDIA Device 1aef
Network:            2 x Intel I210 + Intel Wi-Fi 6 AX200
OS:                 Ubuntu 20.04
Kernel:             5.15.0-67-generic (x86_64)
Display Server:     X Server 1.20.11
Display Driver:     NVIDIA
OpenCL:             OpenCL 3.0 CUDA 11.6.134
Vulkan:             1.3.194
Compiler:           GCC 9.4.0 + CUDA 11.6
File-System:        btrfs
Screen Resolution:  1024x768

- Transparent Huge Pages: madvise
- --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-9-Av3uEd/gcc-9-9.4.0/debian/tmp-nvptx/usr,hsa --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
- Scaling Governor: acpi-cpufreq performance (Boost: Enabled)
- CPU Microcode: 0x8301039
- BAR1 / Visible vRAM Size: 256 MiB
- Python 3.8.10
- itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Mitigation of untrained return thunk; SMT enabled with STIBP protection + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines IBPB: conditional STIBP: always-on RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected

mighty-3090x2-2 results summary

Test                                              RTX 3090 x 2
arrayfire: Conjugate Gradient OpenCL              1.501
caffe: AlexNet - NVIDIA CUDA - 100                699.569
caffe: AlexNet - NVIDIA CUDA - 200                1381.71
caffe: AlexNet - NVIDIA CUDA - 1000               6874.60
caffe: GoogleNet - NVIDIA CUDA - 100              2800.17
caffe: GoogleNet - NVIDIA CUDA - 200              5566.12
caffe: GoogleNet - NVIDIA CUDA - 1000             27981.1
cl-mem: Copy                                      360.3
cl-mem: Read                                      795.6
cl-mem: Write                                     749.3
clpeak: Integer Compute INT                       17621.66
clpeak: Single-Precision Float                    34570.34
clpeak: Double-Precision Double                   642.61
clpeak: Global Memory Bandwidth                   816.35
fahbench:                                         334.5374
financebench: Black-Scholes OpenCL                6.326
hashcat: MD5                                      100686168750
hashcat: SHA1                                     42287666667
hashcat: 7-Zip                                    2145400
hashcat: SHA-512                                  6099166667
hashcat: TrueCrypt RIPEMD160 + XTS                1584767
lczero: OpenCL                                    19173
luxcorerender: DLSC - GPU                         27.82
luxcorerender: Danish Mood - GPU                  19.41
luxcorerender: Orange Juice - GPU                 22.61
luxcorerender: LuxCore Benchmark - GPU            21.48
luxcorerender: Rainbow Colors and Prism - GPU     61.65
mixbench: OpenCL - Integer                        17461.90
mixbench: NVIDIA CUDA - Integer                   16566.76
mixbench: OpenCL - Double Precision               512.43
mixbench: OpenCL - Single Precision               36828.07
mixbench: NVIDIA CUDA - Half Precision            36773.91
mixbench: NVIDIA CUDA - Double Precision          493.23
mixbench: NVIDIA CUDA - Single Precision          34440.08
neatbench: GPU                                    3090
rodinia: OpenCL Particle Filter                   3.898
viennacl: CPU BLAS - sCOPY                        825
viennacl: CPU BLAS - sAXPY                        764
viennacl: CPU BLAS - sDOT                         617
viennacl: CPU BLAS - dCOPY                        1753
viennacl: CPU BLAS - dAXPY                        1447
viennacl: CPU BLAS - dDOT                         1130
viennacl: CPU BLAS - dGEMV-N                      42.2
viennacl: CPU BLAS - dGEMV-T                      717
viennacl: CPU BLAS - dGEMM-NN                     106
viennacl: CPU BLAS - dGEMM-TN                     111
viennacl: CPU BLAS - dGEMM-TT                     110
viennacl: CPU BLAS - dGEMM-NT                     106
viennacl: OpenCL BLAS - sCOPY                     361
viennacl: OpenCL BLAS - sAXPY                     493
viennacl: OpenCL BLAS - sDOT                      365
viennacl: OpenCL BLAS - dCOPY                     599
viennacl: OpenCL BLAS - dAXPY                     714
viennacl: OpenCL BLAS - dDOT                      648
viennacl: OpenCL BLAS - dGEMV-N                   235
viennacl: OpenCL BLAS - dGEMV-T                   370
viennacl: OpenCL BLAS - dGEMM-NN                  588
viennacl: OpenCL BLAS - dGEMM-NT                  590
viennacl: OpenCL BLAS - dGEMM-TN                  585
viennacl: OpenCL BLAS - dGEMM-TT                  587

ArrayFire 3.7 (ms, Fewer Is Better)
(CXX) g++ options: -rdynamic
Test                        RTX 3090 x 2  SE +/-  N
Conjugate Gradient OpenCL   1.501         0.004   3

Caffe 2020-02-13 (Milli-Seconds, Fewer Is Better)
(CXX) g++ options: -fPIC -O3 -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas
Model      Acceleration  Iterations  RTX 3090 x 2  SE +/-  N
AlexNet    NVIDIA CUDA   100         699.57        0.44    3
AlexNet    NVIDIA CUDA   200         1381.71       3.10    3
AlexNet    NVIDIA CUDA   1000        6874.60       8.67    3
GoogleNet  NVIDIA CUDA   100         2800.17       14.32   3
GoogleNet  NVIDIA CUDA   200         5566.12       2.52    3
GoogleNet  NVIDIA CUDA   1000        27981.1       96.41   3
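
As a reading aid, the Caffe totals above can be normalized to time per iteration. The sketch below is an illustration only: it assumes each reported figure is the total time for the stated iteration count, and the names `caffe_ms` and `ms_per_iteration` are hypothetical, not part of the Phoronix Test Suite.

```python
# Caffe totals in milliseconds, copied from the results above.
# Keys are (model, iterations); values are the reported means.
caffe_ms = {
    ("AlexNet", 100): 699.57,
    ("AlexNet", 200): 1381.71,
    ("AlexNet", 1000): 6874.60,
    ("GoogleNet", 100): 2800.17,
    ("GoogleNet", 200): 5566.12,
    ("GoogleNet", 1000): 27981.1,
}


def ms_per_iteration(total_ms, iterations):
    """Average time per iteration, assuming total_ms covers all iterations."""
    return total_ms / iterations


per_iter = {
    key: ms_per_iteration(total, key[1]) for key, total in caffe_ms.items()
}

for (model, iterations), value in per_iter.items():
    print(f"{model:<10} {iterations:>5} iterations: {value:.2f} ms/iteration")
```

If the assumption holds, a roughly constant per-iteration figure within each model would indicate that run time scales close to linearly with the iteration count.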

cl-mem 2017-01-13 (GB/s, More Is Better)
(CC) gcc options: -O2 -flto -lOpenCL
Benchmark  RTX 3090 x 2  SE +/-  N
Copy       360.3         0.07    3
Read       795.6         0.99    3
Write      749.3         0.03    3

clpeak 1.1.2 (More Is Better)
(CXX) g++ options: -O3
OpenCL Test              Unit    RTX 3090 x 2  SE +/-  N
Integer Compute INT      GIOPS   17621.66      0.16    3
Single-Precision Float   GFLOPS  34570.34      119.42  3
Double-Precision Double  GFLOPS  642.61        0.78    3
Global Memory Bandwidth  GBPS    816.35        0.02    3

FAHBench 2.3.2 (Ns Per Day, More Is Better)
Test      RTX 3090 x 2  SE +/-  N
FAHBench  334.54        0.26    3

FinanceBench 2016-07-25 (ms, Fewer Is Better)
(CXX) g++ options: -O3 -march=native -fopenmp
Benchmark             RTX 3090 x 2  SE +/-  N
Black-Scholes OpenCL  6.326         0.002   3

Hashcat 6.2.4 (H/s, More Is Better)
Benchmark                  RTX 3090 x 2   SE +/-         N
MD5                        100686168750   8558826478.48  16
SHA1                       42287666667    53718567.04    3
7-Zip                      2145400        1700.98        3
SHA-512                    6099166667     13877599.86    3
TrueCrypt RIPEMD160 + XTS  1584767        3668.48        3
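
The Hashcat standard errors above are hard to compare at a glance because the means span several orders of magnitude. A minimal sketch, using only the figures reported above, expresses each standard error as a percentage of its mean; the names `hashcat` and `relative_se_percent` are hypothetical helpers for illustration.

```python
# (mean H/s, standard error, number of runs), copied from the results above.
hashcat = {
    "MD5": (100686168750, 8558826478.48, 16),
    "SHA1": (42287666667, 53718567.04, 3),
    "7-Zip": (2145400, 1700.98, 3),
    "SHA-512": (6099166667, 13877599.86, 3),
    "TrueCrypt RIPEMD160 + XTS": (1584767, 3668.48, 3),
}


def relative_se_percent(mean, standard_error):
    """Standard error expressed as a percentage of the mean."""
    return 100.0 * standard_error / mean


rel = {name: relative_se_percent(m, se) for name, (m, se, _n) in hashcat.items()}

# Print the noisiest benchmark first.
for name, pct in sorted(rel.items(), key=lambda item: item[1], reverse=True):
    runs = hashcat[name][2]
    print(f"{name:<26} {pct:6.2f}% of mean (N = {runs})")
```

On these figures MD5 is the clear outlier, with a standard error of several percent of its mean across 16 runs, while the other four benchmarks stay well under one percent.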

LeelaChessZero 0.28 (Nodes Per Second, More Is Better)
(CXX) g++ options: -flto -pthread
Backend  RTX 3090 x 2  SE +/-  N
OpenCL   19173         197.85  4

LuxCoreRender 2.6 (M samples/sec, More Is Better)
Scene                     Acceleration  RTX 3090 x 2  SE +/-  N  MIN    MAX
DLSC                      GPU           27.82         0.16    3  25.71  28.34
Danish Mood               GPU           19.41         0.06    3  8.01   22.62
Orange Juice              GPU           22.61         0.23    5  19.29  32
LuxCore Benchmark         GPU           21.48         0.02    3  8.67   26.83
Rainbow Colors and Prism  GPU           61.65         0.30    3  53.45  71.5

Mixbench 2020-06-23 (More Is Better)
(CXX) g++ options: -lm -lstdc++ -lOpenCL -lrt -O2
Backend      Benchmark         Unit    RTX 3090 x 2  SE +/-  N
OpenCL       Integer           GIOPS   17461.90      6.05    3
NVIDIA CUDA  Integer           GIOPS   16566.76      323.63  15
OpenCL       Double Precision  GFLOPS  512.43        9.27    15
OpenCL       Single Precision  GFLOPS  36828.07      625.58  15
NVIDIA CUDA  Half Precision    GFLOPS  36773.91      670.64  15
NVIDIA CUDA  Double Precision  GFLOPS  493.23        8.10    15
NVIDIA CUDA  Single Precision  GFLOPS  34440.08      586.71  15

NeatBench 5 (FPS, More Is Better)
Acceleration  RTX 3090 x 2
GPU           3090

Rodinia 3.1 (Seconds, Fewer Is Better)
(CXX) g++ options: -m64 -lm -lcuda -lcudart -lcudadevrt -lcudart_static -lrt -lpthread -ldl
Test                    RTX 3090 x 2  SE +/-  N
OpenCL Particle Filter  3.898         0.030   10

ViennaCL 1.7.1, CPU BLAS (More Is Better)
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
Test                Unit      RTX 3090 x 2  SE +/-  N
CPU BLAS - sCOPY    GB/s      825           26.26   15
CPU BLAS - sAXPY    GB/s      764           30.72   15
CPU BLAS - sDOT     GB/s      617           24.85   15
CPU BLAS - dCOPY    GB/s      1753          59.20   15
CPU BLAS - dAXPY    GB/s      1447          50.51   15
CPU BLAS - dDOT     GB/s      1130          58.79   15
CPU BLAS - dGEMV-N  GB/s      42.2          2.09    15
CPU BLAS - dGEMV-T  GB/s      717           28.09   15
CPU BLAS - dGEMM-NN GFLOPs/s  106           0.53    15
CPU BLAS - dGEMM-TN GFLOPs/s  111           0.54    14
CPU BLAS - dGEMM-TT GFLOPs/s  110           0.48    12
CPU BLAS - dGEMM-NT GFLOPs/s  106

ViennaCL 1.7.1, OpenCL BLAS (More Is Better)
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
Test                   Unit      RTX 3090 x 2  SE +/-  N
OpenCL BLAS - sCOPY    GB/s      361           0.88    3
OpenCL BLAS - sAXPY    GB/s      493           0.67    3
OpenCL BLAS - sDOT     GB/s      365           0.88    3
OpenCL BLAS - dCOPY    GB/s      599           1.00    3
OpenCL BLAS - dAXPY    GB/s      714
OpenCL BLAS - dDOT     GB/s      648           0.88    3
OpenCL BLAS - dGEMV-N  GB/s      235           0.33    3
OpenCL BLAS - dGEMV-T  GB/s      370           0.33    3
OpenCL BLAS - dGEMM-NN GFLOPs/s  588           1.53    3
OpenCL BLAS - dGEMM-NT GFLOPs/s  590           1.53    3
OpenCL BLAS - dGEMM-TN GFLOPs/s  585           1.33    3
OpenCL BLAS - dGEMM-TT GFLOPs/s  587

Phoronix Test Suite v10.8.5