rocky-gpu-2023-08-17 AMD Ryzen 9 7950X 16-Core testing with a Gigabyte B650 AORUS ELITE AX (F4b BIOS) and NVIDIA GeForce RTX 4090 24GB on Ubuntu 22.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2308180-NE-ROCKYGPU248&grs .
rocky-gpu-2023-08-17 system details
Processor: AMD Ryzen 9 7950X 16-Core @ 4.50GHz (16 Cores / 32 Threads)
Motherboard: Gigabyte B650 AORUS ELITE AX (F4b BIOS)
Chipset: AMD Device 14d8
Memory: 2 x 32 GB DDR5-6000MT/s F5-6000J3238G32G
Disk: 2000GB Samsung SSD 990 PRO 2TB + 2 x 18000GB TOSHIBA MG09ACA1
Graphics: NVIDIA GeForce RTX 4090 24GB
Audio: NVIDIA Device 22ba
Network: Realtek RTL8125 2.5GbE + MEDIATEK Device 0616
OS: Ubuntu 22.04
Kernel: 5.15.0-79-generic (x86_64)
Display Server: X Server
Display Driver: NVIDIA
OpenCL: OpenCL 3.0 CUDA 12.2.128
Vulkan: 1.3.242
Compiler: GCC 11.4.0 + CUDA 11.7
File-System: ext4
- Transparent Huge Pages: madvise
- Compiler configure options: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
- Scaling Governor: acpi-cpufreq schedutil (Boost: Enabled)
- CPU Microcode: 0xa601203
- BAR1 / Visible vRAM Size: 32768 MiB
- vBIOS Version: 95.02.3c.00.8c
- Python 3.10.12
- CPU vulnerability status: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: always-on RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
rocky-gpu-2023-08-17 result overview
Test                                           Result (unit)
neatbench: GPU                                 4090 FPS
blender: Pabellon Barcelona - NVIDIA OptiX     8.10 Seconds
blender: Barbershop - NVIDIA OptiX             29.34 Seconds
blender: Fishy Cat - NVIDIA OptiX              5.31 Seconds
blender: Classroom - NVIDIA OptiX              7.07 Seconds
blender: BMW27 - NVIDIA OptiX                  3.45 Seconds
ncnn: Vulkan GPU - FastestDet                  4.01 ms
ncnn: Vulkan GPU - vision_transformer          31.04 ms
ncnn: Vulkan GPU - regnety_400m                8.24 ms
ncnn: Vulkan GPU - squeezenet_ssd              6.72 ms
ncnn: Vulkan GPU - yolov4-tiny                 12.30 ms
ncnn: Vulkan GPU - resnet50                    9.50 ms
ncnn: Vulkan GPU - alexnet                     3.99 ms
ncnn: Vulkan GPU - resnet18                    5.01 ms
ncnn: Vulkan GPU - vgg16                       20.32 ms
ncnn: Vulkan GPU - googlenet                   7.72 ms
ncnn: Vulkan GPU - blazeface                   1.36 ms
ncnn: Vulkan GPU - efficientnet-b0             3.81 ms
ncnn: Vulkan GPU - mnasnet                     2.93 ms
ncnn: Vulkan GPU - shufflenet-v2               3.31 ms
ncnn: Vulkan GPU-v3-v3 - mobilenet-v3          3.14 ms
ncnn: Vulkan GPU-v2-v2 - mobilenet-v2          3.11 ms
ncnn: Vulkan GPU - mobilenet                   7.92 ms
viennacl: OpenCL BLAS - dGEMM-TT               1340 GFLOPs/s
viennacl: OpenCL BLAS - dGEMM-TN               1290 GFLOPs/s
viennacl: OpenCL BLAS - dGEMM-NT               1270 GFLOPs/s
viennacl: OpenCL BLAS - dGEMM-NN               1150 GFLOPs/s
viennacl: OpenCL BLAS - dGEMV-T                442 GB/s
viennacl: OpenCL BLAS - dGEMV-N                219 GB/s
viennacl: OpenCL BLAS - dDOT                   681 GB/s
viennacl: OpenCL BLAS - dAXPY                  777 GB/s
viennacl: OpenCL BLAS - dCOPY                  662 GB/s
viennacl: OpenCL BLAS - sDOT                   447 GB/s
viennacl: OpenCL BLAS - sAXPY                  567 GB/s
viennacl: OpenCL BLAS - sCOPY                  435 GB/s
viennacl: CPU BLAS - dGEMM-TT                  111 GFLOPs/s
viennacl: CPU BLAS - dGEMM-TN                  116 GFLOPs/s
viennacl: CPU BLAS - dGEMM-NT                  106 GFLOPs/s
viennacl: CPU BLAS - dGEMM-NN                  110 GFLOPs/s
viennacl: CPU BLAS - dGEMV-T                   170 GB/s
viennacl: CPU BLAS - dGEMV-N                   136 GB/s
viennacl: CPU BLAS - dDOT                      121 GB/s
viennacl: CPU BLAS - dAXPY                     120 GB/s
viennacl: CPU BLAS - dCOPY                     79.4 GB/s
viennacl: CPU BLAS - sDOT                      408 GB/s
viennacl: CPU BLAS - sAXPY                     417 GB/s
viennacl: CPU BLAS - sCOPY                     278 GB/s
financebench: Black-Scholes OpenCL             2.895 ms
luxcorerender: Rainbow Colors and Prism - GPU  45.06 M samples/sec
luxcorerender: LuxCore Benchmark - GPU         21.26 M samples/sec
luxcorerender: Orange Juice - GPU              20.40 M samples/sec
luxcorerender: Danish Mood - GPU               20.49 M samples/sec
luxcorerender: DLSC - GPU                      26.02 M samples/sec
arrayfire: Conjugate Gradient OpenCL           0.8745 ms
rodinia: OpenCL Particle Filter                2.151 Seconds
lczero: OpenCL                                 45692 Nodes Per Second
clpeak: Global Memory Bandwidth                873.15 GBPS
clpeak: Double-Precision Double                1388.89 GFLOPS
clpeak: Single-Precision Float                 79753.35 GFLOPS
clpeak: Integer Compute INT                    40829.93 GIOPS
fahbench:                                      433.1189 Ns Per Day
cl-mem: Write                                  806.9 GB/s
cl-mem: Read                                   889.2 GB/s
cl-mem: Copy                                   409.1 GB/s
shoc: OpenCL - Texture Read Bandwidth          2975.91 GB/s
shoc: OpenCL - Bus Speed Readback              26.3534 GB/s
shoc: OpenCL - Bus Speed Download              26.8071 GB/s
shoc: OpenCL - Max SP Flops                    87267.3 GFLOPS
shoc: OpenCL - GEMM SGEMM_N                    26870.9 GFLOPS
shoc: OpenCL - Reduction                       972.750 GB/s
shoc: OpenCL - MD5 Hash                        93.6509 GHash/s
shoc: OpenCL - FFT SP                          2789.43 GFLOPS
shoc: OpenCL - Triad                           26.1610 GB/s
shoc: OpenCL - S3D                             643.436 GFLOPS
mixbench: NVIDIA CUDA - Single Precision       71978.75 GFLOPS
mixbench: NVIDIA CUDA - Double Precision       1085.41 GFLOPS
mixbench: NVIDIA CUDA - Half Precision         73760.20 GFLOPS
mixbench: OpenCL - Single Precision            76445.25 GFLOPS
mixbench: OpenCL - Double Precision            1098.77 GFLOPS
mixbench: NVIDIA CUDA - Integer                34761.32 GIOPS
mixbench: OpenCL - Integer                     40183.58 GIOPS
hashcat: MD5                                   (no value in export)
NeatBench 5, Acceleration: GPU (FPS, More Is Better): 4090, SE +/- 0.00, N = 3
Blender 3.6, Blend File: Pabellon Barcelona - Compute: NVIDIA OptiX (Seconds, Fewer Is Better): 8.10, SE +/- 0.02, N = 3
Blender 3.6, Blend File: Barbershop - Compute: NVIDIA OptiX (Seconds, Fewer Is Better): 29.34, SE +/- 0.10, N = 3
Blender 3.6, Blend File: Fishy Cat - Compute: NVIDIA OptiX (Seconds, Fewer Is Better): 5.31, SE +/- 0.07, N = 12
Blender 3.6, Blend File: Classroom - Compute: NVIDIA OptiX (Seconds, Fewer Is Better): 7.07, SE +/- 0.01, N = 3
Blender 3.6, Blend File: BMW27 - Compute: NVIDIA OptiX (Seconds, Fewer Is Better): 3.45, SE +/- 0.05, N = 15
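Each result in this file carries an "SE +/- x, N = y" annotation. The export does not say how the Phoronix Test Suite derives it, so the following is only a minimal sketch that assumes the conventional definition of the standard error of the mean (sample standard deviation divided by the square root of N). The run times in it are hypothetical and are not taken from this result file.

```python
import math
import statistics


def standard_error(samples):
    """Standard error of the mean: sample stddev / sqrt(N) (assumed definition)."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    return statistics.stdev(samples) / math.sqrt(len(samples))


# Hypothetical per-run render times in seconds (illustrative only).
runs = [8.07, 8.10, 8.13]

mean = statistics.mean(runs)
se = standard_error(runs)
print(f"{mean:.2f} s, SE +/- {se:.2f}, N = {len(runs)}")
```

Under this definition, a test that needs more runs to settle (for example N = 12 or N = 15 above) is one whose individual runs vary more relative to the mean.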
NCNN 20230517, Target: Vulkan GPU - Model: FastestDet (ms, Fewer Is Better): 4.01, SE +/- 0.01, N = 3, MIN: 3.97 / MAX: 4.12
NCNN 20230517, Target: Vulkan GPU - Model: vision_transformer (ms, Fewer Is Better): 31.04, SE +/- 0.03, N = 3, MIN: 30.84 / MAX: 33.8
NCNN 20230517, Target: Vulkan GPU - Model: regnety_400m (ms, Fewer Is Better): 8.24, SE +/- 0.03, N = 3, MIN: 8.16 / MAX: 8.51
NCNN 20230517, Target: Vulkan GPU - Model: squeezenet_ssd (ms, Fewer Is Better): 6.72, SE +/- 0.01, N = 3, MIN: 6.66 / MAX: 7.68
NCNN 20230517, Target: Vulkan GPU - Model: yolov4-tiny (ms, Fewer Is Better): 12.30, SE +/- 0.05, N = 3, MIN: 11.99 / MAX: 12.88
NCNN 20230517, Target: Vulkan GPU - Model: resnet50 (ms, Fewer Is Better): 9.50, SE +/- 0.03, N = 3, MIN: 9.39 / MAX: 9.74
NCNN 20230517, Target: Vulkan GPU - Model: alexnet (ms, Fewer Is Better): 3.99, SE +/- 0.01, N = 3, MIN: 3.94 / MAX: 4.16
NCNN 20230517, Target: Vulkan GPU - Model: resnet18 (ms, Fewer Is Better): 5.01, SE +/- 0.01, N = 3, MIN: 4.97 / MAX: 5.13
NCNN 20230517, Target: Vulkan GPU - Model: vgg16 (ms, Fewer Is Better): 20.32, SE +/- 0.03, N = 3, MIN: 20.16 / MAX: 21.29
NCNN 20230517, Target: Vulkan GPU - Model: googlenet (ms, Fewer Is Better): 7.72, SE +/- 0.03, N = 3, MIN: 7.62 / MAX: 9.01
NCNN 20230517, Target: Vulkan GPU - Model: blazeface (ms, Fewer Is Better): 1.36, SE +/- 0.00, N = 3, MIN: 1.33 / MAX: 1.46
NCNN 20230517, Target: Vulkan GPU - Model: efficientnet-b0 (ms, Fewer Is Better): 3.81, SE +/- 0.01, N = 3, MIN: 3.78 / MAX: 3.92
NCNN 20230517, Target: Vulkan GPU - Model: mnasnet (ms, Fewer Is Better): 2.93, SE +/- 0.01, N = 3, MIN: 2.89 / MAX: 3.12
NCNN 20230517, Target: Vulkan GPU - Model: shufflenet-v2 (ms, Fewer Is Better): 3.31, SE +/- 0.01, N = 3, MIN: 3.27 / MAX: 3.45
NCNN 20230517, Target: Vulkan GPU-v3-v3 - Model: mobilenet-v3 (ms, Fewer Is Better): 3.14, SE +/- 0.01, N = 3, MIN: 3.09 / MAX: 3.22
NCNN 20230517, Target: Vulkan GPU-v2-v2 - Model: mobilenet-v2 (ms, Fewer Is Better): 3.11, SE +/- 0.01, N = 3, MIN: 3.05 / MAX: 3.23
NCNN 20230517, Target: Vulkan GPU - Model: mobilenet (ms, Fewer Is Better): 7.92, SE +/- 0.03, N = 3, MIN: 7.86 / MAX: 8.27
NCNN build: (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
ViennaCL 1.7.1, Test: OpenCL BLAS - dGEMM-TT (GFLOPs/s, More Is Better): 1340, SE +/- 0.00, N = 3
ViennaCL 1.7.1, Test: OpenCL BLAS - dGEMM-TN (GFLOPs/s, More Is Better): 1290, SE +/- 0.00, N = 3
ViennaCL 1.7.1, Test: OpenCL BLAS - dGEMM-NT (GFLOPs/s, More Is Better): 1270, SE +/- 0.00, N = 3
ViennaCL 1.7.1, Test: OpenCL BLAS - dGEMM-NN (GFLOPs/s, More Is Better): 1150, SE +/- 0.00, N = 3
ViennaCL 1.7.1, Test: OpenCL BLAS - dGEMV-T (GB/s, More Is Better): 442, SE +/- 0.00, N = 3
ViennaCL 1.7.1, Test: OpenCL BLAS - dGEMV-N (GB/s, More Is Better): 219, SE +/- 0.33, N = 3
ViennaCL 1.7.1, Test: OpenCL BLAS - dDOT (GB/s, More Is Better): 681, SE +/- 0.33, N = 3
ViennaCL 1.7.1, Test: OpenCL BLAS - dAXPY (GB/s, More Is Better): 777, SE +/- 0.00, N = 3
ViennaCL 1.7.1, Test: OpenCL BLAS - dCOPY (GB/s, More Is Better): 662, SE +/- 0.33, N = 3
ViennaCL 1.7.1, Test: OpenCL BLAS - sDOT (GB/s, More Is Better): 447, SE +/- 0.00, N = 3
ViennaCL 1.7.1, Test: OpenCL BLAS - sAXPY (GB/s, More Is Better): 567, SE +/- 0.00, N = 3
ViennaCL 1.7.1, Test: OpenCL BLAS - sCOPY (GB/s, More Is Better): 435, SE +/- 0.67, N = 3
ViennaCL build: (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1, Test: CPU BLAS - dGEMM-TT (GFLOPs/s, More Is Better): 111, SE +/- 0.00, N = 2
ViennaCL 1.7.1, Test: CPU BLAS - dGEMM-TN (GFLOPs/s, More Is Better): 116, SE +/- 0.33, N = 3
ViennaCL 1.7.1, Test: CPU BLAS - dGEMM-NT (GFLOPs/s, More Is Better): 106, SE +/- 0.33, N = 3
ViennaCL 1.7.1, Test: CPU BLAS - dGEMM-NN (GFLOPs/s, More Is Better): 110, SE +/- 0.33, N = 3
ViennaCL 1.7.1, Test: CPU BLAS - dGEMV-T (GB/s, More Is Better): 170, SE +/- 0.33, N = 3
ViennaCL 1.7.1, Test: CPU BLAS - dGEMV-N (GB/s, More Is Better): 136, SE +/- 0.58, N = 3
ViennaCL 1.7.1, Test: CPU BLAS - dDOT (GB/s, More Is Better): 121, SE +/- 0.33, N = 3
ViennaCL 1.7.1, Test: CPU BLAS - dAXPY (GB/s, More Is Better): 120, SE +/- 0.33, N = 3
ViennaCL 1.7.1, Test: CPU BLAS - dCOPY (GB/s, More Is Better): 79.4, SE +/- 0.09, N = 3
ViennaCL 1.7.1, Test: CPU BLAS - sDOT (GB/s, More Is Better): 408, SE +/- 0.88, N = 3
ViennaCL 1.7.1, Test: CPU BLAS - sAXPY (GB/s, More Is Better): 417, SE +/- 1.33, N = 3
ViennaCL 1.7.1, Test: CPU BLAS - sCOPY (GB/s, More Is Better): 278, SE +/- 0.88, N = 3
ViennaCL build: (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
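The ViennaCL OpenCL BLAS and CPU BLAS runs cover the same twelve kernels, so the two sets can be compared directly. The sketch below computes the OpenCL-to-CPU ratio per kernel from the values reported in this file; the helper name `speedups` is illustrative only, and the ratios are simple quotients of the reported means that ignore the SE figures.

```python
# Reported ViennaCL 1.7.1 results from this file.
# dGEMM-* values are in GFLOPs/s; all other kernels are in GB/s.
opencl_blas = {
    "dGEMM-TT": 1340, "dGEMM-TN": 1290, "dGEMM-NT": 1270, "dGEMM-NN": 1150,
    "dGEMV-T": 442, "dGEMV-N": 219, "dDOT": 681, "dAXPY": 777,
    "dCOPY": 662, "sDOT": 447, "sAXPY": 567, "sCOPY": 435,
}
cpu_blas = {
    "dGEMM-TT": 111, "dGEMM-TN": 116, "dGEMM-NT": 106, "dGEMM-NN": 110,
    "dGEMV-T": 170, "dGEMV-N": 136, "dDOT": 121, "dAXPY": 120,
    "dCOPY": 79.4, "sDOT": 408, "sAXPY": 417, "sCOPY": 278,
}


def speedups(gpu, cpu):
    """Return gpu[k] / cpu[k] for every kernel present in both result sets."""
    return {name: gpu[name] / cpu[name] for name in gpu if name in cpu}


for name, ratio in speedups(opencl_blas, cpu_blas).items():
    print(f"{name:9s} {ratio:6.2f}x")
```

Both sets use "More Is Better" units per kernel, so a ratio above 1 means the OpenCL run on the RTX 4090 reported a higher figure than the CPU run on the Ryzen 9 7950X for that kernel.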
FinanceBench 2016-07-25, Benchmark: Black-Scholes OpenCL (ms, Fewer Is Better): 2.895, SE +/- 0.002, N = 3
FinanceBench build: (CXX) g++ options: -O3 -march=native -fopenmp
LuxCoreRender 2.6, Scene: Rainbow Colors and Prism - Acceleration: GPU (M samples/sec, More Is Better): 45.06, SE +/- 0.04, N = 3, MIN: 38.27 / MAX: 47.55
LuxCoreRender 2.6, Scene: LuxCore Benchmark - Acceleration: GPU (M samples/sec, More Is Better): 21.26, SE +/- 0.04, N = 3, MIN: 9.33 / MAX: 25.54
LuxCoreRender 2.6, Scene: Orange Juice - Acceleration: GPU (M samples/sec, More Is Better): 20.40, SE +/- 0.04, N = 3, MIN: 18.32 / MAX: 28.37
LuxCoreRender 2.6, Scene: Danish Mood - Acceleration: GPU (M samples/sec, More Is Better): 20.49, SE +/- 0.25, N = 4, MIN: 7.86 / MAX: 23.88
LuxCoreRender 2.6, Scene: DLSC - Acceleration: GPU (M samples/sec, More Is Better): 26.02, SE +/- 0.02, N = 3, MIN: 24.8 / MAX: 26.26
ArrayFire 3.7, Test: Conjugate Gradient OpenCL (ms, Fewer Is Better): 0.8745, SE +/- 0.0024, N = 3
ArrayFire build: (CXX) g++ options: -rdynamic
Rodinia 3.1, Test: OpenCL Particle Filter (Seconds, Fewer Is Better): 2.151
Rodinia build: (CXX) g++ options: -m64 -lm -lcuda -lcudart -lcudadevrt -lcudart_static -lrt -lpthread -ldl
LeelaChessZero 0.28, Backend: OpenCL (Nodes Per Second, More Is Better): 45692, SE +/- 306.89, N = 3
LeelaChessZero build: (CXX) g++ options: -flto -pthread
clpeak 1.1.2, OpenCL Test: Global Memory Bandwidth (GBPS, More Is Better): 873.15, SE +/- 0.04, N = 3
clpeak 1.1.2, OpenCL Test: Double-Precision Double (GFLOPS, More Is Better): 1388.89, SE +/- 3.82, N = 3
clpeak 1.1.2, OpenCL Test: Single-Precision Float (GFLOPS, More Is Better): 79753.35, SE +/- 0.00, N = 3
clpeak 1.1.2, OpenCL Test: Integer Compute INT (GIOPS, More Is Better): 40829.93, SE +/- 9.32, N = 3
clpeak build: (CXX) g++ options: -O3
FAHBench 2.3.2 (Ns Per Day, More Is Better): 433.12, SE +/- 0.29, N = 3
cl-mem 2017-01-13, Benchmark: Write (GB/s, More Is Better): 806.9, SE +/- 0.64, N = 3
cl-mem 2017-01-13, Benchmark: Read (GB/s, More Is Better): 889.2, SE +/- 0.03, N = 3
cl-mem 2017-01-13, Benchmark: Copy (GB/s, More Is Better): 409.1, SE +/- 0.03, N = 3
cl-mem build: (CC) gcc options: -O2 -flto -lOpenCL
SHOC Scalable HeterOgeneous Computing 2020-04-17, Target: OpenCL - Benchmark: Texture Read Bandwidth (GB/s, More Is Better): 2975.91, SE +/- 0.63, N = 3
SHOC Scalable HeterOgeneous Computing 2020-04-17, Target: OpenCL - Benchmark: Bus Speed Readback (GB/s, More Is Better): 26.35, SE +/- 0.00, N = 3
SHOC Scalable HeterOgeneous Computing 2020-04-17, Target: OpenCL - Benchmark: Bus Speed Download (GB/s, More Is Better): 26.81, SE +/- 0.00, N = 3
SHOC Scalable HeterOgeneous Computing 2020-04-17, Target: OpenCL - Benchmark: Max SP Flops (GFLOPS, More Is Better): 87267.3, SE +/- 577.21, N = 3
SHOC Scalable HeterOgeneous Computing 2020-04-17, Target: OpenCL - Benchmark: GEMM SGEMM_N (GFLOPS, More Is Better): 26870.9, SE +/- 138.65, N = 3
SHOC Scalable HeterOgeneous Computing 2020-04-17, Target: OpenCL - Benchmark: Reduction (GB/s, More Is Better): 972.75, SE +/- 8.94, N = 15
SHOC Scalable HeterOgeneous Computing 2020-04-17, Target: OpenCL - Benchmark: MD5 Hash (GHash/s, More Is Better): 93.65, SE +/- 0.87, N = 15
SHOC Scalable HeterOgeneous Computing 2020-04-17, Target: OpenCL - Benchmark: FFT SP (GFLOPS, More Is Better): 2789.43, SE +/- 2.06, N = 3
SHOC Scalable HeterOgeneous Computing 2020-04-17, Target: OpenCL - Benchmark: Triad (GB/s, More Is Better): 26.16, SE +/- 0.01, N = 3
SHOC Scalable HeterOgeneous Computing 2020-04-17, Target: OpenCL - Benchmark: S3D (GFLOPS, More Is Better): 643.44, SE +/- 0.18, N = 3
SHOC build: (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi
Mixbench 2020-06-23, Backend: NVIDIA CUDA - Benchmark: Single Precision (GFLOPS, More Is Better): 71978.75, SE +/- 89.94, N = 3
Mixbench 2020-06-23, Backend: NVIDIA CUDA - Benchmark: Double Precision (GFLOPS, More Is Better): 1085.41, SE +/- 0.02, N = 3
Mixbench 2020-06-23, Backend: NVIDIA CUDA - Benchmark: Half Precision (GFLOPS, More Is Better): 73760.20, SE +/- 81.83, N = 3
Mixbench 2020-06-23, Backend: OpenCL - Benchmark: Single Precision (GFLOPS, More Is Better): 76445.25, SE +/- 36.23, N = 3
Mixbench 2020-06-23, Backend: OpenCL - Benchmark: Double Precision (GFLOPS, More Is Better): 1098.77, SE +/- 12.91, N = 4
Mixbench 2020-06-23, Backend: NVIDIA CUDA - Benchmark: Integer (GIOPS, More Is Better): 34761.32, SE +/- 21.61, N = 3
Mixbench 2020-06-23, Backend: OpenCL - Benchmark: Integer (GIOPS, More Is Better): 40183.58, SE +/- 84.62, N = 3
Mixbench build: (CXX) g++ options: -lm -lstdc++ -lOpenCL -lrt -O2
Phoronix Test Suite v10.8.5