rocky-gpu-2023-08-17
AMD Ryzen 9 7950X 16-Core testing with a Gigabyte B650 AORUS ELITE AX (F4b BIOS) and NVIDIA GeForce RTX 4090 24GB on Ubuntu 22.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2308180-NE-ROCKYGPU248&grt .
rocky-gpu-2023-08-17
Processor: AMD Ryzen 9 7950X 16-Core @ 4.50GHz (16 Cores / 32 Threads)
Motherboard: Gigabyte B650 AORUS ELITE AX (F4b BIOS)
Chipset: AMD Device 14d8
Memory: 2 x 32 GB DDR5-6000MT/s F5-6000J3238G32G
Disk: 2000GB Samsung SSD 990 PRO 2TB + 2 x 18000GB TOSHIBA MG09ACA1
Graphics: NVIDIA GeForce RTX 4090 24GB
Audio: NVIDIA Device 22ba
Network: Realtek RTL8125 2.5GbE + MEDIATEK Device 0616
OS: Ubuntu 22.04
Kernel: 5.15.0-79-generic (x86_64)
Display Server: X Server
Display Driver: NVIDIA
OpenCL: OpenCL 3.0 CUDA 12.2.128
Vulkan: 1.3.242
Compiler: GCC 11.4.0 + CUDA 11.7
File-System: ext4
- Transparent Huge Pages: madvise
- --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
- Scaling Governor: acpi-cpufreq schedutil (Boost: Enabled)
- CPU Microcode: 0xa601203
- BAR1 / Visible vRAM Size: 32768 MiB
- vBIOS Version: 95.02.3c.00.8c
- Python 3.10.12
- itlb_multihit: Not affected
  + l1tf: Not affected
  + mds: Not affected
  + meltdown: Not affected
  + mmio_stale_data: Not affected
  + retbleed: Not affected
  + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp
  + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
  + spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: always-on RSB filling PBRSB-eIBRS: Not affected
  + srbds: Not affected
  + tsx_async_abort: Not affected
Test                                           rocky-gpu-2023-08-17
arrayfire: Conjugate Gradient OpenCL           0.8745
blender: BMW27 - NVIDIA OptiX                  3.45
blender: Classroom - NVIDIA OptiX              7.07
blender: Fishy Cat - NVIDIA OptiX              5.31
blender: Barbershop - NVIDIA OptiX             29.34
blender: Pabellon Barcelona - NVIDIA OptiX     8.10
cl-mem: Copy                                   409.1
cl-mem: Read                                   889.2
cl-mem: Write                                  806.9
clpeak: Integer Compute INT                    40829.93
clpeak: Single-Precision Float                 79753.35
clpeak: Double-Precision Double                1388.89
clpeak: Global Memory Bandwidth                873.15
fahbench:                                      433.1189
financebench: Black-Scholes OpenCL             2.895
lczero: OpenCL                                 45692
luxcorerender: DLSC - GPU                      26.02
luxcorerender: Danish Mood - GPU               20.49
luxcorerender: Orange Juice - GPU              20.40
luxcorerender: LuxCore Benchmark - GPU         21.26
luxcorerender: Rainbow Colors and Prism - GPU  45.06
mixbench: OpenCL - Integer                     40183.58
mixbench: NVIDIA CUDA - Integer                34761.32
mixbench: OpenCL - Double Precision            1098.77
mixbench: OpenCL - Single Precision            76445.25
mixbench: NVIDIA CUDA - Half Precision         73760.20
mixbench: NVIDIA CUDA - Double Precision       1085.41
mixbench: NVIDIA CUDA - Single Precision       71978.75
ncnn: Vulkan GPU - mobilenet                   7.92
ncnn: Vulkan GPU-v2-v2 - mobilenet-v2          3.11
ncnn: Vulkan GPU-v3-v3 - mobilenet-v3          3.14
ncnn: Vulkan GPU - shufflenet-v2               3.31
ncnn: Vulkan GPU - mnasnet                     2.93
ncnn: Vulkan GPU - efficientnet-b0             3.81
ncnn: Vulkan GPU - blazeface                   1.36
ncnn: Vulkan GPU - googlenet                   7.72
ncnn: Vulkan GPU - vgg16                       20.32
ncnn: Vulkan GPU - resnet18                    5.01
ncnn: Vulkan GPU - alexnet                     3.99
ncnn: Vulkan GPU - resnet50                    9.50
ncnn: Vulkan GPU - yolov4-tiny                 12.30
ncnn: Vulkan GPU - squeezenet_ssd              6.72
ncnn: Vulkan GPU - regnety_400m                8.24
ncnn: Vulkan GPU - vision_transformer          31.04
ncnn: Vulkan GPU - FastestDet                  4.01
neatbench: GPU                                 4090
rodinia: OpenCL Particle Filter                2.151
shoc: OpenCL - S3D                             643.436
shoc: OpenCL - Triad                           26.1610
shoc: OpenCL - FFT SP                          2789.43
shoc: OpenCL - MD5 Hash                        93.6509
shoc: OpenCL - Reduction                       972.750
shoc: OpenCL - GEMM SGEMM_N                    26870.9
shoc: OpenCL - Max SP Flops                    87267.3
shoc: OpenCL - Bus Speed Download              26.8071
shoc: OpenCL - Bus Speed Readback              26.3534
shoc: OpenCL - Texture Read Bandwidth          2975.91
viennacl: CPU BLAS - sCOPY                     278
viennacl: CPU BLAS - sAXPY                     417
viennacl: CPU BLAS - sDOT                      408
viennacl: CPU BLAS - dCOPY                     79.4
viennacl: CPU BLAS - dAXPY                     120
viennacl: CPU BLAS - dDOT                      121
viennacl: CPU BLAS - dGEMV-N                   136
viennacl: CPU BLAS - dGEMV-T                   170
viennacl: CPU BLAS - dGEMM-NN                  110
viennacl: CPU BLAS - dGEMM-NT                  106
viennacl: CPU BLAS - dGEMM-TN                  116
viennacl: CPU BLAS - dGEMM-TT                  111
viennacl: OpenCL BLAS - sCOPY                  435
viennacl: OpenCL BLAS - sAXPY                  567
viennacl: OpenCL BLAS - sDOT                   447
viennacl: OpenCL BLAS - dCOPY                  662
viennacl: OpenCL BLAS - dAXPY                  777
viennacl: OpenCL BLAS - dDOT                   681
viennacl: OpenCL BLAS - dGEMV-N                219
viennacl: OpenCL BLAS - dGEMV-T                442
viennacl: OpenCL BLAS - dGEMM-NN               1150
viennacl: OpenCL BLAS - dGEMM-NT               1270
viennacl: OpenCL BLAS - dGEMM-TN               1290
viennacl: OpenCL BLAS - dGEMM-TT               1340
ArrayFire 3.7 - Test: Conjugate Gradient OpenCL (ms, Fewer Is Better)
rocky-gpu-2023-08-17: 0.8745 (SE +/- 0.0024, N = 3)
1. (CXX) g++ options: -rdynamic
Blender 3.6 - Compute: NVIDIA OptiX (Seconds, Fewer Is Better), rocky-gpu-2023-08-17
Blend File          Result  SE +/-  N
BMW27               3.45    0.05    15
Classroom           7.07    0.01    3
Fishy Cat           5.31    0.07    12
Barbershop          29.34   0.10    3
Pabellon Barcelona  8.10    0.02    3
cl-mem 2017-01-13 (GB/s, More Is Better), rocky-gpu-2023-08-17
Benchmark  Result  SE +/-  N
Copy       409.1   0.03    3
Read       889.2   0.03    3
Write      806.9   0.64    3
1. (CC) gcc options: -O2 -flto -lOpenCL
clpeak 1.1.2 (More Is Better), rocky-gpu-2023-08-17
OpenCL Test              Unit    Result    SE +/-  N
Integer Compute INT      GIOPS   40829.93  9.32    3
Single-Precision Float   GFLOPS  79753.35  0.00    3
Double-Precision Double  GFLOPS  1388.89   3.82    3
Global Memory Bandwidth  GBPS    873.15    0.04    3
1. (CXX) g++ options: -O3
FAHBench 2.3.2 (Ns Per Day, More Is Better)
rocky-gpu-2023-08-17: 433.12 (SE +/- 0.29, N = 3)
FinanceBench 2016-07-25 - Benchmark: Black-Scholes OpenCL (ms, Fewer Is Better)
rocky-gpu-2023-08-17: 2.895 (SE +/- 0.002, N = 3)
1. (CXX) g++ options: -O3 -march=native -fopenmp
LeelaChessZero 0.28 - Backend: OpenCL (Nodes Per Second, More Is Better)
rocky-gpu-2023-08-17: 45692 (SE +/- 306.89, N = 3)
1. (CXX) g++ options: -flto -pthread
LuxCoreRender 2.6 - Acceleration: GPU (M samples/sec, More Is Better), rocky-gpu-2023-08-17
Scene                     Result  SE +/-  N  MIN    MAX
DLSC                      26.02   0.02    3  24.8   26.26
Danish Mood               20.49   0.25    4  7.86   23.88
Orange Juice              20.40   0.04    3  18.32  28.37
LuxCore Benchmark         21.26   0.04    3  9.33   25.54
Rainbow Colors and Prism  45.06   0.04    3  38.27  47.55
Mixbench 2020-06-23 (More Is Better), rocky-gpu-2023-08-17
Backend      Benchmark         Unit    Result    SE +/-  N
OpenCL       Integer           GIOPS   40183.58  84.62   3
NVIDIA CUDA  Integer           GIOPS   34761.32  21.61   3
OpenCL       Double Precision  GFLOPS  1098.77   12.91   4
OpenCL       Single Precision  GFLOPS  76445.25  36.23   3
NVIDIA CUDA  Half Precision    GFLOPS  73760.20  81.83   3
NVIDIA CUDA  Double Precision  GFLOPS  1085.41   0.02    3
NVIDIA CUDA  Single Precision  GFLOPS  71978.75  89.94   3
1. (CXX) g++ options: -lm -lstdc++ -lOpenCL -lrt -O2
NCNN 20230517 (ms, Fewer Is Better), rocky-gpu-2023-08-17
Target            Model               Result  SE +/-  N  MIN    MAX
Vulkan GPU        mobilenet           7.92    0.03    3  7.86   8.27
Vulkan GPU-v2-v2  mobilenet-v2        3.11    0.01    3  3.05   3.23
Vulkan GPU-v3-v3  mobilenet-v3        3.14    0.01    3  3.09   3.22
Vulkan GPU        shufflenet-v2       3.31    0.01    3  3.27   3.45
Vulkan GPU        mnasnet             2.93    0.01    3  2.89   3.12
Vulkan GPU        efficientnet-b0     3.81    0.01    3  3.78   3.92
Vulkan GPU        blazeface           1.36    0.00    3  1.33   1.46
Vulkan GPU        googlenet           7.72    0.03    3  7.62   9.01
Vulkan GPU        vgg16               20.32   0.03    3  20.16  21.29
Vulkan GPU        resnet18            5.01    0.01    3  4.97   5.13
Vulkan GPU        alexnet             3.99    0.01    3  3.94   4.16
Vulkan GPU        resnet50            9.50    0.03    3  9.39   9.74
Vulkan GPU        yolov4-tiny         12.30   0.05    3  11.99  12.88
Vulkan GPU        squeezenet_ssd      6.72    0.01    3  6.66   7.68
Vulkan GPU        regnety_400m        8.24    0.03    3  8.16   8.51
Vulkan GPU        vision_transformer  31.04   0.03    3  30.84  33.8
Vulkan GPU        FastestDet          4.01    0.01    3  3.97   4.12
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NeatBench 5 - Acceleration: GPU (FPS, More Is Better)
rocky-gpu-2023-08-17: 4090 (SE +/- 0.00, N = 3)
Rodinia 3.1 - Test: OpenCL Particle Filter (Seconds, Fewer Is Better)
rocky-gpu-2023-08-17: 2.151
1. (CXX) g++ options: -m64 -lm -lcuda -lcudart -lcudadevrt -lcudart_static -lrt -lpthread -ldl
SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL (More Is Better), rocky-gpu-2023-08-17
Benchmark               Unit     Result   SE +/-  N
S3D                     GFLOPS   643.44   0.18    3
Triad                   GB/s     26.16    0.01    3
FFT SP                  GFLOPS   2789.43  2.06    3
MD5 Hash                GHash/s  93.65    0.87    15
Reduction               GB/s     972.75   8.94    15
GEMM SGEMM_N            GFLOPS   26870.9  138.65  3
Max SP Flops            GFLOPS   87267.3  577.21  3
Bus Speed Download      GB/s     26.81    0.00    3
Bus Speed Readback      GB/s     26.35    0.00    3
Texture Read Bandwidth  GB/s     2975.91  0.63    3
1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi
ViennaCL 1.7.1 (More Is Better), rocky-gpu-2023-08-17
Test                    Unit      Result  SE +/-  N
CPU BLAS - sCOPY        GB/s      278     0.88    3
CPU BLAS - sAXPY        GB/s      417     1.33    3
CPU BLAS - sDOT         GB/s      408     0.88    3
CPU BLAS - dCOPY        GB/s      79.4    0.09    3
CPU BLAS - dAXPY        GB/s      120     0.33    3
CPU BLAS - dDOT         GB/s      121     0.33    3
CPU BLAS - dGEMV-N      GB/s      136     0.58    3
CPU BLAS - dGEMV-T      GB/s      170     0.33    3
CPU BLAS - dGEMM-NN     GFLOPs/s  110     0.33    3
CPU BLAS - dGEMM-NT     GFLOPs/s  106     0.33    3
CPU BLAS - dGEMM-TN     GFLOPs/s  116     0.33    3
CPU BLAS - dGEMM-TT     GFLOPs/s  111     0.00    2
OpenCL BLAS - sCOPY     GB/s      435     0.67    3
OpenCL BLAS - sAXPY     GB/s      567     0.00    3
OpenCL BLAS - sDOT      GB/s      447     0.00    3
OpenCL BLAS - dCOPY     GB/s      662     0.33    3
OpenCL BLAS - dAXPY     GB/s      777     0.00    3
OpenCL BLAS - dDOT      GB/s      681     0.33    3
OpenCL BLAS - dGEMV-N   GB/s      219     0.33    3
OpenCL BLAS - dGEMV-T   GB/s      442     0.00    3
OpenCL BLAS - dGEMM-NN  GFLOPs/s  1150    0.00    3
OpenCL BLAS - dGEMM-NT  GFLOPs/s  1270    0.00    3
OpenCL BLAS - dGEMM-TN  GFLOPs/s  1290    0.00    3
OpenCL BLAS - dGEMM-TT  GFLOPs/s  1340    0.00    3
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
Phoronix Test Suite v10.8.5