rocky-gpu-2023-08-17

AMD Ryzen 9 7950X 16-Core testing with a Gigabyte B650 AORUS ELITE AX (F4b BIOS) and NVIDIA GeForce RTX 4090 24GB on Ubuntu 22.04 via the Phoronix Test Suite.

HTML result view exported from: https://openbenchmarking.org/result/2308180-NE-ROCKYGPU248.

rocky-gpu-2023-08-17ProcessorMotherboardChipsetMemoryDiskGraphicsAudioNetworkOSKernelDisplay ServerDisplay DriverOpenCLVulkanCompilerFile-Systemrocky-gpu-2023-08-17AMD Ryzen 9 7950X 16-Core @ 4.50GHz (16 Cores / 32 Threads)Gigabyte B650 AORUS ELITE AX (F4b BIOS)AMD Device 14d82 x 32 GB DDR5-6000MT/s F5-6000J3238G32G2000GB Samsung SSD 990 PRO 2TB + 2 x 18000GB TOSHIBA MG09ACA1NVIDIA GeForce RTX 4090 24GBNVIDIA Device 22baRealtek RTL8125 2.5GbE + MEDIATEK Device 0616Ubuntu 22.045.15.0-79-generic (x86_64)X ServerNVIDIAOpenCL 3.0 CUDA 12.2.1281.3.242GCC 11.4.0 + CUDA 11.7ext4OpenBenchmarking.org- Transparent Huge Pages: madvise- --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v - Scaling Governor: acpi-cpufreq schedutil (Boost: Enabled) - CPU Microcode: 0xa601203 - BAR1 / Visible vRAM Size: 32768 MiB - vBIOS Version: 95.02.3c.00.8c- Python 3.10.12- itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: always-on RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected

rocky-gpu-2023-08-17mixbench: OpenCL - Integermixbench: NVIDIA CUDA - Integermixbench: OpenCL - Double Precisionmixbench: OpenCL - Single Precisionmixbench: NVIDIA CUDA - Half Precisionmixbench: NVIDIA CUDA - Double Precisionmixbench: NVIDIA CUDA - Single Precisionshoc: OpenCL - S3Dshoc: OpenCL - Triadshoc: OpenCL - FFT SPshoc: OpenCL - MD5 Hashshoc: OpenCL - Reductionshoc: OpenCL - GEMM SGEMM_Nshoc: OpenCL - Max SP Flopsshoc: OpenCL - Bus Speed Downloadshoc: OpenCL - Bus Speed Readbackshoc: OpenCL - Texture Read Bandwidthcl-mem: Copycl-mem: Readcl-mem: Writefahbench: clpeak: Integer Compute INTclpeak: Single-Precision Floatclpeak: Double-Precision Doubleclpeak: Global Memory Bandwidthlczero: OpenCLrodinia: OpenCL Particle Filterarrayfire: Conjugate Gradient OpenCLluxcorerender: DLSC - GPUluxcorerender: Danish Mood - GPUluxcorerender: Orange Juice - GPUluxcorerender: LuxCore Benchmark - GPUluxcorerender: Rainbow Colors and Prism - GPUfinancebench: Black-Scholes OpenCLviennacl: CPU BLAS - sCOPYviennacl: CPU BLAS - sAXPYviennacl: CPU BLAS - sDOTviennacl: CPU BLAS - dCOPYviennacl: CPU BLAS - dAXPYviennacl: CPU BLAS - dDOTviennacl: CPU BLAS - dGEMV-Nviennacl: CPU BLAS - dGEMV-Tviennacl: CPU BLAS - dGEMM-NNviennacl: CPU BLAS - dGEMM-NTviennacl: CPU BLAS - dGEMM-TNviennacl: CPU BLAS - dGEMM-TTviennacl: OpenCL BLAS - sCOPYviennacl: OpenCL BLAS - sAXPYviennacl: OpenCL BLAS - sDOTviennacl: OpenCL BLAS - dCOPYviennacl: OpenCL BLAS - dAXPYviennacl: OpenCL BLAS - dDOTviennacl: OpenCL BLAS - dGEMV-Nviennacl: OpenCL BLAS - dGEMV-Tviennacl: OpenCL BLAS - dGEMM-NNviennacl: OpenCL BLAS - dGEMM-NTviennacl: OpenCL BLAS - dGEMM-TNviennacl: OpenCL BLAS - dGEMM-TTncnn: Vulkan GPU - mobilenetncnn: Vulkan GPU-v2-v2 - mobilenet-v2ncnn: Vulkan GPU-v3-v3 - mobilenet-v3ncnn: Vulkan GPU - shufflenet-v2ncnn: Vulkan GPU - mnasnetncnn: Vulkan GPU - efficientnet-b0ncnn: Vulkan GPU - blazefacencnn: Vulkan GPU - googlenetncnn: Vulkan GPU - vgg16ncnn: Vulkan GPU - resnet18ncnn: Vulkan GPU - alexnetncnn: Vulkan GPU - resnet50ncnn: Vulkan GPU - yolov4-tinyncnn: Vulkan GPU - squeezenet_ssdncnn: Vulkan GPU - regnety_400mncnn: Vulkan GPU - vision_transformerncnn: Vulkan GPU - FastestDetblender: BMW27 - NVIDIA OptiXblender: Classroom - NVIDIA OptiXblender: Fishy Cat - NVIDIA OptiXblender: Barbershop - NVIDIA OptiXblender: Pabellon Barcelona - NVIDIA OptiXneatbench: GPUrocky-gpu-2023-08-1740183.5834761.321098.7776445.2573760.201085.4171978.75643.43626.16102789.4393.6509972.75026870.987267.326.807126.35342975.91409.1889.2806.9433.118940829.9379753.351388.89873.15456922.1510.874526.0220.4920.4021.2645.062.89527841740879.412012113617011010611611143556744766277768121944211501270129013407.923.113.143.312.933.811.367.7220.325.013.999.5012.306.728.2431.044.013.457.075.3129.348.104090OpenBenchmarking.org

Mixbench

Backend: OpenCL - Benchmark: Integer

OpenBenchmarking.orgGIOPS, More Is BetterMixbench 2020-06-23Backend: OpenCL - Benchmark: Integerrocky-gpu-2023-08-179K18K27K36K45KSE +/- 84.62, N = 340183.581. (CXX) g++ options: -lm -lstdc++ -lOpenCL -lrt -O2

Mixbench

Backend: NVIDIA CUDA - Benchmark: Integer

OpenBenchmarking.orgGIOPS, More Is BetterMixbench 2020-06-23Backend: NVIDIA CUDA - Benchmark: Integerrocky-gpu-2023-08-177K14K21K28K35KSE +/- 21.61, N = 334761.321. (CXX) g++ options: -lm -lstdc++ -lOpenCL -lrt -O2

Mixbench

Backend: OpenCL - Benchmark: Double Precision

OpenBenchmarking.orgGFLOPS, More Is BetterMixbench 2020-06-23Backend: OpenCL - Benchmark: Double Precisionrocky-gpu-2023-08-172004006008001000SE +/- 12.91, N = 41098.771. (CXX) g++ options: -lm -lstdc++ -lOpenCL -lrt -O2

Mixbench

Backend: OpenCL - Benchmark: Single Precision

OpenBenchmarking.orgGFLOPS, More Is BetterMixbench 2020-06-23Backend: OpenCL - Benchmark: Single Precisionrocky-gpu-2023-08-1716K32K48K64K80KSE +/- 36.23, N = 376445.251. (CXX) g++ options: -lm -lstdc++ -lOpenCL -lrt -O2

Mixbench

Backend: NVIDIA CUDA - Benchmark: Half Precision

OpenBenchmarking.orgGFLOPS, More Is BetterMixbench 2020-06-23Backend: NVIDIA CUDA - Benchmark: Half Precisionrocky-gpu-2023-08-1716K32K48K64K80KSE +/- 81.83, N = 373760.201. (CXX) g++ options: -lm -lstdc++ -lOpenCL -lrt -O2

Mixbench

Backend: NVIDIA CUDA - Benchmark: Double Precision

OpenBenchmarking.orgGFLOPS, More Is BetterMixbench 2020-06-23Backend: NVIDIA CUDA - Benchmark: Double Precisionrocky-gpu-2023-08-172004006008001000SE +/- 0.02, N = 31085.411. (CXX) g++ options: -lm -lstdc++ -lOpenCL -lrt -O2

Mixbench

Backend: NVIDIA CUDA - Benchmark: Single Precision

OpenBenchmarking.orgGFLOPS, More Is BetterMixbench 2020-06-23Backend: NVIDIA CUDA - Benchmark: Single Precisionrocky-gpu-2023-08-1715K30K45K60K75KSE +/- 89.94, N = 371978.751. (CXX) g++ options: -lm -lstdc++ -lOpenCL -lrt -O2

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: S3D

OpenBenchmarking.orgGFLOPS, More Is BetterSHOC Scalable HeterOgeneous Computing 2020-04-17Target: OpenCL - Benchmark: S3Drocky-gpu-2023-08-17140280420560700SE +/- 0.18, N = 3643.441. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Triad

OpenBenchmarking.orgGB/s, More Is BetterSHOC Scalable HeterOgeneous Computing 2020-04-17Target: OpenCL - Benchmark: Triadrocky-gpu-2023-08-17612182430SE +/- 0.01, N = 326.161. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: FFT SP

OpenBenchmarking.orgGFLOPS, More Is BetterSHOC Scalable HeterOgeneous Computing 2020-04-17Target: OpenCL - Benchmark: FFT SProcky-gpu-2023-08-176001200180024003000SE +/- 2.06, N = 32789.431. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: MD5 Hash

OpenBenchmarking.orgGHash/s, More Is BetterSHOC Scalable HeterOgeneous Computing 2020-04-17Target: OpenCL - Benchmark: MD5 Hashrocky-gpu-2023-08-1720406080100SE +/- 0.87, N = 1593.651. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Reduction

OpenBenchmarking.orgGB/s, More Is BetterSHOC Scalable HeterOgeneous Computing 2020-04-17Target: OpenCL - Benchmark: Reductionrocky-gpu-2023-08-172004006008001000SE +/- 8.94, N = 15972.751. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: GEMM SGEMM_N

OpenBenchmarking.orgGFLOPS, More Is BetterSHOC Scalable HeterOgeneous Computing 2020-04-17Target: OpenCL - Benchmark: GEMM SGEMM_Nrocky-gpu-2023-08-176K12K18K24K30KSE +/- 138.65, N = 326870.91. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Max SP Flops

OpenBenchmarking.orgGFLOPS, More Is BetterSHOC Scalable HeterOgeneous Computing 2020-04-17Target: OpenCL - Benchmark: Max SP Flopsrocky-gpu-2023-08-1720K40K60K80K100KSE +/- 577.21, N = 387267.31. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Bus Speed Download

OpenBenchmarking.orgGB/s, More Is BetterSHOC Scalable HeterOgeneous Computing 2020-04-17Target: OpenCL - Benchmark: Bus Speed Downloadrocky-gpu-2023-08-17612182430SE +/- 0.00, N = 326.811. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Bus Speed Readback

OpenBenchmarking.orgGB/s, More Is BetterSHOC Scalable HeterOgeneous Computing 2020-04-17Target: OpenCL - Benchmark: Bus Speed Readbackrocky-gpu-2023-08-17612182430SE +/- 0.00, N = 326.351. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Texture Read Bandwidth

OpenBenchmarking.orgGB/s, More Is BetterSHOC Scalable HeterOgeneous Computing 2020-04-17Target: OpenCL - Benchmark: Texture Read Bandwidthrocky-gpu-2023-08-176001200180024003000SE +/- 0.63, N = 32975.911. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

cl-mem

Benchmark: Copy

OpenBenchmarking.orgGB/s, More Is Bettercl-mem 2017-01-13Benchmark: Copyrocky-gpu-2023-08-1790180270360450SE +/- 0.03, N = 3409.11. (CC) gcc options: -O2 -flto -lOpenCL

cl-mem

Benchmark: Read

OpenBenchmarking.orgGB/s, More Is Bettercl-mem 2017-01-13Benchmark: Readrocky-gpu-2023-08-172004006008001000SE +/- 0.03, N = 3889.21. (CC) gcc options: -O2 -flto -lOpenCL

cl-mem

Benchmark: Write

OpenBenchmarking.orgGB/s, More Is Bettercl-mem 2017-01-13Benchmark: Writerocky-gpu-2023-08-172004006008001000SE +/- 0.64, N = 3806.91. (CC) gcc options: -O2 -flto -lOpenCL

FAHBench

OpenBenchmarking.orgNs Per Day, More Is BetterFAHBench 2.3.2rocky-gpu-2023-08-1790180270360450SE +/- 0.29, N = 3433.12

clpeak

OpenCL Test: Integer Compute INT

OpenBenchmarking.orgGIOPS, More Is Betterclpeak 1.1.2OpenCL Test: Integer Compute INTrocky-gpu-2023-08-179K18K27K36K45KSE +/- 9.32, N = 340829.931. (CXX) g++ options: -O3

clpeak

OpenCL Test: Single-Precision Float

OpenBenchmarking.orgGFLOPS, More Is Betterclpeak 1.1.2OpenCL Test: Single-Precision Floatrocky-gpu-2023-08-1720K40K60K80K100KSE +/- 0.00, N = 379753.351. (CXX) g++ options: -O3

clpeak

OpenCL Test: Double-Precision Double

OpenBenchmarking.orgGFLOPS, More Is Betterclpeak 1.1.2OpenCL Test: Double-Precision Doublerocky-gpu-2023-08-1730060090012001500SE +/- 3.82, N = 31388.891. (CXX) g++ options: -O3

clpeak

OpenCL Test: Global Memory Bandwidth

OpenBenchmarking.orgGBPS, More Is Betterclpeak 1.1.2OpenCL Test: Global Memory Bandwidthrocky-gpu-2023-08-172004006008001000SE +/- 0.04, N = 3873.151. (CXX) g++ options: -O3

LeelaChessZero

Backend: OpenCL

OpenBenchmarking.orgNodes Per Second, More Is BetterLeelaChessZero 0.28Backend: OpenCLrocky-gpu-2023-08-1710K20K30K40K50KSE +/- 306.89, N = 3456921. (CXX) g++ options: -flto -pthread

Rodinia

Test: OpenCL Particle Filter

OpenBenchmarking.orgSeconds, Fewer Is BetterRodinia 3.1Test: OpenCL Particle Filterrocky-gpu-2023-08-170.4840.9681.4521.9362.422.1511. (CXX) g++ options: -m64 -lm -lcuda -lcudart -lcudadevrt -lcudart_static -lrt -lpthread -ldl

ArrayFire

Test: Conjugate Gradient OpenCL

OpenBenchmarking.orgms, Fewer Is BetterArrayFire 3.7Test: Conjugate Gradient OpenCLrocky-gpu-2023-08-170.19680.39360.59040.78720.984SE +/- 0.0024, N = 30.87451. (CXX) g++ options: -rdynamic

LuxCoreRender

Scene: DLSC - Acceleration: GPU

OpenBenchmarking.orgM samples/sec, More Is BetterLuxCoreRender 2.6Scene: DLSC - Acceleration: GPUrocky-gpu-2023-08-17612182430SE +/- 0.02, N = 326.02MIN: 24.8 / MAX: 26.26

LuxCoreRender

Scene: Danish Mood - Acceleration: GPU

OpenBenchmarking.orgM samples/sec, More Is BetterLuxCoreRender 2.6Scene: Danish Mood - Acceleration: GPUrocky-gpu-2023-08-17510152025SE +/- 0.25, N = 420.49MIN: 7.86 / MAX: 23.88

LuxCoreRender

Scene: Orange Juice - Acceleration: GPU

OpenBenchmarking.orgM samples/sec, More Is BetterLuxCoreRender 2.6Scene: Orange Juice - Acceleration: GPUrocky-gpu-2023-08-17510152025SE +/- 0.04, N = 320.40MIN: 18.32 / MAX: 28.37

LuxCoreRender

Scene: LuxCore Benchmark - Acceleration: GPU

OpenBenchmarking.orgM samples/sec, More Is BetterLuxCoreRender 2.6Scene: LuxCore Benchmark - Acceleration: GPUrocky-gpu-2023-08-17510152025SE +/- 0.04, N = 321.26MIN: 9.33 / MAX: 25.54

LuxCoreRender

Scene: Rainbow Colors and Prism - Acceleration: GPU

OpenBenchmarking.orgM samples/sec, More Is BetterLuxCoreRender 2.6Scene: Rainbow Colors and Prism - Acceleration: GPUrocky-gpu-2023-08-171020304050SE +/- 0.04, N = 345.06MIN: 38.27 / MAX: 47.55

FinanceBench

Benchmark: Black-Scholes OpenCL

OpenBenchmarking.orgms, Fewer Is BetterFinanceBench 2016-07-25Benchmark: Black-Scholes OpenCLrocky-gpu-2023-08-170.65141.30281.95422.60563.257SE +/- 0.002, N = 32.8951. (CXX) g++ options: -O3 -march=native -fopenmp

ViennaCL

Test: CPU BLAS - sCOPY

OpenBenchmarking.orgGB/s, More Is BetterViennaCL 1.7.1Test: CPU BLAS - sCOPYrocky-gpu-2023-08-1760120180240300SE +/- 0.88, N = 32781. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - sAXPY

OpenBenchmarking.orgGB/s, More Is BetterViennaCL 1.7.1Test: CPU BLAS - sAXPYrocky-gpu-2023-08-1790180270360450SE +/- 1.33, N = 34171. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - sDOT

OpenBenchmarking.orgGB/s, More Is BetterViennaCL 1.7.1Test: CPU BLAS - sDOTrocky-gpu-2023-08-1790180270360450SE +/- 0.88, N = 34081. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dCOPY

OpenBenchmarking.orgGB/s, More Is BetterViennaCL 1.7.1Test: CPU BLAS - dCOPYrocky-gpu-2023-08-1720406080100SE +/- 0.09, N = 379.41. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dAXPY

OpenBenchmarking.orgGB/s, More Is BetterViennaCL 1.7.1Test: CPU BLAS - dAXPYrocky-gpu-2023-08-17306090120150SE +/- 0.33, N = 31201. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dDOT

OpenBenchmarking.orgGB/s, More Is BetterViennaCL 1.7.1Test: CPU BLAS - dDOTrocky-gpu-2023-08-17306090120150SE +/- 0.33, N = 31211. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dGEMV-N

OpenBenchmarking.orgGB/s, More Is BetterViennaCL 1.7.1Test: CPU BLAS - dGEMV-Nrocky-gpu-2023-08-17306090120150SE +/- 0.58, N = 31361. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dGEMV-T

OpenBenchmarking.orgGB/s, More Is BetterViennaCL 1.7.1Test: CPU BLAS - dGEMV-Trocky-gpu-2023-08-174080120160200SE +/- 0.33, N = 31701. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dGEMM-NN

OpenBenchmarking.orgGFLOPs/s, More Is BetterViennaCL 1.7.1Test: CPU BLAS - dGEMM-NNrocky-gpu-2023-08-1720406080100SE +/- 0.33, N = 31101. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dGEMM-NT

OpenBenchmarking.orgGFLOPs/s, More Is BetterViennaCL 1.7.1Test: CPU BLAS - dGEMM-NTrocky-gpu-2023-08-1720406080100SE +/- 0.33, N = 31061. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dGEMM-TN

OpenBenchmarking.orgGFLOPs/s, More Is BetterViennaCL 1.7.1Test: CPU BLAS - dGEMM-TNrocky-gpu-2023-08-17306090120150SE +/- 0.33, N = 31161. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dGEMM-TT

OpenBenchmarking.orgGFLOPs/s, More Is BetterViennaCL 1.7.1Test: CPU BLAS - dGEMM-TTrocky-gpu-2023-08-1720406080100SE +/- 0.00, N = 21111. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - sCOPY

OpenBenchmarking.orgGB/s, More Is BetterViennaCL 1.7.1Test: OpenCL BLAS - sCOPYrocky-gpu-2023-08-1790180270360450SE +/- 0.67, N = 34351. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - sAXPY

OpenBenchmarking.orgGB/s, More Is BetterViennaCL 1.7.1Test: OpenCL BLAS - sAXPYrocky-gpu-2023-08-17120240360480600SE +/- 0.00, N = 35671. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - sDOT

OpenBenchmarking.orgGB/s, More Is BetterViennaCL 1.7.1Test: OpenCL BLAS - sDOTrocky-gpu-2023-08-17100200300400500SE +/- 0.00, N = 34471. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dCOPY

OpenBenchmarking.orgGB/s, More Is BetterViennaCL 1.7.1Test: OpenCL BLAS - dCOPYrocky-gpu-2023-08-17140280420560700SE +/- 0.33, N = 36621. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dAXPY

OpenBenchmarking.orgGB/s, More Is BetterViennaCL 1.7.1Test: OpenCL BLAS - dAXPYrocky-gpu-2023-08-172004006008001000SE +/- 0.00, N = 37771. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dDOT

OpenBenchmarking.orgGB/s, More Is BetterViennaCL 1.7.1Test: OpenCL BLAS - dDOTrocky-gpu-2023-08-17150300450600750SE +/- 0.33, N = 36811. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMV-N

OpenBenchmarking.orgGB/s, More Is BetterViennaCL 1.7.1Test: OpenCL BLAS - dGEMV-Nrocky-gpu-2023-08-1750100150200250SE +/- 0.33, N = 32191. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMV-T

OpenBenchmarking.orgGB/s, More Is BetterViennaCL 1.7.1Test: OpenCL BLAS - dGEMV-Trocky-gpu-2023-08-17100200300400500SE +/- 0.00, N = 34421. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMM-NN

OpenBenchmarking.orgGFLOPs/s, More Is BetterViennaCL 1.7.1Test: OpenCL BLAS - dGEMM-NNrocky-gpu-2023-08-172004006008001000SE +/- 0.00, N = 311501. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMM-NT

OpenBenchmarking.orgGFLOPs/s, More Is BetterViennaCL 1.7.1Test: OpenCL BLAS - dGEMM-NTrocky-gpu-2023-08-1730060090012001500SE +/- 0.00, N = 312701. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMM-TN

OpenBenchmarking.orgGFLOPs/s, More Is BetterViennaCL 1.7.1Test: OpenCL BLAS - dGEMM-TNrocky-gpu-2023-08-1730060090012001500SE +/- 0.00, N = 312901. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMM-TT

OpenBenchmarking.orgGFLOPs/s, More Is BetterViennaCL 1.7.1Test: OpenCL BLAS - dGEMM-TTrocky-gpu-2023-08-1730060090012001500SE +/- 0.00, N = 313401. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

NCNN

Target: Vulkan GPU - Model: mobilenet

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20230517Target: Vulkan GPU - Model: mobilenetrocky-gpu-2023-08-17246810SE +/- 0.03, N = 37.92MIN: 7.86 / MAX: 8.271. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU-v2-v2 - Model: mobilenet-v2

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20230517Target: Vulkan GPU-v2-v2 - Model: mobilenet-v2rocky-gpu-2023-08-170.69981.39962.09942.79923.499SE +/- 0.01, N = 33.11MIN: 3.05 / MAX: 3.231. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU-v3-v3 - Model: mobilenet-v3

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20230517Target: Vulkan GPU-v3-v3 - Model: mobilenet-v3rocky-gpu-2023-08-170.70651.4132.11952.8263.5325SE +/- 0.01, N = 33.14MIN: 3.09 / MAX: 3.221. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: shufflenet-v2

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20230517Target: Vulkan GPU - Model: shufflenet-v2rocky-gpu-2023-08-170.74481.48962.23442.97923.724SE +/- 0.01, N = 33.31MIN: 3.27 / MAX: 3.451. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: mnasnet

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20230517Target: Vulkan GPU - Model: mnasnetrocky-gpu-2023-08-170.65931.31861.97792.63723.2965SE +/- 0.01, N = 32.93MIN: 2.89 / MAX: 3.121. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: efficientnet-b0

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20230517Target: Vulkan GPU - Model: efficientnet-b0rocky-gpu-2023-08-170.85731.71462.57193.42924.2865SE +/- 0.01, N = 33.81MIN: 3.78 / MAX: 3.921. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: blazeface

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20230517Target: Vulkan GPU - Model: blazefacerocky-gpu-2023-08-170.3060.6120.9181.2241.53SE +/- 0.00, N = 31.36MIN: 1.33 / MAX: 1.461. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: googlenet

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20230517Target: Vulkan GPU - Model: googlenetrocky-gpu-2023-08-17246810SE +/- 0.03, N = 37.72MIN: 7.62 / MAX: 9.011. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: vgg16

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20230517Target: Vulkan GPU - Model: vgg16rocky-gpu-2023-08-17510152025SE +/- 0.03, N = 320.32MIN: 20.16 / MAX: 21.291. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: resnet18

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20230517Target: Vulkan GPU - Model: resnet18rocky-gpu-2023-08-171.12732.25463.38194.50925.6365SE +/- 0.01, N = 35.01MIN: 4.97 / MAX: 5.131. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: alexnet

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20230517Target: Vulkan GPU - Model: alexnetrocky-gpu-2023-08-170.89781.79562.69343.59124.489SE +/- 0.01, N = 33.99MIN: 3.94 / MAX: 4.161. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: resnet50

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20230517Target: Vulkan GPU - Model: resnet50rocky-gpu-2023-08-173691215SE +/- 0.03, N = 39.50MIN: 9.39 / MAX: 9.741. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: yolov4-tiny

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20230517Target: Vulkan GPU - Model: yolov4-tinyrocky-gpu-2023-08-173691215SE +/- 0.05, N = 312.30MIN: 11.99 / MAX: 12.881. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: squeezenet_ssd

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20230517Target: Vulkan GPU - Model: squeezenet_ssdrocky-gpu-2023-08-17246810SE +/- 0.01, N = 36.72MIN: 6.66 / MAX: 7.681. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: regnety_400m

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20230517Target: Vulkan GPU - Model: regnety_400mrocky-gpu-2023-08-17246810SE +/- 0.03, N = 38.24MIN: 8.16 / MAX: 8.511. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: vision_transformer

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20230517Target: Vulkan GPU - Model: vision_transformerrocky-gpu-2023-08-17714212835SE +/- 0.03, N = 331.04MIN: 30.84 / MAX: 33.81. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: FastestDet

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20230517Target: Vulkan GPU - Model: FastestDetrocky-gpu-2023-08-170.90231.80462.70693.60924.5115SE +/- 0.01, N = 34.01MIN: 3.97 / MAX: 4.121. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

Blender

Blend File: BMW27 - Compute: NVIDIA OptiX

OpenBenchmarking.orgSeconds, Fewer Is BetterBlender 3.6Blend File: BMW27 - Compute: NVIDIA OptiXrocky-gpu-2023-08-170.77631.55262.32893.10523.8815SE +/- 0.05, N = 153.45

Blender

Blend File: Classroom - Compute: NVIDIA OptiX

OpenBenchmarking.orgSeconds, Fewer Is BetterBlender 3.6Blend File: Classroom - Compute: NVIDIA OptiXrocky-gpu-2023-08-17246810SE +/- 0.01, N = 37.07

Blender

Blend File: Fishy Cat - Compute: NVIDIA OptiX

OpenBenchmarking.orgSeconds, Fewer Is BetterBlender 3.6Blend File: Fishy Cat - Compute: NVIDIA OptiXrocky-gpu-2023-08-171.19482.38963.58444.77925.974SE +/- 0.07, N = 125.31

Blender

Blend File: Barbershop - Compute: NVIDIA OptiX

OpenBenchmarking.orgSeconds, Fewer Is BetterBlender 3.6Blend File: Barbershop - Compute: NVIDIA OptiXrocky-gpu-2023-08-17714212835SE +/- 0.10, N = 329.34

Blender

Blend File: Pabellon Barcelona - Compute: NVIDIA OptiX

OpenBenchmarking.orgSeconds, Fewer Is BetterBlender 3.6Blend File: Pabellon Barcelona - Compute: NVIDIA OptiXrocky-gpu-2023-08-17246810SE +/- 0.02, N = 38.10

NeatBench

Acceleration: GPU

OpenBenchmarking.orgFPS, More Is BetterNeatBench 5Acceleration: GPUrocky-gpu-2023-08-179001800270036004500SE +/- 0.00, N = 34090


Phoronix Test Suite v10.8.4