FirstPhoronixOpenCLtest

AMD Ryzen 9 7900X 12-Core testing with an ASUS ROG STRIX X670E-A GAMING WIFI (0925 BIOS) and Sapphire AMD Radeon RX 6700/6700 XT/6750 XT / 6800M/6850M 12GB on Debian 12 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2304256-NE-FIRSTPHOR20
FirstPhoronixOpenCLtest: system details
Run: base line, mostly aiming at ensuring some functionality

Processor:          AMD Ryzen 9 7900X 12-Core @ 4.70GHz (12 Cores / 24 Threads)
Motherboard:        ASUS ROG STRIX X670E-A GAMING WIFI (0925 BIOS)
Chipset:            AMD Device 14d8
Memory:             2 x 32 GB DDR5-5600MT/s CMT64GX5M2X5600C40
Disk:               Western Digital WD_BLACK SN850X 1000GB
Graphics:           Sapphire AMD Radeon RX 6700/6700 XT/6750 XT / 6800M/6850M 12GB
Audio:              AMD Navi 21/23
Monitor:            PHL 246E9Q
Network:            Intel I225-V
OS:                 Debian 12
Kernel:             6.1.0-7-amd64 (x86_64)
Desktop:            Xfce
Display Server:     X Server 1.21.1.7
OpenCL:             OpenCL 2.1 AMD-APP (3513.0)
Compiler:           GCC 12.2.0
File-System:        xfs
Screen Resolution:  1920x1080

- Transparent Huge Pages: always
- --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-12-bTRWOB/gcc-12-12.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-12-bTRWOB/gcc-12-12.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
- Scaling Governor: acpi-cpufreq ondemand (Boost: Enabled)
- CPU Microcode: 0xa601203
- GLAMOR
- vBIOS Version: 113-D5270301-S01
- Python 3.11.2
- itlb_multihit: Not affected
  + l1tf: Not affected
  + mds: Not affected
  + meltdown: Not affected
  + mmio_stale_data: Not affected
  + retbleed: Not affected
  + spec_store_bypass: Mitigation of SSB disabled via prctl
  + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
  + spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: always-on RSB filling PBRSB-eIBRS: Not affected
  + srbds: Not affected
  + tsx_async_abort: Not affected
FirstPhoronixOpenCLtest: result overview
Run: base line, mostly aiming at ensuring some functionality

shoc: OpenCL - S3D                             102.5068
shoc: OpenCL - Triad                           20.6610
shoc: OpenCL - FFT SP                          1185.71
shoc: OpenCL - MD5 Hash                        16.9534
shoc: OpenCL - Reduction                       611.678
shoc: OpenCL - GEMM SGEMM_N                    4698.03
shoc: OpenCL - Max SP Flops                    21042717
shoc: OpenCL - Bus Speed Download              28.8524
shoc: OpenCL - Bus Speed Readback              25.2684
shoc: OpenCL - Texture Read Bandwidth          661.622
cl-mem: Copy                                   299.6
cl-mem: Read                                   387.6
cl-mem: Write                                  332.1
fluidx3d: FP32-FP32                            1365
fluidx3d: FP32-FP16C                           2929
fluidx3d: FP32-FP16S                           2914
clpeak: Kernel Latency                         14.06
clpeak: Integer Compute                        3193.96
clpeak: Integer 24-bit Compute                 11033.56
clpeak: Global Memory Bandwidth                363.42
clpeak: Double-Precision Compute               835.45
clpeak: Single-Precision Compute               12217.73
clpeak: Transfer Bandwidth enqueueReadBuffer   2.84
clpeak: Transfer Bandwidth enqueueWriteBuffer  8.57
rodinia: OpenCL Myocyte                        64.783
rodinia: OpenCL Heartwall                      5.373
rodinia: OpenCL Leukocyte                      7.631
rodinia: OpenCL Particle Filter                4.342
viennacl: CPU BLAS - sCOPY                     97.44
viennacl: CPU BLAS - sAXPY                     101.3
viennacl: CPU BLAS - sDOT                      86.59
viennacl: CPU BLAS - dCOPY                     41.7
viennacl: CPU BLAS - dAXPY                     64.2
viennacl: CPU BLAS - dDOT                      74.44
viennacl: CPU BLAS - dGEMV-N                   92.1
viennacl: CPU BLAS - dGEMV-T                   111.8
viennacl: CPU BLAS - dGEMM-NN                  63.3
viennacl: CPU BLAS - dGEMM-NT                  60.7
viennacl: CPU BLAS - dGEMM-TN                  67.5
viennacl: CPU BLAS - dGEMM-TT                  64.7
viennacl: OpenCL BLAS - sCOPY                  461
viennacl: OpenCL BLAS - sAXPY                  647
viennacl: OpenCL BLAS - sDOT                   333
viennacl: OpenCL BLAS - dCOPY                  293
viennacl: OpenCL BLAS - dAXPY                  326
viennacl: OpenCL BLAS - dDOT                   351
viennacl: OpenCL BLAS - dGEMV-N                94.0
viennacl: OpenCL BLAS - dGEMV-T                321
viennacl: OpenCL BLAS - dGEMM-NN               740
viennacl: OpenCL BLAS - dGEMM-NT               755
viennacl: OpenCL BLAS - dGEMM-TN               728
viennacl: OpenCL BLAS - dGEMM-TT               748
darktable: Boat - OpenCL                       9.761
darktable: Masskrug - OpenCL                   2.351
darktable: Server Rack - OpenCL                0.149
darktable: Server Room - OpenCL                0.624
lulesh-cl:                                     (no value in export)
SHOC Scalable HeterOgeneous Computing 2020-04-17
Target: OpenCL. Run: base line, mostly aiming at ensuring some functionality
Benchmark: S3D: 102.51 GFLOPS, More Is Better (SE +/- 0.92, N = 15)
Benchmark: Triad: 20.66 GB/s, More Is Better (SE +/- 0.05, N = 3)
Benchmark: FFT SP: 1185.71 GFLOPS, More Is Better (SE +/- 2.21, N = 3)
Benchmark: MD5 Hash: 16.95 GHash/s, More Is Better (SE +/- 0.00, N = 3)
Benchmark: Reduction: 611.68 GB/s, More Is Better (SE +/- 0.83, N = 3)
Benchmark: GEMM SGEMM_N: 4698.03 GFLOPS, More Is Better (SE +/- 56.03, N = 4)
Benchmark: Max SP Flops: 21042717 GFLOPS, More Is Better (SE +/- 841453.09, N = 6)
Benchmark: Bus Speed Download: 28.85 GB/s, More Is Better (SE +/- 0.00, N = 3)
Benchmark: Bus Speed Readback: 25.27 GB/s, More Is Better (SE +/- 0.05, N = 3)
Benchmark: Texture Read Bandwidth: 661.62 GB/s, More Is Better (SE +/- 3.26, N = 3)
1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi
cl-mem 2017-01-13
Run: base line, mostly aiming at ensuring some functionality
Benchmark: Copy: 299.6 GB/s, More Is Better (SE +/- 0.09, N = 3)
Benchmark: Read: 387.6 GB/s, More Is Better (SE +/- 0.66, N = 3)
Benchmark: Write: 332.1 GB/s, More Is Better (SE +/- 0.50, N = 3)
1. (CC) gcc options: -O2 -flto -lOpenCL
FluidX3D 2.3
Run: base line, mostly aiming at ensuring some functionality
Test: FP32-FP32: 1365 MLUPs/s, More Is Better (SE +/- 0.58, N = 3)
Test: FP32-FP16C: 2929 MLUPs/s, More Is Better (SE +/- 0.88, N = 3)
Test: FP32-FP16S: 2914 MLUPs/s, More Is Better (SE +/- 0.88, N = 3)
clpeak 1.1.2
Run: base line, mostly aiming at ensuring some functionality
OpenCL Test: Kernel Latency: 14.06 us, Fewer Is Better (SE +/- 0.17, N = 4)
OpenCL Test: Integer Compute: 3193.96 GIOPS, More Is Better (SE +/- 2.94, N = 3)
OpenCL Test: Integer 24-bit Compute: 11033.56 GIOPS, More Is Better (SE +/- 10.04, N = 3)
OpenCL Test: Global Memory Bandwidth: 363.42 GBPS, More Is Better (SE +/- 0.34, N = 3)
OpenCL Test: Double-Precision Compute: 835.45 GFLOPS, More Is Better (SE +/- 0.65, N = 3)
OpenCL Test: Single-Precision Compute: 12217.73 GFLOPS, More Is Better (SE +/- 3.09, N = 3)
OpenCL Test: Transfer Bandwidth enqueueReadBuffer: 2.84 GBPS, More Is Better (SE +/- 0.03, N = 3)
OpenCL Test: Transfer Bandwidth enqueueWriteBuffer: 8.57 GBPS, More Is Better (SE +/- 0.11, N = 15)
1. (CXX) g++ options: -O3
Rodinia 3.1
Run: base line, mostly aiming at ensuring some functionality
Test: OpenCL Myocyte: 64.78 Seconds, Fewer Is Better (SE +/- 0.33, N = 3)
Test: OpenCL Heartwall: 5.373 Seconds, Fewer Is Better (SE +/- 0.020, N = 3)
Test: OpenCL Leukocyte: 7.631 Seconds, Fewer Is Better (SE +/- 0.078, N = 3)
Test: OpenCL Particle Filter: 4.342 Seconds, Fewer Is Better (SE +/- 0.018, N = 3)
1. (CXX) g++ options: -O2 -lOpenCL
ViennaCL 1.7.1 (CPU BLAS)
Run: base line, mostly aiming at ensuring some functionality
Test: CPU BLAS - sCOPY: 97.44 GB/s, More Is Better (SE +/- 25.57, N = 15)
Test: CPU BLAS - sAXPY: 101.3 GB/s, More Is Better (SE +/- 35.62, N = 15)
Test: CPU BLAS - sDOT: 86.59 GB/s, More Is Better (SE +/- 32.89, N = 15)
Test: CPU BLAS - dCOPY: 41.7 GB/s, More Is Better (SE +/- 5.65, N = 15)
Test: CPU BLAS - dAXPY: 64.2 GB/s, More Is Better (SE +/- 8.48, N = 15)
Test: CPU BLAS - dDOT: 74.44 GB/s, More Is Better (SE +/- 9.69, N = 15)
Test: CPU BLAS - dGEMV-N: 92.1 GB/s, More Is Better (SE +/- 10.55, N = 15)
Test: CPU BLAS - dGEMV-T: 111.8 GB/s, More Is Better (SE +/- 5.06, N = 15)
Test: CPU BLAS - dGEMM-NN: 63.3 GFLOPs/s, More Is Better (SE +/- 0.31, N = 15)
Test: CPU BLAS - dGEMM-NT: 60.7 GFLOPs/s, More Is Better (SE +/- 0.34, N = 15)
Test: CPU BLAS - dGEMM-TN: 67.5 GFLOPs/s, More Is Better (SE +/- 0.26, N = 15)
Test: CPU BLAS - dGEMM-TT: 64.7 GFLOPs/s, More Is Better (SE +/- 0.21, N = 13)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
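Several of the ViennaCL CPU BLAS results carry a standard error that is large relative to the reported mean. As a minimal sketch, the Python below expresses each SE as a percentage of its mean, using only the figures reported above. The helper name `relative_se` and the 10% cutoff are illustrative assumptions and are not part of the Phoronix Test Suite output.

```python
# (mean, SE) pairs copied from the ViennaCL CPU BLAS results in this report.
cpu_blas = {
    "sCOPY": (97.44, 25.57),
    "sAXPY": (101.3, 35.62),
    "sDOT": (86.59, 32.89),
    "dCOPY": (41.7, 5.65),
    "dAXPY": (64.2, 8.48),
    "dDOT": (74.44, 9.69),
    "dGEMV-N": (92.1, 10.55),
    "dGEMV-T": (111.8, 5.06),
    "dGEMM-NN": (63.3, 0.31),
    "dGEMM-NT": (60.7, 0.34),
    "dGEMM-TN": (67.5, 0.26),
    "dGEMM-TT": (64.7, 0.21),
}

# Hypothetical cutoff chosen for illustration only.
NOISY_THRESHOLD_PCT = 10.0


def relative_se(mean, se):
    """Return the standard error as a percentage of the mean."""
    return 100.0 * se / mean


# Keep only the tests whose SE exceeds the cutoff.
noisy = {
    name: relative_se(mean, se)
    for name, (mean, se) in cpu_blas.items()
    if relative_se(mean, se) > NOISY_THRESHOLD_PCT
}

for name, pct in noisy.items():
    print(f"{name}: SE is {pct:.1f}% of the mean")
```

With these inputs the vector operations and dGEMV-N exceed the cutoff, while dGEMV-T and the four dGEMM variants do not, so the GEMM figures are the more repeatable CPU numbers in this run.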
ViennaCL 1.7.1 (OpenCL BLAS)
Run: base line, mostly aiming at ensuring some functionality
Test: OpenCL BLAS - sCOPY: 461 GB/s, More Is Better (SE +/- 5.42, N = 15)
Test: OpenCL BLAS - sAXPY: 647 GB/s, More Is Better (SE +/- 8.30, N = 15)
Test: OpenCL BLAS - sDOT: 333 GB/s, More Is Better (SE +/- 17.41, N = 15)
Test: OpenCL BLAS - dCOPY: 293 GB/s, More Is Better (SE +/- 0.96, N = 15)
Test: OpenCL BLAS - dAXPY: 326 GB/s, More Is Better (SE +/- 0.97, N = 15)
Test: OpenCL BLAS - dDOT: 351 GB/s, More Is Better (SE +/- 0.95, N = 15)
Test: OpenCL BLAS - dGEMV-N: 94.0 GB/s, More Is Better (SE +/- 1.72, N = 15)
Test: OpenCL BLAS - dGEMV-T: 321 GB/s, More Is Better (SE +/- 1.91, N = 15)
Test: OpenCL BLAS - dGEMM-NN: 740 GFLOPs/s, More Is Better (SE +/- 2.50, N = 15)
Test: OpenCL BLAS - dGEMM-NT: 755 GFLOPs/s, More Is Better (SE +/- 1.13, N = 15)
Test: OpenCL BLAS - dGEMM-TN: 728 GFLOPs/s, More Is Better (SE +/- 1.99, N = 14)
Test: OpenCL BLAS - dGEMM-TT: 748 GFLOPs/s, More Is Better (SE +/- 2.43, N = 15)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
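The ViennaCL suite reports the same twelve BLAS operations for both the CPU and the OpenCL backend, so the two sets of means can be compared directly. The Python below is a rough sketch that divides each OpenCL mean by the matching CPU mean from this report. The variable names are illustrative, and because several CPU figures have a large standard error, the ratios should be read as approximate.

```python
# Means copied from the ViennaCL results in this report
# (GB/s for the vector and GEMV tests, GFLOPs/s for the GEMM tests).
cpu = {
    "sCOPY": 97.44, "sAXPY": 101.3, "sDOT": 86.59,
    "dCOPY": 41.7, "dAXPY": 64.2, "dDOT": 74.44,
    "dGEMV-N": 92.1, "dGEMV-T": 111.8,
    "dGEMM-NN": 63.3, "dGEMM-NT": 60.7, "dGEMM-TN": 67.5, "dGEMM-TT": 64.7,
}
opencl = {
    "sCOPY": 461, "sAXPY": 647, "sDOT": 333,
    "dCOPY": 293, "dAXPY": 326, "dDOT": 351,
    "dGEMV-N": 94.0, "dGEMV-T": 321,
    "dGEMM-NN": 740, "dGEMM-NT": 755, "dGEMM-TN": 728, "dGEMM-TT": 748,
}

# Ratio above 1.0 means the OpenCL backend reported the higher throughput.
ratios = {name: opencl[name] / cpu[name] for name in cpu}

for name, ratio in ratios.items():
    print(f"{name}: OpenCL is {ratio:.2f}x the CPU result")
```

On these figures every ratio is above 1.0. dGEMV-N is the smallest, with the OpenCL and CPU results nearly equal, and the dGEMM variants are the largest at roughly an order of magnitude.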
Darktable 4.2.1
Acceleration: OpenCL. Run: base line, mostly aiming at ensuring some functionality
Test: Boat: 9.761 Seconds, Fewer Is Better (SE +/- 0.002, N = 3)
Test: Masskrug: 2.351 Seconds, Fewer Is Better (SE +/- 0.027, N = 3)
Test: Server Rack: 0.149 Seconds, Fewer Is Better (SE +/- 0.001, N = 15)
Test: Server Room: 0.624 Seconds, Fewer Is Better (SE +/- 0.005, N = 15)
Phoronix Test Suite v10.8.5