VIENNACL CL BLAS

AMD Ryzen 9 7945HX testing with an Alienware 0DWD2H (1.13.1 BIOS) and NVIDIA GeForce RTX 4090 Laptop GPU 16GB on cachyos rolling via the Phoronix Test Suite.

HTML result view exported from: https://openbenchmarking.org/result/2409213-EIRI-240307070&grr&sor.

System Details

Radeon HD 8790M
  Processor: Intel Core i5-4300M @ 3.30GHz (2 Cores / 4 Threads)
  Motherboard: Dell 0VWNW8 (A26 BIOS)
  Chipset: Intel Xeon E3-1200 v3/4th
  Memory: 8GB
  Disk: 128GB SAMSUNG SSD PM85
  Graphics: AMD Radeon HD 8790M (1250MHz)
  Audio: Intel Xeon E3-1200 v3/4th
  Network: Intel I217-LM + Intel Centrino Ultimate-N 6300
  OS: cachyos rolling
  Kernel: 6.6.2-4-cachyos-lto (x86_64)
  Desktop: GNOME Shell 45.1
  Display Server: X Server 1.21.1.9
  OpenGL: 4.6 Mesa 24.0.0-devel (git-023fa0aa5d) (LLVM 16.0.6 DRM 3.54)
  OpenCL: OpenCL 1.1 Mesa 24.0.0-devel (git-023fa0aa5d)
  Compiler: GCC 13.2.1 20231110 + Clang 16.0.6 + LLVM 16.0.6 + CUDA 12.3
  File-System: xfs
  Screen Resolution: 1920x1080

IntelR HD Graphics 4600 HSW GT2 0x416 (the export lists only these fields for this configuration)
  Graphics: Intel HD 4600 HSW GT2 2GB (1250MHz)
  Kernel: 6.7.6-1-cachyos-rt-bore-lto (x86_64)
  Desktop: KDE Plasma 5.27.10
  Display Server: X Server 1.21.1.11
  OpenGL: 4.6 Mesa 24.0.1-arch1.1
  OpenCL: OpenCL 2.0 beignet 1.4 (git-f72309a5)
  Compiler: GCC 13.2.1 20230801 + Clang 16.0.6 + LLVM 16.0.6

Intel HD Graphics 4600 HSW GT2 CLANG70 (the export lists only these fields for this configuration)
  Kernel: 6.7.9-1-cachyos-rt-bore-lto (x86_64)
  Desktop: KDE Plasma 6.0.1
  OpenGL: 4.6 Mesa 24.0.2-arch1.2
  Compiler: Clang 17.0.6 + GCC 13.2.1 20230801 + LLVM 17.0.6

nVidia RTX 4090 mobile
  Processor: AMD Ryzen 9 7945HX @ 5.46GHz (16 Cores / 32 Threads)
  Motherboard: Alienware 0DWD2H (1.13.1 BIOS)
  Chipset: AMD Device 14d8
  Memory: 62GB
  Disk: PC SN810 NVMe WDC 2048GB + 4001GB CT4000P3SSD8
  Graphics: NVIDIA GeForce RTX 4090 Laptop GPU 16GB
  Audio: NVIDIA Device 22bb
  Network: Realtek RTL8125 2.5GbE + Qualcomm QCNFA765
  Kernel: 6.11.0-5-cachyos-lto (x86_64)
  Desktop: GNOME Shell 47.0
  Display Server: X Server 1.21.1.13
  Display Driver: NVIDIA 560.35.03
  OpenGL: 4.6.0
  OpenCL: OpenCL 3.0 CUDA 12.6.65
  Compiler: GCC 14.2.1 20240910 + Clang 18.1.8 + LLVM 18.1.8 + CUDA 12.6
  File-System: zfs
  Screen Resolution: 2560x1600

Kernel Details
- Radeon HD 8790M: cfg80211.cfg80211_disable_40mhz_24ghz=1 mac80211.minstrel_vht_only=1 - Transparent Huge Pages: always
- nVidia RTX 4090 mobile: Transparent Huge Pages: always

Environment Details
- Radeon HD 8790M: DRI_PRIME=1 NVCC_PREPEND_FLAGS="-ccbin /opt/cuda/bin"
- nVidia RTX 4090 mobile: MUTTER_DEBUG_KMS_THREAD_TYPE=user

Compiler Details
- Radeon HD 8790M: --disable-libssp --disable-libstdcxx-pch --disable-werror --enable-__cxa_atexit --enable-bootstrap --enable-cet=auto --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-default-ssp --enable-gnu-indirect-function --enable-gnu-unique-object --enable-languages=ada,c,c++,d,fortran,go,lto,objc,obj-c++ --enable-libstdcxx-backtrace --enable-link-serialization=1 --enable-lto --enable-multilib --enable-plugin --enable-shared --enable-threads=posix --mandir=/usr/share/man --with-build-config=bootstrap-lto --with-linker-hash-style=gnu
- IntelR HD Graphics 4600 HSW GT2 0x416: --disable-libssp --disable-libstdcxx-pch --disable-werror --enable-__cxa_atexit --enable-bootstrap --enable-cet=auto --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-default-ssp --enable-gnu-indirect-function --enable-gnu-unique-object --enable-languages=ada,c,c++,d,fortran,go,lto,m2,objc,obj-c++ --enable-libstdcxx-backtrace --enable-link-serialization=1 --enable-lto --enable-multilib --enable-plugin --enable-shared --enable-threads=posix --mandir=/usr/share/man --with-build-config=bootstrap-lto --with-linker-hash-style=gnu
- nVidia RTX 4090 mobile: --disable-libssp --disable-libstdcxx-pch --disable-werror --enable-__cxa_atexit --enable-bootstrap --enable-cet=auto --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-default-ssp --enable-gnu-indirect-function --enable-gnu-unique-object --enable-languages=ada,c,c++,d,fortran,go,lto,m2,objc,obj-c++,rust --enable-libstdcxx-backtrace --enable-link-serialization=1 --enable-lto --enable-multilib --enable-plugin --enable-shared --enable-threads=posix --mandir=/usr/share/man --with-build-config=bootstrap-lto --with-linker-hash-style=gnu

Processor Details
- Radeon HD 8790M: Scaling Governor: intel_cpufreq performance - CPU Microcode: 0x28
- IntelR HD Graphics 4600 HSW GT2 0x416: Scaling Governor: intel_cpufreq powersave - CPU Microcode: 0x28
- Intel HD Graphics 4600 HSW GT2 CLANG70: Scaling Governor: intel_cpufreq performance - CPU Microcode: 0x28
- nVidia RTX 4090 mobile: Scaling Governor: amd-pstate-epp performance (Boost: Enabled EPP: performance) - CPU Microcode: 0xa601206

Security Details
- Radeon HD 8790M: gather_data_sampling: Not affected + itlb_multihit: KVM: Mitigation of VMX disabled + l1tf: Mitigation of PTE Inversion; VMX: vulnerable + mds: Vulnerable; SMT vulnerable + meltdown: Vulnerable + mmio_stale_data: Unknown: No mitigations + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Vulnerable + spectre_v1: Vulnerable: __user pointer sanitization and usercopy barriers only; no swapgs barriers + spectre_v2: Vulnerable IBPB: disabled STIBP: disabled PBRSB-eIBRS: Not affected + srbds: Vulnerable + tsx_async_abort: Not affected
- IntelR HD Graphics 4600 HSW GT2 0x416: gather_data_sampling: Not affected + itlb_multihit: KVM: Mitigation of VMX disabled + l1tf: Mitigation of PTE Inversion; VMX: conditional cache flushes SMT vulnerable + mds: Mitigation of Clear buffers; SMT vulnerable + meltdown: Mitigation of PTI + mmio_stale_data: Unknown: No mitigations + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: conditional RSB filling PBRSB-eIBRS: Not affected + srbds: Mitigation of Microcode + tsx_async_abort: Not affected
- Intel HD Graphics 4600 HSW GT2 CLANG70: gather_data_sampling: Not affected + itlb_multihit: KVM: Mitigation of VMX disabled + l1tf: Mitigation of PTE Inversion; VMX: conditional cache flushes SMT vulnerable + mds: Mitigation of Clear buffers; SMT vulnerable + meltdown: Mitigation of PTI + mmio_stale_data: Unknown: No mitigations + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: conditional RSB filling PBRSB-eIBRS: Not affected + srbds: Mitigation of Microcode + tsx_async_abort: Not affected
- nVidia RTX 4090 mobile: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + reg_file_data_sampling: Not affected + retbleed: Not affected + spec_rstack_overflow: Mitigation of Safe RET + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; STIBP: always-on; RSB filling; PBRSB-eIBRS: Not affected; BHI: Not affected + srbds: Not affected + tsx_async_abort: Not affected

OpenCL Details
- nVidia RTX 4090 mobile: GPU Compute Cores: 9728

Results Overview (viennacl: OpenCL BLAS)

Test      Unit      Radeon HD 8790M  IntelR HD Graphics 4600  Intel HD Graphics 4600  nVidia RTX 4090 mobile
                                     HSW GT2 0x416            HSW GT2 CLANG70
dGEMM-TT  GFLOPs/s  38.5             -                        -                       681
dGEMM-TN  GFLOPs/s  37.7             -                        -                       666
dGEMM-NT  GFLOPs/s  39.7             -                        -                       637
dGEMM-NN  GFLOPs/s  39.0             -                        -                       620
dGEMV-T   GB/s      22.7             -                        -                       373
dGEMV-N   GB/s      34.5             -                        -                       196
dDOT      GB/s      44.7             -                        -                       518
dAXPY     GB/s      40.6             -                        -                       521
dCOPY     GB/s      35.5             -                        -                       469
sCOPY     GB/s      29.5             14.2                     14.4                    267
sAXPY     GB/s      31.1             14.2                     13.3                    437
sDOT      GB/s      27.3             16.1                     15.3                    375
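As a reading aid, the following Python sketch computes how many times higher the nVidia RTX 4090 mobile result is than the Radeon HD 8790M result for each test. The values are copied from this page. The names `RESULTS` and `speedups` are illustrative and not part of the Phoronix Test Suite. The ratios compare two different machines (different CPU, driver, and OpenCL stack), so they describe whole-system results, not the GPUs in isolation.

```python
# (test, unit, Radeon HD 8790M, nVidia RTX 4090 mobile), values copied from this page.
RESULTS = [
    ("dGEMM-TT", "GFLOPs/s", 38.5, 681.0),
    ("dGEMM-TN", "GFLOPs/s", 37.7, 666.0),
    ("dGEMM-NT", "GFLOPs/s", 39.7, 637.0),
    ("dGEMM-NN", "GFLOPs/s", 39.0, 620.0),
    ("dGEMV-T", "GB/s", 22.7, 373.0),
    ("dGEMV-N", "GB/s", 34.5, 196.0),
    ("dDOT", "GB/s", 44.7, 518.0),
    ("dAXPY", "GB/s", 40.6, 521.0),
    ("dCOPY", "GB/s", 35.5, 469.0),
    ("sCOPY", "GB/s", 29.5, 267.0),
    ("sAXPY", "GB/s", 31.1, 437.0),
    ("sDOT", "GB/s", 27.3, 375.0),
]


def speedups(results):
    """Return {test: nvidia / radeon}, rounded to one decimal place."""
    return {
        test: round(nvidia / radeon, 1)
        for test, _unit, radeon, nvidia in results
    }


if __name__ == "__main__":
    # Print one line per test, e.g. the ratio for dGEMM-TT.
    for test, ratio in speedups(RESULTS).items():
        print(f"{test:9s} {ratio:5.1f}x")
```

Both columns of a given row share the same unit and "more is better" direction, so a plain ratio is meaningful within a row; ratios should not be compared across GFLOPs/s and GB/s rows as if they measured the same thing.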

ViennaCL

Test: OpenCL BLAS - dGEMM-TT

ViennaCL 1.7.1, GFLOPs/s (more is better)
nVidia RTX 4090 mobile: 681.0 (SE +/- 0.54, N = 14)
Radeon HD 8790M: 38.5 (SE +/- 0.17, N = 3)
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMM-TN

ViennaCL 1.7.1, GFLOPs/s (more is better)
nVidia RTX 4090 mobile: 666.0 (SE +/- 0.55, N = 14)
Radeon HD 8790M: 37.7 (SE +/- 0.06, N = 3)
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMM-NT

ViennaCL 1.7.1, GFLOPs/s (more is better)
nVidia RTX 4090 mobile: 637.0 (SE +/- 0.46, N = 14)
Radeon HD 8790M: 39.7 (SE +/- 0.13, N = 3)
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMM-NN

ViennaCL 1.7.1, GFLOPs/s (more is better)
nVidia RTX 4090 mobile: 620.0 (SE +/- 0.41, N = 14)
Radeon HD 8790M: 39.0 (SE +/- 0.07, N = 3)
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMV-T

ViennaCL 1.7.1, GB/s (more is better)
nVidia RTX 4090 mobile: 373.0 (SE +/- 1.41, N = 14)
Radeon HD 8790M: 22.7 (SE +/- 0.07, N = 3)
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMV-N

ViennaCL 1.7.1, GB/s (more is better)
nVidia RTX 4090 mobile: 196.0 (SE +/- 0.07, N = 14)
Radeon HD 8790M: 34.5 (SE +/- 0.28, N = 3)
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dDOT

ViennaCL 1.7.1, GB/s (more is better)
nVidia RTX 4090 mobile: 518.0 (SE +/- 0.45, N = 14)
Radeon HD 8790M: 44.7 (SE +/- 0.10, N = 3)
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dAXPY

ViennaCL 1.7.1, GB/s (more is better)
nVidia RTX 4090 mobile: 521.0 (SE +/- 0.33, N = 14)
Radeon HD 8790M: 40.6 (SE +/- 0.03, N = 3)
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dCOPY

ViennaCL 1.7.1, GB/s (more is better)
nVidia RTX 4090 mobile: 469.0 (SE +/- 0.39, N = 14)
Radeon HD 8790M: 35.5 (SE +/- 0.09, N = 3)
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - sCOPY

ViennaCL 1.7.1, GB/s (more is better)
nVidia RTX 4090 mobile: 267.0 (SE +/- 4.49, N = 14)
Radeon HD 8790M: 29.5 (SE +/- 0.07, N = 3)
Intel HD Graphics 4600 HSW GT2 CLANG70: 14.7 (SE +/- 0.13, N = 3)
IntelR HD Graphics 4600 HSW GT2 0x416: 14.4 (SE +/- 0.11, N = 15)

ViennaCL

Test: OpenCL BLAS - sAXPY

ViennaCL 1.7.1, GB/s (more is better)
nVidia RTX 4090 mobile: 437.0 (SE +/- 0.54, N = 14)
Radeon HD 8790M: 31.1 (SE +/- 0.03, N = 3)
IntelR HD Graphics 4600 HSW GT2 0x416: 14.2 (SE +/- 0.12, N = 3)
Intel HD Graphics 4600 HSW GT2 CLANG70: 13.5 (SE +/- 0.03, N = 3)

ViennaCL

Test: OpenCL BLAS - sDOT

ViennaCL 1.7.1, GB/s (more is better)
nVidia RTX 4090 mobile: 375.00 (SE +/- 1.06, N = 14)
Radeon HD 8790M: 27.30 (SE +/- 0.03, N = 3)
IntelR HD Graphics 4600 HSW GT2 0x416: 16.10 (SE +/- 0.10, N = 3)
Intel HD Graphics 4600 HSW GT2 CLANG70: 15.30 (SE +/- 0.08, N = 4)
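Each result above is reported with an "SE +/-" value and a run count N. The sketch below shows one way to read those numbers, assuming that SE is the standard error of the mean, computed as the sample standard deviation divided by sqrt(N). That definition is an assumption here and is not stated on this page. The function names are illustrative.

```python
import math


def implied_stddev(se, n):
    """Standard deviation implied by a standard error, assuming SE = s / sqrt(N)."""
    return se * math.sqrt(n)


def relative_se_percent(mean, se):
    """Standard error expressed as a percentage of the reported mean."""
    return 100.0 * se / mean


if __name__ == "__main__":
    # sCOPY on the nVidia RTX 4090 mobile: 267.0 GB/s, SE +/- 4.49, N = 14.
    print(round(relative_se_percent(267.0, 4.49), 2))
    # dGEMM-TT on the Radeon HD 8790M: SE +/- 0.17, N = 3.
    print(round(implied_stddev(0.17, 3), 3))
```

A relative SE of a few percent or less, as in most results here, suggests the run-to-run spread is small compared with the differences between devices.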


Phoronix Test Suite v10.8.5