VIENNACL CL BLAS

AMD Ryzen 9 7945HX testing with an Alienware 0DWD2H (1.13.1 BIOS) and NVIDIA GeForce RTX 4090 Laptop GPU 16GB on cachyos rolling via the Phoronix Test Suite.

Compare your own system(s) to this result file with the Phoronix Test Suite by running the command: phoronix-test-suite benchmark 2409213-EIRI-240307070
Run Management

Result Identifier                         Date Run           Test Duration
Radeon HD 8790M                           November 29 2023   19 Minutes
IntelR HD Graphics 4600 HSW GT2 0x416     February 27        2 Minutes
Intel HD Graphics 4600 HSW GT2 CLANG70    March 08           5 Minutes
nVidia RTX 4090 mobile                    September 21       55 Minutes
Average                                                      20 Minutes



Radeon HD 8790M
  Processor: Intel Core i5-4300M @ 3.30GHz (2 Cores / 4 Threads)
  Motherboard: Dell 0VWNW8 (A26 BIOS)
  Chipset: Intel Xeon E3-1200 v3/4th
  Memory: 8GB
  Disk: 128GB SAMSUNG SSD PM85
  Graphics: AMD Radeon HD 8790M (1250MHz)
  Audio: Intel Xeon E3-1200 v3/4th
  Network: Intel I217-LM + Intel Centrino Ultimate-N 6300
  OS: cachyos rolling
  Kernel: 6.6.2-4-cachyos-lto (x86_64)
  Desktop: GNOME Shell 45.1
  Display Server: X Server 1.21.1.9
  OpenGL: 4.6 Mesa 24.0.0-devel (git-023fa0aa5d) (LLVM 16.0.6 DRM 3.54)
  OpenCL: OpenCL 1.1 Mesa 24.0.0-devel (git-023fa0aa5d)
  Compiler: GCC 13.2.1 20231110 + Clang 16.0.6 + LLVM 16.0.6 + CUDA 12.3
  File-System: xfs
  Screen Resolution: 1920x1080

IntelR HD Graphics 4600 HSW GT2 0x416 (only fields that differ from the preceding system are listed)
  Graphics: Intel HD 4600 HSW GT2 2GB (1250MHz)
  Kernel: 6.7.6-1-cachyos-rt-bore-lto (x86_64)
  Desktop: KDE Plasma 5.27.10
  Display Server: X Server 1.21.1.11
  OpenGL: 4.6 Mesa 24.0.1-arch1.1
  OpenCL: OpenCL 2.0 beignet 1.4 (git-f72309a5)
  Compiler: GCC 13.2.1 20230801 + Clang 16.0.6 + LLVM 16.0.6

Intel HD Graphics 4600 HSW GT2 CLANG70 (only fields that differ from the preceding system are listed)
  Kernel: 6.7.9-1-cachyos-rt-bore-lto (x86_64)
  Desktop: KDE Plasma 6.0.1
  OpenGL: 4.6 Mesa 24.0.2-arch1.2
  Compiler: Clang 17.0.6 + GCC 13.2.1 20230801 + LLVM 17.0.6

nVidia RTX 4090 mobile (only fields that differ from the preceding system are listed)
  Processor: AMD Ryzen 9 7945HX @ 5.46GHz (16 Cores / 32 Threads)
  Motherboard: Alienware 0DWD2H (1.13.1 BIOS)
  Chipset: AMD Device 14d8
  Memory: 62GB
  Disk: PC SN810 NVMe WDC 2048GB + 4001GB CT4000P3SSD8
  Graphics: NVIDIA GeForce RTX 4090 Laptop GPU 16GB
  Audio: NVIDIA Device 22bb
  Network: Realtek RTL8125 2.5GbE + Qualcomm QCNFA765
  Kernel: 6.11.0-5-cachyos-lto (x86_64)
  Desktop: GNOME Shell 47.0
  Display Server: X Server 1.21.1.13
  Display Driver: NVIDIA 560.35.03
  OpenGL: 4.6.0
  OpenCL: OpenCL 3.0 CUDA 12.6.65
  Compiler: GCC 14.2.1 20240910 + Clang 18.1.8 + LLVM 18.1.8 + CUDA 12.6
  File-System: zfs
  Screen Resolution: 2560x1600

Kernel Details
- Radeon HD 8790M: cfg80211.cfg80211_disable_40mhz_24ghz=1 mac80211.minstrel_vht_only=1 - Transparent Huge Pages: always
- nVidia RTX 4090 mobile: Transparent Huge Pages: always

Environment Details
- Radeon HD 8790M: DRI_PRIME=1 NVCC_PREPEND_FLAGS="-ccbin /opt/cuda/bin"
- nVidia RTX 4090 mobile: MUTTER_DEBUG_KMS_THREAD_TYPE=user

Compiler Details
- Radeon HD 8790M: --disable-libssp --disable-libstdcxx-pch --disable-werror --enable-__cxa_atexit --enable-bootstrap --enable-cet=auto --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-default-ssp --enable-gnu-indirect-function --enable-gnu-unique-object --enable-languages=ada,c,c++,d,fortran,go,lto,objc,obj-c++ --enable-libstdcxx-backtrace --enable-link-serialization=1 --enable-lto --enable-multilib --enable-plugin --enable-shared --enable-threads=posix --mandir=/usr/share/man --with-build-config=bootstrap-lto --with-linker-hash-style=gnu
- IntelR HD Graphics 4600 HSW GT2 0x416: --disable-libssp --disable-libstdcxx-pch --disable-werror --enable-__cxa_atexit --enable-bootstrap --enable-cet=auto --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-default-ssp --enable-gnu-indirect-function --enable-gnu-unique-object --enable-languages=ada,c,c++,d,fortran,go,lto,m2,objc,obj-c++ --enable-libstdcxx-backtrace --enable-link-serialization=1 --enable-lto --enable-multilib --enable-plugin --enable-shared --enable-threads=posix --mandir=/usr/share/man --with-build-config=bootstrap-lto --with-linker-hash-style=gnu
- nVidia RTX 4090 mobile: --disable-libssp --disable-libstdcxx-pch --disable-werror --enable-__cxa_atexit --enable-bootstrap --enable-cet=auto --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-default-ssp --enable-gnu-indirect-function --enable-gnu-unique-object --enable-languages=ada,c,c++,d,fortran,go,lto,m2,objc,obj-c++,rust --enable-libstdcxx-backtrace --enable-link-serialization=1 --enable-lto --enable-multilib --enable-plugin --enable-shared --enable-threads=posix --mandir=/usr/share/man --with-build-config=bootstrap-lto --with-linker-hash-style=gnu

Processor Details
- Radeon HD 8790M: Scaling Governor: intel_cpufreq performance - CPU Microcode: 0x28
- IntelR HD Graphics 4600 HSW GT2 0x416: Scaling Governor: intel_cpufreq powersave - CPU Microcode: 0x28
- Intel HD Graphics 4600 HSW GT2 CLANG70: Scaling Governor: intel_cpufreq performance - CPU Microcode: 0x28
- nVidia RTX 4090 mobile: Scaling Governor: amd-pstate-epp performance (Boost: Enabled EPP: performance) - CPU Microcode: 0xa601206

Security Details
- Radeon HD 8790M:
  gather_data_sampling: Not affected
  itlb_multihit: KVM: Mitigation of VMX disabled
  l1tf: Mitigation of PTE Inversion; VMX: vulnerable
  mds: Vulnerable; SMT vulnerable
  meltdown: Vulnerable
  mmio_stale_data: Unknown: No mitigations
  retbleed: Not affected
  spec_rstack_overflow: Not affected
  spec_store_bypass: Vulnerable
  spectre_v1: Vulnerable: __user pointer sanitization and usercopy barriers only; no swapgs barriers
  spectre_v2: Vulnerable IBPB: disabled STIBP: disabled PBRSB-eIBRS: Not affected
  srbds: Vulnerable
  tsx_async_abort: Not affected
- IntelR HD Graphics 4600 HSW GT2 0x416:
  gather_data_sampling: Not affected
  itlb_multihit: KVM: Mitigation of VMX disabled
  l1tf: Mitigation of PTE Inversion; VMX: conditional cache flushes SMT vulnerable
  mds: Mitigation of Clear buffers; SMT vulnerable
  meltdown: Mitigation of PTI
  mmio_stale_data: Unknown: No mitigations
  retbleed: Not affected
  spec_rstack_overflow: Not affected
  spec_store_bypass: Mitigation of SSB disabled via prctl
  spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
  spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: conditional RSB filling PBRSB-eIBRS: Not affected
  srbds: Mitigation of Microcode
  tsx_async_abort: Not affected
- Intel HD Graphics 4600 HSW GT2 CLANG70:
  gather_data_sampling: Not affected
  itlb_multihit: KVM: Mitigation of VMX disabled
  l1tf: Mitigation of PTE Inversion; VMX: conditional cache flushes SMT vulnerable
  mds: Mitigation of Clear buffers; SMT vulnerable
  meltdown: Mitigation of PTI
  mmio_stale_data: Unknown: No mitigations
  retbleed: Not affected
  spec_rstack_overflow: Not affected
  spec_store_bypass: Mitigation of SSB disabled via prctl
  spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
  spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: conditional RSB filling PBRSB-eIBRS: Not affected
  srbds: Mitigation of Microcode
  tsx_async_abort: Not affected
- nVidia RTX 4090 mobile:
  gather_data_sampling: Not affected
  itlb_multihit: Not affected
  l1tf: Not affected
  mds: Not affected
  meltdown: Not affected
  mmio_stale_data: Not affected
  reg_file_data_sampling: Not affected
  retbleed: Not affected
  spec_rstack_overflow: Mitigation of Safe RET
  spec_store_bypass: Mitigation of SSB disabled via prctl
  spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
  spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; STIBP: always-on; RSB filling; PBRSB-eIBRS: Not affected; BHI: Not affected
  srbds: Not affected
  tsx_async_abort: Not affected

OpenCL Details
- nVidia RTX 4090 mobile: GPU Compute Cores: 9728

VIENNACL CL BLAS

viennacl: OpenCL BLAS   Radeon HD 8790M   IntelR HD Graphics 4600 HSW GT2 0x416   Intel HD Graphics 4600 HSW GT2 CLANG70   nVidia RTX 4090 mobile
sCOPY                   29.5              14.2                                    14.4                                     267
sAXPY                   31.1              14.2                                    13.3                                     437
sDOT                    27.3              16.1                                    15.3                                     375
dCOPY                   35.5              -                                       -                                        469
dAXPY                   40.6              -                                       -                                        521
dDOT                    44.7              -                                       -                                        518
dGEMV-N                 34.5              -                                       -                                        196
dGEMV-T                 22.7              -                                       -                                        373
dGEMM-NN                39.0              -                                       -                                        620
dGEMM-NT                39.7              -                                       -                                        637
dGEMM-TN                37.7              -                                       -                                        666
dGEMM-TT                38.5              -                                       -                                        681
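To put the two fully tested GPUs side by side, the summary figures can be turned into per-test ratios. The following is a minimal sketch, not part of the Phoronix Test Suite: the dictionary and function names are illustrative, the values are copied from the summary table for the Radeon HD 8790M and nVidia RTX 4090 mobile runs, and the geometric mean is used here only as one reasonable way to average ratios.

```python
import math

# Values copied from the summary table (GB/s for vector and GEMV tests,
# GFLOPs/s for the dGEMM tests). Names below are illustrative.
RADEON_HD_8790M = {
    "sCOPY": 29.5, "sAXPY": 31.1, "sDOT": 27.3,
    "dCOPY": 35.5, "dAXPY": 40.6, "dDOT": 44.7,
    "dGEMV-N": 34.5, "dGEMV-T": 22.7,
    "dGEMM-NN": 39.0, "dGEMM-NT": 39.7, "dGEMM-TN": 37.7, "dGEMM-TT": 38.5,
}
RTX_4090_MOBILE = {
    "sCOPY": 267, "sAXPY": 437, "sDOT": 375,
    "dCOPY": 469, "dAXPY": 521, "dDOT": 518,
    "dGEMV-N": 196, "dGEMV-T": 373,
    "dGEMM-NN": 620, "dGEMM-NT": 637, "dGEMM-TN": 666, "dGEMM-TT": 681,
}


def speedups(baseline, candidate):
    """Return candidate/baseline for every test present in both runs."""
    return {t: candidate[t] / baseline[t] for t in baseline if t in candidate}


def geometric_mean(values):
    """Geometric mean of positive numbers, computed via logarithms."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


if __name__ == "__main__":
    ratios = speedups(RADEON_HD_8790M, RTX_4090_MOBILE)
    for test, ratio in ratios.items():
        print(f"{test:9s} {ratio:6.1f}x")
    print(f"geometric mean {geometric_mean(ratios.values()):.1f}x")
```

Each ratio is taken within a single test, so the mix of GB/s and GFLOPs/s units does not affect the result.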

ViennaCL

ViennaCL is an open-source linear algebra library written in C++ with support for OpenCL and OpenMP. This test profile makes use of ViennaCL's built-in benchmarks. Learn more via the OpenBenchmarking.org test page.
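For orientation, the test names follow standard BLAS naming: a leading s or d denotes single or double precision, and COPY, AXPY, DOT, GEMV and GEMM are the usual vector and matrix kernels. The following is a minimal pure-Python reference sketch of what each kernel computes. It is not ViennaCL code, the function names are illustrative, and reading the -N/-T suffixes as non-transposed/transposed operands is assumed from BLAS convention rather than stated in this result file.

```python
# Reference semantics of the BLAS kernels named in the tests.
# Illustrative only: this is not ViennaCL's OpenCL implementation.

def copy(x):
    """COPY: y <- x"""
    return list(x)


def axpy(alpha, x, y):
    """AXPY: y <- alpha * x + y"""
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def dot(x, y):
    """DOT: sum of element-wise products of x and y"""
    return sum(xi * yi for xi, yi in zip(x, y))


def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def gemv(a, x, trans=False):
    """GEMV: A*x (assumed '-N') or A^T*x (assumed '-T')"""
    m = transpose(a) if trans else a
    return [dot(row, x) for row in m]


def gemm(a, b, trans_a=False, trans_b=False):
    """GEMM: op(A)*op(B); the two flags are assumed to map to NN/NT/TN/TT"""
    left = transpose(a) if trans_a else a
    right = transpose(b) if trans_b else b
    cols = transpose(right)
    return [[dot(row, col) for col in cols] for row in left]


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]
    x = [1.0, 1.0]
    print(axpy(2.0, x, [0.5, 0.5]))
    print(gemv(a, x))
    print(gemv(a, x, trans=True))
    print(gemm(a, a, trans_b=True))
```

COPY, AXPY, DOT and GEMV do little arithmetic per byte moved, which fits their being reported in GB/s, while GEMM is reported in GFLOPs/s.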

ViennaCL 1.7.1 - Test: OpenCL BLAS - sCOPY (GB/s, More Is Better)
  Intel HD Graphics 4600 HSW GT2 CLANG70: 13.4 (SE +/- 0.06, N = 3)
  IntelR HD Graphics 4600 HSW GT2 0x416: 13.6 (SE +/- 0.12, N = 7)
  Radeon HD 8790M: 29.5 (SE +/- 0.07, N = 3)
  nVidia RTX 4090 mobile: 267.0 (SE +/- 4.49, N = 14)

ViennaCL 1.7.1 - Test: OpenCL BLAS - sAXPY (GB/s, More Is Better)
  IntelR HD Graphics 4600 HSW GT2 0x416: 13.2 (SE +/- 0.30, N = 14)
  Intel HD Graphics 4600 HSW GT2 CLANG70: 13.3 (SE +/- 0.09, N = 3)
  Radeon HD 8790M: 31.1 (SE +/- 0.03, N = 3)
  nVidia RTX 4090 mobile: 437.0 (SE +/- 0.54, N = 14)

ViennaCL 1.7.1 - Test: OpenCL BLAS - sDOT (GB/s, More Is Better)
  IntelR HD Graphics 4600 HSW GT2 0x416: 14.61 (SE +/- 0.53, N = 14)
  Intel HD Graphics 4600 HSW GT2 CLANG70: 15.00 (SE +/- 0.06, N = 3)
  Radeon HD 8790M: 27.30 (SE +/- 0.03, N = 3)
  nVidia RTX 4090 mobile: 375.00 (SE +/- 1.06, N = 14)

ViennaCL 1.7.1 - Test: OpenCL BLAS - dCOPY (GB/s, More Is Better)
  Radeon HD 8790M: 35.5 (SE +/- 0.09, N = 3)
  nVidia RTX 4090 mobile: 469.0 (SE +/- 0.39, N = 14)
  1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL 1.7.1 - Test: OpenCL BLAS - dAXPY (GB/s, More Is Better)
  Radeon HD 8790M: 40.6 (SE +/- 0.03, N = 3)
  nVidia RTX 4090 mobile: 521.0 (SE +/- 0.33, N = 14)
  1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL 1.7.1 - Test: OpenCL BLAS - dDOT (GB/s, More Is Better)
  Radeon HD 8790M: 44.7 (SE +/- 0.10, N = 3)
  nVidia RTX 4090 mobile: 518.0 (SE +/- 0.45, N = 14)
  1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL 1.7.1 - Test: OpenCL BLAS - dGEMV-N (GB/s, More Is Better)
  Radeon HD 8790M: 34.5 (SE +/- 0.28, N = 3)
  nVidia RTX 4090 mobile: 196.0 (SE +/- 0.07, N = 14)
  1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL 1.7.1 - Test: OpenCL BLAS - dGEMV-T (GB/s, More Is Better)
  Radeon HD 8790M: 22.7 (SE +/- 0.07, N = 3)
  nVidia RTX 4090 mobile: 373.0 (SE +/- 1.41, N = 14)
  1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL 1.7.1 - Test: OpenCL BLAS - dGEMM-NN (GFLOPs/s, More Is Better)
  Radeon HD 8790M: 39.0 (SE +/- 0.07, N = 3)
  nVidia RTX 4090 mobile: 620.0 (SE +/- 0.41, N = 14)
  1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL 1.7.1 - Test: OpenCL BLAS - dGEMM-NT (GFLOPs/s, More Is Better)
  Radeon HD 8790M: 39.7 (SE +/- 0.13, N = 3)
  nVidia RTX 4090 mobile: 637.0 (SE +/- 0.46, N = 14)
  1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL 1.7.1 - Test: OpenCL BLAS - dGEMM-TN (GFLOPs/s, More Is Better)
  Radeon HD 8790M: 37.7 (SE +/- 0.06, N = 3)
  nVidia RTX 4090 mobile: 666.0 (SE +/- 0.55, N = 14)
  1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL 1.7.1 - Test: OpenCL BLAS - dGEMM-TT (GFLOPs/s, More Is Better)
  Radeon HD 8790M: 38.5 (SE +/- 0.17, N = 3)
  nVidia RTX 4090 mobile: 681.0 (SE +/- 0.54, N = 14)
  1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL