clpeak benchmark

AMD EPYC 7262 8-Core testing with a GIGABYTE MZ32-AR0-00 v01000100 (R21 BIOS) and NVIDIA GeForce RTX 4090 24GB on Ubuntu 22.04 via the Phoronix Test Suite.

HTML result view exported from: https://openbenchmarking.org/result/2401230-NE-CLPEAKBEN88.

clpeak benchmarkProcessorMotherboardChipsetMemoryDiskGraphicsAudioMonitorNetworkOSKernelDesktopDisplay ServerDisplay DriverOpenGLOpenCLVulkanCompilerFile-SystemScreen ResolutionNVIDIA GeForce RTX 4090AMD EPYC 7262 8-Core @ 3.20GHz (8 Cores / 16 Threads)GIGABYTE MZ32-AR0-00 v01000100 (R21 BIOS)AMD Starship/Matisse128GB1000GB Samsung SSD 980 PRO 1TBNVIDIA GeForce RTX 4090 24GBNVIDIA Device 22baDELL U2720Q2 x Intel I350Ubuntu 22.046.5.0-14-generic (x86_64)GNOME Shell 42.9X Server 1.21.1.4NVIDIA 535.154.054.6.0OpenCL 3.0 CUDA 12.2.1481.3.242GCC 11.4.0 + CUDA 11.8ext43840x2160OpenBenchmarking.org- Transparent Huge Pages: madvise- --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v - Scaling Governor: acpi-cpufreq performance (Boost: Enabled) - CPU Microcode: 0x830107a - BAR1 / Visible vRAM Size: 256 MiB - vBIOS Version: 95.02.18.80.53- GPU Compute Cores: 16384- gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Mitigation of untrained return thunk; SMT enabled with STIBP protection + spec_rstack_overflow: Mitigation of safe RET + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines IBPB: conditional STIBP: always-on RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected

clpeak benchmarkclpeak: Kernel Latencyclpeak: Integer Computeclpeak: Integer 24-bit Computeclpeak: Global Memory Bandwidthclpeak: Double-Precision Computeclpeak: Single-Precision Computeclpeak: Transfer Bandwidth enqueueReadBufferclpeak: Transfer Bandwidth enqueueWriteBufferNVIDIA GeForce RTX 40906.2140578.1040776.55869.391346.2278861.249.3211.71OpenBenchmarking.org

clpeak

OpenCL Test: Kernel Latency

OpenBenchmarking.orgus, Fewer Is Betterclpeak 1.1.2OpenCL Test: Kernel LatencyNVIDIA GeForce RTX 4090246810SE +/- 0.02, N = 36.211. (CXX) g++ options: -O3

clpeak

OpenCL Test: Integer Compute

OpenBenchmarking.orgGIOPS, More Is Betterclpeak 1.1.2OpenCL Test: Integer ComputeNVIDIA GeForce RTX 40909K18K27K36K45KSE +/- 186.80, N = 340578.101. (CXX) g++ options: -O3

clpeak

OpenCL Test: Integer 24-bit Compute

OpenBenchmarking.orgGIOPS, More Is Betterclpeak 1.1.2OpenCL Test: Integer 24-bit ComputeNVIDIA GeForce RTX 40909K18K27K36K45KSE +/- 82.15, N = 340776.551. (CXX) g++ options: -O3

clpeak

OpenCL Test: Global Memory Bandwidth

OpenBenchmarking.orgGBPS, More Is Betterclpeak 1.1.2OpenCL Test: Global Memory BandwidthNVIDIA GeForce RTX 40902004006008001000SE +/- 2.22, N = 3869.391. (CXX) g++ options: -O3

clpeak

OpenCL Test: Double-Precision Compute

OpenBenchmarking.orgGFLOPS, More Is Betterclpeak 1.1.2OpenCL Test: Double-Precision ComputeNVIDIA GeForce RTX 409030060090012001500SE +/- 0.63, N = 31346.221. (CXX) g++ options: -O3

clpeak

OpenCL Test: Single-Precision Compute

OpenBenchmarking.orgGFLOPS, More Is Betterclpeak 1.1.2OpenCL Test: Single-Precision ComputeNVIDIA GeForce RTX 409020K40K60K80K100KSE +/- 505.70, N = 378861.241. (CXX) g++ options: -O3

clpeak

OpenCL Test: Transfer Bandwidth enqueueReadBuffer

OpenBenchmarking.orgGBPS, More Is Betterclpeak 1.1.2OpenCL Test: Transfer Bandwidth enqueueReadBufferNVIDIA GeForce RTX 40903691215SE +/- 0.04, N = 39.321. (CXX) g++ options: -O3

clpeak

OpenCL Test: Transfer Bandwidth enqueueWriteBuffer

OpenBenchmarking.orgGBPS, More Is Betterclpeak 1.1.2OpenCL Test: Transfer Bandwidth enqueueWriteBufferNVIDIA GeForce RTX 40903691215SE +/- 0.08, N = 311.711. (CXX) g++ options: -O3


Phoronix Test Suite v10.8.4