clpeak benchmark
AMD EPYC 7262 8-Core testing with a GIGABYTE MZ32-AR0-00 v01000100 (R21 BIOS) and NVIDIA GeForce RTX 4090 24GB on Ubuntu 22.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2401230-NE-CLPEAKBEN88&grs .
clpeak benchmark
NVIDIA GeForce RTX 4090

Processor: AMD EPYC 7262 8-Core @ 3.20GHz (8 Cores / 16 Threads)
Motherboard: GIGABYTE MZ32-AR0-00 v01000100 (R21 BIOS)
Chipset: AMD Starship/Matisse
Memory: 128GB
Disk: 1000GB Samsung SSD 980 PRO 1TB
Graphics: NVIDIA GeForce RTX 4090 24GB
Audio: NVIDIA Device 22ba
Monitor: DELL U2720Q
Network: 2 x Intel I350
OS: Ubuntu 22.04
Kernel: 6.5.0-14-generic (x86_64)
Desktop: GNOME Shell 42.9
Display Server: X Server 1.21.1.4
Display Driver: NVIDIA 535.154.05
OpenGL: 4.6.0
OpenCL: OpenCL 3.0 CUDA 12.2.148
Vulkan: 1.3.242
Compiler: GCC 11.4.0 + CUDA 11.8
File-System: ext4
Screen Resolution: 3840x2160

- Transparent Huge Pages: madvise
- Compiler configuration: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
- Scaling Governor: acpi-cpufreq performance (Boost: Enabled)
- CPU Microcode: 0x830107a
- BAR1 / Visible vRAM Size: 256 MiB
- vBIOS Version: 95.02.18.80.53
- GPU Compute Cores: 16384
- Security mitigations:
    gather_data_sampling: Not affected
    itlb_multihit: Not affected
    l1tf: Not affected
    mds: Not affected
    meltdown: Not affected
    mmio_stale_data: Not affected
    retbleed: Mitigation of untrained return thunk; SMT enabled with STIBP protection
    spec_rstack_overflow: Mitigation of safe RET
    spec_store_bypass: Mitigation of SSB disabled via prctl
    spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
    spectre_v2: Mitigation of Retpolines IBPB: conditional STIBP: always-on RSB filling PBRSB-eIBRS: Not affected
    srbds: Not affected
    tsx_async_abort: Not affected
clpeak benchmark: result summary

Test                                            Unit     NVIDIA GeForce RTX 4090
clpeak: Transfer Bandwidth enqueueWriteBuffer   GBPS     11.71
clpeak: Transfer Bandwidth enqueueReadBuffer    GBPS     9.32
clpeak: Single-Precision Compute                GFLOPS   78861.24
clpeak: Double-Precision Compute                GFLOPS   1346.22
clpeak: Global Memory Bandwidth                 GBPS     869.39
clpeak: Integer 24-bit Compute                  GIOPS    40776.55
clpeak: Integer Compute                         GIOPS    40578.10
clpeak: Kernel Latency                          us       6.21
clpeak 1.1.2, OpenCL Test: Transfer Bandwidth enqueueWriteBuffer (GBPS, More Is Better)
NVIDIA GeForce RTX 4090: 11.71 (SE +/- 0.08, N = 3)
1. (CXX) g++ options: -O3
clpeak 1.1.2, OpenCL Test: Transfer Bandwidth enqueueReadBuffer (GBPS, More Is Better)
NVIDIA GeForce RTX 4090: 9.32 (SE +/- 0.04, N = 3)
1. (CXX) g++ options: -O3
clpeak 1.1.2, OpenCL Test: Single-Precision Compute (GFLOPS, More Is Better)
NVIDIA GeForce RTX 4090: 78861.24 (SE +/- 505.70, N = 3)
1. (CXX) g++ options: -O3
clpeak 1.1.2, OpenCL Test: Double-Precision Compute (GFLOPS, More Is Better)
NVIDIA GeForce RTX 4090: 1346.22 (SE +/- 0.63, N = 3)
1. (CXX) g++ options: -O3
clpeak 1.1.2, OpenCL Test: Global Memory Bandwidth (GBPS, More Is Better)
NVIDIA GeForce RTX 4090: 869.39 (SE +/- 2.22, N = 3)
1. (CXX) g++ options: -O3
clpeak 1.1.2, OpenCL Test: Integer 24-bit Compute (GIOPS, More Is Better)
NVIDIA GeForce RTX 4090: 40776.55 (SE +/- 82.15, N = 3)
1. (CXX) g++ options: -O3
clpeak 1.1.2, OpenCL Test: Integer Compute (GIOPS, More Is Better)
NVIDIA GeForce RTX 4090: 40578.10 (SE +/- 186.80, N = 3)
1. (CXX) g++ options: -O3
clpeak 1.1.2, OpenCL Test: Kernel Latency (us, Fewer Is Better)
NVIDIA GeForce RTX 4090: 6.21 (SE +/- 0.02, N = 3)
1. (CXX) g++ options: -O3
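
The means and standard errors reported above can be turned into a few derived figures, such as the standard error as a percentage of the mean and the ratio between two tests. The following is a minimal Python sketch of that arithmetic, not part of the Phoronix Test Suite output. The numbers are copied from the results above; the names `results`, `relative_se_percent` and `ratio` are hypothetical and chosen only for illustration.

```python
# Means and standard errors copied from the clpeak results above
# (NVIDIA GeForce RTX 4090, N = 3 runs per test).
# Layout: test name -> (mean, standard error)
results = {
    "Transfer Bandwidth enqueueWriteBuffer": (11.71, 0.08),    # GBPS
    "Transfer Bandwidth enqueueReadBuffer": (9.32, 0.04),      # GBPS
    "Single-Precision Compute": (78861.24, 505.70),            # GFLOPS
    "Double-Precision Compute": (1346.22, 0.63),               # GFLOPS
    "Global Memory Bandwidth": (869.39, 2.22),                 # GBPS
    "Integer 24-bit Compute": (40776.55, 82.15),               # GIOPS
    "Integer Compute": (40578.10, 186.80),                     # GIOPS
    "Kernel Latency": (6.21, 0.02),                            # us
}


def relative_se_percent(mean, se):
    """Standard error expressed as a percentage of the mean."""
    return 100.0 * se / mean


def ratio(numerator_test, denominator_test):
    """Ratio of the mean of one test to the mean of another."""
    return results[numerator_test][0] / results[denominator_test][0]


if __name__ == "__main__":
    # Run-to-run spread of each test relative to its mean.
    for name, (mean, se) in results.items():
        print(f"{name}: {relative_se_percent(mean, se):.2f}% relative SE")

    # Single-precision versus double-precision throughput.
    fp_ratio = ratio("Single-Precision Compute", "Double-Precision Compute")
    print(f"FP32 / FP64 throughput ratio: {fp_ratio:.1f}")

    # Host-to-device versus device-to-host transfer bandwidth.
    xfer_ratio = ratio("Transfer Bandwidth enqueueWriteBuffer",
                       "Transfer Bandwidth enqueueReadBuffer")
    print(f"Write / read transfer bandwidth ratio: {xfer_ratio:.2f}")
```

Only the ratios within this one result file are computed here; comparing against another system would need a second set of values from its own result file.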
Phoronix Test Suite v10.8.5