gpuowl cs2 vkfft
AMD Ryzen 9 7950X 16-Core testing with an ASUS ROG STRIX X670E-E GAMING WIFI (1416 BIOS) and NVIDIA GeForce RTX 3080 10GB on Ubuntu 23.10 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2402242-PTS-GPUOWLCS40&grr .
System Details (configurations a, b, c)
  Processor: AMD Ryzen 9 7950X 16-Core @ 5.88GHz (16 Cores / 32 Threads)
  Motherboard: ASUS ROG STRIX X670E-E GAMING WIFI (1416 BIOS)
  Chipset: AMD Device 14d8
  Memory: 2 x 16GB DRAM-6000MT/s G Skill F5-6000J3038F16G
  Disk: 2000GB Samsung SSD 980 PRO 2TB + 4001GB Western Digital WD_BLACK SN850X 4000GB
  Graphics: NVIDIA GeForce RTX 3080 10GB
  Audio: NVIDIA GA102 HD Audio
  Monitor: DELL U2723QE
  Network: Intel I225-V + Intel Wi-Fi 6 AX210/AX211/AX411
  OS: Ubuntu 23.10
  Kernel: 6.7.0-060700-generic (x86_64)
  Desktop: GNOME Shell 45.2
  Display Server: X Server 1.21.1.7
  Display Driver: NVIDIA 550.40.07
  OpenGL: 4.6.0
  OpenCL: OpenCL 3.0 CUDA 12.4.74
  Compiler: GCC 13.2.0
  File-System: ext4
  Screen Resolution: 3840x2160

Kernel Details
  - Transparent Huge Pages: madvise

Compiler Details
  - --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-XYspKM/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-XYspKM/gcc-13-13.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v

Processor Details
  - Scaling Governor: amd-pstate-epp performance (EPP: performance)
  - CPU Microcode: 0xa601203

Graphics Details
  - BAR1 / Visible vRAM Size: 16384 MiB
  - vBIOS Version: 94.02.20.00.07

OpenCL Details
  - GPU Compute Cores: 8704

Security Details
  - gather_data_sampling: Not affected
  - itlb_multihit: Not affected
  - l1tf: Not affected
  - mds: Not affected
  - meltdown: Not affected
  - mmio_stale_data: Not affected
  - retbleed: Not affected
  - spec_rstack_overflow: Vulnerable: Safe RET no microcode
  - spec_store_bypass: Mitigation of SSB disabled via prctl
  - spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
  - spectre_v2: Mitigation of Enhanced / Automatic IBRS IBPB: conditional STIBP: always-on RSB filling PBRSB-eIBRS: Not affected
  - srbds: Not affected
  - tsx_async_abort: Not affected
Results Overview

Test                                                                  a         b         c
vkfft: FFT + iFFT C2C 1D batched in double precision                  25136     23693     25627
gpuowl: 77936867                                                      532.10    536.19    532.20
gpuowl: 332220523                                                     115.73    116.65    115.78
gpuowl: 57885161                                                      723.24    729.39    728.86
vkfft: FFT + iFFT C2C 1D batched in half precision                    148147    143220    145097
vkfft: FFT + iFFT C2C Bluestein benchmark in double precision         3724      3763      3758
vkfft: FFT + iFFT C2C multidimensional in single precision            46283     47650     48385
cs2: 3840 x 2160                                                      121.4     121.6     122.8
vkfft: FFT + iFFT C2C 1D batched in single precision                  113874    113952    113935
vkfft: FFT + iFFT C2C Bluestein in single precision                   13286     13225     13358
vkfft: FFT + iFFT C2C 1D batched in single precision, no reshuffling  116216    116227    116319
cs2: 2560 x 1440                                                      221.9     221.3     221.1
vkfft: FFT + iFFT R2C / C2R                                           51046     50803     49951
cs2: 1920 x 1200                                                      291.7     292.7     293.7
cs2: 1920 x 1080                                                      308.0     311.4     309.8
opencl-benchmark: Memory Bandwidth Coalesced Write                    721.83    721.79    721.72
opencl-benchmark: Memory Bandwidth Coalesced Read                     702.72    702.78    702.84
opencl-benchmark: INT8 Compute                                        12.108    12.078    12.173
opencl-benchmark: INT16 Compute                                       14.565    14.56     14.562
opencl-benchmark: INT32 Compute                                       16.861    16.666    16.921
opencl-benchmark: INT64 Compute                                       3.231     3.222     3.225
opencl-benchmark: FP32 Compute                                        32.873    32.915    32.797
opencl-benchmark: FP64 Compute                                        0.528     0.531     0.527
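The three columns can be compared by how far the runs drift apart relative to their mean. The following is a minimal sketch, not part of the Phoronix Test Suite: the helper name `spread_percent` and the choice of (max - min) / mean as the metric are assumptions made for illustration, and the numbers are copied from the overview for three of the tests.

```python
# Values copied from the results overview (runs a, b, c).
results = {
    "vkfft: FFT + iFFT C2C 1D batched in double precision": (25136, 23693, 25627),
    "gpuowl: 77936867": (532.10, 536.19, 532.20),
    "cs2: 3840 x 2160": (121.4, 121.6, 122.8),
}


def spread_percent(values):
    """Hypothetical helper: (max - min) / mean, expressed as a percentage."""
    mean = sum(values) / len(values)
    return (max(values) - min(values)) / mean * 100.0


for name, values in results.items():
    print(f"{name}: {spread_percent(values):.2f}% spread across a/b/c")
```

A larger spread suggests the test is noisier from run to run, which is worth keeping in mind next to the reported standard error before reading anything into small differences between a, b, and c.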
VkFFT 1.3.4, Test: FFT + iFFT C2C 1D batched in double precision
  Benchmark Score, More Is Better
  a: 25136, b: 23693, c: 25627
  SE +/- 311.90, N = 11
  (CXX) g++ options: -O3
GpuOwl 7.5, Exponent: 77936867
  Iterations / Second, More Is Better
  a: 532.10, b: 536.19, c: 532.20
  SE +/- 0.09, N = 3
  (CXX) g++ options: -O3 -lgmp -lOpenCL
GpuOwl 7.5, Exponent: 332220523
  Iterations / Second, More Is Better
  a: 115.73, b: 116.65, c: 115.78
  SE +/- 0.01, N = 3
  (CXX) g++ options: -O3 -lgmp -lOpenCL
GpuOwl 7.5, Exponent: 57885161
  Iterations / Second, More Is Better
  a: 723.24, b: 729.39, c: 728.86
  SE +/- 0.17, N = 3
  (CXX) g++ options: -O3 -lgmp -lOpenCL
VkFFT 1.3.4, Test: FFT + iFFT C2C 1D batched in half precision
  Benchmark Score, More Is Better
  a: 148147, b: 143220, c: 145097
  SE +/- 1616.73, N = 15
  (CXX) g++ options: -O3
VkFFT 1.3.4, Test: FFT + iFFT C2C Bluestein benchmark in double precision
  Benchmark Score, More Is Better
  a: 3724, b: 3763, c: 3758
  SE +/- 8.17, N = 3
  (CXX) g++ options: -O3
VkFFT 1.3.4, Test: FFT + iFFT C2C multidimensional in single precision
  Benchmark Score, More Is Better
  a: 46283, b: 47650, c: 48385
  SE +/- 368.38, N = 15
  (CXX) g++ options: -O3
Counter-Strike 2, Resolution: 3840 x 2160
  Frames Per Second, More Is Better
  a: 121.4, b: 121.6, c: 122.8
  SE +/- 0.29, N = 3
  MIN: 120.9 / MAX: 121.9
VkFFT 1.3.4, Test: FFT + iFFT C2C 1D batched in single precision
  Benchmark Score, More Is Better
  a: 113874, b: 113952, c: 113935
  SE +/- 22.70, N = 3
  (CXX) g++ options: -O3
VkFFT 1.3.4, Test: FFT + iFFT C2C Bluestein in single precision
  Benchmark Score, More Is Better
  a: 13286, b: 13225, c: 13358
  SE +/- 60.40, N = 3
  (CXX) g++ options: -O3
VkFFT 1.3.4, Test: FFT + iFFT C2C 1D batched in single precision, no reshuffling
  Benchmark Score, More Is Better
  a: 116216, b: 116227, c: 116319
  SE +/- 85.34, N = 3
  (CXX) g++ options: -O3
Counter-Strike 2, Resolution: 2560 x 1440
  Frames Per Second, More Is Better
  a: 221.9, b: 221.3, c: 221.1
  SE +/- 0.58, N = 3
  MIN: 221.3 / MAX: 223.1
VkFFT 1.3.4, Test: FFT + iFFT R2C / C2R
  Benchmark Score, More Is Better
  a: 51046, b: 50803, c: 49951
  SE +/- 351.55, N = 15
  (CXX) g++ options: -O3
Counter-Strike 2, Resolution: 1920 x 1200
  Frames Per Second, More Is Better
  a: 291.7, b: 292.7, c: 293.7
  SE +/- 1.27, N = 3
  MIN: 289.4 / MAX: 293.8
Counter-Strike 2, Resolution: 1920 x 1080
  Frames Per Second, More Is Better
  a: 308.0, b: 311.4, c: 309.8
  SE +/- 0.38, N = 3
  MIN: 307.3 / MAX: 308.6
ProjectPhysX OpenCL-Benchmark 1.2, Operation: Memory Bandwidth Coalesced Write
  GB/s, More Is Better
  a: 721.83, b: 721.79, c: 721.72
  SE +/- 0.03, N = 3
  (CXX) g++ options: -std=c++17 -pthread -lOpenCL
ProjectPhysX OpenCL-Benchmark 1.2, Operation: Memory Bandwidth Coalesced Read
  GB/s, More Is Better
  a: 702.72, b: 702.78, c: 702.84
  SE +/- 0.00, N = 3
  (CXX) g++ options: -std=c++17 -pthread -lOpenCL
ProjectPhysX OpenCL-Benchmark 1.2, Operation: INT8 Compute
  TIOPs/s, More Is Better
  a: 12.11, b: 12.08, c: 12.17
  SE +/- 0.03, N = 3
  (CXX) g++ options: -std=c++17 -pthread -lOpenCL
ProjectPhysX OpenCL-Benchmark 1.2, Operation: INT16 Compute
  TIOPs/s, More Is Better
  a: 14.57, b: 14.56, c: 14.56
  SE +/- 0.00, N = 3
  (CXX) g++ options: -std=c++17 -pthread -lOpenCL
ProjectPhysX OpenCL-Benchmark 1.2, Operation: INT32 Compute
  TIOPs/s, More Is Better
  a: 16.86, b: 16.67, c: 16.92
  SE +/- 0.04, N = 3
  (CXX) g++ options: -std=c++17 -pthread -lOpenCL
ProjectPhysX OpenCL-Benchmark 1.2, Operation: INT64 Compute
  TIOPs/s, More Is Better
  a: 3.231, b: 3.222, c: 3.225
  SE +/- 0.009, N = 3
  (CXX) g++ options: -std=c++17 -pthread -lOpenCL
ProjectPhysX OpenCL-Benchmark 1.2, Operation: FP32 Compute
  TFLOPs/s, More Is Better
  a: 32.87, b: 32.92, c: 32.80
  SE +/- 0.03, N = 3
  (CXX) g++ options: -std=c++17 -pthread -lOpenCL
ProjectPhysX OpenCL-Benchmark 1.2, Operation: FP64 Compute
  TFLOPs/s, More Is Better
  a: 0.528, b: 0.531, c: 0.527
  SE +/- 0.001, N = 3
  (CXX) g++ options: -std=c++17 -pthread -lOpenCL
Phoronix Test Suite v10.8.5