gpuowl cs2 vkfft
AMD Ryzen 9 7950X 16-Core testing with an ASUS ROG STRIX X670E-E GAMING WIFI (1416 BIOS) and NVIDIA GeForce RTX 3080 10GB on Ubuntu 23.10 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2402242-PTS-GPUOWLCS40&grt&rdt .

System details (runs a, b, c)

Processor: AMD Ryzen 9 7950X 16-Core @ 5.88GHz (16 Cores / 32 Threads)
Motherboard: ASUS ROG STRIX X670E-E GAMING WIFI (1416 BIOS)
Chipset: AMD Device 14d8
Memory: 2 x 16GB DRAM-6000MT/s G Skill F5-6000J3038F16G
Disk: 2000GB Samsung SSD 980 PRO 2TB + 4001GB Western Digital WD_BLACK SN850X 4000GB
Graphics: NVIDIA GeForce RTX 3080 10GB
Audio: NVIDIA GA102 HD Audio
Monitor: DELL U2723QE
Network: Intel I225-V + Intel Wi-Fi 6 AX210/AX211/AX411
OS: Ubuntu 23.10
Kernel: 6.7.0-060700-generic (x86_64)
Desktop: GNOME Shell 45.2
Display Server: X Server 1.21.1.7
Display Driver: NVIDIA 550.40.07
OpenGL: 4.6.0
OpenCL: OpenCL 3.0 CUDA 12.4.74
Compiler: GCC 13.2.0
File-System: ext4
Screen Resolution: 3840x2160

Kernel Details
- Transparent Huge Pages: madvise

Compiler Details
- --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-XYspKM/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-XYspKM/gcc-13-13.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v

Processor Details
- Scaling Governor: amd-pstate-epp performance (EPP: performance)
- CPU Microcode: 0xa601203

Graphics Details
- BAR1 / Visible vRAM Size: 16384 MiB
- vBIOS Version: 94.02.20.00.07

OpenCL Details
- GPU Compute Cores: 8704

Security Details
- gather_data_sampling: Not affected
- itlb_multihit: Not affected
- l1tf: Not affected
- mds: Not affected
- meltdown: Not affected
- mmio_stale_data: Not affected
- retbleed: Not affected
- spec_rstack_overflow: Vulnerable: Safe RET no microcode
- spec_store_bypass: Mitigation of SSB disabled via prctl
- spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
- spectre_v2: Mitigation of Enhanced / Automatic IBRS IBPB: conditional STIBP: always-on RSB filling PBRSB-eIBRS: Not affected
- srbds: Not affected
- tsx_async_abort: Not affected

Result overview

Test                                                                  a         b         c
cs2: 1920 x 1080                                                      308.0     311.4     309.8
cs2: 1920 x 1200                                                      291.7     292.7     293.7
cs2: 2560 x 1440                                                      221.9     221.3     221.1
cs2: 3840 x 2160                                                      121.4     121.6     122.8
gpuowl: 57885161                                                      723.24    729.39    728.86
gpuowl: 77936867                                                      532.10    536.19    532.20
gpuowl: 332220523                                                     115.73    116.65    115.78
opencl-benchmark: FP64 Compute                                        0.528     0.531     0.527
opencl-benchmark: FP32 Compute                                        32.873    32.915    32.797
opencl-benchmark: INT64 Compute                                       3.231     3.222     3.225
opencl-benchmark: INT32 Compute                                       16.861    16.666    16.921
opencl-benchmark: INT16 Compute                                       14.565    14.56     14.562
opencl-benchmark: INT8 Compute                                        12.108    12.078    12.173
opencl-benchmark: Memory Bandwidth Coalesced Read                     702.72    702.78    702.84
opencl-benchmark: Memory Bandwidth Coalesced Write                    721.83    721.79    721.72
vkfft: FFT + iFFT R2C / C2R                                           51046     50803     49951
vkfft: FFT + iFFT C2C 1D batched in half precision                    148147    143220    145097
vkfft: FFT + iFFT C2C Bluestein in single precision                   13286     13225     13358
vkfft: FFT + iFFT C2C 1D batched in double precision                  25136     23693     25627
vkfft: FFT + iFFT C2C 1D batched in single precision                  113874    113952    113935
vkfft: FFT + iFFT C2C multidimensional in single precision            46283     47650     48385
vkfft: FFT + iFFT C2C Bluestein benchmark in double precision         3724      3763      3758
vkfft: FFT + iFFT C2C 1D batched in single precision, no reshuffling  116216    116227    116319

Counter-Strike 2, Resolution: 1920 x 1080
Frames Per Second, More Is Better
a: 308.0   b: 311.4   c: 309.8
SE +/- 0.38, N = 3; MIN: 307.3 / MAX: 308.6

Counter-Strike 2, Resolution: 1920 x 1200
Frames Per Second, More Is Better
a: 291.7   b: 292.7   c: 293.7
SE +/- 1.27, N = 3; MIN: 289.4 / MAX: 293.8

Counter-Strike 2, Resolution: 2560 x 1440
Frames Per Second, More Is Better
a: 221.9   b: 221.3   c: 221.1
SE +/- 0.58, N = 3; MIN: 221.3 / MAX: 223.1

Counter-Strike 2, Resolution: 3840 x 2160
Frames Per Second, More Is Better
a: 121.4   b: 121.6   c: 122.8
SE +/- 0.29, N = 3; MIN: 120.9 / MAX: 121.9

GpuOwl 7.5, Exponent: 57885161
Iterations / Second, More Is Better
a: 723.24   b: 729.39   c: 728.86
SE +/- 0.17, N = 3
1. (CXX) g++ options: -O3 -lgmp -lOpenCL

GpuOwl 7.5, Exponent: 77936867
Iterations / Second, More Is Better
a: 532.10   b: 536.19   c: 532.20
SE +/- 0.09, N = 3
1. (CXX) g++ options: -O3 -lgmp -lOpenCL

GpuOwl 7.5, Exponent: 332220523
Iterations / Second, More Is Better
a: 115.73   b: 116.65   c: 115.78
SE +/- 0.01, N = 3
1. (CXX) g++ options: -O3 -lgmp -lOpenCL

ProjectPhysX OpenCL-Benchmark 1.2, Operation: FP64 Compute
TFLOPs/s, More Is Better
a: 0.528   b: 0.531   c: 0.527
SE +/- 0.001, N = 3
1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

ProjectPhysX OpenCL-Benchmark 1.2, Operation: FP32 Compute
TFLOPs/s, More Is Better
a: 32.87   b: 32.92   c: 32.80
SE +/- 0.03, N = 3
1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

ProjectPhysX OpenCL-Benchmark 1.2, Operation: INT64 Compute
TIOPs/s, More Is Better
a: 3.231   b: 3.222   c: 3.225
SE +/- 0.009, N = 3
1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

ProjectPhysX OpenCL-Benchmark 1.2, Operation: INT32 Compute
TIOPs/s, More Is Better
a: 16.86   b: 16.67   c: 16.92
SE +/- 0.04, N = 3
1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

ProjectPhysX OpenCL-Benchmark 1.2, Operation: INT16 Compute
TIOPs/s, More Is Better
a: 14.57   b: 14.56   c: 14.56
SE +/- 0.00, N = 3
1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

ProjectPhysX OpenCL-Benchmark 1.2, Operation: INT8 Compute
TIOPs/s, More Is Better
a: 12.11   b: 12.08   c: 12.17
SE +/- 0.03, N = 3
1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

ProjectPhysX OpenCL-Benchmark 1.2, Operation: Memory Bandwidth Coalesced Read
GB/s, More Is Better
a: 702.72   b: 702.78   c: 702.84
SE +/- 0.00, N = 3
1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

ProjectPhysX OpenCL-Benchmark 1.2, Operation: Memory Bandwidth Coalesced Write
GB/s, More Is Better
a: 721.83   b: 721.79   c: 721.72
SE +/- 0.03, N = 3
1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

VkFFT 1.3.4, Test: FFT + iFFT R2C / C2R
Benchmark Score, More Is Better
a: 51046   b: 50803   c: 49951
SE +/- 351.55, N = 15
1. (CXX) g++ options: -O3

VkFFT 1.3.4, Test: FFT + iFFT C2C 1D batched in half precision
Benchmark Score, More Is Better
a: 148147   b: 143220   c: 145097
SE +/- 1616.73, N = 15
1. (CXX) g++ options: -O3

VkFFT 1.3.4, Test: FFT + iFFT C2C Bluestein in single precision
Benchmark Score, More Is Better
a: 13286   b: 13225   c: 13358
SE +/- 60.40, N = 3
1. (CXX) g++ options: -O3

VkFFT 1.3.4, Test: FFT + iFFT C2C 1D batched in double precision
Benchmark Score, More Is Better
a: 25136   b: 23693   c: 25627
SE +/- 311.90, N = 11
1. (CXX) g++ options: -O3

VkFFT 1.3.4, Test: FFT + iFFT C2C 1D batched in single precision
Benchmark Score, More Is Better
a: 113874   b: 113952   c: 113935
SE +/- 22.70, N = 3
1. (CXX) g++ options: -O3

VkFFT 1.3.4, Test: FFT + iFFT C2C multidimensional in single precision
Benchmark Score, More Is Better
a: 46283   b: 47650   c: 48385
SE +/- 368.38, N = 15
1. (CXX) g++ options: -O3

VkFFT 1.3.4, Test: FFT + iFFT C2C Bluestein benchmark in double precision
Benchmark Score, More Is Better
a: 3724   b: 3763   c: 3758
SE +/- 8.17, N = 3
1. (CXX) g++ options: -O3

VkFFT 1.3.4, Test: FFT + iFFT C2C 1D batched in single precision, no reshuffling
Benchmark Score, More Is Better
a: 116216   b: 116227   c: 116319
SE +/- 85.34, N = 3
1. (CXX) g++ options: -O3

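
Run-to-run variation between the a, b, and c runs can be gauged from the per-run values reported above. The following is a minimal sketch, not part of the Phoronix Test Suite output: the names `RESULTS` and `spread_percent` are hypothetical, and the metric (range divided by mean) is an assumption chosen for illustration, not the SE figure the suite reports. Only three of the tests are included, with values copied from the results.

```python
from statistics import mean

# Per-run results (a, b, c) copied from the result listing above.
# RESULTS and spread_percent are illustrative names, not PTS identifiers.
RESULTS = {
    "cs2: 3840 x 2160": (121.4, 121.6, 122.8),
    "gpuowl: 57885161": (723.24, 729.39, 728.86),
    "vkfft: FFT + iFFT C2C 1D batched in double precision": (25136, 23693, 25627),
}


def spread_percent(values):
    """Return (max - min) as a percentage of the mean of the runs."""
    return (max(values) - min(values)) / mean(values) * 100.0


# Print the relative spread for each selected test.
for name, runs in RESULTS.items():
    print(f"{name}: {spread_percent(runs):.2f}% spread across runs a, b, c")
```

A small spread suggests the three runs agree closely for that test; a larger one (as with some VkFFT tests, which also show a higher N) suggests noisier measurements.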
Phoronix Test Suite v10.8.5