gpuowl cs2 vkfft

AMD Ryzen 9 7950X 16-Core testing with an ASUS ROG STRIX X670E-E GAMING WIFI (1416 BIOS) and NVIDIA GeForce RTX 3080 10GB on Ubuntu 23.10 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2402242-PTS-GPUOWLCS40&grs&sor .
gpuowl cs2 vkfft: system details

Runs: a, b, c

Processor: AMD Ryzen 9 7950X 16-Core @ 5.88GHz (16 Cores / 32 Threads)
Motherboard: ASUS ROG STRIX X670E-E GAMING WIFI (1416 BIOS)
Chipset: AMD Device 14d8
Memory: 2 x 16GB DRAM-6000MT/s G Skill F5-6000J3038F16G
Disk: 2000GB Samsung SSD 980 PRO 2TB + 4001GB Western Digital WD_BLACK SN850X 4000GB
Graphics: NVIDIA GeForce RTX 3080 10GB
Audio: NVIDIA GA102 HD Audio
Monitor: DELL U2723QE
Network: Intel I225-V + Intel Wi-Fi 6 AX210/AX211/AX411
OS: Ubuntu 23.10
Kernel: 6.7.0-060700-generic (x86_64)
Desktop: GNOME Shell 45.2
Display Server: X Server 1.21.1.7
Display Driver: NVIDIA 550.40.07
OpenGL: 4.6.0
OpenCL: OpenCL 3.0 CUDA 12.4.74
Compiler: GCC 13.2.0
File-System: ext4
Screen Resolution: 3840x2160

Kernel Details
- Transparent Huge Pages: madvise

Compiler Details
- --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-XYspKM/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-XYspKM/gcc-13-13.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v

Processor Details
- Scaling Governor: amd-pstate-epp performance (EPP: performance)
- CPU Microcode: 0xa601203

Graphics Details
- BAR1 / Visible vRAM Size: 16384 MiB
- vBIOS Version: 94.02.20.00.07

OpenCL Details
- GPU Compute Cores: 8704

Security Details
- gather_data_sampling: Not affected
- itlb_multihit: Not affected
- l1tf: Not affected
- mds: Not affected
- meltdown: Not affected
- mmio_stale_data: Not affected
- retbleed: Not affected
- spec_rstack_overflow: Vulnerable: Safe RET no microcode
- spec_store_bypass: Mitigation of SSB disabled via prctl
- spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
- spectre_v2: Mitigation of Enhanced / Automatic IBRS IBPB: conditional STIBP: always-on RSB filling PBRSB-eIBRS: Not affected
- srbds: Not affected
- tsx_async_abort: Not affected
gpuowl cs2 vkfft: result overview (more is better for every test)

Test | Unit | a | b | c
vkfft: FFT + iFFT C2C 1D batched in double precision | Benchmark Score | 25136 | 23693 | 25627
vkfft: FFT + iFFT C2C multidimensional in single precision | Benchmark Score | 46283 | 47650 | 48385
vkfft: FFT + iFFT C2C 1D batched in half precision | Benchmark Score | 148147 | 143220 | 145097
vkfft: FFT + iFFT R2C / C2R | Benchmark Score | 51046 | 50803 | 49951
opencl-benchmark: INT32 Compute | TIOPs/s | 16.861 | 16.666 | 16.921
cs2: 3840 x 2160 | Frames Per Second | 121.4 | 121.6 | 122.8
cs2: 1920 x 1080 | Frames Per Second | 308.0 | 311.4 | 309.8
vkfft: FFT + iFFT C2C Bluestein benchmark in double precision | Benchmark Score | 3724 | 3763 | 3758
vkfft: FFT + iFFT C2C Bluestein in single precision | Benchmark Score | 13286 | 13225 | 13358
gpuowl: 57885161 | Iterations / Second | 723.24 | 729.39 | 728.86
gpuowl: 332220523 | Iterations / Second | 115.73 | 116.65 | 115.78
opencl-benchmark: INT8 Compute | TIOPs/s | 12.108 | 12.078 | 12.173
gpuowl: 77936867 | Iterations / Second | 532.10 | 536.19 | 532.20
opencl-benchmark: FP64 Compute | TFLOPs/s | 0.528 | 0.531 | 0.527
cs2: 1920 x 1200 | Frames Per Second | 291.7 | 292.7 | 293.7
cs2: 2560 x 1440 | Frames Per Second | 221.9 | 221.3 | 221.1
opencl-benchmark: FP32 Compute | TFLOPs/s | 32.873 | 32.915 | 32.797
opencl-benchmark: INT64 Compute | TIOPs/s | 3.231 | 3.222 | 3.225
vkfft: FFT + iFFT C2C 1D batched in single precision, no reshuffling | Benchmark Score | 116216 | 116227 | 116319
vkfft: FFT + iFFT C2C 1D batched in single precision | Benchmark Score | 113874 | 113952 | 113935
opencl-benchmark: INT16 Compute | TIOPs/s | 14.565 | 14.56 | 14.562
opencl-benchmark: Memory Bandwidth Coalesced Read | GB/s | 702.72 | 702.78 | 702.84
opencl-benchmark: Memory Bandwidth Coalesced Write | GB/s | 721.83 | 721.79 | 721.72
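Because runs a, b, and c share one system configuration, the overview mostly reflects run-to-run variation. The following is a minimal Python sketch of how that variation could be quantified for a few of the tests. The values are copied from the overview; the helper name `spread_percent` and the choice of (max - min) / min as the metric are assumptions made for illustration, not something the Phoronix Test Suite reports.

```python
# Results for runs (a, b, c), copied from the overview for four tests.
results = {
    "vkfft: C2C 1D batched, double precision": (25136, 23693, 25627),
    "cs2: 1920 x 1080": (308.0, 311.4, 309.8),
    "gpuowl: 57885161": (723.24, 729.39, 728.86),
    "opencl-benchmark: Memory Bandwidth Coalesced Read": (702.72, 702.78, 702.84),
}


def spread_percent(values):
    """Hypothetical helper: (max - min) / min, expressed as a percentage."""
    lo, hi = min(values), max(values)
    return (hi - lo) / lo * 100.0


spreads = {name: spread_percent(vals) for name, vals in results.items()}

# Print the tests from most to least variable.
for name, pct in sorted(spreads.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {pct:.2f}% spread")
```

With these inputs the VkFFT double-precision test shows the largest spread (roughly 8%), while the memory bandwidth read test varies by well under 0.1%.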
VkFFT 1.3.4 - Test: FFT + iFFT C2C 1D batched in double precision
Benchmark Score, More Is Better
Results: c = 25627, a = 25136, b = 23693
SE +/- 311.90, N = 11
(CXX) g++ options: -O3
VkFFT 1.3.4 - Test: FFT + iFFT C2C multidimensional in single precision
Benchmark Score, More Is Better
Results: c = 48385, b = 47650, a = 46283
SE +/- 368.38, N = 15
(CXX) g++ options: -O3
VkFFT 1.3.4 - Test: FFT + iFFT C2C 1D batched in half precision
Benchmark Score, More Is Better
Results: a = 148147, c = 145097, b = 143220
SE +/- 1616.73, N = 15
(CXX) g++ options: -O3
VkFFT 1.3.4 - Test: FFT + iFFT R2C / C2R
Benchmark Score, More Is Better
Results: a = 51046, b = 50803, c = 49951
SE +/- 351.55, N = 15
(CXX) g++ options: -O3
ProjectPhysX OpenCL-Benchmark 1.2 - Operation: INT32 Compute
TIOPs/s, More Is Better
Results: c = 16.92, a = 16.86, b = 16.67
SE +/- 0.04, N = 3
(CXX) g++ options: -std=c++17 -pthread -lOpenCL
Counter-Strike 2 - Resolution: 3840 x 2160
Frames Per Second, More Is Better
Results: c = 122.8, b = 121.6, a = 121.4
SE +/- 0.29, N = 3
MIN: 120.9 / MAX: 121.9
Counter-Strike 2 - Resolution: 1920 x 1080
Frames Per Second, More Is Better
Results: b = 311.4, c = 309.8, a = 308.0
SE +/- 0.38, N = 3
MIN: 307.3 / MAX: 308.6
VkFFT 1.3.4 - Test: FFT + iFFT C2C Bluestein benchmark in double precision
Benchmark Score, More Is Better
Results: b = 3763, c = 3758, a = 3724
SE +/- 8.17, N = 3
(CXX) g++ options: -O3
VkFFT 1.3.4 - Test: FFT + iFFT C2C Bluestein in single precision
Benchmark Score, More Is Better
Results: c = 13358, a = 13286, b = 13225
SE +/- 60.40, N = 3
(CXX) g++ options: -O3
GpuOwl 7.5 - Exponent: 57885161
Iterations / Second, More Is Better
Results: b = 729.39, c = 728.86, a = 723.24
SE +/- 0.17, N = 3
(CXX) g++ options: -O3 -lgmp -lOpenCL
GpuOwl 7.5 - Exponent: 332220523
Iterations / Second, More Is Better
Results: b = 116.65, c = 115.78, a = 115.73
SE +/- 0.01, N = 3
(CXX) g++ options: -O3 -lgmp -lOpenCL
ProjectPhysX OpenCL-Benchmark 1.2 - Operation: INT8 Compute
TIOPs/s, More Is Better
Results: c = 12.17, a = 12.11, b = 12.08
SE +/- 0.03, N = 3
(CXX) g++ options: -std=c++17 -pthread -lOpenCL
GpuOwl 7.5 - Exponent: 77936867
Iterations / Second, More Is Better
Results: b = 536.19, c = 532.20, a = 532.10
SE +/- 0.09, N = 3
(CXX) g++ options: -O3 -lgmp -lOpenCL
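To put the GpuOwl iterations-per-second figures in context, a rough wall-clock estimate can be derived from them. The sketch below assumes that one full primality test of exponent p needs about p iterations; that is a simplification made for this example, and it ignores start-up, error checking, and proof generation overhead. The rates are run a's results, and `estimate_hours` is a hypothetical helper, not part of GpuOwl.

```python
# Run "a" iteration rates (iterations / second), keyed by exponent.
rates = {57885161: 723.24, 77936867: 532.10, 332220523: 115.73}


def estimate_hours(exponent, iters_per_second):
    """Hypothetical helper: assume ~exponent iterations for one full test."""
    seconds = exponent / iters_per_second
    return seconds / 3600.0


estimates = {p: estimate_hours(p, rate) for p, rate in rates.items()}

for p, hours in estimates.items():
    print(f"exponent {p}: ~{hours:.1f} hours ({hours / 24:.1f} days)")
```

Under that assumption the smallest exponent would take on the order of a day on this GPU, and the largest on the order of a month.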
ProjectPhysX OpenCL-Benchmark 1.2 - Operation: FP64 Compute
TFLOPs/s, More Is Better
Results: b = 0.531, a = 0.528, c = 0.527
SE +/- 0.001, N = 3
(CXX) g++ options: -std=c++17 -pthread -lOpenCL
Counter-Strike 2 - Resolution: 1920 x 1200
Frames Per Second, More Is Better
Results: c = 293.7, b = 292.7, a = 291.7
SE +/- 1.27, N = 3
MIN: 289.4 / MAX: 293.8
Counter-Strike 2 - Resolution: 2560 x 1440
Frames Per Second, More Is Better
Results: a = 221.9, b = 221.3, c = 221.1
SE +/- 0.58, N = 3
MIN: 221.3 / MAX: 223.1
ProjectPhysX OpenCL-Benchmark 1.2 - Operation: FP32 Compute
TFLOPs/s, More Is Better
Results: b = 32.92, a = 32.87, c = 32.80
SE +/- 0.03, N = 3
(CXX) g++ options: -std=c++17 -pthread -lOpenCL
ProjectPhysX OpenCL-Benchmark 1.2 - Operation: INT64 Compute
TIOPs/s, More Is Better
Results: a = 3.231, c = 3.225, b = 3.222
SE +/- 0.009, N = 3
(CXX) g++ options: -std=c++17 -pthread -lOpenCL
VkFFT 1.3.4 - Test: FFT + iFFT C2C 1D batched in single precision, no reshuffling
Benchmark Score, More Is Better
Results: c = 116319, b = 116227, a = 116216
SE +/- 85.34, N = 3
(CXX) g++ options: -O3
VkFFT 1.3.4 - Test: FFT + iFFT C2C 1D batched in single precision
Benchmark Score, More Is Better
Results: b = 113952, c = 113935, a = 113874
SE +/- 22.70, N = 3
(CXX) g++ options: -O3
ProjectPhysX OpenCL-Benchmark 1.2 - Operation: INT16 Compute
TIOPs/s, More Is Better
Results: a = 14.57, c = 14.56, b = 14.56
SE +/- 0.00, N = 3
(CXX) g++ options: -std=c++17 -pthread -lOpenCL
ProjectPhysX OpenCL-Benchmark 1.2 - Operation: Memory Bandwidth Coalesced Read
GB/s, More Is Better
Results: c = 702.84, b = 702.78, a = 702.72
SE +/- 0.00, N = 3
(CXX) g++ options: -std=c++17 -pthread -lOpenCL
ProjectPhysX OpenCL-Benchmark 1.2 - Operation: Memory Bandwidth Coalesced Write
GB/s, More Is Better
Results: a = 721.83, b = 721.79, c = 721.72
SE +/- 0.03, N = 3
(CXX) g++ options: -std=c++17 -pthread -lOpenCL
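One way to condense all 23 results into a single figure per run is a geometric mean of each run's results relative to run a. The sketch below is an illustration under that assumption and is not taken from the exported page; the full-precision values come from the result overview, every test is "more is better", and `relative_geomean` is a hypothetical helper name.

```python
from statistics import geometric_mean

# (a, b, c) for each of the 23 tests, in the order of the result overview.
rows = [
    (25136, 23693, 25627), (46283, 47650, 48385), (148147, 143220, 145097),
    (51046, 50803, 49951), (16.861, 16.666, 16.921), (121.4, 121.6, 122.8),
    (308.0, 311.4, 309.8), (3724, 3763, 3758), (13286, 13225, 13358),
    (723.24, 729.39, 728.86), (115.73, 116.65, 115.78),
    (12.108, 12.078, 12.173), (532.10, 536.19, 532.20),
    (0.528, 0.531, 0.527), (291.7, 292.7, 293.7), (221.9, 221.3, 221.1),
    (32.873, 32.915, 32.797), (3.231, 3.222, 3.225),
    (116216, 116227, 116319), (113874, 113952, 113935),
    (14.565, 14.56, 14.562), (702.72, 702.78, 702.84),
    (721.83, 721.79, 721.72),
]
RUNS = ("a", "b", "c")


def relative_geomean(rows, run_index, baseline_index=0):
    """Hypothetical helper: geometric mean of run/baseline ratios."""
    ratios = [row[run_index] / row[baseline_index] for row in rows]
    return geometric_mean(ratios)


geomeans = {name: relative_geomean(rows, i) for i, name in enumerate(RUNS)}

for name, value in geomeans.items():
    print(f"run {name}: {value:.4f} relative to run a")
```

A value close to 1.0 for runs b and c indicates that, taken across all tests, the three runs are effectively equivalent.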
Phoronix Test Suite v10.8.5