more gpu comp
AMD Ryzen 9 7950X 16-Core testing with an ASUS ROG STRIX X670E-E GAMING WIFI (1416 BIOS) and NVIDIA GeForce RTX 4080 16GB on Ubuntu 23.10 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2402252-PTS-MOREGPUC88&rdt&grr .
System configuration (runs 4080, a, c, d)
    Processor: AMD Ryzen 9 7950X 16-Core @ 5.88GHz (16 Cores / 32 Threads)
    Motherboard: ASUS ROG STRIX X670E-E GAMING WIFI (1416 BIOS)
    Chipset: AMD Device 14d8
    Memory: 2 x 16GB DRAM-6000MT/s G Skill F5-6000J3038F16G
    Disk: 2000GB Samsung SSD 980 PRO 2TB + 4001GB Western Digital WD_BLACK SN850X 4000GB
    Graphics: NVIDIA GeForce RTX 4080 16GB
    Audio: NVIDIA Device 22bb
    Monitor: DELL U2723QE
    Network: Intel I225-V + Intel Wi-Fi 6 AX210/AX211/AX411
    OS: Ubuntu 23.10
    Kernel: 6.7.0-060700-generic (x86_64)
    Desktop: GNOME Shell 45.2
    Display Server: X Server 1.21.1.7
    Display Driver: NVIDIA 550.40.07
    OpenGL: 4.6.0
    OpenCL: OpenCL 3.0 CUDA 12.4.74
    Compiler: GCC 13.2.0
    File-System: ext4
    Screen Resolution: 3840x2160
Kernel Details
    Transparent Huge Pages: madvise
Compiler Details
    --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-XYspKM/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-XYspKM/gcc-13-13.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Details
    Scaling Governor: amd-pstate-epp powersave (EPP: balance_performance)
    CPU Microcode: 0xa601203
Graphics Details
    BAR1 / Visible vRAM Size: 16384 MiB
    vBIOS Version: 95.03.0e.00.04
OpenCL Details
    GPU Compute Cores: 9728
Python Details
    Python 3.11.6
Security Details
    gather_data_sampling: Not affected
    itlb_multihit: Not affected
    l1tf: Not affected
    mds: Not affected
    meltdown: Not affected
    mmio_stale_data: Not affected
    retbleed: Not affected
    spec_rstack_overflow: Vulnerable: Safe RET no microcode
    spec_store_bypass: Mitigation of SSB disabled via prctl
    spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
    spectre_v2: Mitigation of Enhanced / Automatic IBRS IBPB: conditional STIBP: always-on RSB filling PBRSB-eIBRS: Not affected
    srbds: Not affected
    tsx_async_abort: Not affected
Results overview (a "-" marks a test with no result for that run)
    4080      a         c         d         Test
    27333     32480     29102     33512     vkfft: FFT + iFFT C2C 1D batched in double precision
    868.06    865.05    865.05    865.05    gpuowl: 77936867
    183.45    183.12    183.08    183.08    gpuowl: 332220523
    5629      5599      5607      5610      vkfft: FFT + iFFT C2C Bluestein benchmark in double precision
    1142.86   1142.86   1142.86   1142.86   gpuowl: 57885161
    103773    103677    103679    103680    vkfft: FFT + iFFT C2C 1D batched in single precision
    61.02     61.07     60.76     60.69     blender: Barbershop - NVIDIA CUDA
    105197    105077    105106    105113    vkfft: FFT + iFFT C2C 1D batched in single precision, no reshuffling
    16852     16895     17012     16930     vkfft: FFT + iFFT C2C Bluestein in single precision
    3831      3818      3836      3837      fluidx3d: FP32-FP32
    3343.21   3339.53   3337.95   3344.45   libplacebo: gaussian
    3520.39   3480.1    3416.93   3518.33   libplacebo: av1_grain_lap
    2837.37   2834.07   2838.24   2843.73   libplacebo: hdr_lut
    2748.42   2903.94   2939.6    2936.07   libplacebo: hdr_peakdetect
    1866.33   1866      1866.69   1870.99   libplacebo: polar_nocompute
    1482.91   1481.21   1481.72   1483.3    libplacebo: deband_heavy
    40.76     40.72     40.73     40.88     blender: Barbershop - NVIDIA OptiX
    140578    140729    144146    141755    vkfft: FFT + iFFT C2C 1D batched in half precision
    32.3      -         32.35     32.31     blender: Pabellon Barcelona - NVIDIA CUDA
    71825     68422     70669     70089     vkfft: FFT + iFFT C2C multidimensional in single precision
    7694      7695      7697      7694      fluidx3d: FP32-FP16S
    7768      7765      7778      7763      fluidx3d: FP32-FP16C
    15.8      15.85     15.83     15.85     blender: Fishy Cat - NVIDIA CUDA
    67280     72294     66686     62135     vkfft: FFT + iFFT R2C / C2R
    14.61     14.62     14.65     14.63     blender: Classroom - NVIDIA CUDA
    11.21     -         11.21     11.16     blender: Pabellon Barcelona - NVIDIA OptiX
    604.74    604.14    603.59    603.94    opencl-benchmark: Memory Bandwidth Coalesced Write
    652.26    652.31    652.33    652.3     opencl-benchmark: Memory Bandwidth Coalesced Read
    20.12     20.12     20.116    20.125    opencl-benchmark: INT8 Compute
    23.103    23.106    23.124    23.103    opencl-benchmark: INT16 Compute
    26.566    26.647    26.554    26.567    opencl-benchmark: INT32 Compute
    4.516     4.516     4.539     4.514     opencl-benchmark: INT64 Compute
    51.545    51.549    51.549    51.557    opencl-benchmark: FP32 Compute
    0.83      0.83      0.83      0.83      opencl-benchmark: FP64 Compute
    10.03     10.03     10.05     10.04     blender: Classroom - NVIDIA OptiX
    8.52      7.7       7.67      7.65      blender: Fishy Cat - NVIDIA OptiX
    7.84      7.93      7.84      7.93      blender: BMW27 - NVIDIA CUDA
    5.43      4.58      4.59      4.58      blender: BMW27 - NVIDIA OptiX
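Most tests in the overview agree closely across the four runs, but a few do not. The following is a minimal Python sketch, not part of the Phoronix Test Suite, that measures run-to-run spread as (max - min) / min for four tests whose values are copied from the overview. The name `spread_percent` and the choice of spread metric are assumptions made for illustration.

```python
# Values copied from the results overview; order of runs is 4080, a, c, d.
results = {
    "vkfft: FFT + iFFT C2C 1D batched in double precision": [27333, 32480, 29102, 33512],
    "vkfft: FFT + iFFT R2C / C2R": [67280, 72294, 66686, 62135],
    "gpuowl: 57885161": [1142.86, 1142.86, 1142.86, 1142.86],
    "blender: BMW27 - NVIDIA OptiX": [5.43, 4.58, 4.59, 4.58],
}


def spread_percent(values):
    """Hypothetical helper: (max - min) / min, expressed as a percentage."""
    lo, hi = min(values), max(values)
    return (hi - lo) / lo * 100.0


# Compute the spread per test and list the least consistent tests first.
spreads = {name: spread_percent(vals) for name, vals in results.items()}
for name, pct in sorted(spreads.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{pct:6.2f}%  {name}")
```

A spread near zero, as for the GpuOwl exponent above, suggests the runs are interchangeable for that test. A large spread is a reason to treat any single run of that test with caution.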
VkFFT 1.3.4 - Test: FFT + iFFT C2C 1D batched in double precision
    Benchmark Score, More Is Better
    4080: 27333, a: 32480, c: 29102, d: 33512
    (CXX) g++ options: -O3
GpuOwl 7.5 - Exponent: 77936867
    Iterations / Second, More Is Better
    4080: 868.06, a: 865.05, c: 865.05, d: 865.05
    (CXX) g++ options: -O3 -lgmp -lOpenCL
GpuOwl 7.5 - Exponent: 332220523
    Iterations / Second, More Is Better
    4080: 183.45, a: 183.12, c: 183.08, d: 183.08
    (CXX) g++ options: -O3 -lgmp -lOpenCL
VkFFT 1.3.4 - Test: FFT + iFFT C2C Bluestein benchmark in double precision
    Benchmark Score, More Is Better
    4080: 5629, a: 5599, c: 5607, d: 5610
    (CXX) g++ options: -O3
GpuOwl 7.5 - Exponent: 57885161
    Iterations / Second, More Is Better
    4080: 1142.86, a: 1142.86, c: 1142.86, d: 1142.86
    (CXX) g++ options: -O3 -lgmp -lOpenCL
VkFFT 1.3.4 - Test: FFT + iFFT C2C 1D batched in single precision
    Benchmark Score, More Is Better
    4080: 103773, a: 103677, c: 103679, d: 103680
    (CXX) g++ options: -O3
Blender 4.0 - Blend File: Barbershop - Compute: NVIDIA CUDA
    Seconds, Fewer Is Better
    4080: 61.02, a: 61.07, c: 60.76, d: 60.69
VkFFT 1.3.4 - Test: FFT + iFFT C2C 1D batched in single precision, no reshuffling
    Benchmark Score, More Is Better
    4080: 105197, a: 105077, c: 105106, d: 105113
    (CXX) g++ options: -O3
VkFFT 1.3.4 - Test: FFT + iFFT C2C Bluestein in single precision
    Benchmark Score, More Is Better
    4080: 16852, a: 16895, c: 17012, d: 16930
    (CXX) g++ options: -O3
FluidX3D 2.9 - Test: FP32-FP32
    MLUPs/s, More Is Better
    4080: 3831, a: 3818, c: 3836, d: 3837
Libplacebo 6.338.2 - Test: gaussian
    FPS, More Is Better
    4080: 3343.21, a: 3339.53, c: 3337.95, d: 3344.45
    (CXX) g++ options: -fvisibility=hidden -std=c++20 -O2 -fno-math-errno -fPIC -pthread -MD -MQ -MF
Libplacebo 6.338.2 - Test: av1_grain_lap
    FPS, More Is Better
    4080: 3520.39, a: 3480.10, c: 3416.93, d: 3518.33
    (CXX) g++ options: -fvisibility=hidden -std=c++20 -O2 -fno-math-errno -fPIC -pthread -MD -MQ -MF
Libplacebo 6.338.2 - Test: hdr_lut
    FPS, More Is Better
    4080: 2837.37, a: 2834.07, c: 2838.24, d: 2843.73
    (CXX) g++ options: -fvisibility=hidden -std=c++20 -O2 -fno-math-errno -fPIC -pthread -MD -MQ -MF
Libplacebo 6.338.2 - Test: hdr_peakdetect
    FPS, More Is Better
    4080: 2748.42, a: 2903.94, c: 2939.60, d: 2936.07
    (CXX) g++ options: -fvisibility=hidden -std=c++20 -O2 -fno-math-errno -fPIC -pthread -MD -MQ -MF
Libplacebo 6.338.2 - Test: polar_nocompute
    FPS, More Is Better
    4080: 1866.33, a: 1866.00, c: 1866.69, d: 1870.99
    (CXX) g++ options: -fvisibility=hidden -std=c++20 -O2 -fno-math-errno -fPIC -pthread -MD -MQ -MF
Libplacebo 6.338.2 - Test: deband_heavy
    FPS, More Is Better
    4080: 1482.91, a: 1481.21, c: 1481.72, d: 1483.30
    (CXX) g++ options: -fvisibility=hidden -std=c++20 -O2 -fno-math-errno -fPIC -pthread -MD -MQ -MF
Blender 4.0 - Blend File: Barbershop - Compute: NVIDIA OptiX
    Seconds, Fewer Is Better
    4080: 40.76, a: 40.72, c: 40.73, d: 40.88
VkFFT 1.3.4 - Test: FFT + iFFT C2C 1D batched in half precision
    Benchmark Score, More Is Better
    4080: 140578, a: 140729, c: 144146, d: 141755
    (CXX) g++ options: -O3
Blender 4.0 - Blend File: Pabellon Barcelona - Compute: NVIDIA CUDA
    Seconds, Fewer Is Better
    4080: 32.30, c: 32.35, d: 32.31
VkFFT 1.3.4 - Test: FFT + iFFT C2C multidimensional in single precision
    Benchmark Score, More Is Better
    4080: 71825, a: 68422, c: 70669, d: 70089
    (CXX) g++ options: -O3
FluidX3D 2.9 - Test: FP32-FP16S
    MLUPs/s, More Is Better
    4080: 7694, a: 7695, c: 7697, d: 7694
FluidX3D 2.9 - Test: FP32-FP16C
    MLUPs/s, More Is Better
    4080: 7768, a: 7765, c: 7778, d: 7763
Blender 4.0 - Blend File: Fishy Cat - Compute: NVIDIA CUDA
    Seconds, Fewer Is Better
    4080: 15.80, a: 15.85, c: 15.83, d: 15.85
VkFFT 1.3.4 - Test: FFT + iFFT R2C / C2R
    Benchmark Score, More Is Better
    4080: 67280, a: 72294, c: 66686, d: 62135
    (CXX) g++ options: -O3
Blender 4.0 - Blend File: Classroom - Compute: NVIDIA CUDA
    Seconds, Fewer Is Better
    4080: 14.61, a: 14.62, c: 14.65, d: 14.63
Blender 4.0 - Blend File: Pabellon Barcelona - Compute: NVIDIA OptiX
    Seconds, Fewer Is Better
    4080: 11.21, c: 11.21, d: 11.16
ProjectPhysX OpenCL-Benchmark 1.2 - Operation: Memory Bandwidth Coalesced Write
    GB/s, More Is Better
    4080: 604.74, a: 604.14, c: 603.59, d: 603.94
    (CXX) g++ options: -std=c++17 -pthread -lOpenCL
ProjectPhysX OpenCL-Benchmark 1.2 - Operation: Memory Bandwidth Coalesced Read
    GB/s, More Is Better
    4080: 652.26, a: 652.31, c: 652.33, d: 652.30
    (CXX) g++ options: -std=c++17 -pthread -lOpenCL
ProjectPhysX OpenCL-Benchmark 1.2 - Operation: INT8 Compute
    TIOPs/s, More Is Better
    4080: 20.12, a: 20.12, c: 20.12, d: 20.13
    (CXX) g++ options: -std=c++17 -pthread -lOpenCL
ProjectPhysX OpenCL-Benchmark 1.2 - Operation: INT16 Compute
    TIOPs/s, More Is Better
    4080: 23.10, a: 23.11, c: 23.12, d: 23.10
    (CXX) g++ options: -std=c++17 -pthread -lOpenCL
ProjectPhysX OpenCL-Benchmark 1.2 - Operation: INT32 Compute
    TIOPs/s, More Is Better
    4080: 26.57, a: 26.65, c: 26.55, d: 26.57
    (CXX) g++ options: -std=c++17 -pthread -lOpenCL
ProjectPhysX OpenCL-Benchmark 1.2 - Operation: INT64 Compute
    TIOPs/s, More Is Better
    4080: 4.516, a: 4.516, c: 4.539, d: 4.514
    (CXX) g++ options: -std=c++17 -pthread -lOpenCL
ProjectPhysX OpenCL-Benchmark 1.2 - Operation: FP32 Compute
    TFLOPs/s, More Is Better
    4080: 51.55, a: 51.55, c: 51.55, d: 51.56
    (CXX) g++ options: -std=c++17 -pthread -lOpenCL
ProjectPhysX OpenCL-Benchmark 1.2 - Operation: FP64 Compute
    TFLOPs/s, More Is Better
    4080: 0.83, a: 0.83, c: 0.83, d: 0.83
    (CXX) g++ options: -std=c++17 -pthread -lOpenCL
Blender 4.0 - Blend File: Classroom - Compute: NVIDIA OptiX
    Seconds, Fewer Is Better
    4080: 10.03, a: 10.03, c: 10.05, d: 10.04
Blender 4.0 - Blend File: Fishy Cat - Compute: NVIDIA OptiX
    Seconds, Fewer Is Better
    4080: 8.52, a: 7.70, c: 7.67, d: 7.65
Blender 4.0 - Blend File: BMW27 - Compute: NVIDIA CUDA
    Seconds, Fewer Is Better
    4080: 7.84, a: 7.93, c: 7.84, d: 7.93
Blender 4.0 - Blend File: BMW27 - Compute: NVIDIA OptiX
    Seconds, Fewer Is Better
    4080: 5.43, a: 4.58, c: 4.59, d: 4.58
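Each Blender scene above was rendered with both the CUDA and OptiX back ends, so the two can be compared directly. The sketch below is a minimal, illustrative Python calculation of the CUDA-to-OptiX render-time ratio per scene. It uses the values from run d, chosen here only because that run has a result for every scene. The variable names are hypothetical and the ratio is not a figure reported by the Phoronix Test Suite.

```python
# Render times in seconds from run "d": scene -> (CUDA, OptiX).
# Fewer seconds is better, so a ratio above 1.0 means OptiX finished sooner.
run_d_seconds = {
    "Barbershop": (60.69, 40.88),
    "Pabellon Barcelona": (32.31, 11.16),
    "Fishy Cat": (15.85, 7.65),
    "Classroom": (14.63, 10.04),
    "BMW27": (7.93, 4.58),
}

# Ratio of CUDA time to OptiX time for each scene.
ratios = {scene: cuda / optix for scene, (cuda, optix) in run_d_seconds.items()}

for scene, ratio in ratios.items():
    print(f"{scene:20s} CUDA/OptiX time ratio: {ratio:.2f}")
```

In this run the ratio is above 1.0 for every scene, with Pabellon Barcelona showing the largest gap between the two back ends.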
Phoronix Test Suite v10.8.5