AMD Ryzen Threadripper PRO 7995WX 96-Cores testing with a HP Z6 G5 A Workstation 8B24 (U65 Ver. 01.01.04 BIOS) and NVIDIA RTX A4000 16GB on Ubuntu 23.10 via the Phoronix Test Suite.
Compare your own system(s) to this result file with the Phoronix Test Suite by running the command:
phoronix-test-suite benchmark 2402188-PTS-VKFFTNVI69
HTML result view exported from: https://openbenchmarking.org/result/2402188-PTS-VKFFTNVI69&sor&grt .
vkfft nvidia: system details for runs a, b, c

  Processor: AMD Ryzen Threadripper PRO 7995WX 96-Cores @ 6.44GHz (96 Cores / 192 Threads)
  Motherboard: HP Z6 G5 A Workstation 8B24 (U65 Ver. 01.01.04 BIOS)
  Chipset: AMD Device 14a4
  Memory: 8 x 16GB DRAM-5200MT/s Hynix HMCG78AGBRA190N
  Disk: 2 x 1024GB SAMSUNG MZVL21T0HCLR-00BH1
  Graphics: NVIDIA RTX A4000 16GB
  Audio: NVIDIA GA104 HD Audio
  Monitor: ASUS VP28U
  Network: Realtek RTL8111/8168/8411
  OS: Ubuntu 23.10
  Kernel: 6.5.0-17-generic (x86_64)
  Desktop: GNOME Shell 45.2
  Display Server: X Server 1.21.1.7
  Display Driver: NVIDIA 535.154.05
  OpenGL: 4.6.0
  OpenCL: OpenCL 3.0 CUDA 12.2.148
  Compiler: GCC 13.2.0
  File-System: ext4
  Screen Resolution: 3840x2160

Kernel Details
  - Transparent Huge Pages: madvise

Compiler Details
  - --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-XYspKM/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-XYspKM/gcc-13-13.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v

Processor Details
  - Scaling Governor: amd-pstate-epp performance (EPP: performance)
  - CPU Microcode: 0xa108105

Graphics Details
  - BAR1 / Visible vRAM Size: 256 MiB
  - vBIOS Version: 94.04.57.00.0b

Security Details
  - gather_data_sampling: Not affected
  - itlb_multihit: Not affected
  - l1tf: Not affected
  - mds: Not affected
  - meltdown: Not affected
  - mmio_stale_data: Not affected
  - retbleed: Not affected
  - spec_rstack_overflow: Mitigation of safe RET
  - spec_store_bypass: Mitigation of SSB disabled via prctl
  - spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
  - spectre_v2: Mitigation of Enhanced / Automatic IBRS IBPB: conditional STIBP: always-on RSB filling PBRSB-eIBRS: Not affected
  - srbds: Not affected
  - tsx_async_abort: Not affected
vkfft nvidia: result overview (Benchmark Score, More Is Better)

Test                                                                         a       b       c
vkfft: FFT + iFFT R2C / C2R                                              34574   35977   35065
vkfft: FFT + iFFT C2C 1D batched in half precision                      115485  114380  113700
vkfft: FFT + iFFT C2C Bluestein in single precision                      10187   10190   10170
vkfft: FFT + iFFT C2C 1D batched in double precision                     15703   15685   15738
vkfft: FFT + iFFT C2C 1D batched in single precision                     68268   68251   68306
vkfft: FFT + iFFT C2C multidimensional in single precision               33526   34221   34364
vkfft: FFT + iFFT C2C Bluestein benchmark in double precision             2752    2748    2750
vkfft: FFT + iFFT C2C 1D batched in single precision, no reshuffling     69653   69684   69704
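The three runs can be compared numerically. The following is a minimal Python sketch, not part of the Phoronix Test Suite: the names `SCORES` and `summarize` are made up for illustration, and the scores are copied by hand from the result overview. It computes the mean score per test and the relative spread between the best and worst run.

```python
# Scores copied from the result overview (benchmark score, more is better).
# SCORES and summarize() are illustrative names, not Phoronix Test Suite APIs.
SCORES = {
    "FFT + iFFT R2C / C2R": {"a": 34574, "b": 35977, "c": 35065},
    "FFT + iFFT C2C 1D batched in half precision": {"a": 115485, "b": 114380, "c": 113700},
    "FFT + iFFT C2C Bluestein in single precision": {"a": 10187, "b": 10190, "c": 10170},
    "FFT + iFFT C2C 1D batched in double precision": {"a": 15703, "b": 15685, "c": 15738},
    "FFT + iFFT C2C 1D batched in single precision": {"a": 68268, "b": 68251, "c": 68306},
    "FFT + iFFT C2C multidimensional in single precision": {"a": 33526, "b": 34221, "c": 34364},
    "FFT + iFFT C2C Bluestein benchmark in double precision": {"a": 2752, "b": 2748, "c": 2750},
    "FFT + iFFT C2C 1D batched in single precision, no reshuffling": {"a": 69653, "b": 69684, "c": 69704},
}


def summarize(scores):
    """Return {test: (mean, spread_pct)}, where spread_pct = 100 * (max - min) / mean."""
    summary = {}
    for test, runs in scores.items():
        values = list(runs.values())
        mean = sum(values) / len(values)
        summary[test] = (mean, 100.0 * (max(values) - min(values)) / mean)
    return summary


if __name__ == "__main__":
    for test, (mean, spread_pct) in summarize(SCORES).items():
        print(f"{test}: mean {mean:.0f}, spread {spread_pct:.2f}%")
```

A small spread suggests the three runs agree closely on that test. It says nothing about statistical significance.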
Per-test results (VkFFT 1.3.4, OpenBenchmarking.org, Benchmark Score, More Is Better). Runs, scores, and standard errors are given in the order they appear in the export. All tests were built with (CXX) g++ options: -O3

VkFFT Test: FFT + iFFT R2C / C2R
  Runs: b, c, a
  Scores: 35977, 35065, 34574
  SE +/-: 301.81 (N = 8), 426.48 (N = 3), 327.00 (N = 3)

VkFFT Test: FFT + iFFT C2C 1D batched in half precision
  Runs: a, b, c
  Scores: 115485, 114380, 113700
  SE +/-: 1137.13 (N = 15), 1436.38 (N = 12), 982.37 (N = 14)

VkFFT Test: FFT + iFFT C2C Bluestein in single precision
  Runs: b, a, c
  Scores: 10190, 10187, 10170
  SE +/-: 104.33 (N = 3), 31.81 (N = 3), 30.25 (N = 3)

VkFFT Test: FFT + iFFT C2C 1D batched in double precision
  Runs: c, a, b
  Scores: 15738, 15703, 15685
  SE +/-: 113.53 (N = 3), 154.36 (N = 3), 133.51 (N = 3)

VkFFT Test: FFT + iFFT C2C 1D batched in single precision
  Runs: c, a, b
  Scores: 68306, 68268, 68251
  SE +/-: 13.32 (N = 3), 30.33 (N = 3), 14.53 (N = 3)

VkFFT Test: FFT + iFFT C2C multidimensional in single precision
  Runs: c, b, a
  Scores: 34364, 34221, 33526
  SE +/-: 117.17 (N = 3), 162.73 (N = 3), 146.47 (N = 3)

VkFFT Test: FFT + iFFT C2C Bluestein benchmark in double precision
  Runs: a, c, b
  Scores: 2752, 2750, 2748
  SE +/-: 0.88 (N = 3), 0.33 (N = 3), 0.67 (N = 3)

VkFFT Test: FFT + iFFT C2C 1D batched in single precision, no reshuffling
  Runs: c, b, a
  Scores: 69704, 69684, 69653
  SE +/-: 25.36 (N = 3), 24.84 (N = 3), 7.26 (N = 3)
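As a rough sanity check on the per-test results, the gap between the best and worst run can be set against the reported standard errors. The sketch below is illustrative Python, not a Phoronix Test Suite feature: `PER_TEST` and `range_over_max_se` are hypothetical names, and because the export does not state which SE value belongs to which run, only the largest SE per test is used. It is not a significance test.

```python
# Per test: (scores, standard errors), copied from the per-test results.
# The run-to-SE pairing is not explicit in the export, so it is not assumed here.
PER_TEST = {
    "FFT + iFFT R2C / C2R": ([35977, 35065, 34574], [301.81, 426.48, 327.00]),
    "FFT + iFFT C2C 1D batched in half precision": ([115485, 114380, 113700], [1137.13, 1436.38, 982.37]),
    "FFT + iFFT C2C Bluestein in single precision": ([10190, 10187, 10170], [104.33, 31.81, 30.25]),
    "FFT + iFFT C2C 1D batched in double precision": ([15738, 15703, 15685], [113.53, 154.36, 133.51]),
    "FFT + iFFT C2C 1D batched in single precision": ([68306, 68268, 68251], [13.32, 30.33, 14.53]),
    "FFT + iFFT C2C multidimensional in single precision": ([34364, 34221, 33526], [117.17, 162.73, 146.47]),
    "FFT + iFFT C2C Bluestein benchmark in double precision": ([2752, 2750, 2748], [0.88, 0.33, 0.67]),
    "FFT + iFFT C2C 1D batched in single precision, no reshuffling": ([69704, 69684, 69653], [25.36, 24.84, 7.26]),
}


def range_over_max_se(scores, errors):
    """Ratio of the score range (max - min) to the largest reported standard error."""
    return (max(scores) - min(scores)) / max(errors)


if __name__ == "__main__":
    for test, (scores, errors) in PER_TEST.items():
        print(f"{test}: range / max SE = {range_over_max_se(scores, errors):.2f}")
```

A ratio well below 1 means the run-to-run differences are smaller than the measurement noise reported for that test. Larger ratios only indicate that the differences may deserve a closer look.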
Phoronix Test Suite v10.8.4