vkfft nvidia

AMD Ryzen Threadripper PRO 7995WX 96-Cores testing with a HP Z6 G5 A Workstation 8B24 (U65 Ver. 01.01.04 BIOS) and NVIDIA RTX A4000 16GB on Ubuntu 23.10 via the Phoronix Test Suite.

HTML result view exported from: https://openbenchmarking.org/result/2402188-PTS-VKFFTNVI69.

vkfft nvidiaProcessorMotherboardChipsetMemoryDiskGraphicsAudioMonitorNetworkOSKernelDesktopDisplay ServerDisplay DriverOpenGLOpenCLCompilerFile-SystemScreen ResolutionabcAMD Ryzen Threadripper PRO 7995WX 96-Cores @ 6.44GHz (96 Cores / 192 Threads)HP Z6 G5 A Workstation 8B24 (U65 Ver. 01.01.04 BIOS)AMD Device 14a48 x 16GB DRAM-5200MT/s Hynix HMCG78AGBRA190N2 x 1024GB SAMSUNG MZVL21T0HCLR-00BH1NVIDIA RTX A4000 16GBNVIDIA GA104 HD AudioASUS VP28URealtek RTL8111/8168/8411Ubuntu 23.106.5.0-17-generic (x86_64)GNOME Shell 45.2X Server 1.21.1.7NVIDIA 535.154.054.6.0OpenCL 3.0 CUDA 12.2.148GCC 13.2.0ext43840x2160OpenBenchmarking.orgKernel Details- Transparent Huge Pages: madviseCompiler Details- --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-XYspKM/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-XYspKM/gcc-13-13.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v Processor Details- Scaling Governor: amd-pstate-epp performance (EPP: performance) - CPU Microcode: 0xa108105Graphics Details- BAR1 / Visible vRAM Size: 256 MiB - vBIOS Version: 94.04.57.00.0bSecurity Details- gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_rstack_overflow: Mitigation of safe RET + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS IBPB: conditional STIBP: always-on RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected

vkfft nvidiavkfft: FFT + iFFT R2C / C2Rvkfft: FFT + iFFT C2C 1D batched in half precisionvkfft: FFT + iFFT C2C Bluestein in single precisionvkfft: FFT + iFFT C2C 1D batched in double precisionvkfft: FFT + iFFT C2C 1D batched in single precisionvkfft: FFT + iFFT C2C multidimensional in single precisionvkfft: FFT + iFFT C2C Bluestein benchmark in double precisionvkfft: FFT + iFFT C2C 1D batched in single precision, no reshufflingabc345741154851018715703682683352627526965335977114380101901568568251342212748696843506511370010170157386830634364275069704OpenBenchmarking.org

VkFFT

Test: FFT + iFFT R2C / C2R

OpenBenchmarking.orgBenchmark Score, More Is BetterVkFFT 1.3.4Test: FFT + iFFT R2C / C2Rabc8K16K24K32K40KSE +/- 327.00, N = 3SE +/- 301.81, N = 8SE +/- 426.48, N = 33457435977350651. (CXX) g++ options: -O3

VkFFT

Test: FFT + iFFT C2C 1D batched in half precision

OpenBenchmarking.orgBenchmark Score, More Is BetterVkFFT 1.3.4Test: FFT + iFFT C2C 1D batched in half precisionabc20K40K60K80K100KSE +/- 1137.13, N = 15SE +/- 1436.38, N = 12SE +/- 982.37, N = 141154851143801137001. (CXX) g++ options: -O3

VkFFT

Test: FFT + iFFT C2C Bluestein in single precision

OpenBenchmarking.orgBenchmark Score, More Is BetterVkFFT 1.3.4Test: FFT + iFFT C2C Bluestein in single precisionabc2K4K6K8K10KSE +/- 31.81, N = 3SE +/- 104.33, N = 3SE +/- 30.25, N = 31018710190101701. (CXX) g++ options: -O3

VkFFT

Test: FFT + iFFT C2C 1D batched in double precision

OpenBenchmarking.orgBenchmark Score, More Is BetterVkFFT 1.3.4Test: FFT + iFFT C2C 1D batched in double precisionabc3K6K9K12K15KSE +/- 154.36, N = 3SE +/- 133.51, N = 3SE +/- 113.53, N = 31570315685157381. (CXX) g++ options: -O3

VkFFT

Test: FFT + iFFT C2C 1D batched in single precision

OpenBenchmarking.orgBenchmark Score, More Is BetterVkFFT 1.3.4Test: FFT + iFFT C2C 1D batched in single precisionabc15K30K45K60K75KSE +/- 30.33, N = 3SE +/- 14.53, N = 3SE +/- 13.32, N = 36826868251683061. (CXX) g++ options: -O3

VkFFT

Test: FFT + iFFT C2C multidimensional in single precision

OpenBenchmarking.orgBenchmark Score, More Is BetterVkFFT 1.3.4Test: FFT + iFFT C2C multidimensional in single precisionabc7K14K21K28K35KSE +/- 146.47, N = 3SE +/- 162.73, N = 3SE +/- 117.17, N = 33352634221343641. (CXX) g++ options: -O3

VkFFT

Test: FFT + iFFT C2C Bluestein benchmark in double precision

OpenBenchmarking.orgBenchmark Score, More Is BetterVkFFT 1.3.4Test: FFT + iFFT C2C Bluestein benchmark in double precisionabc6001200180024003000SE +/- 0.88, N = 3SE +/- 0.67, N = 3SE +/- 0.33, N = 32752274827501. (CXX) g++ options: -O3

VkFFT

Test: FFT + iFFT C2C 1D batched in single precision, no reshuffling

OpenBenchmarking.orgBenchmark Score, More Is BetterVkFFT 1.3.4Test: FFT + iFFT C2C 1D batched in single precision, no reshufflingabc15K30K45K60K75KSE +/- 7.26, N = 3SE +/- 24.84, N = 3SE +/- 25.36, N = 36965369684697041. (CXX) g++ options: -O3


Phoronix Test Suite v10.8.4