heffte 7950x
AMD Ryzen 9 7950X 16-Core testing with an ASUS ROG STRIX X670E-E GAMING WIFI (1416 BIOS) and NVIDIA GeForce RTX 3060 Ti 8GB on Ubuntu 23.10 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2310274-PTS-HEFFTE7999&grr
heffte 7950x
System configuration for runs a, b, c:
Processor: AMD Ryzen 9 7950X 16-Core @ 5.88GHz (16 Cores / 32 Threads)
Motherboard: ASUS ROG STRIX X670E-E GAMING WIFI (1416 BIOS)
Chipset: AMD Device 14d8
Memory: 32GB
Disk: Western Digital WD_BLACK SN850X 1000GB
Graphics: NVIDIA GeForce RTX 3060 Ti 8GB
Audio: NVIDIA GA104 HD Audio
Monitor: ASUS MG28U
Network: Intel I225-V + Intel Wi-Fi 6 AX210/AX211/AX411
OS: Ubuntu 23.10
Kernel: 6.6.0-060600rc5-generic (x86_64)
Desktop: GNOME Shell 45.0
Display Server: X Server 1.21.1.7
Display Driver: NVIDIA 545.23.06
OpenGL: 4.6.0
OpenCL: OpenCL 3.0 CUDA 12.3.68
Compiler: GCC 13.2.0
File-System: ext4
Screen Resolution: 3840x2160

Kernel Details - Transparent Huge Pages: madvise
Compiler Details - --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-XYspKM/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-XYspKM/gcc-13-13.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Details - Scaling Governor: amd-pstate-epp powersave (EPP: balance_performance) - CPU Microcode: 0xa601203
Security Details - gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_rstack_overflow: Mitigation of safe RET no microcode + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS IBPB: conditional STIBP: always-on RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
heffte 7950x
Result overview for heffte (GFLOP/s, more is better):

Test - Backend - Precision - X Y Z  a        b        c
c2c - FFTW - double-long - 512      11.3033  11.3054  11.3213
c2c - FFTW - double - 512           11.3175  11.3006  11.3353
c2c - Stock - double - 512          11.5504  11.5585  11.5672
c2c - Stock - double-long - 512     11.5650  11.5733  11.5655
r2c - FFTW - double-long - 512      20.5876  20.4946  20.5184
r2c - FFTW - double - 512           20.5913  20.6314  20.5596
c2c - FFTW - float-long - 512       22.3077  22.2746  22.2755
c2c - FFTW - float - 512            22.3129  22.3309  22.2898
r2c - Stock - double-long - 512     22.6477  22.6688  22.6505
r2c - Stock - double - 512          22.6468  22.6361  22.6527
c2c - Stock - float-long - 512      22.4717  22.4482  22.4678
c2c - Stock - float - 512           22.4705  22.4766  22.4658
r2c - FFTW - float-long - 512       40.6924  40.6980  40.6778
r2c - FFTW - float - 512            40.7358  40.7427  40.7257
r2c - Stock - float-long - 512      43.9465  43.9320  43.9385
r2c - Stock - float - 512           44.0010  44.0299  43.9660
c2c - FFTW - double-long - 256      10.1326  10.1385  10.1830
c2c - FFTW - double - 256           10.2063  10.1193  10.1553
c2c - Stock - double - 256          10.4724  10.4680  10.4641
c2c - Stock - double-long - 256     10.4604  10.4640  10.4565
c2c - FFTW - float-long - 128       80.7018  77.9866  77.6373
r2c - FFTW - double - 128           44.5940  44.3283  44.0131
c2c - Stock - double-long - 128     17.5118  17.3012  17.3218
r2c - FFTW - double-long - 128      45.2281  46.5240  45.5619
c2c - Stock - float-long - 128      60.5802  59.8700  60.8686
c2c - Stock - float - 128           62.4184  61.7571  61.2619
r2c - FFTW - double-long - 256      19.0626  19.1298  19.1423
r2c - FFTW - double - 256           19.1375  19.0872  19.1488
r2c - FFTW - float-long - 128       137.395  133.317  133.515
c2c - FFTW - float - 256            20.8780  20.8919  20.8252
r2c - Stock - float - 128           115.306  120.987  120.153
c2c - FFTW - float-long - 256       20.8630  20.8895  20.8501
r2c - Stock - double - 256          21.2099  21.1655  21.1486
r2c - Stock - double-long - 256     21.1703  21.1834  21.2032
c2c - Stock - float-long - 256      21.1193  21.1267  21.1028
c2c - Stock - float - 256           21.1409  21.1237  21.1643
c2c - FFTW - float - 128            77.4191  76.8080  77.2216
r2c - FFTW - float - 128            136.396  135.919  135.262
c2c - FFTW - double-long - 128      17.2073  16.8374  17.0719
c2c - FFTW - double - 128           16.9742  16.4968  17.0937
r2c - Stock - double-long - 128     47.4095  49.8414  48.8921
r2c - Stock - double - 128          49.7110  49.3308  49.1807
r2c - FFTW - float - 256            42.0885  41.7135  41.8229
r2c - FFTW - float-long - 256       42.0270  41.8702  41.6494
r2c - Stock - float-long - 256      46.1778  46.1111  46.7987
r2c - Stock - float - 256           46.0696  46.1572  46.2602
c2c - Stock - double - 128          17.5246  18.0207  17.6702
r2c - Stock - float-long - 128      118.581  116.576  120.073
HeFFTe - Highly Efficient FFT for Exascale 2.4
OpenBenchmarking.org, GFLOP/s, More Is Better
1. (CXX) g++ options: -O3 (applies to every test below)
Each entry gives the result for runs a, b, c with its standard error (SE) and sample count (N).

Test: c2c - Backend: FFTW - Precision: double-long - X Y Z: 512
    a: 11.30 (SE +/- 0.02, N = 3)   b: 11.31 (SE +/- 0.03, N = 3)   c: 11.32 (SE +/- 0.01, N = 3)
Test: c2c - Backend: FFTW - Precision: double - X Y Z: 512
    a: 11.32 (SE +/- 0.02, N = 3)   b: 11.30 (SE +/- 0.02, N = 3)   c: 11.34 (SE +/- 0.00, N = 3)
Test: c2c - Backend: Stock - Precision: double - X Y Z: 512
    a: 11.55 (SE +/- 0.01, N = 3)   b: 11.56 (SE +/- 0.00, N = 3)   c: 11.57 (SE +/- 0.00, N = 3)
Test: c2c - Backend: Stock - Precision: double-long - X Y Z: 512
    a: 11.57 (SE +/- 0.00, N = 3)   b: 11.57 (SE +/- 0.00, N = 3)   c: 11.57 (SE +/- 0.00, N = 3)
Test: r2c - Backend: FFTW - Precision: double-long - X Y Z: 512
    a: 20.59 (SE +/- 0.05, N = 3)   b: 20.49 (SE +/- 0.02, N = 3)   c: 20.52 (SE +/- 0.06, N = 3)
Test: r2c - Backend: FFTW - Precision: double - X Y Z: 512
    a: 20.59 (SE +/- 0.07, N = 3)   b: 20.63 (SE +/- 0.01, N = 3)   c: 20.56 (SE +/- 0.02, N = 3)
Test: c2c - Backend: FFTW - Precision: float-long - X Y Z: 512
    a: 22.31 (SE +/- 0.01, N = 3)   b: 22.27 (SE +/- 0.01, N = 3)   c: 22.28 (SE +/- 0.02, N = 3)
Test: c2c - Backend: FFTW - Precision: float - X Y Z: 512
    a: 22.31 (SE +/- 0.02, N = 3)   b: 22.33 (SE +/- 0.01, N = 3)   c: 22.29 (SE +/- 0.01, N = 3)
Test: r2c - Backend: Stock - Precision: double-long - X Y Z: 512
    a: 22.65 (SE +/- 0.01, N = 3)   b: 22.67 (SE +/- 0.02, N = 3)   c: 22.65 (SE +/- 0.02, N = 3)
Test: r2c - Backend: Stock - Precision: double - X Y Z: 512
    a: 22.65 (SE +/- 0.01, N = 3)   b: 22.64 (SE +/- 0.02, N = 3)   c: 22.65 (SE +/- 0.02, N = 3)
Test: c2c - Backend: Stock - Precision: float-long - X Y Z: 512
    a: 22.47 (SE +/- 0.01, N = 3)   b: 22.45 (SE +/- 0.03, N = 3)   c: 22.47 (SE +/- 0.00, N = 3)
Test: c2c - Backend: Stock - Precision: float - X Y Z: 512
    a: 22.47 (SE +/- 0.02, N = 3)   b: 22.48 (SE +/- 0.02, N = 3)   c: 22.47 (SE +/- 0.03, N = 3)
Test: r2c - Backend: FFTW - Precision: float-long - X Y Z: 512
    a: 40.69 (SE +/- 0.02, N = 3)   b: 40.70 (SE +/- 0.04, N = 3)   c: 40.68 (SE +/- 0.04, N = 3)
Test: r2c - Backend: FFTW - Precision: float - X Y Z: 512
    a: 40.74 (SE +/- 0.01, N = 3)   b: 40.74 (SE +/- 0.01, N = 3)   c: 40.73 (SE +/- 0.01, N = 3)
Test: r2c - Backend: Stock - Precision: float-long - X Y Z: 512
    a: 43.95 (SE +/- 0.02, N = 3)   b: 43.93 (SE +/- 0.03, N = 3)   c: 43.94 (SE +/- 0.02, N = 3)
Test: r2c - Backend: Stock - Precision: float - X Y Z: 512
    a: 44.00 (SE +/- 0.03, N = 3)   b: 44.03 (SE +/- 0.02, N = 3)   c: 43.97 (SE +/- 0.02, N = 3)
Test: c2c - Backend: FFTW - Precision: double-long - X Y Z: 256
    a: 10.13 (SE +/- 0.02, N = 3)   b: 10.14 (SE +/- 0.01, N = 3)   c: 10.18 (SE +/- 0.02, N = 3)
Test: c2c - Backend: FFTW - Precision: double - X Y Z: 256
    a: 10.21 (SE +/- 0.04, N = 3)   b: 10.12 (SE +/- 0.02, N = 3)   c: 10.16 (SE +/- 0.02, N = 3)
Test: c2c - Backend: Stock - Precision: double - X Y Z: 256
    a: 10.47 (SE +/- 0.01, N = 3)   b: 10.47 (SE +/- 0.01, N = 3)   c: 10.46 (SE +/- 0.01, N = 3)
Test: c2c - Backend: Stock - Precision: double-long - X Y Z: 256
    a: 10.46 (SE +/- 0.00, N = 3)   b: 10.46 (SE +/- 0.00, N = 3)   c: 10.46 (SE +/- 0.01, N = 3)
Test: c2c - Backend: FFTW - Precision: float-long - X Y Z: 128
    a: 80.70 (SE +/- 1.13, N = 15)   b: 77.99 (SE +/- 1.24, N = 15)   c: 77.64 (SE +/- 2.02, N = 15)
Test: r2c - Backend: FFTW - Precision: double - X Y Z: 128
    a: 44.59 (SE +/- 0.98, N = 15)   b: 44.33 (SE +/- 0.46, N = 15)   c: 44.01 (SE +/- 1.16, N = 12)
Test: c2c - Backend: Stock - Precision: double-long - X Y Z: 128
    a: 17.51 (SE +/- 0.24, N = 3)   b: 17.30 (SE +/- 0.14, N = 15)   c: 17.32 (SE +/- 0.13, N = 15)
Test: r2c - Backend: FFTW - Precision: double-long - X Y Z: 128
    a: 45.23 (SE +/- 0.67, N = 15)   b: 46.52 (SE +/- 0.37, N = 3)   c: 45.56 (SE +/- 0.54, N = 15)
Test: c2c - Backend: Stock - Precision: float-long - X Y Z: 128
    a: 60.58 (SE +/- 0.82, N = 15)   b: 59.87 (SE +/- 0.80, N = 3)   c: 60.87 (SE +/- 0.69, N = 15)
Test: c2c - Backend: Stock - Precision: float - X Y Z: 128
    a: 62.42 (SE +/- 0.65, N = 15)   b: 61.76 (SE +/- 0.74, N = 3)   c: 61.26 (SE +/- 0.56, N = 15)
Test: r2c - Backend: FFTW - Precision: double-long - X Y Z: 256
    a: 19.06 (SE +/- 0.05, N = 3)   b: 19.13 (SE +/- 0.03, N = 3)   c: 19.14 (SE +/- 0.06, N = 3)
Test: r2c - Backend: FFTW - Precision: double - X Y Z: 256
    a: 19.14 (SE +/- 0.02, N = 3)   b: 19.09 (SE +/- 0.01, N = 3)   c: 19.15 (SE +/- 0.03, N = 3)
Test: r2c - Backend: FFTW - Precision: float-long - X Y Z: 128
    a: 137.40 (SE +/- 1.34, N = 15)   b: 133.32 (SE +/- 1.70, N = 3)   c: 133.52 (SE +/- 1.45, N = 15)
Test: c2c - Backend: FFTW - Precision: float - X Y Z: 256
    a: 20.88 (SE +/- 0.01, N = 3)   b: 20.89 (SE +/- 0.01, N = 3)   c: 20.83 (SE +/- 0.03, N = 3)
Test: r2c - Backend: Stock - Precision: float - X Y Z: 128
    a: 115.31 (SE +/- 0.62, N = 3)   b: 120.99 (SE +/- 1.24, N = 14)   c: 120.15 (SE +/- 1.05, N = 15)
Test: c2c - Backend: FFTW - Precision: float-long - X Y Z: 256
    a: 20.86 (SE +/- 0.04, N = 3)   b: 20.89 (SE +/- 0.01, N = 3)   c: 20.85 (SE +/- 0.02, N = 3)
Test: r2c - Backend: Stock - Precision: double - X Y Z: 256
    a: 21.21 (SE +/- 0.01, N = 3)   b: 21.17 (SE +/- 0.03, N = 3)   c: 21.15 (SE +/- 0.03, N = 3)
Test: r2c - Backend: Stock - Precision: double-long - X Y Z: 256
    a: 21.17 (SE +/- 0.02, N = 3)   b: 21.18 (SE +/- 0.01, N = 3)   c: 21.20 (SE +/- 0.02, N = 3)
Test: c2c - Backend: Stock - Precision: float-long - X Y Z: 256
    a: 21.12 (SE +/- 0.02, N = 3)   b: 21.13 (SE +/- 0.01, N = 3)   c: 21.10 (SE +/- 0.03, N = 3)
Test: c2c - Backend: Stock - Precision: float - X Y Z: 256
    a: 21.14 (SE +/- 0.01, N = 3)   b: 21.12 (SE +/- 0.00, N = 3)   c: 21.16 (SE +/- 0.02, N = 3)
Test: c2c - Backend: FFTW - Precision: float - X Y Z: 128
    a: 77.42 (SE +/- 1.05, N = 3)   b: 76.81 (SE +/- 1.96, N = 15)   c: 77.22 (SE +/- 2.22, N = 12)
Test: r2c - Backend: FFTW - Precision: float - X Y Z: 128
    a: 136.40 (SE +/- 0.97, N = 12)   b: 135.92 (SE +/- 1.41, N = 3)   c: 135.26 (SE +/- 1.19, N = 15)
Test: c2c - Backend: FFTW - Precision: double-long - X Y Z: 128
    a: 17.21 (SE +/- 0.24, N = 3)   b: 16.84 (SE +/- 0.05, N = 3)   c: 17.07 (SE +/- 0.12, N = 15)
Test: c2c - Backend: FFTW - Precision: double - X Y Z: 128
    a: 16.97 (SE +/- 0.15, N = 15)   b: 16.50 (SE +/- 0.02, N = 3)   c: 17.09 (SE +/- 0.10, N = 3)
Test: r2c - Backend: Stock - Precision: double-long - X Y Z: 128
    a: 47.41 (SE +/- 0.53, N = 3)   b: 49.84 (SE +/- 0.46, N = 6)   c: 48.89 (SE +/- 0.58, N = 15)
Test: r2c - Backend: Stock - Precision: double - X Y Z: 128
    a: 49.71 (SE +/- 0.50, N = 3)   b: 49.33 (SE +/- 0.36, N = 15)   c: 49.18 (SE +/- 0.58, N = 3)
Test: r2c - Backend: FFTW - Precision: float - X Y Z: 256
    a: 42.09 (SE +/- 0.13, N = 3)   b: 41.71 (SE +/- 0.08, N = 3)   c: 41.82 (SE +/- 0.11, N = 3)
Test: r2c - Backend: FFTW - Precision: float-long - X Y Z: 256
    a: 42.03 (SE +/- 0.12, N = 3)   b: 41.87 (SE +/- 0.12, N = 3)   c: 41.65 (SE +/- 0.10, N = 3)
Test: r2c - Backend: Stock - Precision: float-long - X Y Z: 256
    a: 46.18 (SE +/- 0.01, N = 3)   b: 46.11 (SE +/- 0.23, N = 3)   c: 46.80 (SE +/- 0.14, N = 3)
Test: r2c - Backend: Stock - Precision: float - X Y Z: 256
    a: 46.07 (SE +/- 0.17, N = 3)   b: 46.16 (SE +/- 0.18, N = 3)   c: 46.26 (SE +/- 0.03, N = 3)
Test: c2c - Backend: Stock - Precision: double - X Y Z: 128
    a: 17.52 (SE +/- 0.22, N = 4)   b: 18.02 (SE +/- 0.20, N = 4)   c: 17.67 (SE +/- 0.18, N = 3)
Test: r2c - Backend: Stock - Precision: float-long - X Y Z: 128
    a: 118.58 (SE +/- 0.75, N = 3)   b: 116.58 (SE +/- 0.49, N = 3)   c: 120.07 (SE +/- 1.65, N = 3)
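
The SE figures above give a rough way to judge whether two runs differ by more than run-to-run noise: divide the gap between two results by their combined standard error. The sketch below does this for two of the tests listed above. The helper name is hypothetical and the calculation assumes the runs are independent; it is not something the Phoronix Test Suite reports.

```python
import math

def gap_in_standard_errors(result_x, se_x, result_y, se_y):
    """Return |result_x - result_y| divided by the combined standard error.

    Assumption: the two runs are independent, so their standard errors
    add in quadrature. Hypothetical helper, not a Phoronix Test Suite function.
    """
    combined_se = math.sqrt(se_x ** 2 + se_y ** 2)
    return abs(result_x - result_y) / combined_se

# r2c - Stock - float - 128, run a vs run b (GFLOP/s and SE from the results above)
ratio_128 = gap_in_standard_errors(115.31, 0.62, 120.99, 1.24)

# c2c - FFTW - double-long - 512, run a vs run b
ratio_512 = gap_in_standard_errors(11.30, 0.02, 11.31, 0.03)

print(f"r2c Stock float 128, a vs b: {ratio_128:.1f} combined SEs")
print(f"c2c FFTW double-long 512, a vs b: {ratio_512:.1f} combined SEs")
```

A ratio well above 1 suggests the gap is larger than the reported noise, while a ratio below 1 suggests the runs are indistinguishable. With N as low as 3 this is a quick sanity check rather than a formal significance test.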
Phoronix Test Suite v10.8.4