EPYC 9684X 1P Tests for a future article. AMD EPYC 9684X 96-Core testing with a AMD Titanite_4G (RTI1007B BIOS) and ASPEED on Ubuntu 22.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2307209-NE-EPYC9684X51&sor&grr .
EPYC 9684X 1P Processor Motherboard Chipset Memory Disk Graphics Network OS Kernel Desktop Display Server Vulkan Compiler File-System Screen Resolution EPYC 9684X AMD 9684X AMD EPYC 9684X 96-Core @ 2.55GHz (96 Cores / 192 Threads) AMD Titanite_4G (RTI1007B BIOS) AMD Device 14a4 768GB 2 x 1920GB SAMSUNG MZWLJ1T9HBJR-00007 ASPEED Broadcom NetXtreme BCM5720 PCIe Ubuntu 22.04 5.19.0-41-generic (x86_64) GNOME Shell 42.5 X Server 1.21.1.4 1.3.224 GCC 11.3.0 ext4 1024x768 OpenBenchmarking.org Kernel Details - Transparent Huge Pages: madvise Compiler Details - --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-11-xKiWfi/gcc-11-11.3.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-xKiWfi/gcc-11-11.3.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v Processor Details - Scaling Governor: acpi-cpufreq performance (Boost: Enabled) - CPU Microcode: 0xa101121 Security Details - itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: always-on RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
EPYC 9684X 1P hpcg: 192 192 192 - 60 hpcg: 160 160 160 - 60 hpcg: 144 144 144 - 60 libxsmm: 128 incompact3d: X3D-benchmarking input.i3d openfoam: drivaerFastback, Medium Mesh Size - Execution Time openfoam: drivaerFastback, Medium Mesh Size - Mesh Time askap: tConvolve MT - Degridding askap: tConvolve MT - Gridding hpcg: 104 104 104 - 60 libxsmm: 256 blender: Barbershop - CPU-Only openfoam: drivaerFastback, Small Mesh Size - Execution Time openfoam: drivaerFastback, Small Mesh Size - Mesh Time blender: Pabellon Barcelona - CPU-Only blender: Classroom - CPU-Only stress-ng: CPU Cache stress-ng: Vector Shuffle stress-ng: Wide Vector Math stress-ng: CPU Stress stress-ng: Vector Math stress-ng: Vector Floating Point stress-ng: Matrix Math askap: tConvolve MPI - Gridding askap: tConvolve MPI - Degridding heffte: c2c - FFTW - double - 512 heffte: c2c - Stock - double - 512 heffte: c2c - FFTW - double-long - 512 heffte: c2c - Stock - double-long - 512 gromacs: MPI CPU - water_GMX50_bare blender: Fishy Cat - CPU-Only blender: BMW27 - CPU-Only namd: ATPase Simulation - 327,506 Atoms lulesh: heffte: r2c - FFTW - double-long - 512 heffte: r2c - FFTW - double - 512 xmrig: Monero - 1M embree: Pathtracer - Asian Dragon Obj heffte: r2c - Stock - double - 512 heffte: r2c - Stock - double-long - 512 embree: Pathtracer ISPC - Asian Dragon Obj xmrig: Wownero - 1M heffte: c2c - Stock - float-long - 512 heffte: c2c - Stock - float - 512 npb: EP.D heffte: c2c - FFTW - float - 512 heffte: c2c - FFTW - float-long - 512 askap: Hogbom Clean OpenMP astcenc: Exhaustive npb: BT.C minife: Small incompact3d: input.i3d 193 Cells Per Direction npb: IS.D astcenc: Thorough npb: SP.C npb: LU.C heffte: r2c - FFTW - float-long - 512 heffte: r2c - FFTW - float - 512 astcenc: Fast heffte: r2c - Stock - float-long - 512 heffte: r2c - Stock - float - 512 embree: Pathtracer - Crown askap: tConvolve OpenMP - Degridding askap: tConvolve OpenMP - Gridding embree: Pathtracer ISPC - Crown embree: Pathtracer - Asian Dragon embree: Pathtracer ISPC - Asian Dragon libxsmm: 64 libxsmm: 32 astcenc: Medium npb: FT.C heffte: c2c - Stock - double-long - 256 heffte: c2c - Stock - double - 256 heffte: c2c - FFTW - double - 256 heffte: c2c - FFTW - double-long - 256 npb: CG.C incompact3d: input.i3d 129 Cells Per Direction npb: SP.B npb: MG.C heffte: c2c - Stock - float - 256 heffte: r2c - Stock - double-long - 256 heffte: r2c - Stock - double - 256 heffte: c2c - FFTW - float - 256 heffte: r2c - FFTW - double-long - 256 heffte: c2c - Stock - float-long - 256 heffte: c2c - FFTW - float-long - 256 heffte: r2c - FFTW - double - 256 npb: EP.C heffte: r2c - Stock - float-long - 256 heffte: r2c - FFTW - float - 256 heffte: r2c - FFTW - float-long - 256 heffte: r2c - Stock - float - 256 heffte: c2c - Stock - double - 128 heffte: c2c - Stock - double-long - 128 heffte: c2c - FFTW - double-long - 128 heffte: c2c - FFTW - double - 128 heffte: r2c - FFTW - double - 128 heffte: r2c - Stock - double - 128 heffte: c2c - Stock - float - 128 heffte: c2c - FFTW - float-long - 128 heffte: r2c - FFTW - double-long - 128 heffte: c2c - FFTW - float - 128 heffte: r2c - Stock - double-long - 128 heffte: c2c - Stock - float-long - 128 heffte: r2c - Stock - float-long - 128 heffte: r2c - FFTW - float-long - 128 heffte: r2c - FFTW - float - 128 heffte: r2c - Stock - float - 128 EPYC 9684X AMD 9684X 22.7211 22.8174 23.546 3119.7 377.588806 184.45409 107.493 15691 13617.8 34.355 3177.4 142.1 28.957931 23.301871 49.42 40.48 1373344.98 63766.48 3480531.79 213036.81 545786.37 256936.31 418126.43 73226.7 59410.3 68.1627 68.8242 68.445 68.8118 11.801 20.61 16.26 0.25068 30858.462 135.542 136.458 69478.2 114.1719 144.622 145.496 122.0413 74195 148.562 149.982 10620.59 154.311 155.713 1204.82 6.1412 305166.4 53878.4 7.65203524 5839.25 56.8898 211454.6 340754.87 332.908 336.451 1019.0656 341.923 343.723 110.6946 66564 26625.6 117.0799 126.2454 142.2792 2479.3 1299.8 421.4357 118915.97 83.5304 83.8913 82.5599 89.7978 62397.22 2.24493504 170601.48 141081.08 170.734 175.594 174 177.706 184.786 177.123 182.81 194.017 7839.23 327.355 315.15 318.534 331.609 70.1221 68.6498 84.3653 81.8473 125.241 115.816 110.016 126.301 129.282 130.752 115.339 110.998 176.529 184.354 186.051 178.478 22.4173 22.7753 22.8476 3091.5 388.701202 185.1475 107.82577 15691 13606.9 23.1377 3131.5 141.96 29.259894 23.166335 49.5 40.4 1426891.13 63760.46 3484985.88 214798.32 545747.29 254718.1 418120.08 74970.2 58310.1 68.2756 68.3745 68.6279 68.6366 11.798 20.44 16.46 0.24623 31065.289 136.27 135.556 69671.8 115.0435 143.312 143.534 122.3101 74288.7 149.973 149.163 10476.83 155.387 155.664 1204.82 6.1406 314982.58 53906.3 7.65393496 5572.16 56.8975 208198.13 335538.3 327.73 340.748 1009.8949 336.837 339.165 109.6474 53251.2 29584 117.2488 126.4095 142.2995 2472.2 1300.7 421.9692 121389.76 82.8587 82.8517 85.1501 82.3926 58099.4 1.96318495 169864.87 132064.96 167.193 182.393 176.447 176.295 184.654 178.56 179.544 189.443 7952 328.778 312.386 332.764 321.42 68.8689 70.2296 76.337 78.4937 125.7 116.285 111.535 130.477 124.932 128.608 116.264 114.309 176.556 187.191 188.224 175.82 OpenBenchmarking.org
High Performance Conjugate Gradient X Y Z: 192 192 192 - RT: 60 OpenBenchmarking.org GFLOP/s, More Is Better High Performance Conjugate Gradient 3.1 X Y Z: 192 192 192 - RT: 60 EPYC 9684X AMD 9684X 5 10 15 20 25 22.72 22.42 1. (CXX) g++ options: -O3 -ffast-math -ftree-vectorize -lmpi_cxx -lmpi
High Performance Conjugate Gradient X Y Z: 160 160 160 - RT: 60 OpenBenchmarking.org GFLOP/s, More Is Better High Performance Conjugate Gradient 3.1 X Y Z: 160 160 160 - RT: 60 EPYC 9684X AMD 9684X 5 10 15 20 25 22.82 22.78 1. (CXX) g++ options: -O3 -ffast-math -ftree-vectorize -lmpi_cxx -lmpi
High Performance Conjugate Gradient X Y Z: 144 144 144 - RT: 60 OpenBenchmarking.org GFLOP/s, More Is Better High Performance Conjugate Gradient 3.1 X Y Z: 144 144 144 - RT: 60 EPYC 9684X AMD 9684X 6 12 18 24 30 23.55 22.85 1. (CXX) g++ options: -O3 -ffast-math -ftree-vectorize -lmpi_cxx -lmpi
libxsmm M N K: 128 OpenBenchmarking.org GFLOPS/s, More Is Better libxsmm 2-1.17-3645 M N K: 128 EPYC 9684X AMD 9684X 700 1400 2100 2800 3500 3119.7 3091.5 1. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -msse4.2
Xcompact3d Incompact3d Input: X3D-benchmarking input.i3d OpenBenchmarking.org Seconds, Fewer Is Better Xcompact3d Incompact3d 2021-03-11 Input: X3D-benchmarking input.i3d EPYC 9684X AMD 9684X 80 160 240 320 400 377.59 388.70 1. (F9X) gfortran options: -cpp -O2 -funroll-loops -floop-optimize -fcray-pointer -fbacktrace -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
OpenFOAM Input: drivaerFastback, Medium Mesh Size - Execution Time OpenBenchmarking.org Seconds, Fewer Is Better OpenFOAM 10 Input: drivaerFastback, Medium Mesh Size - Execution Time EPYC 9684X AMD 9684X 40 80 120 160 200 184.45 185.15 1. (CXX) g++ options: -std=c++14 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -lfiniteVolume -lmeshTools -lparallel -llagrangian -lregionModels -lgenericPatchFields -lOpenFOAM -ldl -lm
OpenFOAM Input: drivaerFastback, Medium Mesh Size - Mesh Time OpenBenchmarking.org Seconds, Fewer Is Better OpenFOAM 10 Input: drivaerFastback, Medium Mesh Size - Mesh Time EPYC 9684X AMD 9684X 20 40 60 80 100 107.49 107.83 1. (CXX) g++ options: -std=c++14 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -lfiniteVolume -lmeshTools -lparallel -llagrangian -lregionModels -lgenericPatchFields -lOpenFOAM -ldl -lm
ASKAP Test: tConvolve MT - Degridding OpenBenchmarking.org Million Grid Points Per Second, More Is Better ASKAP 1.0 Test: tConvolve MT - Degridding AMD 9684X EPYC 9684X 3K 6K 9K 12K 15K 15691 15691 1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp
ASKAP Test: tConvolve MT - Gridding OpenBenchmarking.org Million Grid Points Per Second, More Is Better ASKAP 1.0 Test: tConvolve MT - Gridding EPYC 9684X AMD 9684X 3K 6K 9K 12K 15K 13617.8 13606.9 1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp
High Performance Conjugate Gradient X Y Z: 104 104 104 - RT: 60 OpenBenchmarking.org GFLOP/s, More Is Better High Performance Conjugate Gradient 3.1 X Y Z: 104 104 104 - RT: 60 EPYC 9684X AMD 9684X 8 16 24 32 40 34.36 23.14 1. (CXX) g++ options: -O3 -ffast-math -ftree-vectorize -lmpi_cxx -lmpi
libxsmm M N K: 256 OpenBenchmarking.org GFLOPS/s, More Is Better libxsmm 2-1.17-3645 M N K: 256 EPYC 9684X AMD 9684X 700 1400 2100 2800 3500 3177.4 3131.5 1. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -msse4.2
Blender Blend File: Barbershop - Compute: CPU-Only OpenBenchmarking.org Seconds, Fewer Is Better Blender 3.6 Blend File: Barbershop - Compute: CPU-Only AMD 9684X EPYC 9684X 30 60 90 120 150 141.96 142.10
OpenFOAM Input: drivaerFastback, Small Mesh Size - Execution Time OpenBenchmarking.org Seconds, Fewer Is Better OpenFOAM 10 Input: drivaerFastback, Small Mesh Size - Execution Time EPYC 9684X AMD 9684X 7 14 21 28 35 28.96 29.26 1. (CXX) g++ options: -std=c++14 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -lfiniteVolume -lmeshTools -lparallel -llagrangian -lregionModels -lgenericPatchFields -lOpenFOAM -ldl -lm
OpenFOAM Input: drivaerFastback, Small Mesh Size - Mesh Time OpenBenchmarking.org Seconds, Fewer Is Better OpenFOAM 10 Input: drivaerFastback, Small Mesh Size - Mesh Time AMD 9684X EPYC 9684X 6 12 18 24 30 23.17 23.30 1. (CXX) g++ options: -std=c++14 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -lfiniteVolume -lmeshTools -lparallel -llagrangian -lregionModels -lgenericPatchFields -lOpenFOAM -ldl -lm
Blender Blend File: Pabellon Barcelona - Compute: CPU-Only OpenBenchmarking.org Seconds, Fewer Is Better Blender 3.6 Blend File: Pabellon Barcelona - Compute: CPU-Only EPYC 9684X AMD 9684X 11 22 33 44 55 49.42 49.50
Blender Blend File: Classroom - Compute: CPU-Only OpenBenchmarking.org Seconds, Fewer Is Better Blender 3.6 Blend File: Classroom - Compute: CPU-Only AMD 9684X EPYC 9684X 9 18 27 36 45 40.40 40.48
Stress-NG Test: CPU Cache OpenBenchmarking.org Bogo Ops/s, More Is Better Stress-NG 0.15.10 Test: CPU Cache AMD 9684X EPYC 9684X 300K 600K 900K 1200K 1500K 1426891.13 1373344.98 1. (CXX) g++ options: -O2 -std=gnu99 -lc
Stress-NG Test: Vector Shuffle OpenBenchmarking.org Bogo Ops/s, More Is Better Stress-NG 0.15.10 Test: Vector Shuffle EPYC 9684X AMD 9684X 14K 28K 42K 56K 70K 63766.48 63760.46 1. (CXX) g++ options: -O2 -std=gnu99 -lc
Stress-NG Test: Wide Vector Math OpenBenchmarking.org Bogo Ops/s, More Is Better Stress-NG 0.15.10 Test: Wide Vector Math AMD 9684X EPYC 9684X 700K 1400K 2100K 2800K 3500K 3484985.88 3480531.79 1. (CXX) g++ options: -O2 -std=gnu99 -lc
Stress-NG Test: CPU Stress OpenBenchmarking.org Bogo Ops/s, More Is Better Stress-NG 0.15.10 Test: CPU Stress AMD 9684X EPYC 9684X 50K 100K 150K 200K 250K 214798.32 213036.81 1. (CXX) g++ options: -O2 -std=gnu99 -lc
Stress-NG Test: Vector Math OpenBenchmarking.org Bogo Ops/s, More Is Better Stress-NG 0.15.10 Test: Vector Math EPYC 9684X AMD 9684X 120K 240K 360K 480K 600K 545786.37 545747.29 1. (CXX) g++ options: -O2 -std=gnu99 -lc
Stress-NG Test: Vector Floating Point OpenBenchmarking.org Bogo Ops/s, More Is Better Stress-NG 0.15.10 Test: Vector Floating Point EPYC 9684X AMD 9684X 60K 120K 180K 240K 300K 256936.31 254718.10 1. (CXX) g++ options: -O2 -std=gnu99 -lc
Stress-NG Test: Matrix Math OpenBenchmarking.org Bogo Ops/s, More Is Better Stress-NG 0.15.10 Test: Matrix Math EPYC 9684X AMD 9684X 90K 180K 270K 360K 450K 418126.43 418120.08 1. (CXX) g++ options: -O2 -std=gnu99 -lc
ASKAP Test: tConvolve MPI - Gridding OpenBenchmarking.org Mpix/sec, More Is Better ASKAP 1.0 Test: tConvolve MPI - Gridding AMD 9684X EPYC 9684X 16K 32K 48K 64K 80K 74970.2 73226.7 1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp
ASKAP Test: tConvolve MPI - Degridding OpenBenchmarking.org Mpix/sec, More Is Better ASKAP 1.0 Test: tConvolve MPI - Degridding EPYC 9684X AMD 9684X 13K 26K 39K 52K 65K 59410.3 58310.1 1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp
HeFFTe - Highly Efficient FFT for Exascale Test: c2c - Backend: FFTW - Precision: double - X Y Z: 512 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: c2c - Backend: FFTW - Precision: double - X Y Z: 512 AMD 9684X EPYC 9684X 15 30 45 60 75 68.28 68.16 1. (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale Test: c2c - Backend: Stock - Precision: double - X Y Z: 512 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: c2c - Backend: Stock - Precision: double - X Y Z: 512 EPYC 9684X AMD 9684X 15 30 45 60 75 68.82 68.37 1. (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale Test: c2c - Backend: FFTW - Precision: double-long - X Y Z: 512 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: c2c - Backend: FFTW - Precision: double-long - X Y Z: 512 AMD 9684X EPYC 9684X 15 30 45 60 75 68.63 68.45 1. (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale Test: c2c - Backend: Stock - Precision: double-long - X Y Z: 512 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: c2c - Backend: Stock - Precision: double-long - X Y Z: 512 EPYC 9684X AMD 9684X 15 30 45 60 75 68.81 68.64 1. (CXX) g++ options: -O3
GROMACS Implementation: MPI CPU - Input: water_GMX50_bare OpenBenchmarking.org Ns Per Day, More Is Better GROMACS 2023 Implementation: MPI CPU - Input: water_GMX50_bare EPYC 9684X AMD 9684X 3 6 9 12 15 11.80 11.80 1. (CXX) g++ options: -O3
Blender Blend File: Fishy Cat - Compute: CPU-Only OpenBenchmarking.org Seconds, Fewer Is Better Blender 3.6 Blend File: Fishy Cat - Compute: CPU-Only AMD 9684X EPYC 9684X 5 10 15 20 25 20.44 20.61
Blender Blend File: BMW27 - Compute: CPU-Only OpenBenchmarking.org Seconds, Fewer Is Better Blender 3.6 Blend File: BMW27 - Compute: CPU-Only EPYC 9684X AMD 9684X 4 8 12 16 20 16.26 16.46
NAMD ATPase Simulation - 327,506 Atoms OpenBenchmarking.org days/ns, Fewer Is Better NAMD 2.14 ATPase Simulation - 327,506 Atoms AMD 9684X EPYC 9684X 0.0564 0.1128 0.1692 0.2256 0.282 0.24623 0.25068
LULESH OpenBenchmarking.org z/s, More Is Better LULESH 2.0.3 AMD 9684X EPYC 9684X 7K 14K 21K 28K 35K 31065.29 30858.46 1. (CXX) g++ options: -O3 -fopenmp -lm -lmpi_cxx -lmpi
HeFFTe - Highly Efficient FFT for Exascale Test: r2c - Backend: FFTW - Precision: double-long - X Y Z: 512 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: r2c - Backend: FFTW - Precision: double-long - X Y Z: 512 AMD 9684X EPYC 9684X 30 60 90 120 150 136.27 135.54 1. (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale Test: r2c - Backend: FFTW - Precision: double - X Y Z: 512 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: r2c - Backend: FFTW - Precision: double - X Y Z: 512 EPYC 9684X AMD 9684X 30 60 90 120 150 136.46 135.56 1. (CXX) g++ options: -O3
Xmrig Variant: Monero - Hash Count: 1M OpenBenchmarking.org H/s, More Is Better Xmrig 6.18.1 Variant: Monero - Hash Count: 1M AMD 9684X EPYC 9684X 15K 30K 45K 60K 75K 69671.8 69478.2 1. (CXX) g++ options: -fexceptions -fno-rtti -maes -O3 -Ofast -static-libgcc -static-libstdc++ -rdynamic -lssl -lcrypto -luv -lpthread -lrt -ldl -lhwloc
Embree Binary: Pathtracer - Model: Asian Dragon Obj OpenBenchmarking.org Frames Per Second, More Is Better Embree 4.1 Binary: Pathtracer - Model: Asian Dragon Obj AMD 9684X EPYC 9684X 30 60 90 120 150 115.04 114.17 MIN: 113.62 / MAX: 117.34 MIN: 112.65 / MAX: 116.39
HeFFTe - Highly Efficient FFT for Exascale Test: r2c - Backend: Stock - Precision: double - X Y Z: 512 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: r2c - Backend: Stock - Precision: double - X Y Z: 512 EPYC 9684X AMD 9684X 30 60 90 120 150 144.62 143.31 1. (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale Test: r2c - Backend: Stock - Precision: double-long - X Y Z: 512 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: r2c - Backend: Stock - Precision: double-long - X Y Z: 512 EPYC 9684X AMD 9684X 30 60 90 120 150 145.50 143.53 1. (CXX) g++ options: -O3
Embree Binary: Pathtracer ISPC - Model: Asian Dragon Obj OpenBenchmarking.org Frames Per Second, More Is Better Embree 4.1 Binary: Pathtracer ISPC - Model: Asian Dragon Obj AMD 9684X EPYC 9684X 30 60 90 120 150 122.31 122.04 MIN: 120.61 / MAX: 124.64 MIN: 120.36 / MAX: 124.72
Xmrig Variant: Wownero - Hash Count: 1M OpenBenchmarking.org H/s, More Is Better Xmrig 6.18.1 Variant: Wownero - Hash Count: 1M AMD 9684X EPYC 9684X 16K 32K 48K 64K 80K 74288.7 74195.0 1. (CXX) g++ options: -fexceptions -fno-rtti -maes -O3 -Ofast -static-libgcc -static-libstdc++ -rdynamic -lssl -lcrypto -luv -lpthread -lrt -ldl -lhwloc
HeFFTe - Highly Efficient FFT for Exascale Test: c2c - Backend: Stock - Precision: float-long - X Y Z: 512 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: c2c - Backend: Stock - Precision: float-long - X Y Z: 512 AMD 9684X EPYC 9684X 30 60 90 120 150 149.97 148.56 1. (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale Test: c2c - Backend: Stock - Precision: float - X Y Z: 512 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: c2c - Backend: Stock - Precision: float - X Y Z: 512 EPYC 9684X AMD 9684X 30 60 90 120 150 149.98 149.16 1. (CXX) g++ options: -O3
NAS Parallel Benchmarks Test / Class: EP.D OpenBenchmarking.org Total Mop/s, More Is Better NAS Parallel Benchmarks 3.4 Test / Class: EP.D EPYC 9684X AMD 9684X 2K 4K 6K 8K 10K 10620.59 10476.83 1. (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz 2. Open MPI 4.1.2
HeFFTe - Highly Efficient FFT for Exascale Test: c2c - Backend: FFTW - Precision: float - X Y Z: 512 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: c2c - Backend: FFTW - Precision: float - X Y Z: 512 AMD 9684X EPYC 9684X 30 60 90 120 150 155.39 154.31 1. (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale Test: c2c - Backend: FFTW - Precision: float-long - X Y Z: 512 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: c2c - Backend: FFTW - Precision: float-long - X Y Z: 512 EPYC 9684X AMD 9684X 30 60 90 120 150 155.71 155.66 1. (CXX) g++ options: -O3
ASKAP Test: Hogbom Clean OpenMP OpenBenchmarking.org Iterations Per Second, More Is Better ASKAP 1.0 Test: Hogbom Clean OpenMP AMD 9684X EPYC 9684X 300 600 900 1200 1500 1204.82 1204.82 1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp
ASTC Encoder Preset: Exhaustive OpenBenchmarking.org MT/s, More Is Better ASTC Encoder 4.0 Preset: Exhaustive EPYC 9684X AMD 9684X 2 4 6 8 10 6.1412 6.1406 1. (CXX) g++ options: -O3 -flto -pthread
NAS Parallel Benchmarks Test / Class: BT.C OpenBenchmarking.org Total Mop/s, More Is Better NAS Parallel Benchmarks 3.4 Test / Class: BT.C AMD 9684X EPYC 9684X 70K 140K 210K 280K 350K 314982.58 305166.40 1. (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz 2. Open MPI 4.1.2
miniFE Problem Size: Small OpenBenchmarking.org CG Mflops, More Is Better miniFE 2.2 Problem Size: Small AMD 9684X EPYC 9684X 12K 24K 36K 48K 60K 53906.3 53878.4 1. (CXX) g++ options: -O3 -fopenmp -lmpi_cxx -lmpi
Xcompact3d Incompact3d Input: input.i3d 193 Cells Per Direction OpenBenchmarking.org Seconds, Fewer Is Better Xcompact3d Incompact3d 2021-03-11 Input: input.i3d 193 Cells Per Direction EPYC 9684X AMD 9684X 2 4 6 8 10 7.65203524 7.65393496 1. (F9X) gfortran options: -cpp -O2 -funroll-loops -floop-optimize -fcray-pointer -fbacktrace -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
NAS Parallel Benchmarks Test / Class: IS.D OpenBenchmarking.org Total Mop/s, More Is Better NAS Parallel Benchmarks 3.4 Test / Class: IS.D EPYC 9684X AMD 9684X 1300 2600 3900 5200 6500 5839.25 5572.16 1. (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz 2. Open MPI 4.1.2
ASTC Encoder Preset: Thorough OpenBenchmarking.org MT/s, More Is Better ASTC Encoder 4.0 Preset: Thorough AMD 9684X EPYC 9684X 13 26 39 52 65 56.90 56.89 1. (CXX) g++ options: -O3 -flto -pthread
NAS Parallel Benchmarks Test / Class: SP.C OpenBenchmarking.org Total Mop/s, More Is Better NAS Parallel Benchmarks 3.4 Test / Class: SP.C EPYC 9684X AMD 9684X 50K 100K 150K 200K 250K 211454.60 208198.13 1. (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz 2. Open MPI 4.1.2
NAS Parallel Benchmarks Test / Class: LU.C OpenBenchmarking.org Total Mop/s, More Is Better NAS Parallel Benchmarks 3.4 Test / Class: LU.C EPYC 9684X AMD 9684X 70K 140K 210K 280K 350K 340754.87 335538.30 1. (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz 2. Open MPI 4.1.2
HeFFTe - Highly Efficient FFT for Exascale Test: r2c - Backend: FFTW - Precision: float-long - X Y Z: 512 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: r2c - Backend: FFTW - Precision: float-long - X Y Z: 512 EPYC 9684X AMD 9684X 70 140 210 280 350 332.91 327.73 1. (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale Test: r2c - Backend: FFTW - Precision: float - X Y Z: 512 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: r2c - Backend: FFTW - Precision: float - X Y Z: 512 AMD 9684X EPYC 9684X 70 140 210 280 350 340.75 336.45 1. (CXX) g++ options: -O3
ASTC Encoder Preset: Fast OpenBenchmarking.org MT/s, More Is Better ASTC Encoder 4.0 Preset: Fast EPYC 9684X AMD 9684X 200 400 600 800 1000 1019.07 1009.89 1. (CXX) g++ options: -O3 -flto -pthread
HeFFTe - Highly Efficient FFT for Exascale Test: r2c - Backend: Stock - Precision: float-long - X Y Z: 512 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: r2c - Backend: Stock - Precision: float-long - X Y Z: 512 EPYC 9684X AMD 9684X 70 140 210 280 350 341.92 336.84 1. (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale Test: r2c - Backend: Stock - Precision: float - X Y Z: 512 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: r2c - Backend: Stock - Precision: float - X Y Z: 512 EPYC 9684X AMD 9684X 70 140 210 280 350 343.72 339.17 1. (CXX) g++ options: -O3
Embree Binary: Pathtracer - Model: Crown OpenBenchmarking.org Frames Per Second, More Is Better Embree 4.1 Binary: Pathtracer - Model: Crown EPYC 9684X AMD 9684X 20 40 60 80 100 110.69 109.65 MIN: 108.41 / MAX: 113.64 MIN: 107.65 / MAX: 113.51
ASKAP Test: tConvolve OpenMP - Degridding OpenBenchmarking.org Million Grid Points Per Second, More Is Better ASKAP 1.0 Test: tConvolve OpenMP - Degridding EPYC 9684X AMD 9684X 14K 28K 42K 56K 70K 66564.0 53251.2 1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp
ASKAP Test: tConvolve OpenMP - Gridding OpenBenchmarking.org Million Grid Points Per Second, More Is Better ASKAP 1.0 Test: tConvolve OpenMP - Gridding AMD 9684X EPYC 9684X 6K 12K 18K 24K 30K 29584.0 26625.6 1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp
Embree Binary: Pathtracer ISPC - Model: Crown OpenBenchmarking.org Frames Per Second, More Is Better Embree 4.1 Binary: Pathtracer ISPC - Model: Crown AMD 9684X EPYC 9684X 30 60 90 120 150 117.25 117.08 MIN: 114.61 / MAX: 121.35 MIN: 114.57 / MAX: 120.79
Embree Binary: Pathtracer - Model: Asian Dragon OpenBenchmarking.org Frames Per Second, More Is Better Embree 4.1 Binary: Pathtracer - Model: Asian Dragon AMD 9684X EPYC 9684X 30 60 90 120 150 126.41 126.25 MIN: 124.93 / MAX: 128.49 MIN: 124.97 / MAX: 128.11
Embree Binary: Pathtracer ISPC - Model: Asian Dragon OpenBenchmarking.org Frames Per Second, More Is Better Embree 4.1 Binary: Pathtracer ISPC - Model: Asian Dragon AMD 9684X EPYC 9684X 30 60 90 120 150 142.30 142.28 MIN: 140.74 / MAX: 144.53 MIN: 140.41 / MAX: 144.57
libxsmm M N K: 64 OpenBenchmarking.org GFLOPS/s, More Is Better libxsmm 2-1.17-3645 M N K: 64 EPYC 9684X AMD 9684X 500 1000 1500 2000 2500 2479.3 2472.2 1. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -msse4.2
libxsmm M N K: 32 OpenBenchmarking.org GFLOPS/s, More Is Better libxsmm 2-1.17-3645 M N K: 32 AMD 9684X EPYC 9684X 300 600 900 1200 1500 1300.7 1299.8 1. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -msse4.2
ASTC Encoder Preset: Medium OpenBenchmarking.org MT/s, More Is Better ASTC Encoder 4.0 Preset: Medium AMD 9684X EPYC 9684X 90 180 270 360 450 421.97 421.44 1. (CXX) g++ options: -O3 -flto -pthread
NAS Parallel Benchmarks Test / Class: FT.C OpenBenchmarking.org Total Mop/s, More Is Better NAS Parallel Benchmarks 3.4 Test / Class: FT.C AMD 9684X EPYC 9684X 30K 60K 90K 120K 150K 121389.76 118915.97 1. (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz 2. Open MPI 4.1.2
HeFFTe - Highly Efficient FFT for Exascale Test: c2c - Backend: Stock - Precision: double-long - X Y Z: 256 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: c2c - Backend: Stock - Precision: double-long - X Y Z: 256 EPYC 9684X AMD 9684X 20 40 60 80 100 83.53 82.86 1. (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale Test: c2c - Backend: Stock - Precision: double - X Y Z: 256 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: c2c - Backend: Stock - Precision: double - X Y Z: 256 EPYC 9684X AMD 9684X 20 40 60 80 100 83.89 82.85 1. (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale Test: c2c - Backend: FFTW - Precision: double - X Y Z: 256 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: c2c - Backend: FFTW - Precision: double - X Y Z: 256 AMD 9684X EPYC 9684X 20 40 60 80 100 85.15 82.56 1. (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale Test: c2c - Backend: FFTW - Precision: double-long - X Y Z: 256 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: c2c - Backend: FFTW - Precision: double-long - X Y Z: 256 EPYC 9684X AMD 9684X 20 40 60 80 100 89.80 82.39 1. (CXX) g++ options: -O3
NAS Parallel Benchmarks Test / Class: CG.C OpenBenchmarking.org Total Mop/s, More Is Better NAS Parallel Benchmarks 3.4 Test / Class: CG.C EPYC 9684X AMD 9684X 13K 26K 39K 52K 65K 62397.22 58099.40 1. (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz 2. Open MPI 4.1.2
Xcompact3d Incompact3d Input: input.i3d 129 Cells Per Direction OpenBenchmarking.org Seconds, Fewer Is Better Xcompact3d Incompact3d 2021-03-11 Input: input.i3d 129 Cells Per Direction AMD 9684X EPYC 9684X 0.5051 1.0102 1.5153 2.0204 2.5255 1.96318495 2.24493504 1. (F9X) gfortran options: -cpp -O2 -funroll-loops -floop-optimize -fcray-pointer -fbacktrace -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
NAS Parallel Benchmarks Test / Class: SP.B OpenBenchmarking.org Total Mop/s, More Is Better NAS Parallel Benchmarks 3.4 Test / Class: SP.B EPYC 9684X AMD 9684X 40K 80K 120K 160K 200K 170601.48 169864.87 1. (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz 2. Open MPI 4.1.2
NAS Parallel Benchmarks Test / Class: MG.C OpenBenchmarking.org Total Mop/s, More Is Better NAS Parallel Benchmarks 3.4 Test / Class: MG.C EPYC 9684X AMD 9684X 30K 60K 90K 120K 150K 141081.08 132064.96 1. (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz 2. Open MPI 4.1.2
HeFFTe - Highly Efficient FFT for Exascale Test: c2c - Backend: Stock - Precision: float - X Y Z: 256 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: c2c - Backend: Stock - Precision: float - X Y Z: 256 EPYC 9684X AMD 9684X 40 80 120 160 200 170.73 167.19 1. (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale Test: r2c - Backend: Stock - Precision: double-long - X Y Z: 256 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: r2c - Backend: Stock - Precision: double-long - X Y Z: 256 AMD 9684X EPYC 9684X 40 80 120 160 200 182.39 175.59 1. (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale Test: r2c - Backend: Stock - Precision: double - X Y Z: 256 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: r2c - Backend: Stock - Precision: double - X Y Z: 256 AMD 9684X EPYC 9684X 40 80 120 160 200 176.45 174.00 1. (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale Test: c2c - Backend: FFTW - Precision: float - X Y Z: 256 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: c2c - Backend: FFTW - Precision: float - X Y Z: 256 EPYC 9684X AMD 9684X 40 80 120 160 200 177.71 176.30 1. (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale Test: r2c - Backend: FFTW - Precision: double-long - X Y Z: 256 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: r2c - Backend: FFTW - Precision: double-long - X Y Z: 256 EPYC 9684X AMD 9684X 40 80 120 160 200 184.79 184.65 1. (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale Test: c2c - Backend: Stock - Precision: float-long - X Y Z: 256 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: c2c - Backend: Stock - Precision: float-long - X Y Z: 256 AMD 9684X EPYC 9684X 40 80 120 160 200 178.56 177.12 1. (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale Test: c2c - Backend: FFTW - Precision: float-long - X Y Z: 256 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: c2c - Backend: FFTW - Precision: float-long - X Y Z: 256 EPYC 9684X AMD 9684X 40 80 120 160 200 182.81 179.54 1. (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale Test: r2c - Backend: FFTW - Precision: double - X Y Z: 256 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: r2c - Backend: FFTW - Precision: double - X Y Z: 256 EPYC 9684X AMD 9684X 40 80 120 160 200 194.02 189.44 1. (CXX) g++ options: -O3
NAS Parallel Benchmarks Test / Class: EP.C OpenBenchmarking.org Total Mop/s, More Is Better NAS Parallel Benchmarks 3.4 Test / Class: EP.C AMD 9684X EPYC 9684X 2K 4K 6K 8K 10K 7952.00 7839.23 1. (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz 2. Open MPI 4.1.2
HeFFTe - Highly Efficient FFT for Exascale Test: r2c - Backend: Stock - Precision: float-long - X Y Z: 256 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: r2c - Backend: Stock - Precision: float-long - X Y Z: 256 AMD 9684X EPYC 9684X 70 140 210 280 350 328.78 327.36 1. (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale Test: r2c - Backend: FFTW - Precision: float - X Y Z: 256 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: r2c - Backend: FFTW - Precision: float - X Y Z: 256 EPYC 9684X AMD 9684X 70 140 210 280 350 315.15 312.39 1. (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale Test: r2c - Backend: FFTW - Precision: float-long - X Y Z: 256 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: r2c - Backend: FFTW - Precision: float-long - X Y Z: 256 AMD 9684X EPYC 9684X 70 140 210 280 350 332.76 318.53 1. (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale Test: r2c - Backend: Stock - Precision: float - X Y Z: 256 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: r2c - Backend: Stock - Precision: float - X Y Z: 256 EPYC 9684X AMD 9684X 70 140 210 280 350 331.61 321.42 1. (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale Test: c2c - Backend: Stock - Precision: double - X Y Z: 128 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: c2c - Backend: Stock - Precision: double - X Y Z: 128 EPYC 9684X AMD 9684X 16 32 48 64 80 70.12 68.87 1. (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale Test: c2c - Backend: Stock - Precision: double-long - X Y Z: 128 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: c2c - Backend: Stock - Precision: double-long - X Y Z: 128 AMD 9684X EPYC 9684X 16 32 48 64 80 70.23 68.65 1. (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale Test: c2c - Backend: FFTW - Precision: double-long - X Y Z: 128 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: c2c - Backend: FFTW - Precision: double-long - X Y Z: 128 EPYC 9684X AMD 9684X 20 40 60 80 100 84.37 76.34 1. (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale Test: c2c - Backend: FFTW - Precision: double - X Y Z: 128 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: c2c - Backend: FFTW - Precision: double - X Y Z: 128 EPYC 9684X AMD 9684X 20 40 60 80 100 81.85 78.49 1. (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale Test: r2c - Backend: FFTW - Precision: double - X Y Z: 128 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: r2c - Backend: FFTW - Precision: double - X Y Z: 128 AMD 9684X EPYC 9684X 30 60 90 120 150 125.70 125.24 1. (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale Test: r2c - Backend: Stock - Precision: double - X Y Z: 128 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: r2c - Backend: Stock - Precision: double - X Y Z: 128 AMD 9684X EPYC 9684X 30 60 90 120 150 116.29 115.82 1. (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale Test: c2c - Backend: Stock - Precision: float - X Y Z: 128 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: c2c - Backend: Stock - Precision: float - X Y Z: 128 AMD 9684X EPYC 9684X 20 40 60 80 100 111.54 110.02 1. (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale Test: c2c - Backend: FFTW - Precision: float-long - X Y Z: 128 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: c2c - Backend: FFTW - Precision: float-long - X Y Z: 128 AMD 9684X EPYC 9684X 30 60 90 120 150 130.48 126.30 1. (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale Test: r2c - Backend: FFTW - Precision: double-long - X Y Z: 128 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: r2c - Backend: FFTW - Precision: double-long - X Y Z: 128 EPYC 9684X AMD 9684X 30 60 90 120 150 129.28 124.93 1. (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale Test: c2c - Backend: FFTW - Precision: float - X Y Z: 128 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: c2c - Backend: FFTW - Precision: float - X Y Z: 128 EPYC 9684X AMD 9684X 30 60 90 120 150 130.75 128.61 1. (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale Test: r2c - Backend: Stock - Precision: double-long - X Y Z: 128 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: r2c - Backend: Stock - Precision: double-long - X Y Z: 128 AMD 9684X EPYC 9684X 30 60 90 120 150 116.26 115.34 1. (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale Test: c2c - Backend: Stock - Precision: float-long - X Y Z: 128 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: c2c - Backend: Stock - Precision: float-long - X Y Z: 128 AMD 9684X EPYC 9684X 30 60 90 120 150 114.31 111.00 1. (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale Test: r2c - Backend: Stock - Precision: float-long - X Y Z: 128 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: r2c - Backend: Stock - Precision: float-long - X Y Z: 128 AMD 9684X EPYC 9684X 40 80 120 160 200 176.56 176.53 1. (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale Test: r2c - Backend: FFTW - Precision: float-long - X Y Z: 128 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: r2c - Backend: FFTW - Precision: float-long - X Y Z: 128 AMD 9684X EPYC 9684X 40 80 120 160 200 187.19 184.35 1. (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale Test: r2c - Backend: FFTW - Precision: float - X Y Z: 128 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: r2c - Backend: FFTW - Precision: float - X Y Z: 128 AMD 9684X EPYC 9684X 40 80 120 160 200 188.22 186.05 1. (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale Test: r2c - Backend: Stock - Precision: float - X Y Z: 128 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: r2c - Backend: Stock - Precision: float - X Y Z: 128 EPYC 9684X AMD 9684X 40 80 120 160 200 178.48 175.82 1. (CXX) g++ options: -O3
Phoronix Test Suite v10.8.5