ddds Tests for a future article. Intel Core i7-1280P testing with a MSI MS-14C6 (E14C6IMS.115 BIOS) and MSI Intel ADL GT2 15GB on Pop 22.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2306220-NE-DDDS3788647&rdt .
ddds Processor Motherboard Chipset Memory Disk Graphics Audio Network OS Kernel Desktop Display Server OpenGL Vulkan Compiler File-System Screen Resolution s b Intel Core i7-1280P @ 4.70GHz (14 Cores / 20 Threads) MSI MS-14C6 (E14C6IMS.115 BIOS) Intel Alder Lake PCH 16GB 1024GB Micron_3400_MTFDKBA1T0TFH MSI Intel ADL GT2 15GB (1450MHz) Realtek ALC274 Intel Alder Lake-P PCH CNVi WiFi Pop 22.04 6.2.6-76060206-generic (x86_64) GNOME Shell 42.5 X Server 1.21.1.4 4.6 Mesa 22.3.5 1.3.230 GCC 11.3.0 ext4 1920x1080 OpenBenchmarking.org Kernel Details - Transparent Huge Pages: madvise Compiler Details - --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-11-xKiWfi/gcc-11-11.3.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-xKiWfi/gcc-11-11.3.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v Processor Details - Scaling Governor: intel_pstate powersave (EPP: balance_performance) - CPU Microcode: 0x429 - Thermald 2.4.9 Python Details - Python 3.10.6 Security Details - itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced IBRS IBPB: conditional RSB filling PBRSB-eIBRS: SW sequence + srbds: Not affected + tsx_async_abort: Not affected
ddds cp2k: H20-64 cp2k: Fayalite-FIST libxsmm: 128 libxsmm: 256 libxsmm: 32 libxsmm: 64 laghos: Triple Point Problem laghos: Sedov Blast Wave, ube_922_hex.mesh heffte: c2c - FFTW - float - 128 heffte: c2c - FFTW - float - 256 heffte: r2c - FFTW - float - 128 heffte: r2c - FFTW - float - 256 heffte: c2c - FFTW - double - 128 heffte: c2c - FFTW - double - 256 heffte: c2c - Stock - float - 128 heffte: c2c - Stock - float - 256 heffte: r2c - FFTW - double - 128 heffte: r2c - FFTW - double - 256 heffte: r2c - Stock - float - 128 heffte: r2c - Stock - float - 256 heffte: c2c - Stock - double - 128 heffte: c2c - Stock - double - 256 heffte: r2c - Stock - double - 128 heffte: r2c - Stock - double - 256 heffte: c2c - FFTW - float-long - 128 heffte: c2c - FFTW - float-long - 256 heffte: r2c - FFTW - float-long - 128 heffte: r2c - FFTW - float-long - 256 heffte: c2c - FFTW - double-long - 128 heffte: c2c - FFTW - double-long - 256 heffte: c2c - Stock - float-long - 128 heffte: c2c - Stock - float-long - 256 heffte: r2c - FFTW - double-long - 128 heffte: r2c - FFTW - double-long - 256 heffte: r2c - Stock - float-long - 128 heffte: r2c - Stock - float-long - 256 heffte: c2c - Stock - double-long - 128 heffte: c2c - Stock - double-long - 256 heffte: r2c - Stock - double-long - 128 heffte: r2c - Stock - double-long - 256 palabos: 100 palabos: 400 palabos: 500 qmcpack: Li2_STO_ae qmcpack: simple-H2O qmcpack: FeCO6_b3lyp_gms qmcpack: FeCO6_b3lyp_gms mocassin: Gas HII40 mocassin: Dust 2D tau100.0 remhos: Sample Remap Example srsran: Downlink Processor Benchmark srsran: PUSCH Processor Benchmark, Throughput Total srsran: PUSCH Processor Benchmark, Throughput Thread svt-av1: Preset 4 - Bosphorus 4K svt-av1: Preset 8 - Bosphorus 4K svt-av1: Preset 12 - Bosphorus 4K svt-av1: Preset 13 - Bosphorus 4K svt-av1: Preset 4 - Bosphorus 1080p svt-av1: Preset 8 - Bosphorus 1080p svt-av1: Preset 12 - Bosphorus 1080p svt-av1: Preset 13 - Bosphorus 1080p ospray: particle_volume/ao/real_time ospray: particle_volume/scivis/real_time ospray: particle_volume/pathtracer/real_time ospray: gravity_spheres_volume/dim_512/ao/real_time ospray: gravity_spheres_volume/dim_512/scivis/real_time ospray: gravity_spheres_volume/dim_512/pathtracer/real_time liquid-dsp: 1 - 256 - 32 liquid-dsp: 1 - 256 - 57 liquid-dsp: 2 - 256 - 32 liquid-dsp: 2 - 256 - 57 liquid-dsp: 4 - 256 - 32 liquid-dsp: 4 - 256 - 57 liquid-dsp: 8 - 256 - 32 liquid-dsp: 8 - 256 - 57 liquid-dsp: 1 - 256 - 512 liquid-dsp: 16 - 256 - 32 liquid-dsp: 16 - 256 - 57 liquid-dsp: 2 - 256 - 512 liquid-dsp: 20 - 256 - 32 liquid-dsp: 20 - 256 - 57 liquid-dsp: 4 - 256 - 512 liquid-dsp: 8 - 256 - 512 liquid-dsp: 16 - 256 - 512 liquid-dsp: 20 - 256 - 512 gpaw: Carbon Nanotube kripke: s b 183.282 274.838 155 132 115.9 165.2 55.64 54.57 32.7562 23.7337 69.2827 43.6571 13.1156 12.4492 21.3012 23.0588 29.0664 22.4292 41.4002 44.8541 10.9079 12.2602 21.6659 23.3075 33.497 24.0358 70.3859 43.9622 13.0911 12.5828 21.4591 23.1243 29.2543 22.5498 41.2863 45.0903 11.1481 12.3561 21.2745 23.5563 42.8666 71.5688 4.01361 764.32 97.233 785.94 701.74 43.322 347.856 156.205 930.8 928.8 262.1 2.536 25.092 95.307 95.008 10.527 82.594 313.928 414.315 2.72471 2.68839 86.5083 1.38514 1.34452 1.90768 55367000 74746000 106970000 131530000 173530000 219310000 260250000 297390000 20261000 404840000 409920000 36447000 459560000 398190000 56270000 76055000 90490000 103260000 504.804 186.76 321.562 154.8 132.1 113.9 148.5 50.43 54.00 32.9737 23.7399 70.3081 43.4953 12.9978 12.3722 21.9433 22.8766 29.2268 22.3159 42.2187 44.8106 11.3257 12.1383 21.8393 23.2797 33.0029 23.863 70.4569 43.9486 13.0229 12.4748 21.4194 22.9256 29.1713 22.4287 41.4403 45.2388 10.9293 12.1257 21.6826 23.3425 41.8292 71.7773 3.59061 769.69 95.648 821.04 700 42.805 343.614 149.627 939 937.4 265.8 2.538 24.191 95.639 95.367 9.831 82.226 308.174 422.121 2.72216 2.68738 85.8189 1.37949 1.33905 1.8824 55376000 76920000 106820000 130780000 172210000 215430000 265880000 304170000 19701000 398890000 408980000 36484000 460350000 399280000 56254000 73590000 94755000 104360000 526.532 OpenBenchmarking.org
CP2K Molecular Dynamics Input: H20-64 OpenBenchmarking.org Seconds, Fewer Is Better CP2K Molecular Dynamics 2023.1 Input: H20-64 s b 40 80 120 160 200 183.28 186.76 1. (F9X) gfortran options: -fopenmp -mtune=native -O3 -funroll-loops -fbacktrace -ffree-form -fimplicit-none -std=f2008 -lcp2kstart -lcp2kmc -lcp2kswarm -lcp2kmotion -lcp2kthermostat -lcp2kemd -lcp2ktmc -lcp2kmain -lcp2kdbt -lcp2ktas -lcp2kdbm -lcp2kgrid -lcp2kgridcpu -lcp2kgridref -lcp2kgridcommon -ldbcsrarnoldi -ldbcsrx -lcp2kshg_int -lcp2keri_mme -lcp2kminimax -lcp2khfxbase -lcp2ksubsys -lcp2kxc -lcp2kao -lcp2kpw_env -lcp2kinput -lcp2kpw -lcp2kgpu -lcp2kfft -lcp2kfpga -lcp2kfm -lcp2kcommon -lcp2koffload -lcp2kmpiwrap -lcp2kbase -ldbcsr -lsirius -lspla -lspfft -lsymspg -lvdwxc -lhdf5 -lhdf5_hl -lz -lgsl -lelpa_openmp -lcosma -lcosta -lscalapack -lxsmmf -lxsmm -ldl -lpthread -lxcf03 -lxc -lint2 -lfftw3_mpi -lfftw3 -lfftw3_omp -lmpi_cxx -lmpi -lopenblas -lvori -lstdc++ -lmpi_usempif08 -lmpi_mpifh -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm
CP2K Molecular Dynamics Input: Fayalite-FIST OpenBenchmarking.org Seconds, Fewer Is Better CP2K Molecular Dynamics 2023.1 Input: Fayalite-FIST s b 70 140 210 280 350 274.84 321.56 1. (F9X) gfortran options: -fopenmp -mtune=native -O3 -funroll-loops -fbacktrace -ffree-form -fimplicit-none -std=f2008 -lcp2kstart -lcp2kmc -lcp2kswarm -lcp2kmotion -lcp2kthermostat -lcp2kemd -lcp2ktmc -lcp2kmain -lcp2kdbt -lcp2ktas -lcp2kdbm -lcp2kgrid -lcp2kgridcpu -lcp2kgridref -lcp2kgridcommon -ldbcsrarnoldi -ldbcsrx -lcp2kshg_int -lcp2keri_mme -lcp2kminimax -lcp2khfxbase -lcp2ksubsys -lcp2kxc -lcp2kao -lcp2kpw_env -lcp2kinput -lcp2kpw -lcp2kgpu -lcp2kfft -lcp2kfpga -lcp2kfm -lcp2kcommon -lcp2koffload -lcp2kmpiwrap -lcp2kbase -ldbcsr -lsirius -lspla -lspfft -lsymspg -lvdwxc -lhdf5 -lhdf5_hl -lz -lgsl -lelpa_openmp -lcosma -lcosta -lscalapack -lxsmmf -lxsmm -ldl -lpthread -lxcf03 -lxc -lint2 -lfftw3_mpi -lfftw3 -lfftw3_omp -lmpi_cxx -lmpi -lopenblas -lvori -lstdc++ -lmpi_usempif08 -lmpi_mpifh -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm
libxsmm M N K: 128 OpenBenchmarking.org GFLOPS/s, More Is Better libxsmm 2-1.17-3645 M N K: 128 s b 30 60 90 120 150 155.0 154.8 1. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -pedantic -O2 -fopenmp -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -march=core-avx2
libxsmm M N K: 256 OpenBenchmarking.org GFLOPS/s, More Is Better libxsmm 2-1.17-3645 M N K: 256 s b 30 60 90 120 150 132.0 132.1 1. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -pedantic -O2 -fopenmp -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -march=core-avx2
libxsmm M N K: 32 OpenBenchmarking.org GFLOPS/s, More Is Better libxsmm 2-1.17-3645 M N K: 32 s b 30 60 90 120 150 115.9 113.9 1. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -pedantic -O2 -fopenmp -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -march=core-avx2
libxsmm M N K: 64 OpenBenchmarking.org GFLOPS/s, More Is Better libxsmm 2-1.17-3645 M N K: 64 s b 40 80 120 160 200 165.2 148.5 1. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -pedantic -O2 -fopenmp -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -march=core-avx2
Laghos Test: Triple Point Problem OpenBenchmarking.org Major Kernels Total Rate, More Is Better Laghos 3.1 Test: Triple Point Problem s b 12 24 36 48 60 55.64 50.43 1. (CXX) g++ options: -O3 -std=c++11 -lmfem -lHYPRE -lmetis -lrt -lmpi_cxx -lmpi
Laghos Test: Sedov Blast Wave, ube_922_hex.mesh OpenBenchmarking.org Major Kernels Total Rate, More Is Better Laghos 3.1 Test: Sedov Blast Wave, ube_922_hex.mesh s b 12 24 36 48 60 54.57 54.00 1. (CXX) g++ options: -O3 -std=c++11 -lmfem -lHYPRE -lmetis -lrt -lmpi_cxx -lmpi
HeFFTe - Highly Efficient FFT for Exascale Test: c2c - Backend: FFTW - Precision: float - X Y Z: 128 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: c2c - Backend: FFTW - Precision: float - X Y Z: 128 s b 8 16 24 32 40 32.76 32.97 1. (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale Test: c2c - Backend: FFTW - Precision: float - X Y Z: 256 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: c2c - Backend: FFTW - Precision: float - X Y Z: 256 s b 6 12 18 24 30 23.73 23.74 1. (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale Test: r2c - Backend: FFTW - Precision: float - X Y Z: 128 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: r2c - Backend: FFTW - Precision: float - X Y Z: 128 s b 16 32 48 64 80 69.28 70.31 1. (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale Test: r2c - Backend: FFTW - Precision: float - X Y Z: 256 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: r2c - Backend: FFTW - Precision: float - X Y Z: 256 s b 10 20 30 40 50 43.66 43.50 1. (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale Test: c2c - Backend: FFTW - Precision: double - X Y Z: 128 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: c2c - Backend: FFTW - Precision: double - X Y Z: 128 s b 3 6 9 12 15 13.12 13.00 1. (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale Test: c2c - Backend: FFTW - Precision: double - X Y Z: 256 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: c2c - Backend: FFTW - Precision: double - X Y Z: 256 s b 3 6 9 12 15 12.45 12.37 1. (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale Test: c2c - Backend: Stock - Precision: float - X Y Z: 128 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: c2c - Backend: Stock - Precision: float - X Y Z: 128 s b 5 10 15 20 25 21.30 21.94 1. (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale Test: c2c - Backend: Stock - Precision: float - X Y Z: 256 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: c2c - Backend: Stock - Precision: float - X Y Z: 256 s b 6 12 18 24 30 23.06 22.88 1. (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale Test: r2c - Backend: FFTW - Precision: double - X Y Z: 128 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: r2c - Backend: FFTW - Precision: double - X Y Z: 128 s b 7 14 21 28 35 29.07 29.23 1. (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale Test: r2c - Backend: FFTW - Precision: double - X Y Z: 256 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: r2c - Backend: FFTW - Precision: double - X Y Z: 256 s b 5 10 15 20 25 22.43 22.32 1. (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale Test: r2c - Backend: Stock - Precision: float - X Y Z: 128 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: r2c - Backend: Stock - Precision: float - X Y Z: 128 s b 10 20 30 40 50 41.40 42.22 1. (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale Test: r2c - Backend: Stock - Precision: float - X Y Z: 256 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: r2c - Backend: Stock - Precision: float - X Y Z: 256 s b 10 20 30 40 50 44.85 44.81 1. (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale Test: c2c - Backend: Stock - Precision: double - X Y Z: 128 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: c2c - Backend: Stock - Precision: double - X Y Z: 128 s b 3 6 9 12 15 10.91 11.33 1. (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale Test: c2c - Backend: Stock - Precision: double - X Y Z: 256 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: c2c - Backend: Stock - Precision: double - X Y Z: 256 s b 3 6 9 12 15 12.26 12.14 1. (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale Test: r2c - Backend: Stock - Precision: double - X Y Z: 128 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: r2c - Backend: Stock - Precision: double - X Y Z: 128 s b 5 10 15 20 25 21.67 21.84 1. (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale Test: r2c - Backend: Stock - Precision: double - X Y Z: 256 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: r2c - Backend: Stock - Precision: double - X Y Z: 256 s b 6 12 18 24 30 23.31 23.28 1. (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale Test: c2c - Backend: FFTW - Precision: float-long - X Y Z: 128 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: c2c - Backend: FFTW - Precision: float-long - X Y Z: 128 s b 8 16 24 32 40 33.50 33.00 1. (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale Test: c2c - Backend: FFTW - Precision: float-long - X Y Z: 256 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: c2c - Backend: FFTW - Precision: float-long - X Y Z: 256 s b 6 12 18 24 30 24.04 23.86 1. (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale Test: r2c - Backend: FFTW - Precision: float-long - X Y Z: 128 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: r2c - Backend: FFTW - Precision: float-long - X Y Z: 128 s b 16 32 48 64 80 70.39 70.46 1. (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale Test: r2c - Backend: FFTW - Precision: float-long - X Y Z: 256 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: r2c - Backend: FFTW - Precision: float-long - X Y Z: 256 s b 10 20 30 40 50 43.96 43.95 1. (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale Test: c2c - Backend: FFTW - Precision: double-long - X Y Z: 128 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: c2c - Backend: FFTW - Precision: double-long - X Y Z: 128 s b 3 6 9 12 15 13.09 13.02 1. (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale Test: c2c - Backend: FFTW - Precision: double-long - X Y Z: 256 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: c2c - Backend: FFTW - Precision: double-long - X Y Z: 256 s b 3 6 9 12 15 12.58 12.47 1. (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale Test: c2c - Backend: Stock - Precision: float-long - X Y Z: 128 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: c2c - Backend: Stock - Precision: float-long - X Y Z: 128 s b 5 10 15 20 25 21.46 21.42 1. (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale Test: c2c - Backend: Stock - Precision: float-long - X Y Z: 256 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: c2c - Backend: Stock - Precision: float-long - X Y Z: 256 s b 6 12 18 24 30 23.12 22.93 1. (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale Test: r2c - Backend: FFTW - Precision: double-long - X Y Z: 128 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: r2c - Backend: FFTW - Precision: double-long - X Y Z: 128 s b 7 14 21 28 35 29.25 29.17 1. (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale Test: r2c - Backend: FFTW - Precision: double-long - X Y Z: 256 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: r2c - Backend: FFTW - Precision: double-long - X Y Z: 256 s b 5 10 15 20 25 22.55 22.43 1. (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale Test: r2c - Backend: Stock - Precision: float-long - X Y Z: 128 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: r2c - Backend: Stock - Precision: float-long - X Y Z: 128 s b 9 18 27 36 45 41.29 41.44 1. (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale Test: r2c - Backend: Stock - Precision: float-long - X Y Z: 256 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: r2c - Backend: Stock - Precision: float-long - X Y Z: 256 s b 10 20 30 40 50 45.09 45.24 1. (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale Test: c2c - Backend: Stock - Precision: double-long - X Y Z: 128 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: c2c - Backend: Stock - Precision: double-long - X Y Z: 128 s b 3 6 9 12 15 11.15 10.93 1. (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale Test: c2c - Backend: Stock - Precision: double-long - X Y Z: 256 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: c2c - Backend: Stock - Precision: double-long - X Y Z: 256 s b 3 6 9 12 15 12.36 12.13 1. (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale Test: r2c - Backend: Stock - Precision: double-long - X Y Z: 128 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: r2c - Backend: Stock - Precision: double-long - X Y Z: 128 s b 5 10 15 20 25 21.27 21.68 1. (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale Test: r2c - Backend: Stock - Precision: double-long - X Y Z: 256 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: r2c - Backend: Stock - Precision: double-long - X Y Z: 256 s b 6 12 18 24 30 23.56 23.34 1. (CXX) g++ options: -O3
Palabos Grid Size: 100 OpenBenchmarking.org Mega Site Updates Per Second, More Is Better Palabos 2.3 Grid Size: 100 s b 10 20 30 40 50 42.87 41.83 1. (CXX) g++ options: -std=c++17 -pedantic -O3 -rdynamic -lcrypto -lcurl -lsz -lz -ldl -lm
Palabos Grid Size: 400 OpenBenchmarking.org Mega Site Updates Per Second, More Is Better Palabos 2.3 Grid Size: 400 s b 16 32 48 64 80 71.57 71.78 1. (CXX) g++ options: -std=c++17 -pedantic -O3 -rdynamic -lcrypto -lcurl -lsz -lz -ldl -lm
Palabos Grid Size: 500 OpenBenchmarking.org Mega Site Updates Per Second, More Is Better Palabos 2.3 Grid Size: 500 s b 0.9031 1.8062 2.7093 3.6124 4.5155 4.01361 3.59061 1. (CXX) g++ options: -std=c++17 -pedantic -O3 -rdynamic -lcrypto -lcurl -lsz -lz -ldl -lm
QMCPACK Input: Li2_STO_ae OpenBenchmarking.org Total Execution Time - Seconds, Fewer Is Better QMCPACK 3.16 Input: Li2_STO_ae s b 170 340 510 680 850 764.32 769.69 1. (CXX) g++ options: -fopenmp -foffload=disable -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -ffast-math -march=native -O3 -lm -ldl
QMCPACK Input: simple-H2O OpenBenchmarking.org Total Execution Time - Seconds, Fewer Is Better QMCPACK 3.16 Input: simple-H2O s b 20 40 60 80 100 97.23 95.65 1. (CXX) g++ options: -fopenmp -foffload=disable -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -ffast-math -march=native -O3 -lm -ldl
QMCPACK Input: FeCO6_b3lyp_gms OpenBenchmarking.org Total Execution Time - Seconds, Fewer Is Better QMCPACK 3.16 Input: FeCO6_b3lyp_gms s b 200 400 600 800 1000 785.94 821.04 1. (CXX) g++ options: -fopenmp -foffload=disable -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -ffast-math -march=native -O3 -lm -ldl
QMCPACK Input: FeCO6_b3lyp_gms OpenBenchmarking.org Total Execution Time - Seconds, Fewer Is Better QMCPACK 3.16 Input: FeCO6_b3lyp_gms s b 150 300 450 600 750 701.74 700.00 1. (CXX) g++ options: -fopenmp -foffload=disable -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -ffast-math -march=native -O3 -lm -ldl
Monte Carlo Simulations of Ionised Nebulae Input: Gas HII40 OpenBenchmarking.org Seconds, Fewer Is Better Monte Carlo Simulations of Ionised Nebulae 2.02.73.3 Input: Gas HII40 s b 10 20 30 40 50 43.32 42.81 1. (F9X) gfortran options: -cpp -Jsource/ -ffree-line-length-0 -lm -std=legacy -O2 -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lz
Monte Carlo Simulations of Ionised Nebulae Input: Dust 2D tau100.0 OpenBenchmarking.org Seconds, Fewer Is Better Monte Carlo Simulations of Ionised Nebulae 2.02.73.3 Input: Dust 2D tau100.0 s b 80 160 240 320 400 347.86 343.61 1. (F9X) gfortran options: -cpp -Jsource/ -ffree-line-length-0 -lm -std=legacy -O2 -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lz
Remhos Test: Sample Remap Example OpenBenchmarking.org Seconds, Fewer Is Better Remhos 1.0 Test: Sample Remap Example s b 30 60 90 120 150 156.21 149.63 1. (CXX) g++ options: -O3 -std=c++11 -lmfem -lHYPRE -lmetis -lrt -lmpi_cxx -lmpi
srsRAN Project Test: Downlink Processor Benchmark OpenBenchmarking.org Mbps, More Is Better srsRAN Project 23.5 Test: Downlink Processor Benchmark s b 200 400 600 800 1000 930.8 939.0 1. (CXX) g++ options: -march=native -mfma -O3 -fno-trapping-math -fno-math-errno -lgtest
srsRAN Project Test: PUSCH Processor Benchmark, Throughput Total OpenBenchmarking.org Mbps, More Is Better srsRAN Project 23.5 Test: PUSCH Processor Benchmark, Throughput Total s b 200 400 600 800 1000 928.8 937.4 1. (CXX) g++ options: -march=native -mfma -O3 -fno-trapping-math -fno-math-errno -lgtest
srsRAN Project Test: PUSCH Processor Benchmark, Throughput Thread OpenBenchmarking.org Mbps, More Is Better srsRAN Project 23.5 Test: PUSCH Processor Benchmark, Throughput Thread s b 60 120 180 240 300 262.1 265.8 1. (CXX) g++ options: -march=native -mfma -O3 -fno-trapping-math -fno-math-errno -lgtest
SVT-AV1 Encoder Mode: Preset 4 - Input: Bosphorus 4K OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 1.6 Encoder Mode: Preset 4 - Input: Bosphorus 4K s b 0.5711 1.1422 1.7133 2.2844 2.8555 2.536 2.538 1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
SVT-AV1 Encoder Mode: Preset 8 - Input: Bosphorus 4K OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 1.6 Encoder Mode: Preset 8 - Input: Bosphorus 4K s b 6 12 18 24 30 25.09 24.19 1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
SVT-AV1 Encoder Mode: Preset 12 - Input: Bosphorus 4K OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 1.6 Encoder Mode: Preset 12 - Input: Bosphorus 4K s b 20 40 60 80 100 95.31 95.64 1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
SVT-AV1 Encoder Mode: Preset 13 - Input: Bosphorus 4K OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 1.6 Encoder Mode: Preset 13 - Input: Bosphorus 4K s b 20 40 60 80 100 95.01 95.37 1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
SVT-AV1 Encoder Mode: Preset 4 - Input: Bosphorus 1080p OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 1.6 Encoder Mode: Preset 4 - Input: Bosphorus 1080p s b 3 6 9 12 15 10.527 9.831 1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
SVT-AV1 Encoder Mode: Preset 8 - Input: Bosphorus 1080p OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 1.6 Encoder Mode: Preset 8 - Input: Bosphorus 1080p s b 20 40 60 80 100 82.59 82.23 1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
SVT-AV1 Encoder Mode: Preset 12 - Input: Bosphorus 1080p OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 1.6 Encoder Mode: Preset 12 - Input: Bosphorus 1080p s b 70 140 210 280 350 313.93 308.17 1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
SVT-AV1 Encoder Mode: Preset 13 - Input: Bosphorus 1080p OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 1.6 Encoder Mode: Preset 13 - Input: Bosphorus 1080p s b 90 180 270 360 450 414.32 422.12 1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
OSPRay Benchmark: particle_volume/ao/real_time OpenBenchmarking.org Items Per Second, More Is Better OSPRay 2.12 Benchmark: particle_volume/ao/real_time s b 0.6131 1.2262 1.8393 2.4524 3.0655 2.72471 2.72216
OSPRay Benchmark: particle_volume/scivis/real_time OpenBenchmarking.org Items Per Second, More Is Better OSPRay 2.12 Benchmark: particle_volume/scivis/real_time s b 0.6049 1.2098 1.8147 2.4196 3.0245 2.68839 2.68738
OSPRay Benchmark: particle_volume/pathtracer/real_time OpenBenchmarking.org Items Per Second, More Is Better OSPRay 2.12 Benchmark: particle_volume/pathtracer/real_time s b 20 40 60 80 100 86.51 85.82
OSPRay Benchmark: gravity_spheres_volume/dim_512/ao/real_time OpenBenchmarking.org Items Per Second, More Is Better OSPRay 2.12 Benchmark: gravity_spheres_volume/dim_512/ao/real_time s b 0.3117 0.6234 0.9351 1.2468 1.5585 1.38514 1.37949
OSPRay Benchmark: gravity_spheres_volume/dim_512/scivis/real_time OpenBenchmarking.org Items Per Second, More Is Better OSPRay 2.12 Benchmark: gravity_spheres_volume/dim_512/scivis/real_time s b 0.3025 0.605 0.9075 1.21 1.5125 1.34452 1.33905
OSPRay Benchmark: gravity_spheres_volume/dim_512/pathtracer/real_time OpenBenchmarking.org Items Per Second, More Is Better OSPRay 2.12 Benchmark: gravity_spheres_volume/dim_512/pathtracer/real_time s b 0.4292 0.8584 1.2876 1.7168 2.146 1.90768 1.88240
Liquid-DSP Threads: 1 - Buffer Length: 256 - Filter Length: 32 OpenBenchmarking.org samples/s, More Is Better Liquid-DSP 1.6 Threads: 1 - Buffer Length: 256 - Filter Length: 32 s b 12M 24M 36M 48M 60M 55367000 55376000 1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP Threads: 1 - Buffer Length: 256 - Filter Length: 57 OpenBenchmarking.org samples/s, More Is Better Liquid-DSP 1.6 Threads: 1 - Buffer Length: 256 - Filter Length: 57 s b 16M 32M 48M 64M 80M 74746000 76920000 1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP Threads: 2 - Buffer Length: 256 - Filter Length: 32 OpenBenchmarking.org samples/s, More Is Better Liquid-DSP 1.6 Threads: 2 - Buffer Length: 256 - Filter Length: 32 s b 20M 40M 60M 80M 100M 106970000 106820000 1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP Threads: 2 - Buffer Length: 256 - Filter Length: 57 OpenBenchmarking.org samples/s, More Is Better Liquid-DSP 1.6 Threads: 2 - Buffer Length: 256 - Filter Length: 57 s b 30M 60M 90M 120M 150M 131530000 130780000 1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP Threads: 4 - Buffer Length: 256 - Filter Length: 32 OpenBenchmarking.org samples/s, More Is Better Liquid-DSP 1.6 Threads: 4 - Buffer Length: 256 - Filter Length: 32 s b 40M 80M 120M 160M 200M 173530000 172210000 1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP Threads: 4 - Buffer Length: 256 - Filter Length: 57 OpenBenchmarking.org samples/s, More Is Better Liquid-DSP 1.6 Threads: 4 - Buffer Length: 256 - Filter Length: 57 s b 50M 100M 150M 200M 250M 219310000 215430000 1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP Threads: 8 - Buffer Length: 256 - Filter Length: 32 OpenBenchmarking.org samples/s, More Is Better Liquid-DSP 1.6 Threads: 8 - Buffer Length: 256 - Filter Length: 32 s b 60M 120M 180M 240M 300M 260250000 265880000 1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP Threads: 8 - Buffer Length: 256 - Filter Length: 57 OpenBenchmarking.org samples/s, More Is Better Liquid-DSP 1.6 Threads: 8 - Buffer Length: 256 - Filter Length: 57 s b 70M 140M 210M 280M 350M 297390000 304170000 1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP Threads: 1 - Buffer Length: 256 - Filter Length: 512 OpenBenchmarking.org samples/s, More Is Better Liquid-DSP 1.6 Threads: 1 - Buffer Length: 256 - Filter Length: 512 s b 4M 8M 12M 16M 20M 20261000 19701000 1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP Threads: 16 - Buffer Length: 256 - Filter Length: 32 OpenBenchmarking.org samples/s, More Is Better Liquid-DSP 1.6 Threads: 16 - Buffer Length: 256 - Filter Length: 32 s b 90M 180M 270M 360M 450M 404840000 398890000 1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP Threads: 16 - Buffer Length: 256 - Filter Length: 57 OpenBenchmarking.org samples/s, More Is Better Liquid-DSP 1.6 Threads: 16 - Buffer Length: 256 - Filter Length: 57 s b 90M 180M 270M 360M 450M 409920000 408980000 1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP Threads: 2 - Buffer Length: 256 - Filter Length: 512 OpenBenchmarking.org samples/s, More Is Better Liquid-DSP 1.6 Threads: 2 - Buffer Length: 256 - Filter Length: 512 s b 8M 16M 24M 32M 40M 36447000 36484000 1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP Threads: 20 - Buffer Length: 256 - Filter Length: 32 OpenBenchmarking.org samples/s, More Is Better Liquid-DSP 1.6 Threads: 20 - Buffer Length: 256 - Filter Length: 32 s b 100M 200M 300M 400M 500M 459560000 460350000 1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP Threads: 20 - Buffer Length: 256 - Filter Length: 57 OpenBenchmarking.org samples/s, More Is Better Liquid-DSP 1.6 Threads: 20 - Buffer Length: 256 - Filter Length: 57 s b 90M 180M 270M 360M 450M 398190000 399280000 1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP Threads: 4 - Buffer Length: 256 - Filter Length: 512 OpenBenchmarking.org samples/s, More Is Better Liquid-DSP 1.6 Threads: 4 - Buffer Length: 256 - Filter Length: 512 s b 12M 24M 36M 48M 60M 56270000 56254000 1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP Threads: 8 - Buffer Length: 256 - Filter Length: 512 OpenBenchmarking.org samples/s, More Is Better Liquid-DSP 1.6 Threads: 8 - Buffer Length: 256 - Filter Length: 512 s b 16M 32M 48M 64M 80M 76055000 73590000 1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP Threads: 16 - Buffer Length: 256 - Filter Length: 512 OpenBenchmarking.org samples/s, More Is Better Liquid-DSP 1.6 Threads: 16 - Buffer Length: 256 - Filter Length: 512 s b 20M 40M 60M 80M 100M 90490000 94755000 1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP Threads: 20 - Buffer Length: 256 - Filter Length: 512 OpenBenchmarking.org samples/s, More Is Better Liquid-DSP 1.6 Threads: 20 - Buffer Length: 256 - Filter Length: 512 s b 20M 40M 60M 80M 100M 103260000 104360000 1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
GPAW Input: Carbon Nanotube OpenBenchmarking.org Seconds, Fewer Is Better GPAW 23.6 Input: Carbon Nanotube s b 110 220 330 440 550 504.80 526.53 1. (CC) gcc options: -shared -fwrapv -O2 -lxc -lblas -lmpi
Phoronix Test Suite v10.8.5