AMD Ryzen Threadripper PRO 7995WX 96-Cores testing of NPS/SNC settings with default (disabled), SNC2, and SNC4 modes. Benchmarks by Michael Larabel for a future article.
Default - Disabled
Kernel Notes: Transparent Huge Pages: madvise
Compiler Notes: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-XYspKM/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-XYspKM/gcc-13-13.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Notes: Scaling Governor: amd-pstate-epp powersave (EPP: balance_performance) - CPU Microcode: 0xa108105
OpenCL Notes: GPU Compute Cores: 6144
Python Notes: Python 3.11.6
Security Notes: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_rstack_overflow: Mitigation of safe RET + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS IBPB: conditional STIBP: always-on RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
SNC2
SNC4
Processor: AMD Ryzen Threadripper PRO 7995WX 96-Cores @ 6.44GHz (96 Cores / 192 Threads), Motherboard: HP 8B24 (U65 Ver. 01.01.04 BIOS), Chipset: AMD Device 14a4, Memory: 128GB, Disk: 2 x 1024GB SAMSUNG MZVL21T0HCLR-00BH1, Graphics: NVIDIA RTX A4000 16GB, Audio: NVIDIA GA104 HD Audio, Monitor: ASUS VP28U, Network: Realtek RTL8111/8168/8411
OS: Ubuntu 23.10, Kernel: 6.5.0-13-generic (x86_64), Desktop: GNOME Shell 45.0, Display Server: X Server 1.21.1.7, Display Driver: NVIDIA 535.129.03, OpenGL: 4.6.0, OpenCL: OpenCL 3.0 CUDA 12.2.147, Compiler: GCC 13.2.0, File-System: ext4, Screen Resolution: 3840x2160
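One way to check which of the three modes is active after a firmware change is to count the NUMA nodes Linux reports. The following is a minimal sketch, assuming the selected mode is reflected in the kernel's NUMA topology and that the system exposes the usual sysfs layout under /sys/devices/system/node; the helper name count_numa_nodes is hypothetical and not part of the test suite.

```python
import re
from pathlib import Path


def count_numa_nodes(sysfs_root="/sys/devices/system/node"):
    """Count nodeN directories under the given sysfs path.

    Returns 0 when the path does not exist (for example on a
    non-Linux system), so the caller can treat that as "unknown".
    """
    root = Path(sysfs_root)
    if not root.is_dir():
        return 0
    return sum(
        1
        for entry in root.iterdir()
        if entry.is_dir() and re.fullmatch(r"node\d+", entry.name)
    )


if __name__ == "__main__":
    print(f"NUMA nodes visible to the kernel: {count_numa_nodes()}")
```

The sysfs path is a parameter so the function can be pointed at a copy of the directory tree when checking logs from another machine.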
7-Zip Compression 22.01, Test: Decompression Rating (OpenBenchmarking.org; MIPS, More Is Better). SNC2: 640928 (SE +/- 11076.26, N = 3); SNC4: 649120 (SE +/- 5454.96, N = 3); Default - Disabled: 655203 (SE +/- 364.51, N = 3). 1. (CXX) g++ options: -lpthread -ldl -O2 -fPIC
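Each result carries an "SE +/- x, N = n" annotation, which reads as a standard error over n runs. The sketch below shows one common way to compute such a figure, assuming the standard error of the mean based on the sample standard deviation; whether the test suite uses exactly this estimator is an assumption, and the run values are hypothetical rather than taken from these results.

```python
import math
import statistics


def standard_error(samples):
    """Standard error of the mean: sample standard deviation / sqrt(N)."""
    if len(samples) < 2:
        raise ValueError("need at least two runs to estimate a standard error")
    return statistics.stdev(samples) / math.sqrt(len(samples))


if __name__ == "__main__":
    # Hypothetical per-run scores for a single configuration (N = 3).
    runs = [640000.0, 655000.0, 628000.0]
    mean = statistics.mean(runs)
    print(f"mean = {mean:.2f}, SE +/- {standard_error(runs):.2f}, N = {len(runs)}")
```

A large SE relative to the gap between two configurations suggests the difference may be within run-to-run noise.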
Algebraic Multi-Grid Benchmark AMG is a parallel algebraic multigrid solver for linear systems arising from problems on unstructured grids. The driver provided with AMG builds linear systems for various 3-dimensional problems. Learn more via the OpenBenchmarking.org test page.
Algebraic Multi-Grid Benchmark 1.2 (Figure Of Merit, More Is Better). Default - Disabled: 1662894000 (SE +/- 1222576.38, N = 3); SNC2: 1732149333 (SE +/- 5512706.85, N = 3); SNC4: 1773758333 (SE +/- 14033805.07, N = 3). 1. (CC) gcc options: -lparcsr_ls -lparcsr_mv -lseq_mv -lIJ_mv -lkrylov -lHYPRE_utilities -lm -fopenmp -lmpi
ASKAP ASKAP is a set of benchmarks from the Australian SKA Pathfinder. The principal ASKAP benchmarks are the Hogbom Clean Benchmark (tHogbomClean) and the Convolutional Resampling Benchmark (tConvolve), along with some earlier ASKAP benchmarks for OpenCL and CUDA execution of tConvolve. Learn more via the OpenBenchmarking.org test page.
ASKAP 1.0, Test: tConvolve MT - Gridding (Million Grid Points Per Second, More Is Better). SNC4: 7467.14 (SE +/- 78.17, N = 3); SNC2: 8223.25 (SE +/- 5.09, N = 3); Default - Disabled: 8655.84 (SE +/- 13.20, N = 3).
ASKAP 1.0, Test: tConvolve MT - Degridding (Million Grid Points Per Second, More Is Better). SNC4: 10522.5 (SE +/- 138.85, N = 3); SNC2: 11209.2 (SE +/- 13.79, N = 3); Default - Disabled: 11871.2 (SE +/- 75.70, N = 3).
ASKAP 1.0, Test: tConvolve MPI - Degridding (Mpix/sec, More Is Better). SNC4: 34358.5 (SE +/- 691.76, N = 15); Default - Disabled: 40198.3 (SE +/- 170.33, N = 3); SNC2: 40552.4 (SE +/- 463.37, N = 3).
ASKAP 1.0, Test: tConvolve MPI - Gridding (Mpix/sec, More Is Better). SNC4: 35893.7 (SE +/- 743.57, N = 15); SNC2: 43138.9 (SE +/- 341.22, N = 3); Default - Disabled: 43532.9 (SE +/- 199.70, N = 3).
ASKAP 1.0, Test: tConvolve OpenMP - Gridding (Million Grid Points Per Second, More Is Better). SNC4: 6602.28 (SE +/- 54.12, N = 3); SNC2: 12083.70 (SE +/- 122.89, N = 15); Default - Disabled: 19018.30 (SE +/- 0.00, N = 3).
ASKAP 1.0, Test: tConvolve OpenMP - Degridding (Million Grid Points Per Second, More Is Better). SNC4: 7607.31; SNC2: 14122.50; Default - Disabled: 20481.20. Only two SE annotations survive for this graph, in this order: SE +/- 0.00, N = 3; SE +/- 100.74, N = 15.
ASKAP 1.0, Test: Hogbom Clean OpenMP (Iterations Per Second, More Is Better). SNC4: 325.39 (SE +/- 1.54, N = 3); SNC2: 564.98 (SE +/- 1.84, N = 3); Default - Disabled: 1127.85 (SE +/- 4.25, N = 3).
1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp
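The ASKAP OpenMP tests show the widest spread between modes in this set, so it helps to express each mode as a percentage change against the default. The sketch below does this for the Hogbom Clean OpenMP figures, assuming the configuration labels and values appear in the same order in the flattened graph text; the function name percent_change is illustrative only.

```python
def percent_change(value, baseline):
    """Relative change of value versus baseline, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (value - baseline) / baseline * 100.0


# Hogbom Clean OpenMP, Iterations Per Second (more is better),
# as read from the result line above.
hogbom_clean = {
    "Default - Disabled": 1127.85,
    "SNC2": 564.98,
    "SNC4": 325.39,
}

if __name__ == "__main__":
    baseline = hogbom_clean["Default - Disabled"]
    for mode, score in hogbom_clean.items():
        print(f"{mode}: {percent_change(score, baseline):+.1f}% vs default")
```

For "Fewer Is Better" results such as Blender or CloverLeaf, the same function applies, but a negative change is then an improvement.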
Blender Blender is an open-source 3D creation and modeling software project. This test is of Blender's Cycles performance with various sample files. GPU computing via NVIDIA OptiX and NVIDIA CUDA is currently supported as well as HIP for AMD Radeon GPUs and Intel oneAPI for Intel Graphics. Learn more via the OpenBenchmarking.org test page.
Blender 4.0, Blend File: BMW27 - Compute: CPU-Only (Seconds, Fewer Is Better). SNC2: 15.32 (SE +/- 0.06, N = 3); SNC4: 15.31 (SE +/- 0.04, N = 3); Default - Disabled: 15.25 (SE +/- 0.04, N = 3).
Blender 4.0, Blend File: Classroom - Compute: CPU-Only (Seconds, Fewer Is Better). SNC2: 38.33 (SE +/- 0.03, N = 3); SNC4: 38.24 (SE +/- 0.07, N = 3); Default - Disabled: 38.02 (SE +/- 0.09, N = 3).
Blender 4.0, Blend File: Fishy Cat - Compute: CPU-Only (Seconds, Fewer Is Better). SNC2: 19.76 (SE +/- 0.20, N = 3); SNC4: 19.71 (SE +/- 0.05, N = 3); Default - Disabled: 19.47 (SE +/- 0.09, N = 3).
Blender 4.0, Blend File: Barbershop - Compute: CPU-Only (Seconds, Fewer Is Better). Default - Disabled: 136.29 (SE +/- 0.18, N = 3); SNC4: 136.28 (SE +/- 0.23, N = 3); SNC2: 136.20 (SE +/- 0.26, N = 3).
Blender 4.0, Blend File: Pabellon Barcelona - Compute: CPU-Only (Seconds, Fewer Is Better). SNC2: 46.97 (SE +/- 0.23, N = 3); SNC4: 46.88 (SE +/- 0.29, N = 3); Default - Disabled: 46.77 (SE +/- 0.20, N = 3).
ClickHouse ClickHouse is an open-source, high performance OLAP data management system. This test profile uses ClickHouse's standard benchmark recommendations per https://clickhouse.com/docs/en/operations/performance-test/ / https://github.com/ClickHouse/ClickBench/tree/main/clickhouse with the 100 million rows web analytics dataset. The reported value is the query processing time using the geometric mean of all separate queries performed as an aggregate. Learn more via the OpenBenchmarking.org test page.
ClickHouse 22.12.3.5, 100M Rows Hits Dataset, First Run / Cold Cache (Queries Per Minute, Geo Mean, More Is Better). SNC4: 388.42 (SE +/- 10.37, N = 12; MIN: 30.26 / MAX: 6000); SNC2: 417.30 (SE +/- 8.46, N = 12; MIN: 32.68 / MAX: 6000); Default - Disabled: 457.06 (SE +/- 4.35, N = 3; MIN: 47.36 / MAX: 4285.71).
ClickHouse 22.12.3.5, 100M Rows Hits Dataset, Second Run (Queries Per Minute, Geo Mean, More Is Better). SNC4: 397.01 (SE +/- 10.89, N = 12; MIN: 39.66 / MAX: 6000); SNC2: 438.54 (SE +/- 8.80, N = 12; MIN: 35.21 / MAX: 6666.67); Default - Disabled: 490.52 (SE +/- 6.17, N = 3; MIN: 34.27 / MAX: 4615.38).
ClickHouse 22.12.3.5, 100M Rows Hits Dataset, Third Run (Queries Per Minute, Geo Mean, More Is Better). SNC4: 400.96 (SE +/- 10.68, N = 12; MIN: 41.1 / MAX: 6000); SNC2: 444.12 (SE +/- 8.91, N = 12; MIN: 48.62 / MAX: 7500); Default - Disabled: 504.59 (SE +/- 5.43, N = 3; MIN: 58.03 / MAX: 3750).
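The ClickHouse figures are geometric means across the individual queries. A geometric mean keeps a handful of very fast queries from dominating the aggregate, which is relevant with per-query spreads as wide as the MIN and MAX values reported here. The sketch below shows the calculation under that reading; the per-query rates are hypothetical, and the function is an illustration rather than the test profile's own code.

```python
import math


def geometric_mean(values):
    """Geometric mean via the mean of logarithms; inputs must be positive."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs a non-empty list of positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


if __name__ == "__main__":
    # Hypothetical per-query rates in queries per minute.
    per_query_rates = [30.0, 120.0, 450.0, 6000.0]
    arithmetic = sum(per_query_rates) / len(per_query_rates)
    print(f"arithmetic mean: {arithmetic:.2f}")
    print(f"geometric mean:  {geometric_mean(per_query_rates):.2f}")
```

Python 3.8 and later also provide statistics.geometric_mean, which could replace the hand-written helper.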
CloverLeaf CloverLeaf is a Lagrangian-Eulerian hydrodynamics benchmark. This test profile currently makes use of CloverLeaf's OpenMP version. Learn more via the OpenBenchmarking.org test page.
CloverLeaf 1.3, Input: clover_bm (Seconds, Fewer Is Better). SNC2: 11.82 (SE +/- 0.09, N = 15); Default - Disabled: 10.96 (SE +/- 0.07, N = 14); SNC4: 10.95 (SE +/- 0.07, N = 3).
CloverLeaf 1.3, Input: clover_bm16 (Seconds, Fewer Is Better). SNC4: 396.53 (SE +/- 1.41, N = 3); SNC2: 355.04 (SE +/- 1.55, N = 3); Default - Disabled: 329.61 (SE +/- 0.23, N = 3).
CloverLeaf 1.3, Input: clover_bm64_short (Seconds, Fewer Is Better). SNC4: 42.00 (SE +/- 0.08, N = 3); SNC2: 41.21 (SE +/- 0.26, N = 3); Default - Disabled: 39.32 (SE +/- 0.01, N = 3).
1. (F9X) gfortran options: -O3 -march=native -funroll-loops -fopenmp
easyWave The easyWave software allows simulating tsunami generation and propagation in the context of early warning systems. EasyWave supports making use of OpenMP for CPU multi-threading and there are also GPU ports available but not currently incorporated as part of this test profile. The easyWave tsunami generation software is run with one of the example/reference input files for measuring the CPU execution time. Learn more via the OpenBenchmarking.org test page.
easyWave r34, Input: e2Asean Grid + BengkuluSept2007 Source - Time: 1200 (Seconds, Fewer Is Better). SNC4: 28.74 (SE +/- 0.57, N = 12); SNC2: 25.48 (SE +/- 0.46, N = 15); Default - Disabled: 23.99 (SE +/- 0.32, N = 15).
easyWave r34, Input: e2Asean Grid + BengkuluSept2007 Source - Time: 2400 (Seconds, Fewer Is Better). SNC4: 75.17 (SE +/- 3.38, N = 12); SNC2: 59.25 (SE +/- 0.49, N = 3); Default - Disabled: 58.65 (SE +/- 0.47, N = 9).
1. (CXX) g++ options: -O3 -fopenmp
Embree Intel Embree is a collection of high-performance ray-tracing kernels for execution on CPUs (and GPUs via SYCL) and supporting instruction sets such as SSE, AVX, AVX2, and AVX-512. Embree also supports making use of the Intel SPMD Program Compiler (ISPC). Learn more via the OpenBenchmarking.org test page.
Embree 4.3, Binary: Pathtracer ISPC - Model: Crown (Frames Per Second, More Is Better). SNC2: 106.26 (SE +/- 0.53, N = 3; MIN: 102.99 / MAX: 115.22); SNC4: 106.43 (SE +/- 0.40, N = 3; MIN: 103.16 / MAX: 113.77); Default - Disabled: 108.08 (SE +/- 0.46, N = 3; MIN: 105.13 / MAX: 117.76).
Embree 4.3, Binary: Pathtracer ISPC - Model: Asian Dragon (Frames Per Second, More Is Better). SNC4: 122.59 (SE +/- 1.43, N = 4; MIN: 115.06 / MAX: 129.69); SNC2: 128.01 (SE +/- 0.56, N = 3; MIN: 125.3 / MAX: 131.84); Default - Disabled: 129.17 (SE +/- 0.63, N = 3; MIN: 126.31 / MAX: 136).
Embree 4.3, Binary: Pathtracer ISPC - Model: Asian Dragon Obj (Frames Per Second, More Is Better). SNC4: 104.80 (SE +/- 0.54, N = 3; MIN: 100.15 / MAX: 112.2); SNC2: 111.36 (SE +/- 0.58, N = 3; MIN: 108.45 / MAX: 114.81); Default - Disabled: 112.83 (SE +/- 0.59, N = 3; MIN: 110.06 / MAX: 116.96).
GPAW GPAW is a density-functional theory (DFT) Python code based on the projector-augmented wave (PAW) method and the atomic simulation environment (ASE). Learn more via the OpenBenchmarking.org test page.
GPAW 23.6, Input: Carbon Nanotube (Seconds, Fewer Is Better). Default - Disabled: 38.13 (SE +/- 0.34, N = 3); SNC2: 37.80 (SE +/- 0.23, N = 3); SNC4: 37.16 (SE +/- 0.25, N = 3). 1. (CC) gcc options: -shared -fwrapv -O2 -lxc -lblas -lmpi
Graph500 This is a benchmark of the reference implementation of Graph500, an HPC benchmark focused on data intensive loads and commonly tested on supercomputers for complex data problems. Graph500 primarily stresses the communication subsystem of the hardware under test. Learn more via the OpenBenchmarking.org test page.
Graph500 3.0, Scale: 26 (bfs median_TEPS, More Is Better). Default - Disabled: 727662000; SNC2: 1015300000; SNC4: 1056980000.
Graph500 3.0, Scale: 26 (bfs max_TEPS, More Is Better). Default - Disabled: 756165000; SNC2: 1069560000; SNC4: 1104900000.
Graph500 3.0, Scale: 26 (sssp median_TEPS, More Is Better). Default - Disabled: 357127000; SNC2: 389235000; SNC4: 390682000.
Graph500 3.0, Scale: 26 (sssp max_TEPS, More Is Better). Default - Disabled: 462057000; SNC2: 522187000; SNC4: 527864000.
1. (CC) gcc options: -fcommon -O3 -lpthread -lm -lmpi
GROMACS The GROMACS (GROningen MAchine for Chemical Simulations) molecular dynamics package testing with the water_GMX50 data. This test profile allows selecting between CPU and GPU-based GROMACS builds. Learn more via the OpenBenchmarking.org test page.
GROMACS 2023, Implementation: MPI CPU - Input: water_GMX50_bare (Ns Per Day, More Is Better). Default - Disabled: 10.38 (SE +/- 0.38, N = 9); SNC2: 10.45 (SE +/- 0.07, N = 3); SNC4: 10.66 (SE +/- 0.04, N = 3). 1. (CXX) g++ options: -O3
John The Ripper 2023.03.14, Test: WPA PSK (Real C/S, More Is Better). SNC4: 596180 (SE +/- 4063.87, N = 3); SNC2: 611669 (SE +/- 4996.06, N = 3); Default - Disabled: 614263 (SE +/- 2146.03, N = 3).
John The Ripper 2023.03.14, Test: Blowfish (Real C/S, More Is Better). SNC4: 161790 (SE +/- 1944.34, N = 15); SNC2: 169287 (SE +/- 1828.86, N = 12); Default - Disabled: 173145 (SE +/- 66.40, N = 3).
John The Ripper 2023.03.14, Test: HMAC-SHA512 (Real C/S, More Is Better). SNC4: 287955500 (SE +/- 2535178.67, N = 12); SNC2: 289221400 (SE +/- 1930805.92, N = 15); Default - Disabled: 296518667 (SE +/- 1312178.38, N = 3).
John The Ripper 2023.03.14, Test: MD5 (Real C/S, More Is Better). SNC4: 13419000 (SE +/- 174193.35, N = 15); SNC2: 13888800 (SE +/- 193319.26, N = 15); Default - Disabled: 14600667 (SE +/- 43978.53, N = 3).
1. (CC) gcc options: -m64 -lssl -lcrypto -fopenmp -lm -lrt -lz -ldl -lcrypt -lbz2
libxsmm Libxsmm is an open-source library for specialized dense and sparse matrix operations and deep learning primitives. Libxsmm supports making use of Intel AMX, AVX-512, and other modern CPU instruction set capabilities. Learn more via the OpenBenchmarking.org test page.
libxsmm 2-1.17-3645, M N K: 128 (GFLOPS/s, More Is Better). SNC4: 2004.7 (SE +/- 11.98, N = 3); SNC2: 2026.2 (SE +/- 0.82, N = 3); Default - Disabled: 2043.5 (SE +/- 2.22, N = 3).
libxsmm 2-1.17-3645, M N K: 256 (GFLOPS/s, More Is Better). Default - Disabled: 2564.6 (SE +/- 18.22, N = 3); SNC2: 2583.6 (SE +/- 29.68, N = 3); SNC4: 2599.9 (SE +/- 21.00, N = 3).
libxsmm 2-1.17-3645, M N K: 32 (GFLOPS/s, More Is Better). SNC2: 553.3 (SE +/- 2.72, N = 3); Default - Disabled: 555.6 (SE +/- 0.58, N = 3); SNC4: 559.8 (SE +/- 1.96, N = 3).
libxsmm 2-1.17-3645, M N K: 64 (GFLOPS/s, More Is Better). SNC2: 1052.3 (SE +/- 2.33, N = 3); Default - Disabled: 1055.4 (SE +/- 0.53, N = 3); SNC4: 1059.1 (SE +/- 1.09, N = 3).
1. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -msse4.2
Liquid-DSP LiquidSDR's Liquid-DSP is a software-defined radio (SDR) digital signal processing library. This test profile runs a multi-threaded benchmark of this SDR/DSP library focused on embedded platform usage. Learn more via the OpenBenchmarking.org test page.
Liquid-DSP 1.6, Threads: 32 - Buffer Length: 256 - Filter Length: 32 (samples/s, More Is Better). Default - Disabled: 1435333333 (SE +/- 1937638.88, N = 3); SNC2: 1436433333 (SE +/- 5024716.69, N = 3); SNC4: 1440933333 (SE +/- 3769320.60, N = 3).
Liquid-DSP 1.6, Threads: 32 - Buffer Length: 256 - Filter Length: 57 (samples/s, More Is Better). SNC4: 1721266667 (SE +/- 6406333.67, N = 3); Default - Disabled: 1730566667 (SE +/- 3985947.54, N = 3); SNC2: 1744433333 (SE +/- 3811532.21, N = 3).
Liquid-DSP 1.6, Threads: 64 - Buffer Length: 256 - Filter Length: 32 (samples/s, More Is Better). SNC2: 2632233333 (SE +/- 22995168.57, N = 3); SNC4: 2643433333 (SE +/- 26768908.17, N = 3); Default - Disabled: 2646100000 (SE +/- 26479677.74, N = 3).
Liquid-DSP 1.6, Threads: 64 - Buffer Length: 256 - Filter Length: 57 (samples/s, More Is Better). SNC2: 3016100000 (SE +/- 14068522.78, N = 3); Default - Disabled: 3020900000 (SE +/- 20219380.14, N = 3); SNC4: 3023166667 (SE +/- 1844210.16, N = 3).
Liquid-DSP 1.6, Threads: 128 - Buffer Length: 256 - Filter Length: 32 (samples/s, More Is Better). Default - Disabled: 4228133333 (SE +/- 18076719.22, N = 3); SNC2: 4233566667 (SE +/- 26426018.32, N = 3); SNC4: 4243600000 (SE +/- 26314254.69, N = 3).
Liquid-DSP 1.6, Threads: 128 - Buffer Length: 256 - Filter Length: 57 (samples/s, More Is Better). SNC2: 4493966667 (SE +/- 10235938.86, N = 3); Default - Disabled: 4495533333 (SE +/- 17623122.44, N = 3); SNC4: 4518133333 (SE +/- 22778084.01, N = 3).
Liquid-DSP 1.6, Threads: 192 - Buffer Length: 256 - Filter Length: 32 (samples/s, More Is Better). SNC2: 5519933333 (SE +/- 18095333.96, N = 3); Default - Disabled: 5526733333 (SE +/- 19784955.00, N = 3); SNC4: 5535500000 (SE +/- 29526993.30, N = 3).
Liquid-DSP 1.6, Threads: 192 - Buffer Length: 256 - Filter Length: 57 (samples/s, More Is Better). SNC2: 5393633333 (SE +/- 15429013.07, N = 3); Default - Disabled: 5407200000 (SE +/- 18999298.23, N = 3); SNC4: 5416600000 (SE +/- 14304195.19, N = 3).
Liquid-DSP 1.6, Threads: 32 - Buffer Length: 256 - Filter Length: 512 (samples/s, More Is Better). Default - Disabled: 522560000 (SE +/- 2767116.43, N = 3); SNC4: 522996667 (SE +/- 3819181.12, N = 3); SNC2: 523603333 (SE +/- 3417076.40, N = 3).
Liquid-DSP 1.6, Threads: 64 - Buffer Length: 256 - Filter Length: 512 (samples/s, More Is Better). SNC4: 927276667 (SE +/- 6276308.19, N = 3); SNC2: 935170000 (SE +/- 8992613.64, N = 3); Default - Disabled: 938493333 (SE +/- 8999104.28, N = 3).
Liquid-DSP 1.6, Threads: 128 - Buffer Length: 256 - Filter Length: 512 (samples/s, More Is Better). SNC4: 1300233333 (SE +/- 7198688.15, N = 3); SNC2: 1302833333 (SE +/- 6948700.92, N = 3); Default - Disabled: 1314200000 (SE +/- 6222807.51, N = 3).
Liquid-DSP 1.6, Threads: 192 - Buffer Length: 256 - Filter Length: 512 (samples/s, More Is Better). SNC4: 1493000000 (SE +/- 5108163.40, N = 3); SNC2: 1499133333 (SE +/- 5353295.97, N = 3); Default - Disabled: 1514200000 (SE +/- 4629254.80, N = 3).
1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
LuxCoreRender LuxCoreRender is an open-source 3D physically based renderer formerly known as LuxRender. LuxCoreRender supports CPU-based rendering as well as GPU acceleration via OpenCL, NVIDIA CUDA, and NVIDIA OptiX interfaces. Learn more via the OpenBenchmarking.org test page.
LuxCoreRender 2.6, Scene: DLSC - Acceleration: CPU (M samples/sec, More Is Better). SNC4: 15.00 (SE +/- 0.02, N = 3; MIN: 14.61 / MAX: 18.76); SNC2: 15.19 (SE +/- 0.07, N = 3; MIN: 14.66 / MAX: 18.88); Default - Disabled: 15.23 (SE +/- 0.02, N = 3; MIN: 14.85 / MAX: 18.76).
LuxCoreRender 2.6, Scene: Danish Mood - Acceleration: CPU (M samples/sec, More Is Better). Default - Disabled: 10.65 (SE +/- 0.08, N = 3; MIN: 4.82 / MAX: 12.11); SNC2: 10.85 (SE +/- 0.12, N = 3; MIN: 5.02 / MAX: 12.4); SNC4: 10.97 (SE +/- 0.11, N = 3; MIN: 5.05 / MAX: 12.72).
LuxCoreRender 2.6, Scene: Orange Juice - Acceleration: CPU (M samples/sec, More Is Better). SNC4: 21.60 (SE +/- 0.06, N = 3; MIN: 18.58 / MAX: 28.17); Default - Disabled: 22.24 (SE +/- 0.26, N = 15; MIN: 18.56 / MAX: 28.74); SNC2: 22.77 (SE +/- 0.29, N = 15; MIN: 18.36 / MAX: 29.05).
LuxCoreRender 2.6, Scene: LuxCore Benchmark - Acceleration: CPU (M samples/sec, More Is Better). SNC4: 11.89 (SE +/- 0.15, N = 3; MIN: 5.42 / MAX: 13.74); SNC2: 11.94 (SE +/- 0.09, N = 3; MIN: 5.67 / MAX: 13.6); Default - Disabled: 12.39 (SE +/- 0.11, N = 3; MIN: 5.87 / MAX: 14.11).
LuxCoreRender 2.6, Scene: Rainbow Colors and Prism - Acceleration: CPU (M samples/sec, More Is Better). SNC2: 31.82 (SE +/- 1.17, N = 15; MIN: 18.52 / MAX: 35.42); Default - Disabled: 32.75 (SE +/- 1.27, N = 15; MIN: 17.73 / MAX: 35.6); SNC4: 34.72 (SE +/- 0.24, N = 3; MIN: 34.26 / MAX: 35.1).
Memcached Memcached is a high performance, distributed memory object caching system. This Memcached test profile makes use of memtier_benchmark for executing this CPU/memory-focused server benchmark. Learn more via the OpenBenchmarking.org test page.
Memcached 1.6.19, Set To Get Ratio: 1:5 (Ops/sec, More Is Better). SNC4: 3327393.35 (SE +/- 44466.09, N = 3); Default - Disabled: 3359726.87 (SE +/- 15559.68, N = 3); SNC2: 3359880.55 (SE +/- 33459.28, N = 3).
Memcached 1.6.19, Set To Get Ratio: 1:10 (Ops/sec, More Is Better). SNC4: 5728479.09 (SE +/- 8577.07, N = 3); Default - Disabled: 5811757.93 (SE +/- 23305.65, N = 3); SNC2: 5818020.66 (SE +/- 17431.43, N = 3).
Memcached 1.6.19, Set To Get Ratio: 1:100 (Ops/sec, More Is Better). SNC4: 7507583.68 (SE +/- 35640.28, N = 3); SNC2: 7704668.27 (SE +/- 52363.50, N = 3); Default - Disabled: 7743874.04 (SE +/- 23469.53, N = 3).
1. (CXX) g++ options: -O2 -levent_openssl -levent -lcrypto -lssl -lpthread -lz -lpcre
NAMD NAMD is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. NAMD was developed by the Theoretical and Computational Biophysics Group in the Beckman Institute for Advanced Science and Technology at the University of Illinois at Urbana-Champaign. Learn more via the OpenBenchmarking.org test page.
NAMD 2.14, ATPase Simulation - 327,506 Atoms (days/ns, Fewer Is Better). Default - Disabled: 0.25803 (SE +/- 0.00087, N = 3); SNC2: 0.25612 (SE +/- 0.00284, N = 3); SNC4: 0.25502 (SE +/- 0.00197, N = 3).
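NAMD reports days/ns (fewer is better) while GROMACS reports ns per day (more is better), so the two read in opposite directions. The sketch below converts the NAMD figures to ns/day by taking the reciprocal, assuming the labels and values appear in the same order in the flattened graph text. The two benchmarks simulate different systems, so the converted numbers are not directly comparable with the GROMACS result; the conversion only puts them on the same kind of scale.

```python
def ns_per_day(days_per_ns):
    """Convert a days/ns figure to ns/day (simple reciprocal)."""
    if days_per_ns <= 0:
        raise ValueError("days/ns must be positive")
    return 1.0 / days_per_ns


# NAMD 2.14 ATPase results in days/ns, as read from the result line above.
namd_days_per_ns = {
    "Default - Disabled": 0.25803,
    "SNC2": 0.25612,
    "SNC4": 0.25502,
}

if __name__ == "__main__":
    for mode, value in namd_days_per_ns.items():
        print(f"{mode}: {ns_per_day(value):.3f} ns/day")
```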
NAS Parallel Benchmarks NPB, NAS Parallel Benchmarks, is a benchmark developed by NASA for high-end computer systems. This test profile currently uses the MPI version of NPB. This test profile offers selecting the different NPB tests/problems and varying problem sizes. Learn more via the OpenBenchmarking.org test page.
NAS Parallel Benchmarks 3.4, Test / Class: BT.C (Total Mop/s, More Is Better). Default - Disabled: 214445.97 (SE +/- 280.16, N = 3); SNC2: 215684.06 (SE +/- 322.09, N = 3); SNC4: 215764.82 (SE +/- 112.43, N = 3).
NAS Parallel Benchmarks 3.4, Test / Class: CG.C (Total Mop/s, More Is Better). Default - Disabled: 52079.13 (SE +/- 476.77, N = 3); SNC2: 54560.03 (SE +/- 324.22, N = 3); SNC4: 56924.56 (SE +/- 191.96, N = 3).
NAS Parallel Benchmarks 3.4, Test / Class: EP.C (Total Mop/s, More Is Better). Default - Disabled: 8869.84 (SE +/- 60.80, N = 3); SNC2: 9712.81 (SE +/- 257.16, N = 15); SNC4: 10411.60 (SE +/- 304.30, N = 12).
NAS Parallel Benchmarks 3.4, Test / Class: FT.C (Total Mop/s, More Is Better). SNC4: 94921.27 (SE +/- 107.70, N = 3); SNC2: 97435.24 (SE +/- 925.13, N = 3); Default - Disabled: 100728.49 (SE +/- 245.52, N = 3).
NAS Parallel Benchmarks 3.4, Test / Class: IS.D (Total Mop/s, More Is Better). SNC4: 4155.68 (SE +/- 45.26, N = 4); Default - Disabled: 4279.98 (SE +/- 21.28, N = 3); SNC2: 4282.36 (SE +/- 8.86, N = 3).
NAS Parallel Benchmarks 3.4, Test / Class: LU.C (Total Mop/s, More Is Better). Default - Disabled: 255135.24 (SE +/- 1442.62, N = 3); SNC2: 256378.22 (SE +/- 427.36, N = 3); SNC4: 259883.62 (SE +/- 1304.12, N = 3).
NAS Parallel Benchmarks 3.4, Test / Class: MG.C (Total Mop/s, More Is Better). Default - Disabled: 95501.50 (SE +/- 1045.55, N = 4); SNC2: 97068.01 (SE +/- 59.39, N = 3); SNC4: 98739.50 (SE +/- 517.95, N = 3).
NAS Parallel Benchmarks 3.4, Test / Class: SP.B (Total Mop/s, More Is Better). Default - Disabled: 145525.94 (SE +/- 676.26, N = 3); SNC2: 151244.49 (SE +/- 674.53, N = 3); SNC4: 151839.30 (SE +/- 1739.78, N = 3).
NAS Parallel Benchmarks 3.4, Test / Class: SP.C (Total Mop/s, More Is Better). SNC4: 87467.23 (SE +/- 287.86, N = 3); SNC2: 88668.13 (SE +/- 271.76, N = 3); Default - Disabled: 89217.91 (SE +/- 61.36, N = 3).
1. (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz 2. Open MPI 4.1.5
OpenFOAM OpenFOAM is the leading free, open-source software for computational fluid dynamics (CFD). This test profile currently uses the drivaerFastback test case for analyzing automotive aerodynamics or alternatively the older motorBike input. Learn more via the OpenBenchmarking.org test page.
OpenFOAM 10, Input: drivaerFastback, Medium Mesh Size - Mesh Time (Seconds, Fewer Is Better). Default - Disabled: 138.65; SNC2: 135.10; SNC4: 133.74.
OpenFOAM 10, Input: drivaerFastback, Medium Mesh Size - Execution Time (Seconds, Fewer Is Better). Default - Disabled: 331.86; SNC2: 310.66; SNC4: 302.69.
1. (CXX) g++ options: -std=c++14 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -lfiniteVolume -lmeshTools -lparallel -llagrangian -lregionModels -lgenericPatchFields -lOpenFOAM -ldl -lm
OpenRadioss OpenRadioss is an open-source AGPL-licensed finite element solver for dynamic event analysis. OpenRadioss is based on Altair Radioss and was open-sourced in 2022. This open-source finite element solver is benchmarked with various example models available from https://www.openradioss.org/models/ and https://github.com/OpenRadioss/ModelExchange/tree/main/Examples. This test is currently using a reference OpenRadioss binary build offered via GitHub. Learn more via the OpenBenchmarking.org test page.
OpenRadioss 2023.09.15, Model: Chrysler Neon 1M (Seconds, Fewer Is Better). Default - Disabled: 157.35 (SE +/- 0.18, N = 3); SNC4: 156.21 (SE +/- 0.27, N = 3); SNC2: 155.50 (SE +/- 0.28, N = 3).
OpenSSL OpenSSL is an open-source toolkit that implements SSL (Secure Sockets Layer) and TLS (Transport Layer Security) protocols. This test profile makes use of the built-in "openssl speed" benchmarking capabilities. Learn more via the OpenBenchmarking.org test page.
OpenSSL 3.1, Algorithm: SHA256 (byte/s, More Is Better). SNC2: 131627463983 (SE +/- 241913832.01, N = 3); Default - Disabled: 131629499893 (SE +/- 311439878.18, N = 3); SNC4: 131706698427 (SE +/- 282602364.23, N = 3).
OpenSSL 3.1, Algorithm: SHA512 (byte/s, More Is Better). Default - Disabled: 42723911653 (SE +/- 32408039.38, N = 3); SNC2: 43151618910 (SE +/- 19206282.00, N = 3); SNC4: 43206641983 (SE +/- 16399997.56, N = 3).
OpenSSL 3.1, Algorithm: RSA4096 (sign/s, More Is Better). SNC2: 49862.1 (SE +/- 67.46, N = 3); Default - Disabled: 49897.3 (SE +/- 49.04, N = 3); SNC4: 49924.9 (SE +/- 90.07, N = 3).
OpenSSL 3.1, Algorithm: RSA4096 (verify/s, More Is Better). SNC2: 1529451.7 (SE +/- 486.05, N = 3); Default - Disabled: 1533067.1 (SE +/- 3444.37, N = 3); SNC4: 1538165.3 (SE +/- 3270.23, N = 3).
OpenSSL 3.1, Algorithm: ChaCha20 (byte/s, More Is Better). SNC2: 511318347957 (SE +/- 45048908.57, N = 3); Default - Disabled: 511694244653 (SE +/- 157967680.92, N = 3); SNC4: 512318580417 (SE +/- 56755200.60, N = 3).
OpenSSL 3.1, Algorithm: AES-128-GCM (byte/s, More Is Better). Default - Disabled: 941442701447 (SE +/- 2155035998.71, N = 3); SNC2: 943365732580 (SE +/- 1081587572.26, N = 3); SNC4: 946933378620 (SE +/- 179395215.71, N = 3).
OpenSSL 3.1, Algorithm: AES-256-GCM (byte/s, More Is Better). Default - Disabled: 815175705397 (SE +/- 1400557991.78, N = 3); SNC2: 815393358357 (SE +/- 991457061.79, N = 3); SNC4: 817131387107 (SE +/- 1023563571.48, N = 3).
OpenSSL 3.1, Algorithm: ChaCha20-Poly1305 (byte/s, More Is Better). Default - Disabled: 361852375860 (SE +/- 176665463.84, N = 3); SNC2: 361987547683 (SE +/- 140024932.44, N = 3); SNC4: 362765500023 (SE +/- 129282934.95, N = 3).
1. (CC) gcc options: -pthread -m64 -O3 -ldl
OpenVINO This is a test of Intel OpenVINO, a toolkit for neural network inference, using its built-in benchmarking support to measure throughput and latency for various models. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org FPS, More Is Better OpenVINO 2023.2.dev Model: Face Detection FP16 - Device: CPU Default - Disabled SNC2 SNC4 12 24 36 48 60 SE +/- 0.02, N = 3 SE +/- 0.09, N = 3 SE +/- 0.15, N = 3 49.41 51.06 51.19 1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2023.2.dev Model: Face Detection FP16 - Device: CPU Default - Disabled SNC2 SNC4 200 400 600 800 1000 SE +/- 0.53, N = 3 SE +/- 0.78, N = 3 SE +/- 1.50, N = 3 965.67 468.35 467.39 MIN: 767.19 / MAX: 1026.9 MIN: 396.69 / MAX: 533.02 MIN: 391.77 / MAX: 504.29 1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
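The Face Detection FP16 results pair a small throughput change with latency that is roughly halved. One way to read that is through Little's law (average items in flight = throughput × latency). The sketch below applies it to the FPS and latency figures above; the function name `implied_in_flight` is hypothetical. Any conclusion about how the benchmark schedules parallel requests is an inference from these numbers, not something the results state.

```python
# Little's law: average number of requests in flight = throughput * latency.
# Hypothetical helper name; values are copied from the two charts above.
def implied_in_flight(fps: float, latency_ms: float) -> float:
    return fps * latency_ms / 1000.0


# mode -> (FPS, average latency in ms) for Face Detection FP16 on CPU
face_detection_fp16 = {
    "Default - Disabled": (49.41, 965.67),
    "SNC2": (51.06, 468.35),
    "SNC4": (51.19, 467.39),
}

for mode, (fps, ms) in face_detection_fp16.items():
    print(f"{mode}: about {implied_in_flight(fps, ms):.0f} requests in flight")
```

The default mode works out to about 48 concurrent requests and both SNC modes to about 24. That is consistent with the benchmark running fewer parallel inference requests when multiple NUMA nodes are exposed, which would lower per-request latency without the CPU itself being twice as fast.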
OpenBenchmarking.org FPS, More Is Better OpenVINO 2023.2.dev Model: Person Detection FP16 - Device: CPU Default - Disabled SNC2 SNC4 80 160 240 320 400 SE +/- 1.27, N = 3 SE +/- 1.55, N = 3 SE +/- 0.76, N = 3 338.09 356.72 367.12 1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2023.2.dev Model: Person Detection FP16 - Device: CPU Default - Disabled SNC2 SNC4 30 60 90 120 150 SE +/- 0.52, N = 3 SE +/- 0.29, N = 3 SE +/- 0.13, N = 3 141.78 67.22 65.32 MIN: 54.39 / MAX: 210.64 MIN: 37.86 / MAX: 90.7 MIN: 36.8 / MAX: 87.75 1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
OpenBenchmarking.org FPS, More Is Better OpenVINO 2023.2.dev Model: Person Detection FP32 - Device: CPU Default - Disabled SNC2 SNC4 80 160 240 320 400 SE +/- 0.89, N = 3 SE +/- 2.29, N = 3 SE +/- 0.96, N = 3 338.74 357.03 366.59 1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2023.2.dev Model: Person Detection FP32 - Device: CPU Default - Disabled SNC2 SNC4 30 60 90 120 150 SE +/- 0.37, N = 3 SE +/- 0.43, N = 3 SE +/- 0.17, N = 3 141.54 67.17 65.40 MIN: 50.69 / MAX: 212.29 MIN: 38.08 / MAX: 93.26 MIN: 44.03 / MAX: 91.21 1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
OpenBenchmarking.org FPS, More Is Better OpenVINO 2023.2.dev Model: Vehicle Detection FP16 - Device: CPU Default - Disabled SNC4 SNC2 800 1600 2400 3200 4000 SE +/- 9.07, N = 3 SE +/- 8.94, N = 3 SE +/- 8.11, N = 3 2730.60 3863.88 3875.39 1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2023.2.dev Model: Vehicle Detection FP16 - Device: CPU Default - Disabled SNC4 SNC2 4 8 12 16 20 SE +/- 0.06, N = 3 SE +/- 0.01, N = 3 SE +/- 0.01, N = 3 17.55 6.20 6.18 MIN: 5.74 / MAX: 43.74 MIN: 5.02 / MAX: 17.36 MIN: 5.27 / MAX: 16.2 1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
OpenBenchmarking.org FPS, More Is Better OpenVINO 2023.2.dev Model: Face Detection FP16-INT8 - Device: CPU Default - Disabled SNC2 SNC4 20 40 60 80 100 SE +/- 0.15, N = 3 SE +/- 0.12, N = 3 SE +/- 0.15, N = 3 96.95 97.76 97.85 1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2023.2.dev Model: Face Detection FP16-INT8 - Device: CPU Default - Disabled SNC2 SNC4 110 220 330 440 550 SE +/- 0.76, N = 3 SE +/- 0.27, N = 3 SE +/- 0.34, N = 3 493.57 245.02 244.80 MIN: 246.03 / MAX: 522.95 MIN: 201.96 / MAX: 285.01 MIN: 212 / MAX: 263.67 1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
OpenBenchmarking.org FPS, More Is Better OpenVINO 2023.2.dev Model: Face Detection Retail FP16 - Device: CPU SNC2 SNC4 Default - Disabled 3K 6K 9K 12K 15K SE +/- 56.50, N = 3 SE +/- 61.71, N = 3 SE +/- 73.24, N = 3 11414.89 11421.25 12547.91 1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2023.2.dev Model: Face Detection Retail FP16 - Device: CPU Default - Disabled SNC4 SNC2 0.8573 1.7146 2.5719 3.4292 4.2865 SE +/- 0.02, N = 3 SE +/- 0.01, N = 3 SE +/- 0.01, N = 3 3.81 2.09 2.09 MIN: 2.1 / MAX: 21.93 MIN: 1.87 / MAX: 8.99 MIN: 1.85 / MAX: 7.8 1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
OpenBenchmarking.org FPS, More Is Better OpenVINO 2023.2.dev Model: Road Segmentation ADAS FP16 - Device: CPU Default - Disabled SNC4 SNC2 400 800 1200 1600 2000 SE +/- 12.81, N = 3 SE +/- 0.95, N = 3 SE +/- 4.22, N = 3 1310.86 1647.09 1652.91 1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2023.2.dev Model: Road Segmentation ADAS FP16 - Device: CPU Default - Disabled SNC4 SNC2 8 16 24 32 40 SE +/- 0.36, N = 3 SE +/- 0.01, N = 3 SE +/- 0.04, N = 3 36.57 14.56 14.51 MIN: 15.82 / MAX: 76.01 MIN: 11.83 / MAX: 32.54 MIN: 12.17 / MAX: 35.53 1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
OpenBenchmarking.org FPS, More Is Better OpenVINO 2023.2.dev Model: Vehicle Detection FP16-INT8 - Device: CPU SNC2 SNC4 Default - Disabled 1300 2600 3900 5200 6500 SE +/- 14.13, N = 3 SE +/- 15.69, N = 3 SE +/- 10.15, N = 3 5783.01 5797.28 5936.37 1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2023.2.dev Model: Vehicle Detection FP16-INT8 - Device: CPU Default - Disabled SNC2 SNC4 2 4 6 8 10 SE +/- 0.02, N = 3 SE +/- 0.01, N = 3 SE +/- 0.01, N = 3 8.07 4.14 4.13 MIN: 4.25 / MAX: 26.67 MIN: 3.73 / MAX: 12.01 MIN: 3.67 / MAX: 12.51 1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
OpenBenchmarking.org FPS, More Is Better OpenVINO 2023.2.dev Model: Weld Porosity Detection FP16 - Device: CPU Default - Disabled SNC2 SNC4 1100 2200 3300 4400 5500 SE +/- 6.69, N = 3 SE +/- 9.79, N = 3 SE +/- 6.67, N = 3 4962.13 5023.42 5026.69 1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2023.2.dev Model: Weld Porosity Detection FP16 - Device: CPU Default - Disabled SNC2 SNC4 5 10 15 20 25 SE +/- 0.03, N = 3 SE +/- 0.04, N = 3 SE +/- 0.03, N = 3 19.33 19.08 19.06 MIN: 9.33 / MAX: 85.74 MIN: 15.77 / MAX: 56.75 MIN: 17.12 / MAX: 40.94 1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
OpenBenchmarking.org FPS, More Is Better OpenVINO 2023.2.dev Model: Face Detection Retail FP16-INT8 - Device: CPU SNC2 SNC4 Default - Disabled 4K 8K 12K 16K 20K SE +/- 14.02, N = 3 SE +/- 11.11, N = 3 SE +/- 14.08, N = 3 17366.29 17402.76 17809.43 1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2023.2.dev Model: Face Detection Retail FP16-INT8 - Device: CPU SNC2 SNC4 Default - Disabled 1.242 2.484 3.726 4.968 6.21 SE +/- 0.01, N = 3 SE +/- 0.00, N = 3 SE +/- 0.00, N = 3 5.52 5.51 5.38 MIN: 4.91 / MAX: 14.41 MIN: 4.84 / MAX: 15.48 MIN: 3.31 / MAX: 23.31 1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
OpenBenchmarking.org FPS, More Is Better OpenVINO 2023.2.dev Model: Road Segmentation ADAS FP16-INT8 - Device: CPU Default - Disabled SNC4 SNC2 400 800 1200 1600 2000 SE +/- 4.08, N = 3 SE +/- 7.31, N = 3 SE +/- 7.84, N = 3 1898.14 1911.68 1914.89 1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2023.2.dev Model: Road Segmentation ADAS FP16-INT8 - Device: CPU Default - Disabled SNC4 SNC2 6 12 18 24 30 SE +/- 0.05, N = 3 SE +/- 0.05, N = 3 SE +/- 0.05, N = 3 25.26 12.54 12.52 MIN: 12.5 / MAX: 49.19 MIN: 10.7 / MAX: 28.81 MIN: 10.66 / MAX: 27.97 1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
OpenBenchmarking.org FPS, More Is Better OpenVINO 2023.2.dev Model: Machine Translation EN To DE FP16 - Device: CPU SNC4 SNC2 Default - Disabled 110 220 330 440 550 SE +/- 1.37, N = 3 SE +/- 2.07, N = 3 SE +/- 0.33, N = 3 474.11 481.41 514.22 1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2023.2.dev Model: Machine Translation EN To DE FP16 - Device: CPU Default - Disabled SNC4 SNC2 20 40 60 80 100 SE +/- 0.07, N = 3 SE +/- 0.15, N = 3 SE +/- 0.22, N = 3 93.23 50.56 49.80 MIN: 42.51 / MAX: 145.87 MIN: 38.52 / MAX: 103.27 MIN: 38.67 / MAX: 117.73 1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
OpenBenchmarking.org FPS, More Is Better OpenVINO 2023.2.dev Model: Weld Porosity Detection FP16-INT8 - Device: CPU Default - Disabled SNC2 SNC4 2K 4K 6K 8K 10K SE +/- 30.53, N = 3 SE +/- 17.38, N = 3 SE +/- 26.50, N = 3 9866.45 9948.30 9973.18 1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2023.2.dev Model: Weld Porosity Detection FP16-INT8 - Device: CPU Default - Disabled SNC2 SNC4 3 6 9 12 15 SE +/- 0.03, N = 3 SE +/- 0.02, N = 3 SE +/- 0.03, N = 3 9.72 9.63 9.61 MIN: 4.95 / MAX: 29.09 MIN: 8.28 / MAX: 16.71 MIN: 8.18 / MAX: 19.44 1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
OpenBenchmarking.org FPS, More Is Better OpenVINO 2023.2.dev Model: Person Vehicle Bike Detection FP16 - Device: CPU SNC4 SNC2 Default - Disabled 1200 2400 3600 4800 6000 SE +/- 10.32, N = 3 SE +/- 6.31, N = 3 SE +/- 12.20, N = 3 4721.81 4737.07 5643.89 1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2023.2.dev Model: Person Vehicle Bike Detection FP16 - Device: CPU Default - Disabled SNC4 SNC2 2 4 6 8 10 SE +/- 0.02, N = 3 SE +/- 0.01, N = 3 SE +/- 0.01, N = 3 8.49 5.07 5.06 MIN: 5.66 / MAX: 25.99 MIN: 4.35 / MAX: 13.85 MIN: 4.47 / MAX: 12.75 1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
OpenBenchmarking.org FPS, More Is Better OpenVINO 2023.2.dev Model: Handwritten English Recognition FP16 - Device: CPU SNC4 SNC2 Default - Disabled 600 1200 1800 2400 3000 SE +/- 9.95, N = 3 SE +/- 24.21, N = 3 SE +/- 19.25, N = 3 2333.46 2435.28 2592.59 1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2023.2.dev Model: Handwritten English Recognition FP16 - Device: CPU SNC4 SNC2 Default - Disabled 9 18 27 36 45 SE +/- 0.17, N = 3 SE +/- 0.40, N = 3 SE +/- 0.27, N = 3 41.08 39.36 37.01 MIN: 31.72 / MAX: 55.31 MIN: 32.78 / MAX: 55.31 MIN: 20.68 / MAX: 66.67 1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
OpenBenchmarking.org FPS, More Is Better OpenVINO 2023.2.dev Model: Age Gender Recognition Retail 0013 FP16 - Device: CPU Default - Disabled SNC4 SNC2 30K 60K 90K 120K 150K SE +/- 195.22, N = 3 SE +/- 140.85, N = 3 SE +/- 365.77, N = 3 86958.12 134176.02 134566.00 1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2023.2.dev Model: Age Gender Recognition Retail 0013 FP16 - Device: CPU Default - Disabled SNC4 SNC2 0.1935 0.387 0.5805 0.774 0.9675 SE +/- 0.00, N = 3 SE +/- 0.00, N = 3 SE +/- 0.00, N = 3 0.86 0.61 0.61 MIN: 0.28 / MAX: 17.81 MIN: 0.46 / MAX: 16.98 MIN: 0.44 / MAX: 16.14 1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
OpenBenchmarking.org FPS, More Is Better OpenVINO 2023.2.dev Model: Handwritten English Recognition FP16-INT8 - Device: CPU Default - Disabled SNC2 SNC4 500 1000 1500 2000 2500 SE +/- 3.37, N = 3 SE +/- 5.01, N = 3 SE +/- 7.69, N = 3 2120.23 2121.18 2127.42 1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2023.2.dev Model: Handwritten English Recognition FP16-INT8 - Device: CPU Default - Disabled SNC2 SNC4 10 20 30 40 50 SE +/- 0.07, N = 3 SE +/- 0.11, N = 3 SE +/- 0.16, N = 3 45.25 45.22 45.08 MIN: 34.3 / MAX: 61.3 MIN: 37.77 / MAX: 56.52 MIN: 36.6 / MAX: 52.28 1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
OpenBenchmarking.org FPS, More Is Better OpenVINO 2023.2.dev Model: Age Gender Recognition Retail 0013 FP16-INT8 - Device: CPU Default - Disabled SNC4 SNC2 40K 80K 120K 160K 200K SE +/- 251.60, N = 3 SE +/- 183.36, N = 3 SE +/- 805.14, N = 3 113350.17 164363.19 166444.72 1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2023.2.dev Model: Age Gender Recognition Retail 0013 FP16-INT8 - Device: CPU Default - Disabled SNC4 SNC2 0.1395 0.279 0.4185 0.558 0.6975 SE +/- 0.00, N = 3 SE +/- 0.00, N = 3 SE +/- 0.00, N = 3 0.62 0.36 0.36 MIN: 0.21 / MAX: 39.81 MIN: 0.27 / MAX: 41.09 MIN: 0.27 / MAX: 31.55 1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
OpenVKL OpenVKL is the Intel Open Volume Kernel Library that offers high-performance volume computation kernels and is part of the Intel oneAPI rendering toolkit. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Items / Sec, More Is Better OpenVKL 2.0.0 Benchmark: vklBenchmarkCPU ISPC SNC4 SNC2 Default - Disabled 500 1000 1500 2000 2500 SE +/- 5.29, N = 3 SE +/- 2.73, N = 3 SE +/- 0.88, N = 3 1905 2089 2153 MIN: 180 / MAX: 27886 MIN: 178 / MAX: 27767 MIN: 179 / MAX: 27831
OSPRay Studio Intel OSPRay Studio is an open-source, interactive visualization and ray-tracing software package. OSPRay Studio makes use of Intel OSPRay, a portable ray-tracing engine for high-performance, high-fidelity visualizations. OSPRay builds off Intel's Embree and Intel SPMD Program Compiler (ISPC) components as part of the oneAPI rendering toolkit. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org ms, Fewer Is Better OSPRay Studio 0.13 Camera: 1 - Resolution: 4K - Samples Per Pixel: 1 - Renderer: Path Tracer - Acceleration: CPU SNC2 SNC4 Default - Disabled 200 400 600 800 1000 SE +/- 2.73, N = 3 SE +/- 1.86, N = 3 SE +/- 4.18, N = 3 1074 1073 1064
OpenBenchmarking.org ms, Fewer Is Better OSPRay Studio 0.13 Camera: 2 - Resolution: 4K - Samples Per Pixel: 1 - Renderer: Path Tracer - Acceleration: CPU SNC2 SNC4 Default - Disabled 200 400 600 800 1000 SE +/- 3.61, N = 3 SE +/- 3.71, N = 3 SE +/- 3.79, N = 3 1080 1079 1076
OpenBenchmarking.org ms, Fewer Is Better OSPRay Studio 0.13 Camera: 3 - Resolution: 4K - Samples Per Pixel: 1 - Renderer: Path Tracer - Acceleration: CPU SNC4 SNC2 Default - Disabled 300 600 900 1200 1500 SE +/- 4.33, N = 3 SE +/- 2.40, N = 3 SE +/- 1.53, N = 3 1266 1259 1252
OpenBenchmarking.org ms, Fewer Is Better OSPRay Studio 0.13 Camera: 1 - Resolution: 4K - Samples Per Pixel: 16 - Renderer: Path Tracer - Acceleration: CPU SNC4 SNC2 Default - Disabled 4K 8K 12K 16K 20K SE +/- 2.67, N = 3 SE +/- 20.23, N = 3 SE +/- 35.36, N = 3 17014 16999 16939
OpenBenchmarking.org ms, Fewer Is Better OSPRay Studio 0.13 Camera: 1 - Resolution: 4K - Samples Per Pixel: 32 - Renderer: Path Tracer - Acceleration: CPU SNC4 SNC2 Default - Disabled 8K 16K 24K 32K 40K SE +/- 154.47, N = 3 SE +/- 106.88, N = 3 SE +/- 51.31, N = 3 38284 38009 37890
OpenBenchmarking.org ms, Fewer Is Better OSPRay Studio 0.13 Camera: 2 - Resolution: 4K - Samples Per Pixel: 16 - Renderer: Path Tracer - Acceleration: CPU SNC4 Default - Disabled SNC2 4K 8K 12K 16K 20K SE +/- 70.54, N = 3 SE +/- 94.37, N = 3 SE +/- 54.03, N = 3 17262 17193 17113
OpenBenchmarking.org ms, Fewer Is Better OSPRay Studio 0.13 Camera: 2 - Resolution: 4K - Samples Per Pixel: 32 - Renderer: Path Tracer - Acceleration: CPU SNC4 Default - Disabled SNC2 8K 16K 24K 32K 40K SE +/- 65.34, N = 3 SE +/- 105.83, N = 3 SE +/- 192.95, N = 3 38523 38264 38015
OpenBenchmarking.org ms, Fewer Is Better OSPRay Studio 0.13 Camera: 3 - Resolution: 4K - Samples Per Pixel: 16 - Renderer: Path Tracer - Acceleration: CPU SNC4 SNC2 Default - Disabled 4K 8K 12K 16K 20K SE +/- 57.00, N = 3 SE +/- 18.75, N = 3 SE +/- 42.71, N = 3 20199 20144 20017
OpenBenchmarking.org ms, Fewer Is Better OSPRay Studio 0.13 Camera: 3 - Resolution: 4K - Samples Per Pixel: 32 - Renderer: Path Tracer - Acceleration: CPU SNC4 SNC2 Default - Disabled 10K 20K 30K 40K 50K SE +/- 146.66, N = 3 SE +/- 80.70, N = 3 SE +/- 70.44, N = 3 44535 44300 43999
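The nine OSPRay Studio results above can be condensed into one figure with a geometric mean of per-test ratios. This is a minimal sketch: the pairing of values to modes follows the label order in each chart, and the choice to summarize SNC4 against the default is only for illustration.

```python
from statistics import geometric_mean

# (SNC4 ms, Default - Disabled ms) for each OSPRay Studio 0.13 configuration,
# copied from the nine results above in the order they appear.
ospray_ms = [
    (1073, 1064),    # Camera 1, 1 SPP
    (1079, 1076),    # Camera 2, 1 SPP
    (1266, 1252),    # Camera 3, 1 SPP
    (17014, 16939),  # Camera 1, 16 SPP
    (38284, 37890),  # Camera 1, 32 SPP
    (17262, 17193),  # Camera 2, 16 SPP
    (38523, 38264),  # Camera 2, 32 SPP
    (20199, 20017),  # Camera 3, 16 SPP
    (44535, 43999),  # Camera 3, 32 SPP
]

# Ratio > 1 means SNC4 took longer than the default (fewer ms is better).
ratios = [snc4 / default for snc4, default in ospray_ms]
overall = geometric_mean(ratios)
print(f"SNC4 render time relative to default: {overall:.4f}x")
```

The result is slightly above 1, meaning SNC4 is a little under 1% slower than the default across these render tests.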
PETSc PETSc, the Portable, Extensible Toolkit for Scientific Computation, is for the scalable (parallel) solution of scientific applications modeled by partial differential equations. This test profile runs the PETSc "make streams" benchmark and records the throughput rate when all available cores are utilized for the MPI Streams build. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org MB/s, More Is Better PETSc 3.19 Test: Streams Default - Disabled SNC2 SNC4 40K 80K 120K 160K 200K SE +/- 64.39, N = 3 SE +/- 708.72, N = 3 SE +/- 451.28, N = 3 183161.11 185070.83 187197.63 1. (CC) gcc options: -fPIC -O3 -O2 -lpthread -lpciaccess -lm
PostgreSQL This is a benchmark of PostgreSQL using the integrated pgbench for facilitating the database benchmarks. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org TPS, More Is Better PostgreSQL 16 Scaling Factor: 100 - Clients: 1000 - Mode: Read Only SNC2 SNC4 Default - Disabled 800K 1600K 2400K 3200K 4000K SE +/- 21467.51, N = 3 SE +/- 37681.71, N = 6 SE +/- 49179.25, N = 3 3746006 3759094 3792698 1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lm
OpenBenchmarking.org ms, Fewer Is Better PostgreSQL 16 Scaling Factor: 100 - Clients: 1000 - Mode: Read Only - Average Latency SNC2 SNC4 Default - Disabled 0.0601 0.1202 0.1803 0.2404 0.3005 SE +/- 0.002, N = 3 SE +/- 0.003, N = 6 SE +/- 0.003, N = 3 0.267 0.266 0.264 1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lm
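The average latency and TPS figures for this read-only run are two views of the same measurement. With a fixed number of clients that are always busy, average latency is approximately the client count divided by TPS. The sketch below checks that against the numbers above; the helper name `expected_latency_ms` is hypothetical, and the assumption that all 1000 clients are continuously active is a simplification.

```python
# Hypothetical helper: with `clients` always-busy connections, average latency
# per transaction is approximately clients / TPS (converted here to ms).
def expected_latency_ms(clients: int, tps: float) -> float:
    return clients / tps * 1000.0


CLIENTS = 1000  # from the test configuration "Clients: 1000"

# mode -> (TPS, reported average latency in ms), scaling factor 100, read only
read_only_sf100 = {
    "SNC2": (3746006, 0.267),
    "SNC4": (3759094, 0.266),
    "Default - Disabled": (3792698, 0.264),
}

for mode, (tps, reported_ms) in read_only_sf100.items():
    predicted = expected_latency_ms(CLIENTS, tps)
    print(f"{mode}: predicted {predicted:.3f} ms, reported {reported_ms:.3f} ms")
```

The predicted and reported latencies agree to three decimal places, so the latency chart adds no independent information beyond the TPS chart for this configuration.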
OpenBenchmarking.org TPS, More Is Better PostgreSQL 16 Scaling Factor: 100 - Clients: 1000 - Mode: Read Write SNC4 SNC2 Default - Disabled 4K 8K 12K 16K 20K SE +/- 577.54, N = 12 SE +/- 496.25, N = 12 SE +/- 160.41, N = 12 14067 14096 18068 1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lm
OpenBenchmarking.org ms, Fewer Is Better PostgreSQL 16 Scaling Factor: 100 - Clients: 1000 - Mode: Read Write - Average Latency SNC4 SNC2 Default - Disabled 16 32 48 64 80 SE +/- 3.47, N = 12 SE +/- 3.12, N = 12 SE +/- 0.50, N = 12 72.65 72.15 55.40 1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lm
OpenBenchmarking.org TPS, More Is Better PostgreSQL 16 Scaling Factor: 1000 - Clients: 1000 - Mode: Read Only SNC2 SNC4 Default - Disabled 400K 800K 1200K 1600K 2000K SE +/- 19687.71, N = 6 SE +/- 14067.03, N = 3 SE +/- 7755.83, N = 3 1950507 1956509 1986807 1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lm
OpenBenchmarking.org ms, Fewer Is Better PostgreSQL 16 Scaling Factor: 1000 - Clients: 1000 - Mode: Read Only - Average Latency SNC2 SNC4 Default - Disabled 0.1154 0.2308 0.3462 0.4616 0.577 SE +/- 0.005, N = 6 SE +/- 0.004, N = 3 SE +/- 0.002, N = 3 0.513 0.511 0.504 1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lm
OpenBenchmarking.org TPS, More Is Better PostgreSQL 16 Scaling Factor: 1000 - Clients: 1000 - Mode: Read Write SNC4 SNC2 Default - Disabled 4K 8K 12K 16K 20K SE +/- 181.92, N = 4 SE +/- 204.24, N = 3 SE +/- 472.23, N = 12 15314 15769 16623 1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lm
OpenBenchmarking.org ms, Fewer Is Better PostgreSQL 16 Scaling Factor: 1000 - Clients: 1000 - Mode: Read Write - Average Latency SNC4 SNC2 Default - Disabled 15 30 45 60 75 SE +/- 0.79, N = 4 SE +/- 0.82, N = 3 SE +/- 1.61, N = 12 65.33 63.44 60.66 1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lm
PyTorch
OpenBenchmarking.org batches/sec, More Is Better PyTorch 2.1 Device: CPU - Batch Size: 1 - Model: ResNet-50 Default - Disabled SNC4 SNC2 11 22 33 44 55 SE +/- 0.07, N = 3 SE +/- 0.25, N = 3 SE +/- 0.25, N = 3 47.70 48.36 48.83 MIN: 44.82 / MAX: 49.01 MIN: 25.09 / MAX: 50.72 MIN: 28.77 / MAX: 50.91
OpenBenchmarking.org batches/sec, More Is Better PyTorch 2.1 Device: CPU - Batch Size: 1 - Model: ResNet-152 SNC2 SNC4 Default - Disabled 5 10 15 20 25 SE +/- 0.21, N = 3 SE +/- 0.20, N = 3 SE +/- 0.25, N = 3 18.44 18.62 18.99 MIN: 9.73 / MAX: 19.35 MIN: 10.5 / MAX: 19.64 MIN: 17.88 / MAX: 19.86
OpenBenchmarking.org batches/sec, More Is Better PyTorch 2.1 Device: CPU - Batch Size: 16 - Model: ResNet-50 Default - Disabled SNC4 SNC2 9 18 27 36 45 SE +/- 0.12, N = 3 SE +/- 0.17, N = 3 SE +/- 0.17, N = 3 38.68 39.80 40.09 MIN: 37.08 / MAX: 39.76 MIN: 26.68 / MAX: 41.28 MIN: 22.35 / MAX: 41.64
OpenBenchmarking.org batches/sec, More Is Better PyTorch 2.1 Device: CPU - Batch Size: 32 - Model: ResNet-50 Default - Disabled SNC4 SNC2 9 18 27 36 45 SE +/- 0.10, N = 3 SE +/- 0.45, N = 3 SE +/- 0.40, N = 3 39.21 39.29 40.41 MIN: 36.81 / MAX: 40.31 MIN: 25.78 / MAX: 41.34 MIN: 34.93 / MAX: 42.02
OpenBenchmarking.org batches/sec, More Is Better PyTorch 2.1 Device: CPU - Batch Size: 64 - Model: ResNet-50 Default - Disabled SNC4 SNC2 9 18 27 36 45 SE +/- 0.05, N = 3 SE +/- 0.12, N = 3 SE +/- 0.20, N = 3 39.06 39.07 40.27 MIN: 37.1 / MAX: 40.22 MIN: 19.69 / MAX: 41.1 MIN: 23.49 / MAX: 42.09
OpenBenchmarking.org batches/sec, More Is Better PyTorch 2.1 Device: CPU - Batch Size: 16 - Model: ResNet-152 SNC2 SNC4 Default - Disabled 4 8 12 16 20 SE +/- 0.14, N = 3 SE +/- 0.17, N = 3 SE +/- 0.16, N = 5 15.30 15.39 16.06 MIN: 8.79 / MAX: 15.89 MIN: 8.86 / MAX: 16.02 MIN: 15.37 / MAX: 16.74
OpenBenchmarking.org batches/sec, More Is Better PyTorch 2.1 Device: CPU - Batch Size: 32 - Model: ResNet-152 SNC2 SNC4 Default - Disabled 4 8 12 16 20 SE +/- 0.19, N = 3 SE +/- 0.13, N = 3 SE +/- 0.10, N = 3 15.36 15.61 16.16 MIN: 8.91 / MAX: 15.79 MIN: 8.77 / MAX: 16.18 MIN: 15.77 / MAX: 16.51
OpenBenchmarking.org batches/sec, More Is Better PyTorch 2.1 Device: CPU - Batch Size: 64 - Model: ResNet-152 SNC4 Default - Disabled SNC2 4 8 12 16 20 SE +/- 0.06, N = 3 SE +/- 0.11, N = 3 SE +/- 0.10, N = 3 15.24 15.91 16.03 MIN: 8.23 / MAX: 15.81 MIN: 15.34 / MAX: 16.26 MIN: 9.32 / MAX: 16.38
OpenBenchmarking.org batches/sec, More Is Better PyTorch 2.1 Device: CPU - Batch Size: 1 - Model: Efficientnet_v2_l SNC4 SNC2 Default - Disabled 3 6 9 12 15 SE +/- 0.04, N = 3 SE +/- 0.10, N = 3 SE +/- 0.08, N = 3 10.58 10.65 11.16 MIN: 5.94 / MAX: 11.15 MIN: 6.15 / MAX: 11.04 MIN: 10.83 / MAX: 11.47
OpenBenchmarking.org batches/sec, More Is Better PyTorch 2.1 Device: CPU - Batch Size: 16 - Model: Efficientnet_v2_l SNC4 SNC2 Default - Disabled 2 4 6 8 10 SE +/- 0.23, N = 9 SE +/- 0.06, N = 3 SE +/- 0.02, N = 3 4.15 5.36 6.34 MIN: 1.09 / MAX: 6.41 MIN: 3.75 / MAX: 5.85 MIN: 5.68 / MAX: 6.64
OpenBenchmarking.org batches/sec, More Is Better PyTorch 2.1 Device: CPU - Batch Size: 32 - Model: Efficientnet_v2_l SNC4 SNC2 Default - Disabled 2 4 6 8 10 SE +/- 0.29, N = 6 SE +/- 0.04, N = 3 SE +/- 0.02, N = 3 4.07 5.31 6.35 MIN: 1.11 / MAX: 6.41 MIN: 3.81 / MAX: 5.85 MIN: 5.72 / MAX: 6.64
OpenBenchmarking.org batches/sec, More Is Better PyTorch 2.1 Device: CPU - Batch Size: 64 - Model: Efficientnet_v2_l SNC4 SNC2 Default - Disabled 2 4 6 8 10 SE +/- 0.13, N = 6 SE +/- 0.02, N = 3 SE +/- 0.01, N = 3 3.68 5.38 6.36 MIN: 1.19 / MAX: 6.36 MIN: 3.75 / MAX: 5.87 MIN: 5.69 / MAX: 6.62
QMCPACK QMCPACK is a modern high-performance open-source Quantum Monte Carlo (QMC) simulation code making use of MPI for this benchmark of the H20 example code. QMCPACK is an open-source production level many-body ab initio Quantum Monte Carlo code for computing the electronic structure of atoms, molecules, and solids. QMCPACK is supported by the U.S. Department of Energy. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Total Execution Time - Seconds, Fewer Is Better QMCPACK 3.17.1 Input: Li2_STO_ae SNC2 Default - Disabled SNC4 20 40 60 80 100 SE +/- 1.13, N = 3 SE +/- 0.55, N = 3 SE +/- 0.06, N = 3 106.01 103.90 103.77 1. (CXX) g++ options: -fopenmp -foffload=disable -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -ffast-math -march=native -O3 -lm -ldl
QuantLib QuantLib is an open-source library/framework around quantitative finance for modeling, trading and risk management scenarios. QuantLib is written in C++ with Boost, and the built-in benchmark used here reports the QuantLib Benchmark Index score. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org MFLOPS, More Is Better QuantLib 1.32 Configuration: Multi-Threaded Default - Disabled SNC2 SNC4 70K 140K 210K 280K 350K SE +/- 716.12, N = 3 SE +/- 1378.05, N = 3 SE +/- 3981.12, N = 3 310771.1 313010.5 317145.3 1. (CXX) g++ options: -O3 -march=native -fPIE -pie
Quantum ESPRESSO Quantum ESPRESSO is an integrated suite of Open-Source computer codes for electronic-structure calculations and materials modeling at the nanoscale. It is based on density-functional theory, plane waves, and pseudopotentials. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Seconds, Fewer Is Better Quantum ESPRESSO 7.0 Input: AUSURF112 Default - Disabled SNC2 SNC4 70 140 210 280 350 SE +/- 0.35, N = 3 SE +/- 0.87, N = 3 SE +/- 0.32, N = 3 326.06 316.50 307.06 1. (F9X) gfortran options: -pthread -fopenmp -ldevXlib -lopenblas -lFoX_dom -lFoX_sax -lFoX_wxml -lFoX_common -lFoX_utils -lFoX_fsys -lfftw3_omp -lfftw3 -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
Rodinia Rodinia is a suite focused upon accelerating compute-intensive applications with accelerators. CUDA, OpenMP, and OpenCL parallel models are supported by the included applications. This profile utilizes select OpenCL, NVIDIA CUDA and OpenMP test binaries at the moment. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Seconds, Fewer Is Better Rodinia 3.1 Test: OpenMP LavaMD SNC4 Default - Disabled SNC2 6 12 18 24 30 SE +/- 0.09, N = 3 SE +/- 0.07, N = 3 SE +/- 0.09, N = 3 26.97 26.95 26.91 1. (CXX) g++ options: -O2 -lOpenCL
OpenBenchmarking.org Seconds, Fewer Is Better Rodinia 3.1 Test: OpenMP HotSpot3D SNC4 SNC2 Default - Disabled 14 28 42 56 70 SE +/- 0.53, N = 15 SE +/- 0.62, N = 15 SE +/- 0.82, N = 3 61.23 59.86 58.87 1. (CXX) g++ options: -O2 -lOpenCL
OpenBenchmarking.org Seconds, Fewer Is Better Rodinia 3.1 Test: OpenMP Leukocyte SNC4 SNC2 Default - Disabled 7 14 21 28 35 SE +/- 0.05, N = 3 SE +/- 0.14, N = 3 SE +/- 0.23, N = 3 31.12 30.89 29.39 1. (CXX) g++ options: -O2 -lOpenCL
OpenBenchmarking.org Seconds, Fewer Is Better Rodinia 3.1 Test: OpenMP CFD Solver SNC4 SNC2 Default - Disabled 1.2566 2.5132 3.7698 5.0264 6.283 SE +/- 0.013, N = 3 SE +/- 0.002, N = 3 SE +/- 0.036, N = 3 5.585 5.581 5.577 1. (CXX) g++ options: -O2 -lOpenCL
OpenBenchmarking.org Seconds, Fewer Is Better Rodinia 3.1 Test: OpenMP Streamcluster SNC4 SNC2 Default - Disabled 1.1099 2.2198 3.3297 4.4396 5.5495 SE +/- 0.027, N = 3 SE +/- 0.019, N = 3 SE +/- 0.007, N = 3 4.933 4.884 4.703 1. (CXX) g++ options: -O2 -lOpenCL
SPECFEM3D simulates acoustic (fluid), elastic (solid), coupled acoustic/elastic, poroelastic or seismic wave propagation in any type of conforming mesh of hexahedra. This test profile currently relies on CPU-based execution for SPECFEM3D and uses a variety of its built-in examples/models for benchmarking. Learn more via the OpenBenchmarking.org test page.
SPECFEM3D 4.0, Model: Mount St. Helens (Seconds, Fewer Is Better): SNC2 7.756233737 (SE +/- 0.022697239, N = 3); Default - Disabled 7.724890032 (SE +/- 0.077041220, N = 3); SNC4 7.545176178 (SE +/- 0.073100963, N = 3)
SPECFEM3D 4.0, Model: Layered Halfspace (Seconds, Fewer Is Better): Default - Disabled 18.74 (SE +/- 0.16, N = 3); SNC2 18.36 (SE +/- 0.24, N = 3); SNC4 18.04 (SE +/- 0.14, N = 3)
SPECFEM3D 4.0, Model: Tomographic Model (Seconds, Fewer Is Better): Default - Disabled 7.987430452 (SE +/- 0.088577121, N = 3); SNC2 7.544143736 (SE +/- 0.094423144, N = 3); SNC4 7.519415000 (SE +/- 0.068276455, N = 3)
SPECFEM3D 4.0, Model: Homogeneous Halfspace (Seconds, Fewer Is Better): Default - Disabled 9.942989762 (SE +/- 0.062415502, N = 15); SNC2 9.742806803 (SE +/- 0.112412006, N = 4); SNC4 9.668759733 (SE +/- 0.063880399, N = 3)
SPECFEM3D 4.0, Model: Water-layered Halfspace (Seconds, Fewer Is Better): Default - Disabled 19.27 (SE +/- 0.08, N = 3); SNC2 19.22 (SE +/- 0.19, N = 3); SNC4 18.82 (SE +/- 0.06, N = 3)
All SPECFEM3D results: (F9X) gfortran options: -O2 -fopenmp -std=f2003 -fimplicit-none -fmax-errors=10 -pedantic -pedantic-errors -O3 -finline-functions -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
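To make the SPECFEM3D timings above easier to compare, the following is a minimal sketch that expresses each SNC mode as a percent change in run time relative to Default - Disabled (negative means faster). The dictionary layout and the function names `percent_change` and `changes_vs_default` are illustrative assumptions, not part of the Phoronix Test Suite; the numbers are the mean run times reported above.

```python
DEFAULT = "Default - Disabled"

# Mean run times in seconds, copied from the SPECFEM3D 4.0 results above.
# Fewer is better.
SPECFEM3D_SECONDS = {
    "Mount St. Helens": {DEFAULT: 7.724890032, "SNC2": 7.756233737, "SNC4": 7.545176178},
    "Layered Halfspace": {DEFAULT: 18.74, "SNC2": 18.36, "SNC4": 18.04},
    "Tomographic Model": {DEFAULT: 7.987430452, "SNC2": 7.544143736, "SNC4": 7.519415000},
    "Homogeneous Halfspace": {DEFAULT: 9.942989762, "SNC2": 9.742806803, "SNC4": 9.668759733},
    "Water-layered Halfspace": {DEFAULT: 19.27, "SNC2": 19.22, "SNC4": 18.82},
}


def percent_change(baseline, value):
    """Signed percent change of value relative to baseline."""
    return (value - baseline) / baseline * 100.0


def changes_vs_default(results):
    """Map each model to {config: percent change in run time vs. the default}."""
    changes = {}
    for model, times in results.items():
        base = times[DEFAULT]
        changes[model] = {
            config: percent_change(base, seconds)
            for config, seconds in times.items()
            if config != DEFAULT
        }
    return changes


if __name__ == "__main__":
    for model, deltas in changes_vs_default(SPECFEM3D_SECONDS).items():
        summary = ", ".join(f"{cfg}: {pct:+.2f}%" for cfg, pct in deltas.items())
        print(f"{model}: {summary}")
```

Because these are run times, a negative percentage is an improvement. The sketch ignores the reported standard errors, so small differences should not be over-interpreted.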
Stockfish
This is a test of Stockfish, an advanced open-source C++11 chess benchmark that can scale up to 512 CPU threads. Learn more via the OpenBenchmarking.org test page.
Stockfish 15, Total Time (Nodes Per Second, More Is Better): Default - Disabled 287331097 (SE +/- 2377868.67, N = 3); SNC2 295780798 (SE +/- 3381815.04, N = 15); SNC4 300267148 (SE +/- 6406046.35, N = 15)
(CXX) g++ options: -lgcov -m64 -lpthread -fno-exceptions -std=c++17 -fno-peel-loops -fno-tracer -pedantic -O3 -msse -msse3 -mpopcnt -mavx2 -mavx512f -mavx512bw -mavx512vnni -mavx512dq -mavx512vl -msse4.1 -mssse3 -msse2 -mbmi2 -flto -flto=jobserver
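The Stockfish runs show comparatively large standard errors, so it helps to relate the gap between two configurations to that spread. The sketch below is a rough heuristic rather than anything the Phoronix Test Suite reports: it divides the difference of two means by the combined standard error, assuming the runs are independent. The function name `standard_errors_apart` is hypothetical.

```python
import math


def standard_errors_apart(mean_a, se_a, mean_b, se_b):
    """Return |mean_a - mean_b| divided by the combined standard error.

    Assumes the two sets of runs are independent, so their standard
    errors combine as the square root of the sum of squares.
    """
    combined_se = math.sqrt(se_a ** 2 + se_b ** 2)
    return abs(mean_a - mean_b) / combined_se


# (mean nodes per second, standard error), copied from the results above.
DEFAULT_DISABLED = (287331097, 2377868.67)
SNC4 = (300267148, 6406046.35)

if __name__ == "__main__":
    ratio = standard_errors_apart(*DEFAULT_DISABLED, *SNC4)
    print(f"SNC4 vs Default - Disabled: {ratio:.2f} combined standard errors apart")
```

For Default - Disabled versus SNC4 the gap works out to a little under two combined standard errors, which suggests the Stockfish difference is less clear-cut than the raw means imply. The run counts also differ (N = 3 versus N = 15), which this simple check does not account for.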
TensorFlow
This is a benchmark of the TensorFlow deep learning framework using the TensorFlow reference benchmarks (tensorflow/benchmarks with tf_cnn_benchmarks.py). Note that the Phoronix Test Suite also provides pts/tensorflow-lite for benchmarking the TensorFlow Lite binaries if complementary metrics are desired. Learn more via the OpenBenchmarking.org test page.
TensorFlow 2.12, Device: CPU - Batch Size: 16 - Model: ResNet-50 (images/sec, More Is Better): SNC4 49.27 (SE +/- 0.63, N = 3); SNC2 49.51 (SE +/- 0.59, N = 3); Default - Disabled 51.77 (SE +/- 0.23, N = 3)
TensorFlow 2.12, Device: CPU - Batch Size: 32 - Model: ResNet-50 (images/sec, More Is Better): SNC2 66.13 (SE +/- 0.19, N = 3); SNC4 66.88 (SE +/- 0.74, N = 3); Default - Disabled 70.35 (SE +/- 0.29, N = 3)
TensorFlow 2.12, Device: CPU - Batch Size: 64 - Model: ResNet-50 (images/sec, More Is Better): SNC4 84.50 (SE +/- 0.09, N = 3); SNC2 85.54 (SE +/- 0.22, N = 3); Default - Disabled 90.46 (SE +/- 0.05, N = 3)
TensorFlow 2.12, Device: CPU - Batch Size: 256 - Model: ResNet-50 (images/sec, More Is Better): SNC4 112.08 (SE +/- 0.12, N = 3); SNC2 115.90 (SE +/- 0.23, N = 3); Default - Disabled 118.86 (SE +/- 0.12, N = 3)
TensorFlow 2.12, Device: CPU - Batch Size: 512 - Model: ResNet-50 (images/sec, More Is Better): SNC4 127.24 (SE +/- 0.33, N = 3); SNC2 130.65 (SE +/- 0.21, N = 3); Default - Disabled 135.02 (SE +/- 0.04, N = 3)
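One way to summarize the five ResNet-50 batch sizes as a single figure per configuration is the geometric mean of the throughput ratios against Default - Disabled. This is a sketch of that arithmetic only; the helper name `relative_throughput` and the data layout are assumptions for illustration, and the values are the images/sec means listed above.

```python
from statistics import geometric_mean

BASELINE = "Default - Disabled"

# images/sec for ResNet-50 on CPU, keyed by batch size (more is better).
RESNET50_IMAGES_PER_SEC = {
    16: {BASELINE: 51.77, "SNC2": 49.51, "SNC4": 49.27},
    32: {BASELINE: 70.35, "SNC2": 66.13, "SNC4": 66.88},
    64: {BASELINE: 90.46, "SNC2": 85.54, "SNC4": 84.50},
    256: {BASELINE: 118.86, "SNC2": 115.90, "SNC4": 112.08},
    512: {BASELINE: 135.02, "SNC2": 130.65, "SNC4": 127.24},
}


def relative_throughput(results, config, baseline=BASELINE):
    """Geometric mean of config/baseline throughput ratios over all batch sizes."""
    ratios = [row[config] / row[baseline] for row in results.values()]
    return geometric_mean(ratios)


if __name__ == "__main__":
    for config in ("SNC2", "SNC4"):
        ratio = relative_throughput(RESNET50_IMAGES_PER_SEC, config)
        print(f"{config}: {ratio:.3f}x the Default - Disabled throughput")
```

A ratio below 1.0 means lower throughput than the default setting; both SNC2 and SNC4 come out below 1.0 for this workload. The geometric mean is used so that no single batch size dominates the summary.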
uvg266
uvg266 is an open-source VVC/H.266 (Versatile Video Coding) encoder based on Kvazaar as part of the Ultra Video Group, Tampere University, Finland. Learn more via the OpenBenchmarking.org test page.
uvg266 0.4.1, Video Input: Bosphorus 4K - Video Preset: Slow (Frames Per Second, More Is Better): SNC4 29.30 (SE +/- 0.13, N = 3); SNC2 29.47 (SE +/- 0.09, N = 3); Default - Disabled 29.92 (SE +/- 0.09, N = 3)
uvg266 0.4.1, Video Input: Bosphorus 4K - Video Preset: Medium (Frames Per Second, More Is Better): SNC4 32.75 (SE +/- 0.08, N = 3); SNC2 33.12 (SE +/- 0.17, N = 3); Default - Disabled 33.23 (SE +/- 0.12, N = 3)
uvg266 0.4.1, Video Input: Bosphorus 1080p - Video Preset: Slow (Frames Per Second, More Is Better): SNC4 88.18 (SE +/- 0.40, N = 3); SNC2 88.68 (SE +/- 0.19, N = 3); Default - Disabled 89.19 (SE +/- 0.27, N = 3)
uvg266 0.4.1, Video Input: Bosphorus 1080p - Video Preset: Medium (Frames Per Second, More Is Better): SNC4 97.60 (SE +/- 0.34, N = 3); SNC2 98.22 (SE +/- 0.18, N = 3); Default - Disabled 98.33 (SE +/- 0.23, N = 3)
uvg266 0.4.1, Video Input: Bosphorus 4K - Video Preset: Very Fast (Frames Per Second, More Is Better): SNC4 61.26 (SE +/- 0.22, N = 3); SNC2 62.99 (SE +/- 0.60, N = 3); Default - Disabled 66.32 (SE +/- 0.38, N = 3)
uvg266 0.4.1, Video Input: Bosphorus 4K - Video Preset: Super Fast (Frames Per Second, More Is Better): SNC4 62.25 (SE +/- 0.42, N = 3); SNC2 64.65 (SE +/- 0.12, N = 3); Default - Disabled 67.47 (SE +/- 0.24, N = 3)
uvg266 0.4.1, Video Input: Bosphorus 4K - Video Preset: Ultra Fast (Frames Per Second, More Is Better): SNC4 63.23 (SE +/- 0.25, N = 3); SNC2 65.34 (SE +/- 0.36, N = 3); Default - Disabled 69.02 (SE +/- 0.15, N = 3)
uvg266 0.4.1, Video Input: Bosphorus 1080p - Video Preset: Very Fast (Frames Per Second, More Is Better): SNC2 203.48 (SE +/- 0.63, N = 3); SNC4 207.62 (SE +/- 1.28, N = 3); Default - Disabled 208.36 (SE +/- 1.32, N = 3)
uvg266 0.4.1, Video Input: Bosphorus 1080p - Video Preset: Super Fast (Frames Per Second, More Is Better): SNC2 206.23 (SE +/- 1.37, N = 3); SNC4 206.92 (SE +/- 2.03, N = 3); Default - Disabled 211.36 (SE +/- 0.37, N = 3)
uvg266 0.4.1, Video Input: Bosphorus 1080p - Video Preset: Ultra Fast (Frames Per Second, More Is Better): SNC2 204.49 (SE +/- 0.83, N = 3); Default - Disabled 207.97 (SE +/- 0.58, N = 3); SNC4 208.97 (SE +/- 2.40, N = 3)
VVenC
VVenC is the Fraunhofer Versatile Video Encoder as a fast/efficient H.266/VVC encoder. The vvenc encoder makes use of SIMD Everywhere (SIMDe). The vvenc software is published under the Clear BSD License. Learn more via the OpenBenchmarking.org test page.
VVenC 1.9, Video Input: Bosphorus 4K - Video Preset: Fast (Frames Per Second, More Is Better): SNC4 8.186 (SE +/- 0.118, N = 3); SNC2 8.278 (SE +/- 0.051, N = 3); Default - Disabled 8.715 (SE +/- 0.094, N = 3)
VVenC 1.9, Video Input: Bosphorus 4K - Video Preset: Faster (Frames Per Second, More Is Better): SNC4 14.01 (SE +/- 0.09, N = 3); SNC2 14.31 (SE +/- 0.12, N = 3); Default - Disabled 15.25 (SE +/- 0.02, N = 3)
VVenC 1.9, Video Input: Bosphorus 1080p - Video Preset: Fast (Frames Per Second, More Is Better): SNC4 23.45 (SE +/- 0.24, N = 3); SNC2 23.71 (SE +/- 0.29, N = 3); Default - Disabled 24.30 (SE +/- 0.17, N = 3)
VVenC 1.9, Video Input: Bosphorus 1080p - Video Preset: Faster (Frames Per Second, More Is Better): SNC4 40.20 (SE +/- 0.45, N = 3); SNC2 40.22 (SE +/- 0.32, N = 3); Default - Disabled 41.40 (SE +/- 0.17, N = 3)
All VVenC results: (CXX) g++ options: -O3 -flto=auto -fno-fat-lto-objects
Xcompact3d Incompact3d
Xcompact3d Incompact3d is a Fortran-MPI based, finite difference high-performance code for solving the incompressible Navier-Stokes equation and as many scalar transport equations as needed. Learn more via the OpenBenchmarking.org test page.
Xcompact3d Incompact3d 2021-03-11, Input: input.i3d 129 Cells Per Direction (Seconds, Fewer Is Better): Default - Disabled 2.61738705 (SE +/- 0.03355279, N = 3); SNC2 2.49659332 (SE +/- 0.01815766, N = 3); SNC4 2.41614302 (SE +/- 0.00587555, N = 3)
Xcompact3d Incompact3d 2021-03-11, Input: input.i3d 193 Cells Per Direction (Seconds, Fewer Is Better): Default - Disabled 10.66 (SE +/- 0.03, N = 3); SNC2 10.49 (SE +/- 0.05, N = 3); SNC4 10.36 (SE +/- 0.08, N = 3)
All Xcompact3d Incompact3d results: (F9X) gfortran options: -cpp -O2 -funroll-loops -floop-optimize -fcray-pointer -fbacktrace -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
Default - Disabled
Kernel Notes: Transparent Huge Pages: madvise
Compiler Notes: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-XYspKM/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-XYspKM/gcc-13-13.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Notes: Scaling Governor: amd-pstate-epp powersave (EPP: balance_performance) - CPU Microcode: 0xa108105
OpenCL Notes: GPU Compute Cores: 6144
Python Notes: Python 3.11.6
Security Notes: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_rstack_overflow: Mitigation of safe RET + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS IBPB: conditional STIBP: always-on RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
Testing initiated at 26 November 2023 00:02 by user phoronix.
SNC2 Kernel, Compiler, Processor, OpenCL, Python, and Security Notes: identical to those listed for Default - Disabled.
Testing initiated at 26 November 2023 16:53 by user phoronix.
SNC4 Processor: AMD Ryzen Threadripper PRO 7995WX 96-Cores @ 6.44GHz (96 Cores / 192 Threads), Motherboard: HP 8B24 (U65 Ver. 01.01.04 BIOS), Chipset: AMD Device 14a4, Memory: 128GB, Disk: 2 x 1024GB SAMSUNG MZVL21T0HCLR-00BH1, Graphics: NVIDIA RTX A4000 16GB, Audio: NVIDIA GA104 HD Audio, Monitor: ASUS VP28U, Network: Realtek RTL8111/8168/8411
OS: Ubuntu 23.10, Kernel: 6.5.0-13-generic (x86_64), Desktop: GNOME Shell 45.0, Display Server: X Server 1.21.1.7, Display Driver: NVIDIA 535.129.03, OpenGL: 4.6.0, OpenCL: OpenCL 3.0 CUDA 12.2.147, Compiler: GCC 13.2.0, File-System: ext4, Screen Resolution: 3840x2160
Kernel, Compiler, Processor, OpenCL, Python, and Security Notes: identical to those listed for Default - Disabled.
Testing initiated at 27 November 2023 11:47 by user phoronix.