Benchmarks by Michael Larabel for a future article.
12c Processor: 2 x AMD EPYC 9654 96-Core @ 2.40GHz (192 Cores / 384 Threads), Motherboard: AMD Titanite_4G (RTI1002E BIOS), Chipset: AMD Device 14a4, Memory: 1520GB, Disk: 800GB INTEL SSDPF21Q800GB, Graphics: ASPEED, Monitor: VGA HDMI, Network: Broadcom NetXtreme BCM5720 PCIe
OS: Ubuntu 22.10, Kernel: 6.1.0-phx (x86_64), Desktop: GNOME Shell 43.0, Display Server: X Server 1.21.1.4, Vulkan: 1.3.224, Compiler: GCC 12.2.0 + Clang 15.0.2-1, File-System: ext4, Screen Resolution: 1920x1080
Kernel Notes: Transparent Huge Pages: madvise
Compiler Notes: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-12-U8K4Qv/gcc-12-12.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-12-U8K4Qv/gcc-12-12.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Notes: Scaling Governor: acpi-cpufreq performance (Boost: Enabled) - CPU Microcode: 0xa10110d
Java Notes: OpenJDK Runtime Environment (build 11.0.17+8-post-Ubuntu-1ubuntu2)
Python Notes: Python 3.10.7
Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: always-on RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
10c Changed Memory to 1264GB.
8c Changed Memory to 1008GB.
6c Changed Memory to 768GB.
High Performance Conjugate Gradient HPCG is the High Performance Conjugate Gradient, a scientific benchmark from Sandia National Laboratories focused on supercomputer testing with modern, real-world workloads compared to HPCC. Learn more via the OpenBenchmarking.org test page.
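As a rough illustration of what HPCG exercises, the sketch below is a minimal textbook conjugate gradient solver in NumPy/SciPy, not the HPCG code itself: the sparse matrix-vector products, dot products, and vector updates are the kernels the benchmark stresses. The 1D Poisson-style system is only a stand-in for HPCG's 3D problem.

```python
# Minimal textbook conjugate gradient sketch (not the HPCG benchmark code);
# it illustrates the kernels HPCG stresses: sparse matrix-vector products,
# dot products, and vector updates.
import numpy as np
from scipy.sparse import diags

def conjugate_gradient(A, b, tol=1e-8, max_iter=1000):
    x = np.zeros_like(b)
    r = b - A @ x          # residual
    p = r.copy()           # search direction
    rs_old = r @ r
    for _ in range(max_iter):
        Ap = A @ p                     # sparse matrix-vector product (SpMV)
        alpha = rs_old / (p @ Ap)      # step length
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p  # update search direction
        rs_old = rs_new
    return x

# Simple 1D Poisson-like SPD system as a stand-in for HPCG's 3D problem.
n = 1000
A = diags([-1, 2, -1], [-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)
x = conjugate_gradient(A, b)
print("residual norm:", np.linalg.norm(b - A @ x))
```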
OpenBenchmarking.org GFLOP/s, More Is Better High Performance Conjugate Gradient 3.1 12c 10c 8c 6c 20 40 60 80 100 SE +/- 1.12, N = 12 SE +/- 3.31, N = 9 SE +/- 0.49, N = 9 SE +/- 0.99, N = 9 86.81 48.29 45.00 36.54 1. (CXX) g++ options: -O3 -ffast-math -ftree-vectorize -lmpi_cxx -lmpi
NAS Parallel Benchmarks NPB, the NAS Parallel Benchmarks, is a benchmark suite developed by NASA for high-end computer systems. This test profile currently uses the MPI version of NPB and allows selecting among the different NPB tests/problems and varying problem sizes. Learn more via the OpenBenchmarking.org test page.
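NPB itself is Fortran/MPI; as a hedged illustration of the MPI pattern its kernels rely on, the sketch below uses mpi4py to compute a global dot product with an all-reduce, similar to what the CG kernel does across ranks. The script name in the comment is hypothetical.

```python
# Not NPB itself: a tiny mpi4py sketch of the kind of distributed reduction
# (a global dot product) that NPB kernels such as CG perform over MPI ranks.
# Run with e.g.:  mpirun -np 4 python npb_style_dot.py  (script name is illustrative)
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

n_global = 1_000_000
n_local = n_global // size              # each rank owns a slice of the vectors
rng = np.random.default_rng(seed=rank)  # per-rank data
x = rng.random(n_local)
y = rng.random(n_local)

local_dot = float(x @ y)                            # local partial dot product
global_dot = comm.allreduce(local_dot, op=MPI.SUM)  # combine across all ranks

if rank == 0:
    print(f"global dot product over {size} ranks: {global_dot:.6e}")
```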
OpenBenchmarking.org Total Mop/s, More Is Better NAS Parallel Benchmarks 3.4 Test / Class: CG.C 12c 10c 8c 6c 20K 40K 60K 80K 100K SE +/- 812.04, N = 15 SE +/- 899.80, N = 15 SE +/- 907.72, N = 15 SE +/- 554.69, N = 3 80225.01 81179.00 79784.15 71662.28 1. (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz 2. Open MPI 4.1.4
OpenBenchmarking.org Total Mop/s, More Is Better NAS Parallel Benchmarks 3.4 Test / Class: IS.D 12c 10c 8c 6c 2K 4K 6K 8K 10K SE +/- 84.88, N = 3 SE +/- 206.91, N = 12 SE +/- 134.50, N = 15 SE +/- 158.57, N = 12 8491.01 7124.92 6675.71 5690.01 1. (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz 2. Open MPI 4.1.4
OpenBenchmarking.org Total Mop/s, More Is Better NAS Parallel Benchmarks 3.4 Test / Class: LU.C 12c 10c 8c 6c 100K 200K 300K 400K 500K SE +/- 5489.08, N = 4 SE +/- 2546.14, N = 3 SE +/- 5095.33, N = 5 SE +/- 4680.97, N = 5 489164.65 489995.20 466769.54 454360.62 1. (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz 2. Open MPI 4.1.4
OpenBenchmarking.org Total Mop/s, More Is Better NAS Parallel Benchmarks 3.4 Test / Class: MG.C 12c 10c 8c 6c 40K 80K 120K 160K 200K SE +/- 2393.90, N = 3 SE +/- 2631.10, N = 15 SE +/- 2089.98, N = 15 SE +/- 1626.80, N = 15 209846.76 177097.42 153458.78 117733.57 1. (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz 2. Open MPI 4.1.4
OpenBenchmarking.org Total Mop/s, More Is Better NAS Parallel Benchmarks 3.4 Test / Class: SP.C 12c 10c 8c 6c 60K 120K 180K 240K 300K SE +/- 1589.72, N = 3 SE +/- 726.36, N = 3 SE +/- 1630.30, N = 3 SE +/- 1838.44, N = 3 260471.50 239496.01 208535.23 167474.70 1. (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz 2. Open MPI 4.1.4
miniBUDE MiniBUDE is a mini-application for the core computation of the Bristol University Docking Engine (BUDE). This test profile currently makes use of the OpenMP implementation of miniBUDE for CPU benchmarking. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org GFInst/s, More Is Better miniBUDE 20210901 Implementation: OpenMP - Input Deck: BM2 12c 10c 8c 6c 2K 4K 6K 8K 10K SE +/- 27.15, N = 3 SE +/- 31.49, N = 3 SE +/- 63.13, N = 3 SE +/- 96.81, N = 3 8640.31 8666.98 8615.97 8651.92 1. (CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm
OpenBenchmarking.org Billion Interactions/s, More Is Better miniBUDE 20210901 Implementation: OpenMP - Input Deck: BM2 12c 10c 8c 6c 80 160 240 320 400 SE +/- 1.09, N = 3 SE +/- 1.26, N = 3 SE +/- 2.53, N = 3 SE +/- 3.87, N = 3 345.61 346.68 344.64 346.08 1. (CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm
Rodinia Rodinia is a suite focused upon accelerating compute-intensive applications with accelerators. CUDA, OpenMP, and OpenCL parallel models are supported by the included applications. This profile utilizes select OpenCL, NVIDIA CUDA and OpenMP test binaries at the moment. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Seconds, Fewer Is Better Rodinia 3.1 Test: OpenMP CFD Solver 12c 10c 8c 6c 2 4 6 8 10 SE +/- 0.031, N = 3 SE +/- 0.014, N = 3 SE +/- 0.016, N = 3 SE +/- 0.024, N = 3 6.050 6.074 5.970 6.152 1. (CXX) g++ options: -O2 -lOpenCL
OpenBenchmarking.org Seconds, Fewer Is Better Rodinia 3.1 Test: OpenMP Streamcluster 12c 10c 8c 6c 2 4 6 8 10 SE +/- 0.089, N = 15 SE +/- 0.079, N = 15 SE +/- 0.078, N = 15 SE +/- 0.050, N = 3 6.001 6.285 6.018 6.409 1. (CXX) g++ options: -O2 -lOpenCL
NAMD NAMD is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. NAMD was developed by the Theoretical and Computational Biophysics Group in the Beckman Institute for Advanced Science and Technology at the University of Illinois at Urbana-Champaign. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org days/ns, Fewer Is Better NAMD 2.14 ATPase Simulation - 327,506 Atoms 12c 10c 8c 6c 0.0288 0.0576 0.0864 0.1152 0.144 SE +/- 0.00009, N = 3 SE +/- 0.00007, N = 3 SE +/- 0.00046, N = 3 SE +/- 0.00009, N = 3 0.12783 0.12759 0.12768 0.12820
nekRS nekRS is an open-source Navier-Stokes solver based on the spectral element method. NekRS offers both CPU and GPU/accelerator support, though this test profile is currently configured for CPU execution. NekRS is part of Nek5000 from the Mathematics and Computer Science (MCS) division at Argonne National Laboratory. This nekRS benchmark is primarily relevant to large-core-count HPC servers and otherwise may be very time consuming. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org FLOP/s, More Is Better nekRS 22.0 Input: TurboPipe Periodic 12c 10c 8c 6c 200000M 400000M 600000M 800000M 1000000M SE +/- 9551971733.63, N = 3 SE +/- 7825985326.68, N = 3 SE +/- 5892587066.25, N = 3 SE +/- 1934071468.29, N = 3 821462000000 786258000000 740247000000 659554333333 1. (CXX) g++ options: -fopenmp -O2 -march=native -mtune=native -ftree-vectorize -lmpi_cxx -lmpi
NWChem NWChem is an open-source high performance computational chemistry package. Per NWChem's documentation, "NWChem aims to provide its users with computational chemistry tools that are scalable both in their ability to treat large scientific computational chemistry problems efficiently, and in their use of available parallel computing resources from high-performance parallel supercomputers to conventional workstation clusters." Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Seconds, Fewer Is Better NWChem 7.0.2 Input: C240 Buckyball 12c 10c 8c 6c 300 600 900 1200 1500 1537.1 1531.0 1519.6 1517.9 1. (F9X) gfortran options: -lnwctask -lccsd -lmcscf -lselci -lmp2 -lmoints -lstepper -ldriver -loptim -lnwdft -lgradients -lcphf -lesp -lddscf -ldangchang -lguess -lhessian -lvib -lnwcutil -lrimp2 -lproperty -lsolvation -lnwints -lprepar -lnwmd -lnwpw -lofpw -lpaw -lpspw -lband -lnwpwlib -lcafe -lspace -lanalyze -lqhop -lpfft -ldplot -ldrdy -lvscf -lqmmm -lqmd -letrans -ltce -lbq -lmm -lcons -lperfm -ldntmc -lccca -ldimqm -lga -larmci -lpeigs -l64to32 -lopenblas -lpthread -lrt -llapack -lnwcblas -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz -lcomex -m64 -ffast-math -std=legacy -fdefault-integer-8 -finline-functions -O2
Xcompact3d Incompact3d Xcompact3d Incompact3d is a Fortran/MPI-based, finite-difference, high-performance code for solving the incompressible Navier-Stokes equations along with as many scalar transport equations as needed. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Seconds, Fewer Is Better Xcompact3d Incompact3d 2021-03-11 Input: X3D-benchmarking input.i3d 12c 10c 8c 6c 80 160 240 320 400 SE +/- 0.14, N = 3 SE +/- 0.11, N = 3 SE +/- 2.69, N = 9 SE +/- 4.79, N = 9 125.53 146.29 270.09 348.88 1. (F9X) gfortran options: -cpp -O2 -funroll-loops -floop-optimize -fcray-pointer -fbacktrace -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
OpenFOAM OpenFOAM is the leading free, open-source software for computational fluid dynamics (CFD). This test profile currently uses the drivaerFastback test case for analyzing automotive aerodynamics or alternatively the older motorBike input. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Seconds, Fewer Is Better OpenFOAM 10 Input: drivaerFastback, Medium Mesh Size - Execution Time 12c 10c 8c 6c 50 100 150 200 250 109.54 117.94 166.15 227.90 1. (CXX) g++ options: -std=c++14 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -lfiniteVolume -lmeshTools -lparallel -llagrangian -lregionModels -lgenericPatchFields -lOpenFOAM -ldl -lm
OpenRadioss OpenRadioss is an open-source, AGPL-licensed finite element solver for dynamic event analysis. OpenRadioss is based on Altair Radioss, which was open-sourced in 2022. This open-source finite element solver is benchmarked with various example models available from https://www.openradioss.org/models/. This test is currently using a reference OpenRadioss binary build offered via GitHub. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Seconds, Fewer Is Better OpenRadioss 2022.10.13 Model: Bumper Beam 12c 10c 8c 6c 20 40 60 80 100 SE +/- 0.79, N = 3 SE +/- 0.75, N = 3 SE +/- 0.70, N = 3 SE +/- 0.71, N = 3 79.86 79.70 79.20 79.62
OpenBenchmarking.org Seconds, Fewer Is Better OpenRadioss 2022.10.13 Model: Bird Strike on Windshield 12c 10c 8c 6c 50 100 150 200 250 SE +/- 0.38, N = 3 SE +/- 0.54, N = 3 SE +/- 0.19, N = 3 SE +/- 0.14, N = 3 216.88 218.22 219.45 219.10
OpenBenchmarking.org Seconds, Fewer Is Better OpenRadioss 2022.10.13 Model: INIVOL and Fluid Structure Interaction Drop Container 12c 10c 8c 6c 20 40 60 80 100 SE +/- 0.14, N = 3 SE +/- 0.08, N = 3 SE +/- 0.12, N = 3 SE +/- 0.08, N = 3 81.57 81.15 81.09 80.81
RELION RELION - REgularised LIkelihood OptimisatioN - is a stand-alone computer program for Maximum A Posteriori refinement of (multiple) 3D reconstructions or 2D class averages in cryo-electron microscopy (cryo-EM). It is developed in the research group of Sjors Scheres at the MRC Laboratory of Molecular Biology. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Seconds, Fewer Is Better RELION 3.1.1 Test: Basic - Device: CPU 12c 10c 8c 6c 60 120 180 240 300 SE +/- 1.38, N = 5 SE +/- 1.86, N = 4 SE +/- 2.88, N = 3 SE +/- 2.59, N = 6 128.10 151.40 221.34 258.50 1. (CXX) g++ options: -fopenmp -std=c++0x -O3 -rdynamic -ldl -ltiff -lfftw3f -lfftw3 -lpng -lmpi_cxx -lmpi
simdjson This is a benchmark of simdjson, a high-performance JSON parser. simdjson aims to be the fastest JSON parser and is used by projects like Microsoft FishStore, Yandex ClickHouse, Shopify, and others. Learn more via the OpenBenchmarking.org test page.
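To make the GB/s unit concrete, here is a small, hedged sketch that measures JSON parsing throughput with Python's standard-library json module; it is not simdjson and will be far slower, but the throughput calculation is the same idea. The sample.json filename is a placeholder.

```python
# A rough illustration of what the GB/s figures mean: bytes of JSON parsed per
# second. This uses Python's standard-library json module as a slow baseline,
# not simdjson itself; the simdjson C++ parser is far faster.
import json
import time

# Hypothetical input file; the actual test profile ships its own JSON corpora
# (Kostya, TopTweet, LargeRandom, PartialTweets, DistinctUserID).
with open("sample.json", "rb") as f:
    payload = f.read()

iterations = 50
start = time.perf_counter()
for _ in range(iterations):
    json.loads(payload)
elapsed = time.perf_counter() - start

gb_parsed = len(payload) * iterations / 1e9
print(f"throughput: {gb_parsed / elapsed:.3f} GB/s")
```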
OpenBenchmarking.org GB/s, More Is Better simdjson 2.0 Throughput Test: Kostya 12c 10c 8c 6c 0.9248 1.8496 2.7744 3.6992 4.624 SE +/- 0.01, N = 3 SE +/- 0.01, N = 3 SE +/- 0.00, N = 3 SE +/- 0.01, N = 3 4.11 4.11 4.11 4.11 1. (CXX) g++ options: -O3
OpenBenchmarking.org GB/s, More Is Better simdjson 2.0 Throughput Test: TopTweet 12c 10c 8c 6c 2 4 6 8 10 SE +/- 0.01, N = 3 SE +/- 0.07, N = 6 SE +/- 0.01, N = 3 SE +/- 0.00, N = 3 6.59 6.49 6.57 6.55 1. (CXX) g++ options: -O3
OpenBenchmarking.org GB/s, More Is Better simdjson 2.0 Throughput Test: LargeRandom 12c 10c 8c 6c 0.2813 0.5626 0.8439 1.1252 1.4065 SE +/- 0.00, N = 3 SE +/- 0.00, N = 3 SE +/- 0.00, N = 3 SE +/- 0.00, N = 3 1.25 1.25 1.25 1.24 1. (CXX) g++ options: -O3
OpenBenchmarking.org GB/s, More Is Better simdjson 2.0 Throughput Test: PartialTweets 12c 10c 8c 6c 1.2803 2.5606 3.8409 5.1212 6.4015 SE +/- 0.01, N = 3 SE +/- 0.02, N = 3 SE +/- 0.01, N = 3 SE +/- 0.01, N = 3 5.65 5.67 5.66 5.69 1. (CXX) g++ options: -O3
OpenBenchmarking.org GB/s, More Is Better simdjson 2.0 Throughput Test: DistinctUserID 12c 10c 8c 6c 2 4 6 8 10 SE +/- 0.02, N = 3 SE +/- 0.02, N = 3 SE +/- 0.01, N = 3 SE +/- 0.02, N = 3 6.86 6.84 6.86 6.83 1. (CXX) g++ options: -O3
Xmrig Xmrig is an open-source, cross-platform CPU/GPU miner for RandomX, KawPow, CryptoNight, and AstroBWT. This test profile is set up to measure the XMRig CPU mining performance. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org H/s, More Is Better Xmrig 6.18.1 Variant: Monero - Hash Count: 1M 12c 10c 8c 6c 20K 40K 60K 80K 100K SE +/- 328.13, N = 3 SE +/- 152.19, N = 3 SE +/- 383.60, N = 3 SE +/- 214.10, N = 3 104604.6 102599.6 101953.5 100446.2 1. (CXX) g++ options: -fexceptions -fno-rtti -maes -O3 -Ofast -static-libgcc -static-libstdc++ -rdynamic -lssl -lcrypto -luv -lpthread -lrt -ldl -lhwloc
OpenBenchmarking.org H/s, More Is Better Xmrig 6.18.1 Variant: Wownero - Hash Count: 1M 12c 10c 8c 6c 30K 60K 90K 120K 150K SE +/- 849.90, N = 3 SE +/- 70.55, N = 3 SE +/- 122.05, N = 3 SE +/- 349.73, N = 3 126465.6 127226.6 127081.2 126057.7 1. (CXX) g++ options: -fexceptions -fno-rtti -maes -O3 -Ofast -static-libgcc -static-libstdc++ -rdynamic -lssl -lcrypto -luv -lpthread -lrt -ldl -lhwloc
LuxCoreRender LuxCoreRender is an open-source 3D physically based renderer formerly known as LuxRender. LuxCoreRender supports CPU-based rendering as well as GPU acceleration via OpenCL, NVIDIA CUDA, and NVIDIA OptiX interfaces. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org M samples/sec, More Is Better LuxCoreRender 2.6 Scene: Danish Mood - Acceleration: CPU 12c 10c 8c 6c 3 6 9 12 15 SE +/- 0.09, N = 15 SE +/- 0.17, N = 12 SE +/- 0.11, N = 15 SE +/- 0.14, N = 12 9.69 9.62 9.56 9.49 MIN: 4 / MAX: 12.39 MIN: 3.97 / MAX: 12.9 MIN: 3.94 / MAX: 12.41 MIN: 3.85 / MAX: 12.15
OpenBenchmarking.org M samples/sec, More Is Better LuxCoreRender 2.6 Scene: Orange Juice - Acceleration: CPU 12c 10c 8c 6c 7 14 21 28 35 SE +/- 0.63, N = 15 SE +/- 0.29, N = 3 SE +/- 0.72, N = 15 SE +/- 0.71, N = 15 28.82 28.19 29.04 28.90 MIN: 23.01 / MAX: 45.86 MIN: 23.3 / MAX: 45.65 MIN: 22.62 / MAX: 45.48 MIN: 22.4 / MAX: 44.91
Embree Intel Embree is a collection of high-performance ray-tracing kernels for execution on CPUs and supporting instruction sets such as SSE, AVX, AVX2, and AVX-512. Embree also supports making use of the Intel SPMD Program Compiler (ISPC). Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Frames Per Second, More Is Better Embree 3.13 Binary: Pathtracer ISPC - Model: Crown 12c 10c 8c 6c 40 80 120 160 200 SE +/- 1.01, N = 3 SE +/- 0.47, N = 3 SE +/- 0.36, N = 3 SE +/- 0.33, N = 3 182.45 184.73 185.49 187.61 MIN: 128.42 / MAX: 209.42 MIN: 137.82 / MAX: 210.21 MIN: 134.45 / MAX: 211.64 MIN: 146.69 / MAX: 208.25
OpenBenchmarking.org Frames Per Second, More Is Better Embree 3.13 Binary: Pathtracer ISPC - Model: Asian Dragon 12c 10c 8c 6c 50 100 150 200 250 SE +/- 0.13, N = 3 SE +/- 0.47, N = 3 SE +/- 0.39, N = 3 SE +/- 0.46, N = 3 213.75 214.31 217.41 221.29 MIN: 209.16 / MAX: 225.43 MIN: 209.11 / MAX: 223.97 MIN: 211.73 / MAX: 230.1 MIN: 215.19 / MAX: 233.21
Kvazaar This is a test of Kvazaar as a CPU-based H.265/HEVC video encoder written in the C programming language and optimized in Assembly. Kvazaar is the winner of the 2016 ACM Open-Source Software Competition and developed at the Ultra Video Group, Tampere University, Finland. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Frames Per Second, More Is Better Kvazaar 2.1 Video Input: Bosphorus 4K - Video Preset: Medium 12c 10c 8c 6c 14 28 42 56 70 SE +/- 0.68, N = 3 SE +/- 0.11, N = 3 SE +/- 0.73, N = 3 SE +/- 0.53, N = 3 62.56 62.23 61.81 61.40 1. (CC) gcc options: -pthread -ftree-vectorize -fvisibility=hidden -O2 -lpthread -lm -lrt
OpenBenchmarking.org Frames Per Second, More Is Better Kvazaar 2.1 Video Input: Bosphorus 4K - Video Preset: Very Fast 12c 10c 8c 6c 20 40 60 80 100 SE +/- 0.58, N = 10 SE +/- 0.74, N = 3 SE +/- 1.04, N = 3 SE +/- 0.77, N = 3 73.44 75.35 73.04 71.41 1. (CC) gcc options: -pthread -ftree-vectorize -fvisibility=hidden -O2 -lpthread -lm -lrt
OpenBenchmarking.org Frames Per Second, More Is Better Kvazaar 2.1 Video Input: Bosphorus 4K - Video Preset: Ultra Fast 12c 10c 8c 6c 20 40 60 80 100 SE +/- 0.66, N = 3 SE +/- 1.02, N = 3 SE +/- 0.71, N = 3 SE +/- 0.63, N = 3 77.83 77.30 76.84 75.86 1. (CC) gcc options: -pthread -ftree-vectorize -fvisibility=hidden -O2 -lpthread -lm -lrt
SVT-AV1 This is a benchmark of the SVT-AV1 open-source video encoder/decoder. SVT-AV1 was originally developed by Intel as part of their Open Visual Cloud / Scalable Video Technology (SVT). Development of SVT-AV1 has since moved to the Alliance for Open Media as part of upstream AV1 development. SVT-AV1 is a CPU-based multi-threaded video encoder for the AV1 video format with a sample YUV video file. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 1.4 Encoder Mode: Preset 12 - Input: Bosphorus 4K 12c 10c 8c 6c 60 120 180 240 300 SE +/- 7.35, N = 15 SE +/- 7.16, N = 15 SE +/- 7.53, N = 15 SE +/- 9.18, N = 13 251.77 241.37 227.90 221.16
OpenVKL OpenVKL is the Intel Open Volume Kernel Library that offers high-performance volume computation kernels and is part of the Intel oneAPI rendering toolkit. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Items / Sec, More Is Better OpenVKL 1.3.1 Benchmark: vklBenchmark ISPC 12c 10c 8c 6c 300 600 900 1200 1500 SE +/- 6.93, N = 3 SE +/- 11.03, N = 9 SE +/- 8.82, N = 3 SE +/- 15.59, N = 3 1325 1317 1325 1212 MIN: 329 / MAX: 4553 MIN: 327 / MAX: 5660 MIN: 330 / MAX: 5664 MIN: 328 / MAX: 4115
OSPRay Intel OSPRay is a portable ray-tracing engine for high-performance, high-fidelity scientific visualizations. OSPRay builds off Intel's Embree and Intel SPMD Program Compiler (ISPC) components as part of the oneAPI rendering toolkit. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Items Per Second, More Is Better OSPRay 2.10 Benchmark: particle_volume/ao/real_time 12c 10c 8c 6c 10 20 30 40 50 SE +/- 0.04, N = 3 SE +/- 0.04, N = 3 SE +/- 0.01, N = 3 SE +/- 0.04, N = 3 43.71 43.03 43.97 43.36
OpenBenchmarking.org Items Per Second, More Is Better OSPRay 2.10 Benchmark: particle_volume/scivis/real_time 12c 10c 8c 6c 10 20 30 40 50 SE +/- 0.05, N = 3 SE +/- 0.01, N = 3 SE +/- 0.03, N = 3 SE +/- 0.06, N = 3 42.80 43.00 43.84 43.24
OpenBenchmarking.org Items Per Second, More Is Better OSPRay 2.10 Benchmark: particle_volume/pathtracer/real_time 12c 10c 8c 6c 50 100 150 200 250 SE +/- 1.54, N = 3 SE +/- 1.94, N = 3 SE +/- 1.74, N = 3 SE +/- 0.59, N = 3 229.27 230.28 228.58 230.44
OpenBenchmarking.org Items Per Second, More Is Better OSPRay 2.10 Benchmark: gravity_spheres_volume/dim_512/ao/real_time 12c 10c 8c 6c 10 20 30 40 50 SE +/- 0.13, N = 3 SE +/- 0.04, N = 3 SE +/- 0.10, N = 3 SE +/- 0.07, N = 3 43.98 44.00 44.23 44.27
OpenBenchmarking.org Items Per Second, More Is Better OSPRay 2.10 Benchmark: gravity_spheres_volume/dim_512/scivis/real_time 12c 10c 8c 6c 10 20 30 40 50 SE +/- 0.15, N = 3 SE +/- 0.12, N = 3 SE +/- 0.13, N = 3 SE +/- 0.15, N = 3 43.13 43.33 43.43 43.29
OpenBenchmarking.org Items Per Second, More Is Better OSPRay 2.10 Benchmark: gravity_spheres_volume/dim_512/pathtracer/real_time 12c 10c 8c 6c 12 24 36 48 60 SE +/- 0.50, N = 3 SE +/- 0.12, N = 3 SE +/- 0.08, N = 3 SE +/- 0.04, N = 3 53.77 54.41 54.51 54.61
OpenBenchmarking.org MIPS, More Is Better 7-Zip Compression 22.01 Test: Decompression Rating 12c 10c 8c 6c 300K 600K 900K 1200K 1500K SE +/- 3305.67, N = 3 SE +/- 5138.86, N = 3 SE +/- 9235.88, N = 3 SE +/- 2020.82, N = 3 1181435 1171627 1159901 1177484 1. (CXX) g++ options: -lpthread -ldl -O2 -fPIC
Stargate Digital Audio Workstation Stargate is an open-source, cross-platform digital audio workstation (DAW) software package offering "a unique and carefully curated experience", with scalability from old systems up through modern multi-core systems. Stargate is GPLv3 licensed and makes use of Qt5 (PyQt5) for its user interface. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Render Ratio, More Is Better Stargate Digital Audio Workstation 22.11.5 Sample Rate: 96000 - Buffer Size: 1024 12c 10c 8c 6c 0.9821 1.9642 2.9463 3.9284 4.9105 SE +/- 0.023689, N = 3 SE +/- 0.010431, N = 3 SE +/- 0.008144, N = 3 SE +/- 0.002133, N = 3 4.345890 4.354556 4.351402 4.364767 1. (CXX) g++ options: -lpthread -lsndfile -lm -O3 -march=native -ffast-math -funroll-loops -fstrength-reduce -fstrict-aliasing -finline-functions
OpenBenchmarking.org Render Ratio, More Is Better Stargate Digital Audio Workstation 22.11.5 Sample Rate: 192000 - Buffer Size: 1024 12c 10c 8c 6c 0.6365 1.273 1.9095 2.546 3.1825 SE +/- 0.001919, N = 3 SE +/- 0.017291, N = 3 SE +/- 0.019484, N = 3 SE +/- 0.004057, N = 3 2.829061 2.806190 2.811555 2.824814 1. (CXX) g++ options: -lpthread -lsndfile -lm -O3 -march=native -ffast-math -funroll-loops -fstrength-reduce -fstrict-aliasing -finline-functions
OpenBenchmarking.org Seconds, Fewer Is Better libavif avifenc 0.11 Encoder Speed: 2 12c 10c 8c 6c 8 16 24 32 40 SE +/- 0.14, N = 3 SE +/- 0.08, N = 3 SE +/- 0.10, N = 3 SE +/- 0.14, N = 3 34.85 34.91 34.69 34.87 1. (CXX) g++ options: -O3 -fPIC -lm
OpenBenchmarking.org Seconds, Fewer Is Better libavif avifenc 0.11 Encoder Speed: 6 12c 10c 8c 6c 0.5533 1.1066 1.6599 2.2132 2.7665 SE +/- 0.016, N = 3 SE +/- 0.003, N = 3 SE +/- 0.017, N = 3 SE +/- 0.004, N = 3 2.459 2.411 2.420 2.435 1. (CXX) g++ options: -O3 -fPIC -lm
OpenBenchmarking.org Seconds, Fewer Is Better libavif avifenc 0.11 Encoder Speed: 6, Lossless 12c 10c 8c 6c 1.1993 2.3986 3.5979 4.7972 5.9965 SE +/- 0.076, N = 3 SE +/- 0.044, N = 3 SE +/- 0.034, N = 3 SE +/- 0.055, N = 3 5.287 5.286 5.270 5.330 1. (CXX) g++ options: -O3 -fPIC -lm
OpenBenchmarking.org Seconds, Fewer Is Better libavif avifenc 0.11 Encoder Speed: 10, Lossless 12c 10c 8c 6c 0.9758 1.9516 2.9274 3.9032 4.879 SE +/- 0.024, N = 3 SE +/- 0.055, N = 3 SE +/- 0.009, N = 3 SE +/- 0.043, N = 3 4.241 4.337 4.252 4.250 1. (CXX) g++ options: -O3 -fPIC -lm
Timed Gem5 Compilation This test times how long it takes to compile Gem5. Gem5 is a simulator for computer system architecture research. Gem5 is widely used for computer architecture research within the industry, academia, and more. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Seconds, Fewer Is Better Timed Gem5 Compilation 21.2 Time To Compile 12c 10c 8c 6c 30 60 90 120 150 SE +/- 0.16, N = 3 SE +/- 0.36, N = 3 SE +/- 0.77, N = 3 SE +/- 0.57, N = 3 139.24 134.37 136.79 134.70
Timed Mesa Compilation This test profile times how long it takes to compile Mesa with Meson/Ninja. To minimize build dependencies and avoid versioning conflicts, this test is just the core Mesa build without LLVM or the extra Gallium3D/Mesa drivers enabled. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Seconds, Fewer Is Better Timed Mesa Compilation 21.0 Time To Compile 12c 10c 8c 6c 5 10 15 20 25 SE +/- 0.07, N = 3 SE +/- 0.06, N = 3 SE +/- 0.07, N = 3 SE +/- 0.05, N = 3 20.12 20.21 20.11 20.16
Build2 This test profile measures the time to bootstrap/install the build2 C++ build toolchain from source. Build2 is a cross-platform build toolchain for C/C++ code with Cargo-like features. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Seconds, Fewer Is Better Build2 0.13 Time To Compile 12c 10c 8c 6c 11 22 33 44 55 SE +/- 0.04, N = 3 SE +/- 0.02, N = 3 SE +/- 0.20, N = 3 SE +/- 0.28, N = 3 49.92 49.80 49.87 50.08
oneDNN This is a test of Intel oneDNN, an Intel-optimized library for deep neural networks, making use of its built-in benchdnn functionality. The result is the total perf time reported. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and MKL-DNN before being rebranded as part of the Intel oneAPI toolkit. Learn more via the OpenBenchmarking.org test page.
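As a hedged illustration (this is not the oneDNN or benchdnn API), the sketch below times a batched matrix multiply in NumPy with transformer-like shapes, reporting milliseconds per run as the oneDNN results below do. The shapes are illustrative only.

```python
# Not the oneDNN/benchdnn API: a NumPy sketch of a batched matrix multiply of the
# sort the "Matrix Multiply Batch Shapes Transformer" harness exercises, timing
# each run in milliseconds (the unit the oneDNN results are reported in).
import time
import numpy as np

batch, m, k, n = 64, 128, 768, 768   # illustrative transformer-like shapes
a = np.random.rand(batch, m, k).astype(np.float32)
b = np.random.rand(batch, k, n).astype(np.float32)

times = []
for _ in range(20):
    start = time.perf_counter()
    np.matmul(a, b)                  # batched GEMM
    times.append((time.perf_counter() - start) * 1000.0)

print(f"median batched matmul time: {np.median(times):.3f} ms")
```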
OpenBenchmarking.org ms, Fewer Is Better oneDNN 3.0 Harness: IP Shapes 3D - Data Type: bf16bf16bf16 - Engine: CPU 12c 10c 8c 6c 0.9021 1.8042 2.7063 3.6084 4.5105 SE +/- 0.02537, N = 3 SE +/- 0.05885, N = 12 SE +/- 0.08932, N = 12 SE +/- 0.01788, N = 3 3.95471 4.00938 3.99305 3.96488 MIN: 3.05 MIN: 2.96 MIN: 2.67 MIN: 2.99 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
OpenBenchmarking.org ms, Fewer Is Better oneDNN 3.0 Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU 12c 10c 8c 6c 400 800 1200 1600 2000 SE +/- 31.84, N = 15 SE +/- 14.89, N = 3 SE +/- 28.30, N = 3 SE +/- 16.27, N = 10 1968.70 2030.72 1982.15 2072.57 MIN: 1632.62 MIN: 1981.15 MIN: 1911.33 MIN: 1942.14 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
OpenBenchmarking.org ms, Fewer Is Better oneDNN 3.0 Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU 12c 10c 8c 6c 500 1000 1500 2000 2500 SE +/- 21.01, N = 3 SE +/- 30.76, N = 3 SE +/- 21.41, N = 3 SE +/- 25.74, N = 15 2344.29 2438.00 2375.45 2479.62 MIN: 2288.85 MIN: 2353.97 MIN: 2319.45 MIN: 2293.49 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
OpenBenchmarking.org ms, Fewer Is Better oneDNN 3.0 Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU 12c 10c 8c 6c 500 1000 1500 2000 2500 SE +/- 24.22, N = 3 SE +/- 25.04, N = 15 SE +/- 25.14, N = 15 SE +/- 31.16, N = 3 2275.86 2325.71 2371.78 2471.57 MIN: 2213.34 MIN: 2171.69 MIN: 2234.23 MIN: 2410.73 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
OpenBenchmarking.org ms, Fewer Is Better oneDNN 3.0 Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32 - Engine: CPU 12c 10c 8c 6c 0.1048 0.2096 0.3144 0.4192 0.524 SE +/- 0.005042, N = 3 SE +/- 0.005241, N = 4 SE +/- 0.006374, N = 3 SE +/- 0.005815, N = 3 0.446930 0.463454 0.465796 0.465059 MIN: 0.38 MIN: 0.38 MIN: 0.38 MIN: 0.38 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
Liquid-DSP LiquidSDR's Liquid-DSP is a software-defined radio (SDR) digital signal processing library. This test profile runs a multi-threaded benchmark of this SDR/DSP library focused on embedded platform usage. Learn more via the OpenBenchmarking.org test page.
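As a hedged illustration of what the Liquid-DSP test measures (not the liquid-dsp C API), the sketch below pushes complex samples through a 57-tap FIR filter with SciPy and reports filtered samples per second for a single thread.

```python
# Not the liquid-dsp C API: a SciPy sketch of the underlying operation the
# Liquid-DSP test measures -- pushing complex samples through a 57-tap FIR
# filter -- reported as filtered samples per second.
import time
import numpy as np
from scipy.signal import firwin, lfilter

taps = firwin(57, 0.25)                          # 57-coefficient low-pass filter
samples = (np.random.randn(1_000_000) +
           1j * np.random.randn(1_000_000)).astype(np.complex64)

start = time.perf_counter()
filtered = lfilter(taps, 1.0, samples)           # run the FIR filter
elapsed = time.perf_counter() - start

print(f"{len(samples) / elapsed / 1e6:.1f} Msamples/s (single thread)")
```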
OpenBenchmarking.org samples/s, More Is Better Liquid-DSP 2021.01.31 Threads: 256 - Buffer Length: 256 - Filter Length: 57 12c 10c 8c 6c 2000M 4000M 6000M 8000M 10000M SE +/- 4618802.15, N = 3 SE +/- 5196152.42, N = 3 SE +/- 4333333.33, N = 3 SE +/- 3844187.53, N = 3 10347000000 10340000000 10337666667 10340333333 1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
OpenBenchmarking.org samples/s, More Is Better Liquid-DSP 2021.01.31 Threads: 384 - Buffer Length: 256 - Filter Length: 57 12c 10c 8c 6c 2000M 4000M 6000M 8000M 10000M SE +/- 4582575.69, N = 3 SE +/- 4409585.52, N = 3 SE +/- 5783117.19, N = 3 SE +/- 3214550.25, N = 3 10347000000 10352666667 10349666667 10349000000 1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
CockroachDB CockroachDB is a cloud-native, distributed SQL database for data-intensive applications. This test profile uses a server-less CockroachDB configuration to test various CockroachDB workloads on the local host with a single node. Learn more via the OpenBenchmarking.org test page.
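Since CockroachDB speaks the PostgreSQL wire protocol, a KV-style workload with a configurable read percentage can be sketched with psycopg2 as below; note this is not the built-in `cockroach workload` tool that the test profile drives, and the connection parameters are placeholders.

```python
# Not the built-in `cockroach workload` tool used by this test profile: a rough
# psycopg2 sketch of a KV workload with a configurable read percentage, relying
# on CockroachDB speaking the PostgreSQL wire protocol. Connection parameters
# below are placeholders.
import random
import psycopg2

READ_FRACTION = 0.95   # e.g. the "KV, 95% Reads" workload mix

conn = psycopg2.connect(host="localhost", port=26257,
                        user="root", dbname="defaultdb")
conn.autocommit = True
cur = conn.cursor()
cur.execute("CREATE TABLE IF NOT EXISTS kv (k INT PRIMARY KEY, v BYTES)")

for _ in range(10_000):
    key = random.randint(0, 100_000)
    if random.random() < READ_FRACTION:
        cur.execute("SELECT v FROM kv WHERE k = %s", (key,))
        cur.fetchone()
    else:
        cur.execute("UPSERT INTO kv (k, v) VALUES (%s, %s)",
                    (key, psycopg2.Binary(b"x" * 64)))

cur.close()
conn.close()
```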
OpenBenchmarking.org ops/s, More Is Better CockroachDB 22.2 Workload: MoVR - Concurrency: 512 12c 10c 8c 6c 200 400 600 800 1000 SE +/- 3.38, N = 3 SE +/- 3.66, N = 3 SE +/- 9.03, N = 3 SE +/- 4.87, N = 3 948.5 949.6 960.3 954.7
OpenBenchmarking.org ops/s, More Is Better CockroachDB 22.2 Workload: MoVR - Concurrency: 1024 12c 10c 8c 6c 200 400 600 800 1000 SE +/- 1.42, N = 3 SE +/- 0.58, N = 3 SE +/- 3.18, N = 3 SE +/- 1.56, N = 3 953.8 949.5 946.9 952.7
OpenBenchmarking.org ops/s, More Is Better CockroachDB 22.2 Workload: KV, 10% Reads - Concurrency: 512 12c 10c 8c 6c 8K 16K 24K 32K 40K SE +/- 343.66, N = 15 SE +/- 270.36, N = 15 SE +/- 351.71, N = 6 SE +/- 438.30, N = 15 35970.0 35993.1 34832.9 35742.3
OpenBenchmarking.org ops/s, More Is Better CockroachDB 22.2 Workload: KV, 50% Reads - Concurrency: 512 12c 10c 8c 6c 11K 22K 33K 44K 55K SE +/- 464.03, N = 15 SE +/- 514.54, N = 3 SE +/- 454.84, N = 15 SE +/- 32.88, N = 3 47621.9 49102.7 47596.6 47428.0
OpenBenchmarking.org ops/s, More Is Better CockroachDB 22.2 Workload: KV, 60% Reads - Concurrency: 512 12c 10c 8c 6c 11K 22K 33K 44K 55K SE +/- 268.61, N = 3 SE +/- 620.92, N = 15 SE +/- 411.73, N = 13 SE +/- 555.56, N = 15 52330.1 51748.8 52515.2 51275.1
OpenBenchmarking.org ops/s, More Is Better CockroachDB 22.2 Workload: KV, 95% Reads - Concurrency: 512 12c 10c 8c 6c 14K 28K 42K 56K 70K SE +/- 702.29, N = 3 SE +/- 1044.13, N = 15 SE +/- 890.57, N = 3 SE +/- 813.26, N = 15 64467.6 60769.7 64111.9 62666.5
OpenBenchmarking.org ops/s, More Is Better CockroachDB 22.2 Workload: KV, 10% Reads - Concurrency: 1024 12c 10c 8c 6c 8K 16K 24K 32K 40K SE +/- 155.07, N = 3 SE +/- 346.25, N = 3 SE +/- 322.68, N = 3 SE +/- 206.35, N = 3 36846.9 35776.8 36685.7 36329.6
OpenBenchmarking.org ops/s, More Is Better CockroachDB 22.2 Workload: KV, 50% Reads - Concurrency: 1024 12c 10c 8c 6c 10K 20K 30K 40K 50K SE +/- 366.75, N = 15 SE +/- 380.16, N = 3 SE +/- 468.66, N = 15 SE +/- 391.13, N = 9 47465.5 48449.0 47498.1 47593.9
OpenBenchmarking.org ops/s, More Is Better CockroachDB 22.2 Workload: KV, 60% Reads - Concurrency: 1024 12c 10c 8c 6c 11K 22K 33K 44K 55K SE +/- 239.52, N = 3 SE +/- 400.61, N = 10 SE +/- 447.89, N = 3 SE +/- 448.33, N = 3 52573.3 51959.5 52559.0 52626.4
OpenBenchmarking.org ops/s, More Is Better CockroachDB 22.2 Workload: KV, 95% Reads - Concurrency: 1024 12c 10c 8c 6c 14K 28K 42K 56K 70K SE +/- 575.30, N = 3 SE +/- 1142.40, N = 15 SE +/- 1317.65, N = 15 SE +/- 1310.27, N = 15 64661.8 62029.8 58195.5 60137.3
ASTC Encoder ASTC Encoder (astcenc) is for the Adaptive Scalable Texture Compression (ASTC) format commonly used with OpenGL, OpenGL ES, and Vulkan graphics APIs. This test profile does a coding test of both compression/decompression. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org MT/s, More Is Better ASTC Encoder 4.0 Preset: Thorough 12c 10c 8c 6c 20 40 60 80 100 SE +/- 0.05, N = 3 SE +/- 0.05, N = 3 SE +/- 0.04, N = 3 SE +/- 0.10, N = 3 106.57 106.85 107.11 106.51 1. (CXX) g++ options: -O3 -flto -pthread
OpenBenchmarking.org MT/s, More Is Better ASTC Encoder 4.0 Preset: Exhaustive 12c 10c 8c 6c 3 6 9 12 15 SE +/- 0.03, N = 3 SE +/- 0.00, N = 3 SE +/- 0.01, N = 3 SE +/- 0.00, N = 3 11.73 11.76 11.81 11.82 1. (CXX) g++ options: -O3 -flto -pthread
Graph500 This is a benchmark of the reference implementation of Graph500, an HPC benchmark focused on data intensive loads and commonly tested on supercomputers for complex data problems. Graph500 primarily stresses the communication subsystem of the hardware under test. Learn more via the OpenBenchmarking.org test page.
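To make the TEPS metric concrete, the sketch below runs a plain breadth-first search over a small random graph and reports traversed edges per second; it is not the Graph500 reference code, and the result below comes from the SSSP kernel on a much larger scale-26 Kronecker graph.

```python
# Not the Graph500 reference code: a small sketch showing how a TEPS figure
# (traversed edges per second) is derived from a breadth-first search, the
# metric family Graph500 reports.
import time
import random
from collections import deque, defaultdict

# Build a small random undirected graph (the real benchmark uses a scale-26
# Kronecker graph, i.e. 2^26 vertices).
num_vertices, num_edges = 100_000, 1_600_000
adj = defaultdict(list)
for _ in range(num_edges):
    u, v = random.randrange(num_vertices), random.randrange(num_vertices)
    adj[u].append(v)
    adj[v].append(u)

start = time.perf_counter()
visited = {0}
frontier = deque([0])
traversed = 0
while frontier:
    u = frontier.popleft()
    for v in adj[u]:
        traversed += 1                 # every scanned edge counts toward TEPS
        if v not in visited:
            visited.add(v)
            frontier.append(v)
elapsed = time.perf_counter() - start

print(f"{traversed / elapsed / 1e6:.1f} MTEPS")
```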
OpenBenchmarking.org sssp median_TEPS, More Is Better Graph500 3.0 Scale: 26 12c 10c 8c 6c 120M 240M 360M 480M 600M 565152000 574018000 531854000 392496000 1. (CC) gcc options: -fcommon -O3 -lpthread -lm -lmpi
GROMACS The GROMACS (GROningen MAchine for Chemical Simulations) molecular dynamics package testing with the water_GMX50 data. This test profile allows selecting between CPU and GPU-based GROMACS builds. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Ns Per Day, More Is Better GROMACS 2022.1 Implementation: MPI CPU - Input: water_GMX50_bare 12c 10c 8c 6c 5 10 15 20 25 SE +/- 0.03, N = 3 SE +/- 0.01, N = 3 SE +/- 0.01, N = 3 SE +/- 0.03, N = 3 18.71 18.68 18.68 17.94 1. (CXX) g++ options: -O3
TensorFlow This is a benchmark of the TensorFlow deep learning framework using the TensorFlow reference benchmarks (tensorflow/benchmarks with tf_cnn_benchmarks.py). Note that with the Phoronix Test Suite there is also pts/tensorflow-lite for benchmarking the TensorFlow Lite binaries. Learn more via the OpenBenchmarking.org test page.
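As a hedged stand-in for the tf_cnn_benchmarks script (not the harness this profile actually runs), the sketch below measures ResNet-50 CPU throughput in images/sec with tf.keras at a fixed batch size.

```python
# Not the tf_cnn_benchmarks script this test profile uses: a minimal tf.keras
# sketch of the same idea -- measuring ResNet-50 images/sec on CPU at a fixed
# batch size.
import time
import numpy as np
import tensorflow as tf

batch_size = 256
model = tf.keras.applications.ResNet50(weights=None)      # random weights suffice for throughput
images = np.random.rand(batch_size, 224, 224, 3).astype(np.float32)

model.predict(images, batch_size=batch_size, verbose=0)   # warm-up run

runs = 5
start = time.perf_counter()
for _ in range(runs):
    model.predict(images, batch_size=batch_size, verbose=0)
elapsed = time.perf_counter() - start

print(f"{batch_size * runs / elapsed:.1f} images/sec")
```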
OpenBenchmarking.org images/sec, More Is Better TensorFlow 2.10 Device: CPU - Batch Size: 256 - Model: ResNet-50 12c 10c 8c 6c 20 40 60 80 100 SE +/- 0.48, N = 3 SE +/- 0.36, N = 3 SE +/- 0.48, N = 3 SE +/- 0.26, N = 3 109.13 105.91 105.01 95.67
Neural Magic DeepSparse
OpenBenchmarking.org items/sec, More Is Better Neural Magic DeepSparse 1.1 Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Asynchronous Multi-Stream 12c 10c 8c 6c 20 40 60 80 100 SE +/- 0.18, N = 3 SE +/- 0.04, N = 3 SE +/- 0.04, N = 3 SE +/- 0.31, N = 3 84.35 84.48 84.21 82.49
OpenBenchmarking.org ms/batch, Fewer Is Better Neural Magic DeepSparse 1.1 Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Asynchronous Multi-Stream 12c 10c 8c 6c 200 400 600 800 1000 SE +/- 0.82, N = 3 SE +/- 0.20, N = 3 SE +/- 0.88, N = 3 SE +/- 0.67, N = 3 1133.28 1133.18 1136.85 1148.50
OpenBenchmarking.org items/sec, More Is Better Neural Magic DeepSparse 1.1 Model: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Scenario: Asynchronous Multi-Stream 12c 10c 8c 6c 160 320 480 640 800 SE +/- 0.72, N = 3 SE +/- 2.41, N = 3 SE +/- 2.11, N = 3 SE +/- 6.13, N = 15 761.49 742.80 705.71 575.75
OpenBenchmarking.org ms/batch, Fewer Is Better Neural Magic DeepSparse 1.1 Model: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Scenario: Asynchronous Multi-Stream 12c 10c 8c 6c 40 80 120 160 200 SE +/- 0.11, N = 3 SE +/- 0.43, N = 3 SE +/- 0.38, N = 3 SE +/- 1.66, N = 15 125.72 128.92 135.62 166.43
OpenBenchmarking.org items/sec, More Is Better Neural Magic DeepSparse 1.1 Model: CV Detection,YOLOv5s COCO - Scenario: Asynchronous Multi-Stream 12c 10c 8c 6c 200 400 600 800 1000 SE +/- 0.57, N = 3 SE +/- 0.53, N = 3 SE +/- 1.22, N = 3 SE +/- 6.69, N = 15 856.02 844.43 773.07 635.02
OpenBenchmarking.org ms/batch, Fewer Is Better Neural Magic DeepSparse 1.1 Model: CV Detection,YOLOv5s COCO - Scenario: Asynchronous Multi-Stream 12c 10c 8c 6c 30 60 90 120 150 SE +/- 0.07, N = 3 SE +/- 0.08, N = 3 SE +/- 0.19, N = 3 SE +/- 1.56, N = 15 111.89 113.41 123.86 150.92
OpenBenchmarking.org items/sec, More Is Better Neural Magic DeepSparse 1.1 Model: CV Classification, ResNet-50 ImageNet - Scenario: Asynchronous Multi-Stream 12c 10c 8c 6c 400 800 1200 1600 2000 SE +/- 4.95, N = 3 SE +/- 1.61, N = 3 SE +/- 1.56, N = 3 SE +/- 8.40, N = 3 1964.27 1965.56 1954.12 1930.33
OpenBenchmarking.org ms/batch, Fewer Is Better Neural Magic DeepSparse 1.1 Model: CV Classification, ResNet-50 ImageNet - Scenario: Asynchronous Multi-Stream 12c 10c 8c 6c 11 22 33 44 55 SE +/- 0.12, N = 3 SE +/- 0.04, N = 3 SE +/- 0.04, N = 3 SE +/- 0.21, N = 3 48.77 48.74 49.00 49.63
OpenBenchmarking.org items/sec, More Is Better Neural Magic DeepSparse 1.1 Model: NLP Text Classification, DistilBERT mnli - Scenario: Asynchronous Multi-Stream 12c 10c 8c 6c 300 600 900 1200 1500 SE +/- 4.04, N = 3 SE +/- 0.69, N = 3 SE +/- 3.22, N = 3 SE +/- 1.21, N = 3 1195.91 1201.14 1201.98 1190.53
OpenBenchmarking.org ms/batch, Fewer Is Better Neural Magic DeepSparse 1.1 Model: NLP Text Classification, DistilBERT mnli - Scenario: Asynchronous Multi-Stream 12c 10c 8c 6c 20 40 60 80 100 SE +/- 0.27, N = 3 SE +/- 0.03, N = 3 SE +/- 0.20, N = 3 SE +/- 0.07, N = 3 80.08 79.71 79.69 80.44
OpenBenchmarking.org items/sec, More Is Better Neural Magic DeepSparse 1.1 Model: NLP Text Classification, BERT base uncased SST2 - Scenario: Asynchronous Multi-Stream 12c 10c 8c 6c 130 260 390 520 650 SE +/- 1.72, N = 3 SE +/- 2.48, N = 3 SE +/- 1.32, N = 3 SE +/- 2.24, N = 3 615.45 611.29 614.61 608.53
OpenBenchmarking.org ms/batch, Fewer Is Better Neural Magic DeepSparse 1.1 Model: NLP Text Classification, BERT base uncased SST2 - Scenario: Asynchronous Multi-Stream 12c 10c 8c 6c 30 60 90 120 150 SE +/- 0.46, N = 3 SE +/- 0.55, N = 3 SE +/- 0.27, N = 3 SE +/- 0.58, N = 3 155.48 156.54 155.82 157.22
OpenBenchmarking.org items/sec, More Is Better Neural Magic DeepSparse 1.1 Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Asynchronous Multi-Stream 12c 10c 8c 6c 20 40 60 80 100 SE +/- 0.21, N = 3 SE +/- 0.03, N = 3 SE +/- 0.16, N = 3 SE +/- 0.25, N = 3 84.25 84.27 84.15 82.26
OpenBenchmarking.org ms/batch, Fewer Is Better Neural Magic DeepSparse 1.1 Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Asynchronous Multi-Stream 12c 10c 8c 6c 200 400 600 800 1000 SE +/- 1.25, N = 3 SE +/- 1.00, N = 3 SE +/- 1.67, N = 3 SE +/- 1.05, N = 3 1133.48 1135.18 1137.51 1148.33
WRF WRF, the Weather Research and Forecasting Model, is a "next-generation mesoscale numerical weather prediction system designed for both atmospheric research and operational forecasting applications. It features two dynamical cores, a data assimilation system, and a software architecture supporting parallel computation and system extensibility." Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Seconds, Fewer Is Better WRF 4.2.2 Input: conus 2.5km 12c 10c 8c 6c 1600 3200 4800 6400 8000 4070.19 4563.18 6551.88 7432.66 1. (F9X) gfortran options: -O2 -ftree-vectorize -funroll-loops -ffree-form -fconvert=big-endian -frecord-marker=4 -fallow-invalid-boz -lesmf_time -lwrfio_nf -lnetcdff -lnetcdf -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
GPAW GPAW is a density-functional theory (DFT) Python code based on the projector-augmented wave (PAW) method and the atomic simulation environment (ASE). Learn more via the OpenBenchmarking.org test page.
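Since GPAW is itself a Python package driven through ASE, a minimal usage sketch is shown below; it runs a small water-molecule DFT calculation rather than the carbon nanotube input used by the benchmark, and the output filename is arbitrary.

```python
# Not the pts GPAW test itself: a minimal ASE + GPAW sketch of a DFT calculation,
# here a water molecule rather than the carbon nanotube input the benchmark uses.
from ase.build import molecule
from gpaw import GPAW

atoms = molecule("H2O")
atoms.center(vacuum=4.0)                      # put the molecule in a vacuum box

# Finite-difference mode with the PBE exchange-correlation functional;
# the log filename is arbitrary.
atoms.calc = GPAW(mode="fd", xc="PBE", txt="h2o_gpaw.txt")
energy = atoms.get_potential_energy()         # runs the self-consistent DFT cycle
print(f"total energy: {energy:.4f} eV")
```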
OpenBenchmarking.org Seconds, Fewer Is Better GPAW 22.1 Input: Carbon Nanotube 12c 10c 8c 6c 6 12 18 24 30 SE +/- 0.23, N = 5 SE +/- 0.13, N = 3 SE +/- 0.18, N = 3 SE +/- 0.20, N = 3 23.15 23.37 24.60 26.31 1. (CC) gcc options: -shared -fwrapv -O2 -lxc -lblas -lmpi
Blender Blender is an open-source 3D creation and modeling software project. This test is of Blender's Cycles performance with various sample files. GPU computing via NVIDIA OptiX and NVIDIA CUDA is currently supported as well as HIP for AMD Radeon GPUs and Intel oneAPI for Intel Graphics. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Seconds, Fewer Is Better Blender 3.4 Blend File: BMW27 - Compute: CPU-Only 12c 10c 8c 6c 2 4 6 8 10 SE +/- 0.06, N = 3 SE +/- 0.04, N = 3 SE +/- 0.02, N = 3 SE +/- 0.06, N = 3 8.58 8.42 8.34 8.33
OpenBenchmarking.org Seconds, Fewer Is Better Blender 3.4 Blend File: Classroom - Compute: CPU-Only 12c 10c 8c 6c 5 10 15 20 25 SE +/- 0.00, N = 3 SE +/- 0.09, N = 3 SE +/- 0.06, N = 3 SE +/- 0.04, N = 3 20.92 20.76 20.68 20.71
OpenBenchmarking.org Seconds, Fewer Is Better Blender 3.4 Blend File: Barbershop - Compute: CPU-Only 12c 10c 8c 6c 20 40 60 80 100 SE +/- 0.21, N = 3 SE +/- 0.15, N = 3 SE +/- 0.24, N = 3 SE +/- 0.31, N = 3 81.03 80.37 80.18 79.93
nginx This is a benchmark of the lightweight Nginx HTTP(S) web-server. This Nginx web server benchmark test profile makes use of the wrk program for facilitating the HTTP requests over a fixed period of time with a configurable number of concurrent clients/connections. HTTPS with a self-signed OpenSSL certificate is used by this test for local benchmarking. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Requests Per Second, More Is Better nginx 1.23.2 Connections: 500 12c 10c 8c 6c 40K 80K 120K 160K 200K SE +/- 291.63, N = 3 SE +/- 335.64, N = 3 SE +/- 453.48, N = 3 SE +/- 113.87, N = 3 201032.06 198858.66 197081.98 196805.30 1. (CC) gcc options: -lluajit-5.1 -lm -lssl -lcrypto -lpthread -ldl -std=c99 -O2
ONNX Runtime ONNX Runtime is developed by Microsoft and partners as an open-source, cross-platform, high-performance machine learning inferencing and training accelerator. This test profile runs the ONNX Runtime with various models available from the ONNX Zoo. Learn more via the OpenBenchmarking.org test page.
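As a hedged sketch (not the exact harness this profile uses), the example below runs CPU inference with onnxruntime and converts the timing into inferences per minute; the model filename and input shape are assumptions based on the fcn-resnet101-11 model from the ONNX Model Zoo.

```python
# Not the exact harness used by this test profile: a minimal onnxruntime sketch
# showing CPU inference with a model from the ONNX Model Zoo. The model path and
# input shape are placeholders/assumptions.
import time
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("fcn-resnet101-11.onnx",
                               providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name
image = np.random.rand(1, 3, 520, 520).astype(np.float32)  # NCHW float32 image

session.run(None, {input_name: image})        # warm-up
runs = 30
start = time.perf_counter()
for _ in range(runs):
    session.run(None, {input_name: image})
elapsed = time.perf_counter() - start
print(f"{runs / elapsed * 60:.1f} inferences per minute")
```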
OpenBenchmarking.org Inferences Per Minute, More Is Better ONNX Runtime 1.11 Model: fcn-resnet101-11 - Device: CPU - Executor: Standard 12c 10c 8c 6c 60 120 180 240 300 SE +/- 2.33, N = 7 SE +/- 3.09, N = 3 SE +/- 2.84, N = 5 SE +/- 2.17, N = 12 254 255 257 253 1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt
OpenVINO This is a test of Intel OpenVINO, a toolkit around neural networks, using its built-in benchmarking support and analyzing the throughput and latency of various models. Learn more via the OpenBenchmarking.org test page.
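As a hedged sketch of the OpenVINO Runtime 2022.x Python API (the test profile itself drives OpenVINO's built-in benchmarking support), the example below compiles an IR model for CPU and runs a single synchronous inference; the model filename is a placeholder and a static input shape is assumed.

```python
# Not OpenVINO's built-in benchmarking support used by this test profile: a
# minimal OpenVINO Runtime (2022.x API) sketch of compiling an IR model for CPU
# and running one inference. The model filename is a placeholder.
import numpy as np
from openvino.runtime import Core

core = Core()
model = core.read_model("model.xml")                  # placeholder IR model
compiled = core.compile_model(model, device_name="CPU")

shape = list(compiled.input(0).shape)                 # assumes a static input shape
dummy = np.random.rand(*shape).astype(np.float32)

request = compiled.create_infer_request()
request.infer({0: dummy})                             # synchronous inference
output = request.get_output_tensor(0).data
print("output shape:", output.shape)
```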
OpenBenchmarking.org FPS, More Is Better OpenVINO 2022.3 Model: Face Detection FP16 - Device: CPU 12c 10c 8c 6c 20 40 60 80 100 SE +/- 0.08, N = 3 SE +/- 0.03, N = 3 SE +/- 0.04, N = 3 SE +/- 0.06, N = 3 101.74 102.01 101.26 101.08 1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2022.3 Model: Face Detection FP16 - Device: CPU 12c 10c 8c 6c 100 200 300 400 500 SE +/- 0.21, N = 3 SE +/- 0.10, N = 3 SE +/- 0.27, N = 3 SE +/- 0.14, N = 3 470.98 469.43 472.84 473.69 MIN: 451.07 / MAX: 556.04 MIN: 432.92 / MAX: 555.25 MIN: 394.37 / MAX: 553.15 MIN: 423.34 / MAX: 579.41 1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
OpenBenchmarking.org FPS, More Is Better OpenVINO 2022.3 Model: Person Detection FP16 - Device: CPU 12c 10c 8c 6c 10 20 30 40 50 SE +/- 0.13, N = 3 SE +/- 0.12, N = 3 SE +/- 0.15, N = 3 SE +/- 0.17, N = 3 42.98 42.94 42.59 41.33 1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2022.3 Model: Person Detection FP16 - Device: CPU 12c 10c 8c 6c 200 400 600 800 1000 SE +/- 3.30, N = 3 SE +/- 2.71, N = 3 SE +/- 3.60, N = 3 SE +/- 4.42, N = 3 1109.45 1110.44 1119.79 1153.70 MIN: 810.74 / MAX: 1835.01 MIN: 769.04 / MAX: 1860.23 MIN: 808.33 / MAX: 1875.91 MIN: 853.88 / MAX: 1939.06 1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
OpenBenchmarking.org FPS, More Is Better OpenVINO 2022.3 Model: Person Detection FP32 - Device: CPU 12c 10c 8c 6c 10 20 30 40 50 SE +/- 0.32, N = 3 SE +/- 0.20, N = 3 SE +/- 0.01, N = 3 SE +/- 0.07, N = 3 42.95 43.18 42.22 41.44 1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2022.3 Model: Person Detection FP32 - Device: CPU 12c 10c 8c 6c 200 400 600 800 1000 SE +/- 8.73, N = 3 SE +/- 5.36, N = 3 SE +/- 0.54, N = 3 SE +/- 1.87, N = 3 1110.68 1104.59 1129.01 1150.54 MIN: 833.53 / MAX: 1865.19 MIN: 807.38 / MAX: 1818.79 MIN: 850.94 / MAX: 1870.94 MIN: 870.26 / MAX: 1902.46 1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
OpenBenchmarking.org FPS, More Is Better OpenVINO 2022.3 Model: Vehicle Detection FP16 - Device: CPU 12c 10c 8c 6c 1600 3200 4800 6400 8000 SE +/- 2.30, N = 3 SE +/- 13.32, N = 3 SE +/- 6.27, N = 3 SE +/- 4.59, N = 3 7394.65 7425.10 7389.00 7306.47 1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2022.3 Model: Vehicle Detection FP16 - Device: CPU 12c 10c 8c 6c 2 4 6 8 10 SE +/- 0.00, N = 3 SE +/- 0.01, N = 3 SE +/- 0.01, N = 3 SE +/- 0.00, N = 3 6.48 6.45 6.49 6.56 MIN: 5.06 / MAX: 59.88 MIN: 4.97 / MAX: 59.86 MIN: 4.93 / MAX: 59.51 MIN: 4.99 / MAX: 59.46 1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
OpenBenchmarking.org FPS, More Is Better OpenVINO 2022.3 Model: Face Detection FP16-INT8 - Device: CPU 12c 10c 8c 6c 40 80 120 160 200 SE +/- 0.21, N = 3 SE +/- 0.03, N = 3 SE +/- 0.48, N = 3 SE +/- 0.09, N = 3 191.43 192.30 192.25 191.29 1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2022.3 Model: Face Detection FP16-INT8 - Device: CPU 12c 10c 8c 6c 50 100 150 200 250 SE +/- 0.32, N = 3 SE +/- 0.03, N = 3 SE +/- 0.69, N = 3 SE +/- 0.13, N = 3 250.34 249.12 249.26 250.49 MIN: 222.95 / MAX: 301.42 MIN: 209.28 / MAX: 311.3 MIN: 207.76 / MAX: 340.53 MIN: 213.3 / MAX: 307.84 1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
OpenBenchmarking.org FPS, More Is Better OpenVINO 2022.3 Model: Vehicle Detection FP16-INT8 - Device: CPU 12c 10c 8c 6c 2K 4K 6K 8K 10K SE +/- 1.42, N = 3 SE +/- 3.30, N = 3 SE +/- 1.79, N = 3 SE +/- 1.79, N = 3 11018.37 11066.16 11108.16 11150.32 1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2022.3 Model: Vehicle Detection FP16-INT8 - Device: CPU 12c 10c 8c 6c 0.9788 1.9576 2.9364 3.9152 4.894 SE +/- 0.00, N = 3 SE +/- 0.00, N = 3 SE +/- 0.00, N = 3 SE +/- 0.00, N = 3 4.35 4.33 4.31 4.30 MIN: 3.52 / MAX: 41.44 MIN: 3.51 / MAX: 41.25 MIN: 3.51 / MAX: 43.89 MIN: 3.52 / MAX: 43.57 1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
OpenBenchmarking.org FPS, More Is Better OpenVINO 2022.3 Model: Weld Porosity Detection FP16 - Device: CPU 12c 10c 8c 6c 2K 4K 6K 8K 10K SE +/- 2.57, N = 3 SE +/- 2.08, N = 3 SE +/- 7.50, N = 3 SE +/- 3.42, N = 3 9867.41 9900.47 9931.49 9959.38 1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2022.3 Model: Weld Porosity Detection FP16 - Device: CPU 12c 10c 8c 6c 1.0913 2.1826 3.2739 4.3652 5.4565 SE +/- 0.00, N = 3 SE +/- 0.00, N = 3 SE +/- 0.00, N = 3 SE +/- 0.00, N = 3 4.85 4.83 4.82 4.81 MIN: 4.06 / MAX: 28.62 MIN: 4.08 / MAX: 28.68 MIN: 3.98 / MAX: 28.83 MIN: 4.14 / MAX: 27.29 1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
OpenBenchmarking.org FPS, More Is Better OpenVINO 2022.3 Model: Machine Translation EN To DE FP16 - Device: CPU 12c 10c 8c 6c 200 400 600 800 1000 SE +/- 2.32, N = 3 SE +/- 1.48, N = 3 SE +/- 8.79, N = 6 SE +/- 5.14, N = 3 959.16 934.71 875.39 817.27 1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2022.3 Model: Machine Translation EN To DE FP16 - Device: CPU 12c 10c 8c 6c 13 26 39 52 65 SE +/- 0.12, N = 3 SE +/- 0.08, N = 3 SE +/- 0.57, N = 6 SE +/- 0.37, N = 3 49.98 51.29 54.80 58.67 MIN: 38.24 / MAX: 187.97 MIN: 40.28 / MAX: 292.83 MIN: 40.7 / MAX: 276.86 MIN: 43.56 / MAX: 315.05 1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
OpenBenchmarking.org FPS, More Is Better OpenVINO 2022.3 Model: Weld Porosity Detection FP16-INT8 - Device: CPU 12c 10c 8c 6c 4K 8K 12K 16K 20K SE +/- 12.43, N = 3 SE +/- 30.88, N = 3 SE +/- 31.30, N = 3 SE +/- 33.95, N = 3 19171.51 19254.08 19278.93 19314.04 1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2022.3 Model: Weld Porosity Detection FP16-INT8 - Device: CPU 12c 10c 8c 6c 3 6 9 12 15 SE +/- 0.00, N = 3 SE +/- 0.02, N = 3 SE +/- 0.02, N = 3 SE +/- 0.02, N = 3 9.95 9.91 9.90 9.89 MIN: 8.42 / MAX: 52.38 MIN: 8.4 / MAX: 50.42 MIN: 8.39 / MAX: 56.99 MIN: 8.35 / MAX: 32.16 1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
OpenBenchmarking.org FPS, More Is Better OpenVINO 2022.3 Model: Person Vehicle Bike Detection FP16 - Device: CPU 12c 10c 8c 6c 2K 4K 6K 8K 10K SE +/- 9.96, N = 3 SE +/- 5.19, N = 3 SE +/- 2.85, N = 3 SE +/- 7.67, N = 3 9038.47 9063.84 9113.11 9081.73 1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2022.3 Model: Person Vehicle Bike Detection FP16 - Device: CPU 12c 10c 8c 6c 1.1925 2.385 3.5775 4.77 5.9625 SE +/- 0.01, N = 3 SE +/- 0.00, N = 3 SE +/- 0.00, N = 3 SE +/- 0.00, N = 3 5.30 5.28 5.26 5.28 MIN: 4.42 / MAX: 40.66 MIN: 4.37 / MAX: 41.23 MIN: 4.42 / MAX: 42.93 MIN: 4.34 / MAX: 38.93 1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
OpenBenchmarking.org FPS, More Is Better OpenVINO 2022.3 Model: Age Gender Recognition Retail 0013 FP16 - Device: CPU 12c 10c 8c 6c 30K 60K 90K 120K 150K SE +/- 745.28, N = 3 SE +/- 1134.97, N = 10 SE +/- 994.61, N = 3 SE +/- 365.43, N = 3 147769.26 147717.32 152292.39 151213.17 1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2022.3 Model: Age Gender Recognition Retail 0013 FP16 - Device: CPU 12c 10c 8c 6c 0.1238 0.2476 0.3714 0.4952 0.619 SE +/- 0.00, N = 3 SE +/- 0.00, N = 10 SE +/- 0.00, N = 3 SE +/- 0.00, N = 3 0.55 0.55 0.55 0.54 MIN: 0.5 / MAX: 34.71 MIN: 0.5 / MAX: 41.23 MIN: 0.5 / MAX: 30.68 MIN: 0.5 / MAX: 34.19 1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
OpenBenchmarking.org FPS, More Is Better OpenVINO 2022.3 Model: Age Gender Recognition Retail 0013 FP16-INT8 - Device: CPU 12c 10c 8c 6c 30K 60K 90K 120K 150K SE +/- 1214.59, N = 3 SE +/- 815.42, N = 3 SE +/- 1158.58, N = 3 SE +/- 681.80, N = 3 119606.21 122938.23 123571.68 121027.25 1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2022.3 Model: Age Gender Recognition Retail 0013 FP16-INT8 - Device: CPU 12c 10c 8c 6c 0.2205 0.441 0.6615 0.882 1.1025 SE +/- 0.00, N = 3 SE +/- 0.00, N = 3 SE +/- 0.00, N = 3 SE +/- 0.00, N = 3 0.97 0.98 0.98 0.97 MIN: 0.85 / MAX: 22.9 MIN: 0.85 / MAX: 39.82 MIN: 0.86 / MAX: 39.58 MIN: 0.86 / MAX: 33.82 1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
Testing dates per configuration (system details as listed above; the configurations differ only in installed memory):
12c: Testing initiated at 21 December 2022 05:44 by user phoronix.
10c: Testing initiated at 21 December 2022 20:48 by user phoronix.
8c: Testing initiated at 22 December 2022 12:15 by user phoronix.
6c: Testing initiated at 23 December 2022 05:27 by user phoronix.