Benchmarks by Michael Larabel for a future article.
12c Processor: 2 x AMD EPYC 9654 96-Core @ 2.40GHz (192 Cores / 384 Threads), Motherboard: AMD Titanite_4G (RTI1002E BIOS), Chipset: AMD Device 14a4, Memory: 1520GB, Disk: 800GB INTEL SSDPF21Q800GB, Graphics: ASPEED, Monitor: VGA HDMI, Network: Broadcom NetXtreme BCM5720 PCIe
OS: Ubuntu 22.10, Kernel: 6.1.0-phx (x86_64), Desktop: GNOME Shell 43.0, Display Server: X Server 1.21.1.4, Vulkan: 1.3.224, Compiler: GCC 12.2.0 + Clang 15.0.2-1, File-System: ext4, Screen Resolution: 1920x1080
Kernel Notes: Transparent Huge Pages: madvise
Compiler Notes: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-12-U8K4Qv/gcc-12-12.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-12-U8K4Qv/gcc-12-12.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Notes: Scaling Governor: acpi-cpufreq performance (Boost: Enabled) - CPU Microcode: 0xa10110d
Java Notes: OpenJDK Runtime Environment (build 11.0.17+8-post-Ubuntu-1ubuntu2)
Python Notes: Python 3.10.7
Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: always-on RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
10c Changed Memory to 1264GB.
8c Changed Memory to 1008GB.
6c Changed Memory to 768GB.
High Performance Conjugate Gradient HPCG is the High Performance Conjugate Gradient, a scientific benchmark from Sandia National Laboratories focused on supercomputer testing with modern real-world workloads, as compared to HPCC. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org GFLOP/s, More Is Better High Performance Conjugate Gradient 3.1 6c 8c 10c 12c 20 40 60 80 100 SE +/- 0.99, N = 9 SE +/- 0.49, N = 9 SE +/- 3.31, N = 9 SE +/- 1.12, N = 12 36.54 45.00 48.29 86.81 1. (CXX) g++ options: -O3 -ffast-math -ftree-vectorize -lmpi_cxx -lmpi
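HPCG's runtime is dominated by sparse matrix-vector products and vector updates that stream the working set from memory on every iteration, which is roughly why the results above track the memory configuration far more closely than the (identical) core count. As an illustration only, a minimal unpreconditioned conjugate gradient loop over a CSR matrix is sketched below; HPCG itself additionally uses a multigrid preconditioner and MPI halo exchanges, so this is not the benchmark's own code.

#include <cmath>
#include <cstdio>
#include <vector>

// Compressed sparse row storage.
struct Csr {
  std::vector<int> rowptr, col;
  std::vector<double> val;
  int n = 0;
};

// y = A * x; each call streams the entire matrix from memory once,
// which is why this kernel tends to be bandwidth-bound rather than FLOP-bound.
void spmv(const Csr& A, const std::vector<double>& x, std::vector<double>& y) {
  for (int i = 0; i < A.n; ++i) {
    double s = 0.0;
    for (int k = A.rowptr[i]; k < A.rowptr[i + 1]; ++k)
      s += A.val[k] * x[A.col[k]];
    y[i] = s;
  }
}

double dot(const std::vector<double>& a, const std::vector<double>& b) {
  double s = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
  return s;
}

// Unpreconditioned CG for a symmetric positive definite system A x = b,
// starting from x = 0 so the initial residual is simply b.
void cg(const Csr& A, const std::vector<double>& b, std::vector<double>& x,
        int max_iters, double tol) {
  std::vector<double> r = b, p = b, Ap(A.n);
  double rr = dot(r, r);
  for (int it = 0; it < max_iters && std::sqrt(rr) > tol; ++it) {
    spmv(A, p, Ap);
    const double alpha = rr / dot(p, Ap);
    for (int i = 0; i < A.n; ++i) { x[i] += alpha * p[i]; r[i] -= alpha * Ap[i]; }
    const double rr_new = dot(r, r);
    const double beta = rr_new / rr;
    for (int i = 0; i < A.n; ++i) p[i] = r[i] + beta * p[i];
    rr = rr_new;
  }
}

int main() {
  // Tiny 1-D Poisson (tridiagonal) system just to exercise the loop.
  const int n = 5;
  Csr A;
  A.n = n;
  for (int i = 0; i < n; ++i) {
    A.rowptr.push_back(static_cast<int>(A.col.size()));
    if (i > 0)     { A.col.push_back(i - 1); A.val.push_back(-1.0); }
    A.col.push_back(i); A.val.push_back(2.0);
    if (i < n - 1) { A.col.push_back(i + 1); A.val.push_back(-1.0); }
  }
  A.rowptr.push_back(static_cast<int>(A.col.size()));

  std::vector<double> b(n, 1.0), x(n, 0.0);
  cg(A, b, x, 100, 1e-10);
  for (double xi : x) std::printf("%.4f ", xi);
  std::printf("\n");
  return 0;
}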
NAS Parallel Benchmarks NPB, the NAS Parallel Benchmarks, is a benchmark suite developed by NASA for high-end computer systems. This test profile currently uses the MPI version of NPB and allows selecting the different NPB tests/problems and varying problem sizes. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Total Mop/s, More Is Better NAS Parallel Benchmarks 3.4 Test / Class: CG.C 6c 8c 10c 12c 20K 40K 60K 80K 100K SE +/- 554.69, N = 3 SE +/- 907.72, N = 15 SE +/- 899.80, N = 15 SE +/- 812.04, N = 15 71662.28 79784.15 81179.00 80225.01 1. (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz 2. Open MPI 4.1.4
OpenBenchmarking.org Total Mop/s, More Is Better NAS Parallel Benchmarks 3.4 Test / Class: IS.D 6c 8c 10c 12c 2K 4K 6K 8K 10K SE +/- 158.57, N = 12 SE +/- 134.50, N = 15 SE +/- 206.91, N = 12 SE +/- 84.88, N = 3 5690.01 6675.71 7124.92 8491.01 1. (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz 2. Open MPI 4.1.4
OpenBenchmarking.org Total Mop/s, More Is Better NAS Parallel Benchmarks 3.4 Test / Class: LU.C 6c 8c 10c 12c 100K 200K 300K 400K 500K SE +/- 4680.97, N = 5 SE +/- 5095.33, N = 5 SE +/- 2546.14, N = 3 SE +/- 5489.08, N = 4 454360.62 466769.54 489995.20 489164.65 1. (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz 2. Open MPI 4.1.4
OpenBenchmarking.org Total Mop/s, More Is Better NAS Parallel Benchmarks 3.4 Test / Class: MG.C 6c 8c 10c 12c 40K 80K 120K 160K 200K SE +/- 1626.80, N = 15 SE +/- 2089.98, N = 15 SE +/- 2631.10, N = 15 SE +/- 2393.90, N = 3 117733.57 153458.78 177097.42 209846.76 1. (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz 2. Open MPI 4.1.4
OpenBenchmarking.org Total Mop/s, More Is Better NAS Parallel Benchmarks 3.4 Test / Class: SP.C 6c 8c 10c 12c 60K 120K 180K 240K 300K SE +/- 1838.44, N = 3 SE +/- 1630.30, N = 3 SE +/- 726.36, N = 3 SE +/- 1589.72, N = 3 167474.70 208535.23 239496.01 260471.50 1. (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz 2. Open MPI 4.1.4
miniBUDE MiniBUDE is a mini application for the core computation of the Bristol University Docking Engine (BUDE). This test profile currently makes use of the OpenMP implementation of miniBUDE for CPU benchmarking. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org GFInst/s, More Is Better miniBUDE 20210901 Implementation: OpenMP - Input Deck: BM2 6c 8c 10c 12c 2K 4K 6K 8K 10K SE +/- 96.81, N = 3 SE +/- 63.13, N = 3 SE +/- 31.49, N = 3 SE +/- 27.15, N = 3 8651.92 8615.97 8666.98 8640.31 1. (CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm
OpenBenchmarking.org Billion Interactions/s, More Is Better miniBUDE 20210901 Implementation: OpenMP - Input Deck: BM2 6c 8c 10c 12c 80 160 240 320 400 SE +/- 3.87, N = 3 SE +/- 2.53, N = 3 SE +/- 1.26, N = 3 SE +/- 1.09, N = 3 346.08 344.64 346.68 345.61 1. (CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm
Rodinia Rodinia is a suite focused upon accelerating compute-intensive applications with accelerators. CUDA, OpenMP, and OpenCL parallel models are supported by the included applications. This profile utilizes select OpenCL, NVIDIA CUDA and OpenMP test binaries at the moment. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Seconds, Fewer Is Better Rodinia 3.1 Test: OpenMP CFD Solver 6c 8c 10c 12c 2 4 6 8 10 SE +/- 0.024, N = 3 SE +/- 0.016, N = 3 SE +/- 0.014, N = 3 SE +/- 0.031, N = 3 6.152 5.970 6.074 6.050 1. (CXX) g++ options: -O2 -lOpenCL
OpenBenchmarking.org Seconds, Fewer Is Better Rodinia 3.1 Test: OpenMP Streamcluster 6c 8c 10c 12c 2 4 6 8 10 SE +/- 0.050, N = 3 SE +/- 0.078, N = 15 SE +/- 0.079, N = 15 SE +/- 0.089, N = 15 6.409 6.018 6.285 6.001 1. (CXX) g++ options: -O2 -lOpenCL
NAMD NAMD is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. NAMD was developed by the Theoretical and Computational Biophysics Group in the Beckman Institute for Advanced Science and Technology at the University of Illinois at Urbana-Champaign. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org days/ns, Fewer Is Better NAMD 2.14 ATPase Simulation - 327,506 Atoms 6c 8c 10c 12c 0.0288 0.0576 0.0864 0.1152 0.144 SE +/- 0.00009, N = 3 SE +/- 0.00046, N = 3 SE +/- 0.00007, N = 3 SE +/- 0.00009, N = 3 0.12820 0.12768 0.12759 0.12783
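For reference, the days/ns figures above invert to the more familiar ns/day. Taking the 12c result as a worked example:

ns/day = 1 / (days/ns)  =>  1 / 0.12783 days/ns ≈ 7.82 ns/day

All four memory configurations land within roughly half a percent of one another for this workload.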
nekRS nekRS is an open-source Navier-Stokes solver based on the spectral element method. NekRS supports both CPU and GPU/accelerator execution, though this test profile is currently configured for CPU execution. NekRS is developed as part of the Nek5000 effort at the Mathematics and Computer Science (MCS) Division of Argonne National Laboratory. This nekRS benchmark is primarily relevant to large core count HPC servers and otherwise may be very time consuming. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org FLOP/s, More Is Better nekRS 22.0 Input: TurboPipe Periodic 6c 8c 10c 12c 200000M 400000M 600000M 800000M 1000000M SE +/- 1934071468.29, N = 3 SE +/- 5892587066.25, N = 3 SE +/- 7825985326.68, N = 3 SE +/- 9551971733.63, N = 3 659554333333 740247000000 786258000000 821462000000 1. (CXX) g++ options: -fopenmp -O2 -march=native -mtune=native -ftree-vectorize -lmpi_cxx -lmpi
NWChem NWChem is an open-source high performance computational chemistry package. Per NWChem's documentation, "NWChem aims to provide its users with computational chemistry tools that are scalable both in their ability to treat large scientific computational chemistry problems efficiently, and in their use of available parallel computing resources from high-performance parallel supercomputers to conventional workstation clusters." Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Seconds, Fewer Is Better NWChem 7.0.2 Input: C240 Buckyball 6c 8c 10c 12c 300 600 900 1200 1500 1517.9 1519.6 1531.0 1537.1 1. (F9X) gfortran options: -lnwctask -lccsd -lmcscf -lselci -lmp2 -lmoints -lstepper -ldriver -loptim -lnwdft -lgradients -lcphf -lesp -lddscf -ldangchang -lguess -lhessian -lvib -lnwcutil -lrimp2 -lproperty -lsolvation -lnwints -lprepar -lnwmd -lnwpw -lofpw -lpaw -lpspw -lband -lnwpwlib -lcafe -lspace -lanalyze -lqhop -lpfft -ldplot -ldrdy -lvscf -lqmmm -lqmd -letrans -ltce -lbq -lmm -lcons -lperfm -ldntmc -lccca -ldimqm -lga -larmci -lpeigs -l64to32 -lopenblas -lpthread -lrt -llapack -lnwcblas -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz -lcomex -m64 -ffast-math -std=legacy -fdefault-integer-8 -finline-functions -O2
Xcompact3d Incompact3d Xcompact3d/Incompact3d is a Fortran-MPI based, finite-difference, high-performance code for solving the incompressible Navier-Stokes equations along with as many scalar transport equations as needed. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Seconds, Fewer Is Better Xcompact3d Incompact3d 2021-03-11 Input: X3D-benchmarking input.i3d 6c 8c 10c 12c 80 160 240 320 400 SE +/- 4.79, N = 9 SE +/- 2.69, N = 9 SE +/- 0.11, N = 3 SE +/- 0.14, N = 3 348.88 270.09 146.29 125.53 1. (F9X) gfortran options: -cpp -O2 -funroll-loops -floop-optimize -fcray-pointer -fbacktrace -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
OpenFOAM OpenFOAM is the leading free, open-source software for computational fluid dynamics (CFD). This test profile currently uses the drivaerFastback test case for analyzing automotive aerodynamics or alternatively the older motorBike input. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Seconds, Fewer Is Better OpenFOAM 10 Input: drivaerFastback, Medium Mesh Size - Execution Time 6c 8c 10c 12c 50 100 150 200 250 227.90 166.15 117.94 109.54 1. (CXX) g++ options: -std=c++14 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -lfiniteVolume -lmeshTools -lparallel -llagrangian -lregionModels -lgenericPatchFields -lOpenFOAM -ldl -lm
OpenRadioss OpenRadioss is an open-source, AGPL-licensed finite element solver for dynamic event analysis. OpenRadioss is based on Altair Radioss, which was open-sourced in 2022. This open-source finite element solver is benchmarked with various example models available from https://www.openradioss.org/models/. This test is currently using a reference OpenRadioss binary build offered via GitHub. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Seconds, Fewer Is Better OpenRadioss 2022.10.13 Model: Bumper Beam 6c 8c 10c 12c 20 40 60 80 100 SE +/- 0.71, N = 3 SE +/- 0.70, N = 3 SE +/- 0.75, N = 3 SE +/- 0.79, N = 3 79.62 79.20 79.70 79.86
OpenBenchmarking.org Seconds, Fewer Is Better OpenRadioss 2022.10.13 Model: Bird Strike on Windshield 6c 8c 10c 12c 50 100 150 200 250 SE +/- 0.14, N = 3 SE +/- 0.19, N = 3 SE +/- 0.54, N = 3 SE +/- 0.38, N = 3 219.10 219.45 218.22 216.88
OpenBenchmarking.org Seconds, Fewer Is Better OpenRadioss 2022.10.13 Model: INIVOL and Fluid Structure Interaction Drop Container 6c 8c 10c 12c 20 40 60 80 100 SE +/- 0.08, N = 3 SE +/- 0.12, N = 3 SE +/- 0.08, N = 3 SE +/- 0.14, N = 3 80.81 81.09 81.15 81.57
RELION RELION - REgularised LIkelihood OptimisatioN - is a stand-alone computer program for Maximum A Posteriori refinement of (multiple) 3D reconstructions or 2D class averages in cryo-electron microscopy (cryo-EM). It is developed in the research group of Sjors Scheres at the MRC Laboratory of Molecular Biology. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Seconds, Fewer Is Better RELION 3.1.1 Test: Basic - Device: CPU 6c 8c 10c 12c 60 120 180 240 300 SE +/- 2.59, N = 6 SE +/- 2.88, N = 3 SE +/- 1.86, N = 4 SE +/- 1.38, N = 5 258.50 221.34 151.40 128.10 1. (CXX) g++ options: -fopenmp -std=c++0x -O3 -rdynamic -ldl -ltiff -lfftw3f -lfftw3 -lpng -lmpi_cxx -lmpi
simdjson This is a benchmark of SIMDJSON, a high performance JSON parser. SIMDJSON aims to be the fastest JSON parser and is used by projects like Microsoft FishStore, Yandex ClickHouse, Shopify, and others. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org GB/s, More Is Better simdjson 2.0 Throughput Test: Kostya 6c 8c 10c 12c 0.9248 1.8496 2.7744 3.6992 4.624 SE +/- 0.01, N = 3 SE +/- 0.00, N = 3 SE +/- 0.01, N = 3 SE +/- 0.01, N = 3 4.11 4.11 4.11 4.11 1. (CXX) g++ options: -O3
OpenBenchmarking.org GB/s, More Is Better simdjson 2.0 Throughput Test: TopTweet 6c 8c 10c 12c 2 4 6 8 10 SE +/- 0.00, N = 3 SE +/- 0.01, N = 3 SE +/- 0.07, N = 6 SE +/- 0.01, N = 3 6.55 6.57 6.49 6.59 1. (CXX) g++ options: -O3
OpenBenchmarking.org GB/s, More Is Better simdjson 2.0 Throughput Test: LargeRandom 6c 8c 10c 12c 0.2813 0.5626 0.8439 1.1252 1.4065 SE +/- 0.00, N = 3 SE +/- 0.00, N = 3 SE +/- 0.00, N = 3 SE +/- 0.00, N = 3 1.24 1.25 1.25 1.25 1. (CXX) g++ options: -O3
OpenBenchmarking.org GB/s, More Is Better simdjson 2.0 Throughput Test: PartialTweets 6c 8c 10c 12c 1.2803 2.5606 3.8409 5.1212 6.4015 SE +/- 0.01, N = 3 SE +/- 0.01, N = 3 SE +/- 0.02, N = 3 SE +/- 0.01, N = 3 5.69 5.66 5.67 5.65 1. (CXX) g++ options: -O3
OpenBenchmarking.org GB/s, More Is Better simdjson 2.0 Throughput Test: DistinctUserID 6c 8c 10c 12c 2 4 6 8 10 SE +/- 0.02, N = 3 SE +/- 0.01, N = 3 SE +/- 0.02, N = 3 SE +/- 0.02, N = 3 6.83 6.86 6.84 6.86 1. (CXX) g++ options: -O3
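For context on what these GB/s figures represent, the sketch below shows simdjson's On-Demand API in the spirit of the project's quick-start example; the twitter.json input file is just an assumed local sample, not part of this test profile. The benchmark simply measures how many gigabytes per second of such documents the parser can get through.

#include "simdjson.h"
#include <iostream>

int main() {
  using namespace simdjson;
  // Load a (padded) JSON file and parse it lazily with the On-Demand front-end.
  ondemand::parser parser;
  padded_string json = padded_string::load("twitter.json");  // assumed local sample file
  ondemand::document tweets = parser.iterate(json);
  // Pull a single field out of the parsed document.
  std::cout << uint64_t(tweets["search_metadata"]["count"]) << " results." << std::endl;
  return 0;
}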
Xmrig Xmrig is an open-source cross-platform CPU/GPU miner for RandomX, KawPow, CryptoNight and AstroBWT. This test profile is set up to measure the Xmrig CPU mining performance. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org H/s, More Is Better Xmrig 6.18.1 Variant: Monero - Hash Count: 1M 6c 8c 10c 12c 20K 40K 60K 80K 100K SE +/- 214.10, N = 3 SE +/- 383.60, N = 3 SE +/- 152.19, N = 3 SE +/- 328.13, N = 3 100446.2 101953.5 102599.6 104604.6 1. (CXX) g++ options: -fexceptions -fno-rtti -maes -O3 -Ofast -static-libgcc -static-libstdc++ -rdynamic -lssl -lcrypto -luv -lpthread -lrt -ldl -lhwloc
OpenBenchmarking.org H/s, More Is Better Xmrig 6.18.1 Variant: Wownero - Hash Count: 1M 6c 8c 10c 12c 30K 60K 90K 120K 150K SE +/- 349.73, N = 3 SE +/- 122.05, N = 3 SE +/- 70.55, N = 3 SE +/- 849.90, N = 3 126057.7 127081.2 127226.6 126465.6 1. (CXX) g++ options: -fexceptions -fno-rtti -maes -O3 -Ofast -static-libgcc -static-libstdc++ -rdynamic -lssl -lcrypto -luv -lpthread -lrt -ldl -lhwloc
LuxCoreRender LuxCoreRender is an open-source 3D physically based renderer formerly known as LuxRender. LuxCoreRender supports CPU-based rendering as well as GPU acceleration via OpenCL, NVIDIA CUDA, and NVIDIA OptiX interfaces. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org M samples/sec, More Is Better LuxCoreRender 2.6 Scene: Danish Mood - Acceleration: CPU 6c 8c 10c 12c 3 6 9 12 15 SE +/- 0.14, N = 12 SE +/- 0.11, N = 15 SE +/- 0.17, N = 12 SE +/- 0.09, N = 15 9.49 9.56 9.62 9.69 MIN: 3.85 / MAX: 12.15 MIN: 3.94 / MAX: 12.41 MIN: 3.97 / MAX: 12.9 MIN: 4 / MAX: 12.39
OpenBenchmarking.org M samples/sec, More Is Better LuxCoreRender 2.6 Scene: Orange Juice - Acceleration: CPU 6c 8c 10c 12c 7 14 21 28 35 SE +/- 0.71, N = 15 SE +/- 0.72, N = 15 SE +/- 0.29, N = 3 SE +/- 0.63, N = 15 28.90 29.04 28.19 28.82 MIN: 22.4 / MAX: 44.91 MIN: 22.62 / MAX: 45.48 MIN: 23.3 / MAX: 45.65 MIN: 23.01 / MAX: 45.86
Embree Intel Embree is a collection of high-performance ray-tracing kernels for execution on CPUs and supporting instruction sets such as SSE, AVX, AVX2, and AVX-512. Embree also supports making use of the Intel SPMD Program Compiler (ISPC). Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Frames Per Second, More Is Better Embree 3.13 Binary: Pathtracer ISPC - Model: Crown 6c 8c 10c 12c 40 80 120 160 200 SE +/- 0.33, N = 3 SE +/- 0.36, N = 3 SE +/- 0.47, N = 3 SE +/- 1.01, N = 3 187.61 185.49 184.73 182.45 MIN: 146.69 / MAX: 208.25 MIN: 134.45 / MAX: 211.64 MIN: 137.82 / MAX: 210.21 MIN: 128.42 / MAX: 209.42
OpenBenchmarking.org Frames Per Second, More Is Better Embree 3.13 Binary: Pathtracer ISPC - Model: Asian Dragon 6c 8c 10c 12c 50 100 150 200 250 SE +/- 0.46, N = 3 SE +/- 0.39, N = 3 SE +/- 0.47, N = 3 SE +/- 0.13, N = 3 221.29 217.41 214.31 213.75 MIN: 215.19 / MAX: 233.21 MIN: 211.73 / MAX: 230.1 MIN: 209.11 / MAX: 223.97 MIN: 209.16 / MAX: 225.43
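As a point of reference for the kind of API the pathtracer binaries exercise, here is a minimal sketch against Embree 3's public C API: one triangle, one ray. This is illustrative only and is not the benchmark's own code, which builds full scenes and shades via ISPC kernels.

#include <embree3/rtcore.h>
#include <cstdio>
#include <limits>

int main() {
  // Create an Embree device and scene.
  RTCDevice device = rtcNewDevice(nullptr);
  RTCScene scene = rtcNewScene(device);

  // One triangle geometry with its vertex and index buffers.
  RTCGeometry geom = rtcNewGeometry(device, RTC_GEOMETRY_TYPE_TRIANGLE);
  float* vb = (float*)rtcSetNewGeometryBuffer(geom, RTC_BUFFER_TYPE_VERTEX, 0,
                                              RTC_FORMAT_FLOAT3, 3 * sizeof(float), 3);
  vb[0] = 0; vb[1] = 0; vb[2] = 0;
  vb[3] = 1; vb[4] = 0; vb[5] = 0;
  vb[6] = 0; vb[7] = 1; vb[8] = 0;
  unsigned* ib = (unsigned*)rtcSetNewGeometryBuffer(geom, RTC_BUFFER_TYPE_INDEX, 0,
                                                    RTC_FORMAT_UINT3, 3 * sizeof(unsigned), 1);
  ib[0] = 0; ib[1] = 1; ib[2] = 2;
  rtcCommitGeometry(geom);
  rtcAttachGeometry(scene, geom);
  rtcReleaseGeometry(geom);
  rtcCommitScene(scene);

  // Trace a single ray through the scene.
  RTCRayHit rayhit{};
  rayhit.ray.org_x = 0.3f; rayhit.ray.org_y = 0.3f; rayhit.ray.org_z = -1.0f;
  rayhit.ray.dir_x = 0.0f; rayhit.ray.dir_y = 0.0f; rayhit.ray.dir_z = 1.0f;
  rayhit.ray.tnear = 0.0f;
  rayhit.ray.tfar  = std::numeric_limits<float>::infinity();
  rayhit.ray.mask  = -1;
  rayhit.hit.geomID = RTC_INVALID_GEOMETRY_ID;

  RTCIntersectContext context;
  rtcInitIntersectContext(&context);
  rtcIntersect1(scene, &context, &rayhit);

  std::printf("hit: %s (t = %f)\n",
              rayhit.hit.geomID != RTC_INVALID_GEOMETRY_ID ? "yes" : "no",
              rayhit.ray.tfar);

  rtcReleaseScene(scene);
  rtcReleaseDevice(device);
  return 0;
}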
Kvazaar This is a test of Kvazaar as a CPU-based H.265/HEVC video encoder written in the C programming language and optimized in Assembly. Kvazaar is the winner of the 2016 ACM Open-Source Software Competition and developed at the Ultra Video Group, Tampere University, Finland. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Frames Per Second, More Is Better Kvazaar 2.1 Video Input: Bosphorus 4K - Video Preset: Medium 6c 8c 10c 12c 14 28 42 56 70 SE +/- 0.53, N = 3 SE +/- 0.73, N = 3 SE +/- 0.11, N = 3 SE +/- 0.68, N = 3 61.40 61.81 62.23 62.56 1. (CC) gcc options: -pthread -ftree-vectorize -fvisibility=hidden -O2 -lpthread -lm -lrt
OpenBenchmarking.org Frames Per Second, More Is Better Kvazaar 2.1 Video Input: Bosphorus 4K - Video Preset: Very Fast 6c 8c 10c 12c 20 40 60 80 100 SE +/- 0.77, N = 3 SE +/- 1.04, N = 3 SE +/- 0.74, N = 3 SE +/- 0.58, N = 10 71.41 73.04 75.35 73.44 1. (CC) gcc options: -pthread -ftree-vectorize -fvisibility=hidden -O2 -lpthread -lm -lrt
OpenBenchmarking.org Frames Per Second, More Is Better Kvazaar 2.1 Video Input: Bosphorus 4K - Video Preset: Ultra Fast 6c 8c 10c 12c 20 40 60 80 100 SE +/- 0.63, N = 3 SE +/- 0.71, N = 3 SE +/- 1.02, N = 3 SE +/- 0.66, N = 3 75.86 76.84 77.30 77.83 1. (CC) gcc options: -pthread -ftree-vectorize -fvisibility=hidden -O2 -lpthread -lm -lrt
SVT-AV1 This is a benchmark of the SVT-AV1 open-source video encoder/decoder. SVT-AV1 was originally developed by Intel as part of their Open Visual Cloud / Scalable Video Technology (SVT). Development of SVT-AV1 has since moved to the Alliance for Open Media as part of upstream AV1 development. SVT-AV1 is a CPU-based multi-threaded video encoder for the AV1 video format with a sample YUV video file. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 1.4 Encoder Mode: Preset 12 - Input: Bosphorus 4K 6c 8c 10c 12c 60 120 180 240 300 SE +/- 9.18, N = 13 SE +/- 7.53, N = 15 SE +/- 7.16, N = 15 SE +/- 7.35, N = 15 221.16 227.90 241.37 251.77
OpenVKL OpenVKL is the Intel Open Volume Kernel Library, which offers high-performance volume computation kernels and is part of the Intel oneAPI Rendering Toolkit. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Items / Sec, More Is Better OpenVKL 1.3.1 Benchmark: vklBenchmark ISPC 6c 8c 10c 12c 300 600 900 1200 1500 SE +/- 15.59, N = 3 SE +/- 8.82, N = 3 SE +/- 11.03, N = 9 SE +/- 6.93, N = 3 1212 1325 1317 1325 MIN: 328 / MAX: 4115 MIN: 330 / MAX: 5664 MIN: 327 / MAX: 5660 MIN: 329 / MAX: 4553
OSPRay Intel OSPRay is a portable ray-tracing engine for high-performance, high-fidelity scientific visualizations. OSPRay builds off Intel's Embree and Intel SPMD Program Compiler (ISPC) components as part of the oneAPI rendering toolkit. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Items Per Second, More Is Better OSPRay 2.10 Benchmark: particle_volume/ao/real_time 6c 8c 10c 12c 10 20 30 40 50 SE +/- 0.04, N = 3 SE +/- 0.01, N = 3 SE +/- 0.04, N = 3 SE +/- 0.04, N = 3 43.36 43.97 43.03 43.71
OpenBenchmarking.org Items Per Second, More Is Better OSPRay 2.10 Benchmark: particle_volume/scivis/real_time 6c 8c 10c 12c 10 20 30 40 50 SE +/- 0.06, N = 3 SE +/- 0.03, N = 3 SE +/- 0.01, N = 3 SE +/- 0.05, N = 3 43.24 43.84 43.00 42.80
OpenBenchmarking.org Items Per Second, More Is Better OSPRay 2.10 Benchmark: particle_volume/pathtracer/real_time 6c 8c 10c 12c 50 100 150 200 250 SE +/- 0.59, N = 3 SE +/- 1.74, N = 3 SE +/- 1.94, N = 3 SE +/- 1.54, N = 3 230.44 228.58 230.28 229.27
OpenBenchmarking.org Items Per Second, More Is Better OSPRay 2.10 Benchmark: gravity_spheres_volume/dim_512/ao/real_time 6c 8c 10c 12c 10 20 30 40 50 SE +/- 0.07, N = 3 SE +/- 0.10, N = 3 SE +/- 0.04, N = 3 SE +/- 0.13, N = 3 44.27 44.23 44.00 43.98
OpenBenchmarking.org Items Per Second, More Is Better OSPRay 2.10 Benchmark: gravity_spheres_volume/dim_512/scivis/real_time 6c 8c 10c 12c 10 20 30 40 50 SE +/- 0.15, N = 3 SE +/- 0.13, N = 3 SE +/- 0.12, N = 3 SE +/- 0.15, N = 3 43.29 43.43 43.33 43.13
OpenBenchmarking.org Items Per Second, More Is Better OSPRay 2.10 Benchmark: gravity_spheres_volume/dim_512/pathtracer/real_time 6c 8c 10c 12c 12 24 36 48 60 SE +/- 0.04, N = 3 SE +/- 0.08, N = 3 SE +/- 0.12, N = 3 SE +/- 0.50, N = 3 54.61 54.51 54.41 53.77
OpenBenchmarking.org MIPS, More Is Better 7-Zip Compression 22.01 Test: Decompression Rating 6c 8c 10c 12c 300K 600K 900K 1200K 1500K SE +/- 2020.82, N = 3 SE +/- 9235.88, N = 3 SE +/- 5138.86, N = 3 SE +/- 3305.67, N = 3 1177484 1159901 1171627 1181435 1. (CXX) g++ options: -lpthread -ldl -O2 -fPIC
Stargate Digital Audio Workstation Stargate is an open-source, cross-platform digital audio workstation (DAW) software package with "a unique and carefully curated experience" with scalability from old systems up through modern multi-core systems. Stargate is GPLv3 licensed and makes use of Qt5 (PyQt5) for its user-interface. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Render Ratio, More Is Better Stargate Digital Audio Workstation 22.11.5 Sample Rate: 96000 - Buffer Size: 1024 6c 8c 10c 12c 0.9821 1.9642 2.9463 3.9284 4.9105 SE +/- 0.002133, N = 3 SE +/- 0.008144, N = 3 SE +/- 0.010431, N = 3 SE +/- 0.023689, N = 3 4.364767 4.351402 4.354556 4.345890 1. (CXX) g++ options: -lpthread -lsndfile -lm -O3 -march=native -ffast-math -funroll-loops -fstrength-reduce -fstrict-aliasing -finline-functions
OpenBenchmarking.org Render Ratio, More Is Better Stargate Digital Audio Workstation 22.11.5 Sample Rate: 192000 - Buffer Size: 1024 6c 8c 10c 12c 0.6365 1.273 1.9095 2.546 3.1825 SE +/- 0.004057, N = 3 SE +/- 0.019484, N = 3 SE +/- 0.017291, N = 3 SE +/- 0.001919, N = 3 2.824814 2.811555 2.806190 2.829061 1. (CXX) g++ options: -lpthread -lsndfile -lm -O3 -march=native -ffast-math -funroll-loops -fstrength-reduce -fstrict-aliasing -finline-functions
OpenBenchmarking.org Seconds, Fewer Is Better libavif avifenc 0.11 Encoder Speed: 2 6c 8c 10c 12c 8 16 24 32 40 SE +/- 0.14, N = 3 SE +/- 0.10, N = 3 SE +/- 0.08, N = 3 SE +/- 0.14, N = 3 34.87 34.69 34.91 34.85 1. (CXX) g++ options: -O3 -fPIC -lm
OpenBenchmarking.org Seconds, Fewer Is Better libavif avifenc 0.11 Encoder Speed: 6 6c 8c 10c 12c 0.5533 1.1066 1.6599 2.2132 2.7665 SE +/- 0.004, N = 3 SE +/- 0.017, N = 3 SE +/- 0.003, N = 3 SE +/- 0.016, N = 3 2.435 2.420 2.411 2.459 1. (CXX) g++ options: -O3 -fPIC -lm
OpenBenchmarking.org Seconds, Fewer Is Better libavif avifenc 0.11 Encoder Speed: 6, Lossless 6c 8c 10c 12c 1.1993 2.3986 3.5979 4.7972 5.9965 SE +/- 0.055, N = 3 SE +/- 0.034, N = 3 SE +/- 0.044, N = 3 SE +/- 0.076, N = 3 5.330 5.270 5.286 5.287 1. (CXX) g++ options: -O3 -fPIC -lm
OpenBenchmarking.org Seconds, Fewer Is Better libavif avifenc 0.11 Encoder Speed: 10, Lossless 6c 8c 10c 12c 0.9758 1.9516 2.9274 3.9032 4.879 SE +/- 0.043, N = 3 SE +/- 0.009, N = 3 SE +/- 0.055, N = 3 SE +/- 0.024, N = 3 4.250 4.252 4.337 4.241 1. (CXX) g++ options: -O3 -fPIC -lm
Timed Gem5 Compilation This test times how long it takes to compile Gem5. Gem5 is a simulator for computer system architecture research. Gem5 is widely used for computer architecture research within the industry, academia, and more. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Seconds, Fewer Is Better Timed Gem5 Compilation 21.2 Time To Compile 6c 8c 10c 12c 30 60 90 120 150 SE +/- 0.57, N = 3 SE +/- 0.77, N = 3 SE +/- 0.36, N = 3 SE +/- 0.16, N = 3 134.70 136.79 134.37 139.24
Timed Mesa Compilation This test profile times how long it takes to compile Mesa with Meson/Ninja. To minimize build dependencies and avoid versioning conflicts, this test is just the core Mesa build without LLVM or the extra Gallium3D/Mesa drivers enabled. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Seconds, Fewer Is Better Timed Mesa Compilation 21.0 Time To Compile 6c 8c 10c 12c 5 10 15 20 25 SE +/- 0.05, N = 3 SE +/- 0.07, N = 3 SE +/- 0.06, N = 3 SE +/- 0.07, N = 3 20.16 20.11 20.21 20.12
Build2 This test profile measures the time to bootstrap/install the build2 C++ build toolchain from source. Build2 is a cross-platform build toolchain for C/C++ code with Cargo-like features. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Seconds, Fewer Is Better Build2 0.13 Time To Compile 6c 8c 10c 12c 11 22 33 44 55 SE +/- 0.28, N = 3 SE +/- 0.20, N = 3 SE +/- 0.02, N = 3 SE +/- 0.04, N = 3 50.08 49.87 49.80 49.92
oneDNN This is a test of Intel oneDNN, an Intel-optimized library for Deep Neural Networks, making use of its built-in benchdnn functionality. The result is the total perf time reported. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and MKL-DNN before being rebranded as part of the Intel oneAPI toolkit. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org ms, Fewer Is Better oneDNN 3.0 Harness: IP Shapes 3D - Data Type: bf16bf16bf16 - Engine: CPU 6c 8c 10c 12c 0.9021 1.8042 2.7063 3.6084 4.5105 SE +/- 0.01788, N = 3 SE +/- 0.08932, N = 12 SE +/- 0.05885, N = 12 SE +/- 0.02537, N = 3 3.96488 3.99305 4.00938 3.95471 MIN: 2.99 MIN: 2.67 MIN: 2.96 MIN: 3.05 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
OpenBenchmarking.org ms, Fewer Is Better oneDNN 3.0 Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU 6c 8c 10c 12c 400 800 1200 1600 2000 SE +/- 16.27, N = 10 SE +/- 28.30, N = 3 SE +/- 14.89, N = 3 SE +/- 31.84, N = 15 2072.57 1982.15 2030.72 1968.70 MIN: 1942.14 MIN: 1911.33 MIN: 1981.15 MIN: 1632.62 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
OpenBenchmarking.org ms, Fewer Is Better oneDNN 3.0 Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU 6c 8c 10c 12c 500 1000 1500 2000 2500 SE +/- 25.74, N = 15 SE +/- 21.41, N = 3 SE +/- 30.76, N = 3 SE +/- 21.01, N = 3 2479.62 2375.45 2438.00 2344.29 MIN: 2293.49 MIN: 2319.45 MIN: 2353.97 MIN: 2288.85 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
OpenBenchmarking.org ms, Fewer Is Better oneDNN 3.0 Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU 6c 8c 10c 12c 500 1000 1500 2000 2500 SE +/- 31.16, N = 3 SE +/- 25.14, N = 15 SE +/- 25.04, N = 15 SE +/- 24.22, N = 3 2471.57 2371.78 2325.71 2275.86 MIN: 2410.73 MIN: 2234.23 MIN: 2171.69 MIN: 2213.34 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
OpenBenchmarking.org ms, Fewer Is Better oneDNN 3.0 Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32 - Engine: CPU 6c 8c 10c 12c 0.1048 0.2096 0.3144 0.4192 0.524 SE +/- 0.005815, N = 3 SE +/- 0.006374, N = 3 SE +/- 0.005241, N = 4 SE +/- 0.005042, N = 3 0.465059 0.465796 0.463454 0.446930 MIN: 0.38 MIN: 0.38 MIN: 0.38 MIN: 0.38 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
Liquid-DSP LiquidSDR's Liquid-DSP is a software-defined radio (SDR) digital signal processing library. This test profile runs a multi-threaded benchmark of this SDR/DSP library focused on embedded platform usage. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org samples/s, More Is Better Liquid-DSP 2021.01.31 Threads: 256 - Buffer Length: 256 - Filter Length: 57 6c 8c 10c 12c 2000M 4000M 6000M 8000M 10000M SE +/- 3844187.53, N = 3 SE +/- 4333333.33, N = 3 SE +/- 5196152.42, N = 3 SE +/- 4618802.15, N = 3 10340333333 10337666667 10340000000 10347000000 1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
OpenBenchmarking.org samples/s, More Is Better Liquid-DSP 2021.01.31 Threads: 384 - Buffer Length: 256 - Filter Length: 57 6c 8c 10c 12c 2000M 4000M 6000M 8000M 10000M SE +/- 3214550.25, N = 3 SE +/- 5783117.19, N = 3 SE +/- 4409585.52, N = 3 SE +/- 4582575.69, N = 3 10349000000 10349666667 10352666667 10347000000 1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
CockroachDB CockroachDB is a cloud-native, distributed SQL database for data-intensive applications. This test profile uses a server-less CockroachDB configuration to test various CockroachDB workloads on the local host with a single node. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org ops/s, More Is Better CockroachDB 22.2 Workload: MoVR - Concurrency: 512 6c 8c 10c 12c 200 400 600 800 1000 SE +/- 4.87, N = 3 SE +/- 9.03, N = 3 SE +/- 3.66, N = 3 SE +/- 3.38, N = 3 954.7 960.3 949.6 948.5
OpenBenchmarking.org ops/s, More Is Better CockroachDB 22.2 Workload: MoVR - Concurrency: 1024 6c 8c 10c 12c 200 400 600 800 1000 SE +/- 1.56, N = 3 SE +/- 3.18, N = 3 SE +/- 0.58, N = 3 SE +/- 1.42, N = 3 952.7 946.9 949.5 953.8
OpenBenchmarking.org ops/s, More Is Better CockroachDB 22.2 Workload: KV, 10% Reads - Concurrency: 512 6c 8c 10c 12c 8K 16K 24K 32K 40K SE +/- 438.30, N = 15 SE +/- 351.71, N = 6 SE +/- 270.36, N = 15 SE +/- 343.66, N = 15 35742.3 34832.9 35993.1 35970.0
OpenBenchmarking.org ops/s, More Is Better CockroachDB 22.2 Workload: KV, 50% Reads - Concurrency: 512 6c 8c 10c 12c 11K 22K 33K 44K 55K SE +/- 32.88, N = 3 SE +/- 454.84, N = 15 SE +/- 514.54, N = 3 SE +/- 464.03, N = 15 47428.0 47596.6 49102.7 47621.9
OpenBenchmarking.org ops/s, More Is Better CockroachDB 22.2 Workload: KV, 60% Reads - Concurrency: 512 6c 8c 10c 12c 11K 22K 33K 44K 55K SE +/- 555.56, N = 15 SE +/- 411.73, N = 13 SE +/- 620.92, N = 15 SE +/- 268.61, N = 3 51275.1 52515.2 51748.8 52330.1
OpenBenchmarking.org ops/s, More Is Better CockroachDB 22.2 Workload: KV, 95% Reads - Concurrency: 512 6c 8c 10c 12c 14K 28K 42K 56K 70K SE +/- 813.26, N = 15 SE +/- 890.57, N = 3 SE +/- 1044.13, N = 15 SE +/- 702.29, N = 3 62666.5 64111.9 60769.7 64467.6
OpenBenchmarking.org ops/s, More Is Better CockroachDB 22.2 Workload: KV, 10% Reads - Concurrency: 1024 6c 8c 10c 12c 8K 16K 24K 32K 40K SE +/- 206.35, N = 3 SE +/- 322.68, N = 3 SE +/- 346.25, N = 3 SE +/- 155.07, N = 3 36329.6 36685.7 35776.8 36846.9
OpenBenchmarking.org ops/s, More Is Better CockroachDB 22.2 Workload: KV, 50% Reads - Concurrency: 1024 6c 8c 10c 12c 10K 20K 30K 40K 50K SE +/- 391.13, N = 9 SE +/- 468.66, N = 15 SE +/- 380.16, N = 3 SE +/- 366.75, N = 15 47593.9 47498.1 48449.0 47465.5
OpenBenchmarking.org ops/s, More Is Better CockroachDB 22.2 Workload: KV, 60% Reads - Concurrency: 1024 6c 8c 10c 12c 11K 22K 33K 44K 55K SE +/- 448.33, N = 3 SE +/- 447.89, N = 3 SE +/- 400.61, N = 10 SE +/- 239.52, N = 3 52626.4 52559.0 51959.5 52573.3
OpenBenchmarking.org ops/s, More Is Better CockroachDB 22.2 Workload: KV, 95% Reads - Concurrency: 1024 6c 8c 10c 12c 14K 28K 42K 56K 70K SE +/- 1310.27, N = 15 SE +/- 1317.65, N = 15 SE +/- 1142.40, N = 15 SE +/- 575.30, N = 3 60137.3 58195.5 62029.8 64661.8
ASTC Encoder ASTC Encoder (astcenc) is for the Adaptive Scalable Texture Compression (ASTC) format commonly used with OpenGL, OpenGL ES, and Vulkan graphics APIs. This test profile does a coding test of both compression/decompression. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org MT/s, More Is Better ASTC Encoder 4.0 Preset: Thorough 6c 8c 10c 12c 20 40 60 80 100 SE +/- 0.10, N = 3 SE +/- 0.04, N = 3 SE +/- 0.05, N = 3 SE +/- 0.05, N = 3 106.51 107.11 106.85 106.57 1. (CXX) g++ options: -O3 -flto -pthread
OpenBenchmarking.org MT/s, More Is Better ASTC Encoder 4.0 Preset: Exhaustive 6c 8c 10c 12c 3 6 9 12 15 SE +/- 0.00, N = 3 SE +/- 0.01, N = 3 SE +/- 0.00, N = 3 SE +/- 0.03, N = 3 11.82 11.81 11.76 11.73 1. (CXX) g++ options: -O3 -flto -pthread
Graph500 This is a benchmark of the reference implementation of Graph500, an HPC benchmark focused on data intensive loads and commonly tested on supercomputers for complex data problems. Graph500 primarily stresses the communication subsystem of the hardware under test. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org sssp median_TEPS, More Is Better Graph500 3.0 Scale: 26 6c 8c 10c 12c 120M 240M 360M 480M 600M 392496000 531854000 574018000 565152000 1. (CC) gcc options: -fcommon -O3 -lpthread -lm -lmpi
GROMACS The GROMACS (GROningen MAchine for Chemical Simulations) molecular dynamics package testing with the water_GMX50 data. This test profile allows selecting between CPU and GPU-based GROMACS builds. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Ns Per Day, More Is Better GROMACS 2022.1 Implementation: MPI CPU - Input: water_GMX50_bare 6c 8c 10c 12c 5 10 15 20 25 SE +/- 0.03, N = 3 SE +/- 0.01, N = 3 SE +/- 0.01, N = 3 SE +/- 0.03, N = 3 17.94 18.68 18.68 18.71 1. (CXX) g++ options: -O3
TensorFlow This is a benchmark of the TensorFlow deep learning framework using the TensorFlow reference benchmarks (tensorflow/benchmarks with tf_cnn_benchmarks.py). Note with the Phoronix Test Suite there is also pts/tensorflow-lite for benchmarking the TensorFlow Lite binaries too. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org images/sec, More Is Better TensorFlow 2.10 Device: CPU - Batch Size: 256 - Model: ResNet-50 6c 8c 10c 12c 20 40 60 80 100 SE +/- 0.26, N = 3 SE +/- 0.48, N = 3 SE +/- 0.36, N = 3 SE +/- 0.48, N = 3 95.67 105.01 105.91 109.13
Neural Magic DeepSparse OpenBenchmarking.org items/sec, More Is Better Neural Magic DeepSparse 1.1 Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Asynchronous Multi-Stream 6c 8c 10c 12c 20 40 60 80 100 SE +/- 0.31, N = 3 SE +/- 0.04, N = 3 SE +/- 0.04, N = 3 SE +/- 0.18, N = 3 82.49 84.21 84.48 84.35
OpenBenchmarking.org ms/batch, Fewer Is Better Neural Magic DeepSparse 1.1 Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Asynchronous Multi-Stream 6c 8c 10c 12c 200 400 600 800 1000 SE +/- 0.67, N = 3 SE +/- 0.88, N = 3 SE +/- 0.20, N = 3 SE +/- 0.82, N = 3 1148.50 1136.85 1133.18 1133.28
OpenBenchmarking.org items/sec, More Is Better Neural Magic DeepSparse 1.1 Model: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Scenario: Asynchronous Multi-Stream 6c 8c 10c 12c 160 320 480 640 800 SE +/- 6.13, N = 15 SE +/- 2.11, N = 3 SE +/- 2.41, N = 3 SE +/- 0.72, N = 3 575.75 705.71 742.80 761.49
OpenBenchmarking.org ms/batch, Fewer Is Better Neural Magic DeepSparse 1.1 Model: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Scenario: Asynchronous Multi-Stream 6c 8c 10c 12c 40 80 120 160 200 SE +/- 1.66, N = 15 SE +/- 0.38, N = 3 SE +/- 0.43, N = 3 SE +/- 0.11, N = 3 166.43 135.62 128.92 125.72
OpenBenchmarking.org items/sec, More Is Better Neural Magic DeepSparse 1.1 Model: CV Detection,YOLOv5s COCO - Scenario: Asynchronous Multi-Stream 6c 8c 10c 12c 200 400 600 800 1000 SE +/- 6.69, N = 15 SE +/- 1.22, N = 3 SE +/- 0.53, N = 3 SE +/- 0.57, N = 3 635.02 773.07 844.43 856.02
OpenBenchmarking.org ms/batch, Fewer Is Better Neural Magic DeepSparse 1.1 Model: CV Detection,YOLOv5s COCO - Scenario: Asynchronous Multi-Stream 6c 8c 10c 12c 30 60 90 120 150 SE +/- 1.56, N = 15 SE +/- 0.19, N = 3 SE +/- 0.08, N = 3 SE +/- 0.07, N = 3 150.92 123.86 113.41 111.89
OpenBenchmarking.org items/sec, More Is Better Neural Magic DeepSparse 1.1 Model: CV Classification, ResNet-50 ImageNet - Scenario: Asynchronous Multi-Stream 6c 8c 10c 12c 400 800 1200 1600 2000 SE +/- 8.40, N = 3 SE +/- 1.56, N = 3 SE +/- 1.61, N = 3 SE +/- 4.95, N = 3 1930.33 1954.12 1965.56 1964.27
OpenBenchmarking.org ms/batch, Fewer Is Better Neural Magic DeepSparse 1.1 Model: CV Classification, ResNet-50 ImageNet - Scenario: Asynchronous Multi-Stream 6c 8c 10c 12c 11 22 33 44 55 SE +/- 0.21, N = 3 SE +/- 0.04, N = 3 SE +/- 0.04, N = 3 SE +/- 0.12, N = 3 49.63 49.00 48.74 48.77
OpenBenchmarking.org items/sec, More Is Better Neural Magic DeepSparse 1.1 Model: NLP Text Classification, DistilBERT mnli - Scenario: Asynchronous Multi-Stream 6c 8c 10c 12c 300 600 900 1200 1500 SE +/- 1.21, N = 3 SE +/- 3.22, N = 3 SE +/- 0.69, N = 3 SE +/- 4.04, N = 3 1190.53 1201.98 1201.14 1195.91
OpenBenchmarking.org ms/batch, Fewer Is Better Neural Magic DeepSparse 1.1 Model: NLP Text Classification, DistilBERT mnli - Scenario: Asynchronous Multi-Stream 6c 8c 10c 12c 20 40 60 80 100 SE +/- 0.07, N = 3 SE +/- 0.20, N = 3 SE +/- 0.03, N = 3 SE +/- 0.27, N = 3 80.44 79.69 79.71 80.08
OpenBenchmarking.org items/sec, More Is Better Neural Magic DeepSparse 1.1 Model: NLP Text Classification, BERT base uncased SST2 - Scenario: Asynchronous Multi-Stream 6c 8c 10c 12c 130 260 390 520 650 SE +/- 2.24, N = 3 SE +/- 1.32, N = 3 SE +/- 2.48, N = 3 SE +/- 1.72, N = 3 608.53 614.61 611.29 615.45
OpenBenchmarking.org ms/batch, Fewer Is Better Neural Magic DeepSparse 1.1 Model: NLP Text Classification, BERT base uncased SST2 - Scenario: Asynchronous Multi-Stream 6c 8c 10c 12c 30 60 90 120 150 SE +/- 0.58, N = 3 SE +/- 0.27, N = 3 SE +/- 0.55, N = 3 SE +/- 0.46, N = 3 157.22 155.82 156.54 155.48
OpenBenchmarking.org items/sec, More Is Better Neural Magic DeepSparse 1.1 Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Asynchronous Multi-Stream 6c 8c 10c 12c 20 40 60 80 100 SE +/- 0.25, N = 3 SE +/- 0.16, N = 3 SE +/- 0.03, N = 3 SE +/- 0.21, N = 3 82.26 84.15 84.27 84.25
OpenBenchmarking.org ms/batch, Fewer Is Better Neural Magic DeepSparse 1.1 Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Asynchronous Multi-Stream 6c 8c 10c 12c 200 400 600 800 1000 SE +/- 1.05, N = 3 SE +/- 1.67, N = 3 SE +/- 1.00, N = 3 SE +/- 1.25, N = 3 1148.33 1137.51 1135.18 1133.48
WRF WRF, the Weather Research and Forecasting Model, is a "next-generation mesoscale numerical weather prediction system designed for both atmospheric research and operational forecasting applications. It features two dynamical cores, a data assimilation system, and a software architecture supporting parallel computation and system extensibility." Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Seconds, Fewer Is Better WRF 4.2.2 Input: conus 2.5km 6c 8c 10c 12c 1600 3200 4800 6400 8000 7432.66 6551.88 4563.18 4070.19 1. (F9X) gfortran options: -O2 -ftree-vectorize -funroll-loops -ffree-form -fconvert=big-endian -frecord-marker=4 -fallow-invalid-boz -lesmf_time -lwrfio_nf -lnetcdff -lnetcdf -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
GPAW GPAW is a density-functional theory (DFT) Python code based on the projector-augmented wave (PAW) method and the atomic simulation environment (ASE). Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Seconds, Fewer Is Better GPAW 22.1 Input: Carbon Nanotube 6c 8c 10c 12c 6 12 18 24 30 SE +/- 0.20, N = 3 SE +/- 0.18, N = 3 SE +/- 0.13, N = 3 SE +/- 0.23, N = 5 26.31 24.60 23.37 23.15 1. (CC) gcc options: -shared -fwrapv -O2 -lxc -lblas -lmpi
Blender Blender is an open-source 3D creation and modeling software project. This test is of Blender's Cycles performance with various sample files. GPU computing via NVIDIA OptiX and NVIDIA CUDA is currently supported as well as HIP for AMD Radeon GPUs and Intel oneAPI for Intel Graphics. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Seconds, Fewer Is Better Blender 3.4 Blend File: BMW27 - Compute: CPU-Only 6c 8c 10c 12c 2 4 6 8 10 SE +/- 0.06, N = 3 SE +/- 0.02, N = 3 SE +/- 0.04, N = 3 SE +/- 0.06, N = 3 8.33 8.34 8.42 8.58
OpenBenchmarking.org Seconds, Fewer Is Better Blender 3.4 Blend File: Classroom - Compute: CPU-Only 6c 8c 10c 12c 5 10 15 20 25 SE +/- 0.04, N = 3 SE +/- 0.06, N = 3 SE +/- 0.09, N = 3 SE +/- 0.00, N = 3 20.71 20.68 20.76 20.92
OpenBenchmarking.org Seconds, Fewer Is Better Blender 3.4 Blend File: Barbershop - Compute: CPU-Only 6c 8c 10c 12c 20 40 60 80 100 SE +/- 0.31, N = 3 SE +/- 0.24, N = 3 SE +/- 0.15, N = 3 SE +/- 0.21, N = 3 79.93 80.18 80.37 81.03
nginx This is a benchmark of the lightweight Nginx HTTP(S) web-server. This Nginx web server benchmark test profile makes use of the wrk program for facilitating the HTTP requests over a fixed period of time with a configurable number of concurrent clients/connections. HTTPS with a self-signed OpenSSL certificate is used by this test for local benchmarking. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Requests Per Second, More Is Better nginx 1.23.2 Connections: 500 6c 8c 10c 12c 40K 80K 120K 160K 200K SE +/- 113.87, N = 3 SE +/- 453.48, N = 3 SE +/- 335.64, N = 3 SE +/- 291.63, N = 3 196805.30 197081.98 198858.66 201032.06 1. (CC) gcc options: -lluajit-5.1 -lm -lssl -lcrypto -lpthread -ldl -std=c99 -O2
ONNX Runtime ONNX Runtime is developed by Microsoft and partners as an open-source, cross-platform, high-performance machine learning inferencing and training accelerator. This test profile runs the ONNX Runtime with various models available from the ONNX Zoo. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Inferences Per Minute, More Is Better ONNX Runtime 1.11 Model: fcn-resnet101-11 - Device: CPU - Executor: Standard 6c 8c 10c 12c 60 120 180 240 300 SE +/- 2.17, N = 12 SE +/- 2.84, N = 5 SE +/- 3.09, N = 3 SE +/- 2.33, N = 7 253 257 255 254 1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt
OpenVINO This is a test of Intel OpenVINO, a toolkit around neural networks, using its built-in benchmarking support and analyzing the throughput and latency for various models. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org FPS, More Is Better OpenVINO 2022.3 Model: Face Detection FP16 - Device: CPU 6c 8c 10c 12c 20 40 60 80 100 SE +/- 0.06, N = 3 SE +/- 0.04, N = 3 SE +/- 0.03, N = 3 SE +/- 0.08, N = 3 101.08 101.26 102.01 101.74 1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2022.3 Model: Face Detection FP16 - Device: CPU 6c 8c 10c 12c 100 200 300 400 500 SE +/- 0.14, N = 3 SE +/- 0.27, N = 3 SE +/- 0.10, N = 3 SE +/- 0.21, N = 3 473.69 472.84 469.43 470.98 MIN: 423.34 / MAX: 579.41 MIN: 394.37 / MAX: 553.15 MIN: 432.92 / MAX: 555.25 MIN: 451.07 / MAX: 556.04 1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
OpenBenchmarking.org FPS, More Is Better OpenVINO 2022.3 Model: Person Detection FP16 - Device: CPU 6c 8c 10c 12c 10 20 30 40 50 SE +/- 0.17, N = 3 SE +/- 0.15, N = 3 SE +/- 0.12, N = 3 SE +/- 0.13, N = 3 41.33 42.59 42.94 42.98 1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2022.3 Model: Person Detection FP16 - Device: CPU 6c 8c 10c 12c 200 400 600 800 1000 SE +/- 4.42, N = 3 SE +/- 3.60, N = 3 SE +/- 2.71, N = 3 SE +/- 3.30, N = 3 1153.70 1119.79 1110.44 1109.45 MIN: 853.88 / MAX: 1939.06 MIN: 808.33 / MAX: 1875.91 MIN: 769.04 / MAX: 1860.23 MIN: 810.74 / MAX: 1835.01 1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
OpenBenchmarking.org FPS, More Is Better OpenVINO 2022.3 Model: Person Detection FP32 - Device: CPU 6c 8c 10c 12c 10 20 30 40 50 SE +/- 0.07, N = 3 SE +/- 0.01, N = 3 SE +/- 0.20, N = 3 SE +/- 0.32, N = 3 41.44 42.22 43.18 42.95 1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2022.3 Model: Person Detection FP32 - Device: CPU 6c 8c 10c 12c 200 400 600 800 1000 SE +/- 1.87, N = 3 SE +/- 0.54, N = 3 SE +/- 5.36, N = 3 SE +/- 8.73, N = 3 1150.54 1129.01 1104.59 1110.68 MIN: 870.26 / MAX: 1902.46 MIN: 850.94 / MAX: 1870.94 MIN: 807.38 / MAX: 1818.79 MIN: 833.53 / MAX: 1865.19 1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
OpenBenchmarking.org FPS, More Is Better OpenVINO 2022.3 Model: Vehicle Detection FP16 - Device: CPU 6c 8c 10c 12c 1600 3200 4800 6400 8000 SE +/- 4.59, N = 3 SE +/- 6.27, N = 3 SE +/- 13.32, N = 3 SE +/- 2.30, N = 3 7306.47 7389.00 7425.10 7394.65 1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2022.3 Model: Vehicle Detection FP16 - Device: CPU 6c 8c 10c 12c 2 4 6 8 10 SE +/- 0.00, N = 3 SE +/- 0.01, N = 3 SE +/- 0.01, N = 3 SE +/- 0.00, N = 3 6.56 6.49 6.45 6.48 MIN: 4.99 / MAX: 59.46 MIN: 4.93 / MAX: 59.51 MIN: 4.97 / MAX: 59.86 MIN: 5.06 / MAX: 59.88 1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
OpenBenchmarking.org FPS, More Is Better OpenVINO 2022.3 Model: Face Detection FP16-INT8 - Device: CPU 6c 8c 10c 12c 40 80 120 160 200 SE +/- 0.09, N = 3 SE +/- 0.48, N = 3 SE +/- 0.03, N = 3 SE +/- 0.21, N = 3 191.29 192.25 192.30 191.43 1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2022.3 Model: Face Detection FP16-INT8 - Device: CPU 6c 8c 10c 12c 50 100 150 200 250 SE +/- 0.13, N = 3 SE +/- 0.69, N = 3 SE +/- 0.03, N = 3 SE +/- 0.32, N = 3 250.49 249.26 249.12 250.34 MIN: 213.3 / MAX: 307.84 MIN: 207.76 / MAX: 340.53 MIN: 209.28 / MAX: 311.3 MIN: 222.95 / MAX: 301.42 1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
OpenBenchmarking.org FPS, More Is Better OpenVINO 2022.3 Model: Vehicle Detection FP16-INT8 - Device: CPU 6c 8c 10c 12c 2K 4K 6K 8K 10K SE +/- 1.79, N = 3 SE +/- 1.79, N = 3 SE +/- 3.30, N = 3 SE +/- 1.42, N = 3 11150.32 11108.16 11066.16 11018.37 1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2022.3 Model: Vehicle Detection FP16-INT8 - Device: CPU 6c 8c 10c 12c 0.9788 1.9576 2.9364 3.9152 4.894 SE +/- 0.00, N = 3 SE +/- 0.00, N = 3 SE +/- 0.00, N = 3 SE +/- 0.00, N = 3 4.30 4.31 4.33 4.35 MIN: 3.52 / MAX: 43.57 MIN: 3.51 / MAX: 43.89 MIN: 3.51 / MAX: 41.25 MIN: 3.52 / MAX: 41.44 1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
OpenBenchmarking.org FPS, More Is Better OpenVINO 2022.3 Model: Weld Porosity Detection FP16 - Device: CPU 6c 8c 10c 12c 2K 4K 6K 8K 10K SE +/- 3.42, N = 3 SE +/- 7.50, N = 3 SE +/- 2.08, N = 3 SE +/- 2.57, N = 3 9959.38 9931.49 9900.47 9867.41 1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2022.3 Model: Weld Porosity Detection FP16 - Device: CPU 6c 8c 10c 12c 1.0913 2.1826 3.2739 4.3652 5.4565 SE +/- 0.00, N = 3 SE +/- 0.00, N = 3 SE +/- 0.00, N = 3 SE +/- 0.00, N = 3 4.81 4.82 4.83 4.85 MIN: 4.14 / MAX: 27.29 MIN: 3.98 / MAX: 28.83 MIN: 4.08 / MAX: 28.68 MIN: 4.06 / MAX: 28.62 1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
OpenBenchmarking.org FPS, More Is Better OpenVINO 2022.3 Model: Machine Translation EN To DE FP16 - Device: CPU 6c 8c 10c 12c 200 400 600 800 1000 SE +/- 5.14, N = 3 SE +/- 8.79, N = 6 SE +/- 1.48, N = 3 SE +/- 2.32, N = 3 817.27 875.39 934.71 959.16 1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2022.3 Model: Machine Translation EN To DE FP16 - Device: CPU 6c 8c 10c 12c 13 26 39 52 65 SE +/- 0.37, N = 3 SE +/- 0.57, N = 6 SE +/- 0.08, N = 3 SE +/- 0.12, N = 3 58.67 54.80 51.29 49.98 MIN: 43.56 / MAX: 315.05 MIN: 40.7 / MAX: 276.86 MIN: 40.28 / MAX: 292.83 MIN: 38.24 / MAX: 187.97 1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
OpenBenchmarking.org FPS, More Is Better OpenVINO 2022.3 Model: Weld Porosity Detection FP16-INT8 - Device: CPU 6c 8c 10c 12c 4K 8K 12K 16K 20K SE +/- 33.95, N = 3 SE +/- 31.30, N = 3 SE +/- 30.88, N = 3 SE +/- 12.43, N = 3 19314.04 19278.93 19254.08 19171.51 1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2022.3 Model: Weld Porosity Detection FP16-INT8 - Device: CPU 6c 8c 10c 12c 3 6 9 12 15 SE +/- 0.02, N = 3 SE +/- 0.02, N = 3 SE +/- 0.02, N = 3 SE +/- 0.00, N = 3 9.89 9.90 9.91 9.95 MIN: 8.35 / MAX: 32.16 MIN: 8.39 / MAX: 56.99 MIN: 8.4 / MAX: 50.42 MIN: 8.42 / MAX: 52.38 1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
OpenBenchmarking.org FPS, More Is Better OpenVINO 2022.3 Model: Person Vehicle Bike Detection FP16 - Device: CPU 6c 8c 10c 12c 2K 4K 6K 8K 10K SE +/- 7.67, N = 3 SE +/- 2.85, N = 3 SE +/- 5.19, N = 3 SE +/- 9.96, N = 3 9081.73 9113.11 9063.84 9038.47 1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2022.3 Model: Person Vehicle Bike Detection FP16 - Device: CPU 6c 8c 10c 12c 1.1925 2.385 3.5775 4.77 5.9625 SE +/- 0.00, N = 3 SE +/- 0.00, N = 3 SE +/- 0.00, N = 3 SE +/- 0.01, N = 3 5.28 5.26 5.28 5.30 MIN: 4.34 / MAX: 38.93 MIN: 4.42 / MAX: 42.93 MIN: 4.37 / MAX: 41.23 MIN: 4.42 / MAX: 40.66 1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
OpenBenchmarking.org FPS, More Is Better OpenVINO 2022.3 Model: Age Gender Recognition Retail 0013 FP16 - Device: CPU 6c 8c 10c 12c 30K 60K 90K 120K 150K SE +/- 365.43, N = 3 SE +/- 994.61, N = 3 SE +/- 1134.97, N = 10 SE +/- 745.28, N = 3 151213.17 152292.39 147717.32 147769.26 1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2022.3 Model: Age Gender Recognition Retail 0013 FP16 - Device: CPU 6c 8c 10c 12c 0.1238 0.2476 0.3714 0.4952 0.619 SE +/- 0.00, N = 3 SE +/- 0.00, N = 3 SE +/- 0.00, N = 10 SE +/- 0.00, N = 3 0.54 0.55 0.55 0.55 MIN: 0.5 / MAX: 34.19 MIN: 0.5 / MAX: 30.68 MIN: 0.5 / MAX: 41.23 MIN: 0.5 / MAX: 34.71 1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
OpenBenchmarking.org FPS, More Is Better OpenVINO 2022.3 Model: Age Gender Recognition Retail 0013 FP16-INT8 - Device: CPU 6c 8c 10c 12c 30K 60K 90K 120K 150K SE +/- 681.80, N = 3 SE +/- 1158.58, N = 3 SE +/- 815.42, N = 3 SE +/- 1214.59, N = 3 121027.25 123571.68 122938.23 119606.21 1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2022.3 Model: Age Gender Recognition Retail 0013 FP16-INT8 - Device: CPU 6c 8c 10c 12c 0.2205 0.441 0.6615 0.882 1.1025 SE +/- 0.00, N = 3 SE +/- 0.00, N = 3 SE +/- 0.00, N = 3 SE +/- 0.00, N = 3 0.97 0.98 0.98 0.97 MIN: 0.86 / MAX: 33.82 MIN: 0.86 / MAX: 39.58 MIN: 0.85 / MAX: 39.82 MIN: 0.85 / MAX: 22.9 1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
12c Processor: 2 x AMD EPYC 9654 96-Core @ 2.40GHz (192 Cores / 384 Threads), Motherboard: AMD Titanite_4G (RTI1002E BIOS), Chipset: AMD Device 14a4, Memory: 1520GB, Disk: 800GB INTEL SSDPF21Q800GB, Graphics: ASPEED, Monitor: VGA HDMI, Network: Broadcom NetXtreme BCM5720 PCIe
OS: Ubuntu 22.10, Kernel: 6.1.0-phx (x86_64), Desktop: GNOME Shell 43.0, Display Server: X Server 1.21.1.4, Vulkan: 1.3.224, Compiler: GCC 12.2.0 + Clang 15.0.2-1, File-System: ext4, Screen Resolution: 1920x1080
Kernel Notes: Transparent Huge Pages: madvise
Compiler Notes: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-12-U8K4Qv/gcc-12-12.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-12-U8K4Qv/gcc-12-12.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Notes: Scaling Governor: acpi-cpufreq performance (Boost: Enabled) - CPU Microcode: 0xa10110d
Java Notes: OpenJDK Runtime Environment (build 11.0.17+8-post-Ubuntu-1ubuntu2)
Python Notes: Python 3.10.7
Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: always-on RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
Testing initiated at 21 December 2022 05:44 by user phoronix.
10c Processor: 2 x AMD EPYC 9654 96-Core @ 2.40GHz (192 Cores / 384 Threads), Motherboard: AMD Titanite_4G (RTI1002E BIOS), Chipset: AMD Device 14a4, Memory: 1264GB, Disk: 800GB INTEL SSDPF21Q800GB, Graphics: ASPEED, Monitor: VGA HDMI, Network: Broadcom NetXtreme BCM5720 PCIe
OS: Ubuntu 22.10, Kernel: 6.1.0-phx (x86_64), Desktop: GNOME Shell 43.0, Display Server: X Server 1.21.1.4, Vulkan: 1.3.224, Compiler: GCC 12.2.0 + Clang 15.0.2-1, File-System: ext4, Screen Resolution: 1920x1080
Kernel Notes: Transparent Huge Pages: madvise
Compiler Notes: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-12-U8K4Qv/gcc-12-12.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-12-U8K4Qv/gcc-12-12.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Notes: Scaling Governor: acpi-cpufreq performance (Boost: Enabled) - CPU Microcode: 0xa10110d
Java Notes: OpenJDK Runtime Environment (build 11.0.17+8-post-Ubuntu-1ubuntu2)
Python Notes: Python 3.10.7
Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: always-on RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
Testing initiated at 21 December 2022 20:48 by user phoronix.
8c Processor: 2 x AMD EPYC 9654 96-Core @ 2.40GHz (192 Cores / 384 Threads), Motherboard: AMD Titanite_4G (RTI1002E BIOS), Chipset: AMD Device 14a4, Memory: 1008GB, Disk: 800GB INTEL SSDPF21Q800GB, Graphics: ASPEED, Monitor: VGA HDMI, Network: Broadcom NetXtreme BCM5720 PCIe
OS: Ubuntu 22.10, Kernel: 6.1.0-phx (x86_64), Desktop: GNOME Shell 43.0, Display Server: X Server 1.21.1.4, Vulkan: 1.3.224, Compiler: GCC 12.2.0 + Clang 15.0.2-1, File-System: ext4, Screen Resolution: 1920x1080
Kernel Notes: Transparent Huge Pages: madvise
Compiler Notes: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-12-U8K4Qv/gcc-12-12.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-12-U8K4Qv/gcc-12-12.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Notes: Scaling Governor: acpi-cpufreq performance (Boost: Enabled) - CPU Microcode: 0xa10110d
Java Notes: OpenJDK Runtime Environment (build 11.0.17+8-post-Ubuntu-1ubuntu2)
Python Notes: Python 3.10.7
Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: always-on RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
Testing initiated at 22 December 2022 12:15 by user phoronix.
6c Processor: 2 x AMD EPYC 9654 96-Core @ 2.40GHz (192 Cores / 384 Threads), Motherboard: AMD Titanite_4G (RTI1002E BIOS), Chipset: AMD Device 14a4, Memory: 768GB, Disk: 800GB INTEL SSDPF21Q800GB, Graphics: ASPEED, Monitor: VGA HDMI, Network: Broadcom NetXtreme BCM5720 PCIe
OS: Ubuntu 22.10, Kernel: 6.1.0-phx (x86_64), Desktop: GNOME Shell 43.0, Display Server: X Server 1.21.1.4, Vulkan: 1.3.224, Compiler: GCC 12.2.0 + Clang 15.0.2-1, File-System: ext4, Screen Resolution: 1920x1080
Kernel Notes: Transparent Huge Pages: madvise
Compiler Notes: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-12-U8K4Qv/gcc-12-12.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-12-U8K4Qv/gcc-12-12.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Notes: Scaling Governor: acpi-cpufreq performance (Boost: Enabled) - CPU Microcode: 0xa10110d
Java Notes: OpenJDK Runtime Environment (build 11.0.17+8-post-Ubuntu-1ubuntu2)
Python Notes: Python 3.10.7
Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: always-on RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
Testing initiated at 23 December 2022 05:27 by user phoronix.