AMD EPYC 9654 Genoa AVX-512 benchmark comparison by Michael Larabel for a future article.
AVX-512 On
Kernel Notes: Transparent Huge Pages: madvise
Environment Notes: CXXFLAGS="-O3 -march=native -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi -mprefer-vector-width=512" CFLAGS="-O3 -march=native -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi -mprefer-vector-width=512"
Compiler Notes: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-12-U8K4Qv/gcc-12-12.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-12-U8K4Qv/gcc-12-12.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Notes: Scaling Governor: acpi-cpufreq performance (Boost: Enabled) - CPU Microcode: 0xa10110d
Python Notes: Python 3.10.7
Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: always-on RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
AVX-512 Off
Processor: 2 x AMD EPYC 9654 96-Core @ 2.40GHz (192 Cores / 384 Threads), Motherboard: AMD Titanite_4G (RTI1002E BIOS), Chipset: AMD Device 14a4, Memory: 1520GB, Disk: 800GB INTEL SSDPF21Q800GB, Graphics: ASPEED, Monitor: VGA HDMI, Network: Broadcom NetXtreme BCM5720 PCIe
OS: Ubuntu 22.10, Kernel: 6.1.0-phx (x86_64), Desktop: GNOME Shell 43.0, Display Server: X Server 1.21.1.4, Vulkan: 1.3.224, Compiler: GCC 12.2.0 + Clang 15.0.2-1, File-System: ext4, Screen Resolution: 1920x1080
Kernel Notes: Transparent Huge Pages: madvise
Environment Notes: CXXFLAGS="-O3 -march=native -mno-avx512f" CFLAGS="-O3 -march=native -mno-avx512f"
Compiler Notes, Processor Notes, Python Notes, and Security Notes: identical to the AVX-512 On configuration above.
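The only intended difference between the two configurations is the compiler flag set shown in the environment notes. As a sanity check, one can diff the two CFLAGS strings (copied verbatim from the notes above) to confirm exactly which options separate the builds:

```python
# CFLAGS for the two test configurations, copied from the system notes.
cflags_on = ("-O3 -march=native -mavx512f -mavx512cd -mavx512vl -mavx512bw "
             "-mavx512dq -mavx512ifma -mavx512vbmi -mprefer-vector-width=512")
cflags_off = "-O3 -march=native -mno-avx512f"

# Flags present in only one of the two builds.
on_only = sorted(set(cflags_on.split()) - set(cflags_off.split()))
off_only = sorted(set(cflags_off.split()) - set(cflags_on.split()))

print(off_only)      # ['-mno-avx512f']
print(len(on_only))  # 8 AVX-512 feature flags plus the 512-bit vector-width preference
```

Everything else (base optimization level, -march=native tuning, kernel, microcode) is held constant between the runs.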
AMD EPYC 4th Gen AVX-512 Comparison - OpenBenchmarking.org - Phoronix Test Suite - Benchmarks & System Logs
AVX-512 On vs. AVX-512 Off Comparison (Phoronix Test Suite geometric comparison chart)
[Chart summary: across the full test set, enabling AVX-512 improved performance in nearly every benchmark, with gains ranging from roughly 3% up to 185%. The largest gains came in TensorFlow CPU/AlexNet (+185.4%), oneDNN RNN Training bf16 (+155.1%) and f32 (+152.5%), the AI Benchmark Alpha Device Training Score (+153.4%), OpenVINO Weld Porosity Detection FP16 (+143%), OpenVINO Face Detection FP16 (+132%), Cpuminer-Opt LBRY Credits (+117%), OpenVINO Machine Translation EN To DE FP16 (+114.8%), and OpenVINO Vehicle Detection FP16 (+97%). Mid-range gains of roughly 20-80% were common across Neural Magic DeepSparse, OSPRay / OSPRay Studio, Embree, simdjson, miniBUDE, Mobile Neural Network, and Cpuminer-Opt workloads, while smaller single-digit improvements were seen in tests such as SVT-AV1, JPEG XL, CP2K, OpenFOAM, and several NCNN models.]
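The percentages in the comparison chart above follow the standard OpenBenchmarking convention: for a higher-is-better metric the gain is On/Off - 1, while for a lower-is-better metric such as a runtime in milliseconds it is Off/On - 1. A minimal sketch, using values from the detailed miniBUDE and Mobile Neural Network results later in this article:

```python
# Percent improvement of the AVX-512 On run over the AVX-512 Off run,
# matching how OpenBenchmarking.org reports its comparison percentages.

def gain_pct(off: float, on: float, higher_is_better: bool = True) -> float:
    ratio = on / off if higher_is_better else off / on
    return (ratio - 1.0) * 100.0

# miniBUDE OpenMP/BM1 (GFInst/s, more is better): 5065.10 -> 7299.55
print(round(gain_pct(5065.10, 7299.55), 1))                      # 44.1
# MNN resnet-v2-50 (ms, fewer is better): 24.10 -> 15.44
print(round(gain_pct(24.10, 15.44, higher_is_better=False), 1))  # 56.1
```

Both computed values match the 44.1% and 56.1% entries shown in the chart.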
CP2K Molecular Dynamics CP2K is an open-source molecular dynamics software package focused on quantum chemistry and solid-state physics. This test profile currently uses the SSMP (OpenMP) version of cp2k. Learn more via the OpenBenchmarking.org test page.
CPU Power Consumption Monitor (OpenBenchmarking.org; Watts; Phoronix Test Suite system monitoring)
AVX-512 Off - Min: 106.95 / Avg: 449.58 / Max: 735.32 | AVX-512 On - Min: 26.37 / Avg: 434.8 / Max: 766.01
CPU Temperature Monitor (OpenBenchmarking.org; Celsius; Phoronix Test Suite system monitoring)
AVX-512 Off - Min: 35.5 / Avg: 51.26 / Max: 73.75 | AVX-512 On - Min: 30.13 / Avg: 49.97 / Max: 73.38
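One detail worth noting from the monitoring data above: the AVX-512 On run actually averaged slightly lower CPU power than the Off run, presumably because the vectorized builds finish the same work sooner. A quick check of the delta (averages copied from the power monitor readout; attributing the first Min/Avg/Max triple to the Off run follows the order in which the configurations are listed):

```python
# Average CPU power draw (Watts) from the monitor readout above.
avg_off = 449.58   # AVX-512 Off
avg_on = 434.80    # AVX-512 On

# Relative reduction in average power with AVX-512 enabled.
delta_pct = (avg_off - avg_on) / avg_off * 100.0
print(round(delta_pct, 1))   # 3.3
```

So despite the higher peak draw (766.01 W vs. 735.32 W), the AVX-512 On run averaged about 3.3% less power over the test session.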
Cpuminer-Opt Cpuminer-Opt is a fork of cpuminer-multi that carries a wide range of CPU performance optimizations for measuring the potential cryptocurrency mining performance of the processor across a wide variety of cryptocurrencies. The benchmark reports the hash rate achieved for the selected cryptocurrency. Learn more via the OpenBenchmarking.org test page.
Darmstadt Automotive Parallel Heterogeneous Suite DAPHNE is the Darmstadt Automotive Parallel HeterogeNEous Benchmark Suite with OpenCL / CUDA / OpenMP test cases for evaluating programming models in the context of autonomous vehicle driving capabilities. Learn more via the OpenBenchmarking.org test page.
Embree Intel Embree is a collection of high-performance ray-tracing kernels for execution on CPUs and supporting instruction sets such as SSE, AVX, AVX2, and AVX-512. Embree also supports making use of the Intel SPMD Program Compiler (ISPC). Learn more via the OpenBenchmarking.org test page.
GROMACS The GROMACS (GROningen MAchine for Chemical Simulations) molecular dynamics package testing with the water_GMX50 data. This test profile allows selecting between CPU and GPU-based GROMACS builds. Learn more via the OpenBenchmarking.org test page.
JPEG XL libjxl The JPEG XL Image Coding System is designed to provide next-generation JPEG image capabilities, with JPEG XL offering better image quality and compression than legacy JPEG. This test profile is currently focused on the multi-threaded JPEG XL image encode performance using the reference libjxl library. Learn more via the OpenBenchmarking.org test page.
miniBUDE MiniBUDE is a mini application for the core computation of the Bristol University Docking Engine (BUDE). This test profile currently makes use of the OpenMP implementation of miniBUDE for CPU benchmarking. Learn more via the OpenBenchmarking.org test page.
miniBUDE 20210901 - Implementation: OpenMP - Input Deck: BM1 (OpenBenchmarking.org; GFInst/s, more is better)
AVX-512 Off: 5065.10 (SE +/- 26.42, N = 8) | AVX-512 On: 7299.55 (SE +/- 5.79, N = 10)
1. (CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm
miniBUDE 20210901 - Implementation: OpenMP - Input Deck: BM2 (OpenBenchmarking.org; GFInst/s, more is better)
AVX-512 Off: 6391.52 (SE +/- 31.65, N = 3) | AVX-512 On: 8652.01 (SE +/- 11.62, N = 4)
1. (CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm
Mobile Neural Network MNN is the Mobile Neural Network, a highly efficient, lightweight deep learning framework developed by Alibaba. This MNN test profile builds the OpenMP / CPU threaded version for processor benchmarking, not a GPU-accelerated test. MNN does allow making use of AVX-512 extensions. Learn more via the OpenBenchmarking.org test page.
Mobile Neural Network 2.1 - Model: resnet-v2-50 (OpenBenchmarking.org; ms, fewer is better)
AVX-512 Off: 24.10 (SE +/- 0.08, N = 8; MIN: 23.44 / MAX: 71.32; built with -mno-avx512f) | AVX-512 On: 15.44 (SE +/- 0.08, N = 9; MIN: 14.79 / MAX: 54.05; built with -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -rdynamic -pthread -ldl
Mobile Neural Network 2.1 - Model: SqueezeNetV1.0 (OpenBenchmarking.org; ms, fewer is better)
AVX-512 Off: 9.146 (SE +/- 0.091, N = 8; MIN: 7.72 / MAX: 19.03; built with -mno-avx512f) | AVX-512 On: 8.579 (SE +/- 0.147, N = 9; MIN: 6.67 / MAX: 21.5; built with -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -rdynamic -pthread -ldl
NCNN NCNN is a high performance neural network inference framework optimized for mobile and other platforms developed by Tencent. Learn more via the OpenBenchmarking.org test page.
NCNN 20220729 - Target: CPU (OpenBenchmarking.org; ms, fewer is better). AVX-512 Off builds used -mno-avx512f; AVX-512 On builds used -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi. 1. (CXX) g++ options: -O3 -march=native -rdynamic -lgomp -lpthread
Model: mnasnet - AVX-512 Off: 44.19 (SE +/- 0.49, N = 3; MIN: 41.56 / MAX: 148.98) | AVX-512 On: 42.30 (SE +/- 0.26, N = 8; MIN: 39.64 / MAX: 571.78)
Model: efficientnet-b0 - AVX-512 Off: 62.86 (SE +/- 0.31, N = 3; MIN: 59.46 / MAX: 154.82) | AVX-512 On: 57.71 (SE +/- 0.35, N = 8; MIN: 54.52 / MAX: 522.74)
Model: blazeface - AVX-512 Off: 29.00 (SE +/- 0.54, N = 3; MIN: 26.03 / MAX: 144.12) | AVX-512 On: 25.88 (SE +/- 0.18, N = 8; MIN: 24.45 / MAX: 112.37)
Model: googlenet - AVX-512 Off: 76.13 (SE +/- 1.35, N = 3; MIN: 70.13 / MAX: 155.61) | AVX-512 On: 72.80 (SE +/- 0.76, N = 8; MIN: 67.13 / MAX: 388.52)
Model: resnet50 - AVX-512 Off: 67.48 (SE +/- 0.89, N = 3; MIN: 63.4 / MAX: 170.14) | AVX-512 On: 66.34 (SE +/- 0.50, N = 8; MIN: 62.92 / MAX: 194.74)
Model: regnety_400m - AVX-512 Off: 270.12 (SE +/- 3.75, N = 3; MIN: 245.47 / MAX: 498.8) | AVX-512 On: 247.32 (SE +/- 2.25, N = 8; MIN: 232.61 / MAX: 506.55)
Model: vision_transformer - AVX-512 Off: 86.16 (SE +/- 5.82, N = 3; MIN: 73.72 / MAX: 1760.74) | AVX-512 On: 74.93 (SE +/- 1.59, N = 8; MIN: 65.04 / MAX: 2154.62)
Neural Magic DeepSparse
Neural Magic DeepSparse 1.1 (OpenBenchmarking.org; items/sec, more is better)
Model: NLP Text Classification, BERT base uncased SST2 - Scenario: Synchronous Single-Stream - AVX-512 Off: 73.54 (SE +/- 0.27, N = 3) | AVX-512 On: 85.00 (SE +/- 0.45, N = 3)
Model: NLP Text Classification, BERT base uncased SST2 - Scenario: Asynchronous Multi-Stream - AVX-512 Off: 509.45 (SE +/- 0.33, N = 3) | AVX-512 On: 616.68 (SE +/- 1.92, N = 3)
Model: NLP Text Classification, DistilBERT mnli - Scenario: Synchronous Single-Stream - AVX-512 Off: 177.54 (SE +/- 0.29, N = 3) | AVX-512 On: 187.08 (SE +/- 0.11, N = 3)
Model: NLP Text Classification, DistilBERT mnli - Scenario: Asynchronous Multi-Stream - AVX-512 Off: 1005.16 (SE +/- 0.78, N = 3) | AVX-512 On: 1195.94 (SE +/- 2.39, N = 3)
Model: CV Classification, ResNet-50 ImageNet - Scenario: Asynchronous Multi-Stream - AVX-512 Off: 1410.81 (SE +/- 0.17, N = 3) | AVX-512 On: 1953.03 (SE +/- 4.32, N = 3)
Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Synchronous Single-Stream - AVX-512 Off: 30.61 (SE +/- 0.03, N = 3) | AVX-512 On: 34.52 (SE +/- 0.03, N = 3)
Model: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Scenario: Synchronous Single-Stream - AVX-512 Off: 71.04 (SE +/- 0.17, N = 3) | AVX-512 On: 100.55 (SE +/- 0.17, N = 3)
Model: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Scenario: Asynchronous Multi-Stream - AVX-512 Off: 490.37 (SE +/- 0.61, N = 3) | AVX-512 On: 761.22 (SE +/- 0.79, N = 3)
Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Synchronous Single-Stream - AVX-512 Off: 30.53 (SE +/- 0.10, N = 3) | AVX-512 On: 34.25 (SE +/- 0.08, N = 3)
Numenta Anomaly Benchmark Numenta Anomaly Benchmark (NAB) is a benchmark for evaluating algorithms for anomaly detection in streaming, real-time applications. It comprises over 50 labeled real-world and artificial time-series data files plus a novel scoring mechanism designed for real-time applications. This test profile currently measures the time to run various detectors. Learn more via the OpenBenchmarking.org test page.
ONNX Runtime ONNX Runtime is developed by Microsoft and partners as an open-source, cross-platform, high-performance machine learning inference and training accelerator. This test profile runs the ONNX Runtime with various models available from the ONNX Model Zoo. Learn more via the OpenBenchmarking.org test page.
OpenFOAM OpenFOAM is the leading free, open-source software for computational fluid dynamics (CFD). This test profile currently uses the drivaerFastback test case for analyzing automotive aerodynamics or alternatively the older motorBike input. Learn more via the OpenBenchmarking.org test page.
OpenFOAM 10 - Input: drivaerFastback, Medium Mesh Size - Mesh Time (OpenBenchmarking.org; Seconds, fewer is better)
AVX-512 Off: 144.35 | AVX-512 On: 135.77
1. (CXX) g++ options: -std=c++14 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -lfiniteVolume -lmeshTools -lparallel -llagrangian -lregionModels -lgenericPatchFields -lOpenFOAM -ldl -lm
OpenVINO This is a test of Intel OpenVINO, a toolkit for optimizing and deploying neural network inference, using its built-in benchmarking support and analyzing the throughput and latency of various models. Learn more via the OpenBenchmarking.org test page.
OpenVINO 2022.2.dev - Device: CPU (OpenBenchmarking.org; FPS, more is better). AVX-512 Off builds used -mno-avx512f; AVX-512 On builds used -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi. 1. (CXX) g++ options: -fPIC -O3 -march=native -fsigned-char -ffunction-sections -fdata-sections -fno-strict-overflow -fwrapv -flto -shared
Model: Face Detection FP16 - AVX-512 Off: 43.94 (SE +/- 0.01, N = 3) | AVX-512 On: 102.04 (SE +/- 0.07, N = 3)
Model: Face Detection FP16-INT8 - AVX-512 Off: 94.97 (SE +/- 0.02, N = 3) | AVX-512 On: 193.93 (SE +/- 0.05, N = 3)
Model: Age Gender Recognition Retail 0013 FP16 - AVX-512 Off: 108449.49 (SE +/- 1130.85, N = 4) | AVX-512 On: 148967.98 (SE +/- 1328.20, N = 3)
Model: Age Gender Recognition Retail 0013 FP16-INT8 - AVX-512 Off: 151239.60 (SE +/- 1455.37, N = 3) | AVX-512 On: 170652.71 (SE +/- 127.59, N = 3)
Model: Person Detection FP16 - AVX-512 Off: 25.50 (SE +/- 0.22, N = 3) | AVX-512 On: 43.34 (SE +/- 0.20, N = 3)
Model: Person Detection FP32 - AVX-512 Off: 25.74 (SE +/- 0.04, N = 3) | AVX-512 On: 43.34 (SE +/- 0.18, N = 3)
Model: Weld Porosity Detection FP16-INT8 - AVX-512 Off: 9672.94 (SE +/- 1.81, N = 3) | AVX-512 On: 19800.20 (SE +/- 12.72, N = 3)
Model: Weld Porosity Detection FP16 - AVX-512 Off: 4110.49 (SE +/- 4.29, N = 3) | AVX-512 On: 9988.44 (SE +/- 6.96, N = 3)
Model: Vehicle Detection FP16-INT8 - AVX-512 Off: 6065.76 (SE +/- 1.39, N = 3) | AVX-512 On: 11202.62 (SE +/- 2.53, N = 3)
Model: Vehicle Detection FP16 - AVX-512 Off: 3782.08 (SE +/- 10.17, N = 3) | AVX-512 On: 7452.96 (SE +/- 3.52, N = 3)
Model: Person Vehicle Bike Detection FP16 - AVX-512 Off: 5135.56 (SE +/- 2.17, N = 3) | AVX-512 On: 9065.34 (SE +/- 4.01, N = 3)
Model: Machine Translation EN To DE FP16 - AVX-512 Off: 445.59 (SE +/- 3.04, N = 3) | AVX-512 On: 956.88 (SE +/- 4.00, N = 3)
OSPRay Intel OSPRay is a portable ray-tracing engine for high-performance, high-fidelity scientific visualizations. OSPRay builds off Intel's Embree and Intel SPMD Program Compiler (ISPC) components as part of the oneAPI rendering toolkit. Learn more via the OpenBenchmarking.org test page.
OSPRay Studio Intel OSPRay Studio is an open-source, interactive visualization and ray-tracing software package. OSPRay Studio makes use of Intel OSPRay, a portable ray-tracing engine for high-performance, high-fidelity visualizations. OSPRay builds off Intel's Embree and Intel SPMD Program Compiler (ISPC) components as part of the oneAPI rendering toolkit. Learn more via the OpenBenchmarking.org test page.
simdjson This is a benchmark of simdjson, a high-performance JSON parser. simdjson aims to be the fastest JSON parser and is used by projects such as Microsoft FishStore, Yandex ClickHouse, and Shopify. Learn more via the OpenBenchmarking.org test page.
SVT-AV1 This is a benchmark of the SVT-AV1 open-source video encoder/decoder. SVT-AV1 was originally developed by Intel as part of their Open Visual Cloud / Scalable Video Technology (SVT). Development of SVT-AV1 has since moved to the Alliance for Open Media as part of upstream AV1 development. SVT-AV1 is a CPU-based multi-threaded video encoder for the AV1 video format with a sample YUV video file. Learn more via the OpenBenchmarking.org test page.
TensorFlow This is a benchmark of the TensorFlow deep learning framework using the TensorFlow reference benchmarks (tensorflow/benchmarks with tf_cnn_benchmarks.py). Note that with the Phoronix Test Suite there is also pts/tensorflow-lite for benchmarking the TensorFlow Lite binaries. Learn more via the OpenBenchmarking.org test page.
Testing initiated at 18 December 2022 08:13 by user phoronix.