AMD EPYC 9754 Bergamo AVX-512: AMD EPYC 9754 1P benchmarks run first with AVX-512 enabled and then with AVX-512 disabled. Tests by Michael Larabel for a future article.
Compare your own system(s) to this result file with the
Phoronix Test Suite by running the command:
phoronix-test-suite benchmark 2307197-NE-AMDBERGAM43

AVX512 On
Kernel Notes: Transparent Huge Pages: madvise
Compiler Notes: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-11-xKiWfi/gcc-11-11.3.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-xKiWfi/gcc-11-11.3.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Notes: Scaling Governor: acpi-cpufreq performance (Boost: Enabled) - CPU Microcode: 0xaa0010b
Python Notes: Python 3.10.6
Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: always-on RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
AVX512 Off
Processor: AMD EPYC 9754 128-Core @ 2.25GHz (128 Cores / 256 Threads), Motherboard: AMD Titanite_4G (RTI1007B BIOS), Chipset: AMD Device 14a4, Memory: 768GB, Disk: 2 x 1920GB SAMSUNG MZWLJ1T9HBJR-00007, Graphics: ASPEED, Network: Broadcom NetXtreme BCM5720 PCIe
OS: Ubuntu 22.04, Kernel: 5.19.0-41-generic (x86_64), Desktop: GNOME Shell 42.5, Display Server: X Server 1.21.1.4, Vulkan: 1.3.224, Compiler: GCC 11.3.0, File-System: ext4, Screen Resolution: 1024x768
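Since the only intended difference between the AVX512 On and AVX512 Off configurations is AVX-512 availability, it can be worth confirming whether AVX-512 is actually exposed to user space before reproducing this comparison on another system. The C snippet below is an editor-added illustration, not part of the original result file and not the mechanism used to toggle AVX-512 for these runs; it relies on GCC/Clang's __builtin_cpu_supports().

    /* avx512_check.c - build with: gcc -O2 avx512_check.c -o avx512_check */
    /* Reports whether common AVX-512 subsets are usable from user space.  */
    #include <stdio.h>

    int main(void)
    {
        __builtin_cpu_init();  /* initialize the compiler's CPU feature cache */
        printf("avx512f : %s\n", __builtin_cpu_supports("avx512f")  ? "yes" : "no");
        printf("avx512dq: %s\n", __builtin_cpu_supports("avx512dq") ? "yes" : "no");
        printf("avx512bw: %s\n", __builtin_cpu_supports("avx512bw") ? "yes" : "no");
        printf("avx512vl: %s\n", __builtin_cpu_supports("avx512vl") ? "yes" : "no");
        return 0;
    }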
AVX512 On vs. AVX512 Off Comparison (percent advantage for the AVX512 On run; tests reported with two metrics show both deltas):

TensorFlow: CPU - 512 - AlexNet 1385.6%; CPU - 256 - AlexNet 1238.3%; CPU - 256 - GoogLeNet 986.6%; CPU - 512 - GoogLeNet 789.3%; CPU - 64 - AlexNet 778.7%; CPU - 32 - AlexNet 562.1%; CPU - 256 - ResNet-50 499.9%; CPU - 64 - GoogLeNet 499%; CPU - 512 - ResNet-50 491%; CPU - 64 - ResNet-50 423.7%; CPU - 16 - AlexNet 389.1%; CPU - 32 - ResNet-50 309.7%; CPU - 32 - GoogLeNet 306.6%; CPU - 16 - ResNet-50 171%; CPU - 16 - GoogLeNet 164.3%
OpenVINO: Weld Porosity Detection FP16 - CPU 138.2% / 138%; Face Detection FP16 - CPU 132% / 131.2%; Machine Translation EN To DE FP16 - CPU 130.3% / 130.2%; Weld Porosity Detection FP16-INT8 - CPU 107.7% / 107.6%; Face Detection FP16-INT8 - CPU 103.6% / 102.6%; Person Vehicle Bike Detection FP16 - CPU 100.2% / 100.1%; Age Gender Recognition Retail 0013 FP16 - CPU 92.9% / 76.2%; Person Detection FP16/FP32 - CPU 80.2% / 79.8% / 78.2% / 77.9%; Vehicle Detection FP16-INT8 - CPU 43.9% / 43.6%; Vehicle Detection FP16 - CPU 20.1% / 20%; Age Gender Recognition Retail 0013 FP16-INT8 - CPU 11.4% / 10.6%
Cpuminer-Opt: LBC, LBRY Credits 98.6%; Quad SHA-256, Pyrite 71.1%; x25x 65.4%; Blake-2 S 58.9%; scrypt 47.2%; Garlicoin 34.5%; Skeincoin 25.3%; Myriad-Groestl 20.7%
Neural Magic DeepSparse (Asynchronous Multi-Stream): NLP Sentiment Analysis, 80% Pruned Quantized BERT Base Uncased 92.3% / 92.2%; CV Segmentation, 90% Pruned YOLACT Pruned 83.3% / 81.9%; NLP Text Classification, BERT base uncased SST2 21.2% / 21.1%; NLP Document Classification, oBERT base uncased on IMDB 19.8% / 19.2%; NLP Text Classification, DistilBERT mnli 19.7% / 19.6%; NLP Token Classification, BERT base uncased conll2003 19.6% / 19.3%; NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 16.1% / 14.8%; CV Classification, ResNet-50 ImageNet 11.4% / 11.3%
OSPRay: gravity_spheres_volume/dim_512/scivis/real_time 75.8%; gravity_spheres_volume/dim_512/ao/real_time 70.8%; gravity_spheres_volume/dim_512/pathtracer/real_time 43.1%
libxsmm: 256 38.4%; 128 4.6%
miniBUDE: OpenMP - BM2 31.9%; OpenMP - BM1 26.7%
OpenVKL: vklBenchmark ISPC 18.6%
Embree: Pathtracer ISPC - Asian Dragon 18.6%; Pathtracer ISPC - Asian Dragon Obj 17%; Pathtracer ISPC - Crown 11.9%
oneDNN: Recurrent Neural Network Training - bf16bf16bf16 - CPU 12.2%
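As a worked example of how these deltas relate to the raw figures in the table below: Weld Porosity Detection FP16 on the CPU runs at 6073.22 FPS with AVX-512 versus 2551.70 FPS without, and 6073.22 / 2551.70 is roughly 2.38, i.e. the approximately +138% advantage listed above; likewise miniBUDE BM1 at 5925.67 vs. 4677.68 GFInst/s works out to about +26.7%.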
AMD EPYC 9754 Bergamo AVX-512 - Detailed Results (AVX512 On / AVX512 Off; where a test appears twice, the first value of the pair corresponds to the metric graphed further below and the second to the test's other reported metric):

minibude: OpenMP - BM1: 5925.670 / 4677.682
minibude: OpenMP - BM1: 237.027 / 187.107
minibude: OpenMP - BM2: 5972.187 / 4528.305
minibude: OpenMP - BM2: 238.887 / 181.132
libxsmm: 128: 2690.7 / 2573.3
libxsmm: 256: 3342.5 / 2415.3
embree: Pathtracer ISPC - Crown: 125.5414 / 112.2274
embree: Pathtracer ISPC - Asian Dragon: 157.6450 / 132.9504
embree: Pathtracer ISPC - Asian Dragon Obj: 134.8396 / 115.2682
openvkl: vklBenchmark ISPC: 1398 / 1179
ospray: gravity_spheres_volume/dim_512/ao/real_time: 32.7557 / 19.1731
ospray: gravity_spheres_volume/dim_512/scivis/real_time: 31.6753 / 18.0203
ospray: gravity_spheres_volume/dim_512/pathtracer/real_time: 27.9715 / 19.5463
onednn: Recurrent Neural Network Training - bf16bf16bf16 - CPU: 1174.75 / 1317.57
cpuminer-opt: x25x: 4977.89 / 3010.37
cpuminer-opt: scrypt: 2993.21 / 2033.15
cpuminer-opt: Blake-2 S: 7238650 / 4555580
cpuminer-opt: Garlicoin: 53090 / 39473
cpuminer-opt: Skeincoin: 1174953 / 937977
cpuminer-opt: Myriad-Groestl: 8628.76 / 7149.95
cpuminer-opt: LBC, LBRY Credits: 660660 / 332667
cpuminer-opt: Quad SHA-256, Pyrite: 1498937 / 876137
tensorflow: CPU - 16 - AlexNet: 342.88 / 70.10
tensorflow: CPU - 32 - AlexNet: 562.48 / 84.96
tensorflow: CPU - 64 - AlexNet: 857.55 / 97.59
tensorflow: CPU - 256 - AlexNet: 1422.36 / 106.28
tensorflow: CPU - 512 - AlexNet: 1632.40 / 109.88
tensorflow: CPU - 16 - GoogLeNet: 104.77 / 39.64
tensorflow: CPU - 16 - ResNet-50: 43.11 / 15.91
tensorflow: CPU - 32 - GoogLeNet: 180.77 / 44.46
tensorflow: CPU - 32 - ResNet-50: 71.62 / 17.48
tensorflow: CPU - 64 - GoogLeNet: 277.24 / 46.28
tensorflow: CPU - 64 - ResNet-50: 96.63 / 18.45
tensorflow: CPU - 256 - GoogLeNet: 501.67 / 46.17
tensorflow: CPU - 256 - ResNet-50: 119.32 / 19.89
tensorflow: CPU - 512 - GoogLeNet: 417.25 / 46.92
tensorflow: CPU - 512 - ResNet-50: 122.81 / 20.78
deepsparse: NLP Document Classification, oBERT base uncased on IMDB - Asynchronous Multi-Stream: 73.5393 / 61.3689
deepsparse: NLP Document Classification, oBERT base uncased on IMDB - Asynchronous Multi-Stream: 858.4029 / 1023.1690
deepsparse: NLP Sentiment Analysis, 80% Pruned Quantized BERT Base Uncased - Asynchronous Multi-Stream: 1381.5669 / 718.3032
deepsparse: NLP Sentiment Analysis, 80% Pruned Quantized BERT Base Uncased - Asynchronous Multi-Stream: 46.2608 / 88.9093
deepsparse: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Asynchronous Multi-Stream: 247.9653 / 213.6393
deepsparse: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Asynchronous Multi-Stream: 259.6468 / 298.0628
deepsparse: CV Classification, ResNet-50 ImageNet - Asynchronous Multi-Stream: 970.0568 / 870.9541
deepsparse: CV Classification, ResNet-50 ImageNet - Asynchronous Multi-Stream: 65.8909 / 73.3502
deepsparse: NLP Text Classification, DistilBERT mnli - Asynchronous Multi-Stream: 624.7370 / 522.1104
deepsparse: NLP Text Classification, DistilBERT mnli - Asynchronous Multi-Stream: 102.2238 / 122.2295
deepsparse: CV Segmentation, 90% Pruned YOLACT Pruned - Asynchronous Multi-Stream: 127.0611 / 69.3341
deepsparse: CV Segmentation, 90% Pruned YOLACT Pruned - Asynchronous Multi-Stream: 498.4779 / 906.9414
deepsparse: NLP Text Classification, BERT base uncased SST2 - Asynchronous Multi-Stream: 316.1679 / 260.7671
deepsparse: NLP Text Classification, BERT base uncased SST2 - Asynchronous Multi-Stream: 201.5964 / 244.1513
deepsparse: NLP Token Classification, BERT base uncased conll2003 - Asynchronous Multi-Stream: 73.1459 / 61.1755
deepsparse: NLP Token Classification, BERT base uncased conll2003 - Asynchronous Multi-Stream: 859.7119 / 1025.4806
openvino: Face Detection FP16 - CPU: 60.73 / 26.18
openvino: Face Detection FP16 - CPU: 1048.37 / 2423.39
openvino: Person Detection FP16 - CPU: 27.08 / 15.06
openvino: Person Detection FP16 - CPU: 2334.29 / 4153.59
openvino: Person Detection FP32 - CPU: 27.01 / 14.99
openvino: Person Detection FP32 - CPU: 2339.81 / 4170.49
openvino: Vehicle Detection FP16 - CPU: 1430.45 / 1190.91
openvino: Vehicle Detection FP16 - CPU: 44.84 / 53.79
openvino: Face Detection FP16-INT8 - CPU: 118.00 / 57.95
openvino: Face Detection FP16-INT8 - CPU: 540.35 / 1094.52
openvino: Vehicle Detection FP16-INT8 - CPU: 5690.34 / 3954.90
openvino: Vehicle Detection FP16-INT8 - CPU: 11.26 / 16.17
openvino: Weld Porosity Detection FP16 - CPU: 6073.22 / 2551.70
openvino: Weld Porosity Detection FP16 - CPU: 10.52 / 25.06
openvino: Machine Translation EN To DE FP16 - CPU: 580.40 / 251.97
openvino: Machine Translation EN To DE FP16 - CPU: 110.35 / 254.01
openvino: Weld Porosity Detection FP16-INT8 - CPU: 11818.33 / 5692.99
openvino: Weld Porosity Detection FP16-INT8 - CPU: 10.82 / 22.47
openvino: Person Vehicle Bike Detection FP16 - CPU: 6638.71 / 3317.34
openvino: Person Vehicle Bike Detection FP16 - CPU: 9.63 / 19.28
openvino: Age Gender Recognition Retail 0013 FP16 - CPU: 110240.89 / 62564.16
openvino: Age Gender Recognition Retail 0013 FP16 - CPU: 0.99 / 1.91
openvino: Age Gender Recognition Retail 0013 FP16-INT8 - CPU: 73970.12 / 66895.49
openvino: Age Gender Recognition Retail 0013 FP16-INT8 - CPU: 1.58 / 1.76
miniBUDE
MiniBUDE is a mini application for the core computation of the Bristol University Docking Engine (BUDE). This test profile currently makes use of the OpenMP implementation of miniBUDE for CPU benchmarking. Learn more via the OpenBenchmarking.org test page.
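To make the AVX-512 sensitivity concrete, here is a minimal OpenMP sketch added by the editor; it is not code from miniBUDE itself, only an illustration of the kind of data-parallel loop that a -std=c99 -Ofast -ffast-math -fopenmp -march=native build (the flags shown below) lets the compiler vectorize with the widest SIMD the host exposes, AVX-512 included.

    /* Illustrative only - not miniBUDE source.  A toy pairwise scoring loop
     * in the spirit of docking workloads; with -fopenmp -Ofast -march=native
     * the inner loop is a straightforward auto-vectorization target. */
    #include <stddef.h>

    void score_poses(size_t n,
                     const float *restrict dx, const float *restrict dy,
                     const float *restrict dz, float *restrict energy)
    {
        #pragma omp parallel for simd
        for (size_t i = 0; i < n; i++) {
            float r2 = dx[i] * dx[i] + dy[i] * dy[i] + dz[i] * dz[i];
            energy[i] = 1.0f / (r2 + 1.0f);   /* hypothetical scoring term */
        }
    }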
miniBUDE 20210901 (GFInst/s, more is better):
Implementation: OpenMP - Input Deck: BM1: AVX512 On 5925.67 (SE +/- 2.27, N = 9), AVX512 Off 4677.68 (SE +/- 18.03, N = 8)
Implementation: OpenMP - Input Deck: BM2: AVX512 On 5972.19 (SE +/- 0.44, N = 3), AVX512 Off 4528.31 (SE +/- 15.49, N = 3)
1. (CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm
libxsmm
Libxsmm is an open-source library for specialized dense and sparse matrix operations and deep learning primitives. Libxsmm supports making use of Intel AMX, AVX-512, and other modern CPU instruction set capabilities. Learn more via the OpenBenchmarking.org test page.
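For context on what the libxsmm 128 and 256 results in the overview table above measure (presumably the matrix size used by the test profile), the core primitive libxsmm accelerates is the small dense matrix multiply. The reference loop below is an editor-added illustration in plain C, not libxsmm code or its API; libxsmm instead JIT-generates architecture-specific kernels for such shapes, which is where AVX-512 comes into play.

    /* Reference small GEMM, C += A * B, column-major storage.
     * Purely illustrative of the dense primitive libxsmm specializes in;
     * libxsmm replaces loops like this with JIT-generated SIMD kernels. */
    void small_gemm_ref(int m, int n, int k,
                        const double *A,   /* m x k, leading dimension m */
                        const double *B,   /* k x n, leading dimension k */
                        double *C)         /* m x n, leading dimension m */
    {
        for (int j = 0; j < n; j++)
            for (int p = 0; p < k; p++)
                for (int i = 0; i < m; i++)
                    C[i + j * m] += A[i + p * m] * B[p + j * k];
    }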
Embree
Intel Embree is a collection of high-performance ray-tracing kernels for execution on CPUs (and GPUs via SYCL) and supporting instruction sets such as SSE, AVX, AVX2, and AVX-512. Embree also supports making use of the Intel SPMD Program Compiler (ISPC). Learn more via the OpenBenchmarking.org test page.
OSPRay
Intel OSPRay is a portable ray-tracing engine for high-performance, high-fidelity scientific visualizations. OSPRay builds off Intel's Embree and Intel SPMD Program Compiler (ISPC) components as part of the oneAPI rendering toolkit. Learn more via the OpenBenchmarking.org test page.
oneDNN
This is a test of Intel oneDNN, an Intel-optimized library for Deep Neural Networks, using its built-in benchdnn functionality. The result is the total perf time reported. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and MKL-DNN before being rebranded as part of the Intel oneAPI toolkit. Learn more via the OpenBenchmarking.org test page.
Cpuminer-Opt
Cpuminer-Opt is a fork of cpuminer-multi that carries a wide range of CPU performance optimizations for measuring the potential cryptocurrency mining performance of the processor across a variety of cryptocurrencies. The benchmark reports the hash speed for CPU mining of the selected cryptocurrency. Learn more via the OpenBenchmarking.org test page.
TensorFlow
This is a benchmark of the TensorFlow deep learning framework using the TensorFlow reference benchmarks (tensorflow/benchmarks with tf_cnn_benchmarks.py). Note that the Phoronix Test Suite also offers pts/tensorflow-lite for benchmarking the TensorFlow Lite binaries if complementary metrics are desired. Learn more via the OpenBenchmarking.org test page.
Neural Magic DeepSparse
This is a benchmark of Neural Magic's DeepSparse using its built-in deepsparse.benchmark utility and various models from their SparseZoo (https://sparsezoo.neuralmagic.com/). Learn more via the OpenBenchmarking.org test page.
Neural Magic DeepSparse 1.5 (items/sec, more is better; Scenario: Asynchronous Multi-Stream):
Model: NLP Document Classification, oBERT base uncased on IMDB: AVX512 On 73.54 (SE +/- 0.14, N = 3), AVX512 Off 61.37 (SE +/- 0.02, N = 3)
Model: NLP Sentiment Analysis, 80% Pruned Quantized BERT Base Uncased: AVX512 On 1381.57 (SE +/- 1.58, N = 3), AVX512 Off 718.30 (SE +/- 5.33, N = 3)
Model: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90: AVX512 On 247.97 (SE +/- 7.49, N = 15), AVX512 Off 213.64 (SE +/- 0.14, N = 3)
Model: CV Classification, ResNet-50 ImageNet: AVX512 On 970.06 (SE +/- 0.42, N = 3), AVX512 Off 870.95 (SE +/- 0.45, N = 3)
Model: NLP Text Classification, DistilBERT mnli: AVX512 On 624.74 (SE +/- 0.52, N = 3), AVX512 Off 522.11 (SE +/- 0.58, N = 3)
Model: CV Segmentation, 90% Pruned YOLACT Pruned: AVX512 On 127.06 (SE +/- 0.01, N = 3), AVX512 Off 69.33 (SE +/- 0.03, N = 3)
Model: NLP Text Classification, BERT base uncased SST2: AVX512 On 316.17 (SE +/- 0.78, N = 3), AVX512 Off 260.77 (SE +/- 0.63, N = 3)
Model: NLP Token Classification, BERT base uncased conll2003: AVX512 On 73.15 (SE +/- 0.19, N = 3), AVX512 Off 61.18 (SE +/- 0.13, N = 3)
OpenVINO
This is a test of Intel OpenVINO, a toolkit for neural network inference, using its built-in benchmarking support and analyzing the throughput and latency of various models. Learn more via the OpenBenchmarking.org test page.
OpenVINO 2022.3 (FPS, more is better; Device: CPU):
Model: Face Detection FP16: AVX512 On 60.73 (SE +/- 0.06, N = 3), AVX512 Off 26.18 (SE +/- 0.36, N = 3)
Model: Person Detection FP16: AVX512 On 27.08 (SE +/- 0.30, N = 12), AVX512 Off 15.06 (SE +/- 0.21, N = 3)
Model: Person Detection FP32: AVX512 On 27.01 (SE +/- 0.18, N = 12), AVX512 Off 14.99 (SE +/- 0.16, N = 5)
Model: Vehicle Detection FP16: AVX512 On 1430.45 (SE +/- 22.93, N = 15), AVX512 Off 1190.91 (SE +/- 15.59, N = 14)
Model: Face Detection FP16-INT8: AVX512 On 118.00 (SE +/- 0.02, N = 3), AVX512 Off 57.95 (SE +/- 0.03, N = 3)
Model: Vehicle Detection FP16-INT8: AVX512 On 5690.34 (SE +/- 89.26, N = 15), AVX512 Off 3954.90 (SE +/- 2.12, N = 3)
Model: Weld Porosity Detection FP16: AVX512 On 6073.22 (SE +/- 1.32, N = 3), AVX512 Off 2551.70 (SE +/- 0.56, N = 3)
Model: Machine Translation EN To DE FP16: AVX512 On 580.40 (SE +/- 7.10, N = 15), AVX512 Off 251.97 (SE +/- 3.15, N = 15)
Model: Weld Porosity Detection FP16-INT8: AVX512 On 11818.33 (SE +/- 1.17, N = 3), AVX512 Off 5692.99 (SE +/- 1.65, N = 3)
Model: Person Vehicle Bike Detection FP16: AVX512 On 6638.71 (SE +/- 13.51, N = 3), AVX512 Off 3317.34 (SE +/- 26.07, N = 15)
Model: Age Gender Recognition Retail 0013 FP16: AVX512 On 110240.89 (SE +/- 314.35, N = 3), AVX512 Off 62564.16 (SE +/- 278.13, N = 3)
Model: Age Gender Recognition Retail 0013 FP16-INT8: AVX512 On 73970.12 (SE +/- 95.74, N = 3), AVX512 Off 66895.49 (SE +/- 20.10, N = 3)
1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
CPU Peak Freq (Highest CPU Core Frequency) Monitor (Megahertz): AVX512 On Min 2250 / Avg 2918.06 / Max 3532, AVX512 Off Min 2203 / Avg 2979.69 / Max 3559
CPU Power Consumption Monitor (Watts): AVX512 On Min 10.25 / Avg 231.36 / Max 398.39, AVX512 Off Min 10.15 / Avg 179.15 / Max 378.14
CPU Temperature Monitor (Celsius): AVX512 On Min 23.25 / Avg 51.4 / Max 74.25, AVX512 Off Min 20.75 / Avg 44.22 / Max 76.13
AVX512 On
Testing initiated at 16 July 2023 06:38 by user phoronix.
AVX512 Off
Testing initiated at 16 July 2023 14:04 by user phoronix.