AMD EPYC 9754 Bergamo AVX-512

AMD EPYC 9754 1P benchmarks with AVX-512 enabled and then with AVX-512 disabled. Tests by Michael Larabel for a future article.

Compare your own system(s) to this result file with the Phoronix Test Suite by running the command: phoronix-test-suite benchmark 2307197-NE-AMDBERGAM43
Run Management

Run | Date | Test Duration
AVX512 On | July 16 2023 | 7 Hours, 54 Minutes
AVX512 Off | July 16 2023 | 11 Hours, 27 Minutes
Average | | 9 Hours, 40 Minutes


AMD EPYC 9754 Bergamo AVX-512 Benchmarks (OpenBenchmarking.org, Phoronix Test Suite)

Processor: AMD EPYC 9754 128-Core @ 2.25GHz (128 Cores / 256 Threads)
Motherboard: AMD Titanite_4G (RTI1007B BIOS)
Chipset: AMD Device 14a4
Memory: 768GB
Disk: 2 x 1920GB SAMSUNG MZWLJ1T9HBJR-00007
Graphics: ASPEED
Network: Broadcom NetXtreme BCM5720 PCIe
OS: Ubuntu 22.04
Kernel: 5.19.0-41-generic (x86_64)
Desktop: GNOME Shell 42.5
Display Server: X Server 1.21.1.4
Vulkan: 1.3.224
Compiler: GCC 11.3.0
File-System: ext4
Screen Resolution: 1024x768

System Logs
- Transparent Huge Pages: madvise
- --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-11-xKiWfi/gcc-11-11.3.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-xKiWfi/gcc-11-11.3.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
- Scaling Governor: acpi-cpufreq performance (Boost: Enabled)
- CPU Microcode: 0xaa0010b
- Python 3.10.6
- itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: always-on RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected

AVX512 On vs. AVX512 Off Comparison (Phoronix Test Suite), difference versus baseline

TensorFlow: CPU - 512 - GoogLeNet: 789.3%
TensorFlow: CPU - 64 - AlexNet: 778.7%
TensorFlow: CPU - 32 - AlexNet: 562.1%
TensorFlow: CPU - 256 - ResNet-50: 499.9%
TensorFlow: CPU - 64 - GoogLeNet: 499%
TensorFlow: CPU - 512 - ResNet-50: 491%
TensorFlow: CPU - 64 - ResNet-50: 423.7%
TensorFlow: CPU - 16 - AlexNet: 389.1%
TensorFlow: CPU - 32 - ResNet-50: 309.7%
TensorFlow: CPU - 32 - GoogLeNet: 306.6%
TensorFlow: CPU - 16 - ResNet-50: 171%
TensorFlow: CPU - 16 - GoogLeNet: 164.3%
OpenVINO: W.P.D.F - CPU: 138.2%
OpenVINO: W.P.D.F - CPU: 138%
OpenVINO: F.D.F - CPU: 132%
OpenVINO: F.D.F - CPU: 131.2%
OpenVINO: M.T.E.T.D.F - CPU: 130.3%
OpenVINO: M.T.E.T.D.F - CPU: 130.2%
OpenVINO: W.P.D.F.I - CPU: 107.7%
OpenVINO: W.P.D.F.I - CPU: 107.6%
OpenVINO: F.D.F.I - CPU: 103.6%
OpenVINO: F.D.F.I - CPU: 102.6%
OpenVINO: P.V.B.D.F - CPU: 100.2%
OpenVINO: P.V.B.D.F - CPU: 100.1%
TensorFlow: CPU - 512 - AlexNet: 1385.6%
TensorFlow: CPU - 256 - AlexNet: 1238.3%
TensorFlow: CPU - 256 - GoogLeNet: 986.6%
Cpuminer-Opt: LBC, LBRY Credits: 98.6%
OpenVINO: A.G.R.R.0.F - CPU: 92.9%
Neural Magic DeepSparse: N.S.A.8.P.Q.B.B.U - A.M.S: 92.3%
Neural Magic DeepSparse: N.S.A.8.P.Q.B.B.U - A.M.S: 92.2%
Neural Magic DeepSparse: C.S.9.P.Y.P - A.M.S: 83.3%
Neural Magic DeepSparse: C.S.9.P.Y.P - A.M.S: 81.9%
OpenVINO: P.D.F - CPU: 80.2%
OpenVINO: P.D.F - CPU: 79.8%
OpenVINO: P.D.F - CPU: 78.2%
OpenVINO: P.D.F - CPU: 77.9%
OpenVINO: A.G.R.R.0.F - CPU: 76.2%
OSPRay: gravity_spheres_volume/dim_512/scivis/real_time: 75.8%
Cpuminer-Opt: Q.S.2.P: 71.1%
OSPRay: gravity_spheres_volume/dim_512/ao/real_time: 70.8%
Cpuminer-Opt: x25x: 65.4%
Cpuminer-Opt: Blake-2 S: 58.9%
Cpuminer-Opt: scrypt: 47.2%
OpenVINO: V.D.F.I - CPU: 43.9%
OpenVINO: V.D.F.I - CPU: 43.6%
OSPRay: gravity_spheres_volume/dim_512/pathtracer/real_time: 43.1%
libxsmm: 256: 38.4%
Cpuminer-Opt: Garlicoin: 34.5%
miniBUDE: OpenMP - BM2: 31.9%
miniBUDE: OpenMP - BM2: 31.9%
miniBUDE: OpenMP - BM1: 26.7%
miniBUDE: OpenMP - BM1: 26.7%
Cpuminer-Opt: Skeincoin: 25.3%
Neural Magic DeepSparse: N.T.C.B.b.u.S - A.M.S: 21.2%
Neural Magic DeepSparse: N.T.C.B.b.u.S - A.M.S: 21.1%
Cpuminer-Opt: Myriad-Groestl: 20.7%
OpenVINO: V.D.F - CPU: 20.1%
OpenVINO: V.D.F - CPU: 20%
Neural Magic DeepSparse: N.D.C.o.b.u.o.I - A.M.S: 19.8%
Neural Magic DeepSparse: N.T.C.D.m - A.M.S: 19.7%
Neural Magic DeepSparse: N.T.C.D.m - A.M.S: 19.6%
Neural Magic DeepSparse: N.T.C.B.b.u.c - A.M.S: 19.6%
Neural Magic DeepSparse: N.T.C.B.b.u.c - A.M.S: 19.3%
Neural Magic DeepSparse: N.D.C.o.b.u.o.I - A.M.S: 19.2%
OpenVKL: vklBenchmark ISPC: 18.6%
Embree: Pathtracer ISPC - Asian Dragon: 18.6%
Embree: Pathtracer ISPC - Asian Dragon Obj: 17%
Neural Magic DeepSparse: N.Q.A.B.b.u.S.1.P - A.M.S: 16.1%
Neural Magic DeepSparse: N.Q.A.B.b.u.S.1.P - A.M.S: 14.8%
oneDNN: R.N.N.T - bf16bf16bf16 - CPU: 12.2%
Embree: Pathtracer ISPC - Crown: 11.9%
OpenVINO: A.G.R.R.0.F.I - CPU: 11.4%
Neural Magic DeepSparse: C.C.R.5.I - A.M.S: 11.4%
Neural Magic DeepSparse: C.C.R.5.I - A.M.S: 11.3%
OpenVINO: A.G.R.R.0.F.I - CPU: 10.6%
libxsmm: 128: 4.6%
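
The percentages in the comparison appear to be the relative advantage of the AVX512 On run over the AVX512 Off run. The sketch below reproduces two of them from the raw results under that assumption. The function name `percent_gain` and the `higher_is_better` switch are hypothetical helpers, not part of the Phoronix Test Suite; treating latency-style results as lower-is-better is likewise an assumption.

```python
def percent_gain(on: float, off: float, higher_is_better: bool = True) -> float:
    """Relative advantage of the AVX512 On result over AVX512 Off, in percent.

    Assumption: for higher-is-better metrics the ratio is on/off; for
    lower-is-better metrics (e.g. latency) the ratio is inverted.
    """
    ratio = on / off if higher_is_better else off / on
    return (ratio - 1.0) * 100.0


# TensorFlow, CPU - 512 - AlexNet (images/sec, more is better)
print(round(percent_gain(1632.40, 109.88), 1))

# libxsmm, M N K: 128 (GFLOPS/s, more is better)
print(round(percent_gain(2690.7, 2573.3), 1))
```

The two printed values can be checked against the TensorFlow "CPU - 512 - AlexNet" and libxsmm "128" entries in the comparison.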

AMD EPYC 9754 Bergamo AVX-512 (OpenBenchmarking.org)

Test | AVX512 On | AVX512 Off
minibude: OpenMP - BM1 | 237.027 | 187.107
minibude: OpenMP - BM2 | 238.887 | 181.132
openvino: Face Detection FP16 - CPU | 60.73 | 26.18
openvino: Person Detection FP16 - CPU | 27.08 | 15.06
openvino: Person Detection FP32 - CPU | 27.01 | 14.99
openvino: Vehicle Detection FP16 - CPU | 1430.45 | 1190.91
openvino: Face Detection FP16-INT8 - CPU | 118.00 | 57.95
openvino: Vehicle Detection FP16-INT8 - CPU | 5690.34 | 3954.90
openvino: Weld Porosity Detection FP16 - CPU | 6073.22 | 2551.70
openvino: Machine Translation EN To DE FP16 - CPU | 580.40 | 251.97
openvino: Weld Porosity Detection FP16-INT8 - CPU | 11818.33 | 5692.99
openvino: Person Vehicle Bike Detection FP16 - CPU | 6638.71 | 3317.34
openvino: Age Gender Recognition Retail 0013 FP16 - CPU | 110240.89 | 62564.16
openvino: Age Gender Recognition Retail 0013 FP16-INT8 - CPU | 73970.12 | 66895.49
embree: Pathtracer ISPC - Crown | 125.5414 | 112.2274
embree: Pathtracer ISPC - Asian Dragon | 157.6450 | 132.9504
embree: Pathtracer ISPC - Asian Dragon Obj | 134.8396 | 115.2682
minibude: OpenMP - BM1 | 5925.670 | 4677.682
minibude: OpenMP - BM2 | 5972.187 | 4528.305
libxsmm: 256 | 3342.5 | 2415.3
libxsmm: 128 | 2690.7 | 2573.3
tensorflow: CPU - 64 - ResNet-50 | 96.63 | 18.45
tensorflow: CPU - 16 - AlexNet | 342.88 | 70.10
tensorflow: CPU - 32 - AlexNet | 562.48 | 84.96
tensorflow: CPU - 64 - AlexNet | 857.55 | 97.59
tensorflow: CPU - 256 - AlexNet | 1422.36 | 106.28
tensorflow: CPU - 512 - AlexNet | 1632.40 | 109.88
tensorflow: CPU - 16 - GoogLeNet | 104.77 | 39.64
tensorflow: CPU - 16 - ResNet-50 | 43.11 | 15.91
tensorflow: CPU - 32 - GoogLeNet | 180.77 | 44.46
tensorflow: CPU - 32 - ResNet-50 | 71.62 | 17.48
tensorflow: CPU - 64 - GoogLeNet | 277.24 | 46.28
tensorflow: CPU - 256 - GoogLeNet | 501.67 | 46.17
tensorflow: CPU - 256 - ResNet-50 | 119.32 | 19.89
tensorflow: CPU - 512 - GoogLeNet | 417.25 | 46.92
tensorflow: CPU - 512 - ResNet-50 | 122.81 | 20.78
openvkl: vklBenchmark ISPC | 1398 | 1179
ospray: gravity_spheres_volume/dim_512/ao/real_time | 32.7557 | 19.1731
ospray: gravity_spheres_volume/dim_512/scivis/real_time | 31.6753 | 18.0203
ospray: gravity_spheres_volume/dim_512/pathtracer/real_time | 27.9715 | 19.5463
deepsparse: NLP Document Classification, oBERT base uncased on IMDB - Asynchronous Multi-Stream | 73.5393 | 61.3689
deepsparse: NLP Sentiment Analysis, 80% Pruned Quantized BERT Base Uncased - Asynchronous Multi-Stream | 1381.5669 | 718.3032
deepsparse: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Asynchronous Multi-Stream | 247.9653 | 213.6393
deepsparse: CV Classification, ResNet-50 ImageNet - Asynchronous Multi-Stream | 970.0568 | 870.9541
deepsparse: NLP Text Classification, DistilBERT mnli - Asynchronous Multi-Stream | 624.7370 | 522.1104
deepsparse: CV Segmentation, 90% Pruned YOLACT Pruned - Asynchronous Multi-Stream | 127.0611 | 69.3341
deepsparse: NLP Text Classification, BERT base uncased SST2 - Asynchronous Multi-Stream | 316.1679 | 260.7671
deepsparse: NLP Token Classification, BERT base uncased conll2003 - Asynchronous Multi-Stream | 73.1459 | 61.1755
cpuminer-opt: scrypt | 2993.21 | 2033.15
cpuminer-opt: Skeincoin | 1174953 | 937977
cpuminer-opt: Myriad-Groestl | 8628.76 | 7149.95
cpuminer-opt: x25x | 4977.89 | 3010.37
cpuminer-opt: Blake-2 S | 7238650 | 4555580
cpuminer-opt: Garlicoin | 53090 | 39473
cpuminer-opt: LBC, LBRY Credits | 660660 | 332667
cpuminer-opt: Quad SHA-256, Pyrite | 1498937 | 876137
onednn: Recurrent Neural Network Training - bf16bf16bf16 - CPU | 1174.75 | 1317.57
openvino: Face Detection FP16 - CPU | 1048.37 | 2423.39
openvino: Person Detection FP16 - CPU | 2334.29 | 4153.59
openvino: Person Detection FP32 - CPU | 2339.81 | 4170.49
openvino: Vehicle Detection FP16 - CPU | 44.84 | 53.79
openvino: Face Detection FP16-INT8 - CPU | 540.35 | 1094.52
openvino: Vehicle Detection FP16-INT8 - CPU | 11.26 | 16.17
openvino: Weld Porosity Detection FP16 - CPU | 10.52 | 25.06
openvino: Machine Translation EN To DE FP16 - CPU | 110.35 | 254.01
openvino: Weld Porosity Detection FP16-INT8 - CPU | 10.82 | 22.47
openvino: Person Vehicle Bike Detection FP16 - CPU | 9.63 | 19.28
openvino: Age Gender Recognition Retail 0013 FP16 - CPU | 0.99 | 1.91
openvino: Age Gender Recognition Retail 0013 FP16-INT8 - CPU | 1.58 | 1.76
deepsparse: NLP Document Classification, oBERT base uncased on IMDB - Asynchronous Multi-Stream | 858.4029 | 1023.1690
deepsparse: NLP Sentiment Analysis, 80% Pruned Quantized BERT Base Uncased - Asynchronous Multi-Stream | 46.2608 | 88.9093
deepsparse: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Asynchronous Multi-Stream | 259.6468 | 298.0628
deepsparse: CV Classification, ResNet-50 ImageNet - Asynchronous Multi-Stream | 65.8909 | 73.3502
deepsparse: NLP Text Classification, DistilBERT mnli - Asynchronous Multi-Stream | 102.2238 | 122.2295
deepsparse: CV Segmentation, 90% Pruned YOLACT Pruned - Asynchronous Multi-Stream | 498.4779 | 906.9414
deepsparse: NLP Text Classification, BERT base uncased SST2 - Asynchronous Multi-Stream | 201.5964 | 244.1513
deepsparse: NLP Token Classification, BERT base uncased conll2003 - Asynchronous Multi-Stream | 859.7119 | 1025.4806

CPU Temperature Monitor

Celsius (Phoronix Test Suite System Monitoring)
AVX512 On: Min: 23.25 / Avg: 51.4 / Max: 74.25
AVX512 Off: Min: 20.75 / Avg: 44.22 / Max: 76.13

CPU Peak Freq (Highest CPU Core Frequency) Monitor

Megahertz (Phoronix Test Suite System Monitoring)
AVX512 Off: Min: 2203 / Avg: 2979.69 / Max: 3559
AVX512 On: Min: 2250 / Avg: 2918.06 / Max: 3532

CPU Power Consumption Monitor

Watts (Phoronix Test Suite System Monitoring)
AVX512 On: Min: 10.25 / Avg: 231.36 / Max: 398.39
AVX512 Off: Min: 10.15 / Avg: 179.15 / Max: 378.14
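
As a rough illustration of what the power figures imply, the sketch below multiplies each run's average CPU power by its test duration (7 Hours, 54 Minutes for AVX512 On; 11 Hours, 27 Minutes for AVX512 Off) to estimate total CPU energy. This assumes the power monitor average covers the whole run and that the first Min/Avg/Max triple belongs to AVX512 On; the helper name `energy_kwh` is hypothetical.

```python
def energy_kwh(avg_watts: float, hours: int, minutes: int) -> float:
    """Estimate energy in kWh as average power multiplied by elapsed time.

    Assumption: avg_watts is representative of the entire run duration.
    """
    elapsed_hours = hours + minutes / 60.0
    return avg_watts * elapsed_hours / 1000.0


# Average CPU power and run durations as reported on this result page.
on_kwh = energy_kwh(231.36, 7, 54)    # AVX512 On
off_kwh = energy_kwh(179.15, 11, 27)  # AVX512 Off

print(f"AVX512 On:  {on_kwh:.2f} kWh")
print(f"AVX512 Off: {off_kwh:.2f} kWh")
```

Under these assumptions the AVX512 On run draws more power on average but finishes sooner, so its estimated total CPU energy comes out somewhat lower.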

miniBUDE

MiniBUDE is a mini application for the core computation of the Bristol University Docking Engine (BUDE). This test profile currently makes use of the OpenMP implementation of miniBUDE for CPU benchmarking. Learn more via the OpenBenchmarking.org test page.

miniBUDE 20210901 (Billion Interactions/s, More Is Better)
Implementation: OpenMP - Input Deck: BM1: AVX512 Off 187.11 (SE +/- 0.72, N = 8); AVX512 On 237.03 (SE +/- 0.09, N = 9)
Implementation: OpenMP - Input Deck: BM2: AVX512 Off 181.13 (SE +/- 0.62, N = 3); AVX512 On 238.89 (SE +/- 0.02, N = 3)
1. (CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm

OpenVINO

This is a test of Intel OpenVINO, a toolkit for neural networks, using its built-in benchmarking support and analyzing the throughput and latency for various models. Learn more via the OpenBenchmarking.org test page.

OpenVINO 2022.3 (FPS, More Is Better)
Model: Face Detection FP16 - Device: CPU: AVX512 Off 26.18 (SE +/- 0.36, N = 3); AVX512 On 60.73 (SE +/- 0.06, N = 3)
Model: Person Detection FP16 - Device: CPU: AVX512 Off 15.06 (SE +/- 0.21, N = 3); AVX512 On 27.08 (SE +/- 0.30, N = 12)
Model: Person Detection FP32 - Device: CPU: AVX512 Off 14.99 (SE +/- 0.16, N = 5); AVX512 On 27.01 (SE +/- 0.18, N = 12)
Model: Vehicle Detection FP16 - Device: CPU: AVX512 Off 1190.91 (SE +/- 15.59, N = 14); AVX512 On 1430.45 (SE +/- 22.93, N = 15)
Model: Face Detection FP16-INT8 - Device: CPU: AVX512 Off 57.95 (SE +/- 0.03, N = 3); AVX512 On 118.00 (SE +/- 0.02, N = 3)
Model: Vehicle Detection FP16-INT8 - Device: CPU: AVX512 Off 3954.90 (SE +/- 2.12, N = 3); AVX512 On 5690.34 (SE +/- 89.26, N = 15)
Model: Weld Porosity Detection FP16 - Device: CPU: AVX512 Off 2551.70 (SE +/- 0.56, N = 3); AVX512 On 6073.22 (SE +/- 1.32, N = 3)
Model: Machine Translation EN To DE FP16 - Device: CPU: AVX512 Off 251.97 (SE +/- 3.15, N = 15); AVX512 On 580.40 (SE +/- 7.10, N = 15)
Model: Weld Porosity Detection FP16-INT8 - Device: CPU: AVX512 Off 5692.99 (SE +/- 1.65, N = 3); AVX512 On 11818.33 (SE +/- 1.17, N = 3)
Model: Person Vehicle Bike Detection FP16 - Device: CPU: AVX512 Off 3317.34 (SE +/- 26.07, N = 15); AVX512 On 6638.71 (SE +/- 13.51, N = 3)
Model: Age Gender Recognition Retail 0013 FP16 - Device: CPU: AVX512 Off 62564.16 (SE +/- 278.13, N = 3); AVX512 On 110240.89 (SE +/- 314.35, N = 3)
Model: Age Gender Recognition Retail 0013 FP16-INT8 - Device: CPU: AVX512 Off 66895.49 (SE +/- 20.10, N = 3); AVX512 On 73970.12 (SE +/- 95.74, N = 3)
1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

Embree

Embree 4.1 (More Is Better)
Binary: Pathtracer ISPC - Model: Crown (Frames Per Second Per Watt): AVX512 Off 0.560; AVX512 On 0.672
Binary: Pathtracer ISPC - Model: Asian Dragon (Frames Per Second Per Watt): AVX512 Off 0.715; AVX512 On 0.920
Binary: Pathtracer ISPC - Model: Crown (Frames Per Second): AVX512 Off 112.23 (SE +/- 0.11, N = 7, MIN: 109.49 / MAX: 117.02); AVX512 On 125.54 (SE +/- 0.09, N = 7, MIN: 122.35 / MAX: 131.95)
Binary: Pathtracer ISPC - Model: Asian Dragon Obj (Frames Per Second Per Watt): AVX512 Off 0.842; AVX512 On 1.070
Binary: Pathtracer ISPC - Model: Asian Dragon (Frames Per Second): AVX512 Off 132.95 (SE +/- 0.11, N = 7, MIN: 130.4 / MAX: 138.16); AVX512 On 157.65 (SE +/- 0.09, N = 8, MIN: 155.33 / MAX: 162.92)
Binary: Pathtracer ISPC - Model: Asian Dragon Obj (Frames Per Second): AVX512 Off 115.27 (SE +/- 0.08, N = 4, MIN: 113.48 / MAX: 119.15); AVX512 On 134.84 (SE +/- 0.16, N = 4, MIN: 132.49 / MAX: 139)

miniBUDE

miniBUDE 20210901 (GFInst/s, More Is Better)
Implementation: OpenMP - Input Deck: BM1: AVX512 Off 4677.68 (SE +/- 18.03, N = 8); AVX512 On 5925.67 (SE +/- 2.27, N = 9)
Implementation: OpenMP - Input Deck: BM2: AVX512 Off 4528.31 (SE +/- 15.49, N = 3); AVX512 On 5972.19 (SE +/- 0.44, N = 3)
1. (CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm

libxsmm

libxsmm 2-1.17-3645 (More Is Better)
M N K: 128 (GFLOPS/s Per Watt): AVX512 Off 12.53; AVX512 On 13.07
M N K: 256 (GFLOPS/s): AVX512 Off 2415.3 (SE +/- 6.53, N = 3); AVX512 On 3342.5 (SE +/- 5.78, N = 3)
M N K: 128 (GFLOPS/s): AVX512 Off 2573.3 (SE +/- 2.70, N = 3); AVX512 On 2690.7 (SE +/- 12.20, N = 3)
1. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -msse4.2
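
The libxsmm M N K: 128 result is the smallest gain on this page, so it is a reasonable place to ask whether the difference exceeds run-to-run noise. The sketch below builds a rough interval of mean +/- 2 standard errors for each run and checks for overlap. The factor of 2 is an assumed rule of thumb (roughly 95% under a normal approximation, and optimistic for N = 3), and the helper names are hypothetical.

```python
def interval(mean: float, se: float, k: float = 2.0) -> tuple[float, float]:
    """Return (low, high) = mean -/+ k standard errors. k = 2 is an assumption."""
    return (mean - k * se, mean + k * se)


def overlaps(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True when the two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# libxsmm, M N K: 128, GFLOPS/s (mean and SE as reported, N = 3 each)
off = interval(2573.3, 2.70)   # AVX512 Off
on = interval(2690.7, 12.20)   # AVX512 On

print("AVX512 Off:", off)
print("AVX512 On: ", on)
print("Intervals overlap:", overlaps(off, on))
```

Non-overlapping intervals suggest the gap is larger than the reported measurement spread, though with three runs per configuration this is only an informal check.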

TensorFlow

TensorFlow 2.12 (More Is Better)
Device: CPU - Batch Size: 16 - Model: AlexNet (images/sec Per Watt): AVX512 Off 0.576; AVX512 On 2.966
Device: CPU - Batch Size: 32 - Model: AlexNet (images/sec Per Watt): AVX512 Off 0.691; AVX512 On 4.376
Device: CPU - Batch Size: 64 - Model: AlexNet (images/sec Per Watt): AVX512 Off 0.778; AVX512 On 5.938
Device: CPU - Batch Size: 256 - Model: AlexNet (images/sec Per Watt): AVX512 Off 0.839; AVX512 On 6.984
Device: CPU - Batch Size: 512 - Model: AlexNet (images/sec Per Watt): AVX512 Off 0.865; AVX512 On 7.097
Device: CPU - Batch Size: 16 - Model: GoogLeNet (images/sec Per Watt): AVX512 Off 0.335; AVX512 On 0.774
Device: CPU - Batch Size: 16 - Model: ResNet-50 (images/sec Per Watt): AVX512 Off 0.113; AVX512 On 0.294
Device: CPU - Batch Size: 32 - Model: GoogLeNet (images/sec Per Watt): AVX512 Off 0.360; AVX512 On 1.138
Device: CPU - Batch Size: 32 - Model: ResNet-50 (images/sec Per Watt): AVX512 Off 0.124; AVX512 On 0.407
Device: CPU - Batch Size: 64 - Model: GoogLeNet (images/sec Per Watt): AVX512 Off 0.369; AVX512 On 1.476
Device: CPU - Batch Size: 64 - Model: ResNet-50 (images/sec): AVX512 Off 18.45 (SE +/- 0.03, N = 3); AVX512 On 96.63 (SE +/- 0.06, N = 3)
Device: CPU - Batch Size: 256 - Model: GoogLeNet (images/sec Per Watt): AVX512 Off 0.362; AVX512 On 2.078
Device: CPU - Batch Size: 16 - Model: AlexNet (images/sec): AVX512 Off 70.10 (SE +/- 0.29, N = 3); AVX512 On 342.88 (SE +/- 0.70, N = 6)
Device: CPU - Batch Size: 32 - Model: AlexNet (images/sec): AVX512 Off 84.96 (SE +/- 0.16, N = 3); AVX512 On 562.48 (SE +/- 2.06, N = 6)
Device: CPU - Batch Size: 64 - Model: AlexNet (images/sec): AVX512 Off 97.59 (SE +/- 0.08, N = 3); AVX512 On 857.55 (SE +/- 1.99, N = 5)
Device: CPU - Batch Size: 256 - Model: ResNet-50 (images/sec Per Watt): AVX512 Off 0.143; AVX512 On 0.537
Device: CPU - Batch Size: 256 - Model: AlexNet (images/sec): AVX512 Off 106.28 (SE +/- 0.24, N = 3); AVX512 On 1422.36 (SE +/- 6.55, N = 3)
Device: CPU - Batch Size: 512 - Model: AlexNet (images/sec): AVX512 Off 109.88 (SE +/- 0.06, N = 3); AVX512 On 1632.40 (SE +/- 1.53, N = 3)
Device: CPU - Batch Size: 16 - Model: GoogLeNet (images/sec): AVX512 Off 39.64 (SE +/- 0.17, N = 3); AVX512 On 104.77 (SE +/- 1.96, N = 15)
Device: CPU - Batch Size: 16 - Model: ResNet-50 (images/sec): AVX512 Off 15.91 (SE +/- 0.05, N = 3); AVX512 On 43.11 (SE +/- 0.03, N = 3)
Device: CPU - Batch Size: 32 - Model: GoogLeNet (images/sec): AVX512 Off 44.46 (SE +/- 0.10, N = 3); AVX512 On 180.77 (SE +/- 3.09, N = 15)
Device: CPU - Batch Size: 32 - Model: ResNet-50 (images/sec): AVX512 Off 17.48 (SE +/- 0.04, N = 3); AVX512 On 71.62 (SE +/- 0.23, N = 3)
Device: CPU - Batch Size: 512 - Model: GoogLeNet (images/sec Per Watt): AVX512 Off 0.369; AVX512 On 1.766
Device: CPU - Batch Size: 64 - Model: GoogLeNet (images/sec): AVX512 Off 46.28 (SE +/- 0.18, N = 3); AVX512 On 277.24 (SE +/- 2.62, N = 15)
Device: CPU - Batch Size: 256 - Model: GoogLeNet (images/sec): AVX512 Off 46.17 (SE +/- 0.06, N = 3); AVX512 On 501.67 (SE +/- 4.92, N = 3)
Device: CPU - Batch Size: 512 - Model: ResNet-50 (images/sec Per Watt): AVX512 Off 0.148; AVX512 On 0.542
Device: CPU - Batch Size: 256 - Model: ResNet-50 (images/sec): AVX512 Off 19.89 (SE +/- 0.02, N = 3); AVX512 On 119.32 (SE +/- 1.01, N = 12)
Device: CPU - Batch Size: 512 - Model: GoogLeNet (images/sec): AVX512 Off 46.92 (SE +/- 0.05, N = 3); AVX512 On 417.25 (SE +/- 5.30, N = 12)
Device: CPU - Batch Size: 512 - Model: ResNet-50 (images/sec): AVX512 Off 20.78 (SE +/- 0.04, N = 3); AVX512 On 122.81 (SE +/- 0.99, N = 3)
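
Where both a throughput figure and a per-Watt figure are reported for the same test, dividing one by the other gives the average CPU power implied for that test. The sketch below does this for AlexNet at batch size 512, assuming the per-Watt metric is simply throughput divided by average CPU power during the test. The helper name `implied_watts` is hypothetical.

```python
def implied_watts(throughput: float, throughput_per_watt: float) -> float:
    """Average power implied by a throughput and its per-Watt counterpart.

    Assumption: per-Watt result = throughput / average CPU power.
    """
    return throughput / throughput_per_watt


# TensorFlow 2.12, Device: CPU - Batch Size: 512 - Model: AlexNet
on_watts = implied_watts(1632.40, 7.097)   # AVX512 On
off_watts = implied_watts(109.88, 0.865)   # AVX512 Off

print(f"AVX512 On:  ~{on_watts:.0f} W")
print(f"AVX512 Off: ~{off_watts:.0f} W")
```

Under this assumption the AVX512 On run draws noticeably more power on this test, yet delivers far more images per second per Watt.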

OpenVKL

OpenVKL is the Intel Open Volume Kernel Library, which offers high-performance volume computation kernels and is part of the Intel oneAPI rendering toolkit. Learn more via the OpenBenchmarking.org test page.

OpenVKL 1.3.1 (Items / Sec, More Is Better)
Benchmark: vklBenchmark ISPC: AVX512 Off 1179 (SE +/- 0.33, N = 3, MIN: 178 / MAX: 10473); AVX512 On 1398 (SE +/- 2.65, N = 3, MIN: 229 / MAX: 11779)

OSPRay

Intel OSPRay is a portable ray-tracing engine for high-performance, high-fidelity scientific visualizations. OSPRay builds off Intel's Embree and Intel SPMD Program Compiler (ISPC) components as part of the oneAPI rendering toolkit. Learn more via the OpenBenchmarking.org test page.

OSPRay 2.12 (Items Per Second, More Is Better)
Benchmark: gravity_spheres_volume/dim_512/ao/real_time: AVX512 Off 19.17 (SE +/- 0.02, N = 3); AVX512 On 32.76 (SE +/- 0.01, N = 3)
Benchmark: gravity_spheres_volume/dim_512/scivis/real_time: AVX512 Off 18.02 (SE +/- 0.00, N = 3); AVX512 On 31.68 (SE +/- 0.02, N = 3)
Benchmark: gravity_spheres_volume/dim_512/pathtracer/real_time: AVX512 Off 19.55 (SE +/- 0.01, N = 3); AVX512 On 27.97 (SE +/- 0.01, N = 3)

Neural Magic DeepSparse

This is a benchmark of Neural Magic's DeepSparse using its built-in deepsparse.benchmark utility and various models from their SparseZoo (https://sparsezoo.neuralmagic.com/). Learn more via the OpenBenchmarking.org test page.

Neural Magic DeepSparse 1.5 (items/sec, More Is Better), Scenario: Asynchronous Multi-Stream
Model: NLP Document Classification, oBERT base uncased on IMDB: AVX512 Off 61.37 (SE +/- 0.02, N = 3); AVX512 On 73.54 (SE +/- 0.14, N = 3)
Model: NLP Sentiment Analysis, 80% Pruned Quantized BERT Base Uncased: AVX512 Off 718.30 (SE +/- 5.33, N = 3); AVX512 On 1381.57 (SE +/- 1.58, N = 3)
Model: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90: AVX512 Off 213.64 (SE +/- 0.14, N = 3); AVX512 On 247.97 (SE +/- 7.49, N = 15)
Model: CV Classification, ResNet-50 ImageNet: AVX512 Off 870.95 (SE +/- 0.45, N = 3); AVX512 On 970.06 (SE +/- 0.42, N = 3)
Model: NLP Text Classification, DistilBERT mnli: AVX512 Off 522.11 (SE +/- 0.58, N = 3); AVX512 On 624.74 (SE +/- 0.52, N = 3)
Model: CV Segmentation, 90% Pruned YOLACT Pruned: AVX512 Off 69.33 (SE +/- 0.03, N = 3); AVX512 On 127.06 (SE +/- 0.01, N = 3)
Model: NLP Text Classification, BERT base uncased SST2: AVX512 Off 260.77 (SE +/- 0.63, N = 3); AVX512 On 316.17 (SE +/- 0.78, N = 3)
Model: NLP Token Classification, BERT base uncased conll2003: AVX512 Off 61.18 (SE +/- 0.13, N = 3); AVX512 On 73.15 (SE +/- 0.19, N = 3)

Cpuminer-Opt

OpenBenchmarking.orgkH/s Per Watt, More Is BetterCpuminer-Opt 3.20.3Algorithm: x25xAVX512 OffAVX512 On4812162010.8116.85

OpenBenchmarking.orgkH/s Per Watt, More Is BetterCpuminer-Opt 3.20.3Algorithm: scryptAVX512 OffAVX512 On2468106.4768.913

OpenBenchmarking.orgkH/s Per Watt, More Is BetterCpuminer-Opt 3.20.3Algorithm: Blake-2 SAVX512 OffAVX512 On5K10K15K20K25K14551.2422160.82

OpenBenchmarking.orgkH/s Per Watt, More Is BetterCpuminer-Opt 3.20.3Algorithm: GarlicoinAVX512 OffAVX512 On60120180240300202.32272.43

OpenBenchmarking.orgkH/s Per Watt, More Is BetterCpuminer-Opt 3.20.3Algorithm: SkeincoinAVX512 OffAVX512 On80016002400320040002892.873691.56

Cpuminer-Opt 3.20.3 - Algorithm: Myriad-Groestl (kH/s Per Watt, More Is Better)
  AVX512 Off: 44.67
  AVX512 On: 51.65

Cpuminer-Opt 3.20.3 - Algorithm: LBC, LBRY Credits (kH/s Per Watt, More Is Better)
  AVX512 Off: 1052.19
  AVX512 On: 2003.50

Cpuminer-Opt 3.20.3 - Algorithm: scrypt (kH/s, More Is Better)
  AVX512 Off: 2033.15 (SE +/- 0.24, N = 3)
  AVX512 On: 2993.21 (SE +/- 1.66, N = 3)

Cpuminer-Opt 3.20.3 - Algorithm: Skeincoin (kH/s, More Is Better)
  AVX512 Off: 937977 (SE +/- 1811.04, N = 3)
  AVX512 On: 1174953 (SE +/- 4577.90, N = 3)

Cpuminer-Opt 3.20.3 - Algorithm: Myriad-Groestl (kH/s, More Is Better)
  AVX512 Off: 7149.95 (SE +/- 21.55, N = 3)
  AVX512 On: 8628.76 (SE +/- 340.56, N = 15)

Cpuminer-Opt 3.20.3 - Algorithm: Quad SHA-256, Pyrite (kH/s Per Watt, More Is Better)
  AVX512 Off: 2825.31
  AVX512 On: 4347.58

Cpuminer-Opt 3.20.3 - Algorithm: x25x (kH/s, More Is Better)
  AVX512 Off: 3010.37 (SE +/- 35.48, N = 4)
  AVX512 On: 4977.89 (SE +/- 15.71, N = 3)

Cpuminer-Opt 3.20.3 - Algorithm: Blake-2 S (kH/s, More Is Better)
  AVX512 Off: 4555580 (SE +/- 37514.23, N = 15)
  AVX512 On: 7238650 (SE +/- 3854.05, N = 3)

Cpuminer-Opt 3.20.3 - Algorithm: Garlicoin (kH/s, More Is Better)
  AVX512 Off: 39473 (SE +/- 102.69, N = 3)
  AVX512 On: 53090 (SE +/- 110.15, N = 3)

Cpuminer-Opt 3.20.3 - Algorithm: LBC, LBRY Credits (kH/s, More Is Better)
  AVX512 Off: 332667 (SE +/- 141.93, N = 3)
  AVX512 On: 660660 (SE +/- 76.38, N = 3)

Cpuminer-Opt 3.20.3 - Algorithm: Quad SHA-256, Pyrite (kH/s, More Is Better)
  AVX512 Off: 876137 (SE +/- 2198.34, N = 3)
  AVX512 On: 1498937 (SE +/- 3455.26, N = 3)

Cpuminer-Opt kH/s results were built with (CXX) g++ options: -O2 -lcurl -lz -lpthread -lssl -lcrypto -lgmp
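Three of the Cpuminer-Opt algorithms above report both raw throughput (kH/s) and efficiency (kH/s Per Watt). As a rough cross-check, the sketch below derives the AVX-512 speedup and the implied average CPU power for each. It assumes the per-watt figure is simply throughput divided by average power over the run, which the result page does not state explicitly. The dictionary layout and function names are illustrative only.

```python
# Values copied from the Cpuminer-Opt 3.20.3 results: (AVX512 Off, AVX512 On).
results = {
    "Myriad-Groestl": {"khs": (7149.95, 8628.76), "khs_per_watt": (44.67, 51.65)},
    "LBC, LBRY Credits": {"khs": (332667, 660660), "khs_per_watt": (1052.19, 2003.50)},
    "Quad SHA-256, Pyrite": {"khs": (876137, 1498937), "khs_per_watt": (2825.31, 4347.58)},
}


def implied_watts(khs, khs_per_watt):
    """Assumption: efficiency = throughput / average power, so power = throughput / efficiency."""
    return khs / khs_per_watt


def summarize(data):
    """Return (algorithm, On/Off speedup, implied watts Off, implied watts On) per entry."""
    rows = []
    for algo, r in data.items():
        khs_off, khs_on = r["khs"]
        eff_off, eff_on = r["khs_per_watt"]
        rows.append((
            algo,
            khs_on / khs_off,
            implied_watts(khs_off, eff_off),
            implied_watts(khs_on, eff_on),
        ))
    return rows


for algo, speedup, watts_off, watts_on in summarize(results):
    print(f"{algo}: {speedup:.2f}x throughput, ~{watts_off:.0f} W off, ~{watts_on:.0f} W on")
```

Under that assumption, a speedup that is larger than the change in implied power shows up as a higher kH/s Per Watt figure, which matches the direction of all three efficiency results.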

oneDNN

oneDNN 3.1 - CPU Peak Freq (Highest CPU Core Frequency) Monitor (Megahertz, More Is Better)
  AVX512 Off: Min 2250 / Avg 2824 / Max 3100
  AVX512 On: Min 2250 / Avg 2998 / Max 3097

Neural Magic DeepSparse

Neural Magic DeepSparse 1.5 - CPU Peak Freq (Highest CPU Core Frequency) Monitor (Megahertz, More Is Better), Min / Avg / Max per chart:
  AVX512 Off: 2250 / 2933 / 3136; AVX512 On: 2250 / 3011 / 3136
  AVX512 Off: 2250 / 3015 / 3127; AVX512 On: 2250 / 3026 / 3124
  AVX512 On: 2250 / 3002 / 3416; AVX512 Off: 2250 / 3023 / 3152
  AVX512 Off: 2250 / 2962 / 3101; AVX512 On: 2250 / 3012 / 3134
  AVX512 Off: 2250 / 2931 / 3101; AVX512 On: 2250 / 3034 / 3099
  AVX512 On: 2250 / 3006 / 3115; AVX512 Off: 2250 / 3048 / 3116
  AVX512 Off: 2250 / 2940 / 3122; AVX512 On: 2250 / 3013 / 3127
  AVX512 Off: 2250 / 2952 / 3146; AVX512 On: 2250 / 3006 / 3135

OpenVINO

OpenVINO 2022.3 - CPU Peak Freq (Highest CPU Core Frequency) Monitor (Megahertz, More Is Better), Min / Avg / Max per chart:
  AVX512 Off: 2250 / 2491 / 3096; AVX512 On: 2250 / 2708 / 3101
  AVX512 Off: 2250 / 2649 / 3132; AVX512 On: 2250 / 2931 / 3185
  AVX512 Off: 2250 / 2645 / 3103; AVX512 On: 2250 / 2936 / 3426
  AVX512 Off: 2241 / 2918 / 3105; AVX512 On: 2250 / 3029 / 3257
  AVX512 Off: 2250 / 2688 / 3124; AVX512 On: 2250 / 2696 / 3116
  AVX512 Off: 2250 / 2659 / 3098; AVX512 On: 2250 / 2705 / 3121
  AVX512 Off: 2250 / 2373 / 3112; AVX512 On: 2250 / 2718 / 3095
  AVX512 Off: 2214 / 2537 / 3101; AVX512 On: 2250 / 2661 / 3111
  AVX512 On: 2250 / 2619 / 3137; AVX512 Off: 2250 / 2671 / 3102
  AVX512 Off: 2250 / 2483 / 3131; AVX512 On: 2250 / 2679 / 3096
  AVX512 Off: 2250 / 2590 / 3101; AVX512 On: 2250 / 2892 / 3097
  AVX512 Off: 2250 / 2626 / 3099; AVX512 On: 2250 / 2761 / 3119

oneDNN

oneDNN 3.1 - CPU Temperature Monitor (Celsius, Fewer Is Better)
  AVX512 Off: Min 24.5 / Avg 39.8 / Max 45.1
  AVX512 On: Min 30.4 / Avg 39.2 / Max 45.3

Neural Magic DeepSparse

Neural Magic DeepSparse 1.5 - CPU Temperature Monitor (Celsius, Fewer Is Better), Min / Avg / Max per chart:
  AVX512 On: 34.6 / 57.5 / 71.4; AVX512 Off: 30.6 / 54.5 / 70.5
  AVX512 On: 39.4 / 58.1 / 67.6; AVX512 Off: 38.1 / 53.8 / 64.3
  AVX512 On: 38.6 / 53.1 / 66.5; AVX512 Off: 38.1 / 52.9 / 65.8
  AVX512 Off: 40.0 / 65.2 / 72.9; AVX512 On: 39.8 / 62.8 / 70.4
  AVX512 Off: 41.0 / 60.7 / 68.5; AVX512 On: 39.1 / 60.4 / 67.1
  AVX512 Off: 40.0 / 54.0 / 69.1; AVX512 On: 38.1 / 53.1 / 66.1
  AVX512 Off: 38.4 / 58.3 / 70.5; AVX512 On: 36.0 / 57.7 / 70.9
  AVX512 Off: 38.9 / 58.3 / 70.9; AVX512 On: 37.3 / 57.6 / 70.9

OpenVINO

OpenVINO 2022.3 - CPU Temperature Monitor (Celsius, Fewer Is Better), Min / Avg / Max per chart:
  AVX512 On: 38.6 / 63.1 / 70.0; AVX512 Off: 39.8 / 59.3 / 68.1
  AVX512 On: 41.3 / 61.2 / 73.4; AVX512 Off: 37.6 / 57.4 / 67.6
  AVX512 On: 37.6 / 60.6 / 72.9; AVX512 Off: 36.5 / 58.0 / 72.4
  AVX512 Off: 36.8 / 53.9 / 62.3; AVX512 On: 36.8 / 51.7 / 57.5
  AVX512 Off: 35.8 / 65.5 / 76.1; AVX512 On: 34.4 / 59.5 / 70.5
  AVX512 Off: 42.1 / 67.7 / 72.8; AVX512 On: 40.0 / 59.2 / 66.6
  AVX512 On: 36.8 / 64.2 / 69.9; AVX512 Off: 41.9 / 61.1 / 64.3
  AVX512 On: 40.5 / 57.3 / 65.8; AVX512 Off: 41.0 / 56.7 / 64.6
  AVX512 Off: 36.8 / 67.4 / 73.3; AVX512 On: 36.8 / 61.3 / 66.1
  AVX512 On: 40.5 / 60.6 / 66.6; AVX512 Off: 41.0 / 57.1 / 64.8
  AVX512 On: 41.0 / 64.6 / 70.0; AVX512 Off: 37.3 / 61.8 / 68.0
  AVX512 Off: 41.0 / 63.9 / 69.0; AVX512 On: 41.4 / 63.4 / 68.1

oneDNN

This is a test of Intel oneDNN, an Intel-optimized library for deep neural networks, using its built-in benchdnn functionality. The result is the total perf time reported. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and MKL-DNN before being rebranded as part of the Intel oneAPI toolkit. Learn more via the OpenBenchmarking.org test page.

oneDNN 3.1 - Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
  AVX512 Off: 1317.57 (SE +/- 1.80, N = 3; MIN: 1299.38)
  AVX512 On: 1174.75 (SE +/- 12.60, N = 4; MIN: 1143.88)
  (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
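The oneDNN result reports a standard error (SE) and run count (N) for each configuration, so it is possible to gauge roughly whether the gap between the two means is larger than run-to-run noise. The sketch below combines the two standard errors in quadrature and expresses the difference in units of that combined error. With only 3 and 4 runs this is a sanity check rather than a formal significance test, and the variable names are illustrative.

```python
import math

# Means and standard errors (ms) copied from the oneDNN 3.1 RNN Training bf16 result.
off_mean, off_se = 1317.57, 1.80    # AVX512 Off, N = 3
on_mean, on_se = 1174.75, 12.60     # AVX512 On, N = 4

# Difference between the two means; positive means AVX512 On took less time.
diff_ms = off_mean - on_mean

# Standard error of the difference, assuming the two sets of runs are independent.
combined_se = math.hypot(off_se, on_se)

# How many combined standard errors separate the two means.
separation = diff_ms / combined_se

print(f"difference: {diff_ms:.2f} ms")
print(f"combined SE: {combined_se:.2f} ms")
print(f"separation: {separation:.1f} combined standard errors")
```

A separation of several combined standard errors suggests the difference is unlikely to be noise alone, though the small run counts limit how much weight this carries.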

OpenVINO

This is a test of Intel OpenVINO, a toolkit for neural networks, using its built-in benchmarking support to analyze throughput and latency for various models. Learn more via the OpenBenchmarking.org test page.

OpenVINO 2022.3 - Model: Face Detection FP16 - Device: CPU (ms, Fewer Is Better)
  AVX512 Off: 2423.39 (SE +/- 26.84, N = 3; MIN: 1130.42 / MAX: 2937.22)
  AVX512 On: 1048.37 (SE +/- 0.42, N = 3; MIN: 508.1 / MAX: 1159.4)

OpenVINO 2022.3 - Model: Person Detection FP16 - Device: CPU (ms, Fewer Is Better)
  AVX512 Off: 4153.59 (SE +/- 59.43, N = 3; MIN: 1975.28 / MAX: 5152.66)
  AVX512 On: 2334.29 (SE +/- 22.79, N = 12; MIN: 1017.83 / MAX: 3101.05)

OpenVINO 2022.3 - Model: Person Detection FP32 - Device: CPU (ms, Fewer Is Better)
  AVX512 Off: 4170.49 (SE +/- 41.79, N = 5; MIN: 1759.65 / MAX: 4969.5)
  AVX512 On: 2339.81 (SE +/- 14.83, N = 12; MIN: 1045.35 / MAX: 3232.69)

OpenVINO 2022.3 - Model: Vehicle Detection FP16 - Device: CPU (ms, Fewer Is Better)
  AVX512 Off: 53.79 (SE +/- 0.62, N = 14; MIN: 13.85 / MAX: 136.83)
  AVX512 On: 44.84 (SE +/- 0.60, N = 15; MIN: 8.1 / MAX: 137.01)

OpenVINO 2022.3 - Model: Face Detection FP16-INT8 - Device: CPU (ms, Fewer Is Better)
  AVX512 Off: 1094.52 (SE +/- 0.34, N = 3; MIN: 509.48 / MAX: 1179.63)
  AVX512 On: 540.35 (SE +/- 0.07, N = 3; MIN: 257.81 / MAX: 586.56)

OpenVINO 2022.3 - Model: Vehicle Detection FP16-INT8 - Device: CPU (ms, Fewer Is Better)
  AVX512 Off: 16.17 (SE +/- 0.01, N = 3; MIN: 8.44 / MAX: 56.88)
  AVX512 On: 11.26 (SE +/- 0.15, N = 15; MIN: 4.37 / MAX: 45.06)

OpenVINO 2022.3 - Model: Weld Porosity Detection FP16 - Device: CPU (ms, Fewer Is Better)
  AVX512 Off: 25.06 (SE +/- 0.01, N = 3; MIN: 13.08 / MAX: 57.02)
  AVX512 On: 10.52 (SE +/- 0.00, N = 3; MIN: 5.11 / MAX: 34.37)

OpenVINO 2022.3 - Model: Machine Translation EN To DE FP16 - Device: CPU (ms, Fewer Is Better)
  AVX512 Off: 254.01 (SE +/- 2.83, N = 15; MIN: 116.71 / MAX: 398.88)
  AVX512 On: 110.35 (SE +/- 1.22, N = 15; MIN: 49.7 / MAX: 183.59)

OpenVINO 2022.3 - Model: Weld Porosity Detection FP16-INT8 - Device: CPU (ms, Fewer Is Better)
  AVX512 Off: 22.47 (SE +/- 0.01, N = 3; MIN: 10.87 / MAX: 43.5)
  AVX512 On: 10.82 (SE +/- 0.00, N = 3; MIN: 5 / MAX: 31.44)

OpenVINO 2022.3 - Model: Person Vehicle Bike Detection FP16 - Device: CPU (ms, Fewer Is Better)
  AVX512 Off: 19.28 (SE +/- 0.14, N = 15; MIN: 10.31 / MAX: 50.61)
  AVX512 On: 9.63 (SE +/- 0.02, N = 3; MIN: 6.4 / MAX: 33.26)

OpenVINO 2022.3 - Model: Age Gender Recognition Retail 0013 FP16 - Device: CPU (ms, Fewer Is Better)
  AVX512 Off: 1.91 (SE +/- 0.01, N = 3; MIN: 0.67 / MAX: 19.48)
  AVX512 On: 0.99 (SE +/- 0.00, N = 3; MIN: 0.35 / MAX: 19.47)

OpenVINO 2022.3 - Model: Age Gender Recognition Retail 0013 FP16-INT8 - Device: CPU (ms, Fewer Is Better)
  AVX512 Off: 1.76 (SE +/- 0.00, N = 3; MIN: 0.64 / MAX: 19.6)
  AVX512 On: 1.58 (SE +/- 0.00, N = 3; MIN: 0.55 / MAX: 17.78)

OpenVINO results were built with (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
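Because the OpenVINO latencies span three orders of magnitude (from about 1 ms to over 4000 ms), an arithmetic average of the raw numbers would be dominated by the slowest models. A geometric mean of the per-model Off/On ratios weights every model equally. The sketch below computes it from the twelve latency results above; the dictionary and function names are illustrative, and this is not necessarily how the result page computes its own overall geometric mean.

```python
from statistics import geometric_mean

# Average latency in ms copied from the OpenVINO 2022.3 results: (AVX512 Off, AVX512 On).
latency_ms = {
    "Face Detection FP16": (2423.39, 1048.37),
    "Person Detection FP16": (4153.59, 2334.29),
    "Person Detection FP32": (4170.49, 2339.81),
    "Vehicle Detection FP16": (53.79, 44.84),
    "Face Detection FP16-INT8": (1094.52, 540.35),
    "Vehicle Detection FP16-INT8": (16.17, 11.26),
    "Weld Porosity Detection FP16": (25.06, 10.52),
    "Machine Translation EN To DE FP16": (254.01, 110.35),
    "Weld Porosity Detection FP16-INT8": (22.47, 10.82),
    "Person Vehicle Bike Detection FP16": (19.28, 9.63),
    "Age Gender Recognition Retail 0013 FP16": (1.91, 0.99),
    "Age Gender Recognition Retail 0013 FP16-INT8": (1.76, 1.58),
}


def latency_ratios(data):
    """Off/On latency ratio per model; values above 1 mean AVX-512 lowered latency."""
    return {model: off / on for model, (off, on) in data.items()}


ratios = latency_ratios(latency_ms)
overall = geometric_mean(list(ratios.values()))

for model, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(f"{model}: {ratio:.2f}x")
print(f"Geometric mean of Off/On latency ratios: {overall:.2f}x")
```

Sorting by ratio makes it easy to see which models gain the most from AVX-512 and which, such as the two smallest-ratio entries, gain comparatively little.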

Neural Magic DeepSparse

This is a benchmark of Neural Magic's DeepSparse using its built-in deepsparse.benchmark utility and various models from their SparseZoo (https://sparsezoo.neuralmagic.com/). Learn more via the OpenBenchmarking.org test page.

Neural Magic DeepSparse 1.5 - Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better)
  AVX512 Off: 1023.17 (SE +/- 1.40, N = 3)
  AVX512 On: 858.40 (SE +/- 0.33, N = 3)

Neural Magic DeepSparse 1.5 - Model: NLP Sentiment Analysis, 80% Pruned Quantized BERT Base Uncased - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better)
  AVX512 Off: 88.91 (SE +/- 0.67, N = 3)
  AVX512 On: 46.26 (SE +/- 0.05, N = 3)

Neural Magic DeepSparse 1.5 - Model: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better)
  AVX512 Off: 298.06 (SE +/- 0.24, N = 3)
  AVX512 On: 259.65 (SE +/- 6.02, N = 15)

Neural Magic DeepSparse 1.5 - Model: CV Classification, ResNet-50 ImageNet - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better)
  AVX512 Off: 73.35 (SE +/- 0.04, N = 3)
  AVX512 On: 65.89 (SE +/- 0.03, N = 3)

Neural Magic DeepSparse 1.5 - Model: NLP Text Classification, DistilBERT mnli - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better)
  AVX512 Off: 122.23 (SE +/- 0.15, N = 3)
  AVX512 On: 102.22 (SE +/- 0.07, N = 3)

Neural Magic DeepSparse 1.5 - Model: CV Segmentation, 90% Pruned YOLACT Pruned - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better)
  AVX512 Off: 906.94 (SE +/- 0.94, N = 3)
  AVX512 On: 498.48 (SE +/- 0.16, N = 3)

Neural Magic DeepSparse 1.5 - Model: NLP Text Classification, BERT base uncased SST2 - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better)
  AVX512 Off: 244.15 (SE +/- 0.64, N = 3)
  AVX512 On: 201.60 (SE +/- 0.47, N = 3)

Neural Magic DeepSparse 1.5 - Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better)
  AVX512 Off: 1025.48 (SE +/- 1.31, N = 3)
  AVX512 On: 859.71 (SE +/- 0.45, N = 3)

148 Results Shown
