AMD EPYC Turin AVX-512 Comparison

AMD EPYC 9755 AVX-512 comparison by Michael Larabel for a future article.

Compare your own system(s) to this result file with the Phoronix Test Suite by running the command: phoronix-test-suite benchmark 2410104-NE-TURINAVX566
Result Runs

  AVX-512 Off      - Run Date: September 29 - Test Duration: 5 Hours, 55 Minutes
  AVX-512 256b DP  - Run Date: September 28 - Test Duration: 6 Hours, 28 Minutes
  AVX-512 512b DP  - Run Date: September 30 - Test Duration: 6 Hours, 26 Minutes


AMD EPYC Turin AVX-512 Comparison - OpenBenchmarking.org / Phoronix Test Suite

System Under Test:
  Processor:          AMD EPYC 9755 128-Core @ 2.70GHz (128 Cores / 256 Threads)
  Motherboard:        AMD VOLCANO (RVOT1000D BIOS)
  Chipset:            AMD Device 153a
  Memory:             12 x 64GB DDR5-6000MT/s Samsung M321R8GA0PB1-CCPKC
  Disk:               2 x 1920GB KIOXIA KCD8XPUG1T92
  Graphics:           ASPEED
  Network:            Broadcom NetXtreme BCM5720 PCIe
  OS:                 Ubuntu 24.04
  Kernel:             6.10.0-phx (x86_64)
  Compiler:           GCC 13.2.0
  File-System:        ext4
  Screen Resolution:  1920x1200

System Logs / Notes:
  - Transparent Huge Pages: madvise
  - Compiler configuration: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-OiuXZC/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-OiuXZC/gcc-13-13.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
  - Scaling Governor: acpi-cpufreq performance (Boost: Enabled)
  - CPU Microcode: 0xb002110
  - Python 3.12.2
  - Security mitigations: gather_data_sampling: Not affected; itlb_multihit: Not affected; l1tf: Not affected; mds: Not affected; meltdown: Not affected; mmio_stale_data: Not affected; reg_file_data_sampling: Not affected; retbleed: Not affected; spec_rstack_overflow: Not affected; spec_store_bypass: Mitigation of SSB disabled via prctl; spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization; spectre_v2: Mitigation of Enhanced / Automatic IBRS, IBPB: conditional, STIBP: always-on, RSB filling, PBRSB-eIBRS: Not affected, BHI: Not affected; srbds: Not affected; tsx_async_abort: Not affected

[Result Overview graph - relative performance of AVX-512 Off vs. AVX-512 256b DP vs. AVX-512 512b DP, normalized from 100% up to 249%, covering: oneDNN, NAMD, miniBUDE, OpenVINO, OSPRay, TensorFlow, simdjson, GROMACS, ONNX Runtime, Y-Cruncher, Mobile Neural Network, Xmrig, OSPRay Studio, OpenVKL, PyTorch, libxsmm, SVT-AV1, Embree, Numpy Benchmark, SMHasher, OpenFOAM]
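The overview percentages (100% for the slowest configuration up to roughly 249% at the top) follow the usual OpenBenchmarking.org approach of normalizing each configuration against a baseline and aggregating with a geometric mean. A minimal sketch of that arithmetic, using the miniBUDE BM1 values from this article (195.31 vs. 395.37 Billion Interactions/s) as the worked example; the function names are illustrative, not from the Phoronix Test Suite source:

```python
import math

def normalize(value, baseline, lower_is_better=False):
    """Express a result as a percentage of the baseline configuration."""
    ratio = baseline / value if lower_is_better else value / baseline
    return 100.0 * ratio

def geometric_mean(percentages):
    """Aggregate normalized results so no single outlier test dominates."""
    return math.exp(sum(math.log(p) for p in percentages) / len(percentages))

# miniBUDE BM1: AVX-512 Off = 195.31, AVX-512 512b DP = 395.37 (higher is better)
print(normalize(395.37, 195.31))  # ~202.4% of the AVX-512 Off result
```

For lower-is-better metrics (e.g. oneDNN times in ms), the ratio is inverted so that a faster result still normalizes above 100%.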

[Per-Watt Result Overview graph - performance-per-watt geometric means for AVX-512 Off vs. AVX-512 256b DP vs. AVX-512 512b DP, normalized from 100% up to 215%, covering: miniBUDE, NAMD, TensorFlow, OSPRay, GROMACS, libxsmm, simdjson, PyTorch, OpenVKL, Xmrig, Embree, SVT-AV1, Numpy Benchmark]

[Condensed side-by-side results table - per-test values for AVX-512 Off, AVX-512 256b DP, and AVX-512 512b DP across the full test suite (oneDNN, OpenVINO, NAMD, miniBUDE, OSPRay, TensorFlow, Y-Cruncher, simdjson, Mobile Neural Network, OSPRay Studio, ONNX Runtime, GROMACS, OpenVKL, PyTorch, SVT-AV1, libxsmm, Embree, Numpy Benchmark, SMHasher, OpenFOAM, Xmrig); the individual results are shown in the graphs below - OpenBenchmarking.org]

oneDNN

This is a test of Intel oneDNN, an Intel-optimized library for Deep Neural Networks, making use of its built-in benchdnn functionality. The result is the total perf time reported. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and MKL-DNN before being rebranded as part of the Intel oneAPI toolkit. Learn more via the OpenBenchmarking.org test page.

oneDNN 3.4 - Harness: IP Shapes 3D - Engine: CPU (ms, fewer is better)
  AVX-512 Off:     3.141980 (SE +/- 0.003218, N = 5; MIN: 3.07)
  AVX-512 256b DP: 0.323221 (SE +/- 0.000924, N = 5)
  AVX-512 512b DP: 0.322331 (SE +/- 0.000544, N = 5)
  1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
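The "SE +/- x, N = y" annotations throughout these results are the standard error of the mean over N benchmark runs: the sample standard deviation divided by the square root of the run count. A short sketch of that calculation; the sample values are hypothetical per-run timings, not taken from this result file:

```python
import math
import statistics

def standard_error(samples):
    """Standard error of the mean: sample stdev / sqrt(N)."""
    return statistics.stdev(samples) / math.sqrt(len(samples))

runs_ms = [3.140, 3.145, 3.139, 3.143, 3.142]  # hypothetical N = 5 run times (ms)
se = standard_error(runs_ms)
```

A small SE relative to the mean (as in most graphs here) indicates the run-to-run spread is tight enough for the between-configuration deltas to be meaningful.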

OpenVINO

This is a test of Intel OpenVINO, a toolkit around neural networks, using its built-in benchmarking support and analyzing the throughput and latency for various models. Learn more via the OpenBenchmarking.org test page.

OpenVINO 2024.0 - Model: Noise Suppression Poconet-Like FP16 - Device: CPU (ms, fewer is better)
  AVX-512 Off:     42.42 (SE +/- 0.24, N = 3; MIN: 15.51 / MAX: 60.87)
  AVX-512 256b DP: 11.79 (SE +/- 0.06, N = 3; MIN: 6.96 / MAX: 33.85)
  AVX-512 512b DP: 10.75 (SE +/- 0.07, N = 3; MIN: 6.13 / MAX: 31.81)
  1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl

OpenVINO 2024.0 - Model: Noise Suppression Poconet-Like FP16 - Device: CPU (FPS, more is better)
  AVX-512 Off:     3010.63 (SE +/- 16.72, N = 3)
  AVX-512 256b DP: 10112.34 (SE +/- 47.84, N = 3)
  AVX-512 512b DP: 10450.98 (SE +/- 29.86, N = 3)
  1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl

OpenVINO 2024.0 - Model: Person Re-Identification Retail FP16 - Device: CPU (ms, fewer is better)
  AVX-512 Off:     13.35 (SE +/- 0.00, N = 3; MIN: 6.22 / MAX: 36.5)
  AVX-512 256b DP: 6.29 (SE +/- 0.00, N = 3; MIN: 3.16 / MAX: 20.63)
  AVX-512 512b DP: 4.49 (SE +/- 0.00, N = 3; MIN: 1.96 / MAX: 22.35)
  1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl

OpenVINO 2024.0 - Model: Person Re-Identification Retail FP16 - Device: CPU (FPS, more is better)
  AVX-512 Off:     4727.35 (SE +/- 1.32, N = 3)
  AVX-512 256b DP: 10058.84 (SE +/- 6.22, N = 3)
  AVX-512 512b DP: 13925.76 (SE +/- 7.20, N = 3)
  1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl

OpenVINO 2024.0 - Model: Machine Translation EN To DE FP16 - Device: CPU (ms, fewer is better)
  AVX-512 Off:     160.34 (SE +/- 0.34, N = 3; MIN: 88.77 / MAX: 245.28)
  AVX-512 256b DP: 73.73 (SE +/- 0.11, N = 3; MIN: 35.26 / MAX: 113)
  AVX-512 512b DP: 57.47 (SE +/- 0.30, N = 3; MIN: 26.92 / MAX: 106.12)
  1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl

OpenVINO 2024.0 - Model: Machine Translation EN To DE FP16 - Device: CPU (FPS, more is better)
  AVX-512 Off:     398.57 (SE +/- 0.88, N = 3)
  AVX-512 256b DP: 866.83 (SE +/- 1.32, N = 3)
  AVX-512 512b DP: 1111.69 (SE +/- 5.73, N = 3)
  1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl

OpenVINO 2024.0 - Model: Handwritten English Recognition FP16-INT8 - Device: CPU (ms, fewer is better)
  AVX-512 Off:     68.68 (SE +/- 0.04, N = 3; MIN: 40.62 / MAX: 87.32)
  AVX-512 256b DP: 30.35 (SE +/- 0.02, N = 3; MIN: 17.71 / MAX: 42.78)
  AVX-512 512b DP: 26.63 (SE +/- 0.01, N = 3; MIN: 15.61 / MAX: 47.58)
  1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl

oneDNN


oneDNN 3.4 - Harness: Deconvolution Batch shapes_3d - Engine: CPU (ms, fewer is better)
  AVX-512 Off:     1.312140 (SE +/- 0.001041, N = 9; MIN: 1.28)
  AVX-512 256b DP: 0.700379 (SE +/- 0.000342, N = 9; MIN: 0.68)
  AVX-512 512b DP: 0.509019 (SE +/- 0.000468, N = 9; MIN: 0.48)
  1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

OpenVINO


OpenVINO 2024.0 - Model: Handwritten English Recognition FP16-INT8 - Device: CPU (FPS, more is better)
  AVX-512 Off:     1861.72 (SE +/- 1.11, N = 3)
  AVX-512 256b DP: 4203.21 (SE +/- 2.89, N = 3)
  AVX-512 512b DP: 4780.24 (SE +/- 1.88, N = 3)
  1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl

OpenVINO 2024.0 - Model: Person Vehicle Bike Detection FP16 - Device: CPU (ms, fewer is better)
  AVX-512 Off:     15.32 (SE +/- 0.03, N = 3; MIN: 7.91 / MAX: 45.71)
  AVX-512 256b DP: 7.90 (SE +/- 0.01, N = 3; MIN: 5.02 / MAX: 29.4)
  AVX-512 512b DP: 7.21 (SE +/- 0.02, N = 3; MIN: 4.13 / MAX: 25.4)
  1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl

OpenVINO 2024.0 - Model: Person Vehicle Bike Detection FP16 - Device: CPU (FPS, more is better)
  AVX-512 Off:     4150.45 (SE +/- 9.55, N = 3)
  AVX-512 256b DP: 8024.13 (SE +/- 14.15, N = 3)
  AVX-512 512b DP: 8755.75 (SE +/- 25.40, N = 3)
  1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl

NAMD

NAMD 3.0b6 - Input: STMV with 1,066,628 Atoms (ns/day, more is better)
  AVX-512 Off:     2.28097 (SE +/- 0.00213, N = 3)
  AVX-512 256b DP: 4.17026 (SE +/- 0.00881, N = 4)
  AVX-512 512b DP: 4.62565 (SE +/- 0.00436, N = 4)

NAMD 3.0b6 - Input: ATPase with 327,506 Atoms (ns/day, more is better)
  AVX-512 Off:     7.00068 (SE +/- 0.01866, N = 3)
  AVX-512 256b DP: 13.31002 (SE +/- 0.02496, N = 7)
  AVX-512 512b DP: 14.18293 (SE +/- 0.06451, N = 7)

miniBUDE

MiniBUDE is a mini application for the core computation of the Bristol University Docking Engine (BUDE). This test profile currently makes use of the OpenMP implementation of miniBUDE for CPU benchmarking. Learn more via the OpenBenchmarking.org test page.

miniBUDE 20210901 - Implementation: OpenMP - Input Deck: BM1 (Billion Interactions/s, more is better)
  AVX-512 Off:     195.31 (SE +/- 0.12, N = 8)
  AVX-512 256b DP: 328.87 (SE +/- 0.13, N = 10)
  AVX-512 512b DP: 395.37 (SE +/- 0.11, N = 11)
  1. (CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm

miniBUDE 20210901 - Implementation: OpenMP - Input Deck: BM1 (GFInst/s, more is better)
  AVX-512 Off:     4882.81 (SE +/- 3.11, N = 8)
  AVX-512 256b DP: 8221.74 (SE +/- 3.23, N = 10)
  AVX-512 512b DP: 9884.25 (SE +/- 2.65, N = 11)
  1. (CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm

miniBUDE 20210901 - Implementation: OpenMP - Input Deck: BM2 (Billion Interactions/s, more is better)
  AVX-512 Off:     191.61 (SE +/- 0.64, N = 3)
  AVX-512 256b DP: 316.66 (SE +/- 0.70, N = 4)
  AVX-512 512b DP: 387.05 (SE +/- 4.22, N = 4)
  1. (CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm

miniBUDE 20210901 - Implementation: OpenMP - Input Deck: BM2 (GFInst/s, more is better)
  AVX-512 Off:     4790.13 (SE +/- 15.88, N = 3)
  AVX-512 256b DP: 7916.41 (SE +/- 17.43, N = 4)
  AVX-512 512b DP: 9676.25 (SE +/- 105.52, N = 4)
  1. (CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm

OpenVINO


OpenVINO 2024.0 - Model: Person Detection FP16 - Device: CPU (FPS, more is better)
  AVX-512 Off:     382.43 (SE +/- 0.04, N = 3)
  AVX-512 256b DP: 710.98 (SE +/- 0.61, N = 3)
  AVX-512 512b DP: 766.99 (SE +/- 0.42, N = 3)
  1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl

OpenVINO 2024.0 - Model: Person Detection FP16 - Device: CPU (ms, fewer is better)
  AVX-512 Off:     167.07 (SE +/- 0.02, N = 3; MIN: 75.83 / MAX: 256.32)
  AVX-512 256b DP: 89.90 (SE +/- 0.08, N = 3; MIN: 40.69 / MAX: 160.4)
  AVX-512 512b DP: 83.32 (SE +/- 0.04, N = 3; MIN: 35.6 / MAX: 146.57)
  1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl

OpenVINO 2024.0 - Model: Vehicle Detection FP16-INT8 - Device: CPU (ms, fewer is better)
  AVX-512 Off:     10.26 (SE +/- 0.00, N = 3; MIN: 6.21 / MAX: 38.75)
  AVX-512 256b DP: 7.48 (SE +/- 0.00, N = 3; MIN: 4.14 / MAX: 23.69)
  AVX-512 512b DP: 5.61 (SE +/- 0.00, N = 3; MIN: 2.24 / MAX: 30.88)
  1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl

OpenVINO 2024.0 - Model: Vehicle Detection FP16-INT8 - Device: CPU (FPS, more is better)
  AVX-512 Off:     6189.69 (SE +/- 1.18, N = 3)
  AVX-512 256b DP: 8450.68 (SE +/- 0.88, N = 3)
  AVX-512 512b DP: 11239.49 (SE +/- 0.53, N = 3)
  1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl

OpenVINO 2024.0 - Model: Face Detection FP16-INT8 - Device: CPU (FPS, more is better)
  AVX-512 Off:     117.14 (SE +/- 0.07, N = 3)
  AVX-512 256b DP: 137.51 (SE +/- 0.02, N = 3)
  AVX-512 512b DP: 194.04 (SE +/- 0.46, N = 3)
  1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl

OpenVINO 2024.0 - Model: Face Detection FP16-INT8 - Device: CPU (ms, fewer is better)
  AVX-512 Off:     544.38 (SE +/- 0.31, N = 3; MIN: 261.15 / MAX: 569.93)
  AVX-512 256b DP: 463.88 (SE +/- 0.07, N = 3; MIN: 405.2 / MAX: 482.04)
  AVX-512 512b DP: 329.05 (SE +/- 0.76, N = 3; MIN: 146.73 / MAX: 360.28)
  1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl

OpenVINO 2024.0 - Model: Weld Porosity Detection FP16-INT8 - Device: CPU (ms, fewer is better)
  AVX-512 Off:     10.75 (SE +/- 0.01, N = 3; MIN: 4.66 / MAX: 32.11)
  AVX-512 256b DP: 9.25 (SE +/- 0.01, N = 3; MIN: 4.08 / MAX: 21.6)
  AVX-512 512b DP: 6.60 (SE +/- 0.01, N = 3; MIN: 2.24 / MAX: 26.72)
  1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl

OpenVINO 2024.0 - Model: Face Detection Retail FP16-INT8 - Device: CPU (ms, fewer is better)
  AVX-512 Off:     6.29 (SE +/- 0.01, N = 3; MIN: 3.06 / MAX: 24.12)
  AVX-512 256b DP: 5.04 (SE +/- 0.00, N = 3; MIN: 2.51 / MAX: 19.14)
  AVX-512 512b DP: 3.87 (SE +/- 0.00, N = 3; MIN: 1.69 / MAX: 21.99)
  1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl

OpenVINO 2024.0 - Model: Weld Porosity Detection FP16-INT8 - Device: CPU (FPS, more is better)
  AVX-512 Off:     11702.44 (SE +/- 1.96, N = 3)
  AVX-512 256b DP: 13579.90 (SE +/- 5.42, N = 3)
  AVX-512 512b DP: 18690.32 (SE +/- 12.83, N = 3)
  1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl

OpenVINO 2024.0 - Model: Face Detection Retail FP16-INT8 - Device: CPU (FPS, more is better)
  AVX-512 Off:     19828.79 (SE +/- 6.94, N = 3)
  AVX-512 256b DP: 24619.04 (SE +/- 4.94, N = 3)
  AVX-512 512b DP: 31620.58 (SE +/- 23.78, N = 3)
  1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl

OSPRay

OSPRay 3.2 - Benchmark: gravity_spheres_volume/dim_512/scivis/real_time (Items Per Second, more is better)
  AVX-512 Off:     30.61 (SE +/- 0.02, N = 3)
  AVX-512 256b DP: 44.79 (SE +/- 0.13, N = 3)
  AVX-512 512b DP: 46.33 (SE +/- 0.11, N = 3)

OSPRay 3.2 - Benchmark: gravity_spheres_volume/dim_512/ao/real_time (Items Per Second, more is better)
  AVX-512 Off:     31.72 (SE +/- 0.01, N = 3)
  AVX-512 256b DP: 45.41 (SE +/- 0.03, N = 3)
  AVX-512 512b DP: 46.97 (SE +/- 0.16, N = 3)

oneDNN


oneDNN 3.4 - Harness: Convolution Batch Shapes Auto - Engine: CPU (ms, fewer is better)
  AVX-512 Off:     0.373925 (SE +/- 0.000549, N = 7; MIN: 0.35)
  AVX-512 256b DP: 0.350034 (SE +/- 0.000363, N = 7; MIN: 0.33)
  AVX-512 512b DP: 0.257632 (SE +/- 0.000532, N = 7; MIN: 0.25)
  1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

TensorFlow

This is a benchmark of the TensorFlow deep learning framework using the TensorFlow reference benchmarks (tensorflow/benchmarks with tf_cnn_benchmarks.py). Note with the Phoronix Test Suite there is also pts/tensorflow-lite for benchmarking the TensorFlow Lite binaries if desired for complementary metrics. Learn more via the OpenBenchmarking.org test page.

TensorFlow 2.16.1 - Device: CPU - Batch Size: 512 - Model: ResNet-50 (images/sec, more is better)
  AVX-512 Off:     179.25 (SE +/- 0.20, N = 3)
  AVX-512 256b DP: 221.25 (SE +/- 0.23, N = 3)
  AVX-512 512b DP: 246.10 (SE +/- 0.45, N = 3)

Y-Cruncher

Y-Cruncher 0.8.5 - Pi Digits To Calculate: 10B (Seconds, fewer is better)
  AVX-512 Off:     61.26 (SE +/- 0.04, N = 3)
  AVX-512 256b DP: 47.27 (SE +/- 0.04, N = 3)
  AVX-512 512b DP: 45.86 (SE +/- 0.02, N = 3)

simdjson

simdjson 3.10 - Throughput Test: TopTweet (GB/s, more is better)
  AVX-512 Off:     7.49 (SE +/- 0.01, N = 3)
  AVX-512 256b DP: 8.84 (SE +/- 0.02, N = 3)
  AVX-512 512b DP: 9.87 (SE +/- 0.01, N = 3)
  1. (CXX) g++ options: -O3 -lrt

Mobile Neural Network

Mobile Neural Network 2.9.b11b7037d - Model: resnet-v2-50 (ms, fewer is better)
  AVX-512 Off:     10.053 (SE +/- 0.040, N = 3; MIN: 9.74 / MAX: 12.33)
  AVX-512 256b DP: 8.851 (SE +/- 0.037, N = 3; MIN: 8.53 / MAX: 10.37)
  AVX-512 512b DP: 7.674 (SE +/- 0.113, N = 3; MIN: 7.38 / MAX: 8.78)
  1. (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -pthread -ldl

simdjson

simdjson 3.10 - Throughput Test: DistinctUserID (GB/s, more is better)
  AVX-512 Off:     7.64 (SE +/- 0.02, N = 3)
  AVX-512 256b DP: 8.99 (SE +/- 0.04, N = 3)
  AVX-512 512b DP: 9.92 (SE +/- 0.12, N = 4)
  1. (CXX) g++ options: -O3 -lrt

Y-Cruncher

Y-Cruncher 0.8.5 - Pi Digits To Calculate: 5B (Seconds, fewer is better)
  AVX-512 Off:     31.25 (SE +/- 0.04, N = 3)
  AVX-512 256b DP: 24.60 (SE +/- 0.03, N = 3)
  AVX-512 512b DP: 24.07 (SE +/- 0.15, N = 3)
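As a rough sanity check on the 5B vs. 10B timings: if one assumes the cost of computing Pi grows roughly as n*log(n) in the digit count (an assumption for illustration, not y-cruncher's documented complexity), doubling the digits should take a bit more than twice as long. The measured ratios (e.g. 61.26 / 31.25 ≈ 1.96 for AVX-512 Off) land just under that, which is plausible given run-to-run variance and memory effects:

```python
import math

def expected_ratio(n_small, n_large):
    """Predicted time ratio under an assumed n*log(n) cost model."""
    return (n_large * math.log(n_large)) / (n_small * math.log(n_small))

measured = 61.26 / 31.25            # AVX-512 Off: 10B vs 5B seconds
predicted = expected_ratio(5e9, 10e9)  # ~2.06 under the assumed model
```

The other two configurations give similar measured ratios (47.27/24.60 and 45.86/24.07), so the scaling behavior is consistent across the AVX-512 settings.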

simdjson

simdjson 3.10 - Throughput Test: PartialTweets (GB/s, more is better)
  AVX-512 Off:     7.30 (SE +/- 0.04, N = 3)
  AVX-512 256b DP: 8.60 (SE +/- 0.09, N = 6)
  AVX-512 512b DP: 9.46 (SE +/- 0.09, N = 15)
  1. (CXX) g++ options: -O3 -lrt

TensorFlow


TensorFlow 2.16.1 - Device: CPU - Batch Size: 256 - Model: ResNet-50 (images/sec, more is better)
  AVX-512 Off:     158.25 (SE +/- 0.47, N = 3)
  AVX-512 256b DP: 190.08 (SE +/- 0.40, N = 3)
  AVX-512 512b DP: 203.33 (SE +/- 0.98, N = 3)

OSPRay Studio

Intel OSPRay Studio is an open-source, interactive visualization and ray-tracing software package. OSPRay Studio makes use of Intel OSPRay, a portable ray-tracing engine for high-performance, high-fidelity visualizations. OSPRay builds off Intel's Embree and Intel SPMD Program Compiler (ISPC) components as part of the oneAPI rendering toolkit. Learn more via the OpenBenchmarking.org test page.

OSPRay Studio 1.0 - Camera: 3 - Resolution: 4K - Samples Per Pixel: 32 - Renderer: Path Tracer - Acceleration: CPU (ms, fewer is better)
  AVX-512 Off:     30820 (SE +/- 109.14, N = 3)
  AVX-512 256b DP: 25393 (SE +/- 2.00, N = 3)
  AVX-512 512b DP: 24030 (SE +/- 20.00, N = 3)

ONNX Runtime

ONNX Runtime 1.19 - Model: super-resolution-10 - Device: CPU - Executor: Standard (Inferences Per Second, more is better)
  AVX-512 Off:     161.96 (SE +/- 0.15, N = 3)
  AVX-512 256b DP: 162.96 (SE +/- 1.79, N = 5)
  AVX-512 512b DP: 206.82 (SE +/- 0.29, N = 3)
  1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

OpenVINO


OpenVINO 2024.0 - Model: Road Segmentation ADAS FP16-INT8 - Device: CPU (ms, fewer is better)
  AVX-512 Off:     24.30 (SE +/- 0.01, N = 3; MIN: 13.82 / MAX: 53.77)
  AVX-512 256b DP: 22.36 (SE +/- 0.03, N = 3; MIN: 11.44 / MAX: 40.97)
  AVX-512 512b DP: 19.29 (SE +/- 0.03, N = 3; MIN: 9.84 / MAX: 44.85)
  1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl

simdjson

simdjson 3.10 - Throughput Test: Kostya (GB/s, more is better)
  AVX-512 Off:     4.69 (SE +/- 0.01, N = 3)
  AVX-512 256b DP: 5.66 (SE +/- 0.01, N = 3)
  AVX-512 512b DP: 5.90 (SE +/- 0.01, N = 3)
  1. (CXX) g++ options: -O3 -lrt

OpenVINO


OpenVINO 2024.0 - Model: Road Segmentation ADAS FP16-INT8 - Device: CPU (FPS, more is better)
  AVX-512 Off:     2626.83 (SE +/- 0.80, N = 3)
  AVX-512 256b DP: 2855.34 (SE +/- 3.53, N = 3)
  AVX-512 512b DP: 3297.85 (SE +/- 5.22, N = 3)
  1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl

GROMACS

This is a test of the GROMACS (GROningen MAchine for Chemical Simulations) molecular dynamics package using the water_GMX50 data. This test profile allows selecting between CPU and GPU-based GROMACS builds. Learn more via the OpenBenchmarking.org test page.

GROMACS 2024 - Implementation: MPI CPU - Input: water_GMX50_bare (Ns Per Day, more is better)
  AVX-512 Off:     18.35 (SE +/- 0.01, N = 3)
  AVX-512 256b DP: 19.54 (SE +/- 0.00, N = 3)
  AVX-512 512b DP: 22.84 (SE +/- 0.02, N = 3)
  1. (CXX) g++ options: -O3 -lm

OSPRay

OSPRay 3.2 - Benchmark: gravity_spheres_volume/dim_512/pathtracer/real_time (Items Per Second, more is better)
  AVX-512 Off:     42.13 (SE +/- 0.03, N = 3)
  AVX-512 256b DP: 49.35 (SE +/- 0.05, N = 3)
  AVX-512 512b DP: 50.90 (SE +/- 0.02, N = 3)

OpenVINO


OpenVINO 2024.0 - Model: Age Gender Recognition Retail 0013 FP16-INT8 - Device: CPU (ms, fewer is better)
  AVX-512 Off:     0.57 (SE +/- 0.00, N = 3; MIN: 0.2 / MAX: 26.73)
  AVX-512 256b DP: 0.55 (SE +/- 0.00, N = 3; MIN: 0.18 / MAX: 22.43)
  AVX-512 512b DP: 0.48 (SE +/- 0.00, N = 3; MIN: 0.13 / MAX: 25.07)
  1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl

OpenVKL

OpenVKL is the Intel Open Volume Kernel Library, which offers high-performance volume computation kernels and is part of the Intel oneAPI rendering toolkit. Learn more via the OpenBenchmarking.org test page.

OpenVKL 2.0.0 - Benchmark: vklBenchmarkCPU ISPC (Items / Sec, more is better)
  AVX-512 Off:     3099 (SE +/- 0.33, N = 3; MIN: 245 / MAX: 36357)
  AVX-512 256b DP: 3560 (SE +/- 1.76, N = 3; MIN: 284 / MAX: 41727)
  AVX-512 512b DP: 3660 (SE +/- 0.58, N = 3; MIN: 293 / MAX: 42710)

OpenVINO


OpenVINO 2024.0 - Model: Age Gender Recognition Retail 0013 FP16-INT8 - Device: CPU (FPS, more is better)
  AVX-512 Off:     165052.36 (SE +/- 339.72, N = 3)
  AVX-512 256b DP: 175156.04 (SE +/- 517.64, N = 3)
  AVX-512 512b DP: 192118.56 (SE +/- 544.06, N = 3)
  1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl

PyTorch

This is a benchmark of PyTorch making use of pytorch-benchmark [https://github.com/LukasHedegaard/pytorch-benchmark]. Learn more via the OpenBenchmarking.org test page.

PyTorch 2.2.1 - Device: CPU - Batch Size: 1 - Model: ResNet-50 (batches/sec, more is better)
  AVX-512 Off:     45.05 (SE +/- 0.23, N = 3; MIN: 43.23 / MAX: 46.3)
  AVX-512 256b DP: 51.63 (SE +/- 0.12, N = 3; MIN: 48.86 / MAX: 52.93)
  AVX-512 512b DP: 52.33 (SE +/- 0.38, N = 3; MIN: 49.93 / MAX: 54.1)

ONNX Runtime

ONNX Runtime 1.19 - Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard (Inferences Per Second, more is better)
  AVX-512 Off:     38.84 (SE +/- 0.02, N = 3)
  AVX-512 256b DP: 44.61 (SE +/- 0.38, N = 15)
  AVX-512 512b DP: 43.82 (SE +/- 0.43, N = 15)
  1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

PyTorch


PyTorch 2.2.1 - Device: CPU - Batch Size: 256 - Model: ResNet-50 (batches/sec, more is better)
  AVX-512 Off:     38.38 (SE +/- 0.43, N = 3; MIN: 36.82 / MAX: 39.67)
  AVX-512 256b DP: 43.55 (SE +/- 0.31, N = 3; MIN: 41.56 / MAX: 44.75)
  AVX-512 512b DP: 43.85 (SE +/- 0.55, N = 3; MIN: 41.39 / MAX: 45.93)

simdjson

simdjson 3.10 - Throughput Test: LargeRandom (GB/s, More Is Better)
  AVX-512 Off:     1.40 (SE ±0.00, N=3)
  AVX-512 256b DP: 1.57 (SE ±0.00, N=3)
  AVX-512 512b DP: 1.59 (SE ±0.00, N=3)

oneDNN

This is a test of Intel oneDNN, an Intel-optimized library for Deep Neural Networks, making use of its built-in benchdnn functionality. The result is the total perf time reported. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and MKL-DNN before being rebranded as part of the Intel oneAPI toolkit. Learn more via the OpenBenchmarking.org test page.

oneDNN 3.4 - Harness: Recurrent Neural Network Training - Engine: CPU (ms, Fewer Is Better)
  AVX-512 Off:     447.80 (SE ±0.35, N=3)
  AVX-512 256b DP: 482.88 (SE ±0.27, N=3)
  AVX-512 512b DP: 425.22 (SE ±0.60, N=3)
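Mixing "More Is Better" and "Fewer Is Better" metrics is easier once everything is normalized so that larger always means faster. A hedged sketch (the helper name is illustrative), applied to the oneDNN times above:

```python
def relative_perf(baseline: float, value: float, lower_is_better: bool = False) -> float:
    # Normalize so that > 1.0 always means faster than the baseline.
    return baseline / value if lower_is_better else value / baseline

# oneDNN RNN training, ms (fewer is better): AVX-512 off vs the 512-bit data path
print(f"{relative_perf(447.80, 425.22, lower_is_better=True):.3f}x")  # ~1.053x
```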

SVT-AV1

SVT-AV1 2.2 - Encoder Mode: Preset 8 - Input: Bosphorus 4K (Frames Per Second, More Is Better)
  AVX-512 Off:     101.84 (SE ±0.40, N=3)
  AVX-512 256b DP: 108.12 (SE ±0.47, N=3)
  AVX-512 512b DP: 114.98 (SE ±0.52, N=3)
(g++ options: -march=native -mno-avx; the 256b and 512b DP entries additionally list -mavx2 -mavx512f -mavx512bw -mavx512dq.)
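Whether a build can take the AVX-512 code paths at all depends on the CPU advertising the corresponding feature flags (the compile options above list -mavx512f, -mavx512bw, and -mavx512dq for the DP builds). A minimal sketch that parses a /proc/cpuinfo-style flags line; the function name and sample string are illustrative:

```python
def has_avx512f(cpuinfo_text: str) -> bool:
    # Scan /proc/cpuinfo-style text for the AVX-512 Foundation flag.
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return "avx512f" in line.split(":", 1)[1].split()
    return False

sample = "flags\t\t: fpu sse4_2 avx avx2 avx512f avx512bw avx512dq"
print(has_avx512f(sample))  # True
```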

OSPRay Studio

Intel OSPRay Studio is an open-source, interactive visualization and ray-tracing software package. OSPRay Studio makes use of Intel OSPRay, a portable ray-tracing engine for high-performance, high-fidelity visualizations. OSPRay builds off Intel's Embree and Intel SPMD Program Compiler (ISPC) components as part of the oneAPI rendering toolkit. Learn more via the OpenBenchmarking.org test page.

OSPRay Studio 1.0 - Camera: 2 - Resolution: 4K - Samples Per Pixel: 32 - Renderer: Path Tracer - Acceleration: CPU (ms, Fewer Is Better)
  AVX-512 Off:     22955 (SE ±25.87, N=3)
  AVX-512 256b DP: 21646 (SE ±8.65, N=3)
  AVX-512 512b DP: 20438 (SE ±8.88, N=3)

OSPRay Studio 1.0 - Camera: 1 - Resolution: 4K - Samples Per Pixel: 32 - Renderer: Path Tracer - Acceleration: CPU (ms, Fewer Is Better)
  AVX-512 Off:     22869 (SE ±34.33, N=3)
  AVX-512 256b DP: 21539 (SE ±6.39, N=3)
  AVX-512 512b DP: 20379 (SE ±25.38, N=3)

OSPRay Studio 1.0 - Camera: 1 - Resolution: 4K - Samples Per Pixel: 16 - Renderer: Path Tracer - Acceleration: CPU (ms, Fewer Is Better)
  AVX-512 Off:     11396 (SE ±7.69, N=3)
  AVX-512 256b DP: 10733 (SE ±5.86, N=3)
  AVX-512 512b DP: 10169 (SE ±9.28, N=3)

PyTorch


PyTorch 2.2.1 - Device: CPU - Batch Size: 512 - Model: ResNet-50 (batches/sec, More Is Better)
  AVX-512 Off:     38.82 (SE ±0.08, N=3)
  AVX-512 256b DP: 43.50 (SE ±0.34, N=3)
  AVX-512 512b DP: 43.39 (SE ±0.52, N=4)

OSPRay Studio


OSPRay Studio 1.0 - Camera: 2 - Resolution: 4K - Samples Per Pixel: 1 - Renderer: Path Tracer - Acceleration: CPU (ms, Fewer Is Better)
  AVX-512 Off:     719 (SE ±0.33, N=3)
  AVX-512 256b DP: 675 (SE ±0.33, N=3)
  AVX-512 512b DP: 642 (SE ±0.00, N=3)

OSPRay Studio 1.0 - Camera: 2 - Resolution: 4K - Samples Per Pixel: 16 - Renderer: Path Tracer - Acceleration: CPU (ms, Fewer Is Better)
  AVX-512 Off:     11451 (SE ±2.85, N=3)
  AVX-512 256b DP: 10816 (SE ±11.85, N=3)
  AVX-512 512b DP: 10231 (SE ±10.11, N=3)

OSPRay Studio 1.0 - Camera: 1 - Resolution: 4K - Samples Per Pixel: 1 - Renderer: Path Tracer - Acceleration: CPU (ms, Fewer Is Better)
  AVX-512 Off:     714 (SE ±0.00, N=3)
  AVX-512 256b DP: 671 (SE ±0.33, N=3)
  AVX-512 512b DP: 639 (SE ±0.58, N=3)

OSPRay Studio 1.0 - Camera: 3 - Resolution: 4K - Samples Per Pixel: 16 - Renderer: Path Tracer - Acceleration: CPU (ms, Fewer Is Better)
  AVX-512 Off:     13357 (SE ±13.35, N=3)
  AVX-512 256b DP: 12693 (SE ±7.80, N=3)
  AVX-512 512b DP: 11963 (SE ±7.06, N=3)

OSPRay Studio 1.0 - Camera: 3 - Resolution: 4K - Samples Per Pixel: 1 - Renderer: Path Tracer - Acceleration: CPU (ms, Fewer Is Better)
  AVX-512 Off:     836 (SE ±0.33, N=3)
  AVX-512 256b DP: 791 (SE ±0.58, N=3)
  AVX-512 512b DP: 752 (SE ±0.33, N=3)

Mobile Neural Network

Mobile Neural Network 2.9.b11b7037d - Model: mobilenetV3 (ms, Fewer Is Better)
  AVX-512 Off:     2.055 (SE ±0.012, N=3)
  AVX-512 256b DP: 1.980 (SE ±0.033, N=3)
  AVX-512 512b DP: 1.872 (SE ±0.008, N=3)

SVT-AV1

SVT-AV1 2.2 - Encoder Mode: Preset 5 - Input: Bosphorus 4K (Frames Per Second, More Is Better)
  AVX-512 Off:     46.25 (SE ±0.13, N=3)
  AVX-512 256b DP: 48.60 (SE ±0.20, N=3)
  AVX-512 512b DP: 50.74 (SE ±0.02, N=3)
(g++ options: -march=native -mno-avx; the 256b and 512b DP entries additionally list -mavx2 -mavx512f -mavx512bw -mavx512dq.)

libxsmm

Libxsmm is an open-source library for specialized dense and sparse matrix operations and deep learning primitives. Libxsmm supports making use of Intel AMX, AVX-512, and other modern CPU instruction set capabilities. Learn more via the OpenBenchmarking.org test page.

libxsmm 2-1.17-3645 - M N K: 128 (GFLOPS/s, More Is Better)
  AVX-512 Off:     3431.7 (SE ±12.94, N=3)
  AVX-512 256b DP: 3652.9 (SE ±5.61, N=3)
  AVX-512 512b DP: 3764.4 (SE ±9.17, N=3)
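The libxsmm GFLOPS figure follows from the standard dense-GEMM operation count of 2·M·N·K floating-point operations per multiply. A sketch with a hypothetical timing (the 1.114 s figure and repetition count are invented for illustration, not measured):

```python
def gemm_gflops(m: int, n: int, k: int, seconds: float, reps: int = 1) -> float:
    # A dense M x N x K GEMM performs 2*M*N*K FLOPs (multiply + accumulate).
    return 2 * m * n * k * reps / seconds / 1e9

# hypothetical: one million 128x128x128 GEMMs completing in 1.114 s
print(f"{gemm_gflops(128, 128, 128, 1.114, reps=1_000_000):.1f} GFLOPS")  # ~3765
```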

SVT-AV1

SVT-AV1 2.2 - Encoder Mode: Preset 3 - Input: Bosphorus 4K (Frames Per Second, More Is Better)
  AVX-512 Off:     13.04 (SE ±0.08, N=3)
  AVX-512 256b DP: 13.76 (SE ±0.05, N=3)
  AVX-512 512b DP: 14.28 (SE ±0.05, N=3)
(g++ options: -march=native -mno-avx; the 256b and 512b DP entries additionally list -mavx2 -mavx512f -mavx512bw -mavx512dq.)

Embree

Intel Embree is a collection of high-performance ray-tracing kernels for execution on CPUs (and GPUs via SYCL) and supporting instruction sets such as SSE, AVX, AVX2, and AVX-512. Embree also supports making use of the Intel SPMD Program Compiler (ISPC). Learn more via the OpenBenchmarking.org test page.

Embree 4.3 - Binary: Pathtracer ISPC - Model: Asian Dragon (Frames Per Second, More Is Better)
  AVX-512 Off:     205.95 (SE ±0.08, N=8)
  AVX-512 256b DP: 221.38 (SE ±0.05, N=8)
  AVX-512 512b DP: 223.14 (SE ±0.09, N=8)

Embree 4.3 - Binary: Pathtracer ISPC - Model: Crown (Frames Per Second, More Is Better)
  AVX-512 Off:     167.00 (SE ±0.18, N=8)
  AVX-512 256b DP: 178.13 (SE ±0.12, N=8)
  AVX-512 512b DP: 179.68 (SE ±0.09, N=8)

Numpy Benchmark

This is a test of general NumPy performance. Learn more via the OpenBenchmarking.org test page.

Numpy Benchmark (Score, More Is Better)
  AVX-512 Off:     739.69 (SE ±1.08, N=3)
  AVX-512 256b DP: 794.98 (SE ±2.12, N=3)
  AVX-512 512b DP: 795.31 (SE ±2.30, N=3)

Embree


Embree 4.3 - Binary: Pathtracer ISPC - Model: Asian Dragon Obj (Frames Per Second, More Is Better)
  AVX-512 Off:     178.48 (SE ±0.17, N=5)
  AVX-512 256b DP: 190.12 (SE ±0.11, N=5)
  AVX-512 512b DP: 191.64 (SE ±0.08, N=5)

SMHasher

SMHasher is a hash function tester supporting various algorithms and able to make use of AVX and other modern CPU instruction set extensions. Learn more via the OpenBenchmarking.org test page.

SMHasher 2022-08-22 - Hash: FarmHash32 x86_64 AVX (MiB/sec, More Is Better)
  AVX-512 Off:     32099.03 (SE ±19.92, N=6)
  AVX-512 256b DP: 34422.41 (SE ±20.30, N=6)
  AVX-512 512b DP: 34394.98 (SE ±23.52, N=6)

ONNX Runtime

ONNX Runtime 1.19 - Model: GPT-2 - Device: CPU - Executor: Standard (Inferences Per Second, More Is Better)
  AVX-512 Off:     181.56 (SE ±0.16, N=3)
  AVX-512 256b DP: 191.18 (SE ±0.18, N=3)
  AVX-512 512b DP: 192.84 (SE ±0.74, N=3)

Y-Cruncher

Y-Cruncher 0.8.5 - Pi Digits To Calculate: 1B (Seconds, Fewer Is Better)
  AVX-512 Off:     8.020 (SE ±0.014, N=5)
  AVX-512 256b DP: 7.751 (SE ±0.005, N=5)
  AVX-512 512b DP: 7.789 (SE ±0.012, N=5)

OpenFOAM

OpenFOAM is the leading free, open-source software for computational fluid dynamics (CFD). This test profile currently uses the drivaerFastback test case for analyzing automotive aerodynamics or alternatively the older motorBike input. Learn more via the OpenBenchmarking.org test page.

OpenFOAM 10 - Input: drivaerFastback, Small Mesh Size - Execution Time (Seconds, Fewer Is Better)
  AVX-512 Off:     20.42
  AVX-512 256b DP: 19.98
  AVX-512 512b DP: 20.25

OpenFOAM 10 - Input: drivaerFastback, Medium Mesh Size - Execution Time (Seconds, Fewer Is Better)
  AVX-512 Off:     161.95
  AVX-512 256b DP: 163.51
  AVX-512 512b DP: 161.00

CPU Temperature Monitor

CPU Temperature Monitor - Phoronix Test Suite System Monitoring (Celsius)
  AVX-512 Off:     Min 26.13 / Avg 49.06 / Max 63.5
  AVX-512 256b DP: Min 25.13 / Avg 49.34 / Max 63.75
  AVX-512 512b DP: Min 23.88 / Avg 50.93 / Max 66

CPU Power Consumption Monitor

CPU Power Consumption Monitor - Phoronix Test Suite System Monitoring (Watts)
  AVX-512 Off:     Min 22.25 / Avg 297.71 / Max 505.2
  AVX-512 256b DP: Min 22.32 / Avg 305.93 / Max 503.55
  AVX-512 512b DP: Min 22.21 / Avg 292.98 / Max 502.06

CPU Peak Freq (Highest CPU Core Frequency) Monitor

CPU Peak Freq (Highest CPU Core Frequency) Monitor - Phoronix Test Suite System Monitoring (Megahertz)
  AVX-512 Off:     Min 2294 / Avg 3647.31 / Max 4647
  AVX-512 256b DP: Min 2172 / Avg 3712.06 / Max 4195
  AVX-512 512b DP: Min 1886 / Avg 3621.72 / Max 4224

SVT-AV1

SVT-AV1 2.2 - CPU Temperature Monitor (Celsius)
  AVX-512 Off:     Min 28.6 / Avg 40.1 / Max 44.5
  AVX-512 256b DP: Min 28.6 / Avg 40.9 / Max 44.8
  AVX-512 512b DP: Min 31.9 / Avg 44.6 / Max 49.0

SVT-AV1 2.2 - CPU Power Consumption Monitor (Watts)
  AVX-512 Off:     Min 44.5 / Avg 178.1 / Max 293.7
  AVX-512 256b DP: Min 44.2 / Avg 176.5 / Max 291.7
  AVX-512 512b DP: Min 44.7 / Avg 178.3 / Max 298.3

SVT-AV1 2.2 - CPU Peak Freq (Highest CPU Core Frequency) Monitor (Megahertz)
  AVX-512 Off:     Min 2700 / Avg 3679 / Max 4157
  AVX-512 256b DP: Min 2700 / Avg 3639 / Max 4168
  AVX-512 512b DP: Min 2700 / Avg 3634 / Max 4159

SVT-AV1 2.2 - Encoder Mode: Preset 13 - Input: Bosphorus 4K (Frames Per Second Per Watt, More Is Better)
  AVX-512 Off:     2.125
  AVX-512 256b DP: 2.202
  AVX-512 512b DP: 2.249

SVT-AV1 2.2 - Encoder Mode: Preset 13 - Input: Bosphorus 4K (Frames Per Second, More Is Better)
  AVX-512 Off:     378.43 (SE ±0.63, N=6)
  AVX-512 256b DP: 388.54 (SE ±6.38, N=15)
  AVX-512 512b DP: 400.90 (SE ±6.48, N=15)
(g++ options: -march=native -mno-avx; the 256b and 512b DP entries additionally list -mavx2 -mavx512f -mavx512bw -mavx512dq.)
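The Frames Per Second Per Watt figure is the encoder's average frame rate divided by the average CPU package power recorded during the run. Recomputing the AVX-512 Off entry from the frame rate and power-monitor numbers above:

```python
def perf_per_watt(fps: float, avg_watts: float) -> float:
    # Efficiency metric: average frame rate per watt of average CPU power.
    return fps / avg_watts

# SVT-AV1 Preset 13 / Bosphorus 4K, AVX-512 Off: 378.43 FPS at a 178.1 W average
print(f"{perf_per_watt(378.43, 178.1):.3f} FPS per Watt")  # ~2.125
```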

ONNX Runtime

ONNX Runtime 1.19 - CPU Temperature Monitor (Celsius)
  AVX-512 Off:     Min 33.6 / Avg 52.4 / Max 54.8
  AVX-512 256b DP: Min 33.6 / Avg 53.8 / Max 56.1
  AVX-512 512b DP: Min 37.5 / Avg 51.5 / Max 53.8

ONNX Runtime 1.19 - CPU Power Consumption Monitor (Watts)
  AVX-512 Off:     Min 44.9 / Avg 381.1 / Max 419.9
  AVX-512 256b DP: Min 44.6 / Avg 394.2 / Max 436.1
  AVX-512 512b DP: Min 44.8 / Avg 336.5 / Max 372.3

ONNX Runtime 1.19 - CPU Peak Freq (Highest CPU Core Frequency) Monitor (Megahertz)
  AVX-512 Off:     Min 2700 / Avg 3402 / Max 4159
  AVX-512 256b DP: Min 2700 / Avg 3694 / Max 4148
  AVX-512 512b DP: Min 2700 / Avg 3168 / Max 4140

ONNX Runtime 1.19 - Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Standard (Inference Time Cost in ms, Fewer Is Better)
  AVX-512 Off:     181.74 (SE ±2.26, N=4)
  AVX-512 256b DP: 177.49 (SE ±4.12, N=15)
  AVX-512 512b DP: 143.94 (SE ±2.10, N=15)

ONNX Runtime 1.19 - Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Standard (Inferences Per Second, More Is Better)
  AVX-512 Off:     5.50491 (SE ±0.06677, N=4)
  AVX-512 256b DP: 5.67626 (SE ±0.12978, N=15)
  AVX-512 512b DP: 6.96791 (SE ±0.10140, N=15)

ONNX Runtime 1.19 - CPU Temperature Monitor (Celsius)
  AVX-512 Off:     Min 30.8 / Avg 53.6 / Max 55.9
  AVX-512 256b DP: Min 30.6 / Avg 51.6 / Max 54.8
  AVX-512 512b DP: Min 34.1 / Avg 55.8 / Max 57.8

ONNX Runtime 1.19 - CPU Power Consumption Monitor (Watts)
  AVX-512 Off:     Min 24.1 / Avg 353.1 / Max 382.5
  AVX-512 256b DP: Min 44.6 / Avg 337.8 / Max 367.1
  AVX-512 512b DP: Min 45.1 / Avg 350.2 / Max 376.7

ONNX Runtime 1.19 - CPU Peak Freq (Highest CPU Core Frequency) Monitor (Megahertz)
  AVX-512 Off:     Min 2700 / Avg 4044 / Max 4139
  AVX-512 256b DP: Min 2700 / Avg 4052 / Max 4148
  AVX-512 512b DP: Min 2700 / Avg 4043 / Max 4162

ONNX Runtime 1.19 - Model: bertsquad-12 - Device: CPU - Executor: Standard (Inference Time Cost in ms, Fewer Is Better)
  AVX-512 Off:     59.03 (SE ±1.68, N=12)
  AVX-512 256b DP: 55.82 (SE ±0.65, N=4)
  AVX-512 512b DP: 46.34 (SE ±0.38, N=15)

ONNX Runtime 1.19 - Model: bertsquad-12 - Device: CPU - Executor: Standard (Inferences Per Second, More Is Better)
  AVX-512 Off:     17.06 (SE ±0.38, N=12)
  AVX-512 256b DP: 17.92 (SE ±0.21, N=4)
  AVX-512 512b DP: 21.60 (SE ±0.17, N=15)

ONNX Runtime 1.19 - CPU Temperature Monitor (Celsius)
  AVX-512 Off:     Min 33.5 / Avg 49.2 / Max 53.6
  AVX-512 256b DP: Min 33.9 / Avg 52.2 / Max 54.6
  AVX-512 512b DP: Min 37.6 / Avg 49.8 / Max 51.4

ONNX Runtime 1.19 - CPU Power Consumption Monitor (Watts)
  AVX-512 Off:     Min 45.0 / Avg 343.5 / Max 379.8
  AVX-512 256b DP: Min 28.1 / Avg 381.0 / Max 419.9
  AVX-512 512b DP: Min 45.1 / Avg 317.6 / Max 346.2

ONNX Runtime 1.19 - CPU Peak Freq (Highest CPU Core Frequency) Monitor (Megahertz)
  AVX-512 Off:     Min 2700 / Avg 3519 / Max 4155
  AVX-512 256b DP: Min 2700 / Avg 3822 / Max 4161
  AVX-512 512b DP: Min 2700 / Avg 3387 / Max 4149

ONNX Runtime 1.19 - Model: fcn-resnet101-11 - Device: CPU - Executor: Standard (Inference Time Cost in ms, Fewer Is Better)
  AVX-512 Off:     127.02 (SE ±1.75, N=15)
  AVX-512 256b DP: 112.54 (SE ±2.11, N=15)
  AVX-512 512b DP: 96.81 (SE ±0.16, N=3)

ONNX Runtime 1.19 - Model: fcn-resnet101-11 - Device: CPU - Executor: Standard (Inferences Per Second, More Is Better)
  AVX-512 Off:     7.89424 (SE ±0.11217, N=15)
  AVX-512 256b DP: 8.92861 (SE ±0.16447, N=15)
  AVX-512 512b DP: 10.32990 (SE ±0.01681, N=3)
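For these single-stream ONNX Runtime results the two units are reciprocals: inferences per second is 1000 divided by the per-inference time in milliseconds. A quick check against the fcn-resnet101-11 512b DP figure above:

```python
def inferences_per_second(latency_ms: float) -> float:
    # Single-stream throughput implied by a per-inference latency.
    return 1000.0 / latency_ms

print(f"{inferences_per_second(96.81):.2f} inferences/sec")  # ~10.33
```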

Xmrig

Xmrig 6.21 - CPU Temperature Monitor (Celsius)
  AVX-512 Off:     Min 26.1 / Avg 55.7 / Max 60.3
  AVX-512 256b DP: Min 25.6 / Avg 55.6 / Max 60.1
  AVX-512 512b DP: Min 28.9 / Avg 59.2 / Max 63.0

Xmrig 6.21 - CPU Power Consumption Monitor (Watts)
  AVX-512 Off:     Min 44.9 / Avg 411.9 / Max 480.5
  AVX-512 256b DP: Min 23.4 / Avg 413.3 / Max 484.4
  AVX-512 512b DP: Min 45.0 / Avg 425.0 / Max 483.0

Xmrig 6.21 - CPU Peak Freq (Highest CPU Core Frequency) Monitor (Megahertz)
  AVX-512 Off:     Min 2700 / Avg 4011 / Max 4140
  AVX-512 256b DP: Min 2700 / Avg 4013 / Max 4153
  AVX-512 512b DP: Min 2700 / Avg 3955 / Max 4136

Xmrig 6.21 - Variant: GhostRider - Hash Count: 1M (H/s Per Watt, More Is Better)
  AVX-512 Off:     40.85
  AVX-512 256b DP: 42.29
  AVX-512 512b DP: 47.07

Xmrig 6.21 - Variant: GhostRider - Hash Count: 1M (H/s, More Is Better)
  AVX-512 Off:     16826.7 (SE ±973.00, N=15)
  AVX-512 256b DP: 17480.4 (SE ±973.55, N=15)
  AVX-512 512b DP: 20005.2 (SE ±779.44, N=15)
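When summarizing a comparison like this across many unlike tests, OpenBenchmarking-style result viewers typically report the geometric mean of the per-test speedup ratios, which Python's stdlib provides directly. The four ratios below are taken from results in this file (OpenVKL, PyTorch ResNet-50 batch 1, simdjson LargeRandom, Xmrig GhostRider), 512b DP over AVX-512 Off; this is only an illustrative subset, not the article's overall figure:

```python
import statistics

ratios = [
    3660 / 3099,        # OpenVKL vklBenchmarkCPU ISPC
    52.33 / 45.05,      # PyTorch ResNet-50, batch size 1
    1.59 / 1.40,        # simdjson LargeRandom
    20005.2 / 16826.7,  # Xmrig GhostRider 1M
]
overall = statistics.geometric_mean(ratios)
print(f"overall 512b DP speedup across these tests: {overall:.3f}x")
```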

105 Results Shown