AMD EPYC Turin 2025 New AVX-512 Benchmarks

AMD EPYC 9655P AVX-512 on/off benchmarks by Michael Larabel for a future article.

Compare your own system(s) to this result file with the Phoronix Test Suite by running the command: phoronix-test-suite benchmark 2501295-NE-AMDEPYCTU09
Result Runs:
EPYC Turin: AVX-512 Enabled - January 20 - Test Duration: 5 Hours, 37 Minutes
EPYC Turin: AVX-512 Disabled - January 20 - Test Duration: 6 Hours, 58 Minutes


System Details (both runs):
Processor: AMD EPYC 9655P 96-Core @ 2.60GHz (96 Cores / 192 Threads)
Motherboard: Supermicro Super Server H13SSL-N v1.01 (3.0 BIOS)
Chipset: AMD 1Ah
Memory: 12 x 64GB DDR5-6000MT/s Micron MTC40F2046S1RC64BDY QSFF
Disk: 3201GB Micron_7450_MTFDKCB3T2TFS
Graphics: ASPEED
Network: 2 x Broadcom NetXtreme BCM5720 PCIe
OS: Ubuntu 24.10
Kernel: 6.13.0-rc4-phx-stock (x86_64)
Desktop: GNOME Shell 47.0
Display Server: X Server
Compiler: GCC 14.2.0
File-System: ext4
Screen Resolution: 1024x768

System Logs:
- Transparent Huge Pages: madvise
- CXXFLAGS="-O3 -march=znver5 -mprefer-vector-width=512 -flto" CFLAGS="-O3 -march=znver5 -mprefer-vector-width=512 -flto"
- GCC configure: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2,rust --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-14-zdkDXv/gcc-14-14.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-14-zdkDXv/gcc-14-14.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
- Scaling Governor: acpi-cpufreq performance (Boost: Enabled)
- CPU Microcode: 0xb002116
- Python 3.12.7
- Security: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + reg_file_data_sampling: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; STIBP: always-on; RSB filling; PBRSB-eIBRS: Not affected; BHI: Not affected + srbds: Not affected + tsx_async_abort: Not affected
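The two runs differ only in whether the AVX-512 instructions are exposed/used. A minimal sketch (not part of the Phoronix Test Suite; the helper name and the sample flags string are hypothetical) of how the advertised AVX-512 extensions can be read off a /proc/cpuinfo "flags" line:

```python
# Illustrative helper (hypothetical, not from the Phoronix Test Suite):
# extract the AVX-512 feature names from a /proc/cpuinfo "flags" line.
def avx512_extensions(flags_line: str) -> list[str]:
    """Return the sorted AVX-512 feature flags present in a cpuinfo flags line."""
    return sorted({flag for flag in flags_line.split() if flag.startswith("avx512")})

# Trimmed, hypothetical excerpt of the flags a Zen 5 CPU might report:
sample = "fpu sse2 avx avx2 avx512f avx512dq avx512cd avx512bw avx512vl avx512vnni avx512bf16"
print(avx512_extensions(sample))
```

On the test system itself the same check could be run against the real flags line (e.g. the output of `grep -m1 ^flags /proc/cpuinfo`).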

Result Table: full per-test results for "EPYC Turin: AVX-512 Enabled" vs. "AVX-512 Disabled"; the individual per-test results follow below.

ACES DGEMM

ACES DGEMM 1.0 - Sustained Floating-Point Rate (GFLOP/s, more is better)
AVX-512 Enabled: 4225.43 (SE +/- 8.70, N = 4)
AVX-512 Disabled: 70.73 (SE +/- 0.03, N = 3)
(CC) gcc options: -ffast-math -O3 -march=znver5 -flto -mavx2 -fopenmp -lopenblas
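As a quick sanity check on the headline DGEMM gap, the speedup implied by the two reported means (4225.43 vs. 70.73 GFLOP/s) can be computed directly; the SE terms are small enough to ignore for this purpose:

```python
# Ratio of the AVX-512 enabled vs. disabled mean results for ACES DGEMM,
# using the GFLOP/s values reported above.
enabled_gflops = 4225.43   # AVX-512 Enabled mean
disabled_gflops = 70.73    # AVX-512 Disabled mean
speedup = enabled_gflops / disabled_gflops
print(f"DGEMM AVX-512 speedup: {speedup:.1f}x")
```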

OpenVINO

OpenVINO 2024.5 - Model: Weld Porosity Detection FP16 - Device: CPU (ms, fewer is better)
AVX-512 Enabled: 12.53 (SE +/- 0.01, N = 3) [-march=znver5; MIN: 5.9 / MAX: 31.1]
AVX-512 Disabled: 43.73 (SE +/- 0.03, N = 3) [MIN: 23.02 / MAX: 62.81]

OpenVINO 2024.5 - Model: Weld Porosity Detection FP16 - Device: CPU (FPS, more is better)
AVX-512 Enabled: 7611.17 (SE +/- 5.62, N = 3) [-march=znver5]
AVX-512 Disabled: 2189.06 (SE +/- 1.32, N = 3)

(CXX) g++ options: -fPIC -O3 -flto -fsigned-char -ffunction-sections -fdata-sections -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

Cpuminer-Opt

Cpuminer-Opt 24.3 - Algorithm: scrypt (kH/s, more is better)
AVX-512 Enabled: 2937.26 (SE +/- 3.40, N = 3)
AVX-512 Disabled: 846.44 (SE +/- 0.72, N = 3) [-mno-avx512f]

Cpuminer-Opt 24.3 - Algorithm: LBC, LBRY Credits (kH/s, more is better)
AVX-512 Enabled: 750993 (SE +/- 622.53, N = 3)
AVX-512 Disabled: 245007 (SE +/- 2841.66, N = 3) [-mno-avx512f]

(CXX) g++ options: -O3 -march=znver5 -flto -lcurl -lz -lpthread -lgmp

OpenVINO

OpenVINO 2024.5 - Model: Road Segmentation ADAS FP16 - Device: CPU (ms, fewer is better)
AVX-512 Enabled: 22.78 (SE +/- 0.07, N = 3) [-march=znver5; MIN: 11.57 / MAX: 67.03]
AVX-512 Disabled: 67.12 (SE +/- 0.18, N = 3) [MIN: 17.27 / MAX: 133.16]

OpenVINO 2024.5 - Model: Road Segmentation ADAS FP16 - Device: CPU (FPS, more is better)
AVX-512 Enabled: 2102.03 (SE +/- 6.46, N = 3) [-march=znver5]
AVX-512 Disabled: 713.96 (SE +/- 1.87, N = 3)

OpenVINO 2024.5 - Model: Person Re-Identification Retail FP16 - Device: CPU (ms, fewer is better)
AVX-512 Enabled: 4.52 (SE +/- 0.00, N = 3) [-march=znver5; MIN: 2.34 / MAX: 21.71]
AVX-512 Disabled: 13.30 (SE +/- 0.01, N = 3) [MIN: 7.56 / MAX: 31.8]

OpenVINO 2024.5 - Model: Person Detection FP16 - Device: CPU (ms, fewer is better)
AVX-512 Enabled: 67.47 (SE +/- 0.09, N = 3) [-march=znver5; MIN: 29.95 / MAX: 185.48]
AVX-512 Disabled: 198.17 (SE +/- 0.68, N = 3) [MIN: 94.52 / MAX: 339.84]

OpenVINO 2024.5 - Model: Person Detection FP16 - Device: CPU (FPS, more is better)
AVX-512 Enabled: 710.04 (SE +/- 0.91, N = 3) [-march=znver5]
AVX-512 Disabled: 241.77 (SE +/- 0.82, N = 3)

OpenVINO 2024.5 - Model: Person Re-Identification Retail FP16 - Device: CPU (FPS, more is better)
AVX-512 Enabled: 10524.04 (SE +/- 6.44, N = 3) [-march=znver5]
AVX-512 Disabled: 3605.75 (SE +/- 2.23, N = 3)

OpenVINO 2024.5 - Model: Handwritten English Recognition FP16 - Device: CPU (FPS, more is better)
AVX-512 Enabled: 3488.23 (SE +/- 4.77, N = 3) [-march=znver5]
AVX-512 Disabled: 1200.37 (SE +/- 0.28, N = 3)

OpenVINO 2024.5 - Model: Handwritten English Recognition FP16 - Device: CPU (ms, fewer is better)
AVX-512 Enabled: 27.50 (SE +/- 0.04, N = 3) [-march=znver5; MIN: 13.39 / MAX: 61.16]
AVX-512 Disabled: 79.90 (SE +/- 0.02, N = 3) [MIN: 35.69 / MAX: 151.02]

OpenVINO 2024.5 - Model: Machine Translation EN To DE FP16 - Device: CPU (FPS, more is better)
AVX-512 Enabled: 857.16 (SE +/- 0.97, N = 3) [-march=znver5]
AVX-512 Disabled: 295.88 (SE +/- 0.09, N = 3)

OpenVINO 2024.5 - Model: Machine Translation EN To DE FP16 - Device: CPU (ms, fewer is better)
AVX-512 Enabled: 55.95 (SE +/- 0.07, N = 3) [-march=znver5; MIN: 26.76 / MAX: 125.18]
AVX-512 Disabled: 162.02 (SE +/- 0.05, N = 3) [MIN: 75.62 / MAX: 264.14]

OpenVINO 2024.5 - Model: Noise Suppression Poconet-Like FP16 - Device: CPU (ms, fewer is better)
AVX-512 Enabled: 13.47 (SE +/- 0.03, N = 3) [-march=znver5; MIN: 7.72 / MAX: 51.08]
AVX-512 Disabled: 38.05 (SE +/- 0.09, N = 3) [MIN: 16.64 / MAX: 61.98]

OpenVINO 2024.5 - Model: Noise Suppression Poconet-Like FP16 - Device: CPU (FPS, more is better)
AVX-512 Enabled: 6825.09 (SE +/- 13.37, N = 3) [-march=znver5]
AVX-512 Disabled: 2516.89 (SE +/- 6.13, N = 3)

OpenVINO 2024.5 - Model: Face Detection Retail FP16 - Device: CPU (ms, fewer is better)
AVX-512 Enabled: 2.82 (SE +/- 0.00, N = 3) [-march=znver5; MIN: 1.14 / MAX: 18.12]
AVX-512 Disabled: 7.14 (SE +/- 0.01, N = 3) [MIN: 2.91 / MAX: 32.04]

OpenVINO 2024.5 - Model: Face Detection Retail FP16 - Device: CPU (FPS, more is better)
AVX-512 Enabled: 16580.18 (SE +/- 12.77, N = 3) [-march=znver5]
AVX-512 Disabled: 6688.98 (SE +/- 8.72, N = 3)

OpenVINO 2024.5 - Model: Person Vehicle Bike Detection FP16 - Device: CPU (ms, fewer is better)
AVX-512 Enabled: 7.03 (SE +/- 0.02, N = 3) [-march=znver5; MIN: 3.43 / MAX: 24.51]
AVX-512 Disabled: 15.26 (SE +/- 0.01, N = 3) [MIN: 7.72 / MAX: 48.83]

OpenVINO 2024.5 - Model: Person Vehicle Bike Detection FP16 - Device: CPU (FPS, more is better)
AVX-512 Enabled: 6775.46 (SE +/- 17.52, N = 3) [-march=znver5]
AVX-512 Disabled: 3132.50 (SE +/- 2.95, N = 3)

(CXX) g++ options: -fPIC -O3 -flto -fsigned-char -ffunction-sections -fdata-sections -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

oneDNN

oneDNN 3.6 - Harness: IP Shapes 3D - Engine: CPU (ms, fewer is better)
AVX-512 Enabled: 0.265927 (SE +/- 0.000384, N = 5)
AVX-512 Disabled: 3.075540 (SE +/- 0.002544, N = 5) [-mno-avx512f; MIN: 3.02]
(CXX) g++ options: -O3 -march=native -march=znver5 -flto -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl

miniBUDE

MiniBUDE is a mini application for the core computation of the Bristol University Docking Engine (BUDE). This test profile currently makes use of the OpenMP implementation of miniBUDE for CPU benchmarking. Learn more via the OpenBenchmarking.org test page.

miniBUDE 20210901 - Implementation: OpenMP - Input Deck: BM2 (GFInst/s, more is better)
AVX-512 Enabled: 7162.27 (SE +/- 54.17, N = 3)
AVX-512 Disabled: 3666.09 (SE +/- 2.24, N = 3)

miniBUDE 20210901 - Implementation: OpenMP - Input Deck: BM2 (Billion Interactions/s, more is better)
AVX-512 Enabled: 286.49 (SE +/- 2.17, N = 3)
AVX-512 Disabled: 146.64 (SE +/- 0.09, N = 3)

(CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm

oneDNN

oneDNN 3.6 - Harness: Deconvolution Batch shapes_1d - Engine: CPU (ms, fewer is better)
AVX-512 Enabled: 6.72466 (SE +/- 0.01312, N = 3) [MIN: 4.2]
AVX-512 Disabled: 12.86010 (SE +/- 0.00410, N = 3) [-mno-avx512f; MIN: 9.8]

oneDNN 3.6 - Harness: Deconvolution Batch shapes_3d - Engine: CPU (ms, fewer is better)
AVX-512 Enabled: 0.716979 (SE +/- 0.001216, N = 9) [MIN: 0.62]
AVX-512 Disabled: 1.354390 (SE +/- 0.001116, N = 9) [-mno-avx512f; MIN: 1.31]

(CXX) g++ options: -O3 -march=native -march=znver5 -flto -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl

ONNX Runtime

ONNX Runtime 1.19 - Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard (Inferences Per Second, more is better)
AVX-512 Enabled: 64.07 (SE +/- 0.50, N = 3)
AVX-512 Disabled: 37.15 (SE +/- 0.45, N = 15)
(CXX) g++ options: -O3 -march=native -march=znver5 -flto -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

Cpuminer-Opt

Cpuminer-Opt 24.3 - Algorithm: Skeincoin (kH/s, more is better)
AVX-512 Enabled: 1286043 (SE +/- 6599.64, N = 3)
AVX-512 Disabled: 747190 (SE +/- 2345.30, N = 3) [-mno-avx512f]
(CXX) g++ options: -O3 -march=znver5 -flto -lcurl -lz -lpthread -lgmp

OpenVINO

OpenVINO 2024.5 - Model: Face Detection FP16-INT8 - Device: CPU (FPS, more is better)
AVX-512 Enabled: 147.06 (SE +/- 0.11, N = 3) [-march=znver5]
AVX-512 Disabled: 85.54 (SE +/- 0.04, N = 3)

OpenVINO 2024.5 - Model: Face Detection FP16-INT8 - Device: CPU (ms, fewer is better)
AVX-512 Enabled: 325.74 (SE +/- 0.27, N = 3) [-march=znver5; MIN: 255.71 / MAX: 363.13]
AVX-512 Disabled: 558.74 (SE +/- 0.19, N = 3) [MIN: 273.34 / MAX: 587.28]

OpenVINO 2024.5 - Model: Age Gender Recognition Retail 0013 FP16-INT8 - Device: CPU (ms, fewer is better)
AVX-512 Enabled: 0.31 (SE +/- 0.00, N = 3) [-march=znver5; MIN: 0.12 / MAX: 25.73]
AVX-512 Disabled: 0.53 (SE +/- 0.00, N = 3) [MIN: 0.18 / MAX: 13.44]

(CXX) g++ options: -fPIC -O3 -flto -fsigned-char -ffunction-sections -fdata-sections -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

Y-Cruncher

Y-Cruncher 0.8.5 - Pi Digits To Calculate: 10B (Seconds, fewer is better)
AVX-512 Enabled: 44.05 (SE +/- 0.01, N = 3)
AVX-512 Disabled: 68.13 (SE +/- 0.19, N = 3)

Y-Cruncher 0.8.5 - Pi Digits To Calculate: 5B (Seconds, fewer is better)
AVX-512 Enabled: 21.85 (SE +/- 0.04, N = 3)
AVX-512 Disabled: 33.09 (SE +/- 0.00, N = 3)
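Since both Y-Cruncher results are times (fewer is better), a single combined figure is best taken as the geometric mean of the per-test speedups. A small sketch using the two mean times above:

```python
# Geometric mean of the AVX-512 speedups across the two Y-Cruncher runs above.
# Times are in seconds (fewer is better), so speedup = disabled / enabled.
from statistics import geometric_mean

times = {"10B": (44.05, 68.13), "5B": (21.85, 33.09)}  # (enabled, disabled)
speedups = [disabled / enabled for enabled, disabled in times.values()]
print(f"Y-Cruncher geomean speedup: {geometric_mean(speedups):.2f}x")
```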

OSPRay

OSPRay 3.2 - Benchmark: gravity_spheres_volume/dim_512/scivis/real_time (Items Per Second, more is better)
AVX-512 Enabled: 34.69 (SE +/- 0.01, N = 3)
AVX-512 Disabled: 23.17 (SE +/- 0.01, N = 3)

TensorFlow

This is a benchmark of the TensorFlow deep learning framework using the TensorFlow reference benchmarks (tensorflow/benchmarks with tf_cnn_benchmarks.py). Note with the Phoronix Test Suite there is also pts/tensorflow-lite for benchmarking the TensorFlow Lite binaries if desired for complementary metrics. Learn more via the OpenBenchmarking.org test page.

TensorFlow 2.16.1 - Device: CPU - Batch Size: 512 - Model: ResNet-50 (images/sec, more is better)
AVX-512 Enabled: 241.22 (SE +/- 0.01, N = 3)
AVX-512 Disabled: 161.27 (SE +/- 0.06, N = 3)

OSPRay

OSPRay 3.2 - Benchmark: gravity_spheres_volume/dim_512/ao/real_time (Items Per Second, more is better)
AVX-512 Enabled: 35.28 (SE +/- 0.02, N = 3)
AVX-512 Disabled: 24.03 (SE +/- 0.01, N = 3)

ONNX Runtime

ONNX Runtime 1.19 - Model: fcn-resnet101-11 - Device: CPU - Executor: Standard (Inferences Per Second, more is better)
AVX-512 Enabled: 9.56433 (SE +/- 0.13137, N = 15)
AVX-512 Disabled: 6.58603 (SE +/- 0.04636, N = 3)
(CXX) g++ options: -O3 -march=native -march=znver5 -flto -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

TensorFlow


TensorFlow 2.16.1 - Device: CPU - Batch Size: 256 - Model: ResNet-50 (images/sec, more is better)
AVX-512 Enabled: 212.82 (SE +/- 0.23, N = 3)
AVX-512 Disabled: 152.47 (SE +/- 0.30, N = 3)

OpenVINO GenAI

OpenVINO GenAI 2024.5 - Model: TinyLlama-1.1B-Chat-v1.0 - Device: CPU (tokens/s, more is better)
AVX-512 Enabled: 66.57 (SE +/- 0.94, N = 3)
AVX-512 Disabled: 91.40 (SE +/- 1.17, N = 15)

Mobile Neural Network

Mobile Neural Network 3.0 - Model: resnet-v2-50 (ms, fewer is better)
AVX-512 Enabled: 7.409 (SE +/- 0.117, N = 3) [-march=znver5; MIN: 7.08 / MAX: 9.54]
AVX-512 Disabled: 9.950 (SE +/- 0.086, N = 3) [MIN: 9.4 / MAX: 12.2]
(CXX) g++ options: -O3 -flto -std=c++11 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -pthread -ldl

ONNX Runtime

ONNX Runtime 1.19 - Model: bertsquad-12 - Device: CPU - Executor: Standard (Inferences Per Second, more is better)
AVX-512 Enabled: 22.43 (SE +/- 0.19, N = 3)
AVX-512 Disabled: 17.76 (SE +/- 0.06, N = 3)

ONNX Runtime 1.19 - Model: super-resolution-10 - Device: CPU - Executor: Standard (Inferences Per Second, more is better)
AVX-512 Enabled: 215.16 (SE +/- 0.59, N = 3)
AVX-512 Disabled: 170.95 (SE +/- 1.92, N = 3)

(CXX) g++ options: -O3 -march=native -march=znver5 -flto -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

OpenVINO

OpenVINO 2024.5 - Model: Age Gender Recognition Retail 0013 FP16-INT8 - Device: CPU (FPS, more is better)
AVX-512 Enabled: 162706.14 (SE +/- 180.45, N = 3) [-march=znver5]
AVX-512 Disabled: 129762.91 (SE +/- 170.89, N = 3)
(CXX) g++ options: -fPIC -O3 -flto -fsigned-char -ffunction-sections -fdata-sections -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

GROMACS

The GROMACS (GROningen MAchine for Chemical Simulations) molecular dynamics package testing with the water_GMX50 data. This test profile allows selecting between CPU and GPU-based GROMACS builds. Learn more via the OpenBenchmarking.org test page.

GROMACS 2024 - Implementation: MPI CPU - Input: water_GMX50_bare (Ns Per Day, more is better)
AVX-512 Enabled: 17.42 (SE +/- 0.02, N = 3)
AVX-512 Disabled: 14.13 (SE +/- 0.04, N = 3) [-mno-avx512f]
(CXX) g++ options: -O3 -march=znver5 -flto -lm

oneDNN

oneDNN 3.6 - Harness: IP Shapes 1D - Engine: CPU (ms, fewer is better)
AVX-512 Enabled: 0.533414 (SE +/- 0.001169, N = 4) [MIN: 0.49]
AVX-512 Disabled: 0.646211 (SE +/- 0.000257, N = 4) [-mno-avx512f; MIN: 0.59]
(CXX) g++ options: -O3 -march=native -march=znver5 -flto -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl

OSPRay

OSPRay 3.2 - Benchmark: gravity_spheres_volume/dim_512/pathtracer/real_time (Items Per Second, more is better)
AVX-512 Enabled: 38.20 (SE +/- 0.02, N = 3)
AVX-512 Disabled: 31.58 (SE +/- 0.04, N = 3)

ONNX Runtime

ONNX Runtime 1.19 - Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard (Inferences Per Second, more is better)
AVX-512 Enabled: 48.78 (SE +/- 0.53, N = 3)
AVX-512 Disabled: 40.55 (SE +/- 0.36, N = 8)
(CXX) g++ options: -O3 -march=native -march=znver5 -flto -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

OpenVKL

OpenVKL is the Intel Open Volume Kernel Library that offers high-performance volume computation kernels and is part of the Intel oneAPI rendering toolkit. Learn more via the OpenBenchmarking.org test page.

OpenVKL 2.0.0 - Benchmark: vklBenchmarkCPU ISPC (Items / Sec, more is better)
AVX-512 Enabled: 2817 (SE +/- 0.58, N = 3) [MIN: 217 / MAX: 36244]
AVX-512 Disabled: 2353 (SE +/- 0.58, N = 3) [MIN: 179 / MAX: 30230]

PyTorch

This is a benchmark of PyTorch making use of pytorch-benchmark [https://github.com/LukasHedegaard/pytorch-benchmark]. Learn more via the OpenBenchmarking.org test page.

PyTorch 2.2.1 - Device: CPU - Batch Size: 512 - Model: ResNet-50 (batches/sec, more is better)
AVX-512 Enabled: 50.84 (SE +/- 0.28, N = 3) [MIN: 44.51 / MAX: 52.23]
AVX-512 Disabled: 43.44 (SE +/- 0.14, N = 3) [MIN: 38.59 / MAX: 44.4]

PyTorch 2.2.1 - Device: CPU - Batch Size: 256 - Model: ResNet-50 (batches/sec, more is better)
AVX-512 Enabled: 50.52 (SE +/- 0.31, N = 3) [MIN: 43.73 / MAX: 51.63]
AVX-512 Disabled: 43.33 (SE +/- 0.21, N = 3) [MIN: 38.8 / MAX: 44.27]

oneDNN

oneDNN 3.6 - Harness: Recurrent Neural Network Training - Engine: CPU (ms, fewer is better)
AVX-512 Enabled: 428.50 (SE +/- 0.35, N = 3) [MIN: 421.37]
AVX-512 Disabled: 491.69 (SE +/- 0.56, N = 3) [-mno-avx512f; MIN: 486.04]
(CXX) g++ options: -O3 -march=native -march=znver5 -flto -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl

Llama.cpp

Llama.cpp b4397 - Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 1024 (Tokens Per Second, more is better)
AVX-512 Enabled: 106.71 (SE +/- 0.46, N = 3)
AVX-512 Disabled: 93.45 (SE +/- 0.60, N = 3) [-mno-avx512f]

Llama.cpp b4397 - Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 2048 (Tokens Per Second, more is better)
AVX-512 Enabled: 108.74 (SE +/- 0.94, N = 3)
AVX-512 Disabled: 95.27 (SE +/- 0.65, N = 3) [-mno-avx512f]

(CXX) g++ options: -O3 -march=znver5 -flto

libxsmm

Libxsmm is an open-source library for specialized dense and sparse matrix operations and deep learning primitives. Libxsmm supports making use of Intel AMX, AVX-512, and other modern CPU instruction set capabilities. Learn more via the OpenBenchmarking.org test page.

libxsmm 2-1.17-3645 - M N K: 128 (GFLOPS/s, more is better)
AVX-512 Enabled: 3422.0 (SE +/- 5.19, N = 3)
AVX-512 Disabled: 3004.6 (SE +/- 10.56, N = 3)
(CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -msse4.2

Llama.cpp

Llama.cpp b4397 - Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 512 (Tokens Per Second, more is better)
AVX-512 Enabled: 109.25 (SE +/- 1.17, N = 5)
AVX-512 Disabled: 96.15 (SE +/- 1.01, N = 4) [-mno-avx512f]

Llama.cpp b4397 - Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 512 (Tokens Per Second, more is better)
AVX-512 Enabled: 107.84 (SE +/- 0.76, N = 3)
AVX-512 Disabled: 95.03 (SE +/- 0.94, N = 3) [-mno-avx512f]

Llama.cpp b4397 - Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 1024 (Tokens Per Second, more is better)
AVX-512 Enabled: 107.32 (SE +/- 1.15, N = 3)
AVX-512 Disabled: 94.90 (SE +/- 0.77, N = 3) [-mno-avx512f]

(CXX) g++ options: -O3 -march=znver5 -flto

srsRAN Project

srsRAN Project 24.10 - Test: PDSCH Processor Benchmark, Throughput Total (Mbps, more is better)
AVX-512 Enabled: 124852.3 (SE +/- 1542.84, N = 3)
AVX-512 Disabled: 110897.9 (SE +/- 1008.70, N = 3)
(CXX) g++ options: -O3 -march=native -mtune=generic -fno-trapping-math -fno-math-errno -ldl

Llama.cpp

Llama.cpp b4397 - Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 2048 (Tokens Per Second, more is better)
AVX-512 Enabled: 107.34 (SE +/- 0.56, N = 3)
AVX-512 Disabled: 95.47 (SE +/- 0.54, N = 3) [-mno-avx512f]
(CXX) g++ options: -O3 -march=znver5 -flto

OSPRay Studio

Intel OSPRay Studio is an open-source, interactive visualization and ray-tracing software package. OSPRay Studio makes use of Intel OSPRay, a portable ray-tracing engine for high-performance, high-fidelity visualizations. OSPRay builds off Intel's Embree and Intel SPMD Program Compiler (ISPC) components as part of the oneAPI rendering toolkit. Learn more via the OpenBenchmarking.org test page.

OSPRay Studio 1.0 - Camera: 1 - Resolution: 4K - Samples Per Pixel: 32 - Renderer: Path Tracer - Acceleration: CPU (ms, fewer is better)
AVX-512 Enabled: 30958 (SE +/- 80.19, N = 3)
AVX-512 Disabled: 34668 (SE +/- 88.93, N = 3)

OSPRay Studio 1.0 - Camera: 2 - Resolution: 4K - Samples Per Pixel: 1 - Renderer: Path Tracer - Acceleration: CPU (ms, fewer is better)
AVX-512 Enabled: 873 (SE +/- 0.67, N = 3)
AVX-512 Disabled: 974 (SE +/- 0.67, N = 3)

OSPRay Studio 1.0 - Camera: 2 - Resolution: 4K - Samples Per Pixel: 16 - Renderer: Path Tracer - Acceleration: CPU (ms, fewer is better)
AVX-512 Enabled: 13901 (SE +/- 18.02, N = 3)
AVX-512 Disabled: 15503 (SE +/- 17.35, N = 3)

OSPRay Studio 1.0 - Camera: 1 - Resolution: 4K - Samples Per Pixel: 1 - Renderer: Path Tracer - Acceleration: CPU (ms, fewer is better)
AVX-512 Enabled: 868 (SE +/- 0.33, N = 3)
AVX-512 Disabled: 966 (SE +/- 0.33, N = 3)

OSPRay Studio 1.0 - Camera: 1 - Resolution: 4K - Samples Per Pixel: 16 - Renderer: Path Tracer - Acceleration: CPU (ms, fewer is better)
AVX-512 Enabled: 13823 (SE +/- 15.50, N = 3)
AVX-512 Disabled: 15382 (SE +/- 6.36, N = 3)

OSPRay Studio 1.0 - Camera: 3 - Resolution: 4K - Samples Per Pixel: 16 - Renderer: Path Tracer - Acceleration: CPU (ms, fewer is better)
AVX-512 Enabled: 16297 (SE +/- 1.45, N = 3)
AVX-512 Disabled: 18098 (SE +/- 34.35, N = 3)

OSPRay Studio 1.0 - Camera: 2 - Resolution: 4K - Samples Per Pixel: 32 - Renderer: Path Tracer - Acceleration: CPU (ms, fewer is better)
AVX-512 Enabled: 31077 (SE +/- 30.12, N = 3)
AVX-512 Disabled: 34480 (SE +/- 52.20, N = 3)

OSPRay Studio 1.0 - Camera: 3 - Resolution: 4K - Samples Per Pixel: 1 - Renderer: Path Tracer - Acceleration: CPU (ms, fewer is better)
AVX-512 Enabled: 1024 (SE +/- 1.15, N = 3)
AVX-512 Disabled: 1135 (SE +/- 0.33, N = 3)

OSPRay Studio 1.0 - Camera: 3 - Resolution: 4K - Samples Per Pixel: 32 - Renderer: Path Tracer - Acceleration: CPU (ms, fewer is better)
AVX-512 Enabled: 35955 (SE +/- 50.45, N = 3)
AVX-512 Disabled: 39785 (SE +/- 59.00, N = 3)

oneDNN

oneDNN 3.6 - Harness: Recurrent Neural Network Inference - Engine: CPU (ms, fewer is better)
  AVX-512 Enabled: 278.31 (SE +/- 0.34, N = 3, MIN: 271.12)
  AVX-512 Disabled: 307.05 (SE +/- 0.47, N = 3, MIN: 301.26) - built with -mno-avx512f
  1. (CXX) g++ options: -O3 -march=native -march=znver5 -flto -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl

SVT-AV1

SVT-AV1 2.3 - Encoder Mode: Preset 5 - Input: Bosphorus 4K (Frames Per Second, more is better)
  AVX-512 Enabled: 56.48 (SE +/- 0.23, N = 3)
  AVX-512 Disabled: 51.74 (SE +/- 0.29, N = 3)
  1. (CXX) g++ options: -O3 -march=znver5 -flto -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq

Embree

Intel Embree is a collection of high-performance ray-tracing kernels for execution on CPUs (and GPUs via SYCL) and supporting instruction sets such as SSE, AVX, AVX2, and AVX-512. Embree also supports making use of the Intel SPMD Program Compiler (ISPC). Learn more via the OpenBenchmarking.org test page.
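Since the instruction sets Embree targets (SSE, AVX, AVX2, AVX-512) are advertised as CPU feature flags, AVX-512 availability on Linux can be checked by parsing the flags line of /proc/cpuinfo. A small stdlib-only sketch; the sample flags string here is illustrative, not taken from the test system:

```python
import re

def avx512_features(flags: str) -> set:
    """Extract AVX-512 feature names (e.g. avx512f, avx512bw) from a CPU flags string."""
    return set(re.findall(r"avx512[a-z0-9_]+", flags))

# Abbreviated example; on Linux, read the real string from /proc/cpuinfo
sample = "fpu sse sse2 avx avx2 avx512f avx512dq avx512bw avx512vl avx512_bf16"
print(sorted(avx512_features(sample)))
```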

Embree 4.3 - Binary: Pathtracer ISPC - Model: Asian Dragon (Frames Per Second, more is better)
  AVX-512 Enabled: 172.96 (SE +/- 0.06, N = 8, MIN: 169.32 / MAX: 176.68)
  AVX-512 Disabled: 159.26 (SE +/- 0.10, N = 7, MIN: 154.75 / MAX: 162.47)

SVT-AV1

SVT-AV1 2.3 - Encoder Mode: Preset 8 - Input: Bosphorus 4K (Frames Per Second, more is better)
  AVX-512 Enabled: 184.79 (SE +/- 1.02, N = 4)
  AVX-512 Disabled: 171.29 (SE +/- 1.07, N = 4)
  1. (CXX) g++ options: -O3 -march=znver5 -flto -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq

SVT-AV1 2.3 - Encoder Mode: Preset 3 - Input: Bosphorus 4K (Frames Per Second, more is better)
  AVX-512 Enabled: 15.65 (SE +/- 0.03, N = 3)
  AVX-512 Disabled: 14.52 (SE +/- 0.05, N = 3)
  1. (CXX) g++ options: -O3 -march=znver5 -flto -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq

Embree


Embree 4.3 - Binary: Pathtracer ISPC - Model: Crown (Frames Per Second, more is better)
  AVX-512 Enabled: 136.42 (SE +/- 0.09, N = 7, MIN: 132.5 / MAX: 141.56)
  AVX-512 Disabled: 126.63 (SE +/- 0.04, N = 7, MIN: 123.57 / MAX: 130.86)

ONNX Runtime

ONNX Runtime 1.19 - Model: GPT-2 - Device: CPU - Executor: Standard (Inferences Per Second, more is better)
  AVX-512 Enabled: 196.35 (SE +/- 0.28, N = 3)
  AVX-512 Disabled: 182.81 (SE +/- 0.35, N = 3)
  1. (CXX) g++ options: -O3 -march=native -march=znver5 -flto -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

Embree


Embree 4.3 - Binary: Pathtracer ISPC - Model: Asian Dragon Obj (Frames Per Second, more is better)
  AVX-512 Enabled: 148.32 (SE +/- 0.11, N = 5, MIN: 144.93 / MAX: 151.79)
  AVX-512 Disabled: 138.28 (SE +/- 0.18, N = 4, MIN: 134.48 / MAX: 141.24)

Mobile Neural Network

Mobile Neural Network 3.0 - Model: SqueezeNetV1.0 (ms, fewer is better)
  AVX-512 Enabled: 3.121 (SE +/- 0.011, N = 3, MIN: 3.05 / MAX: 5.8) - built with -march=znver5
  AVX-512 Disabled: 3.327 (SE +/- 0.030, N = 3, MIN: 3.21 / MAX: 5.6)
  1. (CXX) g++ options: -O3 -flto -std=c++11 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -pthread -ldl

Numpy Benchmark

This test measures general NumPy performance. Learn more via the OpenBenchmarking.org test page.
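The SE +/- figures reported throughout this file are standard errors of the mean over N runs; they can be reproduced from raw per-run results as sample standard deviation divided by the square root of N. A minimal sketch with hypothetical run scores:

```python
import statistics
from math import sqrt

def standard_error(runs: list) -> float:
    """Standard error of the mean: sample stdev divided by sqrt(N)."""
    return statistics.stdev(runs) / sqrt(len(runs))

# Three hypothetical runs of a benchmark
print(round(standard_error([880.0, 882.0, 884.0]), 2))  # 1.15
```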

Numpy Benchmark (Score, more is better)
  AVX-512 Enabled: 882.00 (SE +/- 1.81, N = 3)
  AVX-512 Disabled: 828.67 (SE +/- 0.83, N = 3)

SMHasher

SMHasher is a hash function tester supporting various algorithms and able to make use of AVX and other modern CPU instruction set extensions. Learn more via the OpenBenchmarking.org test page.
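SMHasher reports hashing bandwidth in MiB/sec. The same style of measurement can be sketched with a Python stdlib hash; BLAKE2b stands in here for FarmHash, which hashlib does not provide, so absolute numbers are not comparable to the SMHasher results below:

```python
import hashlib
import time

def hash_throughput_mib_s(algo: str = "blake2b", size_mib: int = 64) -> float:
    """Hash size_mib MiB of zero bytes and return throughput in MiB/sec."""
    data = b"\x00" * (size_mib * 1024 * 1024)
    start = time.perf_counter()
    hashlib.new(algo, data).digest()
    elapsed = time.perf_counter() - start
    return size_mib / elapsed

print(f"{hash_throughput_mib_s():.1f} MiB/sec")
```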

SMHasher 2022-08-22 - Hash: FarmHash32 x86_64 AVX (MiB/sec, more is better)
  AVX-512 Enabled: 37638.62 (SE +/- 10.36, N = 6)
  AVX-512 Disabled: 36139.95 (SE +/- 21.63, N = 6)
  1. (CXX) g++ options: -O3 -march=znver5 -flto -march=native -flto=auto -fno-fat-lto-objects

Mobile Neural Network

Mobile Neural Network 3.0 - Model: mobilenetV3 (ms, fewer is better)
  AVX-512 Enabled: 1.736 (SE +/- 0.010, N = 3, MIN: 1.59 / MAX: 2.21) - built with -march=znver5
  AVX-512 Disabled: 1.803 (SE +/- 0.010, N = 3, MIN: 1.68 / MAX: 2.88)
  1. (CXX) g++ options: -O3 -flto -std=c++11 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -pthread -ldl

Llama.cpp

Llama.cpp b4397 - Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 512 (Tokens Per Second, more is better)
  AVX-512 Enabled: 356.76 (SE +/- 2.40, N = 5)
  AVX-512 Disabled: 347.10 (SE +/- 2.08, N = 5) - built with -mno-avx512f
  1. (CXX) g++ options: -O3 -march=znver5 -flto

ONNX Runtime

ONNX Runtime 1.19 - Model: T5 Encoder - Device: CPU - Executor: Standard (Inferences Per Second, more is better)
  AVX-512 Enabled: 244.99 (SE +/- 1.84, N = 3)
  AVX-512 Disabled: 238.64 (SE +/- 0.75, N = 3)
  1. (CXX) g++ options: -O3 -march=native -march=znver5 -flto -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

Laghos

Laghos 3.1 - Test: Sedov Blast Wave, ube_922_hex.mesh (Major Kernels Total Rate, more is better)
  AVX-512 Enabled: 603.92 (SE +/- 2.72, N = 3)
  AVX-512 Disabled: 590.60 (SE +/- 3.13, N = 3) - built with -mno-avx512f
  1. (CXX) g++ options: -O3 -march=znver5 -flto -lmfem -lHYPRE -lmetis -lrt -lmpi_cxx -lmpi

SVT-AV1

SVT-AV1 2.3 - Encoder Mode: Preset 13 - Input: Bosphorus 4K (Frames Per Second, more is better)
  AVX-512 Enabled: 416.85 (SE +/- 2.58, N = 6)
  AVX-512 Disabled: 412.13 (SE +/- 1.74, N = 6)
  1. (CXX) g++ options: -O3 -march=znver5 -flto -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq

ONNX Runtime

ONNX Runtime 1.19 - CPU Temperature Monitor (Celsius, fewer is better)
  AVX-512 Enabled: Min: 39.75 / Avg: 48.18 / Max: 49.63
  AVX-512 Disabled: Min: 39.5 / Avg: 51.21 / Max: 53.25

ONNX Runtime 1.19 - CPU Power Consumption Monitor (Watts, fewer is better)
  AVX-512 Enabled: Min: 0.49 / Avg: 258.72 / Max: 282.16
  AVX-512 Disabled: Min: 0.55 / Avg: 286.1 / Max: 324.37

ONNX Runtime 1.19 - CPU Peak Freq (Highest CPU Core Frequency) Monitor (Megahertz, more is better)
  AVX-512 Enabled: Min: 2600 / Avg: 2916.93 / Max: 4543
  AVX-512 Disabled: Min: 2600 / Avg: 3239.62 / Max: 4575

ONNX Runtime 1.19 - Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Standard (Inference Time Cost (ms), fewer is better)
  AVX-512 Enabled: 153.71 (SE +/- 1.61, N = 4)
  AVX-512 Disabled: 235.71 (SE +/- 9.18, N = 12)
  1. (CXX) g++ options: -O3 -march=native -march=znver5 -flto -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.19 - Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Standard (Inferences Per Second, more is better)
  AVX-512 Enabled: 6.50795 (SE +/- 0.06862, N = 4)
  AVX-512 Disabled: 4.30914 (SE +/- 0.15591, N = 12)
  1. (CXX) g++ options: -O3 -march=native -march=znver5 -flto -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

miniBUDE

miniBUDE 20210901 - CPU Temperature Monitor (Celsius, fewer is better)
  AVX-512 Enabled: Min: 26.38 / Avg: 41.78 / Max: 46.38
  AVX-512 Disabled: Min: 28 / Avg: 43.7 / Max: 48.25

miniBUDE 20210901 - CPU Power Consumption Monitor (Watts, fewer is better)
  AVX-512 Enabled: Min: 36.47 / Avg: 167.14 / Max: 343.87
  AVX-512 Disabled: Min: 36.55 / Avg: 215.04 / Max: 347.21

miniBUDE 20210901 - CPU Peak Freq (Highest CPU Core Frequency) Monitor (Megahertz, more is better)
  AVX-512 Enabled: Min: 2600 / Avg: 2984.55 / Max: 4564
  AVX-512 Disabled: Min: 2600 / Avg: 3205.72 / Max: 4570

miniBUDE 20210901 - Implementation: OpenMP - Input Deck: BM1 (Billion Interactions/s, more is better)
  AVX-512 Enabled: 254.75 (SE +/- 5.97, N = 15)
  AVX-512 Disabled: 136.66 (SE +/- 2.22, N = 15)
  1. (CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm

miniBUDE 20210901 - Implementation: OpenMP - Input Deck: BM1 (GFInst/s, more is better)
  AVX-512 Enabled: 6368.74 (SE +/- 149.28, N = 15)
  AVX-512 Disabled: 3416.43 (SE +/- 55.43, N = 15)
  1. (CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm
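This result file can also be summarized with an overall geometric mean, as offered in the view options above; for higher-is-better metrics, that reduces to the geometric mean of enabled/disabled ratios. A minimal sketch using three higher-is-better pairs taken from this file (srsRAN throughput, Llama.cpp prompt processing 2048, Embree Asian Dragon):

```python
from math import prod

def geo_mean_speedup(pairs: list) -> float:
    """Geometric mean of enabled/disabled ratios for higher-is-better results."""
    ratios = [enabled / disabled for enabled, disabled in pairs]
    return prod(ratios) ** (1.0 / len(ratios))

pairs = [(124852.3, 110897.9), (107.34, 95.47), (172.96, 159.26)]
print(round(geo_mean_speedup(pairs), 3))
```

The geometric mean is preferred over the arithmetic mean here because it treats a 2x gain and a 2x loss as cancelling out, rather than letting large-magnitude tests dominate.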

94 Results Shown

ACES DGEMM
OpenVINO:
  Weld Porosity Detection FP16 - CPU:
    ms
    FPS
Cpuminer-Opt:
  scrypt
  LBC, LBRY Credits
OpenVINO:
  Road Segmentation ADAS FP16 - CPU:
    ms
    FPS
  Person Re-Identification Retail FP16 - CPU:
    ms
  Person Detection FP16 - CPU:
    ms
    FPS
  Person Re-Identification Retail FP16 - CPU:
    FPS
  Handwritten English Recognition FP16 - CPU:
    FPS
    ms
  Machine Translation EN To DE FP16 - CPU:
    FPS
    ms
  Noise Suppression Poconet-Like FP16 - CPU:
    ms
    FPS
  Face Detection Retail FP16 - CPU:
    ms
    FPS
  Person Vehicle Bike Detection FP16 - CPU:
    ms
    FPS
oneDNN
miniBUDE:
  OpenMP - BM2:
    GFInst/s
    Billion Interactions/s
oneDNN:
  Deconvolution Batch shapes_1d - CPU
  Deconvolution Batch shapes_3d - CPU
ONNX Runtime
Cpuminer-Opt
OpenVINO:
  Face Detection FP16-INT8 - CPU:
    FPS
    ms
  Age Gender Recognition Retail 0013 FP16-INT8 - CPU:
    ms
Y-Cruncher:
  10B
  5B
OSPRay
TensorFlow
OSPRay
ONNX Runtime
TensorFlow
OpenVINO GenAI
Mobile Neural Network
ONNX Runtime:
  bertsquad-12 - CPU - Standard
  super-resolution-10 - CPU - Standard
OpenVINO
GROMACS
oneDNN
OSPRay
ONNX Runtime
OpenVKL
PyTorch:
  CPU - 512 - ResNet-50
  CPU - 256 - ResNet-50
oneDNN
Llama.cpp:
  CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 1024
  CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 2048
libxsmm
Llama.cpp:
  CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 512
  CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 512
  CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 1024
srsRAN Project
Llama.cpp
OSPRay Studio:
  1 - 4K - 32 - Path Tracer - CPU
  2 - 4K - 1 - Path Tracer - CPU
  2 - 4K - 16 - Path Tracer - CPU
  1 - 4K - 1 - Path Tracer - CPU
  1 - 4K - 16 - Path Tracer - CPU
  3 - 4K - 16 - Path Tracer - CPU
  2 - 4K - 32 - Path Tracer - CPU
  3 - 4K - 1 - Path Tracer - CPU
  3 - 4K - 32 - Path Tracer - CPU
oneDNN
SVT-AV1
Embree
SVT-AV1:
  Preset 8 - Bosphorus 4K
  Preset 3 - Bosphorus 4K
Embree
ONNX Runtime
Embree
Mobile Neural Network
Numpy Benchmark
SMHasher
Mobile Neural Network
Llama.cpp
ONNX Runtime
Laghos
SVT-AV1
ONNX Runtime:
  CPU Temp Monitor
  CPU Power Consumption Monitor
  CPU Peak Freq (Highest CPU Core Frequency) Monitor
ONNX Runtime:
  ResNet101_DUC_HDC-12 - CPU - Standard:
    Inference Time Cost (ms)
    Inferences Per Second
miniBUDE:
  CPU Temp Monitor
  CPU Power Consumption Monitor
  CPU Peak Freq (Highest CPU Core Frequency) Monitor
miniBUDE:
  OpenMP - BM1:
    Billion Interactions/s
    GFInst/s