AMD EPYC Turin 2025 New AVX-512 Benchmarks

AMD EPYC 9655P AVX-512 on/off benchmarks by Michael Larabel for a future article.

Compare your own system(s) to this result file with the Phoronix Test Suite by running the command: phoronix-test-suite benchmark 2501295-NE-AMDEPYCTU09
Result Identifier                Date Run      Test Duration
EPYC Turin: AVX-512 Enabled      January 20    5 Hours, 37 Minutes
EPYC Turin: AVX-512 Disabled     January 20    6 Hours, 58 Minutes
Average                                        6 Hours, 18 Minutes


System Details

  Processor: AMD EPYC 9655P 96-Core @ 2.60GHz (96 Cores / 192 Threads)
  Motherboard: Supermicro Super Server H13SSL-N v1.01 (3.0 BIOS)
  Chipset: AMD 1Ah
  Memory: 12 x 64GB DDR5-6000MT/s Micron MTC40F2046S1RC64BDY QSFF
  Disk: 3201GB Micron_7450_MTFDKCB3T2TFS
  Graphics: ASPEED
  Network: 2 x Broadcom NetXtreme BCM5720 PCIe
  OS: Ubuntu 24.10
  Kernel: 6.13.0-rc4-phx-stock (x86_64)
  Desktop: GNOME Shell 47.0
  Display Server: X Server
  Compiler: GCC 14.2.0
  File-System: ext4
  Screen Resolution: 1024x768

System Logs:
- Transparent Huge Pages: madvise
- CXXFLAGS="-O3 -march=znver5 -mprefer-vector-width=512 -flto" CFLAGS="-O3 -march=znver5 -mprefer-vector-width=512 -flto"
- GCC configure options: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2,rust --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-14-zdkDXv/gcc-14-14.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-14-zdkDXv/gcc-14-14.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
- Scaling Governor: acpi-cpufreq performance (Boost: Enabled)
- CPU Microcode: 0xb002116
- Python 3.12.7
- Security mitigations: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + reg_file_data_sampling: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; STIBP: always-on; RSB filling; PBRSB-eIBRS: Not affected; BHI: Not affected + srbds: Not affected + tsx_async_abort: Not affected
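Throughout this file, many of the AVX-512 Disabled runs were rebuilt with -mno-avx512f (noted on the affected results below), while the AVX-512 Enabled runs used the -O3 -march=znver5 -mprefer-vector-width=512 -flto flags shown above. As an illustrative sketch only, and not part of the Phoronix Test Suite harness, the following C++ program shows how a GCC-built binary can report both the compile-time and the run-time AVX-512 situation on a system like this EPYC 9655P:

    // avx512_check.cpp - illustrative only; not part of the benchmark harness.
    // Build roughly matching the two configurations in this result file:
    //   g++ -O3 -march=znver5 -mprefer-vector-width=512 avx512_check.cpp -o avx512_on
    //   g++ -O3 -march=znver5 -mno-avx512f avx512_check.cpp -o avx512_off
    #include <cstdio>

    int main() {
        // Compile-time view: GCC defines __AVX512F__ only when the selected
        // -march/-m flags permit AVX-512 Foundation code generation.
    #ifdef __AVX512F__
        std::puts("compiled with AVX-512F code generation enabled");
    #else
        std::puts("compiled without AVX-512F code generation");
    #endif

        // Run-time view: the CPU itself still advertises AVX-512 support
        // regardless of how this particular binary was compiled.
        if (__builtin_cpu_supports("avx512f"))
            std::puts("CPU reports AVX-512F support");
        else
            std::puts("CPU does not report AVX-512F support");
        return 0;
    }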

Results Overview: side-by-side values for all 87 results (AVX-512 Enabled vs. AVX-512 Disabled) are available in the condensed comparison table on OpenBenchmarking.org; the individual results follow below and are indexed under "87 Results Shown" at the end of this file.

miniBUDE

MiniBUDE is a mini application for the core computation of the Bristol University Docking Engine (BUDE). This test profile currently makes use of the OpenMP implementation of miniBUDE for CPU benchmarking. Learn more via the OpenBenchmarking.org test page.
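The miniBUDE hot loop evaluates many independent ligand-pose interactions, which is why it responds so strongly to wider vectors. The kernel below is not miniBUDE source code, just a minimal OpenMP sketch of the same flavor: built with the AVX-512-enabled flags used in this file (for example -O3 -march=znver5 -mprefer-vector-width=512 -fopenmp), GCC can vectorize the loop with 512-bit instructions, while adding -mno-avx512f limits it to 256-bit AVX2.

    // omp_simd_sketch.cpp - illustrative kernel only, not miniBUDE code.
    //   g++ -O3 -march=znver5 -mprefer-vector-width=512 -fopenmp omp_simd_sketch.cpp
    #include <cstdio>
    #include <vector>

    // Sum of softened inverse-square interactions; every iteration is
    // independent, so the compiler is free to use full-width SIMD lanes.
    double interaction_energy(const std::vector<double>& dx,
                              const std::vector<double>& dy,
                              const std::vector<double>& dz) {
        double energy = 0.0;
        const std::size_t n = dx.size();
    #pragma omp parallel for simd reduction(+ : energy)
        for (std::size_t i = 0; i < n; ++i) {
            const double r2 = dx[i] * dx[i] + dy[i] * dy[i] + dz[i] * dz[i];
            energy += 1.0 / (r2 + 1.0);  // softened to avoid division by zero
        }
        return energy;
    }

    int main() {
        const std::size_t n = 1 << 20;
        std::vector<double> dx(n, 0.5), dy(n, 0.25), dz(n, 0.125);
        std::printf("energy = %f\n", interaction_energy(dx, dy, dz));
        return 0;
    }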

miniBUDE 20210901 - Implementation: OpenMP - Input Deck: BM1 (GFInst/s, More Is Better)
  AVX-512 Enabled:  6368.74 (SE +/- 149.28, N = 15)
  AVX-512 Disabled: 3416.43 (SE +/- 55.43, N = 15)

miniBUDE 20210901 - Implementation: OpenMP - Input Deck: BM1 (Billion Interactions/s, More Is Better)
  AVX-512 Enabled:  254.75 (SE +/- 5.97, N = 15)
  AVX-512 Disabled: 136.66 (SE +/- 2.22, N = 15)

miniBUDE 20210901 - Implementation: OpenMP - Input Deck: BM2 (GFInst/s, More Is Better)
  AVX-512 Enabled:  7162.27 (SE +/- 54.17, N = 3)
  AVX-512 Disabled: 3666.09 (SE +/- 2.24, N = 3)

miniBUDE 20210901 - Implementation: OpenMP - Input Deck: BM2 (Billion Interactions/s, More Is Better)
  AVX-512 Enabled:  286.49 (SE +/- 2.17, N = 3)
  AVX-512 Disabled: 146.64 (SE +/- 0.09, N = 3)

Compiler options for all miniBUDE 20210901 results: (CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm

GROMACS

The GROMACS (GROningen MAchine for Chemical Simulations) molecular dynamics package is tested here with the water_GMX50 data set. This test profile allows selecting between CPU and GPU-based GROMACS builds. Learn more via the OpenBenchmarking.org test page.

GROMACS 2024 - Implementation: MPI CPU - Input: water_GMX50_bare (Ns Per Day, More Is Better)
  AVX-512 Enabled:  17.42 (SE +/- 0.02, N = 3)
  AVX-512 Disabled: 14.13 (SE +/- 0.04, N = 3) [-mno-avx512f]
  1. (CXX) g++ options: -O3 -march=znver5 -flto -lm

ACES DGEMM

ACES DGEMM 1.0 - Sustained Floating-Point Rate (GFLOP/s, More Is Better)
  AVX-512 Enabled:  4225.43 (SE +/- 8.70, N = 4)
  AVX-512 Disabled: 70.73 (SE +/- 0.03, N = 3)
  1. (CC) gcc options: -ffast-math -O3 -march=znver5 -flto -mavx2 -fopenmp -lopenblas

Laghos

Laghos 3.1 - Test: Sedov Blast Wave, ube_922_hex.mesh (Major Kernels Total Rate, More Is Better)
  AVX-512 Enabled:  603.92 (SE +/- 2.72, N = 3)
  AVX-512 Disabled: 590.60 (SE +/- 3.13, N = 3) [-mno-avx512f]
  1. (CXX) g++ options: -O3 -march=znver5 -flto -lmfem -lHYPRE -lmetis -lrt -lmpi_cxx -lmpi

Embree

Intel Embree is a collection of high-performance ray-tracing kernels for execution on CPUs (and GPUs via SYCL) and supporting instruction sets such as SSE, AVX, AVX2, and AVX-512. Embree also supports making use of the Intel SPMD Program Compiler (ISPC). Learn more via the OpenBenchmarking.org test page.
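Embree selects its kernel ISA when a device is created, which is effectively the knob the AVX-512 on/off comparison turns. The sketch below assumes Embree 4's C API and its isa= device configuration string; it is an illustration of that mechanism rather than code from this test profile.

    // embree_isa_sketch.cpp - illustrative only; assumes Embree 4 is installed.
    //   g++ -O3 embree_isa_sketch.cpp -lembree4
    #include <embree4/rtcore.h>
    #include <cstdio>

    // Try to create a device restricted to a given ISA configuration string.
    static bool try_isa(const char* config) {
        RTCDevice device = rtcNewDevice(config);  // e.g. "isa=avx512" or "isa=avx2"
        if (device == nullptr) {
            // rtcGetDeviceError(nullptr) reports errors from device creation.
            std::printf("%-12s -> not available (error %d)\n",
                        config ? config : "(default)",
                        (int)rtcGetDeviceError(nullptr));
            return false;
        }
        std::printf("%-12s -> device created\n", config ? config : "(default)");
        rtcReleaseDevice(device);
        return true;
    }

    int main() {
        try_isa("isa=avx512");  // roughly what the AVX-512 Enabled runs exercise
        try_isa("isa=avx2");    // the widest fallback when AVX-512 is off
        try_isa(nullptr);       // default: Embree picks the best ISA it detects
        return 0;
    }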

Embree 4.3 - Binary: Pathtracer ISPC - Model: Asian Dragon (Frames Per Second, More Is Better)
  AVX-512 Enabled:  172.96 (SE +/- 0.06, N = 8) [MIN: 169.32 / MAX: 176.68]
  AVX-512 Disabled: 159.26 (SE +/- 0.10, N = 7) [MIN: 154.75 / MAX: 162.47]

Embree 4.3 - Binary: Pathtracer ISPC - Model: Asian Dragon Obj (Frames Per Second, More Is Better)
  AVX-512 Enabled:  148.32 (SE +/- 0.11, N = 5) [MIN: 144.93 / MAX: 151.79]
  AVX-512 Disabled: 138.28 (SE +/- 0.18, N = 4) [MIN: 134.48 / MAX: 141.24]

Embree 4.3 - Binary: Pathtracer ISPC - Model: Crown (Frames Per Second, More Is Better)
  AVX-512 Enabled:  136.42 (SE +/- 0.09, N = 7) [MIN: 132.5 / MAX: 141.56]
  AVX-512 Disabled: 126.63 (SE +/- 0.04, N = 7) [MIN: 123.57 / MAX: 130.86]

OpenVKL

OpenVKL is the Intel Open Volume Kernel Library that offers high-performance volume computation kernels and is part of the Intel oneAPI rendering toolkit. Learn more via the OpenBenchmarking.org test page.

OpenVKL 2.0.0 - Benchmark: vklBenchmarkCPU ISPC (Items / Sec, More Is Better)
  AVX-512 Enabled:  2817 (SE +/- 0.58, N = 3) [MIN: 217 / MAX: 36244]
  AVX-512 Disabled: 2353 (SE +/- 0.58, N = 3) [MIN: 179 / MAX: 30230]

OSPRay

OSPRay 3.2 - Benchmark: gravity_spheres_volume/dim_512/ao/real_time (Items Per Second, More Is Better)
  AVX-512 Enabled:  35.28 (SE +/- 0.02, N = 3)
  AVX-512 Disabled: 24.03 (SE +/- 0.01, N = 3)

OSPRay 3.2 - Benchmark: gravity_spheres_volume/dim_512/scivis/real_time (Items Per Second, More Is Better)
  AVX-512 Enabled:  34.69 (SE +/- 0.01, N = 3)
  AVX-512 Disabled: 23.17 (SE +/- 0.01, N = 3)

OSPRay 3.2 - Benchmark: gravity_spheres_volume/dim_512/pathtracer/real_time (Items Per Second, More Is Better)
  AVX-512 Enabled:  38.20 (SE +/- 0.02, N = 3)
  AVX-512 Disabled: 31.58 (SE +/- 0.04, N = 3)

OSPRay Studio

Intel OSPRay Studio is an open-source, interactive visualization and ray-tracing software package. OSPRay Studio makes use of Intel OSPRay, a portable ray-tracing engine for high-performance, high-fidelity visualizations. OSPRay builds off Intel's Embree and Intel SPMD Program Compiler (ISPC) components as part of the oneAPI rendering toolkit. Learn more via the OpenBenchmarking.org test page.

OSPRay Studio 1.0 - Camera: 1 - Resolution: 4K - Samples Per Pixel: 1 - Renderer: Path Tracer - Acceleration: CPU (ms, Fewer Is Better)
  AVX-512 Enabled:  868 (SE +/- 0.33, N = 3)
  AVX-512 Disabled: 966 (SE +/- 0.33, N = 3)

OSPRay Studio 1.0 - Camera: 1 - Resolution: 4K - Samples Per Pixel: 16 - Renderer: Path Tracer - Acceleration: CPU (ms, Fewer Is Better)
  AVX-512 Enabled:  13823 (SE +/- 15.50, N = 3)
  AVX-512 Disabled: 15382 (SE +/- 6.36, N = 3)

OSPRay Studio 1.0 - Camera: 1 - Resolution: 4K - Samples Per Pixel: 32 - Renderer: Path Tracer - Acceleration: CPU (ms, Fewer Is Better)
  AVX-512 Enabled:  30958 (SE +/- 80.19, N = 3)
  AVX-512 Disabled: 34668 (SE +/- 88.93, N = 3)

OSPRay Studio 1.0 - Camera: 2 - Resolution: 4K - Samples Per Pixel: 1 - Renderer: Path Tracer - Acceleration: CPU (ms, Fewer Is Better)
  AVX-512 Enabled:  873 (SE +/- 0.67, N = 3)
  AVX-512 Disabled: 974 (SE +/- 0.67, N = 3)

OSPRay Studio 1.0 - Camera: 2 - Resolution: 4K - Samples Per Pixel: 16 - Renderer: Path Tracer - Acceleration: CPU (ms, Fewer Is Better)
  AVX-512 Enabled:  13901 (SE +/- 18.02, N = 3)
  AVX-512 Disabled: 15503 (SE +/- 17.35, N = 3)

OSPRay Studio 1.0 - Camera: 2 - Resolution: 4K - Samples Per Pixel: 32 - Renderer: Path Tracer - Acceleration: CPU (ms, Fewer Is Better)
  AVX-512 Enabled:  31077 (SE +/- 30.12, N = 3)
  AVX-512 Disabled: 34480 (SE +/- 52.20, N = 3)

OSPRay Studio 1.0 - Camera: 3 - Resolution: 4K - Samples Per Pixel: 1 - Renderer: Path Tracer - Acceleration: CPU (ms, Fewer Is Better)
  AVX-512 Enabled:  1024 (SE +/- 1.15, N = 3)
  AVX-512 Disabled: 1135 (SE +/- 0.33, N = 3)

OSPRay Studio 1.0 - Camera: 3 - Resolution: 4K - Samples Per Pixel: 16 - Renderer: Path Tracer - Acceleration: CPU (ms, Fewer Is Better)
  AVX-512 Enabled:  16297 (SE +/- 1.45, N = 3)
  AVX-512 Disabled: 18098 (SE +/- 34.35, N = 3)

OSPRay Studio 1.0 - Camera: 3 - Resolution: 4K - Samples Per Pixel: 32 - Renderer: Path Tracer - Acceleration: CPU (ms, Fewer Is Better)
  AVX-512 Enabled:  35955 (SE +/- 50.45, N = 3)
  AVX-512 Disabled: 39785 (SE +/- 59.00, N = 3)

Y-Cruncher

Y-Cruncher 0.8.5 - Pi Digits To Calculate: 5B (Seconds, Fewer Is Better)
  AVX-512 Enabled:  21.85 (SE +/- 0.04, N = 3)
  AVX-512 Disabled: 33.09 (SE +/- 0.00, N = 3)

Y-Cruncher 0.8.5 - Pi Digits To Calculate: 10B (Seconds, Fewer Is Better)
  AVX-512 Enabled:  44.05 (SE +/- 0.01, N = 3)
  AVX-512 Disabled: 68.13 (SE +/- 0.19, N = 3)

Cpuminer-Opt

Cpuminer-Opt 24.3 - Algorithm: scrypt (kH/s, More Is Better)
  AVX-512 Enabled:  2937.26 (SE +/- 3.40, N = 3)
  AVX-512 Disabled: 846.44 (SE +/- 0.72, N = 3) [-mno-avx512f]

Cpuminer-Opt 24.3 - Algorithm: Skeincoin (kH/s, More Is Better)
  AVX-512 Enabled:  1286043 (SE +/- 6599.64, N = 3)
  AVX-512 Disabled: 747190 (SE +/- 2345.30, N = 3) [-mno-avx512f]

Cpuminer-Opt 24.3 - Algorithm: LBC, LBRY Credits (kH/s, More Is Better)
  AVX-512 Enabled:  750993 (SE +/- 622.53, N = 3)
  AVX-512 Disabled: 245007 (SE +/- 2841.66, N = 3) [-mno-avx512f]

Compiler options for all Cpuminer-Opt 24.3 results: (CXX) g++ options: -O3 -march=znver5 -flto -lcurl -lz -lpthread -lgmp

SMHasher

SMHasher is a hash function tester supporting various algorithms and able to make use of AVX and other modern CPU instruction set extensions. Learn more via the OpenBenchmarking.org test page.

SMHasher 2022-08-22 - Hash: FarmHash32 x86_64 AVX (MiB/sec, More Is Better)
  AVX-512 Enabled:  37638.62 (SE +/- 10.36, N = 6)
  AVX-512 Disabled: 36139.95 (SE +/- 21.63, N = 6)
  1. (CXX) g++ options: -O3 -march=znver5 -flto -march=native -flto=auto -fno-fat-lto-objects

oneDNN

oneDNN 3.6 - Harness: Deconvolution Batch shapes_1d - Engine: CPU (ms, Fewer Is Better)
  AVX-512 Enabled:  6.72466 (SE +/- 0.01312, N = 3) [MIN: 4.2]
  AVX-512 Disabled: 12.86010 (SE +/- 0.00410, N = 3) [-mno-avx512f, MIN: 9.8]

oneDNN 3.6 - Harness: Deconvolution Batch shapes_3d - Engine: CPU (ms, Fewer Is Better)
  AVX-512 Enabled:  0.716979 (SE +/- 0.001216, N = 9) [MIN: 0.62]
  AVX-512 Disabled: 1.354390 (SE +/- 0.001116, N = 9) [-mno-avx512f, MIN: 1.31]

oneDNN 3.6 - Harness: IP Shapes 1D - Engine: CPU (ms, Fewer Is Better)
  AVX-512 Enabled:  0.533414 (SE +/- 0.001169, N = 4) [MIN: 0.49]
  AVX-512 Disabled: 0.646211 (SE +/- 0.000257, N = 4) [-mno-avx512f, MIN: 0.59]

oneDNN 3.6 - Harness: IP Shapes 3D - Engine: CPU (ms, Fewer Is Better)
  AVX-512 Enabled:  0.265927 (SE +/- 0.000384, N = 5)
  AVX-512 Disabled: 3.075540 (SE +/- 0.002544, N = 5) [-mno-avx512f, MIN: 3.02]

oneDNN 3.6 - Harness: Recurrent Neural Network Training - Engine: CPU (ms, Fewer Is Better)
  AVX-512 Enabled:  428.50 (SE +/- 0.35, N = 3) [MIN: 421.37]
  AVX-512 Disabled: 491.69 (SE +/- 0.56, N = 3) [-mno-avx512f, MIN: 486.04]

oneDNN 3.6 - Harness: Recurrent Neural Network Inference - Engine: CPU (ms, Fewer Is Better)
  AVX-512 Enabled:  278.31 (SE +/- 0.34, N = 3) [MIN: 271.12]
  AVX-512 Disabled: 307.05 (SE +/- 0.47, N = 3) [-mno-avx512f, MIN: 301.26]

Compiler options for all oneDNN 3.6 results: (CXX) g++ options: -O3 -march=native -march=znver5 -flto -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl

PyTorch

This is a benchmark of PyTorch making use of pytorch-benchmark [https://github.com/LukasHedegaard/pytorch-benchmark]. Learn more via the OpenBenchmarking.org test page.

PyTorch 2.2.1 - Device: CPU - Batch Size: 256 - Model: ResNet-50 (batches/sec, More Is Better)
  AVX-512 Enabled:  50.52 (SE +/- 0.31, N = 3) [MIN: 43.73 / MAX: 51.63]
  AVX-512 Disabled: 43.33 (SE +/- 0.21, N = 3) [MIN: 38.8 / MAX: 44.27]

PyTorch 2.2.1 - Device: CPU - Batch Size: 512 - Model: ResNet-50 (batches/sec, More Is Better)
  AVX-512 Enabled:  50.84 (SE +/- 0.28, N = 3) [MIN: 44.51 / MAX: 52.23]
  AVX-512 Disabled: 43.44 (SE +/- 0.14, N = 3) [MIN: 38.59 / MAX: 44.4]

TensorFlow

This is a benchmark of the TensorFlow deep learning framework using the TensorFlow reference benchmarks (tensorflow/benchmarks with tf_cnn_benchmarks.py). Note that the Phoronix Test Suite also provides pts/tensorflow-lite for benchmarking the TensorFlow Lite binaries if complementary metrics are desired. Learn more via the OpenBenchmarking.org test page.

TensorFlow 2.16.1 - Device: CPU - Batch Size: 256 - Model: ResNet-50 (images/sec, More Is Better)
  AVX-512 Enabled:  212.82 (SE +/- 0.23, N = 3)
  AVX-512 Disabled: 152.47 (SE +/- 0.30, N = 3)

TensorFlow 2.16.1 - Device: CPU - Batch Size: 512 - Model: ResNet-50 (images/sec, More Is Better)
  AVX-512 Enabled:  241.22 (SE +/- 0.01, N = 3)
  AVX-512 Disabled: 161.27 (SE +/- 0.06, N = 3)

OpenVINO

OpenVINO 2024.5 - Model: Face Detection FP16-INT8 - Device: CPU (FPS, More Is Better)
  AVX-512 Enabled:  147.06 (SE +/- 0.11, N = 3) [-march=znver5]
  AVX-512 Disabled: 85.54 (SE +/- 0.04, N = 3)

OpenVINO 2024.5 - Model: Face Detection FP16-INT8 - Device: CPU (ms, Fewer Is Better)
  AVX-512 Enabled:  325.74 (SE +/- 0.27, N = 3) [-march=znver5, MIN: 255.71 / MAX: 363.13]
  AVX-512 Disabled: 558.74 (SE +/- 0.19, N = 3) [MIN: 273.34 / MAX: 587.28]

OpenVINO 2024.5 - Model: Age Gender Recognition Retail 0013 FP16-INT8 - Device: CPU (FPS, More Is Better)
  AVX-512 Enabled:  162706.14 (SE +/- 180.45, N = 3) [-march=znver5]
  AVX-512 Disabled: 129762.91 (SE +/- 170.89, N = 3)

OpenVINO 2024.5 - Model: Age Gender Recognition Retail 0013 FP16-INT8 - Device: CPU (ms, Fewer Is Better)
  AVX-512 Enabled:  0.31 (SE +/- 0.00, N = 3) [-march=znver5, MIN: 0.12 / MAX: 25.73]
  AVX-512 Disabled: 0.53 (SE +/- 0.00, N = 3) [MIN: 0.18 / MAX: 13.44]

OpenVINO 2024.5 - Model: Person Detection FP16 - Device: CPU (FPS, More Is Better)
  AVX-512 Enabled:  710.04 (SE +/- 0.91, N = 3) [-march=znver5]
  AVX-512 Disabled: 241.77 (SE +/- 0.82, N = 3)

OpenVINO 2024.5 - Model: Person Detection FP16 - Device: CPU (ms, Fewer Is Better)
  AVX-512 Enabled:  67.47 (SE +/- 0.09, N = 3) [-march=znver5, MIN: 29.95 / MAX: 185.48]
  AVX-512 Disabled: 198.17 (SE +/- 0.68, N = 3) [MIN: 94.52 / MAX: 339.84]

OpenVINO 2024.5 - Model: Weld Porosity Detection FP16 - Device: CPU (FPS, More Is Better)
  AVX-512 Enabled:  7611.17 (SE +/- 5.62, N = 3) [-march=znver5]
  AVX-512 Disabled: 2189.06 (SE +/- 1.32, N = 3)

OpenVINO 2024.5 - Model: Weld Porosity Detection FP16 - Device: CPU (ms, Fewer Is Better)
  AVX-512 Enabled:  12.53 (SE +/- 0.01, N = 3) [-march=znver5, MIN: 5.9 / MAX: 31.1]
  AVX-512 Disabled: 43.73 (SE +/- 0.03, N = 3) [MIN: 23.02 / MAX: 62.81]

OpenVINO 2024.5 - Model: Person Vehicle Bike Detection FP16 - Device: CPU (FPS, More Is Better)
  AVX-512 Enabled:  6775.46 (SE +/- 17.52, N = 3) [-march=znver5]
  AVX-512 Disabled: 3132.50 (SE +/- 2.95, N = 3)

OpenVINO 2024.5 - Model: Person Vehicle Bike Detection FP16 - Device: CPU (ms, Fewer Is Better)
  AVX-512 Enabled:  7.03 (SE +/- 0.02, N = 3) [-march=znver5, MIN: 3.43 / MAX: 24.51]
  AVX-512 Disabled: 15.26 (SE +/- 0.01, N = 3) [MIN: 7.72 / MAX: 48.83]

OpenVINO 2024.5 - Model: Machine Translation EN To DE FP16 - Device: CPU (FPS, More Is Better)
  AVX-512 Enabled:  857.16 (SE +/- 0.97, N = 3) [-march=znver5]
  AVX-512 Disabled: 295.88 (SE +/- 0.09, N = 3)

OpenVINO 2024.5 - Model: Machine Translation EN To DE FP16 - Device: CPU (ms, Fewer Is Better)
  AVX-512 Enabled:  55.95 (SE +/- 0.07, N = 3) [-march=znver5, MIN: 26.76 / MAX: 125.18]
  AVX-512 Disabled: 162.02 (SE +/- 0.05, N = 3) [MIN: 75.62 / MAX: 264.14]

OpenVINO 2024.5 - Model: Face Detection Retail FP16 - Device: CPU (FPS, More Is Better)
  AVX-512 Enabled:  16580.18 (SE +/- 12.77, N = 3) [-march=znver5]
  AVX-512 Disabled: 6688.98 (SE +/- 8.72, N = 3)

OpenVINO 2024.5 - Model: Face Detection Retail FP16 - Device: CPU (ms, Fewer Is Better)
  AVX-512 Enabled:  2.82 (SE +/- 0.00, N = 3) [-march=znver5, MIN: 1.14 / MAX: 18.12]
  AVX-512 Disabled: 7.14 (SE +/- 0.01, N = 3) [MIN: 2.91 / MAX: 32.04]

OpenVINO 2024.5 - Model: Handwritten English Recognition FP16 - Device: CPU (FPS, More Is Better)
  AVX-512 Enabled:  3488.23 (SE +/- 4.77, N = 3) [-march=znver5]
  AVX-512 Disabled: 1200.37 (SE +/- 0.28, N = 3)

OpenVINO 2024.5 - Model: Handwritten English Recognition FP16 - Device: CPU (ms, Fewer Is Better)
  AVX-512 Enabled:  27.50 (SE +/- 0.04, N = 3) [-march=znver5, MIN: 13.39 / MAX: 61.16]
  AVX-512 Disabled: 79.90 (SE +/- 0.02, N = 3) [MIN: 35.69 / MAX: 151.02]

OpenVINO 2024.5 - Model: Road Segmentation ADAS FP16 - Device: CPU (FPS, More Is Better)
  AVX-512 Enabled:  2102.03 (SE +/- 6.46, N = 3) [-march=znver5]
  AVX-512 Disabled: 713.96 (SE +/- 1.87, N = 3)

OpenVINO 2024.5 - Model: Road Segmentation ADAS FP16 - Device: CPU (ms, Fewer Is Better)
  AVX-512 Enabled:  22.78 (SE +/- 0.07, N = 3) [-march=znver5, MIN: 11.57 / MAX: 67.03]
  AVX-512 Disabled: 67.12 (SE +/- 0.18, N = 3) [MIN: 17.27 / MAX: 133.16]

OpenVINO 2024.5 - Model: Person Re-Identification Retail FP16 - Device: CPU (FPS, More Is Better)
  AVX-512 Enabled:  10524.04 (SE +/- 6.44, N = 3) [-march=znver5]
  AVX-512 Disabled: 3605.75 (SE +/- 2.23, N = 3)

OpenVINO 2024.5 - Model: Person Re-Identification Retail FP16 - Device: CPU (ms, Fewer Is Better)
  AVX-512 Enabled:  4.52 (SE +/- 0.00, N = 3) [-march=znver5, MIN: 2.34 / MAX: 21.71]
  AVX-512 Disabled: 13.30 (SE +/- 0.01, N = 3) [MIN: 7.56 / MAX: 31.8]

OpenVINO 2024.5 - Model: Noise Suppression Poconet-Like FP16 - Device: CPU (FPS, More Is Better)
  AVX-512 Enabled:  6825.09 (SE +/- 13.37, N = 3) [-march=znver5]
  AVX-512 Disabled: 2516.89 (SE +/- 6.13, N = 3)

OpenVINO 2024.5 - Model: Noise Suppression Poconet-Like FP16 - Device: CPU (ms, Fewer Is Better)
  AVX-512 Enabled:  13.47 (SE +/- 0.03, N = 3) [-march=znver5, MIN: 7.72 / MAX: 51.08]
  AVX-512 Disabled: 38.05 (SE +/- 0.09, N = 3) [MIN: 16.64 / MAX: 61.98]

Compiler options for all OpenVINO 2024.5 results: (CXX) g++ options: -fPIC -O3 -flto -fsigned-char -ffunction-sections -fdata-sections -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

OpenVINO GenAI

OpenVINO GenAI 2024.5 - Model: TinyLlama-1.1B-Chat-v1.0 - Device: CPU (tokens/s, More Is Better)
  AVX-512 Enabled:  66.57 (SE +/- 0.94, N = 3)
  AVX-512 Disabled: 91.40 (SE +/- 1.17, N = 15)

Llama.cpp

Llama.cpp b4397 - Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 512 (Tokens Per Second, More Is Better)
  AVX-512 Enabled:  107.84 (SE +/- 0.76, N = 3)
  AVX-512 Disabled: 95.03 (SE +/- 0.94, N = 3) [-mno-avx512f]

Llama.cpp b4397 - Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 1024 (Tokens Per Second, More Is Better)
  AVX-512 Enabled:  106.71 (SE +/- 0.46, N = 3)
  AVX-512 Disabled: 93.45 (SE +/- 0.60, N = 3) [-mno-avx512f]

Llama.cpp b4397 - Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 2048 (Tokens Per Second, More Is Better)
  AVX-512 Enabled:  107.34 (SE +/- 0.56, N = 3)
  AVX-512 Disabled: 95.47 (SE +/- 0.54, N = 3) [-mno-avx512f]

Llama.cpp b4397 - Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 512 (Tokens Per Second, More Is Better)
  AVX-512 Enabled:  356.76 (SE +/- 2.40, N = 5)
  AVX-512 Disabled: 347.10 (SE +/- 2.08, N = 5) [-mno-avx512f]

Llama.cpp b4397 - Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 512 (Tokens Per Second, More Is Better)
  AVX-512 Enabled:  109.25 (SE +/- 1.17, N = 5)
  AVX-512 Disabled: 96.15 (SE +/- 1.01, N = 4) [-mno-avx512f]

Llama.cpp b4397 - Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 1024 (Tokens Per Second, More Is Better)
  AVX-512 Enabled:  107.32 (SE +/- 1.15, N = 3)
  AVX-512 Disabled: 94.90 (SE +/- 0.77, N = 3) [-mno-avx512f]

Llama.cpp b4397 - Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 2048 (Tokens Per Second, More Is Better)
  AVX-512 Enabled:  108.74 (SE +/- 0.94, N = 3)
  AVX-512 Disabled: 95.27 (SE +/- 0.65, N = 3) [-mno-avx512f]

Compiler options for all Llama.cpp b4397 results: (CXX) g++ options: -O3 -march=znver5 -flto

ONNX Runtime

ONNX Runtime 1.19 - Model: fcn-resnet101-11 - Device: CPU - Executor: Standard (Inferences Per Second, More Is Better)
  AVX-512 Enabled:  9.56433 (SE +/- 0.13137, N = 15)
  AVX-512 Disabled: 6.58603 (SE +/- 0.04636, N = 3)

ONNX Runtime 1.19 - Model: super-resolution-10 - Device: CPU - Executor: Standard (Inferences Per Second, More Is Better)
  AVX-512 Enabled:  215.16 (SE +/- 0.59, N = 3)
  AVX-512 Disabled: 170.95 (SE +/- 1.92, N = 3)

ONNX Runtime 1.19 - Model: bertsquad-12 - Device: CPU - Executor: Standard (Inferences Per Second, More Is Better)
  AVX-512 Enabled:  22.43 (SE +/- 0.19, N = 3)
  AVX-512 Disabled: 17.76 (SE +/- 0.06, N = 3)

ONNX Runtime 1.19 - Model: GPT-2 - Device: CPU - Executor: Standard (Inferences Per Second, More Is Better)
  AVX-512 Enabled:  196.35 (SE +/- 0.28, N = 3)
  AVX-512 Disabled: 182.81 (SE +/- 0.35, N = 3)

ONNX Runtime 1.19 - Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard (Inferences Per Second, More Is Better)
  AVX-512 Enabled:  48.78 (SE +/- 0.53, N = 3)
  AVX-512 Disabled: 40.55 (SE +/- 0.36, N = 8)

ONNX Runtime 1.19 - Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard (Inferences Per Second, More Is Better)
  AVX-512 Enabled:  64.07 (SE +/- 0.50, N = 3)
  AVX-512 Disabled: 37.15 (SE +/- 0.45, N = 15)

ONNX Runtime 1.19 - Model: T5 Encoder - Device: CPU - Executor: Standard (Inferences Per Second, More Is Better)
  AVX-512 Enabled:  244.99 (SE +/- 1.84, N = 3)
  AVX-512 Disabled: 238.64 (SE +/- 0.75, N = 3)

ONNX Runtime 1.19 - Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Standard (Inferences Per Second, More Is Better)
  AVX-512 Enabled:  6.50795 (SE +/- 0.06862, N = 4)
  AVX-512 Disabled: 4.30914 (SE +/- 0.15591, N = 12)

Compiler options for all ONNX Runtime 1.19 results: (CXX) g++ options: -O3 -march=native -march=znver5 -flto -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

Numpy Benchmark

This is a test of general NumPy performance. Learn more via the OpenBenchmarking.org test page.

Numpy Benchmark (Score, More Is Better)
  AVX-512 Enabled:  882.00 (SE +/- 1.81, N = 3)
  AVX-512 Disabled: 828.67 (SE +/- 0.83, N = 3)

libxsmm

Libxsmm is an open-source library for specialized dense and sparse matrix operations and deep learning primitives. Libxsmm supports making use of Intel AMX, AVX-512, and other modern CPU instruction set capabilities. Learn more via the OpenBenchmarking.org test page.
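The libxsmm result below reports GFLOPS/s for square matrix multiplications with M = N = K = 128, i.e. about 2 x 128^3, roughly 4.2 million floating-point operations per multiply. Purely to illustrate that workload shape, and not libxsmm's actual JIT-dispatched API, a naive reference kernel looks like this:

    // gemm128_sketch.cpp - naive reference GEMM showing the M = N = K = 128
    // workload shape; libxsmm itself JIT-generates far faster kernels.
    #include <array>
    #include <cstdio>

    constexpr int N = 128;  // M = N = K = 128, matching the result below

    // C = A * B for row-major N x N matrices: 2 * N^3 floating-point operations.
    void gemm(const double* A, const double* B, double* C) {
        for (int i = 0; i < N; ++i)
            for (int j = 0; j < N; ++j) {
                double acc = 0.0;
                for (int k = 0; k < N; ++k)
                    acc += A[i * N + k] * B[k * N + j];
                C[i * N + j] = acc;
            }
    }

    int main() {
        static std::array<double, N * N> A, B, C;
        A.fill(1.0);
        B.fill(2.0);
        gemm(A.data(), B.data(), C.data());
        std::printf("C[0] = %.1f, FLOPs per call = %.0f\n", C[0], 2.0 * N * N * N);
        return 0;
    }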

libxsmm 2-1.17-3645 - M N K: 128 (GFLOPS/s, More Is Better)
  AVX-512 Enabled:  3422.0 (SE +/- 5.19, N = 3)
  AVX-512 Disabled: 3004.6 (SE +/- 10.56, N = 3)
  1. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -msse4.2

Mobile Neural Network

Mobile Neural Network 3.0 - Model: mobilenetV3 (ms, Fewer Is Better)
  AVX-512 Enabled:  1.736 (SE +/- 0.010, N = 3) [-march=znver5, MIN: 1.59 / MAX: 2.21]
  AVX-512 Disabled: 1.803 (SE +/- 0.010, N = 3) [MIN: 1.68 / MAX: 2.88]

Mobile Neural Network 3.0 - Model: resnet-v2-50 (ms, Fewer Is Better)
  AVX-512 Enabled:  7.409 (SE +/- 0.117, N = 3) [-march=znver5, MIN: 7.08 / MAX: 9.54]
  AVX-512 Disabled: 9.950 (SE +/- 0.086, N = 3) [MIN: 9.4 / MAX: 12.2]

Mobile Neural Network 3.0 - Model: SqueezeNetV1.0 (ms, Fewer Is Better)
  AVX-512 Enabled:  3.121 (SE +/- 0.011, N = 3) [-march=znver5, MIN: 3.05 / MAX: 5.8]
  AVX-512 Disabled: 3.327 (SE +/- 0.030, N = 3) [MIN: 3.21 / MAX: 5.6]

Compiler options for all Mobile Neural Network 3.0 results: (CXX) g++ options: -O3 -flto -std=c++11 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -pthread -ldl

srsRAN Project

srsRAN Project 24.10 - Test: PDSCH Processor Benchmark, Throughput Total (Mbps, More Is Better)
  AVX-512 Enabled:  124852.3 (SE +/- 1542.84, N = 3)
  AVX-512 Disabled: 110897.9 (SE +/- 1008.70, N = 3)
  1. (CXX) g++ options: -O3 -march=native -mtune=generic -fno-trapping-math -fno-math-errno -ldl

SVT-AV1

SVT-AV1 2.3 - Encoder Mode: Preset 13 - Input: Bosphorus 4K (Frames Per Second, More Is Better)
  AVX-512 Enabled:  416.85 (SE +/- 2.58, N = 6)
  AVX-512 Disabled: 412.13 (SE +/- 1.74, N = 6)

SVT-AV1 2.3 - Encoder Mode: Preset 8 - Input: Bosphorus 4K (Frames Per Second, More Is Better)
  AVX-512 Enabled:  184.79 (SE +/- 1.02, N = 4)
  AVX-512 Disabled: 171.29 (SE +/- 1.07, N = 4)

SVT-AV1 2.3 - Encoder Mode: Preset 5 - Input: Bosphorus 4K (Frames Per Second, More Is Better)
  AVX-512 Enabled:  56.48 (SE +/- 0.23, N = 3)
  AVX-512 Disabled: 51.74 (SE +/- 0.29, N = 3)

SVT-AV1 2.3 - Encoder Mode: Preset 3 - Input: Bosphorus 4K (Frames Per Second, More Is Better)
  AVX-512 Enabled:  15.65 (SE +/- 0.03, N = 3)
  AVX-512 Disabled: 14.52 (SE +/- 0.05, N = 3)

Compiler options for all SVT-AV1 2.3 results: (CXX) g++ options: -O3 -march=znver5 -flto -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq

87 Results Shown

miniBUDE:
  OpenMP - BM1:
    GFInst/s
    Billion Interactions/s
  OpenMP - BM2:
    GFInst/s
    Billion Interactions/s
GROMACS
ACES DGEMM
Laghos
Embree:
  Pathtracer ISPC - Asian Dragon
  Pathtracer ISPC - Asian Dragon Obj
  Pathtracer ISPC - Crown
OpenVKL
OSPRay:
  gravity_spheres_volume/dim_512/ao/real_time
  gravity_spheres_volume/dim_512/scivis/real_time
  gravity_spheres_volume/dim_512/pathtracer/real_time
OSPRay Studio:
  1 - 4K - 1 - Path Tracer - CPU
  1 - 4K - 16 - Path Tracer - CPU
  1 - 4K - 32 - Path Tracer - CPU
  2 - 4K - 1 - Path Tracer - CPU
  2 - 4K - 16 - Path Tracer - CPU
  2 - 4K - 32 - Path Tracer - CPU
  3 - 4K - 1 - Path Tracer - CPU
  3 - 4K - 16 - Path Tracer - CPU
  3 - 4K - 32 - Path Tracer - CPU
Y-Cruncher:
  5B
  10B
Cpuminer-Opt:
  scrypt
  Skeincoin
  LBC, LBRY Credits
SMHasher
oneDNN:
  Deconvolution Batch shapes_1d - CPU
  Deconvolution Batch shapes_3d - CPU
  IP Shapes 1D - CPU
  IP Shapes 3D - CPU
  Recurrent Neural Network Training - CPU
  Recurrent Neural Network Inference - CPU
PyTorch:
  CPU - 256 - ResNet-50
  CPU - 512 - ResNet-50
TensorFlow:
  CPU - 256 - ResNet-50
  CPU - 512 - ResNet-50
OpenVINO:
  Face Detection FP16-INT8 - CPU:
    FPS
    ms
  Age Gender Recognition Retail 0013 FP16-INT8 - CPU:
    FPS
    ms
  Person Detection FP16 - CPU:
    FPS
    ms
  Weld Porosity Detection FP16 - CPU:
    FPS
    ms
  Person Vehicle Bike Detection FP16 - CPU:
    FPS
    ms
  Machine Translation EN To DE FP16 - CPU:
    FPS
    ms
  Face Detection Retail FP16 - CPU:
    FPS
    ms
  Handwritten English Recognition FP16 - CPU:
    FPS
    ms
  Road Segmentation ADAS FP16 - CPU:
    FPS
    ms
  Person Re-Identification Retail FP16 - CPU:
    FPS
    ms
  Noise Suppression Poconet-Like FP16 - CPU:
    FPS
    ms
OpenVINO GenAI
Llama.cpp:
  CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 512
  CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 1024
  CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 2048
  CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 512
  CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 512
  CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 1024
  CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 2048
ONNX Runtime:
  fcn-resnet101-11 - CPU - Standard
  super-resolution-10 - CPU - Standard
  bertsquad-12 - CPU - Standard
  GPT-2 - CPU - Standard
  ArcFace ResNet-100 - CPU - Standard
  Faster R-CNN R-50-FPN-int8 - CPU - Standard
  T5 Encoder - CPU - Standard
  ResNet101_DUC_HDC-12 - CPU - Standard
Numpy Benchmark
libxsmm
Mobile Neural Network:
  mobilenetV3
  resnet-v2-50
  SqueezeNetV1.0
srsRAN Project
SVT-AV1:
  Preset 13 - Bosphorus 4K
  Preset 8 - Bosphorus 4K
  Preset 5 - Bosphorus 4K
  Preset 3 - Bosphorus 4K