xeon febby

Benchmarks for a future article. 2 x INTEL XEON PLATINUM 8592+ testing with a Quanta Cloud QuantaGrid D54Q-2U S6Q-MB-MPS (3B05.TEL4P1 BIOS) and ASPEED on Ubuntu 23.10 via the Phoronix Test Suite.

Compare your own system(s) to this result file with the Phoronix Test Suite by running the command: phoronix-test-suite benchmark 2402199-NE-XEONFEBBY14
Jump To Table - Results

View

Do Not Show Noisy Results
Do Not Show Results With Incomplete Data
Do Not Show Results With Little Change/Spread
List Notable Results
Show Result Confidence Charts
Allow Limiting Results To Certain Suite(s)

Statistics

Show Overall Harmonic Mean(s)
Show Overall Geometric Mean
Show Wins / Losses Counts (Pie Chart)
Normalize Results
Remove Outliers Before Calculating Averages

Graph Settings

Force Line Graphs Where Applicable
Convert To Scalar Where Applicable
Prefer Vertical Bar Graphs

Multi-Way Comparison

Condense Multi-Option Tests Into Single Result Graphs

Table

Show Detailed System Result Table

Run Management

Highlight
Result
Toggle/Hide
Result
Result
Identifier
Performance Per
Dollar
Date
Run
  Test
  Duration
a
February 19
  2 Hours, 29 Minutes
b
February 20
  1 Hour, 59 Minutes
Invert Behavior (Only Show Selected Data)
  2 Hours, 14 Minutes
Only show results matching title/arguments (delimit multiple options with a comma):
Do not show results matching title/arguments (delimit multiple options with a comma):


xeon febbyOpenBenchmarking.orgPhoronix Test Suite2 x INTEL XEON PLATINUM 8592+ @ 3.90GHz (128 Cores / 256 Threads)Quanta Cloud QuantaGrid D54Q-2U S6Q-MB-MPS (3B05.TEL4P1 BIOS)Intel Device 1bce1008GB3201GB Micron_7450_MTFDKCB3T2TFSASPEED2 x Intel X710 for 10GBASE-TUbuntu 23.106.6.0-060600-generic (x86_64)GCC 13.2.0ext41024x768ProcessorMotherboardChipsetMemoryDiskGraphicsNetworkOSKernelCompilerFile-SystemScreen ResolutionXeon Febby BenchmarksSystem Logs- Transparent Huge Pages: madvise- --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-XYspKM/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-XYspKM/gcc-13-13.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v - Scaling Governor: intel_pstate performance (EPP: performance) - CPU Microcode: 0x21000161 - Python 3.11.6- gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS IBPB: conditional RSB filling PBRSB-eIBRS: SW sequence + srbds: Not affected + tsx_async_abort: Not affected

a vs. b ComparisonPhoronix Test SuiteBaseline+12.2%+12.2%+24.4%+24.4%+36.6%+36.6%41.9%11.6%7.4%7.3%5.2%4%3.2%2.8%2.7%2.3%11.6%41.9%7.4%A.w.3.5.A48.8%bertsquad-12 - CPU - StandardT5 Encoder - CPU - Standard29.1%llama-2-70b-chat.Q5_0.gguf26.5%Rand Read25%llama-2-13b.Q4_0.gguf22.2%llama-2-7b.Q4_0.gguf19%yolov4 - CPU - StandardArcFace ResNet-100 - CPU - StandardRead While WritingCPU - 1 - AlexNet6%CORAL2 P2fcn-resnet101-11 - CPU - Standard5%S.w.1.0.6.ACPU - 1 - ResNet-503.9%super-resolution-10 - CPU - Standard3.8%CPU - 1 - GoogLeNet3.5%wizardcoder-python-34b-v1.0.Q6_K - CPUCPU - 1 - VGG-163%mistral-7b-instruct-v0.2.Q8_0 - CPURT.ldr_alb_nrm.3840x2160 - CPU-Only2.8%1B2.8%MPI CPU - water_GMX50_bareCORAL2 P1CTS22.2%yolov4 - CPU - StandardT5 Encoder - CPU - Standard29.1%bertsquad-12 - CPU - Standardfcn-resnet101-11 - CPU - Standard5%ArcFace ResNet-100 - CPU - Standardsuper-resolution-10 - CPU - Standard3.8%NAMDONNX RuntimeONNX RuntimeLlama.cppSpeedbLlama.cppLlama.cppONNX RuntimeONNX RuntimeSpeedbTensorFlowQuicksilverONNX RuntimeNAMDTensorFlowONNX RuntimeTensorFlowLlamafileTensorFlowLlamafileIntel Open Image DenoiseY-CruncherGROMACSQuicksilverQuicksilverONNX RuntimeONNX RuntimeONNX RuntimeONNX RuntimeONNX RuntimeONNX Runtimeab

xeon febbyquicksilver: CORAL2 P2tensorflow: CPU - 1 - VGG-16tensorflow: CPU - 1 - AlexNettensorflow: CPU - 1 - GoogLeNettensorflow: CPU - 1 - ResNet-50onnx: GPT-2 - CPU - Standardy-cruncher: 500My-cruncher: 1Bonnx: yolov4 - CPU - Standardonnx: T5 Encoder - CPU - Standardonnx: bertsquad-12 - CPU - Standardonnx: CaffeNet 12-int8 - CPU - Standardonnx: fcn-resnet101-11 - CPU - Standardquicksilver: CORAL2 P1onnx: ArcFace ResNet-100 - CPU - Standardonnx: ResNet50 v1-12-int8 - CPU - Standardonnx: super-resolution-10 - CPU - Standardonnx: Faster R-CNN R-50-FPN-int8 - CPU - Standardpytorch: CPU - 1 - ResNet-50pytorch: CPU - 1 - ResNet-152pytorch: CPU - 1 - Efficientnet_v2_lllama-cpp: llama-2-7b.Q4_0.ggufllama-cpp: llama-2-13b.Q4_0.ggufllama-cpp: llama-2-70b-chat.Q5_0.ggufllamafile: llava-v1.5-7b-q4 - CPUllamafile: mistral-7b-instruct-v0.2.Q8_0 - CPUllamafile: wizardcoder-python-34b-v1.0.Q6_K - CPUgromacs: MPI CPU - water_GMX50_barequicksilver: CTS2namd: ATPase with 327,506 Atomsnamd: STMV with 1,066,628 Atomsdav1d: Chimera 1080pdav1d: Summer Nature 4Kdav1d: Summer Nature 1080pdav1d: Chimera 1080p 10-bitoidn: RT.hdr_alb_nrm.3840x2160 - CPU-Onlyoidn: RT.ldr_alb_nrm.3840x2160 - CPU-Onlyoidn: RTLightmap.hdr.4096x4096 - CPU-Onlyspeedb: Rand Readspeedb: Update Randspeedb: Read While Writingspeedb: Read Rand Write Randonnx: GPT-2 - CPU - Standardonnx: yolov4 - CPU - Standardonnx: T5 Encoder - CPU - Standardonnx: bertsquad-12 - CPU - Standardonnx: CaffeNet 12-int8 - CPU - Standardonnx: fcn-resnet101-11 - CPU - Standardonnx: ArcFace ResNet-100 - CPU - Standardonnx: ResNet50 v1-12-int8 - CPU - Standardonnx: super-resolution-10 - CPU - Standardonnx: Faster R-CNN R-50-FPN-int8 - CPU - Standardab800000012.2539.9818.237.28205.5632.7235.10715.3826465.64715.9646786.2629.90267862500035.9174170.023256.99638.330251.6619.080.420.690.550.430.538.573.7417.91895560005.983081.74622204.3868.4387.78235.425.105.142.466132577451570601694373915204364.8615165.00652.1469562.63611.27117100.9827.84055.880913.8905326.0875841800011.8937.717.627.01208.9932.7615.24917.1686360.77922.6587789.3129.4317882000038.5606170.235247.66537.883251.0019.210.580.450.340.538.813.8618.39893540004.020291.81638202.8368.3686.79239.015.165.002.474905331531564581818711015143314.7811358.24382.7709144.13111.26631106.02325.93195.873694.0371226.395OpenBenchmarking.org

Quicksilver

Quicksilver is a proxy application that represents some elements of the Mercury workload by solving a simplified dynamic Monte Carlo particle transport problem. Quicksilver is developed by Lawrence Livermore National Laboratory (LLNL) and this test profile currently makes use of the OpenMP CPU threaded code path. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgFigure Of Merit, More Is BetterQuicksilver 20230818Input: CORAL2 P2ba2M4M6M8M10M841800080000001. (CXX) g++ options: -fopenmp -O3 -march=native

TensorFlow

This is a benchmark of the TensorFlow deep learning framework using the TensorFlow reference benchmarks (tensorflow/benchmarks with tf_cnn_benchmarks.py). Note with the Phoronix Test Suite there is also pts/tensorflow-lite for benchmarking the TensorFlow Lite binaries if desired for complementary metrics. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgimages/sec, More Is BetterTensorFlow 2.12Device: CPU - Batch Size: 1 - Model: VGG-16ba369121511.8912.25

OpenBenchmarking.orgimages/sec, More Is BetterTensorFlow 2.12Device: CPU - Batch Size: 1 - Model: AlexNetba91827364537.7039.98

OpenBenchmarking.orgimages/sec, More Is BetterTensorFlow 2.12Device: CPU - Batch Size: 1 - Model: GoogLeNetba4812162017.6218.23

OpenBenchmarking.orgimages/sec, More Is BetterTensorFlow 2.12Device: CPU - Batch Size: 1 - Model: ResNet-50ba2468107.017.28

ONNX Runtime

ONNX Runtime is developed by Microsoft and partners as a open-source, cross-platform, high performance machine learning inferencing and training accelerator. This test profile runs the ONNX Runtime with various models available from the ONNX Model Zoo. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: GPT-2 - Device: CPU - Executor: Standardba50100150200250208.99205.561. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

Y-Cruncher

Y-Cruncher is a multi-threaded Pi benchmark capable of computing Pi to trillions of digits. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgSeconds, Fewer Is BetterY-Cruncher 0.8.3Pi Digits To Calculate: 500Mba0.62121.24241.86362.48483.1062.7612.723

OpenBenchmarking.orgSeconds, Fewer Is BetterY-Cruncher 0.8.3Pi Digits To Calculate: 1Bba1.1812.3623.5434.7245.9055.2495.107

ONNX Runtime

ONNX Runtime is developed by Microsoft and partners as a open-source, cross-platform, high performance machine learning inferencing and training accelerator. This test profile runs the ONNX Runtime with various models available from the ONNX Model Zoo. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: yolov4 - Device: CPU - Executor: Standardba4812162017.1715.381. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: T5 Encoder - Device: CPU - Executor: Standardba100200300400500360.78465.651. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: bertsquad-12 - Device: CPU - Executor: Standardba51015202522.6615.961. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: CaffeNet 12-int8 - Device: CPU - Executor: Standardba2004006008001000789.31786.261. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: fcn-resnet101-11 - Device: CPU - Executor: Standardba36912159.431709.902671. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

Quicksilver

Quicksilver is a proxy application that represents some elements of the Mercury workload by solving a simplified dynamic Monte Carlo particle transport problem. Quicksilver is developed by Lawrence Livermore National Laboratory (LLNL) and this test profile currently makes use of the OpenMP CPU threaded code path. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgFigure Of Merit, More Is BetterQuicksilver 20230818Input: CORAL2 P1ba2M4M6M8M10M882000086250001. (CXX) g++ options: -fopenmp -O3 -march=native

ONNX Runtime

ONNX Runtime is developed by Microsoft and partners as a open-source, cross-platform, high performance machine learning inferencing and training accelerator. This test profile runs the ONNX Runtime with various models available from the ONNX Model Zoo. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: ArcFace ResNet-100 - Device: CPU - Executor: Standardba91827364538.5635.921. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standardba4080120160200170.24170.021. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: super-resolution-10 - Device: CPU - Executor: Standardba60120180240300247.67257.001. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standardba91827364537.8838.331. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

PyTorch

This is a benchmark of PyTorch making use of pytorch-benchmark [https://github.com/LukasHedegaard/pytorch-benchmark]. Currently this test profile is catered to CPU-based testing. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgbatches/sec, More Is BetterPyTorch 2.1Device: CPU - Batch Size: 1 - Model: ResNet-50ba122436486051.0051.66MIN: 24.67 / MAX: 53.71MIN: 21.28 / MAX: 52.92

OpenBenchmarking.orgbatches/sec, More Is BetterPyTorch 2.1Device: CPU - Batch Size: 1 - Model: ResNet-152ba51015202519.2119.08MIN: 6.66 / MAX: 20.29MIN: 2.63 / MAX: 20.35

OpenBenchmarking.orgbatches/sec, More Is BetterPyTorch 2.1Device: CPU - Batch Size: 1 - Model: Efficientnet_v2_la0.09450.1890.28350.3780.47250.42MIN: 0.23 / MAX: 1.28

Llama.cpp

Llama.cpp is a port of Facebook's LLaMA model in C/C++ developed by Georgi Gerganov. Llama.cpp allows the inference of LLaMA and other supported models in C/C++. For CPU inference Llama.cpp supports AVX2/AVX-512, ARM NEON, and other modern ISAs along with features like OpenBLAS usage. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b1808Model: llama-2-7b.Q4_0.ggufba0.15530.31060.46590.62120.77650.580.691. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -march=native -mtune=native -lopenblas

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b1808Model: llama-2-13b.Q4_0.ggufba0.12380.24760.37140.49520.6190.450.551. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -march=native -mtune=native -lopenblas

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b1808Model: llama-2-70b-chat.Q5_0.ggufba0.09680.19360.29040.38720.4840.340.431. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -march=native -mtune=native -lopenblas

Llamafile

Mozilla's Llamafile allows distributing and running large language models (LLMs) as a single file. Llamafile aims to make open-source LLMs more accessible to developers and users. Llamafile supports a variety of models, CPUs and GPUs, and other options. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgTokens Per Second, More Is BetterLlamafile 0.6Test: llava-v1.5-7b-q4 - Acceleration: CPUba0.11930.23860.35790.47720.59650.530.53

OpenBenchmarking.orgTokens Per Second, More Is BetterLlamafile 0.6Test: mistral-7b-instruct-v0.2.Q8_0 - Acceleration: CPUba2468108.818.57

OpenBenchmarking.orgTokens Per Second, More Is BetterLlamafile 0.6Test: wizardcoder-python-34b-v1.0.Q6_K - Acceleration: CPUba0.86851.7372.60553.4744.34253.863.74

GROMACS

The GROMACS (GROningen MAchine for Chemical Simulations) molecular dynamics package testing with the water_GMX50 data. This test profile allows selecting between CPU and GPU-based GROMACS builds. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgNs Per Day, More Is BetterGROMACS 2024Implementation: MPI CPU - Input: water_GMX50_bareba51015202518.4017.921. (CXX) g++ options: -O3 -lm

Quicksilver

Quicksilver is a proxy application that represents some elements of the Mercury workload by solving a simplified dynamic Monte Carlo particle transport problem. Quicksilver is developed by Lawrence Livermore National Laboratory (LLNL) and this test profile currently makes use of the OpenMP CPU threaded code path. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgFigure Of Merit, More Is BetterQuicksilver 20230818Input: CTS2ba2M4M6M8M10M935400095560001. (CXX) g++ options: -fopenmp -O3 -march=native

NAMD

NAMD is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. NAMD was developed by the Theoretical and Computational Biophysics Group in the Beckman Institute for Advanced Science and Technology at the University of Illinois at Urbana-Champaign. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgns/day, More Is BetterNAMD 3.0b6Input: ATPase with 327,506 Atomsba1.34622.69244.03865.38486.7314.020295.98308

OpenBenchmarking.orgns/day, More Is BetterNAMD 3.0b6Input: STMV with 1,066,628 Atomsba0.40870.81741.22611.63482.04351.816381.74622

dav1d

Dav1d is an open-source, speedy AV1 video decoder supporting modern SIMD CPU features. This test profile times how long it takes to decode sample AV1 video content. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgFPS, More Is Betterdav1d 1.4Video Input: Chimera 1080pba4080120160200202.83204.381. (CC) gcc options: -pthread

OpenBenchmarking.orgFPS, More Is Betterdav1d 1.4Video Input: Summer Nature 4Kba153045607568.3668.431. (CC) gcc options: -pthread

OpenBenchmarking.orgFPS, More Is Betterdav1d 1.4Video Input: Summer Nature 1080pba2040608010086.7987.781. (CC) gcc options: -pthread

OpenBenchmarking.orgFPS, More Is Betterdav1d 1.4Video Input: Chimera 1080p 10-bitba50100150200250239.01235.421. (CC) gcc options: -pthread

Intel Open Image Denoise

Open Image Denoise is a denoising library for ray-tracing and part of the Intel oneAPI rendering toolkit. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgImages / Sec, More Is BetterIntel Open Image Denoise 2.2Run: RT.hdr_alb_nrm.3840x2160 - Device: CPU-Onlyba1.1612.3223.4834.6445.8055.165.10

OpenBenchmarking.orgImages / Sec, More Is BetterIntel Open Image Denoise 2.2Run: RT.ldr_alb_nrm.3840x2160 - Device: CPU-Onlyba1.15652.3133.46954.6265.78255.005.14

OpenBenchmarking.orgImages / Sec, More Is BetterIntel Open Image Denoise 2.2Run: RTLightmap.hdr.4096x4096 - Device: CPU-Onlyba0.55581.11161.66742.22322.7792.472.46

Speedb

Speedb is a next-generation key value storage engine that is RocksDB compatible and aiming for stability, efficiency, and performance. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgOp/s, More Is BetterSpeedb 2.7Test: Random Readba130M260M390M520M650M4905331536132577451. (CXX) g++ options: -O3 -march=native -pthread -fno-builtin-memcmp -fno-rtti -lpthread

OpenBenchmarking.orgOp/s, More Is BetterSpeedb 2.7Test: Update Randomba30K60K90K120K150K1564581570601. (CXX) g++ options: -O3 -march=native -pthread -fno-builtin-memcmp -fno-rtti -lpthread

OpenBenchmarking.orgOp/s, More Is BetterSpeedb 2.7Test: Read While Writingba4M8M12M16M20M18187110169437391. (CXX) g++ options: -O3 -march=native -pthread -fno-builtin-memcmp -fno-rtti -lpthread

OpenBenchmarking.orgOp/s, More Is BetterSpeedb 2.7Test: Read Random Write Randomba300K600K900K1200K1500K151433115204361. (CXX) g++ options: -O3 -march=native -pthread -fno-builtin-memcmp -fno-rtti -lpthread