Xeon Platinum 8380 AVX-512 Workloads

Benchmarks for a future article: 2 x Intel Xeon Platinum 8380 tested with an Intel M50CYP2SB2U motherboard (SE5C6200.86B.0022.D08.2103221623 BIOS) and ASPEED graphics on Ubuntu 22.10 via the Phoronix Test Suite.

Compare your own system(s) to this result file with the Phoronix Test Suite by running the command: phoronix-test-suite benchmark 2308099-NE-XEONPLATI49
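
To repeat the comparison locally, a minimal sketch (assuming the Phoronix Test Suite is installed from the Ubuntu repositories or from a current upstream release; the result ID is the one shown above, and the test selection is pulled automatically from that result file):

    sudo apt-get install phoronix-test-suite                # one way to get PTS on Ubuntu; a newer release from phoronix-test-suite.com also works
    phoronix-test-suite benchmark 2308099-NE-XEONPLATI49    # runs the same tests and compares against this result file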

Test categories covered by this comparison:

AV1: 2 tests
Bioinformatics: 2 tests
C/C++ Compiler Tests: 5 tests
CPU Massive: 10 tests
Creator Workloads: 12 tests
Encoding: 5 tests
Fortran Tests: 4 tests
Game Development: 3 tests
HPC - High Performance Computing: 12 tests
Machine Learning: 6 tests
Molecular Dynamics: 3 tests
MPI Benchmarks: 4 tests
Multi-Core: 14 tests
NVIDIA GPU Compute: 3 tests
Intel oneAPI: 6 tests
OpenMPI Tests: 10 tests
Python Tests: 5 tests
Renderers: 2 tests
Scientific Computing: 5 tests
Server CPU Tests: 6 tests
Video Encoding: 5 tests

Test Runs

Result Identifier    Date Run          Test Duration
0xd000390            August 06 2023    11 Hours, 56 Minutes
0xd0003a5            August 08 2023    15 Hours, 40 Minutes
Average                                13 Hours, 48 Minutes
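
The run identifiers above read like CPU microcode revisions rather than arbitrary labels; that reading is an assumption here, not something stated in the result file. If it holds, the revision active on a comparable Linux system can be checked directly:

    grep -m1 microcode /proc/cpuinfo    # prints the loaded microcode revision, e.g. "microcode : 0xd0003a5"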


System Details

Both runs used the same hardware and software configuration: 2 x Intel Xeon Platinum 8380 @ 3.40GHz (80 Cores / 160 Threads), Intel M50CYP2SB2U motherboard (SE5C6200.86B.0022.D08.2103221623 BIOS), Intel Ice Lake IEH chipset, 512GB of memory, a 7682GB INTEL SSDPF2KX076TZ disk, ASPEED graphics with a VE228 monitor, 2 x Intel X710 for 10GBASE-T plus 2 x Intel E810-C for QSFP networking, Ubuntu 22.10, GNOME Shell 43.0, X Server 1.21.1.3, Vulkan 1.3.224, GCC 12.2.0, an ext4 file-system, and a 1920x1080 screen resolution. The only difference between the runs is the kernel:

0xd000390: Kernel 6.5.0-060500rc4daily20230804-generic (x86_64)
0xd0003a5: Kernel 6.5.0-rc5-phx-tues (x86_64)

Results

All values are listed as 0xd000390 vs. 0xd0003a5.

TensorFlow 2.12 (Device: CPU - Batch Size: 512 - Model: ResNet-50): 85.97 vs. 84.84 images/sec (higher is better)
TensorFlow 2.12 (Device: CPU - Batch Size: 256 - Model: ResNet-50): 83.89 vs. 86.93 images/sec (higher is better)
libxsmm 2-1.17-3645 (M N K: 128): 1941.1 vs. 1978.9 GFLOPS/s (higher is better)
ONNX Runtime 1.14 (Model: fcn-resnet101-11 - Device: CPU - Executor: Standard): 110.32 vs. 117.55 ms inference time cost (lower is better)
ONNX Runtime 1.14 (Model: fcn-resnet101-11 - Device: CPU - Executor: Standard): 9.08067 vs. 8.59908 inferences per second (higher is better)
OpenVKL 1.3.1 (Benchmark: vklBenchmark ISPC): 912 vs. 856 items/sec (higher is better)
oneDNN 3.1 (Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU): 832.45 vs. 816.51 ms (lower is better)
ONNX Runtime 1.14 (Model: yolov4 - Device: CPU - Executor: Standard): 85.80 vs. 89.83 ms inference time cost (lower is better)
ONNX Runtime 1.14 (Model: yolov4 - Device: CPU - Executor: Standard): 11.66 vs. 11.15 inferences per second (higher is better)
ONNX Runtime 1.14 (Model: GPT-2 - Device: CPU - Executor: Standard): 5.54783 vs. 5.28095 ms inference time cost (lower is better)
ONNX Runtime 1.14 (Model: GPT-2 - Device: CPU - Executor: Standard): 180.16 vs. 190.57 inferences per second (higher is better)
ONNX Runtime 1.14 (Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard): 25.57 vs. 25.86 ms inference time cost (lower is better)
ONNX Runtime 1.14 (Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard): 39.12 vs. 38.72 inferences per second (higher is better)
ONNX Runtime 1.14 (Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard): 1.43407 vs. 1.51029 ms inference time cost (lower is better)
ONNX Runtime 1.14 (Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard): 696.73 vs. 664.54 inferences per second (higher is better)
ONNX Runtime 1.14 (Model: super-resolution-10 - Device: CPU - Executor: Standard): 6.31428 vs. 6.71968 ms inference time cost (lower is better)
ONNX Runtime 1.14 (Model: super-resolution-10 - Device: CPU - Executor: Standard): 158.36 vs. 158.01 inferences per second (higher is better)
QMCPACK 3.16 (Input: FeCO6_b3lyp_gms): 268.56 vs. 263.23 seconds total execution time (lower is better)
ONNX Runtime 1.14 (Model: bertsquad-12 - Device: CPU - Executor: Standard): 59.84 vs. 61.51 ms inference time cost (lower is better)
ONNX Runtime 1.14 (Model: bertsquad-12 - Device: CPU - Executor: Standard): 16.71 vs. 16.31 inferences per second (higher is better)
libxsmm 2-1.17-3645 (M N K: 256): 594.6 vs. 600.2 GFLOPS/s (higher is better)
TensorFlow 2.12 (Device: CPU - Batch Size: 512 - Model: GoogLeNet): 317.27 vs. 323.79 images/sec (higher is better)
OSPRay 2.12 (Benchmark: particle_volume/pathtracer/real_time): 150.28 vs. 136.85 items per second (higher is better)
Timed MrBayes Analysis 3.2.7 (Primate Phylogeny Analysis): 166.53 vs. 165.42 seconds (lower is better)
ONNX Runtime 1.14 (Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard): 4.52403 vs. 4.62085 ms inference time cost (lower is better)
ONNX Runtime 1.14 (Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard): 221.02 vs. 216.48 inferences per second (higher is better)
QMCPACK 3.16 (Input: FeCO6_b3lyp_gms): 147.51 vs. 178.19 seconds total execution time (lower is better)
Palabos 2.3 (Grid Size: 100): 312.20 vs. 312.53 mega site updates per second (higher is better)
Cpuminer-Opt 3.20.3 (Algorithm: Garlicoin): 29203.00 vs. 22086.25 kH/s (higher is better)
OSPRay 2.12 (Benchmark: particle_volume/scivis/real_time): 24.95 vs. 16.38 items per second (higher is better)
Neural Magic DeepSparse 1.5 (Model: BERT-Large, NLP Question Answering - Scenario: Asynchronous Multi-Stream): 405.97 vs. 421.75 ms/batch (lower is better)
Neural Magic DeepSparse 1.5 (Model: BERT-Large, NLP Question Answering - Scenario: Asynchronous Multi-Stream): 98.34 vs. 94.26 items/sec (higher is better)
Palabos 2.3 (Grid Size: 400): 388.48 vs. 393.84 mega site updates per second (higher is better)
QMCPACK 3.16 (Input: Li2_STO_ae): 124.23 vs. 123.26 seconds total execution time (lower is better)
Palabos 2.3 (Grid Size: 500): 413.21 vs. 417.48 mega site updates per second (higher is better)
NCNN 20230517 (Target: CPU - Model: FastestDet): 10.20 vs. 9.71 ms (lower is better)
NCNN 20230517 (Target: CPU - Model: vision_transformer): 46.92 vs. 45.23 ms (lower is better)
NCNN 20230517 (Target: CPU - Model: regnety_400m): 45.54 vs. 38.85 ms (lower is better)
NCNN 20230517 (Target: CPU - Model: squeezenet_ssd): 15.34 vs. 16.10 ms (lower is better)
NCNN 20230517 (Target: CPU - Model: yolov4-tiny): 23.71 vs. 24.10 ms (lower is better)
NCNN 20230517 (Target: CPU - Model: resnet50): 17.15 vs. 18.51 ms (lower is better)
NCNN 20230517 (Target: CPU - Model: alexnet): 5.39 vs. 5.46 ms (lower is better)
NCNN 20230517 (Target: CPU - Model: resnet18): 8.97 vs. 9.42 ms (lower is better)
NCNN 20230517 (Target: CPU - Model: vgg16): 23.86 vs. 25.71 ms (lower is better)
NCNN 20230517 (Target: CPU - Model: googlenet): 15.36 vs. 16.58 ms (lower is better)
NCNN 20230517 (Target: CPU - Model: blazeface): 4.35 vs. 4.54 ms (lower is better)
NCNN 20230517 (Target: CPU - Model: efficientnet-b0): 11.62 vs. 11.71 ms (lower is better)
NCNN 20230517 (Target: CPU - Model: mnasnet): 7.57 vs. 7.41 ms (lower is better)
NCNN 20230517 (Target: CPU - Model: shufflenet-v2): 9.89 vs. 9.76 ms (lower is better)
NCNN 20230517 (Target: CPU-v3-v3 - Model: mobilenet-v3): 8.88 vs. 8.76 ms (lower is better)
NCNN 20230517 (Target: CPU-v2-v2 - Model: mobilenet-v2): 8.03 vs. 7.96 ms (lower is better)
NCNN 20230517 (Target: CPU - Model: mobilenet): 15.46 vs. 15.66 ms (lower is better)
VVenC 1.9 (Video Input: Bosphorus 4K - Video Preset: Fast): 5.722 vs. 5.705 frames per second (higher is better)
OSPRay 2.12 (Benchmark: particle_volume/ao/real_time): 24.75 vs. 16.46 items per second (higher is better)
TensorFlow 2.12 (Device: CPU - Batch Size: 256 - Model: GoogLeNet): 309.63 vs. 321.72 images/sec (higher is better)
Cpuminer-Opt 3.20.3 (Algorithm: Myriad-Groestl): 43127 vs. 43450 kH/s (higher is better)
Laghos 3.1 (Test: Sedov Blast Wave, ube_922_hex.mesh): 385.89 vs. 386.08 major kernels total rate (higher is better)
oneDNN 3.1 (Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU): 524.38 vs. 521.74 ms (lower is better)
simdjson 2.0 (Throughput Test: PartialTweets): 4.62 vs. 4.77 GB/s (higher is better)
simdjson 2.0 (Throughput Test: DistinctUserID): 5.52 vs. 5.71 GB/s (higher is better)
simdjson 2.0 (Throughput Test: TopTweet): 5.60 vs. 5.75 GB/s (higher is better)
TensorFlow 2.12 (Device: CPU - Batch Size: 512 - Model: AlexNet): 760.15 vs. 839.41 images/sec (higher is better)
Neural Magic DeepSparse 1.5 (Model: CV Detection, YOLOv5s COCO, Sparse INT8 - Scenario: Asynchronous Multi-Stream): 93.54 vs. 100.19 ms/batch (lower is better)
Neural Magic DeepSparse 1.5 (Model: CV Detection, YOLOv5s COCO, Sparse INT8 - Scenario: Asynchronous Multi-Stream): 426.95 vs. 398.72 items/sec (higher is better)
OpenVINO 2022.3 (Model: Person Detection FP16 - Device: CPU): 1490.76 vs. 1496.19 ms (lower is better)
OpenVINO 2022.3 (Model: Person Detection FP16 - Device: CPU): 13.29 vs. 13.25 FPS (higher is better)
OpenVINO 2022.3 (Model: Person Detection FP32 - Device: CPU): 1517.69 vs. 1519.84 ms (lower is better)
OpenVINO 2022.3 (Model: Person Detection FP32 - Device: CPU): 13.03 vs. 13.04 FPS (higher is better)
OpenVINO 2022.3 (Model: Face Detection FP16 - Device: CPU): 827.51 vs. 823.09 ms (lower is better)
OpenVINO 2022.3 (Model: Face Detection FP16 - Device: CPU): 24.04 vs. 24.18 FPS (higher is better)
OpenVINO 2022.3 (Model: Face Detection FP16-INT8 - Device: CPU): 209.31 vs. 209.02 ms (lower is better)
OpenVINO 2022.3 (Model: Face Detection FP16-INT8 - Device: CPU): 95.42 vs. 95.54 FPS (higher is better)
OpenVINO 2022.3 (Model: Machine Translation EN To DE FP16 - Device: CPU): 79.47 vs. 78.18 ms (lower is better)
OpenVINO 2022.3 (Model: Machine Translation EN To DE FP16 - Device: CPU): 251.00 vs. 255.09 FPS (higher is better)
OpenVINO 2022.3 (Model: Person Vehicle Bike Detection FP16 - Device: CPU): 9.77 vs. 9.63 ms (lower is better)
OpenVINO 2022.3 (Model: Person Vehicle Bike Detection FP16 - Device: CPU): 2039.63 vs. 2070.72 FPS (higher is better)
VVenC 1.9 (Video Input: Bosphorus 4K - Video Preset: Faster): 10.36 vs. 10.42 frames per second (higher is better)
OpenVINO 2022.3 (Model: Weld Porosity Detection FP16 - Device: CPU): 33.89 vs. 33.63 ms (lower is better)
OpenVINO 2022.3 (Model: Weld Porosity Detection FP16 - Device: CPU): 2344.97 vs. 2362.84 FPS (higher is better)
OpenVINO 2022.3 (Model: Age Gender Recognition Retail 0013 FP16-INT8 - Device: CPU): 1.16 vs. 1.16 ms (lower is better)
OpenVINO 2022.3 (Model: Age Gender Recognition Retail 0013 FP16-INT8 - Device: CPU): 67604.00 vs. 67754.34 FPS (higher is better)
Neural Magic DeepSparse 1.5 (Model: BERT-Large, NLP Question Answering, Sparse INT8 - Scenario: Asynchronous Multi-Stream): 37.99 vs. 45.96 ms/batch (lower is better)
Neural Magic DeepSparse 1.5 (Model: BERT-Large, NLP Question Answering, Sparse INT8 - Scenario: Asynchronous Multi-Stream): 1051.79 vs. 869.04 items/sec (higher is better)
OpenVINO 2022.3 (Model: Weld Porosity Detection FP16-INT8 - Device: CPU): 8.50 vs. 8.48 ms (lower is better)
OpenVINO 2022.3 (Model: Weld Porosity Detection FP16-INT8 - Device: CPU): 9396.52 vs. 9419.77 FPS (higher is better)
OpenVINO 2022.3 (Model: Age Gender Recognition Retail 0013 FP16 - Device: CPU): 1.33 vs. 1.33 ms (lower is better)
OpenVINO 2022.3 (Model: Age Gender Recognition Retail 0013 FP16 - Device: CPU): 59274.06 vs. 59377.96 FPS (higher is better)
OpenVINO 2022.3 (Model: Vehicle Detection FP16-INT8 - Device: CPU): 4.51 vs. 4.49 ms (lower is better)
OpenVINO 2022.3 (Model: Vehicle Detection FP16-INT8 - Device: CPU): 4419.17 vs. 4442.98 FPS (higher is better)
OpenVINO 2022.3 (Model: Vehicle Detection FP16 - Device: CPU): 17.80 vs. 17.87 ms (lower is better)
OpenVINO 2022.3 (Model: Vehicle Detection FP16 - Device: CPU): 1121.56 vs. 1117.09 FPS (higher is better)
SVT-HEVC 1.5.0 (Tuning: 1 - Input: Bosphorus 4K): 10.46 vs. 10.45 frames per second (higher is better)
simdjson 2.0 (Throughput Test: Kostya): 2.61 vs. 2.87 GB/s (higher is better)
Neural Magic DeepSparse 1.5 (Model: CV Segmentation, 90% Pruned YOLACT Pruned - Scenario: Asynchronous Multi-Stream): 474.69 vs. 478.85 ms/batch (lower is better)
Neural Magic DeepSparse 1.5 (Model: CV Segmentation, 90% Pruned YOLACT Pruned - Scenario: Asynchronous Multi-Stream): 84.02 vs. 83.20 items/sec (higher is better)
Neural Magic DeepSparse 1.5 (Model: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Scenario: Asynchronous Multi-Stream): 17.35 vs. 19.31 ms/batch (lower is better)
Neural Magic DeepSparse 1.5 (Model: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Scenario: Asynchronous Multi-Stream): 2301.16 vs. 2068.30 items/sec (higher is better)
OSPRay 2.12 (Benchmark: gravity_spheres_volume/dim_512/scivis/real_time): 20.58 vs. 18.46 items per second (higher is better)
OSPRay 2.12 (Benchmark: gravity_spheres_volume/dim_512/ao/real_time): 21.08 vs. 18.89 items per second (higher is better)
Neural Magic DeepSparse 1.5 (Model: NLP Text Classification, BERT base uncased SST2 - Scenario: Asynchronous Multi-Stream): 129.81 vs. 136.18 ms/batch (lower is better)
Neural Magic DeepSparse 1.5 (Model: NLP Text Classification, BERT base uncased SST2 - Scenario: Asynchronous Multi-Stream): 307.87 vs. 293.12 items/sec (higher is better)
Neural Magic DeepSparse 1.5 (Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Asynchronous Multi-Stream): 551.12 vs. 553.27 ms/batch (lower is better)
Neural Magic DeepSparse 1.5 (Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Asynchronous Multi-Stream): 72.07 vs. 71.89 items/sec (higher is better)
Neural Magic DeepSparse 1.5 (Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Asynchronous Multi-Stream): 551.82 vs. 555.09 ms/batch (lower is better)
Neural Magic DeepSparse 1.5 (Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Asynchronous Multi-Stream): 72.18 vs. 71.54 items/sec (higher is better)
simdjson 2.0 (Throughput Test: LargeRandom): 0.85 vs. 0.96 GB/s (higher is better)
VP9 libvpx Encoding 1.13 (Speed: Speed 5 - Input: Bosphorus 4K): 12.63 vs. 12.33 frames per second (higher is better)
miniBUDE 20210901 (Implementation: OpenMP - Input Deck: BM2): 101.08 vs. 104.05 billion interactions/s (higher is better)
miniBUDE 20210901 (Implementation: OpenMP - Input Deck: BM2): 2526.89 vs. 2601.21 GFInst/s (higher is better)
SPECFEM3D 4.0 (Model: Water-layered Halfspace): 31.15 vs. 31.38 seconds (lower is better)
oneDNN 3.1 (Harness: IP Shapes 3D - Data Type: bf16bf16bf16 - Engine: CPU): 2.59271 vs. 2.57437 ms (lower is better)
Neural Magic DeepSparse 1.5 (Model: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Scenario: Asynchronous Multi-Stream): 173.79 vs. 172.56 ms/batch (lower is better)
Neural Magic DeepSparse 1.5 (Model: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Scenario: Asynchronous Multi-Stream): 230.05 vs. 231.43 items/sec (higher is better)
Neural Magic DeepSparse 1.5 (Model: NLP Sentiment Analysis, 80% Pruned Quantized BERT Base Uncased - Scenario: Asynchronous Multi-Stream): 43.30 vs. 42.94 ms/batch (lower is better)
Neural Magic DeepSparse 1.5 (Model: NLP Sentiment Analysis, 80% Pruned Quantized BERT Base Uncased - Scenario: Asynchronous Multi-Stream): 922.93 vs. 930.75 items/sec (higher is better)
Laghos 3.1 (Test: Triple Point Problem): 256.27 vs. 256.87 major kernels total rate (higher is better)
Neural Magic DeepSparse 1.5 (Model: NLP Text Classification, DistilBERT mnli - Scenario: Asynchronous Multi-Stream): 62.01 vs. 64.44 ms/batch (lower is better)
Neural Magic DeepSparse 1.5 (Model: NLP Text Classification, DistilBERT mnli - Scenario: Asynchronous Multi-Stream): 644.44 vs. 620.25 items/sec (higher is better)
QMCPACK 3.16 (Input: simple-H2O): 39.56 vs. 41.25 seconds total execution time (lower is better)
TensorFlow 2.12 (Device: CPU - Batch Size: 256 - Model: AlexNet): 723.27 vs. 781.25 images/sec (higher is better)
Neural Magic DeepSparse 1.5 (Model: CV Detection, YOLOv5s COCO - Scenario: Asynchronous Multi-Stream): 94.61 vs. 96.94 ms/batch (lower is better)
Neural Magic DeepSparse 1.5 (Model: CV Detection, YOLOv5s COCO - Scenario: Asynchronous Multi-Stream): 422.12 vs. 412.05 items/sec (higher is better)
Neural Magic DeepSparse 1.5 (Model: CV Classification, ResNet-50 ImageNet - Scenario: Asynchronous Multi-Stream): 39.73 vs. 40.45 ms/batch (lower is better)
Neural Magic DeepSparse 1.5 (Model: CV Classification, ResNet-50 ImageNet - Scenario: Asynchronous Multi-Stream): 1005.58 vs. 987.55 items/sec (higher is better)
Neural Magic DeepSparse 1.5 (Model: ResNet-50, Baseline - Scenario: Asynchronous Multi-Stream): 39.70 vs. 39.41 ms/batch (lower is better)
Neural Magic DeepSparse 1.5 (Model: ResNet-50, Baseline - Scenario: Asynchronous Multi-Stream): 1006.34 vs. 1013.82 items/sec (higher is better)
Neural Magic DeepSparse 1.5 (Model: ResNet-50, Sparse INT8 - Scenario: Asynchronous Multi-Stream): 5.2210 vs. 5.6210 ms/batch (lower is better)
Neural Magic DeepSparse 1.5 (Model: ResNet-50, Sparse INT8 - Scenario: Asynchronous Multi-Stream): 7633.26 vs. 7092.50 items/sec (higher is better)
libxsmm 2-1.17-3645 (M N K: 64): 1098.8 vs. 1177.1 GFLOPS/s (higher is better)
SPECFEM3D 4.0 (Model: Layered Halfspace): 29.50 vs. 29.37 seconds (lower is better)
Blender 3.6 (Blend File: Fishy Cat - Compute: CPU-Only): 30.74 vs. 30.90 seconds (lower is better)
libxsmm 2-1.17-3645 (M N K: 32): 604.7 vs. 609.2 GFLOPS/s (higher is better)
Cpuminer-Opt 3.20.3 (Algorithm: LBC, LBRY Credits): 421660 vs. 423130 kH/s (higher is better)
Cpuminer-Opt 3.20.3 (Algorithm: scrypt): 2319.31 vs. 2321.74 kH/s (higher is better)
Cpuminer-Opt 3.20.3 (Algorithm: Skeincoin): 613333 vs. 617130 kH/s (higher is better)
Cpuminer-Opt 3.20.3 (Algorithm: Blake-2 S): 4462327 vs. 4466653 kH/s (higher is better)
Cpuminer-Opt 3.20.3 (Algorithm: Magi): 2309.47 vs. 2308.66 kH/s (higher is better)
Cpuminer-Opt 3.20.3 (Algorithm: Deepcoin): 64677 vs. 64897 kH/s (higher is better)
Cpuminer-Opt 3.20.3 (Algorithm: Triple SHA-256, Onecoin): 1332237 vs. 1333117 kH/s (higher is better)
Cpuminer-Opt 3.20.3 (Algorithm: x25x): 2659.17 vs. 2659.55 kH/s (higher is better)
Cpuminer-Opt 3.20.3 (Algorithm: Quad SHA-256, Pyrite): 921730 vs. 926277 kH/s (higher is better)
GROMACS 2023 (Implementation: MPI CPU - Input: water_GMX50_bare): 9.234 vs. 9.094 ns per day (higher is better)
Blender 3.6 (Blend File: BMW27 - Compute: CPU-Only): 23.83 vs. 23.72 seconds (lower is better)
oneDNN 3.1 (Harness: Deconvolution Batch shapes_1d - Data Type: bf16bf16bf16 - Engine: CPU): 3.90360 vs. 3.91322 ms (lower is better)
SPECFEM3D 4.0 (Model: Homogeneous Halfspace): 18.02 vs. 17.75 seconds (lower is better)
dav1d 1.2.1 (Video Input: Chimera 1080p): 515.81 vs. 514.58 FPS (higher is better)
SPECFEM3D 4.0 (Model: Tomographic Model): 14.57 vs. 14.15 seconds (lower is better)
SPECFEM3D 4.0 (Model: Mount St. Helens): 13.15 vs. 12.95 seconds (lower is better)
Intel Open Image Denoise 2.0 (Run: RTLightmap.hdr.4096x4096 - Device: CPU-Only): 1.46 vs. 1.46 images/sec (higher is better)
dav1d 1.2.1 (Video Input: Summer Nature 4K): 281.36 vs. 280.84 FPS (higher is better)
SVT-AV1 1.6 (Encoder Mode: Preset 8 - Input: Bosphorus 4K): 67.17 vs. 66.46 frames per second (higher is better)
Remhos 1.0 (Test: Sample Remap Example): 12.25 vs. 12.40 seconds (lower is better)
CloverLeaf (Lagrangian-Eulerian Hydrodynamics): 12.04 vs. 11.98 seconds (lower is better)
Xcompact3d Incompact3d 2021-03-11 (Input: input.i3d 193 Cells Per Direction): 11.02 vs. 11.00 seconds (lower is better)
miniBUDE 20210901 (Implementation: OpenMP - Input Deck: BM1): 94.14 vs. 94.90 billion interactions/s (higher is better)
miniBUDE 20210901 (Implementation: OpenMP - Input Deck: BM1): 2353.39 vs. 2372.42 GFInst/s (higher is better)
Embree 4.1 (Binary: Pathtracer ISPC - Model: Crown): 88.19 vs. 83.46 frames per second (higher is better)
Intel Open Image Denoise 2.0 (Run: RT.ldr_alb_nrm.3840x2160 - Device: CPU-Only): 3.03 vs. 3.03 images/sec (higher is better)
Embree 4.1 (Binary: Pathtracer ISPC - Model: Asian Dragon): 104.68 vs. 101.10 frames per second (higher is better)
SVT-HEVC 1.5.0 (Tuning: 7 - Input: Bosphorus 4K): 138.75 vs. 138.49 frames per second (higher is better)
oneDNN 3.1 (Harness: Convolution Batch Shapes Auto - Data Type: bf16bf16bf16 - Engine: CPU): 2.06936 vs. 2.06967 ms (lower is better)
SVT-HEVC 1.5.0 (Tuning: 10 - Input: Bosphorus 4K): 184.38 vs. 182.74 frames per second (higher is better)
SVT-AV1 1.6 (Encoder Mode: Preset 12 - Input: Bosphorus 4K): 180.97 vs. 177.72 frames per second (higher is better)
SVT-AV1 1.6 (Encoder Mode: Preset 13 - Input: Bosphorus 4K): 175.10 vs. 177.20 frames per second (higher is better)
HeFFTe - Highly Efficient FFT for Exascale 2.3 (Test: c2c - Backend: FFTW - Precision: double - X Y Z: 256): 46.35 vs. 46.98 GFLOP/s (higher is better)
oneDNN 3.1 (Harness: Deconvolution Batch shapes_3d - Data Type: bf16bf16bf16 - Engine: CPU): 3.62526 vs. 3.62266 ms (lower is better)
HeFFTe - Highly Efficient FFT for Exascale 2.3 (Test: r2c - Backend: FFTW - Precision: double - X Y Z: 256): 93.75 vs. 94.86 GFLOP/s (higher is better)
HeFFTe - Highly Efficient FFT for Exascale 2.3 (Test: c2c - Backend: FFTW - Precision: float - X Y Z: 256): 101.98 vs. 102.92 GFLOP/s (higher is better)
HeFFTe - Highly Efficient FFT for Exascale 2.3 (Test: r2c - Backend: FFTW - Precision: float - X Y Z: 256): 224.42 vs. 226.78 GFLOP/s (higher is better)
HeFFTe - Highly Efficient FFT for Exascale 2.3 (Test: c2c - Backend: FFTW - Precision: float - X Y Z: 128): 154.73 vs. 159.10 GFLOP/s (higher is better)
HeFFTe - Highly Efficient FFT for Exascale 2.3 (Test: r2c - Backend: FFTW - Precision: double - X Y Z: 128): 144.94 vs. 153.79 GFLOP/s (higher is better)
HeFFTe - Highly Efficient FFT for Exascale 2.3 (Test: c2c - Backend: FFTW - Precision: double - X Y Z: 128): 93.09 vs. 93.92 GFLOP/s (higher is better)
HeFFTe - Highly Efficient FFT for Exascale 2.3 (Test: r2c - Backend: FFTW - Precision: float - X Y Z: 128): 195.20 vs. 198.87 GFLOP/s (higher is better)