AMD EPYC 4th Gen AVX-512 Comparison

AMD EPYC 9654 Genoa AVX-512 benchmark comparison by Michael Larabel for a future article.

Compare your own system(s) to this result file with the Phoronix Test Suite by running the command: phoronix-test-suite benchmark 2212195-NE-AVXCOMPAR69

Run Management

Result Identifier    Date Run            Test Duration
AVX-512 On           December 18 2022    20 Hours, 2 Minutes
AVX-512 Off          December 18 2022    15 Hours, 29 Minutes
Average                                  17 Hours, 45 Minutes


AMD EPYC 4th Gen AVX-512 Comparison Suite 1.0.0
System
Test suite extracted from AMD EPYC 4th Gen AVX-512 Comparison.

pts/jpegxl-1.5.0 --lossless_jpeg=0 sample-photo-6000x4000.JPG out.jxl -q 100 --num_reps 10 Input: JPEG - Quality: 100
pts/ncnn-1.4.0 -1 Target: CPU - Model: FastestDet
pts/ncnn-1.4.0 -1 Target: CPU - Model: vision_transformer
pts/ncnn-1.4.0 -1 Target: CPU - Model: regnety_400m
pts/ncnn-1.4.0 -1 Target: CPU - Model: resnet50
pts/ncnn-1.4.0 -1 Target: CPU - Model: googlenet
pts/ncnn-1.4.0 -1 Target: CPU - Model: blazeface
pts/ncnn-1.4.0 -1 Target: CPU - Model: efficientnet-b0
pts/ncnn-1.4.0 -1 Target: CPU - Model: mnasnet
pts/mnn-2.1.0 Model: inception-v3
pts/mnn-2.1.0 Model: SqueezeNetV1.0
pts/mnn-2.1.0 Model: resnet-v2-50
pts/onnx-1.5.0 resnet100/resnet100.onnx -e cpu Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard
pts/lczero-1.6.0 -b blas Backend: BLAS
pts/openvkl-1.3.0 vklBenchmark --benchmark_filter=ispc Benchmark: vklBenchmark ISPC
pts/daphne-1.0.0 OpenMP points2image Backend: OpenMP - Kernel: Points2Image
pts/cp2k-1.3.0 -i benchmarks/Fayalite-FIST/fayalite.inp Input: Fayalite-FIST
pts/lczero-1.6.0 -b eigen Backend: Eigen
pts/ai-benchmark-1.0.2 Device AI Score
pts/ai-benchmark-1.0.2 Device Training Score
pts/ai-benchmark-1.0.2 Device Inference Score
pts/onnx-1.5.0 fcn-resnet101-11/model.onnx -e cpu Model: fcn-resnet101-11 - Device: CPU - Executor: Standard
pts/onnx-1.5.0 super_resolution/super_resolution.onnx -e cpu Model: super-resolution-10 - Device: CPU - Executor: Standard
pts/onednn-2.7.0 --rnn --batch=inputs/rnn/perf_rnn_training --cfg=bf16bf16bf16 --engine=cpu Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU
pts/ospray-studio-1.1.0 --cameras 1 1 --resolution 3840 2160 --spp 16 --renderer pathtracer Camera: 1 - Resolution: 4K - Samples Per Pixel: 16 - Renderer: Path Tracer
pts/ospray-studio-1.1.0 --cameras 2 2 --resolution 3840 2160 --spp 16 --renderer pathtracer Camera: 2 - Resolution: 4K - Samples Per Pixel: 16 - Renderer: Path Tracer
pts/ospray-studio-1.1.0 --cameras 3 3 --resolution 3840 2160 --spp 16 --renderer pathtracer Camera: 3 - Resolution: 4K - Samples Per Pixel: 16 - Renderer: Path Tracer
pts/numpy-1.2.1
pts/ospray-studio-1.1.0 --cameras 3 3 --resolution 1920 1080 --spp 32 --renderer pathtracer Camera: 3 - Resolution: 1080p - Samples Per Pixel: 32 - Renderer: Path Tracer
pts/ospray-studio-1.1.0 --cameras 2 2 --resolution 1920 1080 --spp 32 --renderer pathtracer Camera: 2 - Resolution: 1080p - Samples Per Pixel: 32 - Renderer: Path Tracer
pts/ospray-studio-1.1.0 --cameras 1 1 --resolution 1920 1080 --spp 32 --renderer pathtracer Camera: 1 - Resolution: 1080p - Samples Per Pixel: 32 - Renderer: Path Tracer
pts/ospray-studio-1.1.0 --cameras 1 1 --resolution 3840 2160 --spp 32 --renderer pathtracer Camera: 1 - Resolution: 4K - Samples Per Pixel: 32 - Renderer: Path Tracer
pts/ospray-studio-1.1.0 --cameras 2 2 --resolution 3840 2160 --spp 32 --renderer pathtracer Camera: 2 - Resolution: 4K - Samples Per Pixel: 32 - Renderer: Path Tracer
pts/ospray-studio-1.1.0 --cameras 3 3 --resolution 1920 1080 --spp 16 --renderer pathtracer Camera: 3 - Resolution: 1080p - Samples Per Pixel: 16 - Renderer: Path Tracer
pts/ospray-studio-1.1.0 --cameras 2 2 --resolution 1920 1080 --spp 16 --renderer pathtracer Camera: 2 - Resolution: 1080p - Samples Per Pixel: 16 - Renderer: Path Tracer
pts/ospray-studio-1.1.0 --cameras 1 1 --resolution 1920 1080 --spp 16 --renderer pathtracer Camera: 1 - Resolution: 1080p - Samples Per Pixel: 16 - Renderer: Path Tracer
pts/ospray-2.10.0 --benchmark_filter=gravity_spheres_volume/dim_512/scivis/real_time Benchmark: gravity_spheres_volume/dim_512/scivis/real_time
pts/ospray-2.10.0 --benchmark_filter=gravity_spheres_volume/dim_512/ao/real_time Benchmark: gravity_spheres_volume/dim_512/ao/real_time
pts/ospray-studio-1.1.0 --cameras 3 3 --resolution 1920 1080 --spp 1 --renderer pathtracer Camera: 3 - Resolution: 1080p - Samples Per Pixel: 1 - Renderer: Path Tracer
pts/onednn-2.7.0 --rnn --batch=inputs/rnn/perf_rnn_training --cfg=f32 --engine=cpu Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU
pts/onnx-1.5.0 bertsquad-12/bertsquad-12.onnx -e cpu Model: bertsquad-12 - Device: CPU - Executor: Standard
pts/ospray-studio-1.1.0 --cameras 2 2 --resolution 1920 1080 --spp 1 --renderer pathtracer Camera: 2 - Resolution: 1080p - Samples Per Pixel: 1 - Renderer: Path Tracer
pts/ospray-studio-1.1.0 --cameras 1 1 --resolution 1920 1080 --spp 1 --renderer pathtracer Camera: 1 - Resolution: 1080p - Samples Per Pixel: 1 - Renderer: Path Tracer
pts/ospray-2.10.0 --benchmark_filter=gravity_spheres_volume/dim_512/pathtracer/real_time Benchmark: gravity_spheres_volume/dim_512/pathtracer/real_time
pts/tensorflow-2.0.0 --device cpu --batch_size=16 --model=resnet50 Device: CPU - Batch Size: 16 - Model: ResNet-50
pts/ospray-studio-1.1.0 --cameras 3 3 --resolution 3840 2160 --spp 32 --renderer pathtracer Camera: 3 - Resolution: 4K - Samples Per Pixel: 32 - Renderer: Path Tracer
pts/tensorflow-2.0.0 --device cpu --batch_size=16 --model=googlenet Device: CPU - Batch Size: 16 - Model: GoogLeNet
pts/ospray-studio-1.1.0 --cameras 3 3 --resolution 3840 2160 --spp 1 --renderer pathtracer Camera: 3 - Resolution: 4K - Samples Per Pixel: 1 - Renderer: Path Tracer
pts/ospray-studio-1.1.0 --cameras 1 1 --resolution 3840 2160 --spp 1 --renderer pathtracer Camera: 1 - Resolution: 4K - Samples Per Pixel: 1 - Renderer: Path Tracer
pts/jpegxl-1.5.0 --lossless_jpeg=0 sample-photo-6000x4000.JPG out.jxl -q 90 --num_reps 40 Input: JPEG - Quality: 90
pts/ospray-studio-1.1.0 --cameras 2 2 --resolution 3840 2160 --spp 1 --renderer pathtracer Camera: 2 - Resolution: 4K - Samples Per Pixel: 1 - Renderer: Path Tracer
pts/jpegxl-1.5.0 sample-4.png out.jxl -q 90 --num_reps 40 Input: PNG - Quality: 90
pts/onednn-2.7.0 --rnn --batch=inputs/rnn/perf_rnn_inference_lb --cfg=bf16bf16bf16 --engine=cpu Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU
pts/simdjson-2.0.1 top_tweet Throughput Test: TopTweet
pts/openfoam-1.2.0 incompressible/simpleFoam/drivaerFastback/ -m M Input: drivaerFastback, Medium Mesh Size - Execution Time
pts/openfoam-1.2.0 incompressible/simpleFoam/drivaerFastback/ -m M Input: drivaerFastback, Medium Mesh Size - Mesh Time
pts/openvino-1.1.0 -m models/intel/age-gender-recognition-retail-0013/FP16/age-gender-recognition-retail-0013.xml -d CPU Model: Age Gender Recognition Retail 0013 FP16 - Device: CPU
pts/simdjson-2.0.1 distinct_user_id Throughput Test: DistinctUserID
pts/simdjson-2.0.1 partial_tweets Throughput Test: PartialTweets
pts/openvino-1.1.0 -m models/intel/person-detection-0106/FP16/person-detection-0106.xml -d CPU Model: Person Detection FP16 - Device: CPU
pts/openvino-1.1.0 -m models/intel/person-detection-0106/FP32/person-detection-0106.xml -d CPU Model: Person Detection FP32 - Device: CPU
pts/openvino-1.1.0 -m models/intel/face-detection-0206/FP16/face-detection-0206.xml -d CPU Model: Face Detection FP16 - Device: CPU
pts/openvino-1.1.0 -m models/intel/face-detection-0206/FP16-INT8/face-detection-0206.xml -d CPU Model: Face Detection FP16-INT8 - Device: CPU
pts/openvino-1.1.0 -m models/intel/person-vehicle-bike-detection-2004/FP16/person-vehicle-bike-detection-2004.xml -d CPU Model: Person Vehicle Bike Detection FP16 - Device: CPU
pts/deepsparse-1.0.1 zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/12layer_pruned90-none --scenario async Model: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Scenario: Asynchronous Multi-Stream
pts/openvino-1.1.0 -m models/intel/machine-translation-nar-en-de-0002/FP16/machine-translation-nar-en-de-0002.xml -d CPU Model: Machine Translation EN To DE FP16 - Device: CPU
pts/openvino-1.1.0 -m models/intel/age-gender-recognition-retail-0013/FP16-INT8/age-gender-recognition-retail-0013.xml -d CPU Model: Age Gender Recognition Retail 0013 FP16-INT8 - Device: CPU
pts/openvino-1.1.0 -m models/intel/weld-porosity-detection-0001/FP16-INT8/weld-porosity-detection-0001.xml -d CPU Model: Weld Porosity Detection FP16-INT8 - Device: CPU
pts/openvino-1.1.0 -m models/intel/vehicle-detection-0202/FP16/vehicle-detection-0202.xml -d CPU Model: Vehicle Detection FP16 - Device: CPU
pts/openvino-1.1.0 -m models/intel/vehicle-detection-0202/FP16-INT8/vehicle-detection-0202.xml -d CPU Model: Vehicle Detection FP16-INT8 - Device: CPU
pts/openvino-1.1.0 -m models/intel/weld-porosity-detection-0001/FP16/weld-porosity-detection-0001.xml -d CPU Model: Weld Porosity Detection FP16 - Device: CPU
pts/deepsparse-1.0.1 zoo:nlp/text_classification/bert-base/pytorch/huggingface/sst2/base-none --scenario sync Model: NLP Text Classification, BERT base uncased SST2 - Scenario: Synchronous Single-Stream
pts/simdjson-2.0.1 kostya Throughput Test: Kostya
pts/deepsparse-1.0.1 zoo:nlp/text_classification/bert-base/pytorch/huggingface/sst2/base-none --scenario async Model: NLP Text Classification, BERT base uncased SST2 - Scenario: Asynchronous Multi-Stream
pts/deepsparse-1.0.1 zoo:nlp/token_classification/bert-base/pytorch/huggingface/conll2003/base-none --scenario sync Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Synchronous Single-Stream
pts/deepsparse-1.0.1 zoo:nlp/document_classification/obert-base/pytorch/huggingface/imdb/base-none --scenario sync Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Synchronous Single-Stream
pts/simdjson-2.0.1 large_random Throughput Test: LargeRandom
pts/deepsparse-1.0.1 zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/12layer_pruned90-none --scenario sync Model: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Scenario: Synchronous Single-Stream
pts/deepsparse-1.0.1 zoo:nlp/text_classification/distilbert-none/pytorch/huggingface/mnli/base-none --scenario sync Model: NLP Text Classification, DistilBERT mnli - Scenario: Synchronous Single-Stream
pts/tensorflow-2.0.0 --device cpu --batch_size=16 --model=alexnet Device: CPU - Batch Size: 16 - Model: AlexNet
pts/deepsparse-1.0.1 zoo:nlp/text_classification/distilbert-none/pytorch/huggingface/mnli/base-none --scenario async Model: NLP Text Classification, DistilBERT mnli - Scenario: Asynchronous Multi-Stream
pts/deepsparse-1.0.1 zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/base-none --scenario async Model: CV Detection,YOLOv5s COCO - Scenario: Asynchronous Multi-Stream
pts/deepsparse-1.0.1 zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/base-none --scenario sync Model: CV Detection,YOLOv5s COCO - Scenario: Synchronous Single-Stream
pts/deepsparse-1.0.1 zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/base-none --scenario async Model: CV Classification, ResNet-50 ImageNet - Scenario: Asynchronous Multi-Stream
pts/cpuminer-opt-1.6.0 -a scrypt Algorithm: scrypt
pts/cpuminer-opt-1.6.0 -a lbry Algorithm: LBC, LBRY Credits
pts/cpuminer-opt-1.6.0 -a x25x Algorithm: x25x
pts/svt-av1-2.7.0 --preset 12 -i Bosphorus_3840x2160.y4m -w 3840 -h 2160 Encoder Mode: Preset 12 - Input: Bosphorus 4K
pts/cpuminer-opt-1.6.0 -a allium Algorithm: Garlicoin
pts/cpuminer-opt-1.6.0 -a sha256q Algorithm: Quad SHA-256, Pyrite
pts/cpuminer-opt-1.6.0 -a skein Algorithm: Skeincoin
pts/numenta-nab-1.1.1 -d bayesChangePt Detector: Bayesian Changepoint
pts/gromacs-1.7.0 mpi-build water-cut1.0_GMX50_bare/1536 Implementation: MPI CPU - Input: water_GMX50_bare
pts/onednn-2.7.0 --conv --batch=inputs/conv/shapes_auto --cfg=f32 --engine=cpu Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU
pts/onednn-2.7.0 --deconv --batch=inputs/deconv/shapes_1d --cfg=f32 --engine=cpu Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU
pts/daphne-1.0.0 OpenMP ndt_mapping Backend: OpenMP - Kernel: NDT Mapping
pts/svt-av1-2.7.0 --preset 13 -i Bosphorus_3840x2160.y4m -w 3840 -h 2160 Encoder Mode: Preset 13 - Input: Bosphorus 4K
pts/minibude-1.0.0 --deck ../data/bm2 --iterations 10 Implementation: OpenMP - Input Deck: BM2
pts/embree-1.2.1 pathtracer_ispc -c asian_dragon_obj/asian_dragon.ecs Binary: Pathtracer ISPC - Model: Asian Dragon Obj
pts/oidn-1.4.0 -r RTLightmap.hdr.4096x4096 Run: RTLightmap.hdr.4096x4096
pts/numenta-nab-1.1.1 -d windowedGaussian Detector: Windowed Gaussian
pts/numenta-nab-1.1.1 -d relativeEntropy Detector: Relative Entropy
pts/oidn-1.4.0 -r RT.hdr_alb_nrm.3840x2160 Run: RT.hdr_alb_nrm.3840x2160
pts/oidn-1.4.0 -r RT.ldr_alb_nrm.3840x2160 Run: RT.ldr_alb_nrm.3840x2160
pts/smhasher-1.1.0 --test=Speed FarmHash32 Hash: FarmHash32 x86_64 AVX
pts/embree-1.2.1 pathtracer_ispc -c crown/crown.ecs Binary: Pathtracer ISPC - Model: Crown
pts/embree-1.2.1 pathtracer_ispc -c asian_dragon/asian_dragon.ecs Binary: Pathtracer ISPC - Model: Asian Dragon
pts/minibude-1.0.0 --deck ../data/bm1 --iterations 500 Implementation: OpenMP - Input Deck: BM1
pts/onednn-2.7.0 --deconv --batch=inputs/deconv/shapes_3d --cfg=f32 --engine=cpu Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU
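
Each entry in the suite listing above begins with a test profile identifier of the form pts/<name>-<version>. As a rough sketch for tallying how many entries use each profile, the Python below scans a listing for those identifiers. The function name count_profiles, the regular expression, and the three-part version pattern are assumptions made for illustration; they are not part of the Phoronix Test Suite. The sample string reuses four entries copied from the listing.

```python
import re
from collections import Counter

# Assumed pattern: "pts/" + profile name + "-" + three-part version,
# e.g. "pts/onednn-2.7.0" or "pts/cpuminer-opt-1.6.0".
PROFILE_RE = re.compile(r"pts/([a-z0-9-]+)-(\d+\.\d+\.\d+)")


def count_profiles(listing: str) -> Counter:
    """Count how many suite entries reference each test profile name."""
    return Counter(name for name, _version in PROFILE_RE.findall(listing))


# Four entries taken from the suite listing, joined as in the flattened page.
sample = (
    "pts/simdjson-2.0.1 top_tweet Throughput Test: TopTweet "
    "pts/simdjson-2.0.1 kostya Throughput Test: Kostya "
    "pts/numpy-1.2.1 "
    "pts/cpuminer-opt-1.6.0 -a scrypt Algorithm: scrypt"
)
counts = count_profiles(sample)

for name, total in counts.most_common():
    print(f"{name}: {total}")
```

The same function can be pointed at the full listing text to see which profiles (for example ospray-studio or openvino) contribute the most configurations to the suite.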