AMD EPYC 8534P

AMD EPYC 8534P 64-Core testing with a AMD Cinnabar (RCB1009C BIOS) and ASPEED on Ubuntu 23.10 via the Phoronix Test Suite.

HTML result view exported from: https://openbenchmarking.org/result/2401286-NE-AMDEPYC8520&rdt&grs.

AMD EPYC 8534PProcessorMotherboardChipsetMemoryDiskGraphicsNetworkOSKernelDesktopDisplay ServerCompilerFile-SystemScreen ResolutionabcAMD EPYC 8534P 64-Core @ 2.30GHz (64 Cores / 128 Threads)AMD Cinnabar (RCB1009C BIOS)AMD Device 14a46 x 32 GB DRAM-4800MT/s Samsung M321R4GA0BB0-CQKMG3201GB Micron_7450_MTFDKCB3T2TFSASPEED2 x Broadcom NetXtreme BCM5720 PCIeUbuntu 23.106.5.0-5-generic (x86_64)GNOME ShellX Server 1.21.1.7GCC 13.2.0ext4640x480OpenBenchmarking.orgKernel Details- Transparent Huge Pages: madviseCompiler Details- --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-FTCNCZ/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-FTCNCZ/gcc-13-13.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v Processor Details- Scaling Governor: acpi-cpufreq performance (Boost: Enabled) - CPU Microcode: 0xaa00212 Java Details- a: OpenJDK Runtime Environment (build 11.0.20+8-post-Ubuntu-1ubuntu1)Python Details- Python 3.11.5Security Details- gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_rstack_overflow: Mitigation of safe RET + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS IBPB: conditional STIBP: always-on RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected

AMD EPYC 8534Pllama-cpp: llama-2-7b.Q4_0.ggufspeedb: Rand Readsvt-av1: Preset 8 - Bosphorus 1080psvt-av1: Preset 4 - Bosphorus 1080ppytorch: CPU - 512 - ResNet-152speedb: Update Randpytorch: CPU - 1 - ResNet-152pytorch: CPU - 1 - Efficientnet_v2_lpytorch: CPU - 512 - ResNet-50svt-av1: Preset 13 - Bosphorus 4Kdeepsparse: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Synchronous Single-Streamdeepsparse: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Synchronous Single-Streamdeepsparse: CV Detection, YOLOv5s COCO, Sparse INT8 - Synchronous Single-Streamdeepsparse: CV Detection, YOLOv5s COCO, Sparse INT8 - Synchronous Single-Streamdeepsparse: CV Classification, ResNet-50 ImageNet - Synchronous Single-Streamsvt-av1: Preset 8 - Bosphorus 4Kdeepsparse: CV Classification, ResNet-50 ImageNet - Synchronous Single-Streampytorch: CPU - 16 - ResNet-50svt-av1: Preset 4 - Bosphorus 4Kpytorch: CPU - 1 - ResNet-50speedb: Read While Writingtensorflow: CPU - 1 - ResNet-50pytorch: CPU - 16 - ResNet-152deepsparse: BERT-Large, NLP Question Answering - Synchronous Single-Streamdeepsparse: BERT-Large, NLP Question Answering - Synchronous Single-Streamllamafile: wizardcoder-python-34b-v1.0.Q6_K - CPUsvt-av1: Preset 13 - Bosphorus 1080pdeepsparse: CV Segmentation, 90% Pruned YOLACT Pruned - Asynchronous Multi-Streamdeepsparse: BERT-Large, NLP Question Answering, Sparse INT8 - Synchronous Single-Streamdeepsparse: BERT-Large, NLP Question Answering, Sparse INT8 - Synchronous Single-Streamsvt-av1: Preset 12 - Bosphorus 1080pllama-cpp: llama-2-13b.Q4_0.ggufdeepsparse: BERT-Large, NLP Question Answering, Sparse INT8 - Asynchronous Multi-Streamdeepsparse: BERT-Large, NLP Question Answering, Sparse INT8 - Asynchronous Multi-Streamsvt-av1: Preset 12 - Bosphorus 4Kdeepsparse: ResNet-50, Sparse INT8 - Asynchronous Multi-Streamdeepsparse: CV Detection, YOLOv5s COCO, Sparse INT8 - Asynchronous Multi-Streamdeepsparse: CV Detection, YOLOv5s COCO - Synchronous Single-Streamy-cruncher: 500Mdeepsparse: CV Detection, YOLOv5s COCO - Synchronous Single-Streamdeepsparse: ResNet-50, Sparse INT8 - Asynchronous Multi-Streamquicksilver: CORAL2 P2deepsparse: CV Detection, YOLOv5s COCO, Sparse INT8 - Asynchronous Multi-Streamquicksilver: CTS2deepsparse: CV Detection, YOLOv5s COCO - Asynchronous Multi-Streamdeepsparse: NLP Text Classification, DistilBERT mnli - Synchronous Single-Streamdeepsparse: CV Segmentation, 90% Pruned YOLACT Pruned - Asynchronous Multi-Streamdeepsparse: NLP Text Classification, DistilBERT mnli - Synchronous Single-Streamdeepsparse: CV Detection, YOLOv5s COCO - Asynchronous Multi-Streamllamafile: llava-v1.5-7b-q4 - CPUdeepsparse: NLP Token Classification, BERT base uncased conll2003 - Synchronous Single-Streamdeepsparse: NLP Token Classification, BERT base uncased conll2003 - Synchronous Single-Streampytorch: CPU - 16 - Efficientnet_v2_lpytorch: CPU - 512 - Efficientnet_v2_ldeepsparse: NLP Token Classification, BERT base uncased conll2003 - Asynchronous Multi-Streamy-cruncher: 1Bdeepsparse: NLP Token Classification, BERT base uncased conll2003 - Asynchronous Multi-Streamspeedb: Read Rand Write Randdeepsparse: NLP Document Classification, oBERT base uncased on IMDB - Asynchronous Multi-Streamllama-cpp: llama-2-70b-chat.Q5_0.ggufdeepsparse: CV Segmentation, 90% Pruned YOLACT Pruned - Synchronous Single-Streamdeepsparse: ResNet-50, Sparse INT8 - Synchronous Single-Streamdeepsparse: ResNet-50, Sparse INT8 - Synchronous Single-Streamdeepsparse: CV Segmentation, 90% Pruned YOLACT Pruned - Synchronous Single-Streamdeepsparse: BERT-Large, NLP Question Answering - Asynchronous Multi-Streamdeepsparse: NLP Document Classification, oBERT base uncased on IMDB - Synchronous Single-Streamdeepsparse: NLP Document Classification, oBERT base uncased on IMDB - Synchronous Single-Streamdeepsparse: ResNet-50, Baseline - Synchronous Single-Streamdeepsparse: ResNet-50, Baseline - Synchronous Single-Streamdeepsparse: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Asynchronous Multi-Streamdeepsparse: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Asynchronous Multi-Streamquicksilver: CORAL2 P1llamafile: mistral-7b-instruct-v0.2.Q8_0 - CPUdeepsparse: NLP Text Classification, DistilBERT mnli - Asynchronous Multi-Streamdeepsparse: BERT-Large, NLP Question Answering - Asynchronous Multi-Streamtensorflow: CPU - 512 - ResNet-50deepsparse: ResNet-50, Baseline - Asynchronous Multi-Streamdeepsparse: NLP Document Classification, oBERT base uncased on IMDB - Asynchronous Multi-Streamdeepsparse: NLP Text Classification, DistilBERT mnli - Asynchronous Multi-Streamtensorflow: CPU - 16 - ResNet-50deepsparse: CV Classification, ResNet-50 ImageNet - Asynchronous Multi-Streamdeepsparse: ResNet-50, Baseline - Asynchronous Multi-Streamdeepsparse: CV Classification, ResNet-50 ImageNet - Asynchronous Multi-Streamabc27.95287702223129.02417.11514.3835526416.869.3836.44195.9044.9412202.16316.605151.25235.323767.32187.619936.636.69745.12153810205.9214.6731.082432.16195.5599.849508.89315.614163.9831506.24616.6546.4063688.5675192.6333766.1301213.59766.594.897151.49128.477916250000149.388116320000150.7692136.466962.39337.3198211.564921.9433.859829.52626.026.04885.82369.94635.5367250652835.65493.444.1242822.751.211522.641545.582629.640533.7288187.45295.32821439.773422.19132130000014.27101.6471695.3752103.3476.8367882.4326313.987253.65477.001466.999366.965527.94298445565133.52417.6314.7934551516.449.5937.28191.9274.9419202.11136.5703152.045.32868.423187.480236.136.7745.60152842195.9814.5530.993832.25345.46599.42512.169915.681963.7036505.53616.7246.6414685.1064193.5813749.0315214.54716.58664.914151.55958.513416260000148.77916360000150.1556136.528162.15947.3167212.392221.8933.792829.58416.046.05883.46229.91535.6066251419635.69443.443.9996823.4531.210722.703645.635829.642833.7262187.86945.31681441.042122.17972133000014.29101.7295694.563103.2476.6784882.8829313.801253.61477.282667.002866.952226.50298332203130.34517.56514.5535139416.589.6136.71194.4715.0360198.36916.4896153.93505.411267.778184.658136.686.72145.33154424205.9414.6730.851632.40235.50603.599510.483815.595864.0546503.50816.6346.5963685.7373193.4253754.3372213.87286.56094.918152.14058.502616193333149.208616293333150.4754137.015762.14457.2910211.909721.9733.739229.63136.026.03882.97219.92935.6474250808235.58913.4144.0566821.13171.214122.674345.531529.575933.8023187.46275.32791442.403722.15502131666714.27101.7894694.9264103.19477.0568883.1148313.752853.62477.253166.972366.9459OpenBenchmarking.org

Llama.cpp

Model: llama-2-7b.Q4_0.gguf

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b1808Model: llama-2-7b.Q4_0.ggufabc714212835SE +/- 0.02, N = 327.9527.9426.501. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -march=native -mtune=native -lopenblas

Speedb

Test: Random Read

OpenBenchmarking.orgOp/s, More Is BetterSpeedb 2.7Test: Random Readabc60M120M180M240M300MSE +/- 222389.57, N = 32877022232984455652983322031. (CXX) g++ options: -O3 -march=native -pthread -fno-builtin-memcmp -fno-rtti -lpthread

SVT-AV1

Encoder Mode: Preset 8 - Input: Bosphorus 1080p

OpenBenchmarking.orgFrames Per Second, More Is BetterSVT-AV1 1.8Encoder Mode: Preset 8 - Input: Bosphorus 1080pabc306090120150SE +/- 1.85, N = 3129.02133.52130.351. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq

SVT-AV1

Encoder Mode: Preset 4 - Input: Bosphorus 1080p

OpenBenchmarking.orgFrames Per Second, More Is BetterSVT-AV1 1.8Encoder Mode: Preset 4 - Input: Bosphorus 1080pabc48121620SE +/- 0.16, N = 317.1217.6317.571. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq

PyTorch

Device: CPU - Batch Size: 512 - Model: ResNet-152

OpenBenchmarking.orgbatches/sec, More Is BetterPyTorch 2.1Device: CPU - Batch Size: 512 - Model: ResNet-152abc48121620SE +/- 0.06, N = 314.3814.7914.55MIN: 12.94 / MAX: 14.47MIN: 13.56 / MAX: 14.89MIN: 13.35 / MAX: 14.76

Speedb

Test: Update Random

OpenBenchmarking.orgOp/s, More Is BetterSpeedb 2.7Test: Update Randomabc80K160K240K320K400KSE +/- 2346.60, N = 33552643455153513941. (CXX) g++ options: -O3 -march=native -pthread -fno-builtin-memcmp -fno-rtti -lpthread

PyTorch

Device: CPU - Batch Size: 1 - Model: ResNet-152

OpenBenchmarking.orgbatches/sec, More Is BetterPyTorch 2.1Device: CPU - Batch Size: 1 - Model: ResNet-152abc48121620SE +/- 0.05, N = 316.8616.4416.58MIN: 14.83 / MAX: 17.02MIN: 14.99 / MAX: 16.55MIN: 16.34 / MAX: 16.81

PyTorch

Device: CPU - Batch Size: 1 - Model: Efficientnet_v2_l

OpenBenchmarking.orgbatches/sec, More Is BetterPyTorch 2.1Device: CPU - Batch Size: 1 - Model: Efficientnet_v2_labc3691215SE +/- 0.04, N = 39.389.599.61MIN: 8.84 / MAX: 9.52MIN: 9.48 / MAX: 9.67MIN: 8.95 / MAX: 9.78

PyTorch

Device: CPU - Batch Size: 512 - Model: ResNet-50

OpenBenchmarking.orgbatches/sec, More Is BetterPyTorch 2.1Device: CPU - Batch Size: 512 - Model: ResNet-50abc918273645SE +/- 0.29, N = 336.4437.2836.71MIN: 35.51 / MAX: 36.77MIN: 35.91 / MAX: 37.74MIN: 35.24 / MAX: 37.75

SVT-AV1

Encoder Mode: Preset 13 - Input: Bosphorus 4K

OpenBenchmarking.orgFrames Per Second, More Is BetterSVT-AV1 1.8Encoder Mode: Preset 13 - Input: Bosphorus 4Kabc4080120160200SE +/- 1.23, N = 3195.90191.93194.471. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq

Neural Magic DeepSparse

Model: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Scenario: Synchronous Single-Stream

OpenBenchmarking.orgms/batch, Fewer Is BetterNeural Magic DeepSparse 1.6Model: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Scenario: Synchronous Single-Streamabc1.13312.26623.39934.53245.6655SE +/- 0.0306, N = 34.94124.94195.0360

Neural Magic DeepSparse

Model: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Scenario: Synchronous Single-Stream

OpenBenchmarking.orgitems/sec, More Is BetterNeural Magic DeepSparse 1.6Model: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Scenario: Synchronous Single-Streamabc4080120160200SE +/- 1.20, N = 3202.16202.11198.37

Neural Magic DeepSparse

Model: CV Detection, YOLOv5s COCO, Sparse INT8 - Scenario: Synchronous Single-Stream

OpenBenchmarking.orgms/batch, Fewer Is BetterNeural Magic DeepSparse 1.6Model: CV Detection, YOLOv5s COCO, Sparse INT8 - Scenario: Synchronous Single-Streamabc246810SE +/- 0.0019, N = 36.60506.57036.4896

Neural Magic DeepSparse

Model: CV Detection, YOLOv5s COCO, Sparse INT8 - Scenario: Synchronous Single-Stream

OpenBenchmarking.orgitems/sec, More Is BetterNeural Magic DeepSparse 1.6Model: CV Detection, YOLOv5s COCO, Sparse INT8 - Scenario: Synchronous Single-Streamabc306090120150SE +/- 0.04, N = 3151.25152.04153.94

Neural Magic DeepSparse

Model: CV Classification, ResNet-50 ImageNet - Scenario: Synchronous Single-Stream

OpenBenchmarking.orgms/batch, Fewer Is BetterNeural Magic DeepSparse 1.6Model: CV Classification, ResNet-50 ImageNet - Scenario: Synchronous Single-Streamabc1.21752.4353.65254.876.0875SE +/- 0.0772, N = 35.32375.32805.4112

SVT-AV1

Encoder Mode: Preset 8 - Input: Bosphorus 4K

OpenBenchmarking.orgFrames Per Second, More Is BetterSVT-AV1 1.8Encoder Mode: Preset 8 - Input: Bosphorus 4Kabc1530456075SE +/- 0.17, N = 367.3268.4267.781. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq

Neural Magic DeepSparse

Model: CV Classification, ResNet-50 ImageNet - Scenario: Synchronous Single-Stream

OpenBenchmarking.orgitems/sec, More Is BetterNeural Magic DeepSparse 1.6Model: CV Classification, ResNet-50 ImageNet - Scenario: Synchronous Single-Streamabc4080120160200SE +/- 2.60, N = 3187.62187.48184.66

PyTorch

Device: CPU - Batch Size: 16 - Model: ResNet-50

OpenBenchmarking.orgbatches/sec, More Is BetterPyTorch 2.1Device: CPU - Batch Size: 16 - Model: ResNet-50abc816243240SE +/- 0.18, N = 336.6336.1336.68MIN: 35.5 / MAX: 37.04MIN: 34.96 / MAX: 36.54MIN: 30.63 / MAX: 37.51

SVT-AV1

Encoder Mode: Preset 4 - Input: Bosphorus 4K

OpenBenchmarking.orgFrames Per Second, More Is BetterSVT-AV1 1.8Encoder Mode: Preset 4 - Input: Bosphorus 4Kabc246810SE +/- 0.038, N = 36.6976.7706.7211. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq

PyTorch

Device: CPU - Batch Size: 1 - Model: ResNet-50

OpenBenchmarking.orgbatches/sec, More Is BetterPyTorch 2.1Device: CPU - Batch Size: 1 - Model: ResNet-50abc1020304050SE +/- 0.20, N = 345.1245.6045.33MIN: 35.13 / MAX: 45.99MIN: 34.79 / MAX: 46.32MIN: 43.73 / MAX: 46.27

Speedb

Test: Read While Writing

OpenBenchmarking.orgOp/s, More Is BetterSpeedb 2.7Test: Read While Writingabc3M6M9M12M15MSE +/- 188799.32, N = 31538102015284219154424201. (CXX) g++ options: -O3 -march=native -pthread -fno-builtin-memcmp -fno-rtti -lpthread

TensorFlow

Device: CPU - Batch Size: 1 - Model: ResNet-50

OpenBenchmarking.orgimages/sec, More Is BetterTensorFlow 2.12Device: CPU - Batch Size: 1 - Model: ResNet-50abc1.34552.6914.03655.3826.7275SE +/- 0.04, N = 35.925.985.94

PyTorch

Device: CPU - Batch Size: 16 - Model: ResNet-152

OpenBenchmarking.orgbatches/sec, More Is BetterPyTorch 2.1Device: CPU - Batch Size: 16 - Model: ResNet-152abc48121620SE +/- 0.08, N = 314.6714.5514.67MIN: 14.48 / MAX: 14.78MIN: 14.35 / MAX: 14.64MIN: 13.21 / MAX: 14.86

Neural Magic DeepSparse

Model: BERT-Large, NLP Question Answering - Scenario: Synchronous Single-Stream

OpenBenchmarking.orgms/batch, Fewer Is BetterNeural Magic DeepSparse 1.6Model: BERT-Large, NLP Question Answering - Scenario: Synchronous Single-Streamabc714212835SE +/- 0.02, N = 331.0830.9930.85

Neural Magic DeepSparse

Model: BERT-Large, NLP Question Answering - Scenario: Synchronous Single-Stream

OpenBenchmarking.orgitems/sec, More Is BetterNeural Magic DeepSparse 1.6Model: BERT-Large, NLP Question Answering - Scenario: Synchronous Single-Streamabc816243240SE +/- 0.02, N = 332.1632.2532.40

Llamafile

Test: wizardcoder-python-34b-v1.0.Q6_K - Acceleration: CPU

OpenBenchmarking.orgTokens Per Second, More Is BetterLlamafile 0.6Test: wizardcoder-python-34b-v1.0.Q6_K - Acceleration: CPUabc1.23752.4753.71254.956.1875SE +/- 0.02, N = 35.505.465.50

SVT-AV1

Encoder Mode: Preset 13 - Input: Bosphorus 1080p

OpenBenchmarking.orgFrames Per Second, More Is BetterSVT-AV1 1.8Encoder Mode: Preset 13 - Input: Bosphorus 1080pabc130260390520650SE +/- 1.86, N = 3599.85599.42603.601. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq

Neural Magic DeepSparse

Model: CV Segmentation, 90% Pruned YOLACT Pruned - Scenario: Asynchronous Multi-Stream

OpenBenchmarking.orgms/batch, Fewer Is BetterNeural Magic DeepSparse 1.6Model: CV Segmentation, 90% Pruned YOLACT Pruned - Scenario: Asynchronous Multi-Streamabc110220330440550SE +/- 1.07, N = 3508.89512.17510.48

Neural Magic DeepSparse

Model: BERT-Large, NLP Question Answering, Sparse INT8 - Scenario: Synchronous Single-Stream

OpenBenchmarking.orgms/batch, Fewer Is BetterNeural Magic DeepSparse 1.6Model: BERT-Large, NLP Question Answering, Sparse INT8 - Scenario: Synchronous Single-Streamabc48121620SE +/- 0.03, N = 315.6115.6815.60

Neural Magic DeepSparse

Model: BERT-Large, NLP Question Answering, Sparse INT8 - Scenario: Synchronous Single-Stream

OpenBenchmarking.orgitems/sec, More Is BetterNeural Magic DeepSparse 1.6Model: BERT-Large, NLP Question Answering, Sparse INT8 - Scenario: Synchronous Single-Streamabc1428425670SE +/- 0.14, N = 363.9863.7064.05

SVT-AV1

Encoder Mode: Preset 12 - Input: Bosphorus 1080p

OpenBenchmarking.orgFrames Per Second, More Is BetterSVT-AV1 1.8Encoder Mode: Preset 12 - Input: Bosphorus 1080pabc110220330440550SE +/- 4.17, N = 3506.25505.54503.511. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq

Llama.cpp

Model: llama-2-13b.Q4_0.gguf

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b1808Model: llama-2-13b.Q4_0.ggufabc48121620SE +/- 0.04, N = 316.6516.7216.631. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -march=native -mtune=native -lopenblas

Neural Magic DeepSparse

Model: BERT-Large, NLP Question Answering, Sparse INT8 - Scenario: Asynchronous Multi-Stream

OpenBenchmarking.orgms/batch, Fewer Is BetterNeural Magic DeepSparse 1.6Model: BERT-Large, NLP Question Answering, Sparse INT8 - Scenario: Asynchronous Multi-Streamabc1122334455SE +/- 0.10, N = 346.4146.6446.60

Neural Magic DeepSparse

Model: BERT-Large, NLP Question Answering, Sparse INT8 - Scenario: Asynchronous Multi-Stream

OpenBenchmarking.orgitems/sec, More Is BetterNeural Magic DeepSparse 1.6Model: BERT-Large, NLP Question Answering, Sparse INT8 - Scenario: Asynchronous Multi-Streamabc150300450600750SE +/- 1.46, N = 3688.57685.11685.74

SVT-AV1

Encoder Mode: Preset 12 - Input: Bosphorus 4K

OpenBenchmarking.orgFrames Per Second, More Is BetterSVT-AV1 1.8Encoder Mode: Preset 12 - Input: Bosphorus 4Kabc4080120160200SE +/- 2.19, N = 3192.63193.58193.431. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq

Neural Magic DeepSparse

Model: ResNet-50, Sparse INT8 - Scenario: Asynchronous Multi-Stream

OpenBenchmarking.orgitems/sec, More Is BetterNeural Magic DeepSparse 1.6Model: ResNet-50, Sparse INT8 - Scenario: Asynchronous Multi-Streamabc8001600240032004000SE +/- 13.07, N = 33766.133749.033754.34

Neural Magic DeepSparse

Model: CV Detection, YOLOv5s COCO, Sparse INT8 - Scenario: Asynchronous Multi-Stream

OpenBenchmarking.orgitems/sec, More Is BetterNeural Magic DeepSparse 1.6Model: CV Detection, YOLOv5s COCO, Sparse INT8 - Scenario: Asynchronous Multi-Streamabc50100150200250SE +/- 0.26, N = 3213.60214.55213.87

Neural Magic DeepSparse

Model: CV Detection, YOLOv5s COCO - Scenario: Synchronous Single-Stream

OpenBenchmarking.orgms/batch, Fewer Is BetterNeural Magic DeepSparse 1.6Model: CV Detection, YOLOv5s COCO - Scenario: Synchronous Single-Streamabc246810SE +/- 0.0014, N = 36.59006.58666.5609

Y-Cruncher

Pi Digits To Calculate: 500M

OpenBenchmarking.orgSeconds, Fewer Is BetterY-Cruncher 0.8.3Pi Digits To Calculate: 500Mabc1.10662.21323.31984.42645.533SE +/- 0.007, N = 34.8974.9144.918

Neural Magic DeepSparse

Model: CV Detection, YOLOv5s COCO - Scenario: Synchronous Single-Stream

OpenBenchmarking.orgitems/sec, More Is BetterNeural Magic DeepSparse 1.6Model: CV Detection, YOLOv5s COCO - Scenario: Synchronous Single-Streamabc306090120150SE +/- 0.03, N = 3151.49151.56152.14

Neural Magic DeepSparse

Model: ResNet-50, Sparse INT8 - Scenario: Asynchronous Multi-Stream

OpenBenchmarking.orgms/batch, Fewer Is BetterNeural Magic DeepSparse 1.6Model: ResNet-50, Sparse INT8 - Scenario: Asynchronous Multi-Streamabc246810SE +/- 0.0299, N = 38.47798.51348.5026

Quicksilver

Input: CORAL2 P2

OpenBenchmarking.orgFigure Of Merit, More Is BetterQuicksilver 20230818Input: CORAL2 P2abc3M6M9M12M15MSE +/- 6666.67, N = 31625000016260000161933331. (CXX) g++ options: -fopenmp -O3 -march=native

Neural Magic DeepSparse

Model: CV Detection, YOLOv5s COCO, Sparse INT8 - Scenario: Asynchronous Multi-Stream

OpenBenchmarking.orgms/batch, Fewer Is BetterNeural Magic DeepSparse 1.6Model: CV Detection, YOLOv5s COCO, Sparse INT8 - Scenario: Asynchronous Multi-Streamabc306090120150SE +/- 0.14, N = 3149.39148.78149.21

Quicksilver

Input: CTS2

OpenBenchmarking.orgFigure Of Merit, More Is BetterQuicksilver 20230818Input: CTS2abc4M8M12M16M20MSE +/- 8819.17, N = 31632000016360000162933331. (CXX) g++ options: -fopenmp -O3 -march=native

Neural Magic DeepSparse

Model: CV Detection, YOLOv5s COCO - Scenario: Asynchronous Multi-Stream

OpenBenchmarking.orgms/batch, Fewer Is BetterNeural Magic DeepSparse 1.6Model: CV Detection, YOLOv5s COCO - Scenario: Asynchronous Multi-Streamabc306090120150SE +/- 0.10, N = 3150.77150.16150.48

Neural Magic DeepSparse

Model: NLP Text Classification, DistilBERT mnli - Scenario: Synchronous Single-Stream

OpenBenchmarking.orgitems/sec, More Is BetterNeural Magic DeepSparse 1.6Model: NLP Text Classification, DistilBERT mnli - Scenario: Synchronous Single-Streamabc306090120150SE +/- 0.24, N = 3136.47136.53137.02

Neural Magic DeepSparse

Model: CV Segmentation, 90% Pruned YOLACT Pruned - Scenario: Asynchronous Multi-Stream

OpenBenchmarking.orgitems/sec, More Is BetterNeural Magic DeepSparse 1.6Model: CV Segmentation, 90% Pruned YOLACT Pruned - Scenario: Asynchronous Multi-Streamabc1428425670SE +/- 0.16, N = 362.3962.1662.14

Neural Magic DeepSparse

Model: NLP Text Classification, DistilBERT mnli - Scenario: Synchronous Single-Stream

OpenBenchmarking.orgms/batch, Fewer Is BetterNeural Magic DeepSparse 1.6Model: NLP Text Classification, DistilBERT mnli - Scenario: Synchronous Single-Streamabc246810SE +/- 0.0130, N = 37.31987.31677.2910

Neural Magic DeepSparse

Model: CV Detection, YOLOv5s COCO - Scenario: Asynchronous Multi-Stream

OpenBenchmarking.orgitems/sec, More Is BetterNeural Magic DeepSparse 1.6Model: CV Detection, YOLOv5s COCO - Scenario: Asynchronous Multi-Streamabc50100150200250SE +/- 0.12, N = 3211.56212.39211.91

Llamafile

Test: llava-v1.5-7b-q4 - Acceleration: CPU

OpenBenchmarking.orgTokens Per Second, More Is BetterLlamafile 0.6Test: llava-v1.5-7b-q4 - Acceleration: CPUabc510152025SE +/- 0.03, N = 321.9421.8921.97

Neural Magic DeepSparse

Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Synchronous Single-Stream

OpenBenchmarking.orgms/batch, Fewer Is BetterNeural Magic DeepSparse 1.6Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Synchronous Single-Streamabc816243240SE +/- 0.03, N = 333.8633.7933.74

Neural Magic DeepSparse

Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Synchronous Single-Stream

OpenBenchmarking.orgitems/sec, More Is BetterNeural Magic DeepSparse 1.6Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Synchronous Single-Streamabc714212835SE +/- 0.02, N = 329.5329.5829.63

PyTorch

Device: CPU - Batch Size: 16 - Model: Efficientnet_v2_l

OpenBenchmarking.orgbatches/sec, More Is BetterPyTorch 2.1Device: CPU - Batch Size: 16 - Model: Efficientnet_v2_labc246810SE +/- 0.01, N = 36.026.046.02MIN: 5.57 / MAX: 6.13MIN: 5.63 / MAX: 6.16MIN: 5.54 / MAX: 6.14

PyTorch

Device: CPU - Batch Size: 512 - Model: Efficientnet_v2_l

OpenBenchmarking.orgbatches/sec, More Is BetterPyTorch 2.1Device: CPU - Batch Size: 512 - Model: Efficientnet_v2_labc246810SE +/- 0.02, N = 36.046.056.03MIN: 5.57 / MAX: 6.16MIN: 5.62 / MAX: 6.16MIN: 5.42 / MAX: 6.21

Neural Magic DeepSparse

Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Asynchronous Multi-Stream

OpenBenchmarking.orgms/batch, Fewer Is BetterNeural Magic DeepSparse 1.6Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Asynchronous Multi-Streamabc2004006008001000SE +/- 0.60, N = 3885.82883.46882.97

Y-Cruncher

Pi Digits To Calculate: 1B

OpenBenchmarking.orgSeconds, Fewer Is BetterY-Cruncher 0.8.3Pi Digits To Calculate: 1Babc3691215SE +/- 0.006, N = 39.9469.9159.929

Neural Magic DeepSparse

Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Asynchronous Multi-Stream

OpenBenchmarking.orgitems/sec, More Is BetterNeural Magic DeepSparse 1.6Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Asynchronous Multi-Streamabc816243240SE +/- 0.06, N = 335.5435.6135.65

Speedb

Test: Read Random Write Random

OpenBenchmarking.orgOp/s, More Is BetterSpeedb 2.7Test: Read Random Write Randomabc500K1000K1500K2000K2500KSE +/- 12093.31, N = 32506528251419625080821. (CXX) g++ options: -O3 -march=native -pthread -fno-builtin-memcmp -fno-rtti -lpthread

Neural Magic DeepSparse

Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Asynchronous Multi-Stream

OpenBenchmarking.orgitems/sec, More Is BetterNeural Magic DeepSparse 1.6Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Asynchronous Multi-Streamabc816243240SE +/- 0.02, N = 335.6535.6935.59

Llama.cpp

Model: llama-2-70b-chat.Q5_0.gguf

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b1808Model: llama-2-70b-chat.Q5_0.ggufabc0.76731.53462.30193.06923.8365SE +/- 0.01, N = 33.403.403.411. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -march=native -mtune=native -lopenblas

Neural Magic DeepSparse

Model: CV Segmentation, 90% Pruned YOLACT Pruned - Scenario: Synchronous Single-Stream

OpenBenchmarking.orgitems/sec, More Is BetterNeural Magic DeepSparse 1.6Model: CV Segmentation, 90% Pruned YOLACT Pruned - Scenario: Synchronous Single-Streamabc1020304050SE +/- 0.05, N = 344.1244.0044.06

Neural Magic DeepSparse

Model: ResNet-50, Sparse INT8 - Scenario: Synchronous Single-Stream

OpenBenchmarking.orgitems/sec, More Is BetterNeural Magic DeepSparse 1.6Model: ResNet-50, Sparse INT8 - Scenario: Synchronous Single-Streamabc2004006008001000SE +/- 1.85, N = 3822.75823.45821.13

Neural Magic DeepSparse

Model: ResNet-50, Sparse INT8 - Scenario: Synchronous Single-Stream

OpenBenchmarking.orgms/batch, Fewer Is BetterNeural Magic DeepSparse 1.6Model: ResNet-50, Sparse INT8 - Scenario: Synchronous Single-Streamabc0.27320.54640.81961.09281.366SE +/- 0.0028, N = 31.21151.21071.2141

Neural Magic DeepSparse

Model: CV Segmentation, 90% Pruned YOLACT Pruned - Scenario: Synchronous Single-Stream

OpenBenchmarking.orgms/batch, Fewer Is BetterNeural Magic DeepSparse 1.6Model: CV Segmentation, 90% Pruned YOLACT Pruned - Scenario: Synchronous Single-Streamabc510152025SE +/- 0.02, N = 322.6422.7022.67

Neural Magic DeepSparse

Model: BERT-Large, NLP Question Answering - Scenario: Asynchronous Multi-Stream

OpenBenchmarking.orgitems/sec, More Is BetterNeural Magic DeepSparse 1.6Model: BERT-Large, NLP Question Answering - Scenario: Asynchronous Multi-Streamabc1020304050SE +/- 0.07, N = 345.5845.6445.53

Neural Magic DeepSparse

Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Synchronous Single-Stream

OpenBenchmarking.orgitems/sec, More Is BetterNeural Magic DeepSparse 1.6Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Synchronous Single-Streamabc714212835SE +/- 0.02, N = 329.6429.6429.58

Neural Magic DeepSparse

Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Synchronous Single-Stream

OpenBenchmarking.orgms/batch, Fewer Is BetterNeural Magic DeepSparse 1.6Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Synchronous Single-Streamabc816243240SE +/- 0.03, N = 333.7333.7333.80

Neural Magic DeepSparse

Model: ResNet-50, Baseline - Scenario: Synchronous Single-Stream

OpenBenchmarking.orgitems/sec, More Is BetterNeural Magic DeepSparse 1.6Model: ResNet-50, Baseline - Scenario: Synchronous Single-Streamabc4080120160200SE +/- 0.17, N = 3187.45187.87187.46

Neural Magic DeepSparse

Model: ResNet-50, Baseline - Scenario: Synchronous Single-Stream

OpenBenchmarking.orgms/batch, Fewer Is BetterNeural Magic DeepSparse 1.6Model: ResNet-50, Baseline - Scenario: Synchronous Single-Streamabc1.19882.39763.59644.79525.994SE +/- 0.0047, N = 35.32825.31685.3279

Neural Magic DeepSparse

Model: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Scenario: Asynchronous Multi-Stream

OpenBenchmarking.orgitems/sec, More Is BetterNeural Magic DeepSparse 1.6Model: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Scenario: Asynchronous Multi-Streamabc30060090012001500SE +/- 0.78, N = 31439.771441.041442.40

Neural Magic DeepSparse

Model: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Scenario: Asynchronous Multi-Stream

OpenBenchmarking.orgms/batch, Fewer Is BetterNeural Magic DeepSparse 1.6Model: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Scenario: Asynchronous Multi-Streamabc510152025SE +/- 0.01, N = 322.1922.1822.16

Quicksilver

Input: CORAL2 P1

OpenBenchmarking.orgFigure Of Merit, More Is BetterQuicksilver 20230818Input: CORAL2 P1abc5M10M15M20M25MSE +/- 34801.02, N = 32130000021330000213166671. (CXX) g++ options: -fopenmp -O3 -march=native

Llamafile

Test: mistral-7b-instruct-v0.2.Q8_0 - Acceleration: CPU

OpenBenchmarking.orgTokens Per Second, More Is BetterLlamafile 0.6Test: mistral-7b-instruct-v0.2.Q8_0 - Acceleration: CPUabc48121620SE +/- 0.02, N = 314.2714.2914.27

Neural Magic DeepSparse

Model: NLP Text Classification, DistilBERT mnli - Scenario: Asynchronous Multi-Stream

OpenBenchmarking.orgms/batch, Fewer Is BetterNeural Magic DeepSparse 1.6Model: NLP Text Classification, DistilBERT mnli - Scenario: Asynchronous Multi-Streamabc20406080100SE +/- 0.11, N = 3101.65101.73101.79

Neural Magic DeepSparse

Model: BERT-Large, NLP Question Answering - Scenario: Asynchronous Multi-Stream

OpenBenchmarking.orgms/batch, Fewer Is BetterNeural Magic DeepSparse 1.6Model: BERT-Large, NLP Question Answering - Scenario: Asynchronous Multi-Streamabc150300450600750SE +/- 0.19, N = 3695.38694.56694.93

TensorFlow

Device: CPU - Batch Size: 512 - Model: ResNet-50

OpenBenchmarking.orgimages/sec, More Is BetterTensorFlow 2.12Device: CPU - Batch Size: 512 - Model: ResNet-50abc20406080100SE +/- 0.01, N = 3103.30103.20103.19

Neural Magic DeepSparse

Model: ResNet-50, Baseline - Scenario: Asynchronous Multi-Stream

OpenBenchmarking.orgitems/sec, More Is BetterNeural Magic DeepSparse 1.6Model: ResNet-50, Baseline - Scenario: Asynchronous Multi-Streamabc100200300400500SE +/- 0.22, N = 3476.84476.68477.06

Neural Magic DeepSparse

Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Asynchronous Multi-Stream

OpenBenchmarking.orgms/batch, Fewer Is BetterNeural Magic DeepSparse 1.6Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Asynchronous Multi-Streamabc2004006008001000SE +/- 0.36, N = 3882.43882.88883.11

Neural Magic DeepSparse

Model: NLP Text Classification, DistilBERT mnli - Scenario: Asynchronous Multi-Stream

OpenBenchmarking.orgitems/sec, More Is BetterNeural Magic DeepSparse 1.6Model: NLP Text Classification, DistilBERT mnli - Scenario: Asynchronous Multi-Streamabc70140210280350SE +/- 0.43, N = 3313.99313.80313.75

TensorFlow

Device: CPU - Batch Size: 16 - Model: ResNet-50

OpenBenchmarking.orgimages/sec, More Is BetterTensorFlow 2.12Device: CPU - Batch Size: 16 - Model: ResNet-50abc1224364860SE +/- 0.04, N = 353.6553.6153.62

Neural Magic DeepSparse

Model: CV Classification, ResNet-50 ImageNet - Scenario: Asynchronous Multi-Stream

OpenBenchmarking.orgitems/sec, More Is BetterNeural Magic DeepSparse 1.6Model: CV Classification, ResNet-50 ImageNet - Scenario: Asynchronous Multi-Streamabc100200300400500SE +/- 0.10, N = 3477.00477.28477.25

Neural Magic DeepSparse

Model: ResNet-50, Baseline - Scenario: Asynchronous Multi-Stream

OpenBenchmarking.orgms/batch, Fewer Is BetterNeural Magic DeepSparse 1.6Model: ResNet-50, Baseline - Scenario: Asynchronous Multi-Streamabc1530456075SE +/- 0.04, N = 367.0067.0066.97

Neural Magic DeepSparse

Model: CV Classification, ResNet-50 ImageNet - Scenario: Asynchronous Multi-Stream

OpenBenchmarking.orgms/batch, Fewer Is BetterNeural Magic DeepSparse 1.6Model: CV Classification, ResNet-50 ImageNet - Scenario: Asynchronous Multi-Streamabc1530456075SE +/- 0.02, N = 366.9766.9566.95


Phoronix Test Suite v10.8.5