AMD SME Benchmark Genoa

4th Gen AMD EPYC "Genoa" Secure Memory Encryption (SME) benchmarks by Michael Larabel for a future article.

HTML result view exported from: https://openbenchmarking.org/result/2212212-NE-AMDSMEBEN19&sro&grt.

System details for both configurations (No SME, AMD SME Enabled):

Processor: 2 x AMD EPYC 9654 96-Core @ 2.40GHz (192 Cores / 384 Threads)
Motherboard: AMD Titanite_4G (RTI1002E BIOS)
Chipset: AMD Device 14a4
Memory: 1520GB
Disk: 800GB INTEL SSDPF21Q800GB
Graphics: ASPEED
Monitor: VGA HDMI
Network: Broadcom NetXtreme BCM5720 PCIe
OS: Ubuntu 22.10
Kernel: 6.1.0-phx (x86_64)
Desktop: GNOME Shell 43.0
Display Server: X Server 1.21.1.4
Vulkan: 1.3.224
Compiler: GCC 12.2.0 + Clang 15.0.2-1
File-System: ext4
Screen Resolution: 1920x1080

Kernel Details: Transparent Huge Pages: madvise

Compiler Details: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-12-U8K4Qv/gcc-12-12.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-12-U8K4Qv/gcc-12-12.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v

Processor Details: Scaling Governor: acpi-cpufreq performance (Boost: Enabled) - CPU Microcode: 0xa10110d

Java Details: OpenJDK Runtime Environment (build 11.0.17+8-post-Ubuntu-1ubuntu2)

Python Details: Python 3.10.7

Security Details: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: always-on RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected

AMD SME Benchmark Genoacompress-7zip: Compression Ratingcompress-7zip: Decompression Ratingmt-dgemm: Sustained Floating-Point Rateaom-av1: Speed 10 Realtime - Bosphorus 4Kappleseed: Emilyaskap: tConvolve MPI - Degriddingaskap: tConvolve MPI - Griddingastcenc: Thoroughastcenc: Exhaustiveblender: Classroom - CPU-Onlyblender: Barbershop - CPU-Onlydacapobench: H2embree: Pathtracer ISPC - Crowngpaw: Carbon Nanotubegraph500: 26graph500: 26graph500: 26graph500: 26gromacs: MPI CPU - water_GMX50_barehpcg: oidn: RTLightmap.hdr.4096x4096toktx: Zstd Compression 9toktx: Zstd Compression 19kvazaar: Bosphorus 4K - Very Fastkvazaar: Bosphorus 4K - Ultra Fastavifenc: 2avifenc: 6liquid-dsp: 256 - 256 - 57liquid-dsp: 384 - 256 - 57lulesh: minibude: OpenMP - BM1minibude: OpenMP - BM1minibude: OpenMP - BM2minibude: OpenMP - BM2namd: ATPase Simulation - 327,506 Atomsnpb: BT.Cnpb: EP.Cnpb: FT.Cnpb: SP.Cdeepsparse: NLP Document Classification, oBERT base uncased on IMDB - Asynchronous Multi-Streamdeepsparse: NLP Document Classification, oBERT base uncased on IMDB - Asynchronous Multi-Streamdeepsparse: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Asynchronous Multi-Streamdeepsparse: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Asynchronous Multi-Streamdeepsparse: CV Detection,YOLOv5s COCO - Asynchronous Multi-Streamdeepsparse: CV Detection,YOLOv5s COCO - Asynchronous Multi-Streamdeepsparse: CV Classification, ResNet-50 ImageNet - Asynchronous Multi-Streamdeepsparse: CV Classification, ResNet-50 ImageNet - Asynchronous Multi-Streamdeepsparse: NLP Text Classification, DistilBERT mnli - Asynchronous Multi-Streamdeepsparse: NLP Text Classification, DistilBERT mnli - Asynchronous Multi-Streamdeepsparse: NLP Text Classification, BERT base uncased SST2 - Asynchronous Multi-Streamdeepsparse: NLP Text Classification, BERT base uncased SST2 - Asynchronous Multi-Streamdeepsparse: NLP Token Classification, BERT base uncased conll2003 - Asynchronous 
Multi-Streamdeepsparse: NLP Token Classification, BERT base uncased conll2003 - Asynchronous Multi-Streamnginx: 500nwchem: C240 Buckyballonednn: IP Shapes 3D - u8s8f32 - CPUonednn: IP Shapes 3D - bf16bf16bf16 - CPUonednn: Convolution Batch Shapes Auto - f32 - CPUonednn: Deconvolution Batch shapes_1d - f32 - CPUonednn: Deconvolution Batch shapes_1d - u8s8f32 - CPUonednn: Recurrent Neural Network Training - f32 - CPUonnx: super-resolution-10 - CPU - Standardopenfoam: drivaerFastback, Small Mesh Size - Mesh Timeopenfoam: drivaerFastback, Small Mesh Size - Execution Timeopenradioss: Bumper Beamopenradioss: Cell Phone Drop Testopenradioss: INIVOL and Fluid Structure Interaction Drop Containeropenvino: Face Detection FP16 - CPUopenvino: Face Detection FP16 - CPUopenvino: Person Detection FP16 - CPUopenvino: Person Detection FP16 - CPUopenvino: Person Detection FP32 - CPUopenvino: Person Detection FP32 - CPUopenvino: Vehicle Detection FP16 - CPUopenvino: Vehicle Detection FP16 - CPUopenvino: Face Detection FP16-INT8 - CPUopenvino: Face Detection FP16-INT8 - CPUopenvino: Vehicle Detection FP16-INT8 - CPUopenvino: Vehicle Detection FP16-INT8 - CPUopenvino: Weld Porosity Detection FP16 - CPUopenvino: Weld Porosity Detection FP16 - CPUopenvino: Machine Translation EN To DE FP16 - CPUopenvino: Machine Translation EN To DE FP16 - CPUopenvino: Weld Porosity Detection FP16-INT8 - CPUopenvino: Weld Porosity Detection FP16-INT8 - CPUopenvino: Person Vehicle Bike Detection FP16 - CPUopenvino: Person Vehicle Bike Detection FP16 - CPUopenvino: Age Gender Recognition Retail 0013 FP16 - CPUopenvino: Age Gender Recognition Retail 0013 FP16 - CPUopenvino: Age Gender Recognition Retail 0013 FP16-INT8 - CPUopenvino: Age Gender Recognition Retail 0013 FP16-INT8 - CPUopenvkl: vklBenchmark ISPCospray: particle_volume/pathtracer/real_timeospray: gravity_spheres_volume/dim_512/ao/real_timeospray-studio: 3 - 4K - 32 - Path Tracerpgbench: 100 - 250 - Read Onlypyhpc: CPU - Numpy - 4194304 - 
Equation of Statepyhpc: CPU - Numpy - 4194304 - Isoneutral Mixingquantlib: relion: Basic - CPUrenaissance: Finagle HTTP Requestsrenaissance: In-Memory Database Shootoutrodinia: OpenMP LavaMDrodinia: OpenMP CFD Solversrsran: OFDM_Testsrsran: 4G PHY_DL_Test 100 PRB MIMO 64-QAMsrsran: 4G PHY_DL_Test 100 PRB MIMO 64-QAMsrsran: 4G PHY_DL_Test 100 PRB SISO 64-QAMsrsran: 4G PHY_DL_Test 100 PRB SISO 64-QAMsrsran: 4G PHY_DL_Test 100 PRB MIMO 256-QAMsrsran: 4G PHY_DL_Test 100 PRB MIMO 256-QAMsrsran: 4G PHY_DL_Test 100 PRB SISO 256-QAMsrsran: 4G PHY_DL_Test 100 PRB SISO 256-QAMsrsran: 5G PHY_DL_NR Test 52 PRB SISO 64-QAMsrsran: 5G PHY_DL_NR Test 52 PRB SISO 64-QAMsvt-av1: Preset 13 - Bosphorus 4Ktensorflow: CPU - 64 - AlexNetbuild-gem5: Time To Compilebuild-godot: Time To Compilebuild-linux-kernel: defconfigbuild-linux-kernel: allmodconfigbuild-llvm: Ninjabuild-llvm: Unix Makefileswrf: conus 2.5kmx264: Bosphorus 4Kx265: Bosphorus 4Kincompact3d: input.i3d 193 Cells Per Directionxmrig: Monero - 1Mxmrig: Wownero - 1Mxsbench: compress-zstd: 19, Long Mode - Compression Speedcompress-zstd: 19, Long Mode - Decompression SpeedNo SMEAMD SME 
Enabled917782116063270.37209534.47142.9470283598.393071.0106.424411.820620.9980.774807183.252823.1671426480000153318000059315300083850500018.71288.39021.662.73418.86374.3177.6834.6902.393103320000001034600000059069.4057281.284291.2518633.544345.3420.12831496467.9816457.94223096.07255564.1984.25001134.4234762.6984125.5335858.4729111.46141962.102448.82911204.849179.4895617.0281155.102984.27641133.3132201056.691524.40.8637263.892990.52205222.67950.9161332011.15560025.0678722.08426479.8518.4580.88101.90469.8143.291102.1542.761115.567437.736.44193.95246.9511180.634.289997.684.79967.9049.5419801.409.629027.725.31150792.420.55165194.420.361322230.61743.84782204329511470.8841.6913052.8128.65512286.34764.616.5085.938161733333415.1157.8413.9165.8445.2166.0444.0172.7139.794.4248.334508.40138.63934.14125.709146.32575.329160.1294077.189106.8623.484.37420527105141.7126508.32980641552.93825.0885135116903870.27743733.12150.9207178718.789541.8106.556611.837920.9581.575050180.571722.9811358510000152638000057251000083546700018.62387.15011.662.77619.88173.3276.2235.2602.469103446666671035000000057686.0867290.636291.6258588.749343.5500.12991494917.4416462.35220214.75253299.3383.63631143.0364745.6412128.4001840.9404113.82701911.399950.10801178.999081.2258601.9341158.942883.81731143.0106196386.411543.10.8504183.923130.52662823.14290.9184822002.43558327.09945822.1330279.9718.3280.90101.55471.5342.331127.5142.031134.687274.986.59193.83247.2311184.764.289990.194.79963.1449.7819704.849.678993.905.33148736.040.55167545.540.361286229.87943.37852261429708690.9341.7443061.3130.42612347.54838.516.6696.043162633333408.5157.7415.7165.9444.8165.7445.7172.2139.194.9251.441505.26142.18135.03825.303148.43576.629162.6294116.62103.0723.294.42424568101932.1123484.12902142849.93837.3OpenBenchmarking.org

7-Zip Compression

Test: Compression Rating

7-Zip Compression 22.01 (MIPS, More Is Better)
AMD SME Enabled: 885135 (SE +/- 6113.05, N = 3)
No SME: 917782 (SE +/- 9930.11, N = 3)
1. (CXX) g++ options: -lpthread -ldl -O2 -fPIC

7-Zip Compression

Test: Decompression Rating

7-Zip Compression 22.01 (MIPS, More Is Better)
AMD SME Enabled: 1169038 (SE +/- 7858.68, N = 3)
No SME: 1160632 (SE +/- 10921.46, N = 3)
1. (CXX) g++ options: -lpthread -ldl -O2 -fPIC
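To make the two 7-Zip results easier to compare, the short Python sketch below converts each pair into a percent change relative to the No SME run. The helper name `relative_change_percent` and the dictionary layout are illustrative assumptions, not part of the exported result file; only the four MIPS values come from the results above.

```python
def relative_change_percent(baseline: float, candidate: float) -> float:
    """Percent change of candidate relative to baseline (hypothetical helper)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (candidate - baseline) / baseline * 100.0


# (No SME, AMD SME Enabled) values in MIPS, copied from the 7-Zip results.
seven_zip = {
    "Compression Rating": (917782, 885135),
    "Decompression Rating": (1160632, 1169038),
}

for test, (no_sme, sme) in seven_zip.items():
    change = relative_change_percent(no_sme, sme)
    print(f"{test}: {change:+.2f}% with SME enabled")
```

Both 7-Zip tests are "More Is Better", so a negative change means SME was slower. For the "Fewer Is Better" tests further down, the sign reads the other way.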

ACES DGEMM

Sustained Floating-Point Rate

ACES DGEMM 1.0 (GFLOP/s, More Is Better)
AMD SME Enabled: 70.28 (SE +/- 0.18, N = 3)
No SME: 70.37 (SE +/- 0.11, N = 3)
1. (CC) gcc options: -O3 -march=native -fopenmp

AOM AV1

Encoder Mode: Speed 10 Realtime - Input: Bosphorus 4K

AOM AV1 3.5 (Frames Per Second, More Is Better)
AMD SME Enabled: 33.12 (SE +/- 0.56, N = 12)
No SME: 34.47 (SE +/- 0.53, N = 15)
1. (CXX) g++ options: -O3 -std=c++11 -U_FORTIFY_SOURCE -lm

Appleseed

Scene: Emily

Appleseed 2.0 Beta (Seconds, Fewer Is Better)
AMD SME Enabled: 150.92
No SME: 142.95

ASKAP

Test: tConvolve MPI - Degridding

ASKAP 1.0 (Mpix/sec, More Is Better)
AMD SME Enabled: 78718.7 (SE +/- 0.00, N = 3)
No SME: 83598.3 (SE +/- 368.27, N = 3)
1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp

ASKAP

Test: tConvolve MPI - Gridding

ASKAP 1.0 (Mpix/sec, More Is Better)
AMD SME Enabled: 89541.8 (SE +/- 422.37, N = 3)
No SME: 93071.0 (SE +/- 460.77, N = 3)
1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp

ASTC Encoder

Preset: Thorough

ASTC Encoder 4.0 (MT/s, More Is Better)
AMD SME Enabled: 106.56 (SE +/- 0.03, N = 3)
No SME: 106.42 (SE +/- 0.06, N = 3)
1. (CXX) g++ options: -O3 -flto -pthread

ASTC Encoder

Preset: Exhaustive

ASTC Encoder 4.0 (MT/s, More Is Better)
AMD SME Enabled: 11.84 (SE +/- 0.01, N = 3)
No SME: 11.82 (SE +/- 0.01, N = 3)
1. (CXX) g++ options: -O3 -flto -pthread

Blender

Blend File: Classroom - Compute: CPU-Only

Blender 3.4 (Seconds, Fewer Is Better)
AMD SME Enabled: 20.95 (SE +/- 0.02, N = 3)
No SME: 20.99 (SE +/- 0.08, N = 3)

Blender

Blend File: Barbershop - Compute: CPU-Only

Blender 3.4 (Seconds, Fewer Is Better)
AMD SME Enabled: 81.57 (SE +/- 0.45, N = 3)
No SME: 80.77 (SE +/- 0.30, N = 3)
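Many of the gaps in this result file are small relative to the reported standard errors. As a rough sketch, the Python snippet below expresses the difference between the two Blender means in units of their combined standard error. The function name and the idea of combining the two SE values in quadrature are assumptions introduced here for illustration; the result file itself only reports the means, SE, and N. The combined value is symmetric in the two SE figures, so it does not depend on which SE belongs to which configuration.

```python
import math


def difference_in_standard_errors(mean_a: float, se_a: float,
                                  mean_b: float, se_b: float) -> float:
    """Absolute gap between two means divided by their combined SE.

    Hypothetical helper: combines the two standard errors in quadrature,
    which assumes the two runs are independent.
    """
    combined_se = math.hypot(se_a, se_b)
    if combined_se == 0:
        raise ValueError("combined standard error is zero")
    return abs(mean_a - mean_b) / combined_se


# (mean, SE) pairs in seconds, copied from the two Blender results.
blender = {
    "Classroom": ((20.95, 0.02), (20.99, 0.08)),
    "Barbershop": ((81.57, 0.45), (80.77, 0.30)),
}

for scene, ((mean_a, se_a), (mean_b, se_b)) in blender.items():
    ratio = difference_in_standard_errors(mean_a, se_a, mean_b, se_b)
    print(f"{scene}: gap is {ratio:.2f} combined standard errors")
```

With only N = 3 runs per configuration this is a crude screen rather than a formal significance test; a small ratio merely suggests the gap may be within run-to-run noise.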

DaCapo Benchmark

Java Test: H2

DaCapo Benchmark 9.12-MR1 (msec, Fewer Is Better)
AMD SME Enabled: 5050 (SE +/- 50.10, N = 20)
No SME: 4807 (SE +/- 54.36, N = 20)

Embree

Binary: Pathtracer ISPC - Model: Crown

Embree 3.13 (Frames Per Second, More Is Better)
AMD SME Enabled: 180.57 (SE +/- 0.40, N = 3; MIN: 129.9 / MAX: 210)
No SME: 183.25 (SE +/- 0.65, N = 3; MIN: 135.3 / MAX: 213.14)

GPAW

Input: Carbon Nanotube

GPAW 22.1 (Seconds, Fewer Is Better)
AMD SME Enabled: 22.98 (SE +/- 0.05, N = 3)
No SME: 23.17 (SE +/- 0.16, N = 3)
1. (CC) gcc options: -shared -fwrapv -O2 -lxc -lblas -lmpi

Graph500

Scale: 26

Graph500 3.0 (bfs median_TEPS, More Is Better)
AMD SME Enabled: 1358510000
No SME: 1426480000
1. (CC) gcc options: -fcommon -O3 -lpthread -lm -lmpi

Graph500

Scale: 26

Graph500 3.0 (bfs max_TEPS, More Is Better)
AMD SME Enabled: 1526380000
No SME: 1533180000
1. (CC) gcc options: -fcommon -O3 -lpthread -lm -lmpi

Graph500

Scale: 26

Graph500 3.0 (sssp median_TEPS, More Is Better)
AMD SME Enabled: 572510000
No SME: 593153000
1. (CC) gcc options: -fcommon -O3 -lpthread -lm -lmpi

Graph500

Scale: 26

Graph500 3.0 (sssp max_TEPS, More Is Better)
AMD SME Enabled: 835467000
No SME: 838505000
1. (CC) gcc options: -fcommon -O3 -lpthread -lm -lmpi

GROMACS

Implementation: MPI CPU - Input: water_GMX50_bare

GROMACS 2022.1 (Ns Per Day, More Is Better)
AMD SME Enabled: 18.62 (SE +/- 0.03, N = 3)
No SME: 18.71 (SE +/- 0.03, N = 3)
1. (CXX) g++ options: -O3

High Performance Conjugate Gradient

High Performance Conjugate Gradient 3.1 (GFLOP/s, More Is Better)
AMD SME Enabled: 87.15 (SE +/- 0.01, N = 3)
No SME: 88.39 (SE +/- 0.10, N = 3)
1. (CXX) g++ options: -O3 -ffast-math -ftree-vectorize -lmpi_cxx -lmpi

Intel Open Image Denoise

Run: RTLightmap.hdr.4096x4096

Intel Open Image Denoise 1.4.0 (Images / Sec, More Is Better)
AMD SME Enabled: 1.66 (SE +/- 0.00, N = 3)
No SME: 1.66 (SE +/- 0.00, N = 3)

KTX-Software toktx

Settings: Zstd Compression 9

KTX-Software toktx 4.0 (Seconds, Fewer Is Better)
AMD SME Enabled: 2.776 (SE +/- 0.006, N = 3)
No SME: 2.734 (SE +/- 0.006, N = 3)

KTX-Software toktx

Settings: Zstd Compression 19

KTX-Software toktx 4.0 (Seconds, Fewer Is Better)
AMD SME Enabled: 19.88 (SE +/- 0.02, N = 3)
No SME: 18.86 (SE +/- 0.08, N = 3)

Kvazaar

Video Input: Bosphorus 4K - Video Preset: Very Fast

Kvazaar 2.1 (Frames Per Second, More Is Better)
AMD SME Enabled: 73.32 (SE +/- 0.91, N = 3)
No SME: 74.31 (SE +/- 0.95, N = 3)
1. (CC) gcc options: -pthread -ftree-vectorize -fvisibility=hidden -O2 -lpthread -lm -lrt

Kvazaar

Video Input: Bosphorus 4K - Video Preset: Ultra Fast

Kvazaar 2.1 (Frames Per Second, More Is Better)
AMD SME Enabled: 76.22 (SE +/- 0.97, N = 3)
No SME: 77.68 (SE +/- 0.77, N = 3)
1. (CC) gcc options: -pthread -ftree-vectorize -fvisibility=hidden -O2 -lpthread -lm -lrt

libavif avifenc

Encoder Speed: 2

libavif avifenc 0.11 (Seconds, Fewer Is Better)
AMD SME Enabled: 35.26 (SE +/- 0.42, N = 4)
No SME: 34.69 (SE +/- 0.03, N = 3)
1. (CXX) g++ options: -O3 -fPIC -lm

libavif avifenc

Encoder Speed: 6

libavif avifenc 0.11 (Seconds, Fewer Is Better)
AMD SME Enabled: 2.469 (SE +/- 0.019, N = 3)
No SME: 2.393 (SE +/- 0.006, N = 3)
1. (CXX) g++ options: -O3 -fPIC -lm

Liquid-DSP

Threads: 256 - Buffer Length: 256 - Filter Length: 57

Liquid-DSP 2021.01.31 (samples/s, More Is Better)
AMD SME Enabled: 10344666667 (SE +/- 5206833.12, N = 3)
No SME: 10332000000 (SE +/- 8082903.77, N = 3)
1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid

Liquid-DSP

Threads: 384 - Buffer Length: 256 - Filter Length: 57

Liquid-DSP 2021.01.31 (samples/s, More Is Better)
AMD SME Enabled: 10350000000 (SE +/- 3605551.28, N = 3)
No SME: 10346000000 (SE +/- 3785938.90, N = 3)
1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid

LULESH

LULESH 2.0.3 (z/s, More Is Better)
AMD SME Enabled: 57686.09 (SE +/- 360.17, N = 3)
No SME: 59069.41 (SE +/- 197.53, N = 3)
1. (CXX) g++ options: -O3 -fopenmp -lm -lmpi_cxx -lmpi

miniBUDE

Implementation: OpenMP - Input Deck: BM1

miniBUDE 20210901 (GFInst/s, More Is Better)
AMD SME Enabled: 7290.64 (SE +/- 8.37, N = 3)
No SME: 7281.28 (SE +/- 16.32, N = 3)
1. (CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm

miniBUDE

Implementation: OpenMP - Input Deck: BM1

miniBUDE 20210901 (Billion Interactions/s, More Is Better)
AMD SME Enabled: 291.63 (SE +/- 0.33, N = 3)
No SME: 291.25 (SE +/- 0.65, N = 3)
1. (CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm

miniBUDE

Implementation: OpenMP - Input Deck: BM2

miniBUDE 20210901 (GFInst/s, More Is Better)
AMD SME Enabled: 8588.75 (SE +/- 99.74, N = 3)
No SME: 8633.54 (SE +/- 80.07, N = 3)
1. (CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm

miniBUDE

Implementation: OpenMP - Input Deck: BM2

miniBUDE 20210901 (Billion Interactions/s, More Is Better)
AMD SME Enabled: 343.55 (SE +/- 3.99, N = 3)
No SME: 345.34 (SE +/- 3.20, N = 3)
1. (CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm

NAMD

ATPase Simulation - 327,506 Atoms

NAMD 2.14 (days/ns, Fewer Is Better)
AMD SME Enabled: 0.12991 (SE +/- 0.00010, N = 3)
No SME: 0.12831 (SE +/- 0.00031, N = 3)

NAS Parallel Benchmarks

Test / Class: BT.C

NAS Parallel Benchmarks 3.4 (Total Mop/s, More Is Better)
AMD SME Enabled: 494917.44 (SE +/- 3984.99, N = 3)
No SME: 496467.98 (SE +/- 529.89, N = 3)
1. (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
2. Open MPI 4.1.4

NAS Parallel Benchmarks

Test / Class: EP.C

NAS Parallel Benchmarks 3.4 (Total Mop/s, More Is Better)
AMD SME Enabled: 16462.35 (SE +/- 73.01, N = 3)
No SME: 16457.94 (SE +/- 54.14, N = 3)
1. (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
2. Open MPI 4.1.4

NAS Parallel Benchmarks

Test / Class: FT.C

NAS Parallel Benchmarks 3.4 (Total Mop/s, More Is Better)
AMD SME Enabled: 220214.75 (SE +/- 1868.66, N = 3)
No SME: 223096.07 (SE +/- 2651.33, N = 4)
1. (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
2. Open MPI 4.1.4

NAS Parallel Benchmarks

Test / Class: SP.C

NAS Parallel Benchmarks 3.4 (Total Mop/s, More Is Better)
AMD SME Enabled: 253299.33 (SE +/- 2731.53, N = 3)
No SME: 255564.19 (SE +/- 3645.86, N = 3)
1. (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
2. Open MPI 4.1.4

Neural Magic DeepSparse

Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.1 (items/sec, More Is Better)
AMD SME Enabled: 83.64 (SE +/- 0.11, N = 3)
No SME: 84.25 (SE +/- 0.10, N = 3)

Neural Magic DeepSparse

Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.1 (ms/batch, Fewer Is Better)
AMD SME Enabled: 1143.04 (SE +/- 0.31, N = 3)
No SME: 1134.42 (SE +/- 0.43, N = 3)

Neural Magic DeepSparse

Model: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.1 (items/sec, More Is Better)
AMD SME Enabled: 745.64 (SE +/- 0.69, N = 3)
No SME: 762.70 (SE +/- 0.54, N = 3)

Neural Magic DeepSparse

Model: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.1 (ms/batch, Fewer Is Better)
AMD SME Enabled: 128.40 (SE +/- 0.14, N = 3)
No SME: 125.53 (SE +/- 0.08, N = 3)

Neural Magic DeepSparse

Model: CV Detection,YOLOv5s COCO - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.1 (items/sec, More Is Better)
AMD SME Enabled: 840.94 (SE +/- 0.37, N = 3)
No SME: 858.47 (SE +/- 1.61, N = 3)

Neural Magic DeepSparse

Model: CV Detection,YOLOv5s COCO - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.1 (ms/batch, Fewer Is Better)
AMD SME Enabled: 113.83 (SE +/- 0.01, N = 3)
No SME: 111.46 (SE +/- 0.16, N = 3)

Neural Magic DeepSparse

Model: CV Classification, ResNet-50 ImageNet - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.1 (items/sec, More Is Better)
AMD SME Enabled: 1911.40 (SE +/- 4.55, N = 3)
No SME: 1962.10 (SE +/- 1.94, N = 3)

Neural Magic DeepSparse

Model: CV Classification, ResNet-50 ImageNet - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.1 (ms/batch, Fewer Is Better)
AMD SME Enabled: 50.11 (SE +/- 0.13, N = 3)
No SME: 48.83 (SE +/- 0.05, N = 3)

Neural Magic DeepSparse

Model: NLP Text Classification, DistilBERT mnli - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.1 (items/sec, More Is Better)
AMD SME Enabled: 1179.00 (SE +/- 1.22, N = 3)
No SME: 1204.85 (SE +/- 2.65, N = 3)

Neural Magic DeepSparse

Model: NLP Text Classification, DistilBERT mnli - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.1 (ms/batch, Fewer Is Better)
AMD SME Enabled: 81.23 (SE +/- 0.07, N = 3)
No SME: 79.49 (SE +/- 0.15, N = 3)

Neural Magic DeepSparse

Model: NLP Text Classification, BERT base uncased SST2 - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.1 (items/sec, More Is Better)
AMD SME Enabled: 601.93 (SE +/- 0.11, N = 3)
No SME: 617.03 (SE +/- 0.96, N = 3)

Neural Magic DeepSparse

Model: NLP Text Classification, BERT base uncased SST2 - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.1 (ms/batch, Fewer Is Better)
AMD SME Enabled: 158.94 (SE +/- 0.02, N = 3)
No SME: 155.10 (SE +/- 0.23, N = 3)

Neural Magic DeepSparse

Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.1 (items/sec, More Is Better)
AMD SME Enabled: 83.82 (SE +/- 0.05, N = 3)
No SME: 84.28 (SE +/- 0.09, N = 3)

Neural Magic DeepSparse

Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.1 (ms/batch, Fewer Is Better)
AMD SME Enabled: 1143.01 (SE +/- 0.38, N = 3)
No SME: 1133.31 (SE +/- 1.72, N = 3)

nginx

Connections: 500

nginx 1.23.2 (Requests Per Second, More Is Better)
AMD SME Enabled: 196386.41 (SE +/- 124.29, N = 3)
No SME: 201056.69 (SE +/- 238.08, N = 3)
1. (CC) gcc options: -lluajit-5.1 -lm -lssl -lcrypto -lpthread -ldl -std=c99 -O2

NWChem

Input: C240 Buckyball

NWChem 7.0.2 (Seconds, Fewer Is Better)
AMD SME Enabled: 1543.1
No SME: 1524.4
1. (F9X) gfortran options: -lnwctask -lccsd -lmcscf -lselci -lmp2 -lmoints -lstepper -ldriver -loptim -lnwdft -lgradients -lcphf -lesp -lddscf -ldangchang -lguess -lhessian -lvib -lnwcutil -lrimp2 -lproperty -lsolvation -lnwints -lprepar -lnwmd -lnwpw -lofpw -lpaw -lpspw -lband -lnwpwlib -lcafe -lspace -lanalyze -lqhop -lpfft -ldplot -ldrdy -lvscf -lqmmm -lqmd -letrans -ltce -lbq -lmm -lcons -lperfm -ldntmc -lccca -ldimqm -lga -larmci -lpeigs -l64to32 -lopenblas -lpthread -lrt -llapack -lnwcblas -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz -lcomex -m64 -ffast-math -std=legacy -fdefault-integer-8 -finline-functions -O2

oneDNN

Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU

oneDNN 3.0 (ms, Fewer Is Better)
AMD SME Enabled: 0.850418 (SE +/- 0.004505, N = 3; MIN: 0.74)
No SME: 0.863726 (SE +/- 0.005137, N = 3; MIN: 0.75)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

oneDNN

Harness: IP Shapes 3D - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 3.0 (ms, Fewer Is Better)
AMD SME Enabled: 3.92313 (SE +/- 0.05423, N = 3; MIN: 2.89)
No SME: 3.89299 (SE +/- 0.06074, N = 15; MIN: 2.77)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

oneDNN

Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU

oneDNN 3.0 (ms, Fewer Is Better)
AMD SME Enabled: 0.526628 (SE +/- 0.001428, N = 3; MIN: 0.42)
No SME: 0.522052 (SE +/- 0.001481, N = 3; MIN: 0.42)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

oneDNN

Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU

oneDNN 3.0 (ms, Fewer Is Better)
AMD SME Enabled: 23.14 (SE +/- 0.20, N = 3; MIN: 20.17)
No SME: 22.68 (SE +/- 0.08, N = 3; MIN: 19.97)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

oneDNN

Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU

oneDNN 3.0 (ms, Fewer Is Better)
AMD SME Enabled: 0.918482 (SE +/- 0.004429, N = 3; MIN: 0.78)
No SME: 0.916133 (SE +/- 0.009364, N = 5; MIN: 0.76)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

oneDNN

Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU

oneDNN 3.0 (ms, Fewer Is Better)
AMD SME Enabled: 2002.43 (SE +/- 18.68, N = 6; MIN: 1924.96)
No SME: 2011.15 (SE +/- 18.08, N = 7; MIN: 1936.22)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

ONNX Runtime

Model: super-resolution-10 - Device: CPU - Executor: Standard

ONNX Runtime 1.11 (Inferences Per Minute, More Is Better)
AMD SME Enabled: 5583 (SE +/- 40.49, N = 3)
No SME: 5600 (SE +/- 15.47, N = 3)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

OpenFOAM

Input: drivaerFastback, Small Mesh Size - Mesh Time

OpenFOAM 10 (Seconds, Fewer Is Better)
AMD SME Enabled: 27.10
No SME: 25.07
1. (CXX) g++ options: -std=c++14 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -lfiniteVolume -lmeshTools -lparallel -llagrangian -lregionModels -lgenericPatchFields -lOpenFOAM -ldl -lm

OpenFOAM

Input: drivaerFastback, Small Mesh Size - Execution Time

OpenFOAM 10 (Seconds, Fewer Is Better)
AMD SME Enabled: 22.13
No SME: 22.08
1. (CXX) g++ options: -std=c++14 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -lfiniteVolume -lmeshTools -lparallel -llagrangian -lregionModels -lgenericPatchFields -lOpenFOAM -ldl -lm

OpenRadioss

Model: Bumper Beam

OpenRadioss 2022.10.13 (Seconds, Fewer Is Better)
AMD SME Enabled: 79.97 (SE +/- 0.15, N = 3)
No SME: 79.85 (SE +/- 0.73, N = 3)

OpenRadioss

Model: Cell Phone Drop Test

OpenRadioss 2022.10.13 (Seconds, Fewer Is Better)
AMD SME Enabled: 18.32 (SE +/- 0.13, N = 3)
No SME: 18.45 (SE +/- 0.02, N = 3)

OpenRadioss

Model: INIVOL and Fluid Structure Interaction Drop Container

OpenRadioss 2022.10.13 (Seconds, Fewer Is Better)
AMD SME Enabled: 80.90 (SE +/- 0.15, N = 3)
No SME: 80.88 (SE +/- 0.09, N = 3)

OpenVINO

Model: Face Detection FP16 - Device: CPU

OpenVINO 2022.2.dev (FPS, More Is Better)
AMD SME Enabled: 101.55 (SE +/- 0.21, N = 3)
No SME: 101.90 (SE +/- 0.24, N = 3)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared

OpenVINO

Model: Face Detection FP16 - Device: CPU

OpenVINO 2022.2.dev (ms, Fewer Is Better)
AMD SME Enabled: 471.53 (SE +/- 0.71, N = 3; MIN: 415.73 / MAX: 547.47)
No SME: 469.81 (SE +/- 0.87, N = 3; MIN: 402.31 / MAX: 539.43)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared

OpenVINO

Model: Person Detection FP16 - Device: CPU

OpenVINO 2022.2.dev (FPS, More Is Better)
AMD SME Enabled: 42.33 (SE +/- 0.21, N = 3)
No SME: 43.29 (SE +/- 0.12, N = 3)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared

OpenVINO

Model: Person Detection FP16 - Device: CPU

OpenVINO 2022.2.dev (ms, Fewer Is Better)
AMD SME Enabled: 1127.51 (SE +/- 5.54, N = 3; MIN: 802.53 / MAX: 1835.35)
No SME: 1102.15 (SE +/- 3.23, N = 3; MIN: 799.38 / MAX: 1782.76)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared

OpenVINO

Model: Person Detection FP32 - Device: CPU

OpenVINO 2022.2.dev, FPS, More Is Better
AMD SME Enabled: 42.03 (SE +/- 0.02, N = 3)
No SME: 42.76 (SE +/- 0.23, N = 3)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared

OpenVINO

Model: Person Detection FP32 - Device: CPU

OpenVINO 2022.2.dev, ms, Fewer Is Better
AMD SME Enabled: 1134.68 (SE +/- 0.32, N = 3; MIN: 842.92 / MAX: 1806.88)
No SME: 1115.56 (SE +/- 5.83, N = 3; MIN: 773.47 / MAX: 1806.09)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared

OpenVINO

Model: Vehicle Detection FP16 - Device: CPU

OpenVINO 2022.2.dev, FPS, More Is Better
AMD SME Enabled: 7274.98 (SE +/- 4.95, N = 3)
No SME: 7437.73 (SE +/- 3.74, N = 3)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared

OpenVINO

Model: Vehicle Detection FP16 - Device: CPU

OpenVINO 2022.2.dev, ms, Fewer Is Better
AMD SME Enabled: 6.59 (SE +/- 0.01, N = 3; MIN: 5 / MAX: 63.2)
No SME: 6.44 (SE +/- 0.00, N = 3; MIN: 5.05 / MAX: 61.63)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared

OpenVINO

Model: Face Detection FP16-INT8 - Device: CPU

OpenVINO 2022.2.dev, FPS, More Is Better
AMD SME Enabled: 193.83 (SE +/- 0.05, N = 3)
No SME: 193.95 (SE +/- 0.05, N = 3)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared

OpenVINO

Model: Face Detection FP16-INT8 - Device: CPU

OpenVINO 2022.2.dev, ms, Fewer Is Better
AMD SME Enabled: 247.23 (SE +/- 0.05, N = 3; MIN: 208.68 / MAX: 293.42)
No SME: 246.95 (SE +/- 0.06, N = 3; MIN: 205.26 / MAX: 303.51)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared

OpenVINO

Model: Vehicle Detection FP16-INT8 - Device: CPU

OpenVINO 2022.2.dev, FPS, More Is Better
AMD SME Enabled: 11184.76 (SE +/- 4.87, N = 3)
No SME: 11180.63 (SE +/- 2.65, N = 3)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared

OpenVINO

Model: Vehicle Detection FP16-INT8 - Device: CPU

OpenVINO 2022.2.dev, ms, Fewer Is Better
AMD SME Enabled: 4.28 (SE +/- 0.00, N = 3; MIN: 3.5 / MAX: 42.32)
No SME: 4.28 (SE +/- 0.00, N = 3; MIN: 3.51 / MAX: 38.77)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared

OpenVINO

Model: Weld Porosity Detection FP16 - Device: CPU

OpenVINO 2022.2.dev, FPS, More Is Better
AMD SME Enabled: 9990.19 (SE +/- 13.80, N = 3)
No SME: 9997.68 (SE +/- 3.98, N = 3)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared

OpenVINO

Model: Weld Porosity Detection FP16 - Device: CPU

OpenVINO 2022.2.dev, ms, Fewer Is Better
AMD SME Enabled: 4.79 (SE +/- 0.01, N = 3; MIN: 3.95 / MAX: 30.1)
No SME: 4.79 (SE +/- 0.00, N = 3; MIN: 3.96 / MAX: 30.92)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared

OpenVINO

Model: Machine Translation EN To DE FP16 - Device: CPU

OpenVINO 2022.2.dev, FPS, More Is Better
AMD SME Enabled: 963.14 (SE +/- 1.50, N = 3)
No SME: 967.90 (SE +/- 1.57, N = 3)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared

OpenVINO

Model: Machine Translation EN To DE FP16 - Device: CPU

OpenVINO 2022.2.dev, ms, Fewer Is Better
AMD SME Enabled: 49.78 (SE +/- 0.08, N = 3; MIN: 38.76 / MAX: 189.77)
No SME: 49.54 (SE +/- 0.08, N = 3; MIN: 37.27 / MAX: 225.52)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared

OpenVINO

Model: Weld Porosity Detection FP16-INT8 - Device: CPU

OpenVINO 2022.2.dev, FPS, More Is Better
AMD SME Enabled: 19704.84 (SE +/- 4.31, N = 3)
No SME: 19801.40 (SE +/- 18.52, N = 3)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared

OpenVINO

Model: Weld Porosity Detection FP16-INT8 - Device: CPU

OpenVINO 2022.2.dev, ms, Fewer Is Better
AMD SME Enabled: 9.67 (SE +/- 0.00, N = 3; MIN: 8.32 / MAX: 78.9)
No SME: 9.62 (SE +/- 0.01, N = 3; MIN: 8.26 / MAX: 57.97)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared

OpenVINO

Model: Person Vehicle Bike Detection FP16 - Device: CPU

OpenVINO 2022.2.dev, FPS, More Is Better
AMD SME Enabled: 8993.90 (SE +/- 6.93, N = 3)
No SME: 9027.72 (SE +/- 7.89, N = 3)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared

OpenVINO

Model: Person Vehicle Bike Detection FP16 - Device: CPU

OpenVINO 2022.2.dev, ms, Fewer Is Better
AMD SME Enabled: 5.33 (SE +/- 0.00, N = 3; MIN: 4.45 / MAX: 44.32)
No SME: 5.31 (SE +/- 0.01, N = 3; MIN: 4.34 / MAX: 44.16)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared

OpenVINO

Model: Age Gender Recognition Retail 0013 FP16 - Device: CPU

OpenVINO 2022.2.dev, FPS, More Is Better
AMD SME Enabled: 148736.04 (SE +/- 1824.08, N = 4)
No SME: 150792.42 (SE +/- 878.85, N = 3)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared

OpenVINO

Model: Age Gender Recognition Retail 0013 FP16 - Device: CPU

OpenVINO 2022.2.dev, ms, Fewer Is Better
AMD SME Enabled: 0.55 (SE +/- 0.00, N = 4; MIN: 0.5 / MAX: 36.47)
No SME: 0.55 (SE +/- 0.00, N = 3; MIN: 0.5 / MAX: 30.13)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared

OpenVINO

Model: Age Gender Recognition Retail 0013 FP16-INT8 - Device: CPU

OpenVINO 2022.2.dev, FPS, More Is Better
AMD SME Enabled: 167545.54 (SE +/- 702.31, N = 3)
No SME: 165194.42 (SE +/- 2225.36, N = 3)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared

OpenVINO

Model: Age Gender Recognition Retail 0013 FP16-INT8 - Device: CPU

OpenVINO 2022.2.dev, ms, Fewer Is Better
AMD SME Enabled: 0.36 (SE +/- 0.00, N = 3; MIN: 0.34 / MAX: 47.65)
No SME: 0.36 (SE +/- 0.00, N = 3; MIN: 0.34 / MAX: 40.99)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared

OpenVKL

Benchmark: vklBenchmark ISPC

OpenVKL 1.3.1, Items / Sec, More Is Better
AMD SME Enabled: 1286 (SE +/- 15.55, N = 4; MIN: 328 / MAX: 5485)
No SME: 1322 (SE +/- 6.81, N = 3; MIN: 329 / MAX: 4485)

OSPRay

Benchmark: particle_volume/pathtracer/real_time

OSPRay 2.10, Items Per Second, More Is Better
AMD SME Enabled: 229.88 (SE +/- 1.51, N = 3)
No SME: 230.62 (SE +/- 1.25, N = 3)

OSPRay

Benchmark: gravity_spheres_volume/dim_512/ao/real_time

OSPRay 2.10, Items Per Second, More Is Better
AMD SME Enabled: 43.38 (SE +/- 0.14, N = 3)
No SME: 43.85 (SE +/- 0.04, N = 3)

OSPRay Studio

Camera: 3 - Resolution: 4K - Samples Per Pixel: 32 - Renderer: Path Tracer

OSPRay Studio 0.11, ms, Fewer Is Better
AMD SME Enabled: 22614 (SE +/- 32.58, N = 3)
No SME: 22043 (SE +/- 6.36, N = 3)
1. (CXX) g++ options: -O3 -ldl

PostgreSQL

Scaling Factor: 100 - Clients: 250 - Mode: Read Only

PostgreSQL 15, TPS, More Is Better
AMD SME Enabled: 2970869 (SE +/- 40566.19, N = 3)
No SME: 2951147 (SE +/- 16891.69, N = 3)
1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lm

PyHPC Benchmarks

Device: CPU - Backend: Numpy - Project Size: 4194304 - Benchmark: Equation of State

PyHPC Benchmarks 3.0, Seconds, Fewer Is Better
AMD SME Enabled: 0.934 (SE +/- 0.003, N = 3)
No SME: 0.884 (SE +/- 0.002, N = 3)

PyHPC Benchmarks

Device: CPU - Backend: Numpy - Project Size: 4194304 - Benchmark: Isoneutral Mixing

PyHPC Benchmarks 3.0, Seconds, Fewer Is Better
AMD SME Enabled: 1.744 (SE +/- 0.005, N = 3)
No SME: 1.691 (SE +/- 0.006, N = 3)

QuantLib

QuantLib 1.21, MFLOPS, More Is Better
AMD SME Enabled: 3061.3 (SE +/- 8.14, N = 3)
No SME: 3052.8 (SE +/- 6.39, N = 3)
1. (CXX) g++ options: -O3 -march=native -rdynamic

RELION

Test: Basic - Device: CPU

RELION 3.1.1, Seconds, Fewer Is Better
AMD SME Enabled: 130.43 (SE +/- 1.42, N = 5)
No SME: 128.66 (SE +/- 1.40, N = 5)
1. (CXX) g++ options: -fopenmp -std=c++0x -O3 -rdynamic -ldl -ltiff -lfftw3f -lfftw3 -lpng -lmpi_cxx -lmpi

Renaissance

Test: Finagle HTTP Requests

Renaissance 0.14, ms, Fewer Is Better
AMD SME Enabled: 12347.5 (SE +/- 95.54, N = 3; MIN: 11146.33 / MAX: 12514.13)
No SME: 12286.3 (SE +/- 88.33, N = 3; MIN: 11326.41 / MAX: 12632.65)

Renaissance

Test: In-Memory Database Shootout

Renaissance 0.14, ms, Fewer Is Better
AMD SME Enabled: 4838.5 (SE +/- 69.41, N = 3; MIN: 4339.45 / MAX: 6109.38)
No SME: 4764.6 (SE +/- 54.74, N = 12; MIN: 4124.15 / MAX: 6577.01)

Rodinia

Test: OpenMP LavaMD

Rodinia 3.1, Seconds, Fewer Is Better
AMD SME Enabled: 16.67 (SE +/- 0.05, N = 3)
No SME: 16.51 (SE +/- 0.13, N = 3)
1. (CXX) g++ options: -O2 -lOpenCL

Rodinia

Test: OpenMP CFD Solver

Rodinia 3.1, Seconds, Fewer Is Better
AMD SME Enabled: 6.043 (SE +/- 0.030, N = 3)
No SME: 5.938 (SE +/- 0.012, N = 3)
1. (CXX) g++ options: -O2 -lOpenCL

srsRAN

Test: OFDM_Test

srsRAN 22.04.1, Samples / Second, More Is Better
AMD SME Enabled: 162633333 (SE +/- 883804.91, N = 3)
No SME: 161733333 (SE +/- 600925.21, N = 3)
1. (CXX) g++ options: -std=c++14 -fno-strict-aliasing -march=native -mfpmath=sse -mavx2 -fvisibility=hidden -O3 -fno-trapping-math -fno-math-errno -mavx512f -mavx512cd -mavx512bw -mavx512dq -ldl -lpthread -lm

srsRAN

Test: 4G PHY_DL_Test 100 PRB MIMO 64-QAM

srsRAN 22.04.1, eNb Mb/s, More Is Better
AMD SME Enabled: 408.5 (SE +/- 3.12, N = 3)
No SME: 415.1 (SE +/- 0.49, N = 3)
1. (CXX) g++ options: -std=c++14 -fno-strict-aliasing -march=native -mfpmath=sse -mavx2 -fvisibility=hidden -O3 -fno-trapping-math -fno-math-errno -mavx512f -mavx512cd -mavx512bw -mavx512dq -ldl -lpthread -lm

srsRAN

Test: 4G PHY_DL_Test 100 PRB MIMO 64-QAM

srsRAN 22.04.1, UE Mb/s, More Is Better
AMD SME Enabled: 157.7 (SE +/- 0.31, N = 3)
No SME: 157.8 (SE +/- 0.25, N = 3)
1. (CXX) g++ options: -std=c++14 -fno-strict-aliasing -march=native -mfpmath=sse -mavx2 -fvisibility=hidden -O3 -fno-trapping-math -fno-math-errno -mavx512f -mavx512cd -mavx512bw -mavx512dq -ldl -lpthread -lm

srsRAN

Test: 4G PHY_DL_Test 100 PRB SISO 64-QAM

srsRAN 22.04.1, eNb Mb/s, More Is Better
AMD SME Enabled: 415.7 (SE +/- 0.45, N = 3)
No SME: 413.9 (SE +/- 0.64, N = 3)
1. (CXX) g++ options: -std=c++14 -fno-strict-aliasing -march=native -mfpmath=sse -mavx2 -fvisibility=hidden -O3 -fno-trapping-math -fno-math-errno -mavx512f -mavx512cd -mavx512bw -mavx512dq -ldl -lpthread -lm

srsRAN

Test: 4G PHY_DL_Test 100 PRB SISO 64-QAM

srsRAN 22.04.1, UE Mb/s, More Is Better
AMD SME Enabled: 165.9 (SE +/- 0.46, N = 3)
No SME: 165.8 (SE +/- 0.84, N = 3)
1. (CXX) g++ options: -std=c++14 -fno-strict-aliasing -march=native -mfpmath=sse -mavx2 -fvisibility=hidden -O3 -fno-trapping-math -fno-math-errno -mavx512f -mavx512cd -mavx512bw -mavx512dq -ldl -lpthread -lm

srsRAN

Test: 4G PHY_DL_Test 100 PRB MIMO 256-QAM

srsRAN 22.04.1, eNb Mb/s, More Is Better
AMD SME Enabled: 444.8 (SE +/- 1.01, N = 3)
No SME: 445.2 (SE +/- 1.49, N = 3)
1. (CXX) g++ options: -std=c++14 -fno-strict-aliasing -march=native -mfpmath=sse -mavx2 -fvisibility=hidden -O3 -fno-trapping-math -fno-math-errno -mavx512f -mavx512cd -mavx512bw -mavx512dq -ldl -lpthread -lm

srsRAN

Test: 4G PHY_DL_Test 100 PRB MIMO 256-QAM

srsRAN 22.04.1, UE Mb/s, More Is Better
AMD SME Enabled: 165.7 (SE +/- 0.41, N = 3)
No SME: 166.0 (SE +/- 0.24, N = 3)
1. (CXX) g++ options: -std=c++14 -fno-strict-aliasing -march=native -mfpmath=sse -mavx2 -fvisibility=hidden -O3 -fno-trapping-math -fno-math-errno -mavx512f -mavx512cd -mavx512bw -mavx512dq -ldl -lpthread -lm

srsRAN

Test: 4G PHY_DL_Test 100 PRB SISO 256-QAM

srsRAN 22.04.1, eNb Mb/s, More Is Better
AMD SME Enabled: 445.7 (SE +/- 0.03, N = 3)
No SME: 444.0 (SE +/- 1.15, N = 3)
1. (CXX) g++ options: -std=c++14 -fno-strict-aliasing -march=native -mfpmath=sse -mavx2 -fvisibility=hidden -O3 -fno-trapping-math -fno-math-errno -mavx512f -mavx512cd -mavx512bw -mavx512dq -ldl -lpthread -lm

srsRAN

Test: 4G PHY_DL_Test 100 PRB SISO 256-QAM

srsRAN 22.04.1, UE Mb/s, More Is Better
AMD SME Enabled: 172.2 (SE +/- 0.09, N = 3)
No SME: 172.7 (SE +/- 0.47, N = 3)
1. (CXX) g++ options: -std=c++14 -fno-strict-aliasing -march=native -mfpmath=sse -mavx2 -fvisibility=hidden -O3 -fno-trapping-math -fno-math-errno -mavx512f -mavx512cd -mavx512bw -mavx512dq -ldl -lpthread -lm

srsRAN

Test: 5G PHY_DL_NR Test 52 PRB SISO 64-QAM

srsRAN 22.04.1, eNb Mb/s, More Is Better
AMD SME Enabled: 139.1 (SE +/- 0.19, N = 3)
No SME: 139.7 (SE +/- 0.32, N = 3)
1. (CXX) g++ options: -std=c++14 -fno-strict-aliasing -march=native -mfpmath=sse -mavx2 -fvisibility=hidden -O3 -fno-trapping-math -fno-math-errno -mavx512f -mavx512cd -mavx512bw -mavx512dq -ldl -lpthread -lm

srsRAN

Test: 5G PHY_DL_NR Test 52 PRB SISO 64-QAM

srsRAN 22.04.1, UE Mb/s, More Is Better
AMD SME Enabled: 94.9 (SE +/- 0.09, N = 3)
No SME: 94.4 (SE +/- 0.22, N = 3)
1. (CXX) g++ options: -std=c++14 -fno-strict-aliasing -march=native -mfpmath=sse -mavx2 -fvisibility=hidden -O3 -fno-trapping-math -fno-math-errno -mavx512f -mavx512cd -mavx512bw -mavx512dq -ldl -lpthread -lm

SVT-AV1

Encoder Mode: Preset 13 - Input: Bosphorus 4K

SVT-AV1 1.4, Frames Per Second, More Is Better
AMD SME Enabled: 251.44 (SE +/- 4.08, N = 15)
No SME: 248.33 (SE +/- 6.22, N = 15)

TensorFlow

Device: CPU - Batch Size: 64 - Model: AlexNet

TensorFlow 2.10, images/sec, More Is Better
AMD SME Enabled: 505.26 (SE +/- 7.26, N = 15)
No SME: 508.40 (SE +/- 6.01, N = 15)

Timed Gem5 Compilation

Time To Compile

Timed Gem5 Compilation 21.2, Seconds, Fewer Is Better
AMD SME Enabled: 142.18 (SE +/- 1.00, N = 3)
No SME: 138.64 (SE +/- 1.59, N = 3)

Timed Godot Game Engine Compilation

Time To Compile

Timed Godot Game Engine Compilation 3.2.3, Seconds, Fewer Is Better
AMD SME Enabled: 35.04 (SE +/- 0.36, N = 3)
No SME: 34.14 (SE +/- 0.48, N = 3)

Timed Linux Kernel Compilation

Build: defconfig

Timed Linux Kernel Compilation 6.1, Seconds, Fewer Is Better
AMD SME Enabled: 25.30 (SE +/- 0.23, N = 7)
No SME: 25.71 (SE +/- 0.22, N = 8)

Timed Linux Kernel Compilation

Build: allmodconfig

Timed Linux Kernel Compilation 6.1, Seconds, Fewer Is Better
AMD SME Enabled: 148.44 (SE +/- 0.71, N = 3)
No SME: 146.33 (SE +/- 1.13, N = 3)

Timed LLVM Compilation

Build System: Ninja

Timed LLVM Compilation 13.0, Seconds, Fewer Is Better
AMD SME Enabled: 76.63 (SE +/- 0.35, N = 3)
No SME: 75.33 (SE +/- 0.38, N = 3)

Timed LLVM Compilation

Build System: Unix Makefiles

Timed LLVM Compilation 13.0, Seconds, Fewer Is Better
AMD SME Enabled: 162.63 (SE +/- 0.05, N = 3)
No SME: 160.13 (SE +/- 0.17, N = 3)

WRF

Input: conus 2.5km

WRF 4.2.2, Seconds, Fewer Is Better
AMD SME Enabled: 4116.62
No SME: 4077.19
1. (F9X) gfortran options: -O2 -ftree-vectorize -funroll-loops -ffree-form -fconvert=big-endian -frecord-marker=4 -fallow-invalid-boz -lesmf_time -lwrfio_nf -lnetcdff -lnetcdf -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

x264

Video Input: Bosphorus 4K

x264 2022-02-22, Frames Per Second, More Is Better
AMD SME Enabled: 103.07 (SE +/- 0.62, N = 3)
No SME: 106.86 (SE +/- 1.42, N = 3)
1. (CC) gcc options: -ldl -lavformat -lavcodec -lavutil -lswscale -m64 -lm -lpthread -O3 -flto

x265

Video Input: Bosphorus 4K

x265 3.4, Frames Per Second, More Is Better
AMD SME Enabled: 23.29 (SE +/- 0.17, N = 3)
No SME: 23.48 (SE +/- 0.29, N = 4)
1. (CXX) g++ options: -O3 -rdynamic -lpthread -lrt -ldl -lnuma

Xcompact3d Incompact3d

Input: input.i3d 193 Cells Per Direction

Xcompact3d Incompact3d 2021-03-11, Seconds, Fewer Is Better
AMD SME Enabled: 4.42424568 (SE +/- 0.04122391, N = 3)
No SME: 4.37420527 (SE +/- 0.01135008, N = 3)
1. (F9X) gfortran options: -cpp -O2 -funroll-loops -floop-optimize -fcray-pointer -fbacktrace -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

Xmrig

Variant: Monero - Hash Count: 1M

Xmrig 6.18.1, H/s, More Is Better
AMD SME Enabled: 101932.1 (SE +/- 540.08, N = 3)
No SME: 105141.7 (SE +/- 111.86, N = 3)
1. (CXX) g++ options: -fexceptions -fno-rtti -maes -O3 -Ofast -static-libgcc -static-libstdc++ -rdynamic -lssl -lcrypto -luv -lpthread -lrt -ldl -lhwloc

Xmrig

Variant: Wownero - Hash Count: 1M

Xmrig 6.18.1, H/s, More Is Better
AMD SME Enabled: 123484.1 (SE +/- 341.44, N = 3)
No SME: 126508.3 (SE +/- 211.86, N = 3)
1. (CXX) g++ options: -fexceptions -fno-rtti -maes -O3 -Ofast -static-libgcc -static-libstdc++ -rdynamic -lssl -lcrypto -luv -lpthread -lrt -ldl -lhwloc

Xsbench

Xsbench 2017-07-06, Lookups/s, More Is Better
AMD SME Enabled: 29021428 (SE +/- 367701.15, N = 15)
No SME: 29806415 (SE +/- 43563.46, N = 3)
1. (CC) gcc options: -std=gnu99 -fopenmp -O3 -lm

Zstd Compression

Compression Level: 19, Long Mode - Compression Speed

Zstd Compression 1.5.0, MB/s, More Is Better
AMD SME Enabled: 49.9 (SE +/- 0.70, N = 3)
No SME: 52.9 (SE +/- 1.03, N = 15)
1. (CC) gcc options: -O3 -pthread -lz -llzma

Zstd Compression

Compression Level: 19, Long Mode - Decompression Speed

Zstd Compression 1.5.0, MB/s, More Is Better
AMD SME Enabled: 3837.3 (SE +/- 1.04, N = 3)
No SME: 3825.0 (SE +/- 14.95, N = 15)
1. (CC) gcc options: -O3 -pthread -lz -llzma
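
The paired results in this export can be reduced to a relative SME cost per test. The following is a minimal sketch, not part of the Phoronix Test Suite: the helper names `sme_cost_percent` and `exceeds_noise` are hypothetical, the four sample rows are copied from the results in this section (OSPRay Studio, Xmrig Monero, Timed LLVM Compilation with Ninja, QuantLib), and the threshold of two combined standard errors is an assumed rule of thumb rather than anything the export itself applies.

```python
from math import sqrt

# Each row: (label, higher_is_better, (SME mean, SME SE), (no-SME mean, no-SME SE)).
# Values are copied from the results in this export.
RESULTS = [
    ("OSPRay Studio 4K path tracer (ms)", False, (22614, 32.58), (22043, 6.36)),
    ("Xmrig Monero 1M (H/s)", True, (101932.1, 540.08), (105141.7, 111.86)),
    ("Timed LLVM Compilation, Ninja (s)", False, (76.63, 0.35), (75.33, 0.38)),
    ("QuantLib (MFLOPS)", True, (3061.3, 8.14), (3052.8, 6.39)),
]


def sme_cost_percent(sme, base, higher_is_better):
    """Percent by which SME is worse than the no-SME baseline.

    A positive value means SME was slower; a negative value means it was faster.
    """
    if higher_is_better:
        return (base - sme) / base * 100.0
    return (sme - base) / base * 100.0


def exceeds_noise(sme, sme_se, base, base_se, k=2.0):
    """Assumed rule of thumb: is the gap larger than k combined standard errors?"""
    return abs(sme - base) > k * sqrt(sme_se ** 2 + base_se ** 2)


for label, higher_is_better, (sme, sme_se), (base, base_se) in RESULTS:
    cost = sme_cost_percent(sme, base, higher_is_better)
    flag = "above noise" if exceeds_noise(sme, sme_se, base, base_se) else "within noise"
    print(f"{label}: SME cost {cost:+.2f}% ({flag})")
```

With N = 3 for most tests, the noise check is only a coarse filter; it is meant to separate results such as OSPRay Studio, where the gap is many standard errors wide, from results such as QuantLib, where the two configurations are indistinguishable.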


Phoronix Test Suite v10.8.5