AMD SME Benchmark Genoa

4th Gen AMD EPYC "Genoa" Secure Memory Encryption (SME) benchmarks by Michael Larabel for a future article.

HTML result view exported from: https://openbenchmarking.org/result/2212212-NE-AMDSMEBEN19&rdt&grr.

AMD SME Benchmark Genoa: System Details

Configurations: AMD SME Enabled, No SME

Processor: 2 x AMD EPYC 9654 96-Core @ 2.40GHz (192 Cores / 384 Threads)
Motherboard: AMD Titanite_4G (RTI1002E BIOS)
Chipset: AMD Device 14a4
Memory: 1520GB
Disk: 800GB INTEL SSDPF21Q800GB
Graphics: ASPEED
Monitor: VGA HDMI
Network: Broadcom NetXtreme BCM5720 PCIe
OS: Ubuntu 22.10
Kernel: 6.1.0-phx (x86_64)
Desktop: GNOME Shell 43.0
Display Server: X Server 1.21.1.4
Vulkan: 1.3.224
Compiler: GCC 12.2.0 + Clang 15.0.2-1
File-System: ext4
Screen Resolution: 1920x1080

Kernel Details: Transparent Huge Pages: madvise

Compiler Details: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-12-U8K4Qv/gcc-12-12.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-12-U8K4Qv/gcc-12-12.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v

Processor Details: Scaling Governor: acpi-cpufreq performance (Boost: Enabled) - CPU Microcode: 0xa10110d

Java Details: OpenJDK Runtime Environment (build 11.0.17+8-post-Ubuntu-1ubuntu2)

Python Details: Python 3.10.7

Security Details: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: always-on RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected


WRF

Input: conus 2.5km

WRF 4.2.2 (Seconds, Fewer Is Better)
AMD SME Enabled: 4116.62
No SME: 4077.19
1. (F9X) gfortran options: -O2 -ftree-vectorize -funroll-loops -ffree-form -fconvert=big-endian -frecord-marker=4 -fallow-invalid-boz -lesmf_time -lwrfio_nf -lnetcdff -lnetcdf -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
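
To read results like the WRF one above as a relative cost of enabling SME, the two values can be compared as a percentage. The sketch below is only an illustration: the function names are hypothetical and not part of the Phoronix Test Suite or OpenBenchmarking.org, and the sign convention (positive means SME was slower) is an assumption made here.

```python
def relative_change_percent(sme_enabled: float, no_sme: float) -> float:
    """Percent change of the SME-enabled result relative to the No SME result."""
    return (sme_enabled - no_sme) / no_sme * 100.0


def sme_cost_percent(sme_enabled: float, no_sme: float, fewer_is_better: bool) -> float:
    """Positive when SME Enabled performed worse, whichever direction the metric runs.

    Assumption: for "Fewer Is Better" metrics a larger value is worse,
    and for "More Is Better" metrics a smaller value is worse.
    """
    change = relative_change_percent(sme_enabled, no_sme)
    return change if fewer_is_better else -change


# WRF 4.2.2, conus 2.5km (Seconds, Fewer Is Better)
wrf_cost = sme_cost_percent(4116.62, 4077.19, fewer_is_better=True)
print(f"WRF conus 2.5km: SME cost {wrf_cost:.2f}%")
```

With the WRF numbers this works out to just under 1%. The same helper applies to "More Is Better" results by passing fewer_is_better=False.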

NWChem

Input: C240 Buckyball

NWChem 7.0.2 (Seconds, Fewer Is Better)
AMD SME Enabled: 1543.1
No SME: 1524.4
1. (F9X) gfortran options: -lnwctask -lccsd -lmcscf -lselci -lmp2 -lmoints -lstepper -ldriver -loptim -lnwdft -lgradients -lcphf -lesp -lddscf -ldangchang -lguess -lhessian -lvib -lnwcutil -lrimp2 -lproperty -lsolvation -lnwints -lprepar -lnwmd -lnwpw -lofpw -lpaw -lpspw -lband -lnwpwlib -lcafe -lspace -lanalyze -lqhop -lpfft -ldplot -ldrdy -lvscf -lqmmm -lqmd -letrans -ltce -lbq -lmm -lcons -lperfm -ldntmc -lccca -ldimqm -lga -larmci -lpeigs -l64to32 -lopenblas -lpthread -lrt -llapack -lnwcblas -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz -lcomex -m64 -ffast-math -std=legacy -fdefault-integer-8 -finline-functions -O2

OpenVKL

Benchmark: vklBenchmark ISPC

OpenVKL 1.3.1 (Items / Sec, More Is Better)
AMD SME Enabled: 1286 (SE +/- 15.55, N = 4; MIN: 328 / MAX: 5485)
No SME: 1322 (SE +/- 6.81, N = 3; MIN: 329 / MAX: 4485)
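
Where a result reports SE and N, as the OpenVKL one above does, a rough way to judge whether the gap between the two configurations stands out from run-to-run noise is to express it in units of the combined standard error. This is a minimal sketch under stated assumptions: the two configurations are treated as independent, the reported SE is taken to be the standard error of the mean, and the function name is hypothetical. With N of 3 or 4 it is a heuristic, not a formal significance test.

```python
import math


def difference_in_standard_errors(a: float, se_a: float, b: float, se_b: float) -> float:
    """Return (b - a) divided by the combined standard error of the two means.

    Assumption: the two measurements are independent, so their standard
    errors combine as sqrt(se_a**2 + se_b**2).
    """
    combined_se = math.hypot(se_a, se_b)
    return (b - a) / combined_se


# OpenVKL 1.3.1, vklBenchmark ISPC (Items / Sec, More Is Better)
gap = difference_in_standard_errors(1286, 15.55, 1322, 6.81)
print(f"No SME minus SME Enabled: {gap:.2f} combined standard errors")
```

A gap of several combined standard errors suggests a real difference; a gap near or below one is hard to separate from noise.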

OSPRay

Benchmark: particle_volume/pathtracer/real_time

OSPRay 2.10 (Items Per Second, More Is Better)
AMD SME Enabled: 229.88 (SE +/- 1.51, N = 3)
No SME: 230.62 (SE +/- 1.25, N = 3)

Renaissance

Test: In-Memory Database Shootout

Renaissance 0.14 (ms, Fewer Is Better)
AMD SME Enabled: 4838.5 (SE +/- 69.41, N = 3; MIN: 4339.45 / MAX: 6109.38)
No SME: 4764.6 (SE +/- 54.74, N = 12; MIN: 4124.15 / MAX: 6577.01)

RELION

Test: Basic - Device: CPU

RELION 3.1.1 (Seconds, Fewer Is Better)
AMD SME Enabled: 130.43 (SE +/- 1.42, N = 5)
No SME: 128.66 (SE +/- 1.40, N = 5)
1. (CXX) g++ options: -fopenmp -std=c++0x -O3 -rdynamic -ldl -ltiff -lfftw3f -lfftw3 -lpng -lmpi_cxx -lmpi

High Performance Conjugate Gradient

High Performance Conjugate Gradient 3.1 (GFLOP/s, More Is Better)
AMD SME Enabled: 87.15 (SE +/- 0.01, N = 3)
No SME: 88.39 (SE +/- 0.10, N = 3)
1. (CXX) g++ options: -O3 -ffast-math -ftree-vectorize -lmpi_cxx -lmpi

oneDNN

Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU

oneDNN 3.0 (ms, Fewer Is Better)
AMD SME Enabled: 2002.43 (SE +/- 18.68, N = 6; MIN: 1924.96)
No SME: 2011.15 (SE +/- 18.08, N = 7; MIN: 1936.22)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

Renaissance

Test: Finagle HTTP Requests

Renaissance 0.14 (ms, Fewer Is Better)
AMD SME Enabled: 12347.5 (SE +/- 95.54, N = 3; MIN: 11146.33 / MAX: 12514.13)
No SME: 12286.3 (SE +/- 88.33, N = 3; MIN: 11326.41 / MAX: 12632.65)

Timed LLVM Compilation

Build System: Unix Makefiles

Timed LLVM Compilation 13.0 (Seconds, Fewer Is Better)
AMD SME Enabled: 162.63 (SE +/- 0.05, N = 3)
No SME: 160.13 (SE +/- 0.17, N = 3)

Timed Linux Kernel Compilation

Build: allmodconfig

Timed Linux Kernel Compilation 6.1 (Seconds, Fewer Is Better)
AMD SME Enabled: 148.44 (SE +/- 0.71, N = 3)
No SME: 146.33 (SE +/- 1.13, N = 3)

Timed Gem5 Compilation

Time To Compile

Timed Gem5 Compilation 21.2 (Seconds, Fewer Is Better)
AMD SME Enabled: 142.18 (SE +/- 1.00, N = 3)
No SME: 138.64 (SE +/- 1.59, N = 3)

PostgreSQL

Scaling Factor: 100 - Clients: 250 - Mode: Read Only

PostgreSQL 15 (TPS, More Is Better)
AMD SME Enabled: 2970869 (SE +/- 40566.19, N = 3)
No SME: 2951147 (SE +/- 16891.69, N = 3)
1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lm

Graph500

Scale: 26

Graph500 3.0 (sssp max_TEPS, More Is Better)
AMD SME Enabled: 835467000
No SME: 838505000
1. (CC) gcc options: -fcommon -O3 -lpthread -lm -lmpi

Graph500

Scale: 26

Graph500 3.0 (sssp median_TEPS, More Is Better)
AMD SME Enabled: 572510000
No SME: 593153000
1. (CC) gcc options: -fcommon -O3 -lpthread -lm -lmpi

Graph500

Scale: 26

Graph500 3.0 (bfs max_TEPS, More Is Better)
AMD SME Enabled: 1526380000
No SME: 1533180000
1. (CC) gcc options: -fcommon -O3 -lpthread -lm -lmpi

Graph500

Scale: 26

Graph500 3.0 (bfs median_TEPS, More Is Better)
AMD SME Enabled: 1358510000
No SME: 1426480000
1. (CC) gcc options: -fcommon -O3 -lpthread -lm -lmpi

Zstd Compression

Compression Level: 19, Long Mode - Decompression Speed

Zstd Compression 1.5.0 (MB/s, More Is Better)
AMD SME Enabled: 3837.3 (SE +/- 1.04, N = 3)
No SME: 3825.0 (SE +/- 14.95, N = 15)
1. (CC) gcc options: -O3 -pthread -lz -llzma

Zstd Compression

Compression Level: 19, Long Mode - Compression Speed

Zstd Compression 1.5.0 (MB/s, More Is Better)
AMD SME Enabled: 49.9 (SE +/- 0.70, N = 3)
No SME: 52.9 (SE +/- 1.03, N = 15)
1. (CC) gcc options: -O3 -pthread -lz -llzma

ONNX Runtime

Model: super-resolution-10 - Device: CPU - Executor: Standard

ONNX Runtime 1.11 (Inferences Per Minute, More Is Better)
AMD SME Enabled: 5583 (SE +/- 40.49, N = 3)
No SME: 5600 (SE +/- 15.47, N = 3)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

OSPRay

Benchmark: gravity_spheres_volume/dim_512/ao/real_time

OSPRay 2.10 (Items Per Second, More Is Better)
AMD SME Enabled: 43.38 (SE +/- 0.14, N = 3)
No SME: 43.85 (SE +/- 0.04, N = 3)

OSPRay Studio

Camera: 3 - Resolution: 4K - Samples Per Pixel: 32 - Renderer: Path Tracer

OSPRay Studio 0.11 (ms, Fewer Is Better)
AMD SME Enabled: 22614 (SE +/- 32.58, N = 3)
No SME: 22043 (SE +/- 6.36, N = 3)
1. (CXX) g++ options: -O3 -ldl

Appleseed

Scene: Emily

Appleseed 2.0 Beta (Seconds, Fewer Is Better)
AMD SME Enabled: 150.92
No SME: 142.95

OpenRadioss

Model: INIVOL and Fluid Structure Interaction Drop Container

OpenRadioss 2022.10.13 (Seconds, Fewer Is Better)
AMD SME Enabled: 80.90 (SE +/- 0.15, N = 3)
No SME: 80.88 (SE +/- 0.09, N = 3)

nginx

Connections: 500

nginx 1.23.2 (Requests Per Second, More Is Better)
AMD SME Enabled: 196386.41 (SE +/- 124.29, N = 3)
No SME: 201056.69 (SE +/- 238.08, N = 3)
1. (CC) gcc options: -lluajit-5.1 -lm -lssl -lcrypto -lpthread -ldl -std=c99 -O2

PyHPC Benchmarks

Device: CPU - Backend: Numpy - Project Size: 4194304 - Benchmark: Isoneutral Mixing

PyHPC Benchmarks 3.0 (Seconds, Fewer Is Better)
AMD SME Enabled: 1.744 (SE +/- 0.005, N = 3)
No SME: 1.691 (SE +/- 0.006, N = 3)

AOM AV1

Encoder Mode: Speed 10 Realtime - Input: Bosphorus 4K

AOM AV1 3.5 (Frames Per Second, More Is Better)
AMD SME Enabled: 33.12 (SE +/- 0.56, N = 12)
No SME: 34.47 (SE +/- 0.53, N = 15)
1. (CXX) g++ options: -O3 -std=c++11 -U_FORTIFY_SOURCE -lm

OpenRadioss

Model: Bumper Beam

OpenRadioss 2022.10.13 (Seconds, Fewer Is Better)
AMD SME Enabled: 79.97 (SE +/- 0.15, N = 3)
No SME: 79.85 (SE +/- 0.73, N = 3)

Blender

Blend File: Barbershop - Compute: CPU-Only

Blender 3.4 (Seconds, Fewer Is Better)
AMD SME Enabled: 81.57 (SE +/- 0.45, N = 3)
No SME: 80.77 (SE +/- 0.30, N = 3)

TensorFlow

Device: CPU - Batch Size: 64 - Model: AlexNet

TensorFlow 2.10 (images/sec, More Is Better)
AMD SME Enabled: 505.26 (SE +/- 7.26, N = 15)
No SME: 508.40 (SE +/- 6.01, N = 15)

Timed LLVM Compilation

Build System: Ninja

Timed LLVM Compilation 13.0 (Seconds, Fewer Is Better)
AMD SME Enabled: 76.63 (SE +/- 0.35, N = 3)
No SME: 75.33 (SE +/- 0.38, N = 3)

OpenVINO

Model: Age Gender Recognition Retail 0013 FP16 - Device: CPU

OpenVINO 2022.2.dev (ms, Fewer Is Better)
AMD SME Enabled: 0.55 (SE +/- 0.00, N = 4; MIN: 0.5 / MAX: 36.47)
No SME: 0.55 (SE +/- 0.00, N = 3; MIN: 0.5 / MAX: 30.13)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared

OpenVINO

Model: Age Gender Recognition Retail 0013 FP16 - Device: CPU

OpenVINO 2022.2.dev (FPS, More Is Better)
AMD SME Enabled: 148736.04 (SE +/- 1824.08, N = 4)
No SME: 150792.42 (SE +/- 878.85, N = 3)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared

NAMD

ATPase Simulation - 327,506 Atoms

NAMD 2.14 (days/ns, Fewer Is Better)
AMD SME Enabled: 0.12991 (SE +/- 0.00010, N = 3)
No SME: 0.12831 (SE +/- 0.00031, N = 3)

OpenVINO

Model: Person Detection FP16 - Device: CPU

OpenVINO 2022.2.dev (ms, Fewer Is Better)
AMD SME Enabled: 1127.51 (SE +/- 5.54, N = 3; MIN: 802.53 / MAX: 1835.35)
No SME: 1102.15 (SE +/- 3.23, N = 3; MIN: 799.38 / MAX: 1782.76)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared

OpenVINO

Model: Person Detection FP16 - Device: CPU

OpenVINO 2022.2.dev (FPS, More Is Better)
AMD SME Enabled: 42.33 (SE +/- 0.21, N = 3)
No SME: 43.29 (SE +/- 0.12, N = 3)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared

OpenVINO

Model: Person Detection FP32 - Device: CPU

OpenVINO 2022.2.dev (ms, Fewer Is Better)
AMD SME Enabled: 1134.68 (SE +/- 0.32, N = 3; MIN: 842.92 / MAX: 1806.88)
No SME: 1115.56 (SE +/- 5.83, N = 3; MIN: 773.47 / MAX: 1806.09)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared

OpenVINO

Model: Person Detection FP32 - Device: CPU

OpenVINO 2022.2.dev (FPS, More Is Better)
AMD SME Enabled: 42.03 (SE +/- 0.02, N = 3)
No SME: 42.76 (SE +/- 0.23, N = 3)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared

OpenVINO

Model: Face Detection FP16 - Device: CPU

OpenVINO 2022.2.dev (ms, Fewer Is Better)
AMD SME Enabled: 471.53 (SE +/- 0.71, N = 3; MIN: 415.73 / MAX: 547.47)
No SME: 469.81 (SE +/- 0.87, N = 3; MIN: 402.31 / MAX: 539.43)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared

OpenVINO

Model: Face Detection FP16 - Device: CPU

OpenVINO 2022.2.dev (FPS, More Is Better)
AMD SME Enabled: 101.55 (SE +/- 0.21, N = 3)
No SME: 101.90 (SE +/- 0.24, N = 3)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared

OpenVINO

Model: Face Detection FP16-INT8 - Device: CPU

OpenVINO 2022.2.dev (ms, Fewer Is Better)
AMD SME Enabled: 247.23 (SE +/- 0.05, N = 3; MIN: 208.68 / MAX: 293.42)
No SME: 246.95 (SE +/- 0.06, N = 3; MIN: 205.26 / MAX: 303.51)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared

OpenVINO

Model: Face Detection FP16-INT8 - Device: CPU

OpenVINO 2022.2.dev (FPS, More Is Better)
AMD SME Enabled: 193.83 (SE +/- 0.05, N = 3)
No SME: 193.95 (SE +/- 0.05, N = 3)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared

Timed Linux Kernel Compilation

Build: defconfig

Timed Linux Kernel Compilation 6.1 (Seconds, Fewer Is Better)
AMD SME Enabled: 25.30 (SE +/- 0.23, N = 7)
No SME: 25.71 (SE +/- 0.22, N = 8)

OpenVINO

Model: Person Vehicle Bike Detection FP16 - Device: CPU

OpenVINO 2022.2.dev (ms, Fewer Is Better)
AMD SME Enabled: 5.33 (SE +/- 0.00, N = 3; MIN: 4.45 / MAX: 44.32)
No SME: 5.31 (SE +/- 0.01, N = 3; MIN: 4.34 / MAX: 44.16)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared

OpenVINO

Model: Person Vehicle Bike Detection FP16 - Device: CPU

OpenVINO 2022.2.dev (FPS, More Is Better)
AMD SME Enabled: 8993.90 (SE +/- 6.93, N = 3)
No SME: 9027.72 (SE +/- 7.89, N = 3)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared

OpenVINO

Model: Machine Translation EN To DE FP16 - Device: CPU

OpenVINO 2022.2.dev (ms, Fewer Is Better)
AMD SME Enabled: 49.78 (SE +/- 0.08, N = 3; MIN: 38.76 / MAX: 189.77)
No SME: 49.54 (SE +/- 0.08, N = 3; MIN: 37.27 / MAX: 225.52)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared

OpenVINO

Model: Machine Translation EN To DE FP16 - Device: CPU

OpenVINO 2022.2.dev (FPS, More Is Better)
AMD SME Enabled: 963.14 (SE +/- 1.50, N = 3)
No SME: 967.90 (SE +/- 1.57, N = 3)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared

OpenVINO

Model: Age Gender Recognition Retail 0013 FP16-INT8 - Device: CPU

OpenVINO 2022.2.dev (ms, Fewer Is Better)
AMD SME Enabled: 0.36 (SE +/- 0.00, N = 3; MIN: 0.34 / MAX: 47.65)
No SME: 0.36 (SE +/- 0.00, N = 3; MIN: 0.34 / MAX: 40.99)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared

OpenVINO

Model: Age Gender Recognition Retail 0013 FP16-INT8 - Device: CPU

OpenVINO 2022.2.dev (FPS, More Is Better)
AMD SME Enabled: 167545.54 (SE +/- 702.31, N = 3)
No SME: 165194.42 (SE +/- 2225.36, N = 3)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared

OpenVINO

Model: Weld Porosity Detection FP16-INT8 - Device: CPU

OpenVINO 2022.2.dev (ms, Fewer Is Better)
AMD SME Enabled: 9.67 (SE +/- 0.00, N = 3; MIN: 8.32 / MAX: 78.9)
No SME: 9.62 (SE +/- 0.01, N = 3; MIN: 8.26 / MAX: 57.97)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared

OpenVINO

Model: Weld Porosity Detection FP16-INT8 - Device: CPU

OpenVINO 2022.2.dev (FPS, More Is Better)
AMD SME Enabled: 19704.84 (SE +/- 4.31, N = 3)
No SME: 19801.40 (SE +/- 18.52, N = 3)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared

OpenVINO

Model: Vehicle Detection FP16 - Device: CPU

OpenVINO 2022.2.dev (ms, Fewer Is Better)
AMD SME Enabled: 6.59 (SE +/- 0.01, N = 3; MIN: 5 / MAX: 63.2)
No SME: 6.44 (SE +/- 0.00, N = 3; MIN: 5.05 / MAX: 61.63)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared

OpenVINO

Model: Vehicle Detection FP16 - Device: CPU

OpenVINO 2022.2.dev (FPS, More Is Better)
AMD SME Enabled: 7274.98 (SE +/- 4.95, N = 3)
No SME: 7437.73 (SE +/- 3.74, N = 3)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared

OpenVINO

Model: Vehicle Detection FP16-INT8 - Device: CPU

OpenVINO 2022.2.dev (ms, Fewer Is Better)
AMD SME Enabled: 4.28 (SE +/- 0.00, N = 3; MIN: 3.5 / MAX: 42.32)
No SME: 4.28 (SE +/- 0.00, N = 3; MIN: 3.51 / MAX: 38.77)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared

OpenVINO

Model: Vehicle Detection FP16-INT8 - Device: CPU

OpenVINO 2022.2.dev (FPS, More Is Better)
AMD SME Enabled: 11184.76 (SE +/- 4.87, N = 3)
No SME: 11180.63 (SE +/- 2.65, N = 3)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared

OpenVINO

Model: Weld Porosity Detection FP16 - Device: CPU

OpenVINO 2022.2.dev (ms, Fewer Is Better)
AMD SME Enabled: 4.79 (SE +/- 0.01, N = 3; MIN: 3.95 / MAX: 30.1)
No SME: 4.79 (SE +/- 0.00, N = 3; MIN: 3.96 / MAX: 30.92)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared

OpenVINO

Model: Weld Porosity Detection FP16 - Device: CPU

OpenVINO 2022.2.dev (FPS, More Is Better)
AMD SME Enabled: 9990.19 (SE +/- 13.80, N = 3)
No SME: 9997.68 (SE +/- 3.98, N = 3)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared

Neural Magic DeepSparse

Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.1 (ms/batch, Fewer Is Better)
AMD SME Enabled: 1143.01 (SE +/- 0.38, N = 3)
No SME: 1133.31 (SE +/- 1.72, N = 3)

Neural Magic DeepSparse

Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.1 (items/sec, More Is Better)
AMD SME Enabled: 83.82 (SE +/- 0.05, N = 3)
No SME: 84.28 (SE +/- 0.09, N = 3)

Neural Magic DeepSparse

Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.1 (ms/batch, Fewer Is Better)
AMD SME Enabled: 1143.04 (SE +/- 0.31, N = 3)
No SME: 1134.42 (SE +/- 0.43, N = 3)

Neural Magic DeepSparse

Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.1 (items/sec, More Is Better)
AMD SME Enabled: 83.64 (SE +/- 0.11, N = 3)
No SME: 84.25 (SE +/- 0.10, N = 3)

Neural Magic DeepSparse

Model: NLP Text Classification, BERT base uncased SST2 - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.1 (ms/batch, Fewer Is Better)
AMD SME Enabled: 158.94 (SE +/- 0.02, N = 3)
No SME: 155.10 (SE +/- 0.23, N = 3)

Neural Magic DeepSparse

Model: NLP Text Classification, BERT base uncased SST2 - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.1 (items/sec, More Is Better)
AMD SME Enabled: 601.93 (SE +/- 0.11, N = 3)
No SME: 617.03 (SE +/- 0.96, N = 3)

DaCapo Benchmark

Java Test: H2

DaCapo Benchmark 9.12-MR1 (msec, Fewer Is Better)
AMD SME Enabled: 5050 (SE +/- 50.10, N = 20)
No SME: 4807 (SE +/- 54.36, N = 20)

Neural Magic DeepSparse

Model: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.1 (ms/batch, Fewer Is Better)
AMD SME Enabled: 128.40 (SE +/- 0.14, N = 3)
No SME: 125.53 (SE +/- 0.08, N = 3)

Neural Magic DeepSparse

Model: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.1 (items/sec, More Is Better)
AMD SME Enabled: 745.64 (SE +/- 0.69, N = 3)
No SME: 762.70 (SE +/- 0.54, N = 3)

Neural Magic DeepSparse

Model: NLP Text Classification, DistilBERT mnli - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.1 (ms/batch, Fewer Is Better)
AMD SME Enabled: 81.23 (SE +/- 0.07, N = 3)
No SME: 79.49 (SE +/- 0.15, N = 3)

Neural Magic DeepSparse

Model: NLP Text Classification, DistilBERT mnli - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.1 - items/sec, More Is Better
AMD SME Enabled: 1179.00 (SE +/- 1.22, N = 3)
No SME: 1204.85 (SE +/- 2.65, N = 3)

Neural Magic DeepSparse

Model: CV Classification, ResNet-50 ImageNet - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.1 - ms/batch, Fewer Is Better
AMD SME Enabled: 50.11 (SE +/- 0.13, N = 3)
No SME: 48.83 (SE +/- 0.05, N = 3)

Neural Magic DeepSparse

Model: CV Classification, ResNet-50 ImageNet - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.1 - items/sec, More Is Better
AMD SME Enabled: 1911.40 (SE +/- 4.55, N = 3)
No SME: 1962.10 (SE +/- 1.94, N = 3)

7-Zip Compression

Test: Decompression Rating

7-Zip Compression 22.01 - MIPS, More Is Better
AMD SME Enabled: 1169038 (SE +/- 7858.68, N = 3)
No SME: 1160632 (SE +/- 10921.46, N = 3)
1. (CXX) g++ options: -lpthread -ldl -O2 -fPIC

7-Zip Compression

Test: Compression Rating

7-Zip Compression 22.01 - MIPS, More Is Better
AMD SME Enabled: 885135 (SE +/- 6113.05, N = 3)
No SME: 917782 (SE +/- 9930.11, N = 3)
1. (CXX) g++ options: -lpthread -ldl -O2 -fPIC

Neural Magic DeepSparse

Model: CV Detection,YOLOv5s COCO - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.1 - ms/batch, Fewer Is Better
AMD SME Enabled: 113.83 (SE +/- 0.01, N = 3)
No SME: 111.46 (SE +/- 0.16, N = 3)

Neural Magic DeepSparse

Model: CV Detection,YOLOv5s COCO - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.1 - items/sec, More Is Better
AMD SME Enabled: 840.94 (SE +/- 0.37, N = 3)
No SME: 858.47 (SE +/- 1.61, N = 3)

libavif avifenc

Encoder Speed: 2

libavif avifenc 0.11 - Seconds, Fewer Is Better
AMD SME Enabled: 35.26 (SE +/- 0.42, N = 4)
No SME: 34.69 (SE +/- 0.03, N = 3)
1. (CXX) g++ options: -O3 -fPIC -lm

ASKAP

Test: tConvolve MPI - Gridding

ASKAP 1.0 - Mpix/sec, More Is Better
AMD SME Enabled: 89541.8 (SE +/- 422.37, N = 3)
No SME: 93071.0 (SE +/- 460.77, N = 3)
1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp

ASKAP

Test: tConvolve MPI - Degridding

ASKAP 1.0 - Mpix/sec, More Is Better
AMD SME Enabled: 78718.7 (SE +/- 0.00, N = 3)
No SME: 83598.3 (SE +/- 368.27, N = 3)
1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp

srsRAN

Test: 4G PHY_DL_Test 100 PRB MIMO 256-QAM

srsRAN 22.04.1 - UE Mb/s, More Is Better
AMD SME Enabled: 165.7 (SE +/- 0.41, N = 3)
No SME: 166.0 (SE +/- 0.24, N = 3)
1. (CXX) g++ options: -std=c++14 -fno-strict-aliasing -march=native -mfpmath=sse -mavx2 -fvisibility=hidden -O3 -fno-trapping-math -fno-math-errno -mavx512f -mavx512cd -mavx512bw -mavx512dq -ldl -lpthread -lm

srsRAN

Test: 4G PHY_DL_Test 100 PRB MIMO 256-QAM

srsRAN 22.04.1 - eNb Mb/s, More Is Better
AMD SME Enabled: 444.8 (SE +/- 1.01, N = 3)
No SME: 445.2 (SE +/- 1.49, N = 3)
1. (CXX) g++ options: -std=c++14 -fno-strict-aliasing -march=native -mfpmath=sse -mavx2 -fvisibility=hidden -O3 -fno-trapping-math -fno-math-errno -mavx512f -mavx512cd -mavx512bw -mavx512dq -ldl -lpthread -lm

Timed Godot Game Engine Compilation

Time To Compile

Timed Godot Game Engine Compilation 3.2.3 - Seconds, Fewer Is Better
AMD SME Enabled: 35.04 (SE +/- 0.36, N = 3)
No SME: 34.14 (SE +/- 0.48, N = 3)

OpenRadioss

Model: Cell Phone Drop Test

OpenRadioss 2022.10.13 - Seconds, Fewer Is Better
AMD SME Enabled: 18.32 (SE +/- 0.13, N = 3)
No SME: 18.45 (SE +/- 0.02, N = 3)

PyHPC Benchmarks

Device: CPU - Backend: Numpy - Project Size: 4194304 - Benchmark: Equation of State

PyHPC Benchmarks 3.0 - Seconds, Fewer Is Better
AMD SME Enabled: 0.934 (SE +/- 0.003, N = 3)
No SME: 0.884 (SE +/- 0.002, N = 3)

x265

Video Input: Bosphorus 4K

x265 3.4 - Frames Per Second, More Is Better
AMD SME Enabled: 23.29 (SE +/- 0.17, N = 3)
No SME: 23.48 (SE +/- 0.29, N = 4)
1. (CXX) g++ options: -O3 -rdynamic -lpthread -lrt -ldl -lnuma

srsRAN

Test: 4G PHY_DL_Test 100 PRB MIMO 64-QAM

srsRAN 22.04.1 - UE Mb/s, More Is Better
AMD SME Enabled: 157.7 (SE +/- 0.31, N = 3)
No SME: 157.8 (SE +/- 0.25, N = 3)
1. (CXX) g++ options: -std=c++14 -fno-strict-aliasing -march=native -mfpmath=sse -mavx2 -fvisibility=hidden -O3 -fno-trapping-math -fno-math-errno -mavx512f -mavx512cd -mavx512bw -mavx512dq -ldl -lpthread -lm

srsRAN

Test: 4G PHY_DL_Test 100 PRB MIMO 64-QAM

srsRAN 22.04.1 - eNb Mb/s, More Is Better
AMD SME Enabled: 408.5 (SE +/- 3.12, N = 3)
No SME: 415.1 (SE +/- 0.49, N = 3)
1. (CXX) g++ options: -std=c++14 -fno-strict-aliasing -march=native -mfpmath=sse -mavx2 -fvisibility=hidden -O3 -fno-trapping-math -fno-math-errno -mavx512f -mavx512cd -mavx512bw -mavx512dq -ldl -lpthread -lm

srsRAN

Test: OFDM_Test

srsRAN 22.04.1 - Samples / Second, More Is Better
AMD SME Enabled: 162633333 (SE +/- 883804.91, N = 3)
No SME: 161733333 (SE +/- 600925.21, N = 3)
1. (CXX) g++ options: -std=c++14 -fno-strict-aliasing -march=native -mfpmath=sse -mavx2 -fvisibility=hidden -O3 -fno-trapping-math -fno-math-errno -mavx512f -mavx512cd -mavx512bw -mavx512dq -ldl -lpthread -lm

oneDNN

Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU

oneDNN 3.0 - ms, Fewer Is Better
AMD SME Enabled: 0.918482 (SE +/- 0.004429, N = 3; MIN: 0.78)
No SME: 0.916133 (SE +/- 0.009364, N = 5; MIN: 0.76)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

oneDNN

Harness: IP Shapes 3D - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 3.0 - ms, Fewer Is Better
AMD SME Enabled: 3.92313 (SE +/- 0.05423, N = 3; MIN: 2.89)
No SME: 3.89299 (SE +/- 0.06074, N = 15; MIN: 2.77)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

QuantLib

QuantLib 1.21 - MFLOPS, More Is Better
AMD SME Enabled: 3061.3 (SE +/- 8.14, N = 3)
No SME: 3052.8 (SE +/- 6.39, N = 3)
1. (CXX) g++ options: -O3 -march=native -rdynamic

GPAW

Input: Carbon Nanotube

GPAW 22.1 - Seconds, Fewer Is Better
AMD SME Enabled: 22.98 (SE +/- 0.05, N = 3)
No SME: 23.17 (SE +/- 0.16, N = 3)
1. (CC) gcc options: -shared -fwrapv -O2 -lxc -lblas -lmpi

GROMACS

Implementation: MPI CPU - Input: water_GMX50_bare

GROMACS 2022.1 - Ns Per Day, More Is Better
AMD SME Enabled: 18.62 (SE +/- 0.03, N = 3)
No SME: 18.71 (SE +/- 0.03, N = 3)
1. (CXX) g++ options: -O3

Blender

Blend File: Classroom - Compute: CPU-Only

Blender 3.4 - Seconds, Fewer Is Better
AMD SME Enabled: 20.95 (SE +/- 0.02, N = 3)
No SME: 20.99 (SE +/- 0.08, N = 3)

oneDNN

Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU

oneDNN 3.0 - ms, Fewer Is Better
AMD SME Enabled: 23.14 (SE +/- 0.20, N = 3; MIN: 20.17)
No SME: 22.68 (SE +/- 0.08, N = 3; MIN: 19.97)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

Liquid-DSP

Threads: 384 - Buffer Length: 256 - Filter Length: 57

Liquid-DSP 2021.01.31 - samples/s, More Is Better
AMD SME Enabled: 10350000000 (SE +/- 3605551.28, N = 3)
No SME: 10346000000 (SE +/- 3785938.90, N = 3)
1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid

Liquid-DSP

Threads: 256 - Buffer Length: 256 - Filter Length: 57

Liquid-DSP 2021.01.31 - samples/s, More Is Better
AMD SME Enabled: 10344666667 (SE +/- 5206833.12, N = 3)
No SME: 10332000000 (SE +/- 8082903.77, N = 3)
1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid

SVT-AV1

Encoder Mode: Preset 13 - Input: Bosphorus 4K

SVT-AV1 1.4 - Frames Per Second, More Is Better
AMD SME Enabled: 251.44 (SE +/- 4.08, N = 15)
No SME: 248.33 (SE +/- 6.22, N = 15)

KTX-Software toktx

Settings: Zstd Compression 19

KTX-Software toktx 4.0 - Seconds, Fewer Is Better
AMD SME Enabled: 19.88 (SE +/- 0.02, N = 3)
No SME: 18.86 (SE +/- 0.08, N = 3)

Intel Open Image Denoise

Run: RTLightmap.hdr.4096x4096

Intel Open Image Denoise 1.4.0 - Images / Sec, More Is Better
AMD SME Enabled: 1.66 (SE +/- 0.00, N = 3)
No SME: 1.66 (SE +/- 0.00, N = 3)

srsRAN

Test: 4G PHY_DL_Test 100 PRB SISO 256-QAM

srsRAN 22.04.1 - UE Mb/s, More Is Better
AMD SME Enabled: 172.2 (SE +/- 0.09, N = 3)
No SME: 172.7 (SE +/- 0.47, N = 3)
1. (CXX) g++ options: -std=c++14 -fno-strict-aliasing -march=native -mfpmath=sse -mavx2 -fvisibility=hidden -O3 -fno-trapping-math -fno-math-errno -mavx512f -mavx512cd -mavx512bw -mavx512dq -ldl -lpthread -lm

srsRAN

Test: 4G PHY_DL_Test 100 PRB SISO 256-QAM

srsRAN 22.04.1 - eNb Mb/s, More Is Better
AMD SME Enabled: 445.7 (SE +/- 0.03, N = 3)
No SME: 444.0 (SE +/- 1.15, N = 3)
1. (CXX) g++ options: -std=c++14 -fno-strict-aliasing -march=native -mfpmath=sse -mavx2 -fvisibility=hidden -O3 -fno-trapping-math -fno-math-errno -mavx512f -mavx512cd -mavx512bw -mavx512dq -ldl -lpthread -lm

OpenFOAM

Input: drivaerFastback, Small Mesh Size - Execution Time

OpenFOAM 10 - Seconds, Fewer Is Better
AMD SME Enabled: 22.13
No SME: 22.08
1. (CXX) g++ options: -std=c++14 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -lfiniteVolume -lmeshTools -lparallel -llagrangian -lregionModels -lgenericPatchFields -lOpenFOAM -ldl -lm

OpenFOAM

Input: drivaerFastback, Small Mesh Size - Mesh Time

OpenFOAM 10 - Seconds, Fewer Is Better
AMD SME Enabled: 27.10
No SME: 25.07
1. (CXX) g++ options: -std=c++14 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -lfiniteVolume -lmeshTools -lparallel -llagrangian -lregionModels -lgenericPatchFields -lOpenFOAM -ldl -lm

Rodinia

Test: OpenMP LavaMD

Rodinia 3.1 - Seconds, Fewer Is Better
AMD SME Enabled: 16.67 (SE +/- 0.05, N = 3)
No SME: 16.51 (SE +/- 0.13, N = 3)
1. (CXX) g++ options: -O2 -lOpenCL

LULESH

LULESH 2.0.3 - z/s, More Is Better
AMD SME Enabled: 57686.09 (SE +/- 360.17, N = 3)
No SME: 59069.41 (SE +/- 197.53, N = 3)
1. (CXX) g++ options: -O3 -fopenmp -lm -lmpi_cxx -lmpi

srsRAN

Test: 5G PHY_DL_NR Test 52 PRB SISO 64-QAM

srsRAN 22.04.1 - UE Mb/s, More Is Better
AMD SME Enabled: 94.9 (SE +/- 0.09, N = 3)
No SME: 94.4 (SE +/- 0.22, N = 3)
1. (CXX) g++ options: -std=c++14 -fno-strict-aliasing -march=native -mfpmath=sse -mavx2 -fvisibility=hidden -O3 -fno-trapping-math -fno-math-errno -mavx512f -mavx512cd -mavx512bw -mavx512dq -ldl -lpthread -lm

srsRAN

Test: 5G PHY_DL_NR Test 52 PRB SISO 64-QAM

srsRAN 22.04.1 - eNb Mb/s, More Is Better
AMD SME Enabled: 139.1 (SE +/- 0.19, N = 3)
No SME: 139.7 (SE +/- 0.32, N = 3)
1. (CXX) g++ options: -std=c++14 -fno-strict-aliasing -march=native -mfpmath=sse -mavx2 -fvisibility=hidden -O3 -fno-trapping-math -fno-math-errno -mavx512f -mavx512cd -mavx512bw -mavx512dq -ldl -lpthread -lm

miniBUDE

Implementation: OpenMP - Input Deck: BM2

miniBUDE 20210901 - Billion Interactions/s, More Is Better
AMD SME Enabled: 343.55 (SE +/- 3.99, N = 3)
No SME: 345.34 (SE +/- 3.20, N = 3)
1. (CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm

miniBUDE

Implementation: OpenMP - Input Deck: BM2

miniBUDE 20210901 - GFInst/s, More Is Better
AMD SME Enabled: 8588.75 (SE +/- 99.74, N = 3)
No SME: 8633.54 (SE +/- 80.07, N = 3)
1. (CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm

srsRAN

Test: 4G PHY_DL_Test 100 PRB SISO 64-QAM

srsRAN 22.04.1 - UE Mb/s, More Is Better
AMD SME Enabled: 165.9 (SE +/- 0.46, N = 3)
No SME: 165.8 (SE +/- 0.84, N = 3)
1. (CXX) g++ options: -std=c++14 -fno-strict-aliasing -march=native -mfpmath=sse -mavx2 -fvisibility=hidden -O3 -fno-trapping-math -fno-math-errno -mavx512f -mavx512cd -mavx512bw -mavx512dq -ldl -lpthread -lm

srsRAN

Test: 4G PHY_DL_Test 100 PRB SISO 64-QAM

srsRAN 22.04.1 - eNb Mb/s, More Is Better
AMD SME Enabled: 415.7 (SE +/- 0.45, N = 3)
No SME: 413.9 (SE +/- 0.64, N = 3)
1. (CXX) g++ options: -std=c++14 -fno-strict-aliasing -march=native -mfpmath=sse -mavx2 -fvisibility=hidden -O3 -fno-trapping-math -fno-math-errno -mavx512f -mavx512cd -mavx512bw -mavx512dq -ldl -lpthread -lm

Xmrig

Variant: Monero - Hash Count: 1M

Xmrig 6.18.1 - H/s, More Is Better
AMD SME Enabled: 101932.1 (SE +/- 540.08, N = 3)
No SME: 105141.7 (SE +/- 111.86, N = 3)
1. (CXX) g++ options: -fexceptions -fno-rtti -maes -O3 -Ofast -static-libgcc -static-libstdc++ -rdynamic -lssl -lcrypto -luv -lpthread -lrt -ldl -lhwloc

Xsbench

Xsbench 2017-07-06 - Lookups/s, More Is Better
AMD SME Enabled: 29021428 (SE +/- 367701.15, N = 15)
No SME: 29806415 (SE +/- 43563.46, N = 3)
1. (CC) gcc options: -std=gnu99 -fopenmp -O3 -lm

Xmrig

Variant: Wownero - Hash Count: 1M

Xmrig 6.18.1 - H/s, More Is Better
AMD SME Enabled: 123484.1 (SE +/- 341.44, N = 3)
No SME: 126508.3 (SE +/- 211.86, N = 3)
1. (CXX) g++ options: -fexceptions -fno-rtti -maes -O3 -Ofast -static-libgcc -static-libstdc++ -rdynamic -lssl -lcrypto -luv -lpthread -lrt -ldl -lhwloc

oneDNN

Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU

oneDNN 3.0 - ms, Fewer Is Better
AMD SME Enabled: 0.850418 (SE +/- 0.004505, N = 3; MIN: 0.74)
No SME: 0.863726 (SE +/- 0.005137, N = 3; MIN: 0.75)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

Kvazaar

Video Input: Bosphorus 4K - Video Preset: Very Fast

Kvazaar 2.1 - Frames Per Second, More Is Better
AMD SME Enabled: 73.32 (SE +/- 0.91, N = 3)
No SME: 74.31 (SE +/- 0.95, N = 3)
1. (CC) gcc options: -pthread -ftree-vectorize -fvisibility=hidden -O2 -lpthread -lm -lrt

Kvazaar

Video Input: Bosphorus 4K - Video Preset: Ultra Fast

Kvazaar 2.1 - Frames Per Second, More Is Better
AMD SME Enabled: 76.22 (SE +/- 0.97, N = 3)
No SME: 77.68 (SE +/- 0.77, N = 3)
1. (CC) gcc options: -pthread -ftree-vectorize -fvisibility=hidden -O2 -lpthread -lm -lrt

ASTC Encoder

Preset: Exhaustive

ASTC Encoder 4.0 - MT/s, More Is Better
AMD SME Enabled: 11.84 (SE +/- 0.01, N = 3)
No SME: 11.82 (SE +/- 0.01, N = 3)
1. (CXX) g++ options: -O3 -flto -pthread

NAS Parallel Benchmarks

Test / Class: BT.C

NAS Parallel Benchmarks 3.4 - Total Mop/s, More Is Better
AMD SME Enabled: 494917.44 (SE +/- 3984.99, N = 3)
No SME: 496467.98 (SE +/- 529.89, N = 3)
1. (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
2. Open MPI 4.1.4

NAS Parallel Benchmarks

Test / Class: SP.C

NAS Parallel Benchmarks 3.4 - Total Mop/s, More Is Better
AMD SME Enabled: 253299.33 (SE +/- 2731.53, N = 3)
No SME: 255564.19 (SE +/- 3645.86, N = 3)
1. (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
2. Open MPI 4.1.4

oneDNN

Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU

oneDNN 3.0 - ms, Fewer Is Better
AMD SME Enabled: 0.526628 (SE +/- 0.001428, N = 3; MIN: 0.42)
No SME: 0.522052 (SE +/- 0.001481, N = 3; MIN: 0.42)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

ASTC Encoder

Preset: Thorough

ASTC Encoder 4.0 - MT/s, More Is Better
AMD SME Enabled: 106.56 (SE +/- 0.03, N = 3)
No SME: 106.42 (SE +/- 0.06, N = 3)
1. (CXX) g++ options: -O3 -flto -pthread

Rodinia

Test: OpenMP CFD Solver

Rodinia 3.1 - Seconds, Fewer Is Better
AMD SME Enabled: 6.043 (SE +/- 0.030, N = 3)
No SME: 5.938 (SE +/- 0.012, N = 3)
1. (CXX) g++ options: -O2 -lOpenCL

x264

Video Input: Bosphorus 4K

x264 2022-02-22 - Frames Per Second, More Is Better
AMD SME Enabled: 103.07 (SE +/- 0.62, N = 3)
No SME: 106.86 (SE +/- 1.42, N = 3)
1. (CC) gcc options: -ldl -lavformat -lavcodec -lavutil -lswscale -m64 -lm -lpthread -O3 -flto

Xcompact3d Incompact3d

Input: input.i3d 193 Cells Per Direction

Xcompact3d Incompact3d 2021-03-11 - Seconds, Fewer Is Better
AMD SME Enabled: 4.42424568 (SE +/- 0.04122391, N = 3)
No SME: 4.37420527 (SE +/- 0.01135008, N = 3)
1. (F9X) gfortran options: -cpp -O2 -funroll-loops -floop-optimize -fcray-pointer -fbacktrace -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

Embree

Binary: Pathtracer ISPC - Model: Crown

Embree 3.13 - Frames Per Second, More Is Better
AMD SME Enabled: 180.57 (SE +/- 0.40, N = 3; MIN: 129.9 / MAX: 210)
No SME: 183.25 (SE +/- 0.65, N = 3; MIN: 135.3 / MAX: 213.14)

NAS Parallel Benchmarks

Test / Class: FT.C

NAS Parallel Benchmarks 3.4 - Total Mop/s, More Is Better
AMD SME Enabled: 220214.75 (SE +/- 1868.66, N = 3)
No SME: 223096.07 (SE +/- 2651.33, N = 4)
1. (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
2. Open MPI 4.1.4

ACES DGEMM

Sustained Floating-Point Rate

ACES DGEMM 1.0 - GFLOP/s, More Is Better
AMD SME Enabled: 70.28 (SE +/- 0.18, N = 3)
No SME: 70.37 (SE +/- 0.11, N = 3)
1. (CC) gcc options: -O3 -march=native -fopenmp

miniBUDE

Implementation: OpenMP - Input Deck: BM1

miniBUDE 20210901 - Billion Interactions/s, More Is Better
AMD SME Enabled: 291.63 (SE +/- 0.33, N = 3)
No SME: 291.25 (SE +/- 0.65, N = 3)
1. (CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm

miniBUDE

Implementation: OpenMP - Input Deck: BM1

miniBUDE 20210901 - GFInst/s, More Is Better
AMD SME Enabled: 7290.64 (SE +/- 8.37, N = 3)
No SME: 7281.28 (SE +/- 16.32, N = 3)
1. (CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm

KTX-Software toktx

Settings: Zstd Compression 9

KTX-Software toktx 4.0 - Seconds, Fewer Is Better
AMD SME Enabled: 2.776 (SE +/- 0.006, N = 3)
No SME: 2.734 (SE +/- 0.006, N = 3)

libavif avifenc

Encoder Speed: 6

libavif avifenc 0.11 - Seconds, Fewer Is Better
AMD SME Enabled: 2.469 (SE +/- 0.019, N = 3)
No SME: 2.393 (SE +/- 0.006, N = 3)
1. (CXX) g++ options: -O3 -fPIC -lm

NAS Parallel Benchmarks

Test / Class: EP.C

NAS Parallel Benchmarks 3.4 - Total Mop/s, More Is Better
AMD SME Enabled: 16462.35 (SE +/- 73.01, N = 3)
No SME: 16457.94 (SE +/- 54.14, N = 3)
1. (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
2. Open MPI 4.1.4


Phoronix Test Suite v10.8.5