AMD SME Benchmark Genoa

4th Gen AMD EPYC "Genoa" Secure Memory Encryption (SME) benchmarks by Michael Larabel for a future article.

HTML result view exported from: https://openbenchmarking.org/result/2212212-NE-AMDSMEBEN19&grr&sor.

AMD SME Benchmark Genoa: system details
Configurations compared: No SME, AMD SME Enabled
Processor: 2 x AMD EPYC 9654 96-Core @ 2.40GHz (192 Cores / 384 Threads)
Motherboard: AMD Titanite_4G (RTI1002E BIOS)
Chipset: AMD Device 14a4
Memory: 1520GB
Disk: 800GB INTEL SSDPF21Q800GB
Graphics: ASPEED
Monitor: VGA HDMI
Network: Broadcom NetXtreme BCM5720 PCIe
OS: Ubuntu 22.10
Kernel: 6.1.0-phx (x86_64)
Desktop: GNOME Shell 43.0
Display Server: X Server 1.21.1.4
Vulkan: 1.3.224
Compiler: GCC 12.2.0 + Clang 15.0.2-1
File-System: ext4
Screen Resolution: 1920x1080

Kernel Details: Transparent Huge Pages: madvise

Compiler Details: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-12-U8K4Qv/gcc-12-12.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-12-U8K4Qv/gcc-12-12.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v

Processor Details: Scaling Governor: acpi-cpufreq performance (Boost: Enabled) - CPU Microcode: 0xa10110d

Java Details: OpenJDK Runtime Environment (build 11.0.17+8-post-Ubuntu-1ubuntu2)

Python Details: Python 3.10.7

Security Details:
itlb_multihit: Not affected
l1tf: Not affected
mds: Not affected
meltdown: Not affected
mmio_stale_data: Not affected
retbleed: Not affected
spec_store_bypass: Mitigation of SSB disabled via prctl
spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: always-on RSB filling PBRSB-eIBRS: Not affected
srbds: Not affected
tsx_async_abort: Not affected

AMD SME Benchmark Genoa: result overview
Test | No SME | AMD SME Enabled
wrf: conus 2.5km | 4077.189 | 4116.62
nwchem: C240 Buckyball | 1524.4 | 1543.1
openvkl: vklBenchmark ISPC | 1322 | 1286
ospray: particle_volume/pathtracer/real_time | 230.617 | 229.879
renaissance: In-Memory Database Shootout | 4764.6 | 4838.5
relion: Basic - CPU | 128.655 | 130.426
hpcg: | 88.3902 | 87.1501
onednn: Recurrent Neural Network Training - f32 - CPU | 2011.15 | 2002.43
renaissance: Finagle HTTP Requests | 12286.3 | 12347.5
build-llvm: Unix Makefiles | 160.129 | 162.629
build-linux-kernel: allmodconfig | 146.325 | 148.435
build-gem5: Time To Compile | 138.639 | 142.181
pgbench: 100 - 250 - Read Only | 2951147 | 2970869
graph500: 26 | 838505000 | 835467000
graph500: 26 | 593153000 | 572510000
graph500: 26 | 1533180000 | 1526380000
graph500: 26 | 1426480000 | 1358510000
compress-zstd: 19, Long Mode - Decompression Speed | 3825.0 | 3837.3
compress-zstd: 19, Long Mode - Compression Speed | 52.9 | 49.9
onnx: super-resolution-10 - CPU - Standard | 5600 | 5583
ospray: gravity_spheres_volume/dim_512/ao/real_time | 43.8478 | 43.3785
ospray-studio: 3 - 4K - 32 - Path Tracer | 22043 | 22614
appleseed: Emily | 142.94702 | 150.92071
openradioss: INIVOL and Fluid Structure Interaction Drop Container | 80.88 | 80.90
nginx: 500 | 201056.69 | 196386.41
pyhpc: CPU - Numpy - 4194304 - Isoneutral Mixing | 1.691 | 1.744
aom-av1: Speed 10 Realtime - Bosphorus 4K | 34.47 | 33.12
openradioss: Bumper Beam | 79.85 | 79.97
blender: Barbershop - CPU-Only | 80.77 | 81.57
tensorflow: CPU - 64 - AlexNet | 508.40 | 505.26
build-llvm: Ninja | 75.329 | 76.629
openvino: Age Gender Recognition Retail 0013 FP16 - CPU | 0.55 | 0.55
openvino: Age Gender Recognition Retail 0013 FP16 - CPU | 150792.42 | 148736.04
namd: ATPase Simulation - 327,506 Atoms | 0.12831 | 0.12991
openvino: Person Detection FP16 - CPU | 1102.15 | 1127.51
openvino: Person Detection FP16 - CPU | 43.29 | 42.33
openvino: Person Detection FP32 - CPU | 1115.56 | 1134.68
openvino: Person Detection FP32 - CPU | 42.76 | 42.03
openvino: Face Detection FP16 - CPU | 469.81 | 471.53
openvino: Face Detection FP16 - CPU | 101.90 | 101.55
openvino: Face Detection FP16-INT8 - CPU | 246.95 | 247.23
openvino: Face Detection FP16-INT8 - CPU | 193.95 | 193.83
build-linux-kernel: defconfig | 25.709 | 25.303
openvino: Person Vehicle Bike Detection FP16 - CPU | 5.31 | 5.33
openvino: Person Vehicle Bike Detection FP16 - CPU | 9027.72 | 8993.90
openvino: Machine Translation EN To DE FP16 - CPU | 49.54 | 49.78
openvino: Machine Translation EN To DE FP16 - CPU | 967.90 | 963.14
openvino: Age Gender Recognition Retail 0013 FP16-INT8 - CPU | 0.36 | 0.36
openvino: Age Gender Recognition Retail 0013 FP16-INT8 - CPU | 165194.42 | 167545.54
openvino: Weld Porosity Detection FP16-INT8 - CPU | 9.62 | 9.67
openvino: Weld Porosity Detection FP16-INT8 - CPU | 19801.40 | 19704.84
openvino: Vehicle Detection FP16 - CPU | 6.44 | 6.59
openvino: Vehicle Detection FP16 - CPU | 7437.73 | 7274.98
openvino: Vehicle Detection FP16-INT8 - CPU | 4.28 | 4.28
openvino: Vehicle Detection FP16-INT8 - CPU | 11180.63 | 11184.76
openvino: Weld Porosity Detection FP16 - CPU | 4.79 | 4.79
openvino: Weld Porosity Detection FP16 - CPU | 9997.68 | 9990.19
deepsparse: NLP Token Classification, BERT base uncased conll2003 - Asynchronous Multi-Stream | 1133.3132 | 1143.0106
deepsparse: NLP Token Classification, BERT base uncased conll2003 - Asynchronous Multi-Stream | 84.2764 | 83.8173
deepsparse: NLP Document Classification, oBERT base uncased on IMDB - Asynchronous Multi-Stream | 1134.4234 | 1143.0364
deepsparse: NLP Document Classification, oBERT base uncased on IMDB - Asynchronous Multi-Stream | 84.2500 | 83.6363
deepsparse: NLP Text Classification, BERT base uncased SST2 - Asynchronous Multi-Stream | 155.1029 | 158.9428
deepsparse: NLP Text Classification, BERT base uncased SST2 - Asynchronous Multi-Stream | 617.0281 | 601.9341
dacapobench: H2 | 4807 | 5050
deepsparse: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Asynchronous Multi-Stream | 125.5335 | 128.4001
deepsparse: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Asynchronous Multi-Stream | 762.6984 | 745.6412
deepsparse: NLP Text Classification, DistilBERT mnli - Asynchronous Multi-Stream | 79.4895 | 81.2258
deepsparse: NLP Text Classification, DistilBERT mnli - Asynchronous Multi-Stream | 1204.8491 | 1178.9990
deepsparse: CV Classification, ResNet-50 ImageNet - Asynchronous Multi-Stream | 48.8291 | 50.1080
deepsparse: CV Classification, ResNet-50 ImageNet - Asynchronous Multi-Stream | 1962.1024 | 1911.3999
compress-7zip: Decompression Rating | 1160632 | 1169038
compress-7zip: Compression Rating | 917782 | 885135
deepsparse: CV Detection,YOLOv5s COCO - Asynchronous Multi-Stream | 111.4614 | 113.8270
deepsparse: CV Detection,YOLOv5s COCO - Asynchronous Multi-Stream | 858.4729 | 840.9404
avifenc: 2 | 34.690 | 35.260
askap: tConvolve MPI - Gridding | 93071.0 | 89541.8
askap: tConvolve MPI - Degridding | 83598.3 | 78718.7
srsran: 4G PHY_DL_Test 100 PRB MIMO 256-QAM | 166.0 | 165.7
srsran: 4G PHY_DL_Test 100 PRB MIMO 256-QAM | 445.2 | 444.8
build-godot: Time To Compile | 34.141 | 35.038
openradioss: Cell Phone Drop Test | 18.45 | 18.32
pyhpc: CPU - Numpy - 4194304 - Equation of State | 0.884 | 0.934
x265: Bosphorus 4K | 23.48 | 23.29
srsran: 4G PHY_DL_Test 100 PRB MIMO 64-QAM | 157.8 | 157.7
srsran: 4G PHY_DL_Test 100 PRB MIMO 64-QAM | 415.1 | 408.5
srsran: OFDM_Test | 161733333 | 162633333
onednn: Deconvolution Batch shapes_1d - u8s8f32 - CPU | 0.916133 | 0.918482
onednn: IP Shapes 3D - bf16bf16bf16 - CPU | 3.89299 | 3.92313
quantlib: | 3052.8 | 3061.3
gpaw: Carbon Nanotube | 23.167 | 22.981
gromacs: MPI CPU - water_GMX50_bare | 18.712 | 18.623
blender: Classroom - CPU-Only | 20.99 | 20.95
onednn: Deconvolution Batch shapes_1d - f32 - CPU | 22.6795 | 23.1429
liquid-dsp: 384 - 256 - 57 | 10346000000 | 10350000000
liquid-dsp: 256 - 256 - 57 | 10332000000 | 10344666667
svt-av1: Preset 13 - Bosphorus 4K | 248.334 | 251.441
toktx: Zstd Compression 19 | 18.863 | 19.881
oidn: RTLightmap.hdr.4096x4096 | 1.66 | 1.66
srsran: 4G PHY_DL_Test 100 PRB SISO 256-QAM | 172.7 | 172.2
srsran: 4G PHY_DL_Test 100 PRB SISO 256-QAM | 444.0 | 445.7
openfoam: drivaerFastback, Small Mesh Size - Execution Time | 22.084264 | 22.13302
openfoam: drivaerFastback, Small Mesh Size - Mesh Time | 25.06787 | 27.099458
rodinia: OpenMP LavaMD | 16.508 | 16.669
lulesh: | 59069.405 | 57686.086
srsran: 5G PHY_DL_NR Test 52 PRB SISO 64-QAM | 94.4 | 94.9
srsran: 5G PHY_DL_NR Test 52 PRB SISO 64-QAM | 139.7 | 139.1
minibude: OpenMP - BM2 | 345.342 | 343.550
minibude: OpenMP - BM2 | 8633.544 | 8588.749
srsran: 4G PHY_DL_Test 100 PRB SISO 64-QAM | 165.8 | 165.9
srsran: 4G PHY_DL_Test 100 PRB SISO 64-QAM | 413.9 | 415.7
xmrig: Monero - 1M | 105141.7 | 101932.1
xsbench: | 29806415 | 29021428
xmrig: Wownero - 1M | 126508.3 | 123484.1
onednn: IP Shapes 3D - u8s8f32 - CPU | 0.863726 | 0.850418
kvazaar: Bosphorus 4K - Very Fast | 74.31 | 73.32
kvazaar: Bosphorus 4K - Ultra Fast | 77.68 | 76.22
astcenc: Exhaustive | 11.8206 | 11.8379
npb: BT.C | 496467.98 | 494917.44
npb: SP.C | 255564.19 | 253299.33
onednn: Convolution Batch Shapes Auto - f32 - CPU | 0.522052 | 0.526628
astcenc: Thorough | 106.4244 | 106.5566
rodinia: OpenMP CFD Solver | 5.938 | 6.043
x264: Bosphorus 4K | 106.86 | 103.07
incompact3d: input.i3d 193 Cells Per Direction | 4.37420527 | 4.42424568
embree: Pathtracer ISPC - Crown | 183.2528 | 180.5717
npb: FT.C | 223096.07 | 220214.75
mt-dgemm: Sustained Floating-Point Rate | 70.372095 | 70.277437
minibude: OpenMP - BM1 | 291.251 | 291.625
minibude: OpenMP - BM1 | 7281.284 | 7290.636
toktx: Zstd Compression 9 | 2.734 | 2.776
avifenc: 6 | 2.393 | 2.469
npb: EP.C | 16457.94 | 16462.35
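The overview mixes metrics where lower is better (for example seconds) with metrics where higher is better (for example items per second), so raw differences are not directly comparable. As a minimal sketch, not part of the original export, the relative change with SME enabled can be normalized so that a negative number always means SME was slower. The function name `sme_change_percent` and the three sample rows are illustrative choices; the values are taken from the WRF, OpenVKL, and HPCG results in this document.

```python
def sme_change_percent(no_sme: float, sme: float, higher_is_better: bool) -> float:
    """Performance change with SME enabled, in percent.

    Negative means the SME-enabled run was slower, regardless of
    whether the metric is "More Is Better" or "Fewer Is Better".
    """
    if higher_is_better:
        return (sme / no_sme - 1.0) * 100.0
    # For time-like metrics, invert the ratio so slower runs go negative.
    return (no_sme / sme - 1.0) * 100.0


# (label, No SME, AMD SME Enabled, higher_is_better) taken from this result file.
results = [
    ("WRF conus 2.5km (Seconds)", 4077.19, 4116.62, False),
    ("OpenVKL vklBenchmark ISPC (Items / Sec)", 1322.0, 1286.0, True),
    ("HPCG (GFLOP/s)", 88.39, 87.15, True),
]

for label, no_sme, sme, higher_is_better in results:
    change = sme_change_percent(no_sme, sme, higher_is_better)
    print(f"{label}: {change:+.2f}%")
```

Inverting the ratio for time-like metrics is one of several reasonable conventions; a plain percent difference of the raw values would flip the sign for "Fewer Is Better" tests.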

WRF

Input: conus 2.5km

WRF 4.2.2 (Seconds, Fewer Is Better)
No SME: 4077.19
AMD SME Enabled: 4116.62
1. (F9X) gfortran options: -O2 -ftree-vectorize -funroll-loops -ffree-form -fconvert=big-endian -frecord-marker=4 -fallow-invalid-boz -lesmf_time -lwrfio_nf -lnetcdff -lnetcdf -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

NWChem

Input: C240 Buckyball

NWChem 7.0.2 (Seconds, Fewer Is Better)
No SME: 1524.4
AMD SME Enabled: 1543.1
1. (F9X) gfortran options: -lnwctask -lccsd -lmcscf -lselci -lmp2 -lmoints -lstepper -ldriver -loptim -lnwdft -lgradients -lcphf -lesp -lddscf -ldangchang -lguess -lhessian -lvib -lnwcutil -lrimp2 -lproperty -lsolvation -lnwints -lprepar -lnwmd -lnwpw -lofpw -lpaw -lpspw -lband -lnwpwlib -lcafe -lspace -lanalyze -lqhop -lpfft -ldplot -ldrdy -lvscf -lqmmm -lqmd -letrans -ltce -lbq -lmm -lcons -lperfm -ldntmc -lccca -ldimqm -lga -larmci -lpeigs -l64to32 -lopenblas -lpthread -lrt -llapack -lnwcblas -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz -lcomex -m64 -ffast-math -std=legacy -fdefault-integer-8 -finline-functions -O2

OpenVKL

Benchmark: vklBenchmark ISPC

OpenVKL 1.3.1 (Items / Sec, More Is Better)
No SME: 1322 (SE +/- 6.81, N = 3; MIN: 329 / MAX: 4485)
AMD SME Enabled: 1286 (SE +/- 15.55, N = 4; MIN: 328 / MAX: 5485)

OSPRay

Benchmark: particle_volume/pathtracer/real_time

OSPRay 2.10 (Items Per Second, More Is Better)
No SME: 230.62 (SE +/- 1.25, N = 3)
AMD SME Enabled: 229.88 (SE +/- 1.51, N = 3)

Renaissance

Test: In-Memory Database Shootout

Renaissance 0.14 (ms, Fewer Is Better)
No SME: 4764.6 (SE +/- 54.74, N = 12; MIN: 4124.15 / MAX: 6577.01)
AMD SME Enabled: 4838.5 (SE +/- 69.41, N = 3; MIN: 4339.45 / MAX: 6109.38)
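The SE and N figures reported with each result give a rough way to judge whether a gap between the two configurations exceeds run-to-run noise. The following is a hedged sketch rather than anything from the original export: the helper name `noise_ratio` is hypothetical, and dividing the gap by the combined standard error is only a heuristic, especially with as few as three runs per configuration. The sample numbers are the Renaissance In-Memory Database Shootout results from this file.

```python
import math


def noise_ratio(mean_a: float, se_a: float, mean_b: float, se_b: float) -> float:
    """Absolute gap between two means divided by their combined standard error."""
    combined_se = math.sqrt(se_a ** 2 + se_b ** 2)
    return abs(mean_a - mean_b) / combined_se


# Renaissance In-Memory Database Shootout: No SME vs. AMD SME Enabled (ms).
ratio = noise_ratio(4764.6, 54.74, 4838.5, 69.41)
print(f"gap / combined SE = {ratio:.2f}")
```

A ratio below 1, as in this case, suggests the gap is within the measured run-to-run variation; a ratio well above 2 would point to a difference that is less likely to be noise.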

RELION

Test: Basic - Device: CPU

RELION 3.1.1 (Seconds, Fewer Is Better)
No SME: 128.66 (SE +/- 1.40, N = 5)
AMD SME Enabled: 130.43 (SE +/- 1.42, N = 5)
1. (CXX) g++ options: -fopenmp -std=c++0x -O3 -rdynamic -ldl -ltiff -lfftw3f -lfftw3 -lpng -lmpi_cxx -lmpi

High Performance Conjugate Gradient

High Performance Conjugate Gradient 3.1 (GFLOP/s, More Is Better)
No SME: 88.39 (SE +/- 0.10, N = 3)
AMD SME Enabled: 87.15 (SE +/- 0.01, N = 3)
1. (CXX) g++ options: -O3 -ffast-math -ftree-vectorize -lmpi_cxx -lmpi

oneDNN

Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU

oneDNN 3.0 (ms, Fewer Is Better)
AMD SME Enabled: 2002.43 (SE +/- 18.68, N = 6; MIN: 1924.96)
No SME: 2011.15 (SE +/- 18.08, N = 7; MIN: 1936.22)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

Renaissance

Test: Finagle HTTP Requests

Renaissance 0.14 (ms, Fewer Is Better)
No SME: 12286.3 (SE +/- 88.33, N = 3; MIN: 11326.41 / MAX: 12632.65)
AMD SME Enabled: 12347.5 (SE +/- 95.54, N = 3; MIN: 11146.33 / MAX: 12514.13)

Timed LLVM Compilation

Build System: Unix Makefiles

Timed LLVM Compilation 13.0 (Seconds, Fewer Is Better)
No SME: 160.13 (SE +/- 0.17, N = 3)
AMD SME Enabled: 162.63 (SE +/- 0.05, N = 3)

Timed Linux Kernel Compilation

Build: allmodconfig

Timed Linux Kernel Compilation 6.1 (Seconds, Fewer Is Better)
No SME: 146.33 (SE +/- 1.13, N = 3)
AMD SME Enabled: 148.44 (SE +/- 0.71, N = 3)

Timed Gem5 Compilation

Time To Compile

Timed Gem5 Compilation 21.2 (Seconds, Fewer Is Better)
No SME: 138.64 (SE +/- 1.59, N = 3)
AMD SME Enabled: 142.18 (SE +/- 1.00, N = 3)

PostgreSQL

Scaling Factor: 100 - Clients: 250 - Mode: Read Only

PostgreSQL 15 (TPS, More Is Better)
AMD SME Enabled: 2970869 (SE +/- 40566.19, N = 3)
No SME: 2951147 (SE +/- 16891.69, N = 3)
1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lm

Graph500

Scale: 26

Graph500 3.0 (sssp max_TEPS, More Is Better)
No SME: 838505000
AMD SME Enabled: 835467000
1. (CC) gcc options: -fcommon -O3 -lpthread -lm -lmpi

Graph500

Scale: 26

Graph500 3.0 (sssp median_TEPS, More Is Better)
No SME: 593153000
AMD SME Enabled: 572510000
1. (CC) gcc options: -fcommon -O3 -lpthread -lm -lmpi

Graph500

Scale: 26

Graph500 3.0 (bfs max_TEPS, More Is Better)
No SME: 1533180000
AMD SME Enabled: 1526380000
1. (CC) gcc options: -fcommon -O3 -lpthread -lm -lmpi

Graph500

Scale: 26

Graph500 3.0 (bfs median_TEPS, More Is Better)
No SME: 1426480000
AMD SME Enabled: 1358510000
1. (CC) gcc options: -fcommon -O3 -lpthread -lm -lmpi

Zstd Compression

Compression Level: 19, Long Mode - Decompression Speed

Zstd Compression 1.5.0 (MB/s, More Is Better)
AMD SME Enabled: 3837.3 (SE +/- 1.04, N = 3)
No SME: 3825.0 (SE +/- 14.95, N = 15)
1. (CC) gcc options: -O3 -pthread -lz -llzma

Zstd Compression

Compression Level: 19, Long Mode - Compression Speed

Zstd Compression 1.5.0 (MB/s, More Is Better)
No SME: 52.9 (SE +/- 1.03, N = 15)
AMD SME Enabled: 49.9 (SE +/- 0.70, N = 3)
1. (CC) gcc options: -O3 -pthread -lz -llzma

ONNX Runtime

Model: super-resolution-10 - Device: CPU - Executor: Standard

ONNX Runtime 1.11 (Inferences Per Minute, More Is Better)
No SME: 5600 (SE +/- 15.47, N = 3)
AMD SME Enabled: 5583 (SE +/- 40.49, N = 3)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

OSPRay

Benchmark: gravity_spheres_volume/dim_512/ao/real_time

OSPRay 2.10 (Items Per Second, More Is Better)
No SME: 43.85 (SE +/- 0.04, N = 3)
AMD SME Enabled: 43.38 (SE +/- 0.14, N = 3)

OSPRay Studio

Camera: 3 - Resolution: 4K - Samples Per Pixel: 32 - Renderer: Path Tracer

OSPRay Studio 0.11 (ms, Fewer Is Better)
No SME: 22043 (SE +/- 6.36, N = 3)
AMD SME Enabled: 22614 (SE +/- 32.58, N = 3)
1. (CXX) g++ options: -O3 -ldl

Appleseed

Scene: Emily

Appleseed 2.0 Beta (Seconds, Fewer Is Better)
No SME: 142.95
AMD SME Enabled: 150.92

OpenRadioss

Model: INIVOL and Fluid Structure Interaction Drop Container

OpenRadioss 2022.10.13 (Seconds, Fewer Is Better)
No SME: 80.88 (SE +/- 0.09, N = 3)
AMD SME Enabled: 80.90 (SE +/- 0.15, N = 3)

nginx

Connections: 500

nginx 1.23.2 (Requests Per Second, More Is Better)
No SME: 201056.69 (SE +/- 238.08, N = 3)
AMD SME Enabled: 196386.41 (SE +/- 124.29, N = 3)
1. (CC) gcc options: -lluajit-5.1 -lm -lssl -lcrypto -lpthread -ldl -std=c99 -O2

PyHPC Benchmarks

Device: CPU - Backend: Numpy - Project Size: 4194304 - Benchmark: Isoneutral Mixing

PyHPC Benchmarks 3.0 (Seconds, Fewer Is Better)
No SME: 1.691 (SE +/- 0.006, N = 3)
AMD SME Enabled: 1.744 (SE +/- 0.005, N = 3)

AOM AV1

Encoder Mode: Speed 10 Realtime - Input: Bosphorus 4K

AOM AV1 3.5 (Frames Per Second, More Is Better)
No SME: 34.47 (SE +/- 0.53, N = 15)
AMD SME Enabled: 33.12 (SE +/- 0.56, N = 12)
1. (CXX) g++ options: -O3 -std=c++11 -U_FORTIFY_SOURCE -lm

OpenRadioss

Model: Bumper Beam

OpenRadioss 2022.10.13 (Seconds, Fewer Is Better)
No SME: 79.85 (SE +/- 0.73, N = 3)
AMD SME Enabled: 79.97 (SE +/- 0.15, N = 3)

Blender

Blend File: Barbershop - Compute: CPU-Only

Blender 3.4 (Seconds, Fewer Is Better)
No SME: 80.77 (SE +/- 0.30, N = 3)
AMD SME Enabled: 81.57 (SE +/- 0.45, N = 3)

TensorFlow

Device: CPU - Batch Size: 64 - Model: AlexNet

TensorFlow 2.10 (images/sec, More Is Better)
No SME: 508.40 (SE +/- 6.01, N = 15)
AMD SME Enabled: 505.26 (SE +/- 7.26, N = 15)

Timed LLVM Compilation

Build System: Ninja

Timed LLVM Compilation 13.0 (Seconds, Fewer Is Better)
No SME: 75.33 (SE +/- 0.38, N = 3)
AMD SME Enabled: 76.63 (SE +/- 0.35, N = 3)

OpenVINO

Model: Age Gender Recognition Retail 0013 FP16 - Device: CPU

OpenVINO 2022.2.dev (ms, Fewer Is Better)
No SME: 0.55 (SE +/- 0.00, N = 3; MIN: 0.5 / MAX: 30.13)
AMD SME Enabled: 0.55 (SE +/- 0.00, N = 4; MIN: 0.5 / MAX: 36.47)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared

OpenVINO

Model: Age Gender Recognition Retail 0013 FP16 - Device: CPU

OpenVINO 2022.2.dev (FPS, More Is Better)
No SME: 150792.42 (SE +/- 878.85, N = 3)
AMD SME Enabled: 148736.04 (SE +/- 1824.08, N = 4)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared

NAMD

ATPase Simulation - 327,506 Atoms

NAMD 2.14 (days/ns, Fewer Is Better)
No SME: 0.12831 (SE +/- 0.00031, N = 3)
AMD SME Enabled: 0.12991 (SE +/- 0.00010, N = 3)

OpenVINO

Model: Person Detection FP16 - Device: CPU

OpenVINO 2022.2.dev (ms, Fewer Is Better)
No SME: 1102.15 (SE +/- 3.23, N = 3; MIN: 799.38 / MAX: 1782.76)
AMD SME Enabled: 1127.51 (SE +/- 5.54, N = 3; MIN: 802.53 / MAX: 1835.35)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared

OpenVINO

Model: Person Detection FP16 - Device: CPU

OpenVINO 2022.2.dev (FPS, More Is Better)
No SME: 43.29 (SE +/- 0.12, N = 3)
AMD SME Enabled: 42.33 (SE +/- 0.21, N = 3)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared

OpenVINO

Model: Person Detection FP32 - Device: CPU

OpenVINO 2022.2.dev (ms, Fewer Is Better)
No SME: 1115.56 (SE +/- 5.83, N = 3; MIN: 773.47 / MAX: 1806.09)
AMD SME Enabled: 1134.68 (SE +/- 0.32, N = 3; MIN: 842.92 / MAX: 1806.88)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared

OpenVINO

Model: Person Detection FP32 - Device: CPU

OpenVINO 2022.2.dev (FPS, More Is Better)
No SME: 42.76 (SE +/- 0.23, N = 3)
AMD SME Enabled: 42.03 (SE +/- 0.02, N = 3)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared

OpenVINO

Model: Face Detection FP16 - Device: CPU

OpenVINO 2022.2.dev (ms, Fewer Is Better)
No SME: 469.81 (SE +/- 0.87, N = 3; MIN: 402.31 / MAX: 539.43)
AMD SME Enabled: 471.53 (SE +/- 0.71, N = 3; MIN: 415.73 / MAX: 547.47)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared

OpenVINO

Model: Face Detection FP16 - Device: CPU

OpenVINO 2022.2.dev (FPS, More Is Better)
No SME: 101.90 (SE +/- 0.24, N = 3)
AMD SME Enabled: 101.55 (SE +/- 0.21, N = 3)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared

OpenVINO

Model: Face Detection FP16-INT8 - Device: CPU

OpenVINO 2022.2.dev (ms, Fewer Is Better)
No SME: 246.95 (SE +/- 0.06, N = 3; MIN: 205.26 / MAX: 303.51)
AMD SME Enabled: 247.23 (SE +/- 0.05, N = 3; MIN: 208.68 / MAX: 293.42)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared

OpenVINO

Model: Face Detection FP16-INT8 - Device: CPU

OpenVINO 2022.2.dev (FPS, More Is Better)
No SME: 193.95 (SE +/- 0.05, N = 3)
AMD SME Enabled: 193.83 (SE +/- 0.05, N = 3)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared

Timed Linux Kernel Compilation

Build: defconfig

Timed Linux Kernel Compilation 6.1 (Seconds, Fewer Is Better)
AMD SME Enabled: 25.30 (SE +/- 0.23, N = 7)
No SME: 25.71 (SE +/- 0.22, N = 8)

OpenVINO

Model: Person Vehicle Bike Detection FP16 - Device: CPU

OpenVINO 2022.2.dev (ms, Fewer Is Better)
No SME: 5.31 (SE +/- 0.01, N = 3; MIN: 4.34 / MAX: 44.16)
AMD SME Enabled: 5.33 (SE +/- 0.00, N = 3; MIN: 4.45 / MAX: 44.32)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared

OpenVINO

Model: Person Vehicle Bike Detection FP16 - Device: CPU

OpenVINO 2022.2.dev (FPS, More Is Better)
No SME: 9027.72 (SE +/- 7.89, N = 3)
AMD SME Enabled: 8993.90 (SE +/- 6.93, N = 3)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared

OpenVINO

Model: Machine Translation EN To DE FP16 - Device: CPU

OpenVINO 2022.2.dev (ms, Fewer Is Better)
No SME: 49.54 (SE +/- 0.08, N = 3; MIN: 37.27 / MAX: 225.52)
AMD SME Enabled: 49.78 (SE +/- 0.08, N = 3; MIN: 38.76 / MAX: 189.77)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared

OpenVINO

Model: Machine Translation EN To DE FP16 - Device: CPU

OpenVINO 2022.2.dev (FPS, More Is Better)
No SME: 967.90 (SE +/- 1.57, N = 3)
AMD SME Enabled: 963.14 (SE +/- 1.50, N = 3)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared

OpenVINO

Model: Age Gender Recognition Retail 0013 FP16-INT8 - Device: CPU

OpenVINO 2022.2.dev (ms, Fewer Is Better)
No SME: 0.36 (SE +/- 0.00, N = 3; MIN: 0.34 / MAX: 40.99)
AMD SME Enabled: 0.36 (SE +/- 0.00, N = 3; MIN: 0.34 / MAX: 47.65)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared

OpenVINO

Model: Age Gender Recognition Retail 0013 FP16-INT8 - Device: CPU

OpenVINO 2022.2.dev (FPS, More Is Better)
AMD SME Enabled: 167545.54 (SE +/- 702.31, N = 3)
No SME: 165194.42 (SE +/- 2225.36, N = 3)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared

OpenVINO

Model: Weld Porosity Detection FP16-INT8 - Device: CPU

OpenVINO 2022.2.dev (ms, Fewer Is Better)
No SME: 9.62 (SE +/- 0.01, N = 3; MIN: 8.26 / MAX: 57.97)
AMD SME Enabled: 9.67 (SE +/- 0.00, N = 3; MIN: 8.32 / MAX: 78.9)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared

OpenVINO

Model: Weld Porosity Detection FP16-INT8 - Device: CPU

OpenVINO 2022.2.dev (FPS, More Is Better)
No SME: 19801.40 (SE +/- 18.52, N = 3)
AMD SME Enabled: 19704.84 (SE +/- 4.31, N = 3)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared

OpenVINO

Model: Vehicle Detection FP16 - Device: CPU

OpenVINO 2022.2.dev (ms, Fewer Is Better)
No SME: 6.44 (SE +/- 0.00, N = 3; MIN: 5.05 / MAX: 61.63)
AMD SME Enabled: 6.59 (SE +/- 0.01, N = 3; MIN: 5 / MAX: 63.2)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared

OpenVINO

Model: Vehicle Detection FP16 - Device: CPU

OpenVINO 2022.2.dev (FPS, More Is Better)
No SME: 7437.73 (SE +/- 3.74, N = 3)
AMD SME Enabled: 7274.98 (SE +/- 4.95, N = 3)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared

OpenVINO

Model: Vehicle Detection FP16-INT8 - Device: CPU

OpenVINO 2022.2.dev (ms, Fewer Is Better)
No SME: 4.28 (SE +/- 0.00, N = 3; MIN: 3.51 / MAX: 38.77)
AMD SME Enabled: 4.28 (SE +/- 0.00, N = 3; MIN: 3.5 / MAX: 42.32)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared

OpenVINO

Model: Vehicle Detection FP16-INT8 - Device: CPU

OpenVINO 2022.2.dev (FPS, More Is Better)
AMD SME Enabled: 11184.76 (SE +/- 4.87, N = 3)
No SME: 11180.63 (SE +/- 2.65, N = 3)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared

OpenVINO

Model: Weld Porosity Detection FP16 - Device: CPU

OpenVINO 2022.2.dev (ms, Fewer Is Better)
No SME: 4.79 (SE +/- 0.00, N = 3; MIN: 3.96 / MAX: 30.92)
AMD SME Enabled: 4.79 (SE +/- 0.01, N = 3; MIN: 3.95 / MAX: 30.1)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared

OpenVINO

Model: Weld Porosity Detection FP16 - Device: CPU

OpenVINO 2022.2.dev (FPS, More Is Better)
No SME: 9997.68 (SE +/- 3.98, N = 3)
AMD SME Enabled: 9990.19 (SE +/- 13.80, N = 3)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared

Neural Magic DeepSparse

Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.1 (ms/batch, Fewer Is Better)
No SME: 1133.31 (SE +/- 1.72, N = 3)
AMD SME Enabled: 1143.01 (SE +/- 0.38, N = 3)

Neural Magic DeepSparse

Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.1 (items/sec, More Is Better)
No SME: 84.28 (SE +/- 0.09, N = 3)
AMD SME Enabled: 83.82 (SE +/- 0.05, N = 3)

Neural Magic DeepSparse

Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.1 (ms/batch, Fewer Is Better)
No SME: 1134.42 (SE +/- 0.43, N = 3)
AMD SME Enabled: 1143.04 (SE +/- 0.31, N = 3)

Neural Magic DeepSparse

Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.1 (items/sec, More Is Better)
No SME: 84.25 (SE +/- 0.10, N = 3)
AMD SME Enabled: 83.64 (SE +/- 0.11, N = 3)

Neural Magic DeepSparse

Model: NLP Text Classification, BERT base uncased SST2 - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.1 (ms/batch, Fewer Is Better)
No SME: 155.10 (SE +/- 0.23, N = 3)
AMD SME Enabled: 158.94 (SE +/- 0.02, N = 3)

Neural Magic DeepSparse

Model: NLP Text Classification, BERT base uncased SST2 - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.1 (items/sec, More Is Better)
No SME: 617.03 (SE +/- 0.96, N = 3)
AMD SME Enabled: 601.93 (SE +/- 0.11, N = 3)

DaCapo Benchmark

Java Test: H2

DaCapo Benchmark 9.12-MR1 (msec, Fewer Is Better)
No SME: 4807 (SE +/- 54.36, N = 20)
AMD SME Enabled: 5050 (SE +/- 50.10, N = 20)

Neural Magic DeepSparse

Model: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.1 (ms/batch, Fewer Is Better)
No SME: 125.53 (SE +/- 0.08, N = 3)
AMD SME Enabled: 128.40 (SE +/- 0.14, N = 3)

Neural Magic DeepSparse

Model: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.1 (items/sec, More Is Better)
No SME: 762.70 (SE +/- 0.54, N = 3)
AMD SME Enabled: 745.64 (SE +/- 0.69, N = 3)

Neural Magic DeepSparse

Model: NLP Text Classification, DistilBERT mnli - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.1 (ms/batch, Fewer Is Better)
No SME: 79.49 (SE +/- 0.15, N = 3)
AMD SME Enabled: 81.23 (SE +/- 0.07, N = 3)

Neural Magic DeepSparse

Model: NLP Text Classification, DistilBERT mnli - Scenario: Asynchronous Multi-Stream

OpenBenchmarking.org, Neural Magic DeepSparse 1.1, items/sec (More Is Better)
No SME: 1204.85 (SE +/- 2.65, N = 3)
AMD SME Enabled: 1179.00 (SE +/- 1.22, N = 3)

Neural Magic DeepSparse

Model: CV Classification, ResNet-50 ImageNet - Scenario: Asynchronous Multi-Stream

OpenBenchmarking.org, Neural Magic DeepSparse 1.1, ms/batch (Fewer Is Better)
No SME: 48.83 (SE +/- 0.05, N = 3)
AMD SME Enabled: 50.11 (SE +/- 0.13, N = 3)

Neural Magic DeepSparse

Model: CV Classification, ResNet-50 ImageNet - Scenario: Asynchronous Multi-Stream

OpenBenchmarking.org, Neural Magic DeepSparse 1.1, items/sec (More Is Better)
No SME: 1962.10 (SE +/- 1.94, N = 3)
AMD SME Enabled: 1911.40 (SE +/- 4.55, N = 3)

7-Zip Compression

Test: Decompression Rating

OpenBenchmarking.org, 7-Zip Compression 22.01, MIPS (More Is Better)
AMD SME Enabled: 1169038 (SE +/- 7858.68, N = 3)
No SME: 1160632 (SE +/- 10921.46, N = 3)
1. (CXX) g++ options: -lpthread -ldl -O2 -fPIC

7-Zip Compression

Test: Compression Rating

OpenBenchmarking.org, 7-Zip Compression 22.01, MIPS (More Is Better)
No SME: 917782 (SE +/- 9930.11, N = 3)
AMD SME Enabled: 885135 (SE +/- 6113.05, N = 3)
1. (CXX) g++ options: -lpthread -ldl -O2 -fPIC

Neural Magic DeepSparse

Model: CV Detection,YOLOv5s COCO - Scenario: Asynchronous Multi-Stream

OpenBenchmarking.org, Neural Magic DeepSparse 1.1, ms/batch (Fewer Is Better)
No SME: 111.46 (SE +/- 0.16, N = 3)
AMD SME Enabled: 113.83 (SE +/- 0.01, N = 3)

Neural Magic DeepSparse

Model: CV Detection,YOLOv5s COCO - Scenario: Asynchronous Multi-Stream

OpenBenchmarking.org, Neural Magic DeepSparse 1.1, items/sec (More Is Better)
No SME: 858.47 (SE +/- 1.61, N = 3)
AMD SME Enabled: 840.94 (SE +/- 0.37, N = 3)

libavif avifenc

Encoder Speed: 2

OpenBenchmarking.org, libavif avifenc 0.11, Seconds (Fewer Is Better)
No SME: 34.69 (SE +/- 0.03, N = 3)
AMD SME Enabled: 35.26 (SE +/- 0.42, N = 4)
1. (CXX) g++ options: -O3 -fPIC -lm

ASKAP

Test: tConvolve MPI - Gridding

OpenBenchmarking.org, ASKAP 1.0, Mpix/sec (More Is Better)
No SME: 93071.0 (SE +/- 460.77, N = 3)
AMD SME Enabled: 89541.8 (SE +/- 422.37, N = 3)
1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp

ASKAP

Test: tConvolve MPI - Degridding

OpenBenchmarking.org, ASKAP 1.0, Mpix/sec (More Is Better)
No SME: 83598.3 (SE +/- 368.27, N = 3)
AMD SME Enabled: 78718.7 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp

srsRAN

Test: 4G PHY_DL_Test 100 PRB MIMO 256-QAM

OpenBenchmarking.org, srsRAN 22.04.1, UE Mb/s (More Is Better)
No SME: 166.0 (SE +/- 0.24, N = 3)
AMD SME Enabled: 165.7 (SE +/- 0.41, N = 3)
1. (CXX) g++ options: -std=c++14 -fno-strict-aliasing -march=native -mfpmath=sse -mavx2 -fvisibility=hidden -O3 -fno-trapping-math -fno-math-errno -mavx512f -mavx512cd -mavx512bw -mavx512dq -ldl -lpthread -lm

srsRAN

Test: 4G PHY_DL_Test 100 PRB MIMO 256-QAM

OpenBenchmarking.org, srsRAN 22.04.1, eNb Mb/s (More Is Better)
No SME: 445.2 (SE +/- 1.49, N = 3)
AMD SME Enabled: 444.8 (SE +/- 1.01, N = 3)
1. (CXX) g++ options: -std=c++14 -fno-strict-aliasing -march=native -mfpmath=sse -mavx2 -fvisibility=hidden -O3 -fno-trapping-math -fno-math-errno -mavx512f -mavx512cd -mavx512bw -mavx512dq -ldl -lpthread -lm

Timed Godot Game Engine Compilation

Time To Compile

OpenBenchmarking.org, Timed Godot Game Engine Compilation 3.2.3, Seconds (Fewer Is Better)
No SME: 34.14 (SE +/- 0.48, N = 3)
AMD SME Enabled: 35.04 (SE +/- 0.36, N = 3)

OpenRadioss

Model: Cell Phone Drop Test

OpenBenchmarking.org, OpenRadioss 2022.10.13, Seconds (Fewer Is Better)
AMD SME Enabled: 18.32 (SE +/- 0.13, N = 3)
No SME: 18.45 (SE +/- 0.02, N = 3)

PyHPC Benchmarks

Device: CPU - Backend: Numpy - Project Size: 4194304 - Benchmark: Equation of State

OpenBenchmarking.org, PyHPC Benchmarks 3.0, Seconds (Fewer Is Better)
No SME: 0.884 (SE +/- 0.002, N = 3)
AMD SME Enabled: 0.934 (SE +/- 0.003, N = 3)

x265

Video Input: Bosphorus 4K

OpenBenchmarking.org, x265 3.4, Frames Per Second (More Is Better)
No SME: 23.48 (SE +/- 0.29, N = 4)
AMD SME Enabled: 23.29 (SE +/- 0.17, N = 3)
1. (CXX) g++ options: -O3 -rdynamic -lpthread -lrt -ldl -lnuma

srsRAN

Test: 4G PHY_DL_Test 100 PRB MIMO 64-QAM

OpenBenchmarking.org, srsRAN 22.04.1, UE Mb/s (More Is Better)
No SME: 157.8 (SE +/- 0.25, N = 3)
AMD SME Enabled: 157.7 (SE +/- 0.31, N = 3)
1. (CXX) g++ options: -std=c++14 -fno-strict-aliasing -march=native -mfpmath=sse -mavx2 -fvisibility=hidden -O3 -fno-trapping-math -fno-math-errno -mavx512f -mavx512cd -mavx512bw -mavx512dq -ldl -lpthread -lm

srsRAN

Test: 4G PHY_DL_Test 100 PRB MIMO 64-QAM

OpenBenchmarking.org, srsRAN 22.04.1, eNb Mb/s (More Is Better)
No SME: 415.1 (SE +/- 0.49, N = 3)
AMD SME Enabled: 408.5 (SE +/- 3.12, N = 3)
1. (CXX) g++ options: -std=c++14 -fno-strict-aliasing -march=native -mfpmath=sse -mavx2 -fvisibility=hidden -O3 -fno-trapping-math -fno-math-errno -mavx512f -mavx512cd -mavx512bw -mavx512dq -ldl -lpthread -lm

srsRAN

Test: OFDM_Test

OpenBenchmarking.org, srsRAN 22.04.1, Samples / Second (More Is Better)
AMD SME Enabled: 162633333 (SE +/- 883804.91, N = 3)
No SME: 161733333 (SE +/- 600925.21, N = 3)
1. (CXX) g++ options: -std=c++14 -fno-strict-aliasing -march=native -mfpmath=sse -mavx2 -fvisibility=hidden -O3 -fno-trapping-math -fno-math-errno -mavx512f -mavx512cd -mavx512bw -mavx512dq -ldl -lpthread -lm

oneDNN

Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU

OpenBenchmarking.org, oneDNN 3.0, ms (Fewer Is Better)
No SME: 0.916133 (SE +/- 0.009364, N = 5; MIN: 0.76)
AMD SME Enabled: 0.918482 (SE +/- 0.004429, N = 3; MIN: 0.78)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

oneDNN

Harness: IP Shapes 3D - Data Type: bf16bf16bf16 - Engine: CPU

OpenBenchmarking.org, oneDNN 3.0, ms (Fewer Is Better)
No SME: 3.89299 (SE +/- 0.06074, N = 15; MIN: 2.77)
AMD SME Enabled: 3.92313 (SE +/- 0.05423, N = 3; MIN: 2.89)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

QuantLib

OpenBenchmarking.org, QuantLib 1.21, MFLOPS (More Is Better)
AMD SME Enabled: 3061.3 (SE +/- 8.14, N = 3)
No SME: 3052.8 (SE +/- 6.39, N = 3)
1. (CXX) g++ options: -O3 -march=native -rdynamic

GPAW

Input: Carbon Nanotube

OpenBenchmarking.org, GPAW 22.1, Seconds (Fewer Is Better)
AMD SME Enabled: 22.98 (SE +/- 0.05, N = 3)
No SME: 23.17 (SE +/- 0.16, N = 3)
1. (CC) gcc options: -shared -fwrapv -O2 -lxc -lblas -lmpi

GROMACS

Implementation: MPI CPU - Input: water_GMX50_bare

OpenBenchmarking.org, GROMACS 2022.1, Ns Per Day (More Is Better)
No SME: 18.71 (SE +/- 0.03, N = 3)
AMD SME Enabled: 18.62 (SE +/- 0.03, N = 3)
1. (CXX) g++ options: -O3

Blender

Blend File: Classroom - Compute: CPU-Only

OpenBenchmarking.org, Blender 3.4, Seconds (Fewer Is Better)
AMD SME Enabled: 20.95 (SE +/- 0.02, N = 3)
No SME: 20.99 (SE +/- 0.08, N = 3)

oneDNN

Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU

OpenBenchmarking.org, oneDNN 3.0, ms (Fewer Is Better)
No SME: 22.68 (SE +/- 0.08, N = 3; MIN: 19.97)
AMD SME Enabled: 23.14 (SE +/- 0.20, N = 3; MIN: 20.17)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

Liquid-DSP

Threads: 384 - Buffer Length: 256 - Filter Length: 57

OpenBenchmarking.org, Liquid-DSP 2021.01.31, samples/s (More Is Better)
AMD SME Enabled: 10350000000 (SE +/- 3605551.28, N = 3)
No SME: 10346000000 (SE +/- 3785938.90, N = 3)
1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid

Liquid-DSP

Threads: 256 - Buffer Length: 256 - Filter Length: 57

OpenBenchmarking.org, Liquid-DSP 2021.01.31, samples/s (More Is Better)
AMD SME Enabled: 10344666667 (SE +/- 5206833.12, N = 3)
No SME: 10332000000 (SE +/- 8082903.77, N = 3)
1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid

SVT-AV1

Encoder Mode: Preset 13 - Input: Bosphorus 4K

OpenBenchmarking.org, SVT-AV1 1.4, Frames Per Second (More Is Better)
AMD SME Enabled: 251.44 (SE +/- 4.08, N = 15)
No SME: 248.33 (SE +/- 6.22, N = 15)

KTX-Software toktx

Settings: Zstd Compression 19

OpenBenchmarking.org, KTX-Software toktx 4.0, Seconds (Fewer Is Better)
No SME: 18.86 (SE +/- 0.08, N = 3)
AMD SME Enabled: 19.88 (SE +/- 0.02, N = 3)

Intel Open Image Denoise

Run: RTLightmap.hdr.4096x4096

OpenBenchmarking.org, Intel Open Image Denoise 1.4.0, Images / Sec (More Is Better)
AMD SME Enabled: 1.66 (SE +/- 0.00, N = 3)
No SME: 1.66 (SE +/- 0.00, N = 3)

srsRAN

Test: 4G PHY_DL_Test 100 PRB SISO 256-QAM

OpenBenchmarking.org, srsRAN 22.04.1, UE Mb/s (More Is Better)
No SME: 172.7 (SE +/- 0.47, N = 3)
AMD SME Enabled: 172.2 (SE +/- 0.09, N = 3)
1. (CXX) g++ options: -std=c++14 -fno-strict-aliasing -march=native -mfpmath=sse -mavx2 -fvisibility=hidden -O3 -fno-trapping-math -fno-math-errno -mavx512f -mavx512cd -mavx512bw -mavx512dq -ldl -lpthread -lm

srsRAN

Test: 4G PHY_DL_Test 100 PRB SISO 256-QAM

OpenBenchmarking.org, srsRAN 22.04.1, eNb Mb/s (More Is Better)
AMD SME Enabled: 445.7 (SE +/- 0.03, N = 3)
No SME: 444.0 (SE +/- 1.15, N = 3)
1. (CXX) g++ options: -std=c++14 -fno-strict-aliasing -march=native -mfpmath=sse -mavx2 -fvisibility=hidden -O3 -fno-trapping-math -fno-math-errno -mavx512f -mavx512cd -mavx512bw -mavx512dq -ldl -lpthread -lm

OpenFOAM

Input: drivaerFastback, Small Mesh Size - Execution Time

OpenBenchmarking.org, OpenFOAM 10, Seconds (Fewer Is Better)
No SME: 22.08
AMD SME Enabled: 22.13
1. (CXX) g++ options: -std=c++14 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -lfiniteVolume -lmeshTools -lparallel -llagrangian -lregionModels -lgenericPatchFields -lOpenFOAM -ldl -lm

OpenFOAM

Input: drivaerFastback, Small Mesh Size - Mesh Time

OpenBenchmarking.org, OpenFOAM 10, Seconds (Fewer Is Better)
No SME: 25.07
AMD SME Enabled: 27.10
1. (CXX) g++ options: -std=c++14 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -lfiniteVolume -lmeshTools -lparallel -llagrangian -lregionModels -lgenericPatchFields -lOpenFOAM -ldl -lm

Rodinia

Test: OpenMP LavaMD

OpenBenchmarking.org, Rodinia 3.1, Seconds (Fewer Is Better)
No SME: 16.51 (SE +/- 0.13, N = 3)
AMD SME Enabled: 16.67 (SE +/- 0.05, N = 3)
1. (CXX) g++ options: -O2 -lOpenCL

LULESH

OpenBenchmarking.org, LULESH 2.0.3, z/s (More Is Better)
No SME: 59069.41 (SE +/- 197.53, N = 3)
AMD SME Enabled: 57686.09 (SE +/- 360.17, N = 3)
1. (CXX) g++ options: -O3 -fopenmp -lm -lmpi_cxx -lmpi

srsRAN

Test: 5G PHY_DL_NR Test 52 PRB SISO 64-QAM

OpenBenchmarking.org, srsRAN 22.04.1, UE Mb/s (More Is Better)
AMD SME Enabled: 94.9 (SE +/- 0.09, N = 3)
No SME: 94.4 (SE +/- 0.22, N = 3)
1. (CXX) g++ options: -std=c++14 -fno-strict-aliasing -march=native -mfpmath=sse -mavx2 -fvisibility=hidden -O3 -fno-trapping-math -fno-math-errno -mavx512f -mavx512cd -mavx512bw -mavx512dq -ldl -lpthread -lm

srsRAN

Test: 5G PHY_DL_NR Test 52 PRB SISO 64-QAM

OpenBenchmarking.org, srsRAN 22.04.1, eNb Mb/s (More Is Better)
No SME: 139.7 (SE +/- 0.32, N = 3)
AMD SME Enabled: 139.1 (SE +/- 0.19, N = 3)
1. (CXX) g++ options: -std=c++14 -fno-strict-aliasing -march=native -mfpmath=sse -mavx2 -fvisibility=hidden -O3 -fno-trapping-math -fno-math-errno -mavx512f -mavx512cd -mavx512bw -mavx512dq -ldl -lpthread -lm

miniBUDE

Implementation: OpenMP - Input Deck: BM2

OpenBenchmarking.org, miniBUDE 20210901, Billion Interactions/s (More Is Better)
No SME: 345.34 (SE +/- 3.20, N = 3)
AMD SME Enabled: 343.55 (SE +/- 3.99, N = 3)
1. (CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm

miniBUDE

Implementation: OpenMP - Input Deck: BM2

OpenBenchmarking.org, miniBUDE 20210901, GFInst/s (More Is Better)
No SME: 8633.54 (SE +/- 80.07, N = 3)
AMD SME Enabled: 8588.75 (SE +/- 99.74, N = 3)
1. (CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm

srsRAN

Test: 4G PHY_DL_Test 100 PRB SISO 64-QAM

OpenBenchmarking.org, srsRAN 22.04.1, UE Mb/s (More Is Better)
AMD SME Enabled: 165.9 (SE +/- 0.46, N = 3)
No SME: 165.8 (SE +/- 0.84, N = 3)
1. (CXX) g++ options: -std=c++14 -fno-strict-aliasing -march=native -mfpmath=sse -mavx2 -fvisibility=hidden -O3 -fno-trapping-math -fno-math-errno -mavx512f -mavx512cd -mavx512bw -mavx512dq -ldl -lpthread -lm

srsRAN

Test: 4G PHY_DL_Test 100 PRB SISO 64-QAM

OpenBenchmarking.org, srsRAN 22.04.1, eNb Mb/s (More Is Better)
AMD SME Enabled: 415.7 (SE +/- 0.45, N = 3)
No SME: 413.9 (SE +/- 0.64, N = 3)
1. (CXX) g++ options: -std=c++14 -fno-strict-aliasing -march=native -mfpmath=sse -mavx2 -fvisibility=hidden -O3 -fno-trapping-math -fno-math-errno -mavx512f -mavx512cd -mavx512bw -mavx512dq -ldl -lpthread -lm

Xmrig

Variant: Monero - Hash Count: 1M

OpenBenchmarking.org, Xmrig 6.18.1, H/s (More Is Better)
No SME: 105141.7 (SE +/- 111.86, N = 3)
AMD SME Enabled: 101932.1 (SE +/- 540.08, N = 3)
1. (CXX) g++ options: -fexceptions -fno-rtti -maes -O3 -Ofast -static-libgcc -static-libstdc++ -rdynamic -lssl -lcrypto -luv -lpthread -lrt -ldl -lhwloc

Xsbench

OpenBenchmarking.org, Xsbench 2017-07-06, Lookups/s (More Is Better)
No SME: 29806415 (SE +/- 43563.46, N = 3)
AMD SME Enabled: 29021428 (SE +/- 367701.15, N = 15)
1. (CC) gcc options: -std=gnu99 -fopenmp -O3 -lm

Xmrig

Variant: Wownero - Hash Count: 1M

OpenBenchmarking.org, Xmrig 6.18.1, H/s (More Is Better)
No SME: 126508.3 (SE +/- 211.86, N = 3)
AMD SME Enabled: 123484.1 (SE +/- 341.44, N = 3)
1. (CXX) g++ options: -fexceptions -fno-rtti -maes -O3 -Ofast -static-libgcc -static-libstdc++ -rdynamic -lssl -lcrypto -luv -lpthread -lrt -ldl -lhwloc

oneDNN

Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU

OpenBenchmarking.org, oneDNN 3.0, ms (Fewer Is Better)
AMD SME Enabled: 0.850418 (SE +/- 0.004505, N = 3; MIN: 0.74)
No SME: 0.863726 (SE +/- 0.005137, N = 3; MIN: 0.75)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

Kvazaar

Video Input: Bosphorus 4K - Video Preset: Very Fast

OpenBenchmarking.org, Kvazaar 2.1, Frames Per Second (More Is Better)
No SME: 74.31 (SE +/- 0.95, N = 3)
AMD SME Enabled: 73.32 (SE +/- 0.91, N = 3)
1. (CC) gcc options: -pthread -ftree-vectorize -fvisibility=hidden -O2 -lpthread -lm -lrt

Kvazaar

Video Input: Bosphorus 4K - Video Preset: Ultra Fast

OpenBenchmarking.org, Kvazaar 2.1, Frames Per Second (More Is Better)
No SME: 77.68 (SE +/- 0.77, N = 3)
AMD SME Enabled: 76.22 (SE +/- 0.97, N = 3)
1. (CC) gcc options: -pthread -ftree-vectorize -fvisibility=hidden -O2 -lpthread -lm -lrt

ASTC Encoder

Preset: Exhaustive

OpenBenchmarking.org, ASTC Encoder 4.0, MT/s (More Is Better)
AMD SME Enabled: 11.84 (SE +/- 0.01, N = 3)
No SME: 11.82 (SE +/- 0.01, N = 3)
1. (CXX) g++ options: -O3 -flto -pthread

NAS Parallel Benchmarks

Test / Class: BT.C

OpenBenchmarking.org, NAS Parallel Benchmarks 3.4, Total Mop/s (More Is Better)
No SME: 496467.98 (SE +/- 529.89, N = 3)
AMD SME Enabled: 494917.44 (SE +/- 3984.99, N = 3)
1. (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
2. Open MPI 4.1.4

NAS Parallel Benchmarks

Test / Class: SP.C

OpenBenchmarking.org, NAS Parallel Benchmarks 3.4, Total Mop/s (More Is Better)
No SME: 255564.19 (SE +/- 3645.86, N = 3)
AMD SME Enabled: 253299.33 (SE +/- 2731.53, N = 3)
1. (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
2. Open MPI 4.1.4

oneDNN

Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU

OpenBenchmarking.org, oneDNN 3.0, ms (Fewer Is Better)
No SME: 0.522052 (SE +/- 0.001481, N = 3; MIN: 0.42)
AMD SME Enabled: 0.526628 (SE +/- 0.001428, N = 3; MIN: 0.42)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

ASTC Encoder

Preset: Thorough

OpenBenchmarking.org, ASTC Encoder 4.0, MT/s (More Is Better)
AMD SME Enabled: 106.56 (SE +/- 0.03, N = 3)
No SME: 106.42 (SE +/- 0.06, N = 3)
1. (CXX) g++ options: -O3 -flto -pthread

Rodinia

Test: OpenMP CFD Solver

OpenBenchmarking.org, Rodinia 3.1, Seconds (Fewer Is Better)
No SME: 5.938 (SE +/- 0.012, N = 3)
AMD SME Enabled: 6.043 (SE +/- 0.030, N = 3)
1. (CXX) g++ options: -O2 -lOpenCL

x264

Video Input: Bosphorus 4K

OpenBenchmarking.org, x264 2022-02-22, Frames Per Second (More Is Better)
No SME: 106.86 (SE +/- 1.42, N = 3)
AMD SME Enabled: 103.07 (SE +/- 0.62, N = 3)
1. (CC) gcc options: -ldl -lavformat -lavcodec -lavutil -lswscale -m64 -lm -lpthread -O3 -flto

Xcompact3d Incompact3d

Input: input.i3d 193 Cells Per Direction

OpenBenchmarking.org, Xcompact3d Incompact3d 2021-03-11, Seconds (Fewer Is Better)
No SME: 4.37420527 (SE +/- 0.01135008, N = 3)
AMD SME Enabled: 4.42424568 (SE +/- 0.04122391, N = 3)
1. (F9X) gfortran options: -cpp -O2 -funroll-loops -floop-optimize -fcray-pointer -fbacktrace -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

Embree

Binary: Pathtracer ISPC - Model: Crown

OpenBenchmarking.org, Embree 3.13, Frames Per Second (More Is Better)
No SME: 183.25 (SE +/- 0.65, N = 3; MIN: 135.3 / MAX: 213.14)
AMD SME Enabled: 180.57 (SE +/- 0.40, N = 3; MIN: 129.9 / MAX: 210)

NAS Parallel Benchmarks

Test / Class: FT.C

OpenBenchmarking.org, NAS Parallel Benchmarks 3.4, Total Mop/s (More Is Better)
No SME: 223096.07 (SE +/- 2651.33, N = 4)
AMD SME Enabled: 220214.75 (SE +/- 1868.66, N = 3)
1. (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
2. Open MPI 4.1.4

ACES DGEMM

Sustained Floating-Point Rate

OpenBenchmarking.org, ACES DGEMM 1.0, GFLOP/s (More Is Better)
No SME: 70.37 (SE +/- 0.11, N = 3)
AMD SME Enabled: 70.28 (SE +/- 0.18, N = 3)
1. (CC) gcc options: -O3 -march=native -fopenmp

miniBUDE

Implementation: OpenMP - Input Deck: BM1

OpenBenchmarking.org, miniBUDE 20210901, Billion Interactions/s (More Is Better)
AMD SME Enabled: 291.63 (SE +/- 0.33, N = 3)
No SME: 291.25 (SE +/- 0.65, N = 3)
1. (CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm

miniBUDE

Implementation: OpenMP - Input Deck: BM1

OpenBenchmarking.org, miniBUDE 20210901, GFInst/s (More Is Better)
AMD SME Enabled: 7290.64 (SE +/- 8.37, N = 3)
No SME: 7281.28 (SE +/- 16.32, N = 3)
1. (CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm

KTX-Software toktx

Settings: Zstd Compression 9

OpenBenchmarking.org, KTX-Software toktx 4.0, Seconds (Fewer Is Better)
No SME: 2.734 (SE +/- 0.006, N = 3)
AMD SME Enabled: 2.776 (SE +/- 0.006, N = 3)

libavif avifenc

Encoder Speed: 6

OpenBenchmarking.org, libavif avifenc 0.11, Seconds (Fewer Is Better)
No SME: 2.393 (SE +/- 0.006, N = 3)
AMD SME Enabled: 2.469 (SE +/- 0.019, N = 3)
1. (CXX) g++ options: -O3 -fPIC -lm

NAS Parallel Benchmarks

Test / Class: EP.C

OpenBenchmarking.org, NAS Parallel Benchmarks 3.4, Total Mop/s (More Is Better)
AMD SME Enabled: 16462.35 (SE +/- 73.01, N = 3)
No SME: 16457.94 (SE +/- 54.14, N = 3)
1. (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
2. Open MPI 4.1.4


Phoronix Test Suite v10.8.5