4th Gen AMD EPYC "Genoa" Secure Memory Encryption (SME) benchmarks by Michael Larabel for a future article.
No SME Kernel Notes: Transparent Huge Pages: madviseCompiler Notes: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-12-U8K4Qv/gcc-12-12.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-12-U8K4Qv/gcc-12-12.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -vProcessor Notes: Scaling Governor: acpi-cpufreq performance (Boost: Enabled) - CPU Microcode: 0xa10110dJava Notes: OpenJDK Runtime Environment (build 11.0.17+8-post-Ubuntu-1ubuntu2)Python Notes: Python 3.10.7Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: always-on RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
AMD SME Enabled Processor: 2 x AMD EPYC 9654 96-Core @ 2.40GHz (192 Cores / 384 Threads), Motherboard: AMD Titanite_4G (RTI1002E BIOS), Chipset: AMD Device 14a4, Memory: 1520GB, Disk: 800GB INTEL SSDPF21Q800GB, Graphics: ASPEED, Monitor: VGA HDMI, Network: Broadcom NetXtreme BCM5720 PCIe
OS: Ubuntu 22.10, Kernel: 6.1.0-phx (x86_64), Desktop: GNOME Shell 43.0, Display Server: X Server 1.21.1.4, Vulkan: 1.3.224, Compiler: GCC 12.2.0 + Clang 15.0.2-1, File-System: ext4, Screen Resolution: 1920x1080
OpenBenchmarking.org MIPS, More Is Better 7-Zip Compression 22.01 Test: Decompression Rating No SME AMD SME Enabled 300K 600K 900K 1200K 1500K SE +/- 10921.46, N = 3 SE +/- 7858.68, N = 3 1160632 1169038 1. (CXX) g++ options: -lpthread -ldl -O2 -fPIC
AOM AV1 This is a test of the AOMedia AV1 encoder (libaom) developed by AOMedia and Google as the AV1 Codec Library. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Frames Per Second, More Is Better AOM AV1 3.5 Encoder Mode: Speed 10 Realtime - Input: Bosphorus 4K AMD SME Enabled No SME 8 16 24 32 40 SE +/- 0.56, N = 12 SE +/- 0.53, N = 15 33.12 34.47 1. (CXX) g++ options: -O3 -std=c++11 -U_FORTIFY_SOURCE -lm
ASKAP ASKAP is a set of benchmarks from the Australian SKA Pathfinder. The principal ASKAP benchmarks are the Hogbom Clean Benchmark (tHogbomClean) and Convolutional Resamping Benchmark (tConvolve) as well as some previous ASKAP benchmarks being included as well for OpenCL and CUDA execution of tConvolve. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Mpix/sec, More Is Better ASKAP 1.0 Test: tConvolve MPI - Degridding AMD SME Enabled No SME 20K 40K 60K 80K 100K SE +/- 0.00, N = 3 SE +/- 368.27, N = 3 78718.7 83598.3 1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp
OpenBenchmarking.org Mpix/sec, More Is Better ASKAP 1.0 Test: tConvolve MPI - Gridding AMD SME Enabled No SME 20K 40K 60K 80K 100K SE +/- 422.37, N = 3 SE +/- 460.77, N = 3 89541.8 93071.0 1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp
ASTC Encoder ASTC Encoder (astcenc) is for the Adaptive Scalable Texture Compression (ASTC) format commonly used with OpenGL, OpenGL ES, and Vulkan graphics APIs. This test profile does a coding test of both compression/decompression. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org MT/s, More Is Better ASTC Encoder 4.0 Preset: Thorough No SME AMD SME Enabled 20 40 60 80 100 SE +/- 0.06, N = 3 SE +/- 0.03, N = 3 106.42 106.56 1. (CXX) g++ options: -O3 -flto -pthread
OpenBenchmarking.org MT/s, More Is Better ASTC Encoder 4.0 Preset: Exhaustive No SME AMD SME Enabled 3 6 9 12 15 SE +/- 0.01, N = 3 SE +/- 0.01, N = 3 11.82 11.84 1. (CXX) g++ options: -O3 -flto -pthread
Blender Blender is an open-source 3D creation and modeling software project. This test is of Blender's Cycles performance with various sample files. GPU computing via NVIDIA OptiX and NVIDIA CUDA is currently supported as well as HIP for AMD Radeon GPUs and Intel oneAPI for Intel Graphics. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Seconds, Fewer Is Better Blender 3.4 Blend File: Classroom - Compute: CPU-Only No SME AMD SME Enabled 5 10 15 20 25 SE +/- 0.08, N = 3 SE +/- 0.02, N = 3 20.99 20.95
OpenBenchmarking.org Seconds, Fewer Is Better Blender 3.4 Blend File: Barbershop - Compute: CPU-Only AMD SME Enabled No SME 20 40 60 80 100 SE +/- 0.45, N = 3 SE +/- 0.30, N = 3 81.57 80.77
Embree Intel Embree is a collection of high-performance ray-tracing kernels for execution on CPUs and supporting instruction sets such as SSE, AVX, AVX2, and AVX-512. Embree also supports making use of the Intel SPMD Program Compiler (ISPC). Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Frames Per Second, More Is Better Embree 3.13 Binary: Pathtracer ISPC - Model: Crown AMD SME Enabled No SME 40 80 120 160 200 SE +/- 0.40, N = 3 SE +/- 0.65, N = 3 180.57 183.25 MIN: 129.9 / MAX: 210 MIN: 135.3 / MAX: 213.14
GPAW GPAW is a density-functional theory (DFT) Python code based on the projector-augmented wave (PAW) method and the atomic simulation environment (ASE). Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Seconds, Fewer Is Better GPAW 22.1 Input: Carbon Nanotube No SME AMD SME Enabled 6 12 18 24 30 SE +/- 0.16, N = 3 SE +/- 0.05, N = 3 23.17 22.98 1. (CC) gcc options: -shared -fwrapv -O2 -lxc -lblas -lmpi
Graph500 This is a benchmark of the reference implementation of Graph500, an HPC benchmark focused on data intensive loads and commonly tested on supercomputers for complex data problems. Graph500 primarily stresses the communication subsystem of the hardware under test. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org bfs median_TEPS, More Is Better Graph500 3.0 Scale: 26 AMD SME Enabled No SME 300M 600M 900M 1200M 1500M 1358510000 1426480000 1. (CC) gcc options: -fcommon -O3 -lpthread -lm -lmpi
OpenBenchmarking.org bfs max_TEPS, More Is Better Graph500 3.0 Scale: 26 AMD SME Enabled No SME 300M 600M 900M 1200M 1500M 1526380000 1533180000 1. (CC) gcc options: -fcommon -O3 -lpthread -lm -lmpi
OpenBenchmarking.org sssp median_TEPS, More Is Better Graph500 3.0 Scale: 26 AMD SME Enabled No SME 130M 260M 390M 520M 650M 572510000 593153000 1. (CC) gcc options: -fcommon -O3 -lpthread -lm -lmpi
OpenBenchmarking.org sssp max_TEPS, More Is Better Graph500 3.0 Scale: 26 AMD SME Enabled No SME 200M 400M 600M 800M 1000M 835467000 838505000 1. (CC) gcc options: -fcommon -O3 -lpthread -lm -lmpi
GROMACS The GROMACS (GROningen MAchine for Chemical Simulations) molecular dynamics package testing with the water_GMX50 data. This test profile allows selecting between CPU and GPU-based GROMACS builds. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Ns Per Day, More Is Better GROMACS 2022.1 Implementation: MPI CPU - Input: water_GMX50_bare AMD SME Enabled No SME 5 10 15 20 25 SE +/- 0.03, N = 3 SE +/- 0.03, N = 3 18.62 18.71 1. (CXX) g++ options: -O3
KTX-Software toktx This is a benchmark of The Khronos Group's KTX-Software library and tools. KTX-Software provides "toktx" for converting/creating in the KTX container format for image textures. This benchmark times how long it takes to convert to KTX 2.0 format with various settings using a reference PNG sample input. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Seconds, Fewer Is Better KTX-Software toktx 4.0 Settings: Zstd Compression 9 AMD SME Enabled No SME 0.6246 1.2492 1.8738 2.4984 3.123 SE +/- 0.006, N = 3 SE +/- 0.006, N = 3 2.776 2.734
Kvazaar This is a test of Kvazaar as a CPU-based H.265/HEVC video encoder written in the C programming language and optimized in Assembly. Kvazaar is the winner of the 2016 ACM Open-Source Software Competition and developed at the Ultra Video Group, Tampere University, Finland. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Frames Per Second, More Is Better Kvazaar 2.1 Video Input: Bosphorus 4K - Video Preset: Very Fast AMD SME Enabled No SME 16 32 48 64 80 SE +/- 0.91, N = 3 SE +/- 0.95, N = 3 73.32 74.31 1. (CC) gcc options: -pthread -ftree-vectorize -fvisibility=hidden -O2 -lpthread -lm -lrt
OpenBenchmarking.org Frames Per Second, More Is Better Kvazaar 2.1 Video Input: Bosphorus 4K - Video Preset: Ultra Fast AMD SME Enabled No SME 20 40 60 80 100 SE +/- 0.97, N = 3 SE +/- 0.77, N = 3 76.22 77.68 1. (CC) gcc options: -pthread -ftree-vectorize -fvisibility=hidden -O2 -lpthread -lm -lrt
OpenBenchmarking.org Seconds, Fewer Is Better libavif avifenc 0.11 Encoder Speed: 6 AMD SME Enabled No SME 0.5555 1.111 1.6665 2.222 2.7775 SE +/- 0.019, N = 3 SE +/- 0.006, N = 3 2.469 2.393 1. (CXX) g++ options: -O3 -fPIC -lm
Liquid-DSP LiquidSDR's Liquid-DSP is a software-defined radio (SDR) digital signal processing library. This test profile runs a multi-threaded benchmark of this SDR/DSP library focused on embedded platform usage. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org samples/s, More Is Better Liquid-DSP 2021.01.31 Threads: 256 - Buffer Length: 256 - Filter Length: 57 No SME AMD SME Enabled 2000M 4000M 6000M 8000M 10000M SE +/- 8082903.77, N = 3 SE +/- 5206833.12, N = 3 10332000000 10344666667 1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
OpenBenchmarking.org samples/s, More Is Better Liquid-DSP 2021.01.31 Threads: 384 - Buffer Length: 256 - Filter Length: 57 No SME AMD SME Enabled 2000M 4000M 6000M 8000M 10000M SE +/- 3785938.90, N = 3 SE +/- 3605551.28, N = 3 10346000000 10350000000 1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
miniBUDE MiniBUDE is a mini application for the the core computation of the Bristol University Docking Engine (BUDE). This test profile currently makes use of the OpenMP implementation of miniBUDE for CPU benchmarking. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org GFInst/s, More Is Better miniBUDE 20210901 Implementation: OpenMP - Input Deck: BM1 No SME AMD SME Enabled 1600 3200 4800 6400 8000 SE +/- 16.32, N = 3 SE +/- 8.37, N = 3 7281.28 7290.64 1. (CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm
OpenBenchmarking.org Billion Interactions/s, More Is Better miniBUDE 20210901 Implementation: OpenMP - Input Deck: BM1 No SME AMD SME Enabled 60 120 180 240 300 SE +/- 0.65, N = 3 SE +/- 0.33, N = 3 291.25 291.63 1. (CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm
OpenBenchmarking.org GFInst/s, More Is Better miniBUDE 20210901 Implementation: OpenMP - Input Deck: BM2 AMD SME Enabled No SME 2K 4K 6K 8K 10K SE +/- 99.74, N = 3 SE +/- 80.07, N = 3 8588.75 8633.54 1. (CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm
OpenBenchmarking.org Billion Interactions/s, More Is Better miniBUDE 20210901 Implementation: OpenMP - Input Deck: BM2 AMD SME Enabled No SME 80 160 240 320 400 SE +/- 3.99, N = 3 SE +/- 3.20, N = 3 343.55 345.34 1. (CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm
NAMD NAMD is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. NAMD was developed by the Theoretical and Computational Biophysics Group in the Beckman Institute for Advanced Science and Technology at the University of Illinois at Urbana-Champaign. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org days/ns, Fewer Is Better NAMD 2.14 ATPase Simulation - 327,506 Atoms AMD SME Enabled No SME 0.0292 0.0584 0.0876 0.1168 0.146 SE +/- 0.00010, N = 3 SE +/- 0.00031, N = 3 0.12991 0.12831
NAS Parallel Benchmarks NPB, NAS Parallel Benchmarks, is a benchmark developed by NASA for high-end computer systems. This test profile currently uses the MPI version of NPB. This test profile offers selecting the different NPB tests/problems and varying problem sizes. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Total Mop/s, More Is Better NAS Parallel Benchmarks 3.4 Test / Class: BT.C AMD SME Enabled No SME 110K 220K 330K 440K 550K SE +/- 3984.99, N = 3 SE +/- 529.89, N = 3 494917.44 496467.98 1. (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz 2. Open MPI 4.1.4
OpenBenchmarking.org Total Mop/s, More Is Better NAS Parallel Benchmarks 3.4 Test / Class: EP.C No SME AMD SME Enabled 4K 8K 12K 16K 20K SE +/- 54.14, N = 3 SE +/- 73.01, N = 3 16457.94 16462.35 1. (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz 2. Open MPI 4.1.4
OpenBenchmarking.org Total Mop/s, More Is Better NAS Parallel Benchmarks 3.4 Test / Class: FT.C AMD SME Enabled No SME 50K 100K 150K 200K 250K SE +/- 1868.66, N = 3 SE +/- 2651.33, N = 4 220214.75 223096.07 1. (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz 2. Open MPI 4.1.4
OpenBenchmarking.org Total Mop/s, More Is Better NAS Parallel Benchmarks 3.4 Test / Class: SP.C AMD SME Enabled No SME 50K 100K 150K 200K 250K SE +/- 2731.53, N = 3 SE +/- 3645.86, N = 3 253299.33 255564.19 1. (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz 2. Open MPI 4.1.4
Neural Magic DeepSparse OpenBenchmarking.org items/sec, More Is Better Neural Magic DeepSparse 1.1 Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Asynchronous Multi-Stream AMD SME Enabled No SME 20 40 60 80 100 SE +/- 0.11, N = 3 SE +/- 0.10, N = 3 83.64 84.25
OpenBenchmarking.org ms/batch, Fewer Is Better Neural Magic DeepSparse 1.1 Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Asynchronous Multi-Stream AMD SME Enabled No SME 200 400 600 800 1000 SE +/- 0.31, N = 3 SE +/- 0.43, N = 3 1143.04 1134.42
OpenBenchmarking.org items/sec, More Is Better Neural Magic DeepSparse 1.1 Model: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Scenario: Asynchronous Multi-Stream AMD SME Enabled No SME 160 320 480 640 800 SE +/- 0.69, N = 3 SE +/- 0.54, N = 3 745.64 762.70
OpenBenchmarking.org ms/batch, Fewer Is Better Neural Magic DeepSparse 1.1 Model: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Scenario: Asynchronous Multi-Stream AMD SME Enabled No SME 30 60 90 120 150 SE +/- 0.14, N = 3 SE +/- 0.08, N = 3 128.40 125.53
OpenBenchmarking.org items/sec, More Is Better Neural Magic DeepSparse 1.1 Model: CV Classification, ResNet-50 ImageNet - Scenario: Asynchronous Multi-Stream AMD SME Enabled No SME 400 800 1200 1600 2000 SE +/- 4.55, N = 3 SE +/- 1.94, N = 3 1911.40 1962.10
OpenBenchmarking.org items/sec, More Is Better Neural Magic DeepSparse 1.1 Model: NLP Text Classification, DistilBERT mnli - Scenario: Asynchronous Multi-Stream AMD SME Enabled No SME 300 600 900 1200 1500 SE +/- 1.22, N = 3 SE +/- 2.65, N = 3 1179.00 1204.85
OpenBenchmarking.org ms/batch, Fewer Is Better Neural Magic DeepSparse 1.1 Model: NLP Text Classification, DistilBERT mnli - Scenario: Asynchronous Multi-Stream AMD SME Enabled No SME 20 40 60 80 100 SE +/- 0.07, N = 3 SE +/- 0.15, N = 3 81.23 79.49
OpenBenchmarking.org items/sec, More Is Better Neural Magic DeepSparse 1.1 Model: NLP Text Classification, BERT base uncased SST2 - Scenario: Asynchronous Multi-Stream AMD SME Enabled No SME 130 260 390 520 650 SE +/- 0.11, N = 3 SE +/- 0.96, N = 3 601.93 617.03
OpenBenchmarking.org ms/batch, Fewer Is Better Neural Magic DeepSparse 1.1 Model: NLP Text Classification, BERT base uncased SST2 - Scenario: Asynchronous Multi-Stream AMD SME Enabled No SME 40 80 120 160 200 SE +/- 0.02, N = 3 SE +/- 0.23, N = 3 158.94 155.10
OpenBenchmarking.org items/sec, More Is Better Neural Magic DeepSparse 1.1 Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Asynchronous Multi-Stream AMD SME Enabled No SME 20 40 60 80 100 SE +/- 0.05, N = 3 SE +/- 0.09, N = 3 83.82 84.28
OpenBenchmarking.org ms/batch, Fewer Is Better Neural Magic DeepSparse 1.1 Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Asynchronous Multi-Stream AMD SME Enabled No SME 200 400 600 800 1000 SE +/- 0.38, N = 3 SE +/- 1.72, N = 3 1143.01 1133.31
nginx This is a benchmark of the lightweight Nginx HTTP(S) web-server. This Nginx web server benchmark test profile makes use of the wrk program for facilitating the HTTP requests over a fixed period time with a configurable number of concurrent clients/connections. HTTPS with a self-signed OpenSSL certificate is used by this test for local benchmarking. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Requests Per Second, More Is Better nginx 1.23.2 Connections: 500 AMD SME Enabled No SME 40K 80K 120K 160K 200K SE +/- 124.29, N = 3 SE +/- 238.08, N = 3 196386.41 201056.69 1. (CC) gcc options: -lluajit-5.1 -lm -lssl -lcrypto -lpthread -ldl -std=c99 -O2
NWChem NWChem is an open-source high performance computational chemistry package. Per NWChem's documentation, "NWChem aims to provide its users with computational chemistry tools that are scalable both in their ability to treat large scientific computational chemistry problems efficiently, and in their use of available parallel computing resources from high-performance parallel supercomputers to conventional workstation clusters." Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Seconds, Fewer Is Better NWChem 7.0.2 Input: C240 Buckyball AMD SME Enabled No SME 300 600 900 1200 1500 1543.1 1524.4 1. (F9X) gfortran options: -lnwctask -lccsd -lmcscf -lselci -lmp2 -lmoints -lstepper -ldriver -loptim -lnwdft -lgradients -lcphf -lesp -lddscf -ldangchang -lguess -lhessian -lvib -lnwcutil -lrimp2 -lproperty -lsolvation -lnwints -lprepar -lnwmd -lnwpw -lofpw -lpaw -lpspw -lband -lnwpwlib -lcafe -lspace -lanalyze -lqhop -lpfft -ldplot -ldrdy -lvscf -lqmmm -lqmd -letrans -ltce -lbq -lmm -lcons -lperfm -ldntmc -lccca -ldimqm -lga -larmci -lpeigs -l64to32 -lopenblas -lpthread -lrt -llapack -lnwcblas -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz -lcomex -m64 -ffast-math -std=legacy -fdefault-integer-8 -finline-functions -O2
oneDNN This is a test of the Intel oneDNN as an Intel-optimized library for Deep Neural Networks and making use of its built-in benchdnn functionality. The result is the total perf time reported. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and MKL-DNN before being rebranded as part of the Intel oneAPI toolkit. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org ms, Fewer Is Better oneDNN 3.0 Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU No SME AMD SME Enabled 0.1943 0.3886 0.5829 0.7772 0.9715 SE +/- 0.005137, N = 3 SE +/- 0.004505, N = 3 0.863726 0.850418 MIN: 0.75 MIN: 0.74 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
OpenBenchmarking.org ms, Fewer Is Better oneDNN 3.0 Harness: IP Shapes 3D - Data Type: bf16bf16bf16 - Engine: CPU AMD SME Enabled No SME 0.8827 1.7654 2.6481 3.5308 4.4135 SE +/- 0.05423, N = 3 SE +/- 0.06074, N = 15 3.92313 3.89299 MIN: 2.89 MIN: 2.77 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
OpenBenchmarking.org ms, Fewer Is Better oneDNN 3.0 Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU AMD SME Enabled No SME 0.1185 0.237 0.3555 0.474 0.5925 SE +/- 0.001428, N = 3 SE +/- 0.001481, N = 3 0.526628 0.522052 MIN: 0.42 MIN: 0.42 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
OpenBenchmarking.org ms, Fewer Is Better oneDNN 3.0 Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU AMD SME Enabled No SME 6 12 18 24 30 SE +/- 0.20, N = 3 SE +/- 0.08, N = 3 23.14 22.68 MIN: 20.17 MIN: 19.97 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
OpenBenchmarking.org ms, Fewer Is Better oneDNN 3.0 Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU AMD SME Enabled No SME 0.2067 0.4134 0.6201 0.8268 1.0335 SE +/- 0.004429, N = 3 SE +/- 0.009364, N = 5 0.918482 0.916133 MIN: 0.78 MIN: 0.76 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
OpenBenchmarking.org ms, Fewer Is Better oneDNN 3.0 Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU No SME AMD SME Enabled 400 800 1200 1600 2000 SE +/- 18.08, N = 7 SE +/- 18.68, N = 6 2011.15 2002.43 MIN: 1936.22 MIN: 1924.96 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
ONNX Runtime ONNX Runtime is developed by Microsoft and partners as a open-source, cross-platform, high performance machine learning inferencing and training accelerator. This test profile runs the ONNX Runtime with various models available from the ONNX Zoo. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Inferences Per Minute, More Is Better ONNX Runtime 1.11 Model: super-resolution-10 - Device: CPU - Executor: Standard AMD SME Enabled No SME 1200 2400 3600 4800 6000 SE +/- 40.49, N = 3 SE +/- 15.47, N = 3 5583 5600 1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt
OpenFOAM OpenFOAM is the leading free, open-source software for computational fluid dynamics (CFD). This test profile currently uses the drivaerFastback test case for analyzing automotive aerodynamics or alternatively the older motorBike input. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Seconds, Fewer Is Better OpenFOAM 10 Input: drivaerFastback, Small Mesh Size - Mesh Time AMD SME Enabled No SME 6 12 18 24 30 27.10 25.07 1. (CXX) g++ options: -std=c++14 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -lfiniteVolume -lmeshTools -lparallel -llagrangian -lregionModels -lgenericPatchFields -lOpenFOAM -ldl -lm
OpenBenchmarking.org Seconds, Fewer Is Better OpenFOAM 10 Input: drivaerFastback, Small Mesh Size - Execution Time AMD SME Enabled No SME 5 10 15 20 25 22.13 22.08 1. (CXX) g++ options: -std=c++14 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -lfiniteVolume -lmeshTools -lparallel -llagrangian -lregionModels -lgenericPatchFields -lOpenFOAM -ldl -lm
OpenRadioss OpenRadioss is an open-source AGPL-licensed finite element solver for dynamic event analysis OpenRadioss is based on Altair Radioss and open-sourced in 2022. This open-source finite element solver is benchmarked with various example models available from https://www.openradioss.org/models/. This test is currently using a reference OpenRadioss binary build offered via GitHub. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Seconds, Fewer Is Better OpenRadioss 2022.10.13 Model: Bumper Beam AMD SME Enabled No SME 20 40 60 80 100 SE +/- 0.15, N = 3 SE +/- 0.73, N = 3 79.97 79.85
OpenVINO This is a test of the Intel OpenVINO, a toolkit around neural networks, using its built-in benchmarking support and analyzing the throughput and latency for various models. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org FPS, More Is Better OpenVINO 2022.2.dev Model: Face Detection FP16 - Device: CPU AMD SME Enabled No SME 20 40 60 80 100 SE +/- 0.21, N = 3 SE +/- 0.24, N = 3 101.55 101.90 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared
OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2022.2.dev Model: Face Detection FP16 - Device: CPU AMD SME Enabled No SME 100 200 300 400 500 SE +/- 0.71, N = 3 SE +/- 0.87, N = 3 471.53 469.81 MIN: 415.73 / MAX: 547.47 MIN: 402.31 / MAX: 539.43 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared
OpenBenchmarking.org FPS, More Is Better OpenVINO 2022.2.dev Model: Person Detection FP16 - Device: CPU AMD SME Enabled No SME 10 20 30 40 50 SE +/- 0.21, N = 3 SE +/- 0.12, N = 3 42.33 43.29 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared
OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2022.2.dev Model: Person Detection FP16 - Device: CPU AMD SME Enabled No SME 200 400 600 800 1000 SE +/- 5.54, N = 3 SE +/- 3.23, N = 3 1127.51 1102.15 MIN: 802.53 / MAX: 1835.35 MIN: 799.38 / MAX: 1782.76 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared
OpenBenchmarking.org FPS, More Is Better OpenVINO 2022.2.dev Model: Person Detection FP32 - Device: CPU AMD SME Enabled No SME 10 20 30 40 50 SE +/- 0.02, N = 3 SE +/- 0.23, N = 3 42.03 42.76 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared
OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2022.2.dev Model: Person Detection FP32 - Device: CPU AMD SME Enabled No SME 200 400 600 800 1000 SE +/- 0.32, N = 3 SE +/- 5.83, N = 3 1134.68 1115.56 MIN: 842.92 / MAX: 1806.88 MIN: 773.47 / MAX: 1806.09 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared
OpenBenchmarking.org FPS, More Is Better OpenVINO 2022.2.dev Model: Vehicle Detection FP16 - Device: CPU AMD SME Enabled No SME 1600 3200 4800 6400 8000 SE +/- 4.95, N = 3 SE +/- 3.74, N = 3 7274.98 7437.73 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared
OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2022.2.dev Model: Vehicle Detection FP16 - Device: CPU AMD SME Enabled No SME 2 4 6 8 10 SE +/- 0.01, N = 3 SE +/- 0.00, N = 3 6.59 6.44 MIN: 5 / MAX: 63.2 MIN: 5.05 / MAX: 61.63 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared
OpenBenchmarking.org FPS, More Is Better OpenVINO 2022.2.dev Model: Face Detection FP16-INT8 - Device: CPU AMD SME Enabled No SME 40 80 120 160 200 SE +/- 0.05, N = 3 SE +/- 0.05, N = 3 193.83 193.95 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared
OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2022.2.dev Model: Face Detection FP16-INT8 - Device: CPU AMD SME Enabled No SME 50 100 150 200 250 SE +/- 0.05, N = 3 SE +/- 0.06, N = 3 247.23 246.95 MIN: 208.68 / MAX: 293.42 MIN: 205.26 / MAX: 303.51 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared
OpenBenchmarking.org FPS, More Is Better OpenVINO 2022.2.dev Model: Vehicle Detection FP16-INT8 - Device: CPU No SME AMD SME Enabled 2K 4K 6K 8K 10K SE +/- 2.65, N = 3 SE +/- 4.87, N = 3 11180.63 11184.76 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared
OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2022.2.dev Model: Vehicle Detection FP16-INT8 - Device: CPU AMD SME Enabled No SME 0.963 1.926 2.889 3.852 4.815 SE +/- 0.00, N = 3 SE +/- 0.00, N = 3 4.28 4.28 MIN: 3.5 / MAX: 42.32 MIN: 3.51 / MAX: 38.77 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared
OpenBenchmarking.org FPS, More Is Better OpenVINO 2022.2.dev Model: Weld Porosity Detection FP16 - Device: CPU AMD SME Enabled No SME 2K 4K 6K 8K 10K SE +/- 13.80, N = 3 SE +/- 3.98, N = 3 9990.19 9997.68 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared
OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2022.2.dev Model: Weld Porosity Detection FP16 - Device: CPU AMD SME Enabled No SME 1.0778 2.1556 3.2334 4.3112 5.389 SE +/- 0.01, N = 3 SE +/- 0.00, N = 3 4.79 4.79 MIN: 3.95 / MAX: 30.1 MIN: 3.96 / MAX: 30.92 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared
OpenBenchmarking.org FPS, More Is Better OpenVINO 2022.2.dev Model: Machine Translation EN To DE FP16 - Device: CPU AMD SME Enabled No SME 200 400 600 800 1000 SE +/- 1.50, N = 3 SE +/- 1.57, N = 3 963.14 967.90 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared
OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2022.2.dev Model: Machine Translation EN To DE FP16 - Device: CPU AMD SME Enabled No SME 11 22 33 44 55 SE +/- 0.08, N = 3 SE +/- 0.08, N = 3 49.78 49.54 MIN: 38.76 / MAX: 189.77 MIN: 37.27 / MAX: 225.52 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared
OpenBenchmarking.org FPS, More Is Better OpenVINO 2022.2.dev Model: Weld Porosity Detection FP16-INT8 - Device: CPU AMD SME Enabled No SME 4K 8K 12K 16K 20K SE +/- 4.31, N = 3 SE +/- 18.52, N = 3 19704.84 19801.40 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared
OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2022.2.dev Model: Weld Porosity Detection FP16-INT8 - Device: CPU AMD SME Enabled No SME 3 6 9 12 15 SE +/- 0.00, N = 3 SE +/- 0.01, N = 3 9.67 9.62 MIN: 8.32 / MAX: 78.9 MIN: 8.26 / MAX: 57.97 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared
OpenBenchmarking.org FPS, More Is Better OpenVINO 2022.2.dev Model: Person Vehicle Bike Detection FP16 - Device: CPU AMD SME Enabled No SME 2K 4K 6K 8K 10K SE +/- 6.93, N = 3 SE +/- 7.89, N = 3 8993.90 9027.72 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared
OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2022.2.dev Model: Person Vehicle Bike Detection FP16 - Device: CPU AMD SME Enabled No SME 1.1993 2.3986 3.5979 4.7972 5.9965 SE +/- 0.00, N = 3 SE +/- 0.01, N = 3 5.33 5.31 MIN: 4.45 / MAX: 44.32 MIN: 4.34 / MAX: 44.16 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared
OpenBenchmarking.org FPS, More Is Better OpenVINO 2022.2.dev Model: Age Gender Recognition Retail 0013 FP16 - Device: CPU AMD SME Enabled No SME 30K 60K 90K 120K 150K SE +/- 1824.08, N = 4 SE +/- 878.85, N = 3 148736.04 150792.42 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared
OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2022.2.dev Model: Age Gender Recognition Retail 0013 FP16 - Device: CPU AMD SME Enabled No SME 0.1238 0.2476 0.3714 0.4952 0.619 SE +/- 0.00, N = 4 SE +/- 0.00, N = 3 0.55 0.55 MIN: 0.5 / MAX: 36.47 MIN: 0.5 / MAX: 30.13 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared
OpenBenchmarking.org FPS, More Is Better OpenVINO 2022.2.dev Model: Age Gender Recognition Retail 0013 FP16-INT8 - Device: CPU No SME AMD SME Enabled 40K 80K 120K 160K 200K SE +/- 2225.36, N = 3 SE +/- 702.31, N = 3 165194.42 167545.54 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared
OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2022.2.dev Model: Age Gender Recognition Retail 0013 FP16-INT8 - Device: CPU AMD SME Enabled No SME 0.081 0.162 0.243 0.324 0.405 SE +/- 0.00, N = 3 SE +/- 0.00, N = 3 0.36 0.36 MIN: 0.34 / MAX: 47.65 MIN: 0.34 / MAX: 40.99 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared
OpenVKL OpenVKL is the Intel Open Volume Kernel Library that offers high-performance volume computation kernels and part of the Intel oneAPI rendering toolkit. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Items / Sec, More Is Better OpenVKL 1.3.1 Benchmark: vklBenchmark ISPC AMD SME Enabled No SME 300 600 900 1200 1500 SE +/- 15.55, N = 4 SE +/- 6.81, N = 3 1286 1322 MIN: 328 / MAX: 5485 MIN: 329 / MAX: 4485
OSPRay Intel OSPRay is a portable ray-tracing engine for high-performance, high-fidelity scientific visualizations. OSPRay builds off Intel's Embree and Intel SPMD Program Compiler (ISPC) components as part of the oneAPI rendering toolkit. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Items Per Second, More Is Better OSPRay 2.10 Benchmark: particle_volume/pathtracer/real_time AMD SME Enabled No SME 50 100 150 200 250 SE +/- 1.51, N = 3 SE +/- 1.25, N = 3 229.88 230.62
OpenBenchmarking.org Items Per Second, More Is Better OSPRay 2.10 Benchmark: gravity_spheres_volume/dim_512/ao/real_time AMD SME Enabled No SME 10 20 30 40 50 SE +/- 0.14, N = 3 SE +/- 0.04, N = 3 43.38 43.85
OSPRay Studio Intel OSPRay Studio is an open-source, interactive visualization and ray-tracing software package. OSPRay Studio makes use of Intel OSPRay, a portable ray-tracing engine for high-performance, high-fidelity visualizations. OSPRay builds off Intel's Embree and Intel SPMD Program Compiler (ISPC) components as part of the oneAPI rendering toolkit. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org ms, Fewer Is Better OSPRay Studio 0.11 Camera: 3 - Resolution: 4K - Samples Per Pixel: 32 - Renderer: Path Tracer AMD SME Enabled No SME 5K 10K 15K 20K 25K SE +/- 32.58, N = 3 SE +/- 6.36, N = 3 22614 22043 1. (CXX) g++ options: -O3 -ldl
PostgreSQL This is a benchmark of PostgreSQL using the integrated pgbench for facilitating the database benchmarks. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org TPS, More Is Better PostgreSQL 15 Scaling Factor: 100 - Clients: 250 - Mode: Read Only No SME AMD SME Enabled 600K 1200K 1800K 2400K 3000K SE +/- 16891.69, N = 3 SE +/- 40566.19, N = 3 2951147 2970869 1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lm
PyHPC Benchmarks PyHPC-Benchmarks is a suite of Python high performance computing benchmarks for execution on CPUs and GPUs using various popular Python HPC libraries. The PyHPC CPU-based benchmarks focus on sequential CPU performance. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Seconds, Fewer Is Better PyHPC Benchmarks 3.0 Device: CPU - Backend: Numpy - Project Size: 4194304 - Benchmark: Equation of State AMD SME Enabled No SME 0.2102 0.4204 0.6306 0.8408 1.051 SE +/- 0.003, N = 3 SE +/- 0.002, N = 3 0.934 0.884
OpenBenchmarking.org Seconds, Fewer Is Better PyHPC Benchmarks 3.0 Device: CPU - Backend: Numpy - Project Size: 4194304 - Benchmark: Isoneutral Mixing AMD SME Enabled No SME 0.3924 0.7848 1.1772 1.5696 1.962 SE +/- 0.005, N = 3 SE +/- 0.006, N = 3 1.744 1.691
QuantLib QuantLib is an open-source library/framework around quantitative finance for modeling, trading and risk management scenarios. QuantLib is written in C++ with Boost and its built-in benchmark used reports the QuantLib Benchmark Index benchmark score. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org MFLOPS, More Is Better QuantLib 1.21 No SME AMD SME Enabled 700 1400 2100 2800 3500 SE +/- 6.39, N = 3 SE +/- 8.14, N = 3 3052.8 3061.3 1. (CXX) g++ options: -O3 -march=native -rdynamic
RELION RELION - REgularised LIkelihood OptimisatioN - is a stand-alone computer program for Maximum A Posteriori refinement of (multiple) 3D reconstructions or 2D class averages in cryo-electron microscopy (cryo-EM). It is developed in the research group of Sjors Scheres at the MRC Laboratory of Molecular Biology. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Seconds, Fewer Is Better RELION 3.1.1 Test: Basic - Device: CPU AMD SME Enabled No SME 30 60 90 120 150 SE +/- 1.42, N = 5 SE +/- 1.40, N = 5 130.43 128.66 1. (CXX) g++ options: -fopenmp -std=c++0x -O3 -rdynamic -ldl -ltiff -lfftw3f -lfftw3 -lpng -lmpi_cxx -lmpi
OpenBenchmarking.org ms, Fewer Is Better Renaissance 0.14 Test: In-Memory Database Shootout AMD SME Enabled No SME 1000 2000 3000 4000 5000 SE +/- 69.41, N = 3 SE +/- 54.74, N = 12 4838.5 4764.6 MIN: 4339.45 / MAX: 6109.38 MIN: 4124.15 / MAX: 6577.01
Rodinia Rodinia is a suite focused upon accelerating compute-intensive applications with accelerators. CUDA, OpenMP, and OpenCL parallel models are supported by the included applications. This profile utilizes select OpenCL, NVIDIA CUDA and OpenMP test binaries at the moment. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Seconds, Fewer Is Better Rodinia 3.1 Test: OpenMP LavaMD AMD SME Enabled No SME 4 8 12 16 20 SE +/- 0.05, N = 3 SE +/- 0.13, N = 3 16.67 16.51 1. (CXX) g++ options: -O2 -lOpenCL
OpenBenchmarking.org Seconds, Fewer Is Better Rodinia 3.1 Test: OpenMP CFD Solver AMD SME Enabled No SME 2 4 6 8 10 SE +/- 0.030, N = 3 SE +/- 0.012, N = 3 6.043 5.938 1. (CXX) g++ options: -O2 -lOpenCL
srsRAN srsRAN is an open-source LTE/5G software radio suite created by Software Radio Systems (SRS). The srsRAN radio suite was formerly known as srsLTE and can be used for building your own software-defined radio (SDR) 4G/5G mobile network. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Samples / Second, More Is Better srsRAN 22.04.1 Test: OFDM_Test No SME AMD SME Enabled 30M 60M 90M 120M 150M SE +/- 600925.21, N = 3 SE +/- 883804.91, N = 3 161733333 162633333 1. (CXX) g++ options: -std=c++14 -fno-strict-aliasing -march=native -mfpmath=sse -mavx2 -fvisibility=hidden -O3 -fno-trapping-math -fno-math-errno -mavx512f -mavx512cd -mavx512bw -mavx512dq -ldl -lpthread -lm
OpenBenchmarking.org eNb Mb/s, More Is Better srsRAN 22.04.1 Test: 4G PHY_DL_Test 100 PRB MIMO 64-QAM AMD SME Enabled No SME 90 180 270 360 450 SE +/- 3.12, N = 3 SE +/- 0.49, N = 3 408.5 415.1 1. (CXX) g++ options: -std=c++14 -fno-strict-aliasing -march=native -mfpmath=sse -mavx2 -fvisibility=hidden -O3 -fno-trapping-math -fno-math-errno -mavx512f -mavx512cd -mavx512bw -mavx512dq -ldl -lpthread -lm
OpenBenchmarking.org UE Mb/s, More Is Better srsRAN 22.04.1 Test: 4G PHY_DL_Test 100 PRB MIMO 64-QAM AMD SME Enabled No SME 30 60 90 120 150 SE +/- 0.31, N = 3 SE +/- 0.25, N = 3 157.7 157.8 1. (CXX) g++ options: -std=c++14 -fno-strict-aliasing -march=native -mfpmath=sse -mavx2 -fvisibility=hidden -O3 -fno-trapping-math -fno-math-errno -mavx512f -mavx512cd -mavx512bw -mavx512dq -ldl -lpthread -lm
OpenBenchmarking.org eNb Mb/s, More Is Better srsRAN 22.04.1 Test: 4G PHY_DL_Test 100 PRB SISO 64-QAM No SME AMD SME Enabled 90 180 270 360 450 SE +/- 0.64, N = 3 SE +/- 0.45, N = 3 413.9 415.7 1. (CXX) g++ options: -std=c++14 -fno-strict-aliasing -march=native -mfpmath=sse -mavx2 -fvisibility=hidden -O3 -fno-trapping-math -fno-math-errno -mavx512f -mavx512cd -mavx512bw -mavx512dq -ldl -lpthread -lm
OpenBenchmarking.org UE Mb/s, More Is Better srsRAN 22.04.1 Test: 4G PHY_DL_Test 100 PRB SISO 64-QAM No SME AMD SME Enabled 40 80 120 160 200 SE +/- 0.84, N = 3 SE +/- 0.46, N = 3 165.8 165.9 1. (CXX) g++ options: -std=c++14 -fno-strict-aliasing -march=native -mfpmath=sse -mavx2 -fvisibility=hidden -O3 -fno-trapping-math -fno-math-errno -mavx512f -mavx512cd -mavx512bw -mavx512dq -ldl -lpthread -lm
OpenBenchmarking.org eNb Mb/s, More Is Better srsRAN 22.04.1 Test: 4G PHY_DL_Test 100 PRB MIMO 256-QAM AMD SME Enabled No SME 100 200 300 400 500 SE +/- 1.01, N = 3 SE +/- 1.49, N = 3 444.8 445.2 1. (CXX) g++ options: -std=c++14 -fno-strict-aliasing -march=native -mfpmath=sse -mavx2 -fvisibility=hidden -O3 -fno-trapping-math -fno-math-errno -mavx512f -mavx512cd -mavx512bw -mavx512dq -ldl -lpthread -lm
OpenBenchmarking.org UE Mb/s, More Is Better srsRAN 22.04.1 Test: 4G PHY_DL_Test 100 PRB MIMO 256-QAM AMD SME Enabled No SME 40 80 120 160 200 SE +/- 0.41, N = 3 SE +/- 0.24, N = 3 165.7 166.0 1. (CXX) g++ options: -std=c++14 -fno-strict-aliasing -march=native -mfpmath=sse -mavx2 -fvisibility=hidden -O3 -fno-trapping-math -fno-math-errno -mavx512f -mavx512cd -mavx512bw -mavx512dq -ldl -lpthread -lm
OpenBenchmarking.org eNb Mb/s, More Is Better srsRAN 22.04.1 Test: 4G PHY_DL_Test 100 PRB SISO 256-QAM No SME AMD SME Enabled 100 200 300 400 500 SE +/- 1.15, N = 3 SE +/- 0.03, N = 3 444.0 445.7 1. (CXX) g++ options: -std=c++14 -fno-strict-aliasing -march=native -mfpmath=sse -mavx2 -fvisibility=hidden -O3 -fno-trapping-math -fno-math-errno -mavx512f -mavx512cd -mavx512bw -mavx512dq -ldl -lpthread -lm
OpenBenchmarking.org UE Mb/s, More Is Better srsRAN 22.04.1 Test: 4G PHY_DL_Test 100 PRB SISO 256-QAM AMD SME Enabled No SME 40 80 120 160 200 SE +/- 0.09, N = 3 SE +/- 0.47, N = 3 172.2 172.7 1. (CXX) g++ options: -std=c++14 -fno-strict-aliasing -march=native -mfpmath=sse -mavx2 -fvisibility=hidden -O3 -fno-trapping-math -fno-math-errno -mavx512f -mavx512cd -mavx512bw -mavx512dq -ldl -lpthread -lm
OpenBenchmarking.org eNb Mb/s, More Is Better srsRAN 22.04.1 Test: 5G PHY_DL_NR Test 52 PRB SISO 64-QAM AMD SME Enabled No SME 30 60 90 120 150 SE +/- 0.19, N = 3 SE +/- 0.32, N = 3 139.1 139.7 1. (CXX) g++ options: -std=c++14 -fno-strict-aliasing -march=native -mfpmath=sse -mavx2 -fvisibility=hidden -O3 -fno-trapping-math -fno-math-errno -mavx512f -mavx512cd -mavx512bw -mavx512dq -ldl -lpthread -lm
OpenBenchmarking.org UE Mb/s, More Is Better srsRAN 22.04.1 Test: 5G PHY_DL_NR Test 52 PRB SISO 64-QAM No SME AMD SME Enabled 20 40 60 80 100 SE +/- 0.22, N = 3 SE +/- 0.09, N = 3 94.4 94.9 1. (CXX) g++ options: -std=c++14 -fno-strict-aliasing -march=native -mfpmath=sse -mavx2 -fvisibility=hidden -O3 -fno-trapping-math -fno-math-errno -mavx512f -mavx512cd -mavx512bw -mavx512dq -ldl -lpthread -lm
SVT-AV1 This is a benchmark of the SVT-AV1 open-source video encoder/decoder. SVT-AV1 was originally developed by Intel as part of their Open Visual Cloud / Scalable Video Technology (SVT). Development of SVT-AV1 has since moved to the Alliance for Open Media as part of upstream AV1 development. SVT-AV1 is a CPU-based multi-threaded video encoder for the AV1 video format with a sample YUV video file. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 1.4 Encoder Mode: Preset 13 - Input: Bosphorus 4K No SME AMD SME Enabled 50 100 150 200 250 SE +/- 6.22, N = 15 SE +/- 4.08, N = 15 248.33 251.44
TensorFlow This is a benchmark of the TensorFlow deep learning framework using the TensorFlow reference benchmarks (tensorflow/benchmarks with tf_cnn_benchmarks.py). Note with the Phoronix Test Suite there is also pts/tensorflow-lite for benchmarking the TensorFlow Lite binaries too. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org images/sec, More Is Better TensorFlow 2.10 Device: CPU - Batch Size: 64 - Model: AlexNet AMD SME Enabled No SME 110 220 330 440 550 SE +/- 7.26, N = 15 SE +/- 6.01, N = 15 505.26 508.40
WRF WRF, the Weather Research and Forecasting Model, is a "next-generation mesoscale numerical weather prediction system designed for both atmospheric research and operational forecasting applications. It features two dynamical cores, a data assimilation system, and a software architecture supporting parallel computation and system extensibility." Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Seconds, Fewer Is Better WRF 4.2.2 Input: conus 2.5km AMD SME Enabled No SME 900 1800 2700 3600 4500 4116.62 4077.19 1. (F9X) gfortran options: -O2 -ftree-vectorize -funroll-loops -ffree-form -fconvert=big-endian -frecord-marker=4 -fallow-invalid-boz -lesmf_time -lwrfio_nf -lnetcdff -lnetcdf -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
x264 This is a multi-threaded test of the x264 video encoder run on the CPU with a choice of 1080p or 4K video input. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Frames Per Second, More Is Better x264 2022-02-22 Video Input: Bosphorus 4K AMD SME Enabled No SME 20 40 60 80 100 SE +/- 0.62, N = 3 SE +/- 1.42, N = 3 103.07 106.86 1. (CC) gcc options: -ldl -lavformat -lavcodec -lavutil -lswscale -m64 -lm -lpthread -O3 -flto
x265 This is a simple test of the x265 encoder run on the CPU with 1080p and 4K options for H.265 video encode performance with x265. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Frames Per Second, More Is Better x265 3.4 Video Input: Bosphorus 4K AMD SME Enabled No SME 6 12 18 24 30 SE +/- 0.17, N = 3 SE +/- 0.29, N = 4 23.29 23.48 1. (CXX) g++ options: -O3 -rdynamic -lpthread -lrt -ldl -lnuma
Xcompact3d Incompact3d Xcompact3d Incompact3d is a Fortran-MPI based, finite difference high-performance code for solving the incompressible Navier-Stokes equation and as many as you need scalar transport equations. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Seconds, Fewer Is Better Xcompact3d Incompact3d 2021-03-11 Input: input.i3d 193 Cells Per Direction AMD SME Enabled No SME 0.9955 1.991 2.9865 3.982 4.9775 SE +/- 0.04122391, N = 3 SE +/- 0.01135008, N = 3 4.42424568 4.37420527 1. (F9X) gfortran options: -cpp -O2 -funroll-loops -floop-optimize -fcray-pointer -fbacktrace -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
Xmrig Xmrig is an open-source cross-platform CPU/GPU miner for RandomX, KawPow, CryptoNight and AstroBWT. This test profile is setup to measure the Xmlrig CPU mining performance. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org H/s, More Is Better Xmrig 6.18.1 Variant: Monero - Hash Count: 1M AMD SME Enabled No SME 20K 40K 60K 80K 100K SE +/- 540.08, N = 3 SE +/- 111.86, N = 3 101932.1 105141.7 1. (CXX) g++ options: -fexceptions -fno-rtti -maes -O3 -Ofast -static-libgcc -static-libstdc++ -rdynamic -lssl -lcrypto -luv -lpthread -lrt -ldl -lhwloc
OpenBenchmarking.org H/s, More Is Better Xmrig 6.18.1 Variant: Wownero - Hash Count: 1M AMD SME Enabled No SME 30K 60K 90K 120K 150K SE +/- 341.44, N = 3 SE +/- 211.86, N = 3 123484.1 126508.3 1. (CXX) g++ options: -fexceptions -fno-rtti -maes -O3 -Ofast -static-libgcc -static-libstdc++ -rdynamic -lssl -lcrypto -luv -lpthread -lrt -ldl -lhwloc
Zstd Compression This test measures the time needed to compress/decompress a sample file (a FreeBSD disk image - FreeBSD-12.2-RELEASE-amd64-memstick.img) using Zstd compression with options for different compression levels / settings. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org MB/s, More Is Better Zstd Compression 1.5.0 Compression Level: 19, Long Mode - Compression Speed AMD SME Enabled No SME 12 24 36 48 60 SE +/- 0.70, N = 3 SE +/- 1.03, N = 15 49.9 52.9 1. (CC) gcc options: -O3 -pthread -lz -llzma
OpenBenchmarking.org MB/s, More Is Better Zstd Compression 1.5.0 Compression Level: 19, Long Mode - Decompression Speed No SME AMD SME Enabled 800 1600 2400 3200 4000 SE +/- 14.95, N = 15 SE +/- 1.04, N = 3 3825.0 3837.3 1. (CC) gcc options: -O3 -pthread -lz -llzma
No SME Kernel Notes: Transparent Huge Pages: madviseCompiler Notes: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-12-U8K4Qv/gcc-12-12.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-12-U8K4Qv/gcc-12-12.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -vProcessor Notes: Scaling Governor: acpi-cpufreq performance (Boost: Enabled) - CPU Microcode: 0xa10110dJava Notes: OpenJDK Runtime Environment (build 11.0.17+8-post-Ubuntu-1ubuntu2)Python Notes: Python 3.10.7Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: always-on RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
Testing initiated at 20 December 2022 18:43 by user phoronix.
AMD SME Enabled Processor: 2 x AMD EPYC 9654 96-Core @ 2.40GHz (192 Cores / 384 Threads), Motherboard: AMD Titanite_4G (RTI1002E BIOS), Chipset: AMD Device 14a4, Memory: 1520GB, Disk: 800GB INTEL SSDPF21Q800GB, Graphics: ASPEED, Monitor: VGA HDMI, Network: Broadcom NetXtreme BCM5720 PCIe
OS: Ubuntu 22.10, Kernel: 6.1.0-phx (x86_64), Desktop: GNOME Shell 43.0, Display Server: X Server 1.21.1.4, Vulkan: 1.3.224, Compiler: GCC 12.2.0 + Clang 15.0.2-1, File-System: ext4, Screen Resolution: 1920x1080
Kernel Notes: Transparent Huge Pages: madviseCompiler Notes: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-12-U8K4Qv/gcc-12-12.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-12-U8K4Qv/gcc-12-12.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -vProcessor Notes: Scaling Governor: acpi-cpufreq performance (Boost: Enabled) - CPU Microcode: 0xa10110dJava Notes: OpenJDK Runtime Environment (build 11.0.17+8-post-Ubuntu-1ubuntu2)Python Notes: Python 3.10.7Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: always-on RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
Testing initiated at 19 December 2022 16:26 by user phoronix.