4th Gen AMD EPYC "Genoa" Secure Memory Encryption (SME) benchmarks by Michael Larabel for a future article.
No SME Kernel Notes: Transparent Huge Pages: madviseCompiler Notes: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-12-U8K4Qv/gcc-12-12.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-12-U8K4Qv/gcc-12-12.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -vProcessor Notes: Scaling Governor: acpi-cpufreq performance (Boost: Enabled) - CPU Microcode: 0xa10110dJava Notes: OpenJDK Runtime Environment (build 11.0.17+8-post-Ubuntu-1ubuntu2)Python Notes: Python 3.10.7Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: always-on RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
AMD SME Enabled Processor: 2 x AMD EPYC 9654 96-Core @ 2.40GHz (192 Cores / 384 Threads), Motherboard: AMD Titanite_4G (RTI1002E BIOS), Chipset: AMD Device 14a4, Memory: 1520GB, Disk: 800GB INTEL SSDPF21Q800GB, Graphics: ASPEED, Monitor: VGA HDMI, Network: Broadcom NetXtreme BCM5720 PCIe
OS: Ubuntu 22.10, Kernel: 6.1.0-phx (x86_64), Desktop: GNOME Shell 43.0, Display Server: X Server 1.21.1.4, Vulkan: 1.3.224, Compiler: GCC 12.2.0 + Clang 15.0.2-1, File-System: ext4, Screen Resolution: 1920x1080
OpenBenchmarking.org MIPS, More Is Better 7-Zip Compression 22.01 Test: Decompression Rating No SME AMD SME Enabled 300K 600K 900K 1200K 1500K SE +/- 10921.46, N = 3 SE +/- 7858.68, N = 3 1160632 1169038 1. (CXX) g++ options: -lpthread -ldl -O2 -fPIC
AOM AV1 This is a test of the AOMedia AV1 encoder (libaom) developed by AOMedia and Google as the AV1 Codec Library. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Frames Per Second, More Is Better AOM AV1 3.5 Encoder Mode: Speed 10 Realtime - Input: Bosphorus 4K No SME AMD SME Enabled 8 16 24 32 40 SE +/- 0.53, N = 15 SE +/- 0.56, N = 12 34.47 33.12 1. (CXX) g++ options: -O3 -std=c++11 -U_FORTIFY_SOURCE -lm
ASKAP ASKAP is a set of benchmarks from the Australian SKA Pathfinder. The principal ASKAP benchmarks are the Hogbom Clean Benchmark (tHogbomClean) and Convolutional Resamping Benchmark (tConvolve) as well as some previous ASKAP benchmarks being included as well for OpenCL and CUDA execution of tConvolve. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Mpix/sec, More Is Better ASKAP 1.0 Test: tConvolve MPI - Degridding No SME AMD SME Enabled 20K 40K 60K 80K 100K SE +/- 368.27, N = 3 SE +/- 0.00, N = 3 83598.3 78718.7 1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp
OpenBenchmarking.org Mpix/sec, More Is Better ASKAP 1.0 Test: tConvolve MPI - Gridding No SME AMD SME Enabled 20K 40K 60K 80K 100K SE +/- 460.77, N = 3 SE +/- 422.37, N = 3 93071.0 89541.8 1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp
ASTC Encoder ASTC Encoder (astcenc) is for the Adaptive Scalable Texture Compression (ASTC) format commonly used with OpenGL, OpenGL ES, and Vulkan graphics APIs. This test profile does a coding test of both compression/decompression. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org MT/s, More Is Better ASTC Encoder 4.0 Preset: Thorough No SME AMD SME Enabled 20 40 60 80 100 SE +/- 0.06, N = 3 SE +/- 0.03, N = 3 106.42 106.56 1. (CXX) g++ options: -O3 -flto -pthread
OpenBenchmarking.org MT/s, More Is Better ASTC Encoder 4.0 Preset: Exhaustive No SME AMD SME Enabled 3 6 9 12 15 SE +/- 0.01, N = 3 SE +/- 0.01, N = 3 11.82 11.84 1. (CXX) g++ options: -O3 -flto -pthread
Blender Blender is an open-source 3D creation and modeling software project. This test is of Blender's Cycles performance with various sample files. GPU computing via NVIDIA OptiX and NVIDIA CUDA is currently supported as well as HIP for AMD Radeon GPUs and Intel oneAPI for Intel Graphics. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Seconds, Fewer Is Better Blender 3.4 Blend File: Classroom - Compute: CPU-Only No SME AMD SME Enabled 5 10 15 20 25 SE +/- 0.08, N = 3 SE +/- 0.02, N = 3 20.99 20.95
OpenBenchmarking.org Seconds, Fewer Is Better Blender 3.4 Blend File: Barbershop - Compute: CPU-Only No SME AMD SME Enabled 20 40 60 80 100 SE +/- 0.30, N = 3 SE +/- 0.45, N = 3 80.77 81.57
Embree Intel Embree is a collection of high-performance ray-tracing kernels for execution on CPUs and supporting instruction sets such as SSE, AVX, AVX2, and AVX-512. Embree also supports making use of the Intel SPMD Program Compiler (ISPC). Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Frames Per Second, More Is Better Embree 3.13 Binary: Pathtracer ISPC - Model: Crown No SME AMD SME Enabled 40 80 120 160 200 SE +/- 0.65, N = 3 SE +/- 0.40, N = 3 183.25 180.57 MIN: 135.3 / MAX: 213.14 MIN: 129.9 / MAX: 210
GPAW GPAW is a density-functional theory (DFT) Python code based on the projector-augmented wave (PAW) method and the atomic simulation environment (ASE). Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Seconds, Fewer Is Better GPAW 22.1 Input: Carbon Nanotube No SME AMD SME Enabled 6 12 18 24 30 SE +/- 0.16, N = 3 SE +/- 0.05, N = 3 23.17 22.98 1. (CC) gcc options: -shared -fwrapv -O2 -lxc -lblas -lmpi
Graph500 This is a benchmark of the reference implementation of Graph500, an HPC benchmark focused on data intensive loads and commonly tested on supercomputers for complex data problems. Graph500 primarily stresses the communication subsystem of the hardware under test. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org bfs median_TEPS, More Is Better Graph500 3.0 Scale: 26 No SME AMD SME Enabled 300M 600M 900M 1200M 1500M 1426480000 1358510000 1. (CC) gcc options: -fcommon -O3 -lpthread -lm -lmpi
OpenBenchmarking.org bfs max_TEPS, More Is Better Graph500 3.0 Scale: 26 No SME AMD SME Enabled 300M 600M 900M 1200M 1500M 1533180000 1526380000 1. (CC) gcc options: -fcommon -O3 -lpthread -lm -lmpi
OpenBenchmarking.org sssp median_TEPS, More Is Better Graph500 3.0 Scale: 26 No SME AMD SME Enabled 130M 260M 390M 520M 650M 593153000 572510000 1. (CC) gcc options: -fcommon -O3 -lpthread -lm -lmpi
OpenBenchmarking.org sssp max_TEPS, More Is Better Graph500 3.0 Scale: 26 No SME AMD SME Enabled 200M 400M 600M 800M 1000M 838505000 835467000 1. (CC) gcc options: -fcommon -O3 -lpthread -lm -lmpi
GROMACS The GROMACS (GROningen MAchine for Chemical Simulations) molecular dynamics package testing with the water_GMX50 data. This test profile allows selecting between CPU and GPU-based GROMACS builds. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Ns Per Day, More Is Better GROMACS 2022.1 Implementation: MPI CPU - Input: water_GMX50_bare No SME AMD SME Enabled 5 10 15 20 25 SE +/- 0.03, N = 3 SE +/- 0.03, N = 3 18.71 18.62 1. (CXX) g++ options: -O3
KTX-Software toktx This is a benchmark of The Khronos Group's KTX-Software library and tools. KTX-Software provides "toktx" for converting/creating in the KTX container format for image textures. This benchmark times how long it takes to convert to KTX 2.0 format with various settings using a reference PNG sample input. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Seconds, Fewer Is Better KTX-Software toktx 4.0 Settings: Zstd Compression 9 No SME AMD SME Enabled 0.6246 1.2492 1.8738 2.4984 3.123 SE +/- 0.006, N = 3 SE +/- 0.006, N = 3 2.734 2.776
Kvazaar This is a test of Kvazaar as a CPU-based H.265/HEVC video encoder written in the C programming language and optimized in Assembly. Kvazaar is the winner of the 2016 ACM Open-Source Software Competition and developed at the Ultra Video Group, Tampere University, Finland. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Frames Per Second, More Is Better Kvazaar 2.1 Video Input: Bosphorus 4K - Video Preset: Very Fast No SME AMD SME Enabled 16 32 48 64 80 SE +/- 0.95, N = 3 SE +/- 0.91, N = 3 74.31 73.32 1. (CC) gcc options: -pthread -ftree-vectorize -fvisibility=hidden -O2 -lpthread -lm -lrt
OpenBenchmarking.org Frames Per Second, More Is Better Kvazaar 2.1 Video Input: Bosphorus 4K - Video Preset: Ultra Fast No SME AMD SME Enabled 20 40 60 80 100 SE +/- 0.77, N = 3 SE +/- 0.97, N = 3 77.68 76.22 1. (CC) gcc options: -pthread -ftree-vectorize -fvisibility=hidden -O2 -lpthread -lm -lrt
OpenBenchmarking.org Seconds, Fewer Is Better libavif avifenc 0.11 Encoder Speed: 6 No SME AMD SME Enabled 0.5555 1.111 1.6665 2.222 2.7775 SE +/- 0.006, N = 3 SE +/- 0.019, N = 3 2.393 2.469 1. (CXX) g++ options: -O3 -fPIC -lm
Liquid-DSP LiquidSDR's Liquid-DSP is a software-defined radio (SDR) digital signal processing library. This test profile runs a multi-threaded benchmark of this SDR/DSP library focused on embedded platform usage. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org samples/s, More Is Better Liquid-DSP 2021.01.31 Threads: 256 - Buffer Length: 256 - Filter Length: 57 No SME AMD SME Enabled 2000M 4000M 6000M 8000M 10000M SE +/- 8082903.77, N = 3 SE +/- 5206833.12, N = 3 10332000000 10344666667 1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
OpenBenchmarking.org samples/s, More Is Better Liquid-DSP 2021.01.31 Threads: 384 - Buffer Length: 256 - Filter Length: 57 No SME AMD SME Enabled 2000M 4000M 6000M 8000M 10000M SE +/- 3785938.90, N = 3 SE +/- 3605551.28, N = 3 10346000000 10350000000 1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
miniBUDE MiniBUDE is a mini application for the the core computation of the Bristol University Docking Engine (BUDE). This test profile currently makes use of the OpenMP implementation of miniBUDE for CPU benchmarking. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org GFInst/s, More Is Better miniBUDE 20210901 Implementation: OpenMP - Input Deck: BM1 No SME AMD SME Enabled 1600 3200 4800 6400 8000 SE +/- 16.32, N = 3 SE +/- 8.37, N = 3 7281.28 7290.64 1. (CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm
OpenBenchmarking.org Billion Interactions/s, More Is Better miniBUDE 20210901 Implementation: OpenMP - Input Deck: BM1 No SME AMD SME Enabled 60 120 180 240 300 SE +/- 0.65, N = 3 SE +/- 0.33, N = 3 291.25 291.63 1. (CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm
OpenBenchmarking.org GFInst/s, More Is Better miniBUDE 20210901 Implementation: OpenMP - Input Deck: BM2 No SME AMD SME Enabled 2K 4K 6K 8K 10K SE +/- 80.07, N = 3 SE +/- 99.74, N = 3 8633.54 8588.75 1. (CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm
OpenBenchmarking.org Billion Interactions/s, More Is Better miniBUDE 20210901 Implementation: OpenMP - Input Deck: BM2 No SME AMD SME Enabled 80 160 240 320 400 SE +/- 3.20, N = 3 SE +/- 3.99, N = 3 345.34 343.55 1. (CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm
NAMD NAMD is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. NAMD was developed by the Theoretical and Computational Biophysics Group in the Beckman Institute for Advanced Science and Technology at the University of Illinois at Urbana-Champaign. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org days/ns, Fewer Is Better NAMD 2.14 ATPase Simulation - 327,506 Atoms No SME AMD SME Enabled 0.0292 0.0584 0.0876 0.1168 0.146 SE +/- 0.00031, N = 3 SE +/- 0.00010, N = 3 0.12831 0.12991
NAS Parallel Benchmarks NPB, NAS Parallel Benchmarks, is a benchmark developed by NASA for high-end computer systems. This test profile currently uses the MPI version of NPB. This test profile offers selecting the different NPB tests/problems and varying problem sizes. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Total Mop/s, More Is Better NAS Parallel Benchmarks 3.4 Test / Class: BT.C No SME AMD SME Enabled 110K 220K 330K 440K 550K SE +/- 529.89, N = 3 SE +/- 3984.99, N = 3 496467.98 494917.44 1. (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz 2. Open MPI 4.1.4
OpenBenchmarking.org Total Mop/s, More Is Better NAS Parallel Benchmarks 3.4 Test / Class: EP.C No SME AMD SME Enabled 4K 8K 12K 16K 20K SE +/- 54.14, N = 3 SE +/- 73.01, N = 3 16457.94 16462.35 1. (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz 2. Open MPI 4.1.4
OpenBenchmarking.org Total Mop/s, More Is Better NAS Parallel Benchmarks 3.4 Test / Class: FT.C No SME AMD SME Enabled 50K 100K 150K 200K 250K SE +/- 2651.33, N = 4 SE +/- 1868.66, N = 3 223096.07 220214.75 1. (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz 2. Open MPI 4.1.4
OpenBenchmarking.org Total Mop/s, More Is Better NAS Parallel Benchmarks 3.4 Test / Class: SP.C No SME AMD SME Enabled 50K 100K 150K 200K 250K SE +/- 3645.86, N = 3 SE +/- 2731.53, N = 3 255564.19 253299.33 1. (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz 2. Open MPI 4.1.4
Neural Magic DeepSparse OpenBenchmarking.org items/sec, More Is Better Neural Magic DeepSparse 1.1 Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Asynchronous Multi-Stream No SME AMD SME Enabled 20 40 60 80 100 SE +/- 0.10, N = 3 SE +/- 0.11, N = 3 84.25 83.64
OpenBenchmarking.org ms/batch, Fewer Is Better Neural Magic DeepSparse 1.1 Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Asynchronous Multi-Stream No SME AMD SME Enabled 200 400 600 800 1000 SE +/- 0.43, N = 3 SE +/- 0.31, N = 3 1134.42 1143.04
OpenBenchmarking.org items/sec, More Is Better Neural Magic DeepSparse 1.1 Model: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Scenario: Asynchronous Multi-Stream No SME AMD SME Enabled 160 320 480 640 800 SE +/- 0.54, N = 3 SE +/- 0.69, N = 3 762.70 745.64
OpenBenchmarking.org ms/batch, Fewer Is Better Neural Magic DeepSparse 1.1 Model: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Scenario: Asynchronous Multi-Stream No SME AMD SME Enabled 30 60 90 120 150 SE +/- 0.08, N = 3 SE +/- 0.14, N = 3 125.53 128.40
OpenBenchmarking.org items/sec, More Is Better Neural Magic DeepSparse 1.1 Model: CV Classification, ResNet-50 ImageNet - Scenario: Asynchronous Multi-Stream No SME AMD SME Enabled 400 800 1200 1600 2000 SE +/- 1.94, N = 3 SE +/- 4.55, N = 3 1962.10 1911.40
OpenBenchmarking.org items/sec, More Is Better Neural Magic DeepSparse 1.1 Model: NLP Text Classification, DistilBERT mnli - Scenario: Asynchronous Multi-Stream No SME AMD SME Enabled 300 600 900 1200 1500 SE +/- 2.65, N = 3 SE +/- 1.22, N = 3 1204.85 1179.00
OpenBenchmarking.org ms/batch, Fewer Is Better Neural Magic DeepSparse 1.1 Model: NLP Text Classification, DistilBERT mnli - Scenario: Asynchronous Multi-Stream No SME AMD SME Enabled 20 40 60 80 100 SE +/- 0.15, N = 3 SE +/- 0.07, N = 3 79.49 81.23
OpenBenchmarking.org items/sec, More Is Better Neural Magic DeepSparse 1.1 Model: NLP Text Classification, BERT base uncased SST2 - Scenario: Asynchronous Multi-Stream No SME AMD SME Enabled 130 260 390 520 650 SE +/- 0.96, N = 3 SE +/- 0.11, N = 3 617.03 601.93
OpenBenchmarking.org ms/batch, Fewer Is Better Neural Magic DeepSparse 1.1 Model: NLP Text Classification, BERT base uncased SST2 - Scenario: Asynchronous Multi-Stream No SME AMD SME Enabled 40 80 120 160 200 SE +/- 0.23, N = 3 SE +/- 0.02, N = 3 155.10 158.94
OpenBenchmarking.org items/sec, More Is Better Neural Magic DeepSparse 1.1 Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Asynchronous Multi-Stream No SME AMD SME Enabled 20 40 60 80 100 SE +/- 0.09, N = 3 SE +/- 0.05, N = 3 84.28 83.82
OpenBenchmarking.org ms/batch, Fewer Is Better Neural Magic DeepSparse 1.1 Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Asynchronous Multi-Stream No SME AMD SME Enabled 200 400 600 800 1000 SE +/- 1.72, N = 3 SE +/- 0.38, N = 3 1133.31 1143.01
nginx This is a benchmark of the lightweight Nginx HTTP(S) web-server. This Nginx web server benchmark test profile makes use of the wrk program for facilitating the HTTP requests over a fixed period time with a configurable number of concurrent clients/connections. HTTPS with a self-signed OpenSSL certificate is used by this test for local benchmarking. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Requests Per Second, More Is Better nginx 1.23.2 Connections: 500 No SME AMD SME Enabled 40K 80K 120K 160K 200K SE +/- 238.08, N = 3 SE +/- 124.29, N = 3 201056.69 196386.41 1. (CC) gcc options: -lluajit-5.1 -lm -lssl -lcrypto -lpthread -ldl -std=c99 -O2
NWChem NWChem is an open-source high performance computational chemistry package. Per NWChem's documentation, "NWChem aims to provide its users with computational chemistry tools that are scalable both in their ability to treat large scientific computational chemistry problems efficiently, and in their use of available parallel computing resources from high-performance parallel supercomputers to conventional workstation clusters." Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Seconds, Fewer Is Better NWChem 7.0.2 Input: C240 Buckyball No SME AMD SME Enabled 300 600 900 1200 1500 1524.4 1543.1 1. (F9X) gfortran options: -lnwctask -lccsd -lmcscf -lselci -lmp2 -lmoints -lstepper -ldriver -loptim -lnwdft -lgradients -lcphf -lesp -lddscf -ldangchang -lguess -lhessian -lvib -lnwcutil -lrimp2 -lproperty -lsolvation -lnwints -lprepar -lnwmd -lnwpw -lofpw -lpaw -lpspw -lband -lnwpwlib -lcafe -lspace -lanalyze -lqhop -lpfft -ldplot -ldrdy -lvscf -lqmmm -lqmd -letrans -ltce -lbq -lmm -lcons -lperfm -ldntmc -lccca -ldimqm -lga -larmci -lpeigs -l64to32 -lopenblas -lpthread -lrt -llapack -lnwcblas -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz -lcomex -m64 -ffast-math -std=legacy -fdefault-integer-8 -finline-functions -O2
oneDNN This is a test of the Intel oneDNN as an Intel-optimized library for Deep Neural Networks and making use of its built-in benchdnn functionality. The result is the total perf time reported. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and MKL-DNN before being rebranded as part of the Intel oneAPI toolkit. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org ms, Fewer Is Better oneDNN 3.0 Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU No SME AMD SME Enabled 0.1943 0.3886 0.5829 0.7772 0.9715 SE +/- 0.005137, N = 3 SE +/- 0.004505, N = 3 0.863726 0.850418 MIN: 0.75 MIN: 0.74 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
OpenBenchmarking.org ms, Fewer Is Better oneDNN 3.0 Harness: IP Shapes 3D - Data Type: bf16bf16bf16 - Engine: CPU No SME AMD SME Enabled 0.8827 1.7654 2.6481 3.5308 4.4135 SE +/- 0.06074, N = 15 SE +/- 0.05423, N = 3 3.89299 3.92313 MIN: 2.77 MIN: 2.89 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
OpenBenchmarking.org ms, Fewer Is Better oneDNN 3.0 Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU No SME AMD SME Enabled 0.1185 0.237 0.3555 0.474 0.5925 SE +/- 0.001481, N = 3 SE +/- 0.001428, N = 3 0.522052 0.526628 MIN: 0.42 MIN: 0.42 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
OpenBenchmarking.org ms, Fewer Is Better oneDNN 3.0 Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU No SME AMD SME Enabled 6 12 18 24 30 SE +/- 0.08, N = 3 SE +/- 0.20, N = 3 22.68 23.14 MIN: 19.97 MIN: 20.17 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
OpenBenchmarking.org ms, Fewer Is Better oneDNN 3.0 Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU No SME AMD SME Enabled 0.2067 0.4134 0.6201 0.8268 1.0335 SE +/- 0.009364, N = 5 SE +/- 0.004429, N = 3 0.916133 0.918482 MIN: 0.76 MIN: 0.78 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
OpenBenchmarking.org ms, Fewer Is Better oneDNN 3.0 Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU No SME AMD SME Enabled 400 800 1200 1600 2000 SE +/- 18.08, N = 7 SE +/- 18.68, N = 6 2011.15 2002.43 MIN: 1936.22 MIN: 1924.96 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
ONNX Runtime ONNX Runtime is developed by Microsoft and partners as a open-source, cross-platform, high performance machine learning inferencing and training accelerator. This test profile runs the ONNX Runtime with various models available from the ONNX Zoo. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Inferences Per Minute, More Is Better ONNX Runtime 1.11 Model: super-resolution-10 - Device: CPU - Executor: Standard No SME AMD SME Enabled 1200 2400 3600 4800 6000 SE +/- 15.47, N = 3 SE +/- 40.49, N = 3 5600 5583 1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt
OpenFOAM OpenFOAM is the leading free, open-source software for computational fluid dynamics (CFD). This test profile currently uses the drivaerFastback test case for analyzing automotive aerodynamics or alternatively the older motorBike input. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Seconds, Fewer Is Better OpenFOAM 10 Input: drivaerFastback, Small Mesh Size - Mesh Time No SME AMD SME Enabled 6 12 18 24 30 25.07 27.10 1. (CXX) g++ options: -std=c++14 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -lfiniteVolume -lmeshTools -lparallel -llagrangian -lregionModels -lgenericPatchFields -lOpenFOAM -ldl -lm
OpenBenchmarking.org Seconds, Fewer Is Better OpenFOAM 10 Input: drivaerFastback, Small Mesh Size - Execution Time No SME AMD SME Enabled 5 10 15 20 25 22.08 22.13 1. (CXX) g++ options: -std=c++14 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -lfiniteVolume -lmeshTools -lparallel -llagrangian -lregionModels -lgenericPatchFields -lOpenFOAM -ldl -lm
OpenRadioss OpenRadioss is an open-source AGPL-licensed finite element solver for dynamic event analysis OpenRadioss is based on Altair Radioss and open-sourced in 2022. This open-source finite element solver is benchmarked with various example models available from https://www.openradioss.org/models/. This test is currently using a reference OpenRadioss binary build offered via GitHub. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Seconds, Fewer Is Better OpenRadioss 2022.10.13 Model: Bumper Beam No SME AMD SME Enabled 20 40 60 80 100 SE +/- 0.73, N = 3 SE +/- 0.15, N = 3 79.85 79.97
OpenVINO This is a test of the Intel OpenVINO, a toolkit around neural networks, using its built-in benchmarking support and analyzing the throughput and latency for various models. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org FPS, More Is Better OpenVINO 2022.2.dev Model: Face Detection FP16 - Device: CPU No SME AMD SME Enabled 20 40 60 80 100 SE +/- 0.24, N = 3 SE +/- 0.21, N = 3 101.90 101.55 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared
OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2022.2.dev Model: Face Detection FP16 - Device: CPU No SME AMD SME Enabled 100 200 300 400 500 SE +/- 0.87, N = 3 SE +/- 0.71, N = 3 469.81 471.53 MIN: 402.31 / MAX: 539.43 MIN: 415.73 / MAX: 547.47 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared
OpenBenchmarking.org FPS, More Is Better OpenVINO 2022.2.dev Model: Person Detection FP16 - Device: CPU No SME AMD SME Enabled 10 20 30 40 50 SE +/- 0.12, N = 3 SE +/- 0.21, N = 3 43.29 42.33 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared
OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2022.2.dev Model: Person Detection FP16 - Device: CPU No SME AMD SME Enabled 200 400 600 800 1000 SE +/- 3.23, N = 3 SE +/- 5.54, N = 3 1102.15 1127.51 MIN: 799.38 / MAX: 1782.76 MIN: 802.53 / MAX: 1835.35 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared
OpenBenchmarking.org FPS, More Is Better OpenVINO 2022.2.dev Model: Person Detection FP32 - Device: CPU No SME AMD SME Enabled 10 20 30 40 50 SE +/- 0.23, N = 3 SE +/- 0.02, N = 3 42.76 42.03 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared
OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2022.2.dev Model: Person Detection FP32 - Device: CPU No SME AMD SME Enabled 200 400 600 800 1000 SE +/- 5.83, N = 3 SE +/- 0.32, N = 3 1115.56 1134.68 MIN: 773.47 / MAX: 1806.09 MIN: 842.92 / MAX: 1806.88 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared
OpenBenchmarking.org FPS, More Is Better OpenVINO 2022.2.dev Model: Vehicle Detection FP16 - Device: CPU No SME AMD SME Enabled 1600 3200 4800 6400 8000 SE +/- 3.74, N = 3 SE +/- 4.95, N = 3 7437.73 7274.98 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared
OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2022.2.dev Model: Vehicle Detection FP16 - Device: CPU No SME AMD SME Enabled 2 4 6 8 10 SE +/- 0.00, N = 3 SE +/- 0.01, N = 3 6.44 6.59 MIN: 5.05 / MAX: 61.63 MIN: 5 / MAX: 63.2 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared
OpenBenchmarking.org FPS, More Is Better OpenVINO 2022.2.dev Model: Face Detection FP16-INT8 - Device: CPU No SME AMD SME Enabled 40 80 120 160 200 SE +/- 0.05, N = 3 SE +/- 0.05, N = 3 193.95 193.83 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared
OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2022.2.dev Model: Face Detection FP16-INT8 - Device: CPU No SME AMD SME Enabled 50 100 150 200 250 SE +/- 0.06, N = 3 SE +/- 0.05, N = 3 246.95 247.23 MIN: 205.26 / MAX: 303.51 MIN: 208.68 / MAX: 293.42 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared
OpenBenchmarking.org FPS, More Is Better OpenVINO 2022.2.dev Model: Vehicle Detection FP16-INT8 - Device: CPU No SME AMD SME Enabled 2K 4K 6K 8K 10K SE +/- 2.65, N = 3 SE +/- 4.87, N = 3 11180.63 11184.76 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared
OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2022.2.dev Model: Vehicle Detection FP16-INT8 - Device: CPU No SME AMD SME Enabled 0.963 1.926 2.889 3.852 4.815 SE +/- 0.00, N = 3 SE +/- 0.00, N = 3 4.28 4.28 MIN: 3.51 / MAX: 38.77 MIN: 3.5 / MAX: 42.32 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared
OpenBenchmarking.org FPS, More Is Better OpenVINO 2022.2.dev Model: Weld Porosity Detection FP16 - Device: CPU No SME AMD SME Enabled 2K 4K 6K 8K 10K SE +/- 3.98, N = 3 SE +/- 13.80, N = 3 9997.68 9990.19 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared
OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2022.2.dev Model: Weld Porosity Detection FP16 - Device: CPU No SME AMD SME Enabled 1.0778 2.1556 3.2334 4.3112 5.389 SE +/- 0.00, N = 3 SE +/- 0.01, N = 3 4.79 4.79 MIN: 3.96 / MAX: 30.92 MIN: 3.95 / MAX: 30.1 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared
OpenBenchmarking.org FPS, More Is Better OpenVINO 2022.2.dev Model: Machine Translation EN To DE FP16 - Device: CPU No SME AMD SME Enabled 200 400 600 800 1000 SE +/- 1.57, N = 3 SE +/- 1.50, N = 3 967.90 963.14 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared
OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2022.2.dev Model: Machine Translation EN To DE FP16 - Device: CPU No SME AMD SME Enabled 11 22 33 44 55 SE +/- 0.08, N = 3 SE +/- 0.08, N = 3 49.54 49.78 MIN: 37.27 / MAX: 225.52 MIN: 38.76 / MAX: 189.77 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared
OpenBenchmarking.org FPS, More Is Better OpenVINO 2022.2.dev Model: Weld Porosity Detection FP16-INT8 - Device: CPU No SME AMD SME Enabled 4K 8K 12K 16K 20K SE +/- 18.52, N = 3 SE +/- 4.31, N = 3 19801.40 19704.84 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared
OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2022.2.dev Model: Weld Porosity Detection FP16-INT8 - Device: CPU No SME AMD SME Enabled 3 6 9 12 15 SE +/- 0.01, N = 3 SE +/- 0.00, N = 3 9.62 9.67 MIN: 8.26 / MAX: 57.97 MIN: 8.32 / MAX: 78.9 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared
OpenBenchmarking.org FPS, More Is Better OpenVINO 2022.2.dev Model: Person Vehicle Bike Detection FP16 - Device: CPU No SME AMD SME Enabled 2K 4K 6K 8K 10K SE +/- 7.89, N = 3 SE +/- 6.93, N = 3 9027.72 8993.90 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared
OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2022.2.dev Model: Person Vehicle Bike Detection FP16 - Device: CPU No SME AMD SME Enabled 1.1993 2.3986 3.5979 4.7972 5.9965 SE +/- 0.01, N = 3 SE +/- 0.00, N = 3 5.31 5.33 MIN: 4.34 / MAX: 44.16 MIN: 4.45 / MAX: 44.32 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared
OpenBenchmarking.org FPS, More Is Better OpenVINO 2022.2.dev Model: Age Gender Recognition Retail 0013 FP16 - Device: CPU No SME AMD SME Enabled 30K 60K 90K 120K 150K SE +/- 878.85, N = 3 SE +/- 1824.08, N = 4 150792.42 148736.04 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared
OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2022.2.dev Model: Age Gender Recognition Retail 0013 FP16 - Device: CPU No SME AMD SME Enabled 0.1238 0.2476 0.3714 0.4952 0.619 SE +/- 0.00, N = 3 SE +/- 0.00, N = 4 0.55 0.55 MIN: 0.5 / MAX: 30.13 MIN: 0.5 / MAX: 36.47 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared
OpenBenchmarking.org FPS, More Is Better OpenVINO 2022.2.dev Model: Age Gender Recognition Retail 0013 FP16-INT8 - Device: CPU No SME AMD SME Enabled 40K 80K 120K 160K 200K SE +/- 2225.36, N = 3 SE +/- 702.31, N = 3 165194.42 167545.54 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared
OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2022.2.dev Model: Age Gender Recognition Retail 0013 FP16-INT8 - Device: CPU No SME AMD SME Enabled 0.081 0.162 0.243 0.324 0.405 SE +/- 0.00, N = 3 SE +/- 0.00, N = 3 0.36 0.36 MIN: 0.34 / MAX: 40.99 MIN: 0.34 / MAX: 47.65 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -flto -shared
OpenVKL OpenVKL is the Intel Open Volume Kernel Library that offers high-performance volume computation kernels and part of the Intel oneAPI rendering toolkit. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Items / Sec, More Is Better OpenVKL 1.3.1 Benchmark: vklBenchmark ISPC No SME AMD SME Enabled 300 600 900 1200 1500 SE +/- 6.81, N = 3 SE +/- 15.55, N = 4 1322 1286 MIN: 329 / MAX: 4485 MIN: 328 / MAX: 5485
OSPRay Intel OSPRay is a portable ray-tracing engine for high-performance, high-fidelity scientific visualizations. OSPRay builds off Intel's Embree and Intel SPMD Program Compiler (ISPC) components as part of the oneAPI rendering toolkit. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Items Per Second, More Is Better OSPRay 2.10 Benchmark: particle_volume/pathtracer/real_time No SME AMD SME Enabled 50 100 150 200 250 SE +/- 1.25, N = 3 SE +/- 1.51, N = 3 230.62 229.88
OpenBenchmarking.org Items Per Second, More Is Better OSPRay 2.10 Benchmark: gravity_spheres_volume/dim_512/ao/real_time No SME AMD SME Enabled 10 20 30 40 50 SE +/- 0.04, N = 3 SE +/- 0.14, N = 3 43.85 43.38
OSPRay Studio Intel OSPRay Studio is an open-source, interactive visualization and ray-tracing software package. OSPRay Studio makes use of Intel OSPRay, a portable ray-tracing engine for high-performance, high-fidelity visualizations. OSPRay builds off Intel's Embree and Intel SPMD Program Compiler (ISPC) components as part of the oneAPI rendering toolkit. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org ms, Fewer Is Better OSPRay Studio 0.11 Camera: 3 - Resolution: 4K - Samples Per Pixel: 32 - Renderer: Path Tracer No SME AMD SME Enabled 5K 10K 15K 20K 25K SE +/- 6.36, N = 3 SE +/- 32.58, N = 3 22043 22614 1. (CXX) g++ options: -O3 -ldl
PostgreSQL This is a benchmark of PostgreSQL using the integrated pgbench for facilitating the database benchmarks. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org TPS, More Is Better PostgreSQL 15 Scaling Factor: 100 - Clients: 250 - Mode: Read Only No SME AMD SME Enabled 600K 1200K 1800K 2400K 3000K SE +/- 16891.69, N = 3 SE +/- 40566.19, N = 3 2951147 2970869 1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lm
PyHPC Benchmarks PyHPC-Benchmarks is a suite of Python high performance computing benchmarks for execution on CPUs and GPUs using various popular Python HPC libraries. The PyHPC CPU-based benchmarks focus on sequential CPU performance. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Seconds, Fewer Is Better PyHPC Benchmarks 3.0 Device: CPU - Backend: Numpy - Project Size: 4194304 - Benchmark: Equation of State No SME AMD SME Enabled 0.2102 0.4204 0.6306 0.8408 1.051 SE +/- 0.002, N = 3 SE +/- 0.003, N = 3 0.884 0.934
OpenBenchmarking.org Seconds, Fewer Is Better PyHPC Benchmarks 3.0 Device: CPU - Backend: Numpy - Project Size: 4194304 - Benchmark: Isoneutral Mixing No SME AMD SME Enabled 0.3924 0.7848 1.1772 1.5696 1.962 SE +/- 0.006, N = 3 SE +/- 0.005, N = 3 1.691 1.744
QuantLib QuantLib is an open-source library/framework around quantitative finance for modeling, trading and risk management scenarios. QuantLib is written in C++ with Boost and its built-in benchmark used reports the QuantLib Benchmark Index benchmark score. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org MFLOPS, More Is Better QuantLib 1.21 No SME AMD SME Enabled 700 1400 2100 2800 3500 SE +/- 6.39, N = 3 SE +/- 8.14, N = 3 3052.8 3061.3 1. (CXX) g++ options: -O3 -march=native -rdynamic
RELION RELION - REgularised LIkelihood OptimisatioN - is a stand-alone computer program for Maximum A Posteriori refinement of (multiple) 3D reconstructions or 2D class averages in cryo-electron microscopy (cryo-EM). It is developed in the research group of Sjors Scheres at the MRC Laboratory of Molecular Biology. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Seconds, Fewer Is Better RELION 3.1.1 Test: Basic - Device: CPU No SME AMD SME Enabled 30 60 90 120 150 SE +/- 1.40, N = 5 SE +/- 1.42, N = 5 128.66 130.43 1. (CXX) g++ options: -fopenmp -std=c++0x -O3 -rdynamic -ldl -ltiff -lfftw3f -lfftw3 -lpng -lmpi_cxx -lmpi
OpenBenchmarking.org ms, Fewer Is Better Renaissance 0.14 Test: In-Memory Database Shootout No SME AMD SME Enabled 1000 2000 3000 4000 5000 SE +/- 54.74, N = 12 SE +/- 69.41, N = 3 4764.6 4838.5 MIN: 4124.15 / MAX: 6577.01 MIN: 4339.45 / MAX: 6109.38
Rodinia Rodinia is a suite focused upon accelerating compute-intensive applications with accelerators. CUDA, OpenMP, and OpenCL parallel models are supported by the included applications. This profile utilizes select OpenCL, NVIDIA CUDA and OpenMP test binaries at the moment. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Seconds, Fewer Is Better Rodinia 3.1 Test: OpenMP LavaMD No SME AMD SME Enabled 4 8 12 16 20 SE +/- 0.13, N = 3 SE +/- 0.05, N = 3 16.51 16.67 1. (CXX) g++ options: -O2 -lOpenCL
OpenBenchmarking.org Seconds, Fewer Is Better Rodinia 3.1 Test: OpenMP CFD Solver No SME AMD SME Enabled 2 4 6 8 10 SE +/- 0.012, N = 3 SE +/- 0.030, N = 3 5.938 6.043 1. (CXX) g++ options: -O2 -lOpenCL
srsRAN srsRAN is an open-source LTE/5G software radio suite created by Software Radio Systems (SRS). The srsRAN radio suite was formerly known as srsLTE and can be used for building your own software-defined radio (SDR) 4G/5G mobile network. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Samples / Second, More Is Better srsRAN 22.04.1 Test: OFDM_Test No SME AMD SME Enabled 30M 60M 90M 120M 150M SE +/- 600925.21, N = 3 SE +/- 883804.91, N = 3 161733333 162633333 1. (CXX) g++ options: -std=c++14 -fno-strict-aliasing -march=native -mfpmath=sse -mavx2 -fvisibility=hidden -O3 -fno-trapping-math -fno-math-errno -mavx512f -mavx512cd -mavx512bw -mavx512dq -ldl -lpthread -lm
OpenBenchmarking.org eNb Mb/s, More Is Better srsRAN 22.04.1 Test: 4G PHY_DL_Test 100 PRB MIMO 64-QAM No SME AMD SME Enabled 90 180 270 360 450 SE +/- 0.49, N = 3 SE +/- 3.12, N = 3 415.1 408.5 1. (CXX) g++ options: -std=c++14 -fno-strict-aliasing -march=native -mfpmath=sse -mavx2 -fvisibility=hidden -O3 -fno-trapping-math -fno-math-errno -mavx512f -mavx512cd -mavx512bw -mavx512dq -ldl -lpthread -lm
OpenBenchmarking.org UE Mb/s, More Is Better srsRAN 22.04.1 Test: 4G PHY_DL_Test 100 PRB MIMO 64-QAM No SME AMD SME Enabled 30 60 90 120 150 SE +/- 0.25, N = 3 SE +/- 0.31, N = 3 157.8 157.7 1. (CXX) g++ options: -std=c++14 -fno-strict-aliasing -march=native -mfpmath=sse -mavx2 -fvisibility=hidden -O3 -fno-trapping-math -fno-math-errno -mavx512f -mavx512cd -mavx512bw -mavx512dq -ldl -lpthread -lm
OpenBenchmarking.org eNb Mb/s, More Is Better srsRAN 22.04.1 Test: 4G PHY_DL_Test 100 PRB SISO 64-QAM No SME AMD SME Enabled 90 180 270 360 450 SE +/- 0.64, N = 3 SE +/- 0.45, N = 3 413.9 415.7 1. (CXX) g++ options: -std=c++14 -fno-strict-aliasing -march=native -mfpmath=sse -mavx2 -fvisibility=hidden -O3 -fno-trapping-math -fno-math-errno -mavx512f -mavx512cd -mavx512bw -mavx512dq -ldl -lpthread -lm
OpenBenchmarking.org UE Mb/s, More Is Better srsRAN 22.04.1 Test: 4G PHY_DL_Test 100 PRB SISO 64-QAM No SME AMD SME Enabled 40 80 120 160 200 SE +/- 0.84, N = 3 SE +/- 0.46, N = 3 165.8 165.9 1. (CXX) g++ options: -std=c++14 -fno-strict-aliasing -march=native -mfpmath=sse -mavx2 -fvisibility=hidden -O3 -fno-trapping-math -fno-math-errno -mavx512f -mavx512cd -mavx512bw -mavx512dq -ldl -lpthread -lm
OpenBenchmarking.org eNb Mb/s, More Is Better srsRAN 22.04.1 Test: 4G PHY_DL_Test 100 PRB MIMO 256-QAM No SME AMD SME Enabled 100 200 300 400 500 SE +/- 1.49, N = 3 SE +/- 1.01, N = 3 445.2 444.8 1. (CXX) g++ options: -std=c++14 -fno-strict-aliasing -march=native -mfpmath=sse -mavx2 -fvisibility=hidden -O3 -fno-trapping-math -fno-math-errno -mavx512f -mavx512cd -mavx512bw -mavx512dq -ldl -lpthread -lm
OpenBenchmarking.org UE Mb/s, More Is Better srsRAN 22.04.1 Test: 4G PHY_DL_Test 100 PRB MIMO 256-QAM No SME AMD SME Enabled 40 80 120 160 200 SE +/- 0.24, N = 3 SE +/- 0.41, N = 3 166.0 165.7 1. (CXX) g++ options: -std=c++14 -fno-strict-aliasing -march=native -mfpmath=sse -mavx2 -fvisibility=hidden -O3 -fno-trapping-math -fno-math-errno -mavx512f -mavx512cd -mavx512bw -mavx512dq -ldl -lpthread -lm
OpenBenchmarking.org eNb Mb/s, More Is Better srsRAN 22.04.1 Test: 4G PHY_DL_Test 100 PRB SISO 256-QAM No SME AMD SME Enabled 100 200 300 400 500 SE +/- 1.15, N = 3 SE +/- 0.03, N = 3 444.0 445.7 1. (CXX) g++ options: -std=c++14 -fno-strict-aliasing -march=native -mfpmath=sse -mavx2 -fvisibility=hidden -O3 -fno-trapping-math -fno-math-errno -mavx512f -mavx512cd -mavx512bw -mavx512dq -ldl -lpthread -lm
OpenBenchmarking.org UE Mb/s, More Is Better srsRAN 22.04.1 Test: 4G PHY_DL_Test 100 PRB SISO 256-QAM No SME AMD SME Enabled 40 80 120 160 200 SE +/- 0.47, N = 3 SE +/- 0.09, N = 3 172.7 172.2 1. (CXX) g++ options: -std=c++14 -fno-strict-aliasing -march=native -mfpmath=sse -mavx2 -fvisibility=hidden -O3 -fno-trapping-math -fno-math-errno -mavx512f -mavx512cd -mavx512bw -mavx512dq -ldl -lpthread -lm
OpenBenchmarking.org eNb Mb/s, More Is Better srsRAN 22.04.1 Test: 5G PHY_DL_NR Test 52 PRB SISO 64-QAM No SME AMD SME Enabled 30 60 90 120 150 SE +/- 0.32, N = 3 SE +/- 0.19, N = 3 139.7 139.1 1. (CXX) g++ options: -std=c++14 -fno-strict-aliasing -march=native -mfpmath=sse -mavx2 -fvisibility=hidden -O3 -fno-trapping-math -fno-math-errno -mavx512f -mavx512cd -mavx512bw -mavx512dq -ldl -lpthread -lm
OpenBenchmarking.org UE Mb/s, More Is Better srsRAN 22.04.1 Test: 5G PHY_DL_NR Test 52 PRB SISO 64-QAM No SME AMD SME Enabled 20 40 60 80 100 SE +/- 0.22, N = 3 SE +/- 0.09, N = 3 94.4 94.9 1. (CXX) g++ options: -std=c++14 -fno-strict-aliasing -march=native -mfpmath=sse -mavx2 -fvisibility=hidden -O3 -fno-trapping-math -fno-math-errno -mavx512f -mavx512cd -mavx512bw -mavx512dq -ldl -lpthread -lm
SVT-AV1 This is a benchmark of the SVT-AV1 open-source video encoder/decoder. SVT-AV1 was originally developed by Intel as part of their Open Visual Cloud / Scalable Video Technology (SVT). Development of SVT-AV1 has since moved to the Alliance for Open Media as part of upstream AV1 development. SVT-AV1 is a CPU-based multi-threaded video encoder for the AV1 video format with a sample YUV video file. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 1.4 Encoder Mode: Preset 13 - Input: Bosphorus 4K No SME AMD SME Enabled 50 100 150 200 250 SE +/- 6.22, N = 15 SE +/- 4.08, N = 15 248.33 251.44
TensorFlow This is a benchmark of the TensorFlow deep learning framework using the TensorFlow reference benchmarks (tensorflow/benchmarks with tf_cnn_benchmarks.py). Note with the Phoronix Test Suite there is also pts/tensorflow-lite for benchmarking the TensorFlow Lite binaries too. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org images/sec, More Is Better TensorFlow 2.10 Device: CPU - Batch Size: 64 - Model: AlexNet No SME AMD SME Enabled 110 220 330 440 550 SE +/- 6.01, N = 15 SE +/- 7.26, N = 15 508.40 505.26
WRF WRF, the Weather Research and Forecasting Model, is a "next-generation mesoscale numerical weather prediction system designed for both atmospheric research and operational forecasting applications. It features two dynamical cores, a data assimilation system, and a software architecture supporting parallel computation and system extensibility." Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Seconds, Fewer Is Better WRF 4.2.2 Input: conus 2.5km No SME AMD SME Enabled 900 1800 2700 3600 4500 4077.19 4116.62 1. (F9X) gfortran options: -O2 -ftree-vectorize -funroll-loops -ffree-form -fconvert=big-endian -frecord-marker=4 -fallow-invalid-boz -lesmf_time -lwrfio_nf -lnetcdff -lnetcdf -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
x264 This is a multi-threaded test of the x264 video encoder run on the CPU with a choice of 1080p or 4K video input. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Frames Per Second, More Is Better x264 2022-02-22 Video Input: Bosphorus 4K No SME AMD SME Enabled 20 40 60 80 100 SE +/- 1.42, N = 3 SE +/- 0.62, N = 3 106.86 103.07 1. (CC) gcc options: -ldl -lavformat -lavcodec -lavutil -lswscale -m64 -lm -lpthread -O3 -flto
x265 This is a simple test of the x265 encoder run on the CPU with 1080p and 4K options for H.265 video encode performance with x265. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Frames Per Second, More Is Better x265 3.4 Video Input: Bosphorus 4K No SME AMD SME Enabled 6 12 18 24 30 SE +/- 0.29, N = 4 SE +/- 0.17, N = 3 23.48 23.29 1. (CXX) g++ options: -O3 -rdynamic -lpthread -lrt -ldl -lnuma
Xcompact3d Incompact3d Xcompact3d Incompact3d is a Fortran-MPI based, finite difference high-performance code for solving the incompressible Navier-Stokes equation and as many as you need scalar transport equations. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Seconds, Fewer Is Better Xcompact3d Incompact3d 2021-03-11 Input: input.i3d 193 Cells Per Direction No SME AMD SME Enabled 0.9955 1.991 2.9865 3.982 4.9775 SE +/- 0.01135008, N = 3 SE +/- 0.04122391, N = 3 4.37420527 4.42424568 1. (F9X) gfortran options: -cpp -O2 -funroll-loops -floop-optimize -fcray-pointer -fbacktrace -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
Xmrig Xmrig is an open-source cross-platform CPU/GPU miner for RandomX, KawPow, CryptoNight and AstroBWT. This test profile is setup to measure the Xmlrig CPU mining performance. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org H/s, More Is Better Xmrig 6.18.1 Variant: Monero - Hash Count: 1M No SME AMD SME Enabled 20K 40K 60K 80K 100K SE +/- 111.86, N = 3 SE +/- 540.08, N = 3 105141.7 101932.1 1. (CXX) g++ options: -fexceptions -fno-rtti -maes -O3 -Ofast -static-libgcc -static-libstdc++ -rdynamic -lssl -lcrypto -luv -lpthread -lrt -ldl -lhwloc
OpenBenchmarking.org H/s, More Is Better Xmrig 6.18.1 Variant: Wownero - Hash Count: 1M No SME AMD SME Enabled 30K 60K 90K 120K 150K SE +/- 211.86, N = 3 SE +/- 341.44, N = 3 126508.3 123484.1 1. (CXX) g++ options: -fexceptions -fno-rtti -maes -O3 -Ofast -static-libgcc -static-libstdc++ -rdynamic -lssl -lcrypto -luv -lpthread -lrt -ldl -lhwloc
Zstd Compression This test measures the time needed to compress/decompress a sample file (a FreeBSD disk image - FreeBSD-12.2-RELEASE-amd64-memstick.img) using Zstd compression with options for different compression levels / settings. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org MB/s, More Is Better Zstd Compression 1.5.0 Compression Level: 19, Long Mode - Compression Speed No SME AMD SME Enabled 12 24 36 48 60 SE +/- 1.03, N = 15 SE +/- 0.70, N = 3 52.9 49.9 1. (CC) gcc options: -O3 -pthread -lz -llzma
OpenBenchmarking.org MB/s, More Is Better Zstd Compression 1.5.0 Compression Level: 19, Long Mode - Decompression Speed No SME AMD SME Enabled 800 1600 2400 3200 4000 SE +/- 14.95, N = 15 SE +/- 1.04, N = 3 3825.0 3837.3 1. (CC) gcc options: -O3 -pthread -lz -llzma
No SME Kernel Notes: Transparent Huge Pages: madviseCompiler Notes: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-12-U8K4Qv/gcc-12-12.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-12-U8K4Qv/gcc-12-12.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -vProcessor Notes: Scaling Governor: acpi-cpufreq performance (Boost: Enabled) - CPU Microcode: 0xa10110dJava Notes: OpenJDK Runtime Environment (build 11.0.17+8-post-Ubuntu-1ubuntu2)Python Notes: Python 3.10.7Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: always-on RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
Testing initiated at 20 December 2022 18:43 by user phoronix.
AMD SME Enabled Processor: 2 x AMD EPYC 9654 96-Core @ 2.40GHz (192 Cores / 384 Threads), Motherboard: AMD Titanite_4G (RTI1002E BIOS), Chipset: AMD Device 14a4, Memory: 1520GB, Disk: 800GB INTEL SSDPF21Q800GB, Graphics: ASPEED, Monitor: VGA HDMI, Network: Broadcom NetXtreme BCM5720 PCIe
OS: Ubuntu 22.10, Kernel: 6.1.0-phx (x86_64), Desktop: GNOME Shell 43.0, Display Server: X Server 1.21.1.4, Vulkan: 1.3.224, Compiler: GCC 12.2.0 + Clang 15.0.2-1, File-System: ext4, Screen Resolution: 1920x1080
Kernel Notes: Transparent Huge Pages: madviseCompiler Notes: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-12-U8K4Qv/gcc-12-12.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-12-U8K4Qv/gcc-12-12.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -vProcessor Notes: Scaling Governor: acpi-cpufreq performance (Boost: Enabled) - CPU Microcode: 0xa10110dJava Notes: OpenJDK Runtime Environment (build 11.0.17+8-post-Ubuntu-1ubuntu2)Python Notes: Python 3.10.7Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: always-on RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
Testing initiated at 19 December 2022 16:26 by user phoronix.