Amazon testing on Ubuntu 22.04 via the Phoronix Test Suite by Michael Larabel for a future article.
armv8.4-a
Kernel Notes: Transparent Huge Pages: madvise
Environment Notes: CXXFLAGS="-O3 -march=armv8.4-a" CFLAGS="-O3 -march=armv8.4-a"
Compiler Notes: --build=aarch64-linux-gnu --disable-libquadmath --disable-libquadmath-support --disable-nls --disable-werror --enable-checking=yes,extra,rtl --enable-clocale=gnu --enable-fix-cortex-a53-843419 --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++ --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-objc-gc=auto --enable-plugin --enable-shared --host=aarch64-linux-gnu --program-prefix= --target=aarch64-linux-gnu --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-target-system-zlib=auto -v
Python Notes: Python 3.10.4
Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of __user pointer sanitization + spectre_v2: Mitigation of CSV2 BHB + srbds: Not affected + tsx_async_abort: Not affected
armv8.4-a+sve Processor: ARMv8 Neoverse-V1 (32 Cores), Motherboard: Amazon EC2 c7g.8xlarge (1.0 BIOS), Chipset: Amazon Device 0200, Memory: 62GB, Disk: 301GB Amazon Elastic Block Store, Network: Amazon Elastic
OS: Ubuntu 22.04, Kernel: 5.15.0-1004-aws (aarch64), Compiler: GCC 12.0.0 20220117, File-System: ext4, System Layer: amazon
Kernel Notes: Transparent Huge Pages: madvise
Environment Notes: CXXFLAGS="-O3 -march=armv8.4-a+sve" CFLAGS="-O3 -march=armv8.4-a+sve"
Compiler Notes: --build=aarch64-linux-gnu --disable-libquadmath --disable-libquadmath-support --disable-nls --disable-werror --enable-checking=yes,extra,rtl --enable-clocale=gnu --enable-fix-cortex-a53-843419 --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++ --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-objc-gc=auto --enable-plugin --enable-shared --host=aarch64-linux-gnu --program-prefix= --target=aarch64-linux-gnu --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-target-system-zlib=auto -v
Python Notes: Python 3.10.4
Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of __user pointer sanitization + spectre_v2: Mitigation of CSV2 BHB + srbds: Not affected + tsx_async_abort: Not affected
Nettle
GNU Nettle is a low-level cryptographic library used by GnuTLS and other software. Learn more via the OpenBenchmarking.org test page.
Nettle 3.8, Test: aes256 (Mbyte/s, More Is Better): armv8.4-a 4435.91 (SE +/- 3.11, N = 3, MIN: 3927.32 / MAX: 5628.86); armv8.4-a+sve 4447.04 (SE +/- 0.39, N = 3, MIN: 3925.11 / MAX: 5627.84)
Nettle 3.8, Test: chacha (Mbyte/s, More Is Better): armv8.4-a 740.25 (SE +/- 0.61, N = 3, MIN: 454.21 / MAX: 956.53); armv8.4-a+sve 733.59 (SE +/- 0.55, N = 3, MIN: 442.26 / MAX: 956.22)
Nettle 3.8, Test: sha512 (Mbyte/s, More Is Better): armv8.4-a 498.83 (SE +/- 0.04, N = 3); armv8.4-a+sve 504.33 (SE +/- 0.07, N = 3)
Nettle 3.8, Test: poly1305-aes (Mbyte/s, More Is Better): armv8.4-a 871.90 (SE +/- 1.52, N = 3); armv8.4-a+sve 820.51 (SE +/- 5.37, N = 3)
Reported per-run flags: -march=armv8.4-a (armv8.4-a), -march=armv8.4-a+sve (armv8.4-a+sve). 1. (CC) gcc options: -O3 -ggdb3 -lnettle -lgmp -lm -lcrypto
Botan
Botan is a BSD-licensed, cross-platform, open-source C++ crypto library ("cryptography toolkit") that supports most publicly known cryptographic algorithms. The project's stated goal is to be "the best option for cryptography in C++ by offering the tools necessary to implement a range of practical systems, such as TLS protocol, X.509 certificates, modern AEAD ciphers, PKCS#11 and TPM hardware support, password hashing, and post quantum crypto schemes." Learn more via the OpenBenchmarking.org test page.
Botan 2.17.3, Test: KASUMI (MiB/s, More Is Better): armv8.4-a 62.02 (SE +/- 0.00, N = 3); armv8.4-a+sve 62.00 (SE +/- 0.00, N = 3)
Botan 2.17.3, Test: KASUMI - Decrypt (MiB/s, More Is Better): armv8.4-a 62.28 (SE +/- 0.00, N = 3); armv8.4-a+sve 62.26 (SE +/- 0.00, N = 3)
Botan 2.17.3, Test: AES-256 (MiB/s, More Is Better): armv8.4-a 5494.31 (SE +/- 14.75, N = 3); armv8.4-a+sve 5442.65 (SE +/- 9.30, N = 3)
Botan 2.17.3, Test: AES-256 - Decrypt (MiB/s, More Is Better): armv8.4-a 5477.57 (SE +/- 5.58, N = 3); armv8.4-a+sve 5474.32 (SE +/- 8.64, N = 3)
Botan 2.17.3, Test: Twofish (MiB/s, More Is Better): armv8.4-a 239.70 (SE +/- 0.12, N = 3); armv8.4-a+sve 248.89 (SE +/- 0.26, N = 3)
Botan 2.17.3, Test: Twofish - Decrypt (MiB/s, More Is Better): armv8.4-a 246.16 (SE +/- 0.20, N = 3); armv8.4-a+sve 258.15 (SE +/- 0.11, N = 3)
Botan 2.17.3, Test: Blowfish (MiB/s, More Is Better): armv8.4-a 278.87 (SE +/- 0.29, N = 3); armv8.4-a+sve 280.57 (SE +/- 0.10, N = 3)
Botan 2.17.3, Test: Blowfish - Decrypt (MiB/s, More Is Better): armv8.4-a 288.51 (SE +/- 0.07, N = 3); armv8.4-a+sve 289.03 (SE +/- 0.03, N = 3)
Botan 2.17.3, Test: CAST-256 (MiB/s, More Is Better): armv8.4-a 108.79 (SE +/- 0.02, N = 3); armv8.4-a+sve 108.75 (SE +/- 0.01, N = 3)
Botan 2.17.3, Test: CAST-256 - Decrypt (MiB/s, More Is Better): armv8.4-a 108.60 (SE +/- 0.02, N = 3); armv8.4-a+sve 108.62 (SE +/- 0.01, N = 3)
Botan 2.17.3, Test: ChaCha20Poly1305 (MiB/s, More Is Better): armv8.4-a 389.38 (SE +/- 0.04, N = 3); armv8.4-a+sve 390.31 (SE +/- 0.07, N = 3)
Botan 2.17.3, Test: ChaCha20Poly1305 - Decrypt (MiB/s, More Is Better): armv8.4-a 382.51 (SE +/- 0.02, N = 3); armv8.4-a+sve 383.95 (SE +/- 0.13, N = 3)
1. (CXX) g++ options: -fstack-protector -pthread -lbotan-2 -ldl -lrt
LAME MP3 Encoding
LAME is an MP3 encoder licensed under the LGPL. This test measures the time required to encode a WAV file to MP3 format. Learn more via the OpenBenchmarking.org test page.
LAME MP3 Encoding 3.100, WAV To MP3 (Seconds, Fewer Is Better): armv8.4-a 8.054 (SE +/- 0.004, N = 3); armv8.4-a+sve 7.440 (SE +/- 0.002, N = 3)
Reported per-run flags: -march=armv8.4-a (armv8.4-a), -march=armv8.4-a+sve (armv8.4-a+sve). 1. (CC) gcc options: -O3 -ffast-math -funroll-loops -fschedule-insns2 -fbranch-count-reg -fforce-addr -pipe -lm
Ngspice
Ngspice is an open-source SPICE circuit simulator. Ngspice was originally based on the Berkeley SPICE electronic circuit simulator. Ngspice supports basic threading using OpenMP. This test profile is making use of the ISCAS 85 benchmark circuits. Learn more via the OpenBenchmarking.org test page.
Ngspice 34, Circuit: C2670 (Seconds, Fewer Is Better): armv8.4-a 102.56 (SE +/- 0.06, N = 3); armv8.4-a+sve 106.91 (SE +/- 0.18, N = 3)
Ngspice 34, Circuit: C7552 (Seconds, Fewer Is Better): armv8.4-a 103.93 (SE +/- 0.51, N = 3); armv8.4-a+sve 111.64 (SE +/- 1.04, N = 3)
Reported per-run flags: -march=armv8.4-a (armv8.4-a), -march=armv8.4-a+sve (armv8.4-a+sve). 1. (CC) gcc options: -O3 -fopenmp -lm -lstdc++ -lfftw3 -lXaw -lXmu -lXt -lXext -lX11 -lXft -lfontconfig -lXrender -lfreetype -lSM -lICE
Opus Codec Encoding
Opus is an open audio codec. Opus is a lossy audio compression format designed primarily for interactive real-time applications over the Internet. This test uses Opus-Tools and measures the time required to encode a WAV file to Opus. Learn more via the OpenBenchmarking.org test page.
Opus Codec Encoding 1.3.1, WAV To Opus Encode (Seconds, Fewer Is Better): armv8.4-a 18.32 (SE +/- 0.00, N = 5); armv8.4-a+sve 14.40 (SE +/- 0.02, N = 5)
Reported per-run flags: -march=armv8.4-a (armv8.4-a), -march=armv8.4-a+sve (armv8.4-a+sve). 1. (CXX) g++ options: -O3 -fvisibility=hidden -logg -lm
Stargate Digital Audio Workstation
Stargate is an open-source, cross-platform digital audio workstation (DAW) software package that offers "a unique and carefully curated experience" and scales from old systems up through modern multi-core systems. Stargate is GPLv3 licensed and makes use of Qt5 (PyQt5) for its user-interface. Learn more via the OpenBenchmarking.org test page.
Stargate Digital Audio Workstation 21.10.9, Sample Rate: 44100 - Buffer Size: 512 (Render Ratio, More Is Better): armv8.4-a 6.072916 (SE +/- 0.002564, N = 3); armv8.4-a+sve 6.213500 (SE +/- 0.003428, N = 3)
Stargate Digital Audio Workstation 21.10.9, Sample Rate: 96000 - Buffer Size: 512 (Render Ratio, More Is Better): armv8.4-a 4.414055 (SE +/- 0.003502, N = 3); armv8.4-a+sve 4.474848 (SE +/- 0.002191, N = 3)
Stargate Digital Audio Workstation 21.10.9, Sample Rate: 44100 - Buffer Size: 1024 (Render Ratio, More Is Better): armv8.4-a 6.370035 (SE +/- 0.002515, N = 3); armv8.4-a+sve 6.538163 (SE +/- 0.002411, N = 3)
Stargate Digital Audio Workstation 21.10.9, Sample Rate: 480000 - Buffer Size: 512 (Render Ratio, More Is Better): armv8.4-a 6.005386 (SE +/- 0.001956, N = 3); armv8.4-a+sve 6.124455 (SE +/- 0.002030, N = 3)
Stargate Digital Audio Workstation 21.10.9, Sample Rate: 96000 - Buffer Size: 1024 (Render Ratio, More Is Better): armv8.4-a 4.729848 (SE +/- 0.000903, N = 3); armv8.4-a+sve 4.807797 (SE +/- 0.002826, N = 3)
Stargate Digital Audio Workstation 21.10.9, Sample Rate: 480000 - Buffer Size: 1024 (Render Ratio, More Is Better): armv8.4-a 6.322684 (SE +/- 0.002118, N = 3); armv8.4-a+sve 6.450182 (SE +/- 0.002250, N = 3)
1. (CXX) g++ options: -lpthread -lsndfile -lm -O3 -march=native -ffast-math -funroll-loops -fstrength-reduce -fstrict-aliasing -finline-functions
ASTC Encoder
ASTC Encoder (astcenc) is for the Adaptive Scalable Texture Compression (ASTC) format commonly used with OpenGL, OpenGL ES, and Vulkan graphics APIs. This test profile does a coding test of both compression/decompression. Learn more via the OpenBenchmarking.org test page.
ASTC Encoder 3.2, Preset: Medium (Seconds, Fewer Is Better): armv8.4-a 4.8833 (SE +/- 0.0125, N = 3); armv8.4-a+sve 4.8092 (SE +/- 0.0080, N = 3)
ASTC Encoder 3.2, Preset: Thorough (Seconds, Fewer Is Better): armv8.4-a 9.1435 (SE +/- 0.0046, N = 3); armv8.4-a+sve 9.0131 (SE +/- 0.0027, N = 3)
ASTC Encoder 3.2, Preset: Exhaustive (Seconds, Fewer Is Better): armv8.4-a 35.36 (SE +/- 0.02, N = 3); armv8.4-a+sve 35.19 (SE +/- 0.01, N = 3)
Reported per-run flags: -march=armv8.4-a (armv8.4-a), -march=armv8.4-a+sve (armv8.4-a+sve). 1. (CXX) g++ options: -O3 -flto -pthread
Google Draco
Draco is a library developed by Google for compressing/decompressing 3D geometric meshes and point clouds. This test profile uses some Artec3D PLY models as the sample 3D model input formats for Draco compression/decompression. Learn more via the OpenBenchmarking.org test page.
Google Draco 1.5.0, Model: Lion (ms, Fewer Is Better): armv8.4-a 5354 (SE +/- 2.65, N = 3); armv8.4-a+sve 5309 (SE +/- 2.40, N = 3)
Google Draco 1.5.0, Model: Church Facade (ms, Fewer Is Better): armv8.4-a 7935 (SE +/- 6.64, N = 3); armv8.4-a+sve 7843 (SE +/- 7.00, N = 3)
Reported per-run flags: -march=armv8.4-a (armv8.4-a), -march=armv8.4-a+sve (armv8.4-a+sve). 1. (CXX) g++ options: -O3
JPEG XL libjxl
The JPEG XL Image Coding System is designed to provide next-generation JPEG image capabilities with JPEG XL offering better image quality and compression over legacy JPEG. This test profile is currently focused on the multi-threaded JPEG XL image encode performance using the reference libjxl library. Learn more via the OpenBenchmarking.org test page.
JPEG XL libjxl 0.6.1, Input: PNG - Encode Speed: 7 (MP/s, More Is Better): armv8.4-a 8.32 (SE +/- 0.02, N = 3); armv8.4-a+sve 8.36 (SE +/- 0.01, N = 3)
JPEG XL libjxl 0.6.1, Input: PNG - Encode Speed: 8 (MP/s, More Is Better): armv8.4-a 0.67 (SE +/- 0.00, N = 3); armv8.4-a+sve 0.67 (SE +/- 0.00, N = 3)
JPEG XL libjxl 0.6.1, Input: JPEG - Encode Speed: 7 (MP/s, More Is Better): armv8.4-a 73.21 (SE +/- 0.13, N = 3); armv8.4-a+sve 79.58 (SE +/- 0.12, N = 3)
JPEG XL libjxl 0.6.1, Input: JPEG - Encode Speed: 8 (MP/s, More Is Better): armv8.4-a 26.30 (SE +/- 0.01, N = 3); armv8.4-a+sve 27.34 (SE +/- 0.03, N = 3)
Reported per-run flags: -march=armv8.4-a (armv8.4-a), -march=armv8.4-a+sve (armv8.4-a+sve). 1. (CXX) g++ options: -O3 -funwind-tables -O2 -fPIE -pie
OpenJPEG
OpenJPEG is an open-source JPEG 2000 codec written in the C programming language. The default input for this test profile is the NASA/JPL-Caltech/MSSS Curiosity panorama 717MB TIFF image file converting to JPEG2000 format. Learn more via the OpenBenchmarking.org test page.
OpenJPEG 2.4, Encode: NASA Curiosity Panorama M34 (ms, Fewer Is Better): armv8.4-a 57205 (SE +/- 19.06, N = 3); armv8.4-a+sve 55196 (SE +/- 89.48, N = 3)
Reported per-run flags: -march=armv8.4-a (armv8.4-a), -march=armv8.4-a+sve (armv8.4-a+sve). 1. (CXX) g++ options: -O3 -rdynamic
WebP Image Encode
This is a test of Google's libwebp with the cwebp image encode utility and using a sample 6000x4000 pixel JPEG image as the input. Learn more via the OpenBenchmarking.org test page.
WebP Image Encode 1.1, Encode Settings: Quality 100, Lossless (Encode Time - Seconds, Fewer Is Better): armv8.4-a 23.85 (SE +/- 0.01, N = 3); armv8.4-a+sve 23.40 (SE +/- 0.18, N = 3)
WebP Image Encode 1.1, Encode Settings: Quality 100, Highest Compression (Encode Time - Seconds, Fewer Is Better): armv8.4-a 8.630 (SE +/- 0.006, N = 3); armv8.4-a+sve 8.640 (SE +/- 0.017, N = 3)
Reported per-run flags: -march=armv8.4-a (armv8.4-a), -march=armv8.4-a+sve (armv8.4-a+sve). 1. (CC) gcc options: -fvisibility=hidden -O3 -lm -ljpeg -lpng16 -ltiff
LuaJIT
This test profile is a collection of Lua scripts/benchmarks run against a locally-built copy of LuaJIT upstream. Learn more via the OpenBenchmarking.org test page.
LuaJIT 2.1-git, Test: Composite (Mflops, More Is Better): armv8.4-a 1282.59 (SE +/- 0.41, N = 3); armv8.4-a+sve 1309.03 (SE +/- 18.19, N = 3)
LuaJIT 2.1-git, Test: Monte Carlo (Mflops, More Is Better): armv8.4-a 343.27 (SE +/- 0.35, N = 3); armv8.4-a+sve 343.85 (SE +/- 0.54, N = 3)
LuaJIT 2.1-git, Test: Fast Fourier Transform (Mflops, More Is Better): armv8.4-a 661.55 (SE +/- 0.39, N = 3); armv8.4-a+sve 615.71 (SE +/- 10.69, N = 3)
LuaJIT 2.1-git, Test: Sparse Matrix Multiply (Mflops, More Is Better): armv8.4-a 1151.57 (SE +/- 3.14, N = 3); armv8.4-a+sve 1162.33 (SE +/- 7.20, N = 3)
LuaJIT 2.1-git, Test: Dense LU Matrix Factorization (Mflops, More Is Better): armv8.4-a 3355.53 (SE +/- 6.02, N = 3); armv8.4-a+sve 3521.13 (SE +/- 86.66, N = 3)
LuaJIT 2.1-git, Test: Jacobi Successive Over-Relaxation (Mflops, More Is Better): armv8.4-a 901.02 (SE +/- 0.68, N = 3); armv8.4-a+sve 902.16 (SE +/- 1.00, N = 3)
Reported per-run flags: -march=armv8.4-a (armv8.4-a), -march=armv8.4-a+sve (armv8.4-a+sve). 1. (CC) gcc options: -lm -ldl -O2 -fomit-frame-pointer -O3 -U_FORTIFY_SOURCE -fno-stack-protector
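Several LuaJIT results above carry a large standard error on one side (for example SE +/- 86.66 on the armv8.4-a+sve Dense LU Matrix Factorization run), so the raw difference between two bars can overstate the effect. As a rough sketch that is not part of the Phoronix Test Suite, the gap between two runs can be expressed in units of their combined standard error. The helper name `gap_in_standard_errors` and the choice to combine the two errors in quadrature (which assumes independent runs) are assumptions made here for illustration.

```python
import math


def gap_in_standard_errors(mean_a, se_a, mean_b, se_b):
    """Return |mean_a - mean_b| divided by the combined standard error.

    The two standard errors are combined in quadrature, which assumes the
    runs are independent. This is an illustrative heuristic only.
    """
    combined = math.sqrt(se_a ** 2 + se_b ** 2)
    if combined == 0:
        # Both runs report zero spread: any difference is "infinitely" clear.
        return math.inf if mean_a != mean_b else 0.0
    return abs(mean_a - mean_b) / combined


# Values copied from the LuaJIT 2.1-git results (Mflops).
dense_lu = gap_in_standard_errors(3355.53, 6.02, 3521.13, 86.66)
fft = gap_in_standard_errors(661.55, 0.39, 615.71, 10.69)

print(f"Dense LU gap: {dense_lu:.2f} combined standard errors")
print(f"FFT gap: {fft:.2f} combined standard errors")
```

With these inputs the Dense LU gap comes out below two combined standard errors while the Fast Fourier Transform gap is above four, so the FFT regression looks more robust than the Dense LU gain. With only N = 3 runs per side this is a sanity check, not a significance test.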
eSpeak-NG Speech Engine
This test times how long it takes the eSpeak speech synthesizer to read Project Gutenberg's The Outline of Science and output to a WAV file. This test profile is now tracking the eSpeak-NG version of eSpeak. Learn more via the OpenBenchmarking.org test page.
eSpeak-NG Speech Engine 20200907, Text-To-Speech Synthesis (Seconds, Fewer Is Better): armv8.4-a 36.59 (SE +/- 0.31, N = 16); armv8.4-a+sve 29.98 (SE +/- 0.30, N = 20)
Reported per-run flags: -march=armv8.4-a (armv8.4-a), -march=armv8.4-a+sve (armv8.4-a+sve). 1. (CC) gcc options: -O3 -std=c99
Xmrig
Xmrig is an open-source cross-platform CPU/GPU miner for RandomX, KawPow, CryptoNight and AstroBWT. This test profile is set up to measure the Xmrig CPU mining performance. Learn more via the OpenBenchmarking.org test page.
Xmrig 6.12.1, Variant: Monero - Hash Count: 1M (H/s, More Is Better): armv8.4-a 8645.4 (SE +/- 9.56, N = 3); armv8.4-a+sve 8669.8 (SE +/- 6.18, N = 3)
Xmrig 6.12.1, Variant: Wownero - Hash Count: 1M (H/s, More Is Better): armv8.4-a 11811.2 (SE +/- 27.59, N = 3); armv8.4-a+sve 11877.8 (SE +/- 26.88, N = 3)
Reported per-run flags: -march=armv8.4-a (armv8.4-a), -march=armv8.4-a+sve (armv8.4-a+sve). 1. (CXX) g++ options: -O3 -fexceptions -fno-rtti -Ofast -static-libgcc -static-libstdc++ -rdynamic -lssl -lcrypto -luv -lpthread -lrt -ldl -lhwloc
LeelaChessZero
LeelaChessZero (lc0 / lczero) is a chess engine automated via neural networks. This test profile can be used for OpenCL, CUDA + cuDNN, and BLAS (CPU-based) benchmarking. Learn more via the OpenBenchmarking.org test page.
LeelaChessZero 0.28, Backend: BLAS (Nodes Per Second, More Is Better): armv8.4-a 1281 (SE +/- 13.99, N = 5); armv8.4-a+sve 1297 (SE +/- 6.96, N = 3)
LeelaChessZero 0.28, Backend: Eigen (Nodes Per Second, More Is Better): armv8.4-a 1311 (SE +/- 13.65, N = 3); armv8.4-a+sve 1333 (SE +/- 14.64, N = 5)
Reported per-run flags: -march=armv8.4-a (armv8.4-a), -march=armv8.4-a+sve (armv8.4-a+sve). 1. (CXX) g++ options: -flto -O3 -pthread
RNNoise
RNNoise is a recurrent neural network for audio noise reduction developed by Mozilla and Xiph.Org. This test profile is a single-threaded test measuring the time to denoise a sample 26 minute long 16-bit RAW audio file using this recurrent neural network noise suppression library. Learn more via the OpenBenchmarking.org test page.
RNNoise 2020-06-28 (Seconds, Fewer Is Better): armv8.4-a 17.62 (SE +/- 0.05, N = 3); armv8.4-a+sve 17.39 (SE +/- 0.05, N = 3)
Reported per-run flags: -march=armv8.4-a (armv8.4-a), -march=armv8.4-a+sve (armv8.4-a+sve). 1. (CC) gcc options: -O3 -pedantic -fvisibility=hidden
ONNX Runtime
ONNX Runtime is developed by Microsoft and partners as an open-source, cross-platform, high performance machine learning inferencing and training accelerator. This test profile runs the ONNX Runtime with various models available from the ONNX Zoo. Learn more via the OpenBenchmarking.org test page.
ONNX Runtime 1.11, Model: GPT-2 - Device: CPU - Executor: Standard (Inferences Per Minute, More Is Better): armv8.4-a 12364 (SE +/- 63.90, N = 3); armv8.4-a+sve 12317 (SE +/- 12.91, N = 3)
ONNX Runtime 1.11, Model: bertsquad-12 - Device: CPU - Executor: Standard (Inferences Per Minute, More Is Better): armv8.4-a 773 (SE +/- 0.44, N = 3); armv8.4-a+sve 772 (SE +/- 0.50, N = 3)
ONNX Runtime 1.11, Model: fcn-resnet101-11 - Device: CPU - Executor: Standard (Inferences Per Minute, More Is Better): armv8.4-a 73 (SE +/- 0.00, N = 3); armv8.4-a+sve 73 (SE +/- 0.00, N = 3)
ONNX Runtime 1.11, Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard (Inferences Per Minute, More Is Better): armv8.4-a 938 (SE +/- 0.88, N = 3); armv8.4-a+sve 935 (SE +/- 0.17, N = 3)
ONNX Runtime 1.11, Model: super-resolution-10 - Device: CPU - Executor: Standard (Inferences Per Minute, More Is Better): armv8.4-a 5413 (SE +/- 0.93, N = 3); armv8.4-a+sve 5411 (SE +/- 2.17, N = 3)
Reported per-run flags: -march=armv8.4-a (armv8.4-a), -march=armv8.4-a+sve (armv8.4-a+sve). 1. (CXX) g++ options: -O3 -ffunction-sections -fdata-sections -march=native -mtune=native -flto -fno-fat-lto-objects -ldl -lrt
TNN
TNN is an open-source deep learning reasoning framework developed by Tencent. Learn more via the OpenBenchmarking.org test page.
TNN 0.3, Target: CPU - Model: DenseNet (ms, Fewer Is Better): armv8.4-a 2730.40 (SE +/- 12.63, N = 3, MIN: 2665.42 / MAX: 2834.73); armv8.4-a+sve 2346.32 (SE +/- 22.16, N = 3, MIN: 2268.4 / MAX: 2446.6)
TNN 0.3, Target: CPU - Model: MobileNet v2 (ms, Fewer Is Better): armv8.4-a 260.78 (SE +/- 0.10, N = 3, MIN: 259.13 / MAX: 262.38); armv8.4-a+sve 280.24 (SE +/- 0.69, N = 3, MIN: 277.9 / MAX: 282.31)
TNN 0.3, Target: CPU - Model: SqueezeNet v2 (ms, Fewer Is Better): armv8.4-a 71.13 (SE +/- 0.07, N = 3, MIN: 70.76 / MAX: 71.58); armv8.4-a+sve 76.30 (SE +/- 0.09, N = 3, MIN: 76.07 / MAX: 76.53)
TNN 0.3, Target: CPU - Model: SqueezeNet v1.1 (ms, Fewer Is Better): armv8.4-a 257.70 (SE +/- 0.08, N = 3, MIN: 256.95 / MAX: 258.39); armv8.4-a+sve 205.80 (SE +/- 0.14, N = 3, MIN: 205.46 / MAX: 206.28)
Reported per-run flags: -march=armv8.4-a (armv8.4-a), -march=armv8.4-a+sve (armv8.4-a+sve). 1. (CXX) g++ options: -O3 -fopenmp -pthread -fvisibility=hidden -fvisibility=default -rdynamic -ldl
Caffe
This is a benchmark of the Caffe deep learning framework and currently supports the AlexNet and GoogleNet models and execution on both CPUs and NVIDIA GPUs. Learn more via the OpenBenchmarking.org test page.
Caffe 2020-02-13, Model: AlexNet - Acceleration: CPU - Iterations: 200 (Milli-Seconds, Fewer Is Better): armv8.4-a 43634 (SE +/- 31.22, N = 3); armv8.4-a+sve 43931 (SE +/- 12.55, N = 3)
Caffe 2020-02-13, Model: GoogleNet - Acceleration: CPU - Iterations: 200 (Milli-Seconds, Fewer Is Better): armv8.4-a 123807 (SE +/- 49.72, N = 3); armv8.4-a+sve 125125 (SE +/- 105.70, N = 3)
Reported per-run flags: -march=armv8.4-a (armv8.4-a), -march=armv8.4-a+sve (armv8.4-a+sve). 1. (CXX) g++ options: -O3 -fPIC -rdynamic -lglog -lgflags -lprotobuf -lcrypto -lcurl -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas
GROMACS
This test runs the GROMACS (GROningen MAchine for Chemical Simulations) molecular dynamics package with the water_GMX50 data. This test profile allows selecting between CPU and GPU-based GROMACS builds. Learn more via the OpenBenchmarking.org test page.
GROMACS 2022.1, Implementation: MPI CPU - Input: water_GMX50_bare (Ns Per Day, More Is Better): armv8.4-a 2.277 (SE +/- 0.001, N = 3); armv8.4-a+sve 2.275 (SE +/- 0.002, N = 3)
Reported per-run flags: -march=armv8.4-a (armv8.4-a), -march=armv8.4-a+sve (armv8.4-a+sve). 1. (CXX) g++ options: -O3
Kripke
Kripke is a simple, scalable, 3D Sn deterministic particle transport code. Its primary purpose is to research how data layout, programming paradigms and architectures affect the implementation and performance of Sn transport. Kripke is developed by LLNL. Learn more via the OpenBenchmarking.org test page.
Kripke 1.2.4 (Throughput FoM, More Is Better): armv8.4-a 204143167 (SE +/- 226703.11, N = 3); armv8.4-a+sve 192709233 (SE +/- 298633.53, N = 3)
Reported per-run flags: -march=armv8.4-a (armv8.4-a), -march=armv8.4-a+sve (armv8.4-a+sve). 1. (CXX) g++ options: -O3 -fopenmp
Primesieve
Primesieve generates prime numbers using a highly optimized sieve of Eratosthenes implementation. Primesieve benchmarks the CPU's L1/L2 cache performance. Learn more via the OpenBenchmarking.org test page.
Primesieve 7.7, 1e12 Prime Number Generation (Seconds, Fewer Is Better): armv8.4-a 8.438 (SE +/- 0.022, N = 3); armv8.4-a+sve 8.533 (SE +/- 0.043, N = 3)
Reported per-run flags: -march=armv8.4-a (armv8.4-a), -march=armv8.4-a+sve (armv8.4-a+sve). 1. (CXX) g++ options: -O3
Stockfish
This is a test of Stockfish, an advanced open-source C++11 chess benchmark that can scale up to 512 CPU threads. Learn more via the OpenBenchmarking.org test page.
Stockfish 13, Total Time (Nodes Per Second, More Is Better): armv8.4-a 57485680 (SE +/- 721518.45, N = 14); armv8.4-a+sve 55823340 (SE +/- 645132.01, N = 3)
Reported per-run flags: -march=armv8.4-a (armv8.4-a), -march=armv8.4-a+sve (armv8.4-a+sve). 1. (CXX) g++ options: -lgcov -lpthread -O3 -fno-exceptions -std=c++17 -pedantic -flto -fprofile-use -fno-peel-loops -fno-tracer -flto=jobserver
Zstd Compression
This test measures the time needed to compress/decompress a sample file (a FreeBSD disk image - FreeBSD-12.2-RELEASE-amd64-memstick.img) using Zstd compression with options for different compression levels / settings. Learn more via the OpenBenchmarking.org test page.
Zstd Compression 1.5.0, Compression Level: 3 - Compression Speed (MB/s, More Is Better): armv8.4-a 6937.8 (SE +/- 14.45, N = 3); armv8.4-a+sve 7027.2 (SE +/- 15.42, N = 3)
Zstd Compression 1.5.0, Compression Level: 19 - Compression Speed (MB/s, More Is Better): armv8.4-a 72.9 (SE +/- 0.23, N = 3); armv8.4-a+sve 74.0 (SE +/- 0.03, N = 3)
Zstd Compression 1.5.0, Compression Level: 19 - Decompression Speed (MB/s, More Is Better): armv8.4-a 3083.4 (SE +/- 8.46, N = 3); armv8.4-a+sve 3094.8 (SE +/- 6.60, N = 3)
Zstd Compression 1.5.0, Compression Level: 3, Long Mode - Compression Speed (MB/s, More Is Better): armv8.4-a 1241.3 (SE +/- 5.37, N = 3); armv8.4-a+sve 1242.7 (SE +/- 4.88, N = 3)
Zstd Compression 1.5.0, Compression Level: 3, Long Mode - Decompression Speed (MB/s, More Is Better): armv8.4-a 3820.8 (SE +/- 3.95, N = 3); armv8.4-a+sve 3824.8 (SE +/- 1.28, N = 3)
Zstd Compression 1.5.0, Compression Level: 19, Long Mode - Compression Speed (MB/s, More Is Better): armv8.4-a 40.0 (SE +/- 0.03, N = 3); armv8.4-a+sve 40.3 (SE +/- 0.00, N = 3)
Zstd Compression 1.5.0, Compression Level: 19, Long Mode - Decompression Speed (MB/s, More Is Better): armv8.4-a 3250.9 (SE +/- 7.62, N = 3); armv8.4-a+sve 3263.7 (SE +/- 0.59, N = 3)
Reported per-run flags: -march=armv8.4-a (armv8.4-a), -march=armv8.4-a+sve (armv8.4-a+sve). 1. (CC) gcc options: -O3 -pthread -lz -llzma
Sysbench
This is a benchmark of Sysbench with the built-in CPU and memory sub-tests. Sysbench is a scriptable multi-threaded benchmark tool based on LuaJIT. Learn more via the OpenBenchmarking.org test page.
Sysbench 1.0.20, Test: CPU (Events Per Second, More Is Better): armv8.4-a 96726.40 (SE +/- 8.50, N = 3); armv8.4-a+sve 96666.76 (SE +/- 2.72, N = 3)
Reported per-run flags: -march=armv8.4-a (armv8.4-a), -march=armv8.4-a+sve (armv8.4-a+sve). 1. (CC) gcc options: -O2 -funroll-loops -O3 -rdynamic -ldl -laio -lm
AOM AV1
AOM AV1 3.3, Encoder Mode: Speed 8 Realtime - Input: Bosphorus 4K (Frames Per Second, More Is Better): armv8.4-a 62.13 (SE +/- 0.43, N = 3); armv8.4-a+sve 62.19 (SE +/- 0.61, N = 3)
AOM AV1 3.3, Encoder Mode: Speed 10 Realtime - Input: Bosphorus 4K (Frames Per Second, More Is Better): armv8.4-a 61.88 (SE +/- 0.47, N = 3); armv8.4-a+sve 65.62 (SE +/- 0.22, N = 3)
AOM AV1 3.3, Encoder Mode: Speed 8 Realtime - Input: Bosphorus 1080p (Frames Per Second, More Is Better): armv8.4-a 120.13 (SE +/- 0.03, N = 3); armv8.4-a+sve 123.95 (SE +/- 0.09, N = 3)
AOM AV1 3.3, Encoder Mode: Speed 9 Realtime - Input: Bosphorus 1080p (Frames Per Second, More Is Better): armv8.4-a 152.46 (SE +/- 0.03, N = 3); armv8.4-a+sve 156.71 (SE +/- 0.20, N = 3)
AOM AV1 3.3, Encoder Mode: Speed 10 Realtime - Input: Bosphorus 1080p (Frames Per Second, More Is Better): armv8.4-a 190.27 (SE +/- 0.28, N = 3); armv8.4-a+sve 193.68 (SE +/- 0.20, N = 3)
Reported per-run flags: -march=armv8.4-a (armv8.4-a), -march=armv8.4-a+sve (armv8.4-a+sve). 1. (CXX) g++ options: -O3 -std=c++11 -U_FORTIFY_SOURCE -lm
AOBench
AOBench is a lightweight ambient occlusion renderer, written in C. The test profile is using a size of 2048 x 2048. Learn more via the OpenBenchmarking.org test page.

AOBench - Size: 2048 x 2048 - Total Time (Seconds, Fewer Is Better)
armv8.4-a+sve: 33.49 (SE +/- 0.01, N = 3)
armv8.4-a: 33.48 (SE +/- 0.00, N = 3)
(CC) gcc options: -lm -O3
GraphicsMagick
This is a test of GraphicsMagick with its OpenMP implementation that performs various imaging tests on a sample 6000x4000 pixel JPEG image. Learn more via the OpenBenchmarking.org test page.

GraphicsMagick 1.3.33 - Operation: Swirl (Iterations Per Minute, More Is Better)
armv8.4-a: 1225 (SE +/- 0.33, N = 3)
armv8.4-a+sve: 1272 (SE +/- 1.33, N = 3)

GraphicsMagick 1.3.33 - Operation: Rotate (Iterations Per Minute, More Is Better)
armv8.4-a: 577 (SE +/- 0.00, N = 3)
armv8.4-a+sve: 611 (SE +/- 1.20, N = 3)

GraphicsMagick 1.3.33 - Operation: Enhanced (Iterations Per Minute, More Is Better)
armv8.4-a+sve: 718 (SE +/- 0.33, N = 3)
armv8.4-a: 732 (SE +/- 0.33, N = 3)

GraphicsMagick 1.3.33 - Operation: Resizing (Iterations Per Minute, More Is Better)
armv8.4-a: 2339 (SE +/- 22.36, N = 3)
armv8.4-a+sve: 2414 (SE +/- 1.86, N = 3)

GraphicsMagick 1.3.33 - Operation: Noise-Gaussian (Iterations Per Minute, More Is Better)
armv8.4-a: 494 (SE +/- 0.67, N = 3)
armv8.4-a+sve: 515 (SE +/- 0.58, N = 3)

GraphicsMagick 1.3.33 - Operation: HWB Color Space (Iterations Per Minute, More Is Better)
armv8.4-a: 978 (SE +/- 1.00, N = 3)
armv8.4-a+sve: 1067 (SE +/- 0.33, N = 3)

All GraphicsMagick results: (CC) gcc options: -fopenmp -O3 -ljbig -ltiff -lfreetype -ljpeg -lXext -lSM -lICE -lX11 -llzma -lz -lm -lpthread
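To make the GraphicsMagick numbers easier to compare, the relative change from -march=armv8.4-a to -march=armv8.4-a+sve can be computed per operation. The following is only a minimal sketch: the values are copied from the charts above, while the names `results` and `percent_change` are hypothetical and not part of the Phoronix Test Suite.

```python
# Iterations per minute from the GraphicsMagick charts above,
# stored as (armv8.4-a, armv8.4-a+sve). Higher is better.
results = {
    "Swirl": (1225, 1272),
    "Rotate": (577, 611),
    "Enhanced": (732, 718),
    "Resizing": (2339, 2414),
    "Noise-Gaussian": (494, 515),
    "HWB Color Space": (978, 1067),
}


def percent_change(baseline, candidate):
    """Relative change of candidate versus baseline, in percent."""
    return (candidate - baseline) / baseline * 100.0


# Positive values mean the +sve build completed more iterations per minute.
changes = {op: percent_change(base, sve) for op, (base, sve) in results.items()}

for op, pct in changes.items():
    print(f"{op:16s} {pct:+.1f}%")
```

A positive percentage favors the +sve build; Enhanced is the one operation in this group where the plain armv8.4-a build comes out ahead.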
x264
This is a multi-threaded test of the x264 video encoder run on the CPU with a choice of 1080p or 4K video input. Learn more via the OpenBenchmarking.org test page.

x264 2022-02-22 - Video Input: Bosphorus 4K (Frames Per Second, More Is Better)
armv8.4-a: 48.43 (SE +/- 0.08, N = 3)
armv8.4-a+sve: 48.51 (SE +/- 0.01, N = 3)
(CC) gcc options: -ldl -lavformat -lavcodec -lavutil -lswscale -lm -lpthread -O3 -flto

x264 2022-02-22 - Video Input: Bosphorus 1080p (Frames Per Second, More Is Better)
armv8.4-a: 168.92 (SE +/- 0.07, N = 3)
armv8.4-a+sve: 169.58 (SE +/- 0.10, N = 3)
(CC) gcc options: -ldl -lavformat -lavcodec -lavutil -lswscale -lm -lpthread -O3 -flto
C-Ray
This is a test of C-Ray, a simple raytracer designed to test the floating-point CPU performance. This test is multi-threaded (16 threads per core), will shoot 8 rays per pixel for anti-aliasing, and will generate a 1600 x 1200 image. Learn more via the OpenBenchmarking.org test page.

C-Ray 1.1 - Total Time - 4K, 16 Rays Per Pixel (Seconds, Fewer Is Better)
armv8.4-a+sve: 19.30 (SE +/- 0.00, N = 3)
armv8.4-a: 19.30 (SE +/- 0.03, N = 3)
(CC) gcc options: -lm -lpthread -O3
POV-Ray
This is a test of POV-Ray, the Persistence of Vision Raytracer. POV-Ray is used to create 3D graphics using ray-tracing. Learn more via the OpenBenchmarking.org test page.

POV-Ray 3.7.0.7 - Trace Time (Seconds, Fewer Is Better)
armv8.4-a+sve: 20.26 (SE +/- 0.04, N = 3)
armv8.4-a: 19.85 (SE +/- 0.03, N = 3)
(CXX) g++ options: -pipe -O3 -ffast-math -R/usr/lib -lXpm -lSM -lICE -lX11 -lIlmImf -lIlmImf-2_5 -lImath-2_5 -lHalf-2_5 -lIex-2_5 -lIexMath-2_5 -lIlmThread-2_5 -lIlmThread -ltiff -ljpeg -lpng -lz -lrt -lm -lboost_thread -lboost_system
Smallpt
Smallpt is a C++ global illumination renderer written in less than 100 lines of code. Global illumination is done via unbiased Monte Carlo path tracing and there is multi-threading support via the OpenMP library. Learn more via the OpenBenchmarking.org test page.

Smallpt 1.0 - Global Illumination Renderer; 128 Samples (Seconds, Fewer Is Better)
armv8.4-a+sve: 3.896 (SE +/- 0.003, N = 3)
armv8.4-a: 3.895 (SE +/- 0.002, N = 3)
(CXX) g++ options: -fopenmp -O3
Liquid-DSP
LiquidSDR's Liquid-DSP is a software-defined radio (SDR) digital signal processing library. This test profile runs a multi-threaded benchmark of this SDR/DSP library focused on embedded platform usage. Learn more via the OpenBenchmarking.org test page.

Liquid-DSP 2021.01.31 - Threads: 8 - Buffer Length: 256 - Filter Length: 57 (samples/s, More Is Better)
armv8.4-a+sve: 167733333 (SE +/- 26666.67, N = 3)
armv8.4-a: 176363333 (SE +/- 12018.50, N = 3)
(CC) gcc options: -O3 -pthread -lm -lc -lliquid

Liquid-DSP 2021.01.31 - Threads: 16 - Buffer Length: 256 - Filter Length: 57 (samples/s, More Is Better)
armv8.4-a+sve: 335423333 (SE +/- 20275.88, N = 3)
armv8.4-a: 352700000 (SE +/- 30550.50, N = 3)
(CC) gcc options: -O3 -pthread -lm -lc -lliquid

Liquid-DSP 2021.01.31 - Threads: 32 - Buffer Length: 256 - Filter Length: 57 (samples/s, More Is Better)
armv8.4-a+sve: 668636667 (SE +/- 1978807.16, N = 3)
armv8.4-a: 705233333 (SE +/- 125476.87, N = 3)
(CC) gcc options: -O3 -pthread -lm -lc -lliquid
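One rough way to judge whether the Liquid-DSP gap between the two builds exceeds run-to-run noise is to express the difference of the means in units of the combined standard error. The sketch below does that with the values reported above. It assumes the two sets of runs are independent, and with N = 3 it is only an indication, not a formal significance test. The names `liquid_dsp` and `gap_in_standard_errors` are hypothetical.

```python
import math

# samples/s as (mean, standard error) per build, keyed by thread count.
liquid_dsp = {
    8: {"sve": (167733333, 26666.67), "base": (176363333, 12018.50)},
    16: {"sve": (335423333, 20275.88), "base": (352700000, 30550.50)},
    32: {"sve": (668636667, 1978807.16), "base": (705233333, 125476.87)},
}


def gap_in_standard_errors(a, b):
    """Difference of two means divided by their combined standard error."""
    (mean_a, se_a), (mean_b, se_b) = a, b
    return (mean_a - mean_b) / math.sqrt(se_a ** 2 + se_b ** 2)


# Positive values mean the plain armv8.4-a build is ahead of the +sve build.
gaps = {
    threads: gap_in_standard_errors(r["base"], r["sve"])
    for threads, r in liquid_dsp.items()
}

for threads, gap in gaps.items():
    print(f"{threads:2d} threads: armv8.4-a ahead by {gap:.1f} combined SEs")
```

At every thread count the gap is many times larger than the combined standard error, so the Liquid-DSP regression with +sve is unlikely to be explained by measurement noise alone.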
OpenSSL
OpenSSL is an open-source toolkit that implements SSL (Secure Sockets Layer) and TLS (Transport Layer Security) protocols. This test profile makes use of the built-in "openssl speed" benchmarking capabilities. Learn more via the OpenBenchmarking.org test page.

OpenSSL 3.0 - Algorithm: SHA256 (byte/s, More Is Better)
armv8.4-a+sve: 27428176880 (SE +/- 32102278.66, N = 3)
armv8.4-a: 27603943570 (SE +/- 25639974.71, N = 3)
(CC) gcc options: -pthread -O3 -lssl -lcrypto -ldl

OpenSSL 3.0 - Algorithm: RSA4096 (sign/s, More Is Better)
armv8.4-a+sve: 5088.1 (SE +/- 0.53, N = 3)
armv8.4-a: 5090.5 (SE +/- 0.78, N = 3)
(CC) gcc options: -pthread -O3 -lssl -lcrypto -ldl

OpenSSL 3.0 - Algorithm: RSA4096 (verify/s, More Is Better)
armv8.4-a: 356359.6 (SE +/- 8.85, N = 3)
armv8.4-a+sve: 356407.8 (SE +/- 10.52, N = 3)
(CC) gcc options: -pthread -O3 -lssl -lcrypto -ldl
Redis
Redis is an open-source in-memory data structure store, used as a database, cache, and message broker. Learn more via the OpenBenchmarking.org test page.

Redis 6.0.9 - Test: GET (Requests Per Second, More Is Better)
armv8.4-a+sve: 2513289.20 (SE +/- 9056.40, N = 3)
armv8.4-a: 2523377.92 (SE +/- 1605.36, N = 3)
(CXX) g++ options: -MM -MT -g3 -fvisibility=hidden -O3

Redis 6.0.9 - Test: SET (Requests Per Second, More Is Better)
armv8.4-a+sve: 1861924.13 (SE +/- 7427.93, N = 3)
armv8.4-a: 1865840.13 (SE +/- 794.78, N = 3)
(CXX) g++ options: -MM -MT -g3 -fvisibility=hidden -O3
GNU GMP GMPbench
GMPbench is a test of the GNU Multiple Precision Arithmetic (GMP) Library. GMPbench is a single-threaded integer benchmark that leverages the GMP library to stress the CPU with widening integer multiplication. Learn more via the OpenBenchmarking.org test page.

GNU GMP GMPbench 6.2.1 - Total Time (GMPbench Score, More Is Better)
armv8.4-a: 4152.3
armv8.4-a+sve: 4155.6
(CC) gcc options: -O3 -lm
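Because some of the tests above are "More Is Better" and others "Fewer Is Better", the results have to be normalized before they can be summarized together. The sketch below takes a handful of the results from this page, converts each to a ratio where values above 1.0 favor the +sve build, and takes the geometric mean. The choice of tests is arbitrary and only illustrative; the names `samples`, `speedup`, and `geo_mean` are hypothetical.

```python
import math

# (test, armv8.4-a result, armv8.4-a+sve result, higher_is_better)
samples = [
    ("Sysbench CPU", 96726.40, 96666.76, True),
    ("AOBench", 33.48, 33.49, False),
    ("POV-Ray", 19.85, 20.26, False),
    ("GMPbench", 4152.3, 4155.6, True),
    ("OpenSSL SHA256", 27603943570, 27428176880, True),
]


def speedup(base, sve, higher_is_better):
    """Ratio above 1.0 means the +sve build did better on this test."""
    return sve / base if higher_is_better else base / sve


ratios = [speedup(base, sve, hib) for _, base, sve, hib in samples]

# Geometric mean: exponentiate the average of the logarithms.
geo_mean = math.exp(sum(math.log(r) for r in ratios) / len(ratios))

print(f"Geometric mean of +sve vs. baseline over {len(ratios)} tests: {geo_mean:.4f}")
```

The geometric mean is used rather than the arithmetic mean so that tests with very different units and magnitudes contribute equally to the summary.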
Testing initiated at 31 December 1969 19:00 by user ubuntu.