Amazon testing on Ubuntu 22.04 via the Phoronix Test Suite by Michael Larabel for a future article.
armv8.4-a
Kernel Notes: Transparent Huge Pages: madvise
Environment Notes: CXXFLAGS="-O3 -march=armv8.4-a" CFLAGS="-O3 -march=armv8.4-a"
Compiler Notes: --build=aarch64-linux-gnu --disable-libquadmath --disable-libquadmath-support --disable-nls --disable-werror --enable-checking=yes,extra,rtl --enable-clocale=gnu --enable-fix-cortex-a53-843419 --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++ --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-objc-gc=auto --enable-plugin --enable-shared --host=aarch64-linux-gnu --program-prefix= --target=aarch64-linux-gnu --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-target-system-zlib=auto -v
Python Notes: Python 3.10.4
Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of __user pointer sanitization + spectre_v2: Mitigation of CSV2 BHB + srbds: Not affected + tsx_async_abort: Not affected
armv8.4-a+sve Processor: ARMv8 Neoverse-V1 (32 Cores), Motherboard: Amazon EC2 c7g.8xlarge (1.0 BIOS), Chipset: Amazon Device 0200, Memory: 62GB, Disk: 301GB Amazon Elastic Block Store, Network: Amazon Elastic
OS: Ubuntu 22.04, Kernel: 5.15.0-1004-aws (aarch64), Compiler: GCC 12.0.0 20220117, File-System: ext4, System Layer: amazon
Kernel Notes: Transparent Huge Pages: madvise
Environment Notes: CXXFLAGS="-O3 -march=armv8.4-a+sve" CFLAGS="-O3 -march=armv8.4-a+sve"
Compiler Notes: --build=aarch64-linux-gnu --disable-libquadmath --disable-libquadmath-support --disable-nls --disable-werror --enable-checking=yes,extra,rtl --enable-clocale=gnu --enable-fix-cortex-a53-843419 --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++ --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-objc-gc=auto --enable-plugin --enable-shared --host=aarch64-linux-gnu --program-prefix= --target=aarch64-linux-gnu --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-target-system-zlib=auto -v
Python Notes: Python 3.10.4
Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of __user pointer sanitization + spectre_v2: Mitigation of CSV2 BHB + srbds: Not affected + tsx_async_abort: Not affected
LeelaChessZero LeelaChessZero (lc0 / lczero) is a chess engine automated via neural networks. This test profile can be used for OpenCL, CUDA + cuDNN, and BLAS (CPU-based) benchmarking. Learn more via the OpenBenchmarking.org test page.
LeelaChessZero 0.28, Backend: BLAS (Nodes Per Second, More Is Better). armv8.4-a+sve (-march=armv8.4-a+sve): 1297, SE +/- 6.96, N = 3. armv8.4-a (-march=armv8.4-a): 1281, SE +/- 13.99, N = 5. (CXX) g++ options: -flto -O3 -pthread
LeelaChessZero 0.28, Backend: Eigen (Nodes Per Second, More Is Better). armv8.4-a+sve (-march=armv8.4-a+sve): 1333, SE +/- 14.64, N = 5. armv8.4-a (-march=armv8.4-a): 1311, SE +/- 13.65, N = 3. (CXX) g++ options: -flto -O3 -pthread
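As a rough way to read the two LeelaChessZero results, the sketch below computes the percentage difference between the builds and compares it with the reported standard errors. It assumes the first value in each result belongs to armv8.4-a+sve, and that the two standard errors can be combined as if the runs were independent (root-sum-square). The `compare` helper is hypothetical and not part of the Phoronix Test Suite.

```python
import math

def compare(sve, se_sve, base, se_base):
    """Return (percent difference of sve over base, difference in units of combined SE)."""
    pct = (sve - base) / base * 100.0
    # Assumption: the runs are independent, so their standard errors add in quadrature.
    combined_se = math.sqrt(se_sve ** 2 + se_base ** 2)
    return pct, (sve - base) / combined_se

# Nodes per second and SE values taken from the two LeelaChessZero 0.28 results.
blas_pct, blas_ratio = compare(1297, 6.96, 1281, 13.99)
eigen_pct, eigen_ratio = compare(1333, 14.64, 1311, 13.65)

print(f"BLAS:  {blas_pct:+.2f}% ({blas_ratio:.2f} combined SE)")
print(f"Eigen: {eigen_pct:+.2f}% ({eigen_ratio:.2f} combined SE)")
```

Both differences come out at roughly one combined standard error, so on these numbers the two builds are hard to tell apart for this test.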
WebP Image Encode This is a test of Google's libwebp with the cwebp image encode utility and using a sample 6000x4000 pixel JPEG image as the input. Learn more via the OpenBenchmarking.org test page.
WebP Image Encode 1.1, Encode Settings: Quality 100, Lossless (Encode Time - Seconds, Fewer Is Better). armv8.4-a+sve (-march=armv8.4-a+sve): 23.40, SE +/- 0.18, N = 3. armv8.4-a (-march=armv8.4-a): 23.85, SE +/- 0.01, N = 3. (CC) gcc options: -fvisibility=hidden -O3 -lm -ljpeg -lpng16 -ltiff
WebP Image Encode 1.1, Encode Settings: Quality 100, Highest Compression (Encode Time - Seconds, Fewer Is Better). armv8.4-a+sve (-march=armv8.4-a+sve): 8.640, SE +/- 0.017, N = 3. armv8.4-a (-march=armv8.4-a): 8.630, SE +/- 0.006, N = 3. (CC) gcc options: -fvisibility=hidden -O3 -lm -ljpeg -lpng16 -ltiff
GNU GMP GMPbench GMPbench is a test of the GNU Multiple Precision Arithmetic (GMP) Library. GMPbench is a single-threaded integer benchmark that leverages the GMP library to stress the CPU with widening integer multiplication. Learn more via the OpenBenchmarking.org test page.
GNU GMP GMPbench 6.2.1, Total Time (GMPbench Score, More Is Better). armv8.4-a+sve (-march=armv8.4-a+sve): 4155.6. armv8.4-a (-march=armv8.4-a): 4152.3. (CC) gcc options: -O3 -lm
Xmrig Xmrig is an open-source cross-platform CPU/GPU miner for RandomX, KawPow, CryptoNight and AstroBWT. This test profile is set up to measure the Xmrig CPU mining performance. Learn more via the OpenBenchmarking.org test page.
Xmrig 6.12.1, Variant: Monero - Hash Count: 1M (H/s, More Is Better). armv8.4-a+sve (-march=armv8.4-a+sve): 8669.8, SE +/- 6.18, N = 3. armv8.4-a (-march=armv8.4-a): 8645.4, SE +/- 9.56, N = 3. (CXX) g++ options: -O3 -fexceptions -fno-rtti -Ofast -static-libgcc -static-libstdc++ -rdynamic -lssl -lcrypto -luv -lpthread -lrt -ldl -lhwloc
Xmrig 6.12.1, Variant: Wownero - Hash Count: 1M (H/s, More Is Better). armv8.4-a+sve (-march=armv8.4-a+sve): 11877.8, SE +/- 26.88, N = 3. armv8.4-a (-march=armv8.4-a): 11811.2, SE +/- 27.59, N = 3. (CXX) g++ options: -O3 -fexceptions -fno-rtti -Ofast -static-libgcc -static-libstdc++ -rdynamic -lssl -lcrypto -luv -lpthread -lrt -ldl -lhwloc
Zstd Compression This test measures the time needed to compress/decompress a sample file (a FreeBSD disk image - FreeBSD-12.2-RELEASE-amd64-memstick.img) using Zstd compression with options for different compression levels / settings. Learn more via the OpenBenchmarking.org test page.
Zstd Compression 1.5.0, Compression Level: 3 - Compression Speed (MB/s, More Is Better). armv8.4-a+sve (-march=armv8.4-a+sve): 7027.2, SE +/- 15.42, N = 3. armv8.4-a (-march=armv8.4-a): 6937.8, SE +/- 14.45, N = 3. (CC) gcc options: -O3 -pthread -lz -llzma
Zstd Compression 1.5.0, Compression Level: 19 - Compression Speed (MB/s, More Is Better). armv8.4-a+sve (-march=armv8.4-a+sve): 74.0, SE +/- 0.03, N = 3. armv8.4-a (-march=armv8.4-a): 72.9, SE +/- 0.23, N = 3. (CC) gcc options: -O3 -pthread -lz -llzma
Zstd Compression 1.5.0, Compression Level: 19 - Decompression Speed (MB/s, More Is Better). armv8.4-a+sve (-march=armv8.4-a+sve): 3094.8, SE +/- 6.60, N = 3. armv8.4-a (-march=armv8.4-a): 3083.4, SE +/- 8.46, N = 3. (CC) gcc options: -O3 -pthread -lz -llzma
Zstd Compression 1.5.0, Compression Level: 3, Long Mode - Compression Speed (MB/s, More Is Better). armv8.4-a+sve (-march=armv8.4-a+sve): 1242.7, SE +/- 4.88, N = 3. armv8.4-a (-march=armv8.4-a): 1241.3, SE +/- 5.37, N = 3. (CC) gcc options: -O3 -pthread -lz -llzma
Zstd Compression 1.5.0, Compression Level: 3, Long Mode - Decompression Speed (MB/s, More Is Better). armv8.4-a+sve (-march=armv8.4-a+sve): 3824.8, SE +/- 1.28, N = 3. armv8.4-a (-march=armv8.4-a): 3820.8, SE +/- 3.95, N = 3. (CC) gcc options: -O3 -pthread -lz -llzma
Zstd Compression 1.5.0, Compression Level: 19, Long Mode - Compression Speed (MB/s, More Is Better). armv8.4-a+sve (-march=armv8.4-a+sve): 40.3, SE +/- 0.00, N = 3. armv8.4-a (-march=armv8.4-a): 40.0, SE +/- 0.03, N = 3. (CC) gcc options: -O3 -pthread -lz -llzma
Zstd Compression 1.5.0, Compression Level: 19, Long Mode - Decompression Speed (MB/s, More Is Better). armv8.4-a+sve (-march=armv8.4-a+sve): 3263.7, SE +/- 0.59, N = 3. armv8.4-a (-march=armv8.4-a): 3250.9, SE +/- 7.62, N = 3. (CC) gcc options: -O3 -pthread -lz -llzma
JPEG XL libjxl The JPEG XL Image Coding System is designed to provide next-generation JPEG image capabilities with JPEG XL offering better image quality and compression over legacy JPEG. This test profile is currently focused on the multi-threaded JPEG XL image encode performance using the reference libjxl library. Learn more via the OpenBenchmarking.org test page.
JPEG XL libjxl 0.6.1, Input: PNG - Encode Speed: 7 (MP/s, More Is Better). armv8.4-a+sve (-march=armv8.4-a+sve): 8.36, SE +/- 0.01, N = 3. armv8.4-a (-march=armv8.4-a): 8.32, SE +/- 0.02, N = 3. (CXX) g++ options: -O3 -funwind-tables -O2 -fPIE -pie
JPEG XL libjxl 0.6.1, Input: PNG - Encode Speed: 8 (MP/s, More Is Better). armv8.4-a+sve (-march=armv8.4-a+sve): 0.67, SE +/- 0.00, N = 3. armv8.4-a (-march=armv8.4-a): 0.67, SE +/- 0.00, N = 3. (CXX) g++ options: -O3 -funwind-tables -O2 -fPIE -pie
JPEG XL libjxl 0.6.1, Input: JPEG - Encode Speed: 7 (MP/s, More Is Better). armv8.4-a+sve (-march=armv8.4-a+sve): 79.58, SE +/- 0.12, N = 3. armv8.4-a (-march=armv8.4-a): 73.21, SE +/- 0.13, N = 3. (CXX) g++ options: -O3 -funwind-tables -O2 -fPIE -pie
JPEG XL libjxl 0.6.1, Input: JPEG - Encode Speed: 8 (MP/s, More Is Better). armv8.4-a+sve (-march=armv8.4-a+sve): 27.34, SE +/- 0.03, N = 3. armv8.4-a (-march=armv8.4-a): 26.30, SE +/- 0.01, N = 3. (CXX) g++ options: -O3 -funwind-tables -O2 -fPIE -pie
Nettle GNU Nettle is a low-level cryptographic library used by GnuTLS and other software. Learn more via the OpenBenchmarking.org test page.
Nettle 3.8, Test: aes256 (Mbyte/s, More Is Better). armv8.4-a+sve (-march=armv8.4-a+sve): 4447.04, SE +/- 0.39, N = 3, MIN: 3925.11 / MAX: 5627.84. armv8.4-a (-march=armv8.4-a): 4435.91, SE +/- 3.11, N = 3, MIN: 3927.32 / MAX: 5628.86. (CC) gcc options: -O3 -ggdb3 -lnettle -lgmp -lm -lcrypto
Nettle 3.8, Test: chacha (Mbyte/s, More Is Better). armv8.4-a+sve (-march=armv8.4-a+sve): 733.59, SE +/- 0.55, N = 3, MIN: 442.26 / MAX: 956.22. armv8.4-a (-march=armv8.4-a): 740.25, SE +/- 0.61, N = 3, MIN: 454.21 / MAX: 956.53. (CC) gcc options: -O3 -ggdb3 -lnettle -lgmp -lm -lcrypto
Nettle 3.8, Test: sha512 (Mbyte/s, More Is Better). armv8.4-a+sve (-march=armv8.4-a+sve): 504.33, SE +/- 0.07, N = 3. armv8.4-a (-march=armv8.4-a): 498.83, SE +/- 0.04, N = 3. (CC) gcc options: -O3 -ggdb3 -lnettle -lgmp -lm -lcrypto
Nettle 3.8, Test: poly1305-aes (Mbyte/s, More Is Better). armv8.4-a+sve (-march=armv8.4-a+sve): 820.51, SE +/- 5.37, N = 3. armv8.4-a (-march=armv8.4-a): 871.90, SE +/- 1.52, N = 3. (CC) gcc options: -O3 -ggdb3 -lnettle -lgmp -lm -lcrypto
LuaJIT This test profile is a collection of Lua scripts/benchmarks run against a locally-built copy of LuaJIT upstream. Learn more via the OpenBenchmarking.org test page.
LuaJIT 2.1-git, Test: Composite (Mflops, More Is Better). armv8.4-a+sve (-march=armv8.4-a+sve): 1309.03, SE +/- 18.19, N = 3. armv8.4-a (-march=armv8.4-a): 1282.59, SE +/- 0.41, N = 3. (CC) gcc options: -lm -ldl -O2 -fomit-frame-pointer -O3 -U_FORTIFY_SOURCE -fno-stack-protector
LuaJIT 2.1-git, Test: Monte Carlo (Mflops, More Is Better). armv8.4-a+sve (-march=armv8.4-a+sve): 343.85, SE +/- 0.54, N = 3. armv8.4-a (-march=armv8.4-a): 343.27, SE +/- 0.35, N = 3. (CC) gcc options: -lm -ldl -O2 -fomit-frame-pointer -O3 -U_FORTIFY_SOURCE -fno-stack-protector
LuaJIT 2.1-git, Test: Fast Fourier Transform (Mflops, More Is Better). armv8.4-a+sve (-march=armv8.4-a+sve): 615.71, SE +/- 10.69, N = 3. armv8.4-a (-march=armv8.4-a): 661.55, SE +/- 0.39, N = 3. (CC) gcc options: -lm -ldl -O2 -fomit-frame-pointer -O3 -U_FORTIFY_SOURCE -fno-stack-protector
LuaJIT 2.1-git, Test: Sparse Matrix Multiply (Mflops, More Is Better). armv8.4-a+sve (-march=armv8.4-a+sve): 1162.33, SE +/- 7.20, N = 3. armv8.4-a (-march=armv8.4-a): 1151.57, SE +/- 3.14, N = 3. (CC) gcc options: -lm -ldl -O2 -fomit-frame-pointer -O3 -U_FORTIFY_SOURCE -fno-stack-protector
LuaJIT 2.1-git, Test: Dense LU Matrix Factorization (Mflops, More Is Better). armv8.4-a+sve (-march=armv8.4-a+sve): 3521.13, SE +/- 86.66, N = 3. armv8.4-a (-march=armv8.4-a): 3355.53, SE +/- 6.02, N = 3. (CC) gcc options: -lm -ldl -O2 -fomit-frame-pointer -O3 -U_FORTIFY_SOURCE -fno-stack-protector
LuaJIT 2.1-git, Test: Jacobi Successive Over-Relaxation (Mflops, More Is Better). armv8.4-a+sve (-march=armv8.4-a+sve): 902.16, SE +/- 1.00, N = 3. armv8.4-a (-march=armv8.4-a): 901.02, SE +/- 0.68, N = 3. (CC) gcc options: -lm -ldl -O2 -fomit-frame-pointer -O3 -U_FORTIFY_SOURCE -fno-stack-protector
Botan Botan is a BSD-licensed cross-platform open-source C++ crypto library "cryptography toolkit" that supports most publicly known cryptographic algorithms. The project's stated goal is to be "the best option for cryptography in C++ by offering the tools necessary to implement a range of practical systems, such as TLS protocol, X.509 certificates, modern AEAD ciphers, PKCS#11 and TPM hardware support, password hashing, and post quantum crypto schemes." Learn more via the OpenBenchmarking.org test page.
Botan 2.17.3, Test: KASUMI (MiB/s, More Is Better). armv8.4-a+sve: 62.00, SE +/- 0.00, N = 3. armv8.4-a: 62.02, SE +/- 0.00, N = 3. (CXX) g++ options: -fstack-protector -pthread -lbotan-2 -ldl -lrt
Botan 2.17.3, Test: KASUMI - Decrypt (MiB/s, More Is Better). armv8.4-a+sve: 62.26, SE +/- 0.00, N = 3. armv8.4-a: 62.28, SE +/- 0.00, N = 3. (CXX) g++ options: -fstack-protector -pthread -lbotan-2 -ldl -lrt
Botan 2.17.3, Test: AES-256 (MiB/s, More Is Better). armv8.4-a+sve: 5442.65, SE +/- 9.30, N = 3. armv8.4-a: 5494.31, SE +/- 14.75, N = 3. (CXX) g++ options: -fstack-protector -pthread -lbotan-2 -ldl -lrt
Botan 2.17.3, Test: AES-256 - Decrypt (MiB/s, More Is Better). armv8.4-a+sve: 5474.32, SE +/- 8.64, N = 3. armv8.4-a: 5477.57, SE +/- 5.58, N = 3. (CXX) g++ options: -fstack-protector -pthread -lbotan-2 -ldl -lrt
Botan 2.17.3, Test: Twofish (MiB/s, More Is Better). armv8.4-a+sve: 248.89, SE +/- 0.26, N = 3. armv8.4-a: 239.70, SE +/- 0.12, N = 3. (CXX) g++ options: -fstack-protector -pthread -lbotan-2 -ldl -lrt
Botan 2.17.3, Test: Twofish - Decrypt (MiB/s, More Is Better). armv8.4-a+sve: 258.15, SE +/- 0.11, N = 3. armv8.4-a: 246.16, SE +/- 0.20, N = 3. (CXX) g++ options: -fstack-protector -pthread -lbotan-2 -ldl -lrt
Botan 2.17.3, Test: Blowfish (MiB/s, More Is Better). armv8.4-a+sve: 280.57, SE +/- 0.10, N = 3. armv8.4-a: 278.87, SE +/- 0.29, N = 3. (CXX) g++ options: -fstack-protector -pthread -lbotan-2 -ldl -lrt
Botan 2.17.3, Test: Blowfish - Decrypt (MiB/s, More Is Better). armv8.4-a+sve: 289.03, SE +/- 0.03, N = 3. armv8.4-a: 288.51, SE +/- 0.07, N = 3. (CXX) g++ options: -fstack-protector -pthread -lbotan-2 -ldl -lrt
Botan 2.17.3, Test: CAST-256 (MiB/s, More Is Better). armv8.4-a+sve: 108.75, SE +/- 0.01, N = 3. armv8.4-a: 108.79, SE +/- 0.02, N = 3. (CXX) g++ options: -fstack-protector -pthread -lbotan-2 -ldl -lrt
Botan 2.17.3, Test: CAST-256 - Decrypt (MiB/s, More Is Better). armv8.4-a+sve: 108.62, SE +/- 0.01, N = 3. armv8.4-a: 108.60, SE +/- 0.02, N = 3. (CXX) g++ options: -fstack-protector -pthread -lbotan-2 -ldl -lrt
Botan 2.17.3, Test: ChaCha20Poly1305 (MiB/s, More Is Better). armv8.4-a+sve: 390.31, SE +/- 0.07, N = 3. armv8.4-a: 389.38, SE +/- 0.04, N = 3. (CXX) g++ options: -fstack-protector -pthread -lbotan-2 -ldl -lrt
Botan 2.17.3, Test: ChaCha20Poly1305 - Decrypt (MiB/s, More Is Better). armv8.4-a+sve: 383.95, SE +/- 0.13, N = 3. armv8.4-a: 382.51, SE +/- 0.02, N = 3. (CXX) g++ options: -fstack-protector -pthread -lbotan-2 -ldl -lrt
GraphicsMagick This is a test of GraphicsMagick with its OpenMP implementation that performs various imaging tests on a sample 6000x4000 pixel JPEG image. Learn more via the OpenBenchmarking.org test page.
GraphicsMagick 1.3.33, Operation: Swirl (Iterations Per Minute, More Is Better). armv8.4-a+sve (-march=armv8.4-a+sve): 1272, SE +/- 1.33, N = 3. armv8.4-a (-march=armv8.4-a): 1225, SE +/- 0.33, N = 3. (CC) gcc options: -fopenmp -O3 -ljbig -ltiff -lfreetype -ljpeg -lXext -lSM -lICE -lX11 -llzma -lz -lm -lpthread
GraphicsMagick 1.3.33, Operation: Rotate (Iterations Per Minute, More Is Better). armv8.4-a+sve (-march=armv8.4-a+sve): 611, SE +/- 1.20, N = 3. armv8.4-a (-march=armv8.4-a): 577, SE +/- 0.00, N = 3. (CC) gcc options: -fopenmp -O3 -ljbig -ltiff -lfreetype -ljpeg -lXext -lSM -lICE -lX11 -llzma -lz -lm -lpthread
GraphicsMagick 1.3.33, Operation: Enhanced (Iterations Per Minute, More Is Better). armv8.4-a+sve (-march=armv8.4-a+sve): 718, SE +/- 0.33, N = 3. armv8.4-a (-march=armv8.4-a): 732, SE +/- 0.33, N = 3. (CC) gcc options: -fopenmp -O3 -ljbig -ltiff -lfreetype -ljpeg -lXext -lSM -lICE -lX11 -llzma -lz -lm -lpthread
GraphicsMagick 1.3.33, Operation: Resizing (Iterations Per Minute, More Is Better). armv8.4-a+sve (-march=armv8.4-a+sve): 2414, SE +/- 1.86, N = 3. armv8.4-a (-march=armv8.4-a): 2339, SE +/- 22.36, N = 3. (CC) gcc options: -fopenmp -O3 -ljbig -ltiff -lfreetype -ljpeg -lXext -lSM -lICE -lX11 -llzma -lz -lm -lpthread
GraphicsMagick 1.3.33, Operation: Noise-Gaussian (Iterations Per Minute, More Is Better). armv8.4-a+sve (-march=armv8.4-a+sve): 515, SE +/- 0.58, N = 3. armv8.4-a (-march=armv8.4-a): 494, SE +/- 0.67, N = 3. (CC) gcc options: -fopenmp -O3 -ljbig -ltiff -lfreetype -ljpeg -lXext -lSM -lICE -lX11 -llzma -lz -lm -lpthread
GraphicsMagick 1.3.33, Operation: HWB Color Space (Iterations Per Minute, More Is Better). armv8.4-a+sve (-march=armv8.4-a+sve): 1067, SE +/- 0.33, N = 3. armv8.4-a (-march=armv8.4-a): 978, SE +/- 1.00, N = 3. (CC) gcc options: -fopenmp -O3 -ljbig -ltiff -lfreetype -ljpeg -lXext -lSM -lICE -lX11 -llzma -lz -lm -lpthread
AOM AV1
AOM AV1 3.3, Encoder Mode: Speed 8 Realtime - Input: Bosphorus 4K (Frames Per Second, More Is Better). armv8.4-a+sve (-march=armv8.4-a+sve): 62.19, SE +/- 0.61, N = 3. armv8.4-a (-march=armv8.4-a): 62.13, SE +/- 0.43, N = 3. (CXX) g++ options: -O3 -std=c++11 -U_FORTIFY_SOURCE -lm
AOM AV1 3.3, Encoder Mode: Speed 10 Realtime - Input: Bosphorus 4K (Frames Per Second, More Is Better). armv8.4-a+sve (-march=armv8.4-a+sve): 65.62, SE +/- 0.22, N = 3. armv8.4-a (-march=armv8.4-a): 61.88, SE +/- 0.47, N = 3. (CXX) g++ options: -O3 -std=c++11 -U_FORTIFY_SOURCE -lm
AOM AV1 3.3, Encoder Mode: Speed 8 Realtime - Input: Bosphorus 1080p (Frames Per Second, More Is Better). armv8.4-a+sve (-march=armv8.4-a+sve): 123.95, SE +/- 0.09, N = 3. armv8.4-a (-march=armv8.4-a): 120.13, SE +/- 0.03, N = 3. (CXX) g++ options: -O3 -std=c++11 -U_FORTIFY_SOURCE -lm
AOM AV1 3.3, Encoder Mode: Speed 9 Realtime - Input: Bosphorus 1080p (Frames Per Second, More Is Better). armv8.4-a+sve (-march=armv8.4-a+sve): 156.71, SE +/- 0.20, N = 3. armv8.4-a (-march=armv8.4-a): 152.46, SE +/- 0.03, N = 3. (CXX) g++ options: -O3 -std=c++11 -U_FORTIFY_SOURCE -lm
AOM AV1 3.3, Encoder Mode: Speed 10 Realtime - Input: Bosphorus 1080p (Frames Per Second, More Is Better). armv8.4-a+sve (-march=armv8.4-a+sve): 193.68, SE +/- 0.20, N = 3. armv8.4-a (-march=armv8.4-a): 190.27, SE +/- 0.28, N = 3. (CXX) g++ options: -O3 -std=c++11 -U_FORTIFY_SOURCE -lm
x264 This is a multi-threaded test of the x264 video encoder run on the CPU with a choice of 1080p or 4K video input. Learn more via the OpenBenchmarking.org test page.
x264 2022-02-22, Video Input: Bosphorus 4K (Frames Per Second, More Is Better). armv8.4-a+sve: 48.51, SE +/- 0.01, N = 3. armv8.4-a: 48.43, SE +/- 0.08, N = 3. (CC) gcc options: -ldl -lavformat -lavcodec -lavutil -lswscale -lm -lpthread -O3 -flto
x264 2022-02-22, Video Input: Bosphorus 1080p (Frames Per Second, More Is Better). armv8.4-a+sve: 169.58, SE +/- 0.10, N = 3. armv8.4-a: 168.92, SE +/- 0.07, N = 3. (CC) gcc options: -ldl -lavformat -lavcodec -lavutil -lswscale -lm -lpthread -O3 -flto
Stockfish This is a test of Stockfish, an advanced open-source C++11 chess benchmark that can scale up to 512 CPU threads. Learn more via the OpenBenchmarking.org test page.
Stockfish 13, Total Time (Nodes Per Second, More Is Better). armv8.4-a+sve (-march=armv8.4-a+sve): 55823340, SE +/- 645132.01, N = 3. armv8.4-a (-march=armv8.4-a): 57485680, SE +/- 721518.45, N = 14. (CXX) g++ options: -lgcov -lpthread -O3 -fno-exceptions -std=c++17 -pedantic -flto -fprofile-use -fno-peel-loops -fno-tracer -flto=jobserver
Stargate Digital Audio Workstation Stargate is an open-source, cross-platform digital audio workstation (DAW) software package with "a unique and carefully curated experience" with scalability from old systems up through modern multi-core systems. Stargate is GPLv3 licensed and makes use of Qt5 (PyQt5) for its user-interface. Learn more via the OpenBenchmarking.org test page.
Stargate Digital Audio Workstation 21.10.9, Sample Rate: 44100 - Buffer Size: 512 (Render Ratio, More Is Better). armv8.4-a+sve: 6.213500, SE +/- 0.003428, N = 3. armv8.4-a: 6.072916, SE +/- 0.002564, N = 3. (CXX) g++ options: -lpthread -lsndfile -lm -O3 -march=native -ffast-math -funroll-loops -fstrength-reduce -fstrict-aliasing -finline-functions
Stargate Digital Audio Workstation 21.10.9, Sample Rate: 96000 - Buffer Size: 512 (Render Ratio, More Is Better). armv8.4-a+sve: 4.474848, SE +/- 0.002191, N = 3. armv8.4-a: 4.414055, SE +/- 0.003502, N = 3. (CXX) g++ options: -lpthread -lsndfile -lm -O3 -march=native -ffast-math -funroll-loops -fstrength-reduce -fstrict-aliasing -finline-functions
Stargate Digital Audio Workstation 21.10.9, Sample Rate: 44100 - Buffer Size: 1024 (Render Ratio, More Is Better). armv8.4-a+sve: 6.538163, SE +/- 0.002411, N = 3. armv8.4-a: 6.370035, SE +/- 0.002515, N = 3. (CXX) g++ options: -lpthread -lsndfile -lm -O3 -march=native -ffast-math -funroll-loops -fstrength-reduce -fstrict-aliasing -finline-functions
Stargate Digital Audio Workstation 21.10.9, Sample Rate: 480000 - Buffer Size: 512 (Render Ratio, More Is Better). armv8.4-a+sve: 6.124455, SE +/- 0.002030, N = 3. armv8.4-a: 6.005386, SE +/- 0.001956, N = 3. (CXX) g++ options: -lpthread -lsndfile -lm -O3 -march=native -ffast-math -funroll-loops -fstrength-reduce -fstrict-aliasing -finline-functions
Stargate Digital Audio Workstation 21.10.9, Sample Rate: 96000 - Buffer Size: 1024 (Render Ratio, More Is Better). armv8.4-a+sve: 4.807797, SE +/- 0.002826, N = 3. armv8.4-a: 4.729848, SE +/- 0.000903, N = 3. (CXX) g++ options: -lpthread -lsndfile -lm -O3 -march=native -ffast-math -funroll-loops -fstrength-reduce -fstrict-aliasing -finline-functions
Stargate Digital Audio Workstation 21.10.9, Sample Rate: 480000 - Buffer Size: 1024 (Render Ratio, More Is Better). armv8.4-a+sve: 6.450182, SE +/- 0.002250, N = 3. armv8.4-a: 6.322684, SE +/- 0.002118, N = 3. (CXX) g++ options: -lpthread -lsndfile -lm -O3 -march=native -ffast-math -funroll-loops -fstrength-reduce -fstrict-aliasing -finline-functions
C-Ray This is a test of C-Ray, a simple raytracer designed to test the floating-point CPU performance. This test is multi-threaded (16 threads per core), will shoot 8 rays per pixel for anti-aliasing, and will generate a 1600 x 1200 image. Learn more via the OpenBenchmarking.org test page.
C-Ray 1.1, Total Time - 4K, 16 Rays Per Pixel (Seconds, Fewer Is Better). armv8.4-a+sve (-march=armv8.4-a+sve): 19.30, SE +/- 0.00, N = 3. armv8.4-a (-march=armv8.4-a): 19.30, SE +/- 0.03, N = 3. (CC) gcc options: -lm -lpthread -O3
POV-Ray This is a test of POV-Ray, the Persistence of Vision Raytracer. POV-Ray is used to create 3D graphics using ray-tracing. Learn more via the OpenBenchmarking.org test page.
POV-Ray 3.7.0.7, Trace Time (Seconds, Fewer Is Better). armv8.4-a+sve (-march=armv8.4-a+sve): 20.26, SE +/- 0.04, N = 3. armv8.4-a (-march=armv8.4-a): 19.85, SE +/- 0.03, N = 3. (CXX) g++ options: -pipe -O3 -ffast-math -R/usr/lib -lXpm -lSM -lICE -lX11 -lIlmImf -lIlmImf-2_5 -lImath-2_5 -lHalf-2_5 -lIex-2_5 -lIexMath-2_5 -lIlmThread-2_5 -lIlmThread -ltiff -ljpeg -lpng -lz -lrt -lm -lboost_thread -lboost_system
Primesieve Primesieve generates prime numbers using a highly optimized sieve of Eratosthenes implementation. Primesieve benchmarks the CPU's L1/L2 cache performance. Learn more via the OpenBenchmarking.org test page.
Primesieve 7.7, 1e12 Prime Number Generation (Seconds, Fewer Is Better). armv8.4-a+sve (-march=armv8.4-a+sve): 8.533, SE +/- 0.043, N = 3. armv8.4-a (-march=armv8.4-a): 8.438, SE +/- 0.022, N = 3. (CXX) g++ options: -O3
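For readers unfamiliar with the algorithm Primesieve builds on, here is a minimal textbook sieve of Eratosthenes. It is only an illustration of the basic idea: Primesieve's own implementation is heavily optimized, and nothing below reflects its internals. The function name `sieve` is hypothetical.

```python
def sieve(limit):
    """Return all primes <= limit using the basic sieve of Eratosthenes."""
    if limit < 2:
        return []
    # One flag per integer in [0, limit]; 1 means "still considered prime".
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    p = 2
    while p * p <= limit:
        if is_prime[p]:
            # Cross out multiples of p, starting at p*p (smaller ones are already crossed out).
            count = len(range(p * p, limit + 1, p))
            is_prime[p * p::p] = bytearray(count)
        p += 1
    return [n for n, flag in enumerate(is_prime) if flag]

print(sieve(30))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

The flag array is scanned and rewritten repeatedly, which is why a sieve at the 1e12 scale used in this test is sensitive to cache behavior.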
Smallpt Smallpt is a C++ global illumination renderer written in less than 100 lines of code. Global illumination is done via unbiased Monte Carlo path tracing and there is multi-threading support via the OpenMP library. Learn more via the OpenBenchmarking.org test page.
Smallpt 1.0, Global Illumination Renderer; 128 Samples (Seconds, Fewer Is Better). armv8.4-a+sve (-march=armv8.4-a+sve): 3.896, SE +/- 0.003, N = 3. armv8.4-a (-march=armv8.4-a): 3.895, SE +/- 0.002, N = 3. (CXX) g++ options: -fopenmp -O3
AOBench AOBench is a lightweight ambient occlusion renderer, written in C. The test profile is using a size of 2048 x 2048. Learn more via the OpenBenchmarking.org test page.
AOBench, Size: 2048 x 2048 - Total Time (Seconds, Fewer Is Better). armv8.4-a+sve (-march=armv8.4-a+sve): 33.49, SE +/- 0.01, N = 3. armv8.4-a (-march=armv8.4-a): 33.48, SE +/- 0.00, N = 3. (CC) gcc options: -lm -O3
LAME MP3 Encoding LAME is an MP3 encoder licensed under the LGPL. This test measures the time required to encode a WAV file to MP3 format. Learn more via the OpenBenchmarking.org test page.
LAME MP3 Encoding 3.100, WAV To MP3 (Seconds, Fewer Is Better). armv8.4-a+sve (-march=armv8.4-a+sve): 7.440, SE +/- 0.002, N = 3. armv8.4-a (-march=armv8.4-a): 8.054, SE +/- 0.004, N = 3. (CC) gcc options: -O3 -ffast-math -funroll-loops -fschedule-insns2 -fbranch-count-reg -fforce-addr -pipe -lm
Opus Codec Encoding Opus is an open audio codec. Opus is a lossy audio compression format designed primarily for interactive real-time applications over the Internet. This test uses Opus-Tools and measures the time required to encode a WAV file to Opus. Learn more via the OpenBenchmarking.org test page.
Opus Codec Encoding 1.3.1, WAV To Opus Encode (Seconds, Fewer Is Better). armv8.4-a+sve (-march=armv8.4-a+sve): 14.40, SE +/- 0.02, N = 5. armv8.4-a (-march=armv8.4-a): 18.32, SE +/- 0.00, N = 5. (CXX) g++ options: -O3 -fvisibility=hidden -logg -lm
eSpeak-NG Speech Engine This test times how long it takes the eSpeak speech synthesizer to read Project Gutenberg's The Outline of Science and output to a WAV file. This test profile is now tracking the eSpeak-NG version of eSpeak. Learn more via the OpenBenchmarking.org test page.
eSpeak-NG Speech Engine 20200907, Text-To-Speech Synthesis (Seconds, Fewer Is Better). armv8.4-a+sve (-march=armv8.4-a+sve): 29.98, SE +/- 0.30, N = 20. armv8.4-a (-march=armv8.4-a): 36.59, SE +/- 0.31, N = 16. (CC) gcc options: -O3 -std=c99
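The three audio and speech tests (LAME, Opus, eSpeak-NG) report time in seconds, where fewer is better, so a speedup is the baseline time divided by the new time. The sketch below works that out, assuming the first value in each result belongs to armv8.4-a+sve and the second to armv8.4-a. The `speedup` helper and the `results` dictionary are illustrative names only.

```python
def speedup(sve_seconds, base_seconds):
    """Speedup of the +sve build over the baseline for a 'fewer is better' timing."""
    return base_seconds / sve_seconds

# (armv8.4-a+sve seconds, armv8.4-a seconds), copied from the results above.
results = {
    "LAME MP3 Encoding": (7.440, 8.054),
    "Opus Codec Encoding": (14.40, 18.32),
    "eSpeak-NG Speech Engine": (29.98, 36.59),
}

speedups = {name: speedup(sve, base) for name, (sve, base) in results.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor:.3f}x")
```

A factor above 1.0 means the +sve build finished sooner. Under the ordering assumption stated above, all three of these tests land above 1.0.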
Ngspice Ngspice is an open-source SPICE circuit simulator. Ngspice was originally based on the Berkeley SPICE electronic circuit simulator. Ngspice supports basic threading using OpenMP. This test profile is making use of the ISCAS 85 benchmark circuits. Learn more via the OpenBenchmarking.org test page.
Ngspice 34, Circuit: C2670 (Seconds, Fewer Is Better). armv8.4-a+sve (-march=armv8.4-a+sve): 106.91, SE +/- 0.18, N = 3. armv8.4-a (-march=armv8.4-a): 102.56, SE +/- 0.06, N = 3. (CC) gcc options: -O3 -fopenmp -lm -lstdc++ -lfftw3 -lXaw -lXmu -lXt -lXext -lX11 -lXft -lfontconfig -lXrender -lfreetype -lSM -lICE
Ngspice 34, Circuit: C7552 (Seconds, Fewer Is Better). armv8.4-a+sve (-march=armv8.4-a+sve): 111.64, SE +/- 1.04, N = 3. armv8.4-a (-march=armv8.4-a): 103.93, SE +/- 0.51, N = 3. (CC) gcc options: -O3 -fopenmp -lm -lstdc++ -lfftw3 -lXaw -lXmu -lXt -lXext -lX11 -lXft -lfontconfig -lXrender -lfreetype -lSM -lICE
RNNoise RNNoise is a recurrent neural network for audio noise reduction developed by Mozilla and Xiph.Org. This test profile is a single-threaded test measuring the time to denoise a sample 26 minute long 16-bit RAW audio file using this recurrent neural network noise suppression library. Learn more via the OpenBenchmarking.org test page.
RNNoise 2020-06-28 (Seconds, Fewer Is Better). armv8.4-a+sve (-march=armv8.4-a+sve): 17.39, SE +/- 0.05, N = 3. armv8.4-a (-march=armv8.4-a): 17.62, SE +/- 0.05, N = 3. (CC) gcc options: -O3 -pedantic -fvisibility=hidden
OpenJPEG OpenJPEG is an open-source JPEG 2000 codec written in the C programming language. The default input for this test profile is the NASA/JPL-Caltech/MSSS Curiosity panorama 717MB TIFF image file converting to JPEG2000 format. Learn more via the OpenBenchmarking.org test page.
OpenJPEG 2.4, Encode: NASA Curiosity Panorama M34 (ms, Fewer Is Better). armv8.4-a+sve (-march=armv8.4-a+sve): 55196, SE +/- 89.48, N = 3. armv8.4-a (-march=armv8.4-a): 57205, SE +/- 19.06, N = 3. (CXX) g++ options: -O3 -rdynamic
OpenSSL OpenSSL is an open-source toolkit that implements SSL (Secure Sockets Layer) and TLS (Transport Layer Security) protocols. This test profile makes use of the built-in "openssl speed" benchmarking capabilities. Learn more via the OpenBenchmarking.org test page.
OpenSSL 3.0, Algorithm: SHA256 (byte/s, More Is Better). armv8.4-a+sve (-march=armv8.4-a+sve): 27428176880, SE +/- 32102278.66, N = 3. armv8.4-a (-march=armv8.4-a): 27603943570, SE +/- 25639974.71, N = 3. (CC) gcc options: -pthread -O3 -lssl -lcrypto -ldl
OpenSSL 3.0, Algorithm: RSA4096 (sign/s, More Is Better). armv8.4-a+sve (-march=armv8.4-a+sve): 5088.1, SE +/- 0.53, N = 3. armv8.4-a (-march=armv8.4-a): 5090.5, SE +/- 0.78, N = 3. (CC) gcc options: -pthread -O3 -lssl -lcrypto -ldl
OpenSSL 3.0, Algorithm: RSA4096 (verify/s, More Is Better). armv8.4-a+sve (-march=armv8.4-a+sve): 356407.8, SE +/- 10.52, N = 3. armv8.4-a (-march=armv8.4-a): 356359.6, SE +/- 8.85, N = 3. (CC) gcc options: -pthread -O3 -lssl -lcrypto -ldl
Liquid-DSP
LiquidSDR's Liquid-DSP is a software-defined radio (SDR) digital signal processing library. This test profile runs a multi-threaded benchmark of this SDR/DSP library focused on embedded platform usage.
Liquid-DSP 2021.01.31, Threads: 8 - Buffer Length: 256 - Filter Length: 57 (samples/s, More Is Better)
armv8.4-a+sve: 167733333 (SE +/- 26666.67, N = 3)
armv8.4-a: 176363333 (SE +/- 12018.50, N = 3)
Liquid-DSP 2021.01.31, Threads: 16 - Buffer Length: 256 - Filter Length: 57 (samples/s, More Is Better)
armv8.4-a+sve: 335423333 (SE +/- 20275.88, N = 3)
armv8.4-a: 352700000 (SE +/- 30550.50, N = 3)
Liquid-DSP 2021.01.31, Threads: 32 - Buffer Length: 256 - Filter Length: 57 (samples/s, More Is Better)
armv8.4-a+sve: 668636667 (SE +/- 1978807.16, N = 3)
armv8.4-a: 705233333 (SE +/- 125476.87, N = 3)
(CC) gcc options: -O3 -pthread -lm -lc -lliquid
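For the Liquid-DSP numbers, two derived figures are easy to work out by hand: how far the +sve build trails the baseline at each thread count, and how close each build comes to linear scaling from 8 to 32 threads. The snippet below is an illustrative sketch only. It assumes the first value in each result pair is the +sve build, and `scaling_efficiency` is a hypothetical helper, not something provided by the Phoronix Test Suite.

```python
# Throughput in samples/s from the Liquid-DSP results, keyed by thread count.
sve = {8: 167733333, 16: 335423333, 32: 668636667}
base = {8: 176363333, 16: 352700000, 32: 705233333}


def deficit_pct(sve_rate, base_rate):
    """Percent by which the +sve build trails the baseline (higher is better)."""
    return (1.0 - sve_rate / base_rate) * 100.0


def scaling_efficiency(rates, low=8, high=32):
    """Observed speedup from `low` to `high` threads divided by the ideal speedup."""
    return (rates[high] / rates[low]) / (high / low)


for threads in sorted(sve):
    print(f"{threads} threads: +sve trails by {deficit_pct(sve[threads], base[threads]):.2f}%")

print(f"+sve scaling efficiency, 8 to 32 threads: {scaling_efficiency(sve):.3f}")
print(f"baseline scaling efficiency, 8 to 32 threads: {scaling_efficiency(base):.3f}")
```

The deficit stays at roughly five percent at every thread count and both builds scale almost linearly, so the gap looks like a per-thread code generation effect rather than a threading one.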
GROMACS
The GROMACS (GROningen MAchine for Chemical Simulations) molecular dynamics package, tested with the water_GMX50 data. This test profile allows selecting between CPU and GPU-based GROMACS builds.
GROMACS 2022.1, Implementation: MPI CPU - Input: water_GMX50_bare (Ns Per Day, More Is Better)
armv8.4-a+sve: 2.275 (SE +/- 0.002, N = 3)
armv8.4-a: 2.277 (SE +/- 0.001, N = 3)
(CXX) g++ options: -O3
ASTC Encoder
ASTC Encoder (astcenc) is for the Adaptive Scalable Texture Compression (ASTC) format commonly used with OpenGL, OpenGL ES, and Vulkan graphics APIs. This test profile does a coding test of both compression and decompression.
ASTC Encoder 3.2, Preset: Medium (Seconds, Fewer Is Better)
armv8.4-a+sve: 4.8092 (SE +/- 0.0080, N = 3)
armv8.4-a: 4.8833 (SE +/- 0.0125, N = 3)
ASTC Encoder 3.2, Preset: Thorough (Seconds, Fewer Is Better)
armv8.4-a+sve: 9.0131 (SE +/- 0.0027, N = 3)
armv8.4-a: 9.1435 (SE +/- 0.0046, N = 3)
ASTC Encoder 3.2, Preset: Exhaustive (Seconds, Fewer Is Better)
armv8.4-a+sve: 35.19 (SE +/- 0.01, N = 3)
armv8.4-a: 35.36 (SE +/- 0.02, N = 3)
(CXX) g++ options: -O3 -flto -pthread
Google Draco
Draco is a library developed by Google for compressing/decompressing 3D geometric meshes and point clouds. This test profile uses some Artec3D PLY models as the sample 3D model input formats for Draco compression/decompression.
Google Draco 1.5.0, Model: Lion (ms, Fewer Is Better)
armv8.4-a+sve: 5309 (SE +/- 2.40, N = 3)
armv8.4-a: 5354 (SE +/- 2.65, N = 3)
Google Draco 1.5.0, Model: Church Facade (ms, Fewer Is Better)
armv8.4-a+sve: 7843 (SE +/- 7.00, N = 3)
armv8.4-a: 7935 (SE +/- 6.64, N = 3)
(CXX) g++ options: -O3
Redis
Redis is an open-source in-memory data structure store, used as a database, cache, and message broker.
Redis 6.0.9, Test: GET (Requests Per Second, More Is Better)
armv8.4-a+sve: 2513289.20 (SE +/- 9056.40, N = 3)
armv8.4-a: 2523377.92 (SE +/- 1605.36, N = 3)
Redis 6.0.9, Test: SET (Requests Per Second, More Is Better)
armv8.4-a+sve: 1861924.13 (SE +/- 7427.93, N = 3)
armv8.4-a: 1865840.13 (SE +/- 794.78, N = 3)
(CXX) g++ options: -MM -MT -g3 -fvisibility=hidden -O3
Caffe
This is a benchmark of the Caffe deep learning framework. It currently supports the AlexNet and GoogleNet models and execution on both CPUs and NVIDIA GPUs.
Caffe 2020-02-13, Model: AlexNet - Acceleration: CPU - Iterations: 200 (Milliseconds, Fewer Is Better)
armv8.4-a+sve: 43931 (SE +/- 12.55, N = 3)
armv8.4-a: 43634 (SE +/- 31.22, N = 3)
Caffe 2020-02-13, Model: GoogleNet - Acceleration: CPU - Iterations: 200 (Milliseconds, Fewer Is Better)
armv8.4-a+sve: 125125 (SE +/- 105.70, N = 3)
armv8.4-a: 123807 (SE +/- 49.72, N = 3)
(CXX) g++ options: -O3 -fPIC -rdynamic -lglog -lgflags -lprotobuf -lcrypto -lcurl -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas
TNN
TNN is an open-source deep learning reasoning framework developed by Tencent.
TNN 0.3, Target: CPU - Model: DenseNet (ms, Fewer Is Better)
armv8.4-a+sve: 2346.32 (SE +/- 22.16, N = 3; MIN: 2268.4 / MAX: 2446.6)
armv8.4-a: 2730.40 (SE +/- 12.63, N = 3; MIN: 2665.42 / MAX: 2834.73)
TNN 0.3, Target: CPU - Model: MobileNet v2 (ms, Fewer Is Better)
armv8.4-a+sve: 280.24 (SE +/- 0.69, N = 3; MIN: 277.9 / MAX: 282.31)
armv8.4-a: 260.78 (SE +/- 0.10, N = 3; MIN: 259.13 / MAX: 262.38)
TNN 0.3, Target: CPU - Model: SqueezeNet v2 (ms, Fewer Is Better)
armv8.4-a+sve: 76.30 (SE +/- 0.09, N = 3; MIN: 76.07 / MAX: 76.53)
armv8.4-a: 71.13 (SE +/- 0.07, N = 3; MIN: 70.76 / MAX: 71.58)
TNN 0.3, Target: CPU - Model: SqueezeNet v1.1 (ms, Fewer Is Better)
armv8.4-a+sve: 205.80 (SE +/- 0.14, N = 3; MIN: 205.46 / MAX: 206.28)
armv8.4-a: 257.70 (SE +/- 0.08, N = 3; MIN: 256.95 / MAX: 258.39)
(CXX) g++ options: -O3 -fopenmp -pthread -fvisibility=hidden -fvisibility=default -rdynamic -ldl
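TNN is the one test here where the two builds diverge in both directions, and because the metric is latency, a negative change means +sve is the faster build. The sketch below simply works through that arithmetic. The value-to-build pairing follows the MIN/MAX ranges reported with each result, and `latency_change_pct` is a hypothetical helper name used only for illustration.

```python
# Mean inference latency in ms (lower is better), as (armv8.4-a+sve, armv8.4-a).
latency_ms = {
    "DenseNet": (2346.32, 2730.40),
    "MobileNet v2": (280.24, 260.78),
    "SqueezeNet v2": (76.30, 71.13),
    "SqueezeNet v1.1": (205.80, 257.70),
}


def latency_change_pct(sve_ms, base_ms):
    """Percent change in latency when building with +sve; negative means faster."""
    return (sve_ms - base_ms) / base_ms * 100.0


for model, (sve_ms, base_ms) in latency_ms.items():
    change = latency_change_pct(sve_ms, base_ms)
    verdict = "faster" if change < 0 else "slower"
    print(f"{model}: {change:+.1f}% latency with +sve ({verdict})")
```

With these inputs, DenseNet and SqueezeNet v1.1 come out roughly 14% and 20% faster with +sve, while MobileNet v2 and SqueezeNet v2 are each about 7% slower.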
Sysbench
This is a benchmark of Sysbench with the built-in CPU and memory sub-tests. Sysbench is a scriptable multi-threaded benchmark tool based on LuaJIT.
Sysbench 1.0.20, Test: CPU (Events Per Second, More Is Better)
armv8.4-a+sve: 96666.76 (SE +/- 2.72, N = 3)
armv8.4-a: 96726.40 (SE +/- 8.50, N = 3)
(CC) gcc options: -O2 -funroll-loops -O3 -rdynamic -ldl -laio -lm
ONNX Runtime
ONNX Runtime is developed by Microsoft and partners as an open-source, cross-platform, high-performance machine learning inferencing and training accelerator. This test profile runs the ONNX Runtime with various models available from the ONNX Zoo.
ONNX Runtime 1.11, Model: GPT-2 - Device: CPU - Executor: Standard (Inferences Per Minute, More Is Better)
armv8.4-a+sve: 12317 (SE +/- 12.91, N = 3)
armv8.4-a: 12364 (SE +/- 63.90, N = 3)
ONNX Runtime 1.11, Model: bertsquad-12 - Device: CPU - Executor: Standard (Inferences Per Minute, More Is Better)
armv8.4-a+sve: 772 (SE +/- 0.50, N = 3)
armv8.4-a: 773 (SE +/- 0.44, N = 3)
ONNX Runtime 1.11, Model: fcn-resnet101-11 - Device: CPU - Executor: Standard (Inferences Per Minute, More Is Better)
armv8.4-a+sve: 73 (SE +/- 0.00, N = 3)
armv8.4-a: 73 (SE +/- 0.00, N = 3)
ONNX Runtime 1.11, Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard (Inferences Per Minute, More Is Better)
armv8.4-a+sve: 935 (SE +/- 0.17, N = 3)
armv8.4-a: 938 (SE +/- 0.88, N = 3)
ONNX Runtime 1.11, Model: super-resolution-10 - Device: CPU - Executor: Standard (Inferences Per Minute, More Is Better)
armv8.4-a+sve: 5411 (SE +/- 2.17, N = 3)
armv8.4-a: 5413 (SE +/- 0.93, N = 3)
(CXX) g++ options: -O3 -ffunction-sections -fdata-sections -march=native -mtune=native -flto -fno-fat-lto-objects -ldl -lrt
Kripke
Kripke is a simple, scalable, 3D Sn deterministic particle transport code. Its primary purpose is to research how data layout, programming paradigms, and architectures affect the implementation and performance of Sn transport. Kripke is developed by LLNL.
Kripke 1.2.4 (Throughput FoM, More Is Better)
armv8.4-a+sve: 192709233 (SE +/- 298633.53, N = 3)
armv8.4-a: 204143167 (SE +/- 226703.11, N = 3)
(CXX) g++ options: -O3 -fopenmp
Testing initiated at 31 December 1969 19:00 by user ubuntu.