Apple M1 compiler testing for a future article.
GCC 11.2.0
Processor: Apple M1 @ 2.06GHz (4 Cores / 8 Threads), Motherboard: Apple Mac mini (M1 2020), Memory: 8GB, Disk: 251GB APPLE SSD AP0256Q + 2 x 0GB APPLE SSD AP0256Q, Graphics: llvmpipe, Network: Broadcom NetXtreme BCM57762 PCIe + Broadcom BRCM4378 + Broadcom Device 5f69
OS: Arch Linux ARM, Kernel: 5.17.0-rc7-asahi-next-20220310-5-2-ARCH (aarch64), Desktop: KDE Plasma 5.24.4, Display Server: X Server 1.21.1.3, OpenGL: 4.5 Mesa 22.0.1 (LLVM 13.0.1 128 bits), Compiler: GCC 11.2.0 + Clang 13.0.1, File-System: ext4, Screen Resolution: 1920x1080
Environment Notes: CXXFLAGS="-O3 -flto" CFLAGS="-O3 -flto"
Compiler Notes: --build=aarch64-unknown-linux-gnu --disable-libssp --disable-libstdcxx-pch --disable-multilib --disable-werror --enable-__cxa_atexit --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-default-ssp --enable-fix-cortex-a53-835769 --enable-fix-cortex-a53-843419 --enable-gnu-indirect-function --enable-gnu-unique-object --enable-languages=c,c++,fortran,go,lto,objc,obj-c++,d --enable-lto --enable-plugin --enable-shared --enable-threads=posix --host=aarch64-unknown-linux-gnu --mandir=/usr/share/man --with-arch=armv8-a --with-isl --with-linker-hash-style=gnu
Disk Notes: MQ-DEADLINE / relatime,rw / Block Size: 4096
Processor Notes: Scaling Governor: apple-cpufreq schedutil
Python Notes: Python 3.10.4
Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of __user pointer sanitization + spectre_v2: Not affected + srbds: Not affected + tsx_async_abort: Not affected
Clang 13.0.1
OS: Arch Linux ARM, Kernel: 5.17.0-rc7-asahi-next-20220310-5-2-ARCH (aarch64), Desktop: KDE Plasma 5.24.4, Display Server: X Server 1.21.1.3, OpenGL: 4.5 Mesa 22.0.1 (LLVM 13.0.1 128 bits), Compiler: Clang 13.0.1, File-System: ext4, Screen Resolution: 1920x1080
Environment Notes: CXXFLAGS="-O3 -flto" CFLAGS="-O3 -flto"
Disk Notes: MQ-DEADLINE / relatime,rw / Block Size: 4096
Processor Notes: Scaling Governor: apple-cpufreq schedutil
Python Notes: Python 3.10.4
Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of __user pointer sanitization + spectre_v2: Not affected + srbds: Not affected + tsx_async_abort: Not affected
GCC 11.2.0 vs. Clang 13.0.1 comparison chart (Phoronix Test Suite): the per-test lead of the faster compiler, in percent, with the slower result as baseline. The largest leads are GCC's in the NCNN CPU tests (resnet50 172.7%, alexnet 168.1%, resnet18 148.7%, vgg16 138.1%); the smallest shown is Zstd level 19 compression speed (2.2%, Clang).
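The percentages in the GCC vs. Clang comparison chart can be reproduced from the raw results. Assuming the chart uses the slower result as its baseline, the lead is the larger value divided by the smaller, minus one, for both "more is better" and "fewer is better" tests; the direction only decides which compiler wins. This minimal Python sketch follows that assumption, which matches the figures checked here (NCNN resnet50, Stress-NG Vector Math and C-Ray). The `compare` helper is hypothetical and not part of the Phoronix Test Suite.

```python
def compare(gcc: float, clang: float, more_is_better: bool) -> tuple[str, float]:
    """Return (winner, lead in percent rounded to 0.1) for one test.

    Assumes the chart convention: the slower result is the baseline and the
    lead is (larger / smaller - 1) * 100, whatever the test direction.
    """
    if gcc <= 0 or clang <= 0:
        raise ValueError("results must be positive")
    if gcc == clang:
        return ("tie", 0.0)
    # GCC wins with the higher value on a more-is-better test,
    # or with the lower value on a fewer-is-better test.
    gcc_wins = (gcc > clang) == more_is_better
    lead = (max(gcc, clang) / min(gcc, clang) - 1.0) * 100.0
    return ("GCC 11.2.0" if gcc_wins else "Clang 13.0.1", round(lead, 1))


if __name__ == "__main__":
    # Values taken from the results in this file.
    print(compare(17.16, 46.79, more_is_better=False))       # NCNN resnet50 (ms)
    print(compare(23954.10, 41899.94, more_is_better=True))  # Stress-NG Vector Math
    print(compare(64.437, 87.824, more_is_better=False))     # C-Ray (seconds)
```

These three calls return GCC 11.2.0 at 172.7%, Clang 13.0.1 at 74.9% and GCC 11.2.0 at 36.3%, the same leads the chart reports for those tests.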
Apple M1 Compilers (OpenBenchmarking.org)
Test | GCC 11.2.0 | Clang 13.0.1
cryptopp: All Algorithms | 954.956113 | 823.153532
cryptopp: Keyed Algorithms | 508.836448 | 374.896175
cryptopp: Unkeyed Algorithms | 539.281827 | 338.408369
cryptopp: Integer + Elliptic Curve Public Key Algorithms | 1766.985880 | 1875.523520
compress-lz4: 1 - Compression Speed | 21909.45 | 21875.54
compress-lz4: 1 - Decompression Speed | 27018.5 | 26736.4
compress-lz4: 3 - Compression Speed | 51.99 | 51.32
compress-lz4: 3 - Decompression Speed | 17490.9 | 16877.4
compress-lz4: 9 - Compression Speed | 48.94 | 49.89
compress-lz4: 9 - Decompression Speed | 17478.5 | 16863.3
stress-ng: Crypto | 1511.75 | 1527.17
stress-ng: IO_uring | 144281.67 | 147040.98
stress-ng: Matrix Math | 23588.96 | 30254.21
stress-ng: Vector Math | 23954.10 | 41899.94
stress-ng: Memory Copying | 2763.25 | 3741.17
stress-ng: Socket Activity | 4331.71 | 4313.48
encode-flac: WAV To FLAC | 70.648 | 59.074
encode-mp3: WAV To MP3 | 7.239 | 8.124
tjbench: Decompression Throughput | 206.177350 | 197.945225
encode-wavpack: WAV To WavPack | 17.205 | 19.320
draco: Lion | 3747 | 3772
draco: Church Facade | 5649 | 5722
openjpeg: NASA Curiosity Panorama M34 | 53890 | 52024
espeak: Text-To-Speech Synthesis | 22.289 | 23.429
xmrig: Monero - 1M | 2247.2 | 2209.7
xmrig: Wownero - 1M | 2798.2 | 2804.8
himeno: Poisson Pressure Solver | 7577.316534 | 7158.970486
lczero: Eigen | 1263 | 1297
ncnn: CPU - mobilenet | 14.40 | 22.21
ncnn: CPU-v2-v2 - mobilenet-v2 | 2.61 | 5.84
ncnn: CPU-v3-v3 - mobilenet-v3 | 2.34 | 4.82
ncnn: CPU - shufflenet-v2 | 2.17 | 3.76
ncnn: CPU - mnasnet | 2.52 | 5.86
ncnn: CPU - efficientnet-b0 | 4.18 | 9.51
ncnn: CPU - googlenet | 13.32 | 25.11
ncnn: CPU - vgg16 | 33.78 | 80.44
ncnn: CPU - resnet18 | 7.31 | 18.18
ncnn: CPU - alexnet | 11.81 | 31.66
ncnn: CPU - resnet50 | 17.16 | 46.79
ncnn: CPU - yolov4-tiny | 17.20 | 32.77
ncnn: CPU - squeezenet_ssd | 14.26 | 20.15
ncnn: CPU - regnety_400m | 5.88 | 8.08
coremark: CoreMark Size 666 - Iterations Per Second | 179896.599411 | 148361.362440
primesieve: 1e12 Prime Number Generation | 29.118 | 29.626
compress-zstd: 3 - Compression Speed | 3341.2 | 3301.1
compress-zstd: 3 - Decompression Speed | 3850.2 | 3977.7
compress-zstd: 8 - Compression Speed | 721.5 | 699.6
compress-zstd: 8 - Decompression Speed | 4016.4 | 4141.0
compress-zstd: 19 - Compression Speed | 22.7 | 23.2
compress-zstd: 19 - Decompression Speed | 3546.2 | 3684.3
compress-zstd: 3, Long Mode - Compression Speed | 240.0 | 253.7
compress-zstd: 3, Long Mode - Decompression Speed | 4221.1 | 4356.6
compress-zstd: 8, Long Mode - Compression Speed | 693.0 | 703.4
compress-zstd: 8, Long Mode - Decompression Speed | 4416.3 | 4553.4
compress-zstd: 19, Long Mode - Compression Speed | 18.8 | 18.8
compress-zstd: 19, Long Mode - Decompression Speed | 3765.4 | 3887.3
aobench: 2048 x 2048 - Total Time | 27.458 | 33.402
c-ray: Total Time - 4K, 16 Rays Per Pixel | 64.437 | 87.824
povray: Trace Time | 72.017 | 62.416
avifenc: 0 | 287.397 | 303.550
avifenc: 2 | 143.442 | 161.612
avifenc: 6 | 14.094 | 13.516
avifenc: 6, Lossless | 15.653 | 14.929
avifenc: 10, Lossless | 6.070 | 5.887
liquid-dsp: 1 - 256 - 57 | 28778667 | 37897333
liquid-dsp: 2 - 256 - 57 | 57611000 | 75898667
liquid-dsp: 4 - 256 - 57 | 115230000 | 151820000
liquid-dsp: 8 - 256 - 57 | 151120000 | 196510000
openssl: SHA256 | 8059691050 | 8474527350
openssl: RSA4096 (sign/s) | 1408.5 | 1391.4
openssl: RSA4096 (verify/s) | 99370.5 | 99445.4
sqlite-speedtest: Timed Time - Size 1,000 | 51.372 | 52.900
Crypto++ 8.2, Test: Keyed Algorithms (MiB/second, More Is Better): GCC 11.2.0 508.84 (SE +/- 0.07, N = 3); Clang 13.0.1 374.90 (SE +/- 1.08, N = 3). Notes: (CXX) g++ options: -O3 -flto -fPIC -pthread -pipe
Crypto++ 8.2, Test: Unkeyed Algorithms (MiB/second, More Is Better): GCC 11.2.0 539.28 (SE +/- 0.04, N = 3); Clang 13.0.1 338.41 (SE +/- 0.01, N = 3). Notes: (CXX) g++ options: -O3 -flto -fPIC -pthread -pipe
Crypto++ 8.2, Test: Integer + Elliptic Curve Public Key Algorithms (MiB/second, More Is Better): GCC 11.2.0 1766.99 (SE +/- 0.67, N = 3); Clang 13.0.1 1875.52 (SE +/- 1.78, N = 3). Notes: (CXX) g++ options: -O3 -flto -fPIC -pthread -pipe
LZ4 Compression 1.9.3, Compression Level: 1 - Decompression Speed (MB/s, More Is Better): GCC 11.2.0 27018.5 (SE +/- 1.47, N = 3); Clang 13.0.1 26736.4 (SE +/- 8.86, N = 3). Notes: (CC) gcc options: -O3
LZ4 Compression 1.9.3, Compression Level: 3 - Decompression Speed (MB/s, More Is Better): GCC 11.2.0 17490.9 (SE +/- 0.40, N = 3); Clang 13.0.1 16877.4 (SE +/- 3.46, N = 3). Notes: (CC) gcc options: -O3
LZ4 Compression 1.9.3, Compression Level: 9 - Decompression Speed (MB/s, More Is Better): GCC 11.2.0 17478.5 (SE +/- 1.03, N = 3); Clang 13.0.1 16863.3 (SE +/- 3.18, N = 3). Notes: (CC) gcc options: -O3
Stress-NG 0.13.02, Test: IO_uring (Bogo Ops/s, More Is Better): GCC 11.2.0 144281.67 (SE +/- 28.54, N = 3); Clang 13.0.1 147040.98 (SE +/- 271.95, N = 3). Notes: (CC) gcc options: -O3 -flto -O2 -std=gnu99 -lm -laio -lbsd -lcrypt -lrt -lz -ldl -pthread -lkmod -lc -latomic
Stress-NG 0.13.02, Test: Matrix Math (Bogo Ops/s, More Is Better): GCC 11.2.0 23588.96 (SE +/- 332.61, N = 3); Clang 13.0.1 30254.21 (SE +/- 0.69, N = 3). Notes: (CC) gcc options: -O3 -flto -O2 -std=gnu99 -lm -laio -lbsd -lcrypt -lrt -lz -ldl -pthread -lkmod -lc -latomic
Stress-NG 0.13.02, Test: Vector Math (Bogo Ops/s, More Is Better): GCC 11.2.0 23954.10 (SE +/- 195.44, N = 15); Clang 13.0.1 41899.94 (SE +/- 2.19, N = 3). Notes: (CC) gcc options: -O3 -flto -O2 -std=gnu99 -lm -laio -lbsd -lcrypt -lrt -lz -ldl -pthread -lkmod -lc -latomic
Stress-NG 0.13.02, Test: Memory Copying (Bogo Ops/s, More Is Better): GCC 11.2.0 2763.25 (SE +/- 6.71, N = 3); Clang 13.0.1 3741.17 (SE +/- 15.21, N = 3). Notes: (CC) gcc options: -O3 -flto -O2 -std=gnu99 -lm -laio -lbsd -lcrypt -lrt -lz -ldl -pthread -lkmod -lc -latomic
Stress-NG 0.13.02, Test: Socket Activity (Bogo Ops/s, More Is Better): GCC 11.2.0 4331.71 (SE +/- 4.58, N = 3); Clang 13.0.1 4313.48 (SE +/- 13.20, N = 3). Notes: (CC) gcc options: -O3 -flto -O2 -std=gnu99 -lm -laio -lbsd -lcrypt -lrt -lz -ldl -pthread -lkmod -lc -latomic
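Each result above lists a standard error alongside the run count, for example "SE +/- 1.08, N = 3". Assuming the conventional definition (sample standard deviation divided by the square root of the number of runs), it can be computed as in this Python sketch. The run times are hypothetical, since the per-run samples behind these results are not included here, and the sketch has not been checked against the Phoronix Test Suite source.

```python
import math
import statistics


def standard_error(samples: list[float]) -> float:
    """Standard error of the mean: sample standard deviation / sqrt(N)."""
    if len(samples) < 2:
        raise ValueError("need at least two runs to estimate spread")
    return statistics.stdev(samples) / math.sqrt(len(samples))


if __name__ == "__main__":
    # Hypothetical timings (seconds) for N = 3 runs, not taken from this file.
    runs = [22.25, 22.31, 22.30]
    print(f"mean {statistics.mean(runs):.2f}, "
          f"SE +/- {standard_error(runs):.2f}, N = {len(runs)}")
```

A standard error that is small relative to the result, as in most tests here, suggests consistent runs; a comparatively large one, such as GCC's 332.61 on Stress-NG Matrix Math against Clang's 0.69, points to more run-to-run variation.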
Google Draco
Draco is a library developed by Google for compressing and decompressing 3D geometric meshes and point clouds. This test profile uses Artec3D PLY models as the sample input for Draco compression and decompression.
Google Draco 1.5.0, Model: Lion (ms, Fewer Is Better): GCC 11.2.0 3747 (SE +/- 2.73, N = 3); Clang 13.0.1 3772 (SE +/- 0.58, N = 3). Notes: (CXX) g++ options: -O3 -flto
Google Draco 1.5.0, Model: Church Facade (ms, Fewer Is Better): GCC 11.2.0 5649 (SE +/- 7.21, N = 3); Clang 13.0.1 5722 (SE +/- 3.79, N = 3). Notes: (CXX) g++ options: -O3 -flto
OpenJPEG
OpenJPEG is an open-source JPEG 2000 codec written in C. The default input for this test profile is the NASA/JPL-Caltech/MSSS Curiosity panorama, a 717MB TIFF image, which is converted to JPEG 2000.
OpenJPEG 2.4, Encode: NASA Curiosity Panorama M34 (ms, Fewer Is Better): GCC 11.2.0 53890 (SE +/- 92.73, N = 3); Clang 13.0.1 52024 (SE +/- 161.48, N = 3). Notes: (CXX) g++ options: -O3 -flto -rdynamic
eSpeak-NG Speech Engine
This test times how long the eSpeak speech synthesizer takes to read Project Gutenberg's The Outline of Science and write the output to a WAV file. The test profile now tracks eSpeak-NG rather than the original eSpeak.
eSpeak-NG Speech Engine 20200907, Text-To-Speech Synthesis (Seconds, Fewer Is Better): GCC 11.2.0 22.29 (SE +/- 0.03, N = 4); Clang 13.0.1 23.43 (SE +/- 0.03, N = 4). Notes: (CC) gcc options: -O3 -flto -std=c99 -lpthread -lm
Xmrig
Xmrig is an open-source, cross-platform CPU/GPU miner for RandomX, KawPow, CryptoNight and AstroBWT. This test profile is set up to measure Xmrig CPU mining performance.
Xmrig 6.12.1, Variant: Monero - Hash Count: 1M (H/s, More Is Better): GCC 11.2.0 2247.2 (SE +/- 9.05, N = 3); Clang 13.0.1 2209.7 (SE +/- 7.70, N = 3). Notes: -funroll-loops -static-libgcc -static-libstdc++; (CXX) g++ options: -O3 -flto -fexceptions -fno-rtti -Ofast -rdynamic -lssl -lcrypto -luv -lpthread -lrt -ldl -lhwloc
Xmrig 6.12.1, Variant: Wownero - Hash Count: 1M (H/s, More Is Better): GCC 11.2.0 2798.2 (SE +/- 1.83, N = 3); Clang 13.0.1 2804.8 (SE +/- 1.95, N = 3). Notes: -static-libgcc -static-libstdc++ -funroll-loops; (CXX) g++ options: -O3 -flto -fexceptions -fno-rtti -Ofast -rdynamic -lssl -lcrypto -luv -lpthread -lrt -ldl -lhwloc
LeelaChessZero
LeelaChessZero (lc0 / lczero) is a chess engine automated via neural networks. This test profile can be used for OpenCL, CUDA + cuDNN, and BLAS (CPU-based) benchmarking.
LeelaChessZero 0.28, Backend: Eigen (Nodes Per Second, More Is Better): GCC 11.2.0 1263 (SE +/- 10.69, N = 3); Clang 13.0.1 1297 (SE +/- 18.26, N = 3). Notes: (CXX) g++ options: -flto -O3 -pthread
NCNN
NCNN is a high-performance neural network inference framework developed by Tencent and optimized for mobile and other platforms.
NCNN 20210720, Target: CPU - Model: mobilenet (ms, Fewer Is Better): GCC 11.2.0 14.40 (SE +/- 0.17, N = 3; MIN: 9.21 / MAX: 25.2; -lgomp -lpthread); Clang 13.0.1 22.21 (SE +/- 0.01, N = 3; MIN: 22.15 / MAX: 22.25). Notes: (CXX) g++ options: -O3 -flto -rdynamic
NCNN 20210720, Target: CPU-v2-v2 - Model: mobilenet-v2 (ms, Fewer Is Better): GCC 11.2.0 2.61 (SE +/- 0.05, N = 3; MIN: 2.48 / MAX: 12.2; -lgomp -lpthread); Clang 13.0.1 5.84 (SE +/- 0.01, N = 3; MIN: 5.81 / MAX: 5.87). Notes: (CXX) g++ options: -O3 -flto -rdynamic
NCNN 20210720, Target: CPU-v3-v3 - Model: mobilenet-v3 (ms, Fewer Is Better): GCC 11.2.0 2.34 (SE +/- 0.01, N = 3; MIN: 2.32 / MAX: 2.49; -lgomp -lpthread); Clang 13.0.1 4.82 (SE +/- 0.01, N = 3; MIN: 4.8 / MAX: 4.85). Notes: (CXX) g++ options: -O3 -flto -rdynamic
NCNN 20210720, Target: CPU - Model: shufflenet-v2 (ms, Fewer Is Better): GCC 11.2.0 2.17 (SE +/- 0.01, N = 3; MIN: 2.15 / MAX: 2.48; -lgomp -lpthread); Clang 13.0.1 3.76 (SE +/- 0.00, N = 3; MAX: 3.85). Notes: (CXX) g++ options: -O3 -flto -rdynamic
NCNN 20210720, Target: CPU - Model: mnasnet (ms, Fewer Is Better): GCC 11.2.0 2.52 (SE +/- 0.01, N = 3; MIN: 2.48 / MAX: 2.84; -lgomp -lpthread); Clang 13.0.1 5.86 (SE +/- 0.01, N = 2; MIN: 5.84 / MAX: 5.87). Notes: (CXX) g++ options: -O3 -flto -rdynamic
NCNN 20210720, Target: CPU - Model: efficientnet-b0 (ms, Fewer Is Better): GCC 11.2.0 4.18 (SE +/- 0.02, N = 3; MIN: 4.13 / MAX: 8.1; -lgomp -lpthread); Clang 13.0.1 9.51 (SE +/- 0.00, N = 3; MIN: 9.47 / MAX: 9.67). Notes: (CXX) g++ options: -O3 -flto -rdynamic
NCNN 20210720, Target: CPU - Model: googlenet (ms, Fewer Is Better): GCC 11.2.0 13.32 (SE +/- 0.10, N = 3; MIN: 9.14 / MAX: 21.97; -lgomp -lpthread); Clang 13.0.1 25.11 (SE +/- 0.01, N = 3; MIN: 25.07 / MAX: 25.16). Notes: (CXX) g++ options: -O3 -flto -rdynamic
NCNN 20210720, Target: CPU - Model: vgg16 (ms, Fewer Is Better): GCC 11.2.0 33.78 (SE +/- 0.14, N = 3; MIN: 30.68 / MAX: 45.72; -lgomp -lpthread); Clang 13.0.1 80.44 (SE +/- 0.01, N = 3; MIN: 80.22 / MAX: 80.95). Notes: (CXX) g++ options: -O3 -flto -rdynamic
NCNN 20210720, Target: CPU - Model: resnet18 (ms, Fewer Is Better): GCC 11.2.0 7.31 (SE +/- 0.04, N = 3; MIN: 6.17 / MAX: 16.92; -lgomp -lpthread); Clang 13.0.1 18.18 (SE +/- 0.01, N = 3; MIN: 18.14 / MAX: 18.23). Notes: (CXX) g++ options: -O3 -flto -rdynamic
NCNN 20210720, Target: CPU - Model: alexnet (ms, Fewer Is Better): GCC 11.2.0 11.81 (SE +/- 0.10, N = 3; MIN: 9.48 / MAX: 21.58; -lgomp -lpthread); Clang 13.0.1 31.66 (SE +/- 0.00, N = 3; MIN: 31.62 / MAX: 33.42). Notes: (CXX) g++ options: -O3 -flto -rdynamic
NCNN 20210720, Target: CPU - Model: resnet50 (ms, Fewer Is Better): GCC 11.2.0 17.16 (SE +/- 0.08, N = 3; MIN: 15.54 / MAX: 27.86; -lgomp -lpthread); Clang 13.0.1 46.79 (SE +/- 0.01, N = 3; MIN: 46.7 / MAX: 46.9). Notes: (CXX) g++ options: -O3 -flto -rdynamic
NCNN 20210720, Target: CPU - Model: yolov4-tiny (ms, Fewer Is Better): GCC 11.2.0 17.20 (SE +/- 0.07, N = 3; MIN: 14.01 / MAX: 27.33; -lgomp -lpthread); Clang 13.0.1 32.77 (SE +/- 0.00, N = 3; MIN: 32.68 / MAX: 32.88). Notes: (CXX) g++ options: -O3 -flto -rdynamic
NCNN 20210720, Target: CPU - Model: squeezenet_ssd (ms, Fewer Is Better): GCC 11.2.0 14.26 (SE +/- 0.17, N = 3; MIN: 9.6 / MAX: 28.57; -lgomp -lpthread); Clang 13.0.1 20.15 (SE +/- 0.00, N = 3; MIN: 20.08 / MAX: 20.21). Notes: (CXX) g++ options: -O3 -flto -rdynamic
NCNN 20210720, Target: CPU - Model: regnety_400m (ms, Fewer Is Better): GCC 11.2.0 5.88 (SE +/- 0.03, N = 3; MIN: 5.78 / MAX: 8.62; -lgomp -lpthread); Clang 13.0.1 8.08 (SE +/- 0.00, N = 3; MIN: 8.05 / MAX: 8.15). Notes: (CXX) g++ options: -O3 -flto -rdynamic
Zstd Compression
This test measures the time needed to compress and decompress a sample file (a FreeBSD disk image, FreeBSD-12.2-RELEASE-amd64-memstick.img) using Zstd at different compression levels and settings. Results are reported as throughput in MB/s.
Zstd Compression 1.5.0, Compression Level: 3 - Compression Speed (MB/s, More Is Better): GCC 11.2.0 3341.2 (SE +/- 6.19, N = 3); Clang 13.0.1 3301.1 (SE +/- 39.46, N = 3). Notes: (CC) gcc options: -O3 -flto -pthread -lz -llzma -llz4
Zstd Compression 1.5.0, Compression Level: 3 - Decompression Speed (MB/s, More Is Better): GCC 11.2.0 3850.2 (SE +/- 0.87, N = 3); Clang 13.0.1 3977.7 (SE +/- 0.75, N = 3). Notes: (CC) gcc options: -O3 -flto -pthread -lz -llzma -llz4
Zstd Compression 1.5.0, Compression Level: 8 - Compression Speed (MB/s, More Is Better): GCC 11.2.0 721.5 (SE +/- 3.70, N = 3); Clang 13.0.1 699.6 (SE +/- 4.97, N = 3). Notes: (CC) gcc options: -O3 -flto -pthread -lz -llzma -llz4
Zstd Compression 1.5.0, Compression Level: 8 - Decompression Speed (MB/s, More Is Better): GCC 11.2.0 4016.4 (SE +/- 1.95, N = 3); Clang 13.0.1 4141.0 (SE +/- 3.02, N = 3). Notes: (CC) gcc options: -O3 -flto -pthread -lz -llzma -llz4
Zstd Compression 1.5.0, Compression Level: 19 - Compression Speed (MB/s, More Is Better): GCC 11.2.0 22.7 (SE +/- 0.07, N = 3); Clang 13.0.1 23.2 (SE +/- 0.17, N = 3). Notes: (CC) gcc options: -O3 -flto -pthread -lz -llzma -llz4
Zstd Compression 1.5.0, Compression Level: 19 - Decompression Speed (MB/s, More Is Better): GCC 11.2.0 3546.2 (SE +/- 0.15, N = 3); Clang 13.0.1 3684.3 (SE +/- 1.62, N = 3). Notes: (CC) gcc options: -O3 -flto -pthread -lz -llzma -llz4
Zstd Compression 1.5.0, Compression Level: 3, Long Mode - Compression Speed (MB/s, More Is Better): GCC 11.2.0 240.0 (SE +/- 2.00, N = 15); Clang 13.0.1 253.7 (SE +/- 3.51, N = 3). Notes: (CC) gcc options: -O3 -flto -pthread -lz -llzma -llz4
Zstd Compression 1.5.0, Compression Level: 3, Long Mode - Decompression Speed (MB/s, More Is Better): GCC 11.2.0 4221.1 (SE +/- 0.25, N = 15); Clang 13.0.1 4356.6 (SE +/- 0.40, N = 3). Notes: (CC) gcc options: -O3 -flto -pthread -lz -llzma -llz4
Zstd Compression 1.5.0, Compression Level: 8, Long Mode - Compression Speed (MB/s, More Is Better): GCC 11.2.0 693.0 (SE +/- 2.35, N = 3); Clang 13.0.1 703.4 (SE +/- 2.38, N = 3). Notes: (CC) gcc options: -O3 -flto -pthread -lz -llzma -llz4
Zstd Compression 1.5.0, Compression Level: 8, Long Mode - Decompression Speed (MB/s, More Is Better): GCC 11.2.0 4416.3 (SE +/- 1.55, N = 3); Clang 13.0.1 4553.4 (SE +/- 3.13, N = 3). Notes: (CC) gcc options: -O3 -flto -pthread -lz -llzma -llz4
Zstd Compression 1.5.0, Compression Level: 19, Long Mode - Compression Speed (MB/s, More Is Better): GCC 11.2.0 18.8 (SE +/- 0.13, N = 3); Clang 13.0.1 18.8 (SE +/- 0.21, N = 4). Notes: (CC) gcc options: -O3 -flto -pthread -lz -llzma -llz4
Zstd Compression 1.5.0, Compression Level: 19, Long Mode - Decompression Speed (MB/s, More Is Better): GCC 11.2.0 3765.4 (SE +/- 0.92, N = 3); Clang 13.0.1 3887.3 (SE +/- 0.69, N = 4). Notes: (CC) gcc options: -O3 -flto -pthread -lz -llzma -llz4
C-Ray
This is a test of C-Ray, a simple raytracer designed to test floating-point CPU performance. The test is multi-threaded (16 threads per core). The configuration reported here renders a 4K image with 16 rays per pixel for anti-aliasing.
C-Ray 1.1, Total Time - 4K, 16 Rays Per Pixel (Seconds, Fewer Is Better): GCC 11.2.0 64.44 (SE +/- 0.04, N = 3); Clang 13.0.1 87.82 (SE +/- 0.05, N = 3). Notes: (CC) gcc options: -lm -lpthread -O3 -flto
POV-Ray
This is a test of POV-Ray, the Persistence of Vision Raytracer, which creates 3D graphics using ray tracing.
POV-Ray 3.7.0.7, Trace Time (Seconds, Fewer Is Better): GCC 11.2.0 72.02 (SE +/- 0.85, N = 4); Clang 13.0.1 62.42 (SE +/- 0.64, N = 5). Notes: -R/usr/lib; (CXX) g++ options: -pipe -O3 -ffast-math -flto -lSDL -lpthread -lXpm -lSM -lICE -lX11 -ltiff -ljpeg -lpng -lz -lrt -lm -lboost_thread -lboost_system
libavif avifenc 0.10, Encoder Speed: 2 (Seconds, Fewer Is Better): GCC 11.2.0 143.44 (SE +/- 0.32, N = 3); Clang 13.0.1 161.61 (SE +/- 0.72, N = 3). Notes: (CXX) g++ options: -O3 -fPIC -flto -lm
libavif avifenc 0.10, Encoder Speed: 6, Lossless (Seconds, Fewer Is Better): GCC 11.2.0 15.65 (SE +/- 0.18, N = 3); Clang 13.0.1 14.93 (SE +/- 0.21, N = 3). Notes: (CXX) g++ options: -O3 -fPIC -flto -lm
libavif avifenc 0.10, Encoder Speed: 10, Lossless (Seconds, Fewer Is Better): GCC 11.2.0 6.070 (SE +/- 0.049, N = 3); Clang 13.0.1 5.887 (SE +/- 0.047, N = 3). Notes: (CXX) g++ options: -O3 -fPIC -flto -lm
Liquid-DSP
LiquidSDR's Liquid-DSP is a software-defined radio (SDR) digital signal processing library. This test profile runs a multi-threaded benchmark of the library focused on embedded-platform usage.
Liquid-DSP 2021.01.31, Threads: 1 - Buffer Length: 256 - Filter Length: 57 (samples/s, More Is Better): GCC 11.2.0 28778667 (SE +/- 3527.67, N = 3); Clang 13.0.1 37897333 (SE +/- 2905.93, N = 3). Notes: (CC) gcc options: -O3 -flto -pthread -lm -lc -lliquid
Liquid-DSP 2021.01.31, Threads: 2 - Buffer Length: 256 - Filter Length: 57 (samples/s, More Is Better): GCC 11.2.0 57611000 (SE +/- 2081.67, N = 3); Clang 13.0.1 75898667 (SE +/- 1763.83, N = 3). Notes: (CC) gcc options: -O3 -flto -pthread -lm -lc -lliquid
Liquid-DSP 2021.01.31, Threads: 4 - Buffer Length: 256 - Filter Length: 57 (samples/s, More Is Better): GCC 11.2.0 115230000 (SE +/- 0.00, N = 3); Clang 13.0.1 151820000 (SE +/- 0.00, N = 3). Notes: (CC) gcc options: -O3 -flto -pthread -lm -lc -lliquid
Liquid-DSP 2021.01.31, Threads: 8 - Buffer Length: 256 - Filter Length: 57 (samples/s, More Is Better): GCC 11.2.0 151120000 (SE +/- 0.00, N = 3); Clang 13.0.1 196510000 (SE +/- 0.00, N = 3). Notes: (CC) gcc options: -O3 -flto -pthread -lm -lc -lliquid
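The four Liquid-DSP results differ only in thread count, so dividing each throughput by the single-thread result turns them into a scaling curve. This short Python calculation uses the samples/s figures reported above; `speedups` is a hypothetical helper written for illustration.

```python
# Liquid-DSP samples/s from the results above (Buffer Length 256, Filter Length 57).
RESULTS = {
    "GCC 11.2.0": {1: 28778667, 2: 57611000, 4: 115230000, 8: 151120000},
    "Clang 13.0.1": {1: 37897333, 2: 75898667, 4: 151820000, 8: 196510000},
}


def speedups(by_threads: dict[int, int]) -> dict[int, float]:
    """Throughput at each thread count relative to the 1-thread result."""
    base = by_threads[1]
    return {t: v / base for t, v in sorted(by_threads.items())}


if __name__ == "__main__":
    for compiler, by_threads in RESULTS.items():
        for threads, s in speedups(by_threads).items():
            # Efficiency: the fraction of perfect linear scaling achieved.
            print(f"{compiler}: {threads} thread(s) {s:.2f}x, "
                  f"{s / threads:.0%} of linear")
```

With these numbers, both builds roughly double their throughput at two threads and reach about 4.0x at four threads, but only about 5.2x at eight; Clang's lead over GCC stays between 30% and 32% at every thread count.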
OpenSSL
OpenSSL is an open-source toolkit that implements the SSL (Secure Sockets Layer) and TLS (Transport Layer Security) protocols. This test profile uses OpenSSL's built-in "openssl speed" benchmark.
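The SHA256 result that follows is a throughput figure in bytes per second. This Python sketch illustrates the same kind of measurement (hash fixed-size blocks for a fixed time window, then divide the bytes processed by the elapsed time) with the standard `hashlib` module. It is not how the test profile invokes `openssl speed`; the block size and duration are arbitrary choices, the helper name is hypothetical, and interpreter overhead means it will not reproduce the numbers reported here.

```python
import hashlib
import time


def sha256_throughput(block_size: int = 16384, seconds: float = 1.0) -> float:
    """Hash zero-filled blocks for about `seconds`; return bytes hashed per second."""
    if block_size <= 0 or seconds <= 0:
        raise ValueError("block_size and seconds must be positive")
    block = bytes(block_size)
    hashed = 0
    start = time.perf_counter()
    deadline = start + seconds
    while time.perf_counter() < deadline:
        hashlib.sha256(block).digest()
        hashed += block_size
    return hashed / (time.perf_counter() - start)


if __name__ == "__main__":
    print(f"SHA256: {sha256_throughput():,.0f} byte/s")
```

Measuring over a fixed time window rather than a fixed number of blocks keeps the run length predictable regardless of how fast the build under test is.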
OpenSSL 3.0, Algorithm: SHA256 (byte/s, More Is Better): GCC 11.2.0 8059691050 (SE +/- 12283962.01, N = 3); Clang 13.0.1 8474527350 (SE +/- 3887401.32, N = 3). Notes: -Qunused-arguments; (CC) gcc options: -pthread -O3 -flto -lssl -lcrypto -ldl
OpenSSL 3.0, Algorithm: RSA4096 (sign/s, More Is Better): GCC 11.2.0 1408.5 (SE +/- 0.78, N = 3); Clang 13.0.1 1391.4 (SE +/- 0.15, N = 3). Notes: -Qunused-arguments; (CC) gcc options: -pthread -O3 -flto -lssl -lcrypto -ldl
OpenSSL 3.0, Algorithm: RSA4096 (verify/s, More Is Better): GCC 11.2.0 99370.5 (SE +/- 18.59, N = 3); Clang 13.0.1 99445.4 (SE +/- 16.80, N = 3). Notes: -Qunused-arguments; (CC) gcc options: -pthread -O3 -flto -lssl -lcrypto -ldl
GCC 11.2.0: Testing initiated at 8 April 2022 14:42 by user phoronix.
Clang 13.0.1: Testing initiated at 9 April 2022 12:49 by user phoronix.