Apple M1 compiler testing for a future article.
GCC 11.2.0
Processor: Apple M1 @ 2.06GHz (4 Cores / 8 Threads), Motherboard: Apple Mac mini (M1 2020), Memory: 8GB, Disk: 251GB APPLE SSD AP0256Q + 2 x 0GB APPLE SSD AP0256Q, Graphics: llvmpipe, Network: Broadcom NetXtreme BCM57762 PCIe + Broadcom BRCM4378 + Broadcom Device 5f69
OS: Arch Linux ARM, Kernel: 5.17.0-rc7-asahi-next-20220310-5-2-ARCH (aarch64), Desktop: KDE Plasma 5.24.4, Display Server: X Server 1.21.1.3, OpenGL: 4.5 Mesa 22.0.1 (LLVM 13.0.1 128 bits), Compiler: GCC 11.2.0 + Clang 13.0.1, File-System: ext4, Screen Resolution: 1920x1080
Environment Notes: CXXFLAGS="-O3 -flto" CFLAGS="-O3 -flto"
Compiler Notes: --build=aarch64-unknown-linux-gnu --disable-libssp --disable-libstdcxx-pch --disable-multilib --disable-werror --enable-__cxa_atexit --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-default-ssp --enable-fix-cortex-a53-835769 --enable-fix-cortex-a53-843419 --enable-gnu-indirect-function --enable-gnu-unique-object --enable-languages=c,c++,fortran,go,lto,objc,obj-c++,d --enable-lto --enable-plugin --enable-shared --enable-threads=posix --host=aarch64-unknown-linux-gnu --mandir=/usr/share/man --with-arch=armv8-a --with-isl --with-linker-hash-style=gnu
Disk Notes: MQ-DEADLINE / relatime,rw / Block Size: 4096
Processor Notes: Scaling Governor: apple-cpufreq schedutil
Python Notes: Python 3.10.4
Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of __user pointer sanitization + spectre_v2: Not affected + srbds: Not affected + tsx_async_abort: Not affected
Clang 13.0.1
OS: Arch Linux ARM, Kernel: 5.17.0-rc7-asahi-next-20220310-5-2-ARCH (aarch64), Desktop: KDE Plasma 5.24.4, Display Server: X Server 1.21.1.3, OpenGL: 4.5 Mesa 22.0.1 (LLVM 13.0.1 128 bits), Compiler: Clang 13.0.1, File-System: ext4, Screen Resolution: 1920x1080
Environment Notes: CXXFLAGS="-O3 -flto" CFLAGS="-O3 -flto"
Disk Notes: MQ-DEADLINE / relatime,rw / Block Size: 4096
Processor Notes: Scaling Governor: apple-cpufreq schedutil
Python Notes: Python 3.10.4
Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of __user pointer sanitization + spectre_v2: Not affected + srbds: Not affected + tsx_async_abort: Not affected
[Chart: GCC 11.2.0 vs. Clang 13.0.1 Comparison (Phoronix Test Suite). Per-test percentage difference between the two compilers relative to the baseline; the largest gap shown is 172.7% (NCNN, CPU - resnet50).]
Apple M1 Compilers (OpenBenchmarking.org)
Test | GCC 11.2.0 | Clang 13.0.1
aobench: 2048 x 2048 - Total Time | 27.458 | 33.402
c-ray: Total Time - 4K, 16 Rays Per Pixel | 64.437 | 87.824
coremark: CoreMark Size 666 - Iterations Per Second | 179896.599411 | 148361.362440
cryptopp: All Algorithms | 954.956113 | 823.153532
cryptopp: Keyed Algorithms | 508.836448 | 374.896175
cryptopp: Unkeyed Algorithms | 539.281827 | 338.408369
cryptopp: Integer + Elliptic Curve Public Key Algorithms | 1766.985880 | 1875.523520
espeak: Text-To-Speech Synthesis | 22.289 | 23.429
encode-flac: WAV To FLAC | 70.648 | 59.074
draco: Lion | 3747 | 3772
draco: Church Facade | 5649 | 5722
himeno: Poisson Pressure Solver | 7577.316534 | 7158.970486
encode-mp3: WAV To MP3 | 7.239 | 8.124
lczero: Eigen | 1263 | 1297
avifenc: 0 | 287.397 | 303.550
avifenc: 2 | 143.442 | 161.612
avifenc: 6 | 14.094 | 13.516
avifenc: 6, Lossless | 15.653 | 14.929
avifenc: 10, Lossless | 6.070 | 5.887
tjbench: Decompression Throughput | 206.177350 | 197.945225
liquid-dsp: 1 - 256 - 57 | 28778667 | 37897333
liquid-dsp: 2 - 256 - 57 | 57611000 | 75898667
liquid-dsp: 4 - 256 - 57 | 115230000 | 151820000
liquid-dsp: 8 - 256 - 57 | 151120000 | 196510000
compress-lz4: 1 - Compression Speed | 21909.45 | 21875.54
compress-lz4: 1 - Decompression Speed | 27018.5 | 26736.4
compress-lz4: 3 - Compression Speed | 51.99 | 51.32
compress-lz4: 3 - Decompression Speed | 17490.9 | 16877.4
compress-lz4: 9 - Compression Speed | 48.94 | 49.89
compress-lz4: 9 - Decompression Speed | 17478.5 | 16863.3
ncnn: CPU - mobilenet | 14.40 | 22.21
ncnn: CPU-v2-v2 - mobilenet-v2 | 2.61 | 5.84
ncnn: CPU-v3-v3 - mobilenet-v3 | 2.34 | 4.82
ncnn: CPU - shufflenet-v2 | 2.17 | 3.76
ncnn: CPU - mnasnet | 2.52 | 5.86
ncnn: CPU - efficientnet-b0 | 4.18 | 9.51
ncnn: CPU - googlenet | 13.32 | 25.11
ncnn: CPU - vgg16 | 33.78 | 80.44
ncnn: CPU - resnet18 | 7.31 | 18.18
ncnn: CPU - alexnet | 11.81 | 31.66
ncnn: CPU - resnet50 | 17.16 | 46.79
ncnn: CPU - yolov4-tiny | 17.20 | 32.77
ncnn: CPU - squeezenet_ssd | 14.26 | 20.15
ncnn: CPU - regnety_400m | 5.88 | 8.08
openjpeg: NASA Curiosity Panorama M34 | 53890 | 52024
openssl: SHA256 | 8059691050 | 8474527350
openssl: RSA4096 (sign/s) | 1408.5 | 1391.4
openssl: RSA4096 (verify/s) | 99370.5 | 99445.4
povray: Trace Time | 72.017 | 62.416
primesieve: 1e12 Prime Number Generation | 29.118 | 29.626
sqlite-speedtest: Timed Time - Size 1,000 | 51.372 | 52.900
stress-ng: Crypto | 1511.75 | 1527.17
stress-ng: IO_uring | 144281.67 | 147040.98
stress-ng: Matrix Math | 23588.96 | 30254.21
stress-ng: Vector Math | 23954.10 | 41899.94
stress-ng: Memory Copying | 2763.25 | 3741.17
stress-ng: Socket Activity | 4331.71 | 4313.48
encode-wavpack: WAV To WavPack | 17.205 | 19.320
xmrig: Monero - 1M | 2247.2 | 2209.7
xmrig: Wownero - 1M | 2798.2 | 2804.8
compress-zstd: 3 - Compression Speed | 3341.2 | 3301.1
compress-zstd: 3 - Decompression Speed | 3850.2 | 3977.7
compress-zstd: 8 - Compression Speed | 721.5 | 699.6
compress-zstd: 8 - Decompression Speed | 4016.4 | 4141.0
compress-zstd: 19 - Compression Speed | 22.7 | 23.2
compress-zstd: 19 - Decompression Speed | 3546.2 | 3684.3
compress-zstd: 3, Long Mode - Compression Speed | 240.0 | 253.7
compress-zstd: 3, Long Mode - Decompression Speed | 4221.1 | 4356.6
compress-zstd: 8, Long Mode - Compression Speed | 693.0 | 703.4
compress-zstd: 8, Long Mode - Decompression Speed | 4416.3 | 4553.4
compress-zstd: 19, Long Mode - Compression Speed | 18.8 | 18.8
compress-zstd: 19, Long Mode - Decompression Speed | 3765.4 | 3887.3
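The percentages in the comparison chart can be reproduced from the raw results: each figure matches the lead of the larger value over the smaller one, for example 87.824 / 64.437 for C-Ray. A minimal Python sketch of that arithmetic follows. The function name `lead_percent` is hypothetical, and printing to one decimal place is an assumption based on how the chart labels appear.

```python
def lead_percent(a: float, b: float) -> float:
    """Return how far the larger of two results leads the smaller, in percent.

    Using the larger/smaller ratio makes the figure independent of whether
    the metric is "More Is Better" or "Fewer Is Better".
    """
    if a <= 0 or b <= 0:
        raise ValueError("results must be positive")
    high, low = max(a, b), min(a, b)
    return (high / low - 1.0) * 100.0


# Values copied from the results table, as (GCC 11.2.0, Clang 13.0.1).
results = {
    "c-ray: Total Time - 4K, 16 Rays Per Pixel": (64.437, 87.824),
    "cryptopp: Unkeyed Algorithms": (539.281827, 338.408369),
    "ncnn: CPU - resnet50": (17.16, 46.79),
}

for name, (gcc, clang) in results.items():
    print(f"{name}: {lead_percent(gcc, clang):.1f}%")
```

For these three rows the sketch gives figures that round to the 36.3%, 59.4%, and 172.7% shown in the comparison chart. Which compiler holds the lead still depends on the direction of the metric, which this helper deliberately ignores.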
C-Ray
This is a test of C-Ray, a simple raytracer designed to test floating-point CPU performance. This test is multi-threaded (16 threads per core); the configuration charted below renders a 4K image with 16 rays per pixel for anti-aliasing. Learn more via the OpenBenchmarking.org test page.
C-Ray 1.1 - Total Time - 4K, 16 Rays Per Pixel (Seconds, Fewer Is Better)
  GCC 11.2.0: 64.44 (SE +/- 0.04, N = 3)
  Clang 13.0.1: 87.82 (SE +/- 0.05, N = 3)
  1. (CC) gcc options: -lm -lpthread -O3 -flto
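Each result is reported with a standard error (SE) over N runs. As a rough illustration, the sketch below computes the conventional standard error of the mean, that is, the sample standard deviation divided by the square root of N. Whether the Phoronix Test Suite uses exactly this estimator is an assumption, and the run times in `hypothetical_runs` are invented for illustration; they are not the actual C-Ray samples.

```python
import math
import statistics


def standard_error(samples):
    """Standard error of the mean: sample standard deviation / sqrt(N)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    return statistics.stdev(samples) / math.sqrt(n)


# Hypothetical run times in seconds (not taken from the benchmark logs).
hypothetical_runs = [64.38, 64.41, 64.52]

mean = statistics.mean(hypothetical_runs)
se = standard_error(hypothetical_runs)
print(f"{mean:.2f} s, SE +/- {se:.2f}, N = {len(hypothetical_runs)}")
```

A small SE relative to the mean, as in most results here, suggests the run-to-run variation is much smaller than the gap between the two compilers.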
Crypto++ 8.2 - Test: Keyed Algorithms (MiB/second, More Is Better)
  GCC 11.2.0: 508.84 (SE +/- 0.07, N = 3)
  Clang 13.0.1: 374.90 (SE +/- 1.08, N = 3)
  1. (CXX) g++ options: -O3 -flto -fPIC -pthread -pipe

Crypto++ 8.2 - Test: Unkeyed Algorithms (MiB/second, More Is Better)
  GCC 11.2.0: 539.28 (SE +/- 0.04, N = 3)
  Clang 13.0.1: 338.41 (SE +/- 0.01, N = 3)
  1. (CXX) g++ options: -O3 -flto -fPIC -pthread -pipe

Crypto++ 8.2 - Test: Integer + Elliptic Curve Public Key Algorithms (MiB/second, More Is Better)
  Clang 13.0.1: 1875.52 (SE +/- 1.78, N = 3)
  GCC 11.2.0: 1766.99 (SE +/- 0.67, N = 3)
  1. (CXX) g++ options: -O3 -flto -fPIC -pthread -pipe
eSpeak-NG Speech Engine
This test times how long it takes the eSpeak speech synthesizer to read Project Gutenberg's The Outline of Science and output to a WAV file. This test profile is now tracking the eSpeak-NG version of eSpeak. Learn more via the OpenBenchmarking.org test page.
eSpeak-NG Speech Engine 20200907 - Text-To-Speech Synthesis (Seconds, Fewer Is Better)
  GCC 11.2.0: 22.29 (SE +/- 0.03, N = 4)
  Clang 13.0.1: 23.43 (SE +/- 0.03, N = 4)
  1. (CC) gcc options: -O3 -flto -std=c99 -lpthread -lm
Google Draco
Draco is a library developed by Google for compressing/decompressing 3D geometric meshes and point clouds. This test profile uses some Artec3D PLY models as the sample 3D model input formats for Draco compression/decompression. Learn more via the OpenBenchmarking.org test page.
Google Draco 1.5.0 - Model: Lion (ms, Fewer Is Better)
  GCC 11.2.0: 3747 (SE +/- 2.73, N = 3)
  Clang 13.0.1: 3772 (SE +/- 0.58, N = 3)
  1. (CXX) g++ options: -O3 -flto

Google Draco 1.5.0 - Model: Church Facade (ms, Fewer Is Better)
  GCC 11.2.0: 5649 (SE +/- 7.21, N = 3)
  Clang 13.0.1: 5722 (SE +/- 3.79, N = 3)
  1. (CXX) g++ options: -O3 -flto
LeelaChessZero
LeelaChessZero (lc0 / lczero) is a chess engine automated via neural networks. This test profile can be used for OpenCL, CUDA + cuDNN, and BLAS (CPU-based) benchmarking. Learn more via the OpenBenchmarking.org test page.
LeelaChessZero 0.28 - Backend: Eigen (Nodes Per Second, More Is Better)
  Clang 13.0.1: 1297 (SE +/- 18.26, N = 3)
  GCC 11.2.0: 1263 (SE +/- 10.69, N = 3)
  1. (CXX) g++ options: -flto -O3 -pthread
libavif avifenc 0.10 - Encoder Speed: 2 (Seconds, Fewer Is Better)
  GCC 11.2.0: 143.44 (SE +/- 0.32, N = 3)
  Clang 13.0.1: 161.61 (SE +/- 0.72, N = 3)
  1. (CXX) g++ options: -O3 -fPIC -flto -lm

libavif avifenc 0.10 - Encoder Speed: 6, Lossless (Seconds, Fewer Is Better)
  Clang 13.0.1: 14.93 (SE +/- 0.21, N = 3)
  GCC 11.2.0: 15.65 (SE +/- 0.18, N = 3)
  1. (CXX) g++ options: -O3 -fPIC -flto -lm

libavif avifenc 0.10 - Encoder Speed: 10, Lossless (Seconds, Fewer Is Better)
  Clang 13.0.1: 5.887 (SE +/- 0.047, N = 3)
  GCC 11.2.0: 6.070 (SE +/- 0.049, N = 3)
  1. (CXX) g++ options: -O3 -fPIC -flto -lm
Liquid-DSP
LiquidSDR's Liquid-DSP is a software-defined radio (SDR) digital signal processing library. This test profile runs a multi-threaded benchmark of this SDR/DSP library focused on embedded platform usage. Learn more via the OpenBenchmarking.org test page.
Liquid-DSP 2021.01.31 - Threads: 1 - Buffer Length: 256 - Filter Length: 57 (samples/s, More Is Better)
  Clang 13.0.1: 37897333 (SE +/- 2905.93, N = 3)
  GCC 11.2.0: 28778667 (SE +/- 3527.67, N = 3)
  1. (CC) gcc options: -O3 -flto -pthread -lm -lc -lliquid

Liquid-DSP 2021.01.31 - Threads: 2 - Buffer Length: 256 - Filter Length: 57 (samples/s, More Is Better)
  Clang 13.0.1: 75898667 (SE +/- 1763.83, N = 3)
  GCC 11.2.0: 57611000 (SE +/- 2081.67, N = 3)
  1. (CC) gcc options: -O3 -flto -pthread -lm -lc -lliquid

Liquid-DSP 2021.01.31 - Threads: 4 - Buffer Length: 256 - Filter Length: 57 (samples/s, More Is Better)
  Clang 13.0.1: 151820000 (SE +/- 0.00, N = 3)
  GCC 11.2.0: 115230000 (SE +/- 0.00, N = 3)
  1. (CC) gcc options: -O3 -flto -pthread -lm -lc -lliquid

Liquid-DSP 2021.01.31 - Threads: 8 - Buffer Length: 256 - Filter Length: 57 (samples/s, More Is Better)
  Clang 13.0.1: 196510000 (SE +/- 0.00, N = 3)
  GCC 11.2.0: 151120000 (SE +/- 0.00, N = 3)
  1. (CC) gcc options: -O3 -flto -pthread -lm -lc -lliquid
LZ4 Compression 1.9.3 - Compression Level: 1 - Decompression Speed (MB/s, More Is Better)
  GCC 11.2.0: 27018.5 (SE +/- 1.47, N = 3)
  Clang 13.0.1: 26736.4 (SE +/- 8.86, N = 3)
  1. (CC) gcc options: -O3

LZ4 Compression 1.9.3 - Compression Level: 3 - Decompression Speed (MB/s, More Is Better)
  GCC 11.2.0: 17490.9 (SE +/- 0.40, N = 3)
  Clang 13.0.1: 16877.4 (SE +/- 3.46, N = 3)
  1. (CC) gcc options: -O3

LZ4 Compression 1.9.3 - Compression Level: 9 - Decompression Speed (MB/s, More Is Better)
  GCC 11.2.0: 17478.5 (SE +/- 1.03, N = 3)
  Clang 13.0.1: 16863.3 (SE +/- 3.18, N = 3)
  1. (CC) gcc options: -O3
NCNN
NCNN is a high performance neural network inference framework optimized for mobile and other platforms developed by Tencent. Learn more via the OpenBenchmarking.org test page.
NCNN 20210720 - Target: CPU - Model: mobilenet (ms, Fewer Is Better)
  GCC 11.2.0: 14.40 (SE +/- 0.17, N = 3; MIN: 9.21 / MAX: 25.2)
  Clang 13.0.1: 22.21 (SE +/- 0.01, N = 3; MIN: 22.15 / MAX: 22.25)
  Additional options noted on chart: -lgomp -lpthread
  1. (CXX) g++ options: -O3 -flto -rdynamic

NCNN 20210720 - Target: CPU-v2-v2 - Model: mobilenet-v2 (ms, Fewer Is Better)
  GCC 11.2.0: 2.61 (SE +/- 0.05, N = 3; MIN: 2.48 / MAX: 12.2)
  Clang 13.0.1: 5.84 (SE +/- 0.01, N = 3; MIN: 5.81 / MAX: 5.87)
  Additional options noted on chart: -lgomp -lpthread
  1. (CXX) g++ options: -O3 -flto -rdynamic

NCNN 20210720 - Target: CPU-v3-v3 - Model: mobilenet-v3 (ms, Fewer Is Better)
  GCC 11.2.0: 2.34 (SE +/- 0.01, N = 3; MIN: 2.32 / MAX: 2.49)
  Clang 13.0.1: 4.82 (SE +/- 0.01, N = 3; MIN: 4.8 / MAX: 4.85)
  Additional options noted on chart: -lgomp -lpthread
  1. (CXX) g++ options: -O3 -flto -rdynamic

NCNN 20210720 - Target: CPU - Model: shufflenet-v2 (ms, Fewer Is Better)
  GCC 11.2.0: 2.17 (SE +/- 0.01, N = 3; MIN: 2.15 / MAX: 2.48)
  Clang 13.0.1: 3.76 (SE +/- 0.00, N = 3; MAX: 3.85)
  Additional options noted on chart: -lgomp -lpthread
  1. (CXX) g++ options: -O3 -flto -rdynamic

NCNN 20210720 - Target: CPU - Model: mnasnet (ms, Fewer Is Better)
  GCC 11.2.0: 2.52 (SE +/- 0.01, N = 3; MIN: 2.48 / MAX: 2.84)
  Clang 13.0.1: 5.86 (SE +/- 0.01, N = 2; MIN: 5.84 / MAX: 5.87)
  Additional options noted on chart: -lgomp -lpthread
  1. (CXX) g++ options: -O3 -flto -rdynamic

NCNN 20210720 - Target: CPU - Model: efficientnet-b0 (ms, Fewer Is Better)
  GCC 11.2.0: 4.18 (SE +/- 0.02, N = 3; MIN: 4.13 / MAX: 8.1)
  Clang 13.0.1: 9.51 (SE +/- 0.00, N = 3; MIN: 9.47 / MAX: 9.67)
  Additional options noted on chart: -lgomp -lpthread
  1. (CXX) g++ options: -O3 -flto -rdynamic

NCNN 20210720 - Target: CPU - Model: googlenet (ms, Fewer Is Better)
  GCC 11.2.0: 13.32 (SE +/- 0.10, N = 3; MIN: 9.14 / MAX: 21.97)
  Clang 13.0.1: 25.11 (SE +/- 0.01, N = 3; MIN: 25.07 / MAX: 25.16)
  Additional options noted on chart: -lgomp -lpthread
  1. (CXX) g++ options: -O3 -flto -rdynamic

NCNN 20210720 - Target: CPU - Model: vgg16 (ms, Fewer Is Better)
  GCC 11.2.0: 33.78 (SE +/- 0.14, N = 3; MIN: 30.68 / MAX: 45.72)
  Clang 13.0.1: 80.44 (SE +/- 0.01, N = 3; MIN: 80.22 / MAX: 80.95)
  Additional options noted on chart: -lgomp -lpthread
  1. (CXX) g++ options: -O3 -flto -rdynamic

NCNN 20210720 - Target: CPU - Model: resnet18 (ms, Fewer Is Better)
  GCC 11.2.0: 7.31 (SE +/- 0.04, N = 3; MIN: 6.17 / MAX: 16.92)
  Clang 13.0.1: 18.18 (SE +/- 0.01, N = 3; MIN: 18.14 / MAX: 18.23)
  Additional options noted on chart: -lgomp -lpthread
  1. (CXX) g++ options: -O3 -flto -rdynamic

NCNN 20210720 - Target: CPU - Model: alexnet (ms, Fewer Is Better)
  GCC 11.2.0: 11.81 (SE +/- 0.10, N = 3; MIN: 9.48 / MAX: 21.58)
  Clang 13.0.1: 31.66 (SE +/- 0.00, N = 3; MIN: 31.62 / MAX: 33.42)
  Additional options noted on chart: -lgomp -lpthread
  1. (CXX) g++ options: -O3 -flto -rdynamic

NCNN 20210720 - Target: CPU - Model: resnet50 (ms, Fewer Is Better)
  GCC 11.2.0: 17.16 (SE +/- 0.08, N = 3; MIN: 15.54 / MAX: 27.86)
  Clang 13.0.1: 46.79 (SE +/- 0.01, N = 3; MIN: 46.7 / MAX: 46.9)
  Additional options noted on chart: -lgomp -lpthread
  1. (CXX) g++ options: -O3 -flto -rdynamic

NCNN 20210720 - Target: CPU - Model: yolov4-tiny (ms, Fewer Is Better)
  GCC 11.2.0: 17.20 (SE +/- 0.07, N = 3; MIN: 14.01 / MAX: 27.33)
  Clang 13.0.1: 32.77 (SE +/- 0.00, N = 3; MIN: 32.68 / MAX: 32.88)
  Additional options noted on chart: -lgomp -lpthread
  1. (CXX) g++ options: -O3 -flto -rdynamic

NCNN 20210720 - Target: CPU - Model: squeezenet_ssd (ms, Fewer Is Better)
  GCC 11.2.0: 14.26 (SE +/- 0.17, N = 3; MIN: 9.6 / MAX: 28.57)
  Clang 13.0.1: 20.15 (SE +/- 0.00, N = 3; MIN: 20.08 / MAX: 20.21)
  Additional options noted on chart: -lgomp -lpthread
  1. (CXX) g++ options: -O3 -flto -rdynamic

NCNN 20210720 - Target: CPU - Model: regnety_400m (ms, Fewer Is Better)
  GCC 11.2.0: 5.88 (SE +/- 0.03, N = 3; MIN: 5.78 / MAX: 8.62)
  Clang 13.0.1: 8.08 (SE +/- 0.00, N = 3; MIN: 8.05 / MAX: 8.15)
  Additional options noted on chart: -lgomp -lpthread
  1. (CXX) g++ options: -O3 -flto -rdynamic
OpenJPEG
OpenJPEG is an open-source JPEG 2000 codec written in the C programming language. The default input for this test profile is the NASA/JPL-Caltech/MSSS Curiosity panorama 717MB TIFF image file converting to JPEG2000 format. Learn more via the OpenBenchmarking.org test page.
OpenJPEG 2.4 - Encode: NASA Curiosity Panorama M34 (ms, Fewer Is Better)
  Clang 13.0.1: 52024 (SE +/- 161.48, N = 3)
  GCC 11.2.0: 53890 (SE +/- 92.73, N = 3)
  1. (CXX) g++ options: -O3 -flto -rdynamic
OpenSSL
OpenSSL is an open-source toolkit that implements SSL (Secure Sockets Layer) and TLS (Transport Layer Security) protocols. This test profile makes use of the built-in "openssl speed" benchmarking capabilities. Learn more via the OpenBenchmarking.org test page.
OpenSSL 3.0 - Algorithm: SHA256 (byte/s, More Is Better)
  Clang 13.0.1: 8474527350 (SE +/- 3887401.32, N = 3)
  GCC 11.2.0: 8059691050 (SE +/- 12283962.01, N = 3)
  Additional options noted on chart: -Qunused-arguments
  1. (CC) gcc options: -pthread -O3 -flto -lssl -lcrypto -ldl

OpenSSL 3.0 - Algorithm: RSA4096 (sign/s, More Is Better)
  GCC 11.2.0: 1408.5 (SE +/- 0.78, N = 3)
  Clang 13.0.1: 1391.4 (SE +/- 0.15, N = 3)
  Additional options noted on chart: -Qunused-arguments
  1. (CC) gcc options: -pthread -O3 -flto -lssl -lcrypto -ldl

OpenSSL 3.0 - Algorithm: RSA4096 (verify/s, More Is Better)
  Clang 13.0.1: 99445.4 (SE +/- 16.80, N = 3)
  GCC 11.2.0: 99370.5 (SE +/- 18.59, N = 3)
  Additional options noted on chart: -Qunused-arguments
  1. (CC) gcc options: -pthread -O3 -flto -lssl -lcrypto -ldl
POV-Ray
This is a test of POV-Ray, the Persistence of Vision Raytracer. POV-Ray is used to create 3D graphics using ray-tracing. Learn more via the OpenBenchmarking.org test page.
POV-Ray 3.7.0.7 - Trace Time (Seconds, Fewer Is Better)
  Clang 13.0.1: 62.42 (SE +/- 0.64, N = 5)
  GCC 11.2.0: 72.02 (SE +/- 0.85, N = 4)
  Additional options noted on chart: -R/usr/lib
  1. (CXX) g++ options: -pipe -O3 -ffast-math -flto -lSDL -lpthread -lXpm -lSM -lICE -lX11 -ltiff -ljpeg -lpng -lz -lrt -lm -lboost_thread -lboost_system
Stress-NG 0.13.02 - Test: IO_uring (Bogo Ops/s, More Is Better)
  Clang 13.0.1: 147040.98 (SE +/- 271.95, N = 3)
  GCC 11.2.0: 144281.67 (SE +/- 28.54, N = 3)
  1. (CC) gcc options: -O3 -flto -O2 -std=gnu99 -lm -laio -lbsd -lcrypt -lrt -lz -ldl -pthread -lkmod -lc -latomic

Stress-NG 0.13.02 - Test: Matrix Math (Bogo Ops/s, More Is Better)
  Clang 13.0.1: 30254.21 (SE +/- 0.69, N = 3)
  GCC 11.2.0: 23588.96 (SE +/- 332.61, N = 3)
  1. (CC) gcc options: -O3 -flto -O2 -std=gnu99 -lm -laio -lbsd -lcrypt -lrt -lz -ldl -pthread -lkmod -lc -latomic

Stress-NG 0.13.02 - Test: Vector Math (Bogo Ops/s, More Is Better)
  Clang 13.0.1: 41899.94 (SE +/- 2.19, N = 3)
  GCC 11.2.0: 23954.10 (SE +/- 195.44, N = 15)
  1. (CC) gcc options: -O3 -flto -O2 -std=gnu99 -lm -laio -lbsd -lcrypt -lrt -lz -ldl -pthread -lkmod -lc -latomic

Stress-NG 0.13.02 - Test: Memory Copying (Bogo Ops/s, More Is Better)
  Clang 13.0.1: 3741.17 (SE +/- 15.21, N = 3)
  GCC 11.2.0: 2763.25 (SE +/- 6.71, N = 3)
  1. (CC) gcc options: -O3 -flto -O2 -std=gnu99 -lm -laio -lbsd -lcrypt -lrt -lz -ldl -pthread -lkmod -lc -latomic

Stress-NG 0.13.02 - Test: Socket Activity (Bogo Ops/s, More Is Better)
  GCC 11.2.0: 4331.71 (SE +/- 4.58, N = 3)
  Clang 13.0.1: 4313.48 (SE +/- 13.20, N = 3)
  1. (CC) gcc options: -O3 -flto -O2 -std=gnu99 -lm -laio -lbsd -lcrypt -lrt -lz -ldl -pthread -lkmod -lc -latomic
Xmrig
Xmrig is an open-source cross-platform CPU/GPU miner for RandomX, KawPow, CryptoNight and AstroBWT. This test profile is set up to measure Xmrig CPU mining performance. Learn more via the OpenBenchmarking.org test page.
Xmrig 6.12.1 - Variant: Monero - Hash Count: 1M (H/s, More Is Better)
  GCC 11.2.0: 2247.2 (SE +/- 9.05, N = 3)
  Clang 13.0.1: 2209.7 (SE +/- 7.70, N = 3)
  Additional options noted on chart: -static-libgcc -static-libstdc++ -funroll-loops
  1. (CXX) g++ options: -O3 -flto -fexceptions -fno-rtti -Ofast -rdynamic -lssl -lcrypto -luv -lpthread -lrt -ldl -lhwloc

Xmrig 6.12.1 - Variant: Wownero - Hash Count: 1M (H/s, More Is Better)
  Clang 13.0.1: 2804.8 (SE +/- 1.95, N = 3)
  GCC 11.2.0: 2798.2 (SE +/- 1.83, N = 3)
  Additional options noted on chart: -funroll-loops -static-libgcc -static-libstdc++
  1. (CXX) g++ options: -O3 -flto -fexceptions -fno-rtti -Ofast -rdynamic -lssl -lcrypto -luv -lpthread -lrt -ldl -lhwloc
Zstd Compression
This test measures the time needed to compress/decompress a sample file (a FreeBSD disk image - FreeBSD-12.2-RELEASE-amd64-memstick.img) using Zstd compression with options for different compression levels / settings. Learn more via the OpenBenchmarking.org test page.
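To illustrate how a throughput figure in MB/s can be derived from timing a compress/decompress round trip, here is a minimal Python sketch. It is not the test profile's own method: it uses `zlib` from the standard library as a stand-in for Zstd, a synthetic buffer instead of the FreeBSD image, and assumes 1 MB = 1,000,000 bytes with both speeds measured against the uncompressed size. The helper name `timed_mb_per_s` is hypothetical.

```python
import time
import zlib


def timed_mb_per_s(func, data: bytes, payload_size: int):
    """Run func(data) once and return (result, throughput in MB/s).

    payload_size is the uncompressed size in bytes, so compression and
    decompression speeds are both expressed against the original data.
    """
    start = time.perf_counter()
    result = func(data)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero reading
    return result, payload_size / elapsed / 1_000_000


# Synthetic stand-in for the sample file (hypothetical data, about 10 MB).
sample = bytes(range(256)) * 40_000

# zlib level 6 is an arbitrary choice; it does not correspond to a Zstd level.
compressed, comp_speed = timed_mb_per_s(lambda d: zlib.compress(d, 6), sample, len(sample))
restored, decomp_speed = timed_mb_per_s(zlib.decompress, compressed, len(sample))

print(f"compression:   {comp_speed:.1f} MB/s")
print(f"decompression: {decomp_speed:.1f} MB/s")
```

A single timed pass like this is noisy; the results in this document average several runs (N = 3 or more) and report the standard error alongside each figure.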
Zstd Compression 1.5.0 - Compression Level: 3 - Compression Speed (MB/s, More Is Better)
  GCC 11.2.0: 3341.2 (SE +/- 6.19, N = 3)
  Clang 13.0.1: 3301.1 (SE +/- 39.46, N = 3)
  1. (CC) gcc options: -O3 -flto -pthread -lz -llzma -llz4

Zstd Compression 1.5.0 - Compression Level: 3 - Decompression Speed (MB/s, More Is Better)
  Clang 13.0.1: 3977.7 (SE +/- 0.75, N = 3)
  GCC 11.2.0: 3850.2 (SE +/- 0.87, N = 3)
  1. (CC) gcc options: -O3 -flto -pthread -lz -llzma -llz4

Zstd Compression 1.5.0 - Compression Level: 8 - Compression Speed (MB/s, More Is Better)
  GCC 11.2.0: 721.5 (SE +/- 3.70, N = 3)
  Clang 13.0.1: 699.6 (SE +/- 4.97, N = 3)
  1. (CC) gcc options: -O3 -flto -pthread -lz -llzma -llz4

Zstd Compression 1.5.0 - Compression Level: 8 - Decompression Speed (MB/s, More Is Better)
  Clang 13.0.1: 4141.0 (SE +/- 3.02, N = 3)
  GCC 11.2.0: 4016.4 (SE +/- 1.95, N = 3)
  1. (CC) gcc options: -O3 -flto -pthread -lz -llzma -llz4

Zstd Compression 1.5.0 - Compression Level: 19 - Compression Speed (MB/s, More Is Better)
  Clang 13.0.1: 23.2 (SE +/- 0.17, N = 3)
  GCC 11.2.0: 22.7 (SE +/- 0.07, N = 3)
  1. (CC) gcc options: -O3 -flto -pthread -lz -llzma -llz4

Zstd Compression 1.5.0 - Compression Level: 19 - Decompression Speed (MB/s, More Is Better)
  Clang 13.0.1: 3684.3 (SE +/- 1.62, N = 3)
  GCC 11.2.0: 3546.2 (SE +/- 0.15, N = 3)
  1. (CC) gcc options: -O3 -flto -pthread -lz -llzma -llz4

Zstd Compression 1.5.0 - Compression Level: 3, Long Mode - Compression Speed (MB/s, More Is Better)
  Clang 13.0.1: 253.7 (SE +/- 3.51, N = 3)
  GCC 11.2.0: 240.0 (SE +/- 2.00, N = 15)
  1. (CC) gcc options: -O3 -flto -pthread -lz -llzma -llz4

Zstd Compression 1.5.0 - Compression Level: 3, Long Mode - Decompression Speed (MB/s, More Is Better)
  Clang 13.0.1: 4356.6 (SE +/- 0.40, N = 3)
  GCC 11.2.0: 4221.1 (SE +/- 0.25, N = 15)
  1. (CC) gcc options: -O3 -flto -pthread -lz -llzma -llz4

Zstd Compression 1.5.0 - Compression Level: 8, Long Mode - Compression Speed (MB/s, More Is Better)
  Clang 13.0.1: 703.4 (SE +/- 2.38, N = 3)
  GCC 11.2.0: 693.0 (SE +/- 2.35, N = 3)
  1. (CC) gcc options: -O3 -flto -pthread -lz -llzma -llz4

Zstd Compression 1.5.0 - Compression Level: 8, Long Mode - Decompression Speed (MB/s, More Is Better)
  Clang 13.0.1: 4553.4 (SE +/- 3.13, N = 3)
  GCC 11.2.0: 4416.3 (SE +/- 1.55, N = 3)
  1. (CC) gcc options: -O3 -flto -pthread -lz -llzma -llz4

Zstd Compression 1.5.0 - Compression Level: 19, Long Mode - Compression Speed (MB/s, More Is Better)
  Clang 13.0.1: 18.8 (SE +/- 0.21, N = 4)
  GCC 11.2.0: 18.8 (SE +/- 0.13, N = 3)
  1. (CC) gcc options: -O3 -flto -pthread -lz -llzma -llz4

Zstd Compression 1.5.0 - Compression Level: 19, Long Mode - Decompression Speed (MB/s, More Is Better)
  Clang 13.0.1: 3887.3 (SE +/- 0.69, N = 4)
  GCC 11.2.0: 3765.4 (SE +/- 0.92, N = 3)
  1. (CC) gcc options: -O3 -flto -pthread -lz -llzma -llz4
GCC 11.2.0: Testing initiated at 8 April 2022 14:42 by user phoronix.
Clang 13.0.1: Testing initiated at 9 April 2022 12:49 by user phoronix.