Xeon Platinum 8380 compiler benchmarks by Michael Larabel, comparing GCC 11 against LLVM Clang 12 in some initial holiday-weekend tests.
GCC 11.1

Processor: 2 x Intel Xeon Platinum 8380 @ 3.40GHz (80 Cores / 160 Threads), Motherboard: Intel M50CYP2SB2U (SE5C6200.86B.0022.D08.2103221623 BIOS), Chipset: Intel Device 0998, Memory: 16 x 32 GB DDR4-3200MT/s Hynix HMA84GR7CJR4N-XN, Disk: 800GB INTEL SSDPF21Q800GB, Graphics: ASPEED, Network: 2 x Intel X710 for 10GBASE-T + 2 x Intel E810-C for QSFP
OS: Fedora 34, Kernel: 5.12.6-300.fc34.x86_64 (x86_64), Compiler: GCC 11.1.1 20210428, File-System: xfs, Screen Resolution: 1024x768

Kernel Notes: Transparent Huge Pages: madvise
Environment Notes: CXXFLAGS="-O3 -march=native -flto" CFLAGS="-O3 -march=native -flto"
Compiler Notes: --build=x86_64-redhat-linux --disable-libunwind-exceptions --enable-__cxa_atexit --enable-bootstrap --enable-cet --enable-checking=release --enable-gnu-indirect-function --enable-gnu-unique-object --enable-initfini-array --enable-languages=c,c++,fortran,objc,obj-c++,ada,go,d,lto --enable-multilib --enable-offload-targets=nvptx-none --enable-plugin --enable-shared --enable-threads=posix --mandir=/usr/share/man --with-arch_32=i686 --with-gcc-major-version-only --with-linker-hash-style=gnu --with-tune=generic --without-cuda-driver
Processor Notes: Scaling Governor: intel_pstate performance - CPU Microcode: 0xd000270
Python Notes: Python 3.9.5
Security Notes: SELinux + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced IBRS IBPB: conditional RSB filling + srbds: Not affected + tsx_async_abort: Not affected
Clang 12.0

OS: Fedora 34, Kernel: 5.12.6-300.fc34.x86_64 (x86_64), Compiler: Clang 12.0.0, File-System: xfs, Screen Resolution: 1024x768

Kernel Notes: Transparent Huge Pages: madvise
Environment Notes: CXXFLAGS="-O3 -march=native -flto" CFLAGS="-O3 -march=native -flto"
Processor Notes: Scaling Governor: intel_pstate performance - CPU Microcode: 0xd000270
Python Notes: Python 3.9.5
Security Notes: SELinux + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced IBRS IBPB: conditional RSB filling + srbds: Not affected + tsx_async_abort: Not affected
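Both configurations were built with identical optimization flags per the environment notes above. As an illustrative sketch (not the literal Phoronix Test Suite mechanics, which set these per test profile), reproducing such a setup by hand would look something like:

```shell
# Same aggressive flags for both toolchains, per the environment notes.
export CFLAGS="-O3 -march=native -flto"
export CXXFLAGS="-O3 -march=native -flto"
# GCC run:   export CC=gcc CXX=g++
# Clang run: export CC=clang CXX=clang++
```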
GCC 11 vs. LLVM Clang 12 Benchmarks On Xeon Ice Lake - Phoronix Test Suite - OpenBenchmarking.org
[Chart: GCC 11.1 vs. Clang 12.0 comparison - per-test percentage differences relative to the slower compiler, ranging from +2% up to +256.6% (NCNN CPU - regnety_400m). Workloads covered include NCNN, oneDNN, GraphicsMagick, Zstd, Bullet Physics, CoreMark, OpenSSL, TNN, ASTC Encoder, Liquid-DSP, WebP/WebP2, Kvazaar, x265, SVT-AV1/HEVC/VP9, PostgreSQL pgbench, and others; the underlying numbers are charted individually below.]
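The percentages in that comparison chart are straightforward ratios between the two compilers' results on each test. A quick check against two results reported later in this article:

```python
# Percentage deltas like those in the comparison chart are the ratio of the
# two results, expressed as "x% better". Two results from this article:
gcc_regnety, clang_regnety = 94.38, 26.47   # NCNN CPU - regnety_400m, ms (lower is better)
delta = (gcc_regnety / clang_regnety - 1) * 100
print(f"{delta:.1f}%")                       # matches the chart's leading 256.6% entry

gcc_rsa, clang_rsa = 17804.2, 11555.8        # OpenSSL RSA 4096-bit signs/s (higher is better)
print(f"{(gcc_rsa / clang_rsa - 1) * 100:.1f}%")  # the 54.1% GCC advantage in the chart
```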
GCC 11 vs. LLVM Clang 12 Benchmarks On Xeon Ice Lake

[Table: combined side-by-side GCC 11.1 vs. Clang 12.0 results for all tests, most of which are broken out individually below. Results present only in this table: gmpbench Total Time: 3871.6 (GCC 11.1 only); daphne OpenMP - NDT Mapping: 1046.60, Points2Image: 14507.81, Euclidean Cluster: 1013.99 (GCC 11.1 only).]
Kvazaar

This is a test of Kvazaar as a CPU-based H.265 video encoder written in the C programming language and optimized in Assembly. Kvazaar is the winner of the 2016 ACM Open-Source Software Competition and was developed at the Ultra Video Group, Tampere University, Finland. Learn more via the OpenBenchmarking.org test page.

Kvazaar 2.0 - Video Input: Bosphorus 4K - Video Preset: Very Fast (Frames Per Second, More Is Better)
  Clang 12.0: 44.72 (SE +/- 0.28, N = 3)
  GCC 11.1: 42.31 (SE +/- 0.35, N = 3)
  Notes: -lpthread. 1. (CC) gcc options: -pthread -ftree-vectorize -fvisibility=hidden -O3 -march=native -flto -lm -lrt

Kvazaar 2.0 - Video Input: Bosphorus 4K - Video Preset: Ultra Fast (Frames Per Second, More Is Better)
  Clang 12.0: 47.96 (SE +/- 0.33, N = 3)
  GCC 11.1: 46.58 (SE +/- 0.54, N = 3)
  Notes: -lpthread. 1. (CC) gcc options: -pthread -ftree-vectorize -fvisibility=hidden -O3 -march=native -flto -lm -lrt

Kvazaar 2.0 - Video Input: Bosphorus 1080p - Video Preset: Very Fast (Frames Per Second, More Is Better)
  Clang 12.0: 166.00 (SE +/- 0.77, N = 3)
  GCC 11.1: 159.71 (SE +/- 0.42, N = 3)
  Notes: -lpthread. 1. (CC) gcc options: -pthread -ftree-vectorize -fvisibility=hidden -O3 -march=native -flto -lm -lrt

Kvazaar 2.0 - Video Input: Bosphorus 1080p - Video Preset: Ultra Fast (Frames Per Second, More Is Better)
  Clang 12.0: 183.97 (SE +/- 1.71, N = 3)
  GCC 11.1: 176.66 (SE +/- 1.31, N = 3)
  Notes: -lpthread. 1. (CC) gcc options: -pthread -ftree-vectorize -fvisibility=hidden -O3 -march=native -flto -lm -lrt
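The "SE +/- x, N = y" annotations on each result are the standard error of the mean across the benchmark runs. A small illustration with made-up run times (the individual per-run samples are not published):

```python
import math
import statistics

# The "SE +/- x, N = 3" annotations are the standard error of the mean:
# sample standard deviation divided by sqrt(N). Hypothetical FPS samples:
runs = [44.4, 44.7, 45.05]
se = statistics.stdev(runs) / math.sqrt(len(runs))
mean = statistics.fmean(runs)
print(f"{mean:.2f} (SE +/- {se:.2f}, N = {len(runs)})")
```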
SVT-AV1

This is a benchmark of the SVT-AV1 open-source video encoder/decoder. SVT-AV1 was originally developed by Intel as part of their Open Visual Cloud / Scalable Video Technology (SVT). Development of SVT-AV1 has since moved to the Alliance for Open Media as part of upstream AV1 development. SVT-AV1 is a CPU-based multi-threaded video encoder for the AV1 video format, tested with a sample YUV video file. Learn more via the OpenBenchmarking.org test page.

SVT-AV1 0.8.7 - Encoder Mode: Preset 4 - Input: Bosphorus 4K (Frames Per Second, More Is Better)
  Clang 12.0: 4.333 (SE +/- 0.006, N = 3)
  GCC 11.1: 4.213 (SE +/- 0.029, N = 3)
  1. (CXX) g++ options: -O3 -march=native -flto -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq -pie

SVT-AV1 0.8.7 - Encoder Mode: Preset 8 - Input: Bosphorus 4K (Frames Per Second, More Is Better)
  Clang 12.0: 56.27 (SE +/- 0.41, N = 3)
  GCC 11.1: 55.19 (SE +/- 0.23, N = 3)
  1. (CXX) g++ options: -O3 -march=native -flto -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq -pie

SVT-AV1 0.8.7 - Encoder Mode: Preset 4 - Input: Bosphorus 1080p (Frames Per Second, More Is Better)
  Clang 12.0: 9.418 (SE +/- 0.082, N = 3)
  GCC 11.1: 8.996 (SE +/- 0.043, N = 3)
  1. (CXX) g++ options: -O3 -march=native -flto -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq -pie

SVT-AV1 0.8.7 - Encoder Mode: Preset 8 - Input: Bosphorus 1080p (Frames Per Second, More Is Better)
  Clang 12.0: 169.61 (SE +/- 1.14, N = 3)
  GCC 11.1: 167.39 (SE +/- 0.34, N = 3)
  1. (CXX) g++ options: -O3 -march=native -flto -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq -pie
SVT-HEVC

This is a test of the Intel Open Visual Cloud Scalable Video Technology SVT-HEVC CPU-based multi-threaded video encoder for the HEVC / H.265 video format with a sample 1080p YUV video file. Learn more via the OpenBenchmarking.org test page.

SVT-HEVC 1.5.0 - Tuning: 1 - Input: Bosphorus 1080p (Frames Per Second, More Is Better)
  Clang 12.0: 41.96 (SE +/- 0.21, N = 3)
  GCC 11.1: 39.59 (SE +/- 0.28, N = 3)
  1. (CC) gcc options: -O3 -march=native -flto -fPIE -fPIC -O2 -pie -rdynamic -lpthread -lrt

SVT-HEVC 1.5.0 - Tuning: 7 - Input: Bosphorus 1080p (Frames Per Second, More Is Better)
  Clang 12.0: 355.18 (SE +/- 1.29, N = 3)
  GCC 11.1: 336.37 (SE +/- 2.94, N = 3)
  1. (CC) gcc options: -O3 -march=native -flto -fPIE -fPIC -O2 -pie -rdynamic -lpthread -lrt

SVT-HEVC 1.5.0 - Tuning: 10 - Input: Bosphorus 1080p (Frames Per Second, More Is Better)
  Clang 12.0: 608.95 (SE +/- 2.17, N = 3)
  GCC 11.1: 609.56 (SE +/- 1.44, N = 3)
  1. (CC) gcc options: -O3 -march=native -flto -fPIE -fPIC -O2 -pie -rdynamic -lpthread -lrt
SVT-VP9

This is a test of the Intel Open Visual Cloud Scalable Video Technology SVT-VP9 CPU-based multi-threaded video encoder for the VP9 video format with a sample YUV input video file. Learn more via the OpenBenchmarking.org test page.

SVT-VP9 0.3 - Tuning: VMAF Optimized - Input: Bosphorus 1080p (Frames Per Second, More Is Better)
  Clang 12.0: 467.57 (SE +/- 3.96, N = 3)
  GCC 11.1: 476.85 (SE +/- 2.96, N = 3)
  1. (CC) gcc options: -O3 -fcommon -march=native -flto -fPIE -fPIC -fvisibility=hidden -O2 -pie -rdynamic -lpthread -lrt -lm

SVT-VP9 0.3 - Tuning: PSNR/SSIM Optimized - Input: Bosphorus 1080p (Frames Per Second, More Is Better)
  Clang 12.0: 466.06 (SE +/- 0.60, N = 3)
  GCC 11.1: 477.90 (SE +/- 3.23, N = 3)
  1. (CC) gcc options: -O3 -fcommon -march=native -flto -fPIE -fPIC -fvisibility=hidden -O2 -pie -rdynamic -lpthread -lrt -lm

SVT-VP9 0.3 - Tuning: Visual Quality Optimized - Input: Bosphorus 1080p (Frames Per Second, More Is Better)
  Clang 12.0: 379.93 (SE +/- 1.57, N = 3)
  GCC 11.1: 393.17 (SE +/- 0.84, N = 3)
  1. (CC) gcc options: -O3 -fcommon -march=native -flto -fPIE -fPIC -fvisibility=hidden -O2 -pie -rdynamic -lpthread -lrt -lm
x265

This is a simple test of the x265 encoder run on the CPU, with 1080p and 4K options, for H.265 video encode performance. Learn more via the OpenBenchmarking.org test page.

x265 3.4 - Video Input: Bosphorus 4K (Frames Per Second, More Is Better)
  Clang 12.0: 29.13 (SE +/- 0.16, N = 3)
  GCC 11.1: 26.92 (SE +/- 0.27, N = 3)
  1. (CXX) g++ options: -O3 -march=native -flto -O2 -rdynamic -lpthread -lrt -ldl

x265 3.4 - Video Input: Bosphorus 1080p (Frames Per Second, More Is Better)
  Clang 12.0: 77.24 (SE +/- 0.27, N = 3)
  GCC 11.1: 76.88 (SE +/- 0.62, N = 3)
  1. (CXX) g++ options: -O3 -march=native -flto -O2 -rdynamic -lpthread -lrt -ldl
GraphicsMagick

GraphicsMagick 1.3.33 - Operation: Sharpen (Iterations Per Minute, More Is Better)
  Clang 12.0: 769 (SE +/- 2.73, N = 3)
  GCC 11.1: 898 (SE +/- 2.40, N = 3)
  1. (CC) gcc options: -fopenmp -O3 -march=native -flto -pthread -ljpeg -lX11 -lz -lm -lpthread

GraphicsMagick 1.3.33 - Operation: Enhanced (Iterations Per Minute, More Is Better)
  Clang 12.0: 1141 (SE +/- 4.10, N = 3)
  GCC 11.1: 1315 (SE +/- 1.20, N = 3)
  1. (CC) gcc options: -fopenmp -O3 -march=native -flto -pthread -ljpeg -lX11 -lz -lm -lpthread

GraphicsMagick 1.3.33 - Operation: Resizing (Iterations Per Minute, More Is Better)
  Clang 12.0: 614 (SE +/- 15.04, N = 15)
  GCC 11.1: 380 (SE +/- 4.36, N = 3)
  1. (CC) gcc options: -fopenmp -O3 -march=native -flto -pthread -ljpeg -lX11 -lz -lm -lpthread
Zstd Compression

This test measures the time needed to compress/decompress a sample file (a FreeBSD disk image - FreeBSD-12.2-RELEASE-amd64-memstick.img) using Zstd compression with options for different compression levels / settings. Learn more via the OpenBenchmarking.org test page.

All Zstd Compression 1.5.0 results below are in MB/s, More Is Better. 1. (CC) gcc options: -O3 -march=native -flto -pthread -lz

Compression Level: 8 - Compression Speed
  Clang 12.0: 2748.1 (SE +/- 30.31, N = 5)
  GCC 11.1: 2611.0 (SE +/- 30.03, N = 15)

Compression Level: 8 - Decompression Speed
  Clang 12.0: 2996.7 (SE +/- 7.51, N = 5)
  GCC 11.1: 2959.3 (SE +/- 2.87, N = 15)

Compression Level: 19 - Compression Speed
  Clang 12.0: 81.5 (SE +/- 0.87, N = 5)
  GCC 11.1: 83.8 (SE +/- 0.39, N = 3)

Compression Level: 19 - Decompression Speed
  Clang 12.0: 2495.2 (SE +/- 13.91, N = 5)
  GCC 11.1: 2537.3 (SE +/- 0.57, N = 3)

Compression Level: 8, Long Mode - Compression Speed
  Clang 12.0: 830.0 (SE +/- 9.83, N = 3)
  GCC 11.1: 1040.6 (SE +/- 3.20, N = 3)

Compression Level: 8, Long Mode - Decompression Speed
  Clang 12.0: 3204.6 (SE +/- 4.90, N = 3)
  GCC 11.1: 3168.4 (SE +/- 6.63, N = 3)

Compression Level: 19, Long Mode - Compression Speed
  Clang 12.0: 46.2 (SE +/- 0.37, N = 15)
  GCC 11.1: 47.9 (SE +/- 0.48, N = 15)

Compression Level: 19, Long Mode - Decompression Speed
  Clang 12.0: 2632.3 (SE +/- 1.79, N = 15)
  GCC 11.1: 2670.8 (SE +/- 2.07, N = 15)
libjpeg-turbo tjbench

tjbench is a JPEG decompression/compression benchmark that is part of libjpeg-turbo, a JPEG image codec library optimized for SIMD instructions on modern CPU architectures. Learn more via the OpenBenchmarking.org test page.

libjpeg-turbo tjbench 2.1.0 - Test: Decompression Throughput (Megapixels/sec, More Is Better)
  Clang 12.0: 180.60 (SE +/- 0.47, N = 3)
  GCC 11.1: 174.23 (SE +/- 1.34, N = 3)
  1. (CC) gcc options: -O3 -march=native -flto -rdynamic -lm
Liquid-DSP

LiquidSDR's Liquid-DSP is a software-defined radio (SDR) digital signal processing library. This test profile runs a multi-threaded benchmark of this SDR/DSP library focused on embedded platform usage. Learn more via the OpenBenchmarking.org test page.

Liquid-DSP 2021.01.31 - Threads: 1 - Buffer Length: 256 - Filter Length: 57 (samples/s, More Is Better)
  Clang 12.0: 61840333 (SE +/- 5206.83, N = 3)
  GCC 11.1: 60985333 (SE +/- 49079.30, N = 3)
  1. (CC) gcc options: -O3 -march=native -flto -pthread -lm -lc -lliquid

Liquid-DSP 2021.01.31 - Threads: 160 - Buffer Length: 256 - Filter Length: 57 (samples/s, More Is Better)
  Clang 12.0: 3686466667 (SE +/- 34322846.29, N = 3)
  GCC 11.1: 3182866667 (SE +/- 22402480.02, N = 3)
  1. (CC) gcc options: -O3 -march=native -flto -pthread -lm -lc -lliquid
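Going from the 1-thread to the 160-thread run shows how far this workload scales on the 80-core / 160-thread system; a quick calculation from the figures above:

```python
# Thread scaling on the Liquid-DSP 256-buffer / 57-tap filter benchmark,
# computed from the 1-thread and 160-thread results above (samples/s).
for name, one, many in [("GCC 11.1", 60985333, 3182866667),
                        ("Clang 12.0", 61840333, 3686466667)]:
    print(f"{name}: {many / one:.1f}x speedup at 160 threads")
# GCC lands around 52x, Clang around 60x, i.e. well short of linear scaling.
```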
OpenSSL

OpenSSL is an open-source toolkit that implements SSL (Secure Sockets Layer) and TLS (Transport Layer Security) protocols. This test measures the RSA 4096-bit performance of OpenSSL. Learn more via the OpenBenchmarking.org test page.

OpenSSL 1.1.1 - RSA 4096-bit Performance (Signs Per Second, More Is Better)
  Clang 12.0: 11555.8 (SE +/- 10.32, N = 3; -Qunused-arguments)
  GCC 11.1: 17804.2 (SE +/- 44.60, N = 3)
  1. (CC) gcc options: -pthread -m64 -O3 -march=native -flto -lssl -lcrypto -ldl
Kripke

Kripke is a simple, scalable, 3D Sn deterministic particle transport code. Its primary purpose is to research how data layout, programming paradigms, and architectures affect the implementation and performance of Sn transport. Kripke is developed by LLNL. Learn more via the OpenBenchmarking.org test page.

Kripke 1.2.4 (Throughput FoM, More Is Better)
  Clang 12.0: 160155675 (SE +/- 1966921.21, N = 4; -fopenmp=libomp)
  GCC 11.1: 177613600 (SE +/- 1982388.01, N = 5; -fopenmp)
  1. (CXX) g++ options: -O3 -march=native -flto -O2
PostgreSQL pgbench

This is a benchmark of PostgreSQL using pgbench for facilitating the database benchmarks. Learn more via the OpenBenchmarking.org test page.

PostgreSQL pgbench 13.0 - Scaling Factor: 100 - Clients: 250 - Mode: Read Only (TPS, More Is Better)
  Clang 12.0: 943043 (SE +/- 8511.06, N = 3)
  GCC 11.1: 907401 (SE +/- 15989.65, N = 13)
  1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=native -flto -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm

PostgreSQL pgbench 13.0 - Scaling Factor: 100 - Clients: 250 - Mode: Read Write (TPS, More Is Better)
  Clang 12.0: 92576 (SE +/- 239.57, N = 3)
  GCC 11.1: 89425 (SE +/- 106.75, N = 3)
  1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=native -flto -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm
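pgbench also reports an average latency per transaction, which at a fixed client count follows from throughput (Little's law: latency is roughly clients divided by TPS). Deriving it from the read-only TPS figures above lands close to the averages reported in the combined results, roughly 0.277 ms for GCC and 0.265 ms for Clang:

```python
# Average latency follows from throughput at a fixed client count:
# latency ~= clients / TPS (Little's law). Read-only results above:
clients = 250
for name, tps in [("GCC 11.1", 907401), ("Clang 12.0", 943043)]:
    print(f"{name}: {clients / tps * 1000:.3f} ms per transaction")
```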
WebP Image Encode

This is a test of Google's libwebp with the cwebp image encode utility, using a sample 6000x4000 pixel JPEG image as the input. Learn more via the OpenBenchmarking.org test page.

All WebP Image Encode 1.1 results below are in Encode Time - Seconds, Fewer Is Better. 1. (CC) gcc options: -fvisibility=hidden -O3 -march=native -flto -pthread -lm -ljpeg

Encode Settings: Default
  Clang 12.0: 1.616 (SE +/- 0.003, N = 3)
  GCC 11.1: 1.638 (SE +/- 0.000, N = 3)

Encode Settings: Quality 100
  Clang 12.0: 2.692 (SE +/- 0.003, N = 3)
  GCC 11.1: 2.645 (SE +/- 0.001, N = 3)

Encode Settings: Quality 100, Lossless
  Clang 12.0: 20.27 (SE +/- 0.00, N = 3)
  GCC 11.1: 19.47 (SE +/- 0.01, N = 3)

Encode Settings: Quality 100, Highest Compression
  Clang 12.0: 7.422 (SE +/- 0.007, N = 3)
  GCC 11.1: 8.026 (SE +/- 0.007, N = 3)

Encode Settings: Quality 100, Lossless, Highest Compression
  Clang 12.0: 42.18 (SE +/- 0.02, N = 3)
  GCC 11.1: 40.91 (SE +/- 0.05, N = 3)
Caffe

This is a benchmark of the Caffe deep learning framework; it currently supports the AlexNet and GoogleNet models and execution on both CPUs and NVIDIA GPUs. Learn more via the OpenBenchmarking.org test page.

Caffe 2020-02-13 - Model: AlexNet - Acceleration: CPU - Iterations: 200 (Milli-Seconds, Fewer Is Better)
  Clang 12.0: 297554 (SE +/- 100.93, N = 3)
  GCC 11.1: 298291 (SE +/- 46.89, N = 3)
  1. (CXX) g++ options: -O3 -march=native -flto -fPIC -O2 -rdynamic -lboost_system -lboost_thread -lboost_filesystem -lboost_chrono -lboost_date_time -lboost_atomic -lglog -lgflags -lprotobuf -lpthread -lhdf5_cpp -lhdf5 -lhdf5_hl_cpp -lhdf5_hl -llmdb -lopenblas

Caffe 2020-02-13 - Model: GoogleNet - Acceleration: CPU - Iterations: 200 (Milli-Seconds, Fewer Is Better)
  Clang 12.0: 663282 (SE +/- 154.25, N = 3)
  GCC 11.1: 662408 (SE +/- 192.49, N = 3)
  1. (CXX) g++ options: -O3 -march=native -flto -fPIC -O2 -rdynamic -lboost_system -lboost_thread -lboost_filesystem -lboost_chrono -lboost_date_time -lboost_atomic -lglog -lgflags -lprotobuf -lpthread -lhdf5_cpp -lhdf5 -lhdf5_hl_cpp -lhdf5_hl -llmdb -lopenblas
oneDNN

This is a test of Intel oneDNN, an Intel-optimized library for Deep Neural Networks, making use of its built-in benchdnn functionality. The result is the total perf time reported. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and MKL-DNN before being rebranded as part of the Intel oneAPI initiative. Learn more via the OpenBenchmarking.org test page.
All oneDNN 2.1.2 results below are in ms, Fewer Is Better; the Clang builds used -fopenmp=libomp and the GCC builds -fopenmp. 1. (CXX) g++ options: -O3 -march=native -flto -std=c++11 -msse4.1 -fPIC -O2 -pie -lpthread -ldl

Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU
  Clang 12.0: 0.665355 (SE +/- 0.000134, N = 3; MIN: 0.61)
  GCC 11.1: 0.923876 (SE +/- 0.001474, N = 3; MIN: 0.85)

Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU
  Clang 12.0: 1.21726 (SE +/- 0.00153, N = 3; MIN: 1.19)
  GCC 11.1: 1.40268 (SE +/- 0.00224, N = 3; MIN: 1.36)

Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPU
  Clang 12.0: 1.00883 (SE +/- 0.00107, N = 3; MIN: 0.69)
  GCC 11.1: 1.23707 (SE +/- 0.00988, N = 9; MIN: 0.88)

Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU
  Clang 12.0: 0.325027 (SE +/- 0.001298, N = 3; MIN: 0.28)
  GCC 11.1: 0.439708 (SE +/- 0.000912, N = 3; MIN: 0.4)

Harness: IP Shapes 1D - Data Type: bf16bf16bf16 - Engine: CPU
  Clang 12.0: 2.76681 (SE +/- 0.00228, N = 3; MIN: 2.62)
  GCC 11.1: 3.00252 (SE +/- 0.00315, N = 3; MIN: 2.87)

Harness: IP Shapes 3D - Data Type: bf16bf16bf16 - Engine: CPU
  Clang 12.0: 1.70567 (SE +/- 0.00299, N = 3; MIN: 1.54)
  GCC 11.1: 1.81351 (SE +/- 0.00150, N = 3; MIN: 1.68)

Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU
  Clang 12.0: 1.41167 (SE +/- 0.00418, N = 3; MIN: 1.26)
  GCC 11.1: 1.45930 (SE +/- 0.00729, N = 3; MIN: 1.28)

Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU
  Clang 12.0: 0.842984 (SE +/- 0.000915, N = 3; MIN: 0.77)
  GCC 11.1: 0.840383 (SE +/- 0.001374, N = 3; MIN: 0.8)

Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU
  Clang 12.0: 0.902213 (SE +/- 0.002259, N = 3; MIN: 0.85)
  GCC 11.1: 0.969972 (SE +/- 0.007538, N = 3; MIN: 0.89)

Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU
  Clang 12.0: 0.200917 (SE +/- 0.000390, N = 3; MIN: 0.17)
  GCC 11.1: 0.361691 (SE +/- 0.001289, N = 3; MIN: 0.32)

Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU
  Clang 12.0: 0.175060 (SE +/- 0.000222, N = 3; MIN: 0.16)
  GCC 11.1: 0.194587 (SE +/- 0.000672, N = 3; MIN: 0.18)

Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU
  Clang 12.0: 589.28 (SE +/- 2.16, N = 3; MIN: 555.66)
  GCC 11.1: 686.19 (SE +/- 3.66, N = 3; MIN: 652.78)

Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU
  Clang 12.0: 352.47 (SE +/- 0.21, N = 3; MIN: 333.78)
  GCC 11.1: 448.25 (SE +/- 3.11, N = 3; MIN: 427.18)
OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.1.2 Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU Clang 12.0 GCC 11.1 150 300 450 600 750 SE +/- 6.09, N = 5 SE +/- 2.28, N = 3 594.11 686.05 -fopenmp=libomp - MIN: 556.85 -fopenmp - MIN: 656.34 1. (CXX) g++ options: -O3 -march=native -flto -std=c++11 -msse4.1 -fPIC -O2 -pie -lpthread -ldl
OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.1.2 Harness: Convolution Batch Shapes Auto - Data Type: bf16bf16bf16 - Engine: CPU Clang 12.0 GCC 11.1 0.4708 0.9416 1.4124 1.8832 2.354 SE +/- 0.00039, N = 3 SE +/- 0.00148, N = 3 2.06629 2.09246 -fopenmp=libomp - MIN: 1.99 -fopenmp - MIN: 2.04 1. (CXX) g++ options: -O3 -march=native -flto -std=c++11 -msse4.1 -fPIC -O2 -pie -lpthread -ldl
OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.1.2 Harness: Deconvolution Batch shapes_1d - Data Type: bf16bf16bf16 - Engine: CPU Clang 12.0 GCC 11.1 0.7376 1.4752 2.2128 2.9504 3.688 SE +/- 0.00123, N = 3 SE +/- 0.00172, N = 3 2.86766 3.27829 -fopenmp=libomp - MIN: 2.67 -fopenmp - MIN: 3.11 1. (CXX) g++ options: -O3 -march=native -flto -std=c++11 -msse4.1 -fPIC -O2 -pie -lpthread -ldl
OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.1.2 Harness: Deconvolution Batch shapes_3d - Data Type: bf16bf16bf16 - Engine: CPU Clang 12.0 GCC 11.1 0.8039 1.6078 2.4117 3.2156 4.0195 SE +/- 0.00202, N = 3 SE +/- 0.00574, N = 3 3.57288 3.57127 -fopenmp=libomp - MIN: 3.48 -fopenmp - MIN: 3.5 1. (CXX) g++ options: -O3 -march=native -flto -std=c++11 -msse4.1 -fPIC -O2 -pie -lpthread -ldl
OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.1.2 Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU Clang 12.0 GCC 11.1 100 200 300 400 500 SE +/- 1.62, N = 3 SE +/- 1.07, N = 3 352.59 445.26 -fopenmp=libomp - MIN: 334.39 -fopenmp - MIN: 427.38 1. (CXX) g++ options: -O3 -march=native -flto -std=c++11 -msse4.1 -fPIC -O2 -pie -lpthread -ldl
OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.1.2 Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU Clang 12.0 GCC 11.1 0.0557 0.1114 0.1671 0.2228 0.2785 SE +/- 0.000879, N = 3 SE +/- 0.000532, N = 3 0.160017 0.247554 -fopenmp=libomp - MIN: 0.14 -fopenmp - MIN: 0.23 1. (CXX) g++ options: -O3 -march=native -flto -std=c++11 -msse4.1 -fPIC -O2 -pie -lpthread -ldl
OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.1.2 Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU Clang 12.0 GCC 11.1 150 300 450 600 750 SE +/- 0.97, N = 3 SE +/- 2.54, N = 3 584.54 677.71 -fopenmp=libomp - MIN: 556.26 -fopenmp - MIN: 648.93 1. (CXX) g++ options: -O3 -march=native -flto -std=c++11 -msse4.1 -fPIC -O2 -pie -lpthread -ldl
OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.1.2 Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU Clang 12.0 GCC 11.1 100 200 300 400 500 SE +/- 2.41, N = 3 SE +/- 1.02, N = 3 353.58 444.23 -fopenmp=libomp - MIN: 333.93 -fopenmp - MIN: 427.49 1. (CXX) g++ options: -O3 -march=native -flto -std=c++11 -msse4.1 -fPIC -O2 -pie -lpthread -ldl
OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.1.2 Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32 - Engine: CPU Clang 12.0 GCC 11.1 0.0495 0.099 0.1485 0.198 0.2475 SE +/- 0.000357, N = 3 SE +/- 0.000536, N = 3 0.120406 0.219899 -fopenmp=libomp - MIN: 0.11 -fopenmp - MIN: 0.19 1. (CXX) g++ options: -O3 -march=native -flto -std=c++11 -msse4.1 -fPIC -O2 -pie -lpthread -ldl
OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.1.2 Harness: Matrix Multiply Batch Shapes Transformer - Data Type: bf16bf16bf16 - Engine: CPU Clang 12.0 GCC 11.1 0.1353 0.2706 0.4059 0.5412 0.6765 SE +/- 0.001561, N = 3 SE +/- 0.000449, N = 3 0.515284 0.601530 -fopenmp=libomp - MIN: 0.48 -fopenmp - MIN: 0.56 1. (CXX) g++ options: -O3 -march=native -flto -std=c++11 -msse4.1 -fPIC -O2 -pie -lpthread -ldl
PostgreSQL pgbench This is a benchmark of PostgreSQL using pgbench to drive the database workloads. Learn more via the OpenBenchmarking.org test page.
PostgreSQL pgbench 13.0 - Scaling Factor: 100 - Clients: 250 - Average Latency (ms; fewer is better)
Read Only: Clang 12.0: 0.265 (SE +/- 0.002, N = 3) | GCC 11.1: 0.277 (SE +/- 0.005, N = 13)
Read Write: Clang 12.0: 2.702 (SE +/- 0.007, N = 3) | GCC 11.1: 2.797 (SE +/- 0.003, N = 3)
1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=native -flto -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm
NCNN NCNN is a high-performance neural network inference framework developed by Tencent and optimized for mobile and other platforms. Learn more via the OpenBenchmarking.org test page.
NCNN 20201218 - Target: CPU (ms; fewer is better)
Per-build notes: Clang 12.0 linked against -lomp, GCC 11.1 against -lgomp.
Model: Clang 12.0 | GCC 11.1
mobilenet: 14.21 (SE +/- 0.41, N = 13, MIN 12.25, MAX 44.36) | 19.40 (SE +/- 0.19, N = 12, MIN 18, MAX 399.43)
mobilenet-v2 (CPU-v2-v2): 4.95 (SE +/- 0.07, N = 13, MIN 4.07, MAX 22.91) | 9.80 (SE +/- 0.05, N = 12, MIN 9.23, MAX 38.71)
mobilenet-v3 (CPU-v3-v3): 4.22 (SE +/- 0.04, N = 13, MIN 3.66, MAX 22.54) | 9.56 (SE +/- 0.09, N = 12, MIN 8.9, MAX 68.78)
shufflenet-v2: 5.90 (SE +/- 0.37, N = 13, MIN 4.8, MAX 33.78) | 10.55 (SE +/- 0.08, N = 12, MIN 10, MAX 25.64)
mnasnet: 5.26 (SE +/- 0.48, N = 13, MIN 3.73, MAX 46.61) | 9.43 (SE +/- 0.05, N = 11, MIN 8.91, MAX 24.89)
efficientnet-b0: 7.72 (SE +/- 0.72, N = 13, MIN 5.61, MAX 59.7) | 12.48 (SE +/- 0.15, N = 12, MIN 11.37, MAX 205.28)
blazeface: 2.38 (SE +/- 0.02, N = 13, MIN 2.14, MAX 10.57) | 6.15 (SE +/- 0.05, N = 12, MIN 5.69, MAX 22.06)
googlenet: 15.34 (SE +/- 0.78, N = 13, MIN 12.78, MAX 96.28) | 19.46 (SE +/- 0.19, N = 12, MIN 17.96, MAX 72.73)
vgg16: 27.34 (SE +/- 0.51, N = 13, MIN 21.3, MAX 132.89) | 25.34 (SE +/- 0.17, N = 12, MIN 23.97, MAX 98.11)
resnet18: 10.83 (SE +/- 0.49, N = 13, MIN 8.67, MAX 27.15) | 11.10 (SE +/- 0.11, N = 12, MIN 10.39, MAX 64.83)
yolov4-tiny: 25.28 (SE +/- 0.25, N = 13, MIN 23.29, MAX 148.87) | 23.12 (SE +/- 0.18, N = 12, MIN 21.63, MAX 253.04)
squeezenet_ssd: 17.96 (SE +/- 0.87, N = 13, MIN 14.31, MAX 155.2) | 21.18 (SE +/- 0.10, N = 12, MIN 20.25, MAX 100.94)
regnety_400m: 26.47 (SE +/- 1.81, N = 13, MIN 20.72, MAX 164.27) | 94.38 (SE +/- 1.10, N = 12, MIN 87.04, MAX 668.91)
1. (CXX) g++ options: -O3 -march=native -flto -O2 -rdynamic -lpthread
TNN TNN is an open-source deep learning inference framework developed by Tencent. Learn more via the OpenBenchmarking.org test page.
TNN 0.2.3 - Target: CPU (ms; fewer is better)
Per-build notes: Clang 12.0 used -fopenmp=libomp, GCC 11.1 used -fopenmp.
MobileNet v2: Clang 12.0: 552.03 (SE +/- 1.30, N = 3, MIN 544.23, MAX 587.28) | GCC 11.1: 376.74 (SE +/- 2.55, N = 3, MIN 373.07, MAX 547.54)
SqueezeNet v1.1: Clang 12.0: 402.91 (SE +/- 0.02, N = 3, MIN 402.36, MAX 405.48) | GCC 11.1: 377.43 (SE +/- 0.01, N = 3, MIN 377.31, MAX 377.64)
1. (CXX) g++ options: -O3 -march=native -flto -pthread -fvisibility=hidden -O2 -rdynamic -ldl
Timed MrBayes Analysis This test performs a Bayesian analysis of a set of primate genome sequences in order to estimate their phylogeny. Learn more via the OpenBenchmarking.org test page.
Timed MrBayes Analysis 3.2.7 - Primate Phylogeny Analysis (Seconds; fewer is better)
Clang 12.0: 138.55 (SE +/- 0.91, N = 3) | GCC 11.1: 142.20 (SE +/- 0.46, N = 3)
-mabm
1. (CC) gcc options: -mmmx -msse -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msha -maes -mavx -mfma -mavx2 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi -mrdrnd -mbmi -mbmi2 -madx -O3 -std=c99 -pedantic -march=native -flto -lm
C-Ray This is a test of C-Ray, a simple raytracer designed to test floating-point CPU performance. This test is multi-threaded (16 threads per core); by default it shoots 8 rays per pixel for anti-aliasing and generates a 1600 x 1200 image, while this run targets 4K with 16 rays per pixel. Learn more via the OpenBenchmarking.org test page.
C-Ray 1.1 - Total Time - 4K, 16 Rays Per Pixel (Seconds; fewer is better)
Clang 12.0: 15.352 (SE +/- 0.018, N = 3) | GCC 11.1: 7.794 (SE +/- 0.006, N = 3)
1. (CC) gcc options: -lm -lpthread -O3 -march=native -flto
Primesieve Primesieve generates prime numbers using a highly optimized sieve of Eratosthenes implementation. Primesieve benchmarks the CPU's L1/L2 cache performance. Learn more via the OpenBenchmarking.org test page.
Primesieve 7.4 - 1e12 Prime Number Generation (Seconds; fewer is better)
Clang 12.0: 3.830 (SE +/- 0.003, N = 3) | GCC 11.1: 3.780 (SE +/- 0.006, N = 3)
1. (CXX) g++ options: -O3 -march=native -flto -O2 -lpthread
Bullet Physics Engine 2.81 (Seconds; fewer is better)
1000 Stack: Clang 12.0: 5.410511 (SE +/- 0.002175, N = 3) | GCC 11.1: 4.451765 (SE +/- 0.003456, N = 3)
1000 Convex: Clang 12.0: 4.741088 (SE +/- 0.003944, N = 3) | GCC 11.1: 4.340600 (SE +/- 0.003221, N = 3)
136 Ragdolls: Clang 12.0: 2.798322 (SE +/- 0.000181, N = 3) | GCC 11.1: 2.553303 (SE +/- 0.004472, N = 3)
Prim Trimesh: Clang 12.0: 0.940903 (SE +/- 0.000250, N = 3) | GCC 11.1: 0.862000 (SE +/- 0.002011, N = 3)
Convex Trimesh: Clang 12.0: 1.148367 (SE +/- 0.001342, N = 3) | GCC 11.1: 1.054273 (SE +/- 0.001326, N = 3)
-lglut -lGL -lGLU
1. (CXX) g++ options: -O3 -march=native -flto -O2 -rdynamic
LAME MP3 Encoding LAME is an MP3 encoder licensed under the LGPL. This test measures the time required to encode a WAV file to MP3 format. Learn more via the OpenBenchmarking.org test page.
LAME MP3 Encoding 3.100 - WAV To MP3 (Seconds; fewer is better)
Clang 12.0: 9.591 (SE +/- 0.003, N = 3) | GCC 11.1: 8.619 (SE +/- 0.004, N = 3)
-ffast-math -funroll-loops -fschedule-insns2 -fbranch-count-reg -fforce-addr
1. (CC) gcc options: -O3 -pipe -march=native -flto -lncurses -lm
Opus Codec Encoding Opus is an open, lossy audio codec designed primarily for interactive real-time applications over the Internet. This test uses Opus-Tools and measures the time required to encode a WAV file to Opus. Learn more via the OpenBenchmarking.org test page.
Opus Codec Encoding 1.3.1 - WAV To Opus Encode (Seconds; fewer is better)
Clang 12.0: 9.788 (SE +/- 0.008, N = 5) | GCC 11.1: 8.768 (SE +/- 0.003, N = 5)
-fvisibility=hidden
1. (CXX) g++ options: -O3 -march=native -flto -logg -lm
eSpeak-NG Speech Engine This test times how long it takes the eSpeak speech synthesizer to read Project Gutenberg's The Outline of Science and output to a WAV file. This test profile is now tracking the eSpeak-NG version of eSpeak. Learn more via the OpenBenchmarking.org test page.
eSpeak-NG Speech Engine 20200907 - Text-To-Speech Synthesis (Seconds; fewer is better)
Clang 12.0: 29.81 (SE +/- 0.02, N = 4) | GCC 11.1: 30.51 (SE +/- 0.09, N = 4)
1. (CC) gcc options: -O3 -march=native -flto -std=c99 -lpthread -lm
Gcrypt Library Libgcrypt is a general-purpose cryptographic library developed as part of the GnuPG project. This test runs libgcrypt's integrated benchmark command with the cipher/MAC/hash repetition count set to 50, as a simple, high-level look at the overall crypto performance of the system under test. Learn more via the OpenBenchmarking.org test page.
Gcrypt Library 1.9 (Seconds; fewer is better)
Clang 12.0: 253.86 (SE +/- 0.78, N = 3) | GCC 11.1: 265.17 (SE +/- 0.95, N = 3)
1. (CC) gcc options: -O3 -march=native -flto -fvisibility=hidden -lgpg-error
WebP2 Image Encode This is a test of Google's libwebp2 library with the WebP2 image encode utility, using a sample 6000x4000 pixel JPEG image as the input, similar to the WebP/libwebp test profile. WebP2 is currently experimental and under heavy development as ultimately the successor to WebP. Compared to WebP, WebP2 supports 10-bit HDR, more efficient lossy compression, improved lossless compression, animation support, and full multi-threading. Learn more via the OpenBenchmarking.org test page.
WebP2 Image Encode 20210126 (Seconds; fewer is better)
Encode Settings: Clang 12.0 | GCC 11.1
Default: 2.492 (SE +/- 0.009, N = 3) | 2.644 (SE +/- 0.035, N = 3)
Quality 75, Compression Effort 7: 99.78 (SE +/- 0.04, N = 3) | 106.66 (SE +/- 0.01, N = 3)
Quality 95, Compression Effort 7: 182.29 (SE +/- 0.01, N = 3) | 196.49 (SE +/- 0.01, N = 3)
Quality 100, Compression Effort 5: 6.495 (SE +/- 0.013, N = 3) | 5.765 (SE +/- 0.004, N = 3)
Quality 100, Lossless Compression: 400.62 (SE +/- 0.16, N = 3) | 389.07 (SE +/- 0.07, N = 3)
1. (CXX) g++ options: -O3 -march=native -flto -fno-rtti -O2 -rdynamic -lpthread -ljpeg
ASTC Encoder ASTC Encoder (astcenc) is for the Adaptive Scalable Texture Compression (ASTC) format commonly used with the OpenGL, OpenGL ES, and Vulkan graphics APIs. This test profile performs a coding test covering both compression and decompression. Learn more via the OpenBenchmarking.org test page.
ASTC Encoder 2.4 (Seconds; fewer is better)
Preset: Medium: Clang 12.0: 4.5120 (SE +/- 0.0037, N = 3) | GCC 11.1: 6.4270 (SE +/- 0.0021, N = 3)
Preset: Thorough: Clang 12.0: 6.9380 (SE +/- 0.0107, N = 3) | GCC 11.1: 9.4219 (SE +/- 0.0219, N = 3)
Preset: Exhaustive: Clang 12.0: 14.01 (SE +/- 0.01, N = 3) | GCC 11.1: 16.39 (SE +/- 0.01, N = 3)
1. (CXX) g++ options: -O3 -march=native -flto -O2 -pthread
GCC 11.1: Testing initiated at 28 May 2021 11:02 by user .
Clang 12.0: Testing initiated at 28 May 2021 19:40 by user .