Xeon Platinum 8380 compiler benchmarks by Michael Larabel looking at GCC 11 against LLVM Clang 12 for some initial holiday weekend tests...
GCC 11.1
Processor: 2 x Intel Xeon Platinum 8380 @ 3.40GHz (80 Cores / 160 Threads), Motherboard: Intel M50CYP2SB2U (SE5C6200.86B.0022.D08.2103221623 BIOS), Chipset: Intel Device 0998, Memory: 16 x 32 GB DDR4-3200MT/s Hynix HMA84GR7CJR4N-XN, Disk: 800GB INTEL SSDPF21Q800GB, Graphics: ASPEED, Network: 2 x Intel X710 for 10GBASE-T + 2 x Intel E810-C for QSFP
OS: Fedora 34, Kernel: 5.12.6-300.fc34.x86_64 (x86_64), Compiler: GCC 11.1.1 20210428, File-System: xfs, Screen Resolution: 1024x768
Kernel Notes: Transparent Huge Pages: madvise
Environment Notes: CXXFLAGS="-O3 -march=native -flto" CFLAGS="-O3 -march=native -flto"
Compiler Notes: --build=x86_64-redhat-linux --disable-libunwind-exceptions --enable-__cxa_atexit --enable-bootstrap --enable-cet --enable-checking=release --enable-gnu-indirect-function --enable-gnu-unique-object --enable-initfini-array --enable-languages=c,c++,fortran,objc,obj-c++,ada,go,d,lto --enable-multilib --enable-offload-targets=nvptx-none --enable-plugin --enable-shared --enable-threads=posix --mandir=/usr/share/man --with-arch_32=i686 --with-gcc-major-version-only --with-linker-hash-style=gnu --with-tune=generic --without-cuda-driver
Processor Notes: Scaling Governor: intel_pstate performance - CPU Microcode: 0xd000270
Python Notes: Python 3.9.5
Security Notes: SELinux + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced IBRS IBPB: conditional RSB filling + srbds: Not affected + tsx_async_abort: Not affected
Clang 12.0
OS: Fedora 34, Kernel: 5.12.6-300.fc34.x86_64 (x86_64), Compiler: Clang 12.0.0, File-System: xfs, Screen Resolution: 1024x768
Kernel Notes: Transparent Huge Pages: madvise
Environment Notes: CXXFLAGS="-O3 -march=native -flto" CFLAGS="-O3 -march=native -flto"
Processor Notes: Scaling Governor: intel_pstate performance - CPU Microcode: 0xd000270
Python Notes: Python 3.9.5
Security Notes: SELinux + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced IBRS IBPB: conditional RSB filling + srbds: Not affected + tsx_async_abort: Not affected
GCC 11 vs. LLVM Clang 12 Benchmarks On Xeon Ice Lake (OpenBenchmarking.org, Phoronix Test Suite)
[Chart: GCC 11.1 vs. Clang 12.0 Comparison, Phoronix Test Suite. Bars show the relative difference between the two compilers for each test, from 256.6% (NCNN, CPU - regnety_400m) down to 2%.]
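The percentages in the comparison chart appear to be the ratio of the larger result to the smaller one, minus one. The Python sketch below reproduces three of them from values reported in this article. The helper name `percent_gap` is hypothetical and not part of the Phoronix Test Suite, and the formula is inferred from the numbers rather than taken from its documentation.

```python
def percent_gap(a: float, b: float) -> float:
    """Return how much larger the bigger value is than the smaller one, in percent."""
    low, high = sorted((a, b))
    return (high / low - 1.0) * 100.0


# (GCC 11.1, Clang 12.0) results copied from this article
results = {
    "ncnn: CPU - regnety_400m (ms)": (94.38, 26.47),
    "ncnn: CPU - blazeface (ms)": (6.15, 2.38),
    "c-ray: Total Time - 4K, 16 Rays Per Pixel (s)": (7.794, 15.352),
}

for name, (gcc, clang) in results.items():
    print(f"{name}: {percent_gap(gcc, clang):.1f}%")
```

The gap alone does not say which compiler is ahead; that depends on whether the test is "Fewer Is Better" or "More Is Better".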
Result overview (OpenBenchmarking.org). A dash marks a test with no Clang 12.0 value in the source.
Test | GCC 11.1 | Clang 12.0
aobench: 2048 x 2048 - Total Time | 33.881 | 36.075
astcenc: Medium | 6.4270 | 4.5120
astcenc: Thorough | 9.4219 | 6.9380
astcenc: Exhaustive | 16.3885 | 14.0133
bullet: 3000 Fall | 3.873060 | 4.301823
bullet: 1000 Stack | 4.451765 | 5.410511
bullet: 1000 Convex | 4.34060 | 4.741088
bullet: 136 Ragdolls | 2.553303 | 2.798322
bullet: Prim Trimesh | 0.86200 | 0.940903
bullet: Convex Trimesh | 1.054273 | 1.148367
c-ray: Total Time - 4K, 16 Rays Per Pixel | 7.794 | 15.352
caffe: AlexNet - CPU - 200 | 298291 | 297554
caffe: GoogleNet - CPU - 200 | 662408 | 663282
coremark: CoreMark Size 666 - Iterations Per Second | 2522898.568222 | 2130829.931753
cryptopp: Unkeyed Algorithms | 359.900917 | 388.572346
daphne: OpenMP - NDT Mapping | 1046.60 | -
daphne: OpenMP - Points2Image | 14507.806103537 | -
daphne: OpenMP - Euclidean Cluster | 1013.99 | -
espeak: Text-To-Speech Synthesis | 30.511 | 29.806
encode-flac: WAV To FLAC | 9.382 | 9.474
gcrypt | 265.172 | 253.856
gmpbench: Total Time | 3871.6 | -
graphics-magick: Rotate | 745 | 745
graphics-magick: Sharpen | 898 | 769
graphics-magick: Enhanced | 1315 | 1141
graphics-magick: Resizing | 380 | 614
himeno: Poisson Pressure Solver | 4651.527908 | 4161.653514
kripke | 177613600 | 160155675
kvazaar: Bosphorus 4K - Very Fast | 42.31 | 44.72
kvazaar: Bosphorus 4K - Ultra Fast | 46.58 | 47.96
kvazaar: Bosphorus 1080p - Very Fast | 159.71 | 166.00
kvazaar: Bosphorus 1080p - Ultra Fast | 176.66 | 183.97
encode-mp3: WAV To MP3 | 8.619 | 9.591
tjbench: Decompression Throughput | 174.229617 | 180.596292
liquid-dsp: 1 - 256 - 57 | 60985333 | 61840333
liquid-dsp: 160 - 256 - 57 | 3182866667 | 3686466667
ncnn: CPU - mobilenet | 19.40 | 14.21
ncnn: CPU-v2-v2 - mobilenet-v2 | 9.80 | 4.95
ncnn: CPU-v3-v3 - mobilenet-v3 | 9.56 | 4.22
ncnn: CPU - shufflenet-v2 | 10.55 | 5.90
ncnn: CPU - mnasnet | 9.43 | 5.26
ncnn: CPU - efficientnet-b0 | 12.48 | 7.72
ncnn: CPU - blazeface | 6.15 | 2.38
ncnn: CPU - googlenet | 19.46 | 15.34
ncnn: CPU - vgg16 | 25.34 | 27.34
ncnn: CPU - resnet18 | 11.10 | 10.83
ncnn: CPU - yolov4-tiny | 23.12 | 25.28
ncnn: CPU - squeezenet_ssd | 21.18 | 17.96
ncnn: CPU - regnety_400m | 94.38 | 26.47
onednn: IP Shapes 1D - f32 - CPU | 0.923876 | 0.665355
onednn: IP Shapes 3D - f32 - CPU | 1.40268 | 1.21726
onednn: IP Shapes 1D - u8s8f32 - CPU | 1.23707 | 1.00883
onednn: IP Shapes 3D - u8s8f32 - CPU | 0.439708 | 0.325027
onednn: IP Shapes 1D - bf16bf16bf16 - CPU | 3.00252 | 2.76681
onednn: IP Shapes 3D - bf16bf16bf16 - CPU | 1.81351 | 1.70567
onednn: Convolution Batch Shapes Auto - f32 - CPU | 1.45930 | 1.41167
onednn: Deconvolution Batch shapes_3d - f32 - CPU | 0.840383 | 0.842984
onednn: Convolution Batch Shapes Auto - u8s8f32 - CPU | 0.969972 | 0.902213
onednn: Deconvolution Batch shapes_1d - u8s8f32 - CPU | 0.361691 | 0.200917
onednn: Deconvolution Batch shapes_3d - u8s8f32 - CPU | 0.194587 | 0.175060
onednn: Recurrent Neural Network Training - f32 - CPU | 686.190 | 589.283
onednn: Recurrent Neural Network Inference - f32 - CPU | 448.247 | 352.474
onednn: Recurrent Neural Network Training - u8s8f32 - CPU | 686.045 | 594.109
onednn: Convolution Batch Shapes Auto - bf16bf16bf16 - CPU | 2.09246 | 2.06629
onednn: Deconvolution Batch shapes_1d - bf16bf16bf16 - CPU | 3.27829 | 2.86766
onednn: Deconvolution Batch shapes_3d - bf16bf16bf16 - CPU | 3.57127 | 3.57288
onednn: Recurrent Neural Network Inference - u8s8f32 - CPU | 445.263 | 352.588
onednn: Matrix Multiply Batch Shapes Transformer - f32 - CPU | 0.247554 | 0.160017
onednn: Recurrent Neural Network Training - bf16bf16bf16 - CPU | 677.708 | 584.535
onednn: Recurrent Neural Network Inference - bf16bf16bf16 - CPU | 444.228 | 353.576
onednn: Matrix Multiply Batch Shapes Transformer - u8s8f32 - CPU | 0.219899 | 0.120406
onednn: Matrix Multiply Batch Shapes Transformer - bf16bf16bf16 - CPU | 0.601530 | 0.515284
openssl: RSA 4096-bit Performance | 17804.2 | 11555.8
encode-opus: WAV To Opus Encode | 8.768 | 9.788
pgbench: 100 - 250 - Read Only | 907401 | 943043
pgbench: 100 - 250 - Read Only - Average Latency | 0.277 | 0.265
pgbench: 100 - 250 - Read Write | 89425 | 92576
pgbench: 100 - 250 - Read Write - Average Latency | 2.797 | 2.702
primesieve: 1e12 Prime Number Generation | 3.780 | 3.830
svt-av1: Preset 4 - Bosphorus 4K | 4.213 | 4.333
svt-av1: Preset 8 - Bosphorus 4K | 55.190 | 56.273
svt-av1: Preset 4 - Bosphorus 1080p | 8.996 | 9.418
svt-av1: Preset 8 - Bosphorus 1080p | 167.388 | 169.606
svt-hevc: 1 - Bosphorus 1080p | 39.59 | 41.96
svt-hevc: 7 - Bosphorus 1080p | 336.37 | 355.18
svt-hevc: 10 - Bosphorus 1080p | 609.56 | 608.95
svt-vp9: VMAF Optimized - Bosphorus 1080p | 476.85 | 467.57
svt-vp9: PSNR/SSIM Optimized - Bosphorus 1080p | 477.90 | 466.06
svt-vp9: Visual Quality Optimized - Bosphorus 1080p | 393.17 | 379.93
mrbayes: Primate Phylogeny Analysis | 142.197 | 138.549
tnn: CPU - MobileNet v2 | 376.741 | 552.028
tnn: CPU - SqueezeNet v1.1 | 377.430 | 402.914
encode-wavpack: WAV To WavPack | 17.360 | 17.343
webp: Default | 1.638 | 1.616
webp: Quality 100 | 2.645 | 2.692
webp: Quality 100, Lossless | 19.473 | 20.267
webp: Quality 100, Highest Compression | 8.026 | 7.422
webp: Quality 100, Lossless, Highest Compression | 40.912 | 42.184
webp2: Default | 2.644 | 2.492
webp2: Quality 75, Compression Effort 7 | 106.658 | 99.777
webp2: Quality 95, Compression Effort 7 | 196.486 | 182.290
webp2: Quality 100, Compression Effort 5 | 5.765 | 6.495
webp2: Quality 100, Lossless Compression | 389.070 | 400.617
x265: Bosphorus 4K | 26.92 | 29.13
x265: Bosphorus 1080p | 76.88 | 77.24
compress-zstd: 8 - Compression Speed | 2611.0 | 2748.1
compress-zstd: 8 - Decompression Speed | 2959.3 | 2996.7
compress-zstd: 19 - Compression Speed | 83.8 | 81.5
compress-zstd: 19 - Decompression Speed | 2537.3 | 2495.2
compress-zstd: 8, Long Mode - Compression Speed | 1040.6 | 830.0
compress-zstd: 8, Long Mode - Decompression Speed | 3168.4 | 3204.6
compress-zstd: 19, Long Mode - Compression Speed | 47.9 | 46.2
compress-zstd: 19, Long Mode - Decompression Speed | 2670.8 | 2632.3
ASTC Encoder
ASTC Encoder (astcenc) is for the Adaptive Scalable Texture Compression (ASTC) format commonly used with OpenGL, OpenGL ES, and Vulkan graphics APIs. This test profile does a coding test of both compression/decompression.
ASTC Encoder 2.4 (Seconds, Fewer Is Better)
Preset: Medium. GCC 11.1: 6.4270 (SE +/- 0.0021, N = 3); Clang 12.0: 4.5120 (SE +/- 0.0037, N = 3)
Preset: Thorough. GCC 11.1: 9.4219 (SE +/- 0.0219, N = 3); Clang 12.0: 6.9380 (SE +/- 0.0107, N = 3)
Preset: Exhaustive. GCC 11.1: 16.39 (SE +/- 0.01, N = 3); Clang 12.0: 14.01 (SE +/- 0.01, N = 3)
(CXX) g++ options: -O3 -march=native -flto -O2 -pthread
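Each result is reported with "SE +/- x, N = n". The article does not spell out how SE is derived. Assuming it is the conventional standard error of the mean (sample standard deviation divided by the square root of N), it could be computed as in the sketch below. The three run times are made-up placeholders, not the actual per-run samples, and `standard_error` is a hypothetical helper name.

```python
import math
import statistics


def standard_error(samples):
    """Standard error of the mean: sample standard deviation / sqrt(N)."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    return statistics.stdev(samples) / math.sqrt(len(samples))


# Hypothetical run times in seconds (placeholders, not measured data)
runs = [6.42, 6.43, 6.43]
mean = statistics.mean(runs)
se = standard_error(runs)
print(f"mean = {mean:.4f} s, SE +/- {se:.4f}, N = {len(runs)}")
```

A small SE relative to the mean, as in the ASTC Encoder results, suggests the runs were tightly clustered.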
Bullet Physics Engine 2.81 (Seconds, Fewer Is Better)
Test: 1000 Stack. GCC 11.1: 4.451765 (SE +/- 0.003456, N = 3); Clang 12.0: 5.410511 (SE +/- 0.002175, N = 3)
Test: 1000 Convex. GCC 11.1: 4.340600 (SE +/- 0.003221, N = 3); Clang 12.0: 4.741088 (SE +/- 0.003944, N = 3)
Test: 136 Ragdolls. GCC 11.1: 2.553303 (SE +/- 0.004472, N = 3); Clang 12.0: 2.798322 (SE +/- 0.000181, N = 3)
Test: Prim Trimesh. GCC 11.1: 0.862000 (SE +/- 0.002011, N = 3); Clang 12.0: 0.940903 (SE +/- 0.000250, N = 3)
Test: Convex Trimesh. GCC 11.1: 1.054273 (SE +/- 0.001326, N = 3); Clang 12.0: 1.148367 (SE +/- 0.001342, N = 3)
(CXX) g++ options: -O3 -march=native -flto -O2 -rdynamic. Additional options listed beside the results: -lglut -lGL -lGLU
C-Ray
This is a test of C-Ray, a simple raytracer designed to test floating-point CPU performance. This test is multi-threaded (16 threads per core). In the configuration reported here it shoots 16 rays per pixel for anti-aliasing and generates a 4K image.
C-Ray 1.1 (Seconds, Fewer Is Better)
Total Time - 4K, 16 Rays Per Pixel. GCC 11.1: 7.794 (SE +/- 0.006, N = 3); Clang 12.0: 15.352 (SE +/- 0.018, N = 3)
(CC) gcc options: -lm -lpthread -O3 -march=native -flto
Caffe
This is a benchmark of the Caffe deep learning framework. It currently supports the AlexNet and GoogleNet models and execution on both CPUs and NVIDIA GPUs.
Caffe 2020-02-13 (Milli-Seconds, Fewer Is Better)
Model: AlexNet - Acceleration: CPU - Iterations: 200. GCC 11.1: 298291 (SE +/- 46.89, N = 3); Clang 12.0: 297554 (SE +/- 100.93, N = 3)
Model: GoogleNet - Acceleration: CPU - Iterations: 200. GCC 11.1: 662408 (SE +/- 192.49, N = 3); Clang 12.0: 663282 (SE +/- 154.25, N = 3)
(CXX) g++ options: -O3 -march=native -flto -fPIC -O2 -rdynamic -lboost_system -lboost_thread -lboost_filesystem -lboost_chrono -lboost_date_time -lboost_atomic -lglog -lgflags -lprotobuf -lpthread -lhdf5_cpp -lhdf5 -lhdf5_hl_cpp -lhdf5_hl -llmdb -lopenblas
eSpeak-NG Speech Engine
This test times how long it takes the eSpeak speech synthesizer to read Project Gutenberg's The Outline of Science and output to a WAV file. This test profile is now tracking the eSpeak-NG version of eSpeak.
eSpeak-NG Speech Engine 20200907 (Seconds, Fewer Is Better)
Text-To-Speech Synthesis. GCC 11.1: 30.51 (SE +/- 0.09, N = 4); Clang 12.0: 29.81 (SE +/- 0.02, N = 4)
(CC) gcc options: -O3 -march=native -flto -std=c99 -lpthread -lm
Gcrypt Library
Libgcrypt is a general-purpose cryptographic library developed as part of the GnuPG project. This test runs libgcrypt's integrated benchmark and measures the time to run the benchmark command with the cipher/mac/hash repetition count set to 50, as a simple, high-level look at the overall crypto performance of the system under test.
Gcrypt Library 1.9 (Seconds, Fewer Is Better)
GCC 11.1: 265.17 (SE +/- 0.95, N = 3); Clang 12.0: 253.86 (SE +/- 0.78, N = 3)
(CC) gcc options: -O3 -march=native -flto -fvisibility=hidden -lgpg-error
GraphicsMagick 1.3.33 (Iterations Per Minute, More Is Better)
Operation: Sharpen. GCC 11.1: 898 (SE +/- 2.40, N = 3); Clang 12.0: 769 (SE +/- 2.73, N = 3)
Operation: Enhanced. GCC 11.1: 1315 (SE +/- 1.20, N = 3); Clang 12.0: 1141 (SE +/- 4.10, N = 3)
Operation: Resizing. GCC 11.1: 380 (SE +/- 4.36, N = 3); Clang 12.0: 614 (SE +/- 15.04, N = 15)
(CC) gcc options: -fopenmp -O3 -march=native -flto -pthread -ljpeg -lX11 -lz -lm -lpthread
Kripke
Kripke is a simple, scalable, 3D Sn deterministic particle transport code. Its primary purpose is to research how data layout, programming paradigms and architectures affect the implementation and performance of Sn transport. Kripke is developed by LLNL.
Kripke 1.2.4 (Throughput FoM, More Is Better)
GCC 11.1: 177613600 (SE +/- 1982388.01, N = 5, -fopenmp); Clang 12.0: 160155675 (SE +/- 1966921.21, N = 4, -fopenmp=libomp)
(CXX) g++ options: -O3 -march=native -flto -O2
Kvazaar
This is a test of Kvazaar as a CPU-based H.265 video encoder written in the C programming language and optimized in Assembly. Kvazaar is the winner of the 2016 ACM Open-Source Software Competition and developed at the Ultra Video Group, Tampere University, Finland.
Kvazaar 2.0 (Frames Per Second, More Is Better)
Video Input: Bosphorus 4K - Video Preset: Very Fast. GCC 11.1: 42.31 (SE +/- 0.35, N = 3); Clang 12.0: 44.72 (SE +/- 0.28, N = 3)
Video Input: Bosphorus 4K - Video Preset: Ultra Fast. GCC 11.1: 46.58 (SE +/- 0.54, N = 3); Clang 12.0: 47.96 (SE +/- 0.33, N = 3)
Video Input: Bosphorus 1080p - Video Preset: Very Fast. GCC 11.1: 159.71 (SE +/- 0.42, N = 3); Clang 12.0: 166.00 (SE +/- 0.77, N = 3)
Video Input: Bosphorus 1080p - Video Preset: Ultra Fast. GCC 11.1: 176.66 (SE +/- 1.31, N = 3); Clang 12.0: 183.97 (SE +/- 1.71, N = 3)
(CC) gcc options: -pthread -ftree-vectorize -fvisibility=hidden -O3 -march=native -flto -lm -lrt. Additional option listed beside the results: -lpthread
LAME MP3 Encoding
LAME is an MP3 encoder licensed under the LGPL. This test measures the time required to encode a WAV file to MP3 format.
LAME MP3 Encoding 3.100 (Seconds, Fewer Is Better)
WAV To MP3. GCC 11.1: 8.619 (SE +/- 0.004, N = 3); Clang 12.0: 9.591 (SE +/- 0.003, N = 3)
(CC) gcc options: -O3 -pipe -march=native -flto -lncurses -lm. Additional options listed beside the results: -ffast-math -funroll-loops -fschedule-insns2 -fbranch-count-reg -fforce-addr
libjpeg-turbo tjbench
tjbench is a JPEG decompression/compression benchmark that is part of libjpeg-turbo, a JPEG image codec library optimized for SIMD instructions on modern CPU architectures.
libjpeg-turbo tjbench 2.1.0 (Megapixels/sec, More Is Better)
Test: Decompression Throughput. GCC 11.1: 174.23 (SE +/- 1.34, N = 3); Clang 12.0: 180.60 (SE +/- 0.47, N = 3)
(CC) gcc options: -O3 -march=native -flto -rdynamic -lm
Liquid-DSP
LiquidSDR's Liquid-DSP is a software-defined radio (SDR) digital signal processing library. This test profile runs a multi-threaded benchmark of this SDR/DSP library focused on embedded platform usage.
Liquid-DSP 2021.01.31 (samples/s, More Is Better)
Threads: 1 - Buffer Length: 256 - Filter Length: 57. GCC 11.1: 60985333 (SE +/- 49079.30, N = 3); Clang 12.0: 61840333 (SE +/- 5206.83, N = 3)
Threads: 160 - Buffer Length: 256 - Filter Length: 57. GCC 11.1: 3182866667 (SE +/- 22402480.02, N = 3); Clang 12.0: 3686466667 (SE +/- 34322846.29, N = 3)
(CC) gcc options: -O3 -march=native -flto -pthread -lm -lc -lliquid
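Because some results are "Fewer Is Better" and others "More Is Better", deciding which compiler leads needs the direction as well as the two values. The sketch below shows one way to do that. `faster_compiler` is a hypothetical helper, not a Phoronix Test Suite function, and the inputs are values reported in this article.

```python
def faster_compiler(gcc: float, clang: float, higher_is_better: bool):
    """Return (winner, lead_percent) for one pair of results."""
    if gcc == clang:
        return ("tie", 0.0)
    # For throughput-style metrics the larger value wins; for times the smaller one does.
    gcc_wins = (gcc > clang) if higher_is_better else (gcc < clang)
    low, high = sorted((gcc, clang))
    lead = (high / low - 1.0) * 100.0
    return ("GCC 11.1" if gcc_wins else "Clang 12.0", lead)


# Liquid-DSP, 160 threads (samples/s, More Is Better)
print(faster_compiler(3182866667, 3686466667, higher_is_better=True))
# LAME MP3 encoding (seconds, Fewer Is Better)
print(faster_compiler(8.619, 9.591, higher_is_better=False))
```

Keeping the direction as an explicit argument avoids silently treating a latency as a throughput.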
NCNN
NCNN is a high performance neural network inference framework optimized for mobile and other platforms developed by Tencent.
NCNN 20201218 (ms, Fewer Is Better). GCC 11.1 results are listed with -lgomp, Clang 12.0 results with -lomp.
Target: CPU - Model: mobilenet. GCC 11.1: 19.40 (SE +/- 0.19, N = 12, MIN: 18 / MAX: 399.43); Clang 12.0: 14.21 (SE +/- 0.41, N = 13, MIN: 12.25 / MAX: 44.36)
Target: CPU-v2-v2 - Model: mobilenet-v2. GCC 11.1: 9.80 (SE +/- 0.05, N = 12, MIN: 9.23 / MAX: 38.71); Clang 12.0: 4.95 (SE +/- 0.07, N = 13, MIN: 4.07 / MAX: 22.91)
Target: CPU-v3-v3 - Model: mobilenet-v3. GCC 11.1: 9.56 (SE +/- 0.09, N = 12, MIN: 8.9 / MAX: 68.78); Clang 12.0: 4.22 (SE +/- 0.04, N = 13, MIN: 3.66 / MAX: 22.54)
Target: CPU - Model: shufflenet-v2. GCC 11.1: 10.55 (SE +/- 0.08, N = 12, MIN: 10 / MAX: 25.64); Clang 12.0: 5.90 (SE +/- 0.37, N = 13, MIN: 4.8 / MAX: 33.78)
Target: CPU - Model: mnasnet. GCC 11.1: 9.43 (SE +/- 0.05, N = 11, MIN: 8.91 / MAX: 24.89); Clang 12.0: 5.26 (SE +/- 0.48, N = 13, MIN: 3.73 / MAX: 46.61)
Target: CPU - Model: efficientnet-b0. GCC 11.1: 12.48 (SE +/- 0.15, N = 12, MIN: 11.37 / MAX: 205.28); Clang 12.0: 7.72 (SE +/- 0.72, N = 13, MIN: 5.61 / MAX: 59.7)
Target: CPU - Model: blazeface. GCC 11.1: 6.15 (SE +/- 0.05, N = 12, MIN: 5.69 / MAX: 22.06); Clang 12.0: 2.38 (SE +/- 0.02, N = 13, MIN: 2.14 / MAX: 10.57)
Target: CPU - Model: googlenet. GCC 11.1: 19.46 (SE +/- 0.19, N = 12, MIN: 17.96 / MAX: 72.73); Clang 12.0: 15.34 (SE +/- 0.78, N = 13, MIN: 12.78 / MAX: 96.28)
Target: CPU - Model: vgg16. GCC 11.1: 25.34 (SE +/- 0.17, N = 12, MIN: 23.97 / MAX: 98.11); Clang 12.0: 27.34 (SE +/- 0.51, N = 13, MIN: 21.3 / MAX: 132.89)
Target: CPU - Model: resnet18. GCC 11.1: 11.10 (SE +/- 0.11, N = 12, MIN: 10.39 / MAX: 64.83); Clang 12.0: 10.83 (SE +/- 0.49, N = 13, MIN: 8.67 / MAX: 27.15)
Target: CPU - Model: yolov4-tiny. GCC 11.1: 23.12 (SE +/- 0.18, N = 12, MIN: 21.63 / MAX: 253.04); Clang 12.0: 25.28 (SE +/- 0.25, N = 13, MIN: 23.29 / MAX: 148.87)
Target: CPU - Model: squeezenet_ssd. GCC 11.1: 21.18 (SE +/- 0.10, N = 12, MIN: 20.25 / MAX: 100.94); Clang 12.0: 17.96 (SE +/- 0.87, N = 13, MIN: 14.31 / MAX: 155.2)
Target: CPU - Model: regnety_400m. GCC 11.1: 94.38 (SE +/- 1.10, N = 12, MIN: 87.04 / MAX: 668.91); Clang 12.0: 26.47 (SE +/- 1.81, N = 13, MIN: 20.72 / MAX: 164.27)
(CXX) g++ options: -O3 -march=native -flto -O2 -rdynamic -lpthread
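Some of the NCNN gaps are smaller than the reported run-to-run variation. One rough check, assuming the runs are independent, is to express the difference of the two means in units of the combined standard error sqrt(SE_a^2 + SE_b^2). The threshold of 2 used below is an arbitrary rule of thumb chosen for illustration, not something the Phoronix Test Suite applies, and `gap_in_standard_errors` is a hypothetical helper name.

```python
import math


def gap_in_standard_errors(mean_a, se_a, mean_b, se_b):
    """Difference of two means in units of their combined standard error."""
    combined = math.sqrt(se_a ** 2 + se_b ** 2)
    return abs(mean_a - mean_b) / combined


# (GCC mean, GCC SE, Clang mean, Clang SE) in ms, from the NCNN results above
ncnn = {
    "resnet18": (11.10, 0.11, 10.83, 0.49),
    "regnety_400m": (94.38, 1.10, 26.47, 1.81),
}

for model, values in ncnn.items():
    z = gap_in_standard_errors(*values)
    verdict = "clear gap" if z > 2.0 else "within noise"
    print(f"{model}: {z:.1f} combined SEs ({verdict})")
```

Under this check the resnet18 difference is well inside the noise, while the regnety_400m difference is far outside it.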
oneDNN
This is a test of Intel oneDNN, an Intel-optimized library for Deep Neural Networks, making use of its built-in benchdnn functionality. The result is the total perf time reported. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and MKL-DNN before being rebranded as part of the Intel oneAPI initiative.
oneDNN 2.1.2 (ms, Fewer Is Better), Engine: CPU. GCC 11.1 results are listed with -fopenmp, Clang 12.0 results with -fopenmp=libomp.
Harness: IP Shapes 1D - Data Type: f32. GCC 11.1: 0.923876 (SE +/- 0.001474, N = 3, MIN: 0.85); Clang 12.0: 0.665355 (SE +/- 0.000134, N = 3, MIN: 0.61)
Harness: IP Shapes 3D - Data Type: f32. GCC 11.1: 1.40268 (SE +/- 0.00224, N = 3, MIN: 1.36); Clang 12.0: 1.21726 (SE +/- 0.00153, N = 3, MIN: 1.19)
Harness: IP Shapes 1D - Data Type: u8s8f32. GCC 11.1: 1.23707 (SE +/- 0.00988, N = 9, MIN: 0.88); Clang 12.0: 1.00883 (SE +/- 0.00107, N = 3, MIN: 0.69)
Harness: IP Shapes 3D - Data Type: u8s8f32. GCC 11.1: 0.439708 (SE +/- 0.000912, N = 3, MIN: 0.4); Clang 12.0: 0.325027 (SE +/- 0.001298, N = 3, MIN: 0.28)
Harness: IP Shapes 1D - Data Type: bf16bf16bf16. GCC 11.1: 3.00252 (SE +/- 0.00315, N = 3, MIN: 2.87); Clang 12.0: 2.76681 (SE +/- 0.00228, N = 3, MIN: 2.62)
Harness: IP Shapes 3D - Data Type: bf16bf16bf16. GCC 11.1: 1.81351 (SE +/- 0.00150, N = 3, MIN: 1.68); Clang 12.0: 1.70567 (SE +/- 0.00299, N = 3, MIN: 1.54)
Harness: Convolution Batch Shapes Auto - Data Type: f32. GCC 11.1: 1.45930 (SE +/- 0.00729, N = 3, MIN: 1.28); Clang 12.0: 1.41167 (SE +/- 0.00418, N = 3, MIN: 1.26)
Harness: Deconvolution Batch shapes_3d - Data Type: f32. GCC 11.1: 0.840383 (SE +/- 0.001374, N = 3, MIN: 0.8); Clang 12.0: 0.842984 (SE +/- 0.000915, N = 3, MIN: 0.77)
Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32. GCC 11.1: 0.969972 (SE +/- 0.007538, N = 3, MIN: 0.89); Clang 12.0: 0.902213 (SE +/- 0.002259, N = 3, MIN: 0.85)
Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32. GCC 11.1: 0.361691 (SE +/- 0.001289, N = 3, MIN: 0.32); Clang 12.0: 0.200917 (SE +/- 0.000390, N = 3, MIN: 0.17)
Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32. GCC 11.1: 0.194587 (SE +/- 0.000672, N = 3, MIN: 0.18); Clang 12.0: 0.175060 (SE +/- 0.000222, N = 3, MIN: 0.16)
Harness: Recurrent Neural Network Training - Data Type: f32. GCC 11.1: 686.19 (SE +/- 3.66, N = 3, MIN: 652.78); Clang 12.0: 589.28 (SE +/- 2.16, N = 3, MIN: 555.66)
Harness: Recurrent Neural Network Inference - Data Type: f32. GCC 11.1: 448.25 (SE +/- 3.11, N = 3, MIN: 427.18); Clang 12.0: 352.47 (SE +/- 0.21, N = 3, MIN: 333.78)
Harness: Recurrent Neural Network Training - Data Type: u8s8f32. GCC 11.1: 686.05 (SE +/- 2.28, N = 3, MIN: 656.34); Clang 12.0: 594.11 (SE +/- 6.09, N = 5, MIN: 556.85)
Harness: Convolution Batch Shapes Auto - Data Type: bf16bf16bf16. GCC 11.1: 2.09246 (SE +/- 0.00148, N = 3, MIN: 2.04); Clang 12.0: 2.06629 (SE +/- 0.00039, N = 3, MIN: 1.99)
Harness: Deconvolution Batch shapes_1d - Data Type: bf16bf16bf16. GCC 11.1: 3.27829 (SE +/- 0.00172, N = 3, MIN: 3.11); Clang 12.0: 2.86766 (SE +/- 0.00123, N = 3, MIN: 2.67)
Harness: Deconvolution Batch shapes_3d - Data Type: bf16bf16bf16. GCC 11.1: 3.57127 (SE +/- 0.00574, N = 3, MIN: 3.5); Clang 12.0: 3.57288 (SE +/- 0.00202, N = 3, MIN: 3.48)
(CXX) g++ options: -O3 -march=native -flto -std=c++11 -msse4.1 -fPIC -O2 -pie -lpthread -ldl
OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.1.2 Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU GCC 11.1 Clang 12.0 100 200 300 400 500 SE +/- 1.07, N = 3 SE +/- 1.62, N = 3 445.26 352.59 -fopenmp - MIN: 427.38 -fopenmp=libomp - MIN: 334.39 1. (CXX) g++ options: -O3 -march=native -flto -std=c++11 -msse4.1 -fPIC -O2 -pie -lpthread -ldl
OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.1.2 Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU GCC 11.1 Clang 12.0 0.0557 0.1114 0.1671 0.2228 0.2785 SE +/- 0.000532, N = 3 SE +/- 0.000879, N = 3 0.247554 0.160017 -fopenmp - MIN: 0.23 -fopenmp=libomp - MIN: 0.14 1. (CXX) g++ options: -O3 -march=native -flto -std=c++11 -msse4.1 -fPIC -O2 -pie -lpthread -ldl
OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.1.2 Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU GCC 11.1 Clang 12.0 150 300 450 600 750 SE +/- 2.54, N = 3 SE +/- 0.97, N = 3 677.71 584.54 -fopenmp - MIN: 648.93 -fopenmp=libomp - MIN: 556.26 1. (CXX) g++ options: -O3 -march=native -flto -std=c++11 -msse4.1 -fPIC -O2 -pie -lpthread -ldl
OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.1.2 Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU GCC 11.1 Clang 12.0 100 200 300 400 500 SE +/- 1.02, N = 3 SE +/- 2.41, N = 3 444.23 353.58 -fopenmp - MIN: 427.49 -fopenmp=libomp - MIN: 333.93 1. (CXX) g++ options: -O3 -march=native -flto -std=c++11 -msse4.1 -fPIC -O2 -pie -lpthread -ldl
OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.1.2 Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32 - Engine: CPU GCC 11.1 Clang 12.0 0.0495 0.099 0.1485 0.198 0.2475 SE +/- 0.000536, N = 3 SE +/- 0.000357, N = 3 0.219899 0.120406 -fopenmp - MIN: 0.19 -fopenmp=libomp - MIN: 0.11 1. (CXX) g++ options: -O3 -march=native -flto -std=c++11 -msse4.1 -fPIC -O2 -pie -lpthread -ldl
OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.1.2 Harness: Matrix Multiply Batch Shapes Transformer - Data Type: bf16bf16bf16 - Engine: CPU GCC 11.1 Clang 12.0 0.1353 0.2706 0.4059 0.5412 0.6765 SE +/- 0.000449, N = 3 SE +/- 0.001561, N = 3 0.601530 0.515284 -fopenmp - MIN: 0.56 -fopenmp=libomp - MIN: 0.48 1. (CXX) g++ options: -O3 -march=native -flto -std=c++11 -msse4.1 -fPIC -O2 -pie -lpthread -ldl
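Several of the oneDNN results are close enough that the reported standard errors matter when reading them. As a rough illustration only, the sketch below expresses the gap between two results in units of their combined standard error. The helper name difference_in_se_units is hypothetical, and treating a value below about 2 as "within noise" is a rule of thumb assumed here, not something the result page states.

```python
import math


def difference_in_se_units(mean_a, se_a, mean_b, se_b):
    """Return (mean_b - mean_a) divided by the combined standard error.

    The combined standard error is sqrt(se_a^2 + se_b^2), which assumes
    the two sets of runs are independent.
    """
    combined_se = math.sqrt(se_a ** 2 + se_b ** 2)
    return (mean_b - mean_a) / combined_se


# Deconvolution Batch shapes_3d, f32 (GCC 11.1 vs Clang 12.0, ms)
close_call = difference_in_se_units(0.840383, 0.001374, 0.842984, 0.000915)

# Matrix Multiply Batch Shapes Transformer, f32 (GCC 11.1 vs Clang 12.0, ms)
clear_gap = difference_in_se_units(0.247554, 0.000532, 0.160017, 0.000879)

print(f"shapes_3d f32: {close_call:.2f} combined SEs apart")
print(f"matmul f32:    {clear_gap:.2f} combined SEs apart")
```

With only N = 3 runs per result, this is a coarse indication rather than a formal significance test.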
OpenSSL OpenSSL is an open-source toolkit that implements SSL (Secure Sockets Layer) and TLS (Transport Layer Security) protocols. This test measures the RSA 4096-bit performance of OpenSSL. Learn more via the OpenBenchmarking.org test page.
OpenSSL 1.1.1 - RSA 4096-bit Performance (Signs Per Second, more is better) - GCC 11.1: 17804.2 (SE +/- 44.60, N = 3); Clang 12.0: 11555.8 (SE +/- 10.32, N = 3). Additional reported flag: -Qunused-arguments. (CC) gcc options: -pthread -m64 -O3 -march=native -flto -lssl -lcrypto -ldl
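Because some results are "more is better" and others "fewer is better", it can help to normalize them into a single ratio before comparing. The sketch below is one way to do that; the function name gcc_speedup is hypothetical and is not part of the Phoronix Test Suite.

```python
def gcc_speedup(gcc, clang, more_is_better):
    """Return a ratio where > 1 means the GCC build was faster
    and < 1 means the Clang build was faster."""
    if more_is_better:
        return gcc / clang
    return clang / gcc


# OpenSSL RSA 4096-bit, signs per second (more is better)
openssl_ratio = gcc_speedup(17804.2, 11555.8, more_is_better=True)

# oneDNN Recurrent Neural Network Inference, f32, ms (fewer is better)
rnn_ratio = gcc_speedup(448.25, 352.47, more_is_better=False)

print(f"OpenSSL: {openssl_ratio:.3f}, oneDNN RNN inference f32: {rnn_ratio:.3f}")
```

A ratio above 1 favors GCC 11.1 and a ratio below 1 favors Clang 12.0, regardless of the unit of the underlying test.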
Opus Codec Encoding Opus is an open audio codec. Opus is a lossy audio compression format designed primarily for interactive real-time applications over the Internet. This test uses Opus-Tools and measures the time required to encode a WAV file to Opus. Learn more via the OpenBenchmarking.org test page.
Opus Codec Encoding 1.3.1 - WAV To Opus Encode (Seconds, fewer is better) - GCC 11.1: 8.768 (SE +/- 0.003, N = 5); Clang 12.0: 9.788 (SE +/- 0.008, N = 5). Additional reported flag: -fvisibility=hidden. (CXX) g++ options: -O3 -march=native -flto -logg -lm
PostgreSQL pgbench This is a benchmark of PostgreSQL that uses pgbench to run the database tests. Learn more via the OpenBenchmarking.org test page.
PostgreSQL pgbench 13.0 - Scaling Factor: 100 - Clients: 250 - Mode: Read Only (TPS, more is better) - GCC 11.1: 907401 (SE +/- 15989.65, N = 13); Clang 12.0: 943043 (SE +/- 8511.06, N = 3)
PostgreSQL pgbench 13.0 - Scaling Factor: 100 - Clients: 250 - Mode: Read Only - Average Latency (ms, fewer is better) - GCC 11.1: 0.277 (SE +/- 0.005, N = 13); Clang 12.0: 0.265 (SE +/- 0.002, N = 3)
PostgreSQL pgbench 13.0 - Scaling Factor: 100 - Clients: 250 - Mode: Read Write (TPS, more is better) - GCC 11.1: 89425 (SE +/- 106.75, N = 3); Clang 12.0: 92576 (SE +/- 239.57, N = 3)
PostgreSQL pgbench 13.0 - Scaling Factor: 100 - Clients: 250 - Mode: Read Write - Average Latency (ms, fewer is better) - GCC 11.1: 2.797 (SE +/- 0.003, N = 3); Clang 12.0: 2.702 (SE +/- 0.007, N = 3)
(CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=native -flto -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm
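The TPS and average-latency figures for pgbench are closely related: with a fixed number of clients each issuing transactions back to back, average latency is approximately the client count divided by the throughput. The sketch below checks that relationship against the figures above. The helper name average_latency_ms is hypothetical, and the formula is an approximation assumed here; it lands within a few thousandths of a millisecond of the reported averages.

```python
def average_latency_ms(clients, tps):
    """Approximate average latency in milliseconds for a closed loop of
    `clients` connections sustaining `tps` transactions per second."""
    return clients / tps * 1000.0


CLIENTS = 250  # client count used in the results above

# (mode, compiler): (reported TPS, reported average latency in ms)
reported = {
    ("Read Only", "GCC 11.1"): (907401, 0.277),
    ("Read Only", "Clang 12.0"): (943043, 0.265),
    ("Read Write", "GCC 11.1"): (89425, 2.797),
    ("Read Write", "Clang 12.0"): (92576, 2.702),
}

for (mode, compiler), (tps, latency) in reported.items():
    estimate = average_latency_ms(CLIENTS, tps)
    print(f"{mode:10s} {compiler:10s} estimated {estimate:.3f} ms, reported {latency:.3f} ms")
```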
Primesieve Primesieve generates prime numbers using a highly optimized sieve of Eratosthenes implementation. Primesieve benchmarks the CPU's L1/L2 cache performance. Learn more via the OpenBenchmarking.org test page.
Primesieve 7.4 - 1e12 Prime Number Generation (Seconds, fewer is better) - GCC 11.1: 3.780 (SE +/- 0.006, N = 3); Clang 12.0: 3.830 (SE +/- 0.003, N = 3). (CXX) g++ options: -O3 -march=native -flto -O2 -lpthread
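For readers unfamiliar with the algorithm being timed, here is a minimal textbook sieve of Eratosthenes. It is only an illustration of the basic idea and is not Primesieve's implementation, which is far more heavily optimized; the function name primes_up_to is hypothetical.

```python
import math


def primes_up_to(limit):
    """Return all primes <= limit using a plain sieve of Eratosthenes."""
    if limit < 2:
        return []
    # is_prime[i] is 1 while i is still a prime candidate.
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    for n in range(2, math.isqrt(limit) + 1):
        if is_prime[n]:
            # Strike out multiples of n, starting at n*n; smaller
            # multiples were already removed by smaller primes.
            count = len(range(n * n, limit + 1, n))
            is_prime[n * n :: n] = bytearray(count)
    return [i for i, flag in enumerate(is_prime) if flag]


print(primes_up_to(30))
```

This simple version keeps the whole candidate array in memory at once, so it is only practical for small limits, nowhere near the 1e12 range used in the test above.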
SVT-AV1 This is a benchmark of the SVT-AV1 open-source video encoder/decoder. SVT-AV1 was originally developed by Intel as part of their Open Visual Cloud / Scalable Video Technology (SVT). Development of SVT-AV1 has since moved to the Alliance for Open Media as part of upstream AV1 development. SVT-AV1 is a CPU-based multi-threaded video encoder for the AV1 video format with a sample YUV video file. Learn more via the OpenBenchmarking.org test page.
SVT-AV1 0.8.7 - Encoder Mode: Preset 4 - Input: Bosphorus 4K (Frames Per Second, more is better) - GCC 11.1: 4.213 (SE +/- 0.029, N = 3); Clang 12.0: 4.333 (SE +/- 0.006, N = 3)
SVT-AV1 0.8.7 - Encoder Mode: Preset 8 - Input: Bosphorus 4K (Frames Per Second, more is better) - GCC 11.1: 55.19 (SE +/- 0.23, N = 3); Clang 12.0: 56.27 (SE +/- 0.41, N = 3)
SVT-AV1 0.8.7 - Encoder Mode: Preset 4 - Input: Bosphorus 1080p (Frames Per Second, more is better) - GCC 11.1: 8.996 (SE +/- 0.043, N = 3); Clang 12.0: 9.418 (SE +/- 0.082, N = 3)
SVT-AV1 0.8.7 - Encoder Mode: Preset 8 - Input: Bosphorus 1080p (Frames Per Second, more is better) - GCC 11.1: 167.39 (SE +/- 0.34, N = 3); Clang 12.0: 169.61 (SE +/- 1.14, N = 3)
(CXX) g++ options: -O3 -march=native -flto -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq -pie
SVT-HEVC This is a test of the Intel Open Visual Cloud Scalable Video Technology SVT-HEVC CPU-based multi-threaded video encoder for the HEVC / H.265 video format with a sample 1080p YUV video file. Learn more via the OpenBenchmarking.org test page.
SVT-HEVC 1.5.0 - Tuning: 1 - Input: Bosphorus 1080p (Frames Per Second, more is better) - GCC 11.1: 39.59 (SE +/- 0.28, N = 3); Clang 12.0: 41.96 (SE +/- 0.21, N = 3)
SVT-HEVC 1.5.0 - Tuning: 7 - Input: Bosphorus 1080p (Frames Per Second, more is better) - GCC 11.1: 336.37 (SE +/- 2.94, N = 3); Clang 12.0: 355.18 (SE +/- 1.29, N = 3)
SVT-HEVC 1.5.0 - Tuning: 10 - Input: Bosphorus 1080p (Frames Per Second, more is better) - GCC 11.1: 609.56 (SE +/- 1.44, N = 3); Clang 12.0: 608.95 (SE +/- 2.17, N = 3)
(CC) gcc options: -O3 -march=native -flto -fPIE -fPIC -O2 -pie -rdynamic -lpthread -lrt
SVT-VP9 This is a test of the Intel Open Visual Cloud Scalable Video Technology SVT-VP9 CPU-based multi-threaded video encoder for the VP9 video format with a sample YUV input video file. Learn more via the OpenBenchmarking.org test page.
SVT-VP9 0.3 - Tuning: VMAF Optimized - Input: Bosphorus 1080p (Frames Per Second, more is better) - GCC 11.1: 476.85 (SE +/- 2.96, N = 3); Clang 12.0: 467.57 (SE +/- 3.96, N = 3)
SVT-VP9 0.3 - Tuning: PSNR/SSIM Optimized - Input: Bosphorus 1080p (Frames Per Second, more is better) - GCC 11.1: 477.90 (SE +/- 3.23, N = 3); Clang 12.0: 466.06 (SE +/- 0.60, N = 3)
SVT-VP9 0.3 - Tuning: Visual Quality Optimized - Input: Bosphorus 1080p (Frames Per Second, more is better) - GCC 11.1: 393.17 (SE +/- 0.84, N = 3); Clang 12.0: 379.93 (SE +/- 1.57, N = 3)
(CC) gcc options: -O3 -fcommon -march=native -flto -fPIE -fPIC -fvisibility=hidden -O2 -pie -rdynamic -lpthread -lrt -lm
Timed MrBayes Analysis This test performs a Bayesian analysis of a set of primate genome sequences in order to estimate their phylogeny. Learn more via the OpenBenchmarking.org test page.
Timed MrBayes Analysis 3.2.7 - Primate Phylogeny Analysis (Seconds, fewer is better) - GCC 11.1: 142.20 (SE +/- 0.46, N = 3); Clang 12.0: 138.55 (SE +/- 0.91, N = 3). Additional reported flag: -mabm. (CC) gcc options: -mmmx -msse -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msha -maes -mavx -mfma -mavx2 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi -mrdrnd -mbmi -mbmi2 -madx -O3 -std=c99 -pedantic -march=native -flto -lm
TNN TNN is an open-source deep learning reasoning framework developed by Tencent. Learn more via the OpenBenchmarking.org test page.
TNN 0.2.3 - Target: CPU - Model: MobileNet v2 (ms, fewer is better) - GCC 11.1: 376.74 (SE +/- 2.55, N = 3, MIN: 373.07 / MAX: 547.54); Clang 12.0: 552.03 (SE +/- 1.30, N = 3, MIN: 544.23 / MAX: 587.28)
TNN 0.2.3 - Target: CPU - Model: SqueezeNet v1.1 (ms, fewer is better) - GCC 11.1: 377.43 (SE +/- 0.01, N = 3, MIN: 377.31 / MAX: 377.64); Clang 12.0: 402.91 (SE +/- 0.02, N = 3, MIN: 402.36 / MAX: 405.48)
TNN results above: the GCC 11.1 build used -fopenmp and the Clang 12.0 build used -fopenmp=libomp. (CXX) g++ options: -O3 -march=native -flto -pthread -fvisibility=hidden -O2 -rdynamic -ldl
WebP Image Encode This is a test of Google's libwebp with the cwebp image encode utility and using a sample 6000x4000 pixel JPEG image as the input. Learn more via the OpenBenchmarking.org test page.
WebP Image Encode 1.1 - Encode Settings: Default (Encode Time - Seconds, fewer is better) - GCC 11.1: 1.638 (SE +/- 0.000, N = 3); Clang 12.0: 1.616 (SE +/- 0.003, N = 3)
WebP Image Encode 1.1 - Encode Settings: Quality 100 (Encode Time - Seconds, fewer is better) - GCC 11.1: 2.645 (SE +/- 0.001, N = 3); Clang 12.0: 2.692 (SE +/- 0.003, N = 3)
WebP Image Encode 1.1 - Encode Settings: Quality 100, Lossless (Encode Time - Seconds, fewer is better) - GCC 11.1: 19.47 (SE +/- 0.01, N = 3); Clang 12.0: 20.27 (SE +/- 0.00, N = 3)
WebP Image Encode 1.1 - Encode Settings: Quality 100, Highest Compression (Encode Time - Seconds, fewer is better) - GCC 11.1: 8.026 (SE +/- 0.007, N = 3); Clang 12.0: 7.422 (SE +/- 0.007, N = 3)
WebP Image Encode 1.1 - Encode Settings: Quality 100, Lossless, Highest Compression (Encode Time - Seconds, fewer is better) - GCC 11.1: 40.91 (SE +/- 0.05, N = 3); Clang 12.0: 42.18 (SE +/- 0.02, N = 3)
(CC) gcc options: -fvisibility=hidden -O3 -march=native -flto -pthread -lm -ljpeg
WebP2 Image Encode This is a test of Google's libwebp2 library with the WebP2 image encode utility and using a sample 6000x4000 pixel JPEG image as the input, similar to the WebP/libwebp test profile. WebP2 is currently experimental and under heavy development as ultimately the successor to WebP. WebP2 supports 10-bit HDR, more efficient lossy compression, improved lossless compression, animation support, and full multi-threading support compared to WebP. Learn more via the OpenBenchmarking.org test page.
WebP2 Image Encode 20210126 - Encode Settings: Default (Seconds, fewer is better) - GCC 11.1: 2.644 (SE +/- 0.035, N = 3); Clang 12.0: 2.492 (SE +/- 0.009, N = 3)
WebP2 Image Encode 20210126 - Encode Settings: Quality 75, Compression Effort 7 (Seconds, fewer is better) - GCC 11.1: 106.66 (SE +/- 0.01, N = 3); Clang 12.0: 99.78 (SE +/- 0.04, N = 3)
WebP2 Image Encode 20210126 - Encode Settings: Quality 95, Compression Effort 7 (Seconds, fewer is better) - GCC 11.1: 196.49 (SE +/- 0.01, N = 3); Clang 12.0: 182.29 (SE +/- 0.01, N = 3)
WebP2 Image Encode 20210126 - Encode Settings: Quality 100, Compression Effort 5 (Seconds, fewer is better) - GCC 11.1: 5.765 (SE +/- 0.004, N = 3); Clang 12.0: 6.495 (SE +/- 0.013, N = 3)
WebP2 Image Encode 20210126 - Encode Settings: Quality 100, Lossless Compression (Seconds, fewer is better) - GCC 11.1: 389.07 (SE +/- 0.07, N = 3); Clang 12.0: 400.62 (SE +/- 0.16, N = 3)
(CXX) g++ options: -O3 -march=native -flto -fno-rtti -O2 -rdynamic -lpthread -ljpeg
x265 This is a simple test of the x265 encoder run on the CPU with 1080p and 4K options for H.265 video encode performance with x265. Learn more via the OpenBenchmarking.org test page.
x265 3.4 - Video Input: Bosphorus 4K (Frames Per Second, more is better) - GCC 11.1: 26.92 (SE +/- 0.27, N = 3); Clang 12.0: 29.13 (SE +/- 0.16, N = 3)
x265 3.4 - Video Input: Bosphorus 1080p (Frames Per Second, more is better) - GCC 11.1: 76.88 (SE +/- 0.62, N = 3); Clang 12.0: 77.24 (SE +/- 0.27, N = 3)
(CXX) g++ options: -O3 -march=native -flto -O2 -rdynamic -lpthread -lrt -ldl
Zstd Compression This test measures the time needed to compress/decompress a sample file (a FreeBSD disk image - FreeBSD-12.2-RELEASE-amd64-memstick.img) using Zstd compression with options for different compression levels / settings. Learn more via the OpenBenchmarking.org test page.
Zstd Compression 1.5.0 - Compression Level: 8 - Compression Speed (MB/s, more is better) - GCC 11.1: 2611.0 (SE +/- 30.03, N = 15); Clang 12.0: 2748.1 (SE +/- 30.31, N = 5)
Zstd Compression 1.5.0 - Compression Level: 8 - Decompression Speed (MB/s, more is better) - GCC 11.1: 2959.3 (SE +/- 2.87, N = 15); Clang 12.0: 2996.7 (SE +/- 7.51, N = 5)
Zstd Compression 1.5.0 - Compression Level: 19 - Compression Speed (MB/s, more is better) - GCC 11.1: 83.8 (SE +/- 0.39, N = 3); Clang 12.0: 81.5 (SE +/- 0.87, N = 5)
Zstd Compression 1.5.0 - Compression Level: 19 - Decompression Speed (MB/s, more is better) - GCC 11.1: 2537.3 (SE +/- 0.57, N = 3); Clang 12.0: 2495.2 (SE +/- 13.91, N = 5)
Zstd Compression 1.5.0 - Compression Level: 8, Long Mode - Compression Speed (MB/s, more is better) - GCC 11.1: 1040.6 (SE +/- 3.20, N = 3); Clang 12.0: 830.0 (SE +/- 9.83, N = 3)
Zstd Compression 1.5.0 - Compression Level: 8, Long Mode - Decompression Speed (MB/s, more is better) - GCC 11.1: 3168.4 (SE +/- 6.63, N = 3); Clang 12.0: 3204.6 (SE +/- 4.90, N = 3)
Zstd Compression 1.5.0 - Compression Level: 19, Long Mode - Compression Speed (MB/s, more is better) - GCC 11.1: 47.9 (SE +/- 0.48, N = 15); Clang 12.0: 46.2 (SE +/- 0.37, N = 15)
Zstd Compression 1.5.0 - Compression Level: 19, Long Mode - Decompression Speed (MB/s, more is better) - GCC 11.1: 2670.8 (SE +/- 2.07, N = 15); Clang 12.0: 2632.3 (SE +/- 1.79, N = 15)
(CC) gcc options: -O3 -march=native -flto -pthread -lz
GCC 11.1 (system configuration as listed at the top of this page): Testing initiated at 28 May 2021 11:02.
Clang 12.0 (same hardware; software configuration as listed at the top of this page): Testing initiated at 28 May 2021 19:40.