Ryzen 9 5950X Clang 12 vs. GCC 11 Benchmarks

GCC 11.1 versus LLVM Clang 12 on AMD Ryzen 9 5950X. Benchmarks by Michael Larabel for a future article.

Configurations tested:

  Clang 12: -O2
  Clang 12: -O3 -march=native
  Clang 12: -O3 -march=native -flto
  GCC 11.1: -O3 -march=native -flto
  GCC 11.1: -O3 -march=native
  GCC 11.1: -O2

System details (identical for all six configurations except the Compiler field):

  Processor: AMD Ryzen 9 5950X 16-Core @ 3.40GHz (16 Cores / 32 Threads)
  Motherboard: ASUS ROG CROSSHAIR VIII HERO (WI-FI) (3302 BIOS)
  Chipset: AMD Starship/Matisse
  Memory: 32GB
  Disk: 500GB Western Digital WDS500G3X0C-00SJG0
  Graphics: AMD Radeon RX 5600 OEM/5600 XT / 5700/5700 8GB (2100/875MHz)
  Audio: AMD Navi 10 HDMI Audio
  Monitor: ASUS MG28U
  Network: Realtek RTL8125 2.5GbE + Intel I211 + Intel Wi-Fi 6 AX200
  OS: Fedora 34
  Kernel: 5.11.20-300.fc34.x86_64 (x86_64)
  Desktop: GNOME Shell 40.1
  Display Server: X Server + Wayland
  OpenGL: 4.6 Mesa 21.0.3 (LLVM 12.0.0)
  Compiler (Clang 12 configurations): Clang 12.0.0
  Compiler (GCC 11.1 configurations): GCC 11.1.1 20210428 + Clang 12.0.0
  File-System: btrfs
  Screen Resolution: 3840x2160

Timed MrBayes Analysis 3.2.7
Primate Phylogeny Analysis
Seconds < Lower Is Better
Clang 12: -O2 ..................... 93.53 |===================================
Clang 12: -O3 -march=native ....... 90.79 |==================================
Clang 12: -O3 -march=native -flto . 92.41 |==================================
GCC 11.1: -O3 -march=native -flto . 93.34 |===================================
GCC 11.1: -O3 -march=native ....... 92.97 |===================================
GCC 11.1: -O2 ..................... 96.79 |====================================

Timed HMMer Search 3.3.2
Pfam Database Search
Seconds < Lower Is Better
Clang 12: -O2 ..................... 96.09 |===================================
Clang 12: -O3 -march=native ....... 95.21 |==================================
Clang 12: -O3 -march=native -flto . 93.31 |==================================
GCC 11.1: -O3 -march=native -flto . 97.71 |===================================
GCC 11.1: -O3 -march=native ....... 97.74 |===================================
GCC 11.1: -O2 ..................... 99.72 |====================================

LAMMPS Molecular Dynamics Simulator 29Oct2020
Model: Rhodopsin Protein
ns/day > Higher Is Better
Clang 12: -O2 ..................... 13.09 |===================================
Clang 12: -O3 -march=native ....... 13.30 |====================================
Clang 12: -O3 -march=native -flto . 13.34 |====================================
GCC 11.1: -O3 -march=native -flto . 12.81 |===================================
GCC 11.1: -O3 -march=native ....... 12.75 |==================================
GCC 11.1: -O2 ..................... 12.86 |===================================

WebP Image Encode 1.1
Encode Settings: Quality 100, Lossless
Encode Time - Seconds < Lower Is Better
Clang 12: -O2 ..................... 13.65 |===================================
Clang 12: -O3 -march=native ....... 14.11 |====================================
Clang 12: -O3 -march=native -flto . 13.64 |===================================
GCC 11.1: -O3 -march=native -flto . 13.25 |==================================
GCC 11.1: -O3 -march=native ....... 13.55 |===================================
GCC 11.1: -O2 ..................... 13.18 |==================================

WebP Image Encode 1.1
Encode Settings: Quality 100, Highest Compression
Encode Time - Seconds < Lower Is Better
Clang 12: -O2 ..................... 4.778 |================================
Clang 12: -O3 -march=native ....... 4.703 |================================
Clang 12: -O3 -march=native -flto . 4.722 |================================
GCC 11.1: -O3 -march=native -flto . 5.184 |===================================
GCC 11.1: -O3 -march=native ....... 5.200 |===================================
GCC 11.1: -O2 ..................... 5.355 |====================================

WebP Image Encode 1.1
Encode Settings: Quality 100, Lossless, Highest Compression
Encode Time - Seconds < Lower Is Better
Clang 12: -O2 ..................... 29.12 |====================================
Clang 12: -O3 -march=native ....... 28.64 |===================================
Clang 12: -O3 -march=native -flto . 27.40 |==================================
GCC 11.1: -O3 -march=native -flto . 27.95 |===================================
GCC 11.1: -O3 -march=native ....... 28.33 |===================================
GCC 11.1: -O2 ..................... 27.42 |==================================

GraphicsMagick 1.3.33
Operation: Rotate
Iterations Per Minute > Higher Is Better
Clang 12: -O2 ..................... 921 |=================================
Clang 12: -O3 -march=native ....... 981 |===================================
Clang 12: -O3 -march=native -flto . 961 |==================================
GCC 11.1: -O3 -march=native -flto . 964 |==================================
GCC 11.1: -O3 -march=native ....... 1036 |=====================================
GCC 11.1: -O2 ..................... 985 |===================================

GraphicsMagick 1.3.33
Operation: Sharpen
Iterations Per Minute > Higher Is Better
Clang 12: -O2 ..................... 206 |====================
Clang 12: -O3 -march=native ....... 235 |=======================
Clang 12: -O3 -march=native -flto . 237 |========================
GCC 11.1: -O3 -march=native -flto . 382 |======================================
GCC 11.1: -O3 -march=native ....... 370 |=====================================
GCC 11.1: -O2 ..................... 226 |======================

GraphicsMagick 1.3.33
Operation: Enhanced
Iterations Per Minute > Higher Is Better
Clang 12: -O2 ..................... 411 |===================================
Clang 12: -O3 -march=native ....... 452 |======================================
Clang 12: -O3 -march=native -flto . 451 |======================================
GCC 11.1: -O3 -march=native -flto . 449 |======================================
GCC 11.1: -O3 -march=native ....... 449 |======================================
GCC 11.1: -O2 ..................... 422 |===================================

GraphicsMagick 1.3.33
Operation: Resizing
Iterations Per Minute > Higher Is Better
Clang 12: -O2 ..................... 1693 |==============================
Clang 12: -O3 -march=native ....... 1767 |===============================
Clang 12: -O3 -march=native -flto . 1762 |===============================
GCC 11.1: -O3 -march=native -flto . 2085 |====================================
GCC 11.1: -O3 -march=native ....... 2120 |=====================================
GCC 11.1: -O2 ..................... 1803 |===============================

SVT-HEVC 1.5.0
Tuning: 7 - Input: Bosphorus 1080p
Frames Per Second > Higher Is Better
Clang 12: -O2 ..................... 225.15 |=================================
Clang 12: -O3 -march=native ....... 229.27 |==================================
Clang 12: -O3 -march=native -flto . 236.05 |===================================
GCC 11.1: -O3 -march=native -flto . 227.04 |==================================
GCC 11.1: -O3 -march=native ....... 221.71 |=================================
GCC 11.1: -O2 ..................... 217.32 |================================

SVT-HEVC 1.5.0
Tuning: 10 - Input: Bosphorus 1080p
Frames Per Second > Higher Is Better
Clang 12: -O2 ..................... 368.55 |==================================
Clang 12: -O3 -march=native ....... 375.95 |===================================
Clang 12: -O3 -march=native -flto . 375.08 |===================================
GCC 11.1: -O3 -march=native -flto . 368.55 |==================================
GCC 11.1: -O3 -march=native ....... 369.54 |==================================
GCC 11.1: -O2 ..................... 365.79 |==================================

SVT-VP9 0.3
Tuning: VMAF Optimized - Input: Bosphorus 1080p
Frames Per Second > Higher Is Better
Clang 12: -O2 ..................... 231.35 |==================================
Clang 12: -O3 -march=native ....... 234.71 |===================================
Clang 12: -O3 -march=native -flto . 232.66 |===================================
GCC 11.1: -O3 -march=native -flto . 230.43 |==================================
GCC 11.1: -O3 -march=native ....... 231.16 |==================================
GCC 11.1: -O2 ..................... 230.53 |==================================

SVT-VP9 0.3
Tuning: PSNR/SSIM Optimized - Input: Bosphorus 1080p
Frames Per Second > Higher Is Better
Clang 12: -O2 ..................... 235.64 |==================================
Clang 12: -O3 -march=native ....... 239.88 |===================================
Clang 12: -O3 -march=native -flto . 238.13 |===================================
GCC 11.1: -O3 -march=native -flto . 234.70 |==================================
GCC 11.1: -O3 -march=native ....... 235.59 |==================================
GCC 11.1: -O2 ..................... 235.27 |==================================

SVT-VP9 0.3
Tuning: Visual Quality Optimized - Input: Bosphorus 1080p
Frames Per Second > Higher Is Better
Clang 12: -O2 ..................... 225.57 |===================================
Clang 12: -O3 -march=native ....... 227.73 |===================================
Clang 12: -O3 -march=native -flto . 227.66 |===================================
GCC 11.1: -O3 -march=native -flto . 222.33 |==================================
GCC 11.1: -O3 -march=native ....... 223.11 |==================================
GCC 11.1: -O2 ..................... 222.63 |==================================

x265 3.4
Video Input: Bosphorus 4K
Frames Per Second > Higher Is Better
Clang 12: -O2 ..................... 27.77 |===================================
Clang 12: -O3 -march=native ....... 27.75 |===================================
Clang 12: -O3 -march=native -flto . 28.47 |====================================
GCC 11.1: -O3 -march=native -flto . 26.17 |=================================
GCC 11.1: -O3 -march=native ....... 25.91 |=================================
GCC 11.1: -O2 ..................... 26.24 |=================================

Coremark 1.0
CoreMark Size 666 - Iterations Per Second
Iterations/Sec > Higher Is Better
Clang 12: -O2 ..................... 712758.36 |===========================
Clang 12: -O3 -march=native ....... 722694.01 |===========================
Clang 12: -O3 -march=native -flto . 714933.86 |===========================
GCC 11.1: -O3 -march=native -flto . 849671.96 |================================
GCC 11.1: -O3 -march=native ....... 808580.86 |==============================
GCC 11.1: -O2 ..................... 830811.53 |===============================

Himeno Benchmark 3.0
Poisson Pressure Solver
MFLOPS > Higher Is Better
Clang 12: -O2 ..................... 4877.31 |==============================
Clang 12: -O3 -march=native ....... 5192.19 |================================
Clang 12: -O3 -march=native -flto . 5289.92 |=================================
GCC 11.1: -O3 -march=native -flto . 5445.64 |==================================
GCC 11.1: -O3 -march=native ....... 5314.73 |=================================
GCC 11.1: -O2 ..................... 5120.94 |================================

PJSIP 2.11
Method: INVITE
Responses Per Second > Higher Is Better
Clang 12: -O2 ..................... 4575 |===================================
Clang 12: -O3 -march=native ....... 4708 |====================================
GCC 11.1: -O3 -march=native -flto . 4616 |===================================
GCC 11.1: -O3 -march=native ....... 4671 |====================================
GCC 11.1: -O2 ..................... 4815 |=====================================

PJSIP 2.11
Method: OPTIONS, Stateful
Responses Per Second > Higher Is Better
Clang 12: -O2 ..................... 7942 |=====================================
Clang 12: -O3 -march=native ....... 7958 |=====================================
GCC 11.1: -O3 -march=native -flto . 7916 |=====================================
GCC 11.1: -O3 -march=native ....... 7982 |=====================================
GCC 11.1: -O2 ..................... 7860 |====================================

PJSIP 2.11
Method: OPTIONS, Stateless
Responses Per Second > Higher Is Better
Clang 12: -O2 ..................... 221107 |==================================
Clang 12: -O3 -march=native ....... 222572 |==================================
GCC 11.1: -O3 -march=native -flto . 222581 |==================================
GCC 11.1: -O3 -march=native ....... 221759 |==================================
GCC 11.1: -O2 ..................... 230599 |===================================

C-Ray 1.1
Total Time - 4K, 16 Rays Per Pixel
Seconds < Lower Is Better
Clang 12: -O2 ..................... 49.28 |=============================
Clang 12: -O3 -march=native ....... 45.01 |===========================
Clang 12: -O3 -march=native -flto . 44.74 |==========================
GCC 11.1: -O3 -march=native -flto . 25.48 |===============
GCC 11.1: -O3 -march=native ....... 25.41 |===============
GCC 11.1: -O2 ..................... 60.93 |====================================

AOBench
Size: 2048 x 2048 - Total Time
Seconds < Lower Is Better
Clang 12: -O2 ..................... 33.78 |====================================
Clang 12: -O3 -march=native ....... 30.23 |================================
Clang 12: -O3 -march=native -flto . 29.73 |================================
GCC 11.1: -O3 -march=native -flto . 26.31 |============================
GCC 11.1: -O3 -march=native ....... 25.48 |===========================
GCC 11.1: -O2 ..................... 31.73 |==================================

FLAC Audio Encoding 1.3.2
WAV To FLAC
Seconds < Lower Is Better
Clang 12: -O2 ..................... 7.687 |====================================
Clang 12: -O3 -march=native ....... 5.666 |===========================
Clang 12: -O3 -march=native -flto . 5.756 |===========================
GCC 11.1: -O3 -march=native -flto . 6.195 |=============================
GCC 11.1: -O3 -march=native ....... 6.237 |=============================
GCC 11.1: -O2 ..................... 5.874 |============================

LAME MP3 Encoding 3.100
WAV To MP3
Seconds < Lower Is Better
Clang 12: -O2 ..................... 6.565 |===================================
Clang 12: -O3 -march=native ....... 6.098 |================================
Clang 12: -O3 -march=native -flto . 5.810 |===============================
GCC 11.1: -O3 -march=native -flto . 5.403 |=============================
GCC 11.1: -O3 -march=native ....... 5.503 |=============================
GCC 11.1: -O2 ..................... 6.802 |====================================

Opus Codec Encoding 1.3.1
WAV To Opus Encode
Seconds < Lower Is Better
Clang 12: -O2 ..................... 5.874 |================================
Clang 12: -O3 -march=native ....... 5.508 |==============================
Clang 12: -O3 -march=native -flto . 5.558 |===============================
GCC 11.1: -O3 -march=native -flto . 5.475 |==============================
GCC 11.1: -O3 -march=native ....... 5.387 |==============================
GCC 11.1: -O2 ..................... 6.515 |====================================

Liquid-DSP 2021.01.31
Threads: 8 - Buffer Length: 256 - Filter Length: 57
samples/s > Higher Is Better
Clang 12: -O2 ..................... 576660000 |==============================
Clang 12: -O3 -march=native ....... 576050000 |==============================
Clang 12: -O3 -march=native -flto . 587800000 |==============================
GCC 11.1: -O3 -march=native -flto . 619393333 |================================
GCC 11.1: -O3 -march=native ....... 611560000 |================================
GCC 11.1: -O2 ..................... 594850000 |===============================

Liquid-DSP 2021.01.31
Threads: 16 - Buffer Length: 256 - Filter Length: 57
samples/s > Higher Is Better
Clang 12: -O2 ..................... 1056400000 |==============================
Clang 12: -O3 -march=native ....... 1061900000 |==============================
Clang 12: -O3 -march=native -flto . 1073466667 |==============================
GCC 11.1: -O3 -march=native -flto . 1093266667 |===============================
GCC 11.1: -O3 -march=native ....... 1085000000 |===============================
GCC 11.1: -O2 ..................... 1041200000 |==============================

libjpeg-turbo tjbench 2.1.0
Test: Decompression Throughput
Megapixels/sec > Higher Is Better
Clang 12: -O2 ..................... 267.55 |==================================
Clang 12: -O3 -march=native ....... 272.51 |==================================
Clang 12: -O3 -march=native -flto . 266.23 |==================================
GCC 11.1: -O3 -march=native -flto . 270.91 |==================================
GCC 11.1: -O3 -march=native ....... 268.11 |==================================
GCC 11.1: -O2 ..................... 277.50 |===================================

ASTC Encoder 2.4
Preset: Medium
Seconds < Lower Is Better
Clang 12: -O2 ..................... 3.3367 |=========================
Clang 12: -O3 -march=native ....... 3.3179 |=========================
Clang 12: -O3 -march=native -flto . 3.3728 |=========================
GCC 11.1: -O3 -march=native -flto . 4.6995 |===================================
GCC 11.1: -O3 -march=native ....... 4.6277 |==================================
GCC 11.1: -O2 ..................... 4.6296 |==================================

ASTC Encoder 2.4
Preset: Thorough
Seconds < Lower Is Better
Clang 12: -O2 ..................... 9.3950 |===================================
Clang 12: -O3 -march=native ....... 9.3549 |===================================
Clang 12: -O3 -march=native -flto . 9.4432 |===================================
GCC 11.1: -O3 -march=native -flto . 7.5299 |============================
GCC 11.1: -O3 -march=native ....... 7.5718 |============================
GCC 11.1: -O2 ..................... 7.6841 |============================

ASTC Encoder 2.4
Preset: Exhaustive
Seconds < Lower Is Better
Clang 12: -O2 ..................... 52.20 |=================================
Clang 12: -O3 -march=native ....... 51.82 |=================================
Clang 12: -O3 -march=native -flto . 51.90 |=================================
GCC 11.1: -O3 -march=native -flto . 56.19 |===================================
GCC 11.1: -O3 -march=native ....... 56.64 |====================================
GCC 11.1: -O2 ..................... 57.14 |====================================

SQLite Speedtest 3.30
Timed Time - Size 1,000
Seconds < Lower Is Better
Clang 12: -O2 ..................... 47.79 |===================================
Clang 12: -O3 -march=native ....... 47.63 |===================================
Clang 12: -O3 -march=native -flto . 48.58 |====================================
GCC 11.1: -O3 -march=native -flto . 46.48 |==================================
GCC 11.1: -O3 -march=native ....... 46.92 |===================================
GCC 11.1: -O2 ..................... 47.40 |===================================

NCNN 20201218
Target: CPU - Model: mobilenet
ms < Lower Is Better
Clang 12: -O2 ..................... 11.58 |==============================
Clang 12: -O3 -march=native ....... 11.46 |==============================
Clang 12: -O3 -march=native -flto . 11.05 |=============================
GCC 11.1: -O3 -march=native -flto . 13.74 |====================================
GCC 11.1: -O3 -march=native ....... 12.70 |=================================
GCC 11.1: -O2 ..................... 12.95 |==================================

NCNN 20201218
Target: CPU-v2-v2 - Model: mobilenet-v2
ms < Lower Is Better
Clang 12: -O2 ..................... 3.56 |=============================
Clang 12: -O3 -march=native ....... 3.57 |=============================
Clang 12: -O3 -march=native -flto . 3.42 |============================
GCC 11.1: -O3 -march=native -flto . 4.46 |====================================
GCC 11.1: -O3 -march=native ....... 4.43 |====================================
GCC 11.1: -O2 ..................... 4.54 |=====================================

NCNN 20201218
Target: CPU-v3-v3 - Model: mobilenet-v3
ms < Lower Is Better
Clang 12: -O2 ..................... 3.36 |==============================
Clang 12: -O3 -march=native ....... 3.07 |===========================
Clang 12: -O3 -march=native -flto . 2.93 |==========================
GCC 11.1: -O3 -march=native -flto . 3.86 |==================================
GCC 11.1: -O3 -march=native ....... 3.84 |==================================
GCC 11.1: -O2 ..................... 4.15 |=====================================

NCNN 20201218
Target: CPU - Model: mnasnet
ms < Lower Is Better
Clang 12: -O2 ..................... 3.26 |==============================
Clang 12: -O3 -march=native ....... 3.26 |==============================
Clang 12: -O3 -march=native -flto . 3.08 |=============================
GCC 11.1: -O3 -march=native -flto . 3.92 |=====================================
GCC 11.1: -O3 -march=native ....... 3.95 |=====================================
GCC 11.1: -O2 ..................... 3.97 |=====================================

NCNN 20201218
Target: CPU - Model: efficientnet-b0
ms < Lower Is Better
Clang 12: -O2 ..................... 4.52 |===============================
Clang 12: -O3 -march=native ....... 4.51 |===============================
Clang 12: -O3 -march=native -flto . 4.30 |=============================
GCC 11.1: -O3 -march=native -flto . 5.36 |=====================================
GCC 11.1: -O3 -march=native ....... 5.37 |=====================================
GCC 11.1: -O2 ..................... 5.41 |=====================================

NCNN 20201218
Target: CPU - Model: googlenet
ms < Lower Is Better
Clang 12: -O2 ..................... 11.88 |================================
Clang 12: -O3 -march=native ....... 11.77 |================================
Clang 12: -O3 -march=native -flto . 11.62 |================================
GCC 11.1: -O3 -march=native -flto . 13.26 |====================================
GCC 11.1: -O3 -march=native ....... 12.85 |===================================
GCC 11.1: -O2 ..................... 13.15 |====================================

NCNN 20201218
Target: CPU - Model: vgg16
ms < Lower Is Better
Clang 12: -O2 ..................... 57.26 |====================================
Clang 12: -O3 -march=native ....... 56.93 |===================================
Clang 12: -O3 -march=native -flto . 56.11 |===================================
GCC 11.1: -O3 -march=native -flto . 57.84 |====================================
GCC 11.1: -O3 -march=native ....... 57.61 |====================================
GCC 11.1: -O2 ..................... 56.82 |===================================

NCNN 20201218
Target: CPU - Model: resnet18
ms < Lower Is Better
Clang 12: -O2 ..................... 14.05 |==================================
Clang 12: -O3 -march=native ....... 13.90 |==================================
Clang 12: -O3 -march=native -flto . 13.86 |==================================
GCC 11.1: -O3 -march=native -flto . 14.80 |====================================
GCC 11.1: -O3 -march=native ....... 14.33 |===================================
GCC 11.1: -O2 ..................... 14.50 |===================================

NCNN 20201218
Target: CPU - Model: alexnet
ms < Lower Is Better
Clang 12: -O2 ..................... 11.09 |===================================
Clang 12: -O3 -march=native ....... 11.16 |====================================
Clang 12: -O3 -march=native -flto . 11.25 |====================================
GCC 11.1: -O3 -march=native -flto . 11.13 |====================================
GCC 11.1: -O3 -march=native ....... 10.99 |===================================
GCC 11.1: -O2 ..................... 11.16 |====================================

NCNN 20201218
Target: CPU - Model: resnet50
ms < Lower Is Better
Clang 12: -O2 ..................... 22.99 |================================
Clang 12: -O3 -march=native ....... 23.25 |================================
Clang 12: -O3 -march=native -flto . 22.90 |===============================
GCC 11.1: -O3 -march=native -flto . 25.84 |====================================
GCC 11.1: -O3 -march=native ....... 25.33 |===================================
GCC 11.1: -O2 ..................... 26.18 |====================================

NCNN 20201218
Target: CPU - Model: squeezenet_ssd
ms < Lower Is Better
Clang 12: -O2 ..................... 12.61 |===============================
Clang 12: -O3 -march=native ....... 12.35 |==============================
Clang 12: -O3 -march=native -flto . 12.43 |===============================
GCC 11.1: -O3 -march=native -flto . 14.50 |====================================
GCC 11.1: -O3 -march=native ....... 13.93 |==================================
GCC 11.1: -O2 ..................... 14.61 |====================================

NCNN 20201218
Target: CPU - Model: regnety_400m
ms < Lower Is Better
Clang 12: -O2 ..................... 12.93 |=========================
Clang 12: -O3 -march=native ....... 12.59 |========================
Clang 12: -O3 -march=native -flto . 12.14 |========================
GCC 11.1: -O3 -march=native -flto . 18.53 |====================================
GCC 11.1: -O3 -march=native ....... 17.18 |=================================
GCC 11.1: -O2 ..................... 17.24 |=================================

TNN 0.2.3
Target: CPU - Model: MobileNet v2
ms < Lower Is Better
Clang 12: -O2 ..................... 271.03 |===========================
Clang 12: -O3 -march=native ....... 355.93 |===================================
Clang 12: -O3 -march=native -flto . 261.84 |==========================
GCC 11.1: -O3 -march=native -flto . 215.34 |=====================
GCC 11.1: -O3 -march=native ....... 227.77 |======================
GCC 11.1: -O2 ..................... 219.13 |======================

TNN 0.2.3
Target: CPU - Model: SqueezeNet v1.1
ms < Lower Is Better
Clang 12: -O2 ..................... 208.56 |==================================
Clang 12: -O3 -march=native ....... 215.78 |===================================
Clang 12: -O3 -march=native -flto . 207.93 |==================================
GCC 11.1: -O3 -march=native -flto . 202.57 |=================================
GCC 11.1: -O3 -march=native ....... 214.28 |===================================
GCC 11.1: -O2 ..................... 208.78 |==================================

NCNN 20201218
Target: CPU - Model: shufflenet-v2
ms < Lower Is Better
Clang 12: -O2 ..................... 3.81 |=======================
Clang 12: -O3 -march=native ....... 3.74 |=======================
Clang 12: -O3 -march=native -flto . 3.69 |=======================
GCC 11.1: -O3 -march=native -flto . 6.01 |=====================================
GCC 11.1: -O3 -march=native ....... 4.39 |===========================
GCC 11.1: -O2 ..................... 4.33 |===========================

NCNN 20201218
Target: CPU - Model: blazeface
ms < Lower Is Better
Clang 12: -O2 ..................... 1.54 |======================
Clang 12: -O3 -march=native ....... 1.52 |======================
Clang 12: -O3 -march=native -flto . 1.46 |=====================
GCC 11.1: -O3 -march=native -flto . 2.55 |=====================================
GCC 11.1: -O3 -march=native ....... 1.80 |==========================
GCC 11.1: -O2 ..................... 1.85 |===========================

NCNN 20201218
Target: CPU - Model: yolov4-tiny
ms < Lower Is Better
Clang 12: -O2 ..................... 21.97 |=================================
Clang 12: -O3 -march=native ....... 21.73 |=================================
Clang 12: -O3 -march=native -flto . 21.58 |================================
GCC 11.1: -O3 -march=native -flto . 23.94 |====================================
GCC 11.1: -O3 -march=native ....... 21.00 |================================
GCC 11.1: -O2 ..................... 21.06 |================================
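
Because some results in this listing are "Lower Is Better" and others "Higher Is Better", comparing configurations across tests is easier once each value is expressed relative to the best one in its block. The following is a minimal sketch of that arithmetic, not part of any benchmarking tool: the names `ROW`, `parse_rows`, and `slowdown_vs_best` are hypothetical, and the parser assumes one result row per line in the `label .... value |====` shape. The sample values are the C-Ray 1.1 results from this listing.

```python
import re

# One result row: "<label> <dot leader> <value> |<bar>" (assumed layout).
ROW = re.compile(r"^(?P<label>.+?)\s+\.+\s+(?P<value>\d+(?:\.\d+)?)\s+\|=*$")


def parse_rows(block):
    """Return {configuration label: numeric result} for every result row."""
    results = {}
    for line in block.splitlines():
        match = ROW.match(line.strip())
        if match:
            results[match.group("label")] = float(match.group("value"))
    return results


def slowdown_vs_best(block):
    """Return each configuration's factor relative to the best result.

    1.0 marks the best configuration; larger factors are worse. The
    direction is read from the block's "Lower Is Better" marker.
    """
    results = parse_rows(block)
    lower_is_better = "Lower Is Better" in block
    best = min(results.values()) if lower_is_better else max(results.values())
    return {
        label: (value / best if lower_is_better else best / value)
        for label, value in results.items()
    }


SAMPLE = """\
C-Ray 1.1
Total Time - 4K, 16 Rays Per Pixel
Seconds < Lower Is Better
Clang 12: -O2 ..................... 49.28 |=============================
Clang 12: -O3 -march=native ....... 45.01 |===========================
Clang 12: -O3 -march=native -flto . 44.74 |==========================
GCC 11.1: -O3 -march=native -flto . 25.48 |===============
GCC 11.1: -O3 -march=native ....... 25.41 |===============
GCC 11.1: -O2 ..................... 60.93 |====================================
"""

if __name__ == "__main__":
    for label, factor in slowdown_vs_best(SAMPLE).items():
        print(f"{label:<36} {factor:.2f}x")
```

Normalizing to a "1.0 is best" factor keeps time-based and throughput-based tests on the same scale, so the factors from several blocks could be averaged (for example with a geometric mean) if a single summary figure is wanted.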