AMD AOCC 2.2 vs. GCC vs. Clang - EPYC 7742 2P

AMD AOCC 2.2 compiler against GCC 10.2 and LLVM Clang 10.0.1. Benchmarks by Michael Larabel for a future article.

AOCC 2.2, GCC 10.2, Clang 10.1 (identical hardware and software apart from the compiler):

  Processor: 2 x AMD EPYC 7742 64-Core @ 2.25GHz (128 Cores / 256 Threads), Motherboard: AMD DAYTONA_X (RDY1006G BIOS), Chipset: AMD Starship/Matisse, Memory: 504GB, Disk: 3841GB Micron_9300_MTFDHAL3T8TDP, Graphics: ASPEED, Monitor: VE228, Network: 2 x Mellanox MT27710

  OS: Ubuntu 20.10, Kernel: 5.4.0-42-generic (x86_64), Desktop: GNOME Shell 3.36.4, Display Server: X Server 1.20.8, Display Driver: modesetting 1.20.8, File-System: ext4, Screen Resolution: 1920x1080

  Compiler:
    AOCC 2.2: Clang 10.0.0
    GCC 10.2: GCC 10.2.0
    Clang 10.1: Clang 10.0.1-1

Crypto++ 8.2
Test: Unkeyed Algorithms
MiB/second > Higher Is Better
AOCC 2.2 ... 317.38 |==========================================================
GCC 10.2 ... 310.14 |=========================================================
Clang 10.1 . 316.54 |==========================================================

Rodinia 3.1
Test: OpenMP Leukocyte
Seconds < Lower Is Better
AOCC 2.2 ... 50.14 |========================================================
GCC 10.2 ... 52.07 |==========================================================
Clang 10.1 . 52.71 |===========================================================

Rodinia 3.1
Test: OpenMP Streamcluster
Seconds < Lower Is Better
AOCC 2.2 ... 10.02 |=======================================================
GCC 10.2 ... 10.01 |=======================================================
Clang 10.1 . 10.71 |===========================================================

Timed MrBayes Analysis 3.2.7
Primate Phylogeny Analysis
Seconds < Lower Is Better
AOCC 2.2 ... 106.58 |========================================================
GCC 10.2 ... 109.59 |=========================================================
Clang 10.1 . 111.35 |==========================================================

Timed HMMer Search 2.3.2
Pfam Database Search
Seconds < Lower Is Better
AOCC 2.2 ... 5.608 |======================================================
GCC 10.2 ... 6.101 |===========================================================
Clang 10.1 . 4.888 |===============================================

Zstd Compression 1.4.5
Compression Level: 19
MB/s > Higher Is Better
AOCC 2.2 ... 128.3 |===========================================================
GCC 10.2 ... 125.7 |==========================================================
Clang 10.1 . 120.7 |========================================================

SciMark 2.0
Computational Test: Monte Carlo
Mflops > Higher Is Better
AOCC 2.2 ... 620.89 |==========================================================
GCC 10.2 ... 611.72 |=========================================================
Clang 10.1 . 620.64 |==========================================================

SciMark 2.0
Computational Test: Sparse Matrix Multiply
Mflops > Higher Is Better
AOCC 2.2 ... 3450.97 |=========================================================
GCC 10.2 ... 2811.30 |==============================================
Clang 10.1 . 3462.03 |=========================================================

TSCP 1.81
AI Chess Performance
Nodes Per Second > Higher Is Better
AOCC 2.2 ... 1163950 |========================================================
GCC 10.2 ... 1014328 |=================================================
Clang 10.1 . 1178898 |=========================================================

John The Ripper 1.9.0-jumbo-1
Test: Blowfish
Real C/S > Higher Is Better
AOCC 2.2 ... 176970 |==========================================================
GCC 10.2 ... 149154 |=================================================
Clang 10.1 . 177644 |==========================================================

GraphicsMagick 1.3.33
Operation: Rotate
Iterations Per Minute > Higher Is Better
AOCC 2.2 ... 529 |=============================================================
GCC 10.2 ... 497 |=========================================================
Clang 10.1 . 527 |=============================================================

GraphicsMagick 1.3.33
Operation: Enhanced
Iterations Per Minute > Higher Is Better
AOCC 2.2 ... 1108 |==============================================
GCC 10.2 ... 1439 |============================================================
Clang 10.1 . 1327 |=======================================================

GraphicsMagick 1.3.33
Operation: Resizing
Iterations Per Minute > Higher Is Better
AOCC 2.2 ... 351 |=============================================================
GCC 10.2 ... 107 |===================
Clang 10.1 . 162 |============================

oneDNN 1.5
Harness: IP Batch 1D - Data Type: f32 - Engine: CPU
ms < Lower Is Better
AOCC 2.2 ... 0.812279 |=======================
GCC 10.2 ... 1.983800 |========================================================
Clang 10.1 . 0.925616 |==========================

oneDNN 1.5
Harness: IP Batch All - Data Type: f32 - Engine: CPU
ms < Lower Is Better
AOCC 2.2 ... 12.68 |========================================
GCC 10.2 ... 18.87 |===========================================================
Clang 10.1 . 12.91 |========================================

oneDNN 1.5
Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU
ms < Lower Is Better
AOCC 2.2 ... 0.506847 |======================================
GCC 10.2 ... 0.744088 |========================================================
Clang 10.1 . 0.537037 |========================================

oneDNN 1.5
Harness: Deconvolution Batch deconv_1d - Data Type: f32 - Engine: CPU
ms < Lower Is Better
AOCC 2.2 ... 0.989625 |====================
GCC 10.2 ... 2.826880 |========================================================
Clang 10.1 . 1.057690 |=====================

oneDNN 1.5
Harness: Deconvolution Batch deconv_3d - Data Type: f32 - Engine: CPU
ms < Lower Is Better
AOCC 2.2 ... 2.57847 |=================================================
GCC 10.2 ... 2.98436 |=========================================================
Clang 10.1 . 2.62134 |==================================================

oneDNN 1.5
Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU
ms < Lower Is Better
AOCC 2.2 ... 162.42 |===========
GCC 10.2 ... 870.72 |==========================================================
Clang 10.1 . 219.27 |===============

oneDNN 1.5
Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU
ms < Lower Is Better
AOCC 2.2 ... 61.87 |==========
GCC 10.2 ... 351.23 |==========================================================
Clang 10.1 . 87.60 |==============

oneDNN 1.5
Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU
ms < Lower Is Better
AOCC 2.2 ... 0.228572 |=================
GCC 10.2 ... 0.737722 |========================================================
Clang 10.1 . 0.272380 |=====================

SVT-VP9 0.1
Tuning: VMAF Optimized - Input: Bosphorus 1080p
Frames Per Second > Higher Is Better
AOCC 2.2 ... 392.64 |==========================================================
GCC 10.2 ... 376.61 |========================================================
Clang 10.1 . 366.54 |======================================================

SVT-VP9 0.1
Tuning: PSNR/SSIM Optimized - Input: Bosphorus 1080p
Frames Per Second > Higher Is Better
AOCC 2.2 ... 412.51 |==========================================================
GCC 10.2 ... 382.50 |======================================================
Clang 10.1 . 388.72 |=======================================================

SVT-VP9 0.1
Tuning: Visual Quality Optimized - Input: Bosphorus 1080p
Frames Per Second > Higher Is Better
AOCC 2.2 ... 336.19 |==========================================================
GCC 10.2 ... 311.85 |======================================================
Clang 10.1 . 321.84 |========================================================

x264 2019-12-17
H.264 Video Encoding
Frames Per Second > Higher Is Better
AOCC 2.2 ... 204.60 |==========================================================
GCC 10.2 ... 204.52 |==========================================================
Clang 10.1 . 195.51 |=======================================================

Stockfish 9
Total Time
Nodes Per Second > Higher Is Better
AOCC 2.2 ... 254824742 |=======================================================
GCC 10.2 ... 242098165 |====================================================
Clang 10.1 . 257008114 |=======================================================

Timed Apache Compilation 2.4.41
Time To Compile
Seconds < Lower Is Better
AOCC 2.2 ... 41.25 |===========================================================
GCC 10.2 ... 23.21 |=================================
Clang 10.1 . 21.74 |===============================

Timed FFmpeg Compilation 4.2.2
Time To Compile
Seconds < Lower Is Better
AOCC 2.2 ... 39.87 |===========================================================
GCC 10.2 ... 16.44 |========================
Clang 10.1 . 21.82 |================================

Timed MPlayer Compilation 1.4
Time To Compile
Seconds < Lower Is Better
AOCC 2.2 ... 35.88 |===========================================================
GCC 10.2 ... 10.70 |==================
Clang 10.1 . 18.09 |==============================

Bullet Physics Engine 2.81
Test: 3000 Fall
Seconds < Lower Is Better
AOCC 2.2 ... 4.310990 |=======================================================
GCC 10.2 ... 4.313341 |=======================================================
Clang 10.1 . 4.387945 |========================================================

Bullet Physics Engine 2.81
Test: 1000 Convex
Seconds < Lower Is Better
AOCC 2.2 ... 4.557857 |====================================================
GCC 10.2 ... 4.873698 |========================================================
Clang 10.1 . 4.558771 |====================================================

Bullet Physics Engine 2.81
Test: Prim Trimesh
Seconds < Lower Is Better
AOCC 2.2 ... 1.009440 |======================================================
GCC 10.2 ... 1.039175 |========================================================
Clang 10.1 . 1.021595 |=======================================================

Bullet Physics Engine 2.81
Test: Convex Trimesh
Seconds < Lower Is Better
AOCC 2.2 ... 1.183395 |======================================================
GCC 10.2 ... 1.233930 |========================================================
Clang 10.1 . 1.176618 |=====================================================

OpenSSL 1.1.1
RSA 4096-bit Performance
Signs Per Second > Higher Is Better
AOCC 2.2 ... 18574.8 |===========================================
GCC 10.2 ... 24437.6 |=========================================================
Clang 10.1 . 18618.0 |===========================================

LevelDB 1.22
Benchmark: Hot Read
Microseconds Per Op < Lower Is Better
AOCC 2.2 ... 286.17 |=========================================================
GCC 10.2 ... 289.70 |=========================================================
Clang 10.1 . 293.72 |==========================================================

ASTC Encoder 2.0
Preset: Fast
Seconds < Lower Is Better
AOCC 2.2 ... 4.99 |=====================================================
GCC 10.2 ... 5.61 |============================================================
Clang 10.1 . 5.17 |=======================================================

ASTC Encoder 2.0
Preset: Medium
Seconds < Lower Is Better
AOCC 2.2 ... 5.67 |======================================================
GCC 10.2 ... 6.25 |============================================================
Clang 10.1 . 5.73 |=======================================================

ASTC Encoder 2.0
Preset: Thorough
Seconds < Lower Is Better
AOCC 2.2 ... 8.04 |=====================================================
GCC 10.2 ... 9.09 |============================================================
Clang 10.1 . 8.11 |======================================================

ASTC Encoder 2.0
Preset: Exhaustive
Seconds < Lower Is Better
AOCC 2.2 ... 19.95 |======================================================
GCC 10.2 ... 22.00 |===========================================================
Clang 10.1 . 19.71 |=====================================================

Basis Universal 1.12
Settings: UASTC Level 3
Seconds < Lower Is Better
AOCC 2.2 ... 13.74 |===========================================================
GCC 10.2 ... 13.76 |===========================================================
Clang 10.1 . 13.57 |==========================================================

CppPerformanceBenchmarks 9
Test: Ctype
Seconds < Lower Is Better
AOCC 2.2 ... 40.21 |=======================================================
GCC 10.2 ... 43.27 |===========================================================
Clang 10.1 . 40.34 |=======================================================

CppPerformanceBenchmarks 9
Test: Math Library
Seconds < Lower Is Better
AOCC 2.2 ... 334.75 |=========================================================
GCC 10.2 ... 339.96 |==========================================================
Clang 10.1 . 331.62 |=========================================================

CppPerformanceBenchmarks 9
Test: Stepanov Vector
Seconds < Lower Is Better
AOCC 2.2 ... 90.50 |======================================================
GCC 10.2 ... 99.19 |===========================================================
Clang 10.1 . 86.12 |===================================================

CppPerformanceBenchmarks 9
Test: Stepanov Abstraction
Seconds < Lower Is Better
AOCC 2.2 ... 34.01 |======================================================
GCC 10.2 ... 36.97 |===========================================================
Clang 10.1 . 33.55 |======================================================

Apache Benchmark 2.4.29
Static Web Page Serving
Requests Per Second > Higher Is Better
AOCC 2.2 ... 27488.36 |========================================================
GCC 10.2 ... 26062.36 |=====================================================
Clang 10.1 . 26903.49 |=======================================================

BRL-CAD 7.30.8
VGR Performance Metric
VGR Performance Metric > Higher Is Better
AOCC 2.2 ... 2693129 |====================================================
GCC 10.2 ... 2951330 |=========================================================
Clang 10.1 . 2779511 |======================================================
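The results above mix Higher Is Better and Lower Is Better metrics. To compare the compilers across tests, each result has to be normalized so that a ratio above 1.0 always means "better than the baseline". The minimal Python sketch below shows that arithmetic. It assumes GCC 10.2 as the baseline and a geometric mean as the summary statistic. Both are illustrative choices, not necessarily what the article itself uses. The function names and the four-test subset are hypothetical. The values are copied from the results above, but four tests are far too few to rank the compilers on this run.

```python
import math
from typing import Dict, List, Tuple

# (test, higher_is_better, {configuration: value}); values copied from the results above.
# Hypothetical four-test subset chosen only to exercise both metric directions.
RESULTS: List[Tuple[str, bool, Dict[str, float]]] = [
    ("GraphicsMagick 1.3.33 Resizing (Iterations Per Minute)", True,
     {"AOCC 2.2": 351.0, "GCC 10.2": 107.0, "Clang 10.1": 162.0}),
    ("oneDNN 1.5 RNN Training (ms)", False,
     {"AOCC 2.2": 162.42, "GCC 10.2": 870.72, "Clang 10.1": 219.27}),
    ("Timed Apache Compilation 2.4.41 (Seconds)", False,
     {"AOCC 2.2": 41.25, "GCC 10.2": 23.21, "Clang 10.1": 21.74}),
    ("OpenSSL 1.1.1 RSA 4096-bit (Signs Per Second)", True,
     {"AOCC 2.2": 18574.8, "GCC 10.2": 24437.6, "Clang 10.1": 18618.0}),
]


def relative_speed(value: float, baseline: float, higher_is_better: bool) -> float:
    """Ratio > 1.0 means `value` beats `baseline`, whatever the metric direction."""
    return value / baseline if higher_is_better else baseline / value


def geomean(ratios: List[float]) -> float:
    """Geometric mean, computed in log space."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


def summarize(results: List[Tuple[str, bool, Dict[str, float]]],
              baseline: str) -> Dict[str, float]:
    """Geometric mean of per-test relative speed for each configuration vs. `baseline`."""
    configs = list(results[0][2])
    return {
        cfg: geomean([relative_speed(vals[cfg], vals[baseline], hib)
                      for _, hib, vals in results])
        for cfg in configs
    }


if __name__ == "__main__":
    # Per-test ratio of AOCC 2.2 against the GCC 10.2 baseline.
    for test, hib, vals in RESULTS:
        ratio = relative_speed(vals["AOCC 2.2"], vals["GCC 10.2"], hib)
        print(f"{test}: AOCC 2.2 at {ratio:.2f}x GCC 10.2")
    # One summary figure per configuration over the subset.
    for cfg, score in summarize(RESULTS, baseline="GCC 10.2").items():
        print(f"{cfg:<10} geometric mean: {score:.3f}x vs. GCC 10.2")
```

A geometric mean treats a 2x win and a 2x loss symmetrically, so the two cancel to 1.0. An arithmetic mean of the same ratios (2.0 and 0.5) would instead report 1.25.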