GCC 8.0 vs. Clang 6.0 AMD EPYC Tuning Comparison

Tests for a future article on Phoronix.

GCC 8.0: -march=x86-64:

  Processor: AMD EPYC 7601 32-Core @ 2.20GHz (64 Cores), Motherboard: TYAN B8026T70AE24HR, Chipset: AMD Device 1450, Memory: 126976MB, Disk: 280GB INTEL SSDPE21D280GA, Graphics: ASPEED ASPEED Family, Monitor: VE228, Network: Broadcom Limited NetXtreme BCM5720 Gigabit PCIe

  OS: Ubuntu 17.10, Kernel: 4.13.0-21-generic (x86_64), Desktop: GNOME Shell 3.26.1, Display Driver: modesetting 1.19.5, OpenCL: OpenCL 1.2 pocl 1.0 LLVM 5.0.0, Compiler: GCC 8.0.0 20171231 + clang (GCC) 8.0.0 20171231 (experimental) + LLVM 5.0.0, File-System: ext4, Screen Resolution: 1920x1080, System Layer: vm-other Xen 4.9.0 Hypervisor

GCC 8.0: -march=znver1:

  Processor: AMD EPYC 7601 32-Core @ 2.20GHz (64 Cores), Motherboard: TYAN B8026T70AE24HR, Chipset: AMD Device 1450, Memory: 126976MB, Disk: 280GB INTEL SSDPE21D280GA, Graphics: ASPEED ASPEED Family, Monitor: VE228, Network: Broadcom Limited NetXtreme BCM5720 Gigabit PCIe

  OS: Ubuntu 17.10, Kernel: 4.13.0-21-generic (x86_64), Desktop: GNOME Shell 3.26.1, Display Driver: modesetting 1.19.5, OpenCL: OpenCL 1.2 pocl 1.0 LLVM 5.0.0, Compiler: GCC 8.0.0 20171231 + clang (GCC) 8.0.0 20171231 (experimental) + LLVM 5.0.0, File-System: ext4, Screen Resolution: 1920x1080, System Layer: vm-other Xen 4.9.0 Hypervisor

Clang 6.0: -march=x86-64:

  Processor: AMD EPYC 7601 32-Core @ 2.20GHz (64 Cores), Motherboard: TYAN B8026T70AE24HR, Chipset: AMD Device 1450, Memory: 126976MB, Disk: 280GB INTEL SSDPE21D280GA, Graphics: ASPEED ASPEED Family, Monitor: VE228, Network: Broadcom Limited NetXtreme BCM5720 Gigabit PCIe

  OS: Ubuntu 17.10, Kernel: 4.13.0-21-generic (x86_64), Desktop: GNOME Shell 3.26.1, Display Driver: modesetting 1.19.5, OpenCL: OpenCL 1.2 pocl 1.0 LLVM 5.0.0, Compiler: Clang 6.0.0 (SVN 321623) + LLVM 6.0.0svn, File-System: ext4, Screen Resolution: 1920x1080, System Layer: vm-other Xen 4.9.0 Hypervisor

Clang 6.0: -march=znver1:

  Processor: AMD EPYC 7601 32-Core @ 2.20GHz (64 Cores), Motherboard: TYAN B8026T70AE24HR, Chipset: AMD Device 1450, Memory: 126976MB, Disk: 280GB INTEL SSDPE21D280GA, Graphics: ASPEED ASPEED Family, Monitor: VE228, Network: Broadcom Limited NetXtreme BCM5720 Gigabit PCIe

  OS: Ubuntu 17.10, Kernel: 4.13.0-21-generic (x86_64), Desktop: GNOME Shell 3.26.1, Display Driver: modesetting 1.19.5, OpenCL: OpenCL 1.2 pocl 1.0 LLVM 5.0.0, Compiler: Clang 6.0.0 (SVN 321623) + LLVM 6.0.0svn, File-System: ext4, Screen Resolution: 1920x1080, System Layer: vm-other Xen 4.9.0 Hypervisor

SQLite 3.8.10.2
Test Target: Default Test Directory
Seconds < Lower Is Better
GCC 8.0: -march=x86-64 ... 7.61 |==============================================
GCC 8.0: -march=znver1 ... 7.16 |===========================================
Clang 6.0: -march=x86-64 . 7.53 |==============================================
Clang 6.0: -march=znver1 . 7.48 |=============================================

PolyBench-C 3.2
Test: 3 Matrix Multiplications
Seconds < Lower Is Better
GCC 8.0: -march=x86-64 ... 60.68 |==========================================
GCC 8.0: -march=znver1 ... 65.45 |=============================================
Clang 6.0: -march=x86-64 . 62.98 |===========================================
Clang 6.0: -march=znver1 . 62.75 |===========================================

FFTW 3.3.6
Build: Stock - Size: 2D FFT Size 4096
Mflops > Higher Is Better
GCC 8.0: -march=x86-64 ... 4959.73 |======================================
GCC 8.0: -march=znver1 ... 5627.83 |===========================================
Clang 6.0: -march=x86-64 . 4660.83 |====================================
Clang 6.0: -march=znver1 . 5031.60 |======================================

FFTW 3.3.6
Build: Float + SSE - Size: 2D FFT Size 4096
Mflops > Higher Is Better
GCC 8.0: -march=znver1 ... 13630 |=============================================
Clang 6.0: -march=x86-64 . 13649 |=============================================
Clang 6.0: -march=znver1 . 12481 |=========================================

Timed HMMer Search 2.3.2
Pfam Database Search
Seconds < Lower Is Better
GCC 8.0: -march=x86-64 ... 13.65 |=============================================
GCC 8.0: -march=znver1 ... 12.40 |=========================================
Clang 6.0: -march=x86-64 . 12.85 |==========================================
Clang 6.0: -march=znver1 . 11.09 |=====================================

SciMark 2.0
Computational Test: Composite
Mflops > Higher Is Better
GCC 8.0: -march=x86-64 ... 1579.48 |========================================
GCC 8.0: -march=znver1 ... 1680.45 |===========================================
Clang 6.0: -march=x86-64 . 1479.53 |=====================================
Clang 6.0: -march=znver1 . 1699.32 |===========================================

SciMark 2.0
Computational Test: Monte Carlo
Mflops > Higher Is Better
GCC 8.0: -march=x86-64 ... 561.03 |============================================
GCC 8.0: -march=znver1 ... 555.76 |============================================
Clang 6.0: -march=x86-64 . 531.38 |==========================================
Clang 6.0: -march=znver1 . 552.19 |===========================================

SciMark 2.0
Computational Test: Fast Fourier Transform
Mflops > Higher Is Better
GCC 8.0: -march=x86-64 ... 233.89 |============================================
GCC 8.0: -march=znver1 ... 231.09 |===========================================
Clang 6.0: -march=x86-64 . 179.29 |==================================
Clang 6.0: -march=znver1 . 226.68 |===========================================

SciMark 2.0
Computational Test: Sparse Matrix Multiply
Mflops > Higher Is Better
GCC 8.0: -march=x86-64 ... 2263.87 |===========================================
GCC 8.0: -march=znver1 ... 2259.95 |===========================================
Clang 6.0: -march=x86-64 . 2190.10 |==========================================
Clang 6.0: -march=znver1 . 2258.64 |===========================================

SciMark 2.0
Computational Test: Dense LU Matrix Factorization
Mflops > Higher Is Better
GCC 8.0: -march=x86-64 ... 3513.11 |=====================================
GCC 8.0: -march=znver1 ... 3678.86 |=======================================
Clang 6.0: -march=x86-64 . 3190.43 |==================================
Clang 6.0: -march=znver1 . 4034.89 |===========================================

SciMark 2.0
Computational Test: Jacobi Successive Over-Relaxation
Mflops > Higher Is Better
GCC 8.0: -march=x86-64 ... 1423.14 |====================================
GCC 8.0: -march=znver1 ... 1676.62 |===========================================
Clang 6.0: -march=x86-64 . 1110.65 |============================
Clang 6.0: -march=znver1 . 1424.21 |=====================================

TSCP 1.81
AI Chess Performance
Nodes Per Second > Higher Is Better
GCC 8.0: -march=x86-64 ... 874251 |==========================================
GCC 8.0: -march=znver1 ... 875085 |==========================================
Clang 6.0: -march=x86-64 . 917658 |============================================
Clang 6.0: -march=znver1 . 918269 |============================================

GraphicsMagick 1.3.19
Operation: Blur
Iterations Per Minute > Higher Is Better
GCC 8.0: -march=x86-64 ... 116 |============================================
GCC 8.0: -march=znver1 ... 123 |===============================================
Clang 6.0: -march=x86-64 . 101 |=======================================
Clang 6.0: -march=znver1 . 104 |========================================

GraphicsMagick 1.3.19
Operation: Sharpen
Iterations Per Minute > Higher Is Better
GCC 8.0: -march=x86-64 ... 157 |=============================================
GCC 8.0: -march=znver1 ... 165 |===============================================
Clang 6.0: -march=x86-64 . 131 |=====================================
Clang 6.0: -march=znver1 . 136 |=======================================

GraphicsMagick 1.3.19
Operation: HWB Color Space
Iterations Per Minute > Higher Is Better
GCC 8.0: -march=x86-64 ... 177 |=============================================
GCC 8.0: -march=znver1 ... 186 |===============================================
Clang 6.0: -march=x86-64 . 150 |======================================
Clang 6.0: -march=znver1 . 155 |=======================================

GraphicsMagick 1.3.19
Operation: Local Adaptive Thresholding
Iterations Per Minute > Higher Is Better
GCC 8.0: -march=x86-64 ... 92 |=============================================
GCC 8.0: -march=znver1 ... 95 |===============================================
Clang 6.0: -march=x86-64 . 97 |================================================
Clang 6.0: -march=znver1 . 98 |================================================

Himeno Benchmark 3.0
Poisson Pressure Solver
MFLOPS > Higher Is Better
GCC 8.0: -march=x86-64 ... 949.19 |=======================================
GCC 8.0: -march=znver1 ... 935.64 |======================================
Clang 6.0: -march=x86-64 . 1032.71 |==========================================
Clang 6.0: -march=znver1 . 1052.47 |===========================================

ebizzy 0.3
Records/s > Higher Is Better
GCC 8.0: -march=x86-64 ... 1126032 |==========================================
GCC 8.0: -march=znver1 ... 1101176 |=========================================
Clang 6.0: -march=x86-64 . 1076648 |========================================
Clang 6.0: -march=znver1 . 1145405 |===========================================

C-Ray 1.1
Total Time
Seconds < Lower Is Better
GCC 8.0: -march=x86-64 ... 3.93 |========================================
GCC 8.0: -march=znver1 ... 3.37 |==================================
Clang 6.0: -march=x86-64 . 4.53 |==============================================
Clang 6.0: -march=znver1 . 4.48 |=============================================

Bullet Physics Engine 2.81
Test: Raytests
Seconds < Lower Is Better
GCC 8.0: -march=x86-64 ... 3.12 |=============================================
GCC 8.0: -march=znver1 ... 3.06 |============================================
Clang 6.0: -march=x86-64 . 3.22 |==============================================
Clang 6.0: -march=znver1 . 3.18 |=============================================

Bullet Physics Engine 2.81
Test: 3000 Fall
Seconds < Lower Is Better
GCC 8.0: -march=x86-64 ... 5.34 |=============================================
GCC 8.0: -march=znver1 ... 5.27 |============================================
Clang 6.0: -march=x86-64 . 5.48 |==============================================
Clang 6.0: -march=znver1 . 5.34 |=============================================

Bullet Physics Engine 2.81
Test: 1000 Stack
Seconds < Lower Is Better
GCC 8.0: -march=x86-64 ... 6.18 |=============================================
GCC 8.0: -march=znver1 ... 5.93 |===========================================
Clang 6.0: -march=x86-64 . 6.30 |==============================================
Clang 6.0: -march=znver1 . 6.08 |============================================

Bullet Physics Engine 2.81
Test: 1000 Convex
Seconds < Lower Is Better
GCC 8.0: -march=x86-64 ... 5.44 |==============================================
GCC 8.0: -march=znver1 ... 5.28 |=============================================
Clang 6.0: -march=x86-64 . 5.43 |==============================================
Clang 6.0: -march=znver1 . 5.31 |=============================================

Bullet Physics Engine 2.81
Test: 136 Ragdolls
Seconds < Lower Is Better
GCC 8.0: -march=x86-64 ... 3.26 |==============================================
GCC 8.0: -march=znver1 ... 3.19 |=============================================
Clang 6.0: -march=x86-64 . 3.28 |==============================================
Clang 6.0: -march=znver1 . 3.23 |=============================================

Bullet Physics Engine 2.81
Test: Prim Trimesh
Seconds < Lower Is Better
GCC 8.0: -march=x86-64 ... 1.10 |==============================================
GCC 8.0: -march=znver1 ... 1.10 |==============================================
Clang 6.0: -march=x86-64 . 1.10 |==============================================
Clang 6.0: -march=znver1 . 1.09 |==============================================

Bullet Physics Engine 2.81
Test: Convex Trimesh
Seconds < Lower Is Better
GCC 8.0: -march=x86-64 ... 1.34 |==============================================
GCC 8.0: -march=znver1 ... 1.30 |=============================================
Clang 6.0: -march=x86-64 . 1.33 |==============================================
Clang 6.0: -march=znver1 . 1.32 |=============================================

FLAC Audio Encoding 1.3.1
WAV To FLAC
Seconds < Lower Is Better
GCC 8.0: -march=x86-64 ... 7.12 |=========================================
GCC 8.0: -march=znver1 ... 7.45 |===========================================
Clang 6.0: -march=x86-64 . 7.94 |==============================================
Clang 6.0: -march=znver1 . 6.63 |======================================

LAME MP3 Encoding 3.99.5
WAV To MP3
Seconds < Lower Is Better
GCC 8.0: -march=x86-64 ... 11.10 |=======================================
GCC 8.0: -march=znver1 ... 10.81 |======================================
Clang 6.0: -march=x86-64 . 11.33 |========================================
Clang 6.0: -march=znver1 . 12.81 |=============================================

Apache Benchmark 2.4.7
Static Web Page Serving
Requests Per Second > Higher Is Better
GCC 8.0: -march=x86-64 ... 9841.30 |===========================================
GCC 8.0: -march=znver1 ... 9791.23 |===========================================
Clang 6.0: -march=x86-64 . 9531.43 |==========================================
Clang 6.0: -march=znver1 . 9663.93 |==========================================
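
As a reading aid, the effect of -march=znver1 relative to -march=x86-64 can be expressed as a percentage per compiler. The following is a minimal Python sketch, not part of the Phoronix Test Suite output: the names `RESULTS`, `relative_gain_pct`, and `summarize` are hypothetical, only four of the tests above are included, and the numbers are copied from those results. For "lower is better" tests it assumes the gain is the speedup (baseline / tuned - 1), which is one of several reasonable conventions.

```python
# Each entry: (test name, higher_is_better, {compiler: (x86-64 result, znver1 result)}).
# The numbers are copied from the results listed above; the structure is illustrative.
RESULTS = [
    ("SQLite 3.8.10.2 (Seconds)", False,
     {"GCC 8.0": (7.61, 7.16), "Clang 6.0": (7.53, 7.48)}),
    ("C-Ray 1.1 (Seconds)", False,
     {"GCC 8.0": (3.93, 3.37), "Clang 6.0": (4.53, 4.48)}),
    ("FFTW 3.3.6 Stock (Mflops)", True,
     {"GCC 8.0": (4959.73, 5627.83), "Clang 6.0": (4660.83, 5031.60)}),
    ("SciMark 2.0 Composite (Mflops)", True,
     {"GCC 8.0": (1579.48, 1680.45), "Clang 6.0": (1479.53, 1699.32)}),
]


def relative_gain_pct(baseline, tuned, higher_is_better):
    """Percent gain of the tuned build over the baseline.

    Positive means the tuned (-march=znver1) build did better. For
    lower-is-better metrics the ratio is inverted so the sign keeps
    the same meaning (assumed convention: speedup, not time saved).
    """
    ratio = tuned / baseline if higher_is_better else baseline / tuned
    return (ratio - 1.0) * 100.0


def summarize(results):
    """Return a list of (test, compiler, gain_pct) rows."""
    rows = []
    for test, higher_is_better, by_compiler in results:
        for compiler, (baseline, tuned) in by_compiler.items():
            gain = relative_gain_pct(baseline, tuned, higher_is_better)
            rows.append((test, compiler, gain))
    return rows


if __name__ == "__main__":
    for test, compiler, gain in summarize(RESULTS):
        print(f"{test:32s} {compiler:10s} {gain:+6.1f}%")
```

A negative value from `relative_gain_pct` indicates that the znver1 build was slower than the generic x86-64 build for that test, as with GCC 8.0 on PolyBench-C above (60.68 vs. 65.45 seconds).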