GCC 8.0 vs. Clang 6.0 AMD EPYC Tuning Comparison

Tests for a future article on Phoronix.

Clang 6.0: -march=znver1:

  Processor: AMD EPYC 7601 32-Core @ 2.20GHz (64 Cores), Motherboard: TYAN B8026T70AE24HR, Chipset: AMD Device 1450, Memory: 126976MB, Disk: 280GB INTEL SSDPE21D280GA, Graphics: ASPEED ASPEED Family, Monitor: VE228, Network: Broadcom Limited NetXtreme BCM5720 Gigabit PCIe

  OS: Ubuntu 17.10, Kernel: 4.13.0-21-generic (x86_64), Desktop: GNOME Shell 3.26.1, Display Driver: modesetting 1.19.5, OpenCL: OpenCL 1.2 pocl 1.0 LLVM 5.0.0, Compiler: Clang 6.0.0 (SVN 321623) + LLVM 6.0.0svn, File-System: ext4, Screen Resolution: 1920x1080, System Layer: vm-other Xen 4.9.0 Hypervisor

Clang 6.0: -march=x86-64:

  Processor: AMD EPYC 7601 32-Core @ 2.20GHz (64 Cores), Motherboard: TYAN B8026T70AE24HR, Chipset: AMD Device 1450, Memory: 126976MB, Disk: 280GB INTEL SSDPE21D280GA, Graphics: ASPEED ASPEED Family, Monitor: VE228, Network: Broadcom Limited NetXtreme BCM5720 Gigabit PCIe

  OS: Ubuntu 17.10, Kernel: 4.13.0-21-generic (x86_64), Desktop: GNOME Shell 3.26.1, Display Driver: modesetting 1.19.5, OpenCL: OpenCL 1.2 pocl 1.0 LLVM 5.0.0, Compiler: Clang 6.0.0 (SVN 321623) + LLVM 6.0.0svn, File-System: ext4, Screen Resolution: 1920x1080, System Layer: vm-other Xen 4.9.0 Hypervisor

GCC 8.0: -march=znver1:

  Processor: AMD EPYC 7601 32-Core @ 2.20GHz (64 Cores), Motherboard: TYAN B8026T70AE24HR, Chipset: AMD Device 1450, Memory: 126976MB, Disk: 280GB INTEL SSDPE21D280GA, Graphics: ASPEED ASPEED Family, Monitor: VE228, Network: Broadcom Limited NetXtreme BCM5720 Gigabit PCIe

  OS: Ubuntu 17.10, Kernel: 4.13.0-21-generic (x86_64), Desktop: GNOME Shell 3.26.1, Display Driver: modesetting 1.19.5, OpenCL: OpenCL 1.2 pocl 1.0 LLVM 5.0.0, Compiler: GCC 8.0.0 20171231 + clang (GCC) 8.0.0 20171231 (experimental) + LLVM 5.0.0, File-System: ext4, Screen Resolution: 1920x1080, System Layer: vm-other Xen 4.9.0 Hypervisor

GCC 8.0: -march=x86-64:

  Processor: AMD EPYC 7601 32-Core @ 2.20GHz (64 Cores), Motherboard: TYAN B8026T70AE24HR, Chipset: AMD Device 1450, Memory: 126976MB, Disk: 280GB INTEL SSDPE21D280GA, Graphics: ASPEED ASPEED Family, Monitor: VE228, Network: Broadcom Limited NetXtreme BCM5720 Gigabit PCIe

  OS: Ubuntu 17.10, Kernel: 4.13.0-21-generic (x86_64), Desktop: GNOME Shell 3.26.1, Display Driver: modesetting 1.19.5, OpenCL: OpenCL 1.2 pocl 1.0 LLVM 5.0.0, Compiler: GCC 8.0.0 20171231 + clang (GCC) 8.0.0 20171231 (experimental) + LLVM 5.0.0, File-System: ext4, Screen Resolution: 1920x1080, System Layer: vm-other Xen 4.9.0 Hypervisor

Bullet Physics Engine 2.81
Test: Raytests
Seconds < Lower Is Better
Clang 6.0: -march=znver1 . 3.18 |=============================================
Clang 6.0: -march=x86-64 . 3.22 |==============================================
GCC 8.0: -march=znver1 ... 3.06 |============================================
GCC 8.0: -march=x86-64 ... 3.12 |=============================================

Bullet Physics Engine 2.81
Test: 3000 Fall
Seconds < Lower Is Better
Clang 6.0: -march=znver1 . 5.34 |=============================================
Clang 6.0: -march=x86-64 . 5.48 |==============================================
GCC 8.0: -march=znver1 ... 5.27 |============================================
GCC 8.0: -march=x86-64 ... 5.34 |=============================================

Bullet Physics Engine 2.81
Test: 1000 Stack
Seconds < Lower Is Better
Clang 6.0: -march=znver1 . 6.08 |============================================
Clang 6.0: -march=x86-64 . 6.30 |==============================================
GCC 8.0: -march=znver1 ... 5.93 |===========================================
GCC 8.0: -march=x86-64 ... 6.18 |=============================================

Bullet Physics Engine 2.81
Test: 1000 Convex
Seconds < Lower Is Better
Clang 6.0: -march=znver1 . 5.31 |=============================================
Clang 6.0: -march=x86-64 . 5.43 |==============================================
GCC 8.0: -march=znver1 ... 5.28 |=============================================
GCC 8.0: -march=x86-64 ... 5.44 |==============================================

Bullet Physics Engine 2.81
Test: 136 Ragdolls
Seconds < Lower Is Better
Clang 6.0: -march=znver1 . 3.23 |=============================================
Clang 6.0: -march=x86-64 . 3.28 |==============================================
GCC 8.0: -march=znver1 ... 3.19 |=============================================
GCC 8.0: -march=x86-64 ... 3.26 |==============================================

Bullet Physics Engine 2.81
Test: Prim Trimesh
Seconds < Lower Is Better
Clang 6.0: -march=znver1 . 1.09 |==============================================
Clang 6.0: -march=x86-64 . 1.10 |==============================================
GCC 8.0: -march=znver1 ... 1.10 |==============================================
GCC 8.0: -march=x86-64 ... 1.10 |==============================================

Bullet Physics Engine 2.81
Test: Convex Trimesh
Seconds < Lower Is Better
Clang 6.0: -march=znver1 . 1.32 |=============================================
Clang 6.0: -march=x86-64 . 1.33 |==============================================
GCC 8.0: -march=znver1 ... 1.30 |=============================================
GCC 8.0: -march=x86-64 ... 1.34 |==============================================

TSCP 1.81
AI Chess Performance
Nodes Per Second > Higher Is Better
Clang 6.0: -march=znver1 . 918269 |============================================
Clang 6.0: -march=x86-64 . 917658 |============================================
GCC 8.0: -march=znver1 ... 875085 |==========================================
GCC 8.0: -march=x86-64 ... 874251 |==========================================

SciMark 2.0
Computational Test: Composite
Mflops > Higher Is Better
Clang 6.0: -march=znver1 . 1699.32 |===========================================
Clang 6.0: -march=x86-64 . 1479.53 |=====================================
GCC 8.0: -march=znver1 ... 1680.45 |===========================================
GCC 8.0: -march=x86-64 ... 1579.48 |========================================

SciMark 2.0
Computational Test: Monte Carlo
Mflops > Higher Is Better
Clang 6.0: -march=znver1 . 552.19 |===========================================
Clang 6.0: -march=x86-64 . 531.38 |==========================================
GCC 8.0: -march=znver1 ... 555.76 |============================================
GCC 8.0: -march=x86-64 ... 561.03 |============================================

SciMark 2.0
Computational Test: Fast Fourier Transform
Mflops > Higher Is Better
Clang 6.0: -march=znver1 . 226.68 |===========================================
Clang 6.0: -march=x86-64 . 179.29 |==================================
GCC 8.0: -march=znver1 ... 231.09 |===========================================
GCC 8.0: -march=x86-64 ... 233.89 |============================================

SciMark 2.0
Computational Test: Sparse Matrix Multiply
Mflops > Higher Is Better
Clang 6.0: -march=znver1 . 2258.64 |===========================================
Clang 6.0: -march=x86-64 . 2190.10 |==========================================
GCC 8.0: -march=znver1 ... 2259.95 |===========================================
GCC 8.0: -march=x86-64 ... 2263.87 |===========================================

SciMark 2.0
Computational Test: Dense LU Matrix Factorization
Mflops > Higher Is Better
Clang 6.0: -march=znver1 . 4034.89 |===========================================
Clang 6.0: -march=x86-64 . 3190.43 |==================================
GCC 8.0: -march=znver1 ... 3678.86 |=======================================
GCC 8.0: -march=x86-64 ... 3513.11 |=====================================

SciMark 2.0
Computational Test: Jacobi Successive Over-Relaxation
Mflops > Higher Is Better
Clang 6.0: -march=znver1 . 1424.21 |=====================================
Clang 6.0: -march=x86-64 . 1110.65 |============================
GCC 8.0: -march=znver1 ... 1676.62 |===========================================
GCC 8.0: -march=x86-64 ... 1423.14 |====================================

FLAC Audio Encoding 1.3.1
WAV To FLAC
Seconds < Lower Is Better
Clang 6.0: -march=znver1 . 6.63 |======================================
Clang 6.0: -march=x86-64 . 7.94 |==============================================
GCC 8.0: -march=znver1 ... 7.45 |===========================================
GCC 8.0: -march=x86-64 ... 7.12 |=========================================

LAME MP3 Encoding 3.99.5
WAV To MP3
Seconds < Lower Is Better
Clang 6.0: -march=znver1 . 12.81 |=============================================
Clang 6.0: -march=x86-64 . 11.33 |========================================
GCC 8.0: -march=znver1 ... 10.81 |======================================
GCC 8.0: -march=x86-64 ... 11.10 |=======================================

FFTW 3.3.6
Build: Stock - Size: 2D FFT Size 4096
Mflops > Higher Is Better
Clang 6.0: -march=znver1 . 5031.60 |======================================
Clang 6.0: -march=x86-64 . 4660.83 |====================================
GCC 8.0: -march=znver1 ... 5627.83 |===========================================
GCC 8.0: -march=x86-64 ... 4959.73 |======================================

FFTW 3.3.6
Build: Float + SSE - Size: 2D FFT Size 4096
Mflops > Higher Is Better
Clang 6.0: -march=znver1 . 12481 |=========================================
Clang 6.0: -march=x86-64 . 13649 |=============================================
GCC 8.0: -march=znver1 ... 13630 |=============================================

Timed HMMer Search 2.3.2
Pfam Database Search
Seconds < Lower Is Better
Clang 6.0: -march=znver1 . 11.09 |=====================================
Clang 6.0: -march=x86-64 . 12.85 |==========================================
GCC 8.0: -march=znver1 ... 12.40 |=========================================
GCC 8.0: -march=x86-64 ... 13.65 |=============================================

Himeno Benchmark 3.0
Poisson Pressure Solver
MFLOPS > Higher Is Better
Clang 6.0: -march=znver1 . 1052.47 |===========================================
Clang 6.0: -march=x86-64 . 1032.71 |==========================================
GCC 8.0: -march=znver1 ... 935.64 |======================================
GCC 8.0: -march=x86-64 ... 949.19 |=======================================

GraphicsMagick 1.3.19
Operation: Blur
Iterations Per Minute > Higher Is Better
Clang 6.0: -march=znver1 . 104 |========================================
Clang 6.0: -march=x86-64 . 101 |=======================================
GCC 8.0: -march=znver1 ... 123 |===============================================
GCC 8.0: -march=x86-64 ... 116 |============================================

GraphicsMagick 1.3.19
Operation: Sharpen
Iterations Per Minute > Higher Is Better
Clang 6.0: -march=znver1 . 136 |=======================================
Clang 6.0: -march=x86-64 . 131 |=====================================
GCC 8.0: -march=znver1 ... 165 |===============================================
GCC 8.0: -march=x86-64 ... 157 |=============================================

GraphicsMagick 1.3.19
Operation: HWB Color Space
Iterations Per Minute > Higher Is Better
Clang 6.0: -march=znver1 . 155 |=======================================
Clang 6.0: -march=x86-64 . 150 |======================================
GCC 8.0: -march=znver1 ... 186 |===============================================
GCC 8.0: -march=x86-64 ... 177 |=============================================

GraphicsMagick 1.3.19
Operation: Local Adaptive Thresholding
Iterations Per Minute > Higher Is Better
Clang 6.0: -march=znver1 . 98 |================================================
Clang 6.0: -march=x86-64 . 97 |================================================
GCC 8.0: -march=znver1 ... 95 |===============================================
GCC 8.0: -march=x86-64 ... 92 |=============================================

C-Ray 1.1
Total Time
Seconds < Lower Is Better
Clang 6.0: -march=znver1 . 4.48 |=============================================
Clang 6.0: -march=x86-64 . 4.53 |==============================================
GCC 8.0: -march=znver1 ... 3.37 |==================================
GCC 8.0: -march=x86-64 ... 3.93 |========================================

Apache Benchmark 2.4.7
Static Web Page Serving
Requests Per Second > Higher Is Better
Clang 6.0: -march=znver1 . 9663.93 |==========================================
Clang 6.0: -march=x86-64 . 9531.43 |==========================================
GCC 8.0: -march=znver1 ... 9791.23 |===========================================
GCC 8.0: -march=x86-64 ... 9841.30 |===========================================

SQLite 3.8.10.2
Test Target: Default Test Directory
Seconds < Lower Is Better
Clang 6.0: -march=znver1 . 7.48 |=============================================
Clang 6.0: -march=x86-64 . 7.53 |==============================================
GCC 8.0: -march=znver1 ... 7.16 |===========================================
GCC 8.0: -march=x86-64 ... 7.61 |==============================================

ebizzy 0.3
Records/s > Higher Is Better
Clang 6.0: -march=znver1 . 1145405 |===========================================
Clang 6.0: -march=x86-64 . 1076648 |========================================
GCC 8.0: -march=znver1 ... 1101176 |=========================================
GCC 8.0: -march=x86-64 ... 1126032 |==========================================

PolyBench-C 3.2
Test: 3 Matrix Multiplications
Seconds < Lower Is Better
Clang 6.0: -march=znver1 . 62.75 |===========================================
Clang 6.0: -march=x86-64 . 62.98 |===========================================
GCC 8.0: -march=znver1 ... 65.45 |=============================================
GCC 8.0: -march=x86-64 ... 60.68 |==========================================
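
Each result row in this export pairs a configuration label with a value, and each chart states whether lower or higher is better. As a rough illustration of how the -march=znver1 builds can be compared against their -march=x86-64 baselines, the sketch below parses rows of that shape and computes a percentage change. It is not part of the Phoronix Test Suite: the names ROW, parse_rows, and percent_change are hypothetical, the row pattern is inferred from the text above, and the sample reuses the C-Ray 1.1 values with the bars shortened.

```python
import re

# Hypothetical pattern for one result row, inferred from this export, e.g.
#   "GCC 8.0: -march=znver1 ... 3.37 |====="
ROW = re.compile(r"^(?P<label>.+?)\s\.+\s(?P<value>[0-9.]+)\s\|=*$")


def parse_rows(text):
    """Return {configuration label: value} for every result row in text."""
    results = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            results[match.group("label")] = float(match.group("value"))
    return results


def percent_change(baseline, tuned, lower_is_better):
    """Percent improvement of tuned over baseline; positive means tuned is better."""
    if lower_is_better:
        return (baseline - tuned) / baseline * 100.0
    return (tuned - baseline) / baseline * 100.0


# C-Ray 1.1 rows from the results above (Seconds, lower is better), bars shortened.
SAMPLE = """\
Clang 6.0: -march=znver1 . 4.48 |=====
Clang 6.0: -march=x86-64 . 4.53 |======
GCC 8.0: -march=znver1 ... 3.37 |====
GCC 8.0: -march=x86-64 ... 3.93 |=====
"""

rows = parse_rows(SAMPLE)
for compiler in ("Clang 6.0", "GCC 8.0"):
    gain = percent_change(
        rows[f"{compiler}: -march=x86-64"],
        rows[f"{compiler}: -march=znver1"],
        lower_is_better=True,
    )
    print(f"{compiler}: znver1 vs. x86-64 = {gain:+.1f}%")
```

For charts marked "Higher Is Better", pass lower_is_better=False so that a positive figure still means the znver1 build came out ahead.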