AMD EPYC Compiler Tuning

GCC 9 compiler tuning benchmarks by Michael Larabel for a future article on Phoronix.com.

Configurations tested:

  -O0
  -Og
  -O1
  -O2
  -O2 -ftree-vectorize -ftree-slp-vectorize
  -O2 -march=znver1
  -O2 -flto
  -O3
  -O3 -march=znver1
  -O3 -march=znver1 -flto
  -Ofast -march=znver1

System (identical for all configurations):

  Processor: 2 x AMD EPYC 7601 32-Core (64 Cores / 128 Threads), Motherboard: Dell 02MJ3T (1.2.5 BIOS), Chipset: AMD Family 17h, Memory: 16 x 32 GB DDR4-2400MT/s 36ASF4G72PZ-2G6D2, Disk: 120GB SSDSCKJB120G7R + 20 x 500GB Samsung SSD 860, Graphics: Matrox G200eW3, Monitor: VE228, Network: 2 x Broadcom BCM57416 NetXtreme-E 10GBase-T RDMA + 2 x Broadcom NetXtreme BCM5720 PCIe

  OS: Ubuntu 18.04, Kernel: 5.0.0-050000rc6-generic (x86_64) 20190210, Desktop: GNOME Shell 3.28.3, Display Server: X Server, Compiler: GCC 9.0.1 20190210, File-System: ext4, Screen Resolution: 1600x1200

FFTW 3.3.6
Build: Stock - Size: 2D FFT Size 4096
Mflops > Higher Is Better
-O0 ....................................... 1708 |=========
-Og ....................................... 4366 |=======================
-O1 ....................................... 4632 |========================
-O2 ....................................... 4625 |========================
-O2 -ftree-vectorize -ftree-slp-vectorize . 4805 |=========================
-O2 -march=znver1 ......................... 5074 |==========================
-O2 -flto ................................. 5091 |===========================
-O3 ....................................... 4751 |=========================
-O3 -march=znver1 ......................... 5006 |==========================
-O3 -march=znver1 -flto ................... 5571 |=============================
-Ofast -march=znver1 ...................... 4885 |=========================

FFTW 3.3.6
Build: Float + SSE - Size: 2D FFT Size 4096
Mflops > Higher Is Better
-O0 ....................................... 2193 |=====
-Og ....................................... 12642 |==========================
-O1 ....................................... 13468 |============================
-O2 ....................................... 13391 |============================
-O2 -ftree-vectorize -ftree-slp-vectorize . 13285 |===========================
-O2 -march=znver1 ......................... 13346 |============================
-O2 -flto ................................. 13214 |===========================
-O3 ....................................... 13555 |============================
-O3 -march=znver1 ......................... 12752 |==========================
-O3 -march=znver1 -flto ................... 13110 |===========================
-Ofast -march=znver1 ...................... 13166 |===========================

Timed HMMer Search 2.3.2
Pfam Database Search
Seconds < Lower Is Better
-O0 ....................................... 9.02 |=============================
-Og ....................................... 7.39 |========================
-O1 ....................................... 6.93 |======================
-O2 ....................................... 6.62 |=====================
-O2 -ftree-vectorize -ftree-slp-vectorize . 6.82 |======================
-O2 -march=znver1 ......................... 6.54 |=====================
-O2 -flto ................................. 6.56 |=====================
-O3 ....................................... 6.57 |=====================
-O3 -march=znver1 ......................... 6.29 |====================
-O3 -march=znver1 -flto ................... 6.16 |====================
-Ofast -march=znver1 ...................... 6.00 |===================

SciMark 2.0
Computational Test: Composite
Mflops > Higher Is Better
-O0 ....................................... 434 |======
-Og ....................................... 1205 |==================
-O1 ....................................... 1519 |======================
-O2 ....................................... 1369 |====================
-O2 -ftree-vectorize -ftree-slp-vectorize . 1724 |=========================
-O2 -march=znver1 ......................... 1501 |======================
-O2 -flto ................................. 1307 |===================
-O3 ....................................... 1800 |===========================
-O3 -march=znver1 ......................... 1961 |=============================
-O3 -march=znver1 -flto ................... 1747 |==========================
-Ofast -march=znver1 ...................... 1825 |===========================

SciMark 2.0
Computational Test: Monte Carlo
Mflops > Higher Is Better
-O0 ....................................... 108 |==
-Og ....................................... 210 |====
-O1 ....................................... 576 |===========
-O2 ....................................... 560 |===========
-O2 -ftree-vectorize -ftree-slp-vectorize . 560 |===========
-O2 -march=znver1 ......................... 557 |===========
-O2 -flto ................................. 568 |===========
-O3 ....................................... 560 |===========
-O3 -march=znver1 ......................... 557 |===========
-O3 -march=znver1 -flto ................... 1480 |=============================
-Ofast -march=znver1 ...................... 561 |===========

SciMark 2.0
Computational Test: Fast Fourier Transform
Mflops > Higher Is Better
-O0 ....................................... 201 |=======================
-Og ....................................... 257 |==============================
-O1 ....................................... 226 |==========================
-O2 ....................................... 230 |===========================
-O2 -ftree-vectorize -ftree-slp-vectorize . 231 |===========================
-O2 -march=znver1 ......................... 229 |===========================
-O2 -flto ................................. 232 |===========================
-O3 ....................................... 232 |===========================
-O3 -march=znver1 ......................... 227 |==========================
-O3 -march=znver1 -flto ................... 230 |===========================
-Ofast -march=znver1 ...................... 221 |==========================

SciMark 2.0
Computational Test: Sparse Matrix Multiply
Mflops > Higher Is Better
-O0 ....................................... 516 |======
-Og ....................................... 2188 |=========================
-O1 ....................................... 2411 |===========================
-O2 ....................................... 2527 |============================
-O2 -ftree-vectorize -ftree-slp-vectorize . 2515 |============================
-O2 -march=znver1 ......................... 2584 |=============================
-O2 -flto ................................. 2299 |==========================
-O3 ....................................... 2475 |============================
-O3 -march=znver1 ......................... 2482 |============================
-O3 -march=znver1 -flto ................... 2052 |=======================
-Ofast -march=znver1 ...................... 2579 |=============================

SciMark 2.0
Computational Test: Dense LU Matrix Factorization
Mflops > Higher Is Better
-O0 ....................................... 512 |===
-Og ....................................... 2539 |===============
-O1 ....................................... 3466 |=====================
-O2 ....................................... 2609 |================
-O2 -ftree-vectorize -ftree-slp-vectorize . 4396 |==========================
-O2 -march=znver1 ......................... 3231 |===================
-O2 -flto ................................. 2515 |===============
-O3 ....................................... 4307 |==========================
-O3 -march=znver1 ......................... 4851 |=============================
-O3 -march=znver1 -flto ................... 3300 |====================
-Ofast -march=znver1 ...................... 4089 |========================

SciMark 2.0
Computational Test: Jacobi Successive Over-Relaxation
Mflops > Higher Is Better
-O0 ....................................... 832 |==============
-Og ....................................... 919 |================
-O1 ....................................... 919 |================
-O2 ....................................... 919 |================
-O2 -ftree-vectorize -ftree-slp-vectorize . 919 |================
-O2 -march=znver1 ......................... 1016 |=================
-O2 -flto ................................. 918 |================
-O3 ....................................... 1427 |=========================
-O3 -march=znver1 ......................... 1689 |=============================
-O3 -march=znver1 -flto ................... 1675 |=============================
-Ofast -march=znver1 ...................... 1676 |=============================

TSCP 1.81
AI Chess Performance
Nodes Per Second > Higher Is Better
-O0 ....................................... 865459 |===========================
-Og ....................................... 865187 |===========================
-O1 ....................................... 864102 |===========================
-O2 ....................................... 864373 |===========================
-O2 -ftree-vectorize -ftree-slp-vectorize . 864916 |===========================
-O2 -march=znver1 ......................... 864915 |===========================
-O2 -flto ................................. 864101 |===========================
-O3 ....................................... 864915 |===========================
-O3 -march=znver1 ......................... 865732 |===========================
-O3 -march=znver1 -flto ................... 863018 |===========================
-Ofast -march=znver1 ...................... 864373 |===========================

John The Ripper 1.8.0-jumbo-1
Test: Blowfish
Real C/S > Higher Is Better
-O0 ....................................... 15179 |======
-Og ....................................... 56453 |========================
-O1 ....................................... 65995 |============================
-O2 ....................................... 62718 |==========================
-O2 -ftree-vectorize -ftree-slp-vectorize . 63586 |===========================
-O2 -march=znver1 ......................... 61309 |==========================
-O2 -flto ................................. 65117 |===========================
-O3 ....................................... 65806 |============================
-O3 -march=znver1 ......................... 66823 |============================
-O3 -march=znver1 -flto ................... 58764 |=========================
-Ofast -march=znver1 ...................... 62841 |==========================

John The Ripper 1.8.0-jumbo-1
Test: Traditional DES
Real C/S > Higher Is Better
-O0 ....................................... 218232000 |====================
-Og ....................................... 239289333 |======================
-O1 ....................................... 257067200 |========================
-O2 ....................................... 257407667 |========================
-O2 -ftree-vectorize -ftree-slp-vectorize . 257058000 |========================
-O2 -march=znver1 ......................... 255957000 |========================
-O2 -flto ................................. 260736667 |========================
-O3 ....................................... 253868583 |=======================
-O3 -march=znver1 ......................... 260019667 |========================
-O3 -march=znver1 -flto ................... 254777333 |=======================
-Ofast -march=znver1 ...................... 258770667 |========================

SVT-AV1 2019-02-03
1080p 8-bit YUV To AV1 Video Encode
Frames Per Second > Higher Is Better
-O0 ....................................... 1.69 |============================
-Og ....................................... 1.73 |=============================
-O1 ....................................... 1.70 |============================
-O2 ....................................... 1.69 |============================
-O2 -ftree-vectorize -ftree-slp-vectorize . 1.67 |============================
-O2 -march=znver1 ......................... 1.70 |============================
-O2 -flto ................................. 1.69 |============================
-O3 ....................................... 1.73 |=============================
-O3 -march=znver1 ......................... 1.68 |============================
-O3 -march=znver1 -flto ................... 1.71 |=============================
-Ofast -march=znver1 ...................... 1.70 |============================

VP9 libvpx Encoding 1.8.0
vpxenc VP9 1080p Video Encode
Frames Per Second > Higher Is Better
-O0 ....................................... 12.50 |===========================
-Og ....................................... 12.52 |===========================
-O1 ....................................... 12.53 |============================
-O2 ....................................... 12.54 |============================
-O2 -ftree-vectorize -ftree-slp-vectorize . 12.56 |============================
-O2 -march=znver1 ......................... 12.42 |===========================
-O3 ....................................... 12.31 |===========================
-O3 -march=znver1 ......................... 12.41 |===========================
-O3 -march=znver1 -flto ................... 12.75 |============================
-Ofast -march=znver1 ...................... 12.37 |===========================

x264 2018-09-25
H.264 Video Encoding
Frames Per Second > Higher Is Better
-O0 ....................................... 102 |=====================
-Og ....................................... 142 |=============================
-O1 ....................................... 145 |==============================
-O2 ....................................... 144 |=============================
-O2 -ftree-vectorize -ftree-slp-vectorize . 144 |=============================
-O2 -march=znver1 ......................... 144 |=============================
-O3 ....................................... 147 |==============================
-O3 -march=znver1 ......................... 144 |=============================
-Ofast -march=znver1 ...................... 144 |=============================

x265 3.0
H.265 1080p Video Encoding
Frames Per Second > Higher Is Better
-O0 ....................................... 35.00 |============================
-Og ....................................... 34.76 |===========================
-O1 ....................................... 35.62 |============================
-O2 ....................................... 34.55 |===========================
-O2 -ftree-vectorize -ftree-slp-vectorize . 35.41 |============================
-O2 -march=znver1 ......................... 34.80 |===========================
-O2 -flto ................................. 35.07 |============================
-O3 ....................................... 35.21 |============================
-O3 -march=znver1 ......................... 35.57 |============================
-Ofast -march=znver1 ...................... 34.91 |===========================

GraphicsMagick 1.3.30
Operation: Swirl
Iterations Per Minute > Higher Is Better
-O0 ....................................... 96 |===============
-Og ....................................... 181 |============================
-O1 ....................................... 194 |==============================
-O2 ....................................... 195 |==============================
-O2 -ftree-vectorize -ftree-slp-vectorize . 196 |==============================
-O2 -march=znver1 ......................... 196 |==============================
-O2 -flto ................................. 196 |==============================
-O3 ....................................... 189 |=============================
-O3 -march=znver1 ......................... 195 |==============================
-O3 -march=znver1 -flto ................... 194 |==============================
-Ofast -march=znver1 ...................... 196 |==============================

GraphicsMagick 1.3.30
Operation: Rotate
Iterations Per Minute > Higher Is Better
-O0 ....................................... 98 |===============
-Og ....................................... 181 |============================
-O1 ....................................... 191 |==============================
-O2 ....................................... 191 |==============================
-O2 -ftree-vectorize -ftree-slp-vectorize . 190 |==============================
-O2 -march=znver1 ......................... 191 |==============================
-O2 -flto ................................. 191 |==============================
-O3 ....................................... 183 |=============================
-O3 -march=znver1 ......................... 190 |==============================
-O3 -march=znver1 -flto ................... 188 |==============================
-Ofast -march=znver1 ...................... 189 |==============================

GraphicsMagick 1.3.30
Operation: Sharpen
Iterations Per Minute > Higher Is Better
-O0 ....................................... 82 |=============
-Og ....................................... 156 |==========================
-O1 ....................................... 180 |==============================
-O2 ....................................... 181 |==============================
-O2 -ftree-vectorize -ftree-slp-vectorize . 180 |==============================
-O2 -march=znver1 ......................... 183 |==============================
-O2 -flto ................................. 183 |==============================
-O3 ....................................... 174 |=============================
-O3 -march=znver1 ......................... 183 |==============================
-O3 -march=znver1 -flto ................... 183 |==============================
-Ofast -march=znver1 ...................... 182 |==============================

GraphicsMagick 1.3.30
Operation: Enhanced
Iterations Per Minute > Higher Is Better
-O0 ....................................... 90 |==============
-Og ....................................... 173 |===========================
-O1 ....................................... 187 |=============================
-O2 ....................................... 189 |=============================
-O2 -ftree-vectorize -ftree-slp-vectorize . 188 |=============================
-O2 -march=znver1 ......................... 191 |==============================
-O2 -flto ................................. 190 |==============================
-O3 ....................................... 181 |============================
-O3 -march=znver1 ......................... 191 |==============================
-O3 -march=znver1 -flto ................... 186 |=============================
-Ofast -march=znver1 ...................... 193 |==============================

GraphicsMagick 1.3.30
Operation: Resizing
Iterations Per Minute > Higher Is Better
-O0 ....................................... 74 |=================
-Og ....................................... 120 |===========================
-O1 ....................................... 126 |=============================
-O2 ....................................... 131 |==============================
-O2 -ftree-vectorize -ftree-slp-vectorize . 128 |=============================
-O2 -march=znver1 ......................... 127 |=============================
-O2 -flto ................................. 128 |=============================
-O3 ....................................... 118 |===========================
-O3 -march=znver1 ......................... 127 |=============================
-O3 -march=znver1 -flto ................... 125 |=============================
-Ofast -march=znver1 ...................... 124 |============================

GraphicsMagick 1.3.30
Operation: Noise-Gaussian
Iterations Per Minute > Higher Is Better
-O0 ....................................... 92 |===============
-Og ....................................... 168 |===========================
-O1 ....................................... 179 |=============================
-O2 ....................................... 180 |=============================
-O2 -ftree-vectorize -ftree-slp-vectorize . 178 |=============================
-O2 -march=znver1 ......................... 180 |=============================
-O2 -flto ................................. 180 |=============================
-O3 ....................................... 172 |============================
-O3 -march=znver1 ......................... 180 |=============================
-O3 -march=znver1 -flto ................... 178 |=============================
-Ofast -march=znver1 ...................... 187 |==============================

GraphicsMagick 1.3.30
Operation: HWB Color Space
Iterations Per Minute > Higher Is Better
-O0 ....................................... 102 |==============
-Og ....................................... 195 |===========================
-O1 ....................................... 210 |=============================
-O2 ....................................... 211 |==============================
-O2 -ftree-vectorize -ftree-slp-vectorize . 212 |==============================
-O2 -march=znver1 ......................... 211 |==============================
-O2 -flto ................................. 214 |==============================
-O3 ....................................... 203 |============================
-O3 -march=znver1 ......................... 210 |=============================
-O3 -march=znver1 -flto ................... 209 |=============================
-Ofast -march=znver1 ...................... 209 |=============================

Himeno Benchmark 3.0
Poisson Pressure Solver
MFLOPS > Higher Is Better
-O0 ....................................... 383 |===========
-Og ....................................... 772 |======================
-O1 ....................................... 785 |======================
-O2 ....................................... 1017 |=============================
-O2 -ftree-vectorize -ftree-slp-vectorize . 1007 |=============================
-O2 -march=znver1 ......................... 1001 |============================
-O2 -flto ................................. 1022 |=============================
-O3 ....................................... 1008 |=============================
-O3 -march=znver1 ......................... 1011 |=============================
-O3 -march=znver1 -flto ................... 1000 |============================
-Ofast -march=znver1 ...................... 1022 |=============================

Stockfish 9
Total Time
Nodes Per Second > Higher Is Better
-O0 ....................................... 105868175 |========================
-Og ....................................... 105709690 |========================
-O1 ....................................... 105698092 |========================
-O2 ....................................... 104480422 |========================
-O2 -ftree-vectorize -ftree-slp-vectorize . 104197865 |=======================
-O2 -march=znver1 ......................... 106084276 |========================
-O2 -flto ................................. 104536605 |========================
-O3 ....................................... 104121840 |=======================
-O3 -march=znver1 ......................... 106497994 |========================
-Ofast -march=znver1 ...................... 106507244 |========================

Timed Apache Compilation 2.4.7
Time To Compile
Seconds < Lower Is Better
-O0 ....................................... 11.43 |===========
-Og ....................................... 14.59 |==============
-O1 ....................................... 17.51 |=================
-O2 ....................................... 23.82 |=======================
-O2 -ftree-vectorize -ftree-slp-vectorize . 24.03 |========================
-O2 -march=znver1 ......................... 23.82 |=======================
-O2 -flto ................................. 26.50 |==========================
-O3 ....................................... 26.08 |==========================
-O3 -march=znver1 ......................... 25.94 |=========================
-O3 -march=znver1 -flto ................... 28.62 |============================
-Ofast -march=znver1 ...................... 26.11 |==========================

Timed ImageMagick Compilation 6.9.0
Time To Compile
Seconds < Lower Is Better
-O0 ....................................... 5.23 |=
-Og ....................................... 7.89 |==
-O1 ....................................... 18.42 |====
-O2 ....................................... 23.63 |=====
-O2 -ftree-vectorize -ftree-slp-vectorize . 23.91 |=====
-O2 -march=znver1 ......................... 23.78 |=====
-O2 -flto ................................. 98.67 |======================
-O3 ....................................... 25.06 |======
-O3 -march=znver1 ......................... 24.88 |======
-O3 -march=znver1 -flto ................... 118.48 |===========================
-Ofast -march=znver1 ...................... 25.21 |======

Timed PHP Compilation 7.1.9
Time To Compile
Seconds < Lower Is Better
-O0 ....................................... 15.19 |=====
-Og ....................................... 21.42 |========
-O1 ....................................... 29.05 |==========
-O2 ....................................... 52.17 |===================
-O2 -ftree-vectorize -ftree-slp-vectorize . 52.58 |===================
-O2 -march=znver1 ......................... 51.96 |===================
-O3 ....................................... 78.19 |============================
-O3 -march=znver1 ......................... 78.13 |============================

C-Ray 1.1
Total Time - 4K, 16 Rays Per Pixel
Seconds < Lower Is Better
-O0 ....................................... 44.92 |============================
-Og ....................................... 28.64 |==================
-O1 ....................................... 28.74 |==================
-O2 ....................................... 25.84 |================
-O2 -ftree-vectorize -ftree-slp-vectorize . 25.77 |================
-O2 -march=znver1 ......................... 21.58 |=============
-O2 -flto ................................. 25.96 |================
-O3 ....................................... 12.60 |========
-O3 -march=znver1 ......................... 11.35 |=======
-O3 -march=znver1 -flto ................... 11.31 |=======
-Ofast -march=znver1 ...................... 10.40 |======

AOBench
Size: 2048 x 2048 - Total Time
Seconds < Lower Is Better
-O0 ....................................... 92.50 |============================
-Og ....................................... 77.41 |=======================
-O1 ....................................... 56.61 |=================
-O2 ....................................... 55.54 |=================
-O2 -ftree-vectorize -ftree-slp-vectorize . 55.53 |=================
-O2 -march=znver1 ......................... 54.35 |================
-O2 -flto ................................. 55.52 |=================
-O3 ....................................... 53.53 |================
-O3 -march=znver1 ......................... 51.49 |================
-O3 -march=znver1 -flto ................... 52.08 |================
-Ofast -march=znver1 ...................... 51.73 |================

Bullet Physics Engine 2.81
Test: Raytests
Seconds < Lower Is Better
-O0 ....................................... 3.11 |=============================
-Og ....................................... 3.11 |=============================
-O1 ....................................... 3.12 |=============================
-O2 ....................................... 3.12 |=============================
-O2 -ftree-vectorize -ftree-slp-vectorize . 3.11 |=============================
-O2 -march=znver1 ......................... 3.10 |=============================
-O2 -flto ................................. 3.05 |============================
-O3 ....................................... 3.11 |=============================
-O3 -march=znver1 ......................... 3.09 |=============================
-Ofast -march=znver1 ...................... 3.09 |=============================

Bullet Physics Engine 2.81
Test: 3000 Fall
Seconds < Lower Is Better
-O0 ....................................... 5.14 |=============================
-Og ....................................... 5.15 |=============================
-O1 ....................................... 5.16 |=============================
-O2 ....................................... 5.14 |=============================
-O2 -ftree-vectorize -ftree-slp-vectorize . 5.14 |=============================
-O2 -march=znver1 ......................... 5.08 |============================
-O2 -flto ................................. 5.21 |=============================
-O3 ....................................... 5.16 |=============================
-O3 -march=znver1 ......................... 5.07 |============================
-Ofast -march=znver1 ...................... 5.09 |============================

Bullet Physics Engine 2.81
Test: 1000 Stack
Seconds < Lower Is Better
-O0 ....................................... 6.00 |============================
-Og ....................................... 6.02 |============================
-O1 ....................................... 6.00 |============================
-O2 ....................................... 6.01 |============================
-O2 -ftree-vectorize -ftree-slp-vectorize . 5.98 |===========================
-O2 -march=znver1 ......................... 5.80 |===========================
-O2 -flto ................................. 6.32 |=============================
-O3 ....................................... 6.05 |============================
-O3 -march=znver1 ......................... 5.80 |===========================
-Ofast -march=znver1 ...................... 5.80 |===========================

Bullet Physics Engine 2.81
Test: 1000 Convex
Seconds < Lower Is Better
-O0 ....................................... 5.36 |=============================
-Og ....................................... 5.37 |=============================
-O1 ....................................... 5.37 |=============================
-O2 ....................................... 5.37 |=============================
-O2 -ftree-vectorize -ftree-slp-vectorize . 5.37 |=============================
-O2 -march=znver1 ......................... 5.19 |============================
-O2 -flto ................................. 5.40 |=============================
-O3 ....................................... 5.38 |=============================
-O3 -march=znver1 ......................... 5.18 |============================
-Ofast -march=znver1 ...................... 5.18 |============================

Bullet Physics Engine 2.81
Test: 136 Ragdolls
Seconds < Lower Is Better
-O0 ....................................... 3.14 |============================
-Og ....................................... 3.14 |============================
-O1 ....................................... 3.15 |============================
-O2 ....................................... 3.14 |============================
-O2 -ftree-vectorize -ftree-slp-vectorize . 3.15 |============================
-O2 -march=znver1 ......................... 3.05 |===========================
-O2 -flto ................................. 3.24 |=============================
-O3 .......................................
3.15 |============================ -O3 -march=znver1 ......................... 3.06 |=========================== -Ofast -march=znver1 ...................... 3.06 |=========================== Bullet Physics Engine 2.81 Test: Prim Trimesh Seconds < Lower Is Better -O0 ....................................... 1.11 |============================= -Og ....................................... 1.11 |============================= -O1 ....................................... 1.11 |============================= -O2 ....................................... 1.11 |============================= -O2 -ftree-vectorize -ftree-slp-vectorize . 1.11 |============================= -O2 -march=znver1 ......................... 1.11 |============================= -O2 -flto ................................. 1.09 |============================ -O3 ....................................... 1.11 |============================= -O3 -march=znver1 ......................... 1.11 |============================= -Ofast -march=znver1 ...................... 1.11 |============================= Bullet Physics Engine 2.81 Test: Convex Trimesh Seconds < Lower Is Better -O0 ....................................... 1.35 |============================= -Og ....................................... 1.35 |============================= -O1 ....................................... 1.35 |============================= -O2 ....................................... 1.35 |============================= -O2 -ftree-vectorize -ftree-slp-vectorize . 1.35 |============================= -O2 -march=znver1 ......................... 1.32 |============================ -O2 -flto ................................. 1.33 |============================= -O3 ....................................... 1.35 |============================= -O3 -march=znver1 ......................... 1.32 |============================ -Ofast -march=znver1 ...................... 
1.32 |============================ Zstd Compression 1.3.4 Compressing ubuntu-16.04.3-server-i386.img, Compression Level 19 Seconds < Lower Is Better -O0 ....................................... 23.12 |============================ -Og ....................................... 14.39 |================= -O1 ....................................... 14.11 |================= -O2 ....................................... 14.48 |================== -O2 -ftree-vectorize -ftree-slp-vectorize . 13.67 |================= -O2 -march=znver1 ......................... 14.71 |================== -O2 -flto ................................. 14.08 |================= -O3 ....................................... 13.66 |================= -O3 -march=znver1 ......................... 14.37 |================= -O3 -march=znver1 -flto ................... 13.16 |================ -Ofast -march=znver1 ...................... 13.77 |================= FLAC Audio Encoding 1.3.2 WAV To FLAC Seconds < Lower Is Better -O0 ....................................... 96.77 |============================ -Og ....................................... 15.58 |===== -O1 ....................................... 15.01 |==== -O2 ....................................... 13.65 |==== -O2 -ftree-vectorize -ftree-slp-vectorize . 13.70 |==== -O2 -march=znver1 ......................... 13.89 |==== -O2 -flto ................................. 13.64 |==== -O3 ....................................... 13.61 |==== -O3 -march=znver1 ......................... 13.85 |==== -O3 -march=znver1 -flto ................... 14.21 |==== -Ofast -march=znver1 ...................... 13.95 |==== LAME MP3 Encoding 3.100 WAV To MP3 Seconds < Lower Is Better -O0 ....................................... 41.79 |============================ -Og ....................................... 16.78 |=========== -O1 ....................................... 14.32 |========== -O2 ....................................... 14.07 |========= -O2 -ftree-vectorize -ftree-slp-vectorize . 
10.96 |======= -O2 -march=znver1 ......................... 14.00 |========= -O2 -flto ................................. 14.14 |========= -O3 ....................................... 10.84 |======= -O3 -march=znver1 ......................... 10.57 |======= -O3 -march=znver1 -flto ................... 10.38 |======= -Ofast -march=znver1 ...................... 9.80 |======= libjpeg-turbo tjbench 1.5.3 Test: Decompression Throughput Megapixels/sec > Higher Is Better -O0 ....................................... 111 |======================= -Og ....................................... 141 |============================= -O1 ....................................... 139 |============================= -O2 ....................................... 140 |============================= -O2 -ftree-vectorize -ftree-slp-vectorize . 139 |============================= -O2 -march=znver1 ......................... 142 |============================== -O2 -flto ................................. 140 |============================= -O3 ....................................... 141 |============================= -O3 -march=znver1 ......................... 144 |============================== -O3 -march=znver1 -flto ................... 144 |============================== -Ofast -march=znver1 ...................... 144 |============================== PostgreSQL pgbench 10.3 Scaling: Buffer Test - Test: Normal Load - Mode: Read Only TPS > Higher Is Better -O0 ....................................... 419700 |===================== -Og ....................................... 507203 |========================== -O1 ....................................... 515102 |========================== -O2 ....................................... 515340 |========================== -O2 -ftree-vectorize -ftree-slp-vectorize . 529699 |=========================== -O2 -march=znver1 ......................... 510425 |========================== -O2 -flto ................................. 
520570 |=========================== -O3 ....................................... 490551 |========================= -O3 -march=znver1 ......................... 505031 |========================== -O3 -march=znver1 -flto ................... 454256 |======================= -Ofast -march=znver1 ...................... 508384 |========================== PostgreSQL pgbench 10.3 Scaling: Buffer Test - Test: Normal Load - Mode: Read Write TPS > Higher Is Better -O0 ....................................... 4585 |========================== -Og ....................................... 3767 |====================== -O1 ....................................... 4301 |========================= -O2 ....................................... 4167 |======================== -O2 -ftree-vectorize -ftree-slp-vectorize . 4239 |======================== -O2 -march=znver1 ......................... 4272 |======================== -O2 -flto ................................. 4095 |======================= -O3 ....................................... 4262 |======================== -O3 -march=znver1 ......................... 5068 |============================= -O3 -march=znver1 -flto ................... 4319 |========================= -Ofast -march=znver1 ...................... 4102 |======================= PostgreSQL pgbench 10.3 Scaling: Buffer Test - Test: Single Thread - Mode: Read Only TPS > Higher Is Better -O0 ....................................... 9063 |================ -Og ....................................... 13333 |======================= -O1 ....................................... 13303 |======================= -O2 ....................................... 14931 |========================== -O2 -ftree-vectorize -ftree-slp-vectorize . 15353 |=========================== -O2 -march=znver1 ......................... 15111 |========================== -O2 -flto ................................. 14851 |========================== -O3 ....................................... 
15099 |========================== -O3 -march=znver1 ......................... 15188 |=========================== -O3 -march=znver1 -flto ................... 16012 |============================ -Ofast -march=znver1 ...................... 15352 |=========================== PostgreSQL pgbench 10.3 Scaling: Buffer Test - Test: Single Thread - Mode: Read Write TPS > Higher Is Better -O0 ....................................... 886 |====================== -Og ....................................... 1080 |=========================== -O1 ....................................... 1065 |=========================== -O2 ....................................... 1037 |========================== -O2 -ftree-vectorize -ftree-slp-vectorize . 1125 |============================ -O2 -march=znver1 ......................... 1060 |=========================== -O2 -flto ................................. 1127 |============================= -O3 ....................................... 1079 |=========================== -O3 -march=znver1 ......................... 1145 |============================= -O3 -march=znver1 -flto ................... 1074 |=========================== -Ofast -march=znver1 ...................... 1125 |============================ Hierarchical INTegration 1.0 Test: FLOAT QUIPs > Higher Is Better -O0 ....................................... 267404445 |======================== -Og ....................................... 267368671 |======================== -O1 ....................................... 268455578 |======================== -O2 ....................................... 267311970 |======================== -O2 -ftree-vectorize -ftree-slp-vectorize . 267172145 |======================== -O2 -march=znver1 ......................... 267268023 |======================== -O2 -flto ................................. 268173400 |======================== -O3 ....................................... 267315647 |======================== -O3 -march=znver1 ......................... 
268506472 |======================== -O3 -march=znver1 -flto ................... 267239405 |======================== -Ofast -march=znver1 ...................... 267055407 |======================== Hierarchical INTegration 1.0 Test: DOUBLE QUIPs > Higher Is Better -O0 ....................................... 598545342 |======================= -Og ....................................... 597234266 |======================= -O1 ....................................... 585060029 |====================== -O2 ....................................... 599481605 |======================= -O2 -ftree-vectorize -ftree-slp-vectorize . 602535297 |======================= -O2 -march=znver1 ......................... 617516626 |======================== -O2 -flto ................................. 626640400 |======================== -O3 ....................................... 595428047 |======================= -O3 -march=znver1 ......................... 589289926 |======================= -O3 -march=znver1 -flto ................... 618644101 |======================== -Ofast -march=znver1 ...................... 605331833 |======================= SVT-AV1 2019-02-15 1080p 8-bit YUV To AV1 Video Encode Frames Per Second > Higher Is Better -O0 ....................................... 5.88 |============================= -Og ....................................... 5.86 |============================= -O1 ....................................... 5.87 |============================= -O2 ....................................... 5.81 |============================= -O2 -ftree-vectorize -ftree-slp-vectorize . 5.89 |============================= -O2 -march=znver1 ......................... 5.84 |============================= -O2 -flto ................................. 5.90 |============================= -O3 ....................................... 5.90 |============================= -O3 -march=znver1 ......................... 5.89 |============================= -O3 -march=znver1 -flto ................... 
5.84 |============================= -Ofast -march=znver1 ...................... 5.91 |============================= SVT-VP9 2019-02-17 1080p 8-bit YUV To VP9 Video Encode Frames Per Second > Higher Is Better -Og ....................................... 92.68 |=========================== -O2 -ftree-vectorize -ftree-slp-vectorize . 94.82 |=========================== -O2 -march=znver1 ......................... 95.91 |=========================== -O2 -flto ................................. 95.79 |=========================== -O3 -march=znver1 -flto ................... 97.26 |============================ -Ofast -march=znver1 ...................... 97.80 |============================ VP9 libvpx Encoding 1.8.0 vpxenc VP9 1080p Video Encode Frames Per Second > Higher Is Better -Og ....................................... 20.39 |=========================== -O2 -ftree-vectorize -ftree-slp-vectorize . 20.34 |=========================== -O2 -march=znver1 ......................... 20.05 |=========================== -O3 -march=znver1 -flto ................... 20.86 |============================ -Ofast -march=znver1 ...................... 20.13 |=========================== ctx_clock Context Switch Time Clocks < Lower Is Better -Og ....................................... 132 |============================== -O2 -ftree-vectorize -ftree-slp-vectorize . 132 |============================== -O2 -flto ................................. 132 |============================== -O3 -march=znver1 -flto ................... 132 |==============================