GCC 4.9 Compiler Optimization Tuning AMD Kaveri

Compiler optimization tuning for the AMD Steamroller CPU cores of the AMD A10-7850K Kaveri APU, using various -march= values. Benchmarks by Michael Larabel for a future article on Phoronix.com.
HTML result view exported from: https://openbenchmarking.org/result/1401282-PL-GCC49COMP74&grs .

System details for the k8, barcelona, bdver1, bdver2 and bdver3 runs:

  Processor:          AMD A10-7850K APU with Radeon R7 @ 3.70GHz (4 Cores)
  Motherboard:        Gigabyte F2A88XM-D3H
  Chipset:            AMD Device 1422
  Memory:             7168MB
  Disk:               120GB KINGSTON SV300S3
  Graphics:           AMD Kaveri 1024MB
  Audio:              ATI R6xx HDMI
  Monitor:            TSB-TV
  Network:            Realtek RTL8111/8168/8411
  OS:                 Ubuntu 14.04
  Kernel:             3.13.0-5-generic (x86_64)
  Desktop:            Unity 7.1.2
  Display Driver:     radeon 7.2.99
  Compiler:           GCC 4.9.0 20140126 + Clang 3.4 + LLVM 3.4
  File-System:        ext4
  Screen Resolution:  1920x1080

Kernel Details - radeon.dpm=1
Compiler Details - --disable-multilib --enable-checking=release --enable-languages=c,c++,fortran
Processor Details - Scaling Governor: acpi-cpufreq ondemand

Test                                                  k8 barcelona    bdver1    bdver2    bdver3
c-ray: Total Time                                  87.90     53.33     40.67     40.55     40.54
encode-flac: WAV To FLAC                            6.62      6.90      5.29      5.47      5.52
graphics-magick: Resizing                            106       120       133       133       133
graphics-magick: Sharpen                              71        72        87        81        81
graphics-magick: Blur                                 97        93       108       110       106
graphics-magick: HWB Color Space                     126       138       139       139       139
scimark2: Monte Carlo                             397.86    384.47    423.73    423.28    413.64
scimark2: Fast Fourier Transform                   73.59     68.99     68.50     71.15     70.77
graphics-magick: Local Adaptive Thresholding          77        76        81        81        80
himeno: Poisson Pressure Solver                   867.36    898.33    894.27    905.30    902.98
build-php: Time To Compile                         56.54     56.60     58.45     58.54     58.48
scimark2: Sparse Matrix Multiply                  865.92    849.34    860.41    877.91    866.81
tscp: AI Chess Performance                        760113    742690    738311    738707    739101
scimark2: Composite                               636.54    629.44    640.56    644.89    641.05
scimark2: Dense LU Matrix Factorization          1156.29   1155.51   1162.41   1164.64   1165.94
build-apache: Time To Compile                      58.55     58.68     59.07     58.91     58.83
x264: H.264 Video Encoding                         83.43     83.66     84.14     83.85     83.83
scimark2: Jacobi Successive Over-Relaxation       689.02    688.88    687.76    687.47    688.08
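
The overview mixes lower-is-better tests (seconds) with higher-is-better ones (Mflops, iterations per minute, nodes per second), so raw differences are hard to compare across rows. A minimal Python sketch, assuming a hypothetical helper named `relative_gain` and a handful of rows copied from the overview, expresses each bdver3 result relative to the k8 build so that a positive number always means faster:

```python
# Hypothetical helper for illustration; not part of the Phoronix Test Suite.
def relative_gain(baseline, value, lower_is_better):
    """Percent improvement over the baseline; positive means faster."""
    if lower_is_better:
        # For timings, the speedup is the inverse ratio of the times.
        return (baseline / value - 1.0) * 100.0
    return (value / baseline - 1.0) * 100.0

# (test, lower_is_better, k8 result, bdver3 result), copied from the overview.
rows = [
    ("c-ray: Total Time", True, 87.90, 40.54),
    ("encode-flac: WAV To FLAC", True, 6.62, 5.52),
    ("graphics-magick: Resizing", False, 106, 133),
    ("tscp: AI Chess Performance", False, 760113, 739101),
]

gains = {name: relative_gain(k8, bdver3, lower) for name, lower, k8, bdver3 in rows}

for name, gain in gains.items():
    print(f"{name:30s} {gain:+7.1f}%")
```

Treating a timing as an inverse rate keeps the two kinds of test on the same scale; other normalizations (for example plain percent change in seconds) are equally defensible and give different numbers.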

C-Ray 1.1 - Total Time
Seconds, Fewer Is Better
  k8          87.90   SE +/- 0.02, N = 3   -march=k8
  barcelona   53.33   SE +/- 0.09, N = 3   -march=barcelona
  bdver1      40.67   SE +/- 0.03, N = 3   -march=bdver1
  bdver2      40.55   SE +/- 0.03, N = 3   -march=bdver2
  bdver3      40.54   SE +/- 0.07, N = 3   -march=bdver3
(CC) gcc options: -lm -lpthread -O3

FLAC Audio Encoding 1.3.0 - WAV To FLAC
Seconds, Fewer Is Better
  k8          6.62   SE +/- 0.02, N = 5   -march=k8
  barcelona   6.90   SE +/- 0.01, N = 5   -march=barcelona
  bdver1      5.29   SE +/- 0.02, N = 5   -march=bdver1
  bdver2      5.47   SE +/- 0.01, N = 5   -march=bdver2
  bdver3      5.52   SE +/- 0.02, N = 5   -march=bdver3
(CXX) g++ options: -O3 -fvisibility=hidden -logg -lm

GraphicsMagick 1.3.19 - Operation: Resizing
Iterations Per Minute, More Is Better
  k8          106   SE +/- 0.00, N = 3   -march=k8
  barcelona   120   SE +/- 0.00, N = 3   -march=barcelona
  bdver1      133   SE +/- 0.33, N = 3   -march=bdver1
  bdver2      133   SE +/- 0.00, N = 3   -march=bdver2
  bdver3      133   SE +/- 0.33, N = 3   -march=bdver3
(CC) gcc options: -std=gnu99 -fopenmp -O3 -pthread -ljbig -lwebp -ljpeg -lXext -lSM -lICE -lX11 -llzma -lxml2 -lz -lm -lpthread

GraphicsMagick 1.3.19 - Operation: Sharpen
Iterations Per Minute, More Is Better
  k8          71   SE +/- 0.00, N = 3   -march=k8
  barcelona   72   SE +/- 0.00, N = 3   -march=barcelona
  bdver1      87   SE +/- 0.00, N = 3   -march=bdver1
  bdver2      81   SE +/- 0.33, N = 3   -march=bdver2
  bdver3      81   SE +/- 0.00, N = 3   -march=bdver3
(CC) gcc options: -std=gnu99 -fopenmp -O3 -pthread -ljbig -lwebp -ljpeg -lXext -lSM -lICE -lX11 -llzma -lxml2 -lz -lm -lpthread

GraphicsMagick 1.3.19 - Operation: Blur
Iterations Per Minute, More Is Better
  k8           97   SE +/- 0.00, N = 3   -march=k8
  barcelona    93   SE +/- 0.00, N = 3   -march=barcelona
  bdver1      108   SE +/- 0.33, N = 3   -march=bdver1
  bdver2      110   SE +/- 0.33, N = 3   -march=bdver2
  bdver3      106   SE +/- 0.00, N = 3   -march=bdver3
(CC) gcc options: -std=gnu99 -fopenmp -O3 -pthread -ljbig -lwebp -ljpeg -lXext -lSM -lICE -lX11 -llzma -lxml2 -lz -lm -lpthread

GraphicsMagick 1.3.19 - Operation: HWB Color Space
Iterations Per Minute, More Is Better
  k8          126   SE +/- 0.33, N = 3   -march=k8
  barcelona   138   SE +/- 0.33, N = 3   -march=barcelona
  bdver1      139   SE +/- 0.00, N = 3   -march=bdver1
  bdver2      139   SE +/- 0.33, N = 3   -march=bdver2
  bdver3      139   SE +/- 0.00, N = 3   -march=bdver3
(CC) gcc options: -std=gnu99 -fopenmp -O3 -pthread -ljbig -lwebp -ljpeg -lXext -lSM -lICE -lX11 -llzma -lxml2 -lz -lm -lpthread

SciMark 2.0 - Computational Test: Monte Carlo
Mflops, More Is Better
  k8          397.86   SE +/- 7.32, N = 4    -march=k8
  barcelona   384.47   SE +/- 0.60, N = 4    -march=barcelona
  bdver1      423.73   SE +/- 0.77, N = 4    -march=bdver1
  bdver2      423.28   SE +/- 1.61, N = 4    -march=bdver2
  bdver3      413.64   SE +/- 10.39, N = 4   -march=bdver3
(CXX) g++ options: -O3
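
Some of the Monte Carlo standard errors are large compared with the gaps between builds, so small differences may not be meaningful. As a rough heuristic (not a formal significance test), the Python sketch below expresses the gap between two builds in units of the combined standard error of the two means. The function name `gap_in_standard_errors` is made up for this illustration, and it assumes the SE values in the export are listed in the same order as the builds.

```python
import math

def gap_in_standard_errors(mean_a, se_a, mean_b, se_b):
    """Absolute difference of two means, in units of their combined standard error."""
    combined_se = math.sqrt(se_a ** 2 + se_b ** 2)
    return abs(mean_a - mean_b) / combined_se

# Monte Carlo Mflops and SE values from this result file (N = 4 each).
bdver1_vs_bdver3 = gap_in_standard_errors(423.73, 0.77, 413.64, 10.39)
bdver1_vs_barcelona = gap_in_standard_errors(423.73, 0.77, 384.47, 0.60)

print(f"bdver1 vs bdver3:    {bdver1_vs_bdver3:.1f} combined SEs")
print(f"bdver1 vs barcelona: {bdver1_vs_barcelona:.1f} combined SEs")
```

With these numbers the bdver1/bdver3 gap is under one combined standard error, while the bdver1/barcelona gap is dozens of them, so only the latter looks like a clear difference under this heuristic.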

SciMark 2.0 - Computational Test: Fast Fourier Transform
Mflops, More Is Better
  k8          73.59   SE +/- 0.22, N = 4   -march=k8
  barcelona   68.99   SE +/- 0.94, N = 4   -march=barcelona
  bdver1      68.50   SE +/- 1.36, N = 4   -march=bdver1
  bdver2      71.15   SE +/- 0.10, N = 4   -march=bdver2
  bdver3      70.77   SE +/- 0.16, N = 4   -march=bdver3
(CXX) g++ options: -O3

GraphicsMagick 1.3.19 - Operation: Local Adaptive Thresholding
Iterations Per Minute, More Is Better
  k8          77   SE +/- 0.00, N = 3   -march=k8
  barcelona   76   SE +/- 0.33, N = 3   -march=barcelona
  bdver1      81   SE +/- 0.00, N = 3   -march=bdver1
  bdver2      81   SE +/- 0.00, N = 3   -march=bdver2
  bdver3      80   SE +/- 0.00, N = 3   -march=bdver3
(CC) gcc options: -std=gnu99 -fopenmp -O3 -pthread -ljbig -lwebp -ljpeg -lXext -lSM -lICE -lX11 -llzma -lxml2 -lz -lm -lpthread

Himeno Benchmark 3.0 - Poisson Pressure Solver
MFLOPS, More Is Better
  k8          867.36   SE +/- 6.79, N = 3   -march=k8
  barcelona   898.33   SE +/- 0.19, N = 3   -march=barcelona
  bdver1      894.27   SE +/- 9.68, N = 3   -march=bdver1
  bdver2      905.30   SE +/- 1.51, N = 3   -march=bdver2
  bdver3      902.98   SE +/- 3.10, N = 3   -march=bdver3
(CC) gcc options: -O3

Timed PHP Compilation 5.2.9 - Time To Compile
Seconds, Fewer Is Better
  k8          56.54   SE +/- 0.06, N = 3   -march=k8
  barcelona   56.60   SE +/- 0.04, N = 3   -march=barcelona
  bdver1      58.45   SE +/- 0.10, N = 3   -march=bdver1
  bdver2      58.54   SE +/- 0.07, N = 3   -march=bdver2
  bdver3      58.48   SE +/- 0.01, N = 3   -march=bdver3
(CC) gcc options: -O3 -pedantic -ldl -lz -lm

SciMark 2.0 - Computational Test: Sparse Matrix Multiply
Mflops, More Is Better
  k8          865.92   SE +/- 4.11, N = 4    -march=k8
  barcelona   849.34   SE +/- 15.60, N = 4   -march=barcelona
  bdver1      860.41   SE +/- 9.94, N = 4    -march=bdver1
  bdver2      877.91   SE +/- 2.24, N = 4    -march=bdver2
  bdver3      866.81   SE +/- 7.27, N = 4    -march=bdver3
(CXX) g++ options: -O3

TSCP 1.81 - AI Chess Performance
Nodes Per Second, More Is Better
  k8          760113   SE +/- 257.20, N = 5   -march=k8
  barcelona   742690   SE +/- 600.77, N = 5   -march=barcelona
  bdver1      738311   SE +/- 699.90, N = 5   -march=bdver1
  bdver2      738707   SE +/- 671.07, N = 5   -march=bdver2
  bdver3      739101   SE +/- 198.20, N = 5   -march=bdver3
(CC) gcc options: -O3

SciMark 2.0 - Computational Test: Composite
Mflops, More Is Better
  k8          636.54   SE +/- 1.68, N = 4   -march=k8
  barcelona   629.44   SE +/- 4.49, N = 4   -march=barcelona
  bdver1      640.56   SE +/- 1.89, N = 4   -march=bdver1
  bdver2      644.89   SE +/- 1.36, N = 4   -march=bdver2
  bdver3      641.05   SE +/- 1.36, N = 4   -march=bdver3
(CXX) g++ options: -O3
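
The composite figures in this file are consistent with a plain arithmetic mean of the five SciMark kernel results (Fast Fourier Transform, Jacobi Successive Over-Relaxation, Monte Carlo, Sparse Matrix Multiply, Dense LU Matrix Factorization). The Python sketch below checks that against the overview numbers; it is an observation about this data set, and the dictionary layout is purely illustrative.

```python
# Kernel results in Mflops per build, in the order:
# FFT, Jacobi SOR, Monte Carlo, Sparse Matrix Multiply, Dense LU.
kernels = {
    "k8":        [73.59, 689.02, 397.86, 865.92, 1156.29],
    "barcelona": [68.99, 688.88, 384.47, 849.34, 1155.51],
    "bdver1":    [68.50, 687.76, 423.73, 860.41, 1162.41],
    "bdver2":    [71.15, 687.47, 423.28, 877.91, 1164.64],
    "bdver3":    [70.77, 688.08, 413.64, 866.81, 1165.94],
}

# Composite scores as listed in the overview.
reported = {"k8": 636.54, "barcelona": 629.44, "bdver1": 640.56,
            "bdver2": 644.89, "bdver3": 641.05}

# Arithmetic mean of the five kernels for each build.
means = {build: sum(values) / len(values) for build, values in kernels.items()}

for build, mean in means.items():
    print(f"{build:10s} mean={mean:.2f} reported={reported[build]:.2f}")
```

Because the composite is an average, a swing in one kernel (such as the Monte Carlo or FFT results) moves the composite by only a fifth of that swing.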

SciMark 2.0 - Computational Test: Dense LU Matrix Factorization
Mflops, More Is Better
  k8          1156.29   SE +/- 6.32, N = 4   -march=k8
  barcelona   1155.51   SE +/- 5.67, N = 4   -march=barcelona
  bdver1      1162.41   SE +/- 1.67, N = 4   -march=bdver1
  bdver2      1164.64   SE +/- 3.97, N = 4   -march=bdver2
  bdver3      1165.94   SE +/- 2.87, N = 4   -march=bdver3
(CXX) g++ options: -O3

Timed Apache Compilation 2.4.7 - Time To Compile
Seconds, Fewer Is Better
  k8          58.55   SE +/- 0.19, N = 3
  barcelona   58.68   SE +/- 0.15, N = 3
  bdver1      59.07   SE +/- 0.21, N = 3
  bdver2      58.91   SE +/- 0.12, N = 3
  bdver3      58.83   SE +/- 0.12, N = 3

x264 2014-01-09 - H.264 Video Encoding
Frames Per Second, More Is Better
  k8          83.43   SE +/- 0.71, N = 5   -march=k8
  barcelona   83.66   SE +/- 0.46, N = 5   -march=barcelona
  bdver1      84.14   SE +/- 0.73, N = 5   -march=bdver1
  bdver2      83.85   SE +/- 0.64, N = 5   -march=bdver2
  bdver3      83.83   SE +/- 0.25, N = 5   -march=bdver3
(CC) gcc options: -ldl -m64 -lm -lpthread -O3 -ffast-math -std=gnu99 -fomit-frame-pointer -fno-tree-vectorize

SciMark 2.0 - Computational Test: Jacobi Successive Over-Relaxation
Mflops, More Is Better
  k8          689.02   SE +/- 0.12, N = 4   -march=k8
  barcelona   688.88   SE +/- 0.08, N = 4   -march=barcelona
  bdver1      687.76   SE +/- 0.38, N = 4   -march=bdver1
  bdver2      687.47   SE +/- 0.80, N = 4   -march=bdver2
  bdver3      688.08   SE +/- 0.38, N = 4   -march=bdver3
(CXX) g++ options: -O3

Phoronix Test Suite v10.8.5