Benchmarking the SLP Vectorizer via -fslp-vectorize on LLVM Clang 3.4 SVN for a future article on Phoronix.com
LLVM Clang 3.4 SVN Compiler Notes: Optimized build; Built Jul 28 2013 (21:43:17); Default target: x86_64-unknown-linux-gnu; Host CPU: corei7
Processor Notes: Scaling Governor: acpi-cpufreq ondemand
-fslp-vectorize Processor: Intel Core i7 720Q @ 1.60GHz (8 Cores), Motherboard: LENOVO 4318CTO, Chipset: Intel Core DMI, Memory: 4096MB, Disk: 160GB INTEL SSDSA2M160, Graphics: NVIDIA Quadro FX 880M 1024MB (405/324MHz), Audio: Conexant CX20585, Network: Intel 82577LM Gigabit Connection + Intel Centrino Ultimate-N 6300
OS: Ubuntu 13.10, Kernel: 3.11.0-031100rc2-generic (x86_64), Desktop: Xfce 4.10, Display Server: X Server 1.14.2, Display Driver: nouveau 1.0.8, OpenGL: 3.0 Mesa 9.1.4 Gallium 0.4, Compiler: Clang 3.4 (SVN 187338) + LLVM 3.4svn, File-System: ext4, Screen Resolution: 1600x900
SciMark: This test runs the ANSI C version of SciMark 2.0, a benchmark for scientific and numerical computing developed by programmers at the National Institute of Standards and Technology. It is made up of Fast Fourier Transform, Jacobi Successive Over-Relaxation, Monte Carlo, Sparse Matrix Multiply, and dense LU matrix factorization benchmarks. Learn more via the OpenBenchmarking.org test page.
SciMark 2.0, Computational Test: Fast Fourier Transform (Mflops, More Is Better). Configurations: -fslp-vectorize, LLVM Clang 3.4 SVN. SE +/- 0.40, N = 4; SE +/- 0.61, N = 4. Results: 189.10, 173.57. Notes: -fslp-vectorize; 1. (CXX) g++ options: -O3 -march=native
Smallpt: Smallpt is a C++ global illumination renderer written in fewer than 100 lines of code. Global illumination is done via unbiased Monte Carlo path tracing, and multi-threading is supported via the OpenMP library. Learn more via the OpenBenchmarking.org test page.
Smallpt 1.0, Global Illumination Renderer; 100 Samples (Seconds, Fewer Is Better). Configurations: -fslp-vectorize, LLVM Clang 3.4 SVN. SE +/- 0.88, N = 3; SE +/- 0.58, N = 3. Results: 301, 293. Notes: -fslp-vectorize; 1. (CXX) g++ options: -fopenmp -O3 -march=native
Apache Benchmark 2.4.3, Static Web Page Serving (Requests Per Second, More Is Better). Configurations: -fslp-vectorize, LLVM Clang 3.4 SVN. SE +/- 44.32, N = 3; SE +/- 60.14, N = 3. Results: 9715.20, 9830.61. Notes: 1. (CC) gcc options: -shared -fPIC -pthread -O3 -march=native
SciMark 2.0, Computational Test: Sparse Matrix Multiply (Mflops, More Is Better). Configurations: -fslp-vectorize, LLVM Clang 3.4 SVN. SE +/- 1.06, N = 4; SE +/- 0.93, N = 4. Results: 1228.19, 1234.66. Notes: -fslp-vectorize; 1. (CXX) g++ options: -O3 -march=native
SciMark 2.0, Computational Test: Dense LU Matrix Factorization (Mflops, More Is Better). Configurations: -fslp-vectorize, LLVM Clang 3.4 SVN. SE +/- 3.58, N = 4; SE +/- 8.33, N = 4. Results: 1995.65, 2003.23. Notes: -fslp-vectorize; 1. (CXX) g++ options: -O3 -march=native
x264 2013-06-08, H.264 Video Encoding (Frames Per Second, More Is Better). Configurations: -fslp-vectorize, LLVM Clang 3.4 SVN. SE +/- 0.11, N = 5; SE +/- 0.14, N = 5. Results: 59.83, 59.63. Notes: -fslp-vectorize; 1. (CC) gcc options: -ldl -m64 -lm -lpthread -O3 -ffast-math -march=native -std=gnu99 -fomit-frame-pointer -fno-tree-vectorize
Timed PHP Compilation 5.2.9, Time To Compile (Seconds, Fewer Is Better). Configurations: -fslp-vectorize, LLVM Clang 3.4 SVN. SE +/- 0.07, N = 3; SE +/- 0.10, N = 3. Results: 42.07, 41.97. Notes: -fslp-vectorize; 1. (CC) gcc options: -O3 -march=native -pedantic -ldl -lz -lm
SciMark 2.0, Computational Test: Jacobi Successive Over-Relaxation (Mflops, More Is Better). Configurations: -fslp-vectorize, LLVM Clang 3.4 SVN. SE +/- 0.98, N = 4; SE +/- 0.98, N = 4. Results: 1093.98, 1092.03. Notes: -fslp-vectorize; 1. (CXX) g++ options: -O3 -march=native
SciMark 2.0, Computational Test: Monte Carlo (Mflops, More Is Better). Configurations: -fslp-vectorize, LLVM Clang 3.4 SVN. SE +/- 0.00, N = 4; SE +/- 0.38, N = 4. Results: 379.41, 378.75. Notes: -fslp-vectorize; 1. (CXX) g++ options: -O3 -march=native
C-Ray: This is a test of C-Ray, a simple raytracer designed to test floating-point CPU performance. The test is multi-threaded (16 threads per core), shoots 8 rays per pixel for anti-aliasing, and generates a 1600 x 1200 image. Learn more via the OpenBenchmarking.org test page.
C-Ray 1.1, Total Time (Seconds, Fewer Is Better). Configurations: -fslp-vectorize, LLVM Clang 3.4 SVN. SE +/- 0.01, N = 3; SE +/- 0.04, N = 3. Results: 75.87, 75.98. Notes: -fslp-vectorize; 1. (CC) gcc options: -lm -lpthread -O3 -march=native
Timed MAFFT Alignment 6.864, Multiple Sequence Alignment (Seconds, Fewer Is Better). Configurations: -fslp-vectorize, LLVM Clang 3.4 SVN. SE +/- 0.23, N = 5; SE +/- 0.25, N = 6. Results: 15.44, 15.42. Notes: 1. (CC) gcc options: -O3 -lm -lpthread
Timed HMMer Search: This test searches through the Pfam database of profile hidden Markov models. The search finds the domain structure of the Drosophila Sevenless protein. Learn more via the OpenBenchmarking.org test page.
Timed HMMer Search 2.3.2, Pfam Database Search (Seconds, Fewer Is Better). Configurations: -fslp-vectorize, LLVM Clang 3.4 SVN. SE +/- 0.03, N = 3; SE +/- 0.02, N = 3. Results: 27.30, 27.33. Notes: -fslp-vectorize; 1. (CC) gcc options: -O3 -march=native -pthread -lhmmer -lsquid -lm
SciMark 2.0, Computational Test: Composite (Mflops, More Is Better). Configurations: -fslp-vectorize, LLVM Clang 3.4 SVN. SE +/- 0.78, N = 4; SE +/- 1.47, N = 4. Results: 977.27, 976.45. Notes: -fslp-vectorize; 1. (CXX) g++ options: -O3 -march=native
Primesieve 4.2, 1e12 Prime Number Generation (Seconds, Fewer Is Better). Configurations: -fslp-vectorize, LLVM Clang 3.4 SVN. SE +/- 0.14, N = 3; SE +/- 0.26, N = 3. Results: 614.34, 614.07. Notes: 1. (CXX) g++ options: -O2
FLAC Audio Encoding 1.3.0, WAV To FLAC (Seconds, Fewer Is Better). Configurations: -fslp-vectorize, LLVM Clang 3.4 SVN. SE +/- 0.01, N = 5; SE +/- 0.01, N = 5. Results: 8.86, 8.86. Notes: -fslp-vectorize; 1. (CXX) g++ options: -O3 -march=native -fvisibility=hidden -logg -lm
GraphicsMagick 1.3.16, Operation: Local Adaptive Thresholding (Iterations Per Minute, More Is Better). Configurations: -fslp-vectorize, LLVM Clang 3.4 SVN. SE +/- 0.00, N = 3; SE +/- 0.00, N = 3. Results: 32, 32. Notes: -fslp-vectorize; 1. (CC) gcc options: -O3 -march=native -pthread -lXext -lSM -lICE -lX11 -lxml2 -lz -lm -lpthread
GraphicsMagick 1.3.16, Operation: HWB Color Space (Iterations Per Minute, More Is Better). Configurations: -fslp-vectorize, LLVM Clang 3.4 SVN. SE +/- 0.33, N = 3; SE +/- 0.00, N = 3. Results: 80, 80. Notes: -fslp-vectorize; 1. (CC) gcc options: -O3 -march=native -pthread -lXext -lSM -lICE -lX11 -lxml2 -lz -lm -lpthread
GraphicsMagick 1.3.16, Operation: Resizing (Iterations Per Minute, More Is Better). Configurations: -fslp-vectorize, LLVM Clang 3.4 SVN. SE +/- 0.33, N = 3; SE +/- 0.00, N = 3. Results: 66, 66. Notes: -fslp-vectorize; 1. (CC) gcc options: -O3 -march=native -pthread -lXext -lSM -lICE -lX11 -lxml2 -lz -lm -lpthread
GraphicsMagick 1.3.16, Operation: Sharpen (Iterations Per Minute, More Is Better). Configurations: -fslp-vectorize, LLVM Clang 3.4 SVN. SE +/- 0.00, N = 3; SE +/- 0.00, N = 3. Results: 36, 36. Notes: -fslp-vectorize; 1. (CC) gcc options: -O3 -march=native -pthread -lXext -lSM -lICE -lX11 -lxml2 -lz -lm -lpthread
GraphicsMagick 1.3.16, Operation: Blur (Iterations Per Minute, More Is Better). Configurations: -fslp-vectorize, LLVM Clang 3.4 SVN. SE +/- 0.00, N = 3; SE +/- 0.00, N = 3. Results: 53, 53. Notes: -fslp-vectorize; 1. (CC) gcc options: -O3 -march=native -pthread -lXext -lSM -lICE -lX11 -lxml2 -lz -lm -lpthread
BLAKE2 20121223 (Cycles Per Byte, Fewer Is Better). Configurations: -fslp-vectorize, LLVM Clang 3.4 SVN. SE +/- 0.00, N = 3; SE +/- 0.01, N = 3. Results: 4.20, 4.20. Notes: Phoronix Test Suite v4.8.0m4; 1. (CC) gcc options: -std=gnu99 -O3 -march=native -lcrypto -lz
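Each chart above reports two means with their standard errors, so a quick way to judge whether a gap is larger than run-to-run noise is to express it both as a percentage and in units of the combined standard error. The following is a minimal sketch, not part of the original result file: the helper name `compare` is hypothetical, and pairing each standard error with the value in the same listed position is an assumption about how the chart data was extracted.

```python
import math


def compare(mean_a, se_a, mean_b, se_b):
    """Compare two reported means (hypothetical helper, not from the result file).

    Returns a tuple of:
      - the difference of A relative to B, in percent
      - the difference divided by the combined standard error
    """
    diff = mean_a - mean_b
    pct = 100.0 * diff / mean_b
    # Combined standard error of the difference of two independent means.
    combined_se = math.sqrt(se_a ** 2 + se_b ** 2)
    if combined_se == 0:
        # Both standard errors were reported as 0.00; no noise estimate available.
        ratio = 0.0 if diff == 0 else math.copysign(math.inf, diff)
    else:
        ratio = diff / combined_se
    return pct, ratio


if __name__ == "__main__":
    # SciMark FFT values in the order they are listed in the chart line.
    # Assumption: the first SE belongs to the first value, the second to the second.
    pct, ratio = compare(189.10, 0.40, 173.57, 0.61)
    print(f"FFT: {pct:.2f}% difference, {ratio:.1f} combined standard errors")
```

A ratio well above roughly 2 suggests the gap is unlikely to be measurement noise, while values near 0 (as for most of the charts here) indicate no measurable difference. With only 3 to 6 runs per result, this is a rough guide rather than a formal significance test.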
LLVM Clang 3.4 SVN: Testing initiated at 28 July 2013 22:40 by user phoronix.
-fslp-vectorize: Testing initiated at 29 July 2013 10:13 by user phoronix.