LLVM Clang 3.2 Loop Vectorizer

Intel Core i7-3960X testing of the automatic loop vectorizer in LLVM 3.2 with the Clang compiler. Benchmarking by Michael Larabel for a future article on phoronix.com.
HTML result view exported from: https://openbenchmarking.org/result/1210264-RA-LLVMLOOPV66&sor .
System configuration (runs: Default, Loop Vectorization)

  Processor:          Intel Core i7-3960X @ 3.30GHz (12 Cores)
  Motherboard:        Intel DX79SI
  Chipset:            Intel Xeon E5/Core
  Memory:             8192MB
  Disk:               64GB OCZ VERTEX
  Graphics:           AMD Radeon HD 4650 512MB
  Audio:              Realtek ALC892
  Monitor:            DELL S2409W
  Network:            Intel 82579LM Gigabit Connection
  OS:                 Ubuntu 12.10
  Kernel:             3.5.0-17-generic (x86_64)
  Desktop:            Unity 6.8.0
  Display Server:     X Server 1.13.0
  Display Driver:     radeon 6.99.99
  OpenGL:             2.1 Mesa 9.0 Gallium 0.4
  Compiler:           Clang 3.2 (SVN 166775) + LLVM 3.2svn
  File-System:        ext4
  Screen Resolution:  1920x1080

Compiler Details: Optimized build; Built Oct 26 2012 (10:29:50); Default target: x86_64-unknown-linux-gnu; Host CPU: corei7-avx
Processor Details: Scaling Governor: ondemand
System Details: Compiz was running on this system.
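Result summary

Test                                                 Default  Loop Vectorization  Unit
hmmer: Pfam Database Search                            15.35               15.37  Seconds, fewer is better
graphics-magick: Blur                                     82                  73  Iterations per minute, more is better
graphics-magick: Sharpen                                  31                  31  Iterations per minute, more is better
graphics-magick: Resizing                                 89                  84  Iterations per minute, more is better
graphics-magick: HWB Color Space                         123                 122  Iterations per minute, more is better
graphics-magick: Local Adaptive Thresholding              44                  20  Iterations per minute, more is better
himeno: Poisson Pressure Solver                      1595.35             1554.66  MFLOPS, more is better
c-ray: Total Time                                      20.92               23.15  Seconds, fewer is better
smallpt: Global Illumination Renderer; 100 Samples       153                 157  Seconds, fewer is better
pgbench: TPC-B Transactions Per Second                324.77              336.65  TPS, more is better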
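The Default and Loop Vectorization figures in the summary are easier to compare as relative changes. The following is a minimal sketch of that arithmetic, not part of the original export: the values are copied from the summary, the "higher is better" flags come from the per-test chart labels, and the names `RESULTS`, `percent_change`, and `is_improvement` are hypothetical helpers introduced only for illustration.

```python
# Sketch: relative change of the Loop Vectorization run versus Default.
# RESULTS, percent_change and is_improvement are hypothetical names,
# not part of the Phoronix Test Suite.

# test name -> (default, loop_vectorization, higher_is_better)
RESULTS = {
    "hmmer: Pfam Database Search": (15.35, 15.37, False),
    "graphics-magick: Blur": (82, 73, True),
    "graphics-magick: Sharpen": (31, 31, True),
    "graphics-magick: Resizing": (89, 84, True),
    "graphics-magick: HWB Color Space": (123, 122, True),
    "graphics-magick: Local Adaptive Thresholding": (44, 20, True),
    "himeno: Poisson Pressure Solver": (1595.35, 1554.66, True),
    "c-ray: Total Time": (20.92, 23.15, False),
    "smallpt: Global Illumination Renderer; 100 Samples": (153, 157, False),
    "pgbench: TPC-B Transactions Per Second": (324.77, 336.65, True),
}


def percent_change(default, vectorized):
    """Raw change of the reported metric, in percent of the Default value."""
    return (vectorized - default) / default * 100.0


def is_improvement(default, vectorized, higher_is_better):
    """True if the Loop Vectorization result is strictly better than Default."""
    if higher_is_better:
        return vectorized > default
    return vectorized < default


def main():
    for name, (default, vectorized, higher_is_better) in RESULTS.items():
        change = percent_change(default, vectorized)
        if vectorized == default:
            verdict = "unchanged"
        elif is_improvement(default, vectorized, higher_is_better):
            verdict = "better"
        else:
            verdict = "worse"
        print(f"{name:<52} {change:+7.2f}%  {verdict}")


if __name__ == "__main__":
    main()
```

The percentage is the raw change in the reported metric, so its sign has to be read together with the metric's direction (a positive change in seconds is a slowdown). The sketch ignores the reported standard errors, so small differences may fall within run-to-run noise.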
Timed HMMer Search 2.3.2, Pfam Database Search
Seconds, Fewer Is Better
  Default:             15.35 (SE +/- 0.05, N = 3)
  Loop Vectorization:  15.37 (SE +/- 0.02, N = 3)
Run-specific options: -mllvm -vectorize
1. (CC) gcc options: -O3 -march=native -pthread -lhmmer -lsquid -lm
GraphicsMagick 1.3.16, Operation: Blur
Iterations Per Minute, More Is Better
  Default:             82 (SE +/- 0.00, N = 3)
  Loop Vectorization:  73 (SE +/- 0.00, N = 3)
Run-specific options: -mllvm -vectorize -lpng12
1. (CC) gcc options: -O3 -march=native -pthread -lXext -lX11 -lz -lm -lpthread
GraphicsMagick 1.3.16, Operation: Sharpen
Iterations Per Minute, More Is Better
  Loop Vectorization:  31 (SE +/- 0.00, N = 3)
  Default:             31 (SE +/- 0.00, N = 3)
Run-specific options: -mllvm -vectorize -lpng12
1. (CC) gcc options: -O3 -march=native -pthread -lXext -lX11 -lz -lm -lpthread
GraphicsMagick 1.3.16, Operation: Resizing
Iterations Per Minute, More Is Better
  Default:             89 (SE +/- 0.00, N = 3)
  Loop Vectorization:  84 (SE +/- 0.00, N = 3)
Run-specific options: -mllvm -vectorize -lpng12
1. (CC) gcc options: -O3 -march=native -pthread -lXext -lX11 -lz -lm -lpthread
GraphicsMagick 1.3.16, Operation: HWB Color Space
Iterations Per Minute, More Is Better
  Default:             123 (SE +/- 0.00, N = 3)
  Loop Vectorization:  122 (SE +/- 0.33, N = 3)
Run-specific options: -mllvm -vectorize -lpng12
1. (CC) gcc options: -O3 -march=native -pthread -lXext -lX11 -lz -lm -lpthread
GraphicsMagick 1.3.16, Operation: Local Adaptive Thresholding
Iterations Per Minute, More Is Better
  Default:             44 (SE +/- 0.00, N = 3)
  Loop Vectorization:  20 (SE +/- 0.00, N = 3)
Run-specific options: -mllvm -vectorize -lpng12
1. (CC) gcc options: -O3 -march=native -pthread -lXext -lX11 -lz -lm -lpthread
Himeno Benchmark 3.0, Poisson Pressure Solver
MFLOPS, More Is Better
  Default:             1595.35 (SE +/- 3.05, N = 3)
  Loop Vectorization:  1554.66 (SE +/- 6.98, N = 3)
Run-specific options: -mllvm -vectorize
1. (CC) gcc options: -O3 -march=native
C-Ray 1.1, Total Time
Seconds, Fewer Is Better
  Default:             20.92 (SE +/- 0.03, N = 3)
  Loop Vectorization:  23.15 (SE +/- 0.04, N = 3)
Run-specific options: -mllvm -vectorize
1. (CC) gcc options: -lm -lpthread -O3 -march=native
Smallpt 1.0, Global Illumination Renderer; 100 Samples
Seconds, Fewer Is Better
  Default:             153 (SE +/- 0.33, N = 3)
  Loop Vectorization:  157 (SE +/- 0.33, N = 3)
Run-specific options: -mllvm -vectorize
1. (CXX) g++ options: -fopenmp -O3 -march=native
PostgreSQL pgbench 8.4.11, TPC-B Transactions Per Second
TPS, More Is Better
  Loop Vectorization:  336.65 (SE +/- 3.97, N = 3)
  Default:             324.77 (SE +/- 0.52, N = 3)
Run-specific options: -mllvm -vectorize
1. (CC) gcc options: -O3 -march=native -fno-strict-aliasing -fwrapv -lpgport -lpq -lcrypt -ldl -lm
Phoronix Test Suite v10.8.4