Debian Linux GCC 8 Benchmark -mindirect-branch=thunk
GCC 8 benchmarking of user-space with -mindirect-branch=thunk and -mindirect-branch=thunk-inline for retpolines. Tests by Michael Larabel for a future article on Phoronix.com.
HTML result view exported from: https://openbenchmarking.org/result/1801316-AL-1801161PT71&grs&sro .
Stock, -mindirect-branch=thunk, -mindirect-branch=thunk-inline:
  Processor: Intel Core i9-7980XE @ 4.40GHz (18 Cores / 36 Threads)
  Motherboard: ASUS PRIME X299-A (1004 BIOS)
  Chipset: Intel Device 2020
  Memory: 16384MB
  Disk: 120GB Force MP500
  Graphics: LLVMpipe
  Audio: Realtek ALC1220
  Monitor: Acer B286HK
  Network: Intel Connection
  OS: Debian 9.3
  Kernel: 4.15.0-rc8-retpo-underflow (x86_64) 20180115
  Desktop: GNOME Shell 3.22.3
  Display Driver: modesetting 1.19.2
  OpenGL: 3.3 Mesa 13.0.6 Gallium 0.4 (LLVM 3.9 256 bits)
  Compiler: GCC 8.0.1 20180115
  File-System: ext4
  Screen Resolution: 3840x2160

spectre-tests:
  Processor: AMD Ryzen 7 1800X Eight-Core @ 3.70GHz (8 Cores / 16 Threads)
  Motherboard: ASRock X370 Professional Gaming
  Chipset: AMD Family 17h
  Memory: 64512MB
  Disk: 500GB Samsung SSD 850 + 2000GB Western Digital WD2003FZEX-0
  Graphics: llvmpipe 6144MB (139/405MHz)
  Audio: NVIDIA GP106 HD Audio
  Monitor: HP ZR2440w
  Network: Aquantia Device d108 + Intel Device 24fb
  OS: openSUSE 20180129
  Kernel: 4.14.15-1-default (x86_64)
  Display Server: X Server 1.19.6
  Display Driver: NVIDIA 384.111
  OpenGL: 3.3 Mesa 17.3.3 (LLVM 5.0 128 bits)
  Vulkan: 1.0.65
  Compiler: GCC 7.3.0 + Clang 5.0.1 (SVN 312548) + LLVM 5.0.1 + ICC + CUDA 8.0
  File-System: btrfs
  Screen Resolution: 3840x1200

Environment Details
  - Stock: CXXFLAGS="-O3 -march=native" CFLAGS="-O3 -march=native"
  - -mindirect-branch=thunk: CXXFLAGS="-O3 -march=native -mindirect-branch=thunk" CFLAGS="-O3 -march=native -mindirect-branch=thunk"
  - -mindirect-branch=thunk-inline: CXXFLAGS="-O3 -march=native -mindirect-branch=thunk-inline" CFLAGS="-O3 -march=native -mindirect-branch=thunk-inline"

Compiler Details
  - Stock: --disable-multilib --enable-checking=release
  - -mindirect-branch=thunk: --disable-multilib --enable-checking=release
  - -mindirect-branch=thunk-inline: --disable-multilib --enable-checking=release
  - spectre-tests: --build=x86_64-suse-linux --disable-libcc1 --disable-libssp --disable-libstdcxx-pch --disable-libvtv --disable-werror --enable-__cxa_atexit --enable-checking=release --enable-gnu-indirect-function --enable-languages=c,c++,objc,fortran,obj-c++,ada,go --enable-libstdcxx-allocator=new --enable-linux-futex --enable-multilib --enable-offload-targets=hsa,nvptx-none=/usr/nvptx-none, --enable-plugin --enable-ssp --enable-version-specific-runtime-libs --host=x86_64-suse-linux --mandir=/usr/share/man --with-arch-32=x86-64 --with-gcc-major-version-only --with-slibdir=/lib64 --with-tune=generic --without-cuda-driver --without-system-libunwind

Disk Details
  - Stock, -mindirect-branch=thunk: NONE / data=ordered,errors=remount-ro,relatime,rw

Processor Details
  - Stock: Scaling Governor: intel_pstate powersave
  - -mindirect-branch=thunk: Scaling Governor: intel_pstate powersave
  - -mindirect-branch=thunk-inline: Scaling Governor: intel_pstate powersave
  - spectre-tests: Scaling Governor: acpi-cpufreq ondemand

Python Details
  - Stock, -mindirect-branch=thunk: Python 2.7.13 + Python 3.5.3

Security Details
  - Stock, -mindirect-branch=thunk, -mindirect-branch=thunk-inline: KPTI Full retpoline with underflow protection Protection

System Details
  - spectre-tests: Anisotropic Filtering: 16x

Result overview

Test                                                  Stock        -mindirect-branch=thunk  -mindirect-branch=thunk-inline  spectre-tests
hpcg                                                  1.38         1.35                     1.38                            1.01
mpcbench: Multi-Precision Benchmark                   10013        9643                     9830                            7557
tscp: AI Chess Performance                            1386794      1185357                  1116421                         1068228
stockfish: Total Time                                 2904         3074                     3228                            3454
bullet: Convex Trimesh                                1.08         1.23                     1.28                            1.11
bullet: Prim Trimesh                                  0.92         1.00                     1.03                            0.89
bullet: Raytests                                      2.53         2.78                     2.89
bullet: 1000 Stack                                    4.44         4.57                     4.83                            4.85
ffmpeg: H.264 HD To NTSC DV                           13.29        13.89                    14.13
redis: GET                                            2222262.58   2187123.33               2160702.13                      2297157.92
hpcc: G-Ffte                                          5.88475      5.58564                  5.56022
hpcc: G-HPL                                           85.93090     86.04620                 85.97547
redis: SET                                            1399046.62   1280528.47               1457026.92                      1676110.71
pgbench: Buffer Test - Heavy Contention - Read Write  11290.86     11147.89                 10540.60                        3070.48
pgbench: Buffer Test - Normal Load - Read Write       11387.09     11104.86                 11460.43                        2455.92
bullet: 1000 Convex                                   4.37         4.65                     4.93                            4.51
bullet: 3000 Fall                                     3.88         4.11                     6.53                            4.25

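To make the overview easier to read as overhead figures, the following minimal Python sketch converts a few rows into percent change relative to Stock. The selection of rows, the `RESULTS` layout, and the function names `slowdown_pct` and `summarize` are illustrative assumptions and not part of the exported result; the numbers themselves are copied from the overview, and the "higher is better" direction is taken from each test's own section.

```python
# Subset of the overview: (stock, thunk, thunk_inline, higher_is_better).
# Values are copied from the result overview; the structure is hypothetical.
RESULTS = {
    "tscp (nodes/s)": (1386794, 1185357, 1116421, True),
    "stockfish (ms)": (2904, 3074, 3228, False),
    "ffmpeg (s)": (13.29, 13.89, 14.13, False),
    "redis GET (req/s)": (2222262.58, 2187123.33, 2160702.13, True),
}


def slowdown_pct(stock, variant, higher_is_better):
    """Percent of performance lost relative to Stock (positive = slower)."""
    if higher_is_better:
        return (stock - variant) / stock * 100.0
    return (variant - stock) / stock * 100.0


def summarize(results):
    """Map each test to (thunk slowdown %, thunk-inline slowdown %), rounded to 0.1."""
    return {
        name: (
            round(slowdown_pct(stock, thunk, hib), 1),
            round(slowdown_pct(stock, inline, hib), 1),
        )
        for name, (stock, thunk, inline, hib) in results.items()
    }


if __name__ == "__main__":
    for name, (thunk, inline) in summarize(RESULTS).items():
        print(f"{name:20s} thunk {thunk:+.1f}%  thunk-inline {inline:+.1f}%")
```

On these figures the sketch reports roughly 14.5% and 19.5% fewer nodes per second for TSCP with thunk and thunk-inline, against about 1.6% and 2.8% fewer requests per second for Redis GET. It ignores run-to-run variance, so small differences should be read with the per-test standard errors in mind.
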
High Performance Conjugate Gradient 3.0
GFLOP/s, More Is Better
  -mindirect-branch=thunk:         1.35  (SE +/- 0.00, N = 3)
  -mindirect-branch=thunk-inline:  1.38  (SE +/- 0.01, N = 3)
  Stock:                           1.38  (SE +/- 0.01, N = 3)
  spectre-tests:                   1.01  (SE +/- 0.00, N = 3)

GNU MPC 1.1.0
Multi-Precision Benchmark
Global Score, More Is Better
  -mindirect-branch=thunk:         9643   (SE +/- 84.52, N = 3)
  -mindirect-branch=thunk-inline:  9830   (SE +/- 75.72, N = 3)
  Stock:                           10013  (SE +/- 43.72, N = 3)
  spectre-tests:                   7557   (SE +/- 8.82, N = 3)
  Additional flags, -mindirect-branch=thunk, -mindirect-branch=thunk-inline, Stock: -lm -O3 -march=native
  Additional flags, spectre-tests: -O2 -pedantic -fomit-frame-pointer -m64 -mtune=k8 -march=k8
  1. (CC) gcc options: -MT -MD -MP -MF

TSCP 1.81
AI Chess Performance
Nodes Per Second, More Is Better
  -mindirect-branch=thunk:         1185357  (SE +/- 10473.03, N = 5)
  -mindirect-branch=thunk-inline:  1116421  (SE +/- 17216.49, N = 5)
  Stock:                           1386794  (SE +/- 18404.85, N = 6)
  spectre-tests:                   1068228  (SE +/- 507.53, N = 5)
  1. (CC) gcc options: -O3 -march=native

Stockfish 2014-11-26
Total Time
ms, Fewer Is Better
  -mindirect-branch=thunk:         3074  (SE +/- 44.96, N = 3)
  -mindirect-branch=thunk-inline:  3228  (SE +/- 56.05, N = 3)
  Stock:                           2904  (SE +/- 6.11, N = 3)
  spectre-tests:                   3454  (SE +/- 6.06, N = 3)
  Additional flags, -mindirect-branch=thunk, -mindirect-branch=thunk-inline, Stock: -march=native
  1. (CXX) g++ options: -lpthread -O3 -fno-exceptions -fno-rtti -ansi -pedantic -msse -msse3 -mpopcnt -flto

Bullet Physics Engine 2.81
Test: Convex Trimesh
Seconds, Fewer Is Better
  -mindirect-branch=thunk:         1.23  (SE +/- 0.02, N = 3)
  -mindirect-branch=thunk-inline:  1.28  (SE +/- 0.00, N = 3)
  Stock:                           1.08  (SE +/- 0.04, N = 3)
  spectre-tests:                   1.11  (SE +/- 0.02, N = 3)
  Additional flags, -mindirect-branch=thunk, -mindirect-branch=thunk-inline, Stock: -march=native
  1. (CXX) g++ options: -O3 -rdynamic -lglut -lGL -lGLU

Bullet Physics Engine 2.81
Test: Prim Trimesh
Seconds, Fewer Is Better
  -mindirect-branch=thunk:         1.00  (SE +/- 0.02, N = 3)
  -mindirect-branch=thunk-inline:  1.03  (SE +/- 0.02, N = 3)
  Stock:                           0.92  (SE +/- 0.03, N = 3)
  spectre-tests:                   0.89  (SE +/- 0.00, N = 3)
  Additional flags, -mindirect-branch=thunk, -mindirect-branch=thunk-inline, Stock: -march=native
  1. (CXX) g++ options: -O3 -rdynamic -lglut -lGL -lGLU

Bullet Physics Engine 2.81
Test: Raytests
Seconds, Fewer Is Better
  -mindirect-branch=thunk:         2.78  (SE +/- 0.04, N = 3)
  -mindirect-branch=thunk-inline:  2.89  (SE +/- 0.01, N = 3)
  Stock:                           2.53  (SE +/- 0.04, N = 3)
  1. (CXX) g++ options: -O3 -march=native -rdynamic -lglut -lGL -lGLU

Bullet Physics Engine 2.81
Test: 1000 Stack
Seconds, Fewer Is Better
  -mindirect-branch=thunk:         4.57  (SE +/- 0.03, N = 3)
  -mindirect-branch=thunk-inline:  4.83  (SE +/- 0.05, N = 3)
  Stock:                           4.44  (SE +/- 0.04, N = 3)
  spectre-tests:                   4.85  (SE +/- 0.04, N = 3)
  Additional flags, -mindirect-branch=thunk, -mindirect-branch=thunk-inline, Stock: -march=native
  1. (CXX) g++ options: -O3 -rdynamic -lglut -lGL -lGLU

FFmpeg 3.3.3
H.264 HD To NTSC DV
Seconds, Fewer Is Better
  -mindirect-branch=thunk:         13.89  (SE +/- 0.30, N = 6)
  -mindirect-branch=thunk-inline:  14.13  (SE +/- 0.32, N = 6)
  Stock:                           13.29  (SE +/- 0.31, N = 6)
  1. (CC) gcc options: -lavdevice -lavfilter -lavformat -lavcodec -lswresample -lswscale -lavutil -ldl -lxcb -lxcb-shm -lxcb-xfixes -lxcb-shape -lasound -lm -llzma -lbz2 -pthread -O3 -march=native -std=c11 -fomit-frame-pointer -fno-math-errno -fno-signed-zeros -fno-tree-vectorize -MMD -MF -MT

Redis 3.0.1
Test: GET
Requests Per Second, More Is Better
  -mindirect-branch=thunk:         2187123.33  (SE +/- 40180.95, N = 6)
  -mindirect-branch=thunk-inline:  2160702.13  (SE +/- 41655.36, N = 6)
  Stock:                           2222262.58  (SE +/- 43689.11, N = 3)
  spectre-tests:                   2297157.92  (SE +/- 8828.42, N = 3)
  1. (CC) gcc options: -ggdb -rdynamic -lm -pthread

HPC Challenge 1.5.0
Test / Class: G-Ffte
GFLOP/s, More Is Better
  -mindirect-branch=thunk:         5.58564  (SE +/- 0.01968, N = 3)
  -mindirect-branch=thunk-inline:  5.56022  (SE +/- 0.02991, N = 3)
  Stock:                           5.88475  (SE +/- 0.18843, N = 3)
  1. (CC) gcc options: -lblas -lm -pthread -lmpi -fomit-frame-pointer -O3 -march=native -funroll-loops
  2. BLAS + Open MPI 2.0.2

HPC Challenge 1.5.0
Test / Class: G-HPL
GFLOPS, More Is Better
  -mindirect-branch=thunk:         86.05  (SE +/- 0.10, N = 3)
  -mindirect-branch=thunk-inline:  85.98  (SE +/- 0.17, N = 3)
  Stock:                           85.93  (SE +/- 0.29, N = 3)
  1. (CC) gcc options: -lblas -lm -pthread -lmpi -fomit-frame-pointer -O3 -march=native -funroll-loops
  2. BLAS + Open MPI 2.0.2

Redis 3.0.1
Test: SET
Requests Per Second, More Is Better
  -mindirect-branch=thunk:         1280528.47  (SE +/- 160590.96, N = 6)
  -mindirect-branch=thunk-inline:  1457026.92  (SE +/- 2554.75, N = 3)
  Stock:                           1399046.62  (SE +/- 34299.85, N = 6)
  spectre-tests:                   1676110.71  (SE +/- 10561.86, N = 3)
  1. (CC) gcc options: -ggdb -rdynamic -lm -pthread

PostgreSQL pgbench 10.0
Scaling: Buffer Test - Test: Heavy Contention - Mode: Read Write
TPS, More Is Better
  -mindirect-branch=thunk:         11147.89  (SE +/- 326.63, N = 6)
  -mindirect-branch=thunk-inline:  10540.60  (SE +/- 43.65, N = 3)
  Stock:                           11290.86  (SE +/- 264.90, N = 6)
  spectre-tests:                   3070.48   (SE +/- 43.25, N = 6)
  Additional flags, -mindirect-branch=thunk, -mindirect-branch=thunk-inline, Stock: -O3 -march=native
  Additional flags, spectre-tests: -O2
  1. (CC) gcc options: -fno-strict-aliasing -fwrapv -fPIC -lpgcommon -lpgport -lpthread -lrt -lcrypt -ldl -lm

PostgreSQL pgbench 10.0
Scaling: Buffer Test - Test: Normal Load - Mode: Read Write
TPS, More Is Better
  -mindirect-branch=thunk:         11104.86  (SE +/- 276.43, N = 6)
  -mindirect-branch=thunk-inline:  11460.43  (SE +/- 63.39, N = 3)
  Stock:                           11387.09  (SE +/- 221.41, N = 6)
  spectre-tests:                   2455.92   (SE +/- 41.62, N = 3)
  Additional flags, -mindirect-branch=thunk, -mindirect-branch=thunk-inline, Stock: -O3 -march=native
  Additional flags, spectre-tests: -O2
  1. (CC) gcc options: -fno-strict-aliasing -fwrapv -fPIC -lpgcommon -lpgport -lpthread -lrt -lcrypt -ldl -lm

Bullet Physics Engine 2.81
Test: 1000 Convex
Seconds, Fewer Is Better
  -mindirect-branch=thunk:         4.65  (SE +/- 0.11, N = 3)
  -mindirect-branch=thunk-inline:  4.93  (SE +/- 0.06, N = 3)
  Stock:                           4.37  (SE +/- 0.17, N = 3)
  spectre-tests:                   4.51  (SE +/- 0.03, N = 3)
  Additional flags, -mindirect-branch=thunk, -mindirect-branch=thunk-inline, Stock: -march=native
  1. (CXX) g++ options: -O3 -rdynamic -lglut -lGL -lGLU

Bullet Physics Engine 2.81
Test: 3000 Fall
Seconds, Fewer Is Better
  -mindirect-branch=thunk:         4.11  (SE +/- 0.02, N = 3)
  -mindirect-branch=thunk-inline:  6.53  (SE +/- 2.39, N = 3)
  Stock:                           3.88  (SE +/- 0.02, N = 3)
  spectre-tests:                   4.25  (SE +/- 0.07, N = 3)
  Additional flags, -mindirect-branch=thunk, -mindirect-branch=thunk-inline, Stock: -march=native
  1. (CXX) g++ options: -O3 -rdynamic -lglut -lGL -lGLU

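The 3000 Fall result for -mindirect-branch=thunk-inline carries a much larger standard error (2.39) than the other runs, so it is worth checking how far each variant sits from Stock relative to the reported noise. The sketch below is a rough heuristic rather than a formal significance test: it divides the difference of two means by their combined standard error. It assumes the SE values in the export are listed in the same order as the results, and the function name `diff_in_se_units` is hypothetical.

```python
import math

# (mean seconds, standard error) for Bullet "3000 Fall", copied from the result;
# pairing each SE with its configuration is an assumption about the export order.
STOCK = (3.88, 0.02)
THUNK = (4.11, 0.02)
THUNK_INLINE = (6.53, 2.39)


def diff_in_se_units(a, se_a, b, se_b):
    """Difference a - b expressed in units of the combined standard error."""
    combined = math.sqrt(se_a ** 2 + se_b ** 2)
    return (a - b) / combined


if __name__ == "__main__":
    for label, (mean, se) in (("thunk", THUNK), ("thunk-inline", THUNK_INLINE)):
        ratio = diff_in_se_units(mean, se, STOCK[0], STOCK[1])
        print(f"{label:13s} vs Stock: {ratio:.2f} combined standard errors")
```

With these inputs the thunk run is about 8 combined standard errors slower than Stock, while the thunk-inline run is only about 1.1, which suggests the 6.53 second figure is dominated by run-to-run variation. With N = 3 per configuration, this should be treated as an indication only.
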
Phoronix Test Suite v10.8.5