Debian Linux GCC 8 Benchmark -mindirect-branch=thunk

GCC 8 benchmarking of user-space code with -mindirect-branch=thunk and -mindirect-branch=thunk-inline for retpolines. Tests by Michael Larabel for a future article on Phoronix.com.
HTML result view exported from: https://openbenchmarking.org/result/1801316-AL-1801161PT71&grt&sor .

System configuration

Stock, -mindirect-branch=thunk, -mindirect-branch=thunk-inline:
  Processor: Intel Core i9-7980XE @ 4.40GHz (18 Cores / 36 Threads)
  Motherboard: ASUS PRIME X299-A (1004 BIOS)
  Chipset: Intel Device 2020
  Memory: 16384MB
  Disk: 120GB Force MP500
  Graphics: LLVMpipe
  Audio: Realtek ALC1220
  Monitor: Acer B286HK
  Network: Intel Connection
  OS: Debian 9.3
  Kernel: 4.15.0-rc8-retpo-underflow (x86_64) 20180115
  Desktop: GNOME Shell 3.22.3
  Display Driver: modesetting 1.19.2
  OpenGL: 3.3 Mesa 13.0.6 Gallium 0.4 (LLVM 3.9 256 bits)
  Compiler: GCC 8.0.1 20180115
  File-System: ext4
  Screen Resolution: 3840x2160

spectre-tests:
  Processor: AMD Ryzen 7 1800X Eight-Core @ 3.70GHz (8 Cores / 16 Threads)
  Motherboard: ASRock X370 Professional Gaming
  Chipset: AMD Family 17h
  Memory: 64512MB
  Disk: 500GB Samsung SSD 850 + 2000GB Western Digital WD2003FZEX-0
  Graphics: llvmpipe 6144MB (139/405MHz)
  Audio: NVIDIA GP106 HD Audio
  Monitor: HP ZR2440w
  Network: Aquantia Device d108 + Intel Device 24fb
  OS: openSUSE 20180129
  Kernel: 4.14.15-1-default (x86_64)
  Display Server: X Server 1.19.6
  Display Driver: NVIDIA 384.111
  OpenGL: 3.3 Mesa 17.3.3 (LLVM 5.0 128 bits)
  Vulkan: 1.0.65
  Compiler: GCC 7.3.0 + Clang 5.0.1 (SVN 312548) + LLVM 5.0.1 + ICC + CUDA 8.0
  File-System: btrfs
  Screen Resolution: 3840x1200

Environment Details
  - Stock: CXXFLAGS="-O3 -march=native" CFLAGS="-O3 -march=native"
  - -mindirect-branch=thunk: CXXFLAGS="-O3 -march=native -mindirect-branch=thunk" CFLAGS="-O3 -march=native -mindirect-branch=thunk"
  - -mindirect-branch=thunk-inline: CXXFLAGS="-O3 -march=native -mindirect-branch=thunk-inline" CFLAGS="-O3 -march=native -mindirect-branch=thunk-inline"

Compiler Details
  - Stock: --disable-multilib --enable-checking=release
  - -mindirect-branch=thunk: --disable-multilib --enable-checking=release
  - -mindirect-branch=thunk-inline: --disable-multilib --enable-checking=release
  - spectre-tests: --build=x86_64-suse-linux --disable-libcc1 --disable-libssp --disable-libstdcxx-pch --disable-libvtv --disable-werror --enable-__cxa_atexit --enable-checking=release --enable-gnu-indirect-function --enable-languages=c,c++,objc,fortran,obj-c++,ada,go --enable-libstdcxx-allocator=new --enable-linux-futex --enable-multilib --enable-offload-targets=hsa,nvptx-none=/usr/nvptx-none, --enable-plugin --enable-ssp --enable-version-specific-runtime-libs --host=x86_64-suse-linux --mandir=/usr/share/man --with-arch-32=x86-64 --with-gcc-major-version-only --with-slibdir=/lib64 --with-tune=generic --without-cuda-driver --without-system-libunwind

Disk Details
  - Stock, -mindirect-branch=thunk: NONE / data=ordered,errors=remount-ro,relatime,rw

Processor Details
  - Stock: Scaling Governor: intel_pstate powersave
  - -mindirect-branch=thunk: Scaling Governor: intel_pstate powersave
  - -mindirect-branch=thunk-inline: Scaling Governor: intel_pstate powersave
  - spectre-tests: Scaling Governor: acpi-cpufreq ondemand

Python Details
  - Stock, -mindirect-branch=thunk: Python 2.7.13 + Python 3.5.3

Security Details
  - Stock, -mindirect-branch=thunk, -mindirect-branch=thunk-inline: KPTI Full retpoline with underflow protection Protection

System Details
  - spectre-tests: Anisotropic Filtering: 16x.

Results overview

Columns "thunk" and "thunk-inline" are the -mindirect-branch=thunk and -mindirect-branch=thunk-inline configurations. A "-" marks a test with no result for that configuration.

Test                                                  Stock        thunk        thunk-inline  spectre-tests
bullet: Raytests                                      2.53         2.78         2.89          -
bullet: 3000 Fall                                     3.88         4.11         6.53          4.25
bullet: 1000 Stack                                    4.44         4.57         4.83          4.85
bullet: 1000 Convex                                   4.37         4.65         4.93          4.51
bullet: Prim Trimesh                                  0.92         1.00         1.03          0.89
bullet: Convex Trimesh                                1.08         1.23         1.28          1.11
ffmpeg: H.264 HD To NTSC DV                           13.29        13.89        14.13         -
mpcbench: Multi-Precision Benchmark                   10013        9643         9830          7557
hpcg:                                                 1.38         1.35         1.38          1.01
hpcc: G-HPL                                           85.93090     86.04620     85.97547      -
hpcc: G-Ffte                                          5.88475      5.58564      5.56022       -
pgbench: Buffer Test - Normal Load - Read Write       11387.09     11104.86     11460.43      2455.92
pgbench: Buffer Test - Heavy Contention - Read Write  11290.86     11147.89     10540.60      3070.48
redis: GET                                            2222262.58   2187123.33   2160702.13    2297157.92
redis: SET                                            1399046.62   1280528.47   1457026.92    1676110.71
stockfish: Total Time                                 2904         3074         3228          3454
tscp: AI Chess Performance                            1386794      1185357      1116421       1068228
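
To read the overview as a cost relative to Stock, each value can be turned into a percentage change that takes the metric's direction into account. The following is a minimal sketch, not part of the original result file. The names `RESULTS`, `relative_cost_percent`, and `summarize` are hypothetical. It covers only four of the tests, with values copied from the overview and the direction taken from each chart's "Fewer Is Better" or "More Is Better" label.

```python
# Values copied from the results overview: (Stock, thunk, thunk-inline).
# The boolean is True when the chart is labelled "More Is Better".
RESULTS = {
    "bullet: Raytests (s)": ((2.53, 2.78, 2.89), False),
    "ffmpeg: H.264 HD To NTSC DV (s)": ((13.29, 13.89, 14.13), False),
    "stockfish: Total Time (ms)": ((2904, 3074, 3228), False),
    "tscp: AI Chess Performance (nodes/s)": ((1386794, 1185357, 1116421), True),
}


def relative_cost_percent(stock, value, higher_is_better):
    """Change relative to Stock in percent; a positive number means worse than Stock."""
    if higher_is_better:
        return (stock - value) / stock * 100.0
    return (value - stock) / stock * 100.0


def summarize(results):
    """Map each test name to (thunk cost %, thunk-inline cost %), rounded to 0.1."""
    rows = {}
    for name, ((stock, thunk, inline), higher_is_better) in results.items():
        rows[name] = (
            round(relative_cost_percent(stock, thunk, higher_is_better), 1),
            round(relative_cost_percent(stock, inline, higher_is_better), 1),
        )
    return rows


if __name__ == "__main__":
    for name, (thunk, inline) in summarize(RESULTS).items():
        print(f"{name}: thunk {thunk:+.1f}%, thunk-inline {inline:+.1f}%")
```

The spectre-tests column is left out on purpose: it comes from a different machine, so a percentage against Stock would mix hardware differences with compiler-flag differences.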

Bullet Physics Engine 2.81 - Test: Raytests
Seconds, Fewer Is Better
  Stock: 2.53 (SE +/- 0.04, N = 3)
  -mindirect-branch=thunk: 2.78 (SE +/- 0.04, N = 3)
  -mindirect-branch=thunk-inline: 2.89 (SE +/- 0.01, N = 3)
1. (CXX) g++ options: -O3 -march=native -rdynamic -lglut -lGL -lGLU

Bullet Physics Engine 2.81 - Test: 3000 Fall
Seconds, Fewer Is Better
  Stock: 3.88 (SE +/- 0.02, N = 3)
  -mindirect-branch=thunk: 4.11 (SE +/- 0.02, N = 3)
  spectre-tests: 4.25 (SE +/- 0.07, N = 3)
  -mindirect-branch=thunk-inline: 6.53 (SE +/- 2.39, N = 3)
Per-configuration options: -march=native (Stock, -mindirect-branch=thunk, -mindirect-branch=thunk-inline)
1. (CXX) g++ options: -O3 -rdynamic -lglut -lGL -lGLU
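
The standard errors reported for the 3000 Fall test give a rough way to judge whether a gap between two configurations stands out from run-to-run noise. The sketch below is an illustration only, not something the result file computes. The helper name `difference_in_se_units` is hypothetical. It divides the gap between two means by their combined standard error, which is a heuristic rather than a formal significance test, especially at N = 3.

```python
import math


def difference_in_se_units(mean_a, se_a, mean_b, se_b):
    """Absolute gap between two means, divided by the combined standard error."""
    combined_se = math.sqrt(se_a ** 2 + se_b ** 2)
    return abs(mean_a - mean_b) / combined_se


if __name__ == "__main__":
    # 3000 Fall results in seconds: (mean, standard error).
    stock = (3.88, 0.02)
    thunk = (4.11, 0.02)
    thunk_inline = (6.53, 2.39)

    print("thunk vs Stock:", round(difference_in_se_units(*stock, *thunk), 2))
    print("thunk-inline vs Stock:", round(difference_in_se_units(*stock, *thunk_inline), 2))
```

With these inputs the thunk gap is several combined standard errors wide, while the thunk-inline gap is only about one, so the 6.53 second figure is best read as a noisy measurement rather than a reliable slowdown.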

Bullet Physics Engine 2.81 - Test: 1000 Stack
Seconds, Fewer Is Better
  Stock: 4.44 (SE +/- 0.04, N = 3)
  -mindirect-branch=thunk: 4.57 (SE +/- 0.03, N = 3)
  -mindirect-branch=thunk-inline: 4.83 (SE +/- 0.05, N = 3)
  spectre-tests: 4.85 (SE +/- 0.04, N = 3)
Per-configuration options: -march=native (Stock, -mindirect-branch=thunk, -mindirect-branch=thunk-inline)
1. (CXX) g++ options: -O3 -rdynamic -lglut -lGL -lGLU

Bullet Physics Engine 2.81 - Test: 1000 Convex
Seconds, Fewer Is Better
  Stock: 4.37 (SE +/- 0.17, N = 3)
  spectre-tests: 4.51 (SE +/- 0.03, N = 3)
  -mindirect-branch=thunk: 4.65 (SE +/- 0.11, N = 3)
  -mindirect-branch=thunk-inline: 4.93 (SE +/- 0.06, N = 3)
Per-configuration options: -march=native (Stock, -mindirect-branch=thunk, -mindirect-branch=thunk-inline)
1. (CXX) g++ options: -O3 -rdynamic -lglut -lGL -lGLU

Bullet Physics Engine 2.81 - Test: Prim Trimesh
Seconds, Fewer Is Better
  spectre-tests: 0.89 (SE +/- 0.00, N = 3)
  Stock: 0.92 (SE +/- 0.03, N = 3)
  -mindirect-branch=thunk: 1.00 (SE +/- 0.02, N = 3)
  -mindirect-branch=thunk-inline: 1.03 (SE +/- 0.02, N = 3)
Per-configuration options: -march=native (Stock, -mindirect-branch=thunk, -mindirect-branch=thunk-inline)
1. (CXX) g++ options: -O3 -rdynamic -lglut -lGL -lGLU

Bullet Physics Engine 2.81 - Test: Convex Trimesh
Seconds, Fewer Is Better
  Stock: 1.08 (SE +/- 0.04, N = 3)
  spectre-tests: 1.11 (SE +/- 0.02, N = 3)
  -mindirect-branch=thunk: 1.23 (SE +/- 0.02, N = 3)
  -mindirect-branch=thunk-inline: 1.28 (SE +/- 0.00, N = 3)
Per-configuration options: -march=native (Stock, -mindirect-branch=thunk, -mindirect-branch=thunk-inline)
1. (CXX) g++ options: -O3 -rdynamic -lglut -lGL -lGLU

FFmpeg 3.3.3 - H.264 HD To NTSC DV
Seconds, Fewer Is Better
  Stock: 13.29 (SE +/- 0.31, N = 6)
  -mindirect-branch=thunk: 13.89 (SE +/- 0.30, N = 6)
  -mindirect-branch=thunk-inline: 14.13 (SE +/- 0.32, N = 6)
1. (CC) gcc options: -lavdevice -lavfilter -lavformat -lavcodec -lswresample -lswscale -lavutil -ldl -lxcb -lxcb-shm -lxcb-xfixes -lxcb-shape -lasound -lm -llzma -lbz2 -pthread -O3 -march=native -std=c11 -fomit-frame-pointer -fno-math-errno -fno-signed-zeros -fno-tree-vectorize -MMD -MF -MT

GNU MPC 1.1.0 - Multi-Precision Benchmark
Global Score, More Is Better
  Stock: 10013 (SE +/- 43.72, N = 3)
  -mindirect-branch=thunk-inline: 9830 (SE +/- 75.72, N = 3)
  -mindirect-branch=thunk: 9643 (SE +/- 84.52, N = 3)
  spectre-tests: 7557 (SE +/- 8.82, N = 3)
Per-configuration options: -lm -O3 -march=native (Stock, -mindirect-branch=thunk-inline, -mindirect-branch=thunk); -O2 -pedantic -fomit-frame-pointer -m64 -mtune=k8 -march=k8 (spectre-tests)
1. (CC) gcc options: -MT -MD -MP -MF

High Performance Conjugate Gradient 3.0
GFLOP/s, More Is Better
  -mindirect-branch=thunk-inline: 1.38 (SE +/- 0.01, N = 3)
  Stock: 1.38 (SE +/- 0.01, N = 3)
  -mindirect-branch=thunk: 1.35 (SE +/- 0.00, N = 3)
  spectre-tests: 1.01 (SE +/- 0.00, N = 3)

HPC Challenge 1.5.0 - Test / Class: G-HPL
GFLOPS, More Is Better
  -mindirect-branch=thunk: 86.05 (SE +/- 0.10, N = 3)
  -mindirect-branch=thunk-inline: 85.98 (SE +/- 0.17, N = 3)
  Stock: 85.93 (SE +/- 0.29, N = 3)
1. (CC) gcc options: -lblas -lm -pthread -lmpi -fomit-frame-pointer -O3 -march=native -funroll-loops
2. BLAS + Open MPI 2.0.2

HPC Challenge 1.5.0 - Test / Class: G-Ffte
GFLOPS, More Is Better
  Stock: 5.88475 (SE +/- 0.18843, N = 3)
  -mindirect-branch=thunk: 5.58564 (SE +/- 0.01968, N = 3)
  -mindirect-branch=thunk-inline: 5.56022 (SE +/- 0.02991, N = 3)
1. (CC) gcc options: -lblas -lm -pthread -lmpi -fomit-frame-pointer -O3 -march=native -funroll-loops
2. BLAS + Open MPI 2.0.2

PostgreSQL pgbench 10.0 - Scaling: Buffer Test - Test: Normal Load - Mode: Read Write
TPS, More Is Better
  -mindirect-branch=thunk-inline: 11460.43 (SE +/- 63.39, N = 3)
  Stock: 11387.09 (SE +/- 221.41, N = 6)
  -mindirect-branch=thunk: 11104.86 (SE +/- 276.43, N = 6)
  spectre-tests: 2455.92 (SE +/- 41.62, N = 3)
Per-configuration options: -O3 -march=native (-mindirect-branch=thunk-inline, Stock, -mindirect-branch=thunk); -O2 (spectre-tests)
1. (CC) gcc options: -fno-strict-aliasing -fwrapv -fPIC -lpgcommon -lpgport -lpthread -lrt -lcrypt -ldl -lm

PostgreSQL pgbench 10.0 - Scaling: Buffer Test - Test: Heavy Contention - Mode: Read Write
TPS, More Is Better
  Stock: 11290.86 (SE +/- 264.90, N = 6)
  -mindirect-branch=thunk: 11147.89 (SE +/- 326.63, N = 6)
  -mindirect-branch=thunk-inline: 10540.60 (SE +/- 43.65, N = 3)
  spectre-tests: 3070.48 (SE +/- 43.25, N = 6)
Per-configuration options: -O3 -march=native (Stock, -mindirect-branch=thunk, -mindirect-branch=thunk-inline); -O2 (spectre-tests)
1. (CC) gcc options: -fno-strict-aliasing -fwrapv -fPIC -lpgcommon -lpgport -lpthread -lrt -lcrypt -ldl -lm

Redis 3.0.1 - Test: GET
Requests Per Second, More Is Better
  spectre-tests: 2297157.92 (SE +/- 8828.42, N = 3)
  Stock: 2222262.58 (SE +/- 43689.11, N = 3)
  -mindirect-branch=thunk: 2187123.33 (SE +/- 40180.95, N = 6)
  -mindirect-branch=thunk-inline: 2160702.13 (SE +/- 41655.36, N = 6)
1. (CC) gcc options: -ggdb -rdynamic -lm -pthread

Redis 3.0.1 - Test: SET
Requests Per Second, More Is Better
  spectre-tests: 1676110.71 (SE +/- 10561.86, N = 3)
  -mindirect-branch=thunk-inline: 1457026.92 (SE +/- 2554.75, N = 3)
  Stock: 1399046.62 (SE +/- 34299.85, N = 6)
  -mindirect-branch=thunk: 1280528.47 (SE +/- 160590.96, N = 6)
1. (CC) gcc options: -ggdb -rdynamic -lm -pthread

Stockfish 2014-11-26 - Total Time
ms, Fewer Is Better
  Stock: 2904 (SE +/- 6.11, N = 3)
  -mindirect-branch=thunk: 3074 (SE +/- 44.96, N = 3)
  -mindirect-branch=thunk-inline: 3228 (SE +/- 56.05, N = 3)
  spectre-tests: 3454 (SE +/- 6.06, N = 3)
Per-configuration options: -march=native (Stock, -mindirect-branch=thunk, -mindirect-branch=thunk-inline)
1. (CXX) g++ options: -lpthread -O3 -fno-exceptions -fno-rtti -ansi -pedantic -msse -msse3 -mpopcnt -flto

TSCP 1.81 - AI Chess Performance
Nodes Per Second, More Is Better
  Stock: 1386794 (SE +/- 18404.85, N = 6)
  -mindirect-branch=thunk: 1185357 (SE +/- 10473.03, N = 5)
  -mindirect-branch=thunk-inline: 1116421 (SE +/- 17216.49, N = 5)
  spectre-tests: 1068228 (SE +/- 507.53, N = 5)
1. (CC) gcc options: -O3 -march=native

Phoronix Test Suite v10.8.5