Intel FSGSBASE benchmarking by Michael Larabel for a future article.
FSGSBASE Enabled
Environment Notes: CXXFLAGS="-O3 -march=native" CFLAGS="-O3 -march=native"
Compiler Notes: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none,hsa --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Disk Notes: MQ-DEADLINE / errors=remount-ro,relatime,rw
Processor Notes: Scaling Governor: intel_pstate powersave - CPU Microcode: 0x500002c
Java Notes: OpenJDK Runtime Environment (build 11.0.7-ea+9-post-Ubuntu-1ubuntu1)
Python Notes: Python 3.8.2
Security Notes: itlb_multihit: KVM: Mitigation of Split huge pages + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced IBRS IBPB: conditional RSB filling + srbds: Not affected + tsx_async_abort: Mitigation of TSX disabled
nofsgsbase Processor: 2 x Intel Xeon Gold 5220R @ 3.90GHz (36 Cores / 72 Threads), Motherboard: TYAN S7106 (V2.01.B40 BIOS), Chipset: Intel Sky Lake-E DMI3 Registers, Memory: 94GB, Disk: 500GB Samsung SSD 860, Graphics: ASPEED, Monitor: VE228, Network: 2 x Intel I210 + 2 x QLogic cLOM8214 1/10GbE
OS: Ubuntu 20.04, Kernel: 5.8.0-rc1-phx-fsgsbase (x86_64) 20200620, Desktop: GNOME Shell 3.36.1, Display Server: X Server 1.20.8, Display Driver: modesetting 1.20.8, Compiler: GCC 9.3.0, File-System: ext4, Screen Resolution: 1920x1080
Environment Notes: CXXFLAGS="-O3 -march=native" CFLAGS="-O3 -march=native"
Compiler Notes: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none,hsa --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Disk Notes: MQ-DEADLINE / errors=remount-ro,relatime,rw
Processor Notes: Scaling Governor: intel_pstate powersave - CPU Microcode: 0x5002f01
Java Notes: OpenJDK Runtime Environment (build 11.0.7+10-post-Ubuntu-3ubuntu1)
Python Notes: Python 3.8.2
Security Notes: itlb_multihit: KVM: Mitigation of Split huge pages + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced IBRS IBPB: conditional RSB filling + srbds: Not affected + tsx_async_abort: Mitigation of TSX disabled
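The result file does not say how the two configurations were produced. Assuming the "nofsgsbase" run came from booting the same kernel with the `nofsgsbase` boot parameter, a minimal sketch for checking which configuration a kernel command line corresponds to could look like the following. The helper name `has_boot_param` and the sample command-line strings are hypothetical and only serve as illustration.

```python
def has_boot_param(cmdline: str, name: str) -> bool:
    """Return True if `name` appears as a bare token or as name=value."""
    for token in cmdline.split():
        if token == name or token.startswith(name + "="):
            return True
    return False


# Hypothetical command lines; on a Linux system the real one can be
# read from /proc/cmdline.
sample_enabled = "BOOT_IMAGE=/boot/vmlinuz root=/dev/sda2 ro quiet splash"
sample_disabled = "BOOT_IMAGE=/boot/vmlinuz root=/dev/sda2 ro nofsgsbase"

for label, cmdline in (("enabled", sample_enabled), ("disabled", sample_disabled)):
    state = "nofsgsbase set" if has_boot_param(cmdline, "nofsgsbase") else "nofsgsbase not set"
    print(f"{label}: {state}")
```

Matching whole tokens rather than substrings avoids false positives from unrelated parameters that merely contain the same characters.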
AOM AV1 This is a simple test of the AOMedia AV1 encoder run on the CPU with a sample video file. Learn more via the OpenBenchmarking.org test page.
AOM AV1 2.0 - Encoder Mode: Speed 0 Two-Pass (Frames Per Second, More Is Better)
  FSGSBASE Enabled: 0.27 (SE +/- 0.00, N = 3)
  nofsgsbase: 0.27 (SE +/- 0.00, N = 3)
AOM AV1 2.0 - Encoder Mode: Speed 4 Two-Pass (Frames Per Second, More Is Better)
  FSGSBASE Enabled: 1.95 (SE +/- 0.00, N = 3)
  nofsgsbase: 1.94 (SE +/- 0.00, N = 3)
AOM AV1 2.0 - Encoder Mode: Speed 6 Realtime (Frames Per Second, More Is Better)
  FSGSBASE Enabled: 10.86 (SE +/- 0.08, N = 3)
  nofsgsbase: 10.78 (SE +/- 0.09, N = 3)
AOM AV1 2.0 - Encoder Mode: Speed 6 Two-Pass (Frames Per Second, More Is Better)
  FSGSBASE Enabled: 2.98 (SE +/- 0.00, N = 3)
  nofsgsbase: 2.91 (SE +/- 0.02, N = 3)
AOM AV1 2.0 - Encoder Mode: Speed 8 Realtime (Frames Per Second, More Is Better)
  FSGSBASE Enabled: 23.74 (SE +/- 0.17, N = 3)
  nofsgsbase: 23.84 (SE +/- 0.31, N = 3)
AOM AV1 build: (CXX) g++ options: -O3 -march=native -std=c++11 -U_FORTIFY_SOURCE -lm -lpthread
Apache HBase 2.2.3 - Test: Increment - Clients: 1 (Microseconds - Average Latency, Fewer Is Better)
  FSGSBASE Enabled: 291 (SE +/- 2.52, N = 11)
  nofsgsbase: 308 (SE +/- 2.70, N = 15)
Apache HBase 2.2.3 - Test: Random Read - Clients: 1 (Rows Per Second, More Is Better)
  FSGSBASE Enabled: 4643 (SE +/- 56.05, N = 15)
  nofsgsbase: 4625 (SE +/- 48.14, N = 15)
Apache HBase 2.2.3 - Test: Random Read - Clients: 1 (Microseconds - Average Latency, Fewer Is Better)
  FSGSBASE Enabled: 213 (SE +/- 2.57, N = 15)
  nofsgsbase: 214 (SE +/- 2.20, N = 15)
Apache HBase 2.2.3 - Test: Sequential Read - Clients: 1 (Rows Per Second, More Is Better)
  FSGSBASE Enabled: 5270 (SE +/- 93.78, N = 15)
  nofsgsbase: 5090 (SE +/- 44.45, N = 15)
Apache HBase 2.2.3 - Test: Sequential Read - Clients: 1 (Microseconds - Average Latency, Fewer Is Better)
  FSGSBASE Enabled: 189 (SE +/- 3.62, N = 15)
  nofsgsbase: 195 (SE +/- 1.75, N = 15)
Apache HBase 2.2.3 - Test: Async Random Read - Clients: 1 (Rows Per Second, More Is Better)
  FSGSBASE Enabled: 5245 (SE +/- 81.85, N = 15)
  nofsgsbase: 5137 (SE +/- 78.77, N = 12)
Apache HBase 2.2.3 - Test: Async Random Read - Clients: 1 (Microseconds - Average Latency, Fewer Is Better)
  FSGSBASE Enabled: 189 (SE +/- 3.20, N = 15)
  nofsgsbase: 193 (SE +/- 3.36, N = 12)
Apache Siege 2.4.29 - Concurrent Users: 50 (Transactions Per Second, More Is Better)
  FSGSBASE Enabled: 33173.45 (SE +/- 194.24, N = 3)
  nofsgsbase: 33180.66 (SE +/- 188.88, N = 3)
Apache Siege 2.4.29 - Concurrent Users: 200 (Transactions Per Second, More Is Better)
  FSGSBASE Enabled: 48540.41 (SE +/- 1308.05, N = 12)
  nofsgsbase: 43531.70 (SE +/- 254.67, N = 3)
Apache Siege build: (CC) gcc options: -O3 -march=native -lpthread -ldl -lssl -lcrypto
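Each result is reported with a standard error (SE) and a run count (N), which helps separate real differences from run-to-run noise. As a rough sketch, not a formal significance test (N is small and the run counts differ), the gap between two means can be expressed in units of their combined standard error. The function name and the "combined SE" heuristic are assumptions made for illustration; the numbers are the Apache Siege results in this file.

```python
import math


def difference_in_se_units(mean_a, se_a, mean_b, se_b):
    """Absolute difference of two means divided by their combined standard error."""
    combined_se = math.sqrt(se_a ** 2 + se_b ** 2)
    if combined_se == 0:
        # No measured spread: identical means are 0 apart, anything else is unbounded.
        return 0.0 if mean_a == mean_b else math.inf
    return abs(mean_a - mean_b) / combined_se


# Apache Siege 2.4.29, Transactions Per Second (FSGSBASE Enabled vs. nofsgsbase).
siege_50 = difference_in_se_units(33173.45, 194.24, 33180.66, 188.88)
siege_200 = difference_in_se_units(48540.41, 1308.05, 43531.70, 254.67)

print(f"50 concurrent users:  {siege_50:.2f} combined SEs apart")
print(f"200 concurrent users: {siege_200:.2f} combined SEs apart")
```

A value well below 1 suggests the two configurations are indistinguishable for that test, while a value of several combined SEs points to a difference that is unlikely to be noise alone.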
BlogBench BlogBench is designed to replicate the load of a real-world busy file server by stressing the file-system with multiple threads of random reads, writes, and rewrites. It mimics the behavior of a blog by creating blogs with content and pictures, modifying blog posts, adding comments to these blogs, and then reading the content of the blogs. All of the generated blogs are created locally with fake content and pictures. Learn more via the OpenBenchmarking.org test page.
BlogBench 1.1 - Test: Write (Final Score, More Is Better)
  FSGSBASE Enabled: 24247 (SE +/- 920.31, N = 3)
  nofsgsbase: 20455 (SE +/- 1371.68, N = 3)
BlogBench build: (CC) gcc options: -O3 -march=native -pthread
CP2K Molecular Dynamics CP2K is an open-source molecular dynamics software package focused on quantum chemistry and solid-state physics. This test profile currently uses the OpenMP implementation with the Fayalite-FIST molecular dynamics run and measures the total time to complete. Learn more via the OpenBenchmarking.org test page.
CP2K Molecular Dynamics 6.1 - Fayalite-FIST Data (Seconds, Fewer Is Better)
  FSGSBASE Enabled: 2027.69
  nofsgsbase: 1886.26
dav1d Dav1d is an open-source, speedy AV1 video decoder. This test profile times how long it takes to decode sample AV1 video content. Learn more via the OpenBenchmarking.org test page.
dav1d 0.7.0 - Video Input: Chimera 1080p (FPS, More Is Better)
  FSGSBASE Enabled: 329.39 (SE +/- 3.13, N = 3; MIN: 204.26 / MAX: 425.36)
  nofsgsbase: 328.03 (SE +/- 4.08, N = 3; MIN: 183.84 / MAX: 426.68)
dav1d 0.7.0 - Video Input: Summer Nature 4K (FPS, More Is Better)
  FSGSBASE Enabled: 182.78 (SE +/- 2.74, N = 3; MIN: 88.23 / MAX: 199.52)
  nofsgsbase: 180.77 (SE +/- 0.93, N = 3; MIN: 91.75 / MAX: 195.31)
dav1d 0.7.0 - Video Input: Summer Nature 1080p (FPS, More Is Better)
  FSGSBASE Enabled: 338.36 (SE +/- 1.19, N = 3; MIN: 185.24 / MAX: 374.84)
  nofsgsbase: 335.16 (SE +/- 1.40, N = 3; MIN: 172.66 / MAX: 372.4)
dav1d 0.7.0 - Video Input: Chimera 1080p 10-bit (FPS, More Is Better)
  FSGSBASE Enabled: 87.20 (SE +/- 0.13, N = 3; MIN: 66.61 / MAX: 133.73)
  nofsgsbase: 87.47 (SE +/- 0.10, N = 3; MIN: 66.73 / MAX: 137.93)
dav1d build: (CC) gcc options: -O3 -march=native -pthread
Facebook RocksDB This is a benchmark of Facebook's RocksDB as an embeddable persistent key-value store for fast storage based on Google's LevelDB. Learn more via the OpenBenchmarking.org test page.
Facebook RocksDB 6.3.6 - Test: Random Fill (Op/s, More Is Better)
  FSGSBASE Enabled: 186093 (SE +/- 172.09, N = 3)
  nofsgsbase: 186451 (SE +/- 226.90, N = 3)
Facebook RocksDB 6.3.6 - Test: Random Read (Op/s, More Is Better)
  FSGSBASE Enabled: 142205648 (SE +/- 833744.91, N = 3)
  nofsgsbase: 141448198 (SE +/- 497372.75, N = 3)
Facebook RocksDB 6.3.6 - Test: Sequential Fill (Op/s, More Is Better)
  FSGSBASE Enabled: 187950 (SE +/- 190.16, N = 3)
  nofsgsbase: 189643 (SE +/- 107.10, N = 3)
Facebook RocksDB 6.3.6 - Test: Random Fill Sync (Op/s, More Is Better)
  FSGSBASE Enabled: 6532 (SE +/- 511.75, N = 15)
  nofsgsbase: 5681 (SE +/- 26.46, N = 3)
Facebook RocksDB 6.3.6 - Test: Read While Writing (Op/s, More Is Better)
  FSGSBASE Enabled: 5356272 (SE +/- 27153.16, N = 3)
  nofsgsbase: 5396411 (SE +/- 54477.77, N = 3)
Facebook RocksDB build: (CXX) g++ options: -O3 -march=native -std=c++11 -fno-builtin-memcmp -fno-rtti -rdynamic -lpthread
Flexible IO Tester Fio is an advanced disk benchmark that depends upon the kernel's AIO access library. Learn more via the OpenBenchmarking.org test page.
Flexible IO Tester 3.18 - Type: Random Write - Engine: IO_uring - Buffered: Yes - Direct: No - Block Size: 2MB - Disk Target: Default Test Directory (IOPS, More Is Better)
  FSGSBASE Enabled: 187 (SE +/- 0.33, N = 3)
  nofsgsbase: 135 (SE +/- 0.88, N = 3)
Flexible IO Tester 3.18 - Type: Random Write - Engine: IO_uring - Buffered: Yes - Direct: No - Block Size: 4KB - Disk Target: Default Test Directory (IOPS, More Is Better)
  FSGSBASE Enabled: 88800 (SE +/- 251.66, N = 3)
  nofsgsbase: 63400 (SE +/- 100.00, N = 3)
Flexible IO Tester 3.18 - Type: Sequential Write - Engine: IO_uring - Buffered: Yes - Direct: No - Block Size: 2MB - Disk Target: Default Test Directory (MB/s, More Is Better)
  FSGSBASE Enabled: 385 (SE +/- 6.17, N = 3)
  nofsgsbase: 346 (SE +/- 12.84, N = 15)
Flexible IO Tester 3.18 - Type: Sequential Write - Engine: IO_uring - Buffered: Yes - Direct: No - Block Size: 2MB - Disk Target: Default Test Directory (IOPS, More Is Better)
  FSGSBASE Enabled: 189 (SE +/- 2.96, N = 3)
  nofsgsbase: 170 (SE +/- 6.41, N = 15)
Flexible IO Tester build: (CC) gcc options: -rdynamic -std=gnu99 -ffast-math -include -O3 -fcommon -U_FORTIFY_SOURCE -march=native -ll -lcurl -lssl -lcrypto -lnuma -libverbs -lrt -laio -lz -lpthread -lm -ldl
GROMACS This tests the GROMACS (GROningen MAchine for Chemical Simulations) molecular dynamics package on the CPU with the water_GMX50 data. Learn more via the OpenBenchmarking.org test page.
GROMACS 2020.1 - Water Benchmark (Ns Per Day, More Is Better)
  FSGSBASE Enabled: 3.506 (SE +/- 0.001, N = 3)
  nofsgsbase: 3.514 (SE +/- 0.006, N = 3)
GROMACS build: (CXX) g++ options: -O3 -march=native -pthread -lrt -lpthread -lm
LevelDB LevelDB is a key-value storage library developed by Google that supports making use of Snappy for data compression and has other modern features. Learn more via the OpenBenchmarking.org test page.
LevelDB 1.22 - Benchmark: Hot Read (Microseconds Per Op, Fewer Is Better)
  FSGSBASE Enabled: 92.28 (SE +/- 1.10, N = 3)
  nofsgsbase: 91.65 (SE +/- 1.38, N = 3)
LevelDB 1.22 - Benchmark: Fill Sync (MB/s, More Is Better)
  FSGSBASE Enabled: 1.8 (SE +/- 0.00, N = 3)
  nofsgsbase: 1.8 (SE +/- 0.00, N = 3)
LevelDB 1.22 - Benchmark: Fill Sync (Microseconds Per Op, Fewer Is Better)
  FSGSBASE Enabled: 4485.35 (SE +/- 8.32, N = 3)
  nofsgsbase: 4460.31 (SE +/- 8.46, N = 3)
LevelDB 1.22 - Benchmark: Overwrite (MB/s, More Is Better)
  FSGSBASE Enabled: 9.9 (SE +/- 0.03, N = 3)
  nofsgsbase: 10.0 (SE +/- 0.03, N = 3)
LevelDB 1.22 - Benchmark: Overwrite (Microseconds Per Op, Fewer Is Better)
  FSGSBASE Enabled: 799.36 (SE +/- 1.81, N = 3)
  nofsgsbase: 792.44 (SE +/- 2.31, N = 3)
LevelDB 1.22 - Benchmark: Random Fill (MB/s, More Is Better)
  FSGSBASE Enabled: 9.8 (SE +/- 0.03, N = 3)
  nofsgsbase: 10.0 (SE +/- 0.10, N = 3)
LevelDB 1.22 - Benchmark: Random Fill (Microseconds Per Op, Fewer Is Better)
  FSGSBASE Enabled: 809.70 (SE +/- 3.59, N = 3)
  nofsgsbase: 798.60 (SE +/- 8.40, N = 3)
LevelDB 1.22 - Benchmark: Random Read (Microseconds Per Op, Fewer Is Better)
  FSGSBASE Enabled: 94.00 (SE +/- 0.05, N = 3)
  nofsgsbase: 93.01 (SE +/- 0.66, N = 3)
LevelDB 1.22 - Benchmark: Seek Random (Microseconds Per Op, Fewer Is Better)
  FSGSBASE Enabled: 113.58 (SE +/- 0.13, N = 3)
  nofsgsbase: 113.20 (SE +/- 0.75, N = 3)
LevelDB 1.22 - Benchmark: Random Delete (Microseconds Per Op, Fewer Is Better)
  FSGSBASE Enabled: 774.36 (SE +/- 1.02, N = 3)
  nofsgsbase: 761.43 (SE +/- 2.15, N = 3)
LevelDB 1.22 - Benchmark: Sequential Fill (MB/s, More Is Better)
  FSGSBASE Enabled: 9.6 (SE +/- 0.03, N = 3)
  nofsgsbase: 9.9 (SE +/- 0.03, N = 3)
LevelDB 1.22 - Benchmark: Sequential Fill (Microseconds Per Op, Fewer Is Better)
  FSGSBASE Enabled: 826.27 (SE +/- 3.79, N = 3)
  nofsgsbase: 809.44 (SE +/- 3.18, N = 3)
LevelDB build: (CXX) g++ options: -O3 -march=native -lsnappy -lpthread
MariaDB This is a MariaDB MySQL database server benchmark making use of mysqlslap. Learn more via the OpenBenchmarking.org test page.
MariaDB 10.5.2 - Clients: 64 (Queries Per Second, More Is Better)
  FSGSBASE Enabled: 205 (SE +/- 2.42, N = 6)
  nofsgsbase: 199 (SE +/- 2.50, N = 5)
MariaDB 10.5.2 - Clients: 128 (Queries Per Second, More Is Better)
  FSGSBASE Enabled: 159 (SE +/- 0.48, N = 3)
  nofsgsbase: 154 (SE +/- 0.65, N = 3)
MariaDB 10.5.2 - Clients: 256 (Queries Per Second, More Is Better)
  FSGSBASE Enabled: 150 (SE +/- 0.58, N = 3)
  nofsgsbase: 143 (SE +/- 0.32, N = 3)
MariaDB 10.5.2 - Clients: 512 (Queries Per Second, More Is Better)
  FSGSBASE Enabled: 161 (SE +/- 2.84, N = 9)
  nofsgsbase: 140 (SE +/- 0.52, N = 3)
MariaDB build: (CXX) g++ options: -O3 -march=native -pie -fPIC -fstack-protector -O2 -lpthread -llzma -lbz2 -lsnappy -laio -lnuma -lpcre2-8 -lcrypt -lz -lm -lssl -lcrypto -ldl
Memtier_benchmark Memtier_benchmark is a NoSQL Redis/Memcache traffic generation plus benchmarking tool. This test profile currently just stresses the Redis protocol and basic options exposed with a 1:1 Set/Get ratio, 30 pipeline, 100 clients per thread, and thread count equal to the number of CPU cores/threads present. Patches to extend the test are welcome as always. Learn more via the OpenBenchmarking.org test page.
Memtier_benchmark 1.2.17 - Protocol: Redis (Ops/sec, More Is Better)
  FSGSBASE Enabled: 2859981.40 (SE +/- 74458.90, N = 15)
  nofsgsbase: 2635763.82 (SE +/- 13554.01, N = 3)
Memtier_benchmark build: (CXX) g++ options: -O2 -levent -lpthread -lz -lpcre
Mlpack Benchmark - Benchmark: scikit_linearridgeregression (Seconds, Fewer Is Better)
  FSGSBASE Enabled: 1.96 (SE +/- 0.02, N = 9)
  nofsgsbase: 2.01 (SE +/- 0.03, N = 15)
NAMD NAMD is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. NAMD was developed by the Theoretical and Computational Biophysics Group in the Beckman Institute for Advanced Science and Technology at the University of Illinois at Urbana-Champaign. Learn more via the OpenBenchmarking.org test page.
NAMD 2.13 - ATPase Simulation - 327,506 Atoms (days/ns, Fewer Is Better)
  FSGSBASE Enabled: 0.61104 (SE +/- 0.00455, N = 14)
  nofsgsbase: 0.61077 (SE +/- 0.00071, N = 3)
Numenta Anomaly Benchmark Numenta Anomaly Benchmark (NAB) is a benchmark for evaluating algorithms for anomaly detection in streaming, real-time applications. It is comprised of over 50 labeled real-world and artificial timeseries data files plus a novel scoring mechanism designed for real-time applications. This test profile currently measures the time to run various detectors. Learn more via the OpenBenchmarking.org test page.
Numenta Anomaly Benchmark 1.1 - Detector: EXPoSE (Seconds, Fewer Is Better)
  FSGSBASE Enabled: 1513.78 (SE +/- 15.42, N = 3)
  nofsgsbase: 1500.85 (SE +/- 5.29, N = 3)
oneDNN This is a test of the Intel oneDNN as an Intel-optimized library for Deep Neural Networks and making use of its built-in benchdnn functionality. The result is the total perf time reported. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and MKL-DNN before being rebranded as part of the oneAPI initiative. Learn more via the OpenBenchmarking.org test page.
oneDNN 1.5 - Harness: IP Batch 1D - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
  FSGSBASE Enabled: 5.69562 (SE +/- 0.00089, N = 3; MIN: 5.52)
  nofsgsbase: 5.67910 (SE +/- 0.00720, N = 3; MIN: 5.5)
oneDNN 1.5 - Harness: IP Batch All - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
  FSGSBASE Enabled: 51.10 (SE +/- 0.02, N = 3; MIN: 50.05)
  nofsgsbase: 51.11 (SE +/- 0.03, N = 3; MIN: 50.21)
oneDNN 1.5 - Harness: Convolution Batch Shapes Auto - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
  FSGSBASE Enabled: 6.39728 (SE +/- 0.00089, N = 3; MIN: 6.3)
  nofsgsbase: 6.38735 (SE +/- 0.01144, N = 3; MIN: 6.3)
oneDNN 1.5 - Harness: Deconvolution Batch deconv_1d - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
  FSGSBASE Enabled: 7.39154 (SE +/- 0.01137, N = 3; MIN: 7.23)
  nofsgsbase: 7.39063 (SE +/- 0.00338, N = 3; MIN: 7.23)
oneDNN 1.5 - Harness: Deconvolution Batch deconv_3d - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
  FSGSBASE Enabled: 9.46158 (SE +/- 0.00175, N = 3; MIN: 9.31)
  nofsgsbase: 9.46163 (SE +/- 0.00883, N = 3; MIN: 9.35)
oneDNN 1.5 - Harness: Matrix Multiply Batch Shapes Transformer - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
  FSGSBASE Enabled: 1.44923 (SE +/- 0.00269, N = 3; MIN: 1.4)
  nofsgsbase: 1.45193 (SE +/- 0.00164, N = 3; MIN: 1.41)
oneDNN build: (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
PlaidML - FP16: No - Mode: Inference - Network: VGG19 - Device: CPU (FPS, More Is Better)
  FSGSBASE Enabled: 20.64 (SE +/- 0.18, N = 3)
  nofsgsbase: 21.39 (SE +/- 0.12, N = 3)
PlaidML - FP16: No - Mode: Inference - Network: IMDB LSTM - Device: CPU (FPS, More Is Better)
  FSGSBASE Enabled: 868.09 (SE +/- 11.90, N = 4)
  nofsgsbase: 850.65 (SE +/- 3.24, N = 3)
PlaidML - FP16: No - Mode: Inference - Network: Mobilenet - Device: CPU (FPS, More Is Better)
  FSGSBASE Enabled: 10.62 (SE +/- 0.12, N = 3)
  nofsgsbase: 10.60 (SE +/- 0.09, N = 3)
PlaidML - FP16: No - Mode: Inference - Network: ResNet 50 - Device: CPU (FPS, More Is Better)
  FSGSBASE Enabled: 4.45 (SE +/- 0.03, N = 3)
  nofsgsbase: 4.48 (SE +/- 0.03, N = 3)
PlaidML - FP16: No - Mode: Inference - Network: DenseNet 201 - Device: CPU (FPS, More Is Better)
  FSGSBASE Enabled: 1.96 (SE +/- 0.01, N = 3)
  nofsgsbase: 1.98 (SE +/- 0.01, N = 3)
PlaidML - FP16: No - Mode: Inference - Network: Inception V3 - Device: CPU (FPS, More Is Better)
  FSGSBASE Enabled: 5.50 (SE +/- 0.03, N = 3)
  nofsgsbase: 5.45 (SE +/- 0.02, N = 3)
PlaidML - FP16: No - Mode: Inference - Network: NASNer Large - Device: CPU (FPS, More Is Better)
  FSGSBASE Enabled: 0.57 (SE +/- 0.00, N = 3)
  nofsgsbase: 0.58 (SE +/- 0.00, N = 3)
pmbench Pmbench is a Linux paging and virtual memory benchmark. This test profile will report the average page latency of the system. Learn more via the OpenBenchmarking.org test page.
pmbench - Concurrent Worker Threads: 72 - Read-Write Ratio: 100% Reads (us - Average Page Latency, Fewer Is Better)
  FSGSBASE Enabled: 0.0460 (SE +/- 0.0012, N = 12)
  nofsgsbase: 0.0451 (SE +/- 0.0004, N = 15)
pmbench - Concurrent Worker Threads: 72 - Read-Write Ratio: 100% Writes (us - Average Page Latency, Fewer Is Better)
  FSGSBASE Enabled: 0.0802 (SE +/- 0.0010, N = 5)
  nofsgsbase: 0.0812 (SE +/- 0.0009, N = 3)
pmbench - Concurrent Worker Threads: 1 - Read-Write Ratio: 80% Reads 20% Writes (us - Average Page Latency, Fewer Is Better)
  FSGSBASE Enabled: 0.0756 (SE +/- 0.0002, N = 3)
  nofsgsbase: 0.0756 (SE +/- 0.0002, N = 3)
pmbench build: (CC) gcc options: -lm -luuid -lxml2 -m64 -pthread
PostgreSQL pgbench This is a simple benchmark of PostgreSQL using pgbench. Learn more via the OpenBenchmarking.org test page.
PostgreSQL pgbench 12.0 - Scaling: Buffer Test - Test: Normal Load - Mode: Read Only (TPS, More Is Better)
  FSGSBASE Enabled: 593329.43 (SE +/- 2139.37, N = 3)
  nofsgsbase: 594395.91 (SE +/- 3396.56, N = 3)
PostgreSQL pgbench 12.0 - Scaling: Buffer Test - Test: Normal Load - Mode: Read Write (TPS, More Is Better)
  FSGSBASE Enabled: 4908.50 (SE +/- 59.24, N = 6)
  nofsgsbase: 2762.90 (SE +/- 34.32, N = 3)
PostgreSQL pgbench 12.0 - Scaling: Buffer Test - Test: Heavy Contention - Mode: Read Only (TPS, More Is Better)
  FSGSBASE Enabled: 618994.74 (SE +/- 1415.39, N = 3)
  nofsgsbase: 619694.86 (SE +/- 680.20, N = 3)
PostgreSQL pgbench 12.0 - Scaling: Buffer Test - Test: Heavy Contention - Mode: Read Write (TPS, More Is Better)
  FSGSBASE Enabled: 4727.71 (SE +/- 59.91, N = 3)
  nofsgsbase: 2634.27 (SE +/- 25.55, N = 9)
PostgreSQL pgbench build: (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=native -lpgcommon -lpgport -lpq -lpthread -lrt -lcrypt -ldl -lm
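To put a pair of results into perspective, the percent change of the FSGSBASE Enabled value relative to the nofsgsbase value can be computed directly. The sketch below is only illustrative arithmetic on numbers from this file; the function names are hypothetical, and whether a change counts as an improvement depends on the chart's "More Is Better" or "Fewer Is Better" label.

```python
def percent_change(enabled: float, disabled: float) -> float:
    """Percent change of the FSGSBASE Enabled result relative to nofsgsbase."""
    if disabled == 0:
        raise ValueError("baseline result must be non-zero")
    return (enabled - disabled) / disabled * 100.0


def is_improvement(change_percent: float, more_is_better: bool) -> bool:
    """A positive change helps throughput metrics; a negative one helps latency metrics."""
    return change_percent > 0 if more_is_better else change_percent < 0


# pgbench, Normal Load, Read Write (TPS, More Is Better).
pgbench_rw = percent_change(4908.50, 2762.90)
# Apache HBase, Increment, 1 client (Microseconds - Average Latency, Fewer Is Better).
hbase_increment = percent_change(291, 308)

print(f"pgbench read/write TPS:  {pgbench_rw:+.1f}% (improvement: {is_improvement(pgbench_rw, True)})")
print(f"HBase increment latency: {hbase_increment:+.1f}% (improvement: {is_improvement(hbase_increment, False)})")
```

Keeping the sign and the direction flag separate avoids mixing up throughput and latency charts, which appear side by side in this result file.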
PostMark This is a test of NetApp's PostMark benchmark designed to simulate small-file testing similar to the tasks endured by web and mail servers. This test profile will set PostMark to perform 25,000 transactions with 500 files simultaneously with the file sizes ranging between 5 and 512 kilobytes. Learn more via the OpenBenchmarking.org test page.
PostMark 1.51 - Disk Transaction Performance (TPS, More Is Better)
  FSGSBASE Enabled: 5725 (SE +/- 44.00, N = 3)
  nofsgsbase: 5725 (SE +/- 44.00, N = 3)
PostMark build: (CC) gcc options: -O3 -march=native
QMCPACK QMCPACK is a modern high-performance open-source Quantum Monte Carlo (QMC) simulation code making use of MPI for this benchmark of the H2O example code. Learn more via the OpenBenchmarking.org test page.
QMCPACK 3.8 (Total Execution Time - Seconds, Fewer Is Better)
  FSGSBASE Enabled: 2687.6
  nofsgsbase: 2688.5
QMCPACK build: (CXX) g++ options: -O3 -march=native -fopenmp -fomit-frame-pointer -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -ffast-math -lm
Redis 5.0.5 - Test: SADD (Requests Per Second, More Is Better)
  FSGSBASE Enabled: 2087688.75 (SE +/- 2516.35, N = 3)
  nofsgsbase: 2071005.92 (SE +/- 30688.88, N = 15)
Redis 5.0.5 - Test: GET (Requests Per Second, More Is Better)
  FSGSBASE Enabled: 2500801.50 (SE +/- 31845.75, N = 3)
  nofsgsbase: 2340376.20 (SE +/- 18114.33, N = 3)
Redis 5.0.5 - Test: SET (Requests Per Second, More Is Better)
  FSGSBASE Enabled: 1918164.83 (SE +/- 2456.08, N = 3)
  nofsgsbase: 1908415.46 (SE +/- 4205.47, N = 3)
Redis build: (CXX) g++ options: -MM -MT -g3 -fvisibility=hidden -O3 -march=native
Stress-NG 0.11.07 - Test: SENDFILE (Bogo Ops/s, More Is Better)
  FSGSBASE Enabled: 447930.47 (SE +/- 1247.48, N = 3)
  nofsgsbase: 444432.10 (SE +/- 104.27, N = 3)
Stress-NG 0.11.07 - Test: CPU Stress (Bogo Ops/s, More Is Better)
  FSGSBASE Enabled: 11896.42 (SE +/- 19.86, N = 3)
  nofsgsbase: 11983.66 (SE +/- 52.61, N = 3)
Stress-NG 0.11.07 - Test: Context Switching (Bogo Ops/s, More Is Better)
  FSGSBASE Enabled: 9410762.67 (SE +/- 154150.18, N = 3)
  nofsgsbase: 7847877.59 (SE +/- 27584.57, N = 3)
Stress-NG build: (CC) gcc options: -O3 -march=native -O2 -std=gnu99 -lm -laio -lcrypt -lrt -lz -ldl -lpthread -lc
SVT-AV1 This is a test of the Intel Open Visual Cloud Scalable Video Technology SVT-AV1 CPU-based multi-threaded video encoder for the AV1 video format with a sample 1080p YUV video file. Learn more via the OpenBenchmarking.org test page.
SVT-AV1 0.8 - Encoder Mode: Enc Mode 0 - Input: 1080p (Frames Per Second, More Is Better)
  FSGSBASE Enabled: 0.120 (SE +/- 0.000, N = 3)
  nofsgsbase: 0.120 (SE +/- 0.000, N = 3)
SVT-AV1 0.8 - Encoder Mode: Enc Mode 4 - Input: 1080p (Frames Per Second, More Is Better)
  FSGSBASE Enabled: 5.770 (SE +/- 0.067, N = 3)
  nofsgsbase: 5.683 (SE +/- 0.082, N = 3)
SVT-AV1 0.8 - Encoder Mode: Enc Mode 8 - Input: 1080p (Frames Per Second, More Is Better)
  FSGSBASE Enabled: 48.87 (SE +/- 0.53, N = 3)
  nofsgsbase: 49.21 (SE +/- 0.04, N = 3)
SVT-AV1 build: (CXX) g++ options: -O3 -fcommon -fPIE -fPIC -pie
VP9 libvpx Encoding This is a standard video encoding performance test of Google's libvpx library and the vpxenc command for the VP9/WebM format using a sample 1080p video. Learn more via the OpenBenchmarking.org test page.
VP9 libvpx Encoding 1.8.2 - Speed: Speed 0 (Frames Per Second, More Is Better)
  FSGSBASE Enabled: 6.14 (SE +/- 0.01, N = 3)
  nofsgsbase: 6.12 (SE +/- 0.02, N = 3)
VP9 libvpx Encoding 1.8.2 - Speed: Speed 5 (Frames Per Second, More Is Better)
  FSGSBASE Enabled: 23.01 (SE +/- 0.09, N = 3)
  nofsgsbase: 23.24 (SE +/- 0.08, N = 3)
VP9 libvpx Encoding build: (CXX) g++ options: -m64 -lm -lpthread -O3 -march=native -fPIC -U_FORTIFY_SOURCE -std=c++11
YafaRay YafaRay is an open-source physically based Monte Carlo ray-tracing engine. Learn more via the OpenBenchmarking.org test page.
YafaRay 3.4.1, Total Time For Sample Scene (Seconds, Fewer Is Better; OpenBenchmarking.org)
FSGSBASE Enabled: 108.82 (SE +/- 2.93, N = 15)
nofsgsbase: 113.63 (SE +/- 3.44, N = 15)
1. (CXX) g++ options: -std=c++11 -O3 -ffast-math -rdynamic -ldl -lImath -lIlmImf -lIex -lHalf -lz -lIlmThread -lxml2 -lfreetype -lpthread
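Since the YafaRay runs carry a fairly large standard error, it can help to compare the gap between the two means against that spread. The sketch below is a rough heuristic, not a formal significance test: it assumes the two sets of runs are independent, that the first reported value (108.82) belongs to the FSGSBASE Enabled run, and it uses a hypothetical helper named `gap_in_standard_errors`.

```python
import math


def gap_in_standard_errors(mean_a: float, se_a: float,
                           mean_b: float, se_b: float) -> float:
    """Absolute difference of two means, in units of their combined SE.

    The combined SE is sqrt(se_a**2 + se_b**2), which assumes the two
    sets of runs are independent of each other.
    """
    combined_se = math.hypot(se_a, se_b)
    return abs(mean_a - mean_b) / combined_se


# YafaRay total time in seconds, as reported above (assumed label order).
ratio = gap_in_standard_errors(108.82, 2.93, 113.63, 3.44)
print(f"Gap between the means: {ratio:.2f} combined standard errors")
```

A ratio close to 1, as here, suggests the difference is about the size of the run-to-run noise, so this particular result is best read with caution.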
FSGSBASE Enabled
Environment Notes: CXXFLAGS="-O3 -march=native" CFLAGS="-O3 -march=native"
Compiler Notes: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none,hsa --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Disk Notes: MQ-DEADLINE / errors=remount-ro,relatime,rw
Processor Notes: Scaling Governor: intel_pstate powersave - CPU Microcode: 0x500002c
Java Notes: OpenJDK Runtime Environment (build 11.0.7-ea+9-post-Ubuntu-1ubuntu1)
Python Notes: Python 3.8.2
Security Notes: itlb_multihit: KVM: Mitigation of Split huge pages + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced IBRS IBPB: conditional RSB filling + srbds: Not affected + tsx_async_abort: Mitigation of TSX disabled
Testing initiated at 20 June 2020 20:34 by user phoronix.
nofsgsbase Processor: 2 x Intel Xeon Gold 5220R @ 3.90GHz (36 Cores / 72 Threads), Motherboard: TYAN S7106 (V2.01.B40 BIOS), Chipset: Intel Sky Lake-E DMI3 Registers, Memory: 94GB, Disk: 500GB Samsung SSD 860, Graphics: ASPEED, Monitor: VE228, Network: 2 x Intel I210 + 2 x QLogic cLOM8214 1/10GbE
OS: Ubuntu 20.04, Kernel: 5.8.0-rc1-phx-fsgsbase (x86_64) 20200620, Desktop: GNOME Shell 3.36.1, Display Server: X Server 1.20.8, Display Driver: modesetting 1.20.8, Compiler: GCC 9.3.0, File-System: ext4, Screen Resolution: 1920x1080
Environment Notes: CXXFLAGS="-O3 -march=native" CFLAGS="-O3 -march=native"
Compiler Notes: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none,hsa --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Disk Notes: MQ-DEADLINE / errors=remount-ro,relatime,rw
Processor Notes: Scaling Governor: intel_pstate powersave - CPU Microcode: 0x5002f01
Java Notes: OpenJDK Runtime Environment (build 11.0.7+10-post-Ubuntu-3ubuntu1)
Python Notes: Python 3.8.2
Security Notes: itlb_multihit: KVM: Mitigation of Split huge pages + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced IBRS IBPB: conditional RSB filling + srbds: Not affected + tsx_async_abort: Mitigation of TSX disabled
Testing initiated at 22 June 2020 16:05 by user phoronix.