Intel FSGSBASE benchmarking by Michael Larabel for a future article.
FSGSBASE Enabled
Environment Notes: CXXFLAGS="-O3 -march=native" CFLAGS="-O3 -march=native"
Compiler Notes: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none,hsa --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Disk Notes: MQ-DEADLINE / errors=remount-ro,relatime,rw
Processor Notes: Scaling Governor: intel_pstate powersave - CPU Microcode: 0x500002c
Java Notes: OpenJDK Runtime Environment (build 11.0.7-ea+9-post-Ubuntu-1ubuntu1)
Python Notes: Python 3.8.2
Security Notes: itlb_multihit: KVM: Mitigation of Split huge pages + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced IBRS IBPB: conditional RSB filling + srbds: Not affected + tsx_async_abort: Mitigation of TSX disabled
nofsgsbase Processor: 2 x Intel Xeon Gold 5220R @ 3.90GHz (36 Cores / 72 Threads), Motherboard: TYAN S7106 (V2.01.B40 BIOS), Chipset: Intel Sky Lake-E DMI3 Registers, Memory: 94GB, Disk: 500GB Samsung SSD 860, Graphics: ASPEED, Monitor: VE228, Network: 2 x Intel I210 + 2 x QLogic cLOM8214 1/10GbE
OS: Ubuntu 20.04, Kernel: 5.8.0-rc1-phx-fsgsbase (x86_64) 20200620, Desktop: GNOME Shell 3.36.1, Display Server: X Server 1.20.8, Display Driver: modesetting 1.20.8, Compiler: GCC 9.3.0, File-System: ext4, Screen Resolution: 1920x1080
Environment Notes: CXXFLAGS="-O3 -march=native" CFLAGS="-O3 -march=native"
Compiler Notes: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none,hsa --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Disk Notes: MQ-DEADLINE / errors=remount-ro,relatime,rw
Processor Notes: Scaling Governor: intel_pstate powersave - CPU Microcode: 0x5002f01
Java Notes: OpenJDK Runtime Environment (build 11.0.7+10-post-Ubuntu-3ubuntu1)
Python Notes: Python 3.8.2
Security Notes: itlb_multihit: KVM: Mitigation of Split huge pages + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced IBRS IBPB: conditional RSB filling + srbds: Not affected + tsx_async_abort: Mitigation of TSX disabled
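The two configurations compared here differ in whether the kernel uses FSGSBASE. A minimal sketch for confirming which state a test machine is in, assuming the CPU reports an `fsgsbase` flag in /proc/cpuinfo and that the disabled run was booted with the `nofsgsbase` kernel parameter (inferred from the result label; the notes above do not state it). The function names are hypothetical.

```python
def cpu_has_fsgsbase(cpuinfo_text):
    """Return True if any 'flags' line of /proc/cpuinfo text lists fsgsbase."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags" and "fsgsbase" in value.split():
            return True
    return False


def booted_with_nofsgsbase(cmdline_text):
    """Return True if the kernel command line text contains nofsgsbase."""
    return "nofsgsbase" in cmdline_text.split()
```

On a Linux system the inputs would come from reading /proc/cpuinfo and /proc/cmdline; both functions take plain text so they can be checked without touching the machine.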
Stress-NG 0.11.07 - Test: SENDFILE (Bogo Ops/s, More Is Better): nofsgsbase 444432.10 (SE +/- 104.27, N = 3); FSGSBASE Enabled 447930.47 (SE +/- 1247.48, N = 3)
Stress-NG 0.11.07 - Test: CPU Stress (Bogo Ops/s, More Is Better): nofsgsbase 11983.66 (SE +/- 52.61, N = 3); FSGSBASE Enabled 11896.42 (SE +/- 19.86, N = 3)
Stress-NG 0.11.07 - Test: Context Switching (Bogo Ops/s, More Is Better): nofsgsbase 7847877.59 (SE +/- 27584.57, N = 3); FSGSBASE Enabled 9410762.67 (SE +/- 154150.18, N = 3)
(CC) gcc options: -O3 -march=native -O2 -std=gnu99 -lm -laio -lcrypt -lrt -lz -ldl -lpthread -lc
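Each result pairs a mean with a standard error, so a difference can be read both as a percentage and against the run-to-run noise. The sketch below is illustrative only: the helper names are hypothetical, the two runs are assumed independent, and with N = 3 the noise ratio is a rough guide rather than a significance test. For "Fewer Is Better" metrics the sign of the change reads the other way.

```python
import math


def percent_change(baseline, candidate):
    """Relative change of candidate over baseline, in percent."""
    return (candidate - baseline) / baseline * 100.0


def se_separation(baseline, baseline_se, candidate, candidate_se):
    """Difference of the two means divided by their combined standard error."""
    combined = math.sqrt(baseline_se ** 2 + candidate_se ** 2)
    return (candidate - baseline) / combined


# Stress-NG Context Switching: nofsgsbase vs. FSGSBASE Enabled
ctx_change = percent_change(7847877.59, 9410762.67)
ctx_sep = se_separation(7847877.59, 27584.57, 9410762.67, 154150.18)

# Stress-NG SENDFILE: nofsgsbase vs. FSGSBASE Enabled
sendfile_change = percent_change(444432.10, 447930.47)
sendfile_sep = se_separation(444432.10, 104.27, 447930.47, 1247.48)

print(f"context switching: {ctx_change:.1f}% ({ctx_sep:.1f} combined SEs)")
print(f"sendfile: {sendfile_change:.1f}% ({sendfile_sep:.1f} combined SEs)")
```

The context-switching gain of roughly 20% sits far outside the combined error, while the sub-1% SENDFILE difference is much closer to it.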
Flexible IO Tester
Fio is an advanced disk benchmark that depends upon the kernel's AIO access library.
Flexible IO Tester 3.18 - Type: Random Write - Engine: IO_uring - Buffered: Yes - Direct: No - Block Size: 2MB - Disk Target: Default Test Directory (IOPS, More Is Better): nofsgsbase 135 (SE +/- 0.88, N = 3); FSGSBASE Enabled 187 (SE +/- 0.33, N = 3)
Flexible IO Tester 3.18 - Type: Random Write - Engine: IO_uring - Buffered: Yes - Direct: No - Block Size: 4KB - Disk Target: Default Test Directory (IOPS, More Is Better): nofsgsbase 63400 (SE +/- 100.00, N = 3); FSGSBASE Enabled 88800 (SE +/- 251.66, N = 3)
Flexible IO Tester 3.18 - Type: Sequential Write - Engine: IO_uring - Buffered: Yes - Direct: No - Block Size: 2MB - Disk Target: Default Test Directory (MB/s, More Is Better): nofsgsbase 346 (SE +/- 12.84, N = 15); FSGSBASE Enabled 385 (SE +/- 6.17, N = 3)
Flexible IO Tester 3.18 - Type: Sequential Write - Engine: IO_uring - Buffered: Yes - Direct: No - Block Size: 2MB - Disk Target: Default Test Directory (IOPS, More Is Better): nofsgsbase 170 (SE +/- 6.41, N = 15); FSGSBASE Enabled 189 (SE +/- 2.96, N = 3)
(CC) gcc options: -rdynamic -std=gnu99 -ffast-math -include -O3 -fcommon -U_FORTIFY_SOURCE -march=native -ll -lcurl -lssl -lcrypto -lnuma -libverbs -lrt -laio -lz -lpthread -lm -ldl
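The Fio results above vary only the write pattern and block size on the io_uring engine with buffered, non-direct I/O. A sketch of how such a job file could be generated follows. It is not the test profile's actual job: the `size` value, the default directory and the job naming are assumptions made for illustration, and `fio_job` is a hypothetical helper.

```python
def fio_job(rw, block_size, directory=".", size="1g"):
    """Build fio job-file text for a buffered io_uring write test.

    rw is a fio access pattern such as "randwrite" or "write";
    size and directory are illustrative assumptions.
    """
    lines = [
        "[global]",
        "ioengine=io_uring",
        "direct=0",  # buffered I/O (Direct: No)
        f"directory={directory}",
        f"size={size}",
        "",
        f"[{rw}-{block_size}]",
        f"rw={rw}",
        f"bs={block_size}",
    ]
    return "\n".join(lines) + "\n"


random_4k = fio_job("randwrite", "4k")
sequential_2m = fio_job("write", "2m")
print(random_4k)
```

Writing the returned text to a file and passing that file to `fio` would run the job; the reported numbers depend on the real profile settings, which are not listed on this page.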
PlaidML - FP16: No - Mode: Inference - Network: VGG19 - Device: CPU (FPS, More Is Better): nofsgsbase 21.39 (SE +/- 0.12, N = 3); FSGSBASE Enabled 20.64 (SE +/- 0.18, N = 3)
PlaidML - FP16: No - Mode: Inference - Network: IMDB LSTM - Device: CPU (FPS, More Is Better): nofsgsbase 850.65 (SE +/- 3.24, N = 3); FSGSBASE Enabled 868.09 (SE +/- 11.90, N = 4)
PlaidML - FP16: No - Mode: Inference - Network: Mobilenet - Device: CPU (FPS, More Is Better): nofsgsbase 10.60 (SE +/- 0.09, N = 3); FSGSBASE Enabled 10.62 (SE +/- 0.12, N = 3)
PlaidML - FP16: No - Mode: Inference - Network: ResNet 50 - Device: CPU (FPS, More Is Better): nofsgsbase 4.48 (SE +/- 0.03, N = 3); FSGSBASE Enabled 4.45 (SE +/- 0.03, N = 3)
PlaidML - FP16: No - Mode: Inference - Network: DenseNet 201 - Device: CPU (FPS, More Is Better): nofsgsbase 1.98 (SE +/- 0.01, N = 3); FSGSBASE Enabled 1.96 (SE +/- 0.01, N = 3)
PlaidML - FP16: No - Mode: Inference - Network: Inception V3 - Device: CPU (FPS, More Is Better): nofsgsbase 5.45 (SE +/- 0.02, N = 3); FSGSBASE Enabled 5.50 (SE +/- 0.03, N = 3)
PlaidML - FP16: No - Mode: Inference - Network: NASNet Large - Device: CPU (FPS, More Is Better): nofsgsbase 0.58 (SE +/- 0.00, N = 3); FSGSBASE Enabled 0.57 (SE +/- 0.00, N = 3)
Numenta Anomaly Benchmark
Numenta Anomaly Benchmark (NAB) is a benchmark for evaluating algorithms for anomaly detection in streaming, real-time applications. It is composed of over 50 labeled real-world and artificial time-series data files plus a novel scoring mechanism designed for real-time applications. This test profile currently measures the time to run various detectors.
Numenta Anomaly Benchmark 1.1 - Detector: EXPoSE (Seconds, Fewer Is Better): nofsgsbase 1500.85 (SE +/- 5.29, N = 3); FSGSBASE Enabled 1513.78 (SE +/- 15.42, N = 3)
Mlpack Benchmark - Benchmark: scikit_linearridgeregression (Seconds, Fewer Is Better): nofsgsbase 2.01 (SE +/- 0.03, N = 15); FSGSBASE Enabled 1.96 (SE +/- 0.02, N = 9)
GROMACS
The GROMACS (GROningen MAchine for Chemical Simulations) molecular dynamics package, tested on the CPU with the water_GMX50 data.
GROMACS 2020.1 - Water Benchmark (Ns Per Day, More Is Better): nofsgsbase 3.514 (SE +/- 0.006, N = 3); FSGSBASE Enabled 3.506 (SE +/- 0.001, N = 3)
(CXX) g++ options: -O3 -march=native -pthread -lrt -lpthread -lm
NAMD
NAMD is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. NAMD was developed by the Theoretical and Computational Biophysics Group in the Beckman Institute for Advanced Science and Technology at the University of Illinois at Urbana-Champaign.
NAMD 2.13 - ATPase Simulation - 327,506 Atoms (days/ns, Fewer Is Better): nofsgsbase 0.61077 (SE +/- 0.00071, N = 3); FSGSBASE Enabled 0.61104 (SE +/- 0.00455, N = 14)
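NAMD reports days/ns (fewer is better) while GROMACS reports ns per day (more is better); the two units are reciprocals. A small sketch of the conversion, with a hypothetical helper name, is shown here only to make the direction of each metric concrete. The two packages run different simulations, so the converted figures are not comparable with each other.

```python
def days_per_ns_to_ns_per_day(days_per_ns):
    """Convert a days/ns figure (lower is better) to ns/day (higher is better)."""
    if days_per_ns <= 0:
        raise ValueError("days/ns must be positive")
    return 1.0 / days_per_ns


namd_nofsgsbase = days_per_ns_to_ns_per_day(0.61077)
namd_fsgsbase = days_per_ns_to_ns_per_day(0.61104)
print(f"{namd_nofsgsbase:.3f} ns/day vs {namd_fsgsbase:.3f} ns/day")
```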
oneDNN
This is a test of Intel oneDNN, an Intel-optimized library for deep neural networks, making use of its built-in benchdnn functionality. The result is the total perf time reported. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and MKL-DNN before being rebranded as part of the oneAPI initiative.
oneDNN 1.5 - Harness: IP Batch 1D - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better): nofsgsbase 5.67910 (SE +/- 0.00720, N = 3, MIN: 5.5); FSGSBASE Enabled 5.69562 (SE +/- 0.00089, N = 3, MIN: 5.52)
oneDNN 1.5 - Harness: IP Batch All - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better): nofsgsbase 51.11 (SE +/- 0.03, N = 3, MIN: 50.21); FSGSBASE Enabled 51.10 (SE +/- 0.02, N = 3, MIN: 50.05)
oneDNN 1.5 - Harness: Convolution Batch Shapes Auto - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better): nofsgsbase 6.38735 (SE +/- 0.01144, N = 3, MIN: 6.3); FSGSBASE Enabled 6.39728 (SE +/- 0.00089, N = 3, MIN: 6.3)
oneDNN 1.5 - Harness: Deconvolution Batch deconv_1d - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better): nofsgsbase 7.39063 (SE +/- 0.00338, N = 3, MIN: 7.23); FSGSBASE Enabled 7.39154 (SE +/- 0.01137, N = 3, MIN: 7.23)
oneDNN 1.5 - Harness: Deconvolution Batch deconv_3d - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better): nofsgsbase 9.46163 (SE +/- 0.00883, N = 3, MIN: 9.35); FSGSBASE Enabled 9.46158 (SE +/- 0.00175, N = 3, MIN: 9.31)
oneDNN 1.5 - Harness: Matrix Multiply Batch Shapes Transformer - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better): nofsgsbase 1.45193 (SE +/- 0.00164, N = 3, MIN: 1.41); FSGSBASE Enabled 1.44923 (SE +/- 0.00269, N = 3, MIN: 1.4)
(CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
QMCPACK
QMCPACK is a modern high-performance open-source Quantum Monte Carlo (QMC) simulation code making use of MPI for this benchmark of the H2O example code.
QMCPACK 3.8 (Total Execution Time - Seconds, Fewer Is Better): nofsgsbase 2688.5; FSGSBASE Enabled 2687.6
(CXX) g++ options: -O3 -march=native -fopenmp -fomit-frame-pointer -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -ffast-math -lm
CP2K Molecular Dynamics
CP2K is an open-source molecular dynamics software package focused on quantum chemistry and solid-state physics. This test profile currently uses the OpenMP implementation with the Fayalite-FIST molecular dynamics run and measures the total time to complete.
CP2K Molecular Dynamics 6.1 - Fayalite-FIST Data (Seconds, Fewer Is Better): nofsgsbase 1886.26; FSGSBASE Enabled 2027.69
pmbench
Pmbench is a Linux paging and virtual memory benchmark. This test profile will report the average page latency of the system.
pmbench - Concurrent Worker Threads: 72 - Read-Write Ratio: 100% Reads (us - Average Page Latency, Fewer Is Better): nofsgsbase 0.0451 (SE +/- 0.0004, N = 15); FSGSBASE Enabled 0.0460 (SE +/- 0.0012, N = 12)
pmbench - Concurrent Worker Threads: 72 - Read-Write Ratio: 100% Writes (us - Average Page Latency, Fewer Is Better): nofsgsbase 0.0812 (SE +/- 0.0009, N = 3); FSGSBASE Enabled 0.0802 (SE +/- 0.0010, N = 5)
pmbench - Concurrent Worker Threads: 1 - Read-Write Ratio: 80% Reads 20% Writes (us - Average Page Latency, Fewer Is Better): nofsgsbase 0.0756 (SE +/- 0.0002, N = 3); FSGSBASE Enabled 0.0756 (SE +/- 0.0002, N = 3)
(CC) gcc options: -lm -luuid -lxml2 -m64 -pthread
PostMark
This is a test of NetApp's PostMark benchmark, designed to simulate the small-file workloads handled by web and mail servers. This test profile sets PostMark to perform 25,000 transactions with 500 files simultaneously, with file sizes ranging between 5 and 512 kilobytes.
PostMark 1.51 - Disk Transaction Performance (TPS, More Is Better): nofsgsbase 5725 (SE +/- 44.00, N = 3); FSGSBASE Enabled 5725 (SE +/- 44.00, N = 3)
(CC) gcc options: -O3 -march=native
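The PostMark parameters described above (25,000 transactions, 500 files, 5 to 512 kilobyte file sizes) map onto PostMark's interactive `set` commands. The sketch below generates such a command script; it assumes a kilobyte means 1024 bytes and is not taken from the test profile itself. `postmark_script` is a hypothetical helper.

```python
def postmark_script(transactions=25000, files=500, min_kb=5, max_kb=512):
    """Build a PostMark command script; sizes are given to PostMark in bytes."""
    lines = [
        f"set transactions {transactions}",
        f"set number {files}",
        # Assumption: 1 kilobyte = 1024 bytes.
        f"set size {min_kb * 1024} {max_kb * 1024}",
        "run",
        "quit",
    ]
    return "\n".join(lines) + "\n"


script = postmark_script()
print(script)
```

The generated text can be piped to the `postmark` binary on standard input.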
AOM AV1
This is a simple test of the AOMedia AV1 encoder run on the CPU with a sample video file.
AOM AV1 2.0 - Encoder Mode: Speed 0 Two-Pass (Frames Per Second, More Is Better): nofsgsbase 0.27 (SE +/- 0.00, N = 3); FSGSBASE Enabled 0.27 (SE +/- 0.00, N = 3)
AOM AV1 2.0 - Encoder Mode: Speed 4 Two-Pass (Frames Per Second, More Is Better): nofsgsbase 1.94 (SE +/- 0.00, N = 3); FSGSBASE Enabled 1.95 (SE +/- 0.00, N = 3)
AOM AV1 2.0 - Encoder Mode: Speed 6 Realtime (Frames Per Second, More Is Better): nofsgsbase 10.78 (SE +/- 0.09, N = 3); FSGSBASE Enabled 10.86 (SE +/- 0.08, N = 3)
AOM AV1 2.0 - Encoder Mode: Speed 6 Two-Pass (Frames Per Second, More Is Better): nofsgsbase 2.91 (SE +/- 0.02, N = 3); FSGSBASE Enabled 2.98 (SE +/- 0.00, N = 3)
AOM AV1 2.0 - Encoder Mode: Speed 8 Realtime (Frames Per Second, More Is Better): nofsgsbase 23.84 (SE +/- 0.31, N = 3); FSGSBASE Enabled 23.74 (SE +/- 0.17, N = 3)
(CXX) g++ options: -O3 -march=native -std=c++11 -U_FORTIFY_SOURCE -lm -lpthread
VP9 libvpx Encoding
This is a standard video encoding performance test of Google's libvpx library and the vpxenc command for the VP9/WebM format using a sample 1080p video.
VP9 libvpx Encoding 1.8.2 - Speed: Speed 0 (Frames Per Second, More Is Better): nofsgsbase 6.12 (SE +/- 0.02, N = 3); FSGSBASE Enabled 6.14 (SE +/- 0.01, N = 3)
VP9 libvpx Encoding 1.8.2 - Speed: Speed 5 (Frames Per Second, More Is Better): nofsgsbase 23.24 (SE +/- 0.08, N = 3); FSGSBASE Enabled 23.01 (SE +/- 0.09, N = 3)
(CXX) g++ options: -m64 -lm -lpthread -O3 -march=native -fPIC -U_FORTIFY_SOURCE -std=c++11
dav1d
Dav1d is an open-source, speedy AV1 video decoder. This test profile times how long it takes to decode sample AV1 video content.
dav1d 0.7.0 - Video Input: Chimera 1080p (FPS, More Is Better): nofsgsbase 328.03 (SE +/- 4.08, N = 3, MIN: 183.84 / MAX: 426.68); FSGSBASE Enabled 329.39 (SE +/- 3.13, N = 3, MIN: 204.26 / MAX: 425.36)
dav1d 0.7.0 - Video Input: Summer Nature 4K (FPS, More Is Better): nofsgsbase 180.77 (SE +/- 0.93, N = 3, MIN: 91.75 / MAX: 195.31); FSGSBASE Enabled 182.78 (SE +/- 2.74, N = 3, MIN: 88.23 / MAX: 199.52)
dav1d 0.7.0 - Video Input: Summer Nature 1080p (FPS, More Is Better): nofsgsbase 335.16 (SE +/- 1.40, N = 3, MIN: 172.66 / MAX: 372.4); FSGSBASE Enabled 338.36 (SE +/- 1.19, N = 3, MIN: 185.24 / MAX: 374.84)
dav1d 0.7.0 - Video Input: Chimera 1080p 10-bit (FPS, More Is Better): nofsgsbase 87.47 (SE +/- 0.10, N = 3, MIN: 66.73 / MAX: 137.93); FSGSBASE Enabled 87.20 (SE +/- 0.13, N = 3, MIN: 66.61 / MAX: 133.73)
(CC) gcc options: -O3 -march=native -pthread
SVT-AV1
This is a test of the Intel Open Visual Cloud Scalable Video Technology SVT-AV1 CPU-based multi-threaded video encoder for the AV1 video format with a sample 1080p YUV video file.
SVT-AV1 0.8 - Encoder Mode: Enc Mode 0 - Input: 1080p (Frames Per Second, More Is Better): nofsgsbase 0.120 (SE +/- 0.000, N = 3); FSGSBASE Enabled 0.120 (SE +/- 0.000, N = 3)
SVT-AV1 0.8 - Encoder Mode: Enc Mode 4 - Input: 1080p (Frames Per Second, More Is Better): nofsgsbase 5.683 (SE +/- 0.082, N = 3); FSGSBASE Enabled 5.770 (SE +/- 0.067, N = 3)
SVT-AV1 0.8 - Encoder Mode: Enc Mode 8 - Input: 1080p (Frames Per Second, More Is Better): nofsgsbase 49.21 (SE +/- 0.04, N = 3); FSGSBASE Enabled 48.87 (SE +/- 0.53, N = 3)
(CXX) g++ options: -O3 -fcommon -fPIE -fPIC -pie
YafaRay
YafaRay is an open-source, physically based Monte Carlo ray-tracing engine.
YafaRay 3.4.1 - Total Time For Sample Scene (Seconds, Fewer Is Better): nofsgsbase 113.63 (SE +/- 3.44, N = 15); FSGSBASE Enabled 108.82 (SE +/- 2.93, N = 15)
(CXX) g++ options: -std=c++11 -O3 -ffast-math -rdynamic -ldl -lImath -lIlmImf -lIex -lHalf -lz -lIlmThread -lxml2 -lfreetype -lpthread
BlogBench
BlogBench is designed to replicate the load of a real-world busy file server by stressing the file-system with multiple threads of random reads, writes, and rewrites. It mimics the behavior of a blog by creating blogs with content and pictures, modifying blog posts, adding comments to these blogs, and then reading the content of the blogs. All of these blogs are generated locally with fake content and pictures.
BlogBench 1.1 - Test: Write (Final Score, More Is Better): nofsgsbase 20455 (SE +/- 1371.68, N = 3); FSGSBASE Enabled 24247 (SE +/- 920.31, N = 3)
(CC) gcc options: -O3 -march=native -pthread
Apache Siege 2.4.29 - Concurrent Users: 50 (Transactions Per Second, More Is Better): nofsgsbase 33180.66 (SE +/- 188.88, N = 3); FSGSBASE Enabled 33173.45 (SE +/- 194.24, N = 3)
Apache Siege 2.4.29 - Concurrent Users: 200 (Transactions Per Second, More Is Better): nofsgsbase 43531.70 (SE +/- 254.67, N = 3); FSGSBASE Enabled 48540.41 (SE +/- 1308.05, N = 12)
(CC) gcc options: -O3 -march=native -lpthread -ldl -lssl -lcrypto
Apache HBase 2.2.3 - Test: Increment - Clients: 1 (Microseconds - Average Latency, Fewer Is Better): nofsgsbase 308 (SE +/- 2.70, N = 15); FSGSBASE Enabled 291 (SE +/- 2.52, N = 11)
Apache HBase 2.2.3 - Test: Random Read - Clients: 1 (Rows Per Second, More Is Better): nofsgsbase 4625 (SE +/- 48.14, N = 15); FSGSBASE Enabled 4643 (SE +/- 56.05, N = 15)
Apache HBase 2.2.3 - Test: Random Read - Clients: 1 (Microseconds - Average Latency, Fewer Is Better): nofsgsbase 214 (SE +/- 2.20, N = 15); FSGSBASE Enabled 213 (SE +/- 2.57, N = 15)
Apache HBase 2.2.3 - Test: Sequential Read - Clients: 1 (Rows Per Second, More Is Better): nofsgsbase 5090 (SE +/- 44.45, N = 15); FSGSBASE Enabled 5270 (SE +/- 93.78, N = 15)
Apache HBase 2.2.3 - Test: Sequential Read - Clients: 1 (Microseconds - Average Latency, Fewer Is Better): nofsgsbase 195 (SE +/- 1.75, N = 15); FSGSBASE Enabled 189 (SE +/- 3.62, N = 15)
Apache HBase 2.2.3 - Test: Async Random Read - Clients: 1 (Rows Per Second, More Is Better): nofsgsbase 5137 (SE +/- 78.77, N = 12); FSGSBASE Enabled 5245 (SE +/- 81.85, N = 15)
Apache HBase 2.2.3 - Test: Async Random Read - Clients: 1 (Microseconds - Average Latency, Fewer Is Better): nofsgsbase 193 (SE +/- 3.36, N = 12); FSGSBASE Enabled 189 (SE +/- 3.20, N = 15)
Memtier_benchmark
Memtier_benchmark is a NoSQL Redis/Memcache traffic generation and benchmarking tool. This test profile currently stresses only the Redis protocol and the basic options exposed, with a 1:1 Set/Get ratio, a pipeline of 30, 100 clients per thread, and a thread count equal to the number of CPU cores/threads present. Patches to extend the test are welcome as always.
Memtier_benchmark 1.2.17 - Protocol: Redis (Ops/sec, More Is Better): nofsgsbase 2635763.82 (SE +/- 13554.01, N = 3); FSGSBASE Enabled 2859981.40 (SE +/- 74458.90, N = 15)
(CXX) g++ options: -O2 -levent -lpthread -lz -lpcre
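The Memtier settings described above (Redis protocol, 1:1 Set/Get ratio, pipeline of 30, 100 clients per thread, one thread per CPU thread) correspond to memtier_benchmark command-line options. The sketch below assembles such an argument list; it approximates the description and is not the test profile's exact invocation. The server address is left at the tool's default, and `memtier_args` is a hypothetical helper.

```python
import os


def memtier_args(threads=None):
    """Build a memtier_benchmark argument list matching the described settings."""
    if threads is None:
        # One benchmark thread per CPU thread present, as the description states.
        threads = os.cpu_count() or 1
    return [
        "memtier_benchmark",
        "--protocol=redis",
        "--ratio=1:1",
        "--pipeline=30",
        "--clients=100",
        f"--threads={threads}",
    ]


args_72 = memtier_args(72)  # the test system exposes 72 threads
print(" ".join(args_72))
```

The list can be handed to `subprocess.run` on a machine where memtier_benchmark and a Redis server are available.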
Redis 5.0.5 - Test: SADD (Requests Per Second, More Is Better): nofsgsbase 2071005.92 (SE +/- 30688.88, N = 15); FSGSBASE Enabled 2087688.75 (SE +/- 2516.35, N = 3)
Redis 5.0.5 - Test: GET (Requests Per Second, More Is Better): nofsgsbase 2340376.20 (SE +/- 18114.33, N = 3); FSGSBASE Enabled 2500801.50 (SE +/- 31845.75, N = 3)
Redis 5.0.5 - Test: SET (Requests Per Second, More Is Better): nofsgsbase 1908415.46 (SE +/- 4205.47, N = 3); FSGSBASE Enabled 1918164.83 (SE +/- 2456.08, N = 3)
(CXX) g++ options: -MM -MT -g3 -fvisibility=hidden -O3 -march=native
Facebook RocksDB
This is a benchmark of Facebook's RocksDB as an embeddable persistent key-value store for fast storage based on Google's LevelDB.
Facebook RocksDB 6.3.6 - Test: Random Fill (Op/s, More Is Better): nofsgsbase 186451 (SE +/- 226.90, N = 3); FSGSBASE Enabled 186093 (SE +/- 172.09, N = 3)
Facebook RocksDB 6.3.6 - Test: Random Read (Op/s, More Is Better): nofsgsbase 141448198 (SE +/- 497372.75, N = 3); FSGSBASE Enabled 142205648 (SE +/- 833744.91, N = 3)
Facebook RocksDB 6.3.6 - Test: Sequential Fill (Op/s, More Is Better): nofsgsbase 189643 (SE +/- 107.10, N = 3); FSGSBASE Enabled 187950 (SE +/- 190.16, N = 3)
Facebook RocksDB 6.3.6 - Test: Random Fill Sync (Op/s, More Is Better): nofsgsbase 5681 (SE +/- 26.46, N = 3); FSGSBASE Enabled 6532 (SE +/- 511.75, N = 15)
Facebook RocksDB 6.3.6 - Test: Read While Writing (Op/s, More Is Better): nofsgsbase 5396411 (SE +/- 54477.77, N = 3); FSGSBASE Enabled 5356272 (SE +/- 27153.16, N = 3)
(CXX) g++ options: -O3 -march=native -std=c++11 -fno-builtin-memcmp -fno-rtti -rdynamic -lpthread
LevelDB
LevelDB is a key-value storage library developed by Google that supports making use of Snappy for data compression and has other modern features.
LevelDB 1.22 - Benchmark: Hot Read (Microseconds Per Op, Fewer Is Better): nofsgsbase 91.65 (SE +/- 1.38, N = 3); FSGSBASE Enabled 92.28 (SE +/- 1.10, N = 3)
LevelDB 1.22 - Benchmark: Fill Sync (MB/s, More Is Better): nofsgsbase 1.8 (SE +/- 0.00, N = 3); FSGSBASE Enabled 1.8 (SE +/- 0.00, N = 3)
LevelDB 1.22 - Benchmark: Fill Sync (Microseconds Per Op, Fewer Is Better): nofsgsbase 4460.31 (SE +/- 8.46, N = 3); FSGSBASE Enabled 4485.35 (SE +/- 8.32, N = 3)
LevelDB 1.22 - Benchmark: Overwrite (MB/s, More Is Better): nofsgsbase 10.0 (SE +/- 0.03, N = 3); FSGSBASE Enabled 9.9 (SE +/- 0.03, N = 3)
LevelDB 1.22 - Benchmark: Overwrite (Microseconds Per Op, Fewer Is Better): nofsgsbase 792.44 (SE +/- 2.31, N = 3); FSGSBASE Enabled 799.36 (SE +/- 1.81, N = 3)
LevelDB 1.22 - Benchmark: Random Fill (MB/s, More Is Better): nofsgsbase 10.0 (SE +/- 0.10, N = 3); FSGSBASE Enabled 9.8 (SE +/- 0.03, N = 3)
LevelDB 1.22 - Benchmark: Random Fill (Microseconds Per Op, Fewer Is Better): nofsgsbase 798.60 (SE +/- 8.40, N = 3); FSGSBASE Enabled 809.70 (SE +/- 3.59, N = 3)
LevelDB 1.22 - Benchmark: Random Read (Microseconds Per Op, Fewer Is Better): nofsgsbase 93.01 (SE +/- 0.66, N = 3); FSGSBASE Enabled 94.00 (SE +/- 0.05, N = 3)
LevelDB 1.22 - Benchmark: Seek Random (Microseconds Per Op, Fewer Is Better): nofsgsbase 113.20 (SE +/- 0.75, N = 3); FSGSBASE Enabled 113.58 (SE +/- 0.13, N = 3)
LevelDB 1.22 - Benchmark: Random Delete (Microseconds Per Op, Fewer Is Better): nofsgsbase 761.43 (SE +/- 2.15, N = 3); FSGSBASE Enabled 774.36 (SE +/- 1.02, N = 3)
LevelDB 1.22 - Benchmark: Sequential Fill (MB/s, More Is Better): nofsgsbase 9.9 (SE +/- 0.03, N = 3); FSGSBASE Enabled 9.6 (SE +/- 0.03, N = 3)
LevelDB 1.22 - Benchmark: Sequential Fill (Microseconds Per Op, Fewer Is Better): nofsgsbase 809.44 (SE +/- 3.18, N = 3); FSGSBASE Enabled 826.27 (SE +/- 3.79, N = 3)
(CXX) g++ options: -O3 -march=native -lsnappy -lpthread
PostgreSQL pgbench
This is a simple benchmark of PostgreSQL using pgbench.
PostgreSQL pgbench 12.0 - Scaling: Buffer Test - Test: Normal Load - Mode: Read Only (TPS, More Is Better): nofsgsbase 594395.91 (SE +/- 3396.56, N = 3); FSGSBASE Enabled 593329.43 (SE +/- 2139.37, N = 3)
PostgreSQL pgbench 12.0 - Scaling: Buffer Test - Test: Normal Load - Mode: Read Write (TPS, More Is Better): nofsgsbase 2762.90 (SE +/- 34.32, N = 3); FSGSBASE Enabled 4908.50 (SE +/- 59.24, N = 6)
PostgreSQL pgbench 12.0 - Scaling: Buffer Test - Test: Heavy Contention - Mode: Read Only (TPS, More Is Better): nofsgsbase 619694.86 (SE +/- 680.20, N = 3); FSGSBASE Enabled 618994.74 (SE +/- 1415.39, N = 3)
PostgreSQL pgbench 12.0 - Scaling: Buffer Test - Test: Heavy Contention - Mode: Read Write (TPS, More Is Better): nofsgsbase 2634.27 (SE +/- 25.55, N = 9); FSGSBASE Enabled 4727.71 (SE +/- 59.91, N = 3)
(CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=native -lpgcommon -lpgport -lpq -lpthread -lrt -lcrypt -ldl -lm
MariaDB This is a MariaDB MySQL database server benchmark making use of mysqlslap. Learn more via the OpenBenchmarking.org test page.
MariaDB 10.5.2 (OpenBenchmarking.org; Queries Per Second, More Is Better)
Clients: 64: nofsgsbase 199 (SE +/- 2.50, N = 5); FSGSBASE Enabled 205 (SE +/- 2.42, N = 6)
Clients: 128: nofsgsbase 154 (SE +/- 0.65, N = 3); FSGSBASE Enabled 159 (SE +/- 0.48, N = 3)
Clients: 256: nofsgsbase 143 (SE +/- 0.32, N = 3); FSGSBASE Enabled 150 (SE +/- 0.58, N = 3)
Clients: 512: nofsgsbase 140 (SE +/- 0.52, N = 3); FSGSBASE Enabled 161 (SE +/- 2.84, N = 9)
1. (CXX) g++ options: -O3 -march=native -pie -fPIC -fstack-protector -O2 -lpthread -llzma -lbz2 -lsnappy -laio -lnuma -lpcre2-8 -lcrypt -lz -lm -lssl -lcrypto -ldl
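To put the pgbench and MariaDB figures reported above on a common scale, the relative change from nofsgsbase to FSGSBASE Enabled can be computed per test. The following is a minimal sketch, not part of the original result file. It assumes that the first value in each result belongs to nofsgsbase and the second to FSGSBASE Enabled (the order in which the two configurations are listed), and the names `results` and `percent_change` are illustrative only.

```python
# Mean results copied from the figures above.
# Assumption: in each pair the first value is nofsgsbase, the second is FSGSBASE Enabled.
results = [
    ("pgbench Normal Load, Read Write (TPS)", 2762.90, 4908.50),
    ("pgbench Heavy Contention, Read Only (TPS)", 619694.86, 618994.74),
    ("pgbench Heavy Contention, Read Write (TPS)", 2634.27, 4727.71),
    ("MariaDB 64 clients (QPS)", 199, 205),
    ("MariaDB 128 clients (QPS)", 154, 159),
    ("MariaDB 256 clients (QPS)", 143, 150),
    ("MariaDB 512 clients (QPS)", 140, 161),
]


def percent_change(baseline, candidate):
    """Relative change of candidate versus baseline, in percent."""
    return (candidate - baseline) / baseline * 100.0


if __name__ == "__main__":
    # Positive values mean FSGSBASE Enabled scored higher (more is better).
    for name, nofsgsbase, fsgsbase in results:
        print(f"{name}: {percent_change(nofsgsbase, fsgsbase):+.1f}%")
```

Small differences should be read against the reported standard errors; this sketch compares the means only.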
FSGSBASE Enabled: Testing initiated at 20 June 2020 20:34 by user phoronix.
nofsgsbase
Environment Notes: CXXFLAGS="-O3 -march=native" CFLAGS="-O3 -march=native"
Compiler Notes: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none,hsa --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Disk Notes: MQ-DEADLINE / errors=remount-ro,relatime,rw
Processor Notes: Scaling Governor: intel_pstate powersave - CPU Microcode: 0x5002f01
Java Notes: OpenJDK Runtime Environment (build 11.0.7+10-post-Ubuntu-3ubuntu1)
Python Notes: Python 3.8.2
Security Notes: itlb_multihit: KVM: Mitigation of Split huge pages + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced IBRS IBPB: conditional RSB filling + srbds: Not affected + tsx_async_abort: Mitigation of TSX disabled
Testing initiated at 22 June 2020 16:05 by user phoronix.