Intel Xeon E5-2687W v3 testing with a MSI X99S SLI PLUS (MS-7885) v1.0 (1.E0 BIOS) and NVIDIA GeForce GTX 770 on Ubuntu 20.04 via the Phoronix Test Suite.
Kernel Notes: Transparent Huge Pages: madvise
Compiler Notes: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-9-HskZEa/gcc-9-9.3.0/debian/tmp-nvptx/usr,hsa --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Notes: Scaling Governor: intel_cpufreq ondemand - CPU Microcode: 0x44
Python Notes: Python 3.8.5
Security Notes: itlb_multihit: KVM: Mitigation of VMX disabled + l1tf: Mitigation of PTE Inversion; VMX: conditional cache flushes SMT vulnerable + mds: Mitigation of Clear buffers; SMT vulnerable + meltdown: Mitigation of PTI + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Full generic retpoline IBPB: conditional IBRS_FW STIBP: conditional RSB filling + srbds: Not affected + tsx_async_abort: Not affected
Processor: Intel Xeon E5-2687W v3 @ 3.50GHz (10 Cores / 20 Threads), Motherboard: MSI X99S SLI PLUS (MS-7885) v1.0 (1.E0 BIOS), Chipset: Intel Xeon E7 v3/Xeon, Memory: 32GB, Disk: 80GB INTEL SSDSCKGW08, Graphics: NVIDIA GeForce GTX 770, Audio: Realtek ALC892, Monitor: LG Ultra HD, Network: Intel I218-V
OS: Ubuntu 20.04, Kernel: 5.9.0-050900rc7daily20200928-generic (x86_64) 20200927, Desktop: GNOME Shell 3.36.4, Display Server: X Server 1.20.8, Display Driver: modesetting 1.20.8, Compiler: GCC 9.3.0, File-System: ext4, Screen Resolution: 3840x2160
Algebraic Multi-Grid Benchmark AMG is a parallel algebraic multigrid solver for linear systems arising from problems on unstructured grids. The driver provided with AMG builds linear systems for various 3-dimensional problems. Learn more via the OpenBenchmarking.org test page.
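The multigrid idea the description refers to can be illustrated with a toy two-grid cycle: a cheap smoother damps the oscillatory error while a coarse-grid solve corrects the smooth error. This is a hedged, purely illustrative sketch for the 1D Poisson problem on a structured grid (geometric rather than algebraic multigrid, so it is not what AMG itself does on unstructured grids); all function names and parameters here are invented for the example.

```python
# Toy two-grid cycle for -u'' = f on (0, 1) with zero Dirichlet BCs.
# Illustrative only: AMG builds its hierarchy algebraically instead.

def residual(u, f, h):
    """r = f - A u for A = tridiag(-1, 2, -1) / h^2."""
    n = len(u)
    r = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        r.append(f[i] - (-left + 2.0 * u[i] - right) / h**2)
    return r

def jacobi(u, f, h, sweeps, w=2.0 / 3.0):
    """Weighted-Jacobi smoothing: damps oscillatory error modes."""
    for _ in range(sweeps):
        r = residual(u, f, h)
        u = [u[i] + w * r[i] * h**2 / 2.0 for i in range(len(u))]
    return u

def thomas(f, h):
    """Direct tridiagonal solve of A u = f (the coarse-grid solve)."""
    n = len(f)
    a, b, c = -1.0 / h**2, 2.0 / h**2, -1.0 / h**2
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c / b, f[0] / b
    for i in range(1, n):
        m = b - a * cp[i - 1]
        cp[i] = c / m
        dp[i] = (f[i] - a * dp[i - 1]) / m
    u = [0.0] * n
    u[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        u[i] = dp[i] - cp[i] * u[i + 1]
    return u

def two_grid(u, f, h):
    """Pre-smooth, restrict residual, coarse solve, prolong, post-smooth."""
    u = jacobi(u, f, h, sweeps=3)
    r = residual(u, f, h)
    nc = (len(u) - 1) // 2                       # coarse interior points
    rc = [(r[2*j] + 2.0 * r[2*j+1] + r[2*j+2]) / 4.0 for j in range(nc)]
    ec = thomas(rc, 2.0 * h)                     # exact coarse correction
    e = [0.0] * len(u)
    for j in range(nc):                          # coarse points inject
        e[2*j + 1] = ec[j]
    for i in range(0, len(u), 2):                # linear interpolation
        left = e[i - 1] if i > 0 else 0.0
        right = e[i + 1] if i < len(u) - 1 else 0.0
        e[i] = (left + right) / 2.0
    u = [u[i] + e[i] for i in range(len(u))]
    return jacobi(u, f, h, sweeps=3)

n, h = 15, 1.0 / 16.0                            # 15 interior points
f = [1.0] * n
u = [0.0] * n
r0 = max(abs(x) for x in residual(u, f, h))
for _ in range(5):
    u = two_grid(u, f, h)
r5 = max(abs(x) for x in residual(u, f, h))
print(r0, r5)   # the residual norm drops sharply across cycles
```

The point of the hierarchy is that each component of the error is cheap to remove on some level; AMG's "figure of merit" above rewards exactly this kind of fast, scalable convergence.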
OpenBenchmarking.org - Algebraic Multi-Grid Benchmark 1.2 (Figure Of Merit, More Is Better)
  Run 1: 302891133 (SE +/- 2829162.69, N = 3)
  Run 2: 303612367 (SE +/- 2849594.82, N = 3)
  Run 3: 305133633 (SE +/- 980611.02, N = 3)
  1. (CC) gcc options: -lparcsr_ls -lparcsr_mv -lseq_mv -lIJ_mv -lkrylov -lHYPRE_utilities -lm -fopenmp -pthread -lmpi
ASKAP ASKAP is a set of benchmarks from the Australian SKA Pathfinder. The principal ASKAP benchmarks are the Hogbom Clean Benchmark (tHogbomClean) and Convolutional Resampling Benchmark (tConvolve), along with some earlier ASKAP benchmarks included for OpenCL and CUDA execution of tConvolve. Learn more via the OpenBenchmarking.org test page.
ASKAP 1.0 - Test: tConvolve MT - Gridding (Million Grid Points Per Second, More Is Better)
  Run 3: 1280.90 (SE +/- 0.57, N = 3)
  Run 2: 1284.71 (SE +/- 0.31, N = 3)
  Run 1: 1285.02 (SE +/- 0.36, N = 3)
ASKAP 1.0 - Test: tConvolve MT - Degridding (Million Grid Points Per Second, More Is Better)
  Run 3: 1745.94 (SE +/- 0.99, N = 3)
  Run 2: 1753.61 (SE +/- 0.77, N = 3)
  Run 1: 1756.70 (SE +/- 0.84, N = 3)
ASKAP 1.0 - Test: tConvolve MPI - Degridding (Mpix/sec, More Is Better)
  Run 2: 1589.75 (SE +/- 28.57, N = 15)
  Run 3: 1597.84 (SE +/- 29.00, N = 12)
  Run 1: 1699.85 (SE +/- 18.20, N = 3)
ASKAP 1.0 - Test: tConvolve MPI - Gridding (Mpix/sec, More Is Better)
  Run 3: 1979.00 (SE +/- 47.93, N = 12)
  Run 2: 1984.73 (SE +/- 45.52, N = 15)
  Run 1: 2071.46 (SE +/- 65.46, N = 3)
ASKAP 1.0 - Test: tConvolve OpenMP - Gridding (Million Grid Points Per Second, More Is Better)
  Run 1: 1778.10 (SE +/- 39.18, N = 15)
  Run 2: 1881.94 (SE +/- 31.45, N = 15)
  Run 3: 1962.74 (SE +/- 12.81, N = 3)
ASKAP 1.0 - Test: tConvolve OpenMP - Degridding (Million Grid Points Per Second, More Is Better)
  Run 1: 2664.35 (SE +/- 1.79, N = 15)
  Run 2: 2716.90 (SE +/- 0.00, N = 15)
  Run 3: 2716.90 (SE +/- 0.00, N = 3)
ASKAP 1.0 - Test: Hogbom Clean OpenMP (Iterations Per Second, More Is Better)
  Run 1: 239.24 (SE +/- 0.33, N = 3)
  Run 2: 239.43 (SE +/- 0.19, N = 3)
  Run 3: 239.62 (SE +/- 0.19, N = 3)
1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp (all ASKAP results)
Build2 This test profile measures the time to bootstrap/install the build2 C++ build toolchain from source. Build2 is a cross-platform build toolchain for C/C++ code with Cargo-like capabilities. Learn more via the OpenBenchmarking.org test page.
Build2 0.13 - Time To Compile (Seconds, Fewer Is Better)
  Run 2: 160.67 (SE +/- 0.85, N = 3)
  Run 3: 160.09 (SE +/- 0.91, N = 3)
  Run 1: 159.69 (SE +/- 0.55, N = 3)
CLOMP CLOMP is the C version of the Livermore OpenMP benchmark developed to measure OpenMP overheads and other performance impacts due to threading in order to influence future system designs. This particular test profile configuration is currently set to look at the OpenMP static schedule speed-up across all available CPU cores using the recommended test configuration. Learn more via the OpenBenchmarking.org test page.
CLOMP 1.2 - Static OMP Speedup (Speedup, More Is Better)
  Run 2: 11.0 (SE +/- 1.35, N = 12)
  Run 3: 13.0 (SE +/- 0.09, N = 3)
  Run 1: 13.2 (SE +/- 0.13, N = 3)
  1. (CC) gcc options: -fopenmp -O3 -lm
CloverLeaf CloverLeaf is a Lagrangian-Eulerian hydrodynamics benchmark. This test profile currently makes use of CloverLeaf's OpenMP version and is benchmarked with the clover_bm.in input file (Problem 5). Learn more via the OpenBenchmarking.org test page.
CloverLeaf - Lagrangian-Eulerian Hydrodynamics (Seconds, Fewer Is Better)
  Run 2: 140.64 (SE +/- 0.54, N = 3)
  Run 3: 140.57 (SE +/- 0.33, N = 3)
  Run 1: 136.89 (SE +/- 0.22, N = 3)
  1. (F9X) gfortran options: -O3 -march=native -funroll-loops -fopenmp
Cpuminer-Opt Cpuminer-Opt is a fork of cpuminer-multi that carries a wide range of CPU performance optimizations for measuring the potential cryptocurrency mining performance of the CPU/processor across a variety of cryptocurrencies. The benchmark reports the hash speed for CPU mining of the selected cryptocurrency. Learn more via the OpenBenchmarking.org test page.
Cpuminer-Opt 3.15.5 - Algorithm: Magi (kH/s, More Is Better)
  Run 1: 203.98 (SE +/- 0.12, N = 3)
  Run 2: 204.18 (SE +/- 0.17, N = 3)
  Run 3: 206.24 (SE +/- 2.29, N = 14)
Cpuminer-Opt 3.15.5 - Algorithm: x25x (kH/s, More Is Better)
  Run 1: 219.58 (SE +/- 1.24, N = 3)
  Run 2: 220.73 (SE +/- 0.07, N = 3)
  Run 3: 220.82 (SE +/- 0.07, N = 3)
Cpuminer-Opt 3.15.5 - Algorithm: Deepcoin (kH/s, More Is Better)
  Run 2: 7013.87 (SE +/- 164.38, N = 15)
  Run 3: 7312.24 (SE +/- 33.20, N = 3)
  Run 1: 7487.98 (SE +/- 132.89, N = 15)
Cpuminer-Opt 3.15.5 - Algorithm: Ringcoin (kH/s, More Is Better)
  Run 1: 1579.48 (SE +/- 3.43, N = 3)
  Run 2: 1582.49 (SE +/- 10.01, N = 3)
  Run 3: 1590.60 (SE +/- 11.74, N = 14)
Cpuminer-Opt 3.15.5 - Algorithm: Blake-2 S (kH/s, More Is Better)
  Run 1: 220508 (SE +/- 3265.93, N = 4)
  Run 2: 221938 (SE +/- 2626.85, N = 15)
  Run 3: 226073 (SE +/- 3030.09, N = 4)
Cpuminer-Opt 3.15.5 - Algorithm: Garlicoin (kH/s, More Is Better)
  Run 1: 1736.89 (SE +/- 7.48, N = 3)
  Run 3: 1738.45 (SE +/- 4.72, N = 3)
  Run 2: 1750.88 (SE +/- 6.73, N = 3)
Cpuminer-Opt 3.15.5 - Algorithm: Skeincoin (kH/s, More Is Better)
  Run 2: 34338 (SE +/- 325.79, N = 15)
  Run 1: 34847 (SE +/- 291.68, N = 15)
  Run 3: 35707 (SE +/- 140.75, N = 3)
Cpuminer-Opt 3.15.5 - Algorithm: Myriad-Groestl (kH/s, More Is Better)
  Run 1: 10363 (SE +/- 6.67, N = 3)
  Run 3: 10510 (SE +/- 125.83, N = 3)
  Run 2: 10567 (SE +/- 161.69, N = 3)
Cpuminer-Opt 3.15.5 - Algorithm: LBC, LBRY Credits (kH/s, More Is Better)
  Run 1: 23330 (SE +/- 113.58, N = 3)
  Run 2: 23407 (SE +/- 86.86, N = 3)
  Run 3: 23463 (SE +/- 96.84, N = 3)
Cpuminer-Opt 3.15.5 - Algorithm: Quad SHA-256, Pyrite (kH/s, More Is Better)
  Run 1: 47653 (SE +/- 749.50, N = 3)
  Run 3: 48341 (SE +/- 1271.64, N = 15)
  Run 2: 48622 (SE +/- 748.75, N = 15)
Cpuminer-Opt 3.15.5 - Algorithm: Triple SHA-256, Onecoin (kH/s, More Is Better)
  Run 2: 60350 (SE +/- 1076.82, N = 15)
  Run 1: 61247 (SE +/- 902.62, N = 15)
  Run 3: 62923 (SE +/- 80.07, N = 3)
1. (CXX) g++ options: -O2 -lcurl -lz -lpthread -lssl -lcrypto -lgmp (all Cpuminer-Opt results)
Cryptsetup This is a benchmark of cryptsetup, the reference utility for dm-crypt/LUKS disk encryption on Linux, using its integrated benchmark of PBKDF2 key-derivation and cipher throughput. Learn more via the OpenBenchmarking.org test page.
Cryptsetup - PBKDF2-whirlpool (Iterations Per Second, More Is Better)
  Run 1: 530142 (SE +/- 2447.10, N = 3)
  Run 3: 532634 (SE +/- 720.67, N = 3)
  Run 2: 532814 (SE +/- 541.00, N = 3)
Cryptsetup - AES-XTS 256b Encryption (MiB/s, More Is Better)
  Run 1: 1702.3 (SE +/- 6.83, N = 3)
  Run 2: 1708.5 (SE +/- 3.81, N = 3)
  Run 3: 1714.4 (SE +/- 2.43, N = 3)
Cryptsetup - AES-XTS 256b Decryption (MiB/s, More Is Better)
  Run 1: 1697.2 (SE +/- 8.30, N = 3)
  Run 2: 1702.6 (SE +/- 1.84, N = 3)
  Run 3: 1710.6 (SE +/- 1.95, N = 3)
Cryptsetup - Serpent-XTS 256b Encryption (MiB/s, More Is Better)
  Run 1: 549.0 (SE +/- 2.47, N = 3)
  Run 2: 550.3 (SE +/- 1.50, N = 3)
  Run 3: 551.4 (SE +/- 0.07, N = 3)
Cryptsetup - Serpent-XTS 256b Decryption (MiB/s, More Is Better)
  Run 2: 531.5 (SE +/- 1.18, N = 3)
  Run 1: 532.9 (SE +/- 0.78, N = 3)
  Run 3: 533.5 (SE +/- 0.81, N = 3)
Cryptsetup - Twofish-XTS 256b Encryption (MiB/s, More Is Better)
  Run 1: 344.4 (SE +/- 1.43, N = 3)
  Run 2: 345.4 (SE +/- 0.34, N = 3)
  Run 3: 346.2 (SE +/- 0.07, N = 3)
Cryptsetup - Twofish-XTS 256b Decryption (MiB/s, More Is Better)
  Run 2: 345.8 (SE +/- 0.91, N = 3)
  Run 1: 346.8 (SE +/- 0.24, N = 3)
  Run 3: 346.9 (SE +/- 0.15, N = 3)
Cryptsetup - AES-XTS 512b Encryption (MiB/s, More Is Better)
  Run 1: 1400.2 (SE +/- 1.19, N = 3)
  Run 2: 1401.7 (SE +/- 3.43, N = 3)
  Run 3: 1404.2 (SE +/- 0.72, N = 3)
Cryptsetup - AES-XTS 512b Decryption (MiB/s, More Is Better)
  Run 1: 1389.6 (SE +/- 1.87, N = 3)
  Run 2: 1393.0 (SE +/- 2.11, N = 3)
  Run 3: 1394.3 (SE +/- 2.16, N = 3)
Cryptsetup - Serpent-XTS 512b Encryption (MiB/s, More Is Better)
  Run 1: 550.7 (SE +/- 0.60, N = 2)
  Run 2: 551.2 (SE +/- 0.57, N = 3)
  Run 3: 551.7 (SE +/- 0.26, N = 3)
Cryptsetup - Twofish-XTS 512b Decryption (MiB/s, More Is Better)
  Run 3: 345.7 (SE +/- 1.02, N = 3)
  Run 1: 346.4 (SE +/- 0.25, N = 3)
  Run 2: 346.8 (SE +/- 0.20, N = 3)
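For context on the PBKDF2-whirlpool figure above: cryptsetup counts how many PBKDF2 key-derivation iterations per second the CPU sustains. A minimal sketch of the same kind of measurement using only Python's standard hashlib, which lacks Whirlpool, so SHA-256 stands in; the salt, passphrase, and iteration count are illustrative, not cryptsetup's actual parameters, and the absolute numbers are not comparable to the results above.

```python
# Measure sustained PBKDF2 iterations per second (SHA-256 stand-in
# for Whirlpool; parameters are illustrative only).
import hashlib
import time

def pbkdf2_iters_per_second(hash_name="sha256", probe_iters=100_000):
    start = time.perf_counter()
    # One pbkdf2_hmac call performs probe_iters HMAC iterations internally.
    hashlib.pbkdf2_hmac(hash_name, b"passphrase", b"salt" * 4, probe_iters)
    elapsed = time.perf_counter() - start
    return probe_iters / elapsed

rate = pbkdf2_iters_per_second()
print(f"~{rate:,.0f} PBKDF2-SHA256 iterations per second")
```

A higher iterations-per-second figure means cryptsetup can afford a larger iteration count for the same unlock delay, which is why "More Is Better" here.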
dav1d Dav1d is an open-source, speedy AV1 video decoder. This test profile times how long it takes to decode sample AV1 video content. Learn more via the OpenBenchmarking.org test page.
dav1d 0.8.1 - Video Input: Chimera 1080p (FPS, More Is Better)
  Run 3: 458.06 (SE +/- 2.60, N = 3; MIN: 341.78 / MAX: 587.39)
  Run 1: 459.11 (SE +/- 0.72, N = 3; MIN: 341.89 / MAX: 588.95)
  Run 2: 459.32 (SE +/- 0.88, N = 3; MIN: 342.72 / MAX: 583.61)
dav1d 0.8.1 - Video Input: Summer Nature 4K (FPS, More Is Better)
  Run 3: 140.47 (SE +/- 0.14, N = 3; MIN: 130.58 / MAX: 160.85)
  Run 2: 140.67 (SE +/- 0.53, N = 3; MIN: 125.57 / MAX: 161.64)
  Run 1: 142.04 (SE +/- 0.27, N = 3; MIN: 130.3 / MAX: 163.29)
dav1d 0.8.1 - Video Input: Summer Nature 1080p (FPS, More Is Better)
  Run 2: 380.86 (SE +/- 3.16, N = 3; MIN: 302.45 / MAX: 418.3)
  Run 1: 382.21 (SE +/- 3.01, N = 3; MIN: 305.4 / MAX: 420.06)
  Run 3: 385.99 (SE +/- 1.19, N = 3; MIN: 312.93 / MAX: 423.3)
dav1d 0.8.1 - Video Input: Chimera 1080p 10-bit (FPS, More Is Better)
  Run 3: 68.34 (SE +/- 0.08, N = 3; MIN: 44.14 / MAX: 168.93)
  Run 1: 68.44 (SE +/- 0.18, N = 3; MIN: 43.96 / MAX: 169.86)
  Run 2: 68.62 (SE +/- 0.17, N = 3; MIN: 44.12 / MAX: 171.68)
1. (CC) gcc options: -pthread (all dav1d results)
Etcpak Etcpak is the self-proclaimed "fastest ETC compressor on the planet," focused on providing open-source, very fast ETC and S3 texture compression support. Learn more via the OpenBenchmarking.org test page.
Etcpak 0.7 - Configuration: DXT1 (Mpx/s, More Is Better)
  Run 2: 1080.55 (SE +/- 0.37, N = 3)
  Run 1: 1083.86 (SE +/- 2.71, N = 3)
  Run 3: 1084.33 (SE +/- 2.11, N = 3)
Etcpak 0.7 - Configuration: ETC1 (Mpx/s, More Is Better)
  Run 1: 235.54 (SE +/- 0.79, N = 3)
  Run 2: 235.97 (SE +/- 0.55, N = 3)
  Run 3: 236.45 (SE +/- 0.05, N = 3)
Etcpak 0.7 - Configuration: ETC2 (Mpx/s, More Is Better)
  Run 3: 140.21 (SE +/- 0.53, N = 3)
  Run 2: 140.32 (SE +/- 0.38, N = 3)
  Run 1: 140.73 (SE +/- 0.03, N = 3)
Etcpak 0.7 - Configuration: ETC1 + Dithering (Mpx/s, More Is Better)
  Run 1: 226.75 (SE +/- 0.40, N = 3)
  Run 3: 227.06 (SE +/- 0.05, N = 3)
  Run 2: 227.14 (SE +/- 0.03, N = 3)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -lpthread (all Etcpak results)
FinanceBench FinanceBench is a collection of financial program benchmarks with support for benchmarking on the GPU via OpenCL and on the CPU via OpenMP. The FinanceBench test cases are focused on the Black-Scholes-Merton process with an analytic European option engine, the QMC (Sobol) Monte Carlo method (equity option example), Bonds (fixed-rate bond with flat forward curve), and Repo (securities repurchase agreement). FinanceBench was originally written by the Cavazos Lab at the University of Delaware. Learn more via the OpenBenchmarking.org test page.
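The analytic European option engine mentioned above evaluates the closed-form Black-Scholes-Merton price in bulk; a minimal sketch of that formula using only the standard library (math.erf gives the normal CDF). The inputs are illustrative textbook values, not FinanceBench's actual workload.

```python
# Closed-form Black-Scholes-Merton price for a European call,
# the kernel that FinanceBench's analytic engine evaluates in bulk.
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(spot, strike, rate, vol, t):
    """European call under Black-Scholes-Merton (no dividends)."""
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol * vol) * t) / (vol * math.sqrt(t))
    d2 = d1 - vol * math.sqrt(t)
    return spot * norm_cdf(d1) - strike * math.exp(-rate * t) * norm_cdf(d2)

# Classic textbook inputs: at-the-money, 5% rate, 20% vol, 1 year.
price = bs_call(spot=100.0, strike=100.0, rate=0.05, vol=0.2, t=1.0)
print(round(price, 4))  # → 10.4506
```

The benchmark's per-option work is tiny, which is why the OpenMP results below are dominated by how well millions of such independent evaluations parallelize across cores.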
FinanceBench 2016-07-25 - Benchmark: Repo OpenMP (ms, Fewer Is Better)
  Run 2: 69828.36 (SE +/- 1002.67, N = 3)
  Run 3: 69374.87 (SE +/- 922.31, N = 4)
  Run 1: 68017.65 (SE +/- 89.04, N = 3)
FinanceBench 2016-07-25 - Benchmark: Bonds OpenMP (ms, Fewer Is Better)
  Run 3: 116695.38 (SE +/- 1452.56, N = 12)
  Run 1: 115661.13 (SE +/- 1477.66, N = 5)
  Run 2: 114077.57 (SE +/- 20.05, N = 3)
1. (CXX) g++ options: -O3 -march=native -fopenmp (all FinanceBench results)
Gcrypt Library Libgcrypt is a general-purpose cryptographic library developed as part of the GnuPG project. This test runs libgcrypt's integrated benchmark and measures the time to run the benchmark command with a cipher/MAC/hash repetition count of 50, as a simple, high-level look at the overall crypto performance of the system under test. Learn more via the OpenBenchmarking.org test page.
Gcrypt Library 1.9 (Seconds, Fewer Is Better)
  Run 1: 273.55 (SE +/- 0.82, N = 3)
  Run 3: 273.01 (SE +/- 0.41, N = 3)
  Run 2: 272.41 (SE +/- 0.33, N = 3)
  1. (CC) gcc options: -O2 -fvisibility=hidden
Google SynthMark SynthMark is a cross-platform tool for benchmarking CPU performance under a variety of real-time audio workloads. It uses a polyphonic synthesizer model to provide standardized tests for latency, jitter, and computational throughput. Learn more via the OpenBenchmarking.org test page.
Google SynthMark 20201109 - Test: VoiceMark_100 (Voices, More Is Better)
  Run 3: 549.94 (SE +/- 0.80, N = 3)
  Run 2: 550.14 (SE +/- 0.57, N = 3)
  Run 1: 551.18 (SE +/- 0.84, N = 3)
  1. (CXX) g++ options: -lm -lpthread -std=c++11 -Ofast
Kripke Kripke is a simple, scalable, 3D Sn deterministic particle transport code. Its primary purpose is to research how data layout, programming paradigms, and architectures affect the implementation and performance of Sn transport. Kripke is developed by LLNL. Learn more via the OpenBenchmarking.org test page.
Kripke 1.2.4 (Throughput FoM, More Is Better)
  Run 1: 46104420 (SE +/- 200225.59, N = 3)
  Run 2: 46529853 (SE +/- 199447.85, N = 3)
  Run 3: 46840427 (SE +/- 186245.67, N = 3)
  1. (CXX) g++ options: -O3 -fopenmp
lzbench lzbench is an in-memory benchmark of various compressors. Learn more via the OpenBenchmarking.org test page.
lzbench 1.8 - Test: XZ 0 - Decompression (MB/s, More Is Better)
  Run 1: 96, Run 2: 96, Run 3: 96
lzbench 1.8 - Test: Zstd 1 - Compression (MB/s, More Is Better)
  Run 3: 400, Run 1: 401, Run 2: 401 (reported SE: +/- 1.00, N = 3; +/- 0.58, N = 3)
lzbench 1.8 - Test: Zstd 1 - Decompression (MB/s, More Is Better)
  Run 3: 1386 (SE +/- 2.73, N = 3)
  Run 2: 1389 (SE +/- 3.06, N = 3)
  Run 1: 1390 (SE +/- 2.08, N = 3)
lzbench 1.8 - Test: Zstd 8 - Compression (MB/s, More Is Better)
  Run 1: 65, Run 2: 65, Run 3: 65 (reported SE: +/- 0.58, N = 3)
lzbench 1.8 - Test: Zstd 8 - Decompression (MB/s, More Is Better)
  Run 3: 1396 (SE +/- 21.06, N = 3)
  Run 2: 1413 (SE +/- 7.88, N = 3)
  Run 1: 1423 (SE +/- 4.06, N = 3)
lzbench 1.8 - Test: Crush 0 - Compression (MB/s, More Is Better)
  Run 1: 76, Run 2: 76, Run 3: 76
lzbench 1.8 - Test: Crush 0 - Decompression (MB/s, More Is Better)
  Run 1: 431, Run 3: 431, Run 2: 432
lzbench 1.8 - Test: Brotli 0 - Compression (MB/s, More Is Better)
  Run 1: 356, Run 2: 356, Run 3: 357 (reported SE: +/- 0.33, N = 3; +/- 0.33, N = 3)
lzbench 1.8 - Test: Brotli 0 - Decompression (MB/s, More Is Better)
  Run 1: 503, Run 3: 505, Run 2: 506 (reported SE: +/- 3.00, N = 3; +/- 0.67, N = 3)
lzbench 1.8 - Test: Brotli 2 - Compression (MB/s, More Is Better)
  Run 1: 148, Run 2: 148, Run 3: 148
lzbench 1.8 - Test: Brotli 2 - Decompression (MB/s, More Is Better)
  Run 1: 581 (SE +/- 5.17, N = 3)
  Run 3: 582 (SE +/- 2.73, N = 3)
  Run 2: 585 (SE +/- 0.33, N = 3)
lzbench 1.8 - Test: Libdeflate 1 - Compression (MB/s, More Is Better)
  Run 1: 182, Run 2: 182, Run 3: 182
lzbench 1.8 - Test: Libdeflate 1 - Decompression (MB/s, More Is Better)
  Run 2: 994, Run 1: 995, Run 3: 995 (reported SE: +/- 0.33, N = 3)
1. (CXX) g++ options: -pthread -fomit-frame-pointer -fstrict-aliasing -ffast-math -O3 (all lzbench results)
Mobile Neural Network MNN (Mobile Neural Network) is a highly efficient, lightweight deep learning framework developed by Alibaba. Learn more via the OpenBenchmarking.org test page.
Mobile Neural Network 1.1.1 - Model: SqueezeNetV1.0 (ms, Fewer Is Better)
  Run 1: 8.447 (SE +/- 0.011, N = 3; MIN: 8.35 / MAX: 9.56)
  Run 2: 8.439 (SE +/- 0.029, N = 3; MIN: 8.3 / MAX: 52.09)
  Run 3: 8.424 (SE +/- 0.021, N = 3; MIN: 8.33 / MAX: 9.34)
Mobile Neural Network 1.1.1 - Model: resnet-v2-50 (ms, Fewer Is Better)
  Run 1: 54.51 (SE +/- 0.07, N = 3; MIN: 53.53 / MAX: 130.53)
  Run 3: 54.12 (SE +/- 0.10, N = 3; MIN: 53.84 / MAX: 125.64)
  Run 2: 53.97 (SE +/- 0.06, N = 3; MIN: 53.45 / MAX: 128.98)
Mobile Neural Network 1.1.1 - Model: MobileNetV2_224 (ms, Fewer Is Better)
  Run 2: 5.146 (SE +/- 0.008, N = 3; MIN: 5.03 / MAX: 6.03)
  Run 3: 5.100 (SE +/- 0.037, N = 3; MIN: 4.42 / MAX: 11.39)
  Run 1: 5.061 (SE +/- 0.016, N = 3; MIN: 4.98 / MAX: 5.99)
Mobile Neural Network 1.1.1 - Model: mobilenet-v1-1.0 (ms, Fewer Is Better)
  Run 1: 6.196 (SE +/- 0.006, N = 3; MIN: 4.66 / MAX: 19.35)
  Run 3: 6.151 (SE +/- 0.018, N = 3; MIN: 6.05 / MAX: 6.9)
  Run 2: 6.112 (SE +/- 0.007, N = 3; MIN: 6.03 / MAX: 12.61)
Mobile Neural Network 1.1.1 - Model: inception-v3 (ms, Fewer Is Better)
  Run 3: 55.96 (SE +/- 0.26, N = 3; MIN: 55.41 / MAX: 122.63)
  Run 2: 55.92 (SE +/- 0.21, N = 3; MIN: 55.44 / MAX: 148.38)
  Run 1: 55.87 (SE +/- 0.19, N = 3; MIN: 55.38 / MAX: 87.21)
1. (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -rdynamic -pthread -ldl (all MNN results)
NAS Parallel Benchmarks NPB, the NAS Parallel Benchmarks, is a benchmark suite developed by NASA for high-end computer systems. This test profile currently uses the MPI version of NPB and offers a selection of the different NPB tests/problems and varying problem sizes. Learn more via the OpenBenchmarking.org test page.
NAS Parallel Benchmarks 3.4 - Test / Class: EP.C (Total Mop/s, More Is Better)
  Run 2: 1002.01 (SE +/- 11.00, N = 15)
  Run 3: 1047.67 (SE +/- 4.68, N = 3)
  Run 1: 1050.17 (SE +/- 16.48, N = 3)
NAS Parallel Benchmarks 3.4 - Test / Class: EP.D (Total Mop/s, More Is Better)
  Run 2: 1008.91 (SE +/- 9.01, N = 12)
  Run 3: 1019.37 (SE +/- 17.23, N = 3)
  Run 1: 1027.88 (SE +/- 14.12, N = 4)
1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi 2. Open MPI 4.0.3 (all NPB results)
NCNN NCNN is a high-performance neural network inference framework developed by Tencent and optimized for mobile and other platforms. Learn more via the OpenBenchmarking.org test page.
NCNN 20201218 - Target: CPU - Model: mobilenet (ms, Fewer Is Better)
  Run 2: 20.21 (SE +/- 0.14, N = 3; MIN: 19.81 / MAX: 21.32)
  Run 3: 20.06 (SE +/- 0.01, N = 3; MIN: 19.95 / MAX: 21.79)
  Run 1: 20.03 (SE +/- 0.02, N = 3; MIN: 19.87 / MAX: 21.34)
NCNN 20201218 - Target: CPU-v2-v2 - Model: mobilenet-v2 (ms, Fewer Is Better)
  Run 3: 6.31 (SE +/- 0.10, N = 3; MIN: 6.11 / MAX: 64.27)
  Run 2: 6.22 (SE +/- 0.02, N = 3; MIN: 6.12 / MAX: 6.83)
  Run 1: 6.22 (SE +/- 0.01, N = 3; MIN: 6.14 / MAX: 6.38)
NCNN 20201218 - Target: CPU-v3-v3 - Model: mobilenet-v3 (ms, Fewer Is Better)
  Run 3: 5.43 (SE +/- 0.04, N = 3; MIN: 5.28 / MAX: 6.12)
  Run 2: 5.41 (SE +/- 0.02, N = 3; MIN: 5.29 / MAX: 5.54)
  Run 1: 5.41 (SE +/- 0.05, N = 3; MIN: 5.29 / MAX: 5.77)
NCNN 20201218 - Target: CPU - Model: shufflenet-v2 (ms, Fewer Is Better)
  Run 2: 7.07 (SE +/- 0.05, N = 3; MIN: 6.95 / MAX: 7.92)
  Run 1: 7.05 (SE +/- 0.01, N = 3; MIN: 6.99 / MAX: 7.63)
  Run 3: 7.03 (SE +/- 0.05, N = 3; MIN: 6.94 / MAX: 7.65)
NCNN 20201218 - Target: CPU - Model: mnasnet (ms, Fewer Is Better)
  Run 3: 5.56 (SE +/- 0.07, N = 3; MIN: 5.41 / MAX: 5.79)
  Run 2: 5.55 (SE +/- 0.01, N = 3; MIN: 5.4 / MAX: 5.65)
  Run 1: 5.52 (SE +/- 0.01, N = 3; MIN: 5.41 / MAX: 5.78)
NCNN 20201218 - Target: CPU - Model: efficientnet-b0 (ms, Fewer Is Better)
  Run 3: 8.33 (SE +/- 0.01, N = 3; MIN: 8.16 / MAX: 9.17)
  Run 2: 8.30 (SE +/- 0.02, N = 3; MIN: 8.14 / MAX: 8.79)
  Run 1: 8.28 (SE +/- 0.02, N = 3; MIN: 8.18 / MAX: 8.87)
NCNN 20201218 - Target: CPU - Model: blazeface (ms, Fewer Is Better)
  Run 2: 2.79 (SE +/- 0.04, N = 3; MIN: 2.69 / MAX: 3.4)
  Run 1: 2.76 (SE +/- 0.01, N = 3; MIN: 2.7 / MAX: 3.13)
  Run 3: 2.75 (SE +/- 0.02, N = 3; MIN: 2.69 / MAX: 3.11)
NCNN 20201218 - Target: CPU - Model: googlenet (ms, Fewer Is Better)
  Run 3: 16.95 (SE +/- 0.31, N = 3; MIN: 16.16 / MAX: 17.97)
  Run 2: 16.84 (SE +/- 0.20, N = 3; MIN: 16.12 / MAX: 18.21)
  Run 1: 16.60 (SE +/- 0.14, N = 3; MIN: 16.2 / MAX: 74.59)
NCNN 20201218 - Target: CPU - Model: vgg16 (ms, Fewer Is Better)
  Run 2: 51.97 (SE +/- 0.08, N = 3; MIN: 51.54 / MAX: 54.71)
  Run 3: 51.85 (SE +/- 0.06, N = 3; MIN: 51.49 / MAX: 53.93)
  Run 1: 51.78 (SE +/- 0.13, N = 3; MIN: 51.37 / MAX: 99.9)
NCNN 20201218 - Target: CPU - Model: resnet18 (ms, Fewer Is Better)
  Run 2: 15.54 (SE +/- 0.15, N = 3; MIN: 15 / MAX: 110.46)
  Run 1: 15.20 (SE +/- 0.04, N = 3; MIN: 15.01 / MAX: 15.36)
  Run 3: 15.19 (SE +/- 0.06, N = 3; MIN: 14.96 / MAX: 16.48)
NCNN 20201218 - Target: CPU - Model: alexnet (ms, Fewer Is Better)
  Run 2: 12.12 (SE +/- 0.04, N = 3; MIN: 12.02 / MAX: 12.63)
  Run 3: 12.11 (SE +/- 0.01, N = 3; MIN: 12.03 / MAX: 12.39)
  Run 1: 12.08 (SE +/- 0.02, N = 3; MIN: 11.98 / MAX: 12.56)
NCNN 20201218 - Target: CPU - Model: resnet50 (ms, Fewer Is Better)
  Run 3: 29.11 (SE +/- 0.09, N = 3; MIN: 28.24 / MAX: 30.15)
  Run 1: 29.06 (SE +/- 0.25, N = 3; MIN: 28.13 / MAX: 134.5)
  Run 2: 28.86 (SE +/- 0.10, N = 3; MIN: 28.14 / MAX: 29.56)
NCNN 20201218 - Target: CPU - Model: yolov4-tiny (ms, Fewer Is Better)
  Run 3: 32.73 (SE +/- 0.34, N = 3; MIN: 31.73 / MAX: 35.09)
  Run 2: 31.96 (SE +/- 0.60, N = 3; MIN: 30.06 / MAX: 35.42)
  Run 1: 31.43 (SE +/- 0.35, N = 3; MIN: 30.14 / MAX: 34.58)
NCNN 20201218 - Target: CPU - Model: squeezenet_ssd (ms, Fewer Is Better)
  Run 3: 26.59 (SE +/- 0.04, N = 3; MIN: 26.05 / MAX: 27.21)
  Run 2: 26.57 (SE +/- 0.02, N = 3; MIN: 26.04 / MAX: 27.24)
  Run 1: 26.55 (SE +/- 0.04, N = 3; MIN: 26.01 / MAX: 27.38)
NCNN 20201218 - Target: CPU - Model: regnety_400m (ms, Fewer Is Better)
  Run 1: 24.08 (SE +/- 0.03, N = 3; MIN: 23.89 / MAX: 25.07)
  Run 2: 23.93 (SE +/- 0.13, N = 3; MIN: 23.59 / MAX: 24.31)
  Run 3: 23.75 (SE +/- 0.06, N = 3; MIN: 23.54 / MAX: 24.43)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread (all NCNN results)
oneDNN This is a test of Intel oneDNN, an Intel-optimized library for deep neural networks, making use of its built-in benchdnn functionality. The result is the total perf time reported. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and MKL-DNN before being rebranded as part of the Intel oneAPI initiative. Learn more via the OpenBenchmarking.org test page.
oneDNN 2.0 - Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  Run 3: 4.07980 (SE +/- 0.02757, N = 3; MIN: 3.99)
  Run 1: 4.07476 (SE +/- 0.00717, N = 3; MIN: 4)
  Run 2: 4.07451 (SE +/- 0.01870, N = 3; MIN: 3.98)
oneDNN 2.0 - Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  Run 3: 7.43079 (SE +/- 0.03820, N = 3; MIN: 7.35)
  Run 1: 7.41691 (SE +/- 0.02331, N = 3; MIN: 7.36)
  Run 2: 7.39027 (SE +/- 0.00084, N = 3; MIN: 7.35)
oneDNN 2.0 - Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  Run 2: 2.83861 (SE +/- 0.00201, N = 3; MIN: 2.8)
  Run 3: 2.83373 (SE +/- 0.00573, N = 3; MIN: 2.79)
  Run 1: 2.83074 (SE +/- 0.00327, N = 3; MIN: 2.8)
oneDNN 2.0 - Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  Run 1: 2.48994 (SE +/- 0.00663, N = 3; MIN: 2.45)
  Run 3: 2.48125 (SE +/- 0.00042, N = 3; MIN: 2.45)
  Run 2: 2.47930 (SE +/- 0.00118, N = 3; MIN: 2.45)
oneDNN 2.0 - Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  Run 2: 13.80 (SE +/- 0.00, N = 3; MIN: 13.72)
  Run 1: 13.79 (SE +/- 0.01, N = 3; MIN: 13.69)
  Run 3: 13.78 (SE +/- 0.00, N = 3; MIN: 13.71)
oneDNN 2.0 - Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  Run 2: 5.77823 (SE +/- 0.00194, N = 3; MIN: 5.74)
  Run 3: 5.76890 (SE +/- 0.00979, N = 3; MIN: 5.71)
  Run 1: 5.76536 (SE +/- 0.01646, N = 3; MIN: 5.7)
oneDNN 2.0 - Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  Run 2: 8.40124 (SE +/- 0.00711, N = 3; MIN: 8.35)
  Run 1: 8.39325 (SE +/- 0.00368, N = 3; MIN: 8.35)
  Run 3: 8.38226 (SE +/- 0.00234, N = 3; MIN: 8.31)
oneDNN 2.0 - Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  Run 3: 12.72 (SE +/- 0.01, N = 3; MIN: 12.64)
  Run 1: 12.71 (SE +/- 0.00, N = 3; MIN: 12.64)
  Run 2: 12.69 (SE +/- 0.00, N = 3; MIN: 12.63)
oneDNN 2.0 - Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  Run 3: 6.68411 (SE +/- 0.09722, N = 3; MIN: 6.42)
  Run 1: 6.66011 (SE +/- 0.08426, N = 4; MIN: 6.33)
  Run 2: 6.51937 (SE +/- 0.07423, N = 6; MIN: 6.3)
oneDNN 2.0 - Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  Run 2: 5.07714 (SE +/- 0.00166, N = 3; MIN: 5.06)
  Run 1: 5.07564 (SE +/- 0.00123, N = 3; MIN: 5.05)
  Run 3: 5.07005 (SE +/- 0.00135, N = 3; MIN: 5.04)
oneDNN 2.0 - Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  Run 3: 4051.42 (SE +/- 5.47, N = 3; MIN: 4040.91)
  Run 2: 4050.54 (SE +/- 6.88, N = 3; MIN: 4036.39)
  Run 1: 4048.95 (SE +/- 5.71, N = 3; MIN: 4036)
oneDNN 2.0 - Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  Run 3: 2221.93 (SE +/- 8.06, N = 3; MIN: 2205.08)
  Run 2: 2214.44 (SE +/- 4.53, N = 3; MIN: 2206.84)
  Run 1: 2209.39 (SE +/- 1.96, N = 3; MIN: 2203.59)
oneDNN 2.0 - Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  Run 1: 4045.79 (SE +/- 2.55, N = 3; MIN: 4037.43)
  Run 2: 4044.15 (SE +/- 1.68, N = 3; MIN: 4035.82)
  Run 3: 4040.61 (SE +/- 3.06, N = 3; MIN: 4031.1)
oneDNN 2.0 - Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  Run 3: 2221.13 (SE +/- 9.79, N = 3; MIN: 2207.17)
  Run 2: 2211.70 (SE +/- 1.77, N = 3; MIN: 2206.98)
  Run 1: 2208.25 (SE +/- 0.62, N = 3; MIN: 2204.85)
oneDNN 2.0 - Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  Run 3: 3.36883 (SE +/- 0.01022, N = 3; MIN: 3.27)
  Run 1: 3.36850 (SE +/- 0.01155, N = 3; MIN: 3.3)
  Run 2: 3.35947 (SE +/- 0.00051, N = 3; MIN: 3.3)
oneDNN 2.0 - Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
  Run 1: 4051.99 (SE +/- 1.90, N = 3; MIN: 4039.27)
  Run 3: 4048.11 (SE +/- 6.05, N = 3; MIN: 4033)
  Run 2: 4040.59 (SE +/- 1.11, N = 3; MIN: 4035.29)
1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread (all oneDNN results)
OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.0 Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU 1 2 3 500 1000 1500 2000 2500 SE +/- 5.90, N = 3 SE +/- 0.63, N = 3 SE +/- 2.02, N = 3 2211.56 2211.32 2210.61 MIN: 2201.95 MIN: 2206.12 MIN: 2204.44 1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.0 Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32 - Engine: CPU 1 3 2 0.6759 1.3518 2.0277 2.7036 3.3795 SE +/- 0.00849, N = 3 SE +/- 0.00263, N = 3 SE +/- 0.01863, N = 3 3.00389 2.99407 2.99080 MIN: 2.94 MIN: 2.94 MIN: 2.83 1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
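Each result above lists an average, a standard error (SE +/- ...), and the number of runs (N). As a quick illustration of how such an SE is derived from raw timings, here is a minimal Python sketch; the sample values are hypothetical, patterned on the first oneDNN entry:

```python
import math

def standard_error(samples):
    """Standard error of the mean: sample std. deviation / sqrt(N)."""
    n = len(samples)
    mean = sum(samples) / n
    # Sample variance with Bessel's correction (divide by N - 1)
    var = sum((x - mean) ** 2 for x in samples) / (n - 1)
    return math.sqrt(var) / math.sqrt(n)

# Three hypothetical per-run timings (ms), like a single oneDNN entry above
runs = [13.80, 13.79, 13.78]
se = standard_error(runs)  # about 0.006 ms, which rounds to the "0.00" shown
```

With run-to-run deviations of only 0.01 ms, the SE rounds to 0.00 at two decimal places, which is why several entries above show SE +/- 0.00.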
ONNX Runtime ONNX Runtime is developed by Microsoft and partners as an open-source, cross-platform, high-performance machine learning inferencing and training accelerator. This test profile runs the ONNX Runtime with various models available from the ONNX Model Zoo. Learn more via the OpenBenchmarking.org test page.
ONNX Runtime 1.6 (OpenBenchmarking.org; Inferences Per Minute, more is better)

Model: yolov4 - Device: OpenMP CPU
  Run 3: 327 (SE +/- 0.44, N = 3)
  Run 1: 328 (SE +/- 1.01, N = 3)
  Run 2: 328 (SE +/- 0.60, N = 3)

Model: bertsquad-10 - Device: OpenMP CPU
  Run 2: 541 (SE +/- 1.36, N = 3)
  Run 3: 542 (SE +/- 0.83, N = 3)
  Run 1: 545 (SE +/- 0.87, N = 3)

Model: fcn-resnet101-11 - Device: OpenMP CPU
  Run 1: 59 (SE +/- 0.00, N = 3)
  Run 2: 59 (SE +/- 0.17, N = 3)
  Run 3: 59

Model: shufflenet-v2-10 - Device: OpenMP CPU
  Run 3: 9729 (SE +/- 28.57, N = 3)
  Run 1: 9736 (SE +/- 9.85, N = 3)
  Run 2: 9775 (SE +/- 16.03, N = 3)

Model: super-resolution-10 - Device: OpenMP CPU
  Run 3: 4005 (SE +/- 8.12, N = 3)
  Run 1: 4007 (SE +/- 9.80, N = 3)
  Run 2: 4008 (SE +/- 5.93, N = 3)

Compiler flags (all ONNX Runtime tests): (CXX) g++ options: -fopenmp -ffunction-sections -fdata-sections -O3 -ldl -lrt
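Since ONNX Runtime scores are reported in inferences per minute, the implied average per-inference latency is simply 60,000 ms divided by the score. A minimal sketch (the 327 figure echoes the yolov4 result above; the function name is just for illustration):

```python
def ipm_to_ms(inferences_per_minute):
    """Average latency in ms implied by an inferences-per-minute score."""
    return 60_000.0 / inferences_per_minute

# yolov4's ~327 inferences/min corresponds to roughly 183 ms per inference,
# while shufflenet-v2's ~9700/min is closer to 6 ms per inference.
latency_ms = ipm_to_ms(327)
```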
OpenFOAM OpenFOAM is the leading free, open source software for computational fluid dynamics (CFD). Learn more via the OpenBenchmarking.org test page.
OpenFOAM 8 - Input: Motorbike 30M (OpenBenchmarking.org; Seconds, fewer is better)
  Run 3: 219.57 (SE +/- 1.02, N = 3)
  Run 2: 219.10 (SE +/- 0.61, N = 3)
  Run 1: 219.05 (SE +/- 0.54, N = 3)
Compiler flags: (CXX) g++ options: -std=c++11 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -ldynamicMesh -lgenericPatchFields -lOpenFOAM -ldl -lm
Opus Codec Encoding Opus is an open, lossy audio codec designed primarily for interactive real-time applications over the Internet. This test uses Opus-Tools and measures the time required to encode a WAV file to Opus. Learn more via the OpenBenchmarking.org test page.
Opus Codec Encoding 1.3.1 - WAV To Opus Encode (OpenBenchmarking.org; Seconds, fewer is better)
  Run 3: 10.80 (SE +/- 0.03, N = 5)
  Run 1: 10.69 (SE +/- 0.03, N = 5)
  Run 2: 10.62 (SE +/- 0.04, N = 5)
Compiler flags: (CXX) g++ options: -fvisibility=hidden -logg -lm
perf-bench This test profile is used for running Linux perf-bench, the benchmark support within the Linux kernel's perf tool. Learn more via the OpenBenchmarking.org test page.
perf-bench - Benchmark: Sched Pipe (OpenBenchmarking.org; ops/sec, more is better)
  Run 2: 73695 (SE +/- 929.56, N = 5)
  Run 1: 73755 (SE +/- 327.32, N = 3)
  Run 3: 74430 (SE +/- 629.67, N = 3)
Compiler flags: (CC) gcc options: -O6 -ggdb3 -funwind-tables -std=gnu99 -Xlinker -lpthread -lrt -lm -ldl -lelf -lcrypto -lz -llzma -lnuma
QMCPACK QMCPACK is a modern, high-performance, open-source, production-level many-body ab initio Quantum Monte Carlo (QMC) code for computing the electronic structure of atoms, molecules, and solids, supported by the U.S. Department of Energy. This benchmark makes use of MPI and runs the H2O example code. Learn more via the OpenBenchmarking.org test page.
QMCPACK 3.10 - Input: simple-H2O (OpenBenchmarking.org; Total Execution Time in Seconds, fewer is better)
  Run 3: 51.91 (SE +/- 0.74, N = 3)
  Run 2: 51.90 (SE +/- 0.70, N = 5)
  Run 1: 51.15 (SE +/- 0.64, N = 5)
Compiler flags: (CXX) g++ options: -fopenmp -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -march=native -O3 -fomit-frame-pointer -ffast-math -pthread -lm
QuantLib QuantLib is an open-source library/framework for quantitative finance, covering modeling, trading, and risk management scenarios. QuantLib is written in C++ with Boost, and its built-in benchmark reports the QuantLib Benchmark Index score. Learn more via the OpenBenchmarking.org test page.
QuantLib 1.21 (OpenBenchmarking.org; MFLOPS, more is better)
  Run 1: 1689.6 (SE +/- 3.77, N = 3)
  Run 2: 1691.5 (SE +/- 2.43, N = 3)
  Run 3: 1691.6 (SE +/- 4.94, N = 3)
Compiler flags: (CXX) g++ options: -O3 -march=native -rdynamic

rav1e 0.4 (OpenBenchmarking.org; Frames Per Second, more is better)

Speed: 5
  Run 3: 0.808 (SE +/- 0.003, N = 3)
  Run 1: 0.810 (SE +/- 0.001, N = 3)
  Run 2: 0.810 (SE +/- 0.005, N = 3)

Speed: 6
  Run 1: 1.065 (SE +/- 0.008, N = 3)
  Run 3: 1.068 (SE +/- 0.002, N = 3)
  Run 2: 1.070 (SE +/- 0.003, N = 3)

Speed: 10
  Run 2: 2.361 (SE +/- 0.026, N = 3)
  Run 1: 2.375 (SE +/- 0.011, N = 3)
  Run 3: 2.375 (SE +/- 0.013, N = 3)
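A frames-per-second score converts to an estimated wall time for a clip of known length by simple division. A minimal sketch (the 600-frame clip is hypothetical; 2.37 FPS approximates the Speed 10 results above):

```python
def encode_seconds(frames, fps):
    """Estimated wall time (seconds) to encode a clip at a measured average FPS."""
    return frames / fps

# A hypothetical 600-frame clip at ~2.37 FPS (the Speed 10 average above)
# would take a bit over four minutes to encode.
t = encode_seconds(600, 2.37)
```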
Redis Redis is an open-source in-memory data structure store, used as a database, cache, and message broker. Learn more via the OpenBenchmarking.org test page.
Redis 6.0.9 (OpenBenchmarking.org; Requests Per Second, more is better)

Test: LPOP
  Run 2: 1271416.38 (SE +/- 4394.55, N = 3)
  Run 3: 1758555.23 (SE +/- 91181.44, N = 12)
  Run 1: 2010121.55 (SE +/- 6269.54, N = 3)

Test: SADD
  Run 2: 1588662.75 (SE +/- 10942.24, N = 3)
  Run 1: 1602736.88 (SE +/- 21115.79, N = 4)
  Run 3: 1605411.58 (SE +/- 7906.68, N = 3)

Test: LPUSH
  Run 2: 1242294.79 (SE +/- 9099.06, N = 3)
  Run 1: 1247740.00 (SE +/- 3115.55, N = 3)
  Run 3: 1249824.00 (SE +/- 2768.29, N = 3)

Test: GET
  Run 3: 1765664.37 (SE +/- 20496.02, N = 3)
  Run 2: 1775802.50 (SE +/- 25400.58, N = 3)
  Run 1: 1887226.87 (SE +/- 6745.04, N = 3)

Test: SET
  Run 2: 1408307.62 (SE +/- 19369.44, N = 15)
  Run 1: 1415197.08 (SE +/- 14514.92, N = 8)
  Run 3: 1436898.96 (SE +/- 9026.88, N = 3)

Compiler flags (all Redis tests): (CXX) g++ options: -MM -MT -g3 -fvisibility=hidden -O3
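The LPOP numbers vary noticeably between the three runs. A simple way to quantify run-to-run variation is the spread between the slowest and fastest result as a percentage of the slowest; a minimal sketch using the LPOP figures above (the function name is just for illustration):

```python
def percent_spread(values):
    """Spread between fastest and slowest run, as a percent of the slowest."""
    lo, hi = min(values), max(values)
    return (hi - lo) / lo * 100.0

# LPOP requests/sec across the three runs above: roughly 58% variation,
# versus about 1% for SADD -- LPOP is clearly the noisiest of these tests.
lpop = [1271416.38, 1758555.23, 2010121.55]
spread = percent_spread(lpop)
```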
TNN TNN is an open-source deep learning inference framework developed by Tencent. Learn more via the OpenBenchmarking.org test page.
TNN 0.2.3 (OpenBenchmarking.org; ms, fewer is better)

Target: CPU - Model: MobileNet v2
  Run 1: 339.21 (SE +/- 1.18, N = 3, MIN: 336.01 / MAX: 348.17)
  Run 3: 338.74 (SE +/- 0.90, N = 3, MIN: 336.55 / MAX: 341.55)
  Run 2: 337.51 (SE +/- 0.24, N = 3, MIN: 336.44 / MAX: 346.96)

Target: CPU - Model: SqueezeNet v1.1
  Run 1: 329.72 (SE +/- 0.32, N = 3, MIN: 329.09 / MAX: 331.08)
  Run 2: 328.63 (SE +/- 0.16, N = 3, MIN: 328.14 / MAX: 330.14)
  Run 3: 327.83 (SE +/- 0.61, N = 3, MIN: 326.73 / MAX: 330.31)

Compiler flags (both TNN tests): (CXX) g++ options: -fopenmp -pthread -fvisibility=hidden -O3 -rdynamic -ldl
WebP2 Image Encode This is a test of Google's libwebp2 library with the WebP2 image encode utility, using a sample 6000x4000 pixel JPEG image as input, similar to the WebP/libwebp test profile. WebP2 is currently experimental and under heavy development as the eventual successor to WebP. Compared to WebP, WebP2 supports 10-bit HDR, more efficient lossy compression, improved lossless compression, animation support, and full multi-threading. Learn more via the OpenBenchmarking.org test page.
WebP2 Image Encode 20210126 (OpenBenchmarking.org; Seconds, fewer is better)

Encode Settings: Default
  Run 3: 5.756 (SE +/- 0.043, N = 3)
  Run 1: 5.677 (SE +/- 0.037, N = 3)
  Run 2: 5.664 (SE +/- 0.018, N = 3)

Encode Settings: Quality 75, Compression Effort 7
  Run 1: 298.92 (SE +/- 0.36, N = 3)
  Run 3: 298.54 (SE +/- 0.52, N = 3)
  Run 2: 298.25 (SE +/- 0.19, N = 3)

Encode Settings: Quality 95, Compression Effort 7
  Run 1: 545.96 (SE +/- 1.36, N = 3)
  Run 2: 545.66 (SE +/- 0.47, N = 3)
  Run 3: 544.66 (SE +/- 0.96, N = 3)

Encode Settings: Quality 100, Compression Effort 5
  Run 1: 14.48 (SE +/- 0.02, N = 3)
  Run 2: 14.48 (SE +/- 0.02, N = 3)
  Run 3: 14.48 (SE +/- 0.03, N = 3)

Encode Settings: Quality 100, Lossless Compression
  Run 3: 940.23 (SE +/- 0.05, N = 3)
  Run 1: 939.76 (SE +/- 0.20, N = 3)
  Run 2: 938.92 (SE +/- 0.53, N = 3)

Compiler flags (all WebP2 tests): (CXX) g++ options: -msse4.2 -fno-rtti -O3 -rdynamic -lpthread -ljpeg
1 Kernel Notes: Transparent Huge Pages: madvise
  Compiler Notes: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-9-HskZEa/gcc-9-9.3.0/debian/tmp-nvptx/usr,hsa --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
  Processor Notes: Scaling Governor: intel_cpufreq ondemand - CPU Microcode: 0x44
  Python Notes: Python 3.8.5
  Security Notes: itlb_multihit: KVM: Mitigation of VMX disabled + l1tf: Mitigation of PTE Inversion; VMX: conditional cache flushes SMT vulnerable + mds: Mitigation of Clear buffers; SMT vulnerable + meltdown: Mitigation of PTI + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Full generic retpoline IBPB: conditional IBRS_FW STIBP: conditional RSB filling + srbds: Not affected + tsx_async_abort: Not affected
Testing initiated at 26 January 2021 16:22 by user phoronix.
2 Kernel, Compiler, Processor, Python, and Security Notes: identical to configuration 1.
Testing initiated at 27 January 2021 05:17 by user phoronix.
3 Hardware and software details, plus Kernel, Compiler, Processor, Python, and Security Notes: identical to configurations 1 and 2.
Testing initiated at 27 January 2021 15:34 by user phoronix.