AMD EPYC 7601 32-Core testing with a TYAN B8026T70AE24HR (V1.02.B10 BIOS) and llvmpipe on Ubuntu 20.04 via the Phoronix Test Suite.
Run 1 Compiler Notes: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-9-HskZEa/gcc-9-9.3.0/debian/tmp-nvptx/usr,hsa --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Notes: Scaling Governor: acpi-cpufreq ondemand (Boost: Enabled) - CPU Microcode: 0x8001250
Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Full AMD retpoline IBPB: conditional STIBP: disabled RSB filling + srbds: Not affected + tsx_async_abort: Not affected
Run 2 / Run 3 Processor: AMD EPYC 7601 32-Core @ 2.20GHz (32 Cores / 64 Threads), Motherboard: TYAN B8026T70AE24HR (V1.02.B10 BIOS), Chipset: AMD 17h, Memory: 126GB, Disk: 280GB INTEL SSDPE21D280GA, Graphics: llvmpipe, Monitor: VE228, Network: 2 x Broadcom NetXtreme BCM5720 2-port PCIe
OS: Ubuntu 20.04, Kernel: 5.4.0-53-generic (x86_64), Desktop: GNOME Shell 3.36.4, Display Server: X Server 1.20.8, Display Driver: modesetting 1.20.8, OpenGL: 3.3 Mesa 20.0.8 (LLVM 10.0.0 128 bits), Compiler: GCC 9.3.0, File-System: ext4, Screen Resolution: 1920x1080
AMD EPYC 7601 Xmas 2020 - Benchmarks and System Logs (OpenBenchmarking.org, Phoronix Test Suite)
Result Overview (Phoronix Test Suite; Runs 1-3 fall within roughly 100-102% of each other across the following tests): Node.js V8 Web Tooling Benchmark, oneDNN, CLOMP, Timed MAFFT Alignment, Timed HMMer Search, Monkey Audio Encoding, Build2, NCNN, SQLite Speedtest, Coremark, Opus Codec Encoding, Timed FFmpeg Compilation, WavPack Audio Encoding, Timed Eigen Compilation, simdjson
AMD EPYC 7601 Xmas 2020 - Detailed Results (Run 1 / Run 2 / Run 3):

ncnn: CPU - regnety_400m: 117.02 / 119.23 / 118.48
ncnn: CPU - squeezenet_ssd: 46.68 / 46.89 / 44.81
ncnn: CPU - yolov4-tiny: 57.99 / 55.80 / 56.52
ncnn: CPU - resnet50: 60.78 / 59.24 / 59.49
ncnn: CPU - alexnet: 33.20 / 30.19 / 31.92
ncnn: CPU - resnet18: 41.83 / 45.70 / 43.64
ncnn: CPU - vgg16: 100.72 / 94.33 / 88.55
ncnn: CPU - googlenet: 48.06 / 46.99 / 49.81
ncnn: CPU - blazeface: 7.79 / 7.89 / 7.90
ncnn: CPU - efficientnet-b0: 22.19 / 22.24 / 23.22
ncnn: CPU - mnasnet: 16.17 / 15.76 / 16.26
ncnn: CPU - shufflenet-v2: 17.51 / 17.35 / 16.94
ncnn: CPU-v3-v3 - mobilenet-v3: 16.30 / 17.48 / 16.91
ncnn: CPU-v2-v2 - mobilenet-v2: 17.42 / 19.42 / 18.22
ncnn: CPU - mobilenet: 43.10 / 41.84 / 43.26
onednn: Recurrent Neural Network Training - f32 - CPU: 10732.73 / 10747.5 / 10314.22
onednn: Recurrent Neural Network Training - u8s8f32 - CPU: 10583.10 / 10647.60 / 10812.54
onednn: Recurrent Neural Network Training - bf16bf16bf16 - CPU: 10689.16 / 10915.65 / 11077.9
onednn: Recurrent Neural Network Inference - u8s8f32 - CPU: 3300.07 / 3434.14 / 3327.91
onednn: Recurrent Neural Network Inference - f32 - CPU: 3293.49 / 3322.68 / 3393.82
onednn: Recurrent Neural Network Inference - bf16bf16bf16 - CPU: 3332.79 / 3312.30 / 3405.82
hmmer: Pfam Database Search: 200.295 / 199.708 / 200.753
node-web-tooling: 6.78 / 6.74 / 6.85
build-eigen: Time To Compile: 120.016 / 119.981 / 120.191
build2: Time To Compile: 102.295 / 102.588 / 102.354
sqlite-speedtest: Timed Time - Size 1,000: 90.116 / 90.320 / 90.107
onednn: Deconvolution Batch shapes_1d - u8s8f32 - CPU: 4.60248 / 4.71473 / 4.22841
onednn: IP Shapes 1D - f32 - CPU: 5.34771 / 4.49148 / 4.33519
clomp: Static OMP Speedup: 57.1 / 57.7 / 57.8
simdjson: LargeRand: 0.28 / 0.28 / 0.28
simdjson: PartialTweets: 0.36 / 0.36 / 0.36
simdjson: DistinctUserID: 0.37 / 0.37 / 0.37
onednn: Matrix Multiply Batch Shapes Transformer - f32 - CPU: 1.71220 / 1.74056 / 1.66512
simdjson: Kostya: 0.33 / 0.33 / 0.33
onednn: IP Shapes 3D - f32 - CPU: 12.4177 / 11.7699 / 12.0856
build-ffmpeg: Time To Compile: 39.094 / 39.108 / 39.189
encode-ape: WAV To APE: 18.346 / 18.332 / 18.416
encode-wavpack: WAV To WavPack: 17.319 / 17.312 / 17.292
onednn: Matrix Multiply Batch Shapes Transformer - u8s8f32 - CPU: 1.78892 / 1.77953 / 1.79366
coremark: CoreMark Size 666 - Iterations Per Second: 879248.022638 / 879237.122078 / 876909.950001
onednn: Deconvolution Batch shapes_1d - f32 - CPU: 4.03281 / 4.00713 / 4.01827
encode-opus: WAV To Opus Encode: 10.187 / 10.215 / 10.195
onednn: IP Shapes 1D - u8s8f32 - CPU: 2.67937 / 2.68511 / 2.66509
mafft: Multiple Sequence Alignment - LSU RNA: 15.018 / 15.147 / 15.023
onednn: Convolution Batch Shapes Auto - u8s8f32 - CPU: 23.3030 / 22.4576 / 23.2120
onednn: Deconvolution Batch shapes_3d - f32 - CPU: 9.04439 / 9.03893 / 9.08767
onednn: IP Shapes 3D - u8s8f32 - CPU: 3.56511 / 3.55970 / 3.57153
onednn: Convolution Batch Shapes Auto - f32 - CPU: 18.5128 / 18.6800 / 18.6556
onednn: Deconvolution Batch shapes_3d - u8s8f32 - CPU: 4.41314 / 4.37097 / 4.40049
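The run-to-run consistency suggested by the overview can be recomputed from the results above. A minimal Python sketch using three results copied from this report (the `spread_pct` helper is illustrative, not part of the Phoronix Test Suite):

```python
def spread_pct(runs):
    """Run-to-run spread: (max - min) / min, as a percentage."""
    return (max(runs) - min(runs)) / min(runs) * 100.0

# Values copied from the result table above (Run 1 / Run 2 / Run 3).
results = {
    "ncnn: CPU - regnety_400m (ms)": [117.02, 119.23, 118.48],
    "hmmer: Pfam Database Search (s)": [200.295, 199.708, 200.753],
    "clomp: Static OMP Speedup": [57.1, 57.7, 57.8],
}

for name, runs in results.items():
    print(f"{name}: spread = {spread_pct(runs):.2f}%")
```

All three tests land under a 2% spread, consistent with the 100-102% range shown in the Result Overview.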
NCNN
NCNN is a high-performance neural network inference framework developed by Tencent and optimized for mobile and other platforms. Learn more via the OpenBenchmarking.org test page.
NCNN 20201218 (OpenBenchmarking.org; ms, fewer is better; all builds: (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread)

Target: CPU - Model: regnety_400m
  Run 1: 117.02 (SE +/- 1.56, N = 12, MIN: 109.2 / MAX: 1631.71)
  Run 2: 119.23 (SE +/- 2.16, N = 12, MIN: 109.72 / MAX: 3748.32)
  Run 3: 118.48 (SE +/- 1.61, N = 12, MIN: 109.05 / MAX: 2000.57)

Target: CPU - Model: squeezenet_ssd
  Run 1: 46.68 (SE +/- 1.46, N = 12, MIN: 37.34 / MAX: 524.93)
  Run 2: 46.89 (SE +/- 1.43, N = 12, MIN: 37.8 / MAX: 531.21)
  Run 3: 44.81 (SE +/- 1.26, N = 12, MIN: 37.3 / MAX: 531.47)

Target: CPU - Model: yolov4-tiny
  Run 1: 57.99 (SE +/- 1.01, N = 12, MIN: 46.53 / MAX: 296.18)
  Run 2: 55.80 (SE +/- 0.72, N = 12, MIN: 46.04 / MAX: 269.61)
  Run 3: 56.52 (SE +/- 0.57, N = 12, MIN: 44.78 / MAX: 275.31)

Target: CPU - Model: resnet50
  Run 1: 60.78 (SE +/- 2.69, N = 12, MIN: 38.67 / MAX: 633.7)
  Run 2: 59.24 (SE +/- 1.83, N = 12, MIN: 38.46 / MAX: 770.44)
  Run 3: 59.49 (SE +/- 3.15, N = 12, MIN: 38.42 / MAX: 662.24)

Target: CPU - Model: alexnet
  Run 1: 33.20 (SE +/- 1.65, N = 12, MIN: 18.26 / MAX: 171.51)
  Run 2: 30.19 (SE +/- 1.11, N = 12, MIN: 16.07 / MAX: 156.27)
  Run 3: 31.92 (SE +/- 1.62, N = 12, MIN: 16.26 / MAX: 163.34)

Target: CPU - Model: resnet18
  Run 1: 41.83 (SE +/- 1.98, N = 12, MIN: 23.36 / MAX: 249.26)
  Run 2: 45.70 (SE +/- 3.66, N = 12, MIN: 27.44 / MAX: 248.59)
  Run 3: 43.64 (SE +/- 2.71, N = 12, MIN: 27.63 / MAX: 246.76)

Target: CPU - Model: vgg16
  Run 1: 100.72 (SE +/- 5.01, N = 12, MIN: 45.51 / MAX: 338.45)
  Run 2: 94.33 (SE +/- 3.64, N = 12, MIN: 47.38 / MAX: 304.01)
  Run 3: 88.55 (SE +/- 3.70, N = 12, MIN: 43.77 / MAX: 279.55)

Target: CPU - Model: googlenet
  Run 1: 48.06 (SE +/- 2.27, N = 12, MIN: 32.75 / MAX: 604.8)
  Run 2: 46.99 (SE +/- 2.77, N = 12, MIN: 33.26 / MAX: 605.84)
  Run 3: 49.81 (SE +/- 2.45, N = 12, MIN: 32.24 / MAX: 613.18)

Target: CPU - Model: blazeface
  Run 1: 7.79 (SE +/- 0.06, N = 12, MIN: 7.48 / MAX: 49.5)
  Run 2: 7.89 (SE +/- 0.06, N = 12, MIN: 7.55 / MAX: 80.4)
  Run 3: 7.90 (SE +/- 0.09, N = 12, MIN: 7.46 / MAX: 59.21)

Target: CPU - Model: efficientnet-b0
  Run 1: 22.19 (SE +/- 0.37, N = 12, MIN: 20.26 / MAX: 496.03)
  Run 2: 22.24 (SE +/- 0.31, N = 12, MIN: 20.84 / MAX: 524.85)
  Run 3: 23.22 (SE +/- 0.56, N = 12, MIN: 20.7 / MAX: 525.92)

Target: CPU - Model: mnasnet
  Run 1: 16.17 (SE +/- 0.48, N = 12, MIN: 14.62 / MAX: 428.63)
  Run 2: 15.76 (SE +/- 0.24, N = 12, MIN: 14.78 / MAX: 415.5)
  Run 3: 16.26 (SE +/- 0.43, N = 12, MIN: 14.79 / MAX: 427.39)

Target: CPU - Model: shufflenet-v2
  Run 1: 17.51 (SE +/- 0.42, N = 12, MIN: 15.95 / MAX: 355.59)
  Run 2: 17.35 (SE +/- 0.51, N = 12, MIN: 16 / MAX: 357.5)
  Run 3: 16.94 (SE +/- 0.13, N = 12, MIN: 16.11 / MAX: 120.46)

Target: CPU-v3-v3 - Model: mobilenet-v3
  Run 1: 16.30 (SE +/- 0.56, N = 12, MIN: 14.83 / MAX: 444.14)
  Run 2: 17.48 (SE +/- 0.96, N = 12, MIN: 14.75 / MAX: 525.02)
  Run 3: 16.91 (SE +/- 0.59, N = 12, MIN: 14.94 / MAX: 447.34)

Target: CPU-v2-v2 - Model: mobilenet-v2
  Run 1: 17.42 (SE +/- 0.22, N = 12, MIN: 15.79 / MAX: 249.25)
  Run 2: 19.42 (SE +/- 1.19, N = 12, MIN: 15.84 / MAX: 439.61)
  Run 3: 18.22 (SE +/- 0.54, N = 12, MIN: 15.35 / MAX: 435.27)

Target: CPU - Model: mobilenet
  Run 1: 43.10 (SE +/- 1.57, N = 12, MIN: 35.09 / MAX: 501.41)
  Run 2: 41.84 (SE +/- 1.10, N = 12, MIN: 35.73 / MAX: 496.11)
  Run 3: 43.26 (SE +/- 1.07, N = 12, MIN: 34.82 / MAX: 511.89)
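Each result above is a mean with a standard error over N samples (e.g. "SE +/- 1.56, N = 12"). The standard error of the mean is the sample standard deviation divided by sqrt(N); a small sketch of the formula, using made-up per-iteration timings (only the calculation, not the data, reflects this report):

```python
import math

def standard_error(samples):
    """Standard error of the mean: Bessel-corrected std-dev / sqrt(n)."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / (n - 1)
    return math.sqrt(var) / math.sqrt(n)

# Hypothetical per-iteration timings (ms) standing in for one NCNN run;
# the graphs above summarize 12 such samples as "SE +/- x, N = 12".
samples = [117.5, 116.8, 118.2, 117.0, 119.1, 116.5,
           117.9, 118.4, 116.9, 117.3, 118.0, 117.6]
print(f"mean = {sum(samples) / len(samples):.2f} ms, "
      f"SE = {standard_error(samples):.2f}")
```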
oneDNN
This is a test of Intel oneDNN, an Intel-optimized library for deep neural networks, using its built-in benchdnn functionality. The result is the total perf time reported. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and, before that, MKL-DNN, prior to being rebranded as part of the Intel oneAPI initiative. Learn more via the OpenBenchmarking.org test page.
oneDNN 2.0 (OpenBenchmarking.org; ms, fewer is better; all builds: (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread)

Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU
  Run 1: 10732.73 (SE +/- 164.15, N = 12, MIN: 8370.79)
  Run 2: 10747.50 (SE +/- 128.18, N = 12, MIN: 9600.6)
  Run 3: 10314.22 (SE +/- 184.49, N = 13, MIN: 7551.61)

Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU
  Run 1: 10583.10 (SE +/- 206.22, N = 10, MIN: 9226.33)
  Run 2: 10647.60 (SE +/- 204.17, N = 12, MIN: 8942.86)
  Run 3: 10812.54 (SE +/- 330.55, N = 12, MIN: 8507.26)

Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU
  Run 1: 10689.16 (SE +/- 258.76, N = 9, MIN: 9144.24)
  Run 2: 10915.65 (SE +/- 182.83, N = 12, MIN: 8687.22)
  Run 3: 11077.90 (SE +/- 137.39, N = 4, MIN: 10188.8)

Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU
  Run 1: 3300.07 (SE +/- 44.01, N = 15, MIN: 2751.78)
  Run 2: 3434.14 (SE +/- 41.23, N = 6, MIN: 3003.06)
  Run 3: 3327.91 (SE +/- 46.22, N = 15, MIN: 2885.53)

Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU
  Run 1: 3293.49 (SE +/- 45.21, N = 15, MIN: 2548)
  Run 2: 3322.68 (SE +/- 38.80, N = 15, MIN: 2956.39)
  Run 3: 3393.82 (SE +/- 11.78, N = 3, MIN: 3348.85)

Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU
  Run 1: 3332.79 (SE +/- 31.60, N = 10, MIN: 2567.62)
  Run 2: 3312.30 (SE +/- 58.86, N = 15, MIN: 2572.81)
  Run 3: 3405.82 (SE +/- 51.82, N = 3, MIN: 3281.7)
Node.js V8 Web Tooling Benchmark
This runs the V8 project's Web-Tooling-Benchmark under Node.js. The Web-Tooling-Benchmark stresses JavaScript workloads common to web developers, such as Babel, TypeScript, and Babylon. This test profile measures the system's JavaScript performance with Node.js. Learn more via the OpenBenchmarking.org test page.
Node.js V8 Web Tooling Benchmark (OpenBenchmarking.org; runs/s, more is better; Node.js v10.19.0)
  Run 1: 6.78 (SE +/- 0.08, N = 3)
  Run 2: 6.74 (SE +/- 0.01, N = 3)
  Run 3: 6.85 (SE +/- 0.04, N = 3)
Build2
This test profile measures the time to bootstrap/install the build2 C++ build toolchain from source. Build2 is a cross-platform build toolchain for C/C++ code with Cargo-like features. Learn more via the OpenBenchmarking.org test page.
Build2 0.13 - Time To Compile (OpenBenchmarking.org; Seconds, fewer is better)
  Run 1: 102.30 (SE +/- 0.36, N = 3)
  Run 2: 102.59 (SE +/- 0.14, N = 3)
  Run 3: 102.35 (SE +/- 0.04, N = 3)
oneDNN
oneDNN 2.0 (ms, fewer is better; (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread)

Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU
  Run 1: 4.60248 (SE +/- 0.17305, N = 15, MIN: 3.92)
  Run 2: 4.71473 (SE +/- 0.12692, N = 15, MIN: 3.93)
  Run 3: 4.22841 (SE +/- 0.04344, N = 3, MIN: 3.96)

Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU
  Run 1: 5.34771 (SE +/- 0.52810, N = 15, MIN: 2.82)
  Run 2: 4.49148 (SE +/- 0.07284, N = 15, MIN: 2.85)
  Run 3: 4.33519 (SE +/- 0.13685, N = 12, MIN: 2.85)
CLOMP CLOMP is the C version of the Livermore OpenMP benchmark developed to measure OpenMP overheads and other performance impacts due to threading in order to influence future system designs. This particular test profile configuration is currently set to look at the OpenMP static schedule speed-up across all available CPU cores using the recommended test configuration. Learn more via the OpenBenchmarking.org test page.
CLOMP 1.2 - Static OMP Speedup (OpenBenchmarking.org; Speedup, more is better; (CC) gcc options: -fopenmp -O3 -lm)
  Run 1: 57.1 (SE +/- 0.32, N = 3)
  Run 2: 57.7 (SE +/- 0.70, N = 3)
  Run 3: 57.8 (SE +/- 0.43, N = 3)
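A static OMP speedup of ~57.8 on this 32-core / 64-thread EPYC translates directly into parallel efficiency. A minimal sketch; note that the choice of baseline (64 hardware threads vs. 32 physical cores) is a reporting convention, not something the benchmark itself states:

```python
def parallel_efficiency(speedup, n_units):
    """Parallel efficiency = achieved speedup / ideal linear speedup."""
    return speedup / n_units

# Run 3 result from above: 57.8x speedup on the EPYC 7601
# (32 cores / 64 threads, per the system table).
print(f"vs 64 threads: {parallel_efficiency(57.8, 64):.1%}")  # SMT threads
print(f"vs 32 cores:   {parallel_efficiency(57.8, 32):.1%}")  # physical cores
```

Against 64 threads the efficiency is about 90%; against physical cores it exceeds 100%, reflecting the extra throughput from SMT.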
simdjson
This is a benchmark of simdjson, a high-performance JSON parser. simdjson aims to be the fastest JSON parser and is used by projects like Microsoft FishStore, Yandex ClickHouse, Shopify, and others. Learn more via the OpenBenchmarking.org test page.
simdjson 0.7.1 (GB/s, more is better; (CXX) g++ options: -O3 -pthread)

Throughput Test: LargeRandom
  Run 1: 0.28, Run 2: 0.28, Run 3: 0.28 (each SE +/- 0.00, N = 3)

Throughput Test: PartialTweets
  Run 1: 0.36, Run 2: 0.36, Run 3: 0.36 (each SE +/- 0.00, N = 3)

Throughput Test: DistinctUserID
  Run 1: 0.37, Run 2: 0.37, Run 3: 0.37 (each SE +/- 0.00, N = 3)
oneDNN
oneDNN 2.0 - Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU (ms, fewer is better; (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread)
  Run 1: 1.71220 (SE +/- 0.06136, N = 15, MIN: 1.12)
  Run 2: 1.74056 (SE +/- 0.05674, N = 12, MIN: 1.11)
  Run 3: 1.66512 (SE +/- 0.06598, N = 15, MIN: 0.98)
simdjson
simdjson 0.7.1 - Throughput Test: Kostya (GB/s, more is better; (CXX) g++ options: -O3 -pthread)
  Run 1: 0.33, Run 2: 0.33, Run 3: 0.33 (each SE +/- 0.00, N = 3)
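Throughput figures like the 0.33 GB/s above convert directly into parse time for a given document size. A sketch assuming 1 GB = 10^9 bytes (the usual convention for these figures) and a hypothetical 10 MB input:

```python
def parse_time_ms(size_bytes, throughput_gb_per_s):
    """Time (ms) to parse `size_bytes` at a throughput given in GB/s."""
    return size_bytes / (throughput_gb_per_s * 1e9) * 1000.0

# Hypothetical 10 MB JSON document at the Kostya throughput measured above.
print(f"{parse_time_ms(10_000_000, 0.33):.1f} ms")
```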
oneDNN
oneDNN 2.0 - Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU (ms, fewer is better; (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread)
  Run 1: 12.42 (SE +/- 0.32, N = 15, MIN: 3.03)
  Run 2: 11.77 (SE +/- 0.25, N = 15, MIN: 3.13)
  Run 3: 12.09 (SE +/- 0.23, N = 15, MIN: 3.14)
oneDNN 2.0 - Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32 - Engine: CPU (ms, fewer is better; (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread)
  Run 1: 1.78892 (SE +/- 0.01884, N = 3, MIN: 1.66)
  Run 2: 1.77953 (SE +/- 0.01534, N = 15, MIN: 1.57)
  Run 3: 1.79366 (SE +/- 0.01875, N = 3, MIN: 1.66)
oneDNN 2.0 - Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU (ms, fewer is better; (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread)
  Run 1: 4.03281 (SE +/- 0.02609, N = 3, MIN: 3.67)
  Run 2: 4.00713 (SE +/- 0.02054, N = 3, MIN: 3.65)
  Run 3: 4.01827 (SE +/- 0.03270, N = 3, MIN: 3.67)
Opus Codec Encoding
Opus is an open, lossy audio codec designed primarily for interactive real-time applications over the Internet. This test uses Opus-Tools and measures the time required to encode a WAV file to Opus. Learn more via the OpenBenchmarking.org test page.
Opus Codec Encoding 1.3.1 - WAV To Opus Encode (Seconds, fewer is better; (CXX) g++ options: -fvisibility=hidden -logg -lm)
  Run 1: 10.19 (SE +/- 0.00, N = 5)
  Run 2: 10.22 (SE +/- 0.01, N = 5)
  Run 3: 10.20 (SE +/- 0.01, N = 5)
oneDNN
oneDNN 2.0 - Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPU (ms, fewer is better; (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread)
  Run 1: 2.67937 (SE +/- 0.01516, N = 3, MIN: 2.56)
  Run 2: 2.68511 (SE +/- 0.00743, N = 3, MIN: 2.55)
  Run 3: 2.66509 (SE +/- 0.00279, N = 3, MIN: 2.55)
oneDNN 2.0 (ms, fewer is better; (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread)

Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU
  Run 1: 23.30 (SE +/- 0.12, N = 3, MIN: 21.63)
  Run 2: 22.46 (SE +/- 0.45, N = 12, MIN: 11.33)
  Run 3: 23.21 (SE +/- 0.12, N = 3, MIN: 20.94)

Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU
  Run 1: 9.04439 (SE +/- 0.10760, N = 15, MIN: 6.91)
  Run 2: 9.03893 (SE +/- 0.11235, N = 3, MIN: 6.98)
  Run 3: 9.08767 (SE +/- 0.11809, N = 15, MIN: 6.99)

Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU
  Run 1: 3.56511 (SE +/- 0.01645, N = 3, MIN: 1.9)
  Run 2: 3.55970 (SE +/- 0.02517, N = 3, MIN: 1.89)
  Run 3: 3.57153 (SE +/- 0.00955, N = 3, MIN: 1.89)

Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU
  Run 1: 18.51 (SE +/- 0.19, N = 3, MIN: 17.06)
  Run 2: 18.68 (SE +/- 0.29, N = 3, MIN: 17.2)
  Run 3: 18.66 (SE +/- 0.25, N = 3, MIN: 17.14)

Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU
  Run 1: 4.41314 (SE +/- 0.05240, N = 6, MIN: 4.07)
  Run 2: 4.37097 (SE +/- 0.06219, N = 3, MIN: 4.07)
  Run 3: 4.40049 (SE +/- 0.06103, N = 4, MIN: 4.06)
Run 1: Testing initiated at 21 December 2020 16:36 by user phoronix. Compiler, processor, and security notes are identical to those listed at the top of this report.
Run 2: Testing initiated at 21 December 2020 20:55 by user phoronix. Compiler, processor, and security notes are identical to those listed at the top of this report.
Run 3: Testing initiated at 22 December 2020 05:26 by user phoronix. System configuration, compiler, processor, and security notes are identical to those listed at the top of this report.