AMD EPYC 7601 32-Core testing with a TYAN B8026T70AE24HR (V1.02.B10 BIOS) and llvmpipe on Ubuntu 20.04 via the Phoronix Test Suite.
Compare your own system(s) to this result file with the
Phoronix Test Suite by running the command:
phoronix-test-suite benchmark 2012222-HA-AMDEPYC7628
HTML result view exported from: https://openbenchmarking.org/result/2012222-HA-AMDEPYC7628&grt&sro .
AMD EPYC 7601 Xmas 2020 (Run 1, Run 2, and Run 3 share this configuration)

  Processor: AMD EPYC 7601 32-Core @ 2.20GHz (32 Cores / 64 Threads)
  Motherboard: TYAN B8026T70AE24HR (V1.02.B10 BIOS)
  Chipset: AMD 17h
  Memory: 126GB
  Disk: 280GB INTEL SSDPE21D280GA
  Graphics: llvmpipe
  Monitor: VE228
  Network: 2 x Broadcom NetXtreme BCM5720 2-port PCIe
  OS: Ubuntu 20.04
  Kernel: 5.4.0-53-generic (x86_64)
  Desktop: GNOME Shell 3.36.4
  Display Server: X Server 1.20.8
  Display Driver: modesetting 1.20.8
  OpenGL: 3.3 Mesa 20.0.8 (LLVM 10.0.0 128 bits)
  Compiler: GCC 9.3.0
  File-System: ext4
  Screen Resolution: 1920x1080

Compiler Details: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-9-HskZEa/gcc-9-9.3.0/debian/tmp-nvptx/usr,hsa --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v

Processor Details: Scaling Governor: acpi-cpufreq ondemand (Boost: Enabled); CPU Microcode: 0x8001250

Security Details: itlb_multihit: Not affected; l1tf: Not affected; mds: Not affected; meltdown: Not affected; spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp; spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization; spectre_v2: Mitigation of Full AMD retpoline IBPB: conditional STIBP: disabled RSB filling; srbds: Not affected; tsx_async_abort: Not affected
AMD EPYC 7601 Xmas 2020 - Result Summary

Test | Run 1 | Run 2 | Run 3
build2: Time To Compile (Seconds) | 102.295 | 102.588 | 102.354
clomp: Static OMP Speedup | 57.1 | 57.7 | 57.8
coremark: CoreMark Size 666 (Iterations/Sec) | 879248.022638 | 879237.122078 | 876909.950001
encode-ape: WAV To APE (Seconds) | 18.346 | 18.332 | 18.416
ncnn: CPU - mobilenet (ms) | 43.10 | 41.84 | 43.26
ncnn: CPU-v2-v2 - mobilenet-v2 (ms) | 17.42 | 19.42 | 18.22
ncnn: CPU-v3-v3 - mobilenet-v3 (ms) | 16.30 | 17.48 | 16.91
ncnn: CPU - shufflenet-v2 (ms) | 17.51 | 17.35 | 16.94
ncnn: CPU - mnasnet (ms) | 16.17 | 15.76 | 16.26
ncnn: CPU - efficientnet-b0 (ms) | 22.19 | 22.24 | 23.22
ncnn: CPU - blazeface (ms) | 7.79 | 7.89 | 7.90
ncnn: CPU - googlenet (ms) | 48.06 | 46.99 | 49.81
ncnn: CPU - vgg16 (ms) | 100.72 | 94.33 | 88.55
ncnn: CPU - resnet18 (ms) | 41.83 | 45.70 | 43.64
ncnn: CPU - alexnet (ms) | 33.20 | 30.19 | 31.92
ncnn: CPU - resnet50 (ms) | 60.78 | 59.24 | 59.49
ncnn: CPU - yolov4-tiny (ms) | 57.99 | 55.80 | 56.52
ncnn: CPU - squeezenet_ssd (ms) | 46.68 | 46.89 | 44.81
ncnn: CPU - regnety_400m (ms) | 117.02 | 119.23 | 118.48
node-web-tooling (runs/s) | 6.78 | 6.74 | 6.85
onednn: IP Shapes 1D - f32 - CPU (ms) | 5.34771 | 4.49148 | 4.33519
onednn: IP Shapes 3D - f32 - CPU (ms) | 12.4177 | 11.7699 | 12.0856
onednn: IP Shapes 1D - u8s8f32 - CPU (ms) | 2.67937 | 2.68511 | 2.66509
onednn: IP Shapes 3D - u8s8f32 - CPU (ms) | 3.56511 | 3.55970 | 3.57153
onednn: Convolution Batch Shapes Auto - f32 - CPU (ms) | 18.5128 | 18.6800 | 18.6556
onednn: Deconvolution Batch shapes_1d - f32 - CPU (ms) | 4.03281 | 4.00713 | 4.01827
onednn: Deconvolution Batch shapes_3d - f32 - CPU (ms) | 9.04439 | 9.03893 | 9.08767
onednn: Convolution Batch Shapes Auto - u8s8f32 - CPU (ms) | 23.3030 | 22.4576 | 23.2120
onednn: Deconvolution Batch shapes_1d - u8s8f32 - CPU (ms) | 4.60248 | 4.71473 | 4.22841
onednn: Deconvolution Batch shapes_3d - u8s8f32 - CPU (ms) | 4.41314 | 4.37097 | 4.40049
onednn: Recurrent Neural Network Training - f32 - CPU (ms) | 10732.73 | 10747.5 | 10314.22
onednn: Recurrent Neural Network Inference - f32 - CPU (ms) | 3293.49 | 3322.68 | 3393.82
onednn: Recurrent Neural Network Training - u8s8f32 - CPU (ms) | 10583.10 | 10647.60 | 10812.54
onednn: Recurrent Neural Network Inference - u8s8f32 - CPU (ms) | 3300.07 | 3434.14 | 3327.91
onednn: Matrix Multiply Batch Shapes Transformer - f32 - CPU (ms) | 1.71220 | 1.74056 | 1.66512
onednn: Recurrent Neural Network Training - bf16bf16bf16 - CPU (ms) | 10689.16 | 10915.65 | 11077.9
onednn: Recurrent Neural Network Inference - bf16bf16bf16 - CPU (ms) | 3332.79 | 3312.30 | 3405.82
onednn: Matrix Multiply Batch Shapes Transformer - u8s8f32 - CPU (ms) | 1.78892 | 1.77953 | 1.79366
encode-opus: WAV To Opus Encode (Seconds) | 10.187 | 10.215 | 10.195
simdjson: Kostya (GB/s) | 0.33 | 0.33 | 0.33
simdjson: LargeRandom (GB/s) | 0.28 | 0.28 | 0.28
simdjson: PartialTweets (GB/s) | 0.36 | 0.36 | 0.36
simdjson: DistinctUserID (GB/s) | 0.37 | 0.37 | 0.37
sqlite-speedtest: Timed Time - Size 1,000 (Seconds) | 90.116 | 90.320 | 90.107
build-eigen: Time To Compile (Seconds) | 120.016 | 119.981 | 120.191
build-ffmpeg: Time To Compile (Seconds) | 39.094 | 39.108 | 39.189
hmmer: Pfam Database Search (Seconds) | 200.295 | 199.708 | 200.753
mafft: Multiple Sequence Alignment - LSU RNA (Seconds) | 15.018 | 15.147 | 15.023
encode-wavpack: WAV To WavPack (Seconds) | 17.319 | 17.312 | 17.292
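The three runs track each other closely. As a quick sanity check on run-to-run consistency, the percent spread between the fastest and slowest run can be computed directly from the summary values; a minimal sketch using two of the results above (the helper name `spread_pct` is my own, not part of the Phoronix Test Suite):

```python
def spread_pct(values):
    """Percent difference between the lowest and highest run value."""
    lo, hi = min(values), max(values)
    return (hi - lo) / lo * 100

# Values taken from the result summary above.
results = {
    "Build2 Time To Compile (s)": [102.295, 102.588, 102.354],
    "CoreMark (iterations/s)": [879248.022638, 879237.122078, 876909.950001],
}
for name, runs in results.items():
    print(f"{name}: {spread_pct(runs):.2f}% spread")
# Build2 comes out to 0.29% and CoreMark to 0.27%, i.e. well under 1%.
```

Spreads this small suggest the compile and CoreMark workloads are stable on this system; the NCNN results above show noticeably larger run-to-run variation.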
Build2 0.13 - Time To Compile (Seconds, Fewer Is Better)
  Run 1: 102.30 (SE +/- 0.36, N = 3)
  Run 2: 102.59 (SE +/- 0.14, N = 3)
  Run 3: 102.35 (SE +/- 0.04, N = 3)
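Each per-test result reports a mean with "SE +/- x, N = y": the standard error of the mean, i.e. the sample standard deviation divided by the square root of the number of trials N. A minimal sketch of that calculation (the trial values below are hypothetical, since the export publishes only the per-run mean and SE, not the individual trial times):

```python
import math

def standard_error(samples):
    """Standard error of the mean: sample std dev / sqrt(N)."""
    n = len(samples)
    mean = sum(samples) / n
    # Sample variance with Bessel's correction (N - 1 denominator).
    var = sum((x - mean) ** 2 for x in samples) / (n - 1)
    return math.sqrt(var / n)

# Hypothetical compile times (seconds) for one run of N = 3 trials.
trials = [102.0, 102.3, 102.6]
print(f"SE +/- {standard_error(trials):.2f}, N = {len(trials)}")
# prints "SE +/- 0.17, N = 3"
```

A small SE relative to the mean (as in the compile benchmarks here) indicates the reported average is a stable estimate; the larger SEs on the NCNN and oneDNN RNN results indicate noisier trials.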
CLOMP 1.2 - Static OMP Speedup (Speedup, More Is Better)
  Run 1: 57.1 (SE +/- 0.32, N = 3)
  Run 2: 57.7 (SE +/- 0.70, N = 3)
  Run 3: 57.8 (SE +/- 0.43, N = 3)
1. (CC) gcc options: -fopenmp -O3 -lm
Coremark 1.0 - CoreMark Size 666 - Iterations Per Second (Iterations/Sec, More Is Better)
  Run 1: 879248.02 (SE +/- 5683.14, N = 3)
  Run 2: 879237.12 (SE +/- 1976.60, N = 3)
  Run 3: 876909.95 (SE +/- 2175.03, N = 3)
1. (CC) gcc options: -O2 -lrt
Monkey Audio Encoding 3.99.6 - WAV To APE (Seconds, Fewer Is Better)
  Run 1: 18.35 (SE +/- 0.01, N = 5)
  Run 2: 18.33 (SE +/- 0.01, N = 5)
  Run 3: 18.42 (SE +/- 0.08, N = 5)
1. (CXX) g++ options: -O3 -pedantic -rdynamic -lrt
NCNN 20201218 - Target: CPU - Model: mobilenet (ms, Fewer Is Better)
  Run 1: 43.10 (SE +/- 1.57, N = 12; MIN: 35.09 / MAX: 501.41)
  Run 2: 41.84 (SE +/- 1.10, N = 12; MIN: 35.73 / MAX: 496.11)
  Run 3: 43.26 (SE +/- 1.07, N = 12; MIN: 34.82 / MAX: 511.89)

NCNN 20201218 - Target: CPU-v2-v2 - Model: mobilenet-v2 (ms, Fewer Is Better)
  Run 1: 17.42 (SE +/- 0.22, N = 12; MIN: 15.79 / MAX: 249.25)
  Run 2: 19.42 (SE +/- 1.19, N = 12; MIN: 15.84 / MAX: 439.61)
  Run 3: 18.22 (SE +/- 0.54, N = 12; MIN: 15.35 / MAX: 435.27)

NCNN 20201218 - Target: CPU-v3-v3 - Model: mobilenet-v3 (ms, Fewer Is Better)
  Run 1: 16.30 (SE +/- 0.56, N = 12; MIN: 14.83 / MAX: 444.14)
  Run 2: 17.48 (SE +/- 0.96, N = 12; MIN: 14.75 / MAX: 525.02)
  Run 3: 16.91 (SE +/- 0.59, N = 12; MIN: 14.94 / MAX: 447.34)

NCNN 20201218 - Target: CPU - Model: shufflenet-v2 (ms, Fewer Is Better)
  Run 1: 17.51 (SE +/- 0.42, N = 12; MIN: 15.95 / MAX: 355.59)
  Run 2: 17.35 (SE +/- 0.51, N = 12; MIN: 16 / MAX: 357.5)
  Run 3: 16.94 (SE +/- 0.13, N = 12; MIN: 16.11 / MAX: 120.46)

NCNN 20201218 - Target: CPU - Model: mnasnet (ms, Fewer Is Better)
  Run 1: 16.17 (SE +/- 0.48, N = 12; MIN: 14.62 / MAX: 428.63)
  Run 2: 15.76 (SE +/- 0.24, N = 12; MIN: 14.78 / MAX: 415.5)
  Run 3: 16.26 (SE +/- 0.43, N = 12; MIN: 14.79 / MAX: 427.39)

NCNN 20201218 - Target: CPU - Model: efficientnet-b0 (ms, Fewer Is Better)
  Run 1: 22.19 (SE +/- 0.37, N = 12; MIN: 20.26 / MAX: 496.03)
  Run 2: 22.24 (SE +/- 0.31, N = 12; MIN: 20.84 / MAX: 524.85)
  Run 3: 23.22 (SE +/- 0.56, N = 12; MIN: 20.7 / MAX: 525.92)

NCNN 20201218 - Target: CPU - Model: blazeface (ms, Fewer Is Better)
  Run 1: 7.79 (SE +/- 0.06, N = 12; MIN: 7.48 / MAX: 49.5)
  Run 2: 7.89 (SE +/- 0.06, N = 12; MIN: 7.55 / MAX: 80.4)
  Run 3: 7.90 (SE +/- 0.09, N = 12; MIN: 7.46 / MAX: 59.21)

NCNN 20201218 - Target: CPU - Model: googlenet (ms, Fewer Is Better)
  Run 1: 48.06 (SE +/- 2.27, N = 12; MIN: 32.75 / MAX: 604.8)
  Run 2: 46.99 (SE +/- 2.77, N = 12; MIN: 33.26 / MAX: 605.84)
  Run 3: 49.81 (SE +/- 2.45, N = 12; MIN: 32.24 / MAX: 613.18)

NCNN 20201218 - Target: CPU - Model: vgg16 (ms, Fewer Is Better)
  Run 1: 100.72 (SE +/- 5.01, N = 12; MIN: 45.51 / MAX: 338.45)
  Run 2: 94.33 (SE +/- 3.64, N = 12; MIN: 47.38 / MAX: 304.01)
  Run 3: 88.55 (SE +/- 3.70, N = 12; MIN: 43.77 / MAX: 279.55)

NCNN 20201218 - Target: CPU - Model: resnet18 (ms, Fewer Is Better)
  Run 1: 41.83 (SE +/- 1.98, N = 12; MIN: 23.36 / MAX: 249.26)
  Run 2: 45.70 (SE +/- 3.66, N = 12; MIN: 27.44 / MAX: 248.59)
  Run 3: 43.64 (SE +/- 2.71, N = 12; MIN: 27.63 / MAX: 246.76)

NCNN 20201218 - Target: CPU - Model: alexnet (ms, Fewer Is Better)
  Run 1: 33.20 (SE +/- 1.65, N = 12; MIN: 18.26 / MAX: 171.51)
  Run 2: 30.19 (SE +/- 1.11, N = 12; MIN: 16.07 / MAX: 156.27)
  Run 3: 31.92 (SE +/- 1.62, N = 12; MIN: 16.26 / MAX: 163.34)

NCNN 20201218 - Target: CPU - Model: resnet50 (ms, Fewer Is Better)
  Run 1: 60.78 (SE +/- 2.69, N = 12; MIN: 38.67 / MAX: 633.7)
  Run 2: 59.24 (SE +/- 1.83, N = 12; MIN: 38.46 / MAX: 770.44)
  Run 3: 59.49 (SE +/- 3.15, N = 12; MIN: 38.42 / MAX: 662.24)

NCNN 20201218 - Target: CPU - Model: yolov4-tiny (ms, Fewer Is Better)
  Run 1: 57.99 (SE +/- 1.01, N = 12; MIN: 46.53 / MAX: 296.18)
  Run 2: 55.80 (SE +/- 0.72, N = 12; MIN: 46.04 / MAX: 269.61)
  Run 3: 56.52 (SE +/- 0.57, N = 12; MIN: 44.78 / MAX: 275.31)

NCNN 20201218 - Target: CPU - Model: squeezenet_ssd (ms, Fewer Is Better)
  Run 1: 46.68 (SE +/- 1.46, N = 12; MIN: 37.34 / MAX: 524.93)
  Run 2: 46.89 (SE +/- 1.43, N = 12; MIN: 37.8 / MAX: 531.21)
  Run 3: 44.81 (SE +/- 1.26, N = 12; MIN: 37.3 / MAX: 531.47)

NCNN 20201218 - Target: CPU - Model: regnety_400m (ms, Fewer Is Better)
  Run 1: 117.02 (SE +/- 1.56, N = 12; MIN: 109.2 / MAX: 1631.71)
  Run 2: 119.23 (SE +/- 2.16, N = 12; MIN: 109.72 / MAX: 3748.32)
  Run 3: 118.48 (SE +/- 1.61, N = 12; MIN: 109.05 / MAX: 2000.57)

1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread (applies to all NCNN tests above)
Node.js V8 Web Tooling Benchmark (runs/s, More Is Better)
  Run 1: 6.78 (SE +/- 0.08, N = 3)
  Run 2: 6.74 (SE +/- 0.01, N = 3)
  Run 3: 6.85 (SE +/- 0.04, N = 3)
1. Node.js v10.19.0
oneDNN 2.0 - Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  Run 1: 5.34771 (SE +/- 0.52810, N = 15; MIN: 2.82)
  Run 2: 4.49148 (SE +/- 0.07284, N = 15; MIN: 2.85)
  Run 3: 4.33519 (SE +/- 0.13685, N = 12; MIN: 2.85)

oneDNN 2.0 - Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  Run 1: 12.42 (SE +/- 0.32, N = 15; MIN: 3.03)
  Run 2: 11.77 (SE +/- 0.25, N = 15; MIN: 3.13)
  Run 3: 12.09 (SE +/- 0.23, N = 15; MIN: 3.14)

oneDNN 2.0 - Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  Run 1: 2.67937 (SE +/- 0.01516, N = 3; MIN: 2.56)
  Run 2: 2.68511 (SE +/- 0.00743, N = 3; MIN: 2.55)
  Run 3: 2.66509 (SE +/- 0.00279, N = 3; MIN: 2.55)

oneDNN 2.0 - Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  Run 1: 3.56511 (SE +/- 0.01645, N = 3; MIN: 1.9)
  Run 2: 3.55970 (SE +/- 0.02517, N = 3; MIN: 1.89)
  Run 3: 3.57153 (SE +/- 0.00955, N = 3; MIN: 1.89)

oneDNN 2.0 - Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  Run 1: 18.51 (SE +/- 0.19, N = 3; MIN: 17.06)
  Run 2: 18.68 (SE +/- 0.29, N = 3; MIN: 17.2)
  Run 3: 18.66 (SE +/- 0.25, N = 3; MIN: 17.14)

oneDNN 2.0 - Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  Run 1: 4.03281 (SE +/- 0.02609, N = 3; MIN: 3.67)
  Run 2: 4.00713 (SE +/- 0.02054, N = 3; MIN: 3.65)
  Run 3: 4.01827 (SE +/- 0.03270, N = 3; MIN: 3.67)

oneDNN 2.0 - Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  Run 1: 9.04439 (SE +/- 0.10760, N = 15; MIN: 6.91)
  Run 2: 9.03893 (SE +/- 0.11235, N = 3; MIN: 6.98)
  Run 3: 9.08767 (SE +/- 0.11809, N = 15; MIN: 6.99)

oneDNN 2.0 - Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  Run 1: 23.30 (SE +/- 0.12, N = 3; MIN: 21.63)
  Run 2: 22.46 (SE +/- 0.45, N = 12; MIN: 11.33)
  Run 3: 23.21 (SE +/- 0.12, N = 3; MIN: 20.94)

oneDNN 2.0 - Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  Run 1: 4.60248 (SE +/- 0.17305, N = 15; MIN: 3.92)
  Run 2: 4.71473 (SE +/- 0.12692, N = 15; MIN: 3.93)
  Run 3: 4.22841 (SE +/- 0.04344, N = 3; MIN: 3.96)

oneDNN 2.0 - Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  Run 1: 4.41314 (SE +/- 0.05240, N = 6; MIN: 4.07)
  Run 2: 4.37097 (SE +/- 0.06219, N = 3; MIN: 4.07)
  Run 3: 4.40049 (SE +/- 0.06103, N = 4; MIN: 4.06)

oneDNN 2.0 - Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  Run 1: 10732.73 (SE +/- 164.15, N = 12; MIN: 8370.79)
  Run 2: 10747.50 (SE +/- 128.18, N = 12; MIN: 9600.6)
  Run 3: 10314.22 (SE +/- 184.49, N = 13; MIN: 7551.61)

oneDNN 2.0 - Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  Run 1: 3293.49 (SE +/- 45.21, N = 15; MIN: 2548)
  Run 2: 3322.68 (SE +/- 38.80, N = 15; MIN: 2956.39)
  Run 3: 3393.82 (SE +/- 11.78, N = 3; MIN: 3348.85)

oneDNN 2.0 - Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  Run 1: 10583.10 (SE +/- 206.22, N = 10; MIN: 9226.33)
  Run 2: 10647.60 (SE +/- 204.17, N = 12; MIN: 8942.86)
  Run 3: 10812.54 (SE +/- 330.55, N = 12; MIN: 8507.26)

oneDNN 2.0 - Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  Run 1: 3300.07 (SE +/- 44.01, N = 15; MIN: 2751.78)
  Run 2: 3434.14 (SE +/- 41.23, N = 6; MIN: 3003.06)
  Run 3: 3327.91 (SE +/- 46.22, N = 15; MIN: 2885.53)

oneDNN 2.0 - Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  Run 1: 1.71220 (SE +/- 0.06136, N = 15; MIN: 1.12)
  Run 2: 1.74056 (SE +/- 0.05674, N = 12; MIN: 1.11)
  Run 3: 1.66512 (SE +/- 0.06598, N = 15; MIN: 0.98)

oneDNN 2.0 - Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
  Run 1: 10689.16 (SE +/- 258.76, N = 9; MIN: 9144.24)
  Run 2: 10915.65 (SE +/- 182.83, N = 12; MIN: 8687.22)
  Run 3: 11077.90 (SE +/- 137.39, N = 4; MIN: 10188.8)

oneDNN 2.0 - Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
  Run 1: 3332.79 (SE +/- 31.60, N = 10; MIN: 2567.62)
  Run 2: 3312.30 (SE +/- 58.86, N = 15; MIN: 2572.81)
  Run 3: 3405.82 (SE +/- 51.82, N = 3; MIN: 3281.7)

oneDNN 2.0 - Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  Run 1: 1.78892 (SE +/- 0.01884, N = 3; MIN: 1.66)
  Run 2: 1.77953 (SE +/- 0.01534, N = 15; MIN: 1.57)
  Run 3: 1.79366 (SE +/- 0.01875, N = 3; MIN: 1.66)

1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread (applies to all oneDNN tests above)
Opus Codec Encoding 1.3.1 - WAV To Opus Encode (Seconds, Fewer Is Better)
  Run 1: 10.19 (SE +/- 0.00, N = 5)
  Run 2: 10.22 (SE +/- 0.01, N = 5)
  Run 3: 10.20 (SE +/- 0.01, N = 5)
1. (CXX) g++ options: -fvisibility=hidden -logg -lm
simdjson 0.7.1 - Throughput Test: Kostya (GB/s, More Is Better)
  Run 1: 0.33 (SE +/- 0.00, N = 3)
  Run 2: 0.33 (SE +/- 0.00, N = 3)
  Run 3: 0.33 (SE +/- 0.00, N = 3)

simdjson 0.7.1 - Throughput Test: LargeRandom (GB/s, More Is Better)
  Run 1: 0.28 (SE +/- 0.00, N = 3)
  Run 2: 0.28 (SE +/- 0.00, N = 3)
  Run 3: 0.28 (SE +/- 0.00, N = 3)

simdjson 0.7.1 - Throughput Test: PartialTweets (GB/s, More Is Better)
  Run 1: 0.36 (SE +/- 0.00, N = 3)
  Run 2: 0.36 (SE +/- 0.00, N = 3)
  Run 3: 0.36 (SE +/- 0.00, N = 3)

simdjson 0.7.1 - Throughput Test: DistinctUserID (GB/s, More Is Better)
  Run 1: 0.37 (SE +/- 0.00, N = 3)
  Run 2: 0.37 (SE +/- 0.00, N = 3)
  Run 3: 0.37 (SE +/- 0.00, N = 3)

1. (CXX) g++ options: -O3 -pthread (applies to all simdjson tests above)
SQLite Speedtest 3.30 - Timed Time - Size 1,000 (Seconds, Fewer Is Better)
  Run 1: 90.12 (SE +/- 0.01, N = 3)
  Run 2: 90.32 (SE +/- 0.98, N = 3)
  Run 3: 90.11 (SE +/- 0.17, N = 3)
1. (CC) gcc options: -O2 -ldl -lz -lpthread
Timed Eigen Compilation 3.3.9 - Time To Compile (Seconds, Fewer Is Better)
  Run 1: 120.02 (SE +/- 0.02, N = 3)
  Run 2: 119.98 (SE +/- 0.02, N = 3)
  Run 3: 120.19 (SE +/- 0.17, N = 3)

Timed FFmpeg Compilation 4.2.2 - Time To Compile (Seconds, Fewer Is Better)
  Run 1: 39.09 (SE +/- 0.05, N = 3)
  Run 2: 39.11 (SE +/- 0.09, N = 3)
  Run 3: 39.19 (SE +/- 0.14, N = 3)
Timed HMMer Search 3.3.1 - Pfam Database Search (Seconds, Fewer Is Better)
  Run 1: 200.30 (SE +/- 0.06, N = 3)
  Run 2: 199.71 (SE +/- 0.82, N = 3)
  Run 3: 200.75 (SE +/- 0.16, N = 3)
1. (CC) gcc options: -O3 -pthread -lhmmer -leasel -lm

Timed MAFFT Alignment 7.471 - Multiple Sequence Alignment - LSU RNA (Seconds, Fewer Is Better)
  Run 1: 15.02 (SE +/- 0.03, N = 3)
  Run 2: 15.15 (SE +/- 0.11, N = 3)
  Run 3: 15.02 (SE +/- 0.04, N = 3)
1. (CC) gcc options: -std=c99 -O3 -lm -lpthread

WavPack Audio Encoding 5.3 - WAV To WavPack (Seconds, Fewer Is Better)
  Run 1: 17.32 (SE +/- 0.02, N = 5)
  Run 2: 17.31 (SE +/- 0.02, N = 5)
  Run 3: 17.29 (SE +/- 0.00, N = 5)
1. (CXX) g++ options: -rdynamic
Phoronix Test Suite v10.8.4