AMD EPYC 7601 32-Core testing with a TYAN B8026T70AE24HR (V1.02.B10 BIOS) and llvmpipe on Ubuntu 20.04 via the Phoronix Test Suite.
Compare your own system(s) to this result file with the Phoronix Test Suite by running the command:
phoronix-test-suite benchmark 2012222-HA-AMDEPYC7628

AMD EPYC 7601 Xmas 2020 - Phoronix Test Suite
HTML result view exported from: https://openbenchmarking.org/result/2012222-HA-AMDEPYC7628&grr&sor&rro .
AMD EPYC 7601 Xmas 2020
Runs: Run 1, Run 2, Run 3
  Processor: AMD EPYC 7601 32-Core @ 2.20GHz (32 Cores / 64 Threads)
  Motherboard: TYAN B8026T70AE24HR (V1.02.B10 BIOS)
  Chipset: AMD 17h
  Memory: 126GB
  Disk: 280GB INTEL SSDPE21D280GA
  Graphics: llvmpipe
  Monitor: VE228
  Network: 2 x Broadcom NetXtreme BCM5720 2-port PCIe
  OS: Ubuntu 20.04
  Kernel: 5.4.0-53-generic (x86_64)
  Desktop: GNOME Shell 3.36.4
  Display Server: X Server 1.20.8
  Display Driver: modesetting 1.20.8
  OpenGL: 3.3 Mesa 20.0.8 (LLVM 10.0.0 128 bits)
  Compiler: GCC 9.3.0
  File-System: ext4
  Screen Resolution: 1920x1080
Compiler Details
  - --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-9-HskZEa/gcc-9-9.3.0/debian/tmp-nvptx/usr,hsa --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Details
  - Scaling Governor: acpi-cpufreq ondemand (Boost: Enabled)
  - CPU Microcode: 0x8001250
Security Details
  - itlb_multihit: Not affected
  + l1tf: Not affected
  + mds: Not affected
  + meltdown: Not affected
  + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp
  + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
  + spectre_v2: Mitigation of Full AMD retpoline IBPB: conditional STIBP: disabled RSB filling
  + srbds: Not affected
  + tsx_async_abort: Not affected
AMD EPYC 7601 Xmas 2020
Test | Run 1 | Run 2 | Run 3
ncnn: CPU - regnety_400m | 117.02 | 119.23 | 118.48
ncnn: CPU - squeezenet_ssd | 46.68 | 46.89 | 44.81
ncnn: CPU - yolov4-tiny | 57.99 | 55.80 | 56.52
ncnn: CPU - resnet50 | 60.78 | 59.24 | 59.49
ncnn: CPU - alexnet | 33.20 | 30.19 | 31.92
ncnn: CPU - resnet18 | 41.83 | 45.70 | 43.64
ncnn: CPU - vgg16 | 100.72 | 94.33 | 88.55
ncnn: CPU - googlenet | 48.06 | 46.99 | 49.81
ncnn: CPU - blazeface | 7.79 | 7.89 | 7.90
ncnn: CPU - efficientnet-b0 | 22.19 | 22.24 | 23.22
ncnn: CPU - mnasnet | 16.17 | 15.76 | 16.26
ncnn: CPU - shufflenet-v2 | 17.51 | 17.35 | 16.94
ncnn: CPU-v3-v3 - mobilenet-v3 | 16.30 | 17.48 | 16.91
ncnn: CPU-v2-v2 - mobilenet-v2 | 17.42 | 19.42 | 18.22
ncnn: CPU - mobilenet | 43.10 | 41.84 | 43.26
onednn: Recurrent Neural Network Training - f32 - CPU | 10732.73 | 10747.5 | 10314.22
onednn: Recurrent Neural Network Training - u8s8f32 - CPU | 10583.10 | 10647.60 | 10812.54
onednn: Recurrent Neural Network Training - bf16bf16bf16 - CPU | 10689.16 | 10915.65 | 11077.9
onednn: Recurrent Neural Network Inference - u8s8f32 - CPU | 3300.07 | 3434.14 | 3327.91
onednn: Recurrent Neural Network Inference - f32 - CPU | 3293.49 | 3322.68 | 3393.82
onednn: Recurrent Neural Network Inference - bf16bf16bf16 - CPU | 3332.79 | 3312.30 | 3405.82
hmmer: Pfam Database Search | 200.295 | 199.708 | 200.753
node-web-tooling: | 6.78 | 6.74 | 6.85
build-eigen: Time To Compile | 120.016 | 119.981 | 120.191
build2: Time To Compile | 102.295 | 102.588 | 102.354
sqlite-speedtest: Timed Time - Size 1,000 | 90.116 | 90.320 | 90.107
onednn: Deconvolution Batch shapes_1d - u8s8f32 - CPU | 4.60248 | 4.71473 | 4.22841
onednn: IP Shapes 1D - f32 - CPU | 5.34771 | 4.49148 | 4.33519
clomp: Static OMP Speedup | 57.1 | 57.7 | 57.8
simdjson: LargeRand | 0.28 | 0.28 | 0.28
simdjson: PartialTweets | 0.36 | 0.36 | 0.36
simdjson: DistinctUserID | 0.37 | 0.37 | 0.37
onednn: Matrix Multiply Batch Shapes Transformer - f32 - CPU | 1.71220 | 1.74056 | 1.66512
simdjson: Kostya | 0.33 | 0.33 | 0.33
onednn: IP Shapes 3D - f32 - CPU | 12.4177 | 11.7699 | 12.0856
build-ffmpeg: Time To Compile | 39.094 | 39.108 | 39.189
encode-ape: WAV To APE | 18.346 | 18.332 | 18.416
encode-wavpack: WAV To WavPack | 17.319 | 17.312 | 17.292
onednn: Matrix Multiply Batch Shapes Transformer - u8s8f32 - CPU | 1.78892 | 1.77953 | 1.79366
coremark: CoreMark Size 666 - Iterations Per Second | 879248.022638 | 879237.122078 | 876909.950001
onednn: Deconvolution Batch shapes_1d - f32 - CPU | 4.03281 | 4.00713 | 4.01827
encode-opus: WAV To Opus Encode | 10.187 | 10.215 | 10.195
onednn: IP Shapes 1D - u8s8f32 - CPU | 2.67937 | 2.68511 | 2.66509
mafft: Multiple Sequence Alignment - LSU RNA | 15.018 | 15.147 | 15.023
onednn: Convolution Batch Shapes Auto - u8s8f32 - CPU | 23.3030 | 22.4576 | 23.2120
onednn: Deconvolution Batch shapes_3d - f32 - CPU | 9.04439 | 9.03893 | 9.08767
onednn: IP Shapes 3D - u8s8f32 - CPU | 3.56511 | 3.55970 | 3.57153
onednn: Convolution Batch Shapes Auto - f32 - CPU | 18.5128 | 18.6800 | 18.6556
onednn: Deconvolution Batch shapes_3d - u8s8f32 - CPU | 4.41314 | 4.37097 | 4.40049
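The overview above lists one system configuration for Run 1, Run 2 and Run 3, so the spread between the three values of a test gives a rough feel for run-to-run noise. The following is a minimal Python sketch of that arithmetic, assuming the relative spread (max minus min, divided by the mean) is an acceptable summary. The `relative_spread` helper and the choice of three tests are illustrative only and are not part of the Phoronix Test Suite; the numbers are copied from the overview.

```python
from statistics import mean

# Values copied from the overview, in the order Run 1, Run 2, Run 3.
results = {
    "ncnn: CPU - vgg16 (ms)": [100.72, 94.33, 88.55],
    "onednn: IP Shapes 1D - f32 - CPU (ms)": [5.34771, 4.49148, 4.33519],
    "coremark: CoreMark Size 666 (iterations/s)": [
        879248.022638,
        879237.122078,
        876909.950001,
    ],
}


def relative_spread(values):
    """Return (max - min) / mean of the values, as a percentage."""
    return 100.0 * (max(values) - min(values)) / mean(values)


if __name__ == "__main__":
    for name, values in results.items():
        print(f"{name}: spread {relative_spread(values):.2f}% of mean")
```

A larger percentage suggests that a single run of that test is a less reliable point of comparison against another system.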
NCNN 20201218 - Target: CPU - Model: regnety_400m (ms, Fewer Is Better)
  Run 2: 119.23 (SE +/- 2.16, N = 12; MIN: 109.72 / MAX: 3748.32)
  Run 3: 118.48 (SE +/- 1.61, N = 12; MIN: 109.05 / MAX: 2000.57)
  Run 1: 117.02 (SE +/- 1.56, N = 12; MIN: 109.2 / MAX: 1631.71)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20201218 - Target: CPU - Model: squeezenet_ssd (ms, Fewer Is Better)
  Run 2: 46.89 (SE +/- 1.43, N = 12; MIN: 37.8 / MAX: 531.21)
  Run 1: 46.68 (SE +/- 1.46, N = 12; MIN: 37.34 / MAX: 524.93)
  Run 3: 44.81 (SE +/- 1.26, N = 12; MIN: 37.3 / MAX: 531.47)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20201218 - Target: CPU - Model: yolov4-tiny (ms, Fewer Is Better)
  Run 1: 57.99 (SE +/- 1.01, N = 12; MIN: 46.53 / MAX: 296.18)
  Run 3: 56.52 (SE +/- 0.57, N = 12; MIN: 44.78 / MAX: 275.31)
  Run 2: 55.80 (SE +/- 0.72, N = 12; MIN: 46.04 / MAX: 269.61)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20201218 - Target: CPU - Model: resnet50 (ms, Fewer Is Better)
  Run 1: 60.78 (SE +/- 2.69, N = 12; MIN: 38.67 / MAX: 633.7)
  Run 3: 59.49 (SE +/- 3.15, N = 12; MIN: 38.42 / MAX: 662.24)
  Run 2: 59.24 (SE +/- 1.83, N = 12; MIN: 38.46 / MAX: 770.44)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20201218 - Target: CPU - Model: alexnet (ms, Fewer Is Better)
  Run 1: 33.20 (SE +/- 1.65, N = 12; MIN: 18.26 / MAX: 171.51)
  Run 3: 31.92 (SE +/- 1.62, N = 12; MIN: 16.26 / MAX: 163.34)
  Run 2: 30.19 (SE +/- 1.11, N = 12; MIN: 16.07 / MAX: 156.27)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20201218 - Target: CPU - Model: resnet18 (ms, Fewer Is Better)
  Run 2: 45.70 (SE +/- 3.66, N = 12; MIN: 27.44 / MAX: 248.59)
  Run 3: 43.64 (SE +/- 2.71, N = 12; MIN: 27.63 / MAX: 246.76)
  Run 1: 41.83 (SE +/- 1.98, N = 12; MIN: 23.36 / MAX: 249.26)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20201218 - Target: CPU - Model: vgg16 (ms, Fewer Is Better)
  Run 1: 100.72 (SE +/- 5.01, N = 12; MIN: 45.51 / MAX: 338.45)
  Run 2: 94.33 (SE +/- 3.64, N = 12; MIN: 47.38 / MAX: 304.01)
  Run 3: 88.55 (SE +/- 3.70, N = 12; MIN: 43.77 / MAX: 279.55)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20201218 - Target: CPU - Model: googlenet (ms, Fewer Is Better)
  Run 3: 49.81 (SE +/- 2.45, N = 12; MIN: 32.24 / MAX: 613.18)
  Run 1: 48.06 (SE +/- 2.27, N = 12; MIN: 32.75 / MAX: 604.8)
  Run 2: 46.99 (SE +/- 2.77, N = 12; MIN: 33.26 / MAX: 605.84)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20201218 - Target: CPU - Model: blazeface (ms, Fewer Is Better)
  Run 3: 7.90 (SE +/- 0.09, N = 12; MIN: 7.46 / MAX: 59.21)
  Run 2: 7.89 (SE +/- 0.06, N = 12; MIN: 7.55 / MAX: 80.4)
  Run 1: 7.79 (SE +/- 0.06, N = 12; MIN: 7.48 / MAX: 49.5)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20201218 - Target: CPU - Model: efficientnet-b0 (ms, Fewer Is Better)
  Run 3: 23.22 (SE +/- 0.56, N = 12; MIN: 20.7 / MAX: 525.92)
  Run 2: 22.24 (SE +/- 0.31, N = 12; MIN: 20.84 / MAX: 524.85)
  Run 1: 22.19 (SE +/- 0.37, N = 12; MIN: 20.26 / MAX: 496.03)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20201218 - Target: CPU - Model: mnasnet (ms, Fewer Is Better)
  Run 3: 16.26 (SE +/- 0.43, N = 12; MIN: 14.79 / MAX: 427.39)
  Run 1: 16.17 (SE +/- 0.48, N = 12; MIN: 14.62 / MAX: 428.63)
  Run 2: 15.76 (SE +/- 0.24, N = 12; MIN: 14.78 / MAX: 415.5)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20201218 - Target: CPU - Model: shufflenet-v2 (ms, Fewer Is Better)
  Run 1: 17.51 (SE +/- 0.42, N = 12; MIN: 15.95 / MAX: 355.59)
  Run 2: 17.35 (SE +/- 0.51, N = 12; MIN: 16 / MAX: 357.5)
  Run 3: 16.94 (SE +/- 0.13, N = 12; MIN: 16.11 / MAX: 120.46)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20201218 - Target: CPU-v3-v3 - Model: mobilenet-v3 (ms, Fewer Is Better)
  Run 2: 17.48 (SE +/- 0.96, N = 12; MIN: 14.75 / MAX: 525.02)
  Run 3: 16.91 (SE +/- 0.59, N = 12; MIN: 14.94 / MAX: 447.34)
  Run 1: 16.30 (SE +/- 0.56, N = 12; MIN: 14.83 / MAX: 444.14)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20201218 - Target: CPU-v2-v2 - Model: mobilenet-v2 (ms, Fewer Is Better)
  Run 2: 19.42 (SE +/- 1.19, N = 12; MIN: 15.84 / MAX: 439.61)
  Run 3: 18.22 (SE +/- 0.54, N = 12; MIN: 15.35 / MAX: 435.27)
  Run 1: 17.42 (SE +/- 0.22, N = 12; MIN: 15.79 / MAX: 249.25)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20201218 - Target: CPU - Model: mobilenet (ms, Fewer Is Better)
  Run 3: 43.26 (SE +/- 1.07, N = 12; MIN: 34.82 / MAX: 511.89)
  Run 1: 43.10 (SE +/- 1.57, N = 12; MIN: 35.09 / MAX: 501.41)
  Run 2: 41.84 (SE +/- 1.10, N = 12; MIN: 35.73 / MAX: 496.11)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
oneDNN 2.0 - Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  Run 2: 10747.50 (SE +/- 128.18, N = 12; MIN: 9600.6)
  Run 1: 10732.73 (SE +/- 164.15, N = 12; MIN: 8370.79)
  Run 3: 10314.22 (SE +/- 184.49, N = 13; MIN: 7551.61)
  1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0 - Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  Run 3: 10812.54 (SE +/- 330.55, N = 12; MIN: 8507.26)
  Run 2: 10647.60 (SE +/- 204.17, N = 12; MIN: 8942.86)
  Run 1: 10583.10 (SE +/- 206.22, N = 10; MIN: 9226.33)
  1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0 - Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
  Run 3: 11077.90 (SE +/- 137.39, N = 4; MIN: 10188.8)
  Run 2: 10915.65 (SE +/- 182.83, N = 12; MIN: 8687.22)
  Run 1: 10689.16 (SE +/- 258.76, N = 9; MIN: 9144.24)
  1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0 - Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  Run 2: 3434.14 (SE +/- 41.23, N = 6; MIN: 3003.06)
  Run 3: 3327.91 (SE +/- 46.22, N = 15; MIN: 2885.53)
  Run 1: 3300.07 (SE +/- 44.01, N = 15; MIN: 2751.78)
  1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0 - Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  Run 3: 3393.82 (SE +/- 11.78, N = 3; MIN: 3348.85)
  Run 2: 3322.68 (SE +/- 38.80, N = 15; MIN: 2956.39)
  Run 1: 3293.49 (SE +/- 45.21, N = 15; MIN: 2548)
  1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0 - Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
  Run 3: 3405.82 (SE +/- 51.82, N = 3; MIN: 3281.7)
  Run 1: 3332.79 (SE +/- 31.60, N = 10; MIN: 2567.62)
  Run 2: 3312.30 (SE +/- 58.86, N = 15; MIN: 2572.81)
  1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
Timed HMMer Search 3.3.1 - Pfam Database Search (Seconds, Fewer Is Better)
  Run 3: 200.75 (SE +/- 0.16, N = 3)
  Run 1: 200.30 (SE +/- 0.06, N = 3)
  Run 2: 199.71 (SE +/- 0.82, N = 3)
  1. (CC) gcc options: -O3 -pthread -lhmmer -leasel -lm
Node.js V8 Web Tooling Benchmark (runs/s, More Is Better)
  Run 2: 6.74 (SE +/- 0.01, N = 3)
  Run 1: 6.78 (SE +/- 0.08, N = 3)
  Run 3: 6.85 (SE +/- 0.04, N = 3)
  1. Nodejs v10.19.0
Timed Eigen Compilation 3.3.9 - Time To Compile (Seconds, Fewer Is Better)
  Run 3: 120.19 (SE +/- 0.17, N = 3)
  Run 1: 120.02 (SE +/- 0.02, N = 3)
  Run 2: 119.98 (SE +/- 0.02, N = 3)
Build2 0.13 - Time To Compile (Seconds, Fewer Is Better)
  Run 2: 102.59 (SE +/- 0.14, N = 3)
  Run 3: 102.35 (SE +/- 0.04, N = 3)
  Run 1: 102.30 (SE +/- 0.36, N = 3)
SQLite Speedtest 3.30 - Timed Time - Size 1,000 (Seconds, Fewer Is Better)
  Run 2: 90.32 (SE +/- 0.98, N = 3)
  Run 1: 90.12 (SE +/- 0.01, N = 3)
  Run 3: 90.11 (SE +/- 0.17, N = 3)
  1. (CC) gcc options: -O2 -ldl -lz -lpthread
oneDNN 2.0 - Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  Run 2: 4.71473 (SE +/- 0.12692, N = 15; MIN: 3.93)
  Run 1: 4.60248 (SE +/- 0.17305, N = 15; MIN: 3.92)
  Run 3: 4.22841 (SE +/- 0.04344, N = 3; MIN: 3.96)
  1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0 - Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  Run 1: 5.34771 (SE +/- 0.52810, N = 15; MIN: 2.82)
  Run 2: 4.49148 (SE +/- 0.07284, N = 15; MIN: 2.85)
  Run 3: 4.33519 (SE +/- 0.13685, N = 12; MIN: 2.85)
  1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
CLOMP 1.2 - Static OMP Speedup (Speedup, More Is Better)
  Run 1: 57.1 (SE +/- 0.32, N = 3)
  Run 2: 57.7 (SE +/- 0.70, N = 3)
  Run 3: 57.8 (SE +/- 0.43, N = 3)
  1. (CC) gcc options: -fopenmp -O3 -lm
simdjson 0.7.1 - Throughput Test: LargeRandom (GB/s, More Is Better)
  Run 1: 0.28 (SE +/- 0.00, N = 3)
  Run 2: 0.28 (SE +/- 0.00, N = 3)
  Run 3: 0.28 (SE +/- 0.00, N = 3)
  1. (CXX) g++ options: -O3 -pthread
simdjson 0.7.1 - Throughput Test: PartialTweets (GB/s, More Is Better)
  Run 1: 0.36 (SE +/- 0.00, N = 3)
  Run 2: 0.36 (SE +/- 0.00, N = 3)
  Run 3: 0.36 (SE +/- 0.00, N = 3)
  1. (CXX) g++ options: -O3 -pthread
simdjson 0.7.1 - Throughput Test: DistinctUserID (GB/s, More Is Better)
  Run 1: 0.37 (SE +/- 0.00, N = 3)
  Run 2: 0.37 (SE +/- 0.00, N = 3)
  Run 3: 0.37 (SE +/- 0.00, N = 3)
  1. (CXX) g++ options: -O3 -pthread
oneDNN 2.0 - Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  Run 2: 1.74056 (SE +/- 0.05674, N = 12; MIN: 1.11)
  Run 1: 1.71220 (SE +/- 0.06136, N = 15; MIN: 1.12)
  Run 3: 1.66512 (SE +/- 0.06598, N = 15; MIN: 0.98)
  1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
simdjson 0.7.1 - Throughput Test: Kostya (GB/s, More Is Better)
  Run 1: 0.33 (SE +/- 0.00, N = 3)
  Run 2: 0.33 (SE +/- 0.00, N = 3)
  Run 3: 0.33 (SE +/- 0.00, N = 3)
  1. (CXX) g++ options: -O3 -pthread
oneDNN 2.0 - Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  Run 1: 12.42 (SE +/- 0.32, N = 15; MIN: 3.03)
  Run 3: 12.09 (SE +/- 0.23, N = 15; MIN: 3.14)
  Run 2: 11.77 (SE +/- 0.25, N = 15; MIN: 3.13)
  1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
Timed FFmpeg Compilation 4.2.2 - Time To Compile (Seconds, Fewer Is Better)
  Run 3: 39.19 (SE +/- 0.14, N = 3)
  Run 2: 39.11 (SE +/- 0.09, N = 3)
  Run 1: 39.09 (SE +/- 0.05, N = 3)
Monkey Audio Encoding 3.99.6 - WAV To APE (Seconds, Fewer Is Better)
  Run 3: 18.42 (SE +/- 0.08, N = 5)
  Run 1: 18.35 (SE +/- 0.01, N = 5)
  Run 2: 18.33 (SE +/- 0.01, N = 5)
  1. (CXX) g++ options: -O3 -pedantic -rdynamic -lrt
WavPack Audio Encoding 5.3 - WAV To WavPack (Seconds, Fewer Is Better)
  Run 1: 17.32 (SE +/- 0.02, N = 5)
  Run 2: 17.31 (SE +/- 0.02, N = 5)
  Run 3: 17.29 (SE +/- 0.00, N = 5)
  1. (CXX) g++ options: -rdynamic
oneDNN 2.0 - Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  Run 3: 1.79366 (SE +/- 0.01875, N = 3; MIN: 1.66)
  Run 1: 1.78892 (SE +/- 0.01884, N = 3; MIN: 1.66)
  Run 2: 1.77953 (SE +/- 0.01534, N = 15; MIN: 1.57)
  1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
Coremark 1.0 - CoreMark Size 666 - Iterations Per Second (Iterations/Sec, More Is Better)
  Run 3: 876909.95 (SE +/- 2175.03, N = 3)
  Run 2: 879237.12 (SE +/- 1976.60, N = 3)
  Run 1: 879248.02 (SE +/- 5683.14, N = 3)
  1. (CC) gcc options: -O2 -lrt" -lrt
oneDNN 2.0 - Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  Run 1: 4.03281 (SE +/- 0.02609, N = 3; MIN: 3.67)
  Run 3: 4.01827 (SE +/- 0.03270, N = 3; MIN: 3.67)
  Run 2: 4.00713 (SE +/- 0.02054, N = 3; MIN: 3.65)
  1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
Opus Codec Encoding 1.3.1 - WAV To Opus Encode (Seconds, Fewer Is Better)
  Run 2: 10.22 (SE +/- 0.01, N = 5)
  Run 3: 10.20 (SE +/- 0.01, N = 5)
  Run 1: 10.19 (SE +/- 0.00, N = 5)
  1. (CXX) g++ options: -fvisibility=hidden -logg -lm
oneDNN 2.0 - Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  Run 2: 2.68511 (SE +/- 0.00743, N = 3; MIN: 2.55)
  Run 1: 2.67937 (SE +/- 0.01516, N = 3; MIN: 2.56)
  Run 3: 2.66509 (SE +/- 0.00279, N = 3; MIN: 2.55)
  1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
Timed MAFFT Alignment 7.471 - Multiple Sequence Alignment - LSU RNA (Seconds, Fewer Is Better)
  Run 2: 15.15 (SE +/- 0.11, N = 3)
  Run 3: 15.02 (SE +/- 0.04, N = 3)
  Run 1: 15.02 (SE +/- 0.03, N = 3)
  1. (CC) gcc options: -std=c99 -O3 -lm -lpthread
oneDNN 2.0 - Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  Run 1: 23.30 (SE +/- 0.12, N = 3; MIN: 21.63)
  Run 3: 23.21 (SE +/- 0.12, N = 3; MIN: 20.94)
  Run 2: 22.46 (SE +/- 0.45, N = 12; MIN: 11.33)
  1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0 - Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  Run 3: 9.08767 (SE +/- 0.11809, N = 15; MIN: 6.99)
  Run 1: 9.04439 (SE +/- 0.10760, N = 15; MIN: 6.91)
  Run 2: 9.03893 (SE +/- 0.11235, N = 3; MIN: 6.98)
  1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0 - Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  Run 3: 3.57153 (SE +/- 0.00955, N = 3; MIN: 1.89)
  Run 1: 3.56511 (SE +/- 0.01645, N = 3; MIN: 1.9)
  Run 2: 3.55970 (SE +/- 0.02517, N = 3; MIN: 1.89)
  1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0 - Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  Run 2: 18.68 (SE +/- 0.29, N = 3; MIN: 17.2)
  Run 3: 18.66 (SE +/- 0.25, N = 3; MIN: 17.14)
  Run 1: 18.51 (SE +/- 0.19, N = 3; MIN: 17.06)
  1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0 - Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  Run 1: 4.41314 (SE +/- 0.05240, N = 6; MIN: 4.07)
  Run 3: 4.40049 (SE +/- 0.06103, N = 4; MIN: 4.06)
  Run 2: 4.37097 (SE +/- 0.06219, N = 3; MIN: 4.07)
  1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
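Each result above carries an "SE +/- x, N = n" annotation. Reading SE as the standard error of the mean over N samples (an assumption about the export, not something stated in it), the sample standard deviation is SE times the square root of N. The sketch below applies that to the NCNN regnety_400m figures. The helper names `implied_stdev` and `relative_se_percent` are hypothetical and not part of any Phoronix Test Suite API.

```python
import math


def implied_stdev(se, n):
    """Sample standard deviation implied by a standard error and sample count.

    Assumes SE = stdev / sqrt(N), i.e. the standard error of the mean.
    """
    return se * math.sqrt(n)


def relative_se_percent(se, mean_value):
    """Standard error expressed as a percentage of the reported mean."""
    return 100.0 * se / mean_value


# (mean in ms, SE, N) copied from the NCNN regnety_400m result.
regnety_400m = {
    "Run 1": (117.02, 1.56, 12),
    "Run 2": (119.23, 2.16, 12),
    "Run 3": (118.48, 1.61, 12),
}

if __name__ == "__main__":
    for run, (mean_ms, se, n) in regnety_400m.items():
        print(
            f"{run}: mean {mean_ms} ms, "
            f"implied stdev {implied_stdev(se, n):.2f} ms, "
            f"SE is {relative_se_percent(se, mean_ms):.2f}% of mean"
        )
```

Under the same assumption, a result with a small N and a small SE (for example N = 3) says less about variability than one with N = 15, because fewer samples went into the estimate.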
Phoronix Test Suite v10.8.4