Intel Xeon Silver 4216 testing with a TYAN S7100AG2NR (V4.02 BIOS) and ASPEED on Debian 10 via the Phoronix Test Suite.
Compare your own system(s) to this result file with the Phoronix Test Suite by running the command:
phoronix-test-suite benchmark 2103218-HA-XEONSILVE43
Xeon Silver March
HTML result view exported from: https://openbenchmarking.org/result/2103218-HA-XEONSILVE43&grr&sro .
Xeon Silver March
Runs: 1, 2, 3
Processor: Intel Xeon Silver 4216 @ 3.20GHz (16 Cores / 32 Threads)
Motherboard: TYAN S7100AG2NR (V4.02 BIOS)
Chipset: Intel Sky Lake-E DMI3 Registers
Memory: 24GB
Disk: 240GB Corsair Force MP500
Graphics: ASPEED
Audio: Realtek ALC892
Network: 2 x Intel I350
OS: Debian 10
Kernel: 4.19.0-9-amd64 (x86_64)
Desktop: GNOME Shell 3.30.2
Display Server: X Server
Compiler: GCC 8.3.0
File-System: ext4
Screen Resolution: 1024x768
Kernel Details - Transparent Huge Pages: always
Compiler Details - --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --enable-libmpx --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib --with-tune=generic --without-cuda-driver -v
Processor Details - Scaling Governor: intel_pstate powersave - CPU Microcode: 0x500002c
Python Details - 1: Python 2.7.16 + Python 3.7.3
Security Details - itlb_multihit: KVM: Mitigation of Split huge pages + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced IBRS IBPB: conditional RSB filling + srbds: Not affected + tsx_async_abort: Mitigation of TSX disabled
Xeon Silver March
Test | 1 | 2 | 3
luaradio: Complex Phase | 434.1 | 438.2 | 439.2
luaradio: Hilbert Transform | 55.4 | 55.4 | 55.4
luaradio: FM Deemphasis Filter | 254.2 | 254.0 | 253.3
luaradio: Five Back to Back FIR Filters | 808.6 | 807.8 | 810.2
aom-av1: Speed 4 Two-Pass | 3.77 | 3.77 | 3.76
mnn: inception-v3 | 55.095 | 54.547 | 54.090
mnn: mobilenet-v1-1.0 | 3.265 | 3.227 | 3.292
mnn: MobileNetV2_224 | 5.038 | 5.095 | 5.099
mnn: resnet-v2-50 | 43.656 | 43.985 | 44.232
mnn: SqueezeNetV1.0 | 8.133 | 8.130 | 8.236
aom-av1: Speed 0 Two-Pass | 0.21 | 0.20 | 0.21
sysbench: CPU | 22943.61 | 22942.83 | 22943.20
onednn: Recurrent Neural Network Training - u8s8f32 - CPU | 3589.87 | 3589.97 | 3592.30
onednn: Recurrent Neural Network Training - bf16bf16bf16 - CPU | 3589.54 | 3593.32 | 3588.05
onednn: Recurrent Neural Network Training - f32 - CPU | 3586.70 | 3588.58 | 3592.55
aom-av1: Speed 6 Two-Pass | 9.29 | 9.31 | 9.31
onednn: Recurrent Neural Network Inference - bf16bf16bf16 - CPU | 1853.05 | 1853.25 | 1855.59
svt-hevc: 1 - Bosphorus 1080p | 7.88 | 7.86 | 7.89
onednn: Recurrent Neural Network Inference - u8s8f32 - CPU | 1852.81 | 1853.80 | 1852.34
onednn: Recurrent Neural Network Inference - f32 - CPU | 1859.12 | 1856.49 | 1854.28
simdjson: Kostya | 1.13 | 1.12 | 1.13
incompact3d: input.i3d 193 Cells Per Direction | 66.4466426 | 66.6669617 | 66.1601740
simdjson: LargeRand | 0.38 | 0.38 | 0.38
basis: UASTC Level 3 | 57.170 | 57.186 | 57.064
stockfish: Total Time | 31665033 | 31089824 | 31357896
aom-av1: Speed 6 Realtime | 11.76 | 11.68 | 11.70
simdjson: DistinctUserID | 1.48 | 1.48 | 1.48
simdjson: PartialTweets | 1.37 | 1.37 | 1.36
basis: ETC1S | 32.705 | 32.657 | 32.610
basis: UASTC Level 2 | 31.892 | 31.871 | 31.868
onednn: Deconvolution Batch shapes_1d - bf16bf16bf16 - CPU | 18.4274 | 18.4257 | 18.4369
onednn: Deconvolution Batch shapes_1d - f32 - CPU | 11.7592 | 11.7549 | 11.7840
onednn: Deconvolution Batch shapes_1d - u8s8f32 - CPU | 1.26827 | 1.27125 | 1.27234
aom-av1: Speed 8 Realtime | 30.06 | 30.00 | 29.64
incompact3d: input.i3d 129 Cells Per Direction | 16.7026520 | 16.8470281 | 16.8595543
onednn: IP Shapes 1D - f32 - CPU | 4.75486 | 4.74640 | 4.74045
onednn: IP Shapes 1D - bf16bf16bf16 - CPU | 10.5995 | 10.5877 | 10.5943
onednn: IP Shapes 1D - u8s8f32 - CPU | 1.05123 | 1.04921 | 1.0489
onednn: Matrix Multiply Batch Shapes Transformer - f32 - CPU | 1.61069 | 1.61866 | 1.60482
onednn: Matrix Multiply Batch Shapes Transformer - u8s8f32 - CPU | 0.833192 | 0.834428 | 0.830736
onednn: Matrix Multiply Batch Shapes Transformer - bf16bf16bf16 - CPU | 3.56301 | 3.56246 | 3.56446
basis: UASTC Level 0 | 10.084 | 10.066 | 10.072
onednn: IP Shapes 3D - f32 - CPU | 3.93294 | 3.85096 | 3.91852
onednn: IP Shapes 3D - bf16bf16bf16 - CPU | 3.12486 | 3.12458 | 3.13052
onednn: IP Shapes 3D - u8s8f32 - CPU | 1.45952 | 1.42021 | 1.44027
sysbench: RAM / Memory | 15699.71 | 15632.40 | 15678.47
onednn: Convolution Batch Shapes Auto - bf16bf16bf16 - CPU | 16.2810 | 16.2907 | 16.2782
onednn: Convolution Batch Shapes Auto - f32 - CPU | 6.57468 | 6.51645 | 6.55537
onednn: Convolution Batch Shapes Auto - u8s8f32 - CPU | 6.39957 | 6.21612 | 6.16607
svt-hevc: 7 - Bosphorus 1080p | 120.65 | 121.08 | 120.93
svt-vp9: Visual Quality Optimized - Bosphorus 1080p | 169.62 | 169.81 | 170.48
svt-vp9: VMAF Optimized - Bosphorus 1080p | 216.44 | 218.37 | 219.08
svt-vp9: PSNR/SSIM Optimized - Bosphorus 1080p | 221.50 | 221.91 | 222.05
onednn: Deconvolution Batch shapes_3d - bf16bf16bf16 - CPU | 21.7683 | 21.7742 | 21.7774
onednn: Deconvolution Batch shapes_3d - f32 - CPU | 7.65076 | 7.64752 | 7.63646
onednn: Deconvolution Batch shapes_3d - u8s8f32 - CPU | 1.88615 | 1.88395 | 1.88619
svt-hevc: 10 - Bosphorus 1080p | 258.29 | 257.29 | 255.87
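One way to read the overview above is to ask how far the three runs drift apart on each test. The sketch below is only an illustration, not part of the Phoronix Test Suite: the helper names `spread_percent` and `rank_by_spread` are hypothetical, and the four rows are copied by hand from the overview. It reports the range of the three runs as a percentage of their mean and ranks the tests by that figure.

```python
# Hypothetical helper, not a Phoronix Test Suite API.
# Values are copied from the result overview (runs 1, 2, 3).
results = {
    "luaradio: Complex Phase": (434.1, 438.2, 439.2),
    "sysbench: CPU": (22943.61, 22942.83, 22943.20),
    "stockfish: Total Time": (31665033, 31089824, 31357896),
    "onednn: Convolution Batch Shapes Auto - u8s8f32 - CPU": (6.39957, 6.21612, 6.16607),
}


def spread_percent(values):
    """Return (max - min) / mean of the runs, as a percentage."""
    mean = sum(values) / len(values)
    return (max(values) - min(values)) / mean * 100.0


def rank_by_spread(table):
    """Return (test name, spread %) pairs, largest spread first."""
    pairs = [(name, spread_percent(vals)) for name, vals in table.items()]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, pct in rank_by_spread(results):
        print(f"{name}: {pct:.2f}% spread across runs")
```

The spread is direction-agnostic, so it applies equally to "More Is Better" and "Fewer Is Better" tests. It says nothing about which run is faster.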
LuaRadio 0.9.1 - Test: Complex Phase (MiB/s, More Is Better)
  1: 434.1 (SE +/- 3.21, N = 3)
  2: 438.2 (SE +/- 3.89, N = 3)
  3: 439.2 (SE +/- 1.11, N = 3)
LuaRadio 0.9.1 - Test: Hilbert Transform (MiB/s, More Is Better)
  1: 55.4 (SE +/- 0.03, N = 3)
  2: 55.4 (SE +/- 0.06, N = 3)
  3: 55.4 (SE +/- 0.07, N = 3)
LuaRadio 0.9.1 - Test: FM Deemphasis Filter (MiB/s, More Is Better)
  1: 254.2 (SE +/- 0.03, N = 3)
  2: 254.0 (SE +/- 0.09, N = 3)
  3: 253.3 (SE +/- 0.66, N = 3)
LuaRadio 0.9.1 - Test: Five Back to Back FIR Filters (MiB/s, More Is Better)
  1: 808.6 (SE +/- 2.34, N = 3)
  2: 807.8 (SE +/- 2.49, N = 3)
  3: 810.2 (SE +/- 0.70, N = 3)
AOM AV1 2.1-rc - Encoder Mode: Speed 4 Two-Pass (Frames Per Second, More Is Better)
  1: 3.77 (SE +/- 0.00, N = 3)
  2: 3.77 (SE +/- 0.01, N = 3)
  3: 3.76 (SE +/- 0.01, N = 3)
  (CXX) g++ options: -O3 -std=c++11 -U_FORTIFY_SOURCE -lm -lpthread
Mobile Neural Network 1.1.3 - Model: inception-v3 (ms, Fewer Is Better)
  1: 55.10 (SE +/- 0.48, N = 3; MIN: 53.83 / MAX: 68.3)
  2: 54.55 (SE +/- 0.47, N = 3; MIN: 53.67 / MAX: 69.4)
  3: 54.09 (SE +/- 0.05, N = 3; MIN: 53.71 / MAX: 67.03)
  (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -rdynamic -pthread -ldl
Mobile Neural Network 1.1.3 - Model: mobilenet-v1-1.0 (ms, Fewer Is Better)
  1: 3.265 (SE +/- 0.044, N = 3; MIN: 3.07 / MAX: 3.63)
  2: 3.227 (SE +/- 0.027, N = 3; MIN: 2.92 / MAX: 16.04)
  3: 3.292 (SE +/- 0.015, N = 3; MIN: 3.15 / MAX: 3.55)
  (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -rdynamic -pthread -ldl
Mobile Neural Network 1.1.3 - Model: MobileNetV2_224 (ms, Fewer Is Better)
  1: 5.038 (SE +/- 0.022, N = 3; MIN: 4.8 / MAX: 5.28)
  2: 5.095 (SE +/- 0.041, N = 3; MIN: 4.61 / MAX: 8.07)
  3: 5.099 (SE +/- 0.012, N = 3; MIN: 4.59 / MAX: 10.01)
  (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -rdynamic -pthread -ldl
Mobile Neural Network 1.1.3 - Model: resnet-v2-50 (ms, Fewer Is Better)
  1: 43.66 (SE +/- 0.14, N = 3; MIN: 43.28 / MAX: 56.66)
  2: 43.99 (SE +/- 0.20, N = 3; MIN: 43.06 / MAX: 56.17)
  3: 44.23 (SE +/- 0.17, N = 3; MIN: 43.21 / MAX: 56.94)
  (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -rdynamic -pthread -ldl
Mobile Neural Network 1.1.3 - Model: SqueezeNetV1.0 (ms, Fewer Is Better)
  1: 8.133 (SE +/- 0.082, N = 3; MIN: 7.75 / MAX: 14.52)
  2: 8.130 (SE +/- 0.048, N = 3; MIN: 7.68 / MAX: 12.26)
  3: 8.236 (SE +/- 0.068, N = 3; MIN: 7.7 / MAX: 9.92)
  (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -rdynamic -pthread -ldl
AOM AV1 2.1-rc - Encoder Mode: Speed 0 Two-Pass (Frames Per Second, More Is Better)
  1: 0.21 (SE +/- 0.00, N = 3)
  2: 0.20 (SE +/- 0.00, N = 3)
  3: 0.21 (SE +/- 0.00, N = 3)
  (CXX) g++ options: -O3 -std=c++11 -U_FORTIFY_SOURCE -lm -lpthread
Sysbench 1.0.20 - Test: CPU (Events Per Second, More Is Better)
  1: 22943.61 (SE +/- 0.67, N = 3)
  2: 22942.83 (SE +/- 0.76, N = 3)
  3: 22943.20 (SE +/- 1.02, N = 3)
  (CC) gcc options: -pthread -O2 -funroll-loops -rdynamic -ldl -laio -lm
oneDNN 2.1.2 - Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  1: 3589.87 (SE +/- 1.80, N = 3; MIN: 3583.15)
  2: 3589.97 (SE +/- 3.00, N = 3; MIN: 3582.71)
  3: 3592.30 (SE +/- 5.29, N = 3; MIN: 3584.03)
  (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 2.1.2 - Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
  1: 3589.54 (SE +/- 2.28, N = 3; MIN: 3583)
  2: 3593.32 (SE +/- 4.71, N = 3; MIN: 3583.23)
  3: 3588.05 (SE +/- 0.91, N = 3; MIN: 3584.71)
  (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 2.1.2 - Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  1: 3586.70 (SE +/- 1.49, N = 3; MIN: 3583.33)
  2: 3588.58 (SE +/- 1.40, N = 3; MIN: 3583.51)
  3: 3592.55 (SE +/- 5.39, N = 3; MIN: 3582.49)
  (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
AOM AV1 2.1-rc - Encoder Mode: Speed 6 Two-Pass (Frames Per Second, More Is Better)
  1: 9.29 (SE +/- 0.03, N = 3)
  2: 9.31 (SE +/- 0.03, N = 3)
  3: 9.31 (SE +/- 0.04, N = 3)
  (CXX) g++ options: -O3 -std=c++11 -U_FORTIFY_SOURCE -lm -lpthread
oneDNN 2.1.2 - Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
  1: 1853.05 (SE +/- 0.80, N = 3; MIN: 1849.21)
  2: 1853.25 (SE +/- 1.77, N = 3; MIN: 1848.5)
  3: 1855.59 (SE +/- 4.07, N = 3; MIN: 1848.04)
  (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
SVT-HEVC 1.5.0 - Tuning: 1 - Input: Bosphorus 1080p (Frames Per Second, More Is Better)
  1: 7.88 (SE +/- 0.01, N = 3)
  2: 7.86 (SE +/- 0.01, N = 3)
  3: 7.89 (SE +/- 0.00, N = 3)
  (CC) gcc options: -fPIE -fPIC -O3 -O2 -pie -rdynamic -lpthread -lrt
oneDNN 2.1.2 - Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  1: 1852.81 (SE +/- 0.49, N = 3; MIN: 1850.49)
  2: 1853.80 (SE +/- 1.78, N = 3; MIN: 1848.59)
  3: 1852.34 (SE +/- 0.88, N = 3; MIN: 1848.76)
  (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 2.1.2 - Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  1: 1859.12 (SE +/- 1.06, N = 3; MIN: 1850.63)
  2: 1856.49 (SE +/- 5.14, N = 3; MIN: 1849.03)
  3: 1854.28 (SE +/- 0.29, N = 3; MIN: 1851.24)
  (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
simdjson 0.8.2 - Throughput Test: Kostya (GB/s, More Is Better)
  1: 1.13 (SE +/- 0.00, N = 3)
  2: 1.12 (SE +/- 0.02, N = 4)
  3: 1.13 (SE +/- 0.00, N = 3)
  (CXX) g++ options: -O3 -pthread
Xcompact3d Incompact3d 2021-03-11 - Input: input.i3d 193 Cells Per Direction (Seconds, Fewer Is Better)
  1: 66.45 (SE +/- 0.08, N = 3)
  2: 66.67 (SE +/- 0.33, N = 3)
  3: 66.16 (SE +/- 0.37, N = 3)
  (F9X) gfortran options: -cpp -O2 -funroll-loops -floop-optimize -fcray-pointer -fbacktrace -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi
simdjson 0.8.2 - Throughput Test: LargeRandom (GB/s, More Is Better)
  1: 0.38 (SE +/- 0.00, N = 3)
  2: 0.38 (SE +/- 0.00, N = 3)
  3: 0.38 (SE +/- 0.00, N = 3)
  (CXX) g++ options: -O3 -pthread
Basis Universal 1.13 - Settings: UASTC Level 3 (Seconds, Fewer Is Better)
  1: 57.17 (SE +/- 0.00, N = 3)
  2: 57.19 (SE +/- 0.02, N = 3)
  3: 57.06 (SE +/- 0.06, N = 3)
  (CXX) g++ options: -std=c++11 -fvisibility=hidden -fPIC -fno-strict-aliasing -O3 -rdynamic -lm -lpthread
Stockfish 13 - Total Time (Nodes Per Second, More Is Better)
  1: 31665033 (SE +/- 284219.69, N = 3)
  2: 31089824 (SE +/- 411515.60, N = 5)
  3: 31357896 (SE +/- 418440.60, N = 4)
  (CXX) g++ options: -lgcov -m64 -lpthread -fno-exceptions -std=c++17 -fprofile-use -fno-peel-loops -fno-tracer -pedantic -O3 -msse -msse3 -mpopcnt -mavx2 -mavx512f -mavx512bw -mavx512vnni -mavx512dq -mavx512vl -msse4.1 -mssse3 -msse2 -mbmi2 -flto -flto=jobserver
AOM AV1 2.1-rc - Encoder Mode: Speed 6 Realtime (Frames Per Second, More Is Better)
  1: 11.76 (SE +/- 0.06, N = 3)
  2: 11.68 (SE +/- 0.06, N = 3)
  3: 11.70 (SE +/- 0.05, N = 3)
  (CXX) g++ options: -O3 -std=c++11 -U_FORTIFY_SOURCE -lm -lpthread
simdjson 0.8.2 - Throughput Test: DistinctUserID (GB/s, More Is Better)
  1: 1.48 (SE +/- 0.00, N = 3)
  2: 1.48 (SE +/- 0.00, N = 3)
  3: 1.48 (SE +/- 0.00, N = 3)
  (CXX) g++ options: -O3 -pthread
simdjson 0.8.2 - Throughput Test: PartialTweets (GB/s, More Is Better)
  1: 1.37 (SE +/- 0.00, N = 3)
  2: 1.37 (SE +/- 0.00, N = 3)
  3: 1.36 (SE +/- 0.00, N = 3)
  (CXX) g++ options: -O3 -pthread
Basis Universal 1.13 - Settings: ETC1S (Seconds, Fewer Is Better)
  1: 32.71 (SE +/- 0.02, N = 3)
  2: 32.66 (SE +/- 0.09, N = 3)
  3: 32.61 (SE +/- 0.07, N = 3)
  (CXX) g++ options: -std=c++11 -fvisibility=hidden -fPIC -fno-strict-aliasing -O3 -rdynamic -lm -lpthread
Basis Universal 1.13 - Settings: UASTC Level 2 (Seconds, Fewer Is Better)
  1: 31.89 (SE +/- 0.01, N = 3)
  2: 31.87 (SE +/- 0.03, N = 3)
  3: 31.87 (SE +/- 0.02, N = 3)
  (CXX) g++ options: -std=c++11 -fvisibility=hidden -fPIC -fno-strict-aliasing -O3 -rdynamic -lm -lpthread
oneDNN 2.1.2 - Harness: Deconvolution Batch shapes_1d - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
  1: 18.43 (SE +/- 0.02, N = 3; MIN: 18.28)
  2: 18.43 (SE +/- 0.00, N = 3; MIN: 18.28)
  3: 18.44 (SE +/- 0.02, N = 3; MIN: 18.28)
  (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 2.1.2 - Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  1: 11.76 (SE +/- 0.02, N = 3; MIN: 11.07)
  2: 11.75 (SE +/- 0.02, N = 3; MIN: 11.36)
  3: 11.78 (SE +/- 0.05, N = 3; MIN: 11.5)
  (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 2.1.2 - Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  1: 1.26827 (SE +/- 0.00141, N = 3; MIN: 1.26)
  2: 1.27125 (SE +/- 0.00167, N = 3; MIN: 1.26)
  3: 1.27234 (SE +/- 0.00498, N = 3; MIN: 1.26)
  (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
AOM AV1 2.1-rc - Encoder Mode: Speed 8 Realtime (Frames Per Second, More Is Better)
  1: 30.06 (SE +/- 0.15, N = 3)
  2: 30.00 (SE +/- 0.25, N = 3)
  3: 29.64 (SE +/- 0.20, N = 3)
  (CXX) g++ options: -O3 -std=c++11 -U_FORTIFY_SOURCE -lm -lpthread
Xcompact3d Incompact3d 2021-03-11 - Input: input.i3d 129 Cells Per Direction (Seconds, Fewer Is Better)
  1: 16.70 (SE +/- 0.03, N = 3)
  2: 16.85 (SE +/- 0.01, N = 3)
  3: 16.86 (SE +/- 0.01, N = 3)
  (F9X) gfortran options: -cpp -O2 -funroll-loops -floop-optimize -fcray-pointer -fbacktrace -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi
oneDNN 2.1.2 - Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  1: 4.75486 (SE +/- 0.00438, N = 3; MIN: 4.62)
  2: 4.74640 (SE +/- 0.00637, N = 3; MIN: 4.5)
  3: 4.74045 (SE +/- 0.00145, N = 3; MIN: 4.64)
  (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 2.1.2 - Harness: IP Shapes 1D - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
  1: 10.60 (SE +/- 0.01, N = 3; MIN: 9.95)
  2: 10.59 (SE +/- 0.01, N = 3; MIN: 9.92)
  3: 10.59 (SE +/- 0.01, N = 3; MIN: 9.87)
  (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 2.1.2 - Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  1: 1.05123 (SE +/- 0.00328, N = 3; MIN: 1)
  2: 1.04921 (SE +/- 0.00059, N = 3; MIN: 1)
  3: 1.04890 (SE +/- 0.00161, N = 3; MIN: 1)
  (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 2.1.2 - Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  1: 1.61069 (SE +/- 0.00563, N = 3; MIN: 1.57)
  2: 1.61866 (SE +/- 0.00101, N = 3; MIN: 1.57)
  3: 1.60482 (SE +/- 0.00500, N = 3; MIN: 1.56)
  (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 2.1.2 - Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  1: 0.833192 (SE +/- 0.005518, N = 3; MIN: 0.8)
  2: 0.834428 (SE +/- 0.000758, N = 3; MIN: 0.79)
  3: 0.830736 (SE +/- 0.004141, N = 3; MIN: 0.8)
  (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 2.1.2 - Harness: Matrix Multiply Batch Shapes Transformer - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
  1: 3.56301 (SE +/- 0.00273, N = 3; MIN: 3.51)
  2: 3.56246 (SE +/- 0.00246, N = 3; MIN: 3.49)
  3: 3.56446 (SE +/- 0.00178, N = 3; MIN: 3.52)
  (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
Basis Universal 1.13 - Settings: UASTC Level 0 (Seconds, Fewer Is Better)
  1: 10.08 (SE +/- 0.02, N = 3)
  2: 10.07 (SE +/- 0.01, N = 3)
  3: 10.07 (SE +/- 0.01, N = 3)
  (CXX) g++ options: -std=c++11 -fvisibility=hidden -fPIC -fno-strict-aliasing -O3 -rdynamic -lm -lpthread
oneDNN 2.1.2 - Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  1: 3.93294 (SE +/- 0.02492, N = 3; MIN: 3.83)
  2: 3.85096 (SE +/- 0.02483, N = 3; MIN: 3.76)
  3: 3.91852 (SE +/- 0.03733, N = 3; MIN: 3.8)
  (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 2.1.2 - Harness: IP Shapes 3D - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
  1: 3.12486 (SE +/- 0.01034, N = 3; MIN: 3.02)
  2: 3.12458 (SE +/- 0.00672, N = 3; MIN: 3.02)
  3: 3.13052 (SE +/- 0.00725, N = 3; MIN: 3.03)
  (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 2.1.2 - Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  1: 1.45952 (SE +/- 0.01077, N = 3; MIN: 1.4)
  2: 1.42021 (SE +/- 0.00155, N = 3; MIN: 1.37)
  3: 1.44027 (SE +/- 0.00257, N = 3; MIN: 1.39)
  (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
Sysbench 1.0.20 - Test: RAM / Memory (MiB/sec, More Is Better)
  1: 15699.71 (SE +/- 44.47, N = 3)
  2: 15632.40 (SE +/- 37.93, N = 3)
  3: 15678.47 (SE +/- 4.37, N = 3)
  (CC) gcc options: -pthread -O2 -funroll-loops -rdynamic -ldl -laio -lm
oneDNN 2.1.2 - Harness: Convolution Batch Shapes Auto - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
  1: 16.28 (SE +/- 0.00, N = 3; MIN: 16.27)
  2: 16.29 (SE +/- 0.01, N = 3; MIN: 16.26)
  3: 16.28 (SE +/- 0.00, N = 3; MIN: 16.26)
  (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 2.1.2 - Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  1: 6.57468 (SE +/- 0.04270, N = 3; MIN: 6.45)
  2: 6.51645 (SE +/- 0.00975, N = 3; MIN: 6.45)
  3: 6.55537 (SE +/- 0.03840, N = 3; MIN: 6.44)
  (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 2.1.2 - Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  1: 6.39957 (SE +/- 0.08087, N = 3; MIN: 6.25)
  2: 6.21612 (SE +/- 0.09250, N = 3; MIN: 6.06)
  3: 6.16607 (SE +/- 0.05945, N = 3; MIN: 6.04)
  (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
SVT-HEVC 1.5.0 - Tuning: 7 - Input: Bosphorus 1080p (Frames Per Second, More Is Better)
  1: 120.65 (SE +/- 0.03, N = 3)
  2: 121.08 (SE +/- 0.10, N = 3)
  3: 120.93 (SE +/- 0.06, N = 3)
  (CC) gcc options: -fPIE -fPIC -O3 -O2 -pie -rdynamic -lpthread -lrt
SVT-VP9 0.3 - Tuning: Visual Quality Optimized - Input: Bosphorus 1080p (Frames Per Second, More Is Better)
  1: 169.62 (SE +/- 0.45, N = 3)
  2: 169.81 (SE +/- 0.35, N = 3)
  3: 170.48 (SE +/- 0.77, N = 3)
  (CC) gcc options: -O3 -fcommon -fPIE -fPIC -fvisibility=hidden -pie -rdynamic -lpthread -lrt -lm
SVT-VP9 0.3 - Tuning: VMAF Optimized - Input: Bosphorus 1080p (Frames Per Second, More Is Better)
  1: 216.44 (SE +/- 0.35, N = 3)
  2: 218.37 (SE +/- 0.27, N = 3)
  3: 219.08 (SE +/- 1.43, N = 3)
  (CC) gcc options: -O3 -fcommon -fPIE -fPIC -fvisibility=hidden -pie -rdynamic -lpthread -lrt -lm
SVT-VP9 0.3 - Tuning: PSNR/SSIM Optimized - Input: Bosphorus 1080p (Frames Per Second, More Is Better)
  1: 221.50 (SE +/- 0.92, N = 3)
  2: 221.91 (SE +/- 0.07, N = 3)
  3: 222.05 (SE +/- 0.82, N = 3)
  (CC) gcc options: -O3 -fcommon -fPIE -fPIC -fvisibility=hidden -pie -rdynamic -lpthread -lrt -lm
oneDNN 2.1.2 - Harness: Deconvolution Batch shapes_3d - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
  1: 21.77 (SE +/- 0.00, N = 3; MIN: 21.68)
  2: 21.77 (SE +/- 0.01, N = 3; MIN: 21.66)
  3: 21.78 (SE +/- 0.01, N = 3; MIN: 21.69)
  (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 2.1.2 - Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  1: 7.65076 (SE +/- 0.01058, N = 3; MIN: 7.62)
  2: 7.64752 (SE +/- 0.00616, N = 3; MIN: 7.62)
  3: 7.63646 (SE +/- 0.00108, N = 3; MIN: 7.62)
  (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 2.1.2 - Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  1: 1.88615 (SE +/- 0.00222, N = 3; MIN: 1.88)
  2: 1.88395 (SE +/- 0.00013, N = 3; MIN: 1.88)
  3: 1.88619 (SE +/- 0.00235, N = 3; MIN: 1.88)
  (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
SVT-HEVC 1.5.0 - Tuning: 10 - Input: Bosphorus 1080p (Frames Per Second, More Is Better)
  1: 258.29 (SE +/- 0.64, N = 3)
  2: 257.29 (SE +/- 0.42, N = 3)
  3: 255.87 (SE +/- 0.38, N = 3)
  (CC) gcc options: -fPIE -fPIC -O3 -O2 -pie -rdynamic -lpthread -lrt
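Each result is reported with a standard error (SE), which gives a rough way to judge whether two runs really differ. The sketch below is an illustration under stated assumptions, not something the Phoronix Test Suite does: it treats the two runs as independent, combines their reported SE values in quadrature, and expresses the gap between the means in units of that combined SE. The function name `difference_in_se` is hypothetical. With N = 3 samples per run, any threshold (such as "more than about 2") is a rule of thumb rather than a formal test.

```python
import math


def difference_in_se(mean_a, se_a, mean_b, se_b):
    """Gap between two means, in units of their combined standard error.

    Assumes the two runs are independent, so the SEs add in quadrature.
    """
    combined = math.sqrt(se_a ** 2 + se_b ** 2)
    if combined == 0.0:
        # Both SEs were reported as zero: any gap is unbounded in SE units.
        return 0.0 if mean_a == mean_b else math.inf
    return abs(mean_a - mean_b) / combined


if __name__ == "__main__":
    # SVT-HEVC, Tuning 10, Bosphorus 1080p: run 1 vs run 3 (values from above).
    ratio = difference_in_se(258.29, 0.64, 255.87, 0.38)
    print(f"run 1 vs run 3 differ by {ratio:.2f} combined standard errors")
```

Because all three runs use the same hardware and software configuration, a large ratio here points to run-to-run drift rather than to a configuration change.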
Phoronix Test Suite v10.8.4