Intel Xeon Silver 4216 testing with a TYAN S7100AG2NR (V4.02 BIOS) and ASPEED on Debian 10 via the Phoronix Test Suite.
Compare your own system(s) to this result file with the
Phoronix Test Suite by running the command:
    phoronix-test-suite benchmark 2103218-HA-XEONSILVE43
HTML result view exported from: https://openbenchmarking.org/result/2103218-HA-XEONSILVE43&grs&sor&rro .
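The comparison step above can be scripted. A minimal sketch, assuming the `phoronix-test-suite` CLI is installed and on PATH (the result ID is the one this file was exported from; the fallback message is illustrative):

```shell
#!/bin/sh
# Result ID of this exported run, taken from the header above.
RESULT_ID="2103218-HA-XEONSILVE43"

if command -v phoronix-test-suite >/dev/null 2>&1; then
    # Runs the same test selection locally and compares against this result file.
    phoronix-test-suite benchmark "$RESULT_ID"
else
    # CLI not available: print the command that would be run.
    echo "phoronix-test-suite not found; install it, then run:"
    echo "phoronix-test-suite benchmark $RESULT_ID"
fi
```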
Xeon Silver March - system details (identical configuration for runs 1, 2, and 3):

Processor: Intel Xeon Silver 4216 @ 3.20GHz (16 Cores / 32 Threads)
Motherboard: TYAN S7100AG2NR (V4.02 BIOS)
Chipset: Intel Sky Lake-E DMI3 Registers
Memory: 24GB
Disk: 240GB Corsair Force MP500
Graphics: ASPEED
Audio: Realtek ALC892
Network: 2 x Intel I350
OS: Debian 10
Kernel: 4.19.0-9-amd64 (x86_64)
Desktop: GNOME Shell 3.30.2
Display Server: X Server
Compiler: GCC 8.3.0
File-System: ext4
Screen Resolution: 1024x768

Kernel Details: Transparent Huge Pages: always
Compiler Details: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --enable-libmpx --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib --with-tune=generic --without-cuda-driver -v
Processor Details: Scaling Governor: intel_pstate powersave; CPU Microcode: 0x500002c
Python Details: Python 2.7.16 + Python 3.7.3
Security Details:
- itlb_multihit: KVM: Mitigation of Split huge pages
- l1tf: Not affected
- mds: Not affected
- meltdown: Not affected
- spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp
- spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
- spectre_v2: Mitigation of Enhanced IBRS IBPB: conditional RSB filling
- srbds: Not affected
- tsx_async_abort: Mitigation of TSX disabled
Xeon Silver March - result overview (values for run 1 | run 2 | run 3; per-test detail with SE and MIN/MAX follows below):

aom-av1: Speed 0 Two-Pass (FPS): 0.21 | 0.20 | 0.21
onednn: Convolution Batch Shapes Auto - u8s8f32 - CPU (ms): 6.39957 | 6.21612 | 6.16607
onednn: IP Shapes 3D - u8s8f32 - CPU (ms): 1.45952 | 1.42021 | 1.44027
onednn: IP Shapes 3D - f32 - CPU (ms): 3.93294 | 3.85096 | 3.91852
mnn: mobilenet-v1-1.0 (ms): 3.265 | 3.227 | 3.292
mnn: inception-v3 (ms): 55.095 | 54.547 | 54.090
stockfish: Total Time (Nodes/s): 31665033 | 31089824 | 31357896
aom-av1: Speed 8 Realtime (FPS): 30.06 | 30.00 | 29.64
mnn: resnet-v2-50 (ms): 43.656 | 43.985 | 44.232
mnn: SqueezeNetV1.0 (ms): 8.133 | 8.130 | 8.236
svt-vp9: VMAF Optimized - Bosphorus 1080p (FPS): 216.44 | 218.37 | 219.08
mnn: MobileNetV2_224 (ms): 5.038 | 5.095 | 5.099
luaradio: Complex Phase (MiB/s): 434.1 | 438.2 | 439.2
svt-hevc: 10 - Bosphorus 1080p (FPS): 258.29 | 257.29 | 255.87
incompact3d: input.i3d 129 Cells Per Direction (s): 16.7026520 | 16.8470281 | 16.8595543
onednn: Convolution Batch Shapes Auto - f32 - CPU (ms): 6.57468 | 6.51645 | 6.55537
simdjson: Kostya (GB/s): 1.13 | 1.12 | 1.13
onednn: Matrix Multiply Batch Shapes Transformer - f32 - CPU (ms): 1.61069 | 1.61866 | 1.60482
incompact3d: input.i3d 193 Cells Per Direction (s): 66.4466426 | 66.6669617 | 66.1601740
simdjson: PartialTweets (GB/s): 1.37 | 1.37 | 1.36
aom-av1: Speed 6 Realtime (FPS): 11.76 | 11.68 | 11.70
svt-vp9: Visual Quality Optimized - Bosphorus 1080p (FPS): 169.62 | 169.81 | 170.48
onednn: Matrix Multiply Batch Shapes Transformer - u8s8f32 - CPU (ms): 0.833192 | 0.834428 | 0.830736
sysbench: RAM / Memory (MiB/s): 15699.71 | 15632.40 | 15678.47
svt-hevc: 1 - Bosphorus 1080p (FPS): 7.88 | 7.86 | 7.89
svt-hevc: 7 - Bosphorus 1080p (FPS): 120.65 | 121.08 | 120.93
luaradio: FM Deemphasis Filter (MiB/s): 254.2 | 254.0 | 253.3
onednn: Deconvolution Batch shapes_1d - u8s8f32 - CPU (ms): 1.26827 | 1.27125 | 1.27234
onednn: IP Shapes 1D - f32 - CPU (ms): 4.75486 | 4.74640 | 4.74045
luaradio: Five Back to Back FIR Filters (MiB/s): 808.6 | 807.8 | 810.2
basis: ETC1S (s): 32.705 | 32.657 | 32.610
aom-av1: Speed 4 Two-Pass (FPS): 3.77 | 3.77 | 3.76
onednn: Recurrent Neural Network Inference - f32 - CPU (ms): 1859.12 | 1856.49 | 1854.28
svt-vp9: PSNR/SSIM Optimized - Bosphorus 1080p (FPS): 221.50 | 221.91 | 222.05
onednn: Deconvolution Batch shapes_1d - f32 - CPU (ms): 11.7592 | 11.7549 | 11.7840
onednn: IP Shapes 1D - u8s8f32 - CPU (ms): 1.05123 | 1.04921 | 1.0489
aom-av1: Speed 6 Two-Pass (FPS): 9.29 | 9.31 | 9.31
basis: UASTC Level 3 (s): 57.170 | 57.186 | 57.064
onednn: IP Shapes 3D - bf16bf16bf16 - CPU (ms): 3.12486 | 3.12458 | 3.13052
onednn: Deconvolution Batch shapes_3d - f32 - CPU (ms): 7.65076 | 7.64752 | 7.63646
basis: UASTC Level 0 (s): 10.084 | 10.066 | 10.072
onednn: Recurrent Neural Network Training - f32 - CPU (ms): 3586.70 | 3588.58 | 3592.55
onednn: Recurrent Neural Network Training - bf16bf16bf16 - CPU (ms): 3589.54 | 3593.32 | 3588.05
onednn: Recurrent Neural Network Inference - bf16bf16bf16 - CPU (ms): 1853.05 | 1853.25 | 1855.59
onednn: Deconvolution Batch shapes_3d - u8s8f32 - CPU (ms): 1.88615 | 1.88395 | 1.88619
onednn: IP Shapes 1D - bf16bf16bf16 - CPU (ms): 10.5995 | 10.5877 | 10.5943
onednn: Recurrent Neural Network Inference - u8s8f32 - CPU (ms): 1852.81 | 1853.80 | 1852.34
onednn: Convolution Batch Shapes Auto - bf16bf16bf16 - CPU (ms): 16.2810 | 16.2907 | 16.2782
basis: UASTC Level 2 (s): 31.892 | 31.871 | 31.868
onednn: Recurrent Neural Network Training - u8s8f32 - CPU (ms): 3589.87 | 3589.97 | 3592.30
onednn: Deconvolution Batch shapes_1d - bf16bf16bf16 - CPU (ms): 18.4274 | 18.4257 | 18.4369
onednn: Matrix Multiply Batch Shapes Transformer - bf16bf16bf16 - CPU (ms): 3.56301 | 3.56246 | 3.56446
onednn: Deconvolution Batch shapes_3d - bf16bf16bf16 - CPU (ms): 21.7683 | 21.7742 | 21.7774
sysbench: CPU (events/s): 22943.61 | 22942.83 | 22943.20
luaradio: Hilbert Transform (MiB/s): 55.4 | 55.4 | 55.4
simdjson: DistinctUserID (GB/s): 1.48 | 1.48 | 1.48
simdjson: LargeRandom (GB/s): 0.38 | 0.38 | 0.38
AOM AV1 2.1-rc - Encoder Mode: Speed 0 Two-Pass (Frames Per Second; more is better)
Run 1: 0.21 (SE +/- 0.00), Run 2: 0.20 (SE +/- 0.00), Run 3: 0.21 (SE +/- 0.00); N = 3
1. (CXX) g++ options: -O3 -std=c++11 -U_FORTIFY_SOURCE -lm -lpthread

oneDNN 2.1.2 - Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU (ms; fewer is better)
Run 1: 6.39957 (SE +/- 0.08087, MIN: 6.25), Run 2: 6.21612 (SE +/- 0.09250, MIN: 6.06), Run 3: 6.16607 (SE +/- 0.05945, MIN: 6.04); N = 3
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 2.1.2 - Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU (ms; fewer is better)
Run 1: 1.45952 (SE +/- 0.01077, MIN: 1.4), Run 2: 1.42021 (SE +/- 0.00155, MIN: 1.37), Run 3: 1.44027 (SE +/- 0.00257, MIN: 1.39); N = 3
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 2.1.2 - Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU (ms; fewer is better)
Run 1: 3.93294 (SE +/- 0.02492, MIN: 3.83), Run 2: 3.85096 (SE +/- 0.02483, MIN: 3.76), Run 3: 3.91852 (SE +/- 0.03733, MIN: 3.8); N = 3
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

Mobile Neural Network 1.1.3 - Model: mobilenet-v1-1.0 (ms; fewer is better)
Run 1: 3.265 (SE +/- 0.044, MIN: 3.07 / MAX: 3.63), Run 2: 3.227 (SE +/- 0.027, MIN: 2.92 / MAX: 16.04), Run 3: 3.292 (SE +/- 0.015, MIN: 3.15 / MAX: 3.55); N = 3
1. (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -rdynamic -pthread -ldl

Mobile Neural Network 1.1.3 - Model: inception-v3 (ms; fewer is better)
Run 1: 55.10 (SE +/- 0.48, MIN: 53.83 / MAX: 68.3), Run 2: 54.55 (SE +/- 0.47, MIN: 53.67 / MAX: 69.4), Run 3: 54.09 (SE +/- 0.05, MIN: 53.71 / MAX: 67.03); N = 3
1. (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -rdynamic -pthread -ldl
Stockfish 13 - Total Time (Nodes Per Second; more is better)
Run 1: 31665033 (SE +/- 284219.69, N = 3), Run 2: 31089824 (SE +/- 411515.60, N = 5), Run 3: 31357896 (SE +/- 418440.60, N = 4)
1. (CXX) g++ options: -lgcov -m64 -lpthread -fno-exceptions -std=c++17 -fprofile-use -fno-peel-loops -fno-tracer -pedantic -O3 -msse -msse3 -mpopcnt -mavx2 -mavx512f -mavx512bw -mavx512vnni -mavx512dq -mavx512vl -msse4.1 -mssse3 -msse2 -mbmi2 -flto -flto=jobserver

AOM AV1 2.1-rc - Encoder Mode: Speed 8 Realtime (Frames Per Second; more is better)
Run 1: 30.06 (SE +/- 0.15), Run 2: 30.00 (SE +/- 0.25), Run 3: 29.64 (SE +/- 0.20); N = 3
1. (CXX) g++ options: -O3 -std=c++11 -U_FORTIFY_SOURCE -lm -lpthread

Mobile Neural Network 1.1.3 - Model: resnet-v2-50 (ms; fewer is better)
Run 1: 43.66 (SE +/- 0.14, MIN: 43.28 / MAX: 56.66), Run 2: 43.99 (SE +/- 0.20, MIN: 43.06 / MAX: 56.17), Run 3: 44.23 (SE +/- 0.17, MIN: 43.21 / MAX: 56.94); N = 3
1. (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -rdynamic -pthread -ldl

Mobile Neural Network 1.1.3 - Model: SqueezeNetV1.0 (ms; fewer is better)
Run 1: 8.133 (SE +/- 0.082, MIN: 7.75 / MAX: 14.52), Run 2: 8.130 (SE +/- 0.048, MIN: 7.68 / MAX: 12.26), Run 3: 8.236 (SE +/- 0.068, MIN: 7.7 / MAX: 9.92); N = 3
1. (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -rdynamic -pthread -ldl

SVT-VP9 0.3 - Tuning: VMAF Optimized - Input: Bosphorus 1080p (Frames Per Second; more is better)
Run 1: 216.44 (SE +/- 0.35), Run 2: 218.37 (SE +/- 0.27), Run 3: 219.08 (SE +/- 1.43); N = 3
1. (CC) gcc options: -O3 -fcommon -fPIE -fPIC -fvisibility=hidden -pie -rdynamic -lpthread -lrt -lm

Mobile Neural Network 1.1.3 - Model: MobileNetV2_224 (ms; fewer is better)
Run 1: 5.038 (SE +/- 0.022, MIN: 4.8 / MAX: 5.28), Run 2: 5.095 (SE +/- 0.041, MIN: 4.61 / MAX: 8.07), Run 3: 5.099 (SE +/- 0.012, MIN: 4.59 / MAX: 10.01); N = 3
1. (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -rdynamic -pthread -ldl
LuaRadio 0.9.1 - Test: Complex Phase (MiB/s; more is better)
Run 1: 434.1 (SE +/- 3.21), Run 2: 438.2 (SE +/- 3.89), Run 3: 439.2 (SE +/- 1.11); N = 3

SVT-HEVC 1.5.0 - Tuning: 10 - Input: Bosphorus 1080p (Frames Per Second; more is better)
Run 1: 258.29 (SE +/- 0.64), Run 2: 257.29 (SE +/- 0.42), Run 3: 255.87 (SE +/- 0.38); N = 3
1. (CC) gcc options: -fPIE -fPIC -O3 -O2 -pie -rdynamic -lpthread -lrt

Xcompact3d Incompact3d 2021-03-11 - Input: input.i3d 129 Cells Per Direction (Seconds; fewer is better)
Run 1: 16.70 (SE +/- 0.03), Run 2: 16.85 (SE +/- 0.01), Run 3: 16.86 (SE +/- 0.01); N = 3
1. (F9X) gfortran options: -cpp -O2 -funroll-loops -floop-optimize -fcray-pointer -fbacktrace -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi

oneDNN 2.1.2 - Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU (ms; fewer is better)
Run 1: 6.57468 (SE +/- 0.04270, MIN: 6.45), Run 2: 6.51645 (SE +/- 0.00975, MIN: 6.45), Run 3: 6.55537 (SE +/- 0.03840, MIN: 6.44); N = 3
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

simdjson 0.8.2 - Throughput Test: Kostya (GB/s; more is better)
Run 1: 1.13 (SE +/- 0.00, N = 3), Run 2: 1.12 (SE +/- 0.02, N = 4), Run 3: 1.13 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -O3 -pthread

oneDNN 2.1.2 - Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU (ms; fewer is better)
Run 1: 1.61069 (SE +/- 0.00563, MIN: 1.57), Run 2: 1.61866 (SE +/- 0.00101, MIN: 1.57), Run 3: 1.60482 (SE +/- 0.00500, MIN: 1.56); N = 3
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
Xcompact3d Incompact3d 2021-03-11 - Input: input.i3d 193 Cells Per Direction (Seconds; fewer is better)
Run 1: 66.45 (SE +/- 0.08), Run 2: 66.67 (SE +/- 0.33), Run 3: 66.16 (SE +/- 0.37); N = 3
1. (F9X) gfortran options: -cpp -O2 -funroll-loops -floop-optimize -fcray-pointer -fbacktrace -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi

simdjson 0.8.2 - Throughput Test: PartialTweets (GB/s; more is better)
Run 1: 1.37, Run 2: 1.37, Run 3: 1.36 (each SE +/- 0.00, N = 3)
1. (CXX) g++ options: -O3 -pthread

AOM AV1 2.1-rc - Encoder Mode: Speed 6 Realtime (Frames Per Second; more is better)
Run 1: 11.76 (SE +/- 0.06), Run 2: 11.68 (SE +/- 0.06), Run 3: 11.70 (SE +/- 0.05); N = 3
1. (CXX) g++ options: -O3 -std=c++11 -U_FORTIFY_SOURCE -lm -lpthread

SVT-VP9 0.3 - Tuning: Visual Quality Optimized - Input: Bosphorus 1080p (Frames Per Second; more is better)
Run 1: 169.62 (SE +/- 0.45), Run 2: 169.81 (SE +/- 0.35), Run 3: 170.48 (SE +/- 0.77); N = 3
1. (CC) gcc options: -O3 -fcommon -fPIE -fPIC -fvisibility=hidden -pie -rdynamic -lpthread -lrt -lm

oneDNN 2.1.2 - Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32 - Engine: CPU (ms; fewer is better)
Run 1: 0.833192 (SE +/- 0.005518, MIN: 0.8), Run 2: 0.834428 (SE +/- 0.000758, MIN: 0.79), Run 3: 0.830736 (SE +/- 0.004141, MIN: 0.8); N = 3
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

Sysbench 1.0.20 - Test: RAM / Memory (MiB/sec; more is better)
Run 1: 15699.71 (SE +/- 44.47), Run 2: 15632.40 (SE +/- 37.93), Run 3: 15678.47 (SE +/- 4.37); N = 3
1. (CC) gcc options: -pthread -O2 -funroll-loops -rdynamic -ldl -laio -lm
SVT-HEVC 1.5.0 - Tuning: 1 - Input: Bosphorus 1080p (Frames Per Second; more is better)
Run 1: 7.88 (SE +/- 0.01), Run 2: 7.86 (SE +/- 0.01), Run 3: 7.89 (SE +/- 0.00); N = 3
1. (CC) gcc options: -fPIE -fPIC -O3 -O2 -pie -rdynamic -lpthread -lrt

SVT-HEVC 1.5.0 - Tuning: 7 - Input: Bosphorus 1080p (Frames Per Second; more is better)
Run 1: 120.65 (SE +/- 0.03), Run 2: 121.08 (SE +/- 0.10), Run 3: 120.93 (SE +/- 0.06); N = 3
1. (CC) gcc options: -fPIE -fPIC -O3 -O2 -pie -rdynamic -lpthread -lrt

LuaRadio 0.9.1 - Test: FM Deemphasis Filter (MiB/s; more is better)
Run 1: 254.2 (SE +/- 0.03), Run 2: 254.0 (SE +/- 0.09), Run 3: 253.3 (SE +/- 0.66); N = 3

oneDNN 2.1.2 - Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU (ms; fewer is better)
Run 1: 1.26827 (SE +/- 0.00141), Run 2: 1.27125 (SE +/- 0.00167), Run 3: 1.27234 (SE +/- 0.00498); N = 3; MIN: 1.26 for all runs
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 2.1.2 - Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU (ms; fewer is better)
Run 1: 4.75486 (SE +/- 0.00438, MIN: 4.62), Run 2: 4.74640 (SE +/- 0.00637, MIN: 4.5), Run 3: 4.74045 (SE +/- 0.00145, MIN: 4.64); N = 3
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

LuaRadio 0.9.1 - Test: Five Back to Back FIR Filters (MiB/s; more is better)
Run 1: 808.6 (SE +/- 2.34), Run 2: 807.8 (SE +/- 2.49), Run 3: 810.2 (SE +/- 0.70); N = 3
Basis Universal 1.13 - Settings: ETC1S (Seconds; fewer is better)
Run 1: 32.71 (SE +/- 0.02), Run 2: 32.66 (SE +/- 0.09), Run 3: 32.61 (SE +/- 0.07); N = 3
1. (CXX) g++ options: -std=c++11 -fvisibility=hidden -fPIC -fno-strict-aliasing -O3 -rdynamic -lm -lpthread

AOM AV1 2.1-rc - Encoder Mode: Speed 4 Two-Pass (Frames Per Second; more is better)
Run 1: 3.77 (SE +/- 0.00), Run 2: 3.77 (SE +/- 0.01), Run 3: 3.76 (SE +/- 0.01); N = 3
1. (CXX) g++ options: -O3 -std=c++11 -U_FORTIFY_SOURCE -lm -lpthread

oneDNN 2.1.2 - Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU (ms; fewer is better)
Run 1: 1859.12 (SE +/- 1.06, MIN: 1850.63), Run 2: 1856.49 (SE +/- 5.14, MIN: 1849.03), Run 3: 1854.28 (SE +/- 0.29, MIN: 1851.24); N = 3
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

SVT-VP9 0.3 - Tuning: PSNR/SSIM Optimized - Input: Bosphorus 1080p (Frames Per Second; more is better)
Run 1: 221.50 (SE +/- 0.92), Run 2: 221.91 (SE +/- 0.07), Run 3: 222.05 (SE +/- 0.82); N = 3
1. (CC) gcc options: -O3 -fcommon -fPIE -fPIC -fvisibility=hidden -pie -rdynamic -lpthread -lrt -lm

oneDNN 2.1.2 - Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU (ms; fewer is better)
Run 1: 11.76 (SE +/- 0.02, MIN: 11.07), Run 2: 11.75 (SE +/- 0.02, MIN: 11.36), Run 3: 11.78 (SE +/- 0.05, MIN: 11.5); N = 3
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 2.1.2 - Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPU (ms; fewer is better)
Run 1: 1.05123 (SE +/- 0.00328), Run 2: 1.04921 (SE +/- 0.00059), Run 3: 1.04890 (SE +/- 0.00161); N = 3; MIN: 1 for all runs
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
AOM AV1 2.1-rc - Encoder Mode: Speed 6 Two-Pass (Frames Per Second; more is better)
Run 1: 9.29 (SE +/- 0.03), Run 2: 9.31 (SE +/- 0.03), Run 3: 9.31 (SE +/- 0.04); N = 3
1. (CXX) g++ options: -O3 -std=c++11 -U_FORTIFY_SOURCE -lm -lpthread

Basis Universal 1.13 - Settings: UASTC Level 3 (Seconds; fewer is better)
Run 1: 57.17 (SE +/- 0.00), Run 2: 57.19 (SE +/- 0.02), Run 3: 57.06 (SE +/- 0.06); N = 3
1. (CXX) g++ options: -std=c++11 -fvisibility=hidden -fPIC -fno-strict-aliasing -O3 -rdynamic -lm -lpthread

oneDNN 2.1.2 - Harness: IP Shapes 3D - Data Type: bf16bf16bf16 - Engine: CPU (ms; fewer is better)
Run 1: 3.12486 (SE +/- 0.01034, MIN: 3.02), Run 2: 3.12458 (SE +/- 0.00672, MIN: 3.02), Run 3: 3.13052 (SE +/- 0.00725, MIN: 3.03); N = 3
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 2.1.2 - Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU (ms; fewer is better)
Run 1: 7.65076 (SE +/- 0.01058), Run 2: 7.64752 (SE +/- 0.00616), Run 3: 7.63646 (SE +/- 0.00108); N = 3; MIN: 7.62 for all runs
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

Basis Universal 1.13 - Settings: UASTC Level 0 (Seconds; fewer is better)
Run 1: 10.08 (SE +/- 0.02), Run 2: 10.07 (SE +/- 0.01), Run 3: 10.07 (SE +/- 0.01); N = 3
1. (CXX) g++ options: -std=c++11 -fvisibility=hidden -fPIC -fno-strict-aliasing -O3 -rdynamic -lm -lpthread

oneDNN 2.1.2 - Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU (ms; fewer is better)
Run 1: 3586.70 (SE +/- 1.49, MIN: 3583.33), Run 2: 3588.58 (SE +/- 1.40, MIN: 3583.51), Run 3: 3592.55 (SE +/- 5.39, MIN: 3582.49); N = 3
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 2.1.2 - Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU (ms; fewer is better)
Run 1: 3589.54 (SE +/- 2.28, MIN: 3583), Run 2: 3593.32 (SE +/- 4.71, MIN: 3583.23), Run 3: 3588.05 (SE +/- 0.91, MIN: 3584.71); N = 3
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 2.1.2 - Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU (ms; fewer is better)
Run 1: 1853.05 (SE +/- 0.80, MIN: 1849.21), Run 2: 1853.25 (SE +/- 1.77, MIN: 1848.5), Run 3: 1855.59 (SE +/- 4.07, MIN: 1848.04); N = 3
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 2.1.2 - Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU (ms; fewer is better)
Run 1: 1.88615 (SE +/- 0.00222), Run 2: 1.88395 (SE +/- 0.00013), Run 3: 1.88619 (SE +/- 0.00235); N = 3; MIN: 1.88 for all runs
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 2.1.2 - Harness: IP Shapes 1D - Data Type: bf16bf16bf16 - Engine: CPU (ms; fewer is better)
Run 1: 10.60 (SE +/- 0.01, MIN: 9.95), Run 2: 10.59 (SE +/- 0.01, MIN: 9.92), Run 3: 10.59 (SE +/- 0.01, MIN: 9.87); N = 3
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 2.1.2 - Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU (ms; fewer is better)
Run 1: 1852.81 (SE +/- 0.49, MIN: 1850.49), Run 2: 1853.80 (SE +/- 1.78, MIN: 1848.59), Run 3: 1852.34 (SE +/- 0.88, MIN: 1848.76); N = 3
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 2.1.2 - Harness: Convolution Batch Shapes Auto - Data Type: bf16bf16bf16 - Engine: CPU (ms; fewer is better)
Run 1: 16.28 (SE +/- 0.00, MIN: 16.27), Run 2: 16.29 (SE +/- 0.01, MIN: 16.26), Run 3: 16.28 (SE +/- 0.00, MIN: 16.26); N = 3
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
Basis Universal 1.13 - Settings: UASTC Level 2 (Seconds; fewer is better)
Run 1: 31.89 (SE +/- 0.01), Run 2: 31.87 (SE +/- 0.03), Run 3: 31.87 (SE +/- 0.02); N = 3
1. (CXX) g++ options: -std=c++11 -fvisibility=hidden -fPIC -fno-strict-aliasing -O3 -rdynamic -lm -lpthread

oneDNN 2.1.2 - Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU (ms; fewer is better)
Run 1: 3589.87 (SE +/- 1.80, MIN: 3583.15), Run 2: 3589.97 (SE +/- 3.00, MIN: 3582.71), Run 3: 3592.30 (SE +/- 5.29, MIN: 3584.03); N = 3
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 2.1.2 - Harness: Deconvolution Batch shapes_1d - Data Type: bf16bf16bf16 - Engine: CPU (ms; fewer is better)
Run 1: 18.43 (SE +/- 0.02), Run 2: 18.43 (SE +/- 0.00), Run 3: 18.44 (SE +/- 0.02); N = 3; MIN: 18.28 for all runs
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 2.1.2 - Harness: Matrix Multiply Batch Shapes Transformer - Data Type: bf16bf16bf16 - Engine: CPU (ms; fewer is better)
Run 1: 3.56301 (SE +/- 0.00273, MIN: 3.51), Run 2: 3.56246 (SE +/- 0.00246, MIN: 3.49), Run 3: 3.56446 (SE +/- 0.00178, MIN: 3.52); N = 3
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 2.1.2 - Harness: Deconvolution Batch shapes_3d - Data Type: bf16bf16bf16 - Engine: CPU (ms; fewer is better)
Run 1: 21.77 (SE +/- 0.00, MIN: 21.68), Run 2: 21.77 (SE +/- 0.01, MIN: 21.66), Run 3: 21.78 (SE +/- 0.01, MIN: 21.69); N = 3
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
Sysbench 1.0.20 - Test: CPU (Events Per Second; more is better)
Run 1: 22943.61 (SE +/- 0.67), Run 2: 22942.83 (SE +/- 0.76), Run 3: 22943.20 (SE +/- 1.02); N = 3
1. (CC) gcc options: -pthread -O2 -funroll-loops -rdynamic -ldl -laio -lm

LuaRadio 0.9.1 - Test: Hilbert Transform (MiB/s; more is better)
Run 1: 55.4 (SE +/- 0.03), Run 2: 55.4 (SE +/- 0.06), Run 3: 55.4 (SE +/- 0.07); N = 3

simdjson 0.8.2 - Throughput Test: DistinctUserID (GB/s; more is better)
Run 1: 1.48, Run 2: 1.48, Run 3: 1.48 (each SE +/- 0.00, N = 3)
1. (CXX) g++ options: -O3 -pthread

simdjson 0.8.2 - Throughput Test: LargeRandom (GB/s; more is better)
Run 1: 0.38, Run 2: 0.38, Run 3: 0.38 (each SE +/- 0.00, N = 3)
1. (CXX) g++ options: -O3 -pthread
Phoronix Test Suite v10.8.4