Intel Xeon Silver 4216 testing with a TYAN S7100AG2NR (V4.02 BIOS) and ASPEED on Debian 10 via the Phoronix Test Suite.
Compare your own system(s) to this result file with the Phoronix Test Suite by running the command:

    phoronix-test-suite benchmark 2103218-HA-XEONSILVE43
HTML result view exported from: https://openbenchmarking.org/result/2103218-HA-XEONSILVE43&sor&gru .
Xeon Silver March - System Details (identical across runs 1, 2, and 3)

    Processor:          Intel Xeon Silver 4216 @ 3.20GHz (16 Cores / 32 Threads)
    Motherboard:        TYAN S7100AG2NR (V4.02 BIOS)
    Chipset:            Intel Sky Lake-E DMI3 Registers
    Memory:             24GB
    Disk:               240GB Corsair Force MP500
    Graphics:           ASPEED
    Audio:              Realtek ALC892
    Network:            2 x Intel I350
    OS:                 Debian 10
    Kernel:             4.19.0-9-amd64 (x86_64)
    Desktop:            GNOME Shell 3.30.2
    Display Server:     X Server
    Compiler:           GCC 8.3.0
    File-System:        ext4
    Screen Resolution:  1024x768

Kernel Details:    Transparent Huge Pages: always
Compiler Details:  --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --enable-libmpx --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib --with-tune=generic --without-cuda-driver -v
Processor Details: Scaling Governor: intel_pstate powersave - CPU Microcode: 0x500002c
Python Details:    Python 2.7.16 + Python 3.7.3
Security Details:
    itlb_multihit: KVM: Mitigation of Split huge pages
    l1tf: Not affected
    mds: Not affected
    meltdown: Not affected
    spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp
    spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
    spectre_v2: Mitigation of Enhanced IBRS IBPB: conditional RSB filling
    srbds: Not affected
    tsx_async_abort: Mitigation of TSX disabled
Xeon Silver March - Result Overview (mean value per run; units, standard errors, and sorted comparisons appear in the per-test sections below)

    Test | Run 1 | Run 2 | Run 3
    sysbench: CPU | 22943.61 | 22942.83 | 22943.20
    aom-av1: Speed 0 Two-Pass | 0.21 | 0.20 | 0.21
    aom-av1: Speed 4 Two-Pass | 3.77 | 3.77 | 3.76
    aom-av1: Speed 6 Realtime | 11.76 | 11.68 | 11.70
    aom-av1: Speed 6 Two-Pass | 9.29 | 9.31 | 9.31
    aom-av1: Speed 8 Realtime | 30.06 | 30.00 | 29.64
    svt-hevc: 1 - Bosphorus 1080p | 7.88 | 7.86 | 7.89
    svt-hevc: 7 - Bosphorus 1080p | 120.65 | 121.08 | 120.93
    svt-hevc: 10 - Bosphorus 1080p | 258.29 | 257.29 | 255.87
    svt-vp9: VMAF Optimized - Bosphorus 1080p | 216.44 | 218.37 | 219.08
    svt-vp9: PSNR/SSIM Optimized - Bosphorus 1080p | 221.50 | 221.91 | 222.05
    svt-vp9: Visual Quality Optimized - Bosphorus 1080p | 169.62 | 169.81 | 170.48
    simdjson: Kostya | 1.13 | 1.12 | 1.13
    simdjson: LargeRand | 0.38 | 0.38 | 0.38
    simdjson: PartialTweets | 1.37 | 1.37 | 1.36
    simdjson: DistinctUserID | 1.48 | 1.48 | 1.48
    luaradio: Five Back to Back FIR Filters | 808.6 | 807.8 | 810.2
    luaradio: FM Deemphasis Filter | 254.2 | 254.0 | 253.3
    luaradio: Hilbert Transform | 55.4 | 55.4 | 55.4
    luaradio: Complex Phase | 434.1 | 438.2 | 439.2
    sysbench: RAM / Memory | 15699.71 | 15632.40 | 15678.47
    stockfish: Total Time | 31665033 | 31089824 | 31357896
    onednn: IP Shapes 1D - f32 - CPU | 4.75486 | 4.74640 | 4.74045
    onednn: IP Shapes 3D - f32 - CPU | 3.93294 | 3.85096 | 3.91852
    onednn: IP Shapes 1D - u8s8f32 - CPU | 1.05123 | 1.04921 | 1.0489
    onednn: IP Shapes 3D - u8s8f32 - CPU | 1.45952 | 1.42021 | 1.44027
    onednn: IP Shapes 1D - bf16bf16bf16 - CPU | 10.5995 | 10.5877 | 10.5943
    onednn: IP Shapes 3D - bf16bf16bf16 - CPU | 3.12486 | 3.12458 | 3.13052
    onednn: Convolution Batch Shapes Auto - f32 - CPU | 6.57468 | 6.51645 | 6.55537
    onednn: Deconvolution Batch shapes_1d - f32 - CPU | 11.7592 | 11.7549 | 11.7840
    onednn: Deconvolution Batch shapes_3d - f32 - CPU | 7.65076 | 7.64752 | 7.63646
    onednn: Convolution Batch Shapes Auto - u8s8f32 - CPU | 6.39957 | 6.21612 | 6.16607
    onednn: Deconvolution Batch shapes_1d - u8s8f32 - CPU | 1.26827 | 1.27125 | 1.27234
    onednn: Deconvolution Batch shapes_3d - u8s8f32 - CPU | 1.88615 | 1.88395 | 1.88619
    onednn: Recurrent Neural Network Training - f32 - CPU | 3586.70 | 3588.58 | 3592.55
    onednn: Recurrent Neural Network Inference - f32 - CPU | 1859.12 | 1856.49 | 1854.28
    onednn: Recurrent Neural Network Training - u8s8f32 - CPU | 3589.87 | 3589.97 | 3592.30
    onednn: Convolution Batch Shapes Auto - bf16bf16bf16 - CPU | 16.2810 | 16.2907 | 16.2782
    onednn: Deconvolution Batch shapes_1d - bf16bf16bf16 - CPU | 18.4274 | 18.4257 | 18.4369
    onednn: Deconvolution Batch shapes_3d - bf16bf16bf16 - CPU | 21.7683 | 21.7742 | 21.7774
    onednn: Recurrent Neural Network Inference - u8s8f32 - CPU | 1852.81 | 1853.80 | 1852.34
    onednn: Matrix Multiply Batch Shapes Transformer - f32 - CPU | 1.61069 | 1.61866 | 1.60482
    onednn: Recurrent Neural Network Training - bf16bf16bf16 - CPU | 3589.54 | 3593.32 | 3588.05
    onednn: Recurrent Neural Network Inference - bf16bf16bf16 - CPU | 1853.05 | 1853.25 | 1855.59
    onednn: Matrix Multiply Batch Shapes Transformer - u8s8f32 - CPU | 0.833192 | 0.834428 | 0.830736
    onednn: Matrix Multiply Batch Shapes Transformer - bf16bf16bf16 - CPU | 3.56301 | 3.56246 | 3.56446
    mnn: SqueezeNetV1.0 | 8.133 | 8.130 | 8.236
    mnn: resnet-v2-50 | 43.656 | 43.985 | 44.232
    mnn: MobileNetV2_224 | 5.038 | 5.095 | 5.099
    mnn: mobilenet-v1-1.0 | 3.265 | 3.227 | 3.292
    mnn: inception-v3 | 55.095 | 54.547 | 54.090
    incompact3d: input.i3d 129 Cells Per Direction | 16.7026520 | 16.8470281 | 16.8595543
    incompact3d: input.i3d 193 Cells Per Direction | 66.4466426 | 66.6669617 | 66.1601740
    basis: ETC1S | 32.705 | 32.657 | 32.610
    basis: UASTC Level 0 | 10.084 | 10.066 | 10.072
    basis: UASTC Level 2 | 31.892 | 31.871 | 31.868
    basis: UASTC Level 3 | 57.170 | 57.186 | 57.064
Sysbench 1.0.20 - Test: CPU (Events Per Second, More Is Better)
    Run 1: 22943.61    (SE +/- 0.67, N = 3)
    Run 3: 22943.20    (SE +/- 1.02, N = 3)
    Run 2: 22942.83    (SE +/- 0.76, N = 3)
    1. (CC) gcc options: -pthread -O2 -funroll-loops -rdynamic -ldl -laio -lm
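Each result in this file is the mean of N trials, and "SE +/-" is the standard error of that mean: the sample standard deviation divided by the square root of N. A minimal sketch of the calculation, using hypothetical per-trial scores (the individual trial values are not part of this export):

```python
import math
import statistics

def standard_error(samples):
    # Standard error of the mean: sample standard deviation / sqrt(N).
    return statistics.stdev(samples) / math.sqrt(len(samples))

# Hypothetical per-trial Events Per Second scores for one N = 3 run:
trials = [22942.8, 22943.6, 22944.4]
print(f"{statistics.mean(trials):.2f} (SE +/- {standard_error(trials):.2f}, N = {len(trials)})")
# -> 22943.60 (SE +/- 0.46, N = 3)
```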
AOM AV1 2.1-rc - Encoder Mode: Speed 0 Two-Pass (Frames Per Second, More Is Better)
    Run 3: 0.21    (SE +/- 0.00, N = 3)
    Run 1: 0.21    (SE +/- 0.00, N = 3)
    Run 2: 0.20    (SE +/- 0.00, N = 3)
    1. (CXX) g++ options: -O3 -std=c++11 -U_FORTIFY_SOURCE -lm -lpthread

AOM AV1 2.1-rc - Encoder Mode: Speed 4 Two-Pass (Frames Per Second, More Is Better)
    Run 2: 3.77    (SE +/- 0.01, N = 3)
    Run 1: 3.77    (SE +/- 0.00, N = 3)
    Run 3: 3.76    (SE +/- 0.01, N = 3)
    1. (CXX) g++ options: -O3 -std=c++11 -U_FORTIFY_SOURCE -lm -lpthread

AOM AV1 2.1-rc - Encoder Mode: Speed 6 Realtime (Frames Per Second, More Is Better)
    Run 1: 11.76    (SE +/- 0.06, N = 3)
    Run 3: 11.70    (SE +/- 0.05, N = 3)
    Run 2: 11.68    (SE +/- 0.06, N = 3)
    1. (CXX) g++ options: -O3 -std=c++11 -U_FORTIFY_SOURCE -lm -lpthread

AOM AV1 2.1-rc - Encoder Mode: Speed 6 Two-Pass (Frames Per Second, More Is Better)
    Run 3: 9.31    (SE +/- 0.04, N = 3)
    Run 2: 9.31    (SE +/- 0.03, N = 3)
    Run 1: 9.29    (SE +/- 0.03, N = 3)
    1. (CXX) g++ options: -O3 -std=c++11 -U_FORTIFY_SOURCE -lm -lpthread

AOM AV1 2.1-rc - Encoder Mode: Speed 8 Realtime (Frames Per Second, More Is Better)
    Run 1: 30.06    (SE +/- 0.15, N = 3)
    Run 2: 30.00    (SE +/- 0.25, N = 3)
    Run 3: 29.64    (SE +/- 0.20, N = 3)
    1. (CXX) g++ options: -O3 -std=c++11 -U_FORTIFY_SOURCE -lm -lpthread
SVT-HEVC 1.5.0 - Tuning: 1 - Input: Bosphorus 1080p (Frames Per Second, More Is Better)
    Run 3: 7.89    (SE +/- 0.00, N = 3)
    Run 1: 7.88    (SE +/- 0.01, N = 3)
    Run 2: 7.86    (SE +/- 0.01, N = 3)
    1. (CC) gcc options: -fPIE -fPIC -O3 -O2 -pie -rdynamic -lpthread -lrt

SVT-HEVC 1.5.0 - Tuning: 7 - Input: Bosphorus 1080p (Frames Per Second, More Is Better)
    Run 2: 121.08    (SE +/- 0.10, N = 3)
    Run 3: 120.93    (SE +/- 0.06, N = 3)
    Run 1: 120.65    (SE +/- 0.03, N = 3)
    1. (CC) gcc options: -fPIE -fPIC -O3 -O2 -pie -rdynamic -lpthread -lrt

SVT-HEVC 1.5.0 - Tuning: 10 - Input: Bosphorus 1080p (Frames Per Second, More Is Better)
    Run 1: 258.29    (SE +/- 0.64, N = 3)
    Run 2: 257.29    (SE +/- 0.42, N = 3)
    Run 3: 255.87    (SE +/- 0.38, N = 3)
    1. (CC) gcc options: -fPIE -fPIC -O3 -O2 -pie -rdynamic -lpthread -lrt
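The SVT-HEVC tuning presets trade encode quality for speed, and the three presets tested span a wide range. A quick back-of-envelope check (not part of the benchmark), using the run 1 frame rates from the SVT-HEVC results above:

```python
# SVT-HEVC Bosphorus 1080p frames per second, run 1, from the results above.
preset_fps = {1: 7.88, 7: 120.65, 10: 258.29}

for preset in (7, 10):
    speedup = preset_fps[preset] / preset_fps[1]
    print(f"tuning {preset} encodes {speedup:.1f}x faster than tuning 1")
# -> tuning 7 encodes 15.3x faster than tuning 1
# -> tuning 10 encodes 32.8x faster than tuning 1
```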
SVT-VP9 0.3 - Tuning: VMAF Optimized - Input: Bosphorus 1080p (Frames Per Second, More Is Better)
    Run 3: 219.08    (SE +/- 1.43, N = 3)
    Run 2: 218.37    (SE +/- 0.27, N = 3)
    Run 1: 216.44    (SE +/- 0.35, N = 3)
    1. (CC) gcc options: -O3 -fcommon -fPIE -fPIC -fvisibility=hidden -pie -rdynamic -lpthread -lrt -lm

SVT-VP9 0.3 - Tuning: PSNR/SSIM Optimized - Input: Bosphorus 1080p (Frames Per Second, More Is Better)
    Run 3: 222.05    (SE +/- 0.82, N = 3)
    Run 2: 221.91    (SE +/- 0.07, N = 3)
    Run 1: 221.50    (SE +/- 0.92, N = 3)
    1. (CC) gcc options: -O3 -fcommon -fPIE -fPIC -fvisibility=hidden -pie -rdynamic -lpthread -lrt -lm

SVT-VP9 0.3 - Tuning: Visual Quality Optimized - Input: Bosphorus 1080p (Frames Per Second, More Is Better)
    Run 3: 170.48    (SE +/- 0.77, N = 3)
    Run 2: 169.81    (SE +/- 0.35, N = 3)
    Run 1: 169.62    (SE +/- 0.45, N = 3)
    1. (CC) gcc options: -O3 -fcommon -fPIE -fPIC -fvisibility=hidden -pie -rdynamic -lpthread -lrt -lm
simdjson 0.8.2 - Throughput Test: Kostya (GB/s, More Is Better)
    Run 3: 1.13    (SE +/- 0.00, N = 3)
    Run 1: 1.13    (SE +/- 0.00, N = 3)
    Run 2: 1.12    (SE +/- 0.02, N = 4)
    1. (CXX) g++ options: -O3 -pthread

simdjson 0.8.2 - Throughput Test: LargeRandom (GB/s, More Is Better)
    Run 3: 0.38    (SE +/- 0.00, N = 3)
    Run 2: 0.38    (SE +/- 0.00, N = 3)
    Run 1: 0.38    (SE +/- 0.00, N = 3)
    1. (CXX) g++ options: -O3 -pthread

simdjson 0.8.2 - Throughput Test: PartialTweets (GB/s, More Is Better)
    Run 2: 1.37    (SE +/- 0.00, N = 3)
    Run 1: 1.37    (SE +/- 0.00, N = 3)
    Run 3: 1.36    (SE +/- 0.00, N = 3)
    1. (CXX) g++ options: -O3 -pthread

simdjson 0.8.2 - Throughput Test: DistinctUserID (GB/s, More Is Better)
    Run 3: 1.48    (SE +/- 0.00, N = 3)
    Run 2: 1.48    (SE +/- 0.00, N = 3)
    Run 1: 1.48    (SE +/- 0.00, N = 3)
    1. (CXX) g++ options: -O3 -pthread
LuaRadio 0.9.1 - Test: Five Back to Back FIR Filters (MiB/s, More Is Better)
    Run 3: 810.2    (SE +/- 0.70, N = 3)
    Run 1: 808.6    (SE +/- 2.34, N = 3)
    Run 2: 807.8    (SE +/- 2.49, N = 3)

LuaRadio 0.9.1 - Test: FM Deemphasis Filter (MiB/s, More Is Better)
    Run 1: 254.2    (SE +/- 0.03, N = 3)
    Run 2: 254.0    (SE +/- 0.09, N = 3)
    Run 3: 253.3    (SE +/- 0.66, N = 3)

LuaRadio 0.9.1 - Test: Hilbert Transform (MiB/s, More Is Better)
    Run 3: 55.4    (SE +/- 0.07, N = 3)
    Run 2: 55.4    (SE +/- 0.06, N = 3)
    Run 1: 55.4    (SE +/- 0.03, N = 3)

LuaRadio 0.9.1 - Test: Complex Phase (MiB/s, More Is Better)
    Run 3: 439.2    (SE +/- 1.11, N = 3)
    Run 2: 438.2    (SE +/- 3.89, N = 3)
    Run 1: 434.1    (SE +/- 3.21, N = 3)
Sysbench 1.0.20 - Test: RAM / Memory (MiB/sec, More Is Better)
    Run 1: 15699.71    (SE +/- 44.47, N = 3)
    Run 3: 15678.47    (SE +/- 4.37, N = 3)
    Run 2: 15632.40    (SE +/- 37.93, N = 3)
    1. (CC) gcc options: -pthread -O2 -funroll-loops -rdynamic -ldl -laio -lm
Stockfish 13 - Total Time (Nodes Per Second, More Is Better)
    Run 1: 31665033    (SE +/- 284219.69, N = 3)
    Run 3: 31357896    (SE +/- 418440.60, N = 4)
    Run 2: 31089824    (SE +/- 411515.60, N = 5)
    1. (CXX) g++ options: -lgcov -m64 -lpthread -fno-exceptions -std=c++17 -fprofile-use -fno-peel-loops -fno-tracer -pedantic -O3 -msse -msse3 -mpopcnt -mavx2 -mavx512f -mavx512bw -mavx512vnni -mavx512dq -mavx512vl -msse4.1 -mssse3 -msse2 -mbmi2 -flto -flto=jobserver
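Stockfish is the noisiest test in this comparison; note the differing trial counts (N = 3, 4, 5), which the Phoronix Test Suite raises when run-to-run deviation is high. A quick check of the relative spread across the three runs, using the totals above:

```python
# Stockfish 13 nodes-per-second totals for runs 1, 3, and 2, from the result above.
nps = {1: 31665033, 3: 31357896, 2: 31089824}

mean = sum(nps.values()) / len(nps)
spread = (max(nps.values()) - min(nps.values())) / mean
print(f"max-min spread across runs: {spread:.1%} of the mean")
# -> max-min spread across runs: 1.8% of the mean
```

By contrast, most of the other tests in this file vary by well under half a percent between runs.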
oneDNN 2.1.2 - Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
    Run 3: 4.74045    (SE +/- 0.00145, N = 3, MIN: 4.64)
    Run 2: 4.74640    (SE +/- 0.00637, N = 3, MIN: 4.5)
    Run 1: 4.75486    (SE +/- 0.00438, N = 3, MIN: 4.62)
    1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 2.1.2 - Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
    Run 2: 3.85096    (SE +/- 0.02483, N = 3, MIN: 3.76)
    Run 3: 3.91852    (SE +/- 0.03733, N = 3, MIN: 3.8)
    Run 1: 3.93294    (SE +/- 0.02492, N = 3, MIN: 3.83)
    1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 2.1.2 - Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
    Run 3: 1.04890    (SE +/- 0.00161, N = 3, MIN: 1)
    Run 2: 1.04921    (SE +/- 0.00059, N = 3, MIN: 1)
    Run 1: 1.05123    (SE +/- 0.00328, N = 3, MIN: 1)
    1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 2.1.2 - Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
    Run 2: 1.42021    (SE +/- 0.00155, N = 3, MIN: 1.37)
    Run 3: 1.44027    (SE +/- 0.00257, N = 3, MIN: 1.39)
    Run 1: 1.45952    (SE +/- 0.01077, N = 3, MIN: 1.4)
    1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 2.1.2 - Harness: IP Shapes 1D - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
    Run 2: 10.59    (SE +/- 0.01, N = 3, MIN: 9.92)
    Run 3: 10.59    (SE +/- 0.01, N = 3, MIN: 9.87)
    Run 1: 10.60    (SE +/- 0.01, N = 3, MIN: 9.95)
    1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 2.1.2 - Harness: IP Shapes 3D - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
    Run 2: 3.12458    (SE +/- 0.00672, N = 3, MIN: 3.02)
    Run 1: 3.12486    (SE +/- 0.01034, N = 3, MIN: 3.02)
    Run 3: 3.13052    (SE +/- 0.00725, N = 3, MIN: 3.03)
    1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 2.1.2 - Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
    Run 2: 6.51645    (SE +/- 0.00975, N = 3, MIN: 6.45)
    Run 3: 6.55537    (SE +/- 0.03840, N = 3, MIN: 6.44)
    Run 1: 6.57468    (SE +/- 0.04270, N = 3, MIN: 6.45)
    1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 2.1.2 - Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
    Run 2: 11.75    (SE +/- 0.02, N = 3, MIN: 11.36)
    Run 1: 11.76    (SE +/- 0.02, N = 3, MIN: 11.07)
    Run 3: 11.78    (SE +/- 0.05, N = 3, MIN: 11.5)
    1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 2.1.2 - Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
    Run 3: 7.63646    (SE +/- 0.00108, N = 3, MIN: 7.62)
    Run 2: 7.64752    (SE +/- 0.00616, N = 3, MIN: 7.62)
    Run 1: 7.65076    (SE +/- 0.01058, N = 3, MIN: 7.62)
    1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 2.1.2 - Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
    Run 3: 6.16607    (SE +/- 0.05945, N = 3, MIN: 6.04)
    Run 2: 6.21612    (SE +/- 0.09250, N = 3, MIN: 6.06)
    Run 1: 6.39957    (SE +/- 0.08087, N = 3, MIN: 6.25)
    1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 2.1.2 - Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
    Run 1: 1.26827    (SE +/- 0.00141, N = 3, MIN: 1.26)
    Run 2: 1.27125    (SE +/- 0.00167, N = 3, MIN: 1.26)
    Run 3: 1.27234    (SE +/- 0.00498, N = 3, MIN: 1.26)
    1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 2.1.2 - Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
    Run 2: 1.88395    (SE +/- 0.00013, N = 3, MIN: 1.88)
    Run 1: 1.88615    (SE +/- 0.00222, N = 3, MIN: 1.88)
    Run 3: 1.88619    (SE +/- 0.00235, N = 3, MIN: 1.88)
    1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 2.1.2 - Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
    Run 1: 3586.70    (SE +/- 1.49, N = 3, MIN: 3583.33)
    Run 2: 3588.58    (SE +/- 1.40, N = 3, MIN: 3583.51)
    Run 3: 3592.55    (SE +/- 5.39, N = 3, MIN: 3582.49)
    1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 2.1.2 - Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
    Run 3: 1854.28    (SE +/- 0.29, N = 3, MIN: 1851.24)
    Run 2: 1856.49    (SE +/- 5.14, N = 3, MIN: 1849.03)
    Run 1: 1859.12    (SE +/- 1.06, N = 3, MIN: 1850.63)
    1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 2.1.2 - Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
    Run 1: 3589.87    (SE +/- 1.80, N = 3, MIN: 3583.15)
    Run 2: 3589.97    (SE +/- 3.00, N = 3, MIN: 3582.71)
    Run 3: 3592.30    (SE +/- 5.29, N = 3, MIN: 3584.03)
    1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 2.1.2 - Harness: Convolution Batch Shapes Auto - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
    Run 3: 16.28    (SE +/- 0.00, N = 3, MIN: 16.26)
    Run 1: 16.28    (SE +/- 0.00, N = 3, MIN: 16.27)
    Run 2: 16.29    (SE +/- 0.01, N = 3, MIN: 16.26)
    1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 2.1.2 - Harness: Deconvolution Batch shapes_1d - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
    Run 2: 18.43    (SE +/- 0.00, N = 3, MIN: 18.28)
    Run 1: 18.43    (SE +/- 0.02, N = 3, MIN: 18.28)
    Run 3: 18.44    (SE +/- 0.02, N = 3, MIN: 18.28)
    1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 2.1.2 - Harness: Deconvolution Batch shapes_3d - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
    Run 1: 21.77    (SE +/- 0.00, N = 3, MIN: 21.68)
    Run 2: 21.77    (SE +/- 0.01, N = 3, MIN: 21.66)
    Run 3: 21.78    (SE +/- 0.01, N = 3, MIN: 21.69)
    1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 2.1.2 - Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
    Run 3: 1852.34    (SE +/- 0.88, N = 3, MIN: 1848.76)
    Run 1: 1852.81    (SE +/- 0.49, N = 3, MIN: 1850.49)
    Run 2: 1853.80    (SE +/- 1.78, N = 3, MIN: 1848.59)
    1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 2.1.2 - Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
    Run 3: 1.60482    (SE +/- 0.00500, N = 3, MIN: 1.56)
    Run 1: 1.61069    (SE +/- 0.00563, N = 3, MIN: 1.57)
    Run 2: 1.61866    (SE +/- 0.00101, N = 3, MIN: 1.57)
    1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 2.1.2 - Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
    Run 3: 3588.05    (SE +/- 0.91, N = 3, MIN: 3584.71)
    Run 1: 3589.54    (SE +/- 2.28, N = 3, MIN: 3583)
    Run 2: 3593.32    (SE +/- 4.71, N = 3, MIN: 3583.23)
    1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 2.1.2 - Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
    Run 1: 1853.05    (SE +/- 0.80, N = 3, MIN: 1849.21)
    Run 2: 1853.25    (SE +/- 1.77, N = 3, MIN: 1848.5)
    Run 3: 1855.59    (SE +/- 4.07, N = 3, MIN: 1848.04)
    1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 2.1.2 - Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
    Run 3: 0.830736    (SE +/- 0.004141, N = 3, MIN: 0.8)
    Run 1: 0.833192    (SE +/- 0.005518, N = 3, MIN: 0.8)
    Run 2: 0.834428    (SE +/- 0.000758, N = 3, MIN: 0.79)
    1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 2.1.2 - Harness: Matrix Multiply Batch Shapes Transformer - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
    Run 2: 3.56246    (SE +/- 0.00246, N = 3, MIN: 3.49)
    Run 1: 3.56301    (SE +/- 0.00273, N = 3, MIN: 3.51)
    Run 3: 3.56446    (SE +/- 0.00178, N = 3, MIN: 3.52)
    1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
Mobile Neural Network 1.1.3 - Model: SqueezeNetV1.0 (ms, Fewer Is Better)
    Run 2: 8.130    (SE +/- 0.048, N = 3, MIN: 7.68 / MAX: 12.26)
    Run 1: 8.133    (SE +/- 0.082, N = 3, MIN: 7.75 / MAX: 14.52)
    Run 3: 8.236    (SE +/- 0.068, N = 3, MIN: 7.7 / MAX: 9.92)
    1. (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -rdynamic -pthread -ldl

Mobile Neural Network 1.1.3 - Model: resnet-v2-50 (ms, Fewer Is Better)
    Run 1: 43.66    (SE +/- 0.14, N = 3, MIN: 43.28 / MAX: 56.66)
    Run 2: 43.99    (SE +/- 0.20, N = 3, MIN: 43.06 / MAX: 56.17)
    Run 3: 44.23    (SE +/- 0.17, N = 3, MIN: 43.21 / MAX: 56.94)
    1. (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -rdynamic -pthread -ldl

Mobile Neural Network 1.1.3 - Model: MobileNetV2_224 (ms, Fewer Is Better)
    Run 1: 5.038    (SE +/- 0.022, N = 3, MIN: 4.8 / MAX: 5.28)
    Run 2: 5.095    (SE +/- 0.041, N = 3, MIN: 4.61 / MAX: 8.07)
    Run 3: 5.099    (SE +/- 0.012, N = 3, MIN: 4.59 / MAX: 10.01)
    1. (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -rdynamic -pthread -ldl

Mobile Neural Network 1.1.3 - Model: mobilenet-v1-1.0 (ms, Fewer Is Better)
    Run 2: 3.227    (SE +/- 0.027, N = 3, MIN: 2.92 / MAX: 16.04)
    Run 1: 3.265    (SE +/- 0.044, N = 3, MIN: 3.07 / MAX: 3.63)
    Run 3: 3.292    (SE +/- 0.015, N = 3, MIN: 3.15 / MAX: 3.55)
    1. (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -rdynamic -pthread -ldl

Mobile Neural Network 1.1.3 - Model: inception-v3 (ms, Fewer Is Better)
    Run 3: 54.09    (SE +/- 0.05, N = 3, MIN: 53.71 / MAX: 67.03)
    Run 2: 54.55    (SE +/- 0.47, N = 3, MIN: 53.67 / MAX: 69.4)
    Run 1: 55.10    (SE +/- 0.48, N = 3, MIN: 53.83 / MAX: 68.3)
    1. (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -rdynamic -pthread -ldl
Xcompact3d Incompact3d 2021-03-11 - Input: input.i3d 129 Cells Per Direction (Seconds, Fewer Is Better)
    Run 1: 16.70    (SE +/- 0.03, N = 3)
    Run 2: 16.85    (SE +/- 0.01, N = 3)
    Run 3: 16.86    (SE +/- 0.01, N = 3)
    1. (F9X) gfortran options: -cpp -O2 -funroll-loops -floop-optimize -fcray-pointer -fbacktrace -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi

Xcompact3d Incompact3d 2021-03-11 - Input: input.i3d 193 Cells Per Direction (Seconds, Fewer Is Better)
    Run 3: 66.16    (SE +/- 0.37, N = 3)
    Run 1: 66.45    (SE +/- 0.08, N = 3)
    Run 2: 66.67    (SE +/- 0.33, N = 3)
    1. (F9X) gfortran options: -cpp -O2 -funroll-loops -floop-optimize -fcray-pointer -fbacktrace -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi
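The two Incompact3d inputs differ only in grid resolution, so their runtimes give a rough scaling picture: going from 129 to 193 cells per direction multiplies the total cell count by about 3.35x but the run 1 wall time by about 3.98x, i.e. per-cell cost grows slightly faster than linear. A small back-of-envelope script (not part of the benchmark), using the run 1 times from the results above:

```python
import math

# Run 1 wall-clock times in seconds for the two grids, from the results above.
t_129 = 16.7026520   # 129^3 cells
t_193 = 66.4466426   # 193^3 cells

time_ratio = t_193 / t_129          # observed slowdown
cell_ratio = (193 / 129) ** 3       # growth in total grid points
exponent = math.log(time_ratio) / math.log(193 / 129)
print(f"{time_ratio:.2f}x time for {cell_ratio:.2f}x cells (~n^{exponent:.2f} in cells per direction)")
# -> 3.98x time for 3.35x cells (~n^3.43 in cells per direction)
```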
Basis Universal 1.13 - Settings: ETC1S (Seconds, Fewer Is Better)
    Run 3: 32.61    (SE +/- 0.07, N = 3)
    Run 2: 32.66    (SE +/- 0.09, N = 3)
    Run 1: 32.71    (SE +/- 0.02, N = 3)
    1. (CXX) g++ options: -std=c++11 -fvisibility=hidden -fPIC -fno-strict-aliasing -O3 -rdynamic -lm -lpthread

Basis Universal 1.13 - Settings: UASTC Level 0 (Seconds, Fewer Is Better)
    Run 2: 10.07    (SE +/- 0.01, N = 3)
    Run 3: 10.07    (SE +/- 0.01, N = 3)
    Run 1: 10.08    (SE +/- 0.02, N = 3)
    1. (CXX) g++ options: -std=c++11 -fvisibility=hidden -fPIC -fno-strict-aliasing -O3 -rdynamic -lm -lpthread

Basis Universal 1.13 - Settings: UASTC Level 2 (Seconds, Fewer Is Better)
    Run 3: 31.87    (SE +/- 0.02, N = 3)
    Run 2: 31.87    (SE +/- 0.03, N = 3)
    Run 1: 31.89    (SE +/- 0.01, N = 3)
    1. (CXX) g++ options: -std=c++11 -fvisibility=hidden -fPIC -fno-strict-aliasing -O3 -rdynamic -lm -lpthread

Basis Universal 1.13 - Settings: UASTC Level 3 (Seconds, Fewer Is Better)
    Run 3: 57.06    (SE +/- 0.06, N = 3)
    Run 1: 57.17    (SE +/- 0.00, N = 3)
    Run 2: 57.19    (SE +/- 0.02, N = 3)
    1. (CXX) g++ options: -std=c++11 -fvisibility=hidden -fPIC -fno-strict-aliasing -O3 -rdynamic -lm -lpthread
Phoronix Test Suite v10.8.4