MKL-DNN GCC 9 Cascadelake Compiler Tuning

2 x Intel Xeon Platinum 8280 testing with a GIGABYTE MD61-SC2-00 v01000100 (T15 BIOS) and ASPEED Family on Ubuntu 18.04 via the Phoronix Test Suite.

HTML result view exported from: https://openbenchmarking.org/result/1904200-HV-MKLDNNGCC43&sor.

MKL-DNN GCC 9 Cascadelake Compiler Tuning

Configurations tested: -O3 -march=skylake-avx512, -O3 -march=cascadelake

  Processor:          2 x Intel Xeon Platinum 8280 @ 4.00GHz (56 Cores / 112 Threads)
  Motherboard:        GIGABYTE MD61-SC2-00 v01000100 (T15 BIOS)
  Chipset:            Intel Sky Lake-E DMI3 Registers
  Memory:             386048MB
  Disk:               Samsung SSD 970 PRO 512GB
  Graphics:           ASPEED Family
  Monitor:            VE228
  Network:            2 x Intel X722 for 1GbE + 2 x QLogic FastLinQ QL41000 10/25/40/50GbE
  OS:                 Ubuntu 18.04
  Kernel:             5.1.0-999-generic (x86_64) 20190416
  Desktop:            GNOME Shell 3.28.3
  Display Server:     X Server 1.20.1
  Display Driver:     modesetting 1.20.1
  Compiler:           GCC 9.0.1 20190414
  File-System:        ext4
  Screen Resolution:  1920x1080

Environment Details
  - -O3 -march=skylake-avx512: CXXFLAGS="-O3 -march=skylake-avx512" CFLAGS="-O3 -march=skylake-avx512"
  - -O3 -march=cascadelake: CXXFLAGS="-O3 -march=cascadelake" CFLAGS="-O3 -march=cascadelake"
Compiler Details
  - --disable-multilib --enable-checking=release
Processor Details
  - Scaling Governor: intel_pstate powersave
Security Details
  - __user pointer sanitization + Enhanced IBRS IBPB: conditional RSB filling + SSB disabled via prctl and seccomp

MKL-DNN GCC 9 Cascadelake Compiler Tuning (results in ms, fewer is better)

Test                                                      -O3 -march=skylake-avx512  -O3 -march=cascadelake
mkl-dnn: IP Batch 1D - f32                                17.55                      17.44
mkl-dnn: IP Batch All - f32                               249                        249
mkl-dnn: IP Batch 1D - u8s8u8s32                          4.59                       4.58
mkl-dnn: IP Batch All - u8s8u8s32                         72.49                      72.38
mkl-dnn: Convolution Batch conv_3d - f32                  85.67                      85.40
mkl-dnn: Convolution Batch conv_all - f32                 13218                      13215
mkl-dnn: Deconvolution Batch deconv_1d - f32              23.77                      23.53
mkl-dnn: Deconvolution Batch deconv_3d - f32              27.41                      27.77
mkl-dnn: Convolution Batch conv_alexnet - f32             1756                       1753
mkl-dnn: Deconvolution Batch deconv_all - f32             10271                      10257
mkl-dnn: Convolution Batch conv_3d - u8s8u8s32            125838                     128646
mkl-dnn: Convolution Batch conv_all - u8s8u8s32           59581                      59454
mkl-dnn: Convolution Batch conv_googlenet_v3 - f32        727                        727
mkl-dnn: Deconvolution Batch deconv_1d - u8s8u8s32        5.40                       5.41
mkl-dnn: Deconvolution Batch deconv_3d - u8s8u8s32        79735                      79948
mkl-dnn: Convolution Batch conv_alexnet - u8s8u8s32       452                        453
mkl-dnn: Deconvolution Batch deconv_all - u8s8u8s32       126456                     126386
mkl-dnn: Convolution Batch conv_googlenet_v3 - u8s8u8s32  177                        177
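
To put the summary figures above in relative terms, here is a minimal Python sketch that computes the percent change of -O3 -march=cascadelake against -O3 -march=skylake-avx512 for each test. The values are copied from the summary; the names RESULTS, percent_change, and summarize are illustrative only and are not part of the Phoronix Test Suite.

```python
# Mean run times in ms (fewer is better), copied from the results summary.
# Tuple order: (-O3 -march=skylake-avx512, -O3 -march=cascadelake).
RESULTS = {
    "IP Batch 1D - f32": (17.55, 17.44),
    "IP Batch All - f32": (249.0, 249.0),
    "IP Batch 1D - u8s8u8s32": (4.59, 4.58),
    "IP Batch All - u8s8u8s32": (72.49, 72.38),
    "Convolution Batch conv_3d - f32": (85.67, 85.40),
    "Convolution Batch conv_all - f32": (13218.0, 13215.0),
    "Deconvolution Batch deconv_1d - f32": (23.77, 23.53),
    "Deconvolution Batch deconv_3d - f32": (27.41, 27.77),
    "Convolution Batch conv_alexnet - f32": (1756.0, 1753.0),
    "Deconvolution Batch deconv_all - f32": (10271.0, 10257.0),
    "Convolution Batch conv_3d - u8s8u8s32": (125838.0, 128646.0),
    "Convolution Batch conv_all - u8s8u8s32": (59581.0, 59454.0),
    "Convolution Batch conv_googlenet_v3 - f32": (727.0, 727.0),
    "Deconvolution Batch deconv_1d - u8s8u8s32": (5.40, 5.41),
    "Deconvolution Batch deconv_3d - u8s8u8s32": (79735.0, 79948.0),
    "Convolution Batch conv_alexnet - u8s8u8s32": (452.0, 453.0),
    "Deconvolution Batch deconv_all - u8s8u8s32": (126456.0, 126386.0),
    "Convolution Batch conv_googlenet_v3 - u8s8u8s32": (177.0, 177.0),
}


def percent_change(baseline, candidate):
    """Relative change of candidate versus baseline, in percent."""
    return (candidate - baseline) / baseline * 100.0


def summarize(results):
    """Count tests where cascadelake is faster, slower, or tied,
    and return the test with the largest absolute percent change."""
    faster = slower = tied = 0
    largest = None
    for name, (skylake, cascadelake) in results.items():
        change = percent_change(skylake, cascadelake)
        if change < 0:
            faster += 1  # lower time means cascadelake was faster
        elif change > 0:
            slower += 1
        else:
            tied += 1
        if largest is None or abs(change) > abs(largest[1]):
            largest = (name, change)
    return faster, slower, tied, largest


if __name__ == "__main__":
    for name, (skylake, cascadelake) in RESULTS.items():
        print(f"{name}: {percent_change(skylake, cascadelake):+.2f}%")
    print(summarize(RESULTS))
```

Comparing means alone ignores run-to-run variation, so small percent changes here should be read alongside the SE values reported for each test.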

MKL-DNN

Harness: IP Batch 1D - Data Type: f32

MKL-DNN 2019-04-16 (ms, Fewer Is Better)
  -O3 -march=cascadelake:     17.44  (SE +/- 0.03, N = 3; -march=cascadelake - MIN: 16.92)
  -O3 -march=skylake-avx512:  17.55  (SE +/- 0.06, N = 3; MIN: 16.94)
1. (CXX) g++ options: -O3 -std=c++11 -march=native -mtune=native -fPIC -fopenmp -pie -lmklml_intel -ldl

MKL-DNN

Harness: IP Batch All - Data Type: f32

MKL-DNN 2019-04-16 (ms, Fewer Is Better)
  -O3 -march=skylake-avx512:  249  (SE +/- 0.17, N = 3; MIN: 245.97)
  -O3 -march=cascadelake:     249  (SE +/- 0.10, N = 3; -march=cascadelake - MIN: 245.95)
1. (CXX) g++ options: -O3 -std=c++11 -march=native -mtune=native -fPIC -fopenmp -pie -lmklml_intel -ldl

MKL-DNN

Harness: IP Batch 1D - Data Type: u8s8u8s32

MKL-DNN 2019-04-16 (ms, Fewer Is Better)
  -O3 -march=cascadelake:     4.58  (SE +/- 0.00, N = 3; -march=cascadelake - MIN: 4.19)
  -O3 -march=skylake-avx512:  4.59  (SE +/- 0.00, N = 3; MIN: 4.22)
1. (CXX) g++ options: -O3 -std=c++11 -march=native -mtune=native -fPIC -fopenmp -pie -lmklml_intel -ldl

MKL-DNN

Harness: IP Batch All - Data Type: u8s8u8s32

MKL-DNN 2019-04-16 (ms, Fewer Is Better)
  -O3 -march=cascadelake:     72.38  (SE +/- 0.02, N = 3; -march=cascadelake - MIN: 70.76)
  -O3 -march=skylake-avx512:  72.49  (SE +/- 0.13, N = 3; MIN: 70.54)
1. (CXX) g++ options: -O3 -std=c++11 -march=native -mtune=native -fPIC -fopenmp -pie -lmklml_intel -ldl

MKL-DNN

Harness: Convolution Batch conv_3d - Data Type: f32

MKL-DNN 2019-04-16 (ms, Fewer Is Better)
  -O3 -march=cascadelake:     85.40  (SE +/- 0.22, N = 3; -march=cascadelake - MIN: 83.33)
  -O3 -march=skylake-avx512:  85.67  (SE +/- 0.20, N = 3; MIN: 83.5)
1. (CXX) g++ options: -O3 -std=c++11 -march=native -mtune=native -fPIC -fopenmp -pie -lmklml_intel -ldl

MKL-DNN

Harness: Convolution Batch conv_all - Data Type: f32

MKL-DNN 2019-04-16 (ms, Fewer Is Better)
  -O3 -march=cascadelake:     13215  (SE +/- 8.35, N = 3; -march=cascadelake - MIN: 13089.5)
  -O3 -march=skylake-avx512:  13218  (SE +/- 5.34, N = 3; MIN: 13095.4)
1. (CXX) g++ options: -O3 -std=c++11 -march=native -mtune=native -fPIC -fopenmp -pie -lmklml_intel -ldl

MKL-DNN

Harness: Deconvolution Batch deconv_1d - Data Type: f32

MKL-DNN 2019-04-16 (ms, Fewer Is Better)
  -O3 -march=cascadelake:     23.53  (SE +/- 0.27, N = 6; -march=cascadelake - MIN: 22.83)
  -O3 -march=skylake-avx512:  23.77  (SE +/- 0.38, N = 3; MIN: 22.94)
1. (CXX) g++ options: -O3 -std=c++11 -march=native -mtune=native -fPIC -fopenmp -pie -lmklml_intel -ldl

MKL-DNN

Harness: Deconvolution Batch deconv_3d - Data Type: f32

MKL-DNN 2019-04-16 (ms, Fewer Is Better)
  -O3 -march=skylake-avx512:  27.41  (SE +/- 0.06, N = 3; MIN: 26.68)
  -O3 -march=cascadelake:     27.77  (SE +/- 0.08, N = 3; -march=cascadelake - MIN: 27)
1. (CXX) g++ options: -O3 -std=c++11 -march=native -mtune=native -fPIC -fopenmp -pie -lmklml_intel -ldl

MKL-DNN

Harness: Convolution Batch conv_alexnet - Data Type: f32

MKL-DNN 2019-04-16 (ms, Fewer Is Better)
  -O3 -march=cascadelake:     1753  (SE +/- 1.65, N = 3; -march=cascadelake - MIN: 1737.11)
  -O3 -march=skylake-avx512:  1756  (SE +/- 0.82, N = 3; MIN: 1740.28)
1. (CXX) g++ options: -O3 -std=c++11 -march=native -mtune=native -fPIC -fopenmp -pie -lmklml_intel -ldl

MKL-DNN

Harness: Deconvolution Batch deconv_all - Data Type: f32

MKL-DNN 2019-04-16 (ms, Fewer Is Better)
  -O3 -march=cascadelake:     10257  (SE +/- 2.13, N = 3; -march=cascadelake - MIN: 10166.6)
  -O3 -march=skylake-avx512:  10271  (SE +/- 4.71, N = 3; MIN: 10175.5)
1. (CXX) g++ options: -O3 -std=c++11 -march=native -mtune=native -fPIC -fopenmp -pie -lmklml_intel -ldl

MKL-DNN

Harness: Convolution Batch conv_3d - Data Type: u8s8u8s32

MKL-DNN 2019-04-16 (ms, Fewer Is Better)
  -O3 -march=skylake-avx512:  125838  (SE +/- 1214.69, N = 3; MIN: 124475)
  -O3 -march=cascadelake:     128646  (SE +/- 1664.64, N = 9; -march=cascadelake - MIN: 124405)
1. (CXX) g++ options: -O3 -std=c++11 -march=native -mtune=native -fPIC -fopenmp -pie -lmklml_intel -ldl
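
The conv_3d u8s8u8s32 result above shows the largest gap between the two configurations, but also the largest standard errors. As a rough check, the sketch below expresses the gap in units of the combined standard error. It assumes the two runs are independent and that the SE values belong to the configurations in the order they are listed; the function name is illustrative only.

```python
import math


def difference_in_standard_errors(mean_a, se_a, mean_b, se_b):
    """Return (mean_b - mean_a) divided by the combined standard error.

    Assumes the two measurements are independent, so the variances add.
    """
    combined_se = math.hypot(se_a, se_b)  # sqrt(se_a**2 + se_b**2)
    return (mean_b - mean_a) / combined_se


# Figures taken from the conv_3d u8s8u8s32 result (ms, fewer is better).
skylake_mean, skylake_se = 125838.0, 1214.69
cascadelake_mean, cascadelake_se = 128646.0, 1664.64

ratio = difference_in_standard_errors(
    skylake_mean, skylake_se, cascadelake_mean, cascadelake_se
)
print(f"gap = {ratio:.2f} combined standard errors")
```

Under these assumptions the gap comes out at about 1.4 combined standard errors, so this difference may be within run-to-run noise rather than a clear regression.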

MKL-DNN

Harness: Convolution Batch conv_all - Data Type: u8s8u8s32

MKL-DNN 2019-04-16 (ms, Fewer Is Better)
  -O3 -march=cascadelake:     59454  (SE +/- 265.15, N = 3; -march=cascadelake - MIN: 59115)
  -O3 -march=skylake-avx512:  59581  (SE +/- 321.79, N = 3; MIN: 59174)
1. (CXX) g++ options: -O3 -std=c++11 -march=native -mtune=native -fPIC -fopenmp -pie -lmklml_intel -ldl

MKL-DNN

Harness: Convolution Batch conv_googlenet_v3 - Data Type: f32

MKL-DNN 2019-04-16 (ms, Fewer Is Better)
  -O3 -march=skylake-avx512:  727  (SE +/- 0.66, N = 3; MIN: 717.67)
  -O3 -march=cascadelake:     727  (SE +/- 0.39, N = 3; -march=cascadelake - MIN: 717.66)
1. (CXX) g++ options: -O3 -std=c++11 -march=native -mtune=native -fPIC -fopenmp -pie -lmklml_intel -ldl

MKL-DNN

Harness: Deconvolution Batch deconv_1d - Data Type: u8s8u8s32

MKL-DNN 2019-04-16 (ms, Fewer Is Better)
  -O3 -march=skylake-avx512:  5.40  (SE +/- 0.01, N = 3; MIN: 5.32)
  -O3 -march=cascadelake:     5.41  (SE +/- 0.01, N = 3; -march=cascadelake - MIN: 5.32)
1. (CXX) g++ options: -O3 -std=c++11 -march=native -mtune=native -fPIC -fopenmp -pie -lmklml_intel -ldl

MKL-DNN

Harness: Deconvolution Batch deconv_3d - Data Type: u8s8u8s32

MKL-DNN 2019-04-16 (ms, Fewer Is Better)
  -O3 -march=skylake-avx512:  79735  (SE +/- 76.36, N = 3; MIN: 79643)
  -O3 -march=cascadelake:     79948  (SE +/- 139.78, N = 3; -march=cascadelake - MIN: 79663.4)
1. (CXX) g++ options: -O3 -std=c++11 -march=native -mtune=native -fPIC -fopenmp -pie -lmklml_intel -ldl

MKL-DNN

Harness: Convolution Batch conv_alexnet - Data Type: u8s8u8s32

MKL-DNN 2019-04-16 (ms, Fewer Is Better)
  -O3 -march=skylake-avx512:  452  (SE +/- 0.13, N = 3; MIN: 444.46)
  -O3 -march=cascadelake:     453  (SE +/- 0.33, N = 3; -march=cascadelake - MIN: 444.07)
1. (CXX) g++ options: -O3 -std=c++11 -march=native -mtune=native -fPIC -fopenmp -pie -lmklml_intel -ldl

MKL-DNN

Harness: Deconvolution Batch deconv_all - Data Type: u8s8u8s32

MKL-DNN 2019-04-16 (ms, Fewer Is Better)
  -O3 -march=cascadelake:     126386  (SE +/- 134.27, N = 3; -march=cascadelake - MIN: 126069)
  -O3 -march=skylake-avx512:  126456  (SE +/- 186.57, N = 3; MIN: 126023)
1. (CXX) g++ options: -O3 -std=c++11 -march=native -mtune=native -fPIC -fopenmp -pie -lmklml_intel -ldl

MKL-DNN

Harness: Convolution Batch conv_googlenet_v3 - Data Type: u8s8u8s32

MKL-DNN 2019-04-16 (ms, Fewer Is Better)
  -O3 -march=skylake-avx512:  177  (SE +/- 0.11, N = 3; MIN: 175.69)
  -O3 -march=cascadelake:     177  (SE +/- 0.10, N = 3; -march=cascadelake - MIN: 175.83)
1. (CXX) g++ options: -O3 -std=c++11 -march=native -mtune=native -fPIC -fopenmp -pie -lmklml_intel -ldl


Phoronix Test Suite v10.8.4