MKL-DNN GCC 9 Cascadelake Compiler Tuning

2 x Intel Xeon Platinum 8280 testing with a GIGABYTE MD61-SC2-00 v01000100 (T15 BIOS) and ASPEED Family on Ubuntu 18.04 via the Phoronix Test Suite.

HTML result view exported from: https://openbenchmarking.org/result/1904200-HV-MKLDNNGCC43.

Configurations tested: -O3 -march=skylake-avx512, -O3 -march=cascadelake

Processor: 2 x Intel Xeon Platinum 8280 @ 4.00GHz (56 Cores / 112 Threads)
Motherboard: GIGABYTE MD61-SC2-00 v01000100 (T15 BIOS)
Chipset: Intel Sky Lake-E DMI3 Registers
Memory: 386048MB
Disk: Samsung SSD 970 PRO 512GB
Graphics: ASPEED Family
Monitor: VE228
Network: 2 x Intel X722 for 1GbE + 2 x QLogic FastLinQ QL41000 10/25/40/50GbE
OS: Ubuntu 18.04
Kernel: 5.1.0-999-generic (x86_64) 20190416
Desktop: GNOME Shell 3.28.3
Display Server: X Server 1.20.1
Display Driver: modesetting 1.20.1
Compiler: GCC 9.0.1 20190414
File-System: ext4
Screen Resolution: 1920x1080

Environment Details
- -O3 -march=skylake-avx512: CXXFLAGS="-O3 -march=skylake-avx512" CFLAGS="-O3 -march=skylake-avx512"
- -O3 -march=cascadelake: CXXFLAGS="-O3 -march=cascadelake" CFLAGS="-O3 -march=cascadelake"

Compiler Details
- --disable-multilib --enable-checking=release

Processor Details
- Scaling Governor: intel_pstate powersave

Security Details
- __user pointer sanitization + Enhanced IBRS IBPB: conditional RSB filling + SSB disabled via prctl and seccomp

Results overview (all values in ms, fewer is better):

Test                                                      -O3 -march=skylake-avx512  -O3 -march=cascadelake
mkl-dnn: IP Batch 1D - f32                                17.55                      17.44
mkl-dnn: IP Batch All - f32                               249                        249
mkl-dnn: IP Batch 1D - u8s8u8s32                          4.59                       4.58
mkl-dnn: IP Batch All - u8s8u8s32                         72.49                      72.38
mkl-dnn: Convolution Batch conv_3d - f32                  85.67                      85.40
mkl-dnn: Convolution Batch conv_all - f32                 13218                      13215
mkl-dnn: Deconvolution Batch deconv_1d - f32              23.77                      23.53
mkl-dnn: Deconvolution Batch deconv_3d - f32              27.41                      27.77
mkl-dnn: Convolution Batch conv_alexnet - f32             1756                       1753
mkl-dnn: Deconvolution Batch deconv_all - f32             10271                      10257
mkl-dnn: Convolution Batch conv_3d - u8s8u8s32            125838                     128646
mkl-dnn: Convolution Batch conv_all - u8s8u8s32           59581                      59454
mkl-dnn: Convolution Batch conv_googlenet_v3 - f32        727                        727
mkl-dnn: Deconvolution Batch deconv_1d - u8s8u8s32        5.40                       5.41
mkl-dnn: Deconvolution Batch deconv_3d - u8s8u8s32        79735                      79948
mkl-dnn: Convolution Batch conv_alexnet - u8s8u8s32       452                        453
mkl-dnn: Deconvolution Batch deconv_all - u8s8u8s32       126456                     126386
mkl-dnn: Convolution Batch conv_googlenet_v3 - u8s8u8s32  177                        177
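To make the gap between the two -march settings easier to read, the mean times can be turned into a relative change. The following is a minimal sketch, not part of the Phoronix Test Suite: the helper name percent_change and the choice of three sample tests are assumptions made for illustration, and the numbers are copied from the results above.

```python
# Mean times in ms (fewer is better), copied from the results overview.
# Each entry maps a test name to (skylake-avx512, cascadelake).
results = {
    "IP Batch 1D - f32": (17.55, 17.44),
    "Deconvolution Batch deconv_3d - f32": (27.41, 27.77),
    "Convolution Batch conv_3d - u8s8u8s32": (125838, 128646),
}


def percent_change(baseline, candidate):
    """Relative change of candidate vs. baseline in percent (negative = faster)."""
    return (candidate - baseline) / baseline * 100.0


# Hypothetical helper output: one signed percentage per test.
changes = {name: percent_change(*pair) for name, pair in results.items()}
for name, pct in changes.items():
    print(f"{name}: {pct:+.2f}%")
```

Treating -O3 -march=skylake-avx512 as the baseline is an arbitrary choice here; swapping the arguments gives the change in the other direction.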

MKL-DNN

Harness: IP Batch 1D - Data Type: f32

OpenBenchmarking.org: ms, Fewer Is Better (MKL-DNN 2019-04-16)
-O3 -march=skylake-avx512: 17.55 (SE +/- 0.06, N = 3; MIN: 16.94)
-O3 -march=cascadelake: 17.44 (SE +/- 0.03, N = 3; MIN: 16.92)
1. (CXX) g++ options: -O3 -std=c++11 -march=native -mtune=native -fPIC -fopenmp -pie -lmklml_intel -ldl

MKL-DNN

Harness: IP Batch All - Data Type: f32

OpenBenchmarking.org: ms, Fewer Is Better (MKL-DNN 2019-04-16)
-O3 -march=skylake-avx512: 249 (SE +/- 0.17, N = 3; MIN: 245.97)
-O3 -march=cascadelake: 249 (SE +/- 0.10, N = 3; MIN: 245.95)
1. (CXX) g++ options: -O3 -std=c++11 -march=native -mtune=native -fPIC -fopenmp -pie -lmklml_intel -ldl

MKL-DNN

Harness: IP Batch 1D - Data Type: u8s8u8s32

OpenBenchmarking.org: ms, Fewer Is Better (MKL-DNN 2019-04-16)
-O3 -march=skylake-avx512: 4.59 (SE +/- 0.00, N = 3; MIN: 4.22)
-O3 -march=cascadelake: 4.58 (SE +/- 0.00, N = 3; MIN: 4.19)
1. (CXX) g++ options: -O3 -std=c++11 -march=native -mtune=native -fPIC -fopenmp -pie -lmklml_intel -ldl

MKL-DNN

Harness: IP Batch All - Data Type: u8s8u8s32

OpenBenchmarking.org: ms, Fewer Is Better (MKL-DNN 2019-04-16)
-O3 -march=skylake-avx512: 72.49 (SE +/- 0.13, N = 3; MIN: 70.54)
-O3 -march=cascadelake: 72.38 (SE +/- 0.02, N = 3; MIN: 70.76)
1. (CXX) g++ options: -O3 -std=c++11 -march=native -mtune=native -fPIC -fopenmp -pie -lmklml_intel -ldl

MKL-DNN

Harness: Convolution Batch conv_3d - Data Type: f32

OpenBenchmarking.org: ms, Fewer Is Better (MKL-DNN 2019-04-16)
-O3 -march=skylake-avx512: 85.67 (SE +/- 0.20, N = 3; MIN: 83.5)
-O3 -march=cascadelake: 85.40 (SE +/- 0.22, N = 3; MIN: 83.33)
1. (CXX) g++ options: -O3 -std=c++11 -march=native -mtune=native -fPIC -fopenmp -pie -lmklml_intel -ldl

MKL-DNN

Harness: Convolution Batch conv_all - Data Type: f32

OpenBenchmarking.org: ms, Fewer Is Better (MKL-DNN 2019-04-16)
-O3 -march=skylake-avx512: 13218 (SE +/- 5.34, N = 3; MIN: 13095.4)
-O3 -march=cascadelake: 13215 (SE +/- 8.35, N = 3; MIN: 13089.5)
1. (CXX) g++ options: -O3 -std=c++11 -march=native -mtune=native -fPIC -fopenmp -pie -lmklml_intel -ldl

MKL-DNN

Harness: Deconvolution Batch deconv_1d - Data Type: f32

OpenBenchmarking.org: ms, Fewer Is Better (MKL-DNN 2019-04-16)
-O3 -march=skylake-avx512: 23.77 (SE +/- 0.38, N = 3; MIN: 22.94)
-O3 -march=cascadelake: 23.53 (SE +/- 0.27, N = 6; MIN: 22.83)
1. (CXX) g++ options: -O3 -std=c++11 -march=native -mtune=native -fPIC -fopenmp -pie -lmklml_intel -ldl

MKL-DNN

Harness: Deconvolution Batch deconv_3d - Data Type: f32

OpenBenchmarking.org: ms, Fewer Is Better (MKL-DNN 2019-04-16)
-O3 -march=skylake-avx512: 27.41 (SE +/- 0.06, N = 3; MIN: 26.68)
-O3 -march=cascadelake: 27.77 (SE +/- 0.08, N = 3; MIN: 27)
1. (CXX) g++ options: -O3 -std=c++11 -march=native -mtune=native -fPIC -fopenmp -pie -lmklml_intel -ldl

MKL-DNN

Harness: Convolution Batch conv_alexnet - Data Type: f32

OpenBenchmarking.org: ms, Fewer Is Better (MKL-DNN 2019-04-16)
-O3 -march=skylake-avx512: 1756 (SE +/- 0.82, N = 3; MIN: 1740.28)
-O3 -march=cascadelake: 1753 (SE +/- 1.65, N = 3; MIN: 1737.11)
1. (CXX) g++ options: -O3 -std=c++11 -march=native -mtune=native -fPIC -fopenmp -pie -lmklml_intel -ldl

MKL-DNN

Harness: Deconvolution Batch deconv_all - Data Type: f32

OpenBenchmarking.org: ms, Fewer Is Better (MKL-DNN 2019-04-16)
-O3 -march=skylake-avx512: 10271 (SE +/- 4.71, N = 3; MIN: 10175.5)
-O3 -march=cascadelake: 10257 (SE +/- 2.13, N = 3; MIN: 10166.6)
1. (CXX) g++ options: -O3 -std=c++11 -march=native -mtune=native -fPIC -fopenmp -pie -lmklml_intel -ldl

MKL-DNN

Harness: Convolution Batch conv_3d - Data Type: u8s8u8s32

OpenBenchmarking.org: ms, Fewer Is Better (MKL-DNN 2019-04-16)
-O3 -march=skylake-avx512: 125838 (SE +/- 1214.69, N = 3; MIN: 124475)
-O3 -march=cascadelake: 128646 (SE +/- 1664.64, N = 9; MIN: 124405)
1. (CXX) g++ options: -O3 -std=c++11 -march=native -mtune=native -fPIC -fopenmp -pie -lmklml_intel -ldl
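This test shows the largest gap between the two configurations, but also the largest standard errors, so it is worth checking the gap against the run-to-run noise. The sketch below is a rough rule of thumb rather than anything the Phoronix Test Suite reports: it assumes the two sets of runs are independent, combines the two reported SE values in quadrature, and uses a hypothetical threshold of 2 combined standard errors. The function name se_ratio is likewise made up for this example.

```python
import math


def se_ratio(mean_a, se_a, mean_b, se_b):
    """Difference of two means expressed in units of their combined standard error."""
    combined_se = math.sqrt(se_a ** 2 + se_b ** 2)  # assumes independent runs
    return abs(mean_b - mean_a) / combined_se


# Values reported for Convolution Batch conv_3d - u8s8u8s32 (ms).
ratio = se_ratio(125838, 1214.69, 128646, 1664.64)

# Assumed threshold: below roughly 2 combined SEs, treat the gap as noise.
within_noise = ratio < 2.0
print(f"difference = {ratio:.2f} combined SEs, within noise: {within_noise}")
```

With only 3 and 9 runs behind the two means, this is an indication at best; a proper test would need the individual run times, which are not part of this export.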

MKL-DNN

Harness: Convolution Batch conv_all - Data Type: u8s8u8s32

OpenBenchmarking.org: ms, Fewer Is Better (MKL-DNN 2019-04-16)
-O3 -march=skylake-avx512: 59581 (SE +/- 321.79, N = 3; MIN: 59174)
-O3 -march=cascadelake: 59454 (SE +/- 265.15, N = 3; MIN: 59115)
1. (CXX) g++ options: -O3 -std=c++11 -march=native -mtune=native -fPIC -fopenmp -pie -lmklml_intel -ldl

MKL-DNN

Harness: Convolution Batch conv_googlenet_v3 - Data Type: f32

OpenBenchmarking.org: ms, Fewer Is Better (MKL-DNN 2019-04-16)
-O3 -march=skylake-avx512: 727 (SE +/- 0.66, N = 3; MIN: 717.67)
-O3 -march=cascadelake: 727 (SE +/- 0.39, N = 3; MIN: 717.66)
1. (CXX) g++ options: -O3 -std=c++11 -march=native -mtune=native -fPIC -fopenmp -pie -lmklml_intel -ldl

MKL-DNN

Harness: Deconvolution Batch deconv_1d - Data Type: u8s8u8s32

OpenBenchmarking.org: ms, Fewer Is Better (MKL-DNN 2019-04-16)
-O3 -march=skylake-avx512: 5.40 (SE +/- 0.01, N = 3; MIN: 5.32)
-O3 -march=cascadelake: 5.41 (SE +/- 0.01, N = 3; MIN: 5.32)
1. (CXX) g++ options: -O3 -std=c++11 -march=native -mtune=native -fPIC -fopenmp -pie -lmklml_intel -ldl

MKL-DNN

Harness: Deconvolution Batch deconv_3d - Data Type: u8s8u8s32

OpenBenchmarking.org: ms, Fewer Is Better (MKL-DNN 2019-04-16)
-O3 -march=skylake-avx512: 79735 (SE +/- 76.36, N = 3; MIN: 79643)
-O3 -march=cascadelake: 79948 (SE +/- 139.78, N = 3; MIN: 79663.4)
1. (CXX) g++ options: -O3 -std=c++11 -march=native -mtune=native -fPIC -fopenmp -pie -lmklml_intel -ldl

MKL-DNN

Harness: Convolution Batch conv_alexnet - Data Type: u8s8u8s32

OpenBenchmarking.org: ms, Fewer Is Better (MKL-DNN 2019-04-16)
-O3 -march=skylake-avx512: 452 (SE +/- 0.13, N = 3; MIN: 444.46)
-O3 -march=cascadelake: 453 (SE +/- 0.33, N = 3; MIN: 444.07)
1. (CXX) g++ options: -O3 -std=c++11 -march=native -mtune=native -fPIC -fopenmp -pie -lmklml_intel -ldl

MKL-DNN

Harness: Deconvolution Batch deconv_all - Data Type: u8s8u8s32

OpenBenchmarking.org: ms, Fewer Is Better (MKL-DNN 2019-04-16)
-O3 -march=skylake-avx512: 126456 (SE +/- 186.57, N = 3; MIN: 126023)
-O3 -march=cascadelake: 126386 (SE +/- 134.27, N = 3; MIN: 126069)
1. (CXX) g++ options: -O3 -std=c++11 -march=native -mtune=native -fPIC -fopenmp -pie -lmklml_intel -ldl

MKL-DNN

Harness: Convolution Batch conv_googlenet_v3 - Data Type: u8s8u8s32

OpenBenchmarking.org: ms, Fewer Is Better (MKL-DNN 2019-04-16)
-O3 -march=skylake-avx512: 177 (SE +/- 0.11, N = 3; MIN: 175.69)
-O3 -march=cascadelake: 177 (SE +/- 0.10, N = 3; MIN: 175.83)
1. (CXX) g++ options: -O3 -std=c++11 -march=native -mtune=native -fPIC -fopenmp -pie -lmklml_intel -ldl
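Because the 18 tests span several orders of magnitude (about 5 ms to about 126000 ms), a plain average of the times would be dominated by the slowest harnesses. One common way to summarize such a comparison is the geometric mean of the per-test ratios. The sketch below assumes that approach; the export itself does not include an aggregate figure, and the names pairs and geometric_mean are chosen for this example only. The values are the mean times listed for each test in this result file.

```python
import math

# (skylake-avx512, cascadelake) mean times in ms, in the order the tests appear.
pairs = [
    (17.55, 17.44), (249, 249), (4.59, 4.58), (72.49, 72.38),
    (85.67, 85.40), (13218, 13215), (23.77, 23.53), (27.41, 27.77),
    (1756, 1753), (10271, 10257), (125838, 128646), (59581, 59454),
    (727, 727), (5.40, 5.41), (79735, 79948), (452, 453),
    (126456, 126386), (177, 177),
]


def geometric_mean(values):
    """Geometric mean via the average of logarithms (all values must be > 0)."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Ratio > 1 means -march=cascadelake took longer on that test.
ratios = [cascadelake / skylake for skylake, cascadelake in pairs]
overall = geometric_mean(ratios)
print(f"geometric mean of cascadelake/skylake-avx512 time ratios: {overall:.4f}")
```

The result lands very close to 1.0, which matches what the individual charts suggest: the two -march targets produce nearly identical timings on this system.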


Phoronix Test Suite v10.8.4