ARM64 oneDNN 2.0

Ampere Altra ARMv8 Neoverse-N1 testing with a WIWYNN Mt.Jade (1.1.20201019 BIOS) motherboard and ASPEED graphics on Ubuntu 20.10, run via the Phoronix Test Suite.

HTML result view exported from: https://openbenchmarking.org/result/2012096-HA-ARM64ONED36.

ARM64 oneDNN 2.0 - System Details

Processor: Ampere Altra ARMv8 Neoverse-N1 @ 3.30GHz (160 Cores)
Motherboard: WIWYNN Mt.Jade (1.1.20201019 BIOS)
Chipset: Ampere Computing LLC Device e100
Memory: 502GB
Disk: 3841GB Micron_9300_MTFDHAL3T8TDP + 960GB SAMSUNG MZ1LB960HAJQ-00007
Graphics: ASPEED
Monitor: VE228
Network: Mellanox MT28908 + Intel I210
OS: Ubuntu 20.10
Kernel: 5.10.0-051000rc6daily20201206-generic (aarch64) 20201206
Display Server: X Server 1.20.9
Display Driver: modesetting 1.20.9
Compiler: GCC 10.2.0
File-System: ext4
Screen Resolution: 1920x1080

Compiler Notes: CXXFLAGS="-O3 -march=armv8.2-a -mtune=neoverse-n1" CFLAGS="-O3 -march=armv8.2-a -mtune=neoverse-n1"
GCC Configure: --build=aarch64-linux-gnu --disable-libquadmath --disable-libquadmath-support --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-fix-cortex-a53-843419 --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-nls --enable-objc-gc=auto --enable-plugin --enable-shared --enable-threads=posix --host=aarch64-linux-gnu --program-prefix=aarch64-linux-gnu- --target=aarch64-linux-gnu --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-target-system-zlib=auto -v
Processor Notes: Scaling Governor: cppc_cpufreq performance (Boost: Enabled)
Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of __user pointer sanitization + spectre_v2: Not affected + srbds: Not affected + tsx_async_abort: Not affected

ARM64 oneDNN 2.0 - Result Summary (ms, fewer is better)

IP Shapes 1D - f32 - CPU: 55.6889
IP Shapes 3D - f32 - CPU: 121.392
IP Shapes 1D - u8s8f32 - CPU: 82.2356
IP Shapes 3D - u8s8f32 - CPU: 179.469
Convolution Batch Shapes Auto - f32 - CPU: 143.52
Deconvolution Batch shapes_1d - f32 - CPU: 34.6695
Deconvolution Batch shapes_3d - f32 - CPU: 34.7235
Convolution Batch Shapes Auto - u8s8f32 - CPU: 65.9420
Deconvolution Batch shapes_1d - u8s8f32 - CPU: 20.7374
Deconvolution Batch shapes_3d - u8s8f32 - CPU: 35.5427
Recurrent Neural Network Training - f32 - CPU: 16960.1
Recurrent Neural Network Inference - f32 - CPU: 16887.1
Recurrent Neural Network Training - u8s8f32 - CPU: 15974.1
Recurrent Neural Network Inference - u8s8f32 - CPU: 16839.0
Matrix Multiply Batch Shapes Transformer - f32 - CPU: 17.2265
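Each result below is reported as a mean over N runs together with a standard error ("SE +/- …, N = …"), i.e. the sample standard deviation of the run timings divided by the square root of N. As a minimal sketch of that statistic (the per-run timings here are hypothetical illustrations, not the actual samples behind these results):

```python
import statistics

# Hypothetical per-run timings (ms) for one harness run three times;
# PTS reports the mean, the standard error of the mean (SE), and N.
runs = [55.81, 55.57, 55.69]

mean = statistics.fmean(runs)                   # the reported result value
se = statistics.stdev(runs) / len(runs) ** 0.5  # the reported "SE +/-" figure

print(f"{mean:.2f} ms  (SE +/- {se:.2f}, N = {len(runs)})")
```

A small SE relative to the mean (as in most results below) indicates stable run-to-run timings; the RNN results, with SEs in the hundreds of ms, show far more variance across runs.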

oneDNN

Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU

Result: 55.69 ms (fewer is better); SE +/- 0.12, N = 3; MIN: 49. (CXX) g++ options: -O3 -march=armv8.2-a -mtune=neoverse-n1 -std=c++11 -fopenmp -mcpu=native -fPIC -pie -lpthread

oneDNN

Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU

Result: 121.39 ms (SE +/- 0.43, N = 3; MIN: 118.05)

oneDNN

Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPU

Result: 82.24 ms (SE +/- 0.13, N = 3; MIN: 73.61)

oneDNN

Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU

Result: 179.47 ms (SE +/- 0.82, N = 3; MIN: 167.23)

oneDNN

Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU

Result: 143.52 ms (SE +/- 1.60, N = 3; MIN: 135.02)

oneDNN

Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU

Result: 34.67 ms (SE +/- 0.22, N = 3; MIN: 27.26)

oneDNN

Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU

Result: 34.72 ms (SE +/- 0.45, N = 4; MIN: 25.61)

oneDNN

Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU

Result: 65.94 ms (SE +/- 0.37, N = 3; MIN: 38.48)

oneDNN

Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU

Result: 20.74 ms (SE +/- 0.09, N = 3; MIN: 13.96)

oneDNN

Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU

Result: 35.54 ms (SE +/- 0.09, N = 3; MIN: 32.93)

oneDNN

Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU

Result: 16960.1 ms (SE +/- 586.35, N = 10; MIN: 12083.4)

oneDNN

Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU

Result: 16887.1 ms (SE +/- 490.90, N = 9; MIN: 11151.4)

oneDNN

Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU

Result: 15974.1 ms (SE +/- 687.41, N = 9; MIN: 11432.3)

oneDNN

Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU

Result: 16839.0 ms (SE +/- 724.35, N = 12; MIN: 8880.14)

oneDNN

Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU

Result: 17.23 ms (SE +/- 0.23, N = 3; MIN: 14.47)


Phoronix Test Suite v10.8.4