oneDNN 3.0 Intel Sapphire Rapids AMX

2 x Intel Xeon Platinum 8490H oneDNN Intel AMX Sapphire Rapids benchmarks by Michael Larabel for a future article.

HTML result view exported from: https://openbenchmarking.org/result/2301158-NE-ONEDNN30I77.

oneDNN 3.0 Intel Sapphire Rapids AMX

Configurations: AVX512_CORE_AMX, AVX512_CORE_FP16, AVX512_CORE_BF16, AVX512_CORE_VNNI, AVX512_CORE

Processor: 2 x Intel Xeon Platinum 8490H @ 3.50GHz (120 Cores / 240 Threads)
Motherboard: Quanta Cloud S6Q-MB-MPS (3A10.uh BIOS)
Chipset: Intel Device 1bce
Memory: 16 x 64 GB 4800MT/s Samsung M321R8GA0BB0-CQKEG
Disk: 2 x 1920GB SAMSUNG MZWLJ1T9HBJR-00007 + 3841GB Micron_9300_MTFDHAL3T8TDP + 960GB INTEL SSDSC2KG96
Graphics: ASPEED
Monitor: VGA HDMI
Network: 4 x Intel E810-C for QSFP + 2 x Intel X710 for 10GBASE-T
OS: Ubuntu 22.04
Kernel: 6.1.4-060104-generic (x86_64)
Desktop: GNOME Shell 42.2
Display Server: X Server 1.21.1.3
Vulkan: 1.2.204
Compiler: GCC 11.3.0 + Clang 14.0.0-1ubuntu1
File-System: ext4
Screen Resolution: 1920x1080

Kernel Details
- Transparent Huge Pages: madvise

Compiler Details
- --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-11-xKiWfi/gcc-11-11.3.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-xKiWfi/gcc-11-11.3.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v

Processor Details
- Scaling Governor: intel_pstate performance (EPP: performance)
- CPU Microcode: 0x2b0000c0

Security Details
- itlb_multihit: Not affected
- l1tf: Not affected
- mds: Not affected
- meltdown: Not affected
- mmio_stale_data: Not affected
- retbleed: Not affected
- spec_store_bypass: Mitigation of SSB disabled via prctl
- spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
- spectre_v2: Mitigation of Enhanced IBRS IBPB: conditional RSB filling PBRSB-eIBRS: SW sequence
- srbds: Not affected
- tsx_async_abort: Not affected

oneDNN 3.0 Intel Sapphire Rapids AMX

Test                                                                   AVX512_CORE_AMX   AVX512_CORE_FP16  AVX512_CORE_BF16  AVX512_CORE_VNNI  AVX512_CORE
onednn: Deconvolution Batch shapes_1d - bf16bf16bf16 - CPU             0.580646          2.17073           2.17109           3.06721           3.05612
onednn: Deconvolution Batch shapes_3d - bf16bf16bf16 - CPU             0.445779          1.04511           1.04836           2.37892           2.38361
onednn: IP Shapes 1D - bf16bf16bf16 - CPU                              5.73717           6.48366           6.58841           6.82433           6.64816
onednn: Matrix Multiply Batch Shapes Transformer - bf16bf16bf16 - CPU  0.329961          0.383785          0.386636          16.5751           16.4589
onednn: Recurrent Neural Network Inference - bf16bf16bf16 - CPU        850.346           875.284           864.459           862.421           856.896
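The summary figures are mean run times, so relative performance has to be derived from them. The sketch below is one way to do that: it divides the AVX512_CORE time by each configuration's time for every harness. Choosing AVX512_CORE as the baseline, and the names `CONFIGS`, `RESULTS_MS` and `speedups`, are assumptions made for illustration; the numbers are copied from the summary above.

```python
# Mean times in milliseconds (fewer is better), copied from the summary above.
# List order follows CONFIGS.
CONFIGS = ["AVX512_CORE_AMX", "AVX512_CORE_FP16", "AVX512_CORE_BF16",
           "AVX512_CORE_VNNI", "AVX512_CORE"]

RESULTS_MS = {
    "Deconvolution Batch shapes_1d": [0.580646, 2.17073, 2.17109, 3.06721, 3.05612],
    "Deconvolution Batch shapes_3d": [0.445779, 1.04511, 1.04836, 2.37892, 2.38361],
    "IP Shapes 1D": [5.73717, 6.48366, 6.58841, 6.82433, 6.64816],
    "Matrix Multiply Batch Shapes Transformer": [0.329961, 0.383785, 0.386636, 16.5751, 16.4589],
    "Recurrent Neural Network Inference": [850.346, 875.284, 864.459, 862.421, 856.896],
}


def speedups(times_ms, baseline="AVX512_CORE"):
    """Return baseline_time / time per configuration (values above 1 are faster)."""
    base = times_ms[CONFIGS.index(baseline)]
    return {cfg: base / t for cfg, t in zip(CONFIGS, times_ms)}


# Print one line per harness with the speedup of every configuration.
for harness, times_ms in RESULTS_MS.items():
    ratios = speedups(times_ms)
    row = ", ".join(f"{cfg}: {r:.2f}x" for cfg, r in ratios.items())
    print(f"{harness}: {row}")
```

Read this way, AVX512_CORE_AMX is roughly 5.3x faster than AVX512_CORE on Deconvolution Batch shapes_1d and close to 50x faster on the Matrix Multiply harness, while the Recurrent Neural Network Inference times differ by about 1%.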

oneDNN

Harness: Deconvolution Batch shapes_1d - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 3.0 (ms, Fewer Is Better)

Configuration      Result     SE +/-     N   MIN
AVX512_CORE_AMX    0.580646   0.006207   4   0.47
AVX512_CORE_FP16   2.170730   0.004682   3   2.01
AVX512_CORE_BF16   2.171090   0.008260   3   2
AVX512_CORE_VNNI   3.067210   0.014891   3   2.85
AVX512_CORE        3.056120   0.025853   3   2.84

1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

oneDNN

CPU Peak Freq (Highest CPU Core Frequency) Monitor

oneDNN 3.0 (Megahertz, More Is Better)

Configuration      Min    Avg    Max
AVX512_CORE_AMX    1900   2766   3501
AVX512_CORE_FP16   1900   2844   3506
AVX512_CORE_BF16   1900   2791   3502
AVX512_CORE_VNNI   1900   2917   3502
AVX512_CORE        1900   2858   3501

oneDNN

CPU Power Consumption Monitor

oneDNN 3.0 (Watts, Fewer Is Better)

Configuration      Min   Avg   Max
AVX512_CORE_AMX    194   534   637
AVX512_CORE_FP16   194   586   721
AVX512_CORE_BF16   199   591   722
AVX512_CORE_VNNI   201   596   744
AVX512_CORE        197   596   744

oneDNN

CPU Temperature Monitor

oneDNN 3.0 (Celsius, Fewer Is Better)

Configuration      Min    Avg    Max
AVX512_CORE_AMX    29.0   41.4   47.0
AVX512_CORE_FP16   33.0   46.0   52.0
AVX512_CORE_BF16   28.0   43.1   51.0
AVX512_CORE_VNNI   29.0   43.7   51.0
AVX512_CORE        30.0   44.9   52.0

oneDNN

Harness: Deconvolution Batch shapes_3d - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 3.0 (ms, Fewer Is Better)

Configuration      Result     SE +/-     N   MIN
AVX512_CORE_AMX    0.445779   0.001544   9   0.38
AVX512_CORE_FP16   1.045110   0.003634   9   1.01
AVX512_CORE_BF16   1.048360   0.005254   9   1.01
AVX512_CORE_VNNI   2.378920   0.007894   9   2.3
AVX512_CORE        2.383610   0.008552   9   2.29

1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

oneDNN

CPU Peak Freq (Highest CPU Core Frequency) Monitor

oneDNN 3.0 (Megahertz, More Is Better)

Configuration      Min    Avg    Max
AVX512_CORE_AMX    1900   2918   3508
AVX512_CORE_FP16   1900   2755   3504
AVX512_CORE_BF16   1900   3110   3502
AVX512_CORE_VNNI   1900   2952   3506
AVX512_CORE        1900   2867   3510

oneDNN

CPU Power Consumption Monitor

oneDNN 3.0 (Watts, Fewer Is Better)

Configuration      Min   Avg   Max
AVX512_CORE_AMX    194   440   756
AVX512_CORE_FP16   196   416   702
AVX512_CORE_BF16   195   425   705
AVX512_CORE_VNNI   198   428   722
AVX512_CORE        200   430   721

oneDNN

CPU Temperature Monitor

oneDNN 3.0 (Celsius, Fewer Is Better)

Configuration      Min    Avg    Max
AVX512_CORE_AMX    35.0   40.2   45.0
AVX512_CORE_FP16   36.0   40.9   46.0
AVX512_CORE_BF16   35.0   40.2   45.0
AVX512_CORE_VNNI   35.0   39.8   46.0
AVX512_CORE        35.0   40.7   47.0

oneDNN

Harness: IP Shapes 1D - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 3.0 (ms, Fewer Is Better)

Configuration      Result    SE +/-    N    MIN
AVX512_CORE_AMX    5.73717   0.11275   15   3.75
AVX512_CORE_FP16   6.48366   0.27236   15   3.65
AVX512_CORE_BF16   6.58841   0.33773   15   3.66
AVX512_CORE_VNNI   6.82433   0.31537   15   4.26
AVX512_CORE        6.64816   0.26087   15   3.81

1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

oneDNN

CPU Peak Freq (Highest CPU Core Frequency) Monitor

oneDNN 3.0 (Megahertz, More Is Better)

Configuration      Min    Avg    Max
AVX512_CORE_AMX    1900   2985   3512
AVX512_CORE_FP16   1900   2924   3510
AVX512_CORE_BF16   1900   2933   3508
AVX512_CORE_VNNI   1900   2940   3515
AVX512_CORE        1900   2926   3513

oneDNN

CPU Power Consumption Monitor

oneDNN 3.0 (Watts, Fewer Is Better)

Configuration      Min   Avg   Max
AVX512_CORE_AMX    106   493   604
AVX512_CORE_FP16   195   508   657
AVX512_CORE_BF16   197   510   652
AVX512_CORE_VNNI   193   534   705
AVX512_CORE        200   536   710

oneDNN

CPU Temperature Monitor

oneDNN 3.0 (Celsius, Fewer Is Better)

Configuration      Min    Avg    Max
AVX512_CORE_AMX    34.0   43.4   47.0
AVX512_CORE_FP16   34.0   44.5   48.0
AVX512_CORE_BF16   34.0   44.2   47.0
AVX512_CORE_VNNI   34.0   45.1   50.0
AVX512_CORE        34.0   45.2   49.0

oneDNN

Harness: Matrix Multiply Batch Shapes Transformer - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 3.0 (ms, Fewer Is Better)

Configuration      Result      SE +/-     N    MIN
AVX512_CORE_AMX    0.329961    0.002541   4    -
AVX512_CORE_FP16   0.383785    0.001848   4    -
AVX512_CORE_BF16   0.386636    0.003559   4    -
AVX512_CORE_VNNI   16.575100   0.135276   4    15.34
AVX512_CORE        16.458900   0.370395   15   12.81

MIN is listed for only two results in the export.

1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

oneDNN

CPU Peak Freq (Highest CPU Core Frequency) Monitor

oneDNN 3.0 (Megahertz, More Is Better)

Configuration      Min    Avg    Max
AVX512_CORE_AMX    1900   2877   5340
AVX512_CORE_FP16   1900   2877   3502
AVX512_CORE_BF16   1900   2766   3509
AVX512_CORE_VNNI   1900   3014   3512
AVX512_CORE        1900   2959   3508

oneDNN

CPU Power Consumption Monitor

oneDNN 3.0 (Watts, Fewer Is Better)

Configuration      Min   Avg   Max
AVX512_CORE_AMX    197   485   593
AVX512_CORE_FP16   199   512   647
AVX512_CORE_BF16   196   507   647
AVX512_CORE_VNNI   199   486   601
AVX512_CORE        197   487   609

oneDNN

CPU Temperature Monitor

oneDNN 3.0 (Celsius, Fewer Is Better)

Configuration      Min    Avg    Max
AVX512_CORE_AMX    35.0   42.5   46.0
AVX512_CORE_FP16   36.0   44.2   48.0
AVX512_CORE_BF16   35.0   43.7   48.0
AVX512_CORE_VNNI   36.0   42.7   49.0
AVX512_CORE        35.0   42.8   46.0

oneDNN

Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 3.0 (ms, Fewer Is Better)

Configuration      Result   SE +/-   N    MIN
AVX512_CORE_AMX    850.35   9.48     4    797.61
AVX512_CORE_FP16   875.28   12.20    12   777.12
AVX512_CORE_BF16   864.46   10.05    15   758.74
AVX512_CORE_VNNI   862.42   13.04    15   767.35
AVX512_CORE        856.90   10.59    15   741.53

1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
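Because every result comes with a standard error, a rough check can indicate whether a gap between two configurations is larger than run-to-run noise. The sketch below expresses the difference of two means in units of their combined standard error. Treating the runs as independent and approximately normal, and reading values below about 2 as "within noise", are assumptions made here rather than anything stated in the results; `z_score` is an illustrative helper name.

```python
import math


def z_score(mean_a, se_a, mean_b, se_b):
    """Difference of two means divided by their combined standard error."""
    return (mean_a - mean_b) / math.sqrt(se_a ** 2 + se_b ** 2)


# Recurrent Neural Network Inference: slowest (AVX512_CORE_FP16) vs
# fastest (AVX512_CORE_AMX) mean, values taken from the results above.
z_rnn = z_score(875.28, 12.20, 850.35, 9.48)

# Deconvolution Batch shapes_1d: AVX512_CORE vs AVX512_CORE_AMX,
# values taken from that harness's results earlier in this document.
z_deconv_1d = z_score(3.056120, 0.025853, 0.580646, 0.006207)

print(f"RNN inference, FP16 vs AMX:       z = {z_rnn:.2f}")
print(f"Deconvolution 1d, AVX512 vs AMX:  z = {z_deconv_1d:.2f}")
```

Under these assumptions the largest gap in the recurrent network harness stays below 2 combined standard errors, while the deconvolution gap is many times larger than its noise.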

oneDNN

CPU Peak Freq (Highest CPU Core Frequency) Monitor

oneDNN 3.0 (Megahertz, More Is Better)

Configuration      Min    Avg    Max
AVX512_CORE_AMX    1900   2927   3518
AVX512_CORE_FP16   1900   2923   3516
AVX512_CORE_BF16   1900   2923   3515
AVX512_CORE_VNNI   1900   2924   3514
AVX512_CORE        1900   2926   3515

oneDNN

CPU Power Consumption Monitor

oneDNN 3.0 (Watts, Fewer Is Better)

Configuration      Min   Avg   Max
AVX512_CORE_AMX    196   576   732
AVX512_CORE_FP16   194   576   732
AVX512_CORE_BF16   197   577   733
AVX512_CORE_VNNI   197   576   733
AVX512_CORE        201   576   734

oneDNN

CPU Temperature Monitor

oneDNN 3.0 (Celsius, Fewer Is Better)

Configuration      Min    Avg    Max
AVX512_CORE_AMX    37.0   48.9   53.0
AVX512_CORE_FP16   37.0   49.6   55.0
AVX512_CORE_BF16   38.0   49.7   54.0
AVX512_CORE_VNNI   35.0   48.7   53.0
AVX512_CORE        34.0   48.6   53.0

CPU Peak Freq (Highest CPU Core Frequency) Monitor

Phoronix Test Suite System Monitoring

Megahertz

Configuration      Min    Avg       Max
AVX512_CORE_AMX    1900   2940.37   5340
AVX512_CORE_FP16   1900   2938.87   5424
AVX512_CORE_BF16   1900   2941.89   3518
AVX512_CORE_VNNI   1900   2941.72   3515
AVX512_CORE        1900   2933.61   3515

CPU Power Consumption Monitor

Phoronix Test Suite System Monitoring

Watts

Configuration      Min      Avg      Max
AVX512_CORE_AMX    106.17   565.15   756.46
AVX512_CORE_FP16   193.64   570.71   756.46
AVX512_CORE_BF16   105.24   567.97   756.24
AVX512_CORE_VNNI   104.8    550.33   744.38
AVX512_CORE        195.36   545.16   744.31

CPU Temperature Monitor

Phoronix Test Suite System Monitoring

Celsius

Configuration      Min   Avg     Max
AVX512_CORE_AMX    29    48.95   54
AVX512_CORE_FP16   33    49.66   56
AVX512_CORE_BF16   28    49      54
AVX512_CORE_VNNI   29    47.07   53
AVX512_CORE        30    46.72   53


Phoronix Test Suite v10.8.4