oneDNN 3.0 Intel Sapphire Rapids AMX

2 x Intel Xeon Platinum 8490H oneDNN Intel AMX Sapphire Rapids benchmarks by Michael Larabel for a future article.

HTML result view exported from: https://openbenchmarking.org/result/2301158-NE-ONEDNN30I77&sor.

System Details

Configurations tested: AVX512_CORE_AMX, AVX512_CORE_FP16, AVX512_CORE_BF16, AVX512_CORE_VNNI, AVX512_CORE

Processor: 2 x Intel Xeon Platinum 8490H @ 3.50GHz (120 Cores / 240 Threads)
Motherboard: Quanta Cloud S6Q-MB-MPS (3A10.uh BIOS)
Chipset: Intel Device 1bce
Memory: 16 x 64 GB 4800MT/s Samsung M321R8GA0BB0-CQKEG
Disk: 2 x 1920GB SAMSUNG MZWLJ1T9HBJR-00007 + 3841GB Micron_9300_MTFDHAL3T8TDP + 960GB INTEL SSDSC2KG96
Graphics: ASPEED
Monitor: VGA HDMI
Network: 4 x Intel E810-C for QSFP + 2 x Intel X710 for 10GBASE-T
OS: Ubuntu 22.04
Kernel: 6.1.4-060104-generic (x86_64)
Desktop: GNOME Shell 42.2
Display Server: X Server 1.21.1.3
Vulkan: 1.2.204
Compiler: GCC 11.3.0 + Clang 14.0.0-1ubuntu1
File-System: ext4
Screen Resolution: 1920x1080

Kernel Details
- Transparent Huge Pages: madvise

Compiler Details
- --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-11-xKiWfi/gcc-11-11.3.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-xKiWfi/gcc-11-11.3.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v

Processor Details
- Scaling Governor: intel_pstate performance (EPP: performance)
- CPU Microcode: 0x2b0000c0

Security Details
- itlb_multihit: Not affected
- l1tf: Not affected
- mds: Not affected
- meltdown: Not affected
- mmio_stale_data: Not affected
- retbleed: Not affected
- spec_store_bypass: Mitigation of SSB disabled via prctl
- spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
- spectre_v2: Mitigation of Enhanced IBRS IBPB: conditional RSB filling PBRSB-eIBRS: SW sequence
- srbds: Not affected
- tsx_async_abort: Not affected

Result Overview (oneDNN 3.0, Data Type: bf16bf16bf16, Engine: CPU; ms, Fewer Is Better)

Harness                                     AVX512_CORE_AMX  AVX512_CORE_FP16  AVX512_CORE_BF16  AVX512_CORE_VNNI  AVX512_CORE
Deconvolution Batch shapes_1d               0.580646         2.17073           2.17109           3.06721           3.05612
Deconvolution Batch shapes_3d               0.445779         1.04511           1.04836           2.37892           2.38361
IP Shapes 1D                                5.73717          6.48366           6.58841           6.82433           6.64816
Matrix Multiply Batch Shapes Transformer    0.329961         0.383785          0.386636          16.5751           16.4589
Recurrent Neural Network Inference          850.346          875.284           864.459           862.421           856.896
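The overview figures can be turned into relative speedups with a few lines of Python. The sketch below is only an illustration: the values are copied from the overview above, while the choice of AVX512_CORE as the baseline and the names RESULTS_MS, BASELINE and speedups are assumptions made for this example and are not part of the exported result file.

```python
# Average run times in milliseconds (fewer is better), copied from the
# result overview. BASELINE is an illustrative choice, not something the
# original result file specifies.
RESULTS_MS = {
    "Deconvolution Batch shapes_1d": {
        "AVX512_CORE_AMX": 0.580646, "AVX512_CORE_FP16": 2.17073,
        "AVX512_CORE_BF16": 2.17109, "AVX512_CORE_VNNI": 3.06721,
        "AVX512_CORE": 3.05612,
    },
    "Deconvolution Batch shapes_3d": {
        "AVX512_CORE_AMX": 0.445779, "AVX512_CORE_FP16": 1.04511,
        "AVX512_CORE_BF16": 1.04836, "AVX512_CORE_VNNI": 2.37892,
        "AVX512_CORE": 2.38361,
    },
    "IP Shapes 1D": {
        "AVX512_CORE_AMX": 5.73717, "AVX512_CORE_FP16": 6.48366,
        "AVX512_CORE_BF16": 6.58841, "AVX512_CORE_VNNI": 6.82433,
        "AVX512_CORE": 6.64816,
    },
    "Matrix Multiply Batch Shapes Transformer": {
        "AVX512_CORE_AMX": 0.329961, "AVX512_CORE_FP16": 0.383785,
        "AVX512_CORE_BF16": 0.386636, "AVX512_CORE_VNNI": 16.5751,
        "AVX512_CORE": 16.4589,
    },
    "Recurrent Neural Network Inference": {
        "AVX512_CORE_AMX": 850.346, "AVX512_CORE_FP16": 875.284,
        "AVX512_CORE_BF16": 864.459, "AVX512_CORE_VNNI": 862.421,
        "AVX512_CORE": 856.896,
    },
}

BASELINE = "AVX512_CORE"


def speedups(times_ms, baseline=BASELINE):
    """Return baseline_time / time for every configuration of one test."""
    base = times_ms[baseline]
    return {config: base / t for config, t in times_ms.items()}


# Print one block per harness; values above 1.00x are faster than baseline.
for test, times_ms in RESULTS_MS.items():
    print(test)
    for config, factor in speedups(times_ms).items():
        print(f"  {config:<18} {factor:6.2f}x")
```

Because lower times are better, a factor above 1 means the configuration finished faster than the baseline. With these numbers the AVX512_CORE_AMX run comes out at roughly 5x for Deconvolution Batch shapes_1d and roughly 50x for the Transformer matrix multiply, while the RNN inference harness barely moves.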

oneDNN

Harness: Deconvolution Batch shapes_1d - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 3.0 - ms, Fewer Is Better

Configuration      Result     SE +/-     N   MIN
AVX512_CORE_AMX    0.580646   0.006207   4   0.47
AVX512_CORE_FP16   2.170730   0.004682   3   2.01
AVX512_CORE_BF16   2.171090   0.008260   3   2
AVX512_CORE        3.056120   0.025853   3   2.84
AVX512_CORE_VNNI   3.067210   0.014891   3   2.85

1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
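The "SE +/-" and "N" figures reported with each result can be related to run-to-run spread. The following sketch assumes that SE is the standard error of the mean, that is SE = s / sqrt(N) with s the sample standard deviation; the export does not spell out the formula, so treat this as an assumption. The function names are hypothetical and the inputs are the AVX512_CORE_AMX figures from the shapes_1d result above.

```python
import math


def implied_stddev(standard_error, n):
    """Invert SE = s / sqrt(N) to recover the sample standard deviation s.

    Assumes the reported SE is the standard error of the mean.
    """
    return standard_error * math.sqrt(n)


def relative_se_percent(mean, standard_error):
    """Express the standard error as a percentage of the mean."""
    return 100.0 * standard_error / mean


# AVX512_CORE_AMX, Deconvolution Batch shapes_1d: 0.580646 ms, SE 0.006207, N = 4
mean_ms, se_ms, runs = 0.580646, 0.006207, 4
print(f"implied std dev: {implied_stddev(se_ms, runs):.6f} ms")
print(f"relative SE:     {relative_se_percent(mean_ms, se_ms):.2f} %")
```

A small relative SE suggests the runs were tightly clustered; results with larger N (for example N = 15) are ones where more runs were recorded for that configuration.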

oneDNN

CPU Peak Freq (Highest CPU Core Frequency) Monitor

oneDNN 3.0 - Megahertz, More Is Better

Configuration      Min    Avg    Max
AVX512_CORE_VNNI   1900   2917   3502
AVX512_CORE        1900   2858   3501
AVX512_CORE_FP16   1900   2844   3506
AVX512_CORE_BF16   1900   2791   3502
AVX512_CORE_AMX    1900   2766   3501

oneDNN

CPU Power Consumption Monitor

oneDNN 3.0 - Watts, Fewer Is Better

Configuration      Min   Avg   Max
AVX512_CORE_AMX    194   534   637
AVX512_CORE_FP16   194   586   721
AVX512_CORE_BF16   199   591   722
AVX512_CORE        197   596   744
AVX512_CORE_VNNI   201   596   744

oneDNN

CPU Temperature Monitor

oneDNN 3.0 - Celsius, Fewer Is Better

Configuration      Min    Avg    Max
AVX512_CORE_AMX    29.0   41.4   47.0
AVX512_CORE_BF16   28.0   43.1   51.0
AVX512_CORE_VNNI   29.0   43.7   51.0
AVX512_CORE        30.0   44.9   52.0
AVX512_CORE_FP16   33.0   46.0   52.0

oneDNN

Harness: Deconvolution Batch shapes_3d - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 3.0 - ms, Fewer Is Better

Configuration      Result     SE +/-     N   MIN
AVX512_CORE_AMX    0.445779   0.001544   9   0.38
AVX512_CORE_FP16   1.045110   0.003634   9   1.01
AVX512_CORE_BF16   1.048360   0.005254   9   1.01
AVX512_CORE_VNNI   2.378920   0.007894   9   2.3
AVX512_CORE        2.383610   0.008552   9   2.29

1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

oneDNN

CPU Peak Freq (Highest CPU Core Frequency) Monitor

oneDNN 3.0 - Megahertz, More Is Better

Configuration      Min    Avg    Max
AVX512_CORE_BF16   1900   3110   3502
AVX512_CORE_VNNI   1900   2952   3506
AVX512_CORE_AMX    1900   2918   3508
AVX512_CORE        1900   2867   3510
AVX512_CORE_FP16   1900   2755   3504

oneDNN

CPU Power Consumption Monitor

oneDNN 3.0 - Watts, Fewer Is Better

Configuration      Min   Avg   Max
AVX512_CORE_FP16   196   416   702
AVX512_CORE_BF16   195   425   705
AVX512_CORE_VNNI   198   428   722
AVX512_CORE        200   430   721
AVX512_CORE_AMX    194   440   756

oneDNN

CPU Temperature Monitor

oneDNN 3.0 - Celsius, Fewer Is Better

Configuration      Min    Avg    Max
AVX512_CORE_VNNI   35.0   39.8   46.0
AVX512_CORE_BF16   35.0   40.2   45.0
AVX512_CORE_AMX    35.0   40.2   45.0
AVX512_CORE        35.0   40.7   47.0
AVX512_CORE_FP16   36.0   40.9   46.0

oneDNN

Harness: IP Shapes 1D - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 3.0 - ms, Fewer Is Better

Configuration      Result    SE +/-    N    MIN
AVX512_CORE_AMX    5.73717   0.11275   15   3.75
AVX512_CORE_FP16   6.48366   0.27236   15   3.65
AVX512_CORE_BF16   6.58841   0.33773   15   3.66
AVX512_CORE        6.64816   0.26087   15   3.81
AVX512_CORE_VNNI   6.82433   0.31537   15   4.26

1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

oneDNN

CPU Peak Freq (Highest CPU Core Frequency) Monitor

oneDNN 3.0 - Megahertz, More Is Better

Configuration      Min    Avg    Max
AVX512_CORE_AMX    1900   2985   3512
AVX512_CORE_VNNI   1900   2940   3515
AVX512_CORE_BF16   1900   2933   3508
AVX512_CORE        1900   2926   3513
AVX512_CORE_FP16   1900   2924   3510

oneDNN

CPU Power Consumption Monitor

oneDNN 3.0 - Watts, Fewer Is Better

Configuration      Min   Avg   Max
AVX512_CORE_AMX    106   493   604
AVX512_CORE_FP16   195   508   657
AVX512_CORE_BF16   197   510   652
AVX512_CORE_VNNI   193   534   705
AVX512_CORE        200   536   710

oneDNN

CPU Temperature Monitor

oneDNN 3.0 - Celsius, Fewer Is Better

Configuration      Min    Avg    Max
AVX512_CORE_AMX    34.0   43.4   47.0
AVX512_CORE_BF16   34.0   44.2   47.0
AVX512_CORE_FP16   34.0   44.5   48.0
AVX512_CORE_VNNI   34.0   45.1   50.0
AVX512_CORE        34.0   45.2   49.0

oneDNN

Harness: Matrix Multiply Batch Shapes Transformer - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 3.0 - ms, Fewer Is Better

Configuration      Result      SE +/-     N    MIN
AVX512_CORE_AMX    0.329961    0.002541   4    -
AVX512_CORE_FP16   0.383785    0.001848   4    -
AVX512_CORE_BF16   0.386636    0.003559   4    -
AVX512_CORE        16.458900   0.370395   15   12.81
AVX512_CORE_VNNI   16.575100   0.135276   4    15.34

Only two MIN values appear in the export (12.81 and 15.34); by position and magnitude they correspond to AVX512_CORE and AVX512_CORE_VNNI.

1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

oneDNN

CPU Peak Freq (Highest CPU Core Frequency) Monitor

oneDNN 3.0 - Megahertz, More Is Better

Configuration      Min    Avg    Max
AVX512_CORE_VNNI   1900   3014   3512
AVX512_CORE        1900   2959   3508
AVX512_CORE_AMX    1900   2877   5340
AVX512_CORE_FP16   1900   2877   3502
AVX512_CORE_BF16   1900   2766   3509

oneDNN

CPU Power Consumption Monitor

oneDNN 3.0 - Watts, Fewer Is Better

Configuration      Min   Avg   Max
AVX512_CORE_AMX    197   485   593
AVX512_CORE_VNNI   199   486   601
AVX512_CORE        197   487   609
AVX512_CORE_BF16   196   507   647
AVX512_CORE_FP16   199   512   647

oneDNN

CPU Temperature Monitor

oneDNN 3.0 - Celsius, Fewer Is Better

Configuration      Min    Avg    Max
AVX512_CORE_AMX    35.0   42.5   46.0
AVX512_CORE_VNNI   36.0   42.7   49.0
AVX512_CORE        35.0   42.8   46.0
AVX512_CORE_BF16   35.0   43.7   48.0
AVX512_CORE_FP16   36.0   44.2   48.0

oneDNN

Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 3.0 - ms, Fewer Is Better

Configuration      Result   SE +/-   N    MIN
AVX512_CORE_AMX    850.35   9.48     4    797.61
AVX512_CORE        856.90   10.59    15   741.53
AVX512_CORE_VNNI   862.42   13.04    15   767.35
AVX512_CORE_BF16   864.46   10.05    15   758.74
AVX512_CORE_FP16   875.28   12.20    12   777.12

1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

oneDNN

CPU Peak Freq (Highest CPU Core Frequency) Monitor

oneDNN 3.0 - Megahertz, More Is Better

Configuration      Min    Avg    Max
AVX512_CORE_AMX    1900   2927   3518
AVX512_CORE        1900   2926   3515
AVX512_CORE_VNNI   1900   2924   3514
AVX512_CORE_BF16   1900   2923   3515
AVX512_CORE_FP16   1900   2923   3516

oneDNN

CPU Power Consumption Monitor

oneDNN 3.0 - Watts, Fewer Is Better

Configuration      Min   Avg   Max
AVX512_CORE_FP16   194   576   732
AVX512_CORE_AMX    196   576   732
AVX512_CORE_VNNI   197   576   733
AVX512_CORE        201   576   734
AVX512_CORE_BF16   197   577   733

oneDNN

CPU Temperature Monitor

oneDNN 3.0 - Celsius, Fewer Is Better

Configuration      Min   Avg     Max
AVX512_CORE        34    48.63   53
AVX512_CORE_VNNI   35    48.74   53
AVX512_CORE_AMX    37    48.92   53
AVX512_CORE_FP16   37    49.6    55
AVX512_CORE_BF16   38    49.73   54

CPU Peak Freq (Highest CPU Core Frequency) Monitor

Phoronix Test Suite System Monitoring

Megahertz

Configuration      Min    Avg       Max
AVX512_CORE        1900   2933.61   3515
AVX512_CORE_FP16   1900   2938.87   5424
AVX512_CORE_AMX    1900   2940.37   5340
AVX512_CORE_VNNI   1900   2941.72   3515
AVX512_CORE_BF16   1900   2941.89   3518

CPU Power Consumption Monitor

Phoronix Test Suite System Monitoring

Watts

Configuration      Min      Avg      Max
AVX512_CORE        195.36   545.16   744.31
AVX512_CORE_VNNI   104.8    550.33   744.38
AVX512_CORE_AMX    106.17   565.15   756.46
AVX512_CORE_BF16   105.24   567.97   756.24
AVX512_CORE_FP16   193.64   570.71   756.46

CPU Temperature Monitor

Phoronix Test Suite System Monitoring

Celsius

Configuration      Min   Avg     Max
AVX512_CORE        30    46.72   53
AVX512_CORE_VNNI   29    47.07   53
AVX512_CORE_AMX    29    48.95   54
AVX512_CORE_BF16   28    49      54
AVX512_CORE_FP16   33    49.66   56


Phoronix Test Suite v10.8.4