amd-genoa-onednn-31

2 x AMD EPYC 9654 96-Core testing with an AMD Titanite_4G (RTI1004D BIOS) and ASPEED on Clear Linux OS 38660 via the Phoronix Test Suite.

HTML result view exported from: https://openbenchmarking.org/result/2303310-NE-AMDGENOAO60.

System details (runs a, b, c)

Processor: 2 x AMD EPYC 9654 96-Core @ 2.40GHz (192 Cores / 384 Threads)
Motherboard: AMD Titanite_4G (RTI1004D BIOS)
Chipset: AMD Device 14a4
Memory: 1520GB
Disk: 2 x 1920GB SAMSUNG MZWLJ1T9HBJR-00007
Graphics: ASPEED
Network: Broadcom NetXtreme BCM5720 PCIe
OS: Clear Linux OS 38660
Kernel: 6.2.8-1293.native (x86_64)
Display Server: X Server
Compiler: GCC 12.2.1 20230323 releases/gcc-12.2.0-616-g1b6b7f214c + Clang 15.0.7 + LLVM 15.0.7
File-System: ext4
Screen Resolution: 800x600

Kernel Details
- Transparent Huge Pages: always

Environment Details
- FFLAGS="-g -O3 -feliminate-unused-debug-types -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -m64 -fasynchronous-unwind-tables -Wp,-D_REENTRANT -ftree-loop-distribute-patterns -Wl,-z,now -Wl,-z,relro -malign-data=abi -fno-semantic-interposition -ftree-vectorize -ftree-loop-vectorize -Wl,--enable-new-dtags"
- CXXFLAGS="-g -O3 -feliminate-unused-debug-types -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -Wformat -Wformat-security -m64 -fasynchronous-unwind-tables -Wp,-D_REENTRANT -ftree-loop-distribute-patterns -Wl,-z,now -Wl,-z,relro -fno-semantic-interposition -ffat-lto-objects -fno-trapping-math -Wl,-sort-common -Wl,--enable-new-dtags -mrelax-cmpxchg-loop -fvisibility-inlines-hidden -Wl,--enable-new-dtags -std=gnu++17"
- FCFLAGS="-g -O3 -feliminate-unused-debug-types -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -m64 -fasynchronous-unwind-tables -Wp,-D_REENTRANT -ftree-loop-distribute-patterns -Wl,-z,now -Wl,-z,relro -malign-data=abi -fno-semantic-interposition -ftree-vectorize -ftree-loop-vectorize -Wl,-sort-common -Wl,--enable-new-dtags"
- CFLAGS="-g -O3 -feliminate-unused-debug-types -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -Wformat -Wformat-security -m64 -fasynchronous-unwind-tables -Wp,-D_REENTRANT -ftree-loop-distribute-patterns -Wl,-z,now -Wl,-z,relro -fno-semantic-interposition -ffat-lto-objects -fno-trapping-math -Wl,-sort-common -Wl,--enable-new-dtags -mrelax-cmpxchg-loop"
- THEANO_FLAGS="floatX=float32,openmp=true,gcc.cxxflags="-ftree-vectorize -mavx""

Compiler Details
- --build=x86_64-generic-linux --disable-libmpx --disable-libunwind-exceptions --disable-multiarch --disable-vtable-verify --disable-werror --enable-__cxa_atexit --enable-bootstrap --enable-cet --enable-clocale=gnu --enable-default-pie --enable-gnu-indirect-function --enable-gnu-indirect-function --enable-host-shared --enable-languages=c,c++,fortran,go,jit --enable-ld=default --enable-libstdcxx-pch --enable-linux-futex --enable-lto --enable-multilib --enable-plugin --enable-shared --enable-threads=posix --exec-prefix=/usr --includedir=/usr/include --target=x86_64-generic-linux --with-arch=x86-64-v3 --with-gcc-major-version-only --with-glibc-version=2.35 --with-gnu-ld --with-isl --with-pic --with-ppl=yes --with-tune=sapphirerapids --with-zstd

Processor Details
- Scaling Governor: acpi-cpufreq performance (Boost: Enabled)
- CPU Microcode: 0xa101111

Security Details
- itlb_multihit: Not affected
- l1tf: Not affected
- mds: Not affected
- meltdown: Not affected
- mmio_stale_data: Not affected
- retbleed: Not affected
- spec_store_bypass: Mitigation of SSB disabled via prctl
- spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
- spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: always-on RSB filling PBRSB-eIBRS: Not affected
- srbds: Not affected
- tsx_async_abort: Not affected

Results summary (ms, fewer is better)

Test                                                             a         b         c
onednn: IP Shapes 1D - f32 - CPU                                 5.16295   5.56995   5.13421
onednn: IP Shapes 3D - f32 - CPU                                 1.81517   1.75887   1.72273
onednn: IP Shapes 1D - u8s8f32 - CPU                             3.81667   4.12976   3.98273
onednn: IP Shapes 3D - u8s8f32 - CPU                             0.753564  0.816992  0.824101
onednn: IP Shapes 1D - bf16bf16bf16 - CPU                        3.80454   3.87903   4.61093
onednn: IP Shapes 3D - bf16bf16bf16 - CPU                        1.54627   1.64919   1.66125
onednn: Convolution Batch Shapes Auto - f32 - CPU                0.521813  0.547057  0.530342
onednn: Deconvolution Batch shapes_1d - f32 - CPU                20.8426   20.6935   20.68
onednn: Deconvolution Batch shapes_3d - f32 - CPU                0.946786  0.952539  0.954284
onednn: Convolution Batch Shapes Auto - u8s8f32 - CPU            0.362781  0.363114  0.360127
onednn: Deconvolution Batch shapes_1d - u8s8f32 - CPU            0.923951  0.946937  0.911944
onednn: Deconvolution Batch shapes_3d - u8s8f32 - CPU            0.282128  0.292579  0.277235
onednn: Recurrent Neural Network Training - f32 - CPU            999.384   1001.27   974.37
onednn: Recurrent Neural Network Inference - f32 - CPU           1297.60   1317.63   1424.56
onednn: Recurrent Neural Network Training - u8s8f32 - CPU        1007.475  1016.04   968.294
onednn: Convolution Batch Shapes Auto - bf16bf16bf16 - CPU       0.414999  0.408003  0.406869
onednn: Deconvolution Batch shapes_1d - bf16bf16bf16 - CPU       2.23356   2.22682   2.21262
onednn: Deconvolution Batch shapes_3d - bf16bf16bf16 - CPU       0.643195  0.640364  0.653813
onednn: Recurrent Neural Network Inference - u8s8f32 - CPU       1302.83   1323.12   1357.36
onednn: Recurrent Neural Network Training - bf16bf16bf16 - CPU   998.040   1022.78   985.498
onednn: Recurrent Neural Network Inference - bf16bf16bf16 - CPU  -         1373.66   1347.87
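Because runs a, b and c share one system description, one rough way to judge run-to-run variation is the relative spread between the fastest and slowest run of a test. The following is a minimal sketch in Python, not part of the Phoronix Test Suite; the helper name spread_percent and the choice of three sample tests are illustrative assumptions, and the numbers are copied from the summary above.

```python
# Results in ms (fewer is better), copied from the results summary.
# Only three tests are included here to keep the sketch short.
results = {
    "IP Shapes 1D - f32": {"a": 5.16295, "b": 5.56995, "c": 5.13421},
    "IP Shapes 1D - bf16bf16bf16": {"a": 3.80454, "b": 3.87903, "c": 4.61093},
    "Deconvolution Batch shapes_3d - f32": {"a": 0.946786, "b": 0.952539, "c": 0.954284},
}


def spread_percent(runs):
    """Hypothetical helper: (slowest - fastest) / fastest * 100 for one test."""
    fastest = min(runs.values())
    slowest = max(runs.values())
    return (slowest - fastest) / fastest * 100.0


def best_run(runs):
    """Return the identifier of the run with the lowest time."""
    return min(runs, key=runs.get)


if __name__ == "__main__":
    for name, runs in results.items():
        print(f"{name}: best run {best_run(runs)}, spread {spread_percent(runs):.1f}%")
```

A spread that is large relative to the reported standard error hints at noise between runs rather than a real difference, though three runs are too few to draw firm conclusions.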

oneDNN

Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU

oneDNN 3.1, ms (fewer is better)
a: 5.16295 (MIN: 3.15)
b: 5.56995 (MIN: 4.66)
c: 5.13421 (MIN: 4.52)
SE +/- 0.06811, N = 15
1. (CXX) g++ options: -O3 -march=native -pipe -fexceptions -m64 -ffat-lto-objects -fno-trapping-math -mrelax-cmpxchg-loop -std=gnu++17 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

oneDNN

Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU

oneDNN 3.1, ms (fewer is better)
a: 1.81517 (MIN: 1.4)
b: 1.75887 (MIN: 1.36)
c: 1.72273 (MIN: 1.42)
SE +/- 0.01079, N = 3

oneDNN

Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPU

oneDNN 3.1, ms (fewer is better)
a: 3.81667 (MIN: 2.1)
b: 4.12976 (MIN: 2.56)
c: 3.98273 (MIN: 2.43)
SE +/- 0.05597, N = 12

oneDNN

Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU

oneDNN 3.1, ms (fewer is better)
a: 0.753564 (MIN: 0.59)
b: 0.816992 (MIN: 0.69)
c: 0.824101 (MIN: 0.68)
SE +/- 0.008221, N = 15

oneDNN

Harness: IP Shapes 1D - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 3.1, ms (fewer is better)
a: 3.80454 (MIN: 2.37)
b: 3.87903 (MIN: 2.81)
c: 4.61093 (MIN: 2.89)
SE +/- 0.09341, N = 15

oneDNN

Harness: IP Shapes 3D - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 3.1, ms (fewer is better)
a: 1.54627 (MIN: 1.08)
b: 1.64919 (MIN: 1.33)
c: 1.66125 (MIN: 1.32)
SE +/- 0.06611, N = 12

oneDNN

Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU

oneDNN 3.1, ms (fewer is better)
a: 0.521813 (MIN: 0.41)
b: 0.547057 (MIN: 0.44)
c: 0.530342 (MIN: 0.41)
SE +/- 0.005190, N = 6

oneDNN

Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU

oneDNN 3.1, ms (fewer is better)
a: 20.84 (MIN: 17.94)
b: 20.69 (MIN: 17.76)
c: 20.68 (MIN: 18.18)
SE +/- 0.22, N = 3

oneDNN

Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU

oneDNN 3.1, ms (fewer is better)
a: 0.946786 (MIN: 0.89)
b: 0.952539 (MIN: 0.9)
c: 0.954284 (MIN: 0.9)
SE +/- 0.001343, N = 3

oneDNN

Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU

oneDNN 3.1, ms (fewer is better)
a: 0.362781 (MIN: 0.27)
b: 0.363114 (MIN: 0.28)
c: 0.360127 (MIN: 0.27)
SE +/- 0.003988, N = 4

oneDNN

Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU

oneDNN 3.1, ms (fewer is better)
a: 0.923951 (MIN: 0.72)
b: 0.946937 (MIN: 0.72)
c: 0.911944 (MIN: 0.73)
SE +/- 0.012327, N = 3

oneDNN

Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU

oneDNN 3.1, ms (fewer is better)
a: 0.282128 (MIN: 0.26)
b: 0.292579 (MIN: 0.25)
c: 0.277235 (MIN: 0.26)
SE +/- 0.003064, N = 5

oneDNN

Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU

oneDNN 3.1, ms (fewer is better)
a: 999.38 (MIN: 954.31)
b: 1001.27 (MIN: 977.96)
c: 974.37 (MIN: 948)
SE +/- 11.15, N = 3

oneDNN

Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU

oneDNN 3.1, ms (fewer is better)
a: 1297.60 (MIN: 1150.87)
b: 1317.63 (MIN: 1280.87)
c: 1424.56 (MIN: 1385.44)
SE +/- 20.17, N = 12

oneDNN

Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU

oneDNN 3.1, ms (fewer is better)
a: 1007.48 (MIN: 964.57)
b: 1016.04 (MIN: 994.64)
c: 968.29 (MIN: 943.32)
SE +/- 10.72, N = 3

oneDNN

Harness: Convolution Batch Shapes Auto - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 3.1, ms (fewer is better)
a: 0.414999 (MIN: 0.33)
b: 0.408003 (MIN: 0.34)
c: 0.406869 (MIN: 0.33)
SE +/- 0.004416, N = 3

oneDNN

Harness: Deconvolution Batch shapes_1d - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 3.1, ms (fewer is better)
a: 2.23356 (MIN: 1.85)
b: 2.22682 (MIN: 1.9)
c: 2.21262 (MIN: 1.9)
SE +/- 0.01725, N = 3

oneDNN

Harness: Deconvolution Batch shapes_3d - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 3.1, ms (fewer is better)
a: 0.643195 (MIN: 0.54)
b: 0.640364 (MIN: 0.55)
c: 0.653813 (MIN: 0.55)
SE +/- 0.003091, N = 3

oneDNN

Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU

oneDNN 3.1, ms (fewer is better)
a: 1302.83 (MIN: 1163.84)
b: 1323.12 (MIN: 1290.77)
c: 1357.36 (MIN: 1320.67)
SE +/- 12.34, N = 15

oneDNN

Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 3.1, ms (fewer is better)
a: 998.04 (MIN: 953.8)
b: 1022.78 (MIN: 998.27)
c: 985.50 (MIN: 961.27)
SE +/- 12.91, N = 3

oneDNN

Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 3.1, ms (fewer is better)
b: 1373.66 (MIN: 1318.03)
c: 1347.87 (MIN: 1311.94)


Phoronix Test Suite v10.8.4