2 x AMD EPYC 9654 96-Core testing with an AMD Titanite_4G (RTI1004D BIOS) and ASPEED on Clear Linux OS 38660 via the Phoronix Test Suite.
Compare your own system(s) to this result file with the
Phoronix Test Suite by running the command:
phoronix-test-suite benchmark 2303310-NE-AMDGENOAO60
HTML result view exported from: https://openbenchmarking.org/result/2303310-NE-AMDGENOAO60&rdt&rro .
amd-genoa-onednn-31
Runs: a, b, c
Processor: 2 x AMD EPYC 9654 96-Core @ 2.40GHz (192 Cores / 384 Threads)
Motherboard: AMD Titanite_4G (RTI1004D BIOS)
Chipset: AMD Device 14a4
Memory: 1520GB
Disk: 2 x 1920GB SAMSUNG MZWLJ1T9HBJR-00007
Graphics: ASPEED
Network: Broadcom NetXtreme BCM5720 PCIe
OS: Clear Linux OS 38660
Kernel: 6.2.8-1293.native (x86_64)
Display Server: X Server
Compiler: GCC 12.2.1 20230323 releases/gcc-12.2.0-616-g1b6b7f214c + Clang 15.0.7 + LLVM 15.0.7
File-System: ext4
Screen Resolution: 800x600

Kernel Details
- Transparent Huge Pages: always

Environment Details
- FFLAGS="-g -O3 -feliminate-unused-debug-types -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -m64 -fasynchronous-unwind-tables -Wp,-D_REENTRANT -ftree-loop-distribute-patterns -Wl,-z,now -Wl,-z,relro -malign-data=abi -fno-semantic-interposition -ftree-vectorize -ftree-loop-vectorize -Wl,--enable-new-dtags"
- CXXFLAGS="-g -O3 -feliminate-unused-debug-types -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -Wformat -Wformat-security -m64 -fasynchronous-unwind-tables -Wp,-D_REENTRANT -ftree-loop-distribute-patterns -Wl,-z,now -Wl,-z,relro -fno-semantic-interposition -ffat-lto-objects -fno-trapping-math -Wl,-sort-common -Wl,--enable-new-dtags -mrelax-cmpxchg-loop -fvisibility-inlines-hidden -Wl,--enable-new-dtags -std=gnu++17"
- FCFLAGS="-g -O3 -feliminate-unused-debug-types -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -m64 -fasynchronous-unwind-tables -Wp,-D_REENTRANT -ftree-loop-distribute-patterns -Wl,-z,now -Wl,-z,relro -malign-data=abi -fno-semantic-interposition -ftree-vectorize -ftree-loop-vectorize -Wl,-sort-common -Wl,--enable-new-dtags"
- CFLAGS="-g -O3 -feliminate-unused-debug-types -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -Wformat -Wformat-security -m64 -fasynchronous-unwind-tables -Wp,-D_REENTRANT -ftree-loop-distribute-patterns -Wl,-z,now -Wl,-z,relro -fno-semantic-interposition -ffat-lto-objects -fno-trapping-math -Wl,-sort-common -Wl,--enable-new-dtags -mrelax-cmpxchg-loop"
- THEANO_FLAGS="floatX=float32,openmp=true,gcc.cxxflags="-ftree-vectorize -mavx""

Compiler Details
- --build=x86_64-generic-linux --disable-libmpx --disable-libunwind-exceptions --disable-multiarch --disable-vtable-verify --disable-werror --enable-__cxa_atexit --enable-bootstrap --enable-cet --enable-clocale=gnu --enable-default-pie --enable-gnu-indirect-function --enable-gnu-indirect-function --enable-host-shared --enable-languages=c,c++,fortran,go,jit --enable-ld=default --enable-libstdcxx-pch --enable-linux-futex --enable-lto --enable-multilib --enable-plugin --enable-shared --enable-threads=posix --exec-prefix=/usr --includedir=/usr/include --target=x86_64-generic-linux --with-arch=x86-64-v3 --with-gcc-major-version-only --with-glibc-version=2.35 --with-gnu-ld --with-isl --with-pic --with-ppl=yes --with-tune=sapphirerapids --with-zstd

Processor Details
- Scaling Governor: acpi-cpufreq performance (Boost: Enabled)
- CPU Microcode: 0xa101111

Security Details
- itlb_multihit: Not affected
- l1tf: Not affected
- mds: Not affected
- meltdown: Not affected
- mmio_stale_data: Not affected
- retbleed: Not affected
- spec_store_bypass: Mitigation of SSB disabled via prctl
- spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
- spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: always-on RSB filling PBRSB-eIBRS: Not affected
- srbds: Not affected
- tsx_async_abort: Not affected
amd-genoa-onednn-31
Test (ms, fewer is better)                                       a         b         c
onednn: IP Shapes 1D - f32 - CPU                                 5.16295   5.56995   5.13421
onednn: IP Shapes 3D - f32 - CPU                                 1.81517   1.75887   1.72273
onednn: IP Shapes 1D - u8s8f32 - CPU                             3.81667   4.12976   3.98273
onednn: IP Shapes 3D - u8s8f32 - CPU                             0.753564  0.816992  0.824101
onednn: IP Shapes 1D - bf16bf16bf16 - CPU                        3.80454   3.87903   4.61093
onednn: IP Shapes 3D - bf16bf16bf16 - CPU                        1.54627   1.64919   1.66125
onednn: Convolution Batch Shapes Auto - f32 - CPU                0.521813  0.547057  0.530342
onednn: Deconvolution Batch shapes_1d - f32 - CPU                20.8426   20.6935   20.68
onednn: Deconvolution Batch shapes_3d - f32 - CPU                0.946786  0.952539  0.954284
onednn: Convolution Batch Shapes Auto - u8s8f32 - CPU            0.362781  0.363114  0.360127
onednn: Deconvolution Batch shapes_1d - u8s8f32 - CPU            0.923951  0.946937  0.911944
onednn: Deconvolution Batch shapes_3d - u8s8f32 - CPU            0.282128  0.292579  0.277235
onednn: Recurrent Neural Network Training - f32 - CPU            999.384   1001.27   974.37
onednn: Recurrent Neural Network Inference - f32 - CPU           1297.60   1317.63   1424.56
onednn: Recurrent Neural Network Training - u8s8f32 - CPU        1007.475  1016.04   968.294
onednn: Convolution Batch Shapes Auto - bf16bf16bf16 - CPU       0.414999  0.408003  0.406869
onednn: Deconvolution Batch shapes_1d - bf16bf16bf16 - CPU       2.23356   2.22682   2.21262
onednn: Deconvolution Batch shapes_3d - bf16bf16bf16 - CPU       0.643195  0.640364  0.653813
onednn: Recurrent Neural Network Inference - u8s8f32 - CPU       1302.83   1323.12   1357.36
onednn: Recurrent Neural Network Training - bf16bf16bf16 - CPU   998.040   1022.78   985.498
onednn: Recurrent Neural Network Inference - bf16bf16bf16 - CPU  -         1373.66   1347.87
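The overview figures above are mean times in milliseconds, so comparing runs comes down to a signed percent difference against a chosen baseline. The following is a minimal sketch of that arithmetic, assuming run "a" as the baseline and using three rows copied from the overview; the names `RESULTS`, `percent_delta` and `deltas_vs` are hypothetical helpers for illustration and are not part of the Phoronix Test Suite.

```python
# Mean times in ms (fewer is better), copied from the overview figures above.
# Only three rows are included here to keep the sketch short.
RESULTS = {
    "IP Shapes 1D - f32": {"a": 5.16295, "b": 5.56995, "c": 5.13421},
    "IP Shapes 1D - bf16bf16bf16": {"a": 3.80454, "b": 3.87903, "c": 4.61093},
    "Recurrent Neural Network Inference - f32": {"a": 1297.60, "b": 1317.63, "c": 1424.56},
}


def percent_delta(value, baseline):
    """Signed percent difference of value relative to baseline."""
    return (value - baseline) / baseline * 100.0


def deltas_vs(results, baseline_run="a"):
    """Map each test to the percent delta of every other run against baseline_run."""
    return {
        test: {
            run: percent_delta(ms, runs[baseline_run])
            for run, ms in runs.items()
            if run != baseline_run
        }
        for test, runs in results.items()
    }


if __name__ == "__main__":
    for test, deltas in deltas_vs(RESULTS).items():
        summary = ", ".join(f"{run}: {pct:+.2f}%" for run, pct in sorted(deltas.items()))
        print(f"{test}: {summary}")
```

Because the metric is time, a positive delta means the run was slower than the baseline and a negative delta means it was faster. Small deltas should be read against the SE values reported with the per-test results before drawing conclusions.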
oneDNN 3.1 - Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU
    ms, Fewer Is Better
    c: 5.13421 (MIN: 4.52)
    b: 5.56995 (MIN: 4.66)
    a: 5.16295 (MIN: 3.15)
    SE +/- 0.06811, N = 15
    1. (CXX) g++ options: -O3 -march=native -pipe -fexceptions -m64 -ffat-lto-objects -fno-trapping-math -mrelax-cmpxchg-loop -std=gnu++17 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
oneDNN 3.1 - Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU
    ms, Fewer Is Better
    c: 1.72273 (MIN: 1.42)
    b: 1.75887 (MIN: 1.36)
    a: 1.81517 (MIN: 1.4)
    SE +/- 0.01079, N = 3
oneDNN 3.1 - Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPU
    ms, Fewer Is Better
    c: 3.98273 (MIN: 2.43)
    b: 4.12976 (MIN: 2.56)
    a: 3.81667 (MIN: 2.1)
    SE +/- 0.05597, N = 12
oneDNN 3.1 - Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU
    ms, Fewer Is Better
    c: 0.824101 (MIN: 0.68)
    b: 0.816992 (MIN: 0.69)
    a: 0.753564 (MIN: 0.59)
    SE +/- 0.008221, N = 15
oneDNN 3.1 - Harness: IP Shapes 1D - Data Type: bf16bf16bf16 - Engine: CPU
    ms, Fewer Is Better
    c: 4.61093 (MIN: 2.89)
    b: 3.87903 (MIN: 2.81)
    a: 3.80454 (MIN: 2.37)
    SE +/- 0.09341, N = 15
oneDNN 3.1 - Harness: IP Shapes 3D - Data Type: bf16bf16bf16 - Engine: CPU
    ms, Fewer Is Better
    c: 1.66125 (MIN: 1.32)
    b: 1.64919 (MIN: 1.33)
    a: 1.54627 (MIN: 1.08)
    SE +/- 0.06611, N = 12
oneDNN 3.1 - Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU
    ms, Fewer Is Better
    c: 0.530342 (MIN: 0.41)
    b: 0.547057 (MIN: 0.44)
    a: 0.521813 (MIN: 0.41)
    SE +/- 0.005190, N = 6
oneDNN 3.1 - Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU
    ms, Fewer Is Better
    c: 20.68 (MIN: 18.18)
    b: 20.69 (MIN: 17.76)
    a: 20.84 (MIN: 17.94)
    SE +/- 0.22, N = 3
oneDNN 3.1 - Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU
    ms, Fewer Is Better
    c: 0.954284 (MIN: 0.9)
    b: 0.952539 (MIN: 0.9)
    a: 0.946786 (MIN: 0.89)
    SE +/- 0.001343, N = 3
oneDNN 3.1 - Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU
    ms, Fewer Is Better
    c: 0.360127 (MIN: 0.27)
    b: 0.363114 (MIN: 0.28)
    a: 0.362781 (MIN: 0.27)
    SE +/- 0.003988, N = 4
oneDNN 3.1 - Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU
    ms, Fewer Is Better
    c: 0.911944 (MIN: 0.73)
    b: 0.946937 (MIN: 0.72)
    a: 0.923951 (MIN: 0.72)
    SE +/- 0.012327, N = 3
oneDNN 3.1 - Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU
    ms, Fewer Is Better
    c: 0.277235 (MIN: 0.26)
    b: 0.292579 (MIN: 0.25)
    a: 0.282128 (MIN: 0.26)
    SE +/- 0.003064, N = 5
oneDNN 3.1 - Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU
    ms, Fewer Is Better
    c: 974.37 (MIN: 948)
    b: 1001.27 (MIN: 977.96)
    a: 999.38 (MIN: 954.31)
    SE +/- 11.15, N = 3
oneDNN 3.1 - Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU
    ms, Fewer Is Better
    c: 1424.56 (MIN: 1385.44)
    b: 1317.63 (MIN: 1280.87)
    a: 1297.60 (MIN: 1150.87)
    SE +/- 20.17, N = 12
oneDNN 3.1 - Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU
    ms, Fewer Is Better
    c: 968.29 (MIN: 943.32)
    b: 1016.04 (MIN: 994.64)
    a: 1007.48 (MIN: 964.57)
    SE +/- 10.72, N = 3
oneDNN 3.1 - Harness: Convolution Batch Shapes Auto - Data Type: bf16bf16bf16 - Engine: CPU
    ms, Fewer Is Better
    c: 0.406869 (MIN: 0.33)
    b: 0.408003 (MIN: 0.34)
    a: 0.414999 (MIN: 0.33)
    SE +/- 0.004416, N = 3
oneDNN 3.1 - Harness: Deconvolution Batch shapes_1d - Data Type: bf16bf16bf16 - Engine: CPU
    ms, Fewer Is Better
    c: 2.21262 (MIN: 1.9)
    b: 2.22682 (MIN: 1.9)
    a: 2.23356 (MIN: 1.85)
    SE +/- 0.01725, N = 3
oneDNN 3.1 - Harness: Deconvolution Batch shapes_3d - Data Type: bf16bf16bf16 - Engine: CPU
    ms, Fewer Is Better
    c: 0.653813 (MIN: 0.55)
    b: 0.640364 (MIN: 0.55)
    a: 0.643195 (MIN: 0.54)
    SE +/- 0.003091, N = 3
oneDNN 3.1 - Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU
    ms, Fewer Is Better
    c: 1357.36 (MIN: 1320.67)
    b: 1323.12 (MIN: 1290.77)
    a: 1302.83 (MIN: 1163.84)
    SE +/- 12.34, N = 15
oneDNN 3.1 - Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU
    ms, Fewer Is Better
    c: 985.50 (MIN: 961.27)
    b: 1022.78 (MIN: 998.27)
    a: 998.04 (MIN: 953.8)
    SE +/- 12.91, N = 3
oneDNN 3.1 - Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU
    ms, Fewer Is Better
    c: 1347.87 (MIN: 1311.94)
    b: 1373.66 (MIN: 1318.03)
Phoronix Test Suite v10.8.4