2 x AMD EPYC 9654 96-Core testing with an AMD Titanite_4G (RTI1004D BIOS) and ASPEED on Clear Linux OS 38660 via the Phoronix Test Suite.
Compare your own system(s) to this result file with the Phoronix Test Suite by running the command:
    phoronix-test-suite benchmark 2303310-NE-AMDGENOAO60
HTML result view exported from: https://openbenchmarking.org/result/2303310-NE-AMDGENOAO60&rdt&grr
amd-genoa-onednn-31

System (runs a, b, c):
  Processor: 2 x AMD EPYC 9654 96-Core @ 2.40GHz (192 Cores / 384 Threads)
  Motherboard: AMD Titanite_4G (RTI1004D BIOS)
  Chipset: AMD Device 14a4
  Memory: 1520GB
  Disk: 2 x 1920GB SAMSUNG MZWLJ1T9HBJR-00007
  Graphics: ASPEED
  Network: Broadcom NetXtreme BCM5720 PCIe
  OS: Clear Linux OS 38660
  Kernel: 6.2.8-1293.native (x86_64)
  Display Server: X Server
  Compiler: GCC 12.2.1 20230323 releases/gcc-12.2.0-616-g1b6b7f214c + Clang 15.0.7 + LLVM 15.0.7
  File-System: ext4
  Screen Resolution: 800x600

Kernel Details
  - Transparent Huge Pages: always

Environment Details
  - FFLAGS="-g -O3 -feliminate-unused-debug-types -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -m64 -fasynchronous-unwind-tables -Wp,-D_REENTRANT -ftree-loop-distribute-patterns -Wl,-z,now -Wl,-z,relro -malign-data=abi -fno-semantic-interposition -ftree-vectorize -ftree-loop-vectorize -Wl,--enable-new-dtags"
  - CXXFLAGS="-g -O3 -feliminate-unused-debug-types -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -Wformat -Wformat-security -m64 -fasynchronous-unwind-tables -Wp,-D_REENTRANT -ftree-loop-distribute-patterns -Wl,-z,now -Wl,-z,relro -fno-semantic-interposition -ffat-lto-objects -fno-trapping-math -Wl,-sort-common -Wl,--enable-new-dtags -mrelax-cmpxchg-loop -fvisibility-inlines-hidden -Wl,--enable-new-dtags -std=gnu++17"
  - FCFLAGS="-g -O3 -feliminate-unused-debug-types -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -m64 -fasynchronous-unwind-tables -Wp,-D_REENTRANT -ftree-loop-distribute-patterns -Wl,-z,now -Wl,-z,relro -malign-data=abi -fno-semantic-interposition -ftree-vectorize -ftree-loop-vectorize -Wl,-sort-common -Wl,--enable-new-dtags"
  - CFLAGS="-g -O3 -feliminate-unused-debug-types -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -Wformat -Wformat-security -m64 -fasynchronous-unwind-tables -Wp,-D_REENTRANT -ftree-loop-distribute-patterns -Wl,-z,now -Wl,-z,relro -fno-semantic-interposition -ffat-lto-objects -fno-trapping-math -Wl,-sort-common -Wl,--enable-new-dtags -mrelax-cmpxchg-loop"
  - THEANO_FLAGS="floatX=float32,openmp=true,gcc.cxxflags="-ftree-vectorize -mavx""

Compiler Details
  - --build=x86_64-generic-linux --disable-libmpx --disable-libunwind-exceptions --disable-multiarch --disable-vtable-verify --disable-werror --enable-__cxa_atexit --enable-bootstrap --enable-cet --enable-clocale=gnu --enable-default-pie --enable-gnu-indirect-function --enable-gnu-indirect-function --enable-host-shared --enable-languages=c,c++,fortran,go,jit --enable-ld=default --enable-libstdcxx-pch --enable-linux-futex --enable-lto --enable-multilib --enable-plugin --enable-shared --enable-threads=posix --exec-prefix=/usr --includedir=/usr/include --target=x86_64-generic-linux --with-arch=x86-64-v3 --with-gcc-major-version-only --with-glibc-version=2.35 --with-gnu-ld --with-isl --with-pic --with-ppl=yes --with-tune=sapphirerapids --with-zstd

Processor Details
  - Scaling Governor: acpi-cpufreq performance (Boost: Enabled)
  - CPU Microcode: 0xa101111

Security Details
  - itlb_multihit: Not affected
  - l1tf: Not affected
  - mds: Not affected
  - meltdown: Not affected
  - mmio_stale_data: Not affected
  - retbleed: Not affected
  - spec_store_bypass: Mitigation of SSB disabled via prctl
  - spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
  - spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: always-on RSB filling PBRSB-eIBRS: Not affected
  - srbds: Not affected
  - tsx_async_abort: Not affected
amd-genoa-onednn-31 results summary (ms, Fewer Is Better; "-" means no result recorded for that run)

Test                                                             a          b          c
onednn: Recurrent Neural Network Inference - u8s8f32 - CPU       1302.83    1323.12    1357.36
onednn: Recurrent Neural Network Inference - f32 - CPU           1297.60    1317.63    1424.56
onednn: Recurrent Neural Network Training - f32 - CPU            999.384    1001.27    974.37
onednn: Recurrent Neural Network Training - u8s8f32 - CPU        1007.475   1016.04    968.294
onednn: Recurrent Neural Network Training - bf16bf16bf16 - CPU   998.040    1022.78    985.498
onednn: IP Shapes 1D - f32 - CPU                                 5.16295    5.56995    5.13421
onednn: IP Shapes 1D - bf16bf16bf16 - CPU                        3.80454    3.87903    4.61093
onednn: Recurrent Neural Network Inference - bf16bf16bf16 - CPU  -          1373.66    1347.87
onednn: IP Shapes 1D - u8s8f32 - CPU                             3.81667    4.12976    3.98273
onednn: IP Shapes 3D - u8s8f32 - CPU                             0.753564   0.816992   0.824101
onednn: IP Shapes 3D - bf16bf16bf16 - CPU                        1.54627    1.64919    1.66125
onednn: Deconvolution Batch shapes_1d - f32 - CPU                20.8426    20.6935    20.68
onednn: Deconvolution Batch shapes_1d - bf16bf16bf16 - CPU       2.23356    2.22682    2.21262
onednn: Deconvolution Batch shapes_1d - u8s8f32 - CPU            0.923951   0.946937   0.911944
onednn: Convolution Batch Shapes Auto - f32 - CPU                0.521813   0.547057   0.530342
onednn: IP Shapes 3D - f32 - CPU                                 1.81517    1.75887    1.72273
onednn: Convolution Batch Shapes Auto - u8s8f32 - CPU            0.362781   0.363114   0.360127
onednn: Convolution Batch Shapes Auto - bf16bf16bf16 - CPU       0.414999   0.408003   0.406869
onednn: Deconvolution Batch shapes_3d - u8s8f32 - CPU            0.282128   0.292579   0.277235
onednn: Deconvolution Batch shapes_3d - bf16bf16bf16 - CPU       0.643195   0.640364   0.653813
onednn: Deconvolution Batch shapes_3d - f32 - CPU                0.946786   0.952539   0.954284
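Since runs a, b and c are listed against a single system configuration, one way to read the summary is to look at how far apart the three runs land on each test. The following is a minimal Python sketch under that assumption, not part of the Phoronix Test Suite output. The `results` dictionary holds three rows copied from the summary, and `relative_spread` is a hypothetical helper name introduced only for illustration.

```python
# Latencies in ms (fewer is better), copied from three rows of the summary.
# The selection of rows is arbitrary and only for illustration.
results = {
    "Recurrent Neural Network Inference - f32": {"a": 1297.60, "b": 1317.63, "c": 1424.56},
    "IP Shapes 1D - bf16bf16bf16": {"a": 3.80454, "b": 3.87903, "c": 4.61093},
    "Deconvolution Batch shapes_3d - f32": {"a": 0.946786, "b": 0.952539, "c": 0.954284},
}


def relative_spread(runs):
    """Return (slowest - fastest) / fastest across the runs of one test."""
    fastest = min(runs.values())
    slowest = max(runs.values())
    return (slowest - fastest) / fastest


# Print the run-to-run spread of each selected test as a percentage.
for test, runs in results.items():
    print(f"{test}: {relative_spread(runs):.1%}")
```

A spread of this kind only describes the three reported averages; the per-test SE and N values in the detailed results describe variation within a run.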
oneDNN 3.1 - Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  a: 1302.83 (MIN: 1163.84)
  b: 1323.12 (MIN: 1290.77)
  c: 1357.36 (MIN: 1320.67)
  SE +/- 12.34, N = 15
  1. (CXX) g++ options: -O3 -march=native -pipe -fexceptions -m64 -ffat-lto-objects -fno-trapping-math -mrelax-cmpxchg-loop -std=gnu++17 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
oneDNN 3.1 - Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  a: 1297.60 (MIN: 1150.87)
  b: 1317.63 (MIN: 1280.87)
  c: 1424.56 (MIN: 1385.44)
  SE +/- 20.17, N = 12
oneDNN 3.1 - Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  a: 999.38 (MIN: 954.31)
  b: 1001.27 (MIN: 977.96)
  c: 974.37 (MIN: 948)
  SE +/- 11.15, N = 3
oneDNN 3.1 - Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  a: 1007.48 (MIN: 964.57)
  b: 1016.04 (MIN: 994.64)
  c: 968.29 (MIN: 943.32)
  SE +/- 10.72, N = 3
oneDNN 3.1 - Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
  a: 998.04 (MIN: 953.8)
  b: 1022.78 (MIN: 998.27)
  c: 985.50 (MIN: 961.27)
  SE +/- 12.91, N = 3
oneDNN 3.1 - Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  a: 5.16295 (MIN: 3.15)
  b: 5.56995 (MIN: 4.66)
  c: 5.13421 (MIN: 4.52)
  SE +/- 0.06811, N = 15
oneDNN 3.1 - Harness: IP Shapes 1D - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
  a: 3.80454 (MIN: 2.37)
  b: 3.87903 (MIN: 2.81)
  c: 4.61093 (MIN: 2.89)
  SE +/- 0.09341, N = 15
oneDNN 3.1 - Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
  b: 1373.66 (MIN: 1318.03)
  c: 1347.87 (MIN: 1311.94)
oneDNN 3.1 - Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  a: 3.81667 (MIN: 2.1)
  b: 4.12976 (MIN: 2.56)
  c: 3.98273 (MIN: 2.43)
  SE +/- 0.05597, N = 12
oneDNN 3.1 - Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  a: 0.753564 (MIN: 0.59)
  b: 0.816992 (MIN: 0.69)
  c: 0.824101 (MIN: 0.68)
  SE +/- 0.008221, N = 15
oneDNN 3.1 - Harness: IP Shapes 3D - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
  a: 1.54627 (MIN: 1.08)
  b: 1.64919 (MIN: 1.33)
  c: 1.66125 (MIN: 1.32)
  SE +/- 0.06611, N = 12
oneDNN 3.1 - Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  a: 20.84 (MIN: 17.94)
  b: 20.69 (MIN: 17.76)
  c: 20.68 (MIN: 18.18)
  SE +/- 0.22, N = 3
oneDNN 3.1 - Harness: Deconvolution Batch shapes_1d - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
  a: 2.23356 (MIN: 1.85)
  b: 2.22682 (MIN: 1.9)
  c: 2.21262 (MIN: 1.9)
  SE +/- 0.01725, N = 3
oneDNN 3.1 - Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  a: 0.923951 (MIN: 0.72)
  b: 0.946937 (MIN: 0.72)
  c: 0.911944 (MIN: 0.73)
  SE +/- 0.012327, N = 3
oneDNN 3.1 - Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  a: 0.521813 (MIN: 0.41)
  b: 0.547057 (MIN: 0.44)
  c: 0.530342 (MIN: 0.41)
  SE +/- 0.005190, N = 6
oneDNN 3.1 - Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  a: 1.81517 (MIN: 1.4)
  b: 1.75887 (MIN: 1.36)
  c: 1.72273 (MIN: 1.42)
  SE +/- 0.01079, N = 3
oneDNN 3.1 - Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  a: 0.362781 (MIN: 0.27)
  b: 0.363114 (MIN: 0.28)
  c: 0.360127 (MIN: 0.27)
  SE +/- 0.003988, N = 4
oneDNN 3.1 - Harness: Convolution Batch Shapes Auto - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
  a: 0.414999 (MIN: 0.33)
  b: 0.408003 (MIN: 0.34)
  c: 0.406869 (MIN: 0.33)
  SE +/- 0.004416, N = 3
oneDNN 3.1 - Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  a: 0.282128 (MIN: 0.26)
  b: 0.292579 (MIN: 0.25)
  c: 0.277235 (MIN: 0.26)
  SE +/- 0.003064, N = 5
oneDNN 3.1 - Harness: Deconvolution Batch shapes_3d - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
  a: 0.643195 (MIN: 0.54)
  b: 0.640364 (MIN: 0.55)
  c: 0.653813 (MIN: 0.55)
  SE +/- 0.003091, N = 3
oneDNN 3.1 - Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  a: 0.946786 (MIN: 0.89)
  b: 0.952539 (MIN: 0.9)
  c: 0.954284 (MIN: 0.9)
  SE +/- 0.001343, N = 3
Phoronix Test Suite v10.8.4