9684x ne
Tests for a future article. 2 x AMD EPYC 9684X 96-Core testing with an AMD Titanite_4G (RTI1007B BIOS) and ASPEED on Ubuntu 23.10 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2310150-NE-9684XNE5490&grw&sro .
9684x ne - System Details
Runs: a, b
Processor: 2 x AMD EPYC 9684X 96-Core @ 2.55GHz (192 Cores / 384 Threads)
Motherboard: AMD Titanite_4G (RTI1007B BIOS)
Chipset: AMD Device 14a4
Memory: 1520GB
Disk: 3201GB Micron_7450_MTFDKCC3T2TFS
Graphics: ASPEED
Network: Broadcom NetXtreme BCM5720 PCIe
OS: Ubuntu 23.10
Kernel: 6.6.0-060600rc1-generic (x86_64)
Desktop: GNOME Shell
Display Server: X Server 1.21.1.7
Compiler: GCC 13.2.0
File-System: ext4
Screen Resolution: 1920x1200
Kernel Details - Transparent Huge Pages: madvise
Compiler Details - --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-nEN1TP/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-nEN1TP/gcc-13-13.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Details - Scaling Governor: acpi-cpufreq performance (Boost: Enabled) - CPU Microcode: 0xa10113e
Security Details:
  gather_data_sampling: Not affected
  itlb_multihit: Not affected
  l1tf: Not affected
  mds: Not affected
  meltdown: Not affected
  mmio_stale_data: Not affected
  retbleed: Not affected
  spec_rstack_overflow: Mitigation of safe RET
  spec_store_bypass: Mitigation of SSB disabled via prctl
  spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
  spectre_v2: Mitigation of Enhanced / Automatic IBRS IBPB: conditional STIBP: always-on RSB filling PBRSB-eIBRS: Not affected
  srbds: Not affected
  tsx_async_abort: Not affected
9684x ne - Results Overview
Test                                                              a           b
onednn: IP Shapes 3D - f32 - CPU                                  4.40651     4.19777
easywave: e2Asean Grid + BengkuluSept2007 Source - 2400           97.678      97.448
onednn: IP Shapes 1D - f32 - CPU                                  8.24195     6.89651
onednn: IP Shapes 1D - u8s8f32 - CPU                              11.45807    16.9496
onednn: IP Shapes 3D - u8s8f32 - CPU                              0.851887    0.818265
onednn: IP Shapes 1D - bf16bf16bf16 - CPU                         32.1485     33.9112
onednn: IP Shapes 3D - bf16bf16bf16 - CPU                         3.44728     3.55742
onednn: Convolution Batch Shapes Auto - f32 - CPU                 0.480803    0.509185
onednn: Deconvolution Batch shapes_1d - f32 - CPU                 30.3653     30.5588
onednn: Deconvolution Batch shapes_3d - f32 - CPU                 0.948793    0.969145
easywave: e2Asean Grid + BengkuluSept2007 Source - 1200           40.483      39.752
onednn: Convolution Batch Shapes Auto - u8s8f32 - CPU             0.319810    0.326161
onednn: Deconvolution Batch shapes_1d - u8s8f32 - CPU             0.777316    0.7716
onednn: Deconvolution Batch shapes_3d - u8s8f32 - CPU             0.275663    0.278455
onednn: Recurrent Neural Network Training - f32 - CPU             1862.61     1852.73
onednn: Recurrent Neural Network Inference - f32 - CPU            2131.26     2062.29
onednn: Recurrent Neural Network Training - u8s8f32 - CPU         1817.58     1798.18
onednn: Convolution Batch Shapes Auto - bf16bf16bf16 - CPU        0.372487    0.386826
onednn: Deconvolution Batch shapes_1d - bf16bf16bf16 - CPU        1.56883     1.54248
onednn: Deconvolution Batch shapes_3d - bf16bf16bf16 - CPU        0.625811    0.617057
onednn: Recurrent Neural Network Inference - u8s8f32 - CPU        2070.84     2104.98
onednn: Recurrent Neural Network Training - bf16bf16bf16 - CPU    1860.35     1831.26
onednn: Recurrent Neural Network Inference - bf16bf16bf16 - CPU   2036.88     2096.03
easywave: e2Asean Grid + BengkuluSept2007 Source - 240            3.574       3.079
embree: Pathtracer - Crown                                        192.4860    192.1728
embree: Pathtracer ISPC - Crown                                   200.4923    199.7125
embree: Pathtracer - Asian Dragon                                 212.4785    212.0222
embree: Pathtracer - Asian Dragon Obj                             188.6699    189.7142
embree: Pathtracer ISPC - Asian Dragon                            234.8068    233.9207
embree: Pathtracer ISPC - Asian Dragon Obj                        201.5342    200.7999
oidn: RT.hdr_alb_nrm.3840x2160 - CPU-Only                         3.78        3.80
oidn: RT.ldr_alb_nrm.3840x2160 - CPU-Only                         3.80        3.78
oidn: RTLightmap.hdr.4096x4096 - CPU-Only                         1.89        1.90
openvkl: vklBenchmarkCPU ISPC                                     3530        3529
openvkl: vklBenchmarkCPU Scalar                                   1494        1494
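The a and b columns mix "Fewer Is Better" results (oneDNN, easyWave) with "More Is Better" results (Embree, OIDN, OpenVKL), so raw differences are not directly comparable. The following is a minimal Python sketch, not part of the Phoronix Test Suite, that normalizes the direction so a positive percentage always means run b did better. The helper name `percent_gain` and the choice of rows are illustrative assumptions; the numbers are copied from the overview.

```python
# Illustrative helper (hypothetical, not a Phoronix Test Suite API):
# express run b relative to run a so that positive always means "b is better".

def percent_gain(a: float, b: float, fewer_is_better: bool) -> float:
    """Return how much better b is than a, in percent."""
    if fewer_is_better:
        # Lower time is better, so the speedup of b over a is a / b.
        return (a / b - 1.0) * 100.0
    # Higher throughput is better, so the speedup of b over a is b / a.
    return (b / a - 1.0) * 100.0


# (label, a, b, fewer_is_better) for a few rows of the results overview
ROWS = [
    ("onednn: IP Shapes 1D - f32 - CPU", 8.24195, 6.89651, True),
    ("onednn: IP Shapes 1D - u8s8f32 - CPU", 11.45807, 16.9496, True),
    ("easywave: e2Asean Grid + BengkuluSept2007 Source - 240", 3.574, 3.079, True),
    ("embree: Pathtracer - Crown", 192.4860, 192.1728, False),
]

if __name__ == "__main__":
    for label, a, b, fewer in ROWS:
        print(f"{label}: {percent_gain(a, b, fewer):+.2f}%")
```

The tests with the largest a/b gaps in this sketch are also the ones reported with N = 15 and a comparatively large standard error, so the gaps may reflect run-to-run variance rather than a real difference between the two runs.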
oneDNN 3.3 - Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  a: 4.40651 (MIN: 3.53); b: 4.19777 (MIN: 3.46); SE +/- 0.01148, N = 3
  1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
easyWave r34 - Input: e2Asean Grid + BengkuluSept2007 Source - Time: 2400 (Seconds, Fewer Is Better)
  a: 97.68; b: 97.45; SE +/- 3.79, N = 15
  1. (CXX) g++ options: -O3 -fopenmp
oneDNN 3.3 - Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  a: 8.24195 (MIN: 3.98); b: 6.89651 (MIN: 5.33); SE +/- 0.61719, N = 15
  1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
oneDNN 3.3 - Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  a: 11.46 (MIN: 4.44); b: 16.95 (MIN: 11.66); SE +/- 1.01, N = 15
  1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
oneDNN 3.3 - Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  a: 0.851887 (MIN: 0.7); b: 0.818265 (MIN: 0.69); SE +/- 0.007085, N = 3
  1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
oneDNN 3.3 - Harness: IP Shapes 1D - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
  a: 32.15 (MIN: 20.73); b: 33.91 (MIN: 25.57); SE +/- 0.61, N = 15
  1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
oneDNN 3.3 - Harness: IP Shapes 3D - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
  a: 3.44728 (MIN: 2.55); b: 3.55742 (MIN: 2.68); SE +/- 0.02918, N = 15
  1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
oneDNN 3.3 - Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  a: 0.480803 (MIN: 0.41); b: 0.509185 (MIN: 0.45); SE +/- 0.004181, N = 3
  1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
oneDNN 3.3 - Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  a: 30.37 (MIN: 28.2); b: 30.56 (MIN: 28.84); SE +/- 0.11, N = 3
  1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
oneDNN 3.3 - Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  a: 0.948793 (MIN: 0.88); b: 0.969145 (MIN: 0.88); SE +/- 0.003774, N = 3
  1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
easyWave r34 - Input: e2Asean Grid + BengkuluSept2007 Source - Time: 1200 (Seconds, Fewer Is Better)
  a: 40.48; b: 39.75; SE +/- 2.18, N = 12
  1. (CXX) g++ options: -O3 -fopenmp
oneDNN 3.3 - Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  a: 0.319810 (MIN: 0.24); b: 0.326161 (MIN: 0.25); SE +/- 0.002729, N = 15
  1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
oneDNN 3.3 - Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  a: 0.777316 (MIN: 0.57); b: 0.771600 (MIN: 0.57); SE +/- 0.004465, N = 3
  1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
oneDNN 3.3 - Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  a: 0.275663 (MIN: 0.26); b: 0.278455 (MIN: 0.27); SE +/- 0.001684, N = 3
  1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
oneDNN 3.3 - Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  a: 1862.61 (MIN: 1754.2); b: 1852.73 (MIN: 1829.7); SE +/- 14.79, N = 9
  1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
oneDNN 3.3 - Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  a: 2131.26 (MIN: 2090.72); b: 2062.29 (MIN: 2023.87); SE +/- 13.26, N = 3
  1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
oneDNN 3.3 - Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  a: 1817.58 (MIN: 1756.95); b: 1798.18 (MIN: 1779.28); SE +/- 24.34, N = 3
  1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
oneDNN 3.3 - Harness: Convolution Batch Shapes Auto - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
  a: 0.372487 (MIN: 0.33); b: 0.386826 (MIN: 0.35); SE +/- 0.004142, N = 4
  1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
oneDNN 3.3 - Harness: Deconvolution Batch shapes_1d - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
  a: 1.56883 (MIN: 1.26); b: 1.54248 (MIN: 1.32); SE +/- 0.01413, N = 3
  1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
oneDNN 3.3 - Harness: Deconvolution Batch shapes_3d - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
  a: 0.625811 (MIN: 0.55); b: 0.617057 (MIN: 0.55); SE +/- 0.002542, N = 3
  1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
oneDNN 3.3 - Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  a: 2070.84 (MIN: 2009.72); b: 2104.98 (MIN: 2079.45); SE +/- 25.85, N = 3
  1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
oneDNN 3.3 - Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
  a: 1860.35 (MIN: 1812.41); b: 1831.26 (MIN: 1812.15); SE +/- 17.10, N = 3
  1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
oneDNN 3.3 - Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
  a: 2036.88 (MIN: 1993.78); b: 2096.03 (MIN: 2065.34); SE +/- 13.33, N = 3
  1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
easyWave r34 - Input: e2Asean Grid + BengkuluSept2007 Source - Time: 240 (Seconds, Fewer Is Better)
  a: 3.574; b: 3.079; SE +/- 0.189, N = 15
  1. (CXX) g++ options: -O3 -fopenmp
Embree 4.3 - Binary: Pathtracer - Model: Crown (Frames Per Second, More Is Better)
  a: 192.49 (MIN: 181.49 / MAX: 212.32); b: 192.17 (MIN: 183.96 / MAX: 208.29); SE +/- 0.47, N = 3
Embree 4.3 - Binary: Pathtracer ISPC - Model: Crown (Frames Per Second, More Is Better)
  a: 200.49 (MIN: 188.57 / MAX: 219.64); b: 199.71 (MIN: 188.89 / MAX: 218.34); SE +/- 0.51, N = 3
Embree 4.3 - Binary: Pathtracer - Model: Asian Dragon (Frames Per Second, More Is Better)
  a: 212.48 (MIN: 203.46 / MAX: 225.48); b: 212.02 (MIN: 205.37 / MAX: 221.09); SE +/- 0.40, N = 3
Embree 4.3 - Binary: Pathtracer - Model: Asian Dragon Obj (Frames Per Second, More Is Better)
  a: 188.67 (MIN: 178.28 / MAX: 200.84); b: 189.71 (MIN: 183.29 / MAX: 200.72); SE +/- 1.11, N = 3
Embree 4.3 - Binary: Pathtracer ISPC - Model: Asian Dragon (Frames Per Second, More Is Better)
  a: 234.81 (MIN: 223.76 / MAX: 253.29); b: 233.92 (MIN: 224.01 / MAX: 248.26); SE +/- 0.31, N = 3
Embree 4.3 - Binary: Pathtracer ISPC - Model: Asian Dragon Obj (Frames Per Second, More Is Better)
  a: 201.53 (MIN: 189 / MAX: 214.5); b: 200.80 (MIN: 190.91 / MAX: 211.99); SE +/- 0.75, N = 3
Intel Open Image Denoise 2.1 - Run: RT.hdr_alb_nrm.3840x2160 - Device: CPU-Only (Images / Sec, More Is Better)
  a: 3.78; b: 3.80; SE +/- 0.02, N = 3
Intel Open Image Denoise 2.1 - Run: RT.ldr_alb_nrm.3840x2160 - Device: CPU-Only (Images / Sec, More Is Better)
  a: 3.80; b: 3.78; SE +/- 0.01, N = 3
Intel Open Image Denoise 2.1 - Run: RTLightmap.hdr.4096x4096 - Device: CPU-Only (Images / Sec, More Is Better)
  a: 1.89; b: 1.90; SE +/- 0.00, N = 3
OpenVKL 2.0.0 - Benchmark: vklBenchmarkCPU ISPC (Items / Sec, More Is Better)
  a: 3530 (MIN: 303 / MAX: 39315); b: 3529 (MIN: 305 / MAX: 39378); SE +/- 1.20, N = 3
OpenVKL 2.0.0 - Benchmark: vklBenchmarkCPU Scalar (Items / Sec, More Is Better)
  a: 1494 (MIN: 113 / MAX: 23868); b: 1494 (MIN: 114 / MAX: 23822); SE +/- 1.15, N = 3
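The Embree, Open Image Denoise and OpenVKL results above all use "More Is Better" units, and the a and b values are very close in each. One way to condense them into a single figure is the geometric mean of the per-test b/a ratios. This is a sketch of that summary only, and an assumption on my part rather than something the export itself reports; the names `PAIRS` and `geomean_ratio` are hypothetical, and the values are the unrounded ones from the results overview.

```python
from statistics import geometric_mean

# (a, b) pairs for the "More Is Better" tests, taken from the results overview:
# six Embree runs, three Open Image Denoise runs, two OpenVKL runs.
PAIRS = [
    (192.4860, 192.1728),
    (200.4923, 199.7125),
    (212.4785, 212.0222),
    (188.6699, 189.7142),
    (234.8068, 233.9207),
    (201.5342, 200.7999),
    (3.78, 3.80),
    (3.80, 3.78),
    (1.89, 1.90),
    (3530, 3529),
    (1494, 1494),
]


def geomean_ratio(pairs):
    """Geometric mean of b/a; above 1.0 means b was faster on balance."""
    return geometric_mean([b / a for a, b in pairs])


if __name__ == "__main__":
    print(f"geometric mean of b/a: {geomean_ratio(PAIRS):.4f}")
```

A result very close to 1.0 would indicate that the two runs are effectively indistinguishable on these rendering workloads.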
Phoronix Test Suite v10.8.5