9684x ne - Tests for a future article. 2 x AMD EPYC 9684X 96-Core testing with an AMD Titanite_4G (RTI1007B BIOS) motherboard and ASPEED graphics on Ubuntu 23.10 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2310150-NE-9684XNE5490&sro&grr.
System Details (identical for runs a and b):
- Processor: 2 x AMD EPYC 9684X 96-Core @ 2.55GHz (192 Cores / 384 Threads)
- Motherboard: AMD Titanite_4G (RTI1007B BIOS)
- Chipset: AMD Device 14a4
- Memory: 1520GB
- Disk: 3201GB Micron_7450_MTFDKCC3T2TFS
- Graphics: ASPEED
- Network: Broadcom NetXtreme BCM5720 PCIe
- OS: Ubuntu 23.10
- Kernel: 6.6.0-060600rc1-generic (x86_64)
- Desktop: GNOME Shell
- Display Server: X Server 1.21.1.7
- Compiler: GCC 13.2.0
- File-System: ext4
- Screen Resolution: 1920x1200

Kernel Details: Transparent Huge Pages: madvise

Compiler Details: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-nEN1TP/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-nEN1TP/gcc-13-13.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v

Processor Details: Scaling Governor: acpi-cpufreq performance (Boost: Enabled); CPU Microcode: 0xa10113e

Security Details:
- gather_data_sampling: Not affected
- itlb_multihit: Not affected
- l1tf: Not affected
- mds: Not affected
- meltdown: Not affected
- mmio_stale_data: Not affected
- retbleed: Not affected
- spec_rstack_overflow: Mitigation of safe RET
- spec_store_bypass: Mitigation of SSB disabled via prctl
- spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
- spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; STIBP: always-on; RSB filling; PBRSB-eIBRS: Not affected
- srbds: Not affected
- tsx_async_abort: Not affected
Result Overview (runs a and b):
- openvkl - vklBenchmarkCPU Scalar: a 1494, b 1494
- openvkl - vklBenchmarkCPU ISPC: a 3530, b 3529
- easywave - e2Asean Grid + BengkuluSept2007 Source - 2400: a 97.678, b 97.448
- onednn - Recurrent Neural Network Training - f32 - CPU: a 1862.61, b 1852.73
- easywave - e2Asean Grid + BengkuluSept2007 Source - 1200: a 40.483, b 39.752
- onednn - Recurrent Neural Network Training - bf16bf16bf16 - CPU: a 1860.35, b 1831.26
- onednn - Recurrent Neural Network Training - u8s8f32 - CPU: a 1817.58, b 1798.18
- onednn - Recurrent Neural Network Inference - f32 - CPU: a 2131.26, b 2062.29
- onednn - Recurrent Neural Network Inference - u8s8f32 - CPU: a 2070.84, b 2104.98
- onednn - Recurrent Neural Network Inference - bf16bf16bf16 - CPU: a 2036.88, b 2096.03
- onednn - IP Shapes 1D - bf16bf16bf16 - CPU: a 32.1485, b 33.9112
- onednn - IP Shapes 1D - f32 - CPU: a 8.24195, b 6.89651
- onednn - IP Shapes 1D - u8s8f32 - CPU: a 11.45807, b 16.9496
- onednn - IP Shapes 3D - bf16bf16bf16 - CPU: a 3.44728, b 3.55742
- onednn - Convolution Batch Shapes Auto - u8s8f32 - CPU: a 0.319810, b 0.326161
- onednn - Deconvolution Batch shapes_1d - f32 - CPU: a 30.3653, b 30.5588
- onednn - Deconvolution Batch shapes_1d - bf16bf16bf16 - CPU: a 1.56883, b 1.54248
- onednn - Deconvolution Batch shapes_1d - u8s8f32 - CPU: a 0.777316, b 0.7716
- easywave - e2Asean Grid + BengkuluSept2007 Source - 240: a 3.574, b 3.079
- oidn - RTLightmap.hdr.4096x4096 - CPU-Only: a 1.89, b 1.90
- embree - Pathtracer - Asian Dragon Obj: a 188.6699, b 189.7142
- embree - Pathtracer ISPC - Asian Dragon Obj: a 201.5342, b 200.7999
- onednn - IP Shapes 3D - f32 - CPU: a 4.40651, b 4.19777
- onednn - IP Shapes 3D - u8s8f32 - CPU: a 0.851887, b 0.818265
- oidn - RT.hdr_alb_nrm.3840x2160 - CPU-Only: a 3.78, b 3.80
- oidn - RT.ldr_alb_nrm.3840x2160 - CPU-Only: a 3.80, b 3.78
- onednn - Convolution Batch Shapes Auto - bf16bf16bf16 - CPU: a 0.372487, b 0.386826
- onednn - Convolution Batch Shapes Auto - f32 - CPU: a 0.480803, b 0.509185
- embree - Pathtracer - Crown: a 192.4860, b 192.1728
- embree - Pathtracer - Asian Dragon: a 212.4785, b 212.0222
- embree - Pathtracer ISPC - Crown: a 200.4923, b 199.7125
- embree - Pathtracer ISPC - Asian Dragon: a 234.8068, b 233.9207
- onednn - Deconvolution Batch shapes_3d - u8s8f32 - CPU: a 0.275663, b 0.278455
- onednn - Deconvolution Batch shapes_3d - f32 - CPU: a 0.948793, b 0.969145
- onednn - Deconvolution Batch shapes_3d - bf16bf16bf16 - CPU: a 0.625811, b 0.617057
Detailed Results

Compiler notes: all oneDNN 3.3 tests report 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread; all easyWave r34 tests report 1. (CXX) g++ options: -O3 -fopenmp.

OpenVKL 2.0.0 - Benchmark: vklBenchmarkCPU Scalar (Items / Sec, more is better)
SE +/- 1.15, N = 3; a: 1494 (MIN: 113 / MAX: 23868), b: 1494 (MIN: 114 / MAX: 23822)

OpenVKL 2.0.0 - Benchmark: vklBenchmarkCPU ISPC (Items / Sec, more is better)
SE +/- 1.20, N = 3; a: 3530 (MIN: 303 / MAX: 39315), b: 3529 (MIN: 305 / MAX: 39378)

easyWave r34 - Input: e2Asean Grid + BengkuluSept2007 Source - Time: 2400 (Seconds, fewer is better)
SE +/- 3.79, N = 15; a: 97.68, b: 97.45

oneDNN 3.3 - Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU (ms, fewer is better)
SE +/- 14.79, N = 9; a: 1862.61 (MIN: 1754.2), b: 1852.73 (MIN: 1829.7)

easyWave r34 - Input: e2Asean Grid + BengkuluSept2007 Source - Time: 1200 (Seconds, fewer is better)
SE +/- 2.18, N = 12; a: 40.48, b: 39.75

oneDNN 3.3 - Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU (ms, fewer is better)
SE +/- 17.10, N = 3; a: 1860.35 (MIN: 1812.41), b: 1831.26 (MIN: 1812.15)

oneDNN 3.3 - Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU (ms, fewer is better)
SE +/- 24.34, N = 3; a: 1817.58 (MIN: 1756.95), b: 1798.18 (MIN: 1779.28)

oneDNN 3.3 - Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU (ms, fewer is better)
SE +/- 13.26, N = 3; a: 2131.26 (MIN: 2090.72), b: 2062.29 (MIN: 2023.87)

oneDNN 3.3 - Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU (ms, fewer is better)
SE +/- 25.85, N = 3; a: 2070.84 (MIN: 2009.72), b: 2104.98 (MIN: 2079.45)

oneDNN 3.3 - Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU (ms, fewer is better)
SE +/- 13.33, N = 3; a: 2036.88 (MIN: 1993.78), b: 2096.03 (MIN: 2065.34)

oneDNN 3.3 - Harness: IP Shapes 1D - Data Type: bf16bf16bf16 - Engine: CPU (ms, fewer is better)
SE +/- 0.61, N = 15; a: 32.15 (MIN: 20.73), b: 33.91 (MIN: 25.57)

oneDNN 3.3 - Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU (ms, fewer is better)
SE +/- 0.61719, N = 15; a: 8.24195 (MIN: 3.98), b: 6.89651 (MIN: 5.33)

oneDNN 3.3 - Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPU (ms, fewer is better)
SE +/- 1.01, N = 15; a: 11.46 (MIN: 4.44), b: 16.95 (MIN: 11.66)

oneDNN 3.3 - Harness: IP Shapes 3D - Data Type: bf16bf16bf16 - Engine: CPU (ms, fewer is better)
SE +/- 0.02918, N = 15; a: 3.44728 (MIN: 2.55), b: 3.55742 (MIN: 2.68)

oneDNN 3.3 - Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU (ms, fewer is better)
SE +/- 0.002729, N = 15; a: 0.319810 (MIN: 0.24), b: 0.326161 (MIN: 0.25)

oneDNN 3.3 - Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU (ms, fewer is better)
SE +/- 0.11, N = 3; a: 30.37 (MIN: 28.2), b: 30.56 (MIN: 28.84)

oneDNN 3.3 - Harness: Deconvolution Batch shapes_1d - Data Type: bf16bf16bf16 - Engine: CPU (ms, fewer is better)
SE +/- 0.01413, N = 3; a: 1.56883 (MIN: 1.26), b: 1.54248 (MIN: 1.32)

oneDNN 3.3 - Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU (ms, fewer is better)
SE +/- 0.004465, N = 3; a: 0.777316 (MIN: 0.57), b: 0.771600 (MIN: 0.57)

easyWave r34 - Input: e2Asean Grid + BengkuluSept2007 Source - Time: 240 (Seconds, fewer is better)
SE +/- 0.189, N = 15; a: 3.574, b: 3.079

Intel Open Image Denoise 2.1 - Run: RTLightmap.hdr.4096x4096 - Device: CPU-Only (Images / Sec, more is better)
SE +/- 0.00, N = 3; a: 1.89, b: 1.90

Embree 4.3 - Binary: Pathtracer - Model: Asian Dragon Obj (Frames Per Second, more is better)
SE +/- 1.11, N = 3; a: 188.67 (MIN: 178.28 / MAX: 200.84), b: 189.71 (MIN: 183.29 / MAX: 200.72)

Embree 4.3 - Binary: Pathtracer ISPC - Model: Asian Dragon Obj (Frames Per Second, more is better)
SE +/- 0.75, N = 3; a: 201.53 (MIN: 189 / MAX: 214.5), b: 200.80 (MIN: 190.91 / MAX: 211.99)

oneDNN 3.3 - Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU (ms, fewer is better)
SE +/- 0.01148, N = 3; a: 4.40651 (MIN: 3.53), b: 4.19777 (MIN: 3.46)

oneDNN 3.3 - Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU (ms, fewer is better)
SE +/- 0.007085, N = 3; a: 0.851887 (MIN: 0.7), b: 0.818265 (MIN: 0.69)

Intel Open Image Denoise 2.1 - Run: RT.hdr_alb_nrm.3840x2160 - Device: CPU-Only (Images / Sec, more is better)
SE +/- 0.02, N = 3; a: 3.78, b: 3.80

Intel Open Image Denoise 2.1 - Run: RT.ldr_alb_nrm.3840x2160 - Device: CPU-Only (Images / Sec, more is better)
SE +/- 0.01, N = 3; a: 3.80, b: 3.78

oneDNN 3.3 - Harness: Convolution Batch Shapes Auto - Data Type: bf16bf16bf16 - Engine: CPU (ms, fewer is better)
SE +/- 0.004142, N = 4; a: 0.372487 (MIN: 0.33), b: 0.386826 (MIN: 0.35)

oneDNN 3.3 - Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU (ms, fewer is better)
SE +/- 0.004181, N = 3; a: 0.480803 (MIN: 0.41), b: 0.509185 (MIN: 0.45)

Embree 4.3 - Binary: Pathtracer - Model: Crown (Frames Per Second, more is better)
SE +/- 0.47, N = 3; a: 192.49 (MIN: 181.49 / MAX: 212.32), b: 192.17 (MIN: 183.96 / MAX: 208.29)

Embree 4.3 - Binary: Pathtracer - Model: Asian Dragon (Frames Per Second, more is better)
SE +/- 0.40, N = 3; a: 212.48 (MIN: 203.46 / MAX: 225.48), b: 212.02 (MIN: 205.37 / MAX: 221.09)

Embree 4.3 - Binary: Pathtracer ISPC - Model: Crown (Frames Per Second, more is better)
SE +/- 0.51, N = 3; a: 200.49 (MIN: 188.57 / MAX: 219.64), b: 199.71 (MIN: 188.89 / MAX: 218.34)

Embree 4.3 - Binary: Pathtracer ISPC - Model: Asian Dragon (Frames Per Second, more is better)
SE +/- 0.31, N = 3; a: 234.81 (MIN: 223.76 / MAX: 253.29), b: 233.92 (MIN: 224.01 / MAX: 248.26)

oneDNN 3.3 - Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU (ms, fewer is better)
SE +/- 0.001684, N = 3; a: 0.275663 (MIN: 0.26), b: 0.278455 (MIN: 0.27)

oneDNN 3.3 - Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU (ms, fewer is better)
SE +/- 0.003774, N = 3; a: 0.948793 (MIN: 0.88), b: 0.969145 (MIN: 0.88)

oneDNN 3.3 - Harness: Deconvolution Batch shapes_3d - Data Type: bf16bf16bf16 - Engine: CPU (ms, fewer is better)
SE +/- 0.002542, N = 3; a: 0.625811 (MIN: 0.55), b: 0.617057 (MIN: 0.55)
Phoronix Test Suite v10.8.5