sapphire rapids october - Tests for a future article. 2 x Intel Xeon Platinum 8490H tested with a Quanta Cloud S6Q-MB-MPS (3A10.uh BIOS) motherboard and ASPEED graphics on Ubuntu 23.10 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2310248-NE-SAPPHIRER96&grs&sro.
System details (identical for runs a and b):
  Processor: 2 x Intel Xeon Platinum 8490H @ 3.50GHz (120 Cores / 240 Threads)
  Motherboard: Quanta Cloud S6Q-MB-MPS (3A10.uh BIOS)
  Chipset: Intel Device 1bce
  Memory: 1008GB
  Disk: 3201GB Micron_7450_MTFDKCC3T2TFS
  Graphics: ASPEED
  Network: 2 x Intel X710 for 10GBASE-T
  OS: Ubuntu 23.10
  Kernel: 6.6.0-rc5-phx-patched (x86_64)
  Desktop: GNOME Shell 45.0
  Display Server: X Server 1.21.1.7
  Compiler: GCC 13.2.0
  File-System: ext4
  Screen Resolution: 1920x1200

Kernel Details:
  Transparent Huge Pages: madvise

Compiler Details:
  --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-XYspKM/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-XYspKM/gcc-13-13.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v

Processor Details:
  Scaling Governor: intel_pstate performance (EPP: performance)
  CPU Microcode: 0x2b0004b1

Security Details:
  gather_data_sampling: Not affected
  itlb_multihit: Not affected
  l1tf: Not affected
  mds: Not affected
  meltdown: Not affected
  mmio_stale_data: Not affected
  retbleed: Not affected
  spec_rstack_overflow: Not affected
  spec_store_bypass: Mitigation of SSB disabled via prctl
  spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
  spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; RSB filling; PBRSB-eIBRS: SW sequence
  srbds: Not affected
  tsx_async_abort: Not affected
Result overview (runs a / b):
  onednn: IP Shapes 1D - f32 - CPU: 2.28356 / 2.82237
  onednn: IP Shapes 1D - u8s8f32 - CPU: 1.8459 / 2.20049
  easywave: e2Asean Grid + BengkuluSept2007 Source - 240: 2.98 / 3.209
  onednn: Recurrent Neural Network Inference - f32 - CPU: 774.627 / 823.916
  easywave: e2Asean Grid + BengkuluSept2007 Source - 2400: 125.737 / 132.55
  onednn: Deconvolution Batch shapes_1d - u8s8f32 - CPU: 0.414613 / 0.398996
  onednn: Deconvolution Batch shapes_1d - bf16bf16bf16 - CPU: 0.434631 / 0.418805
  onednn: IP Shapes 3D - u8s8f32 - CPU: 0.783853 / 0.811661
  onednn: Convolution Batch Shapes Auto - bf16bf16bf16 - CPU: 0.327592 / 0.339075
  onednn: Recurrent Neural Network Training - u8s8f32 - CPU: 1073.23 / 1108.59
  onednn: Convolution Batch Shapes Auto - f32 - CPU: 0.404445 / 0.416529
  onednn: Deconvolution Batch shapes_1d - f32 - CPU: 15.8707 / 15.4782
  onednn: IP Shapes 3D - f32 - CPU: 2.48058 / 2.54078
  oidn: RT.ldr_alb_nrm.3840x2160 - CPU-Only: 4.45 / 4.37
  onednn: Recurrent Neural Network Training - bf16bf16bf16 - CPU: 1063.85 / 1081.47
  onednn: Recurrent Neural Network Training - f32 - CPU: 1099.08 / 1081.21
  onednn: IP Shapes 3D - bf16bf16bf16 - CPU: 3.29789 / 3.25571
  onednn: Deconvolution Batch shapes_3d - u8s8f32 - CPU: 0.223134 / 0.220395
  onednn: Convolution Batch Shapes Auto - u8s8f32 - CPU: 0.515277 / 0.520272
  onednn: Deconvolution Batch shapes_3d - f32 - CPU: 0.7279 / 0.720924
  oidn: RTLightmap.hdr.4096x4096 - CPU-Only: 2.10 / 2.12
  embree: Pathtracer - Crown: 109.4393 / 110.4623
  onednn: Deconvolution Batch shapes_3d - bf16bf16bf16 - CPU: 0.43385 / 0.437447
  easywave: e2Asean Grid + BengkuluSept2007 Source - 1200: 52.6 / 52.99
  oidn: RT.hdr_alb_nrm.3840x2160 - CPU-Only: 4.41 / 4.44
  onednn: Recurrent Neural Network Inference - bf16bf16bf16 - CPU: 808.375 / 813.506
  onednn: IP Shapes 1D - bf16bf16bf16 - CPU: 9.95622 / 9.89765
  embree: Pathtracer - Asian Dragon Obj: 114.1228 / 114.7157
  openvkl: vklBenchmarkCPU ISPC: 2684 / 2672
  embree: Pathtracer ISPC - Asian Dragon Obj: 131.427 / 131.8973
  embree: Pathtracer - Asian Dragon: 126.5604 / 126.9798
  onednn: Recurrent Neural Network Inference - u8s8f32 - CPU: 831.247 / 828.745
  openvkl: vklBenchmarkCPU Scalar: 1022 / 1021
  embree: Pathtracer ISPC - Crown: 123.3553 / 123.2485
  embree: Pathtracer ISPC - Asian Dragon: 151.613 / 151.6824
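The "&grs" view this result was exported from ranks runs by the geometric mean of per-test ratios. As a minimal sketch of that calculation (the four-test subset and the lower-is-better flags here are illustrative assumptions, not part of the exported result):

```python
from math import prod

# (test, run a, run b, lower_is_better) -- values copied from the overview above.
# Time-based results (ms, Seconds) are lower-is-better; throughput results
# (Frames Per Second, Images / Sec) are higher-is-better.
results = [
    ("oneDNN IP Shapes 1D f32 (ms)",     2.28356, 2.82237, True),
    ("easyWave 240 (Seconds)",           2.980,   3.209,   True),
    ("Embree Pathtracer Crown (FPS)",    109.44,  110.46,  False),
    ("OIDN RT.ldr_alb_nrm (Images/Sec)", 4.45,    4.37,    False),
]

# Normalize every test as "run b relative to run a"; > 1.0 means b was faster.
ratios = [a / b if lower_is_better else b / a
          for _, a, b, lower_is_better in results]

geomean = prod(ratios) ** (1.0 / len(ratios))
print(f"b relative to a (geometric mean of {len(ratios)} tests): {geomean:.3f}")
```

For this subset the mean comes out below 1.0, matching the overall impression from the table that run a was slightly faster; the full OpenBenchmarking calculation uses all 35 tests.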
oneDNN 3.3 - Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU (ms, fewer is better): a = 2.28356 (MIN: 2.02), b = 2.82237 (MIN: 2.52) [g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread]
oneDNN 3.3 - Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPU (ms, fewer is better): a = 1.84590 (MIN: 1.64), b = 2.20049 (MIN: 1.94) [g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread]
easyWave r34 - Input: e2Asean Grid + BengkuluSept2007 Source - Time: 240 (Seconds, fewer is better): a = 2.980, b = 3.209 [g++ options: -O3 -fopenmp]
oneDNN 3.3 - Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU (ms, fewer is better): a = 774.63 (MIN: 764.75), b = 823.92 (MIN: 808.3) [g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread]
easyWave r34 - Input: e2Asean Grid + BengkuluSept2007 Source - Time: 2400 (Seconds, fewer is better): a = 125.74, b = 132.55 [g++ options: -O3 -fopenmp]
oneDNN 3.3 - Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU (ms, fewer is better): a = 0.414613 (MIN: 0.32), b = 0.398996 (MIN: 0.31) [g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread]
oneDNN 3.3 - Harness: Deconvolution Batch shapes_1d - Data Type: bf16bf16bf16 - Engine: CPU (ms, fewer is better): a = 0.434631 (MIN: 0.34), b = 0.418805 (MIN: 0.35) [g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread]
oneDNN 3.3 - Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU (ms, fewer is better): a = 0.783853 (MIN: 0.64), b = 0.811661 (MIN: 0.65) [g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread]
oneDNN 3.3 - Harness: Convolution Batch Shapes Auto - Data Type: bf16bf16bf16 - Engine: CPU (ms, fewer is better): a = 0.327592 (MIN: 0.28), b = 0.339075 (MIN: 0.29) [g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread]
oneDNN 3.3 - Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU (ms, fewer is better): a = 1073.23 (MIN: 1056.67), b = 1108.59 (MIN: 1086.7) [g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread]
oneDNN 3.3 - Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU (ms, fewer is better): a = 0.404445 (MIN: 0.36), b = 0.416529 (MIN: 0.36) [g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread]
oneDNN 3.3 - Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU (ms, fewer is better): a = 15.87 (MIN: 13.44), b = 15.48 (MIN: 13.41) [g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread]
oneDNN 3.3 - Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU (ms, fewer is better): a = 2.48058 (MIN: 2.19), b = 2.54078 (MIN: 2.2) [g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread]
Intel Open Image Denoise 2.1 - Run: RT.ldr_alb_nrm.3840x2160 - Device: CPU-Only (Images / Sec, more is better): a = 4.45, b = 4.37
oneDNN 3.3 - Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU (ms, fewer is better): a = 1063.85 (MIN: 1047.66), b = 1081.47 (MIN: 1064.28) [g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread]
oneDNN 3.3 - Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU (ms, fewer is better): a = 1099.08 (MIN: 1082.67), b = 1081.21 (MIN: 1062.88) [g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread]
oneDNN 3.3 - Harness: IP Shapes 3D - Data Type: bf16bf16bf16 - Engine: CPU (ms, fewer is better): a = 3.29789 (MIN: 2.62), b = 3.25571 (MIN: 2.52) [g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread]
oneDNN 3.3 - Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU (ms, fewer is better): a = 0.223134 (MIN: 0.19), b = 0.220395 (MIN: 0.19) [g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread]
oneDNN 3.3 - Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU (ms, fewer is better): a = 0.515277 (MIN: 0.45), b = 0.520272 (MIN: 0.45) [g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread]
oneDNN 3.3 - Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU (ms, fewer is better): a = 0.727900 (MIN: 0.66), b = 0.720924 (MIN: 0.66) [g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread]
Intel Open Image Denoise 2.1 - Run: RTLightmap.hdr.4096x4096 - Device: CPU-Only (Images / Sec, more is better): a = 2.10, b = 2.12
Embree 4.3 - Binary: Pathtracer - Model: Crown (Frames Per Second, more is better): a = 109.44 (MIN: 99.83 / MAX: 123.71), b = 110.46 (MIN: 100.87 / MAX: 127.63)
oneDNN 3.3 - Harness: Deconvolution Batch shapes_3d - Data Type: bf16bf16bf16 - Engine: CPU (ms, fewer is better): a = 0.433850 (MIN: 0.37), b = 0.437447 (MIN: 0.37) [g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread]
easyWave r34 - Input: e2Asean Grid + BengkuluSept2007 Source - Time: 1200 (Seconds, fewer is better): a = 52.60, b = 52.99 [g++ options: -O3 -fopenmp]
Intel Open Image Denoise 2.1 - Run: RT.hdr_alb_nrm.3840x2160 - Device: CPU-Only (Images / Sec, more is better): a = 4.41, b = 4.44
oneDNN 3.3 - Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU (ms, fewer is better): a = 808.38 (MIN: 793.57), b = 813.51 (MIN: 798.39) [g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread]
oneDNN 3.3 - Harness: IP Shapes 1D - Data Type: bf16bf16bf16 - Engine: CPU (ms, fewer is better): a = 9.95622 (MIN: 4.3), b = 9.89765 (MIN: 4.62) [g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread]
Embree 4.3 - Binary: Pathtracer - Model: Asian Dragon Obj (Frames Per Second, more is better): a = 114.12 (MIN: 108.69 / MAX: 126.84), b = 114.72 (MIN: 109.31 / MAX: 125.77)
OpenVKL 2.0.0 - Benchmark: vklBenchmarkCPU ISPC (Items / Sec, more is better): a = 2684 (MIN: 187 / MAX: 30504), b = 2672 (MIN: 186 / MAX: 29904)
Embree 4.3 - Binary: Pathtracer ISPC - Model: Asian Dragon Obj (Frames Per Second, more is better): a = 131.43 (MIN: 124.86 / MAX: 145.41), b = 131.90 (MIN: 125.55 / MAX: 144.84)
Embree 4.3 - Binary: Pathtracer - Model: Asian Dragon (Frames Per Second, more is better): a = 126.56 (MIN: 120.32 / MAX: 139.72), b = 126.98 (MIN: 121.07 / MAX: 138.68)
oneDNN 3.3 - Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU (ms, fewer is better): a = 831.25 (MIN: 815.8), b = 828.75 (MIN: 810.32) [g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread]
OpenVKL 2.0.0 - Benchmark: vklBenchmarkCPU Scalar (Items / Sec, more is better): a = 1022 (MIN: 98 / MAX: 14743), b = 1021 (MIN: 98 / MAX: 14523)
Embree 4.3 - Binary: Pathtracer ISPC - Model: Crown (Frames Per Second, more is better): a = 123.36 (MIN: 114.34 / MAX: 139.51), b = 123.25 (MIN: 113.56 / MAX: 138.91)
Embree 4.3 - Binary: Pathtracer ISPC - Model: Asian Dragon (Frames Per Second, more is better): a = 151.61 (MIN: 143.13 / MAX: 166.36), b = 151.68 (MIN: 143.16 / MAX: 167.17)
Phoronix Test Suite v10.8.5