8380 sun

2 x Intel Xeon Platinum 8380 testing with an Intel M50CYP2SB2U (SE5C6200.86B.0022.D08.2103221623 BIOS) and ASPEED graphics on Ubuntu 20.04 via the Phoronix Test Suite.

HTML result view exported from: https://openbenchmarking.org/result/2204037-NE-8380SUN3435.

Results A, B, C, and D were all obtained on the same "8380 sun" configuration:

Processor:         2 x Intel Xeon Platinum 8380 @ 3.40GHz (80 Cores / 160 Threads)
Motherboard:       Intel M50CYP2SB2U (SE5C6200.86B.0022.D08.2103221623 BIOS)
Chipset:           Intel Device 0998
Memory:            512GB
Disk:              3841GB Micron_9300_MTFDHAL3T8TDP
Graphics:          ASPEED
Monitor:           VE228
Network:           2 x Intel X710 for 10GBASE-T + 2 x Intel E810-C for QSFP
OS:                Ubuntu 20.04
Kernel:            5.15.11-051511-generic (x86_64)
Desktop:           GNOME Shell 3.36.9
Display Server:    X Server 1.20.13
Vulkan:            1.0.2
Compiler:          GCC 9.3.0 + Clang 10.0.0-4ubuntu1
File-System:       ext4
Screen Resolution: 1920x1080

Kernel Details:    Transparent Huge Pages: madvise
Compiler Details:  --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-9-HskZEa/gcc-9-9.3.0/debian/tmp-nvptx/usr,hsa --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Details: Scaling Governor: intel_pstate performance (EPP: performance) - CPU Microcode: 0xd0002a0
Java Details:      OpenJDK Runtime Environment (build 11.0.13+8-Ubuntu-0ubuntu1.20.04)
Security Details:  itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced IBRS IBPB: conditional RSB filling + srbds: Not affected + tsx_async_abort: Not affected

Result overview (runs A / B / C / D):

perf-bench: Epoll Wait (ops/sec): A = 3190, B = 3376, C = 3518, D = 3345
perf-bench: Futex Hash (ops/sec): A = 2990938, B = 2990356, C = 2985843, D = 2984853
perf-bench: Memcpy 1MB (GB/sec): A = 16.846117, B = 15.934345, C = 16.267801, D = 15.954044
perf-bench: Memset 1MB (GB/sec): A = 58.850905, B = 53.171926, C = 58.448906, D = 59.788673
perf-bench: Sched Pipe (ops/sec): A = 157484, B = 159641, C = 159064, D = 158896
perf-bench: Futex Lock-Pi (ops/sec): A = 47, B = 44, C = 49, D = 50
perf-bench: Syscall Basic (ops/sec): A = 13955440, B = 13985008, C = 13976056, D = 13925517
onednn: IP Shapes 1D - f32 - CPU (ms): A = 0.874369, B = 0.858012, C = 0.863419, D = 0.855213
onednn: IP Shapes 3D - f32 - CPU (ms): A = 1.27751, B = 1.28130, C = 1.29049, D = 1.28979
onednn: IP Shapes 1D - u8s8f32 - CPU (ms): A = 1.30484, B = 1.29761, C = 1.28310, D = 1.27539
onednn: IP Shapes 3D - u8s8f32 - CPU (ms): A = 0.444806, B = 0.438798, C = 0.443744, D = 0.445376
onednn: IP Shapes 1D - bf16bf16bf16 - CPU (ms): A = 2.98315, B = 3.00061, C = 2.99363, D = 2.99408
onednn: IP Shapes 3D - bf16bf16bf16 - CPU (ms): A = 1.81335, B = 1.81013, C = 1.82023, D = 1.82082
onednn: Convolution Batch Shapes Auto - f32 - CPU (ms): A = 1.38472, B = 1.39272, C = 1.38829, D = 1.38183
onednn: Deconvolution Batch shapes_1d - f32 - CPU (ms): A = 7.24738, B = 7.23701, C = 7.26969, D = 7.29653
onednn: Deconvolution Batch shapes_3d - f32 - CPU (ms): A = 0.884168, B = 0.888675, C = 0.888288, D = 0.884572
onednn: Convolution Batch Shapes Auto - u8s8f32 - CPU (ms): A = 1.13824, B = 1.12915, C = 1.14503, D = 1.15270
onednn: Deconvolution Batch shapes_1d - u8s8f32 - CPU (ms): A = 0.372274, B = 0.369100, C = 0.364478, D = 0.372693
onednn: Deconvolution Batch shapes_3d - u8s8f32 - CPU (ms): A = 0.194293, B = 0.196251, C = 0.194502, D = 0.191428
onednn: Recurrent Neural Network Training - f32 - CPU (ms): A = 627.389, B = 625.616, C = 621.734, D = 622.500
onednn: Recurrent Neural Network Inference - f32 - CPU (ms): A = 379.171, B = 379.589, C = 379.634, D = 378.233
onednn: Recurrent Neural Network Training - u8s8f32 - CPU (ms): A = 618.676, B = 626.982, C = 618.143, D = 617.792
onednn: Convolution Batch Shapes Auto - bf16bf16bf16 - CPU (ms): A = 2.12309, B = 2.11786, C = 2.10492, D = 2.11735
onednn: Deconvolution Batch shapes_1d - bf16bf16bf16 - CPU (ms): A = 3.78575, B = 3.79039, C = 3.76245, D = 3.77449
onednn: Deconvolution Batch shapes_3d - bf16bf16bf16 - CPU (ms): A = 3.62162, B = 3.61967, C = 3.60012, D = 3.60258
onednn: Recurrent Neural Network Inference - u8s8f32 - CPU (ms): A = 378.855, B = 383.208, C = 379.206, D = 378.641
onednn: Matrix Multiply Batch Shapes Transformer - f32 - CPU (ms): A = 0.230657, B = 0.234187, C = 0.232623, D = 0.232562
onednn: Recurrent Neural Network Training - bf16bf16bf16 - CPU (ms): A = 636.287, B = 621.88, C = 618.823, D = 617.729
onednn: Recurrent Neural Network Inference - bf16bf16bf16 - CPU (ms): A = 378.959, B = 386.179, C = 379.585, D = 377.234
onednn: Matrix Multiply Batch Shapes Transformer - u8s8f32 - CPU (ms): A = 0.177939, B = 0.170109, C = 0.172350, D = 0.172182
onednn: Matrix Multiply Batch Shapes Transformer - bf16bf16bf16 - CPU (ms): A = 2.08241, B = 2.09074, C = 2.05510, D = 2.08466
java-jmh: Throughput (Ops/s): A = 117974431957.74, B = 117397403601.04, C = 117539633814.56, D = 117793385377.77

perf-bench

Benchmark: Epoll Wait

ops/sec, More Is Better
A: 3190   B: 3376   C: 3518   D: 3345
SE +/- 26.58, N = 3; SE +/- 25.11, N = 3
1. (CC) gcc options: -pthread -shared -lunwind-x86_64 -lunwind -llzma -Xlinker -export-dynamic -O6 -ggdb3 -funwind-tables -std=gnu99 -fPIC -lnuma
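
For readers unfamiliar with the workload, the Epoll Wait benchmark stresses the kernel's epoll_wait() wakeup path. The sketch below only illustrates that syscall pattern (one epoll instance watching an eventfd, with a wake/wait round trip); it is not perf's own benchmark code, and the iteration count is arbitrary.

    /* Minimal sketch of the epoll pattern this benchmark stresses.
     * Illustrative only; not taken from perf bench. */
    #include <sys/epoll.h>
    #include <sys/eventfd.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int efd = eventfd(0, 0);            /* fd that gets signalled          */
        int ep  = epoll_create1(0);         /* epoll instance under test       */
        struct epoll_event ev = { .events = EPOLLIN, .data.fd = efd };
        epoll_ctl(ep, EPOLL_CTL_ADD, efd, &ev);

        uint64_t one = 1, val;
        for (int i = 0; i < 100000; i++) {
            write(efd, &one, sizeof(one));  /* wake ...                        */
            struct epoll_event out;
            epoll_wait(ep, &out, 1, -1);    /* ... and the measured wait       */
            read(efd, &val, sizeof(val));   /* drain so the fd re-arms         */
        }
        printf("done\n");
        close(ep);
        close(efd);
        return 0;
    }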

perf-bench

Benchmark: Futex Hash

ops/sec, More Is Better
A: 2990938   B: 2990356   C: 2985843   D: 2984853
SE +/- 2180.19, N = 3; SE +/- 4824.86, N = 3
1. (CC) gcc options: -pthread -shared -lunwind-x86_64 -lunwind -llzma -Xlinker -export-dynamic -O6 -ggdb3 -funwind-tables -std=gnu99 -fPIC -lnuma
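
The Futex Hash benchmark hammers the kernel's futex hash-table lookup path. As a hedged illustration of the raw futex(2) call involved (glibc provides no wrapper, so the sketch goes through syscall()), the loop below issues FUTEX_WAIT operations that return immediately because the expected value never matches the futex word; the futex() helper name and loop count are illustrative, not taken from perf.

    /* Minimal sketch of a raw futex(2) call. Illustrative only. */
    #define _GNU_SOURCE
    #include <linux/futex.h>
    #include <sys/syscall.h>
    #include <unistd.h>
    #include <stdint.h>
    #include <stdio.h>

    static long futex(uint32_t *uaddr, int op, uint32_t val)
    {
        /* glibc has no futex() wrapper, so issue the raw syscall */
        return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
    }

    int main(void)
    {
        uint32_t word = 0;
        long ops = 0;
        for (int i = 0; i < 1000000; i++) {
            /* word != 1, so this never blocks (errno EAGAIN); it still
             * takes the in-kernel futex lookup path. */
            futex(&word, FUTEX_WAIT_PRIVATE, 1);
            ops++;
        }
        printf("%ld futex ops issued\n", ops);
        return 0;
    }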

perf-bench

Benchmark: Memcpy 1MB

GB/sec, More Is Better
A: 16.85   B: 15.93   C: 16.27   D: 15.95
SE +/- 0.16, N = 3; SE +/- 0.18, N = 3
1. (CC) gcc options: -pthread -shared -lunwind-x86_64 -lunwind -llzma -Xlinker -export-dynamic -O6 -ggdb3 -funwind-tables -std=gnu99 -fPIC -lnuma

perf-bench

Benchmark: Memset 1MB

GB/sec, More Is Better
A: 58.85   B: 53.17   C: 58.45   D: 59.79
SE +/- 0.55, N = 15; SE +/- 1.02, N = 3
1. (CC) gcc options: -pthread -shared -lunwind-x86_64 -lunwind -llzma -Xlinker -export-dynamic -O6 -ggdb3 -funwind-tables -std=gnu99 -fPIC -lnuma
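
The Memcpy 1MB and Memset 1MB results above are bandwidth figures. A minimal sketch of how such a GB/sec value can be derived, assuming a simple timed loop over a 1 MB buffer (perf bench's own handling of buffer placement, page faulting and routine selection is more involved, and the LEN/ITERS constants here are arbitrary):

    /* Sketch of the GB/sec calculation: bytes moved / elapsed seconds. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>

    #define LEN   (1 << 20)   /* 1 MB, matching the "1MB" test size */
    #define ITERS 10000

    int main(void)
    {
        char *src = malloc(LEN), *dst = malloc(LEN);
        memset(src, 1, LEN);                 /* touch pages up front */
        memset(dst, 0, LEN);

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < ITERS; i++)
            memcpy(dst, src, LEN);           /* the operation being timed */
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double secs  = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        double bytes = (double)LEN * ITERS;
        printf("memcpy: %.2f GB/sec\n", bytes / secs / 1e9);

        free(src);
        free(dst);
        return 0;
    }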

perf-bench

Benchmark: Sched Pipe

ops/sec, More Is Better
A: 157484   B: 159641   C: 159064   D: 158896
SE +/- 545.51, N = 3; SE +/- 451.65, N = 3
1. (CC) gcc options: -pthread -shared -lunwind-x86_64 -lunwind -llzma -Xlinker -export-dynamic -O6 -ggdb3 -funwind-tables -std=gnu99 -fPIC -lnuma
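
Sched Pipe is the classic scheduler ping-pong: two processes bounce a token across a pair of pipes, so every iteration forces two context switches. A minimal sketch of that mechanism (loop count and variable names are illustrative, not perf's implementation):

    /* Two processes bouncing one byte over two pipes. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/wait.h>

    #define LOOPS 100000

    int main(void)
    {
        int ping[2], pong[2];
        char token = 'x';

        if (pipe(ping) || pipe(pong)) {
            perror("pipe");
            return 1;
        }

        if (fork() == 0) {                  /* child: echo the token back */
            for (int i = 0; i < LOOPS; i++) {
                read(ping[0], &token, 1);
                write(pong[1], &token, 1);
            }
            _exit(0);
        }

        for (int i = 0; i < LOOPS; i++) {   /* parent: send and wait */
            write(ping[1], &token, 1);
            read(pong[0], &token, 1);
        }
        wait(NULL);
        printf("%d round trips completed\n", LOOPS);
        return 0;
    }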

perf-bench

Benchmark: Futex Lock-Pi

ops/sec, More Is Better
A: 47   B: 44   C: 49   D: 50
SE +/- 1.92, N = 15; SE +/- 2.63, N = 15
1. (CC) gcc options: -pthread -shared -lunwind-x86_64 -lunwind -llzma -Xlinker -export-dynamic -O6 -ggdb3 -funwind-tables -std=gnu99 -fPIC -lnuma
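
Futex Lock-Pi exercises the kernel's priority-inheritance futex path (FUTEX_LOCK_PI). The usual way to reach that path from portable code is a pthread mutex configured with the PTHREAD_PRIO_INHERIT protocol, sketched below; the thread and iteration counts are arbitrary and this is not the benchmark's implementation. Build with gcc -pthread.

    /* Contended priority-inheritance mutex sketch. Illustrative only. */
    #define _GNU_SOURCE
    #include <pthread.h>
    #include <stdio.h>

    static pthread_mutex_t lock;
    static long counter;

    static void *worker(void *arg)
    {
        (void)arg;
        for (int i = 0; i < 100000; i++) {
            pthread_mutex_lock(&lock);      /* contention goes through the PI futex path */
            counter++;
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_mutexattr_t attr;
        pthread_mutexattr_init(&attr);
        /* Request priority inheritance so the kernel PI futex code is used */
        pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
        pthread_mutex_init(&lock, &attr);

        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, NULL);
        pthread_create(&t2, NULL, worker, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);

        printf("counter = %ld\n", counter);
        return 0;
    }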

perf-bench

Benchmark: Syscall Basic

ops/sec, More Is Better
A: 13955440   B: 13985008   C: 13976056   D: 13925517
SE +/- 4397.07, N = 3; SE +/- 37556.12, N = 3
1. (CC) gcc options: -pthread -shared -lunwind-x86_64 -lunwind -llzma -Xlinker -export-dynamic -O6 -ggdb3 -funwind-tables -std=gnu99 -fPIC -lnuma
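
Syscall Basic measures how quickly a trivial system call can be issued back to back. A minimal sketch of that measurement, assuming getppid() as the cheap syscall (a common choice, though not necessarily the exact call perf uses) and an arbitrary loop count:

    /* Time a tight loop of cheap system calls and report calls/sec. */
    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    #define LOOPS 10000000

    int main(void)
    {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (long i = 0; i < LOOPS; i++)
            getppid();                      /* one cheap kernel entry per iteration */
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        printf("%.0f syscalls/sec\n", LOOPS / secs);
        return 0;
    }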

oneDNN

Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better
A: 0.874369 (MIN: 0.81)   B: 0.858012 (MIN: 0.8)   C: 0.863419 (MIN: 0.8)   D: 0.855213 (MIN: 0.8)
SE +/- 0.001707, N = 3; SE +/- 0.002275, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -lpthread -ldl

oneDNN

Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better
A: 1.27751 (MIN: 1.24)   B: 1.28130 (MIN: 1.25)   C: 1.29049 (MIN: 1.25)   D: 1.28979 (MIN: 1.25)
SE +/- 0.00528, N = 3; SE +/- 0.00317, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -lpthread -ldl

oneDNN

Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better
A: 1.30484 (MIN: 0.94)   B: 1.29761 (MIN: 1.02)   C: 1.28310 (MIN: 0.93)   D: 1.27539 (MIN: 0.89)
SE +/- 0.02105, N = 3; SE +/- 0.01852, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -lpthread -ldl

oneDNN

Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better
A: 0.444806 (MIN: 0.41)   B: 0.438798 (MIN: 0.41)   C: 0.443744 (MIN: 0.41)   D: 0.445376 (MIN: 0.41)
SE +/- 0.001544, N = 3; SE +/- 0.000362, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -lpthread -ldl

oneDNN

Harness: IP Shapes 1D - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better
A: 2.98315 (MIN: 2.86)   B: 3.00061 (MIN: 2.87)   C: 2.99363 (MIN: 2.86)   D: 2.99408 (MIN: 2.85)
SE +/- 0.00300, N = 3; SE +/- 0.00450, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -lpthread -ldl

oneDNN

Harness: IP Shapes 3D - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better
A: 1.81335 (MIN: 1.68)   B: 1.81013 (MIN: 1.68)   C: 1.82023 (MIN: 1.68)   D: 1.82082 (MIN: 1.68)
SE +/- 0.00151, N = 3; SE +/- 0.00600, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -lpthread -ldl

oneDNN

Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better
A: 1.38472 (MIN: 1.28)   B: 1.39272 (MIN: 1.24)   C: 1.38829 (MIN: 1.26)   D: 1.38183 (MIN: 1.26)
SE +/- 0.00571, N = 3; SE +/- 0.00232, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -lpthread -ldl

oneDNN

Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better
A: 7.24738 (MIN: 6.76)   B: 7.23701 (MIN: 6.73)   C: 7.26969 (MIN: 6.69)   D: 7.29653 (MIN: 6.67)
SE +/- 0.00587, N = 3; SE +/- 0.00979, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -lpthread -ldl

oneDNN

Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better
A: 0.884168 (MIN: 0.84)   B: 0.888675 (MIN: 0.84)   C: 0.888288 (MIN: 0.84)   D: 0.884572 (MIN: 0.84)
SE +/- 0.002781, N = 3; SE +/- 0.001577, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -lpthread -ldl

oneDNN

Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better
A: 1.13824 (MIN: 0.95)   B: 1.12915 (MIN: 0.96)   C: 1.14503 (MIN: 0.97)   D: 1.15270 (MIN: 0.97)
SE +/- 0.00239, N = 3; SE +/- 0.00521, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -lpthread -ldl

oneDNN

Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better
A: 0.372274 (MIN: 0.33)   B: 0.369100 (MIN: 0.33)   C: 0.364478 (MIN: 0.33)   D: 0.372693 (MIN: 0.33)
SE +/- 0.000878, N = 3; SE +/- 0.003422, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -lpthread -ldl

oneDNN

Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better
A: 0.194293 (MIN: 0.18)   B: 0.196251 (MIN: 0.18)   C: 0.194502 (MIN: 0.18)   D: 0.191428 (MIN: 0.18)
SE +/- 0.001404, N = 3; SE +/- 0.000487, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -lpthread -ldl

oneDNN

Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better
A: 627.39 (MIN: 596.88)   B: 625.62 (MIN: 597.43)   C: 621.73 (MIN: 594.79)   D: 622.50 (MIN: 596.58)
SE +/- 1.66, N = 3; SE +/- 1.46, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -lpthread -ldl

oneDNN

Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better
A: 379.17 (MIN: 363.09)   B: 379.59 (MIN: 361.95)   C: 379.63 (MIN: 361.75)   D: 378.23 (MIN: 360.8)
SE +/- 1.29, N = 3; SE +/- 0.51, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -lpthread -ldl

oneDNN

Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better
A: 618.68 (MIN: 594.98)   B: 626.98 (MIN: 598.45)   C: 618.14 (MIN: 592.21)   D: 617.79 (MIN: 594.21)
SE +/- 1.17, N = 3; SE +/- 0.39, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -lpthread -ldl

oneDNN

Harness: Convolution Batch Shapes Auto - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better
A: 2.12309 (MIN: 2.03)   B: 2.11786 (MIN: 2.03)   C: 2.10492 (MIN: 2.03)   D: 2.11735 (MIN: 2.03)
SE +/- 0.00132, N = 3; SE +/- 0.00242, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -lpthread -ldl

oneDNN

Harness: Deconvolution Batch shapes_1d - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better
A: 3.78575 (MIN: 3.54)   B: 3.79039 (MIN: 3.53)   C: 3.76245 (MIN: 3.52)   D: 3.77449 (MIN: 3.53)
SE +/- 0.00243, N = 3; SE +/- 0.00723, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -lpthread -ldl

oneDNN

Harness: Deconvolution Batch shapes_3d - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better
A: 3.62162 (MIN: 3.51)   B: 3.61967 (MIN: 3.51)   C: 3.60012 (MIN: 3.51)   D: 3.60258 (MIN: 3.51)
SE +/- 0.00386, N = 3; SE +/- 0.00989, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -lpthread -ldl

oneDNN

Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better
A: 378.86 (MIN: 362.36)   B: 383.21 (MIN: 366.5)   C: 379.21 (MIN: 358.89)   D: 378.64 (MIN: 362.01)
SE +/- 2.30, N = 3; SE +/- 0.30, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -lpthread -ldl

oneDNN

Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better
A: 0.230657 (MIN: 0.21)   B: 0.234187 (MIN: 0.22)   C: 0.232623 (MIN: 0.21)   D: 0.232562 (MIN: 0.22)
SE +/- 0.001971, N = 3; SE +/- 0.000626, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -lpthread -ldl

oneDNN

Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better
A: 636.29 (MIN: 595.54)   B: 621.88 (MIN: 595.66)   C: 618.82 (MIN: 594.97)   D: 617.73 (MIN: 594.74)
SE +/- 0.48, N = 3; SE +/- 0.57, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -lpthread -ldl

oneDNN

Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better
A: 378.96 (MIN: 363.09)   B: 386.18 (MIN: 363.76)   C: 379.59 (MIN: 358.8)   D: 377.23 (MIN: 359.79)
SE +/- 1.61, N = 3; SE +/- 1.09, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -lpthread -ldl

oneDNN

Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better
A: 0.177939 (MIN: 0.16)   B: 0.170109 (MIN: 0.15)   C: 0.172350 (MIN: 0.16)   D: 0.172182 (MIN: 0.15)
SE +/- 0.000739, N = 3; SE +/- 0.001576, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -lpthread -ldl

oneDNN

Harness: Matrix Multiply Batch Shapes Transformer - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better
A: 2.08241 (MIN: 1.86)   B: 2.09074 (MIN: 1.85)   C: 2.05510 (MIN: 1.81)   D: 2.08466 (MIN: 1.8)
SE +/- 0.01164, N = 3; SE +/- 0.03558, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -lpthread -ldl

Java JMH

Throughput

Ops/s, More Is Better
A: 117974431957.74   B: 117397403601.04   C: 117539633814.56   D: 117793385377.77


Phoronix Test Suite v10.8.4