8380 sun

2 x Intel Xeon Platinum 8380 testing with an Intel M50CYP2SB2U motherboard (SE5C6200.86B.0022.D08.2103221623 BIOS) and ASPEED graphics on Ubuntu 20.04 via the Phoronix Test Suite.

HTML result view exported from: https://openbenchmarking.org/result/2204037-NE-8380SUN3435&grr&rdt.

8380 sun - System Configuration (identical for runs A, B, C, and D)

Processor: 2 x Intel Xeon Platinum 8380 @ 3.40GHz (80 Cores / 160 Threads)
Motherboard: Intel M50CYP2SB2U (SE5C6200.86B.0022.D08.2103221623 BIOS)
Chipset: Intel Device 0998
Memory: 512GB
Disk: 3841GB Micron_9300_MTFDHAL3T8TDP
Graphics: ASPEED
Monitor: VE228
Network: 2 x Intel X710 for 10GBASE-T + 2 x Intel E810-C for QSFP
OS: Ubuntu 20.04
Kernel: 5.15.11-051511-generic (x86_64)
Desktop: GNOME Shell 3.36.9
Display Server: X Server 1.20.13
Vulkan: 1.0.2
Compiler: GCC 9.3.0 + Clang 10.0.0-4ubuntu1
File-System: ext4
Screen Resolution: 1920x1080

Kernel Details: Transparent Huge Pages: madvise
Compiler Details: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-9-HskZEa/gcc-9-9.3.0/debian/tmp-nvptx/usr,hsa --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Details: Scaling Governor: intel_pstate performance (EPP: performance) - CPU Microcode: 0xd0002a0
Java Details: OpenJDK Runtime Environment (build 11.0.13+8-Ubuntu-0ubuntu1.20.04)
Security Details: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced IBRS IBPB: conditional RSB filling + srbds: Not affected + tsx_async_abort: Not affected

8380 sun - Results Overview (values listed as A / B / C / D)

java-jmh: Throughput (Ops/s, more is better) - 117974431957.74 / 117397403601.04 / 117539633814.56 / 117793385377.77
perf-bench: Futex Lock-Pi (ops/sec, more is better) - 47 / 44 / 49 / 50
onednn: Recurrent Neural Network Training - u8s8f32 - CPU (ms, fewer is better) - 618.68 / 626.98 / 618.14 / 617.79
onednn: Recurrent Neural Network Training - bf16bf16bf16 - CPU (ms, fewer is better) - 636.29 / 621.88 / 618.82 / 617.73
onednn: Recurrent Neural Network Training - f32 - CPU (ms, fewer is better) - 627.39 / 625.62 / 621.73 / 622.50
onednn: Recurrent Neural Network Inference - bf16bf16bf16 - CPU (ms, fewer is better) - 378.96 / 386.18 / 379.59 / 377.23
onednn: Recurrent Neural Network Inference - f32 - CPU (ms, fewer is better) - 379.17 / 379.59 / 379.63 / 378.23
onednn: Recurrent Neural Network Inference - u8s8f32 - CPU (ms, fewer is better) - 378.86 / 383.21 / 379.21 / 378.64
perf-bench: Sched Pipe (ops/sec, more is better) - 157484 / 159641 / 159064 / 158896
perf-bench: Epoll Wait (ops/sec, more is better) - 3190 / 3376 / 3518 / 3345
perf-bench: Futex Hash (ops/sec, more is better) - 2990938 / 2990356 / 2985843 / 2984853
perf-bench: Memcpy 1MB (GB/sec, more is better) - 16.85 / 15.93 / 16.27 / 15.95
onednn: Deconvolution Batch shapes_1d - f32 - CPU (ms, fewer is better) - 7.24738 / 7.23701 / 7.26969 / 7.29653
onednn: Deconvolution Batch shapes_1d - bf16bf16bf16 - CPU (ms, fewer is better) - 3.78575 / 3.79039 / 3.76245 / 3.77449
onednn: Deconvolution Batch shapes_1d - u8s8f32 - CPU (ms, fewer is better) - 0.372274 / 0.369100 / 0.364478 / 0.372693
perf-bench: Memset 1MB (GB/sec, more is better) - 58.85 / 53.17 / 58.45 / 59.79
onednn: IP Shapes 1D - f32 - CPU (ms, fewer is better) - 0.874369 / 0.858012 / 0.863419 / 0.855213
onednn: IP Shapes 1D - bf16bf16bf16 - CPU (ms, fewer is better) - 2.98315 / 3.00061 / 2.99363 / 2.99408
onednn: IP Shapes 1D - u8s8f32 - CPU (ms, fewer is better) - 1.30484 / 1.29761 / 1.28310 / 1.27539
onednn: Matrix Multiply Batch Shapes Transformer - bf16bf16bf16 - CPU (ms, fewer is better) - 2.08241 / 2.09074 / 2.05510 / 2.08466
onednn: Matrix Multiply Batch Shapes Transformer - f32 - CPU (ms, fewer is better) - 0.230657 / 0.234187 / 0.232623 / 0.232562
onednn: Matrix Multiply Batch Shapes Transformer - u8s8f32 - CPU (ms, fewer is better) - 0.177939 / 0.170109 / 0.172350 / 0.172182
onednn: IP Shapes 3D - f32 - CPU (ms, fewer is better) - 1.27751 / 1.28130 / 1.29049 / 1.28979
onednn: IP Shapes 3D - bf16bf16bf16 - CPU (ms, fewer is better) - 1.81335 / 1.81013 / 1.82023 / 1.82082
onednn: IP Shapes 3D - u8s8f32 - CPU (ms, fewer is better) - 0.444806 / 0.438798 / 0.443744 / 0.445376
perf-bench: Syscall Basic (ops/sec, more is better) - 13955440 / 13985008 / 13976056 / 13925517
onednn: Convolution Batch Shapes Auto - u8s8f32 - CPU (ms, fewer is better) - 1.13824 / 1.12915 / 1.14503 / 1.15270
onednn: Convolution Batch Shapes Auto - f32 - CPU (ms, fewer is better) - 1.38472 / 1.39272 / 1.38829 / 1.38183
onednn: Convolution Batch Shapes Auto - bf16bf16bf16 - CPU (ms, fewer is better) - 2.12309 / 2.11786 / 2.10492 / 2.11735
onednn: Deconvolution Batch shapes_3d - bf16bf16bf16 - CPU (ms, fewer is better) - 3.62162 / 3.61967 / 3.60012 / 3.60258
onednn: Deconvolution Batch shapes_3d - f32 - CPU (ms, fewer is better) - 0.884168 / 0.888675 / 0.888288 / 0.884572
onednn: Deconvolution Batch shapes_3d - u8s8f32 - CPU (ms, fewer is better) - 0.194293 / 0.196251 / 0.194502 / 0.191428

Java JMH

Throughput

Ops/s, More Is Better
A: 117974431957.74, B: 117397403601.04, C: 117539633814.56, D: 117793385377.77

perf-bench

Benchmark: Futex Lock-Pi

ops/sec, More Is Better
A: 47, B: 44, C: 49, D: 50
SE +/- 1.92, N = 15; SE +/- 2.63, N = 15
1. (CC) gcc options: -pthread -shared -lunwind-x86_64 -lunwind -llzma -Xlinker -export-dynamic -O6 -ggdb3 -funwind-tables -std=gnu99 -fPIC -lnuma
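
For context, "perf bench futex lock-pi" measures how many priority-inheritance futex lock/unlock round trips per second the kernel can service across contending threads. The following is a hypothetical, single-threaded (uncontended) sketch of that round trip, not the perf-bench source; the loop count and helper name are assumptions for illustration.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

#define LOOPS 1000000L

static uint32_t lock_word;   /* 0 = unlocked; the kernel stores the owner TID here */

static long futex_op(uint32_t *uaddr, int op)
{
    return syscall(SYS_futex, uaddr, op, 0, NULL, NULL, 0);
}

int main(void)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < LOOPS; i++) {
        futex_op(&lock_word, FUTEX_LOCK_PI);    /* acquire: word goes 0 -> our TID */
        futex_op(&lock_word, FUTEX_UNLOCK_PI);  /* release: word goes back to 0 */
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("%.0f lock/unlock pairs per second (single thread)\n", LOOPS / secs);
    return 0;
}
```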

oneDNN

Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better
A: 618.68 (MIN: 594.98), B: 626.98 (MIN: 598.45), C: 618.14 (MIN: 592.21), D: 617.79 (MIN: 594.21)
SE +/- 1.17, N = 3; SE +/- 0.39, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -lpthread -ldl
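
All of the oneDNN numbers in this file come from timing individual primitives (RNN, convolution, deconvolution, inner product, matmul) on the library's CPU engine via its benchmark harness. As a rough orientation only, and not the harness code itself, the setup every one of these runs relies on looks like the following oneDNN C-API sketch (the error handling and print statement are assumptions for illustration):

```c
#include <stdio.h>
#include <dnnl.h>

int main(void)
{
    /* Report which oneDNN build is in use (this result file is oneDNN 2.6). */
    const dnnl_version_t *v = dnnl_version();
    printf("oneDNN %d.%d.%d\n", v->major, v->minor, v->patch);

    /* Create the CPU engine and an execution stream; primitives such as the
     * RNN training/inference cells timed above are then created against this
     * engine and executed on this stream. */
    dnnl_engine_t engine;
    dnnl_stream_t stream;
    if (dnnl_engine_create(&engine, dnnl_cpu, 0) != dnnl_success
            || dnnl_stream_create(&stream, engine, dnnl_stream_default_flags)
                       != dnnl_success) {
        fprintf(stderr, "oneDNN CPU engine/stream creation failed\n");
        return 1;
    }

    /* ... build primitive descriptors, execute, dnnl_stream_wait(stream) ... */

    dnnl_stream_destroy(stream);
    dnnl_engine_destroy(engine);
    return 0;
}
```

Link with -ldnnl; the harness used for these results is C++, per the g++ options line above.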

oneDNN

Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better
A: 636.29 (MIN: 595.54), B: 621.88 (MIN: 595.66), C: 618.82 (MIN: 594.97), D: 617.73 (MIN: 594.74)
SE +/- 0.48, N = 3; SE +/- 0.57, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -lpthread -ldl

oneDNN

Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better
A: 627.39 (MIN: 596.88), B: 625.62 (MIN: 597.43), C: 621.73 (MIN: 594.79), D: 622.50 (MIN: 596.58)
SE +/- 1.66, N = 3; SE +/- 1.46, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -lpthread -ldl

oneDNN

Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better
A: 378.96 (MIN: 363.09), B: 386.18 (MIN: 363.76), C: 379.59 (MIN: 358.8), D: 377.23 (MIN: 359.79)
SE +/- 1.61, N = 3; SE +/- 1.09, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -lpthread -ldl

oneDNN

Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better
A: 379.17 (MIN: 363.09), B: 379.59 (MIN: 361.95), C: 379.63 (MIN: 361.75), D: 378.23 (MIN: 360.8)
SE +/- 1.29, N = 3; SE +/- 0.51, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -lpthread -ldl

oneDNN

Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better
A: 378.86 (MIN: 362.36), B: 383.21 (MIN: 366.5), C: 379.21 (MIN: 358.89), D: 378.64 (MIN: 362.01)
SE +/- 2.30, N = 3; SE +/- 0.30, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -lpthread -ldl

perf-bench

Benchmark: Sched Pipe

ops/sec, More Is Better
A: 157484, B: 159641, C: 159064, D: 158896
SE +/- 545.51, N = 3; SE +/- 451.65, N = 3
1. (CC) gcc options: -pthread -shared -lunwind-x86_64 -lunwind -llzma -Xlinker -export-dynamic -O6 -ggdb3 -funwind-tables -std=gnu99 -fPIC -lnuma
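
"perf bench sched pipe" bounces a token between two processes over a pair of pipes to stress the scheduler's context-switch path. A hypothetical, stripped-down analogue of that loop (loop count and names are assumptions, not the perf source):

```c
#include <stdio.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define LOOPS 100000

int main(void)
{
    int ab[2], ba[2];   /* parent -> child and child -> parent pipes */
    char c = 0;

    if (pipe(ab) || pipe(ba)) { perror("pipe"); return 1; }

    if (fork() == 0) {                      /* child: echo every byte back */
        for (int i = 0; i < LOOPS; i++) {
            read(ab[0], &c, 1);
            write(ba[1], &c, 1);
        }
        _exit(0);
    }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < LOOPS; i++) {       /* parent: send, wait for the echo */
        write(ab[1], &c, 1);
        read(ba[0], &c, 1);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    wait(NULL);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("%.0f ops/sec\n", LOOPS / secs);
    return 0;
}
```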

perf-bench

Benchmark: Epoll Wait

ops/sec, More Is Better
A: 3190, B: 3376, C: 3518, D: 3345
SE +/- 26.58, N = 3; SE +/- 25.11, N = 3
1. (CC) gcc options: -pthread -shared -lunwind-x86_64 -lunwind -llzma -Xlinker -export-dynamic -O6 -ggdb3 -funwind-tables -std=gnu99 -fPIC -lnuma

perf-bench

Benchmark: Futex Hash

ops/sec, More Is Better
A: 2990938, B: 2990356, C: 2985843, D: 2984853
SE +/- 2180.19, N = 3; SE +/- 4824.86, N = 3
1. (CC) gcc options: -pthread -shared -lunwind-x86_64 -lunwind -llzma -Xlinker -export-dynamic -O6 -ggdb3 -funwind-tables -std=gnu99 -fPIC -lnuma
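
"perf bench futex hash" issues FUTEX_WAIT calls whose expected value never matches the futex word, so each call only walks the kernel's futex hash table and returns immediately. A hypothetical single-threaded reduction of that idea (the real benchmark runs one thread per CPU; the futex count and loop count here are assumptions):

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

#define NFUTEX 1024
#define LOOPS  1000000L

static uint32_t futexes[NFUTEX];     /* all stay 0; we "expect" 1, so no call blocks */

int main(void)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < LOOPS; i++)
        /* Value mismatch => the kernel returns EAGAIN right away after the
         * hash-table lookup, which is exactly the path being measured. */
        syscall(SYS_futex, &futexes[i % NFUTEX], FUTEX_WAIT_PRIVATE,
                1, NULL, NULL, 0);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("%.0f ops/sec\n", LOOPS / secs);
    return 0;
}
```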

perf-bench

Benchmark: Memcpy 1MB

GB/sec, More Is Better
A: 16.85, B: 15.93, C: 16.27, D: 15.95
SE +/- 0.16, N = 3; SE +/- 0.18, N = 3
1. (CC) gcc options: -pthread -shared -lunwind-x86_64 -lunwind -llzma -Xlinker -export-dynamic -O6 -ggdb3 -funwind-tables -std=gnu99 -fPIC -lnuma
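
The Memcpy 1MB figure is sustained copy bandwidth on a 1 MB buffer. A rough, assumed equivalent of what is being timed (perf-bench additionally compares memcpy variants and can include page-fault costs, so this is only a sketch):

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define SIZE  (1 << 20)   /* 1 MB */
#define LOOPS 10000

int main(void)
{
    char *src = malloc(SIZE), *dst = malloc(SIZE);
    if (!src || !dst) return 1;
    memset(src, 1, SIZE);            /* touch the pages before timing */
    memset(dst, 2, SIZE);

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < LOOPS; i++)
        memcpy(dst, src, SIZE);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("%.2f GB/sec\n", (double)SIZE * LOOPS / secs / 1e9);

    free(src);
    free(dst);
    return 0;
}
```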

oneDNN

Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better
A: 7.24738 (MIN: 6.76), B: 7.23701 (MIN: 6.73), C: 7.26969 (MIN: 6.69), D: 7.29653 (MIN: 6.67)
SE +/- 0.00587, N = 3; SE +/- 0.00979, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -lpthread -ldl

oneDNN

Harness: Deconvolution Batch shapes_1d - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better
A: 3.78575 (MIN: 3.54), B: 3.79039 (MIN: 3.53), C: 3.76245 (MIN: 3.52), D: 3.77449 (MIN: 3.53)
SE +/- 0.00243, N = 3; SE +/- 0.00723, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -lpthread -ldl

oneDNN

Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better
A: 0.372274 (MIN: 0.33), B: 0.369100 (MIN: 0.33), C: 0.364478 (MIN: 0.33), D: 0.372693 (MIN: 0.33)
SE +/- 0.000878, N = 3; SE +/- 0.003422, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -lpthread -ldl

perf-bench

Benchmark: Memset 1MB

GB/sec, More Is Better
A: 58.85, B: 53.17, C: 58.45, D: 59.79
SE +/- 0.55, N = 15; SE +/- 1.02, N = 3
1. (CC) gcc options: -pthread -shared -lunwind-x86_64 -lunwind -llzma -Xlinker -export-dynamic -O6 -ggdb3 -funwind-tables -std=gnu99 -fPIC -lnuma

oneDNN

Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better
A: 0.874369 (MIN: 0.81), B: 0.858012 (MIN: 0.8), C: 0.863419 (MIN: 0.8), D: 0.855213 (MIN: 0.8)
SE +/- 0.001707, N = 3; SE +/- 0.002275, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -lpthread -ldl

oneDNN

Harness: IP Shapes 1D - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better
A: 2.98315 (MIN: 2.86), B: 3.00061 (MIN: 2.87), C: 2.99363 (MIN: 2.86), D: 2.99408 (MIN: 2.85)
SE +/- 0.00300, N = 3; SE +/- 0.00450, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -lpthread -ldl

oneDNN

Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better
A: 1.30484 (MIN: 0.94), B: 1.29761 (MIN: 1.02), C: 1.28310 (MIN: 0.93), D: 1.27539 (MIN: 0.89)
SE +/- 0.02105, N = 3; SE +/- 0.01852, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -lpthread -ldl

oneDNN

Harness: Matrix Multiply Batch Shapes Transformer - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better
A: 2.08241 (MIN: 1.86), B: 2.09074 (MIN: 1.85), C: 2.05510 (MIN: 1.81), D: 2.08466 (MIN: 1.8)
SE +/- 0.01164, N = 3; SE +/- 0.03558, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -lpthread -ldl

oneDNN

Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better
A: 0.230657 (MIN: 0.21), B: 0.234187 (MIN: 0.22), C: 0.232623 (MIN: 0.21), D: 0.232562 (MIN: 0.22)
SE +/- 0.001971, N = 3; SE +/- 0.000626, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -lpthread -ldl

oneDNN

Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better
A: 0.177939 (MIN: 0.16), B: 0.170109 (MIN: 0.15), C: 0.172350 (MIN: 0.16), D: 0.172182 (MIN: 0.15)
SE +/- 0.000739, N = 3; SE +/- 0.001576, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -lpthread -ldl

oneDNN

Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better
A: 1.27751 (MIN: 1.24), B: 1.28130 (MIN: 1.25), C: 1.29049 (MIN: 1.25), D: 1.28979 (MIN: 1.25)
SE +/- 0.00528, N = 3; SE +/- 0.00317, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -lpthread -ldl

oneDNN

Harness: IP Shapes 3D - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better
A: 1.81335 (MIN: 1.68), B: 1.81013 (MIN: 1.68), C: 1.82023 (MIN: 1.68), D: 1.82082 (MIN: 1.68)
SE +/- 0.00151, N = 3; SE +/- 0.00600, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -lpthread -ldl

oneDNN

Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better
A: 0.444806 (MIN: 0.41), B: 0.438798 (MIN: 0.41), C: 0.443744 (MIN: 0.41), D: 0.445376 (MIN: 0.41)
SE +/- 0.001544, N = 3; SE +/- 0.000362, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -lpthread -ldl

perf-bench

Benchmark: Syscall Basic

ops/sec, More Is Better
A: 13955440, B: 13985008, C: 13976056, D: 13925517
SE +/- 4397.07, N = 3; SE +/- 37556.12, N = 3
1. (CC) gcc options: -pthread -shared -lunwind-x86_64 -lunwind -llzma -Xlinker -export-dynamic -O6 -ggdb3 -funwind-tables -std=gnu99 -fPIC -lnuma
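
"perf bench syscall basic" times a trivial system call in a tight loop, so the numbers above are essentially the raw user-to-kernel round-trip rate. A hypothetical reduction using getppid via syscall(2) (loop count assumed; perf's own harness differs in details):

```c
#include <stdio.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

#define LOOPS 10000000L

int main(void)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < LOOPS; i++)
        syscall(SYS_getppid);        /* force a real kernel entry each iteration */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("%.0f ops/sec\n", LOOPS / secs);
    return 0;
}
```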

oneDNN

Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better
A: 1.13824 (MIN: 0.95), B: 1.12915 (MIN: 0.96), C: 1.14503 (MIN: 0.97), D: 1.15270 (MIN: 0.97)
SE +/- 0.00239, N = 3; SE +/- 0.00521, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -lpthread -ldl

oneDNN

Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better
A: 1.38472 (MIN: 1.28), B: 1.39272 (MIN: 1.24), C: 1.38829 (MIN: 1.26), D: 1.38183 (MIN: 1.26)
SE +/- 0.00571, N = 3; SE +/- 0.00232, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -lpthread -ldl

oneDNN

Harness: Convolution Batch Shapes Auto - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better
A: 2.12309 (MIN: 2.03), B: 2.11786 (MIN: 2.03), C: 2.10492 (MIN: 2.03), D: 2.11735 (MIN: 2.03)
SE +/- 0.00132, N = 3; SE +/- 0.00242, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -lpthread -ldl

oneDNN

Harness: Deconvolution Batch shapes_3d - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better
A: 3.62162 (MIN: 3.51), B: 3.61967 (MIN: 3.51), C: 3.60012 (MIN: 3.51), D: 3.60258 (MIN: 3.51)
SE +/- 0.00386, N = 3; SE +/- 0.00989, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -lpthread -ldl

oneDNN

Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better
A: 0.884168 (MIN: 0.84), B: 0.888675 (MIN: 0.84), C: 0.888288 (MIN: 0.84), D: 0.884572 (MIN: 0.84)
SE +/- 0.002781, N = 3; SE +/- 0.001577, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -lpthread -ldl

oneDNN

Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better
A: 0.194293 (MIN: 0.18), B: 0.196251 (MIN: 0.18), C: 0.194502 (MIN: 0.18), D: 0.191428 (MIN: 0.18)
SE +/- 0.001404, N = 3; SE +/- 0.000487, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -lpthread -ldl


Phoronix Test Suite v10.8.4