3970X april

AMD Ryzen Threadripper 3970X 32-Core testing with an ASUS ROG ZENITH II EXTREME (1502 BIOS) and AMD Radeon RX 5700 8GB on Ubuntu 21.10 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2204022-NE-3970XAPRI60&sor&gru .
3970X april system configuration (identical for runs A, B and C):
  Processor: AMD Ryzen Threadripper 3970X 32-Core @ 3.70GHz (32 Cores / 64 Threads)
  Motherboard: ASUS ROG ZENITH II EXTREME (1502 BIOS)
  Chipset: AMD Starship/Matisse
  Memory: 64GB
  Disk: Samsung SSD 980 PRO 500GB
  Graphics: AMD Radeon RX 5700 8GB (1750/875MHz)
  Audio: AMD Navi 10 HDMI Audio
  Monitor: ASUS VP28U
  Network: Aquantia AQC107 NBase-T/IEEE + Intel I211 + Intel Wi-Fi 6 AX200
  OS: Ubuntu 21.10
  Kernel: 5.15.5-051505-generic (x86_64)
  Desktop: GNOME Shell 40.5
  Display Server: X Server + Wayland
  OpenGL: 4.6 Mesa 21.2.2 (LLVM 12.0.1)
  Vulkan: 1.2.182
  Compiler: GCC 11.2.0
  File-System: ext4
  Screen Resolution: 3840x2160

Kernel Details: Transparent Huge Pages: madvise
Compiler Details: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-11-ZPT0kp/gcc-11-11.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-ZPT0kp/gcc-11-11.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Details: Scaling Governor: acpi-cpufreq schedutil (Boost: Enabled); CPU Microcode: 0x8301039
Java Details: OpenJDK Runtime Environment (build 11.0.14.1+1-Ubuntu-0ubuntu1.21.10)
Security Details:
  itlb_multihit: Not affected
  l1tf: Not affected
  mds: Not affected
  meltdown: Not affected
  spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp
  spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
  spectre_v2: Mitigation of Full AMD retpoline IBPB: conditional STIBP: conditional RSB filling
  srbds: Not affected
  tsx_async_abort: Not affected
3970X april result summary (runs A, B, C):
  perf-bench: Memcpy 1MB (GB/sec, more is better): A 14.650909 | B 10.658625 | C 9.500946
  perf-bench: Memset 1MB (GB/sec, more is better): A 74.070612 | B 80.795318 | C 73.181666
  java-jmh: Throughput (Ops/s, more is better): A 63748499441.069 | B 63696734759.63 | C 63765483746.799
  perf-bench: Epoll Wait (ops/sec, more is better): A 13411 | B 13003 | C 12947
  perf-bench: Futex Hash (ops/sec, more is better): A 4555052 | B 4567109 | C 4574779
  perf-bench: Sched Pipe (ops/sec, more is better): A 240248 | B 377962 | C 326836
  perf-bench: Futex Lock-Pi (ops/sec, more is better): A 189 | B 187 | C 196
  perf-bench: Syscall Basic (ops/sec, more is better): A 18727173 | B 18705119 | C 18699792
  All oneDNN results below are in ms, fewer is better:
  onednn: IP Shapes 1D - f32 - CPU: A 1.34126 | B 1.25925 | C 1.26163
  onednn: IP Shapes 3D - f32 - CPU: A 4.23007 | B 5.13321 | C 5.28602
  onednn: IP Shapes 1D - u8s8f32 - CPU: A 1.25689 | B 1.17561 | C 1.29343
  onednn: IP Shapes 3D - u8s8f32 - CPU: A 0.846616 | B 0.891938 | C 0.910685
  onednn: Convolution Batch Shapes Auto - f32 - CPU: A 5.37336 | B 5.70658 | C 5.89321
  onednn: Deconvolution Batch shapes_1d - f32 - CPU: A 4.04769 | B 3.7746 | C 3.86946
  onednn: Deconvolution Batch shapes_3d - f32 - CPU: A 2.69958 | B 2.66137 | C 2.66465
  onednn: Convolution Batch Shapes Auto - u8s8f32 - CPU: A 5.93698 | B 6.43645 | C 6.50507
  onednn: Deconvolution Batch shapes_1d - u8s8f32 - CPU: A 1.44982 | B 1.44523 | C 1.44296
  onednn: Deconvolution Batch shapes_3d - u8s8f32 - CPU: A 1.53131 | B 1.54297 | C 1.54292
  onednn: Recurrent Neural Network Training - f32 - CPU: A 4142.38 | B 4136.25 | C 4115.6
  onednn: Recurrent Neural Network Inference - f32 - CPU: A 1169.78 | B 1256.25 | C 1252.01
  onednn: Recurrent Neural Network Training - u8s8f32 - CPU: A 4143.75 | B 4112.62 | C 4116.99
  onednn: Recurrent Neural Network Inference - u8s8f32 - CPU: A 1165.53 | B 1256.61 | C 1251.49
  onednn: Matrix Multiply Batch Shapes Transformer - f32 - CPU: A 4.16374 | B 4.65854 | C 3.31743
  onednn: Recurrent Neural Network Training - bf16bf16bf16 - CPU: A 4152.24 | B 4036.93 | C 4119.52
  onednn: Recurrent Neural Network Inference - bf16bf16bf16 - CPU: A 1250.46 | B 1273.28 | C 1210.42
  onednn: Matrix Multiply Batch Shapes Transformer - u8s8f32 - CPU: A 10.2364 | B 10.6426 | C 11.2493
  onednn: Matrix Multiply Batch Shapes Transformer - bf16bf16bf16 - CPU: no result reported
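The summary mixes more-is-better metrics (GB/sec, ops/sec) with fewer-is-better ones (oneDNN times in ms), so picking a winner per test needs each chart's direction. The minimal Python sketch below shows one way to do that; the `RESULTS` dictionary and the helper names are illustrative, and only four tests are included, with values copied from the summary. It also computes a best-to-worst spread, which does not depend on direction and is a quick way to spot tests whose runs disagree.

```python
from typing import Dict, List, Tuple

RUNS = ("A", "B", "C")

# Illustrative subset of the summary (values for runs A, B, C in that order).
# "higher" marks more-is-better metrics; "lower" marks fewer-is-better (ms).
RESULTS: Dict[str, Tuple[str, List[float]]] = {
    "perf-bench: Memcpy 1MB": ("higher", [14.650909, 10.658625, 9.500946]),
    "perf-bench: Sched Pipe": ("higher", [240248, 377962, 326836]),
    "perf-bench: Syscall Basic": ("higher", [18727173, 18705119, 18699792]),
    "onednn: MatMul Transformer f32": ("lower", [4.16374, 4.65854, 3.31743]),
}


def best_run(direction: str, values: List[float]) -> str:
    """Return the label of the winning run for the given direction."""
    pick = max if direction == "higher" else min
    return RUNS[values.index(pick(values))]


def spread_pct(values: List[float]) -> float:
    """Best-to-worst spread in percent; the same for either direction."""
    return (max(values) / min(values) - 1.0) * 100.0


for name, (direction, values) in RESULTS.items():
    print(f"{name}: best={best_run(direction, values)} "
          f"spread={spread_pct(values):.1f}%")
```

With these values, Memcpy 1MB, Sched Pipe and the f32 Transformer matmul all show spreads above 40% between runs, while Syscall Basic stays within about 0.15%.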

perf-bench, Benchmark: Memcpy 1MB (GB/sec, more is better)
  A: 14.650909
  B: 10.658625
  C: 9.500946
  Note: (CC) gcc options: -pthread -shared -lunwind-x86_64 -lunwind -llzma -Xlinker -O6 -ggdb3 -funwind-tables -std=gnu99 -lnuma
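perf bench mem memcpy measures how fast a fixed-size buffer (1MB in this run) can be copied repeatedly and reports the result as throughput. The Python sketch below is only an analogy of that measurement pattern, not perf's implementation: it times repeated same-length slice assignment between two `bytearray` buffers (a bulk byte copy inside CPython) and converts to GB/s using 1e9 bytes per GB, which is an assumption and may not match perf's unit convention. Interpreter overhead means its numbers are not comparable to the chart; the point is the structure (touch the buffers first, copy in a loop, divide bytes moved by elapsed time).

```python
import time


def copy_bandwidth_gbs(size: int = 1 << 20, loops: int = 200) -> float:
    """Time `loops` copies of a `size`-byte buffer and return GB/s (1e9 bytes)."""
    # Non-zero fill writes every page of the source, so reads are not served
    # from a shared zero page.
    src = bytearray(b"\x5a") * size
    dst = bytearray(size)
    dst[:] = src  # warm-up copy: faults in the destination pages before timing
    start = time.perf_counter()
    for _ in range(loops):
        dst[:] = src  # same-length slice assignment copies the bytes in bulk
    elapsed = time.perf_counter() - start
    return size * loops / elapsed / 1e9


if __name__ == "__main__":
    print(f"{copy_bandwidth_gbs():.2f} GB/s")
```

Since runs A and C above differ by more than 50% on this test, repeating such a measurement several times before trusting a single figure seems prudent.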

perf-bench, Benchmark: Memset 1MB (GB/sec, more is better)
  B: 80.80
  A: 74.07
  C: 73.18

Java JMH, Throughput (Ops/s, more is better)
  C: 63765483746.80
  A: 63748499441.07
  B: 63696734759.63

perf-bench, Benchmark: Epoll Wait (ops/sec, more is better)
  A: 13411
  B: 13003
  C: 12947

perf-bench, Benchmark: Futex Hash (ops/sec, more is better)
  C: 4574779
  B: 4567109
  A: 4555052

perf-bench, Benchmark: Sched Pipe (ops/sec, more is better)
  B: 377962
  C: 326836
  A: 240248
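perf bench sched pipe passes a token back and forth between two tasks through a pair of pipes, so every operation involves a blocking read, a write, and a wakeup of the other task. The result therefore depends heavily on how the scheduler wakes and places the two tasks, and here run A (240248 ops/sec) lands about 36% below run B (377962 ops/sec). The Python sketch below reproduces only the shape of that ping-pong; it uses two threads rather than separate processes (an assumption made for brevity; to our understanding perf uses processes by default), and `pipe_ping_pong` is a hypothetical helper name. Python overhead dominates its timing.

```python
import os
import threading
import time


def pipe_ping_pong(loops: int = 10_000) -> float:
    """Bounce one byte between two threads over two pipes; return round trips/sec."""
    ping_r, ping_w = os.pipe()  # main thread -> worker
    pong_r, pong_w = os.pipe()  # worker -> main thread

    def worker() -> None:
        for _ in range(loops):
            os.read(ping_r, 1)      # block until the main thread writes
            os.write(pong_w, b"x")  # wake the main thread back up

    t = threading.Thread(target=worker)
    start = time.perf_counter()
    t.start()
    for _ in range(loops):
        os.write(ping_w, b"x")  # hand the token to the worker
        os.read(pong_r, 1)      # block until it comes back
    t.join()
    elapsed = time.perf_counter() - start
    for fd in (ping_r, ping_w, pong_r, pong_w):
        os.close(fd)
    return loops / elapsed


if __name__ == "__main__":
    print(f"{pipe_ping_pong():.0f} round trips/sec")
```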

perf-bench, Benchmark: Futex Lock-Pi (ops/sec, more is better)
  C: 196
  A: 189
  B: 187

perf-bench, Benchmark: Syscall Basic (ops/sec, more is better)
  A: 18727173
  B: 18705119
  C: 18699792
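perf bench syscall basic times a tight loop around a single cheap system call; to our understanding that call is getppid(). The Python sketch below uses `os.getppid()` as the stand-in (an assumption) and reports calls per second. The per-iteration interpreter overhead is of the same order as the syscall itself, so its absolute rate is not comparable to the roughly 18.7 million ops/sec shown above; it only illustrates what "ops/sec" counts in this test.

```python
import os
import time


def getppid_rate(loops: int = 1_000_000) -> float:
    """Call getppid() `loops` times and return calls per second."""
    start = time.perf_counter()
    for _ in range(loops):
        os.getppid()  # one real system call per iteration
    elapsed = time.perf_counter() - start
    return loops / elapsed


if __name__ == "__main__":
    print(f"{getppid_rate():.0f} calls/sec")
```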

oneDNN 2.6, Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU (ms, fewer is better)
  B: 1.25925 (MIN: 1.17)
  C: 1.26163 (MIN: 1.17)
  A: 1.34126 (MIN: 1.22)
  Note: (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -ldl -lpthread

oneDNN 2.6, Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU (ms, fewer is better)
  A: 4.23007 (MIN: 4.14)
  B: 5.13321 (MIN: 5.08)
  C: 5.28602 (MIN: 5.24)

oneDNN 2.6, Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPU (ms, fewer is better)
  B: 1.17561 (MIN: 1.04)
  A: 1.25689 (MIN: 1.1)
  C: 1.29343 (MIN: 1.12)

oneDNN 2.6, Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU (ms, fewer is better)
  A: 0.846616 (MIN: 0.78)
  B: 0.891938 (MIN: 0.79)
  C: 0.910685 (MIN: 0.82)

oneDNN 2.6, Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU (ms, fewer is better)
  A: 5.37336 (MIN: 5.3)
  B: 5.70658 (MIN: 5.64)
  C: 5.89321 (MIN: 5.82)

oneDNN 2.6, Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU (ms, fewer is better)
  B: 3.77460 (MIN: 3.54)
  C: 3.86946 (MIN: 3.56)
  A: 4.04769 (MIN: 3.55)

oneDNN 2.6, Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU (ms, fewer is better)
  B: 2.66137 (MIN: 2.6)
  C: 2.66465 (MIN: 2.6)
  A: 2.69958 (MIN: 2.64)

oneDNN 2.6, Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU (ms, fewer is better)
  A: 5.93698 (MIN: 5.81)
  B: 6.43645 (MIN: 6.28)
  C: 6.50507 (MIN: 6.38)

oneDNN 2.6, Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU (ms, fewer is better)
  C: 1.44296 (MIN: 1.37)
  B: 1.44523 (MIN: 1.38)
  A: 1.44982 (MIN: 1.39)

oneDNN 2.6, Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU (ms, fewer is better)
  A: 1.53131 (MIN: 1.47)
  C: 1.54292 (MIN: 1.48)
  B: 1.54297 (MIN: 1.48)

oneDNN 2.6, Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU (ms, fewer is better)
  C: 4115.60 (MIN: 4101.42)
  B: 4136.25 (MIN: 4121.52)
  A: 4142.38 (MIN: 4127.08)

oneDNN 2.6, Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU (ms, fewer is better)
  A: 1169.78 (MIN: 1155.47)
  C: 1252.01 (MIN: 1232.87)
  B: 1256.25 (MIN: 1234.97)

oneDNN 2.6, Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU (ms, fewer is better)
  B: 4112.62 (MIN: 4096.93)
  C: 4116.99 (MIN: 4100.76)
  A: 4143.75 (MIN: 4124.75)

oneDNN 2.6, Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU (ms, fewer is better)
  A: 1165.53 (MIN: 1151.68)
  C: 1251.49 (MIN: 1234.2)
  B: 1256.61 (MIN: 1238.42)

oneDNN 2.6, Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU (ms, fewer is better)
  C: 3.31743 (MIN: 3.18)
  A: 4.16374 (MIN: 4.02)
  B: 4.65854 (MIN: 4.51)

oneDNN 2.6, Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU (ms, fewer is better)
  B: 4036.93 (MIN: 4023.67)
  C: 4119.52 (MIN: 4106.71)
  A: 4152.24 (MIN: 4137.89)

oneDNN 2.6, Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU (ms, fewer is better)
  C: 1210.42 (MIN: 1192.25)
  A: 1250.46 (MIN: 1231.51)
  B: 1273.28 (MIN: 1251.59)

oneDNN 2.6, Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32 - Engine: CPU (ms, fewer is better)
  A: 10.24 (MIN: 9.08)
  B: 10.64 (MIN: 9.23)
  C: 11.25 (MIN: 9.99)
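Each oneDNN chart reports a headline time and a MIN time per run. Comparing the two gives a rough sense of how noisy a test was within a run. The Python sketch below does that arithmetic for two of the charts above; the dictionary names and `gap_pct` are illustrative, and treating the headline figure as an average is our assumption about how the harness reports it.

```python
# (headline ms, MIN ms) per run, copied from two oneDNN charts above.
RNN_TRAINING_F32 = {
    "A": (4142.38, 4127.08),
    "B": (4136.25, 4121.52),
    "C": (4115.60, 4101.42),
}
MATMUL_TRANSFORMER_U8S8F32 = {
    "A": (10.24, 9.08),
    "B": (10.64, 9.23),
    "C": (11.25, 9.99),
}


def gap_pct(headline_ms: float, min_ms: float) -> float:
    """How far the headline time sits above the fastest observed time, in percent."""
    return (headline_ms / min_ms - 1.0) * 100.0


for name, runs in (("RNN Training f32", RNN_TRAINING_F32),
                   ("MatMul Transformer u8s8f32", MATMUL_TRANSFORMER_U8S8F32)):
    for run, (headline, fastest) in runs.items():
        print(f"{name} {run}: {gap_pct(headline, fastest):.1f}% above MIN")
```

The gap is under half a percent for RNN training but roughly 13 to 15% for the u8s8f32 Transformer matmul, so small differences between runs on the latter deserve less weight.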
Phoronix Test Suite v10.8.5