epyc turin: Benchmarks for a future article. 2 x AMD EPYC 9755 128-Core testing with an AMD VOLCANO (RVOT1001B BIOS) and ASPEED graphics on Ubuntu 24.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2410191-NE-EPYCTURIN56&grs&sor.
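For reference, a public result like this can typically be rerun locally for side-by-side comparison with the Phoronix Test Suite, e.g. phoronix-test-suite benchmark 2410191-NE-EPYCTURIN56 (assuming PTS is installed and the result ID above is still available on OpenBenchmarking.org).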
epyc turin - system details (identical for configurations a and b):
  Processor: 2 x AMD EPYC 9755 128-Core @ 2.70GHz (256 Cores / 512 Threads)
  Motherboard: AMD VOLCANO (RVOT1001B BIOS)
  Chipset: AMD Device 153a
  Memory: 1520GB
  Disk: 512GB SAMSUNG MZVL2512HCJQ-00B00 + 3201GB Micron_7450_MTFDKCB3T2TFS
  Graphics: ASPEED
  Network: Broadcom NetXtreme BCM5720 PCIe
  OS: Ubuntu 24.04
  Kernel: 6.12.0-rc3-phx (x86_64)
  Compiler: GCC 13.2.0 + Clang 18.1.3
  File-System: ext4
  Screen Resolution: 1024x768

Kernel Details: Transparent Huge Pages: madvise
Compiler Details: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-uJ7kn6/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-uJ7kn6/gcc-13-13.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Details: Scaling Governor: acpi-cpufreq performance (Boost: Enabled); CPU Microcode: 0xb002116
Security Details:
  gather_data_sampling: Not affected
  itlb_multihit: Not affected
  l1tf: Not affected
  mds: Not affected
  meltdown: Not affected
  mmio_stale_data: Not affected
  reg_file_data_sampling: Not affected
  retbleed: Not affected
  spec_rstack_overflow: Not affected
  spec_store_bypass: Vulnerable
  spectre_v1: Vulnerable: __user pointer sanitization and usercopy barriers only; no swapgs barriers
  spectre_v2: Vulnerable; IBPB: disabled; STIBP: disabled; PBRSB-eIBRS: Not affected; BHI: Not affected
  srbds: Not affected
  tsx_async_abort: Not affected
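The Security Details above are the per-vulnerability status strings the Linux kernel exposes under /sys/devices/system/cpu/vulnerabilities/. A minimal sketch (assuming a recent Linux kernel with that sysfs directory) to dump the same information on a comparable system:

    # Print kernel-reported CPU vulnerability status, the source of the
    # "Security Details" fields above. Linux-only; requires the standard
    # /sys/devices/system/cpu/vulnerabilities/ directory.
    from pathlib import Path

    vuln_dir = Path("/sys/devices/system/cpu/vulnerabilities")
    for entry in sorted(vuln_dir.iterdir()):
        status = entry.read_text().strip()
        print(f"{entry.name}: {status}")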
epyc turin - result overview (all results: fewer is better; us = microseconds):

  Test                                                        a              b
  XNNPACK FP32MobileNetV1 [us]                           126508          30213
  LiteRT Mobilenet Float [us]                           75229.3        24645.9
  LiteRT Inception ResNet V2 [us]                        824335         363512
  XNNPACK FP16MobileNetV3Small [us]                      308464         406223
  XNNPACK FP32MobileNetV2 [us]                           349733         283027
  XNNPACK FP16MobileNetV1 [us]                           121353          99495
  XNNPACK FP32MobileNetV3Large [us]                      322181         269833
  XNNPACK QS8MobileNetV2 [us]                            202569         238886
  LiteRT Quantized COCO SSD MobileNet v1 [us]           63357.4        71555.6
  XNNPACK FP16MobileNetV3Large [us]                      278473         304488
  XNNPACK FP16MobileNetV2 [us]                           243557         262462
  XNNPACK FP32MobileNetV3Small [us]                      306213         286163
  LiteRT Mobilenet Quant [us]                           43601.8        46489.2
  oneDNN Deconvolution Batch shapes_1d - CPU [ms]       26.1446        26.5673
  Epoch Epoch3D Cone [s]                                 283.75         280.78
  oneDNN Convolution Batch Shapes Auto - CPU [ms]      0.298302       0.301225
  oneDNN Recurrent Neural Network Training - CPU [ms]   761.155        758.564
  oneDNN Deconvolution Batch shapes_3d - CPU [ms]      0.423216       0.421823
  oneDNN Recurrent Neural Network Inference - CPU [ms]  514.987        516.464
  oneDNN IP Shapes 1D - CPU [ms]                       0.797341       0.799482
  WarpX Plasma Acceleration [s]                     23.47043634    23.42649518
  oneDNN IP Shapes 3D - CPU [ms]                       0.652196       0.652110
  WarpX Uniform Plasma [s]                            20.383024    20.38324203
  LiteRT NASNet Mobile [us]                             1831650        1254561
  LiteRT Inception V4 [us]                               422782         380938
  LiteRT SqueezeNet [us]                                 105380        46189.4
  LiteRT DeepLab V3 [us]                                93820.7       102742.9
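Since every test here is lower-is-better, the two configurations can be compared by taking the per-test ratio b/a and summarizing with a geometric mean. A minimal sketch of that post-processing, using a handful of values copied from the overview table above (the dictionary is deliberately only a subset for illustration):

    # Compare configurations "a" and "b" from the overview table above.
    # All tests are lower-is-better, so b/a < 1.0 means "b" finished faster.
    from math import prod

    results = {
        "XNNPACK FP32MobileNetV1 [us]": (126508, 30213),
        "LiteRT SqueezeNet [us]": (105380.0, 46189.4),
        "oneDNN RNN Training - CPU [ms]": (761.155, 758.564),
        "WarpX Uniform Plasma [s]": (20.383024, 20.38324203),
    }

    ratios = []
    for test, (a, b) in results.items():
        ratio = b / a
        ratios.append(ratio)
        print(f"{test}: b/a = {ratio:.3f}")

    geo_mean = prod(ratios) ** (1 / len(ratios))
    print(f"Geometric mean of b/a ratios: {geo_mean:.3f}")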
XNNPACK b7b048, Model: FP32MobileNetV1 (us, fewer is better): b: 30213, a: 126508. (CXX) g++ options: -O3 -lrt -lm
LiteRT 2024-10-15, Model: Mobilenet Float (Microseconds, fewer is better): b: 24645.9, a: 75229.3
LiteRT 2024-10-15, Model: Inception ResNet V2 (Microseconds, fewer is better): b: 363512, a: 824335
XNNPACK b7b048, Model: FP16MobileNetV3Small (us, fewer is better): a: 308464, b: 406223. (CXX) g++ options: -O3 -lrt -lm
XNNPACK b7b048, Model: FP32MobileNetV2 (us, fewer is better): b: 283027, a: 349733. (CXX) g++ options: -O3 -lrt -lm
XNNPACK b7b048, Model: FP16MobileNetV1 (us, fewer is better): b: 99495, a: 121353. (CXX) g++ options: -O3 -lrt -lm
XNNPACK b7b048, Model: FP32MobileNetV3Large (us, fewer is better): b: 269833, a: 322181. (CXX) g++ options: -O3 -lrt -lm
XNNPACK b7b048, Model: QS8MobileNetV2 (us, fewer is better): a: 202569, b: 238886. (CXX) g++ options: -O3 -lrt -lm
LiteRT 2024-10-15, Model: Quantized COCO SSD MobileNet v1 (Microseconds, fewer is better): a: 63357.4, b: 71555.6
XNNPACK b7b048, Model: FP16MobileNetV3Large (us, fewer is better): a: 278473, b: 304488. (CXX) g++ options: -O3 -lrt -lm
XNNPACK b7b048, Model: FP16MobileNetV2 (us, fewer is better): a: 243557, b: 262462. (CXX) g++ options: -O3 -lrt -lm
XNNPACK b7b048, Model: FP32MobileNetV3Small (us, fewer is better): b: 286163, a: 306213. (CXX) g++ options: -O3 -lrt -lm
LiteRT 2024-10-15, Model: Mobilenet Quant (Microseconds, fewer is better): a: 43601.8, b: 46489.2
oneDNN 3.6, Harness: Deconvolution Batch shapes_1d - Engine: CPU (ms, fewer is better): a: 26.14 (MIN: 24.29), b: 26.57 (MIN: 24.16); SE +/- 0.13, N = 3. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl -lpthread
Epoch 4.19.4, Epoch3D Deck: Cone (Seconds, fewer is better): b: 280.78, a: 283.75; SE +/- 0.98, N = 3. (F9X) gfortran options: -O3 -std=f2003 -Jobj -lsdf -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
oneDNN 3.6, Harness: Convolution Batch Shapes Auto - Engine: CPU (ms, fewer is better): a: 0.298302 (MIN: 0.29), b: 0.301225 (MIN: 0.28); SE +/- 0.001827, N = 3. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl -lpthread
oneDNN 3.6, Harness: Recurrent Neural Network Training - Engine: CPU (ms, fewer is better): b: 758.56 (MIN: 750.89), a: 761.16 (MIN: 753.78); SE +/- 0.94, N = 3. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl -lpthread
oneDNN 3.6, Harness: Deconvolution Batch shapes_3d - Engine: CPU (ms, fewer is better): b: 0.421823 (MIN: 0.4), a: 0.423216 (MIN: 0.41); SE +/- 0.000942, N = 3. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl -lpthread
oneDNN 3.6, Harness: Recurrent Neural Network Inference - Engine: CPU (ms, fewer is better): a: 514.99 (MIN: 510.29), b: 516.46 (MIN: 510.39); SE +/- 1.40, N = 3. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl -lpthread
oneDNN 3.6, Harness: IP Shapes 1D - Engine: CPU (ms, fewer is better): a: 0.797341 (MIN: 0.76), b: 0.799482 (MIN: 0.76); SE +/- 0.000618, N = 3. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl -lpthread
WarpX 24.10, Input: Plasma Acceleration (Seconds, fewer is better): b: 23.43, a: 23.47; SE +/- 0.13, N = 3. (CXX) g++ options: -O3 -lm
oneDNN 3.6, Harness: IP Shapes 3D - Engine: CPU (ms, fewer is better): b: 0.652110 (MIN: 0.6), a: 0.652196 (MIN: 0.61); SE +/- 0.003624, N = 3. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl -lpthread
WarpX 24.10, Input: Uniform Plasma (Seconds, fewer is better): a: 20.38, b: 20.38; SE +/- 0.19, N = 15. (CXX) g++ options: -O3 -lm
LiteRT 2024-10-15, Model: NASNet Mobile (Microseconds, fewer is better): b: 1254561, a: 1831650; SE +/- 119244.97, N = 12
LiteRT 2024-10-15, Model: Inception V4 (Microseconds, fewer is better): b: 380938, a: 422782; SE +/- 19162.36, N = 12
LiteRT 2024-10-15, Model: SqueezeNet (Microseconds, fewer is better): b: 46189.4, a: 105380.0; SE +/- 759.10, N = 15
LiteRT 2024-10-15, Model: DeepLab V3 (Microseconds, fewer is better): a: 93820.7, b: 102742.9; SE +/- 6031.32, N = 12
Phoronix Test Suite v10.8.5