9684x-march: Tests for a future article. 2 x AMD EPYC 9684X 96-Core testing with an AMD Titanite_4G (RTI1007B BIOS) and ASPEED on Ubuntu 23.10 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2403270-NE-9684XMARC10&grw.
9684x-march System Details (both result identifiers, PRE and a, were run on the same configuration)

Processor: 2 x AMD EPYC 9684X 96-Core @ 2.55GHz (192 Cores / 384 Threads)
Motherboard: AMD Titanite_4G (RTI1007B BIOS)
Chipset: AMD Device 14a4
Memory: 1520GB
Disk: 3201GB Micron_7450_MTFDKCB3T2TFS + 257GB Flash Drive
Graphics: ASPEED
Network: Broadcom NetXtreme BCM5720 PCIe
OS: Ubuntu 23.10
Kernel: 6.5.0-25-generic (x86_64)
Compiler: GCC 13.2.0
File-System: ext4
Screen Resolution: 640x480

Kernel Details: Transparent Huge Pages: madvise
Compiler Details: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-XYspKM/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-XYspKM/gcc-13-13.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Details: Scaling Governor: acpi-cpufreq performance (Boost: Enabled); CPU Microcode: 0xa10113e
Python Details: Python 3.11.6
Security Details: gather_data_sampling: Not affected; itlb_multihit: Not affected; l1tf: Not affected; mds: Not affected; meltdown: Not affected; mmio_stale_data: Not affected; retbleed: Not affected; spec_rstack_overflow: Mitigation of Safe RET; spec_store_bypass: Mitigation of SSB disabled via prctl; spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization; spectre_v2: Mitigation of Enhanced / Automatic IBRS, IBPB: conditional, STIBP: always-on, RSB filling, PBRSB-eIBRS: Not affected; srbds: Not affected; tsx_async_abort: Not affected
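For readers reproducing the configuration above, the kernel, processor, and security details correspond to standard Linux sysfs/procfs entries. The following is a minimal sketch (not part of the exported result) that reads back the same tunables on a recent kernel; the boost knob shown here assumes the acpi-cpufreq driver reported in the Processor Details.

```python
#!/usr/bin/env python3
# Minimal sketch: read back the tunables reported in the system table
# (scaling governor, boost state, THP mode, microcode revision, and
# CPU vulnerability mitigations) from standard Linux sysfs/procfs paths.
from pathlib import Path

def read(path: str) -> str:
    p = Path(path)
    return p.read_text().strip() if p.exists() else "n/a"

print("governor :", read("/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor"))
print("boost    :", read("/sys/devices/system/cpu/cpufreq/boost"))   # exposed by acpi-cpufreq
print("thp      :", read("/sys/kernel/mm/transparent_hugepage/enabled"))

# Microcode revision as shown under "Processor Details"
cpuinfo = Path("/proc/cpuinfo")
if cpuinfo.exists():
    for line in cpuinfo.read_text().splitlines():
        if line.startswith("microcode"):
            print("microcode:", line.split(":", 1)[1].strip())
            break

# Mitigation status lines as shown under "Security Details"
vuln_dir = Path("/sys/devices/system/cpu/vulnerabilities")
if vuln_dir.is_dir():
    for entry in sorted(vuln_dir.iterdir()):
        print(f"{entry.name}: {entry.read_text().strip()}")
```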
9684x-march Results Overview

Benchmark | PRE | a
brl-cad: VGR Performance Metric | 5956612 | 5927564
tensorflow: CPU - 1 - AlexNet | 21.16 | 20.78
tensorflow: CPU - 16 - AlexNet | 242.29 | 247.55
tensorflow: CPU - 32 - AlexNet | 424.06 | 436.25
tensorflow: CPU - 64 - AlexNet | 765.55 | 749.46
tensorflow: CPU - 1 - GoogLeNet | 12.58 | 13.20
tensorflow: CPU - 1 - ResNet-50 | 4.05 | 3.90
tensorflow: CPU - 256 - AlexNet | 1652.23 | 1604.52
tensorflow: CPU - 512 - AlexNet | 1980.51 | 2010.56
tensorflow: CPU - 16 - GoogLeNet | 112.64 | 114.26
tensorflow: CPU - 16 - ResNet-50 | 39.68 | 41.26
tensorflow: CPU - 32 - GoogLeNet | 185.16 | 176.36
tensorflow: CPU - 32 - ResNet-50 | 65.88 | 60.25
tensorflow: CPU - 64 - GoogLeNet | 275.34 | 273.68
tensorflow: CPU - 64 - ResNet-50 | 87.72 | 88.93
tensorflow: CPU - 256 - GoogLeNet | 400.03 | 399.46
tensorflow: CPU - 256 - ResNet-50 | 119.83 | 118.88
tensorflow: CPU - 512 - GoogLeNet | 493.31 | 484.02
tensorflow: CPU - 512 - ResNet-50 | 140.59 | 140.49
pytorch: CPU - 1 - ResNet-50 | 23.06 | 23.20
pytorch: CPU - 1 - ResNet-152 | 9.97 | 10.58
pytorch: CPU - 16 - ResNet-50 | 20.93 | 21.53
pytorch: CPU - 32 - ResNet-50 | 20.19 | 20.84
pytorch: CPU - 64 - ResNet-50 | 21.59 | 21.08
pytorch: CPU - 16 - ResNet-152 | 8.93 | 9.01
pytorch: CPU - 256 - ResNet-50 | 21.20 | 20.77
pytorch: CPU - 32 - ResNet-152 | 8.72 | 9.34
pytorch: CPU - 512 - ResNet-50 | 20.43 | 21.01
pytorch: CPU - 64 - ResNet-152 | 9.21 | 8.91
pytorch: CPU - 256 - ResNet-152 | 8.92 | 9.09
pytorch: CPU - 512 - ResNet-152 | 9.47 | 9.33
pytorch: CPU - 1 - Efficientnet_v2_l | 6.29 | 6.45
pytorch: CPU - 16 - Efficientnet_v2_l | 2.33 | 2.33
pytorch: CPU - 32 - Efficientnet_v2_l | 2.33 | 2.31
pytorch: CPU - 64 - Efficientnet_v2_l | 2.32 | 2.31
pytorch: CPU - 256 - Efficientnet_v2_l | 2.29 | 2.33
pytorch: CPU - 512 - Efficientnet_v2_l | 2.31 | 2.33
blender: BMW27 - CPU-Only | 7.55 | 7.55
blender: Junkshop - CPU-Only | 11.40 | 11.44
blender: Classroom - CPU-Only | 18.03 | 18.08
blender: Fishy Cat - CPU-Only | 9.96 | 9.85
blender: Barbershop - CPU-Only | 67.38 | 67.66
blender: Pabellon Barcelona - CPU-Only | 22.99 | 23.10
build-mesa: Time To Compile | 14.66 | 14.756
rocksdb: Overwrite | 421049 | 421616
rocksdb: Rand Read | 1105306233 | 1108892776
rocksdb: Update Rand | 421266 | 425687
rocksdb: Read While Writing | 27130363 | 26406662
rocksdb: Read Rand Write Rand | 3619142 | 3643263
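As a quick way to read the overview table, the sketch below (not part of the export) computes the percentage change of identifier "a" relative to "PRE" for a handful of rows copied from the table; the lower-is-better flags follow the units shown in the detailed results further down.

```python
# Minimal sketch: compare the two result identifiers from the overview table.
# Values are copied from the table above; "lower_is_better" reflects the
# per-test units (Blender render time is in seconds, throughput metrics
# such as VGR and Op/s are higher-is-better).
rows = [
    # (benchmark, PRE, a, lower_is_better)
    ("brl-cad: VGR Performance Metric", 5956612, 5927564, False),
    ("tensorflow: CPU - 64 - AlexNet",   765.55,  749.46,  False),
    ("blender: Barbershop - CPU-Only",    67.38,   67.66,  True),
    ("rocksdb: Read While Writing",  27130363, 26406662,  False),
]

for name, pre, a, lower in rows:
    delta = (a - pre) / pre * 100.0                  # % change of "a" vs. "PRE"
    better = (delta < 0) if lower else (delta > 0)   # did "a" come out ahead?
    print(f"{name}: a vs PRE = {delta:+.2f}% ({'a better' if better else 'PRE better'})")
```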
BRL-CAD 7.38.2 (VGR Performance Metric, More Is Better)
VGR Performance Metric | PRE: 5956612 | a: 5927564
1. (CXX) g++ options: -std=c++17 -pipe -fvisibility=hidden -fno-strict-aliasing -fno-common -fexceptions -ftemplate-depth-128 -m64 -ggdb3 -O3 -fipa-pta -fstrength-reduce -finline-functions -flto -ltcl8.6 -lnetpbm -lregex_brl -lz_brl -lassimp -ldl -lm -ltk8.6
TensorFlow 2.16.1 (images/sec, More Is Better)
Device: CPU - Batch Size: 1 - Model: AlexNet | PRE: 21.16 | a: 20.78 | SE +/- 0.16, N = 15
Device: CPU - Batch Size: 16 - Model: AlexNet | PRE: 242.29 | a: 247.55 | SE +/- 2.30, N = 15
Device: CPU - Batch Size: 32 - Model: AlexNet | PRE: 424.06 | a: 436.25 | SE +/- 6.62, N = 15
Device: CPU - Batch Size: 64 - Model: AlexNet | PRE: 765.55 | a: 749.46 | SE +/- 5.39, N = 15
Device: CPU - Batch Size: 1 - Model: GoogLeNet | PRE: 12.58 | a: 13.20 | SE +/- 0.14, N = 15
Device: CPU - Batch Size: 1 - Model: ResNet-50 | PRE: 4.05 | a: 3.90
Device: CPU - Batch Size: 256 - Model: AlexNet | PRE: 1652.23 | a: 1604.52
Device: CPU - Batch Size: 512 - Model: AlexNet | PRE: 1980.51 | a: 2010.56
Device: CPU - Batch Size: 16 - Model: GoogLeNet | PRE: 112.64 | a: 114.26
Device: CPU - Batch Size: 16 - Model: ResNet-50 | PRE: 39.68 | a: 41.26
Device: CPU - Batch Size: 32 - Model: GoogLeNet | PRE: 185.16 | a: 176.36
Device: CPU - Batch Size: 32 - Model: ResNet-50 | PRE: 65.88 | a: 60.25
Device: CPU - Batch Size: 64 - Model: GoogLeNet | PRE: 275.34 | a: 273.68
Device: CPU - Batch Size: 64 - Model: ResNet-50 | PRE: 87.72 | a: 88.93
Device: CPU - Batch Size: 256 - Model: GoogLeNet | PRE: 400.03 | a: 399.46
Device: CPU - Batch Size: 256 - Model: ResNet-50 | PRE: 119.83 | a: 118.88
Device: CPU - Batch Size: 512 - Model: GoogLeNet | PRE: 493.31 | a: 484.02
Device: CPU - Batch Size: 512 - Model: ResNet-50 | PRE: 140.59 | a: 140.49
PyTorch 2.2.1 (batches/sec, More Is Better)
Device: CPU - Batch Size: 1 - Model: ResNet-50 | PRE: 23.06 (MIN: 12.95 / MAX: 24.52) | a: 23.20 (MIN: 12.21 / MAX: 25.13) | SE +/- 0.20, N = 15
Device: CPU - Batch Size: 1 - Model: ResNet-152 | PRE: 9.97 (MIN: 4.85 / MAX: 10.69) | a: 10.58 (MIN: 4.55 / MAX: 11.67) | SE +/- 0.10, N = 15
Device: CPU - Batch Size: 16 - Model: ResNet-50 | PRE: 20.93 (MIN: 12.91 / MAX: 21.51) | a: 21.53 (MIN: 12.64 / MAX: 22.28) | SE +/- 0.16, N = 3
Device: CPU - Batch Size: 32 - Model: ResNet-50 | PRE: 20.19 (MIN: 11.95 / MAX: 21.04) | a: 20.84 (MIN: 11.24 / MAX: 22.33) | SE +/- 0.16, N = 15
Device: CPU - Batch Size: 64 - Model: ResNet-50 | PRE: 21.59 (MIN: 14.02 / MAX: 22.21) | a: 21.08 (MIN: 13.2 / MAX: 22.07) | SE +/- 0.23, N = 3
Device: CPU - Batch Size: 16 - Model: ResNet-152 | PRE: 8.93 (MIN: 8.8 / MAX: 9.04) | a: 9.01 (MIN: 4.81 / MAX: 9.31) | SE +/- 0.09, N = 3
Device: CPU - Batch Size: 256 - Model: ResNet-50 | PRE: 21.20 (MIN: 12.68 / MAX: 21.88) | a: 20.77 (MIN: 12.97 / MAX: 21.67) | SE +/- 0.10, N = 3
Device: CPU - Batch Size: 32 - Model: ResNet-152 | PRE: 8.72 (MIN: 5.23 / MAX: 9.06) | a: 9.34 (MIN: 4.74 / MAX: 9.74) | SE +/- 0.08, N = 3
Device: CPU - Batch Size: 512 - Model: ResNet-50 | PRE: 20.43 (MIN: 13.46 / MAX: 21.1) | a: 21.01 (MIN: 11.92 / MAX: 22.65) | SE +/- 0.14, N = 15
Device: CPU - Batch Size: 64 - Model: ResNet-152 | PRE: 9.21 (MIN: 4.8 / MAX: 9.43) | a: 8.91 (MIN: 4.5 / MAX: 9.7) | SE +/- 0.09, N = 12
Device: CPU - Batch Size: 256 - Model: ResNet-152 | PRE: 8.92 (MIN: 5.04 / MAX: 9.16) | a: 9.09 (MIN: 4.84 / MAX: 10.03) | SE +/- 0.10, N = 12
Device: CPU - Batch Size: 512 - Model: ResNet-152 | PRE: 9.47 (MIN: 5.17 / MAX: 9.87) | a: 9.33 (MIN: 4.69 / MAX: 9.66) | SE +/- 0.10, N = 3
Device: CPU - Batch Size: 1 - Model: Efficientnet_v2_l | PRE: 6.29 (MIN: 3.09 / MAX: 6.44) | a: 6.45 (MIN: 3.05 / MAX: 6.85) | SE +/- 0.09, N = 3
Device: CPU - Batch Size: 16 - Model: Efficientnet_v2_l | PRE: 2.33 (MIN: 1.76 / MAX: 2.72) | a: 2.33 (MIN: 1.77 / MAX: 2.9) | SE +/- 0.01, N = 3
Device: CPU - Batch Size: 32 - Model: Efficientnet_v2_l | PRE: 2.33 (MIN: 1.78 / MAX: 2.8) | a: 2.31 (MIN: 1.88 / MAX: 2.74) | SE +/- 0.01, N = 3
Device: CPU - Batch Size: 64 - Model: Efficientnet_v2_l | PRE: 2.32 (MIN: 1.9 / MAX: 2.75) | a: 2.31 (MIN: 1.53 / MAX: 2.83) | SE +/- 0.01, N = 3
Device: CPU - Batch Size: 256 - Model: Efficientnet_v2_l | PRE: 2.29 (MIN: 1.79 / MAX: 2.72) | a: 2.33 (MIN: 1.59 / MAX: 2.78) | SE +/- 0.01, N = 3
Device: CPU - Batch Size: 512 - Model: Efficientnet_v2_l | PRE: 2.31 (MIN: 1.7 / MAX: 2.84) | a: 2.33 (MIN: 1.58 / MAX: 2.83) | SE +/- 0.01, N = 3
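The "SE +/- x, N = y" annotations on the TensorFlow and PyTorch entries are standard errors over N recorded runs. As a reminder of how such a figure is conventionally derived (SE = s / sqrt(N), with s the sample standard deviation), here is a small sketch; the per-run samples in it are hypothetical, since only aggregate values appear in this export.

```python
# Minimal sketch of how a "SE +/- x, N = y" figure is conventionally computed:
# standard error of the mean = sample standard deviation / sqrt(N).
from math import sqrt
from statistics import stdev

samples = [23.1, 22.8, 23.3]           # hypothetical per-run results (N = 3), not from the export
n = len(samples)
se = stdev(samples) / sqrt(n)          # standard error of the mean
print(f"mean = {sum(samples) / n:.2f}, SE +/- {se:.2f}, N = {n}")
```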
Blender 4.1 (Seconds, Fewer Is Better)
Blend File: BMW27 - Compute: CPU-Only | PRE: 7.55 | a: 7.55
Blend File: Junkshop - Compute: CPU-Only | PRE: 11.40 | a: 11.44
Blend File: Classroom - Compute: CPU-Only | PRE: 18.03 | a: 18.08
Blend File: Fishy Cat - Compute: CPU-Only | PRE: 9.96 | a: 9.85
Blend File: Barbershop - Compute: CPU-Only | PRE: 67.38 | a: 67.66
Blend File: Pabellon Barcelona - Compute: CPU-Only | PRE: 22.99 | a: 23.10
Timed Mesa Compilation 24.0 (Seconds, Fewer Is Better)
Time To Compile | PRE: 14.66 | a: 14.76 | SE +/- 0.04, N = 3
RocksDB 9.0 (Op/s, More Is Better)
Test: Overwrite | PRE: 421049 | a: 421616
Test: Random Read | PRE: 1105306233 | a: 1108892776
Test: Update Random | PRE: 421266 | a: 425687
Test: Read While Writing | PRE: 27130363 | a: 26406662
Test: Read Random Write Random | PRE: 3619142 | a: 3643263
1. (CXX) g++ options: -O3 -march=native -pthread -fno-builtin-memcmp -fno-rtti -lpthread
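A common way to condense a two-way comparison like PRE vs. a into one number is the geometric mean of per-test ratios. The sketch below (not something the export itself provides) illustrates the calculation on three results copied from above, orienting each ratio so that a value above 1.0 favors "a".

```python
# Minimal sketch: summarize a two-way comparison with a geometric mean of
# per-test ratios. Each ratio is oriented so that > 1.0 means identifier "a"
# was faster than "PRE"; sample values are copied from the results above.
from math import prod

ratios = [
    247.55 / 242.29,   # TensorFlow: CPU - 16 - AlexNet (images/sec, higher is better): a / PRE
    436.25 / 424.06,   # TensorFlow: CPU - 32 - AlexNet (images/sec, higher is better): a / PRE
    67.38 / 67.66,     # Blender: Barbershop (Seconds, lower is better): PRE / a
]

geomean = prod(ratios) ** (1.0 / len(ratios))
print(f"geometric mean of ratios (a relative to PRE): {geomean:.4f}")
```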
Phoronix Test Suite v10.8.5