ff
Tests for a future article. AMD Ryzen Threadripper 3970X 32-Core testing with an ASUS ROG ZENITH II EXTREME (1802 BIOS) and AMD Radeon RX 5700 8GB on Ubuntu 22.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2401116-NE-FF240899407&rdt .
ff: System Details
Results: a, b (one set of system values is listed for both)

Processor: AMD Ryzen Threadripper 3970X 32-Core @ 3.70GHz (32 Cores / 64 Threads)
Motherboard: ASUS ROG ZENITH II EXTREME (1802 BIOS)
Chipset: AMD Starship/Matisse
Memory: 4 x 16 GB DRAM-3600MT/s Corsair CMT64GX4M4Z3600C16
Disk: Samsung SSD 980 PRO 500GB
Graphics: AMD Radeon RX 5700 8GB (1750/875MHz)
Audio: AMD Navi 10 HDMI Audio
Monitor: ASUS VP28U
Network: Aquantia AQC107 NBase-T/IEEE + Intel I211 + Intel Wi-Fi 6 AX200
OS: Ubuntu 22.04
Kernel: 6.2.0-39-generic (x86_64)
Desktop: GNOME Shell 42.2
Display Server: X Server + Wayland
OpenGL: 4.6 Mesa 22.0.1 (LLVM 13.0.1 DRM 3.49)
Vulkan: 1.2.204
Compiler: GCC 11.4.0
File-System: ext4
Screen Resolution: 3840x2160

Kernel Details - Transparent Huge Pages: madvise

Compiler Details - --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v

Processor Details - Scaling Governor: acpi-cpufreq schedutil (Boost: Enabled) - CPU Microcode: 0x830107a

Python Details - Python 3.10.12

Security Details
  gather_data_sampling: Not affected
  itlb_multihit: Not affected
  l1tf: Not affected
  mds: Not affected
  meltdown: Not affected
  mmio_stale_data: Not affected
  retbleed: Mitigation of untrained return thunk; SMT enabled with STIBP protection
  spec_rstack_overflow: Mitigation of safe RET
  spec_store_bypass: Mitigation of SSB disabled via prctl
  spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
  spectre_v2: Mitigation of Retpolines IBPB: conditional STIBP: always-on RSB filling PBRSB-eIBRS: Not affected
  srbds: Not affected
  tsx_async_abort: Not affected
ff: Result Overview

Test                                    a               b
quicksilver: CTS2                       20786667        20850000
quicksilver: CORAL2 P1                  21960000        21990000
quicksilver: CORAL2 P2                  28950000        28950000
cachebench: Read                        11066.955437    11033.02931
cachebench: Write                       61568.843405    61337.271973
cachebench: Read / Modify / Write       119470.658017   118721.395337
rav1e: 1                                0.836           0.839
rav1e: 5                                2.813           2.809
rav1e: 6                                3.733           3.78
rav1e: 10                               8.630           8.869
y-cruncher: 1B                          16.373          16.394
y-cruncher: 500M                        7.925           7.986
pytorch: CPU - 1 - ResNet-50            35.79           35.43
pytorch: CPU - 1 - ResNet-152           13.95           14.19
pytorch: CPU - 16 - ResNet-50           30.68           31.16
pytorch: CPU - 16 - ResNet-152          11.74           11.88
pytorch: CPU - 1 - Efficientnet_v2_l    7.90            7.89
pytorch: CPU - 16 - Efficientnet_v2_l   5.99            6.02
tensorflow: CPU - 1 - VGG-16            2.07            2.01
tensorflow: CPU - 1 - AlexNet           5.35            5.35
tensorflow: CPU - 16 - VGG-16           6.90            6.81
tensorflow: CPU - 16 - AlexNet          61.41           60.42
tensorflow: CPU - 1 - GoogLeNet         9.23            9.23
tensorflow: CPU - 1 - ResNet-50         6.37            6.4
tensorflow: CPU - 16 - GoogLeNet        42.90           43.29
tensorflow: CPU - 16 - ResNet-50        12.54           12.59
speedb: Rand Fill                       252455          252251
speedb: Rand Read                       129495776       131475734
speedb: Update Rand                     241710          240957
speedb: Seq Fill                        254122          254353
speedb: Rand Fill Sync                  5618            5613
speedb: Read While Writing              7991264         8052776
speedb: Read Rand Write Rand            2422144         2403103
llama-cpp: llama-2-7b.Q4_0.gguf         19.36           19.32
llama-cpp: llama-2-13b.Q4_0.gguf        10.63           10.64
llama-cpp: llama-2-70b-chat.Q5_0.gguf   1.86            1.86
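The a and b columns in the overview are easier to compare as relative differences. The following is a minimal sketch of that arithmetic, not part of the Phoronix Test Suite export: the `RESULTS` dictionary, the `percent_change` and `summarize` names, and the choice of three sample rows are all assumptions made for illustration. The values themselves are copied from the overview, and the "more is better" flags follow the per-test headings (Y-Cruncher is reported in seconds, where fewer is better).

```python
# Illustrative only: names and layout are assumptions, values come from the overview.
# Each entry maps a test name to (result a, result b, more_is_better).
RESULTS = {
    "rav1e: 10": (8.630, 8.869, True),
    "tensorflow: CPU - 1 - VGG-16": (2.07, 2.01, True),
    "y-cruncher: 500M": (7.925, 7.986, False),
}


def percent_change(a, b):
    """Return the change of b relative to a, in percent."""
    return (b - a) / a * 100.0


def summarize(results):
    """Return (name, percent change rounded to 2 places, b_is_better) per test."""
    rows = []
    for name, (a, b, more_is_better) in results.items():
        delta = percent_change(a, b)
        # b is better when the change points in the favourable direction.
        b_is_better = delta != 0 and (delta > 0) == more_is_better
        rows.append((name, round(delta, 2), b_is_better))
    return rows


if __name__ == "__main__":
    for name, delta, b_is_better in summarize(RESULTS):
        print(f"{name}: {delta:+.2f}% (b better: {b_is_better})")
```

Differences of this size should be read alongside the reported standard errors (SE +/-, N = 3) before treating either run as faster.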
Quicksilver 20230818 (Figure Of Merit, More Is Better; N = 3)
  Input: CTS2        a: 20786667   b: 20850000   SE +/- 37564.76
  Input: CORAL2 P1   a: 21960000   b: 21990000   SE +/- 0.00
  Input: CORAL2 P2   a: 28950000   b: 28950000   SE +/- 37859.39
  1. (CXX) g++ options: -fopenmp -O3 -march=native
CacheBench (MB/s, More Is Better; N = 3)
  Test: Read                    a: 11066.96 (MIN: 10956.53 / MAX: 11112.67)   b: 11033.03 (MIN: 11002.15 / MAX: 11058.91)   SE +/- 33.97
  Test: Write                   a: 61568.84 (MIN: 40788.42 / MAX: 66208.33)   b: 61337.27 (MIN: 41835.46 / MAX: 65734.93)   SE +/- 29.06
  Test: Read / Modify / Write   a: 119470.66 (MIN: 97982.3 / MAX: 130920.18)   b: 118721.40 (MIN: 99330.52 / MAX: 130732.06)   SE +/- 81.20
  1. (CC) gcc options: -O3 -lrt
rav1e 0.7 (Frames Per Second, More Is Better; N = 3)
  Speed: 1    a: 0.836   b: 0.839   SE +/- 0.001
  Speed: 5    a: 2.813   b: 2.809   SE +/- 0.005
  Speed: 6    a: 3.733   b: 3.780   SE +/- 0.017
  Speed: 10   a: 8.630   b: 8.869   SE +/- 0.075
Y-Cruncher 0.8.3 (Seconds, Fewer Is Better; N = 3)
  Pi Digits To Calculate: 1B     a: 16.37   b: 16.39   SE +/- 0.03
  Pi Digits To Calculate: 500M   a: 7.925   b: 7.986   SE +/- 0.007
PyTorch 2.1 (batches/sec, More Is Better; N = 3)
  Device: CPU - Batch Size: 1 - Model: ResNet-50            a: 35.79 (MIN: 35 / MAX: 36.54)      b: 35.43 (MIN: 34.31 / MAX: 36.32)   SE +/- 0.14
  Device: CPU - Batch Size: 1 - Model: ResNet-152           a: 13.95 (MIN: 13.59 / MAX: 14.24)   b: 14.19 (MIN: 14.01 / MAX: 14.36)   SE +/- 0.08
  Device: CPU - Batch Size: 16 - Model: ResNet-50           a: 30.68 (MIN: 30 / MAX: 31.31)      b: 31.16 (MIN: 30.47 / MAX: 31.36)   SE +/- 0.17
  Device: CPU - Batch Size: 16 - Model: ResNet-152          a: 11.74 (MIN: 11.43 / MAX: 11.97)   b: 11.88 (MIN: 11.8 / MAX: 11.95)    SE +/- 0.05
  Device: CPU - Batch Size: 1 - Model: Efficientnet_v2_l    a: 7.90 (MIN: 7.78 / MAX: 8.01)      b: 7.89 (MIN: 7.8 / MAX: 7.97)       SE +/- 0.01
  Device: CPU - Batch Size: 16 - Model: Efficientnet_v2_l   a: 5.99 (MIN: 5.9 / MAX: 6.06)       b: 6.02 (MIN: 5.98 / MAX: 6.06)      SE +/- 0.02
TensorFlow 2.12 (images/sec, More Is Better; N = 3)
  Device: CPU - Batch Size: 1 - Model: VGG-16       a: 2.07    b: 2.01    SE +/- 0.01
  Device: CPU - Batch Size: 1 - Model: AlexNet      a: 5.35    b: 5.35    SE +/- 0.00
  Device: CPU - Batch Size: 16 - Model: VGG-16      a: 6.90    b: 6.81    SE +/- 0.01
  Device: CPU - Batch Size: 16 - Model: AlexNet     a: 61.41   b: 60.42   SE +/- 0.13
  Device: CPU - Batch Size: 1 - Model: GoogLeNet    a: 9.23    b: 9.23    SE +/- 0.02
  Device: CPU - Batch Size: 1 - Model: ResNet-50    a: 6.37    b: 6.40    SE +/- 0.01
  Device: CPU - Batch Size: 16 - Model: GoogLeNet   a: 42.90   b: 43.29   SE +/- 0.16
  Device: CPU - Batch Size: 16 - Model: ResNet-50   a: 12.54   b: 12.59   SE +/- 0.08
Speedb 2.7 (Op/s, More Is Better; N = 3)
  Test: Random Fill                a: 252455      b: 252251      SE +/- 91.82
  Test: Random Read                a: 129495776   b: 131475734   SE +/- 1683044.27
  Test: Update Random              a: 241710      b: 240957      SE +/- 580.55
  Test: Sequential Fill            a: 254122      b: 254353      SE +/- 336.51
  Test: Random Fill Sync           a: 5618        b: 5613        SE +/- 49.65
  Test: Read While Writing         a: 7991264     b: 8052776     SE +/- 69727.75
  Test: Read Random Write Random   a: 2422144     b: 2403103     SE +/- 3556.84
  1. (CXX) g++ options: -O3 -march=native -pthread -fno-builtin-memcmp -fno-rtti -lpthread
Llama.cpp b1808 (Tokens Per Second, More Is Better; N = 3)
  Model: llama-2-7b.Q4_0.gguf         a: 19.36   b: 19.32   SE +/- 0.07
  Model: llama-2-13b.Q4_0.gguf        a: 10.63   b: 10.64   SE +/- 0.00
  Model: llama-2-70b-chat.Q5_0.gguf   a: 1.86    b: 1.86    SE +/- 0.00
  1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -march=native -mtune=native -lopenblas
Phoronix Test Suite v10.8.5