AMD Ryzen Threadripper 3970X 32-Core testing with a ASUS ROG ZENITH II EXTREME (1802 BIOS) and AMD Radeon RX 5700 8GB on Ubuntu 22.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2401113-NE-FF610899407&grr .
System configuration (identical for runs a and b):

Processor: AMD Ryzen Threadripper 3970X 32-Core @ 3.70GHz (32 Cores / 64 Threads)
Motherboard: ASUS ROG ZENITH II EXTREME (1802 BIOS)
Chipset: AMD Starship/Matisse
Memory: 4 x 16 GB DRAM-3600MT/s Corsair CMT64GX4M4Z3600C16
Disk: Samsung SSD 980 PRO 500GB
Graphics: AMD Radeon RX 5700 8GB (1750/875MHz)
Audio: AMD Navi 10 HDMI Audio
Monitor: ASUS VP28U
Network: Aquantia AQC107 NBase-T/IEEE + Intel I211 + Intel Wi-Fi 6 AX200
OS: Ubuntu 22.04
Kernel: 6.2.0-39-generic (x86_64)
Desktop: GNOME Shell 42.2
Display Server: X Server + Wayland
OpenGL: 4.6 Mesa 22.0.1 (LLVM 13.0.1 DRM 3.49)
Vulkan: 1.2.204
Compiler: GCC 11.4.0
File-System: ext4
Screen Resolution: 3840x2160

Kernel Details: Transparent Huge Pages: madvise
Compiler Details: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Details: Scaling Governor: acpi-cpufreq schedutil (Boost: Enabled); CPU Microcode: 0x830107a
Python Details: Python 3.10.12
Security Details: gather_data_sampling: Not affected; itlb_multihit: Not affected; l1tf: Not affected; mds: Not affected; meltdown: Not affected; mmio_stale_data: Not affected; retbleed: Mitigation of untrained return thunk, SMT enabled with STIBP protection; spec_rstack_overflow: Mitigation of safe RET; spec_store_bypass: Mitigation of SSB disabled via prctl; spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization; spectre_v2: Mitigation of Retpolines, IBPB: conditional, STIBP: always-on, RSB filling, PBRSB-eIBRS: Not affected; srbds: Not affected; tsx_async_abort: Not affected
Result Summary (runs a vs. b; units and direction are given in the detailed entries below):

Test                                       a              b
pytorch: CPU - 16 - Efficientnet_v2_l      5.99           6.02
tensorflow: CPU - 16 - VGG-16              6.90           6.81
speedb: Seq Fill                           254122         254353
quicksilver: CTS2                          20786667       20850000
llama-cpp: llama-2-70b-chat.Q5_0.gguf      1.86           1.86
pytorch: CPU - 16 - ResNet-152             11.74          11.88
quicksilver: CORAL2 P2                     28950000       28950000
tensorflow: CPU - 16 - ResNet-50           12.54          12.59
pytorch: CPU - 1 - Efficientnet_v2_l       7.90           7.89
cachebench: Read / Modify / Write          119470.658017  118721.395337
cachebench: Write                          61568.843405   61337.271973
cachebench: Read                           11066.955437   11033.02931
rav1e: 1                                   0.836          0.839
pytorch: CPU - 1 - ResNet-152              13.95          14.19
rav1e: 5                                   2.813          2.809
pytorch: CPU - 16 - ResNet-50              30.68          31.16
rav1e: 10                                  8.630          8.869
speedb: Rand Fill Sync                     5618           5613
speedb: Rand Fill                          252455         252251
speedb: Update Rand                        241710         240957
speedb: Read Rand Write Rand               2422144        2403103
speedb: Read While Writing                 7991264        8052776
speedb: Rand Read                          129495776      131475734
tensorflow: CPU - 1 - VGG-16               2.07           2.01
rav1e: 6                                   3.733          3.78
quicksilver: CORAL2 P1                     21960000       21990000
llama-cpp: llama-2-13b.Q4_0.gguf           10.63          10.64
tensorflow: CPU - 16 - GoogLeNet           42.90          43.29
pytorch: CPU - 1 - ResNet-50               35.79          35.43
tensorflow: CPU - 16 - AlexNet             61.41          60.42
llama-cpp: llama-2-7b.Q4_0.gguf            19.36          19.32
tensorflow: CPU - 1 - AlexNet              5.35           5.35
tensorflow: CPU - 1 - ResNet-50            6.37           6.4
y-cruncher: 1B                             16.373         16.394
tensorflow: CPU - 1 - GoogLeNet            9.23           9.23
y-cruncher: 500M                           7.925          7.986
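The a-to-b differences in the summary table are mostly within a couple of percent. As an illustrative sketch (not part of the original export), the relative change for a few of the results above can be computed like this, with values copied from the table:

```python
# Percent change from run a to run b for selected results above.
# All of these are "more is better" metrics.
results = {
    "speedb: Rand Read (Op/s)":       (129495776, 131475734),
    "pytorch: CPU - 16 - ResNet-50":  (30.68, 31.16),
    "rav1e: Speed 10 (FPS)":          (8.630, 8.869),
    "tensorflow: CPU - 16 - AlexNet": (61.41, 60.42),
}

def pct_change(a, b):
    """Relative change of b versus a, in percent."""
    return (b - a) / a * 100.0

for name, (a, b) in results.items():
    print(f"{name}: {pct_change(a, b):+.2f}%")
```

Positive values mean run b was faster on that test; the TensorFlow AlexNet line comes out negative, matching the small regression visible in the table.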
PyTorch 2.1 - Device: CPU - Batch Size: 16 - Model: Efficientnet_v2_l
batches/sec, More Is Better (SE +/- 0.02, N = 3)
  a: 5.99  (MIN: 5.9 / MAX: 6.06)
  b: 6.02  (MIN: 5.98 / MAX: 6.06)

TensorFlow 2.12 - Device: CPU - Batch Size: 16 - Model: VGG-16
images/sec, More Is Better (SE +/- 0.01, N = 3)
  a: 6.90
  b: 6.81

Speedb 2.7 - Test: Sequential Fill
Op/s, More Is Better (SE +/- 336.51, N = 3)
  a: 254122
  b: 254353
  1. (CXX) g++ options: -O3 -march=native -pthread -fno-builtin-memcmp -fno-rtti -lpthread

Quicksilver 20230818 - Input: CTS2
Figure Of Merit, More Is Better (SE +/- 37564.76, N = 3)
  a: 20786667
  b: 20850000
  1. (CXX) g++ options: -fopenmp -O3 -march=native

Llama.cpp b1808 - Model: llama-2-70b-chat.Q5_0.gguf
Tokens Per Second, More Is Better (SE +/- 0.00, N = 3)
  a: 1.86
  b: 1.86
  1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -march=native -mtune=native -lopenblas
PyTorch 2.1 - Device: CPU - Batch Size: 16 - Model: ResNet-152
batches/sec, More Is Better (SE +/- 0.05, N = 3)
  a: 11.74  (MIN: 11.43 / MAX: 11.97)
  b: 11.88  (MIN: 11.8 / MAX: 11.95)

Quicksilver 20230818 - Input: CORAL2 P2
Figure Of Merit, More Is Better (SE +/- 37859.39, N = 3)
  a: 28950000
  b: 28950000
  1. (CXX) g++ options: -fopenmp -O3 -march=native

TensorFlow 2.12 - Device: CPU - Batch Size: 16 - Model: ResNet-50
images/sec, More Is Better (SE +/- 0.08, N = 3)
  a: 12.54
  b: 12.59

PyTorch 2.1 - Device: CPU - Batch Size: 1 - Model: Efficientnet_v2_l
batches/sec, More Is Better (SE +/- 0.01, N = 3)
  a: 7.90  (MIN: 7.78 / MAX: 8.01)
  b: 7.89  (MIN: 7.8 / MAX: 7.97)

CacheBench - Test: Read / Modify / Write
MB/s, More Is Better (SE +/- 81.20, N = 3)
  a: 119470.66  (MIN: 97982.3 / MAX: 130920.18)
  b: 118721.40  (MIN: 99330.52 / MAX: 130732.06)
  1. (CC) gcc options: -O3 -lrt

CacheBench - Test: Write
MB/s, More Is Better (SE +/- 29.06, N = 3)
  a: 61568.84  (MIN: 40788.42 / MAX: 66208.33)
  b: 61337.27  (MIN: 41835.46 / MAX: 65734.93)
  1. (CC) gcc options: -O3 -lrt

CacheBench - Test: Read
MB/s, More Is Better (SE +/- 33.97, N = 3)
  a: 11066.96  (MIN: 10956.53 / MAX: 11112.67)
  b: 11033.03  (MIN: 11002.15 / MAX: 11058.91)
  1. (CC) gcc options: -O3 -lrt
rav1e 0.7 - Speed: 1
Frames Per Second, More Is Better (SE +/- 0.001, N = 3)
  a: 0.836
  b: 0.839

PyTorch 2.1 - Device: CPU - Batch Size: 1 - Model: ResNet-152
batches/sec, More Is Better (SE +/- 0.08, N = 3)
  a: 13.95  (MIN: 13.59 / MAX: 14.24)
  b: 14.19  (MIN: 14.01 / MAX: 14.36)

rav1e 0.7 - Speed: 5
Frames Per Second, More Is Better (SE +/- 0.005, N = 3)
  a: 2.813
  b: 2.809

PyTorch 2.1 - Device: CPU - Batch Size: 16 - Model: ResNet-50
batches/sec, More Is Better (SE +/- 0.17, N = 3)
  a: 30.68  (MIN: 30 / MAX: 31.31)
  b: 31.16  (MIN: 30.47 / MAX: 31.36)

rav1e 0.7 - Speed: 10
Frames Per Second, More Is Better (SE +/- 0.075, N = 3)
  a: 8.630
  b: 8.869
Speedb 2.7 - Test: Random Fill Sync
Op/s, More Is Better (SE +/- 49.65, N = 3)
  a: 5618
  b: 5613
  1. (CXX) g++ options: -O3 -march=native -pthread -fno-builtin-memcmp -fno-rtti -lpthread

Speedb 2.7 - Test: Random Fill
Op/s, More Is Better (SE +/- 91.82, N = 3)
  a: 252455
  b: 252251
  1. (CXX) g++ options: -O3 -march=native -pthread -fno-builtin-memcmp -fno-rtti -lpthread

Speedb 2.7 - Test: Update Random
Op/s, More Is Better (SE +/- 580.55, N = 3)
  a: 241710
  b: 240957
  1. (CXX) g++ options: -O3 -march=native -pthread -fno-builtin-memcmp -fno-rtti -lpthread

Speedb 2.7 - Test: Read Random Write Random
Op/s, More Is Better (SE +/- 3556.84, N = 3)
  a: 2422144
  b: 2403103
  1. (CXX) g++ options: -O3 -march=native -pthread -fno-builtin-memcmp -fno-rtti -lpthread

Speedb 2.7 - Test: Read While Writing
Op/s, More Is Better (SE +/- 69727.75, N = 3)
  a: 7991264
  b: 8052776
  1. (CXX) g++ options: -O3 -march=native -pthread -fno-builtin-memcmp -fno-rtti -lpthread

Speedb 2.7 - Test: Random Read
Op/s, More Is Better (SE +/- 1683044.27, N = 3)
  a: 129495776
  b: 131475734
  1. (CXX) g++ options: -O3 -march=native -pthread -fno-builtin-memcmp -fno-rtti -lpthread
TensorFlow 2.12 - Device: CPU - Batch Size: 1 - Model: VGG-16
images/sec, More Is Better (SE +/- 0.01, N = 3)
  a: 2.07
  b: 2.01

rav1e 0.7 - Speed: 6
Frames Per Second, More Is Better (SE +/- 0.017, N = 3)
  a: 3.733
  b: 3.780

Quicksilver 20230818 - Input: CORAL2 P1
Figure Of Merit, More Is Better (SE +/- 0.00, N = 3)
  a: 21960000
  b: 21990000
  1. (CXX) g++ options: -fopenmp -O3 -march=native

Llama.cpp b1808 - Model: llama-2-13b.Q4_0.gguf
Tokens Per Second, More Is Better (SE +/- 0.00, N = 3)
  a: 10.63
  b: 10.64
  1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -march=native -mtune=native -lopenblas

TensorFlow 2.12 - Device: CPU - Batch Size: 16 - Model: GoogLeNet
images/sec, More Is Better (SE +/- 0.16, N = 3)
  a: 42.90
  b: 43.29
PyTorch 2.1 - Device: CPU - Batch Size: 1 - Model: ResNet-50
batches/sec, More Is Better (SE +/- 0.14, N = 3)
  a: 35.79  (MIN: 35 / MAX: 36.54)
  b: 35.43  (MIN: 34.31 / MAX: 36.32)

TensorFlow 2.12 - Device: CPU - Batch Size: 16 - Model: AlexNet
images/sec, More Is Better (SE +/- 0.13, N = 3)
  a: 61.41
  b: 60.42

Llama.cpp b1808 - Model: llama-2-7b.Q4_0.gguf
Tokens Per Second, More Is Better (SE +/- 0.07, N = 3)
  a: 19.36
  b: 19.32
  1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -march=native -mtune=native -lopenblas

TensorFlow 2.12 - Device: CPU - Batch Size: 1 - Model: AlexNet
images/sec, More Is Better (SE +/- 0.00, N = 3)
  a: 5.35
  b: 5.35

TensorFlow 2.12 - Device: CPU - Batch Size: 1 - Model: ResNet-50
images/sec, More Is Better (SE +/- 0.01, N = 3)
  a: 6.37
  b: 6.40
Y-Cruncher 0.8.3 - Pi Digits To Calculate: 1B
Seconds, Fewer Is Better (SE +/- 0.03, N = 3)
  a: 16.37
  b: 16.39

TensorFlow 2.12 - Device: CPU - Batch Size: 1 - Model: GoogLeNet
images/sec, More Is Better (SE +/- 0.02, N = 3)
  a: 9.23
  b: 9.23

Y-Cruncher 0.8.3 - Pi Digits To Calculate: 500M
Seconds, Fewer Is Better (SE +/- 0.007, N = 3)
  a: 7.925
  b: 7.986
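This export only reports per-test values; for a rough single-number comparison of the two runs, one can take the geometric mean of the b/a ratios, inverting time-based results (such as Y-Cruncher, where fewer seconds is better). A sketch over a handful of the results above; the selection of tests is illustrative, not part of the original report:

```python
from math import prod

# b/a ratios (>1 means run b did better) for a few results above;
# Y-Cruncher reports seconds, so its ratio is inverted (a/b).
ratios = [
    6.02 / 5.99,        # pytorch: CPU - 16 - Efficientnet_v2_l
    254353 / 254122,    # speedb: Seq Fill
    8.869 / 8.630,      # rav1e: Speed 10
    16.373 / 16.394,    # y-cruncher: 1B (inverted: a/b)
]

geomean = prod(ratios) ** (1 / len(ratios))
print(f"geometric mean ratio (b vs a): {geomean:.4f}")
```

A geometric mean near 1.0, as here, is consistent with the two runs being effectively equivalent within run-to-run noise.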
Phoronix Test Suite v10.8.5