AMD Ryzen Threadripper 3970X 32-Core testing with an ASUS ROG ZENITH II EXTREME (1802 BIOS) and AMD Radeon RX 5700 8GB on Ubuntu 22.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2401113-NE-FF610899407&grs&rdt
System Details (identical for runs "a" and "b"):

Processor: AMD Ryzen Threadripper 3970X 32-Core @ 3.70GHz (32 Cores / 64 Threads)
Motherboard: ASUS ROG ZENITH II EXTREME (1802 BIOS)
Chipset: AMD Starship/Matisse
Memory: 4 x 16 GB DRAM-3600MT/s Corsair CMT64GX4M4Z3600C16
Disk: Samsung SSD 980 PRO 500GB
Graphics: AMD Radeon RX 5700 8GB (1750/875MHz)
Audio: AMD Navi 10 HDMI Audio
Monitor: ASUS VP28U
Network: Aquantia AQC107 NBase-T/IEEE + Intel I211 + Intel Wi-Fi 6 AX200
OS: Ubuntu 22.04
Kernel: 6.2.0-39-generic (x86_64)
Desktop: GNOME Shell 42.2
Display Server: X Server + Wayland
OpenGL: 4.6 Mesa 22.0.1 (LLVM 13.0.1 DRM 3.49)
Vulkan: 1.2.204
Compiler: GCC 11.4.0
File-System: ext4
Screen Resolution: 3840x2160

Kernel Details: Transparent Huge Pages: madvise
Compiler Details: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Details: Scaling Governor: acpi-cpufreq schedutil (Boost: Enabled); CPU Microcode: 0x830107a
Python Details: Python 3.10.12
Security Details: gather_data_sampling: Not affected; itlb_multihit: Not affected; l1tf: Not affected; mds: Not affected; meltdown: Not affected; mmio_stale_data: Not affected; retbleed: Mitigation of untrained return thunk, SMT enabled with STIBP protection; spec_rstack_overflow: Mitigation of safe RET; spec_store_bypass: Mitigation of SSB disabled via prctl; spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization; spectre_v2: Mitigation of Retpolines, IBPB: conditional, STIBP: always-on, RSB filling, PBRSB-eIBRS: Not affected; srbds: Not affected; tsx_async_abort: Not affected
Result Overview (runs a and b; units and test directions are given in the detailed results below):

Test                                           a              b
tensorflow: CPU - 1 - VGG-16                   2.07           2.01
rav1e: 10                                      8.630          8.869
pytorch: CPU - 1 - ResNet-152                  13.95          14.19
tensorflow: CPU - 16 - AlexNet                 61.41          60.42
pytorch: CPU - 16 - ResNet-50                  30.68          31.16
speedb: Rand Read                              129495776      131475734
tensorflow: CPU - 16 - VGG-16                  6.90           6.81
rav1e: 6                                       3.733          3.78
pytorch: CPU - 16 - ResNet-152                 11.74          11.88
pytorch: CPU - 1 - ResNet-50                   35.79          35.43
tensorflow: CPU - 16 - GoogLeNet               42.90          43.29
speedb: Read Rand Write Rand                   2422144        2403103
speedb: Read While Writing                     7991264        8052776
y-cruncher: 500M                               7.925          7.986
cachebench: Read / Modify / Write              119470.658017  118721.395337
pytorch: CPU - 16 - Efficientnet_v2_l          5.99           6.02
tensorflow: CPU - 1 - ResNet-50                6.37           6.4
tensorflow: CPU - 16 - ResNet-50               12.54          12.59
cachebench: Write                              61568.843405   61337.271973
rav1e: 1                                       0.836          0.839
speedb: Update Rand                            241710         240957
cachebench: Read                               11066.955437   11033.02931
quicksilver: CTS2                              20786667       20850000
llama-cpp: llama-2-7b.Q4_0.gguf                19.36          19.32
rav1e: 5                                       2.813          2.809
quicksilver: CORAL2 P1                         21960000       21990000
y-cruncher: 1B                                 16.373         16.394
pytorch: CPU - 1 - Efficientnet_v2_l           7.90           7.89
llama-cpp: llama-2-13b.Q4_0.gguf               10.63          10.64
speedb: Seq Fill                               254122         254353
speedb: Rand Fill Sync                         5618           5613
speedb: Rand Fill                              252455         252251
llama-cpp: llama-2-70b-chat.Q5_0.gguf          1.86           1.86
tensorflow: CPU - 1 - GoogLeNet                9.23           9.23
tensorflow: CPU - 1 - AlexNet                  5.35           5.35
quicksilver: CORAL2 P2                         28950000       28950000
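To compare the two runs, the per-test delta can be computed directly from the overview values. A minimal illustrative sketch (the test selection here is arbitrary; values are taken from the table above, and whether a positive delta is an improvement depends on each test's "more/fewer is better" direction):

```python
# Percent change of run "b" relative to run "a" for a few results
# from the overview table above.
results = {
    # test name: (a, b)
    "speedb: Rand Read (Op/s)": (129495776, 131475734),
    "tensorflow: CPU - 16 - AlexNet (images/sec)": (61.41, 60.42),
    "y-cruncher: 500M (seconds)": (7.925, 7.986),
}

def pct_diff(a, b):
    """Signed percent change of b relative to a."""
    return (b - a) / a * 100.0

for name, (a, b) in results.items():
    print(f"{name}: {pct_diff(a, b):+.2f}%")
```

All deltas in this result file are within roughly two percent, consistent with run-to-run noise between two identically configured systems.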
Detailed Results:

TensorFlow 2.12 - Device: CPU - Batch Size: 1 - Model: VGG-16 (images/sec, more is better): a: 2.07, b: 2.01 (SE +/- 0.01, N = 3)

rav1e 0.7 - Speed: 10 (Frames Per Second, more is better): a: 8.630, b: 8.869 (SE +/- 0.075, N = 3)

PyTorch 2.1 - Device: CPU - Batch Size: 1 - Model: ResNet-152 (batches/sec, more is better): a: 13.95 (min 13.59 / max 14.24), b: 14.19 (min 14.01 / max 14.36) (SE +/- 0.08, N = 3)

TensorFlow 2.12 - Device: CPU - Batch Size: 16 - Model: AlexNet (images/sec, more is better): a: 61.41, b: 60.42 (SE +/- 0.13, N = 3)

PyTorch 2.1 - Device: CPU - Batch Size: 16 - Model: ResNet-50 (batches/sec, more is better): a: 30.68 (min 30 / max 31.31), b: 31.16 (min 30.47 / max 31.36) (SE +/- 0.17, N = 3)
Speedb 2.7 - Test: Random Read (Op/s, more is better): a: 129495776, b: 131475734 (SE +/- 1683044.27, N = 3). (CXX) g++ options: -O3 -march=native -pthread -fno-builtin-memcmp -fno-rtti -lpthread

TensorFlow 2.12 - Device: CPU - Batch Size: 16 - Model: VGG-16 (images/sec, more is better): a: 6.90, b: 6.81 (SE +/- 0.01, N = 3)

rav1e 0.7 - Speed: 6 (Frames Per Second, more is better): a: 3.733, b: 3.780 (SE +/- 0.017, N = 3)

PyTorch 2.1 - Device: CPU - Batch Size: 16 - Model: ResNet-152 (batches/sec, more is better): a: 11.74 (min 11.43 / max 11.97), b: 11.88 (min 11.8 / max 11.95) (SE +/- 0.05, N = 3)

PyTorch 2.1 - Device: CPU - Batch Size: 1 - Model: ResNet-50 (batches/sec, more is better): a: 35.79 (min 35 / max 36.54), b: 35.43 (min 34.31 / max 36.32) (SE +/- 0.14, N = 3)

TensorFlow 2.12 - Device: CPU - Batch Size: 16 - Model: GoogLeNet (images/sec, more is better): a: 42.90, b: 43.29 (SE +/- 0.16, N = 3)
Speedb 2.7 - Test: Read Random Write Random (Op/s, more is better): a: 2422144, b: 2403103 (SE +/- 3556.84, N = 3). (CXX) g++ options: -O3 -march=native -pthread -fno-builtin-memcmp -fno-rtti -lpthread

Speedb 2.7 - Test: Read While Writing (Op/s, more is better): a: 7991264, b: 8052776 (SE +/- 69727.75, N = 3). (CXX) g++ options: -O3 -march=native -pthread -fno-builtin-memcmp -fno-rtti -lpthread

Y-Cruncher 0.8.3 - Pi Digits To Calculate: 500M (Seconds, fewer is better): a: 7.925, b: 7.986 (SE +/- 0.007, N = 3)

CacheBench - Test: Read / Modify / Write (MB/s, more is better): a: 119470.66 (min 97982.3 / max 130920.18), b: 118721.40 (min 99330.52 / max 130732.06) (SE +/- 81.20, N = 3). (CC) gcc options: -O3 -lrt

PyTorch 2.1 - Device: CPU - Batch Size: 16 - Model: Efficientnet_v2_l (batches/sec, more is better): a: 5.99 (min 5.9 / max 6.06), b: 6.02 (min 5.98 / max 6.06) (SE +/- 0.02, N = 3)
TensorFlow 2.12 - Device: CPU - Batch Size: 1 - Model: ResNet-50 (images/sec, more is better): a: 6.37, b: 6.40 (SE +/- 0.01, N = 3)

TensorFlow 2.12 - Device: CPU - Batch Size: 16 - Model: ResNet-50 (images/sec, more is better): a: 12.54, b: 12.59 (SE +/- 0.08, N = 3)

CacheBench - Test: Write (MB/s, more is better): a: 61568.84 (min 40788.42 / max 66208.33), b: 61337.27 (min 41835.46 / max 65734.93) (SE +/- 29.06, N = 3). (CC) gcc options: -O3 -lrt

rav1e 0.7 - Speed: 1 (Frames Per Second, more is better): a: 0.836, b: 0.839 (SE +/- 0.001, N = 3)

Speedb 2.7 - Test: Update Random (Op/s, more is better): a: 241710, b: 240957 (SE +/- 580.55, N = 3). (CXX) g++ options: -O3 -march=native -pthread -fno-builtin-memcmp -fno-rtti -lpthread

CacheBench - Test: Read (MB/s, more is better): a: 11066.96 (min 10956.53 / max 11112.67), b: 11033.03 (min 11002.15 / max 11058.91) (SE +/- 33.97, N = 3). (CC) gcc options: -O3 -lrt
Quicksilver 20230818 - Input: CTS2 (Figure Of Merit, more is better): a: 20786667, b: 20850000 (SE +/- 37564.76, N = 3). (CXX) g++ options: -fopenmp -O3 -march=native

Llama.cpp b1808 - Model: llama-2-7b.Q4_0.gguf (Tokens Per Second, more is better): a: 19.36, b: 19.32 (SE +/- 0.07, N = 3). (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -march=native -mtune=native -lopenblas

rav1e 0.7 - Speed: 5 (Frames Per Second, more is better): a: 2.813, b: 2.809 (SE +/- 0.005, N = 3)

Quicksilver 20230818 - Input: CORAL2 P1 (Figure Of Merit, more is better): a: 21960000, b: 21990000 (SE +/- 0.00, N = 3). (CXX) g++ options: -fopenmp -O3 -march=native

Y-Cruncher 0.8.3 - Pi Digits To Calculate: 1B (Seconds, fewer is better): a: 16.37, b: 16.39 (SE +/- 0.03, N = 3)

PyTorch 2.1 - Device: CPU - Batch Size: 1 - Model: Efficientnet_v2_l (batches/sec, more is better): a: 7.90 (min 7.78 / max 8.01), b: 7.89 (min 7.8 / max 7.97) (SE +/- 0.01, N = 3)
Llama.cpp b1808 - Model: llama-2-13b.Q4_0.gguf (Tokens Per Second, more is better): a: 10.63, b: 10.64 (SE +/- 0.00, N = 3). (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -march=native -mtune=native -lopenblas

Speedb 2.7 - Test: Sequential Fill (Op/s, more is better): a: 254122, b: 254353 (SE +/- 336.51, N = 3). (CXX) g++ options: -O3 -march=native -pthread -fno-builtin-memcmp -fno-rtti -lpthread

Speedb 2.7 - Test: Random Fill Sync (Op/s, more is better): a: 5618, b: 5613 (SE +/- 49.65, N = 3). (CXX) g++ options: -O3 -march=native -pthread -fno-builtin-memcmp -fno-rtti -lpthread

Speedb 2.7 - Test: Random Fill (Op/s, more is better): a: 252455, b: 252251 (SE +/- 91.82, N = 3). (CXX) g++ options: -O3 -march=native -pthread -fno-builtin-memcmp -fno-rtti -lpthread

Llama.cpp b1808 - Model: llama-2-70b-chat.Q5_0.gguf (Tokens Per Second, more is better): a: 1.86, b: 1.86 (SE +/- 0.00, N = 3). (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -march=native -mtune=native -lopenblas

TensorFlow 2.12 - Device: CPU - Batch Size: 1 - Model: GoogLeNet (images/sec, more is better): a: 9.23, b: 9.23 (SE +/- 0.02, N = 3)

TensorFlow 2.12 - Device: CPU - Batch Size: 1 - Model: AlexNet (images/sec, more is better): a: 5.35, b: 5.35 (SE +/- 0.00, N = 3)

Quicksilver 20230818 - Input: CORAL2 P2 (Figure Of Merit, more is better): a: 28950000, b: 28950000 (SE +/- 37859.39, N = 3). (CXX) g++ options: -fopenmp -O3 -march=native
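Each "SE +/-" figure above is the standard error of the mean over N = 3 runs of that test. The raw per-run samples are not included in this export, so the sketch below uses hypothetical timings purely to show how such a value is derived:

```python
import statistics

def standard_error(samples):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(samples) / len(samples) ** 0.5

# Hypothetical raw timings for one test across three runs
# (illustrative only -- not taken from this result file):
samples = [7.91, 7.93, 7.95]
print(f"mean = {statistics.mean(samples):.3f}, SE +/- {standard_error(samples):.3f}")
```

A small SE relative to the mean, as seen throughout these results, indicates the three runs of each test were tightly clustered.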
Phoronix Test Suite v10.8.5