dlda Tests for a future article. Intel Core i9-10980XE testing with a ASRock X299 Steel Legend (P1.50 BIOS) and llvmpipe on Ubuntu 22.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2404061-PTS-DLDA801834&grr .
dlda Processor Motherboard Chipset Memory Disk Graphics Audio Network OS Kernel Desktop Display Server OpenGL Vulkan Compiler File-System Screen Resolution a b Intel Core i9-10980XE @ 4.80GHz (18 Cores / 36 Threads) ASRock X299 Steel Legend (P1.50 BIOS) Intel Sky Lake-E DMI3 Registers 4 x 8GB DDR4-3600MT/s Samsung SSD 970 PRO 512GB llvmpipe Realtek ALC1220 Intel I219-V + Intel I211 Ubuntu 22.04 6.5.0-18-generic (x86_64) GNOME Shell 42.2 X Server 1.21.1.4 4.5 Mesa 22.0.1 (LLVM 13.0.1 256 bits) 1.2.204 GCC 11.4.0 ext4 1024x768 OpenBenchmarking.org Kernel Details - Transparent Huge Pages: madvise Compiler Details - --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v Processor Details - Scaling Governor: intel_cpufreq schedutil - CPU Microcode: 0x5003604 Python Details - Python 3.10.12 Security Details - gather_data_sampling: Mitigation of Microcode + itlb_multihit: KVM: Mitigation of VMX disabled + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Mitigation of Clear buffers; SMT vulnerable + retbleed: Mitigation of Enhanced IBRS + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS IBPB: conditional RSB filling PBRSB-eIBRS: SW sequence + srbds: Not affected + tsx_async_abort: Mitigation of TSX disabled
dlda blender: Barbershop - CPU-Only brl-cad: VGR Performance Metric ffmpeg: libx265 - Platform ffmpeg: libx265 - Video On Demand ffmpeg: libx265 - Upload blender: Pabellon Barcelona - CPU-Only ffmpeg: libx264 - Upload blender: Classroom - CPU-Only ffmpeg: libx264 - Platform ffmpeg: libx264 - Video On Demand tensorflow: CPU - 64 - ResNet-50 ffmpeg: libx265 - Live llamafile: wizardcoder-python-34b-v1.0.Q6_K - CPU blender: Fishy Cat - CPU-Only specfem3d: Water-layered Halfspace blender: Junkshop - CPU-Only specfem3d: Layered Halfspace llamafile: mistral-7b-instruct-v0.2.Q8_0 - CPU blender: BMW27 - CPU-Only tensorflow: CPU - 32 - ResNet-50 rocksdb: Rand Fill Sync rocksdb: Rand Fill rocksdb: Update Rand rocksdb: Overwrite rocksdb: Read Rand Write Rand rocksdb: Read While Writing rocksdb: Rand Read tensorflow: CPU - 64 - GoogLeNet specfem3d: Homogeneous Halfspace tensorflow: CPU - 16 - ResNet-50 ffmpeg: libx264 - Live build-ffmpeg: Time To Compile specfem3d: Mount St. Helens specfem3d: Tomographic Model tensorflow: CPU - 32 - GoogLeNet llamafile: llava-v1.5-7b-q4 - CPU x265: Bosphorus 4K rocksdb: Seq Fill tensorflow: CPU - 64 - AlexNet build-mesa: Time To Compile x265: Bosphorus 1080p tensorflow: CPU - 32 - AlexNet tensorflow: CPU - 16 - GoogLeNet tensorflow: CPU - 16 - AlexNet tensorflow: CPU - 1 - ResNet-50 tensorflow: CPU - 1 - AlexNet tensorflow: CPU - 1 - GoogLeNet a b 931.12 211860 17.77 17.80 9.17 313.36 11.15 264.21 41.55 41.62 41.59 41.09 2.93 120.6 115.601201526 117.6 112.489970072 9.49 91.49 41.22 4670 1122751 633669 1117358 2423455 3741170 74178255 125.69 51.132054445 40.06 178.34 47.369 42.510866238 40.100782187 128.1 13.31 20.86 1307492 287.38 26.37 29.37 235.74 128.18 153.21 14.49 14.37 51.39 931.98 212029 17.72 17.78 9.08 320.82 11.18 264.21 41.63 41.63 41.56 41.20 2.93 120.75 114.462975473 115.25 111.672788927 9.51 91.37 41.2 4668 1128064 631933 1121724 2425448 3698117 73039420 125.8 50.527685606 40.1 177.16 46.095 42.465093136 40.082694716 128.31 13.4 21.39 1326770 287.56 25.529 29.29 235.64 128.02 153.76 14.55 14.39 50.57 OpenBenchmarking.org
Blender Blend File: Barbershop - Compute: CPU-Only OpenBenchmarking.org Seconds, Fewer Is Better Blender 4.1 Blend File: Barbershop - Compute: CPU-Only a b 200 400 600 800 1000 931.12 931.98
BRL-CAD VGR Performance Metric OpenBenchmarking.org VGR Performance Metric, More Is Better BRL-CAD 7.38.2 VGR Performance Metric a b 50K 100K 150K 200K 250K 211860 212029 1. (CXX) g++ options: -std=c++17 -pipe -fvisibility=hidden -fno-strict-aliasing -fno-common -fexceptions -ftemplate-depth-128 -m64 -ggdb3 -O3 -fipa-pta -fstrength-reduce -finline-functions -flto -ltcl8.6 -lnetpbm -lregex_brl -lz_brl -lassimp -ldl -lm -ltk8.6
FFmpeg Encoder: libx265 - Scenario: Platform OpenBenchmarking.org FPS, More Is Better FFmpeg 7.0 Encoder: libx265 - Scenario: Platform a b 4 8 12 16 20 17.77 17.72 1. (CXX) g++ options: -O3 -rdynamic -lpthread -lrt -ldl -lnuma
FFmpeg Encoder: libx265 - Scenario: Video On Demand OpenBenchmarking.org FPS, More Is Better FFmpeg 7.0 Encoder: libx265 - Scenario: Video On Demand a b 4 8 12 16 20 17.80 17.78 1. (CXX) g++ options: -O3 -rdynamic -lpthread -lrt -ldl -lnuma
FFmpeg Encoder: libx265 - Scenario: Upload OpenBenchmarking.org FPS, More Is Better FFmpeg 7.0 Encoder: libx265 - Scenario: Upload a b 3 6 9 12 15 9.17 9.08 1. (CXX) g++ options: -O3 -rdynamic -lpthread -lrt -ldl -lnuma
Blender Blend File: Pabellon Barcelona - Compute: CPU-Only OpenBenchmarking.org Seconds, Fewer Is Better Blender 4.1 Blend File: Pabellon Barcelona - Compute: CPU-Only a b 70 140 210 280 350 313.36 320.82
FFmpeg Encoder: libx264 - Scenario: Upload OpenBenchmarking.org FPS, More Is Better FFmpeg 7.0 Encoder: libx264 - Scenario: Upload a b 3 6 9 12 15 11.15 11.18 1. (CXX) g++ options: -O3 -rdynamic -lpthread -lrt -ldl -lnuma
Blender Blend File: Classroom - Compute: CPU-Only OpenBenchmarking.org Seconds, Fewer Is Better Blender 4.1 Blend File: Classroom - Compute: CPU-Only a b 60 120 180 240 300 264.21 264.21
FFmpeg Encoder: libx264 - Scenario: Platform OpenBenchmarking.org FPS, More Is Better FFmpeg 7.0 Encoder: libx264 - Scenario: Platform a b 9 18 27 36 45 41.55 41.63 1. (CXX) g++ options: -O3 -rdynamic -lpthread -lrt -ldl -lnuma
FFmpeg Encoder: libx264 - Scenario: Video On Demand OpenBenchmarking.org FPS, More Is Better FFmpeg 7.0 Encoder: libx264 - Scenario: Video On Demand a b 9 18 27 36 45 41.62 41.63 1. (CXX) g++ options: -O3 -rdynamic -lpthread -lrt -ldl -lnuma
TensorFlow Device: CPU - Batch Size: 64 - Model: ResNet-50 OpenBenchmarking.org images/sec, More Is Better TensorFlow 2.16.1 Device: CPU - Batch Size: 64 - Model: ResNet-50 a b 9 18 27 36 45 41.59 41.56
FFmpeg Encoder: libx265 - Scenario: Live OpenBenchmarking.org FPS, More Is Better FFmpeg 7.0 Encoder: libx265 - Scenario: Live a b 9 18 27 36 45 41.09 41.20 1. (CXX) g++ options: -O3 -rdynamic -lpthread -lrt -ldl -lnuma
Llamafile Test: wizardcoder-python-34b-v1.0.Q6_K - Acceleration: CPU OpenBenchmarking.org Tokens Per Second, More Is Better Llamafile 0.7 Test: wizardcoder-python-34b-v1.0.Q6_K - Acceleration: CPU a b 0.6593 1.3186 1.9779 2.6372 3.2965 2.93 2.93
Blender Blend File: Fishy Cat - Compute: CPU-Only OpenBenchmarking.org Seconds, Fewer Is Better Blender 4.1 Blend File: Fishy Cat - Compute: CPU-Only a b 30 60 90 120 150 120.60 120.75
SPECFEM3D Model: Water-layered Halfspace OpenBenchmarking.org Seconds, Fewer Is Better SPECFEM3D 4.1.1 Model: Water-layered Halfspace a b 30 60 90 120 150 115.60 114.46 1. (F9X) gfortran options: -O2 -fopenmp -std=f2008 -fimplicit-none -fmax-errors=10 -pedantic -pedantic-errors -O3 -finline-functions -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
Blender Blend File: Junkshop - Compute: CPU-Only OpenBenchmarking.org Seconds, Fewer Is Better Blender 4.1 Blend File: Junkshop - Compute: CPU-Only a b 30 60 90 120 150 117.60 115.25
SPECFEM3D Model: Layered Halfspace OpenBenchmarking.org Seconds, Fewer Is Better SPECFEM3D 4.1.1 Model: Layered Halfspace a b 30 60 90 120 150 112.49 111.67 1. (F9X) gfortran options: -O2 -fopenmp -std=f2008 -fimplicit-none -fmax-errors=10 -pedantic -pedantic-errors -O3 -finline-functions -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
Llamafile Test: mistral-7b-instruct-v0.2.Q8_0 - Acceleration: CPU OpenBenchmarking.org Tokens Per Second, More Is Better Llamafile 0.7 Test: mistral-7b-instruct-v0.2.Q8_0 - Acceleration: CPU a b 3 6 9 12 15 9.49 9.51
Blender Blend File: BMW27 - Compute: CPU-Only OpenBenchmarking.org Seconds, Fewer Is Better Blender 4.1 Blend File: BMW27 - Compute: CPU-Only a b 20 40 60 80 100 91.49 91.37
TensorFlow Device: CPU - Batch Size: 32 - Model: ResNet-50 OpenBenchmarking.org images/sec, More Is Better TensorFlow 2.16.1 Device: CPU - Batch Size: 32 - Model: ResNet-50 a b 9 18 27 36 45 41.22 41.20
RocksDB Test: Random Fill Sync OpenBenchmarking.org Op/s, More Is Better RocksDB 9.0 Test: Random Fill Sync a b 1000 2000 3000 4000 5000 4670 4668 1. (CXX) g++ options: -O3 -march=native -pthread -fno-builtin-memcmp -fno-rtti -lpthread
RocksDB Test: Random Fill OpenBenchmarking.org Op/s, More Is Better RocksDB 9.0 Test: Random Fill a b 200K 400K 600K 800K 1000K 1122751 1128064 1. (CXX) g++ options: -O3 -march=native -pthread -fno-builtin-memcmp -fno-rtti -lpthread
RocksDB Test: Update Random OpenBenchmarking.org Op/s, More Is Better RocksDB 9.0 Test: Update Random a b 140K 280K 420K 560K 700K 633669 631933 1. (CXX) g++ options: -O3 -march=native -pthread -fno-builtin-memcmp -fno-rtti -lpthread
RocksDB Test: Overwrite OpenBenchmarking.org Op/s, More Is Better RocksDB 9.0 Test: Overwrite a b 200K 400K 600K 800K 1000K 1117358 1121724 1. (CXX) g++ options: -O3 -march=native -pthread -fno-builtin-memcmp -fno-rtti -lpthread
RocksDB Test: Read Random Write Random OpenBenchmarking.org Op/s, More Is Better RocksDB 9.0 Test: Read Random Write Random a b 500K 1000K 1500K 2000K 2500K 2423455 2425448 1. (CXX) g++ options: -O3 -march=native -pthread -fno-builtin-memcmp -fno-rtti -lpthread
RocksDB Test: Read While Writing OpenBenchmarking.org Op/s, More Is Better RocksDB 9.0 Test: Read While Writing a b 800K 1600K 2400K 3200K 4000K 3741170 3698117 1. (CXX) g++ options: -O3 -march=native -pthread -fno-builtin-memcmp -fno-rtti -lpthread
RocksDB Test: Random Read OpenBenchmarking.org Op/s, More Is Better RocksDB 9.0 Test: Random Read a b 16M 32M 48M 64M 80M 74178255 73039420 1. (CXX) g++ options: -O3 -march=native -pthread -fno-builtin-memcmp -fno-rtti -lpthread
TensorFlow Device: CPU - Batch Size: 64 - Model: GoogLeNet OpenBenchmarking.org images/sec, More Is Better TensorFlow 2.16.1 Device: CPU - Batch Size: 64 - Model: GoogLeNet a b 30 60 90 120 150 125.69 125.80
SPECFEM3D Model: Homogeneous Halfspace OpenBenchmarking.org Seconds, Fewer Is Better SPECFEM3D 4.1.1 Model: Homogeneous Halfspace a b 12 24 36 48 60 51.13 50.53 1. (F9X) gfortran options: -O2 -fopenmp -std=f2008 -fimplicit-none -fmax-errors=10 -pedantic -pedantic-errors -O3 -finline-functions -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
TensorFlow Device: CPU - Batch Size: 16 - Model: ResNet-50 OpenBenchmarking.org images/sec, More Is Better TensorFlow 2.16.1 Device: CPU - Batch Size: 16 - Model: ResNet-50 a b 9 18 27 36 45 40.06 40.10
FFmpeg Encoder: libx264 - Scenario: Live OpenBenchmarking.org FPS, More Is Better FFmpeg 7.0 Encoder: libx264 - Scenario: Live a b 40 80 120 160 200 178.34 177.16 1. (CXX) g++ options: -O3 -rdynamic -lpthread -lrt -ldl -lnuma
Timed FFmpeg Compilation Time To Compile OpenBenchmarking.org Seconds, Fewer Is Better Timed FFmpeg Compilation 7.0 Time To Compile a b 11 22 33 44 55 47.37 46.10
SPECFEM3D Model: Mount St. Helens OpenBenchmarking.org Seconds, Fewer Is Better SPECFEM3D 4.1.1 Model: Mount St. Helens a b 10 20 30 40 50 42.51 42.47 1. (F9X) gfortran options: -O2 -fopenmp -std=f2008 -fimplicit-none -fmax-errors=10 -pedantic -pedantic-errors -O3 -finline-functions -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
SPECFEM3D Model: Tomographic Model OpenBenchmarking.org Seconds, Fewer Is Better SPECFEM3D 4.1.1 Model: Tomographic Model a b 9 18 27 36 45 40.10 40.08 1. (F9X) gfortran options: -O2 -fopenmp -std=f2008 -fimplicit-none -fmax-errors=10 -pedantic -pedantic-errors -O3 -finline-functions -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
TensorFlow Device: CPU - Batch Size: 32 - Model: GoogLeNet OpenBenchmarking.org images/sec, More Is Better TensorFlow 2.16.1 Device: CPU - Batch Size: 32 - Model: GoogLeNet a b 30 60 90 120 150 128.10 128.31
Llamafile Test: llava-v1.5-7b-q4 - Acceleration: CPU OpenBenchmarking.org Tokens Per Second, More Is Better Llamafile 0.7 Test: llava-v1.5-7b-q4 - Acceleration: CPU a b 3 6 9 12 15 13.31 13.40
x265 Video Input: Bosphorus 4K OpenBenchmarking.org Frames Per Second, More Is Better x265 3.6 Video Input: Bosphorus 4K a b 5 10 15 20 25 20.86 21.39 1. (CXX) g++ options: -O3 -rdynamic -lpthread -lrt -ldl -lnuma
RocksDB Test: Sequential Fill OpenBenchmarking.org Op/s, More Is Better RocksDB 9.0 Test: Sequential Fill a b 300K 600K 900K 1200K 1500K 1307492 1326770 1. (CXX) g++ options: -O3 -march=native -pthread -fno-builtin-memcmp -fno-rtti -lpthread
TensorFlow Device: CPU - Batch Size: 64 - Model: AlexNet OpenBenchmarking.org images/sec, More Is Better TensorFlow 2.16.1 Device: CPU - Batch Size: 64 - Model: AlexNet a b 60 120 180 240 300 287.38 287.56
Timed Mesa Compilation Time To Compile OpenBenchmarking.org Seconds, Fewer Is Better Timed Mesa Compilation 24.0 Time To Compile a b 6 12 18 24 30 26.37 25.53
x265 Video Input: Bosphorus 1080p OpenBenchmarking.org Frames Per Second, More Is Better x265 3.6 Video Input: Bosphorus 1080p a b 7 14 21 28 35 29.37 29.29 1. (CXX) g++ options: -O3 -rdynamic -lpthread -lrt -ldl -lnuma
TensorFlow Device: CPU - Batch Size: 32 - Model: AlexNet OpenBenchmarking.org images/sec, More Is Better TensorFlow 2.16.1 Device: CPU - Batch Size: 32 - Model: AlexNet a b 50 100 150 200 250 235.74 235.64
TensorFlow Device: CPU - Batch Size: 16 - Model: GoogLeNet OpenBenchmarking.org images/sec, More Is Better TensorFlow 2.16.1 Device: CPU - Batch Size: 16 - Model: GoogLeNet a b 30 60 90 120 150 128.18 128.02
TensorFlow Device: CPU - Batch Size: 16 - Model: AlexNet OpenBenchmarking.org images/sec, More Is Better TensorFlow 2.16.1 Device: CPU - Batch Size: 16 - Model: AlexNet a b 30 60 90 120 150 153.21 153.76
TensorFlow Device: CPU - Batch Size: 1 - Model: ResNet-50 OpenBenchmarking.org images/sec, More Is Better TensorFlow 2.16.1 Device: CPU - Batch Size: 1 - Model: ResNet-50 a b 4 8 12 16 20 14.49 14.55
TensorFlow Device: CPU - Batch Size: 1 - Model: AlexNet OpenBenchmarking.org images/sec, More Is Better TensorFlow 2.16.1 Device: CPU - Batch Size: 1 - Model: AlexNet a b 4 8 12 16 20 14.37 14.39
TensorFlow Device: CPU - Batch Size: 1 - Model: GoogLeNet OpenBenchmarking.org images/sec, More Is Better TensorFlow 2.16.1 Device: CPU - Batch Size: 1 - Model: GoogLeNet a b 12 24 36 48 60 51.39 50.57
Phoronix Test Suite v10.8.5