Intel Core i9-13900K testing with an ASUS TUF GAMING Z790-PRO WIFI (1630 BIOS) and ASUS NVIDIA GeForce RTX 4070 Ti SUPER 16GB on EndeavourOS rolling via the Phoronix Test Suite.
NVIDIA RTX 4070 SUPER
Processor: Intel Core i9-13900K @ 5.50GHz (24 Cores / 32 Threads), Motherboard: ASUS TUF GAMING Z790-PRO WIFI (1401 BIOS), Chipset: Intel Device 7a27, Memory: 32GB, Disk: 4001GB Seagate ZP4000GP304001, Graphics: ASUS NVIDIA GeForce RTX 4070 SUPER 12GB, Audio: Realtek ALC1220, Monitor: ARZOPA, Network: Intel I226-V + Intel Device 7a70
OS: EndeavourOS rolling, Kernel: 6.7.1-arch1-1 (x86_64), Desktop: KDE Plasma 5.27.10, Display Server: X Server 1.21.1.11, Display Driver: NVIDIA 550.40.07, OpenGL: 4.6.0, OpenCL: OpenCL 3.0 CUDA 12.4.74, Compiler: GCC 13.2.1 20230801, File-System: ext4, Screen Resolution: 1920x1080
Kernel Notes: Transparent Huge Pages: always
Compiler Notes: --disable-libssp --disable-libstdcxx-pch --disable-werror --enable-__cxa_atexit --enable-bootstrap --enable-cet=auto --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-default-ssp --enable-gnu-indirect-function --enable-gnu-unique-object --enable-languages=ada,c,c++,d,fortran,go,lto,objc,obj-c++ --enable-libstdcxx-backtrace --enable-link-serialization=1 --enable-lto --enable-multilib --enable-plugin --enable-shared --enable-threads=posix --mandir=/usr/share/man --with-build-config=bootstrap-lto --with-linker-hash-style=gnu
Processor Notes: Scaling Governor: intel_pstate powersave (EPP: balance_performance) - CPU Microcode: 0x11d
Graphics Notes: BAR1 / Visible vRAM Size: 16384 MiB - vBIOS Version: 95.04.69.00.c1
Security Notes: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS IBPB: conditional RSB filling PBRSB-eIBRS: SW sequence + srbds: Not affected + tsx_async_abort: Not affected
NVIDIA RTX 4070
Processor: Intel Core i9-13900K @ 5.50GHz (24 Cores / 32 Threads), Motherboard: ASUS TUF GAMING Z790-PRO WIFI (1401 BIOS), Chipset: Intel Device 7a27, Memory: 32GB, Disk: 4001GB Seagate ZP4000GP304001, Graphics: MSI NVIDIA GeForce RTX 4070 12GB, Audio: Realtek ALC1220, Monitor: ARZOPA, Network: Intel I226-V + Intel Device 7a70
OS: EndeavourOS rolling, Kernel: 6.7.1-arch1-1 (x86_64), Desktop: KDE Plasma 5.27.10, Display Server: X Server 1.21.1.11, Display Driver: NVIDIA 550.40.07, OpenGL: 4.6.0, OpenCL: OpenCL 3.0 CUDA 12.4.74, Compiler: GCC 13.2.1 20230801 + CUDA 12.3, File-System: ext4, Screen Resolution: 1920x1080
Kernel Notes: Transparent Huge Pages: always
Environment Notes: NVCC_PREPEND_FLAGS="-ccbin /opt/cuda/bin"
Compiler Notes: --disable-libssp --disable-libstdcxx-pch --disable-werror --enable-__cxa_atexit --enable-bootstrap --enable-cet=auto --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-default-ssp --enable-gnu-indirect-function --enable-gnu-unique-object --enable-languages=ada,c,c++,d,fortran,go,lto,objc,obj-c++ --enable-libstdcxx-backtrace --enable-link-serialization=1 --enable-lto --enable-multilib --enable-plugin --enable-shared --enable-threads=posix --mandir=/usr/share/man --with-build-config=bootstrap-lto --with-linker-hash-style=gnu
Processor Notes: Scaling Governor: intel_pstate powersave (EPP: balance_performance) - CPU Microcode: 0x11d
Graphics Notes: BAR1 / Visible vRAM Size: 16384 MiB - vBIOS Version: 95.04.3e.40.2a
Python Notes: Python 3.11.6
Security Notes: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS IBPB: conditional RSB filling PBRSB-eIBRS: SW sequence + srbds: Not affected + tsx_async_abort: Not affected
NVIDIA RTX 4070 TI
Changed Graphics to NVIDIA GeForce RTX 4070 Ti 12GB.
Graphics Change: BAR1 / Visible vRAM Size: 16384 MiB - vBIOS Version: 95.04.31.00.36
NVIDIA RTX 3090
Processor: Intel Core i9-13900K @ 5.50GHz (24 Cores / 32 Threads), Motherboard: ASUS TUF GAMING Z790-PRO WIFI (1401 BIOS), Chipset: Intel Device 7a27, Memory: 32GB, Disk: 4001GB Seagate ZP4000GP304001, Graphics: NVIDIA GeForce RTX 3090 24GB, Audio: Realtek ALC1220, Monitor: PI-KVM Video, Network: Intel I226-V + Intel Device 7a70
OS: EndeavourOS rolling, Kernel: 6.7.4-arch1-1 (x86_64), Desktop: KDE Plasma 5.27.10, Display Server: X Server 1.21.1.11, Display Driver: NVIDIA 550.40.07, OpenGL: 4.6.0, OpenCL: OpenCL 3.0 CUDA 12.4.74, Compiler: GCC 13.2.1 20230801 + CUDA 12.3, File-System: ext4, Screen Resolution: 1920x1080
Kernel Notes: Transparent Huge Pages: always
Environment Notes: NVCC_PREPEND_FLAGS="-ccbin /opt/cuda/bin"
Compiler Notes: --disable-libssp --disable-libstdcxx-pch --disable-werror --enable-__cxa_atexit --enable-bootstrap --enable-cet=auto --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-default-ssp --enable-gnu-indirect-function --enable-gnu-unique-object --enable-languages=ada,c,c++,d,fortran,go,lto,m2,objc,obj-c++ --enable-libstdcxx-backtrace --enable-link-serialization=1 --enable-lto --enable-multilib --enable-plugin --enable-shared --enable-threads=posix --mandir=/usr/share/man --with-build-config=bootstrap-lto --with-linker-hash-style=gnu
Processor Notes: Scaling Governor: intel_pstate powersave (EPP: balance_performance) - CPU Microcode: 0x11d
Graphics Notes: BAR1 / Visible vRAM Size: 256 MiB - vBIOS Version: 94.02.26.08.ba
Python Notes: Python 3.11.6
Security Notes: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS IBPB: conditional RSB filling PBRSB-eIBRS: SW sequence + srbds: Not affected + tsx_async_abort: Not affected
NVIDIA RTX 4070 TI SUPER
Processor: Intel Core i9-13900K @ 5.50GHz (24 Cores / 32 Threads), Motherboard: ASUS TUF GAMING Z790-PRO WIFI (1630 BIOS), Chipset: Intel Raptor Lake-S PCH, Memory: 32GB, Disk: 4001GB Seagate ZP4000GP304001 + 0GB CD-ROM Drive, Graphics: ASUS NVIDIA GeForce RTX 4070 Ti SUPER 16GB, Audio: Realtek ALC1220, Monitor: PI-KVM Video, Network: Intel I226-V + Intel Raptor Lake-S PCH CNVi WiFi
OS: EndeavourOS rolling, Kernel: 6.7.4-arch1-1 (x86_64), Desktop: KDE Plasma 5.27.10, Display Server: X Server 1.21.1.11, Display Driver: NVIDIA 550.40.07, OpenGL: 4.6.0, OpenCL: OpenCL 2.1 AMD-APP (3602.0) + OpenCL 3.0 CUDA 12.4.74, Compiler: GCC 13.2.1 20230801 + CUDA 12.3, File-System: ext4, Screen Resolution: 1920x1080
Kernel Notes: Transparent Huge Pages: always
Environment Notes: NVCC_PREPEND_FLAGS="-ccbin /opt/cuda/bin"
Compiler Notes: --disable-libssp --disable-libstdcxx-pch --disable-werror --enable-__cxa_atexit --enable-bootstrap --enable-cet=auto --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-default-ssp --enable-gnu-indirect-function --enable-gnu-unique-object --enable-languages=ada,c,c++,d,fortran,go,lto,m2,objc,obj-c++ --enable-libstdcxx-backtrace --enable-link-serialization=1 --enable-lto --enable-multilib --enable-plugin --enable-shared --enable-threads=posix --mandir=/usr/share/man --with-build-config=bootstrap-lto --with-linker-hash-style=gnu
Processor Notes: Scaling Governor: intel_pstate powersave (EPP: balance_performance) - CPU Microcode: 0x11f
Graphics Notes: BAR1 / Visible vRAM Size: 256 MiB - vBIOS Version: 95.03.45.00.c5
Python Notes: Python 3.11.7
Security Notes: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS IBPB: conditional RSB filling PBRSB-eIBRS: SW sequence + srbds: Not affected + tsx_async_abort: Not affected
PyTorch This is a benchmark of PyTorch making use of pytorch-benchmark [https://github.com/LukasHedegaard/pytorch-benchmark]. Currently this test profile is catered to CPU-based testing. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org batches/sec, More Is Better PyTorch 2.1 Device: NVIDIA CUDA GPU - Batch Size: 1 - Model: ResNet-50 NVIDIA RTX 4070 TI SUPER NVIDIA RTX 4070 SUPER NVIDIA RTX 4070 NVIDIA RTX 4070 TI NVIDIA RTX 3090 120 240 360 480 600 SE +/- 3.07, N = 3 SE +/- 3.09, N = 3 SE +/- 11.16, N = 12 558.82 557.73 546.76 535.39 525.12 MIN: 473.77 / MAX: 573.46 MIN: 513.63 / MAX: 563.37 MIN: 195.25 / MAX: 556.94 MIN: 428.43 / MAX: 572.99 MIN: 458.54 / MAX: 542.46
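Each chart reports a standard error ("SE +/-") and a run count ("N") alongside the mean result. As a minimal sketch of how such a figure can be derived, the Python snippet below computes the standard error of the mean from per-run throughputs. The per-run values are hypothetical, since this export lists only aggregates, and the use of the sample standard deviation (n - 1 denominator) is an assumption rather than a confirmed detail of the Phoronix Test Suite.

```python
import math
import statistics


def standard_error(samples):
    """Standard error of the mean: sample standard deviation / sqrt(N)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two runs to estimate a standard error")
    return statistics.stdev(samples) / math.sqrt(n)


# Hypothetical per-run throughputs in batches/sec. These are NOT taken from
# the result file, which only reports the mean, SE and N for each device.
runs = [553.0, 559.0, 565.0]

mean = statistics.fmean(runs)
se = standard_error(runs)
print(f"{mean:.2f} batches/sec, SE +/- {se:.2f}, N = {len(runs)}")
```

A large SE combined with a high N, as in some of the charts here, typically indicates that the harness kept adding runs because the early runs disagreed.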
OpenBenchmarking.org batches/sec, More Is Better PyTorch 2.1 Device: NVIDIA CUDA GPU - Batch Size: 1 - Model: ResNet-152 NVIDIA RTX 4070 SUPER NVIDIA RTX 4070 TI NVIDIA RTX 4070 TI SUPER NVIDIA RTX 4070 NVIDIA RTX 3090 40 80 120 160 200 SE +/- 0.73, N = 3 SE +/- 0.38, N = 3 SE +/- 0.36, N = 3 SE +/- 0.09, N = 2 201.94 201.19 200.46 198.18 197.12 MIN: 183.53 / MAX: 206.5 MIN: 180.79 / MAX: 203.92 MIN: 177.25 / MAX: 203.31 MIN: 181.27 / MAX: 200.06 MIN: 137.37 / MAX: 198.9
OpenBenchmarking.org batches/sec, More Is Better PyTorch 2.1 Device: NVIDIA CUDA GPU - Batch Size: 16 - Model: ResNet-50 NVIDIA RTX 4070 TI SUPER NVIDIA RTX 4070 SUPER NVIDIA RTX 4070 TI NVIDIA RTX 4070 NVIDIA RTX 3090 120 240 360 480 600 SE +/- 1.33, N = 3 SE +/- 2.23, N = 3 SE +/- 0.26, N = 3 SE +/- 0.89, N = 2 531.96 509.45 502.92 458.39 419.76 MIN: 422.98 / MAX: 539.81 MIN: 430.1 / MAX: 516.48 MIN: 415.65 / MAX: 520.39 MIN: 404.5 / MAX: 461.01 MIN: 376.2 / MAX: 422.17
OpenBenchmarking.org batches/sec, More Is Better PyTorch 2.1 Device: NVIDIA CUDA GPU - Batch Size: 32 - Model: ResNet-50 NVIDIA RTX 4070 TI SUPER NVIDIA RTX 4070 TI NVIDIA RTX 4070 SUPER NVIDIA RTX 4070 NVIDIA RTX 3090 120 240 360 480 600 SE +/- 0.70, N = 3 SE +/- 1.69, N = 3 SE +/- 2.17, N = 2 SE +/- 0.13, N = 2 532.77 505.55 501.50 459.94 420.29 MIN: 420.31 / MAX: 538.98 MIN: 419.93 / MAX: 512.69 MIN: 415.94 / MAX: 510.69 MIN: 403.65 / MAX: 462.59 MIN: 376.81 / MAX: 421.58
OpenBenchmarking.org batches/sec, More Is Better PyTorch 2.1 Device: NVIDIA CUDA GPU - Batch Size: 64 - Model: ResNet-50 NVIDIA RTX 4070 TI SUPER NVIDIA RTX 4070 SUPER NVIDIA RTX 4070 TI NVIDIA RTX 4070 NVIDIA RTX 3090 110 220 330 440 550 SE +/- 1.58, N = 3 SE +/- 0.92, N = 3 SE +/- 1.92, N = 3 SE +/- 0.27, N = 3 SE +/- 0.24, N = 3 527.82 507.45 505.62 458.36 419.03 MIN: 419.39 / MAX: 534.44 MIN: 423.41 / MAX: 512.88 MIN: 426.6 / MAX: 513.25 MIN: 404.89 / MAX: 461.01 MIN: 376 / MAX: 422
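To compare cards more directly, the means can be normalized against one device. The sketch below does this for the ResNet-50, batch size 64 chart above, using the RTX 3090 as the baseline. It assumes the result values appear in the same order as the device names in the flattened chart text; the helper name `relative_to` is illustrative only.

```python
# Mean throughput (batches/sec) from the ResNet-50, batch size 64 chart,
# assuming values are listed in the same order as the device names.
results = {
    "NVIDIA RTX 4070 TI SUPER": 527.82,
    "NVIDIA RTX 4070 SUPER": 507.45,
    "NVIDIA RTX 4070 TI": 505.62,
    "NVIDIA RTX 4070": 458.36,
    "NVIDIA RTX 3090": 419.03,
}


def relative_to(results, baseline):
    """Return each result divided by the baseline device's result."""
    base = results[baseline]
    return {name: value / base for name, value in results.items()}


relative = relative_to(results, "NVIDIA RTX 3090")
for name, ratio in sorted(relative.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<26} {ratio:.3f}x")
```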
OpenBenchmarking.org batches/sec, More Is Better PyTorch 2.1 Device: NVIDIA CUDA GPU - Batch Size: 16 - Model: ResNet-152 NVIDIA RTX 4070 TI SUPER NVIDIA RTX 4070 SUPER NVIDIA RTX 4070 TI NVIDIA RTX 4070 NVIDIA RTX 3090 40 80 120 160 200 SE +/- 0.33, N = 3 SE +/- 0.29, N = 3 198.58 195.40 194.29 187.26 164.14 MIN: 183.91 / MAX: 201.98 MIN: 186.09 / MAX: 197.7 MIN: 182.25 / MAX: 197.39 MIN: 179.81 / MAX: 188.21 MIN: 145.67 / MAX: 165.38
OpenBenchmarking.org batches/sec, More Is Better PyTorch 2.1 Device: NVIDIA CUDA GPU - Batch Size: 256 - Model: ResNet-50 NVIDIA RTX 4070 TI SUPER NVIDIA RTX 4070 SUPER NVIDIA RTX 4070 NVIDIA RTX 3090 110 220 330 440 550 SE +/- 0.54, N = 3 SE +/- 1.39, N = 3 SE +/- 0.34, N = 3 SE +/- 0.14, N = 3 529.14 504.67 459.93 416.89 MIN: 414.54 / MAX: 534.65 MIN: 412.34 / MAX: 514.07 MIN: 403.65 / MAX: 462.74 MIN: 329.77 / MAX: 420.82
Device: NVIDIA CUDA GPU - Batch Size: 256 - Model: ResNet-50
NVIDIA RTX 4070 TI: The test quit with a non-zero exit status. The test quit with a non-zero exit status. The test quit with a non-zero exit status. E: TypeError: 'NoneType' object is not callable
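Because the charts in this export are flattened into single lines, it can help to recover them as structured records. The following is a rough sketch, not a Phoronix Test Suite tool: it parses the batch size 256 ResNet-50 chart above, assuming that the means follow the last "N = ..." token and that means and MIN/MAX pairs are listed in device order. The SE entries are skipped because they cannot always be matched to a device.

```python
import re

# Longest names first so "4070 TI SUPER" is not matched as "4070 TI" or "4070".
NAME_RE = re.compile(r"NVIDIA RTX (?:4070 TI SUPER|4070 SUPER|4070 TI|4070|3090)")
MINMAX_RE = re.compile(r"MIN: ([\d.]+) / MAX: ([\d.]+)")

LINE = (
    "OpenBenchmarking.org batches/sec, More Is Better PyTorch 2.1 "
    "Device: NVIDIA CUDA GPU - Batch Size: 256 - Model: ResNet-50 "
    "NVIDIA RTX 4070 TI SUPER NVIDIA RTX 4070 SUPER NVIDIA RTX 4070 "
    "NVIDIA RTX 3090 110 220 330 440 550 "
    "SE +/- 0.54, N = 3 SE +/- 1.39, N = 3 SE +/- 0.34, N = 3 SE +/- 0.14, N = 3 "
    "529.14 504.67 459.93 416.89 "
    "MIN: 414.54 / MAX: 534.65 MIN: 412.34 / MAX: 514.07 "
    "MIN: 403.65 / MAX: 462.74 MIN: 329.77 / MAX: 420.82"
)


def parse_chart(line):
    """Split one flattened chart line into per-device records."""
    names = NAME_RE.findall(line)
    ranges = MINMAX_RE.findall(line)
    # The means sit between the last "N = <count>" and the first "MIN:".
    head = line.split("MIN:", 1)[0]
    tail = re.split(r"N = \d+", head)[-1]
    means = [float(token) for token in tail.split()]
    if not (len(names) == len(means) == len(ranges)):
        raise ValueError("device, mean and MIN/MAX counts do not match")
    return [
        {"device": name, "mean": mean, "min": float(lo), "max": float(hi)}
        for name, mean, (lo, hi) in zip(names, means, ranges)
    ]


for row in parse_chart(LINE):
    print(row)
```

The count check makes the parser fail loudly on charts that do not follow this layout instead of silently misaligning values.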
OpenBenchmarking.org batches/sec, More Is Better PyTorch 2.1 Device: NVIDIA CUDA GPU - Batch Size: 32 - Model: ResNet-152 NVIDIA RTX 4070 TI NVIDIA RTX 4070 TI SUPER NVIDIA RTX 4070 SUPER NVIDIA RTX 4070 NVIDIA RTX 3090 40 80 120 160 200 SE +/- 0.28, N = 3 SE +/- 0.29, N = 3 198.82 197.82 195.39 187.69 163.74 MIN: 188.33 / MAX: 201.47 MIN: 176.19 / MAX: 201.63 MIN: 183.94 / MAX: 198.7 MIN: 182.03 / MAX: 188.31 MIN: 144.93 / MAX: 165.03
OpenBenchmarking.org batches/sec, More Is Better PyTorch 2.1 Device: NVIDIA CUDA GPU - Batch Size: 512 - Model: ResNet-50 NVIDIA RTX 4070 TI SUPER NVIDIA RTX 4070 TI NVIDIA RTX 4070 SUPER NVIDIA RTX 4070 NVIDIA RTX 3090 110 220 330 440 550 SE +/- 1.16, N = 3 SE +/- 0.83, N = 2 SE +/- 4.43, N = 2 SE +/- 0.43, N = 2 SE +/- 0.40, N = 3 529.49 504.66 504.27 459.27 416.20 MIN: 410.12 / MAX: 537.25 MIN: 424.27 / MAX: 509.08 MIN: 418.22 / MAX: 512.44 MIN: 405.48 / MAX: 461.88 MIN: 355.45 / MAX: 419.05
OpenBenchmarking.org batches/sec, More Is Better PyTorch 2.1 Device: NVIDIA CUDA GPU - Batch Size: 64 - Model: ResNet-152 NVIDIA RTX 4070 TI NVIDIA RTX 4070 TI SUPER NVIDIA RTX 4070 SUPER NVIDIA RTX 4070 NVIDIA RTX 3090 40 80 120 160 200 SE +/- 0.78, N = 2 SE +/- 0.20, N = 3 SE +/- 0.51, N = 3 SE +/- 0.34, N = 3 197.02 196.50 196.07 186.63 164.14 MIN: 183.92 / MAX: 200.54 MIN: 179.34 / MAX: 200 MIN: 171.95 / MAX: 199.96 MIN: 180.51 / MAX: 187.79 MIN: 149 / MAX: 165
OpenBenchmarking.org batches/sec, More Is Better PyTorch 2.1 Device: NVIDIA CUDA GPU - Batch Size: 256 - Model: ResNet-152 NVIDIA RTX 4070 TI SUPER NVIDIA RTX 4070 TI NVIDIA RTX 4070 SUPER NVIDIA RTX 4070 NVIDIA RTX 3090 40 80 120 160 200 SE +/- 0.95, N = 3 SE +/- 0.19, N = 2 SE +/- 1.14, N = 2 SE +/- 0.17, N = 3 198.70 195.86 194.58 187.27 161.01 MIN: 185.21 / MAX: 203.36 MIN: 181.64 / MAX: 199.2 MIN: 183.74 / MAX: 198.52 MIN: 179.9 / MAX: 188.08 MIN: 138.12 / MAX: 165.16
OpenBenchmarking.org batches/sec, More Is Better PyTorch 2.1 Device: NVIDIA CUDA GPU - Batch Size: 512 - Model: ResNet-152 NVIDIA RTX 4070 TI SUPER NVIDIA RTX 4070 SUPER NVIDIA RTX 4070 TI NVIDIA RTX 4070 NVIDIA RTX 3090 40 80 120 160 200 SE +/- 0.81, N = 3 SE +/- 1.38, N = 2 SE +/- 0.05, N = 3 SE +/- 0.33, N = 2 198.01 195.30 194.87 187.51 164.35 MIN: 185.3 / MAX: 202.59 MIN: 182 / MAX: 199.43 MIN: 180.8 / MAX: 198 MIN: 181.57 / MAX: 188.05 MIN: 149.91 / MAX: 166.09
OpenBenchmarking.org batches/sec, More Is Better PyTorch 2.1 Device: NVIDIA CUDA GPU - Batch Size: 1 - Model: Efficientnet_v2_l NVIDIA RTX 4070 TI NVIDIA RTX 4070 NVIDIA RTX 4070 SUPER NVIDIA RTX 4070 TI SUPER NVIDIA RTX 3090 20 40 60 80 100 SE +/- 0.55, N = 3 SE +/- 0.24, N = 2 SE +/- 0.33, N = 3 108.59 107.59 106.37 105.86 105.55 MIN: 99.04 / MAX: 110.68 MIN: 98.77 / MAX: 109.43 MIN: 97.91 / MAX: 108.16 MIN: 95.05 / MAX: 107.6 MIN: 91.76 / MAX: 107.42
OpenBenchmarking.org batches/sec, More Is Better PyTorch 2.1 Device: NVIDIA CUDA GPU - Batch Size: 16 - Model: Efficientnet_v2_l NVIDIA RTX 4070 NVIDIA RTX 4070 TI SUPER NVIDIA RTX 4070 TI NVIDIA RTX 3090 20 40 60 80 100 SE +/- 0.33, N = 3 SE +/- 0.52, N = 2 SE +/- 0.53, N = 2 103.68 103.66 103.45 98.11 MIN: 96.86 / MAX: 105.56 MIN: 93.46 / MAX: 105.95 MIN: 95.22 / MAX: 105.88 MIN: 89.88 / MAX: 100.25
Device: NVIDIA CUDA GPU - Batch Size: 16 - Model: Efficientnet_v2_l
NVIDIA RTX 4070 SUPER: The test quit with a non-zero exit status. The test quit with a non-zero exit status. The test quit with a non-zero exit status. E: AttributeError: 'tuple' object has no attribute '_compiled_call_impl'
OpenBenchmarking.org batches/sec, More Is Better PyTorch 2.1 Device: NVIDIA CUDA GPU - Batch Size: 32 - Model: Efficientnet_v2_l NVIDIA RTX 4070 NVIDIA RTX 4070 TI SUPER NVIDIA RTX 4070 SUPER NVIDIA RTX 3090 NVIDIA RTX 4070 TI 20 40 60 80 100 SE +/- 0.62, N = 3 SE +/- 0.13, N = 3 SE +/- 6.65, N = 5 102.90 102.83 102.60 99.05 96.50 MIN: 95.98 / MAX: 104.54 MIN: 92.44 / MAX: 105.47 MIN: 94.84 / MAX: 104.25 MIN: 91.8 / MAX: 100.69 MIN: 64.35 / MAX: 104.79
OpenBenchmarking.org batches/sec, More Is Better PyTorch 2.1 Device: NVIDIA CUDA GPU - Batch Size: 64 - Model: Efficientnet_v2_l NVIDIA RTX 4070 TI SUPER NVIDIA RTX 4070 TI NVIDIA RTX 4070 SUPER NVIDIA RTX 4070 NVIDIA RTX 3090 20 40 60 80 100 SE +/- 0.13, N = 3 SE +/- 0.39, N = 2 SE +/- 1.49, N = 2 SE +/- 0.45, N = 3 SE +/- 0.14, N = 3 103.49 103.20 102.60 101.55 99.84 MIN: 93.23 / MAX: 105.43 MIN: 95.31 / MAX: 105.27 MIN: 79.69 / MAX: 105.28 MIN: 93.44 / MAX: 103.08 MIN: 92.73 / MAX: 101.46
OpenBenchmarking.org batches/sec, More Is Better PyTorch 2.1 Device: NVIDIA CUDA GPU - Batch Size: 256 - Model: Efficientnet_v2_l NVIDIA RTX 4070 TI NVIDIA RTX 4070 SUPER NVIDIA RTX 4070 TI SUPER NVIDIA RTX 4070 NVIDIA RTX 3090 20 40 60 80 100 SE +/- 0.18, N = 3 SE +/- 0.05, N = 2 SE +/- 0.57, N = 3 103.24 103.17 102.83 101.24 99.43 MIN: 95.41 / MAX: 104.9 MIN: 95.79 / MAX: 105.15 MIN: 93.16 / MAX: 105.07 MIN: 93.33 / MAX: 102.92 MIN: 90.49 / MAX: 101.97
OpenBenchmarking.org batches/sec, More Is Better PyTorch 2.1 Device: NVIDIA CUDA GPU - Batch Size: 512 - Model: Efficientnet_v2_l NVIDIA RTX 4070 SUPER NVIDIA RTX 4070 TI SUPER NVIDIA RTX 4070 TI NVIDIA RTX 4070 NVIDIA RTX 3090 20 40 60 80 100 SE +/- 0.36, N = 2 SE +/- 0.39, N = 3 SE +/- 0.19, N = 3 103.57 103.53 103.50 101.43 99.25 MIN: 95.95 / MAX: 105.54 MIN: 88.81 / MAX: 104.8 MIN: 94.95 / MAX: 105.61 MIN: 93.27 / MAX: 103.58 MIN: 91.16 / MAX: 101.18
VkFFT OpenBenchmarking.org Benchmark Score, More Is Better VkFFT 1.2.31 Test: FFT + iFFT R2C / C2R NVIDIA RTX 4070 TI SUPER NVIDIA RTX 4070 TI NVIDIA RTX 4070 SUPER NVIDIA RTX 3090 NVIDIA RTX 4070 13K 26K 39K 52K 65K SE +/- 772.47, N = 15 SE +/- 520.37, N = 3 SE +/- 702.53, N = 15 SE +/- 320.62, N = 3 SE +/- 745.02, N = 13 59378 55446 54794 48418 47097 1. (CXX) g++ options: -O3 -lrt
OpenBenchmarking.org Benchmark Score, More Is Better VkFFT 1.2.31 Test: FFT + iFFT C2C 1D batched in half precision NVIDIA RTX 3090 NVIDIA RTX 4070 TI SUPER NVIDIA RTX 4070 NVIDIA RTX 4070 TI NVIDIA RTX 4070 SUPER 60K 120K 180K 240K 300K SE +/- 160.60, N = 3 SE +/- 3524.05, N = 12 SE +/- 1301.92, N = 3 SE +/- 1708.38, N = 3 SE +/- 159.17, N = 3 273221 143992 137762 136210 131705 1. (CXX) g++ options: -O3 -lrt
OpenBenchmarking.org Benchmark Score, More Is Better VkFFT 1.2.31 Test: FFT + iFFT C2C Bluestein in single precision NVIDIA RTX 4070 TI SUPER NVIDIA RTX 4070 SUPER NVIDIA RTX 4070 TI NVIDIA RTX 3090 NVIDIA RTX 4070 3K 6K 9K 12K 15K SE +/- 73.00, N = 3 SE +/- 102.52, N = 3 SE +/- 118.41, N = 3 SE +/- 115.62, N = 3 SE +/- 52.09, N = 3 16141 15166 15125 14205 13714 1. (CXX) g++ options: -O3 -lrt
OpenBenchmarking.org Benchmark Score, More Is Better VkFFT 1.2.31 Test: FFT + iFFT C2C 1D batched in double precision NVIDIA RTX 3090 NVIDIA RTX 4070 TI SUPER NVIDIA RTX 4070 TI NVIDIA RTX 4070 SUPER NVIDIA RTX 4070 7K 14K 21K 28K 35K SE +/- 50.66, N = 3 SE +/- 325.03, N = 3 SE +/- 302.46, N = 3 SE +/- 146.69, N = 3 SE +/- 125.94, N = 3 30912 27947 25431 24317 22390 1. (CXX) g++ options: -O3 -lrt
OpenBenchmarking.org Benchmark Score, More Is Better VkFFT 1.2.31 Test: FFT + iFFT C2C 1D batched in single precision NVIDIA RTX 3090 NVIDIA RTX 4070 TI SUPER NVIDIA RTX 4070 NVIDIA RTX 4070 TI NVIDIA RTX 4070 SUPER 30K 60K 90K 120K 150K SE +/- 9.64, N = 3 SE +/- 33.60, N = 3 SE +/- 13.72, N = 3 SE +/- 0.88, N = 3 SE +/- 7.94, N = 3 141876 104003 77774 73942 73929 1. (CXX) g++ options: -O3 -lrt
OpenBenchmarking.org Benchmark Score, More Is Better VkFFT 1.2.31 Test: FFT + iFFT C2C multidimensional in single precision NVIDIA RTX 4070 TI SUPER NVIDIA RTX 4070 TI NVIDIA RTX 3090 NVIDIA RTX 4070 SUPER NVIDIA RTX 4070 13K 26K 39K 52K 65K SE +/- 251.10, N = 3 SE +/- 417.77, N = 15 SE +/- 407.28, N = 15 SE +/- 407.19, N = 15 SE +/- 476.57, N = 5 59790 51528 50856 50299 47212 1. (CXX) g++ options: -O3 -lrt
OpenBenchmarking.org Benchmark Score, More Is Better VkFFT 1.2.31 Test: FFT + iFFT C2C Bluestein benchmark in double precision NVIDIA RTX 4070 TI SUPER NVIDIA RTX 4070 TI NVIDIA RTX 4070 SUPER NVIDIA RTX 3090 NVIDIA RTX 4070 1100 2200 3300 4400 5500 SE +/- 11.37, N = 3 SE +/- 11.35, N = 3 SE +/- 12.55, N = 3 SE +/- 9.84, N = 3 SE +/- 4.51, N = 3 5047 4647 4451 4195 3886 1. (CXX) g++ options: -O3 -lrt
OpenBenchmarking.org Benchmark Score, More Is Better VkFFT 1.2.31 Test: FFT + iFFT C2C 1D batched in single precision, no reshuffling NVIDIA RTX 3090 NVIDIA RTX 4070 TI SUPER NVIDIA RTX 4070 NVIDIA RTX 4070 TI NVIDIA RTX 4070 SUPER 30K 60K 90K 120K 150K SE +/- 37.44, N = 3 SE +/- 20.80, N = 3 SE +/- 5.84, N = 3 SE +/- 28.54, N = 3 SE +/- 37.77, N = 3 144311 105549 79057 75141 75078 1. (CXX) g++ options: -O3 -lrt
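The eight VkFFT scores span very different magnitudes, so a geometric mean is one reasonable way to fold them into a single figure per card. The sketch below is an illustrative summary, not part of the original result file; it assumes each chart lists its scores in the same order as its device names, and the scores have been re-ordered here into one fixed device order.

```python
from statistics import geometric_mean

DEVICES = [
    "NVIDIA RTX 4070 TI SUPER",
    "NVIDIA RTX 4070 TI",
    "NVIDIA RTX 4070 SUPER",
    "NVIDIA RTX 3090",
    "NVIDIA RTX 4070",
]

# VkFFT 1.2.31 benchmark scores (more is better), one row per test,
# columns in DEVICES order.
SCORES = {
    "R2C / C2R": [59378, 55446, 54794, 48418, 47097],
    "C2C 1D batched, half": [143992, 136210, 131705, 273221, 137762],
    "C2C Bluestein, single": [16141, 15125, 15166, 14205, 13714],
    "C2C 1D batched, double": [27947, 25431, 24317, 30912, 22390],
    "C2C 1D batched, single": [104003, 73942, 73929, 141876, 77774],
    "C2C multidimensional, single": [59790, 51528, 50299, 50856, 47212],
    "C2C Bluestein, double": [5047, 4647, 4451, 4195, 3886],
    "C2C 1D batched, single, no reshuffling": [105549, 75141, 75078, 144311, 79057],
}

# Geometric mean across all tests for each device.
summary = {
    device: geometric_mean([row[i] for row in SCORES.values()])
    for i, device in enumerate(DEVICES)
}

for device, score in sorted(summary.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{device:<26} {score:>10.0f}")
```

Unlike an arithmetic mean, the geometric mean is not dominated by the tests with the largest absolute scores, such as the half-precision batched run.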
PlaidML This test profile uses the PlaidML deep learning framework, developed by Intel, to run various benchmarks. Learn more via the OpenBenchmarking.org test page.
FP16: No - Mode: Training - Network: Mobilenet - Device: OpenCL
NVIDIA RTX 4070 SUPER: The test run did not produce a result. The test run did not produce a result. The test run did not produce a result.
NVIDIA RTX 4070: The test quit with a non-zero exit status. The test quit with a non-zero exit status. The test run did not produce a result.
NVIDIA RTX 4070 TI: The test run did not produce a result. The test run did not produce a result. The test run did not produce a result.
NVIDIA RTX 3090: The test run did not produce a result. The test run did not produce a result. The test run did not produce a result.
NVIDIA RTX 4070 TI SUPER: The test run did not produce a result. The test run did not produce a result. The test run did not produce a result.
FP16: No - Mode: Inference - Network: IMDB LSTM - Device: OpenCL
NVIDIA RTX 4070 SUPER: The test run did not produce a result. The test run did not produce a result. The test quit with a non-zero exit status.
NVIDIA RTX 4070: The test run did not produce a result. The test run did not produce a result. The test run did not produce a result.
NVIDIA RTX 4070 TI: The test run did not produce a result. The test run did not produce a result. The test run did not produce a result.
NVIDIA RTX 3090: The test run did not produce a result. The test run did not produce a result. The test run did not produce a result.
NVIDIA RTX 4070 TI SUPER: The test run did not produce a result. The test run did not produce a result. The test run did not produce a result.
FP16: No - Mode: Inference - Network: Mobilenet - Device: OpenCL
NVIDIA RTX 4070 SUPER: The test run did not produce a result. The test run did not produce a result. The test run did not produce a result.
NVIDIA RTX 4070: The test run did not produce a result. The test run did not produce a result. The test run did not produce a result.
NVIDIA RTX 4070 TI: The test run did not produce a result. The test run did not produce a result. The test run did not produce a result.
NVIDIA RTX 3090: The test run did not produce a result. The test run did not produce a result. The test run did not produce a result.
NVIDIA RTX 4070 TI SUPER: The test run did not produce a result. The test run did not produce a result. The test run did not produce a result.
FP16: Yes - Mode: Inference - Network: Mobilenet - Device: OpenCL
NVIDIA RTX 4070 SUPER: The test run did not produce a result. The test run did not produce a result. The test quit with a non-zero exit status. E: AttributeError: 'method_descriptor' object has no attribute 'default'
NVIDIA RTX 4070: The test run did not produce a result. The test run did not produce a result. The test run did not produce a result.
NVIDIA RTX 4070 TI: The test run did not produce a result. The test run did not produce a result. The test run did not produce a result.
NVIDIA RTX 3090: The test run did not produce a result. The test run did not produce a result. The test run did not produce a result.
NVIDIA RTX 4070 TI SUPER: The test run did not produce a result. The test run did not produce a result. The test run did not produce a result.
FP16: No - Mode: Inference - Network: DenseNet 201 - Device: OpenCL
NVIDIA RTX 4070 SUPER: The test run did not produce a result. The test quit with a non-zero exit status. The test run did not produce a result.
NVIDIA RTX 4070: The test run did not produce a result. The test run did not produce a result. The test run did not produce a result.
NVIDIA RTX 4070 TI: The test run did not produce a result. The test run did not produce a result. The test run did not produce a result.
NVIDIA RTX 3090: The test run did not produce a result. The test run did not produce a result. The test run did not produce a result.
NVIDIA RTX 4070 TI SUPER: The test run did not produce a result. The test run did not produce a result. The test run did not produce a result.
Libplacebo Libplacebo is a multimedia rendering library based on the core rendering code of the MPV player. The libplacebo benchmark relies on the Vulkan API and tests various primitives. Learn more via the OpenBenchmarking.org test page.
NVIDIA RTX 4070 SUPER: The test quit with a non-zero exit status. The test quit with a non-zero exit status. The test quit with a non-zero exit status. E: libplacebo: line 3: ./src/bench: No such file or directory
NeatBench NeatBench is a benchmark of the cross-platform Neat Video software on the CPU, with optional GPU (OpenCL / CUDA) support. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org FPS, More Is Better NeatBench 5 Acceleration: GPU NVIDIA RTX 4070 TI NVIDIA RTX 4070 NVIDIA RTX 4070 SUPER NVIDIA RTX 3090 NVIDIA RTX 4070 TI SUPER 900 1800 2700 3600 4500 SE +/- 0.00, N = 3 SE +/- 0.00, N = 3 SE +/- 0.00, N = 3 SE +/- 0.00, N = 3 SE +/- 512.75, N = 16 4070.0 4070.0 4070.0 3090.0 2084.1
Libplacebo Libplacebo is a multimedia rendering library based on the core rendering code of the MPV player. The libplacebo benchmark relies on the Vulkan API and tests various primitives. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org FPS, More Is Better Libplacebo 5.229.1 Test: deband_heavy NVIDIA RTX 4070 TI SUPER NVIDIA RTX 4070 TI NVIDIA RTX 4070 SUPER NVIDIA RTX 3090 NVIDIA RTX 4070 500 1000 1500 2000 2500 SE +/- 0.75, N = 3 SE +/- 0.56, N = 3 SE +/- 2.26, N = 3 SE +/- 2.92, N = 3 SE +/- 0.08, N = 3 2495.92 2306.67 2186.70 2024.61 1847.98 1. (CXX) g++ options: -lm -pthread -ldl -fvisibility=hidden -std=c++20 -O2 -fno-math-errno -fPIC -MD -MQ -MF
OpenBenchmarking.org FPS, More Is Better Libplacebo 5.229.1 Test: polar_nocompute NVIDIA RTX 4070 TI SUPER NVIDIA RTX 4070 TI NVIDIA RTX 4070 SUPER NVIDIA RTX 3090 NVIDIA RTX 4070 600 1200 1800 2400 3000 SE +/- 1.94, N = 3 SE +/- 0.26, N = 3 SE +/- 0.24, N = 3 SE +/- 3.45, N = 3 SE +/- 0.16, N = 3 2653.03 2461.23 2327.55 2126.31 1972.78 1. (CXX) g++ options: -lm -pthread -ldl -fvisibility=hidden -std=c++20 -O2 -fno-math-errno -fPIC -MD -MQ -MF
OpenBenchmarking.org FPS, More Is Better Libplacebo 5.229.1 Test: hdr_peakdetect NVIDIA RTX 3090 NVIDIA RTX 4070 TI SUPER NVIDIA RTX 4070 TI NVIDIA RTX 4070 NVIDIA RTX 4070 SUPER 1100 2200 3300 4400 5500 SE +/- 13.97, N = 3 SE +/- 28.18, N = 3 SE +/- 99.97, N = 3 SE +/- 144.09, N = 3 SE +/- 3.65, N = 3 5104.10 3931.57 3544.60 3452.43 3292.37 1. (CXX) g++ options: -lm -pthread -ldl -fvisibility=hidden -std=c++20 -O2 -fno-math-errno -fPIC -MD -MQ -MF
OpenBenchmarking.org FPS, More Is Better Libplacebo 5.229.1 Test: hdr_lut NVIDIA RTX 4070 TI NVIDIA RTX 4070 NVIDIA RTX 4070 SUPER NVIDIA RTX 4070 TI SUPER NVIDIA RTX 3090 900 1800 2700 3600 4500 SE +/- 33.96, N = 3 SE +/- 6.47, N = 3 SE +/- 12.09, N = 3 SE +/- 17.88, N = 3 SE +/- 22.23, N = 3 3976.04 3946.90 3905.98 3845.51 3376.85 1. (CXX) g++ options: -lm -pthread -ldl -fvisibility=hidden -std=c++20 -O2 -fno-math-errno -fPIC -MD -MQ -MF
OpenBenchmarking.org FPS, More Is Better Libplacebo 5.229.1 Test: av1_grain_lap NVIDIA RTX 4070 SUPER NVIDIA RTX 4070 NVIDIA RTX 4070 TI NVIDIA RTX 3090 NVIDIA RTX 4070 TI SUPER 900 1800 2700 3600 4500 SE +/- 5.52, N = 3 SE +/- 16.20, N = 3 SE +/- 35.33, N = 3 SE +/- 12.99, N = 3 SE +/- 39.01, N = 3 4171.00 4152.41 4143.96 4126.89 4057.41 1. (CXX) g++ options: -lm -pthread -ldl -fvisibility=hidden -std=c++20 -O2 -fno-math-errno -fPIC -MD -MQ -MF
ProjectPhysX OpenCL-Benchmark ProjectPhysX OpenCL-Benchmark provides various OpenCL compute and memory bandwidth micro-benchmarks. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org GB/s, More Is Better ProjectPhysX OpenCL-Benchmark 1.2 Operation: Memory Bandwidth Coalesced Read NVIDIA RTX 3090 NVIDIA RTX 4070 TI SUPER NVIDIA RTX 4070 NVIDIA RTX 4070 TI NVIDIA RTX 4070 SUPER 200 400 600 800 1000 SE +/- 0.07, N = 3 SE +/- 0.06, N = 3 SE +/- 0.03, N = 3 SE +/- 0.01, N = 3 SE +/- 0.01, N = 3 864.11 619.03 465.18 465.07 464.86 1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL
OpenBenchmarking.org GB/s, More Is Better ProjectPhysX OpenCL-Benchmark 1.2 Operation: Memory Bandwidth Coalesced Write NVIDIA RTX 3090 NVIDIA RTX 4070 TI SUPER NVIDIA RTX 4070 NVIDIA RTX 4070 TI NVIDIA RTX 4070 SUPER 200 400 600 800 1000 SE +/- 0.06, N = 3 SE +/- 0.57, N = 3 SE +/- 0.16, N = 3 SE +/- 0.11, N = 3 SE +/- 0.14, N = 3 887.31 608.94 459.43 457.17 455.01 1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL
OpenBenchmarking.org GB/s, More Is Better cl-mem 2017-01-13 Benchmark: Read NVIDIA RTX 3090 NVIDIA RTX 4070 TI SUPER NVIDIA RTX 4070 TI NVIDIA RTX 4070 NVIDIA RTX 4070 SUPER 200 400 600 800 1000 SE +/- 0.32, N = 3 SE +/- 0.00, N = 3 SE +/- 0.00, N = 3 SE +/- 0.00, N = 3 SE +/- 0.12, N = 3 825.8 595.2 446.3 446.3 446.2 1. (CC) gcc options: -O2 -flto -lOpenCL
OpenBenchmarking.org GB/s, More Is Better cl-mem 2017-01-13 Benchmark: Write NVIDIA RTX 3090 NVIDIA RTX 4070 TI SUPER NVIDIA RTX 4070 TI NVIDIA RTX 4070 SUPER NVIDIA RTX 4070 160 320 480 640 800 SE +/- 0.83, N = 3 SE +/- 0.25, N = 3 SE +/- 0.12, N = 3 SE +/- 1.11, N = 3 SE +/- 0.55, N = 3 753.8 551.9 412.2 407.5 406.7 1. (CC) gcc options: -O2 -flto -lOpenCL
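The memory bandwidth results above can be put in context by comparing them with each card's theoretical peak, computed as bus width times per-pin data rate divided by 8. The sketch below does this for the coalesced read figures. The bus widths and data rates are assumptions taken from publicly listed specifications, not from this result file, and should be checked against the actual boards tested.

```python
def peak_bandwidth_gbs(bus_width_bits, data_rate_gbps):
    """Theoretical peak memory bandwidth in GB/s."""
    return bus_width_bits * data_rate_gbps / 8


# ASSUMED memory configurations (bus width in bits, data rate in Gbps per pin),
# based on public spec sheets rather than anything recorded in this export.
SPECS = {
    "NVIDIA RTX 3090": (384, 19.5),
    "NVIDIA RTX 4070 TI SUPER": (256, 21.0),
    "NVIDIA RTX 4070 TI": (192, 21.0),
    "NVIDIA RTX 4070 SUPER": (192, 21.0),
    "NVIDIA RTX 4070": (192, 21.0),
}

# Measured "Memory Bandwidth Coalesced Read" results in GB/s
# (ProjectPhysX OpenCL-Benchmark 1.2).
MEASURED_READ = {
    "NVIDIA RTX 3090": 864.11,
    "NVIDIA RTX 4070 TI SUPER": 619.03,
    "NVIDIA RTX 4070": 465.18,
    "NVIDIA RTX 4070 TI": 465.07,
    "NVIDIA RTX 4070 SUPER": 464.86,
}

efficiency = {
    name: MEASURED_READ[name] / peak_bandwidth_gbs(*SPECS[name])
    for name in MEASURED_READ
}

for name, ratio in efficiency.items():
    peak = peak_bandwidth_gbs(*SPECS[name])
    print(f"{name:<26} {MEASURED_READ[name]:7.2f} / {peak:6.1f} GB/s = {ratio:.1%}")
```

Under these assumed specifications, the three 192-bit cards share the same theoretical peak, which is consistent with their nearly identical measured bandwidth.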
ViennaCL ViennaCL is an open-source linear algebra library written in C++ and with support for OpenCL and OpenMP. This test profile makes use of ViennaCL's built-in benchmarks. Learn more via the OpenBenchmarking.org test page.
ViennaCL 1.7.1 - Test: CPU BLAS - sCOPY (GB/s, More Is Better)
NVIDIA RTX 3090: 132 (SE +/- 1.20, N = 3)
NVIDIA RTX 4070 TI: 132 (SE +/- 0.88, N = 3)
NVIDIA RTX 4070 SUPER: 132 (SE +/- 1.20, N = 3)
NVIDIA RTX 4070: 131 (SE +/- 1.20, N = 3)
NVIDIA RTX 4070 TI SUPER: 107 (SE +/- 0.67, N = 3)

ViennaCL 1.7.1 - Test: CPU BLAS - sAXPY (GB/s, More Is Better)
NVIDIA RTX 4070 TI: 156 (SE +/- 2.00, N = 3)
NVIDIA RTX 4070 SUPER: 156 (SE +/- 2.19, N = 3)
NVIDIA RTX 3090: 154 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070: 153 (SE +/- 4.81, N = 3)
NVIDIA RTX 4070 TI SUPER: 120 (SE +/- 0.33, N = 3)

ViennaCL 1.7.1 - Test: CPU BLAS - sDOT (GB/s, More Is Better)
NVIDIA RTX 4070 TI: 168.0 (SE +/- 2.40, N = 3)
NVIDIA RTX 4070: 166.0 (SE +/- 3.76, N = 3)
NVIDIA RTX 4070 SUPER: 165.0 (SE +/- 2.73, N = 3)
NVIDIA RTX 3090: 132.1 (SE +/- 35.40, N = 3)
NVIDIA RTX 4070 TI SUPER: 129.0 (SE +/- 0.58, N = 3)

ViennaCL 1.7.1 - Test: CPU BLAS - dCOPY (GB/s, More Is Better)
NVIDIA RTX 4070 TI: 71.3 (SE +/- 0.74, N = 3)
NVIDIA RTX 4070: 71.0 (SE +/- 0.25, N = 3)
NVIDIA RTX 4070 SUPER: 70.8 (SE +/- 0.32, N = 3)
NVIDIA RTX 3090: 70.2 (SE +/- 0.72, N = 3)
NVIDIA RTX 4070 TI SUPER: 52.7 (SE +/- 0.18, N = 3)

ViennaCL 1.7.1 - Test: CPU BLAS - dAXPY (GB/s, More Is Better)
NVIDIA RTX 4070 TI: 87.3 (SE +/- 0.57, N = 3)
NVIDIA RTX 4070 SUPER: 87.2 (SE +/- 0.12, N = 3)
NVIDIA RTX 4070: 86.8 (SE +/- 0.44, N = 3)
NVIDIA RTX 3090: 86.2 (SE +/- 0.94, N = 3)
NVIDIA RTX 4070 TI SUPER: 64.3 (SE +/- 0.12, N = 3)

ViennaCL 1.7.1 - Test: CPU BLAS - dDOT (GB/s, More Is Better)
NVIDIA RTX 4070 SUPER: 96.8 (SE +/- 0.09, N = 3)
NVIDIA RTX 4070: 96.7 (SE +/- 0.22, N = 3)
NVIDIA RTX 4070 TI: 96.4 (SE +/- 0.58, N = 3)
NVIDIA RTX 3090: 95.2 (SE +/- 0.84, N = 3)
NVIDIA RTX 4070 TI SUPER: 70.8 (SE +/- 0.19, N = 3)

ViennaCL 1.7.1 - Test: CPU BLAS - dGEMV-N (GB/s, More Is Better)
NVIDIA RTX 3090: 103.0 (SE +/- 0.88, N = 3)
NVIDIA RTX 4070 TI: 103.0 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070: 103.0 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070 SUPER: 102.0 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070 TI SUPER: 78.5 (SE +/- 0.46, N = 3)

ViennaCL 1.7.1 - Test: CPU BLAS - dGEMV-T (GB/s, More Is Better)
NVIDIA RTX 3090: 110.0 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070: 109.0 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070 SUPER: 109.0 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070 TI: 102.7 (SE +/- 6.30, N = 3)
NVIDIA RTX 4070 TI SUPER: 82.6 (SE +/- 0.47, N = 3)

ViennaCL 1.7.1 - Test: OpenCL BLAS - sCOPY (GB/s, More Is Better)
NVIDIA RTX 4070 TI SUPER: 373 (SE +/- 0.33, N = 3)
NVIDIA RTX 3090: 363 (SE +/- 1.00, N = 3)
NVIDIA RTX 4070 TI: 336 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 SUPER: 334 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070: 330 (SE +/- 0.33, N = 3)

ViennaCL 1.7.1 - Test: OpenCL BLAS - sAXPY (GB/s, More Is Better)
NVIDIA RTX 3090: 498 (SE +/- 0.58, N = 3)
NVIDIA RTX 4070 TI SUPER: 469 (SE +/- 1.20, N = 3)
NVIDIA RTX 4070 TI: 393 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 SUPER: 392 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 389 (SE +/- 0.00, N = 3)

ViennaCL 1.7.1 - Test: OpenCL BLAS - sDOT (GB/s, More Is Better)
NVIDIA RTX 4070 TI SUPER: 410 (SE +/- 1.00, N = 3)
NVIDIA RTX 3090: 376 (SE +/- 0.58, N = 3)
NVIDIA RTX 4070 SUPER: 370 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI: 365 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 362 (SE +/- 0.00, N = 3)

ViennaCL 1.7.1 - Test: OpenCL BLAS - dCOPY (GB/s, More Is Better)
NVIDIA RTX 3090: 605 (SE +/- 0.58, N = 3)
NVIDIA RTX 4070 TI SUPER: 512 (SE +/- 4.00, N = 3)
NVIDIA RTX 4070 TI: 424 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 423 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 SUPER: 423 (SE +/- 0.33, N = 3)

ViennaCL 1.7.1 - Test: OpenCL BLAS - dAXPY (GB/s, More Is Better)
NVIDIA RTX 3090: 724 (SE +/- 0.58, N = 3)
NVIDIA RTX 4070 TI SUPER: 585 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 455 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI: 437 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070 SUPER: 437 (SE +/- 0.00, N = 3)

ViennaCL 1.7.1 - Test: OpenCL BLAS - dDOT (GB/s, More Is Better)
NVIDIA RTX 3090: 659 (SE +/- 0.88, N = 3)
NVIDIA RTX 4070 TI SUPER: 575 (SE +/- 1.33, N = 3)
NVIDIA RTX 4070 SUPER: 458 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI: 457 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 456 (SE +/- 0.00, N = 3)

ViennaCL 1.7.1 - Test: OpenCL BLAS - dGEMV-N (GB/s, More Is Better)
NVIDIA RTX 4070 TI SUPER: 218 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI: 211 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070 SUPER: 210 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070: 209 (SE +/- 0.33, N = 3)
NVIDIA RTX 3090: 187 (SE +/- 0.33, N = 3)

ViennaCL 1.7.1 - Test: OpenCL BLAS - dGEMV-T (GB/s, More Is Better)
NVIDIA RTX 4070 TI SUPER: 424
NVIDIA RTX 4070 TI: 391
NVIDIA RTX 4070 SUPER: 389
NVIDIA RTX 4070: 387
NVIDIA RTX 3090: 374
SE values as charted (four shown for five results, so not assignable per card): SE +/- 0.33, N = 3; SE +/- 0.33, N = 3; SE +/- 0.00, N = 3; SE +/- 0.00, N = 3

(CXX) g++ options for the ViennaCL tests above: -fopenmp -O3 -rdynamic -lOpenCL
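The "SE +/- x, N = n" annotations on these results read as a standard error of the mean over n runs. The following is a minimal sketch of that calculation, assuming the usual definition (sample standard deviation divided by the square root of N); the result file does not state the exact formula the Phoronix Test Suite uses, and the per-run readings below are hypothetical, not taken from the result file.

```python
import math
import statistics


def standard_error(samples):
    """Standard error of the mean: sample standard deviation / sqrt(N)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two runs to estimate the spread")
    # statistics.stdev uses the n - 1 (sample) denominator.
    return statistics.stdev(samples) / math.sqrt(n)


# Hypothetical per-run readings in GB/s (illustrative only).
runs = [497.0, 498.0, 499.0]
mean = statistics.mean(runs)
print(f"{mean:.0f} (SE +/- {standard_error(runs):.2f}, N = {len(runs)})")
# prints "498 (SE +/- 0.58, N = 3)"
```

Under this assumption, three integer readings that differ by one unit give an SE of 0.33 or 0.58, which matches values that recur in the integer-rounded results.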
clpeak Clpeak is designed to test the peak capabilities of OpenCL devices. Learn more via the OpenBenchmarking.org test page.
clpeak 1.1.2 - OpenCL Test: Global Memory Bandwidth (GBPS, More Is Better)
NVIDIA RTX 3090: 816.55 (SE +/- 0.02, N = 3)
NVIDIA RTX 4070 TI SUPER: 582.84 (SE +/- 0.01, N = 3)
NVIDIA RTX 4070 SUPER: 437.65 (SE +/- 0.02, N = 3)
NVIDIA RTX 4070 TI: 437.63 (SE +/- 0.02, N = 3)
NVIDIA RTX 4070: 437.21 (SE +/- 0.02, N = 3)
(CXX) g++ options: -O3
vkpeak Vkpeak is a Vulkan compute benchmark inspired by OpenCL's clpeak. Vkpeak measures Vulkan compute performance for FP16 / FP32 / FP64 / INT16 / INT32 in scalar and vec4 form. Learn more via the OpenBenchmarking.org test page.
NVIDIA RTX 4070 SUPER: The test quit with a non-zero exit status. The test quit with a non-zero exit status. The test quit with a non-zero exit status.
NVIDIA RTX 4070: The test quit with a non-zero exit status. The test quit with a non-zero exit status. The test quit with a non-zero exit status.
NVIDIA RTX 4070 TI: The test quit with a non-zero exit status. The test quit with a non-zero exit status. The test quit with a non-zero exit status.
clpeak Clpeak is designed to test the peak capabilities of OpenCL devices. Learn more via the OpenBenchmarking.org test page.
clpeak 1.1.2 - OpenCL Test: Single-Precision Float (GFLOPS, More Is Better)
NVIDIA RTX 4070 TI SUPER: 43244.79 (SE +/- 50.25, N = 3)
NVIDIA RTX 4070 TI: 38691.73 (SE +/- 11.67, N = 3)
NVIDIA RTX 4070 SUPER: 35492.69 (SE +/- 0.99, N = 3)
NVIDIA RTX 3090: 34906.79 (SE +/- 113.39, N = 3)
NVIDIA RTX 4070: 28479.39 (SE +/- 5.46, N = 3)

clpeak 1.1.2 - OpenCL Test: Double-Precision Double (GFLOPS, More Is Better)
NVIDIA RTX 4070 TI SUPER: 750.36 (SE +/- 1.26, N = 3)
NVIDIA RTX 4070 TI: 667.05 (SE +/- 1.33, N = 3)
NVIDIA RTX 3090: 642.23 (SE +/- 1.63, N = 3)
NVIDIA RTX 4070 SUPER: 630.11 (SE +/- 0.98, N = 3)
NVIDIA RTX 4070: 515.17 (SE +/- 0.21, N = 3)

(CXX) g++ options for the clpeak tests above: -O3
ViennaCL ViennaCL is an open-source linear algebra library written in C++ with support for OpenCL and OpenMP. This test profile makes use of ViennaCL's built-in benchmarks. Learn more via the OpenBenchmarking.org test page.
ViennaCL 1.7.1 - Test: CPU BLAS - dGEMM-NN (GFLOPs/s, More Is Better)
NVIDIA RTX 4070 TI SUPER: 122 (SE +/- 1.50, N = 2)
NVIDIA RTX 4070: 122 (SE +/- 1.86, N = 3)
NVIDIA RTX 4070 SUPER: 119 (SE +/- 4.04, N = 3)
NVIDIA RTX 4070 TI: 117 (SE +/- 1.15, N = 3)
NVIDIA RTX 3090: 113 (SE +/- 1.86, N = 3)

ViennaCL 1.7.1 - Test: CPU BLAS - dGEMM-NT (GFLOPs/s, More Is Better)
NVIDIA RTX 4070: 122 (SE +/- 1.76, N = 3)
NVIDIA RTX 4070 TI SUPER: 119 (SE +/- 3.50, N = 2)
NVIDIA RTX 3090: 119 (SE +/- 3.28, N = 3)
NVIDIA RTX 4070 TI: 118 (SE +/- 1.20, N = 3)
NVIDIA RTX 4070 SUPER: 117 (SE +/- 2.08, N = 3)

ViennaCL 1.7.1 - Test: CPU BLAS - dGEMM-TN (GFLOPs/s, More Is Better)
NVIDIA RTX 4070 TI: 125 (SE +/- 2.08, N = 3)
NVIDIA RTX 3090: 121 (SE +/- 2.08, N = 3)
NVIDIA RTX 4070: 121 (SE +/- 2.31, N = 3)
NVIDIA RTX 4070 TI SUPER: 120 (SE +/- 3.00, N = 2)
NVIDIA RTX 4070 SUPER: 115 (SE +/- 1.00, N = 2)

ViennaCL 1.7.1 - Test: CPU BLAS - dGEMM-TT (GFLOPs/s, More Is Better)
NVIDIA RTX 4070 TI: 124 (SE +/- 2.08, N = 3)
NVIDIA RTX 4070 SUPER: 122 (SE +/- 2.08, N = 3)
NVIDIA RTX 4070: 118 (SE +/- 1.20, N = 3)
NVIDIA RTX 4070 TI SUPER: 117 (SE +/- 2.91, N = 3)
NVIDIA RTX 3090: 113 (SE +/- 0.88, N = 3)

ViennaCL 1.7.1 - Test: OpenCL BLAS - dGEMM-NN (GFLOPs/s, More Is Better)
NVIDIA RTX 4070 TI SUPER: 681 (SE +/- 1.33, N = 3)
NVIDIA RTX 4070 TI: 604 (SE +/- 0.33, N = 3)
NVIDIA RTX 3090: 592 (SE +/- 2.31, N = 3)
NVIDIA RTX 4070 SUPER: 577 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 473 (SE +/- 0.33, N = 3)

ViennaCL 1.7.1 - Test: OpenCL BLAS - dGEMM-NT (GFLOPs/s, More Is Better)
NVIDIA RTX 4070 TI SUPER: 689 (SE +/- 1.00, N = 3)
NVIDIA RTX 4070 TI: 612 (SE +/- 0.33, N = 3)
NVIDIA RTX 3090: 595 (SE +/- 2.33, N = 3)
NVIDIA RTX 4070 SUPER: 584 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 477 (SE +/- 0.33, N = 3)

ViennaCL 1.7.1 - Test: OpenCL BLAS - dGEMM-TN (GFLOPs/s, More Is Better)
NVIDIA RTX 4070 TI SUPER: 714 (SE +/- 1.00, N = 3)
NVIDIA RTX 4070 TI: 634 (SE +/- 0.67, N = 3)
NVIDIA RTX 4070 SUPER: 599 (SE +/- 0.00, N = 3)
NVIDIA RTX 3090: 594 (SE +/- 2.03, N = 3)
NVIDIA RTX 4070: 494 (SE +/- 0.33, N = 3)

ViennaCL 1.7.1 - Test: OpenCL BLAS - dGEMM-TT (GFLOPs/s, More Is Better)
NVIDIA RTX 4070 TI SUPER: 731
NVIDIA RTX 4070 TI: 648
NVIDIA RTX 4070 SUPER: 613
NVIDIA RTX 3090: 593
NVIDIA RTX 4070: 502
SE values as charted (four shown for five results, so not assignable per card): SE +/- 1.33, N = 3; SE +/- 0.33, N = 3; SE +/- 0.00, N = 3; SE +/- 0.33, N = 3

(CXX) g++ options for the ViennaCL tests above: -fopenmp -O3 -rdynamic -lOpenCL
clpeak Clpeak is designed to test the peak capabilities of OpenCL devices. Learn more via the OpenBenchmarking.org test page.
clpeak 1.1.2 - OpenCL Test: Integer Compute INT (GIOPS, More Is Better)
NVIDIA RTX 4070 TI SUPER: 22171.25 (SE +/- 28.14, N = 3)
NVIDIA RTX 4070 TI: 19821.10 (SE +/- 2.50, N = 3)
NVIDIA RTX 4070 SUPER: 18170.54 (SE +/- 3.14, N = 3)
NVIDIA RTX 3090: 17923.33 (SE +/- 16.49, N = 3)
NVIDIA RTX 4070: 14555.19 (SE +/- 15.26, N = 3)
(CXX) g++ options: -O3
Hashcat Hashcat is an open-source, advanced password recovery tool supporting GPU acceleration with OpenCL, NVIDIA CUDA, and Radeon ROCm. Learn more via the OpenBenchmarking.org test page.
Hashcat 6.2.4 - Benchmark: MD5 (H/s, More Is Better)
NVIDIA RTX 4070 TI SUPER: 82004966667 (SE +/- 97655010.68, N = 3)
NVIDIA RTX 4070 TI: 73312233333 (SE +/- 11283665.68, N = 3)
NVIDIA RTX 4070 SUPER: 67583033333 (SE +/- 22430807.19, N = 3)
NVIDIA RTX 3090: 67177300000 (SE +/- 53667246.37, N = 3)
NVIDIA RTX 4070: 56147866667 (SE +/- 33772046.30, N = 3)

Hashcat 6.2.4 - Benchmark: SHA1 (H/s, More Is Better)
NVIDIA RTX 4070 TI SUPER: 26388600000 (SE +/- 29067564.97, N = 3)
NVIDIA RTX 4070 TI: 23532400000 (SE +/- 15926811.78, N = 3)
NVIDIA RTX 4070 SUPER: 22132600000 (SE +/- 5140363.15, N = 3)
NVIDIA RTX 3090: 21323733333 (SE +/- 26244639.66, N = 3)
NVIDIA RTX 4070: 18202466667 (SE +/- 6318315.53, N = 3)

Hashcat 6.2.4 - Benchmark: 7-Zip (H/s, More Is Better)
NVIDIA RTX 4070 TI SUPER: 1420700 (SE +/- 1628.91, N = 3)
NVIDIA RTX 4070 TI: 1262633 (SE +/- 2339.04, N = 3)
NVIDIA RTX 4070 SUPER: 1176467 (SE +/- 1991.93, N = 3)
NVIDIA RTX 3090: 1056000 (SE +/- 1587.45, N = 3)
NVIDIA RTX 4070: 976967 (SE +/- 2062.63, N = 3)

Hashcat 6.2.4 - Benchmark: SHA-512 (H/s, More Is Better)
NVIDIA RTX 4070 TI SUPER: 3887033333 (SE +/- 1098989.43, N = 3)
NVIDIA RTX 4070 TI: 3462500000 (SE +/- 721110.26, N = 3)
NVIDIA RTX 4070 SUPER: 3232733333 (SE +/- 1530068.99, N = 3)
NVIDIA RTX 3090: 3081866667 (SE +/- 3288532.26, N = 3)
NVIDIA RTX 4070: 2673300000 (SE +/- 1059874.21, N = 3)

Hashcat 6.2.4 - Benchmark: TrueCrypt RIPEMD160 + XTS (H/s, More Is Better)
NVIDIA RTX 4070 TI SUPER: 961733 (SE +/- 392.99, N = 3)
NVIDIA RTX 4070 TI: 858600 (SE +/- 888.82, N = 3)
NVIDIA RTX 4070 SUPER: 802967 (SE +/- 633.33, N = 3)
NVIDIA RTX 3090: 797833 (SE +/- 1757.21, N = 3)
NVIDIA RTX 4070: 660967 (SE +/- 176.38, N = 3)
TensorFlow This is a benchmark of the TensorFlow deep learning framework using the TensorFlow reference benchmarks (tensorflow/benchmarks with tf_cnn_benchmarks.py). Note that the Phoronix Test Suite also provides pts/tensorflow-lite for benchmarking the TensorFlow Lite binaries if complementary metrics are desired. Learn more via the OpenBenchmarking.org test page.
TensorFlow 2.12 - Device: GPU - Batch Size: 1 - Model: VGG-16 (images/sec, More Is Better)
NVIDIA RTX 3090: 1.38
NVIDIA RTX 4070 TI: 1.38
NVIDIA RTX 4070: 1.36
NVIDIA RTX 4070 SUPER: 1.35
NVIDIA RTX 4070 TI SUPER: 1.32
SE values as charted (four shown for five results, so not assignable per card): SE +/- 0.01, N = 3; SE +/- 0.01, N = 3; SE +/- 0.01, N = 2; SE +/- 0.00, N = 3

TensorFlow 2.12 - Device: GPU - Batch Size: 1 - Model: AlexNet (images/sec, More Is Better)
NVIDIA RTX 4070 TI: 14.79 (SE +/- 0.06, N = 2)
NVIDIA RTX 3090: 14.45 (SE +/- 0.20, N = 15)
NVIDIA RTX 4070: 14.04 (SE +/- 0.16, N = 3)
NVIDIA RTX 4070 SUPER: 13.92 (SE +/- 0.22, N = 2)
NVIDIA RTX 4070 TI SUPER: 12.26 (SE +/- 0.13, N = 15)

TensorFlow 2.12 - Device: GPU - Batch Size: 16 - Model: VGG-16 (images/sec, More Is Better)
NVIDIA RTX 4070: 1.50
NVIDIA RTX 3090: 1.49
NVIDIA RTX 4070 TI: 1.49
NVIDIA RTX 4070 SUPER: 1.48
NVIDIA RTX 4070 TI SUPER: 1.45
SE values as charted (four shown for five results, so not assignable per card): SE +/- 0.01, N = 2; SE +/- 0.00, N = 3; SE +/- 0.00, N = 2; SE +/- 0.00, N = 3

TensorFlow 2.12 - Device: GPU - Batch Size: 32 - Model: VGG-16 (images/sec, More Is Better)
NVIDIA RTX 3090: 1.50 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI: 1.50 (SE +/- 0.00, N = 2)
NVIDIA RTX 4070: 1.50 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 SUPER: 1.50 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI SUPER: 1.46 (SE +/- 0.00, N = 3)

TensorFlow 2.12 - Device: GPU - Batch Size: 64 - Model: VGG-16 (images/sec, More Is Better)
NVIDIA RTX 3090: 1.51 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI: 1.50 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 1.50 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI SUPER: 1.46 (SE +/- 0.00, N = 3)
Device: GPU - Batch Size: 64 - Model: VGG-16
NVIDIA RTX 4070 SUPER: The test quit with a non-zero exit status. The test quit with a non-zero exit status. The test quit with a non-zero exit status. E: UnboundLocalError: cannot access local variable 'decorators' where it is not associated with a value
TensorFlow 2.12 - Device: GPU - Batch Size: 16 - Model: AlexNet (images/sec, More Is Better)
NVIDIA RTX 3090: 31.98
NVIDIA RTX 4070 TI: 31.70
NVIDIA RTX 4070 SUPER: 31.59
NVIDIA RTX 4070: 31.45
NVIDIA RTX 4070 TI SUPER: 31.10
SE values as charted (four shown for five results, so not assignable per card): SE +/- 0.07, N = 3; SE +/- 0.08, N = 3; SE +/- 0.17, N = 3; SE +/- 0.07, N = 3
TensorFlow 2.12 - Device: GPU - Batch Size: 256 - Model: VGG-16 (images/sec, More Is Better)
NVIDIA RTX 3090: 1.51
NVIDIA RTX 4070 TI: 1.50
NVIDIA RTX 4070 TI SUPER: 1.47
SE values as charted (two shown for three results, so not assignable per card): SE +/- 0.00, N = 3; SE +/- 0.00, N = 3
Device: GPU - Batch Size: 256 - Model: VGG-16
NVIDIA RTX 4070 SUPER: The test quit with a non-zero exit status. The test quit with a non-zero exit status. The test quit with a non-zero exit status.
NVIDIA RTX 4070: The test quit with a non-zero exit status. The test quit with a non-zero exit status. The test quit with a non-zero exit status. E: AttributeError: 'collections.OrderedDict' object has no attribute 'empty'
TensorFlow 2.12 - Device: GPU - Batch Size: 32 - Model: AlexNet (images/sec, More Is Better)
NVIDIA RTX 3090: 33.53 (SE +/- 0.05, N = 3)
NVIDIA RTX 4070 SUPER: 33.40 (SE +/- 0.15, N = 2)
NVIDIA RTX 4070: 33.32 (SE +/- 0.18, N = 3)
NVIDIA RTX 4070 TI: 33.29 (SE +/- 0.04, N = 3)
NVIDIA RTX 4070 TI SUPER: 32.88 (SE +/- 0.19, N = 3)
Device: GPU - Batch Size: 512 - Model: VGG-16
NVIDIA RTX 4070 SUPER: The test quit with a non-zero exit status. The test quit with a non-zero exit status. The test quit with a non-zero exit status. E: Fatal Python error: Segmentation fault
NVIDIA RTX 4070: The test quit with a non-zero exit status. The test quit with a non-zero exit status. The test quit with a non-zero exit status. E: AttributeError: 'function' object has no attribute 'empty'
NVIDIA RTX 4070 TI: The test quit with a non-zero exit status. The test quit with a non-zero exit status. The test quit with a non-zero exit status. E: Fatal Python error: Segmentation fault
NVIDIA RTX 3090: The test quit with a non-zero exit status. The test quit with a non-zero exit status. The test quit with a non-zero exit status. E: Fatal Python error: Segmentation fault
NVIDIA RTX 4070 TI SUPER: The test quit with a non-zero exit status. The test quit with a non-zero exit status. The test quit with a non-zero exit status. E: Fatal Python error: Segmentation fault
TensorFlow 2.12 - Device: GPU - Batch Size: 64 - Model: AlexNet (images/sec, More Is Better)
NVIDIA RTX 4070 TI: 34.06
NVIDIA RTX 4070 SUPER: 33.97
NVIDIA RTX 3090: 33.93
NVIDIA RTX 4070: 33.93
NVIDIA RTX 4070 TI SUPER: 33.55
SE values as charted (four shown for five results, so not assignable per card): SE +/- 0.06, N = 3; SE +/- 0.08, N = 3; SE +/- 0.14, N = 3; SE +/- 0.06, N = 3

TensorFlow 2.12 - Device: GPU - Batch Size: 1 - Model: GoogLeNet (images/sec, More Is Better)
NVIDIA RTX 3090: 12.82 (SE +/- 0.07, N = 3)
NVIDIA RTX 4070 TI: 12.79 (SE +/- 0.30, N = 2)
NVIDIA RTX 4070: 12.78 (SE +/- 0.10, N = 3)
NVIDIA RTX 4070 SUPER: 12.62 (SE +/- 0.17, N = 2)
NVIDIA RTX 4070 TI SUPER: 12.24 (SE +/- 0.05, N = 3)

TensorFlow 2.12 - Device: GPU - Batch Size: 1 - Model: ResNet-50 (images/sec, More Is Better)
NVIDIA RTX 3090: 4.35
NVIDIA RTX 4070 SUPER: 4.35
NVIDIA RTX 4070: 4.34
NVIDIA RTX 4070 TI: 4.32
NVIDIA RTX 4070 TI SUPER: 4.14
SE values as charted (four shown for five results, so not assignable per card): SE +/- 0.03, N = 3; SE +/- 0.01, N = 3; SE +/- 0.02, N = 2; SE +/- 0.02, N = 3

TensorFlow 2.12 - Device: GPU - Batch Size: 256 - Model: AlexNet (images/sec, More Is Better)
NVIDIA RTX 4070 TI: 34.61 (SE +/- 0.07, N = 2)
NVIDIA RTX 3090: 34.46 (SE +/- 0.07, N = 3)
NVIDIA RTX 4070 SUPER: 34.16 (SE +/- 0.01, N = 3)
NVIDIA RTX 4070 TI SUPER: 33.95 (SE +/- 0.05, N = 3)
Device: GPU - Batch Size: 256 - Model: AlexNet
NVIDIA RTX 4070: The test quit with a non-zero exit status. The test quit with a non-zero exit status. The test quit with a non-zero exit status. E: UnboundLocalError: cannot access local variable 'kind' where it is not associated with a value
TensorFlow 2.12 - Device: GPU - Batch Size: 512 - Model: AlexNet (images/sec, More Is Better)
NVIDIA RTX 3090: 35.58 (SE +/- 0.01, N = 3)
NVIDIA RTX 4070 TI: 35.44 (SE +/- 0.09, N = 2)
NVIDIA RTX 4070: 35.21 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070 SUPER: 35.10 (SE +/- 0.02, N = 2)
NVIDIA RTX 4070 TI SUPER: 35.02 (SE +/- 0.01, N = 3)

TensorFlow 2.12 - Device: GPU - Batch Size: 16 - Model: GoogLeNet (images/sec, More Is Better)
NVIDIA RTX 4070 TI: 15.69 (SE +/- 0.07, N = 3)
NVIDIA RTX 3090: 15.68 (SE +/- 0.05, N = 3)
NVIDIA RTX 4070 SUPER: 15.67 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070: 15.66 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070 TI SUPER: 15.29 (SE +/- 0.02, N = 3)

TensorFlow 2.12 - Device: GPU - Batch Size: 16 - Model: ResNet-50 (images/sec, More Is Better)
NVIDIA RTX 3090: 5.49 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 5.49 (SE +/- 0.02, N = 3)
NVIDIA RTX 4070 TI: 5.46 (SE +/- 0.01, N = 3)
NVIDIA RTX 4070 SUPER: 5.46 (SE +/- 0.00, N = 2)
NVIDIA RTX 4070 TI SUPER: 5.32 (SE +/- 0.02, N = 3)

TensorFlow 2.12 - Device: GPU - Batch Size: 32 - Model: GoogLeNet (images/sec, More Is Better)
NVIDIA RTX 4070 TI: 15.81 (SE +/- 0.03, N = 3)
NVIDIA RTX 3090: 15.67 (SE +/- 0.06, N = 3)
NVIDIA RTX 4070: 15.63 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070 SUPER: 15.61 (SE +/- 0.01, N = 2)
NVIDIA RTX 4070 TI SUPER: 15.11 (SE +/- 0.06, N = 3)

TensorFlow 2.12 - Device: GPU - Batch Size: 32 - Model: ResNet-50 (images/sec, More Is Better)
NVIDIA RTX 3090: 5.57 (SE +/- 0.01, N = 3)
NVIDIA RTX 4070: 5.55 (SE +/- 0.01, N = 2)
NVIDIA RTX 4070 SUPER: 5.51 (SE +/- 0.01, N = 2)
NVIDIA RTX 4070 TI: 5.50 (SE +/- 0.02, N = 3)
NVIDIA RTX 4070 TI SUPER: 5.35 (SE +/- 0.00, N = 3)

TensorFlow 2.12 - Device: GPU - Batch Size: 64 - Model: GoogLeNet (images/sec, More Is Better)
NVIDIA RTX 3090: 15.63
NVIDIA RTX 4070: 15.54
NVIDIA RTX 4070 SUPER: 15.52
NVIDIA RTX 4070 TI: 15.50
NVIDIA RTX 4070 TI SUPER: 15.00
SE values as charted (four shown for five results, so not assignable per card): SE +/- 0.08, N = 3; SE +/- 0.07, N = 3; SE +/- 0.06, N = 2; SE +/- 0.09, N = 3

TensorFlow 2.12 - Device: GPU - Batch Size: 64 - Model: ResNet-50 (images/sec, More Is Better)
NVIDIA RTX 3090: 5.57 (SE +/- 0.01, N = 3)
NVIDIA RTX 4070: 5.55 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 SUPER: 5.55 (SE +/- 0.01, N = 2)
NVIDIA RTX 4070 TI: 5.53 (SE +/- 0.01, N = 2)
NVIDIA RTX 4070 TI SUPER: 5.33 (SE +/- 0.02, N = 3)
GpuOwl GpuOwl is a Mersenne primality tester leveraging OpenCL for cross-vendor GPU acceleration. Learn more via the OpenBenchmarking.org test page.
GpuOwl 7.2.1 - Exponent: 57885161 (Iterations / Second, More Is Better)
NVIDIA RTX 4070 TI SUPER: 1025.99 (SE +/- 0.35, N = 3)
NVIDIA RTX 4070 TI: 919.13 (SE +/- 2.53, N = 3)
NVIDIA RTX 4070 SUPER: 869.07 (SE +/- 1.26, N = 3)
NVIDIA RTX 3090: 866.31 (SE +/- 2.01, N = 3)
NVIDIA RTX 4070: 714.80 (SE +/- 0.00, N = 3)

GpuOwl 7.2.1 - Exponent: 77936867 (Iterations / Second, More Is Better)
NVIDIA RTX 4070 TI SUPER: 761.61 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI: 676.59 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 SUPER: 646.41 (SE +/- 0.00, N = 3)
NVIDIA RTX 3090: 645.99 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 530.32 (SE +/- 0.09, N = 3)

GpuOwl 7.2.1 - Exponent: 332220523 (Iterations / Second, More Is Better)
NVIDIA RTX 4070 TI SUPER: 163.41 (SE +/- 0.01, N = 3)
NVIDIA RTX 4070 TI: 145.84 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 SUPER: 137.44 (SE +/- 0.00, N = 3)
NVIDIA RTX 3090: 137.32 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 112.61 (SE +/- 0.00, N = 3)
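To illustrate what a Mersenne primality test iterates over, here is a minimal sketch of the classic Lucas-Lehmer test for 2^p - 1. This is an assumption-laden illustration only: the result file does not say which algorithm GpuOwl runs, and a GPU implementation would use FFT-based multiplication rather than Python big integers. The function name `lucas_lehmer` is hypothetical.

```python
def lucas_lehmer(p: int) -> bool:
    """Return True if the Mersenne number 2**p - 1 is prime.

    Valid for prime exponents p >= 2 (illustrative sketch, not GpuOwl's code).
    """
    if p == 2:
        return True  # 2**2 - 1 = 3 is prime; the loop below needs p > 2
    m = (1 << p) - 1
    s = 4
    # One modular squaring per iteration, p - 2 iterations in total.
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


print([p for p in (3, 5, 7, 11, 13, 17, 19) if lucas_lehmer(p)])
# prints [3, 5, 7, 13, 17, 19]
```

The iteration count grows with the exponent and each squaring operates on a p-bit number, which is consistent with the lower iteration rates reported for the larger exponents above.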
IndigoBench This is a test of Indigo Renderer's IndigoBench benchmark. Learn more via the OpenBenchmarking.org test page.
IndigoBench 4.4 - Acceleration: OpenCL GPU - Scene: Bedroom (M samples/s, More Is Better)
NVIDIA RTX 4070 TI SUPER: 24.57 (SE +/- 0.01, N = 3)
NVIDIA RTX 3090: 20.96 (SE +/- 0.01, N = 3)
NVIDIA RTX 4070 TI: 20.26 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070 SUPER: 19.80 (SE +/- 0.01, N = 3)
NVIDIA RTX 4070: 18.20 (SE +/- 0.01, N = 3)

IndigoBench 4.4 - Acceleration: OpenCL GPU - Scene: Supercar (M samples/s, More Is Better)
NVIDIA RTX 4070 TI SUPER: 61.34 (SE +/- 0.06, N = 3)
NVIDIA RTX 4070 TI: 53.59 (SE +/- 0.02, N = 3)
NVIDIA RTX 4070 SUPER: 52.81 (SE +/- 0.03, N = 3)
NVIDIA RTX 3090: 52.01 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070: 48.52 (SE +/- 0.03, N = 3)
LuxCoreRender LuxCoreRender is an open-source 3D physically based renderer formerly known as LuxRender. LuxCoreRender supports CPU-based rendering as well as GPU acceleration via OpenCL, NVIDIA CUDA, and NVIDIA OptiX interfaces. Learn more via the OpenBenchmarking.org test page.
LuxCoreRender 2.6 - Scene: DLSC - Acceleration: GPU (M samples/sec, More Is Better)
NVIDIA RTX 4070 TI SUPER: 16.23 (SE +/- 0.01, N = 3; MIN: 15.91 / MAX: 16.36)
NVIDIA RTX 4070 TI: 13.95 (SE +/- 0.00, N = 3; MIN: 13.67 / MAX: 14.14)
NVIDIA RTX 4070 SUPER: 13.59 (SE +/- 0.01, N = 3; MIN: 12.52 / MAX: 13.84)
NVIDIA RTX 3090: 12.99 (SE +/- 1.13, N = 12; MIN: 0.52 / MAX: 14.69)
NVIDIA RTX 4070: 11.74 (SE +/- 0.01, N = 3; MIN: 11.35 / MAX: 11.83)

LuxCoreRender 2.6 - Scene: Danish Mood - Acceleration: GPU (M samples/sec, More Is Better)
NVIDIA RTX 4070 TI SUPER: 12.42 (SE +/- 0.03, N = 3; MIN: 4.35 / MAX: 14.32)
NVIDIA RTX 4070 TI: 10.99 (SE +/- 0.11, N = 3; MIN: 4.17 / MAX: 12.71)
NVIDIA RTX 4070 SUPER: 10.56 (SE +/- 0.08, N = 3; MIN: 3.7 / MAX: 12.17)
NVIDIA RTX 3090: 10.20 (SE +/- 0.04, N = 3; MIN: 4.07 / MAX: 11.93)
NVIDIA RTX 4070: 8.89 (SE +/- 0.06, N = 3; MIN: 3.32 / MAX: 10.26)

LuxCoreRender 2.6 - Scene: Orange Juice - Acceleration: GPU (M samples/sec, More Is Better)
NVIDIA RTX 4070 TI SUPER: 13.64 (SE +/- 0.15, N = 4; MIN: 11.16 / MAX: 18.46)
NVIDIA RTX 3090: 12.14 (SE +/- 0.01, N = 3; MIN: 10.24 / MAX: 16.71)
NVIDIA RTX 4070 TI: 11.89 (SE +/- 0.00, N = 3; MIN: 9.85 / MAX: 15.88)
NVIDIA RTX 4070 SUPER: 11.72 (SE +/- 0.07, N = 3; MIN: 9.6 / MAX: 15.44)
NVIDIA RTX 4070: 10.40 (SE +/- 0.03, N = 3; MIN: 8.31 / MAX: 13.9)

LuxCoreRender 2.6 - Scene: LuxCore Benchmark - Acceleration: GPU (M samples/sec, More Is Better)
NVIDIA RTX 4070 TI SUPER: 14.61 (SE +/- 0.00, N = 3; MIN: 5.91 / MAX: 16.88)
NVIDIA RTX 4070 TI: 13.23 (SE +/- 0.01, N = 3; MIN: 5.41 / MAX: 15.13)
NVIDIA RTX 3090: 13.12 (SE +/- 0.03, N = 2; MIN: 4.85 / MAX: 15.21)
NVIDIA RTX 4070 SUPER: 12.82 (SE +/- 0.02, N = 3; MIN: 4.84 / MAX: 14.62)
NVIDIA RTX 4070: 10.92 (SE +/- 0.01, N = 3; MIN: 4.45 / MAX: 12.42)

LuxCoreRender 2.6 - Scene: Rainbow Colors and Prism - Acceleration: GPU (M samples/sec, More Is Better)
NVIDIA RTX 3090: 33.29 (SE +/- 0.36, N = 5; MIN: 30.4 / MAX: 36.21)
NVIDIA RTX 4070 TI SUPER: 31.86 (SE +/- 0.09, N = 3; MIN: 28.57 / MAX: 33.29)
NVIDIA RTX 4070 TI: 27.71 (SE +/- 0.07, N = 3; MIN: 25.01 / MAX: 29.15)
NVIDIA RTX 4070 SUPER: 27.67 (SE +/- 0.03, N = 3; MIN: 24.87 / MAX: 29.03)
NVIDIA RTX 4070: 23.26 (SE +/- 0.01, N = 3; MIN: 20.92 / MAX: 24.3)
LeelaChessZero LeelaChessZero (lc0 / lczero) is a chess engine automated via neural networks. This test profile can be used for OpenCL, CUDA + cuDNN, and BLAS (CPU-based) benchmarking. Learn more via the OpenBenchmarking.org test page.
Backend: OpenCL
NVIDIA RTX 4070 SUPER: The test run did not produce a result. The test run did not produce a result. The test run did not produce a result.
NVIDIA RTX 4070: The test run did not produce a result. The test run did not produce a result. The test run did not produce a result.
NVIDIA RTX 4070 TI: The test run did not produce a result. The test run did not produce a result. The test run did not produce a result.
NVIDIA RTX 3090: The test run did not produce a result. The test run did not produce a result. The test run did not produce a result.
NVIDIA RTX 4070 TI SUPER: The test run did not produce a result. The test run did not produce a result. The test run did not produce a result.
MandelGPU MandelGPU is an OpenCL benchmark and this test runs with the OpenCL rendering float4 kernel with a maximum of 4096 iterations. Learn more via the OpenBenchmarking.org test page.
MandelGPU 1.3pts1 - OpenCL Device: GPU (Samples/sec, More Is Better)
NVIDIA RTX 4070 TI SUPER: 656484783.7 (SE +/- 1096202.13, N = 3)
NVIDIA RTX 4070 TI: 619106132.5 (SE +/- 1202791.77, N = 3)
NVIDIA RTX 4070 SUPER: 587219538.2 (SE +/- 467034.80, N = 3)
NVIDIA RTX 4070: 516770131.2 (SE +/- 1783157.89, N = 3)
NVIDIA RTX 3090: 484098913.8 (SE +/- 794770.01, N = 3)
(CC) gcc options: -O3 -lm -ftree-vectorize -funroll-loops -lglut -lOpenCL -lGL
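For readers unfamiliar with the workload, the following is a minimal scalar sketch of the standard Mandelbrot escape-time loop with the 4096-iteration cap quoted in the test description. It assumes the textbook algorithm; MandelGPU's actual float4 OpenCL kernel is not shown in the result file and processes several samples per work-item. The names `MAX_ITERATIONS` and `escape_iterations` are hypothetical.

```python
MAX_ITERATIONS = 4096  # iteration cap quoted in the test description


def escape_iterations(cx: float, cy: float, max_iter: int = MAX_ITERATIONS) -> int:
    """Count iterations of z -> z*z + c before |z| exceeds 2, capped at max_iter."""
    x = y = 0.0
    for i in range(max_iter):
        if x * x + y * y > 4.0:  # |z|^2 > 4 means the point has escaped
            return i
        x, y = x * x - y * y + cx, 2.0 * x * y + cy
    return max_iter  # treated as inside the set


# Points inside the set cost the full 4096 iterations; points far outside exit early.
print(escape_iterations(0.0, 0.0), escape_iterations(2.0, 2.0))
# prints "4096 1"
```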
ProjectPhysX OpenCL-Benchmark ProjectPhysX OpenCL-Benchmark provides various OpenCL compute and memory bandwidth micro-benchmarks. Learn more via the OpenBenchmarking.org test page.
ProjectPhysX OpenCL-Benchmark 1.2 - Operation: FP64 Compute (TFLOPs/s, More Is Better)
NVIDIA RTX 4070 TI SUPER: 0.743 (SE +/- 0.001, N = 3)
NVIDIA RTX 4070 TI: 0.660 (SE +/- 0.001, N = 3)
NVIDIA RTX 3090: 0.637 (SE +/- 0.001, N = 3)
NVIDIA RTX 4070 SUPER: 0.621 (SE +/- 0.000, N = 3)
NVIDIA RTX 4070: 0.510 (SE +/- 0.001, N = 3)

ProjectPhysX OpenCL-Benchmark 1.2 - Operation: FP32 Compute (TFLOPs/s, More Is Better)
NVIDIA RTX 4070 TI SUPER: 45.95 (SE +/- 0.01, N = 3)
NVIDIA RTX 4070 TI: 40.91 (SE +/- 0.00, N = 3)
NVIDIA RTX 3090: 39.40 (SE +/- 0.10, N = 3)
NVIDIA RTX 4070 SUPER: 38.59 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070: 31.77 (SE +/- 0.03, N = 3)

ProjectPhysX OpenCL-Benchmark 1.2 - Operation: INT64 Compute (TIOPs/s, More Is Better)
NVIDIA RTX 4070 TI: 4.420 (SE +/- 0.016, N = 3)
NVIDIA RTX 4070 TI SUPER: 4.414 (SE +/- 0.009, N = 3)
NVIDIA RTX 4070 SUPER: 4.214 (SE +/- 0.015, N = 3)
NVIDIA RTX 4070: 3.443 (SE +/- 0.004, N = 3)
NVIDIA RTX 3090: 3.135 (SE +/- 0.003, N = 3)

ProjectPhysX OpenCL-Benchmark 1.2 - Operation: INT32 Compute (TIOPs/s, More Is Better)
NVIDIA RTX 4070 TI SUPER: 23.66 (SE +/- 0.01, N = 3)
NVIDIA RTX 4070 TI: 21.05 (SE +/- 0.04, N = 3)
NVIDIA RTX 3090: 20.03 (SE +/- 0.06, N = 3)
NVIDIA RTX 4070 SUPER: 19.89 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 16.38 (SE +/- 0.02, N = 3)

ProjectPhysX OpenCL-Benchmark 1.2 - Operation: INT16 Compute (TIOPs/s, More Is Better)
NVIDIA RTX 4070 TI SUPER: 20.50 (SE +/- 0.02, N = 3)
NVIDIA RTX 4070 TI: 18.28 (SE +/- 0.02, N = 3)
NVIDIA RTX 4070 SUPER: 17.17 (SE +/- 0.00, N = 3)
NVIDIA RTX 3090: 17.00 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 14.28 (SE +/- 0.02, N = 3)

ProjectPhysX OpenCL-Benchmark 1.2 - Operation: INT8 Compute (TIOPs/s, More Is Better)
NVIDIA RTX 4070 TI SUPER: 17.62 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI: 15.73 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070 SUPER: 14.31 (SE +/- 0.05, N = 3)
NVIDIA RTX 3090: 13.73 (SE +/- 0.07, N = 3)
NVIDIA RTX 4070: 12.12 (SE +/- 0.02, N = 3)

(CXX) g++ options for the ProjectPhysX OpenCL-Benchmark tests above: -std=c++17 -pthread -lOpenCL
ArrayFire ArrayFire is a GPU and CPU numeric processing library; this test uses the built-in CPU and OpenCL ArrayFire benchmarks. Learn more via the OpenBenchmarking.org test page.
Test: Conjugate Gradient OpenCL
NVIDIA RTX 4070 SUPER: The test run did not produce a result. The test run did not produce a result. The test run did not produce a result. E: arrayfire: line 3: ./cg_opencl: No such file or directory
NVIDIA RTX 4070: The test run did not produce a result. The test run did not produce a result. The test run did not produce a result. E: arrayfire: line 3: ./cg_opencl: No such file or directory
NVIDIA RTX 4070 TI: The test run did not produce a result. The test run did not produce a result. The test run did not produce a result. E: arrayfire: line 3: ./cg_opencl: No such file or directory
NVIDIA RTX 3090: The test run did not produce a result. The test run did not produce a result. The test run did not produce a result. E: arrayfire: line 3: ./cg_opencl: No such file or directory
NVIDIA RTX 4070 TI SUPER: The test run did not produce a result. The test run did not produce a result. The test run did not produce a result. E: arrayfire: line 3: ./cg_opencl: No such file or directory
NAMD CUDA NAMD is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. NAMD was developed by the Theoretical and Computational Biophysics Group in the Beckman Institute for Advanced Science and Technology at the University of Illinois at Urbana-Champaign. This version of the NAMD test profile uses CUDA GPU acceleration. Learn more via the OpenBenchmarking.org test page.
NAMD CUDA 2.14 - ATPase Simulation - 327,506 Atoms (days/ns, Fewer Is Better)
NVIDIA RTX 4070 TI: 0.06788 (SE +/- 0.00061, N = 3)
NVIDIA RTX 4070 SUPER: 0.06791 (SE +/- 0.00031, N = 3)
NVIDIA RTX 4070: 0.07498 (SE +/- 0.00021, N = 3)
NVIDIA RTX 4070 TI SUPER: 0.07715 (SE +/- 0.00018, N = 3)
NVIDIA RTX 3090: 0.10822 (SE +/- 0.00042, N = 3)
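NAMD reports days of wall time per simulated nanosecond, so smaller is better. The reciprocal, ns/day, is often easier to compare. The sketch below converts the five values reported above; the helper name `ns_per_day` is hypothetical and the conversion is plain arithmetic, not part of the test profile.

```python
def ns_per_day(days_per_ns: float) -> float:
    """Convert NAMD's days/ns figure into simulated nanoseconds per day."""
    if days_per_ns <= 0:
        raise ValueError("days/ns must be positive")
    return 1.0 / days_per_ns


# days/ns values as reported in the NAMD CUDA 2.14 result above.
results = {
    "NVIDIA RTX 4070 TI": 0.06788,
    "NVIDIA RTX 4070 SUPER": 0.06791,
    "NVIDIA RTX 4070": 0.07498,
    "NVIDIA RTX 4070 TI SUPER": 0.07715,
    "NVIDIA RTX 3090": 0.10822,
}

for gpu, days in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{gpu}: {days} days/ns = {ns_per_day(days):.2f} ns/day")
```

On these figures the fastest result corresponds to roughly 14.7 ns/day and the slowest to roughly 9.2 ns/day.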
Caffe This is a benchmark of the Caffe deep learning framework. It currently supports the AlexNet and GoogLeNet models and execution on both CPUs and NVIDIA GPUs. Learn more via the OpenBenchmarking.org test page.
Model: AlexNet - Acceleration: NVIDIA CUDA - Iterations: 100
NVIDIA RTX 4070 SUPER: The test quit with a non-zero exit status. E: @ 0x7dd7c6de3450 google::LogMessageFatal::~LogMessageFatal()
NVIDIA RTX 4070: The test quit with a non-zero exit status. E: @ 0x7b80311e3450 google::LogMessageFatal::~LogMessageFatal()
NVIDIA RTX 4070 TI: The test quit with a non-zero exit status. E: @ 0x736df4b59450 google::LogMessageFatal::~LogMessageFatal()
NVIDIA RTX 3090: The test quit with a non-zero exit status. E: @ 0x77ed97de3450 google::LogMessageFatal::~LogMessageFatal()
NVIDIA RTX 4070 TI SUPER: The test quit with a non-zero exit status. E: @ 0x78750ffea450 google::LogMessageFatal::~LogMessageFatal()
Model: AlexNet - Acceleration: NVIDIA CUDA - Iterations: 200
NVIDIA RTX 4070 SUPER: The test quit with a non-zero exit status. E: @ 0x7b5ea59be450 google::LogMessageFatal::~LogMessageFatal()
NVIDIA RTX 4070: The test quit with a non-zero exit status. E: @ 0x7c31ed79d450 google::LogMessageFatal::~LogMessageFatal()
NVIDIA RTX 4070 TI: The test quit with a non-zero exit status. E: @ 0x7ba579075450 google::LogMessageFatal::~LogMessageFatal()
NVIDIA RTX 3090: The test quit with a non-zero exit status. E: @ 0x7ace5f7b4450 google::LogMessageFatal::~LogMessageFatal()
NVIDIA RTX 4070 TI SUPER: The test quit with a non-zero exit status. E: @ 0x70294d9e3450 google::LogMessageFatal::~LogMessageFatal()
Model: AlexNet - Acceleration: NVIDIA CUDA - Iterations: 1000
NVIDIA RTX 4070 SUPER: The test quit with a non-zero exit status. E: @ 0x7670bcda4450 google::LogMessageFatal::~LogMessageFatal()
NVIDIA RTX 4070: The test quit with a non-zero exit status. E: @ 0x7bb89c5be450 google::LogMessageFatal::~LogMessageFatal()
NVIDIA RTX 4070 TI: The test quit with a non-zero exit status. E: @ 0x72248ee5c450 google::LogMessageFatal::~LogMessageFatal()
NVIDIA RTX 3090: The test quit with a non-zero exit status. E: @ 0x7d66735f5450 google::LogMessageFatal::~LogMessageFatal()
NVIDIA RTX 4070 TI SUPER: The test quit with a non-zero exit status. E: @ 0x7ce671f7d450 google::LogMessageFatal::~LogMessageFatal()
Model: GoogleNet - Acceleration: NVIDIA CUDA - Iterations: 100
NVIDIA RTX 4070 SUPER: The test quit with a non-zero exit status. E: @ 0x73552c3e3450 google::LogMessageFatal::~LogMessageFatal()
NVIDIA RTX 4070: The test quit with a non-zero exit status. E: @ 0x71f0ea05a450 google::LogMessageFatal::~LogMessageFatal()
NVIDIA RTX 4070 TI: The test quit with a non-zero exit status. E: @ 0x7898abd73450 google::LogMessageFatal::~LogMessageFatal()
NVIDIA RTX 3090: The test quit with a non-zero exit status. E: @ 0x7522f0d76450 google::LogMessageFatal::~LogMessageFatal()
NVIDIA RTX 4070 TI SUPER: The test quit with a non-zero exit status. E: @ 0x703a2cf99450 google::LogMessageFatal::~LogMessageFatal()
Model: GoogleNet - Acceleration: NVIDIA CUDA - Iterations: 200
NVIDIA RTX 4070 SUPER: The test quit with a non-zero exit status. E: @ 0x7d7151816450 google::LogMessageFatal::~LogMessageFatal()
NVIDIA RTX 4070: The test quit with a non-zero exit status. E: @ 0x7e64df79d450 google::LogMessageFatal::~LogMessageFatal()
NVIDIA RTX 4070 TI: The test quit with a non-zero exit status. E: @ 0x761e63d48450 google::LogMessageFatal::~LogMessageFatal()
NVIDIA RTX 3090: The test quit with a non-zero exit status. E: @ 0x7bfcc77e3450 google::LogMessageFatal::~LogMessageFatal()
NVIDIA RTX 4070 TI SUPER: The test quit with a non-zero exit status. E: @ 0x7bc837b4a450 google::LogMessageFatal::~LogMessageFatal()
Model: GoogleNet - Acceleration: NVIDIA CUDA - Iterations: 1000
NVIDIA RTX 4070 SUPER: The test quit with a non-zero exit status. E: @ 0x74746a490450 google::LogMessageFatal::~LogMessageFatal()
NVIDIA RTX 4070: The test quit with a non-zero exit status. E: @ 0x7493bdbbc450 google::LogMessageFatal::~LogMessageFatal()
NVIDIA RTX 4070 TI: The test quit with a non-zero exit status. E: @ 0x7338f7773450 google::LogMessageFatal::~LogMessageFatal()
NVIDIA RTX 3090: The test quit with a non-zero exit status. E: @ 0x7792f141e450 google::LogMessageFatal::~LogMessageFatal()
NVIDIA RTX 4070 TI SUPER: The test quit with a non-zero exit status. E: @ 0x7911d0e44450 google::LogMessageFatal::~LogMessageFatal()
VkResample
VkResample is a Vulkan-based image upscaling library based on VkFFT. The sample workload upscales a 4K image to 8K using Vulkan-based GPU acceleration. Learn more via the OpenBenchmarking.org test page.
VkResample 1.0 - Upscale: 2x - Precision: Double (ms, Fewer Is Better)
NVIDIA RTX 4070 TI SUPER: 285.99 (SE +/- 0.02, N = 3)
NVIDIA RTX 4070 TI: 322.06 (SE +/- 0.35, N = 3)
NVIDIA RTX 3090: 333.64 (SE +/- 0.30, N = 3)
NVIDIA RTX 4070 SUPER: 339.59 (SE +/- 0.30, N = 3)
NVIDIA RTX 4070: 415.16 (SE +/- 0.77, N = 3)
(CXX) g++ options: -O3
VkResample 1.0 - Upscale: 2x - Precision: Single (ms, Fewer Is Better)
NVIDIA RTX 3090: 10.32 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI SUPER: 13.36 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 18.02 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI: 18.46 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 SUPER: 18.49 (SE +/- 0.00, N = 3)
(CXX) g++ options: -O3
FinanceBench
FinanceBench is a collection of financial program benchmarks with support for benchmarking on the GPU via OpenCL and CPU benchmarking with OpenMP. The FinanceBench test cases are focused on the Black-Scholes-Merton Process with Analytic European Option engine, the QMC (Sobol) Monte-Carlo method (Equity Option Example), Bonds (fixed-rate bond with flat forward curve), and Repo (securities repurchase agreement). FinanceBench was originally written by the Cavazos Lab at the University of Delaware. Learn more via the OpenBenchmarking.org test page.
FinanceBench 2016-07-25 - Benchmark: Black-Scholes OpenCL (ms, Fewer Is Better)
NVIDIA RTX 4070 TI SUPER: 0.501 (SE +/- 0.000, N = 3)
NVIDIA RTX 4070 TI: 5.226 (SE +/- 0.003, N = 3)
NVIDIA RTX 3090: 5.741 (SE +/- 0.006, N = 3)
NVIDIA RTX 4070 SUPER: 5.912 (SE +/- 0.114, N = 15)
NVIDIA RTX 4070: 6.906 (SE +/- 0.003, N = 3)
(CXX) g++ options: -O3 -march=native -fopenmp
NCNN
NCNN is a high-performance neural network inference framework, developed by Tencent, that is optimized for mobile and other platforms. Learn more via the OpenBenchmarking.org test page.
Target: Vulkan GPU
NVIDIA RTX 4070 SUPER: The test quit with a non-zero exit status. E: ncnn: line 3: ./benchncnn: No such file or directory
All NCNN 20230517 results below are in ms (Fewer Is Better) and were built with (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread.
NCNN 20230517 - Target: Vulkan GPU - Model: mobilenet
NVIDIA RTX 4070 TI SUPER: 6.28 (SE +/- 0.05, N = 3; MIN: 6.16 / MAX: 8.09)
NVIDIA RTX 3090: 6.92 (SE +/- 0.22, N = 9; MIN: 6.06 / MAX: 8.65)
NVIDIA RTX 4070: 7.20 (SE +/- 0.21, N = 12; MIN: 6.2 / MAX: 11.13)
NVIDIA RTX 4070 TI: 7.45 (SE +/- 0.25, N = 12; MIN: 6.87 / MAX: 734.65)
NVIDIA RTX 4070 SUPER: 8.62 (SE +/- 0.47, N = 9; MIN: 6.42 / MAX: 1101.3)
NCNN 20230517 - Target: Vulkan GPU - Model: mobilenet-v2
NVIDIA RTX 3090: 2.34 (SE +/- 0.15, N = 3; MIN: 2.04 / MAX: 2.63)
NVIDIA RTX 4070 TI SUPER: 2.42 (SE +/- 0.09, N = 3; MIN: 2.24 / MAX: 9.23)
NVIDIA RTX 4070 TI: 2.43 (SE +/- 0.07, N = 9; MIN: 2.09 / MAX: 5.8)
NVIDIA RTX 4070: 2.48 (SE +/- 0.07, N = 12; MIN: 2.02 / MAX: 5.82)
NVIDIA RTX 4070 SUPER: 3.03 (SE +/- 0.44, N = 9; MIN: 2.38 / MAX: 970.87)
NCNN 20230517 - Target: Vulkan GPU - Model: mobilenet-v3
NVIDIA RTX 4070 TI SUPER: 1.87 (SE +/- 0.02, N = 3; MIN: 1.81 / MAX: 5.21)
NVIDIA RTX 4070 TI: 2.09 (SE +/- 0.09, N = 9; MIN: 1.78 / MAX: 2.85)
NVIDIA RTX 4070: 2.15 (SE +/- 0.08, N = 12; MIN: 1.81 / MAX: 2.58)
NVIDIA RTX 3090: 2.20 (SE +/- 0.09, N = 9; MIN: 1.91 / MAX: 2.71)
NVIDIA RTX 4070 SUPER: 2.25 (SE +/- 0.16, N = 9; MIN: 1.75 / MAX: 343.7)
NCNN 20230517 - Target: Vulkan GPU - Model: shufflenet-v2
NVIDIA RTX 4070 TI: 2.01 (SE +/- 0.08, N = 12; MIN: 1.73 / MAX: 3.86)
NVIDIA RTX 3090: 2.04 (SE +/- 0.21, N = 3; MIN: 1.8 / MAX: 5.8)
NVIDIA RTX 4070 TI SUPER: 2.05 (SE +/- 0.19, N = 3; MIN: 1.83 / MAX: 6.6)
NVIDIA RTX 4070: 2.08 (SE +/- 0.09, N = 11; MIN: 1.82 / MAX: 2.59)
NVIDIA RTX 4070 SUPER: 2.31 (SE +/- 0.34, N = 8; MIN: 1.76 / MAX: 421.42)
NCNN 20230517 - Target: Vulkan GPU - Model: mnasnet
NVIDIA RTX 3090: 2.16 (SE +/- 0.14, N = 3; MIN: 2.01 / MAX: 2.55)
NVIDIA RTX 4070 TI SUPER: 2.21 (SE +/- 0.13, N = 3; MIN: 2.01 / MAX: 3.93)
NVIDIA RTX 4070: 2.22 (SE +/- 0.08, N = 8; MIN: 1.83 / MAX: 2.54)
NVIDIA RTX 4070 TI: 2.30 (SE +/- 0.05, N = 9; MIN: 2.15 / MAX: 2.58)
NVIDIA RTX 4070 SUPER: 3.85 (SE +/- 1.31, N = 9; MIN: 1.89 / MAX: 1093.29)
NCNN 20230517 - Target: Vulkan GPU - Model: efficientnet-b0
NVIDIA RTX 3090: 3.34 (SE +/- 0.17, N = 3; MIN: 3.14 / MAX: 4)
NVIDIA RTX 4070 TI SUPER: 3.36 (SE +/- 0.07, N = 3; MIN: 3.21 / MAX: 3.57)
NVIDIA RTX 4070: 3.46 (SE +/- 0.09, N = 9; MIN: 2.91 / MAX: 3.79)
NVIDIA RTX 4070 TI: 3.46 (SE +/- 0.07, N = 12; MIN: 3.13 / MAX: 7.03)
NVIDIA RTX 4070 SUPER: 5.07 (SE +/- 0.97, N = 9; MIN: 3.22 / MAX: 1124.2)
NCNN 20230517 - Target: Vulkan GPU - Model: blazeface
NVIDIA RTX 4070 TI: 0.81 (SE +/- 0.03, N = 9; MIN: 0.61 / MAX: 1.19)
NVIDIA RTX 4070 SUPER: 0.84 (SE +/- 0.04, N = 9; MIN: 0.65 / MAX: 4.63)
NVIDIA RTX 4070: 0.84 (SE +/- 0.03, N = 9; MIN: 0.64 / MAX: 0.96)
NVIDIA RTX 3090: 0.84 (SE +/- 0.03, N = 9; MIN: 0.63 / MAX: 1.13)
NVIDIA RTX 4070 TI SUPER: 0.86 (SE +/- 0.05, N = 3; MIN: 0.75 / MAX: 2.68)
NCNN 20230517 - Target: Vulkan GPU - Model: googlenet
NVIDIA RTX 4070 TI: 5.87 (SE +/- 0.14, N = 9; MIN: 5.2 / MAX: 6.88)
NVIDIA RTX 4070: 6.06 (SE +/- 0.14, N = 9; MIN: 5.33 / MAX: 8.36)
NVIDIA RTX 3090: 6.11 (SE +/- 0.18, N = 9; MIN: 5.25 / MAX: 9.16)
NVIDIA RTX 4070 TI SUPER: 6.25 (SE +/- 0.24, N = 3; MIN: 5.84 / MAX: 6.86)
NVIDIA RTX 4070 SUPER: 11.04 (SE +/- 1.21, N = 9; MIN: 5.28 / MAX: 1769.19)
NCNN 20230517 - Target: Vulkan GPU - Model: vgg16
NVIDIA RTX 3090: 17.88 (SE +/- 0.25, N = 3; MIN: 17.3 / MAX: 18.57)
NVIDIA RTX 4070 TI SUPER: 21.76 (SE +/- 0.19, N = 3; MIN: 21.34 / MAX: 23.45)
NVIDIA RTX 4070 TI: 32.05 (SE +/- 11.81, N = 9; MIN: 17.34 / MAX: 644.35)
NVIDIA RTX 4070: 45.52 (SE +/- 13.24, N = 12; MIN: 17.49 / MAX: 643.35)
NVIDIA RTX 4070 SUPER: 117.81 (SE +/- 29.60, N = 9; MIN: 17.16 / MAX: 647.67)
NCNN 20230517 - Target: Vulkan GPU - Model: resnet18
NVIDIA RTX 3090: 4.12 (SE +/- 0.07, N = 3; MIN: 3.97 / MAX: 4.51)
NVIDIA RTX 4070 TI SUPER: 4.64 (SE +/- 0.08, N = 3; MIN: 4.46 / MAX: 7.98)
NVIDIA RTX 4070: 5.11 (SE +/- 0.73, N = 12; MIN: 3.99 / MAX: 916.69)
NVIDIA RTX 4070 TI: 5.47 (SE +/- 1.33, N = 9; MIN: 3.95 / MAX: 726.67)
NVIDIA RTX 4070 SUPER: 8.97 (SE +/- 3.49, N = 9; MIN: 3.94 / MAX: 922.04)
NCNN 20230517 - Target: Vulkan GPU - Model: alexnet
NVIDIA RTX 3090: 3.60 (SE +/- 0.09, N = 3; MIN: 3.44 / MAX: 3.79)
NVIDIA RTX 4070 TI: 3.74 (SE +/- 0.03, N = 9; MIN: 3.61 / MAX: 3.98)
NVIDIA RTX 4070 TI SUPER: 4.38 (SE +/- 0.03, N = 3; MIN: 4.29 / MAX: 6.18)
NVIDIA RTX 4070: 5.78 (SE +/- 1.70, N = 12; MIN: 3.6 / MAX: 397.75)
NVIDIA RTX 4070 SUPER: 16.17 (SE +/- 5.86, N = 9; MIN: 3.52 / MAX: 436.52)
NCNN 20230517 - Target: Vulkan GPU - Model: resnet50
NVIDIA RTX 3090: 8.20 (SE +/- 0.12, N = 9; MIN: 7.69 / MAX: 11.69)
NVIDIA RTX 4070: 8.24 (SE +/- 0.10, N = 9; MIN: 7.87 / MAX: 9.87)
NVIDIA RTX 4070 TI SUPER: 8.58 (SE +/- 0.22, N = 3; MIN: 8.2 / MAX: 11.07)
NVIDIA RTX 4070 TI: 12.25 (SE +/- 4.00, N = 12; MIN: 8 / MAX: 1777.17)
NVIDIA RTX 4070 SUPER: 46.26 (SE +/- 14.70, N = 9; MIN: 7.71 / MAX: 1829.99)
NCNN 20230517 - Target: Vulkan GPU - Model: yolov4-tiny
NVIDIA RTX 3090: 11.29 (SE +/- 0.21, N = 3; MIN: 10.82 / MAX: 11.93)
NVIDIA RTX 4070 TI SUPER: 14.26 (SE +/- 3.14, N = 3; MIN: 10.89 / MAX: 673.37)
NVIDIA RTX 4070 TI: 16.37 (SE +/- 3.10, N = 12; MIN: 10.57 / MAX: 855.36)
NVIDIA RTX 4070: 20.74 (SE +/- 5.37, N = 12; MIN: 10.3 / MAX: 854.36)
NVIDIA RTX 4070 SUPER: 63.82 (SE +/- 10.56, N = 9; MIN: 10.28 / MAX: 858.44)
NCNN 20230517 - Target: Vulkan GPU - Model: squeezenet_ssd
NVIDIA RTX 3090: 4.90 (SE +/- 0.19, N = 3; MIN: 4.47 / MAX: 5.27)
NVIDIA RTX 4070 TI SUPER: 5.11 (SE +/- 0.54, N = 3; MIN: 4.43 / MAX: 8.4)
NVIDIA RTX 4070: 5.18 (SE +/- 0.12, N = 12; MIN: 4.67 / MAX: 6.88)
NVIDIA RTX 4070 TI: 5.36 (SE +/- 0.29, N = 9; MIN: 4.55 / MAX: 496.3)
NVIDIA RTX 4070 SUPER: 6.86 (SE +/- 1.76, N = 9; MIN: 4.34 / MAX: 1630.01)
NCNN 20230517 - Target: Vulkan GPU - Model: regnety_400m
NVIDIA RTX 4070 TI: 5.89 (SE +/- 0.18, N = 12; MIN: 5.42 / MAX: 7.57)
NVIDIA RTX 4070: 6.21 (SE +/- 0.24, N = 12; MIN: 5.53 / MAX: 8.99)
NVIDIA RTX 3090: 6.47 (SE +/- 0.32, N = 8; MIN: 5.44 / MAX: 9.3)
NVIDIA RTX 4070 TI SUPER: 6.59 (SE +/- 0.26, N = 12; MIN: 5.45 / MAX: 9.09)
NVIDIA RTX 4070 SUPER: 11.11 (SE +/- 3.28, N = 9; MIN: 5.49 / MAX: 4942.19)
NCNN 20230517 - Target: Vulkan GPU - Model: vision_transformer
NVIDIA RTX 4070: 281.56 (SE +/- 61.31, N = 9; MIN: 46.48 / MAX: 1913.33)
NVIDIA RTX 4070 TI SUPER: 312.10 (SE +/- 57.46, N = 12; MIN: 47.85 / MAX: 1850.09)
NVIDIA RTX 3090: 327.82 (SE +/- 52.80, N = 9; MIN: 46.48 / MAX: 1816.93)
NVIDIA RTX 4070 TI: 390.18 (SE +/- 25.65, N = 9; MIN: 46.49 / MAX: 1816.77)
NVIDIA RTX 4070 SUPER: 844.61 (SE +/- 87.53, N = 9; MIN: 46.34 / MAX: 1866.93)
NCNN 20230517 - Target: Vulkan GPU - Model: FastestDet
NVIDIA RTX 4070: 2.34 (SE +/- 0.10, N = 9; MIN: 2 / MAX: 3.86)
NVIDIA RTX 3090: 2.50 (SE +/- 0.08, N = 8; MIN: 2.1 / MAX: 32.36)
NVIDIA RTX 4070 TI SUPER: 2.54 (SE +/- 0.26, N = 3; MIN: 2.14 / MAX: 4.21)
NVIDIA RTX 4070 TI: 2.84 (SE +/- 0.12, N = 8; MIN: 2.4 / MAX: 5.07)
NVIDIA RTX 4070 SUPER: 2.86 (SE +/- 0.29, N = 9; MIN: 2.17 / MAX: 577.17)
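Several NCNN results above carry a standard error that is large compared with the mean, so their ranking should be read with caution. A rough way to spot such runs is to compare SE with the mean. The sketch below does this for the vgg16 result, assuming the SE values in the chart follow the same order as the GPU labels; the 5% cutoff and the name `relative_se` are illustrative choices, not something the Phoronix Test Suite defines.

```python
# vgg16 results copied from the NCNN Vulkan GPU chart above: (mean ms, SE, N).
# Assumption: SE and mean values follow the same order as the GPU labels.
vgg16 = {
    "NVIDIA RTX 3090": (17.88, 0.25, 3),
    "NVIDIA RTX 4070 TI SUPER": (21.76, 0.19, 3),
    "NVIDIA RTX 4070 TI": (32.05, 11.81, 9),
    "NVIDIA RTX 4070": (45.52, 13.24, 12),
    "NVIDIA RTX 4070 SUPER": (117.81, 29.60, 9),
}

# Hypothetical cutoff: flag a result when SE exceeds 5% of its mean.
NOISE_THRESHOLD = 0.05


def relative_se(mean: float, se: float) -> float:
    """Return the standard error as a fraction of the mean."""
    return se / mean


noisy = [
    gpu
    for gpu, (mean, se, _runs) in vgg16.items()
    if relative_se(mean, se) > NOISE_THRESHOLD
]

for gpu, (mean, se, runs) in vgg16.items():
    flag = "noisy" if gpu in noisy else "stable"
    print(f"{gpu}: {relative_se(mean, se):.1%} of mean over {runs} runs ({flag})")
```

For the flagged GPUs the reported MIN is close to the stable results while the MAX is far larger, which suggests that a few slow outlier iterations pull the mean up.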
RealSR-NCNN
RealSR-NCNN is an NCNN neural network implementation of the RealSR project, accelerated using the Vulkan API. RealSR stands for Real-World Super-Resolution via Kernel Estimation and Noise Injection. NCNN is a high-performance neural network inference framework, developed by Tencent, that is optimized for mobile and other platforms. This test profile times how long it takes to increase the resolution of a sample image by a scale of 4x with Vulkan. Learn more via the OpenBenchmarking.org test page.
RealSR-NCNN 20200818 - Scale: 4x - TAA: No (Seconds, Fewer Is Better)
NVIDIA RTX 3090: 5.556 (SE +/- 0.016, N = 3)
NVIDIA RTX 4070 TI SUPER: 5.633 (SE +/- 0.003, N = 3)
NVIDIA RTX 4070 TI: 5.962 (SE +/- 0.039, N = 3)
NVIDIA RTX 4070 SUPER: 6.323 (SE +/- 0.150, N = 15)
NVIDIA RTX 4070: 7.092 (SE +/- 0.006, N = 3)
RealSR-NCNN 20200818 - Scale: 4x - TAA: Yes (Seconds, Fewer Is Better)
NVIDIA RTX 3090: 30.31 (SE +/- 0.06, N = 3)
NVIDIA RTX 4070 TI SUPER: 30.72 (SE +/- 0.02, N = 3)
NVIDIA RTX 4070 TI: 33.63 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070 SUPER: 34.89 (SE +/- 0.02, N = 3)
NVIDIA RTX 4070: 42.85 (SE +/- 0.23, N = 3)
Waifu2x-NCNN Vulkan
Waifu2x-NCNN is an NCNN neural network implementation of the Waifu2x converter project, accelerated using the Vulkan API. NCNN is a high-performance neural network inference framework, developed by Tencent, that is optimized for mobile and other platforms. This test profile times how long it takes to increase the resolution of a sample image with Vulkan. Learn more via the OpenBenchmarking.org test page.
Scale: 2x - Denoise: 3 - TAA: No
NVIDIA RTX 4070 SUPER: The test run did not produce a result.
NVIDIA RTX 4070: The test run did not produce a result.
NVIDIA RTX 4070 TI: The test run did not produce a result.
NVIDIA RTX 3090: The test run did not produce a result.
NVIDIA RTX 4070 TI SUPER: The test run did not produce a result.
Waifu2x-NCNN Vulkan 20200818 - Scale: 2x - Denoise: 3 - TAA: Yes (Seconds, Fewer Is Better)
NVIDIA RTX 4070 TI SUPER: 2.660 (SE +/- 0.028, N = 3)
NVIDIA RTX 4070 TI: 2.854 (SE +/- 0.009, N = 3)
NVIDIA RTX 4070 SUPER: 2.855 (SE +/- 0.014, N = 3)
NVIDIA RTX 4070: 3.168 (SE +/- 0.028, N = 3)
NVIDIA RTX 3090: 3.202 (SE +/- 0.011, N = 3)
Rodinia
Rodinia is a suite focused on accelerating compute-intensive applications with accelerators. The included applications support the CUDA, OpenMP, and OpenCL parallel models. This profile currently uses select OpenCL, NVIDIA CUDA, and OpenMP test binaries. Learn more via the OpenBenchmarking.org test page.
Rodinia 3.1 - Test: OpenCL Particle Filter (Seconds, Fewer Is Better)
NVIDIA RTX 4070 TI SUPER: 2.973 (SE +/- 0.004, N = 3)
NVIDIA RTX 4070 TI: 3.291 (SE +/- 0.002, N = 3)
NVIDIA RTX 4070 SUPER: 3.480 (SE +/- 0.039, N = 4)
NVIDIA RTX 3090: 3.844 (SE +/- 0.030, N = 15)
NVIDIA RTX 4070: 4.098 (SE +/- 0.008, N = 3)
(CXX) g++ options: -O2 -lOpenCL
Blender
Blender is an open-source 3D creation and modeling software project. This test measures Blender's Cycles performance with various sample files. GPU computing via NVIDIA OptiX and NVIDIA CUDA is currently supported, as well as HIP for AMD Radeon GPUs and Intel oneAPI for Intel Graphics. Learn more via the OpenBenchmarking.org test page.
Blender 4.0 - Blend File: BMW27 - Compute: NVIDIA OptiX (Seconds, Fewer Is Better)
NVIDIA RTX 4070 TI SUPER: 5.04 (SE +/- 0.06, N = 14)
NVIDIA RTX 4070 TI: 5.43 (SE +/- 0.02, N = 3)
NVIDIA RTX 4070 SUPER: 5.57 (SE +/- 0.06, N = 13)
NVIDIA RTX 4070: 6.21 (SE +/- 0.01, N = 3)
NVIDIA RTX 3090: 6.31 (SE +/- 0.06, N = 14)
Blender 4.0 - Blend File: Classroom - Compute: NVIDIA OptiX (Seconds, Fewer Is Better)
NVIDIA RTX 4070 TI SUPER: 11.20 (SE +/- 0.04, N = 3)
NVIDIA RTX 4070 TI: 12.30 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070 SUPER: 12.60 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 14.86 (SE +/- 0.03, N = 3)
NVIDIA RTX 3090: 15.26 (SE +/- 0.02, N = 3)
Blender 4.0 - Blend File: Fishy Cat - Compute: NVIDIA OptiX (Seconds, Fewer Is Better)
NVIDIA RTX 4070 TI SUPER: 8.32 (SE +/- 0.06, N = 13)
NVIDIA RTX 4070 TI: 9.02 (SE +/- 0.01, N = 3)
NVIDIA RTX 4070 SUPER: 9.45 (SE +/- 0.06, N = 13)
NVIDIA RTX 3090: 10.64 (SE +/- 0.08, N = 9)
NVIDIA RTX 4070: 11.03 (SE +/- 0.03, N = 3)
Blender 4.0 - Blend File: Barbershop - Compute: NVIDIA OptiX (Seconds, Fewer Is Better)
NVIDIA RTX 4070 TI SUPER: 44.49 (SE +/- 0.08, N = 3)
NVIDIA RTX 4070 TI: 50.73 (SE +/- 0.05, N = 3)
NVIDIA RTX 4070 SUPER: 51.30 (SE +/- 0.10, N = 3)
NVIDIA RTX 3090: 54.30 (SE +/- 0.02, N = 2)
NVIDIA RTX 4070: 58.44 (SE +/- 0.04, N = 3)
Blender 4.0 - Blend File: Pabellon Barcelona - Compute: NVIDIA OptiX (Seconds, Fewer Is Better)
NVIDIA RTX 4070 TI SUPER: 12.56 (SE +/- 0.02, N = 3)
NVIDIA RTX 4070 TI: 13.97 (SE +/- 0.01, N = 3)
NVIDIA RTX 4070 SUPER: 14.29 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070: 16.55 (SE +/- 0.01, N = 3)
NVIDIA RTX 3090: 17.30 (SE +/- 0.02, N = 3)
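The five Blender scenes differ widely in render time, so a geometric mean is one reasonable way to summarize them per GPU without letting Barbershop dominate. The sketch below is illustrative only: it assumes the values in each chart follow the order of the GPU labels, and the summary itself is not part of the original results.

```python
from statistics import geometric_mean

# Render times in seconds, copied from the Blender 4.0 OptiX charts above.
# Scene order: BMW27, Classroom, Fishy Cat, Barbershop, Pabellon Barcelona.
# Assumption: chart values follow the same order as the GPU labels.
render_seconds = {
    "NVIDIA RTX 4070 TI SUPER": [5.04, 11.20, 8.32, 44.49, 12.56],
    "NVIDIA RTX 4070 TI": [5.43, 12.30, 9.02, 50.73, 13.97],
    "NVIDIA RTX 4070 SUPER": [5.57, 12.60, 9.45, 51.30, 14.29],
    "NVIDIA RTX 4070": [6.21, 14.86, 11.03, 58.44, 16.55],
    "NVIDIA RTX 3090": [6.31, 15.26, 10.64, 54.30, 17.30],
}

# Geometric mean of the five scene times per GPU (lower is better).
summary = {gpu: geometric_mean(times) for gpu, times in render_seconds.items()}

# GPUs ordered from fastest to slowest by that summary.
ranking = sorted(summary, key=summary.get)

fastest = summary[ranking[0]]
for gpu in ranking:
    print(f"{gpu}: {summary[gpu]:.2f} s geometric mean, "
          f"{summary[gpu] / fastest:.2f}x the fastest")
```

The RTX 3090 and RTX 4070 trade places between scenes, so their relative position in this summary depends on which scenes are included.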
NVIDIA RTX 4070 SUPER
Processor: Intel Core i9-13900K @ 5.50GHz (24 Cores / 32 Threads), Motherboard: ASUS TUF GAMING Z790-PRO WIFI (1401 BIOS), Chipset: Intel Device 7a27, Memory: 32GB, Disk: 4001GB Seagate ZP4000GP304001, Graphics: ASUS NVIDIA GeForce RTX 4070 SUPER 12GB, Audio: Realtek ALC1220, Monitor: ARZOPA, Network: Intel I226-V + Intel Device 7a70
OS: EndeavourOS rolling, Kernel: 6.7.1-arch1-1 (x86_64), Desktop: KDE Plasma 5.27.10, Display Server: X Server 1.21.1.11, Display Driver: NVIDIA 550.40.07, OpenGL: 4.6.0, OpenCL: OpenCL 3.0 CUDA 12.4.74, Compiler: GCC 13.2.1 20230801, File-System: ext4, Screen Resolution: 1920x1080
Kernel Notes: Transparent Huge Pages: always
Compiler Notes: --disable-libssp --disable-libstdcxx-pch --disable-werror --enable-__cxa_atexit --enable-bootstrap --enable-cet=auto --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-default-ssp --enable-gnu-indirect-function --enable-gnu-unique-object --enable-languages=ada,c,c++,d,fortran,go,lto,objc,obj-c++ --enable-libstdcxx-backtrace --enable-link-serialization=1 --enable-lto --enable-multilib --enable-plugin --enable-shared --enable-threads=posix --mandir=/usr/share/man --with-build-config=bootstrap-lto --with-linker-hash-style=gnu
Processor Notes: Scaling Governor: intel_pstate powersave (EPP: balance_performance) - CPU Microcode: 0x11d
Graphics Notes: BAR1 / Visible vRAM Size: 16384 MiB - vBIOS Version: 95.04.69.00.c1
Security Notes: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS IBPB: conditional RSB filling PBRSB-eIBRS: SW sequence + srbds: Not affected + tsx_async_abort: Not affected
Testing initiated at 25 January 2024 21:36 by user test.
NVIDIA RTX 4070
Processor: Intel Core i9-13900K @ 5.50GHz (24 Cores / 32 Threads), Motherboard: ASUS TUF GAMING Z790-PRO WIFI (1401 BIOS), Chipset: Intel Device 7a27, Memory: 32GB, Disk: 4001GB Seagate ZP4000GP304001, Graphics: MSI NVIDIA GeForce RTX 4070 12GB, Audio: Realtek ALC1220, Monitor: ARZOPA, Network: Intel I226-V + Intel Device 7a70
OS: EndeavourOS rolling, Kernel: 6.7.1-arch1-1 (x86_64), Desktop: KDE Plasma 5.27.10, Display Server: X Server 1.21.1.11, Display Driver: NVIDIA 550.40.07, OpenGL: 4.6.0, OpenCL: OpenCL 3.0 CUDA 12.4.74, Compiler: GCC 13.2.1 20230801 + CUDA 12.3, File-System: ext4, Screen Resolution: 1920x1080
Kernel Notes: Transparent Huge Pages: always
Environment Notes: NVCC_PREPEND_FLAGS="-ccbin /opt/cuda/bin"
Compiler Notes: --disable-libssp --disable-libstdcxx-pch --disable-werror --enable-__cxa_atexit --enable-bootstrap --enable-cet=auto --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-default-ssp --enable-gnu-indirect-function --enable-gnu-unique-object --enable-languages=ada,c,c++,d,fortran,go,lto,objc,obj-c++ --enable-libstdcxx-backtrace --enable-link-serialization=1 --enable-lto --enable-multilib --enable-plugin --enable-shared --enable-threads=posix --mandir=/usr/share/man --with-build-config=bootstrap-lto --with-linker-hash-style=gnu
Processor Notes: Scaling Governor: intel_pstate powersave (EPP: balance_performance) - CPU Microcode: 0x11d
Graphics Notes: BAR1 / Visible vRAM Size: 16384 MiB - vBIOS Version: 95.04.3e.40.2a
Python Notes: Python 3.11.6
Security Notes: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS IBPB: conditional RSB filling PBRSB-eIBRS: SW sequence + srbds: Not affected + tsx_async_abort: Not affected
Testing initiated at 28 January 2024 13:02 by user test.
NVIDIA RTX 4070 TI
Processor: Intel Core i9-13900K @ 5.50GHz (24 Cores / 32 Threads), Motherboard: ASUS TUF GAMING Z790-PRO WIFI (1401 BIOS), Chipset: Intel Device 7a27, Memory: 32GB, Disk: 4001GB Seagate ZP4000GP304001, Graphics: NVIDIA GeForce RTX 4070 Ti 12GB, Audio: Realtek ALC1220, Monitor: ARZOPA, Network: Intel I226-V + Intel Device 7a70
OS: EndeavourOS rolling, Kernel: 6.7.1-arch1-1 (x86_64), Desktop: KDE Plasma 5.27.10, Display Server: X Server 1.21.1.11, Display Driver: NVIDIA 550.40.07, OpenGL: 4.6.0, OpenCL: OpenCL 3.0 CUDA 12.4.74, Compiler: GCC 13.2.1 20230801 + CUDA 12.3, File-System: ext4, Screen Resolution: 1920x1080
Kernel Notes: Transparent Huge Pages: always
Environment Notes: NVCC_PREPEND_FLAGS="-ccbin /opt/cuda/bin"
Compiler Notes: --disable-libssp --disable-libstdcxx-pch --disable-werror --enable-__cxa_atexit --enable-bootstrap --enable-cet=auto --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-default-ssp --enable-gnu-indirect-function --enable-gnu-unique-object --enable-languages=ada,c,c++,d,fortran,go,lto,objc,obj-c++ --enable-libstdcxx-backtrace --enable-link-serialization=1 --enable-lto --enable-multilib --enable-plugin --enable-shared --enable-threads=posix --mandir=/usr/share/man --with-build-config=bootstrap-lto --with-linker-hash-style=gnu
Processor Notes: Scaling Governor: intel_pstate powersave (EPP: balance_performance) - CPU Microcode: 0x11d
Graphics Notes: BAR1 / Visible vRAM Size: 16384 MiB - vBIOS Version: 95.04.31.00.36
Python Notes: Python 3.11.6
Security Notes: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS IBPB: conditional RSB filling PBRSB-eIBRS: SW sequence + srbds: Not affected + tsx_async_abort: Not affected
Testing initiated at 29 January 2024 17:08 by user test.
NVIDIA RTX 3090
Processor: Intel Core i9-13900K @ 5.50GHz (24 Cores / 32 Threads), Motherboard: ASUS TUF GAMING Z790-PRO WIFI (1401 BIOS), Chipset: Intel Device 7a27, Memory: 32GB, Disk: 4001GB Seagate ZP4000GP304001, Graphics: NVIDIA GeForce RTX 3090 24GB, Audio: Realtek ALC1220, Monitor: PI-KVM Video, Network: Intel I226-V + Intel Device 7a70
OS: EndeavourOS rolling, Kernel: 6.7.4-arch1-1 (x86_64), Desktop: KDE Plasma 5.27.10, Display Server: X Server 1.21.1.11, Display Driver: NVIDIA 550.40.07, OpenGL: 4.6.0, OpenCL: OpenCL 3.0 CUDA 12.4.74, Compiler: GCC 13.2.1 20230801 + CUDA 12.3, File-System: ext4, Screen Resolution: 1920x1080
Kernel Notes: Transparent Huge Pages: always
Environment Notes: NVCC_PREPEND_FLAGS="-ccbin /opt/cuda/bin"
Compiler Notes: --disable-libssp --disable-libstdcxx-pch --disable-werror --enable-__cxa_atexit --enable-bootstrap --enable-cet=auto --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-default-ssp --enable-gnu-indirect-function --enable-gnu-unique-object --enable-languages=ada,c,c++,d,fortran,go,lto,m2,objc,obj-c++ --enable-libstdcxx-backtrace --enable-link-serialization=1 --enable-lto --enable-multilib --enable-plugin --enable-shared --enable-threads=posix --mandir=/usr/share/man --with-build-config=bootstrap-lto --with-linker-hash-style=gnu
Processor Notes: Scaling Governor: intel_pstate powersave (EPP: balance_performance) - CPU Microcode: 0x11d
Graphics Notes: BAR1 / Visible vRAM Size: 256 MiB - vBIOS Version: 94.02.26.08.ba
Python Notes: Python 3.11.6
Security Notes: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS IBPB: conditional RSB filling PBRSB-eIBRS: SW sequence + srbds: Not affected + tsx_async_abort: Not affected
Testing initiated at 7 February 2024 20:29 by user saddytech.
NVIDIA RTX 4070 TI SUPER
Processor: Intel Core i9-13900K @ 5.50GHz (24 Cores / 32 Threads), Motherboard: ASUS TUF GAMING Z790-PRO WIFI (1630 BIOS), Chipset: Intel Raptor Lake-S PCH, Memory: 32GB, Disk: 4001GB Seagate ZP4000GP304001 + 0GB CD-ROM Drive, Graphics: ASUS NVIDIA GeForce RTX 4070 Ti SUPER 16GB, Audio: Realtek ALC1220, Monitor: PI-KVM Video, Network: Intel I226-V + Intel Raptor Lake-S PCH CNVi WiFi
OS: EndeavourOS rolling, Kernel: 6.7.4-arch1-1 (x86_64), Desktop: KDE Plasma 5.27.10, Display Server: X Server 1.21.1.11, Display Driver: NVIDIA 550.40.07, OpenGL: 4.6.0, OpenCL: OpenCL 2.1 AMD-APP (3602.0) + OpenCL 3.0 CUDA 12.4.74, Compiler: GCC 13.2.1 20230801 + CUDA 12.3, File-System: ext4, Screen Resolution: 1920x1080
Kernel Notes: Transparent Huge Pages: always
Environment Notes: NVCC_PREPEND_FLAGS="-ccbin /opt/cuda/bin"
Compiler Notes: --disable-libssp --disable-libstdcxx-pch --disable-werror --enable-__cxa_atexit --enable-bootstrap --enable-cet=auto --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-default-ssp --enable-gnu-indirect-function --enable-gnu-unique-object --enable-languages=ada,c,c++,d,fortran,go,lto,m2,objc,obj-c++ --enable-libstdcxx-backtrace --enable-link-serialization=1 --enable-lto --enable-multilib --enable-plugin --enable-shared --enable-threads=posix --mandir=/usr/share/man --with-build-config=bootstrap-lto --with-linker-hash-style=gnu
Processor Notes: Scaling Governor: intel_pstate powersave (EPP: balance_performance) - CPU Microcode: 0x11f
Graphics Notes: BAR1 / Visible vRAM Size: 256 MiB - vBIOS Version: 95.03.45.00.c5
Python Notes: Python 3.11.7
Security Notes: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS IBPB: conditional RSB filling PBRSB-eIBRS: SW sequence + srbds: Not affected + tsx_async_abort: Not affected
Testing initiated at 15 February 2024 17:25 by user saddytech.