RTX 4070 SUPER

Intel Core i9-13900K testing with an ASUS TUF GAMING Z790-PRO WIFI (1630 BIOS) and ASUS NVIDIA GeForce RTX 4070 Ti SUPER 16GB on EndeavourOS rolling via the Phoronix Test Suite.

HTML result view exported from: https://openbenchmarking.org/result/2402174-SADD-240211636&gru&sor.

System Details

NVIDIA RTX 4070 SUPER:
- Processor: Intel Core i9-13900K @ 5.50GHz (24 Cores / 32 Threads)
- Motherboard: ASUS TUF GAMING Z790-PRO WIFI (1401 BIOS)
- Chipset: Intel Device 7a27
- Memory: 32GB
- Disk: 4001GB Seagate ZP4000GP304001
- Graphics: ASUS NVIDIA GeForce RTX 4070 SUPER 12GB
- Audio: Realtek ALC1220
- Monitor: ARZOPA
- Network: Intel I226-V + Intel Device 7a70
- OS: EndeavourOS rolling
- Kernel: 6.7.1-arch1-1 (x86_64)
- Desktop: KDE Plasma 5.27.10
- Display Server: X Server 1.21.1.11
- Display Driver: NVIDIA 550.40.07
- OpenGL: 4.6.0
- OpenCL: OpenCL 3.0 CUDA 12.4.74
- Compiler: GCC 13.2.1 20230801
- File-System: ext4
- Screen Resolution: 1920x1080

The remaining configurations list only the fields that changed:

NVIDIA RTX 4070:
- Graphics: MSI NVIDIA GeForce RTX 4070 12GB
- Compiler: GCC 13.2.1 20230801 + CUDA 12.3

NVIDIA RTX 4070 TI:
- Graphics: NVIDIA GeForce RTX 4070 Ti 12GB

NVIDIA RTX 3090:
- Graphics: NVIDIA GeForce RTX 3090 24GB
- Monitor: PI-KVM Video
- Kernel: 6.7.4-arch1-1 (x86_64)

NVIDIA RTX 4070 TI SUPER:
- Motherboard: ASUS TUF GAMING Z790-PRO WIFI (1630 BIOS)
- Chipset: Intel Raptor Lake-S PCH
- Disk: 4001GB Seagate ZP4000GP304001 + 0GB CD-ROM Drive
- Graphics: ASUS NVIDIA GeForce RTX 4070 Ti SUPER 16GB
- Network: Intel I226-V + Intel Raptor Lake-S PCH CNVi WiFi
- OpenCL: OpenCL 2.1 AMD-APP (3602.0) + OpenCL 3.0 CUDA 12.4.74

Kernel Details
- Transparent Huge Pages: always

Compiler Details
- NVIDIA RTX 4070 SUPER, NVIDIA RTX 4070, NVIDIA RTX 4070 TI: --disable-libssp --disable-libstdcxx-pch --disable-werror --enable-__cxa_atexit --enable-bootstrap --enable-cet=auto --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-default-ssp --enable-gnu-indirect-function --enable-gnu-unique-object --enable-languages=ada,c,c++,d,fortran,go,lto,objc,obj-c++ --enable-libstdcxx-backtrace --enable-link-serialization=1 --enable-lto --enable-multilib --enable-plugin --enable-shared --enable-threads=posix --mandir=/usr/share/man --with-build-config=bootstrap-lto --with-linker-hash-style=gnu
- NVIDIA RTX 3090, NVIDIA RTX 4070 TI SUPER: --disable-libssp --disable-libstdcxx-pch --disable-werror --enable-__cxa_atexit --enable-bootstrap --enable-cet=auto --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-default-ssp --enable-gnu-indirect-function --enable-gnu-unique-object --enable-languages=ada,c,c++,d,fortran,go,lto,m2,objc,obj-c++ --enable-libstdcxx-backtrace --enable-link-serialization=1 --enable-lto --enable-multilib --enable-plugin --enable-shared --enable-threads=posix --mandir=/usr/share/man --with-build-config=bootstrap-lto --with-linker-hash-style=gnu

Processor Details
- NVIDIA RTX 4070 SUPER: Scaling Governor: intel_pstate powersave (EPP: balance_performance) - CPU Microcode: 0x11d
- NVIDIA RTX 4070: Scaling Governor: intel_pstate powersave (EPP: balance_performance) - CPU Microcode: 0x11d
- NVIDIA RTX 4070 TI: Scaling Governor: intel_pstate powersave (EPP: balance_performance) - CPU Microcode: 0x11d
- NVIDIA RTX 3090: Scaling Governor: intel_pstate powersave (EPP: balance_performance) - CPU Microcode: 0x11d
- NVIDIA RTX 4070 TI SUPER: Scaling Governor: intel_pstate powersave (EPP: balance_performance) - CPU Microcode: 0x11f

Graphics Details
- NVIDIA RTX 4070 SUPER: BAR1 / Visible vRAM Size: 16384 MiB - vBIOS Version: 95.04.69.00.c1
- NVIDIA RTX 4070: BAR1 / Visible vRAM Size: 16384 MiB - vBIOS Version: 95.04.3e.40.2a
- NVIDIA RTX 4070 TI: BAR1 / Visible vRAM Size: 16384 MiB - vBIOS Version: 95.04.31.00.36
- NVIDIA RTX 3090: BAR1 / Visible vRAM Size: 256 MiB - vBIOS Version: 94.02.26.08.ba
- NVIDIA RTX 4070 TI SUPER: BAR1 / Visible vRAM Size: 256 MiB - vBIOS Version: 95.03.45.00.c5

Security Details
- gather_data_sampling: Not affected
- itlb_multihit: Not affected
- l1tf: Not affected
- mds: Not affected
- meltdown: Not affected
- mmio_stale_data: Not affected
- retbleed: Not affected
- spec_rstack_overflow: Not affected
- spec_store_bypass: Mitigation of SSB disabled via prctl
- spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
- spectre_v2: Mitigation of Enhanced / Automatic IBRS IBPB: conditional RSB filling PBRSB-eIBRS: SW sequence
- srbds: Not affected
- tsx_async_abort: Not affected

Environment Details
- NVIDIA RTX 4070, NVIDIA RTX 4070 TI, NVIDIA RTX 3090, NVIDIA RTX 4070 TI SUPER: NVCC_PREPEND_FLAGS="-ccbin /opt/cuda/bin"

Python Details
- NVIDIA RTX 4070: Python 3.11.6
- NVIDIA RTX 4070 TI: Python 3.11.6
- NVIDIA RTX 3090: Python 3.11.6
- NVIDIA RTX 4070 TI SUPER: Python 3.11.7

Result overview table: combined results for all five configurations (NVIDIA RTX 4070 SUPER, NVIDIA RTX 4070, NVIDIA RTX 4070 TI, NVIDIA RTX 3090, NVIDIA RTX 4070 TI SUPER) across the pytorch, vkfft, neatbench, libplacebo, opencl-benchmark, cl-mem, viennacl, clpeak, vkpeak, hashcat, tensorflow, gpuowl, indigobench, luxcorerender, fahbench, mandelgpu, octanebench, namd-cuda, vkresample, financebench, ncnn, realsr-ncnn, waifu2x-ncnn, rodinia and blender tests.

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 1 - Model: ResNet-50

PyTorch 2.1 (batches/sec, more is better)
- NVIDIA RTX 4070 TI SUPER: 558.82 (MIN: 473.77 / MAX: 573.46)
- NVIDIA RTX 4070 SUPER: 557.73 (MIN: 513.63 / MAX: 563.37)
- NVIDIA RTX 4070: 546.76 (MIN: 195.25 / MAX: 556.94)
- NVIDIA RTX 4070 TI: 535.39 (MIN: 428.43 / MAX: 572.99)
- NVIDIA RTX 3090: 525.12 (MIN: 458.54 / MAX: 542.46)
Standard errors as listed: SE +/- 3.07, N = 3; SE +/- 3.09, N = 3; SE +/- 11.16, N = 12

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 1 - Model: ResNet-152

PyTorch 2.1 (batches/sec, more is better)
- NVIDIA RTX 4070 SUPER: 201.94 (MIN: 183.53 / MAX: 206.5)
- NVIDIA RTX 4070 TI: 201.19 (MIN: 180.79 / MAX: 203.92)
- NVIDIA RTX 4070 TI SUPER: 200.46 (MIN: 177.25 / MAX: 203.31)
- NVIDIA RTX 4070: 198.18 (MIN: 181.27 / MAX: 200.06)
- NVIDIA RTX 3090: 197.12 (MIN: 137.37 / MAX: 198.9)
Standard errors as listed: SE +/- 0.73, N = 3; SE +/- 0.38, N = 3; SE +/- 0.36, N = 3; SE +/- 0.09, N = 2

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 16 - Model: ResNet-50

PyTorch 2.1 (batches/sec, more is better)
- NVIDIA RTX 4070 TI SUPER: 531.96 (MIN: 422.98 / MAX: 539.81)
- NVIDIA RTX 4070 SUPER: 509.45 (MIN: 430.1 / MAX: 516.48)
- NVIDIA RTX 4070 TI: 502.92 (MIN: 415.65 / MAX: 520.39)
- NVIDIA RTX 4070: 458.39 (MIN: 404.5 / MAX: 461.01)
- NVIDIA RTX 3090: 419.76 (MIN: 376.2 / MAX: 422.17)
Standard errors as listed: SE +/- 1.33, N = 3; SE +/- 2.23, N = 3; SE +/- 0.26, N = 3; SE +/- 0.89, N = 2

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 32 - Model: ResNet-50

PyTorch 2.1 (batches/sec, more is better)
- NVIDIA RTX 4070 TI SUPER: 532.77 (MIN: 420.31 / MAX: 538.98)
- NVIDIA RTX 4070 TI: 505.55 (MIN: 419.93 / MAX: 512.69)
- NVIDIA RTX 4070 SUPER: 501.50 (MIN: 415.94 / MAX: 510.69)
- NVIDIA RTX 4070: 459.94 (MIN: 403.65 / MAX: 462.59)
- NVIDIA RTX 3090: 420.29 (MIN: 376.81 / MAX: 421.58)
Standard errors as listed: SE +/- 0.70, N = 3; SE +/- 1.69, N = 3; SE +/- 2.17, N = 2; SE +/- 0.13, N = 2

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 64 - Model: ResNet-50

PyTorch 2.1 (batches/sec, more is better)
- NVIDIA RTX 4070 TI SUPER: 527.82 (SE +/- 1.58, N = 3; MIN: 419.39 / MAX: 534.44)
- NVIDIA RTX 4070 SUPER: 507.45 (SE +/- 0.92, N = 3; MIN: 423.41 / MAX: 512.88)
- NVIDIA RTX 4070 TI: 505.62 (SE +/- 1.92, N = 3; MIN: 426.6 / MAX: 513.25)
- NVIDIA RTX 4070: 458.36 (SE +/- 0.27, N = 3; MIN: 404.89 / MAX: 461.01)
- NVIDIA RTX 3090: 419.03 (SE +/- 0.24, N = 3; MIN: 376 / MAX: 422)
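To compare the cards at a glance, the Batch Size: 64 - ResNet-50 figures above can be normalized to one of them. The following is a minimal sketch only: the choice of the RTX 3090 as the baseline and the helper name `relative_to` are illustrative assumptions, not part of the exported result.

```python
# Throughput in batches/sec, copied from the Batch Size: 64 - ResNet-50 result above.
results = {
    "NVIDIA RTX 4070 TI SUPER": 527.82,
    "NVIDIA RTX 4070 SUPER": 507.45,
    "NVIDIA RTX 4070 TI": 505.62,
    "NVIDIA RTX 4070": 458.36,
    "NVIDIA RTX 3090": 419.03,
}


def relative_to(values, baseline):
    """Hypothetical helper: divide every result by the baseline's result."""
    base = values[baseline]
    return {name: value / base for name, value in values.items()}


# Assumption: the RTX 3090 is used as the reference card for illustration.
rel = relative_to(results, "NVIDIA RTX 3090")
for name, ratio in sorted(rel.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {ratio:.2f}x")
```

Because this test is "more is better", a ratio above 1 means the card was faster than the baseline in this run.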

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 16 - Model: ResNet-152

PyTorch 2.1 (batches/sec, more is better)
- NVIDIA RTX 4070 TI SUPER: 198.58 (MIN: 183.91 / MAX: 201.98)
- NVIDIA RTX 4070 SUPER: 195.40 (MIN: 186.09 / MAX: 197.7)
- NVIDIA RTX 4070 TI: 194.29 (MIN: 182.25 / MAX: 197.39)
- NVIDIA RTX 4070: 187.26 (MIN: 179.81 / MAX: 188.21)
- NVIDIA RTX 3090: 164.14 (MIN: 145.67 / MAX: 165.38)
Standard errors as listed: SE +/- 0.33, N = 3; SE +/- 0.29, N = 3

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 256 - Model: ResNet-50

PyTorch 2.1 (batches/sec, more is better)
- NVIDIA RTX 4070 TI SUPER: 529.14 (SE +/- 0.54, N = 3; MIN: 414.54 / MAX: 534.65)
- NVIDIA RTX 4070 SUPER: 504.67 (SE +/- 1.39, N = 3; MIN: 412.34 / MAX: 514.07)
- NVIDIA RTX 4070: 459.93 (SE +/- 0.34, N = 3; MIN: 403.65 / MAX: 462.74)
- NVIDIA RTX 3090: 416.89 (SE +/- 0.14, N = 3; MIN: 329.77 / MAX: 420.82)

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 32 - Model: ResNet-152

PyTorch 2.1 (batches/sec, more is better)
- NVIDIA RTX 4070 TI: 198.82 (MIN: 188.33 / MAX: 201.47)
- NVIDIA RTX 4070 TI SUPER: 197.82 (MIN: 176.19 / MAX: 201.63)
- NVIDIA RTX 4070 SUPER: 195.39 (MIN: 183.94 / MAX: 198.7)
- NVIDIA RTX 4070: 187.69 (MIN: 182.03 / MAX: 188.31)
- NVIDIA RTX 3090: 163.74 (MIN: 144.93 / MAX: 165.03)
Standard errors as listed: SE +/- 0.28, N = 3; SE +/- 0.29, N = 3

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 512 - Model: ResNet-50

PyTorch 2.1 (batches/sec, more is better)
- NVIDIA RTX 4070 TI SUPER: 529.49 (SE +/- 1.16, N = 3; MIN: 410.12 / MAX: 537.25)
- NVIDIA RTX 4070 TI: 504.66 (SE +/- 0.83, N = 2; MIN: 424.27 / MAX: 509.08)
- NVIDIA RTX 4070 SUPER: 504.27 (SE +/- 4.43, N = 2; MIN: 418.22 / MAX: 512.44)
- NVIDIA RTX 4070: 459.27 (SE +/- 0.43, N = 2; MIN: 405.48 / MAX: 461.88)
- NVIDIA RTX 3090: 416.20 (SE +/- 0.40, N = 3; MIN: 355.45 / MAX: 419.05)

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 64 - Model: ResNet-152

PyTorch 2.1 (batches/sec, more is better)
- NVIDIA RTX 4070 TI: 197.02 (MIN: 183.92 / MAX: 200.54)
- NVIDIA RTX 4070 TI SUPER: 196.50 (MIN: 179.34 / MAX: 200)
- NVIDIA RTX 4070 SUPER: 196.07 (MIN: 171.95 / MAX: 199.96)
- NVIDIA RTX 4070: 186.63 (MIN: 180.51 / MAX: 187.79)
- NVIDIA RTX 3090: 164.14 (MIN: 149 / MAX: 165)
Standard errors as listed: SE +/- 0.78, N = 2; SE +/- 0.20, N = 3; SE +/- 0.51, N = 3; SE +/- 0.34, N = 3

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 256 - Model: ResNet-152

PyTorch 2.1 (batches/sec, more is better)
- NVIDIA RTX 4070 TI SUPER: 198.70 (MIN: 185.21 / MAX: 203.36)
- NVIDIA RTX 4070 TI: 195.86 (MIN: 181.64 / MAX: 199.2)
- NVIDIA RTX 4070 SUPER: 194.58 (MIN: 183.74 / MAX: 198.52)
- NVIDIA RTX 4070: 187.27 (MIN: 179.9 / MAX: 188.08)
- NVIDIA RTX 3090: 161.01 (MIN: 138.12 / MAX: 165.16)
Standard errors as listed: SE +/- 0.95, N = 3; SE +/- 0.19, N = 2; SE +/- 1.14, N = 2; SE +/- 0.17, N = 3

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 512 - Model: ResNet-152

PyTorch 2.1 (batches/sec, more is better)
- NVIDIA RTX 4070 TI SUPER: 198.01 (MIN: 185.3 / MAX: 202.59)
- NVIDIA RTX 4070 SUPER: 195.30 (MIN: 182 / MAX: 199.43)
- NVIDIA RTX 4070 TI: 194.87 (MIN: 180.8 / MAX: 198)
- NVIDIA RTX 4070: 187.51 (MIN: 181.57 / MAX: 188.05)
- NVIDIA RTX 3090: 164.35 (MIN: 149.91 / MAX: 166.09)
Standard errors as listed: SE +/- 0.81, N = 3; SE +/- 1.38, N = 2; SE +/- 0.05, N = 3; SE +/- 0.33, N = 2

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 1 - Model: Efficientnet_v2_l

PyTorch 2.1 (batches/sec, more is better)
- NVIDIA RTX 4070 TI: 108.59 (MIN: 99.04 / MAX: 110.68)
- NVIDIA RTX 4070: 107.59 (MIN: 98.77 / MAX: 109.43)
- NVIDIA RTX 4070 SUPER: 106.37 (MIN: 97.91 / MAX: 108.16)
- NVIDIA RTX 4070 TI SUPER: 105.86 (MIN: 95.05 / MAX: 107.6)
- NVIDIA RTX 3090: 105.55 (MIN: 91.76 / MAX: 107.42)
Standard errors as listed: SE +/- 0.55, N = 3; SE +/- 0.24, N = 2; SE +/- 0.33, N = 3

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 16 - Model: Efficientnet_v2_l

PyTorch 2.1 (batches/sec, more is better)
- NVIDIA RTX 4070: 103.68 (MIN: 96.86 / MAX: 105.56)
- NVIDIA RTX 4070 TI SUPER: 103.66 (MIN: 93.46 / MAX: 105.95)
- NVIDIA RTX 4070 TI: 103.45 (MIN: 95.22 / MAX: 105.88)
- NVIDIA RTX 3090: 98.11 (MIN: 89.88 / MAX: 100.25)
Standard errors as listed: SE +/- 0.33, N = 3; SE +/- 0.52, N = 2; SE +/- 0.53, N = 2

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 32 - Model: Efficientnet_v2_l

PyTorch 2.1 (batches/sec, more is better)
- NVIDIA RTX 4070: 102.90 (MIN: 95.98 / MAX: 104.54)
- NVIDIA RTX 4070 TI SUPER: 102.83 (MIN: 92.44 / MAX: 105.47)
- NVIDIA RTX 4070 SUPER: 102.60 (MIN: 94.84 / MAX: 104.25)
- NVIDIA RTX 3090: 99.05 (MIN: 91.8 / MAX: 100.69)
- NVIDIA RTX 4070 TI: 96.50 (MIN: 64.35 / MAX: 104.79)
Standard errors as listed: SE +/- 0.62, N = 3; SE +/- 0.13, N = 3; SE +/- 6.65, N = 5

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 64 - Model: Efficientnet_v2_l

PyTorch 2.1 (batches/sec, more is better)
- NVIDIA RTX 4070 TI SUPER: 103.49 (SE +/- 0.13, N = 3; MIN: 93.23 / MAX: 105.43)
- NVIDIA RTX 4070 TI: 103.20 (SE +/- 0.39, N = 2; MIN: 95.31 / MAX: 105.27)
- NVIDIA RTX 4070 SUPER: 102.60 (SE +/- 1.49, N = 2; MIN: 79.69 / MAX: 105.28)
- NVIDIA RTX 4070: 101.55 (SE +/- 0.45, N = 3; MIN: 93.44 / MAX: 103.08)
- NVIDIA RTX 3090: 99.84 (SE +/- 0.14, N = 3; MIN: 92.73 / MAX: 101.46)

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 256 - Model: Efficientnet_v2_l

PyTorch 2.1 (batches/sec, more is better)
- NVIDIA RTX 4070 TI: 103.24 (MIN: 95.41 / MAX: 104.9)
- NVIDIA RTX 4070 SUPER: 103.17 (MIN: 95.79 / MAX: 105.15)
- NVIDIA RTX 4070 TI SUPER: 102.83 (MIN: 93.16 / MAX: 105.07)
- NVIDIA RTX 4070: 101.24 (MIN: 93.33 / MAX: 102.92)
- NVIDIA RTX 3090: 99.43 (MIN: 90.49 / MAX: 101.97)
Standard errors as listed: SE +/- 0.18, N = 3; SE +/- 0.05, N = 2; SE +/- 0.57, N = 3

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 512 - Model: Efficientnet_v2_l

PyTorch 2.1 (batches/sec, more is better)
- NVIDIA RTX 4070 SUPER: 103.57 (MIN: 95.95 / MAX: 105.54)
- NVIDIA RTX 4070 TI SUPER: 103.53 (MIN: 88.81 / MAX: 104.8)
- NVIDIA RTX 4070 TI: 103.50 (MIN: 94.95 / MAX: 105.61)
- NVIDIA RTX 4070: 101.43 (MIN: 93.27 / MAX: 103.58)
- NVIDIA RTX 3090: 99.25 (MIN: 91.16 / MAX: 101.18)
Standard errors as listed: SE +/- 0.36, N = 3; SE +/- 0.39, N = 3; SE +/- 0.19, N = 3
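The MIN/MAX pairs in the PyTorch results above follow a fixed "MIN: a / MAX: b" pattern, so they can be pulled out of the exported text mechanically. The sketch below is one possible way to do that with a regular expression; the pattern, the helper name `min_max_pairs` and the idea of ranking by spread are illustrative assumptions, not something the Phoronix Test Suite provides.

```python
import re

# MIN/MAX text copied from the Batch Size: 512 - Efficientnet_v2_l result above,
# in the order the configurations are listed there.
text = (
    "MIN: 95.95 / MAX: 105.54 MIN: 88.81 / MAX: 104.8 "
    "MIN: 94.95 / MAX: 105.61 MIN: 93.27 / MAX: 103.58 "
    "MIN: 91.16 / MAX: 101.18"
)

# Hypothetical pattern: an integer or decimal after "MIN:" and after "MAX:".
PATTERN = re.compile(r"MIN:\s*(\d+(?:\.\d+)?)\s*/\s*MAX:\s*(\d+(?:\.\d+)?)")


def min_max_pairs(raw):
    """Return every (min, max) pair found in the text as floats."""
    return [(float(lo), float(hi)) for lo, hi in PATTERN.findall(raw)]


pairs = min_max_pairs(text)
# The pair with the largest max - min gap had the widest run-to-run range.
widest = max(pairs, key=lambda pair: pair[1] - pair[0])
print(pairs)
print("widest range:", widest)
```

The same pattern also matches when the pairs run together without spaces, because each number ends at the first character that is not a digit or a decimal point.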

VkFFT

Test: FFT + iFFT R2C / C2R

VkFFT 1.2.31 (Benchmark Score, more is better)
- NVIDIA RTX 4070 TI SUPER: 59378 (SE +/- 772.47, N = 15)
- NVIDIA RTX 4070 TI: 55446 (SE +/- 520.37, N = 3)
- NVIDIA RTX 4070 SUPER: 54794 (SE +/- 702.53, N = 15)
- NVIDIA RTX 3090: 48418 (SE +/- 320.62, N = 3)
- NVIDIA RTX 4070: 47097 (SE +/- 745.02, N = 13)
1. (CXX) g++ options: -O3 -lrt
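The SE and N figures in the result above can be turned into a rough measure of run-to-run spread. This is a hedged sketch under two assumptions that the export itself does not state: that "SE" is the standard error of the mean (so the sample standard deviation is approximately SE × sqrt(N)), and that the SE entries appear in the same order as the results. The helper name `spread` is hypothetical.

```python
import math

# (result, SE, N) copied from the VkFFT "FFT + iFFT R2C / C2R" result above.
# Assumption: SE entries are listed in the same order as the results.
runs = {
    "NVIDIA RTX 4070 TI SUPER": (59378, 772.47, 15),
    "NVIDIA RTX 4070 TI": (55446, 520.37, 3),
    "NVIDIA RTX 4070 SUPER": (54794, 702.53, 15),
    "NVIDIA RTX 3090": (48418, 320.62, 3),
    "NVIDIA RTX 4070": (47097, 745.02, 13),
}


def spread(mean, se, n):
    """Return (estimated standard deviation, SE as a percentage of the result)."""
    # Assumption: SE is the standard error of the mean, SE = SD / sqrt(N).
    sd = se * math.sqrt(n)
    rel_se_pct = 100.0 * se / mean
    return sd, rel_se_pct


stats = {name: spread(*values) for name, values in runs.items()}
for name, (sd, rel) in stats.items():
    print(f"{name}: SD ~ {sd:.0f}, SE = {rel:.2f}% of the result")
```

Configurations with N well above 3 were re-run more often than the others, and their larger estimated standard deviation reflects a noisier test on that card.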

VkFFT

Test: FFT + iFFT C2C 1D batched in half precision

VkFFT 1.2.31 (Benchmark Score, more is better)
- NVIDIA RTX 3090: 273221 (SE +/- 160.60, N = 3)
- NVIDIA RTX 4070 TI SUPER: 143992 (SE +/- 3524.05, N = 12)
- NVIDIA RTX 4070: 137762 (SE +/- 1301.92, N = 3)
- NVIDIA RTX 4070 TI: 136210 (SE +/- 1708.38, N = 3)
- NVIDIA RTX 4070 SUPER: 131705 (SE +/- 159.17, N = 3)
1. (CXX) g++ options: -O3 -lrt

VkFFT

Test: FFT + iFFT C2C Bluestein in single precision

VkFFT 1.2.31 (Benchmark Score, more is better)
- NVIDIA RTX 4070 TI SUPER: 16141 (SE +/- 73.00, N = 3)
- NVIDIA RTX 4070 SUPER: 15166 (SE +/- 102.52, N = 3)
- NVIDIA RTX 4070 TI: 15125 (SE +/- 118.41, N = 3)
- NVIDIA RTX 3090: 14205 (SE +/- 115.62, N = 3)
- NVIDIA RTX 4070: 13714 (SE +/- 52.09, N = 3)
1. (CXX) g++ options: -O3 -lrt

VkFFT

Test: FFT + iFFT C2C 1D batched in double precision

VkFFT 1.2.31 (Benchmark Score, more is better)
- NVIDIA RTX 3090: 30912 (SE +/- 50.66, N = 3)
- NVIDIA RTX 4070 TI SUPER: 27947 (SE +/- 325.03, N = 3)
- NVIDIA RTX 4070 TI: 25431 (SE +/- 302.46, N = 3)
- NVIDIA RTX 4070 SUPER: 24317 (SE +/- 146.69, N = 3)
- NVIDIA RTX 4070: 22390 (SE +/- 125.94, N = 3)
1. (CXX) g++ options: -O3 -lrt

VkFFT

Test: FFT + iFFT C2C 1D batched in single precision

VkFFT 1.2.31 (Benchmark Score, more is better)
- NVIDIA RTX 3090: 141876 (SE +/- 9.64, N = 3)
- NVIDIA RTX 4070 TI SUPER: 104003 (SE +/- 33.60, N = 3)
- NVIDIA RTX 4070: 77774 (SE +/- 13.72, N = 3)
- NVIDIA RTX 4070 TI: 73942 (SE +/- 0.88, N = 3)
- NVIDIA RTX 4070 SUPER: 73929 (SE +/- 7.94, N = 3)
1. (CXX) g++ options: -O3 -lrt

VkFFT

Test: FFT + iFFT C2C multidimensional in single precision

VkFFT 1.2.31 (Benchmark Score, more is better)
- NVIDIA RTX 4070 TI SUPER: 59790 (SE +/- 251.10, N = 3)
- NVIDIA RTX 4070 TI: 51528 (SE +/- 417.77, N = 15)
- NVIDIA RTX 3090: 50856 (SE +/- 407.28, N = 15)
- NVIDIA RTX 4070 SUPER: 50299 (SE +/- 407.19, N = 15)
- NVIDIA RTX 4070: 47212 (SE +/- 476.57, N = 5)
1. (CXX) g++ options: -O3 -lrt

VkFFT

Test: FFT + iFFT C2C Bluestein benchmark in double precision

VkFFT 1.2.31 (Benchmark Score, more is better)
- NVIDIA RTX 4070 TI SUPER: 5047 (SE +/- 11.37, N = 3)
- NVIDIA RTX 4070 TI: 4647 (SE +/- 11.35, N = 3)
- NVIDIA RTX 4070 SUPER: 4451 (SE +/- 12.55, N = 3)
- NVIDIA RTX 3090: 4195 (SE +/- 9.84, N = 3)
- NVIDIA RTX 4070: 3886 (SE +/- 4.51, N = 3)
1. (CXX) g++ options: -O3 -lrt

VkFFT

Test: FFT + iFFT C2C 1D batched in single precision, no reshuffling

VkFFT 1.2.31 (Benchmark Score, more is better)
- NVIDIA RTX 3090: 144311 (SE +/- 37.44, N = 3)
- NVIDIA RTX 4070 TI SUPER: 105549 (SE +/- 20.80, N = 3)
- NVIDIA RTX 4070: 79057 (SE +/- 5.84, N = 3)
- NVIDIA RTX 4070 TI: 75141 (SE +/- 28.54, N = 3)
- NVIDIA RTX 4070 SUPER: 75078 (SE +/- 37.77, N = 3)
1. (CXX) g++ options: -O3 -lrt

NeatBench

Acceleration: GPU

NeatBench 5 (FPS, more is better)
- NVIDIA RTX 4070 TI: 4070.0 (SE +/- 0.00, N = 3)
- NVIDIA RTX 4070: 4070.0 (SE +/- 0.00, N = 3)
- NVIDIA RTX 4070 SUPER: 4070.0 (SE +/- 0.00, N = 3)
- NVIDIA RTX 3090: 3090.0 (SE +/- 0.00, N = 3)
- NVIDIA RTX 4070 TI SUPER: 2084.1 (SE +/- 512.75, N = 16)

Libplacebo

Test: deband_heavy

Libplacebo 5.229.1 (FPS, more is better)
- NVIDIA RTX 4070 TI SUPER: 2495.92 (SE +/- 0.75, N = 3)
- NVIDIA RTX 4070 TI: 2306.67 (SE +/- 0.56, N = 3)
- NVIDIA RTX 4070 SUPER: 2186.70 (SE +/- 2.26, N = 3)
- NVIDIA RTX 3090: 2024.61 (SE +/- 2.92, N = 3)
- NVIDIA RTX 4070: 1847.98 (SE +/- 0.08, N = 3)
1. (CXX) g++ options: -lm -pthread -ldl -fvisibility=hidden -std=c++20 -O2 -fno-math-errno -fPIC -MD -MQ -MF

Libplacebo

Test: polar_nocompute

Libplacebo 5.229.1 (FPS, more is better)
- NVIDIA RTX 4070 TI SUPER: 2653.03 (SE +/- 1.94, N = 3)
- NVIDIA RTX 4070 TI: 2461.23 (SE +/- 0.26, N = 3)
- NVIDIA RTX 4070 SUPER: 2327.55 (SE +/- 0.24, N = 3)
- NVIDIA RTX 3090: 2126.31 (SE +/- 3.45, N = 3)
- NVIDIA RTX 4070: 1972.78 (SE +/- 0.16, N = 3)
1. (CXX) g++ options: -lm -pthread -ldl -fvisibility=hidden -std=c++20 -O2 -fno-math-errno -fPIC -MD -MQ -MF

Libplacebo

Test: hdr_peakdetect

Libplacebo 5.229.1 (FPS, more is better)
- NVIDIA RTX 3090: 5104.10 (SE +/- 13.97, N = 3)
- NVIDIA RTX 4070 TI SUPER: 3931.57 (SE +/- 28.18, N = 3)
- NVIDIA RTX 4070 TI: 3544.60 (SE +/- 99.97, N = 3)
- NVIDIA RTX 4070: 3452.43 (SE +/- 144.09, N = 3)
- NVIDIA RTX 4070 SUPER: 3292.37 (SE +/- 3.65, N = 3)
1. (CXX) g++ options: -lm -pthread -ldl -fvisibility=hidden -std=c++20 -O2 -fno-math-errno -fPIC -MD -MQ -MF

Libplacebo

Test: hdr_lut

Libplacebo 5.229.1 (FPS, more is better)
- NVIDIA RTX 4070 TI: 3976.04 (SE +/- 33.96, N = 3)
- NVIDIA RTX 4070: 3946.90 (SE +/- 6.47, N = 3)
- NVIDIA RTX 4070 SUPER: 3905.98 (SE +/- 12.09, N = 3)
- NVIDIA RTX 4070 TI SUPER: 3845.51 (SE +/- 17.88, N = 3)
- NVIDIA RTX 3090: 3376.85 (SE +/- 22.23, N = 3)
1. (CXX) g++ options: -lm -pthread -ldl -fvisibility=hidden -std=c++20 -O2 -fno-math-errno -fPIC -MD -MQ -MF

Libplacebo

Test: av1_grain_lap

Libplacebo 5.229.1 (FPS, more is better)
- NVIDIA RTX 4070 SUPER: 4171.00 (SE +/- 5.52, N = 3)
- NVIDIA RTX 4070: 4152.41 (SE +/- 16.20, N = 3)
- NVIDIA RTX 4070 TI: 4143.96 (SE +/- 35.33, N = 3)
- NVIDIA RTX 3090: 4126.89 (SE +/- 12.99, N = 3)
- NVIDIA RTX 4070 TI SUPER: 4057.41 (SE +/- 39.01, N = 3)
1. (CXX) g++ options: -lm -pthread -ldl -fvisibility=hidden -std=c++20 -O2 -fno-math-errno -fPIC -MD -MQ -MF

ProjectPhysX OpenCL-Benchmark

Operation: Memory Bandwidth Coalesced Read

ProjectPhysX OpenCL-Benchmark 1.2 (GB/s, more is better)
- NVIDIA RTX 3090: 864.11 (SE +/- 0.07, N = 3)
- NVIDIA RTX 4070 TI SUPER: 619.03 (SE +/- 0.06, N = 3)
- NVIDIA RTX 4070: 465.18 (SE +/- 0.03, N = 3)
- NVIDIA RTX 4070 TI: 465.07 (SE +/- 0.01, N = 3)
- NVIDIA RTX 4070 SUPER: 464.86 (SE +/- 0.01, N = 3)
1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

ProjectPhysX OpenCL-Benchmark

Operation: Memory Bandwidth Coalesced Write

ProjectPhysX OpenCL-Benchmark 1.2 (GB/s, more is better)
- NVIDIA RTX 3090: 887.31 (SE +/- 0.06, N = 3)
- NVIDIA RTX 4070 TI SUPER: 608.94 (SE +/- 0.57, N = 3)
- NVIDIA RTX 4070: 459.43 (SE +/- 0.16, N = 3)
- NVIDIA RTX 4070 TI: 457.17 (SE +/- 0.11, N = 3)
- NVIDIA RTX 4070 SUPER: 455.01 (SE +/- 0.14, N = 3)
1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

cl-mem

Benchmark: Copy

cl-mem 2017-01-13 (GB/s, more is better)
- NVIDIA RTX 4070 TI SUPER: 370.7 (SE +/- 0.00, N = 3)
- NVIDIA RTX 3090: 360.8 (SE +/- 0.22, N = 3)
- NVIDIA RTX 4070 TI: 333.3 (SE +/- 0.00, N = 3)
- NVIDIA RTX 4070 SUPER: 331.8 (SE +/- 0.03, N = 3)
- NVIDIA RTX 4070: 330.3 (SE +/- 0.09, N = 3)
1. (CC) gcc options: -O2 -flto -lOpenCL

cl-mem

Benchmark: Read

cl-mem 2017-01-13 (GB/s, more is better)
- NVIDIA RTX 3090: 825.8 (SE +/- 0.32, N = 3)
- NVIDIA RTX 4070 TI SUPER: 595.2 (SE +/- 0.00, N = 3)
- NVIDIA RTX 4070 TI: 446.3 (SE +/- 0.00, N = 3)
- NVIDIA RTX 4070: 446.3 (SE +/- 0.00, N = 3)
- NVIDIA RTX 4070 SUPER: 446.2 (SE +/- 0.12, N = 3)
1. (CC) gcc options: -O2 -flto -lOpenCL

cl-mem

Benchmark: Write

cl-mem 2017-01-13 (GB/s, more is better)
- NVIDIA RTX 3090: 753.8 (SE +/- 0.83, N = 3)
- NVIDIA RTX 4070 TI SUPER: 551.9 (SE +/- 0.25, N = 3)
- NVIDIA RTX 4070 TI: 412.2 (SE +/- 0.12, N = 3)
- NVIDIA RTX 4070 SUPER: 407.5 (SE +/- 1.11, N = 3)
- NVIDIA RTX 4070: 406.7 (SE +/- 0.55, N = 3)
1. (CC) gcc options: -O2 -flto -lOpenCL

ViennaCL

Test: CPU BLAS - sCOPY

ViennaCL 1.7.1 (GB/s, More Is Better)
NVIDIA RTX 3090: 132 (SE +/- 1.20, N = 3)
NVIDIA RTX 4070 TI: 132 (SE +/- 0.88, N = 3)
NVIDIA RTX 4070 SUPER: 132 (SE +/- 1.20, N = 3)
NVIDIA RTX 4070: 131 (SE +/- 1.20, N = 3)
NVIDIA RTX 4070 TI SUPER: 107 (SE +/- 0.67, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - sAXPY

ViennaCL 1.7.1 (GB/s, More Is Better)
NVIDIA RTX 4070 TI: 156 (SE +/- 2.00, N = 3)
NVIDIA RTX 4070 SUPER: 156 (SE +/- 2.19, N = 3)
NVIDIA RTX 3090: 154 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070: 153 (SE +/- 4.81, N = 3)
NVIDIA RTX 4070 TI SUPER: 120 (SE +/- 0.33, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - sDOT

ViennaCL 1.7.1 (GB/s, More Is Better)
NVIDIA RTX 4070 TI: 168.0 (SE +/- 2.40, N = 3)
NVIDIA RTX 4070: 166.0 (SE +/- 3.76, N = 3)
NVIDIA RTX 4070 SUPER: 165.0 (SE +/- 2.73, N = 3)
NVIDIA RTX 3090: 132.1 (SE +/- 35.40, N = 3)
NVIDIA RTX 4070 TI SUPER: 129.0 (SE +/- 0.58, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dCOPY

ViennaCL 1.7.1 (GB/s, More Is Better)
NVIDIA RTX 4070 TI: 71.3 (SE +/- 0.74, N = 3)
NVIDIA RTX 4070: 71.0 (SE +/- 0.25, N = 3)
NVIDIA RTX 4070 SUPER: 70.8 (SE +/- 0.32, N = 3)
NVIDIA RTX 3090: 70.2 (SE +/- 0.72, N = 3)
NVIDIA RTX 4070 TI SUPER: 52.7 (SE +/- 0.18, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dAXPY

ViennaCL 1.7.1 (GB/s, More Is Better)
NVIDIA RTX 4070 TI: 87.3 (SE +/- 0.57, N = 3)
NVIDIA RTX 4070 SUPER: 87.2 (SE +/- 0.12, N = 3)
NVIDIA RTX 4070: 86.8 (SE +/- 0.44, N = 3)
NVIDIA RTX 3090: 86.2 (SE +/- 0.94, N = 3)
NVIDIA RTX 4070 TI SUPER: 64.3 (SE +/- 0.12, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dDOT

ViennaCL 1.7.1 (GB/s, More Is Better)
NVIDIA RTX 4070 SUPER: 96.8 (SE +/- 0.09, N = 3)
NVIDIA RTX 4070: 96.7 (SE +/- 0.22, N = 3)
NVIDIA RTX 4070 TI: 96.4 (SE +/- 0.58, N = 3)
NVIDIA RTX 3090: 95.2 (SE +/- 0.84, N = 3)
NVIDIA RTX 4070 TI SUPER: 70.8 (SE +/- 0.19, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dGEMV-N

ViennaCL 1.7.1 (GB/s, More Is Better)
NVIDIA RTX 3090: 103.0 (SE +/- 0.88, N = 3)
NVIDIA RTX 4070 TI: 103.0 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070: 103.0 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070 SUPER: 102.0 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070 TI SUPER: 78.5 (SE +/- 0.46, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dGEMV-T

ViennaCL 1.7.1 (GB/s, More Is Better)
NVIDIA RTX 3090: 110.0 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070: 109.0 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070 SUPER: 109.0 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070 TI: 102.7 (SE +/- 6.30, N = 3)
NVIDIA RTX 4070 TI SUPER: 82.6 (SE +/- 0.47, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - sCOPY

ViennaCL 1.7.1 (GB/s, More Is Better)
NVIDIA RTX 4070 TI SUPER: 373 (SE +/- 0.33, N = 3)
NVIDIA RTX 3090: 363 (SE +/- 1.00, N = 3)
NVIDIA RTX 4070 TI: 336 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 SUPER: 334 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070: 330 (SE +/- 0.33, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - sAXPY

ViennaCL 1.7.1 (GB/s, More Is Better)
NVIDIA RTX 3090: 498 (SE +/- 0.58, N = 3)
NVIDIA RTX 4070 TI SUPER: 469 (SE +/- 1.20, N = 3)
NVIDIA RTX 4070 TI: 393 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 SUPER: 392 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 389 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - sDOT

ViennaCL 1.7.1 (GB/s, More Is Better)
NVIDIA RTX 4070 TI SUPER: 410 (SE +/- 1.00, N = 3)
NVIDIA RTX 3090: 376 (SE +/- 0.58, N = 3)
NVIDIA RTX 4070 SUPER: 370 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI: 365 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 362 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dCOPY

ViennaCL 1.7.1 (GB/s, More Is Better)
NVIDIA RTX 3090: 605 (SE +/- 0.58, N = 3)
NVIDIA RTX 4070 TI SUPER: 512 (SE +/- 4.00, N = 3)
NVIDIA RTX 4070 TI: 424 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 423 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 SUPER: 423 (SE +/- 0.33, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dAXPY

ViennaCL 1.7.1 (GB/s, More Is Better)
NVIDIA RTX 3090: 724 (SE +/- 0.58, N = 3)
NVIDIA RTX 4070 TI SUPER: 585 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 455 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI: 437 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070 SUPER: 437 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dDOT

ViennaCL 1.7.1 (GB/s, More Is Better)
NVIDIA RTX 3090: 659 (SE +/- 0.88, N = 3)
NVIDIA RTX 4070 TI SUPER: 575 (SE +/- 1.33, N = 3)
NVIDIA RTX 4070 SUPER: 458 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI: 457 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 456 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMV-N

ViennaCL 1.7.1 (GB/s, More Is Better)
NVIDIA RTX 4070 TI SUPER: 218 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI: 211 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070 SUPER: 210 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070: 209 (SE +/- 0.33, N = 3)
NVIDIA RTX 3090: 187 (SE +/- 0.33, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMV-T

ViennaCL 1.7.1 (GB/s, More Is Better)
NVIDIA RTX 4070 TI SUPER: 424
NVIDIA RTX 4070 TI: 391
NVIDIA RTX 4070 SUPER: 389
NVIDIA RTX 4070: 387
NVIDIA RTX 3090: 374
SE as exported (four entries for five results): SE +/- 0.33, N = 3; SE +/- 0.33, N = 3; SE +/- 0.00, N = 3; SE +/- 0.00, N = 3
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

clpeak

OpenCL Test: Global Memory Bandwidth

clpeak 1.1.2 (GBPS, More Is Better)
NVIDIA RTX 3090: 816.55 (SE +/- 0.02, N = 3)
NVIDIA RTX 4070 TI SUPER: 582.84 (SE +/- 0.01, N = 3)
NVIDIA RTX 4070 SUPER: 437.65 (SE +/- 0.02, N = 3)
NVIDIA RTX 4070 TI: 437.63 (SE +/- 0.02, N = 3)
NVIDIA RTX 4070: 437.21 (SE +/- 0.02, N = 3)
1. (CXX) g++ options: -O3

clpeak

OpenCL Test: Single-Precision Float

clpeak 1.1.2 (GFLOPS, More Is Better)
NVIDIA RTX 4070 TI SUPER: 43244.79 (SE +/- 50.25, N = 3)
NVIDIA RTX 4070 TI: 38691.73 (SE +/- 11.67, N = 3)
NVIDIA RTX 4070 SUPER: 35492.69 (SE +/- 0.99, N = 3)
NVIDIA RTX 3090: 34906.79 (SE +/- 113.39, N = 3)
NVIDIA RTX 4070: 28479.39 (SE +/- 5.46, N = 3)
1. (CXX) g++ options: -O3

clpeak

OpenCL Test: Double-Precision Double

clpeak 1.1.2 (GFLOPS, More Is Better)
NVIDIA RTX 4070 TI SUPER: 750.36 (SE +/- 1.26, N = 3)
NVIDIA RTX 4070 TI: 667.05 (SE +/- 1.33, N = 3)
NVIDIA RTX 3090: 642.23 (SE +/- 1.63, N = 3)
NVIDIA RTX 4070 SUPER: 630.11 (SE +/- 0.98, N = 3)
NVIDIA RTX 4070: 515.17 (SE +/- 0.21, N = 3)
1. (CXX) g++ options: -O3

vkpeak

fp32-scalar

vkpeak 20230730 (GFLOPS, More Is Better)
NVIDIA RTX 4070 TI SUPER: 23920.67 (SE +/- 5.58, N = 3)
NVIDIA RTX 3090: 20353.95 (SE +/- 123.67, N = 3)

vkpeak

fp32-vec4

vkpeak 20230730 (GFLOPS, More Is Better)
NVIDIA RTX 4070 TI SUPER: 31635.47 (SE +/- 0.73, N = 3)
NVIDIA RTX 3090: 26767.21 (SE +/- 206.35, N = 3)

vkpeak

fp16-scalar

vkpeak 20230730 (GFLOPS, More Is Better)
NVIDIA RTX 4070 TI SUPER: 23894.70 (SE +/- 0.21, N = 3)
NVIDIA RTX 3090: 20151.44 (SE +/- 34.29, N = 3)

vkpeak

fp16-vec4

vkpeak 20230730 (GFLOPS, More Is Better)
NVIDIA RTX 4070 TI SUPER: 47340.52 (SE +/- 0.16, N = 3)
NVIDIA RTX 3090: 39860.80 (SE +/- 12.74, N = 3)

vkpeak

fp64-scalar

vkpeak 20230730 (GFLOPS, More Is Better)
NVIDIA RTX 4070 TI SUPER: 750.49 (SE +/- 0.01, N = 3)
NVIDIA RTX 3090: 638.84 (SE +/- 0.06, N = 3)

vkpeak

fp64-vec4

vkpeak 20230730 (GFLOPS, More Is Better)
NVIDIA RTX 4070 TI SUPER: 750.68 (SE +/- 0.45, N = 3)
NVIDIA RTX 3090: 639.52 (SE +/- 0.72, N = 3)

ViennaCL

Test: CPU BLAS - dGEMM-NN

ViennaCL 1.7.1 (GFLOPs/s, More Is Better)
NVIDIA RTX 4070 TI SUPER: 122 (SE +/- 1.50, N = 2)
NVIDIA RTX 4070: 122 (SE +/- 1.86, N = 3)
NVIDIA RTX 4070 SUPER: 119 (SE +/- 4.04, N = 3)
NVIDIA RTX 4070 TI: 117 (SE +/- 1.15, N = 3)
NVIDIA RTX 3090: 113 (SE +/- 1.86, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dGEMM-NT

ViennaCL 1.7.1 (GFLOPs/s, More Is Better)
NVIDIA RTX 4070: 122 (SE +/- 1.76, N = 3)
NVIDIA RTX 4070 TI SUPER: 119 (SE +/- 3.50, N = 2)
NVIDIA RTX 3090: 119 (SE +/- 3.28, N = 3)
NVIDIA RTX 4070 TI: 118 (SE +/- 1.20, N = 3)
NVIDIA RTX 4070 SUPER: 117 (SE +/- 2.08, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dGEMM-TN

ViennaCL 1.7.1 (GFLOPs/s, More Is Better)
NVIDIA RTX 4070 TI: 125 (SE +/- 2.08, N = 3)
NVIDIA RTX 3090: 121 (SE +/- 2.08, N = 3)
NVIDIA RTX 4070: 121 (SE +/- 2.31, N = 3)
NVIDIA RTX 4070 TI SUPER: 120 (SE +/- 3.00, N = 2)
NVIDIA RTX 4070 SUPER: 115 (SE +/- 1.00, N = 2)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dGEMM-TT

ViennaCL 1.7.1 (GFLOPs/s, More Is Better)
NVIDIA RTX 4070 TI: 124 (SE +/- 2.08, N = 3)
NVIDIA RTX 4070 SUPER: 122 (SE +/- 2.08, N = 3)
NVIDIA RTX 4070: 118 (SE +/- 1.20, N = 3)
NVIDIA RTX 4070 TI SUPER: 117 (SE +/- 2.91, N = 3)
NVIDIA RTX 3090: 113 (SE +/- 0.88, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMM-NN

ViennaCL 1.7.1 (GFLOPs/s, More Is Better)
NVIDIA RTX 4070 TI SUPER: 681 (SE +/- 1.33, N = 3)
NVIDIA RTX 4070 TI: 604 (SE +/- 0.33, N = 3)
NVIDIA RTX 3090: 592 (SE +/- 2.31, N = 3)
NVIDIA RTX 4070 SUPER: 577 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 473 (SE +/- 0.33, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMM-NT

ViennaCL 1.7.1 (GFLOPs/s, More Is Better)
NVIDIA RTX 4070 TI SUPER: 689 (SE +/- 1.00, N = 3)
NVIDIA RTX 4070 TI: 612 (SE +/- 0.33, N = 3)
NVIDIA RTX 3090: 595 (SE +/- 2.33, N = 3)
NVIDIA RTX 4070 SUPER: 584 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 477 (SE +/- 0.33, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMM-TN

ViennaCL 1.7.1 (GFLOPs/s, More Is Better)
NVIDIA RTX 4070 TI SUPER: 714 (SE +/- 1.00, N = 3)
NVIDIA RTX 4070 TI: 634 (SE +/- 0.67, N = 3)
NVIDIA RTX 4070 SUPER: 599 (SE +/- 0.00, N = 3)
NVIDIA RTX 3090: 594 (SE +/- 2.03, N = 3)
NVIDIA RTX 4070: 494 (SE +/- 0.33, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMM-TT

ViennaCL 1.7.1 (GFLOPs/s, More Is Better)
NVIDIA RTX 4070 TI SUPER: 731
NVIDIA RTX 4070 TI: 648
NVIDIA RTX 4070 SUPER: 613
NVIDIA RTX 3090: 593
NVIDIA RTX 4070: 502
SE as exported (four entries for five results): SE +/- 1.33, N = 3; SE +/- 0.33, N = 3; SE +/- 0.00, N = 3; SE +/- 0.33, N = 3
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

clpeak

OpenCL Test: Integer Compute INT

clpeak 1.1.2 (GIOPS, More Is Better)
NVIDIA RTX 4070 TI SUPER: 22171.25 (SE +/- 28.14, N = 3)
NVIDIA RTX 4070 TI: 19821.10 (SE +/- 2.50, N = 3)
NVIDIA RTX 4070 SUPER: 18170.54 (SE +/- 3.14, N = 3)
NVIDIA RTX 3090: 17923.33 (SE +/- 16.49, N = 3)
NVIDIA RTX 4070: 14555.19 (SE +/- 15.26, N = 3)
1. (CXX) g++ options: -O3

vkpeak

int32-scalar

vkpeak 20230730 (GIOPS, More Is Better)
NVIDIA RTX 4070 TI SUPER: 23888.02 (SE +/- 0.08, N = 3)
NVIDIA RTX 3090: 20315.10 (SE +/- 16.76, N = 3)

vkpeak

int32-vec4

vkpeak 20230730 (GIOPS, More Is Better)
NVIDIA RTX 4070 TI SUPER: 23768.27 (SE +/- 0.99, N = 3)
NVIDIA RTX 3090: 20017.06 (SE +/- 15.96, N = 3)

vkpeak

int16-scalar

vkpeak 20230730 (GIOPS, More Is Better)
NVIDIA RTX 4070 TI SUPER: 15901.32 (SE +/- 0.75, N = 3)
NVIDIA RTX 3090: 13273.53 (SE +/- 10.34, N = 3)

vkpeak

int16-vec4

vkpeak 20230730 (GIOPS, More Is Better)
NVIDIA RTX 4070 TI SUPER: 21156.99 (SE +/- 1.60, N = 3)
NVIDIA RTX 3090: 16338.23 (SE +/- 9.58, N = 3)

Hashcat

Benchmark: MD5

Hashcat 6.2.4 (H/s, More Is Better)
NVIDIA RTX 4070 TI SUPER: 82004966667 (SE +/- 97655010.68, N = 3)
NVIDIA RTX 4070 TI: 73312233333 (SE +/- 11283665.68, N = 3)
NVIDIA RTX 4070 SUPER: 67583033333 (SE +/- 22430807.19, N = 3)
NVIDIA RTX 3090: 67177300000 (SE +/- 53667246.37, N = 3)
NVIDIA RTX 4070: 56147866667 (SE +/- 33772046.30, N = 3)

Hashcat

Benchmark: SHA1

Hashcat 6.2.4 (H/s, More Is Better)
NVIDIA RTX 4070 TI SUPER: 26388600000 (SE +/- 29067564.97, N = 3)
NVIDIA RTX 4070 TI: 23532400000 (SE +/- 15926811.78, N = 3)
NVIDIA RTX 4070 SUPER: 22132600000 (SE +/- 5140363.15, N = 3)
NVIDIA RTX 3090: 21323733333 (SE +/- 26244639.66, N = 3)
NVIDIA RTX 4070: 18202466667 (SE +/- 6318315.53, N = 3)

Hashcat

Benchmark: 7-Zip

Hashcat 6.2.4 (H/s, More Is Better)
NVIDIA RTX 4070 TI SUPER: 1420700 (SE +/- 1628.91, N = 3)
NVIDIA RTX 4070 TI: 1262633 (SE +/- 2339.04, N = 3)
NVIDIA RTX 4070 SUPER: 1176467 (SE +/- 1991.93, N = 3)
NVIDIA RTX 3090: 1056000 (SE +/- 1587.45, N = 3)
NVIDIA RTX 4070: 976967 (SE +/- 2062.63, N = 3)

Hashcat

Benchmark: SHA-512

Hashcat 6.2.4 (H/s, More Is Better)
NVIDIA RTX 4070 TI SUPER: 3887033333 (SE +/- 1098989.43, N = 3)
NVIDIA RTX 4070 TI: 3462500000 (SE +/- 721110.26, N = 3)
NVIDIA RTX 4070 SUPER: 3232733333 (SE +/- 1530068.99, N = 3)
NVIDIA RTX 3090: 3081866667 (SE +/- 3288532.26, N = 3)
NVIDIA RTX 4070: 2673300000 (SE +/- 1059874.21, N = 3)

Hashcat

Benchmark: TrueCrypt RIPEMD160 + XTS

Hashcat 6.2.4 (H/s, More Is Better)
NVIDIA RTX 4070 TI SUPER: 961733 (SE +/- 392.99, N = 3)
NVIDIA RTX 4070 TI: 858600 (SE +/- 888.82, N = 3)
NVIDIA RTX 4070 SUPER: 802967 (SE +/- 633.33, N = 3)
NVIDIA RTX 3090: 797833 (SE +/- 1757.21, N = 3)
NVIDIA RTX 4070: 660967 (SE +/- 176.38, N = 3)

TensorFlow

Device: GPU - Batch Size: 1 - Model: VGG-16

TensorFlow 2.12 (images/sec, More Is Better)
NVIDIA RTX 3090: 1.38
NVIDIA RTX 4070 TI: 1.38
NVIDIA RTX 4070: 1.36
NVIDIA RTX 4070 SUPER: 1.35
NVIDIA RTX 4070 TI SUPER: 1.32
SE as exported (four entries for five results): SE +/- 0.01, N = 3; SE +/- 0.01, N = 3; SE +/- 0.01, N = 2; SE +/- 0.00, N = 3

TensorFlow

Device: GPU - Batch Size: 1 - Model: AlexNet

TensorFlow 2.12 (images/sec, More Is Better)
NVIDIA RTX 4070 TI: 14.79 (SE +/- 0.06, N = 2)
NVIDIA RTX 3090: 14.45 (SE +/- 0.20, N = 15)
NVIDIA RTX 4070: 14.04 (SE +/- 0.16, N = 3)
NVIDIA RTX 4070 SUPER: 13.92 (SE +/- 0.22, N = 2)
NVIDIA RTX 4070 TI SUPER: 12.26 (SE +/- 0.13, N = 15)

TensorFlow

Device: GPU - Batch Size: 16 - Model: VGG-16

TensorFlow 2.12 (images/sec, More Is Better)
NVIDIA RTX 4070: 1.50
NVIDIA RTX 3090: 1.49
NVIDIA RTX 4070 TI: 1.49
NVIDIA RTX 4070 SUPER: 1.48
NVIDIA RTX 4070 TI SUPER: 1.45
SE as exported (four entries for five results): SE +/- 0.01, N = 2; SE +/- 0.00, N = 3; SE +/- 0.00, N = 2; SE +/- 0.00, N = 3

TensorFlow

Device: GPU - Batch Size: 32 - Model: VGG-16

TensorFlow 2.12 (images/sec, More Is Better)
NVIDIA RTX 3090: 1.50 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI: 1.50 (SE +/- 0.00, N = 2)
NVIDIA RTX 4070: 1.50 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 SUPER: 1.50 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI SUPER: 1.46 (SE +/- 0.00, N = 3)

TensorFlow

Device: GPU - Batch Size: 64 - Model: VGG-16

TensorFlow 2.12 (images/sec, More Is Better)
NVIDIA RTX 3090: 1.51 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI: 1.50 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 1.50 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI SUPER: 1.46 (SE +/- 0.00, N = 3)

TensorFlow

Device: GPU - Batch Size: 16 - Model: AlexNet

TensorFlow 2.12 (images/sec, More Is Better)
NVIDIA RTX 3090: 31.98
NVIDIA RTX 4070 TI: 31.70
NVIDIA RTX 4070 SUPER: 31.59
NVIDIA RTX 4070: 31.45
NVIDIA RTX 4070 TI SUPER: 31.10
SE as exported (four entries for five results): SE +/- 0.07, N = 3; SE +/- 0.08, N = 3; SE +/- 0.17, N = 3; SE +/- 0.07, N = 3

TensorFlow

Device: GPU - Batch Size: 256 - Model: VGG-16

TensorFlow 2.12 (images/sec, More Is Better)
NVIDIA RTX 3090: 1.51
NVIDIA RTX 4070 TI: 1.50
NVIDIA RTX 4070 TI SUPER: 1.47
SE as exported (two entries for three results): SE +/- 0.00, N = 3; SE +/- 0.00, N = 3

TensorFlow

Device: GPU - Batch Size: 32 - Model: AlexNet

TensorFlow 2.12 (images/sec, More Is Better)
NVIDIA RTX 3090: 33.53 (SE +/- 0.05, N = 3)
NVIDIA RTX 4070 SUPER: 33.40 (SE +/- 0.15, N = 2)
NVIDIA RTX 4070: 33.32 (SE +/- 0.18, N = 3)
NVIDIA RTX 4070 TI: 33.29 (SE +/- 0.04, N = 3)
NVIDIA RTX 4070 TI SUPER: 32.88 (SE +/- 0.19, N = 3)

TensorFlow

Device: GPU - Batch Size: 64 - Model: AlexNet

TensorFlow 2.12 (images/sec, More Is Better)
NVIDIA RTX 4070 TI: 34.06
NVIDIA RTX 4070 SUPER: 33.97
NVIDIA RTX 3090: 33.93
NVIDIA RTX 4070: 33.93
NVIDIA RTX 4070 TI SUPER: 33.55
SE as exported (four entries for five results): SE +/- 0.06, N = 3; SE +/- 0.08, N = 3; SE +/- 0.14, N = 3; SE +/- 0.06, N = 3

TensorFlow

Device: GPU - Batch Size: 1 - Model: GoogLeNet

TensorFlow 2.12 (images/sec, More Is Better)
NVIDIA RTX 3090: 12.82 (SE +/- 0.07, N = 3)
NVIDIA RTX 4070 TI: 12.79 (SE +/- 0.30, N = 2)
NVIDIA RTX 4070: 12.78 (SE +/- 0.10, N = 3)
NVIDIA RTX 4070 SUPER: 12.62 (SE +/- 0.17, N = 2)
NVIDIA RTX 4070 TI SUPER: 12.24 (SE +/- 0.05, N = 3)

TensorFlow

Device: GPU - Batch Size: 1 - Model: ResNet-50

TensorFlow 2.12 (images/sec, More Is Better)
NVIDIA RTX 3090: 4.35
NVIDIA RTX 4070 SUPER: 4.35
NVIDIA RTX 4070: 4.34
NVIDIA RTX 4070 TI: 4.32
NVIDIA RTX 4070 TI SUPER: 4.14
SE as exported (four entries for five results): SE +/- 0.03, N = 3; SE +/- 0.01, N = 3; SE +/- 0.02, N = 2; SE +/- 0.02, N = 3

TensorFlow

Device: GPU - Batch Size: 256 - Model: AlexNet

TensorFlow 2.12 (images/sec, More Is Better)
NVIDIA RTX 4070 TI: 34.61 (SE +/- 0.07, N = 2)
NVIDIA RTX 3090: 34.46 (SE +/- 0.07, N = 3)
NVIDIA RTX 4070 SUPER: 34.16 (SE +/- 0.01, N = 3)
NVIDIA RTX 4070 TI SUPER: 33.95 (SE +/- 0.05, N = 3)

TensorFlow

Device: GPU - Batch Size: 512 - Model: AlexNet

TensorFlow 2.12 (images/sec, More Is Better)
NVIDIA RTX 3090: 35.58 (SE +/- 0.01, N = 3)
NVIDIA RTX 4070 TI: 35.44 (SE +/- 0.09, N = 2)
NVIDIA RTX 4070: 35.21 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070 SUPER: 35.10 (SE +/- 0.02, N = 2)
NVIDIA RTX 4070 TI SUPER: 35.02 (SE +/- 0.01, N = 3)

TensorFlow

Device: GPU - Batch Size: 16 - Model: GoogLeNet

TensorFlow 2.12 (images/sec, More Is Better)
NVIDIA RTX 4070 TI: 15.69 (SE +/- 0.07, N = 3)
NVIDIA RTX 3090: 15.68 (SE +/- 0.05, N = 3)
NVIDIA RTX 4070 SUPER: 15.67 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070: 15.66 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070 TI SUPER: 15.29 (SE +/- 0.02, N = 3)

TensorFlow

Device: GPU - Batch Size: 16 - Model: ResNet-50

TensorFlow 2.12 (images/sec, More Is Better)
NVIDIA RTX 3090: 5.49 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 5.49 (SE +/- 0.02, N = 3)
NVIDIA RTX 4070 TI: 5.46 (SE +/- 0.01, N = 3)
NVIDIA RTX 4070 SUPER: 5.46 (SE +/- 0.00, N = 2)
NVIDIA RTX 4070 TI SUPER: 5.32 (SE +/- 0.02, N = 3)

TensorFlow

Device: GPU - Batch Size: 32 - Model: GoogLeNet

TensorFlow 2.12 (images/sec, More Is Better)
NVIDIA RTX 4070 TI: 15.81 (SE +/- 0.03, N = 3)
NVIDIA RTX 3090: 15.67 (SE +/- 0.06, N = 3)
NVIDIA RTX 4070: 15.63 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070 SUPER: 15.61 (SE +/- 0.01, N = 2)
NVIDIA RTX 4070 TI SUPER: 15.11 (SE +/- 0.06, N = 3)

TensorFlow

Device: GPU - Batch Size: 32 - Model: ResNet-50

TensorFlow 2.12 (images/sec, More Is Better)
NVIDIA RTX 3090: 5.57 (SE +/- 0.01, N = 3)
NVIDIA RTX 4070: 5.55 (SE +/- 0.01, N = 2)
NVIDIA RTX 4070 SUPER: 5.51 (SE +/- 0.01, N = 2)
NVIDIA RTX 4070 TI: 5.50 (SE +/- 0.02, N = 3)
NVIDIA RTX 4070 TI SUPER: 5.35 (SE +/- 0.00, N = 3)

TensorFlow

Device: GPU - Batch Size: 64 - Model: GoogLeNet

TensorFlow 2.12 (images/sec, More Is Better)
NVIDIA RTX 3090: 15.63
NVIDIA RTX 4070: 15.54
NVIDIA RTX 4070 SUPER: 15.52
NVIDIA RTX 4070 TI: 15.50
NVIDIA RTX 4070 TI SUPER: 15.00
SE as exported (four entries for five results): SE +/- 0.08, N = 3; SE +/- 0.07, N = 3; SE +/- 0.06, N = 2; SE +/- 0.09, N = 3

TensorFlow

Device: GPU - Batch Size: 64 - Model: ResNet-50

TensorFlow 2.12 (images/sec, More Is Better)
NVIDIA RTX 3090: 5.57 (SE +/- 0.01, N = 3)
NVIDIA RTX 4070: 5.55 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 SUPER: 5.55 (SE +/- 0.01, N = 2)
NVIDIA RTX 4070 TI: 5.53 (SE +/- 0.01, N = 2)
NVIDIA RTX 4070 TI SUPER: 5.33 (SE +/- 0.02, N = 3)

GpuOwl

Exponent: 57885161

GpuOwl 7.2.1 (Iterations / Second, More Is Better)
NVIDIA RTX 4070 TI SUPER: 1025.99 (SE +/- 0.35, N = 3)
NVIDIA RTX 4070 TI: 919.13 (SE +/- 2.53, N = 3)
NVIDIA RTX 4070 SUPER: 869.07 (SE +/- 1.26, N = 3)
NVIDIA RTX 3090: 866.31 (SE +/- 2.01, N = 3)
NVIDIA RTX 4070: 714.80 (SE +/- 0.00, N = 3)

GpuOwl

Exponent: 77936867

GpuOwl 7.2.1 (Iterations / Second, More Is Better)
NVIDIA RTX 4070 TI SUPER: 761.61 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI: 676.59 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 SUPER: 646.41 (SE +/- 0.00, N = 3)
NVIDIA RTX 3090: 645.99 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 530.32 (SE +/- 0.09, N = 3)

GpuOwl

Exponent: 332220523

GpuOwl 7.2.1 (Iterations / Second, More Is Better)
NVIDIA RTX 4070 TI SUPER: 163.41 (SE +/- 0.01, N = 3)
NVIDIA RTX 4070 TI: 145.84 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 SUPER: 137.44 (SE +/- 0.00, N = 3)
NVIDIA RTX 3090: 137.32 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 112.61 (SE +/- 0.00, N = 3)

IndigoBench

Acceleration: OpenCL GPU - Scene: Bedroom

IndigoBench 4.4 (M samples/s, More Is Better)
NVIDIA RTX 4070 TI SUPER: 24.57 (SE +/- 0.01, N = 3)
NVIDIA RTX 3090: 20.96 (SE +/- 0.01, N = 3)
NVIDIA RTX 4070 TI: 20.26 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070 SUPER: 19.80 (SE +/- 0.01, N = 3)
NVIDIA RTX 4070: 18.20 (SE +/- 0.01, N = 3)

IndigoBench

Acceleration: OpenCL GPU - Scene: Supercar

IndigoBench 4.4 (M samples/s, More Is Better)
NVIDIA RTX 4070 TI SUPER: 61.34 (SE +/- 0.06, N = 3)
NVIDIA RTX 4070 TI: 53.59 (SE +/- 0.02, N = 3)
NVIDIA RTX 4070 SUPER: 52.81 (SE +/- 0.03, N = 3)
NVIDIA RTX 3090: 52.01 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070: 48.52 (SE +/- 0.03, N = 3)

LuxCoreRender

Scene: DLSC - Acceleration: GPU

LuxCoreRender 2.6 (M samples/sec, More Is Better)
NVIDIA RTX 4070 TI SUPER: 16.23 (SE +/- 0.01, N = 3; MIN: 15.91 / MAX: 16.36)
NVIDIA RTX 4070 TI: 13.95 (SE +/- 0.00, N = 3; MIN: 13.67 / MAX: 14.14)
NVIDIA RTX 4070 SUPER: 13.59 (SE +/- 0.01, N = 3; MIN: 12.52 / MAX: 13.84)
NVIDIA RTX 3090: 12.99 (SE +/- 1.13, N = 12; MIN: 0.52 / MAX: 14.69)
NVIDIA RTX 4070: 11.74 (SE +/- 0.01, N = 3; MIN: 11.35 / MAX: 11.83)

LuxCoreRender

Scene: Danish Mood - Acceleration: GPU

LuxCoreRender 2.6 (M samples/sec, More Is Better)
NVIDIA RTX 4070 TI SUPER: 12.42 (SE +/- 0.03, N = 3; MIN: 4.35 / MAX: 14.32)
NVIDIA RTX 4070 TI: 10.99 (SE +/- 0.11, N = 3; MIN: 4.17 / MAX: 12.71)
NVIDIA RTX 4070 SUPER: 10.56 (SE +/- 0.08, N = 3; MIN: 3.7 / MAX: 12.17)
NVIDIA RTX 3090: 10.20 (SE +/- 0.04, N = 3; MIN: 4.07 / MAX: 11.93)
NVIDIA RTX 4070: 8.89 (SE +/- 0.06, N = 3; MIN: 3.32 / MAX: 10.26)

LuxCoreRender

Scene: Orange Juice - Acceleration: GPU

LuxCoreRender 2.6 (M samples/sec, More Is Better)
NVIDIA RTX 4070 TI SUPER: 13.64 (SE +/- 0.15, N = 4; MIN: 11.16 / MAX: 18.46)
NVIDIA RTX 3090: 12.14 (SE +/- 0.01, N = 3; MIN: 10.24 / MAX: 16.71)
NVIDIA RTX 4070 TI: 11.89 (SE +/- 0.00, N = 3; MIN: 9.85 / MAX: 15.88)
NVIDIA RTX 4070 SUPER: 11.72 (SE +/- 0.07, N = 3; MIN: 9.6 / MAX: 15.44)
NVIDIA RTX 4070: 10.40 (SE +/- 0.03, N = 3; MIN: 8.31 / MAX: 13.9)

LuxCoreRender

Scene: LuxCore Benchmark - Acceleration: GPU

LuxCoreRender 2.6 (M samples/sec, More Is Better)
NVIDIA RTX 4070 TI SUPER: 14.61 (SE +/- 0.00, N = 3; MIN: 5.91 / MAX: 16.88)
NVIDIA RTX 4070 TI: 13.23 (SE +/- 0.01, N = 3; MIN: 5.41 / MAX: 15.13)
NVIDIA RTX 3090: 13.12 (SE +/- 0.03, N = 2; MIN: 4.85 / MAX: 15.21)
NVIDIA RTX 4070 SUPER: 12.82 (SE +/- 0.02, N = 3; MIN: 4.84 / MAX: 14.62)
NVIDIA RTX 4070: 10.92 (SE +/- 0.01, N = 3; MIN: 4.45 / MAX: 12.42)

LuxCoreRender

Scene: Rainbow Colors and Prism - Acceleration: GPU

LuxCoreRender 2.6 (M samples/sec, More Is Better)
NVIDIA RTX 3090: 33.29 (SE +/- 0.36, N = 5; MIN: 30.4 / MAX: 36.21)
NVIDIA RTX 4070 TI SUPER: 31.86 (SE +/- 0.09, N = 3; MIN: 28.57 / MAX: 33.29)
NVIDIA RTX 4070 TI: 27.71 (SE +/- 0.07, N = 3; MIN: 25.01 / MAX: 29.15)
NVIDIA RTX 4070 SUPER: 27.67 (SE +/- 0.03, N = 3; MIN: 24.87 / MAX: 29.03)
NVIDIA RTX 4070: 23.26 (SE +/- 0.01, N = 3; MIN: 20.92 / MAX: 24.3)

FAHBench

FAHBench 2.3.2 (Ns Per Day, More Is Better)
NVIDIA RTX 4070 TI SUPER: 394.74 (SE +/- 0.22, N = 3)
NVIDIA RTX 4070 TI: 382.16 (SE +/- 0.26, N = 3)
NVIDIA RTX 4070 SUPER: 366.06 (SE +/- 0.39, N = 3)
NVIDIA RTX 3090: 343.02 (SE +/- 0.26, N = 3)
NVIDIA RTX 4070: 317.20 (SE +/- 0.12, N = 3)

MandelGPU

OpenCL Device: GPU

MandelGPU 1.3pts1 (Samples/sec, More Is Better)
NVIDIA RTX 4070 TI SUPER: 656484783.7 (SE +/- 1096202.13, N = 3)
NVIDIA RTX 4070 TI: 619106132.5 (SE +/- 1202791.77, N = 3)
NVIDIA RTX 4070 SUPER: 587219538.2 (SE +/- 467034.80, N = 3)
NVIDIA RTX 4070: 516770131.2 (SE +/- 1783157.89, N = 3)
NVIDIA RTX 3090: 484098913.8 (SE +/- 794770.01, N = 3)
1. (CC) gcc options: -O3 -lm -ftree-vectorize -funroll-loops -lglut -lOpenCL -lGL

OctaneBench

Total Score

OctaneBench 2020.1 (Score, More Is Better)
NVIDIA RTX 4070 TI SUPER: 876.44
NVIDIA RTX 4070 TI: 735.94
NVIDIA RTX 4070 SUPER: 720.97
NVIDIA RTX 3090: 674.25
NVIDIA RTX 4070: 648.00

ProjectPhysX OpenCL-Benchmark

Operation: FP64 Compute

ProjectPhysX OpenCL-Benchmark 1.2 (TFLOPs/s, More Is Better)
NVIDIA RTX 4070 TI SUPER: 0.743 (SE +/- 0.001, N = 3)
NVIDIA RTX 4070 TI: 0.660 (SE +/- 0.001, N = 3)
NVIDIA RTX 3090: 0.637 (SE +/- 0.001, N = 3)
NVIDIA RTX 4070 SUPER: 0.621 (SE +/- 0.000, N = 3)
NVIDIA RTX 4070: 0.510 (SE +/- 0.001, N = 3)
1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

ProjectPhysX OpenCL-Benchmark

Operation: FP32 Compute

ProjectPhysX OpenCL-Benchmark 1.2 (TFLOPs/s, More Is Better)
NVIDIA RTX 4070 TI SUPER: 45.95 (SE +/- 0.01, N = 3)
NVIDIA RTX 4070 TI: 40.91 (SE +/- 0.00, N = 3)
NVIDIA RTX 3090: 39.40 (SE +/- 0.10, N = 3)
NVIDIA RTX 4070 SUPER: 38.59 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070: 31.77 (SE +/- 0.03, N = 3)
1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

ProjectPhysX OpenCL-Benchmark

Operation: INT64 Compute

ProjectPhysX OpenCL-Benchmark 1.2 (TIOPs/s, More Is Better)
NVIDIA RTX 4070 TI: 4.420 (SE +/- 0.016, N = 3)
NVIDIA RTX 4070 TI SUPER: 4.414 (SE +/- 0.009, N = 3)
NVIDIA RTX 4070 SUPER: 4.214 (SE +/- 0.015, N = 3)
NVIDIA RTX 4070: 3.443 (SE +/- 0.004, N = 3)
NVIDIA RTX 3090: 3.135 (SE +/- 0.003, N = 3)
1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

ProjectPhysX OpenCL-Benchmark

Operation: INT32 Compute

ProjectPhysX OpenCL-Benchmark 1.2 (TIOPs/s, More Is Better)
NVIDIA RTX 4070 TI SUPER: 23.66 (SE +/- 0.01, N = 3)
NVIDIA RTX 4070 TI: 21.05 (SE +/- 0.04, N = 3)
NVIDIA RTX 3090: 20.03 (SE +/- 0.06, N = 3)
NVIDIA RTX 4070 SUPER: 19.89 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 16.38 (SE +/- 0.02, N = 3)
1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

ProjectPhysX OpenCL-Benchmark

Operation: INT16 Compute

ProjectPhysX OpenCL-Benchmark 1.2 (TIOPs/s, More Is Better)
NVIDIA RTX 4070 TI SUPER: 20.50 (SE +/- 0.02, N = 3)
NVIDIA RTX 4070 TI: 18.28 (SE +/- 0.02, N = 3)
NVIDIA RTX 4070 SUPER: 17.17 (SE +/- 0.00, N = 3)
NVIDIA RTX 3090: 17.00 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 14.28 (SE +/- 0.02, N = 3)
1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

ProjectPhysX OpenCL-Benchmark

Operation: INT8 Compute

ProjectPhysX OpenCL-Benchmark 1.2 (TIOPs/s, More Is Better)
NVIDIA RTX 4070 TI SUPER: 17.62 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI: 15.73 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070 SUPER: 14.31 (SE +/- 0.05, N = 3)
NVIDIA RTX 3090: 13.73 (SE +/- 0.07, N = 3)
NVIDIA RTX 4070: 12.12 (SE +/- 0.02, N = 3)
1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

NAMD CUDA

ATPase Simulation - 327,506 Atoms

NAMD CUDA 2.14 (days/ns, Fewer Is Better)
NVIDIA RTX 4070 TI: 0.06788 (SE +/- 0.00061, N = 3)
NVIDIA RTX 4070 SUPER: 0.06791 (SE +/- 0.00031, N = 3)
NVIDIA RTX 4070: 0.07498 (SE +/- 0.00021, N = 3)
NVIDIA RTX 4070 TI SUPER: 0.07715 (SE +/- 0.00018, N = 3)
NVIDIA RTX 3090: 0.10822 (SE +/- 0.00042, N = 3)

VkResample

Upscale: 2x - Precision: Double

VkResample 1.0 (ms, Fewer Is Better)
NVIDIA RTX 4070 TI SUPER: 285.99 (SE +/- 0.02, N = 3)
NVIDIA RTX 4070 TI: 322.06 (SE +/- 0.35, N = 3)
NVIDIA RTX 3090: 333.64 (SE +/- 0.30, N = 3)
NVIDIA RTX 4070 SUPER: 339.59 (SE +/- 0.30, N = 3)
NVIDIA RTX 4070: 415.16 (SE +/- 0.77, N = 3)
1. (CXX) g++ options: -O3

VkResample

Upscale: 2x - Precision: Single

VkResample 1.0 (ms, Fewer Is Better)
NVIDIA RTX 3090: 10.32 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI SUPER: 13.36 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 18.02 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI: 18.46 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 SUPER: 18.49 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -O3

FinanceBench

Benchmark: Black-Scholes OpenCL

FinanceBench 2016-07-25 (ms, Fewer Is Better)
NVIDIA RTX 4070 TI SUPER: 0.501 (SE +/- 0.000, N = 3)
NVIDIA RTX 4070 TI: 5.226 (SE +/- 0.003, N = 3)
NVIDIA RTX 3090: 5.741 (SE +/- 0.006, N = 3)
NVIDIA RTX 4070 SUPER: 5.912 (SE +/- 0.114, N = 15)
NVIDIA RTX 4070: 6.906 (SE +/- 0.003, N = 3)
1. (CXX) g++ options: -O3 -march=native -fopenmp

NCNN

Target: Vulkan GPU - Model: mobilenet

NCNN 20230517 (ms, Fewer Is Better)
NVIDIA RTX 4070 TI SUPER: 6.28 (SE +/- 0.05, N = 3; MIN: 6.16 / MAX: 8.09)
NVIDIA RTX 3090: 6.92 (SE +/- 0.22, N = 9; MIN: 6.06 / MAX: 8.65)
NVIDIA RTX 4070: 7.20 (SE +/- 0.21, N = 12; MIN: 6.2 / MAX: 11.13)
NVIDIA RTX 4070 TI: 7.45 (SE +/- 0.25, N = 12; MIN: 6.87 / MAX: 734.65)
NVIDIA RTX 4070 SUPER: 8.62 (SE +/- 0.47, N = 9; MIN: 6.42 / MAX: 1101.3)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU-v2-v2 - Model: mobilenet-v2

NCNN 20230517 (ms, Fewer Is Better)
NVIDIA RTX 3090: 2.34 (SE +/- 0.15, N = 3; MIN: 2.04 / MAX: 2.63)
NVIDIA RTX 4070 TI SUPER: 2.42 (SE +/- 0.09, N = 3; MIN: 2.24 / MAX: 9.23)
NVIDIA RTX 4070 TI: 2.43 (SE +/- 0.07, N = 9; MIN: 2.09 / MAX: 5.8)
NVIDIA RTX 4070: 2.48 (SE +/- 0.07, N = 12; MIN: 2.02 / MAX: 5.82)
NVIDIA RTX 4070 SUPER: 3.03 (SE +/- 0.44, N = 9; MIN: 2.38 / MAX: 970.87)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: mobilenet-v3

NCNN 20230517, Target: Vulkan GPU - Model: mobilenet-v3 (ms, Fewer Is Better)
NVIDIA RTX 4070 TI SUPER: 1.87 (SE +/- 0.02, N = 3; MIN: 1.81 / MAX: 5.21)
NVIDIA RTX 4070 TI: 2.09 (SE +/- 0.09, N = 9; MIN: 1.78 / MAX: 2.85)
NVIDIA RTX 4070: 2.15 (SE +/- 0.08, N = 12; MIN: 1.81 / MAX: 2.58)
NVIDIA RTX 3090: 2.20 (SE +/- 0.09, N = 9; MIN: 1.91 / MAX: 2.71)
NVIDIA RTX 4070 SUPER: 2.25 (SE +/- 0.16, N = 9; MIN: 1.75 / MAX: 343.7)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: shufflenet-v2

NCNN 20230517, Target: Vulkan GPU - Model: shufflenet-v2 (ms, Fewer Is Better)
NVIDIA RTX 4070 TI: 2.01 (SE +/- 0.08, N = 12; MIN: 1.73 / MAX: 3.86)
NVIDIA RTX 3090: 2.04 (SE +/- 0.21, N = 3; MIN: 1.8 / MAX: 5.8)
NVIDIA RTX 4070 TI SUPER: 2.05 (SE +/- 0.19, N = 3; MIN: 1.83 / MAX: 6.6)
NVIDIA RTX 4070: 2.08 (SE +/- 0.09, N = 11; MIN: 1.82 / MAX: 2.59)
NVIDIA RTX 4070 SUPER: 2.31 (SE +/- 0.34, N = 8; MIN: 1.76 / MAX: 421.42)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: mnasnet

NCNN 20230517, Target: Vulkan GPU - Model: mnasnet (ms, Fewer Is Better)
NVIDIA RTX 3090: 2.16 (SE +/- 0.14, N = 3; MIN: 2.01 / MAX: 2.55)
NVIDIA RTX 4070 TI SUPER: 2.21 (SE +/- 0.13, N = 3; MIN: 2.01 / MAX: 3.93)
NVIDIA RTX 4070: 2.22 (SE +/- 0.08, N = 8; MIN: 1.83 / MAX: 2.54)
NVIDIA RTX 4070 TI: 2.30 (SE +/- 0.05, N = 9; MIN: 2.15 / MAX: 2.58)
NVIDIA RTX 4070 SUPER: 3.85 (SE +/- 1.31, N = 9; MIN: 1.89 / MAX: 1093.29)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: efficientnet-b0

NCNN 20230517, Target: Vulkan GPU - Model: efficientnet-b0 (ms, Fewer Is Better)
NVIDIA RTX 3090: 3.34 (SE +/- 0.17, N = 3; MIN: 3.14 / MAX: 4)
NVIDIA RTX 4070 TI SUPER: 3.36 (SE +/- 0.07, N = 3; MIN: 3.21 / MAX: 3.57)
NVIDIA RTX 4070: 3.46 (SE +/- 0.09, N = 9; MIN: 2.91 / MAX: 3.79)
NVIDIA RTX 4070 TI: 3.46 (SE +/- 0.07, N = 12; MIN: 3.13 / MAX: 7.03)
NVIDIA RTX 4070 SUPER: 5.07 (SE +/- 0.97, N = 9; MIN: 3.22 / MAX: 1124.2)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: blazeface

NCNN 20230517, Target: Vulkan GPU - Model: blazeface (ms, Fewer Is Better)
NVIDIA RTX 4070 TI: 0.81 (SE +/- 0.03, N = 9; MIN: 0.61 / MAX: 1.19)
NVIDIA RTX 4070 SUPER: 0.84 (SE +/- 0.04, N = 9; MIN: 0.65 / MAX: 4.63)
NVIDIA RTX 4070: 0.84 (SE +/- 0.03, N = 9; MIN: 0.64 / MAX: 0.96)
NVIDIA RTX 3090: 0.84 (SE +/- 0.03, N = 9; MIN: 0.63 / MAX: 1.13)
NVIDIA RTX 4070 TI SUPER: 0.86 (SE +/- 0.05, N = 3; MIN: 0.75 / MAX: 2.68)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: googlenet

NCNN 20230517, Target: Vulkan GPU - Model: googlenet (ms, Fewer Is Better)
NVIDIA RTX 4070 TI: 5.87 (SE +/- 0.14, N = 9; MIN: 5.2 / MAX: 6.88)
NVIDIA RTX 4070: 6.06 (SE +/- 0.14, N = 9; MIN: 5.33 / MAX: 8.36)
NVIDIA RTX 3090: 6.11 (SE +/- 0.18, N = 9; MIN: 5.25 / MAX: 9.16)
NVIDIA RTX 4070 TI SUPER: 6.25 (SE +/- 0.24, N = 3; MIN: 5.84 / MAX: 6.86)
NVIDIA RTX 4070 SUPER: 11.04 (SE +/- 1.21, N = 9; MIN: 5.28 / MAX: 1769.19)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: vgg16

NCNN 20230517, Target: Vulkan GPU - Model: vgg16 (ms, Fewer Is Better)
NVIDIA RTX 3090: 17.88 (SE +/- 0.25, N = 3; MIN: 17.3 / MAX: 18.57)
NVIDIA RTX 4070 TI SUPER: 21.76 (SE +/- 0.19, N = 3; MIN: 21.34 / MAX: 23.45)
NVIDIA RTX 4070 TI: 32.05 (SE +/- 11.81, N = 9; MIN: 17.34 / MAX: 644.35)
NVIDIA RTX 4070: 45.52 (SE +/- 13.24, N = 12; MIN: 17.49 / MAX: 643.35)
NVIDIA RTX 4070 SUPER: 117.81 (SE +/- 29.60, N = 9; MIN: 17.16 / MAX: 647.67)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: resnet18

NCNN 20230517, Target: Vulkan GPU - Model: resnet18 (ms, Fewer Is Better)
NVIDIA RTX 3090: 4.12 (SE +/- 0.07, N = 3; MIN: 3.97 / MAX: 4.51)
NVIDIA RTX 4070 TI SUPER: 4.64 (SE +/- 0.08, N = 3; MIN: 4.46 / MAX: 7.98)
NVIDIA RTX 4070: 5.11 (SE +/- 0.73, N = 12; MIN: 3.99 / MAX: 916.69)
NVIDIA RTX 4070 TI: 5.47 (SE +/- 1.33, N = 9; MIN: 3.95 / MAX: 726.67)
NVIDIA RTX 4070 SUPER: 8.97 (SE +/- 3.49, N = 9; MIN: 3.94 / MAX: 922.04)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: alexnet

NCNN 20230517, Target: Vulkan GPU - Model: alexnet (ms, Fewer Is Better)
NVIDIA RTX 3090: 3.60 (SE +/- 0.09, N = 3; MIN: 3.44 / MAX: 3.79)
NVIDIA RTX 4070 TI: 3.74 (SE +/- 0.03, N = 9; MIN: 3.61 / MAX: 3.98)
NVIDIA RTX 4070 TI SUPER: 4.38 (SE +/- 0.03, N = 3; MIN: 4.29 / MAX: 6.18)
NVIDIA RTX 4070: 5.78 (SE +/- 1.70, N = 12; MIN: 3.6 / MAX: 397.75)
NVIDIA RTX 4070 SUPER: 16.17 (SE +/- 5.86, N = 9; MIN: 3.52 / MAX: 436.52)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: resnet50

NCNN 20230517, Target: Vulkan GPU - Model: resnet50 (ms, Fewer Is Better)
NVIDIA RTX 3090: 8.20 (SE +/- 0.12, N = 9; MIN: 7.69 / MAX: 11.69)
NVIDIA RTX 4070: 8.24 (SE +/- 0.10, N = 9; MIN: 7.87 / MAX: 9.87)
NVIDIA RTX 4070 TI SUPER: 8.58 (SE +/- 0.22, N = 3; MIN: 8.2 / MAX: 11.07)
NVIDIA RTX 4070 TI: 12.25 (SE +/- 4.00, N = 12; MIN: 8 / MAX: 1777.17)
NVIDIA RTX 4070 SUPER: 46.26 (SE +/- 14.70, N = 9; MIN: 7.71 / MAX: 1829.99)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: yolov4-tiny

NCNN 20230517, Target: Vulkan GPU - Model: yolov4-tiny (ms, Fewer Is Better)
NVIDIA RTX 3090: 11.29 (SE +/- 0.21, N = 3; MIN: 10.82 / MAX: 11.93)
NVIDIA RTX 4070 TI SUPER: 14.26 (SE +/- 3.14, N = 3; MIN: 10.89 / MAX: 673.37)
NVIDIA RTX 4070 TI: 16.37 (SE +/- 3.10, N = 12; MIN: 10.57 / MAX: 855.36)
NVIDIA RTX 4070: 20.74 (SE +/- 5.37, N = 12; MIN: 10.3 / MAX: 854.36)
NVIDIA RTX 4070 SUPER: 63.82 (SE +/- 10.56, N = 9; MIN: 10.28 / MAX: 858.44)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: squeezenet_ssd

NCNN 20230517, Target: Vulkan GPU - Model: squeezenet_ssd (ms, Fewer Is Better)
NVIDIA RTX 3090: 4.90 (SE +/- 0.19, N = 3; MIN: 4.47 / MAX: 5.27)
NVIDIA RTX 4070 TI SUPER: 5.11 (SE +/- 0.54, N = 3; MIN: 4.43 / MAX: 8.4)
NVIDIA RTX 4070: 5.18 (SE +/- 0.12, N = 12; MIN: 4.67 / MAX: 6.88)
NVIDIA RTX 4070 TI: 5.36 (SE +/- 0.29, N = 9; MIN: 4.55 / MAX: 496.3)
NVIDIA RTX 4070 SUPER: 6.86 (SE +/- 1.76, N = 9; MIN: 4.34 / MAX: 1630.01)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: regnety_400m

NCNN 20230517, Target: Vulkan GPU - Model: regnety_400m (ms, Fewer Is Better)
NVIDIA RTX 4070 TI: 5.89 (SE +/- 0.18, N = 12; MIN: 5.42 / MAX: 7.57)
NVIDIA RTX 4070: 6.21 (SE +/- 0.24, N = 12; MIN: 5.53 / MAX: 8.99)
NVIDIA RTX 3090: 6.47 (SE +/- 0.32, N = 8; MIN: 5.44 / MAX: 9.3)
NVIDIA RTX 4070 TI SUPER: 6.59 (SE +/- 0.26, N = 12; MIN: 5.45 / MAX: 9.09)
NVIDIA RTX 4070 SUPER: 11.11 (SE +/- 3.28, N = 9; MIN: 5.49 / MAX: 4942.19)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: vision_transformer

NCNN 20230517, Target: Vulkan GPU - Model: vision_transformer (ms, Fewer Is Better)
NVIDIA RTX 4070: 281.56 (SE +/- 61.31, N = 9; MIN: 46.48 / MAX: 1913.33)
NVIDIA RTX 4070 TI SUPER: 312.10 (SE +/- 57.46, N = 12; MIN: 47.85 / MAX: 1850.09)
NVIDIA RTX 3090: 327.82 (SE +/- 52.80, N = 9; MIN: 46.48 / MAX: 1816.93)
NVIDIA RTX 4070 TI: 390.18 (SE +/- 25.65, N = 9; MIN: 46.49 / MAX: 1816.77)
NVIDIA RTX 4070 SUPER: 844.61 (SE +/- 87.53, N = 9; MIN: 46.34 / MAX: 1866.93)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: FastestDet

NCNN 20230517, Target: Vulkan GPU - Model: FastestDet (ms, Fewer Is Better)
NVIDIA RTX 4070: 2.34 (SE +/- 0.10, N = 9; MIN: 2 / MAX: 3.86)
NVIDIA RTX 3090: 2.50 (SE +/- 0.08, N = 8; MIN: 2.1 / MAX: 32.36)
NVIDIA RTX 4070 TI SUPER: 2.54 (SE +/- 0.26, N = 3; MIN: 2.14 / MAX: 4.21)
NVIDIA RTX 4070 TI: 2.84 (SE +/- 0.12, N = 8; MIN: 2.4 / MAX: 5.07)
NVIDIA RTX 4070 SUPER: 2.86 (SE +/- 0.29, N = 9; MIN: 2.17 / MAX: 577.17)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

RealSR-NCNN

Scale: 4x - TAA: No

RealSR-NCNN 20200818, Scale: 4x - TAA: No (Seconds, Fewer Is Better)
NVIDIA RTX 3090: 5.556 (SE +/- 0.016, N = 3)
NVIDIA RTX 4070 TI SUPER: 5.633 (SE +/- 0.003, N = 3)
NVIDIA RTX 4070 TI: 5.962 (SE +/- 0.039, N = 3)
NVIDIA RTX 4070 SUPER: 6.323 (SE +/- 0.150, N = 15)
NVIDIA RTX 4070: 7.092 (SE +/- 0.006, N = 3)

RealSR-NCNN

Scale: 4x - TAA: Yes

RealSR-NCNN 20200818, Scale: 4x - TAA: Yes (Seconds, Fewer Is Better)
NVIDIA RTX 3090: 30.31 (SE +/- 0.06, N = 3)
NVIDIA RTX 4070 TI SUPER: 30.72 (SE +/- 0.02, N = 3)
NVIDIA RTX 4070 TI: 33.63 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070 SUPER: 34.89 (SE +/- 0.02, N = 3)
NVIDIA RTX 4070: 42.85 (SE +/- 0.23, N = 3)

Waifu2x-NCNN Vulkan

Scale: 2x - Denoise: 3 - TAA: Yes

Waifu2x-NCNN Vulkan 20200818, Scale: 2x - Denoise: 3 - TAA: Yes (Seconds, Fewer Is Better)
NVIDIA RTX 4070 TI SUPER: 2.660 (SE +/- 0.028, N = 3)
NVIDIA RTX 4070 TI: 2.854 (SE +/- 0.009, N = 3)
NVIDIA RTX 4070 SUPER: 2.855 (SE +/- 0.014, N = 3)
NVIDIA RTX 4070: 3.168 (SE +/- 0.028, N = 3)
NVIDIA RTX 3090: 3.202 (SE +/- 0.011, N = 3)

Rodinia

Test: OpenCL Particle Filter

Rodinia 3.1, Test: OpenCL Particle Filter (Seconds, Fewer Is Better)
NVIDIA RTX 4070 TI SUPER: 2.973 (SE +/- 0.004, N = 3)
NVIDIA RTX 4070 TI: 3.291 (SE +/- 0.002, N = 3)
NVIDIA RTX 4070 SUPER: 3.480 (SE +/- 0.039, N = 4)
NVIDIA RTX 3090: 3.844 (SE +/- 0.030, N = 15)
NVIDIA RTX 4070: 4.098 (SE +/- 0.008, N = 3)
1. (CXX) g++ options: -O2 -lOpenCL

Blender

Blend File: BMW27 - Compute: NVIDIA OptiX

Blender 4.0, Blend File: BMW27 - Compute: NVIDIA OptiX (Seconds, Fewer Is Better)
NVIDIA RTX 4070 TI SUPER: 5.04 (SE +/- 0.06, N = 14)
NVIDIA RTX 4070 TI: 5.43 (SE +/- 0.02, N = 3)
NVIDIA RTX 4070 SUPER: 5.57 (SE +/- 0.06, N = 13)
NVIDIA RTX 4070: 6.21 (SE +/- 0.01, N = 3)
NVIDIA RTX 3090: 6.31 (SE +/- 0.06, N = 14)

Blender

Blend File: Classroom - Compute: NVIDIA OptiX

Blender 4.0, Blend File: Classroom - Compute: NVIDIA OptiX (Seconds, Fewer Is Better)
NVIDIA RTX 4070 TI SUPER: 11.20 (SE +/- 0.04, N = 3)
NVIDIA RTX 4070 TI: 12.30 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070 SUPER: 12.60 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 14.86 (SE +/- 0.03, N = 3)
NVIDIA RTX 3090: 15.26 (SE +/- 0.02, N = 3)

Blender

Blend File: Fishy Cat - Compute: NVIDIA OptiX

Blender 4.0, Blend File: Fishy Cat - Compute: NVIDIA OptiX (Seconds, Fewer Is Better)
NVIDIA RTX 4070 TI SUPER: 8.32 (SE +/- 0.06, N = 13)
NVIDIA RTX 4070 TI: 9.02 (SE +/- 0.01, N = 3)
NVIDIA RTX 4070 SUPER: 9.45 (SE +/- 0.06, N = 13)
NVIDIA RTX 3090: 10.64 (SE +/- 0.08, N = 9)
NVIDIA RTX 4070: 11.03 (SE +/- 0.03, N = 3)

Blender

Blend File: Barbershop - Compute: NVIDIA OptiX

Blender 4.0, Blend File: Barbershop - Compute: NVIDIA OptiX (Seconds, Fewer Is Better)
NVIDIA RTX 4070 TI SUPER: 44.49 (SE +/- 0.08, N = 3)
NVIDIA RTX 4070 TI: 50.73 (SE +/- 0.05, N = 3)
NVIDIA RTX 4070 SUPER: 51.30 (SE +/- 0.10, N = 3)
NVIDIA RTX 3090: 54.30 (SE +/- 0.02, N = 2)
NVIDIA RTX 4070: 58.44 (SE +/- 0.04, N = 3)

Blender

Blend File: Pabellon Barcelona - Compute: NVIDIA OptiX

Blender 4.0, Blend File: Pabellon Barcelona - Compute: NVIDIA OptiX (Seconds, Fewer Is Better)
NVIDIA RTX 4070 TI SUPER: 12.56 (SE +/- 0.02, N = 3)
NVIDIA RTX 4070 TI: 13.97 (SE +/- 0.01, N = 3)
NVIDIA RTX 4070 SUPER: 14.29 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070: 16.55 (SE +/- 0.01, N = 3)
NVIDIA RTX 3090: 17.30 (SE +/- 0.02, N = 3)


Phoronix Test Suite v10.8.4