RTX 4070 SUPER

Intel Core i9-13900K testing with an ASUS TUF GAMING Z790-PRO WIFI (1401 BIOS) and NVIDIA GeForce RTX 3090 24GB on EndeavourOS rolling via the Phoronix Test Suite.

HTML result view exported from: https://openbenchmarking.org/result/2402116-SADD-240207012&grs.

RTX 4070 SUPER: System Details

Runs compared: NVIDIA RTX 4070 SUPER, NVIDIA RTX 4070, NVIDIA RTX 4070 TI, NVIDIA RTX 3090

Configuration (NVIDIA RTX 4070 SUPER):
Processor: Intel Core i9-13900K @ 5.50GHz (24 Cores / 32 Threads)
Motherboard: ASUS TUF GAMING Z790-PRO WIFI (1401 BIOS)
Chipset: Intel Device 7a27
Memory: 32GB
Disk: 4001GB Seagate ZP4000GP304001
Graphics: ASUS NVIDIA GeForce RTX 4070 SUPER 12GB
Audio: Realtek ALC1220
Monitor: ARZOPA
Network: Intel I226-V + Intel Device 7a70
OS: EndeavourOS rolling
Kernel: 6.7.1-arch1-1 (x86_64)
Desktop: KDE Plasma 5.27.10
Display Server: X Server 1.21.1.11
Display Driver: NVIDIA 550.40.07
OpenGL: 4.6.0
OpenCL: OpenCL 3.0 CUDA 12.4.74
Compiler: GCC 13.2.1 20230801
File-System: ext4
Screen Resolution: 1920x1080

Changed fields in the other runs, in export order:
NVIDIA RTX 4070: Graphics: MSI NVIDIA GeForce RTX 4070 12GB; Compiler: GCC 13.2.1 20230801 + CUDA 12.3
NVIDIA RTX 4070 TI: Graphics: NVIDIA GeForce RTX 4070 Ti 12GB
NVIDIA RTX 3090: Graphics: NVIDIA GeForce RTX 3090 24GB; Monitor: PI-KVM Video; Kernel: 6.7.4-arch1-1 (x86_64)

Kernel Details
- Transparent Huge Pages: always

Compiler Details
- NVIDIA RTX 4070 SUPER, NVIDIA RTX 4070, NVIDIA RTX 4070 TI: --disable-libssp --disable-libstdcxx-pch --disable-werror --enable-__cxa_atexit --enable-bootstrap --enable-cet=auto --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-default-ssp --enable-gnu-indirect-function --enable-gnu-unique-object --enable-languages=ada,c,c++,d,fortran,go,lto,objc,obj-c++ --enable-libstdcxx-backtrace --enable-link-serialization=1 --enable-lto --enable-multilib --enable-plugin --enable-shared --enable-threads=posix --mandir=/usr/share/man --with-build-config=bootstrap-lto --with-linker-hash-style=gnu
- NVIDIA RTX 3090: --disable-libssp --disable-libstdcxx-pch --disable-werror --enable-__cxa_atexit --enable-bootstrap --enable-cet=auto --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-default-ssp --enable-gnu-indirect-function --enable-gnu-unique-object --enable-languages=ada,c,c++,d,fortran,go,lto,m2,objc,obj-c++ --enable-libstdcxx-backtrace --enable-link-serialization=1 --enable-lto --enable-multilib --enable-plugin --enable-shared --enable-threads=posix --mandir=/usr/share/man --with-build-config=bootstrap-lto --with-linker-hash-style=gnu

Processor Details
- Scaling Governor: intel_pstate powersave (EPP: balance_performance)
- CPU Microcode: 0x11d

Graphics Details
- NVIDIA RTX 4070 SUPER: BAR1 / Visible vRAM Size: 16384 MiB - vBIOS Version: 95.04.69.00.c1
- NVIDIA RTX 4070: BAR1 / Visible vRAM Size: 16384 MiB - vBIOS Version: 95.04.3e.40.2a
- NVIDIA RTX 4070 TI: BAR1 / Visible vRAM Size: 16384 MiB - vBIOS Version: 95.04.31.00.36
- NVIDIA RTX 3090: BAR1 / Visible vRAM Size: 256 MiB - vBIOS Version: 94.02.26.08.ba

Security Details
- gather_data_sampling: Not affected
- itlb_multihit: Not affected
- l1tf: Not affected
- mds: Not affected
- meltdown: Not affected
- mmio_stale_data: Not affected
- retbleed: Not affected
- spec_rstack_overflow: Not affected
- spec_store_bypass: Mitigation of SSB disabled via prctl
- spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
- spectre_v2: Mitigation of Enhanced / Automatic IBRS IBPB: conditional RSB filling PBRSB-eIBRS: SW sequence
- srbds: Not affected
- tsx_async_abort: Not affected

Environment Details
- NVIDIA RTX 4070, NVIDIA RTX 4070 TI, NVIDIA RTX 3090: NVCC_PREPEND_FLAGS="-ccbin /opt/cuda/bin"

Python Details
- NVIDIA RTX 4070, NVIDIA RTX 4070 TI, NVIDIA RTX 3090: Python 3.11.6

RTX 4070 SUPER: Result Overview

[Flattened summary table of all test results for NVIDIA RTX 4070 SUPER, NVIDIA RTX 4070, NVIDIA RTX 4070 TI, and NVIDIA RTX 3090; the individual values cannot be separated reliably in this export. Per-test results follow.]

VkFFT

Test: FFT + iFFT C2C 1D batched in half precision

VkFFT 1.2.31 (OpenBenchmarking.org): Benchmark Score, More Is Better
NVIDIA RTX 4070 SUPER: 131705 (SE +/- 159.17, N = 3)
NVIDIA RTX 4070: 137762 (SE +/- 1301.92, N = 3)
NVIDIA RTX 4070 TI: 136210 (SE +/- 1708.38, N = 3)
NVIDIA RTX 3090: 273221 (SE +/- 160.60, N = 3)
1. (CXX) g++ options: -O3 -lrt
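To judge how much run-to-run noise sits behind a score, the reported SE can be expressed as a percentage of the score. The sketch below is illustrative only: it assumes "SE" is the standard error of the mean over N runs (so the sample standard deviation would be SE * sqrt(N)), which the export does not state explicitly. The function names are hypothetical; the figures are copied from the VkFFT half-precision result above.

```python
import math


def relative_standard_error(score: float, se: float) -> float:
    """Return the standard error as a percentage of the reported score."""
    return 100.0 * se / score


def implied_std_dev(se: float, n: int) -> float:
    """Recover a sample standard deviation, ASSUMING SE = s / sqrt(N)."""
    return se * math.sqrt(n)


# (score, SE, N) per run, copied from the VkFFT half-precision result.
vkfft_half = {
    "NVIDIA RTX 4070 SUPER": (131705, 159.17, 3),
    "NVIDIA RTX 4070": (137762, 1301.92, 3),
    "NVIDIA RTX 4070 TI": (136210, 1708.38, 3),
    "NVIDIA RTX 3090": (273221, 160.60, 3),
}

for gpu, (score, se, n) in vkfft_half.items():
    rse = relative_standard_error(score, se)
    sd = implied_std_dev(se, n)
    print(f"{gpu}: SE is {rse:.2f}% of the score (implied std dev {sd:.1f})")
```

Under that assumption, a difference between two runs that is smaller than a few standard errors is hard to distinguish from noise.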

ProjectPhysX OpenCL-Benchmark

Operation: Memory Bandwidth Coalesced Write

ProjectPhysX OpenCL-Benchmark 1.2 (OpenBenchmarking.org): GB/s, More Is Better
NVIDIA RTX 4070 SUPER: 455.01 (SE +/- 0.14, N = 3)
NVIDIA RTX 4070: 459.43 (SE +/- 0.16, N = 3)
NVIDIA RTX 4070 TI: 457.17 (SE +/- 0.11, N = 3)
NVIDIA RTX 3090: 887.31 (SE +/- 0.06, N = 3)
1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

VkFFT

Test: FFT + iFFT C2C 1D batched in single precision, no reshuffling

VkFFT 1.2.31 (OpenBenchmarking.org): Benchmark Score, More Is Better
NVIDIA RTX 4070 SUPER: 75078 (SE +/- 37.77, N = 3)
NVIDIA RTX 4070: 79057 (SE +/- 5.84, N = 3)
NVIDIA RTX 4070 TI: 75141 (SE +/- 28.54, N = 3)
NVIDIA RTX 3090: 144311 (SE +/- 37.44, N = 3)
1. (CXX) g++ options: -O3 -lrt

VkFFT

Test: FFT + iFFT C2C 1D batched in single precision

VkFFT 1.2.31 (OpenBenchmarking.org): Benchmark Score, More Is Better
NVIDIA RTX 4070 SUPER: 73929 (SE +/- 7.94, N = 3)
NVIDIA RTX 4070: 77774 (SE +/- 13.72, N = 3)
NVIDIA RTX 4070 TI: 73942 (SE +/- 0.88, N = 3)
NVIDIA RTX 3090: 141876 (SE +/- 9.64, N = 3)
1. (CXX) g++ options: -O3 -lrt

clpeak

OpenCL Test: Global Memory Bandwidth

clpeak 1.1.2 (OpenBenchmarking.org): GBPS, More Is Better
NVIDIA RTX 4070 SUPER: 437.65 (SE +/- 0.02, N = 3)
NVIDIA RTX 4070: 437.21 (SE +/- 0.02, N = 3)
NVIDIA RTX 4070 TI: 437.63 (SE +/- 0.02, N = 3)
NVIDIA RTX 3090: 816.55 (SE +/- 0.02, N = 3)
1. (CXX) g++ options: -O3

ProjectPhysX OpenCL-Benchmark

Operation: Memory Bandwidth Coalesced Read

ProjectPhysX OpenCL-Benchmark 1.2 (OpenBenchmarking.org): GB/s, More Is Better
NVIDIA RTX 4070 SUPER: 464.86 (SE +/- 0.01, N = 3)
NVIDIA RTX 4070: 465.18 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070 TI: 465.07 (SE +/- 0.01, N = 3)
NVIDIA RTX 3090: 864.11 (SE +/- 0.07, N = 3)
1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

cl-mem

Benchmark: Write

cl-mem 2017-01-13 (OpenBenchmarking.org): GB/s, More Is Better
NVIDIA RTX 4070 SUPER: 407.5 (SE +/- 1.11, N = 3)
NVIDIA RTX 4070: 406.7 (SE +/- 0.55, N = 3)
NVIDIA RTX 4070 TI: 412.2 (SE +/- 0.12, N = 3)
NVIDIA RTX 3090: 753.8 (SE +/- 0.83, N = 3)
1. (CC) gcc options: -O2 -flto -lOpenCL

cl-mem

Benchmark: Read

cl-mem 2017-01-13 (OpenBenchmarking.org): GB/s, More Is Better
NVIDIA RTX 4070 SUPER: 446.2 (SE +/- 0.12, N = 3)
NVIDIA RTX 4070: 446.3 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI: 446.3 (SE +/- 0.00, N = 3)
NVIDIA RTX 3090: 825.8 (SE +/- 0.32, N = 3)
1. (CC) gcc options: -O2 -flto -lOpenCL

VkResample

Upscale: 2x - Precision: Single

VkResample 1.0 (OpenBenchmarking.org): ms, Fewer Is Better
NVIDIA RTX 4070 SUPER: 18.49 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 18.02 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI: 18.46 (SE +/- 0.00, N = 3)
NVIDIA RTX 3090: 10.32 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -O3
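Because some results are "More Is Better" (GB/s, FPS) and others "Fewer Is Better" (ms, seconds), comparing runs across tests needs the ratio flipped for the latter. The following is a minimal sketch, not part of the Phoronix Test Suite: the helper name `relative_performance` and the choice of the RTX 4070 SUPER as baseline are assumptions made for illustration. The numbers are copied from the cl-mem Read and VkResample 2x Single results above.

```python
def relative_performance(results, baseline, higher_is_better):
    """Return each run's performance relative to the baseline run.

    A value above 1.0 means the run is faster than the baseline,
    regardless of whether the metric is higher- or lower-is-better.
    """
    base = results[baseline]
    if higher_is_better:
        return {gpu: value / base for gpu, value in results.items()}
    return {gpu: base / value for gpu, value in results.items()}


BASELINE = "NVIDIA RTX 4070 SUPER"  # illustrative choice of reference run

# cl-mem Read, GB/s, More Is Better
cl_mem_read = {
    "NVIDIA RTX 4070 SUPER": 446.2,
    "NVIDIA RTX 4070": 446.3,
    "NVIDIA RTX 4070 TI": 446.3,
    "NVIDIA RTX 3090": 825.8,
}

# VkResample 2x Single, ms, Fewer Is Better
vkresample_single = {
    "NVIDIA RTX 4070 SUPER": 18.49,
    "NVIDIA RTX 4070": 18.02,
    "NVIDIA RTX 4070 TI": 18.46,
    "NVIDIA RTX 3090": 10.32,
}

read_rel = relative_performance(cl_mem_read, BASELINE, higher_is_better=True)
resample_rel = relative_performance(vkresample_single, BASELINE, higher_is_better=False)

for gpu in cl_mem_read:
    print(f"{gpu}: cl-mem Read x{read_rel[gpu]:.2f}, VkResample x{resample_rel[gpu]:.2f}")
```

With both metrics normalized this way, the RTX 3090's advantage in these two tests can be read on the same scale.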

ViennaCL

Test: OpenCL BLAS - dAXPY

ViennaCL 1.7.1 (OpenBenchmarking.org): GB/s, More Is Better
NVIDIA RTX 4070 SUPER: 437 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 455 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI: 437 (SE +/- 0.33, N = 3)
NVIDIA RTX 3090: 724 (SE +/- 0.58, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

NAMD CUDA

ATPase Simulation - 327,506 Atoms

NAMD CUDA 2.14 (OpenBenchmarking.org): days/ns, Fewer Is Better
NVIDIA RTX 4070 SUPER: 0.06791 (SE +/- 0.00031, N = 3)
NVIDIA RTX 4070: 0.07498 (SE +/- 0.00021, N = 3)
NVIDIA RTX 4070 TI: 0.06788 (SE +/- 0.00061, N = 3)
NVIDIA RTX 3090: 0.10822 (SE +/- 0.00042, N = 3)

Libplacebo

Test: hdr_peakdetect

Libplacebo 5.229.1 (OpenBenchmarking.org): FPS, More Is Better
NVIDIA RTX 4070 SUPER: 3292.37 (SE +/- 3.65, N = 3)
NVIDIA RTX 4070: 3310.02 (SE +/- 11.75, N = 3)
NVIDIA RTX 4070 TI: 3544.60 (SE +/- 99.97, N = 3)
NVIDIA RTX 3090: 4997.08 (SE +/- 43.13, N = 3)
1. (CXX) g++ options: -lm -pthread -ldl -fvisibility=hidden -std=c++20 -O2 -fno-math-errno -fPIC -MD -MQ -MF

ViennaCL

Test: OpenCL BLAS - dDOT

ViennaCL 1.7.1 (OpenBenchmarking.org): GB/s, More Is Better
NVIDIA RTX 4070 SUPER: 458 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 456 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI: 457 (SE +/- 0.00, N = 3)
NVIDIA RTX 3090: 659 (SE +/- 0.88, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

LuxCoreRender

Scene: Rainbow Colors and Prism - Acceleration: GPU

LuxCoreRender 2.6 (OpenBenchmarking.org): M samples/sec, More Is Better
NVIDIA RTX 4070 SUPER: 27.67 (SE +/- 0.03, N = 3; MIN: 24.87 / MAX: 29.03)
NVIDIA RTX 4070: 23.26 (SE +/- 0.01, N = 3; MIN: 20.92 / MAX: 24.3)
NVIDIA RTX 4070 TI: 27.71 (SE +/- 0.07, N = 3; MIN: 25.01 / MAX: 29.15)
NVIDIA RTX 3090: 33.29 (SE +/- 0.36, N = 5; MIN: 30.4 / MAX: 36.21)

ViennaCL

Test: OpenCL BLAS - dCOPY

ViennaCL 1.7.1 (OpenBenchmarking.org): GB/s, More Is Better
NVIDIA RTX 4070 SUPER: 423 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070: 423 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI: 424 (SE +/- 0.00, N = 3)
NVIDIA RTX 3090: 605 (SE +/- 0.58, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

RealSR-NCNN

Scale: 4x - TAA: Yes

RealSR-NCNN 20200818 (OpenBenchmarking.org): Seconds, Fewer Is Better
NVIDIA RTX 4070 SUPER: 34.89 (SE +/- 0.02, N = 3)
NVIDIA RTX 4070: 42.85 (SE +/- 0.23, N = 3)
NVIDIA RTX 4070 TI: 33.63 (SE +/- 0.03, N = 3)
NVIDIA RTX 3090: 30.31 (SE +/- 0.06, N = 3)

ProjectPhysX OpenCL-Benchmark

Operation: INT64 Compute

ProjectPhysX OpenCL-Benchmark 1.2 (OpenBenchmarking.org): TIOPs/s, More Is Better
NVIDIA RTX 4070 SUPER: 4.214 (SE +/- 0.015, N = 3)
NVIDIA RTX 4070: 3.443 (SE +/- 0.004, N = 3)
NVIDIA RTX 4070 TI: 4.420 (SE +/- 0.016, N = 3)
NVIDIA RTX 3090: 3.135 (SE +/- 0.003, N = 3)
1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

VkFFT

Test: FFT + iFFT C2C 1D batched in double precision

VkFFT 1.2.31 (OpenBenchmarking.org): Benchmark Score, More Is Better
NVIDIA RTX 4070 SUPER: 24317 (SE +/- 146.69, N = 3)
NVIDIA RTX 4070: 22390 (SE +/- 125.94, N = 3)
NVIDIA RTX 4070 TI: 25431 (SE +/- 302.46, N = 3)
NVIDIA RTX 3090: 30912 (SE +/- 50.66, N = 3)
1. (CXX) g++ options: -O3 -lrt

clpeak

OpenCL Test: Integer Compute INT

clpeak 1.1.2 (OpenBenchmarking.org): GIOPS, More Is Better
NVIDIA RTX 4070 SUPER: 18170.54 (SE +/- 3.14, N = 3)
NVIDIA RTX 4070: 14555.19 (SE +/- 15.26, N = 3)
NVIDIA RTX 4070 TI: 19821.10 (SE +/- 2.50, N = 3)
NVIDIA RTX 3090: 17923.33 (SE +/- 16.49, N = 3)
1. (CXX) g++ options: -O3

clpeak

OpenCL Test: Single-Precision Float

clpeak 1.1.2 (OpenBenchmarking.org): GFLOPS, More Is Better
NVIDIA RTX 4070 SUPER: 35492.69 (SE +/- 0.99, N = 3)
NVIDIA RTX 4070: 28479.39 (SE +/- 5.46, N = 3)
NVIDIA RTX 4070 TI: 38691.73 (SE +/- 11.67, N = 3)
NVIDIA RTX 3090: 34906.79 (SE +/- 113.39, N = 3)
1. (CXX) g++ options: -O3

NeatBench

Acceleration: GPU

NeatBench 5 (OpenBenchmarking.org): FPS, More Is Better
NVIDIA RTX 4070 SUPER: 4070 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 4070 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI: 4070 (SE +/- 0.00, N = 3)
NVIDIA RTX 3090: 3090 (SE +/- 0.00, N = 3)

Hashcat

Benchmark: MD5

Hashcat 6.2.4 (OpenBenchmarking.org): H/s, More Is Better
NVIDIA RTX 4070 SUPER: 67583033333 (SE +/- 22430807.19, N = 3)
NVIDIA RTX 4070: 56147866667 (SE +/- 33772046.30, N = 3)
NVIDIA RTX 4070 TI: 73312233333 (SE +/- 11283665.68, N = 3)
NVIDIA RTX 3090: 67177300000 (SE +/- 53667246.37, N = 3)

Hashcat

Benchmark: TrueCrypt RIPEMD160 + XTS

Hashcat 6.2.4 (OpenBenchmarking.org): H/s, More Is Better
NVIDIA RTX 4070 SUPER: 802967 (SE +/- 633.33, N = 3)
NVIDIA RTX 4070: 660967 (SE +/- 176.38, N = 3)
NVIDIA RTX 4070 TI: 858600 (SE +/- 888.82, N = 3)
NVIDIA RTX 3090: 797833 (SE +/- 1757.21, N = 3)

ProjectPhysX OpenCL-Benchmark

Operation: INT8 Compute

ProjectPhysX OpenCL-Benchmark 1.2 (OpenBenchmarking.org): TIOPs/s, More Is Better
NVIDIA RTX 4070 SUPER: 14.31 (SE +/- 0.05, N = 3)
NVIDIA RTX 4070: 12.12 (SE +/- 0.02, N = 3)
NVIDIA RTX 4070 TI: 15.73 (SE +/- 0.03, N = 3)
NVIDIA RTX 3090: 13.73 (SE +/- 0.07, N = 3)
1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

Hashcat

Benchmark: SHA-512

Hashcat 6.2.4 (OpenBenchmarking.org): H/s, More Is Better
NVIDIA RTX 4070 SUPER: 3232733333 (SE +/- 1530068.99, N = 3)
NVIDIA RTX 4070: 2673300000 (SE +/- 1059874.21, N = 3)
NVIDIA RTX 4070 TI: 3462500000 (SE +/- 721110.26, N = 3)
NVIDIA RTX 3090: 3081866667 (SE +/- 3288532.26, N = 3)

GpuOwl

Exponent: 332220523

GpuOwl 7.2.1 (OpenBenchmarking.org): Iterations / Second, More Is Better
NVIDIA RTX 4070 SUPER: 137.44 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 112.61 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI: 145.84 (SE +/- 0.00, N = 3)
NVIDIA RTX 3090: 137.32 (SE +/- 0.00, N = 3)

clpeak

OpenCL Test: Double-Precision Double

clpeak 1.1.2 (OpenBenchmarking.org): GFLOPS, More Is Better
NVIDIA RTX 4070 SUPER: 630.11 (SE +/- 0.98, N = 3)
NVIDIA RTX 4070: 515.17 (SE +/- 0.21, N = 3)
NVIDIA RTX 4070 TI: 667.05 (SE +/- 1.33, N = 3)
NVIDIA RTX 3090: 642.23 (SE +/- 1.63, N = 3)
1. (CXX) g++ options: -O3

ProjectPhysX OpenCL-Benchmark

Operation: FP64 Compute

ProjectPhysX OpenCL-Benchmark 1.2 (OpenBenchmarking.org): TFLOPs/s, More Is Better
NVIDIA RTX 4070 SUPER: 0.621 (SE +/- 0.000, N = 3)
NVIDIA RTX 4070: 0.510 (SE +/- 0.001, N = 3)
NVIDIA RTX 4070 TI: 0.660 (SE +/- 0.001, N = 3)
NVIDIA RTX 3090: 0.637 (SE +/- 0.001, N = 3)
1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

Hashcat

Benchmark: SHA1

Hashcat 6.2.4 (OpenBenchmarking.org): H/s, More Is Better
NVIDIA RTX 4070 SUPER: 22132600000 (SE +/- 5140363.15, N = 3)
NVIDIA RTX 4070: 18202466667 (SE +/- 6318315.53, N = 3)
NVIDIA RTX 4070 TI: 23532400000 (SE +/- 15926811.78, N = 3)
NVIDIA RTX 3090: 21323733333 (SE +/- 26244639.66, N = 3)

Hashcat

Benchmark: 7-Zip

Hashcat 6.2.4 (OpenBenchmarking.org): H/s, More Is Better
NVIDIA RTX 4070 SUPER: 1176467 (SE +/- 1991.93, N = 3)
NVIDIA RTX 4070: 976967 (SE +/- 2062.63, N = 3)
NVIDIA RTX 4070 TI: 1262633 (SE +/- 2339.04, N = 3)
NVIDIA RTX 3090: 1056000 (SE +/- 1587.45, N = 3)

ViennaCL

Test: OpenCL BLAS - dGEMM-TT

ViennaCL 1.7.1 (OpenBenchmarking.org): GFLOPs/s, More Is Better
NVIDIA RTX 4070 SUPER: 613
NVIDIA RTX 4070: 502
NVIDIA RTX 4070 TI: 648
NVIDIA RTX 3090: 593
SE values as exported (three for four results; the export does not show which run lacks one): SE +/- 0.00, N = 3; SE +/- 0.33, N = 3; SE +/- 0.33, N = 3
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

VkResample

Upscale: 2x - Precision: Double

VkResample 1.0 (OpenBenchmarking.org): ms, Fewer Is Better
NVIDIA RTX 4070 SUPER: 339.59 (SE +/- 0.30, N = 3)
NVIDIA RTX 4070: 415.16 (SE +/- 0.77, N = 3)
NVIDIA RTX 4070 TI: 322.06 (SE +/- 0.35, N = 3)
NVIDIA RTX 3090: 333.64 (SE +/- 0.30, N = 3)
1. (CXX) g++ options: -O3

ProjectPhysX OpenCL-Benchmark

Operation: FP32 Compute

ProjectPhysX OpenCL-Benchmark 1.2 (OpenBenchmarking.org): TFLOPs/s, More Is Better
NVIDIA RTX 4070 SUPER: 38.59 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070: 31.77 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070 TI: 40.91 (SE +/- 0.00, N = 3)
NVIDIA RTX 3090: 39.40 (SE +/- 0.10, N = 3)
1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

GpuOwl

Exponent: 57885161

GpuOwl 7.2.1 (OpenBenchmarking.org): Iterations / Second, More Is Better
NVIDIA RTX 4070 SUPER: 869.07 (SE +/- 1.26, N = 3)
NVIDIA RTX 4070: 714.80 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI: 919.13 (SE +/- 2.53, N = 3)
NVIDIA RTX 3090: 866.31 (SE +/- 2.01, N = 3)

ProjectPhysX OpenCL-Benchmark

Operation: INT32 Compute

ProjectPhysX OpenCL-Benchmark 1.2 (OpenBenchmarking.org): TIOPs/s, More Is Better
NVIDIA RTX 4070 SUPER: 19.89 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 16.38 (SE +/- 0.02, N = 3)
NVIDIA RTX 4070 TI: 21.05 (SE +/- 0.04, N = 3)
NVIDIA RTX 3090: 20.03 (SE +/- 0.06, N = 3)
1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMM-TN

ViennaCL 1.7.1 (OpenBenchmarking.org): GFLOPs/s, More Is Better
NVIDIA RTX 4070 SUPER: 599 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 494 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070 TI: 634 (SE +/- 0.67, N = 3)
NVIDIA RTX 3090: 594 (SE +/- 2.03, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMM-NT

ViennaCL 1.7.1 (OpenBenchmarking.org): GFLOPs/s, More Is Better
NVIDIA RTX 4070 SUPER: 584 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 477 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070 TI: 612 (SE +/- 0.33, N = 3)
NVIDIA RTX 3090: 595 (SE +/- 2.33, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - sAXPY

ViennaCL 1.7.1 (OpenBenchmarking.org): GB/s, More Is Better
NVIDIA RTX 4070 SUPER: 392 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 389 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI: 393 (SE +/- 0.00, N = 3)
NVIDIA RTX 3090: 498 (SE +/- 0.58, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ProjectPhysX OpenCL-Benchmark

Operation: INT16 Compute

ProjectPhysX OpenCL-Benchmark 1.2 (OpenBenchmarking.org): TIOPs/s, More Is Better
NVIDIA RTX 4070 SUPER: 17.17 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 14.28 (SE +/- 0.02, N = 3)
NVIDIA RTX 4070 TI: 18.28 (SE +/- 0.02, N = 3)
NVIDIA RTX 3090: 17.00 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

MandelGPU

OpenCL Device: GPU

MandelGPU 1.3pts1 (OpenBenchmarking.org): Samples/sec, More Is Better
NVIDIA RTX 4070 SUPER: 587219538.2 (SE +/- 467034.80, N = 3)
NVIDIA RTX 4070: 516770131.2 (SE +/- 1783157.89, N = 3)
NVIDIA RTX 4070 TI: 619106132.5 (SE +/- 1202791.77, N = 3)
NVIDIA RTX 3090: 484098913.8 (SE +/- 794770.01, N = 3)
1. (CC) gcc options: -O3 -lm -ftree-vectorize -funroll-loops -lglut -lOpenCL -lGL

ViennaCL

Test: OpenCL BLAS - dGEMM-NN

ViennaCL 1.7.1 (OpenBenchmarking.org): GFLOPs/s, More Is Better
NVIDIA RTX 4070 SUPER: 577 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 473 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070 TI: 604 (SE +/- 0.33, N = 3)
NVIDIA RTX 3090: 592 (SE +/- 2.31, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

GpuOwl

Exponent: 77936867

GpuOwl 7.2.1 (OpenBenchmarking.org): Iterations / Second, More Is Better
NVIDIA RTX 4070 SUPER: 646.41 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 530.32 (SE +/- 0.09, N = 3)
NVIDIA RTX 4070 TI: 676.59 (SE +/- 0.00, N = 3)
NVIDIA RTX 3090: 645.99 (SE +/- 0.00, N = 3)

Libplacebo

Test: deband_heavy

Libplacebo 5.229.1 (OpenBenchmarking.org): FPS, More Is Better
NVIDIA RTX 4070 SUPER: 2186.70 (SE +/- 2.26, N = 3)
NVIDIA RTX 4070: 1847.98 (SE +/- 0.08, N = 3)
NVIDIA RTX 4070 TI: 2306.67 (SE +/- 0.56, N = 3)
NVIDIA RTX 3090: 2017.75 (SE +/- 4.93, N = 3)
1. (CXX) g++ options: -lm -pthread -ldl -fvisibility=hidden -std=c++20 -O2 -fno-math-errno -fPIC -MD -MQ -MF

Libplacebo

Test: polar_nocompute

Libplacebo 5.229.1 (OpenBenchmarking.org): FPS, More Is Better
NVIDIA RTX 4070 SUPER: 2327.55 (SE +/- 0.24, N = 3)
NVIDIA RTX 4070: 1972.78 (SE +/- 0.16, N = 3)
NVIDIA RTX 4070 TI: 2461.23 (SE +/- 0.26, N = 3)
NVIDIA RTX 3090: 2119.89 (SE +/- 7.22, N = 3)
1. (CXX) g++ options: -lm -pthread -ldl -fvisibility=hidden -std=c++20 -O2 -fno-math-errno -fPIC -MD -MQ -MF

Rodinia

Test: OpenCL Particle Filter

Rodinia 3.1 (OpenBenchmarking.org): Seconds, Fewer Is Better
NVIDIA RTX 4070 SUPER: 3.480 (SE +/- 0.039, N = 4)
NVIDIA RTX 4070: 4.098 (SE +/- 0.008, N = 3)
NVIDIA RTX 4070 TI: 3.291 (SE +/- 0.002, N = 3)
NVIDIA RTX 3090: 3.844 (SE +/- 0.030, N = 15)
1. (CXX) g++ options: -O2 -lOpenCL

Blender

Blend File: Classroom - Compute: NVIDIA OptiX

Blender 4.0 (OpenBenchmarking.org): Seconds, Fewer Is Better
NVIDIA RTX 4070 SUPER: 12.60 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 14.86 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070 TI: 12.30 (SE +/- 0.03, N = 3)
NVIDIA RTX 3090: 15.26 (SE +/- 0.02, N = 3)

Blender

Blend File: Pabellon Barcelona - Compute: NVIDIA OptiX

Blender 4.0 (OpenBenchmarking.org): Seconds, Fewer Is Better
NVIDIA RTX 4070 SUPER: 14.29 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070: 16.55 (SE +/- 0.01, N = 3)
NVIDIA RTX 4070 TI: 13.97 (SE +/- 0.01, N = 3)
NVIDIA RTX 3090: 17.30 (SE +/- 0.02, N = 3)

LuxCoreRender

Scene: Danish Mood - Acceleration: GPU

LuxCoreRender 2.6 (OpenBenchmarking.org): M samples/sec, More Is Better
NVIDIA RTX 4070 SUPER: 10.56 (SE +/- 0.08, N = 3; MIN: 3.7 / MAX: 12.17)
NVIDIA RTX 4070: 8.89 (SE +/- 0.06, N = 3; MIN: 3.32 / MAX: 10.26)
NVIDIA RTX 4070 TI: 10.99 (SE +/- 0.11, N = 3; MIN: 4.17 / MAX: 12.71)
NVIDIA RTX 3090: 10.20 (SE +/- 0.04, N = 3; MIN: 4.07 / MAX: 11.93)

Blender

Blend File: Fishy Cat - Compute: NVIDIA OptiX

Blender 4.0 (OpenBenchmarking.org): Seconds, Fewer Is Better
NVIDIA RTX 4070 SUPER: 9.45 (SE +/- 0.06, N = 13)
NVIDIA RTX 4070: 11.03 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070 TI: 9.02 (SE +/- 0.01, N = 3)
NVIDIA RTX 3090: 10.64 (SE +/- 0.08, N = 9)

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 256 - Model: ResNet-152

PyTorch 2.1 (OpenBenchmarking.org): batches/sec, More Is Better
NVIDIA RTX 4070 SUPER: 194.58 (MIN: 183.74 / MAX: 198.52)
NVIDIA RTX 4070: 187.27 (MIN: 179.9 / MAX: 188.08)
NVIDIA RTX 4070 TI: 195.86 (MIN: 181.64 / MAX: 199.2)
NVIDIA RTX 3090: 161.01 (MIN: 138.12 / MAX: 165.16)
SE values as exported (three for four results; the export does not show which run lacks one): SE +/- 1.14, N = 2; SE +/- 0.17, N = 3; SE +/- 0.19, N = 2

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 32 - Model: ResNet-152

PyTorch 2.1 (OpenBenchmarking.org): batches/sec, More Is Better
NVIDIA RTX 4070 SUPER: 195.39 (MIN: 183.94 / MAX: 198.7)
NVIDIA RTX 4070: 187.69 (MIN: 182.03 / MAX: 188.31)
NVIDIA RTX 4070 TI: 198.82 (MIN: 188.33 / MAX: 201.47)
NVIDIA RTX 3090: 163.74 (MIN: 144.93 / MAX: 165.03)
SE value as exported (one for four results; the export does not show which run it belongs to): SE +/- 0.29, N = 3

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 16 - Model: ResNet-50

PyTorch 2.1 (OpenBenchmarking.org): batches/sec, More Is Better
NVIDIA RTX 4070 SUPER: 509.45 (MIN: 430.1 / MAX: 516.48)
NVIDIA RTX 4070: 458.39 (MIN: 404.5 / MAX: 461.01)
NVIDIA RTX 4070 TI: 502.92 (MIN: 415.65 / MAX: 520.39)
NVIDIA RTX 3090: 419.76 (MIN: 376.2 / MAX: 422.17)
SE values as exported (three for four results; the export does not show which run lacks one): SE +/- 0.26, N = 3; SE +/- 2.23, N = 3; SE +/- 0.89, N = 2

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 512 - Model: ResNet-50

PyTorch 2.1 (OpenBenchmarking.org): batches/sec, More Is Better
NVIDIA RTX 4070 SUPER: 504.27 (SE +/- 4.43, N = 2; MIN: 418.22 / MAX: 512.44)
NVIDIA RTX 4070: 459.27 (SE +/- 0.43, N = 2; MIN: 405.48 / MAX: 461.88)
NVIDIA RTX 4070 TI: 504.66 (SE +/- 0.83, N = 2; MIN: 424.27 / MAX: 509.08)
NVIDIA RTX 3090: 416.20 (SE +/- 0.40, N = 3; MIN: 355.45 / MAX: 419.05)

LuxCoreRender

Scene: LuxCore Benchmark - Acceleration: GPU

LuxCoreRender 2.6 (OpenBenchmarking.org): M samples/sec, More Is Better
NVIDIA RTX 4070 SUPER: 12.82 (SE +/- 0.02, N = 3; MIN: 4.84 / MAX: 14.62)
NVIDIA RTX 4070: 10.92 (SE +/- 0.01, N = 3; MIN: 4.45 / MAX: 12.42)
NVIDIA RTX 4070 TI: 13.23 (SE +/- 0.01, N = 3; MIN: 5.41 / MAX: 15.13)
NVIDIA RTX 3090: 13.12 (SE +/- 0.03, N = 2; MIN: 4.85 / MAX: 15.21)

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 64 - Model: ResNet-50

PyTorch 2.1 (OpenBenchmarking.org): batches/sec, More Is Better
NVIDIA RTX 4070 SUPER: 507.45 (SE +/- 0.92, N = 3; MIN: 423.41 / MAX: 512.88)
NVIDIA RTX 4070: 458.36 (SE +/- 0.27, N = 3; MIN: 404.89 / MAX: 461.01)
NVIDIA RTX 4070 TI: 505.62 (SE +/- 1.92, N = 3; MIN: 426.6 / MAX: 513.25)
NVIDIA RTX 3090: 419.03 (SE +/- 0.24, N = 3; MIN: 376 / MAX: 422)

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 256 - Model: ResNet-50

OpenBenchmarking.org - PyTorch 2.1 - batches/sec, More Is Better
NVIDIA RTX 4070 SUPER: 504.67 (MIN: 412.34 / MAX: 514.07)
NVIDIA RTX 4070: 459.93 (MIN: 403.65 / MAX: 462.74)
NVIDIA RTX 3090: 416.89 (MIN: 329.77 / MAX: 420.82)
SE +/- 1.39, N = 3; SE +/- 0.34, N = 3; SE +/- 0.14, N = 3

FAHBench

OpenBenchmarking.org - FAHBench 2.3.2 - Ns Per Day, More Is Better
NVIDIA RTX 4070 SUPER: 366.06
NVIDIA RTX 4070: 317.20
NVIDIA RTX 4070 TI: 382.16
NVIDIA RTX 3090: 343.02
SE +/- 0.39, N = 3; SE +/- 0.12, N = 3; SE +/- 0.26, N = 3; SE +/- 0.26, N = 3

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 32 - Model: ResNet-50

OpenBenchmarking.org - PyTorch 2.1 - batches/sec, More Is Better
NVIDIA RTX 4070 SUPER: 501.50 (MIN: 415.94 / MAX: 510.69)
NVIDIA RTX 4070: 459.94 (MIN: 403.65 / MAX: 462.59)
NVIDIA RTX 4070 TI: 505.55 (MIN: 419.93 / MAX: 512.69)
NVIDIA RTX 3090: 420.29 (MIN: 376.81 / MAX: 421.58)
SE +/- 2.17, N = 2; SE +/- 0.13, N = 2; SE +/- 1.69, N = 3

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 64 - Model: ResNet-152

OpenBenchmarking.org - PyTorch 2.1 - batches/sec, More Is Better
NVIDIA RTX 4070 SUPER: 196.07 (MIN: 171.95 / MAX: 199.96)
NVIDIA RTX 4070: 186.63 (MIN: 180.51 / MAX: 187.79)
NVIDIA RTX 4070 TI: 197.02 (MIN: 183.92 / MAX: 200.54)
NVIDIA RTX 3090: 164.14 (MIN: 149 / MAX: 165)
SE +/- 0.51, N = 3; SE +/- 0.34, N = 3; SE +/- 0.78, N = 2

Libplacebo

Test: hdr_lut

OpenBenchmarking.org - Libplacebo 5.229.1 - FPS, More Is Better
NVIDIA RTX 4070 SUPER: 3905.98
NVIDIA RTX 4070: 3927.11
NVIDIA RTX 4070 TI: 3971.61
NVIDIA RTX 3090: 3313.26
SE +/- 12.09, N = 3; SE +/- 10.06, N = 3; SE +/- 5.47, N = 3; SE +/- 13.62, N = 3
1. (CXX) g++ options: -lm -pthread -ldl -fvisibility=hidden -std=c++20 -O2 -fno-math-errno -fPIC -MD -MQ -MF

VkFFT

Test: FFT + iFFT C2C Bluestein benchmark in double precision

OpenBenchmarking.org - VkFFT 1.2.31 - Benchmark Score, More Is Better
NVIDIA RTX 4070 SUPER: 4451
NVIDIA RTX 4070: 3886
NVIDIA RTX 4070 TI: 4647
NVIDIA RTX 3090: 4195
SE +/- 12.55, N = 3; SE +/- 4.51, N = 3; SE +/- 11.35, N = 3; SE +/- 9.84, N = 3
1. (CXX) g++ options: -O3 -lrt

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 16 - Model: ResNet-152

OpenBenchmarking.org - PyTorch 2.1 - batches/sec, More Is Better
NVIDIA RTX 4070 SUPER: 195.40 (MIN: 186.09 / MAX: 197.7)
NVIDIA RTX 4070: 187.26 (MIN: 179.81 / MAX: 188.21)
NVIDIA RTX 4070 TI: 194.29 (MIN: 182.25 / MAX: 197.39)
NVIDIA RTX 3090: 164.14 (MIN: 145.67 / MAX: 165.38)
SE +/- 0.29, N = 3

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 512 - Model: ResNet-152

OpenBenchmarking.org - PyTorch 2.1 - batches/sec, More Is Better
NVIDIA RTX 4070 SUPER: 195.30 (MIN: 182 / MAX: 199.43)
NVIDIA RTX 4070: 187.51 (MIN: 181.57 / MAX: 188.05)
NVIDIA RTX 4070 TI: 194.87 (MIN: 180.8 / MAX: 198)
NVIDIA RTX 3090: 164.35 (MIN: 149.91 / MAX: 166.09)
SE +/- 1.38, N = 2; SE +/- 0.05, N = 3; SE +/- 0.33, N = 2

VkFFT

Test: FFT + iFFT R2C / C2R

OpenBenchmarking.org - VkFFT 1.2.31 - Benchmark Score, More Is Better
NVIDIA RTX 4070 SUPER: 54794
NVIDIA RTX 4070: 47097
NVIDIA RTX 4070 TI: 55446
NVIDIA RTX 3090: 48418
SE +/- 702.53, N = 15; SE +/- 745.02, N = 13; SE +/- 520.37, N = 3; SE +/- 320.62, N = 3
1. (CXX) g++ options: -O3 -lrt

LuxCoreRender

Scene: Orange Juice - Acceleration: GPU

OpenBenchmarking.org - LuxCoreRender 2.6 - M samples/sec, More Is Better
NVIDIA RTX 4070 SUPER: 11.72 (MIN: 9.6 / MAX: 15.44)
NVIDIA RTX 4070: 10.40 (MIN: 8.31 / MAX: 13.9)
NVIDIA RTX 4070 TI: 11.89 (MIN: 9.85 / MAX: 15.88)
NVIDIA RTX 3090: 12.14 (MIN: 10.24 / MAX: 16.71)
SE +/- 0.07, N = 3; SE +/- 0.03, N = 3; SE +/- 0.00, N = 3; SE +/- 0.01, N = 3

Blender

Blend File: BMW27 - Compute: NVIDIA OptiX

OpenBenchmarking.org - Blender 4.0 - Seconds, Fewer Is Better
NVIDIA RTX 4070 SUPER: 5.57
NVIDIA RTX 4070: 6.21
NVIDIA RTX 4070 TI: 5.43
NVIDIA RTX 3090: 6.31
SE +/- 0.06, N = 13; SE +/- 0.01, N = 3; SE +/- 0.02, N = 3; SE +/- 0.06, N = 14

Blender

Blend File: Barbershop - Compute: NVIDIA OptiX

OpenBenchmarking.org - Blender 4.0 - Seconds, Fewer Is Better
NVIDIA RTX 4070 SUPER: 51.30
NVIDIA RTX 4070: 58.44
NVIDIA RTX 4070 TI: 50.73
NVIDIA RTX 3090: 54.30
SE +/- 0.10, N = 3; SE +/- 0.04, N = 3; SE +/- 0.05, N = 3; SE +/- 0.02, N = 2

IndigoBench

Acceleration: OpenCL GPU - Scene: Bedroom

OpenBenchmarking.org - IndigoBench 4.4 - M samples/s, More Is Better
NVIDIA RTX 4070 SUPER: 19.80
NVIDIA RTX 4070: 18.20
NVIDIA RTX 4070 TI: 20.26
NVIDIA RTX 3090: 20.96
SE +/- 0.01, N = 3; SE +/- 0.01, N = 3; SE +/- 0.03, N = 3; SE +/- 0.01, N = 3

OctaneBench

Total Score

OpenBenchmarking.org - OctaneBench 2020.1 - Score, More Is Better
NVIDIA RTX 4070 SUPER: 720.97
NVIDIA RTX 4070: 648.00
NVIDIA RTX 4070 TI: 735.94
NVIDIA RTX 3090: 674.25

ViennaCL

Test: OpenCL BLAS - dGEMV-N

OpenBenchmarking.org - ViennaCL 1.7.1 - GB/s, More Is Better
NVIDIA RTX 4070 SUPER: 210
NVIDIA RTX 4070: 209
NVIDIA RTX 4070 TI: 211
NVIDIA RTX 3090: 187
SE +/- 0.33, N = 3; SE +/- 0.33, N = 3; SE +/- 0.33, N = 3; SE +/- 0.33, N = 3
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

Waifu2x-NCNN Vulkan

Scale: 2x - Denoise: 3 - TAA: Yes

OpenBenchmarking.org - Waifu2x-NCNN Vulkan 20200818 - Seconds, Fewer Is Better
NVIDIA RTX 4070 SUPER: 2.855
NVIDIA RTX 4070: 3.168
NVIDIA RTX 4070 TI: 2.854
NVIDIA RTX 3090: 3.202
SE +/- 0.014, N = 3; SE +/- 0.028, N = 3; SE +/- 0.009, N = 3; SE +/- 0.011, N = 3

VkFFT

Test: FFT + iFFT C2C Bluestein in single precision

OpenBenchmarking.org - VkFFT 1.2.31 - Benchmark Score, More Is Better
NVIDIA RTX 4070 SUPER: 15166
NVIDIA RTX 4070: 13714
NVIDIA RTX 4070 TI: 15125
NVIDIA RTX 3090: 14205
SE +/- 102.52, N = 3; SE +/- 52.09, N = 3; SE +/- 118.41, N = 3; SE +/- 115.62, N = 3
1. (CXX) g++ options: -O3 -lrt

IndigoBench

Acceleration: OpenCL GPU - Scene: Supercar

OpenBenchmarking.org - IndigoBench 4.4 - M samples/s, More Is Better
NVIDIA RTX 4070 SUPER: 52.81
NVIDIA RTX 4070: 48.52
NVIDIA RTX 4070 TI: 53.59
NVIDIA RTX 3090: 52.01
SE +/- 0.03, N = 3; SE +/- 0.03, N = 3; SE +/- 0.02, N = 3; SE +/- 0.03, N = 3

ViennaCL

Test: OpenCL BLAS - sCOPY

OpenBenchmarking.org - ViennaCL 1.7.1 - GB/s, More Is Better
NVIDIA RTX 4070 SUPER: 334
NVIDIA RTX 4070: 330
NVIDIA RTX 4070 TI: 336
NVIDIA RTX 3090: 363
SE +/- 0.33, N = 3; SE +/- 0.33, N = 3; SE +/- 0.00, N = 3; SE +/- 1.00, N = 3
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dGEMM-TT

OpenBenchmarking.org - ViennaCL 1.7.1 - GFLOPs/s, More Is Better
NVIDIA RTX 4070 SUPER: 122
NVIDIA RTX 4070: 118
NVIDIA RTX 4070 TI: 124
NVIDIA RTX 3090: 113
SE +/- 2.08, N = 3; SE +/- 1.20, N = 3; SE +/- 2.08, N = 3; SE +/- 0.88, N = 3
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

cl-mem

Benchmark: Copy

OpenBenchmarking.org - cl-mem 2017-01-13 - GB/s, More Is Better
NVIDIA RTX 4070 SUPER: 331.8
NVIDIA RTX 4070: 330.3
NVIDIA RTX 4070 TI: 333.3
NVIDIA RTX 3090: 360.8
SE +/- 0.03, N = 3; SE +/- 0.09, N = 3; SE +/- 0.00, N = 3; SE +/- 0.22, N = 3
1. (CC) gcc options: -O2 -flto -lOpenCL

VkFFT

Test: FFT + iFFT C2C multidimensional in single precision

OpenBenchmarking.org - VkFFT 1.2.31 - Benchmark Score, More Is Better
NVIDIA RTX 4070 SUPER: 50299
NVIDIA RTX 4070: 47212
NVIDIA RTX 4070 TI: 51528
NVIDIA RTX 3090: 50856
SE +/- 407.19, N = 15; SE +/- 476.57, N = 5; SE +/- 417.77, N = 15; SE +/- 407.28, N = 15
1. (CXX) g++ options: -O3 -lrt

ViennaCL

Test: CPU BLAS - dGEMM-TN

OpenBenchmarking.org - ViennaCL 1.7.1 - GFLOPs/s, More Is Better
NVIDIA RTX 4070 SUPER: 115
NVIDIA RTX 4070: 121
NVIDIA RTX 4070 TI: 125
NVIDIA RTX 3090: 121
SE +/- 1.00, N = 2; SE +/- 2.31, N = 3; SE +/- 2.08, N = 3; SE +/- 2.08, N = 3
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dGEMM-NN

OpenBenchmarking.org - ViennaCL 1.7.1 - GFLOPs/s, More Is Better
NVIDIA RTX 4070 SUPER: 119
NVIDIA RTX 4070: 122
NVIDIA RTX 4070 TI: 117
NVIDIA RTX 3090: 113
SE +/- 4.04, N = 3; SE +/- 1.86, N = 3; SE +/- 1.15, N = 3; SE +/- 1.86, N = 3
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

TensorFlow

Device: GPU - Batch Size: 1 - Model: AlexNet

OpenBenchmarking.org - TensorFlow 2.12 - images/sec, More Is Better
NVIDIA RTX 4070 SUPER: 13.92
NVIDIA RTX 4070: 14.04
NVIDIA RTX 4070 TI: 14.79
NVIDIA RTX 3090: 14.45
SE +/- 0.22, N = 2; SE +/- 0.16, N = 3; SE +/- 0.06, N = 2; SE +/- 0.20, N = 15

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 16 - Model: Efficientnet_v2_l

OpenBenchmarking.org - PyTorch 2.1 - batches/sec, More Is Better
NVIDIA RTX 4070: 103.68 (MIN: 96.86 / MAX: 105.56)
NVIDIA RTX 4070 TI: 103.45 (MIN: 95.22 / MAX: 105.88)
NVIDIA RTX 3090: 98.11 (MIN: 89.88 / MAX: 100.25)
SE +/- 0.52, N = 2; SE +/- 0.53, N = 2

ViennaCL

Test: OpenCL BLAS - dGEMV-T

OpenBenchmarking.org - ViennaCL 1.7.1 - GB/s, More Is Better
NVIDIA RTX 4070 SUPER: 389
NVIDIA RTX 4070: 387
NVIDIA RTX 4070 TI: 391
NVIDIA RTX 3090: 374
SE +/- 0.00, N = 3; SE +/- 0.00, N = 3; SE +/- 0.33, N = 3
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 512 - Model: Efficientnet_v2_l

OpenBenchmarking.org - PyTorch 2.1 - batches/sec, More Is Better
NVIDIA RTX 4070 SUPER: 103.57 (MIN: 95.95 / MAX: 105.54)
NVIDIA RTX 4070: 101.43 (MIN: 93.27 / MAX: 103.58)
NVIDIA RTX 4070 TI: 103.50 (MIN: 94.95 / MAX: 105.61)
NVIDIA RTX 3090: 99.25 (MIN: 91.16 / MAX: 101.18)
SE +/- 0.39, N = 3; SE +/- 0.36, N = 2; SE +/- 0.19, N = 3

ViennaCL

Test: CPU BLAS - dGEMM-NT

OpenBenchmarking.org - ViennaCL 1.7.1 - GFLOPs/s, More Is Better
NVIDIA RTX 4070 SUPER: 117
NVIDIA RTX 4070: 122
NVIDIA RTX 4070 TI: 118
NVIDIA RTX 3090: 119
SE +/- 2.08, N = 3; SE +/- 1.76, N = 3; SE +/- 1.20, N = 3; SE +/- 3.28, N = 3
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - sDOT

OpenBenchmarking.org - ViennaCL 1.7.1 - GB/s, More Is Better
NVIDIA RTX 4070 SUPER: 370
NVIDIA RTX 4070: 362
NVIDIA RTX 4070 TI: 365
NVIDIA RTX 3090: 376
SE +/- 0.00, N = 3; SE +/- 0.00, N = 3; SE +/- 0.00, N = 3; SE +/- 0.58, N = 3
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 256 - Model: Efficientnet_v2_l

OpenBenchmarking.org - PyTorch 2.1 - batches/sec, More Is Better
NVIDIA RTX 4070 SUPER: 103.17 (MIN: 95.79 / MAX: 105.15)
NVIDIA RTX 4070: 101.24 (MIN: 93.33 / MAX: 102.92)
NVIDIA RTX 4070 TI: 103.24 (MIN: 95.41 / MAX: 104.9)
NVIDIA RTX 3090: 99.43 (MIN: 90.49 / MAX: 101.97)
SE +/- 0.05, N = 2; SE +/- 0.57, N = 3

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 64 - Model: Efficientnet_v2_l

OpenBenchmarking.org - PyTorch 2.1 - batches/sec, More Is Better
NVIDIA RTX 4070 SUPER: 102.60 (MIN: 79.69 / MAX: 105.28)
NVIDIA RTX 4070: 101.55 (MIN: 93.44 / MAX: 103.08)
NVIDIA RTX 4070 TI: 103.20 (MIN: 95.31 / MAX: 105.27)
NVIDIA RTX 3090: 99.84 (MIN: 92.73 / MAX: 101.46)
SE +/- 1.49, N = 2; SE +/- 0.45, N = 3; SE +/- 0.39, N = 2; SE +/- 0.14, N = 3

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 1 - Model: Efficientnet_v2_l

OpenBenchmarking.org - PyTorch 2.1 - batches/sec, More Is Better
NVIDIA RTX 4070 SUPER: 106.37 (MIN: 97.91 / MAX: 108.16)
NVIDIA RTX 4070: 107.59 (MIN: 98.77 / MAX: 109.43)
NVIDIA RTX 4070 TI: 108.59 (MIN: 99.04 / MAX: 110.68)
NVIDIA RTX 3090: 105.55 (MIN: 91.76 / MAX: 107.42)
SE +/- 0.55, N = 3; SE +/- 0.33, N = 3

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 1 - Model: ResNet-152

OpenBenchmarking.org - PyTorch 2.1 - batches/sec, More Is Better
NVIDIA RTX 4070 SUPER: 201.94 (MIN: 183.53 / MAX: 206.5)
NVIDIA RTX 4070: 198.18 (MIN: 181.27 / MAX: 200.06)
NVIDIA RTX 4070 TI: 201.19 (MIN: 180.79 / MAX: 203.92)
NVIDIA RTX 3090: 197.12 (MIN: 137.37 / MAX: 198.9)
SE +/- 0.36, N = 3; SE +/- 0.73, N = 3; SE +/- 0.09, N = 2

TensorFlow

Device: GPU - Batch Size: 1 - Model: VGG-16

OpenBenchmarking.org - TensorFlow 2.12 - images/sec, More Is Better
NVIDIA RTX 4070 SUPER: 1.35
NVIDIA RTX 4070: 1.36
NVIDIA RTX 4070 TI: 1.38
NVIDIA RTX 3090: 1.38
SE +/- 0.01, N = 2; SE +/- 0.01, N = 3; SE +/- 0.01, N = 3

ViennaCL

Test: CPU BLAS - sAXPY

OpenBenchmarking.org - ViennaCL 1.7.1 - GB/s, More Is Better
NVIDIA RTX 4070 SUPER: 156
NVIDIA RTX 4070: 153
NVIDIA RTX 4070 TI: 156
NVIDIA RTX 3090: 154
SE +/- 2.19, N = 3; SE +/- 4.81, N = 3; SE +/- 2.00, N = 3; SE +/- 0.33, N = 3
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

Libplacebo

Test: av1_grain_lap

OpenBenchmarking.org - Libplacebo 5.229.1 - FPS, More Is Better
NVIDIA RTX 4070 SUPER: 4171.00
NVIDIA RTX 4070: 4103.40
NVIDIA RTX 4070 TI: 4140.87
NVIDIA RTX 3090: 4126.89
SE +/- 5.52, N = 3; SE +/- 66.69, N = 3; SE +/- 21.66, N = 3; SE +/- 12.99, N = 3
1. (CXX) g++ options: -lm -pthread -ldl -fvisibility=hidden -std=c++20 -O2 -fno-math-errno -fPIC -MD -MQ -MF

TensorFlow

Device: GPU - Batch Size: 16 - Model: AlexNet

OpenBenchmarking.org - TensorFlow 2.12 - images/sec, More Is Better
NVIDIA RTX 4070 SUPER: 31.59
NVIDIA RTX 4070: 31.45
NVIDIA RTX 4070 TI: 31.70
NVIDIA RTX 3090: 31.98
SE +/- 0.17, N = 3; SE +/- 0.08, N = 3; SE +/- 0.07, N = 3

ViennaCL

Test: CPU BLAS - dDOT

OpenBenchmarking.org - ViennaCL 1.7.1 - GB/s, More Is Better
NVIDIA RTX 4070 SUPER: 96.8
NVIDIA RTX 4070: 96.7
NVIDIA RTX 4070 TI: 96.4
NVIDIA RTX 3090: 95.2
SE +/- 0.09, N = 3; SE +/- 0.22, N = 3; SE +/- 0.58, N = 3; SE +/- 0.84, N = 3
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

TensorFlow

Device: GPU - Batch Size: 1 - Model: GoogLeNet

OpenBenchmarking.org - TensorFlow 2.12 - images/sec, More Is Better
NVIDIA RTX 4070 SUPER: 12.62
NVIDIA RTX 4070: 12.78
NVIDIA RTX 4070 TI: 12.79
NVIDIA RTX 3090: 12.82
SE +/- 0.17, N = 2; SE +/- 0.10, N = 3; SE +/- 0.30, N = 2; SE +/- 0.07, N = 3

ViennaCL

Test: CPU BLAS - dCOPY

OpenBenchmarking.org - ViennaCL 1.7.1 - GB/s, More Is Better
NVIDIA RTX 4070 SUPER: 70.8
NVIDIA RTX 4070: 71.0
NVIDIA RTX 4070 TI: 71.3
NVIDIA RTX 3090: 70.2
SE +/- 0.32, N = 3; SE +/- 0.25, N = 3; SE +/- 0.74, N = 3; SE +/- 0.72, N = 3
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

TensorFlow

Device: GPU - Batch Size: 512 - Model: AlexNet

OpenBenchmarking.org - TensorFlow 2.12 - images/sec, More Is Better
NVIDIA RTX 4070 SUPER: 35.10
NVIDIA RTX 4070: 35.21
NVIDIA RTX 4070 TI: 35.44
NVIDIA RTX 3090: 35.58
SE +/- 0.02, N = 2; SE +/- 0.03, N = 3; SE +/- 0.09, N = 2; SE +/- 0.01, N = 3

TensorFlow

Device: GPU - Batch Size: 16 - Model: VGG-16

OpenBenchmarking.org - TensorFlow 2.12 - images/sec, More Is Better
NVIDIA RTX 4070 SUPER: 1.48
NVIDIA RTX 4070: 1.50
NVIDIA RTX 4070 TI: 1.49
NVIDIA RTX 3090: 1.49
SE +/- 0.00, N = 2; SE +/- 0.01, N = 2; SE +/- 0.00, N = 3

TensorFlow

Device: GPU - Batch Size: 256 - Model: AlexNet

OpenBenchmarking.org - TensorFlow 2.12 - images/sec, More Is Better
NVIDIA RTX 4070 SUPER: 34.16
NVIDIA RTX 4070 TI: 34.61
NVIDIA RTX 3090: 34.46
SE +/- 0.01, N = 3; SE +/- 0.07, N = 2; SE +/- 0.07, N = 3

TensorFlow

Device: GPU - Batch Size: 32 - Model: GoogLeNet

OpenBenchmarking.org - TensorFlow 2.12 - images/sec, More Is Better
NVIDIA RTX 4070 SUPER: 15.61
NVIDIA RTX 4070: 15.63
NVIDIA RTX 4070 TI: 15.81
NVIDIA RTX 3090: 15.67
SE +/- 0.01, N = 2; SE +/- 0.03, N = 3; SE +/- 0.03, N = 3; SE +/- 0.06, N = 3

ViennaCL

Test: CPU BLAS - dAXPY

OpenBenchmarking.org - ViennaCL 1.7.1 - GB/s, More Is Better
NVIDIA RTX 4070 SUPER: 87.2
NVIDIA RTX 4070: 86.8
NVIDIA RTX 4070 TI: 87.3
NVIDIA RTX 3090: 86.2
SE +/- 0.12, N = 3; SE +/- 0.44, N = 3; SE +/- 0.57, N = 3; SE +/- 0.94, N = 3
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

TensorFlow

Device: GPU - Batch Size: 32 - Model: ResNet-50

OpenBenchmarking.org - TensorFlow 2.12 - images/sec, More Is Better
NVIDIA RTX 4070 SUPER: 5.51
NVIDIA RTX 4070: 5.55
NVIDIA RTX 4070 TI: 5.50
NVIDIA RTX 3090: 5.57
SE +/- 0.01, N = 2; SE +/- 0.01, N = 2; SE +/- 0.02, N = 3; SE +/- 0.01, N = 3

ViennaCL

Test: CPU BLAS - dGEMV-N

OpenBenchmarking.org - ViennaCL 1.7.1 - GB/s, More Is Better
NVIDIA RTX 4070 SUPER: 102
NVIDIA RTX 4070: 103
NVIDIA RTX 4070 TI: 103
NVIDIA RTX 3090: 103
SE +/- 0.33, N = 3; SE +/- 0.33, N = 3; SE +/- 0.33, N = 3; SE +/- 0.88, N = 3
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

TensorFlow

Device: GPU - Batch Size: 64 - Model: GoogLeNet

OpenBenchmarking.org - TensorFlow 2.12 - images/sec, More Is Better
NVIDIA RTX 4070 SUPER: 15.52
NVIDIA RTX 4070: 15.54
NVIDIA RTX 4070 TI: 15.50
NVIDIA RTX 3090: 15.63
SE +/- 0.07, N = 3; SE +/- 0.06, N = 2; SE +/- 0.08, N = 3

vkpeak

fp32-vec4

OpenBenchmarking.org - vkpeak 20230730 - GFLOPS, More Is Better
NVIDIA RTX 3090: 26563.72
SE +/- 1.51, N = 3

ViennaCL

Test: CPU BLAS - sCOPY

OpenBenchmarking.org - ViennaCL 1.7.1 - GB/s, More Is Better
NVIDIA RTX 4070 SUPER: 132
NVIDIA RTX 4070: 131
NVIDIA RTX 4070 TI: 132
NVIDIA RTX 3090: 132
SE +/- 1.20, N = 3; SE +/- 1.20, N = 3; SE +/- 0.88, N = 3; SE +/- 1.20, N = 3
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

TensorFlow

Device: GPU - Batch Size: 64 - Model: ResNet-50

OpenBenchmarking.org - TensorFlow 2.12 - images/sec, More Is Better
NVIDIA RTX 4070 SUPER: 5.55
NVIDIA RTX 4070: 5.55
NVIDIA RTX 4070 TI: 5.53
NVIDIA RTX 3090: 5.57
SE +/- 0.01, N = 2; SE +/- 0.00, N = 3; SE +/- 0.01, N = 2; SE +/- 0.01, N = 3

TensorFlow

Device: GPU - Batch Size: 32 - Model: AlexNet

OpenBenchmarking.org - TensorFlow 2.12 - images/sec, More Is Better
NVIDIA RTX 4070 SUPER: 33.40
NVIDIA RTX 4070: 33.32
NVIDIA RTX 4070 TI: 33.29
NVIDIA RTX 3090: 33.53
SE +/- 0.15, N = 2; SE +/- 0.18, N = 3; SE +/- 0.04, N = 3; SE +/- 0.05, N = 3

TensorFlow

Device: GPU - Batch Size: 1 - Model: ResNet-50

OpenBenchmarking.org - TensorFlow 2.12 - images/sec, More Is Better
NVIDIA RTX 4070 SUPER: 4.35
NVIDIA RTX 4070: 4.34
NVIDIA RTX 4070 TI: 4.32
NVIDIA RTX 3090: 4.35
SE +/- 0.01, N = 3; SE +/- 0.02, N = 2; SE +/- 0.03, N = 3

TensorFlow

Device: GPU - Batch Size: 256 - Model: VGG-16

OpenBenchmarking.org - TensorFlow 2.12 - images/sec, More Is Better
NVIDIA RTX 4070 TI: 1.50
NVIDIA RTX 3090: 1.51
SE +/- 0.00, N = 3

TensorFlow

Device: GPU - Batch Size: 64 - Model: VGG-16

OpenBenchmarking.org - TensorFlow 2.12 - images/sec, More Is Better
NVIDIA RTX 4070: 1.50
NVIDIA RTX 4070 TI: 1.50
NVIDIA RTX 3090: 1.51
SE +/- 0.00, N = 3; SE +/- 0.00, N = 3; SE +/- 0.00, N = 3

TensorFlow

Device: GPU - Batch Size: 16 - Model: ResNet-50

OpenBenchmarking.org - TensorFlow 2.12 - images/sec, More Is Better
NVIDIA RTX 4070 SUPER: 5.46
NVIDIA RTX 4070: 5.49
NVIDIA RTX 4070 TI: 5.46
NVIDIA RTX 3090: 5.49
SE +/- 0.00, N = 2; SE +/- 0.02, N = 3; SE +/- 0.01, N = 3; SE +/- 0.00, N = 3

vkpeak

fp32-scalar

OpenBenchmarking.org - vkpeak 20230730 - GFLOPS, More Is Better
NVIDIA RTX 3090: 20263.13
SE +/- 36.15, N = 3

TensorFlow

Device: GPU - Batch Size: 64 - Model: AlexNet

OpenBenchmarking.org - TensorFlow 2.12 - images/sec, More Is Better
NVIDIA RTX 4070 SUPER: 33.97
NVIDIA RTX 4070: 33.93
NVIDIA RTX 4070 TI: 34.06
NVIDIA RTX 3090: 33.93
SE +/- 0.14, N = 3; SE +/- 0.06, N = 3; SE +/- 0.08, N = 3

vkpeak

int16-scalar

OpenBenchmarking.org - vkpeak 20230730 - GIOPS, More Is Better
NVIDIA RTX 3090: 13259.97
SE +/- 0.21, N = 3

vkpeak

fp16-scalar

OpenBenchmarking.org - vkpeak 20230730 - GFLOPS, More Is Better
NVIDIA RTX 3090: 20080.47
SE +/- 34.71, N = 3

vkpeak

fp16-vec4

OpenBenchmarking.org - vkpeak 20230730 - GFLOPS, More Is Better
NVIDIA RTX 3090: 39771.97
SE +/- 69.71, N = 3

vkpeak

int16-vec4

OpenBenchmarking.org - vkpeak 20230730 - GIOPS, More Is Better
NVIDIA RTX 3090: 16331.16
SE +/- 1.52, N = 3

TensorFlow

Device: GPU - Batch Size: 16 - Model: GoogLeNet

OpenBenchmarking.org - TensorFlow 2.12 - images/sec, More Is Better
NVIDIA RTX 4070 SUPER: 15.67
NVIDIA RTX 4070: 15.66
NVIDIA RTX 4070 TI: 15.69
NVIDIA RTX 3090: 15.68
SE +/- 0.03, N = 3; SE +/- 0.03, N = 3; SE +/- 0.07, N = 3; SE +/- 0.05, N = 3

vkpeak

int32-scalar

OpenBenchmarking.org - vkpeak 20230730 - GIOPS, More Is Better
NVIDIA RTX 3090: 20280.33
SE +/- 3.21, N = 3

vkpeak

fp64-vec4

OpenBenchmarking.org - vkpeak 20230730 - GFLOPS, More Is Better
NVIDIA RTX 3090: 638.72
SE +/- 0.02, N = 3

vkpeak

int32-vec4

OpenBenchmarking.org - vkpeak 20230730 - GIOPS, More Is Better
NVIDIA RTX 3090: 19996.92
SE +/- 2.34, N = 3

vkpeak

fp64-scalar

OpenBenchmarking.org - vkpeak 20230730 - GFLOPS, More Is Better
NVIDIA RTX 3090: 638.70
SE +/- 0.03, N = 3

TensorFlow

Device: GPU - Batch Size: 32 - Model: VGG-16

OpenBenchmarking.org - TensorFlow 2.12 - images/sec, More Is Better
NVIDIA RTX 4070 SUPER: 1.50
NVIDIA RTX 4070: 1.50
NVIDIA RTX 4070 TI: 1.50
NVIDIA RTX 3090: 1.50
SE +/- 0.00, N = 3; SE +/- 0.00, N = 3; SE +/- 0.00, N = 2; SE +/- 0.00, N = 3

NCNN

Target: Vulkan GPU - Model: FastestDet

OpenBenchmarking.org - NCNN 20230517 - ms, Fewer Is Better
NVIDIA RTX 4070 SUPER: 2.86 (MIN: 2.17 / MAX: 577.17)
NVIDIA RTX 4070: 2.34 (MIN: 2 / MAX: 3.86)
NVIDIA RTX 4070 TI: 2.84 (MIN: 2.4 / MAX: 5.07)
NVIDIA RTX 3090: 6.38 (MIN: 2.14 / MAX: 1476.09)
SE +/- 0.29, N = 9; SE +/- 0.10, N = 9; SE +/- 0.12, N = 8; SE +/- 2.14, N = 6
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
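
The SE figures in the NCNN results are large relative to the means. The sketch below is not part of the exported result. It assumes "SE" is the usual standard error of the mean (sample standard deviation divided by the square root of N) and that the four SE entries above follow the same card order as the results. The helper names `implied_std_dev` and `relative_se` are illustrative.

```python
import math

# (mean ms, SE, N) for NCNN FastestDet, copied from the result above.
# Assumption: SE entries are listed in the same order as the cards.
fastestdet = {
    "NVIDIA RTX 4070 SUPER": (2.86, 0.29, 9),
    "NVIDIA RTX 4070": (2.34, 0.10, 9),
    "NVIDIA RTX 4070 TI": (2.84, 0.12, 8),
    "NVIDIA RTX 3090": (6.38, 2.14, 6),
}


def implied_std_dev(se, n):
    """Sample standard deviation implied by SE = s / sqrt(N)."""
    return se * math.sqrt(n)


def relative_se(mean, se):
    """Standard error as a fraction of the mean."""
    return se / mean


for card, (mean, se, n) in fastestdet.items():
    print(f"{card}: s ~ {implied_std_dev(se, n):.2f} ms, "
          f"SE/mean = {relative_se(mean, se):.1%}")
```

Under those assumptions the RTX 3090 entry has a relative SE of roughly one third of its mean, which is consistent with the very wide MIN/MAX range reported for it.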

NCNN

Target: Vulkan GPU - Model: vision_transformer

OpenBenchmarking.org - NCNN 20230517 - ms, Fewer Is Better
NVIDIA RTX 4070 SUPER: 844.61 (MIN: 46.34 / MAX: 1866.93)
NVIDIA RTX 4070: 281.56 (MIN: 46.48 / MAX: 1913.33)
NVIDIA RTX 4070 TI: 390.18 (MIN: 46.49 / MAX: 1816.77)
NVIDIA RTX 3090: 663.24 (MIN: 46.42 / MAX: 1833.21)
SE +/- 87.53, N = 9; SE +/- 61.31, N = 9; SE +/- 25.65, N = 9; SE +/- 76.74, N = 6
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: regnety_400m

OpenBenchmarking.org - NCNN 20230517 - ms, Fewer Is Better
NVIDIA RTX 4070 SUPER: 11.11 (MIN: 5.49 / MAX: 4942.19)
NVIDIA RTX 4070: 6.50 (MIN: 5.52 / MAX: 460.02)
NVIDIA RTX 4070 TI: 5.97 (MIN: 5.49 / MAX: 7.35)
NVIDIA RTX 3090: 8.06 (MIN: 5.43 / MAX: 1922.26)
SE +/- 3.28, N = 9; SE +/- 0.31, N = 8; SE +/- 0.24, N = 9; SE +/- 1.17, N = 6
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: squeezenet_ssd

OpenBenchmarking.org - NCNN 20230517 - ms, Fewer Is Better
NVIDIA RTX 4070 SUPER: 6.86 (MIN: 4.34 / MAX: 1630.01)
NVIDIA RTX 4070: 5.27 (MIN: 4.53 / MAX: 7.53)
NVIDIA RTX 4070 TI: 5.36 (MIN: 4.55 / MAX: 496.3)
NVIDIA RTX 3090: 6.63 (MIN: 4.43 / MAX: 1636.66)
SE +/- 1.76, N = 9; SE +/- 0.17, N = 9; SE +/- 0.29, N = 9; SE +/- 1.57, N = 6
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: yolov4-tiny

OpenBenchmarking.org - NCNN 20230517 - ms, Fewer Is Better
NVIDIA RTX 4070 SUPER: 63.82 (MIN: 10.28 / MAX: 858.44)
NVIDIA RTX 4070: 25.11 (MIN: 10.66 / MAX: 857.35)
NVIDIA RTX 4070 TI: 16.47 (MIN: 10.61 / MAX: 826.68)
NVIDIA RTX 3090: 26.85 (MIN: 10.35 / MAX: 853.14)
SE +/- 10.56, N = 9; SE +/- 7.50, N = 9; SE +/- 2.58, N = 9; SE +/- 5.27, N = 6
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: resnet50

OpenBenchmarking.org - NCNN 20230517 - ms, Fewer Is Better
NVIDIA RTX 4070 SUPER: 46.26 (MIN: 7.71 / MAX: 1829.99)
NVIDIA RTX 4070: 8.24 (MIN: 7.87 / MAX: 9.87)
NVIDIA RTX 4070 TI: 14.32 (MIN: 7.9 / MAX: 1787.49)
NVIDIA RTX 3090: 27.77 (MIN: 7.77 / MAX: 1603.33)
SE +/- 14.70, N = 9; SE +/- 0.10, N = 9; SE +/- 4.23, N = 9; SE +/- 11.48, N = 6
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: alexnet

OpenBenchmarking.org - NCNN 20230517 - ms, Fewer Is Better
NVIDIA RTX 4070 SUPER: 16.17 (MIN: 3.52 / MAX: 436.52)
NVIDIA RTX 4070: 9.33 (MIN: 3.5 / MAX: 430.03)
NVIDIA RTX 4070 TI: 3.74 (MIN: 3.61 / MAX: 3.98)
NVIDIA RTX 3090: 3.69 (MIN: 3.59 / MAX: 7.37)
SE +/- 5.86, N = 9; SE +/- 3.71, N = 9; SE +/- 0.03, N = 9; SE +/- 0.02, N = 6
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: resnet18

OpenBenchmarking.org - NCNN 20230517 - ms, Fewer Is Better
NVIDIA RTX 4070 SUPER: 8.97 (MIN: 3.94 / MAX: 922.04)
NVIDIA RTX 4070: 8.58 (MIN: 3.98 / MAX: 912.04)
NVIDIA RTX 4070 TI: 5.47 (MIN: 3.95 / MAX: 726.67)
NVIDIA RTX 3090: 17.41 (MIN: 4.05 / MAX: 900.27)
SE +/- 3.49, N = 9; SE +/- 3.20, N = 9; SE +/- 1.33, N = 9; SE +/- 6.10, N = 6
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: vgg16

OpenBenchmarking.org - NCNN 20230517 - ms, Fewer Is Better
NVIDIA RTX 4070 SUPER: 117.81 (MIN: 17.16 / MAX: 647.67)
NVIDIA RTX 4070: 54.54 (MIN: 17.54 / MAX: 646.66)
NVIDIA RTX 4070 TI: 32.05 (MIN: 17.34 / MAX: 644.35)
NVIDIA RTX 3090: 145.72 (MIN: 17.46 / MAX: 648.88)
SE +/- 29.60, N = 9; SE +/- 19.29, N = 9; SE +/- 11.81, N = 9; SE +/- 22.21, N = 6
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: googlenet

OpenBenchmarking.org - NCNN 20230517 - ms, Fewer Is Better
NVIDIA RTX 4070 SUPER: 11.04 (MIN: 5.28 / MAX: 1769.19)
NVIDIA RTX 4070: 6.06 (MIN: 5.33 / MAX: 8.36)
NVIDIA RTX 4070 TI: 5.87 (MIN: 5.2 / MAX: 6.88)
NVIDIA RTX 3090: 7.49 (MIN: 5.46 / MAX: 1242.73)
SE +/- 1.21, N = 9; SE +/- 0.14, N = 9; SE +/- 0.14, N = 9; SE +/- 1.05, N = 6
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: blazeface

OpenBenchmarking.org - NCNN 20230517 - ms, Fewer Is Better
NVIDIA RTX 4070 SUPER: 0.84 (MIN: 0.65 / MAX: 4.63)
NVIDIA RTX 4070: 0.84 (MIN: 0.64 / MAX: 0.96)
NVIDIA RTX 4070 TI: 0.81 (MIN: 0.61 / MAX: 1.19)
NVIDIA RTX 3090: 0.86 (MIN: 0.64 / MAX: 3.3)
SE +/- 0.04, N = 9; SE +/- 0.03, N = 9; SE +/- 0.03, N = 9; SE +/- 0.03, N = 6
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: efficientnet-b0

OpenBenchmarking.org - NCNN 20230517 - ms, Fewer Is Better
NVIDIA RTX 4070 SUPER: 5.07 (MIN: 3.22 / MAX: 1124.2)
NVIDIA RTX 4070: 3.46 (MIN: 2.91 / MAX: 3.79)
NVIDIA RTX 4070 TI: 3.49 (MIN: 3.18 / MAX: 4.03)
NVIDIA RTX 3090: 13.87 (MIN: 2.86 / MAX: 2218.7)
SE +/- 0.97, N = 9; SE +/- 0.09, N = 9; SE +/- 0.06, N = 9; SE +/- 9.20, N = 6
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: mnasnet

OpenBenchmarking.org - NCNN 20230517 - ms, Fewer Is Better
NVIDIA RTX 4070 SUPER: 3.85 (MIN: 1.89 / MAX: 1093.29)
NVIDIA RTX 4070: 2.22 (MIN: 1.83 / MAX: 2.54)
NVIDIA RTX 4070 TI: 2.30 (MIN: 2.15 / MAX: 2.58)
NVIDIA RTX 3090: 2.24 (MIN: 2.07 / MAX: 6.02)
SE +/- 1.31, N = 9; SE +/- 0.08, N = 8; SE +/- 0.05, N = 9; SE +/- 0.06, N = 5
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: shufflenet-v2

OpenBenchmarking.org - NCNN 20230517 - ms, Fewer Is Better
NVIDIA RTX 4070 SUPER: 2.31 (MIN: 1.76 / MAX: 421.42)
NVIDIA RTX 4070: 2.11 (MIN: 1.77 / MAX: 2.53)
NVIDIA RTX 4070 TI: 2.03 (MIN: 1.84 / MAX: 2.58)
NVIDIA RTX 3090: 4.17 (MIN: 1.83 / MAX: 1393.33)
SE +/- 0.34, N = 8; SE +/- 0.12, N = 7; SE +/- 0.10, N = 8; SE +/- 2.18, N = 6
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU-v3-v3 - Model: mobilenet-v3

OpenBenchmarking.org - NCNN 20230517 - ms, Fewer Is Better
NVIDIA RTX 4070 SUPER: 2.25 (MIN: 1.75 / MAX: 343.7)
NVIDIA RTX 4070: 8.71 (MIN: 1.73 / MAX: 1561.29)
NVIDIA RTX 4070 TI: 2.09 (MIN: 1.78 / MAX: 2.85)
NVIDIA RTX 3090: 3.19 (MIN: 1.82 / MAX: 1210.31)
SE +/- 0.16, N = 9; SE +/- 6.77, N = 8; SE +/- 0.09, N = 9; SE +/- 1.14, N = 6
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU-v2-v2 - Model: mobilenet-v2

OpenBenchmarking.org - NCNN 20230517 - ms, Fewer Is Better
NVIDIA RTX 4070 SUPER: 3.03 (MIN: 2.38 / MAX: 970.87)
NVIDIA RTX 4070: 4.69 (MIN: 1.91 / MAX: 1305.64)
NVIDIA RTX 4070 TI: 2.43 (MIN: 2.09 / MAX: 5.8)
NVIDIA RTX 3090: 2.65 (MIN: 2.23 / MAX: 6.49)
SE +/- 0.44, N = 9; SE +/- 2.36, N = 9; SE +/- 0.07, N = 9; SE +/- 0.13, N = 6
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: mobilenet

OpenBenchmarking.org - NCNN 20230517 - ms, Fewer Is Better
NVIDIA RTX 4070 SUPER: 8.62 (MIN: 6.42 / MAX: 1101.3)
NVIDIA RTX 4070: 10.14 (MIN: 6.53 / MAX: 1509.26)
NVIDIA RTX 4070 TI: 8.43 (MIN: 6.51 / MAX: 1023.8)
NVIDIA RTX 3090: 12.07 (MIN: 6.42 / MAX: 1193.34)
SE +/- 0.47, N = 9; SE +/- 2.50, N = 9; SE +/- 0.98, N = 9; SE +/- 4.97, N = 6
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

ViennaCL

Test: CPU BLAS - dGEMV-T

OpenBenchmarking.org - ViennaCL 1.7.1 - GB/s, More Is Better
NVIDIA RTX 4070 SUPER: 109.0
NVIDIA RTX 4070: 109.0
NVIDIA RTX 4070 TI: 102.7
NVIDIA RTX 3090: 110.0
SE +/- 0.33, N = 3; SE +/- 0.33, N = 3; SE +/- 6.30, N = 3; SE +/- 0.33, N = 3
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - sDOT

OpenBenchmarking.org - ViennaCL 1.7.1 - GB/s, More Is Better
NVIDIA RTX 4070 SUPER: 165.0
NVIDIA RTX 4070: 166.0
NVIDIA RTX 4070 TI: 168.0
NVIDIA RTX 3090: 132.1
SE +/- 2.73, N = 3; SE +/- 3.76, N = 3; SE +/- 2.40, N = 3; SE +/- 35.40, N = 3
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

FinanceBench

Benchmark: Black-Scholes OpenCL

OpenBenchmarking.org - FinanceBench 2016-07-25 - ms, Fewer Is Better
NVIDIA RTX 4070 SUPER: 5.912
NVIDIA RTX 4070: 6.906
NVIDIA RTX 4070 TI: 5.226
NVIDIA RTX 3090: 5.741
SE +/- 0.114, N = 15; SE +/- 0.003, N = 3; SE +/- 0.003, N = 3; SE +/- 0.006, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp

LuxCoreRender

Scene: DLSC - Acceleration: GPU

OpenBenchmarking.org - LuxCoreRender 2.6 - M samples/sec, More Is Better
NVIDIA RTX 4070 SUPER: 13.59 (MIN: 12.52 / MAX: 13.84)
NVIDIA RTX 4070: 11.74 (MIN: 11.35 / MAX: 11.83)
NVIDIA RTX 4070 TI: 13.95 (MIN: 13.67 / MAX: 14.14)
NVIDIA RTX 3090: 12.99 (MIN: 0.52 / MAX: 14.69)
SE +/- 0.01, N = 3; SE +/- 0.01, N = 3; SE +/- 0.00, N = 3; SE +/- 1.13, N = 12

RealSR-NCNN

Scale: 4x - TAA: No

OpenBenchmarking.org - RealSR-NCNN 20200818 - Seconds, Fewer Is Better
NVIDIA RTX 4070 SUPER: 6.323
NVIDIA RTX 4070: 7.092
NVIDIA RTX 4070 TI: 5.962
NVIDIA RTX 3090: 5.556
SE +/- 0.150, N = 15; SE +/- 0.006, N = 3; SE +/- 0.039, N = 3; SE +/- 0.016, N = 3

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 32 - Model: Efficientnet_v2_l

OpenBenchmarking.org - PyTorch 2.1 - batches/sec, More Is Better
NVIDIA RTX 4070 SUPER: 102.60 (MIN: 94.84 / MAX: 104.25)
NVIDIA RTX 4070: 102.90 (MIN: 95.98 / MAX: 104.54)
NVIDIA RTX 4070 TI: 96.50 (MIN: 64.35 / MAX: 104.79)
NVIDIA RTX 3090: 99.05 (MIN: 91.8 / MAX: 100.69)
SE +/- 6.65, N = 5; SE +/- 0.13, N = 3

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 1 - Model: ResNet-50

OpenBenchmarking.org - PyTorch 2.1 - batches/sec, More Is Better
NVIDIA RTX 4070 SUPER: 557.73 (MIN: 513.63 / MAX: 563.37)
NVIDIA RTX 4070: 546.76 (MIN: 195.25 / MAX: 556.94)
NVIDIA RTX 4070 TI: 535.39 (MIN: 428.43 / MAX: 572.99)
NVIDIA RTX 3090: 525.12 (MIN: 458.54 / MAX: 542.46)
SE +/- 3.09, N = 3; SE +/- 11.16, N = 12


Phoronix Test Suite v10.8.4