RTX 4070 SUPER

Intel Core i9-13900K testing with an ASUS TUF GAMING Z790-PRO WIFI (1630 BIOS) and ASUS NVIDIA GeForce RTX 4070 Ti SUPER 16GB on EndeavourOS rolling via the Phoronix Test Suite.

HTML result view exported from: https://openbenchmarking.org/result/2402174-SADD-240211636&sro&grs.

RTX 4070 SUPER: system details (OpenBenchmarking.org)

Configurations: NVIDIA RTX 4070 SUPER, NVIDIA RTX 4070, NVIDIA RTX 4070 TI, NVIDIA RTX 3090, NVIDIA RTX 4070 TI SUPER

NVIDIA RTX 4070 SUPER
  Processor: Intel Core i9-13900K @ 5.50GHz (24 Cores / 32 Threads)
  Motherboard: ASUS TUF GAMING Z790-PRO WIFI (1401 BIOS)
  Chipset: Intel Device 7a27
  Memory: 32GB
  Disk: 4001GB Seagate ZP4000GP304001
  Graphics: ASUS NVIDIA GeForce RTX 4070 SUPER 12GB
  Audio: Realtek ALC1220
  Monitor: ARZOPA
  Network: Intel I226-V + Intel Device 7a70
  OS: EndeavourOS rolling
  Kernel: 6.7.1-arch1-1 (x86_64)
  Desktop: KDE Plasma 5.27.10
  Display Server: X Server 1.21.1.11
  Display Driver: NVIDIA 550.40.07
  OpenGL: 4.6.0
  OpenCL: OpenCL 3.0 CUDA 12.4.74
  Compiler: GCC 13.2.1 20230801
  File-System: ext4
  Screen Resolution: 1920x1080

Changes listed for each subsequent configuration (unlisted fields are unchanged from the preceding configuration):
  NVIDIA RTX 4070
    Graphics: MSI NVIDIA GeForce RTX 4070 12GB
    Compiler: GCC 13.2.1 20230801 + CUDA 12.3
  NVIDIA RTX 4070 TI
    Graphics: NVIDIA GeForce RTX 4070 Ti 12GB
  NVIDIA RTX 3090
    Graphics: NVIDIA GeForce RTX 3090 24GB
    Monitor: PI-KVM Video
    Kernel: 6.7.4-arch1-1 (x86_64)
  NVIDIA RTX 4070 TI SUPER
    Motherboard: ASUS TUF GAMING Z790-PRO WIFI (1630 BIOS)
    Chipset: Intel Raptor Lake-S PCH
    Disk: 4001GB Seagate ZP4000GP304001 + 0GB CD-ROM Drive
    Graphics: ASUS NVIDIA GeForce RTX 4070 Ti SUPER 16GB
    Network: Intel I226-V + Intel Raptor Lake-S PCH CNVi WiFi
    OpenCL: OpenCL 2.1 AMD-APP (3602.0) + OpenCL 3.0 CUDA 12.4.74

Kernel Details
  Transparent Huge Pages: always

Compiler Details
  NVIDIA RTX 4070 SUPER, NVIDIA RTX 4070, NVIDIA RTX 4070 TI: --disable-libssp --disable-libstdcxx-pch --disable-werror --enable-__cxa_atexit --enable-bootstrap --enable-cet=auto --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-default-ssp --enable-gnu-indirect-function --enable-gnu-unique-object --enable-languages=ada,c,c++,d,fortran,go,lto,objc,obj-c++ --enable-libstdcxx-backtrace --enable-link-serialization=1 --enable-lto --enable-multilib --enable-plugin --enable-shared --enable-threads=posix --mandir=/usr/share/man --with-build-config=bootstrap-lto --with-linker-hash-style=gnu
  NVIDIA RTX 3090, NVIDIA RTX 4070 TI SUPER: --disable-libssp --disable-libstdcxx-pch --disable-werror --enable-__cxa_atexit --enable-bootstrap --enable-cet=auto --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-default-ssp --enable-gnu-indirect-function --enable-gnu-unique-object --enable-languages=ada,c,c++,d,fortran,go,lto,m2,objc,obj-c++ --enable-libstdcxx-backtrace --enable-link-serialization=1 --enable-lto --enable-multilib --enable-plugin --enable-shared --enable-threads=posix --mandir=/usr/share/man --with-build-config=bootstrap-lto --with-linker-hash-style=gnu

Processor Details
  NVIDIA RTX 4070 SUPER: Scaling Governor: intel_pstate powersave (EPP: balance_performance) - CPU Microcode: 0x11d
  NVIDIA RTX 4070: Scaling Governor: intel_pstate powersave (EPP: balance_performance) - CPU Microcode: 0x11d
  NVIDIA RTX 4070 TI: Scaling Governor: intel_pstate powersave (EPP: balance_performance) - CPU Microcode: 0x11d
  NVIDIA RTX 3090: Scaling Governor: intel_pstate powersave (EPP: balance_performance) - CPU Microcode: 0x11d
  NVIDIA RTX 4070 TI SUPER: Scaling Governor: intel_pstate powersave (EPP: balance_performance) - CPU Microcode: 0x11f

Graphics Details
  NVIDIA RTX 4070 SUPER: BAR1 / Visible vRAM Size: 16384 MiB - vBIOS Version: 95.04.69.00.c1
  NVIDIA RTX 4070: BAR1 / Visible vRAM Size: 16384 MiB - vBIOS Version: 95.04.3e.40.2a
  NVIDIA RTX 4070 TI: BAR1 / Visible vRAM Size: 16384 MiB - vBIOS Version: 95.04.31.00.36
  NVIDIA RTX 3090: BAR1 / Visible vRAM Size: 256 MiB - vBIOS Version: 94.02.26.08.ba
  NVIDIA RTX 4070 TI SUPER: BAR1 / Visible vRAM Size: 256 MiB - vBIOS Version: 95.03.45.00.c5

Security Details
  gather_data_sampling: Not affected
  itlb_multihit: Not affected
  l1tf: Not affected
  mds: Not affected
  meltdown: Not affected
  mmio_stale_data: Not affected
  retbleed: Not affected
  spec_rstack_overflow: Not affected
  spec_store_bypass: Mitigation of SSB disabled via prctl
  spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
  spectre_v2: Mitigation of Enhanced / Automatic IBRS IBPB: conditional RSB filling PBRSB-eIBRS: SW sequence
  srbds: Not affected
  tsx_async_abort: Not affected

Environment Details
  NVIDIA RTX 4070, NVIDIA RTX 4070 TI, NVIDIA RTX 3090, NVIDIA RTX 4070 TI SUPER: NVCC_PREPEND_FLAGS="-ccbin /opt/cuda/bin"

Python Details
  NVIDIA RTX 4070: Python 3.11.6
  NVIDIA RTX 4070 TI: Python 3.11.6
  NVIDIA RTX 3090: Python 3.11.6
  NVIDIA RTX 4070 TI SUPER: Python 3.11.7

RTX 4070 SUPER: results overview table (one row per test; columns: NVIDIA RTX 4070 SUPER, NVIDIA RTX 4070, NVIDIA RTX 4070 TI, NVIDIA RTX 3090, NVIDIA RTX 4070 TI SUPER). Source: OpenBenchmarking.org

ProjectPhysX OpenCL-Benchmark

Operation: Memory Bandwidth Coalesced Write

ProjectPhysX OpenCL-Benchmark 1.2 (GB/s, More Is Better)
NVIDIA RTX 3090: 887.31 (SE +/- 0.06, N = 3)
NVIDIA RTX 4070: 459.43 (SE +/- 0.16, N = 3)
NVIDIA RTX 4070 SUPER: 455.01 (SE +/- 0.14, N = 3)
NVIDIA RTX 4070 TI: 457.17 (SE +/- 0.11, N = 3)
NVIDIA RTX 4070 TI SUPER: 608.94 (SE +/- 0.57, N = 3)
1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL
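The coalesced-write figures above are easier to compare as ratios against one card. The following is a minimal sketch, not part of the exported result: the values are copied from this result, while the helper name `relative_to` and the choice of the RTX 4070 SUPER as baseline are assumptions made for illustration.

```python
# Coalesced-write bandwidth in GB/s, copied from the result above.
write_gbps = {
    "NVIDIA RTX 3090": 887.31,
    "NVIDIA RTX 4070": 459.43,
    "NVIDIA RTX 4070 SUPER": 455.01,
    "NVIDIA RTX 4070 TI": 457.17,
    "NVIDIA RTX 4070 TI SUPER": 608.94,
}


def relative_to(results, baseline):
    """Express each more-is-better result as a multiple of the baseline entry.

    `relative_to` is a hypothetical helper, not a Phoronix Test Suite function.
    """
    base = results[baseline]
    return {name: value / base for name, value in results.items()}


ratios = relative_to(write_gbps, "NVIDIA RTX 4070 SUPER")

# Print the cards from fastest to slowest on this test.
for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:26s} {ratio:5.2f}x")
```

The same helper applies to any "More Is Better" result in this file; for "Fewer Is Better" results the ratio would need to be inverted.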

VkFFT

Test: FFT + iFFT C2C 1D batched in single precision, no reshuffling

VkFFT 1.2.31 (Benchmark Score, More Is Better)
NVIDIA RTX 3090: 144311 (SE +/- 37.44, N = 3)
NVIDIA RTX 4070: 79057 (SE +/- 5.84, N = 3)
NVIDIA RTX 4070 SUPER: 75078 (SE +/- 37.77, N = 3)
NVIDIA RTX 4070 TI: 75141 (SE +/- 28.54, N = 3)
NVIDIA RTX 4070 TI SUPER: 105549 (SE +/- 20.80, N = 3)
1. (CXX) g++ options: -O3 -lrt

VkFFT

Test: FFT + iFFT C2C 1D batched in single precision

VkFFT 1.2.31 (Benchmark Score, More Is Better)
NVIDIA RTX 3090: 141876 (SE +/- 9.64, N = 3)
NVIDIA RTX 4070: 77774 (SE +/- 13.72, N = 3)
NVIDIA RTX 4070 SUPER: 73929 (SE +/- 7.94, N = 3)
NVIDIA RTX 4070 TI: 73942 (SE +/- 0.88, N = 3)
NVIDIA RTX 4070 TI SUPER: 104003 (SE +/- 33.60, N = 3)
1. (CXX) g++ options: -O3 -lrt

clpeak

OpenCL Test: Global Memory Bandwidth

clpeak 1.1.2 (GBPS, More Is Better)
NVIDIA RTX 3090: 816.55 (SE +/- 0.02, N = 3)
NVIDIA RTX 4070: 437.21 (SE +/- 0.02, N = 3)
NVIDIA RTX 4070 SUPER: 437.65 (SE +/- 0.02, N = 3)
NVIDIA RTX 4070 TI: 437.63 (SE +/- 0.02, N = 3)
NVIDIA RTX 4070 TI SUPER: 582.84 (SE +/- 0.01, N = 3)
1. (CXX) g++ options: -O3

ProjectPhysX OpenCL-Benchmark

Operation: Memory Bandwidth Coalesced Read

ProjectPhysX OpenCL-Benchmark 1.2 (GB/s, More Is Better)
NVIDIA RTX 3090: 864.11 (SE +/- 0.07, N = 3)
NVIDIA RTX 4070: 465.18 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070 SUPER: 464.86 (SE +/- 0.01, N = 3)
NVIDIA RTX 4070 TI: 465.07 (SE +/- 0.01, N = 3)
NVIDIA RTX 4070 TI SUPER: 619.03 (SE +/- 0.06, N = 3)
1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

cl-mem

Benchmark: Write

cl-mem 2017-01-13 (GB/s, More Is Better)
NVIDIA RTX 3090: 753.8 (SE +/- 0.83, N = 3)
NVIDIA RTX 4070: 406.7 (SE +/- 0.55, N = 3)
NVIDIA RTX 4070 SUPER: 407.5 (SE +/- 1.11, N = 3)
NVIDIA RTX 4070 TI: 412.2 (SE +/- 0.12, N = 3)
NVIDIA RTX 4070 TI SUPER: 551.9 (SE +/- 0.25, N = 3)
1. (CC) gcc options: -O2 -flto -lOpenCL

cl-mem

Benchmark: Read

cl-mem 2017-01-13 (GB/s, More Is Better)
NVIDIA RTX 3090: 825.8 (SE +/- 0.32, N = 3)
NVIDIA RTX 4070: 446.3 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 SUPER: 446.2 (SE +/- 0.12, N = 3)
NVIDIA RTX 4070 TI: 446.3 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI SUPER: 595.2 (SE +/- 0.00, N = 3)
1. (CC) gcc options: -O2 -flto -lOpenCL

VkResample

Upscale: 2x - Precision: Single

VkResample 1.0 (ms, Fewer Is Better)
NVIDIA RTX 3090: 10.32 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 18.02 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 SUPER: 18.49 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI: 18.46 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI SUPER: 13.36 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -O3

ViennaCL

Test: OpenCL BLAS - dAXPY

ViennaCL 1.7.1 (GB/s, More Is Better)
NVIDIA RTX 3090: 724 (SE +/- 0.58, N = 3)
NVIDIA RTX 4070: 455 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 SUPER: 437 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI: 437 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070 TI SUPER: 585 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

NAMD CUDA

ATPase Simulation - 327,506 Atoms

NAMD CUDA 2.14 (days/ns, Fewer Is Better)
NVIDIA RTX 3090: 0.10822 (SE +/- 0.00042, N = 3)
NVIDIA RTX 4070: 0.07498 (SE +/- 0.00021, N = 3)
NVIDIA RTX 4070 SUPER: 0.06791 (SE +/- 0.00031, N = 3)
NVIDIA RTX 4070 TI: 0.06788 (SE +/- 0.00061, N = 3)
NVIDIA RTX 4070 TI SUPER: 0.07715 (SE +/- 0.00018, N = 3)
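The NAMD result above is reported in days/ns, where lower is better. Simulation throughput is often quoted the other way round, in ns/day, which is simply the reciprocal. A small sketch of that conversion follows, using the values from this result; the variable names are illustrative only.

```python
# NAMD ATPase result in days/ns (fewer is better), copied from the result above.
days_per_ns = {
    "NVIDIA RTX 3090": 0.10822,
    "NVIDIA RTX 4070": 0.07498,
    "NVIDIA RTX 4070 SUPER": 0.06791,
    "NVIDIA RTX 4070 TI": 0.06788,
    "NVIDIA RTX 4070 TI SUPER": 0.07715,
}

# ns/day is the reciprocal of days/ns, so the ordering flips to more-is-better.
ns_per_day = {gpu: 1.0 / value for gpu, value in days_per_ns.items()}

for gpu, rate in sorted(ns_per_day.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{gpu:26s} {rate:6.2f} ns/day")
```

Because the conversion is a reciprocal, the card with the smallest days/ns figure is the one with the largest ns/day figure.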

Libplacebo

Test: hdr_peakdetect

Libplacebo 5.229.1 (FPS, More Is Better)
NVIDIA RTX 3090: 4969.74 (SE +/- 75.68, N = 3)
NVIDIA RTX 4070: 3310.02 (SE +/- 11.75, N = 3)
NVIDIA RTX 4070 SUPER: 3292.37 (SE +/- 3.65, N = 3)
NVIDIA RTX 4070 TI: 3475.06 (SE +/- 165.74, N = 3)
NVIDIA RTX 4070 TI SUPER: 3913.34 (SE +/- 13.83, N = 3)
1. (CXX) g++ options: -lm -pthread -ldl -fvisibility=hidden -std=c++20 -O2 -fno-math-errno -fPIC -MD -MQ -MF

clpeak

OpenCL Test: Integer Compute INT

clpeak 1.1.2 (GIOPS, More Is Better)
NVIDIA RTX 3090: 17923.33 (SE +/- 16.49, N = 3)
NVIDIA RTX 4070: 14555.19 (SE +/- 15.26, N = 3)
NVIDIA RTX 4070 SUPER: 18170.54 (SE +/- 3.14, N = 3)
NVIDIA RTX 4070 TI: 19821.10 (SE +/- 2.50, N = 3)
NVIDIA RTX 4070 TI SUPER: 22171.25 (SE +/- 28.14, N = 3)
1. (CXX) g++ options: -O3

clpeak

OpenCL Test: Single-Precision Float

clpeak 1.1.2 (GFLOPS, More Is Better)
NVIDIA RTX 3090: 34906.79 (SE +/- 113.39, N = 3)
NVIDIA RTX 4070: 28479.39 (SE +/- 5.46, N = 3)
NVIDIA RTX 4070 SUPER: 35492.69 (SE +/- 0.99, N = 3)
NVIDIA RTX 4070 TI: 38691.73 (SE +/- 11.67, N = 3)
NVIDIA RTX 4070 TI SUPER: 43244.79 (SE +/- 50.25, N = 3)
1. (CXX) g++ options: -O3

Hashcat

Benchmark: MD5

Hashcat 6.2.4 (H/s, More Is Better)
NVIDIA RTX 3090: 67177300000 (SE +/- 53667246.37, N = 3)
NVIDIA RTX 4070: 56147866667 (SE +/- 33772046.30, N = 3)
NVIDIA RTX 4070 SUPER: 67583033333 (SE +/- 22430807.19, N = 3)
NVIDIA RTX 4070 TI: 73312233333 (SE +/- 11283665.68, N = 3)
NVIDIA RTX 4070 TI SUPER: 82004966667 (SE +/- 97655010.68, N = 3)

ProjectPhysX OpenCL-Benchmark

Operation: FP64 Compute

ProjectPhysX OpenCL-Benchmark 1.2 (TFLOPs/s, More Is Better)
NVIDIA RTX 3090: 0.637 (SE +/- 0.001, N = 3)
NVIDIA RTX 4070: 0.510 (SE +/- 0.001, N = 3)
NVIDIA RTX 4070 SUPER: 0.621 (SE +/- 0.000, N = 3)
NVIDIA RTX 4070 TI: 0.660 (SE +/- 0.001, N = 3)
NVIDIA RTX 4070 TI SUPER: 0.743 (SE +/- 0.001, N = 3)
1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

clpeak

OpenCL Test: Double-Precision Double

clpeak 1.1.2 (GFLOPS, More Is Better)
NVIDIA RTX 3090: 642.23 (SE +/- 1.63, N = 3)
NVIDIA RTX 4070: 515.17 (SE +/- 0.21, N = 3)
NVIDIA RTX 4070 SUPER: 630.11 (SE +/- 0.98, N = 3)
NVIDIA RTX 4070 TI: 667.05 (SE +/- 1.33, N = 3)
NVIDIA RTX 4070 TI SUPER: 750.36 (SE +/- 1.26, N = 3)
1. (CXX) g++ options: -O3

ViennaCL

Test: OpenCL BLAS - dGEMM-TT

ViennaCL 1.7.1 (GFLOPs/s, More Is Better)
NVIDIA RTX 3090: 593
NVIDIA RTX 4070: 502
NVIDIA RTX 4070 SUPER: 613
NVIDIA RTX 4070 TI: 648
NVIDIA RTX 4070 TI SUPER: 731
SE values as listed (four entries for five results): SE +/- 0.33, N = 3; SE +/- 0.00, N = 3; SE +/- 0.33, N = 3; SE +/- 1.33, N = 3
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

Hashcat

Benchmark: TrueCrypt RIPEMD160 + XTS

Hashcat 6.2.4 (H/s, More Is Better)
NVIDIA RTX 3090: 797833 (SE +/- 1757.21, N = 3)
NVIDIA RTX 4070: 660967 (SE +/- 176.38, N = 3)
NVIDIA RTX 4070 SUPER: 802967 (SE +/- 633.33, N = 3)
NVIDIA RTX 4070 TI: 858600 (SE +/- 888.82, N = 3)
NVIDIA RTX 4070 TI SUPER: 961733 (SE +/- 392.99, N = 3)

Hashcat

Benchmark: 7-Zip

Hashcat 6.2.4 (H/s, More Is Better)
NVIDIA RTX 3090: 1056000 (SE +/- 1587.45, N = 3)
NVIDIA RTX 4070: 976967 (SE +/- 2062.63, N = 3)
NVIDIA RTX 4070 SUPER: 1176467 (SE +/- 1991.93, N = 3)
NVIDIA RTX 4070 TI: 1262633 (SE +/- 2339.04, N = 3)
NVIDIA RTX 4070 TI SUPER: 1420700 (SE +/- 1628.91, N = 3)

Hashcat

Benchmark: SHA-512

Hashcat 6.2.4 (H/s, More Is Better)
NVIDIA RTX 3090: 3081866667 (SE +/- 3288532.26, N = 3)
NVIDIA RTX 4070: 2673300000 (SE +/- 1059874.21, N = 3)
NVIDIA RTX 4070 SUPER: 3232733333 (SE +/- 1530068.99, N = 3)
NVIDIA RTX 4070 TI: 3462500000 (SE +/- 721110.26, N = 3)
NVIDIA RTX 4070 TI SUPER: 3887033333 (SE +/- 1098989.43, N = 3)

ProjectPhysX OpenCL-Benchmark

Operation: INT8 Compute

ProjectPhysX OpenCL-Benchmark 1.2 (TIOPs/s, More Is Better)
NVIDIA RTX 3090: 13.73 (SE +/- 0.07, N = 3)
NVIDIA RTX 4070: 12.12 (SE +/- 0.02, N = 3)
NVIDIA RTX 4070 SUPER: 14.31 (SE +/- 0.05, N = 3)
NVIDIA RTX 4070 TI: 15.73 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070 TI SUPER: 17.62 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

VkResample

Upscale: 2x - Precision: Double

VkResample 1.0 (ms, Fewer Is Better)
NVIDIA RTX 3090: 333.64 (SE +/- 0.30, N = 3)
NVIDIA RTX 4070: 415.16 (SE +/- 0.77, N = 3)
NVIDIA RTX 4070 SUPER: 339.59 (SE +/- 0.30, N = 3)
NVIDIA RTX 4070 TI: 322.06 (SE +/- 0.35, N = 3)
NVIDIA RTX 4070 TI SUPER: 285.99 (SE +/- 0.02, N = 3)
1. (CXX) g++ options: -O3

GpuOwl

Exponent: 332220523

GpuOwl 7.2.1 (Iterations / Second, More Is Better)
NVIDIA RTX 3090: 137.32 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 112.61 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 SUPER: 137.44 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI: 145.84 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI SUPER: 163.41 (SE +/- 0.01, N = 3)

Hashcat

Benchmark: SHA1

Hashcat 6.2.4 (H/s, More Is Better)
NVIDIA RTX 3090: 21323733333 (SE +/- 26244639.66, N = 3)
NVIDIA RTX 4070: 18202466667 (SE +/- 6318315.53, N = 3)
NVIDIA RTX 4070 SUPER: 22132600000 (SE +/- 5140363.15, N = 3)
NVIDIA RTX 4070 TI: 23532400000 (SE +/- 15926811.78, N = 3)
NVIDIA RTX 4070 TI SUPER: 26388600000 (SE +/- 29067564.97, N = 3)

ProjectPhysX OpenCL-Benchmark

Operation: FP32 Compute

ProjectPhysX OpenCL-Benchmark 1.2 (TFLOPs/s, More Is Better)
NVIDIA RTX 3090: 39.40 (SE +/- 0.10, N = 3)
NVIDIA RTX 4070: 31.77 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070 SUPER: 38.59 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070 TI: 40.91 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI SUPER: 45.95 (SE +/- 0.01, N = 3)
1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMM-TN

ViennaCL 1.7.1 (GFLOPs/s, More Is Better)
NVIDIA RTX 3090: 594 (SE +/- 2.03, N = 3)
NVIDIA RTX 4070: 494 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070 SUPER: 599 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI: 634 (SE +/- 0.67, N = 3)
NVIDIA RTX 4070 TI SUPER: 714 (SE +/- 1.00, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dDOT

ViennaCL 1.7.1 (GB/s, More Is Better)
NVIDIA RTX 3090: 659 (SE +/- 0.88, N = 3)
NVIDIA RTX 4070: 456 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 SUPER: 458 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI: 457 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI SUPER: 575 (SE +/- 1.33, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ProjectPhysX OpenCL-Benchmark

Operation: INT32 Compute

ProjectPhysX OpenCL-Benchmark 1.2 (TIOPs/s, More Is Better)
NVIDIA RTX 3090: 20.03 (SE +/- 0.06, N = 3)
NVIDIA RTX 4070: 16.38 (SE +/- 0.02, N = 3)
NVIDIA RTX 4070 SUPER: 19.89 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI: 21.05 (SE +/- 0.04, N = 3)
NVIDIA RTX 4070 TI SUPER: 23.66 (SE +/- 0.01, N = 3)
1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMM-NT

ViennaCL 1.7.1 (GFLOPs/s, More Is Better)
NVIDIA RTX 3090: 595 (SE +/- 2.33, N = 3)
NVIDIA RTX 4070: 477 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070 SUPER: 584 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI: 612 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070 TI SUPER: 689 (SE +/- 1.00, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMM-NN

ViennaCL 1.7.1 (GFLOPs/s, More Is Better)
NVIDIA RTX 3090: 592 (SE +/- 2.31, N = 3)
NVIDIA RTX 4070: 473 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070 SUPER: 577 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI: 604 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070 TI SUPER: 681 (SE +/- 1.33, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

GpuOwl

Exponent: 77936867

GpuOwl 7.2.1 (Iterations / Second, More Is Better)
NVIDIA RTX 3090: 645.99 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 530.32 (SE +/- 0.09, N = 3)
NVIDIA RTX 4070 SUPER: 646.41 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI: 676.59 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI SUPER: 761.61 (SE +/- 0.00, N = 3)

ProjectPhysX OpenCL-Benchmark

Operation: INT16 Compute

ProjectPhysX OpenCL-Benchmark 1.2 (TIOPs/s, More Is Better)
NVIDIA RTX 3090: 17.00 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 14.28 (SE +/- 0.02, N = 3)
NVIDIA RTX 4070 SUPER: 17.17 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI: 18.28 (SE +/- 0.02, N = 3)
NVIDIA RTX 4070 TI SUPER: 20.50 (SE +/- 0.02, N = 3)
1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

GpuOwl

Exponent: 57885161

GpuOwl 7.2.1 (Iterations / Second, More Is Better)
NVIDIA RTX 3090: 866.31 (SE +/- 2.01, N = 3)
NVIDIA RTX 4070: 714.80 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 SUPER: 869.07 (SE +/- 1.26, N = 3)
NVIDIA RTX 4070 TI: 919.13 (SE +/- 2.53, N = 3)
NVIDIA RTX 4070 TI SUPER: 1025.99 (SE +/- 0.35, N = 3)

LuxCoreRender

Scene: Rainbow Colors and Prism - Acceleration: GPU

LuxCoreRender 2.6 (M samples/sec, More Is Better)
NVIDIA RTX 3090: 33.29 (SE +/- 0.36, N = 5; MIN: 30.4 / MAX: 36.21)
NVIDIA RTX 4070: 23.26 (SE +/- 0.01, N = 3; MIN: 20.92 / MAX: 24.3)
NVIDIA RTX 4070 SUPER: 27.67 (SE +/- 0.03, N = 3; MIN: 24.87 / MAX: 29.03)
NVIDIA RTX 4070 TI: 27.71 (SE +/- 0.07, N = 3; MIN: 25.01 / MAX: 29.15)
NVIDIA RTX 4070 TI SUPER: 31.86 (SE +/- 0.09, N = 3; MIN: 28.57 / MAX: 33.29)

ViennaCL

Test: OpenCL BLAS - dCOPY

ViennaCL 1.7.1 (GB/s, More Is Better)
NVIDIA RTX 3090: 605 (SE +/- 0.58, N = 3)
NVIDIA RTX 4070: 423 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 SUPER: 423 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070 TI: 424 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI SUPER: 512 (SE +/- 4.00, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

RealSR-NCNN

Scale: 4x - TAA: Yes

RealSR-NCNN 20200818 (Seconds, Fewer Is Better)
NVIDIA RTX 3090: 30.31 (SE +/- 0.06, N = 3)
NVIDIA RTX 4070: 42.85 (SE +/- 0.23, N = 3)
NVIDIA RTX 4070 SUPER: 34.89 (SE +/- 0.02, N = 3)
NVIDIA RTX 4070 TI: 33.63 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070 TI SUPER: 30.72 (SE +/- 0.02, N = 3)

ProjectPhysX OpenCL-Benchmark

Operation: INT64 Compute

ProjectPhysX OpenCL-Benchmark 1.2 (TIOPs/s, More Is Better)
NVIDIA RTX 3090: 3.135 (SE +/- 0.003, N = 3)
NVIDIA RTX 4070: 3.443 (SE +/- 0.004, N = 3)
NVIDIA RTX 4070 SUPER: 4.214 (SE +/- 0.015, N = 3)
NVIDIA RTX 4070 TI: 4.420 (SE +/- 0.016, N = 3)
NVIDIA RTX 4070 TI SUPER: 4.414 (SE +/- 0.009, N = 3)
1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

LuxCoreRender

Scene: Danish Mood - Acceleration: GPU

LuxCoreRender 2.6 (M samples/sec, More Is Better)
NVIDIA RTX 3090: 10.20 (SE +/- 0.04, N = 3; MIN: 4.07 / MAX: 11.93)
NVIDIA RTX 4070: 8.89 (SE +/- 0.06, N = 3; MIN: 3.32 / MAX: 10.26)
NVIDIA RTX 4070 SUPER: 10.56 (SE +/- 0.08, N = 3; MIN: 3.7 / MAX: 12.17)
NVIDIA RTX 4070 TI: 10.99 (SE +/- 0.11, N = 3; MIN: 4.17 / MAX: 12.71)
NVIDIA RTX 4070 TI SUPER: 12.42 (SE +/- 0.03, N = 3; MIN: 4.35 / MAX: 14.32)

VkFFT

Test: FFT + iFFT C2C 1D batched in double precision

VkFFT 1.2.31 (Benchmark Score, More Is Better)
NVIDIA RTX 3090: 30912 (SE +/- 50.66, N = 3)
NVIDIA RTX 4070: 22390 (SE +/- 125.94, N = 3)
NVIDIA RTX 4070 SUPER: 24317 (SE +/- 146.69, N = 3)
NVIDIA RTX 4070 TI: 25431 (SE +/- 302.46, N = 3)
NVIDIA RTX 4070 TI SUPER: 27947 (SE +/- 325.03, N = 3)
1. (CXX) g++ options: -O3 -lrt

Rodinia

Test: OpenCL Particle Filter

Rodinia 3.1 (Seconds, Fewer Is Better)
NVIDIA RTX 3090: 3.844 (SE +/- 0.030, N = 15)
NVIDIA RTX 4070: 4.098 (SE +/- 0.008, N = 3)
NVIDIA RTX 4070 SUPER: 3.480 (SE +/- 0.039, N = 4)
NVIDIA RTX 4070 TI: 3.291 (SE +/- 0.002, N = 3)
NVIDIA RTX 4070 TI SUPER: 2.973 (SE +/- 0.004, N = 3)
1. (CXX) g++ options: -O2 -lOpenCL
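The Particle Filter result above lists SE values taken over different run counts (N = 15, 3, 4, 3, 3), so the SE figures alone do not show which card had the noisiest runs. The sketch below assumes that SE here means the standard error of the mean, SE = SD / sqrt(N); the result file does not state this, so treat the derived standard deviations as estimates under that assumption.

```python
import math

# (SE in seconds, N runs) for the Rodinia OpenCL Particle Filter result above.
se_and_n = {
    "NVIDIA RTX 3090": (0.030, 15),
    "NVIDIA RTX 4070": (0.008, 3),
    "NVIDIA RTX 4070 SUPER": (0.039, 4),
    "NVIDIA RTX 4070 TI": (0.002, 3),
    "NVIDIA RTX 4070 TI SUPER": (0.004, 3),
}

# Assumption: SE is the standard error of the mean, so SD = SE * sqrt(N).
estimated_sd = {gpu: se * math.sqrt(n) for gpu, (se, n) in se_and_n.items()}

for gpu, sd in estimated_sd.items():
    print(f"{gpu:26s} estimated SD {sd:.3f} s")
```

Under this assumption the RTX 3090 has the largest run-to-run spread even though its SE is smaller than the RTX 4070 SUPER's, because its SE is averaged over 15 runs rather than 4.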

Blender

Blend File: Pabellon Barcelona - Compute: NVIDIA OptiX

Blender 4.0 (Seconds, Fewer Is Better)
NVIDIA RTX 3090: 17.30 (SE +/- 0.02, N = 3)
NVIDIA RTX 4070: 16.55 (SE +/- 0.01, N = 3)
NVIDIA RTX 4070 SUPER: 14.29 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070 TI: 13.97 (SE +/- 0.01, N = 3)
NVIDIA RTX 4070 TI SUPER: 12.56 (SE +/- 0.02, N = 3)

ViennaCL

Test: CPU BLAS - dDOT

ViennaCL 1.7.1 (GB/s, More Is Better)
NVIDIA RTX 3090: 95.2 (SE +/- 0.84, N = 3)
NVIDIA RTX 4070: 96.7 (SE +/- 0.22, N = 3)
NVIDIA RTX 4070 SUPER: 96.8 (SE +/- 0.09, N = 3)
NVIDIA RTX 4070 TI: 96.4 (SE +/- 0.58, N = 3)
NVIDIA RTX 4070 TI SUPER: 70.8 (SE +/- 0.19, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

Blender

Blend File: Classroom - Compute: NVIDIA OptiX

Blender 4.0 (Seconds, Fewer Is Better)
NVIDIA RTX 3090: 15.26 (SE +/- 0.02, N = 3)
NVIDIA RTX 4070: 14.86 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070 SUPER: 12.60 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI: 12.30 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070 TI SUPER: 11.20 (SE +/- 0.04, N = 3)

ViennaCL

Test: CPU BLAS - dAXPY

ViennaCL 1.7.1 (GB/s, More Is Better)
NVIDIA RTX 3090: 86.2 (SE +/- 0.94, N = 3)
NVIDIA RTX 4070: 86.8 (SE +/- 0.44, N = 3)
NVIDIA RTX 4070 SUPER: 87.2 (SE +/- 0.12, N = 3)
NVIDIA RTX 4070 TI: 87.3 (SE +/- 0.57, N = 3)
NVIDIA RTX 4070 TI SUPER: 64.3 (SE +/- 0.12, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

MandelGPU

OpenCL Device: GPU

MandelGPU 1.3pts1, OpenCL Device: GPU (Samples/sec, More Is Better)
NVIDIA RTX 3090: 484098913.8 (SE +/- 794770.01, N = 3)
NVIDIA RTX 4070: 516770131.2 (SE +/- 1783157.89, N = 3)
NVIDIA RTX 4070 SUPER: 587219538.2 (SE +/- 467034.80, N = 3)
NVIDIA RTX 4070 TI: 619106132.5 (SE +/- 1202791.77, N = 3)
NVIDIA RTX 4070 TI SUPER: 656484783.7 (SE +/- 1096202.13, N = 3)
1. (CC) gcc options: -O3 -lm -ftree-vectorize -funroll-loops -lglut -lOpenCL -lGL

Libplacebo

Test: deband_heavy

Libplacebo 5.229.1, Test: deband_heavy (FPS, More Is Better)
NVIDIA RTX 3090: 2015.93 (SE +/- 2.90, N = 3)
NVIDIA RTX 4070: 1843.26 (SE +/- 0.12, N = 3)
NVIDIA RTX 4070 SUPER: 2186.70 (SE +/- 2.26, N = 3)
NVIDIA RTX 4070 TI: 2306.56 (SE +/- 0.40, N = 3)
NVIDIA RTX 4070 TI SUPER: 2493.29 (SE +/- 2.22, N = 3)
1. (CXX) g++ options: -lm -pthread -ldl -fvisibility=hidden -std=c++20 -O2 -fno-math-errno -fPIC -MD -MQ -MF

ViennaCL

Test: CPU BLAS - dCOPY

ViennaCL 1.7.1, Test: CPU BLAS - dCOPY (GB/s, More Is Better)
NVIDIA RTX 3090: 70.2 (SE +/- 0.72, N = 3)
NVIDIA RTX 4070: 71.0 (SE +/- 0.25, N = 3)
NVIDIA RTX 4070 SUPER: 70.8 (SE +/- 0.32, N = 3)
NVIDIA RTX 4070 TI: 71.3 (SE +/- 0.74, N = 3)
NVIDIA RTX 4070 TI SUPER: 52.7 (SE +/- 0.18, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

OctaneBench

Total Score

OctaneBench 2020.1, Total Score (Score, More Is Better)
NVIDIA RTX 3090: 674.25
NVIDIA RTX 4070: 648.00
NVIDIA RTX 4070 SUPER: 720.97
NVIDIA RTX 4070 TI: 735.94
NVIDIA RTX 4070 TI SUPER: 876.44

IndigoBench

Acceleration: OpenCL GPU - Scene: Bedroom

IndigoBench 4.4, Acceleration: OpenCL GPU - Scene: Bedroom (M samples/s, More Is Better)
NVIDIA RTX 3090: 20.96 (SE +/- 0.01, N = 3)
NVIDIA RTX 4070: 18.20 (SE +/- 0.01, N = 3)
NVIDIA RTX 4070 SUPER: 19.80 (SE +/- 0.01, N = 3)
NVIDIA RTX 4070 TI: 20.26 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070 TI SUPER: 24.57 (SE +/- 0.01, N = 3)

Libplacebo

Test: polar_nocompute

Libplacebo 5.229.1, Test: polar_nocompute (FPS, More Is Better)
NVIDIA RTX 3090: 2116.50 (SE +/- 3.16, N = 3)
NVIDIA RTX 4070: 1968.37 (SE +/- 0.01, N = 3)
NVIDIA RTX 4070 SUPER: 2327.55 (SE +/- 0.24, N = 3)
NVIDIA RTX 4070 TI: 2459.03 (SE +/- 1.70, N = 3)
NVIDIA RTX 4070 TI SUPER: 2646.70 (SE +/- 0.38, N = 3)
1. (CXX) g++ options: -lm -pthread -ldl -fvisibility=hidden -std=c++20 -O2 -fno-math-errno -fPIC -MD -MQ -MF

LuxCoreRender

Scene: LuxCore Benchmark - Acceleration: GPU

LuxCoreRender 2.6, Scene: LuxCore Benchmark - Acceleration: GPU (M samples/sec, More Is Better)
NVIDIA RTX 3090: 13.12 (SE +/- 0.03, N = 2; MIN: 4.85 / MAX: 15.21)
NVIDIA RTX 4070: 10.92 (SE +/- 0.01, N = 3; MIN: 4.45 / MAX: 12.42)
NVIDIA RTX 4070 SUPER: 12.82 (SE +/- 0.02, N = 3; MIN: 4.84 / MAX: 14.62)
NVIDIA RTX 4070 TI: 13.23 (SE +/- 0.01, N = 3; MIN: 5.41 / MAX: 15.13)
NVIDIA RTX 4070 TI SUPER: 14.61 (SE +/- 0.00, N = 3; MIN: 5.91 / MAX: 16.88)

Blender

Blend File: Fishy Cat - Compute: NVIDIA OptiX

Blender 4.0, Blend File: Fishy Cat - Compute: NVIDIA OptiX (Seconds, Fewer Is Better)
NVIDIA RTX 3090: 10.64 (SE +/- 0.08, N = 9)
NVIDIA RTX 4070: 11.03 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070 SUPER: 9.45 (SE +/- 0.06, N = 13)
NVIDIA RTX 4070 TI: 9.02 (SE +/- 0.01, N = 3)
NVIDIA RTX 4070 TI SUPER: 8.32 (SE +/- 0.06, N = 13)

Blender

Blend File: Barbershop - Compute: NVIDIA OptiX

Blender 4.0, Blend File: Barbershop - Compute: NVIDIA OptiX (Seconds, Fewer Is Better)
NVIDIA RTX 3090: 54.30 (SE +/- 0.02, N = 2)
NVIDIA RTX 4070: 58.44 (SE +/- 0.04, N = 3)
NVIDIA RTX 4070 SUPER: 51.30 (SE +/- 0.10, N = 3)
NVIDIA RTX 4070 TI: 50.73 (SE +/- 0.05, N = 3)
NVIDIA RTX 4070 TI SUPER: 44.49 (SE +/- 0.08, N = 3)

ViennaCL

Test: CPU BLAS - dGEMV-N

ViennaCL 1.7.1, Test: CPU BLAS - dGEMV-N (GB/s, More Is Better)
NVIDIA RTX 3090: 103.0 (SE +/- 0.88, N = 3)
NVIDIA RTX 4070: 103.0 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070 SUPER: 102.0 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070 TI: 103.0 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070 TI SUPER: 78.5 (SE +/- 0.46, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

LuxCoreRender

Scene: Orange Juice - Acceleration: GPU

LuxCoreRender 2.6, Scene: Orange Juice - Acceleration: GPU (M samples/sec, More Is Better)
NVIDIA RTX 3090: 12.14 (SE +/- 0.01, N = 3; MIN: 10.24 / MAX: 16.71)
NVIDIA RTX 4070: 10.40 (SE +/- 0.03, N = 3; MIN: 8.31 / MAX: 13.9)
NVIDIA RTX 4070 SUPER: 11.72 (SE +/- 0.07, N = 3; MIN: 9.6 / MAX: 15.44)
NVIDIA RTX 4070 TI: 11.89 (SE +/- 0.00, N = 3; MIN: 9.85 / MAX: 15.88)
NVIDIA RTX 4070 TI SUPER: 13.64 (SE +/- 0.15, N = 4; MIN: 11.16 / MAX: 18.46)

ViennaCL

Test: CPU BLAS - sAXPY

ViennaCL 1.7.1, Test: CPU BLAS - sAXPY (GB/s, More Is Better)
NVIDIA RTX 3090: 154 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070: 153 (SE +/- 4.81, N = 3)
NVIDIA RTX 4070 SUPER: 156 (SE +/- 2.19, N = 3)
NVIDIA RTX 4070 TI: 156 (SE +/- 2.00, N = 3)
NVIDIA RTX 4070 TI SUPER: 120 (SE +/- 0.33, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

VkFFT

Test: FFT + iFFT C2C Bluestein benchmark in double precision

VkFFT 1.2.31, Test: FFT + iFFT C2C Bluestein benchmark in double precision (Benchmark Score, More Is Better)
NVIDIA RTX 3090: 4195 (SE +/- 9.84, N = 3)
NVIDIA RTX 4070: 3886 (SE +/- 4.51, N = 3)
NVIDIA RTX 4070 SUPER: 4451 (SE +/- 12.55, N = 3)
NVIDIA RTX 4070 TI: 4647 (SE +/- 11.35, N = 3)
NVIDIA RTX 4070 TI SUPER: 5047 (SE +/- 11.37, N = 3)
1. (CXX) g++ options: -O3 -lrt

vkpeak

int16-vec4

vkpeak 20230730, int16-vec4 (GIOPS, More Is Better)
NVIDIA RTX 3090: 16302.58 (SE +/- 28.77, N = 3)
NVIDIA RTX 4070 TI SUPER: 21124.09 (SE +/- 3.02, N = 3)

ViennaCL

Test: OpenCL BLAS - sAXPY

ViennaCL 1.7.1, Test: OpenCL BLAS - sAXPY (GB/s, More Is Better)
NVIDIA RTX 3090: 498 (SE +/- 0.58, N = 3)
NVIDIA RTX 4070: 389 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 SUPER: 392 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI: 393 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI SUPER: 469 (SE +/- 1.20, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 512 - Model: ResNet-50

PyTorch 2.1, Device: NVIDIA CUDA GPU - Batch Size: 512 - Model: ResNet-50 (batches/sec, More Is Better)
NVIDIA RTX 3090: 416.20 (SE +/- 0.40, N = 3; MIN: 355.45 / MAX: 419.05)
NVIDIA RTX 4070: 459.27 (SE +/- 0.43, N = 2; MIN: 405.48 / MAX: 461.88)
NVIDIA RTX 4070 SUPER: 504.27 (SE +/- 4.43, N = 2; MIN: 418.22 / MAX: 512.44)
NVIDIA RTX 4070 TI: 504.66 (SE +/- 0.83, N = 2; MIN: 424.27 / MAX: 509.08)
NVIDIA RTX 4070 TI SUPER: 529.49 (SE +/- 1.16, N = 3; MIN: 410.12 / MAX: 537.25)

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 256 - Model: ResNet-50

PyTorch 2.1, Device: NVIDIA CUDA GPU - Batch Size: 256 - Model: ResNet-50 (batches/sec, More Is Better)
NVIDIA RTX 3090: 416.89 (SE +/- 0.14, N = 3; MIN: 329.77 / MAX: 420.82)
NVIDIA RTX 4070: 459.93 (SE +/- 0.34, N = 3; MIN: 403.65 / MAX: 462.74)
NVIDIA RTX 4070 SUPER: 504.67 (SE +/- 1.39, N = 3; MIN: 412.34 / MAX: 514.07)
NVIDIA RTX 4070 TI SUPER: 529.14 (SE +/- 0.54, N = 3; MIN: 414.54 / MAX: 534.65)

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 32 - Model: ResNet-50

PyTorch 2.1, Device: NVIDIA CUDA GPU - Batch Size: 32 - Model: ResNet-50 (batches/sec, More Is Better)
NVIDIA RTX 3090: 420.29 (MIN: 376.81 / MAX: 421.58)
NVIDIA RTX 4070: 459.94 (MIN: 403.65 / MAX: 462.59)
NVIDIA RTX 4070 SUPER: 501.50 (MIN: 415.94 / MAX: 510.69)
NVIDIA RTX 4070 TI: 505.55 (MIN: 419.93 / MAX: 512.69)
NVIDIA RTX 4070 TI SUPER: 532.77 (MIN: 420.31 / MAX: 538.98)
SE as exported (fewer entries than results, so not matched to a GPU here): SE +/- 0.13, N = 2; SE +/- 2.17, N = 2; SE +/- 1.69, N = 3; SE +/- 0.70, N = 3

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 16 - Model: ResNet-50

PyTorch 2.1, Device: NVIDIA CUDA GPU - Batch Size: 16 - Model: ResNet-50 (batches/sec, More Is Better)
NVIDIA RTX 3090: 419.76 (MIN: 376.2 / MAX: 422.17)
NVIDIA RTX 4070: 458.39 (MIN: 404.5 / MAX: 461.01)
NVIDIA RTX 4070 SUPER: 509.45 (MIN: 430.1 / MAX: 516.48)
NVIDIA RTX 4070 TI: 502.92 (MIN: 415.65 / MAX: 520.39)
NVIDIA RTX 4070 TI SUPER: 531.96 (MIN: 422.98 / MAX: 539.81)
SE as exported (fewer entries than results, so not matched to a GPU here): SE +/- 0.89, N = 2; SE +/- 0.26, N = 3; SE +/- 2.23, N = 3; SE +/- 1.33, N = 3

VkFFT

Test: FFT + iFFT C2C multidimensional in single precision

VkFFT 1.2.31, Test: FFT + iFFT C2C multidimensional in single precision (Benchmark Score, More Is Better)
NVIDIA RTX 3090: 50856 (SE +/- 407.28, N = 15)
NVIDIA RTX 4070: 47212 (SE +/- 476.57, N = 5)
NVIDIA RTX 4070 SUPER: 50299 (SE +/- 407.19, N = 15)
NVIDIA RTX 4070 TI: 51528 (SE +/- 417.77, N = 15)
NVIDIA RTX 4070 TI SUPER: 59790 (SE +/- 251.10, N = 3)
1. (CXX) g++ options: -O3 -lrt

IndigoBench

Acceleration: OpenCL GPU - Scene: Supercar

IndigoBench 4.4, Acceleration: OpenCL GPU - Scene: Supercar (M samples/s, More Is Better)
NVIDIA RTX 3090: 52.01 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070: 48.52 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070 SUPER: 52.81 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070 TI: 53.59 (SE +/- 0.02, N = 3)
NVIDIA RTX 4070 TI SUPER: 61.34 (SE +/- 0.06, N = 3)

VkFFT

Test: FFT + iFFT R2C / C2R

VkFFT 1.2.31, Test: FFT + iFFT R2C / C2R (Benchmark Score, More Is Better)
NVIDIA RTX 3090: 48418 (SE +/- 320.62, N = 3)
NVIDIA RTX 4070: 47097 (SE +/- 745.02, N = 13)
NVIDIA RTX 4070 SUPER: 54794 (SE +/- 702.53, N = 15)
NVIDIA RTX 4070 TI: 55446 (SE +/- 520.37, N = 3)
NVIDIA RTX 4070 TI SUPER: 59378 (SE +/- 772.47, N = 15)
1. (CXX) g++ options: -O3 -lrt

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 64 - Model: ResNet-50

PyTorch 2.1, Device: NVIDIA CUDA GPU - Batch Size: 64 - Model: ResNet-50 (batches/sec, More Is Better)
NVIDIA RTX 3090: 419.03 (SE +/- 0.24, N = 3; MIN: 376 / MAX: 422)
NVIDIA RTX 4070: 458.36 (SE +/- 0.27, N = 3; MIN: 404.89 / MAX: 461.01)
NVIDIA RTX 4070 SUPER: 507.45 (SE +/- 0.92, N = 3; MIN: 423.41 / MAX: 512.88)
NVIDIA RTX 4070 TI: 505.62 (SE +/- 1.92, N = 3; MIN: 426.6 / MAX: 513.25)
NVIDIA RTX 4070 TI SUPER: 527.82 (SE +/- 1.58, N = 3; MIN: 419.39 / MAX: 534.44)
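
The five PyTorch ResNet-50 batch sizes reported above can be condensed into one figure by taking the geometric mean of the per-test ratios, which is a common way to average relative results. The sketch below compares only the RTX 4070 TI SUPER against the RTX 3090; that pairing and the variable names are assumptions chosen for illustration, and the numbers are copied from the results above.

```python
from statistics import geometric_mean

# PyTorch 2.1, ResNet-50, batches/sec (More Is Better).
# batch size -> (NVIDIA RTX 3090, NVIDIA RTX 4070 TI SUPER)
resnet50 = {
    16: (419.76, 531.96),
    32: (420.29, 532.77),
    64: (419.03, 527.82),
    256: (416.89, 529.14),
    512: (416.20, 529.49),
}

# For a more-is-better metric the ratio is new score / baseline score.
ratios = [ti_super / rtx3090 for rtx3090, ti_super in resnet50.values()]
overall = geometric_mean(ratios)

print(f"Geometric mean ratio over {len(ratios)} batch sizes: {overall:.3f}")
```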

Blender

Blend File: BMW27 - Compute: NVIDIA OptiX

Blender 4.0, Blend File: BMW27 - Compute: NVIDIA OptiX (Seconds, Fewer Is Better)
NVIDIA RTX 3090: 6.31 (SE +/- 0.06, N = 14)
NVIDIA RTX 4070: 6.21 (SE +/- 0.01, N = 3)
NVIDIA RTX 4070 SUPER: 5.57 (SE +/- 0.06, N = 13)
NVIDIA RTX 4070 TI: 5.43 (SE +/- 0.02, N = 3)
NVIDIA RTX 4070 TI SUPER: 5.04 (SE +/- 0.06, N = 14)

FAHBench

FAHBench 2.3.2 (Ns Per Day, More Is Better)
NVIDIA RTX 3090: 343.02 (SE +/- 0.26, N = 3)
NVIDIA RTX 4070: 317.20 (SE +/- 0.12, N = 3)
NVIDIA RTX 4070 SUPER: 366.06 (SE +/- 0.39, N = 3)
NVIDIA RTX 4070 TI: 382.16 (SE +/- 0.26, N = 3)
NVIDIA RTX 4070 TI SUPER: 394.74 (SE +/- 0.22, N = 3)

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 256 - Model: ResNet-152

PyTorch 2.1, Device: NVIDIA CUDA GPU - Batch Size: 256 - Model: ResNet-152 (batches/sec, More Is Better)
NVIDIA RTX 3090: 161.01 (MIN: 138.12 / MAX: 165.16)
NVIDIA RTX 4070: 187.27 (MIN: 179.9 / MAX: 188.08)
NVIDIA RTX 4070 SUPER: 194.58 (MIN: 183.74 / MAX: 198.52)
NVIDIA RTX 4070 TI: 195.86 (MIN: 181.64 / MAX: 199.2)
NVIDIA RTX 4070 TI SUPER: 198.70 (MIN: 185.21 / MAX: 203.36)
SE as exported (fewer entries than results, so not matched to a GPU here): SE +/- 0.17, N = 3; SE +/- 1.14, N = 2; SE +/- 0.19, N = 2; SE +/- 0.95, N = 3

ViennaCL

Test: CPU BLAS - sCOPY

ViennaCL 1.7.1, Test: CPU BLAS - sCOPY (GB/s, More Is Better)
NVIDIA RTX 3090: 132 (SE +/- 1.20, N = 3)
NVIDIA RTX 4070: 131 (SE +/- 1.20, N = 3)
NVIDIA RTX 4070 SUPER: 132 (SE +/- 1.20, N = 3)
NVIDIA RTX 4070 TI: 132 (SE +/- 0.88, N = 3)
NVIDIA RTX 4070 TI SUPER: 107 (SE +/- 0.67, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 32 - Model: ResNet-152

PyTorch 2.1, Device: NVIDIA CUDA GPU - Batch Size: 32 - Model: ResNet-152 (batches/sec, More Is Better)
NVIDIA RTX 3090: 163.74 (MIN: 144.93 / MAX: 165.03)
NVIDIA RTX 4070: 187.69 (MIN: 182.03 / MAX: 188.31)
NVIDIA RTX 4070 SUPER: 195.39 (MIN: 183.94 / MAX: 198.7)
NVIDIA RTX 4070 TI: 198.82 (MIN: 188.33 / MAX: 201.47)
NVIDIA RTX 4070 TI SUPER: 197.82 (MIN: 176.19 / MAX: 201.63)
SE as exported (fewer entries than results, so not matched to a GPU here): SE +/- 0.29, N = 3; SE +/- 0.28, N = 3

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 16 - Model: ResNet-152

PyTorch 2.1, Device: NVIDIA CUDA GPU - Batch Size: 16 - Model: ResNet-152 (batches/sec, More Is Better)
NVIDIA RTX 3090: 164.14 (MIN: 145.67 / MAX: 165.38)
NVIDIA RTX 4070: 187.26 (MIN: 179.81 / MAX: 188.21)
NVIDIA RTX 4070 SUPER: 195.40 (MIN: 186.09 / MAX: 197.7)
NVIDIA RTX 4070 TI: 194.29 (MIN: 182.25 / MAX: 197.39)
NVIDIA RTX 4070 TI SUPER: 198.58 (MIN: 183.91 / MAX: 201.98)
SE as exported (fewer entries than results, so not matched to a GPU here): SE +/- 0.29, N = 3; SE +/- 0.33, N = 3

TensorFlow

Device: GPU - Batch Size: 1 - Model: AlexNet

TensorFlow 2.12, Device: GPU - Batch Size: 1 - Model: AlexNet (images/sec, More Is Better)
NVIDIA RTX 3090: 14.45 (SE +/- 0.20, N = 15)
NVIDIA RTX 4070: 14.04 (SE +/- 0.16, N = 3)
NVIDIA RTX 4070 SUPER: 13.92 (SE +/- 0.22, N = 2)
NVIDIA RTX 4070 TI: 14.79 (SE +/- 0.06, N = 2)
NVIDIA RTX 4070 TI SUPER: 12.26 (SE +/- 0.13, N = 15)

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 512 - Model: ResNet-152

PyTorch 2.1, Device: NVIDIA CUDA GPU - Batch Size: 512 - Model: ResNet-152 (batches/sec, More Is Better)
NVIDIA RTX 3090: 164.35 (MIN: 149.91 / MAX: 166.09)
NVIDIA RTX 4070: 187.51 (MIN: 181.57 / MAX: 188.05)
NVIDIA RTX 4070 SUPER: 195.30 (MIN: 182 / MAX: 199.43)
NVIDIA RTX 4070 TI: 194.87 (MIN: 180.8 / MAX: 198)
NVIDIA RTX 4070 TI SUPER: 198.01 (MIN: 185.3 / MAX: 202.59)
SE as exported (fewer entries than results, so not matched to a GPU here): SE +/- 0.33, N = 2; SE +/- 0.05, N = 3; SE +/- 1.38, N = 2; SE +/- 0.81, N = 3

Waifu2x-NCNN Vulkan

Scale: 2x - Denoise: 3 - TAA: Yes

Waifu2x-NCNN Vulkan 20200818, Scale: 2x - Denoise: 3 - TAA: Yes (Seconds, Fewer Is Better)
NVIDIA RTX 3090: 3.202 (SE +/- 0.011, N = 3)
NVIDIA RTX 4070: 3.168 (SE +/- 0.028, N = 3)
NVIDIA RTX 4070 SUPER: 2.855 (SE +/- 0.014, N = 3)
NVIDIA RTX 4070 TI: 2.854 (SE +/- 0.009, N = 3)
NVIDIA RTX 4070 TI SUPER: 2.660 (SE +/- 0.028, N = 3)

vkpeak

int16-scalar

vkpeak 20230730, int16-scalar (GIOPS, More Is Better)
NVIDIA RTX 3090: 13225.17 (SE +/- 23.55, N = 3)
NVIDIA RTX 4070 TI SUPER: 15859.37 (SE +/- 22.68, N = 3)

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 64 - Model: ResNet-152

PyTorch 2.1, Device: NVIDIA CUDA GPU - Batch Size: 64 - Model: ResNet-152 (batches/sec, More Is Better)
NVIDIA RTX 3090: 164.14 (MIN: 149 / MAX: 165)
NVIDIA RTX 4070: 186.63 (MIN: 180.51 / MAX: 187.79)
NVIDIA RTX 4070 SUPER: 196.07 (MIN: 171.95 / MAX: 199.96)
NVIDIA RTX 4070 TI: 197.02 (MIN: 183.92 / MAX: 200.54)
NVIDIA RTX 4070 TI SUPER: 196.50 (MIN: 179.34 / MAX: 200)
SE as exported (fewer entries than results, so not matched to a GPU here): SE +/- 0.34, N = 3; SE +/- 0.51, N = 3; SE +/- 0.78, N = 2; SE +/- 0.20, N = 3

Libplacebo

Test: hdr_lut

Libplacebo 5.229.1, Test: hdr_lut (FPS, More Is Better)
NVIDIA RTX 3090: 3313.26 (SE +/- 13.62, N = 3)
NVIDIA RTX 4070: 3927.11 (SE +/- 10.06, N = 3)
NVIDIA RTX 4070 SUPER: 3905.98 (SE +/- 12.09, N = 3)
NVIDIA RTX 4070 TI: 3971.61 (SE +/- 5.47, N = 3)
NVIDIA RTX 4070 TI SUPER: 3822.16 (SE +/- 16.37, N = 3)
1. (CXX) g++ options: -lm -pthread -ldl -fvisibility=hidden -std=c++20 -O2 -fno-math-errno -fPIC -MD -MQ -MF

vkpeak

fp16-vec4

vkpeak 20230730, fp16-vec4 (GFLOPS, More Is Better)
NVIDIA RTX 3090: 39746.91 (SE +/- 75.11, N = 3)
NVIDIA RTX 4070 TI SUPER: 47192.56 (SE +/- 76.15, N = 3)

vkpeak

fp32-vec4

vkpeak 20230730, fp32-vec4 (GFLOPS, More Is Better)
NVIDIA RTX 3090: 26563.72 (SE +/- 1.51, N = 3)
NVIDIA RTX 4070 TI SUPER: 31591.71 (SE +/- 43.84, N = 3)

vkpeak

fp16-scalar

vkpeak 20230730, fp16-scalar (GFLOPS, More Is Better)
NVIDIA RTX 3090: 20080.47 (SE +/- 34.71, N = 3)
NVIDIA RTX 4070 TI SUPER: 23825.05 (SE +/- 34.93, N = 3)

vkpeak

int32-vec4

vkpeak 20230730, int32-vec4 (GIOPS, More Is Better)
NVIDIA RTX 3090: 19996.92 (SE +/- 2.34, N = 3)
NVIDIA RTX 4070 TI SUPER: 23733.30 (SE +/- 34.76, N = 3)

vkpeak

fp32-scalar

vkpeak 20230730, fp32-scalar (GFLOPS, More Is Better)
NVIDIA RTX 3090: 20263.13 (SE +/- 36.15, N = 3)
NVIDIA RTX 4070 TI SUPER: 23883.53 (SE +/- 38.13, N = 3)

vkpeak

int32-scalar

vkpeak 20230730, int32-scalar (GIOPS, More Is Better)
NVIDIA RTX 3090: 20280.33 (SE +/- 3.21, N = 3)
NVIDIA RTX 4070 TI SUPER: 23874.85 (SE +/- 0.36, N = 3)

VkFFT

Test: FFT + iFFT C2C Bluestein in single precision

VkFFT 1.2.31, Test: FFT + iFFT C2C Bluestein in single precision (Benchmark Score, More Is Better)
NVIDIA RTX 3090: 14205 (SE +/- 115.62, N = 3)
NVIDIA RTX 4070: 13714 (SE +/- 52.09, N = 3)
NVIDIA RTX 4070 SUPER: 15166 (SE +/- 102.52, N = 3)
NVIDIA RTX 4070 TI: 15125 (SE +/- 118.41, N = 3)
NVIDIA RTX 4070 TI SUPER: 16141 (SE +/- 73.00, N = 3)
1. (CXX) g++ options: -O3 -lrt

vkpeak

fp64-vec4

vkpeak 20230730, fp64-vec4 (GFLOPS, More Is Better)
NVIDIA RTX 3090: 638.72 (SE +/- 0.02, N = 3)
NVIDIA RTX 4070 TI SUPER: 749.76 (SE +/- 0.00, N = 3)

vkpeak

fp64-scalar

vkpeak 20230730, fp64-scalar (GFLOPS, More Is Better)
NVIDIA RTX 3090: 638.70 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070 TI SUPER: 750.47 (SE +/- 0.01, N = 3)

ViennaCL

Test: OpenCL BLAS - dGEMV-N

ViennaCL 1.7.1, Test: OpenCL BLAS - dGEMV-N (GB/s, More Is Better)
NVIDIA RTX 3090: 187 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070: 209 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070 SUPER: 210 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070 TI: 211 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070 TI SUPER: 218 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMV-T

ViennaCL 1.7.1, Test: OpenCL BLAS - dGEMV-T (GB/s, More Is Better)
NVIDIA RTX 3090: 374
NVIDIA RTX 4070: 387
NVIDIA RTX 4070 SUPER: 389
NVIDIA RTX 4070 TI: 391
NVIDIA RTX 4070 TI SUPER: 424
SE as exported (fewer entries than results, so not matched to a GPU here): SE +/- 0.00, N = 3; SE +/- 0.00, N = 3; SE +/- 0.33, N = 3; SE +/- 0.33, N = 3
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - sDOT

ViennaCL 1.7.1, Test: OpenCL BLAS - sDOT (GB/s, More Is Better)
NVIDIA RTX 3090: 376 (SE +/- 0.58, N = 3)
NVIDIA RTX 4070: 362 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 SUPER: 370 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI: 365 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI SUPER: 410 (SE +/- 1.00, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - sCOPY

ViennaCL 1.7.1, Test: OpenCL BLAS - sCOPY (GB/s, More Is Better)
NVIDIA RTX 3090: 363 (SE +/- 1.00, N = 3)
NVIDIA RTX 4070: 330 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070 SUPER: 334 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070 TI: 336 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI SUPER: 373 (SE +/- 0.33, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

cl-mem

Benchmark: Copy

cl-mem 2017-01-13, Benchmark: Copy (GB/s, More Is Better)
NVIDIA RTX 3090: 360.8 (SE +/- 0.22, N = 3)
NVIDIA RTX 4070: 330.3 (SE +/- 0.09, N = 3)
NVIDIA RTX 4070 SUPER: 331.8 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070 TI: 333.3 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI SUPER: 370.7 (SE +/- 0.00, N = 3)
1. (CC) gcc options: -O2 -flto -lOpenCL

ViennaCL

Test: CPU BLAS - dGEMM-TT

ViennaCL 1.7.1, Test: CPU BLAS - dGEMM-TT (GFLOPs/s, More Is Better)
NVIDIA RTX 3090: 113 (SE +/- 0.88, N = 3)
NVIDIA RTX 4070: 118 (SE +/- 1.20, N = 3)
NVIDIA RTX 4070 SUPER: 122 (SE +/- 2.08, N = 3)
NVIDIA RTX 4070 TI: 124 (SE +/- 2.08, N = 3)
NVIDIA RTX 4070 TI SUPER: 117 (SE +/- 2.91, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dGEMM-TN

ViennaCL 1.7.1, Test: CPU BLAS - dGEMM-TN (GFLOPs/s, More Is Better)
NVIDIA RTX 3090: 121 (SE +/- 2.08, N = 3)
NVIDIA RTX 4070: 121 (SE +/- 2.31, N = 3)
NVIDIA RTX 4070 SUPER: 115 (SE +/- 1.00, N = 2)
NVIDIA RTX 4070 TI: 125 (SE +/- 2.08, N = 3)
NVIDIA RTX 4070 TI SUPER: 120 (SE +/- 3.00, N = 2)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dGEMM-NN

ViennaCL 1.7.1, Test: CPU BLAS - dGEMM-NN (GFLOPs/s, More Is Better)
NVIDIA RTX 3090: 113 (SE +/- 1.86, N = 3)
NVIDIA RTX 4070: 122 (SE +/- 1.86, N = 3)
NVIDIA RTX 4070 SUPER: 119 (SE +/- 4.04, N = 3)
NVIDIA RTX 4070 TI: 117 (SE +/- 1.15, N = 3)
NVIDIA RTX 4070 TI SUPER: 122 (SE +/- 1.50, N = 2)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 16 - Model: Efficientnet_v2_l

PyTorch 2.1, Device: NVIDIA CUDA GPU - Batch Size: 16 - Model: Efficientnet_v2_l (batches/sec, More Is Better)
NVIDIA RTX 3090: 98.11 (MIN: 89.88 / MAX: 100.25)
NVIDIA RTX 4070: 103.68 (MIN: 96.86 / MAX: 105.56)
NVIDIA RTX 4070 TI: 103.45 (MIN: 95.22 / MAX: 105.88)
NVIDIA RTX 4070 TI SUPER: 103.66 (MIN: 93.46 / MAX: 105.95)
SE as exported (fewer entries than results, so not matched to a GPU here): SE +/- 0.53, N = 2; SE +/- 0.52, N = 2; SE +/- 0.33, N = 3

TensorFlow

Device: GPU - Batch Size: 1 - Model: ResNet-50

TensorFlow 2.12, Device: GPU - Batch Size: 1 - Model: ResNet-50 (images/sec, More Is Better)
NVIDIA RTX 3090: 4.35
NVIDIA RTX 4070: 4.34
NVIDIA RTX 4070 SUPER: 4.35
NVIDIA RTX 4070 TI: 4.32
NVIDIA RTX 4070 TI SUPER: 4.14
SE as exported (fewer entries than results, so not matched to a GPU here): SE +/- 0.03, N = 3; SE +/- 0.01, N = 3; SE +/- 0.02, N = 2; SE +/- 0.02, N = 3

TensorFlow

Device: GPU - Batch Size: 1 - Model: GoogLeNet

TensorFlow 2.12, Device: GPU - Batch Size: 1 - Model: GoogLeNet (images/sec, More Is Better)
NVIDIA RTX 3090: 12.82 (SE +/- 0.07, N = 3)
NVIDIA RTX 4070: 12.78 (SE +/- 0.10, N = 3)
NVIDIA RTX 4070 SUPER: 12.62 (SE +/- 0.17, N = 2)
NVIDIA RTX 4070 TI: 12.79 (SE +/- 0.30, N = 2)
NVIDIA RTX 4070 TI SUPER: 12.24 (SE +/- 0.05, N = 3)

TensorFlow

Device: GPU - Batch Size: 32 - Model: GoogLeNet

TensorFlow 2.12, Device: GPU - Batch Size: 32 - Model: GoogLeNet (images/sec, More Is Better)
NVIDIA RTX 3090: 15.67 (SE +/- 0.06, N = 3)
NVIDIA RTX 4070: 15.63 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070 SUPER: 15.61 (SE +/- 0.01, N = 2)
NVIDIA RTX 4070 TI: 15.81 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070 TI SUPER: 15.11 (SE +/- 0.06, N = 3)

TensorFlow

Device: GPU - Batch Size: 1 - Model: VGG-16

TensorFlow 2.12, Device: GPU - Batch Size: 1 - Model: VGG-16 (images/sec, More Is Better)
NVIDIA RTX 3090: 1.38
NVIDIA RTX 4070: 1.36
NVIDIA RTX 4070 SUPER: 1.35
NVIDIA RTX 4070 TI: 1.38
NVIDIA RTX 4070 TI SUPER: 1.32
SE as exported (fewer entries than results, so not matched to a GPU here): SE +/- 0.01, N = 3; SE +/- 0.01, N = 2; SE +/- 0.01, N = 3; SE +/- 0.00, N = 3

TensorFlow

Device: GPU - Batch Size: 64 - Model: ResNet-50

TensorFlow 2.12, Device: GPU - Batch Size: 64 - Model: ResNet-50 (images/sec, More Is Better)
NVIDIA RTX 3090: 5.57 (SE +/- 0.01, N = 3)
NVIDIA RTX 4070: 5.55 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 SUPER: 5.55 (SE +/- 0.01, N = 2)
NVIDIA RTX 4070 TI: 5.53 (SE +/- 0.01, N = 2)
NVIDIA RTX 4070 TI SUPER: 5.33 (SE +/- 0.02, N = 3)

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 512 - Model: Efficientnet_v2_l

PyTorch 2.1, Device: NVIDIA CUDA GPU - Batch Size: 512 - Model: Efficientnet_v2_l (batches/sec, More Is Better)
NVIDIA RTX 3090: 99.25 (MIN: 91.16 / MAX: 101.18)
NVIDIA RTX 4070: 101.43 (MIN: 93.27 / MAX: 103.58)
NVIDIA RTX 4070 SUPER: 103.57 (MIN: 95.95 / MAX: 105.54)
NVIDIA RTX 4070 TI: 103.50 (MIN: 94.95 / MAX: 105.61)
NVIDIA RTX 4070 TI SUPER: 103.53 (MIN: 88.81 / MAX: 104.8)
SE as exported (fewer entries than results, so not matched to a GPU here): SE +/- 0.19, N = 3; SE +/- 0.39, N = 3; SE +/- 0.36, N = 2

ViennaCL

Test: CPU BLAS - dGEMM-NT

ViennaCL 1.7.1, Test: CPU BLAS - dGEMM-NT (GFLOPs/s, More Is Better)
NVIDIA RTX 3090: 119 (SE +/- 3.28, N = 3)
NVIDIA RTX 4070: 122 (SE +/- 1.76, N = 3)
NVIDIA RTX 4070 SUPER: 117 (SE +/- 2.08, N = 3)
NVIDIA RTX 4070 TI: 118 (SE +/- 1.20, N = 3)
NVIDIA RTX 4070 TI SUPER: 119 (SE +/- 3.50, N = 2)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

TensorFlow

Device: GPU - Batch Size: 64 - Model: GoogLeNet

TensorFlow 2.12, Device: GPU - Batch Size: 64 - Model: GoogLeNet (images/sec, More Is Better)
NVIDIA RTX 3090: 15.63
NVIDIA RTX 4070: 15.54
NVIDIA RTX 4070 SUPER: 15.52
NVIDIA RTX 4070 TI: 15.50
NVIDIA RTX 4070 TI SUPER: 15.00
SE as exported (fewer entries than results, so not matched to a GPU here): SE +/- 0.08, N = 3; SE +/- 0.07, N = 3; SE +/- 0.06, N = 2; SE +/- 0.09, N = 3

TensorFlow

Device: GPU - Batch Size: 32 - Model: ResNet-50

TensorFlow 2.12, Device: GPU - Batch Size: 32 - Model: ResNet-50 (images/sec, More Is Better)
NVIDIA RTX 3090: 5.57 (SE +/- 0.01, N = 3)
NVIDIA RTX 4070: 5.55 (SE +/- 0.01, N = 2)
NVIDIA RTX 4070 SUPER: 5.51 (SE +/- 0.01, N = 2)
NVIDIA RTX 4070 TI: 5.50 (SE +/- 0.02, N = 3)
NVIDIA RTX 4070 TI SUPER: 5.35 (SE +/- 0.00, N = 3)

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 256 - Model: Efficientnet_v2_l

PyTorch 2.1, Device: NVIDIA CUDA GPU - Batch Size: 256 - Model: Efficientnet_v2_l (batches/sec, More Is Better)
NVIDIA RTX 3090: 99.43 (MIN: 90.49 / MAX: 101.97)
NVIDIA RTX 4070: 101.24 (MIN: 93.33 / MAX: 102.92)
NVIDIA RTX 4070 SUPER: 103.17 (MIN: 95.79 / MAX: 105.15)
NVIDIA RTX 4070 TI: 103.24 (MIN: 95.41 / MAX: 104.9)
NVIDIA RTX 4070 TI SUPER: 102.83 (MIN: 93.16 / MAX: 105.07)
SE as exported (fewer entries than results, so not matched to a GPU here): SE +/- 0.57, N = 3; SE +/- 0.05, N = 2; SE +/- 0.18, N = 3

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 64 - Model: Efficientnet_v2_l

PyTorch 2.1, Device: NVIDIA CUDA GPU - Batch Size: 64 - Model: Efficientnet_v2_l (batches/sec, More Is Better)
NVIDIA RTX 3090: 99.84 (SE +/- 0.14, N = 3; MIN: 92.73 / MAX: 101.46)
NVIDIA RTX 4070: 101.55 (SE +/- 0.45, N = 3; MIN: 93.44 / MAX: 103.08)
NVIDIA RTX 4070 SUPER: 102.60 (SE +/- 1.49, N = 2; MIN: 79.69 / MAX: 105.28)
NVIDIA RTX 4070 TI: 103.20 (SE +/- 0.39, N = 2; MIN: 95.31 / MAX: 105.27)
NVIDIA RTX 4070 TI SUPER: 103.49 (SE +/- 0.13, N = 3; MIN: 93.23 / MAX: 105.43)

TensorFlow

Device: GPU - Batch Size: 16 - Model: VGG-16

TensorFlow 2.12, Device: GPU - Batch Size: 16 - Model: VGG-16 (images/sec, More Is Better)
NVIDIA RTX 3090: 1.49
NVIDIA RTX 4070: 1.50
NVIDIA RTX 4070 SUPER: 1.48
NVIDIA RTX 4070 TI: 1.49
NVIDIA RTX 4070 TI SUPER: 1.45
SE as exported (fewer entries than results, so not matched to a GPU here): SE +/- 0.00, N = 3; SE +/- 0.01, N = 2; SE +/- 0.00, N = 2; SE +/- 0.00, N = 3

TensorFlow

Device: GPU - Batch Size: 64 - Model: VGG-16

TensorFlow 2.12, Device: GPU - Batch Size: 64 - Model: VGG-16 (images/sec, More Is Better)
NVIDIA RTX 3090: 1.51 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 1.50 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI: 1.50 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI SUPER: 1.46 (SE +/- 0.00, N = 3)

TensorFlow

Device: GPU - Batch Size: 16 - Model: ResNet-50

TensorFlow 2.12, Device: GPU - Batch Size: 16 - Model: ResNet-50 (images/sec, More Is Better)
NVIDIA RTX 3090: 5.49 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 5.49 (SE +/- 0.02, N = 3)
NVIDIA RTX 4070 SUPER: 5.46 (SE +/- 0.00, N = 2)
NVIDIA RTX 4070 TI: 5.46 (SE +/- 0.01, N = 3)
NVIDIA RTX 4070 TI SUPER: 5.32 (SE +/- 0.02, N = 3)

Libplacebo

Test: av1_grain_lap

Libplacebo 5.229.1, Test: av1_grain_lap (FPS, More Is Better)
NVIDIA RTX 3090: 4096.48 (SE +/- 42.82, N = 3)
NVIDIA RTX 4070: 4103.40 (SE +/- 66.69, N = 3)
NVIDIA RTX 4070 SUPER: 4171.00 (SE +/- 5.52, N = 3)
NVIDIA RTX 4070 TI: 4140.87 (SE +/- 21.66, N = 3)
NVIDIA RTX 4070 TI SUPER: 4044.72 (SE +/- 48.74, N = 3)
1. (CXX) g++ options: -lm -pthread -ldl -fvisibility=hidden -std=c++20 -O2 -fno-math-errno -fPIC -MD -MQ -MF

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 1 - Model: Efficientnet_v2_l

PyTorch 2.1, Device: NVIDIA CUDA GPU - Batch Size: 1 - Model: Efficientnet_v2_l (batches/sec, More Is Better)
NVIDIA RTX 3090: 105.55 (MIN: 91.76 / MAX: 107.42)
NVIDIA RTX 4070: 107.59 (MIN: 98.77 / MAX: 109.43)
NVIDIA RTX 4070 SUPER: 106.37 (MIN: 97.91 / MAX: 108.16)
NVIDIA RTX 4070 TI: 108.59 (MIN: 99.04 / MAX: 110.68)
NVIDIA RTX 4070 TI SUPER: 105.86 (MIN: 95.05 / MAX: 107.6)
SE as exported (fewer entries than results, so not matched to a GPU here): SE +/- 0.33, N = 3; SE +/- 0.55, N = 3; SE +/- 0.24, N = 2

TensorFlow

Device: GPU - Batch Size: 16 - Model: AlexNet

TensorFlow 2.12, Device: GPU - Batch Size: 16 - Model: AlexNet (images/sec, More Is Better)
NVIDIA RTX 3090: 31.98
NVIDIA RTX 4070: 31.45
NVIDIA RTX 4070 SUPER: 31.59
NVIDIA RTX 4070 TI: 31.70
NVIDIA RTX 4070 TI SUPER: 31.10
SE as exported (fewer entries than results, so not matched to a GPU here): SE +/- 0.07, N = 3; SE +/- 0.17, N = 3; SE +/- 0.08, N = 3; SE +/- 0.07, N = 3

TensorFlow

Device: GPU - Batch Size: 32 - Model: VGG-16

TensorFlow 2.12, Device: GPU - Batch Size: 32 - Model: VGG-16 (images/sec, More Is Better)
NVIDIA RTX 3090: 1.50 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 1.50 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 SUPER: 1.50 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI: 1.50 (SE +/- 0.00, N = 2)
NVIDIA RTX 4070 TI SUPER: 1.46 (SE +/- 0.00, N = 3)

TensorFlow

Device: GPU - Batch Size: 256 - Model: VGG-16

TensorFlow 2.12, Device: GPU - Batch Size: 256 - Model: VGG-16 (images/sec, More Is Better)
NVIDIA RTX 3090: 1.51
NVIDIA RTX 4070 TI: 1.50
NVIDIA RTX 4070 TI SUPER: 1.47
SE as exported (fewer entries than results, so not matched to a GPU here): SE +/- 0.00, N = 3; SE +/- 0.00, N = 3

TensorFlow

Device: GPU - Batch Size: 16 - Model: GoogLeNet

TensorFlow 2.12, Device: GPU - Batch Size: 16 - Model: GoogLeNet (images/sec, More Is Better)
NVIDIA RTX 3090: 15.68 (SE +/- 0.05, N = 3)
NVIDIA RTX 4070: 15.66 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070 SUPER: 15.67 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070 TI: 15.69 (SE +/- 0.07, N = 3)
NVIDIA RTX 4070 TI SUPER: 15.29 (SE +/- 0.02, N = 3)

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 1 - Model: ResNet-152

PyTorch 2.1, Device: NVIDIA CUDA GPU - Batch Size: 1 - Model: ResNet-152 (batches/sec, More Is Better)
NVIDIA RTX 3090: 197.12 (MIN: 137.37 / MAX: 198.9)
NVIDIA RTX 4070: 198.18 (MIN: 181.27 / MAX: 200.06)
NVIDIA RTX 4070 SUPER: 201.94 (MIN: 183.53 / MAX: 206.5)
NVIDIA RTX 4070 TI: 201.19 (MIN: 180.79 / MAX: 203.92)
NVIDIA RTX 4070 TI SUPER: 200.46 (MIN: 177.25 / MAX: 203.31)
SE as exported (fewer entries than results, so not matched to a GPU here): SE +/- 0.09, N = 2; SE +/- 0.36, N = 3; SE +/- 0.73, N = 3; SE +/- 0.38, N = 3

TensorFlow

Device: GPU - Batch Size: 32 - Model: AlexNet

TensorFlow 2.12, Device: GPU - Batch Size: 32 - Model: AlexNet (images/sec, More Is Better)
NVIDIA RTX 3090: 33.53 (SE +/- 0.05, N = 3)
NVIDIA RTX 4070: 33.32 (SE +/- 0.18, N = 3)
NVIDIA RTX 4070 SUPER: 33.40 (SE +/- 0.15, N = 2)
NVIDIA RTX 4070 TI: 33.29 (SE +/- 0.04, N = 3)
NVIDIA RTX 4070 TI SUPER: 32.88 (SE +/- 0.19, N = 3)

TensorFlow

Device: GPU - Batch Size: 256 - Model: AlexNet

TensorFlow 2.12, Device: GPU - Batch Size: 256 - Model: AlexNet (images/sec, More Is Better)
NVIDIA RTX 3090: 34.46 (SE +/- 0.07, N = 3)
NVIDIA RTX 4070 SUPER: 34.16 (SE +/- 0.01, N = 3)
NVIDIA RTX 4070 TI: 34.61 (SE +/- 0.07, N = 2)
NVIDIA RTX 4070 TI SUPER: 33.95 (SE +/- 0.05, N = 3)

TensorFlow

Device: GPU - Batch Size: 512 - Model: AlexNet

TensorFlow 2.12, Device: GPU - Batch Size: 512 - Model: AlexNet (images/sec, More Is Better)
NVIDIA RTX 3090: 35.58 (SE +/- 0.01, N = 3)
NVIDIA RTX 4070: 35.21 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070 SUPER: 35.10 (SE +/- 0.02, N = 2)
NVIDIA RTX 4070 TI: 35.44 (SE +/- 0.09, N = 2)
NVIDIA RTX 4070 TI SUPER: 35.02 (SE +/- 0.01, N = 3)

TensorFlow

Device: GPU - Batch Size: 64 - Model: AlexNet

TensorFlow 2.12, Device: GPU - Batch Size: 64 - Model: AlexNet (images/sec, More Is Better)
NVIDIA RTX 3090: 33.93
NVIDIA RTX 4070: 33.93
NVIDIA RTX 4070 SUPER: 33.97
NVIDIA RTX 4070 TI: 34.06
NVIDIA RTX 4070 TI SUPER: 33.55
SE as exported (fewer entries than results, so not matched to a GPU here): SE +/- 0.08, N = 3; SE +/- 0.14, N = 3; SE +/- 0.06, N = 3; SE +/- 0.06, N = 3

NCNN

Target: Vulkan GPU - Model: FastestDet

NCNN 20230517, Target: Vulkan GPU - Model: FastestDet (ms, Fewer Is Better)
NVIDIA RTX 3090: 2.50 (SE +/- 0.08, N = 8; MIN: 2.1 / MAX: 32.36)
NVIDIA RTX 4070: 2.34 (SE +/- 0.10, N = 9; MIN: 2 / MAX: 3.86)
NVIDIA RTX 4070 SUPER: 2.86 (SE +/- 0.29, N = 9; MIN: 2.17 / MAX: 577.17)
NVIDIA RTX 4070 TI: 2.84 (SE +/- 0.12, N = 8; MIN: 2.4 / MAX: 5.07)
NVIDIA RTX 4070 TI SUPER: 2.54 (SE +/- 0.26, N = 3; MIN: 2.14 / MAX: 4.21)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: vision_transformer

NCNN 20230517, Target: Vulkan GPU - Model: vision_transformer (ms, Fewer Is Better)
NVIDIA RTX 3090: 327.82 (SE +/- 52.80, N = 9; MIN: 46.48 / MAX: 1816.93)
NVIDIA RTX 4070: 281.56 (SE +/- 61.31, N = 9; MIN: 46.48 / MAX: 1913.33)
NVIDIA RTX 4070 SUPER: 844.61 (SE +/- 87.53, N = 9; MIN: 46.34 / MAX: 1866.93)
NVIDIA RTX 4070 TI: 390.18 (SE +/- 25.65, N = 9; MIN: 46.49 / MAX: 1816.77)
NVIDIA RTX 4070 TI SUPER: 312.10 (SE +/- 57.46, N = 12; MIN: 47.85 / MAX: 1850.09)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: regnety_400m

NCNN 20230517, Target: Vulkan GPU - Model: regnety_400m (ms, Fewer Is Better)
NVIDIA RTX 3090: 6.47 (SE +/- 0.32, N = 8; MIN: 5.44 / MAX: 9.3)
NVIDIA RTX 4070: 6.21 (SE +/- 0.24, N = 12; MIN: 5.53 / MAX: 8.99)
NVIDIA RTX 4070 SUPER: 11.11 (SE +/- 3.28, N = 9; MIN: 5.49 / MAX: 4942.19)
NVIDIA RTX 4070 TI: 5.89 (SE +/- 0.18, N = 12; MIN: 5.42 / MAX: 7.57)
NVIDIA RTX 4070 TI SUPER: 6.59 (SE +/- 0.26, N = 12; MIN: 5.45 / MAX: 9.09)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: squeezenet_ssd

OpenBenchmarking.org - NCNN 20230517 - Target: Vulkan GPU - Model: squeezenet_ssd
ms, Fewer Is Better

NVIDIA RTX 3090: 4.90 (SE +/- 0.19, N = 3; MIN: 4.47 / MAX: 5.27)
NVIDIA RTX 4070: 5.18 (SE +/- 0.12, N = 12; MIN: 4.67 / MAX: 6.88)
NVIDIA RTX 4070 SUPER: 6.86 (SE +/- 1.76, N = 9; MIN: 4.34 / MAX: 1630.01)
NVIDIA RTX 4070 TI: 5.36 (SE +/- 0.29, N = 9; MIN: 4.55 / MAX: 496.3)
NVIDIA RTX 4070 TI SUPER: 5.11 (SE +/- 0.54, N = 3; MIN: 4.43 / MAX: 8.4)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: yolov4-tiny

OpenBenchmarking.org - NCNN 20230517 - Target: Vulkan GPU - Model: yolov4-tiny
ms, Fewer Is Better

NVIDIA RTX 3090: 11.29 (SE +/- 0.21, N = 3; MIN: 10.82 / MAX: 11.93)
NVIDIA RTX 4070: 20.74 (SE +/- 5.37, N = 12; MIN: 10.3 / MAX: 854.36)
NVIDIA RTX 4070 SUPER: 63.82 (SE +/- 10.56, N = 9; MIN: 10.28 / MAX: 858.44)
NVIDIA RTX 4070 TI: 16.37 (SE +/- 3.10, N = 12; MIN: 10.57 / MAX: 855.36)
NVIDIA RTX 4070 TI SUPER: 14.26 (SE +/- 3.14, N = 3; MIN: 10.89 / MAX: 673.37)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: resnet50

OpenBenchmarking.org - NCNN 20230517 - Target: Vulkan GPU - Model: resnet50
ms, Fewer Is Better

NVIDIA RTX 3090: 8.20 (SE +/- 0.12, N = 9; MIN: 7.69 / MAX: 11.69)
NVIDIA RTX 4070: 8.24 (SE +/- 0.10, N = 9; MIN: 7.87 / MAX: 9.87)
NVIDIA RTX 4070 SUPER: 46.26 (SE +/- 14.70, N = 9; MIN: 7.71 / MAX: 1829.99)
NVIDIA RTX 4070 TI: 12.25 (SE +/- 4.00, N = 12; MIN: 8 / MAX: 1777.17)
NVIDIA RTX 4070 TI SUPER: 8.58 (SE +/- 0.22, N = 3; MIN: 8.2 / MAX: 11.07)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: alexnet

OpenBenchmarking.org - NCNN 20230517 - Target: Vulkan GPU - Model: alexnet
ms, Fewer Is Better

NVIDIA RTX 3090: 3.60 (SE +/- 0.09, N = 3; MIN: 3.44 / MAX: 3.79)
NVIDIA RTX 4070: 5.78 (SE +/- 1.70, N = 12; MIN: 3.6 / MAX: 397.75)
NVIDIA RTX 4070 SUPER: 16.17 (SE +/- 5.86, N = 9; MIN: 3.52 / MAX: 436.52)
NVIDIA RTX 4070 TI: 3.74 (SE +/- 0.03, N = 9; MIN: 3.61 / MAX: 3.98)
NVIDIA RTX 4070 TI SUPER: 4.38 (SE +/- 0.03, N = 3; MIN: 4.29 / MAX: 6.18)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: resnet18

OpenBenchmarking.org - NCNN 20230517 - Target: Vulkan GPU - Model: resnet18
ms, Fewer Is Better

NVIDIA RTX 3090: 4.12 (SE +/- 0.07, N = 3; MIN: 3.97 / MAX: 4.51)
NVIDIA RTX 4070: 5.11 (SE +/- 0.73, N = 12; MIN: 3.99 / MAX: 916.69)
NVIDIA RTX 4070 SUPER: 8.97 (SE +/- 3.49, N = 9; MIN: 3.94 / MAX: 922.04)
NVIDIA RTX 4070 TI: 5.47 (SE +/- 1.33, N = 9; MIN: 3.95 / MAX: 726.67)
NVIDIA RTX 4070 TI SUPER: 4.64 (SE +/- 0.08, N = 3; MIN: 4.46 / MAX: 7.98)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: vgg16

OpenBenchmarking.org - NCNN 20230517 - Target: Vulkan GPU - Model: vgg16
ms, Fewer Is Better

NVIDIA RTX 3090: 17.88 (SE +/- 0.25, N = 3; MIN: 17.3 / MAX: 18.57)
NVIDIA RTX 4070: 45.52 (SE +/- 13.24, N = 12; MIN: 17.49 / MAX: 643.35)
NVIDIA RTX 4070 SUPER: 117.81 (SE +/- 29.60, N = 9; MIN: 17.16 / MAX: 647.67)
NVIDIA RTX 4070 TI: 32.05 (SE +/- 11.81, N = 9; MIN: 17.34 / MAX: 644.35)
NVIDIA RTX 4070 TI SUPER: 21.76 (SE +/- 0.19, N = 3; MIN: 21.34 / MAX: 23.45)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: googlenet

OpenBenchmarking.org - NCNN 20230517 - Target: Vulkan GPU - Model: googlenet
ms, Fewer Is Better

NVIDIA RTX 3090: 6.11 (SE +/- 0.18, N = 9; MIN: 5.25 / MAX: 9.16)
NVIDIA RTX 4070: 6.06 (SE +/- 0.14, N = 9; MIN: 5.33 / MAX: 8.36)
NVIDIA RTX 4070 SUPER: 11.04 (SE +/- 1.21, N = 9; MIN: 5.28 / MAX: 1769.19)
NVIDIA RTX 4070 TI: 5.87 (SE +/- 0.14, N = 9; MIN: 5.2 / MAX: 6.88)
NVIDIA RTX 4070 TI SUPER: 6.25 (SE +/- 0.24, N = 3; MIN: 5.84 / MAX: 6.86)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: blazeface

OpenBenchmarking.org - NCNN 20230517 - Target: Vulkan GPU - Model: blazeface
ms, Fewer Is Better

NVIDIA RTX 3090: 0.84 (SE +/- 0.03, N = 9; MIN: 0.63 / MAX: 1.13)
NVIDIA RTX 4070: 0.84 (SE +/- 0.03, N = 9; MIN: 0.64 / MAX: 0.96)
NVIDIA RTX 4070 SUPER: 0.84 (SE +/- 0.04, N = 9; MIN: 0.65 / MAX: 4.63)
NVIDIA RTX 4070 TI: 0.81 (SE +/- 0.03, N = 9; MIN: 0.61 / MAX: 1.19)
NVIDIA RTX 4070 TI SUPER: 0.86 (SE +/- 0.05, N = 3; MIN: 0.75 / MAX: 2.68)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: efficientnet-b0

OpenBenchmarking.org - NCNN 20230517 - Target: Vulkan GPU - Model: efficientnet-b0
ms, Fewer Is Better

NVIDIA RTX 3090: 3.34 (SE +/- 0.17, N = 3; MIN: 3.14 / MAX: 4)
NVIDIA RTX 4070: 3.46 (SE +/- 0.09, N = 9; MIN: 2.91 / MAX: 3.79)
NVIDIA RTX 4070 SUPER: 5.07 (SE +/- 0.97, N = 9; MIN: 3.22 / MAX: 1124.2)
NVIDIA RTX 4070 TI: 3.46 (SE +/- 0.07, N = 12; MIN: 3.13 / MAX: 7.03)
NVIDIA RTX 4070 TI SUPER: 3.36 (SE +/- 0.07, N = 3; MIN: 3.21 / MAX: 3.57)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: mnasnet

OpenBenchmarking.org - NCNN 20230517 - Target: Vulkan GPU - Model: mnasnet
ms, Fewer Is Better

NVIDIA RTX 3090: 2.16 (SE +/- 0.14, N = 3; MIN: 2.01 / MAX: 2.55)
NVIDIA RTX 4070: 2.22 (SE +/- 0.08, N = 8; MIN: 1.83 / MAX: 2.54)
NVIDIA RTX 4070 SUPER: 3.85 (SE +/- 1.31, N = 9; MIN: 1.89 / MAX: 1093.29)
NVIDIA RTX 4070 TI: 2.30 (SE +/- 0.05, N = 9; MIN: 2.15 / MAX: 2.58)
NVIDIA RTX 4070 TI SUPER: 2.21 (SE +/- 0.13, N = 3; MIN: 2.01 / MAX: 3.93)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: shufflenet-v2

OpenBenchmarking.org - NCNN 20230517 - Target: Vulkan GPU - Model: shufflenet-v2
ms, Fewer Is Better

NVIDIA RTX 3090: 2.04 (SE +/- 0.21, N = 3; MIN: 1.8 / MAX: 5.8)
NVIDIA RTX 4070: 2.08 (SE +/- 0.09, N = 11; MIN: 1.82 / MAX: 2.59)
NVIDIA RTX 4070 SUPER: 2.31 (SE +/- 0.34, N = 8; MIN: 1.76 / MAX: 421.42)
NVIDIA RTX 4070 TI: 2.01 (SE +/- 0.08, N = 12; MIN: 1.73 / MAX: 3.86)
NVIDIA RTX 4070 TI SUPER: 2.05 (SE +/- 0.19, N = 3; MIN: 1.83 / MAX: 6.6)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU-v3-v3 - Model: mobilenet-v3

OpenBenchmarking.org - NCNN 20230517 - Target: Vulkan GPU-v3-v3 - Model: mobilenet-v3
ms, Fewer Is Better

NVIDIA RTX 3090: 2.20 (SE +/- 0.09, N = 9; MIN: 1.91 / MAX: 2.71)
NVIDIA RTX 4070: 2.15 (SE +/- 0.08, N = 12; MIN: 1.81 / MAX: 2.58)
NVIDIA RTX 4070 SUPER: 2.25 (SE +/- 0.16, N = 9; MIN: 1.75 / MAX: 343.7)
NVIDIA RTX 4070 TI: 2.09 (SE +/- 0.09, N = 9; MIN: 1.78 / MAX: 2.85)
NVIDIA RTX 4070 TI SUPER: 1.87 (SE +/- 0.02, N = 3; MIN: 1.81 / MAX: 5.21)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU-v2-v2 - Model: mobilenet-v2

OpenBenchmarking.org - NCNN 20230517 - Target: Vulkan GPU-v2-v2 - Model: mobilenet-v2
ms, Fewer Is Better

NVIDIA RTX 3090: 2.34 (SE +/- 0.15, N = 3; MIN: 2.04 / MAX: 2.63)
NVIDIA RTX 4070: 2.48 (SE +/- 0.07, N = 12; MIN: 2.02 / MAX: 5.82)
NVIDIA RTX 4070 SUPER: 3.03 (SE +/- 0.44, N = 9; MIN: 2.38 / MAX: 970.87)
NVIDIA RTX 4070 TI: 2.43 (SE +/- 0.07, N = 9; MIN: 2.09 / MAX: 5.8)
NVIDIA RTX 4070 TI SUPER: 2.42 (SE +/- 0.09, N = 3; MIN: 2.24 / MAX: 9.23)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: mobilenet

OpenBenchmarking.org - NCNN 20230517 - Target: Vulkan GPU - Model: mobilenet
ms, Fewer Is Better

NVIDIA RTX 3090: 6.92 (SE +/- 0.22, N = 9; MIN: 6.06 / MAX: 8.65)
NVIDIA RTX 4070: 7.20 (SE +/- 0.21, N = 12; MIN: 6.2 / MAX: 11.13)
NVIDIA RTX 4070 SUPER: 8.62 (SE +/- 0.47, N = 9; MIN: 6.42 / MAX: 1101.3)
NVIDIA RTX 4070 TI: 7.45 (SE +/- 0.25, N = 12; MIN: 6.87 / MAX: 734.65)
NVIDIA RTX 4070 TI SUPER: 6.28 (SE +/- 0.05, N = 3; MIN: 6.16 / MAX: 8.09)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NeatBench

Acceleration: GPU

OpenBenchmarking.org - NeatBench 5 - Acceleration: GPU
FPS, More Is Better

NVIDIA RTX 3090: 3090.0 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 4070.0 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 SUPER: 4070.0 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI: 4070.0 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI SUPER: 2084.1 (SE +/- 512.75, N = 16)

ViennaCL

Test: CPU BLAS - dGEMV-T

OpenBenchmarking.org - ViennaCL 1.7.1 - Test: CPU BLAS - dGEMV-T
GB/s, More Is Better

NVIDIA RTX 3090: 110.0 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070: 109.0 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070 SUPER: 109.0 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070 TI: 102.7 (SE +/- 6.30, N = 3)
NVIDIA RTX 4070 TI SUPER: 82.6 (SE +/- 0.47, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - sDOT

OpenBenchmarking.org - ViennaCL 1.7.1 - Test: CPU BLAS - sDOT
GB/s, More Is Better

NVIDIA RTX 3090: 132.1 (SE +/- 35.40, N = 3)
NVIDIA RTX 4070: 166.0 (SE +/- 3.76, N = 3)
NVIDIA RTX 4070 SUPER: 165.0 (SE +/- 2.73, N = 3)
NVIDIA RTX 4070 TI: 168.0 (SE +/- 2.40, N = 3)
NVIDIA RTX 4070 TI SUPER: 129.0 (SE +/- 0.58, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

FinanceBench

Benchmark: Black-Scholes OpenCL

OpenBenchmarking.org - FinanceBench 2016-07-25 - Benchmark: Black-Scholes OpenCL
ms, Fewer Is Better

NVIDIA RTX 3090: 5.741 (SE +/- 0.006, N = 3)
NVIDIA RTX 4070: 6.906 (SE +/- 0.003, N = 3)
NVIDIA RTX 4070 SUPER: 5.912 (SE +/- 0.114, N = 15)
NVIDIA RTX 4070 TI: 5.226 (SE +/- 0.003, N = 3)
NVIDIA RTX 4070 TI SUPER: 0.501 (SE +/- 0.000, N = 3)
1. (CXX) g++ options: -O3 -march=native -fopenmp

LuxCoreRender

Scene: DLSC - Acceleration: GPU

OpenBenchmarking.org - LuxCoreRender 2.6 - Scene: DLSC - Acceleration: GPU
M samples/sec, More Is Better

NVIDIA RTX 3090: 12.99 (SE +/- 1.13, N = 12; MIN: 0.52 / MAX: 14.69)
NVIDIA RTX 4070: 11.74 (SE +/- 0.01, N = 3; MIN: 11.35 / MAX: 11.83)
NVIDIA RTX 4070 SUPER: 13.59 (SE +/- 0.01, N = 3; MIN: 12.52 / MAX: 13.84)
NVIDIA RTX 4070 TI: 13.95 (SE +/- 0.00, N = 3; MIN: 13.67 / MAX: 14.14)
NVIDIA RTX 4070 TI SUPER: 16.23 (SE +/- 0.01, N = 3; MIN: 15.91 / MAX: 16.36)

VkFFT

Test: FFT + iFFT C2C 1D batched in half precision

OpenBenchmarking.org - VkFFT 1.2.31 - Test: FFT + iFFT C2C 1D batched in half precision
Benchmark Score, More Is Better

NVIDIA RTX 3090: 273221 (SE +/- 160.60, N = 3)
NVIDIA RTX 4070: 137762 (SE +/- 1301.92, N = 3)
NVIDIA RTX 4070 SUPER: 131705 (SE +/- 159.17, N = 3)
NVIDIA RTX 4070 TI: 136210 (SE +/- 1708.38, N = 3)
NVIDIA RTX 4070 TI SUPER: 143992 (SE +/- 3524.05, N = 12)
1. (CXX) g++ options: -O3 -lrt

RealSR-NCNN

Scale: 4x - TAA: No

OpenBenchmarking.org - RealSR-NCNN 20200818 - Scale: 4x - TAA: No
Seconds, Fewer Is Better

NVIDIA RTX 3090: 5.556 (SE +/- 0.016, N = 3)
NVIDIA RTX 4070: 7.092 (SE +/- 0.006, N = 3)
NVIDIA RTX 4070 SUPER: 6.323 (SE +/- 0.150, N = 15)
NVIDIA RTX 4070 TI: 5.962 (SE +/- 0.039, N = 3)
NVIDIA RTX 4070 TI SUPER: 5.633 (SE +/- 0.003, N = 3)

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 32 - Model: Efficientnet_v2_l

OpenBenchmarking.org - PyTorch 2.1 - Device: NVIDIA CUDA GPU - Batch Size: 32 - Model: Efficientnet_v2_l
batches/sec, More Is Better

NVIDIA RTX 3090: 99.05 (MIN: 91.8 / MAX: 100.69)
NVIDIA RTX 4070: 102.90 (MIN: 95.98 / MAX: 104.54)
NVIDIA RTX 4070 SUPER: 102.60 (MIN: 94.84 / MAX: 104.25)
NVIDIA RTX 4070 TI: 96.50 (MIN: 64.35 / MAX: 104.79)
NVIDIA RTX 4070 TI SUPER: 102.83 (MIN: 92.44 / MAX: 105.47)
SE values as exported (3 listed for 5 results): SE +/- 0.13, N = 3; SE +/- 6.65, N = 5; SE +/- 0.62, N = 3

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 1 - Model: ResNet-50

OpenBenchmarking.org - PyTorch 2.1 - Device: NVIDIA CUDA GPU - Batch Size: 1 - Model: ResNet-50
batches/sec, More Is Better

NVIDIA RTX 3090: 525.12 (MIN: 458.54 / MAX: 542.46)
NVIDIA RTX 4070: 546.76 (MIN: 195.25 / MAX: 556.94)
NVIDIA RTX 4070 SUPER: 557.73 (MIN: 513.63 / MAX: 563.37)
NVIDIA RTX 4070 TI: 535.39 (MIN: 428.43 / MAX: 572.99)
NVIDIA RTX 4070 TI SUPER: 558.82 (MIN: 473.77 / MAX: 573.46)
SE values as exported (3 listed for 5 results): SE +/- 3.09, N = 3; SE +/- 11.16, N = 12; SE +/- 3.07, N = 3


Phoronix Test Suite v10.8.5