RTX 4070 SUPER

Intel Core i9-13900K testing with an ASUS TUF GAMING Z790-PRO WIFI (1401 BIOS) and NVIDIA GeForce RTX 3090 24GB on EndeavourOS rolling via the Phoronix Test Suite.

HTML result view exported from: https://openbenchmarking.org/result/2402116-SADD-240207012&sor.

System details (NVIDIA RTX 4070 SUPER):

Processor: Intel Core i9-13900K @ 5.50GHz (24 Cores / 32 Threads)
Motherboard: ASUS TUF GAMING Z790-PRO WIFI (1401 BIOS)
Chipset: Intel Device 7a27
Memory: 32GB
Disk: 4001GB Seagate ZP4000GP304001
Graphics: ASUS NVIDIA GeForce RTX 4070 SUPER 12GB
Audio: Realtek ALC1220
Monitor: ARZOPA
Network: Intel I226-V + Intel Device 7a70
OS: EndeavourOS rolling
Kernel: 6.7.1-arch1-1 (x86_64)
Desktop: KDE Plasma 5.27.10
Display Server: X Server 1.21.1.11
Display Driver: NVIDIA 550.40.07
OpenGL: 4.6.0
OpenCL: OpenCL 3.0 CUDA 12.4.74
Compiler: GCC 13.2.1 20230801
File-System: ext4
Screen Resolution: 1920x1080

Entries that differ in the other runs:

NVIDIA RTX 4070: Graphics: MSI NVIDIA GeForce RTX 4070 12GB; Compiler: GCC 13.2.1 20230801 + CUDA 12.3
NVIDIA RTX 4070 TI: Graphics: NVIDIA GeForce RTX 4070 Ti 12GB
NVIDIA RTX 3090: Graphics: NVIDIA GeForce RTX 3090 24GB; Monitor: PI-KVM Video; Kernel: 6.7.4-arch1-1 (x86_64)

Kernel Details
- Transparent Huge Pages: always

Compiler Details
- NVIDIA RTX 4070 SUPER, NVIDIA RTX 4070, NVIDIA RTX 4070 TI: --disable-libssp --disable-libstdcxx-pch --disable-werror --enable-__cxa_atexit --enable-bootstrap --enable-cet=auto --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-default-ssp --enable-gnu-indirect-function --enable-gnu-unique-object --enable-languages=ada,c,c++,d,fortran,go,lto,objc,obj-c++ --enable-libstdcxx-backtrace --enable-link-serialization=1 --enable-lto --enable-multilib --enable-plugin --enable-shared --enable-threads=posix --mandir=/usr/share/man --with-build-config=bootstrap-lto --with-linker-hash-style=gnu
- NVIDIA RTX 3090: --disable-libssp --disable-libstdcxx-pch --disable-werror --enable-__cxa_atexit --enable-bootstrap --enable-cet=auto --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-default-ssp --enable-gnu-indirect-function --enable-gnu-unique-object --enable-languages=ada,c,c++,d,fortran,go,lto,m2,objc,obj-c++ --enable-libstdcxx-backtrace --enable-link-serialization=1 --enable-lto --enable-multilib --enable-plugin --enable-shared --enable-threads=posix --mandir=/usr/share/man --with-build-config=bootstrap-lto --with-linker-hash-style=gnu

Processor Details
- Scaling Governor: intel_pstate powersave (EPP: balance_performance)
- CPU Microcode: 0x11d

Graphics Details
- NVIDIA RTX 4070 SUPER: BAR1 / Visible vRAM Size: 16384 MiB - vBIOS Version: 95.04.69.00.c1
- NVIDIA RTX 4070: BAR1 / Visible vRAM Size: 16384 MiB - vBIOS Version: 95.04.3e.40.2a
- NVIDIA RTX 4070 TI: BAR1 / Visible vRAM Size: 16384 MiB - vBIOS Version: 95.04.31.00.36
- NVIDIA RTX 3090: BAR1 / Visible vRAM Size: 256 MiB - vBIOS Version: 94.02.26.08.ba

Security Details
- gather_data_sampling: Not affected
- itlb_multihit: Not affected
- l1tf: Not affected
- mds: Not affected
- meltdown: Not affected
- mmio_stale_data: Not affected
- retbleed: Not affected
- spec_rstack_overflow: Not affected
- spec_store_bypass: Mitigation of SSB disabled via prctl
- spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
- spectre_v2: Mitigation of Enhanced / Automatic IBRS IBPB: conditional RSB filling PBRSB-eIBRS: SW sequence
- srbds: Not affected
- tsx_async_abort: Not affected

Environment Details
- NVIDIA RTX 4070, NVIDIA RTX 4070 TI, NVIDIA RTX 3090: NVCC_PREPEND_FLAGS="-ccbin /opt/cuda/bin"

Python Details
- NVIDIA RTX 4070, NVIDIA RTX 4070 TI, NVIDIA RTX 3090: Python 3.11.6

[Result overview table. Columns: NVIDIA RTX 4070 SUPER, NVIDIA RTX 4070, NVIDIA RTX 4070 TI, NVIDIA RTX 3090. Rows, one per test: opencl-benchmark, pytorch, gpuowl, realsr-ncnn, waifu2x-ncnn, vkfft, hashcat, cl-mem, namd-cuda, vkresample, octanebench, fahbench, clpeak, rodinia, luxcorerender, financebench, viennacl, blender, indigobench, mandelgpu, neatbench, tensorflow, libplacebo, ncnn, vkpeak. Per-test values appear in the individual result sections.]

ProjectPhysX OpenCL-Benchmark

Operation: FP64 Compute

OpenBenchmarking.org: ProjectPhysX OpenCL-Benchmark 1.2, Operation: FP64 Compute
TFLOPs/s, More Is Better
NVIDIA RTX 4070 TI: 0.660 (SE +/- 0.001, N = 3)
NVIDIA RTX 3090: 0.637 (SE +/- 0.001, N = 3)
NVIDIA RTX 4070 SUPER: 0.621 (SE +/- 0.000, N = 3)
NVIDIA RTX 4070: 0.510 (SE +/- 0.001, N = 3)
1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

ProjectPhysX OpenCL-Benchmark

Operation: FP32 Compute

OpenBenchmarking.org: ProjectPhysX OpenCL-Benchmark 1.2, Operation: FP32 Compute
TFLOPs/s, More Is Better
NVIDIA RTX 4070 TI: 40.91 (SE +/- 0.00, N = 3)
NVIDIA RTX 3090: 39.40 (SE +/- 0.10, N = 3)
NVIDIA RTX 4070 SUPER: 38.59 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070: 31.77 (SE +/- 0.03, N = 3)
1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

ProjectPhysX OpenCL-Benchmark

Operation: INT64 Compute

OpenBenchmarking.org: ProjectPhysX OpenCL-Benchmark 1.2, Operation: INT64 Compute
TIOPs/s, More Is Better
NVIDIA RTX 4070 TI: 4.420 (SE +/- 0.016, N = 3)
NVIDIA RTX 4070 SUPER: 4.214 (SE +/- 0.015, N = 3)
NVIDIA RTX 4070: 3.443 (SE +/- 0.004, N = 3)
NVIDIA RTX 3090: 3.135 (SE +/- 0.003, N = 3)
1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

ProjectPhysX OpenCL-Benchmark

Operation: INT32 Compute

OpenBenchmarking.org: ProjectPhysX OpenCL-Benchmark 1.2, Operation: INT32 Compute
TIOPs/s, More Is Better
NVIDIA RTX 4070 TI: 21.05 (SE +/- 0.04, N = 3)
NVIDIA RTX 3090: 20.03 (SE +/- 0.06, N = 3)
NVIDIA RTX 4070 SUPER: 19.89 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 16.38 (SE +/- 0.02, N = 3)
1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

ProjectPhysX OpenCL-Benchmark

Operation: INT16 Compute

OpenBenchmarking.org: ProjectPhysX OpenCL-Benchmark 1.2, Operation: INT16 Compute
TIOPs/s, More Is Better
NVIDIA RTX 4070 TI: 18.28 (SE +/- 0.02, N = 3)
NVIDIA RTX 4070 SUPER: 17.17 (SE +/- 0.00, N = 3)
NVIDIA RTX 3090: 17.00 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 14.28 (SE +/- 0.02, N = 3)
1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

ProjectPhysX OpenCL-Benchmark

Operation: INT8 Compute

OpenBenchmarking.org: ProjectPhysX OpenCL-Benchmark 1.2, Operation: INT8 Compute
TIOPs/s, More Is Better
NVIDIA RTX 4070 TI: 15.73 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070 SUPER: 14.31 (SE +/- 0.05, N = 3)
NVIDIA RTX 3090: 13.73 (SE +/- 0.07, N = 3)
NVIDIA RTX 4070: 12.12 (SE +/- 0.02, N = 3)
1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

ProjectPhysX OpenCL-Benchmark

Operation: Memory Bandwidth Coalesced Read

OpenBenchmarking.org: ProjectPhysX OpenCL-Benchmark 1.2, Operation: Memory Bandwidth Coalesced Read
GB/s, More Is Better
NVIDIA RTX 3090: 864.11 (SE +/- 0.07, N = 3)
NVIDIA RTX 4070: 465.18 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070 TI: 465.07 (SE +/- 0.01, N = 3)
NVIDIA RTX 4070 SUPER: 464.86 (SE +/- 0.01, N = 3)
1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

ProjectPhysX OpenCL-Benchmark

Operation: Memory Bandwidth Coalesced Write

OpenBenchmarking.org: ProjectPhysX OpenCL-Benchmark 1.2, Operation: Memory Bandwidth Coalesced Write
GB/s, More Is Better
NVIDIA RTX 3090: 887.31 (SE +/- 0.06, N = 3)
NVIDIA RTX 4070: 459.43 (SE +/- 0.16, N = 3)
NVIDIA RTX 4070 TI: 457.17 (SE +/- 0.11, N = 3)
NVIDIA RTX 4070 SUPER: 455.01 (SE +/- 0.14, N = 3)
1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL
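
To put the coalesced-write figures on a common scale, here is a minimal sketch that expresses each result relative to the RTX 4070 SUPER. The values are copied from the result above; the helper name `relative_to` is hypothetical and is not part of the Phoronix Test Suite or OpenCL-Benchmark.

```python
# Coalesced write bandwidth in GB/s (more is better), copied from the result above.
write_gbps = {
    "NVIDIA RTX 3090": 887.31,
    "NVIDIA RTX 4070": 459.43,
    "NVIDIA RTX 4070 TI": 457.17,
    "NVIDIA RTX 4070 SUPER": 455.01,
}


def relative_to(results: dict[str, float], baseline: str) -> dict[str, float]:
    """Return each result divided by the baseline system's result (hypothetical helper)."""
    base = results[baseline]
    return {name: value / base for name, value in results.items()}


relative = relative_to(write_gbps, "NVIDIA RTX 4070 SUPER")
for name, ratio in relative.items():
    print(f"{name}: {ratio:.3f}x")
```

A ratio above 1.0 means more write bandwidth than the RTX 4070 SUPER in this test; the same helper can be applied to any other "More Is Better" result in this export.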

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 1 - Model: ResNet-50

OpenBenchmarking.org: PyTorch 2.1, Device: NVIDIA CUDA GPU - Batch Size: 1 - Model: ResNet-50
batches/sec, More Is Better
NVIDIA RTX 4070 SUPER: 557.73 (MIN: 513.63 / MAX: 563.37)
NVIDIA RTX 4070: 546.76 (MIN: 195.25 / MAX: 556.94)
NVIDIA RTX 4070 TI: 535.39 (MIN: 428.43 / MAX: 572.99)
NVIDIA RTX 3090: 525.12 (MIN: 458.54 / MAX: 542.46)
SE entries in the export (not matched to a system): SE +/- 3.09, N = 3; SE +/- 11.16, N = 12

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 1 - Model: ResNet-152

OpenBenchmarking.org: PyTorch 2.1, Device: NVIDIA CUDA GPU - Batch Size: 1 - Model: ResNet-152
batches/sec, More Is Better
NVIDIA RTX 4070 SUPER: 201.94 (MIN: 183.53 / MAX: 206.5)
NVIDIA RTX 4070 TI: 201.19 (MIN: 180.79 / MAX: 203.92)
NVIDIA RTX 4070: 198.18 (MIN: 181.27 / MAX: 200.06)
NVIDIA RTX 3090: 197.12 (MIN: 137.37 / MAX: 198.9)
SE entries in the export (not matched to a system): SE +/- 0.73, N = 3; SE +/- 0.36, N = 3; SE +/- 0.09, N = 2

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 16 - Model: ResNet-50

OpenBenchmarking.org: PyTorch 2.1, Device: NVIDIA CUDA GPU - Batch Size: 16 - Model: ResNet-50
batches/sec, More Is Better
NVIDIA RTX 4070 SUPER: 509.45 (MIN: 430.1 / MAX: 516.48)
NVIDIA RTX 4070 TI: 502.92 (MIN: 415.65 / MAX: 520.39)
NVIDIA RTX 4070: 458.39 (MIN: 404.5 / MAX: 461.01)
NVIDIA RTX 3090: 419.76 (MIN: 376.2 / MAX: 422.17)
SE entries in the export (not matched to a system): SE +/- 2.23, N = 3; SE +/- 0.26, N = 3; SE +/- 0.89, N = 2

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 32 - Model: ResNet-50

OpenBenchmarking.org: PyTorch 2.1, Device: NVIDIA CUDA GPU - Batch Size: 32 - Model: ResNet-50
batches/sec, More Is Better
NVIDIA RTX 4070 TI: 505.55 (MIN: 419.93 / MAX: 512.69)
NVIDIA RTX 4070 SUPER: 501.50 (MIN: 415.94 / MAX: 510.69)
NVIDIA RTX 4070: 459.94 (MIN: 403.65 / MAX: 462.59)
NVIDIA RTX 3090: 420.29 (MIN: 376.81 / MAX: 421.58)
SE entries in the export (not matched to a system): SE +/- 1.69, N = 3; SE +/- 2.17, N = 2; SE +/- 0.13, N = 2

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 64 - Model: ResNet-50

OpenBenchmarking.org: PyTorch 2.1, Device: NVIDIA CUDA GPU - Batch Size: 64 - Model: ResNet-50
batches/sec, More Is Better
NVIDIA RTX 4070 SUPER: 507.45 (SE +/- 0.92, N = 3; MIN: 423.41 / MAX: 512.88)
NVIDIA RTX 4070 TI: 505.62 (SE +/- 1.92, N = 3; MIN: 426.6 / MAX: 513.25)
NVIDIA RTX 4070: 458.36 (SE +/- 0.27, N = 3; MIN: 404.89 / MAX: 461.01)
NVIDIA RTX 3090: 419.03 (SE +/- 0.24, N = 3; MIN: 376 / MAX: 422)

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 16 - Model: ResNet-152

OpenBenchmarking.org: PyTorch 2.1, Device: NVIDIA CUDA GPU - Batch Size: 16 - Model: ResNet-152
batches/sec, More Is Better
NVIDIA RTX 4070 SUPER: 195.40 (MIN: 186.09 / MAX: 197.7)
NVIDIA RTX 4070 TI: 194.29 (MIN: 182.25 / MAX: 197.39)
NVIDIA RTX 4070: 187.26 (MIN: 179.81 / MAX: 188.21)
NVIDIA RTX 3090: 164.14 (MIN: 145.67 / MAX: 165.38)
SE entries in the export (not matched to a system): SE +/- 0.29, N = 3

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 256 - Model: ResNet-50

OpenBenchmarking.org: PyTorch 2.1, Device: NVIDIA CUDA GPU - Batch Size: 256 - Model: ResNet-50
batches/sec, More Is Better
NVIDIA RTX 4070 SUPER: 504.67 (SE +/- 1.39, N = 3; MIN: 412.34 / MAX: 514.07)
NVIDIA RTX 4070: 459.93 (SE +/- 0.34, N = 3; MIN: 403.65 / MAX: 462.74)
NVIDIA RTX 3090: 416.89 (SE +/- 0.14, N = 3; MIN: 329.77 / MAX: 420.82)

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 32 - Model: ResNet-152

OpenBenchmarking.org: PyTorch 2.1, Device: NVIDIA CUDA GPU - Batch Size: 32 - Model: ResNet-152
batches/sec, More Is Better
NVIDIA RTX 4070 TI: 198.82 (MIN: 188.33 / MAX: 201.47)
NVIDIA RTX 4070 SUPER: 195.39 (MIN: 183.94 / MAX: 198.7)
NVIDIA RTX 4070: 187.69 (MIN: 182.03 / MAX: 188.31)
NVIDIA RTX 3090: 163.74 (MIN: 144.93 / MAX: 165.03)
SE entries in the export (not matched to a system): SE +/- 0.29, N = 3

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 512 - Model: ResNet-50

OpenBenchmarking.org: PyTorch 2.1, Device: NVIDIA CUDA GPU - Batch Size: 512 - Model: ResNet-50
batches/sec, More Is Better
NVIDIA RTX 4070 TI: 504.66 (SE +/- 0.83, N = 2; MIN: 424.27 / MAX: 509.08)
NVIDIA RTX 4070 SUPER: 504.27 (SE +/- 4.43, N = 2; MIN: 418.22 / MAX: 512.44)
NVIDIA RTX 4070: 459.27 (SE +/- 0.43, N = 2; MIN: 405.48 / MAX: 461.88)
NVIDIA RTX 3090: 416.20 (SE +/- 0.40, N = 3; MIN: 355.45 / MAX: 419.05)

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 64 - Model: ResNet-152

OpenBenchmarking.org: PyTorch 2.1, Device: NVIDIA CUDA GPU - Batch Size: 64 - Model: ResNet-152
batches/sec, More Is Better
NVIDIA RTX 4070 TI: 197.02 (MIN: 183.92 / MAX: 200.54)
NVIDIA RTX 4070 SUPER: 196.07 (MIN: 171.95 / MAX: 199.96)
NVIDIA RTX 4070: 186.63 (MIN: 180.51 / MAX: 187.79)
NVIDIA RTX 3090: 164.14 (MIN: 149 / MAX: 165)
SE entries in the export (not matched to a system): SE +/- 0.78, N = 2; SE +/- 0.51, N = 3; SE +/- 0.34, N = 3

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 256 - Model: ResNet-152

OpenBenchmarking.org: PyTorch 2.1, Device: NVIDIA CUDA GPU - Batch Size: 256 - Model: ResNet-152
batches/sec, More Is Better
NVIDIA RTX 4070 TI: 195.86 (MIN: 181.64 / MAX: 199.2)
NVIDIA RTX 4070 SUPER: 194.58 (MIN: 183.74 / MAX: 198.52)
NVIDIA RTX 4070: 187.27 (MIN: 179.9 / MAX: 188.08)
NVIDIA RTX 3090: 161.01 (MIN: 138.12 / MAX: 165.16)
SE entries in the export (not matched to a system): SE +/- 0.19, N = 2; SE +/- 1.14, N = 2; SE +/- 0.17, N = 3

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 512 - Model: ResNet-152

OpenBenchmarking.org: PyTorch 2.1, Device: NVIDIA CUDA GPU - Batch Size: 512 - Model: ResNet-152
batches/sec, More Is Better
NVIDIA RTX 4070 SUPER: 195.30 (MIN: 182 / MAX: 199.43)
NVIDIA RTX 4070 TI: 194.87 (MIN: 180.8 / MAX: 198)
NVIDIA RTX 4070: 187.51 (MIN: 181.57 / MAX: 188.05)
NVIDIA RTX 3090: 164.35 (MIN: 149.91 / MAX: 166.09)
SE entries in the export (not matched to a system): SE +/- 1.38, N = 2; SE +/- 0.05, N = 3; SE +/- 0.33, N = 2

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 1 - Model: Efficientnet_v2_l

OpenBenchmarking.org: PyTorch 2.1, Device: NVIDIA CUDA GPU - Batch Size: 1 - Model: Efficientnet_v2_l
batches/sec, More Is Better
NVIDIA RTX 4070 TI: 108.59 (MIN: 99.04 / MAX: 110.68)
NVIDIA RTX 4070: 107.59 (MIN: 98.77 / MAX: 109.43)
NVIDIA RTX 4070 SUPER: 106.37 (MIN: 97.91 / MAX: 108.16)
NVIDIA RTX 3090: 105.55 (MIN: 91.76 / MAX: 107.42)
SE entries in the export (not matched to a system): SE +/- 0.55, N = 3; SE +/- 0.33, N = 3

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 16 - Model: Efficientnet_v2_l

OpenBenchmarking.org: PyTorch 2.1, Device: NVIDIA CUDA GPU - Batch Size: 16 - Model: Efficientnet_v2_l
batches/sec, More Is Better
NVIDIA RTX 4070: 103.68 (MIN: 96.86 / MAX: 105.56)
NVIDIA RTX 4070 TI: 103.45 (MIN: 95.22 / MAX: 105.88)
NVIDIA RTX 3090: 98.11 (MIN: 89.88 / MAX: 100.25)
SE entries in the export (not matched to a system): SE +/- 0.52, N = 2; SE +/- 0.53, N = 2

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 32 - Model: Efficientnet_v2_l

OpenBenchmarking.org: PyTorch 2.1, Device: NVIDIA CUDA GPU - Batch Size: 32 - Model: Efficientnet_v2_l
batches/sec, More Is Better
NVIDIA RTX 4070: 102.90 (MIN: 95.98 / MAX: 104.54)
NVIDIA RTX 4070 SUPER: 102.60 (MIN: 94.84 / MAX: 104.25)
NVIDIA RTX 3090: 99.05 (MIN: 91.8 / MAX: 100.69)
NVIDIA RTX 4070 TI: 96.50 (MIN: 64.35 / MAX: 104.79)
SE entries in the export (not matched to a system): SE +/- 0.13, N = 3; SE +/- 6.65, N = 5

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 64 - Model: Efficientnet_v2_l

OpenBenchmarking.org: PyTorch 2.1, Device: NVIDIA CUDA GPU - Batch Size: 64 - Model: Efficientnet_v2_l
batches/sec, More Is Better
NVIDIA RTX 4070 TI: 103.20 (SE +/- 0.39, N = 2; MIN: 95.31 / MAX: 105.27)
NVIDIA RTX 4070 SUPER: 102.60 (SE +/- 1.49, N = 2; MIN: 79.69 / MAX: 105.28)
NVIDIA RTX 4070: 101.55 (SE +/- 0.45, N = 3; MIN: 93.44 / MAX: 103.08)
NVIDIA RTX 3090: 99.84 (SE +/- 0.14, N = 3; MIN: 92.73 / MAX: 101.46)

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 256 - Model: Efficientnet_v2_l

OpenBenchmarking.org: PyTorch 2.1, Device: NVIDIA CUDA GPU - Batch Size: 256 - Model: Efficientnet_v2_l
batches/sec, More Is Better
NVIDIA RTX 4070 TI: 103.24 (MIN: 95.41 / MAX: 104.9)
NVIDIA RTX 4070 SUPER: 103.17 (MIN: 95.79 / MAX: 105.15)
NVIDIA RTX 4070: 101.24 (MIN: 93.33 / MAX: 102.92)
NVIDIA RTX 3090: 99.43 (MIN: 90.49 / MAX: 101.97)
SE entries in the export (not matched to a system): SE +/- 0.05, N = 2; SE +/- 0.57, N = 3

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 512 - Model: Efficientnet_v2_l

OpenBenchmarking.org: PyTorch 2.1, Device: NVIDIA CUDA GPU - Batch Size: 512 - Model: Efficientnet_v2_l
batches/sec, More Is Better
NVIDIA RTX 4070 SUPER: 103.57 (MIN: 95.95 / MAX: 105.54)
NVIDIA RTX 4070 TI: 103.50 (MIN: 94.95 / MAX: 105.61)
NVIDIA RTX 4070: 101.43 (MIN: 93.27 / MAX: 103.58)
NVIDIA RTX 3090: 99.25 (MIN: 91.16 / MAX: 101.18)
SE entries in the export (not matched to a system): SE +/- 0.36, N = 2; SE +/- 0.39, N = 3; SE +/- 0.19, N = 3

GpuOwl

Exponent: 57885161

OpenBenchmarking.org: GpuOwl 7.2.1, Exponent: 57885161
Iterations / Second, More Is Better
NVIDIA RTX 4070 TI: 919.13 (SE +/- 2.53, N = 3)
NVIDIA RTX 4070 SUPER: 869.07 (SE +/- 1.26, N = 3)
NVIDIA RTX 3090: 866.31 (SE +/- 2.01, N = 3)
NVIDIA RTX 4070: 714.80 (SE +/- 0.00, N = 3)

GpuOwl

Exponent: 77936867

OpenBenchmarking.org: GpuOwl 7.2.1, Exponent: 77936867
Iterations / Second, More Is Better
NVIDIA RTX 4070 TI: 676.59 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 SUPER: 646.41 (SE +/- 0.00, N = 3)
NVIDIA RTX 3090: 645.99 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 530.32 (SE +/- 0.09, N = 3)

GpuOwl

Exponent: 332220523

OpenBenchmarking.org: GpuOwl 7.2.1, Exponent: 332220523
Iterations / Second, More Is Better
NVIDIA RTX 4070 TI: 145.84 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 SUPER: 137.44 (SE +/- 0.00, N = 3)
NVIDIA RTX 3090: 137.32 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 112.61 (SE +/- 0.00, N = 3)

RealSR-NCNN

Scale: 4x - TAA: No

OpenBenchmarking.org: RealSR-NCNN 20200818, Scale: 4x - TAA: No
Seconds, Fewer Is Better
NVIDIA RTX 3090: 5.556 (SE +/- 0.016, N = 3)
NVIDIA RTX 4070 TI: 5.962 (SE +/- 0.039, N = 3)
NVIDIA RTX 4070 SUPER: 6.323 (SE +/- 0.150, N = 15)
NVIDIA RTX 4070: 7.092 (SE +/- 0.006, N = 3)

RealSR-NCNN

Scale: 4x - TAA: Yes

OpenBenchmarking.org: RealSR-NCNN 20200818, Scale: 4x - TAA: Yes
Seconds, Fewer Is Better
NVIDIA RTX 3090: 30.31 (SE +/- 0.06, N = 3)
NVIDIA RTX 4070 TI: 33.63 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070 SUPER: 34.89 (SE +/- 0.02, N = 3)
NVIDIA RTX 4070: 42.85 (SE +/- 0.23, N = 3)

Waifu2x-NCNN Vulkan

Scale: 2x - Denoise: 3 - TAA: Yes

OpenBenchmarking.org: Waifu2x-NCNN Vulkan 20200818, Scale: 2x - Denoise: 3 - TAA: Yes
Seconds, Fewer Is Better
NVIDIA RTX 4070 TI: 2.854 (SE +/- 0.009, N = 3)
NVIDIA RTX 4070 SUPER: 2.855 (SE +/- 0.014, N = 3)
NVIDIA RTX 4070: 3.168 (SE +/- 0.028, N = 3)
NVIDIA RTX 3090: 3.202 (SE +/- 0.011, N = 3)

VkFFT

Test: FFT + iFFT R2C / C2R

OpenBenchmarking.org: VkFFT 1.2.31, Test: FFT + iFFT R2C / C2R
Benchmark Score, More Is Better
NVIDIA RTX 4070 TI: 55446 (SE +/- 520.37, N = 3)
NVIDIA RTX 4070 SUPER: 54794 (SE +/- 702.53, N = 15)
NVIDIA RTX 3090: 48418 (SE +/- 320.62, N = 3)
NVIDIA RTX 4070: 47097 (SE +/- 745.02, N = 13)
1. (CXX) g++ options: -O3 -lrt

VkFFT

Test: FFT + iFFT C2C 1D batched in half precision

OpenBenchmarking.org: VkFFT 1.2.31, Test: FFT + iFFT C2C 1D batched in half precision
Benchmark Score, More Is Better
NVIDIA RTX 3090: 273221 (SE +/- 160.60, N = 3)
NVIDIA RTX 4070: 137762 (SE +/- 1301.92, N = 3)
NVIDIA RTX 4070 TI: 136210 (SE +/- 1708.38, N = 3)
NVIDIA RTX 4070 SUPER: 131705 (SE +/- 159.17, N = 3)
1. (CXX) g++ options: -O3 -lrt

VkFFT

Test: FFT + iFFT C2C Bluestein in single precision

OpenBenchmarking.org: VkFFT 1.2.31, Test: FFT + iFFT C2C Bluestein in single precision
Benchmark Score, More Is Better
NVIDIA RTX 4070 SUPER: 15166 (SE +/- 102.52, N = 3)
NVIDIA RTX 4070 TI: 15125 (SE +/- 118.41, N = 3)
NVIDIA RTX 3090: 14205 (SE +/- 115.62, N = 3)
NVIDIA RTX 4070: 13714 (SE +/- 52.09, N = 3)
1. (CXX) g++ options: -O3 -lrt

VkFFT

Test: FFT + iFFT C2C 1D batched in double precision

OpenBenchmarking.org: VkFFT 1.2.31, Test: FFT + iFFT C2C 1D batched in double precision
Benchmark Score, More Is Better
NVIDIA RTX 3090: 30912 (SE +/- 50.66, N = 3)
NVIDIA RTX 4070 TI: 25431 (SE +/- 302.46, N = 3)
NVIDIA RTX 4070 SUPER: 24317 (SE +/- 146.69, N = 3)
NVIDIA RTX 4070: 22390 (SE +/- 125.94, N = 3)
1. (CXX) g++ options: -O3 -lrt

VkFFT

Test: FFT + iFFT C2C 1D batched in single precision

OpenBenchmarking.org: VkFFT 1.2.31, Test: FFT + iFFT C2C 1D batched in single precision
Benchmark Score, More Is Better
NVIDIA RTX 3090: 141876 (SE +/- 9.64, N = 3)
NVIDIA RTX 4070: 77774 (SE +/- 13.72, N = 3)
NVIDIA RTX 4070 TI: 73942 (SE +/- 0.88, N = 3)
NVIDIA RTX 4070 SUPER: 73929 (SE +/- 7.94, N = 3)
1. (CXX) g++ options: -O3 -lrt

VkFFT

Test: FFT + iFFT C2C multidimensional in single precision

OpenBenchmarking.org: VkFFT 1.2.31, Test: FFT + iFFT C2C multidimensional in single precision
Benchmark Score, More Is Better
NVIDIA RTX 4070 TI: 51528 (SE +/- 417.77, N = 15)
NVIDIA RTX 3090: 50856 (SE +/- 407.28, N = 15)
NVIDIA RTX 4070 SUPER: 50299 (SE +/- 407.19, N = 15)
NVIDIA RTX 4070: 47212 (SE +/- 476.57, N = 5)
1. (CXX) g++ options: -O3 -lrt

VkFFT

Test: FFT + iFFT C2C Bluestein benchmark in double precision

OpenBenchmarking.org: VkFFT 1.2.31, Test: FFT + iFFT C2C Bluestein benchmark in double precision
Benchmark Score, More Is Better
NVIDIA RTX 4070 TI: 4647 (SE +/- 11.35, N = 3)
NVIDIA RTX 4070 SUPER: 4451 (SE +/- 12.55, N = 3)
NVIDIA RTX 3090: 4195 (SE +/- 9.84, N = 3)
NVIDIA RTX 4070: 3886 (SE +/- 4.51, N = 3)
1. (CXX) g++ options: -O3 -lrt

VkFFT

Test: FFT + iFFT C2C 1D batched in single precision, no reshuffling

OpenBenchmarking.org: VkFFT 1.2.31, Test: FFT + iFFT C2C 1D batched in single precision, no reshuffling
Benchmark Score, More Is Better
NVIDIA RTX 3090: 144311 (SE +/- 37.44, N = 3)
NVIDIA RTX 4070: 79057 (SE +/- 5.84, N = 3)
NVIDIA RTX 4070 TI: 75141 (SE +/- 28.54, N = 3)
NVIDIA RTX 4070 SUPER: 75078 (SE +/- 37.77, N = 3)
1. (CXX) g++ options: -O3 -lrt

Hashcat

Benchmark: MD5

OpenBenchmarking.org: Hashcat 6.2.4, Benchmark: MD5
H/s, More Is Better
NVIDIA RTX 4070 TI: 73312233333 (SE +/- 11283665.68, N = 3)
NVIDIA RTX 4070 SUPER: 67583033333 (SE +/- 22430807.19, N = 3)
NVIDIA RTX 3090: 67177300000 (SE +/- 53667246.37, N = 3)
NVIDIA RTX 4070: 56147866667 (SE +/- 33772046.30, N = 3)

Hashcat

Benchmark: SHA1

OpenBenchmarking.org: Hashcat 6.2.4, Benchmark: SHA1
H/s, More Is Better
NVIDIA RTX 4070 TI: 23532400000 (SE +/- 15926811.78, N = 3)
NVIDIA RTX 4070 SUPER: 22132600000 (SE +/- 5140363.15, N = 3)
NVIDIA RTX 3090: 21323733333 (SE +/- 26244639.66, N = 3)
NVIDIA RTX 4070: 18202466667 (SE +/- 6318315.53, N = 3)

Hashcat

Benchmark: 7-Zip

OpenBenchmarking.org: Hashcat 6.2.4, Benchmark: 7-Zip
H/s, More Is Better
NVIDIA RTX 4070 TI: 1262633 (SE +/- 2339.04, N = 3)
NVIDIA RTX 4070 SUPER: 1176467 (SE +/- 1991.93, N = 3)
NVIDIA RTX 3090: 1056000 (SE +/- 1587.45, N = 3)
NVIDIA RTX 4070: 976967 (SE +/- 2062.63, N = 3)

Hashcat

Benchmark: SHA-512

OpenBenchmarking.org: Hashcat 6.2.4, Benchmark: SHA-512
H/s, More Is Better
NVIDIA RTX 4070 TI: 3462500000 (SE +/- 721110.26, N = 3)
NVIDIA RTX 4070 SUPER: 3232733333 (SE +/- 1530068.99, N = 3)
NVIDIA RTX 3090: 3081866667 (SE +/- 3288532.26, N = 3)
NVIDIA RTX 4070: 2673300000 (SE +/- 1059874.21, N = 3)

Hashcat

Benchmark: TrueCrypt RIPEMD160 + XTS

OpenBenchmarking.org: Hashcat 6.2.4, Benchmark: TrueCrypt RIPEMD160 + XTS
H/s, More Is Better
NVIDIA RTX 4070 TI: 858600 (SE +/- 888.82, N = 3)
NVIDIA RTX 4070 SUPER: 802967 (SE +/- 633.33, N = 3)
NVIDIA RTX 3090: 797833 (SE +/- 1757.21, N = 3)
NVIDIA RTX 4070: 660967 (SE +/- 176.38, N = 3)

cl-mem

Benchmark: Copy

OpenBenchmarking.org: cl-mem 2017-01-13, Benchmark: Copy
GB/s, More Is Better
NVIDIA RTX 3090: 360.8 (SE +/- 0.22, N = 3)
NVIDIA RTX 4070 TI: 333.3 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 SUPER: 331.8 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070: 330.3 (SE +/- 0.09, N = 3)
1. (CC) gcc options: -O2 -flto -lOpenCL

cl-mem

Benchmark: Read

OpenBenchmarking.org: cl-mem 2017-01-13, Benchmark: Read
GB/s, More Is Better
NVIDIA RTX 3090: 825.8 (SE +/- 0.32, N = 3)
NVIDIA RTX 4070 TI: 446.3 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 446.3 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 SUPER: 446.2 (SE +/- 0.12, N = 3)
1. (CC) gcc options: -O2 -flto -lOpenCL

cl-mem

Benchmark: Write

OpenBenchmarking.org: cl-mem 2017-01-13, Benchmark: Write
GB/s, More Is Better
NVIDIA RTX 3090: 753.8 (SE +/- 0.83, N = 3)
NVIDIA RTX 4070 TI: 412.2 (SE +/- 0.12, N = 3)
NVIDIA RTX 4070 SUPER: 407.5 (SE +/- 1.11, N = 3)
NVIDIA RTX 4070: 406.7 (SE +/- 0.55, N = 3)
1. (CC) gcc options: -O2 -flto -lOpenCL

NAMD CUDA

ATPase Simulation - 327,506 Atoms

OpenBenchmarking.org: NAMD CUDA 2.14, ATPase Simulation - 327,506 Atoms
days/ns, Fewer Is Better
NVIDIA RTX 4070 TI: 0.06788 (SE +/- 0.00061, N = 3)
NVIDIA RTX 4070 SUPER: 0.06791 (SE +/- 0.00031, N = 3)
NVIDIA RTX 4070: 0.07498 (SE +/- 0.00021, N = 3)
NVIDIA RTX 3090: 0.10822 (SE +/- 0.00042, N = 3)
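
NAMD reports days/ns here, where fewer is better, while FAHBench further down reports ns per day. A minimal sketch of converting the NAMD figures to ns/day by taking the reciprocal follows. The values are copied from the result above; the function name `to_ns_per_day` is hypothetical, and the two benchmarks simulate different systems, so the converted numbers are not directly comparable with FAHBench scores.

```python
# NAMD results in days/ns (fewer is better), copied from the result above.
days_per_ns = {
    "NVIDIA RTX 4070 TI": 0.06788,
    "NVIDIA RTX 4070 SUPER": 0.06791,
    "NVIDIA RTX 4070": 0.07498,
    "NVIDIA RTX 3090": 0.10822,
}


def to_ns_per_day(value_days_per_ns: float) -> float:
    """Invert a days/ns figure into ns/day (hypothetical helper)."""
    return 1.0 / value_days_per_ns


ns_per_day = {name: to_ns_per_day(v) for name, v in days_per_ns.items()}
for name, value in ns_per_day.items():
    print(f"{name}: {value:.2f} ns/day")
```

After the conversion, a larger number is better, which matches the direction of most other charts in this export.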

VkResample

Upscale: 2x - Precision: Double

OpenBenchmarking.org: VkResample 1.0, Upscale: 2x - Precision: Double
ms, Fewer Is Better
NVIDIA RTX 4070 TI: 322.06 (SE +/- 0.35, N = 3)
NVIDIA RTX 3090: 333.64 (SE +/- 0.30, N = 3)
NVIDIA RTX 4070 SUPER: 339.59 (SE +/- 0.30, N = 3)
NVIDIA RTX 4070: 415.16 (SE +/- 0.77, N = 3)
1. (CXX) g++ options: -O3

VkResample

Upscale: 2x - Precision: Single

OpenBenchmarking.org: VkResample 1.0, Upscale: 2x - Precision: Single
ms, Fewer Is Better
NVIDIA RTX 3090: 10.32 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 18.02 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI: 18.46 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 SUPER: 18.49 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -O3

OctaneBench

Total Score

OpenBenchmarking.org: OctaneBench 2020.1, Total Score
Score, More Is Better
NVIDIA RTX 4070 TI: 735.94
NVIDIA RTX 4070 SUPER: 720.97
NVIDIA RTX 3090: 674.25
NVIDIA RTX 4070: 648.00

FAHBench

FAHBench 2.3.2 (Ns Per Day, More Is Better)
NVIDIA RTX 4070 TI: 382.16 (SE +/- 0.26, N = 3)
NVIDIA RTX 4070 SUPER: 366.06 (SE +/- 0.39, N = 3)
NVIDIA RTX 3090: 343.02 (SE +/- 0.26, N = 3)
NVIDIA RTX 4070: 317.20 (SE +/- 0.12, N = 3)

clpeak

OpenCL Test: Integer Compute INT

clpeak 1.1.2 (GIOPS, More Is Better)
NVIDIA RTX 4070 TI: 19821.10 (SE +/- 2.50, N = 3)
NVIDIA RTX 4070 SUPER: 18170.54 (SE +/- 3.14, N = 3)
NVIDIA RTX 3090: 17923.33 (SE +/- 16.49, N = 3)
NVIDIA RTX 4070: 14555.19 (SE +/- 15.26, N = 3)
1. (CXX) g++ options: -O3

clpeak

OpenCL Test: Single-Precision Float

clpeak 1.1.2 (GFLOPS, More Is Better)
NVIDIA RTX 4070 TI: 38691.73 (SE +/- 11.67, N = 3)
NVIDIA RTX 4070 SUPER: 35492.69 (SE +/- 0.99, N = 3)
NVIDIA RTX 3090: 34906.79 (SE +/- 113.39, N = 3)
NVIDIA RTX 4070: 28479.39 (SE +/- 5.46, N = 3)
1. (CXX) g++ options: -O3

clpeak

OpenCL Test: Double-Precision Double

clpeak 1.1.2 (GFLOPS, More Is Better)
NVIDIA RTX 4070 TI: 667.05 (SE +/- 1.33, N = 3)
NVIDIA RTX 3090: 642.23 (SE +/- 1.63, N = 3)
NVIDIA RTX 4070 SUPER: 630.11 (SE +/- 0.98, N = 3)
NVIDIA RTX 4070: 515.17 (SE +/- 0.21, N = 3)
1. (CXX) g++ options: -O3

clpeak

OpenCL Test: Global Memory Bandwidth

clpeak 1.1.2 (GBPS, More Is Better)
NVIDIA RTX 3090: 816.55 (SE +/- 0.02, N = 3)
NVIDIA RTX 4070 SUPER: 437.65 (SE +/- 0.02, N = 3)
NVIDIA RTX 4070 TI: 437.63 (SE +/- 0.02, N = 3)
NVIDIA RTX 4070: 437.21 (SE +/- 0.02, N = 3)
1. (CXX) g++ options: -O3

Rodinia

Test: OpenCL Particle Filter

Rodinia 3.1 (Seconds, Fewer Is Better)
NVIDIA RTX 4070 TI: 3.291 (SE +/- 0.002, N = 3)
NVIDIA RTX 4070 SUPER: 3.480 (SE +/- 0.039, N = 4)
NVIDIA RTX 3090: 3.844 (SE +/- 0.030, N = 15)
NVIDIA RTX 4070: 4.098 (SE +/- 0.008, N = 3)
1. (CXX) g++ options: -O2 -lOpenCL

LuxCoreRender

Scene: DLSC - Acceleration: GPU

LuxCoreRender 2.6 (M samples/sec, More Is Better)
NVIDIA RTX 4070 TI: 13.95 (SE +/- 0.00, N = 3; MIN: 13.67 / MAX: 14.14)
NVIDIA RTX 4070 SUPER: 13.59 (SE +/- 0.01, N = 3; MIN: 12.52 / MAX: 13.84)
NVIDIA RTX 3090: 12.99 (SE +/- 1.13, N = 12; MIN: 0.52 / MAX: 14.69)
NVIDIA RTX 4070: 11.74 (SE +/- 0.01, N = 3; MIN: 11.35 / MAX: 11.83)

LuxCoreRender

Scene: Danish Mood - Acceleration: GPU

LuxCoreRender 2.6 (M samples/sec, More Is Better)
NVIDIA RTX 4070 TI: 10.99 (SE +/- 0.11, N = 3; MIN: 4.17 / MAX: 12.71)
NVIDIA RTX 4070 SUPER: 10.56 (SE +/- 0.08, N = 3; MIN: 3.7 / MAX: 12.17)
NVIDIA RTX 3090: 10.20 (SE +/- 0.04, N = 3; MIN: 4.07 / MAX: 11.93)
NVIDIA RTX 4070: 8.89 (SE +/- 0.06, N = 3; MIN: 3.32 / MAX: 10.26)

LuxCoreRender

Scene: Orange Juice - Acceleration: GPU

LuxCoreRender 2.6 (M samples/sec, More Is Better)
NVIDIA RTX 3090: 12.14 (SE +/- 0.01, N = 3; MIN: 10.24 / MAX: 16.71)
NVIDIA RTX 4070 TI: 11.89 (SE +/- 0.00, N = 3; MIN: 9.85 / MAX: 15.88)
NVIDIA RTX 4070 SUPER: 11.72 (SE +/- 0.07, N = 3; MIN: 9.6 / MAX: 15.44)
NVIDIA RTX 4070: 10.40 (SE +/- 0.03, N = 3; MIN: 8.31 / MAX: 13.9)

LuxCoreRender

Scene: LuxCore Benchmark - Acceleration: GPU

LuxCoreRender 2.6 (M samples/sec, More Is Better)
NVIDIA RTX 4070 TI: 13.23 (SE +/- 0.01, N = 3; MIN: 5.41 / MAX: 15.13)
NVIDIA RTX 3090: 13.12 (SE +/- 0.03, N = 2; MIN: 4.85 / MAX: 15.21)
NVIDIA RTX 4070 SUPER: 12.82 (SE +/- 0.02, N = 3; MIN: 4.84 / MAX: 14.62)
NVIDIA RTX 4070: 10.92 (SE +/- 0.01, N = 3; MIN: 4.45 / MAX: 12.42)

LuxCoreRender

Scene: Rainbow Colors and Prism - Acceleration: GPU

LuxCoreRender 2.6 (M samples/sec, More Is Better)
NVIDIA RTX 3090: 33.29 (SE +/- 0.36, N = 5; MIN: 30.4 / MAX: 36.21)
NVIDIA RTX 4070 TI: 27.71 (SE +/- 0.07, N = 3; MIN: 25.01 / MAX: 29.15)
NVIDIA RTX 4070 SUPER: 27.67 (SE +/- 0.03, N = 3; MIN: 24.87 / MAX: 29.03)
NVIDIA RTX 4070: 23.26 (SE +/- 0.01, N = 3; MIN: 20.92 / MAX: 24.3)

FinanceBench

Benchmark: Black-Scholes OpenCL

FinanceBench 2016-07-25 (ms, Fewer Is Better)
NVIDIA RTX 4070 TI: 5.226 (SE +/- 0.003, N = 3)
NVIDIA RTX 3090: 5.741 (SE +/- 0.006, N = 3)
NVIDIA RTX 4070 SUPER: 5.912 (SE +/- 0.114, N = 15)
NVIDIA RTX 4070: 6.906 (SE +/- 0.003, N = 3)
1. (CXX) g++ options: -O3 -march=native -fopenmp

ViennaCL

Test: CPU BLAS - sCOPY

ViennaCL 1.7.1 (GB/s, More Is Better)
NVIDIA RTX 3090: 132 (SE +/- 1.20, N = 3)
NVIDIA RTX 4070 TI: 132 (SE +/- 0.88, N = 3)
NVIDIA RTX 4070 SUPER: 132 (SE +/- 1.20, N = 3)
NVIDIA RTX 4070: 131 (SE +/- 1.20, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - sAXPY

ViennaCL 1.7.1 (GB/s, More Is Better)
NVIDIA RTX 4070 TI: 156 (SE +/- 2.00, N = 3)
NVIDIA RTX 4070 SUPER: 156 (SE +/- 2.19, N = 3)
NVIDIA RTX 3090: 154 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070: 153 (SE +/- 4.81, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - sDOT

ViennaCL 1.7.1 (GB/s, More Is Better)
NVIDIA RTX 4070 TI: 168.0 (SE +/- 2.40, N = 3)
NVIDIA RTX 4070: 166.0 (SE +/- 3.76, N = 3)
NVIDIA RTX 4070 SUPER: 165.0 (SE +/- 2.73, N = 3)
NVIDIA RTX 3090: 132.1 (SE +/- 35.40, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dCOPY

ViennaCL 1.7.1 (GB/s, More Is Better)
NVIDIA RTX 4070 TI: 71.3 (SE +/- 0.74, N = 3)
NVIDIA RTX 4070: 71.0 (SE +/- 0.25, N = 3)
NVIDIA RTX 4070 SUPER: 70.8 (SE +/- 0.32, N = 3)
NVIDIA RTX 3090: 70.2 (SE +/- 0.72, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dAXPY

ViennaCL 1.7.1 (GB/s, More Is Better)
NVIDIA RTX 4070 TI: 87.3 (SE +/- 0.57, N = 3)
NVIDIA RTX 4070 SUPER: 87.2 (SE +/- 0.12, N = 3)
NVIDIA RTX 4070: 86.8 (SE +/- 0.44, N = 3)
NVIDIA RTX 3090: 86.2 (SE +/- 0.94, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dDOT

ViennaCL 1.7.1 (GB/s, More Is Better)
NVIDIA RTX 4070 SUPER: 96.8 (SE +/- 0.09, N = 3)
NVIDIA RTX 4070: 96.7 (SE +/- 0.22, N = 3)
NVIDIA RTX 4070 TI: 96.4 (SE +/- 0.58, N = 3)
NVIDIA RTX 3090: 95.2 (SE +/- 0.84, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dGEMV-N

ViennaCL 1.7.1 (GB/s, More Is Better)
NVIDIA RTX 3090: 103 (SE +/- 0.88, N = 3)
NVIDIA RTX 4070 TI: 103 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070: 103 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070 SUPER: 102 (SE +/- 0.33, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dGEMV-T

ViennaCL 1.7.1 (GB/s, More Is Better)
NVIDIA RTX 3090: 110.0 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070: 109.0 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070 SUPER: 109.0 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070 TI: 102.7 (SE +/- 6.30, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dGEMM-NN

ViennaCL 1.7.1 (GFLOPs/s, More Is Better)
NVIDIA RTX 4070: 122 (SE +/- 1.86, N = 3)
NVIDIA RTX 4070 SUPER: 119 (SE +/- 4.04, N = 3)
NVIDIA RTX 4070 TI: 117 (SE +/- 1.15, N = 3)
NVIDIA RTX 3090: 113 (SE +/- 1.86, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dGEMM-NT

ViennaCL 1.7.1 (GFLOPs/s, More Is Better)
NVIDIA RTX 4070: 122 (SE +/- 1.76, N = 3)
NVIDIA RTX 3090: 119 (SE +/- 3.28, N = 3)
NVIDIA RTX 4070 TI: 118 (SE +/- 1.20, N = 3)
NVIDIA RTX 4070 SUPER: 117 (SE +/- 2.08, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dGEMM-TN

ViennaCL 1.7.1 (GFLOPs/s, More Is Better)
NVIDIA RTX 4070 TI: 125 (SE +/- 2.08, N = 3)
NVIDIA RTX 3090: 121 (SE +/- 2.08, N = 3)
NVIDIA RTX 4070: 121 (SE +/- 2.31, N = 3)
NVIDIA RTX 4070 SUPER: 115 (SE +/- 1.00, N = 2)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dGEMM-TT

ViennaCL 1.7.1 (GFLOPs/s, More Is Better)
NVIDIA RTX 4070 TI: 124 (SE +/- 2.08, N = 3)
NVIDIA RTX 4070 SUPER: 122 (SE +/- 2.08, N = 3)
NVIDIA RTX 4070: 118 (SE +/- 1.20, N = 3)
NVIDIA RTX 3090: 113 (SE +/- 0.88, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - sCOPY

ViennaCL 1.7.1 (GB/s, More Is Better)
NVIDIA RTX 3090: 363 (SE +/- 1.00, N = 3)
NVIDIA RTX 4070 TI: 336 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 SUPER: 334 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070: 330 (SE +/- 0.33, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - sAXPY

ViennaCL 1.7.1 (GB/s, More Is Better)
NVIDIA RTX 3090: 498 (SE +/- 0.58, N = 3)
NVIDIA RTX 4070 TI: 393 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 SUPER: 392 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 389 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - sDOT

ViennaCL 1.7.1 (GB/s, More Is Better)
NVIDIA RTX 3090: 376 (SE +/- 0.58, N = 3)
NVIDIA RTX 4070 SUPER: 370 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI: 365 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 362 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dCOPY

ViennaCL 1.7.1 (GB/s, More Is Better)
NVIDIA RTX 3090: 605 (SE +/- 0.58, N = 3)
NVIDIA RTX 4070 TI: 424 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 423 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 SUPER: 423 (SE +/- 0.33, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dAXPY

ViennaCL 1.7.1 (GB/s, More Is Better)
NVIDIA RTX 3090: 724 (SE +/- 0.58, N = 3)
NVIDIA RTX 4070: 455 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI: 437 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070 SUPER: 437 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dDOT

ViennaCL 1.7.1 (GB/s, More Is Better)
NVIDIA RTX 3090: 659 (SE +/- 0.88, N = 3)
NVIDIA RTX 4070 SUPER: 458 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI: 457 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 456 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMV-N

ViennaCL 1.7.1 (GB/s, More Is Better)
NVIDIA RTX 4070 TI: 211 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070 SUPER: 210 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070: 209 (SE +/- 0.33, N = 3)
NVIDIA RTX 3090: 187 (SE +/- 0.33, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMV-T

ViennaCL 1.7.1 (GB/s, More Is Better)
NVIDIA RTX 4070 TI: 391
NVIDIA RTX 4070 SUPER: 389
NVIDIA RTX 4070: 387
NVIDIA RTX 3090: 374
Standard errors reported for three of the four results: SE +/- 0.33, N = 3; SE +/- 0.00, N = 3; SE +/- 0.00, N = 3
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMM-NN

ViennaCL 1.7.1 (GFLOPs/s, More Is Better)
NVIDIA RTX 4070 TI: 604 (SE +/- 0.33, N = 3)
NVIDIA RTX 3090: 592 (SE +/- 2.31, N = 3)
NVIDIA RTX 4070 SUPER: 577 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 473 (SE +/- 0.33, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMM-NT

ViennaCL 1.7.1 (GFLOPs/s, More Is Better)
NVIDIA RTX 4070 TI: 612 (SE +/- 0.33, N = 3)
NVIDIA RTX 3090: 595 (SE +/- 2.33, N = 3)
NVIDIA RTX 4070 SUPER: 584 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 477 (SE +/- 0.33, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMM-TN

ViennaCL 1.7.1 (GFLOPs/s, More Is Better)
NVIDIA RTX 4070 TI: 634 (SE +/- 0.67, N = 3)
NVIDIA RTX 4070 SUPER: 599 (SE +/- 0.00, N = 3)
NVIDIA RTX 3090: 594 (SE +/- 2.03, N = 3)
NVIDIA RTX 4070: 494 (SE +/- 0.33, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMM-TT

ViennaCL 1.7.1 (GFLOPs/s, More Is Better)
NVIDIA RTX 4070 TI: 648
NVIDIA RTX 4070 SUPER: 613
NVIDIA RTX 3090: 593
NVIDIA RTX 4070: 502
Standard errors reported for three of the four results: SE +/- 0.33, N = 3; SE +/- 0.00, N = 3; SE +/- 0.33, N = 3
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

Blender

Blend File: BMW27 - Compute: NVIDIA OptiX

Blender 4.0 (Seconds, Fewer Is Better)
NVIDIA RTX 4070 TI: 5.43 (SE +/- 0.02, N = 3)
NVIDIA RTX 4070 SUPER: 5.57 (SE +/- 0.06, N = 13)
NVIDIA RTX 4070: 6.21 (SE +/- 0.01, N = 3)
NVIDIA RTX 3090: 6.31 (SE +/- 0.06, N = 14)

Blender

Blend File: Classroom - Compute: NVIDIA OptiX

Blender 4.0 (Seconds, Fewer Is Better)
NVIDIA RTX 4070 TI: 12.30 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070 SUPER: 12.60 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 14.86 (SE +/- 0.03, N = 3)
NVIDIA RTX 3090: 15.26 (SE +/- 0.02, N = 3)

Blender

Blend File: Fishy Cat - Compute: NVIDIA OptiX

Blender 4.0 (Seconds, Fewer Is Better)
NVIDIA RTX 4070 TI: 9.02 (SE +/- 0.01, N = 3)
NVIDIA RTX 4070 SUPER: 9.45 (SE +/- 0.06, N = 13)
NVIDIA RTX 3090: 10.64 (SE +/- 0.08, N = 9)
NVIDIA RTX 4070: 11.03 (SE +/- 0.03, N = 3)

Blender

Blend File: Barbershop - Compute: NVIDIA OptiX

Blender 4.0 (Seconds, Fewer Is Better)
NVIDIA RTX 4070 TI: 50.73 (SE +/- 0.05, N = 3)
NVIDIA RTX 4070 SUPER: 51.30 (SE +/- 0.10, N = 3)
NVIDIA RTX 3090: 54.30 (SE +/- 0.02, N = 2)
NVIDIA RTX 4070: 58.44 (SE +/- 0.04, N = 3)

Blender

Blend File: Pabellon Barcelona - Compute: NVIDIA OptiX

Blender 4.0 (Seconds, Fewer Is Better)
NVIDIA RTX 4070 TI: 13.97 (SE +/- 0.01, N = 3)
NVIDIA RTX 4070 SUPER: 14.29 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070: 16.55 (SE +/- 0.01, N = 3)
NVIDIA RTX 3090: 17.30 (SE +/- 0.02, N = 3)

IndigoBench

Acceleration: OpenCL GPU - Scene: Bedroom

IndigoBench 4.4 (M samples/s, More Is Better)
NVIDIA RTX 3090: 20.96 (SE +/- 0.01, N = 3)
NVIDIA RTX 4070 TI: 20.26 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070 SUPER: 19.80 (SE +/- 0.01, N = 3)
NVIDIA RTX 4070: 18.20 (SE +/- 0.01, N = 3)

IndigoBench

Acceleration: OpenCL GPU - Scene: Supercar

IndigoBench 4.4 (M samples/s, More Is Better)
NVIDIA RTX 4070 TI: 53.59 (SE +/- 0.02, N = 3)
NVIDIA RTX 4070 SUPER: 52.81 (SE +/- 0.03, N = 3)
NVIDIA RTX 3090: 52.01 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070: 48.52 (SE +/- 0.03, N = 3)

MandelGPU

OpenCL Device: GPU

MandelGPU 1.3pts1 (Samples/sec, More Is Better)
NVIDIA RTX 4070 TI: 619106132.5 (SE +/- 1202791.77, N = 3)
NVIDIA RTX 4070 SUPER: 587219538.2 (SE +/- 467034.80, N = 3)
NVIDIA RTX 4070: 516770131.2 (SE +/- 1783157.89, N = 3)
NVIDIA RTX 3090: 484098913.8 (SE +/- 794770.01, N = 3)
1. (CC) gcc options: -O3 -lm -ftree-vectorize -funroll-loops -lglut -lOpenCL -lGL

NeatBench

Acceleration: GPU

NeatBench 5 (FPS, More Is Better)
NVIDIA RTX 4070 TI: 4070 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 4070 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 SUPER: 4070 (SE +/- 0.00, N = 3)
NVIDIA RTX 3090: 3090 (SE +/- 0.00, N = 3)

TensorFlow

Device: GPU - Batch Size: 1 - Model: VGG-16

TensorFlow 2.12 (images/sec, More Is Better)
NVIDIA RTX 3090: 1.38
NVIDIA RTX 4070 TI: 1.38
NVIDIA RTX 4070: 1.36
NVIDIA RTX 4070 SUPER: 1.35
Standard errors reported for three of the four results: SE +/- 0.01, N = 3; SE +/- 0.01, N = 3; SE +/- 0.01, N = 2

TensorFlow

Device: GPU - Batch Size: 1 - Model: AlexNet

TensorFlow 2.12 (images/sec, More Is Better)
NVIDIA RTX 4070 TI: 14.79 (SE +/- 0.06, N = 2)
NVIDIA RTX 3090: 14.45 (SE +/- 0.20, N = 15)
NVIDIA RTX 4070: 14.04 (SE +/- 0.16, N = 3)
NVIDIA RTX 4070 SUPER: 13.92 (SE +/- 0.22, N = 2)

TensorFlow

Device: GPU - Batch Size: 16 - Model: VGG-16

TensorFlow 2.12 (images/sec, More Is Better)
NVIDIA RTX 4070: 1.50
NVIDIA RTX 3090: 1.49
NVIDIA RTX 4070 TI: 1.49
NVIDIA RTX 4070 SUPER: 1.48
Standard errors reported for three of the four results: SE +/- 0.01, N = 2; SE +/- 0.00, N = 3; SE +/- 0.00, N = 2

TensorFlow

Device: GPU - Batch Size: 32 - Model: VGG-16

TensorFlow 2.12 (images/sec, More Is Better)
NVIDIA RTX 3090: 1.50 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI: 1.50 (SE +/- 0.00, N = 2)
NVIDIA RTX 4070: 1.50 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 SUPER: 1.50 (SE +/- 0.00, N = 3)

TensorFlow

Device: GPU - Batch Size: 64 - Model: VGG-16

TensorFlow 2.12 (images/sec, More Is Better)
NVIDIA RTX 3090: 1.51 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI: 1.50 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 1.50 (SE +/- 0.00, N = 3)

TensorFlow

Device: GPU - Batch Size: 16 - Model: AlexNet

TensorFlow 2.12 (images/sec, More Is Better)
NVIDIA RTX 3090: 31.98
NVIDIA RTX 4070 TI: 31.70
NVIDIA RTX 4070 SUPER: 31.59
NVIDIA RTX 4070: 31.45
Standard errors reported for three of the four results: SE +/- 0.07, N = 3; SE +/- 0.08, N = 3; SE +/- 0.17, N = 3

TensorFlow

Device: GPU - Batch Size: 256 - Model: VGG-16

TensorFlow 2.12 (images/sec, More Is Better)
NVIDIA RTX 3090: 1.51
NVIDIA RTX 4070 TI: 1.50
Standard error reported for one of the two results: SE +/- 0.00, N = 3

TensorFlow

Device: GPU - Batch Size: 32 - Model: AlexNet

TensorFlow 2.12 (images/sec, More Is Better)
NVIDIA RTX 3090: 33.53 (SE +/- 0.05, N = 3)
NVIDIA RTX 4070 SUPER: 33.40 (SE +/- 0.15, N = 2)
NVIDIA RTX 4070: 33.32 (SE +/- 0.18, N = 3)
NVIDIA RTX 4070 TI: 33.29 (SE +/- 0.04, N = 3)

TensorFlow

Device: GPU - Batch Size: 64 - Model: AlexNet

TensorFlow 2.12 (images/sec, More Is Better)
NVIDIA RTX 4070 TI: 34.06
NVIDIA RTX 4070 SUPER: 33.97
NVIDIA RTX 3090: 33.93
NVIDIA RTX 4070: 33.93
Standard errors reported for three of the four results: SE +/- 0.06, N = 3; SE +/- 0.08, N = 3; SE +/- 0.14, N = 3

TensorFlow

Device: GPU - Batch Size: 1 - Model: GoogLeNet

TensorFlow 2.12 (images/sec, More Is Better)
NVIDIA RTX 3090: 12.82 (SE +/- 0.07, N = 3)
NVIDIA RTX 4070 TI: 12.79 (SE +/- 0.30, N = 2)
NVIDIA RTX 4070: 12.78 (SE +/- 0.10, N = 3)
NVIDIA RTX 4070 SUPER: 12.62 (SE +/- 0.17, N = 2)

TensorFlow

Device: GPU - Batch Size: 1 - Model: ResNet-50

TensorFlow 2.12 (images/sec, More Is Better)
NVIDIA RTX 3090: 4.35
NVIDIA RTX 4070 SUPER: 4.35
NVIDIA RTX 4070: 4.34
NVIDIA RTX 4070 TI: 4.32
Standard errors reported for three of the four results: SE +/- 0.03, N = 3; SE +/- 0.01, N = 3; SE +/- 0.02, N = 2

TensorFlow

Device: GPU - Batch Size: 256 - Model: AlexNet

TensorFlow 2.12 (images/sec, More Is Better)
NVIDIA RTX 4070 TI: 34.61 (SE +/- 0.07, N = 2)
NVIDIA RTX 3090: 34.46 (SE +/- 0.07, N = 3)
NVIDIA RTX 4070 SUPER: 34.16 (SE +/- 0.01, N = 3)

TensorFlow

Device: GPU - Batch Size: 512 - Model: AlexNet

TensorFlow 2.12 (images/sec, More Is Better)
NVIDIA RTX 3090: 35.58 (SE +/- 0.01, N = 3)
NVIDIA RTX 4070 TI: 35.44 (SE +/- 0.09, N = 2)
NVIDIA RTX 4070: 35.21 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070 SUPER: 35.10 (SE +/- 0.02, N = 2)

TensorFlow

Device: GPU - Batch Size: 16 - Model: GoogLeNet

TensorFlow 2.12 (images/sec, More Is Better)
NVIDIA RTX 4070 TI: 15.69 (SE +/- 0.07, N = 3)
NVIDIA RTX 3090: 15.68 (SE +/- 0.05, N = 3)
NVIDIA RTX 4070 SUPER: 15.67 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070: 15.66 (SE +/- 0.03, N = 3)

TensorFlow

Device: GPU - Batch Size: 16 - Model: ResNet-50

TensorFlow 2.12 (images/sec, More Is Better)
NVIDIA RTX 3090: 5.49 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 5.49 (SE +/- 0.02, N = 3)
NVIDIA RTX 4070 TI: 5.46 (SE +/- 0.01, N = 3)
NVIDIA RTX 4070 SUPER: 5.46 (SE +/- 0.00, N = 2)

TensorFlow

Device: GPU - Batch Size: 32 - Model: GoogLeNet

TensorFlow 2.12 (images/sec, More Is Better)
NVIDIA RTX 4070 TI: 15.81 (SE +/- 0.03, N = 3)
NVIDIA RTX 3090: 15.67 (SE +/- 0.06, N = 3)
NVIDIA RTX 4070: 15.63 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070 SUPER: 15.61 (SE +/- 0.01, N = 2)

TensorFlow

Device: GPU - Batch Size: 32 - Model: ResNet-50

TensorFlow 2.12 (images/sec, More Is Better)
NVIDIA RTX 3090: 5.57 (SE +/- 0.01, N = 3)
NVIDIA RTX 4070: 5.55 (SE +/- 0.01, N = 2)
NVIDIA RTX 4070 SUPER: 5.51 (SE +/- 0.01, N = 2)
NVIDIA RTX 4070 TI: 5.50 (SE +/- 0.02, N = 3)

TensorFlow

Device: GPU - Batch Size: 64 - Model: GoogLeNet

TensorFlow 2.12 (images/sec, More Is Better)
NVIDIA RTX 3090: 15.63
NVIDIA RTX 4070: 15.54
NVIDIA RTX 4070 SUPER: 15.52
NVIDIA RTX 4070 TI: 15.50
Standard errors reported for three of the four results: SE +/- 0.08, N = 3; SE +/- 0.07, N = 3; SE +/- 0.06, N = 2

TensorFlow

Device: GPU - Batch Size: 64 - Model: ResNet-50

TensorFlow 2.12 (images/sec, More Is Better)
NVIDIA RTX 3090: 5.57 (SE +/- 0.01, N = 3)
NVIDIA RTX 4070: 5.55 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 SUPER: 5.55 (SE +/- 0.01, N = 2)
NVIDIA RTX 4070 TI: 5.53 (SE +/- 0.01, N = 2)

Libplacebo

Test: deband_heavy

Libplacebo 5.229.1 (FPS, More Is Better)
NVIDIA RTX 4070 TI: 2306.67 (SE +/- 0.56, N = 3)
NVIDIA RTX 4070 SUPER: 2186.70 (SE +/- 2.26, N = 3)
NVIDIA RTX 3090: 2024.61 (SE +/- 2.92, N = 3)
NVIDIA RTX 4070: 1847.98 (SE +/- 0.08, N = 3)
1. (CXX) g++ options: -lm -pthread -ldl -fvisibility=hidden -std=c++20 -O2 -fno-math-errno -fPIC -MD -MQ -MF

Libplacebo

Test: polar_nocompute

Libplacebo 5.229.1 (FPS, More Is Better)
NVIDIA RTX 4070 TI: 2461.23 (SE +/- 0.26, N = 3)
NVIDIA RTX 4070 SUPER: 2327.55 (SE +/- 0.24, N = 3)
NVIDIA RTX 3090: 2126.31 (SE +/- 3.45, N = 3)
NVIDIA RTX 4070: 1972.78 (SE +/- 0.16, N = 3)
1. (CXX) g++ options: -lm -pthread -ldl -fvisibility=hidden -std=c++20 -O2 -fno-math-errno -fPIC -MD -MQ -MF

Libplacebo

Test: hdr_peakdetect

Libplacebo 5.229.1 (FPS, More Is Better)
NVIDIA RTX 3090: 5104.10 (SE +/- 13.97, N = 3)
NVIDIA RTX 4070 TI: 3544.60 (SE +/- 99.97, N = 3)
NVIDIA RTX 4070: 3452.43 (SE +/- 144.09, N = 3)
NVIDIA RTX 4070 SUPER: 3292.37 (SE +/- 3.65, N = 3)
1. (CXX) g++ options: -lm -pthread -ldl -fvisibility=hidden -std=c++20 -O2 -fno-math-errno -fPIC -MD -MQ -MF

Libplacebo

Test: hdr_lut

Libplacebo 5.229.1 (FPS, More Is Better)
NVIDIA RTX 4070 TI: 3976.04 (SE +/- 33.96, N = 3)
NVIDIA RTX 4070: 3946.90 (SE +/- 6.47, N = 3)
NVIDIA RTX 4070 SUPER: 3905.98 (SE +/- 12.09, N = 3)
NVIDIA RTX 3090: 3376.85 (SE +/- 22.23, N = 3)
1. (CXX) g++ options: -lm -pthread -ldl -fvisibility=hidden -std=c++20 -O2 -fno-math-errno -fPIC -MD -MQ -MF

Libplacebo

Test: av1_grain_lap

Libplacebo 5.229.1 (FPS, More Is Better)
NVIDIA RTX 4070 SUPER: 4171.00 (SE +/- 5.52, N = 3)
NVIDIA RTX 4070: 4152.41 (SE +/- 16.20, N = 3)
NVIDIA RTX 4070 TI: 4143.96 (SE +/- 35.33, N = 3)
NVIDIA RTX 3090: 4126.89 (SE +/- 12.99, N = 3)
1. (CXX) g++ options: -lm -pthread -ldl -fvisibility=hidden -std=c++20 -O2 -fno-math-errno -fPIC -MD -MQ -MF

NCNN

Target: Vulkan GPU - Model: mobilenet

NCNN 20230517 (ms, Fewer Is Better)
NVIDIA RTX 3090: 6.92 (SE +/- 0.22, N = 9; MIN: 6.06 / MAX: 8.65)
NVIDIA RTX 4070: 7.20 (SE +/- 0.21, N = 12; MIN: 6.2 / MAX: 11.13)
NVIDIA RTX 4070 TI: 7.45 (SE +/- 0.25, N = 12; MIN: 6.87 / MAX: 734.65)
NVIDIA RTX 4070 SUPER: 8.62 (SE +/- 0.47, N = 9; MIN: 6.42 / MAX: 1101.3)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU-v2-v2 - Model: mobilenet-v2

NCNN 20230517 (ms, Fewer Is Better)
NVIDIA RTX 3090: 2.34 (SE +/- 0.15, N = 3; MIN: 2.04 / MAX: 2.63)
NVIDIA RTX 4070 TI: 2.43 (SE +/- 0.07, N = 9; MIN: 2.09 / MAX: 5.8)
NVIDIA RTX 4070: 2.48 (SE +/- 0.07, N = 12; MIN: 2.02 / MAX: 5.82)
NVIDIA RTX 4070 SUPER: 3.03 (SE +/- 0.44, N = 9; MIN: 2.38 / MAX: 970.87)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU-v3-v3 - Model: mobilenet-v3

NCNN 20230517 (ms, Fewer Is Better)
NVIDIA RTX 4070 TI: 2.09 (SE +/- 0.09, N = 9; MIN: 1.78 / MAX: 2.85)
NVIDIA RTX 4070: 2.15 (SE +/- 0.08, N = 12; MIN: 1.81 / MAX: 2.58)
NVIDIA RTX 3090: 2.20 (SE +/- 0.09, N = 9; MIN: 1.91 / MAX: 2.71)
NVIDIA RTX 4070 SUPER: 2.25 (SE +/- 0.16, N = 9; MIN: 1.75 / MAX: 343.7)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: shufflenet-v2

NCNN 20230517 (ms, Fewer Is Better)
NVIDIA RTX 4070 TI: 2.01 (SE +/- 0.08, N = 12; MIN: 1.73 / MAX: 3.86)
NVIDIA RTX 3090: 2.04 (SE +/- 0.21, N = 3; MIN: 1.8 / MAX: 5.8)
NVIDIA RTX 4070: 2.08 (SE +/- 0.09, N = 11; MIN: 1.82 / MAX: 2.59)
NVIDIA RTX 4070 SUPER: 2.31 (SE +/- 0.34, N = 8; MIN: 1.76 / MAX: 421.42)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: mnasnet

NCNN 20230517 (ms, Fewer Is Better)
NVIDIA RTX 3090: 2.16 (SE +/- 0.14, N = 3; MIN: 2.01 / MAX: 2.55)
NVIDIA RTX 4070: 2.22 (SE +/- 0.08, N = 8; MIN: 1.83 / MAX: 2.54)
NVIDIA RTX 4070 TI: 2.30 (SE +/- 0.05, N = 9; MIN: 2.15 / MAX: 2.58)
NVIDIA RTX 4070 SUPER: 3.85 (SE +/- 1.31, N = 9; MIN: 1.89 / MAX: 1093.29)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: efficientnet-b0

NCNN 20230517 (ms, Fewer Is Better)
NVIDIA RTX 3090: 3.34 (SE +/- 0.17, N = 3; MIN: 3.14 / MAX: 4)
NVIDIA RTX 4070: 3.46 (SE +/- 0.09, N = 9; MIN: 2.91 / MAX: 3.79)
NVIDIA RTX 4070 TI: 3.46 (SE +/- 0.07, N = 12; MIN: 3.13 / MAX: 7.03)
NVIDIA RTX 4070 SUPER: 5.07 (SE +/- 0.97, N = 9; MIN: 3.22 / MAX: 1124.2)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: blazeface

NCNN 20230517 (ms, Fewer Is Better)
NVIDIA RTX 4070 TI: 0.81 (SE +/- 0.03, N = 9; MIN: 0.61 / MAX: 1.19)
NVIDIA RTX 4070 SUPER: 0.84 (SE +/- 0.04, N = 9; MIN: 0.65 / MAX: 4.63)
NVIDIA RTX 4070: 0.84 (SE +/- 0.03, N = 9; MIN: 0.64 / MAX: 0.96)
NVIDIA RTX 3090: 0.84 (SE +/- 0.03, N = 9; MIN: 0.63 / MAX: 1.13)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: googlenet

NCNN 20230517 (ms, Fewer Is Better)
NVIDIA RTX 4070 TI: 5.87 (SE +/- 0.14, N = 9; MIN: 5.2 / MAX: 6.88)
NVIDIA RTX 4070: 6.06 (SE +/- 0.14, N = 9; MIN: 5.33 / MAX: 8.36)
NVIDIA RTX 3090: 6.11 (SE +/- 0.18, N = 9; MIN: 5.25 / MAX: 9.16)
NVIDIA RTX 4070 SUPER: 11.04 (SE +/- 1.21, N = 9; MIN: 5.28 / MAX: 1769.19)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: vgg16

NCNN 20230517 (ms, Fewer Is Better)
NVIDIA RTX 3090: 17.88 (SE +/- 0.25, N = 3; MIN: 17.3 / MAX: 18.57)
NVIDIA RTX 4070 TI: 32.05 (SE +/- 11.81, N = 9; MIN: 17.34 / MAX: 644.35)
NVIDIA RTX 4070: 45.52 (SE +/- 13.24, N = 12; MIN: 17.49 / MAX: 643.35)
NVIDIA RTX 4070 SUPER: 117.81 (SE +/- 29.60, N = 9; MIN: 17.16 / MAX: 647.67)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: resnet18

NCNN 20230517 (ms, Fewer Is Better)
NVIDIA RTX 3090: 4.12 (SE +/- 0.07, N = 3; MIN: 3.97 / MAX: 4.51)
NVIDIA RTX 4070: 5.11 (SE +/- 0.73, N = 12; MIN: 3.99 / MAX: 916.69)
NVIDIA RTX 4070 TI: 5.47 (SE +/- 1.33, N = 9; MIN: 3.95 / MAX: 726.67)
NVIDIA RTX 4070 SUPER: 8.97 (SE +/- 3.49, N = 9; MIN: 3.94 / MAX: 922.04)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: alexnet

NCNN 20230517 (ms, Fewer Is Better)
NVIDIA RTX 3090: 3.60 (SE +/- 0.09, N = 3; MIN: 3.44 / MAX: 3.79)
NVIDIA RTX 4070 TI: 3.74 (SE +/- 0.03, N = 9; MIN: 3.61 / MAX: 3.98)
NVIDIA RTX 4070: 5.78 (SE +/- 1.70, N = 12; MIN: 3.6 / MAX: 397.75)
NVIDIA RTX 4070 SUPER: 16.17 (SE +/- 5.86, N = 9; MIN: 3.52 / MAX: 436.52)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: resnet50

NCNN 20230517 (ms, Fewer Is Better)
NVIDIA RTX 3090: 8.20 (SE +/- 0.12, N = 9; MIN: 7.69 / MAX: 11.69)
NVIDIA RTX 4070: 8.24 (SE +/- 0.10, N = 9; MIN: 7.87 / MAX: 9.87)
NVIDIA RTX 4070 TI: 12.25 (SE +/- 4.00, N = 12; MIN: 8 / MAX: 1777.17)
NVIDIA RTX 4070 SUPER: 46.26 (SE +/- 14.70, N = 9; MIN: 7.71 / MAX: 1829.99)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: yolov4-tiny

NCNN 20230517 (ms, Fewer Is Better)
NVIDIA RTX 3090: 11.29 (SE +/- 0.21, N = 3; MIN: 10.82 / MAX: 11.93)
NVIDIA RTX 4070 TI: 16.37 (SE +/- 3.10, N = 12; MIN: 10.57 / MAX: 855.36)
NVIDIA RTX 4070: 20.74 (SE +/- 5.37, N = 12; MIN: 10.3 / MAX: 854.36)
NVIDIA RTX 4070 SUPER: 63.82 (SE +/- 10.56, N = 9; MIN: 10.28 / MAX: 858.44)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: squeezenet_ssd

NCNN 20230517 (ms, Fewer Is Better)
NVIDIA RTX 3090: 4.90 (SE +/- 0.19, N = 3; MIN: 4.47 / MAX: 5.27)
NVIDIA RTX 4070: 5.18 (SE +/- 0.12, N = 12; MIN: 4.67 / MAX: 6.88)
NVIDIA RTX 4070 TI: 5.36 (SE +/- 0.29, N = 9; MIN: 4.55 / MAX: 496.3)
NVIDIA RTX 4070 SUPER: 6.86 (SE +/- 1.76, N = 9; MIN: 4.34 / MAX: 1630.01)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: regnety_400m

NCNN 20230517 (ms, Fewer Is Better)
NVIDIA RTX 4070 TI: 5.89 (SE +/- 0.18, N = 12; MIN: 5.42 / MAX: 7.57)
NVIDIA RTX 4070: 6.21 (SE +/- 0.24, N = 12; MIN: 5.53 / MAX: 8.99)
NVIDIA RTX 3090: 6.47 (SE +/- 0.32, N = 8; MIN: 5.44 / MAX: 9.3)
NVIDIA RTX 4070 SUPER: 11.11 (SE +/- 3.28, N = 9; MIN: 5.49 / MAX: 4942.19)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: vision_transformer

NCNN 20230517 (ms, Fewer Is Better)
NVIDIA RTX 4070: 281.56 (SE +/- 61.31, N = 9; MIN: 46.48 / MAX: 1913.33)
NVIDIA RTX 3090: 327.82 (SE +/- 52.80, N = 9; MIN: 46.48 / MAX: 1816.93)
NVIDIA RTX 4070 TI: 390.18 (SE +/- 25.65, N = 9; MIN: 46.49 / MAX: 1816.77)
NVIDIA RTX 4070 SUPER: 844.61 (SE +/- 87.53, N = 9; MIN: 46.34 / MAX: 1866.93)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: FastestDet

NCNN 20230517 (ms, Fewer Is Better)
NVIDIA RTX 4070: 2.34 (SE +/- 0.10, N = 9; MIN: 2 / MAX: 3.86)
NVIDIA RTX 3090: 2.50 (SE +/- 0.08, N = 8; MIN: 2.1 / MAX: 32.36)
NVIDIA RTX 4070 TI: 2.84 (SE +/- 0.12, N = 8; MIN: 2.4 / MAX: 5.07)
NVIDIA RTX 4070 SUPER: 2.86 (SE +/- 0.29, N = 9; MIN: 2.17 / MAX: 577.17)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

vkpeak

fp32-scalar

vkpeak 20230730 (GFLOPS, More Is Better)
NVIDIA RTX 3090: 20353.95 (SE +/- 123.67, N = 3)

vkpeak

fp32-vec4

vkpeak 20230730 (GFLOPS, More Is Better)
NVIDIA RTX 3090: 26767.21 (SE +/- 206.35, N = 3)

vkpeak

fp16-scalar

vkpeak 20230730 (GFLOPS, More Is Better)
NVIDIA RTX 3090: 20151.44 (SE +/- 34.29, N = 3)

vkpeak

fp16-vec4

vkpeak 20230730 (GFLOPS, More Is Better)
NVIDIA RTX 3090: 39860.80 (SE +/- 12.74, N = 3)

vkpeak

fp64-scalar

vkpeak 20230730 (GFLOPS, More Is Better)
NVIDIA RTX 3090: 638.84 (SE +/- 0.06, N = 3)

vkpeak

fp64-vec4

vkpeak 20230730 (GFLOPS, More Is Better)
NVIDIA RTX 3090: 639.52 (SE +/- 0.72, N = 3)

vkpeak

int32-scalar

vkpeak 20230730 (GIOPS, More Is Better)
NVIDIA RTX 3090: 20315.10 (SE +/- 16.76, N = 3)

vkpeak

int32-vec4

vkpeak 20230730 (GIOPS, More Is Better)
NVIDIA RTX 3090: 20017.06 (SE +/- 15.96, N = 3)

vkpeak

int16-scalar

vkpeak 20230730 (GIOPS, More Is Better)
NVIDIA RTX 3090: 13273.53 (SE +/- 10.34, N = 3)

vkpeak

int16-vec4

vkpeak 20230730 (GIOPS, More Is Better)
NVIDIA RTX 3090: 16338.23 (SE +/- 9.58, N = 3)


Phoronix Test Suite v10.8.5