RTX 4070 SUPER

Intel Core i9-13900K testing with an ASUS TUF GAMING Z790-PRO WIFI (1401 BIOS) and MSI NVIDIA GeForce RTX 4070 12GB on EndeavourOS rolling via the Phoronix Test Suite.

HTML result view exported from: https://openbenchmarking.org/result/2401299-NE-2401275NE83&grs&sro.

System configuration (both runs share these components except where two values are listed):
Processor: Intel Core i9-13900K @ 5.50GHz (24 Cores / 32 Threads)
Motherboard: ASUS TUF GAMING Z790-PRO WIFI (1401 BIOS)
Chipset: Intel Device 7a27
Memory: 32GB
Disk: 4001GB Seagate ZP4000GP304001
Graphics: ASUS NVIDIA GeForce RTX 4070 SUPER 12GB (NVIDIA RTX 4070 SUPER); MSI NVIDIA GeForce RTX 4070 12GB (NVIDIA RTX 4070)
Audio: Realtek ALC1220
Monitor: ARZOPA
Network: Intel I226-V + Intel Device 7a70
OS: EndeavourOS rolling
Kernel: 6.7.1-arch1-1 (x86_64)
Desktop: KDE Plasma 5.27.10
Display Server: X Server 1.21.1.11
Display Driver: NVIDIA 550.40.07
OpenGL: 4.6.0
OpenCL: OpenCL 3.0 CUDA 12.4.74
Compiler: GCC 13.2.1 20230801 (NVIDIA RTX 4070 SUPER); GCC 13.2.1 20230801 + CUDA 12.3 (NVIDIA RTX 4070)
File-System: ext4
Screen Resolution: 1920x1080

Kernel Details
- Transparent Huge Pages: always

Compiler Details
- --disable-libssp --disable-libstdcxx-pch --disable-werror --enable-__cxa_atexit --enable-bootstrap --enable-cet=auto --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-default-ssp --enable-gnu-indirect-function --enable-gnu-unique-object --enable-languages=ada,c,c++,d,fortran,go,lto,objc,obj-c++ --enable-libstdcxx-backtrace --enable-link-serialization=1 --enable-lto --enable-multilib --enable-plugin --enable-shared --enable-threads=posix --mandir=/usr/share/man --with-build-config=bootstrap-lto --with-linker-hash-style=gnu

Processor Details
- Scaling Governor: intel_pstate powersave (EPP: balance_performance)
- CPU Microcode: 0x11d

Graphics Details
- NVIDIA RTX 4070 SUPER: BAR1 / Visible vRAM Size: 16384 MiB, vBIOS Version: 95.04.69.00.c1
- NVIDIA RTX 4070: BAR1 / Visible vRAM Size: 16384 MiB, vBIOS Version: 95.04.3e.40.2a

Security Details
- gather_data_sampling: Not affected
- itlb_multihit: Not affected
- l1tf: Not affected
- mds: Not affected
- meltdown: Not affected
- mmio_stale_data: Not affected
- retbleed: Not affected
- spec_rstack_overflow: Not affected
- spec_store_bypass: Mitigation of SSB disabled via prctl
- spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
- spectre_v2: Mitigation of Enhanced / Automatic IBRS IBPB: conditional RSB filling PBRSB-eIBRS: SW sequence
- srbds: Not affected
- tsx_async_abort: Not affected

Environment Details
- NVIDIA RTX 4070: NVCC_PREPEND_FLAGS="-ccbin /opt/cuda/bin"

Python Details
- NVIDIA RTX 4070: Python 3.11.6

Result overview: NVIDIA RTX 4070 SUPER vs. NVIDIA RTX 4070 across all tests in this comparison (summary table; individual results follow).

clpeak

OpenCL Test: Integer Compute INT

GIOPS, More Is Better (clpeak 1.1.2)
NVIDIA RTX 4070: 14555.19 (SE +/- 15.26, N = 3)
NVIDIA RTX 4070 SUPER: 18170.54 (SE +/- 3.14, N = 3)
1. (CXX) g++ options: -O3
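For "More Is Better" metrics such as GIOPS, the relative standing of the two cards is simply the ratio of their mean results. A minimal Python sketch using the clpeak Integer Compute INT values above; the helper name `relative_gain` is illustrative and not part of the Phoronix Test Suite:

```python
def relative_gain(baseline: float, candidate: float) -> float:
    """Return candidate / baseline for a metric where higher is better."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return candidate / baseline


# clpeak 1.1.2, OpenCL Test: Integer Compute INT (GIOPS), values from this page
rtx_4070 = 14555.19
rtx_4070_super = 18170.54

ratio = relative_gain(rtx_4070, rtx_4070_super)
print(f"RTX 4070 SUPER / RTX 4070 = {ratio:.3f} ({(ratio - 1) * 100:.1f}% higher)")
```

The same helper applies to any "More Is Better" result on this page; time-based "Fewer Is Better" results need the ratio inverted.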

clpeak

OpenCL Test: Single-Precision Float

GFLOPS, More Is Better (clpeak 1.1.2)
NVIDIA RTX 4070: 28479.39 (SE +/- 5.46, N = 3)
NVIDIA RTX 4070 SUPER: 35492.69 (SE +/- 0.99, N = 3)
1. (CXX) g++ options: -O3

RealSR-NCNN

Scale: 4x - TAA: Yes

Seconds, Fewer Is Better (RealSR-NCNN 20200818)
NVIDIA RTX 4070: 42.85 (SE +/- 0.23, N = 3)
NVIDIA RTX 4070 SUPER: 34.89 (SE +/- 0.02, N = 3)
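Time-based results like this one are "Fewer Is Better", so the speedup is baseline time divided by candidate time, which is not the same number as the percentage of time saved. A small Python sketch using the RealSR-NCNN 4x TAA values above; `speedup` and `time_saved` are illustrative helper names:

```python
def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """How many times faster the candidate is, for a lower-is-better time."""
    if candidate_seconds <= 0:
        raise ValueError("candidate time must be positive")
    return baseline_seconds / candidate_seconds


def time_saved(baseline_seconds: float, candidate_seconds: float) -> float:
    """Fraction of the baseline run time that the candidate saves."""
    return 1.0 - candidate_seconds / baseline_seconds


# RealSR-NCNN 20200818, Scale: 4x - TAA: Yes (seconds), values from this page
rtx_4070 = 42.85
rtx_4070_super = 34.89

print(f"speedup: {speedup(rtx_4070, rtx_4070_super):.3f}x")
print(f"time saved: {time_saved(rtx_4070, rtx_4070_super) * 100:.1f}%")
```

Here a roughly 1.23x speedup corresponds to about 18.6% less wall-clock time; quoting one as if it were the other misstates the gap.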

ViennaCL

Test: OpenCL BLAS - dGEMM-NT

GFLOPs/s, More Is Better (ViennaCL 1.7.1)
NVIDIA RTX 4070: 477 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070 SUPER: 584 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ProjectPhysX OpenCL-Benchmark

Operation: INT64 Compute

TIOPs/s, More Is Better (ProjectPhysX OpenCL-Benchmark 1.2)
NVIDIA RTX 4070: 3.443 (SE +/- 0.004, N = 3)
NVIDIA RTX 4070 SUPER: 4.214 (SE +/- 0.015, N = 3)
1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

clpeak

OpenCL Test: Double-Precision Double

GFLOPS, More Is Better (clpeak 1.1.2)
NVIDIA RTX 4070: 515.17 (SE +/- 0.21, N = 3)
NVIDIA RTX 4070 SUPER: 630.11 (SE +/- 0.98, N = 3)
1. (CXX) g++ options: -O3

VkResample

Upscale: 2x - Precision: Double

ms, Fewer Is Better (VkResample 1.0)
NVIDIA RTX 4070: 415.16 (SE +/- 0.77, N = 3)
NVIDIA RTX 4070 SUPER: 339.59 (SE +/- 0.30, N = 3)
1. (CXX) g++ options: -O3

ViennaCL

Test: OpenCL BLAS - dGEMM-TT

GFLOPs/s, More Is Better (ViennaCL 1.7.1)
NVIDIA RTX 4070: 502 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070 SUPER: 613 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

GpuOwl

Exponent: 332220523

Iterations / Second, More Is Better (GpuOwl 7.2.1)
NVIDIA RTX 4070: 112.61 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 SUPER: 137.44 (SE +/- 0.00, N = 3)

ViennaCL

Test: OpenCL BLAS - dGEMM-NN

GFLOPs/s, More Is Better (ViennaCL 1.7.1)
NVIDIA RTX 4070: 473 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070 SUPER: 577 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

GpuOwl

Exponent: 77936867

Iterations / Second, More Is Better (GpuOwl 7.2.1)
NVIDIA RTX 4070: 530.32 (SE +/- 0.09, N = 3)
NVIDIA RTX 4070 SUPER: 646.41 (SE +/- 0.00, N = 3)

ProjectPhysX OpenCL-Benchmark

Operation: FP64 Compute

TFLOPs/s, More Is Better (ProjectPhysX OpenCL-Benchmark 1.2)
NVIDIA RTX 4070: 0.510 (SE +/- 0.001, N = 3)
NVIDIA RTX 4070 SUPER: 0.621 (SE +/- 0.000, N = 3)
1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

Hashcat

Benchmark: SHA1

H/s, More Is Better (Hashcat 6.2.4)
NVIDIA RTX 4070: 18202466667 (SE +/- 6318315.53, N = 3)
NVIDIA RTX 4070 SUPER: 22132600000 (SE +/- 5140363.15, N = 3)

GpuOwl

Exponent: 57885161

Iterations / Second, More Is Better (GpuOwl 7.2.1)
NVIDIA RTX 4070: 714.80 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 SUPER: 869.07 (SE +/- 1.26, N = 3)

ProjectPhysX OpenCL-Benchmark

Operation: FP32 Compute

TFLOPs/s, More Is Better (ProjectPhysX OpenCL-Benchmark 1.2)
NVIDIA RTX 4070: 31.77 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070 SUPER: 38.59 (SE +/- 0.03, N = 3)
1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL
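Dividing each card's FP32 result by its FP64 result gives the measured FP32:FP64 throughput ratio. The sketch below uses only the ProjectPhysX OpenCL-Benchmark numbers from this page; the 1:64 reference rate is the figure commonly quoted for consumer Ada Lovelace GPUs and is an assumption, not something this result file reports:

```python
# ProjectPhysX OpenCL-Benchmark 1.2 results from this page (TFLOPs/s)
results = {
    "NVIDIA RTX 4070": {"fp32": 31.77, "fp64": 0.510},
    "NVIDIA RTX 4070 SUPER": {"fp32": 38.59, "fp64": 0.621},
}

# Assumption: consumer Ada Lovelace GPUs are commonly quoted at 1:64 FP64:FP32.
QUOTED_RATIO = 64

ratios = {}
for gpu, r in results.items():
    # Measured FP32 throughput divided by measured FP64 throughput
    ratios[gpu] = r["fp32"] / r["fp64"]
    print(f"{gpu}: FP32/FP64 = {ratios[gpu]:.1f} (quoted: {QUOTED_RATIO})")
```

Both cards land at roughly 62:1, which is why the double-precision tests on this page (clpeak Double, ViennaCL dGEMM) report far lower absolute numbers than their single-precision counterparts.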

Hashcat

Benchmark: TrueCrypt RIPEMD160 + XTS

H/s, More Is Better (Hashcat 6.2.4)
NVIDIA RTX 4070: 660967 (SE +/- 176.38, N = 3)
NVIDIA RTX 4070 SUPER: 802967 (SE +/- 633.33, N = 3)

ProjectPhysX OpenCL-Benchmark

Operation: INT32 Compute

TIOPs/s, More Is Better (ProjectPhysX OpenCL-Benchmark 1.2)
NVIDIA RTX 4070: 16.38 (SE +/- 0.02, N = 3)
NVIDIA RTX 4070 SUPER: 19.89 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMM-TN

GFLOPs/s, More Is Better (ViennaCL 1.7.1)
NVIDIA RTX 4070: 494 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070 SUPER: 599 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

Hashcat

Benchmark: SHA-512

H/s, More Is Better (Hashcat 6.2.4)
NVIDIA RTX 4070: 2673300000 (SE +/- 1059874.21, N = 3)
NVIDIA RTX 4070 SUPER: 3232733333 (SE +/- 1530068.99, N = 3)

Hashcat

Benchmark: 7-Zip

H/s, More Is Better (Hashcat 6.2.4)
NVIDIA RTX 4070: 976967 (SE +/- 2062.63, N = 3)
NVIDIA RTX 4070 SUPER: 1176467 (SE +/- 1991.93, N = 3)

Hashcat

Benchmark: MD5

H/s, More Is Better (Hashcat 6.2.4)
NVIDIA RTX 4070: 56147866667 (SE +/- 33772046.30, N = 3)
NVIDIA RTX 4070 SUPER: 67583033333 (SE +/- 22430807.19, N = 3)

ProjectPhysX OpenCL-Benchmark

Operation: INT16 Compute

TIOPs/s, More Is Better (ProjectPhysX OpenCL-Benchmark 1.2)
NVIDIA RTX 4070: 14.28 (SE +/- 0.02, N = 3)
NVIDIA RTX 4070 SUPER: 17.17 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

LuxCoreRender

Scene: Rainbow Colors and Prism - Acceleration: GPU

M samples/sec, More Is Better (LuxCoreRender 2.6)
NVIDIA RTX 4070: 23.26 (SE +/- 0.01, N = 3; MIN: 20.92 / MAX: 24.3)
NVIDIA RTX 4070 SUPER: 27.67 (SE +/- 0.03, N = 3; MIN: 24.87 / MAX: 29.03)

LuxCoreRender

Scene: Danish Mood - Acceleration: GPU

M samples/sec, More Is Better (LuxCoreRender 2.6)
NVIDIA RTX 4070: 8.89 (SE +/- 0.06, N = 3; MIN: 3.32 / MAX: 10.26)
NVIDIA RTX 4070 SUPER: 10.56 (SE +/- 0.08, N = 3; MIN: 3.7 / MAX: 12.17)

Libplacebo

Test: deband_heavy

FPS, More Is Better (Libplacebo 5.229.1)
NVIDIA RTX 4070: 1843.26 (SE +/- 0.12, N = 3)
NVIDIA RTX 4070 SUPER: 2186.70 (SE +/- 2.26, N = 3)
1. (CXX) g++ options: -lm -pthread -ldl -fvisibility=hidden -std=c++20 -O2 -fno-math-errno -fPIC -MD -MQ -MF

Libplacebo

Test: polar_nocompute

FPS, More Is Better (Libplacebo 5.229.1)
NVIDIA RTX 4070: 1968.37 (SE +/- 0.01, N = 3)
NVIDIA RTX 4070 SUPER: 2327.55 (SE +/- 0.24, N = 3)
1. (CXX) g++ options: -lm -pthread -ldl -fvisibility=hidden -std=c++20 -O2 -fno-math-errno -fPIC -MD -MQ -MF

ProjectPhysX OpenCL-Benchmark

Operation: INT8 Compute

TIOPs/s, More Is Better (ProjectPhysX OpenCL-Benchmark 1.2)
NVIDIA RTX 4070: 12.12 (SE +/- 0.02, N = 3)
NVIDIA RTX 4070 SUPER: 14.31 (SE +/- 0.05, N = 3)
1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

Blender

Blend File: Classroom - Compute: NVIDIA OptiX

Seconds, Fewer Is Better (Blender 4.0)
NVIDIA RTX 4070: 14.86 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070 SUPER: 12.60 (SE +/- 0.00, N = 3)

Rodinia

Test: OpenCL Particle Filter

Seconds, Fewer Is Better (Rodinia 3.1)
NVIDIA RTX 4070: 4.098 (SE +/- 0.008, N = 3)
NVIDIA RTX 4070 SUPER: 3.480 (SE +/- 0.039, N = 4)
1. (CXX) g++ options: -O2 -lOpenCL

LuxCoreRender

Scene: LuxCore Benchmark - Acceleration: GPU

M samples/sec, More Is Better (LuxCoreRender 2.6)
NVIDIA RTX 4070: 10.92 (SE +/- 0.01, N = 3; MIN: 4.45 / MAX: 12.42)
NVIDIA RTX 4070 SUPER: 12.82 (SE +/- 0.02, N = 3; MIN: 4.84 / MAX: 14.62)

Blender

Blend File: Fishy Cat - Compute: NVIDIA OptiX

Seconds, Fewer Is Better (Blender 4.0)
NVIDIA RTX 4070: 11.03 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070 SUPER: 9.45 (SE +/- 0.06, N = 13)

VkFFT

Test: FFT + iFFT R2C / C2R

Benchmark Score, More Is Better (VkFFT 1.2.31)
NVIDIA RTX 4070: 47097 (SE +/- 745.02, N = 13)
NVIDIA RTX 4070 SUPER: 54794 (SE +/- 702.53, N = 15)
1. (CXX) g++ options: -O3 -lrt

Blender

Blend File: Pabellon Barcelona - Compute: NVIDIA OptiX

Seconds, Fewer Is Better (Blender 4.0)
NVIDIA RTX 4070: 16.55 (SE +/- 0.01, N = 3)
NVIDIA RTX 4070 SUPER: 14.29 (SE +/- 0.03, N = 3)

LuxCoreRender

Scene: DLSC - Acceleration: GPU

M samples/sec, More Is Better (LuxCoreRender 2.6)
NVIDIA RTX 4070: 11.74 (SE +/- 0.01, N = 3; MIN: 11.35 / MAX: 11.83)
NVIDIA RTX 4070 SUPER: 13.59 (SE +/- 0.01, N = 3; MIN: 12.52 / MAX: 13.84)

FAHBench

Ns Per Day, More Is Better (FAHBench 2.3.2)
NVIDIA RTX 4070: 317.20 (SE +/- 0.12, N = 3)
NVIDIA RTX 4070 SUPER: 366.06 (SE +/- 0.39, N = 3)

VkFFT

Test: FFT + iFFT C2C Bluestein benchmark in double precision

Benchmark Score, More Is Better (VkFFT 1.2.31)
NVIDIA RTX 4070: 3886 (SE +/- 4.51, N = 3)
NVIDIA RTX 4070 SUPER: 4451 (SE +/- 12.55, N = 3)
1. (CXX) g++ options: -O3 -lrt

Blender

Blend File: Barbershop - Compute: NVIDIA OptiX

Seconds, Fewer Is Better (Blender 4.0)
NVIDIA RTX 4070: 58.44 (SE +/- 0.04, N = 3)
NVIDIA RTX 4070 SUPER: 51.30 (SE +/- 0.10, N = 3)

MandelGPU

OpenCL Device: GPU

Samples/sec, More Is Better (MandelGPU 1.3pts1)
NVIDIA RTX 4070: 516770131.2 (SE +/- 1783157.89, N = 3)
NVIDIA RTX 4070 SUPER: 587219538.2 (SE +/- 467034.80, N = 3)
1. (CC) gcc options: -O3 -lm -ftree-vectorize -funroll-loops -lglut -lOpenCL -lGL

LuxCoreRender

Scene: Orange Juice - Acceleration: GPU

M samples/sec, More Is Better (LuxCoreRender 2.6)
NVIDIA RTX 4070: 10.40 (SE +/- 0.03, N = 3; MIN: 8.31 / MAX: 13.9)
NVIDIA RTX 4070 SUPER: 11.72 (SE +/- 0.07, N = 3; MIN: 9.6 / MAX: 15.44)

Blender

Blend File: BMW27 - Compute: NVIDIA OptiX

Seconds, Fewer Is Better (Blender 4.0)
NVIDIA RTX 4070: 6.21 (SE +/- 0.01, N = 3)
NVIDIA RTX 4070 SUPER: 5.57 (SE +/- 0.06, N = 13)

OctaneBench

Total Score

Score, More Is Better (OctaneBench 2020.1)
NVIDIA RTX 4070: 648.00
NVIDIA RTX 4070 SUPER: 720.97

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 16 - Model: ResNet-50

batches/sec, More Is Better (PyTorch 2.1)
NVIDIA RTX 4070: 458.39 (MIN: 404.5 / MAX: 461.01)
NVIDIA RTX 4070 SUPER: 509.45 (MIN: 430.1 / MAX: 516.48)
SE +/- 0.26, N = 3 (reported for only one of the two results)

Waifu2x-NCNN Vulkan

Scale: 2x - Denoise: 3 - TAA: Yes

Seconds, Fewer Is Better (Waifu2x-NCNN Vulkan 20200818)
NVIDIA RTX 4070: 3.168 (SE +/- 0.028, N = 3)
NVIDIA RTX 4070 SUPER: 2.855 (SE +/- 0.014, N = 3)

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 64 - Model: ResNet-50

batches/sec, More Is Better (PyTorch 2.1)
NVIDIA RTX 4070: 458.36 (SE +/- 0.27, N = 3; MIN: 404.89 / MAX: 461.01)
NVIDIA RTX 4070 SUPER: 507.45 (SE +/- 0.92, N = 3; MIN: 423.41 / MAX: 512.88)

VkFFT

Test: FFT + iFFT C2C Bluestein in single precision

Benchmark Score, More Is Better (VkFFT 1.2.31)
NVIDIA RTX 4070: 13714 (SE +/- 52.09, N = 3)
NVIDIA RTX 4070 SUPER: 15166 (SE +/- 102.52, N = 3)
1. (CXX) g++ options: -O3 -lrt

NAMD CUDA

ATPase Simulation - 327,506 Atoms

days/ns, Fewer Is Better (NAMD CUDA 2.14)
NVIDIA RTX 4070: 0.07498 (SE +/- 0.00021, N = 3)
NVIDIA RTX 4070 SUPER: 0.06791 (SE +/- 0.00031, N = 3)
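NAMD reports days/ns, a "Fewer Is Better" figure, while molecular-dynamics throughput is more often quoted as ns/day, which is simply the reciprocal. A minimal conversion using the ATPase results above; `ns_per_day` is an illustrative helper name:

```python
def ns_per_day(days_per_ns: float) -> float:
    """Convert NAMD's days/ns (lower is better) to ns/day (higher is better)."""
    if days_per_ns <= 0:
        raise ValueError("days/ns must be positive")
    return 1.0 / days_per_ns


# NAMD CUDA 2.14, ATPase Simulation - 327,506 Atoms (days/ns), values from this page
namd = {"NVIDIA RTX 4070": 0.07498, "NVIDIA RTX 4070 SUPER": 0.06791}

for gpu, d in namd.items():
    print(f"{gpu}: {ns_per_day(d):.2f} ns/day")
```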

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 512 - Model: ResNet-50

batches/sec, More Is Better (PyTorch 2.1)
NVIDIA RTX 4070: 459.27 (SE +/- 0.43, N = 2; MIN: 405.48 / MAX: 461.88)
NVIDIA RTX 4070 SUPER: 504.27 (SE +/- 4.43, N = 2; MIN: 418.22 / MAX: 512.44)

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 256 - Model: ResNet-50

batches/sec, More Is Better (PyTorch 2.1)
NVIDIA RTX 4070: 459.93 (SE +/- 0.34, N = 3; MIN: 403.65 / MAX: 462.74)
NVIDIA RTX 4070 SUPER: 504.67 (SE +/- 1.39, N = 3; MIN: 412.34 / MAX: 514.07)

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 32 - Model: ResNet-50

batches/sec, More Is Better (PyTorch 2.1)
NVIDIA RTX 4070: 459.94 (SE +/- 0.13, N = 2; MIN: 403.65 / MAX: 462.59)
NVIDIA RTX 4070 SUPER: 501.50 (SE +/- 2.17, N = 2; MIN: 415.94 / MAX: 510.69)

IndigoBench

Acceleration: OpenCL GPU - Scene: Supercar

M samples/s, More Is Better (IndigoBench 4.4)
NVIDIA RTX 4070: 48.52 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070 SUPER: 52.81 (SE +/- 0.03, N = 3)

IndigoBench

Acceleration: OpenCL GPU - Scene: Bedroom

M samples/s, More Is Better (IndigoBench 4.4)
NVIDIA RTX 4070: 18.20 (SE +/- 0.01, N = 3)
NVIDIA RTX 4070 SUPER: 19.80 (SE +/- 0.01, N = 3)

VkFFT

Test: FFT + iFFT C2C 1D batched in double precision

Benchmark Score, More Is Better (VkFFT 1.2.31)
NVIDIA RTX 4070: 22390 (SE +/- 125.94, N = 3)
NVIDIA RTX 4070 SUPER: 24317 (SE +/- 146.69, N = 3)
1. (CXX) g++ options: -O3 -lrt

VkFFT

Test: FFT + iFFT C2C multidimensional in single precision

Benchmark Score, More Is Better (VkFFT 1.2.31)
NVIDIA RTX 4070: 47212 (SE +/- 476.57, N = 5)
NVIDIA RTX 4070 SUPER: 50299 (SE +/- 407.19, N = 15)
1. (CXX) g++ options: -O3 -lrt

VkFFT

Test: FFT + iFFT C2C 1D batched in single precision, no reshuffling

Benchmark Score, More Is Better (VkFFT 1.2.31)
NVIDIA RTX 4070: 79057 (SE +/- 5.84, N = 3)
NVIDIA RTX 4070 SUPER: 75078 (SE +/- 37.77, N = 3)
1. (CXX) g++ options: -O3 -lrt
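The SE values make it possible to judge whether a gap is larger than run-to-run noise. As a rough check, one can divide the difference of the means by their combined standard error; this assumes the two means are independent and approximately normally distributed, which the result page does not state. Applied to this test, where the RTX 4070 leads, and to the TensorFlow GoogLeNet batch-size-1 result on this page; `z_score` is an illustrative helper name:

```python
import math


def z_score(mean_a: float, se_a: float, mean_b: float, se_b: float) -> float:
    """Difference of two means divided by their combined standard error.

    Assumes independent, approximately normal sampling distributions.
    """
    combined = math.sqrt(se_a ** 2 + se_b ** 2)
    diff = mean_a - mean_b
    if combined == 0:
        # No reported spread: any nonzero difference is "infinitely" significant
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / combined


# VkFFT 1.2.31, C2C 1D batched single precision, no reshuffling (RTX 4070 minus SUPER)
vkfft_z = z_score(79057, 5.84, 75078, 37.77)
# TensorFlow 2.12, GPU, batch size 1, GoogLeNet, images/sec (RTX 4070 minus SUPER)
tf_z = z_score(12.78, 0.10, 12.62, 0.17)

print(f"VkFFT no reshuffling: z = {vkfft_z:.1f}")
print(f"TensorFlow GoogLeNet bs=1: z = {tf_z:.2f}")
```

A |z| far above 2 (VkFFT) suggests the RTX 4070's lead there is not noise, while a |z| below 1 (GoogLeNet) means the cards are indistinguishable at this sample size. With only N = 2 or 3 runs per result, treat this as a heuristic rather than a formal test.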

ViennaCL

Test: CPU BLAS - dGEMM-TN

GFLOPs/s, More Is Better (ViennaCL 1.7.1)
NVIDIA RTX 4070: 121 (SE +/- 2.31, N = 3)
NVIDIA RTX 4070 SUPER: 115 (SE +/- 1.00, N = 2)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

VkFFT

Test: FFT + iFFT C2C 1D batched in single precision

Benchmark Score, More Is Better (VkFFT 1.2.31)
NVIDIA RTX 4070: 77774 (SE +/- 13.72, N = 3)
NVIDIA RTX 4070 SUPER: 73929 (SE +/- 7.94, N = 3)
1. (CXX) g++ options: -O3 -lrt

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 64 - Model: ResNet-152

batches/sec, More Is Better (PyTorch 2.1)
NVIDIA RTX 4070: 186.63 (SE +/- 0.34, N = 3; MIN: 180.51 / MAX: 187.79)
NVIDIA RTX 4070 SUPER: 196.07 (SE +/- 0.51, N = 3; MIN: 171.95 / MAX: 199.96)

VkFFT

Test: FFT + iFFT C2C 1D batched in half precision

Benchmark Score, More Is Better (VkFFT 1.2.31)
NVIDIA RTX 4070: 137762 (SE +/- 1301.92, N = 3)
NVIDIA RTX 4070 SUPER: 131705 (SE +/- 159.17, N = 3)
1. (CXX) g++ options: -O3 -lrt

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 16 - Model: ResNet-152

batches/sec, More Is Better (PyTorch 2.1)
NVIDIA RTX 4070: 187.26 (MIN: 179.81 / MAX: 188.21)
NVIDIA RTX 4070 SUPER: 195.40 (MIN: 186.09 / MAX: 197.7)
SE +/- 0.29, N = 3 (reported for only one of the two results)

ViennaCL

Test: CPU BLAS - dGEMM-NT

GFLOPs/s, More Is Better (ViennaCL 1.7.1)
NVIDIA RTX 4070: 122 (SE +/- 1.76, N = 3)
NVIDIA RTX 4070 SUPER: 117 (SE +/- 2.08, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 512 - Model: ResNet-152

batches/sec, More Is Better (PyTorch 2.1)
NVIDIA RTX 4070: 187.51 (SE +/- 0.05, N = 3; MIN: 181.57 / MAX: 188.05)
NVIDIA RTX 4070 SUPER: 195.30 (SE +/- 1.38, N = 2; MIN: 182 / MAX: 199.43)

ViennaCL

Test: OpenCL BLAS - dAXPY

GB/s, More Is Better (ViennaCL 1.7.1)
NVIDIA RTX 4070: 455 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 SUPER: 437 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 32 - Model: ResNet-152

batches/sec, More Is Better (PyTorch 2.1)
NVIDIA RTX 4070: 187.69 (MIN: 182.03 / MAX: 188.31)
NVIDIA RTX 4070 SUPER: 195.39 (MIN: 183.94 / MAX: 198.7)

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 256 - Model: ResNet-152

batches/sec, More Is Better (PyTorch 2.1)
NVIDIA RTX 4070: 187.27 (SE +/- 0.17, N = 3; MIN: 179.9 / MAX: 188.08)
NVIDIA RTX 4070 SUPER: 194.58 (SE +/- 1.14, N = 2; MIN: 183.74 / MAX: 198.52)

ViennaCL

Test: CPU BLAS - dGEMM-TT

GFLOPs/s, More Is Better (ViennaCL 1.7.1)
NVIDIA RTX 4070: 118 (SE +/- 1.20, N = 3)
NVIDIA RTX 4070 SUPER: 122 (SE +/- 2.08, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

VkResample

Upscale: 2x - Precision: Single

ms, Fewer Is Better (VkResample 1.0)
NVIDIA RTX 4070: 18.02 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 SUPER: 18.49 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -O3

ViennaCL

Test: CPU BLAS - dGEMM-NN

GFLOPs/s, More Is Better (ViennaCL 1.7.1)
NVIDIA RTX 4070: 122 (SE +/- 1.86, N = 3)
NVIDIA RTX 4070 SUPER: 119 (SE +/- 4.04, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - sDOT

GB/s, More Is Better (ViennaCL 1.7.1)
NVIDIA RTX 4070: 362 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 SUPER: 370 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 512 - Model: Efficientnet_v2_l

batches/sec, More Is Better (PyTorch 2.1)
NVIDIA RTX 4070: 101.43 (MIN: 93.27 / MAX: 103.58)
NVIDIA RTX 4070 SUPER: 103.57 (MIN: 95.95 / MAX: 105.54)
SE +/- 0.39, N = 3 (reported for only one of the two results)

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 1 - Model: ResNet-50

batches/sec, More Is Better (PyTorch 2.1)
NVIDIA RTX 4070: 546.76 (MIN: 195.25 / MAX: 556.94)
NVIDIA RTX 4070 SUPER: 557.73 (MIN: 513.63 / MAX: 563.37)
SE +/- 3.09, N = 3 (reported for only one of the two results)

ViennaCL

Test: CPU BLAS - sAXPY

GB/s, More Is Better (ViennaCL 1.7.1)
NVIDIA RTX 4070: 153 (SE +/- 4.81, N = 3)
NVIDIA RTX 4070 SUPER: 156 (SE +/- 2.19, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 256 - Model: Efficientnet_v2_l

batches/sec, More Is Better (PyTorch 2.1)
NVIDIA RTX 4070: 101.24 (MIN: 93.33 / MAX: 102.92)
NVIDIA RTX 4070 SUPER: 103.17 (MIN: 95.79 / MAX: 105.15)
SE +/- 0.05, N = 2 (reported for only one of the two results)

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 1 - Model: ResNet-152

batches/sec, More Is Better (PyTorch 2.1)
NVIDIA RTX 4070: 198.18 (MIN: 181.27 / MAX: 200.06)
NVIDIA RTX 4070 SUPER: 201.94 (MIN: 183.53 / MAX: 206.5)
SE +/- 0.36, N = 3 (reported for only one of the two results)

Libplacebo

Test: av1_grain_lap

FPS, More Is Better (Libplacebo 5.229.1)
NVIDIA RTX 4070: 4103.40 (SE +/- 66.69, N = 3)
NVIDIA RTX 4070 SUPER: 4171.00 (SE +/- 5.52, N = 3)
1. (CXX) g++ options: -lm -pthread -ldl -fvisibility=hidden -std=c++20 -O2 -fno-math-errno -fPIC -MD -MQ -MF

TensorFlow

Device: GPU - Batch Size: 16 - Model: VGG-16

images/sec, More Is Better (TensorFlow 2.12)
NVIDIA RTX 4070: 1.50 (SE +/- 0.01, N = 2)
NVIDIA RTX 4070 SUPER: 1.48 (SE +/- 0.00, N = 2)

TensorFlow

Device: GPU - Batch Size: 1 - Model: GoogLeNet

images/sec, More Is Better (TensorFlow 2.12)
NVIDIA RTX 4070: 12.78 (SE +/- 0.10, N = 3)
NVIDIA RTX 4070 SUPER: 12.62 (SE +/- 0.17, N = 2)

ViennaCL

Test: OpenCL BLAS - sCOPY

GB/s, More Is Better (ViennaCL 1.7.1)
NVIDIA RTX 4070: 330 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070 SUPER: 334 (SE +/- 0.33, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 1 - Model: Efficientnet_v2_l

batches/sec, More Is Better (PyTorch 2.1)
NVIDIA RTX 4070: 107.59 (MIN: 98.77 / MAX: 109.43)
NVIDIA RTX 4070 SUPER: 106.37 (MIN: 97.91 / MAX: 108.16)
SE +/- 0.55, N = 3 (reported for only one of the two results)

Libplacebo

Test: hdr_lut

FPS, More Is Better (Libplacebo 5.229.1)
NVIDIA RTX 4070: 3927.11 (SE +/- 10.06, N = 3)
NVIDIA RTX 4070 SUPER: 3905.98 (SE +/- 12.09, N = 3)
1. (CXX) g++ options: -lm -pthread -ldl -fvisibility=hidden -std=c++20 -O2 -fno-math-errno -fPIC -MD -MQ -MF

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 64 - Model: Efficientnet_v2_l

batches/sec, More Is Better (PyTorch 2.1)
NVIDIA RTX 4070: 101.55 (SE +/- 0.45, N = 3; MIN: 93.44 / MAX: 103.08)
NVIDIA RTX 4070 SUPER: 102.60 (SE +/- 1.49, N = 2; MIN: 79.69 / MAX: 105.28)

ViennaCL

Test: CPU BLAS - dGEMV-N

GB/s, More Is Better (ViennaCL 1.7.1)
NVIDIA RTX 4070: 103 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070 SUPER: 102 (SE +/- 0.33, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ProjectPhysX OpenCL-Benchmark

Operation: Memory Bandwidth Coalesced Write

GB/s, More Is Better (ProjectPhysX OpenCL-Benchmark 1.2)
NVIDIA RTX 4070: 459.43 (SE +/- 0.16, N = 3)
NVIDIA RTX 4070 SUPER: 455.01 (SE +/- 0.14, N = 3)
1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL
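Measured bandwidth is easiest to read against the theoretical peak, which is the per-pin data rate times the bus width divided by eight bits per byte. The sketch assumes 21 Gbps GDDR6X on a 192-bit bus, as NVIDIA lists for both the RTX 4070 and the RTX 4070 SUPER; those memory specs do not come from this result file. `peak_bandwidth_gbs` is an illustrative helper name:

```python
def peak_bandwidth_gbs(data_rate_gbps: float, bus_width_bits: int) -> float:
    """Theoretical memory bandwidth in GB/s: per-pin Gbps x bus width / 8."""
    return data_rate_gbps * bus_width_bits / 8


# Assumed spec for both cards: 21 Gbps GDDR6X on a 192-bit bus.
peak = peak_bandwidth_gbs(21.0, 192)

# ProjectPhysX OpenCL-Benchmark 1.2, Memory Bandwidth Coalesced Write (GB/s)
measured = {"NVIDIA RTX 4070": 459.43, "NVIDIA RTX 4070 SUPER": 455.01}

for gpu, gbs in measured.items():
    print(f"{gpu}: {gbs:.2f} of {peak:.0f} GB/s ({gbs / peak:.1%})")
```

If both cards do share the same memory configuration, that would be consistent with the bandwidth-bound results on this page (coalesced write, ViennaCL sCOPY and sDOT) barely moving between them while the compute-bound tests differ by 20% or more.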

TensorFlow

Device: GPU - Batch Size: 1 - Model: AlexNet

images/sec, More Is Better (TensorFlow 2.12)
NVIDIA RTX 4070: 14.04 (SE +/- 0.16, N = 3)
NVIDIA RTX 4070 SUPER: 13.92 (SE +/- 0.22, N = 2)

ViennaCL

Test: OpenCL BLAS - sAXPY

GB/s, More Is Better (ViennaCL 1.7.1)
NVIDIA RTX 4070: 389 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 SUPER: 392 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - sCOPY

ViennaCL 1.7.1, GB/s (More Is Better)
NVIDIA RTX 4070: 131 (SE +/- 1.20, N = 3)
NVIDIA RTX 4070 SUPER: 132 (SE +/- 1.20, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

TensorFlow

Device: GPU - Batch Size: 1 - Model: VGG-16

TensorFlow 2.12, images/sec (More Is Better)
NVIDIA RTX 4070: 1.36
NVIDIA RTX 4070 SUPER: 1.35
SE +/- 0.01, N = 2 (only one SE value reported)

TensorFlow

Device: GPU - Batch Size: 32 - Model: ResNet-50

TensorFlow 2.12, images/sec (More Is Better)
NVIDIA RTX 4070: 5.55 (SE +/- 0.01, N = 2)
NVIDIA RTX 4070 SUPER: 5.51 (SE +/- 0.01, N = 2)

ViennaCL

Test: CPU BLAS - sDOT

ViennaCL 1.7.1, GB/s (More Is Better)
NVIDIA RTX 4070: 166 (SE +/- 3.76, N = 3)
NVIDIA RTX 4070 SUPER: 165 (SE +/- 2.73, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

TensorFlow

Device: GPU - Batch Size: 16 - Model: ResNet-50

TensorFlow 2.12, images/sec (More Is Better)
NVIDIA RTX 4070: 5.49 (SE +/- 0.02, N = 3)
NVIDIA RTX 4070 SUPER: 5.46 (SE +/- 0.00, N = 2)

ViennaCL

Test: OpenCL BLAS - dGEMV-T

ViennaCL 1.7.1, GB/s (More Is Better)
NVIDIA RTX 4070: 387 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 SUPER: 389 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMV-N

ViennaCL 1.7.1, GB/s (More Is Better)
NVIDIA RTX 4070: 209 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070 SUPER: 210 (SE +/- 0.33, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dAXPY

ViennaCL 1.7.1, GB/s (More Is Better)
NVIDIA RTX 4070: 86.8 (SE +/- 0.44, N = 3)
NVIDIA RTX 4070 SUPER: 87.2 (SE +/- 0.12, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

cl-mem

Benchmark: Copy

cl-mem 2017-01-13, GB/s (More Is Better)
NVIDIA RTX 4070: 330.3 (SE +/- 0.09, N = 3)
NVIDIA RTX 4070 SUPER: 331.8 (SE +/- 0.03, N = 3)
1. (CC) gcc options: -O2 -flto -lOpenCL

TensorFlow

Device: GPU - Batch Size: 16 - Model: AlexNet

TensorFlow 2.12, images/sec (More Is Better)
NVIDIA RTX 4070: 31.45
NVIDIA RTX 4070 SUPER: 31.59
SE +/- 0.17, N = 3 (only one SE value reported)

ViennaCL

Test: OpenCL BLAS - dDOT

ViennaCL 1.7.1, GB/s (More Is Better)
NVIDIA RTX 4070: 456 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 SUPER: 458 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

TensorFlow

Device: GPU - Batch Size: 512 - Model: AlexNet

TensorFlow 2.12, images/sec (More Is Better)
NVIDIA RTX 4070: 35.21 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070 SUPER: 35.10 (SE +/- 0.02, N = 2)

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 32 - Model: Efficientnet_v2_l

PyTorch 2.1, batches/sec (More Is Better)
NVIDIA RTX 4070: 102.90 (MIN: 95.98 / MAX: 104.54)
NVIDIA RTX 4070 SUPER: 102.60 (MIN: 94.84 / MAX: 104.25)

ViennaCL

Test: CPU BLAS - dCOPY

ViennaCL 1.7.1, GB/s (More Is Better)
NVIDIA RTX 4070: 71.0 (SE +/- 0.25, N = 3)
NVIDIA RTX 4070 SUPER: 70.8 (SE +/- 0.32, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

TensorFlow

Device: GPU - Batch Size: 32 - Model: AlexNet

TensorFlow 2.12, images/sec (More Is Better)
NVIDIA RTX 4070: 33.32 (SE +/- 0.18, N = 3)
NVIDIA RTX 4070 SUPER: 33.40 (SE +/- 0.15, N = 2)

TensorFlow

Device: GPU - Batch Size: 1 - Model: ResNet-50

TensorFlow 2.12, images/sec (More Is Better)
NVIDIA RTX 4070: 4.34
NVIDIA RTX 4070 SUPER: 4.35
SE +/- 0.01, N = 3 (only one SE value reported)

cl-mem

Benchmark: Write

cl-mem 2017-01-13, GB/s (More Is Better)
NVIDIA RTX 4070: 406.7 (SE +/- 0.55, N = 3)
NVIDIA RTX 4070 SUPER: 407.5 (SE +/- 1.11, N = 3)
1. (CC) gcc options: -O2 -flto -lOpenCL

TensorFlow

Device: GPU - Batch Size: 64 - Model: GoogLeNet

TensorFlow 2.12, images/sec (More Is Better)
NVIDIA RTX 4070: 15.54
NVIDIA RTX 4070 SUPER: 15.52
SE +/- 0.07, N = 3 (only one SE value reported)

TensorFlow

Device: GPU - Batch Size: 32 - Model: GoogLeNet

TensorFlow 2.12, images/sec (More Is Better)
NVIDIA RTX 4070: 15.63 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070 SUPER: 15.61 (SE +/- 0.01, N = 2)

TensorFlow

Device: GPU - Batch Size: 64 - Model: AlexNet

TensorFlow 2.12, images/sec (More Is Better)
NVIDIA RTX 4070: 33.93
NVIDIA RTX 4070 SUPER: 33.97
SE +/- 0.14, N = 3 (only one SE value reported)

ViennaCL

Test: CPU BLAS - dDOT

ViennaCL 1.7.1, GB/s (More Is Better)
NVIDIA RTX 4070: 96.7 (SE +/- 0.22, N = 3)
NVIDIA RTX 4070 SUPER: 96.8 (SE +/- 0.09, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

clpeak

OpenCL Test: Global Memory Bandwidth

clpeak 1.1.2, GBPS (More Is Better)
NVIDIA RTX 4070: 437.21 (SE +/- 0.02, N = 3)
NVIDIA RTX 4070 SUPER: 437.65 (SE +/- 0.02, N = 3)
1. (CXX) g++ options: -O3

ProjectPhysX OpenCL-Benchmark

Operation: Memory Bandwidth Coalesced Read

ProjectPhysX OpenCL-Benchmark 1.2, GB/s (More Is Better)
NVIDIA RTX 4070: 465.18 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070 SUPER: 464.86 (SE +/- 0.01, N = 3)
1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

TensorFlow

Device: GPU - Batch Size: 16 - Model: GoogLeNet

TensorFlow 2.12, images/sec (More Is Better)
NVIDIA RTX 4070: 15.66 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070 SUPER: 15.67 (SE +/- 0.03, N = 3)

cl-mem

Benchmark: Read

cl-mem 2017-01-13, GB/s (More Is Better)
NVIDIA RTX 4070: 446.3 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 SUPER: 446.2 (SE +/- 0.12, N = 3)
1. (CC) gcc options: -O2 -flto -lOpenCL
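The memory-bandwidth results above are nearly identical for both cards. The sketch below expresses them as a fraction of a theoretical peak. It assumes the commonly published memory spec for both the RTX 4070 and the RTX 4070 SUPER, a 192-bit bus at 21 Gbps. That spec does not appear in this result file, so treat the peak figure as an assumption. The measured values are copied from the results above.

```python
def peak_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Theoretical peak bandwidth in GB/s: bytes per transfer x per-pin data rate (Gbps)."""
    return bus_width_bits / 8 * data_rate_gbps


# ASSUMPTION (not stated in this result file): both cards use a 192-bit bus at 21 Gbps.
PEAK = peak_bandwidth_gbs(192, 21.0)  # 24 bytes x 21 = 504.0 GB/s

# Measured values from the results above: (NVIDIA RTX 4070, NVIDIA RTX 4070 SUPER).
measured = {
    "ProjectPhysX coalesced read": (465.18, 464.86),
    "ProjectPhysX coalesced write": (459.43, 455.01),
    "clpeak global memory bandwidth": (437.21, 437.65),
    "cl-mem read": (446.3, 446.2),
}

for test, (rtx4070, rtx4070_super) in measured.items():
    print(f"{test}: {rtx4070 / PEAK:.1%} vs {rtx4070_super / PEAK:.1%} of {PEAK:.0f} GB/s")
```

If the assumed spec holds, both cards share the same theoretical peak. That would be consistent with the near-identical bandwidth scores.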

TensorFlow

Device: GPU - Batch Size: 64 - Model: ResNet-50

TensorFlow 2.12, images/sec (More Is Better)
NVIDIA RTX 4070: 5.55 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 SUPER: 5.55 (SE +/- 0.01, N = 2)

TensorFlow

Device: GPU - Batch Size: 256 - Model: AlexNet

TensorFlow 2.12, images/sec (More Is Better)
NVIDIA RTX 4070 SUPER: 34.16 (SE +/- 0.01, N = 3)
(No NVIDIA RTX 4070 result reported.)

TensorFlow

Device: GPU - Batch Size: 64 - Model: VGG-16

TensorFlow 2.12, images/sec (More Is Better)
NVIDIA RTX 4070: 1.50 (SE +/- 0.00, N = 3)
(No NVIDIA RTX 4070 SUPER result reported.)

TensorFlow

Device: GPU - Batch Size: 32 - Model: VGG-16

TensorFlow 2.12, images/sec (More Is Better)
NVIDIA RTX 4070: 1.50 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 SUPER: 1.50 (SE +/- 0.00, N = 3)

NeatBench

Acceleration: GPU

NeatBench 5, FPS (More Is Better)
NVIDIA RTX 4070: 4070 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 SUPER: 4070 (SE +/- 0.00, N = 3)

ViennaCL

Test: OpenCL BLAS - dCOPY

ViennaCL 1.7.1, GB/s (More Is Better)
NVIDIA RTX 4070: 423 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 SUPER: 423 (SE +/- 0.33, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dGEMV-T

ViennaCL 1.7.1, GB/s (More Is Better)
NVIDIA RTX 4070: 109 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070 SUPER: 109 (SE +/- 0.33, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 16 - Model: Efficientnet_v2_l

PyTorch 2.1, batches/sec (More Is Better)
NVIDIA RTX 4070: 103.68 (MIN: 96.86 / MAX: 105.56)
(No NVIDIA RTX 4070 SUPER result reported.)

NCNN

Target: Vulkan GPU - Model: FastestDet

NCNN 20230517, ms (Fewer Is Better)
NVIDIA RTX 4070: 2.34 (SE +/- 0.10, N = 9; MIN: 2 / MAX: 3.86)
NVIDIA RTX 4070 SUPER: 2.86 (SE +/- 0.29, N = 9; MIN: 2.17 / MAX: 577.17)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: vision_transformer

NCNN 20230517, ms (Fewer Is Better)
NVIDIA RTX 4070: 281.56 (SE +/- 61.31, N = 9; MIN: 46.48 / MAX: 1913.33)
NVIDIA RTX 4070 SUPER: 844.61 (SE +/- 87.53, N = 9; MIN: 46.34 / MAX: 1866.93)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: regnety_400m

NCNN 20230517, ms (Fewer Is Better)
NVIDIA RTX 4070: 6.21 (SE +/- 0.24, N = 12; MIN: 5.53 / MAX: 8.99)
NVIDIA RTX 4070 SUPER: 11.11 (SE +/- 3.28, N = 9; MIN: 5.49 / MAX: 4942.19)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: squeezenet_ssd

NCNN 20230517, ms (Fewer Is Better)
NVIDIA RTX 4070: 5.18 (SE +/- 0.12, N = 12; MIN: 4.67 / MAX: 6.88)
NVIDIA RTX 4070 SUPER: 6.86 (SE +/- 1.76, N = 9; MIN: 4.34 / MAX: 1630.01)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: yolov4-tiny

NCNN 20230517, ms (Fewer Is Better)
NVIDIA RTX 4070: 20.74 (SE +/- 5.37, N = 12; MIN: 10.3 / MAX: 854.36)
NVIDIA RTX 4070 SUPER: 63.82 (SE +/- 10.56, N = 9; MIN: 10.28 / MAX: 858.44)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: resnet50

NCNN 20230517, ms (Fewer Is Better)
NVIDIA RTX 4070: 8.24 (SE +/- 0.10, N = 9; MIN: 7.87 / MAX: 9.87)
NVIDIA RTX 4070 SUPER: 46.26 (SE +/- 14.70, N = 9; MIN: 7.71 / MAX: 1829.99)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: alexnet

NCNN 20230517, ms (Fewer Is Better)
NVIDIA RTX 4070: 5.78 (SE +/- 1.70, N = 12; MIN: 3.6 / MAX: 397.75)
NVIDIA RTX 4070 SUPER: 16.17 (SE +/- 5.86, N = 9; MIN: 3.52 / MAX: 436.52)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: resnet18

NCNN 20230517, ms (Fewer Is Better)
NVIDIA RTX 4070: 5.11 (SE +/- 0.73, N = 12; MIN: 3.99 / MAX: 916.69)
NVIDIA RTX 4070 SUPER: 8.97 (SE +/- 3.49, N = 9; MIN: 3.94 / MAX: 922.04)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: vgg16

NCNN 20230517, ms (Fewer Is Better)
NVIDIA RTX 4070: 45.52 (SE +/- 13.24, N = 12; MIN: 17.49 / MAX: 643.35)
NVIDIA RTX 4070 SUPER: 117.81 (SE +/- 29.60, N = 9; MIN: 17.16 / MAX: 647.67)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: googlenet

NCNN 20230517, ms (Fewer Is Better)
NVIDIA RTX 4070: 6.06 (SE +/- 0.14, N = 9; MIN: 5.33 / MAX: 8.36)
NVIDIA RTX 4070 SUPER: 11.04 (SE +/- 1.21, N = 9; MIN: 5.28 / MAX: 1769.19)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: blazeface

NCNN 20230517, ms (Fewer Is Better)
NVIDIA RTX 4070: 0.84 (SE +/- 0.03, N = 9; MIN: 0.64 / MAX: 0.96)
NVIDIA RTX 4070 SUPER: 0.84 (SE +/- 0.04, N = 9; MIN: 0.65 / MAX: 4.63)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: efficientnet-b0

NCNN 20230517, ms (Fewer Is Better)
NVIDIA RTX 4070: 3.46 (SE +/- 0.09, N = 9; MIN: 2.91 / MAX: 3.79)
NVIDIA RTX 4070 SUPER: 5.07 (SE +/- 0.97, N = 9; MIN: 3.22 / MAX: 1124.2)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: mnasnet

NCNN 20230517, ms (Fewer Is Better)
NVIDIA RTX 4070: 2.22 (SE +/- 0.08, N = 8; MIN: 1.83 / MAX: 2.54)
NVIDIA RTX 4070 SUPER: 3.85 (SE +/- 1.31, N = 9; MIN: 1.89 / MAX: 1093.29)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: shufflenet-v2

NCNN 20230517, ms (Fewer Is Better)
NVIDIA RTX 4070: 2.08 (SE +/- 0.09, N = 11; MIN: 1.82 / MAX: 2.59)
NVIDIA RTX 4070 SUPER: 2.31 (SE +/- 0.34, N = 8; MIN: 1.76 / MAX: 421.42)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU-v3-v3 - Model: mobilenet-v3

NCNN 20230517, ms (Fewer Is Better)
NVIDIA RTX 4070: 2.15 (SE +/- 0.08, N = 12; MIN: 1.81 / MAX: 2.58)
NVIDIA RTX 4070 SUPER: 2.25 (SE +/- 0.16, N = 9; MIN: 1.75 / MAX: 343.7)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU-v2-v2 - Model: mobilenet-v2

NCNN 20230517, ms (Fewer Is Better)
NVIDIA RTX 4070: 2.48 (SE +/- 0.07, N = 12; MIN: 2.02 / MAX: 5.82)
NVIDIA RTX 4070 SUPER: 3.03 (SE +/- 0.44, N = 9; MIN: 2.38 / MAX: 970.87)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: mobilenet

NCNN 20230517, ms (Fewer Is Better)
NVIDIA RTX 4070: 7.20 (SE +/- 0.21, N = 12; MIN: 6.2 / MAX: 11.13)
NVIDIA RTX 4070 SUPER: 8.62 (SE +/- 0.47, N = 9; MIN: 6.42 / MAX: 1101.3)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
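In the NCNN Vulkan results, the RTX 4070 SUPER's averages are often well above the RTX 4070's. Its MIN values are nearly identical, however, and its MAX values reach hundreds or thousands of milliseconds. The sketch below compares the average ratio with the MIN ratio for a few models, using values copied from the results above. Because these are latencies (Fewer Is Better), a ratio above 1 means the SUPER was slower.

```python
# (average ms, MIN ms) per card, copied from the NCNN results above.
ncnn = {
    "resnet50": {"rtx4070": (8.24, 7.87), "rtx4070_super": (46.26, 7.71)},
    "regnety_400m": {"rtx4070": (6.21, 5.53), "rtx4070_super": (11.11, 5.49)},
    "vgg16": {"rtx4070": (45.52, 17.49), "rtx4070_super": (117.81, 17.16)},
    "blazeface": {"rtx4070": (0.84, 0.64), "rtx4070_super": (0.84, 0.65)},
}


def latency_ratios(results: dict) -> dict:
    """Return {model: (average ratio, MIN ratio)}, each RTX 4070 SUPER / RTX 4070."""
    out = {}
    for model, cards in results.items():
        avg_a, min_a = cards["rtx4070"]
        avg_b, min_b = cards["rtx4070_super"]
        out[model] = (avg_b / avg_a, min_b / min_a)
    return out


for model, (avg_ratio, min_ratio) in latency_ratios(ncnn).items():
    print(f"{model}: average {avg_ratio:.2f}x, MIN {min_ratio:.2f}x")
```

For resnet50, the average ratio is about 5.6x while the MIN ratio is slightly below 1. This pattern is consistent with a few very slow iterations inflating the SUPER's averages rather than uniformly slower inference, though the results alone do not confirm the cause.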

Libplacebo

Test: hdr_peakdetect

Libplacebo 5.229.1, FPS (More Is Better)
NVIDIA RTX 4070: 3310.02 (SE +/- 11.75, N = 3)
NVIDIA RTX 4070 SUPER: 3292.37 (SE +/- 3.65, N = 3)
1. (CXX) g++ options: -lm -pthread -ldl -fvisibility=hidden -std=c++20 -O2 -fno-math-errno -fPIC -MD -MQ -MF

FinanceBench

Benchmark: Black-Scholes OpenCL

FinanceBench 2016-07-25, ms (Fewer Is Better)
NVIDIA RTX 4070: 6.906 (SE +/- 0.003, N = 3)
NVIDIA RTX 4070 SUPER: 5.912 (SE +/- 0.114, N = 15)
1. (CXX) g++ options: -O3 -march=native -fopenmp

RealSR-NCNN

Scale: 4x - TAA: No

RealSR-NCNN 20200818, Seconds (Fewer Is Better)
NVIDIA RTX 4070: 7.092 (SE +/- 0.006, N = 3)
NVIDIA RTX 4070 SUPER: 6.323 (SE +/- 0.150, N = 15)
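Because some tests are "More Is Better" (FPS, GB/s) and others "Fewer Is Better" (ms, Seconds), comparing the two cards takes a direction-aware ratio. The sketch below returns how many times faster the RTX 4070 SUPER is than the RTX 4070, so that a value above 1 always favors the SUPER. It uses values copied from the last three results above. The function name is illustrative, not part of any Phoronix Test Suite API.

```python
def speedup(rtx4070: float, rtx4070_super: float, more_is_better: bool) -> float:
    """Times faster the RTX 4070 SUPER is than the RTX 4070 (> 1 favors the SUPER)."""
    if more_is_better:
        return rtx4070_super / rtx4070  # throughput: higher is better
    return rtx4070 / rtx4070_super  # time or latency: lower is better


# Values from the results above: (RTX 4070, RTX 4070 SUPER, More Is Better?).
results = {
    "Libplacebo hdr_peakdetect (FPS)": (3310.02, 3292.37, True),
    "FinanceBench Black-Scholes OpenCL (ms)": (6.906, 5.912, False),
    "RealSR-NCNN 4x, TAA: No (Seconds)": (7.092, 6.323, False),
}

for test, (a, b, higher_better) in results.items():
    print(f"{test}: {speedup(a, b, higher_better):.3f}x")
```

On these numbers, the SUPER is roughly 1.17x faster in FinanceBench and 1.12x faster in RealSR-NCNN, while Libplacebo hdr_peakdetect is essentially even (slightly under 1).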


Phoronix Test Suite v10.8.5