RTX 4070 SUPER

Intel Core i9-13900K testing with an ASUS TUF GAMING Z790-PRO WIFI (1630 BIOS) and ASUS NVIDIA GeForce RTX 4070 Ti SUPER 16GB on EndeavourOS rolling via the Phoronix Test Suite.

Compare your own system(s) to this result file with the Phoronix Test Suite by running the command: phoronix-test-suite benchmark 2402174-SADD-240211636

Run Management

Result Identifier           Date Run      Test Duration
NVIDIA RTX 4070 SUPER       January 25    23 Hours, 51 Minutes
NVIDIA RTX 4070             January 28    22 Hours, 26 Minutes
NVIDIA RTX 4070 TI          January 29    1 Day, 7 Hours, 18 Minutes
NVIDIA RTX 3090             February 07   1 Day, 10 Hours, 51 Minutes
NVIDIA RTX 4070 TI SUPER    February 15   1 Day, 17 Hours, 21 Minutes
Average                                   1 Day, 6 Hours, 46 Minutes


HTML result view exported from: https://openbenchmarking.org/result/2402174-SADD-240211636.

System Details

Values are for the NVIDIA RTX 4070 SUPER run; later runs are listed only where they differ.

Processor: Intel Core i9-13900K @ 5.50GHz (24 Cores / 32 Threads)
Motherboard: ASUS TUF GAMING Z790-PRO WIFI (1401 BIOS); NVIDIA RTX 4070 TI SUPER: ASUS TUF GAMING Z790-PRO WIFI (1630 BIOS)
Chipset: Intel Device 7a27; NVIDIA RTX 4070 TI SUPER: Intel Raptor Lake-S PCH
Memory: 32GB
Disk: 4001GB Seagate ZP4000GP304001; NVIDIA RTX 4070 TI SUPER: 4001GB Seagate ZP4000GP304001 + 0GB CD-ROM Drive
Graphics: ASUS NVIDIA GeForce RTX 4070 SUPER 12GB; NVIDIA RTX 4070: MSI NVIDIA GeForce RTX 4070 12GB; NVIDIA RTX 4070 TI: NVIDIA GeForce RTX 4070 Ti 12GB; NVIDIA RTX 3090: NVIDIA GeForce RTX 3090 24GB; NVIDIA RTX 4070 TI SUPER: ASUS NVIDIA GeForce RTX 4070 Ti SUPER 16GB
Audio: Realtek ALC1220
Monitor: ARZOPA; NVIDIA RTX 3090 and NVIDIA RTX 4070 TI SUPER: PI-KVM Video
Network: Intel I226-V + Intel Device 7a70; NVIDIA RTX 4070 TI SUPER: Intel I226-V + Intel Raptor Lake-S PCH CNVi WiFi
OS: EndeavourOS rolling
Kernel: 6.7.1-arch1-1 (x86_64); NVIDIA RTX 3090 and NVIDIA RTX 4070 TI SUPER: 6.7.4-arch1-1 (x86_64)
Desktop: KDE Plasma 5.27.10
Display Server: X Server 1.21.1.11
Display Driver: NVIDIA 550.40.07
OpenGL: 4.6.0
OpenCL: OpenCL 3.0 CUDA 12.4.74; NVIDIA RTX 4070 TI SUPER: OpenCL 2.1 AMD-APP (3602.0) + OpenCL 3.0 CUDA 12.4.74
Compiler: GCC 13.2.1 20230801; NVIDIA RTX 4070 and later runs: GCC 13.2.1 20230801 + CUDA 12.3
File-System: ext4
Screen Resolution: 1920x1080

Kernel Details
- Transparent Huge Pages: always

Compiler Details
- NVIDIA RTX 4070 SUPER, NVIDIA RTX 4070, NVIDIA RTX 4070 TI: --disable-libssp --disable-libstdcxx-pch --disable-werror --enable-__cxa_atexit --enable-bootstrap --enable-cet=auto --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-default-ssp --enable-gnu-indirect-function --enable-gnu-unique-object --enable-languages=ada,c,c++,d,fortran,go,lto,objc,obj-c++ --enable-libstdcxx-backtrace --enable-link-serialization=1 --enable-lto --enable-multilib --enable-plugin --enable-shared --enable-threads=posix --mandir=/usr/share/man --with-build-config=bootstrap-lto --with-linker-hash-style=gnu
- NVIDIA RTX 3090, NVIDIA RTX 4070 TI SUPER: the same options, except --enable-languages=ada,c,c++,d,fortran,go,lto,m2,objc,obj-c++

Processor Details
- All runs: Scaling Governor: intel_pstate powersave (EPP: balance_performance)
- CPU Microcode: 0x11d (NVIDIA RTX 4070 SUPER, NVIDIA RTX 4070, NVIDIA RTX 4070 TI, NVIDIA RTX 3090); 0x11f (NVIDIA RTX 4070 TI SUPER)

Graphics Details
- NVIDIA RTX 4070 SUPER: BAR1 / Visible vRAM Size: 16384 MiB - vBIOS Version: 95.04.69.00.c1
- NVIDIA RTX 4070: BAR1 / Visible vRAM Size: 16384 MiB - vBIOS Version: 95.04.3e.40.2a
- NVIDIA RTX 4070 TI: BAR1 / Visible vRAM Size: 16384 MiB - vBIOS Version: 95.04.31.00.36
- NVIDIA RTX 3090: BAR1 / Visible vRAM Size: 256 MiB - vBIOS Version: 94.02.26.08.ba
- NVIDIA RTX 4070 TI SUPER: BAR1 / Visible vRAM Size: 256 MiB - vBIOS Version: 95.03.45.00.c5

Security Details
- gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS IBPB: conditional RSB filling PBRSB-eIBRS: SW sequence + srbds: Not affected + tsx_async_abort: Not affected

Environment Details
- NVIDIA RTX 4070, NVIDIA RTX 4070 TI, NVIDIA RTX 3090, NVIDIA RTX 4070 TI SUPER: NVCC_PREPEND_FLAGS="-ccbin /opt/cuda/bin"

Python Details
- NVIDIA RTX 4070, NVIDIA RTX 4070 TI, NVIDIA RTX 3090: Python 3.11.6
- NVIDIA RTX 4070 TI SUPER: Python 3.11.7

Tests in this result file (runs: NVIDIA RTX 4070 SUPER, NVIDIA RTX 4070, NVIDIA RTX 4070 TI, NVIDIA RTX 3090, NVIDIA RTX 4070 TI SUPER; per-run values are listed with each individual result):

opencl-benchmark: FP64, FP32, INT64, INT32, INT16 and INT8 Compute; Memory Bandwidth Coalesced Read and Write
pytorch: NVIDIA CUDA GPU - ResNet-50, ResNet-152 and Efficientnet_v2_l at batch sizes 1, 16, 32, 64, 256 and 512
gpuowl: exponents 57885161, 77936867 and 332220523
realsr-ncnn: 4x - No; 4x - Yes
waifu2x-ncnn: 2x - 3 - Yes
vkfft: FFT + iFFT R2C / C2R; C2C 1D batched in half precision; C2C Bluestein in single precision; C2C 1D batched in double precision; C2C 1D batched in single precision; C2C multidimensional in single precision; C2C Bluestein benchmark in double precision; C2C 1D batched in single precision, no reshuffling
hashcat: MD5, SHA1, 7-Zip, SHA-512, TrueCrypt RIPEMD160 + XTS
cl-mem: Copy, Read, Write
namd-cuda: ATPase Simulation - 327,506 Atoms
vkresample: 2x - Double; 2x - Single
octanebench: Total Score
fahbench
clpeak: Integer Compute INT, Single-Precision Float, Double-Precision Double, Global Memory Bandwidth
rodinia: OpenCL Particle Filter
luxcorerender: DLSC, Danish Mood, Orange Juice, LuxCore Benchmark, Rainbow Colors and Prism (all GPU)
financebench: Black-Scholes OpenCL
viennacl: CPU BLAS and OpenCL BLAS - sCOPY, sAXPY, sDOT, dCOPY, dAXPY, dDOT, dGEMV-N, dGEMV-T, dGEMM-NN, dGEMM-NT, dGEMM-TN, dGEMM-TT
blender: BMW27, Classroom, Fishy Cat, Barbershop, Pabellon Barcelona (all NVIDIA OptiX)
indigobench: OpenCL GPU - Bedroom, Supercar
mandelgpu: GPU
neatbench: GPU
tensorflow: GPU - VGG-16 (batch sizes 1, 16, 32, 64, 256), AlexNet (1, 16, 32, 64, 256, 512), GoogLeNet (1, 16, 32, 64), ResNet-50 (1, 16, 32, 64)
libplacebo: deband_heavy, polar_nocompute, hdr_peakdetect, hdr_lut, av1_grain_lap
ncnn: Vulkan GPU - mobilenet, mobilenet-v2, mobilenet-v3, shufflenet-v2, mnasnet, efficientnet-b0, blazeface, googlenet, vgg16, resnet18, alexnet, resnet50, yolov4-tiny, squeezenet_ssd, regnety_400m, vision_transformer, FastestDet
vkpeak: fp32-scalar, fp32-vec4, fp16-scalar, fp16-vec4, fp64-scalar, fp64-vec4, int32-scalar, int32-vec4, int16-scalar, int16-vec4

ProjectPhysX OpenCL-Benchmark

Operation: FP64 Compute

TFLOPs/s, More Is Better (ProjectPhysX OpenCL-Benchmark 1.2)
NVIDIA RTX 4070 SUPER: 0.621 (SE +/- 0.000, N = 3)
NVIDIA RTX 4070: 0.510 (SE +/- 0.001, N = 3)
NVIDIA RTX 4070 TI: 0.660 (SE +/- 0.001, N = 3)
NVIDIA RTX 3090: 0.637 (SE +/- 0.001, N = 3)
NVIDIA RTX 4070 TI SUPER: 0.743 (SE +/- 0.001, N = 3)
(CXX) g++ options: -std=c++17 -pthread -lOpenCL

ProjectPhysX OpenCL-Benchmark

Operation: FP32 Compute

TFLOPs/s, More Is Better (ProjectPhysX OpenCL-Benchmark 1.2)
NVIDIA RTX 4070 SUPER: 38.59 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070: 31.77 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070 TI: 40.91 (SE +/- 0.00, N = 3)
NVIDIA RTX 3090: 39.40 (SE +/- 0.10, N = 3)
NVIDIA RTX 4070 TI SUPER: 45.95 (SE +/- 0.01, N = 3)
(CXX) g++ options: -std=c++17 -pthread -lOpenCL

ProjectPhysX OpenCL-Benchmark

Operation: INT64 Compute

TIOPs/s, More Is Better (ProjectPhysX OpenCL-Benchmark 1.2)
NVIDIA RTX 4070 SUPER: 4.214 (SE +/- 0.015, N = 3)
NVIDIA RTX 4070: 3.443 (SE +/- 0.004, N = 3)
NVIDIA RTX 4070 TI: 4.420 (SE +/- 0.016, N = 3)
NVIDIA RTX 3090: 3.135 (SE +/- 0.003, N = 3)
NVIDIA RTX 4070 TI SUPER: 4.414 (SE +/- 0.009, N = 3)
(CXX) g++ options: -std=c++17 -pthread -lOpenCL

ProjectPhysX OpenCL-Benchmark

Operation: INT32 Compute

TIOPs/s, More Is Better (ProjectPhysX OpenCL-Benchmark 1.2)
NVIDIA RTX 4070 SUPER: 19.89 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 16.38 (SE +/- 0.02, N = 3)
NVIDIA RTX 4070 TI: 21.05 (SE +/- 0.04, N = 3)
NVIDIA RTX 3090: 20.03 (SE +/- 0.06, N = 3)
NVIDIA RTX 4070 TI SUPER: 23.66 (SE +/- 0.01, N = 3)
(CXX) g++ options: -std=c++17 -pthread -lOpenCL

ProjectPhysX OpenCL-Benchmark

Operation: INT16 Compute

TIOPs/s, More Is Better (ProjectPhysX OpenCL-Benchmark 1.2)
NVIDIA RTX 4070 SUPER: 17.17 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 14.28 (SE +/- 0.02, N = 3)
NVIDIA RTX 4070 TI: 18.28 (SE +/- 0.02, N = 3)
NVIDIA RTX 3090: 17.00 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI SUPER: 20.50 (SE +/- 0.02, N = 3)
(CXX) g++ options: -std=c++17 -pthread -lOpenCL

ProjectPhysX OpenCL-Benchmark

Operation: INT8 Compute

TIOPs/s, More Is Better (ProjectPhysX OpenCL-Benchmark 1.2)
NVIDIA RTX 4070 SUPER: 14.31 (SE +/- 0.05, N = 3)
NVIDIA RTX 4070: 12.12 (SE +/- 0.02, N = 3)
NVIDIA RTX 4070 TI: 15.73 (SE +/- 0.03, N = 3)
NVIDIA RTX 3090: 13.73 (SE +/- 0.07, N = 3)
NVIDIA RTX 4070 TI SUPER: 17.62 (SE +/- 0.00, N = 3)
(CXX) g++ options: -std=c++17 -pthread -lOpenCL

ProjectPhysX OpenCL-Benchmark

Operation: Memory Bandwidth Coalesced Read

GB/s, More Is Better (ProjectPhysX OpenCL-Benchmark 1.2)
NVIDIA RTX 4070 SUPER: 464.86 (SE +/- 0.01, N = 3)
NVIDIA RTX 4070: 465.18 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070 TI: 465.07 (SE +/- 0.01, N = 3)
NVIDIA RTX 3090: 864.11 (SE +/- 0.07, N = 3)
NVIDIA RTX 4070 TI SUPER: 619.03 (SE +/- 0.06, N = 3)
(CXX) g++ options: -std=c++17 -pthread -lOpenCL

ProjectPhysX OpenCL-Benchmark

Operation: Memory Bandwidth Coalesced Write

GB/s, More Is Better (ProjectPhysX OpenCL-Benchmark 1.2)
NVIDIA RTX 4070 SUPER: 455.01 (SE +/- 0.14, N = 3)
NVIDIA RTX 4070: 459.43 (SE +/- 0.16, N = 3)
NVIDIA RTX 4070 TI: 457.17 (SE +/- 0.11, N = 3)
NVIDIA RTX 3090: 887.31 (SE +/- 0.06, N = 3)
NVIDIA RTX 4070 TI SUPER: 608.94 (SE +/- 0.57, N = 3)
(CXX) g++ options: -std=c++17 -pthread -lOpenCL
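To summarize several More Is Better results as one number, a common approach is the geometric mean of per-test ratios against a baseline run. The sketch below applies it to the six OpenCL-Benchmark compute results above, assuming the RTX 4070 SUPER as the baseline; the choice of tests and baseline is illustrative, and this is not necessarily the exact method OpenBenchmarking.org uses for its own means.

```python
import math

# OpenCL-Benchmark compute results copied from the results above
# (FP64/FP32 in TFLOPs/s, INT64/INT32/INT16/INT8 in TIOPs/s).
# Ratios are unitless, so mixing units across tests is fine here.
COMPUTE = {
    "FP64":  {"NVIDIA RTX 4070 SUPER": 0.621, "NVIDIA RTX 4070 TI SUPER": 0.743},
    "FP32":  {"NVIDIA RTX 4070 SUPER": 38.59, "NVIDIA RTX 4070 TI SUPER": 45.95},
    "INT64": {"NVIDIA RTX 4070 SUPER": 4.214, "NVIDIA RTX 4070 TI SUPER": 4.414},
    "INT32": {"NVIDIA RTX 4070 SUPER": 19.89, "NVIDIA RTX 4070 TI SUPER": 23.66},
    "INT16": {"NVIDIA RTX 4070 SUPER": 17.17, "NVIDIA RTX 4070 TI SUPER": 20.50},
    "INT8":  {"NVIDIA RTX 4070 SUPER": 14.31, "NVIDIA RTX 4070 TI SUPER": 17.62},
}


def geometric_mean(values):
    """Geometric mean computed as exp(mean(log(v)))."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


def relative_geomean(results, run, baseline):
    """Geometric mean of per-test ratios run / baseline (More Is Better metrics)."""
    return geometric_mean(r[run] / r[baseline] for r in results.values())


if __name__ == "__main__":
    ratio = relative_geomean(COMPUTE, "NVIDIA RTX 4070 TI SUPER", "NVIDIA RTX 4070 SUPER")
    print(f"RTX 4070 TI SUPER vs RTX 4070 SUPER (geometric mean): {ratio:.3f}x")
```

With these six values the RTX 4070 TI SUPER comes out roughly 17% ahead of the RTX 4070 SUPER. A geometric mean is used rather than an arithmetic mean so that tests with large absolute numbers do not dominate the summary.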

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 1 - Model: ResNet-50

batches/sec, More Is Better (PyTorch 2.1)
NVIDIA RTX 4070 SUPER: 557.73 (MIN: 513.63 / MAX: 563.37)
NVIDIA RTX 4070: 546.76 (MIN: 195.25 / MAX: 556.94)
NVIDIA RTX 4070 TI: 535.39 (MIN: 428.43 / MAX: 572.99)
NVIDIA RTX 3090: 525.12 (MIN: 458.54 / MAX: 542.46)
NVIDIA RTX 4070 TI SUPER: 558.82 (MIN: 473.77 / MAX: 573.46)
Standard errors reported (not matched to individual runs in this export): SE +/- 3.09, N = 3; SE +/- 11.16, N = 12; SE +/- 3.07, N = 3

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 1 - Model: ResNet-152

batches/sec, More Is Better (PyTorch 2.1)
NVIDIA RTX 4070 SUPER: 201.94 (MIN: 183.53 / MAX: 206.5)
NVIDIA RTX 4070: 198.18 (MIN: 181.27 / MAX: 200.06)
NVIDIA RTX 4070 TI: 201.19 (MIN: 180.79 / MAX: 203.92)
NVIDIA RTX 3090: 197.12 (MIN: 137.37 / MAX: 198.9)
NVIDIA RTX 4070 TI SUPER: 200.46 (MIN: 177.25 / MAX: 203.31)
Standard errors reported (not matched to individual runs in this export): SE +/- 0.36, N = 3; SE +/- 0.73, N = 3; SE +/- 0.09, N = 2; SE +/- 0.38, N = 3

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 16 - Model: ResNet-50

batches/sec, More Is Better (PyTorch 2.1)
NVIDIA RTX 4070 SUPER: 509.45 (MIN: 430.1 / MAX: 516.48)
NVIDIA RTX 4070: 458.39 (MIN: 404.5 / MAX: 461.01)
NVIDIA RTX 4070 TI: 502.92 (MIN: 415.65 / MAX: 520.39)
NVIDIA RTX 3090: 419.76 (MIN: 376.2 / MAX: 422.17)
NVIDIA RTX 4070 TI SUPER: 531.96 (MIN: 422.98 / MAX: 539.81)
Standard errors reported (not matched to individual runs in this export): SE +/- 0.26, N = 3; SE +/- 2.23, N = 3; SE +/- 0.89, N = 2; SE +/- 1.33, N = 3

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 32 - Model: ResNet-50

batches/sec, More Is Better (PyTorch 2.1)
NVIDIA RTX 4070 SUPER: 501.50 (MIN: 415.94 / MAX: 510.69)
NVIDIA RTX 4070: 459.94 (MIN: 403.65 / MAX: 462.59)
NVIDIA RTX 4070 TI: 505.55 (MIN: 419.93 / MAX: 512.69)
NVIDIA RTX 3090: 420.29 (MIN: 376.81 / MAX: 421.58)
NVIDIA RTX 4070 TI SUPER: 532.77 (MIN: 420.31 / MAX: 538.98)
Standard errors reported (not matched to individual runs in this export): SE +/- 2.17, N = 2; SE +/- 0.13, N = 2; SE +/- 1.69, N = 3; SE +/- 0.70, N = 3

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 64 - Model: ResNet-50

batches/sec, More Is Better (PyTorch 2.1)
NVIDIA RTX 4070 SUPER: 507.45 (SE +/- 0.92, N = 3; MIN: 423.41 / MAX: 512.88)
NVIDIA RTX 4070: 458.36 (SE +/- 0.27, N = 3; MIN: 404.89 / MAX: 461.01)
NVIDIA RTX 4070 TI: 505.62 (SE +/- 1.92, N = 3; MIN: 426.6 / MAX: 513.25)
NVIDIA RTX 3090: 419.03 (SE +/- 0.24, N = 3; MIN: 376 / MAX: 422)
NVIDIA RTX 4070 TI SUPER: 527.82 (SE +/- 1.58, N = 3; MIN: 419.39 / MAX: 534.44)

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 16 - Model: ResNet-152

batches/sec, More Is Better (PyTorch 2.1)
NVIDIA RTX 4070 SUPER: 195.40 (MIN: 186.09 / MAX: 197.7)
NVIDIA RTX 4070: 187.26 (MIN: 179.81 / MAX: 188.21)
NVIDIA RTX 4070 TI: 194.29 (MIN: 182.25 / MAX: 197.39)
NVIDIA RTX 3090: 164.14 (MIN: 145.67 / MAX: 165.38)
NVIDIA RTX 4070 TI SUPER: 198.58 (MIN: 183.91 / MAX: 201.98)
Standard errors reported (not matched to individual runs in this export): SE +/- 0.29, N = 3; SE +/- 0.33, N = 3

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 256 - Model: ResNet-50

batches/sec, More Is Better (PyTorch 2.1)
NVIDIA RTX 4070 SUPER: 504.67 (SE +/- 1.39, N = 3; MIN: 412.34 / MAX: 514.07)
NVIDIA RTX 4070: 459.93 (SE +/- 0.34, N = 3; MIN: 403.65 / MAX: 462.74)
NVIDIA RTX 3090: 416.89 (SE +/- 0.14, N = 3; MIN: 329.77 / MAX: 420.82)
NVIDIA RTX 4070 TI SUPER: 529.14 (SE +/- 0.54, N = 3; MIN: 414.54 / MAX: 534.65)

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 32 - Model: ResNet-152

batches/sec, More Is Better (PyTorch 2.1)
NVIDIA RTX 4070 SUPER: 195.39 (MIN: 183.94 / MAX: 198.7)
NVIDIA RTX 4070: 187.69 (MIN: 182.03 / MAX: 188.31)
NVIDIA RTX 4070 TI: 198.82 (MIN: 188.33 / MAX: 201.47)
NVIDIA RTX 3090: 163.74 (MIN: 144.93 / MAX: 165.03)
NVIDIA RTX 4070 TI SUPER: 197.82 (MIN: 176.19 / MAX: 201.63)
Standard errors reported (not matched to individual runs in this export): SE +/- 0.29, N = 3; SE +/- 0.28, N = 3

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 512 - Model: ResNet-50

batches/sec, More Is Better (PyTorch 2.1)
NVIDIA RTX 4070 SUPER: 504.27 (SE +/- 4.43, N = 2; MIN: 418.22 / MAX: 512.44)
NVIDIA RTX 4070: 459.27 (SE +/- 0.43, N = 2; MIN: 405.48 / MAX: 461.88)
NVIDIA RTX 4070 TI: 504.66 (SE +/- 0.83, N = 2; MIN: 424.27 / MAX: 509.08)
NVIDIA RTX 3090: 416.20 (SE +/- 0.40, N = 3; MIN: 355.45 / MAX: 419.05)
NVIDIA RTX 4070 TI SUPER: 529.49 (SE +/- 1.16, N = 3; MIN: 410.12 / MAX: 537.25)

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 64 - Model: ResNet-152

batches/sec, More Is Better (PyTorch 2.1)
NVIDIA RTX 4070 SUPER: 196.07 (MIN: 171.95 / MAX: 199.96)
NVIDIA RTX 4070: 186.63 (MIN: 180.51 / MAX: 187.79)
NVIDIA RTX 4070 TI: 197.02 (MIN: 183.92 / MAX: 200.54)
NVIDIA RTX 3090: 164.14 (MIN: 149 / MAX: 165)
NVIDIA RTX 4070 TI SUPER: 196.50 (MIN: 179.34 / MAX: 200)
Standard errors reported (not matched to individual runs in this export): SE +/- 0.51, N = 3; SE +/- 0.34, N = 3; SE +/- 0.78, N = 2; SE +/- 0.20, N = 3

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 256 - Model: ResNet-152

batches/sec, More Is Better (PyTorch 2.1)
NVIDIA RTX 4070 SUPER: 194.58 (MIN: 183.74 / MAX: 198.52)
NVIDIA RTX 4070: 187.27 (MIN: 179.9 / MAX: 188.08)
NVIDIA RTX 4070 TI: 195.86 (MIN: 181.64 / MAX: 199.2)
NVIDIA RTX 3090: 161.01 (MIN: 138.12 / MAX: 165.16)
NVIDIA RTX 4070 TI SUPER: 198.70 (MIN: 185.21 / MAX: 203.36)
Standard errors reported (not matched to individual runs in this export): SE +/- 1.14, N = 2; SE +/- 0.17, N = 3; SE +/- 0.19, N = 2; SE +/- 0.95, N = 3

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 512 - Model: ResNet-152

batches/sec, More Is Better (PyTorch 2.1)
NVIDIA RTX 4070 SUPER: 195.30 (MIN: 182 / MAX: 199.43)
NVIDIA RTX 4070: 187.51 (MIN: 181.57 / MAX: 188.05)
NVIDIA RTX 4070 TI: 194.87 (MIN: 180.8 / MAX: 198)
NVIDIA RTX 3090: 164.35 (MIN: 149.91 / MAX: 166.09)
NVIDIA RTX 4070 TI SUPER: 198.01 (MIN: 185.3 / MAX: 202.59)
Standard errors reported (not matched to individual runs in this export): SE +/- 1.38, N = 2; SE +/- 0.05, N = 3; SE +/- 0.33, N = 2; SE +/- 0.81, N = 3

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 1 - Model: Efficientnet_v2_l

batches/sec, More Is Better (PyTorch 2.1)
NVIDIA RTX 4070 SUPER: 106.37 (MIN: 97.91 / MAX: 108.16)
NVIDIA RTX 4070: 107.59 (MIN: 98.77 / MAX: 109.43)
NVIDIA RTX 4070 TI: 108.59 (MIN: 99.04 / MAX: 110.68)
NVIDIA RTX 3090: 105.55 (MIN: 91.76 / MAX: 107.42)
NVIDIA RTX 4070 TI SUPER: 105.86 (MIN: 95.05 / MAX: 107.6)
Standard errors reported (not matched to individual runs in this export): SE +/- 0.55, N = 3; SE +/- 0.33, N = 3; SE +/- 0.24, N = 2

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 16 - Model: Efficientnet_v2_l

batches/sec, More Is Better (PyTorch 2.1)
NVIDIA RTX 4070: 103.68 (MIN: 96.86 / MAX: 105.56)
NVIDIA RTX 4070 TI: 103.45 (MIN: 95.22 / MAX: 105.88)
NVIDIA RTX 3090: 98.11 (MIN: 89.88 / MAX: 100.25)
NVIDIA RTX 4070 TI SUPER: 103.66 (MIN: 93.46 / MAX: 105.95)
Standard errors reported (not matched to individual runs in this export): SE +/- 0.52, N = 2; SE +/- 0.53, N = 2; SE +/- 0.33, N = 3

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 32 - Model: Efficientnet_v2_l

batches/sec, More Is Better (PyTorch 2.1)
NVIDIA RTX 4070 SUPER: 102.60 (MIN: 94.84 / MAX: 104.25)
NVIDIA RTX 4070: 102.90 (MIN: 95.98 / MAX: 104.54)
NVIDIA RTX 4070 TI: 96.50 (MIN: 64.35 / MAX: 104.79)
NVIDIA RTX 3090: 99.05 (MIN: 91.8 / MAX: 100.69)
NVIDIA RTX 4070 TI SUPER: 102.83 (MIN: 92.44 / MAX: 105.47)
Standard errors reported (not matched to individual runs in this export): SE +/- 6.65, N = 5; SE +/- 0.13, N = 3; SE +/- 0.62, N = 3

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 64 - Model: Efficientnet_v2_l

batches/sec, More Is Better (PyTorch 2.1)
NVIDIA RTX 4070 SUPER: 102.60 (SE +/- 1.49, N = 2; MIN: 79.69 / MAX: 105.28)
NVIDIA RTX 4070: 101.55 (SE +/- 0.45, N = 3; MIN: 93.44 / MAX: 103.08)
NVIDIA RTX 4070 TI: 103.20 (SE +/- 0.39, N = 2; MIN: 95.31 / MAX: 105.27)
NVIDIA RTX 3090: 99.84 (SE +/- 0.14, N = 3; MIN: 92.73 / MAX: 101.46)
NVIDIA RTX 4070 TI SUPER: 103.49 (SE +/- 0.13, N = 3; MIN: 93.23 / MAX: 105.43)

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 256 - Model: Efficientnet_v2_l

batches/sec, More Is Better (PyTorch 2.1)
NVIDIA RTX 4070 SUPER: 103.17 (MIN: 95.79 / MAX: 105.15)
NVIDIA RTX 4070: 101.24 (MIN: 93.33 / MAX: 102.92)
NVIDIA RTX 4070 TI: 103.24 (MIN: 95.41 / MAX: 104.9)
NVIDIA RTX 3090: 99.43 (MIN: 90.49 / MAX: 101.97)
NVIDIA RTX 4070 TI SUPER: 102.83 (MIN: 93.16 / MAX: 105.07)
Standard errors reported (not matched to individual runs in this export): SE +/- 0.05, N = 2; SE +/- 0.57, N = 3; SE +/- 0.18, N = 3

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 512 - Model: Efficientnet_v2_l

batches/sec, More Is Better (PyTorch 2.1)
NVIDIA RTX 4070 SUPER: 103.57 (MIN: 95.95 / MAX: 105.54)
NVIDIA RTX 4070: 101.43 (MIN: 93.27 / MAX: 103.58)
NVIDIA RTX 4070 TI: 103.50 (MIN: 94.95 / MAX: 105.61)
NVIDIA RTX 3090: 99.25 (MIN: 91.16 / MAX: 101.18)
NVIDIA RTX 4070 TI SUPER: 103.53 (MIN: 88.81 / MAX: 104.8)
Standard errors reported (not matched to individual runs in this export): SE +/- 0.39, N = 3; SE +/- 0.36, N = 2; SE +/- 0.19, N = 3

GpuOwl

Exponent: 57885161

Iterations / Second, More Is Better (GpuOwl 7.2.1)
NVIDIA RTX 4070 SUPER: 869.07 (SE +/- 1.26, N = 3)
NVIDIA RTX 4070: 714.80 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI: 919.13 (SE +/- 2.53, N = 3)
NVIDIA RTX 3090: 866.31 (SE +/- 2.01, N = 3)
NVIDIA RTX 4070 TI SUPER: 1025.99 (SE +/- 0.35, N = 3)

GpuOwl

Exponent: 77936867

Iterations / Second, More Is Better (GpuOwl 7.2.1)
NVIDIA RTX 4070 SUPER: 646.41 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 530.32 (SE +/- 0.09, N = 3)
NVIDIA RTX 4070 TI: 676.59 (SE +/- 0.00, N = 3)
NVIDIA RTX 3090: 645.99 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI SUPER: 761.61 (SE +/- 0.00, N = 3)

GpuOwl

Exponent: 332220523

Iterations / Second, More Is Better (GpuOwl 7.2.1)
NVIDIA RTX 4070 SUPER: 137.44 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 112.61 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI: 145.84 (SE +/- 0.00, N = 3)
NVIDIA RTX 3090: 137.32 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI SUPER: 163.41 (SE +/- 0.01, N = 3)
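The GpuOwl rates can be turned into a rough wall-clock estimate for a complete test of one exponent. This sketch assumes that testing 2^p - 1 takes about p iterations (one modular squaring per iteration) and ignores error-check and checkpoint overhead, so it is a back-of-the-envelope figure rather than GpuOwl's own estimate.

```python
# GpuOwl throughput from the results above (iterations per second).
GPUOWL_RATES = {
    57885161: {"NVIDIA RTX 4070 SUPER": 869.07, "NVIDIA RTX 4070 TI SUPER": 1025.99},
    77936867: {"NVIDIA RTX 4070 SUPER": 646.41, "NVIDIA RTX 4070 TI SUPER": 761.61},
    332220523: {"NVIDIA RTX 4070 SUPER": 137.44, "NVIDIA RTX 4070 TI SUPER": 163.41},
}


def estimated_hours(exponent, iterations_per_second):
    """Rough time for a full test, ASSUMING about `exponent` iterations are needed."""
    return exponent / iterations_per_second / 3600.0


if __name__ == "__main__":
    for exponent, runs in GPUOWL_RATES.items():
        for run, rate in runs.items():
            hours = estimated_hours(exponent, rate)
            print(f"{exponent:>10}  {run:<26} ~{hours:7.1f} h")
```

Under that assumption the RTX 4070 SUPER would need roughly 18.5 hours for exponent 57885161 and about four weeks for 332220523.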

RealSR-NCNN

Scale: 4x - TAA: No

Seconds, Fewer Is Better (RealSR-NCNN 20200818)
NVIDIA RTX 4070 SUPER: 6.323 (SE +/- 0.150, N = 15)
NVIDIA RTX 4070: 7.092 (SE +/- 0.006, N = 3)
NVIDIA RTX 4070 TI: 5.962 (SE +/- 0.039, N = 3)
NVIDIA RTX 3090: 5.556 (SE +/- 0.016, N = 3)
NVIDIA RTX 4070 TI SUPER: 5.633 (SE +/- 0.003, N = 3)

RealSR-NCNN

Scale: 4x - TAA: Yes

Seconds, Fewer Is Better (RealSR-NCNN 20200818)
NVIDIA RTX 4070 SUPER: 34.89 (SE +/- 0.02, N = 3)
NVIDIA RTX 4070: 42.85 (SE +/- 0.23, N = 3)
NVIDIA RTX 4070 TI: 33.63 (SE +/- 0.03, N = 3)
NVIDIA RTX 3090: 30.31 (SE +/- 0.06, N = 3)
NVIDIA RTX 4070 TI SUPER: 30.72 (SE +/- 0.02, N = 3)

Waifu2x-NCNN Vulkan

Scale: 2x - Denoise: 3 - TAA: Yes

Seconds, Fewer Is Better (Waifu2x-NCNN Vulkan 20200818)
NVIDIA RTX 4070 SUPER: 2.855 (SE +/- 0.014, N = 3)
NVIDIA RTX 4070: 3.168 (SE +/- 0.028, N = 3)
NVIDIA RTX 4070 TI: 2.854 (SE +/- 0.009, N = 3)
NVIDIA RTX 3090: 3.202 (SE +/- 0.011, N = 3)
NVIDIA RTX 4070 TI SUPER: 2.660 (SE +/- 0.028, N = 3)
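For Fewer Is Better results such as the RealSR-NCNN and Waifu2x-NCNN times above, a relative speedup is the baseline time divided by each run's time, so values above 1 mean faster. The sketch below applies this to the RealSR-NCNN "Scale: 4x - TAA: No" times, with the RTX 4070 SUPER as an assumed baseline.

```python
# RealSR-NCNN (Scale: 4x - TAA: No) times in seconds, from the results above.
REALSR_SECONDS = {
    "NVIDIA RTX 4070 SUPER": 6.323,
    "NVIDIA RTX 4070": 7.092,
    "NVIDIA RTX 4070 TI": 5.962,
    "NVIDIA RTX 3090": 5.556,
    "NVIDIA RTX 4070 TI SUPER": 5.633,
}


def speedups(times, baseline):
    """Fewer Is Better metric: speedup = baseline time / run time (>1 means faster)."""
    base = times[baseline]
    return {run: base / t for run, t in times.items()}


if __name__ == "__main__":
    result = speedups(REALSR_SECONDS, "NVIDIA RTX 4070 SUPER")
    # Print fastest first.
    for run, s in sorted(result.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{run:<26} {s:.3f}x")
```

For this test the RTX 3090 comes out at about 1.14x the RTX 4070 SUPER and the RTX 4070 at about 0.89x.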

VkFFT

Test: FFT + iFFT R2C / C2R

Benchmark Score, More Is Better (VkFFT 1.2.31)
NVIDIA RTX 4070 SUPER: 54794 (SE +/- 702.53, N = 15)
NVIDIA RTX 4070: 47097 (SE +/- 745.02, N = 13)
NVIDIA RTX 4070 TI: 55446 (SE +/- 520.37, N = 3)
NVIDIA RTX 3090: 48418 (SE +/- 320.62, N = 3)
NVIDIA RTX 4070 TI SUPER: 59378 (SE +/- 772.47, N = 15)
(CXX) g++ options: -O3 -lrt
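The SE +/- figures help judge whether two runs really differ. A rough check, assuming the runs are independent and their means approximately normal, divides the difference of two means by their combined standard error, sqrt(SE_a^2 + SE_b^2); differences below about two combined SEs are hard to separate from run-to-run noise. This rule of thumb is an assumption of the sketch, not a Phoronix Test Suite feature. It uses the VkFFT "FFT + iFFT R2C / C2R" results above.

```python
import math

# VkFFT "FFT + iFFT R2C / C2R": (mean score, standard error) from the results above.
VKFFT_R2C = {
    "NVIDIA RTX 4070 SUPER": (54794, 702.53),
    "NVIDIA RTX 4070": (47097, 745.02),
    "NVIDIA RTX 4070 TI": (55446, 520.37),
    "NVIDIA RTX 3090": (48418, 320.62),
    "NVIDIA RTX 4070 TI SUPER": (59378, 772.47),
}


def difference_in_se(a, b):
    """(mean_a - mean_b) in units of the combined standard error.

    Assumes independent runs, so the combined SE is sqrt(se_a**2 + se_b**2).
    """
    (mean_a, se_a), (mean_b, se_b) = a, b
    return (mean_a - mean_b) / math.hypot(se_a, se_b)


if __name__ == "__main__":
    base = VKFFT_R2C["NVIDIA RTX 4070 SUPER"]
    for run in ("NVIDIA RTX 4070 TI", "NVIDIA RTX 4070 TI SUPER"):
        z = difference_in_se(VKFFT_R2C[run], base)
        print(f"{run} vs NVIDIA RTX 4070 SUPER: {z:+.2f} combined SEs")
```

Here the RTX 4070 TI's lead over the RTX 4070 SUPER is under one combined SE (about 0.75), so the two are effectively tied in this test, whereas the RTX 4070 TI SUPER's lead is more than four combined SEs.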

VkFFT

Test: FFT + iFFT C2C 1D batched in half precision

VkFFT 1.2.31 (Benchmark Score, More Is Better)
NVIDIA RTX 4070 SUPER: 131705 (SE +/- 159.17, N = 3)
NVIDIA RTX 4070: 137762 (SE +/- 1301.92, N = 3)
NVIDIA RTX 4070 TI: 136210 (SE +/- 1708.38, N = 3)
NVIDIA RTX 3090: 273221 (SE +/- 160.60, N = 3)
NVIDIA RTX 4070 TI SUPER: 143992 (SE +/- 3524.05, N = 12)
1. (CXX) g++ options: -O3 -lrt

VkFFT

Test: FFT + iFFT C2C Bluestein in single precision

VkFFT 1.2.31 (Benchmark Score, More Is Better)
NVIDIA RTX 4070 SUPER: 15166 (SE +/- 102.52, N = 3)
NVIDIA RTX 4070: 13714 (SE +/- 52.09, N = 3)
NVIDIA RTX 4070 TI: 15125 (SE +/- 118.41, N = 3)
NVIDIA RTX 3090: 14205 (SE +/- 115.62, N = 3)
NVIDIA RTX 4070 TI SUPER: 16141 (SE +/- 73.00, N = 3)
1. (CXX) g++ options: -O3 -lrt

VkFFT

Test: FFT + iFFT C2C 1D batched in double precision

VkFFT 1.2.31 (Benchmark Score, More Is Better)
NVIDIA RTX 4070 SUPER: 24317 (SE +/- 146.69, N = 3)
NVIDIA RTX 4070: 22390 (SE +/- 125.94, N = 3)
NVIDIA RTX 4070 TI: 25431 (SE +/- 302.46, N = 3)
NVIDIA RTX 3090: 30912 (SE +/- 50.66, N = 3)
NVIDIA RTX 4070 TI SUPER: 27947 (SE +/- 325.03, N = 3)
1. (CXX) g++ options: -O3 -lrt

VkFFT

Test: FFT + iFFT C2C 1D batched in single precision

VkFFT 1.2.31 (Benchmark Score, More Is Better)
NVIDIA RTX 4070 SUPER: 73929 (SE +/- 7.94, N = 3)
NVIDIA RTX 4070: 77774 (SE +/- 13.72, N = 3)
NVIDIA RTX 4070 TI: 73942 (SE +/- 0.88, N = 3)
NVIDIA RTX 3090: 141876 (SE +/- 9.64, N = 3)
NVIDIA RTX 4070 TI SUPER: 104003 (SE +/- 33.60, N = 3)
1. (CXX) g++ options: -O3 -lrt

VkFFT

Test: FFT + iFFT C2C multidimensional in single precision

VkFFT 1.2.31 (Benchmark Score, More Is Better)
NVIDIA RTX 4070 SUPER: 50299 (SE +/- 407.19, N = 15)
NVIDIA RTX 4070: 47212 (SE +/- 476.57, N = 5)
NVIDIA RTX 4070 TI: 51528 (SE +/- 417.77, N = 15)
NVIDIA RTX 3090: 50856 (SE +/- 407.28, N = 15)
NVIDIA RTX 4070 TI SUPER: 59790 (SE +/- 251.10, N = 3)
1. (CXX) g++ options: -O3 -lrt

VkFFT

Test: FFT + iFFT C2C Bluestein benchmark in double precision

VkFFT 1.2.31 (Benchmark Score, More Is Better)
NVIDIA RTX 4070 SUPER: 4451 (SE +/- 12.55, N = 3)
NVIDIA RTX 4070: 3886 (SE +/- 4.51, N = 3)
NVIDIA RTX 4070 TI: 4647 (SE +/- 11.35, N = 3)
NVIDIA RTX 3090: 4195 (SE +/- 9.84, N = 3)
NVIDIA RTX 4070 TI SUPER: 5047 (SE +/- 11.37, N = 3)
1. (CXX) g++ options: -O3 -lrt

VkFFT

Test: FFT + iFFT C2C 1D batched in single precision, no reshuffling

VkFFT 1.2.31 (Benchmark Score, More Is Better)
NVIDIA RTX 4070 SUPER: 75078 (SE +/- 37.77, N = 3)
NVIDIA RTX 4070: 79057 (SE +/- 5.84, N = 3)
NVIDIA RTX 4070 TI: 75141 (SE +/- 28.54, N = 3)
NVIDIA RTX 3090: 144311 (SE +/- 37.44, N = 3)
NVIDIA RTX 4070 TI SUPER: 105549 (SE +/- 20.80, N = 3)
1. (CXX) g++ options: -O3 -lrt

Hashcat

Benchmark: MD5

Hashcat 6.2.4 (H/s, More Is Better)
NVIDIA RTX 4070 SUPER: 67583033333 (SE +/- 22430807.19, N = 3)
NVIDIA RTX 4070: 56147866667 (SE +/- 33772046.30, N = 3)
NVIDIA RTX 4070 TI: 73312233333 (SE +/- 11283665.68, N = 3)
NVIDIA RTX 3090: 67177300000 (SE +/- 53667246.37, N = 3)
NVIDIA RTX 4070 TI SUPER: 82004966667 (SE +/- 97655010.68, N = 3)

Hashcat

Benchmark: SHA1

Hashcat 6.2.4 (H/s, More Is Better)
NVIDIA RTX 4070 SUPER: 22132600000 (SE +/- 5140363.15, N = 3)
NVIDIA RTX 4070: 18202466667 (SE +/- 6318315.53, N = 3)
NVIDIA RTX 4070 TI: 23532400000 (SE +/- 15926811.78, N = 3)
NVIDIA RTX 3090: 21323733333 (SE +/- 26244639.66, N = 3)
NVIDIA RTX 4070 TI SUPER: 26388600000 (SE +/- 29067564.97, N = 3)

Hashcat

Benchmark: 7-Zip

Hashcat 6.2.4 (H/s, More Is Better)
NVIDIA RTX 4070 SUPER: 1176467 (SE +/- 1991.93, N = 3)
NVIDIA RTX 4070: 976967 (SE +/- 2062.63, N = 3)
NVIDIA RTX 4070 TI: 1262633 (SE +/- 2339.04, N = 3)
NVIDIA RTX 3090: 1056000 (SE +/- 1587.45, N = 3)
NVIDIA RTX 4070 TI SUPER: 1420700 (SE +/- 1628.91, N = 3)

Hashcat

Benchmark: SHA-512

Hashcat 6.2.4 (H/s, More Is Better)
NVIDIA RTX 4070 SUPER: 3232733333 (SE +/- 1530068.99, N = 3)
NVIDIA RTX 4070: 2673300000 (SE +/- 1059874.21, N = 3)
NVIDIA RTX 4070 TI: 3462500000 (SE +/- 721110.26, N = 3)
NVIDIA RTX 3090: 3081866667 (SE +/- 3288532.26, N = 3)
NVIDIA RTX 4070 TI SUPER: 3887033333 (SE +/- 1098989.43, N = 3)

Hashcat

Benchmark: TrueCrypt RIPEMD160 + XTS

Hashcat 6.2.4 (H/s, More Is Better)
NVIDIA RTX 4070 SUPER: 802967 (SE +/- 633.33, N = 3)
NVIDIA RTX 4070: 660967 (SE +/- 176.38, N = 3)
NVIDIA RTX 4070 TI: 858600 (SE +/- 888.82, N = 3)
NVIDIA RTX 3090: 797833 (SE +/- 1757.21, N = 3)
NVIDIA RTX 4070 TI SUPER: 961733 (SE +/- 392.99, N = 3)

cl-mem

Benchmark: Copy

cl-mem 2017-01-13 (GB/s, More Is Better)
NVIDIA RTX 4070 SUPER: 331.8 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070: 330.3 (SE +/- 0.09, N = 3)
NVIDIA RTX 4070 TI: 333.3 (SE +/- 0.00, N = 3)
NVIDIA RTX 3090: 360.8 (SE +/- 0.22, N = 3)
NVIDIA RTX 4070 TI SUPER: 370.7 (SE +/- 0.00, N = 3)
1. (CC) gcc options: -O2 -flto -lOpenCL

cl-mem

Benchmark: Read

cl-mem 2017-01-13 (GB/s, More Is Better)
NVIDIA RTX 4070 SUPER: 446.2 (SE +/- 0.12, N = 3)
NVIDIA RTX 4070: 446.3 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI: 446.3 (SE +/- 0.00, N = 3)
NVIDIA RTX 3090: 825.8 (SE +/- 0.32, N = 3)
NVIDIA RTX 4070 TI SUPER: 595.2 (SE +/- 0.00, N = 3)
1. (CC) gcc options: -O2 -flto -lOpenCL

cl-mem

Benchmark: Write

cl-mem 2017-01-13 (GB/s, More Is Better)
NVIDIA RTX 4070 SUPER: 407.5 (SE +/- 1.11, N = 3)
NVIDIA RTX 4070: 406.7 (SE +/- 0.55, N = 3)
NVIDIA RTX 4070 TI: 412.2 (SE +/- 0.12, N = 3)
NVIDIA RTX 3090: 753.8 (SE +/- 0.83, N = 3)
NVIDIA RTX 4070 TI SUPER: 551.9 (SE +/- 0.25, N = 3)
1. (CC) gcc options: -O2 -flto -lOpenCL

NAMD CUDA

ATPase Simulation - 327,506 Atoms

NAMD CUDA 2.14 (days/ns, Fewer Is Better)
NVIDIA RTX 4070 SUPER: 0.06791 (SE +/- 0.00031, N = 3)
NVIDIA RTX 4070: 0.07498 (SE +/- 0.00021, N = 3)
NVIDIA RTX 4070 TI: 0.06788 (SE +/- 0.00061, N = 3)
NVIDIA RTX 3090: 0.10822 (SE +/- 0.00042, N = 3)
NVIDIA RTX 4070 TI SUPER: 0.07715 (SE +/- 0.00018, N = 3)
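
NAMD reports days/ns, where fewer is better, while FAHBench further down uses ns/day. Because the two units are reciprocals, inverting the NAMD means gives simulated nanoseconds per day, matching the more-is-better direction of most charts in this file. A small sketch using the means from the chart above (the two tests are different workloads, so the converted numbers are not directly comparable with FAHBench):

```python
# Mean NAMD CUDA 2.14 results (ATPase, 327,506 atoms) from the chart above, in days/ns.
namd_days_per_ns = {
    "NVIDIA RTX 4070 SUPER": 0.06791,
    "NVIDIA RTX 4070": 0.07498,
    "NVIDIA RTX 4070 TI": 0.06788,
    "NVIDIA RTX 3090": 0.10822,
    "NVIDIA RTX 4070 TI SUPER": 0.07715,
}


def days_per_ns_to_ns_per_day(days_per_ns: float) -> float:
    """days/ns and ns/day are reciprocals, so a single division converts them."""
    return 1.0 / days_per_ns


for gpu, days in namd_days_per_ns.items():
    print(f"{gpu}: {days_per_ns_to_ns_per_day(days):.2f} ns/day")
```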

VkResample

Upscale: 2x - Precision: Double

VkResample 1.0 (ms, Fewer Is Better)
NVIDIA RTX 4070 SUPER: 339.59 (SE +/- 0.30, N = 3)
NVIDIA RTX 4070: 415.16 (SE +/- 0.77, N = 3)
NVIDIA RTX 4070 TI: 322.06 (SE +/- 0.35, N = 3)
NVIDIA RTX 3090: 333.64 (SE +/- 0.30, N = 3)
NVIDIA RTX 4070 TI SUPER: 285.99 (SE +/- 0.02, N = 3)
1. (CXX) g++ options: -O3

VkResample

Upscale: 2x - Precision: Single

VkResample 1.0 (ms, Fewer Is Better)
NVIDIA RTX 4070 SUPER: 18.49 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 18.02 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI: 18.46 (SE +/- 0.00, N = 3)
NVIDIA RTX 3090: 10.32 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI SUPER: 13.36 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -O3

OctaneBench

Total Score

OctaneBench 2020.1 (Score, More Is Better)
NVIDIA RTX 4070 SUPER: 720.97
NVIDIA RTX 4070: 648.00
NVIDIA RTX 4070 TI: 735.94
NVIDIA RTX 3090: 674.25
NVIDIA RTX 4070 TI SUPER: 876.44

FAHBench

FAHBench 2.3.2 (Ns Per Day, More Is Better)
NVIDIA RTX 4070 SUPER: 366.06 (SE +/- 0.39, N = 3)
NVIDIA RTX 4070: 317.20 (SE +/- 0.12, N = 3)
NVIDIA RTX 4070 TI: 382.16 (SE +/- 0.26, N = 3)
NVIDIA RTX 3090: 343.02 (SE +/- 0.26, N = 3)
NVIDIA RTX 4070 TI SUPER: 394.74 (SE +/- 0.22, N = 3)

clpeak

OpenCL Test: Integer Compute INT

clpeak 1.1.2 (GIOPS, More Is Better)
NVIDIA RTX 4070 SUPER: 18170.54 (SE +/- 3.14, N = 3)
NVIDIA RTX 4070: 14555.19 (SE +/- 15.26, N = 3)
NVIDIA RTX 4070 TI: 19821.10 (SE +/- 2.50, N = 3)
NVIDIA RTX 3090: 17923.33 (SE +/- 16.49, N = 3)
NVIDIA RTX 4070 TI SUPER: 22171.25 (SE +/- 28.14, N = 3)
1. (CXX) g++ options: -O3

clpeak

OpenCL Test: Single-Precision Float

clpeak 1.1.2 (GFLOPS, More Is Better)
NVIDIA RTX 4070 SUPER: 35492.69 (SE +/- 0.99, N = 3)
NVIDIA RTX 4070: 28479.39 (SE +/- 5.46, N = 3)
NVIDIA RTX 4070 TI: 38691.73 (SE +/- 11.67, N = 3)
NVIDIA RTX 3090: 34906.79 (SE +/- 113.39, N = 3)
NVIDIA RTX 4070 TI SUPER: 43244.79 (SE +/- 50.25, N = 3)
1. (CXX) g++ options: -O3

clpeak

OpenCL Test: Double-Precision Double

clpeak 1.1.2 (GFLOPS, More Is Better)
NVIDIA RTX 4070 SUPER: 630.11 (SE +/- 0.98, N = 3)
NVIDIA RTX 4070: 515.17 (SE +/- 0.21, N = 3)
NVIDIA RTX 4070 TI: 667.05 (SE +/- 1.33, N = 3)
NVIDIA RTX 3090: 642.23 (SE +/- 1.63, N = 3)
NVIDIA RTX 4070 TI SUPER: 750.36 (SE +/- 1.26, N = 3)
1. (CXX) g++ options: -O3

clpeak

OpenCL Test: Global Memory Bandwidth

clpeak 1.1.2 (GBPS, More Is Better)
NVIDIA RTX 4070 SUPER: 437.65 (SE +/- 0.02, N = 3)
NVIDIA RTX 4070: 437.21 (SE +/- 0.02, N = 3)
NVIDIA RTX 4070 TI: 437.63 (SE +/- 0.02, N = 3)
NVIDIA RTX 3090: 816.55 (SE +/- 0.02, N = 3)
NVIDIA RTX 4070 TI SUPER: 582.84 (SE +/- 0.01, N = 3)
1. (CXX) g++ options: -O3
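
The RTX 4070, 4070 TI and 4070 SUPER land within 0.5 GBPS of one another on this test, which suggests it is bound by memory bandwidth rather than by shader count. The sketch below divides each measured figure by a peak bandwidth. The peak values (504 GB/s for the RTX 4070, 4070 TI and 4070 SUPER, 672 GB/s for the 4070 TI SUPER, 936 GB/s for the RTX 3090) are rounded from NVIDIA's public specifications. They are not part of this result file and should be treated as an assumption:

```python
# Measured clpeak global memory bandwidth (GBPS) from the chart above.
measured_gbps = {
    "NVIDIA RTX 4070 SUPER": 437.65,
    "NVIDIA RTX 4070": 437.21,
    "NVIDIA RTX 4070 TI": 437.63,
    "NVIDIA RTX 3090": 816.55,
    "NVIDIA RTX 4070 TI SUPER": 582.84,
}

# Assumption: rounded peak memory bandwidth from NVIDIA's public specs (GB/s).
peak_gbps = {
    "NVIDIA RTX 4070 SUPER": 504.0,
    "NVIDIA RTX 4070": 504.0,
    "NVIDIA RTX 4070 TI": 504.0,
    "NVIDIA RTX 3090": 936.0,
    "NVIDIA RTX 4070 TI SUPER": 672.0,
}

# Fraction of the assumed peak that the measured figure reaches.
efficiency = {gpu: measured_gbps[gpu] / peak_gbps[gpu] for gpu in measured_gbps}
for gpu, eff in efficiency.items():
    print(f"{gpu}: {eff:.1%} of assumed peak")
```

Under these assumed peaks, every card reaches roughly 87% of its theoretical bandwidth, so the ranking on this test tracks memory bandwidth.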

Rodinia

Test: OpenCL Particle Filter

Rodinia 3.1 (Seconds, Fewer Is Better)
NVIDIA RTX 4070 SUPER: 3.480 (SE +/- 0.039, N = 4)
NVIDIA RTX 4070: 4.098 (SE +/- 0.008, N = 3)
NVIDIA RTX 4070 TI: 3.291 (SE +/- 0.002, N = 3)
NVIDIA RTX 3090: 3.844 (SE +/- 0.030, N = 15)
NVIDIA RTX 4070 TI SUPER: 2.973 (SE +/- 0.004, N = 3)
1. (CXX) g++ options: -O2 -lOpenCL

LuxCoreRender

Scene: DLSC - Acceleration: GPU

LuxCoreRender 2.6 (M samples/sec, More Is Better)
NVIDIA RTX 4070 SUPER: 13.59 (SE +/- 0.01, N = 3; MIN: 12.52 / MAX: 13.84)
NVIDIA RTX 4070: 11.74 (SE +/- 0.01, N = 3; MIN: 11.35 / MAX: 11.83)
NVIDIA RTX 4070 TI: 13.95 (SE +/- 0.00, N = 3; MIN: 13.67 / MAX: 14.14)
NVIDIA RTX 3090: 12.99 (SE +/- 1.13, N = 12; MIN: 0.52 / MAX: 14.69)
NVIDIA RTX 4070 TI SUPER: 16.23 (SE +/- 0.01, N = 3; MIN: 15.91 / MAX: 16.36)

LuxCoreRender

Scene: Danish Mood - Acceleration: GPU

LuxCoreRender 2.6 (M samples/sec, More Is Better)
NVIDIA RTX 4070 SUPER: 10.56 (SE +/- 0.08, N = 3; MIN: 3.7 / MAX: 12.17)
NVIDIA RTX 4070: 8.89 (SE +/- 0.06, N = 3; MIN: 3.32 / MAX: 10.26)
NVIDIA RTX 4070 TI: 10.99 (SE +/- 0.11, N = 3; MIN: 4.17 / MAX: 12.71)
NVIDIA RTX 3090: 10.20 (SE +/- 0.04, N = 3; MIN: 4.07 / MAX: 11.93)
NVIDIA RTX 4070 TI SUPER: 12.42 (SE +/- 0.03, N = 3; MIN: 4.35 / MAX: 14.32)

LuxCoreRender

Scene: Orange Juice - Acceleration: GPU

LuxCoreRender 2.6 (M samples/sec, More Is Better)
NVIDIA RTX 4070 SUPER: 11.72 (SE +/- 0.07, N = 3; MIN: 9.6 / MAX: 15.44)
NVIDIA RTX 4070: 10.40 (SE +/- 0.03, N = 3; MIN: 8.31 / MAX: 13.9)
NVIDIA RTX 4070 TI: 11.89 (SE +/- 0.00, N = 3; MIN: 9.85 / MAX: 15.88)
NVIDIA RTX 3090: 12.14 (SE +/- 0.01, N = 3; MIN: 10.24 / MAX: 16.71)
NVIDIA RTX 4070 TI SUPER: 13.64 (SE +/- 0.15, N = 4; MIN: 11.16 / MAX: 18.46)

LuxCoreRender

Scene: LuxCore Benchmark - Acceleration: GPU

LuxCoreRender 2.6 (M samples/sec, More Is Better)
NVIDIA RTX 4070 SUPER: 12.82 (SE +/- 0.02, N = 3; MIN: 4.84 / MAX: 14.62)
NVIDIA RTX 4070: 10.92 (SE +/- 0.01, N = 3; MIN: 4.45 / MAX: 12.42)
NVIDIA RTX 4070 TI: 13.23 (SE +/- 0.01, N = 3; MIN: 5.41 / MAX: 15.13)
NVIDIA RTX 3090: 13.12 (SE +/- 0.03, N = 2; MIN: 4.85 / MAX: 15.21)
NVIDIA RTX 4070 TI SUPER: 14.61 (SE +/- 0.00, N = 3; MIN: 5.91 / MAX: 16.88)

LuxCoreRender

Scene: Rainbow Colors and Prism - Acceleration: GPU

LuxCoreRender 2.6 (M samples/sec, More Is Better)
NVIDIA RTX 4070 SUPER: 27.67 (SE +/- 0.03, N = 3; MIN: 24.87 / MAX: 29.03)
NVIDIA RTX 4070: 23.26 (SE +/- 0.01, N = 3; MIN: 20.92 / MAX: 24.3)
NVIDIA RTX 4070 TI: 27.71 (SE +/- 0.07, N = 3; MIN: 25.01 / MAX: 29.15)
NVIDIA RTX 3090: 33.29 (SE +/- 0.36, N = 5; MIN: 30.4 / MAX: 36.21)
NVIDIA RTX 4070 TI SUPER: 31.86 (SE +/- 0.09, N = 3; MIN: 28.57 / MAX: 33.29)

FinanceBench

Benchmark: Black-Scholes OpenCL

FinanceBench 2016-07-25 (ms, Fewer Is Better)
NVIDIA RTX 4070 SUPER: 5.912 (SE +/- 0.114, N = 15)
NVIDIA RTX 4070: 6.906 (SE +/- 0.003, N = 3)
NVIDIA RTX 4070 TI: 5.226 (SE +/- 0.003, N = 3)
NVIDIA RTX 3090: 5.741 (SE +/- 0.006, N = 3)
NVIDIA RTX 4070 TI SUPER: 0.501 (SE +/- 0.000, N = 3)
1. (CXX) g++ options: -O3 -march=native -fopenmp

ViennaCL

Test: CPU BLAS - sCOPY

ViennaCL 1.7.1 (GB/s, More Is Better)
NVIDIA RTX 4070 SUPER: 132 (SE +/- 1.20, N = 3)
NVIDIA RTX 4070: 131 (SE +/- 1.20, N = 3)
NVIDIA RTX 4070 TI: 132 (SE +/- 0.88, N = 3)
NVIDIA RTX 3090: 132 (SE +/- 1.20, N = 3)
NVIDIA RTX 4070 TI SUPER: 107 (SE +/- 0.67, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - sAXPY

ViennaCL 1.7.1 (GB/s, More Is Better)
NVIDIA RTX 4070 SUPER: 156 (SE +/- 2.19, N = 3)
NVIDIA RTX 4070: 153 (SE +/- 4.81, N = 3)
NVIDIA RTX 4070 TI: 156 (SE +/- 2.00, N = 3)
NVIDIA RTX 3090: 154 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070 TI SUPER: 120 (SE +/- 0.33, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - sDOT

ViennaCL 1.7.1 (GB/s, More Is Better)
NVIDIA RTX 4070 SUPER: 165.0 (SE +/- 2.73, N = 3)
NVIDIA RTX 4070: 166.0 (SE +/- 3.76, N = 3)
NVIDIA RTX 4070 TI: 168.0 (SE +/- 2.40, N = 3)
NVIDIA RTX 3090: 132.1 (SE +/- 35.40, N = 3)
NVIDIA RTX 4070 TI SUPER: 129.0 (SE +/- 0.58, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dCOPY

ViennaCL 1.7.1 (GB/s, More Is Better)
NVIDIA RTX 4070 SUPER: 70.8 (SE +/- 0.32, N = 3)
NVIDIA RTX 4070: 71.0 (SE +/- 0.25, N = 3)
NVIDIA RTX 4070 TI: 71.3 (SE +/- 0.74, N = 3)
NVIDIA RTX 3090: 70.2 (SE +/- 0.72, N = 3)
NVIDIA RTX 4070 TI SUPER: 52.7 (SE +/- 0.18, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dAXPY

ViennaCL 1.7.1 (GB/s, More Is Better)
NVIDIA RTX 4070 SUPER: 87.2 (SE +/- 0.12, N = 3)
NVIDIA RTX 4070: 86.8 (SE +/- 0.44, N = 3)
NVIDIA RTX 4070 TI: 87.3 (SE +/- 0.57, N = 3)
NVIDIA RTX 3090: 86.2 (SE +/- 0.94, N = 3)
NVIDIA RTX 4070 TI SUPER: 64.3 (SE +/- 0.12, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dDOT

ViennaCL 1.7.1 (GB/s, More Is Better)
NVIDIA RTX 4070 SUPER: 96.8 (SE +/- 0.09, N = 3)
NVIDIA RTX 4070: 96.7 (SE +/- 0.22, N = 3)
NVIDIA RTX 4070 TI: 96.4 (SE +/- 0.58, N = 3)
NVIDIA RTX 3090: 95.2 (SE +/- 0.84, N = 3)
NVIDIA RTX 4070 TI SUPER: 70.8 (SE +/- 0.19, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dGEMV-N

ViennaCL 1.7.1 (GB/s, More Is Better)
NVIDIA RTX 4070 SUPER: 102.0 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070: 103.0 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070 TI: 103.0 (SE +/- 0.33, N = 3)
NVIDIA RTX 3090: 103.0 (SE +/- 0.88, N = 3)
NVIDIA RTX 4070 TI SUPER: 78.5 (SE +/- 0.46, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dGEMV-T

ViennaCL 1.7.1 (GB/s, More Is Better)
NVIDIA RTX 4070 SUPER: 109.0 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070: 109.0 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070 TI: 102.7 (SE +/- 6.30, N = 3)
NVIDIA RTX 3090: 110.0 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070 TI SUPER: 82.6 (SE +/- 0.47, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dGEMM-NN

ViennaCL 1.7.1 (GFLOPs/s, More Is Better)
NVIDIA RTX 4070 SUPER: 119 (SE +/- 4.04, N = 3)
NVIDIA RTX 4070: 122 (SE +/- 1.86, N = 3)
NVIDIA RTX 4070 TI: 117 (SE +/- 1.15, N = 3)
NVIDIA RTX 3090: 113 (SE +/- 1.86, N = 3)
NVIDIA RTX 4070 TI SUPER: 122 (SE +/- 1.50, N = 2)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dGEMM-NT

ViennaCL 1.7.1 (GFLOPs/s, More Is Better)
NVIDIA RTX 4070 SUPER: 117 (SE +/- 2.08, N = 3)
NVIDIA RTX 4070: 122 (SE +/- 1.76, N = 3)
NVIDIA RTX 4070 TI: 118 (SE +/- 1.20, N = 3)
NVIDIA RTX 3090: 119 (SE +/- 3.28, N = 3)
NVIDIA RTX 4070 TI SUPER: 119 (SE +/- 3.50, N = 2)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dGEMM-TN

ViennaCL 1.7.1 (GFLOPs/s, More Is Better)
NVIDIA RTX 4070 SUPER: 115 (SE +/- 1.00, N = 2)
NVIDIA RTX 4070: 121 (SE +/- 2.31, N = 3)
NVIDIA RTX 4070 TI: 125 (SE +/- 2.08, N = 3)
NVIDIA RTX 3090: 121 (SE +/- 2.08, N = 3)
NVIDIA RTX 4070 TI SUPER: 120 (SE +/- 3.00, N = 2)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dGEMM-TT

ViennaCL 1.7.1 (GFLOPs/s, More Is Better)
NVIDIA RTX 4070 SUPER: 122 (SE +/- 2.08, N = 3)
NVIDIA RTX 4070: 118 (SE +/- 1.20, N = 3)
NVIDIA RTX 4070 TI: 124 (SE +/- 2.08, N = 3)
NVIDIA RTX 3090: 113 (SE +/- 0.88, N = 3)
NVIDIA RTX 4070 TI SUPER: 117 (SE +/- 2.91, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - sCOPY

ViennaCL 1.7.1 (GB/s, More Is Better)
NVIDIA RTX 4070 SUPER: 334 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070: 330 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070 TI: 336 (SE +/- 0.00, N = 3)
NVIDIA RTX 3090: 363 (SE +/- 1.00, N = 3)
NVIDIA RTX 4070 TI SUPER: 373 (SE +/- 0.33, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - sAXPY

ViennaCL 1.7.1 (GB/s, More Is Better)
NVIDIA RTX 4070 SUPER: 392 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 389 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI: 393 (SE +/- 0.00, N = 3)
NVIDIA RTX 3090: 498 (SE +/- 0.58, N = 3)
NVIDIA RTX 4070 TI SUPER: 469 (SE +/- 1.20, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - sDOT

ViennaCL 1.7.1 (GB/s, More Is Better)
NVIDIA RTX 4070 SUPER: 370 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 362 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI: 365 (SE +/- 0.00, N = 3)
NVIDIA RTX 3090: 376 (SE +/- 0.58, N = 3)
NVIDIA RTX 4070 TI SUPER: 410 (SE +/- 1.00, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dCOPY

ViennaCL 1.7.1 (GB/s, More Is Better)
NVIDIA RTX 4070 SUPER: 423 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070: 423 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI: 424 (SE +/- 0.00, N = 3)
NVIDIA RTX 3090: 605 (SE +/- 0.58, N = 3)
NVIDIA RTX 4070 TI SUPER: 512 (SE +/- 4.00, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dAXPY

ViennaCL 1.7.1 (GB/s, More Is Better)
NVIDIA RTX 4070 SUPER: 437 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 455 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI: 437 (SE +/- 0.33, N = 3)
NVIDIA RTX 3090: 724 (SE +/- 0.58, N = 3)
NVIDIA RTX 4070 TI SUPER: 585 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dDOT

ViennaCL 1.7.1 (GB/s, More Is Better)
NVIDIA RTX 4070 SUPER: 458 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 456 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI: 457 (SE +/- 0.00, N = 3)
NVIDIA RTX 3090: 659 (SE +/- 0.88, N = 3)
NVIDIA RTX 4070 TI SUPER: 575 (SE +/- 1.33, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMV-N

ViennaCL 1.7.1 (GB/s, More Is Better)
NVIDIA RTX 4070 SUPER: 210 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070: 209 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070 TI: 211 (SE +/- 0.33, N = 3)
NVIDIA RTX 3090: 187 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070 TI SUPER: 218 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMV-T

ViennaCL 1.7.1 (GB/s, More Is Better)
NVIDIA RTX 4070 SUPER: 389
NVIDIA RTX 4070: 387
NVIDIA RTX 4070 TI: 391
NVIDIA RTX 3090: 374
NVIDIA RTX 4070 TI SUPER: 424
Standard errors (reported for four of the five results): SE +/- 0.00, N = 3; SE +/- 0.00, N = 3; SE +/- 0.33, N = 3; SE +/- 0.33, N = 3
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMM-NN

ViennaCL 1.7.1 (GFLOPs/s, More Is Better)
NVIDIA RTX 4070 SUPER: 577 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 473 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070 TI: 604 (SE +/- 0.33, N = 3)
NVIDIA RTX 3090: 592 (SE +/- 2.31, N = 3)
NVIDIA RTX 4070 TI SUPER: 681 (SE +/- 1.33, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMM-NT

ViennaCL 1.7.1 (GFLOPs/s, More Is Better)
NVIDIA RTX 4070 SUPER: 584 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 477 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070 TI: 612 (SE +/- 0.33, N = 3)
NVIDIA RTX 3090: 595 (SE +/- 2.33, N = 3)
NVIDIA RTX 4070 TI SUPER: 689 (SE +/- 1.00, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMM-TN

ViennaCL 1.7.1 (GFLOPs/s, More Is Better)
NVIDIA RTX 4070 SUPER: 599 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 494 (SE +/- 0.33, N = 3)
NVIDIA RTX 4070 TI: 634 (SE +/- 0.67, N = 3)
NVIDIA RTX 3090: 594 (SE +/- 2.03, N = 3)
NVIDIA RTX 4070 TI SUPER: 714 (SE +/- 1.00, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMM-TT

ViennaCL 1.7.1 (GFLOPs/s, More Is Better)
NVIDIA RTX 4070 SUPER: 613
NVIDIA RTX 4070: 502
NVIDIA RTX 4070 TI: 648
NVIDIA RTX 3090: 593
NVIDIA RTX 4070 TI SUPER: 731
Standard errors (reported for four of the five results): SE +/- 0.00, N = 3; SE +/- 0.33, N = 3; SE +/- 0.33, N = 3; SE +/- 1.33, N = 3
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

Blender

Blend File: BMW27 - Compute: NVIDIA OptiX

Blender 4.0 (Seconds, Fewer Is Better)
NVIDIA RTX 4070 SUPER: 5.57 (SE +/- 0.06, N = 13)
NVIDIA RTX 4070: 6.21 (SE +/- 0.01, N = 3)
NVIDIA RTX 4070 TI: 5.43 (SE +/- 0.02, N = 3)
NVIDIA RTX 3090: 6.31 (SE +/- 0.06, N = 14)
NVIDIA RTX 4070 TI SUPER: 5.04 (SE +/- 0.06, N = 14)

Blender

Blend File: Classroom - Compute: NVIDIA OptiX

Blender 4.0 (Seconds, Fewer Is Better)
NVIDIA RTX 4070 SUPER: 12.60 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 14.86 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070 TI: 12.30 (SE +/- 0.03, N = 3)
NVIDIA RTX 3090: 15.26 (SE +/- 0.02, N = 3)
NVIDIA RTX 4070 TI SUPER: 11.20 (SE +/- 0.04, N = 3)

Blender

Blend File: Fishy Cat - Compute: NVIDIA OptiX

Blender 4.0 (Seconds, Fewer Is Better)
NVIDIA RTX 4070 SUPER: 9.45 (SE +/- 0.06, N = 13)
NVIDIA RTX 4070: 11.03 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070 TI: 9.02 (SE +/- 0.01, N = 3)
NVIDIA RTX 3090: 10.64 (SE +/- 0.08, N = 9)
NVIDIA RTX 4070 TI SUPER: 8.32 (SE +/- 0.06, N = 13)

Blender

Blend File: Barbershop - Compute: NVIDIA OptiX

Blender 4.0 (Seconds, Fewer Is Better)
NVIDIA RTX 4070 SUPER: 51.30 (SE +/- 0.10, N = 3)
NVIDIA RTX 4070: 58.44 (SE +/- 0.04, N = 3)
NVIDIA RTX 4070 TI: 50.73 (SE +/- 0.05, N = 3)
NVIDIA RTX 3090: 54.30 (SE +/- 0.02, N = 2)
NVIDIA RTX 4070 TI SUPER: 44.49 (SE +/- 0.08, N = 3)

Blender

Blend File: Pabellon Barcelona - Compute: NVIDIA OptiX

Blender 4.0 (Seconds, Fewer Is Better)
NVIDIA RTX 4070 SUPER: 14.29 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070: 16.55 (SE +/- 0.01, N = 3)
NVIDIA RTX 4070 TI: 13.97 (SE +/- 0.01, N = 3)
NVIDIA RTX 3090: 17.30 (SE +/- 0.02, N = 3)
NVIDIA RTX 4070 TI SUPER: 12.56 (SE +/- 0.02, N = 3)
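
Because the Blender results are render times, the speedup of one card over another is the baseline time divided by the candidate time. A minimal sketch that summarizes the five OptiX scenes with a geometric mean of those ratios, comparing the RTX 4070 TI SUPER against the RTX 4070 SUPER (the choice of pair is only an example; the means are copied from the Blender charts):

```python
import statistics

# Blender 4.0 OptiX render times in seconds (fewer is better).
# Each tuple is (NVIDIA RTX 4070 SUPER, NVIDIA RTX 4070 TI SUPER).
seconds = {
    "BMW27": (5.57, 5.04),
    "Classroom": (12.60, 11.20),
    "Fishy Cat": (9.45, 8.32),
    "Barbershop": (51.30, 44.49),
    "Pabellon Barcelona": (14.29, 12.56),
}


def speedup(baseline_s: float, candidate_s: float) -> float:
    """For time-based results, a value above 1.0 means the candidate is faster."""
    return baseline_s / candidate_s


ratios = [speedup(base, cand) for base, cand in seconds.values()]
overall = statistics.geometric_mean(ratios)
print(f"RTX 4070 TI SUPER vs RTX 4070 SUPER over {len(ratios)} scenes: {overall:.3f}x")
```

A geometric mean of per-scene ratios keeps a long scene such as Barbershop from dominating the summary, as it would in an average of raw seconds.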

IndigoBench

Acceleration: OpenCL GPU - Scene: Bedroom

IndigoBench 4.4 (M samples/s, More Is Better)
NVIDIA RTX 4070 SUPER: 19.80 (SE +/- 0.01, N = 3)
NVIDIA RTX 4070: 18.20 (SE +/- 0.01, N = 3)
NVIDIA RTX 4070 TI: 20.26 (SE +/- 0.03, N = 3)
NVIDIA RTX 3090: 20.96 (SE +/- 0.01, N = 3)
NVIDIA RTX 4070 TI SUPER: 24.57 (SE +/- 0.01, N = 3)

IndigoBench

Acceleration: OpenCL GPU - Scene: Supercar

IndigoBench 4.4 (M samples/s, More Is Better)
NVIDIA RTX 4070 SUPER: 52.81 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070: 48.52 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070 TI: 53.59 (SE +/- 0.02, N = 3)
NVIDIA RTX 3090: 52.01 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070 TI SUPER: 61.34 (SE +/- 0.06, N = 3)

MandelGPU

OpenCL Device: GPU

MandelGPU 1.3pts1 (Samples/sec, More Is Better)
NVIDIA RTX 4070 SUPER: 587219538.2 (SE +/- 467034.80, N = 3)
NVIDIA RTX 4070: 516770131.2 (SE +/- 1783157.89, N = 3)
NVIDIA RTX 4070 TI: 619106132.5 (SE +/- 1202791.77, N = 3)
NVIDIA RTX 3090: 484098913.8 (SE +/- 794770.01, N = 3)
NVIDIA RTX 4070 TI SUPER: 656484783.7 (SE +/- 1096202.13, N = 3)
1. (CC) gcc options: -O3 -lm -ftree-vectorize -funroll-loops -lglut -lOpenCL -lGL

NeatBench

Acceleration: GPU

NeatBench 5 (FPS, More Is Better)
NVIDIA RTX 4070 SUPER: 4070.0 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 4070.0 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI: 4070.0 (SE +/- 0.00, N = 3)
NVIDIA RTX 3090: 3090.0 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI SUPER: 2084.1 (SE +/- 512.75, N = 16)
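
Some results carry a standard error that is large relative to their mean, for example NeatBench on the RTX 4070 TI SUPER (2084.1 FPS, SE +/- 512.75 over 16 runs). A rough way to flag such entries is to express each SE as a fraction of its mean. The 5% cut-off below is an arbitrary assumption, not a criterion taken from the Phoronix Test Suite; the four entries are copied from charts in this file:

```python
# (test, GPU, mean, standard error) copied from charts in this result file.
results = [
    ("NeatBench 5, GPU", "NVIDIA RTX 4070 TI SUPER", 2084.1, 512.75),
    ("LuxCoreRender 2.6, DLSC", "NVIDIA RTX 3090", 12.99, 1.13),
    ("ViennaCL 1.7.1, CPU BLAS - sDOT", "NVIDIA RTX 3090", 132.1, 35.40),
    ("Blender 4.0, Classroom", "NVIDIA RTX 4070 SUPER", 12.60, 0.00),
]

NOISE_THRESHOLD = 0.05  # assumption: flag results whose SE exceeds 5% of the mean


def relative_se(mean: float, se: float) -> float:
    """Standard error expressed as a fraction of the mean."""
    return se / mean


noisy = [
    (test, gpu, relative_se(mean, se))
    for test, gpu, mean, se in results
    if relative_se(mean, se) > NOISE_THRESHOLD
]
for test, gpu, rse in noisy:
    print(f"{test} on {gpu}: SE is {rse:.1%} of the mean")
```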

TensorFlow

Device: GPU - Batch Size: 1 - Model: VGG-16

TensorFlow 2.12 (images/sec, More Is Better)
NVIDIA RTX 4070 SUPER: 1.35
NVIDIA RTX 4070: 1.36
NVIDIA RTX 4070 TI: 1.38
NVIDIA RTX 3090: 1.38
NVIDIA RTX 4070 TI SUPER: 1.32
Standard errors (reported for four of the five results): SE +/- 0.01, N = 2; SE +/- 0.01, N = 3; SE +/- 0.01, N = 3; SE +/- 0.00, N = 3

TensorFlow

Device: GPU - Batch Size: 1 - Model: AlexNet

TensorFlow 2.12 (images/sec, More Is Better)
NVIDIA RTX 4070 SUPER: 13.92 (SE +/- 0.22, N = 2)
NVIDIA RTX 4070: 14.04 (SE +/- 0.16, N = 3)
NVIDIA RTX 4070 TI: 14.79 (SE +/- 0.06, N = 2)
NVIDIA RTX 3090: 14.45 (SE +/- 0.20, N = 15)
NVIDIA RTX 4070 TI SUPER: 12.26 (SE +/- 0.13, N = 15)

TensorFlow

Device: GPU - Batch Size: 16 - Model: VGG-16

TensorFlow 2.12 (images/sec, More Is Better)
NVIDIA RTX 4070 SUPER: 1.48
NVIDIA RTX 4070: 1.50
NVIDIA RTX 4070 TI: 1.49
NVIDIA RTX 3090: 1.49
NVIDIA RTX 4070 TI SUPER: 1.45
Standard errors (reported for four of the five results): SE +/- 0.00, N = 2; SE +/- 0.01, N = 2; SE +/- 0.00, N = 3; SE +/- 0.00, N = 3

TensorFlow

Device: GPU - Batch Size: 32 - Model: VGG-16

TensorFlow 2.12 (images/sec, More Is Better)
NVIDIA RTX 4070 SUPER: 1.50 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070: 1.50 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI: 1.50 (SE +/- 0.00, N = 2)
NVIDIA RTX 3090: 1.50 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI SUPER: 1.46 (SE +/- 0.00, N = 3)

TensorFlow

Device: GPU - Batch Size: 64 - Model: VGG-16

TensorFlow 2.12 (images/sec, More Is Better)
NVIDIA RTX 4070: 1.50 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI: 1.50 (SE +/- 0.00, N = 3)
NVIDIA RTX 3090: 1.51 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI SUPER: 1.46 (SE +/- 0.00, N = 3)

TensorFlow

Device: GPU - Batch Size: 16 - Model: AlexNet

TensorFlow 2.12 (images/sec, More Is Better)
NVIDIA RTX 4070 SUPER: 31.59
NVIDIA RTX 4070: 31.45
NVIDIA RTX 4070 TI: 31.70
NVIDIA RTX 3090: 31.98
NVIDIA RTX 4070 TI SUPER: 31.10
Standard errors (reported for four of the five results): SE +/- 0.17, N = 3; SE +/- 0.08, N = 3; SE +/- 0.07, N = 3; SE +/- 0.07, N = 3

TensorFlow

Device: GPU - Batch Size: 256 - Model: VGG-16

TensorFlow 2.12 (images/sec, More Is Better)
NVIDIA RTX 4070 TI: 1.50
NVIDIA RTX 3090: 1.51
NVIDIA RTX 4070 TI SUPER: 1.47
Standard errors (reported for two of the three results): SE +/- 0.00, N = 3; SE +/- 0.00, N = 3

TensorFlow

Device: GPU - Batch Size: 32 - Model: AlexNet

TensorFlow 2.12 (images/sec, More Is Better)
NVIDIA RTX 4070 SUPER: 33.40 (SE +/- 0.15, N = 2)
NVIDIA RTX 4070: 33.32 (SE +/- 0.18, N = 3)
NVIDIA RTX 4070 TI: 33.29 (SE +/- 0.04, N = 3)
NVIDIA RTX 3090: 33.53 (SE +/- 0.05, N = 3)
NVIDIA RTX 4070 TI SUPER: 32.88 (SE +/- 0.19, N = 3)

TensorFlow

Device: GPU - Batch Size: 64 - Model: AlexNet

TensorFlow 2.12 (images/sec, More Is Better)
NVIDIA RTX 4070 SUPER: 33.97
NVIDIA RTX 4070: 33.93
NVIDIA RTX 4070 TI: 34.06
NVIDIA RTX 3090: 33.93
NVIDIA RTX 4070 TI SUPER: 33.55
Standard errors (reported for four of the five results): SE +/- 0.14, N = 3; SE +/- 0.06, N = 3; SE +/- 0.08, N = 3; SE +/- 0.06, N = 3

TensorFlow

Device: GPU - Batch Size: 1 - Model: GoogLeNet

TensorFlow 2.12 (images/sec, More Is Better)
NVIDIA RTX 4070 SUPER: 12.62 (SE +/- 0.17, N = 2)
NVIDIA RTX 4070: 12.78 (SE +/- 0.10, N = 3)
NVIDIA RTX 4070 TI: 12.79 (SE +/- 0.30, N = 2)
NVIDIA RTX 3090: 12.82 (SE +/- 0.07, N = 3)
NVIDIA RTX 4070 TI SUPER: 12.24 (SE +/- 0.05, N = 3)

TensorFlow

Device: GPU - Batch Size: 1 - Model: ResNet-50

TensorFlow 2.12 (images/sec, More Is Better)
NVIDIA RTX 4070 SUPER: 4.35
NVIDIA RTX 4070: 4.34
NVIDIA RTX 4070 TI: 4.32
NVIDIA RTX 3090: 4.35
NVIDIA RTX 4070 TI SUPER: 4.14
Standard errors (reported for four of the five results): SE +/- 0.01, N = 3; SE +/- 0.02, N = 2; SE +/- 0.03, N = 3; SE +/- 0.02, N = 3

TensorFlow

Device: GPU - Batch Size: 256 - Model: AlexNet

TensorFlow 2.12 (images/sec, More Is Better)
NVIDIA RTX 4070 SUPER: 34.16 (SE +/- 0.01, N = 3)
NVIDIA RTX 4070 TI: 34.61 (SE +/- 0.07, N = 2)
NVIDIA RTX 3090: 34.46 (SE +/- 0.07, N = 3)
NVIDIA RTX 4070 TI SUPER: 33.95 (SE +/- 0.05, N = 3)

TensorFlow

Device: GPU - Batch Size: 512 - Model: AlexNet

TensorFlow 2.12 (images/sec, More Is Better)
NVIDIA RTX 4070 SUPER: 35.10 (SE +/- 0.02, N = 2)
NVIDIA RTX 4070: 35.21 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070 TI: 35.44 (SE +/- 0.09, N = 2)
NVIDIA RTX 3090: 35.58 (SE +/- 0.01, N = 3)
NVIDIA RTX 4070 TI SUPER: 35.02 (SE +/- 0.01, N = 3)

TensorFlow

Device: GPU - Batch Size: 16 - Model: GoogLeNet

TensorFlow 2.12 (images/sec, More Is Better)
NVIDIA RTX 4070 SUPER: 15.67 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070: 15.66 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070 TI: 15.69 (SE +/- 0.07, N = 3)
NVIDIA RTX 3090: 15.68 (SE +/- 0.05, N = 3)
NVIDIA RTX 4070 TI SUPER: 15.29 (SE +/- 0.02, N = 3)

TensorFlow

Device: GPU - Batch Size: 16 - Model: ResNet-50

TensorFlow 2.12 (images/sec, More Is Better)
NVIDIA RTX 4070 SUPER: 5.46 (SE +/- 0.00, N = 2)
NVIDIA RTX 4070: 5.49 (SE +/- 0.02, N = 3)
NVIDIA RTX 4070 TI: 5.46 (SE +/- 0.01, N = 3)
NVIDIA RTX 3090: 5.49 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI SUPER: 5.32 (SE +/- 0.02, N = 3)

TensorFlow

Device: GPU - Batch Size: 32 - Model: GoogLeNet

TensorFlow 2.12 (images/sec, More Is Better)
NVIDIA RTX 4070 SUPER: 15.61 (SE +/- 0.01, N = 2)
NVIDIA RTX 4070: 15.63 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070 TI: 15.81 (SE +/- 0.03, N = 3)
NVIDIA RTX 3090: 15.67 (SE +/- 0.06, N = 3)
NVIDIA RTX 4070 TI SUPER: 15.11 (SE +/- 0.06, N = 3)

TensorFlow

Device: GPU - Batch Size: 32 - Model: ResNet-50

TensorFlow 2.12 (images/sec, More Is Better)
NVIDIA RTX 4070 SUPER: 5.51 (SE +/- 0.01, N = 2)
NVIDIA RTX 4070: 5.55 (SE +/- 0.01, N = 2)
NVIDIA RTX 4070 TI: 5.50 (SE +/- 0.02, N = 3)
NVIDIA RTX 3090: 5.57 (SE +/- 0.01, N = 3)
NVIDIA RTX 4070 TI SUPER: 5.35 (SE +/- 0.00, N = 3)

TensorFlow

Device: GPU - Batch Size: 64 - Model: GoogLeNet

TensorFlow 2.12 (images/sec, More Is Better)
NVIDIA RTX 4070 SUPER: 15.52
NVIDIA RTX 4070: 15.54
NVIDIA RTX 4070 TI: 15.50
NVIDIA RTX 3090: 15.63
NVIDIA RTX 4070 TI SUPER: 15.00
Standard errors (reported for four of the five results): SE +/- 0.07, N = 3; SE +/- 0.06, N = 2; SE +/- 0.08, N = 3; SE +/- 0.09, N = 3

TensorFlow

Device: GPU - Batch Size: 64 - Model: ResNet-50

images/sec, More Is Better (TensorFlow 2.12)
NVIDIA RTX 4070 SUPER: 5.55 (SE +/- 0.01, N = 2)
NVIDIA RTX 4070: 5.55 (SE +/- 0.00, N = 3)
NVIDIA RTX 4070 TI: 5.53 (SE +/- 0.01, N = 2)
NVIDIA RTX 3090: 5.57 (SE +/- 0.01, N = 3)
NVIDIA RTX 4070 TI SUPER: 5.33 (SE +/- 0.02, N = 3)

Libplacebo

Test: deband_heavy

FPS, More Is Better (Libplacebo 5.229.1)
NVIDIA RTX 4070 SUPER: 2186.70 (SE +/- 2.26, N = 3)
NVIDIA RTX 4070: 1847.98 (SE +/- 0.08, N = 3)
NVIDIA RTX 4070 TI: 2306.67 (SE +/- 0.56, N = 3)
NVIDIA RTX 3090: 2017.75 (SE +/- 4.93, N = 3)
NVIDIA RTX 4070 TI SUPER: 2493.29 (SE +/- 2.22, N = 3)
Note: (CXX) g++ options: -lm -pthread -ldl -fvisibility=hidden -std=c++20 -O2 -fno-math-errno -fPIC -MD -MQ -MF

Libplacebo

Test: polar_nocompute

FPS, More Is Better (Libplacebo 5.229.1)
NVIDIA RTX 4070 SUPER: 2327.55 (SE +/- 0.24, N = 3)
NVIDIA RTX 4070: 1972.78 (SE +/- 0.16, N = 3)
NVIDIA RTX 4070 TI: 2461.23 (SE +/- 0.26, N = 3)
NVIDIA RTX 3090: 2119.89 (SE +/- 7.22, N = 3)
NVIDIA RTX 4070 TI SUPER: 2646.70 (SE +/- 0.38, N = 3)
Note: (CXX) g++ options: -lm -pthread -ldl -fvisibility=hidden -std=c++20 -O2 -fno-math-errno -fPIC -MD -MQ -MF

Libplacebo

Test: hdr_peakdetect

FPS, More Is Better (Libplacebo 5.229.1)
NVIDIA RTX 4070 SUPER: 3292.37 (SE +/- 3.65, N = 3)
NVIDIA RTX 4070: 3310.02 (SE +/- 11.75, N = 3)
NVIDIA RTX 4070 TI: 3544.60 (SE +/- 99.97, N = 3)
NVIDIA RTX 3090: 4997.08 (SE +/- 43.13, N = 3)
NVIDIA RTX 4070 TI SUPER: 3931.57 (SE +/- 28.18, N = 3)
Note: (CXX) g++ options: -lm -pthread -ldl -fvisibility=hidden -std=c++20 -O2 -fno-math-errno -fPIC -MD -MQ -MF

Libplacebo

Test: hdr_lut

FPS, More Is Better (Libplacebo 5.229.1)
NVIDIA RTX 4070 SUPER: 3905.98 (SE +/- 12.09, N = 3)
NVIDIA RTX 4070: 3927.11 (SE +/- 10.06, N = 3)
NVIDIA RTX 4070 TI: 3971.61 (SE +/- 5.47, N = 3)
NVIDIA RTX 3090: 3313.26 (SE +/- 13.62, N = 3)
NVIDIA RTX 4070 TI SUPER: 3845.51 (SE +/- 17.88, N = 3)
Note: (CXX) g++ options: -lm -pthread -ldl -fvisibility=hidden -std=c++20 -O2 -fno-math-errno -fPIC -MD -MQ -MF

Libplacebo

Test: av1_grain_lap

FPS, More Is Better (Libplacebo 5.229.1)
NVIDIA RTX 4070 SUPER: 4171.00 (SE +/- 5.52, N = 3)
NVIDIA RTX 4070: 4103.40 (SE +/- 66.69, N = 3)
NVIDIA RTX 4070 TI: 4140.87 (SE +/- 21.66, N = 3)
NVIDIA RTX 3090: 4126.89 (SE +/- 12.99, N = 3)
NVIDIA RTX 4070 TI SUPER: 4057.41 (SE +/- 39.01, N = 3)
Note: (CXX) g++ options: -lm -pthread -ldl -fvisibility=hidden -std=c++20 -O2 -fno-math-errno -fPIC -MD -MQ -MF

NCNN

Target: Vulkan GPU - Model: mobilenet

ms, Fewer Is Better (NCNN 20230517)
NVIDIA RTX 4070 SUPER: 8.62 (SE +/- 0.47, N = 9; MIN: 6.42 / MAX: 1101.3)
NVIDIA RTX 4070: 10.14 (SE +/- 2.50, N = 9; MIN: 6.53 / MAX: 1509.26)
NVIDIA RTX 4070 TI: 8.43 (SE +/- 0.98, N = 9; MIN: 6.51 / MAX: 1023.8)
NVIDIA RTX 3090: 12.07 (SE +/- 4.97, N = 6; MIN: 6.42 / MAX: 1193.34)
NVIDIA RTX 4070 TI SUPER: 6.28 (SE +/- 0.05, N = 3; MIN: 6.16 / MAX: 8.09)
Note: (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU-v2-v2 - Model: mobilenet-v2

ms, Fewer Is Better (NCNN 20230517)
NVIDIA RTX 4070 SUPER: 3.03 (SE +/- 0.44, N = 9; MIN: 2.38 / MAX: 970.87)
NVIDIA RTX 4070: 4.69 (SE +/- 2.36, N = 9; MIN: 1.91 / MAX: 1305.64)
NVIDIA RTX 4070 TI: 2.43 (SE +/- 0.07, N = 9; MIN: 2.09 / MAX: 5.8)
NVIDIA RTX 3090: 2.65 (SE +/- 0.13, N = 6; MIN: 2.23 / MAX: 6.49)
NVIDIA RTX 4070 TI SUPER: 2.42 (SE +/- 0.09, N = 3; MIN: 2.24 / MAX: 9.23)
Note: (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU-v3-v3 - Model: mobilenet-v3

ms, Fewer Is Better (NCNN 20230517)
NVIDIA RTX 4070 SUPER: 2.25 (SE +/- 0.16, N = 9; MIN: 1.75 / MAX: 343.7)
NVIDIA RTX 4070: 8.71 (SE +/- 6.77, N = 8; MIN: 1.73 / MAX: 1561.29)
NVIDIA RTX 4070 TI: 2.09 (SE +/- 0.09, N = 9; MIN: 1.78 / MAX: 2.85)
NVIDIA RTX 3090: 3.19 (SE +/- 1.14, N = 6; MIN: 1.82 / MAX: 1210.31)
NVIDIA RTX 4070 TI SUPER: 1.87 (SE +/- 0.02, N = 3; MIN: 1.81 / MAX: 5.21)
Note: (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: shufflenet-v2

ms, Fewer Is Better (NCNN 20230517)
NVIDIA RTX 4070 SUPER: 2.31 (SE +/- 0.34, N = 8; MIN: 1.76 / MAX: 421.42)
NVIDIA RTX 4070: 2.11 (SE +/- 0.12, N = 7; MIN: 1.77 / MAX: 2.53)
NVIDIA RTX 4070 TI: 2.03 (SE +/- 0.10, N = 8; MIN: 1.84 / MAX: 2.58)
NVIDIA RTX 3090: 4.17 (SE +/- 2.18, N = 6; MIN: 1.83 / MAX: 1393.33)
NVIDIA RTX 4070 TI SUPER: 2.05 (SE +/- 0.19, N = 3; MIN: 1.83 / MAX: 6.6)
Note: (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: mnasnet

ms, Fewer Is Better (NCNN 20230517)
NVIDIA RTX 4070 SUPER: 3.85 (SE +/- 1.31, N = 9; MIN: 1.89 / MAX: 1093.29)
NVIDIA RTX 4070: 2.22 (SE +/- 0.08, N = 8; MIN: 1.83 / MAX: 2.54)
NVIDIA RTX 4070 TI: 2.30 (SE +/- 0.05, N = 9; MIN: 2.15 / MAX: 2.58)
NVIDIA RTX 3090: 2.24 (SE +/- 0.06, N = 5; MIN: 2.07 / MAX: 6.02)
NVIDIA RTX 4070 TI SUPER: 2.21 (SE +/- 0.13, N = 3; MIN: 2.01 / MAX: 3.93)
Note: (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: efficientnet-b0

ms, Fewer Is Better (NCNN 20230517)
NVIDIA RTX 4070 SUPER: 5.07 (SE +/- 0.97, N = 9; MIN: 3.22 / MAX: 1124.2)
NVIDIA RTX 4070: 3.46 (SE +/- 0.09, N = 9; MIN: 2.91 / MAX: 3.79)
NVIDIA RTX 4070 TI: 3.49 (SE +/- 0.06, N = 9; MIN: 3.18 / MAX: 4.03)
NVIDIA RTX 3090: 13.87 (SE +/- 9.20, N = 6; MIN: 2.86 / MAX: 2218.7)
NVIDIA RTX 4070 TI SUPER: 3.36 (SE +/- 0.07, N = 3; MIN: 3.21 / MAX: 3.57)
Note: (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: blazeface

ms, Fewer Is Better (NCNN 20230517)
NVIDIA RTX 4070 SUPER: 0.84 (SE +/- 0.04, N = 9; MIN: 0.65 / MAX: 4.63)
NVIDIA RTX 4070: 0.84 (SE +/- 0.03, N = 9; MIN: 0.64 / MAX: 0.96)
NVIDIA RTX 4070 TI: 0.81 (SE +/- 0.03, N = 9; MIN: 0.61 / MAX: 1.19)
NVIDIA RTX 3090: 0.86 (SE +/- 0.03, N = 6; MIN: 0.64 / MAX: 3.3)
NVIDIA RTX 4070 TI SUPER: 0.86 (SE +/- 0.05, N = 3; MIN: 0.75 / MAX: 2.68)
Note: (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: googlenet

ms, Fewer Is Better (NCNN 20230517)
NVIDIA RTX 4070 SUPER: 11.04 (SE +/- 1.21, N = 9; MIN: 5.28 / MAX: 1769.19)
NVIDIA RTX 4070: 6.06 (SE +/- 0.14, N = 9; MIN: 5.33 / MAX: 8.36)
NVIDIA RTX 4070 TI: 5.87 (SE +/- 0.14, N = 9; MIN: 5.2 / MAX: 6.88)
NVIDIA RTX 3090: 7.49 (SE +/- 1.05, N = 6; MIN: 5.46 / MAX: 1242.73)
NVIDIA RTX 4070 TI SUPER: 6.25 (SE +/- 0.24, N = 3; MIN: 5.84 / MAX: 6.86)
Note: (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: vgg16

ms, Fewer Is Better (NCNN 20230517)
NVIDIA RTX 4070 SUPER: 117.81 (SE +/- 29.60, N = 9; MIN: 17.16 / MAX: 647.67)
NVIDIA RTX 4070: 54.54 (SE +/- 19.29, N = 9; MIN: 17.54 / MAX: 646.66)
NVIDIA RTX 4070 TI: 32.05 (SE +/- 11.81, N = 9; MIN: 17.34 / MAX: 644.35)
NVIDIA RTX 3090: 145.72 (SE +/- 22.21, N = 6; MIN: 17.46 / MAX: 648.88)
NVIDIA RTX 4070 TI SUPER: 21.76 (SE +/- 0.19, N = 3; MIN: 21.34 / MAX: 23.45)
Note: (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: resnet18

ms, Fewer Is Better (NCNN 20230517)
NVIDIA RTX 4070 SUPER: 8.97 (SE +/- 3.49, N = 9; MIN: 3.94 / MAX: 922.04)
NVIDIA RTX 4070: 8.58 (SE +/- 3.20, N = 9; MIN: 3.98 / MAX: 912.04)
NVIDIA RTX 4070 TI: 5.47 (SE +/- 1.33, N = 9; MIN: 3.95 / MAX: 726.67)
NVIDIA RTX 3090: 17.41 (SE +/- 6.10, N = 6; MIN: 4.05 / MAX: 900.27)
NVIDIA RTX 4070 TI SUPER: 4.64 (SE +/- 0.08, N = 3; MIN: 4.46 / MAX: 7.98)
Note: (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: alexnet

ms, Fewer Is Better (NCNN 20230517)
NVIDIA RTX 4070 SUPER: 16.17 (SE +/- 5.86, N = 9; MIN: 3.52 / MAX: 436.52)
NVIDIA RTX 4070: 9.33 (SE +/- 3.71, N = 9; MIN: 3.5 / MAX: 430.03)
NVIDIA RTX 4070 TI: 3.74 (SE +/- 0.03, N = 9; MIN: 3.61 / MAX: 3.98)
NVIDIA RTX 3090: 3.69 (SE +/- 0.02, N = 6; MIN: 3.59 / MAX: 7.37)
NVIDIA RTX 4070 TI SUPER: 4.38 (SE +/- 0.03, N = 3; MIN: 4.29 / MAX: 6.18)
Note: (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: resnet50

ms, Fewer Is Better (NCNN 20230517)
NVIDIA RTX 4070 SUPER: 46.26 (SE +/- 14.70, N = 9; MIN: 7.71 / MAX: 1829.99)
NVIDIA RTX 4070: 8.24 (SE +/- 0.10, N = 9; MIN: 7.87 / MAX: 9.87)
NVIDIA RTX 4070 TI: 14.32 (SE +/- 4.23, N = 9; MIN: 7.9 / MAX: 1787.49)
NVIDIA RTX 3090: 27.77 (SE +/- 11.48, N = 6; MIN: 7.77 / MAX: 1603.33)
NVIDIA RTX 4070 TI SUPER: 8.58 (SE +/- 0.22, N = 3; MIN: 8.2 / MAX: 11.07)
Note: (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: yolov4-tiny

ms, Fewer Is Better (NCNN 20230517)
NVIDIA RTX 4070 SUPER: 63.82 (SE +/- 10.56, N = 9; MIN: 10.28 / MAX: 858.44)
NVIDIA RTX 4070: 25.11 (SE +/- 7.50, N = 9; MIN: 10.66 / MAX: 857.35)
NVIDIA RTX 4070 TI: 16.47 (SE +/- 2.58, N = 9; MIN: 10.61 / MAX: 826.68)
NVIDIA RTX 3090: 26.85 (SE +/- 5.27, N = 6; MIN: 10.35 / MAX: 853.14)
NVIDIA RTX 4070 TI SUPER: 14.26 (SE +/- 3.14, N = 3; MIN: 10.89 / MAX: 673.37)
Note: (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: squeezenet_ssd

ms, Fewer Is Better (NCNN 20230517)
NVIDIA RTX 4070 SUPER: 6.86 (SE +/- 1.76, N = 9; MIN: 4.34 / MAX: 1630.01)
NVIDIA RTX 4070: 5.27 (SE +/- 0.17, N = 9; MIN: 4.53 / MAX: 7.53)
NVIDIA RTX 4070 TI: 5.36 (SE +/- 0.29, N = 9; MIN: 4.55 / MAX: 496.3)
NVIDIA RTX 3090: 6.63 (SE +/- 1.57, N = 6; MIN: 4.43 / MAX: 1636.66)
NVIDIA RTX 4070 TI SUPER: 5.11 (SE +/- 0.54, N = 3; MIN: 4.43 / MAX: 8.4)
Note: (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: regnety_400m

ms, Fewer Is Better (NCNN 20230517)
NVIDIA RTX 4070 SUPER: 11.11 (SE +/- 3.28, N = 9; MIN: 5.49 / MAX: 4942.19)
NVIDIA RTX 4070: 6.50 (SE +/- 0.31, N = 8; MIN: 5.52 / MAX: 460.02)
NVIDIA RTX 4070 TI: 5.97 (SE +/- 0.24, N = 9; MIN: 5.49 / MAX: 7.35)
NVIDIA RTX 3090: 8.06 (SE +/- 1.17, N = 6; MIN: 5.43 / MAX: 1922.26)
NVIDIA RTX 4070 TI SUPER: 6.77 (SE +/- 0.60, N = 3; MIN: 5.54 / MAX: 7.63)
Note: (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: vision_transformer

ms, Fewer Is Better (NCNN 20230517)
NVIDIA RTX 4070 SUPER: 844.61 (SE +/- 87.53, N = 9; MIN: 46.34 / MAX: 1866.93)
NVIDIA RTX 4070: 281.56 (SE +/- 61.31, N = 9; MIN: 46.48 / MAX: 1913.33)
NVIDIA RTX 4070 TI: 390.18 (SE +/- 25.65, N = 9; MIN: 46.49 / MAX: 1816.77)
NVIDIA RTX 3090: 663.24 (SE +/- 76.74, N = 6; MIN: 46.42 / MAX: 1833.21)
NVIDIA RTX 4070 TI SUPER: 571.53 (SE +/- 136.49, N = 3; MIN: 48.2 / MAX: 1819.89)
Note: (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: FastestDet

ms, Fewer Is Better (NCNN 20230517)
NVIDIA RTX 4070 SUPER: 2.86 (SE +/- 0.29, N = 9; MIN: 2.17 / MAX: 577.17)
NVIDIA RTX 4070: 2.34 (SE +/- 0.10, N = 9; MIN: 2 / MAX: 3.86)
NVIDIA RTX 4070 TI: 2.84 (SE +/- 0.12, N = 8; MIN: 2.4 / MAX: 5.07)
NVIDIA RTX 3090: 6.38 (SE +/- 2.14, N = 6; MIN: 2.14 / MAX: 1476.09)
NVIDIA RTX 4070 TI SUPER: 2.54 (SE +/- 0.26, N = 3; MIN: 2.14 / MAX: 4.21)
Note: (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

vkpeak

fp32-scalar

GFLOPS, More Is Better (vkpeak 20230730)
NVIDIA RTX 3090: 20263.13 (SE +/- 36.15, N = 3)
NVIDIA RTX 4070 TI SUPER: 23883.53 (SE +/- 38.13, N = 3)

vkpeak

fp32-vec4

GFLOPS, More Is Better (vkpeak 20230730)
NVIDIA RTX 3090: 26563.72 (SE +/- 1.51, N = 3)
NVIDIA RTX 4070 TI SUPER: 31591.71 (SE +/- 43.84, N = 3)

vkpeak

fp16-scalar

GFLOPS, More Is Better (vkpeak 20230730)
NVIDIA RTX 3090: 20080.47 (SE +/- 34.71, N = 3)
NVIDIA RTX 4070 TI SUPER: 23825.05 (SE +/- 34.93, N = 3)

vkpeak

fp16-vec4

GFLOPS, More Is Better (vkpeak 20230730)
NVIDIA RTX 3090: 39771.97 (SE +/- 69.71, N = 3)
NVIDIA RTX 4070 TI SUPER: 47192.56 (SE +/- 76.15, N = 3)

vkpeak

fp64-scalar

GFLOPS, More Is Better (vkpeak 20230730)
NVIDIA RTX 3090: 638.70 (SE +/- 0.03, N = 3)
NVIDIA RTX 4070 TI SUPER: 750.47 (SE +/- 0.01, N = 3)

vkpeak

fp64-vec4

GFLOPS, More Is Better (vkpeak 20230730)
NVIDIA RTX 3090: 638.72 (SE +/- 0.02, N = 3)
NVIDIA RTX 4070 TI SUPER: 749.76 (SE +/- 0.00, N = 3)

vkpeak

int32-scalar

GIOPS, More Is Better (vkpeak 20230730)
NVIDIA RTX 3090: 20280.33 (SE +/- 3.21, N = 3)
NVIDIA RTX 4070 TI SUPER: 23874.85 (SE +/- 0.36, N = 3)

vkpeak

int32-vec4

GIOPS, More Is Better (vkpeak 20230730)
NVIDIA RTX 3090: 19996.92 (SE +/- 2.34, N = 3)
NVIDIA RTX 4070 TI SUPER: 23733.30 (SE +/- 34.76, N = 3)

vkpeak

int16-scalar

GIOPS, More Is Better (vkpeak 20230730)
NVIDIA RTX 3090: 13259.97 (SE +/- 0.21, N = 3)
NVIDIA RTX 4070 TI SUPER: 15859.37 (SE +/- 22.68, N = 3)

vkpeak

int16-vec4

GIOPS, More Is Better (vkpeak 20230730)
NVIDIA RTX 3090: 16331.16 (SE +/- 1.52, N = 3)
NVIDIA RTX 4070 TI SUPER: 21124.09 (SE +/- 3.02, N = 3)


Phoronix Test Suite v10.8.4