RTX 4070 SUPER

Intel Core i9-13900K testing with an ASUS TUF GAMING Z790-PRO WIFI (1401 BIOS) and ASUS NVIDIA GeForce RTX 4070 SUPER 12GB on EndeavourOS rolling via the Phoronix Test Suite.

NVIDIA RTX 4070 SUPER:

  Processor: Intel Core i9-13900K @ 5.50GHz (24 Cores / 32 Threads), Motherboard: ASUS TUF GAMING Z790-PRO WIFI (1401 BIOS), Chipset: Intel Device 7a27, Memory: 32GB, Disk: 4001GB Seagate ZP4000GP304001, Graphics: ASUS NVIDIA GeForce RTX 4070 SUPER 12GB, Audio: Realtek ALC1220, Monitor: ARZOPA, Network: Intel I226-V + Intel Device 7a70

  OS: EndeavourOS rolling, Kernel: 6.7.1-arch1-1 (x86_64), Desktop: KDE Plasma 5.27.10, Display Server: X Server 1.21.1.11, Display Driver: NVIDIA 550.40.07, OpenGL: 4.6.0, OpenCL: OpenCL 3.0 CUDA 12.4.74, Compiler: GCC 13.2.1 20230801, File-System: ext4, Screen Resolution: 1920x1080

RTX 4070 SUPER:

  Processor: Intel Core i9-13900K @ 5.50GHz (24 Cores / 32 Threads), Motherboard: ASUS TUF GAMING Z790-PRO WIFI (1401 BIOS), Chipset: Intel Device 7a27, Memory: 32GB, Disk: 4001GB Seagate ZP4000GP304001, Graphics: ASUS NVIDIA GeForce RTX 4070 SUPER 12GB, Audio: Realtek ALC1220, Monitor: ARZOPA, Network: Intel I226-V + Intel Device 7a70

  OS: EndeavourOS rolling, Kernel: 6.7.1-arch1-1 (x86_64), Desktop: KDE Plasma 5.27.10, Display Server: X Server 1.21.1.11, Display Driver: NVIDIA 550.40.07, OpenGL: 4.6.0, OpenCL: OpenCL 3.0 CUDA 12.4.74, Compiler: GCC 13.2.1 20230801, File-System: ext4, Screen Resolution: 1920x1080

NVIDIA 4070 SUPER:

  Processor: Intel Core i9-13900K @ 5.50GHz (24 Cores / 32 Threads), Motherboard: ASUS TUF GAMING Z790-PRO WIFI (1401 BIOS), Chipset: Intel Device 7a27, Memory: 32GB, Disk: 4001GB Seagate ZP4000GP304001, Graphics: ASUS NVIDIA GeForce RTX 4070 SUPER 12GB, Audio: Realtek ALC1220, Monitor: ARZOPA, Network: Intel I226-V + Intel Device 7a70

  OS: EndeavourOS rolling, Kernel: 6.7.1-arch1-1 (x86_64), Desktop: KDE Plasma 5.27.10, Display Server: X Server 1.21.1.11, Display Driver: NVIDIA 550.40.07, OpenGL: 4.6.0, OpenCL: OpenCL 3.0 CUDA 12.4.74, Compiler: GCC 13.2.1 20230801 + CUDA 12.3, File-System: ext4, Screen Resolution: 1920x1080

GpuOwl 7.2.1
Exponent: 77936867
Iterations / Second > Higher Is Better
NVIDIA RTX 4070 SUPER . 646.41

GpuOwl 7.2.1
Exponent: 332220523
Iterations / Second > Higher Is Better
NVIDIA RTX 4070 SUPER . 137.44

OctaneBench 2020.1
Total Score
Score > Higher Is Better
NVIDIA 4070 SUPER . 720.97

vkpeak 20230730
GFLOPS > Higher Is Better

GpuOwl 7.2.1
Exponent: 57885161
Iterations / Second > Higher Is Better
NVIDIA RTX 4070 SUPER . 869.07

FAHBench 2.3.2
Ns Per Day > Higher Is Better
NVIDIA 4070 SUPER . 366.06

LuxCoreRender 2.6
Scene: LuxCore Benchmark - Acceleration: GPU
M samples/sec > Higher Is Better
NVIDIA 4070 SUPER . 12.82

LuxCoreRender 2.6
Scene: DLSC - Acceleration: GPU
M samples/sec > Higher Is Better
NVIDIA 4070 SUPER . 13.59

IndigoBench 4.4
Acceleration: OpenCL GPU - Scene: Bedroom
M samples/s > Higher Is Better
NVIDIA 4070 SUPER . 19.80

VkResample 1.0
Upscale: 2x - Precision: Double
ms < Lower Is Better
NVIDIA 4070 SUPER . 339.59

IndigoBench 4.4
Acceleration: OpenCL GPU - Scene: Supercar
M samples/s > Higher Is Better
NVIDIA 4070 SUPER . 52.81

LuxCoreRender 2.6
Scene: Orange Juice - Acceleration: GPU
M samples/sec > Higher Is Better
NVIDIA 4070 SUPER . 11.72

LuxCoreRender 2.6
Scene: Danish Mood - Acceleration: GPU
M samples/sec > Higher Is Better
NVIDIA 4070 SUPER . 10.56

Blender 4.0
Blend File: Barbershop - Compute: NVIDIA OptiX
Seconds < Lower Is Better
NVIDIA 4070 SUPER . 51.30

NAMD CUDA 2.14
ATPase Simulation - 327,506 Atoms
days/ns < Lower Is Better
NVIDIA 4070 SUPER . 0.06791

Blender 4.0
Blend File: Fishy Cat - Compute: NVIDIA OptiX
Seconds < Lower Is Better
NVIDIA 4070 SUPER . 9.45

RealSR-NCNN 20200818
Scale: 4x - TAA: Yes
Seconds < Lower Is Better
NVIDIA 4070 SUPER . 34.89

RealSR-NCNN 20200818
Scale: 4x - TAA: No
Seconds < Lower Is Better
NVIDIA 4070 SUPER . 6.323

Blender 4.0
Blend File: BMW27 - Compute: NVIDIA OptiX
Seconds < Lower Is Better
NVIDIA 4070 SUPER . 5.57

ViennaCL 1.7.1
Test: CPU BLAS - dGEMM-TT
GFLOPs/s > Higher Is Better
NVIDIA 4070 SUPER . 122

ViennaCL 1.7.1
Test: CPU BLAS - dGEMM-TN
GFLOPs/s > Higher Is Better
NVIDIA 4070 SUPER . 115

ViennaCL 1.7.1
Test: CPU BLAS - dGEMM-NT
GFLOPs/s > Higher Is Better
NVIDIA 4070 SUPER . 117

ViennaCL 1.7.1
Test: CPU BLAS - dGEMM-NN
GFLOPs/s > Higher Is Better
NVIDIA 4070 SUPER . 119

ViennaCL 1.7.1
Test: CPU BLAS - dGEMV-T
GB/s > Higher Is Better
NVIDIA 4070 SUPER . 109

ViennaCL 1.7.1
Test: CPU BLAS - dGEMV-N
GB/s > Higher Is Better
NVIDIA 4070 SUPER . 102

ViennaCL 1.7.1
Test: CPU BLAS - dDOT
GB/s > Higher Is Better
NVIDIA 4070 SUPER . 96.8

ViennaCL 1.7.1
Test: CPU BLAS - dAXPY
GB/s > Higher Is Better
NVIDIA 4070 SUPER . 87.2

ViennaCL 1.7.1
Test: CPU BLAS - dCOPY
GB/s > Higher Is Better
NVIDIA 4070 SUPER . 70.8

ViennaCL 1.7.1
Test: CPU BLAS - sDOT
GB/s > Higher Is Better
NVIDIA 4070 SUPER . 165

ViennaCL 1.7.1
Test: CPU BLAS - sAXPY
GB/s > Higher Is Better
NVIDIA 4070 SUPER . 156

ViennaCL 1.7.1
Test: CPU BLAS - sCOPY
GB/s > Higher Is Better
NVIDIA 4070 SUPER . 132

ViennaCL 1.7.1
Test: OpenCL BLAS - dGEMM-TT
GFLOPs/s > Higher Is Better
NVIDIA 4070 SUPER . 613

ViennaCL 1.7.1
Test: OpenCL BLAS - dGEMM-TN
GFLOPs/s > Higher Is Better
NVIDIA 4070 SUPER . 599

ViennaCL 1.7.1
Test: OpenCL BLAS - dGEMM-NT
GFLOPs/s > Higher Is Better
NVIDIA 4070 SUPER . 584

ViennaCL 1.7.1
Test: OpenCL BLAS - dGEMM-NN
GFLOPs/s > Higher Is Better
NVIDIA 4070 SUPER . 577

ViennaCL 1.7.1
Test: OpenCL BLAS - dGEMV-T
GB/s > Higher Is Better
NVIDIA 4070 SUPER . 389

ViennaCL 1.7.1
Test: OpenCL BLAS - dGEMV-N
GB/s > Higher Is Better
NVIDIA 4070 SUPER . 210

ViennaCL 1.7.1
Test: OpenCL BLAS - dDOT
GB/s > Higher Is Better
NVIDIA 4070 SUPER . 458

ViennaCL 1.7.1
Test: OpenCL BLAS - dAXPY
GB/s > Higher Is Better
NVIDIA 4070 SUPER . 437

ViennaCL 1.7.1
Test: OpenCL BLAS - dCOPY
GB/s > Higher Is Better
NVIDIA 4070 SUPER . 423

ViennaCL 1.7.1
Test: OpenCL BLAS - sDOT
GB/s > Higher Is Better
NVIDIA 4070 SUPER . 370

ViennaCL 1.7.1
Test: OpenCL BLAS - sAXPY
GB/s > Higher Is Better
NVIDIA 4070 SUPER . 392

ViennaCL 1.7.1
Test: OpenCL BLAS - sCOPY
GB/s > Higher Is Better
NVIDIA 4070 SUPER . 334

PyTorch 2.1
Device: NVIDIA CUDA GPU - Batch Size: 256 - Model: Efficientnet_v2_l
batches/sec > Higher Is Better
RTX 4070 SUPER . 103.17

Blender 4.0
Blend File: Pabellon Barcelona - Compute: NVIDIA OptiX
Seconds < Lower Is Better
NVIDIA 4070 SUPER . 14.29

Blender 4.0
Blend File: Classroom - Compute: NVIDIA OptiX
Seconds < Lower Is Better
NVIDIA 4070 SUPER . 12.60

ProjectPhysX OpenCL-Benchmark 1.2
Operation: Memory Bandwidth Coalesced Write
GB/s > Higher Is Better
NVIDIA RTX 4070 SUPER . 455.01

ProjectPhysX OpenCL-Benchmark 1.2
Operation: Memory Bandwidth Coalesced Read
GB/s > Higher Is Better
NVIDIA RTX 4070 SUPER . 464.86

ProjectPhysX OpenCL-Benchmark 1.2
Operation: INT8 Compute
TIOPs/s > Higher Is Better
NVIDIA RTX 4070 SUPER . 14.31

ProjectPhysX OpenCL-Benchmark 1.2
Operation: INT16 Compute
TIOPs/s > Higher Is Better
NVIDIA RTX 4070 SUPER . 17.17

ProjectPhysX OpenCL-Benchmark 1.2
Operation: INT32 Compute
TIOPs/s > Higher Is Better
NVIDIA RTX 4070 SUPER . 19.89

ProjectPhysX OpenCL-Benchmark 1.2
Operation: INT64 Compute
TIOPs/s > Higher Is Better
NVIDIA RTX 4070 SUPER . 4.214

ProjectPhysX OpenCL-Benchmark 1.2
Operation: FP32 Compute
TFLOPs/s > Higher Is Better
NVIDIA RTX 4070 SUPER . 38.59

ProjectPhysX OpenCL-Benchmark 1.2
Operation: FP64 Compute
TFLOPs/s > Higher Is Better
NVIDIA RTX 4070 SUPER . 0.621

PyTorch 2.1
Device: NVIDIA CUDA GPU - Batch Size: 256 - Model: ResNet-152
batches/sec > Higher Is Better
RTX 4070 SUPER . 194.58

PyTorch 2.1
Device: NVIDIA CUDA GPU - Batch Size: 32 - Model: Efficientnet_v2_l
batches/sec > Higher Is Better
RTX 4070 SUPER . 102.60

PyTorch 2.1
Device: NVIDIA CUDA GPU - Batch Size: 512 - Model: Efficientnet_v2_l
batches/sec > Higher Is Better
RTX 4070 SUPER . 103.57

VkResample 1.0
Upscale: 2x - Precision: Single
ms < Lower Is Better
NVIDIA 4070 SUPER . 18.49

PyTorch 2.1
Device: NVIDIA CUDA GPU - Batch Size: 64 - Model: ResNet-50
batches/sec > Higher Is Better
RTX 4070 SUPER . 507.45

clpeak 1.1.2
OpenCL Test: Double-Precision Double
GFLOPS > Higher Is Better
NVIDIA 4070 SUPER . 630.11

LuxCoreRender 2.6
Scene: Rainbow Colors and Prism - Acceleration: GPU
M samples/sec > Higher Is Better
NVIDIA 4070 SUPER . 27.67

Hashcat 6.2.4
Benchmark: SHA-512
H/s > Higher Is Better
NVIDIA 4070 SUPER . 3232733333

Hashcat 6.2.4
Benchmark: SHA1
H/s > Higher Is Better
NVIDIA 4070 SUPER . 22132600000

Hashcat 6.2.4
Benchmark: MD5
H/s > Higher Is Better
NVIDIA 4070 SUPER . 67583033333

PyTorch 2.1
Device: NVIDIA CUDA GPU - Batch Size: 1 - Model: Efficientnet_v2_l
batches/sec > Higher Is Better
RTX 4070 SUPER . 106.37

PyTorch 2.1
Device: NVIDIA CUDA GPU - Batch Size: 32 - Model: ResNet-152
batches/sec > Higher Is Better
RTX 4070 SUPER . 195.39

PyTorch 2.1
Device: NVIDIA CUDA GPU - Batch Size: 512 - Model: ResNet-50
batches/sec > Higher Is Better
RTX 4070 SUPER . 504.27

Hashcat 6.2.4
Benchmark: TrueCrypt RIPEMD160 + XTS
H/s > Higher Is Better
NVIDIA 4070 SUPER . 802967

PyTorch 2.1
Device: NVIDIA CUDA GPU - Batch Size: 1 - Model: ResNet-152
batches/sec > Higher Is Better
RTX 4070 SUPER . 201.94

Rodinia 3.1
Test: OpenCL Particle Filter
Seconds < Lower Is Better
NVIDIA 4070 SUPER . 3.480

PyTorch 2.1
Device: NVIDIA CUDA GPU - Batch Size: 16 - Model: ResNet-50
batches/sec > Higher Is Better
RTX 4070 SUPER . 509.45

cl-mem 2017-01-13
Benchmark: Copy
GB/s > Higher Is Better
NVIDIA 4070 SUPER . 331.8

cl-mem 2017-01-13
Benchmark: Read
GB/s > Higher Is Better
NVIDIA 4070 SUPER . 446.2

cl-mem 2017-01-13
Benchmark: Write
GB/s > Higher Is Better
NVIDIA 4070 SUPER . 407.5

Hashcat 6.2.4
Benchmark: 7-Zip
H/s > Higher Is Better
NVIDIA 4070 SUPER . 1176467

Waifu2x-NCNN Vulkan 20200818
Scale: 2x - Denoise: 3 - TAA: Yes
Seconds < Lower Is Better
NVIDIA 4070 SUPER . 2.855

FinanceBench 2016-07-25
Benchmark: Black-Scholes OpenCL
ms < Lower Is Better
NVIDIA 4070 SUPER . 5.912

PyTorch 2.1
Device: NVIDIA CUDA GPU - Batch Size: 64 - Model: Efficientnet_v2_l
batches/sec > Higher Is Better

clpeak 1.1.2
OpenCL Test: Global Memory Bandwidth
GBPS > Higher Is Better
NVIDIA 4070 SUPER . 437.65

MandelGPU 1.3pts1
OpenCL Device: GPU
Samples/sec > Higher Is Better
NVIDIA 4070 SUPER . 587219538.2

PyTorch 2.1
Device: NVIDIA CUDA GPU - Batch Size: 256 - Model: ResNet-50
batches/sec > Higher Is Better

PyTorch 2.1
Device: NVIDIA CUDA GPU - Batch Size: 512 - Model: ResNet-152
batches/sec > Higher Is Better

Waifu2x-NCNN Vulkan 20200818
Scale: 2x - Denoise: 3 - TAA: No
Seconds < Lower Is Better

PyTorch 2.1
Device: NVIDIA CUDA GPU - Batch Size: 64 - Model: ResNet-152
batches/sec > Higher Is Better

PyTorch 2.1
Device: NVIDIA CUDA GPU - Batch Size: 1 - Model: ResNet-50
batches/sec > Higher Is Better

PyTorch 2.1
Device: NVIDIA CUDA GPU - Batch Size: 16 - Model: Efficientnet_v2_l
batches/sec > Higher Is Better

clpeak 1.1.2
OpenCL Test: Integer Compute INT
GIOPS > Higher Is Better
NVIDIA 4070 SUPER . 18170.54

PyTorch 2.1
Device: NVIDIA CUDA GPU - Batch Size: 16 - Model: ResNet-152
batches/sec > Higher Is Better

PyTorch 2.1
Device: NVIDIA CUDA GPU - Batch Size: 32 - Model: ResNet-50
batches/sec > Higher Is Better

clpeak 1.1.2
OpenCL Test: Single-Precision Float
GFLOPS > Higher Is Better
NVIDIA 4070 SUPER . 35492.69

NeatBench 5
Acceleration: GPU
FPS > Higher Is Better
NVIDIA 4070 SUPER . 4070

PlaidML
FP16: No - Mode: Inference - Network: DenseNet 201 - Device: OpenCL
Examples Per Second > Higher Is Better

PlaidML
FP16: Yes - Mode: Inference - Network: Mobilenet - Device: OpenCL
Examples Per Second > Higher Is Better

PlaidML
FP16: No - Mode: Inference - Network: Mobilenet - Device: OpenCL
Examples Per Second > Higher Is Better

PlaidML
FP16: No - Mode: Inference - Network: IMDB LSTM - Device: OpenCL
Examples Per Second > Higher Is Better

PlaidML
FP16: No - Mode: Training - Network: Mobilenet - Device: OpenCL
Examples Per Second > Higher Is Better

GROMACS 2023
Implementation: NVIDIA CUDA GPU - Input: water_GMX50_bare
Ns Per Day > Higher Is Better

LeelaChessZero 0.30
Backend: OpenCL
Nodes Per Second > Higher Is Better

Libplacebo 5.229.1
FPS > Higher Is Better

SHOC Scalable HeterOgeneous Computing 2020-04-17
Target: OpenCL - Benchmark: GEMM SGEMM_N

Mixbench 2020-06-23
Backend: OpenCL - Benchmark: Single Precision

Mixbench 2020-06-23
Backend: OpenCL - Benchmark: Double Precision

VkFFT 1.2.31
Test: FFT + iFFT R2C / C2R
Benchmark Score > Higher Is Better

NCNN 20230517
Target: Vulkan GPU
ms < Lower Is Better

Caffe 2020-02-13
Model: GoogleNet - Acceleration: NVIDIA CUDA - Iterations: 1000
Milli-Seconds < Lower Is Better

Caffe 2020-02-13
Model: GoogleNet - Acceleration: NVIDIA CUDA - Iterations: 200
Milli-Seconds < Lower Is Better

Caffe 2020-02-13
Model: AlexNet - Acceleration: NVIDIA CUDA - Iterations: 1000
Milli-Seconds < Lower Is Better

Caffe 2020-02-13
Model: AlexNet - Acceleration: NVIDIA CUDA - Iterations: 200
Milli-Seconds < Lower Is Better

Caffe 2020-02-13
Model: AlexNet - Acceleration: NVIDIA CUDA - Iterations: 100
Milli-Seconds < Lower Is Better

Betsy GPU Compressor 1.1 Beta
Codec: ETC2 RGB - Quality: Highest
Seconds < Lower Is Better

SHOC Scalable HeterOgeneous Computing 2020-04-17
Target: OpenCL - Benchmark: Texture Read Bandwidth

SHOC Scalable HeterOgeneous Computing 2020-04-17
Target: OpenCL - Benchmark: Bus Speed Download

SHOC Scalable HeterOgeneous Computing 2020-04-17
Target: OpenCL - Benchmark: Reduction

SHOC Scalable HeterOgeneous Computing 2020-04-17
Target: OpenCL - Benchmark: FFT SP

Mixbench 2020-06-23
Backend: NVIDIA CUDA - Benchmark: Single Precision

Mixbench 2020-06-23
Backend: NVIDIA CUDA - Benchmark: Double Precision

Mixbench 2020-06-23
Backend: NVIDIA CUDA - Benchmark: Half Precision

Mixbench 2020-06-23
Backend: NVIDIA CUDA - Benchmark: Integer

Mixbench 2020-06-23
Backend: OpenCL - Benchmark: Integer

VkFFT 1.2.31
Test: FFT + iFFT C2C Bluestein in single precision
Benchmark Score > Higher Is Better

VkFFT 1.2.31
Test: FFT + iFFT C2C 1D batched in half precision
Benchmark Score > Higher Is Better

Caffe 2020-02-13
Model: GoogleNet - Acceleration: NVIDIA CUDA - Iterations: 100
Milli-Seconds < Lower Is Better

ArrayFire 3.9
Test: Conjugate Gradient OpenCL

SHOC Scalable HeterOgeneous Computing 2020-04-17
Target: OpenCL - Benchmark: Bus Speed Readback

SHOC Scalable HeterOgeneous Computing 2020-04-17
Target: OpenCL - Benchmark: Max SP Flops

SHOC Scalable HeterOgeneous Computing 2020-04-17
Target: OpenCL - Benchmark: MD5 Hash

SHOC Scalable HeterOgeneous Computing 2020-04-17
Target: OpenCL - Benchmark: Triad

SHOC Scalable HeterOgeneous Computing 2020-04-17
Target: OpenCL - Benchmark: S3D

VkFFT 1.2.31
Test: FFT + iFFT C2C multidimensional in single precision
Benchmark Score > Higher Is Better

VkFFT 1.2.31
Test: FFT + iFFT C2C 1D batched in double precision
Benchmark Score > Higher Is Better

Betsy GPU Compressor 1.1 Beta
Codec: ETC1 - Quality: Highest
Seconds < Lower Is Better

VkFFT 1.2.31
Test: FFT + iFFT C2C 1D batched in single precision, no reshuffling
Benchmark Score > Higher Is Better

VkFFT 1.2.31
Test: FFT + iFFT C2C Bluestein benchmark in double precision
Benchmark Score > Higher Is Better

VkFFT 1.2.31
Test: FFT + iFFT C2C 1D batched in single precision
Benchmark Score > Higher Is Better