ngc

AMD Ryzen Threadripper PRO 7995WX 96-Cores testing with a HP 8B24 (U65 Ver. 01.01.04 BIOS) and NVIDIA RTX A4000 16GB on Ubuntu 23.10 via the Phoronix Test Suite.

a:

  Processor: AMD Ryzen Threadripper PRO 7995WX 96-Cores @ 6.44GHz (96 Cores / 192 Threads), Motherboard: HP 8B24 (U65 Ver. 01.01.04 BIOS), Chipset: AMD Device 14a4, Memory: 128GB, Disk: 2 x 1024GB SAMSUNG MZVL21T0HCLR-00BH1, Graphics: NVIDIA RTX A4000 16GB, Audio: NVIDIA GA104 HD Audio, Monitor: ASUS VP28U, Network: Realtek RTL8111/8168/8411

  OS: Ubuntu 23.10, Kernel: 6.5.0-14-generic (x86_64), Desktop: GNOME Shell 45.0, Display Server: X Server 1.21.1.7, Display Driver: NVIDIA 535.129.03, OpenGL: 4.6.0, OpenCL: OpenCL 3.0 CUDA 12.2.147, Compiler: GCC 13.2.0, File-System: ext4, Screen Resolution: 3840x2160

  vkpeak 20230730
  fp32-scalar
  GFLOPS > Higher Is Better
  a . 11276.77 |=================================================================

  vkpeak 20230730
  fp32-vec4
  GFLOPS > Higher Is Better
  a . 14596.88 |=================================================================

  vkpeak 20230730
  fp16-scalar
  GFLOPS > Higher Is Better
  a . 11037.49 |=================================================================

  vkpeak 20230730
  fp16-vec4
  GFLOPS > Higher Is Better
  a . 21947.85 |=================================================================

  vkpeak 20230730
  fp64-scalar
  GFLOPS > Higher Is Better
  a . 356.21 |===================================================================

  vkpeak 20230730
  fp64-vec4
  GFLOPS > Higher Is Better
  a . 356.28 |===================================================================

  vkpeak 20230730
  int32-scalar
  GIOPS > Higher Is Better
  a . 11249.11 |=================================================================

  vkpeak 20230730
  int32-vec4
  GIOPS > Higher Is Better
  a . 10999.52 |=================================================================

  vkpeak 20230730
  int16-scalar
  GIOPS > Higher Is Better
  a . 7189.38 |==================================================================

  vkpeak 20230730
  int16-vec4
  GIOPS > Higher Is Better
  a . 8646.42 |==================================================================

  RealSR-NCNN 20200818
  Scale: 4x - TAA: No
  Seconds < Lower Is Better
  a . 10.01 |====================================================================

  RealSR-NCNN 20200818
  Scale: 4x - TAA: Yes
  Seconds < Lower Is Better
  a . 59.85 |====================================================================

  Waifu2x-NCNN Vulkan 20200818
  Scale: 2x - Denoise: 3 - TAA: No
  Seconds < Lower Is Better

  Waifu2x-NCNN Vulkan 20200818
  Scale: 2x - Denoise: 3 - TAA: Yes
  Seconds < Lower Is Better
  a . 4.889 |====================================================================

  VkFFT 1.2.31
  Test: FFT + iFFT R2C / C2R
  Benchmark Score > Higher Is Better

  VkFFT 1.2.31
  Test: FFT + iFFT C2C 1D batched in half precision
  Benchmark Score > Higher Is Better

  VkFFT 1.2.31
  Test: FFT + iFFT C2C Bluestein in single precision
  Benchmark Score > Higher Is Better

  VkFFT 1.2.31
  Test: FFT + iFFT C2C 1D batched in double precision
  Benchmark Score > Higher Is Better

  VkFFT 1.2.31
  Test: FFT + iFFT C2C 1D batched in single precision
  Benchmark Score > Higher Is Better

  VkFFT 1.2.31
  Test: FFT + iFFT C2C multidimensional in single precision
  Benchmark Score > Higher Is Better

  VkFFT 1.2.31
  Test: FFT + iFFT C2C Bluestein benchmark in double precision
  Benchmark Score > Higher Is Better

  VkFFT 1.2.31
  Test: FFT + iFFT C2C 1D batched in single precision, no reshuffling
  Benchmark Score > Higher Is Better

  Hashcat 6.2.4
  Benchmark: MD5
  H/s > Higher Is Better
  a . 33124266667 |==============================================================

  Hashcat 6.2.4
  Benchmark: SHA1
  H/s > Higher Is Better
  a . 10834766667 |==============================================================

  Hashcat 6.2.4
  Benchmark: 7-Zip
  H/s > Higher Is Better
  a . 518600 |===================================================================

  Hashcat 6.2.4
  Benchmark: SHA-512
  H/s > Higher Is Better
  a . 1351300000 |===============================================================

  Hashcat 6.2.4
  Benchmark: TrueCrypt RIPEMD160 + XTS
  H/s > Higher Is Better
  a . 370460 |===================================================================

  Mixbench 2020-06-23
  Backend: OpenCL - Benchmark: Integer

  Mixbench 2020-06-23
  Backend: NVIDIA CUDA - Benchmark: Integer

  Mixbench 2020-06-23
  Backend: OpenCL - Benchmark: Double Precision

  Mixbench 2020-06-23
  Backend: OpenCL - Benchmark: Single Precision

  Mixbench 2020-06-23
  Backend: NVIDIA CUDA - Benchmark: Half Precision

  Mixbench 2020-06-23
  Backend: NVIDIA CUDA - Benchmark: Double Precision

  Mixbench 2020-06-23
  Backend: NVIDIA CUDA - Benchmark: Single Precision

  SHOC Scalable HeterOgeneous Computing 2020-04-17
  Target: OpenCL - Benchmark: S3D
  GFLOPS > Higher Is Better
  a . 209.57 |===================================================================

  SHOC Scalable HeterOgeneous Computing 2020-04-17
  Target: OpenCL - Benchmark: Triad
  GB/s > Higher Is Better
  a . 24.85 |====================================================================

  SHOC Scalable HeterOgeneous Computing 2020-04-17
  Target: OpenCL - Benchmark: FFT SP
  GFLOPS > Higher Is Better
  a . 1093.92 |==================================================================

  SHOC Scalable HeterOgeneous Computing 2020-04-17
  Target: OpenCL - Benchmark: MD5 Hash
  GHash/s > Higher Is Better
  a . 22.15 |====================================================================

  SHOC Scalable HeterOgeneous Computing 2020-04-17
  Target: OpenCL - Benchmark: Reduction
  GB/s > Higher Is Better
  a . 324.05 |===================================================================

  SHOC Scalable HeterOgeneous Computing 2020-04-17
  Target: OpenCL - Benchmark: GEMM SGEMM_N
  GFLOPS > Higher Is Better
  a . 3391.66 |==================================================================

  SHOC Scalable HeterOgeneous Computing 2020-04-17
  Target: OpenCL - Benchmark: Max SP Flops
  GFLOPS > Higher Is Better
  a . 20821.2 |==================================================================

  SHOC Scalable HeterOgeneous Computing 2020-04-17
  Target: OpenCL - Benchmark: Bus Speed Download
  GB/s > Higher Is Better
  a . 26.83 |====================================================================

  SHOC Scalable HeterOgeneous Computing 2020-04-17
  Target: OpenCL - Benchmark: Bus Speed Readback
  GB/s > Higher Is Better
  a . 27.08 |====================================================================

  SHOC Scalable HeterOgeneous Computing 2020-04-17
  Target: OpenCL - Benchmark: Texture Read Bandwidth
  GB/s > Higher Is Better
  a . 2005.17 |==================================================================

  Libplacebo 5.229.1
  FPS > Higher Is Better

  cl-mem 2017-01-13
  Benchmark: Copy
  GB/s > Higher Is Better
  a . 273.0 |====================================================================

  cl-mem 2017-01-13
  Benchmark: Read
  GB/s > Higher Is Better
  a . 366.1 |====================================================================

  cl-mem 2017-01-13
  Benchmark: Write
  GB/s > Higher Is Better
  a . 348.6 |====================================================================

  NAMD CUDA 2.14
  ATPase Simulation - 327,506 Atoms
  days/ns < Lower Is Better
  a . 0.12721 |==================================================================

  Betsy GPU Compressor 1.1 Beta
  Codec: ETC1 - Quality: Highest
  Seconds < Lower Is Better

  Betsy GPU Compressor 1.1 Beta
  Codec: ETC2 RGB - Quality: Highest
  Seconds < Lower Is Better

  VkResample 1.0
  Upscale: 2x - Precision: Double
  ms < Lower Is Better
  a . 500.01 |===================================================================

  VkResample 1.0
  Upscale: 2x - Precision: Single
  ms < Lower Is Better
  a . 22.10 |====================================================================

  OctaneBench 2020.1
  Total Score
  Score > Higher Is Better
  a . 358.15 |===================================================================

  RedShift Demo 3.0
  Seconds < Lower Is Better

  FAHBench 2.3.2
  Ns Per Day > Higher Is Better
  a . 224.57 |===================================================================

  clpeak 1.1.2
  OpenCL Test: Integer Compute INT
  GIOPS > Higher Is Better
  a . 9116.63 |==================================================================

  clpeak 1.1.2
  OpenCL Test: Single-Precision Float
  GFLOPS > Higher Is Better
  a . 17983.79 |=================================================================

  clpeak 1.1.2
  OpenCL Test: Double-Precision Double
  GFLOPS > Higher Is Better
  a . 347.10 |===================================================================

  clpeak 1.1.2
  OpenCL Test: Global Memory Bandwidth
  GBPS > Higher Is Better
  a . 361.29 |===================================================================

  LeelaChessZero 0.30
  Backend: OpenCL
  Nodes Per Second > Higher Is Better

  Rodinia 3.1
  Test: OpenCL Particle Filter
  Seconds < Lower Is Better
  a . 6.554 |====================================================================

  LuxCoreRender 2.6
  Scene: DLSC - Acceleration: GPU
  M samples/sec > Higher Is Better
  a . 6.28 |=====================================================================

  LuxCoreRender 2.6
  Scene: Danish Mood - Acceleration: GPU
  M samples/sec > Higher Is Better
  a . 5.02 |=====================================================================

  LuxCoreRender 2.6
  Scene: Orange Juice - Acceleration: GPU
  M samples/sec > Higher Is Better
  a . 6.51 |=====================================================================

  LuxCoreRender 2.6
  Scene: LuxCore Benchmark - Acceleration: GPU
  M samples/sec > Higher Is Better
  a . 6.12 |=====================================================================

  LuxCoreRender 2.6
  Scene: Rainbow Colors and Prism - Acceleration: GPU
  M samples/sec > Higher Is Better
  a . 18.87 |====================================================================

  ArrayFire 3.9
  Test: Conjugate Gradient OpenCL
  ms < Lower Is Better
  a . 2.300 |====================================================================

  FinanceBench 2016-07-25
  Benchmark: Black-Scholes OpenCL
  ms < Lower Is Better
  a . 10.63 |====================================================================

  ViennaCL 1.7.1
  Test: CPU BLAS - sCOPY
  GB/s > Higher Is Better
  a . 601 |======================================================================

  ViennaCL 1.7.1
  Test: CPU BLAS - sAXPY
  GB/s > Higher Is Better
  a . 792 |======================================================================

  ViennaCL 1.7.1
  Test: CPU BLAS - sDOT
  GB/s > Higher Is Better
  a . 261 |======================================================================

  ViennaCL 1.7.1
  Test: CPU BLAS - dCOPY
  GB/s > Higher Is Better
  a . 769 |======================================================================

  ViennaCL 1.7.1
  Test: CPU BLAS - dAXPY
  GB/s > Higher Is Better
  a . 1272 |=====================================================================

  ViennaCL 1.7.1
  Test: CPU BLAS - dDOT
  GB/s > Higher Is Better
  a . 558 |======================================================================

  ViennaCL 1.7.1
  Test: CPU BLAS - dGEMV-N
  GB/s > Higher Is Better
  a . 43.6 |=====================================================================

  ViennaCL 1.7.1
  Test: CPU BLAS - dGEMV-T
  GB/s > Higher Is Better
  a . 156.3 |====================================================================

  ViennaCL 1.7.1
  Test: CPU BLAS - dGEMM-NN
  GFLOPs/s > Higher Is Better
  a . 127 |======================================================================

  ViennaCL 1.7.1
  Test: CPU BLAS - dGEMM-NT
  GFLOPs/s > Higher Is Better
  a . 121 |======================================================================

  ViennaCL 1.7.1
  Test: CPU BLAS - dGEMM-TN
  GFLOPs/s > Higher Is Better
  a . 134 |======================================================================

  ViennaCL 1.7.1
  Test: CPU BLAS - dGEMM-TT
  GFLOPs/s > Higher Is Better
  a . 129 |======================================================================

  ViennaCL 1.7.1
  Test: OpenCL BLAS - sCOPY
  GB/s > Higher Is Better
  a . 259 |======================================================================

  ViennaCL 1.7.1
  Test: OpenCL BLAS - sAXPY
  GB/s > Higher Is Better
  a . 325 |======================================================================

  ViennaCL 1.7.1
  Test: OpenCL BLAS - sDOT
  GB/s > Higher Is Better
  a . 298 |======================================================================

  ViennaCL 1.7.1
  Test: OpenCL BLAS - dCOPY
  GB/s > Higher Is Better
  a . 332 |======================================================================

  ViennaCL 1.7.1
  Test: OpenCL BLAS - dAXPY
  GB/s > Higher Is Better
  a . 364 |======================================================================

  ViennaCL 1.7.1
  Test: OpenCL BLAS - dDOT
  GB/s > Higher Is Better
  a . 368 |======================================================================

  ViennaCL 1.7.1
  Test: OpenCL BLAS - dGEMV-N
  GB/s > Higher Is Better
  a . 163 |======================================================================

  ViennaCL 1.7.1
  Test: OpenCL BLAS - dGEMV-T
  GB/s > Higher Is Better
  a . 302 |======================================================================

  ViennaCL 1.7.1
  Test: OpenCL BLAS - dGEMM-NN
  GFLOPs/s > Higher Is Better
  a . 322 |======================================================================

  ViennaCL 1.7.1
  Test: OpenCL BLAS - dGEMM-NT
  GFLOPs/s > Higher Is Better
  a . 323 |======================================================================

  ViennaCL 1.7.1
  Test: OpenCL BLAS - dGEMM-TN
  GFLOPs/s > Higher Is Better
  a . 322 |======================================================================

  ViennaCL 1.7.1
  Test: OpenCL BLAS - dGEMM-TT
  GFLOPs/s > Higher Is Better
  a . 322 |======================================================================

  GROMACS 2023
  Implementation: NVIDIA CUDA GPU - Input: water_GMX50_bare
  Ns Per Day > Higher Is Better

  Caffe 2020-02-13
  Model: AlexNet - Acceleration: NVIDIA CUDA - Iterations: 100
  Milli-Seconds < Lower Is Better

  Caffe 2020-02-13
  Model: AlexNet - Acceleration: NVIDIA CUDA - Iterations: 200
  Milli-Seconds < Lower Is Better

  Caffe 2020-02-13
  Model: AlexNet - Acceleration: NVIDIA CUDA - Iterations: 1000
  Milli-Seconds < Lower Is Better

  Caffe 2020-02-13
  Model: GoogleNet - Acceleration: NVIDIA CUDA - Iterations: 100
  Milli-Seconds < Lower Is Better

  Caffe 2020-02-13
  Model: GoogleNet - Acceleration: NVIDIA CUDA - Iterations: 200
  Milli-Seconds < Lower Is Better

  Caffe 2020-02-13
  Model: GoogleNet - Acceleration: NVIDIA CUDA - Iterations: 1000
  Milli-Seconds < Lower Is Better

  NCNN 20230517
  Target: Vulkan GPU - Model: mobilenet
  ms < Lower Is Better
  a . 20.74 |====================================================================

  NCNN 20230517
  Target: Vulkan GPU - Model: mobilenet-v2
  ms < Lower Is Better
  a . 12.35 |====================================================================

  NCNN 20230517
  Target: Vulkan GPU - Model: mobilenet-v3
  ms < Lower Is Better
  a . 13.71 |====================================================================

  NCNN 20230517
  Target: Vulkan GPU - Model: shufflenet-v2
  ms < Lower Is Better
  a . 16.76 |====================================================================

  NCNN 20230517
  Target: Vulkan GPU - Model: mnasnet
  ms < Lower Is Better
  a . 11.73 |====================================================================

  NCNN 20230517
  Target: Vulkan GPU - Model: efficientnet-b0
  ms < Lower Is Better
  a . 16.93 |====================================================================

  NCNN 20230517
  Target: Vulkan GPU - Model: blazeface
  ms < Lower Is Better
  a . 7.46 |=====================================================================

  NCNN 20230517
  Target: Vulkan GPU - Model: googlenet
  ms < Lower Is Better
  a . 25.46 |====================================================================

  NCNN 20230517
  Target: Vulkan GPU - Model: vgg16
  ms < Lower Is Better
  a . 35.86 |====================================================================

  NCNN 20230517
  Target: Vulkan GPU - Model: resnet18
  ms < Lower Is Better
  a . 12.54 |====================================================================

  NCNN 20230517
  Target: Vulkan GPU - Model: alexnet
  ms < Lower Is Better
  a . 7.15 |=====================================================================

  NCNN 20230517
  Target: Vulkan GPU - Model: resnet50
  ms < Lower Is Better
  a . 23.60 |====================================================================

  NCNN 20230517
  Target: Vulkan GPU - Model: mobilenetv2-yolov3
  ms < Lower Is Better
  a . 20.74 |====================================================================

  NCNN 20230517
  Target: Vulkan GPU - Model: yolov4-tiny
  ms < Lower Is Better
  a . 34.43 |====================================================================

  NCNN 20230517
  Target: Vulkan GPU - Model: squeezenet_ssd
  ms < Lower Is Better
  a . 26.63 |====================================================================

  NCNN 20230517
  Target: Vulkan GPU - Model: regnety_400m
  ms < Lower Is Better
  a . 56.77 |====================================================================

  NCNN 20230517
  Target: Vulkan GPU - Model: vision_transformer
  ms < Lower Is Better
  a . 44.17 |====================================================================

  NCNN 20230517
  Target: Vulkan GPU - Model: FastestDet
  ms < Lower Is Better
  a . 18.23 |====================================================================

  PlaidML
  FP16: No - Mode: Training - Network: Mobilenet - Device: OpenCL
  Examples Per Second > Higher Is Better

  PlaidML
  FP16: No - Mode: Inference - Network: IMDB LSTM - Device: OpenCL
  Examples Per Second > Higher Is Better

  PlaidML
  FP16: No - Mode: Inference - Network: Mobilenet - Device: OpenCL
  Examples Per Second > Higher Is Better

  PlaidML
  FP16: Yes - Mode: Inference - Network: Mobilenet - Device: OpenCL
  Examples Per Second > Higher Is Better

  PlaidML
  FP16: No - Mode: Inference - Network: DenseNet 201 - Device: OpenCL
  Examples Per Second > Higher Is Better

  Blender 4.0
  Blend File: BMW27 - Compute: NVIDIA OptiX
  Seconds < Lower Is Better
  a . 11.32 |====================================================================

  Blender 4.0
  Blend File: Classroom - Compute: NVIDIA OptiX
  Seconds < Lower Is Better
  a . 28.80 |====================================================================

  Blender 4.0
  Blend File: Fishy Cat - Compute: NVIDIA OptiX
  Seconds < Lower Is Better
  a . 20.25 |====================================================================

  Blender 4.0
  Blend File: Barbershop - Compute: NVIDIA OptiX
  Seconds < Lower Is Better
  a . 100.99 |===================================================================

  Blender 4.0
  Blend File: Pabellon Barcelona - Compute: NVIDIA OptiX
  Seconds < Lower Is Better
  a . 32.20 |====================================================================

  IndigoBench 4.4
  Acceleration: OpenCL GPU - Scene: Bedroom
  M samples/s > Higher Is Better
  a . 11.20 |====================================================================

  IndigoBench 4.4
  Acceleration: OpenCL GPU - Scene: Supercar
  M samples/s > Higher Is Better
  a . 31.77 |====================================================================

  MandelGPU 1.3pts1
  OpenCL Device: GPU
  Samples/sec > Higher Is Better
  a . 346801977.5 |==============================================================

  NeatBench 5
  Acceleration: GPU
  FPS > Higher Is Better

  Chaos Group V-RAY 5.02
  Mode: NVIDIA RTX GPU
  vrays > Higher Is Better
  a . 1790 |=====================================================================

  Chaos Group V-RAY 5.02
  Mode: NVIDIA CUDA GPU
  vpaths > Higher Is Better
  a . 1182 |=====================================================================