ngc smoke run

ARMv8 Neoverse-V2 testing with a Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G MB (CG1) (3A06 BIOS) and NVIDIA GH200 480GB on Ubuntu 22.04 via the Phoronix Test Suite.

a, b, c, d (identical configuration for all four runs):

  Processor: ARMv8 Neoverse-V2 @ 3.39GHz (72 Cores), Motherboard: Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G MB (CG1) (3A06 BIOS), Memory: 1 x 480GB DRAM-6400MT/s, Disk: 960GB SAMSUNG MZ1L2960HCJR-00A07 + 1920GB SAMSUNG MZTL21T9, Graphics: NVIDIA GH200 480GB, Network: 2 x Mellanox MT2910 + 2 x QLogic FastLinQ QL41000 10/25/40/50GbE

  OS: Ubuntu 22.04, Kernel: 6.5.0-1007-NVIDIA-64k (aarch64), Display Driver: NVIDIA, OpenCL: OpenCL 3.0 CUDA 12.4.89, Compiler: GCC 11.4.0 + CUDA 11.5, File-System: ext4, Screen Resolution: 1920x1200

VkFFT 1.3.4
Test: FFT + iFFT R2C / C2R
Benchmark Score > Higher Is Better
a . 42397 |===================================================================
b . 41809 |==================================================================
c . 42581 |===================================================================
d . 43048 |====================================================================

VkFFT 1.3.4
Test: FFT + iFFT C2C 1D batched in half precision
Benchmark Score > Higher Is Better
a . 151912 |===================================================================
b . 151910 |===================================================================
c . 152866 |===================================================================
d . 151969 |===================================================================

VkFFT 1.3.4
Test: FFT + iFFT C2C Bluestein in single precision
Benchmark Score > Higher Is Better
a . 17867 |====================================================================
b . 17967 |====================================================================
c . 17886 |====================================================================
d . 17942 |====================================================================

VkFFT 1.3.4
Test: FFT + iFFT C2C 1D batched in double precision
Benchmark Score > Higher Is Better
a . 58405 |====================================================================
b . 58253 |====================================================================
c . 58256 |====================================================================
d . 58299 |====================================================================

VkFFT 1.3.4
Test: FFT + iFFT C2C 1D batched in single precision
Benchmark Score > Higher Is Better
a . 185774 |=================================================================
b . 186082 |==================================================================
c . 189944 |===================================================================
d . 190310 |===================================================================

VkFFT 1.3.4
Test: FFT + iFFT C2C multidimensional in single precision
Benchmark Score > Higher Is Better
a . 44489 |===================================================================
b . 43731 |==================================================================
c . 45071 |====================================================================
d . 45007 |====================================================================

VkFFT 1.3.4
Test: FFT + iFFT C2C Bluestein benchmark in double precision
Benchmark Score > Higher Is Better
a . 20810 |==================================================================
b . 21000 |===================================================================
c . 21094 |===================================================================
d . 21320 |====================================================================

VkFFT 1.3.4
Test: FFT + iFFT C2C 1D batched in single precision, no reshuffling
Benchmark Score > Higher Is Better
a . 194497 |===================================================================
b . 190037 |=================================================================
c . 190909 |==================================================================
d . 192507 |==================================================================

cl-mem 2017-01-13
Benchmark: Copy
GB/s > Higher Is Better
a . 308.6 |====================================================================
b . 308.5 |====================================================================
c . 308.6 |====================================================================
d . 308.5 |====================================================================

cl-mem 2017-01-13
Benchmark: Read
GB/s > Higher Is Better
a . 1045.9 |===================================================================
b . 1045.9 |===================================================================
c . 1046.1 |===================================================================
d . 1046.0 |===================================================================

cl-mem 2017-01-13
Benchmark: Write
GB/s > Higher Is Better
a . 2354.9 |===================================================================
b . 2353.4 |===================================================================
c . 2354.9 |===================================================================
d . 2352.1 |===================================================================

VkResample 1.0
Upscale: 2x - Precision: Double
ms < Lower Is Better
a . 24.30 |====================================================================
b . 24.29 |====================================================================
c . 24.29 |====================================================================
d . 24.30 |====================================================================

VkResample 1.0
Upscale: 2x - Precision: Single
ms < Lower Is Better
a . 5.230 |====================================================================
b . 5.231 |====================================================================
c . 5.230 |====================================================================
d . 5.230 |====================================================================

clpeak 1.1.2
OpenCL Test: Integer Compute INT
GIOPS > Higher Is Better
a . 33119.10 |=================================================================
b . 33144.74 |=================================================================
c . 33146.12 |=================================================================
d . 33129.34 |=================================================================

clpeak 1.1.2
OpenCL Test: Single-Precision Float
GFLOPS > Higher Is Better
a . 64545.62 |=================================================================
b . 64547.74 |=================================================================
c . 64547.25 |=================================================================
d . 64520.97 |=================================================================

clpeak 1.1.2
OpenCL Test: Double-Precision Double
GFLOPS > Higher Is Better
a . 32959.17 |=================================================================
b . 32961.21 |=================================================================
c . 32941.99 |=================================================================
d . 32933.63 |=================================================================

clpeak 1.1.2
OpenCL Test: Global Memory Bandwidth
GBPS > Higher Is Better
a . 3483.99 |==================================================================
b . 3484.06 |==================================================================
c . 3483.95 |==================================================================
d . 3484.32 |==================================================================

ArrayFire 3.9
Test: Conjugate Gradient OpenCL
ms < Lower Is Better
a . 2.997 |====================================================================
b . 2.983 |====================================================================
c . 2.998 |====================================================================
d . 2.997 |====================================================================

FinanceBench 2016-07-25
Benchmark: Black-Scholes OpenCL
ms < Lower Is Better
a . 4.347 |====================================================================
b . 4.373 |====================================================================
c . 4.351 |====================================================================
d . 4.339 |===================================================================

ViennaCL 1.7.1
Test: CPU BLAS - sCOPY
GB/s > Higher Is Better
a . 2920 |=====================================================================
b . 2892 |====================================================================
c . 2907 |=====================================================================
d . 2857 |====================================================================

ViennaCL 1.7.1
Test: CPU BLAS - sAXPY
GB/s > Higher Is Better
a . 3943 |=====================================================================
b . 3924 |=====================================================================
c . 3917 |=====================================================================
d . 3920 |=====================================================================

ViennaCL 1.7.1
Test: CPU BLAS - sDOT
GB/s > Higher Is Better
a . 667 |======================================================================
b . 664 |======================================================================
c . 666 |======================================================================
d . 663 |======================================================================

ViennaCL 1.7.1
Test: CPU BLAS - dCOPY
GB/s > Higher Is Better
a . 2027 |=====================================================================
b . 1948 |==================================================================
c . 1920 |=================================================================
d . 1917 |=================================================================

ViennaCL 1.7.1
Test: CPU BLAS - dAXPY
GB/s > Higher Is Better
a . 1803 |====================================================================
b . 1806 |====================================================================
c . 1837 |=====================================================================
d . 1830 |=====================================================================

ViennaCL 1.7.1
Test: CPU BLAS - dDOT
GB/s > Higher Is Better
a . 1247 |=====================================================================
b . 1238 |=====================================================================
c . 1243 |=====================================================================
d . 1247 |=====================================================================

ViennaCL 1.7.1
Test: CPU BLAS - dGEMV-N
GB/s > Higher Is Better
a . 411 |=====================================================================
b . 408 |====================================================================
c . 405 |====================================================================
d . 418 |======================================================================

ViennaCL 1.7.1
Test: CPU BLAS - dGEMV-T
GB/s > Higher Is Better
a . 686 |=====================================================================
b . 699 |======================================================================
c . 691 |=====================================================================
d . 696 |======================================================================

ViennaCL 1.7.1
Test: CPU BLAS - dGEMM-NN
GFLOPs/s > Higher Is Better
a . 135 |===================================================================
b . 137 |====================================================================
c . 139 |=====================================================================
d . 141 |======================================================================

ViennaCL 1.7.1
Test: CPU BLAS - dGEMM-NT
GFLOPs/s > Higher Is Better
a . 125 |======================================================================
b . 125 |======================================================================
c . 124 |=====================================================================
d . 124 |=====================================================================

ViennaCL 1.7.1
Test: CPU BLAS - dGEMM-TN
GFLOPs/s > Higher Is Better
a . 141 |======================================================================
b . 140 |======================================================================
c . 141 |======================================================================
d . 140 |======================================================================

ViennaCL 1.7.1
Test: CPU BLAS - dGEMM-TT
GFLOPs/s > Higher Is Better
a . 137 |=====================================================================
b . 138 |=====================================================================
c . 140 |======================================================================
d . 136 |====================================================================

ViennaCL 1.7.1
Test: OpenCL BLAS - sCOPY
GB/s > Higher Is Better
a . 316 |======================================================================
b . 316 |======================================================================
c . 316 |======================================================================
d . 316 |======================================================================

ViennaCL 1.7.1
Test: OpenCL BLAS - sAXPY
GB/s > Higher Is Better
a . 420 |=====================================================================
b . 427 |======================================================================
c . 427 |======================================================================
d . 426 |======================================================================

ViennaCL 1.7.1
Test: OpenCL BLAS - sDOT
GB/s > Higher Is Better
a . 282 |======================================================================
b . 282 |======================================================================
c . 283 |======================================================================
d . 283 |======================================================================

ViennaCL 1.7.1
Test: OpenCL BLAS - dCOPY
GB/s > Higher Is Better
a . 603 |======================================================================
b . 604 |======================================================================
c . 604 |======================================================================
d . 604 |======================================================================

ViennaCL 1.7.1
Test: OpenCL BLAS - dAXPY
GB/s > Higher Is Better
a . 799 |======================================================================
b . 798 |======================================================================
c . 799 |======================================================================
d . 799 |======================================================================

ViennaCL 1.7.1
Test: OpenCL BLAS - dDOT
GB/s > Higher Is Better
a . 550 |======================================================================
b . 552 |======================================================================
c . 552 |======================================================================
d . 553 |======================================================================

ViennaCL 1.7.1
Test: OpenCL BLAS - dGEMV-N
GB/s > Higher Is Better
a . 81.2 |=====================================================================
b . 81.5 |=====================================================================
c . 81.2 |=====================================================================
d . 81.4 |=====================================================================

ViennaCL 1.7.1
Test: OpenCL BLAS - dGEMV-T
GB/s > Higher Is Better
a . 308 |======================================================================
b . 308 |======================================================================
c . 308 |======================================================================
d . 307 |======================================================================

ViennaCL 1.7.1
Test: OpenCL BLAS - dGEMM-NN
GFLOPs/s > Higher Is Better
a . 7057 |=====================================================================
b . 7093 |=====================================================================
c . 7037 |====================================================================
d . 7053 |=====================================================================

ViennaCL 1.7.1
Test: OpenCL BLAS - dGEMM-NT
GFLOPs/s > Higher Is Better
a . 7527 |=====================================================================
b . 7537 |=====================================================================
c . 7537 |=====================================================================
d . 7540 |=====================================================================

ViennaCL 1.7.1
Test: OpenCL BLAS - dGEMM-TN
GFLOPs/s > Higher Is Better
a . 7027 |=====================================================================
b . 7067 |=====================================================================
c . 7000 |====================================================================
d . 7070 |=====================================================================

ViennaCL 1.7.1
Test: OpenCL BLAS - dGEMM-TT
GFLOPs/s > Higher Is Better
a . 7070 |=====================================================================
b . 7070 |=====================================================================
c . 7057 |=====================================================================
d . 7070 |=====================================================================

NCNN 20230517
Target: Vulkan GPU - Model: mobilenet
ms < Lower Is Better
a . 4.89 |=====================================================================
b . 4.92 |=====================================================================
c . 4.92 |=====================================================================
d . 4.91 |=====================================================================

NCNN 20230517
Target: Vulkan GPU - Model: mobilenet-v2
ms < Lower Is Better
a . 2.13 |====================================================================
b . 2.16 |=====================================================================
c . 2.12 |====================================================================
d . 2.12 |====================================================================

NCNN 20230517
Target: Vulkan GPU - Model: mobilenet-v3
ms < Lower Is Better
a . 2.26 |====================================================================
b . 2.27 |====================================================================
c . 2.30 |=====================================================================
d . 2.27 |====================================================================

NCNN 20230517
Target: Vulkan GPU - Model: shufflenet-v2
ms < Lower Is Better
a . 2.29 |=====================================================================
b . 2.29 |=====================================================================
c . 2.27 |====================================================================
d . 2.27 |====================================================================

NCNN 20230517
Target: Vulkan GPU - Model: mnasnet
ms < Lower Is Better
a . 2.04 |=====================================================================
b . 2.03 |=====================================================================
c . 2.04 |=====================================================================
d . 2.04 |=====================================================================

NCNN 20230517
Target: Vulkan GPU - Model: efficientnet-b0
ms < Lower Is Better
a . 3.49 |====================================================================
b . 3.55 |=====================================================================
c . 3.52 |====================================================================
d . 3.53 |=====================================================================

NCNN 20230517
Target: Vulkan GPU - Model: blazeface
ms < Lower Is Better
a . 1.75 |====================================================================
b . 1.78 |=====================================================================
c . 1.74 |===================================================================
d . 1.77 |=====================================================================

NCNN 20230517
Target: Vulkan GPU - Model: googlenet
ms < Lower Is Better
a . 4.16 |====================================================================
b . 4.23 |=====================================================================
c . 4.23 |=====================================================================
d . 4.21 |=====================================================================

NCNN 20230517
Target: Vulkan GPU - Model: vgg16
ms < Lower Is Better
a . 5.26 |=====================================================================
b . 5.25 |=====================================================================
c . 5.25 |=====================================================================
d . 5.26 |=====================================================================

NCNN 20230517
Target: Vulkan GPU - Model: resnet18
ms < Lower Is Better
a . 2.16 |====================================================================
b . 2.18 |====================================================================
c . 2.17 |====================================================================
d . 2.20 |=====================================================================

NCNN 20230517
Target: Vulkan GPU - Model: alexnet
ms < Lower Is Better
a . 1.63 |====================================================================
b . 1.63 |====================================================================
c . 1.65 |=====================================================================
d . 1.62 |====================================================================

NCNN 20230517
Target: Vulkan GPU - Model: resnet50
ms < Lower Is Better
a . 4.27 |====================================================================
b . 4.28 |====================================================================
c . 4.32 |=====================================================================
d . 4.32 |=====================================================================

NCNN 20230517
Target: Vulkan GPU - Model: mobilenetv2-yolov3
ms < Lower Is Better
a . 4.89 |=====================================================================
b . 4.92 |=====================================================================
c . 4.92 |=====================================================================
d . 4.91 |=====================================================================

NCNN 20230517
Target: Vulkan GPU - Model: yolov4-tiny
ms < Lower Is Better
a . 6.79 |=====================================================================
b . 6.80 |=====================================================================
c . 6.81 |=====================================================================
d . 6.82 |=====================================================================

NCNN 20230517
Target: Vulkan GPU - Model: squeezenet_ssd
ms < Lower Is Better
a . 5.43 |====================================================================
b . 5.47 |=====================================================================
c . 5.48 |=====================================================================
d . 5.43 |====================================================================

NCNN 20230517
Target: Vulkan GPU - Model: regnety_400m
ms < Lower Is Better
a . 14.78 |==================================================================
b . 14.74 |==================================================================
c . 14.77 |==================================================================
d . 15.22 |====================================================================

NCNN 20230517
Target: Vulkan GPU - Model: vision_transformer
ms < Lower Is Better
a . 31.52 |==================================================================
b . 32.32 |====================================================================
c . 31.92 |===================================================================
d . 31.13 |=================================================================

NCNN 20230517
Target: Vulkan GPU - Model: FastestDet
ms < Lower Is Better
a . 3.09 |====================================================================
b . 3.10 |=====================================================================
c . 3.08 |====================================================================
d . 3.12 |=====================================================================
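
Runs a to d were made on the same configuration, so the differences between them indicate run-to-run variation. The following is a minimal sketch, not part of the Phoronix Test Suite output, of one way to quantify that variation for a few of the results above. The `results` dictionary and the `spread_percent` helper are hypothetical names introduced for illustration; the values are copied from this file.

```python
from statistics import mean

# Values copied from the results above, in run order a, b, c, d.
# The selection of tests is arbitrary and only for illustration.
results = {
    "VkFFT R2C / C2R (score)": [42397, 41809, 42581, 43048],
    "cl-mem Copy (GB/s)": [308.6, 308.5, 308.6, 308.5],
    "ViennaCL CPU BLAS dCOPY (GB/s)": [2027, 1948, 1920, 1917],
    "NCNN vision_transformer (ms)": [31.52, 32.32, 31.92, 31.13],
}


def spread_percent(values):
    """Hypothetical helper: (max - min) relative to the mean, in percent."""
    return 100.0 * (max(values) - min(values)) / mean(values)


for name, values in results.items():
    print(f"{name}: mean={mean(values):.2f}, spread={spread_percent(values):.2f}%")
```

A spread well below one percent, as for cl-mem Copy, suggests a stable measurement. A spread of several percent, as for the CPU dCOPY result, suggests that small differences on that test should not be over-interpreted.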