ARMv8 Neoverse-V2 testing with a Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G MB (CG1) (3A06 BIOS) and NVIDIA GH200 480GB on Ubuntu 22.04 via the Phoronix Test Suite.
Compare your own system(s) to this result file with the Phoronix Test Suite by running the command:
phoronix-test-suite benchmark 2403013-NE-NGCSMOKER54
ngc smoke run
a:
Processor: ARMv8 Neoverse-V2 @ 3.39GHz (72 Cores), Motherboard: Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G MB (CG1) (3A06 BIOS), Memory: 1 x 480GB DRAM-6400MT/s, Disk: 960GB SAMSUNG MZ1L2960HCJR-00A07 + 1920GB SAMSUNG MZTL21T9, Graphics: NVIDIA GH200 480GB, Network: 2 x Mellanox MT2910 + 2 x QLogic FastLinQ QL41000 10/25/40/50GbE
OS: Ubuntu 22.04, Kernel: 6.5.0-1007-NVIDIA-64k (aarch64), Display Driver: NVIDIA, OpenCL: OpenCL 3.0 CUDA 12.4.89, Compiler: GCC 11.4.0 + CUDA 11.5, File-System: ext4, Screen Resolution: 1920x1200
b:
Processor: ARMv8 Neoverse-V2 @ 3.39GHz (72 Cores), Motherboard: Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G MB (CG1) (3A06 BIOS), Memory: 1 x 480GB DRAM-6400MT/s, Disk: 960GB SAMSUNG MZ1L2960HCJR-00A07 + 1920GB SAMSUNG MZTL21T9, Graphics: NVIDIA GH200 480GB, Network: 2 x Mellanox MT2910 + 2 x QLogic FastLinQ QL41000 10/25/40/50GbE
OS: Ubuntu 22.04, Kernel: 6.5.0-1007-NVIDIA-64k (aarch64), Display Driver: NVIDIA, OpenCL: OpenCL 3.0 CUDA 12.4.89, Compiler: GCC 11.4.0 + CUDA 11.5, File-System: ext4, Screen Resolution: 1920x1200
c:
Processor: ARMv8 Neoverse-V2 @ 3.39GHz (72 Cores), Motherboard: Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G MB (CG1) (3A06 BIOS), Memory: 1 x 480GB DRAM-6400MT/s, Disk: 960GB SAMSUNG MZ1L2960HCJR-00A07 + 1920GB SAMSUNG MZTL21T9, Graphics: NVIDIA GH200 480GB, Network: 2 x Mellanox MT2910 + 2 x QLogic FastLinQ QL41000 10/25/40/50GbE
OS: Ubuntu 22.04, Kernel: 6.5.0-1007-NVIDIA-64k (aarch64), Display Driver: NVIDIA, OpenCL: OpenCL 3.0 CUDA 12.4.89, Compiler: GCC 11.4.0 + CUDA 11.5, File-System: ext4, Screen Resolution: 1920x1200
d:
Processor: ARMv8 Neoverse-V2 @ 3.39GHz (72 Cores), Motherboard: Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G MB (CG1) (3A06 BIOS), Memory: 1 x 480GB DRAM-6400MT/s, Disk: 960GB SAMSUNG MZ1L2960HCJR-00A07 + 1920GB SAMSUNG MZTL21T9, Graphics: NVIDIA GH200 480GB, Network: 2 x Mellanox MT2910 + 2 x QLogic FastLinQ QL41000 10/25/40/50GbE
OS: Ubuntu 22.04, Kernel: 6.5.0-1007-NVIDIA-64k (aarch64), Display Driver: NVIDIA, OpenCL: OpenCL 3.0 CUDA 12.4.89, Compiler: GCC 11.4.0 + CUDA 11.5, File-System: ext4, Screen Resolution: 1920x1200
VkFFT 1.3.4
Test: FFT + iFFT R2C / C2R
Benchmark Score > Higher Is Better
a . 42397 |===================================================================
b . 41809 |==================================================================
c . 42581 |===================================================================
d . 43048 |====================================================================
VkFFT 1.3.4
Test: FFT + iFFT C2C 1D batched in half precision
Benchmark Score > Higher Is Better
a . 151912 |===================================================================
b . 151910 |===================================================================
c . 152866 |===================================================================
d . 151969 |===================================================================
VkFFT 1.3.4
Test: FFT + iFFT C2C Bluestein in single precision
Benchmark Score > Higher Is Better
a . 17867 |====================================================================
b . 17967 |====================================================================
c . 17886 |====================================================================
d . 17942 |====================================================================
VkFFT 1.3.4
Test: FFT + iFFT C2C 1D batched in double precision
Benchmark Score > Higher Is Better
a . 58405 |====================================================================
b . 58253 |====================================================================
c . 58256 |====================================================================
d . 58299 |====================================================================
VkFFT 1.3.4
Test: FFT + iFFT C2C 1D batched in single precision
Benchmark Score > Higher Is Better
a . 185774 |=================================================================
b . 186082 |==================================================================
c . 189944 |===================================================================
d . 190310 |===================================================================
VkFFT 1.3.4
Test: FFT + iFFT C2C multidimensional in single precision
Benchmark Score > Higher Is Better
a . 44489 |===================================================================
b . 43731 |==================================================================
c . 45071 |====================================================================
d . 45007 |====================================================================
VkFFT 1.3.4
Test: FFT + iFFT C2C Bluestein benchmark in double precision
Benchmark Score > Higher Is Better
a . 20810 |==================================================================
b . 21000 |===================================================================
c . 21094 |===================================================================
d . 21320 |====================================================================
VkFFT 1.3.4
Test: FFT + iFFT C2C 1D batched in single precision, no reshuffling
Benchmark Score > Higher Is Better
a . 194497 |===================================================================
b . 190037 |=================================================================
c . 190909 |==================================================================
d . 192507 |==================================================================
cl-mem 2017-01-13
Benchmark: Copy
GB/s > Higher Is Better
a . 308.6 |====================================================================
b . 308.5 |====================================================================
c . 308.6 |====================================================================
d . 308.5 |====================================================================
cl-mem 2017-01-13
Benchmark: Read
GB/s > Higher Is Better
a . 1045.9 |===================================================================
b . 1045.9 |===================================================================
c . 1046.1 |===================================================================
d . 1046.0 |===================================================================
cl-mem 2017-01-13
Benchmark: Write
GB/s > Higher Is Better
a . 2354.9 |===================================================================
b . 2353.4 |===================================================================
c . 2354.9 |===================================================================
d . 2352.1 |===================================================================
VkResample 1.0
Upscale: 2x - Precision: Double
ms < Lower Is Better
a . 24.30 |====================================================================
b . 24.29 |====================================================================
c . 24.29 |====================================================================
d . 24.30 |====================================================================
VkResample 1.0
Upscale: 2x - Precision: Single
ms < Lower Is Better
a . 5.230 |====================================================================
b . 5.231 |====================================================================
c . 5.230 |====================================================================
d . 5.230 |====================================================================
clpeak 1.1.2
OpenCL Test: Integer Compute INT
GIOPS > Higher Is Better
a . 33119.10 |=================================================================
b . 33144.74 |=================================================================
c . 33146.12 |=================================================================
d . 33129.34 |=================================================================
clpeak 1.1.2
OpenCL Test: Single-Precision Float
GFLOPS > Higher Is Better
a . 64545.62 |=================================================================
b . 64547.74 |=================================================================
c . 64547.25 |=================================================================
d . 64520.97 |=================================================================
clpeak 1.1.2
OpenCL Test: Double-Precision Double
GFLOPS > Higher Is Better
a . 32959.17 |=================================================================
b . 32961.21 |=================================================================
c . 32941.99 |=================================================================
d . 32933.63 |=================================================================
clpeak 1.1.2
OpenCL Test: Global Memory Bandwidth
GBPS > Higher Is Better
a . 3483.99 |==================================================================
b . 3484.06 |==================================================================
c . 3483.95 |==================================================================
d . 3484.32 |==================================================================
ArrayFire 3.9
Test: Conjugate Gradient OpenCL
ms < Lower Is Better
a . 2.997 |====================================================================
b . 2.983 |====================================================================
c . 2.998 |====================================================================
d . 2.997 |====================================================================
FinanceBench 2016-07-25
Benchmark: Black-Scholes OpenCL
ms < Lower Is Better
a . 4.347 |====================================================================
b . 4.373 |====================================================================
c . 4.351 |====================================================================
d . 4.339 |===================================================================
ViennaCL 1.7.1
Test: CPU BLAS - sCOPY
GB/s > Higher Is Better
a . 2920 |=====================================================================
b . 2892 |====================================================================
c . 2907 |=====================================================================
d . 2857 |====================================================================
ViennaCL 1.7.1
Test: CPU BLAS - sAXPY
GB/s > Higher Is Better
a . 3943 |=====================================================================
b . 3924 |=====================================================================
c . 3917 |=====================================================================
d . 3920 |=====================================================================
ViennaCL 1.7.1
Test: CPU BLAS - sDOT
GB/s > Higher Is Better
a . 667 |======================================================================
b . 664 |======================================================================
c . 666 |======================================================================
d . 663 |======================================================================
ViennaCL 1.7.1
Test: CPU BLAS - dCOPY
GB/s > Higher Is Better
a . 2027 |=====================================================================
b . 1948 |==================================================================
c . 1920 |=================================================================
d . 1917 |=================================================================
ViennaCL 1.7.1
Test: CPU BLAS - dAXPY
GB/s > Higher Is Better
a . 1803 |====================================================================
b . 1806 |====================================================================
c . 1837 |=====================================================================
d . 1830 |=====================================================================
ViennaCL 1.7.1
Test: CPU BLAS - dDOT
GB/s > Higher Is Better
a . 1247 |=====================================================================
b . 1238 |=====================================================================
c . 1243 |=====================================================================
d . 1247 |=====================================================================
ViennaCL 1.7.1
Test: CPU BLAS - dGEMV-N
GB/s > Higher Is Better
a . 411 |=====================================================================
b . 408 |====================================================================
c . 405 |====================================================================
d . 418 |======================================================================
ViennaCL 1.7.1
Test: CPU BLAS - dGEMV-T
GB/s > Higher Is Better
a . 686 |=====================================================================
b . 699 |======================================================================
c . 691 |=====================================================================
d . 696 |======================================================================
ViennaCL 1.7.1
Test: CPU BLAS - dGEMM-NN
GFLOPs/s > Higher Is Better
a . 135 |===================================================================
b . 137 |====================================================================
c . 139 |=====================================================================
d . 141 |======================================================================
ViennaCL 1.7.1
Test: CPU BLAS - dGEMM-NT
GFLOPs/s > Higher Is Better
a . 125 |======================================================================
b . 125 |======================================================================
c . 124 |=====================================================================
d . 124 |=====================================================================
ViennaCL 1.7.1
Test: CPU BLAS - dGEMM-TN
GFLOPs/s > Higher Is Better
a . 141 |======================================================================
b . 140 |======================================================================
c . 141 |======================================================================
d . 140 |======================================================================
ViennaCL 1.7.1
Test: CPU BLAS - dGEMM-TT
GFLOPs/s > Higher Is Better
a . 137 |=====================================================================
b . 138 |=====================================================================
c . 140 |======================================================================
d . 136 |====================================================================
ViennaCL 1.7.1
Test: OpenCL BLAS - sCOPY
GB/s > Higher Is Better
a . 316 |======================================================================
b . 316 |======================================================================
c . 316 |======================================================================
d . 316 |======================================================================
ViennaCL 1.7.1
Test: OpenCL BLAS - sAXPY
GB/s > Higher Is Better
a . 420 |=====================================================================
b . 427 |======================================================================
c . 427 |======================================================================
d . 426 |======================================================================
ViennaCL 1.7.1
Test: OpenCL BLAS - sDOT
GB/s > Higher Is Better
a . 282 |======================================================================
b . 282 |======================================================================
c . 283 |======================================================================
d . 283 |======================================================================
ViennaCL 1.7.1
Test: OpenCL BLAS - dCOPY
GB/s > Higher Is Better
a . 603 |======================================================================
b . 604 |======================================================================
c . 604 |======================================================================
d . 604 |======================================================================
ViennaCL 1.7.1
Test: OpenCL BLAS - dAXPY
GB/s > Higher Is Better
a . 799 |======================================================================
b . 798 |======================================================================
c . 799 |======================================================================
d . 799 |======================================================================
ViennaCL 1.7.1
Test: OpenCL BLAS - dDOT
GB/s > Higher Is Better
a . 550 |======================================================================
b . 552 |======================================================================
c . 552 |======================================================================
d . 553 |======================================================================
ViennaCL 1.7.1
Test: OpenCL BLAS - dGEMV-N
GB/s > Higher Is Better
a . 81.2 |=====================================================================
b . 81.5 |=====================================================================
c . 81.2 |=====================================================================
d . 81.4 |=====================================================================
ViennaCL 1.7.1
Test: OpenCL BLAS - dGEMV-T
GB/s > Higher Is Better
a . 308 |======================================================================
b . 308 |======================================================================
c . 308 |======================================================================
d . 307 |======================================================================
ViennaCL 1.7.1
Test: OpenCL BLAS - dGEMM-NN
GFLOPs/s > Higher Is Better
a . 7057 |=====================================================================
b . 7093 |=====================================================================
c . 7037 |====================================================================
d . 7053 |=====================================================================
ViennaCL 1.7.1
Test: OpenCL BLAS - dGEMM-NT
GFLOPs/s > Higher Is Better
a . 7527 |=====================================================================
b . 7537 |=====================================================================
c . 7537 |=====================================================================
d . 7540 |=====================================================================
ViennaCL 1.7.1
Test: OpenCL BLAS - dGEMM-TN
GFLOPs/s > Higher Is Better
a . 7027 |=====================================================================
b . 7067 |=====================================================================
c . 7000 |====================================================================
d . 7070 |=====================================================================
ViennaCL 1.7.1
Test: OpenCL BLAS - dGEMM-TT
GFLOPs/s > Higher Is Better
a . 7070 |=====================================================================
b . 7070 |=====================================================================
c . 7057 |=====================================================================
d . 7070 |=====================================================================
NCNN 20230517
Target: Vulkan GPU - Model: mobilenet
ms < Lower Is Better
a . 4.89 |=====================================================================
b . 4.92 |=====================================================================
c . 4.92 |=====================================================================
d . 4.91 |=====================================================================
NCNN 20230517
Target: Vulkan GPU - Model: mobilenet-v2
ms < Lower Is Better
a . 2.13 |====================================================================
b . 2.16 |=====================================================================
c . 2.12 |====================================================================
d . 2.12 |====================================================================
NCNN 20230517
Target: Vulkan GPU - Model: mobilenet-v3
ms < Lower Is Better
a . 2.26 |====================================================================
b . 2.27 |====================================================================
c . 2.30 |=====================================================================
d . 2.27 |====================================================================
NCNN 20230517
Target: Vulkan GPU - Model: shufflenet-v2
ms < Lower Is Better
a . 2.29 |=====================================================================
b . 2.29 |=====================================================================
c . 2.27 |====================================================================
d . 2.27 |====================================================================
NCNN 20230517
Target: Vulkan GPU - Model: mnasnet
ms < Lower Is Better
a . 2.04 |=====================================================================
b . 2.03 |=====================================================================
c . 2.04 |=====================================================================
d . 2.04 |=====================================================================
NCNN 20230517
Target: Vulkan GPU - Model: efficientnet-b0
ms < Lower Is Better
a . 3.49 |====================================================================
b . 3.55 |=====================================================================
c . 3.52 |====================================================================
d . 3.53 |=====================================================================
NCNN 20230517
Target: Vulkan GPU - Model: blazeface
ms < Lower Is Better
a . 1.75 |====================================================================
b . 1.78 |=====================================================================
c . 1.74 |===================================================================
d . 1.77 |=====================================================================
NCNN 20230517
Target: Vulkan GPU - Model: googlenet
ms < Lower Is Better
a . 4.16 |====================================================================
b . 4.23 |=====================================================================
c . 4.23 |=====================================================================
d . 4.21 |=====================================================================
NCNN 20230517
Target: Vulkan GPU - Model: vgg16
ms < Lower Is Better
a . 5.26 |=====================================================================
b . 5.25 |=====================================================================
c . 5.25 |=====================================================================
d . 5.26 |=====================================================================
NCNN 20230517
Target: Vulkan GPU - Model: resnet18
ms < Lower Is Better
a . 2.16 |====================================================================
b . 2.18 |====================================================================
c . 2.17 |====================================================================
d . 2.20 |=====================================================================
NCNN 20230517
Target: Vulkan GPU - Model: alexnet
ms < Lower Is Better
a . 1.63 |====================================================================
b . 1.63 |====================================================================
c . 1.65 |=====================================================================
d . 1.62 |====================================================================
NCNN 20230517
Target: Vulkan GPU - Model: resnet50
ms < Lower Is Better
a . 4.27 |====================================================================
b . 4.28 |====================================================================
c . 4.32 |=====================================================================
d . 4.32 |=====================================================================
NCNN 20230517
Target: Vulkan GPU - Model: mobilenetv2-yolov3
ms < Lower Is Better
a . 4.89 |=====================================================================
b . 4.92 |=====================================================================
c . 4.92 |=====================================================================
d . 4.91 |=====================================================================
NCNN 20230517
Target: Vulkan GPU - Model: yolov4-tiny
ms < Lower Is Better
a . 6.79 |=====================================================================
b . 6.80 |=====================================================================
c . 6.81 |=====================================================================
d . 6.82 |=====================================================================
NCNN 20230517
Target: Vulkan GPU - Model: squeezenet_ssd
ms < Lower Is Better
a . 5.43 |====================================================================
b . 5.47 |=====================================================================
c . 5.48 |=====================================================================
d . 5.43 |====================================================================
NCNN 20230517
Target: Vulkan GPU - Model: regnety_400m
ms < Lower Is Better
a . 14.78 |==================================================================
b . 14.74 |==================================================================
c . 14.77 |==================================================================
d . 15.22 |====================================================================
NCNN 20230517
Target: Vulkan GPU - Model: vision_transformer
ms < Lower Is Better
a . 31.52 |==================================================================
b . 32.32 |====================================================================
c . 31.92 |===================================================================
d . 31.13 |=================================================================
NCNN 20230517
Target: Vulkan GPU - Model: FastestDet
ms < Lower Is Better
a . 3.09 |====================================================================
b . 3.10 |=====================================================================
c . 3.08 |====================================================================
d . 3.12 |=====================================================================
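
The four runs (a through d) can also be compared numerically rather than by bar length. The following is a minimal Python sketch, not part of the Phoronix Test Suite, that assumes the plain-text layout used in this file: a benchmark line, a test-label line, a unit/direction line, then one "label . value |====" row per run. The helper names parse_results and spread_percent are hypothetical, and the sample embeds only the first VkFFT block with its bars shortened.

```python
import re
from statistics import mean

# Matches result rows such as "a . 42397 |=====": run label, value, bar.
ROW = re.compile(r"^([a-z]) \. ([0-9.]+) \|=*$")

def parse_results(text):
    """Return {"benchmark | test label": {run: value}} for each result block.

    Assumes every block has three header lines (benchmark, test label,
    unit/direction) followed by its result rows, as in this file.
    """
    results = {}
    header = []   # non-row lines seen since the last result row
    key = None
    for raw in text.splitlines():
        line = raw.strip()
        m = ROW.match(line)
        if m:
            if header:
                # Of the last three header lines, keep benchmark + test label.
                key = " | ".join(header[-3:-1])
                header = []
            results.setdefault(key, {})[m.group(1)] = float(m.group(2))
        elif line:
            header.append(line)
    return results

def spread_percent(runs):
    """Run-to-run spread: (max - min) as a percentage of the mean."""
    values = list(runs.values())
    return 100.0 * (max(values) - min(values)) / mean(values)

# First VkFFT block from this result file (bars shortened for brevity).
SAMPLE = """\
VkFFT 1.3.4
Test: FFT + iFFT R2C / C2R
Benchmark Score > Higher Is Better
a . 42397 |====
b . 41809 |====
c . 42581 |====
d . 43048 |====
"""

if __name__ == "__main__":
    for name, runs in parse_results(SAMPLE).items():
        print(f"{name}: mean {mean(runs.values()):.1f}, "
              f"spread {spread_percent(runs):.2f}%")
```

Feeding the whole file to parse_results instead of SAMPLE would give one entry per test, provided the system-description lines before the first result block are stripped first or their three-line header assumption is otherwise handled.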