opencl benchmark smoke test

ARMv8 Neoverse-V2 testing with a Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G MB (CG1) (3A06 BIOS) and NVIDIA GH200 480GB on Ubuntu 22.04 via the Phoronix Test Suite.

a:

  Processor: ARMv8 Neoverse-V2 @ 3.39GHz (72 Cores), Motherboard: Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G MB (CG1) (3A06 BIOS), Memory: 1 x 480GB DRAM-6400MT/s, Disk: 960GB SAMSUNG MZ1L2960HCJR-00A07 + 1920GB SAMSUNG MZTL21T9, Graphics: NVIDIA GH200 480GB, Network: 2 x Mellanox MT2910 + 2 x QLogic FastLinQ QL41000 10/25/40/50GbE

  OS: Ubuntu 22.04, Kernel: 6.5.0-1007-NVIDIA-64k (aarch64), Display Driver: NVIDIA, OpenCL: OpenCL 3.0 CUDA 12.4.89, Vulkan: 1.3.277, Compiler: GCC 11.4.0 + CUDA 11.5, File-System: ext4, Screen Resolution: 1920x1200

b:

  Processor: ARMv8 Neoverse-V2 @ 3.39GHz (72 Cores), Motherboard: Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G MB (CG1) (3A06 BIOS), Memory: 1 x 480GB DRAM-6400MT/s, Disk: 960GB SAMSUNG MZ1L2960HCJR-00A07 + 1920GB SAMSUNG MZTL21T9, Graphics: NVIDIA GH200 480GB, Network: 2 x Mellanox MT2910 + 2 x QLogic FastLinQ QL41000 10/25/40/50GbE

  OS: Ubuntu 22.04, Kernel: 6.5.0-1007-NVIDIA-64k (aarch64), Display Driver: NVIDIA, OpenCL: OpenCL 3.0 CUDA 12.4.89, Vulkan: 1.3.277, Compiler: GCC 11.4.0 + CUDA 11.5, File-System: ext4, Screen Resolution: 1920x1200

c:

  Processor: ARMv8 Neoverse-V2 @ 3.39GHz (72 Cores), Motherboard: Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G MB (CG1) (3A06 BIOS), Memory: 1 x 480GB DRAM-6400MT/s, Disk: 960GB SAMSUNG MZ1L2960HCJR-00A07 + 1920GB SAMSUNG MZTL21T9, Graphics: NVIDIA GH200 480GB, Network: 2 x Mellanox MT2910 + 2 x QLogic FastLinQ QL41000 10/25/40/50GbE

  OS: Ubuntu 22.04, Kernel: 6.5.0-1007-NVIDIA-64k (aarch64), Display Driver: NVIDIA, OpenCL: OpenCL 3.0 CUDA 12.4.89, Vulkan: 1.3.277, Compiler: GCC 11.4.0 + CUDA 11.5, File-System: ext4, Screen Resolution: 1920x1200

d:

  Processor: ARMv8 Neoverse-V2 @ 3.39GHz (72 Cores), Motherboard: Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G MB (CG1) (3A06 BIOS), Memory: 1 x 480GB DRAM-6400MT/s, Disk: 960GB SAMSUNG MZ1L2960HCJR-00A07 + 1920GB SAMSUNG MZTL21T9, Graphics: NVIDIA GH200 480GB, Network: 2 x Mellanox MT2910 + 2 x QLogic FastLinQ QL41000 10/25/40/50GbE

  OS: Ubuntu 22.04, Kernel: 6.5.0-1007-NVIDIA-64k (aarch64), Display Driver: NVIDIA, OpenCL: OpenCL 3.0 CUDA 12.4.89, Vulkan: 1.3.277, Compiler: GCC 11.4.0 + CUDA 11.5, File-System: ext4, Screen Resolution: 1920x1200

e:

  Processor: ARMv8 Neoverse-V2 @ 3.39GHz (72 Cores), Motherboard: Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G MB (CG1) (3A06 BIOS), Memory: 1 x 480GB DRAM-6400MT/s, Disk: 960GB SAMSUNG MZ1L2960HCJR-00A07 + 1920GB SAMSUNG MZTL21T9, Graphics: NVIDIA GH200 480GB, Network: 2 x Mellanox MT2910 + 2 x QLogic FastLinQ QL41000 10/25/40/50GbE

  OS: Ubuntu 22.04, Kernel: 6.5.0-1007-NVIDIA-64k (aarch64), Display Driver: NVIDIA, OpenCL: OpenCL 3.0 CUDA 12.4.89, Vulkan: 1.3.277, Compiler: GCC 11.4.0 + CUDA 11.5, File-System: ext4, Screen Resolution: 1920x1200

f:

  Processor: ARMv8 Neoverse-V2 @ 3.39GHz (72 Cores), Motherboard: Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G MB (CG1) (3A06 BIOS), Memory: 1 x 480GB DRAM-6400MT/s, Disk: 960GB SAMSUNG MZ1L2960HCJR-00A07 + 1920GB SAMSUNG MZTL21T9, Graphics: NVIDIA GH200 480GB, Network: 2 x Mellanox MT2910 + 2 x QLogic FastLinQ QL41000 10/25/40/50GbE

  OS: Ubuntu 22.04, Kernel: 6.5.0-1007-NVIDIA-64k (aarch64), Display Driver: NVIDIA, OpenCL: OpenCL 3.0 CUDA 12.4.89, Vulkan: 1.3.277, Compiler: GCC 11.4.0 + CUDA 11.5, File-System: ext4, Screen Resolution: 1920x1200

ProjectPhysX OpenCL-Benchmark 1.2
Operation: FP64 Compute
TFLOPs/s > Higher Is Better
a . 31.02 |====================================================================
b . 31.00 |====================================================================
c . 30.99 |====================================================================
d . 31.00 |====================================================================
e . 31.00 |====================================================================
f . 31.02 |====================================================================

ProjectPhysX OpenCL-Benchmark 1.2
Operation: FP32 Compute
TFLOPs/s > Higher Is Better
a . 62.97 |====================================================================
b . 62.96 |====================================================================
c . 62.98 |====================================================================
d . 62.95 |====================================================================
e . 62.99 |====================================================================
f . 62.95 |====================================================================

ProjectPhysX OpenCL-Benchmark 1.2
Operation: INT64 Compute
TIOPs/s > Higher Is Better
a . 3.248 |====================================================================
b . 3.248 |====================================================================
c . 3.248 |====================================================================
d . 3.245 |====================================================================
e . 3.247 |====================================================================
f . 3.247 |====================================================================

ProjectPhysX OpenCL-Benchmark 1.2
Operation: INT32 Compute
TIOPs/s > Higher Is Better
a . 32.99 |====================================================================
b . 32.99 |====================================================================
c . 32.99 |====================================================================
d . 33.00 |====================================================================
e . 32.98 |====================================================================
f . 32.99 |====================================================================

ProjectPhysX OpenCL-Benchmark 1.2
Operation: INT16 Compute
TIOPs/s > Higher Is Better
a . 30.94 |====================================================================
b . 30.93 |====================================================================
c . 30.93 |====================================================================
d . 30.93 |====================================================================
e . 30.95 |====================================================================
f . 30.93 |====================================================================

ProjectPhysX OpenCL-Benchmark 1.2
Operation: INT8 Compute
TIOPs/s > Higher Is Better
a . 30.62 |====================================================================
b . 30.62 |====================================================================
c . 30.63 |====================================================================
d . 30.62 |====================================================================
e . 30.63 |====================================================================
f . 30.63 |====================================================================

ProjectPhysX OpenCL-Benchmark 1.2
Operation: Memory Bandwidth Coalesced Read
GB/s > Higher Is Better
a . 3614.32 |==================================================================
b . 3616.79 |==================================================================
c . 3617.16 |==================================================================
d . 3616.23 |==================================================================
e . 3615.04 |==================================================================
f . 3612.22 |==================================================================

ProjectPhysX OpenCL-Benchmark 1.2
Operation: Memory Bandwidth Coalesced Write
GB/s > Higher Is Better
a . 3688.04 |=================================================================
b . 3733.62 |==================================================================
c . 3707.63 |==================================================================
d . 3684.02 |=================================================================
e . 3690.92 |=================================================================
f . 3731.96 |==================================================================

cl-mem 2017-01-13
Benchmark: Copy
GB/s > Higher Is Better
a . 308.4 |====================================================================
b . 308.5 |====================================================================
c . 308.6 |====================================================================
d . 308.5 |====================================================================
e . 308.5 |====================================================================
f . 308.5 |====================================================================

cl-mem 2017-01-13
Benchmark: Read
GB/s > Higher Is Better
a . 1045.3 |===================================================================
b . 1046.0 |===================================================================
c . 1045.9 |===================================================================
d . 1046.3 |===================================================================
e . 1046.1 |===================================================================
f . 1046.2 |===================================================================

cl-mem 2017-01-13
Benchmark: Write
GB/s > Higher Is Better
a . 2359.3 |===================================================================
b . 2362.6 |===================================================================
c . 2358.1 |===================================================================
d . 2361.9 |===================================================================
e . 2360.2 |===================================================================
f . 2361.6 |===================================================================

clpeak 1.1.2
OpenCL Test: Kernel Latency
us < Lower Is Better
a . 4.79 |=====================================================================
b . 4.78 |=====================================================================
c . 4.77 |=====================================================================
d . 4.79 |=====================================================================
e . 4.80 |=====================================================================
f . 4.78 |=====================================================================

clpeak 1.1.2
OpenCL Test: Integer Compute
GIOPS > Higher Is Better
a . 33143.19 |=================================================================
b . 33132.95 |=================================================================
c . 33144.05 |=================================================================
d . 33118.50 |=================================================================
e . 33116.43 |=================================================================
f . 33144.05 |=================================================================

clpeak 1.1.2
OpenCL Test: Integer 24-bit Compute
GIOPS > Higher Is Better
a . 33101.40 |=================================================================
b . 33102.77 |=================================================================
c . 33102.51 |=================================================================
d . 33103.80 |=================================================================
e . 33101.22 |=================================================================
f . 33102.00 |=================================================================

clpeak 1.1.2
OpenCL Test: Global Memory Bandwidth
GBPS > Higher Is Better
a . 3485.68 |==================================================================
b . 3486.17 |==================================================================
c . 3484.48 |==================================================================
d . 3486.17 |==================================================================
e . 3485.50 |==================================================================
f . 3486.06 |==================================================================

clpeak 1.1.2
OpenCL Test: Double-Precision Compute
GFLOPS > Higher Is Better
a . 32963.76 |=================================================================
b . 32959.84 |=================================================================
c . 32960.86 |=================================================================
d . 32960.36 |=================================================================
e . 32948.10 |=================================================================
f . 32958.31 |=================================================================

clpeak 1.1.2
OpenCL Test: Single-Precision Compute
GFLOPS > Higher Is Better
a . 64543.49 |=================================================================
b . 64543.33 |=================================================================
c . 64546.27 |=================================================================
d . 64546.76 |=================================================================
e . 64538.91 |=================================================================
f . 64541.86 |=================================================================

clpeak 1.1.2
OpenCL Test: Transfer Bandwidth enqueueReadBuffer
GBPS > Higher Is Better
a . 295.59 |===================================================================
b . 295.58 |===================================================================
c . 295.51 |===================================================================
d . 295.33 |===================================================================
e . 295.22 |===================================================================
f . 295.40 |===================================================================

clpeak 1.1.2
OpenCL Test: Transfer Bandwidth enqueueWriteBuffer
GBPS > Higher Is Better
a . 379.83 |===================================================================
b . 379.79 |===================================================================
c . 379.65 |===================================================================
d . 379.55 |===================================================================
e . 379.32 |===================================================================
f . 379.43 |===================================================================

ArrayFire 3.9
Test: Conjugate Gradient OpenCL
ms < Lower Is Better
a . 2.998 |====================================================================
b . 3.002 |====================================================================
c . 3.001 |====================================================================
d . 2.987 |====================================================================
e . 2.990 |====================================================================
f . 2.986 |====================================================================

FinanceBench 2016-07-25
Benchmark: Monte-Carlo OpenCL
ms < Lower Is Better
a . 58.97 |===================================================================
b . 59.19 |===================================================================
c . 59.51 |====================================================================
d . 59.05 |===================================================================
e . 58.95 |===================================================================
f . 59.89 |====================================================================

FinanceBench 2016-07-25
Benchmark: Black-Scholes OpenCL
ms < Lower Is Better
a . 4.318 |====================================================================
b . 4.328 |====================================================================
c . 4.317 |====================================================================
d . 4.305 |====================================================================
e . 4.330 |====================================================================
f . 4.293 |===================================================================

Blender
Blend File: BMW27 - Compute: CUDA
Seconds < Lower Is Better
a . 30.77 |====================================================================
f . 30.82 |====================================================================