onednn NVIDIA GH200 ARMv8 Neoverse-V2 testing with a Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G MB (CG1) (3A06 BIOS) and NVIDIA GH200 480GB on Ubuntu 22.04 via the Phoronix Test Suite. a: Processor: ARMv8 Neoverse-V2 @ 3.39GHz (72 Cores), Motherboard: Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G MB (CG1) (3A06 BIOS), Memory: 1 x 480GB DRAM-6400MT/s, Disk: 960GB SAMSUNG MZ1L2960HCJR-00A07 + 1920GB SAMSUNG MZTL21T9, Graphics: NVIDIA GH200 480GB, Network: 2 x Mellanox MT2910 + 2 x QLogic FastLinQ QL41000 10/25/40/50GbE OS: Ubuntu 22.04, Kernel: 6.5.0-1007-NVIDIA-64k (aarch64), Display Driver: NVIDIA, OpenCL: OpenCL 3.0 CUDA 12.4.89, Vulkan: 1.3.277, Compiler: GCC 11.4.0 + CUDA 11.5, File-System: ext4, Screen Resolution: 1920x1200 b: Processor: ARMv8 Neoverse-V2 @ 3.39GHz (72 Cores), Motherboard: Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G MB (CG1) (3A06 BIOS), Memory: 1 x 480GB DRAM-6400MT/s, Disk: 960GB SAMSUNG MZ1L2960HCJR-00A07 + 1920GB SAMSUNG MZTL21T9, Graphics: NVIDIA GH200 480GB, Network: 2 x Mellanox MT2910 + 2 x QLogic FastLinQ QL41000 10/25/40/50GbE OS: Ubuntu 22.04, Kernel: 6.5.0-1007-NVIDIA-64k (aarch64), Display Driver: NVIDIA, OpenCL: OpenCL 3.0 CUDA 12.4.89, Vulkan: 1.3.277, Compiler: GCC 11.4.0 + CUDA 11.5, File-System: ext4, Screen Resolution: 1920x1200 c: Processor: ARMv8 Neoverse-V2 @ 3.39GHz (72 Cores), Motherboard: Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G MB (CG1) (3A06 BIOS), Memory: 1 x 480GB DRAM-6400MT/s, Disk: 960GB SAMSUNG MZ1L2960HCJR-00A07 + 1920GB SAMSUNG MZTL21T9, Graphics: NVIDIA GH200 480GB, Network: 2 x Mellanox MT2910 + 2 x QLogic FastLinQ QL41000 10/25/40/50GbE OS: Ubuntu 22.04, Kernel: 6.5.0-1007-NVIDIA-64k (aarch64), Display Driver: NVIDIA, OpenCL: OpenCL 3.0 CUDA 12.4.89, Vulkan: 1.3.277, Compiler: GCC 11.4.0 + CUDA 11.5, File-System: ext4, Screen Resolution: 1920x1200 d: Processor: ARMv8 Neoverse-V2 @ 3.39GHz (72 Cores), Motherboard: Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G MB (CG1) (3A06 BIOS), Memory: 1 x 480GB DRAM-6400MT/s, Disk: 960GB SAMSUNG MZ1L2960HCJR-00A07 + 1920GB SAMSUNG MZTL21T9, Graphics: NVIDIA GH200 480GB, Network: 2 x Mellanox MT2910 + 2 x QLogic FastLinQ QL41000 10/25/40/50GbE OS: Ubuntu 22.04, Kernel: 6.5.0-1007-NVIDIA-64k (aarch64), Display Driver: NVIDIA, OpenCL: OpenCL 3.0 CUDA 12.4.89, Vulkan: 1.3.277, Compiler: GCC 11.4.0 + CUDA 11.5, File-System: ext4, Screen Resolution: 1920x1200 oneDNN 3.4 Harness: IP Shapes 1D - Engine: CPU ms < Lower Is Better a . 4.17093 |================================================================== b . 4.13326 |================================================================= c . 4.11456 |================================================================= d . 4.09690 |================================================================= oneDNN 3.4 Harness: IP Shapes 3D - Engine: CPU ms < Lower Is Better a . 1.83694 |================================================================== b . 1.83867 |================================================================== c . 1.82904 |================================================================== d . 1.83058 |================================================================== oneDNN 3.4 Harness: Convolution Batch Shapes Auto - Engine: CPU ms < Lower Is Better a . 4.71311 |================================================================== b . 4.72171 |================================================================== c . 4.70649 |================================================================== d . 4.71070 |================================================================== oneDNN 3.4 Harness: Deconvolution Batch shapes_1d - Engine: CPU ms < Lower Is Better a . 24.48 |==================================================================== b . 24.53 |==================================================================== c . 24.56 |==================================================================== d . 24.49 |==================================================================== oneDNN 3.4 Harness: Deconvolution Batch shapes_3d - Engine: CPU ms < Lower Is Better a . 6.43106 |================================================================== b . 6.43839 |================================================================== c . 6.44031 |================================================================== d . 6.44054 |================================================================== oneDNN 3.4 Harness: Recurrent Neural Network Training - Engine: CPU ms < Lower Is Better a . 3583.01 |================================================================== b . 3584.68 |================================================================== c . 3574.73 |================================================================= d . 3602.94 |================================================================== oneDNN 3.4 Harness: Recurrent Neural Network Inference - Engine: CPU ms < Lower Is Better a . 2283.74 |================================================================== b . 2287.55 |================================================================== c . 2286.80 |================================================================== d . 2280.18 |==================================================================