onnx gh200
ARMv8 Neoverse-V2 testing with a Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G MB (CG1) (3A06 BIOS) and ASPEED on Ubuntu 23.10 via the Phoronix Test Suite.

a, b, c, d:

  Processor: ARMv8 Neoverse-V2 @ 3.39GHz (72 Cores), Motherboard: Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G MB (CG1) (3A06 BIOS), Memory: 1 x 480GB DRAM-6400MT/s, Disk: 960GB SAMSUNG MZ1L2960HCJR-00A07 + 1920GB SAMSUNG MZTL21T9, Graphics: ASPEED, Network: 2 x Mellanox MT2910 + 2 x QLogic FastLinQ QL41000 10/25/40/50GbE

  OS: Ubuntu 23.10, Kernel: 6.5.0-15-generic (aarch64), Compiler: GCC 13.2.0, File-System: ext4, Screen Resolution: 1920x1200

ONNX Runtime 1.17
Model: GPT-2 - Device: CPU - Executor: Parallel
Inferences Per Second > Higher Is Better

ONNX Runtime 1.17
Model: GPT-2 - Device: CPU - Executor: Standard
Inferences Per Second > Higher Is Better

ONNX Runtime 1.17
Model: yolov4 - Device: CPU - Executor: Parallel
Inferences Per Second > Higher Is Better
a . 8.36597 |================================================================
b . 8.30820 |===============================================================
c . 8.65640 |==================================================================
d . 8.34625 |================================================================

ONNX Runtime 1.17
Model: yolov4 - Device: CPU - Executor: Parallel
Inference Time Cost (ms) < Lower Is Better
a . 119.53 |===================================================================
b . 120.42 |===================================================================
c . 115.52 |================================================================
d . 119.81 |===================================================================

ONNX Runtime 1.17
Model: yolov4 - Device: CPU - Executor: Standard
Inferences Per Second > Higher Is Better
a . 11.07 |==================================================================
b . 11.19 |===================================================================
c . 11.37 |====================================================================
d . 11.41 |====================================================================

ONNX Runtime 1.17
Model: yolov4 - Device: CPU - Executor: Standard
Inference Time Cost (ms) < Lower Is Better
a . 90.35 |====================================================================
b . 89.40 |===================================================================
c . 87.95 |==================================================================
d . 87.63 |==================================================================

ONNX Runtime 1.17
Model: T5 Encoder - Device: CPU - Executor: Parallel
Inferences Per Second > Higher Is Better
a . 75.24 |=================================================================
b . 74.16 |================================================================
c . 76.26 |==================================================================
d . 78.59 |====================================================================

ONNX Runtime 1.17
Model: T5 Encoder - Device: CPU - Executor: Parallel
Inference Time Cost (ms) < Lower Is Better
a . 13.29 |===================================================================
b . 13.48 |====================================================================
c . 13.11 |==================================================================
d . 12.72 |================================================================

ONNX Runtime 1.17
Model: T5 Encoder - Device: CPU - Executor: Standard
Inferences Per Second > Higher Is Better
a . 367.96 |==================================================================
b . 374.21 |===================================================================
c . 374.20 |===================================================================
d . 344.56 |==============================================================

ONNX Runtime 1.17
Model: T5 Encoder - Device: CPU - Executor: Standard
Inference Time Cost (ms) < Lower Is Better
a . 2.71650 |==============================================================
b . 2.67306 |=============================================================
c . 2.66641 |=============================================================
d . 2.89589 |==================================================================

ONNX Runtime 1.17
Model: bertsquad-12 - Device: CPU - Executor: Parallel
Inferences Per Second > Higher Is Better

ONNX Runtime 1.17
Model: bertsquad-12 - Device: CPU - Executor: Standard
Inferences Per Second > Higher Is Better

ONNX Runtime 1.17
Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallel
Inferences Per Second > Higher Is Better
a . 548.97 |===================================================================
b . 532.26 |=================================================================
c . 539.31 |==================================================================
d . 536.92 |==================================================================

ONNX Runtime 1.17
Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallel
Inference Time Cost (ms) < Lower Is Better
a . 1.82009 |================================================================
b . 1.87709 |==================================================================
c . 1.85241 |=================================================================
d . 1.86068 |=================================================================

ONNX Runtime 1.17
Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard
Inferences Per Second > Higher Is Better
a . 1112.69 |===============================================================
b . 1125.55 |================================================================
c . 1164.06 |==================================================================
d . 1154.69 |=================================================================

ONNX Runtime 1.17
Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard
Inference Time Cost (ms) < Lower Is Better
a . 0.897753 |=================================================================
b . 0.887622 |================================================================
c . 0.858087 |==============================================================
d . 0.865053 |===============================================================

ONNX Runtime 1.17
Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel
Inferences Per Second > Higher Is Better
a . 1.47621 |================================================================
b . 1.47745 |================================================================
c . 1.52538 |==================================================================
d . 1.48956 |================================================================

ONNX Runtime 1.17
Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel
Inference Time Cost (ms) < Lower Is Better
a . 677.75 |===================================================================
b . 676.87 |===================================================================
c . 655.57 |=================================================================
d . 671.34 |==================================================================

ONNX Runtime 1.17
Model: fcn-resnet101-11 - Device: CPU - Executor: Standard
Inferences Per Second > Higher Is Better
a . 1.48485 |==================================================================
b . 1.49458 |==================================================================
c . 1.46640 |=================================================================
d . 1.48493 |==================================================================

ONNX Runtime 1.17
Model: fcn-resnet101-11 - Device: CPU - Executor: Standard
Inference Time Cost (ms) < Lower Is Better
a . 673.47 |==================================================================
b . 669.13 |==================================================================
c . 681.94 |===================================================================
d . 673.42 |==================================================================

ONNX Runtime 1.17
Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallel
Inferences Per Second > Higher Is Better

ONNX Runtime 1.17
Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard
Inferences Per Second > Higher Is Better

ONNX Runtime 1.17
Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel
Inferences Per Second > Higher Is Better
a . 241.43 |==================================================================
b . 241.10 |=================================================================
c . 246.94 |===================================================================
d . 245.09 |==================================================================

ONNX Runtime 1.17
Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel
Inference Time Cost (ms) < Lower Is Better
a . 4.14119 |==================================================================
b . 4.14668 |==================================================================
c . 4.04811 |================================================================
d . 4.07887 |=================================================================

ONNX Runtime 1.17
Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard
Inferences Per Second > Higher Is Better
a . 272.68 |===============================================================
b . 277.49 |================================================================
c . 290.32 |===================================================================
d . 284.01 |==================================================================

ONNX Runtime 1.17
Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard
Inference Time Cost (ms) < Lower Is Better
a . 3.66668 |==================================================================
b . 3.60302 |=================================================================
c . 3.44312 |==============================================================
d . 3.51969 |===============================================================

ONNX Runtime 1.17
Model: super-resolution-10 - Device: CPU - Executor: Parallel
Inferences Per Second > Higher Is Better
a . 43.37 |====================================================================
b . 42.77 |===================================================================
c . 41.69 |=================================================================
d . 42.86 |===================================================================

ONNX Runtime 1.17
Model: super-resolution-10 - Device: CPU - Executor: Parallel
Inference Time Cost (ms) < Lower Is Better
a . 23.06 |=================================================================
b . 23.38 |==================================================================
c . 23.98 |====================================================================
d . 23.33 |==================================================================

ONNX Runtime 1.17
Model: super-resolution-10 - Device: CPU - Executor: Standard
Inferences Per Second > Higher Is Better
a . 149.55 |===================================================================
b . 148.71 |===================================================================
c . 148.30 |==================================================================
d . 149.68 |===================================================================

ONNX Runtime 1.17
Model: super-resolution-10 - Device: CPU - Executor: Standard
Inference Time Cost (ms) < Lower Is Better
a . 6.68539 |=================================================================
b . 6.72220 |==================================================================
c . 6.74173 |==================================================================
d . 6.67817 |=================================================================

ONNX Runtime 1.17
Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallel
Inferences Per Second > Higher Is Better

ONNX Runtime 1.17
Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard
Inferences Per Second > Higher Is Better
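Note on the metrics: each Inference Time Cost value is simply 1000 divided by the corresponding Inferences Per Second result (e.g. 1000 / 8.36597 IPS = 119.53 ms for yolov4 Parallel on run a). The "Parallel" and "Standard" executor labels are taken here to correspond to ONNX Runtime's parallel and sequential CPU execution modes. The following Python sketch shows how a measurement of this kind could be reproduced with the onnxruntime package; it is an illustration only, not the Phoronix Test Suite harness that produced the numbers above, and the model file name, single-float32-input assumption, and iteration count are assumptions made for the example.

# Minimal sketch, assuming onnxruntime's Python API; not the harness used above.
# "Parallel" vs. "Standard" is modeled here as ORT_PARALLEL vs. ORT_SEQUENTIAL.
import time
import numpy as np
import onnxruntime as ort

def measure_ips(model_path: str, parallel: bool, iterations: int = 100) -> float:
    opts = ort.SessionOptions()
    opts.execution_mode = (ort.ExecutionMode.ORT_PARALLEL if parallel
                           else ort.ExecutionMode.ORT_SEQUENTIAL)
    sess = ort.InferenceSession(model_path, opts, providers=["CPUExecutionProvider"])

    # Dummy input for the model's first input; symbolic/dynamic dims become 1.
    # Assumes a single float32 input tensor (true of e.g. super-resolution-10).
    inp = sess.get_inputs()[0]
    shape = [d if isinstance(d, int) else 1 for d in inp.shape]
    feed = {inp.name: np.random.rand(*shape).astype(np.float32)}

    sess.run(None, feed)                      # warm-up run, excluded from timing
    start = time.perf_counter()
    for _ in range(iterations):
        sess.run(None, feed)
    elapsed = time.perf_counter() - start
    return iterations / elapsed               # inferences per second

if __name__ == "__main__":
    # "super-resolution-10.onnx" is a placeholder path for one of the models above.
    for parallel, label in ((True, "Parallel"), (False, "Standard")):
        ips = measure_ips("super-resolution-10.onnx", parallel)
        print(f"{label}: {ips:.2f} inf/sec, {1000.0 / ips:.2f} ms per inference")

Because the published results come from the Phoronix Test Suite's ONNX Runtime test profile rather than a hand-rolled loop like this, absolute figures from the sketch will not match the tables above; it only illustrates what the two executor settings and the two reported metrics mean.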