onednn onnx threadripper

AMD Ryzen Threadripper 3990X 64-Core testing with a Gigabyte TRX40 AORUS PRO WIFI (F4p BIOS) and AMD Radeon RX 5700 8GB on Pop 21.10 via the Phoronix Test Suite.

A (runs B, C, and D used an identical hardware and software configuration):

  Processor: AMD Ryzen Threadripper 3990X 64-Core @ 2.90GHz (64 Cores / 128 Threads)
  Motherboard: Gigabyte TRX40 AORUS PRO WIFI (F4p BIOS)
  Chipset: AMD Starship/Matisse
  Memory: 128GB
  Disk: Samsung SSD 970 EVO Plus 500GB
  Graphics: AMD Radeon RX 5700 8GB (1750/875MHz)
  Audio: AMD Navi 10 HDMI Audio
  Monitor: DELL P2415Q
  Network: Intel I211 + Intel Wi-Fi 6 AX200

  OS: Pop 21.10
  Kernel: 5.17.0-rc1-sched-core-phx (x86_64)
  Desktop: GNOME Shell 40.5
  Display Server: X Server
  OpenGL: 4.6 Mesa 21.2.2 (LLVM 12.0.1)
  Vulkan: 1.2.182
  Compiler: GCC 11.2.0
  File-System: ext4
  Screen Resolution: 3840x2160

ONNX Runtime 1.11
Model: bertsquad-12 - Device: CPU - Executor: Standard
Inferences Per Minute > Higher Is Better
  A: 531
  B: 647
  C: 646
  D: 642

oneDNN 2.6
Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU
ms < Lower Is Better
  A: 5.54387
  B: 6.27072
  C: 6.28663
  D: 6.40806

ONNX Runtime 1.11
Model: GPT-2 - Device: CPU - Executor: Standard
Inferences Per Minute > Higher Is Better
  A: 4219
  B: 4710
  C: 4823
  D: 4441

oneDNN 2.6
Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU
ms < Lower Is Better
  A: 0.941266
  B: 0.910056
  C: 0.904110
  D: 0.928404

oneDNN 2.6
Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU
ms < Lower Is Better
  A: 1260.20
  B: 1251.24
  C: 1221.12
  D: 1211.44

oneDNN 2.6
Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU
ms < Lower Is Better
  A: 6.68005
  B: 6.82166
  C: 6.85039
  D: 6.90607

oneDNN 2.6
Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU
ms < Lower Is Better
  A: 5028.06
  B: 4997.82
  C: 5011.49
  D: 4884.90

oneDNN 2.6
Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU
ms < Lower Is Better
  A: 1.13774
  B: 1.11774
  C: 1.11927
  D: 1.10705

ONNX Runtime 1.11
Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard
Inferences Per Minute > Higher Is Better
  A: 995
  B: 1010
  C: 1017
  D: 991

ONNX Runtime 1.11
Model: fcn-resnet101-11 - Device: CPU - Executor: Standard
Inferences Per Minute > Higher Is Better
  A: 153
  B: 156
  C: 153
  D: 157

ONNX Runtime 1.11
Model: bertsquad-12 - Device: CPU - Executor: Parallel
Inferences Per Minute > Higher Is Better
  A: 424
  B: 425
  C: 432
  D: 421

ONNX Runtime 1.11
Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel
Inferences Per Minute > Higher Is Better
  A: 82
  B: 80
  C: 81
  D: 81

oneDNN 2.6
Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU
ms < Lower Is Better
  A: 4959.96
  B: 5003.99
  C: 4964.59
  D: 4882.28

ONNX Runtime 1.11
Model: yolov4 - Device: CPU - Executor: Standard
Inferences Per Minute > Higher Is Better
  A: 293
  B: 293
  C: 295
  D: 300

ONNX Runtime 1.11
Model: super-resolution-10 - Device: CPU - Executor: Parallel
Inferences Per Minute > Higher Is Better
  A: 3815
  B: 3780
  C: 3784
  D: 3731

ONNX Runtime 1.11
Model: GPT-2 - Device: CPU - Executor: Parallel
Inferences Per Minute > Higher Is Better
  A: 3461
  B: 3512
  C: 3529
  D: 3495

oneDNN 2.6
Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU
ms < Lower Is Better
  A: 1236.75
  B: 1246.07
  C: 1238.60
  D: 1223.53

oneDNN 2.6
Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU
ms < Lower Is Better
  A: 4954.34
  B: 5034.83
  C: 5003.38
  D: 4950.97

ONNX Runtime 1.11
Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallel
Inferences Per Minute > Higher Is Better
  A: 1088
  B: 1072
  C: 1079
  D: 1079

oneDNN 2.6
Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU
ms < Lower Is Better
  A: 0.979135
  B: 0.992713
  C: 0.987020
  D: 0.984819

oneDNN 2.6
Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU
ms < Lower Is Better
  A: 6.39430
  B: 6.43330
  C: 6.44689
  D: 6.44330

oneDNN 2.6
Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU
ms < Lower Is Better
  A: 2.10617
  B: 2.11025
  C: 2.11212
  D: 2.11146

ONNX Runtime 1.11
Model: yolov4 - Device: CPU - Executor: Parallel
Inferences Per Minute > Higher Is Better
  A: 361
  B: 362
  C: 362
  D: 361

ONNX Runtime 1.11
Model: super-resolution-10 - Device: CPU - Executor: Standard
Inferences Per Minute > Higher Is Better
  A: 7323
  B: 6401
  C: 7560
  D: 7375

oneDNN 2.6
Harness: Matrix Multiply Batch Shapes Transformer - Data Type: bf16bf16bf16 - Engine: CPU
ms < Lower Is Better
  (no result reported)

oneDNN 2.6
Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32 - Engine: CPU
ms < Lower Is Better
  A: 11.60
  B: 11.34
  C: 11.90
  D: 11.75

oneDNN 2.6
Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU
ms < Lower Is Better
  A: 1208.52
  B: 1242.44
  C: 1250.99
  D: 1254.39

oneDNN 2.6
Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU
ms < Lower Is Better
  A: 7.59165
  B: 6.93013
  C: 7.57941
  D: 7.04981

oneDNN 2.6
Harness: Deconvolution Batch shapes_3d - Data Type: bf16bf16bf16 - Engine: CPU
ms < Lower Is Better
  (no result reported)

oneDNN 2.6
Harness: Deconvolution Batch shapes_1d - Data Type: bf16bf16bf16 - Engine: CPU
ms < Lower Is Better
  (no result reported)

oneDNN 2.6
Harness: Convolution Batch Shapes Auto - Data Type: bf16bf16bf16 - Engine: CPU
ms < Lower Is Better
  (no result reported)

oneDNN 2.6
Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU
ms < Lower Is Better
  A: 1.52619
  B: 1.49871
  C: 1.56000
  D: 1.45511

oneDNN 2.6
Harness: IP Shapes 3D - Data Type: bf16bf16bf16 - Engine: CPU
ms < Lower Is Better
  (no result reported)

oneDNN 2.6
Harness: IP Shapes 1D - Data Type: bf16bf16bf16 - Engine: CPU
ms < Lower Is Better
  (no result reported)

oneDNN 2.6
Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPU
ms < Lower Is Better
  A: 2.18433
  B: 2.35161
  C: 2.37403
  D: 2.42176

oneDNN 2.6
Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU
ms < Lower Is Better
  A: 2.00953
  B: 1.96420
  C: 1.99383
  D: 1.91681
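Since all four runs use an identical configuration, the per-test spread between runs is a direct read on run-to-run variance. Some tests are tight (the oneDNN Deconvolution shapes_3d f32 results agree to well under 1%), while others swing noticeably (bertsquad-12 Standard ranges from 531 to 647 inferences per minute). A minimal sketch of that calculation, using plain Python and values copied from the results above:

```python
def spread_pct(values):
    """Max-to-min spread of a set of benchmark runs, as a percentage of the minimum."""
    return (max(values) - min(values)) / min(values) * 100

# ONNX Runtime 1.11, bertsquad-12, CPU, Standard executor (runs A-D, inf/min)
bertsquad_standard = [531, 647, 646, 642]
# oneDNN 2.6, Deconvolution Batch shapes_3d, f32 (runs A-D, ms)
deconv_3d_f32 = [2.10617, 2.11025, 2.11212, 2.11146]

print(f"bertsquad-12 Standard spread: {spread_pct(bertsquad_standard):.1f}%")  # ~21.8%
print(f"Deconv shapes_3d f32 spread:  {spread_pct(deconv_3d_f32):.2f}%")       # well under 1%
```

A spread this large on one test with identical hardware usually means that single result (run A here) should be treated with caution rather than read as a real configuration difference.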
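Because the report mixes "higher is better" throughput (inferences per minute) with "lower is better" latency (ms), comparing two runs across tests requires normalizing each result against a baseline first. A hedged sketch in plain Python: run A is arbitrarily chosen as the baseline, the four tests listed are just a sample of the results above, and the geometric mean is one common (not the only) way to aggregate such ratios:

```python
import math

def relative_score(value, baseline, higher_is_better):
    """Ratio > 1.0 means this run beat the baseline on this test."""
    return value / baseline if higher_is_better else baseline / value

def geomean(ratios):
    # Geometric mean via log-space averaging, stable for long ratio lists.
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# (run D value, run A value, higher_is_better) for a few tests from this report
tests = [
    (642, 531, True),            # ONNX bertsquad-12, Standard (inf/min)
    (6.40806, 5.54387, False),   # oneDNN IP Shapes 3D, f32 (ms)
    (300, 293, True),            # ONNX yolov4, Standard (inf/min)
    (1.91681, 2.00953, False),   # oneDNN IP Shapes 1D, f32 (ms)
]
ratios = [relative_score(v, b, hib) for v, b, hib in tests]
print(f"Run D vs run A (geomean over sampled tests): {geomean(ratios):.3f}")  # ~1.03
```

Note that the result depends heavily on which tests are sampled; the bertsquad-12 outlier alone pulls the aggregate well above 1.0.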