mi100-1

KVM testing on AlmaLinux 8.5 via the Phoronix Test Suite.

mi100:

  Processor: 16 x Intel Core (Haswell no TSX) (16 Cores), Motherboard: RDO OpenStack Compute (1.11.0-2.el7 BIOS), Chipset: Intel 82G33/G31/P35/P31 + ICH9, Memory: 64GB, Disk: 21GB QEMU HDD + 107GB QEMU HDD, Graphics: Cirrus Logic GD 5446 32GB, Network: Red Hat Virtio device

  OS: Ubuntu 18.04, Kernel: 5.4.0-64-generic (x86_64), OpenCL: OpenCL 2.0 AMD-APP (3275.0), Compiler: GCC 7.5.0, File-System: ext4, Screen Resolution: 1024x768, System Layer: KVM

V100:

  Processor: 2 x Intel Xeon (Skylake IBRS) (2 Cores), Motherboard: RDO OpenStack Compute (1.11.0-2.el7 BIOS), Chipset: Intel 82G33/G31/P35/P31 + ICH9, Memory: 8GB, Disk: 21GB QEMU HDD + 53GB QEMU HDD, Graphics: Cirrus Logic GD 5446 8GB, Network: Red Hat Virtio device

  OS: Ubuntu 20.04, Kernel: 5.4.0-67-generic (x86_64), Display Driver: NVIDIA, OpenCL: OpenCL 1.2 CUDA 11.0.228, Vulkan: 1.2.133, Compiler: GCC 9.3.0 + CUDA 11.2, File-System: ext4, System Layer: KVM

P40:

  Processor: 4 x Intel Xeon (Cascadelake) (4 Cores), Motherboard: Red Hat RHEL-AV (0.0.0 BIOS), Chipset: Intel 82G33/G31/P35/P31 + ICH9, Memory: 16GB, Disk: 21GB QEMU HDD + 54GB QEMU HDD, Graphics: Cirrus Logic GD 5446 6GB, Network: Red Hat Virtio device

  OS: AlmaLinux 8.5, Kernel: 4.18.0-305.19.1.el8_4.x86_64 (x86_64), Display Driver: NVIDIA, Compiler: GCC 8.5.0 20210514, File-System: xfs, Screen Resolution: 1024x768, System Layer: KVM

Blender 2.92
Blend File: BMW27 - Compute: OpenCL
Seconds < Lower Is Better
mi100 . 53.76   |===
V100 .. 1281.46 |==============================================================
P40 ... 549.47  |===========================

cl-mem 2017-01-13
Benchmark: Copy
GB/s > Higher Is Better
mi100 . 286.8 |================================================================
V100 .. 268.5 |============================================================
P40 ... 240.3 |======================================================

cl-mem 2017-01-13
Benchmark: Read
GB/s > Higher Is Better
mi100 . 916.8 |================================================================
V100 .. 780.2 |======================================================
P40 ... 292.2 |====================

cl-mem 2017-01-13
Benchmark: Write
GB/s > Higher Is Better
mi100 . 730.0 |===============================================================
V100 .. 736.7 |================================================================
P40 ... 289.9 |=========================

clpeak
OpenCL Test: Kernel Latency
us < Lower Is Better
mi100 . 17.87 |====================
V100 .. 5.51  |======
P40 ... 57.18 |================================================================

clpeak
OpenCL Test: Integer Compute INT
GIOPS > Higher Is Better
mi100 . 7487.84  |=================================
V100 .. 13899.17 |=============================================================
P40 ... 3140.69  |==============

clpeak
OpenCL Test: Single-Precision Float
GFLOPS > Higher Is Better
mi100 . 22813.55 |=============================================================
V100 .. 14073.61 |======================================
P40 ... 10130.74 |===========================

clpeak
OpenCL Test: Double-Precision Double
GFLOPS > Higher Is Better
mi100 . 11439.47 |=============================================================
V100 .. 7003.99  |=====================================
P40 ... 368.87   |==

clpeak
OpenCL Test: Global Memory Bandwidth
GBPS > Higher Is Better
mi100 . 960.15 |===============================================================
V100 .. 769.52 |==================================================
P40 ... 282.73 |===================

clpeak
OpenCL Test: Transfer Bandwidth enqueueReadBuffer
GBPS > Higher Is Better
mi100 . 4.86 |=======================================================
V100 .. 4.04 |=============================================
P40 ... 5.78 |=================================================================

clpeak
OpenCL Test: Transfer Bandwidth enqueueWriteBuffer
GBPS > Higher Is Better
mi100 . 10.96 |================================================================
V100 .. 6.64  |=======================================
P40 ... 7.54  |============================================

Darktable 2.4.2
Test: Boat - Acceleration: OpenCL
Seconds < Lower Is Better
mi100 . 2.008 |================================================================

Darktable 2.4.2
Test: Masskrug - Acceleration: OpenCL
Seconds < Lower Is Better
mi100 . 5.075 |================================================================

Darktable 2.4.2
Test: Server Rack - Acceleration: OpenCL
Seconds < Lower Is Better
mi100 . 0.177 |================================================================

Darktable 2.4.2
Test: Server Room - Acceleration: OpenCL
Seconds < Lower Is Better
mi100 . 0.864 |================================================================

Darktable 3.0.1
Test: Boat - Acceleration: OpenCL
Seconds < Lower Is Better
V100 . 5.566 |=================================================================

Darktable 3.0.1
Test: Masskrug - Acceleration: OpenCL
Seconds < Lower Is Better
V100 . 18.16 |=================================================================

Darktable 3.0.1
Test: Server Rack - Acceleration: OpenCL
Seconds < Lower Is Better
V100 . 0.414 |=================================================================

Darktable 3.0.1
Test: Server Room - Acceleration: OpenCL
Seconds < Lower Is Better
V100 . 1.810 |=================================================================

Rodinia 3.1
Test: OpenCL Myocyte
Seconds < Lower Is Better
mi100 . 132.60 |===============================================================
V100 .. 115.48 |=======================================================

Rodinia 3.1
Test: OpenCL Heartwall
Seconds < Lower Is Better
mi100 . 3.133 |================================================================
V100 .. 2.919 |============================================================

SHOC Scalable HeterOgeneous Computing 2020-04-17
Target: OpenCL - Benchmark: Triad
GB/s > Higher Is Better
mi100 . 12.27 |================================================================
V100 .. 12.26 |================================================================
P40 ... 11.80 |==============================================================

SHOC Scalable HeterOgeneous Computing 2020-04-17
Target: OpenCL - Benchmark: FFT SP
GFLOPS > Higher Is Better
mi100 . 2783.51 |==============================================================
V100 .. 2278.09 |===================================================
P40 ... 800.25  |==================

SHOC Scalable HeterOgeneous Computing 2020-04-17
Target: OpenCL - Benchmark: MD5 Hash
GHash/s > Higher Is Better
mi100 . 27.89 |=========================================================
V100 .. 31.09 |================================================================
P40 ... 17.66 |====================================

SHOC Scalable HeterOgeneous Computing 2020-04-17
Target: OpenCL - Benchmark: Max SP Flops
GFLOPS > Higher Is Better
mi100 . 21943033.0 |===========================================================
V100 .. 14052.7    |
P40 ... 11756.0    |

SHOC Scalable HeterOgeneous Computing 2020-04-17
Target: OpenCL - Benchmark: Bus Speed Download
GB/s > Higher Is Better
mi100 . 13.67 |================================================================
V100 .. 12.34 |==========================================================
P40 ... 12.34 |==========================================================

SHOC Scalable HeterOgeneous Computing 2020-04-17
Target: OpenCL - Benchmark: Bus Speed Readback
GB/s > Higher Is Better
mi100 . 14.08 |================================================================
V100 .. 13.17 |============================================================
P40 ... 13.17 |============================================================

SHOC Scalable HeterOgeneous Computing 2020-04-17
Target: OpenCL - Benchmark: Texture Read Bandwidth
GB/s > Higher Is Better
mi100 . 706.11  |==============================
V100 .. 1470.52 |==============================================================
P40 ... 503.54  |=====================
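
The result rows above share one shape (system label, dot padding, value, bar), so they can be read back programmatically. The following is a minimal sketch, not part of the Phoronix Test Suite: the helper names `parse_rows` and `relative_to_best` are hypothetical, and the regular expression assumes labels contain only letters and digits, as they do in this report. It extracts the values of one test and scales them so the best system is 1.0.

```python
import re

# One result row, e.g. "V100 .. 1281.46 |=====": label, dot padding, value, bar.
# Assumption: labels are alphanumeric, as in this report (mi100, V100, P40).
ROW = re.compile(r"(\w+)\s+\.+\s+([0-9]+(?:\.[0-9]+)?)\s*\|=*")


def parse_rows(text):
    """Return {label: value} for every result row found in text."""
    return {label: float(value) for label, value in ROW.findall(text)}


def relative_to_best(rows, lower_is_better):
    """Scale values so the best system is 1.0 and the others are >= 1.0."""
    if lower_is_better:
        best = min(rows.values())
        return {label: value / best for label, value in rows.items()}
    best = max(rows.values())
    return {label: best / value for label, value in rows.items()}


# Rows copied from the Blender 2.92 result (Seconds, lower is better).
sample = "mi100 . 53.76 |=== V100 .. 1281.46 |==== P40 ... 549.47 |==="
rows = parse_rows(sample)
ratios = relative_to_best(rows, lower_is_better=True)

for label, factor in ratios.items():
    print(f"{label}: {factor:.2f}x the best time")
```

The direction flag has to be taken from each test's "Lower Is Better" or "Higher Is Better" line, since the rows themselves do not carry it.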