mi100-1

KVM testing on AlmaLinux 8.5 via the Phoronix Test Suite.

HTML result view exported from: https://openbenchmarking.org/result/2111222-TJ-2105265IB60&sro.

System configurations (for V100 and P40, only the fields the export lists for that system are shown):

mi100
  Processor: 16 x Intel Core (Haswell no TSX) (16 Cores)
  Motherboard: RDO OpenStack Compute (1.11.0-2.el7 BIOS)
  Chipset: Intel 82G33/G31/P35/P31 + ICH9
  Memory: 64GB
  Disk: 21GB QEMU HDD + 107GB QEMU HDD
  Graphics: Cirrus Logic GD 5446 32GB
  Network: Red Hat Virtio device
  OS: Ubuntu 18.04
  Kernel: 5.4.0-64-generic (x86_64)
  OpenCL: OpenCL 2.0 AMD-APP (3275.0)
  Compiler: GCC 7.5.0
  File-System: ext4
  Screen Resolution: 1024x768
  System Layer: KVM

V100
  Processor: 2 x Intel Xeon (Skylake IBRS) (2 Cores)
  Memory: 8GB
  Disk: 21GB QEMU HDD + 53GB QEMU HDD
  Graphics: Cirrus Logic GD 5446 8GB
  OS: Ubuntu 20.04
  Kernel: 5.4.0-67-generic (x86_64)
  Display Driver: NVIDIA
  OpenCL: OpenCL 1.2 CUDA 11.0.228
  Vulkan: 1.2.133
  Compiler: GCC 9.3.0 + CUDA 11.2

P40
  Processor: 4 x Intel Xeon (Cascadelake) (4 Cores)
  Motherboard: Red Hat RHEL-AV (0.0.0 BIOS)
  Memory: 16GB
  Disk: 21GB QEMU HDD + 54GB QEMU HDD
  Graphics: Cirrus Logic GD 5446 6GB
  OS: AlmaLinux 8.5
  Kernel: 4.18.0-305.19.1.el8_4.x86_64 (x86_64)
  Compiler: GCC 8.5.0 20210514
  File-System: xfs
  Screen Resolution: 1024x768

Kernel Details
  mi100: Transparent Huge Pages: madvise
  V100: Transparent Huge Pages: madvise
  P40: Transparent Huge Pages: always

Compiler Details
  mi100: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --enable-libmpx --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib --with-tune=generic --without-cuda-driver -v
  V100: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-9-HskZEa/gcc-9-9.3.0/debian/tmp-nvptx/usr,hsa --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
  P40: --build=x86_64-redhat-linux --disable-libmpx --disable-libunwind-exceptions --enable-__cxa_atexit --enable-bootstrap --enable-cet --enable-checking=release --enable-gnu-indirect-function --enable-gnu-unique-object --enable-initfini-array --enable-languages=c,c++,fortran,lto --enable-multilib --enable-offload-targets=nvptx-none --enable-plugin --enable-shared --enable-threads=posix --mandir=/usr/share/man --with-arch_32=x86-64 --with-gcc-major-version-only --with-isl --with-linker-hash-style=gnu --with-tune=generic --without-cuda-driver

Processor Details
  CPU Microcode: 0x1

Python Details
  mi100: Python 2.7.17 + Python 3.6.9

Security Details
  mi100:
    itlb_multihit: KVM: Vulnerable
    l1tf: Mitigation of PTE Inversion
    mds: Vulnerable: Clear buffers attempted no microcode; SMT Host state unknown
    meltdown: Mitigation of PTI
    spec_store_bypass: Vulnerable
    spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
    spectre_v2: Mitigation of Full generic retpoline STIBP: disabled RSB filling
    srbds: Unknown: Dependent on hypervisor status
    tsx_async_abort: Not affected
  V100:
    itlb_multihit: KVM: Vulnerable
    l1tf: Mitigation of PTE Inversion
    mds: Vulnerable: Clear buffers attempted no microcode; SMT Host state unknown
    meltdown: Mitigation of PTI
    spec_store_bypass: Vulnerable
    spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
    spectre_v2: Mitigation of Full generic retpoline IBPB: conditional IBRS_FW STIBP: disabled RSB filling
    srbds: Not affected
    tsx_async_abort: Vulnerable: Clear buffers attempted no microcode; SMT Host state unknown
  P40:
    SELinux
    itlb_multihit: KVM: Vulnerable
    l1tf: Not affected
    mds: Not affected
    meltdown: Not affected
    spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp
    spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
    spectre_v2: Mitigation of Enhanced IBRS IBPB: conditional RSB filling
    srbds: Not affected
    tsx_async_abort: Not affected

Graphics Details
  P40: BAR1 / Visible vRAM Size: 8192 MiB

Results overview (a "-" marks a test with no result for that system):

Test                                                mi100       V100        P40
shoc: OpenCL - Triad                                12.2740     12.2649     11.8020
shoc: OpenCL - FFT SP                               2783.51     2278.09     800.246
shoc: OpenCL - MD5 Hash                             27.8910     31.0928     17.6588
shoc: OpenCL - Max SP Flops                         21943033    14052.7     11756.0
shoc: OpenCL - Bus Speed Download                   13.6694     12.3441     12.3360
shoc: OpenCL - Bus Speed Readback                   14.0831     13.1709     13.1668
shoc: OpenCL - Texture Read Bandwidth               706.109     1470.52     503.539
cl-mem: Copy                                        286.8       268.5       240.3
cl-mem: Read                                        916.8       780.2       292.2
cl-mem: Write                                       730.0       736.7       289.9
rodinia: OpenCL Myocyte                             132.600     115.479     -
rodinia: OpenCL Heartwall                           3.133       2.919       -
darktable: Boat - OpenCL (2.4.2)                    2.008       -           -
darktable: Masskrug - OpenCL (2.4.2)                5.075       -           -
darktable: Server Rack - OpenCL (2.4.2)             0.177       -           -
darktable: Server Room - OpenCL (2.4.2)             0.864       -           -
blender: BMW27 - OpenCL                             53.76       1281.46     549.47
clpeak: Kernel Latency                              17.87       5.51        57.18
clpeak: Integer Compute INT                         7487.84     13899.17    3140.69
clpeak: Single-Precision Float                      22813.55    14073.61    10130.74
clpeak: Double-Precision Double                     11439.47    7003.99     368.87
clpeak: Global Memory Bandwidth                     960.15      769.52      282.73
clpeak: Transfer Bandwidth enqueueReadBuffer        4.86        4.04        5.78
clpeak: Transfer Bandwidth enqueueWriteBuffer       10.96       6.64        7.54
darktable: Boat - OpenCL (3.0.1)                    -           5.566       -
darktable: Masskrug - OpenCL (3.0.1)                -           18.156      -
darktable: Server Rack - OpenCL (3.0.1)             -           0.414       -
darktable: Server Room - OpenCL (3.0.1)             -           1.810       -
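To make the spread between the three systems easier to read, the reported means can be normalized to one system. The following is a minimal sketch, not part of the Phoronix Test Suite: the helper name `relative_to` and the choice of P40 as the baseline are assumptions made for illustration, and only three "more is better" results from this export are included.

```python
# Mean results copied from this export; all three tests are "more is better".
results = {
    "shoc FFT SP (GFLOPS)": {"mi100": 2783.51, "V100": 2278.09, "P40": 800.246},
    "clpeak Double-Precision (GFLOPS)": {"mi100": 11439.47, "V100": 7003.99, "P40": 368.87},
    "clpeak Global Memory Bandwidth (GBPS)": {"mi100": 960.15, "V100": 769.52, "P40": 282.73},
}


def relative_to(baseline, scores):
    """Return each system's score divided by the baseline system's score.

    `relative_to` is a hypothetical helper; ratios are rounded to two decimals.
    """
    base = scores[baseline]
    return {system: round(value / base, 2) for system, value in scores.items()}


for test, scores in results.items():
    # A ratio above 1.0 means the system outperformed the P40 on this test.
    print(test, relative_to("P40", scores))
```

For "fewer is better" results such as Blender or Rodinia, the ratio would need to be inverted before it can be read as a speedup.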

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Triad

Unit: GB/s, more is better. SHOC Scalable HeterOgeneous Computing 2020-04-17.
P40: 11.80 (SE +/- 0.02, N = 3)
V100: 12.26 (SE +/- 0.00, N = 3)
mi100: 12.27 (SE +/- 0.15, N = 3)
(CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: FFT SP

Unit: GFLOPS, more is better. SHOC Scalable HeterOgeneous Computing 2020-04-17.
P40: 800.25 (SE +/- 0.45, N = 3)
V100: 2278.09 (SE +/- 7.23, N = 3)
mi100: 2783.51 (SE +/- 2.72, N = 3)
(CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: MD5 Hash

Unit: GHash/s, more is better. SHOC Scalable HeterOgeneous Computing 2020-04-17.
P40: 17.66 (SE +/- 0.00, N = 3)
V100: 31.09 (SE +/- 0.01, N = 3)
mi100: 27.89 (SE +/- 0.00, N = 3)
(CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Max SP Flops

Unit: GFLOPS, more is better. SHOC Scalable HeterOgeneous Computing 2020-04-17.
P40: 11756.0 (SE +/- 2.39, N = 3)
V100: 14052.7 (SE +/- 6.33, N = 3)
mi100: 21943033.0 (SE +/- 89939.99, N = 3)
(CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Bus Speed Download

Unit: GB/s, more is better. SHOC Scalable HeterOgeneous Computing 2020-04-17.
P40: 12.34 (SE +/- 0.00, N = 3)
V100: 12.34 (SE +/- 0.00, N = 3)
mi100: 13.67 (SE +/- 0.00, N = 3)
(CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Bus Speed Readback

Unit: GB/s, more is better. SHOC Scalable HeterOgeneous Computing 2020-04-17.
P40: 13.17 (SE +/- 0.00, N = 3)
V100: 13.17 (SE +/- 0.00, N = 3)
mi100: 14.08 (SE +/- 0.00, N = 3)
(CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Texture Read Bandwidth

Unit: GB/s, more is better. SHOC Scalable HeterOgeneous Computing 2020-04-17.
P40: 503.54 (SE +/- 0.70, N = 3)
V100: 1470.52 (SE +/- 1.76, N = 3)
mi100: 706.11 (SE +/- 0.38, N = 3)
(CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

cl-mem

Benchmark: Copy

Unit: GB/s, more is better. cl-mem 2017-01-13.
P40: 240.3 (SE +/- 0.03, N = 3)
V100: 268.5 (SE +/- 0.47, N = 3)
mi100: 286.8 (SE +/- 1.71, N = 3)
(CC) gcc options: -O2 -flto -lOpenCL

cl-mem

Benchmark: Read

Unit: GB/s, more is better. cl-mem 2017-01-13.
P40: 292.2 (SE +/- 0.12, N = 3)
V100: 780.2 (SE +/- 1.72, N = 3)
mi100: 916.8 (SE +/- 1.78, N = 3)
(CC) gcc options: -O2 -flto -lOpenCL

cl-mem

Benchmark: Write

Unit: GB/s, more is better. cl-mem 2017-01-13.
P40: 289.9 (SE +/- 0.07, N = 3)
V100: 736.7 (SE +/- 0.59, N = 3)
mi100: 730.0 (SE +/- 0.52, N = 3)
(CC) gcc options: -O2 -flto -lOpenCL

Rodinia

Test: OpenCL Myocyte

Unit: Seconds, fewer is better. Rodinia 3.1.
V100: 115.48 (SE +/- 0.98, N = 3)
mi100: 132.60 (SE +/- 3.88, N = 12)
(CXX) g++ options, as listed (the export does not make clear which string belongs to which system): -m64 -lm -lcuda -lcudart -lcudadevrt -lcudart_static -lrt -lpthread -ldl; -O2 -lOpenCL

Rodinia

Test: OpenCL Heartwall

Unit: Seconds, fewer is better. Rodinia 3.1.
V100: 2.919
mi100: 3.133
SE +/- 0.012, N = 3 (listed once; the export does not indicate which system it belongs to)
(CXX) g++ options, as listed (the export does not make clear which string belongs to which system): -m64 -lm -lcuda -lcudart -lcudadevrt -lcudart_static -lrt -lpthread -ldl; -O2 -lOpenCL

Darktable

Test: Boat - Acceleration: OpenCL

Unit: Seconds, fewer is better. Darktable 2.4.2.
mi100: 2.008 (SE +/- 0.014, N = 15)

Darktable

Test: Masskrug - Acceleration: OpenCL

Unit: Seconds, fewer is better. Darktable 2.4.2.
mi100: 5.075 (SE +/- 0.051, N = 3)

Darktable

Test: Server Rack - Acceleration: OpenCL

Unit: Seconds, fewer is better. Darktable 2.4.2.
mi100: 0.177 (SE +/- 0.005, N = 15)

Darktable

Test: Server Room - Acceleration: OpenCL

Unit: Seconds, fewer is better. Darktable 2.4.2.
mi100: 0.864 (SE +/- 0.001, N = 3)

Blender

Blend File: BMW27 - Compute: OpenCL

Unit: Seconds, fewer is better. Blender 2.92.
P40: 549.47 (SE +/- 3.70, N = 3)
V100: 1281.46 (SE +/- 2.77, N = 3)
mi100: 53.76 (SE +/- 2.10, N = 15)

clpeak

OpenCL Test: Kernel Latency

Unit: us, fewer is better. clpeak.
P40: 57.18 (SE +/- 0.14, N = 3)
V100: 5.51 (SE +/- 0.06, N = 3)
mi100: 17.87 (SE +/- 0.64, N = 12)
(CXX) g++ options: -O3 -rdynamic -lOpenCL

clpeak

OpenCL Test: Integer Compute INT

Unit: GIOPS, more is better. clpeak.
P40: 3140.69 (SE +/- 24.17, N = 15)
V100: 13899.17 (SE +/- 168.65, N = 3)
mi100: 7487.84 (SE +/- 5.18, N = 3)
(CXX) g++ options: -O3 -rdynamic -lOpenCL

clpeak

OpenCL Test: Single-Precision Float

Unit: GFLOPS, more is better. clpeak.
P40: 10130.74 (SE +/- 127.29, N = 15)
V100: 14073.61 (SE +/- 50.95, N = 3)
mi100: 22813.55 (SE +/- 6.31, N = 3)
(CXX) g++ options: -O3 -rdynamic -lOpenCL

clpeak

OpenCL Test: Double-Precision Double

Unit: GFLOPS, more is better. clpeak.
P40: 368.87 (SE +/- 0.52, N = 3)
V100: 7003.99 (SE +/- 57.57, N = 3)
mi100: 11439.47 (SE +/- 3.65, N = 3)
(CXX) g++ options: -O3 -rdynamic -lOpenCL

clpeak

OpenCL Test: Global Memory Bandwidth

Unit: GBPS, more is better. clpeak.
P40: 282.73 (SE +/- 0.32, N = 3)
V100: 769.52 (SE +/- 0.50, N = 3)
mi100: 960.15 (SE +/- 0.94, N = 3)
(CXX) g++ options: -O3 -rdynamic -lOpenCL

clpeak

OpenCL Test: Transfer Bandwidth enqueueReadBuffer

Unit: GBPS, more is better. clpeak.
P40: 5.78 (SE +/- 0.43, N = 12)
V100: 4.04 (SE +/- 0.02, N = 3)
mi100: 4.86 (SE +/- 0.05, N = 6)
(CXX) g++ options: -O3 -rdynamic -lOpenCL

clpeak

OpenCL Test: Transfer Bandwidth enqueueWriteBuffer

Unit: GBPS, more is better. clpeak.
P40: 7.54 (SE +/- 0.37, N = 15)
V100: 6.64 (SE +/- 0.19, N = 15)
mi100: 10.96 (SE +/- 1.90, N = 15)
(CXX) g++ options: -O3 -rdynamic -lOpenCL
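The SE values reported with each result are in the same unit as the result itself, so they are hard to compare across tests. One way to judge run-to-run noise is to express the SE as a percentage of the mean. The sketch below does this for the clpeak enqueueWriteBuffer figures from this export; the function name `relative_se_percent` is a hypothetical helper, not a Phoronix Test Suite API.

```python
def relative_se_percent(mean, se):
    """Express a standard error as a percentage of the reported mean."""
    return 100.0 * se / mean


# (mean, SE) pairs for clpeak Transfer Bandwidth enqueueWriteBuffer, in GBPS,
# copied from this export.
write_buffer = {
    "P40": (7.54, 0.37),
    "V100": (6.64, 0.19),
    "mi100": (10.96, 1.90),
}

for system, (mean, se) in write_buffer.items():
    print(f"{system}: {relative_se_percent(mean, se):.1f}% of mean")
```

A high percentage, as for mi100 here, suggests that small differences between systems on this test should be read with caution.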

Darktable

Test: Boat - Acceleration: OpenCL

Unit: Seconds, fewer is better. Darktable 3.0.1.
V100: 5.566 (SE +/- 0.038, N = 3)

Darktable

Test: Masskrug - Acceleration: OpenCL

Unit: Seconds, fewer is better. Darktable 3.0.1.
V100: 18.16 (SE +/- 0.18, N = 3)

Darktable

Test: Server Rack - Acceleration: OpenCL

Unit: Seconds, fewer is better. Darktable 3.0.1.
V100: 0.414 (SE +/- 0.008, N = 15)

Darktable

Test: Server Room - Acceleration: OpenCL

Unit: Seconds, fewer is better. Darktable 3.0.1.
V100: 1.810 (SE +/- 0.021, N = 15)


Phoronix Test Suite v10.8.3