2

KVM testing on Ubuntu 20.04 via the Phoronix Test Suite.

HTML result view exported from: https://openbenchmarking.org/result/2402237-NE-20860178071&grs&sro.

Result identifiers: NVIDIA A100 80GB PCIe; 14 x Intel Xeon Gold 6342 - NVIDIA A100 80GB PCIe -

Processor: 14 x Intel Xeon Gold 6342 (14 Cores)
Motherboard: Nutanix AHV (nutanix-ahv-2.20220304.0.2619.el7 BIOS)
Chipset: Intel 440FX 82441FX PMC
Memory: 4 x 16384 MB RAM
Disk: 428GB VDISK
Graphics: NVIDIA A100 80GB PCIe
Network: Red Hat Virtio device
OS: Ubuntu 20.04
Kernel: 5.4.0-172-generic (x86_64)
Display Driver: NVIDIA
OpenCL: OpenCL 3.0 CUDA 12.2.148
Vulkan: 1.3.242
Compiler: GCC 9.4.0 + CUDA 12.3
File-System: ext4
Screen Resolution: 1024x768
System Layer: KVM

Kernel Details: Transparent Huge Pages: madvise

Compiler Details: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-9-9QDOt0/gcc-9-9.4.0/debian/tmp-nvptx/usr,hsa --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v

Processor Details: CPU Microcode: 0x1

Graphics Details: NVIDIA A100 80GB PCIe: BAR1 / Visible vRAM Size: 131072 MiB - vBIOS Version: 92.00.90.00.0f

Python Details: NVIDIA A100 80GB PCIe: Python 3.8.10

Security Details: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Unknown: No mitigations + retbleed: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced IBRS IBPB: conditional RSB filling PBRSB-eIBRS: SW sequence + srbds: Not affected + tsx_async_abort: Not affected

Result overview

Test | NVIDIA A100 80GB PCIe | 14 x Intel Xeon Gold 6342 - NVIDIA A100 80GB PCIe -
ncnn: Vulkan GPU - mnasnet | 4.40 | 4.62
ncnn: Vulkan GPU-v3-v3 - mobilenet-v3 | 4.24 | 4.40
ncnn: Vulkan GPU - alexnet | 5.54 | 5.36
ncnn: Vulkan GPU - efficientnet-b0 | 5.95 | 6.12
ncnn: Vulkan GPU - shufflenet-v2 | 4.36 | 4.45
ncnn: Vulkan GPU - vision_transformer | 83.82 | 85.11
ncnn: Vulkan GPU - vgg16 | 32.12 | 31.67
ncnn: Vulkan GPU - yolov4-tiny | 22.49 | 22.20
ncnn: Vulkan GPU - regnety_400m | 11.38 | 11.51
ncnn: Vulkan GPU - squeezenet_ssd | 10.93 | 11.03
ncnn: Vulkan GPU - googlenet | 12.48 | 12.59
ncnn: Vulkan GPU - mobilenet | 14.70 | 14.79
ncnn: Vulkan GPU - resnet50 | 18.19 | 18.23
ncnn: Vulkan GPU - resnet18 | 8.00 | 8.01
blender: Pabellon Barcelona - NVIDIA OptiX | 44.99 |
blender: Barbershop - NVIDIA OptiX | 83.64 |
blender: Fishy Cat - NVIDIA OptiX | 22.36 |
blender: Classroom - NVIDIA OptiX | 20.82 |
caffe: GoogleNet - NVIDIA CUDA - 1000 | 31538.7 |
caffe: GoogleNet - NVIDIA CUDA - 200 | 6316.84 |
caffe: GoogleNet - NVIDIA CUDA - 100 | 3190.99 |
caffe: AlexNet - NVIDIA CUDA - 1000 | 8505.73 |
caffe: AlexNet - NVIDIA CUDA - 200 | 1709.23 |
caffe: AlexNet - NVIDIA CUDA - 100 | 857.692 |
gromacs: NVIDIA CUDA GPU - water_GMX50_bare | 25.613 |
viennacl: OpenCL BLAS - dGEMM-TT | 4270 |
viennacl: OpenCL BLAS - dGEMM-TN | 4220 |
viennacl: OpenCL BLAS - dGEMM-NT | 4653 |
viennacl: OpenCL BLAS - dGEMM-NN | 4243 |
viennacl: OpenCL BLAS - dGEMV-T | 245 |
viennacl: OpenCL BLAS - dGEMV-N | 68.2 |
viennacl: OpenCL BLAS - dDOT | 435 |
viennacl: OpenCL BLAS - dAXPY | 572 |
viennacl: OpenCL BLAS - dCOPY | 440 |
viennacl: OpenCL BLAS - sDOT | 227 |
viennacl: OpenCL BLAS - sAXPY | 312 |
viennacl: OpenCL BLAS - sCOPY | 232 |
viennacl: CPU BLAS - dGEMM-TT | 26.4 |
viennacl: CPU BLAS - dGEMM-TN | 25.9 |
viennacl: CPU BLAS - dGEMM-NT | 26.1 |
viennacl: CPU BLAS - dGEMM-NN | 26.3 |
viennacl: CPU BLAS - dGEMV-T | 75.8 |
viennacl: CPU BLAS - dGEMV-N | 97.1 |
viennacl: CPU BLAS - dDOT | 107 |
viennacl: CPU BLAS - dAXPY | 118 |
viennacl: CPU BLAS - dCOPY | 73.0 |
viennacl: CPU BLAS - sDOT | 84.1 |
financebench: Black-Scholes OpenCL | 1.035 |
arrayfire: Conjugate Gradient OpenCL | 1.988 |
clpeak: Global Memory Bandwidth | 1495.36 |
clpeak: Double-Precision Double | 9689.03 |
clpeak: Single-Precision Float | 19311.06 |
clpeak: Integer Compute INT | 19208.70 |
fahbench | 258.5971 |
cl-mem: Write | 1405.8 |
cl-mem: Read | 796.1 |
cl-mem: Copy | 234.8 |
shoc: OpenCL - Texture Read Bandwidth | 1582.12 |
shoc: OpenCL - Bus Speed Readback | 26.4010 |
shoc: OpenCL - Bus Speed Download | 25.3052 |
shoc: OpenCL - Max SP Flops | 19366.2 |
shoc: OpenCL - GEMM SGEMM_N | 13470.7 |
shoc: OpenCL - Reduction | 236.028 |
shoc: OpenCL - MD5 Hash | 42.7589 |
shoc: OpenCL - FFT SP | 4423.01 |
shoc: OpenCL - Triad | 24.7960 |
shoc: OpenCL - S3D | 815.726 |
mixbench: OpenCL - Single Precision | 18866.42 |
mixbench: OpenCL - Double Precision | 9542.09 |
mixbench: OpenCL - Integer | 18824.22 |
blender: BMW27 - NVIDIA OptiX | 27.52 |
ncnn: Vulkan GPU - FastestDet | 5.10 | 5.45
ncnn: Vulkan GPU - blazeface | 1.57 | 1.65
ncnn: Vulkan GPU-v2-v2 - mobilenet-v2 | 4.99 | 5.22
viennacl: CPU BLAS - sAXPY | 178 |
viennacl: CPU BLAS - sCOPY | 147.4 |
hashcat: MD5 | |

NCNN

Target: Vulkan GPU - Model: mnasnet

NCNN 20230517 - ms, Fewer Is Better
Result (14 x Intel Xeon Gold 6342 - NVIDIA A100 80GB PCIe -): 4.62, SE +/- 0.07, N = 15, MIN: 4.03 / MAX: 5.52
Result (NVIDIA A100 80GB PCIe): 4.40, SE +/- 0.07, N = 14, MIN: 3.88 / MAX: 5.28
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread -pthread

NCNN

Target: Vulkan GPU-v3-v3 - Model: mobilenet-v3

NCNN 20230517 - ms, Fewer Is Better
Result (14 x Intel Xeon Gold 6342 - NVIDIA A100 80GB PCIe -): 4.40, SE +/- 0.05, N = 15, MIN: 3.99 / MAX: 5.34
Result (NVIDIA A100 80GB PCIe): 4.24, SE +/- 0.05, N = 15, MIN: 3.74 / MAX: 4.84
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread -pthread

NCNN

Target: Vulkan GPU - Model: alexnet

NCNN 20230517 - ms, Fewer Is Better
Result (14 x Intel Xeon Gold 6342 - NVIDIA A100 80GB PCIe -): 5.36, SE +/- 0.05, N = 15, MIN: 4.8 / MAX: 6.09
Result (NVIDIA A100 80GB PCIe): 5.54, SE +/- 0.04, N = 15, MIN: 5.23 / MAX: 8.32
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread -pthread

NCNN

Target: Vulkan GPU - Model: efficientnet-b0

NCNN 20230517 - ms, Fewer Is Better
Result (14 x Intel Xeon Gold 6342 - NVIDIA A100 80GB PCIe -): 6.12, SE +/- 0.06, N = 15, MIN: 5.57 / MAX: 6.88
Result (NVIDIA A100 80GB PCIe): 5.95, SE +/- 0.04, N = 15, MIN: 5.48 / MAX: 6.65
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread -pthread

NCNN

Target: Vulkan GPU - Model: shufflenet-v2

NCNN 20230517 - ms, Fewer Is Better
Result (14 x Intel Xeon Gold 6342 - NVIDIA A100 80GB PCIe -): 4.45, SE +/- 0.04, N = 14, MIN: 4.05 / MAX: 5.16
Result (NVIDIA A100 80GB PCIe): 4.36, SE +/- 0.04, N = 15, MIN: 3.94 / MAX: 4.8
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread -pthread

NCNN

Target: Vulkan GPU - Model: vision_transformer

NCNN 20230517 - ms, Fewer Is Better
Result (14 x Intel Xeon Gold 6342 - NVIDIA A100 80GB PCIe -): 85.11, SE +/- 0.43, N = 15, MIN: 80.72 / MAX: 98.92
Result (NVIDIA A100 80GB PCIe): 83.82, SE +/- 0.54, N = 15, MIN: 80.45 / MAX: 98.96
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread -pthread

NCNN

Target: Vulkan GPU - Model: vgg16

NCNN 20230517 - ms, Fewer Is Better
Result (14 x Intel Xeon Gold 6342 - NVIDIA A100 80GB PCIe -): 31.67, SE +/- 0.13, N = 15, MIN: 29.72 / MAX: 41.37
Result (NVIDIA A100 80GB PCIe): 32.12, SE +/- 0.20, N = 15, MIN: 30.68 / MAX: 41.36
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread -pthread

NCNN

Target: Vulkan GPU - Model: yolov4-tiny

NCNN 20230517 - ms, Fewer Is Better
Result (14 x Intel Xeon Gold 6342 - NVIDIA A100 80GB PCIe -): 22.20, SE +/- 0.17, N = 15, MIN: 20.51 / MAX: 26.86
Result (NVIDIA A100 80GB PCIe): 22.49, SE +/- 0.06, N = 15, MIN: 21.4 / MAX: 30.45
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread -pthread

NCNN

Target: Vulkan GPU - Model: regnety_400m

NCNN 20230517 - ms, Fewer Is Better
Result (14 x Intel Xeon Gold 6342 - NVIDIA A100 80GB PCIe -): 11.51, SE +/- 0.12, N = 15, MIN: 10.62 / MAX: 12.43
Result (NVIDIA A100 80GB PCIe): 11.38, SE +/- 0.06, N = 15, MIN: 10.87 / MAX: 14.11
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread -pthread

NCNN

Target: Vulkan GPU - Model: squeezenet_ssd

NCNN 20230517 - ms, Fewer Is Better
Result (14 x Intel Xeon Gold 6342 - NVIDIA A100 80GB PCIe -): 11.03, SE +/- 0.12, N = 15, MIN: 10.02 / MAX: 13.51
Result (NVIDIA A100 80GB PCIe): 10.93, SE +/- 0.08, N = 15, MIN: 10.12 / MAX: 11.68
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread -pthread

NCNN

Target: Vulkan GPU - Model: googlenet

NCNN 20230517 - ms, Fewer Is Better
Result (14 x Intel Xeon Gold 6342 - NVIDIA A100 80GB PCIe -): 12.59, SE +/- 0.19, N = 15, MIN: 11.13 / MAX: 14.43
Result (NVIDIA A100 80GB PCIe): 12.48, SE +/- 0.12, N = 15, MIN: 11.64 / MAX: 14.98
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread -pthread

NCNN

Target: Vulkan GPU - Model: mobilenet

NCNN 20230517 - ms, Fewer Is Better
Result (14 x Intel Xeon Gold 6342 - NVIDIA A100 80GB PCIe -): 14.79, SE +/- 0.20, N = 15, MIN: 13.45 / MAX: 16.57
Result (NVIDIA A100 80GB PCIe): 14.70, SE +/- 0.10, N = 15, MIN: 13.6 / MAX: 50.6
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread -pthread

NCNN

Target: Vulkan GPU - Model: resnet50

NCNN 20230517 - ms, Fewer Is Better
Result (14 x Intel Xeon Gold 6342 - NVIDIA A100 80GB PCIe -): 18.23, SE +/- 0.13, N = 15, MIN: 17.21 / MAX: 33.45
Result (NVIDIA A100 80GB PCIe): 18.19, SE +/- 0.11, N = 15, MIN: 17.46 / MAX: 19.77
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread -pthread

NCNN

Target: Vulkan GPU - Model: resnet18

NCNN 20230517 - ms, Fewer Is Better
Result (14 x Intel Xeon Gold 6342 - NVIDIA A100 80GB PCIe -): 8.01, SE +/- 0.08, N = 15, MIN: 7.25 / MAX: 10.22
Result (NVIDIA A100 80GB PCIe): 8.00, SE +/- 0.08, N = 15, MIN: 7.36 / MAX: 13.06
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread -pthread
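
The NCNN means above differ by only a few percent between the two result identifiers, so it can help to weigh each gap against the reported standard errors. The following Python sketch does that arithmetic for three of the models listed above. The helper name `compare` and the rule of thumb of dividing the gap by the combined standard error are illustrative assumptions, not something the Phoronix Test Suite reports.

```python
import math

# (mean in ms, standard error) copied from the NCNN results above.
# First tuple:  "14 x Intel Xeon Gold 6342 - NVIDIA A100 80GB PCIe -"
# Second tuple: "NVIDIA A100 80GB PCIe"
results = {
    "mnasnet": ((4.62, 0.07), (4.40, 0.07)),
    "vgg16": ((31.67, 0.13), (32.12, 0.20)),
    "resnet18": ((8.01, 0.08), (8.00, 0.08)),
}


def compare(a, b):
    """Return (percent difference of a relative to b, gap in combined-SE units).

    Hypothetical helper: the combined standard error is taken as
    sqrt(se_a^2 + se_b^2), which assumes the two runs are independent.
    """
    (mean_a, se_a), (mean_b, se_b) = a, b
    pct = (mean_a - mean_b) / mean_b * 100.0
    combined_se = math.sqrt(se_a ** 2 + se_b ** 2)
    return pct, abs(mean_a - mean_b) / combined_se


for model, (a, b) in results.items():
    pct, gap = compare(a, b)
    print(f"{model}: {pct:+.1f}% ({gap:.1f} combined SE)")
```

A gap well below one combined standard error, as with resnet18, suggests the two runs are hard to tell apart on that model; a gap of two or more is more likely to reflect a real difference.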

Blender

Blend File: Pabellon Barcelona - Compute: NVIDIA OptiX

Blender 4.0 - Seconds, Fewer Is Better
Result (NVIDIA A100 80GB PCIe): 44.99, SE +/- 0.02, N = 3

Blender

Blend File: Barbershop - Compute: NVIDIA OptiX

Blender 4.0 - Seconds, Fewer Is Better
Result (NVIDIA A100 80GB PCIe): 83.64, SE +/- 0.14, N = 3

Blender

Blend File: Fishy Cat - Compute: NVIDIA OptiX

Blender 4.0 - Seconds, Fewer Is Better
Result (NVIDIA A100 80GB PCIe): 22.36, SE +/- 0.20, N = 8

Blender

Blend File: Classroom - Compute: NVIDIA OptiX

Blender 4.0 - Seconds, Fewer Is Better
Result (NVIDIA A100 80GB PCIe): 20.82, SE +/- 0.02, N = 3

Caffe

Model: GoogleNet - Acceleration: NVIDIA CUDA - Iterations: 1000

Caffe 2020-02-13 - Milli-Seconds, Fewer Is Better
Result (NVIDIA A100 80GB PCIe): 31538.7, SE +/- 22.82, N = 3
(CXX) g++ options: -fPIC -O3 -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas

Caffe

Model: GoogleNet - Acceleration: NVIDIA CUDA - Iterations: 200

Caffe 2020-02-13 - Milli-Seconds, Fewer Is Better
Result (NVIDIA A100 80GB PCIe): 6316.84, SE +/- 3.90, N = 3
(CXX) g++ options: -fPIC -O3 -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas

Caffe

Model: GoogleNet - Acceleration: NVIDIA CUDA - Iterations: 100

Caffe 2020-02-13 - Milli-Seconds, Fewer Is Better
Result (NVIDIA A100 80GB PCIe): 3190.99, SE +/- 11.28, N = 3
(CXX) g++ options: -fPIC -O3 -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas

Caffe

Model: AlexNet - Acceleration: NVIDIA CUDA - Iterations: 1000

Caffe 2020-02-13 - Milli-Seconds, Fewer Is Better
Result (NVIDIA A100 80GB PCIe): 8505.73, SE +/- 15.42, N = 3
(CXX) g++ options: -fPIC -O3 -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas

Caffe

Model: AlexNet - Acceleration: NVIDIA CUDA - Iterations: 200

Caffe 2020-02-13 - Milli-Seconds, Fewer Is Better
Result (NVIDIA A100 80GB PCIe): 1709.23, SE +/- 2.64, N = 3
(CXX) g++ options: -fPIC -O3 -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas

Caffe

Model: AlexNet - Acceleration: NVIDIA CUDA - Iterations: 100

Caffe 2020-02-13 - Milli-Seconds, Fewer Is Better
Result (NVIDIA A100 80GB PCIe): 857.69, SE +/- 0.45, N = 3
(CXX) g++ options: -fPIC -O3 -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas
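
The Caffe results above are total times for 100, 200 and 1000 iterations, which makes them awkward to compare directly. Dividing each total by its iteration count gives a per-iteration cost. The Python sketch below does this with the values from this page; the function name `ms_per_iteration` is a hypothetical helper, and the calculation assumes the reported figure is the total time for the stated number of iterations.

```python
# Total Caffe time in milliseconds, copied from the results above,
# keyed by (model, iteration count).
caffe_total_ms = {
    ("GoogleNet", 1000): 31538.7,
    ("GoogleNet", 200): 6316.84,
    ("GoogleNet", 100): 3190.99,
    ("AlexNet", 1000): 8505.73,
    ("AlexNet", 200): 1709.23,
    ("AlexNet", 100): 857.69,
}


def ms_per_iteration(total_ms, iterations):
    """Hypothetical helper: average time of one iteration in milliseconds."""
    return total_ms / iterations


per_iter = {
    key: ms_per_iteration(total, key[1]) for key, total in caffe_total_ms.items()
}

for (model, iterations), ms in per_iter.items():
    print(f"{model} x{iterations}: {ms:.2f} ms/iteration")
```

If the per-iteration figures for one model come out close to each other, the total time is scaling roughly linearly with iteration count and fixed startup cost is small by comparison.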

GROMACS

Implementation: NVIDIA CUDA GPU - Input: water_GMX50_bare

GROMACS 2024 - Ns Per Day, More Is Better
Result (NVIDIA A100 80GB PCIe): 25.61, SE +/- 0.03, N = 3
(CXX) g++ options: -O3 -lm

ViennaCL

Test: OpenCL BLAS - dGEMM-TT

ViennaCL 1.7.1 - GFLOPs/s, More Is Better
Result (NVIDIA A100 80GB PCIe): 4270
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMM-TN

ViennaCL 1.7.1 - GFLOPs/s, More Is Better
Result (NVIDIA A100 80GB PCIe): 4220
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMM-NT

ViennaCL 1.7.1 - GFLOPs/s, More Is Better
Result (NVIDIA A100 80GB PCIe): 4653, SE +/- 3.33, N = 3
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMM-NN

ViennaCL 1.7.1 - GFLOPs/s, More Is Better
Result (NVIDIA A100 80GB PCIe): 4243, SE +/- 3.33, N = 3
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMV-T

ViennaCL 1.7.1 - GB/s, More Is Better
Result (NVIDIA A100 80GB PCIe): 245
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMV-N

ViennaCL 1.7.1 - GB/s, More Is Better
Result (NVIDIA A100 80GB PCIe): 68.2, SE +/- 0.03, N = 3
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dDOT

ViennaCL 1.7.1 - GB/s, More Is Better
Result (NVIDIA A100 80GB PCIe): 435
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dAXPY

ViennaCL 1.7.1 - GB/s, More Is Better
Result (NVIDIA A100 80GB PCIe): 572
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dCOPY

ViennaCL 1.7.1 - GB/s, More Is Better
Result (NVIDIA A100 80GB PCIe): 440
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - sDOT

ViennaCL 1.7.1 - GB/s, More Is Better
Result (NVIDIA A100 80GB PCIe): 227
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - sAXPY

ViennaCL 1.7.1 - GB/s, More Is Better
Result (NVIDIA A100 80GB PCIe): 312
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - sCOPY

ViennaCL 1.7.1 - GB/s, More Is Better
Result (NVIDIA A100 80GB PCIe): 232
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dGEMM-TT

ViennaCL 1.7.1 - GFLOPs/s, More Is Better
Result (NVIDIA A100 80GB PCIe): 26.4, SE +/- 0.09, N = 15
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dGEMM-TN

ViennaCL 1.7.1 - GFLOPs/s, More Is Better
Result (NVIDIA A100 80GB PCIe): 25.9, SE +/- 0.08, N = 14
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dGEMM-NT

ViennaCL 1.7.1 - GFLOPs/s, More Is Better
Result (NVIDIA A100 80GB PCIe): 26.1, SE +/- 0.08, N = 15
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dGEMM-NN

ViennaCL 1.7.1 - GFLOPs/s, More Is Better
Result (NVIDIA A100 80GB PCIe): 26.3, SE +/- 0.09, N = 15
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dGEMV-T

ViennaCL 1.7.1 - GB/s, More Is Better
Result (NVIDIA A100 80GB PCIe): 75.8, SE +/- 0.26, N = 15
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dGEMV-N

ViennaCL 1.7.1 - GB/s, More Is Better
Result (NVIDIA A100 80GB PCIe): 97.1, SE +/- 0.78, N = 15
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dDOT

ViennaCL 1.7.1 - GB/s, More Is Better
Result (NVIDIA A100 80GB PCIe): 107, SE +/- 1.25, N = 15
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dAXPY

ViennaCL 1.7.1 - GB/s, More Is Better
Result (NVIDIA A100 80GB PCIe): 118, SE +/- 1.03, N = 15
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dCOPY

ViennaCL 1.7.1 - GB/s, More Is Better
Result (NVIDIA A100 80GB PCIe): 73.0, SE +/- 0.65, N = 15
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - sDOT

ViennaCL 1.7.1 - GB/s, More Is Better
Result (NVIDIA A100 80GB PCIe): 84.1, SE +/- 0.26, N = 15
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
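
The ViennaCL tests above appear twice, once as OpenCL BLAS and once as CPU BLAS, so each operation can be expressed as a ratio between the two. The Python sketch below computes those ratios from the values on this page. It assumes the OpenCL BLAS rows ran on the A100 (the system details list only NVIDIA's OpenCL 3.0 CUDA platform) and the CPU BLAS rows ran on the Xeon vCPUs; the dictionary names are illustrative.

```python
# ViennaCL 1.7.1 results copied from above: test -> (OpenCL BLAS, CPU BLAS).
# dGEMM rows are in GFLOPs/s; the remaining rows are in GB/s.
viennacl = {
    "dGEMM-TT": (4270, 26.4),
    "dGEMM-TN": (4220, 25.9),
    "dGEMM-NT": (4653, 26.1),
    "dGEMM-NN": (4243, 26.3),
    "dGEMV-T": (245, 75.8),
    "dGEMV-N": (68.2, 97.1),
    "dDOT": (435, 107),
    "dAXPY": (572, 118),
    "dCOPY": (440, 73.0),
    "sDOT": (227, 84.1),
}

# Ratio above 1 means the OpenCL BLAS run was faster than the CPU BLAS run.
ratios = {test: opencl / cpu for test, (opencl, cpu) in viennacl.items()}

for test, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(f"{test}: {ratio:.1f}x")
```

Sorting by ratio makes the outliers easy to spot: the compute-bound dGEMM cases sit far above the bandwidth-bound vector operations, and dGEMV-N is the one case here where the CPU BLAS figure is higher.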

FinanceBench

Benchmark: Black-Scholes OpenCL

FinanceBench 2016-07-25 - ms, Fewer Is Better
Result (NVIDIA A100 80GB PCIe): 1.035, SE +/- 0.009, N = 6
(CXX) g++ options: -O3 -march=native -fopenmp

ArrayFire

Test: Conjugate Gradient OpenCL

ArrayFire 3.9 - ms, Fewer Is Better
Result (NVIDIA A100 80GB PCIe): 1.988, SE +/- 0.006, N = 3
(CXX) g++ options: -O3

clpeak

OpenCL Test: Global Memory Bandwidth

clpeak 1.1.2 - GBPS, More Is Better
Result (NVIDIA A100 80GB PCIe): 1495.36, SE +/- 0.08, N = 3
(CXX) g++ options: -O3

clpeak

OpenCL Test: Double-Precision Double

clpeak 1.1.2 - GFLOPS, More Is Better
Result (NVIDIA A100 80GB PCIe): 9689.03, SE +/- 3.43, N = 3
(CXX) g++ options: -O3

clpeak

OpenCL Test: Single-Precision Float

clpeak 1.1.2 - GFLOPS, More Is Better
Result (NVIDIA A100 80GB PCIe): 19311.06, SE +/- 10.20, N = 3
(CXX) g++ options: -O3

clpeak

OpenCL Test: Integer Compute INT

clpeak 1.1.2 - GIOPS, More Is Better
Result (NVIDIA A100 80GB PCIe): 19208.70, SE +/- 22.16, N = 3
(CXX) g++ options: -O3

FAHBench

FAHBench 2.3.2 - Ns Per Day, More Is Better
Result (NVIDIA A100 80GB PCIe): 258.60, SE +/- 0.28, N = 3

cl-mem

Benchmark: Write

cl-mem 2017-01-13 - GB/s, More Is Better
Result (NVIDIA A100 80GB PCIe): 1405.8, SE +/- 0.65, N = 3
(CC) gcc options: -O2 -flto -lOpenCL

cl-mem

Benchmark: Read

cl-mem 2017-01-13 - GB/s, More Is Better
Result (NVIDIA A100 80GB PCIe): 796.1, SE +/- 0.32, N = 3
(CC) gcc options: -O2 -flto -lOpenCL

cl-mem

Benchmark: Copy

cl-mem 2017-01-13 - GB/s, More Is Better
Result (NVIDIA A100 80GB PCIe): 234.8, SE +/- 0.03, N = 3
(CC) gcc options: -O2 -flto -lOpenCL

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Texture Read Bandwidth

SHOC Scalable HeterOgeneous Computing 2020-04-17 - GB/s, More Is Better
Result (NVIDIA A100 80GB PCIe): 1582.12, SE +/- 0.32, N = 3
(CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Bus Speed Readback

SHOC Scalable HeterOgeneous Computing 2020-04-17 - GB/s, More Is Better
Result (NVIDIA A100 80GB PCIe): 26.40, SE +/- 0.00, N = 3
(CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Bus Speed Download

SHOC Scalable HeterOgeneous Computing 2020-04-17 - GB/s, More Is Better
Result (NVIDIA A100 80GB PCIe): 25.31, SE +/- 0.00, N = 3
(CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Max SP Flops

SHOC Scalable HeterOgeneous Computing 2020-04-17 - GFLOPS, More Is Better
Result (NVIDIA A100 80GB PCIe): 19366.2, SE +/- 4.11, N = 3
(CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: GEMM SGEMM_N

SHOC Scalable HeterOgeneous Computing 2020-04-17 - GFLOPS, More Is Better
Result (NVIDIA A100 80GB PCIe): 13470.7, SE +/- 2.22, N = 3
(CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Reduction

SHOC Scalable HeterOgeneous Computing 2020-04-17 - GB/s, More Is Better
Result (NVIDIA A100 80GB PCIe): 236.03, SE +/- 2.10, N = 3
(CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: MD5 Hash

SHOC Scalable HeterOgeneous Computing 2020-04-17 - GHash/s, More Is Better
Result (NVIDIA A100 80GB PCIe): 42.76, SE +/- 0.00, N = 3
(CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: FFT SP

SHOC Scalable HeterOgeneous Computing 2020-04-17 - GFLOPS, More Is Better
Result (NVIDIA A100 80GB PCIe): 4423.01, SE +/- 7.20, N = 3
(CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Triad

SHOC Scalable HeterOgeneous Computing 2020-04-17 - GB/s, More Is Better
Result (NVIDIA A100 80GB PCIe): 24.80, SE +/- 0.02, N = 3
(CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: S3D

SHOC Scalable HeterOgeneous Computing 2020-04-17 - GFLOPS, More Is Better
Result (NVIDIA A100 80GB PCIe): 815.73, SE +/- 2.56, N = 3
(CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

Mixbench

Backend: OpenCL - Benchmark: Single Precision

Mixbench 2020-06-23 - GFLOPS, More Is Better
Result (NVIDIA A100 80GB PCIe): 18866.42, SE +/- 0.00, N = 3
(CXX) g++ options: -lm -lstdc++ -lOpenCL -lrt -O2

Mixbench

Backend: OpenCL - Benchmark: Double Precision

Mixbench 2020-06-23 - GFLOPS, More Is Better
Result (NVIDIA A100 80GB PCIe): 9542.09, SE +/- 1.80, N = 3
(CXX) g++ options: -lm -lstdc++ -lOpenCL -lrt -O2

Mixbench

Backend: OpenCL - Benchmark: Integer

Mixbench 2020-06-23 - GIOPS, More Is Better
Result (NVIDIA A100 80GB PCIe): 18824.22, SE +/- 21.05, N = 3
(CXX) g++ options: -lm -lstdc++ -lOpenCL -lrt -O2

Blender

Blend File: BMW27 - Compute: NVIDIA OptiX

Blender 4.0 - Seconds, Fewer Is Better
Result (NVIDIA A100 80GB PCIe): 27.52, SE +/- 16.82, N = 14
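
The BMW27 standard error is far larger relative to its mean than for the other Blender scenes on this page, which hints at very uneven run times. One way to make that concrete is to convert the standard error back into a sample standard deviation. The Python sketch below assumes the reported SE is the usual standard error of the mean (SD divided by the square root of N); the helper name `sample_sd_from_se` is hypothetical.

```python
import math


def sample_sd_from_se(se, n):
    """Invert SE = SD / sqrt(N) to recover the sample standard deviation."""
    return se * math.sqrt(n)


# (mean in seconds, SE, N) copied from the Blender results on this page.
blender = {
    "BMW27": (27.52, 16.82, 14),
    "Classroom": (20.82, 0.02, 3),
}

for scene, (mean, se, n) in blender.items():
    sd = sample_sd_from_se(se, n)
    print(f"{scene}: SD ~ {sd:.2f} s, {sd / mean:.1%} of the mean")
```

A standard deviation larger than the mean, as this calculation gives for BMW27, is typically a sign of one or more outlier runs, so that result is best treated with caution.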

NCNN

Target: Vulkan GPU - Model: FastestDet

NCNN 20230517 - ms, Fewer Is Better
Result (14 x Intel Xeon Gold 6342 - NVIDIA A100 80GB PCIe -): 5.45, SE +/- 0.13, N = 15, MIN: 4.21 / MAX: 7.84
Result (NVIDIA A100 80GB PCIe): 5.10, SE +/- 0.16, N = 14, MIN: 3.96 / MAX: 7.17
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread -pthread

NCNN

Target: Vulkan GPU - Model: blazeface

NCNN 20230517 - ms, Fewer Is Better
Result (14 x Intel Xeon Gold 6342 - NVIDIA A100 80GB PCIe -): 1.65, SE +/- 0.03, N = 15, MIN: 1.4 / MAX: 2.22
Result (NVIDIA A100 80GB PCIe): 1.57, SE +/- 0.03, N = 15, MIN: 1.39 / MAX: 1.85
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread -pthread

NCNN

Target: Vulkan GPU-v2-v2 - Model: mobilenet-v2

NCNN 20230517 - ms, Fewer Is Better
Result (14 x Intel Xeon Gold 6342 - NVIDIA A100 80GB PCIe -): 5.22, SE +/- 0.10, N = 15, MIN: 4.03 / MAX: 13.71
Result (NVIDIA A100 80GB PCIe): 4.99, SE +/- 0.07, N = 15, MIN: 4.53 / MAX: 6.41
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread -pthread

ViennaCL

Test: CPU BLAS - sAXPY

ViennaCL 1.7.1 - GB/s, More Is Better
Result (NVIDIA A100 80GB PCIe): 178, SE +/- 5.09, N = 14
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - sCOPY

ViennaCL 1.7.1 - GB/s, More Is Better
Result (NVIDIA A100 80GB PCIe): 147.4, SE +/- 11.32, N = 15
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL


Phoronix Test Suite v10.8.5