KVM testing on Ubuntu 20.04 via the Phoronix Test Suite.
Compare your own system(s) to this result file with the Phoronix Test Suite by running the command:
phoronix-test-suite benchmark 2402237-NE-20860178071
HTML result view exported from: https://openbenchmarking.org/result/2402237-NE-20860178071&grw&sro&rro .
Result identifiers in this file:
  NVIDIA A100 80GB PCIe
  14 x Intel Xeon Gold 6342 - NVIDIA A100 80GB PCIe -

System configuration as listed in the export:
  Processor:          14 x Intel Xeon Gold 6342 (14 Cores)
  Motherboard:        Nutanix AHV (nutanix-ahv-2.20220304.0.2619.el7 BIOS)
  Chipset:            Intel 440FX 82441FX PMC
  Memory:             4 x 16384 MB RAM
  Disk:               428GB VDISK
  Graphics:           NVIDIA A100 80GB PCIe
  Network:            Red Hat Virtio device
  OS:                 Ubuntu 20.04
  Kernel:             5.4.0-172-generic (x86_64)
  Display Driver:     NVIDIA
  OpenCL:             OpenCL 3.0 CUDA 12.2.148
  Vulkan:             1.3.242
  Compiler:           GCC 9.4.0 + CUDA 12.3
  File-System:        ext4
  Screen Resolution:  1024x768
  System Layer:       KVM

Kernel Details - Transparent Huge Pages: madvise

Compiler Details - --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-9-9QDOt0/gcc-9-9.4.0/debian/tmp-nvptx/usr,hsa --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v

Processor Details - CPU Microcode: 0x1

Graphics Details - NVIDIA A100 80GB PCIe: BAR1 / Visible vRAM Size: 131072 MiB - vBIOS Version: 92.00.90.00.0f

Python Details - NVIDIA A100 80GB PCIe: Python 3.8.10

Security Details:
  gather_data_sampling: Not affected
  itlb_multihit: Not affected
  l1tf: Not affected
  mds: Not affected
  meltdown: Not affected
  mmio_stale_data: Unknown: No mitigations
  retbleed: Not affected
  spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp
  spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
  spectre_v2: Mitigation of Enhanced IBRS IBPB: conditional RSB filling PBRSB-eIBRS: SW sequence
  srbds: Not affected
  tsx_async_abort: Not affected
Result overview. Column [1] is the result identifier "NVIDIA A100 80GB PCIe"; column [2] is "14 x Intel Xeon Gold 6342 - NVIDIA A100 80GB PCIe -", which contains NCNN results only. Units and direction (more or fewer is better) are given in the per-test entries that follow.

Test                                            [1]         [2]
caffe: AlexNet - NVIDIA CUDA - 100              857.692
caffe: AlexNet - NVIDIA CUDA - 200              1709.23
caffe: AlexNet - NVIDIA CUDA - 1000             8505.73
caffe: GoogleNet - NVIDIA CUDA - 100            3190.99
caffe: GoogleNet - NVIDIA CUDA - 200            6316.84
caffe: GoogleNet - NVIDIA CUDA - 1000           31538.7
shoc: OpenCL - S3D                              815.726
shoc: OpenCL - Triad                            24.7960
shoc: OpenCL - FFT SP                           4423.01
shoc: OpenCL - MD5 Hash                         42.7589
shoc: OpenCL - Reduction                        236.028
shoc: OpenCL - GEMM SGEMM_N                     13470.7
shoc: OpenCL - Max SP Flops                     19366.2
shoc: OpenCL - Bus Speed Download               25.3052
shoc: OpenCL - Bus Speed Readback               26.4010
shoc: OpenCL - Texture Read Bandwidth           1582.12
ncnn: Vulkan GPU - mobilenet                    14.70       14.79
ncnn: Vulkan GPU-v2-v2 - mobilenet-v2           4.99        5.22
ncnn: Vulkan GPU-v3-v3 - mobilenet-v3           4.24        4.40
ncnn: Vulkan GPU - shufflenet-v2                4.36        4.45
ncnn: Vulkan GPU - efficientnet-b0              5.95        6.12
ncnn: Vulkan GPU - blazeface                    1.57        1.65
ncnn: Vulkan GPU - googlenet                    12.48       12.59
ncnn: Vulkan GPU - vgg16                        32.12       31.67
ncnn: Vulkan GPU - resnet18                     8.00        8.01
ncnn: Vulkan GPU - alexnet                      5.54        5.36
ncnn: Vulkan GPU - resnet50                     18.19       18.23
ncnn: Vulkan GPU - yolov4-tiny                  22.49       22.20
ncnn: Vulkan GPU - squeezenet_ssd               10.93       11.03
ncnn: Vulkan GPU - regnety_400m                 11.38       11.51
ncnn: Vulkan GPU - vision_transformer           83.82       85.11
ncnn: Vulkan GPU - FastestDet                   5.10        5.45
ncnn: Vulkan GPU - mnasnet                      4.40        4.62
gromacs: NVIDIA CUDA GPU - water_GMX50_bare     25.613
arrayfire: Conjugate Gradient OpenCL            1.988
blender: BMW27 - NVIDIA OptiX                   27.52
blender: Classroom - NVIDIA OptiX               20.82
blender: Fishy Cat - NVIDIA OptiX               22.36
blender: Barbershop - NVIDIA OptiX              83.64
blender: Pabellon Barcelona - NVIDIA OptiX      44.99
fahbench:                                       258.5971
mixbench: OpenCL - Integer                      18824.22
mixbench: OpenCL - Double Precision             9542.09
mixbench: OpenCL - Single Precision             18866.42
financebench: Black-Scholes OpenCL              1.035
cl-mem: Copy                                    234.8
cl-mem: Read                                    796.1
cl-mem: Write                                   1405.8
clpeak: Integer Compute INT                     19208.70
clpeak: Single-Precision Float                  19311.06
clpeak: Double-Precision Double                 9689.03
clpeak: Global Memory Bandwidth                 1495.36
viennacl: CPU BLAS - sCOPY                      147.4
viennacl: CPU BLAS - sAXPY                      178
viennacl: CPU BLAS - sDOT                       84.1
viennacl: CPU BLAS - dCOPY                      73.0
viennacl: CPU BLAS - dAXPY                      118
viennacl: CPU BLAS - dDOT                       107
viennacl: CPU BLAS - dGEMV-N                    97.1
viennacl: CPU BLAS - dGEMV-T                    75.8
viennacl: CPU BLAS - dGEMM-NN                   26.3
viennacl: CPU BLAS - dGEMM-NT                   26.1
viennacl: CPU BLAS - dGEMM-TN                   25.9
viennacl: CPU BLAS - dGEMM-TT                   26.4
viennacl: OpenCL BLAS - sCOPY                   232
viennacl: OpenCL BLAS - sAXPY                   312
viennacl: OpenCL BLAS - sDOT                    227
viennacl: OpenCL BLAS - dCOPY                   440
viennacl: OpenCL BLAS - dAXPY                   572
viennacl: OpenCL BLAS - dDOT                    435
viennacl: OpenCL BLAS - dGEMV-N                 68.2
viennacl: OpenCL BLAS - dGEMV-T                 245
viennacl: OpenCL BLAS - dGEMM-NN                4243
viennacl: OpenCL BLAS - dGEMM-NT                4653
viennacl: OpenCL BLAS - dGEMM-TN                4220
viennacl: OpenCL BLAS - dGEMM-TT                4270
Caffe 2020-02-13
Milli-Seconds, Fewer Is Better. Result identifier: NVIDIA A100 80GB PCIe.
  Model: AlexNet - Acceleration: NVIDIA CUDA - Iterations: 100       857.69    SE +/- 0.45, N = 3
  Model: AlexNet - Acceleration: NVIDIA CUDA - Iterations: 200       1709.23   SE +/- 2.64, N = 3
  Model: AlexNet - Acceleration: NVIDIA CUDA - Iterations: 1000      8505.73   SE +/- 15.42, N = 3
  Model: GoogleNet - Acceleration: NVIDIA CUDA - Iterations: 100     3190.99   SE +/- 11.28, N = 3
  Model: GoogleNet - Acceleration: NVIDIA CUDA - Iterations: 200     6316.84   SE +/- 3.90, N = 3
  Model: GoogleNet - Acceleration: NVIDIA CUDA - Iterations: 1000    31538.7   SE +/- 22.82, N = 3
1. (CXX) g++ options: -fPIC -O3 -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas
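The Caffe figures above grow roughly in proportion to the iteration count, so a per-iteration view can make the three runs of each model easier to compare. The following is a minimal sketch, assuming each reported value is the total time in milliseconds for the stated number of iterations (the export does not say so explicitly). The dictionary and function names are illustrative, not part of the Phoronix Test Suite.

```python
# Values copied from the Caffe results above (milliseconds).
# Assumption: each figure is the total time for the stated iteration count.
caffe_ms = {
    ("AlexNet", 100): 857.69,
    ("AlexNet", 200): 1709.23,
    ("AlexNet", 1000): 8505.73,
    ("GoogleNet", 100): 3190.99,
    ("GoogleNet", 200): 6316.84,
    ("GoogleNet", 1000): 31538.7,
}


def ms_per_iteration(results):
    """Divide each total by its iteration count (second element of the key)."""
    return {key: total / key[1] for key, total in results.items()}


per_iter = ms_per_iteration(caffe_ms)
for (model, iterations), value in per_iter.items():
    print(f"{model:9s} {iterations:5d} iterations: {value:.2f} ms/iteration")
```

Under that assumption, the per-iteration cost stays nearly constant within each model, which would suggest fixed startup overhead is small relative to the measured work.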
SHOC Scalable HeterOgeneous Computing 2020-04-17
More Is Better. Result identifier: NVIDIA A100 80GB PCIe.
  Target: OpenCL - Benchmark: S3D                       815.73 GFLOPS     SE +/- 2.56, N = 3
  Target: OpenCL - Benchmark: Triad                     24.80 GB/s        SE +/- 0.02, N = 3
  Target: OpenCL - Benchmark: FFT SP                    4423.01 GFLOPS    SE +/- 7.20, N = 3
  Target: OpenCL - Benchmark: MD5 Hash                  42.76 GHash/s     SE +/- 0.00, N = 3
  Target: OpenCL - Benchmark: Reduction                 236.03 GB/s       SE +/- 2.10, N = 3
  Target: OpenCL - Benchmark: GEMM SGEMM_N              13470.7 GFLOPS    SE +/- 2.22, N = 3
  Target: OpenCL - Benchmark: Max SP Flops              19366.2 GFLOPS    SE +/- 4.11, N = 3
  Target: OpenCL - Benchmark: Bus Speed Download        25.31 GB/s        SE +/- 0.00, N = 3
  Target: OpenCL - Benchmark: Bus Speed Readback        26.40 GB/s        SE +/- 0.00, N = 3
  Target: OpenCL - Benchmark: Texture Read Bandwidth    1582.12 GB/s      SE +/- 0.32, N = 3
1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi
NCNN 20230517
ms, Fewer Is Better. Results are listed in the export's order: [1] NVIDIA A100 80GB PCIe, [2] 14 x Intel Xeon Gold 6342 - NVIDIA A100 80GB PCIe -.

Target: Vulkan GPU - Model: mobilenet
  [1] 14.70   SE +/- 0.10, N = 15   MIN: 13.6 / MAX: 50.6
  [2] 14.79   SE +/- 0.20, N = 15   MIN: 13.45 / MAX: 16.57
Target: Vulkan GPU-v2-v2 - Model: mobilenet-v2
  [1] 4.99    SE +/- 0.07, N = 15   MIN: 4.53 / MAX: 6.41
  [2] 5.22    SE +/- 0.10, N = 15   MIN: 4.03 / MAX: 13.71
Target: Vulkan GPU-v3-v3 - Model: mobilenet-v3
  [1] 4.24    SE +/- 0.05, N = 15   MIN: 3.74 / MAX: 4.84
  [2] 4.40    SE +/- 0.05, N = 15   MIN: 3.99 / MAX: 5.34
Target: Vulkan GPU - Model: shufflenet-v2
  [1] 4.36    SE +/- 0.04, N = 15   MIN: 3.94 / MAX: 4.8
  [2] 4.45    SE +/- 0.04, N = 14   MIN: 4.05 / MAX: 5.16
Target: Vulkan GPU - Model: efficientnet-b0
  [1] 5.95    SE +/- 0.04, N = 15   MIN: 5.48 / MAX: 6.65
  [2] 6.12    SE +/- 0.06, N = 15   MIN: 5.57 / MAX: 6.88
Target: Vulkan GPU - Model: blazeface
  [1] 1.57    SE +/- 0.03, N = 15   MIN: 1.39 / MAX: 1.85
  [2] 1.65    SE +/- 0.03, N = 15   MIN: 1.4 / MAX: 2.22
Target: Vulkan GPU - Model: googlenet
  [1] 12.48   SE +/- 0.12, N = 15   MIN: 11.64 / MAX: 14.98
  [2] 12.59   SE +/- 0.19, N = 15   MIN: 11.13 / MAX: 14.43
Target: Vulkan GPU - Model: vgg16
  [1] 32.12   SE +/- 0.20, N = 15   MIN: 30.68 / MAX: 41.36
  [2] 31.67   SE +/- 0.13, N = 15   MIN: 29.72 / MAX: 41.37
Target: Vulkan GPU - Model: resnet18
  [1] 8.00    SE +/- 0.08, N = 15   MIN: 7.36 / MAX: 13.06
  [2] 8.01    SE +/- 0.08, N = 15   MIN: 7.25 / MAX: 10.22
Target: Vulkan GPU - Model: alexnet
  [1] 5.54    SE +/- 0.04, N = 15   MIN: 5.23 / MAX: 8.32
  [2] 5.36    SE +/- 0.05, N = 15   MIN: 4.8 / MAX: 6.09
Target: Vulkan GPU - Model: resnet50
  [1] 18.19   SE +/- 0.11, N = 15   MIN: 17.46 / MAX: 19.77
  [2] 18.23   SE +/- 0.13, N = 15   MIN: 17.21 / MAX: 33.45
Target: Vulkan GPU - Model: yolov4-tiny
  [1] 22.49   SE +/- 0.06, N = 15   MIN: 21.4 / MAX: 30.45
  [2] 22.20   SE +/- 0.17, N = 15   MIN: 20.51 / MAX: 26.86
Target: Vulkan GPU - Model: squeezenet_ssd
  [1] 10.93   SE +/- 0.08, N = 15   MIN: 10.12 / MAX: 11.68
  [2] 11.03   SE +/- 0.12, N = 15   MIN: 10.02 / MAX: 13.51
Target: Vulkan GPU - Model: regnety_400m
  [1] 11.38   SE +/- 0.06, N = 15   MIN: 10.87 / MAX: 14.11
  [2] 11.51   SE +/- 0.12, N = 15   MIN: 10.62 / MAX: 12.43
Target: Vulkan GPU - Model: vision_transformer
  [1] 83.82   SE +/- 0.54, N = 15   MIN: 80.45 / MAX: 98.96
  [2] 85.11   SE +/- 0.43, N = 15   MIN: 80.72 / MAX: 98.92
Target: Vulkan GPU - Model: FastestDet
  [1] 5.10    SE +/- 0.16, N = 14   MIN: 3.96 / MAX: 7.17
  [2] 5.45    SE +/- 0.13, N = 15   MIN: 4.21 / MAX: 7.84
Target: Vulkan GPU - Model: mnasnet
  [1] 4.40    SE +/- 0.07, N = 14   MIN: 3.88 / MAX: 5.28
  [2] 4.62    SE +/- 0.07, N = 15   MIN: 4.03 / MAX: 5.52
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread -pthread
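NCNN is the only suite with results for both identifiers, so it is the one place where the two runs can be compared. The sketch below is a rough screening aid, not a formal significance test: it reports the percentage change from the first result to the second and how many combined standard errors separate the two means. It assumes the SE values in the export are listed in the same order as the means. The function name and the tuple layout are illustrative choices.

```python
import math

# (mean_1, se_1, mean_2, se_2) in ms, copied from the NCNN results above.
# Assumption: SE values appear in the same order as the means in the export.
ncnn = {
    "mobilenet": (14.70, 0.10, 14.79, 0.20),
    "mobilenet-v2": (4.99, 0.07, 5.22, 0.10),
    "mobilenet-v3": (4.24, 0.05, 4.40, 0.05),
    "shufflenet-v2": (4.36, 0.04, 4.45, 0.04),
    "efficientnet-b0": (5.95, 0.04, 6.12, 0.06),
    "blazeface": (1.57, 0.03, 1.65, 0.03),
    "googlenet": (12.48, 0.12, 12.59, 0.19),
    "vgg16": (32.12, 0.20, 31.67, 0.13),
    "resnet18": (8.00, 0.08, 8.01, 0.08),
    "alexnet": (5.54, 0.04, 5.36, 0.05),
    "resnet50": (18.19, 0.11, 18.23, 0.13),
    "yolov4-tiny": (22.49, 0.06, 22.20, 0.17),
    "squeezenet_ssd": (10.93, 0.08, 11.03, 0.12),
    "regnety_400m": (11.38, 0.06, 11.51, 0.12),
    "vision_transformer": (83.82, 0.54, 85.11, 0.43),
    "FastestDet": (5.10, 0.16, 5.45, 0.13),
    "mnasnet": (4.40, 0.07, 4.62, 0.07),
}


def compare(mean_1, se_1, mean_2, se_2):
    """Return (percent change from run 1 to run 2, difference in combined SEs)."""
    pct = (mean_2 - mean_1) / mean_1 * 100.0
    combined_se = math.hypot(se_1, se_2)  # sqrt(se_1**2 + se_2**2)
    return pct, abs(mean_2 - mean_1) / combined_se


for model, row in ncnn.items():
    pct, n_se = compare(*row)
    print(f"{model:20s} {pct:+6.2f}%  {n_se:4.1f} combined SE")
```

Because lower is better for these tests, a positive percentage means the second result was slower. Differences of well under two combined standard errors are hard to distinguish from run-to-run noise.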
GROMACS 2024
Ns Per Day, More Is Better. Result identifier: NVIDIA A100 80GB PCIe.
  Implementation: NVIDIA CUDA GPU - Input: water_GMX50_bare    25.61   SE +/- 0.03, N = 3
1. (CXX) g++ options: -O3 -lm

ArrayFire 3.9
ms, Fewer Is Better. Result identifier: NVIDIA A100 80GB PCIe.
  Test: Conjugate Gradient OpenCL    1.988   SE +/- 0.006, N = 3
1. (CXX) g++ options: -O3
Blender 4.0
Seconds, Fewer Is Better. Result identifier: NVIDIA A100 80GB PCIe.
  Blend File: BMW27 - Compute: NVIDIA OptiX                 27.52   SE +/- 16.82, N = 14
  Blend File: Classroom - Compute: NVIDIA OptiX             20.82   SE +/- 0.02, N = 3
  Blend File: Fishy Cat - Compute: NVIDIA OptiX             22.36   SE +/- 0.20, N = 8
  Blend File: Barbershop - Compute: NVIDIA OptiX            83.64   SE +/- 0.14, N = 3
  Blend File: Pabellon Barcelona - Compute: NVIDIA OptiX    44.99   SE +/- 0.02, N = 3
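The Blender results above differ widely in both standard error and run count, with BMW27 standing out. One way to put them on a common scale is the relative standard error (SE divided by the mean), along with the standard deviation implied by SE = SD / sqrt(N). The sketch below applies both to the figures in the export. It assumes the reported SE is the standard error of the mean over N runs, and the function names are illustrative.

```python
import math

# (mean seconds, SE, N) copied from the Blender results above.
blender = {
    "BMW27": (27.52, 16.82, 14),
    "Classroom": (20.82, 0.02, 3),
    "Fishy Cat": (22.36, 0.20, 8),
    "Barbershop": (83.64, 0.14, 3),
    "Pabellon Barcelona": (44.99, 0.02, 3),
}


def relative_se(mean, se):
    """Standard error as a percentage of the mean."""
    return se / mean * 100.0


def implied_sd(se, n):
    """Standard deviation implied by SE = SD / sqrt(N).

    Assumption: the export's SE is the standard error of the mean.
    """
    return se * math.sqrt(n)


for scene, (mean, se, n) in blender.items():
    print(f"{scene:20s} RSE {relative_se(mean, se):6.2f}%  "
          f"implied SD {implied_sd(se, n):6.2f} s  (N = {n})")
```

On these numbers BMW27 has a relative standard error above 60 percent, while the other scenes are below 1 percent, so the BMW27 mean is best read with caution.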
FAHBench 2.3.2
Ns Per Day, More Is Better. Result identifier: NVIDIA A100 80GB PCIe.
  FAHBench    258.60   SE +/- 0.28, N = 3

Mixbench 2020-06-23
More Is Better. Result identifier: NVIDIA A100 80GB PCIe.
  Backend: OpenCL - Benchmark: Integer             18824.22 GIOPS     SE +/- 21.05, N = 3
  Backend: OpenCL - Benchmark: Double Precision    9542.09 GFLOPS     SE +/- 1.80, N = 3
  Backend: OpenCL - Benchmark: Single Precision    18866.42 GFLOPS    SE +/- 0.00, N = 3
1. (CXX) g++ options: -lm -lstdc++ -lOpenCL -lrt -O2

FinanceBench 2016-07-25
ms, Fewer Is Better. Result identifier: NVIDIA A100 80GB PCIe.
  Benchmark: Black-Scholes OpenCL    1.035   SE +/- 0.009, N = 6
1. (CXX) g++ options: -O3 -march=native -fopenmp
cl-mem 2017-01-13
GB/s, More Is Better. Result identifier: NVIDIA A100 80GB PCIe.
  Benchmark: Copy     234.8    SE +/- 0.03, N = 3
  Benchmark: Read     796.1    SE +/- 0.32, N = 3
  Benchmark: Write    1405.8   SE +/- 0.65, N = 3
1. (CC) gcc options: -O2 -flto -lOpenCL

clpeak 1.1.2
More Is Better. Result identifier: NVIDIA A100 80GB PCIe.
  OpenCL Test: Integer Compute INT        19208.70 GIOPS     SE +/- 22.16, N = 3
  OpenCL Test: Single-Precision Float     19311.06 GFLOPS    SE +/- 10.20, N = 3
  OpenCL Test: Double-Precision Double    9689.03 GFLOPS     SE +/- 3.43, N = 3
  OpenCL Test: Global Memory Bandwidth    1495.36 GBPS       SE +/- 0.08, N = 3
1. (CXX) g++ options: -O3
ViennaCL 1.7.1, CPU BLAS
More Is Better. Result identifier: NVIDIA A100 80GB PCIe.
  Test: CPU BLAS - sCOPY       147.4 GB/s        SE +/- 11.32, N = 15
  Test: CPU BLAS - sAXPY       178 GB/s          SE +/- 5.09, N = 14
  Test: CPU BLAS - sDOT        84.1 GB/s         SE +/- 0.26, N = 15
  Test: CPU BLAS - dCOPY       73.0 GB/s         SE +/- 0.65, N = 15
  Test: CPU BLAS - dAXPY       118 GB/s          SE +/- 1.03, N = 15
  Test: CPU BLAS - dDOT        107 GB/s          SE +/- 1.25, N = 15
  Test: CPU BLAS - dGEMV-N     97.1 GB/s         SE +/- 0.78, N = 15
  Test: CPU BLAS - dGEMV-T     75.8 GB/s         SE +/- 0.26, N = 15
  Test: CPU BLAS - dGEMM-NN    26.3 GFLOPs/s     SE +/- 0.09, N = 15
  Test: CPU BLAS - dGEMM-NT    26.1 GFLOPs/s     SE +/- 0.08, N = 15
  Test: CPU BLAS - dGEMM-TN    25.9 GFLOPs/s     SE +/- 0.08, N = 14
  Test: CPU BLAS - dGEMM-TT    26.4 GFLOPs/s     SE +/- 0.09, N = 15
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1, OpenCL BLAS
More Is Better. Result identifier: NVIDIA A100 80GB PCIe. Entries without an SE value had none in the export.
  Test: OpenCL BLAS - sCOPY       232 GB/s
  Test: OpenCL BLAS - sAXPY       312 GB/s
  Test: OpenCL BLAS - sDOT        227 GB/s
  Test: OpenCL BLAS - dCOPY       440 GB/s
  Test: OpenCL BLAS - dAXPY       572 GB/s
  Test: OpenCL BLAS - dDOT        435 GB/s
  Test: OpenCL BLAS - dGEMV-N     68.2 GB/s         SE +/- 0.03, N = 3
  Test: OpenCL BLAS - dGEMV-T     245 GB/s
  Test: OpenCL BLAS - dGEMM-NN    4243 GFLOPs/s     SE +/- 3.33, N = 3
  Test: OpenCL BLAS - dGEMM-NT    4653 GFLOPs/s     SE +/- 3.33, N = 3
  Test: OpenCL BLAS - dGEMM-TN    4220 GFLOPs/s
  Test: OpenCL BLAS - dGEMM-TT    4270 GFLOPs/s
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
Phoronix Test Suite v10.8.4