cpu_2gpu

2 x AMD EPYC 7713 64-Core testing with a GIGABYTE MZ72-HB0-00 v01020102 (M10 BIOS) and ASPEED 80GB on Ubuntu 22.04 via the Phoronix Test Suite.

HTML result view exported from: https://openbenchmarking.org/result/2309030-NE-CPU2GPU7184&grr.

System details (2 NVIDIA A100 GPUs)

Processor: 2 x AMD EPYC 7713 64-Core @ 2.00GHz (128 Cores / 256 Threads)
Motherboard: GIGABYTE MZ72-HB0-00 v01020102 (M10 BIOS)
Chipset: AMD Starship/Matisse
Memory: 7 x 64 GB DDR4-3200MT/s 36ASF8G72PZ-3G2B2
Disk: 2 x 1000GB Samsung SSD 980 PRO 1TB + 1000GB Western Digital WD Blue SN570 1TB + 1000GB Sabrent Rocket Q
Graphics: ASPEED 80GB
Monitor: PHL 243V7
Network: 2 x Broadcom BCM57416 NetXtreme-E Dual-Media 10G RDMA + 2 x Broadcom NetXtreme II BCM57810 10
OS: Ubuntu 22.04
Kernel: 5.15.0-69-generic (x86_64)
Display Server: X Server 1.21.1.3
Display Driver: NVIDIA
OpenCL: OpenCL 3.0 CUDA 12.2.128
Compiler: GCC 11.4.0 + CUDA 12.2
File-System: ext4
Screen Resolution: 1920x1080

- Transparent Huge Pages: madvise
- --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
- Scaling Governor: acpi-cpufreq performance (Boost: Enabled) - CPU Microcode: 0xa00115d
- BAR1 / Visible vRAM Size: 131072 MiB - vBIOS Version: 92.00.25.00.08
- Python 2.7.18 + Python 3.10.12
- itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: always-on RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected

Result overview (2 NVIDIA A100 GPUs)

Test                                           Result
ncnn: Vulkan GPU - FastestDet                  38.31
ncnn: Vulkan GPU - vision_transformer          135.41
ncnn: Vulkan GPU - regnety_400m                166.35
ncnn: Vulkan GPU - squeezenet_ssd              77.73
ncnn: Vulkan GPU - yolov4-tiny                 70.52
ncnn: Vulkan GPU - resnet50                    88.68
ncnn: Vulkan GPU - alexnet                     16.60
ncnn: Vulkan GPU - resnet18                    35.54
ncnn: Vulkan GPU - vgg16                       90.72
ncnn: Vulkan GPU - googlenet                   64.08
ncnn: Vulkan GPU - blazeface                   20.65
ncnn: Vulkan GPU - efficientnet-b0             64.03
ncnn: Vulkan GPU - mnasnet                     38.86
ncnn: Vulkan GPU - shufflenet-v2               65.35
ncnn: Vulkan GPU - mobilenet-v3                63.59
ncnn: Vulkan GPU - mobilenet-v2                69.06
ncnn: Vulkan GPU - mobilenet                   74.59
lczero: OpenCL                                 10182
luxcorerender: LuxCore Benchmark - GPU         0.15
luxcorerender: DLSC - GPU                      8.74
fahbench                                       267.3849
viennacl: CPU BLAS - dGEMM-TT                  83.9
viennacl: CPU BLAS - dGEMM-TN                  86.6
viennacl: CPU BLAS - dGEMM-NT                  79.3
viennacl: CPU BLAS - dGEMM-NN                  77.1
viennacl: CPU BLAS - dGEMV-T                   511.08
viennacl: CPU BLAS - dGEMV-N                   24.68
viennacl: CPU BLAS - dDOT                      394.52
viennacl: CPU BLAS - dAXPY                     796.6
viennacl: CPU BLAS - dCOPY                     445
viennacl: CPU BLAS - sDOT                      469
viennacl: CPU BLAS - sAXPY                     633
viennacl: CPU BLAS - sCOPY                     590
shoc: OpenCL - Max SP Flops                    19428.3
luxcorerender: Orange Juice - GPU              1.07
luxcorerender: Danish Mood - GPU               0.14
hashcat: MD5                                   100636418750
luxcorerender: Rainbow Colors and Prism - GPU  57.32
viennacl: OpenCL BLAS - dGEMM-TT               4280
viennacl: OpenCL BLAS - dGEMM-TN               4233
viennacl: OpenCL BLAS - dGEMM-NT               4667
viennacl: OpenCL BLAS - dGEMM-NN               4263
viennacl: OpenCL BLAS - dGEMV-T                245
viennacl: OpenCL BLAS - dGEMV-N                68.4
viennacl: OpenCL BLAS - dDOT                   436
viennacl: OpenCL BLAS - dAXPY                  574
viennacl: OpenCL BLAS - dCOPY                  442
viennacl: OpenCL BLAS - sDOT                   225
viennacl: OpenCL BLAS - sAXPY                  314
viennacl: OpenCL BLAS - sCOPY                  233
hashcat: TrueCrypt RIPEMD160 + XTS             1635000
hashcat: SHA-512                               6365366667
hashcat: SHA1                                  43808966667
shoc: OpenCL - Texture Read Bandwidth          1581.67
hashcat: 7-Zip                                 2359633
financebench: Black-Scholes OpenCL             0.922
arrayfire: Conjugate Gradient OpenCL           2.611
shoc: OpenCL - Bus Speed Readback              6.7778
shoc: OpenCL - S3D                             829.069
cl-mem: Copy                                   235.1
cl-mem: Write                                  1404.5
cl-mem: Read                                   797.6
mixbench: OpenCL - Integer                     19044.55
rodinia: OpenCL Particle Filter                2.015
shoc: OpenCL - Triad                           6.6925
shoc: OpenCL - GEMM SGEMM_N                    13583.1
shoc: OpenCL - Bus Speed Download              6.7287
clpeak: Single-Precision Float                 19356.03
shoc: OpenCL - FFT SP                          4439.22
clpeak: Global Memory Bandwidth                1494.62
shoc: OpenCL - Reduction                       241.542
clpeak: Integer Compute INT                    19279.66
clpeak: Double-Precision Double                9721.88
shoc: OpenCL - MD5 Hash                        42.9737
mixbench: OpenCL - Double Precision            9631.39
mixbench: OpenCL - Single Precision            19044.55
redshift                                       (no value recorded)

NCNN

Target: Vulkan GPU - Model: FastestDet

NCNN 20230517 (ms, fewer is better)
2 NVIDIA A100 GPUs: 38.31 (SE +/- 2.93, N = 9; MIN: 18.48 / MAX: 3111.35)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: vision_transformer

NCNN 20230517 (ms, fewer is better)
2 NVIDIA A100 GPUs: 135.41 (SE +/- 11.92, N = 9; MIN: 99.12 / MAX: 4466.63)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: regnety_400m

NCNN 20230517 (ms, fewer is better)
2 NVIDIA A100 GPUs: 166.35 (SE +/- 19.67, N = 9; MIN: 87.33 / MAX: 16483.69)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: squeezenet_ssd

NCNN 20230517 (ms, fewer is better)
2 NVIDIA A100 GPUs: 77.73 (SE +/- 9.11, N = 9; MIN: 31.71 / MAX: 4075.02)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: yolov4-tiny

NCNN 20230517 (ms, fewer is better)
2 NVIDIA A100 GPUs: 70.52 (SE +/- 9.06, N = 9; MIN: 43 / MAX: 1617.45)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: resnet50

NCNN 20230517 (ms, fewer is better)
2 NVIDIA A100 GPUs: 88.68 (SE +/- 13.51, N = 9; MIN: 40.02 / MAX: 3430.47)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: alexnet

NCNN 20230517 (ms, fewer is better)
2 NVIDIA A100 GPUs: 16.60 (SE +/- 0.58, N = 9; MIN: 13.21 / MAX: 83.78)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: resnet18

NCNN 20230517 (ms, fewer is better)
2 NVIDIA A100 GPUs: 35.54 (SE +/- 5.30, N = 9; MIN: 20.86 / MAX: 1844.74)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: vgg16

NCNN 20230517 (ms, fewer is better)
2 NVIDIA A100 GPUs: 90.72 (SE +/- 10.93, N = 9; MIN: 55.07 / MAX: 2076.64)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: googlenet

NCNN 20230517 (ms, fewer is better)
2 NVIDIA A100 GPUs: 64.08 (SE +/- 13.91, N = 9; MIN: 31.44 / MAX: 3529.68)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: blazeface

NCNN 20230517 (ms, fewer is better)
2 NVIDIA A100 GPUs: 20.65 (SE +/- 3.81, N = 9; MIN: 11.33 / MAX: 1718.85)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: efficientnet-b0

NCNN 20230517 (ms, fewer is better)
2 NVIDIA A100 GPUs: 64.03 (SE +/- 7.12, N = 9; MIN: 24.69 / MAX: 3658.99)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: mnasnet

NCNN 20230517 (ms, fewer is better)
2 NVIDIA A100 GPUs: 38.86 (SE +/- 3.36, N = 9; MIN: 15.02 / MAX: 2122)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: shufflenet-v2

NCNN 20230517 (ms, fewer is better)
2 NVIDIA A100 GPUs: 65.35 (SE +/- 17.58, N = 9; MIN: 21.86 / MAX: 2909.47)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: mobilenet-v3

NCNN 20230517 (ms, fewer is better)
2 NVIDIA A100 GPUs: 63.59 (SE +/- 11.84, N = 9; MIN: 17 / MAX: 2572.56)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: mobilenet-v2

NCNN 20230517 (ms, fewer is better)
2 NVIDIA A100 GPUs: 69.06 (SE +/- 9.55, N = 9; MIN: 14.24 / MAX: 2172.04)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: mobilenet

NCNN 20230517 (ms, fewer is better)
2 NVIDIA A100 GPUs: 74.59 (SE +/- 12.08, N = 9; MIN: 30.47 / MAX: 2484.8)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

LeelaChessZero

Backend: OpenCL

LeelaChessZero 0.28 (Nodes Per Second, more is better)
2 NVIDIA A100 GPUs: 10182 (SE +/- 137.36, N = 3)
1. (CXX) g++ options: -flto -pthread

LuxCoreRender

Scene: LuxCore Benchmark - Acceleration: GPU

LuxCoreRender 2.6 (M samples/sec, more is better)
2 NVIDIA A100 GPUs: 0.15 (SE +/- 0.00, N = 3; MAX: 0.19)

LuxCoreRender

Scene: DLSC - Acceleration: GPU

LuxCoreRender 2.6 (M samples/sec, more is better)
2 NVIDIA A100 GPUs: 8.74 (SE +/- 0.06, N = 3; MIN: 6.84 / MAX: 19.23)

FAHBench

FAHBench 2.3.2 (Ns Per Day, more is better)
2 NVIDIA A100 GPUs: 267.38 (SE +/- 0.36, N = 3)

ViennaCL

Test: CPU BLAS - dGEMM-TT

ViennaCL 1.7.1 (GFLOPs/s, more is better)
2 NVIDIA A100 GPUs: 83.9 (SE +/- 0.52, N = 12)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dGEMM-TN

ViennaCL 1.7.1 (GFLOPs/s, more is better)
2 NVIDIA A100 GPUs: 86.6 (SE +/- 0.86, N = 12)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dGEMM-NT

ViennaCL 1.7.1 (GFLOPs/s, more is better)
2 NVIDIA A100 GPUs: 79.3 (SE +/- 0.42, N = 12)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dGEMM-NN

ViennaCL 1.7.1 (GFLOPs/s, more is better)
2 NVIDIA A100 GPUs: 77.1 (SE +/- 0.96, N = 12)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dGEMV-T

ViennaCL 1.7.1 (GB/s, more is better)
2 NVIDIA A100 GPUs: 511.08 (SE +/- 92.89, N = 12)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dGEMV-N

ViennaCL 1.7.1 (GB/s, more is better)
2 NVIDIA A100 GPUs: 24.68 (SE +/- 2.63, N = 12)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dDOT

ViennaCL 1.7.1 (GB/s, more is better)
2 NVIDIA A100 GPUs: 394.52 (SE +/- 85.20, N = 12)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dAXPY

ViennaCL 1.7.1 (GB/s, more is better)
2 NVIDIA A100 GPUs: 796.6 (SE +/- 143.43, N = 12)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dCOPY

ViennaCL 1.7.1 (GB/s, more is better)
2 NVIDIA A100 GPUs: 445 (SE +/- 61.34, N = 12)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - sDOT

ViennaCL 1.7.1 (GB/s, more is better)
2 NVIDIA A100 GPUs: 469 (SE +/- 2.78, N = 12)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - sAXPY

ViennaCL 1.7.1 (GB/s, more is better)
2 NVIDIA A100 GPUs: 633 (SE +/- 28.79, N = 12)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - sCOPY

ViennaCL 1.7.1 (GB/s, more is better)
2 NVIDIA A100 GPUs: 590 (SE +/- 26.23, N = 12)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Max SP Flops

SHOC Scalable HeterOgeneous Computing 2020-04-17 (GFLOPS, more is better)
2 NVIDIA A100 GPUs: 19428.3 (SE +/- 1.59, N = 3)
1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

LuxCoreRender

Scene: Orange Juice - Acceleration: GPU

LuxCoreRender 2.6 (M samples/sec, more is better)
2 NVIDIA A100 GPUs: 1.07 (SE +/- 0.00, N = 3; MIN: 0.8 / MAX: 1.61)

LuxCoreRender

Scene: Danish Mood - Acceleration: GPU

LuxCoreRender 2.6 (M samples/sec, more is better)
2 NVIDIA A100 GPUs: 0.14 (SE +/- 0.00, N = 3; MAX: 0.17)

Hashcat

Benchmark: MD5

Hashcat 6.2.4 (H/s, more is better)
2 NVIDIA A100 GPUs: 100636418750 (SE +/- 8779507020.41, N = 16)

LuxCoreRender

Scene: Rainbow Colors and Prism - Acceleration: GPU

LuxCoreRender 2.6 (M samples/sec, more is better)
2 NVIDIA A100 GPUs: 57.32 (SE +/- 0.85, N = 13; MIN: 45.61 / MAX: 71.88)

ViennaCL

Test: OpenCL BLAS - dGEMM-TT

ViennaCL 1.7.1 (GFLOPs/s, more is better)
2 NVIDIA A100 GPUs: 4280 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMM-TN

ViennaCL 1.7.1 (GFLOPs/s, more is better)
2 NVIDIA A100 GPUs: 4233 (SE +/- 3.33, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMM-NT

ViennaCL 1.7.1 (GFLOPs/s, more is better)
2 NVIDIA A100 GPUs: 4667 (SE +/- 3.33, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMM-NN

ViennaCL 1.7.1 (GFLOPs/s, more is better)
2 NVIDIA A100 GPUs: 4263 (SE +/- 3.33, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMV-T

ViennaCL 1.7.1 (GB/s, more is better)
2 NVIDIA A100 GPUs: 245 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMV-N

ViennaCL 1.7.1 (GB/s, more is better)
2 NVIDIA A100 GPUs: 68.4 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dDOT

ViennaCL 1.7.1 (GB/s, more is better)
2 NVIDIA A100 GPUs: 436 (SE +/- 0.33, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dAXPY

ViennaCL 1.7.1 (GB/s, more is better)
2 NVIDIA A100 GPUs: 574 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dCOPY

ViennaCL 1.7.1 (GB/s, more is better)
2 NVIDIA A100 GPUs: 442 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - sDOT

ViennaCL 1.7.1 (GB/s, more is better)
2 NVIDIA A100 GPUs: 225 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - sAXPY

ViennaCL 1.7.1 (GB/s, more is better)
2 NVIDIA A100 GPUs: 314 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - sCOPY

ViennaCL 1.7.1 (GB/s, more is better)
2 NVIDIA A100 GPUs: 233 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

Hashcat

Benchmark: TrueCrypt RIPEMD160 + XTS

Hashcat 6.2.4 (H/s, more is better)
2 NVIDIA A100 GPUs: 1635000 (SE +/- 529.15, N = 3)

Hashcat

Benchmark: SHA-512

Hashcat 6.2.4 (H/s, more is better)
2 NVIDIA A100 GPUs: 6365366667 (SE +/- 6993886.22, N = 3)

Hashcat

Benchmark: SHA1

Hashcat 6.2.4 (H/s, more is better)
2 NVIDIA A100 GPUs: 43808966667 (SE +/- 24340181.68, N = 3)

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Texture Read Bandwidth

SHOC Scalable HeterOgeneous Computing 2020-04-17 (GB/s, more is better)
2 NVIDIA A100 GPUs: 1581.67 (SE +/- 0.32, N = 3)
1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

Hashcat

Benchmark: 7-Zip

Hashcat 6.2.4 (H/s, more is better)
2 NVIDIA A100 GPUs: 2359633 (SE +/- 2469.37, N = 3)

FinanceBench

Benchmark: Black-Scholes OpenCL

FinanceBench 2016-07-25 (ms, fewer is better)
2 NVIDIA A100 GPUs: 0.922 (SE +/- 0.015, N = 15)
1. (CXX) g++ options: -O3 -march=native -fopenmp

ArrayFire

Test: Conjugate Gradient OpenCL

ArrayFire 3.7 (ms, fewer is better)
2 NVIDIA A100 GPUs: 2.611 (SE +/- 0.003, N = 3)
1. (CXX) g++ options: -rdynamic

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Bus Speed Readback

SHOC Scalable HeterOgeneous Computing 2020-04-17 (GB/s, more is better)
2 NVIDIA A100 GPUs: 6.7778 (SE +/- 0.0000, N = 3)
1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: S3D

SHOC Scalable HeterOgeneous Computing 2020-04-17 (GFLOPS, more is better)
2 NVIDIA A100 GPUs: 829.07 (SE +/- 0.56, N = 3)
1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

cl-mem

Benchmark: Copy

cl-mem 2017-01-13 (GB/s, more is better)
2 NVIDIA A100 GPUs: 235.1 (SE +/- 0.00, N = 3)
1. (CC) gcc options: -O2 -flto -lOpenCL

cl-mem

Benchmark: Write

cl-mem 2017-01-13 (GB/s, more is better)
2 NVIDIA A100 GPUs: 1404.5 (SE +/- 0.10, N = 3)
1. (CC) gcc options: -O2 -flto -lOpenCL

cl-mem

Benchmark: Read

cl-mem 2017-01-13 (GB/s, more is better)
2 NVIDIA A100 GPUs: 797.6 (SE +/- 0.06, N = 3)
1. (CC) gcc options: -O2 -flto -lOpenCL

Mixbench

Backend: OpenCL - Benchmark: Integer

Mixbench 2020-06-23 (GIOPS, more is better)
2 NVIDIA A100 GPUs: 19044.55 (SE +/- 7.20, N = 3)
1. (CXX) g++ options: -lm -lstdc++ -lOpenCL -lrt -O2

Rodinia

Test: OpenCL Particle Filter

Rodinia 3.1 (Seconds, fewer is better)
2 NVIDIA A100 GPUs: 2.015
1. (CXX) g++ options: -m64 -lm -lcuda -lcudart -lcudadevrt -lcudart_static -lrt -lpthread -ldl

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Triad

SHOC Scalable HeterOgeneous Computing 2020-04-17 (GB/s, more is better)
2 NVIDIA A100 GPUs: 6.6925 (SE +/- 0.0004, N = 3)
1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: GEMM SGEMM_N

SHOC Scalable HeterOgeneous Computing 2020-04-17 (GFLOPS, more is better)
2 NVIDIA A100 GPUs: 13583.1 (SE +/- 1.91, N = 3)
1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Bus Speed Download

SHOC Scalable HeterOgeneous Computing 2020-04-17 (GB/s, more is better)
2 NVIDIA A100 GPUs: 6.7287 (SE +/- 0.0000, N = 3)
1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

clpeak

OpenCL Test: Single-Precision Float

clpeak 1.1.2 (GFLOPS, more is better)
2 NVIDIA A100 GPUs: 19356.03 (SE +/- 0.12, N = 3)
1. (CXX) g++ options: -O3

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: FFT SP

SHOC Scalable HeterOgeneous Computing 2020-04-17 (GFLOPS, more is better)
2 NVIDIA A100 GPUs: 4439.22 (SE +/- 0.91, N = 3)
1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

clpeak

OpenCL Test: Global Memory Bandwidth

clpeak 1.1.2 (GBPS, more is better)
2 NVIDIA A100 GPUs: 1494.62 (SE +/- 0.32, N = 3)
1. (CXX) g++ options: -O3

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Reduction

SHOC Scalable HeterOgeneous Computing 2020-04-17 (GB/s, more is better)
2 NVIDIA A100 GPUs: 241.54 (SE +/- 0.01, N = 3)
1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

clpeak

OpenCL Test: Integer Compute INT

clpeak 1.1.2 (GIOPS, more is better)
2 NVIDIA A100 GPUs: 19279.66 (SE +/- 0.09, N = 3)
1. (CXX) g++ options: -O3

clpeak

OpenCL Test: Double-Precision Double

clpeak 1.1.2 (GFLOPS, more is better)
2 NVIDIA A100 GPUs: 9721.88 (SE +/- 0.07, N = 3)
1. (CXX) g++ options: -O3

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: MD5 Hash

SHOC Scalable HeterOgeneous Computing 2020-04-17 (GHash/s, more is better)
2 NVIDIA A100 GPUs: 42.97 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

Mixbench

Backend: OpenCL - Benchmark: Double Precision

Mixbench 2020-06-23 (GFLOPS, more is better)
2 NVIDIA A100 GPUs: 9631.39 (SE +/- 1.84, N = 3)
1. (CXX) g++ options: -lm -lstdc++ -lOpenCL -lrt -O2

Mixbench

Backend: OpenCL - Benchmark: Single Precision

Mixbench 2020-06-23 (GFLOPS, more is better)
2 NVIDIA A100 GPUs: 19044.55 (SE +/- 7.20, N = 3)
1. (CXX) g++ options: -lm -lstdc++ -lOpenCL -lrt -O2


Phoronix Test Suite v10.8.4