pts-result-gpu-30-05-2024.list
2 x AMD EPYC 7452 32-Core testing with a Supermicro AS-2124GQ-NART H12DSG-Q-CPU6 v1.01 (1.0a BIOS) and NVIDIA A100-SXM4-40GB on CentOS Linux 7 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2407234-NE-PTSRESULT61&rdt&grt
Configurations: GPU-run-30-05-2024, pts-config-gpu-30-05-2024

Processor: 2 x AMD EPYC 7452 32-Core @ 2.35GHz (64 Cores)
Motherboard: Supermicro AS-2124GQ-NART H12DSG-Q-CPU6 v1.01 (1.0a BIOS)
Chipset: AMD Starship/Matisse
Memory: 16 x 32 GB DDR4-3200MT/s Samsung M393A4K40DB3-CWE
Disk: 252GB
Graphics: NVIDIA A100-SXM4-40GB
Network: 2 x Intel 10-Gigabit X540-AT2
OS: CentOS Linux 7
Kernel: 5.4.265-1.el7.elrepo.x86_64 (x86_64)
Display Server: X Server
Display Driver: NVIDIA
Vulkan: 1.3.260
Compiler: GCC 4.8.5 20150623 + CUDA 12.3 (GPU-run-30-05-2024); GCC 8.3.0 + CUDA 12.3 (pts-config-gpu-30-05-2024)
File-System: tmpfs
Screen Resolution: 1024x768

Kernel Details
- Transparent Huge Pages: always

Compiler Details
- GPU-run-30-05-2024: --build=x86_64-redhat-linux --disable-libgcj --disable-libunwind-exceptions --enable-__cxa_atexit --enable-bootstrap --enable-checking=release --enable-gnu-indirect-function --enable-gnu-unique-object --enable-initfini-array --enable-languages=c,c++,objc,obj-c++,java,fortran,ada,go,lto --enable-plugin --enable-shared --enable-threads=posix --mandir=/usr/share/man --with-arch_32=x86-64 --with-linker-hash-style=gnu --with-tune=generic
- pts-config-gpu-30-05-2024: --disable-multilib --enable-languages=c,c++

Processor Details
- Scaling Governor: acpi-cpufreq performance (Boost: Enabled)
- CPU Microcode: 0x830107a

Graphics Details
- BAR1 / Visible vRAM Size: 65536 MiB
- vBIOS Version: 92.00.19.00.13

Python Details
- GPU-run-30-05-2024: Python 2.7.5 + Python 3.6.8
- pts-config-gpu-30-05-2024: Python 2.7.5 + Python 3.9.7

Security Details
- gather_data_sampling: Not affected
- itlb_multihit: Not affected
- l1tf: Not affected
- mds: Not affected
- meltdown: Not affected
- mmio_stale_data: Not affected
- retbleed: Vulnerable
- spec_store_bypass: Vulnerable
- spectre_v1: Vulnerable: __user pointer sanitization and usercopy barriers only; no swapgs barriers
- spectre_v2: Vulnerable IBPB: disabled STIBP: disabled PBRSB-eIBRS: Not affected
- srbds: Not affected
- tsx_async_abort: Not affected

Environment Details
- pts-config-gpu-30-05-2024: EXTRA_NVCCFLAGS=-cudart=shared
Result overview (columns: GPU-run-30-05-2024, pts-config-gpu-30-05-2024; the listed values are the pts-config-gpu-30-05-2024 results)

Test                                          pts-config-gpu-30-05-2024
cl-mem: Copy                                  231.7
cl-mem: Read                                  780.7
cl-mem: Write                                 1242.6
clpeak: Integer Compute INT                   16043.49
clpeak: Single-Precision Float                17926.38
clpeak: Double-Precision Double               7979.31
clpeak: Global Memory Bandwidth               1300.10
financebench: Black-Scholes OpenCL            1.191
hashcat: MD5                                  170625637500
hashcat: SHA1                                 88126433333
hashcat: 7-Zip                                4385933
hashcat: SHA-512                              12848366667
hashcat: TrueCrypt RIPEMD160 + XTS            3299133
mixbench: OpenCL - Integer                    14801.87
mixbench: NVIDIA CUDA - Integer               12812.49
mixbench: OpenCL - Double Precision           7699.84
mixbench: OpenCL - Single Precision           15389.52
mixbench: NVIDIA CUDA - Half Precision        44809.08
mixbench: NVIDIA CUDA - Double Precision      7814.50
mixbench: NVIDIA CUDA - Single Precision      15201.95
ncnn: Vulkan GPU - mobilenet                  39.73
ncnn: Vulkan GPU - mobilenet-v2               18.97
ncnn: Vulkan GPU - mobilenet-v3               18.30
ncnn: Vulkan GPU - shufflenet-v2              21.59
ncnn: Vulkan GPU - mnasnet                    15.40
ncnn: Vulkan GPU - efficientnet-b0            23.16
ncnn: Vulkan GPU - blazeface                  8.97
ncnn: Vulkan GPU - googlenet                  38.14
ncnn: Vulkan GPU - vgg16                      124.62
ncnn: Vulkan GPU - resnet18                   28.99
ncnn: Vulkan GPU - alexnet                    17.89
ncnn: Vulkan GPU - resnet50                   58.77
ncnn: Vulkan GPU - mobilenetv2-yolov3         39.73
ncnn: Vulkan GPU - yolov4-tiny                70.32
ncnn: Vulkan GPU - squeezenet_ssd             37.74
ncnn: Vulkan GPU - regnety_400m               41.64
ncnn: Vulkan GPU - vision_transformer         156.26
ncnn: Vulkan GPU - FastestDet                 18.25
rodinia: OpenCL Particle Filter               3.302
viennacl: CPU BLAS - sCOPY                    470
viennacl: CPU BLAS - sAXPY                    736
viennacl: CPU BLAS - sDOT                     418
viennacl: CPU BLAS - dCOPY                    742
viennacl: CPU BLAS - dAXPY                    948
viennacl: CPU BLAS - dDOT                     627
viennacl: CPU BLAS - dGEMV-N                  157.2
viennacl: CPU BLAS - dGEMV-T                  450
viennacl: CPU BLAS - dGEMM-NN                 74.6
viennacl: CPU BLAS - dGEMM-NT                 68.8
viennacl: CPU BLAS - dGEMM-TN                 85.1
viennacl: CPU BLAS - dGEMM-TT                 76.3
viennacl: OpenCL BLAS
cl-mem 2017-01-13, Benchmark: Copy (GB/s, More Is Better): pts-config-gpu-30-05-2024 = 231.7; SE +/- 0.03, N = 3; (CC) gcc options: -O2 -flto -lOpenCL
cl-mem 2017-01-13, Benchmark: Read (GB/s, More Is Better): pts-config-gpu-30-05-2024 = 780.7; SE +/- 0.06, N = 3; (CC) gcc options: -O2 -flto -lOpenCL
cl-mem 2017-01-13, Benchmark: Write (GB/s, More Is Better): pts-config-gpu-30-05-2024 = 1242.6; SE +/- 0.45, N = 3; (CC) gcc options: -O2 -flto -lOpenCL
clpeak 1.1.2, OpenCL Test: Integer Compute INT (GIOPS, More Is Better): pts-config-gpu-30-05-2024 = 16043.49; SE +/- 199.00, N = 15
clpeak 1.1.2, OpenCL Test: Single-Precision Float (GFLOPS, More Is Better): pts-config-gpu-30-05-2024 = 17926.38; SE +/- 165.14, N = 3
clpeak 1.1.2, OpenCL Test: Double-Precision Double (GFLOPS, More Is Better): pts-config-gpu-30-05-2024 = 7979.31; SE +/- 119.81, N = 12
clpeak 1.1.2, OpenCL Test: Global Memory Bandwidth (GBPS, More Is Better): pts-config-gpu-30-05-2024 = 1300.10; SE +/- 3.13, N = 3
FinanceBench 2016-07-25, Benchmark: Black-Scholes OpenCL (ms, Fewer Is Better): pts-config-gpu-30-05-2024 = 1.191; SE +/- 0.005, N = 3; (CXX) g++ options: -O3 -march=native -fopenmp
Hashcat 6.2.4, Benchmark: MD5 (H/s, More Is Better): pts-config-gpu-30-05-2024 = 170625637500; SE +/- 26429817415.14, N = 16
Hashcat 6.2.4, Benchmark: SHA1 (H/s, More Is Better): pts-config-gpu-30-05-2024 = 88126433333; SE +/- 104285670.69, N = 3
Hashcat 6.2.4, Benchmark: 7-Zip (H/s, More Is Better): pts-config-gpu-30-05-2024 = 4385933; SE +/- 25606.34, N = 3
Hashcat 6.2.4, Benchmark: SHA-512 (H/s, More Is Better): pts-config-gpu-30-05-2024 = 12848366667; SE +/- 3468589.21, N = 3
Hashcat 6.2.4, Benchmark: TrueCrypt RIPEMD160 + XTS (H/s, More Is Better): pts-config-gpu-30-05-2024 = 3299133; SE +/- 592.55, N = 3
Mixbench 2020-06-23, Backend: OpenCL - Benchmark: Integer (GIOPS, More Is Better): pts-config-gpu-30-05-2024 = 14801.87; SE +/- 4.35, N = 3; (CXX) g++ options: -lm -lstdc++ -lOpenCL -lrt -O2
Mixbench 2020-06-23, Backend: NVIDIA CUDA - Benchmark: Integer (GIOPS, More Is Better): pts-config-gpu-30-05-2024 = 12812.49; SE +/- 5.64, N = 3; (CXX) g++ options: -lm -lstdc++ -lOpenCL -lrt -O2
Mixbench 2020-06-23, Backend: OpenCL - Benchmark: Double Precision (GFLOPS, More Is Better): pts-config-gpu-30-05-2024 = 7699.84; SE +/- 113.22, N = 15; (CXX) g++ options: -lm -lstdc++ -lOpenCL -lrt -O2
Mixbench 2020-06-23, Backend: OpenCL - Benchmark: Single Precision (GFLOPS, More Is Better): pts-config-gpu-30-05-2024 = 15389.52; SE +/- 262.04, N = 15; (CXX) g++ options: -lm -lstdc++ -lOpenCL -lrt -O2
Mixbench 2020-06-23, Backend: NVIDIA CUDA - Benchmark: Half Precision (GFLOPS, More Is Better): pts-config-gpu-30-05-2024 = 44809.08; SE +/- 727.84, N = 15; (CXX) g++ options: -lm -lstdc++ -lOpenCL -lrt -O2
Mixbench 2020-06-23, Backend: NVIDIA CUDA - Benchmark: Double Precision (GFLOPS, More Is Better): pts-config-gpu-30-05-2024 = 7814.50; SE +/- 160.94, N = 15; (CXX) g++ options: -lm -lstdc++ -lOpenCL -lrt -O2
Mixbench 2020-06-23, Backend: NVIDIA CUDA - Benchmark: Single Precision (GFLOPS, More Is Better): pts-config-gpu-30-05-2024 = 15201.95; SE +/- 290.53, N = 12; (CXX) g++ options: -lm -lstdc++ -lOpenCL -lrt -O2
NCNN 20230517, Target: Vulkan GPU - Model: mobilenet (ms, Fewer Is Better): pts-config-gpu-30-05-2024 = 39.73; SE +/- 1.15, N = 9; MIN: 29.4 / MAX: 277.23
NCNN 20230517, Target: Vulkan GPU - Model: mobilenet-v2 (ms, Fewer Is Better): pts-config-gpu-30-05-2024 = 18.97; SE +/- 3.03, N = 9; MIN: 10.12 / MAX: 542.78
NCNN 20230517, Target: Vulkan GPU - Model: mobilenet-v3 (ms, Fewer Is Better): pts-config-gpu-30-05-2024 = 18.30; SE +/- 1.67, N = 9; MIN: 9.64 / MAX: 63.36
NCNN 20230517, Target: Vulkan GPU - Model: shufflenet-v2 (ms, Fewer Is Better): pts-config-gpu-30-05-2024 = 21.59; SE +/- 1.17, N = 9; MIN: 12.52 / MAX: 248.51
NCNN 20230517, Target: Vulkan GPU - Model: mnasnet (ms, Fewer Is Better): pts-config-gpu-30-05-2024 = 15.40; SE +/- 1.11, N = 9; MIN: 8.91 / MAX: 466.02
NCNN 20230517, Target: Vulkan GPU - Model: efficientnet-b0 (ms, Fewer Is Better): pts-config-gpu-30-05-2024 = 23.16; SE +/- 2.28, N = 9; MIN: 12.29 / MAX: 221.56
NCNN 20230517, Target: Vulkan GPU - Model: blazeface (ms, Fewer Is Better): pts-config-gpu-30-05-2024 = 8.97; SE +/- 0.98, N = 9; MIN: 4.17 / MAX: 172.12
NCNN 20230517, Target: Vulkan GPU - Model: googlenet (ms, Fewer Is Better): pts-config-gpu-30-05-2024 = 38.14; SE +/- 1.64, N = 9; MIN: 27.74 / MAX: 523.7
NCNN 20230517, Target: Vulkan GPU - Model: vgg16 (ms, Fewer Is Better): pts-config-gpu-30-05-2024 = 124.62; SE +/- 5.02, N = 9; MIN: 66.23 / MAX: 682.25
NCNN 20230517, Target: Vulkan GPU - Model: resnet18 (ms, Fewer Is Better): pts-config-gpu-30-05-2024 = 28.99; SE +/- 1.18, N = 9; MIN: 19.8 / MAX: 221.98
NCNN 20230517, Target: Vulkan GPU - Model: alexnet (ms, Fewer Is Better): pts-config-gpu-30-05-2024 = 17.89; SE +/- 1.37, N = 9; MIN: 13.81 / MAX: 182.2
NCNN 20230517, Target: Vulkan GPU - Model: resnet50 (ms, Fewer Is Better): pts-config-gpu-30-05-2024 = 58.77; SE +/- 1.60, N = 9; MIN: 44.97 / MAX: 337.56
NCNN 20230517, Target: Vulkan GPU - Model: mobilenetv2-yolov3 (ms, Fewer Is Better): pts-config-gpu-30-05-2024 = 39.73; SE +/- 1.15, N = 9; MIN: 29.4 / MAX: 277.23
NCNN 20230517, Target: Vulkan GPU - Model: yolov4-tiny (ms, Fewer Is Better): pts-config-gpu-30-05-2024 = 70.32; SE +/- 2.29, N = 9; MIN: 50.39 / MAX: 562.92
NCNN 20230517, Target: Vulkan GPU - Model: squeezenet_ssd (ms, Fewer Is Better): pts-config-gpu-30-05-2024 = 37.74; SE +/- 1.52, N = 9; MIN: 27.11 / MAX: 623.11
NCNN 20230517, Target: Vulkan GPU - Model: regnety_400m (ms, Fewer Is Better): pts-config-gpu-30-05-2024 = 41.64; SE +/- 2.44, N = 9; MIN: 28.21 / MAX: 229.16
NCNN 20230517, Target: Vulkan GPU - Model: vision_transformer (ms, Fewer Is Better): pts-config-gpu-30-05-2024 = 156.26; SE +/- 3.66, N = 9; MIN: 137.55 / MAX: 893.71
NCNN 20230517, Target: Vulkan GPU - Model: FastestDet (ms, Fewer Is Better): pts-config-gpu-30-05-2024 = 18.25; SE +/- 1.79, N = 9; MIN: 11.3 / MAX: 482.82
Rodinia 3.1, Test: OpenCL Particle Filter (Seconds, Fewer Is Better): pts-config-gpu-30-05-2024 = 3.302; SE +/- 0.027, N = 13; (CXX) g++ options: -m64 -lm -lcuda -lcudart -lcudadevrt -lcudart_static -lrt -lpthread -ldl
ViennaCL 1.7.1, Test: CPU BLAS - sCOPY (GB/s, More Is Better): pts-config-gpu-30-05-2024 = 470; SE +/- 17.64, N = 15
ViennaCL 1.7.1, Test: CPU BLAS - sAXPY (GB/s, More Is Better): pts-config-gpu-30-05-2024 = 736; SE +/- 25.06, N = 15
ViennaCL 1.7.1, Test: CPU BLAS - sDOT (GB/s, More Is Better): pts-config-gpu-30-05-2024 = 418; SE +/- 1.84, N = 15
ViennaCL 1.7.1, Test: CPU BLAS - dCOPY (GB/s, More Is Better): pts-config-gpu-30-05-2024 = 742; SE +/- 36.07, N = 15
ViennaCL 1.7.1, Test: CPU BLAS - dAXPY (GB/s, More Is Better): pts-config-gpu-30-05-2024 = 948; SE +/- 42.05, N = 15
ViennaCL 1.7.1, Test: CPU BLAS - dDOT (GB/s, More Is Better): pts-config-gpu-30-05-2024 = 627; SE +/- 19.94, N = 15
ViennaCL 1.7.1, Test: CPU BLAS - dGEMV-N (GB/s, More Is Better): pts-config-gpu-30-05-2024 = 157.2; SE +/- 6.04, N = 15
ViennaCL 1.7.1, Test: CPU BLAS - dGEMV-T (GB/s, More Is Better): pts-config-gpu-30-05-2024 = 450; SE +/- 5.98, N = 15
ViennaCL 1.7.1, Test: CPU BLAS - dGEMM-NN (GFLOPs/s, More Is Better): pts-config-gpu-30-05-2024 = 74.6; SE +/- 0.41, N = 15
ViennaCL 1.7.1, Test: CPU BLAS - dGEMM-NT (GFLOPs/s, More Is Better): pts-config-gpu-30-05-2024 = 68.8; SE +/- 0.48, N = 15
ViennaCL 1.7.1, Test: CPU BLAS - dGEMM-TN (GFLOPs/s, More Is Better): pts-config-gpu-30-05-2024 = 85.1; SE +/- 0.59, N = 15
ViennaCL 1.7.1, Test: CPU BLAS - dGEMM-TT (GFLOPs/s, More Is Better): pts-config-gpu-30-05-2024 = 76.3; SE +/- 0.79, N = 15
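
Each result above reports a standard error (SE) alongside the mean. One rough way to compare run-to-run stability across tests with very different units is to express the SE as a percentage of the mean. The sketch below does this for four values copied from the results; the 5% cut-off and the names `relative_se` and `NOISY_THRESHOLD_PCT` are illustrative assumptions, not Phoronix Test Suite settings.

```python
# Relative standard error (SE as a percentage of the mean) for a few results
# copied from the listing above. Each entry is (mean, standard error).
results = {
    "Hashcat MD5 (H/s)": (170625637500, 26429817415.14),
    "Hashcat SHA1 (H/s)": (88126433333, 104285670.69),
    "cl-mem Copy (GB/s)": (231.7, 0.03),
    "NCNN mobilenet-v2 (ms)": (18.97, 3.03),
}

# Hypothetical cut-off for calling a result "noisy"; chosen for illustration only.
NOISY_THRESHOLD_PCT = 5.0


def relative_se(mean, se):
    """Return the standard error as a percentage of the mean."""
    return 100.0 * se / mean


noisy = []
for name, (mean, se) in results.items():
    pct = relative_se(mean, se)
    label = "noisy" if pct > NOISY_THRESHOLD_PCT else "stable"
    if label == "noisy":
        noisy.append(name)
    print(f"{name}: {pct:.2f}% ({label})")
```

With these inputs, the Hashcat MD5 and NCNN mobilenet-v2 results exceed the assumed threshold, while Hashcat SHA1 and cl-mem Copy stay well below it.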
Phoronix Test Suite v10.8.5