AMD EPYC 7313 16-Core testing with a GIGABYTE MZE2-G10-00 v01010101 (M07 BIOS) and ASPEED 45GB on Debian 12 via the Phoronix Test Suite.
heikows3-2023-08-18-nvidia-gpu-compute Processor: AMD EPYC 7313 16-Core @ 3.00GHz (16 Cores / 32 Threads), Motherboard: GIGABYTE MZE2-G10-00 v01010101 (M07 BIOS), Chipset: AMD Starship/Matisse, Memory: 8 x 32 GB DDR4-3200MT/s 36ASF4G72PZ-3G2E7, Disk: 7682GB Micron_7450_MTFDKCC7T6TFR + 1920GB Micron_7450_MTFDKBG1T9TFR, Graphics: ASPEED 45GB, Network: 2 x Intel I350
OS: Debian 12, Kernel: 6.2.16-3-pve (x86_64), Display Server: X Server, Display Driver: NVIDIA, OpenCL: OpenCL 3.0 CUDA 12.2.79, Compiler: GCC 12.2.0, File-System: ext4, Screen Resolution: 640x480
Kernel Notes: Transparent Huge Pages: madvise
Compiler Notes: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-12-bTRWOB/gcc-12-12.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-12-bTRWOB/gcc-12-12.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Notes: Scaling Governor: acpi-cpufreq performance (Boost: Enabled) - CPU Microcode: 0xa00115d
Graphics Notes: BAR1 / Visible vRAM Size: 65536 MiB - vBIOS Version: 95.02.39.00.01
Python Notes: Python 3.11.2
Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: always-on RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
PlaidML This test profile uses PlaidML deep learning framework developed by Intel for offering up various benchmarks. Learn more via the OpenBenchmarking.org test page.
FP16: No - Mode: Training - Network: Mobilenet - Device: OpenCL
heikows3-2023-08-18-nvidia-gpu-compute: The test quit with a non-zero exit status. The test quit with a non-zero exit status. The test quit with a non-zero exit status. E: ./plaidml: line 24: /.local/bin/plaidbench: No such file or directory
FP16: No - Mode: Inference - Network: IMDB LSTM - Device: OpenCL
heikows3-2023-08-18-nvidia-gpu-compute: The test quit with a non-zero exit status. The test quit with a non-zero exit status. The test quit with a non-zero exit status. E: ./plaidml: line 24: /.local/bin/plaidbench: No such file or directory
FP16: No - Mode: Inference - Network: Mobilenet - Device: OpenCL
heikows3-2023-08-18-nvidia-gpu-compute: The test quit with a non-zero exit status. The test quit with a non-zero exit status. The test quit with a non-zero exit status. E: ./plaidml: line 24: /.local/bin/plaidbench: No such file or directory
FP16: Yes - Mode: Inference - Network: Mobilenet - Device: OpenCL
heikows3-2023-08-18-nvidia-gpu-compute: The test quit with a non-zero exit status. The test quit with a non-zero exit status. The test quit with a non-zero exit status. E: ./plaidml: line 24: /.local/bin/plaidbench: No such file or directory
FP16: No - Mode: Inference - Network: DenseNet 201 - Device: OpenCL
heikows3-2023-08-18-nvidia-gpu-compute: The test quit with a non-zero exit status. The test quit with a non-zero exit status. The test quit with a non-zero exit status. E: ./plaidml: line 24: /.local/bin/plaidbench: No such file or directory
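The failing path in the PlaidML errors, /.local/bin/plaidbench, starts at the filesystem root. One plausible reading is that the test script expands something like $HOME/.local/bin/plaidbench and HOME was empty for the root session; this is an assumption, since the script itself is not shown on this page. A minimal Python sketch of that check follows. The helper names plaidbench_path and diagnose are hypothetical.

```python
import os


def plaidbench_path(home: str) -> str:
    """Build the per-user plaidbench path the way a shell would expand
    "$HOME/.local/bin/plaidbench" (assumed layout, not confirmed by the log)."""
    return home + "/.local/bin/plaidbench"


def diagnose(home: str) -> str:
    """Return a one-line diagnosis for the given HOME value."""
    path = plaidbench_path(home)
    if not home:
        return f"HOME is empty, so the lookup becomes {path}"
    if os.path.isfile(path) and os.access(path, os.X_OK):
        return f"found executable at {path}"
    return f"no executable at {path}"


if __name__ == "__main__":
    # Inspect the HOME value of the current session.
    print(diagnose(os.environ.get("HOME", "")))
```

With an empty HOME the helper reproduces exactly the path reported in the error lines above, which is why that explanation is worth ruling out first.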
SHOC Scalable HeterOgeneous Computing The CUDA and OpenCL version of Vetter's Scalable HeterOgeneous Computing benchmark suite. SHOC provides a number of different benchmark programs for evaluating the performance and stability of compute devices. Learn more via the OpenBenchmarking.org test page.
SHOC Scalable HeterOgeneous Computing 2020-04-17 (GFLOPS, More Is Better) - heikows3-2023-08-18-nvidia-gpu-compute
Target: OpenCL - Benchmark: S3D: 527.28 (SE +/- 0.45, N = 3)
(CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi
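Each result on this page carries an "SE +/- x, N = n" annotation. Assuming SE denotes the standard error of the mean over the N runs (sample standard deviation divided by the square root of N), it could be computed as in the sketch below. Whether the Phoronix Test Suite uses the n or n - 1 denominator is not stated here, and the per-run values in the example are hypothetical because individual runs are not listed on this page.

```python
import math
import statistics


def standard_error(samples):
    """Standard error of the mean: sample std dev (n - 1 denominator) / sqrt(n)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    return statistics.stdev(samples) / math.sqrt(n)


if __name__ == "__main__":
    runs = [526.5, 527.3, 528.1]  # hypothetical per-run GFLOPS values
    mean = statistics.mean(runs)
    print(f"{mean:.2f} (SE +/- {standard_error(runs):.2f}, N = {len(runs)})")
```

A small SE relative to the mean, as in the SHOC result, indicates that the runs agree closely; the NCNN results further down show much larger SE values alongside very high MAX latencies.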
NCNN NCNN is a high performance neural network inference framework optimized for mobile and other platforms developed by Tencent. Learn more via the OpenBenchmarking.org test page.
NCNN 20230517 - Target: Vulkan GPU (ms, Fewer Is Better) - heikows3-2023-08-18-nvidia-gpu-compute
Model: mobilenet: 17.00 (SE +/- 2.05, N = 12, MIN: 11.33 / MAX: 1631.33)
Model: mobilenet-v2: 5.27 (SE +/- 0.10, N = 12, MIN: 4.22 / MAX: 95.37)
Model: mobilenet-v3: 6.33 (SE +/- 0.74, N = 11, MIN: 3.96 / MAX: 300.99)
Model: shufflenet-v2: 7.41 (SE +/- 0.63, N = 12, MIN: 4.83 / MAX: 750.45)
Model: mnasnet: 5.33 (SE +/- 0.34, N = 12, MIN: 3.83 / MAX: 270.94)
Model: efficientnet-b0: 7.87 (SE +/- 0.65, N = 12, MIN: 5.8 / MAX: 250.32)
Model: blazeface: 2.52 (SE +/- 0.18, N = 12, MIN: 1.87 / MAX: 171.42)
Model: googlenet: 14.86 (SE +/- 0.92, N = 12, MIN: 10.8 / MAX: 555.15)
Model: vgg16: 30.29 (SE +/- 2.77, N = 12, MIN: 19.27 / MAX: 995.12)
Model: resnet18: 8.71 (SE +/- 0.65, N = 12, MIN: 6.01 / MAX: 281.02)
Model: alexnet: 7.88 (SE +/- 1.51, N = 12, MIN: 4.04 / MAX: 594.05)
Model: resnet50: 16.37 (SE +/- 0.67, N = 12, MIN: 12.02 / MAX: 381.74)
Model: yolov4-tiny: 24.55 (SE +/- 1.50, N = 12, MIN: 17.26 / MAX: 1220.45)
Model: squeezenet_ssd: 12.44 (SE +/- 0.35, N = 12, MIN: 9.33 / MAX: 436.16)
Model: regnety_400m: 15.38 (SE +/- 1.11, N = 12, MIN: 11.62 / MAX: 228.25)
Model: vision_transformer: 98.46 (SE +/- 9.48, N = 12, MIN: 64.31 / MAX: 2287.8)
Model: FastestDet: 8.40 (SE +/- 0.70, N = 11, MIN: 5.61 / MAX: 443.04)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
Rodinia Rodinia is a suite focused upon accelerating compute-intensive applications with accelerators. CUDA, OpenMP, and OpenCL parallel models are supported by the included applications. This profile utilizes select OpenCL, NVIDIA CUDA and OpenMP test binaries at the moment. Learn more via the OpenBenchmarking.org test page.
Rodinia 3.1 (Seconds, Fewer Is Better) - heikows3-2023-08-18-nvidia-gpu-compute
Test: OpenCL Particle Filter: 2.137 (SE +/- 0.021, N = 14)
(CXX) g++ options: -O2 -lOpenCL
LuxCoreRender LuxCoreRender is an open-source 3D physically based renderer formerly known as LuxRender. LuxCoreRender supports CPU-based rendering as well as GPU acceleration via OpenCL, NVIDIA CUDA, and NVIDIA OptiX interfaces. Learn more via the OpenBenchmarking.org test page.
LuxCoreRender 2.6 - Acceleration: GPU (M samples/sec, More Is Better) - heikows3-2023-08-18-nvidia-gpu-compute
Scene: DLSC: 14.66 (SE +/- 0.00, N = 3, MIN: 14.04 / MAX: 14.8)
Scene: Danish Mood: 12.87 (SE +/- 0.18, N = 3, MIN: 3.79 / MAX: 15.91)
Scene: Orange Juice: 12.72 (SE +/- 0.03, N = 3, MIN: 11.03 / MAX: 16.62)
Scene: LuxCore Benchmark: 12.82 (SE +/- 0.02, N = 3, MIN: 3.62 / MAX: 15.6)
Scene: Rainbow Colors and Prism: 27.32 (SE +/- 0.26, N = 3, MIN: 25.15 / MAX: 29.06)
Hashcat 6.2.4 (H/s, More Is Better) - heikows3-2023-08-18-nvidia-gpu-compute
Benchmark: SHA1: 43759733333 (SE +/- 156567582.14, N = 3)
Benchmark: SHA-512: 5317300000 (SE +/- 15332644.91, N = 3)
Benchmark: TrueCrypt RIPEMD160 + XTS: 1464567 (SE +/- 14853.10, N = 3)
Mixbench A benchmark suite for GPUs on mixed operational intensity kernels. Learn more via the OpenBenchmarking.org test page.
Backend: OpenCL - Benchmark: Integer
heikows3-2023-08-18-nvidia-gpu-compute: The test quit with a non-zero exit status. The test quit with a non-zero exit status. The test quit with a non-zero exit status. E: ./mixbench: 3: ./mixbench-ocl-ro: not found
Backend: OpenCL - Benchmark: Double Precision
heikows3-2023-08-18-nvidia-gpu-compute: The test quit with a non-zero exit status. The test quit with a non-zero exit status. The test quit with a non-zero exit status. E: ./mixbench: 3: ./mixbench-ocl-ro: not found
Backend: OpenCL - Benchmark: Single Precision
heikows3-2023-08-18-nvidia-gpu-compute: The test quit with a non-zero exit status. The test quit with a non-zero exit status. The test quit with a non-zero exit status. E: ./mixbench: 3: ./mixbench-ocl-ro: not found
RedShift Demo This is a test of MAXON's RedShift demo build that currently requires NVIDIA GPU acceleration. Learn more via the OpenBenchmarking.org test page.
heikows3-2023-08-18-nvidia-gpu-compute: The test quit with a non-zero exit status. The test quit with a non-zero exit status. The test quit with a non-zero exit status. E: ./redshift: 3: /usr/redshift/bin/redshiftBenchmark: not found
FinanceBench FinanceBench is a collection of financial program benchmarks with support for benchmarking on the GPU via OpenCL and CPU benchmarking with OpenMP. The FinanceBench test cases are focused on the Black-Scholes-Merton Process with Analytic European Option engine, QMC (Sobol) Monte-Carlo method (Equity Option Example), Bonds Fixed-rate bond with flat forward curve, and Repo Securities repurchase agreement. FinanceBench was originally written by the Cavazos Lab at University of Delaware. Learn more via the OpenBenchmarking.org test page.
FinanceBench 2016-07-25 (ms, Fewer Is Better) - heikows3-2023-08-18-nvidia-gpu-compute
Benchmark: Black-Scholes OpenCL: 2.791 (SE +/- 0.004, N = 3)
(CXX) g++ options: -O3 -march=native -fopenmp
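For orientation, the Black-Scholes benchmark prices European options with a closed-form formula. The sketch below is the textbook formula for a European call without dividends, not FinanceBench's actual OpenCL kernel, which is not shown on this page. The function names and the input values are illustrative assumptions.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(spot, strike, rate, vol, years):
    """Textbook Black-Scholes price of a European call (no dividends).

    spot: current asset price, strike: exercise price,
    rate: continuously compounded risk-free rate,
    vol: annualised volatility, years: time to expiry in years.
    """
    sqrt_t = math.sqrt(years)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol * vol) * years) / (vol * sqrt_t)
    d2 = d1 - vol * sqrt_t
    return spot * norm_cdf(d1) - strike * math.exp(-rate * years) * norm_cdf(d2)


if __name__ == "__main__":
    # Illustrative inputs only; they are not taken from the benchmark.
    print(f"{black_scholes_call(100.0, 100.0, 0.05, 0.2, 1.0):.4f}")
```

Each option is priced independently of the others, which is what makes this workload a natural fit for GPU execution via OpenCL.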
cl-mem 2017-01-13 (GB/s, More Is Better) - heikows3-2023-08-18-nvidia-gpu-compute
Benchmark: Read: 697.2 (SE +/- 0.29, N = 3)
Benchmark: Write: 449.8 (SE +/- 0.47, N = 3)
(CC) gcc options: -O2 -flto -lOpenCL
clpeak 1.1.2 (More Is Better) - heikows3-2023-08-18-nvidia-gpu-compute
OpenCL Test: Single-Precision Float: 82945.32 GFLOPS (SE +/- 664.25, N = 3)
OpenCL Test: Double-Precision Double: 1412.32 GFLOPS (SE +/- 0.03, N = 3)
OpenCL Test: Global Memory Bandwidth: 670.25 GBPS (SE +/- 0.03, N = 3)
(CXX) g++ options: -O3
MandelGPU MandelGPU is an OpenCL benchmark and this test runs with the OpenCL rendering float4 kernel with a maximum of 4096 iterations. Learn more via the OpenBenchmarking.org test page.
OpenCL Device: GPU
heikows3-2023-08-18-nvidia-gpu-compute: The test quit with a non-zero exit status. The test quit with a non-zero exit status. The test quit with a non-zero exit status.
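The MandelGPU run produced no result, but the workload named in its description (Mandelbrot rendering capped at 4096 iterations) can be illustrated with a CPU-side sketch. This is the generic escape-time algorithm in plain Python, not MandelGPU's float4 OpenCL kernel; the function name escape_iterations is hypothetical.

```python
MAX_ITERATIONS = 4096  # iteration cap named in the test description


def escape_iterations(c: complex, max_iter: int = MAX_ITERATIONS) -> int:
    """Count iterations of z = z*z + c before |z| exceeds 2.

    Returns max_iter when the point never escapes within the cap,
    i.e. it is treated as belonging to the Mandelbrot set.
    """
    z = 0j
    for i in range(max_iter):
        # Compare squared magnitude against 4 to avoid a square root.
        if z.real * z.real + z.imag * z.imag > 4.0:
            return i
        z = z * z + c
    return max_iter


if __name__ == "__main__":
    for point in (0j, -1 + 0j, 0.3 + 0.5j, 2 + 2j):
        print(point, escape_iterations(point))
```

Points inside the set cost the full 4096 iterations while points far outside exit almost immediately, so per-pixel work varies widely across the image.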
ViennaCL ViennaCL is an open-source linear algebra library written in C++ and with support for OpenCL and OpenMP. This test profile makes use of ViennaCL's built-in benchmarks. Learn more via the OpenBenchmarking.org test page.
ViennaCL 1.7.1 (More Is Better) - heikows3-2023-08-18-nvidia-gpu-compute
Test: CPU BLAS - sCOPY: 764 GB/s (SE +/- 25.16, N = 13)
Test: CPU BLAS - sAXPY: 1042 GB/s (SE +/- 28.96, N = 13)
Test: CPU BLAS - sDOT: 252 GB/s (SE +/- 5.14, N = 13)
Test: CPU BLAS - dCOPY: 274 GB/s (SE +/- 9.27, N = 13)
Test: CPU BLAS - dAXPY: 411 GB/s (SE +/- 14.54, N = 13)
Test: CPU BLAS - dDOT: 313.4 GB/s (SE +/- 26.27, N = 13)
Test: CPU BLAS - dGEMV-N: 202.0 GB/s (SE +/- 17.46, N = 12)
Test: CPU BLAS - dGEMV-T: 233 GB/s (SE +/- 4.86, N = 13)
Test: CPU BLAS - dGEMM-NN: 73.7 GFLOPs/s (SE +/- 0.55, N = 13)
Test: CPU BLAS - dGEMM-NT: 69.5 GFLOPs/s (SE +/- 0.56, N = 13)
Test: CPU BLAS - dGEMM-TN: 74.1 GFLOPs/s (SE +/- 0.35, N = 12)
Test: CPU BLAS - dGEMM-TT: 71.8 GFLOPs/s (SE +/- 0.48, N = 13)
Test: OpenCL BLAS - sCOPY: 879 GB/s (SE +/- 0.00, N = 3)
Test: OpenCL BLAS - sAXPY: 1253 GB/s (SE +/- 3.33, N = 3)
Test: OpenCL BLAS - sDOT: 774 GB/s (SE +/- 0.33, N = 3)
Test: OpenCL BLAS - dCOPY: 546 GB/s (SE +/- 0.33, N = 3)
Test: OpenCL BLAS - dAXPY: 624 GB/s (SE +/- 0.00, N = 3)
Test: OpenCL BLAS - dDOT: 621 GB/s (SE +/- 0.33, N = 3)
Test: OpenCL BLAS - dGEMV-N: 196 GB/s (SE +/- 0.00, N = 3)
Test: OpenCL BLAS - dGEMV-T: 394 GB/s (SE +/- 0.33, N = 3)
Test: OpenCL BLAS - dGEMM-NN: 1190 GFLOPs/s (SE +/- 0.00, N = 3)
Test: OpenCL BLAS - dGEMM-TN: 1180 GFLOPs/s (SE +/- 0.00, N = 3)
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
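The ViennaCL test names follow standard BLAS conventions: an s or d prefix for single or double precision, then the operation, then N or T suffixes marking whether each matrix operand is used as-is or transposed. The pure-Python sketch below only illustrates what each operation computes. It says nothing about how ViennaCL sizes its operands or derives the GB/s and GFLOPs/s figures, which is not shown on this page, and the function signatures are illustrative rather than ViennaCL's API.

```python
def copy(x):
    """COPY: y = x."""
    return list(x)


def axpy(alpha, x, y):
    """AXPY: alpha * x + y, element-wise."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def dot(x, y):
    """DOT: sum of element-wise products."""
    return sum(xi * yi for xi, yi in zip(x, y))


def transpose(a):
    """Swap rows and columns of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def gemv(a, x, trans=False):
    """GEMV-N computes A x; GEMV-T computes A^T x."""
    rows = transpose(a) if trans else a
    return [dot(row, x) for row in rows]


def gemm(a, b, trans_a=False, trans_b=False):
    """GEMM-NN/NT/TN/TT: op(A) op(B), where op optionally transposes."""
    left = transpose(a) if trans_a else a
    right_cols = b if trans_b else transpose(b)  # columns of op(B)
    return [[dot(row, col) for col in right_cols] for row in left]


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]
    b = [[5.0, 6.0], [7.0, 8.0]]
    print(axpy(2.0, [1.0, 2.0], [10.0, 20.0]))
    print(gemv(a, [1.0, 1.0]), gemv(a, [1.0, 1.0], trans=True))
    print(gemm(a, b), gemm(a, b, trans_a=True))
```

COPY, AXPY, DOT, and GEMV do little arithmetic per element read, which fits their being reported in GB/s, while GEMM reuses each operand many times and is reported in GFLOPs/s.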
Testing initiated at 18 August 2023 07:46 by user root.