opencl benchmark smoke test
ARMv8 Neoverse-V2 testing with a Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G MB (CG1) (3A06 BIOS) and NVIDIA GH200 480GB on Ubuntu 22.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2402266-NE-OPENCLBEN88&grr&rdt
System details (runs a, b, c, d, e, f)
  Processor: ARMv8 Neoverse-V2 @ 3.39GHz (72 Cores)
  Motherboard: Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G MB (CG1) (3A06 BIOS)
  Memory: 1 x 480GB DRAM-6400MT/s
  Disk: 960GB SAMSUNG MZ1L2960HCJR-00A07 + 1920GB SAMSUNG MZTL21T9
  Graphics: NVIDIA GH200 480GB
  Network: 2 x Mellanox MT2910 + 2 x QLogic FastLinQ QL41000 10/25/40/50GbE
  OS: Ubuntu 22.04
  Kernel: 6.5.0-1007-NVIDIA-64k (aarch64)
  Display Driver: NVIDIA
  OpenCL: OpenCL 3.0 CUDA 12.4.89
  Vulkan: 1.3.277
  Compiler: GCC 11.4.0 + CUDA 11.5
  File-System: ext4
  Screen Resolution: 1920x1200
Kernel Details - Transparent Huge Pages: madvise
Compiler Details - --build=aarch64-linux-gnu --disable-libquadmath --disable-libquadmath-support --disable-werror --enable-bootstrap --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-fix-cortex-a53-843419 --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-nls --enable-objc-gc=auto --enable-plugin --enable-shared --enable-threads=posix --host=aarch64-linux-gnu --program-prefix=aarch64-linux-gnu- --target=aarch64-linux-gnu --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-target-system-zlib=auto -v
Processor Details - Scaling Governor: cppc_cpufreq performance (Boost: Disabled)
Graphics Details - BAR1 / Visible vRAM Size: N/A - vBIOS Version: 96.00.7e.00.02
Security Details - gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of __user pointer sanitization + spectre_v2: Not affected + srbds: Not affected + tsx_async_abort: Not affected
Results summary
Test                                                a          b          c          d          e          f
blender: BMW27 - CUDA                               30.77      -          -          -          -          30.82
opencl-benchmark: Memory Bandwidth Coalesced Write  3688.04    3733.62    3707.63    3684.02    3690.92    3731.96
opencl-benchmark: Memory Bandwidth Coalesced Read   3614.32    3616.79    3617.16    3616.23    3615.04    3612.22
opencl-benchmark: INT8 Compute                      30.623     30.617     30.633     30.62      30.627     30.627
opencl-benchmark: INT16 Compute                     30.941     30.934     30.933     30.934     30.95      30.934
opencl-benchmark: INT32 Compute                     32.991     32.987     32.991     32.996     32.984     32.987
opencl-benchmark: INT64 Compute                     3.248      3.248      3.248      3.245      3.247      3.247
opencl-benchmark: FP32 Compute                      62.966     62.961     62.975     62.953     62.987     62.946
opencl-benchmark: FP64 Compute                      31.020     31.001     30.993     31         31.001     31.022
financebench: Monte-Carlo OpenCL                    58.974668  59.186001  59.507999  59.050999  58.950001  59.888
clpeak: Transfer Bandwidth enqueueReadBuffer        295.59     295.58     295.51     295.33     295.22     295.4
clpeak: Transfer Bandwidth enqueueWriteBuffer       379.83     379.79     379.65     379.55     379.32     379.43
cl-mem: Copy                                        308.4      308.5      308.6      308.5      308.5      308.5
cl-mem: Write                                       2359.3     2362.6     2358.1     2361.9     2360.2     2361.6
cl-mem: Read                                        1045.3     1046       1045.9     1046.3     1046.1     1046.2
arrayfire: Conjugate Gradient OpenCL                2.998      3.002      3.001      2.987      2.99       2.986
clpeak: Global Memory Bandwidth                     3485.68    3486.17    3484.48    3486.17    3485.5     3486.06
clpeak: Single-Precision Compute                    64543.49   64543.33   64546.27   64546.76   64538.91   64541.86
clpeak: Integer 24-bit Compute                      33101.40   33102.77   33102.51   33103.8    33101.22   33102
clpeak: Integer Compute                             33143.19   33132.95   33144.05   33118.5    33116.43   33144.05
financebench: Black-Scholes OpenCL                  4.318      4.328      4.317      4.305      4.33       4.293
clpeak: Kernel Latency                              4.79       4.78       4.77       4.79       4.8        4.78
clpeak: Double-Precision Compute                    32963.76   32959.84   32960.86   32960.36   32948.1    32958.31
Blender
  Blend File: BMW27 - Compute: CUDA
  Seconds, Fewer Is Better
  a: 30.77  f: 30.82
  SE +/- 0.16, N = 3
ProjectPhysX OpenCL-Benchmark 1.2
  Operation: Memory Bandwidth Coalesced Write
  GB/s, More Is Better
  a: 3688.04  b: 3733.62  c: 3707.63  d: 3684.02  e: 3690.92  f: 3731.96
  SE +/- 12.89, N = 3
  (CXX) g++ options: -std=c++17 -pthread -lOpenCL
ProjectPhysX OpenCL-Benchmark 1.2
  Operation: Memory Bandwidth Coalesced Read
  GB/s, More Is Better
  a: 3614.32  b: 3616.79  c: 3617.16  d: 3616.23  e: 3615.04  f: 3612.22
  SE +/- 0.88, N = 3
  (CXX) g++ options: -std=c++17 -pthread -lOpenCL
ProjectPhysX OpenCL-Benchmark 1.2
  Operation: INT8 Compute
  TIOPs/s, More Is Better
  a: 30.62  b: 30.62  c: 30.63  d: 30.62  e: 30.63  f: 30.63
  SE +/- 0.00, N = 3
  (CXX) g++ options: -std=c++17 -pthread -lOpenCL
ProjectPhysX OpenCL-Benchmark 1.2
  Operation: INT16 Compute
  TIOPs/s, More Is Better
  a: 30.94  b: 30.93  c: 30.93  d: 30.93  e: 30.95  f: 30.93
  SE +/- 0.01, N = 3
  (CXX) g++ options: -std=c++17 -pthread -lOpenCL
ProjectPhysX OpenCL-Benchmark 1.2
  Operation: INT32 Compute
  TIOPs/s, More Is Better
  a: 32.99  b: 32.99  c: 32.99  d: 33.00  e: 32.98  f: 32.99
  SE +/- 0.00, N = 3
  (CXX) g++ options: -std=c++17 -pthread -lOpenCL
ProjectPhysX OpenCL-Benchmark 1.2
  Operation: INT64 Compute
  TIOPs/s, More Is Better
  a: 3.248  b: 3.248  c: 3.248  d: 3.245  e: 3.247  f: 3.247
  SE +/- 0.002, N = 3
  (CXX) g++ options: -std=c++17 -pthread -lOpenCL
ProjectPhysX OpenCL-Benchmark 1.2
  Operation: FP32 Compute
  TFLOPs/s, More Is Better
  a: 62.97  b: 62.96  c: 62.98  d: 62.95  e: 62.99  f: 62.95
  SE +/- 0.01, N = 3
  (CXX) g++ options: -std=c++17 -pthread -lOpenCL
ProjectPhysX OpenCL-Benchmark 1.2
  Operation: FP64 Compute
  TFLOPs/s, More Is Better
  a: 31.02  b: 31.00  c: 30.99  d: 31.00  e: 31.00  f: 31.02
  SE +/- 0.00, N = 3
  (CXX) g++ options: -std=c++17 -pthread -lOpenCL
FinanceBench 2016-07-25
  Benchmark: Monte-Carlo OpenCL
  ms, Fewer Is Better
  a: 58.97  b: 59.19  c: 59.51  d: 59.05  e: 58.95  f: 59.89
  SE +/- 0.06, N = 3
  (CXX) g++ options: -O3 -march=native -fopenmp
clpeak 1.1.2
  OpenCL Test: Transfer Bandwidth enqueueReadBuffer
  GBPS, More Is Better
  a: 295.59  b: 295.58  c: 295.51  d: 295.33  e: 295.22  f: 295.40
  SE +/- 0.04, N = 3
  (CXX) g++ options: -O3
clpeak 1.1.2
  OpenCL Test: Transfer Bandwidth enqueueWriteBuffer
  GBPS, More Is Better
  a: 379.83  b: 379.79  c: 379.65  d: 379.55  e: 379.32  f: 379.43
  SE +/- 0.04, N = 3
  (CXX) g++ options: -O3
cl-mem 2017-01-13
  Benchmark: Copy
  GB/s, More Is Better
  a: 308.4  b: 308.5  c: 308.6  d: 308.5  e: 308.5  f: 308.5
  SE +/- 0.07, N = 3
  (CC) gcc options: -O2 -flto -lOpenCL
cl-mem 2017-01-13
  Benchmark: Write
  GB/s, More Is Better
  a: 2359.3  b: 2362.6  c: 2358.1  d: 2361.9  e: 2360.2  f: 2361.6
  SE +/- 0.67, N = 3
  (CC) gcc options: -O2 -flto -lOpenCL
cl-mem 2017-01-13
  Benchmark: Read
  GB/s, More Is Better
  a: 1045.3  b: 1046.0  c: 1045.9  d: 1046.3  e: 1046.1  f: 1046.2
  SE +/- 0.06, N = 3
  (CC) gcc options: -O2 -flto -lOpenCL
ArrayFire 3.9
  Test: Conjugate Gradient OpenCL
  ms, Fewer Is Better
  a: 2.998  b: 3.002  c: 3.001  d: 2.987  e: 2.990  f: 2.986
  SE +/- 0.004, N = 3
  (CXX) g++ options: -O3
clpeak 1.1.2
  OpenCL Test: Global Memory Bandwidth
  GBPS, More Is Better
  a: 3485.68  b: 3486.17  c: 3484.48  d: 3486.17  e: 3485.50  f: 3486.06
  SE +/- 0.36, N = 3
  (CXX) g++ options: -O3
clpeak 1.1.2
  OpenCL Test: Single-Precision Compute
  GFLOPS, More Is Better
  a: 64543.49  b: 64543.33  c: 64546.27  d: 64546.76  e: 64538.91  f: 64541.86
  SE +/- 1.66, N = 3
  (CXX) g++ options: -O3
clpeak 1.1.2
  OpenCL Test: Integer 24-bit Compute
  GIOPS, More Is Better
  a: 33101.40  b: 33102.77  c: 33102.51  d: 33103.80  e: 33101.22  f: 33102.00
  SE +/- 1.71, N = 3
  (CXX) g++ options: -O3
clpeak 1.1.2
  OpenCL Test: Integer Compute
  GIOPS, More Is Better
  a: 33143.19  b: 33132.95  c: 33144.05  d: 33118.50  e: 33116.43  f: 33144.05
  SE +/- 0.38, N = 3
  (CXX) g++ options: -O3
FinanceBench 2016-07-25
  Benchmark: Black-Scholes OpenCL
  ms, Fewer Is Better
  a: 4.318  b: 4.328  c: 4.317  d: 4.305  e: 4.330  f: 4.293
  SE +/- 0.006, N = 3
  (CXX) g++ options: -O3 -march=native -fopenmp
clpeak 1.1.2
  OpenCL Test: Kernel Latency
  us, Fewer Is Better
  a: 4.79  b: 4.78  c: 4.77  d: 4.79  e: 4.80  f: 4.78
  SE +/- 0.01, N = 3
  (CXX) g++ options: -O3
clpeak 1.1.2
  OpenCL Test: Double-Precision Compute
  GFLOPS, More Is Better
  a: 32963.76  b: 32959.84  c: 32960.86  d: 32960.36  e: 32948.10  f: 32958.31
  SE +/- 1.90, N = 3
  (CXX) g++ options: -O3
Phoronix Test Suite v10.8.5