opencl benchmark smoke test
ARMv8 Neoverse-V2 testing with a Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G MB (CG1) (3A06 BIOS) and NVIDIA GH200 480GB on Ubuntu 22.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2402266-NE-OPENCLBEN88&grw&rdt
System details (identical for runs a, b, c, d, e, f)

Processor: ARMv8 Neoverse-V2 @ 3.39GHz (72 Cores)
Motherboard: Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G MB (CG1) (3A06 BIOS)
Memory: 1 x 480GB DRAM-6400MT/s
Disk: 960GB SAMSUNG MZ1L2960HCJR-00A07 + 1920GB SAMSUNG MZTL21T9
Graphics: NVIDIA GH200 480GB
Network: 2 x Mellanox MT2910 + 2 x QLogic FastLinQ QL41000 10/25/40/50GbE
OS: Ubuntu 22.04
Kernel: 6.5.0-1007-NVIDIA-64k (aarch64)
Display Driver: NVIDIA
OpenCL: OpenCL 3.0 CUDA 12.4.89
Vulkan: 1.3.277
Compiler: GCC 11.4.0 + CUDA 11.5
File-System: ext4
Screen Resolution: 1920x1200

Kernel Details - Transparent Huge Pages: madvise
Compiler Details - --build=aarch64-linux-gnu --disable-libquadmath --disable-libquadmath-support --disable-werror --enable-bootstrap --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-fix-cortex-a53-843419 --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-nls --enable-objc-gc=auto --enable-plugin --enable-shared --enable-threads=posix --host=aarch64-linux-gnu --program-prefix=aarch64-linux-gnu- --target=aarch64-linux-gnu --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-target-system-zlib=auto -v
Processor Details - Scaling Governor: cppc_cpufreq performance (Boost: Disabled)
Graphics Details - BAR1 / Visible vRAM Size: N/A - vBIOS Version: 96.00.7e.00.02
Security Details - gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of __user pointer sanitization + spectre_v2: Not affected + srbds: Not affected + tsx_async_abort: Not affected
Result overview (one column per run; "-" means the test was not run in that configuration)

Test                                                a          b          c          d          e          f
opencl-benchmark: FP32 Compute                      62.966     62.961     62.975     62.953     62.987     62.946
opencl-benchmark: Memory Bandwidth Coalesced Write  3688.04    3733.62    3707.63    3684.02    3690.92    3731.96
opencl-benchmark: INT8 Compute                      30.623     30.617     30.633     30.62      30.627     30.627
opencl-benchmark: Memory Bandwidth Coalesced Read   3614.32    3616.79    3617.16    3616.23    3615.04    3612.22
opencl-benchmark: INT16 Compute                     30.941     30.934     30.933     30.934     30.95      30.934
opencl-benchmark: INT32 Compute                     32.991     32.987     32.991     32.996     32.984     32.987
opencl-benchmark: INT64 Compute                     3.248      3.248      3.248      3.245      3.247      3.247
arrayfire: Conjugate Gradient OpenCL                2.998      3.002      3.001      2.987      2.99       2.986
financebench: Monte-Carlo OpenCL                    58.974668  59.186001  59.507999  59.050999  58.950001  59.888
financebench: Black-Scholes OpenCL                  4.318      4.328      4.317      4.305      4.33       4.293
cl-mem: Copy                                        308.4      308.5      308.6      308.5      308.5      308.5
cl-mem: Read                                        1045.3     1046       1045.9     1046.3     1046.1     1046.2
cl-mem: Write                                       2359.3     2362.6     2358.1     2361.9     2360.2     2361.6
clpeak: Kernel Latency                              4.79       4.78       4.77       4.79       4.8        4.78
clpeak: Integer Compute                             33143.19   33132.95   33144.05   33118.5    33116.43   33144.05
clpeak: Integer 24-bit Compute                      33101.40   33102.77   33102.51   33103.8    33101.22   33102
clpeak: Global Memory Bandwidth                     3485.68    3486.17    3484.48    3486.17    3485.5     3486.06
clpeak: Double-Precision Compute                    32963.76   32959.84   32960.86   32960.36   32948.1    32958.31
clpeak: Single-Precision Compute                    64543.49   64543.33   64546.27   64546.76   64538.91   64541.86
clpeak: Transfer Bandwidth enqueueReadBuffer        295.59     295.58     295.51     295.33     295.22     295.4
clpeak: Transfer Bandwidth enqueueWriteBuffer       379.83     379.79     379.65     379.55     379.32     379.43
blender: BMW27 - CUDA                               30.77      -          -          -          -          30.82
opencl-benchmark: FP64 Compute                      31.020     31.001     30.993     31         31.001     31.022
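Since runs a through f use the same hardware and software, the overview is mostly a check on run-to-run repeatability. The following is a minimal sketch of how that spread could be quantified for one test, using the coalesced-write bandwidth values from the overview. The helper name `spread_percent` and the choice of (max - min) / mean as the metric are illustrative assumptions, not something the Phoronix Test Suite reports.

```python
from statistics import mean

# Coalesced-write bandwidth (GB/s) for runs a-f, copied from the result overview.
coalesced_write = {
    "a": 3688.04, "b": 3733.62, "c": 3707.63,
    "d": 3684.02, "e": 3690.92, "f": 3731.96,
}

def spread_percent(results):
    """Return (max - min) / mean of the run values, as a percentage.

    Hypothetical helper: one simple way to express run-to-run variation.
    """
    values = list(results.values())
    return (max(values) - min(values)) / mean(values) * 100.0

spread_pct = spread_percent(coalesced_write)
print(f"run-to-run spread: {spread_pct:.2f}%")
```

For this test the spread comes out a little above 1%, which makes it the noisiest of the bandwidth results; the same function can be applied to any other row of the overview.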
ProjectPhysX OpenCL-Benchmark 1.2, Operation: FP32 Compute (TFLOPs/s, More Is Better). a: 62.97, b: 62.96, c: 62.98, d: 62.95, e: 62.99, f: 62.95. SE +/- 0.01, N = 3. (CXX) g++ options: -std=c++17 -pthread -lOpenCL
ProjectPhysX OpenCL-Benchmark 1.2, Operation: Memory Bandwidth Coalesced Write (GB/s, More Is Better). a: 3688.04, b: 3733.62, c: 3707.63, d: 3684.02, e: 3690.92, f: 3731.96. SE +/- 12.89, N = 3. (CXX) g++ options: -std=c++17 -pthread -lOpenCL
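Each result carries an "SE +/- x, N = 3" annotation. Assuming this is the usual standard error of the mean (sample standard deviation divided by the square root of the trial count), the sketch below shows the calculation and back-computes the trial-to-trial standard deviation implied by the reported SE of 12.89 GB/s. The individual trial values are not part of this export, so none are shown; the function names are illustrative.

```python
import math
from statistics import stdev

def standard_error(samples):
    """Standard error of the mean: sample standard deviation / sqrt(N)."""
    return stdev(samples) / math.sqrt(len(samples))

def implied_stdev(se, n):
    """Invert the formula above: the standard deviation implied by an SE over n trials."""
    return se * math.sqrt(n)

# Values reported for the coalesced-write test: SE +/- 12.89, N = 3.
reported_se = 12.89
trials = 3
implied_sd = implied_stdev(reported_se, trials)
print(f"implied trial-to-trial standard deviation: {implied_sd:.2f} GB/s")
```

Under that assumption, the three trials behind the annotated result varied by roughly 22 GB/s, well under 1% of the measured bandwidth.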
ProjectPhysX OpenCL-Benchmark 1.2, Operation: INT8 Compute (TIOPs/s, More Is Better). a: 30.62, b: 30.62, c: 30.63, d: 30.62, e: 30.63, f: 30.63. SE +/- 0.00, N = 3. (CXX) g++ options: -std=c++17 -pthread -lOpenCL
ProjectPhysX OpenCL-Benchmark 1.2, Operation: Memory Bandwidth Coalesced Read (GB/s, More Is Better). a: 3614.32, b: 3616.79, c: 3617.16, d: 3616.23, e: 3615.04, f: 3612.22. SE +/- 0.88, N = 3. (CXX) g++ options: -std=c++17 -pthread -lOpenCL
ProjectPhysX OpenCL-Benchmark 1.2, Operation: INT16 Compute (TIOPs/s, More Is Better). a: 30.94, b: 30.93, c: 30.93, d: 30.93, e: 30.95, f: 30.93. SE +/- 0.01, N = 3. (CXX) g++ options: -std=c++17 -pthread -lOpenCL
ProjectPhysX OpenCL-Benchmark 1.2, Operation: INT32 Compute (TIOPs/s, More Is Better). a: 32.99, b: 32.99, c: 32.99, d: 33.00, e: 32.98, f: 32.99. SE +/- 0.00, N = 3. (CXX) g++ options: -std=c++17 -pthread -lOpenCL
ProjectPhysX OpenCL-Benchmark 1.2, Operation: INT64 Compute (TIOPs/s, More Is Better). a: 3.248, b: 3.248, c: 3.248, d: 3.245, e: 3.247, f: 3.247. SE +/- 0.002, N = 3. (CXX) g++ options: -std=c++17 -pthread -lOpenCL
ArrayFire 3.9, Test: Conjugate Gradient OpenCL (ms, Fewer Is Better). a: 2.998, b: 3.002, c: 3.001, d: 2.987, e: 2.990, f: 2.986. SE +/- 0.004, N = 3. (CXX) g++ options: -O3
FinanceBench 2016-07-25, Benchmark: Monte-Carlo OpenCL (ms, Fewer Is Better). a: 58.97, b: 59.19, c: 59.51, d: 59.05, e: 58.95, f: 59.89. SE +/- 0.06, N = 3. (CXX) g++ options: -O3 -march=native -fopenmp
FinanceBench 2016-07-25, Benchmark: Black-Scholes OpenCL (ms, Fewer Is Better). a: 4.318, b: 4.328, c: 4.317, d: 4.305, e: 4.330, f: 4.293. SE +/- 0.006, N = 3. (CXX) g++ options: -O3 -march=native -fopenmp
cl-mem 2017-01-13, Benchmark: Copy (GB/s, More Is Better). a: 308.4, b: 308.5, c: 308.6, d: 308.5, e: 308.5, f: 308.5. SE +/- 0.07, N = 3. (CC) gcc options: -O2 -flto -lOpenCL
cl-mem 2017-01-13, Benchmark: Read (GB/s, More Is Better). a: 1045.3, b: 1046.0, c: 1045.9, d: 1046.3, e: 1046.1, f: 1046.2. SE +/- 0.06, N = 3. (CC) gcc options: -O2 -flto -lOpenCL
cl-mem 2017-01-13, Benchmark: Write (GB/s, More Is Better). a: 2359.3, b: 2362.6, c: 2358.1, d: 2361.9, e: 2360.2, f: 2361.6. SE +/- 0.67, N = 3. (CC) gcc options: -O2 -flto -lOpenCL
clpeak 1.1.2, OpenCL Test: Kernel Latency (us, Fewer Is Better). a: 4.79, b: 4.78, c: 4.77, d: 4.79, e: 4.80, f: 4.78. SE +/- 0.01, N = 3. (CXX) g++ options: -O3
clpeak 1.1.2, OpenCL Test: Integer Compute (GIOPS, More Is Better). a: 33143.19, b: 33132.95, c: 33144.05, d: 33118.50, e: 33116.43, f: 33144.05. SE +/- 0.38, N = 3. (CXX) g++ options: -O3
clpeak 1.1.2, OpenCL Test: Integer 24-bit Compute (GIOPS, More Is Better). a: 33101.40, b: 33102.77, c: 33102.51, d: 33103.80, e: 33101.22, f: 33102.00. SE +/- 1.71, N = 3. (CXX) g++ options: -O3
clpeak 1.1.2, OpenCL Test: Global Memory Bandwidth (GBPS, More Is Better). a: 3485.68, b: 3486.17, c: 3484.48, d: 3486.17, e: 3485.50, f: 3486.06. SE +/- 0.36, N = 3. (CXX) g++ options: -O3
clpeak 1.1.2, OpenCL Test: Double-Precision Compute (GFLOPS, More Is Better). a: 32963.76, b: 32959.84, c: 32960.86, d: 32960.36, e: 32948.10, f: 32958.31. SE +/- 1.90, N = 3. (CXX) g++ options: -O3
clpeak 1.1.2, OpenCL Test: Single-Precision Compute (GFLOPS, More Is Better). a: 64543.49, b: 64543.33, c: 64546.27, d: 64546.76, e: 64538.91, f: 64541.86. SE +/- 1.66, N = 3. (CXX) g++ options: -O3
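clpeak reports single-precision throughput in GFLOPS, while the ProjectPhysX OpenCL-Benchmark FP32 result is in TFLOPs/s, so the two are easy to misread side by side. The sketch below converts the run "a" values to a common unit and computes the relative difference. It is only a unit cross-check; the two tools use different kernels, so agreement is not guaranteed. Variable names are illustrative.

```python
GFLOPS_PER_TFLOPS = 1000.0

# Run "a" values copied from the results in this document.
clpeak_sp_gflops = 64543.49            # clpeak: Single-Precision Compute (GFLOPS)
opencl_benchmark_fp32_tflops = 62.966  # opencl-benchmark: FP32 Compute (TFLOPs/s)

# Bring both figures to TFLOPS before comparing.
clpeak_sp_tflops = clpeak_sp_gflops / GFLOPS_PER_TFLOPS

# Relative difference of clpeak versus OpenCL-Benchmark, in percent.
difference_pct = (
    (clpeak_sp_tflops - opencl_benchmark_fp32_tflops)
    / opencl_benchmark_fp32_tflops
    * 100.0
)
print(f"clpeak: {clpeak_sp_tflops:.2f} TFLOPS, "
      f"OpenCL-Benchmark: {opencl_benchmark_fp32_tflops:.2f} TFLOPS, "
      f"difference: {difference_pct:.1f}%")
```

Once expressed in the same unit, the two single-precision measurements land within a few percent of each other.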
clpeak 1.1.2, OpenCL Test: Transfer Bandwidth enqueueReadBuffer (GBPS, More Is Better). a: 295.59, b: 295.58, c: 295.51, d: 295.33, e: 295.22, f: 295.40. SE +/- 0.04, N = 3. (CXX) g++ options: -O3
clpeak 1.1.2, OpenCL Test: Transfer Bandwidth enqueueWriteBuffer (GBPS, More Is Better). a: 379.83, b: 379.79, c: 379.65, d: 379.55, e: 379.32, f: 379.43. SE +/- 0.04, N = 3. (CXX) g++ options: -O3
Blender, Blend File: BMW27, Compute: CUDA (Seconds, Fewer Is Better). a: 30.77, f: 30.82. SE +/- 0.16, N = 3.
ProjectPhysX OpenCL-Benchmark 1.2, Operation: FP64 Compute (TFLOPs/s, More Is Better). a: 31.02, b: 31.00, c: 30.99, d: 31.00, e: 31.00, f: 31.02. SE +/- 0.00, N = 3. (CXX) g++ options: -std=c++17 -pthread -lOpenCL
Phoronix Test Suite v10.8.5