opencl benchmark smoke test
ARMv8 Neoverse-V2 testing with a Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G MB (CG1) (3A06 BIOS) and NVIDIA GH200 480GB on Ubuntu 22.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2402266-NE-OPENCLBEN88&grs
System details (runs a, b, c, d, e, f)
  Processor: ARMv8 Neoverse-V2 @ 3.39GHz (72 Cores)
  Motherboard: Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G MB (CG1) (3A06 BIOS)
  Memory: 1 x 480GB DRAM-6400MT/s
  Disk: 960GB SAMSUNG MZ1L2960HCJR-00A07 + 1920GB SAMSUNG MZTL21T9
  Graphics: NVIDIA GH200 480GB
  Network: 2 x Mellanox MT2910 + 2 x QLogic FastLinQ QL41000 10/25/40/50GbE
  OS: Ubuntu 22.04
  Kernel: 6.5.0-1007-NVIDIA-64k (aarch64)
  Display Driver: NVIDIA
  OpenCL: OpenCL 3.0 CUDA 12.4.89
  Vulkan: 1.3.277
  Compiler: GCC 11.4.0 + CUDA 11.5
  File-System: ext4
  Screen Resolution: 1920x1200
Kernel Details
  Transparent Huge Pages: madvise
Compiler Details
  --build=aarch64-linux-gnu --disable-libquadmath --disable-libquadmath-support --disable-werror --enable-bootstrap --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-fix-cortex-a53-843419 --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-nls --enable-objc-gc=auto --enable-plugin --enable-shared --enable-threads=posix --host=aarch64-linux-gnu --program-prefix=aarch64-linux-gnu- --target=aarch64-linux-gnu --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-target-system-zlib=auto -v
Processor Details
  Scaling Governor: cppc_cpufreq performance (Boost: Disabled)
Graphics Details
  BAR1 / Visible vRAM Size: N/A
  vBIOS Version: 96.00.7e.00.02
Security Details
  gather_data_sampling: Not affected
  itlb_multihit: Not affected
  l1tf: Not affected
  mds: Not affected
  meltdown: Not affected
  mmio_stale_data: Not affected
  retbleed: Not affected
  spec_rstack_overflow: Not affected
  spec_store_bypass: Mitigation of SSB disabled via prctl
  spectre_v1: Mitigation of __user pointer sanitization
  spectre_v2: Not affected
  srbds: Not affected
  tsx_async_abort: Not affected
Result overview (a dash marks a test with no result for that run)
Test | a | b | c | d | e | f
financebench: Monte-Carlo OpenCL | 58.974668 | 59.186001 | 59.507999 | 59.050999 | 58.950001 | 59.888
opencl-benchmark: Memory Bandwidth Coalesced Write | 3688.04 | 3733.62 | 3707.63 | 3684.02 | 3690.92 | 3731.96
financebench: Black-Scholes OpenCL | 4.318 | 4.328 | 4.317 | 4.305 | 4.33 | 4.293
clpeak: Kernel Latency | 4.79 | 4.78 | 4.77 | 4.79 | 4.8 | 4.78
arrayfire: Conjugate Gradient OpenCL | 2.998 | 3.002 | 3.001 | 2.987 | 2.99 | 2.986
cl-mem: Write | 2359.3 | 2362.6 | 2358.1 | 2361.9 | 2360.2 | 2361.6
blender: BMW27 - CUDA | 30.77 | - | - | - | - | 30.82
opencl-benchmark: Memory Bandwidth Coalesced Read | 3614.32 | 3616.79 | 3617.16 | 3616.23 | 3615.04 | 3612.22
clpeak: Transfer Bandwidth enqueueWriteBuffer | 379.83 | 379.79 | 379.65 | 379.55 | 379.32 | 379.43
clpeak: Transfer Bandwidth enqueueReadBuffer | 295.59 | 295.58 | 295.51 | 295.33 | 295.22 | 295.4
cl-mem: Read | 1045.3 | 1046 | 1045.9 | 1046.3 | 1046.1 | 1046.2
opencl-benchmark: FP64 Compute | 31.020 | 31.001 | 30.993 | 31 | 31.001 | 31.022
opencl-benchmark: INT64 Compute | 3.248 | 3.248 | 3.248 | 3.245 | 3.247 | 3.247
clpeak: Integer Compute | 33143.19 | 33132.95 | 33144.05 | 33118.5 | 33116.43 | 33144.05
opencl-benchmark: FP32 Compute | 62.966 | 62.961 | 62.975 | 62.953 | 62.987 | 62.946
cl-mem: Copy | 308.4 | 308.5 | 308.6 | 308.5 | 308.5 | 308.5
opencl-benchmark: INT16 Compute | 30.941 | 30.934 | 30.933 | 30.934 | 30.95 | 30.934
opencl-benchmark: INT8 Compute | 30.623 | 30.617 | 30.633 | 30.62 | 30.627 | 30.627
clpeak: Global Memory Bandwidth | 3485.68 | 3486.17 | 3484.48 | 3486.17 | 3485.5 | 3486.06
clpeak: Double-Precision Compute | 32963.76 | 32959.84 | 32960.86 | 32960.36 | 32948.1 | 32958.31
opencl-benchmark: INT32 Compute | 32.991 | 32.987 | 32.991 | 32.996 | 32.984 | 32.987
clpeak: Single-Precision Compute | 64543.49 | 64543.33 | 64546.27 | 64546.76 | 64538.91 | 64541.86
clpeak: Integer 24-bit Compute | 33101.40 | 33102.77 | 33102.51 | 33103.8 | 33101.22 | 33102
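Because runs a through f share one system configuration, the differences between columns are run-to-run variation. The following is a minimal Python sketch, not part of the Phoronix Test Suite output, for quantifying that variation on three rows copied from the overview. The helper names `spread_percent` and `cv_percent` are hypothetical, and the units in the labels are taken from the per-test results.

```python
from statistics import mean, stdev

# Values copied from the result overview for runs a-f.
runs = {
    "financebench: Monte-Carlo OpenCL (ms)":
        [58.974668, 59.186001, 59.507999, 59.050999, 58.950001, 59.888],
    "opencl-benchmark: Memory Bandwidth Coalesced Write (GB/s)":
        [3688.04, 3733.62, 3707.63, 3684.02, 3690.92, 3731.96],
    "clpeak: Kernel Latency (us)":
        [4.79, 4.78, 4.77, 4.79, 4.80, 4.78],
}


def spread_percent(values):
    """Return (max - min) as a percentage of the mean."""
    return (max(values) - min(values)) / mean(values) * 100.0


def cv_percent(values):
    """Return the coefficient of variation (sample stdev / mean) in percent."""
    return stdev(values) / mean(values) * 100.0


for name, values in runs.items():
    print(f"{name}: spread {spread_percent(values):.2f}%, "
          f"CV {cv_percent(values):.2f}%")
```

The same two helpers can be applied to any other row; a row with missing runs (such as the Blender result) would need its dashes dropped before computing statistics.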
FinanceBench 2016-07-25 - Benchmark: Monte-Carlo OpenCL (ms, Fewer Is Better)
  a: 58.97, b: 59.19, c: 59.51, d: 59.05, e: 58.95, f: 59.89
  SE +/- 0.06, N = 3
  1. (CXX) g++ options: -O3 -march=native -fopenmp
ProjectPhysX OpenCL-Benchmark 1.2 - Operation: Memory Bandwidth Coalesced Write (GB/s, More Is Better)
  a: 3688.04, b: 3733.62, c: 3707.63, d: 3684.02, e: 3690.92, f: 3731.96
  SE +/- 12.89, N = 3
  1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL
FinanceBench 2016-07-25 - Benchmark: Black-Scholes OpenCL (ms, Fewer Is Better)
  a: 4.318, b: 4.328, c: 4.317, d: 4.305, e: 4.330, f: 4.293
  SE +/- 0.006, N = 3
  1. (CXX) g++ options: -O3 -march=native -fopenmp
clpeak 1.1.2 - OpenCL Test: Kernel Latency (us, Fewer Is Better)
  a: 4.79, b: 4.78, c: 4.77, d: 4.79, e: 4.80, f: 4.78
  SE +/- 0.01, N = 3
  1. (CXX) g++ options: -O3
ArrayFire 3.9 - Test: Conjugate Gradient OpenCL (ms, Fewer Is Better)
  a: 2.998, b: 3.002, c: 3.001, d: 2.987, e: 2.990, f: 2.986
  SE +/- 0.004, N = 3
  1. (CXX) g++ options: -O3
cl-mem 2017-01-13 - Benchmark: Write (GB/s, More Is Better)
  a: 2359.3, b: 2362.6, c: 2358.1, d: 2361.9, e: 2360.2, f: 2361.6
  SE +/- 0.67, N = 3
  1. (CC) gcc options: -O2 -flto -lOpenCL
Blender - Blend File: BMW27 - Compute: CUDA (Seconds, Fewer Is Better)
  a: 30.77, f: 30.82
  SE +/- 0.16, N = 3
ProjectPhysX OpenCL-Benchmark 1.2 - Operation: Memory Bandwidth Coalesced Read (GB/s, More Is Better)
  a: 3614.32, b: 3616.79, c: 3617.16, d: 3616.23, e: 3615.04, f: 3612.22
  SE +/- 0.88, N = 3
  1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL
clpeak 1.1.2 - OpenCL Test: Transfer Bandwidth enqueueWriteBuffer (GBPS, More Is Better)
  a: 379.83, b: 379.79, c: 379.65, d: 379.55, e: 379.32, f: 379.43
  SE +/- 0.04, N = 3
  1. (CXX) g++ options: -O3
clpeak 1.1.2 - OpenCL Test: Transfer Bandwidth enqueueReadBuffer (GBPS, More Is Better)
  a: 295.59, b: 295.58, c: 295.51, d: 295.33, e: 295.22, f: 295.40
  SE +/- 0.04, N = 3
  1. (CXX) g++ options: -O3
cl-mem 2017-01-13 - Benchmark: Read (GB/s, More Is Better)
  a: 1045.3, b: 1046.0, c: 1045.9, d: 1046.3, e: 1046.1, f: 1046.2
  SE +/- 0.06, N = 3
  1. (CC) gcc options: -O2 -flto -lOpenCL
ProjectPhysX OpenCL-Benchmark 1.2 - Operation: FP64 Compute (TFLOPs/s, More Is Better)
  a: 31.02, b: 31.00, c: 30.99, d: 31.00, e: 31.00, f: 31.02
  SE +/- 0.00, N = 3
  1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL
ProjectPhysX OpenCL-Benchmark 1.2 - Operation: INT64 Compute (TIOPs/s, More Is Better)
  a: 3.248, b: 3.248, c: 3.248, d: 3.245, e: 3.247, f: 3.247
  SE +/- 0.002, N = 3
  1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL
clpeak 1.1.2 - OpenCL Test: Integer Compute (GIOPS, More Is Better)
  a: 33143.19, b: 33132.95, c: 33144.05, d: 33118.50, e: 33116.43, f: 33144.05
  SE +/- 0.38, N = 3
  1. (CXX) g++ options: -O3
ProjectPhysX OpenCL-Benchmark 1.2 - Operation: FP32 Compute (TFLOPs/s, More Is Better)
  a: 62.97, b: 62.96, c: 62.98, d: 62.95, e: 62.99, f: 62.95
  SE +/- 0.01, N = 3
  1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL
cl-mem 2017-01-13 - Benchmark: Copy (GB/s, More Is Better)
  a: 308.4, b: 308.5, c: 308.6, d: 308.5, e: 308.5, f: 308.5
  SE +/- 0.07, N = 3
  1. (CC) gcc options: -O2 -flto -lOpenCL
ProjectPhysX OpenCL-Benchmark 1.2 - Operation: INT16 Compute (TIOPs/s, More Is Better)
  a: 30.94, b: 30.93, c: 30.93, d: 30.93, e: 30.95, f: 30.93
  SE +/- 0.01, N = 3
  1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL
ProjectPhysX OpenCL-Benchmark 1.2 - Operation: INT8 Compute (TIOPs/s, More Is Better)
  a: 30.62, b: 30.62, c: 30.63, d: 30.62, e: 30.63, f: 30.63
  SE +/- 0.00, N = 3
  1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL
clpeak 1.1.2 - OpenCL Test: Global Memory Bandwidth (GBPS, More Is Better)
  a: 3485.68, b: 3486.17, c: 3484.48, d: 3486.17, e: 3485.50, f: 3486.06
  SE +/- 0.36, N = 3
  1. (CXX) g++ options: -O3
clpeak 1.1.2 - OpenCL Test: Double-Precision Compute (GFLOPS, More Is Better)
  a: 32963.76, b: 32959.84, c: 32960.86, d: 32960.36, e: 32948.10, f: 32958.31
  SE +/- 1.90, N = 3
  1. (CXX) g++ options: -O3
ProjectPhysX OpenCL-Benchmark 1.2 - Operation: INT32 Compute (TIOPs/s, More Is Better)
  a: 32.99, b: 32.99, c: 32.99, d: 33.00, e: 32.98, f: 32.99
  SE +/- 0.00, N = 3
  1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL
clpeak 1.1.2 - OpenCL Test: Single-Precision Compute (GFLOPS, More Is Better)
  a: 64543.49, b: 64543.33, c: 64546.27, d: 64546.76, e: 64538.91, f: 64541.86
  SE +/- 1.66, N = 3
  1. (CXX) g++ options: -O3
clpeak 1.1.2 - OpenCL Test: Integer 24-bit Compute (GIOPS, More Is Better)
  a: 33101.40, b: 33102.77, c: 33102.51, d: 33103.80, e: 33101.22, f: 33102.00
  SE +/- 1.71, N = 3
  1. (CXX) g++ options: -O3
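clpeak reports floating-point throughput in GFLOPS while ProjectPhysX OpenCL-Benchmark reports TFLOPs/s, so the two sets of figures need a common unit before they can be read side by side. The Python sketch below converts the run "a" clpeak values and prints the relative gap to the OpenCL-Benchmark values. It assumes 1 TFLOPS = 1000 GFLOPS; the function names are hypothetical, and since the two tools are separate benchmarks with their own kernels, a gap between them is a difference in measurement, not an error in either result.

```python
# Run "a" values copied from the results above.
clpeak_gflops = {"single": 64543.49, "double": 32963.76}   # clpeak, GFLOPS
physx_tflops = {"single": 62.966, "double": 31.020}        # OpenCL-Benchmark, TFLOPs/s


def gflops_to_tflops(gflops):
    """Convert GFLOPS to TFLOPS, assuming 1 TFLOPS = 1000 GFLOPS."""
    return gflops / 1000.0


def relative_gap_percent(reference, other):
    """Return how far `other` lies above `reference`, in percent."""
    return (other - reference) / reference * 100.0


for precision in ("single", "double"):
    clpeak_t = gflops_to_tflops(clpeak_gflops[precision])
    gap = relative_gap_percent(physx_tflops[precision], clpeak_t)
    print(f"{precision}: clpeak {clpeak_t:.2f} TFLOPS vs "
          f"OpenCL-Benchmark {physx_tflops[precision]:.2f} TFLOPs/s "
          f"({gap:+.1f}%)")
```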
Phoronix Test Suite v10.8.5