opencl benchmark smoke test
ARMv8 Neoverse-V2 testing with a Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G MB (CG1) (3A06 BIOS) and NVIDIA GH200 480GB on Ubuntu 22.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2402266-NE-OPENCLBEN88
opencl benchmark smoke test
Runs: a, b, c, d, e, f
Processor: ARMv8 Neoverse-V2 @ 3.39GHz (72 Cores)
Motherboard: Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G MB (CG1) (3A06 BIOS)
Memory: 1 x 480GB DRAM-6400MT/s
Disk: 960GB SAMSUNG MZ1L2960HCJR-00A07 + 1920GB SAMSUNG MZTL21T9
Graphics: NVIDIA GH200 480GB
Network: 2 x Mellanox MT2910 + 2 x QLogic FastLinQ QL41000 10/25/40/50GbE
OS: Ubuntu 22.04
Kernel: 6.5.0-1007-NVIDIA-64k (aarch64)
Display Driver: NVIDIA
OpenCL: OpenCL 3.0 CUDA 12.4.89
Vulkan: 1.3.277
Compiler: GCC 11.4.0 + CUDA 11.5
File-System: ext4
Screen Resolution: 1920x1200
Kernel Details - Transparent Huge Pages: madvise
Compiler Details - --build=aarch64-linux-gnu --disable-libquadmath --disable-libquadmath-support --disable-werror --enable-bootstrap --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-fix-cortex-a53-843419 --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-nls --enable-objc-gc=auto --enable-plugin --enable-shared --enable-threads=posix --host=aarch64-linux-gnu --program-prefix=aarch64-linux-gnu- --target=aarch64-linux-gnu --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-target-system-zlib=auto -v
Processor Details - Scaling Governor: cppc_cpufreq performance (Boost: Disabled)
Graphics Details - BAR1 / Visible vRAM Size: N/A - vBIOS Version: 96.00.7e.00.02
Security Details - gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of __user pointer sanitization + spectre_v2: Not affected + srbds: Not affected + tsx_async_abort: Not affected
opencl benchmark smoke test
Test                                                a          b          c          d          e          f
opencl-benchmark: FP64 Compute                      31.020     31.001     30.993     31         31.001     31.022
opencl-benchmark: FP32 Compute                      62.966     62.961     62.975     62.953     62.987     62.946
opencl-benchmark: INT64 Compute                     3.248      3.248      3.248      3.245      3.247      3.247
opencl-benchmark: INT32 Compute                     32.991     32.987     32.991     32.996     32.984     32.987
opencl-benchmark: INT16 Compute                     30.941     30.934     30.933     30.934     30.95      30.934
opencl-benchmark: INT8 Compute                      30.623     30.617     30.633     30.62      30.627     30.627
opencl-benchmark: Memory Bandwidth Coalesced Read   3614.32    3616.79    3617.16    3616.23    3615.04    3612.22
opencl-benchmark: Memory Bandwidth Coalesced Write  3688.04    3733.62    3707.63    3684.02    3690.92    3731.96
cl-mem: Copy                                        308.4      308.5      308.6      308.5      308.5      308.5
cl-mem: Read                                        1045.3     1046       1045.9     1046.3     1046.1     1046.2
cl-mem: Write                                       2359.3     2362.6     2358.1     2361.9     2360.2     2361.6
clpeak: Kernel Latency                              4.79       4.78       4.77       4.79       4.8        4.78
clpeak: Integer Compute                             33143.19   33132.95   33144.05   33118.5    33116.43   33144.05
clpeak: Integer 24-bit Compute                      33101.40   33102.77   33102.51   33103.8    33101.22   33102
clpeak: Global Memory Bandwidth                     3485.68    3486.17    3484.48    3486.17    3485.5     3486.06
clpeak: Double-Precision Compute                    32963.76   32959.84   32960.86   32960.36   32948.1    32958.31
clpeak: Single-Precision Compute                    64543.49   64543.33   64546.27   64546.76   64538.91   64541.86
clpeak: Transfer Bandwidth enqueueReadBuffer        295.59     295.58     295.51     295.33     295.22     295.4
clpeak: Transfer Bandwidth enqueueWriteBuffer       379.83     379.79     379.65     379.55     379.32     379.43
arrayfire: Conjugate Gradient OpenCL                2.998      3.002      3.001      2.987      2.99       2.986
financebench: Monte-Carlo OpenCL                    58.974668  59.186001  59.507999  59.050999  58.950001  59.888
financebench: Black-Scholes OpenCL                  4.318      4.328      4.317      4.305      4.33       4.293
blender: BMW27 - CUDA                               30.77      -          -          -          -          30.82
ProjectPhysX OpenCL-Benchmark 1.2, Operation: FP64 Compute (TFLOPs/s, More Is Better): a 31.02, b 31.00, c 30.99, d 31.00, e 31.00, f 31.02; SE +/- 0.00, N = 3. (CXX) g++ options: -std=c++17 -pthread -lOpenCL
ProjectPhysX OpenCL-Benchmark 1.2, Operation: FP32 Compute (TFLOPs/s, More Is Better): a 62.97, b 62.96, c 62.98, d 62.95, e 62.99, f 62.95; SE +/- 0.01, N = 3. (CXX) g++ options: -std=c++17 -pthread -lOpenCL
ProjectPhysX OpenCL-Benchmark 1.2, Operation: INT64 Compute (TIOPs/s, More Is Better): a 3.248, b 3.248, c 3.248, d 3.245, e 3.247, f 3.247; SE +/- 0.002, N = 3. (CXX) g++ options: -std=c++17 -pthread -lOpenCL
ProjectPhysX OpenCL-Benchmark 1.2, Operation: INT32 Compute (TIOPs/s, More Is Better): a 32.99, b 32.99, c 32.99, d 33.00, e 32.98, f 32.99; SE +/- 0.00, N = 3. (CXX) g++ options: -std=c++17 -pthread -lOpenCL
ProjectPhysX OpenCL-Benchmark 1.2, Operation: INT16 Compute (TIOPs/s, More Is Better): a 30.94, b 30.93, c 30.93, d 30.93, e 30.95, f 30.93; SE +/- 0.01, N = 3. (CXX) g++ options: -std=c++17 -pthread -lOpenCL
ProjectPhysX OpenCL-Benchmark 1.2, Operation: INT8 Compute (TIOPs/s, More Is Better): a 30.62, b 30.62, c 30.63, d 30.62, e 30.63, f 30.63; SE +/- 0.00, N = 3. (CXX) g++ options: -std=c++17 -pthread -lOpenCL
ProjectPhysX OpenCL-Benchmark 1.2, Operation: Memory Bandwidth Coalesced Read (GB/s, More Is Better): a 3614.32, b 3616.79, c 3617.16, d 3616.23, e 3615.04, f 3612.22; SE +/- 0.88, N = 3. (CXX) g++ options: -std=c++17 -pthread -lOpenCL
ProjectPhysX OpenCL-Benchmark 1.2, Operation: Memory Bandwidth Coalesced Write (GB/s, More Is Better): a 3688.04, b 3733.62, c 3707.63, d 3684.02, e 3690.92, f 3731.96; SE +/- 12.89, N = 3. (CXX) g++ options: -std=c++17 -pthread -lOpenCL
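The coalesced write bandwidth varies noticeably more from run to run than the coalesced read bandwidth. As a rough illustration that is not part of the exported result, the Python sketch below computes the mean, sample standard deviation and coefficient of variation across runs a to f, using the values listed above. The helper name `spread` is hypothetical. This across-run spread is a different statistic from the "SE +/-" figure in the export, which is reported with N = 3 trials.

```python
from statistics import mean, stdev

# OpenCL-Benchmark 1.2 memory bandwidth results in GB/s, copied from runs a-f.
read_gbs = {"a": 3614.32, "b": 3616.79, "c": 3617.16,
            "d": 3616.23, "e": 3615.04, "f": 3612.22}
write_gbs = {"a": 3688.04, "b": 3733.62, "c": 3707.63,
             "d": 3684.02, "e": 3690.92, "f": 3731.96}


def spread(results):
    """Return (mean, sample stdev, coefficient of variation in %) for one test.

    Hypothetical helper for illustration; not a Phoronix Test Suite function.
    """
    values = list(results.values())
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 in the denominator)
    return m, s, 100.0 * s / m


for name, results in (("read", read_gbs), ("write", write_gbs)):
    m, s, cv = spread(results)
    print(f"{name}: mean={m:.2f} GB/s, stdev={s:.2f} GB/s, CV={cv:.3f}%")
```

For a smoke test, a coefficient of variation well under 1% on both tests suggests the six runs are consistent with each other.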
cl-mem 2017-01-13, Benchmark: Copy (GB/s, More Is Better): a 308.4, b 308.5, c 308.6, d 308.5, e 308.5, f 308.5; SE +/- 0.07, N = 3. (CC) gcc options: -O2 -flto -lOpenCL
cl-mem 2017-01-13, Benchmark: Read (GB/s, More Is Better): a 1045.3, b 1046.0, c 1045.9, d 1046.3, e 1046.1, f 1046.2; SE +/- 0.06, N = 3. (CC) gcc options: -O2 -flto -lOpenCL
cl-mem 2017-01-13, Benchmark: Write (GB/s, More Is Better): a 2359.3, b 2362.6, c 2358.1, d 2361.9, e 2360.2, f 2361.6; SE +/- 0.67, N = 3. (CC) gcc options: -O2 -flto -lOpenCL
clpeak 1.1.2, OpenCL Test: Kernel Latency (us, Fewer Is Better): a 4.79, b 4.78, c 4.77, d 4.79, e 4.80, f 4.78; SE +/- 0.01, N = 3. (CXX) g++ options: -O3
clpeak 1.1.2, OpenCL Test: Integer Compute (GIOPS, More Is Better): a 33143.19, b 33132.95, c 33144.05, d 33118.50, e 33116.43, f 33144.05; SE +/- 0.38, N = 3. (CXX) g++ options: -O3
clpeak 1.1.2, OpenCL Test: Integer 24-bit Compute (GIOPS, More Is Better): a 33101.40, b 33102.77, c 33102.51, d 33103.80, e 33101.22, f 33102.00; SE +/- 1.71, N = 3. (CXX) g++ options: -O3
clpeak 1.1.2, OpenCL Test: Global Memory Bandwidth (GBPS, More Is Better): a 3485.68, b 3486.17, c 3484.48, d 3486.17, e 3485.50, f 3486.06; SE +/- 0.36, N = 3. (CXX) g++ options: -O3
clpeak 1.1.2, OpenCL Test: Double-Precision Compute (GFLOPS, More Is Better): a 32963.76, b 32959.84, c 32960.86, d 32960.36, e 32948.10, f 32958.31; SE +/- 1.90, N = 3. (CXX) g++ options: -O3
clpeak 1.1.2, OpenCL Test: Single-Precision Compute (GFLOPS, More Is Better): a 64543.49, b 64543.33, c 64546.27, d 64546.76, e 64538.91, f 64541.86; SE +/- 1.66, N = 3. (CXX) g++ options: -O3
clpeak 1.1.2, OpenCL Test: Transfer Bandwidth enqueueReadBuffer (GBPS, More Is Better): a 295.59, b 295.58, c 295.51, d 295.33, e 295.22, f 295.40; SE +/- 0.04, N = 3. (CXX) g++ options: -O3
clpeak 1.1.2, OpenCL Test: Transfer Bandwidth enqueueWriteBuffer (GBPS, More Is Better): a 379.83, b 379.79, c 379.65, d 379.55, e 379.32, f 379.43; SE +/- 0.04, N = 3. (CXX) g++ options: -O3
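clpeak reports compute throughput in GFLOPS and GIOPS, while the ProjectPhysX OpenCL-Benchmark results use TFLOPs/s and TIOPs/s, so the clpeak figures need dividing by 1000 before the two can be read side by side. The Python sketch below does that for run a. Pairing clpeak's Double-Precision, Single-Precision and Integer Compute tests with OpenCL-Benchmark's FP64, FP32 and INT32 tests is an assumption made here for illustration. The two benchmarks run different kernels, so exact agreement is not expected. The names `RUN_A`, `giga_to_tera` and `percent_difference` are hypothetical.

```python
# Run "a" values copied from the results above.
# ASSUMPTION: each OpenCL-Benchmark test is paired with the clpeak test that
# appears to measure the same data type; the export itself does not pair them.
RUN_A = {
    "FP64": {"opencl_benchmark_tera": 31.020, "clpeak_giga": 32963.76},
    "FP32": {"opencl_benchmark_tera": 62.966, "clpeak_giga": 64543.49},
    "INT32": {"opencl_benchmark_tera": 32.991, "clpeak_giga": 33143.19},
}


def giga_to_tera(value_giga):
    """Convert a giga-scale rate (GFLOPS, GIOPS) to tera-scale."""
    return value_giga / 1000.0


def percent_difference(reference, other):
    """Signed difference of `other` relative to `reference`, in percent."""
    return 100.0 * (other - reference) / reference


for name, result in RUN_A.items():
    clpeak_tera = giga_to_tera(result["clpeak_giga"])
    diff = percent_difference(result["opencl_benchmark_tera"], clpeak_tera)
    print(f"{name}: OpenCL-Benchmark={result['opencl_benchmark_tera']:.3f} T/s, "
          f"clpeak={clpeak_tera:.3f} T/s, difference={diff:+.2f}%")
```

Under these assumptions, the two benchmarks land within a few percent of each other on all three data types, which is the kind of sanity check a smoke test is for.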
ArrayFire 3.9, Test: Conjugate Gradient OpenCL (ms, Fewer Is Better): a 2.998, b 3.002, c 3.001, d 2.987, e 2.990, f 2.986; SE +/- 0.004, N = 3. (CXX) g++ options: -O3
FinanceBench 2016-07-25, Benchmark: Monte-Carlo OpenCL (ms, Fewer Is Better): a 58.97, b 59.19, c 59.51, d 59.05, e 58.95, f 59.89; SE +/- 0.06, N = 3. (CXX) g++ options: -O3 -march=native -fopenmp
FinanceBench 2016-07-25, Benchmark: Black-Scholes OpenCL (ms, Fewer Is Better): a 4.318, b 4.328, c 4.317, d 4.305, e 4.330, f 4.293; SE +/- 0.006, N = 3. (CXX) g++ options: -O3 -march=native -fopenmp
Blender, Blend File: BMW27 - Compute: CUDA (Seconds, Fewer Is Better): a 30.77, f 30.82; SE +/- 0.16, N = 3.
Phoronix Test Suite v10.8.5