opencl benchmark smoke test

ARMv8 Neoverse-V2 testing with a Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G MB (CG1) (3A06 BIOS) and NVIDIA GH200 480GB on Ubuntu 22.04 via the Phoronix Test Suite.

HTML result view exported from: https://openbenchmarking.org/result/2402266-NE-OPENCLBEN88&grt&sor.

System configuration (runs a, b, c, d, e, f)

Processor: ARMv8 Neoverse-V2 @ 3.39GHz (72 Cores)
Motherboard: Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G MB (CG1) (3A06 BIOS)
Memory: 1 x 480GB DRAM-6400MT/s
Disk: 960GB SAMSUNG MZ1L2960HCJR-00A07 + 1920GB SAMSUNG MZTL21T9
Graphics: NVIDIA GH200 480GB
Network: 2 x Mellanox MT2910 + 2 x QLogic FastLinQ QL41000 10/25/40/50GbE
OS: Ubuntu 22.04
Kernel: 6.5.0-1007-NVIDIA-64k (aarch64)
Display Driver: NVIDIA
OpenCL: OpenCL 3.0 CUDA 12.4.89
Vulkan: 1.3.277
Compiler: GCC 11.4.0 + CUDA 11.5
File-System: ext4
Screen Resolution: 1920x1200

Kernel Details
- Transparent Huge Pages: madvise

Compiler Details
- --build=aarch64-linux-gnu --disable-libquadmath --disable-libquadmath-support --disable-werror --enable-bootstrap --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-fix-cortex-a53-843419 --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-nls --enable-objc-gc=auto --enable-plugin --enable-shared --enable-threads=posix --host=aarch64-linux-gnu --program-prefix=aarch64-linux-gnu- --target=aarch64-linux-gnu --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-target-system-zlib=auto -v

Processor Details
- Scaling Governor: cppc_cpufreq performance (Boost: Disabled)

Graphics Details
- BAR1 / Visible vRAM Size: N/A
- vBIOS Version: 96.00.7e.00.02

Security Details
- gather_data_sampling: Not affected
- itlb_multihit: Not affected
- l1tf: Not affected
- mds: Not affected
- meltdown: Not affected
- mmio_stale_data: Not affected
- retbleed: Not affected
- spec_rstack_overflow: Not affected
- spec_store_bypass: Mitigation of SSB disabled via prctl
- spectre_v1: Mitigation of __user pointer sanitization
- spectre_v2: Not affected
- srbds: Not affected
- tsx_async_abort: Not affected

Results overview (a dash marks a run with no result for that test)

Test                                                a          b          c          d          e          f
arrayfire: Conjugate Gradient OpenCL                2.998      3.002      3.001      2.987      2.99       2.986
blender: BMW27 - CUDA                               30.77      -          -          -          -          30.82
cl-mem: Copy                                        308.4      308.5      308.6      308.5      308.5      308.5
cl-mem: Read                                        1045.3     1046       1045.9     1046.3     1046.1     1046.2
cl-mem: Write                                       2359.3     2362.6     2358.1     2361.9     2360.2     2361.6
clpeak: Kernel Latency                              4.79       4.78       4.77       4.79       4.8        4.78
clpeak: Integer Compute                             33143.19   33132.95   33144.05   33118.5    33116.43   33144.05
clpeak: Integer 24-bit Compute                      33101.40   33102.77   33102.51   33103.8    33101.22   33102
clpeak: Global Memory Bandwidth                     3485.68    3486.17    3484.48    3486.17    3485.5     3486.06
clpeak: Double-Precision Compute                    32963.76   32959.84   32960.86   32960.36   32948.1    32958.31
clpeak: Single-Precision Compute                    64543.49   64543.33   64546.27   64546.76   64538.91   64541.86
clpeak: Transfer Bandwidth enqueueReadBuffer        295.59     295.58     295.51     295.33     295.22     295.4
clpeak: Transfer Bandwidth enqueueWriteBuffer       379.83     379.79     379.65     379.55     379.32     379.43
financebench: Monte-Carlo OpenCL                    58.974668  59.186001  59.507999  59.050999  58.950001  59.888
financebench: Black-Scholes OpenCL                  4.318      4.328      4.317      4.305      4.33       4.293
opencl-benchmark: FP64 Compute                      31.020     31.001     30.993     31         31.001     31.022
opencl-benchmark: FP32 Compute                      62.966     62.961     62.975     62.953     62.987     62.946
opencl-benchmark: INT64 Compute                     3.248      3.248      3.248      3.245      3.247      3.247
opencl-benchmark: INT32 Compute                     32.991     32.987     32.991     32.996     32.984     32.987
opencl-benchmark: INT16 Compute                     30.941     30.934     30.933     30.934     30.95      30.934
opencl-benchmark: INT8 Compute                      30.623     30.617     30.633     30.62      30.627     30.627
opencl-benchmark: Memory Bandwidth Coalesced Read   3614.32    3616.79    3617.16    3616.23    3615.04    3612.22
opencl-benchmark: Memory Bandwidth Coalesced Write  3688.04    3733.62    3707.63    3684.02    3690.92    3731.96
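Since runs a through f were taken on the same system, one quick way to read the overview is to check how far the runs differ from each other per test. The following is a minimal sketch, not part of the Phoronix Test Suite: the helper name `spread_percent` and the choice of three sample tests are assumptions made for illustration, and the numbers are copied from the results in this file.

```python
# Per-run values (a, b, c, d, e, f) copied from this result file.
# The selection of tests is arbitrary and only for illustration.
results = {
    "cl-mem: Write (GB/s)": [2359.3, 2362.6, 2358.1, 2361.9, 2360.2, 2361.6],
    "financebench: Monte-Carlo OpenCL (ms)": [
        58.974668, 59.186001, 59.507999, 59.050999, 58.950001, 59.888,
    ],
    "opencl-benchmark: Memory Bandwidth Coalesced Write (GB/s)": [
        3688.04, 3733.62, 3707.63, 3684.02, 3690.92, 3731.96,
    ],
}


def spread_percent(values):
    """Hypothetical helper: (max - min) as a percentage of the mean."""
    mean = sum(values) / len(values)
    return (max(values) - min(values)) / mean * 100.0


for name, values in results.items():
    print(
        f"{name}: min={min(values)}, max={max(values)}, "
        f"spread={spread_percent(values):.2f}%"
    )
```

A small spread suggests the runs are repeatable for that test; a larger one is a hint to look at the reported standard error before comparing against other systems.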

ArrayFire

Test: Conjugate Gradient OpenCL

ArrayFire 3.9 (ms, Fewer Is Better)
f: 2.986, d: 2.987, e: 2.990, a: 2.998, c: 3.001, b: 3.002
SE +/- 0.004, N = 3
1. (CXX) g++ options: -O3

Blender

Blend File: BMW27 - Compute: CUDA

Blender (Seconds, Fewer Is Better)
a: 30.77, f: 30.82
SE +/- 0.16, N = 3

cl-mem

Benchmark: Copy

cl-mem 2017-01-13 (GB/s, More Is Better)
c: 308.6, f: 308.5, e: 308.5, d: 308.5, b: 308.5, a: 308.4
SE +/- 0.07, N = 3
1. (CC) gcc options: -O2 -flto -lOpenCL

cl-mem

Benchmark: Read

cl-mem 2017-01-13 (GB/s, More Is Better)
d: 1046.3, f: 1046.2, e: 1046.1, b: 1046.0, c: 1045.9, a: 1045.3
SE +/- 0.06, N = 3
1. (CC) gcc options: -O2 -flto -lOpenCL

cl-mem

Benchmark: Write

cl-mem 2017-01-13 (GB/s, More Is Better)
b: 2362.6, d: 2361.9, f: 2361.6, e: 2360.2, a: 2359.3, c: 2358.1
SE +/- 0.67, N = 3
1. (CC) gcc options: -O2 -flto -lOpenCL

clpeak

OpenCL Test: Kernel Latency

clpeak 1.1.2 (us, Fewer Is Better)
c: 4.77, b: 4.78, f: 4.78, a: 4.79, d: 4.79, e: 4.80
SE +/- 0.01, N = 3
1. (CXX) g++ options: -O3

clpeak

OpenCL Test: Integer Compute

clpeak 1.1.2 (GIOPS, More Is Better)
f: 33144.05, c: 33144.05, a: 33143.19, b: 33132.95, d: 33118.50, e: 33116.43
SE +/- 0.38, N = 3
1. (CXX) g++ options: -O3

clpeak

OpenCL Test: Integer 24-bit Compute

clpeak 1.1.2 (GIOPS, More Is Better)
d: 33103.80, b: 33102.77, c: 33102.51, f: 33102.00, a: 33101.40, e: 33101.22
SE +/- 1.71, N = 3
1. (CXX) g++ options: -O3

clpeak

OpenCL Test: Global Memory Bandwidth

clpeak 1.1.2 (GBPS, More Is Better)
d: 3486.17, b: 3486.17, f: 3486.06, a: 3485.68, e: 3485.50, c: 3484.48
SE +/- 0.36, N = 3
1. (CXX) g++ options: -O3

clpeak

OpenCL Test: Double-Precision Compute

clpeak 1.1.2 (GFLOPS, More Is Better)
a: 32963.76, c: 32960.86, d: 32960.36, b: 32959.84, f: 32958.31, e: 32948.10
SE +/- 1.90, N = 3
1. (CXX) g++ options: -O3

clpeak

OpenCL Test: Single-Precision Compute

clpeak 1.1.2 (GFLOPS, More Is Better)
d: 64546.76, c: 64546.27, a: 64543.49, b: 64543.33, f: 64541.86, e: 64538.91
SE +/- 1.66, N = 3
1. (CXX) g++ options: -O3

clpeak

OpenCL Test: Transfer Bandwidth enqueueReadBuffer

clpeak 1.1.2 (GBPS, More Is Better)
a: 295.59, b: 295.58, c: 295.51, f: 295.40, d: 295.33, e: 295.22
SE +/- 0.04, N = 3
1. (CXX) g++ options: -O3

clpeak

OpenCL Test: Transfer Bandwidth enqueueWriteBuffer

clpeak 1.1.2 (GBPS, More Is Better)
a: 379.83, b: 379.79, c: 379.65, d: 379.55, f: 379.43, e: 379.32
SE +/- 0.04, N = 3
1. (CXX) g++ options: -O3

FinanceBench

Benchmark: Monte-Carlo OpenCL

FinanceBench 2016-07-25 (ms, Fewer Is Better)
e: 58.95, a: 58.97, d: 59.05, b: 59.19, c: 59.51, f: 59.89
SE +/- 0.06, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp

FinanceBench

Benchmark: Black-Scholes OpenCL

FinanceBench 2016-07-25 (ms, Fewer Is Better)
f: 4.293, d: 4.305, c: 4.317, a: 4.318, b: 4.328, e: 4.330
SE +/- 0.006, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp

ProjectPhysX OpenCL-Benchmark

Operation: FP64 Compute

ProjectPhysX OpenCL-Benchmark 1.2 (TFLOPs/s, More Is Better)
f: 31.02, a: 31.02, e: 31.00, b: 31.00, d: 31.00, c: 30.99
SE +/- 0.00, N = 3
1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

ProjectPhysX OpenCL-Benchmark

Operation: FP32 Compute

ProjectPhysX OpenCL-Benchmark 1.2 (TFLOPs/s, More Is Better)
e: 62.99, c: 62.98, a: 62.97, b: 62.96, d: 62.95, f: 62.95
SE +/- 0.01, N = 3
1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

ProjectPhysX OpenCL-Benchmark

Operation: INT64 Compute

ProjectPhysX OpenCL-Benchmark 1.2 (TIOPs/s, More Is Better)
c: 3.248, b: 3.248, a: 3.248, f: 3.247, e: 3.247, d: 3.245
SE +/- 0.002, N = 3
1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

ProjectPhysX OpenCL-Benchmark

Operation: INT32 Compute

ProjectPhysX OpenCL-Benchmark 1.2 (TIOPs/s, More Is Better)
d: 33.00, c: 32.99, a: 32.99, f: 32.99, b: 32.99, e: 32.98
SE +/- 0.00, N = 3
1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

ProjectPhysX OpenCL-Benchmark

Operation: INT16 Compute

ProjectPhysX OpenCL-Benchmark 1.2 (TIOPs/s, More Is Better)
e: 30.95, a: 30.94, f: 30.93, d: 30.93, b: 30.93, c: 30.93
SE +/- 0.01, N = 3
1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

ProjectPhysX OpenCL-Benchmark

Operation: INT8 Compute

ProjectPhysX OpenCL-Benchmark 1.2 (TIOPs/s, More Is Better)
c: 30.63, f: 30.63, e: 30.63, a: 30.62, d: 30.62, b: 30.62
SE +/- 0.00, N = 3
1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

ProjectPhysX OpenCL-Benchmark

Operation: Memory Bandwidth Coalesced Read

ProjectPhysX OpenCL-Benchmark 1.2 (GB/s, More Is Better)
c: 3617.16, b: 3616.79, d: 3616.23, e: 3615.04, a: 3614.32, f: 3612.22
SE +/- 0.88, N = 3
1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL

ProjectPhysX OpenCL-Benchmark

Operation: Memory Bandwidth Coalesced Write

ProjectPhysX OpenCL-Benchmark 1.2 (GB/s, More Is Better)
b: 3733.62, f: 3731.96, c: 3707.63, e: 3690.92, a: 3688.04, d: 3684.02
SE +/- 12.89, N = 3
1. (CXX) g++ options: -std=c++17 -pthread -lOpenCL


Phoronix Test Suite v10.8.5