garuda-unit-test
AMD Ryzen 5 5600X 6-Core testing with a Gigabyte X570 I AORUS PRO WIFI (F36d BIOS) and ASUS NVIDIA GeForce RTX 3070 8GB on Garuda Soaring via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2301034-NE-GARUDAUNI78&grs
System configuration (result identifier: CantRemember)
Processor: AMD Ryzen 5 5600X 6-Core @ 3.70GHz (6 Cores / 12 Threads)
Motherboard: Gigabyte X570 I AORUS PRO WIFI (F36d BIOS)
Chipset: AMD Starship/Matisse
Memory: 16GB
Disk: 1000GB CT1000P1SSD8 + 2000GB CT2000P3SSD8
Graphics: ASUS NVIDIA GeForce RTX 3070 8GB
Audio: NVIDIA GA104 HD Audio
Monitor: Odyssey G52A
Network: Intel I211 + Intel Wi-Fi 6 AX200
OS: Garuda Soaring
Kernel: 6.1.1-zen1-1-zen (x86_64)
Desktop: KDE Plasma 5.26.4
Display Server: X Server 1.21.1.6
Display Driver: NVIDIA 525.60.11
OpenGL: 4.6.0
Vulkan: 1.3.224
Compiler: GCC 12.2.0
File-System: btrfs
Screen Resolution: 5120x1440
- Transparent Huge Pages: always
- --disable-libssp --disable-libstdcxx-pch --disable-werror --enable-__cxa_atexit --enable-bootstrap --enable-cet=auto --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-default-ssp --enable-gnu-indirect-function --enable-gnu-unique-object --enable-languages=c,c++,ada,fortran,go,lto,objc,obj-c++,d --enable-libstdcxx-backtrace --enable-link-serialization=1 --enable-lto --enable-multilib --enable-plugin --enable-shared --enable-threads=posix --mandir=/usr/share/man --with-build-config=bootstrap-lto --with-linker-hash-style=gnu
- Scaling Governor: acpi-cpufreq schedutil (Boost: Enabled)
- CPU Microcode: 0xa201016
- BAR1 / Visible vRAM Size: 256 MiB
- vBIOS Version: 94.04.3a.40.20
- GPU Compute Cores: 5888
- Python 3.10.9
- itlb_multihit: Not affected
  + l1tf: Not affected
  + mds: Not affected
  + meltdown: Not affected
  + mmio_stale_data: Not affected
  + retbleed: Not affected
  + spec_store_bypass: Mitigation of SSB disabled via prctl
  + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
  + spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: always-on RSB filling PBRSB-eIBRS: Not affected
  + srbds: Not affected
  + tsx_async_abort: Not affected
Results overview
Test | CantRemember
lulesh-cl | 5940.0720
clpeak: Transfer Bandwidth enqueueWriteBuffer | 24.45
clpeak: Transfer Bandwidth enqueueReadBuffer | 14.13
clpeak: Global Memory Bandwidth | 390.58
clpeak: Double-Precision Double | 358.24
clpeak: Single-Precision Float | 20546.88
clpeak: Integer Compute INT | 10411.51
clpeak: Kernel Latency | 5.25
luxmark: CPU+GPU - Luxball HDR | 51047
luxmark: CPU+GPU - Microphone | 35587
luxmark: GPU - Luxball HDR | 50404
luxmark: GPU - Microphone | 35183
luxmark: CPU+GPU - Hotel | 11142
luxmark: GPU - Hotel | 11041
smallpt-gpu: GPU - 5120 x 1440 - Caustic3 | 1672726825
smallpt-gpu: GPU - 5120 x 1440 - Cornell | 1672726690
smallpt-gpu: GPU - 5120 x 1440 - Caustic | 1672726555
darktable: Server Room - OpenCL | 0.780
darktable: Server Rack - OpenCL | 0.141
darktable: Masskrug - OpenCL | 3.338
darktable: Boat - OpenCL | 1.704
viennacl: OpenCL BLAS - dCOPY | 360
viennacl: OpenCL BLAS - dGEMM-TT | 321
viennacl: OpenCL BLAS - dGEMM-TN | 321
viennacl: OpenCL BLAS - dGEMM-NT | 328
viennacl: OpenCL BLAS - dGEMM-NN | 327
viennacl: OpenCL BLAS - dGEMV-T | 316
viennacl: OpenCL BLAS - dGEMV-N | 168
viennacl: OpenCL BLAS - dDOT | 385
viennacl: OpenCL BLAS - dAXPY | 377
viennacl: OpenCL BLAS - sDOT | 311
viennacl: OpenCL BLAS - sAXPY | 342
viennacl: OpenCL BLAS - sCOPY | 284
viennacl: CPU BLAS - dGEMM-TT | 25.5
viennacl: CPU BLAS - dGEMM-TN | 27.8
viennacl: CPU BLAS - dGEMM-NT | 25.6
viennacl: CPU BLAS - dGEMM-NN | 26.9
viennacl: CPU BLAS - dAXPY | 30.9
viennacl: CPU BLAS - dCOPY | 20.5
rodinia: OpenCL Particle Filter | 6.138
rodinia: OpenCL Leukocyte | 3.282
rodinia: OpenCL Myocyte | 28.609
fluidx3d: FP32-FP16S | 4506
fluidx3d: FP32-FP16C | 4439
fluidx3d: FP32-FP32 | 2246
cl-mem: Write | 321.1
cl-mem: Read | 344.0
cl-mem: Copy | 259.5
shoc: OpenCL - Texture Read Bandwidth | 2061.08
shoc: OpenCL - Bus Speed Readback | 27.1039
shoc: OpenCL - Bus Speed Download | 26.3063
shoc: OpenCL - Max SP Flops | 20986.7
shoc: OpenCL - GEMM SGEMM_N | 3589.40
shoc: OpenCL - Reduction | 327.380
shoc: OpenCL - MD5 Hash | 25.2819
shoc: OpenCL - FFT SP | 1117.17
shoc: OpenCL - Triad | 24.6456
shoc: OpenCL - S3D | 217.979
viennacl: CPU BLAS - dGEMV-T | 40.1
viennacl: CPU BLAS - dGEMV-N | 42.6
viennacl: CPU BLAS - dDOT | 29.8
viennacl: CPU BLAS - sDOT | 37.7
viennacl: CPU BLAS - sAXPY | 37.1
viennacl: CPU BLAS - sCOPY | 24.1
parboil: OpenCL BFS |
Lulesh OpenCL 2017-07-06 (z/s, More Is Better). CantRemember: 5940.07, SE +/- 6.98, N = 3. (CXX) g++ options: -std=c++11 -lOpenCL -O3 -lm
clpeak, OpenCL Test: Transfer Bandwidth enqueueWriteBuffer (GBPS, More Is Better). CantRemember: 24.45, SE +/- 0.19, N = 3. (CXX) g++ options: -O3 -rdynamic -lOpenCL
clpeak, OpenCL Test: Transfer Bandwidth enqueueReadBuffer (GBPS, More Is Better). CantRemember: 14.13, SE +/- 0.05, N = 3. (CXX) g++ options: -O3 -rdynamic -lOpenCL
clpeak, OpenCL Test: Global Memory Bandwidth (GBPS, More Is Better). CantRemember: 390.58, SE +/- 0.02, N = 3. (CXX) g++ options: -O3 -rdynamic -lOpenCL
clpeak, OpenCL Test: Double-Precision Double (GFLOPS, More Is Better). CantRemember: 358.24, SE +/- 0.60, N = 3. (CXX) g++ options: -O3 -rdynamic -lOpenCL
clpeak, OpenCL Test: Single-Precision Float (GFLOPS, More Is Better). CantRemember: 20546.88, SE +/- 53.88, N = 3. (CXX) g++ options: -O3 -rdynamic -lOpenCL
clpeak, OpenCL Test: Integer Compute INT (GIOPS, More Is Better). CantRemember: 10411.51, SE +/- 36.62, N = 3. (CXX) g++ options: -O3 -rdynamic -lOpenCL
clpeak, OpenCL Test: Kernel Latency (us, Fewer Is Better). CantRemember: 5.25, SE +/- 0.01, N = 3. (CXX) g++ options: -O3 -rdynamic -lOpenCL
LuxMark 3.1, OpenCL Device: CPU+GPU - Scene: Luxball HDR (Score, More Is Better). CantRemember: 51047, SE +/- 47.99, N = 3.
LuxMark 3.1, OpenCL Device: CPU+GPU - Scene: Microphone (Score, More Is Better). CantRemember: 35587, SE +/- 5.61, N = 3.
LuxMark 3.1, OpenCL Device: GPU - Scene: Luxball HDR (Score, More Is Better). CantRemember: 50404, SE +/- 164.85, N = 3.
LuxMark 3.1, OpenCL Device: GPU - Scene: Microphone (Score, More Is Better). CantRemember: 35183, SE +/- 189.69, N = 3.
LuxMark 3.1, OpenCL Device: CPU+GPU - Scene: Hotel (Score, More Is Better). CantRemember: 11142, SE +/- 102.63, N = 12.
LuxMark 3.1, OpenCL Device: GPU - Scene: Hotel (Score, More Is Better). CantRemember: 11041, SE +/- 61.92, N = 3.
SmallPT GPU 1.6pts1, OpenCL Device: GPU - Resolution: 5120 x 1440 - Scene: Caustic3 (Samples/sec, More Is Better). CantRemember: 1672726825, SE +/- 24.83, N = 3. (CC) gcc options: -O3 -lm -ftree-vectorize -funroll-loops -lglut -lOpenCL -lGL
SmallPT GPU 1.6pts1, OpenCL Device: GPU - Resolution: 5120 x 1440 - Scene: Cornell (Samples/sec, More Is Better). CantRemember: 1672726690, SE +/- 24.25, N = 3. (CC) gcc options: -O3 -lm -ftree-vectorize -funroll-loops -lglut -lOpenCL -lGL
SmallPT GPU 1.6pts1, OpenCL Device: GPU - Resolution: 5120 x 1440 - Scene: Caustic (Samples/sec, More Is Better). CantRemember: 1672726555, SE +/- 25.12, N = 3. (CC) gcc options: -O3 -lm -ftree-vectorize -funroll-loops -lglut -lOpenCL -lGL
Darktable 4.2.0, Test: Server Room - Acceleration: OpenCL (Seconds, Fewer Is Better). CantRemember: 0.780, SE +/- 0.003, N = 3.
Darktable 4.2.0, Test: Server Rack - Acceleration: OpenCL (Seconds, Fewer Is Better). CantRemember: 0.141, SE +/- 0.001, N = 3.
Darktable 4.2.0, Test: Masskrug - Acceleration: OpenCL (Seconds, Fewer Is Better). CantRemember: 3.338, SE +/- 0.025, N = 3.
Darktable 4.2.0, Test: Boat - Acceleration: OpenCL (Seconds, Fewer Is Better). CantRemember: 1.704, SE +/- 0.014, N = 15.
ViennaCL 1.7.1, Test: OpenCL BLAS - dCOPY (GB/s, More Is Better). CantRemember: 360, SE +/- 4.50, N = 2. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1, Test: OpenCL BLAS - dGEMM-TT (GFLOPs/s, More Is Better). CantRemember: 321, SE +/- 10.50, N = 2. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1, Test: OpenCL BLAS - dGEMM-TN (GFLOPs/s, More Is Better). CantRemember: 321, SE +/- 9.21, N = 3. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1, Test: OpenCL BLAS - dGEMM-NT (GFLOPs/s, More Is Better). CantRemember: 328, SE +/- 1.45, N = 3. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1, Test: OpenCL BLAS - dGEMM-NN (GFLOPs/s, More Is Better). CantRemember: 327, SE +/- 0.88, N = 3. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1, Test: OpenCL BLAS - dGEMV-T (GB/s, More Is Better). CantRemember: 316, SE +/- 3.38, N = 3. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1, Test: OpenCL BLAS - dGEMV-N (GB/s, More Is Better). CantRemember: 168, SE +/- 4.26, N = 3. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1, Test: OpenCL BLAS - dDOT (GB/s, More Is Better). CantRemember: 385, SE +/- 2.03, N = 3. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1, Test: OpenCL BLAS - dAXPY (GB/s, More Is Better). CantRemember: 377, SE +/- 3.84, N = 3. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1, Test: OpenCL BLAS - sDOT (GB/s, More Is Better). CantRemember: 311, SE +/- 6.56, N = 3. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1, Test: OpenCL BLAS - sAXPY (GB/s, More Is Better). CantRemember: 342, SE +/- 4.93, N = 3. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1, Test: OpenCL BLAS - sCOPY (GB/s, More Is Better). CantRemember: 284, SE +/- 0.67, N = 3. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1, Test: CPU BLAS - dGEMM-TT (GFLOPs/s, More Is Better). CantRemember: 25.5, SE +/- 0.24, N = 13. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1, Test: CPU BLAS - dGEMM-TN (GFLOPs/s, More Is Better). CantRemember: 27.8, SE +/- 0.36, N = 15. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1, Test: CPU BLAS - dGEMM-NT (GFLOPs/s, More Is Better). CantRemember: 25.6, SE +/- 0.15, N = 14. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1, Test: CPU BLAS - dGEMM-NN (GFLOPs/s, More Is Better). CantRemember: 26.9, SE +/- 0.07, N = 15. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1, Test: CPU BLAS - dAXPY (GB/s, More Is Better). CantRemember: 30.9, SE +/- 0.27, N = 15. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1, Test: CPU BLAS - dCOPY (GB/s, More Is Better). CantRemember: 20.5, SE +/- 0.25, N = 15. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
Rodinia 3.1, Test: OpenCL Particle Filter (Seconds, Fewer Is Better). CantRemember: 6.138, SE +/- 0.065, N = 3. (CXX) g++ options: -O2 -lOpenCL
Rodinia 3.1, Test: OpenCL Leukocyte (Seconds, Fewer Is Better). CantRemember: 3.282, SE +/- 0.033, N = 6. (CXX) g++ options: -O2 -lOpenCL
Rodinia 3.1, Test: OpenCL Myocyte (Seconds, Fewer Is Better). CantRemember: 28.61, SE +/- 0.38, N = 15. (CXX) g++ options: -O2 -lOpenCL
FluidX3D 1.4, Test: FP32-FP16S (MLUPs/s, More Is Better). CantRemember: 4506, SE +/- 1.53, N = 3.
FluidX3D 1.4, Test: FP32-FP16C (MLUPs/s, More Is Better). CantRemember: 4439, SE +/- 8.84, N = 3.
FluidX3D 1.4, Test: FP32-FP32 (MLUPs/s, More Is Better). CantRemember: 2246, SE +/- 1.53, N = 3.
cl-mem 2017-01-13, Benchmark: Write (GB/s, More Is Better). CantRemember: 321.1, SE +/- 0.07, N = 3. (CC) gcc options: -O2 -flto -lOpenCL
cl-mem 2017-01-13, Benchmark: Read (GB/s, More Is Better). CantRemember: 344.0, SE +/- 0.20, N = 3. (CC) gcc options: -O2 -flto -lOpenCL
cl-mem 2017-01-13, Benchmark: Copy (GB/s, More Is Better). CantRemember: 259.5, SE +/- 0.27, N = 3. (CC) gcc options: -O2 -flto -lOpenCL
SHOC Scalable HeterOgeneous Computing 2020-04-17, Target: OpenCL - Benchmark: Texture Read Bandwidth (GB/s, More Is Better). CantRemember: 2061.08, SE +/- 3.84, N = 3. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi
SHOC Scalable HeterOgeneous Computing 2020-04-17, Target: OpenCL - Benchmark: Bus Speed Readback (GB/s, More Is Better). CantRemember: 27.10, SE +/- 0.01, N = 3. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi
SHOC Scalable HeterOgeneous Computing 2020-04-17, Target: OpenCL - Benchmark: Bus Speed Download (GB/s, More Is Better). CantRemember: 26.31, SE +/- 0.01, N = 3. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi
SHOC Scalable HeterOgeneous Computing 2020-04-17, Target: OpenCL - Benchmark: Max SP Flops (GFLOPS, More Is Better). CantRemember: 20986.7, SE +/- 245.20, N = 3. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi
SHOC Scalable HeterOgeneous Computing 2020-04-17, Target: OpenCL - Benchmark: GEMM SGEMM_N (GFLOPS, More Is Better). CantRemember: 3589.40, SE +/- 22.79, N = 3. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi
SHOC Scalable HeterOgeneous Computing 2020-04-17, Target: OpenCL - Benchmark: Reduction (GB/s, More Is Better). CantRemember: 327.38, SE +/- 0.35, N = 3. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi
SHOC Scalable HeterOgeneous Computing 2020-04-17, Target: OpenCL - Benchmark: MD5 Hash (GHash/s, More Is Better). CantRemember: 25.28, SE +/- 0.04, N = 3. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi
SHOC Scalable HeterOgeneous Computing 2020-04-17, Target: OpenCL - Benchmark: FFT SP (GFLOPS, More Is Better). CantRemember: 1117.17, SE +/- 7.28, N = 3. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi
SHOC Scalable HeterOgeneous Computing 2020-04-17, Target: OpenCL - Benchmark: Triad (GB/s, More Is Better). CantRemember: 24.65, SE +/- 0.00, N = 3. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi
SHOC Scalable HeterOgeneous Computing 2020-04-17, Target: OpenCL - Benchmark: S3D (GFLOPS, More Is Better). CantRemember: 217.98, SE +/- 0.19, N = 3. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi
ViennaCL 1.7.1, Test: CPU BLAS - dGEMV-T (GB/s, More Is Better). CantRemember: 40.1, SE +/- 1.19, N = 13. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1, Test: CPU BLAS - dGEMV-N (GB/s, More Is Better). CantRemember: 42.6, SE +/- 0.70, N = 15. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1, Test: CPU BLAS - dDOT (GB/s, More Is Better). CantRemember: 29.8, SE +/- 0.50, N = 14. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1, Test: CPU BLAS - sDOT (GB/s, More Is Better). CantRemember: 37.7, SE +/- 1.56, N = 15. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1, Test: CPU BLAS - sAXPY (GB/s, More Is Better). CantRemember: 37.1, SE +/- 0.58, N = 15. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1, Test: CPU BLAS - sCOPY (GB/s, More Is Better). CantRemember: 24.1, SE +/- 0.79, N = 15. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
Phoronix Test Suite v10.8.4