nvidia_opencl_linux

AMD wx4150 on Ubuntu 20.04.2 with ROCM

HTML result view exported from: https://openbenchmarking.org/result/2103141-HA-2011304FI52&grr.

System Details

nvidia_opencl_linux
  Processor: Intel Core i7-4700MQ @ 3.40GHz (4 Cores / 8 Threads)
  Motherboard: HP 1909 (L70 Ver. 01.42 BIOS)
  Chipset: Intel Xeon E3-1200 v3/4th
  Memory: 32GB
  Disk: 500GB Samsung SSD 860 + 256GB SAMSUNG MZ7PD256 + 500GB Seagate ST500LT012-1DG14 + 256GB SAMSUNG MZMPD256 + 128GB ED2S5
  Graphics: NVIDIA Quadro M1000M 2GB (135/405MHz)
  Audio: IDT 92HD91BXX
  Network: Intel I217-LM + Intel 7260
  OS: Ubuntu 20.04
  Kernel: 5.4.0-53-generic (x86_64)
  Desktop: GNOME Shell 3.36.4
  Display Server: X Server 1.20.8
  Display Driver: NVIDIA 450.80.02
  OpenCL: OpenCL 1.2 CUDA 11.0.228
  Vulkan: 1.2.131
  Compiler: GCC 9.3.0
  File-System: ext4
  Screen Resolution: 1920x1200

amd_opencl_linux (the export lists only the following fields for this run)
  Motherboard: HP 1909 (L70 Ver. 01.45 BIOS)
  Disk: 500GB Samsung SSD 860 + 500GB Seagate ST500LT012-1DG14
  Graphics: Intel HD 4600 2GB (1150MHz)
  Monitor: HP ZR24w
  Kernel: 5.6.0-1042-oem (x86_64)
  Display Server: X Server 1.20.9
  OpenGL: 4.5 Mesa 20.2.6
  OpenCL: OpenCL 2.0 AMD-APP (3212.0)
  Vulkan: 1.2.145
  Screen Resolution: 3840x1200

Compiler Details
  - --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-9-HskZEa/gcc-9-9.3.0/debian/tmp-nvptx/usr,hsa --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v

Processor Details
  - Scaling Governor: intel_pstate powersave
  - CPU Microcode: 0x28
  - Thermald 1.9.1

OpenCL Details
  - nvidia_opencl_linux: GPU Compute Cores: 512

Python Details
  - nvidia_opencl_linux: Python 3.8.5

Security Details
  - itlb_multihit: KVM: Mitigation of Split huge pages
  - l1tf: Mitigation of PTE Inversion; VMX: conditional cache flushes SMT vulnerable
  - mds: Mitigation of Clear buffers; SMT vulnerable
  - meltdown: Mitigation of PTI
  - spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp
  - spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
  - spectre_v2: Mitigation of Full generic retpoline IBPB: conditional IBRS_FW STIBP: conditional RSB filling
  - srbds: Mitigation of Microcode
  - tsx_async_abort: Not affected

Kernel Details
  - amd_opencl_linux: Transparent Huge Pages: madvise

Graphics Details
  - amd_opencl_linux: GLAMOR

Result Overview

Test                                              nvidia_opencl_linux   amd_opencl_linux
blender: Barbershop - OpenCL                      2379.33               -
blender: BMW27 - OpenCL                           694.90                -
shoc: OpenCL - Max SP Flops                       1130.12               908135
rodinia: OpenCL Myocyte                           55.338                230.345
luxmark: GPU - Hotel                              751                   490
luxmark: GPU - Microphone                         2512                  3757
luxmark: GPU - Luxball HDR                        3730                  5322
rodinia: OpenCL Particle Filter                   45.306                -
smallpt-gpu: GPU - 1920 x 1200 - Caustic          1606756748            1615700927
smallpt-gpu: GPU - 1920 x 1200 - Caustic3         1606756995            1615701203
clpeak: Double-Precision Double                   35.63                 115.44
smallpt-gpu: GPU - 1920 x 1200 - Cornell          1606756869            1615701063
cl-mem: Read                                      67.4                  79.6
cl-mem: Copy                                      60.1                  68.5
cl-mem: Write                                     63.3                  74.1
rodinia: OpenCL Heartwall                         5.978                 7.798
darktable: Server Room - OpenCL                   4.594                 2.902
darktable: Boat - OpenCL                          11.031                8.047
clpeak: Kernel Latency                            7.75                  6.13
shoc: OpenCL - Texture Read Bandwidth             110.990               81.6671
darktable: Masskrug - OpenCL                      10.822                9.751
clpeak: Integer Compute INT                       251.31                368.48
clpeak: Transfer Bandwidth enqueueReadBuffer      6.63                  10.36
clpeak: Transfer Bandwidth enqueueWriteBuffer     10.93                 19.39
clpeak: Global Memory Bandwidth                   66.99                 75.21
shoc: OpenCL - MD5 Hash                           1.4322                2.3502
clpeak: Single-Precision Float                    700.29                1823.80
rodinia: OpenCL LavaMD                            3.951                 -
darktable: Server Rack - OpenCL                   0.323                 0.223
shoc: OpenCL - FFT SP                             122.373               217.139
shoc: OpenCL - Triad                              10.5663               4.7502
shoc: OpenCL - Bus Speed Readback                 12.7640               5.2461
shoc: OpenCL - Bus Speed Download                 12.6878               5.6690
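Because some tests are "Fewer Is Better" and others "More Is Better", the two runs are easier to compare as a ratio that always points the same way. The following is a minimal sketch, not part of the exported result: it copies three values from the overview by hand, and the helper name `amd_speedup` and the `higher_is_better` flag are hypothetical names introduced here for illustration.

```python
# Values copied by hand from the result overview:
# (nvidia_opencl_linux, amd_opencl_linux, higher_is_better)
results = {
    "rodinia: OpenCL Myocyte (Seconds)": (55.338, 230.345, False),
    "clpeak: Single-Precision Float (GFLOPS)": (700.29, 1823.80, True),
    "shoc: OpenCL - Triad (GB/s)": (10.5663, 4.7502, True),
}


def amd_speedup(nvidia, amd, higher_is_better):
    """Ratio > 1 means amd_opencl_linux did better, < 1 means it did worse."""
    # For throughput-style results the larger value wins; for timings the smaller one does.
    return amd / nvidia if higher_is_better else nvidia / amd


speedups = {name: amd_speedup(*values) for name, values in results.items()}

for name, ratio in speedups.items():
    print(f"{name}: amd_opencl_linux at {ratio:.2f}x of nvidia_opencl_linux")
```

A ratio below 1 for Myocyte and Triad and above 1 for Single-Precision Float matches the direction of the individual results reported below.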

Blender

Blend File: Barbershop - Compute: OpenCL

Blender 2.90 - Seconds, Fewer Is Better
  nvidia_opencl_linux: 2379.33 (SE +/- 5.31, N = 3)

Blender

Blend File: BMW27 - Compute: OpenCL

Blender 2.90 - Seconds, Fewer Is Better
  nvidia_opencl_linux: 694.90 (SE +/- 10.50, N = 3)

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Max SP Flops

SHOC Scalable HeterOgeneous Computing 2015-11-10 - GFLOPS, More Is Better
  nvidia_opencl_linux: 1130.12 (SE +/- 1.98, N = 3)
  amd_opencl_linux: 908135.00 (SE +/- 1706.69, N = 3)
  1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

Rodinia

Test: OpenCL Myocyte

Rodinia 3.1 - Seconds, Fewer Is Better
  nvidia_opencl_linux: 55.34 (SE +/- 0.27, N = 3)
  amd_opencl_linux: 230.35 (SE +/- 0.41, N = 3)
  1. (CXX) g++ options: -O2 -lOpenCL

LuxMark

OpenCL Device: GPU - Scene: Hotel

LuxMark 3.1 - Score, More Is Better
  nvidia_opencl_linux: 751 (SE +/- 3.48, N = 3)
  amd_opencl_linux: 490 (SE +/- 0.67, N = 3)

LuxMark

OpenCL Device: GPU - Scene: Microphone

LuxMark 3.1 - Score, More Is Better
  nvidia_opencl_linux: 2512 (SE +/- 2.67, N = 3)
  amd_opencl_linux: 3757 (SE +/- 10.84, N = 3)

LuxMark

OpenCL Device: GPU - Scene: Luxball HDR

LuxMark 3.1 - Score, More Is Better
  nvidia_opencl_linux: 3730
  amd_opencl_linux: 5322
  SE +/- 12.68, N = 3 (the export gives a single SE for this test and does not say which run it belongs to)

Rodinia

Test: OpenCL Particle Filter

Rodinia 3.1 - Seconds, Fewer Is Better
  nvidia_opencl_linux: 45.31 (SE +/- 0.03, N = 3)
  1. (CXX) g++ options: -O2 -lOpenCL

SmallPT GPU

OpenCL Device: GPU - Resolution: 1920 x 1200 - Scene: Caustic

SmallPT GPU 1.6pts1 - Samples/sec, More Is Better
  nvidia_opencl_linux: 1606756748 (SE +/- 24.25, N = 3)
  amd_opencl_linux: 1615700927 (SE +/- 25.98, N = 3)
  1. (CC) gcc options: -O3 -lm -ftree-vectorize -funroll-loops -lglut -lOpenCL -lGL

SmallPT GPU

OpenCL Device: GPU - Resolution: 1920 x 1200 - Scene: Caustic3

SmallPT GPU 1.6pts1 - Samples/sec, More Is Better
  nvidia_opencl_linux: 1606756995 (SE +/- 23.96, N = 3)
  amd_opencl_linux: 1615701203 (SE +/- 26.27, N = 3)
  1. (CC) gcc options: -O3 -lm -ftree-vectorize -funroll-loops -lglut -lOpenCL -lGL

clpeak

OpenCL Test: Double-Precision Double

clpeak - GFLOPS, More Is Better
  nvidia_opencl_linux: 35.63 (SE +/- 0.02, N = 3)
  amd_opencl_linux: 115.44 (SE +/- 0.11, N = 3)
  1. (CXX) g++ options: -O3 -rdynamic -lOpenCL

SmallPT GPU

OpenCL Device: GPU - Resolution: 1920 x 1200 - Scene: Cornell

SmallPT GPU 1.6pts1 - Samples/sec, More Is Better
  nvidia_opencl_linux: 1606756869 (SE +/- 20.78, N = 3)
  amd_opencl_linux: 1615701063 (SE +/- 24.54, N = 3)
  1. (CC) gcc options: -O3 -lm -ftree-vectorize -funroll-loops -lglut -lOpenCL -lGL

cl-mem

Benchmark: Read

cl-mem 2017-01-13 - GB/s, More Is Better
  nvidia_opencl_linux: 67.4 (SE +/- 0.03, N = 3)
  amd_opencl_linux: 79.6 (SE +/- 0.03, N = 3)
  1. (CC) gcc options: -O2 -flto -lOpenCL

cl-mem

Benchmark: Copy

cl-mem 2017-01-13 - GB/s, More Is Better
  nvidia_opencl_linux: 60.1 (SE +/- 0.00, N = 3)
  amd_opencl_linux: 68.5 (SE +/- 0.12, N = 3)
  1. (CC) gcc options: -O2 -flto -lOpenCL

cl-mem

Benchmark: Write

cl-mem 2017-01-13 - GB/s, More Is Better
  nvidia_opencl_linux: 63.3 (SE +/- 0.03, N = 3)
  amd_opencl_linux: 74.1 (SE +/- 0.13, N = 3)
  1. (CC) gcc options: -O2 -flto -lOpenCL

Rodinia

Test: OpenCL Heartwall

Rodinia 3.1 - Seconds, Fewer Is Better
  nvidia_opencl_linux: 5.978 (SE +/- 0.057, N = 14)
  amd_opencl_linux: 7.798 (SE +/- 0.026, N = 3)
  1. (CXX) g++ options: -O2 -lOpenCL
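The Heartwall result reports a different number of runs per system (N = 14 and N = 3), so the two SE figures are not directly comparable as a measure of run-to-run spread. As a rough sketch, assuming SE here is the usual standard error of the mean (sample standard deviation divided by the square root of N) and that the SE values appear in the same order as the runs, the per-run standard deviation can be recovered as shown. The helper name `stddev_from_se` is hypothetical.

```python
import math


def stddev_from_se(se, n):
    """Recover the sample standard deviation from a standard error of the mean.

    Assumes SE = s / sqrt(N); this is an assumption about how the figure was computed.
    """
    return se * math.sqrt(n)


# (SE, N) pairs copied from the Heartwall result, in the order the runs are listed.
heartwall = {
    "nvidia_opencl_linux": (0.057, 14),
    "amd_opencl_linux": (0.026, 3),
}

spread = {run: stddev_from_se(se, n) for run, (se, n) in heartwall.items()}

for run, s in spread.items():
    print(f"{run}: estimated standard deviation {s:.3f} s")
```

Under those assumptions the nvidia_opencl_linux run shows noticeably more run-to-run variation, which would be consistent with it having been repeated more times.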

Darktable

Test: Server Room - Acceleration: OpenCL

Darktable 3.0.1 - Seconds, Fewer Is Better
  nvidia_opencl_linux: 4.594 (SE +/- 0.007, N = 3)
  amd_opencl_linux: 2.902 (SE +/- 0.021, N = 15)

Darktable

Test: Boat - Acceleration: OpenCL

Darktable 3.0.1 - Seconds, Fewer Is Better
  nvidia_opencl_linux: 11.031 (SE +/- 0.011, N = 3)
  amd_opencl_linux: 8.047 (SE +/- 0.044, N = 3)

clpeak

OpenCL Test: Kernel Latency

clpeak - us, Fewer Is Better
  nvidia_opencl_linux: 7.75 (SE +/- 0.05, N = 3)
  amd_opencl_linux: 6.13 (SE +/- 0.06, N = 7)
  1. (CXX) g++ options: -O3 -rdynamic -lOpenCL

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Texture Read Bandwidth

SHOC Scalable HeterOgeneous Computing 2015-11-10 - GB/s, More Is Better
  nvidia_opencl_linux: 110.99 (SE +/- 0.57, N = 3)
  amd_opencl_linux: 81.67 (SE +/- 0.21, N = 3)
  1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

Darktable

Test: Masskrug - Acceleration: OpenCL

Darktable 3.0.1 - Seconds, Fewer Is Better
  nvidia_opencl_linux: 10.822 (SE +/- 0.011, N = 3)
  amd_opencl_linux: 9.751 (SE +/- 0.032, N = 3)

clpeak

OpenCL Test: Integer Compute INT

clpeak - GIOPS, More Is Better
  nvidia_opencl_linux: 251.31 (SE +/- 1.52, N = 3)
  amd_opencl_linux: 368.48 (SE +/- 0.11, N = 3)
  1. (CXX) g++ options: -O3 -rdynamic -lOpenCL

clpeak

OpenCL Test: Transfer Bandwidth enqueueReadBuffer

clpeak - GBPS, More Is Better
  nvidia_opencl_linux: 6.63 (SE +/- 0.01, N = 3)
  amd_opencl_linux: 10.36 (SE +/- 0.07, N = 3)
  1. (CXX) g++ options: -O3 -rdynamic -lOpenCL

clpeak

OpenCL Test: Transfer Bandwidth enqueueWriteBuffer

clpeak - GBPS, More Is Better
  nvidia_opencl_linux: 10.93 (SE +/- 0.06, N = 3)
  amd_opencl_linux: 19.39 (SE +/- 0.03, N = 3)
  1. (CXX) g++ options: -O3 -rdynamic -lOpenCL

clpeak

OpenCL Test: Global Memory Bandwidth

clpeak - GBPS, More Is Better
  nvidia_opencl_linux: 66.99 (SE +/- 0.05, N = 3)
  amd_opencl_linux: 75.21 (SE +/- 0.06, N = 3)
  1. (CXX) g++ options: -O3 -rdynamic -lOpenCL

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: MD5 Hash

SHOC Scalable HeterOgeneous Computing 2015-11-10 - GHash/s, More Is Better
  nvidia_opencl_linux: 1.4322 (SE +/- 0.0007, N = 3)
  amd_opencl_linux: 2.3502 (SE +/- 0.0003, N = 3)
  1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

clpeak

OpenCL Test: Single-Precision Float

clpeak - GFLOPS, More Is Better
  nvidia_opencl_linux: 700.29 (SE +/- 0.31, N = 3)
  amd_opencl_linux: 1823.80 (SE +/- 0.11, N = 3)
  1. (CXX) g++ options: -O3 -rdynamic -lOpenCL

Rodinia

Test: OpenCL LavaMD

Rodinia 3.1 - Seconds, Fewer Is Better
  nvidia_opencl_linux: 3.951 (SE +/- 0.052, N = 5)
  1. (CXX) g++ options: -O2 -lOpenCL

Darktable

Test: Server Rack - Acceleration: OpenCL

Darktable 3.0.1 - Seconds, Fewer Is Better
  nvidia_opencl_linux: 0.323 (SE +/- 0.001, N = 3)
  amd_opencl_linux: 0.223 (SE +/- 0.002, N = 15)

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: FFT SP

SHOC Scalable HeterOgeneous Computing 2015-11-10 - GFLOPS, More Is Better
  nvidia_opencl_linux: 122.37 (SE +/- 0.82, N = 3)
  amd_opencl_linux: 217.14 (SE +/- 0.28, N = 3)
  1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Triad

SHOC Scalable HeterOgeneous Computing 2015-11-10 - GB/s, More Is Better
  nvidia_opencl_linux: 10.5663 (SE +/- 0.0068, N = 3)
  amd_opencl_linux: 4.7502 (SE +/- 0.0171, N = 3)
  1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Bus Speed Readback

SHOC Scalable HeterOgeneous Computing 2015-11-10 - GB/s, More Is Better
  nvidia_opencl_linux: 12.7640 (SE +/- 0.0014, N = 3)
  amd_opencl_linux: 5.2461 (SE +/- 0.0057, N = 3)
  1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Bus Speed Download

SHOC Scalable HeterOgeneous Computing 2015-11-10 - GB/s, More Is Better
  nvidia_opencl_linux: 12.6878 (SE +/- 0.0011, N = 3)
  amd_opencl_linux: 5.6690 (SE +/- 0.0012, N = 3)
  1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi


Phoronix Test Suite v10.8.4