NVIDIA GPU Compute Benchmarks

Benchmarks for a future article.

HTML result view exported from: https://openbenchmarking.org/result/2106127-IB-3080C630035&grw.

NVIDIA GPU Compute Benchmarks: system details

System: RTX 3080 RBAR
Processor: AMD Ryzen 9 5900X 12-Core @ 3.70GHz (12 Cores / 24 Threads)
Motherboard: ASUS ROG CROSSHAIR VIII HERO (3402 BIOS)
Chipset: AMD Starship/Matisse
Memory: 16GB
Disk: 1000GB Sabrent Rocket 4.0 Plus + 2000GB
Graphics: NVIDIA GeForce RTX 3080 10GB
Audio: NVIDIA GA102 HD Audio
Monitor: ASUS VP28U
Network: Realtek RTL8125 2.5GbE + Intel I211
OS: Ubuntu 21.04
Kernel: 5.11.0-17-generic (x86_64)
Desktop: GNOME Shell 3.38.4
Display Server: X Server 1.20.11
Display Driver: NVIDIA 465.31
OpenGL: 4.6.0
OpenCL: OpenCL 3.0 CUDA 11.3.116
Vulkan: 1.2.168
Compiler: GCC 10.3.0 + CUDA 11.3
File-System: ext4
Screen Resolution: 3840x2160

- Transparent Huge Pages: madvise
- --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-mutex --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-10-gDeRY6/gcc-10-10.3.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-10-gDeRY6/gcc-10-10.3.0/debian/tmp-gcn/usr,hsa --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
- Scaling Governor: acpi-cpufreq performance (Boost: Enabled) - CPU Microcode: 0xa201009
- GPU Compute Cores: 8704
- Python 3.9.5
- itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Full AMD retpoline IBPB: conditional IBRS_FW STIBP: always-on RSB filling + srbds: Not affected + tsx_async_abort: Not affected

NVIDIA GPU Compute Benchmarks: result overview for RTX 3080 RBAR

betsy: ETC1 - Highest = 3.344
betsy: ETC2 RGB - Highest = 4.528
plaidml: No - Inference - ResNet 50 - OpenCL = 727.41
plaidml: No - Inference - VGG16 - OpenCL = 280.78
plaidml: No - Inference - VGG19 - OpenCL = 223.01
lczero: OpenCL = 37359
shoc: OpenCL - Texture Read Bandwidth = 2205.99
shoc: OpenCL - FFT SP = 1914.11
shoc: OpenCL - GEMM SGEMM_N = 6243.80
shoc: OpenCL - MD5 Hash = 37.2172
shoc: OpenCL - S3D = 337.643
arrayfire: BLAS OpenCL = 7394.20
arrayfire: Conjugate Gradient OpenCL = 1.563
v-ray: NVIDIA CUDA GPU = 1773
v-ray: NVIDIA RTX GPU = 2379
blender: BMW27 - CUDA = 22.11
blender: BMW27 - NVIDIA OptiX = 11.36
blender: Classroom - CUDA = 61.22
blender: Classroom - NVIDIA OptiX = 36.41
blender: Fishy Cat - CUDA = 41.69
blender: Fishy Cat - NVIDIA OptiX = 22.98
blender: Pabellon Barcelona - CUDA = 147.02
blender: Pabellon Barcelona - NVIDIA OptiX = 53.33
blender: Barbershop - CUDA = 445.39
blender: Barbershop - NVIDIA OptiX = 408.75
indigobench: OpenCL GPU - Supercar = 46.144
indigobench: OpenCL GPU - Bedroom = 17.472
luxcorerender: DLSC - GPU = 11.62
luxcorerender: Rainbow Colors and Prism - GPU = 28.49
luxcorerender: LuxCore Benchmark - GPU = 9.98
luxcorerender: Orange Juice - GPU = 10.87
luxcorerender: Danish Mood - GPU = 8.03
fahbench = 321.0546
hashcat: MD5 = 58598366667
hashcat: SHA1 = 19164533333
hashcat: SHA-512 = 2753700000
hashcat: 7-Zip = 995733
hashcat: TrueCrypt RIPEMD160 + XTS = 720167
mixbench: NVIDIA CUDA - Single Precision = 29095.86
mixbench: NVIDIA CUDA - Double Precision = 417.98
mixbench: NVIDIA CUDA - Half Precision = 31706.66
mixbench: NVIDIA CUDA - Integer = 13924.20
namd-cuda: ATPase Simulation - 327,506 Atoms = 0.12157
octanebench: Total Score = 567.026556
redshift = 165
cl-mem: Read = 670.3
cl-mem: Write = 634.5
cl-mem: Copy = 353.4
clpeak: Global Memory Bandwidth = 659.40
clpeak: Single-Precision Float = 29422.68
clpeak: Double-Precision Double = 540.44
clpeak: Integer Compute INT = 15383.91
viennacl: OpenCL BLAS - sCOPY = 354
viennacl: OpenCL BLAS - sAXPY = 477
viennacl: OpenCL BLAS - dCOPY = 552
viennacl: OpenCL BLAS - dAXPY = 627
viennacl: OpenCL BLAS - dDOT = 596
viennacl: OpenCL BLAS - dGEMM-NN = 496
viennacl: OpenCL BLAS - dGEMM-NT = 498
viennacl: OpenCL BLAS - dGEMM-TN = 494
viennacl: OpenCL BLAS - sDOT = 363
viennacl: OpenCL BLAS - dGEMV-N = 240
viennacl: OpenCL BLAS - dGEMV-T = 374
viennacl: OpenCL BLAS - dGEMM-TT = 496
realsr-ncnn: 4x - Yes = 32.570
realsr-ncnn: 4x - No = 6.115
vkpeak: fp32-scalar = 17141.92
vkpeak: fp32-vec4 = 22762.96
vkpeak: fp16-scalar = 17075.80
vkpeak: fp16-vec4 = 34257.55
vkpeak: fp64-scalar = 534.10
vkpeak: fp64-vec4 = 536.05
vkpeak: int32-scalar = 17069.59
vkpeak: int32-vec4 = 17007.03
vkpeak: int16-scalar = 11282.67
vkpeak: int16-vec4 = 14506.52
vkresample: 2x - Single = 11.304
waifu2x-ncnn: 2x - 3 - Yes = 3.386

Betsy GPU Compressor

Codec: ETC1 - Quality: Highest

Betsy GPU Compressor 1.1 Beta - Seconds, Fewer Is Better
RTX 3080 RBAR: 3.344 (SE +/- 0.015, N = 3)
(CXX) g++ options: -O3 -O2 -lpthread -ldl

Betsy GPU Compressor

Codec: ETC2 RGB - Quality: Highest

Betsy GPU Compressor 1.1 Beta - Seconds, Fewer Is Better
RTX 3080 RBAR: 4.528 (SE +/- 0.049, N = 3)
(CXX) g++ options: -O3 -O2 -lpthread -ldl

PlaidML

FP16: No - Mode: Inference - Network: ResNet 50 - Device: OpenCL

PlaidML - FPS, More Is Better
RTX 3080 RBAR: 727.41 (SE +/- 1.15, N = 3)

PlaidML

FP16: No - Mode: Inference - Network: VGG16 - Device: OpenCL

PlaidML - FPS, More Is Better
RTX 3080 RBAR: 280.78 (SE +/- 0.65, N = 3)

PlaidML

FP16: No - Mode: Inference - Network: VGG19 - Device: OpenCL

PlaidML - FPS, More Is Better
RTX 3080 RBAR: 223.01 (SE +/- 0.43, N = 3)

LeelaChessZero

Backend: OpenCL

LeelaChessZero 0.26 - Nodes Per Second, More Is Better
RTX 3080 RBAR: 37359 (SE +/- 129.19, N = 3)
(CXX) g++ options: -flto -pthread

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Texture Read Bandwidth

SHOC Scalable HeterOgeneous Computing 2020-04-17 - GB/s, More Is Better
RTX 3080 RBAR: 2205.99 (SE +/- 3.47, N = 3)
(CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: FFT SP

SHOC Scalable HeterOgeneous Computing 2020-04-17 - GFLOPS, More Is Better
RTX 3080 RBAR: 1914.11 (SE +/- 3.10, N = 3)
(CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: GEMM SGEMM_N

SHOC Scalable HeterOgeneous Computing 2020-04-17 - GFLOPS, More Is Better
RTX 3080 RBAR: 6243.80 (SE +/- 27.64, N = 3)
(CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: MD5 Hash

SHOC Scalable HeterOgeneous Computing 2020-04-17 - GHash/s, More Is Better
RTX 3080 RBAR: 37.22 (SE +/- 0.00, N = 3)
(CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: S3D

SHOC Scalable HeterOgeneous Computing 2020-04-17 - GFLOPS, More Is Better
RTX 3080 RBAR: 337.64 (SE +/- 0.27, N = 3)
(CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

ArrayFire

Test: BLAS OpenCL

ArrayFire 3.7 - GFLOPS, More Is Better
RTX 3080 RBAR: 7394.20 (SE +/- 3.09, N = 3)
(CXX) g++ options: -rdynamic

ArrayFire

Test: Conjugate Gradient OpenCL

ArrayFire 3.7 - ms, Fewer Is Better
RTX 3080 RBAR: 1.563 (SE +/- 0.003, N = 3)
(CXX) g++ options: -rdynamic

Chaos Group V-RAY

Mode: NVIDIA CUDA GPU

Chaos Group V-RAY 5 - vpaths, More Is Better
RTX 3080 RBAR: 1773 (SE +/- 0.33, N = 3)

Chaos Group V-RAY

Mode: NVIDIA RTX GPU

Chaos Group V-RAY 5 - vrays, More Is Better
RTX 3080 RBAR: 2379 (SE +/- 2.73, N = 3)

Blender

Blend File: BMW27 - Compute: CUDA

Blender 2.92 - Seconds, Fewer Is Better
RTX 3080 RBAR: 22.11 (SE +/- 0.03, N = 3)

Blender

Blend File: BMW27 - Compute: NVIDIA OptiX

Blender 2.92 - Seconds, Fewer Is Better
RTX 3080 RBAR: 11.36 (SE +/- 0.03, N = 3)

Blender

Blend File: Classroom - Compute: CUDA

Blender 2.92 - Seconds, Fewer Is Better
RTX 3080 RBAR: 61.22 (SE +/- 0.08, N = 3)

Blender

Blend File: Classroom - Compute: NVIDIA OptiX

Blender 2.92 - Seconds, Fewer Is Better
RTX 3080 RBAR: 36.41 (SE +/- 0.03, N = 3)

Blender

Blend File: Fishy Cat - Compute: CUDA

Blender 2.92 - Seconds, Fewer Is Better
RTX 3080 RBAR: 41.69 (SE +/- 0.01, N = 3)

Blender

Blend File: Fishy Cat - Compute: NVIDIA OptiX

Blender 2.92 - Seconds, Fewer Is Better
RTX 3080 RBAR: 22.98 (SE +/- 0.02, N = 3)

Blender

Blend File: Pabellon Barcelona - Compute: CUDA

Blender 2.92 - Seconds, Fewer Is Better
RTX 3080 RBAR: 147.02 (SE +/- 0.06, N = 3)

Blender

Blend File: Pabellon Barcelona - Compute: NVIDIA OptiX

Blender 2.92 - Seconds, Fewer Is Better
RTX 3080 RBAR: 53.33 (SE +/- 0.17, N = 3)

Blender

Blend File: Barbershop - Compute: CUDA

Blender 2.92 - Seconds, Fewer Is Better
RTX 3080 RBAR: 445.39 (SE +/- 0.22, N = 3)

Blender

Blend File: Barbershop - Compute: NVIDIA OptiX

Blender 2.92 - Seconds, Fewer Is Better
RTX 3080 RBAR: 408.75 (SE +/- 0.62, N = 3)
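
To put the ten Blender results side by side, the ratio of CUDA time to OptiX time can be computed per scene from the reported means. The following is only a minimal sketch and not part of the exported result: the dictionary and function names are hypothetical, and the figures are copied from the Blender 2.92 results in this file.

```python
# Mean render times in seconds (fewer is better), copied from the
# Blender 2.92 results in this file. All names here are illustrative.
blender_seconds = {
    "BMW27": {"CUDA": 22.11, "OptiX": 11.36},
    "Classroom": {"CUDA": 61.22, "OptiX": 36.41},
    "Fishy Cat": {"CUDA": 41.69, "OptiX": 22.98},
    "Pabellon Barcelona": {"CUDA": 147.02, "OptiX": 53.33},
    "Barbershop": {"CUDA": 445.39, "OptiX": 408.75},
}


def cuda_to_optix_ratio(times):
    """Return CUDA time divided by OptiX time for each scene."""
    return {scene: t["CUDA"] / t["OptiX"] for scene, t in times.items()}


ratios = cuda_to_optix_ratio(blender_seconds)
for scene, ratio in ratios.items():
    print(f"{scene}: CUDA time / OptiX time = {ratio:.2f}")
```

A ratio above 1 means the OptiX run finished sooner than the CUDA run for that scene on this system.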

IndigoBench

Acceleration: OpenCL GPU - Scene: Supercar

IndigoBench 4.4 - M samples/s, More Is Better
RTX 3080 RBAR: 46.14 (SE +/- 0.04, N = 3)

IndigoBench

Acceleration: OpenCL GPU - Scene: Bedroom

IndigoBench 4.4 - M samples/s, More Is Better
RTX 3080 RBAR: 17.47 (SE +/- 0.01, N = 3)

LuxCoreRender

Scene: DLSC - Acceleration: GPU

LuxCoreRender 2.5 - M samples/sec, More Is Better
RTX 3080 RBAR: 11.62 (SE +/- 0.01, N = 3; MIN: 11.21 / MAX: 11.85)

LuxCoreRender

Scene: Rainbow Colors and Prism - Acceleration: GPU

LuxCoreRender 2.5 - M samples/sec, More Is Better
RTX 3080 RBAR: 28.49 (SE +/- 0.06, N = 3; MIN: 26.83 / MAX: 31.13)

LuxCoreRender

Scene: LuxCore Benchmark - Acceleration: GPU

LuxCoreRender 2.5 - M samples/sec, More Is Better
RTX 3080 RBAR: 9.98 (SE +/- 0.01, N = 3; MIN: 3.02 / MAX: 11.81)

LuxCoreRender

Scene: Orange Juice - Acceleration: GPU

LuxCoreRender 2.5 - M samples/sec, More Is Better
RTX 3080 RBAR: 10.87 (SE +/- 0.01, N = 3; MIN: 8.73 / MAX: 14.44)

LuxCoreRender

Scene: Danish Mood - Acceleration: GPU

LuxCoreRender 2.5 - M samples/sec, More Is Better
RTX 3080 RBAR: 8.03 (SE +/- 0.06, N = 3; MIN: 2.38 / MAX: 9.73)

FAHBench

FAHBench 2.3.2 - Ns Per Day, More Is Better
RTX 3080 RBAR: 321.05 (SE +/- 0.67, N = 3)

Hashcat

Benchmark: MD5

Hashcat 6.1.1 - H/s, More Is Better
RTX 3080 RBAR: 58598366667 (SE +/- 81551462.96, N = 3)

Hashcat

Benchmark: SHA1

Hashcat 6.1.1 - H/s, More Is Better
RTX 3080 RBAR: 19164533333 (SE +/- 22857116.57, N = 3)

Hashcat

Benchmark: SHA-512

Hashcat 6.1.1 - H/s, More Is Better
RTX 3080 RBAR: 2753700000 (SE +/- 2042057.79, N = 3)

Hashcat

Benchmark: 7-Zip

Hashcat 6.1.1 - H/s, More Is Better
RTX 3080 RBAR: 995733 (SE +/- 2771.48, N = 3)

Hashcat

Benchmark: TrueCrypt RIPEMD160 + XTS

Hashcat 6.1.1 - H/s, More Is Better
RTX 3080 RBAR: 720167 (SE +/- 1322.04, N = 3)

Mixbench

Backend: NVIDIA CUDA - Benchmark: Single Precision

Mixbench 2020-06-23 - GFLOPS, More Is Better
RTX 3080 RBAR: 29095.86 (SE +/- 555.26, N = 15)
(CXX) g++ options: -lm -lstdc++ -lOpenCL -lrt -O2

Mixbench

Backend: NVIDIA CUDA - Benchmark: Double Precision

Mixbench 2020-06-23 - GFLOPS, More Is Better
RTX 3080 RBAR: 417.98 (SE +/- 7.75, N = 15)
(CXX) g++ options: -lm -lstdc++ -lOpenCL -lrt -O2

Mixbench

Backend: NVIDIA CUDA - Benchmark: Half Precision

Mixbench 2020-06-23 - GFLOPS, More Is Better
RTX 3080 RBAR: 31706.66 (SE +/- 650.08, N = 12)
(CXX) g++ options: -lm -lstdc++ -lOpenCL -lrt -O2

Mixbench

Backend: NVIDIA CUDA - Benchmark: Integer

Mixbench 2020-06-23 - GIOPS, More Is Better
RTX 3080 RBAR: 13924.20 (SE +/- 258.54, N = 15)
(CXX) g++ options: -lm -lstdc++ -lOpenCL -lrt -O2
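
The Mixbench results report larger SE values and higher run counts (N = 12 to 15) than most other tests in this file. One way to compare that spread across tests with different units is to express SE as a percentage of the mean. The sketch below does this under the assumption that "SE" denotes the standard error of the mean; the variable and function names are hypothetical, and the figures are copied from the four Mixbench results.

```python
# (mean, SE) pairs copied from the Mixbench 2020-06-23 results in this file.
# Assumption: "SE" is the standard error of the reported mean.
mixbench = {
    "Single Precision": (29095.86, 555.26),
    "Double Precision": (417.98, 7.75),
    "Half Precision": (31706.66, 650.08),
    "Integer": (13924.20, 258.54),
}


def relative_se_percent(mean, se):
    """Express a standard error as a percentage of the mean."""
    return se / mean * 100.0


relative = {name: relative_se_percent(m, se) for name, (m, se) in mixbench.items()}
for name, pct in relative.items():
    print(f"{name}: SE is {pct:.2f}% of the mean")
```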

NAMD CUDA

ATPase Simulation - 327,506 Atoms

NAMD CUDA 2.14 - days/ns, Fewer Is Better
RTX 3080 RBAR: 0.12157 (SE +/- 0.00003, N = 3)
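
NAMD reports days per simulated nanosecond (fewer is better), which is the reciprocal of the ns/day convention used by some other molecular dynamics tools. A minimal conversion sketch follows; the function name is hypothetical, and the result applies only to this ATPase workload, so it is not comparable with figures from a different benchmark.

```python
def days_per_ns_to_ns_per_day(days_per_ns):
    """Convert a days/ns figure into ns/day by taking the reciprocal."""
    if days_per_ns <= 0:
        raise ValueError("days/ns must be positive")
    return 1.0 / days_per_ns


# Value copied from the NAMD CUDA 2.14 result in this file.
namd_days_per_ns = 0.12157
namd_ns_per_day = days_per_ns_to_ns_per_day(namd_days_per_ns)
print(f"{namd_days_per_ns} days/ns is about {namd_ns_per_day:.2f} ns/day")
```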

OctaneBench

Total Score

OctaneBench 2020.1 - Score, More Is Better
RTX 3080 RBAR: 567.03

RedShift Demo

RedShift Demo 3.0 - Seconds, Fewer Is Better
RTX 3080 RBAR: 165 (SE +/- 0.67, N = 3)

cl-mem

Benchmark: Read

cl-mem 2017-01-13 - GB/s, More Is Better
RTX 3080 RBAR: 670.3 (SE +/- 0.82, N = 3)
(CC) gcc options: -O2 -flto -lOpenCL

cl-mem

Benchmark: Write

cl-mem 2017-01-13 - GB/s, More Is Better
RTX 3080 RBAR: 634.5 (SE +/- 0.68, N = 3)
(CC) gcc options: -O2 -flto -lOpenCL

cl-mem

Benchmark: Copy

cl-mem 2017-01-13 - GB/s, More Is Better
RTX 3080 RBAR: 353.4 (SE +/- 0.12, N = 3)
(CC) gcc options: -O2 -flto -lOpenCL

clpeak

OpenCL Test: Global Memory Bandwidth

clpeak - GBPS, More Is Better
RTX 3080 RBAR: 659.40 (SE +/- 2.74, N = 3)
(CXX) g++ options: -O3 -rdynamic -lOpenCL

clpeak

OpenCL Test: Single-Precision Float

clpeak - GFLOPS, More Is Better
RTX 3080 RBAR: 29422.68 (SE +/- 169.65, N = 3)
(CXX) g++ options: -O3 -rdynamic -lOpenCL

clpeak

OpenCL Test: Double-Precision Double

clpeak - GFLOPS, More Is Better
RTX 3080 RBAR: 540.44 (SE +/- 0.39, N = 3)
(CXX) g++ options: -O3 -rdynamic -lOpenCL

clpeak

OpenCL Test: Integer Compute INT

clpeak - GIOPS, More Is Better
RTX 3080 RBAR: 15383.91 (SE +/- 105.69, N = 15)
(CXX) g++ options: -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - sCOPY

ViennaCL 1.7.1 - GB/s, More Is Better
RTX 3080 RBAR: 354 (SE +/- 0.67, N = 3)
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - sAXPY

ViennaCL 1.7.1 - GB/s, More Is Better
RTX 3080 RBAR: 477 (SE +/- 0.33, N = 3)
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dCOPY

ViennaCL 1.7.1 - GB/s, More Is Better
RTX 3080 RBAR: 552 (SE +/- 0.33, N = 3)
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dAXPY

ViennaCL 1.7.1 - GB/s, More Is Better
RTX 3080 RBAR: 627
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dDOT

ViennaCL 1.7.1 - GB/s, More Is Better
RTX 3080 RBAR: 596
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMM-NN

ViennaCL 1.7.1 - GFLOPs/s, More Is Better
RTX 3080 RBAR: 496 (SE +/- 1.33, N = 3)
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMM-NT

ViennaCL 1.7.1 - GFLOPs/s, More Is Better
RTX 3080 RBAR: 498 (SE +/- 1.20, N = 3)
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMM-TN

ViennaCL 1.7.1 - GFLOPs/s, More Is Better
RTX 3080 RBAR: 494 (SE +/- 1.00, N = 3)
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - sDOT

ViennaCL 1.7.1 - GB/s, More Is Better
RTX 3080 RBAR: 363 (SE +/- 0.33, N = 3)
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMV-N

ViennaCL 1.7.1 - GB/s, More Is Better
RTX 3080 RBAR: 240 (SE +/- 0.33, N = 3)
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMV-T

ViennaCL 1.7.1 - GB/s, More Is Better
RTX 3080 RBAR: 374 (SE +/- 0.33, N = 3)
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMM-TT

ViennaCL 1.7.1 - GFLOPs/s, More Is Better
RTX 3080 RBAR: 496
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

RealSR-NCNN

Scale: 4x - TAA: Yes

RealSR-NCNN 20200818 - Seconds, Fewer Is Better
RTX 3080 RBAR: 32.57 (SE +/- 0.07, N = 3)

RealSR-NCNN

Scale: 4x - TAA: No

RealSR-NCNN 20200818 - Seconds, Fewer Is Better
RTX 3080 RBAR: 6.115 (SE +/- 0.009, N = 3)

vkpeak

fp32-scalar

vkpeak 20210424 - GFLOPS, More Is Better
RTX 3080 RBAR: 17141.92 (SE +/- 42.89, N = 3)

vkpeak

fp32-vec4

vkpeak 20210424 - GFLOPS, More Is Better
RTX 3080 RBAR: 22762.96 (SE +/- 39.23, N = 3)

vkpeak

fp16-scalar

vkpeak 20210424 - GFLOPS, More Is Better
RTX 3080 RBAR: 17075.80 (SE +/- 42.62, N = 3)

vkpeak

fp16-vec4

vkpeak 20210424 - GFLOPS, More Is Better
RTX 3080 RBAR: 34257.55 (SE +/- 84.67, N = 3)

vkpeak

fp64-scalar

vkpeak 20210424 - GFLOPS, More Is Better
RTX 3080 RBAR: 534.10 (SE +/- 1.41, N = 3)

vkpeak

fp64-vec4

vkpeak 20210424 - GFLOPS, More Is Better
RTX 3080 RBAR: 536.05 (SE +/- 0.35, N = 3)

vkpeak

int32-scalar

vkpeak 20210424 - GIOPS, More Is Better
RTX 3080 RBAR: 17069.59 (SE +/- 3.48, N = 3)

vkpeak

int32-vec4

vkpeak 20210424 - GIOPS, More Is Better
RTX 3080 RBAR: 17007.03 (SE +/- 2.37, N = 3)

vkpeak

int16-scalar

vkpeak 20210424 - GIOPS, More Is Better
RTX 3080 RBAR: 11282.67 (SE +/- 0.12, N = 3)

vkpeak

int16-vec4

vkpeak 20210424 - GIOPS, More Is Better
RTX 3080 RBAR: 14506.52 (SE +/- 19.69, N = 3)

VkResample

Upscale: 2x - Precision: Single

VkResample 1.0 - ms, Fewer Is Better
RTX 3080 RBAR: 11.30 (SE +/- 0.01, N = 3)
(CXX) g++ options: -O3 -pthread

Waifu2x-NCNN Vulkan

Scale: 2x - Denoise: 3 - TAA: Yes

Waifu2x-NCNN Vulkan 20200818 - Seconds, Fewer Is Better
RTX 3080 RBAR: 3.386 (SE +/- 0.006, N = 3)


Phoronix Test Suite v10.8.5