compulab-airtop-3-rtx-4000-compute
Intel Xeon E-2288G testing with a Compulab SBC-ATCFL v1.2 (ATOP3.PRD.0.29.2 BIOS) and NVIDIA Quadro RTX 4000 8GB on Ubuntu 20.10 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2010311-FI-COMPULABA24&sro&gru .
compulab-airtop-3-rtx-4000-compute: system details
Runs: 1, 1a, 2, 1b, 1c, 1d, 1e, NVIDIA Quadro RTX 4000, RTX 4000, NVIDIA RTX 4000
  Processor: Intel Xeon E-2288G @ 5.00GHz (8 Cores / 16 Threads)
  Motherboard: Compulab SBC-ATCFL v1.2 (ATOP3.PRD.0.29.2 BIOS)
  Chipset: Intel Cannon Lake PCH
  Memory: 64GB
  Disk: Samsung SSD 970 EVO Plus 250GB
  Graphics: NVIDIA Quadro RTX 4000 8GB (1005/6500MHz)
  Audio: Intel Cannon Lake PCH cAVS
  Monitor: VE228
  Network: Intel I219-LM + Intel I210
  OS: Ubuntu 20.10
  Kernel: 5.8.0-26-generic (x86_64)
  Desktop: GNOME Shell 3.38.1
  Display Server: X Server 1.20.9
  Display Driver: NVIDIA 455.28
  OpenGL: 4.6.0
  OpenCL: OpenCL 1.2 CUDA 11.1.96
  Vulkan: 1.2.142
  Compiler: GCC 10.2.0
  File-System: ext4
  Screen Resolution: 1920x1080
Graphics entries listed for later runs (the export does not show which run each belongs to):
  NVIDIA Quadro RTX 4000 8GB (300/405MHz)
  NVIDIA Quadro RTX 4000 8GB (1005/6500MHz)
  NVIDIA Quadro RTX 4000 8GB (300/405MHz)
  NVIDIA Quadro RTX 4000 8GB (1005/6500MHz)
Compiler Details
  --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-10-JvwpWM/gcc-10-10.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-10-JvwpWM/gcc-10-10.2.0/debian/tmp-gcn/usr,hsa --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Details
  Scaling Governor: intel_pstate powersave
  CPU Microcode: 0xd6
  Thermald 2.3
OpenCL Details
  GPU Compute Cores: 2304
Python Details
  1, 1a, 1b, 1d, 1e, NVIDIA Quadro RTX 4000, RTX 4000, NVIDIA RTX 4000: Python 3.8.6
Security Details
  itlb_multihit: KVM: Mitigation of VMX disabled
  l1tf: Not affected
  mds: Not affected
  meltdown: Not affected
  spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp
  spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
  spectre_v2: Mitigation of Enhanced IBRS IBPB: conditional RSB filling
  srbds: Mitigation of TSX disabled
  tsx_async_abort: Mitigation of TSX disabled
compulab-airtop-3-rtx-4000-compute: result overview
Tests, in column order:
  vkfft
  plaidml: No - Inference - IMDB LSTM - OpenCL
  plaidml: No - Inference - Mobilenet - OpenCL
  plaidml: Yes - Inference - Mobilenet - OpenCL
  plaidml: No - Inference - DenseNet 201 - OpenCL
  neatbench: GPU
  cl-mem: Copy
  cl-mem: Read
  cl-mem: Write
  clpeak: Global Memory Bandwidth
  viennacl: OpenCL LU Factorization
  clpeak: Single-Precision Float
  clpeak: Double-Precision Double
  clpeak: Integer Compute INT
  hashcat: MD5
  hashcat: SHA1
  hashcat: 7-Zip
  hashcat: SHA-512
  hashcat: TrueCrypt RIPEMD160 + XTS
  luxcorerender-cl: DLSC
  luxcorerender-cl: Food
  luxcorerender-cl: LuxCore Benchmark
  luxcorerender-cl: Rainbow Colors and Prism
  fahbench
  mandelgpu: GPU
  financebench: Black-Scholes OpenCL
  arrayfire: Conjugate Gradient OpenCL
  ncnn: Vulkan GPU - squeezenet
  ncnn: Vulkan GPU - mobilenet
  ncnn: Vulkan GPU-v2-v2 - mobilenet-v2
  ncnn: Vulkan GPU-v3-v3 - mobilenet-v3
  ncnn: Vulkan GPU - shufflenet-v2
  ncnn: Vulkan GPU - mnasnet
  ncnn: Vulkan GPU - efficientnet-b0
  ncnn: Vulkan GPU - blazeface
  ncnn: Vulkan GPU - googlenet
  ncnn: Vulkan GPU - vgg16
  ncnn: Vulkan GPU - resnet18
  ncnn: Vulkan GPU - alexnet
  ncnn: Vulkan GPU - resnet50
  ncnn: Vulkan GPU - yolov4-tiny
  realsr-ncnn: 4x - No
  realsr-ncnn: 4x - Yes
  waifu2x-ncnn: 2x - 3 - Yes
  redshift
  blender: BMW27 - CUDA
  blender: Classroom - CUDA
  blender: Fishy Cat - CUDA
  blender: Barbershop - CUDA
  blender: BMW27 - NVIDIA OptiX
  blender: Classroom - NVIDIA OptiX
  blender: Fishy Cat - NVIDIA OptiX
  blender: Barbershop - NVIDIA OptiX
  blender: Pabellon Barcelona - CUDA
  blender: Pabellon Barcelona - NVIDIA OptiX
Values per run, in the test order above. Runs with fewer values recorded only a subset of the tests; the per-test results that follow show which ones.
  1: 25694 283.0 379.7 325.5 68.2883 25121966667 8704500000 446900 1102266667 327233 14.034 12.552 80.445 5.502
  1a: 25585 282.6 379.2 320.4 68.4737 24944700000 8642000000 444300 1095266667 326300 14.032 12.604 81.129 5.551
  2: no values in this export
  1b: 25486 282.1 379.2 321.1 68.2546 24838966667 8615933333 442667 1091033333 324033 14.037 12.665 81.544 5.572
  1c: 12.701 82.464 5.671
  1d: 25027 281.1 379.3 321.1 68.3795 24259033333 8426866667 433300 1070066667 317867 3.99 1.52 3.40 10.42 191.6264 14.036 12.978 84.174 5.694 382
  1e: 25457 423.50 1490.38 1843.84 140.54 31.0 282.1 379.3 322.5 346.09 68.4059 6033.10 259.66 5712.25 24876900000 8633500000 443000 1092200000 324267 4.09 1.57 3.50 10.78 191.8417 248122412.9 14.034 2.247 3.77 4.66 1.48 1.71 1.33 1.52 2.73 0.63 3.32 8.79 1.8 2.18 3.89 8.62 12.580 81.174 5.571 381 57.80 218.66 112.74 756.64 32.14 115.78 58.38 1307.24 459.35 160.82
  NVIDIA Quadro RTX 4000: 24684 421.03 1475.98 1834.20 138.91 30.9 278.7 379.3 318.4 340.91 68.0188 6004.48 259.50 5741.92 23840966667 8254033333 424633 1049733333 312233 4.01 1.55 3.47 10.79 190.7594 248177151.4 14.034 2.255 3.78 4.67 1.48 1.74 1.35 1.54 2.75 0.63 3.40 9.14 1.95 2.21 3.93 8.33 13.078 85.926 5.785 391 58.10 224.85 114.00 771.19 29.23 118.39 59.12 1321.89 460.64 160.95
  RTX 4000: 24538 420.80 1480.20 1829.24 139.15 30.2 278.8 379.3 319.4 342.37 68.0204 6536.45 259.33 6013.59 23506866667 8181566667 417767 1041500000 309400 4.02 1.55 3.47 10.73 190.8199 246857018.0 14.034 2.257 3.80 4.86 1.48 1.74 1.36 1.54 2.76 0.64 3.37 9.02 1.77 2.27 4.08 8.22 13.350 87.793 5.861 393 58.41 223.81 113.88 764.30 29.14 117.20 58.69 1307.37 459.08 159.09
  NVIDIA RTX 4000: 25593 282.1 379.3 323.0 68.2894 25075466667 8700533333 446400 1100433333 327567 14.032 12.494 80.549 5.540 379
VkFFT 2020-09-29
Benchmark Score, More Is Better
  1: 25694
  1a: 25585
  1b: 25486
  1d: 25027
  1e: 25457
  NVIDIA Quadro RTX 4000: 24684
  NVIDIA RTX 4000: 25593
  RTX 4000: 24538
  Standard errors as exported (7 entries for 8 runs, so they cannot be assigned to runs): 28.39, 16.51, 27.82, 17.21, 32.54, 20.11, 4.04 (N = 3 each)
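To put the VkFFT scores above in proportion, the following minimal sketch computes the gap between the highest and lowest run as a percentage of the highest. The scores are copied from the result; the function name `relative_spread` is hypothetical and not part of the Phoronix Test Suite.

```python
# VkFFT benchmark scores copied from the result above (higher is better).
vkfft_scores = {
    "1": 25694,
    "1a": 25585,
    "1b": 25486,
    "1d": 25027,
    "1e": 25457,
    "NVIDIA Quadro RTX 4000": 24684,
    "NVIDIA RTX 4000": 25593,
    "RTX 4000": 24538,
}


def relative_spread(scores):
    """Return (best_run, worst_run, spread_percent) for higher-is-better scores.

    Hypothetical helper for illustration; spread is measured relative to the best run.
    """
    best = max(scores, key=scores.get)
    worst = min(scores, key=scores.get)
    spread = (scores[best] - scores[worst]) / scores[best] * 100.0
    return best, worst, spread


best, worst, spread = relative_spread(vkfft_scores)
print(f"best={best} worst={worst} spread={spread:.2f}%")
```

The same helper applies to any other "More Is Better" result in this export; for "Fewer Is Better" results the roles of `max` and `min` would need to be swapped.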
PlaidML FP16: No - Mode: Inference - Network: IMDB LSTM - Device: OpenCL
FPS, More Is Better
  1e: 423.50 (SE +/- 0.45, N = 3)
  NVIDIA Quadro RTX 4000: 421.03 (SE +/- 1.26, N = 3)
  RTX 4000: 420.80 (SE +/- 0.32, N = 3)
PlaidML FP16: No - Mode: Inference - Network: Mobilenet - Device: OpenCL
FPS, More Is Better
  1e: 1490.38 (SE +/- 5.16, N = 3)
  NVIDIA Quadro RTX 4000: 1475.98 (SE +/- 5.69, N = 3)
  RTX 4000: 1480.20 (SE +/- 5.32, N = 3)
PlaidML FP16: Yes - Mode: Inference - Network: Mobilenet - Device: OpenCL
FPS, More Is Better
  1e: 1843.84 (SE +/- 9.92, N = 3)
  NVIDIA Quadro RTX 4000: 1834.20 (SE +/- 2.10, N = 3)
  RTX 4000: 1829.24 (SE +/- 8.76, N = 3)
PlaidML FP16: No - Mode: Inference - Network: DenseNet 201 - Device: OpenCL
FPS, More Is Better
  1e: 140.54 (SE +/- 0.18, N = 3)
  NVIDIA Quadro RTX 4000: 138.91 (SE +/- 0.11, N = 3)
  RTX 4000: 139.15 (SE +/- 0.12, N = 3)
NeatBench 5 Acceleration: GPU
FPS, More Is Better
  1e: 31.0 (SE +/- 0.66, N = 15)
  NVIDIA Quadro RTX 4000: 30.9 (SE +/- 0.69, N = 15)
  RTX 4000: 30.2 (SE +/- 0.63, N = 15)
cl-mem 2017-01-13 Benchmark: Copy
GB/s, More Is Better
  1: 283.0 (SE +/- 0.26, N = 3)
  1a: 282.6 (SE +/- 0.06, N = 3)
  1b: 282.1 (SE +/- 0.12, N = 3)
  1d: 281.1 (SE +/- 0.20, N = 3)
  1e: 282.1 (SE +/- 0.12, N = 3)
  NVIDIA Quadro RTX 4000: 278.7 (SE +/- 0.20, N = 3)
  NVIDIA RTX 4000: 282.1 (SE +/- 0.28, N = 3)
  RTX 4000: 278.8 (SE +/- 0.12, N = 3)
  (CC) gcc options: -O2 -flto -lOpenCL
cl-mem 2017-01-13 Benchmark: Read
GB/s, More Is Better
  1: 379.7 (SE +/- 0.03, N = 3)
  1a: 379.2 (SE +/- 0.06, N = 3)
  1b: 379.2 (SE +/- 0.03, N = 3)
  1d: 379.3 (SE +/- 0.00, N = 3)
  1e: 379.3 (SE +/- 0.03, N = 3)
  NVIDIA Quadro RTX 4000: 379.3 (SE +/- 0.00, N = 3)
  NVIDIA RTX 4000: 379.3 (SE +/- 0.03, N = 3)
  RTX 4000: 379.3 (SE +/- 0.00, N = 3)
  (CC) gcc options: -O2 -flto -lOpenCL
cl-mem 2017-01-13 Benchmark: Write
GB/s, More Is Better
  1: 325.5 (SE +/- 1.79, N = 3)
  1a: 320.4 (SE +/- 1.48, N = 3)
  1b: 321.1 (SE +/- 0.78, N = 3)
  1d: 321.1 (SE +/- 0.96, N = 3)
  1e: 322.5 (SE +/- 0.58, N = 3)
  NVIDIA Quadro RTX 4000: 318.4 (SE +/- 2.17, N = 3)
  NVIDIA RTX 4000: 323.0 (SE +/- 1.47, N = 3)
  RTX 4000: 319.4 (SE +/- 1.44, N = 3)
  (CC) gcc options: -O2 -flto -lOpenCL
clpeak OpenCL Test: Global Memory Bandwidth
GBPS, More Is Better
  1e: 346.09 (SE +/- 4.72, N = 3)
  NVIDIA Quadro RTX 4000: 340.91 (SE +/- 4.44, N = 3)
  RTX 4000: 342.37 (SE +/- 5.13, N = 3)
  (CXX) g++ options: -O3 -rdynamic -lOpenCL
ViennaCL 1.4.2 OpenCL LU Factorization
GFLOPS, More Is Better
  1: 68.29 (SE +/- 0.12, N = 3)
  1a: 68.47 (SE +/- 0.12, N = 3)
  1b: 68.25 (SE +/- 0.28, N = 3)
  1d: 68.38 (SE +/- 0.06, N = 3)
  1e: 68.41 (SE +/- 0.02, N = 3)
  NVIDIA Quadro RTX 4000: 68.02 (SE +/- 0.02, N = 3)
  NVIDIA RTX 4000: 68.29 (SE +/- 0.02, N = 3)
  RTX 4000: 68.02 (SE +/- 0.06, N = 3)
  (CXX) g++ options: -rdynamic -lOpenCL
clpeak OpenCL Test: Single-Precision Float
GFLOPS, More Is Better
  1e: 6033.10 (SE +/- 35.64, N = 3)
  NVIDIA Quadro RTX 4000: 6004.48 (SE +/- 55.06, N = 3)
  RTX 4000: 6536.45 (SE +/- 97.61, N = 3)
  (CXX) g++ options: -O3 -rdynamic -lOpenCL
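The clpeak single-precision result above has the largest gap between runs among the clpeak tests. As a rough sketch only, the snippet below expresses the difference between two runs in units of their combined standard error. It assumes the SE entries are listed in the same order as the run labels (1e first, RTX 4000 last), which the export does not state explicitly, and `difference_in_se_units` is a hypothetical helper, not a Phoronix Test Suite function. With N = 3 per run this is a heuristic, not a formal significance test.

```python
import math


def difference_in_se_units(mean_a, se_a, mean_b, se_b):
    """Return |mean_a - mean_b| divided by the combined standard error.

    Hypothetical helper: combines the two standard errors in quadrature,
    which assumes the two runs are independent.
    """
    combined_se = math.sqrt(se_a ** 2 + se_b ** 2)
    if combined_se == 0:
        raise ValueError("both standard errors are zero; ratio is undefined")
    return abs(mean_a - mean_b) / combined_se


# clpeak Single-Precision Float (GFLOPS), copied from the result above.
# Assumption: SE values map to runs in the order the run labels are listed.
run_1e = (6033.10, 35.64)
run_rtx4000 = (6536.45, 97.61)

ratio = difference_in_se_units(*run_1e, *run_rtx4000)
print(f"difference = {ratio:.1f} combined standard errors")
```

A ratio well above 1 suggests the gap is larger than the run-to-run noise reported for these two runs; it says nothing about the cause.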
clpeak OpenCL Test: Double-Precision Double
GFLOPS, More Is Better
  1e: 259.66 (SE +/- 0.31, N = 3)
  NVIDIA Quadro RTX 4000: 259.50 (SE +/- 0.18, N = 3)
  RTX 4000: 259.33 (SE +/- 0.02, N = 3)
  (CXX) g++ options: -O3 -rdynamic -lOpenCL
clpeak OpenCL Test: Integer Compute INT
GIOPS, More Is Better
  1e: 5712.25 (SE +/- 68.26, N = 12)
  NVIDIA Quadro RTX 4000: 5741.92 (SE +/- 46.13, N = 3)
  RTX 4000: 6013.59 (SE +/- 102.19, N = 3)
  (CXX) g++ options: -O3 -rdynamic -lOpenCL
Hashcat 6.1.1 Benchmark: MD5
H/s, More Is Better
  1: 25121966667 (SE +/- 25031801.99, N = 3)
  1a: 24944700000 (SE +/- 2051828.45, N = 3)
  1b: 24838966667 (SE +/- 12651789.51, N = 3)
  1d: 24259033333 (SE +/- 2643440.52, N = 3)
  1e: 24876900000 (SE +/- 12698162.60, N = 3)
  NVIDIA Quadro RTX 4000: 23840966667 (SE +/- 13574649.58, N = 3)
  NVIDIA RTX 4000: 25075466667 (SE +/- 24626025.08, N = 3)
  RTX 4000: 23506866667 (SE +/- 2355372.11, N = 3)
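The Hashcat figures are reported in raw H/s and run to eleven digits for MD5. A small sketch for scaling them to a readable unit follows; `format_hash_rate` is a hypothetical helper written for this note, and the decimal (factor 1000) prefixes are an assumption about how the reader wants the rates displayed.

```python
def format_hash_rate(hashes_per_second):
    """Format a raw H/s figure with a decimal prefix (kH/s, MH/s, GH/s, TH/s).

    Hypothetical helper for illustration; it divides by 1000 until the value
    drops below 1000 or the largest unit is reached.
    """
    units = ["H/s", "kH/s", "MH/s", "GH/s", "TH/s"]
    value = float(hashes_per_second)
    for unit in units:
        if value < 1000.0 or unit == units[-1]:
            return f"{value:.2f} {unit}"
        value /= 1000.0


# Values copied from the Hashcat results in this export (run "1").
print(format_hash_rate(25121966667))  # MD5
print(format_hash_rate(446900))       # 7-Zip
```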
Hashcat 6.1.1 Benchmark: SHA1
H/s, More Is Better
  1: 8704500000 (SE +/- 9832090.32, N = 3)
  1a: 8642000000 (SE +/- 6847870.72, N = 3)
  1b: 8615933333 (SE +/- 3773739.67, N = 3)
  1d: 8426866667 (SE +/- 7846938.54, N = 3)
  1e: 8633500000 (SE +/- 6005275.46, N = 3)
  NVIDIA Quadro RTX 4000: 8254033333 (SE +/- 5691026.07, N = 3)
  NVIDIA RTX 4000: 8700533333 (SE +/- 2630800.47, N = 3)
  RTX 4000: 8181566667 (SE +/- 6590228.46, N = 3)
Hashcat 6.1.1 Benchmark: 7-Zip
H/s, More Is Better
  1: 446900 (SE +/- 208.17, N = 3)
  1a: 444300 (SE +/- 556.78, N = 3)
  1b: 442667 (SE +/- 202.76, N = 3)
  1d: 433300 (SE +/- 321.46, N = 3)
  1e: 443000 (SE +/- 1365.04, N = 3)
  NVIDIA Quadro RTX 4000: 424633 (SE +/- 463.08, N = 3)
  NVIDIA RTX 4000: 446400 (SE +/- 200.00, N = 3)
  RTX 4000: 417767 (SE +/- 233.33, N = 3)
Hashcat 6.1.1 Benchmark: SHA-512
H/s, More Is Better
  1: 1102266667 (SE +/- 819213.72, N = 3)
  1a: 1095266667 (SE +/- 491030.66, N = 3)
  1b: 1091033333 (SE +/- 643773.60, N = 3)
  1d: 1070066667 (SE +/- 1017076.42, N = 3)
  1e: 1092200000 (SE +/- 1021436.90, N = 3)
  NVIDIA Quadro RTX 4000: 1049733333 (SE +/- 1260070.54, N = 3)
  NVIDIA RTX 4000: 1100433333 (SE +/- 240370.09, N = 3)
  RTX 4000: 1041500000 (SE +/- 953939.20, N = 3)
Hashcat 6.1.1 Benchmark: TrueCrypt RIPEMD160 + XTS
H/s, More Is Better
  1: 327233
  1a: 326300
  1b: 324033
  1d: 317867
  1e: 324267
  NVIDIA Quadro RTX 4000: 312233
  NVIDIA RTX 4000: 327567
  RTX 4000: 309400
  Standard errors as exported (6 entries for 8 runs, so they cannot be assigned to runs): 533.33, 185.59, 88.19, 233.33, 317.98, 683.94 (N = 3 each)
LuxCoreRender OpenCL 2.3 Scene: DLSC
M samples/sec, More Is Better
  1d: 3.99 (SE +/- 0.08, N = 12; MIN: 1.12 / MAX: 4.22)
  1e: 4.09 (SE +/- 0.01, N = 3; MIN: 3.82 / MAX: 4.25)
  NVIDIA Quadro RTX 4000: 4.01 (SE +/- 0.02, N = 3; MIN: 3.83 / MAX: 4.21)
  RTX 4000: 4.02 (SE +/- 0.01, N = 3; MIN: 3.82 / MAX: 4.2)
LuxCoreRender OpenCL 2.3 Scene: Food
M samples/sec, More Is Better
  1d: 1.52 (SE +/- 0.04, N = 12; MIN: 0.14 / MAX: 1.88)
  1e: 1.57 (SE +/- 0.01, N = 3; MIN: 0.26 / MAX: 1.89)
  NVIDIA Quadro RTX 4000: 1.55 (SE +/- 0.01, N = 3; MIN: 0.25 / MAX: 1.85)
  RTX 4000: 1.55 (SE +/- 0.01, N = 3; MIN: 0.26 / MAX: 1.86)
LuxCoreRender OpenCL 2.3 Scene: LuxCore Benchmark
M samples/sec, More Is Better
  1d: 3.40 (SE +/- 0.07, N = 12; MIN: 0.17 / MAX: 3.97)
  1e: 3.50 (SE +/- 0.02, N = 3; MIN: 0.27 / MAX: 4)
  NVIDIA Quadro RTX 4000: 3.47 (SE +/- 0.01, N = 3; MIN: 0.27 / MAX: 3.96)
  RTX 4000: 3.47 (SE +/- 0.02, N = 3; MIN: 0.33 / MAX: 3.96)
LuxCoreRender OpenCL 2.3 Scene: Rainbow Colors and Prism
M samples/sec, More Is Better
  1d: 10.42 (SE +/- 0.34, N = 12; MIN: 3.45 / MAX: 11.19)
  1e: 10.78 (SE +/- 0.02, N = 3; MIN: 10.09 / MAX: 11.23)
  NVIDIA Quadro RTX 4000: 10.79 (SE +/- 0.06, N = 3; MIN: 10.45 / MAX: 11.21)
  RTX 4000: 10.73 (SE +/- 0.02, N = 3; MIN: 9.75 / MAX: 11.24)
FAHBench 2.3.2
Ns Per Day, More Is Better
  1d: 191.63 (SE +/- 0.40, N = 3)
  1e: 191.84 (SE +/- 0.32, N = 3)
  NVIDIA Quadro RTX 4000: 190.76 (SE +/- 0.27, N = 3)
  RTX 4000: 190.82 (SE +/- 0.37, N = 3)
MandelGPU 1.3pts1 OpenCL Device: GPU
Samples/sec, More Is Better
  1e: 248122412.9 (SE +/- 711502.39, N = 3)
  NVIDIA Quadro RTX 4000: 248177151.4 (SE +/- 308768.05, N = 3)
  RTX 4000: 246857018.0 (SE +/- 540319.59, N = 3)
  (CC) gcc options: -O3 -lm -ftree-vectorize -funroll-loops -lglut -lOpenCL -lGL
FinanceBench 2016-06-06 Benchmark: Black-Scholes OpenCL
ms, Fewer Is Better
  1: 14.03 (SE +/- 0.00, N = 3)
  1a: 14.03 (SE +/- 0.00, N = 3)
  1b: 14.04 (SE +/- 0.00, N = 3)
  1d: 14.04 (SE +/- 0.00, N = 3)
  1e: 14.03 (SE +/- 0.00, N = 3)
  NVIDIA Quadro RTX 4000: 14.03 (SE +/- 0.00, N = 3)
  NVIDIA RTX 4000: 14.03 (SE +/- 0.00, N = 3)
  RTX 4000: 14.03 (SE +/- 0.00, N = 3)
  (CXX) g++ options: -O3 -lOpenCL
ArrayFire 3.7 Test: Conjugate Gradient OpenCL
ms, Fewer Is Better
  1e: 2.247 (SE +/- 0.008, N = 3)
  NVIDIA Quadro RTX 4000: 2.255 (SE +/- 0.012, N = 3)
  RTX 4000: 2.257 (SE +/- 0.007, N = 3)
  (CXX) g++ options: -rdynamic
NCNN 20200916 Target: Vulkan GPU - Model: squeezenet
ms, Fewer Is Better
  1e: 3.77 (SE +/- 0.01, N = 3; MIN: 3.71 / MAX: 3.87)
  NVIDIA Quadro RTX 4000: 3.78 (SE +/- 0.01, N = 3; MIN: 3.72 / MAX: 3.84)
  RTX 4000: 3.80 (SE +/- 0.01, N = 3; MIN: 3.74 / MAX: 10.31)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916 Target: Vulkan GPU - Model: mobilenet
ms, Fewer Is Better
  1e: 4.66 (SE +/- 0.01, N = 3; MIN: 4.6 / MAX: 4.86)
  NVIDIA Quadro RTX 4000: 4.67 (SE +/- 0.01, N = 3; MIN: 4.64 / MAX: 4.75)
  RTX 4000: 4.86 (SE +/- 0.17, N = 3; MIN: 4.64 / MAX: 71.23)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916 Target: Vulkan GPU-v2-v2 - Model: mobilenet-v2
ms, Fewer Is Better
  1e: 1.48 (SE +/- 0.03, N = 3; MIN: 1.44 / MAX: 20.23)
  NVIDIA Quadro RTX 4000: 1.48 (SE +/- 0.00, N = 3; MIN: 1.46 / MAX: 1.5)
  RTX 4000: 1.48 (SE +/- 0.00, N = 3; MIN: 1.47 / MAX: 1.54)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916 Target: Vulkan GPU-v3-v3 - Model: mobilenet-v3
ms, Fewer Is Better
  1e: 1.71 (SE +/- 0.00, N = 3; MIN: 1.7 / MAX: 1.75)
  NVIDIA Quadro RTX 4000: 1.74 (SE +/- 0.00, N = 3; MIN: 1.73 / MAX: 1.81)
  RTX 4000: 1.74 (SE +/- 0.00, N = 3; MIN: 1.73 / MAX: 1.8)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916 Target: Vulkan GPU - Model: shufflenet-v2
ms, Fewer Is Better
  1e: 1.33 (SE +/- 0.00, N = 3; MIN: 1.32 / MAX: 1.4)
  NVIDIA Quadro RTX 4000: 1.35 (SE +/- 0.00, N = 3; MIN: 1.33 / MAX: 1.4)
  RTX 4000: 1.36 (SE +/- 0.00, N = 3; MIN: 1.34 / MAX: 1.41)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916 Target: Vulkan GPU - Model: mnasnet
ms, Fewer Is Better
  1e: 1.52 (SE +/- 0.00, N = 3; MIN: 1.5 / MAX: 1.56)
  NVIDIA Quadro RTX 4000: 1.54 (SE +/- 0.00, N = 3; MIN: 1.53 / MAX: 1.63)
  RTX 4000: 1.54 (SE +/- 0.00, N = 3; MIN: 1.53 / MAX: 1.63)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916 Target: Vulkan GPU - Model: efficientnet-b0
ms, Fewer Is Better
  1e: 2.73 (SE +/- 0.01, N = 3; MIN: 2.7 / MAX: 8.24)
  NVIDIA Quadro RTX 4000: 2.75 (SE +/- 0.00, N = 3; MIN: 2.74 / MAX: 3.38)
  RTX 4000: 2.76 (SE +/- 0.00, N = 3; MIN: 2.75 / MAX: 3.23)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916 Target: Vulkan GPU - Model: blazeface
ms, Fewer Is Better
  1e: 0.63 (SE +/- 0.00, N = 3; MIN: 0.62 / MAX: 0.68)
  NVIDIA Quadro RTX 4000: 0.63 (SE +/- 0.01, N = 3; MIN: 0.62 / MAX: 0.65)
  RTX 4000: 0.64 (SE +/- 0.00, N = 3; MIN: 0.62 / MAX: 0.84)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916 Target: Vulkan GPU - Model: googlenet
ms, Fewer Is Better
  1e: 3.32 (SE +/- 0.00, N = 3; MIN: 3.29 / MAX: 3.43)
  NVIDIA Quadro RTX 4000: 3.40 (SE +/- 0.02, N = 3; MIN: 3.33 / MAX: 20.26)
  RTX 4000: 3.37 (SE +/- 0.00, N = 3; MIN: 3.35 / MAX: 3.44)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916 Target: Vulkan GPU - Model: vgg16
ms, Fewer Is Better
  1e: 8.79 (SE +/- 0.04, N = 3; MIN: 8.1 / MAX: 20.83)
  NVIDIA Quadro RTX 4000: 9.14 (SE +/- 0.13, N = 3; MIN: 8.49 / MAX: 36.48)
  RTX 4000: 9.02 (SE +/- 0.04, N = 3; MIN: 8.35 / MAX: 20.34)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916 Target: Vulkan GPU - Model: resnet18
ms, Fewer Is Better
  1e: 1.80 (SE +/- 0.05, N = 2; MIN: 1.69 / MAX: 21.82)
  NVIDIA Quadro RTX 4000: 1.95 (SE +/- 0.13, N = 3; MIN: 1.7 / MAX: 20.49)
  RTX 4000: 1.77 (SE +/- 0.04, N = 3; MIN: 1.71 / MAX: 24.18)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916 Target: Vulkan GPU - Model: alexnet
ms, Fewer Is Better
  1e: 2.18 (SE +/- 0.02, N = 3; MIN: 1.91 / MAX: 11.43)
  NVIDIA Quadro RTX 4000: 2.21 (SE +/- 0.01, N = 3; MIN: 1.91 / MAX: 6.96)
  RTX 4000: 2.27 (SE +/- 0.03, N = 3; MIN: 2.15 / MAX: 23.82)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916 Target: Vulkan GPU - Model: resnet50
ms, Fewer Is Better
  1e: 3.89 (SE +/- 0.01, N = 3; MIN: 3.86 / MAX: 3.99)
  NVIDIA Quadro RTX 4000: 3.93 (SE +/- 0.01, N = 3; MIN: 3.91 / MAX: 4.04)
  RTX 4000: 4.08 (SE +/- 0.14, N = 3; MIN: 3.92 / MAX: 40.55)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916 Target: Vulkan GPU - Model: yolov4-tiny
ms, Fewer Is Better
  1e: 8.62 (SE +/- 0.37, N = 3; MIN: 8.1 / MAX: 74.77)
  NVIDIA Quadro RTX 4000: 8.33 (SE +/- 0.09, N = 3; MIN: 8.13 / MAX: 55.28)
  RTX 4000: 8.22 (SE +/- 0.00, N = 3; MIN: 8.15 / MAX: 8.57)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
RealSR-NCNN 20200818 Scale: 4x - TAA: No
Seconds, Fewer Is Better
  1: 12.55 (SE +/- 0.02, N = 3)
  1a: 12.60 (SE +/- 0.02, N = 3)
  1b: 12.67 (SE +/- 0.04, N = 3)
  1c: 12.70 (SE +/- 0.03, N = 3)
  1d: 12.98 (SE +/- 0.00, N = 3)
  1e: 12.58 (SE +/- 0.01, N = 3)
  NVIDIA Quadro RTX 4000: 13.08 (SE +/- 0.02, N = 3)
  NVIDIA RTX 4000: 12.49 (SE +/- 0.01, N = 3)
  RTX 4000: 13.35 (SE +/- 0.01, N = 3)
RealSR-NCNN 20200818 Scale: 4x - TAA: Yes
Seconds, Fewer Is Better
  1: 80.45 (SE +/- 0.35, N = 3)
  1a: 81.13 (SE +/- 0.36, N = 3)
  1b: 81.54 (SE +/- 0.40, N = 3)
  1c: 82.46 (SE +/- 0.40, N = 3)
  1d: 84.17 (SE +/- 0.36, N = 3)
  1e: 81.17 (SE +/- 0.36, N = 3)
  NVIDIA Quadro RTX 4000: 85.93 (SE +/- 0.45, N = 3)
  NVIDIA RTX 4000: 80.55 (SE +/- 0.37, N = 3)
  RTX 4000: 87.79 (SE +/- 0.27, N = 3)
Waifu2x-NCNN Vulkan 20200818 Scale: 2x - Denoise: 3 - TAA: Yes
Seconds, Fewer Is Better
  1: 5.502 (SE +/- 0.008, N = 3)
  1a: 5.551 (SE +/- 0.023, N = 3)
  1b: 5.572 (SE +/- 0.022, N = 3)
  1c: 5.671 (SE +/- 0.047, N = 3)
  1d: 5.694 (SE +/- 0.008, N = 3)
  1e: 5.571 (SE +/- 0.017, N = 3)
  NVIDIA Quadro RTX 4000: 5.785 (SE +/- 0.018, N = 3)
  NVIDIA RTX 4000: 5.540 (SE +/- 0.041, N = 3)
  RTX 4000: 5.861 (SE +/- 0.014, N = 3)
RedShift Demo 3.0
Seconds, Fewer Is Better
  1d: 382 (SE +/- 2.60, N = 3)
  1e: 381 (SE +/- 2.31, N = 3)
  NVIDIA Quadro RTX 4000: 391 (SE +/- 4.63, N = 3)
  NVIDIA RTX 4000: 379 (SE +/- 2.33, N = 3)
  RTX 4000: 393 (SE +/- 4.91, N = 3)
Blender 2.90 Blend File: BMW27 - Compute: CUDA
Seconds, Fewer Is Better
  1e: 57.80 (SE +/- 0.09, N = 3)
  NVIDIA Quadro RTX 4000: 58.10 (SE +/- 0.14, N = 3)
  RTX 4000: 58.41 (SE +/- 0.10, N = 3)
Blender 2.90 Blend File: Classroom - Compute: CUDA
Seconds, Fewer Is Better
  1e: 218.66 (SE +/- 1.49, N = 3)
  NVIDIA Quadro RTX 4000: 224.85 (SE +/- 3.62, N = 3)
  RTX 4000: 223.81 (SE +/- 3.26, N = 3)
Blender 2.90 Blend File: Fishy Cat - Compute: CUDA
Seconds, Fewer Is Better
  1e: 112.74 (SE +/- 0.28, N = 3)
  NVIDIA Quadro RTX 4000: 114.00 (SE +/- 0.16, N = 3)
  RTX 4000: 113.88 (SE +/- 0.21, N = 3)
Blender 2.90 Blend File: Barbershop - Compute: CUDA
Seconds, Fewer Is Better
  1e: 756.64 (SE +/- 2.80, N = 3)
  NVIDIA Quadro RTX 4000: 771.19 (SE +/- 1.15, N = 3)
  RTX 4000: 764.30 (SE +/- 0.87, N = 3)
Blender 2.90 Blend File: BMW27 - Compute: NVIDIA OptiX
Seconds, Fewer Is Better
  1e: 32.14 (SE +/- 3.24, N = 15)
  NVIDIA Quadro RTX 4000: 29.23 (SE +/- 0.09, N = 3)
  RTX 4000: 29.14 (SE +/- 0.06, N = 3)
Blender 2.90 Blend File: Classroom - Compute: NVIDIA OptiX
Seconds, Fewer Is Better
  1e: 115.78 (SE +/- 0.87, N = 3)
  NVIDIA Quadro RTX 4000: 118.39 (SE +/- 0.55, N = 3)
  RTX 4000: 117.20 (SE +/- 0.57, N = 3)
Blender 2.90 Blend File: Fishy Cat - Compute: NVIDIA OptiX
Seconds, Fewer Is Better
  1e: 58.38 (SE +/- 0.21, N = 3)
  NVIDIA Quadro RTX 4000: 59.12 (SE +/- 0.24, N = 3)
  RTX 4000: 58.69 (SE +/- 0.22, N = 3)
Blender 2.90 Blend File: Barbershop - Compute: NVIDIA OptiX
Seconds, Fewer Is Better
  1e: 1307.24 (SE +/- 4.68, N = 3)
  NVIDIA Quadro RTX 4000: 1321.89 (SE +/- 1.05, N = 3)
  RTX 4000: 1307.37 (SE +/- 0.53, N = 3)
Blender 2.90 Blend File: Pabellon Barcelona - Compute: CUDA
Seconds, Fewer Is Better
  1e: 459.35 (SE +/- 1.17, N = 3)
  NVIDIA Quadro RTX 4000: 460.64 (SE +/- 1.47, N = 3)
  RTX 4000: 459.08 (SE +/- 0.47, N = 3)
Blender 2.90 Blend File: Pabellon Barcelona - Compute: NVIDIA OptiX
Seconds, Fewer Is Better
  1e: 160.82 (SE +/- 0.15, N = 3)
  NVIDIA Quadro RTX 4000: 160.95 (SE +/- 0.11, N = 3)
  RTX 4000: 159.09 (SE +/- 0.48, N = 3)
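The Blender results list each scene under both CUDA and NVIDIA OptiX. As a minimal sketch, the snippet below divides the CUDA render time by the OptiX render time for run 1e, so a ratio above 1 means OptiX finished sooner and a ratio below 1 means CUDA did. The times are copied from the results above; the dictionary names and layout are illustrative choices, not part of the export.

```python
# Blender 2.90 render times in seconds for run "1e", copied from the results above.
cuda_seconds = {
    "BMW27": 57.80,
    "Classroom": 218.66,
    "Fishy Cat": 112.74,
    "Barbershop": 756.64,
    "Pabellon Barcelona": 459.35,
}
optix_seconds = {
    "BMW27": 32.14,
    "Classroom": 115.78,
    "Fishy Cat": 58.38,
    "Barbershop": 1307.24,
    "Pabellon Barcelona": 160.82,
}

# Ratio > 1: OptiX was faster; ratio < 1: CUDA was faster (fewer seconds is better).
cuda_over_optix = {
    scene: cuda_seconds[scene] / optix_seconds[scene] for scene in cuda_seconds
}

for scene, ratio in cuda_over_optix.items():
    print(f"{scene}: CUDA time / OptiX time = {ratio:.2f}")
```

Note that the BMW27 OptiX figure for run 1e carries a much larger standard error (3.24 over N = 15) than the other entries, so its ratio is the least stable of the five.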
Phoronix Test Suite v10.8.4