compulab-airtop-3-rtx-4000-compute
Intel Xeon E-2288G testing with a Compulab SBC-ATCFL v1.2 (ATOP3.PRD.0.29.2 BIOS) and NVIDIA Quadro RTX 4000 8GB on Ubuntu 20.10 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2010311-FI-COMPULABA24&sor .
System Details
  Configurations: 1, 1a, 2, 1b, 1c, 1d, 1e, NVIDIA Quadro RTX 4000, RTX 4000, NVIDIA RTX 4000
  Processor: Intel Xeon E-2288G @ 5.00GHz (8 Cores / 16 Threads)
  Motherboard: Compulab SBC-ATCFL v1.2 (ATOP3.PRD.0.29.2 BIOS)
  Chipset: Intel Cannon Lake PCH
  Memory: 64GB
  Disk: Samsung SSD 970 EVO Plus 250GB
  Graphics: NVIDIA Quadro RTX 4000 8GB (1005/6500MHz)
  Audio: Intel Cannon Lake PCH cAVS
  Monitor: VE228
  Network: Intel I219-LM + Intel I210
  OS: Ubuntu 20.10
  Kernel: 5.8.0-26-generic (x86_64)
  Desktop: GNOME Shell 3.38.1
  Display Server: X Server 1.20.9
  Display Driver: NVIDIA 455.28
  OpenGL: 4.6.0
  OpenCL: OpenCL 1.2 CUDA 11.1.96
  Vulkan: 1.2.142
  Compiler: GCC 10.2.0
  File-System: ext4
  Screen Resolution: 1920x1080
  Graphics entries listed for subsequent configurations, in listed order: NVIDIA Quadro RTX 4000 8GB (300/405MHz); NVIDIA Quadro RTX 4000 8GB (1005/6500MHz); NVIDIA Quadro RTX 4000 8GB (300/405MHz); NVIDIA Quadro RTX 4000 8GB (1005/6500MHz)
Compiler Details
  --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-10-JvwpWM/gcc-10-10.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-10-JvwpWM/gcc-10-10.2.0/debian/tmp-gcn/usr,hsa --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Details
  Scaling Governor: intel_pstate powersave
  CPU Microcode: 0xd6
  Thermald 2.3
OpenCL Details
  GPU Compute Cores: 2304
Python Details
  1, 1a, 1b, 1d, 1e, NVIDIA Quadro RTX 4000, RTX 4000, NVIDIA RTX 4000: Python 3.8.6
Security Details
  itlb_multihit: KVM: Mitigation of VMX disabled
  l1tf: Not affected
  mds: Not affected
  meltdown: Not affected
  spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp
  spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
  spectre_v2: Mitigation of Enhanced IBRS IBPB: conditional RSB filling
  srbds: Mitigation of TSX disabled
  tsx_async_abort: Mitigation of TSX disabled
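Results Overview ("-" marks a test with no result listed for that configuration)
Test | 1 | 1a | 2 | 1b | 1c | 1d | 1e | NVIDIA Quadro RTX 4000 | RTX 4000 | NVIDIA RTX 4000
realsr-ncnn: 4x - No | 12.552 | 12.604 | - | 12.665 | 12.701 | 12.978 | 12.580 | 13.078 | 13.350 | 12.494
realsr-ncnn: 4x - Yes | 80.445 | 81.129 | - | 81.544 | 82.464 | 84.174 | 81.174 | 85.926 | 87.793 | 80.549
waifu2x-ncnn: 2x - 3 - Yes | 5.502 | 5.551 | - | 5.572 | 5.671 | 5.694 | 5.571 | 5.785 | 5.861 | 5.540
vkfft | 25694 | 25585 | - | 25486 | - | 25027 | 25457 | 24684 | 24538 | 25593
hashcat: MD5 | 25121966667 | 24944700000 | - | 24838966667 | - | 24259033333 | 24876900000 | 23840966667 | 23506866667 | 25075466667
hashcat: SHA1 | 8704500000 | 8642000000 | - | 8615933333 | - | 8426866667 | 8633500000 | 8254033333 | 8181566667 | 8700533333
hashcat: 7-Zip | 446900 | 444300 | - | 442667 | - | 433300 | 443000 | 424633 | 417767 | 446400
hashcat: SHA-512 | 1102266667 | 1095266667 | - | 1091033333 | - | 1070066667 | 1092200000 | 1049733333 | 1041500000 | 1100433333
hashcat: TrueCrypt RIPEMD160 + XTS | 327233 | 326300 | - | 324033 | - | 317867 | 324267 | 312233 | 309400 | 327567
financebench: Black-Scholes OpenCL | 14.034 | 14.032 | - | 14.037 | - | 14.036 | 14.034 | 14.034 | 14.034 | 14.032
viennacl: OpenCL LU Factorization | 68.2883 | 68.4737 | - | 68.2546 | - | 68.3795 | 68.4059 | 68.0188 | 68.0204 | 68.2894
cl-mem: Copy | 283.0 | 282.6 | - | 282.1 | - | 281.1 | 282.1 | 278.7 | 278.8 | 282.1
cl-mem: Read | 379.7 | 379.2 | - | 379.2 | - | 379.3 | 379.3 | 379.3 | 379.3 | 379.3
cl-mem: Write | 325.5 | 320.4 | - | 321.1 | - | 321.1 | 322.5 | 318.4 | 319.4 | 323.0
redshift | - | - | - | - | - | 382 | 381 | 391 | 393 | 379
luxcorerender-cl: DLSC | - | - | - | - | - | 3.99 | 4.09 | 4.01 | 4.02 | -
luxcorerender-cl: Food | - | - | - | - | - | 1.52 | 1.57 | 1.55 | 1.55 | -
luxcorerender-cl: LuxCore Benchmark | - | - | - | - | - | 3.40 | 3.50 | 3.47 | 3.47 | -
luxcorerender-cl: Rainbow Colors and Prism | - | - | - | - | - | 10.42 | 10.78 | 10.79 | 10.73 | -
fahbench | - | - | - | - | - | 191.6264 | 191.8417 | 190.7594 | 190.8199 | -
arrayfire: Conjugate Gradient OpenCL | - | - | - | - | - | - | 2.247 | 2.255 | 2.257 | -
ncnn: Vulkan GPU - squeezenet | - | - | - | - | - | - | 3.77 | 3.78 | 3.80 | -
ncnn: Vulkan GPU - mobilenet | - | - | - | - | - | - | 4.66 | 4.67 | 4.86 | -
ncnn: Vulkan GPU-v2-v2 - mobilenet-v2 | - | - | - | - | - | - | 1.48 | 1.48 | 1.48 | -
ncnn: Vulkan GPU-v3-v3 - mobilenet-v3 | - | - | - | - | - | - | 1.71 | 1.74 | 1.74 | -
ncnn: Vulkan GPU - shufflenet-v2 | - | - | - | - | - | - | 1.33 | 1.35 | 1.36 | -
ncnn: Vulkan GPU - mnasnet | - | - | - | - | - | - | 1.52 | 1.54 | 1.54 | -
ncnn: Vulkan GPU - efficientnet-b0 | - | - | - | - | - | - | 2.73 | 2.75 | 2.76 | -
ncnn: Vulkan GPU - blazeface | - | - | - | - | - | - | 0.63 | 0.63 | 0.64 | -
ncnn: Vulkan GPU - googlenet | - | - | - | - | - | - | 3.32 | 3.40 | 3.37 | -
ncnn: Vulkan GPU - vgg16 | - | - | - | - | - | - | 8.79 | 9.14 | 9.02 | -
ncnn: Vulkan GPU - resnet18 | - | - | - | - | - | - | 1.8 | 1.95 | 1.77 | -
ncnn: Vulkan GPU - alexnet | - | - | - | - | - | - | 2.18 | 2.21 | 2.27 | -
ncnn: Vulkan GPU - resnet50 | - | - | - | - | - | - | 3.89 | 3.93 | 4.08 | -
ncnn: Vulkan GPU - yolov4-tiny | - | - | - | - | - | - | 8.62 | 8.33 | 8.22 | -
plaidml: No - Inference - IMDB LSTM - OpenCL | - | - | - | - | - | - | 423.50 | 421.03 | 420.80 | -
plaidml: No - Inference - Mobilenet - OpenCL | - | - | - | - | - | - | 1490.38 | 1475.98 | 1480.20 | -
plaidml: Yes - Inference - Mobilenet - OpenCL | - | - | - | - | - | - | 1843.84 | 1834.20 | 1829.24 | -
plaidml: No - Inference - DenseNet 201 - OpenCL | - | - | - | - | - | - | 140.54 | 138.91 | 139.15 | -
blender: BMW27 - CUDA | - | - | - | - | - | - | 57.80 | 58.10 | 58.41 | -
blender: Classroom - CUDA | - | - | - | - | - | - | 218.66 | 224.85 | 223.81 | -
blender: Fishy Cat - CUDA | - | - | - | - | - | - | 112.74 | 114.00 | 113.88 | -
blender: Barbershop - CUDA | - | - | - | - | - | - | 756.64 | 771.19 | 764.30 | -
blender: BMW27 - NVIDIA OptiX | - | - | - | - | - | - | 32.14 | 29.23 | 29.14 | -
blender: Classroom - NVIDIA OptiX | - | - | - | - | - | - | 115.78 | 118.39 | 117.20 | -
blender: Fishy Cat - NVIDIA OptiX | - | - | - | - | - | - | 58.38 | 59.12 | 58.69 | -
blender: Barbershop - NVIDIA OptiX | - | - | - | - | - | - | 1307.24 | 1321.89 | 1307.37 | -
blender: Pabellon Barcelona - CUDA | - | - | - | - | - | - | 459.35 | 460.64 | 459.08 | -
blender: Pabellon Barcelona - NVIDIA OptiX | - | - | - | - | - | - | 160.82 | 160.95 | 159.09 | -
mandelgpu: GPU | - | - | - | - | - | - | 248122412.9 | 248177151.4 | 246857018.0 | -
clpeak: Integer Compute INT | - | - | - | - | - | - | 5712.25 | 5741.92 | 6013.59 | -
clpeak: Single-Precision Float | - | - | - | - | - | - | 6033.10 | 6004.48 | 6536.45 | -
clpeak: Double-Precision Double | - | - | - | - | - | - | 259.66 | 259.50 | 259.33 | -
clpeak: Global Memory Bandwidth | - | - | - | - | - | - | 346.09 | 340.91 | 342.37 | -
neatbench: GPU | - | - | - | - | - | - | 31.0 | 30.9 | 30.2 | -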
RealSR-NCNN 20200818, Scale: 4x - TAA: No
  Seconds, Fewer Is Better
  NVIDIA RTX 4000: 12.49 (SE +/- 0.01, N = 3)
  1: 12.55 (SE +/- 0.02, N = 3)
  1e: 12.58 (SE +/- 0.01, N = 3)
  1a: 12.60 (SE +/- 0.02, N = 3)
  1b: 12.67 (SE +/- 0.04, N = 3)
  1c: 12.70 (SE +/- 0.03, N = 3)
  1d: 12.98 (SE +/- 0.00, N = 3)
  NVIDIA Quadro RTX 4000: 13.08 (SE +/- 0.02, N = 3)
  RTX 4000: 13.35 (SE +/- 0.01, N = 3)
RealSR-NCNN 20200818, Scale: 4x - TAA: Yes
  Seconds, Fewer Is Better
  1: 80.45 (SE +/- 0.35, N = 3)
  NVIDIA RTX 4000: 80.55 (SE +/- 0.37, N = 3)
  1a: 81.13 (SE +/- 0.36, N = 3)
  1e: 81.17 (SE +/- 0.36, N = 3)
  1b: 81.54 (SE +/- 0.40, N = 3)
  1c: 82.46 (SE +/- 0.40, N = 3)
  1d: 84.17 (SE +/- 0.36, N = 3)
  NVIDIA Quadro RTX 4000: 85.93 (SE +/- 0.45, N = 3)
  RTX 4000: 87.79 (SE +/- 0.27, N = 3)
Waifu2x-NCNN Vulkan 20200818, Scale: 2x - Denoise: 3 - TAA: Yes
  Seconds, Fewer Is Better
  1: 5.502 (SE +/- 0.008, N = 3)
  NVIDIA RTX 4000: 5.540 (SE +/- 0.041, N = 3)
  1a: 5.551 (SE +/- 0.023, N = 3)
  1e: 5.571 (SE +/- 0.017, N = 3)
  1b: 5.572 (SE +/- 0.022, N = 3)
  1c: 5.671 (SE +/- 0.047, N = 3)
  1d: 5.694 (SE +/- 0.008, N = 3)
  NVIDIA Quadro RTX 4000: 5.785 (SE +/- 0.018, N = 3)
  RTX 4000: 5.861 (SE +/- 0.014, N = 3)
VkFFT 2020-09-29
  Benchmark Score, More Is Better
  1: 25694
  NVIDIA RTX 4000: 25593
  1a: 25585
  1b: 25486
  1e: 25457
  1d: 25027
  NVIDIA Quadro RTX 4000: 24684
  RTX 4000: 24538
  SE values as listed (N = 3 each): 28.39, 20.11, 16.51, 27.82, 32.54, 17.21, 4.04
Hashcat 6.1.1, Benchmark: MD5
  H/s, More Is Better
  1: 25121966667 (SE +/- 25031801.99, N = 3)
  NVIDIA RTX 4000: 25075466667 (SE +/- 24626025.08, N = 3)
  1a: 24944700000 (SE +/- 2051828.45, N = 3)
  1e: 24876900000 (SE +/- 12698162.60, N = 3)
  1b: 24838966667 (SE +/- 12651789.51, N = 3)
  1d: 24259033333 (SE +/- 2643440.52, N = 3)
  NVIDIA Quadro RTX 4000: 23840966667 (SE +/- 13574649.58, N = 3)
  RTX 4000: 23506866667 (SE +/- 2355372.11, N = 3)
Hashcat 6.1.1, Benchmark: SHA1
  H/s, More Is Better
  1: 8704500000 (SE +/- 9832090.32, N = 3)
  NVIDIA RTX 4000: 8700533333 (SE +/- 2630800.47, N = 3)
  1a: 8642000000 (SE +/- 6847870.72, N = 3)
  1e: 8633500000 (SE +/- 6005275.46, N = 3)
  1b: 8615933333 (SE +/- 3773739.67, N = 3)
  1d: 8426866667 (SE +/- 7846938.54, N = 3)
  NVIDIA Quadro RTX 4000: 8254033333 (SE +/- 5691026.07, N = 3)
  RTX 4000: 8181566667 (SE +/- 6590228.46, N = 3)
Hashcat 6.1.1, Benchmark: 7-Zip
  H/s, More Is Better
  1: 446900 (SE +/- 208.17, N = 3)
  NVIDIA RTX 4000: 446400 (SE +/- 200.00, N = 3)
  1a: 444300 (SE +/- 556.78, N = 3)
  1e: 443000 (SE +/- 1365.04, N = 3)
  1b: 442667 (SE +/- 202.76, N = 3)
  1d: 433300 (SE +/- 321.46, N = 3)
  NVIDIA Quadro RTX 4000: 424633 (SE +/- 463.08, N = 3)
  RTX 4000: 417767 (SE +/- 233.33, N = 3)
Hashcat 6.1.1, Benchmark: SHA-512
  H/s, More Is Better
  1: 1102266667 (SE +/- 819213.72, N = 3)
  NVIDIA RTX 4000: 1100433333 (SE +/- 240370.09, N = 3)
  1a: 1095266667 (SE +/- 491030.66, N = 3)
  1e: 1092200000 (SE +/- 1021436.90, N = 3)
  1b: 1091033333 (SE +/- 643773.60, N = 3)
  1d: 1070066667 (SE +/- 1017076.42, N = 3)
  NVIDIA Quadro RTX 4000: 1049733333 (SE +/- 1260070.54, N = 3)
  RTX 4000: 1041500000 (SE +/- 953939.20, N = 3)
Hashcat 6.1.1, Benchmark: TrueCrypt RIPEMD160 + XTS
  H/s, More Is Better
  NVIDIA RTX 4000: 327567
  1: 327233
  1a: 326300
  1e: 324267
  1b: 324033
  1d: 317867
  NVIDIA Quadro RTX 4000: 312233
  RTX 4000: 309400
  SE values as listed (N = 3 each): 683.94, 533.33, 233.33, 185.59, 88.19, 317.98
FinanceBench 2016-06-06, Benchmark: Black-Scholes OpenCL
  ms, Fewer Is Better
  1a: 14.03 (SE +/- 0.00, N = 3)
  NVIDIA RTX 4000: 14.03 (SE +/- 0.00, N = 3)
  1: 14.03 (SE +/- 0.00, N = 3)
  1e: 14.03 (SE +/- 0.00, N = 3)
  NVIDIA Quadro RTX 4000: 14.03 (SE +/- 0.00, N = 3)
  RTX 4000: 14.03 (SE +/- 0.00, N = 3)
  1d: 14.04 (SE +/- 0.00, N = 3)
  1b: 14.04 (SE +/- 0.00, N = 3)
  (CXX) g++ options: -O3 -lOpenCL
ViennaCL 1.4.2, OpenCL LU Factorization
  GFLOPS, More Is Better
  1a: 68.47 (SE +/- 0.12, N = 3)
  1e: 68.41 (SE +/- 0.02, N = 3)
  1d: 68.38 (SE +/- 0.06, N = 3)
  NVIDIA RTX 4000: 68.29 (SE +/- 0.02, N = 3)
  1: 68.29 (SE +/- 0.12, N = 3)
  1b: 68.25 (SE +/- 0.28, N = 3)
  RTX 4000: 68.02 (SE +/- 0.06, N = 3)
  NVIDIA Quadro RTX 4000: 68.02 (SE +/- 0.02, N = 3)
  (CXX) g++ options: -rdynamic -lOpenCL
cl-mem 2017-01-13, Benchmark: Copy
  GB/s, More Is Better
  1: 283.0 (SE +/- 0.26, N = 3)
  1a: 282.6 (SE +/- 0.06, N = 3)
  NVIDIA RTX 4000: 282.1 (SE +/- 0.28, N = 3)
  1e: 282.1 (SE +/- 0.12, N = 3)
  1b: 282.1 (SE +/- 0.12, N = 3)
  1d: 281.1 (SE +/- 0.20, N = 3)
  RTX 4000: 278.8 (SE +/- 0.12, N = 3)
  NVIDIA Quadro RTX 4000: 278.7 (SE +/- 0.20, N = 3)
  (CC) gcc options: -O2 -flto -lOpenCL
cl-mem 2017-01-13, Benchmark: Read
  GB/s, More Is Better
  1: 379.7 (SE +/- 0.03, N = 3)
  NVIDIA RTX 4000: 379.3 (SE +/- 0.03, N = 3)
  RTX 4000: 379.3 (SE +/- 0.00, N = 3)
  NVIDIA Quadro RTX 4000: 379.3 (SE +/- 0.00, N = 3)
  1e: 379.3 (SE +/- 0.03, N = 3)
  1d: 379.3 (SE +/- 0.00, N = 3)
  1b: 379.2 (SE +/- 0.03, N = 3)
  1a: 379.2 (SE +/- 0.06, N = 3)
  (CC) gcc options: -O2 -flto -lOpenCL
cl-mem 2017-01-13, Benchmark: Write
  GB/s, More Is Better
  1: 325.5 (SE +/- 1.79, N = 3)
  NVIDIA RTX 4000: 323.0 (SE +/- 1.47, N = 3)
  1e: 322.5 (SE +/- 0.58, N = 3)
  1d: 321.1 (SE +/- 0.96, N = 3)
  1b: 321.1 (SE +/- 0.78, N = 3)
  1a: 320.4 (SE +/- 1.48, N = 3)
  RTX 4000: 319.4 (SE +/- 1.44, N = 3)
  NVIDIA Quadro RTX 4000: 318.4 (SE +/- 2.17, N = 3)
  (CC) gcc options: -O2 -flto -lOpenCL
RedShift Demo 3.0
  Seconds, Fewer Is Better
  NVIDIA RTX 4000: 379 (SE +/- 2.33, N = 3)
  1e: 381 (SE +/- 2.31, N = 3)
  1d: 382 (SE +/- 2.60, N = 3)
  NVIDIA Quadro RTX 4000: 391 (SE +/- 4.63, N = 3)
  RTX 4000: 393 (SE +/- 4.91, N = 3)
LuxCoreRender OpenCL 2.3, Scene: DLSC
  M samples/sec, More Is Better
  1e: 4.09 (SE +/- 0.01, N = 3; MIN: 3.82 / MAX: 4.25)
  RTX 4000: 4.02 (SE +/- 0.01, N = 3; MIN: 3.82 / MAX: 4.2)
  NVIDIA Quadro RTX 4000: 4.01 (SE +/- 0.02, N = 3; MIN: 3.83 / MAX: 4.21)
  1d: 3.99 (SE +/- 0.08, N = 12; MIN: 1.12 / MAX: 4.22)
LuxCoreRender OpenCL 2.3, Scene: Food
  M samples/sec, More Is Better
  1e: 1.57 (SE +/- 0.01, N = 3; MIN: 0.26 / MAX: 1.89)
  RTX 4000: 1.55 (SE +/- 0.01, N = 3; MIN: 0.26 / MAX: 1.86)
  NVIDIA Quadro RTX 4000: 1.55 (SE +/- 0.01, N = 3; MIN: 0.25 / MAX: 1.85)
  1d: 1.52 (SE +/- 0.04, N = 12; MIN: 0.14 / MAX: 1.88)
LuxCoreRender OpenCL 2.3, Scene: LuxCore Benchmark
  M samples/sec, More Is Better
  1e: 3.50 (SE +/- 0.02, N = 3; MIN: 0.27 / MAX: 4)
  RTX 4000: 3.47 (SE +/- 0.02, N = 3; MIN: 0.33 / MAX: 3.96)
  NVIDIA Quadro RTX 4000: 3.47 (SE +/- 0.01, N = 3; MIN: 0.27 / MAX: 3.96)
  1d: 3.40 (SE +/- 0.07, N = 12; MIN: 0.17 / MAX: 3.97)
LuxCoreRender OpenCL 2.3, Scene: Rainbow Colors and Prism
  M samples/sec, More Is Better
  NVIDIA Quadro RTX 4000: 10.79 (SE +/- 0.06, N = 3; MIN: 10.45 / MAX: 11.21)
  1e: 10.78 (SE +/- 0.02, N = 3; MIN: 10.09 / MAX: 11.23)
  RTX 4000: 10.73 (SE +/- 0.02, N = 3; MIN: 9.75 / MAX: 11.24)
  1d: 10.42 (SE +/- 0.34, N = 12; MIN: 3.45 / MAX: 11.19)
FAHBench 2.3.2
  Ns Per Day, More Is Better
  1e: 191.84 (SE +/- 0.32, N = 3)
  1d: 191.63 (SE +/- 0.40, N = 3)
  RTX 4000: 190.82 (SE +/- 0.37, N = 3)
  NVIDIA Quadro RTX 4000: 190.76 (SE +/- 0.27, N = 3)
ArrayFire 3.7, Test: Conjugate Gradient OpenCL
  ms, Fewer Is Better
  1e: 2.247 (SE +/- 0.008, N = 3)
  NVIDIA Quadro RTX 4000: 2.255 (SE +/- 0.012, N = 3)
  RTX 4000: 2.257 (SE +/- 0.007, N = 3)
  (CXX) g++ options: -rdynamic
NCNN 20200916, Target: Vulkan GPU - Model: squeezenet
  ms, Fewer Is Better
  1e: 3.77 (SE +/- 0.01, N = 3; MIN: 3.71 / MAX: 3.87)
  NVIDIA Quadro RTX 4000: 3.78 (SE +/- 0.01, N = 3; MIN: 3.72 / MAX: 3.84)
  RTX 4000: 3.80 (SE +/- 0.01, N = 3; MIN: 3.74 / MAX: 10.31)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916, Target: Vulkan GPU - Model: mobilenet
  ms, Fewer Is Better
  1e: 4.66 (SE +/- 0.01, N = 3; MIN: 4.6 / MAX: 4.86)
  NVIDIA Quadro RTX 4000: 4.67 (SE +/- 0.01, N = 3; MIN: 4.64 / MAX: 4.75)
  RTX 4000: 4.86 (SE +/- 0.17, N = 3; MIN: 4.64 / MAX: 71.23)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916, Target: Vulkan GPU-v2-v2 - Model: mobilenet-v2
  ms, Fewer Is Better
  1e: 1.48 (SE +/- 0.03, N = 3; MIN: 1.44 / MAX: 20.23)
  NVIDIA Quadro RTX 4000: 1.48 (SE +/- 0.00, N = 3; MIN: 1.46 / MAX: 1.5)
  RTX 4000: 1.48 (SE +/- 0.00, N = 3; MIN: 1.47 / MAX: 1.54)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916, Target: Vulkan GPU-v3-v3 - Model: mobilenet-v3
  ms, Fewer Is Better
  1e: 1.71 (SE +/- 0.00, N = 3; MIN: 1.7 / MAX: 1.75)
  NVIDIA Quadro RTX 4000: 1.74 (SE +/- 0.00, N = 3; MIN: 1.73 / MAX: 1.81)
  RTX 4000: 1.74 (SE +/- 0.00, N = 3; MIN: 1.73 / MAX: 1.8)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916, Target: Vulkan GPU - Model: shufflenet-v2
  ms, Fewer Is Better
  1e: 1.33 (SE +/- 0.00, N = 3; MIN: 1.32 / MAX: 1.4)
  NVIDIA Quadro RTX 4000: 1.35 (SE +/- 0.00, N = 3; MIN: 1.33 / MAX: 1.4)
  RTX 4000: 1.36 (SE +/- 0.00, N = 3; MIN: 1.34 / MAX: 1.41)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916, Target: Vulkan GPU - Model: mnasnet
  ms, Fewer Is Better
  1e: 1.52 (SE +/- 0.00, N = 3; MIN: 1.5 / MAX: 1.56)
  NVIDIA Quadro RTX 4000: 1.54 (SE +/- 0.00, N = 3; MIN: 1.53 / MAX: 1.63)
  RTX 4000: 1.54 (SE +/- 0.00, N = 3; MIN: 1.53 / MAX: 1.63)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916, Target: Vulkan GPU - Model: efficientnet-b0
  ms, Fewer Is Better
  1e: 2.73 (SE +/- 0.01, N = 3; MIN: 2.7 / MAX: 8.24)
  NVIDIA Quadro RTX 4000: 2.75 (SE +/- 0.00, N = 3; MIN: 2.74 / MAX: 3.38)
  RTX 4000: 2.76 (SE +/- 0.00, N = 3; MIN: 2.75 / MAX: 3.23)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916, Target: Vulkan GPU - Model: blazeface
  ms, Fewer Is Better
  1e: 0.63 (SE +/- 0.00, N = 3; MIN: 0.62 / MAX: 0.68)
  NVIDIA Quadro RTX 4000: 0.63 (SE +/- 0.01, N = 3; MIN: 0.62 / MAX: 0.65)
  RTX 4000: 0.64 (SE +/- 0.00, N = 3; MIN: 0.62 / MAX: 0.84)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916, Target: Vulkan GPU - Model: googlenet
  ms, Fewer Is Better
  1e: 3.32 (SE +/- 0.00, N = 3; MIN: 3.29 / MAX: 3.43)
  RTX 4000: 3.37 (SE +/- 0.00, N = 3; MIN: 3.35 / MAX: 3.44)
  NVIDIA Quadro RTX 4000: 3.40 (SE +/- 0.02, N = 3; MIN: 3.33 / MAX: 20.26)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916, Target: Vulkan GPU - Model: vgg16
  ms, Fewer Is Better
  1e: 8.79 (SE +/- 0.04, N = 3; MIN: 8.1 / MAX: 20.83)
  RTX 4000: 9.02 (SE +/- 0.04, N = 3; MIN: 8.35 / MAX: 20.34)
  NVIDIA Quadro RTX 4000: 9.14 (SE +/- 0.13, N = 3; MIN: 8.49 / MAX: 36.48)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916, Target: Vulkan GPU - Model: resnet18
  ms, Fewer Is Better
  RTX 4000: 1.77 (SE +/- 0.04, N = 3; MIN: 1.71 / MAX: 24.18)
  1e: 1.80 (SE +/- 0.05, N = 2; MIN: 1.69 / MAX: 21.82)
  NVIDIA Quadro RTX 4000: 1.95 (SE +/- 0.13, N = 3; MIN: 1.7 / MAX: 20.49)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916, Target: Vulkan GPU - Model: alexnet
  ms, Fewer Is Better
  1e: 2.18 (SE +/- 0.02, N = 3; MIN: 1.91 / MAX: 11.43)
  NVIDIA Quadro RTX 4000: 2.21 (SE +/- 0.01, N = 3; MIN: 1.91 / MAX: 6.96)
  RTX 4000: 2.27 (SE +/- 0.03, N = 3; MIN: 2.15 / MAX: 23.82)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916, Target: Vulkan GPU - Model: resnet50
  ms, Fewer Is Better
  1e: 3.89 (SE +/- 0.01, N = 3; MIN: 3.86 / MAX: 3.99)
  NVIDIA Quadro RTX 4000: 3.93 (SE +/- 0.01, N = 3; MIN: 3.91 / MAX: 4.04)
  RTX 4000: 4.08 (SE +/- 0.14, N = 3; MIN: 3.92 / MAX: 40.55)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916, Target: Vulkan GPU - Model: yolov4-tiny
  ms, Fewer Is Better
  RTX 4000: 8.22 (SE +/- 0.00, N = 3; MIN: 8.15 / MAX: 8.57)
  NVIDIA Quadro RTX 4000: 8.33 (SE +/- 0.09, N = 3; MIN: 8.13 / MAX: 55.28)
  1e: 8.62 (SE +/- 0.37, N = 3; MIN: 8.1 / MAX: 74.77)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
PlaidML, FP16: No - Mode: Inference - Network: IMDB LSTM - Device: OpenCL
  FPS, More Is Better
  1e: 423.50 (SE +/- 0.45, N = 3)
  NVIDIA Quadro RTX 4000: 421.03 (SE +/- 1.26, N = 3)
  RTX 4000: 420.80 (SE +/- 0.32, N = 3)
PlaidML, FP16: No - Mode: Inference - Network: Mobilenet - Device: OpenCL
  FPS, More Is Better
  1e: 1490.38 (SE +/- 5.16, N = 3)
  RTX 4000: 1480.20 (SE +/- 5.32, N = 3)
  NVIDIA Quadro RTX 4000: 1475.98 (SE +/- 5.69, N = 3)
PlaidML, FP16: Yes - Mode: Inference - Network: Mobilenet - Device: OpenCL
  FPS, More Is Better
  1e: 1843.84 (SE +/- 9.92, N = 3)
  NVIDIA Quadro RTX 4000: 1834.20 (SE +/- 2.10, N = 3)
  RTX 4000: 1829.24 (SE +/- 8.76, N = 3)
PlaidML, FP16: No - Mode: Inference - Network: DenseNet 201 - Device: OpenCL
  FPS, More Is Better
  1e: 140.54 (SE +/- 0.18, N = 3)
  RTX 4000: 139.15 (SE +/- 0.12, N = 3)
  NVIDIA Quadro RTX 4000: 138.91 (SE +/- 0.11, N = 3)
Blender 2.90, Blend File: BMW27 - Compute: CUDA
  Seconds, Fewer Is Better
  1e: 57.80 (SE +/- 0.09, N = 3)
  NVIDIA Quadro RTX 4000: 58.10 (SE +/- 0.14, N = 3)
  RTX 4000: 58.41 (SE +/- 0.10, N = 3)
Blender 2.90, Blend File: Classroom - Compute: CUDA
  Seconds, Fewer Is Better
  1e: 218.66 (SE +/- 1.49, N = 3)
  RTX 4000: 223.81 (SE +/- 3.26, N = 3)
  NVIDIA Quadro RTX 4000: 224.85 (SE +/- 3.62, N = 3)
Blender 2.90, Blend File: Fishy Cat - Compute: CUDA
  Seconds, Fewer Is Better
  1e: 112.74 (SE +/- 0.28, N = 3)
  RTX 4000: 113.88 (SE +/- 0.21, N = 3)
  NVIDIA Quadro RTX 4000: 114.00 (SE +/- 0.16, N = 3)
Blender 2.90, Blend File: Barbershop - Compute: CUDA
  Seconds, Fewer Is Better
  1e: 756.64 (SE +/- 2.80, N = 3)
  RTX 4000: 764.30 (SE +/- 0.87, N = 3)
  NVIDIA Quadro RTX 4000: 771.19 (SE +/- 1.15, N = 3)
Blender 2.90, Blend File: BMW27 - Compute: NVIDIA OptiX
  Seconds, Fewer Is Better
  RTX 4000: 29.14 (SE +/- 0.06, N = 3)
  NVIDIA Quadro RTX 4000: 29.23 (SE +/- 0.09, N = 3)
  1e: 32.14 (SE +/- 3.24, N = 15)
Blender 2.90, Blend File: Classroom - Compute: NVIDIA OptiX
  Seconds, Fewer Is Better
  1e: 115.78 (SE +/- 0.87, N = 3)
  RTX 4000: 117.20 (SE +/- 0.57, N = 3)
  NVIDIA Quadro RTX 4000: 118.39 (SE +/- 0.55, N = 3)
Blender 2.90, Blend File: Fishy Cat - Compute: NVIDIA OptiX
  Seconds, Fewer Is Better
  1e: 58.38 (SE +/- 0.21, N = 3)
  RTX 4000: 58.69 (SE +/- 0.22, N = 3)
  NVIDIA Quadro RTX 4000: 59.12 (SE +/- 0.24, N = 3)
Blender 2.90, Blend File: Barbershop - Compute: NVIDIA OptiX
  Seconds, Fewer Is Better
  1e: 1307.24 (SE +/- 4.68, N = 3)
  RTX 4000: 1307.37 (SE +/- 0.53, N = 3)
  NVIDIA Quadro RTX 4000: 1321.89 (SE +/- 1.05, N = 3)
Blender 2.90, Blend File: Pabellon Barcelona - Compute: CUDA
  Seconds, Fewer Is Better
  RTX 4000: 459.08 (SE +/- 0.47, N = 3)
  1e: 459.35 (SE +/- 1.17, N = 3)
  NVIDIA Quadro RTX 4000: 460.64 (SE +/- 1.47, N = 3)
Blender 2.90, Blend File: Pabellon Barcelona - Compute: NVIDIA OptiX
  Seconds, Fewer Is Better
  RTX 4000: 159.09 (SE +/- 0.48, N = 3)
  1e: 160.82 (SE +/- 0.15, N = 3)
  NVIDIA Quadro RTX 4000: 160.95 (SE +/- 0.11, N = 3)
MandelGPU 1.3pts1, OpenCL Device: GPU
  Samples/sec, More Is Better
  NVIDIA Quadro RTX 4000: 248177151.4 (SE +/- 308768.05, N = 3)
  1e: 248122412.9 (SE +/- 711502.39, N = 3)
  RTX 4000: 246857018.0 (SE +/- 540319.59, N = 3)
  (CC) gcc options: -O3 -lm -ftree-vectorize -funroll-loops -lglut -lOpenCL -lGL
clpeak, OpenCL Test: Integer Compute INT
  GIOPS, More Is Better
  RTX 4000: 6013.59 (SE +/- 102.19, N = 3)
  NVIDIA Quadro RTX 4000: 5741.92 (SE +/- 46.13, N = 3)
  1e: 5712.25 (SE +/- 68.26, N = 12)
  (CXX) g++ options: -O3 -rdynamic -lOpenCL
clpeak, OpenCL Test: Single-Precision Float
  GFLOPS, More Is Better
  RTX 4000: 6536.45 (SE +/- 97.61, N = 3)
  1e: 6033.10 (SE +/- 35.64, N = 3)
  NVIDIA Quadro RTX 4000: 6004.48 (SE +/- 55.06, N = 3)
  (CXX) g++ options: -O3 -rdynamic -lOpenCL
clpeak, OpenCL Test: Double-Precision Double
  GFLOPS, More Is Better
  1e: 259.66 (SE +/- 0.31, N = 3)
  NVIDIA Quadro RTX 4000: 259.50 (SE +/- 0.18, N = 3)
  RTX 4000: 259.33 (SE +/- 0.02, N = 3)
  (CXX) g++ options: -O3 -rdynamic -lOpenCL
clpeak, OpenCL Test: Global Memory Bandwidth
  GBPS, More Is Better
  1e: 346.09 (SE +/- 4.72, N = 3)
  RTX 4000: 342.37 (SE +/- 5.13, N = 3)
  NVIDIA Quadro RTX 4000: 340.91 (SE +/- 4.44, N = 3)
  (CXX) g++ options: -O3 -rdynamic -lOpenCL
NeatBench 5, Acceleration: GPU
  FPS, More Is Better
  1e: 31.0 (SE +/- 0.66, N = 15)
  NVIDIA Quadro RTX 4000: 30.9 (SE +/- 0.69, N = 15)
  RTX 4000: 30.2 (SE +/- 0.63, N = 15)
Phoronix Test Suite v10.8.4