CompuLab Airtop 3 Intel Xeon E-2288G testing with a Compulab SBC-ATCFL v1.2 (ATOP3.PRD.0.29.2 BIOS) and NVIDIA Quadro RTX 4000 8GB on Ubuntu 20.10 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2011040-FI-COMPULABA81&grr&rdt .
CompuLab Airtop 3 (results 1, 2, 3)
Processor: Intel Xeon E-2288G @ 5.00GHz (8 Cores / 16 Threads)
Motherboard: Compulab SBC-ATCFL v1.2 (ATOP3.PRD.0.29.2 BIOS)
Chipset: Intel Cannon Lake PCH
Memory: 64GB
Disk: Samsung SSD 970 EVO Plus 250GB
Graphics: NVIDIA Quadro RTX 4000 8GB (1005/6500MHz)
Audio: Intel Cannon Lake PCH cAVS
Monitor: VE228
Network: Intel I219-LM + Intel I210
OS: Ubuntu 20.10
Kernel: 5.8.0-26-generic (x86_64)
Desktop: GNOME Shell 3.38.1
Display Server: X Server 1.20.9
Display Driver: NVIDIA 455.28
OpenGL: 4.6.0
OpenCL: OpenCL 1.2 CUDA 11.1.96
Vulkan: 1.2.142
Compiler: GCC 10.2.0
File-System: ext4
Screen Resolution: 1920x1080
Graphics (second entry): NVIDIA Quadro RTX 4000 8GB (300/405MHz)
OpenBenchmarking.org
Compiler Details: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-10-JvwpWM/gcc-10-10.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-10-JvwpWM/gcc-10-10.2.0/debian/tmp-gcn/usr,hsa --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Details: Scaling Governor: intel_pstate powersave - CPU Microcode: 0xd6 - Thermald 2.3
OpenCL Details: GPU Compute Cores: 2304
Java Details: OpenJDK Runtime Environment (build 11.0.9+11-Ubuntu-0ubuntu1)
Python Details: Python 3.8.6
Security Details:
  itlb_multihit: KVM: Mitigation of VMX disabled
  l1tf: Not affected
  mds: Not affected
  meltdown: Not affected
  spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp
  spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
  spectre_v2: Mitigation of Enhanced IBRS IBPB: conditional RSB filling
  srbds: Mitigation of TSX disabled
  tsx_async_abort: Mitigation of TSX disabled
CompuLab Airtop 3
Test | 1 | 2 | 3
lammps: 20k Atoms | 5.851 | 5.909 | 5.905
blender: Barbershop - NVIDIA OptiX | 1312.28 | 1264.86 | 1325.38
java-gradle-perf: Reactor | 197.762 | 199.454 | 197.054
blender: Barbershop - CPU-Only | 787.85 | 783.83 | 787.49
build-llvm: Time To Compile | 771.207 | 762.884 |
blender: Pabellon Barcelona - CPU-Only | 651.27 | 650.83 | 650.06
blender: Classroom - CPU-Only | 584.98 | 587.97 | 584.40
lczero: Eigen | 751 | 762 | 671
incompact3d: Cylinder | 378.478658 | 377.557241 | 605.889669
hint: FLOAT | 472824990.62325 | 474322870.49016 | 471834576.34719
lczero: BLAS | 844 | 837 | 809
rodinia: OpenMP HotSpot3D | 92.668 | 89.729 | 213.515
brl-cad: VGR Performance Metric | 99977 | 99832 | 99792
rodinia: OpenMP LavaMD | 276.215 | 273.155 | 296.269
astcenc: Exhaustive | 274.72 | 273.46 | 275.59
blender: Fishy Cat - CPU-Only | 271.16 | 269.67 | 270.29
rodinia: OpenMP CFD Solver | 22.649 | 22.740 | 265.624
gromacs: Water Benchmark | 0.820 | 0.815 | 0.821
caffe: GoogleNet - CPU - 200 | 207902 | 207889 | 208264
blender: BMW27 - CPU-Only | 186.64 | 186.19 | 186.87
rodinia: OpenMP Leukocyte | 140.313 | 138.389 | 214.308
wireguard: | 160.513 | 158.360 | 159.753
tensorflow-lite: Inception V4 | 3269353 | 3260710 | 3270433
tensorflow-lite: Inception ResNet V2 | 2954177 | 2946580 | 2957670
blender: Pabellon Barcelona - NVIDIA OptiX | 145.47 | 144.87 | 145.79
kvazaar: Bosphorus 4K - Slow | 4.15 | 4.17 | 4.15
kvazaar: Bosphorus 4K - Medium | 4.21 | 4.23 | 4.20
dav1d: Chimera 1080p 10-bit | 114.55 | 116.36 | 113.41
byte: Dhrystone 2 | 48134384.7 | 48808319.1 | 48679363.3
rodinia: OpenMP Streamcluster | 19.563 | 19.631 | 69.374
blender: Classroom - NVIDIA OptiX | 104.11 | 103.42 | 104.69
caffe: GoogleNet - CPU - 100 | 103786 | 103349 | 103903
namd: ATPase Simulation - 327,506 Atoms | 1.94453 | 1.93847 | 2.62968
hmmer: Pfam Database Search | 100.123 | 100.236 | 100.832
avifenc: 0 | 99.669 | 99.273 | 99.341
build-linux-kernel: Time To Compile | 98.182 | 98.147 |
pgbench: 1 - 250 - Read Write - Average Latency | 419.725 | 424.257 | 449.114
pgbench: 1 - 250 - Read Write | 599 | 591 | 557
caffe: AlexNet - CPU - 200 | 80273 | 79922 | 80507
pgbench: 1 - 1 - Read Write - Average Latency | 2.084 | 2.412 | 2.312
pgbench: 1 - 1 - Read Write | 480 | 429 | 442
influxdb: 4 - 10000 - 2,5000,1 - 10000 | 1507808.8 | 1526391.3 | 1508093.5
influxdb: 64 - 10000 - 2,5000,1 - 10000 | 1534293.6 | 1543312.5 | 1530241.0
influxdb: 1024 - 10000 - 2,5000,1 - 10000 | 1535930.8 | 1557794.2 | 1539617.5
onednn: Deconvolution Batch deconv_1d - f32 - CPU | 4.89651 | 4.67392 | 4.70756
keydb: | 737463.31 | 736916.33 | 734176.59
onednn: IP Batch 1D - f32 - CPU | 4.49627 | 4.13153 | 4.13129
pyperformance: raytrace | 374 | 375 | 377
ncnn: CPU - yolov4-tiny | 28.70 | 29.58 | 29.07
ncnn: CPU - resnet50 | 28.80 | 28.91 | 29.20
ncnn: CPU - alexnet | 14.38 | 14.33 | 14.72
ncnn: CPU - resnet18 | 15.20 | 15.49 | 15.61
ncnn: CPU - vgg16 | 67.22 | 66.60 | 67.82
ncnn: CPU - googlenet | 14.99 | 15.09 | 15.38
ncnn: CPU - blazeface | 1.46 | 1.45 | 1.50
ncnn: CPU - efficientnet-b0 | 6.69 | 6.68 | 6.85
ncnn: CPU - mnasnet | 3.92 | 3.91 | 4.17
ncnn: CPU - shufflenet-v2 | 3.21 | 3.17 | 3.26
ncnn: CPU-v3-v3 - mobilenet-v3 | 4.06 | 4.07 | 4.20
ncnn: CPU-v2-v2 - mobilenet-v2 | 5.11 | 5.13 | 5.31
ncnn: CPU - mobilenet | 19.30 | 19.22 | 19.23
ncnn: CPU - squeezenet | 16.10 | 16.10 | 16.38
rawtherapee: Total Benchmark Time | 61.749 | 61.854 | 62.124
tensorflow-lite: SqueezeNet | 227852 | 226239 | 227599
tensorflow-lite: NASNet Mobile | 201015 | 199504 | 201265
tensorflow-lite: Mobilenet Quant | 157250 | 156299 | 157719
tensorflow-lite: Mobilenet Float | 153919 | 153221 | 154362
avifenc: 2 | 58.664 | 58.399 | 58.664
onednn: IP Batch All - u8s8f32 - CPU | 27.6364 | 27.3707 | 27.3473
onednn: IP Batch All - f32 - CPU | 68.0525 | 68.9994 | 68.5948
sunflow: Global Illumination + Image Synthesis | 1.138 | 1.132 | 1.127
pyperformance: python_startup | 6.80 | 6.74 | 6.75
blender: Fishy Cat - NVIDIA OptiX | 54.93 | 54.71 | 55.23
rodinia: OpenCL Myocyte | 34.786 | 34.514 | 35.760
kvazaar: Bosphorus 4K - Very Fast | 11.95 | 12.05 | 11.91
astcenc: Thorough | 33.33 | 33.24 | 33.34
x265: Bosphorus 4K | 12.37 | 12.39 | 12.34
pyperformance: 2to3 | 264 | 261 | 261
onednn: IP Batch 1D - u8s8f32 - CPU | 1.80103 | 1.80571 | 1.86706
caffe: AlexNet - CPU - 100 | 40095 | 39759 | 40315
pyperformance: go | 200 | 199 | 200
onednn: Recurrent Neural Network Training - f32 - CPU | 321.803 | 326.960 | 324.335
pgbench: 1 - 250 - Read Only - Average Latency | 0.982 | 0.995 | 1.004
pgbench: 1 - 250 - Read Only | 254678 | 251429 | 249348
onednn: Recurrent Neural Network Inference - f32 - CPU | 149.618 | 139.871 | 136.151
libraw: Post-Processing Benchmark | 33.31 | 33.66 | 33.87
webp: Quality 100, Lossless, Highest Compression | 34.700 | 34.844 | 34.612
kvazaar: Bosphorus 1080p - Slow | 18.24 | 18.58 | 18.11
kvazaar: Bosphorus 1080p - Medium | 18.69 | 18.92 | 18.69
pyperformance: float | 90.0 | 90.0 | 90.1
aom-av1: Speed 0 Two-Pass | 0.33 | 0.33 | 0.33
pyperformance: django_template | 38.6 | 38.9 | 39.2
pyperformance: chaos | 85.7 | 85.8 | 86.1
pyperformance: crypto_pyaes | 85.8 | 85.8 | 85.9
dacapobench: H2 | 2864 | 2925 | 2775
blender: BMW27 - NVIDIA OptiX | 27.90 | 27.76 | 27.86
pyperformance: regex_compile | 138 | 137 | 139
kvazaar: Bosphorus 4K - Ultra Fast | 21.87 | 22.45 | 21.92
aom-av1: Speed 6 Realtime | 23.54 | 23.44 | 23.36
pgbench: 1 - 100 - Read Write - Average Latency | 176.126 | 173.120 | 173.239
pgbench: 1 - 100 - Read Write | 568 | 578 | 578
pgbench: 1 - 50 - Read Write - Average Latency | 81.865 | 83.886 | 82.733
pgbench: 1 - 50 - Read Write | 611 | 596 | 604
pgbench: 1 - 1 - Read Only - Average Latency | 0.03 | 0.030 | 0.031
pgbench: 1 - 1 - Read Only | 33414 | 33267 | 32579
pgbench: 1 - 100 - Read Only - Average Latency | 0.365 | 0.356 | 0.360
pgbench: 1 - 100 - Read Only | 274463 | 281291 | 278019
pgbench: 1 - 50 - Read Only - Average Latency | 0.173 | 0.169 | 0.171
pgbench: 1 - 50 - Read Only | 289401 | 296088 | 291995
aom-av1: Speed 6 Two-Pass | 4.30 | 4.29 | 4.27
dav1d: Summer Nature 4K | 156.40 | 160.47 | 156.84
pyperformance: pathlib | 14.4 | 14.3 | 14.5
ocrmypdf: Processing 60 Page PDF Document | 22.024 | 21.067 | 22.094
rnnoise: | 21.587 | 21.593 | 21.597
onednn: Deconvolution Batch deconv_1d - u8s8f32 - CPU | 5.70594 | 5.59357 | 5.70690
tesseract-ocr: Time To OCR 7 Images | 20.134 | 20.315 | 20.224
openssl: RSA 4096-bit Performance | 2431.8 | 2467.9 | 2474.6
tnn: CPU - MobileNet v2 | 286.035 | 287.559 | 284.025
pyperformance: pickle_pure_python | 338 | 339 | 339
pyperformance: json_loads | 21.3 | 21.2 | 21.2
neatbench: CPU | 10.9 | 10.8 | 10.9
tnn: CPU - SqueezeNet v1.1 | 269.712 | 269.656 | 269.435
pyperformance: nbody | 103 | 103 | 103
dav1d: Chimera 1080p | 639.97 | 647.74 | 641.98
aom-av1: Speed 4 Two-Pass | 2.72 | 2.72 | 2.72
ncnn: Vulkan GPU - yolov4-tiny | 8.67 | 8.32 | 8.70
ncnn: Vulkan GPU - resnet50 | 3.77 | 3.74 | 3.76
ncnn: Vulkan GPU - alexnet | 2.10 | 2.08 | 2.09
ncnn: Vulkan GPU - resnet18 | 1.68 | 1.72 | 1.66
ncnn: Vulkan GPU - vgg16 | 8.36 | 8.32 | 8.42
ncnn: Vulkan GPU - googlenet | 3.53 | 3.22 | 3.23
ncnn: Vulkan GPU - blazeface | 0.61 | 0.62 | 0.63
ncnn: Vulkan GPU - efficientnet-b0 | 2.67 | 2.65 | 2.65
ncnn: Vulkan GPU - mnasnet | 1.48 | 1.48 | 1.50
ncnn: Vulkan GPU - shufflenet-v2 | 1.30 | 1.31 | 1.31
ncnn: Vulkan GPU-v3-v3 - mobilenet-v3 | 1.67 | 1.68 | 1.68
ncnn: Vulkan GPU-v2-v2 - mobilenet-v2 | 1.45 | 1.42 | 1.46
ncnn: Vulkan GPU - mobilenet | 4.63 | 4.58 | 4.77
ncnn: Vulkan GPU - squeezenet | 3.66 | 3.65 | 3.66
dolfyn: Computational Fluid Dynamics | 16.833 | 16.711 | 17.027
webp: Quality 100, Lossless | 15.613 | 15.603 | 15.694
kvazaar: Bosphorus 1080p - Very Fast | 46.73 | 48.24 | 46.79
aom-av1: Speed 8 Realtime | 45.92 | 46.26 | 45.78
dacapobench: Tradesoap | 3652 | 3550 | 3704
onednn: Matrix Multiply Batch Shapes Transformer - f32 - CPU | 3.99911 | 4.06970 | 3.87614
onednn: Matrix Multiply Batch Shapes Transformer - u8s8f32 - CPU | 2.58081 | 2.57926 | 2.58232
lammps: Rhodopsin Protein | 6.600 | 6.805 | 6.322
kvazaar: Bosphorus 1080p - Ultra Fast | 87.16 | 92.53 | 87.18
x265: Bosphorus 1080p | 54.49 | 56.14 | 54.42
dacapobench: Tradebeans | 2690 | 2645 | 2801
astcenc: Medium | 8.48 | 8.44 | 8.55
onednn: Deconvolution Batch deconv_3d - u8s8f32 - CPU | 3.58701 | 3.24189 | 3.30402
onednn: Convolution Batch Shapes Auto - f32 - CPU | 17.1183 | 17.1397 | 17.1967
webp: Quality 100, Highest Compression | 6.385 | 6.386 | 6.382
dacapobench: Jython | 3709 | 3698 | 3718
dav1d: Summer Nature 1080p | 581.34 | 582.13 | 579.67
onednn: Convolution Batch Shapes Auto - u8s8f32 - CPU | 16.4418 | 16.7441 | 16.6046
astcenc: Fast | 5.53 | 5.52 | 5.55
avifenc: 8 | 5.022 | 4.995 | 5.017
avifenc: 10 | 4.757 | 4.738 | 4.733
onednn: Deconvolution Batch deconv_3d - f32 - CPU | 6.64261 | 6.47349 | 6.61172
neatbench: GPU | 29.4 | 30.5 | 30.6
ffte: N=256, 3D Complex FFT Routine | 31760.023038312 | 31509.634098231 | 19217.721194045
webp: Quality 100 | 2.077 | 2.076 | 2.078
webp: Default | 1.313 | 1.315 | 1.318
LAMMPS Molecular Dynamics Simulator 29Oct2020 - Model: 20k Atoms (ns/day, More Is Better)
  1: 5.851 (SE +/- 0.013, N = 3)
  2: 5.909 (SE +/- 0.005, N = 3)
  3: 5.905 (SE +/- 0.046, N = 3)
  (CXX) g++ options: -O3 -pthread -lm
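Every result in this export carries an "SE +/- x, N = n" annotation. As a rough illustration of what such a figure conventionally means, the sketch below computes the standard error of the mean from a list of per-trial samples. The sample values are hypothetical (the export lists only averages and SE, not individual trials), and it is an assumption that the Phoronix Test Suite uses exactly this sample-standard-deviation formula.

```python
import math
import statistics


def standard_error(samples):
    """Standard error of the mean: sample standard deviation / sqrt(N)."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    return statistics.stdev(samples) / math.sqrt(len(samples))


# Hypothetical ns/day samples, for illustration only; not taken from the export.
hypothetical_samples = [5.83, 5.85, 5.87]
mean = statistics.mean(hypothetical_samples)
se = standard_error(hypothetical_samples)
print(f"{mean:.3f} (SE +/- {se:.3f}, N = {len(hypothetical_samples)})")
```

A larger N with a larger SE, as seen on some third-run results in this export, typically indicates noisier trials rather than a more precise average.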
Blender 2.90 - Blend File: Barbershop - Compute: NVIDIA OptiX (Seconds, Fewer Is Better)
  1: 1312.28 (SE +/- 19.68, N = 4)
  2: 1264.86 (SE +/- 20.32, N = 3)
  3: 1325.38 (SE +/- 22.81, N = 3)
Java Gradle Build - Gradle Build: Reactor (Seconds, Fewer Is Better)
  1: 197.76 (SE +/- 2.21, N = 12)
  2: 199.45 (SE +/- 2.22, N = 12)
  3: 197.05 (SE +/- 2.36, N = 12)
Blender 2.90 - Blend File: Barbershop - Compute: CPU-Only (Seconds, Fewer Is Better)
  1: 787.85 (SE +/- 1.21, N = 3)
  2: 783.83 (SE +/- 0.37, N = 3)
  3: 787.49 (SE +/- 0.65, N = 3)
Timed LLVM Compilation 10.0 - Time To Compile (Seconds, Fewer Is Better)
  1: 771.21 (SE +/- 2.53, N = 3)
  2: 762.88 (SE +/- 1.50, N = 3)
Blender 2.90 - Blend File: Pabellon Barcelona - Compute: CPU-Only (Seconds, Fewer Is Better)
  1: 651.27 (SE +/- 1.32, N = 3)
  2: 650.83 (SE +/- 1.12, N = 3)
  3: 650.06 (SE +/- 0.68, N = 3)
Blender 2.90 - Blend File: Classroom - Compute: CPU-Only (Seconds, Fewer Is Better)
  1: 584.98 (SE +/- 0.63, N = 3)
  2: 587.97 (SE +/- 1.27, N = 3)
  3: 584.40 (SE +/- 0.47, N = 3)
LeelaChessZero 0.26 - Backend: Eigen (Nodes Per Second, More Is Better)
  1: 751 (SE +/- 8.19, N = 3)
  2: 762 (SE +/- 4.63, N = 3)
  3: 671 (SE +/- 13.55, N = 9)
  (CXX) g++ options: -flto -pthread
Incompact3D 2020-09-17 - Input: Cylinder (Seconds, Fewer Is Better)
  1: 378.48 (SE +/- 1.16, N = 3)
  2: 377.56 (SE +/- 1.24, N = 3)
  3: 605.89 (SE +/- 3.73, N = 3)
  (F9X) gfortran options: -cpp -funroll-loops -floop-optimize -fcray-pointer -fbacktrace -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -ldl -levent -levent_pthreads -lutil -lm -lrt -lz
Hierarchical INTegration 1.0 - Test: FLOAT (QUIPs, More Is Better)
  1: 472824990.62 (SE +/- 493184.05, N = 3)
  2: 474322870.49 (SE +/- 93008.17, N = 3)
  3: 471834576.35 (SE +/- 1375357.19, N = 3)
  (CC) gcc options: -O3 -march=native -lm
LeelaChessZero 0.26 - Backend: BLAS (Nodes Per Second, More Is Better)
  1: 844
  2: 837
  3: 809
  SE +/- 10.82, N = 3 (listed once; the result it applies to is not identified)
  (CXX) g++ options: -flto -pthread
Rodinia 3.1 - Test: OpenMP HotSpot3D (Seconds, Fewer Is Better)
  1: 92.67 (SE +/- 1.38, N = 3)
  2: 89.73 (SE +/- 1.37, N = 3)
  3: 213.52 (SE +/- 13.20, N = 12)
  (CXX) g++ options: -O2 -lOpenCL
BRL-CAD 7.30.8 - VGR Performance Metric (VGR Performance Metric, More Is Better)
  1: 99977
  2: 99832
  3: 99792
  (CXX) g++ options: -std=c++11 -pipe -fno-strict-aliasing -fno-common -fexceptions -ftemplate-depth-128 -m64 -ggdb3 -O3 -fipa-pta -fstrength-reduce -finline-functions -flto -pedantic -rdynamic -lSM -lICE -lGLU -lGL -lGLdispatch -lX11 -lXext -lpthread -ldl -luuid -lm
Rodinia 3.1 - Test: OpenMP LavaMD (Seconds, Fewer Is Better)
  1: 276.22 (SE +/- 1.21, N = 3)
  2: 273.16 (SE +/- 0.19, N = 3)
  3: 296.27 (SE +/- 3.61, N = 3)
  (CXX) g++ options: -O2 -lOpenCL
ASTC Encoder 2.0 - Preset: Exhaustive (Seconds, Fewer Is Better)
  1: 274.72 (SE +/- 0.80, N = 3)
  2: 273.46 (SE +/- 0.56, N = 3)
  3: 275.59 (SE +/- 0.84, N = 3)
  (CXX) g++ options: -std=c++14 -fvisibility=hidden -O3 -flto -mfpmath=sse -mavx2 -mpopcnt -lpthread
Blender 2.90 - Blend File: Fishy Cat - Compute: CPU-Only (Seconds, Fewer Is Better)
  1: 271.16 (SE +/- 0.29, N = 3)
  2: 269.67 (SE +/- 0.26, N = 3)
  3: 270.29 (SE +/- 0.29, N = 3)
Rodinia 3.1 - Test: OpenMP CFD Solver (Seconds, Fewer Is Better)
  1: 22.65 (SE +/- 0.23, N = 3)
  2: 22.74 (SE +/- 0.15, N = 3)
  3: 265.62 (SE +/- 37.41, N = 7)
  (CXX) g++ options: -O2 -lOpenCL
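The third result for this test is far slower than the first two (265.62 s against 22.65 s and 22.74 s) and has a much larger standard error. A minimal sketch of how such a result might be flagged is to compare each value to the median of the three. The 10% threshold and the function name are illustrative assumptions, not part of the Phoronix Test Suite.

```python
import statistics


def flag_outlier_runs(results, threshold=0.10):
    """Return {run: relative deviation from the median} for runs beyond threshold."""
    median = statistics.median(results.values())
    flagged = {}
    for run, value in results.items():
        deviation = (value - median) / median
        if abs(deviation) > threshold:
            flagged[run] = deviation
    return flagged


# Rodinia 3.1, OpenMP CFD Solver, seconds (values from the result above).
cfd_solver = {"1": 22.65, "2": 22.74, "3": 265.62}
for run, deviation in flag_outlier_runs(cfd_solver).items():
    print(f"run {run}: {deviation:+.1%} relative to the median")
```

Using the median rather than the mean keeps a single extreme result from shifting the reference point it is being compared against.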
GROMACS 2020.3 - Water Benchmark (Ns Per Day, More Is Better)
  1: 0.820 (SE +/- 0.001, N = 3)
  2: 0.815 (SE +/- 0.003, N = 3)
  3: 0.821 (SE +/- 0.001, N = 3)
  (CXX) g++ options: -O3 -pthread -lrt -lpthread -lm
Caffe 2020-02-13 - Model: GoogleNet - Acceleration: CPU - Iterations: 200 (Milli-Seconds, Fewer Is Better)
  1: 207902 (SE +/- 124.50, N = 3)
  2: 207889 (SE +/- 420.35, N = 3)
  3: 208264 (SE +/- 147.41, N = 3)
  (CXX) g++ options: -fPIC -O3 -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas
Blender 2.90 - Blend File: BMW27 - Compute: CPU-Only (Seconds, Fewer Is Better)
  1: 186.64 (SE +/- 0.68, N = 3)
  2: 186.19 (SE +/- 1.07, N = 3)
  3: 186.87 (SE +/- 1.31, N = 3)
Rodinia 3.1 - Test: OpenMP Leukocyte (Seconds, Fewer Is Better)
  1: 140.31 (SE +/- 1.02, N = 3)
  2: 138.39 (SE +/- 0.74, N = 3)
  3: 214.31 (SE +/- 0.84, N = 3)
  (CXX) g++ options: -O2 -lOpenCL
WireGuard + Linux Networking Stack Stress Test (Seconds, Fewer Is Better)
  1: 160.51 (SE +/- 0.83, N = 3)
  2: 158.36 (SE +/- 0.95, N = 3)
  3: 159.75 (SE +/- 0.79, N = 3)
TensorFlow Lite 2020-08-23 - Model: Inception V4 (Microseconds, Fewer Is Better)
  1: 3269353 (SE +/- 1176.00, N = 3)
  2: 3260710 (SE +/- 1202.26, N = 3)
  3: 3270433 (SE +/- 369.56, N = 3)
TensorFlow Lite 2020-08-23 - Model: Inception ResNet V2 (Microseconds, Fewer Is Better)
  1: 2954177 (SE +/- 93.33, N = 3)
  2: 2946580 (SE +/- 2042.95, N = 3)
  3: 2957670 (SE +/- 867.58, N = 3)
Blender 2.90 - Blend File: Pabellon Barcelona - Compute: NVIDIA OptiX (Seconds, Fewer Is Better)
  1: 145.47 (SE +/- 0.92, N = 3)
  2: 144.87 (SE +/- 0.89, N = 3)
  3: 145.79 (SE +/- 0.98, N = 3)
Kvazaar 2.0 - Video Input: Bosphorus 4K - Video Preset: Slow (Frames Per Second, More Is Better)
  1: 4.15 (SE +/- 0.02, N = 3)
  2: 4.17 (SE +/- 0.02, N = 3)
  3: 4.15 (SE +/- 0.02, N = 3)
  (CC) gcc options: -pthread -ftree-vectorize -fvisibility=hidden -O2 -lpthread -lm -lrt
Kvazaar 2.0 - Video Input: Bosphorus 4K - Video Preset: Medium (Frames Per Second, More Is Better)
  1: 4.21 (SE +/- 0.01, N = 3)
  2: 4.23 (SE +/- 0.01, N = 3)
  3: 4.20 (SE +/- 0.00, N = 3)
  (CC) gcc options: -pthread -ftree-vectorize -fvisibility=hidden -O2 -lpthread -lm -lrt
dav1d 0.7.0 - Video Input: Chimera 1080p 10-bit (FPS, More Is Better)
  1: 114.55 (SE +/- 0.12, N = 3; MIN: 72.75 / MAX: 275.84)
  2: 116.36 (SE +/- 0.22, N = 3; MIN: 73.44 / MAX: 275.45)
  3: 113.41 (SE +/- 0.06, N = 3; MIN: 72.47 / MAX: 269.27)
  (CC) gcc options: -pthread
BYTE Unix Benchmark 3.6 - Computational Test: Dhrystone 2 (LPS, More Is Better)
  1: 48134384.7 (SE +/- 543289.97, N = 3)
  2: 48808319.1 (SE +/- 22614.50, N = 3)
  3: 48679363.3 (SE +/- 80326.86, N = 3)
Rodinia 3.1 - Test: OpenMP Streamcluster (Seconds, Fewer Is Better)
  1: 19.56 (SE +/- 0.06, N = 3)
  2: 19.63 (SE +/- 0.08, N = 3)
  3: 69.37 (SE +/- 7.66, N = 12)
  (CXX) g++ options: -O2 -lOpenCL
Blender 2.90 - Blend File: Classroom - Compute: NVIDIA OptiX (Seconds, Fewer Is Better)
  1: 104.11 (SE +/- 0.79, N = 3)
  2: 103.42 (SE +/- 0.75, N = 3)
  3: 104.69 (SE +/- 0.86, N = 3)
Caffe 2020-02-13 - Model: GoogleNet - Acceleration: CPU - Iterations: 100 (Milli-Seconds, Fewer Is Better)
  1: 103786 (SE +/- 216.65, N = 3)
  2: 103349 (SE +/- 79.24, N = 3)
  3: 103903 (SE +/- 179.36, N = 3)
  (CXX) g++ options: -fPIC -O3 -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas
NAMD 2.14 - ATPase Simulation - 327,506 Atoms (days/ns, Fewer Is Better)
  1: 1.94453 (SE +/- 0.00650, N = 3)
  2: 1.93847 (SE +/- 0.00297, N = 3)
  3: 2.62968 (SE +/- 0.04166, N = 3)
Timed HMMer Search 3.3.1 - Pfam Database Search (Seconds, Fewer Is Better)
  1: 100.12 (SE +/- 0.05, N = 3)
  2: 100.24 (SE +/- 0.11, N = 3)
  3: 100.83 (SE +/- 0.30, N = 3)
  (CC) gcc options: -O3 -pthread -lhmmer -leasel -lm
libavif avifenc 0.7.3 - Encoder Speed: 0 (Seconds, Fewer Is Better)
  1: 99.67 (SE +/- 0.38, N = 3)
  2: 99.27 (SE +/- 0.39, N = 3)
  3: 99.34 (SE +/- 0.38, N = 3)
  (CXX) g++ options: -O3 -fPIC
Timed Linux Kernel Compilation 5.4 - Time To Compile (Seconds, Fewer Is Better)
  1: 98.18 (SE +/- 0.05, N = 3)
  2: 98.15 (SE +/- 0.19, N = 3)
PostgreSQL pgbench 13.0 - Scaling Factor: 1 - Clients: 250 - Mode: Read Write - Average Latency (ms, Fewer Is Better)
  1: 419.73 (SE +/- 7.83, N = 15)
  2: 424.26 (SE +/- 4.74, N = 15)
  3: 449.11 (SE +/- 4.44, N = 3)
  (CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm
PostgreSQL pgbench 13.0 - Scaling Factor: 1 - Clients: 250 - Mode: Read Write (TPS, More Is Better)
  1: 599 (SE +/- 11.83, N = 15)
  2: 591 (SE +/- 6.84, N = 15)
  3: 557 (SE +/- 5.55, N = 3)
  (CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm
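The average latency and TPS figures for the same pgbench configuration are two views of one measurement: with a fixed number of clients each issuing one transaction at a time, average latency is approximately clients / TPS. The sketch below checks this against the reported 250-client read-write numbers. It is only an approximation, since the reported values are averages over several trials, and the helper name is made up for illustration.

```python
def expected_latency_ms(clients, tps):
    """Approximate average latency in ms for `clients` concurrent sessions at `tps`."""
    return clients / tps * 1000.0


# Reported 250-client read-write results: (TPS, average latency in ms).
reported = {"1": (599, 419.73), "2": (591, 424.26), "3": (557, 449.11)}
for run, (tps, latency_ms) in reported.items():
    estimate = expected_latency_ms(250, tps)
    print(f"run {run}: estimated {estimate:.1f} ms, reported {latency_ms:.2f} ms")
```

Because the two metrics move together, a regression in one (such as the lower TPS of the third result) should always appear as a matching increase in the other.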
Caffe 2020-02-13 - Model: AlexNet - Acceleration: CPU - Iterations: 200 (Milli-Seconds, Fewer Is Better)
  1: 80273 (SE +/- 78.77, N = 3)
  2: 79922 (SE +/- 143.26, N = 3)
  3: 80507 (SE +/- 112.10, N = 3)
  (CXX) g++ options: -fPIC -O3 -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas
PostgreSQL pgbench 13.0 - Scaling Factor: 1 - Clients: 1 - Mode: Read Write - Average Latency (ms, Fewer Is Better)
  1: 2.084 (SE +/- 0.025, N = 3)
  2: 2.412 (SE +/- 0.138, N = 12)
  3: 2.312 (SE +/- 0.108, N = 12)
  (CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm
PostgreSQL pgbench 13.0 - Scaling Factor: 1 - Clients: 1 - Mode: Read Write (TPS, More Is Better)
  1: 480 (SE +/- 5.79, N = 3)
  2: 429 (SE +/- 22.30, N = 12)
  3: 442 (SE +/- 17.55, N = 12)
  (CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm
InfluxDB 1.8.2 - Concurrent Streams: 4 - Batch Size: 10000 - Tags: 2,5000,1 - Points Per Series: 10000 (val/sec, More Is Better)
  1: 1507808.8 (SE +/- 2205.46, N = 3)
  2: 1526391.3 (SE +/- 2194.00, N = 3)
  3: 1508093.5 (SE +/- 2673.51, N = 3)
InfluxDB 1.8.2 - Concurrent Streams: 64 - Batch Size: 10000 - Tags: 2,5000,1 - Points Per Series: 10000 (val/sec, More Is Better)
  1: 1534293.6 (SE +/- 5130.21, N = 3)
  2: 1543312.5 (SE +/- 4759.58, N = 3)
  3: 1530241.0 (SE +/- 4617.21, N = 3)
InfluxDB 1.8.2 - Concurrent Streams: 1024 - Batch Size: 10000 - Tags: 2,5000,1 - Points Per Series: 10000 (val/sec, More Is Better)
  1: 1535930.8 (SE +/- 4676.23, N = 3)
  2: 1557794.2 (SE +/- 6187.13, N = 3)
  3: 1539617.5 (SE +/- 1033.19, N = 3)
oneDNN 1.5 - Harness: Deconvolution Batch deconv_1d - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  1: 4.89651 (SE +/- 0.04290, N = 15; MIN: 3.92)
  2: 4.67392 (SE +/- 0.05834, N = 12; MIN: 3.78)
  3: 4.70756 (SE +/- 0.04582, N = 3; MIN: 4.21)
  (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
KeyDB 6.0.16 (Ops/sec, More Is Better)
  1: 737463.31 (SE +/- 1649.46, N = 3)
  2: 736916.33 (SE +/- 3580.61, N = 3)
  3: 734176.59 (SE +/- 673.14, N = 3)
  (CXX) g++ options: -O2 -levent_openssl -levent -lcrypto -lssl -lpthread -lz -lpcre
oneDNN 1.5 - Harness: IP Batch 1D - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  1: 4.49627 (SE +/- 0.04422, N = 15; MIN: 3.83)
  2: 4.13153 (SE +/- 0.03723, N = 13; MIN: 3.59)
  3: 4.13129 (SE +/- 0.04076, N = 12; MIN: 3.59)
  (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
PyPerformance 1.0.0 - Benchmark: raytrace (Milliseconds, Fewer Is Better)
  1: 374
  2: 375
  3: 377
NCNN 20200916 - Target: CPU - Model: yolov4-tiny (ms, Fewer Is Better)
  1: 28.70 (SE +/- 0.04, N = 3; MIN: 28.5 / MAX: 29.09)
  2: 29.58 (SE +/- 0.35, N = 3; MIN: 28.17 / MAX: 140.34)
  3: 29.07 (SE +/- 0.12, N = 3; MIN: 28.8 / MAX: 29.52)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916 - Target: CPU - Model: resnet50 (ms, Fewer Is Better)
  1: 28.80 (SE +/- 0.17, N = 3; MIN: 27.99 / MAX: 145.54)
  2: 28.91 (SE +/- 0.43, N = 3; MIN: 27.49 / MAX: 158.03)
  3: 29.20 (SE +/- 0.53, N = 3; MIN: 27.61 / MAX: 140.55)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916 - Target: CPU - Model: alexnet (ms, Fewer Is Better)
  1: 14.38 (SE +/- 0.11, N = 3; MIN: 14.11 / MAX: 17.2)
  2: 14.33 (SE +/- 0.01, N = 3; MIN: 14.26 / MAX: 14.55)
  3: 14.72 (SE +/- 0.16, N = 3; MIN: 14.22 / MAX: 122.36)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916 - Target: CPU - Model: resnet18 (ms, Fewer Is Better)
  1: 15.20 (SE +/- 0.20, N = 3; MIN: 14.69 / MAX: 15.68)
  2: 15.49 (SE +/- 0.04, N = 3; MIN: 14.84 / MAX: 15.88)
  3: 15.61 (SE +/- 0.25, N = 3; MIN: 15.01 / MAX: 51.06)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916 - Target: CPU - Model: vgg16 (ms, Fewer Is Better)
  1: 67.22 (SE +/- 0.58, N = 3; MIN: 65.96 / MAX: 186.53)
  2: 66.60 (SE +/- 0.00, N = 3; MIN: 66.46 / MAX: 67.69)
  3: 67.82 (SE +/- 0.37, N = 3; MIN: 66.24 / MAX: 187.16)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916 - Target: CPU - Model: googlenet (ms, Fewer Is Better)
  1: 14.99 (SE +/- 0.31, N = 3; MIN: 14.17 / MAX: 16.1)
  2: 15.09 (SE +/- 0.26, N = 3; MIN: 14.25 / MAX: 15.64)
  3: 15.38 (SE +/- 0.01, N = 3; MIN: 15.04 / MAX: 15.73)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916 - Target: CPU - Model: blazeface (ms, Fewer Is Better)
  1: 1.46 (SE +/- 0.03, N = 3; MIN: 1.38 / MAX: 1.57)
  2: 1.45 (SE +/- 0.04, N = 3; MIN: 1.34 / MAX: 1.66)
  3: 1.50 (SE +/- 0.00, N = 3; MIN: 1.42 / MAX: 1.63)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916 - Target: CPU - Model: efficientnet-b0 (ms, Fewer Is Better)
  1: 6.69 (SE +/- 0.14, N = 3; MIN: 6.36 / MAX: 7.32)
  2: 6.68 (SE +/- 0.15, N = 3; MIN: 6.34 / MAX: 7.11)
  3: 6.85 (SE +/- 0.01, N = 3; MIN: 6.72 / MAX: 7.36)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916 - Target: CPU - Model: mnasnet (ms, Fewer Is Better)
  1: 3.92 (SE +/- 0.01, N = 3; MIN: 3.87 / MAX: 4.28)
  2: 3.91 (SE +/- 0.01, N = 3; MIN: 3.86 / MAX: 4.25)
  3: 4.17 (SE +/- 0.13, N = 3; MIN: 3.88 / MAX: 4.56)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916 - Target: CPU - Model: shufflenet-v2 (ms, Fewer Is Better)
  1: 3.21 (SE +/- 0.04, N = 3; MIN: 2.98 / MAX: 3.47)
  2: 3.17 (SE +/- 0.03, N = 3; MIN: 2.99 / MAX: 3.47)
  3: 3.26 (SE +/- 0.06, N = 3; MIN: 2.99 / MAX: 3.84)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916 - Target: CPU-v3-v3 - Model: mobilenet-v3 (ms, Fewer Is Better)
  1: 4.06 (SE +/- 0.01, N = 3; MIN: 4.01 / MAX: 4.32)
  2: 4.07 (SE +/- 0.01, N = 3; MIN: 4.03 / MAX: 4.44)
  3: 4.20 (SE +/- 0.15, N = 3; MIN: 4.01 / MAX: 4.64)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916 - Target: CPU-v2-v2 - Model: mobilenet-v2 (ms, Fewer Is Better)
  1: 5.11 (SE +/- 0.01, N = 3; MIN: 5.01 / MAX: 5.37)
  2: 5.13 (SE +/- 0.00, N = 3; MIN: 5.01 / MAX: 5.35)
  3: 5.31 (SE +/- 0.14, N = 3; MIN: 5.02 / MAX: 8.3)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916 - Target: CPU - Model: mobilenet (ms, Fewer Is Better)
  1: 19.30 (SE +/- 0.05, N = 3; MIN: 18.89 / MAX: 19.7)
  2: 19.22 (SE +/- 0.04, N = 3; MIN: 18.82 / MAX: 26.38)
  3: 19.23 (SE +/- 0.09, N = 3; MIN: 18.97 / MAX: 19.68)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916 - Target: CPU - Model: squeezenet (ms, Fewer Is Better)
  1: 16.10 (SE +/- 0.03, N = 3; MIN: 15.74 / MAX: 17.79)
  2: 16.10 (SE +/- 0.03, N = 3; MIN: 15.94 / MAX: 17.11)
  3: 16.38 (SE +/- 0.16, N = 3; MIN: 15.97 / MAX: 24.3)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
RawTherapee - Total Benchmark Time (Seconds, Fewer Is Better)
  1: 61.75 (SE +/- 0.13, N = 3)
  2: 61.85 (SE +/- 0.28, N = 3)
  3: 62.12 (SE +/- 0.05, N = 3)
  RawTherapee, version 5.8, command line.
TensorFlow Lite 2020-08-23 - Model: SqueezeNet (Microseconds, Fewer Is Better)
  1: 227852 (SE +/- 345.32, N = 3)
  2: 226239 (SE +/- 496.71, N = 3)
  3: 227599 (SE +/- 627.42, N = 3)
TensorFlow Lite 2020-08-23 - Model: NASNet Mobile (Microseconds, Fewer Is Better)
  1: 201015 (SE +/- 363.56, N = 3)
  2: 199504 (SE +/- 120.29, N = 3)
  3: 201265 (SE +/- 1000.76, N = 3)
TensorFlow Lite 2020-08-23 - Model: Mobilenet Quant (Microseconds, Fewer Is Better)
  1: 157250 (SE +/- 112.51, N = 3)
  2: 156299 (SE +/- 261.09, N = 3)
  3: 157719 (SE +/- 197.44, N = 3)
TensorFlow Lite 2020-08-23 - Model: Mobilenet Float (Microseconds, Fewer Is Better)
  1: 153919 (SE +/- 387.33, N = 3)
  2: 153221 (SE +/- 330.84, N = 3)
  3: 154362 (SE +/- 188.88, N = 3)
libavif avifenc Encoder Speed: 2 OpenBenchmarking.org Seconds, Fewer Is Better libavif avifenc 0.7.3 Encoder Speed: 2 1 2 3 13 26 39 52 65 SE +/- 0.10, N = 3 SE +/- 0.27, N = 3 SE +/- 0.17, N = 3 58.66 58.40 58.66 1. (CXX) g++ options: -O3 -fPIC
oneDNN Harness: IP Batch All - Data Type: u8s8f32 - Engine: CPU OpenBenchmarking.org ms, Fewer Is Better oneDNN 1.5 Harness: IP Batch All - Data Type: u8s8f32 - Engine: CPU 1 2 3 7 14 21 28 35 SE +/- 0.07, N = 3 SE +/- 0.24, N = 3 SE +/- 0.04, N = 3 27.64 27.37 27.35 MIN: 25.19 MIN: 24.77 MIN: 25.36 1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 1.5 - Harness: IP Batch All - Data Type: f32 - Engine: CPU (ms, Fewer Is Better). Run 1: 68.05 (SE +/- 0.38, N = 3; MIN: 64.46); Run 2: 69.00 (SE +/- 0.04, N = 3; MIN: 63.06); Run 3: 68.59 (SE +/- 0.54, N = 3; MIN: 62.18). Build: (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
Sunflow Rendering System 0.07.2 - Global Illumination + Image Synthesis (Seconds, Fewer Is Better). Run 1: 1.138 (SE +/- 0.010, N = 15; MIN: 0.97 / MAX: 1.48); Run 2: 1.132 (SE +/- 0.011, N = 15; MIN: 0.94 / MAX: 1.51); Run 3: 1.127 (SE +/- 0.011, N = 15; MIN: 0.95 / MAX: 1.48).
PyPerformance 1.0.0 - Benchmark: python_startup (Milliseconds, Fewer Is Better). Run 1: 6.80 (SE +/- 0.04, N = 3); Run 2: 6.74 (SE +/- 0.00, N = 3); Run 3: 6.75 (SE +/- 0.00, N = 3).
Blender 2.90 - Blend File: Fishy Cat - Compute: NVIDIA OptiX (Seconds, Fewer Is Better). Run 1: 54.93 (SE +/- 0.14, N = 3); Run 2: 54.71 (SE +/- 0.17, N = 3); Run 3: 55.23 (SE +/- 0.12, N = 3).
Rodinia 3.1 - Test: OpenCL Myocyte (Seconds, Fewer Is Better). Run 1: 34.79 (SE +/- 0.13, N = 3); Run 2: 34.51 (SE +/- 0.05, N = 3); Run 3: 35.76 (SE +/- 0.36, N = 8). Build: (CXX) g++ options: -O2 -lOpenCL
Kvazaar 2.0 - Video Input: Bosphorus 4K - Video Preset: Very Fast (Frames Per Second, More Is Better). Run 1: 11.95 (SE +/- 0.03, N = 3); Run 2: 12.05 (SE +/- 0.04, N = 3); Run 3: 11.91 (SE +/- 0.04, N = 3). Build: (CC) gcc options: -pthread -ftree-vectorize -fvisibility=hidden -O2 -lpthread -lm -lrt
ASTC Encoder 2.0 - Preset: Thorough (Seconds, Fewer Is Better). Run 1: 33.33 (SE +/- 0.40, N = 3); Run 2: 33.24 (SE +/- 0.42, N = 3); Run 3: 33.34 (SE +/- 0.40, N = 6). Build: (CXX) g++ options: -std=c++14 -fvisibility=hidden -O3 -flto -mfpmath=sse -mavx2 -mpopcnt -lpthread
x265 3.4 - Video Input: Bosphorus 4K (Frames Per Second, More Is Better). Run 1: 12.37 (SE +/- 0.07, N = 3); Run 2: 12.39 (SE +/- 0.09, N = 3); Run 3: 12.34 (SE +/- 0.02, N = 3). Build: (CXX) g++ options: -O3 -rdynamic -lpthread -lrt -ldl -lnuma
PyPerformance 1.0.0 - Benchmark: 2to3 (Milliseconds, Fewer Is Better). Run 1: 264; Run 2: 261; Run 3: 261.
oneDNN 1.5 - Harness: IP Batch 1D - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better). Run 1: 1.80103 (SE +/- 0.01882, N = 8; MIN: 1.51); Run 2: 1.80571 (SE +/- 0.01584, N = 15; MIN: 1.49); Run 3: 1.86706 (SE +/- 0.03006, N = 3; MIN: 1.53). Build: (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
Caffe 2020-02-13 - Model: AlexNet - Acceleration: CPU - Iterations: 100 (Milli-Seconds, Fewer Is Better). Run 1: 40095 (SE +/- 134.25, N = 3); Run 2: 39759 (SE +/- 121.90, N = 3); Run 3: 40315 (SE +/- 115.70, N = 3). Build: (CXX) g++ options: -fPIC -O3 -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas
PyPerformance 1.0.0 - Benchmark: go (Milliseconds, Fewer Is Better). Run 1: 200; Run 2: 199; Run 3: 200.
oneDNN 1.5 - Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU (ms, Fewer Is Better). Run 1: 321.80 (SE +/- 3.32, N = 3; MIN: 304.56); Run 2: 326.96 (SE +/- 2.20, N = 3; MIN: 315.72); Run 3: 324.34 (SE +/- 1.53, N = 3; MIN: 307.8). Build: (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
PostgreSQL pgbench 13.0 - Scaling Factor: 1 - Clients: 250 - Mode: Read Only - Average Latency (ms, Fewer Is Better). Run 1: 0.982 (SE +/- 0.003, N = 3); Run 2: 0.995 (SE +/- 0.013, N = 5); Run 3: 1.004 (SE +/- 0.013, N = 5). Build: (CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm
PostgreSQL pgbench 13.0 - Scaling Factor: 1 - Clients: 250 - Mode: Read Only (TPS, More Is Better). Run 1: 254678 (SE +/- 853.38, N = 3); Run 2: 251429 (SE +/- 3217.19, N = 5); Run 3: 249348 (SE +/- 3216.57, N = 5). Build: (CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm
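The average-latency and TPS results reported for each pgbench configuration look like two views of the same measurement: if every client keeps one transaction in flight, average latency is roughly clients / TPS. The sketch below checks that relationship against the 250-client read-only figures above. The helper name `expected_latency_ms` is illustrative only and is not part of pgbench or the Phoronix Test Suite. Small mismatches are expected because the reported values are averages over several runs.

```python
# Consistency check between pgbench TPS and average latency.
# Assumption: each client keeps exactly one transaction in flight,
# so latency ~= clients / TPS. The helper name is hypothetical.

def expected_latency_ms(clients: int, tps: float) -> float:
    """Approximate average latency in milliseconds implied by a TPS figure."""
    return clients / tps * 1000.0

CLIENTS = 250
# (reported TPS, reported average latency in ms) for runs 1, 2, 3
reported = {1: (254678, 0.982), 2: (251429, 0.995), 3: (249348, 1.004)}

implied = {run: expected_latency_ms(CLIENTS, tps) for run, (tps, _) in reported.items()}

for run, (tps, latency) in reported.items():
    print(f"run {run}: reported {latency:.3f} ms, implied {implied[run]:.3f} ms")
```

The same check can be applied to the other client counts by changing `CLIENTS` and the `reported` values.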
oneDNN 1.5 - Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU (ms, Fewer Is Better). Run 1: 149.62 (SE +/- 1.81, N = 3; MIN: 143.31); Run 2: 139.87 (SE +/- 0.87, N = 3; MIN: 136.06); Run 3: 136.15 (SE +/- 1.35, N = 3; MIN: 130.37). Build: (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
LibRaw 0.20 - Post-Processing Benchmark (Mpix/sec, More Is Better). Run 1: 33.31 (SE +/- 0.13, N = 3); Run 2: 33.66 (SE +/- 0.13, N = 3); Run 3: 33.87 (SE +/- 0.08, N = 3). Build: (CXX) g++ options: -O2 -fopenmp -ljpeg -lz -lm
WebP Image Encode 1.1 - Encode Settings: Quality 100, Lossless, Highest Compression (Encode Time - Seconds, Fewer Is Better). Run 1: 34.70 (SE +/- 0.23, N = 3); Run 2: 34.84 (SE +/- 0.14, N = 3); Run 3: 34.61 (SE +/- 0.07, N = 3). Build: (CC) gcc options: -fvisibility=hidden -O2 -pthread -lm -ljpeg -lpng16 -ltiff
Kvazaar 2.0 - Video Input: Bosphorus 1080p - Video Preset: Slow (Frames Per Second, More Is Better). Run 1: 18.24 (SE +/- 0.01, N = 3); Run 2: 18.58 (SE +/- 0.15, N = 3); Run 3: 18.11 (SE +/- 0.02, N = 3). Build: (CC) gcc options: -pthread -ftree-vectorize -fvisibility=hidden -O2 -lpthread -lm -lrt
Kvazaar 2.0 - Video Input: Bosphorus 1080p - Video Preset: Medium (Frames Per Second, More Is Better). Run 1: 18.69 (SE +/- 0.08, N = 3); Run 2: 18.92 (SE +/- 0.02, N = 3); Run 3: 18.69 (SE +/- 0.06, N = 3). Build: (CC) gcc options: -pthread -ftree-vectorize -fvisibility=hidden -O2 -lpthread -lm -lrt
PyPerformance 1.0.0 - Benchmark: float (Milliseconds, Fewer Is Better). Run 1: 90.0 (SE +/- 0.03, N = 3); Run 2: 90.0 (SE +/- 0.06, N = 3); Run 3: 90.1 (SE +/- 0.36, N = 3).
AOM AV1 2.0 - Encoder Mode: Speed 0 Two-Pass (Frames Per Second, More Is Better). Run 1: 0.33 (SE +/- 0.00, N = 3); Run 2: 0.33 (SE +/- 0.00, N = 3); Run 3: 0.33 (SE +/- 0.00, N = 3). Build: (CXX) g++ options: -O3 -std=c++11 -U_FORTIFY_SOURCE -lm -lpthread
PyPerformance 1.0.0 - Benchmark: django_template (Milliseconds, Fewer Is Better). Run 1: 38.6 (SE +/- 0.09, N = 3); Run 2: 38.9 (SE +/- 0.06, N = 3); Run 3: 39.2 (SE +/- 0.09, N = 3).
PyPerformance 1.0.0 - Benchmark: chaos (Milliseconds, Fewer Is Better). Run 1: 85.7 (SE +/- 0.07, N = 3); Run 2: 85.8 (SE +/- 0.09, N = 3); Run 3: 86.1 (SE +/- 0.07, N = 3).
PyPerformance 1.0.0 - Benchmark: crypto_pyaes (Milliseconds, Fewer Is Better). Run 1: 85.8 (SE +/- 0.03, N = 3); Run 2: 85.8 (SE +/- 0.06, N = 3); Run 3: 85.9 (SE +/- 0.03, N = 3).
DaCapo Benchmark 9.12-MR1 - Java Test: H2 (msec, Fewer Is Better). Run 1: 2864 (SE +/- 46.69, N = 19); Run 2: 2925 (SE +/- 39.90, N = 20); Run 3: 2775 (SE +/- 38.16, N = 4).
Blender 2.90 - Blend File: BMW27 - Compute: NVIDIA OptiX (Seconds, Fewer Is Better). Run 1: 27.90 (SE +/- 0.01, N = 3); Run 2: 27.76 (SE +/- 0.04, N = 3); Run 3: 27.86 (SE +/- 0.02, N = 3).
PyPerformance 1.0.0 - Benchmark: regex_compile (Milliseconds, Fewer Is Better). Run 1: 138; Run 2: 137; Run 3: 139.
Kvazaar 2.0 - Video Input: Bosphorus 4K - Video Preset: Ultra Fast (Frames Per Second, More Is Better). Run 1: 21.87 (SE +/- 0.10, N = 3); Run 2: 22.45 (SE +/- 0.23, N = 3); Run 3: 21.92 (SE +/- 0.11, N = 3). Build: (CC) gcc options: -pthread -ftree-vectorize -fvisibility=hidden -O2 -lpthread -lm -lrt
AOM AV1 2.0 - Encoder Mode: Speed 6 Realtime (Frames Per Second, More Is Better). Run 1: 23.54 (SE +/- 0.09, N = 3); Run 2: 23.44 (SE +/- 0.10, N = 3); Run 3: 23.36 (SE +/- 0.09, N = 3). Build: (CXX) g++ options: -O3 -std=c++11 -U_FORTIFY_SOURCE -lm -lpthread
PostgreSQL pgbench 13.0 - Scaling Factor: 1 - Clients: 100 - Mode: Read Write - Average Latency (ms, Fewer Is Better). Run 1: 176.13 (SE +/- 0.42, N = 3); Run 2: 173.12 (SE +/- 1.60, N = 3); Run 3: 173.24 (SE +/- 1.01, N = 3). Build: (CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm
PostgreSQL pgbench 13.0 - Scaling Factor: 1 - Clients: 100 - Mode: Read Write (TPS, More Is Better). Run 1: 568 (SE +/- 1.34, N = 3); Run 2: 578 (SE +/- 5.30, N = 3); Run 3: 578 (SE +/- 3.40, N = 3). Build: (CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm
PostgreSQL pgbench 13.0 - Scaling Factor: 1 - Clients: 50 - Mode: Read Write - Average Latency (ms, Fewer Is Better). Run 1: 81.87 (SE +/- 1.15, N = 3); Run 2: 83.89 (SE +/- 0.03, N = 3); Run 3: 82.73 (SE +/- 0.67, N = 3). Build: (CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm
PostgreSQL pgbench 13.0 - Scaling Factor: 1 - Clients: 50 - Mode: Read Write (TPS, More Is Better). Run 1: 611 (SE +/- 8.58, N = 3); Run 2: 596 (SE +/- 0.22, N = 3); Run 3: 604 (SE +/- 4.93, N = 3). Build: (CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm
PostgreSQL pgbench 13.0 - Scaling Factor: 1 - Clients: 1 - Mode: Read Only - Average Latency (ms, Fewer Is Better). Run 1: 0.030 (SE +/- 0.000, N = 3); Run 2: 0.030 (SE +/- 0.000, N = 3); Run 3: 0.031 (SE +/- 0.000, N = 3). Build: (CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm
PostgreSQL pgbench 13.0 - Scaling Factor: 1 - Clients: 1 - Mode: Read Only (TPS, More Is Better). Run 1: 33414 (SE +/- 77.57, N = 3); Run 2: 33267 (SE +/- 373.98, N = 3); Run 3: 32579 (SE +/- 92.20, N = 3). Build: (CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm
PostgreSQL pgbench 13.0 - Scaling Factor: 1 - Clients: 100 - Mode: Read Only - Average Latency (ms, Fewer Is Better). Run 1: 0.365 (SE +/- 0.003, N = 3); Run 2: 0.356 (SE +/- 0.004, N = 3); Run 3: 0.360 (SE +/- 0.001, N = 3). Build: (CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm
PostgreSQL pgbench 13.0 - Scaling Factor: 1 - Clients: 100 - Mode: Read Only (TPS, More Is Better). Run 1: 274463 (SE +/- 1999.95, N = 3); Run 2: 281291 (SE +/- 3007.11, N = 3); Run 3: 278019 (SE +/- 753.18, N = 3). Build: (CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm
PostgreSQL pgbench 13.0 - Scaling Factor: 1 - Clients: 50 - Mode: Read Only - Average Latency (ms, Fewer Is Better). Run 1: 0.173 (SE +/- 0.001, N = 3); Run 2: 0.169 (SE +/- 0.003, N = 3); Run 3: 0.171 (SE +/- 0.002, N = 3). Build: (CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm
PostgreSQL pgbench 13.0 - Scaling Factor: 1 - Clients: 50 - Mode: Read Only (TPS, More Is Better). Run 1: 289401 (SE +/- 1001.63, N = 3); Run 2: 296088 (SE +/- 4333.65, N = 3); Run 3: 291995 (SE +/- 3723.51, N = 3). Build: (CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm
AOM AV1 2.0 - Encoder Mode: Speed 6 Two-Pass (Frames Per Second, More Is Better). Run 1: 4.30 (SE +/- 0.01, N = 3); Run 2: 4.29 (SE +/- 0.01, N = 3); Run 3: 4.27 (SE +/- 0.01, N = 3). Build: (CXX) g++ options: -O3 -std=c++11 -U_FORTIFY_SOURCE -lm -lpthread
dav1d 0.7.0 - Video Input: Summer Nature 4K (FPS, More Is Better). Run 1: 156.40 (SE +/- 1.17, N = 3; MIN: 125.77 / MAX: 171.79); Run 2: 160.47 (SE +/- 1.07, N = 3; MIN: 149.07 / MAX: 177.15); Run 3: 156.84 (SE +/- 0.52, N = 3; MIN: 148.02 / MAX: 171.09). Build: (CC) gcc options: -pthread
PyPerformance 1.0.0 - Benchmark: pathlib (Milliseconds, Fewer Is Better). Run 1: 14.4 (SE +/- 0.00, N = 3); Run 2: 14.3 (SE +/- 0.00, N = 3); Run 3: 14.5 (SE +/- 0.09, N = 3).
OCRMyPDF 10.3.1+dfsg - Processing 60 Page PDF Document (Seconds, Fewer Is Better). Run 1: 22.02 (SE +/- 0.12, N = 3); Run 2: 21.07 (SE +/- 0.22, N = 3); Run 3: 22.09 (SE +/- 0.21, N = 3).
RNNoise 2020-06-28 (Seconds, Fewer Is Better). Run 1: 21.59 (SE +/- 0.02, N = 3); Run 2: 21.59 (SE +/- 0.01, N = 3); Run 3: 21.60 (SE +/- 0.04, N = 3). Build: (CC) gcc options: -O2 -pedantic -fvisibility=hidden
oneDNN 1.5 - Harness: Deconvolution Batch deconv_1d - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better). Run 1: 5.70594 (SE +/- 0.03901, N = 3; MIN: 4.96); Run 2: 5.59357 (SE +/- 0.01208, N = 3; MIN: 4.91); Run 3: 5.70690 (SE +/- 0.05691, N = 3; MIN: 4.91). Build: (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
Tesseract OCR 4.1.1 - Time To OCR 7 Images (Seconds, Fewer Is Better). Run 1: 20.13 (SE +/- 0.01, N = 3); Run 2: 20.32 (SE +/- 0.04, N = 3); Run 3: 20.22 (SE +/- 0.02, N = 3).
OpenSSL 1.1.1 - RSA 4096-bit Performance (Signs Per Second, More Is Better). Run 1: 2431.8 (SE +/- 27.67, N = 3); Run 2: 2467.9 (SE +/- 18.02, N = 3); Run 3: 2474.6 (SE +/- 14.41, N = 3). Build: (CC) gcc options: -pthread -m64 -O3 -lssl -lcrypto -ldl
TNN 0.2.3 - Target: CPU - Model: MobileNet v2 (ms, Fewer Is Better). Run 1: 286.04 (SE +/- 1.30, N = 3; MIN: 283.63 / MAX: 356.82); Run 2: 287.56 (SE +/- 0.68, N = 3; MIN: 285.93 / MAX: 307.5); Run 3: 284.03 (SE +/- 0.57, N = 3; MIN: 282.82 / MAX: 325.11). Build: (CXX) g++ options: -fopenmp -pthread -fvisibility=hidden -O3 -rdynamic -ldl
PyPerformance 1.0.0 - Benchmark: pickle_pure_python (Milliseconds, Fewer Is Better). Run 1: 338; Run 2: 339; Run 3: 339.
PyPerformance 1.0.0 - Benchmark: json_loads (Milliseconds, Fewer Is Better). Run 1: 21.3 (SE +/- 0.09, N = 3); Run 2: 21.2 (SE +/- 0.00, N = 3); Run 3: 21.2 (SE +/- 0.03, N = 3).
NeatBench 5 - Acceleration: CPU (FPS, More Is Better). Run 1: 10.9 (SE +/- 1.26, N = 16); Run 2: 10.8 (SE +/- 1.23, N = 16); Run 3: 10.9 (SE +/- 1.26, N = 16).
TNN 0.2.3 - Target: CPU - Model: SqueezeNet v1.1 (ms, Fewer Is Better). Run 1: 269.71 (SE +/- 0.03, N = 3; MIN: 268.38 / MAX: 284.03); Run 2: 269.66 (SE +/- 0.07, N = 3; MIN: 268.59 / MAX: 282.48); Run 3: 269.44 (SE +/- 0.13, N = 3; MIN: 268.04 / MAX: 281.47). Build: (CXX) g++ options: -fopenmp -pthread -fvisibility=hidden -O3 -rdynamic -ldl
PyPerformance 1.0.0 - Benchmark: nbody (Milliseconds, Fewer Is Better). Run 1: 103; Run 2: 103; Run 3: 103.
dav1d 0.7.0 - Video Input: Chimera 1080p (FPS, More Is Better). Run 1: 639.97 (SE +/- 2.78, N = 3; MIN: 466.44 / MAX: 934.79); Run 2: 647.74 (SE +/- 1.12, N = 3; MIN: 471.76 / MAX: 969.53); Run 3: 641.98 (SE +/- 3.61, N = 3; MIN: 460.65 / MAX: 940.69). Build: (CC) gcc options: -pthread
AOM AV1 2.0 - Encoder Mode: Speed 4 Two-Pass (Frames Per Second, More Is Better). Run 1: 2.72 (SE +/- 0.00, N = 3); Run 2: 2.72 (SE +/- 0.00, N = 3); Run 3: 2.72 (SE +/- 0.01, N = 3). Build: (CXX) g++ options: -O3 -std=c++11 -U_FORTIFY_SOURCE -lm -lpthread
NCNN 20200916 - Target: Vulkan GPU - Model: yolov4-tiny (ms, Fewer Is Better). Run 1: 8.67 (SE +/- 0.18, N = 3; MIN: 8.03 / MAX: 53); Run 2: 8.32 (SE +/- 0.02, N = 3; MIN: 8.01 / MAX: 8.7); Run 3: 8.70 (SE +/- 0.34, N = 3; MIN: 8.04 / MAX: 83.54). Build: (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916 - Target: Vulkan GPU - Model: resnet50 (ms, Fewer Is Better). Run 1: 3.77 (SE +/- 0.01, N = 3; MIN: 3.72 / MAX: 13.18); Run 2: 3.74 (SE +/- 0.01, N = 3; MIN: 3.72 / MAX: 3.76); Run 3: 3.76 (SE +/- 0.01, N = 3; MIN: 3.74 / MAX: 3.9). Build: (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916 - Target: Vulkan GPU - Model: alexnet (ms, Fewer Is Better). Run 1: 2.10 (SE +/- 0.01, N = 3; MIN: 1.85 / MAX: 2.59); Run 2: 2.08 (SE +/- 0.00, N = 3; MIN: 2.04 / MAX: 2.57); Run 3: 2.09 (SE +/- 0.00, N = 3; MIN: 1.82 / MAX: 2.6). Build: (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916 - Target: Vulkan GPU - Model: resnet18 (ms, Fewer Is Better). Run 1: 1.68 (SE +/- 0.03, N = 3; MIN: 1.64 / MAX: 17.75); Run 2: 1.72 (SE +/- 0.04, N = 3; MIN: 1.64 / MAX: 23.58); Run 3: 1.66 (SE +/- 0.00, N = 3; MIN: 1.64 / MAX: 1.7). Build: (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916 - Target: Vulkan GPU - Model: vgg16 (ms, Fewer Is Better). Run 1: 8.36 (SE +/- 0.01, N = 3; MIN: 7.77 / MAX: 22.44); Run 2: 8.32 (SE +/- 0.04, N = 3; MIN: 7.77 / MAX: 22.07); Run 3: 8.42 (SE +/- 0.00, N = 3; MIN: 7.88 / MAX: 18.71). Build: (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916 - Target: Vulkan GPU - Model: googlenet (ms, Fewer Is Better). Run 1: 3.53 (SE +/- 0.31, N = 3; MIN: 3.2 / MAX: 21.92); Run 2: 3.22 (SE +/- 0.00, N = 3; MIN: 3.21 / MAX: 3.25); Run 3: 3.23 (SE +/- 0.00, N = 3; MIN: 3.22 / MAX: 3.35). Build: (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916 - Target: Vulkan GPU - Model: blazeface (ms, Fewer Is Better). Run 1: 0.61 (SE +/- 0.00, N = 3; MIN: 0.6 / MAX: 0.63); Run 2: 0.62 (SE +/- 0.00, N = 3; MIN: 0.6 / MAX: 0.66); Run 3: 0.63 (SE +/- 0.01, N = 3; MIN: 0.61 / MAX: 2.14). Build: (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916 - Target: Vulkan GPU - Model: efficientnet-b0 (ms, Fewer Is Better). Run 1: 2.67 (SE +/- 0.02, N = 3; MIN: 2.64 / MAX: 12.58); Run 2: 2.65 (SE +/- 0.00, N = 3; MIN: 2.63 / MAX: 3.85); Run 3: 2.65 (SE +/- 0.00, N = 3; MIN: 2.63 / MAX: 2.77). Build: (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916 - Target: Vulkan GPU - Model: mnasnet (ms, Fewer Is Better). Run 1: 1.48 (SE +/- 0.00, N = 3; MIN: 1.47 / MAX: 1.72); Run 2: 1.48 (SE +/- 0.00, N = 3; MIN: 1.47 / MAX: 1.53); Run 3: 1.50 (SE +/- 0.01, N = 3; MIN: 1.47 / MAX: 6.94). Build: (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916 - Target: Vulkan GPU - Model: shufflenet-v2 (ms, Fewer Is Better). Run 1: 1.30 (SE +/- 0.00, N = 3; MIN: 1.29 / MAX: 1.32); Run 2: 1.31 (SE +/- 0.00, N = 3; MIN: 1.29 / MAX: 1.35); Run 3: 1.31 (SE +/- 0.00, N = 3; MIN: 1.3 / MAX: 1.53). Build: (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916 - Target: Vulkan GPU - Model: mobilenet-v3 (ms, Fewer Is Better). Run 1: 1.67 (SE +/- 0.00, N = 3; MIN: 1.66 / MAX: 1.7); Run 2: 1.68 (SE +/- 0.00, N = 3; MIN: 1.66 / MAX: 1.93); Run 3: 1.68 (SE +/- 0.00, N = 3; MIN: 1.66 / MAX: 1.9). Build: (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916 - Target: Vulkan GPU - Model: mobilenet-v2 (ms, Fewer Is Better). Run 1: 1.45 (SE +/- 0.03, N = 3; MIN: 1.41 / MAX: 20.49); Run 2: 1.42 (SE +/- 0.00, N = 3; MIN: 1.41 / MAX: 1.44); Run 3: 1.46 (SE +/- 0.03, N = 3; MIN: 1.41 / MAX: 20.24). Build: (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916 - Target: Vulkan GPU - Model: mobilenet (ms, Fewer Is Better). Run 1: 4.63 (SE +/- 0.02, N = 3; MIN: 4.56 / MAX: 5.02); Run 2: 4.58 (SE +/- 0.00, N = 3; MIN: 4.54 / MAX: 4.65); Run 3: 4.77 (SE +/- 0.12, N = 3; MIN: 4.56 / MAX: 34.32). Build: (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916 - Target: Vulkan GPU - Model: squeezenet (ms, Fewer Is Better). Run 1: 3.66 (SE +/- 0.01, N = 3; MIN: 3.6 / MAX: 3.77); Run 2: 3.65 (SE +/- 0.01, N = 3; MIN: 3.6 / MAX: 3.7); Run 3: 3.66 (SE +/- 0.01, N = 3; MIN: 3.61 / MAX: 3.73). Build: (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
Dolfyn 0.527 - Computational Fluid Dynamics (Seconds, Fewer Is Better). Run 1: 16.83 (SE +/- 0.12, N = 3); Run 2: 16.71 (SE +/- 0.03, N = 3); Run 3: 17.03 (SE +/- 0.08, N = 3).
WebP Image Encode 1.1 - Encode Settings: Quality 100, Lossless (Encode Time - Seconds, Fewer Is Better). Run 1: 15.61 (SE +/- 0.01, N = 3); Run 2: 15.60 (SE +/- 0.01, N = 3); Run 3: 15.69 (SE +/- 0.06, N = 3). Build: (CC) gcc options: -fvisibility=hidden -O2 -pthread -lm -ljpeg -lpng16 -ltiff
Kvazaar 2.0 - Video Input: Bosphorus 1080p - Video Preset: Very Fast (Frames Per Second, More Is Better). Run 1: 46.73 (SE +/- 0.52, N = 3); Run 2: 48.24 (SE +/- 0.64, N = 5); Run 3: 46.79 (SE +/- 0.02, N = 3). Build: (CC) gcc options: -pthread -ftree-vectorize -fvisibility=hidden -O2 -lpthread -lm -lrt
AOM AV1 2.0 - Encoder Mode: Speed 8 Realtime (Frames Per Second, More Is Better). Run 1: 45.92 (SE +/- 0.13, N = 3); Run 2: 46.26 (SE +/- 0.09, N = 3); Run 3: 45.78 (SE +/- 0.23, N = 3). Build: (CXX) g++ options: -O3 -std=c++11 -U_FORTIFY_SOURCE -lm -lpthread
DaCapo Benchmark 9.12-MR1 - Java Test: Tradesoap (msec, Fewer Is Better). Run 1: 3652 (SE +/- 18.80, N = 4); Run 2: 3550 (SE +/- 25.69, N = 4); Run 3: 3704 (SE +/- 42.18, N = 4).
oneDNN 1.5 - Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU (ms, Fewer Is Better). Run 1: 3.99911 (SE +/- 0.00919, N = 3; MIN: 3.93); Run 2: 4.06970 (SE +/- 0.04507, N = 3; MIN: 3.86); Run 3: 3.87614 (SE +/- 0.00529, N = 3; MIN: 3.82). Build: (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 1.5 - Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better). Run 1: 2.58081 (SE +/- 0.02285, N = 3; MIN: 2.21); Run 2: 2.57926 (SE +/- 0.03056, N = 3; MIN: 2.16); Run 3: 2.58232 (SE +/- 0.02621, N = 3; MIN: 2.21). Build: (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
LAMMPS Molecular Dynamics Simulator 29Oct2020 - Model: Rhodopsin Protein (ns/day, More Is Better). Run 1: 6.600 (SE +/- 0.086, N = 15); Run 2: 6.805 (SE +/- 0.019, N = 3); Run 3: 6.322 (SE +/- 0.197, N = 12). Build: (CXX) g++ options: -O3 -pthread -lm
Kvazaar 2.0 - Video Input: Bosphorus 1080p - Video Preset: Ultra Fast (Frames Per Second, More Is Better). Run 1: 87.16 (SE +/- 1.07, N = 3); Run 2: 92.53 (SE +/- 0.90, N = 9); Run 3: 87.18 (SE +/- 1.01, N = 3). Build: (CC) gcc options: -pthread -ftree-vectorize -fvisibility=hidden -O2 -lpthread -lm -lrt
x265 3.4 - Video Input: Bosphorus 1080p (Frames Per Second, More Is Better). Run 1: 54.49 (SE +/- 0.06, N = 3); Run 2: 56.14 (SE +/- 0.96, N = 3); Run 3: 54.42 (SE +/- 0.78, N = 3). Build: (CXX) g++ options: -O3 -rdynamic -lpthread -lrt -ldl -lnuma
DaCapo Benchmark 9.12-MR1 - Java Test: Tradebeans (msec, Fewer Is Better). Run 1: 2690 (SE +/- 35.35, N = 5); Run 2: 2645 (SE +/- 27.27, N = 4); Run 3: 2801 (SE +/- 21.94, N = 4).
ASTC Encoder 2.0 - Preset: Medium (Seconds, Fewer Is Better). Run 1: 8.48 (SE +/- 0.01, N = 3); Run 2: 8.44 (SE +/- 0.02, N = 3); Run 3: 8.55 (SE +/- 0.01, N = 3). Build: (CXX) g++ options: -std=c++14 -fvisibility=hidden -O3 -flto -mfpmath=sse -mavx2 -mpopcnt -lpthread
oneDNN 1.5 - Harness: Deconvolution Batch deconv_3d - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better). Run 1: 3.58701 (SE +/- 0.03967, N = 3; MIN: 3.47); Run 2: 3.24189 (SE +/- 0.05394, N = 3; MIN: 3.1); Run 3: 3.30402 (SE +/- 0.03320, N = 15; MIN: 3.1). Build: (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
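One rough way to judge whether a gap between two runs is larger than run-to-run noise is to express it in units of the combined standard error. The sketch below does this for runs 1 and 2 of the deconv_3d u8s8f32 result above. It assumes the runs are independent and that the reported SE values are standard errors of the mean; with N = 3 samples this is only a heuristic, not a formal significance test. The function name is hypothetical.

```python
import math

# Heuristic noise check: how many combined standard errors separate two means?
# Assumption: the two runs are independent, so their variances add.

def difference_in_se_units(mean_a: float, se_a: float, mean_b: float, se_b: float) -> float:
    """Absolute gap between two means divided by their combined standard error."""
    combined_se = math.sqrt(se_a ** 2 + se_b ** 2)
    return abs(mean_a - mean_b) / combined_se

# Run 1 and run 2 of oneDNN deconv_3d u8s8f32 (ms), taken from the result above.
ratio = difference_in_se_units(3.58701, 0.03967, 3.24189, 0.05394)
print(f"run 1 vs run 2 differ by {ratio:.1f} combined standard errors")
```

A ratio well above 2 suggests the difference is unlikely to be noise alone, while a ratio near or below 1 suggests the runs are effectively tied.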
oneDNN 1.5 - Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU (ms, Fewer Is Better). Run 1: 17.12 (SE +/- 0.13, N = 3; MIN: 16.89); Run 2: 17.14 (SE +/- 0.15, N = 3; MIN: 16.92); Run 3: 17.20 (SE +/- 0.24, N = 4; MIN: 16.87). Build: (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
WebP Image Encode 1.1 - Encode Settings: Quality 100, Highest Compression (Encode Time - Seconds, Fewer Is Better). Run 1: 6.385 (SE +/- 0.014, N = 3); Run 2: 6.386 (SE +/- 0.012, N = 3); Run 3: 6.382 (SE +/- 0.013, N = 3). Build: (CC) gcc options: -fvisibility=hidden -O2 -pthread -lm -ljpeg -lpng16 -ltiff
DaCapo Benchmark 9.12-MR1 - Java Test: Jython (msec, Fewer Is Better). Run 1: 3709 (SE +/- 52.97, N = 4); Run 2: 3698 (SE +/- 52.89, N = 4); Run 3: 3718 (SE +/- 27.24, N = 4).
dav1d 0.7.0 - Video Input: Summer Nature 1080p (FPS, More Is Better). Run 1: 581.34 (SE +/- 1.17, N = 3; MIN: 498.37 / MAX: 635.39); Run 2: 582.13 (SE +/- 0.67, N = 3; MIN: 511.78 / MAX: 635.13); Run 3: 579.67 (SE +/- 1.29, N = 3; MIN: 479.84 / MAX: 634.42). Build: (CC) gcc options: -pthread
oneDNN 1.5 - Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better). Run 1: 16.44 (SE +/- 0.01, N = 3; MIN: 16.35); Run 2: 16.74 (SE +/- 0.27, N = 3; MIN: 16.38); Run 3: 16.60 (SE +/- 0.16, N = 3; MIN: 16.35). Build: (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
ASTC Encoder 2.0 - Preset: Fast (Seconds, Fewer Is Better). Run 1: 5.53 (SE +/- 0.01, N = 3); Run 2: 5.52 (SE +/- 0.01, N = 3); Run 3: 5.55 (SE +/- 0.01, N = 3). Build: (CXX) g++ options: -std=c++14 -fvisibility=hidden -O3 -flto -mfpmath=sse -mavx2 -mpopcnt -lpthread
libavif avifenc 0.7.3 - Encoder Speed: 8 (Seconds, Fewer Is Better). Run 1: 5.022 (SE +/- 0.013, N = 3); Run 2: 4.995 (SE +/- 0.019, N = 3); Run 3: 5.017 (SE +/- 0.017, N = 3). Build: (CXX) g++ options: -O3 -fPIC
libavif avifenc 0.7.3 - Encoder Speed: 10 (Seconds, Fewer Is Better). Run 1: 4.757 (SE +/- 0.009, N = 3); Run 2: 4.738 (SE +/- 0.014, N = 3); Run 3: 4.733 (SE +/- 0.010, N = 3). Build: (CXX) g++ options: -O3 -fPIC
oneDNN 1.5 - Harness: Deconvolution Batch deconv_3d - Data Type: f32 - Engine: CPU (ms, Fewer Is Better). Run 1: 6.64261 (SE +/- 0.02212, N = 3; MIN: 6.44); Run 2: 6.47349 (SE +/- 0.00067, N = 3; MIN: 6.31); Run 3: 6.61172 (SE +/- 0.09853, N = 4; MIN: 6.32). Build: (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
NeatBench 5 - Acceleration: GPU (FPS, More Is Better). Run 1: 29.4 (SE +/- 0.06, N = 3); Run 2: 30.5 (SE +/- 0.50, N = 15); Run 3: 30.6 (SE +/- 0.50, N = 15).
FFTE 7.0 - N=256, 3D Complex FFT Routine (MFLOPS, More Is Better). Run 1: 31760.02 (SE +/- 39.81, N = 3); Run 2: 31509.63 (SE +/- 21.47, N = 3); Run 3: 19217.72 (SE +/- 122.08, N = 3). Build: (F9X) gfortran options: -O3 -fomit-frame-pointer -fopenmp
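The FFTE result is the largest spread between runs in this section, and a percent change against a baseline makes the size of the gap easier to read than raw MFLOPS. The sketch below takes run 1 as the baseline, which is an arbitrary choice made for illustration, and the helper name `percent_change` is hypothetical. It quantifies the gap only and says nothing about its cause.

```python
# Percent change of each FFTE run relative to a chosen baseline run.
# Assumption: run 1 is used as the baseline purely for illustration.

def percent_change(baseline: float, value: float) -> float:
    """Signed change of `value` relative to `baseline`, in percent."""
    return (value - baseline) / baseline * 100.0

# FFTE 7.0, N=256, 3D Complex FFT Routine (MFLOPS, more is better).
mflops = {1: 31760.02, 2: 31509.63, 3: 19217.72}

changes = {run: percent_change(mflops[1], value) for run, value in mflops.items()}

for run, change in changes.items():
    print(f"run {run}: {change:+.1f}% vs run 1")
```

For "fewer is better" results the sign reads the other way: a negative change is an improvement.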
WebP Image Encode 1.1 - Encode Settings: Quality 100 (Encode Time - Seconds, Fewer Is Better). Run 1: 2.077 (SE +/- 0.002, N = 3); Run 2: 2.076 (SE +/- 0.000, N = 3); Run 3: 2.078 (SE +/- 0.001, N = 3). Build: (CC) gcc options: -fvisibility=hidden -O2 -pthread -lm -ljpeg -lpng16 -ltiff
WebP Image Encode 1.1 - Encode Settings: Default (Encode Time - Seconds, Fewer Is Better). Run 1: 1.313 (SE +/- 0.000, N = 3); Run 2: 1.315 (SE +/- 0.003, N = 3); Run 3: 1.318 (SE +/- 0.006, N = 3). Build: (CC) gcc options: -fvisibility=hidden -O2 -pthread -lm -ljpeg -lpng16 -ltiff
Phoronix Test Suite v10.8.4