installres1
AMD Ryzen 5 5600X 6-Core testing with an ASRock X570 Phantom Gaming-ITX/TB3 (P3.00 BIOS) and NVIDIA GeForce RTX 3090 24GB on Ubuntu 18.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2103310-HA-INSTALLRE08&gru .
installres1 - System details (NVIDIA GeForce RTX 3090)
Processor: AMD Ryzen 5 5600X 6-Core @ 3.70GHz (6 Cores / 12 Threads)
Motherboard: ASRock X570 Phantom Gaming-ITX/TB3 (P3.00 BIOS)
Chipset: AMD Device 1480
Memory: 64GB
Disk: 2000GB Samsung SSD 970 EVO Plus 2TB + 4001GB Samsung SSD 870 + ProductCode
Graphics: NVIDIA GeForce RTX 3090 24GB
Audio: NVIDIA Device 1aef
Monitor: marantz-AVR
Network: Intel I211 + Intel Device 2723
OS: Ubuntu 18.04
Kernel: 5.4.0-70-generic (x86_64)
Desktop: GNOME Shell 3.28.4
Display Server: X Server 1.20.8
Display Driver: NVIDIA 460.32.03
OpenGL: 4.6.0
OpenCL: OpenCL 1.2 CUDA 11.2.109
Vulkan: 1.2.155
Compiler: GCC 7.5.0 + CUDA 11.2
File-System: ext4
Screen Resolution: 1920x1080
- Transparent Huge Pages: madvise
- --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --enable-libmpx --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib --with-tune=generic --without-cuda-driver -v
- Scaling Governor: acpi-cpufreq ondemand (Boost: Enabled)
- CPU Microcode: 0xa201009
- GPU Compute Cores: 10496
- Python 3.8.5
- itlb_multihit: Not affected
  + l1tf: Not affected
  + mds: Not affected
  + meltdown: Not affected
  + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp
  + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
  + spectre_v2: Mitigation of Full AMD retpoline IBPB: conditional IBRS_FW STIBP: always-on RSB filling
  + srbds: Not affected
  + tsx_async_abort: Not affected
installres1 - Result overview (NVIDIA GeForce RTX 3090)
plaidml: No - Inference - VGG16 - CPU: 10.99
plaidml: No - Inference - ResNet 50 - CPU: 9.40
shoc: OpenCL - Triad: 12.8371
shoc: OpenCL - Reduction: 383.062
shoc: OpenCL - Bus Speed Download: 13.0704
shoc: OpenCL - Bus Speed Readback: 13.1776
shoc: OpenCL - Texture Read Bandwidth: 2156.05
shoc: OpenCL - S3D: 428.140
shoc: OpenCL - FFT SP: 2346.70
shoc: OpenCL - GEMM SGEMM_N: 8373.79
shoc: OpenCL - Max SP Flops: 39360.4
shoc: OpenCL - MD5 Hash: 42.9143
numpy: 466.33
ai-benchmark: Device Inference Score: 1111
ai-benchmark: Device Training Score: 1163
ai-benchmark: Device AI Score: 2274
tensorflow-lite: SqueezeNet: 234227
tensorflow-lite: Inception V4: 3457463
tensorflow-lite: NASNet Mobile: 192684
tensorflow-lite: Mobilenet Float: 159375
tensorflow-lite: Mobilenet Quant: 174205
tensorflow-lite: Inception ResNet V2: 3121703
onednn: IP Shapes 1D - f32 - CPU: 4.64288
onednn: IP Shapes 3D - f32 - CPU: 7.09188
onednn: IP Shapes 1D - u8s8f32 - CPU: 1.99499
onednn: IP Shapes 3D - u8s8f32 - CPU: 1.47217
onednn: Convolution Batch Shapes Auto - f32 - CPU: 13.5351
onednn: Deconvolution Batch shapes_1d - f32 - CPU: 10.5388
onednn: Deconvolution Batch shapes_3d - f32 - CPU: 8.33697
onednn: Convolution Batch Shapes Auto - u8s8f32 - CPU: 12.8089
onednn: Deconvolution Batch shapes_1d - u8s8f32 - CPU: 2.61408
onednn: Deconvolution Batch shapes_3d - u8s8f32 - CPU: 4.37194
onednn: Recurrent Neural Network Training - f32 - CPU: 4007.49
onednn: Recurrent Neural Network Inference - f32 - CPU: 2107.13
onednn: Recurrent Neural Network Training - u8s8f32 - CPU: 3999.33
onednn: Recurrent Neural Network Inference - u8s8f32 - CPU: 2135.24
onednn: Matrix Multiply Batch Shapes Transformer - f32 - CPU: 2.25636
onednn: Recurrent Neural Network Training - bf16bf16bf16 - CPU: 4001.99
onednn: Recurrent Neural Network Inference - bf16bf16bf16 - CPU: 2109.64
onednn: Matrix Multiply Batch Shapes Transformer - u8s8f32 - CPU: 3.53795
mnn: SqueezeNetV1.0: 4.143
mnn: resnet-v2-50: 23.676
mnn: MobileNetV2_224: 2.099
mnn: mobilenet-v1-1.0: 2.734
mnn: inception-v3: 28.194
ncnn: CPU - mobilenet: 14.22
ncnn: CPU-v2-v2 - mobilenet-v2: 3.68
ncnn: CPU-v3-v3 - mobilenet-v3: 3.11
ncnn: CPU - shufflenet-v2: 4.40
ncnn: CPU - mnasnet: 3.51
ncnn: CPU - efficientnet-b0: 5.37
ncnn: CPU - blazeface: 1.40
ncnn: CPU - googlenet: 13.13
ncnn: CPU - vgg16: 55.98
ncnn: CPU - resnet18: 15.15
ncnn: CPU - alexnet: 13.48
ncnn: CPU - resnet50: 27.47
ncnn: CPU - yolov4-tiny: 22.15
ncnn: CPU - squeezenet_ssd: 16.70
ncnn: CPU - regnety_400m: 9.73
ncnn: Vulkan GPU - mobilenet: 14.18
ncnn: Vulkan GPU-v2-v2 - mobilenet-v2: 3.68
ncnn: Vulkan GPU-v3-v3 - mobilenet-v3: 3.18
ncnn: Vulkan GPU - shufflenet-v2: 4.31
ncnn: Vulkan GPU - mnasnet: 3.43
ncnn: Vulkan GPU - efficientnet-b0: 5.09
ncnn: Vulkan GPU - blazeface: 1.40
ncnn: Vulkan GPU - googlenet: 12.97
ncnn: Vulkan GPU - vgg16: 55.85
ncnn: Vulkan GPU - resnet18: 15.11
ncnn: Vulkan GPU - alexnet: 13.42
ncnn: Vulkan GPU - resnet50: 27.56
ncnn: Vulkan GPU - yolov4-tiny: 22.63
ncnn: Vulkan GPU - squeezenet_ssd: 16.93
ncnn: Vulkan GPU - regnety_400m: 9.62
tnn: CPU - MobileNet v2: 213.475
tnn: CPU - SqueezeNet v1.1: 214.405
deepspeech: CPU: 63.92980
rnnoise: 15.974
ecp-candle: P1B2: 31.596
ecp-candle: P3B1: 896.151
ecp-candle: P3B2: 453.131
numenta-nab: EXPoSE: 319.329
numenta-nab: Relative Entropy: 19.078
numenta-nab: Windowed Gaussian: 13.084
numenta-nab: Earthgecko Skyline: 141.338
numenta-nab: Bayesian Changepoint: 43.282
mlpack: scikit_ica: 33.52
mlpack: scikit_qda: 120.45
mlpack: scikit_svm: 17.81
mlpack: scikit_linearridgeregression: 2.43
scikit-learn: 8.735
PlaidML (FPS, More Is Better) - NVIDIA GeForce RTX 3090
FP16: No - Mode: Inference - Network: VGG16 - Device: CPU: 10.99 (SE +/- 0.17, N = 3)
FP16: No - Mode: Inference - Network: ResNet 50 - Device: CPU: 9.40 (SE +/- 0.03, N = 3)
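Most results in this export carry an "SE +/- x, N = n" annotation. The export does not state how that figure is computed; the sketch below assumes the conventional standard error of the mean (sample standard deviation divided by the square root of the number of runs). The per-run values are hypothetical and are not taken from this result file, which only reports the aggregate.

```python
import math
import statistics


def standard_error(samples):
    """Standard error of the mean: sample standard deviation / sqrt(N).

    Assumption: this is the conventional definition; the export itself
    does not say which formula produced its "SE +/-" figures.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two runs to estimate the standard error")
    return statistics.stdev(samples) / math.sqrt(n)


# Hypothetical per-run values (NOT from this result file).
runs = [10.0, 11.0, 12.0]
mean = statistics.mean(runs)
se = standard_error(runs)

# Print in the same shape the export uses for each result.
print(f"{mean:.2f} (SE +/- {se:.2f}, N = {len(runs)})")
```

A small SE relative to the mean suggests the runs were consistent; entries without an SE annotation (for example the AI Benchmark Alpha and ECP-CANDLE results) give no such indication.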
SHOC Scalable HeterOgeneous Computing 2020-04-17, Target: OpenCL (More Is Better) - NVIDIA GeForce RTX 3090
Benchmark: Triad: 12.84 GB/s (SE +/- 0.00, N = 3)
Benchmark: Reduction: 383.06 GB/s (SE +/- 0.34, N = 3)
Benchmark: Bus Speed Download: 13.07 GB/s (SE +/- 0.00, N = 3)
Benchmark: Bus Speed Readback: 13.18 GB/s (SE +/- 0.00, N = 3)
Benchmark: Texture Read Bandwidth: 2156.05 GB/s (SE +/- 0.75, N = 3)
Benchmark: S3D: 428.14 GFLOPS (SE +/- 0.05, N = 3)
Benchmark: FFT SP: 2346.70 GFLOPS (SE +/- 0.45, N = 3)
Benchmark: GEMM SGEMM_N: 8373.79 GFLOPS (SE +/- 70.49, N = 3)
Benchmark: Max SP Flops: 39360.4 GFLOPS (SE +/- 160.97, N = 3)
Benchmark: MD5 Hash: 42.91 GHash/s (SE +/- 0.00, N = 3)
(CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi
Numpy Benchmark (Score, More Is Better) - NVIDIA GeForce RTX 3090: 466.33 (SE +/- 0.17, N = 3)
AI Benchmark Alpha 0.1.2 (Score, More Is Better) - NVIDIA GeForce RTX 3090
Device Inference Score: 1111
Device Training Score: 1163
Device AI Score: 2274
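The three AI Benchmark Alpha figures are consistent with the Device AI Score being the sum of the inference and training scores. That relationship is inferred from the numbers reported here, not stated in the export, so the check below is only a sanity test on the reported values.

```python
# Scores as reported in this result file.
inference_score = 1111
training_score = 1163
reported_ai_score = 2274

# Assumption: the overall score is the plain sum of the two sub-scores.
combined = inference_score + training_score
matches = combined == reported_ai_score

print(f"inference + training = {combined}, reported = {reported_ai_score}, match = {matches}")
```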
TensorFlow Lite 2020-08-23 (Microseconds, Fewer Is Better) - NVIDIA GeForce RTX 3090
Model: SqueezeNet: 234227 (SE +/- 102.89, N = 3)
Model: Inception V4: 3457463 (SE +/- 684.89, N = 3)
Model: NASNet Mobile: 192684 (SE +/- 231.45, N = 3)
Model: Mobilenet Float: 159375 (SE +/- 31.27, N = 3)
Model: Mobilenet Quant: 174205 (SE +/- 78.39, N = 3)
Model: Inception ResNet V2: 3121703 (SE +/- 624.96, N = 3)
oneDNN 2.1.2, Engine: CPU (ms, Fewer Is Better) - NVIDIA GeForce RTX 3090
Harness: IP Shapes 1D - Data Type: f32: 4.64288 (SE +/- 0.01355, N = 3, MIN: 4.39)
Harness: IP Shapes 3D - Data Type: f32: 7.09188 (SE +/- 0.02249, N = 3, MIN: 6.94)
Harness: IP Shapes 1D - Data Type: u8s8f32: 1.99499 (SE +/- 0.01040, N = 3, MIN: 1.81)
Harness: IP Shapes 3D - Data Type: u8s8f32: 1.47217 (SE +/- 0.00109, N = 3, MIN: 1.41)
Harness: Convolution Batch Shapes Auto - Data Type: f32: 13.54 (SE +/- 0.02, N = 3, MIN: 13.26)
Harness: Deconvolution Batch shapes_1d - Data Type: f32: 10.54 (SE +/- 0.02, N = 3, MIN: 6.44)
Harness: Deconvolution Batch shapes_3d - Data Type: f32: 8.33697 (SE +/- 0.02918, N = 3, MIN: 8.01)
Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32: 12.81 (SE +/- 0.20, N = 4, MIN: 12.25)
Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32: 2.61408 (SE +/- 0.00540, N = 3, MIN: 2.41)
Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32: 4.37194 (SE +/- 0.00941, N = 3, MIN: 4.17)
Harness: Recurrent Neural Network Training - Data Type: f32: 4007.49 (SE +/- 3.18, N = 3, MIN: 3971.88)
Harness: Recurrent Neural Network Inference - Data Type: f32: 2107.13 (SE +/- 1.73, N = 3, MIN: 2086.2)
Harness: Recurrent Neural Network Training - Data Type: u8s8f32: 3999.33 (SE +/- 5.89, N = 3, MIN: 3962.35)
Harness: Recurrent Neural Network Inference - Data Type: u8s8f32: 2135.24 (SE +/- 27.02, N = 3, MIN: 2083.3)
Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32: 2.25636 (SE +/- 0.00794, N = 3, MIN: 2.17)
Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16: 4001.99 (SE +/- 3.64, N = 3, MIN: 3964.95)
Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16: 2109.64 (SE +/- 2.08, N = 3, MIN: 2084.37)
Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32: 3.53795 (SE +/- 0.00554, N = 3, MIN: 3.28)
(CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
Mobile Neural Network 1.1.3 (ms, Fewer Is Better) - NVIDIA GeForce RTX 3090
Model: SqueezeNetV1.0: 4.143 (SE +/- 0.044, N = 3, MIN: 3.94 / MAX: 60.31)
Model: resnet-v2-50: 23.68 (SE +/- 0.06, N = 3, MIN: 23.08 / MAX: 54.3)
Model: MobileNetV2_224: 2.099 (SE +/- 0.002, N = 3, MIN: 2.05 / MAX: 6.64)
Model: mobilenet-v1-1.0: 2.734 (SE +/- 0.007, N = 3, MIN: 2.67 / MAX: 7.63)
Model: inception-v3: 28.19 (SE +/- 0.05, N = 3, MIN: 27.64 / MAX: 58.71)
(CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -rdynamic -pthread -ldl
NCNN 20201218, Target: CPU (ms, Fewer Is Better) - NVIDIA GeForce RTX 3090
Target: CPU - Model: mobilenet: 14.22 (SE +/- 0.22, N = 4, MIN: 13.84 / MAX: 23.1)
Target: CPU-v2-v2 - Model: mobilenet-v2: 3.68 (SE +/- 0.01, N = 4, MIN: 3.58 / MAX: 8.31)
Target: CPU-v3-v3 - Model: mobilenet-v3: 3.11 (SE +/- 0.01, N = 4, MIN: 3.06 / MAX: 8.02)
Target: CPU - Model: shufflenet-v2: 4.40 (SE +/- 0.04, N = 4, MIN: 4.29 / MAX: 29.98)
Target: CPU - Model: mnasnet: 3.51 (SE +/- 0.04, N = 4, MIN: 3.32 / MAX: 22.87)
Target: CPU - Model: efficientnet-b0: 5.37 (SE +/- 0.11, N = 4, MIN: 5 / MAX: 19.55)
Target: CPU - Model: blazeface: 1.40 (SE +/- 0.03, N = 4, MIN: 1.33 / MAX: 5.8)
Target: CPU - Model: googlenet: 13.13 (SE +/- 0.01, N = 4, MIN: 12.48 / MAX: 23.04)
Target: CPU - Model: vgg16: 55.98 (SE +/- 0.16, N = 4, MIN: 54.46 / MAX: 67.13)
Target: CPU - Model: resnet18: 15.15 (SE +/- 0.04, N = 4, MIN: 14.72 / MAX: 25.09)
Target: CPU - Model: alexnet: 13.48 (SE +/- 0.11, N = 4, MIN: 12.81 / MAX: 24.99)
Target: CPU - Model: resnet50: 27.47 (SE +/- 0.10, N = 4, MIN: 26.81 / MAX: 52.64)
Target: CPU - Model: yolov4-tiny: 22.15 (SE +/- 0.16, N = 4, MIN: 21.24 / MAX: 31.44)
Target: CPU - Model: squeezenet_ssd: 16.70 (SE +/- 0.02, N = 4, MIN: 16.4 / MAX: 35.06)
Target: CPU - Model: regnety_400m: 9.73 (SE +/- 0.08, N = 4, MIN: 9.45 / MAX: 34.2)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20201218, Target: Vulkan GPU (ms, Fewer Is Better) - NVIDIA GeForce RTX 3090
Target: Vulkan GPU - Model: mobilenet: 14.18 (SE +/- 0.08, N = 3, MIN: 13.89 / MAX: 18.64)
Target: Vulkan GPU-v2-v2 - Model: mobilenet-v2: 3.68 (SE +/- 0.01, N = 3, MIN: 3.6 / MAX: 8.6)
Target: Vulkan GPU-v3-v3 - Model: mobilenet-v3: 3.18 (SE +/- 0.04, N = 3, MIN: 3.03 / MAX: 20.19)
Target: Vulkan GPU - Model: shufflenet-v2: 4.31 (SE +/- 0.07, N = 3, MIN: 4.11 / MAX: 22.87)
Target: Vulkan GPU - Model: mnasnet: 3.43 (SE +/- 0.07, N = 3, MIN: 3.3 / MAX: 29.72)
Target: Vulkan GPU - Model: efficientnet-b0: 5.09 (SE +/- 0.09, N = 3, MIN: 4.94 / MAX: 9.41)
Target: Vulkan GPU - Model: blazeface: 1.40 (SE +/- 0.05, N = 3, MIN: 1.32 / MAX: 3.76)
Target: Vulkan GPU - Model: googlenet: 12.97 (SE +/- 0.06, N = 3, MIN: 12.29 / MAX: 24.79)
Target: Vulkan GPU - Model: vgg16: 55.85 (SE +/- 0.32, N = 3, MIN: 54.73 / MAX: 68)
Target: Vulkan GPU - Model: resnet18: 15.11 (SE +/- 0.02, N = 3, MIN: 14.69 / MAX: 24.68)
Target: Vulkan GPU - Model: alexnet: 13.42 (SE +/- 0.18, N = 3, MIN: 12.87 / MAX: 23.34)
Target: Vulkan GPU - Model: resnet50: 27.56 (SE +/- 0.04, N = 3, MIN: 26.94 / MAX: 58.65)
Target: Vulkan GPU - Model: yolov4-tiny: 22.63 (SE +/- 0.02, N = 3, MIN: 21.64 / MAX: 47.8)
Target: Vulkan GPU - Model: squeezenet_ssd: 16.93 (SE +/- 0.07, N = 3, MIN: 16.52 / MAX: 24.65)
Target: Vulkan GPU - Model: regnety_400m: 9.62 (SE +/- 0.04, N = 3, MIN: 9.41 / MAX: 14.08)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
TNN 0.2.3, Target: CPU (ms, Fewer Is Better) - NVIDIA GeForce RTX 3090
Model: MobileNet v2: 213.48 (SE +/- 0.11, N = 3, MIN: 212.57 / MAX: 222.67)
Model: SqueezeNet v1.1: 214.41 (SE +/- 0.06, N = 3, MIN: 214.21 / MAX: 217.45)
(CXX) g++ options: -fopenmp -pthread -fvisibility=hidden -O3 -rdynamic -ldl
DeepSpeech 0.6, Acceleration: CPU (Seconds, Fewer Is Better) - NVIDIA GeForce RTX 3090: 63.93 (SE +/- 0.22, N = 3)
RNNoise 2020-06-28 (Seconds, Fewer Is Better) - NVIDIA GeForce RTX 3090: 15.97 (SE +/- 0.02, N = 3)
(CC) gcc options (RNNoise): -O2 -pedantic -fvisibility=hidden
ECP-CANDLE 0.3 (Seconds, Fewer Is Better) - NVIDIA GeForce RTX 3090
Benchmark: P1B2: 31.60
Benchmark: P3B1: 896.15
Benchmark: P3B2: 453.13
Numenta Anomaly Benchmark 1.1 (Seconds, Fewer Is Better) - NVIDIA GeForce RTX 3090
Detector: EXPoSE: 319.33 (SE +/- 0.35, N = 3)
Detector: Relative Entropy: 19.08 (SE +/- 0.26, N = 3)
Detector: Windowed Gaussian: 13.08 (SE +/- 0.06, N = 3)
Detector: Earthgecko Skyline: 141.34 (SE +/- 0.95, N = 3)
Detector: Bayesian Changepoint: 43.28 (SE +/- 0.35, N = 3)
Mlpack Benchmark (Seconds, Fewer Is Better) - NVIDIA GeForce RTX 3090
Benchmark: scikit_ica: 33.52 (SE +/- 0.02, N = 3)
Benchmark: scikit_qda: 120.45 (SE +/- 0.77, N = 3)
Benchmark: scikit_svm: 17.81 (SE +/- 0.07, N = 3)
Benchmark: scikit_linearridgeregression: 2.43 (SE +/- 0.01, N = 3)
Scikit-Learn 0.22.1 (Seconds, Fewer Is Better) - NVIDIA GeForce RTX 3090: 8.735 (SE +/- 0.085, N = 3)
Phoronix Test Suite v10.8.4