102424machinelearningtest: Intel Core i9-12900K testing with an ASUS PRIME Z790-V AX (1802 BIOS) and dual ASUS NVIDIA GeForce RTX 3090 24GB with NVLink on Ubuntu 24.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2410281-NE-102424MAC72&grr .
102424machinelearningtest - system under test: ASUS NVIDIA GeForce RTX 3090

Processor: Intel Core i9-12900K @ 5.10GHz (16 Cores / 24 Threads)
Motherboard: ASUS PRIME Z790-V AX (1802 BIOS)
Chipset: Intel Raptor Lake-S PCH
Memory: 96GB
Disk: 2000GB Samsung SSD 970 EVO Plus 2TB
Graphics: ASUS NVIDIA GeForce RTX 3090 24GB
Audio: Intel Raptor Lake HD Audio
Monitor: S24F350
Network: Realtek RTL8111/8168/8211/8411 + Realtek Device b851
OS: Ubuntu 24.04
Kernel: 6.8.0-47-generic (x86_64)
Desktop: GNOME Shell 46.0
Display Server: X Server + Wayland
Display Driver: NVIDIA 560.35.03
OpenGL: 4.6.0
OpenCL: OpenCL 3.0 CUDA 12.6.65
Compiler: GCC 13.2.0 + CUDA 12.5
File-System: ext4
Screen Resolution: 1920x1080

Notes:
- Transparent Huge Pages: madvise
- Environment: PRIMUS_libGLa=/usr/lib/nvidia-current/libGL.so.1:/usr/lib32/nvidia-current/libGL.so.1:/usr/lib/x86_64-linux-gnu/libGL.so.1:/usr/lib/i386-linux-gnu/libGL.so.1 PRIMUS_libGLd=/usr/$LIB/libGL.so.1:/usr/lib/$LIB/libGL.so.1:/usr/$LIB/mesa/libGL.so.1:/usr/lib/$LIB/mesa/libGL.so.1
- GCC configured with: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-uJ7kn6/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-uJ7kn6/gcc-13-13.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
- Scaling Governor: intel_pstate powersave (EPP: balance_performance)
- CPU Microcode: 0x37
- Thermald 2.5.6
- BAR1 / Visible vRAM Size: 32768 MiB
- vBIOS Version: 94.02.4b.00.0b
- GPU Compute Cores: 10496
- Python 3.12.3
- Security: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + reg_file_data_sampling: Mitigation of Clear Register File + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; RSB filling; PBRSB-eIBRS: SW sequence; BHI: BHI_DIS_S + srbds: Not affected + tsx_async_abort: Not affected
102424machinelearningtest tensorflow: GPU - 512 - VGG-16 tensorflow: GPU - 256 - VGG-16 tensorflow: GPU - 512 - ResNet-50 tensorflow: CPU - 512 - VGG-16 tensorflow: GPU - 256 - ResNet-50 tensorflow: GPU - 64 - VGG-16 tensorflow: CPU - 256 - VGG-16 tensorflow: GPU - 512 - GoogLeNet tensorflow: CPU - 512 - ResNet-50 whisper-cpp: ggml-medium.en - 2016 State of the Union scikit-learn: Isotonic / Perturbed Logarithm scikit-learn: Isotonic / Logistic scikit-learn: Hist Gradient Boosting Adult tensorflow: GPU - 32 - VGG-16 scikit-learn: Hist Gradient Boosting shoc: OpenCL - Max SP Flops tensorflow: GPU - 512 - AlexNet tensorflow: GPU - 256 - GoogLeNet tensorflow: CPU - 256 - ResNet-50 tensorflow: GPU - 64 - ResNet-50 xnnpack: QS8MobileNetV2 xnnpack: FP16MobileNetV3Small xnnpack: FP16MobileNetV3Large xnnpack: FP16MobileNetV2 xnnpack: FP16MobileNetV1 xnnpack: FP32MobileNetV3Small xnnpack: FP32MobileNetV3Large xnnpack: FP32MobileNetV2 xnnpack: FP32MobileNetV1 pytorch: CPU - 16 - Efficientnet_v2_l pytorch: CPU - 512 - Efficientnet_v2_l tensorflow: GPU - 16 - VGG-16 scikit-learn: Sparse Rand Projections / 100 Iterations tensorflow: CPU - 64 - VGG-16 scikit-learn: SAGA pytorch: CPU - 16 - ResNet-152 pytorch: CPU - 512 - ResNet-152 tensorflow: GPU - 256 - AlexNet whisper-cpp: ggml-small.en - 2016 State of the Union pytorch: CPU - 256 - ResNet-152 tensorflow: CPU - 512 - GoogLeNet lczero: BLAS ncnn: Vulkan GPU - FastestDet ncnn: Vulkan GPU - vision_transformer ncnn: Vulkan GPU - regnety_400m ncnn: Vulkan GPU - squeezenet_ssd ncnn: Vulkan GPU - yolov4-tiny ncnn: Vulkan GPU - resnet50 ncnn: Vulkan GPU - alexnet ncnn: Vulkan GPU - resnet18 ncnn: Vulkan GPU - vgg16 ncnn: Vulkan GPU - googlenet ncnn: Vulkan GPU - blazeface ncnn: Vulkan GPU - efficientnet-b0 ncnn: Vulkan GPU - mnasnet ncnn: Vulkan GPU - shufflenet-v2 ncnn: Vulkan GPU-v3-v3 - mobilenet-v3 ncnn: Vulkan GPU-v2-v2 - mobilenet-v2 ncnn: Vulkan GPU - mobilenet ncnn: CPU - FastestDet ncnn: CPU - vision_transformer ncnn: CPU - regnety_400m ncnn: CPU - squeezenet_ssd ncnn: CPU - yolov4-tiny ncnn: CPU - resnet50 ncnn: CPU - alexnet ncnn: CPU - resnet18 ncnn: CPU - vgg16 ncnn: CPU - googlenet ncnn: CPU - blazeface ncnn: CPU - efficientnet-b0 ncnn: CPU - mnasnet ncnn: CPU - shufflenet-v2 ncnn: CPU-v3-v3 - mobilenet-v3 ncnn: CPU-v2-v2 - mobilenet-v2 ncnn: CPU - mobilenet scikit-learn: SGDOneClassSVM tensorflow: GPU - 32 - ResNet-50 scikit-learn: Plot Parallel Pairwise scikit-learn: Hist Gradient Boosting Higgs Boson scikit-learn: Covertype Dataset Benchmark tensorflow: CPU - 32 - VGG-16 tensorflow: GPU - 1 - VGG-16 scikit-learn: Lasso pytorch: CPU - 256 - ResNet-50 pytorch: CPU - 64 - ResNet-50 pytorch: CPU - 1 - ResNet-152 tensorflow-lite: NASNet Mobile onnx: fcn-resnet101-11 - CPU - Parallel onnx: fcn-resnet101-11 - CPU - Parallel tensorflow-lite: Inception ResNet V2 tensorflow-lite: Inception V4 tensorflow: CPU - 256 - GoogLeNet onnx: yolov4 - CPU - Parallel onnx: yolov4 - CPU - Parallel onnx: T5 Encoder - CPU - Parallel onnx: T5 Encoder - CPU - Parallel tensorflow-lite: Mobilenet Float tensorflow-lite: SqueezeNet onnx: ResNet50 v1-12-int8 - CPU - Parallel onnx: ResNet50 v1-12-int8 - CPU - Parallel onnx: super-resolution-10 - CPU - Parallel onnx: super-resolution-10 - CPU - Parallel pytorch: CPU - 256 - Efficientnet_v2_l tensorflow: CPU - 64 - ResNet-50 tensorflow: GPU - 64 - GoogLeNet scikit-learn: Hist Gradient Boosting Categorical Only pytorch: CPU - 64 - Efficientnet_v2_l pytorch: CPU - 32 - Efficientnet_v2_l scikit-learn: TSNE MNIST Dataset 
scikit-learn: GLM tensorflow: CPU - 512 - AlexNet scikit-learn: Tree tensorflow: GPU - 16 - ResNet-50 scikit-learn: Isolation Forest whisper-cpp: ggml-base.en - 2016 State of the Union scikit-learn: Plot Hierarchical tensorflow: CPU - 16 - VGG-16 scikit-learn: LocalOutlierFactor scikit-learn: Hist Gradient Boosting Threading pytorch: CPU - 32 - ResNet-152 scikit-learn: Plot Polynomial Kernel Approximation tensorflow: GPU - 64 - AlexNet scikit-learn: Feature Expansions pytorch: CPU - 64 - ResNet-152 tensorflow: GPU - 32 - GoogLeNet tensorflow: CPU - 32 - ResNet-50 scikit-learn: Plot Neighbors tensorflow: CPU - 256 - AlexNet scikit-learn: Plot Incremental PCA scikit-learn: Sparsify scikit-learn: Sample Without Replacement opencv: DNN - Deep Neural Network mnn: inception-v3 mnn: mobilenet-v1-1.0 mnn: MobileNetV2_224 mnn: SqueezeNetV1.0 mnn: resnet-v2-50 mnn: squeezenetv1.1 mnn: mobilenetV3 mnn: nasnet pytorch: CPU - 1 - Efficientnet_v2_l numpy: scikit-learn: Kernel PCA Solvers / Time vs. N Samples tensorflow: GPU - 32 - AlexNet scikit-learn: SGD Regression onnx: yolov4 - CPU - Standard onnx: yolov4 - CPU - Standard onednn: Recurrent Neural Network Training - CPU tensorflow: CPU - 64 - GoogLeNet onednn: Recurrent Neural Network Inference - CPU scikit-learn: MNIST Dataset tensorflow: GPU - 16 - GoogLeNet tensorflow: CPU - 16 - ResNet-50 pytorch: CPU - 16 - ResNet-50 openvino: Face Detection FP16 - CPU openvino: Face Detection FP16 - CPU pytorch: CPU - 512 - ResNet-50 onnx: ResNet101_DUC_HDC-12 - CPU - Parallel onnx: ResNet101_DUC_HDC-12 - CPU - Parallel onnx: ResNet101_DUC_HDC-12 - CPU - Standard onnx: ResNet101_DUC_HDC-12 - CPU - Standard openvino: Face Detection FP16-INT8 - CPU openvino: Face Detection FP16-INT8 - CPU pytorch: CPU - 32 - ResNet-50 onnx: fcn-resnet101-11 - CPU - Standard onnx: fcn-resnet101-11 - CPU - Standard openvino: Machine Translation EN To DE FP16 - CPU openvino: Machine Translation EN To DE FP16 - CPU openvino: Person Detection FP16 - CPU openvino: Person Detection FP16 - CPU openvino: Person Detection FP32 - CPU openvino: Person Detection FP32 - CPU onnx: ZFNet-512 - CPU - Parallel onnx: ZFNet-512 - CPU - Parallel onnx: ZFNet-512 - CPU - Standard onnx: ZFNet-512 - CPU - Standard openvino: Road Segmentation ADAS FP16-INT8 - CPU openvino: Road Segmentation ADAS FP16-INT8 - CPU onnx: T5 Encoder - CPU - Standard onnx: T5 Encoder - CPU - Standard scikit-learn: Plot OMP vs. 
LARS tensorflow-lite: Mobilenet Quant openvino: Handwritten English Recognition FP16 - CPU openvino: Handwritten English Recognition FP16 - CPU openvino: Noise Suppression Poconet-Like FP16 - CPU openvino: Noise Suppression Poconet-Like FP16 - CPU openvino: Handwritten English Recognition FP16-INT8 - CPU openvino: Handwritten English Recognition FP16-INT8 - CPU openvino: Person Vehicle Bike Detection FP16 - CPU openvino: Person Vehicle Bike Detection FP16 - CPU openvino: Road Segmentation ADAS FP16 - CPU openvino: Road Segmentation ADAS FP16 - CPU openvino: Person Re-Identification Retail FP16 - CPU openvino: Person Re-Identification Retail FP16 - CPU openvino: Vehicle Detection FP16-INT8 - CPU openvino: Vehicle Detection FP16-INT8 - CPU openvino: Weld Porosity Detection FP16 - CPU openvino: Weld Porosity Detection FP16 - CPU openvino: Face Detection Retail FP16-INT8 - CPU openvino: Face Detection Retail FP16-INT8 - CPU openvino: Age Gender Recognition Retail 0013 FP16-INT8 - CPU openvino: Age Gender Recognition Retail 0013 FP16-INT8 - CPU openvino: Vehicle Detection FP16 - CPU openvino: Vehicle Detection FP16 - CPU openvino: Weld Porosity Detection FP16-INT8 - CPU openvino: Weld Porosity Detection FP16-INT8 - CPU openvino: Face Detection Retail FP16 - CPU openvino: Face Detection Retail FP16 - CPU onnx: CaffeNet 12-int8 - CPU - Parallel onnx: CaffeNet 12-int8 - CPU - Parallel openvino: Age Gender Recognition Retail 0013 FP16 - CPU openvino: Age Gender Recognition Retail 0013 FP16 - CPU onnx: CaffeNet 12-int8 - CPU - Standard onnx: CaffeNet 12-int8 - CPU - Standard onnx: ResNet50 v1-12-int8 - CPU - Standard onnx: ResNet50 v1-12-int8 - CPU - Standard onnx: super-resolution-10 - CPU - Standard onnx: super-resolution-10 - CPU - Standard scikit-learn: Plot Ward tensorflow: GPU - 1 - GoogLeNet scikit-learn: Text Vectorizers tensorflow: GPU - 16 - AlexNet deepspeech: CPU tensorflow: CPU - 32 - GoogLeNet tensorflow: CPU - 64 - AlexNet scikit-learn: Kernel PCA Solvers / Time vs. 
N Components pytorch: NVIDIA CUDA GPU - 32 - Efficientnet_v2_l pytorch: NVIDIA CUDA GPU - 64 - Efficientnet_v2_l pytorch: NVIDIA CUDA GPU - 256 - Efficientnet_v2_l pytorch: NVIDIA CUDA GPU - 16 - Efficientnet_v2_l pytorch: NVIDIA CUDA GPU - 512 - Efficientnet_v2_l tensorflow: CPU - 1 - VGG-16 pytorch: CPU - 1 - ResNet-50 onednn: Deconvolution Batch shapes_1d - CPU tensorflow: CPU - 32 - AlexNet tensorflow: GPU - 1 - ResNet-50 tensorflow: CPU - 16 - GoogLeNet shoc: OpenCL - Texture Read Bandwidth pytorch: NVIDIA CUDA GPU - 64 - ResNet-152 pytorch: NVIDIA CUDA GPU - 512 - ResNet-152 pytorch: NVIDIA CUDA GPU - 32 - ResNet-152 pytorch: NVIDIA CUDA GPU - 16 - ResNet-152 pytorch: NVIDIA CUDA GPU - 256 - ResNet-152 pytorch: NVIDIA CUDA GPU - 1 - Efficientnet_v2_l onednn: IP Shapes 1D - CPU tensorflow: CPU - 16 - AlexNet scikit-learn: 20 Newsgroups / Logistic Regression rbenchmark: tensorflow: CPU - 1 - ResNet-50 tensorflow: GPU - 1 - AlexNet pytorch: NVIDIA CUDA GPU - 1 - ResNet-152 onednn: IP Shapes 3D - CPU tensorflow: CPU - 1 - AlexNet pytorch: NVIDIA CUDA GPU - 256 - ResNet-50 pytorch: NVIDIA CUDA GPU - 32 - ResNet-50 pytorch: NVIDIA CUDA GPU - 64 - ResNet-50 pytorch: NVIDIA CUDA GPU - 512 - ResNet-50 pytorch: NVIDIA CUDA GPU - 16 - ResNet-50 rnnoise: 26 Minute Long Talking Sample onednn: Convolution Batch Shapes Auto - CPU pytorch: NVIDIA CUDA GPU - 1 - ResNet-50 tensorflow: CPU - 1 - GoogLeNet onednn: Deconvolution Batch shapes_3d - CPU shoc: OpenCL - GEMM SGEMM_N shoc: OpenCL - Bus Speed Readback shoc: OpenCL - Bus Speed Download shoc: OpenCL - Triad shoc: OpenCL - FFT SP shoc: OpenCL - Reduction shoc: OpenCL - S3D shoc: OpenCL - MD5 Hash deepsparse: ResNet-50, Baseline - Synchronous Single-Stream ASUS NVIDIA GeForce RTX 3090 2.29 2.28 8.04 8.95 8.04 2.27 9.11 27.08 28.13 2000.10000 1469.791 1447.693 1173.890 2.27 1120.037 41982.2 44.64 27.04 27.49 8.00 904 928 1834 1646 2197 820 1585 1125 1394 6.97 7.26 2.25 587.844 9.31 521.305 11.49 11.46 44.05 638.01092 11.45 93.78 166 6.56 70.75 84.85 14.25 18.17 22.13 5.49 7.29 28.61 19.43 6.00 18.73 5.43 10.51 7.93 5.72 17.34 6.47 70.73 82.63 13.62 18.40 22.09 5.58 7.50 28.60 19.65 5.95 19.35 5.46 11.05 8.28 5.53 16.46 196.466 7.98 332.194 190.582 317.462 9.22 1.57 258.724 29.33 29.52 18.80 311542 604.549 1.65652 128179 26961.0 93.45 95.5591 10.49253 8.68614 115.166 1383.02 1904.01 6.14425 162.891 13.9393 71.8081 6.48 26.52 26.91 195.295 7.82 7.92 188.326 183.476 245.67 42.771 7.93 147.516 209.01863 153.056 8.97 35.645 140.749 10.16 133.120 42.49 120.329 11.66 26.63 26.82 96.755 227.40 21.307 85.975 84.273 25552 21.367 2.084 1.882 2.935 14.361 1.836 1.012 6.956 10.55 644.54 69.634 41.45 63.791 89.7581 11.1456 2702.00 94.21 1452.76 53.539 25.99 27.80 30.07 1375.14 4.33 30.14 1345.66 0.743436 1059.49 0.943867 382.74 15.61 30.27 400.351 2.49779 115.87 51.69 146.25 40.98 144.47 41.47 12.5924 79.4345 10.8141 92.4654 21.09 283.34 6.54304 152.798 45.622 2317.67 88.65 225.32 9.02 656.86 74.85 266.87 11.29 528.13 69.09 86.71 9.01 659.72 8.35 712.75 44.26 450.44 2.97 1948.78 0.62 29697.31 20.16 296.62 12.81 1544.54 5.02 1180.46 3.25179 307.372 1.66 11244.68 1.54657 646.109 3.19151 313.235 12.2276 81.7772 43.615 12.00 38.906 38.93 54.62917 96.69 192.13 25.585 84.09 85.37 85.57 85.74 86.27 4.12 49.49 4.47259 167.17 5.62 101.50 2178.63 162.69 162.35 162.45 162.59 162.68 88.41 2.89253 129.70 9.614 0.0970 14.04 14.14 172.05 9.32676 15.83 413.12 413.19 413.29 414.51 414.29 7.245 8.24915 469.71 47.92 5.57785 8456.99 6.7652 6.6413 6.6057 2510.98 406.411 455.955 
45.4489 OpenBenchmarking.org
TensorFlow 2.16.1 - Device: GPU - Batch Size: 512 - Model: VGG-16: 2.29 images/sec (more is better; SE +/- 0.00, N = 3)
TensorFlow 2.16.1 - Device: GPU - Batch Size: 256 - Model: VGG-16: 2.28 images/sec (more is better; SE +/- 0.00, N = 3)
TensorFlow 2.16.1 - Device: GPU - Batch Size: 512 - Model: ResNet-50: 8.04 images/sec (more is better; SE +/- 0.01, N = 3)
TensorFlow 2.16.1 - Device: CPU - Batch Size: 512 - Model: VGG-16: 8.95 images/sec (more is better; SE +/- 0.06, N = 3)
TensorFlow 2.16.1 - Device: GPU - Batch Size: 256 - Model: ResNet-50: 8.04 images/sec (more is better; SE +/- 0.00, N = 3)
TensorFlow 2.16.1 - Device: GPU - Batch Size: 64 - Model: VGG-16: 2.27 images/sec (more is better; SE +/- 0.00, N = 3)
TensorFlow 2.16.1 - Device: CPU - Batch Size: 256 - Model: VGG-16: 9.11 images/sec (more is better; SE +/- 0.01, N = 3)
TensorFlow 2.16.1 - Device: GPU - Batch Size: 512 - Model: GoogLeNet: 27.08 images/sec (more is better; SE +/- 0.04, N = 3)
TensorFlow 2.16.1 - Device: CPU - Batch Size: 512 - Model: ResNet-50: 28.13 images/sec (more is better; SE +/- 0.01, N = 3)
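The TensorFlow figures in this file are inference throughput in images/sec at the listed batch size on the CPU or GPU device. As an illustration only (this is not the Phoronix Test Suite's tensorflow test profile), a minimal Python sketch of how such a number can be estimated for a Keras model, assuming TensorFlow is installed and using random weights and random input:

```python
# Minimal sketch (not the actual Phoronix Test Suite harness): estimate
# TensorFlow inference throughput in images/sec for a Keras model at a
# fixed batch size, using random weights and random input data.
import time
import numpy as np
import tensorflow as tf

def images_per_sec(model_ctor=tf.keras.applications.VGG16,
                   batch_size=512, steps=5):
    model = model_ctor(weights=None)                      # topology only
    x = np.random.rand(batch_size, 224, 224, 3).astype("float32")
    model.predict(x, verbose=0)                           # warm-up
    start = time.perf_counter()
    for _ in range(steps):
        model.predict(x, verbose=0)
    elapsed = time.perf_counter() - start
    return steps * batch_size / elapsed

if __name__ == "__main__":
    print(f"VGG-16, batch 512: {images_per_sec():.2f} images/sec")
```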
Whisper.cpp 1.6.2 - Model: ggml-medium.en - Input: 2016 State of the Union: 2000.10 seconds (fewer is better; SE +/- 2.12, N = 3)
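Every entry in this file reports an SE +/- value alongside N, the number of recorded runs. On OpenBenchmarking.org results this is conventionally the standard error of the mean across those runs; a minimal sketch of that calculation with made-up run times (not values taken from this file):

```python
# Standard error of the mean across N benchmark runs (illustrative values only).
import statistics

runs = [2000.3, 1998.1, 2001.9]                 # hypothetical per-run times in seconds
mean = statistics.mean(runs)
se = statistics.stdev(runs) / len(runs) ** 0.5  # sample std dev / sqrt(N)
print(f"{mean:.2f} (SE +/- {se:.2f}, N = {len(runs)})")
```

With only three runs the SE mostly indicates run-to-run stability rather than a tight confidence bound.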
Scikit-Learn 1.2.2 - Benchmark: Isotonic / Perturbed Logarithm: 1469.79 seconds (fewer is better; SE +/- 0.23, N = 3)
Scikit-Learn 1.2.2 - Benchmark: Isotonic / Logistic: 1447.69 seconds (fewer is better; SE +/- 0.80, N = 3)
Scikit-Learn 1.2.2 - Benchmark: Hist Gradient Boosting Adult: 1173.89 seconds (fewer is better; SE +/- 4.93, N = 3)
TensorFlow 2.16.1 - Device: GPU - Batch Size: 32 - Model: VGG-16: 2.27 images/sec (more is better; SE +/- 0.00, N = 3)
Scikit-Learn 1.2.2 - Benchmark: Hist Gradient Boosting: 1120.04 seconds (fewer is better; SE +/- 3.91, N = 3)
SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: Max SP Flops: 41982.2 GFLOPS (more is better; SE +/- 208.39, N = 3)
TensorFlow 2.16.1 - Device: GPU - Batch Size: 512 - Model: AlexNet: 44.64 images/sec (more is better; SE +/- 0.42, N = 3)
TensorFlow 2.16.1 - Device: GPU - Batch Size: 256 - Model: GoogLeNet: 27.04 images/sec (more is better; SE +/- 0.05, N = 3)
TensorFlow 2.16.1 - Device: CPU - Batch Size: 256 - Model: ResNet-50: 27.49 images/sec (more is better; SE +/- 0.18, N = 3)
TensorFlow 2.16.1 - Device: GPU - Batch Size: 64 - Model: ResNet-50: 8.00 images/sec (more is better; SE +/- 0.00, N = 3)
XNNPACK b7b048 - Model: QS8MobileNetV2: 904 us (fewer is better; SE +/- 35.09, N = 12)
XNNPACK b7b048 - Model: FP16MobileNetV3Small: 928 us (fewer is better; SE +/- 16.32, N = 12)
XNNPACK b7b048 - Model: FP16MobileNetV3Large: 1834 us (fewer is better; SE +/- 29.31, N = 12)
XNNPACK b7b048 - Model: FP16MobileNetV2: 1646 us (fewer is better; SE +/- 50.43, N = 12)
XNNPACK b7b048 - Model: FP16MobileNetV1: 2197 us (fewer is better; SE +/- 74.47, N = 12)
XNNPACK b7b048 - Model: FP32MobileNetV3Small: 820 us (fewer is better; SE +/- 29.73, N = 12)
XNNPACK b7b048 - Model: FP32MobileNetV3Large: 1585 us (fewer is better; SE +/- 66.95, N = 12)
XNNPACK b7b048 - Model: FP32MobileNetV2: 1125 us (fewer is better; SE +/- 17.99, N = 12)
XNNPACK b7b048 - Model: FP32MobileNetV1: 1394 us (fewer is better; SE +/- 27.87, N = 12)
PyTorch 2.2.1 - Device: CPU - Batch Size: 16 - Model: Efficientnet_v2_l: 6.97 batches/sec (more is better; SE +/- 0.22, N = 9; MIN: 6.34 / MAX: 8.11)
PyTorch 2.2.1 - Device: CPU - Batch Size: 512 - Model: Efficientnet_v2_l: 7.26 batches/sec (more is better; SE +/- 0.24, N = 9; MIN: 6.42 / MAX: 8.05)
TensorFlow 2.16.1 - Device: GPU - Batch Size: 16 - Model: VGG-16: 2.25 images/sec (more is better; SE +/- 0.00, N = 3)
Scikit-Learn 1.2.2 - Benchmark: Sparse Random Projections / 100 Iterations: 587.84 seconds (fewer is better; SE +/- 1.77, N = 3)
TensorFlow 2.16.1 - Device: CPU - Batch Size: 64 - Model: VGG-16: 9.31 images/sec (more is better; SE +/- 0.01, N = 3)
Scikit-Learn 1.2.2 - Benchmark: SAGA: 521.31 seconds (fewer is better; SE +/- 0.32, N = 3)
PyTorch 2.2.1 - Device: CPU - Batch Size: 16 - Model: ResNet-152: 11.49 batches/sec (more is better; SE +/- 0.20, N = 12; MIN: 10.07 / MAX: 12.18)
PyTorch 2.2.1 - Device: CPU - Batch Size: 512 - Model: ResNet-152: 11.46 batches/sec (more is better; SE +/- 0.14, N = 12; MIN: 10.06 / MAX: 11.81)
TensorFlow 2.16.1 - Device: GPU - Batch Size: 256 - Model: AlexNet: 44.05 images/sec (more is better; SE +/- 0.40, N = 3)
Whisper.cpp 1.6.2 - Model: ggml-small.en - Input: 2016 State of the Union: 638.01 seconds (fewer is better; SE +/- 0.42, N = 3)
PyTorch 2.2.1 - Device: CPU - Batch Size: 256 - Model: ResNet-152: 11.45 batches/sec (more is better; SE +/- 0.19, N = 12; MIN: 10.05 / MAX: 12.11)
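The PyTorch results are inference throughput in batches/sec. A minimal sketch of that kind of measurement for ResNet-152 on the CPU, assuming torch and torchvision are installed; it is not the pts/pytorch test profile itself:

```python
# Minimal sketch (not the pts/pytorch test profile): measure PyTorch
# inference throughput in batches/sec for ResNet-152 on the CPU.
import time
import torch
import torchvision.models as models

def batches_per_sec(batch_size=16, steps=10, device="cpu"):
    model = models.resnet152(weights=None).to(device).eval()
    x = torch.randn(batch_size, 3, 224, 224, device=device)
    with torch.no_grad():
        model(x)                                  # warm-up
        start = time.perf_counter()
        for _ in range(steps):
            model(x)
        elapsed = time.perf_counter() - start
    return steps / elapsed

if __name__ == "__main__":
    print(f"ResNet-152, batch 16, CPU: {batches_per_sec():.2f} batches/sec")
```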
TensorFlow 2.16.1 - Device: CPU - Batch Size: 512 - Model: GoogLeNet: 93.78 images/sec (more is better; SE +/- 0.03, N = 3)
LeelaChessZero 0.31.1 - Backend: BLAS: 166 nodes per second (more is better; SE +/- 1.83, N = 5)
NCNN 20230517 - Target: Vulkan GPU - Model: FastestDet: 6.56 ms (fewer is better; SE +/- 0.07, N = 9; MIN: 5.01 / MAX: 14.9)
NCNN 20230517 - Target: Vulkan GPU - Model: vision_transformer: 70.75 ms (fewer is better; SE +/- 0.03, N = 9; MIN: 67.65 / MAX: 81.35)
NCNN 20230517 - Target: Vulkan GPU - Model: regnety_400m: 84.85 ms (fewer is better; SE +/- 0.69, N = 9; MIN: 22.37 / MAX: 121)
NCNN 20230517 - Target: Vulkan GPU - Model: squeezenet_ssd: 14.25 ms (fewer is better; SE +/- 0.48, N = 9; MIN: 7.96 / MAX: 30.38)
NCNN 20230517 - Target: Vulkan GPU - Model: yolov4-tiny: 18.17 ms (fewer is better; SE +/- 0.19, N = 9; MIN: 14.35 / MAX: 48.07)
NCNN 20230517 - Target: Vulkan GPU - Model: resnet50: 22.13 ms (fewer is better; SE +/- 0.46, N = 9; MIN: 13.15 / MAX: 49.98)
NCNN 20230517 - Target: Vulkan GPU - Model: alexnet: 5.49 ms (fewer is better; SE +/- 0.01, N = 8; MIN: 5.21 / MAX: 6.7)
NCNN 20230517 - Target: Vulkan GPU - Model: resnet18: 7.29 ms (fewer is better; SE +/- 0.07, N = 9; MIN: 5.96 / MAX: 11.88)
NCNN 20230517 - Target: Vulkan GPU - Model: vgg16: 28.61 ms (fewer is better; SE +/- 0.01, N = 9; MIN: 27.79 / MAX: 31.77)
NCNN 20230517 - Target: Vulkan GPU - Model: googlenet: 19.43 ms (fewer is better; SE +/- 0.55, N = 9; MIN: 9.05 / MAX: 43.99)
NCNN 20230517 - Target: Vulkan GPU - Model: blazeface: 6.00 ms (fewer is better; SE +/- 0.11, N = 8; MIN: 2.81 / MAX: 17.23)
NCNN 20230517 - Target: Vulkan GPU - Model: efficientnet-b0: 18.73 ms (fewer is better; SE +/- 0.83, N = 9; MIN: 7.31 / MAX: 37.64)
NCNN 20230517 - Target: Vulkan GPU - Model: mnasnet: 5.43 ms (fewer is better; SE +/- 0.03, N = 9; MIN: 4.16 / MAX: 9.81)
NCNN 20230517 - Target: Vulkan GPU - Model: shufflenet-v2: 10.51 ms (fewer is better; SE +/- 0.50, N = 9; MIN: 4.55 / MAX: 29.19)
NCNN 20230517 - Target: Vulkan GPU-v3-v3 - Model: mobilenet-v3: 7.93 ms (fewer is better; SE +/- 0.60, N = 9; MIN: 4.48 / MAX: 21.46)
NCNN 20230517 - Target: Vulkan GPU-v2-v2 - Model: mobilenet-v2: 5.72 ms (fewer is better; SE +/- 0.13, N = 9; MIN: 4.2 / MAX: 19.44)
NCNN 20230517 - Target: Vulkan GPU - Model: mobilenet: 17.34 ms (fewer is better; SE +/- 0.37, N = 9; MIN: 9.39 / MAX: 40.25)
NCNN 20230517 - Target: CPU - Model: FastestDet: 6.47 ms (fewer is better; SE +/- 0.13, N = 9; MIN: 4.73 / MAX: 17.1)
NCNN 20230517 - Target: CPU - Model: vision_transformer: 70.73 ms (fewer is better; SE +/- 0.06, N = 9; MIN: 67.43 / MAX: 80.71)
NCNN 20230517 - Target: CPU - Model: regnety_400m: 82.63 ms (fewer is better; SE +/- 0.75, N = 9; MIN: 22.37 / MAX: 120.26)
NCNN 20230517 - Target: CPU - Model: squeezenet_ssd: 13.62 ms (fewer is better; SE +/- 0.29, N = 9; MIN: 8.06 / MAX: 30.15)
NCNN 20230517 - Target: CPU - Model: yolov4-tiny: 18.40 ms (fewer is better; SE +/- 0.31, N = 9; MIN: 14.15 / MAX: 42.77)
NCNN 20230517 - Target: CPU - Model: resnet50: 22.09 ms (fewer is better; SE +/- 0.34, N = 9; MIN: 13.37 / MAX: 48.04)
NCNN 20230517 - Target: CPU - Model: alexnet: 5.58 ms (fewer is better; SE +/- 0.05, N = 9; MIN: 5.23 / MAX: 13.31)
NCNN 20230517 - Target: CPU - Model: resnet18: 7.50 ms (fewer is better; SE +/- 0.04, N = 9; MIN: 5.96 / MAX: 14.47)
NCNN 20230517 - Target: CPU - Model: vgg16: 28.60 ms (fewer is better; SE +/- 0.01, N = 9; MIN: 27.69 / MAX: 31.26)
NCNN 20230517 - Target: CPU - Model: googlenet: 19.65 ms (fewer is better; SE +/- 0.62, N = 9; MIN: 9.41 / MAX: 46.01)
NCNN 20230517 - Target: CPU - Model: blazeface: 5.95 ms (fewer is better; SE +/- 0.11, N = 9; MIN: 2.83 / MAX: 16.59)
NCNN 20230517 - Target: CPU - Model: efficientnet-b0: 19.35 ms (fewer is better; SE +/- 0.78, N = 9; MIN: 7.5 / MAX: 39.35)
NCNN 20230517 - Target: CPU - Model: mnasnet: 5.46 ms (fewer is better; SE +/- 0.05, N = 9; MIN: 4.13 / MAX: 13.57)
NCNN 20230517 - Target: CPU - Model: shufflenet-v2: 11.05 ms (fewer is better; SE +/- 0.58, N = 9; MIN: 4.48 / MAX: 28.4)
NCNN 20230517 - Target: CPU-v3-v3 - Model: mobilenet-v3: 8.28 ms (fewer is better; SE +/- 0.34, N = 9; MIN: 4.71 / MAX: 21.8)
NCNN 20230517 - Target: CPU-v2-v2 - Model: mobilenet-v2: 5.53 ms (fewer is better; SE +/- 0.07, N = 9; MIN: 4.11 / MAX: 12.43)
NCNN 20230517 - Target: CPU - Model: mobilenet: 16.46 ms (fewer is better; SE +/- 0.47, N = 9; MIN: 9.33 / MAX: 40.48)
Scikit-Learn 1.2.2 - Benchmark: SGDOneClassSVM: 196.47 seconds (fewer is better; SE +/- 1.90, N = 6)
TensorFlow 2.16.1 - Device: GPU - Batch Size: 32 - Model: ResNet-50: 7.98 images/sec (more is better; SE +/- 0.01, N = 3)
Scikit-Learn 1.2.2 - Benchmark: Plot Parallel Pairwise: 332.19 seconds (fewer is better; SE +/- 1.13, N = 3)
Scikit-Learn 1.2.2 - Benchmark: Hist Gradient Boosting Higgs Boson: 190.58 seconds (fewer is better; SE +/- 2.11, N = 5)
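The Scikit-Learn entries are wall-clock seconds for scikit-learn's bundled benchmark scripts (fewer is better). As a rough, simplified stand-in for the "Hist Gradient Boosting" style of workload (synthetic data and arbitrary parameters, not the benchmark's actual configuration):

```python
# Simplified stand-in (not the scikit-learn benchmark script itself):
# time a HistGradientBoostingClassifier fit on synthetic data.
import time
from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier

X, y = make_classification(n_samples=200_000, n_features=20, random_state=0)
clf = HistGradientBoostingClassifier(max_iter=100, random_state=0)

start = time.perf_counter()
clf.fit(X, y)
print(f"fit time: {time.perf_counter() - start:.2f} seconds")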
Scikit-Learn 1.2.2 - Benchmark: Covertype Dataset Benchmark: 317.46 seconds (fewer is better; SE +/- 0.13, N = 3)
TensorFlow 2.16.1 - Device: CPU - Batch Size: 32 - Model: VGG-16: 9.22 images/sec (more is better; SE +/- 0.02, N = 3)
TensorFlow 2.16.1 - Device: GPU - Batch Size: 1 - Model: VGG-16: 1.57 images/sec (more is better; SE +/- 0.02, N = 15)
Scikit-Learn 1.2.2 - Benchmark: Lasso: 258.72 seconds (fewer is better; SE +/- 1.16, N = 3)
PyTorch 2.2.1 - Device: CPU - Batch Size: 256 - Model: ResNet-50: 29.33 batches/sec (more is better; SE +/- 0.38, N = 15; MIN: 25.91 / MAX: 30.51)
PyTorch 2.2.1 - Device: CPU - Batch Size: 64 - Model: ResNet-50: 29.52 batches/sec (more is better; SE +/- 0.32, N = 15; MIN: 25.74 / MAX: 30.64)
PyTorch 2.2.1 - Device: CPU - Batch Size: 1 - Model: ResNet-152: 18.80 batches/sec (more is better; SE +/- 0.44, N = 15; MIN: 15.08 / MAX: 19.89)
TensorFlow Lite 2022-05-18 - Model: NASNet Mobile: 311542 microseconds (fewer is better; SE +/- 11554.76, N = 15)
ONNX Runtime 1.19 - Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel: 604.55 ms inference time cost (fewer is better; SE +/- 6.22, N = 15)
ONNX Runtime 1.19 - Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel: 1.65652 inferences per second (more is better; SE +/- 0.01666, N = 15)
TensorFlow Lite 2022-05-18 - Model: Inception ResNet V2: 128179 microseconds (fewer is better; SE +/- 4137.26, N = 15)
TensorFlow Lite 2022-05-18 - Model: Inception V4: 26961.0 microseconds (fewer is better; SE +/- 395.13, N = 15)
TensorFlow 2.16.1 - Device: CPU - Batch Size: 256 - Model: GoogLeNet: 93.45 images/sec (more is better; SE +/- 0.03, N = 3)
ONNX Runtime 1.19 - Model: yolov4 - Device: CPU - Executor: Parallel: 95.56 ms inference time cost (fewer is better; SE +/- 1.36, N = 15)
ONNX Runtime 1.19 - Model: yolov4 - Device: CPU - Executor: Parallel: 10.49 inferences per second (more is better; SE +/- 0.14, N = 15)
ONNX Runtime 1.19 - Model: T5 Encoder - Device: CPU - Executor: Parallel: 8.68614 ms inference time cost (fewer is better; SE +/- 0.05490, N = 15)
ONNX Runtime 1.19 - Model: T5 Encoder - Device: CPU - Executor: Parallel: 115.17 inferences per second (more is better; SE +/- 0.72, N = 15)
TensorFlow Lite 2022-05-18 - Model: Mobilenet Float: 1383.02 microseconds (fewer is better; SE +/- 17.51, N = 15)
TensorFlow Lite 2022-05-18 - Model: SqueezeNet: 1904.01 microseconds (fewer is better; SE +/- 15.29, N = 15)
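The TensorFlow Lite results are per-inference times in microseconds. A minimal sketch of timing a .tflite model with the Python interpreter; the model path below is a placeholder, not a file referenced by this result:

```python
# Minimal sketch: time a single TensorFlow Lite inference in microseconds.
# "mobilenet_float.tflite" is a placeholder path, not part of this result file.
import time
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="mobilenet_float.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
x = np.random.rand(*inp["shape"]).astype(inp["dtype"])

interpreter.set_tensor(inp["index"], x)
interpreter.invoke()                      # warm-up

start = time.perf_counter()
interpreter.set_tensor(inp["index"], x)
interpreter.invoke()
print(f"{(time.perf_counter() - start) * 1e6:.0f} microseconds")
```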
ONNX Runtime 1.19 - Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel: 6.14425 ms inference time cost (fewer is better; SE +/- 0.05367, N = 15)
ONNX Runtime 1.19 - Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel: 162.89 inferences per second (more is better; SE +/- 1.41, N = 15)
ONNX Runtime 1.19 - Model: super-resolution-10 - Device: CPU - Executor: Parallel: 13.94 ms inference time cost (fewer is better; SE +/- 0.12, N = 15)
ONNX Runtime 1.19 - Model: super-resolution-10 - Device: CPU - Executor: Parallel: 71.81 inferences per second (more is better; SE +/- 0.62, N = 15)
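The ONNX Runtime results come in pairs, inference time cost in ms and inferences per second, for either the Parallel or the Standard (sequential) executor. A minimal sketch of the corresponding onnxruntime usage, with a placeholder model path and an assumed float32 input:

```python
# Minimal sketch: run an ONNX model with onnxruntime's parallel executor
# and report throughput. "model.onnx" is a placeholder path; a float32
# input tensor is assumed.
import time
import numpy as np
import onnxruntime as ort

opts = ort.SessionOptions()
opts.execution_mode = ort.ExecutionMode.ORT_PARALLEL   # ORT_SEQUENTIAL is the default
sess = ort.InferenceSession("model.onnx", opts, providers=["CPUExecutionProvider"])

inp = sess.get_inputs()[0]
shape = [d if isinstance(d, int) else 1 for d in inp.shape]   # fill dynamic dims
x = np.random.rand(*shape).astype(np.float32)

sess.run(None, {inp.name: x})                           # warm-up
runs = 20
start = time.perf_counter()
for _ in range(runs):
    sess.run(None, {inp.name: x})
elapsed = time.perf_counter() - start
print(f"{runs / elapsed:.2f} inferences per second "
      f"({elapsed / runs * 1e3:.2f} ms per inference)")
```

Presumably the "Standard" executor rows correspond to ort.ExecutionMode.ORT_SEQUENTIAL, i.e. the default execution mode.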
PyTorch 2.2.1 - Device: CPU - Batch Size: 256 - Model: Efficientnet_v2_l: 6.48 batches/sec (more is better; SE +/- 0.01, N = 3; MIN: 6.44 / MAX: 6.62)
TensorFlow 2.16.1 - Device: CPU - Batch Size: 64 - Model: ResNet-50: 26.52 images/sec (more is better; SE +/- 0.01, N = 3)
TensorFlow 2.16.1 - Device: GPU - Batch Size: 64 - Model: GoogLeNet: 26.91 images/sec (more is better; SE +/- 0.01, N = 3)
Scikit-Learn 1.2.2 - Benchmark: Hist Gradient Boosting Categorical Only: 195.30 seconds (fewer is better; SE +/- 0.28, N = 3)
PyTorch 2.2.1 - Device: CPU - Batch Size: 64 - Model: Efficientnet_v2_l: 7.82 batches/sec (more is better; SE +/- 0.07, N = 3; MIN: 6.39 / MAX: 7.98)
PyTorch 2.2.1 - Device: CPU - Batch Size: 32 - Model: Efficientnet_v2_l: 7.92 batches/sec (more is better; SE +/- 0.05, N = 3; MIN: 7.36 / MAX: 8.02)
Scikit-Learn 1.2.2 - Benchmark: TSNE MNIST Dataset: 188.33 seconds (fewer is better; SE +/- 0.44, N = 3)
Scikit-Learn 1.2.2 - Benchmark: GLM: 183.48 seconds (fewer is better; SE +/- 0.31, N = 3)
TensorFlow 2.16.1 - Device: CPU - Batch Size: 512 - Model: AlexNet: 245.67 images/sec (more is better; SE +/- 0.02, N = 3)
Scikit-Learn 1.2.2 - Benchmark: Tree: 42.77 seconds (fewer is better; SE +/- 0.38, N = 15)
TensorFlow 2.16.1 - Device: GPU - Batch Size: 16 - Model: ResNet-50: 7.93 images/sec (more is better; SE +/- 0.01, N = 3)
Scikit-Learn 1.2.2 - Benchmark: Isolation Forest: 147.52 seconds (fewer is better; SE +/- 0.04, N = 3)
Whisper.cpp 1.6.2 - Model: ggml-base.en - Input: 2016 State of the Union: 209.02 seconds (fewer is better; SE +/- 0.74, N = 3)
Scikit-Learn 1.2.2 - Benchmark: Plot Hierarchical: 153.06 seconds (fewer is better; SE +/- 0.14, N = 3)
TensorFlow 2.16.1 - Device: CPU - Batch Size: 16 - Model: VGG-16: 8.97 images/sec (more is better; SE +/- 0.06, N = 3)
Scikit-Learn 1.2.2 - Benchmark: LocalOutlierFactor: 35.65 seconds (fewer is better; SE +/- 0.32, N = 15)
Scikit-Learn 1.2.2 - Benchmark: Hist Gradient Boosting Threading: 140.75 seconds (fewer is better; SE +/- 1.02, N = 3)
PyTorch 2.2.1 - Device: CPU - Batch Size: 32 - Model: ResNet-152: 10.16 batches/sec (more is better; SE +/- 0.01, N = 3; MIN: 10.01 / MAX: 10.25)
Scikit-Learn 1.2.2 - Benchmark: Plot Polynomial Kernel Approximation: 133.12 seconds (fewer is better; SE +/- 0.20, N = 3)
TensorFlow 2.16.1 - Device: GPU - Batch Size: 64 - Model: AlexNet: 42.49 images/sec (more is better; SE +/- 0.02, N = 3)
Scikit-Learn 1.2.2 - Benchmark: Feature Expansions: 120.33 seconds (fewer is better; SE +/- 0.09, N = 3)
PyTorch 2.2.1 - Device: CPU - Batch Size: 64 - Model: ResNet-152: 11.66 batches/sec (more is better; SE +/- 0.05, N = 3; MIN: 11.02 / MAX: 11.79)
TensorFlow 2.16.1 - Device: GPU - Batch Size: 32 - Model: GoogLeNet: 26.63 images/sec (more is better; SE +/- 0.04, N = 3)
TensorFlow 2.16.1 - Device: CPU - Batch Size: 32 - Model: ResNet-50: 26.82 images/sec (more is better; SE +/- 0.01, N = 3)
Scikit-Learn 1.2.2 - Benchmark: Plot Neighbors: 96.76 seconds (fewer is better; SE +/- 0.18, N = 3)
TensorFlow 2.16.1 - Device: CPU - Batch Size: 256 - Model: AlexNet: 227.40 images/sec (more is better; SE +/- 0.07, N = 3)
Scikit-Learn 1.2.2 - Benchmark: Plot Incremental PCA: 21.31 seconds (fewer is better; SE +/- 0.18, N = 15)
Scikit-Learn 1.2.2 - Benchmark: Sparsify: 85.98 seconds (fewer is better; SE +/- 0.25, N = 3)
Scikit-Learn 1.2.2 - Benchmark: Sample Without Replacement: 84.27 seconds (fewer is better; SE +/- 0.15, N = 3)
OpenCV 4.7 - Test: DNN - Deep Neural Network: 25552 ms (fewer is better; SE +/- 635.50, N = 13)
Mobile Neural Network 2.9.b11b7037d - Model: inception-v3: 21.37 ms (fewer is better; SE +/- 0.04, N = 3; MIN: 20.43 / MAX: 62.25)
Mobile Neural Network 2.9.b11b7037d - Model: mobilenet-v1-1.0: 2.084 ms (fewer is better; SE +/- 0.015, N = 3; MIN: 1.99 / MAX: 26.25)
Mobile Neural Network 2.9.b11b7037d - Model: MobileNetV2_224: 1.882 ms (fewer is better; SE +/- 0.011, N = 3; MIN: 1.79 / MAX: 33.11)
Mobile Neural Network 2.9.b11b7037d - Model: SqueezeNetV1.0: 2.935 ms (fewer is better; SE +/- 0.049, N = 3; MIN: 2.7 / MAX: 34.14)
Mobile Neural Network 2.9.b11b7037d - Model: resnet-v2-50: 14.36 ms (fewer is better; SE +/- 0.08, N = 3; MIN: 13.6 / MAX: 59.49)
Mobile Neural Network 2.9.b11b7037d - Model: squeezenetv1.1: 1.836 ms (fewer is better; SE +/- 0.006, N = 3; MIN: 1.71 / MAX: 31.7)
Mobile Neural Network 2.9.b11b7037d - Model: mobilenetV3: 1.012 ms (fewer is better; SE +/- 0.003, N = 3; MIN: 0.93 / MAX: 24.1)
Mobile Neural Network 2.9.b11b7037d - Model: nasnet: 6.956 ms (fewer is better; SE +/- 0.004, N = 3; MIN: 6.61 / MAX: 53.04)
PyTorch 2.2.1 - Device: CPU - Batch Size: 1 - Model: Efficientnet_v2_l: 10.55 batches/sec (more is better; SE +/- 0.12, N = 3; MIN: 6.85 / MAX: 13.27)
Numpy Benchmark: 644.54 (score; more is better; SE +/- 0.55, N = 3)
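The Numpy Benchmark score aggregates timings of common NumPy operations into a single higher-is-better figure. Purely as an illustration of the kind of kernel being timed (not the benchmark's own operation list or scoring):

```python
# Illustration only: time one representative NumPy kernel, a large matrix multiply.
import time
import numpy as np

a = np.random.rand(2048, 2048)
b = np.random.rand(2048, 2048)

start = time.perf_counter()
np.dot(a, b)
print(f"2048x2048 matmul: {time.perf_counter() - start:.3f} seconds")
```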
Scikit-Learn 1.2.2 - Benchmark: Kernel PCA Solvers / Time vs. N Samples: 69.63 seconds (fewer is better; SE +/- 0.35, N = 3)
TensorFlow 2.16.1 - Device: GPU - Batch Size: 32 - Model: AlexNet: 41.45 images/sec (more is better; SE +/- 0.40, N = 3)
Scikit-Learn 1.2.2 - Benchmark: SGD Regression: 63.79 seconds (fewer is better; SE +/- 0.12, N = 3)
ONNX Runtime 1.19 - Model: yolov4 - Device: CPU - Executor: Standard: 89.76 ms inference time cost (fewer is better; SE +/- 1.09, N = 4)
ONNX Runtime 1.19 - Model: yolov4 - Device: CPU - Executor: Standard: 11.15 inferences per second (more is better; SE +/- 0.13, N = 4)
oneDNN 3.6 - Harness: Recurrent Neural Network Training - Engine: CPU: 2702.00 ms (fewer is better; SE +/- 1.60, N = 3; MIN: 2689.78)
TensorFlow 2.16.1 - Device: CPU - Batch Size: 64 - Model: GoogLeNet: 94.21 images/sec (more is better; SE +/- 0.26, N = 3)
oneDNN 3.6 - Harness: Recurrent Neural Network Inference - Engine: CPU: 1452.76 ms (fewer is better; SE +/- 2.94, N = 3; MIN: 1432.32)
Scikit-Learn 1.2.2 - Benchmark: MNIST Dataset: 53.54 seconds (fewer is better; SE +/- 0.03, N = 3)
TensorFlow 2.16.1 - Device: GPU - Batch Size: 16 - Model: GoogLeNet: 25.99 images/sec (more is better; SE +/- 0.06, N = 3)
TensorFlow 2.16.1 - Device: CPU - Batch Size: 16 - Model: ResNet-50: 27.80 images/sec (more is better; SE +/- 0.02, N = 3)
PyTorch 2.2.1 - Device: CPU - Batch Size: 16 - Model: ResNet-50: 30.07 batches/sec (more is better; SE +/- 0.12, N = 3; MIN: 28.1 / MAX: 30.36)
OpenVINO 2024.0 - Model: Face Detection FP16 - Device: CPU: 1375.14 ms (fewer is better; SE +/- 6.34, N = 3; MIN: 1104.54 / MAX: 1761.35)
OpenVINO 2024.0 - Model: Face Detection FP16 - Device: CPU: 4.33 FPS (more is better; SE +/- 0.03, N = 3)
PyTorch Device: CPU - Batch Size: 512 - Model: ResNet-50 OpenBenchmarking.org batches/sec, More Is Better PyTorch 2.2.1 Device: CPU - Batch Size: 512 - Model: ResNet-50 ASUS NVIDIA GeForce RTX 3090 7 14 21 28 35 SE +/- 0.24, N = 3 30.14 MIN: 18.07 / MAX: 30.74
ONNX Runtime Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Parallel ASUS NVIDIA GeForce RTX 3090 300 600 900 1200 1500 SE +/- 19.21, N = 3 1345.66 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Parallel (Inferences Per Second, More Is Better) - ASUS NVIDIA GeForce RTX 3090: 0.743436 (SE +/- 0.010713, N = 3) - 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Standard (Inference Time Cost (ms), Fewer Is Better) - ASUS NVIDIA GeForce RTX 3090: 1059.49 (SE +/- 2.93, N = 3) - 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Standard (Inferences Per Second, More Is Better) - ASUS NVIDIA GeForce RTX 3090: 0.943867 (SE +/- 0.002603, N = 3) - 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
OpenVINO 2024.0 - Model: Face Detection FP16-INT8 - Device: CPU (ms, Fewer Is Better) - ASUS NVIDIA GeForce RTX 3090: 382.74 (SE +/- 0.21, N = 3; MIN: 223.52 / MAX: 881.7) - 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO 2024.0 - Model: Face Detection FP16-INT8 - Device: CPU (FPS, More Is Better) - ASUS NVIDIA GeForce RTX 3090: 15.61 (SE +/- 0.05, N = 3) - 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
PyTorch 2.2.1 - Device: CPU - Batch Size: 32 - Model: ResNet-50 (batches/sec, More Is Better) - ASUS NVIDIA GeForce RTX 3090: 30.27 (SE +/- 0.11, N = 3; MIN: 28.61 / MAX: 30.57)
ONNX Runtime 1.19 - Model: fcn-resnet101-11 - Device: CPU - Executor: Standard (Inference Time Cost (ms), Fewer Is Better) - ASUS NVIDIA GeForce RTX 3090: 400.35 (SE +/- 0.32, N = 3) - 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: fcn-resnet101-11 - Device: CPU - Executor: Standard (Inferences Per Second, More Is Better) - ASUS NVIDIA GeForce RTX 3090: 2.49779 (SE +/- 0.00202, N = 3) - 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
OpenVINO 2024.0 - Model: Machine Translation EN To DE FP16 - Device: CPU (ms, Fewer Is Better) - ASUS NVIDIA GeForce RTX 3090: 115.87 (SE +/- 0.40, N = 3; MIN: 60.69 / MAX: 196.8) - 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO 2024.0 - Model: Machine Translation EN To DE FP16 - Device: CPU (FPS, More Is Better) - ASUS NVIDIA GeForce RTX 3090: 51.69 (SE +/- 0.18, N = 3) - 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO 2024.0 - Model: Person Detection FP16 - Device: CPU (ms, Fewer Is Better) - ASUS NVIDIA GeForce RTX 3090: 146.25 (SE +/- 0.46, N = 3; MIN: 120.28 / MAX: 197.97) - 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO 2024.0 - Model: Person Detection FP16 - Device: CPU (FPS, More Is Better) - ASUS NVIDIA GeForce RTX 3090: 40.98 (SE +/- 0.13, N = 3) - 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO 2024.0 - Model: Person Detection FP32 - Device: CPU (ms, Fewer Is Better) - ASUS NVIDIA GeForce RTX 3090: 144.47 (SE +/- 0.71, N = 3; MIN: 72.51 / MAX: 211.86) - 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO 2024.0 - Model: Person Detection FP32 - Device: CPU (FPS, More Is Better) - ASUS NVIDIA GeForce RTX 3090: 41.47 (SE +/- 0.20, N = 3) - 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
ONNX Runtime 1.19 - Model: ZFNet-512 - Device: CPU - Executor: Parallel (Inference Time Cost (ms), Fewer Is Better) - ASUS NVIDIA GeForce RTX 3090: 12.59 (SE +/- 0.18, N = 3) - 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: ZFNet-512 - Device: CPU - Executor: Parallel (Inferences Per Second, More Is Better) - ASUS NVIDIA GeForce RTX 3090: 79.43 (SE +/- 1.14, N = 3) - 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: ZFNet-512 - Device: CPU - Executor: Standard (Inference Time Cost (ms), Fewer Is Better) - ASUS NVIDIA GeForce RTX 3090: 10.81 (SE +/- 0.07, N = 3) - 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: ZFNet-512 - Device: CPU - Executor: Standard (Inferences Per Second, More Is Better) - ASUS NVIDIA GeForce RTX 3090: 92.47 (SE +/- 0.61, N = 3) - 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
OpenVINO 2024.0 - Model: Road Segmentation ADAS FP16-INT8 - Device: CPU (ms, Fewer Is Better) - ASUS NVIDIA GeForce RTX 3090: 21.09 (SE +/- 0.05, N = 3; MIN: 9.8 / MAX: 54.63) - 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO 2024.0 - Model: Road Segmentation ADAS FP16-INT8 - Device: CPU (FPS, More Is Better) - ASUS NVIDIA GeForce RTX 3090: 283.34 (SE +/- 0.63, N = 3) - 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
ONNX Runtime 1.19 - Model: T5 Encoder - Device: CPU - Executor: Standard (Inference Time Cost (ms), Fewer Is Better) - ASUS NVIDIA GeForce RTX 3090: 6.54304 (SE +/- 0.01981, N = 3) - 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: T5 Encoder - Device: CPU - Executor: Standard (Inferences Per Second, More Is Better) - ASUS NVIDIA GeForce RTX 3090: 152.80 (SE +/- 0.46, N = 3) - 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
Scikit-Learn 1.2.2 - Benchmark: Plot OMP vs. LARS (Seconds, Fewer Is Better) - ASUS NVIDIA GeForce RTX 3090: 45.62 (SE +/- 0.29, N = 3) - 1. (F9X) gfortran options: -O0
TensorFlow Lite 2022-05-18 - Model: Mobilenet Quant (Microseconds, Fewer Is Better) - ASUS NVIDIA GeForce RTX 3090: 2317.67 (SE +/- 20.94, N = 3)
OpenVINO 2024.0 - Model: Handwritten English Recognition FP16 - Device: CPU (ms, Fewer Is Better) - ASUS NVIDIA GeForce RTX 3090: 88.65 (SE +/- 0.10, N = 3; MIN: 56.64 / MAX: 150.06) - 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO 2024.0 - Model: Handwritten English Recognition FP16 - Device: CPU (FPS, More Is Better) - ASUS NVIDIA GeForce RTX 3090: 225.32 (SE +/- 0.26, N = 3) - 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO 2024.0 - Model: Noise Suppression Poconet-Like FP16 - Device: CPU (ms, Fewer Is Better) - ASUS NVIDIA GeForce RTX 3090: 9.02 (SE +/- 0.01, N = 3; MIN: 6.21 / MAX: 25.89) - 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO 2024.0 - Model: Noise Suppression Poconet-Like FP16 - Device: CPU (FPS, More Is Better) - ASUS NVIDIA GeForce RTX 3090: 656.86 (SE +/- 0.48, N = 3) - 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO 2024.0 - Model: Handwritten English Recognition FP16-INT8 - Device: CPU (ms, Fewer Is Better) - ASUS NVIDIA GeForce RTX 3090: 74.85 (SE +/- 0.19, N = 3; MIN: 56.4 / MAX: 150.26) - 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO 2024.0 - Model: Handwritten English Recognition FP16-INT8 - Device: CPU (FPS, More Is Better) - ASUS NVIDIA GeForce RTX 3090: 266.87 (SE +/- 0.68, N = 3) - 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO 2024.0 - Model: Person Vehicle Bike Detection FP16 - Device: CPU (ms, Fewer Is Better) - ASUS NVIDIA GeForce RTX 3090: 11.29 (SE +/- 0.01, N = 3; MIN: 5.5 / MAX: 26.39) - 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO 2024.0 - Model: Person Vehicle Bike Detection FP16 - Device: CPU (FPS, More Is Better) - ASUS NVIDIA GeForce RTX 3090: 528.13 (SE +/- 0.48, N = 3) - 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO 2024.0 - Model: Road Segmentation ADAS FP16 - Device: CPU (ms, Fewer Is Better) - ASUS NVIDIA GeForce RTX 3090: 69.09 (SE +/- 0.30, N = 3; MIN: 27.97 / MAX: 92.16) - 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO 2024.0 - Model: Road Segmentation ADAS FP16 - Device: CPU (FPS, More Is Better) - ASUS NVIDIA GeForce RTX 3090: 86.71 (SE +/- 0.37, N = 3) - 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO 2024.0 - Model: Person Re-Identification Retail FP16 - Device: CPU (ms, Fewer Is Better) - ASUS NVIDIA GeForce RTX 3090: 9.01 (SE +/- 0.01, N = 3; MIN: 4.38 / MAX: 37.43) - 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO 2024.0 - Model: Person Re-Identification Retail FP16 - Device: CPU (FPS, More Is Better) - ASUS NVIDIA GeForce RTX 3090: 659.72 (SE +/- 0.35, N = 3) - 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO 2024.0 - Model: Vehicle Detection FP16-INT8 - Device: CPU (ms, Fewer Is Better) - ASUS NVIDIA GeForce RTX 3090: 8.35 (SE +/- 0.00, N = 3; MIN: 4.12 / MAX: 34.48) - 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO 2024.0 - Model: Vehicle Detection FP16-INT8 - Device: CPU (FPS, More Is Better) - ASUS NVIDIA GeForce RTX 3090: 712.75 (SE +/- 0.27, N = 3) - 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO 2024.0 - Model: Weld Porosity Detection FP16 - Device: CPU (ms, Fewer Is Better) - ASUS NVIDIA GeForce RTX 3090: 44.26 (SE +/- 0.07, N = 3; MIN: 25.84 / MAX: 77.3) - 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO 2024.0 - Model: Weld Porosity Detection FP16 - Device: CPU (FPS, More Is Better) - ASUS NVIDIA GeForce RTX 3090: 450.44 (SE +/- 0.70, N = 3) - 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO 2024.0 - Model: Face Detection Retail FP16-INT8 - Device: CPU (ms, Fewer Is Better) - ASUS NVIDIA GeForce RTX 3090: 2.97 (SE +/- 0.00, N = 3; MIN: 1.4 / MAX: 22.5) - 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO 2024.0 - Model: Face Detection Retail FP16-INT8 - Device: CPU (FPS, More Is Better) - ASUS NVIDIA GeForce RTX 3090: 1948.78 (SE +/- 3.09, N = 3) - 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO 2024.0 - Model: Age Gender Recognition Retail 0013 FP16-INT8 - Device: CPU (ms, Fewer Is Better) - ASUS NVIDIA GeForce RTX 3090: 0.62 (SE +/- 0.00, N = 3; MIN: 0.31 / MAX: 13.6) - 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO 2024.0 - Model: Age Gender Recognition Retail 0013 FP16-INT8 - Device: CPU (FPS, More Is Better) - ASUS NVIDIA GeForce RTX 3090: 29697.31 (SE +/- 14.75, N = 3) - 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO 2024.0 - Model: Vehicle Detection FP16 - Device: CPU (ms, Fewer Is Better) - ASUS NVIDIA GeForce RTX 3090: 20.16 (SE +/- 0.10, N = 3; MIN: 7.21 / MAX: 40.97) - 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO 2024.0 - Model: Vehicle Detection FP16 - Device: CPU (FPS, More Is Better) - ASUS NVIDIA GeForce RTX 3090: 296.62 (SE +/- 1.40, N = 3) - 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO 2024.0 - Model: Weld Porosity Detection FP16-INT8 - Device: CPU (ms, Fewer Is Better) - ASUS NVIDIA GeForce RTX 3090: 12.81 (SE +/- 0.00, N = 3; MIN: 7.26 / MAX: 52.97) - 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO 2024.0 - Model: Weld Porosity Detection FP16-INT8 - Device: CPU (FPS, More Is Better) - ASUS NVIDIA GeForce RTX 3090: 1544.54 (SE +/- 0.90, N = 3) - 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO 2024.0 - Model: Face Detection Retail FP16 - Device: CPU (ms, Fewer Is Better) - ASUS NVIDIA GeForce RTX 3090: 5.02 (SE +/- 0.01, N = 3; MIN: 2.23 / MAX: 27.19) - 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO 2024.0 - Model: Face Detection Retail FP16 - Device: CPU (FPS, More Is Better) - ASUS NVIDIA GeForce RTX 3090: 1180.46 (SE +/- 1.52, N = 3) - 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
ONNX Runtime 1.19 - Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallel (Inference Time Cost (ms), Fewer Is Better) - ASUS NVIDIA GeForce RTX 3090: 3.25179 (SE +/- 0.00596, N = 3) - 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallel (Inferences Per Second, More Is Better) - ASUS NVIDIA GeForce RTX 3090: 307.37 (SE +/- 0.56, N = 3) - 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
OpenVINO 2024.0 - Model: Age Gender Recognition Retail 0013 FP16 - Device: CPU (ms, Fewer Is Better) - ASUS NVIDIA GeForce RTX 3090: 1.66 (SE +/- 0.00, N = 3; MIN: 0.78 / MAX: 17.39) - 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO 2024.0 - Model: Age Gender Recognition Retail 0013 FP16 - Device: CPU (FPS, More Is Better) - ASUS NVIDIA GeForce RTX 3090: 11244.68 (SE +/- 7.15, N = 3) - 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
ONNX Runtime 1.19 - Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard (Inference Time Cost (ms), Fewer Is Better) - ASUS NVIDIA GeForce RTX 3090: 1.54657 (SE +/- 0.00165, N = 3) - 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard (Inferences Per Second, More Is Better) - ASUS NVIDIA GeForce RTX 3090: 646.11 (SE +/- 0.70, N = 3) - 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard (Inference Time Cost (ms), Fewer Is Better) - ASUS NVIDIA GeForce RTX 3090: 3.19151 (SE +/- 0.00923, N = 3) - 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard (Inferences Per Second, More Is Better) - ASUS NVIDIA GeForce RTX 3090: 313.24 (SE +/- 0.91, N = 3) - 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: super-resolution-10 - Device: CPU - Executor: Standard (Inference Time Cost (ms), Fewer Is Better) - ASUS NVIDIA GeForce RTX 3090: 12.23 (SE +/- 0.00, N = 3) - 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: super-resolution-10 - Device: CPU - Executor: Standard (Inferences Per Second, More Is Better) - ASUS NVIDIA GeForce RTX 3090: 81.78 (SE +/- 0.01, N = 3) - 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
Scikit-Learn 1.2.2 - Benchmark: Plot Ward (Seconds, Fewer Is Better) - ASUS NVIDIA GeForce RTX 3090: 43.62 (SE +/- 0.60, N = 3) - 1. (F9X) gfortran options: -O0
TensorFlow 2.16.1 - Device: GPU - Batch Size: 1 - Model: GoogLeNet (images/sec, More Is Better) - ASUS NVIDIA GeForce RTX 3090: 12.00 (SE +/- 0.21, N = 15)
Scikit-Learn 1.2.2 - Benchmark: Text Vectorizers (Seconds, Fewer Is Better) - ASUS NVIDIA GeForce RTX 3090: 38.91 (SE +/- 0.03, N = 3) - 1. (F9X) gfortran options: -O0
TensorFlow 2.16.1 - Device: GPU - Batch Size: 16 - Model: AlexNet (images/sec, More Is Better) - ASUS NVIDIA GeForce RTX 3090: 38.93 (SE +/- 0.43, N = 3)
DeepSpeech 0.6 - Acceleration: CPU (Seconds, Fewer Is Better) - ASUS NVIDIA GeForce RTX 3090: 54.63 (SE +/- 0.08, N = 3)
TensorFlow 2.16.1 - Device: CPU - Batch Size: 32 - Model: GoogLeNet (images/sec, More Is Better) - ASUS NVIDIA GeForce RTX 3090: 96.69 (SE +/- 0.06, N = 3)
TensorFlow 2.16.1 - Device: CPU - Batch Size: 64 - Model: AlexNet (images/sec, More Is Better) - ASUS NVIDIA GeForce RTX 3090: 192.13 (SE +/- 0.32, N = 3)
Scikit-Learn 1.2.2 - Benchmark: Kernel PCA Solvers / Time vs. N Components (Seconds, Fewer Is Better) - ASUS NVIDIA GeForce RTX 3090: 25.59 (SE +/- 0.23, N = 3) - 1. (F9X) gfortran options: -O0
PyTorch 2.2.1 - Device: NVIDIA CUDA GPU - Batch Size: 32 - Model: Efficientnet_v2_l (batches/sec, More Is Better) - ASUS NVIDIA GeForce RTX 3090: 84.09 (SE +/- 0.95, N = 3; MIN: 70.71 / MAX: 86.82)
PyTorch 2.2.1 - Device: NVIDIA CUDA GPU - Batch Size: 64 - Model: Efficientnet_v2_l (batches/sec, More Is Better) - ASUS NVIDIA GeForce RTX 3090: 85.37 (SE +/- 0.50, N = 3; MIN: 74.95 / MAX: 88.01)
PyTorch 2.2.1 - Device: NVIDIA CUDA GPU - Batch Size: 256 - Model: Efficientnet_v2_l (batches/sec, More Is Better) - ASUS NVIDIA GeForce RTX 3090: 85.57 (SE +/- 0.11, N = 3; MIN: 73.68 / MAX: 87.24)
PyTorch 2.2.1 - Device: NVIDIA CUDA GPU - Batch Size: 16 - Model: Efficientnet_v2_l (batches/sec, More Is Better) - ASUS NVIDIA GeForce RTX 3090: 85.74 (SE +/- 0.47, N = 3; MIN: 75.44 / MAX: 87.56)
PyTorch 2.2.1 - Device: NVIDIA CUDA GPU - Batch Size: 512 - Model: Efficientnet_v2_l (batches/sec, More Is Better) - ASUS NVIDIA GeForce RTX 3090: 86.27 (SE +/- 0.12, N = 3; MIN: 76.41 / MAX: 88.03)
TensorFlow 2.16.1 - Device: CPU - Batch Size: 1 - Model: VGG-16 (images/sec, More Is Better) - ASUS NVIDIA GeForce RTX 3090: 4.12 (SE +/- 0.00, N = 3)
PyTorch 2.2.1 - Device: CPU - Batch Size: 1 - Model: ResNet-50 (batches/sec, More Is Better) - ASUS NVIDIA GeForce RTX 3090: 49.49 (SE +/- 0.18, N = 3; MIN: 30.42 / MAX: 50.09)
oneDNN 3.6 - Harness: Deconvolution Batch shapes_1d - Engine: CPU (ms, Fewer Is Better) - ASUS NVIDIA GeForce RTX 3090: 4.47259 (SE +/- 0.00491, N = 3; MIN: 4.02) - 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl
TensorFlow 2.16.1 - Device: CPU - Batch Size: 32 - Model: AlexNet (images/sec, More Is Better) - ASUS NVIDIA GeForce RTX 3090: 167.17 (SE +/- 0.14, N = 3)
TensorFlow 2.16.1 - Device: GPU - Batch Size: 1 - Model: ResNet-50 (images/sec, More Is Better) - ASUS NVIDIA GeForce RTX 3090: 5.62 (SE +/- 0.05, N = 3)
TensorFlow 2.16.1 - Device: CPU - Batch Size: 16 - Model: GoogLeNet (images/sec, More Is Better) - ASUS NVIDIA GeForce RTX 3090: 101.50 (SE +/- 0.15, N = 3)
SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: Texture Read Bandwidth (GB/s, More Is Better) - ASUS NVIDIA GeForce RTX 3090: 2178.63 (SE +/- 2.20, N = 3) - 1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi
PyTorch 2.2.1 - Device: NVIDIA CUDA GPU - Batch Size: 64 - Model: ResNet-152 (batches/sec, More Is Better) - ASUS NVIDIA GeForce RTX 3090: 162.69 (SE +/- 0.16, N = 3; MIN: 143.93 / MAX: 165.52)
PyTorch 2.2.1 - Device: NVIDIA CUDA GPU - Batch Size: 512 - Model: ResNet-152 (batches/sec, More Is Better) - ASUS NVIDIA GeForce RTX 3090: 162.35 (SE +/- 0.22, N = 3; MIN: 99.28 / MAX: 165.46)
PyTorch 2.2.1 - Device: NVIDIA CUDA GPU - Batch Size: 32 - Model: ResNet-152 (batches/sec, More Is Better) - ASUS NVIDIA GeForce RTX 3090: 162.45 (SE +/- 0.21, N = 3; MIN: 87.19 / MAX: 165.45)
PyTorch 2.2.1 - Device: NVIDIA CUDA GPU - Batch Size: 16 - Model: ResNet-152 (batches/sec, More Is Better) - ASUS NVIDIA GeForce RTX 3090: 162.59 (SE +/- 0.56, N = 3; MIN: 114.4 / MAX: 167.22)
PyTorch 2.2.1 - Device: NVIDIA CUDA GPU - Batch Size: 256 - Model: ResNet-152 (batches/sec, More Is Better) - ASUS NVIDIA GeForce RTX 3090: 162.68 (SE +/- 0.10, N = 3; MIN: 145.92 / MAX: 165.55)
PyTorch 2.2.1 - Device: NVIDIA CUDA GPU - Batch Size: 1 - Model: Efficientnet_v2_l (batches/sec, More Is Better) - ASUS NVIDIA GeForce RTX 3090: 88.41 (SE +/- 0.70, N = 3; MIN: 76.75 / MAX: 90.99)
oneDNN 3.6 - Harness: IP Shapes 1D - Engine: CPU (ms, Fewer Is Better) - ASUS NVIDIA GeForce RTX 3090: 2.89253 (SE +/- 0.00897, N = 3; MIN: 2.72) - 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl
TensorFlow 2.16.1 - Device: CPU - Batch Size: 16 - Model: AlexNet (images/sec, More Is Better) - ASUS NVIDIA GeForce RTX 3090: 129.70 (SE +/- 0.01, N = 3)
Scikit-Learn 1.2.2 - Benchmark: 20 Newsgroups / Logistic Regression (Seconds, Fewer Is Better) - ASUS NVIDIA GeForce RTX 3090: 9.614 (SE +/- 0.072, N = 3) - 1. (F9X) gfortran options: -O0
R Benchmark (Seconds, Fewer Is Better) - ASUS NVIDIA GeForce RTX 3090: 0.0970 (SE +/- 0.0004, N = 3)
TensorFlow 2.16.1 - Device: CPU - Batch Size: 1 - Model: ResNet-50 (images/sec, More Is Better) - ASUS NVIDIA GeForce RTX 3090: 14.04 (SE +/- 0.07, N = 3)
TensorFlow 2.16.1 - Device: GPU - Batch Size: 1 - Model: AlexNet (images/sec, More Is Better) - ASUS NVIDIA GeForce RTX 3090: 14.14 (SE +/- 0.07, N = 3)
PyTorch 2.2.1 - Device: NVIDIA CUDA GPU - Batch Size: 1 - Model: ResNet-152 (batches/sec, More Is Better) - ASUS NVIDIA GeForce RTX 3090: 172.05 (SE +/- 0.36, N = 3; MIN: 143.74 / MAX: 175.36)
oneDNN 3.6 - Harness: IP Shapes 3D - Engine: CPU (ms, Fewer Is Better) - ASUS NVIDIA GeForce RTX 3090: 9.32676 (SE +/- 0.00039, N = 3; MIN: 9.1) - 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl
TensorFlow 2.16.1 - Device: CPU - Batch Size: 1 - Model: AlexNet (images/sec, More Is Better) - ASUS NVIDIA GeForce RTX 3090: 15.83 (SE +/- 0.02, N = 3)
PyTorch 2.2.1 - Device: NVIDIA CUDA GPU - Batch Size: 256 - Model: ResNet-50 (batches/sec, More Is Better) - ASUS NVIDIA GeForce RTX 3090: 413.12 (SE +/- 0.66, N = 3; MIN: 161.19 / MAX: 421.75)
PyTorch 2.2.1 - Device: NVIDIA CUDA GPU - Batch Size: 32 - Model: ResNet-50 (batches/sec, More Is Better) - ASUS NVIDIA GeForce RTX 3090: 413.19 (SE +/- 0.17, N = 3; MIN: 321.33 / MAX: 421.07)
PyTorch 2.2.1 - Device: NVIDIA CUDA GPU - Batch Size: 64 - Model: ResNet-50 (batches/sec, More Is Better) - ASUS NVIDIA GeForce RTX 3090: 413.29 (SE +/- 0.26, N = 3; MIN: 337.68 / MAX: 420.31)
PyTorch 2.2.1 - Device: NVIDIA CUDA GPU - Batch Size: 512 - Model: ResNet-50 (batches/sec, More Is Better) - ASUS NVIDIA GeForce RTX 3090: 414.51 (SE +/- 0.17, N = 3; MIN: 337.19 / MAX: 422.34)
PyTorch 2.2.1 - Device: NVIDIA CUDA GPU - Batch Size: 16 - Model: ResNet-50 (batches/sec, More Is Better) - ASUS NVIDIA GeForce RTX 3090: 414.29 (SE +/- 0.07, N = 3; MIN: 339.59 / MAX: 422.13)
RNNoise 0.2 - Input: 26 Minute Long Talking Sample (Seconds, Fewer Is Better) - ASUS NVIDIA GeForce RTX 3090: 7.245 (SE +/- 0.014, N = 3) - 1. (CC) gcc options: -O2 -pedantic -fvisibility=hidden
oneDNN 3.6 - Harness: Convolution Batch Shapes Auto - Engine: CPU (ms, Fewer Is Better) - ASUS NVIDIA GeForce RTX 3090: 8.24915 (SE +/- 0.00332, N = 3; MIN: 7.76) - 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl
PyTorch 2.2.1 - Device: NVIDIA CUDA GPU - Batch Size: 1 - Model: ResNet-50 (batches/sec, More Is Better) - ASUS NVIDIA GeForce RTX 3090: 469.71 (SE +/- 1.86, N = 3; MIN: 339.62 / MAX: 480.89)
TensorFlow 2.16.1 - Device: CPU - Batch Size: 1 - Model: GoogLeNet (images/sec, More Is Better) - ASUS NVIDIA GeForce RTX 3090: 47.92 (SE +/- 0.60, N = 3)
oneDNN 3.6 - Harness: Deconvolution Batch shapes_3d - Engine: CPU (ms, Fewer Is Better) - ASUS NVIDIA GeForce RTX 3090: 5.57785 (SE +/- 0.00815, N = 3; MIN: 5.37) - 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl
SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: GEMM SGEMM_N (GFLOPS, More Is Better) - ASUS NVIDIA GeForce RTX 3090: 8456.99 (SE +/- 38.88, N = 3) - 1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi
SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: Bus Speed Readback (GB/s, More Is Better) - ASUS NVIDIA GeForce RTX 3090: 6.7652 (SE +/- 0.0000, N = 3) - 1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi
SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: Bus Speed Download (GB/s, More Is Better) - ASUS NVIDIA GeForce RTX 3090: 6.6413 (SE +/- 0.0002, N = 3) - 1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi
SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: Triad (GB/s, More Is Better) - ASUS NVIDIA GeForce RTX 3090: 6.6057 (SE +/- 0.0011, N = 3) - 1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi
SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: FFT SP (GFLOPS, More Is Better) - ASUS NVIDIA GeForce RTX 3090: 2510.98 (SE +/- 35.21, N = 3) - 1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi
SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: Reduction (GB/s, More Is Better) - ASUS NVIDIA GeForce RTX 3090: 406.41 (SE +/- 0.22, N = 3) - 1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi
SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: S3D (GFLOPS, More Is Better) - ASUS NVIDIA GeForce RTX 3090: 455.96 (SE +/- 0.72, N = 3) - 1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi
SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: MD5 Hash (GHash/s, More Is Better) - ASUS NVIDIA GeForce RTX 3090: 45.45 (SE +/- 0.01, N = 3) - 1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi
Phoronix Test Suite v10.8.5