102424machinelearningtest

Intel Core i9-12900K testing with an ASUS PRIME Z790-V AX (1802 BIOS) and dual ASUS NVIDIA GeForce RTX 3090 24GB with NVLink on Ubuntu 24.04 via the Phoronix Test Suite.

Compare your own system(s) to this result file with the Phoronix Test Suite by running the command: phoronix-test-suite benchmark 2410281-NE-102424MAC72
Result Identifier: ASUS NVIDIA GeForce RTX 3090
Date: October 25
Run Test Duration: 4 Days, 17 Hours, 7 Minutes


Processor: Intel Core i9-12900K @ 5.10GHz (16 Cores / 24 Threads)
Motherboard: ASUS PRIME Z790-V AX (1802 BIOS)
Chipset: Intel Raptor Lake-S PCH
Memory: 96GB
Disk: 2000GB Samsung SSD 970 EVO Plus 2TB
Graphics: ASUS NVIDIA GeForce RTX 3090 24GB
Audio: Intel Raptor Lake HD Audio
Monitor: S24F350
Network: Realtek RTL8111/8168/8211/8411 + Realtek Device b851
OS: Ubuntu 24.04
Kernel: 6.8.0-47-generic (x86_64)
Desktop: GNOME Shell 46.0
Display Server: X Server + Wayland
Display Driver: NVIDIA 560.35.03
OpenGL: 4.6.0
OpenCL: OpenCL 3.0 CUDA 12.6.65
Compiler: GCC 13.2.0 + CUDA 12.5
File-System: ext4
Screen Resolution: 1920x1080

102424machinelearningtest Benchmarks - System Logs
- Transparent Huge Pages: madvise
- PRIMUS_libGLa=/usr/lib/nvidia-current/libGL.so.1:/usr/lib32/nvidia-current/libGL.so.1:/usr/lib/x86_64-linux-gnu/libGL.so.1:/usr/lib/i386-linux-gnu/libGL.so.1 PRIMUS_libGLd=/usr/$LIB/libGL.so.1:/usr/lib/$LIB/libGL.so.1:/usr/$LIB/mesa/libGL.so.1:/usr/lib/$LIB/mesa/libGL.so.1
- Compiler configure: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-uJ7kn6/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-uJ7kn6/gcc-13-13.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
- Scaling Governor: intel_pstate powersave (EPP: balance_performance)
- CPU Microcode: 0x37
- Thermald 2.5.6
- BAR1 / Visible vRAM Size: 32768 MiB
- vBIOS Version: 94.02.4b.00.0b
- GPU Compute Cores: 10496
- Python 3.12.3
- Security mitigations: gather_data_sampling: Not affected; itlb_multihit: Not affected; l1tf: Not affected; mds: Not affected; meltdown: Not affected; mmio_stale_data: Not affected; reg_file_data_sampling: Mitigation of Clear Register File; retbleed: Not affected; spec_rstack_overflow: Not affected; spec_store_bypass: Mitigation of SSB disabled via prctl; spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization; spectre_v2: Mitigation of Enhanced / Automatic IBRS, IBPB: conditional, RSB filling, PBRSB-eIBRS: SW sequence, BHI: BHI_DIS_S; srbds: Not affected; tsx_async_abort: Not affected

[Condensed results-summary table for the ASUS NVIDIA GeForce RTX 3090 configuration - the individual per-test results are listed below.]

TensorFlow

This is a benchmark of the TensorFlow deep learning framework using the TensorFlow reference benchmarks (tensorflow/benchmarks with tf_cnn_benchmarks.py). Note with the Phoronix Test Suite there is also pts/tensorflow-lite for benchmarking the TensorFlow Lite binaries if desired for complementary metrics. Learn more via the OpenBenchmarking.org test page.
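As a rough illustration of what an "images/sec at a given batch size" figure measures (this is not the tf_cnn_benchmarks.py harness itself; the batch size, run count, and use of an untrained Keras VGG-16 are illustrative assumptions), a minimal Python timing sketch could look like:

    import time
    import numpy as np
    import tensorflow as tf

    batch_size = 64                                     # assumed batch size for this sketch
    model = tf.keras.applications.VGG16(weights=None)   # untrained stand-in model
    images = np.random.rand(batch_size, 224, 224, 3).astype(np.float32)

    model(images, training=False)                       # warm-up forward pass
    runs = 10
    start = time.time()
    for _ in range(runs):
        model(images, training=False)                   # timed forward passes only
    elapsed = time.time() - start
    print("images/sec:", batch_size * runs / elapsed)

The actual pts/tensorflow profile drives tf_cnn_benchmarks.py with the device, batch size, and model named in each result title.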

TensorFlow 2.16.1 - Device: GPU - Batch Size: 512 - Model: VGG-16: 2.29 images/sec (more is better; SE +/- 0.00, N = 3)

TensorFlow 2.16.1 - Device: GPU - Batch Size: 256 - Model: VGG-16: 2.28 images/sec (more is better; SE +/- 0.00, N = 3)

TensorFlow 2.16.1 - Device: GPU - Batch Size: 512 - Model: ResNet-50: 8.04 images/sec (more is better; SE +/- 0.01, N = 3)

TensorFlow 2.16.1 - Device: CPU - Batch Size: 512 - Model: VGG-16: 8.95 images/sec (more is better; SE +/- 0.06, N = 3)

TensorFlow 2.16.1 - Device: GPU - Batch Size: 256 - Model: ResNet-50: 8.04 images/sec (more is better; SE +/- 0.00, N = 3)

TensorFlow 2.16.1 - Device: GPU - Batch Size: 64 - Model: VGG-16: 2.27 images/sec (more is better; SE +/- 0.00, N = 3)

TensorFlow 2.16.1 - Device: CPU - Batch Size: 256 - Model: VGG-16: 9.11 images/sec (more is better; SE +/- 0.01, N = 3)

TensorFlow 2.16.1 - Device: GPU - Batch Size: 512 - Model: GoogLeNet: 27.08 images/sec (more is better; SE +/- 0.04, N = 3)

TensorFlow 2.16.1 - Device: CPU - Batch Size: 512 - Model: ResNet-50: 28.13 images/sec (more is better; SE +/- 0.01, N = 3)

Whisper.cpp

Whisper.cpp 1.6.2 - Model: ggml-medium.en - Input: 2016 State of the Union: 2000.10 seconds (fewer is better; SE +/- 2.12, N = 3)

Scikit-Learn

Scikit-learn is a Python module for machine learning built on NumPy, SciPy, and is BSD-licensed. Learn more via the OpenBenchmarking.org test page.
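Each scikit-learn result below is the wall-clock time of one benchmark script. As a hedged illustration of the kind of workload behind a name like "Hist Gradient Boosting" (the dataset size and parameters here are arbitrary stand-ins, not the benchmark's actual configuration):

    import time
    from sklearn.datasets import make_classification
    from sklearn.ensemble import HistGradientBoostingClassifier

    # Synthetic stand-in dataset; the real benchmark uses its own data and sizes.
    X, y = make_classification(n_samples=100_000, n_features=50, random_state=0)

    clf = HistGradientBoostingClassifier(max_iter=100)
    start = time.time()
    clf.fit(X, y)                      # the timed portion: model training
    print("seconds:", time.time() - start)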

Scikit-Learn 1.2.2 - Benchmark: Isotonic / Perturbed Logarithm: 1469.79 seconds (fewer is better; SE +/- 0.23, N = 3)

Scikit-Learn 1.2.2 - Benchmark: Isotonic / Logistic: 1447.69 seconds (fewer is better; SE +/- 0.80, N = 3)

Scikit-Learn 1.2.2 - Benchmark: Hist Gradient Boosting Adult: 1173.89 seconds (fewer is better; SE +/- 4.93, N = 3)

TensorFlow

This is a benchmark of the TensorFlow deep learning framework using the TensorFlow reference benchmarks (tensorflow/benchmarks with tf_cnn_benchmarks.py). Note with the Phoronix Test Suite there is also pts/tensorflow-lite for benchmarking the TensorFlow Lite binaries if desired for complementary metrics. Learn more via the OpenBenchmarking.org test page.

TensorFlow 2.16.1 - Device: GPU - Batch Size: 32 - Model: VGG-16: 2.27 images/sec (more is better; SE +/- 0.00, N = 3)

Scikit-Learn

Scikit-learn is a Python module for machine learning built on NumPy, SciPy, and is BSD-licensed. Learn more via the OpenBenchmarking.org test page.

Scikit-Learn 1.2.2 - Benchmark: Hist Gradient Boosting: 1120.04 seconds (fewer is better; SE +/- 3.91, N = 3)

SHOC Scalable HeterOgeneous Computing

The CUDA and OpenCL version of Vetter's Scalable HeterOgeneous Computing benchmark suite. SHOC provides a number of different benchmark programs for evaluating the performance and stability of compute devices. Learn more via the OpenBenchmarking.org test page.
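SHOC's OpenCL numbers below (Max SP Flops here, plus bandwidth and kernel tests in the summary) each come from a dedicated kernel. Purely as a simplified sketch of driving the same OpenCL device from Python (a vector add via pyopencl, assuming pyopencl is installed; this is not one of SHOC's benchmark kernels):

    import numpy as np
    import pyopencl as cl

    ctx = cl.create_some_context()          # picks an available OpenCL device
    queue = cl.CommandQueue(ctx)

    n = 1_000_000
    a = np.random.rand(n).astype(np.float32)
    b = np.random.rand(n).astype(np.float32)
    out = np.empty_like(a)

    mf = cl.mem_flags
    a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
    b_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b)
    c_buf = cl.Buffer(ctx, mf.WRITE_ONLY, out.nbytes)

    prg = cl.Program(ctx, """
    __kernel void vadd(__global const float *a, __global const float *b, __global float *c) {
        int i = get_global_id(0);
        c[i] = a[i] + b[i];
    }
    """).build()

    prg.vadd(queue, (n,), None, a_buf, b_buf, c_buf)    # launch the kernel
    cl.enqueue_copy(queue, out, c_buf)                  # copy the result back
    queue.finish()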

SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: Max SP Flops: 41982.2 GFLOPS (more is better; SE +/- 208.39, N = 3)

TensorFlow

This is a benchmark of the TensorFlow deep learning framework using the TensorFlow reference benchmarks (tensorflow/benchmarks with tf_cnn_benchmarks.py). Note with the Phoronix Test Suite there is also pts/tensorflow-lite for benchmarking the TensorFlow Lite binaries if desired for complementary metrics. Learn more via the OpenBenchmarking.org test page.

TensorFlow 2.16.1 - Device: GPU - Batch Size: 512 - Model: AlexNet: 44.64 images/sec (more is better; SE +/- 0.42, N = 3)

TensorFlow 2.16.1 - Device: GPU - Batch Size: 256 - Model: GoogLeNet: 27.04 images/sec (more is better; SE +/- 0.05, N = 3)

TensorFlow 2.16.1 - Device: CPU - Batch Size: 256 - Model: ResNet-50: 27.49 images/sec (more is better; SE +/- 0.18, N = 3)

TensorFlow 2.16.1 - Device: GPU - Batch Size: 64 - Model: ResNet-50: 8.00 images/sec (more is better; SE +/- 0.00, N = 3)

XNNPACK

XNNPACK b7b048 - Model: QS8MobileNetV2: 904 us (fewer is better; SE +/- 35.09, N = 12)

XNNPACK b7b048 - Model: FP16MobileNetV3Small: 928 us (fewer is better; SE +/- 16.32, N = 12)

XNNPACK b7b048 - Model: FP16MobileNetV3Large: 1834 us (fewer is better; SE +/- 29.31, N = 12)

XNNPACK b7b048 - Model: FP16MobileNetV2: 1646 us (fewer is better; SE +/- 50.43, N = 12)

XNNPACK b7b048 - Model: FP16MobileNetV1: 2197 us (fewer is better; SE +/- 74.47, N = 12)

XNNPACK b7b048 - Model: FP32MobileNetV3Small: 820 us (fewer is better; SE +/- 29.73, N = 12)

XNNPACK b7b048 - Model: FP32MobileNetV3Large: 1585 us (fewer is better; SE +/- 66.95, N = 12)

XNNPACK b7b048 - Model: FP32MobileNetV2: 1125 us (fewer is better; SE +/- 17.99, N = 12)

XNNPACK b7b048 - Model: FP32MobileNetV1: 1394 us (fewer is better; SE +/- 27.87, N = 12)

PyTorch

This is a benchmark of PyTorch making use of pytorch-benchmark [https://github.com/LukasHedegaard/pytorch-benchmark]. Learn more via the OpenBenchmarking.org test page.
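The PyTorch figures are reported in batches/sec for the model and batch size named in each result title. A rough stand-in for what a single CPU data point measures (plain torch/torchvision timing rather than the pytorch-benchmark harness; batch size, run count, and untrained weights are illustrative assumptions):

    import time
    import torch
    import torchvision.models as models

    batch_size = 16                                   # illustrative batch size
    model = models.resnet50(weights=None).eval()      # untrained stand-in model
    x = torch.randn(batch_size, 3, 224, 224)

    with torch.no_grad():
        model(x)                                      # warm-up
        runs = 20
        start = time.time()
        for _ in range(runs):
            model(x)                                  # timed inference batches
        elapsed = time.time() - start

    print("batches/sec:", runs / elapsed)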

PyTorch 2.2.1 - Device: CPU - Batch Size: 16 - Model: Efficientnet_v2_l: 6.97 batches/sec (more is better; SE +/- 0.22, N = 9; MIN 6.34 / MAX 8.11)

PyTorch 2.2.1 - Device: CPU - Batch Size: 512 - Model: Efficientnet_v2_l: 7.26 batches/sec (more is better; SE +/- 0.24, N = 9; MIN 6.42 / MAX 8.05)

TensorFlow

This is a benchmark of the TensorFlow deep learning framework using the TensorFlow reference benchmarks (tensorflow/benchmarks with tf_cnn_benchmarks.py). Note with the Phoronix Test Suite there is also pts/tensorflow-lite for benchmarking the TensorFlow Lite binaries if desired for complementary metrics. Learn more via the OpenBenchmarking.org test page.

TensorFlow 2.16.1 - Device: GPU - Batch Size: 16 - Model: VGG-16: 2.25 images/sec (more is better; SE +/- 0.00, N = 3)

Scikit-Learn

Scikit-learn is a Python module for machine learning built on NumPy, SciPy, and is BSD-licensed. Learn more via the OpenBenchmarking.org test page.

Scikit-Learn 1.2.2 - Benchmark: Sparse Random Projections / 100 Iterations: 587.84 seconds (fewer is better; SE +/- 1.77, N = 3)

TensorFlow

This is a benchmark of the TensorFlow deep learning framework using the TensorFlow reference benchmarks (tensorflow/benchmarks with tf_cnn_benchmarks.py). Note with the Phoronix Test Suite there is also pts/tensorflow-lite for benchmarking the TensorFlow Lite binaries if desired for complementary metrics. Learn more via the OpenBenchmarking.org test page.

TensorFlow 2.16.1 - Device: CPU - Batch Size: 64 - Model: VGG-16: 9.31 images/sec (more is better; SE +/- 0.01, N = 3)

Scikit-Learn

Scikit-learn is a Python module for machine learning built on NumPy, SciPy, and is BSD-licensed. Learn more via the OpenBenchmarking.org test page.

Scikit-Learn 1.2.2 - Benchmark: SAGA: 521.31 seconds (fewer is better; SE +/- 0.32, N = 3)

PyTorch

This is a benchmark of PyTorch making use of pytorch-benchmark [https://github.com/LukasHedegaard/pytorch-benchmark]. Learn more via the OpenBenchmarking.org test page.

PyTorch 2.2.1 - Device: CPU - Batch Size: 16 - Model: ResNet-152: 11.49 batches/sec (more is better; SE +/- 0.20, N = 12; MIN 10.07 / MAX 12.18)

PyTorch 2.2.1 - Device: CPU - Batch Size: 512 - Model: ResNet-152: 11.46 batches/sec (more is better; SE +/- 0.14, N = 12; MIN 10.06 / MAX 11.81)

TensorFlow

This is a benchmark of the TensorFlow deep learning framework using the TensorFlow reference benchmarks (tensorflow/benchmarks with tf_cnn_benchmarks.py). Note with the Phoronix Test Suite there is also pts/tensorflow-lite for benchmarking the TensorFlow Lite binaries if desired for complementary metrics. Learn more via the OpenBenchmarking.org test page.

TensorFlow 2.16.1 - Device: GPU - Batch Size: 256 - Model: AlexNet: 44.05 images/sec (more is better; SE +/- 0.40, N = 3)

Whisper.cpp

Whisper.cpp 1.6.2 - Model: ggml-small.en - Input: 2016 State of the Union: 638.01 seconds (fewer is better; SE +/- 0.42, N = 3)

PyTorch

This is a benchmark of PyTorch making use of pytorch-benchmark [https://github.com/LukasHedegaard/pytorch-benchmark]. Learn more via the OpenBenchmarking.org test page.

PyTorch 2.2.1 - Device: CPU - Batch Size: 256 - Model: ResNet-152: 11.45 batches/sec (more is better; SE +/- 0.19, N = 12; MIN 10.05 / MAX 12.11)

TensorFlow

This is a benchmark of the TensorFlow deep learning framework using the TensorFlow reference benchmarks (tensorflow/benchmarks with tf_cnn_benchmarks.py). Note with the Phoronix Test Suite there is also pts/tensorflow-lite for benchmarking the TensorFlow Lite binaries if desired for complementary metrics. Learn more via the OpenBenchmarking.org test page.

TensorFlow 2.16.1 - Device: CPU - Batch Size: 512 - Model: GoogLeNet: 93.78 images/sec (more is better; SE +/- 0.03, N = 3)

LeelaChessZero

LeelaChessZero 0.31.1 - Backend: BLAS: 166 nodes per second (more is better; SE +/- 1.83, N = 5)

Scikit-Learn

Scikit-learn is a Python module for machine learning built on NumPy, SciPy, and is BSD-licensed. Learn more via the OpenBenchmarking.org test page.

Benchmark: Isotonic / Pathological

ASUS NVIDIA GeForce RTX 3090: The test quit with a non-zero exit status on all three attempts.

NCNN

NCNN is a high performance neural network inference framework optimized for mobile and other platforms developed by Tencent. Learn more via the OpenBenchmarking.org test page.

NCNN 20230517 - Target: Vulkan GPU - Model: FastestDet: 6.56 ms (fewer is better; SE +/- 0.07, N = 9; MIN 5.01 / MAX 14.9)

NCNN 20230517 - Target: Vulkan GPU - Model: vision_transformer: 70.75 ms (fewer is better; SE +/- 0.03, N = 9; MIN 67.65 / MAX 81.35)

NCNN 20230517 - Target: Vulkan GPU - Model: regnety_400m: 84.85 ms (fewer is better; SE +/- 0.69, N = 9; MIN 22.37 / MAX 121)

NCNN 20230517 - Target: Vulkan GPU - Model: squeezenet_ssd: 14.25 ms (fewer is better; SE +/- 0.48, N = 9; MIN 7.96 / MAX 30.38)

NCNN 20230517 - Target: Vulkan GPU - Model: yolov4-tiny: 18.17 ms (fewer is better; SE +/- 0.19, N = 9; MIN 14.35 / MAX 48.07)

NCNN 20230517 - Target: Vulkan GPU - Model: resnet50: 22.13 ms (fewer is better; SE +/- 0.46, N = 9; MIN 13.15 / MAX 49.98)

NCNN 20230517 - Target: Vulkan GPU - Model: alexnet: 5.49 ms (fewer is better; SE +/- 0.01, N = 8; MIN 5.21 / MAX 6.7)

NCNN 20230517 - Target: Vulkan GPU - Model: resnet18: 7.29 ms (fewer is better; SE +/- 0.07, N = 9; MIN 5.96 / MAX 11.88)

NCNN 20230517 - Target: Vulkan GPU - Model: vgg16: 28.61 ms (fewer is better; SE +/- 0.01, N = 9; MIN 27.79 / MAX 31.77)

NCNN 20230517 - Target: Vulkan GPU - Model: googlenet: 19.43 ms (fewer is better; SE +/- 0.55, N = 9; MIN 9.05 / MAX 43.99)

NCNN 20230517 - Target: Vulkan GPU - Model: blazeface: 6.00 ms (fewer is better; SE +/- 0.11, N = 8; MIN 2.81 / MAX 17.23)

NCNN 20230517 - Target: Vulkan GPU - Model: efficientnet-b0: 18.73 ms (fewer is better; SE +/- 0.83, N = 9; MIN 7.31 / MAX 37.64)

NCNN 20230517 - Target: Vulkan GPU - Model: mnasnet: 5.43 ms (fewer is better; SE +/- 0.03, N = 9; MIN 4.16 / MAX 9.81)

NCNN 20230517 - Target: Vulkan GPU - Model: shufflenet-v2: 10.51 ms (fewer is better; SE +/- 0.50, N = 9; MIN 4.55 / MAX 29.19)

NCNN 20230517 - Target: Vulkan GPU-v3-v3 - Model: mobilenet-v3: 7.93 ms (fewer is better; SE +/- 0.60, N = 9; MIN 4.48 / MAX 21.46)

NCNN 20230517 - Target: Vulkan GPU-v2-v2 - Model: mobilenet-v2: 5.72 ms (fewer is better; SE +/- 0.13, N = 9; MIN 4.2 / MAX 19.44)

NCNN 20230517 - Target: Vulkan GPU - Model: mobilenet: 17.34 ms (fewer is better; SE +/- 0.37, N = 9; MIN 9.39 / MAX 40.25)

NCNN 20230517 - Target: CPU - Model: FastestDet: 6.47 ms (fewer is better; SE +/- 0.13, N = 9; MIN 4.73 / MAX 17.1)

NCNN 20230517 - Target: CPU - Model: vision_transformer: 70.73 ms (fewer is better; SE +/- 0.06, N = 9; MIN 67.43 / MAX 80.71)

NCNN 20230517 - Target: CPU - Model: regnety_400m: 82.63 ms (fewer is better; SE +/- 0.75, N = 9; MIN 22.37 / MAX 120.26)

NCNN 20230517 - Target: CPU - Model: squeezenet_ssd: 13.62 ms (fewer is better; SE +/- 0.29, N = 9; MIN 8.06 / MAX 30.15)

NCNN 20230517 - Target: CPU - Model: yolov4-tiny: 18.40 ms (fewer is better; SE +/- 0.31, N = 9; MIN 14.15 / MAX 42.77)

NCNN 20230517 - Target: CPU - Model: resnet50: 22.09 ms (fewer is better; SE +/- 0.34, N = 9; MIN 13.37 / MAX 48.04)

NCNN 20230517 - Target: CPU - Model: alexnet: 5.58 ms (fewer is better; SE +/- 0.05, N = 9; MIN 5.23 / MAX 13.31)

NCNN 20230517 - Target: CPU - Model: resnet18: 7.50 ms (fewer is better; SE +/- 0.04, N = 9; MIN 5.96 / MAX 14.47)

NCNN 20230517 - Target: CPU - Model: vgg16: 28.60 ms (fewer is better; SE +/- 0.01, N = 9; MIN 27.69 / MAX 31.26)

NCNN 20230517 - Target: CPU - Model: googlenet: 19.65 ms (fewer is better; SE +/- 0.62, N = 9; MIN 9.41 / MAX 46.01)

NCNN 20230517 - Target: CPU - Model: blazeface: 5.95 ms (fewer is better; SE +/- 0.11, N = 9; MIN 2.83 / MAX 16.59)

NCNN 20230517 - Target: CPU - Model: efficientnet-b0: 19.35 ms (fewer is better; SE +/- 0.78, N = 9; MIN 7.5 / MAX 39.35)

NCNN 20230517 - Target: CPU - Model: mnasnet: 5.46 ms (fewer is better; SE +/- 0.05, N = 9; MIN 4.13 / MAX 13.57)

NCNN 20230517 - Target: CPU - Model: shufflenet-v2: 11.05 ms (fewer is better; SE +/- 0.58, N = 9; MIN 4.48 / MAX 28.4)

NCNN 20230517 - Target: CPU-v3-v3 - Model: mobilenet-v3: 8.28 ms (fewer is better; SE +/- 0.34, N = 9; MIN 4.71 / MAX 21.8)

NCNN 20230517 - Target: CPU-v2-v2 - Model: mobilenet-v2: 5.53 ms (fewer is better; SE +/- 0.07, N = 9; MIN 4.11 / MAX 12.43)

NCNN 20230517 - Target: CPU - Model: mobilenet: 16.46 ms (fewer is better; SE +/- 0.47, N = 9; MIN 9.33 / MAX 40.48)

Scikit-Learn

Scikit-learn is a Python module for machine learning built on NumPy, SciPy, and is BSD-licensed. Learn more via the OpenBenchmarking.org test page.

Scikit-Learn 1.2.2 - Benchmark: SGDOneClassSVM: 196.47 seconds (fewer is better; SE +/- 1.90, N = 6)

TensorFlow

This is a benchmark of the TensorFlow deep learning framework using the TensorFlow reference benchmarks (tensorflow/benchmarks with tf_cnn_benchmarks.py). Note with the Phoronix Test Suite there is also pts/tensorflow-lite for benchmarking the TensorFlow Lite binaries if desired for complementary metrics. Learn more via the OpenBenchmarking.org test page.

TensorFlow 2.16.1 - Device: GPU - Batch Size: 32 - Model: ResNet-50: 7.98 images/sec (more is better; SE +/- 0.01, N = 3)

Scikit-Learn

Scikit-learn is a Python module for machine learning built on NumPy, SciPy, and is BSD-licensed. Learn more via the OpenBenchmarking.org test page.

Scikit-Learn 1.2.2 - Benchmark: Plot Parallel Pairwise: 332.19 seconds (fewer is better; SE +/- 1.13, N = 3)

Scikit-Learn 1.2.2 - Benchmark: Hist Gradient Boosting Higgs Boson: 190.58 seconds (fewer is better; SE +/- 2.11, N = 5)

Scikit-Learn 1.2.2 - Benchmark: Covertype Dataset Benchmark: 317.46 seconds (fewer is better; SE +/- 0.13, N = 3)

TensorFlow

This is a benchmark of the TensorFlow deep learning framework using the TensorFlow reference benchmarks (tensorflow/benchmarks with tf_cnn_benchmarks.py). Note with the Phoronix Test Suite there is also pts/tensorflow-lite for benchmarking the TensorFlow Lite binaries if desired for complementary metrics. Learn more via the OpenBenchmarking.org test page.

TensorFlow 2.16.1 - Device: CPU - Batch Size: 32 - Model: VGG-16: 9.22 images/sec (more is better; SE +/- 0.02, N = 3)

TensorFlow 2.16.1 - Device: GPU - Batch Size: 1 - Model: VGG-16: 1.57 images/sec (more is better; SE +/- 0.02, N = 15)

Scikit-Learn

Scikit-learn is a Python module for machine learning built on NumPy, SciPy, and is BSD-licensed. Learn more via the OpenBenchmarking.org test page.

Scikit-Learn 1.2.2 - Benchmark: Lasso: 258.72 seconds (fewer is better; SE +/- 1.16, N = 3)

PyTorch

This is a benchmark of PyTorch making use of pytorch-benchmark [https://github.com/LukasHedegaard/pytorch-benchmark]. Learn more via the OpenBenchmarking.org test page.

PyTorch 2.2.1 - Device: CPU - Batch Size: 256 - Model: ResNet-50: 29.33 batches/sec (more is better; SE +/- 0.38, N = 15; MIN 25.91 / MAX 30.51)

PyTorch 2.2.1 - Device: CPU - Batch Size: 64 - Model: ResNet-50: 29.52 batches/sec (more is better; SE +/- 0.32, N = 15; MIN 25.74 / MAX 30.64)

PyTorch 2.2.1 - Device: CPU - Batch Size: 1 - Model: ResNet-152: 18.80 batches/sec (more is better; SE +/- 0.44, N = 15; MIN 15.08 / MAX 19.89)

TensorFlow Lite

This is a benchmark of the TensorFlow Lite implementation focused on TensorFlow machine learning for mobile, IoT, edge, and other cases. The current Linux support is limited to running on CPUs. This test profile is measuring the average inference time. Learn more via the OpenBenchmarking.org test page.
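Since these results are average inference times on the CPU, a hedged sketch of timing a single .tflite model with the Python interpreter API might look like the following ("model.tflite" is a placeholder path, not a file shipped with this result, and the run count is arbitrary):

    import time
    import numpy as np
    import tensorflow as tf

    interpreter = tf.lite.Interpreter(model_path="model.tflite")   # placeholder model file
    interpreter.allocate_tensors()

    inp = interpreter.get_input_details()[0]
    data = np.random.rand(*inp["shape"]).astype(inp["dtype"])      # random input tensor

    times = []
    for _ in range(50):
        interpreter.set_tensor(inp["index"], data)
        start = time.time()
        interpreter.invoke()                                       # one inference
        times.append(time.time() - start)

    print("average inference time (us):", 1e6 * sum(times) / len(times))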

TensorFlow Lite 2022-05-18 - Model: NASNet Mobile: 311542 microseconds (fewer is better; SE +/- 11554.76, N = 15)

ONNX Runtime

ONNX Runtime 1.19 - Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel: 604.55 ms inference time cost (fewer is better; SE +/- 6.22, N = 15)

ONNX Runtime 1.19 - Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel: 1.65652 inferences per second (more is better; SE +/- 0.01666, N = 15)

TensorFlow Lite

This is a benchmark of the TensorFlow Lite implementation focused on TensorFlow machine learning for mobile, IoT, edge, and other cases. The current Linux support is limited to running on CPUs. This test profile is measuring the average inference time. Learn more via the OpenBenchmarking.org test page.

TensorFlow Lite 2022-05-18 - Model: Inception ResNet V2: 128179 microseconds (fewer is better; SE +/- 4137.26, N = 15)

TensorFlow Lite 2022-05-18 - Model: Inception V4: 26961.0 microseconds (fewer is better; SE +/- 395.13, N = 15)

TensorFlow

This is a benchmark of the TensorFlow deep learning framework using the TensorFlow reference benchmarks (tensorflow/benchmarks with tf_cnn_benchmarks.py). Note with the Phoronix Test Suite there is also pts/tensorflow-lite for benchmarking the TensorFlow Lite binaries if desired for complementary metrics. Learn more via the OpenBenchmarking.org test page.

TensorFlow 2.16.1 - Device: CPU - Batch Size: 256 - Model: GoogLeNet: 93.45 images/sec (more is better; SE +/- 0.03, N = 3)

ONNX Runtime

ONNX Runtime 1.19 - Model: yolov4 - Device: CPU - Executor: Parallel: 95.56 ms inference time cost (fewer is better; SE +/- 1.36, N = 15)

ONNX Runtime 1.19 - Model: yolov4 - Device: CPU - Executor: Parallel: 10.49 inferences per second (more is better; SE +/- 0.14, N = 15)

ONNX Runtime 1.19 - Model: T5 Encoder - Device: CPU - Executor: Parallel: 8.68614 ms inference time cost (fewer is better; SE +/- 0.05490, N = 15)

ONNX Runtime 1.19 - Model: T5 Encoder - Device: CPU - Executor: Parallel: 115.17 inferences per second (more is better; SE +/- 0.72, N = 15)

TensorFlow Lite

This is a benchmark of the TensorFlow Lite implementation focused on TensorFlow machine learning for mobile, IoT, edge, and other cases. The current Linux support is limited to running on CPUs. This test profile is measuring the average inference time. Learn more via the OpenBenchmarking.org test page.

TensorFlow Lite 2022-05-18 - Model: Mobilenet Float: 1383.02 microseconds (fewer is better; SE +/- 17.51, N = 15)

TensorFlow Lite 2022-05-18 - Model: SqueezeNet: 1904.01 microseconds (fewer is better; SE +/- 15.29, N = 15)

ONNX Runtime

ONNX Runtime 1.19 - Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel: 6.14425 ms inference time cost (fewer is better; SE +/- 0.05367, N = 15)

ONNX Runtime 1.19 - Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel: 162.89 inferences per second (more is better; SE +/- 1.41, N = 15)

ONNX Runtime 1.19 - Model: super-resolution-10 - Device: CPU - Executor: Parallel: 13.94 ms inference time cost (fewer is better; SE +/- 0.12, N = 15)

ONNX Runtime 1.19 - Model: super-resolution-10 - Device: CPU - Executor: Parallel: 71.81 inferences per second (more is better; SE +/- 0.62, N = 15)

PyTorch

This is a benchmark of PyTorch making use of pytorch-benchmark [https://github.com/LukasHedegaard/pytorch-benchmark]. Learn more via the OpenBenchmarking.org test page.

PyTorch 2.2.1 - Device: CPU - Batch Size: 256 - Model: Efficientnet_v2_l: 6.48 batches/sec (more is better; SE +/- 0.01, N = 3; MIN 6.44 / MAX 6.62)

TensorFlow

This is a benchmark of the TensorFlow deep learning framework using the TensorFlow reference benchmarks (tensorflow/benchmarks with tf_cnn_benchmarks.py). Note with the Phoronix Test Suite there is also pts/tensorflow-lite for benchmarking the TensorFlow Lite binaries if desired for complementary metrics. Learn more via the OpenBenchmarking.org test page.

TensorFlow 2.16.1 - Device: CPU - Batch Size: 64 - Model: ResNet-50: 26.52 images/sec (more is better; SE +/- 0.01, N = 3)

TensorFlow 2.16.1 - Device: GPU - Batch Size: 64 - Model: GoogLeNet: 26.91 images/sec (more is better; SE +/- 0.01, N = 3)

Scikit-Learn

Scikit-learn is a Python module for machine learning built on NumPy, SciPy, and is BSD-licensed. Learn more via the OpenBenchmarking.org test page.

Scikit-Learn 1.2.2 - Benchmark: Hist Gradient Boosting Categorical Only: 195.30 seconds (fewer is better; SE +/- 0.28, N = 3)

PyTorch

This is a benchmark of PyTorch making use of pytorch-benchmark [https://github.com/LukasHedegaard/pytorch-benchmark]. Learn more via the OpenBenchmarking.org test page.

PyTorch 2.2.1 - Device: CPU - Batch Size: 64 - Model: Efficientnet_v2_l: 7.82 batches/sec (more is better; SE +/- 0.07, N = 3; MIN 6.39 / MAX 7.98)

PyTorch 2.2.1 - Device: CPU - Batch Size: 32 - Model: Efficientnet_v2_l: 7.92 batches/sec (more is better; SE +/- 0.05, N = 3; MIN 7.36 / MAX 8.02)

Scikit-Learn

Scikit-learn is a Python module for machine learning built on NumPy, SciPy, and is BSD-licensed. Learn more via the OpenBenchmarking.org test page.

Scikit-Learn 1.2.2 - Benchmark: TSNE MNIST Dataset: 188.33 seconds (fewer is better; SE +/- 0.44, N = 3)

Scikit-Learn 1.2.2 - Benchmark: GLM: 183.48 seconds (fewer is better; SE +/- 0.31, N = 3)

TensorFlow

This is a benchmark of the TensorFlow deep learning framework using the TensorFlow reference benchmarks (tensorflow/benchmarks with tf_cnn_benchmarks.py). Note with the Phoronix Test Suite there is also pts/tensorflow-lite for benchmarking the TensorFlow Lite binaries if desired for complementary metrics. Learn more via the OpenBenchmarking.org test page.

TensorFlow 2.16.1 - Device: CPU - Batch Size: 512 - Model: AlexNet: 245.67 images/sec (more is better; SE +/- 0.02, N = 3)

Scikit-Learn

Scikit-learn is a Python module for machine learning built on NumPy, SciPy, and is BSD-licensed. Learn more via the OpenBenchmarking.org test page.

Scikit-Learn 1.2.2 - Benchmark: Tree: 42.77 seconds (fewer is better; SE +/- 0.38, N = 15)

TensorFlow

This is a benchmark of the TensorFlow deep learning framework using the TensorFlow reference benchmarks (tensorflow/benchmarks with tf_cnn_benchmarks.py). Note with the Phoronix Test Suite there is also pts/tensorflow-lite for benchmarking the TensorFlow Lite binaries if desired for complementary metrics. Learn more via the OpenBenchmarking.org test page.

TensorFlow 2.16.1 - Device: GPU - Batch Size: 16 - Model: ResNet-50: 7.93 images/sec (more is better; SE +/- 0.01, N = 3)

Scikit-Learn

Scikit-learn is a Python module for machine learning built on NumPy, SciPy, and is BSD-licensed. Learn more via the OpenBenchmarking.org test page.

Scikit-Learn 1.2.2 - Benchmark: Isolation Forest: 147.52 seconds (fewer is better; SE +/- 0.04, N = 3)

Whisper.cpp

Whisper.cpp 1.6.2 - Model: ggml-base.en - Input: 2016 State of the Union: 209.02 seconds (fewer is better; SE +/- 0.74, N = 3)

Scikit-Learn

Scikit-learn is a Python module for machine learning built on NumPy, SciPy, and is BSD-licensed. Learn more via the OpenBenchmarking.org test page.

Scikit-Learn 1.2.2 - Benchmark: Plot Hierarchical: 153.06 seconds (fewer is better; SE +/- 0.14, N = 3)

TensorFlow

This is a benchmark of the TensorFlow deep learning framework using the TensorFlow reference benchmarks (tensorflow/benchmarks with tf_cnn_benchmarks.py). Note with the Phoronix Test Suite there is also pts/tensorflow-lite for benchmarking the TensorFlow Lite binaries if desired for complementary metrics. Learn more via the OpenBenchmarking.org test page.

TensorFlow 2.16.1 - Device: CPU - Batch Size: 16 - Model: VGG-16: 8.97 images/sec (more is better; SE +/- 0.06, N = 3)

Scikit-Learn

Scikit-learn is a Python module for machine learning built on NumPy, SciPy, and is BSD-licensed. Learn more via the OpenBenchmarking.org test page.

Scikit-Learn 1.2.2 - Benchmark: LocalOutlierFactor: 35.65 seconds (fewer is better; SE +/- 0.32, N = 15)

Scikit-Learn 1.2.2 - Benchmark: Hist Gradient Boosting Threading: 140.75 seconds (fewer is better; SE +/- 1.02, N = 3)

PyTorch

This is a benchmark of PyTorch making use of pytorch-benchmark [https://github.com/LukasHedegaard/pytorch-benchmark]. Learn more via the OpenBenchmarking.org test page.

PyTorch 2.2.1 - Device: CPU - Batch Size: 32 - Model: ResNet-152: 10.16 batches/sec (more is better; SE +/- 0.01, N = 3; MIN 10.01 / MAX 10.25)

Scikit-Learn

Scikit-learn is a Python module for machine learning built on NumPy, SciPy, and is BSD-licensed. Learn more via the OpenBenchmarking.org test page.

Scikit-Learn 1.2.2 - Benchmark: Plot Polynomial Kernel Approximation: 133.12 seconds (fewer is better; SE +/- 0.20, N = 3)

TensorFlow

This is a benchmark of the TensorFlow deep learning framework using the TensorFlow reference benchmarks (tensorflow/benchmarks with tf_cnn_benchmarks.py). Note with the Phoronix Test Suite there is also pts/tensorflow-lite for benchmarking the TensorFlow Lite binaries if desired for complementary metrics. Learn more via the OpenBenchmarking.org test page.

TensorFlow 2.16.1 - Device: GPU - Batch Size: 64 - Model: AlexNet: 42.49 images/sec (more is better; SE +/- 0.02, N = 3)

Scikit-Learn

Scikit-learn is a Python module for machine learning built on NumPy, SciPy, and is BSD-licensed. Learn more via the OpenBenchmarking.org test page.

Scikit-Learn 1.2.2 - Benchmark: Feature Expansions: 120.33 seconds (fewer is better; SE +/- 0.09, N = 3)

PyTorch

This is a benchmark of PyTorch making use of pytorch-benchmark [https://github.com/LukasHedegaard/pytorch-benchmark]. Learn more via the OpenBenchmarking.org test page.

PyTorch 2.2.1 - Device: CPU - Batch Size: 64 - Model: ResNet-152: 11.66 batches/sec (more is better; SE +/- 0.05, N = 3; MIN 11.02 / MAX 11.79)

TensorFlow

This is a benchmark of the TensorFlow deep learning framework using the TensorFlow reference benchmarks (tensorflow/benchmarks with tf_cnn_benchmarks.py). Note with the Phoronix Test Suite there is also pts/tensorflow-lite for benchmarking the TensorFlow Lite binaries if desired for complementary metrics. Learn more via the OpenBenchmarking.org test page.

TensorFlow 2.16.1 - Device: GPU - Batch Size: 32 - Model: GoogLeNet: 26.63 images/sec (more is better; SE +/- 0.04, N = 3)

TensorFlow 2.16.1 - Device: CPU - Batch Size: 32 - Model: ResNet-50: 26.82 images/sec (more is better; SE +/- 0.01, N = 3)

Scikit-Learn

Scikit-learn is a Python module for machine learning built on NumPy, SciPy, and is BSD-licensed. Learn more via the OpenBenchmarking.org test page.

Scikit-Learn 1.2.2 - Benchmark: Plot Neighbors: 96.76 seconds (fewer is better; SE +/- 0.18, N = 3)

TensorFlow

This is a benchmark of the TensorFlow deep learning framework using the TensorFlow reference benchmarks (tensorflow/benchmarks with tf_cnn_benchmarks.py). Note with the Phoronix Test Suite there is also pts/tensorflow-lite for benchmarking the TensorFlow Lite binaries if desired for complementary metrics. Learn more via the OpenBenchmarking.org test page.

TensorFlow 2.16.1 - Device: CPU - Batch Size: 256 - Model: AlexNet: 227.40 images/sec (more is better; SE +/- 0.07, N = 3)

Scikit-Learn

Scikit-learn is a Python module for machine learning built on NumPy, SciPy, and is BSD-licensed. Learn more via the OpenBenchmarking.org test page.

Scikit-Learn 1.2.2 - Benchmark: Plot Incremental PCA: 21.31 seconds (fewer is better; SE +/- 0.18, N = 15)

Scikit-Learn 1.2.2 - Benchmark: Sparsify: 85.98 seconds (fewer is better; SE +/- 0.25, N = 3)

Scikit-Learn 1.2.2 - Benchmark: Sample Without Replacement: 84.27 seconds (fewer is better; SE +/- 0.15, N = 3)

OpenCV

This is a benchmark of the OpenCV (Computer Vision) library's built-in performance tests. Learn more via the OpenBenchmarking.org test page.
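The DNN test below reports milliseconds through OpenCV's dnn module. A minimal hedged sketch of the same kind of measurement (the ONNX file name is a placeholder and the preprocessing parameters are generic, not those of OpenCV's built-in perf test):

    import time
    import numpy as np
    import cv2

    net = cv2.dnn.readNetFromONNX("model.onnx")   # placeholder model path
    image = np.random.randint(0, 255, (224, 224, 3), dtype=np.uint8)
    blob = cv2.dnn.blobFromImage(image, scalefactor=1.0 / 255, size=(224, 224))

    net.setInput(blob)
    net.forward()                                  # warm-up pass

    runs = 50
    start = time.time()
    for _ in range(runs):
        net.setInput(blob)
        net.forward()                              # timed forward passes
    print("ms per inference:", 1000 * (time.time() - start) / runs)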

OpenCV 4.7 - Test: DNN - Deep Neural Network: 25552 ms (fewer is better; SE +/- 635.50, N = 13)

Mobile Neural Network

Mobile Neural Network 2.9.b11b7037d - Model: inception-v3: 21.37 ms (fewer is better; SE +/- 0.04, N = 3; MIN 20.43 / MAX 62.25)

Mobile Neural Network 2.9.b11b7037d - Model: mobilenet-v1-1.0: 2.084 ms (fewer is better; SE +/- 0.015, N = 3; MIN 1.99 / MAX 26.25)

Mobile Neural Network 2.9.b11b7037d - Model: MobileNetV2_224: 1.882 ms (fewer is better; SE +/- 0.011, N = 3; MIN 1.79 / MAX 33.11)

Mobile Neural Network 2.9.b11b7037d - Model: SqueezeNetV1.0: 2.935 ms (fewer is better; SE +/- 0.049, N = 3; MIN 2.7 / MAX 34.14)

Mobile Neural Network 2.9.b11b7037d - Model: resnet-v2-50: 14.36 ms (fewer is better; SE +/- 0.08, N = 3; MIN 13.6 / MAX 59.49)

Mobile Neural Network 2.9.b11b7037d - Model: squeezenetv1.1: 1.836 ms (fewer is better; SE +/- 0.006, N = 3; MIN 1.71 / MAX 31.7)

Mobile Neural Network 2.9.b11b7037d - Model: mobilenetV3: 1.012 ms (fewer is better; SE +/- 0.003, N = 3; MIN 0.93 / MAX 24.1)

Mobile Neural Network 2.9.b11b7037d - Model: nasnet: 6.956 ms (fewer is better; SE +/- 0.004, N = 3; MIN 6.61 / MAX 53.04)

PyTorch

This is a benchmark of PyTorch making use of pytorch-benchmark [https://github.com/LukasHedegaard/pytorch-benchmark]. Learn more via the OpenBenchmarking.org test page.
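
The batches/sec figures below are produced by pytorch-benchmark; a minimal hand-rolled equivalent for CPU inference might look like the following sketch, assuming torch and torchvision are installed (the model, batch size, and iteration count here are illustrative, not the test profile's settings).

    import time
    import torch
    from torchvision.models import resnet50

    model = resnet50().eval()                         # untrained ResNet-50
    batch = torch.randn(16, 3, 224, 224)
    iters = 10

    with torch.no_grad():
        model(batch)                                  # warm-up
        start = time.perf_counter()
        for _ in range(iters):
            model(batch)
    print("batches/sec:", iters / (time.perf_counter() - start))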

PyTorch 2.2.1 (batches/sec, More Is Better) - Device: CPU - Batch Size: 1 - Model: Efficientnet_v2_l - ASUS NVIDIA GeForce RTX 3090: 10.55 (SE +/- 0.12, N = 3; MIN: 6.85 / MAX: 13.27)

Numpy Benchmark

This is a test to obtain the general Numpy performance. Learn more via the OpenBenchmarking.org test page.
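
The Numpy Benchmark reports a composite score. A simplified sketch of the kind of kernels it exercises (matrix multiply, SVD, FFT) is shown below, assuming only NumPy; the array sizes and selection of operations are illustrative rather than the benchmark's actual mix.

    import time
    import numpy as np

    a = np.random.rand(2048, 2048)
    b = np.random.rand(2048, 2048)

    start = time.perf_counter()
    np.dot(a, b)                      # dense matrix multiply
    np.linalg.svd(a[:512, :512])      # singular value decomposition
    np.fft.fft2(a)                    # 2D FFT
    print("seconds:", time.perf_counter() - start)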

Numpy Benchmark (Score, More Is Better) - ASUS NVIDIA GeForce RTX 3090: 644.54 (SE +/- 0.55, N = 3)

Scikit-Learn

Scikit-learn is a Python module for machine learning built on NumPy, SciPy, and is BSD-licensed. Learn more via the OpenBenchmarking.org test page.

Scikit-Learn 1.2.2 (Seconds, Fewer Is Better) - Benchmark: Kernel PCA Solvers / Time vs. N Samples - ASUS NVIDIA GeForce RTX 3090: 69.63 (SE +/- 0.35, N = 3) [(F9X) gfortran options: -O0]

TensorFlow

This is a benchmark of the TensorFlow deep learning framework using the TensorFlow reference benchmarks (tensorflow/benchmarks with tf_cnn_benchmarks.py). Note with the Phoronix Test Suite there is also pts/tensorflow-lite for benchmarking the TensorFlow Lite binaries if desired for complementary metrics. Learn more via the OpenBenchmarking.org test page.

TensorFlow 2.16.1 (images/sec, More Is Better) - Device: GPU - Batch Size: 32 - Model: AlexNet - ASUS NVIDIA GeForce RTX 3090: 41.45 (SE +/- 0.40, N = 3)

Scikit-Learn

Scikit-learn is a Python module for machine learning built on NumPy, SciPy, and is BSD-licensed. Learn more via the OpenBenchmarking.org test page.

Scikit-Learn 1.2.2 (Seconds, Fewer Is Better) - Benchmark: SGD Regression - ASUS NVIDIA GeForce RTX 3090: 63.79 (SE +/- 0.12, N = 3) [(F9X) gfortran options: -O0]

ONNX Runtime

ONNX Runtime 1.19 - Model: yolov4 - Device: CPU - Executor: Standard - ASUS NVIDIA GeForce RTX 3090:
Inference Time Cost (ms), Fewer Is Better: 89.76 (SE +/- 1.09, N = 4)
Inferences Per Second, More Is Better: 11.15 (SE +/- 0.13, N = 4)
All results above: (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

oneDNN

oneDNN 3.6 (ms, Fewer Is Better) - Harness: Recurrent Neural Network Training - Engine: CPU - ASUS NVIDIA GeForce RTX 3090: 2702.00 (SE +/- 1.60, N = 3; MIN: 2689.78) [(CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl]

TensorFlow

This is a benchmark of the TensorFlow deep learning framework using the TensorFlow reference benchmarks (tensorflow/benchmarks with tf_cnn_benchmarks.py). Note with the Phoronix Test Suite there is also pts/tensorflow-lite for benchmarking the TensorFlow Lite binaries if desired for complementary metrics. Learn more via the OpenBenchmarking.org test page.

TensorFlow 2.16.1 (images/sec, More Is Better) - Device: CPU - Batch Size: 64 - Model: GoogLeNet - ASUS NVIDIA GeForce RTX 3090: 94.21 (SE +/- 0.26, N = 3)

oneDNN

oneDNN 3.6 (ms, Fewer Is Better) - Harness: Recurrent Neural Network Inference - Engine: CPU - ASUS NVIDIA GeForce RTX 3090: 1452.76 (SE +/- 2.94, N = 3; MIN: 1432.32) [(CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl]

Scikit-Learn

Scikit-learn is a Python module for machine learning built on NumPy, SciPy, and is BSD-licensed. Learn more via the OpenBenchmarking.org test page.

Scikit-Learn 1.2.2 (Seconds, Fewer Is Better) - Benchmark: MNIST Dataset - ASUS NVIDIA GeForce RTX 3090: 53.54 (SE +/- 0.03, N = 3) [(F9X) gfortran options: -O0]

TensorFlow

This is a benchmark of the TensorFlow deep learning framework using the TensorFlow reference benchmarks (tensorflow/benchmarks with tf_cnn_benchmarks.py). Note with the Phoronix Test Suite there is also pts/tensorflow-lite for benchmarking the TensorFlow Lite binaries if desired for complementary metrics. Learn more via the OpenBenchmarking.org test page.

TensorFlow 2.16.1 (images/sec, More Is Better) - ASUS NVIDIA GeForce RTX 3090:
Device: GPU - Batch Size: 16 - Model: GoogLeNet: 25.99 (SE +/- 0.06, N = 3)
Device: CPU - Batch Size: 16 - Model: ResNet-50: 27.80 (SE +/- 0.02, N = 3)

PyTorch

This is a benchmark of PyTorch making use of pytorch-benchmark [https://github.com/LukasHedegaard/pytorch-benchmark]. Learn more via the OpenBenchmarking.org test page.

PyTorch 2.2.1 (batches/sec, More Is Better) - Device: CPU - Batch Size: 16 - Model: ResNet-50 - ASUS NVIDIA GeForce RTX 3090: 30.07 (SE +/- 0.12, N = 3; MIN: 28.1 / MAX: 30.36)

OpenVINO

This is a test of the Intel OpenVINO, a toolkit around neural networks, using its built-in benchmarking support and analyzing the throughput and latency for various models. Learn more via the OpenBenchmarking.org test page.
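
The OpenVINO numbers below come from the toolkit's built-in benchmarking support. For orientation only, a minimal Python sketch of compiling a model and measuring latency and throughput on the CPU follows; "model.xml" and the 1x3x224x224 input shape are assumptions, not artifacts of this test profile.

    import time
    import numpy as np
    import openvino as ov

    core = ov.Core()
    compiled = core.compile_model(core.read_model("model.xml"), "CPU")  # placeholder IR path
    inp = np.random.rand(1, 3, 224, 224).astype(np.float32)             # assumed input shape

    n = 100
    start = time.perf_counter()
    for _ in range(n):
        compiled([inp])
    elapsed = time.perf_counter() - start
    print("FPS:", n / elapsed, "avg ms:", 1000 * elapsed / n)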

OpenVINO 2024.0 - Model: Face Detection FP16 - Device: CPU - ASUS NVIDIA GeForce RTX 3090:
ms, Fewer Is Better: 1375.14 (SE +/- 6.34, N = 3; MIN: 1104.54 / MAX: 1761.35)
FPS, More Is Better: 4.33 (SE +/- 0.03, N = 3)
All results above: (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl

PyTorch

This is a benchmark of PyTorch making use of pytorch-benchmark [https://github.com/LukasHedegaard/pytorch-benchmark]. Learn more via the OpenBenchmarking.org test page.

PyTorch 2.2.1 (batches/sec, More Is Better) - Device: CPU - Batch Size: 512 - Model: ResNet-50 - ASUS NVIDIA GeForce RTX 3090: 30.14 (SE +/- 0.24, N = 3; MIN: 18.07 / MAX: 30.74)

ONNX Runtime

ONNX Runtime 1.19 - Model: ResNet101_DUC_HDC-12 - Device: CPU - ASUS NVIDIA GeForce RTX 3090:
Executor: Parallel - Inference Time Cost (ms), Fewer Is Better: 1345.66 (SE +/- 19.21, N = 3)
Executor: Parallel - Inferences Per Second, More Is Better: 0.743436 (SE +/- 0.010713, N = 3)
Executor: Standard - Inference Time Cost (ms), Fewer Is Better: 1059.49 (SE +/- 2.93, N = 3)
Executor: Standard - Inferences Per Second, More Is Better: 0.943867 (SE +/- 0.002603, N = 3)
All results above: (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

OpenVINO

This is a test of the Intel OpenVINO, a toolkit around neural networks, using its built-in benchmarking support and analyzing the throughput and latency for various models. Learn more via the OpenBenchmarking.org test page.

OpenVINO 2024.0 - Model: Face Detection FP16-INT8 - Device: CPU - ASUS NVIDIA GeForce RTX 3090:
ms, Fewer Is Better: 382.74 (SE +/- 0.21, N = 3; MIN: 223.52 / MAX: 881.7)
FPS, More Is Better: 15.61 (SE +/- 0.05, N = 3)
All results above: (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl

PyTorch

This is a benchmark of PyTorch making use of pytorch-benchmark [https://github.com/LukasHedegaard/pytorch-benchmark]. Learn more via the OpenBenchmarking.org test page.

PyTorch 2.2.1 (batches/sec, More Is Better) - Device: CPU - Batch Size: 32 - Model: ResNet-50 - ASUS NVIDIA GeForce RTX 3090: 30.27 (SE +/- 0.11, N = 3; MIN: 28.61 / MAX: 30.57)

ONNX Runtime

ONNX Runtime 1.19 - Model: fcn-resnet101-11 - Device: CPU - Executor: Standard - ASUS NVIDIA GeForce RTX 3090:
Inference Time Cost (ms), Fewer Is Better: 400.35 (SE +/- 0.32, N = 3)
Inferences Per Second, More Is Better: 2.49779 (SE +/- 0.00202, N = 3)
All results above: (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

OpenVINO

This is a test of the Intel OpenVINO, a toolkit around neural networks, using its built-in benchmarking support and analyzing the throughput and latency for various models. Learn more via the OpenBenchmarking.org test page.

OpenVINO 2024.0 - Device: CPU - ASUS NVIDIA GeForce RTX 3090:
Model: Machine Translation EN To DE FP16 - ms, Fewer Is Better: 115.87 (SE +/- 0.40, N = 3; MIN: 60.69 / MAX: 196.8)
Model: Machine Translation EN To DE FP16 - FPS, More Is Better: 51.69 (SE +/- 0.18, N = 3)
Model: Person Detection FP16 - ms, Fewer Is Better: 146.25 (SE +/- 0.46, N = 3; MIN: 120.28 / MAX: 197.97)
Model: Person Detection FP16 - FPS, More Is Better: 40.98 (SE +/- 0.13, N = 3)
Model: Person Detection FP32 - ms, Fewer Is Better: 144.47 (SE +/- 0.71, N = 3; MIN: 72.51 / MAX: 211.86)
Model: Person Detection FP32 - FPS, More Is Better: 41.47 (SE +/- 0.20, N = 3)
All results above: (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl

ONNX Runtime

ONNX Runtime 1.19 - Model: ZFNet-512 - Device: CPU - ASUS NVIDIA GeForce RTX 3090:
Executor: Parallel - Inference Time Cost (ms), Fewer Is Better: 12.59 (SE +/- 0.18, N = 3)
Executor: Parallel - Inferences Per Second, More Is Better: 79.43 (SE +/- 1.14, N = 3)
Executor: Standard - Inference Time Cost (ms), Fewer Is Better: 10.81 (SE +/- 0.07, N = 3)
Executor: Standard - Inferences Per Second, More Is Better: 92.47 (SE +/- 0.61, N = 3)
All results above: (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

OpenVINO

This is a test of the Intel OpenVINO, a toolkit around neural networks, using its built-in benchmarking support and analyzing the throughput and latency for various models. Learn more via the OpenBenchmarking.org test page.

OpenVINO 2024.0 - Model: Road Segmentation ADAS FP16-INT8 - Device: CPU - ASUS NVIDIA GeForce RTX 3090:
ms, Fewer Is Better: 21.09 (SE +/- 0.05, N = 3; MIN: 9.8 / MAX: 54.63)
FPS, More Is Better: 283.34 (SE +/- 0.63, N = 3)
All results above: (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl

ONNX Runtime

ONNX Runtime 1.19 - Model: T5 Encoder - Device: CPU - Executor: Standard - ASUS NVIDIA GeForce RTX 3090:
Inference Time Cost (ms), Fewer Is Better: 6.54304 (SE +/- 0.01981, N = 3)
Inferences Per Second, More Is Better: 152.80 (SE +/- 0.46, N = 3)
All results above: (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

Scikit-Learn

Scikit-learn is a Python module for machine learning built on NumPy, SciPy, and is BSD-licensed. Learn more via the OpenBenchmarking.org test page.

Scikit-Learn 1.2.2 (Seconds, Fewer Is Better) - Benchmark: Plot OMP vs. LARS - ASUS NVIDIA GeForce RTX 3090: 45.62 (SE +/- 0.29, N = 3) [(F9X) gfortran options: -O0]

TensorFlow Lite

This is a benchmark of the TensorFlow Lite implementation focused on TensorFlow machine learning for mobile, IoT, edge, and other cases. The current Linux support is limited to running on CPUs. This test profile is measuring the average inference time. Learn more via the OpenBenchmarking.org test page.
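
For reference, average inference time for a .tflite model is commonly measured along these lines; the sketch below assumes TensorFlow's bundled TFLite interpreter and uses a placeholder model path rather than the file the test profile downloads.

    import time
    import numpy as np
    import tensorflow as tf

    interpreter = tf.lite.Interpreter(model_path="mobilenet_quant.tflite")  # placeholder path
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    data = np.zeros(inp["shape"], dtype=inp["dtype"])                       # dummy input

    start = time.perf_counter()
    interpreter.set_tensor(inp["index"], data)
    interpreter.invoke()
    print("microseconds:", (time.perf_counter() - start) * 1e6)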

TensorFlow Lite 2022-05-18 (Microseconds, Fewer Is Better) - Model: Mobilenet Quant - ASUS NVIDIA GeForce RTX 3090: 2317.67 (SE +/- 20.94, N = 3)

OpenVINO

This is a test of the Intel OpenVINO, a toolkit around neural networks, using its built-in benchmarking support and analyzing the throughput and latency for various models. Learn more via the OpenBenchmarking.org test page.

OpenVINO 2024.0 - Device: CPU - ASUS NVIDIA GeForce RTX 3090:
Model: Handwritten English Recognition FP16 - ms, Fewer Is Better: 88.65 (SE +/- 0.10, N = 3; MIN: 56.64 / MAX: 150.06)
Model: Handwritten English Recognition FP16 - FPS, More Is Better: 225.32 (SE +/- 0.26, N = 3)
Model: Noise Suppression Poconet-Like FP16 - ms, Fewer Is Better: 9.02 (SE +/- 0.01, N = 3; MIN: 6.21 / MAX: 25.89)
Model: Noise Suppression Poconet-Like FP16 - FPS, More Is Better: 656.86 (SE +/- 0.48, N = 3)
Model: Handwritten English Recognition FP16-INT8 - ms, Fewer Is Better: 74.85 (SE +/- 0.19, N = 3; MIN: 56.4 / MAX: 150.26)
Model: Handwritten English Recognition FP16-INT8 - FPS, More Is Better: 266.87 (SE +/- 0.68, N = 3)
Model: Person Vehicle Bike Detection FP16 - ms, Fewer Is Better: 11.29 (SE +/- 0.01, N = 3; MIN: 5.5 / MAX: 26.39)
Model: Person Vehicle Bike Detection FP16 - FPS, More Is Better: 528.13 (SE +/- 0.48, N = 3)
Model: Road Segmentation ADAS FP16 - ms, Fewer Is Better: 69.09 (SE +/- 0.30, N = 3; MIN: 27.97 / MAX: 92.16)
Model: Road Segmentation ADAS FP16 - FPS, More Is Better: 86.71 (SE +/- 0.37, N = 3)
Model: Person Re-Identification Retail FP16 - ms, Fewer Is Better: 9.01 (SE +/- 0.01, N = 3; MIN: 4.38 / MAX: 37.43)
Model: Person Re-Identification Retail FP16 - FPS, More Is Better: 659.72 (SE +/- 0.35, N = 3)
Model: Vehicle Detection FP16-INT8 - ms, Fewer Is Better: 8.35 (SE +/- 0.00, N = 3; MIN: 4.12 / MAX: 34.48)
Model: Vehicle Detection FP16-INT8 - FPS, More Is Better: 712.75 (SE +/- 0.27, N = 3)
Model: Weld Porosity Detection FP16 - ms, Fewer Is Better: 44.26 (SE +/- 0.07, N = 3; MIN: 25.84 / MAX: 77.3)
Model: Weld Porosity Detection FP16 - FPS, More Is Better: 450.44 (SE +/- 0.70, N = 3)
Model: Face Detection Retail FP16-INT8 - ms, Fewer Is Better: 2.97 (SE +/- 0.00, N = 3; MIN: 1.4 / MAX: 22.5)
Model: Face Detection Retail FP16-INT8 - FPS, More Is Better: 1948.78 (SE +/- 3.09, N = 3)
Model: Age Gender Recognition Retail 0013 FP16-INT8 - ms, Fewer Is Better: 0.62 (SE +/- 0.00, N = 3; MIN: 0.31 / MAX: 13.6)
Model: Age Gender Recognition Retail 0013 FP16-INT8 - FPS, More Is Better: 29697.31 (SE +/- 14.75, N = 3)
Model: Vehicle Detection FP16 - ms, Fewer Is Better: 20.16 (SE +/- 0.10, N = 3; MIN: 7.21 / MAX: 40.97)
Model: Vehicle Detection FP16 - FPS, More Is Better: 296.62 (SE +/- 1.40, N = 3)
Model: Weld Porosity Detection FP16-INT8 - ms, Fewer Is Better: 12.81 (SE +/- 0.00, N = 3; MIN: 7.26 / MAX: 52.97)
Model: Weld Porosity Detection FP16-INT8 - FPS, More Is Better: 1544.54 (SE +/- 0.90, N = 3)
Model: Face Detection Retail FP16 - ms, Fewer Is Better: 5.02 (SE +/- 0.01, N = 3; MIN: 2.23 / MAX: 27.19)
Model: Face Detection Retail FP16 - FPS, More Is Better: 1180.46 (SE +/- 1.52, N = 3)
All results above: (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl

ONNX Runtime

ONNX Runtime 1.19 - Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallel - ASUS NVIDIA GeForce RTX 3090:
Inference Time Cost (ms), Fewer Is Better: 3.25179 (SE +/- 0.00596, N = 3)
Inferences Per Second, More Is Better: 307.37 (SE +/- 0.56, N = 3)
All results above: (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

OpenVINO

This is a test of the Intel OpenVINO, a toolkit around neural networks, using its built-in benchmarking support and analyzing the throughput and latency for various models. Learn more via the OpenBenchmarking.org test page.

OpenVINO 2024.0 - Model: Age Gender Recognition Retail 0013 FP16 - Device: CPU - ASUS NVIDIA GeForce RTX 3090:
ms, Fewer Is Better: 1.66 (SE +/- 0.00, N = 3; MIN: 0.78 / MAX: 17.39)
FPS, More Is Better: 11244.68 (SE +/- 7.15, N = 3)
All results above: (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl

ONNX Runtime

ONNX Runtime 1.19 - Device: CPU - Executor: Standard - ASUS NVIDIA GeForce RTX 3090:
Model: CaffeNet 12-int8 - Inference Time Cost (ms), Fewer Is Better: 1.54657 (SE +/- 0.00165, N = 3)
Model: CaffeNet 12-int8 - Inferences Per Second, More Is Better: 646.11 (SE +/- 0.70, N = 3)
Model: ResNet50 v1-12-int8 - Inference Time Cost (ms), Fewer Is Better: 3.19151 (SE +/- 0.00923, N = 3)
Model: ResNet50 v1-12-int8 - Inferences Per Second, More Is Better: 313.24 (SE +/- 0.91, N = 3)
Model: super-resolution-10 - Inference Time Cost (ms), Fewer Is Better: 12.23 (SE +/- 0.00, N = 3)
Model: super-resolution-10 - Inferences Per Second, More Is Better: 81.78 (SE +/- 0.01, N = 3)
All results above: (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

Scikit-Learn

Scikit-learn is a Python module for machine learning built on NumPy, SciPy, and is BSD-licensed. Learn more via the OpenBenchmarking.org test page.

Scikit-Learn 1.2.2 (Seconds, Fewer Is Better) - Benchmark: Plot Ward - ASUS NVIDIA GeForce RTX 3090: 43.62 (SE +/- 0.60, N = 3) [(F9X) gfortran options: -O0]

TensorFlow

This is a benchmark of the TensorFlow deep learning framework using the TensorFlow reference benchmarks (tensorflow/benchmarks with tf_cnn_benchmarks.py). Note with the Phoronix Test Suite there is also pts/tensorflow-lite for benchmarking the TensorFlow Lite binaries if desired for complementary metrics. Learn more via the OpenBenchmarking.org test page.

TensorFlow 2.16.1 (images/sec, More Is Better) - Device: GPU - Batch Size: 1 - Model: GoogLeNet - ASUS NVIDIA GeForce RTX 3090: 12.00 (SE +/- 0.21, N = 15)

Scikit-Learn

Scikit-learn is a Python module for machine learning built on NumPy, SciPy, and is BSD-licensed. Learn more via the OpenBenchmarking.org test page.

Scikit-Learn 1.2.2 (Seconds, Fewer Is Better) - Benchmark: Text Vectorizers - ASUS NVIDIA GeForce RTX 3090: 38.91 (SE +/- 0.03, N = 3) [(F9X) gfortran options: -O0]

TensorFlow

This is a benchmark of the TensorFlow deep learning framework using the TensorFlow reference benchmarks (tensorflow/benchmarks with tf_cnn_benchmarks.py). Note with the Phoronix Test Suite there is also pts/tensorflow-lite for benchmarking the TensorFlow Lite binaries if desired for complementary metrics. Learn more via the OpenBenchmarking.org test page.

TensorFlow 2.16.1 (images/sec, More Is Better) - Device: GPU - Batch Size: 16 - Model: AlexNet - ASUS NVIDIA GeForce RTX 3090: 38.93 (SE +/- 0.43, N = 3)

DeepSpeech

Mozilla DeepSpeech is a speech-to-text engine powered by TensorFlow for machine learning and derived from Baidu's Deep Speech research paper. This test profile times the speech-to-text process for a roughly three minute audio recording. Learn more via the OpenBenchmarking.org test page.

DeepSpeech 0.6 (Seconds, Fewer Is Better) - Acceleration: CPU - ASUS NVIDIA GeForce RTX 3090: 54.63 (SE +/- 0.08, N = 3)

Scikit-Learn

Scikit-learn is a Python module for machine learning built on NumPy, SciPy, and is BSD-licensed. Learn more via the OpenBenchmarking.org test page.

Benchmark: Plot Non-Negative Matrix Factorization

ASUS NVIDIA GeForce RTX 3090: The test quit with a non-zero exit status on all three run attempts. E: KeyError:

TensorFlow

This is a benchmark of the TensorFlow deep learning framework using the TensorFlow reference benchmarks (tensorflow/benchmarks with tf_cnn_benchmarks.py). Note with the Phoronix Test Suite there is also pts/tensorflow-lite for benchmarking the TensorFlow Lite binaries if desired for complementary metrics. Learn more via the OpenBenchmarking.org test page.

TensorFlow 2.16.1 (images/sec, More Is Better) - ASUS NVIDIA GeForce RTX 3090:
Device: CPU - Batch Size: 32 - Model: GoogLeNet: 96.69 (SE +/- 0.06, N = 3)
Device: CPU - Batch Size: 64 - Model: AlexNet: 192.13 (SE +/- 0.32, N = 3)

Scikit-Learn

Scikit-learn is a Python module for machine learning built on NumPy, SciPy, and is BSD-licensed. Learn more via the OpenBenchmarking.org test page.

Scikit-Learn 1.2.2 (Seconds, Fewer Is Better) - Benchmark: Kernel PCA Solvers / Time vs. N Components - ASUS NVIDIA GeForce RTX 3090: 25.59 (SE +/- 0.23, N = 3) [(F9X) gfortran options: -O0]

PyTorch

This is a benchmark of PyTorch making use of pytorch-benchmark [https://github.com/LukasHedegaard/pytorch-benchmark]. Learn more via the OpenBenchmarking.org test page.

PyTorch 2.2.1 (batches/sec, More Is Better) - Device: NVIDIA CUDA GPU - Model: Efficientnet_v2_l - ASUS NVIDIA GeForce RTX 3090:
Batch Size: 32: 84.09 (SE +/- 0.95, N = 3; MIN: 70.71 / MAX: 86.82)
Batch Size: 64: 85.37 (SE +/- 0.50, N = 3; MIN: 74.95 / MAX: 88.01)
Batch Size: 256: 85.57 (SE +/- 0.11, N = 3; MIN: 73.68 / MAX: 87.24)
Batch Size: 16: 85.74 (SE +/- 0.47, N = 3; MIN: 75.44 / MAX: 87.56)
Batch Size: 512: 86.27 (SE +/- 0.12, N = 3; MIN: 76.41 / MAX: 88.03)

TensorFlow

This is a benchmark of the TensorFlow deep learning framework using the TensorFlow reference benchmarks (tensorflow/benchmarks with tf_cnn_benchmarks.py). Note with the Phoronix Test Suite there is also pts/tensorflow-lite for benchmarking the TensorFlow Lite binaries if desired for complementary metrics. Learn more via the OpenBenchmarking.org test page.

TensorFlow 2.16.1 (images/sec, More Is Better) - Device: CPU - Batch Size: 1 - Model: VGG-16 - ASUS NVIDIA GeForce RTX 3090: 4.12 (SE +/- 0.00, N = 3)

PyTorch

This is a benchmark of PyTorch making use of pytorch-benchmark [https://github.com/LukasHedegaard/pytorch-benchmark]. Learn more via the OpenBenchmarking.org test page.

PyTorch 2.2.1 (batches/sec, More Is Better) - Device: CPU - Batch Size: 1 - Model: ResNet-50 - ASUS NVIDIA GeForce RTX 3090: 49.49 (SE +/- 0.18, N = 3; MIN: 30.42 / MAX: 50.09)

oneDNN

oneDNN 3.6 (ms, Fewer Is Better) - Harness: Deconvolution Batch shapes_1d - Engine: CPU - ASUS NVIDIA GeForce RTX 3090: 4.47259 (SE +/- 0.00491, N = 3; MIN: 4.02) [(CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl]

TensorFlow

This is a benchmark of the TensorFlow deep learning framework using the TensorFlow reference benchmarks (tensorflow/benchmarks with tf_cnn_benchmarks.py). Note with the Phoronix Test Suite there is also pts/tensorflow-lite for benchmarking the TensorFlow Lite binaries if desired for complementary metrics. Learn more via the OpenBenchmarking.org test page.

TensorFlow 2.16.1 (images/sec, More Is Better) - ASUS NVIDIA GeForce RTX 3090:
Device: CPU - Batch Size: 32 - Model: AlexNet: 167.17 (SE +/- 0.14, N = 3)
Device: GPU - Batch Size: 1 - Model: ResNet-50: 5.62 (SE +/- 0.05, N = 3)
Device: CPU - Batch Size: 16 - Model: GoogLeNet: 101.50 (SE +/- 0.15, N = 3)

SHOC Scalable HeterOgeneous Computing

The CUDA and OpenCL version of Vetter's Scalable HeterOgeneous Computing benchmark suite. SHOC provides a number of different benchmark programs for evaluating the performance and stability of compute devices. Learn more via the OpenBenchmarking.org test page.

SHOC Scalable HeterOgeneous Computing 2020-04-17 (GB/s, More Is Better) - Target: OpenCL - Benchmark: Texture Read Bandwidth - ASUS NVIDIA GeForce RTX 3090: 2178.63 (SE +/- 2.20, N = 3) [(CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi]

PyTorch

This is a benchmark of PyTorch making use of pytorch-benchmark [https://github.com/LukasHedegaard/pytorch-benchmark]. Learn more via the OpenBenchmarking.org test page.

PyTorch 2.2.1 (batches/sec, More Is Better) - Device: NVIDIA CUDA GPU - ASUS NVIDIA GeForce RTX 3090:
Batch Size: 64 - Model: ResNet-152: 162.69 (SE +/- 0.16, N = 3; MIN: 143.93 / MAX: 165.52)
Batch Size: 512 - Model: ResNet-152: 162.35 (SE +/- 0.22, N = 3; MIN: 99.28 / MAX: 165.46)
Batch Size: 32 - Model: ResNet-152: 162.45 (SE +/- 0.21, N = 3; MIN: 87.19 / MAX: 165.45)
Batch Size: 16 - Model: ResNet-152: 162.59 (SE +/- 0.56, N = 3; MIN: 114.4 / MAX: 167.22)
Batch Size: 256 - Model: ResNet-152: 162.68 (SE +/- 0.10, N = 3; MIN: 145.92 / MAX: 165.55)
Batch Size: 1 - Model: Efficientnet_v2_l: 88.41 (SE +/- 0.70, N = 3; MIN: 76.75 / MAX: 90.99)

oneDNN

oneDNN 3.6 (ms, Fewer Is Better) - Harness: IP Shapes 1D - Engine: CPU - ASUS NVIDIA GeForce RTX 3090: 2.89253 (SE +/- 0.00897, N = 3; MIN: 2.72) [(CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl]

TensorFlow

This is a benchmark of the TensorFlow deep learning framework using the TensorFlow reference benchmarks (tensorflow/benchmarks with tf_cnn_benchmarks.py). Note with the Phoronix Test Suite there is also pts/tensorflow-lite for benchmarking the TensorFlow Lite binaries if desired for complementary metrics. Learn more via the OpenBenchmarking.org test page.

TensorFlow 2.16.1 (images/sec, More Is Better) - Device: CPU - Batch Size: 16 - Model: AlexNet - ASUS NVIDIA GeForce RTX 3090: 129.70 (SE +/- 0.01, N = 3)

Scikit-Learn

Scikit-learn is a Python module for machine learning built on NumPy, SciPy, and is BSD-licensed. Learn more via the OpenBenchmarking.org test page.

Scikit-Learn 1.2.2 (Seconds, Fewer Is Better) - Benchmark: 20 Newsgroups / Logistic Regression - ASUS NVIDIA GeForce RTX 3090: 9.614 (SE +/- 0.072, N = 3) [(F9X) gfortran options: -O0]

R Benchmark

This test is a quick-running survey of general R performance. Learn more via the OpenBenchmarking.org test page.

R Benchmark (Seconds, Fewer Is Better) - ASUS NVIDIA GeForce RTX 3090: 0.0970 (SE +/- 0.0004, N = 3)

TensorFlow

This is a benchmark of the TensorFlow deep learning framework using the TensorFlow reference benchmarks (tensorflow/benchmarks with tf_cnn_benchmarks.py). Note with the Phoronix Test Suite there is also pts/tensorflow-lite for benchmarking the TensorFlow Lite binaries if desired for complementary metrics. Learn more via the OpenBenchmarking.org test page.

TensorFlow 2.16.1 (images/sec, More Is Better) - ASUS NVIDIA GeForce RTX 3090:
Device: CPU - Batch Size: 1 - Model: ResNet-50: 14.04 (SE +/- 0.07, N = 3)
Device: GPU - Batch Size: 1 - Model: AlexNet: 14.14 (SE +/- 0.07, N = 3)

PyTorch

This is a benchmark of PyTorch making use of pytorch-benchmark [https://github.com/LukasHedegaard/pytorch-benchmark]. Learn more via the OpenBenchmarking.org test page.

PyTorch 2.2.1 (batches/sec, More Is Better) - Device: NVIDIA CUDA GPU - Batch Size: 1 - Model: ResNet-152 - ASUS NVIDIA GeForce RTX 3090: 172.05 (SE +/- 0.36, N = 3; MIN: 143.74 / MAX: 175.36)

Scikit-Learn

Scikit-learn is a Python module for machine learning built on NumPy, SciPy, and is BSD-licensed. Learn more via the OpenBenchmarking.org test page.

Benchmark: RCV1 Logreg Convergence

ASUS NVIDIA GeForce RTX 3090: The test quit with a non-zero exit status on all three run attempts. E: IndexError: list index out of range

oneDNN

oneDNN 3.6 (ms, Fewer Is Better) - Harness: IP Shapes 3D - Engine: CPU - ASUS NVIDIA GeForce RTX 3090: 9.32676 (SE +/- 0.00039, N = 3; MIN: 9.1) [(CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl]

TensorFlow

This is a benchmark of the TensorFlow deep learning framework using the TensorFlow reference benchmarks (tensorflow/benchmarks with tf_cnn_benchmarks.py). Note with the Phoronix Test Suite there is also pts/tensorflow-lite for benchmarking the TensorFlow Lite binaries if desired for complementary metrics. Learn more via the OpenBenchmarking.org test page.

TensorFlow 2.16.1 (images/sec, More Is Better) - Device: CPU - Batch Size: 1 - Model: AlexNet - ASUS NVIDIA GeForce RTX 3090: 15.83 (SE +/- 0.02, N = 3)

PyTorch

This is a benchmark of PyTorch making use of pytorch-benchmark [https://github.com/LukasHedegaard/pytorch-benchmark]. Learn more via the OpenBenchmarking.org test page.

PyTorch 2.2.1 (batches/sec, More Is Better) - Device: NVIDIA CUDA GPU - Model: ResNet-50 - ASUS NVIDIA GeForce RTX 3090:
Batch Size: 256: 413.12 (SE +/- 0.66, N = 3; MIN: 161.19 / MAX: 421.75)
Batch Size: 32: 413.19 (SE +/- 0.17, N = 3; MIN: 321.33 / MAX: 421.07)
Batch Size: 64: 413.29 (SE +/- 0.26, N = 3; MIN: 337.68 / MAX: 420.31)
Batch Size: 512: 414.51 (SE +/- 0.17, N = 3; MIN: 337.19 / MAX: 422.34)
Batch Size: 16: 414.29 (SE +/- 0.07, N = 3; MIN: 339.59 / MAX: 422.13)

RNNoise

RNNoise is a recurrent neural network for audio noise reduction developed by Mozilla and Xiph.Org. This test profile is a single-threaded test measuring the time to denoise a sample 26 minute long 16-bit RAW audio file using this recurrent neural network noise suppression library. Learn more via the OpenBenchmarking.org test page.

RNNoise 0.2 (Seconds, Fewer Is Better) - Input: 26 Minute Long Talking Sample - ASUS NVIDIA GeForce RTX 3090: 7.245 (SE +/- 0.014, N = 3) [(CC) gcc options: -O2 -pedantic -fvisibility=hidden]

oneDNN

oneDNN 3.6 (ms, Fewer Is Better) - Harness: Convolution Batch Shapes Auto - Engine: CPU - ASUS NVIDIA GeForce RTX 3090: 8.24915 (SE +/- 0.00332, N = 3; MIN: 7.76) [(CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl]

PyTorch

This is a benchmark of PyTorch making use of pytorch-benchmark [https://github.com/LukasHedegaard/pytorch-benchmark]. Learn more via the OpenBenchmarking.org test page.

PyTorch 2.2.1 (batches/sec, More Is Better) - Device: NVIDIA CUDA GPU - Batch Size: 1 - Model: ResNet-50 - ASUS NVIDIA GeForce RTX 3090: 469.71 (SE +/- 1.86, N = 3; MIN: 339.62 / MAX: 480.89)

TensorFlow

This is a benchmark of the TensorFlow deep learning framework using the TensorFlow reference benchmarks (tensorflow/benchmarks with tf_cnn_benchmarks.py). Note with the Phoronix Test Suite there is also pts/tensorflow-lite for benchmarking the TensorFlow Lite binaries if desired for complementary metrics. Learn more via the OpenBenchmarking.org test page.

TensorFlow 2.16.1 (images/sec, More Is Better) - Device: CPU - Batch Size: 1 - Model: GoogLeNet - ASUS NVIDIA GeForce RTX 3090: 47.92 (SE +/- 0.60, N = 3)

oneDNN

oneDNN 3.6 (ms, Fewer Is Better) - Harness: Deconvolution Batch shapes_3d - Engine: CPU - ASUS NVIDIA GeForce RTX 3090: 5.57785 (SE +/- 0.00815, N = 3; MIN: 5.37) [(CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl]

SHOC Scalable HeterOgeneous Computing

The CUDA and OpenCL version of Vetter's Scalable HeterOgeneous Computing benchmark suite. SHOC provides a number of different benchmark programs for evaluating the performance and stability of compute devices. Learn more via the OpenBenchmarking.org test page.

SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - ASUS NVIDIA GeForce RTX 3090:
Benchmark: GEMM SGEMM_N - GFLOPS, More Is Better: 8456.99 (SE +/- 38.88, N = 3)
Benchmark: Bus Speed Readback - GB/s, More Is Better: 6.7652 (SE +/- 0.0000, N = 3)
Benchmark: Bus Speed Download - GB/s, More Is Better: 6.6413 (SE +/- 0.0002, N = 3)
Benchmark: Triad - GB/s, More Is Better: 6.6057 (SE +/- 0.0011, N = 3)
Benchmark: FFT SP - GFLOPS, More Is Better: 2510.98 (SE +/- 35.21, N = 3)
Benchmark: Reduction - GB/s, More Is Better: 406.41 (SE +/- 0.22, N = 3)
Benchmark: S3D - GFLOPS, More Is Better: 455.96 (SE +/- 0.72, N = 3)
Benchmark: MD5 Hash - GHash/s, More Is Better: 45.45 (SE +/- 0.01, N = 3)
All results above: (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

Scikit-Learn

Scikit-learn is a Python module for machine learning built on NumPy, SciPy, and is BSD-licensed. Learn more via the OpenBenchmarking.org test page.

Benchmark: Plot Lasso Path

ASUS NVIDIA GeForce RTX 3090: The test quit with a non-zero exit status on all three run attempts. E: ModuleNotFoundError: No module named 'matplotlib.tri.triangulation'

Benchmark: Plot Fast KMeans

ASUS NVIDIA GeForce RTX 3090: The test quit with a non-zero exit status on all three run attempts. E: ModuleNotFoundError: No module named 'matplotlib.tri.triangulation'

Benchmark: Plot Singular Value Decomposition

ASUS NVIDIA GeForce RTX 3090: The test quit with a non-zero exit status on all three run attempts. E: ModuleNotFoundError: No module named 'matplotlib.tri.triangulation'

Benchmark: Glmnet

ASUS NVIDIA GeForce RTX 3090: The test quit with a non-zero exit status on all three run attempts. E: ModuleNotFoundError: No module named 'glmnet'

PlaidML

This test profile uses the PlaidML deep learning framework, developed by Intel, to offer up various benchmarks. Learn more via the OpenBenchmarking.org test page.

FP16: No - Mode: Inference - Network: VGG16 - Device: CPU

ASUS NVIDIA GeForce RTX 3090: The test quit with a non-zero exit status on all three run attempts. E: ModuleNotFoundError: No module named 'tensorflow'

FP16: No - Mode: Inference - Network: ResNet 50 - Device: CPU

ASUS NVIDIA GeForce RTX 3090: The test quit with a non-zero exit status on all three run attempts. E: ModuleNotFoundError: No module named 'tensorflow'

Mlpack Benchmark

Mlpack provides benchmark scripts for machine learning libraries. Learn more via the OpenBenchmarking.org test page.

Benchmark: scikit_linearridgeregression

ASUS NVIDIA GeForce RTX 3090: The test quit with a non-zero exit status on all three run attempts. E: ModuleNotFoundError: No module named 'imp'

Benchmark: scikit_qda

ASUS NVIDIA GeForce RTX 3090: The test quit with a non-zero exit status on all three run attempts. E: ModuleNotFoundError: No module named 'imp'

Benchmark: scikit_svm

ASUS NVIDIA GeForce RTX 3090: The test quit with a non-zero exit status on all three run attempts. E: ModuleNotFoundError: No module named 'imp'

Benchmark: scikit_ica

ASUS NVIDIA GeForce RTX 3090: The test quit with a non-zero exit status on all three run attempts. E: ModuleNotFoundError: No module named 'imp'

Numenta Anomaly Benchmark

Numenta Anomaly Benchmark (NAB) is a benchmark for evaluating algorithms for anomaly detection in streaming, real-time applications. It is comprised of over 50 labeled real-world and artificial time-series data files plus a novel scoring mechanism designed for real-time applications. This test profile currently measures the time to run various detectors. Learn more via the OpenBenchmarking.org test page.
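
Although the NAB runs below failed due to a missing pandas module, the detectors it times are simple to picture. The toy sketch below scores points by their z-score against a trailing window, in the spirit of the "Windowed Gaussian" detector listed further down; it is an illustration only, not NAB's implementation.

    import numpy as np

    def windowed_gaussian_scores(series, window=50):
        # Anomaly score = |x - mean(window)| / std(window) over a trailing window.
        scores = np.zeros(len(series))
        for i in range(window, len(series)):
            hist = series[i - window:i]
            std = hist.std() or 1e-9
            scores[i] = abs(series[i] - hist.mean()) / std
        return scores

    data = np.concatenate([np.random.normal(0, 1, 500), np.random.normal(6, 1, 5)])
    print(windowed_gaussian_scores(data)[-10:])   # the injected spike scores highest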

Detector: Contextual Anomaly Detector OSE

ASUS NVIDIA GeForce RTX 3090: The test quit with a non-zero exit status on all three run attempts. E: ModuleNotFoundError: No module named 'pandas'

spaCy

The spaCy library is an open-source solution for advanced natural language processing (NLP). It is written in Python and is one of the leading NLP libraries. This test profile times the spaCy CPU performance with various models. Learn more via the OpenBenchmarking.org test page.
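
The spaCy run failed here (missing tqdm), but the profile essentially times document processing throughput. A minimal sketch of that kind of measurement, assuming spaCy and the en_core_web_sm model are installed (both are assumptions, not part of this result file):

    import time
    import spacy

    nlp = spacy.load("en_core_web_sm")                    # assumed model name
    text = "The quick brown fox jumps over the lazy dog. " * 200

    start = time.perf_counter()
    doc = nlp(text)
    print("tokens/sec:", len(doc) / (time.perf_counter() - start))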

ASUS NVIDIA GeForce RTX 3090: The test quit with a non-zero exit status on all three run attempts. E: ModuleNotFoundError: No module named 'tqdm'

Numenta Anomaly Benchmark

Numenta Anomaly Benchmark (NAB) is a benchmark for evaluating algorithms for anomaly detection in streaming, real-time applications. It is comprised of over 50 labeled real-world and artificial time-series data files plus a novel scoring mechanism designed for real-time applications. This test profile currently measures the time to run various detectors. Learn more via the OpenBenchmarking.org test page.

Detector: Bayesian Changepoint

ASUS NVIDIA GeForce RTX 3090: The test quit with a non-zero exit status on all three run attempts. E: ModuleNotFoundError: No module named 'pandas'

Detector: Earthgecko Skyline

ASUS NVIDIA GeForce RTX 3090: The test quit with a non-zero exit status on all three run attempts. E: ModuleNotFoundError: No module named 'pandas'

Detector: Windowed Gaussian

ASUS NVIDIA GeForce RTX 3090: The test quit with a non-zero exit status on all three run attempts. E: ModuleNotFoundError: No module named 'pandas'

Detector: KNN CAD

ASUS NVIDIA GeForce RTX 3090: The test quit with a non-zero exit status on all three run attempts. E: ModuleNotFoundError: No module named 'pandas'

Detector: Relative Entropy

ASUS NVIDIA GeForce RTX 3090: The test quit with a non-zero exit status on all three run attempts. E: ModuleNotFoundError: No module named 'pandas'

ECP-CANDLE

The CANDLE benchmark codes implement deep learning architectures relevant to problems in cancer. These architectures address problems at different biological scales, specifically problems at the molecular, cellular and population scales. Learn more via the OpenBenchmarking.org test page.

Benchmark: P1B2

ASUS NVIDIA GeForce RTX 3090: The test quit with a non-zero exit status. E: ModuleNotFoundError: No module named 'tensorflow'

Benchmark: P3B1

ASUS NVIDIA GeForce RTX 3090: The test quit with a non-zero exit status. E: ModuleNotFoundError: No module named 'tensorflow'

ONNX Runtime

Model: GPT-2 - Device: CPU - Executor: Parallel

ASUS NVIDIA GeForce RTX 3090: The test quit with a non-zero exit status on all three run attempts. E: onnxruntime/onnxruntime/test/onnx/onnx_model_info.cc:45 void OnnxModelInfo::InitOnnxModelInfo(const std::filesystem::__cxx11::path&) open file "GPT2/model.onnx" failed: No such file or directory

Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard

ASUS NVIDIA GeForce RTX 3090: The test quit with a non-zero exit status on all three run attempts. E: onnxruntime/onnxruntime/test/onnx/onnx_model_info.cc:45 void OnnxModelInfo::InitOnnxModelInfo(const std::filesystem::__cxx11::path&) open file "FasterRCNN-12-int8/FasterRCNN-12-int8.onnx" failed: No such file or directory

AI Benchmark Alpha

AI Benchmark Alpha is a Python library for evaluating artificial intelligence (AI) performance on diverse hardware platforms and relies upon the TensorFlow machine learning library. Learn more via the OpenBenchmarking.org test page.
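
The run failed because TensorFlow was not importable. For reference, typical usage of the ai-benchmark package is just a few lines, as sketched below; this mirrors the package's documented quick-start, not this test profile's wrapper.

    # Requires a working TensorFlow installation, which was missing in this run.
    from ai_benchmark import AIBenchmark

    benchmark = AIBenchmark()
    results = benchmark.run()   # runs the inference and training tests and reports device scores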

ASUS NVIDIA GeForce RTX 3090: The test quit with a non-zero exit status. E: ModuleNotFoundError: No module named 'tensorflow'

ONNX Runtime

Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard

ASUS NVIDIA GeForce RTX 3090: The test quit with a non-zero exit status on all three run attempts. E: onnxruntime/onnxruntime/test/onnx/onnx_model_info.cc:45 void OnnxModelInfo::InitOnnxModelInfo(const std::filesystem::__cxx11::path&) open file "resnet100/resnet100.onnx" failed: No such file or directory

Model: bertsquad-12 - Device: CPU - Executor: Parallel

ASUS NVIDIA GeForce RTX 3090: The test quit with a non-zero exit status on all three run attempts. E: onnxruntime/onnxruntime/test/onnx/onnx_model_info.cc:45 void OnnxModelInfo::InitOnnxModelInfo(const std::filesystem::__cxx11::path&) open file "bertsquad-12/bertsquad-12.onnx" failed: No such file or directory

Model: GPT-2 - Device: CPU - Executor: Standard

ASUS NVIDIA GeForce RTX 3090: The test quit with a non-zero exit status on all three run attempts. E: onnxruntime/onnxruntime/test/onnx/onnx_model_info.cc:45 void OnnxModelInfo::InitOnnxModelInfo(const std::filesystem::__cxx11::path&) open file "GPT2/model.onnx" failed: No such file or directory

Llama.cpp

Model: llama-2-13b.Q4_0.gguf

ASUS NVIDIA GeForce RTX 3090: The test quit with a non-zero exit status on all three run attempts. E: main: error: unable to load model

ONNX Runtime

Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallel

ASUS NVIDIA GeForce RTX 3090: The test quit with a non-zero exit status on all three run attempts. E: onnxruntime/onnxruntime/test/onnx/onnx_model_info.cc:45 void OnnxModelInfo::InitOnnxModelInfo(const std::filesystem::__cxx11::path&) open file "FasterRCNN-12-int8/FasterRCNN-12-int8.onnx" failed: No such file or directory

ECP-CANDLE

The CANDLE benchmark codes implement deep learning architectures relevant to problems in cancer. These architectures address problems at different biological scales, specifically problems at the molecular, cellular and population scales. Learn more via the OpenBenchmarking.org test page.

Benchmark: P3B2

ASUS NVIDIA GeForce RTX 3090: The test quit with a non-zero exit status. E: ModuleNotFoundError: No module named 'tensorflow'

Llama.cpp

Model: llama-2-70b-chat.Q5_0.gguf

ASUS NVIDIA GeForce RTX 3090: The test quit with a non-zero exit status (3 failed runs). E: main: error: unable to load model

Model: llama-2-7b.Q4_0.gguf

ASUS NVIDIA GeForce RTX 3090: The test quit with a non-zero exit status (3 failed runs). E: main: error: unable to load model

ONNX Runtime

Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallel

ASUS NVIDIA GeForce RTX 3090: The test quit with a non-zero exit status (3 failed runs). E: onnxruntime/onnxruntime/test/onnx/onnx_model_info.cc:45 void OnnxModelInfo::InitOnnxModelInfo(const std::filesystem::__cxx11::path&) open file "resnet100/resnet100.onnx" failed: No such file or directory

Model: bertsquad-12 - Device: CPU - Executor: Standard

ASUS NVIDIA GeForce RTX 3090: The test quit with a non-zero exit status (3 failed runs). E: onnxruntime/onnxruntime/test/onnx/onnx_model_info.cc:45 void OnnxModelInfo::InitOnnxModelInfo(const std::filesystem::__cxx11::path&) open file "bertsquad-12/bertsquad-12.onnx" failed: No such file or directory

Llamafile

Test: wizardcoder-python-34b-v1.0.Q6_K - Acceleration: CPU

ASUS NVIDIA GeForce RTX 3090: The test quit with a non-zero exit status (3 failed runs). E: ./run-wizardcoder: line 2: ./wizardcoder-python-34b-v1.0.Q6_K.llamafile.86: No such file or directory

Test: mistral-7b-instruct-v0.2.Q8_0 - Acceleration: CPU

ASUS NVIDIA GeForce RTX 3090: The test quit with a non-zero exit status (3 failed runs). E: ./run-mistral: line 2: ./mistral-7b-instruct-v0.2.Q5_K_M.llamafile.86: No such file or directory

Test: llava-v1.5-7b-q4 - Acceleration: CPU

ASUS NVIDIA GeForce RTX 3090: The test quit with a non-zero exit status (3 failed runs). E: ./run-llava: line 2: ./llava-v1.6-mistral-7b.Q8_0.llamafile.86: No such file or directory

TNN

TNN is an open-source deep learning inference framework developed by Tencent. Learn more via the OpenBenchmarking.org test page.

Target: CPU - Model: SqueezeNet v1.1

ASUS NVIDIA GeForce RTX 3090: The test quit with a non-zero exit status (3 failed runs). E: ./tnn: 3: ./test/TNNTest: not found

Target: CPU - Model: SqueezeNet v2

ASUS NVIDIA GeForce RTX 3090: The test quit with a non-zero exit status (3 failed runs). E: ./tnn: 3: ./test/TNNTest: not found

Target: CPU - Model: MobileNet v2

ASUS NVIDIA GeForce RTX 3090: The test quit with a non-zero exit status (3 failed runs). E: ./tnn: 3: ./test/TNNTest: not found

Target: CPU - Model: DenseNet

ASUS NVIDIA GeForce RTX 3090: The test quit with a non-zero exit status (3 failed runs). E: ./tnn: 3: ./test/TNNTest: not found

Caffe

This is a benchmark of the Caffe deep learning framework; it currently supports the AlexNet and GoogLeNet models and execution on both CPUs and NVIDIA GPUs. Learn more via the OpenBenchmarking.org test page.
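
The Caffe failures below all come from a missing tools/caffe binary, i.e. the framework itself never built on this system. Purely as a hedged sketch of the kind of CPU timing loop these "Iterations" results represent (the prototxt/caffemodel paths are placeholders, not files shipped with the test profile):

    # Illustrative pycaffe CPU timing loop; model paths are placeholders.
    import time
    import caffe

    caffe.set_mode_cpu()
    net = caffe.Net("deploy.prototxt", "googlenet.caffemodel", caffe.TEST)

    iterations = 100
    start = time.time()
    for _ in range(iterations):
        net.forward()                     # one inference pass through the network
    elapsed = time.time() - start
    print(f"{iterations} iterations in {elapsed:.2f} s "
          f"({1000.0 * elapsed / iterations:.1f} ms per iteration)")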

Model: GoogleNet - Acceleration: CPU - Iterations: 1000

ASUS NVIDIA GeForce RTX 3090: The test quit with a non-zero exit status (3 failed runs). E: ./caffe: 3: ./tools/caffe: not found

Model: GoogleNet - Acceleration: CPU - Iterations: 200

ASUS NVIDIA GeForce RTX 3090: The test quit with a non-zero exit status (3 failed runs). E: ./caffe: 3: ./tools/caffe: not found

Model: GoogleNet - Acceleration: CPU - Iterations: 100

ASUS NVIDIA GeForce RTX 3090: The test quit with a non-zero exit status (3 failed runs). E: ./caffe: 3: ./tools/caffe: not found

Model: AlexNet - Acceleration: CPU - Iterations: 200

ASUS NVIDIA GeForce RTX 3090: The test quit with a non-zero exit status (3 failed runs). E: ./caffe: 3: ./tools/caffe: not found

Model: AlexNet - Acceleration: CPU - Iterations: 100

ASUS NVIDIA GeForce RTX 3090: The test quit with a non-zero exit status (3 failed runs). E: ./caffe: 3: ./tools/caffe: not found

Neural Magic DeepSparse

This is a benchmark of Neural Magic's DeepSparse using its built-in deepsparse.benchmark utility and various models from the SparseZoo (https://sparsezoo.neuralmagic.com/). Learn more via the OpenBenchmarking.org test page.
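
Every DeepSparse entry below fails the same way: the deepsparse.benchmark utility could not be found at /.local/bin/deepsparse.benchmark, i.e. the pip install of DeepSparse never completed. As a hedged sketch only of the kind of engine call that utility wraps (the SparseZoo stub and input shape are illustrative, not taken from this result file):

    # Illustrative DeepSparse engine run; the SparseZoo stub is a placeholder.
    import numpy as np
    from deepsparse import compile_model

    stub = "zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/base-none"
    engine = compile_model(stub, batch_size=1)   # downloads and compiles the ONNX model

    inputs = [np.random.rand(1, 3, 224, 224).astype(np.float32)]
    outputs = engine.run(inputs)                 # one synchronous (single-stream) inference
    print(engine)                                # prints the compiled engine's summary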

Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Synchronous Single-Stream

ASUS NVIDIA GeForce RTX 3090: The test quit with a non-zero exit status (3 failed runs). E: ./deepsparse: 2: /.local/bin/deepsparse.benchmark: not found

Model: BERT-Large, NLP Question Answering, Sparse INT8 - Scenario: Synchronous Single-Stream

ASUS NVIDIA GeForce RTX 3090: The test quit with a non-zero exit status (3 failed runs). E: ./deepsparse: 2: /.local/bin/deepsparse.benchmark: not found

Model: CV Segmentation, 90% Pruned YOLACT Pruned - Scenario: Synchronous Single-Stream

ASUS NVIDIA GeForce RTX 3090: The test quit with a non-zero exit status (3 failed runs). E: ./deepsparse: 2: /.local/bin/deepsparse.benchmark: not found

Model: CV Segmentation, 90% Pruned YOLACT Pruned - Scenario: Asynchronous Multi-Stream

ASUS NVIDIA GeForce RTX 3090: The test quit with a non-zero exit status (3 failed runs). E: ./deepsparse: 2: /.local/bin/deepsparse.benchmark: not found

Model: NLP Text Classification, DistilBERT mnli - Scenario: Synchronous Single-Stream

ASUS NVIDIA GeForce RTX 3090: The test quit with a non-zero exit status (3 failed runs). E: ./deepsparse: 2: /.local/bin/deepsparse.benchmark: not found

Model: NLP Text Classification, DistilBERT mnli - Scenario: Asynchronous Multi-Stream

ASUS NVIDIA GeForce RTX 3090: The test quit with a non-zero exit status (3 failed runs). E: ./deepsparse: 2: /.local/bin/deepsparse.benchmark: not found

Model: CV Detection, YOLOv5s COCO, Sparse INT8 - Scenario: Synchronous Single-Stream

ASUS NVIDIA GeForce RTX 3090: The test quit with a non-zero exit status (3 failed runs). E: ./deepsparse: 2: /.local/bin/deepsparse.benchmark: not found

Model: CV Detection, YOLOv5s COCO, Sparse INT8 - Scenario: Asynchronous Multi-Stream

ASUS NVIDIA GeForce RTX 3090: The test quit with a non-zero exit status (3 failed runs). E: ./deepsparse: 2: /.local/bin/deepsparse.benchmark: not found

Model: CV Classification, ResNet-50 ImageNet - Scenario: Synchronous Single-Stream

ASUS NVIDIA GeForce RTX 3090: The test quit with a non-zero exit status (3 failed runs). E: ./deepsparse: 2: /.local/bin/deepsparse.benchmark: not found

Model: CV Classification, ResNet-50 ImageNet - Scenario: Asynchronous Multi-Stream

ASUS NVIDIA GeForce RTX 3090: The test quit with a non-zero exit status (3 failed runs). E: ./deepsparse: 2: /.local/bin/deepsparse.benchmark: not found

Model: Llama2 Chat 7b Quantized - Scenario: Synchronous Single-Stream

ASUS NVIDIA GeForce RTX 3090: The test quit with a non-zero exit status (3 failed runs). E: ./deepsparse: 2: /.local/bin/deepsparse.benchmark: not found

Model: Llama2 Chat 7b Quantized - Scenario: Asynchronous Multi-Stream

ASUS NVIDIA GeForce RTX 3090: The test quit with a non-zero exit status (3 failed runs). E: ./deepsparse: 2: /.local/bin/deepsparse.benchmark: not found

Model: ResNet-50, Sparse INT8 - Scenario: Synchronous Single-Stream

ASUS NVIDIA GeForce RTX 3090: The test quit with a non-zero exit status (3 failed runs). E: ./deepsparse: 2: /.local/bin/deepsparse.benchmark: not found

Model: ResNet-50, Sparse INT8 - Scenario: Asynchronous Multi-Stream

ASUS NVIDIA GeForce RTX 3090: The test quit with a non-zero exit status (3 failed runs). E: ./deepsparse: 2: /.local/bin/deepsparse.benchmark: not found

Model: ResNet-50, Baseline - Scenario: Asynchronous Multi-Stream

ASUS NVIDIA GeForce RTX 3090: The test quit with a non-zero exit status (3 failed runs). E: ./deepsparse: 2: /.local/bin/deepsparse.benchmark: not found

Model: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Scenario: Synchronous Single-Stream

ASUS NVIDIA GeForce RTX 3090: The test quit with a non-zero exit status (3 failed runs). E: ./deepsparse: 2: /.local/bin/deepsparse.benchmark: not found

Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Synchronous Single-Stream

ASUS NVIDIA GeForce RTX 3090: The test quit with a non-zero exit status (3 failed runs). E: ./deepsparse: 2: /.local/bin/deepsparse.benchmark: not found

Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Asynchronous Multi-Stream

ASUS NVIDIA GeForce RTX 3090: The test quit with a non-zero exit status (3 failed runs). E: ./deepsparse: 2: /.local/bin/deepsparse.benchmark: not found

Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Asynchronous Multi-Stream

ASUS NVIDIA GeForce RTX 3090: The test quit with a non-zero exit status (3 failed runs). E: ./deepsparse: 2: /.local/bin/deepsparse.benchmark: not found

Model: BERT-Large, NLP Question Answering, Sparse INT8 - Scenario: Asynchronous Multi-Stream

ASUS NVIDIA GeForce RTX 3090: The test quit with a non-zero exit status (3 failed runs). E: ./deepsparse: 2: /.local/bin/deepsparse.benchmark: not found

Model: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Scenario: Asynchronous Multi-Stream

ASUS NVIDIA GeForce RTX 3090: The test quit with a non-zero exit status (3 failed runs). E: ./deepsparse: 2: /.local/bin/deepsparse.benchmark: not found

Caffe

This is a benchmark of the Caffe deep learning framework; it currently supports the AlexNet and GoogLeNet models and execution on both CPUs and NVIDIA GPUs. Learn more via the OpenBenchmarking.org test page.

Model: AlexNet - Acceleration: CPU - Iterations: 1000

ASUS NVIDIA GeForce RTX 3090: The test quit with a non-zero exit status (3 failed runs). E: ./caffe: 3: ./tools/caffe: not found

Neural Magic DeepSparse

This is a benchmark of Neural Magic's DeepSparse using its built-in deepsparse.benchmark utility and various models from the SparseZoo (https://sparsezoo.neuralmagic.com/). Learn more via the OpenBenchmarking.org test page.

Model: ResNet-50, Baseline - Scenario: Synchronous Single-Stream

ASUS NVIDIA GeForce RTX 3090: The test quit with a non-zero exit status (3 failed runs). E: ./deepsparse: 2: /.local/bin/deepsparse.benchmark: not found

272 Results Shown

TensorFlow:
  GPU - 512 - VGG-16
  GPU - 256 - VGG-16
  GPU - 512 - ResNet-50
  CPU - 512 - VGG-16
  GPU - 256 - ResNet-50
  GPU - 64 - VGG-16
  CPU - 256 - VGG-16
  GPU - 512 - GoogLeNet
  CPU - 512 - ResNet-50
Whisper.cpp
Scikit-Learn:
  Isotonic / Perturbed Logarithm
  Isotonic / Logistic
  Hist Gradient Boosting Adult
TensorFlow
Scikit-Learn
SHOC Scalable HeterOgeneous Computing
TensorFlow:
  GPU - 512 - AlexNet
  GPU - 256 - GoogLeNet
  CPU - 256 - ResNet-50
  GPU - 64 - ResNet-50
XNNPACK:
  QS8MobileNetV2
  FP16MobileNetV3Small
  FP16MobileNetV3Large
  FP16MobileNetV2
  FP16MobileNetV1
  FP32MobileNetV3Small
  FP32MobileNetV3Large
  FP32MobileNetV2
  FP32MobileNetV1
PyTorch:
  CPU - 16 - Efficientnet_v2_l
  CPU - 512 - Efficientnet_v2_l
TensorFlow
Scikit-Learn
TensorFlow
Scikit-Learn
PyTorch:
  CPU - 16 - ResNet-152
  CPU - 512 - ResNet-152
TensorFlow
Whisper.cpp
PyTorch
TensorFlow
LeelaChessZero
NCNN:
  Vulkan GPU - FastestDet
  Vulkan GPU - vision_transformer
  Vulkan GPU - regnety_400m
  Vulkan GPU - squeezenet_ssd
  Vulkan GPU - yolov4-tiny
  Vulkan GPU - resnet50
  Vulkan GPU - alexnet
  Vulkan GPU - resnet18
  Vulkan GPU - vgg16
  Vulkan GPU - googlenet
  Vulkan GPU - blazeface
  Vulkan GPU - efficientnet-b0
  Vulkan GPU - mnasnet
  Vulkan GPU - shufflenet-v2
  Vulkan GPU-v3-v3 - mobilenet-v3
  Vulkan GPU-v2-v2 - mobilenet-v2
  Vulkan GPU - mobilenet
  CPU - FastestDet
  CPU - vision_transformer
  CPU - regnety_400m
  CPU - squeezenet_ssd
  CPU - yolov4-tiny
  CPU - resnet50
  CPU - alexnet
  CPU - resnet18
  CPU - vgg16
  CPU - googlenet
  CPU - blazeface
  CPU - efficientnet-b0
  CPU - mnasnet
  CPU - shufflenet-v2
  CPU-v3-v3 - mobilenet-v3
  CPU-v2-v2 - mobilenet-v2
  CPU - mobilenet
Scikit-Learn
TensorFlow
Scikit-Learn:
  Plot Parallel Pairwise
  Hist Gradient Boosting Higgs Boson
  Covertype Dataset Benchmark
TensorFlow:
  CPU - 32 - VGG-16
  GPU - 1 - VGG-16
Scikit-Learn
PyTorch:
  CPU - 256 - ResNet-50
  CPU - 64 - ResNet-50
  CPU - 1 - ResNet-152
TensorFlow Lite
ONNX Runtime:
  fcn-resnet101-11 - CPU - Parallel:
    Inference Time Cost (ms)
    Inferences Per Second
TensorFlow Lite:
  Inception ResNet V2
  Inception V4
TensorFlow
ONNX Runtime:
  yolov4 - CPU - Parallel:
    Inference Time Cost (ms)
    Inferences Per Second
  T5 Encoder - CPU - Parallel:
    Inference Time Cost (ms)
    Inferences Per Second
TensorFlow Lite:
  Mobilenet Float
  SqueezeNet
ONNX Runtime:
  ResNet50 v1-12-int8 - CPU - Parallel:
    Inference Time Cost (ms)
    Inferences Per Second
  super-resolution-10 - CPU - Parallel:
    Inference Time Cost (ms)
    Inferences Per Second
PyTorch
TensorFlow:
  CPU - 64 - ResNet-50
  GPU - 64 - GoogLeNet
Scikit-Learn
PyTorch:
  CPU - 64 - Efficientnet_v2_l
  CPU - 32 - Efficientnet_v2_l
Scikit-Learn:
  TSNE MNIST Dataset
  GLM
TensorFlow
Scikit-Learn
TensorFlow
Scikit-Learn
Whisper.cpp
Scikit-Learn
TensorFlow
Scikit-Learn:
  LocalOutlierFactor
  Hist Gradient Boosting Threading
PyTorch
Scikit-Learn
TensorFlow
Scikit-Learn
PyTorch
TensorFlow:
  GPU - 32 - GoogLeNet
  CPU - 32 - ResNet-50
Scikit-Learn
TensorFlow
Scikit-Learn:
  Plot Incremental PCA
  Sparsify
  Sample Without Replacement
OpenCV
Mobile Neural Network:
  inception-v3
  mobilenet-v1-1.0
  MobileNetV2_224
  SqueezeNetV1.0
  resnet-v2-50
  squeezenetv1.1
  mobilenetV3
  nasnet
PyTorch
Numpy Benchmark
Scikit-Learn
TensorFlow
Scikit-Learn
ONNX Runtime:
  yolov4 - CPU - Standard:
    Inference Time Cost (ms)
    Inferences Per Second
oneDNN
TensorFlow
oneDNN
Scikit-Learn
TensorFlow:
  GPU - 16 - GoogLeNet
  CPU - 16 - ResNet-50
PyTorch
OpenVINO:
  Face Detection FP16 - CPU:
    ms
    FPS
PyTorch
ONNX Runtime:
  ResNet101_DUC_HDC-12 - CPU - Parallel:
    Inference Time Cost (ms)
    Inferences Per Second
  ResNet101_DUC_HDC-12 - CPU - Standard:
    Inference Time Cost (ms)
    Inferences Per Second
OpenVINO:
  Face Detection FP16-INT8 - CPU:
    ms
    FPS
PyTorch
ONNX Runtime:
  fcn-resnet101-11 - CPU - Standard:
    Inference Time Cost (ms)
    Inferences Per Second
OpenVINO:
  Machine Translation EN To DE FP16 - CPU:
    ms
    FPS
  Person Detection FP16 - CPU:
    ms
    FPS
  Person Detection FP32 - CPU:
    ms
    FPS
ONNX Runtime:
  ZFNet-512 - CPU - Parallel:
    Inference Time Cost (ms)
    Inferences Per Second
  ZFNet-512 - CPU - Standard:
    Inference Time Cost (ms)
    Inferences Per Second
OpenVINO:
  Road Segmentation ADAS FP16-INT8 - CPU:
    ms
    FPS
ONNX Runtime:
  T5 Encoder - CPU - Standard:
    Inference Time Cost (ms)
    Inferences Per Second
Scikit-Learn
TensorFlow Lite
OpenVINO:
  Handwritten English Recognition FP16 - CPU:
    ms
    FPS
  Noise Suppression Poconet-Like FP16 - CPU:
    ms
    FPS
  Handwritten English Recognition FP16-INT8 - CPU:
    ms
    FPS
  Person Vehicle Bike Detection FP16 - CPU:
    ms
    FPS
  Road Segmentation ADAS FP16 - CPU:
    ms
    FPS
  Person Re-Identification Retail FP16 - CPU:
    ms
    FPS
  Vehicle Detection FP16-INT8 - CPU:
    ms
    FPS
  Weld Porosity Detection FP16 - CPU:
    ms
    FPS
  Face Detection Retail FP16-INT8 - CPU:
    ms
    FPS
  Age Gender Recognition Retail 0013 FP16-INT8 - CPU:
    ms
    FPS
  Vehicle Detection FP16 - CPU:
    ms
    FPS
  Weld Porosity Detection FP16-INT8 - CPU:
    ms
    FPS
  Face Detection Retail FP16 - CPU:
    ms
    FPS
ONNX Runtime:
  CaffeNet 12-int8 - CPU - Parallel:
    Inference Time Cost (ms)
    Inferences Per Second
OpenVINO:
  Age Gender Recognition Retail 0013 FP16 - CPU:
    ms
    FPS
ONNX Runtime:
  CaffeNet 12-int8 - CPU - Standard:
    Inference Time Cost (ms)
    Inferences Per Second
  ResNet50 v1-12-int8 - CPU - Standard:
    Inference Time Cost (ms)
    Inferences Per Second
  super-resolution-10 - CPU - Standard:
    Inference Time Cost (ms)
    Inferences Per Second
Scikit-Learn
TensorFlow
Scikit-Learn
TensorFlow
DeepSpeech
TensorFlow:
  CPU - 32 - GoogLeNet
  CPU - 64 - AlexNet
Scikit-Learn
PyTorch:
  NVIDIA CUDA GPU - 32 - Efficientnet_v2_l
  NVIDIA CUDA GPU - 64 - Efficientnet_v2_l
  NVIDIA CUDA GPU - 256 - Efficientnet_v2_l
  NVIDIA CUDA GPU - 16 - Efficientnet_v2_l
  NVIDIA CUDA GPU - 512 - Efficientnet_v2_l
TensorFlow
PyTorch
oneDNN
TensorFlow:
  CPU - 32 - AlexNet
  GPU - 1 - ResNet-50
  CPU - 16 - GoogLeNet
SHOC Scalable HeterOgeneous Computing
PyTorch:
  NVIDIA CUDA GPU - 64 - ResNet-152
  NVIDIA CUDA GPU - 512 - ResNet-152
  NVIDIA CUDA GPU - 32 - ResNet-152
  NVIDIA CUDA GPU - 16 - ResNet-152
  NVIDIA CUDA GPU - 256 - ResNet-152
  NVIDIA CUDA GPU - 1 - Efficientnet_v2_l
oneDNN
TensorFlow
Scikit-Learn
R Benchmark
TensorFlow:
  CPU - 1 - ResNet-50
  GPU - 1 - AlexNet
PyTorch
oneDNN
TensorFlow
PyTorch:
  NVIDIA CUDA GPU - 256 - ResNet-50
  NVIDIA CUDA GPU - 32 - ResNet-50
  NVIDIA CUDA GPU - 64 - ResNet-50
  NVIDIA CUDA GPU - 512 - ResNet-50
  NVIDIA CUDA GPU - 16 - ResNet-50
RNNoise
oneDNN
PyTorch
TensorFlow
oneDNN
SHOC Scalable HeterOgeneous Computing:
  OpenCL - GEMM SGEMM_N
  OpenCL - Bus Speed Readback
  OpenCL - Bus Speed Download
  OpenCL - Triad
  OpenCL - FFT SP
  OpenCL - Reduction
  OpenCL - S3D
  OpenCL - MD5 Hash