phoronix-machine-learning.txt AMD Ryzen Threadripper 7960X 24-Cores testing with a Gigabyte TRX50 AERO D (FA BIOS) and Sapphire AMD Radeon RX 7900 XTX 24GB on Ubuntu 24.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2411137-NE-PHORONIXM28&grr .
phoronix-machine-learning.txt system details (phoronix-ml.txt)
Processor: AMD Ryzen Threadripper 7960X 24-Cores @ 7.79GHz (24 Cores / 48 Threads)
Motherboard: Gigabyte TRX50 AERO D (FA BIOS)
Chipset: AMD Device 14a4
Memory: 4 x 32GB DDR5-5200MT/s Micron MTC20F1045S1RC56BG1
Disk: 1000GB GIGABYTE AG512K1TB
Graphics: Sapphire AMD Radeon RX 7900 XTX 24GB
Audio: AMD Device 14cc
Monitor: HP E273
Network: Aquantia AQC113C NBase-T/IEEE + Realtek RTL8125 2.5GbE + Qualcomm WCN785x Wi-Fi 7
OS: Ubuntu 24.04
Kernel: 6.8.0-48-generic (x86_64)
Desktop: GNOME Shell 46.0
Display Server: X Server + Wayland
OpenGL: 4.6 Mesa 24.2.0-devel (LLVM 18.1.7 DRM 3.58)
OpenCL: OpenCL 2.1 AMD-APP (3625.0)
Compiler: GCC 13.2.0
File-System: ext4
Screen Resolution: 1920x1080
- Transparent Huge Pages: madvise
- GCC configuration: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-uJ7kn6/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-uJ7kn6/gcc-13-13.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
- Scaling Governor: amd-pstate-epp powersave (EPP: balance_performance) - CPU Microcode: 0xa108105
- BAR1 / Visible vRAM Size: 24560 MB
- Python 3.12.3
- Security mitigations: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + reg_file_data_sampling: Not affected + retbleed: Not affected + spec_rstack_overflow: Mitigation of Safe RET + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; STIBP: always-on; RSB filling; PBRSB-eIBRS: Not affected; BHI: Not affected + srbds: Not affected + tsx_async_abort: Not affected
phoronix-machine-learning.txt results summary
Test | phoronix-ml.txt
tensorflow: GPU - 512 - VGG-16 | 2.55
tensorflow: GPU - 256 - VGG-16 | 2.55
tensorflow: GPU - 512 - ResNet-50 | 9.34
scikit-learn: Isotonic / Pathological | 3843.258
tensorflow: GPU - 256 - ResNet-50 | 9.36
tensorflow: GPU - 64 - VGG-16 | 2.55
scikit-learn: Isotonic / Perturbed Logarithm | 1528.966
tensorflow: GPU - 512 - GoogLeNet | 28.73
scikit-learn: Isotonic / Logistic | 1406.452
tensorflow: CPU - 512 - VGG-16 | 30.43
tensorflow: CPU - 256 - ResNet-50 | 59.70
tensorflow: GPU - 32 - VGG-16 | 2.54
tensorflow: GPU - 512 - AlexNet | 49.28
lczero: BLAS | 184
scikit-learn: SAGA | 669.118
tensorflow: GPU - 256 - GoogLeNet | 29.03
tensorflow: CPU - 512 - ResNet-50 | 58.50
tensorflow: CPU - 256 - VGG-16 | 30.16
pytorch: CPU - 512 - Efficientnet_v2_l | 9.94
tensorflow: GPU - 64 - ResNet-50 | 9.24
tensorflow: CPU - 512 - GoogLeNet | 185.25
tensorflow: GPU - 16 - VGG-16 | 2.51
scikit-learn: Sparse Random Projections / 100 Iterations | 504.829
scikit-learn: Hist Gradient Boosting Adult | 153.338
whisper-cpp: ggml-medium.en - 2016 State of the Union | 579.17077
tensorflow: GPU - 256 - AlexNet | 48.92
scikit-learn: Plot Parallel Pairwise | 167.997
scikit-learn: Hist Gradient Boosting Higgs Boson | 65.736
ncnn: CPU - FastestDet | 9.80
ncnn: CPU - vision_transformer | 40.59
ncnn: CPU - regnety_400m | 18.58
ncnn: CPU - squeezenet_ssd | 14.34
ncnn: CPU - yolov4-tiny | 24.14
ncnn: CPU - mobilenetv2-yolov3 | 13.85
ncnn: CPU - resnet50 | 13.33
ncnn: CPU - alexnet | 5.56
ncnn: CPU - resnet18 | 8.02
ncnn: CPU - vgg16 | 25.71
ncnn: CPU - googlenet | 16.42
ncnn: CPU - blazeface | 3.11
ncnn: CPU - efficientnet-b0 | 8.28
ncnn: CPU - mnasnet | 6.11
ncnn: CPU - shufflenet-v2 | 8.15
ncnn: CPU - mobilenet-v3 | 6.45
ncnn: CPU - mobilenet-v2 | 6.30
ncnn: CPU - mobilenet | 13.85
scikit-learn: Covertype Dataset Benchmark | 320.394
scikit-learn: Lasso | 308.225
tensorflow: GPU - 32 - ResNet-50 | 9.14
scikit-learn: SGDOneClassSVM | 233.407
scikit-learn: TSNE MNIST Dataset | 247.556
openvino: Noise Suppression Poconet-Like FP16 - CPU | 11.53
openvino: Noise Suppression Poconet-Like FP16 - CPU | 2052.02
openvino: Person Detection FP16 - CPU | 93.29
openvino: Person Detection FP16 - CPU | 129.45
openvino: Person Detection FP32 - CPU | 100.46
openvino: Person Detection FP32 - CPU | 119.41
tensorflow-lite: Inception V4 | 20372.6
tensorflow-lite: NASNet Mobile | 33662.5
tensorflow-lite: SqueezeNet | 1836.28
openvino: Road Segmentation ADAS FP16 - CPU | 25.83
openvino: Road Segmentation ADAS FP16 - CPU | 465.86
openvino: Vehicle Detection FP16 - CPU | 11.12
openvino: Vehicle Detection FP16 - CPU | 1077.87
onnx: CaffeNet 12-int8 - CPU - Parallel | 4.92768
onnx: CaffeNet 12-int8 - CPU - Parallel | 203.059
scikit-learn: Isolation Forest | 176.287
tensorflow: GPU - 64 - GoogLeNet | 28.53
onnx: fcn-resnet101-11 - CPU - Parallel | 834.572
onnx: fcn-resnet101-11 - CPU - Parallel | 1.20339
tensorflow: CPU - 64 - VGG-16 | 29.05
openvino: Machine Translation EN To DE FP16 - CPU | 63.99
openvino: Machine Translation EN To DE FP16 - CPU | 187.88
scikit-learn: GLM | 168.333
scikit-learn: Hist Gradient Boosting | 166.776
whisper-cpp: ggml-small.en - 2016 State of the Union | 218.18523
tensorflow: GPU - 16 - ResNet-50 | 8.94
pytorch: CPU - 32 - Efficientnet_v2_l | 9.85
pytorch: CPU - 16 - Efficientnet_v2_l | 9.90
pytorch: CPU - 64 - Efficientnet_v2_l | 9.98
mnn: inception-v3 | 36.452
mnn: mobilenet-v1-1.0 | 3.784
mnn: MobileNetV2_224 | 3.268
mnn: SqueezeNetV1.0 | 6.429
mnn: resnet-v2-50 | 18.534
mnn: squeezenetv1.1 | 4.327
mnn: mobilenetV3 | 2.536
mnn: nasnet | 15.297
pytorch: CPU - 256 - Efficientnet_v2_l | 10.11
scikit-learn: Plot Hierarchical | 141.463
xnnpack: QS8MobileNetV2 | 1398
xnnpack: FP16MobileNetV3Small | 1464
xnnpack: FP16MobileNetV3Large | 2128
xnnpack: FP16MobileNetV2 | 1495
xnnpack: FP16MobileNetV1 | 1144
xnnpack: FP32MobileNetV3Small | 1503
xnnpack: FP32MobileNetV3Large | 2465
xnnpack: FP32MobileNetV2 | 1873
xnnpack: FP32MobileNetV1 | 1233
shoc: OpenCL - S3D | 289.543
opencv: DNN - Deep Neural Network | 33080
scikit-learn: Hist Gradient Boosting Categorical Only | 30.188
scikit-learn: Plot Neighbors | 114.839
tensorflow: GPU - 64 - AlexNet | 47.85
scikit-learn: Sparsify | 108.455
scikit-learn: Plot Polynomial Kernel Approximation | 104.700
scikit-learn: Feature Expansions | 100.353
tensorflow: GPU - 32 - GoogLeNet | 27.91
tensorflow: CPU - 256 - GoogLeNet | 227.06
scikit-learn: Plot Ward | 42.104
tensorflow: CPU - 32 - VGG-16 | 28.32
scikit-learn: Sample Without Replacement | 90.640
pytorch: CPU - 64 - ResNet-152 | 17.64
pytorch: CPU - 256 - ResNet-152 | 17.92
pytorch: CPU - 512 - ResNet-152 | 17.99
pytorch: CPU - 32 - ResNet-152 | 17.78
pytorch: CPU - 16 - ResNet-152 | 17.97
numpy | 715.50
tensorflow: CPU - 64 - ResNet-50 | 70.89
whisper-cpp: ggml-base.en - 2016 State of the Union | 92.75261
scikit-learn: Tree | 46.970
tensorflow: CPU - 512 - AlexNet | 643.44
ncnn: Vulkan GPU - FastestDet | 9.82
ncnn: Vulkan GPU - vision_transformer | 41.05
ncnn: Vulkan GPU - regnety_400m | 18.63
ncnn: Vulkan GPU - squeezenet_ssd | 16.04
ncnn: Vulkan GPU - yolov4-tiny | 23.64
ncnn: Vulkan GPU - mobilenetv2-yolov3 | 13.79
ncnn: Vulkan GPU - resnet50 | 14.65
ncnn: Vulkan GPU - alexnet | 5.28
ncnn: Vulkan GPU - resnet18 | 7.86
ncnn: Vulkan GPU - vgg16 | 25.13
ncnn: Vulkan GPU - googlenet | 16.01
ncnn: Vulkan GPU - blazeface | 3.14
ncnn: Vulkan GPU - efficientnet-b0 | 8.68
ncnn: Vulkan GPU - mnasnet | 5.99
ncnn: Vulkan GPU - shufflenet-v2 | 8.06
ncnn: Vulkan GPU - mobilenet-v3 | 6.49
ncnn: Vulkan GPU - mobilenet-v2 | 6.31
ncnn: Vulkan GPU - mobilenet | 13.79
scikit-learn: Hist Gradient Boosting Threading | 52.729
scikit-learn: SGD Regression | 64.368
scikit-learn: Kernel PCA Solvers / Time vs. N Samples | 61.611
pytorch: CPU - 1 - Efficientnet_v2_l | 14.18
onnx: ZFNet-512 - CPU - Parallel | 17.2875
onnx: ZFNet-512 - CPU - Parallel | 57.8612
onnx: ZFNet-512 - CPU - Standard | 9.10376
onnx: ZFNet-512 - CPU - Standard | 109.863
onnx: T5 Encoder - CPU - Standard | 5.81489
onnx: T5 Encoder - CPU - Standard | 172.037
tensorflow: GPU - 32 - AlexNet | 46.14
onednn: Recurrent Neural Network Training - CPU | 1261.40
onednn: IP Shapes 1D - CPU | 1.13657
shoc: OpenCL - Max SP Flops | 93757.3
onednn: Recurrent Neural Network Inference - CPU | 736.400
scikit-learn: MNIST Dataset | 52.736
tensorflow: GPU - 16 - GoogLeNet | 26.91
tensorflow: CPU - 16 - VGG-16 | 27.34
scikit-learn: Plot Incremental PCA | 31.209
scikit-learn: Text Vectorizers | 45.340
openvino: Face Detection FP16 - CPU | 607.48
openvino: Face Detection FP16 - CPU | 19.69
openvino: Face Detection FP16-INT8 - CPU | 320.42
openvino: Face Detection FP16-INT8 - CPU | 37.36
onnx: ResNet101_DUC_HDC-12 - CPU - Parallel | 770.906
onnx: ResNet101_DUC_HDC-12 - CPU - Parallel | 1.29718
onnx: ResNet101_DUC_HDC-12 - CPU - Standard | 452.314
onnx: ResNet101_DUC_HDC-12 - CPU - Standard | 2.21086
onnx: fcn-resnet101-11 - CPU - Standard | 242.908
onnx: fcn-resnet101-11 - CPU - Standard | 4.11676
onnx: yolov4 - CPU - Parallel | 185.360
onnx: yolov4 - CPU - Parallel | 5.39684
onnx: T5 Encoder - CPU - Parallel | 3.67307
onnx: T5 Encoder - CPU - Parallel | 272.184
onnx: yolov4 - CPU - Standard | 103.579
onnx: yolov4 - CPU - Standard | 9.65442
openvino: Road Segmentation ADAS FP16-INT8 - CPU | 17.62
openvino: Road Segmentation ADAS FP16-INT8 - CPU | 679.66
openvino: Person Vehicle Bike Detection FP16 - CPU | 6.18
openvino: Person Vehicle Bike Detection FP16 - CPU | 1930.08
tensorflow-lite: Inception ResNet V2 | 33356.3
tensorflow-lite: Mobilenet Float | 1381.25
tensorflow-lite: Mobilenet Quant | 2501.21
openvino: Person Re-Identification Retail FP16 - CPU | 4.72
openvino: Person Re-Identification Retail FP16 - CPU | 2523.59
openvino: Face Detection Retail FP16-INT8 - CPU | 3.58
openvino: Face Detection Retail FP16-INT8 - CPU | 6458.38
openvino: Handwritten English Recognition FP16-INT8 - CPU | 21.58
openvino: Handwritten English Recognition FP16-INT8 - CPU | 1108.65
openvino: Age Gender Recognition Retail 0013 FP16-INT8 - CPU | 0.3
openvino: Age Gender Recognition Retail 0013 FP16-INT8 - CPU | 67537.77
openvino: Vehicle Detection FP16-INT8 - CPU | 5.51
openvino: Vehicle Detection FP16-INT8 - CPU | 2160.57
openvino: Handwritten English Recognition FP16 - CPU | 23.12
openvino: Handwritten English Recognition FP16 - CPU | 1035.29
openvino: Age Gender Recognition Retail 0013 FP16 - CPU | 0.43
openvino: Age Gender Recognition Retail 0013 FP16 - CPU | 48433.92
openvino: Weld Porosity Detection FP16 - CPU | 12.25
openvino: Weld Porosity Detection FP16 - CPU | 1947.67
openvino: Weld Porosity Detection FP16-INT8 - CPU | 6.31
openvino: Weld Porosity Detection FP16-INT8 - CPU | 3742.77
openvino: Face Detection Retail FP16 - CPU | 2.76
openvino: Face Detection Retail FP16 - CPU | 4273.09
onnx: CaffeNet 12-int8 - CPU - Standard | 1.56253
onnx: CaffeNet 12-int8 - CPU - Standard | 639.905
onnx: ResNet50 v1-12-int8 - CPU - Parallel | 9.18292
onnx: ResNet50 v1-12-int8 - CPU - Parallel | 108.875
onnx: ResNet50 v1-12-int8 - CPU - Standard | 3.04551
onnx: ResNet50 v1-12-int8 - CPU - Standard | 328.293
onnx: super-resolution-10 - CPU - Parallel | 8.08149
onnx: super-resolution-10 - CPU - Parallel | 123.736
onnx: super-resolution-10 - CPU - Standard | 10.2754
onnx: super-resolution-10 - CPU - Standard | 97.3177
tensorflow: CPU - 32 - ResNet-50 | 67.94
scikit-learn: Plot OMP vs. LARS | 41.476
tensorflow: GPU - 1 - VGG-16 | 2.20
pytorch: CPU - 1 - ResNet-152 | 23.19
tensorflow: GPU - 1 - AlexNet | 15.38
tensorflow: CPU - 256 - AlexNet | 627.50
onednn: IP Shapes 3D - CPU | 1.39591
pytorch: CPU - 16 - ResNet-50 | 45.59
pytorch: CPU - 512 - ResNet-50 | 45.75
pytorch: CPU - 256 - ResNet-50 | 46.06
pytorch: CPU - 32 - ResNet-50 | 46.42
pytorch: CPU - 64 - ResNet-50 | 46.69
tensorflow: GPU - 16 - AlexNet | 42.43
scikit-learn: Kernel PCA Solvers / Time vs. N Components | 31.037
deepspeech: CPU | 46.23475
tensorflow: CPU - 64 - GoogLeNet | 225.16
scikit-learn: LocalOutlierFactor | 21.616
tensorflow: CPU - 16 - ResNet-50 | 62.29
onednn: Deconvolution Batch shapes_1d - CPU | 3.77567
pytorch: CPU - 1 - ResNet-50 | 60.17
tensorflow: GPU - 1 - ResNet-50 | 6.69
rbenchmark | 0.1252
tensorflow: CPU - 32 - GoogLeNet | 218.07
scikit-learn: 20 Newsgroups / Logistic Regression | 10.450
tensorflow: CPU - 64 - AlexNet | 516.18
tensorflow: CPU - 1 - VGG-16 | 9.70
tensorflow: CPU - 16 - GoogLeNet | 198.11
tensorflow: CPU - 32 - AlexNet | 409.56
tensorflow: CPU - 1 - ResNet-50 | 18.38
onednn: Convolution Batch Shapes Auto - CPU | 2.36317
rnnoise: 26 Minute Long Talking Sample | 7.852
tensorflow: CPU - 16 - AlexNet | 288.71
tensorflow: GPU - 1 - GoogLeNet | 21.03
shoc: OpenCL - Texture Read Bandwidth | 1003.321
tensorflow: CPU - 1 - AlexNet | 30.66
tensorflow: CPU - 1 - GoogLeNet | 60.92
onednn: Deconvolution Batch shapes_3d - CPU | 1.85206
shoc: OpenCL - Triad | 13.8158
shoc: OpenCL - GEMM SGEMM_N | 7615.35
shoc: OpenCL - Bus Speed Download | 24.9893
shoc: OpenCL - Bus Speed Readback | 26.2525
shoc: OpenCL - Reduction | 42.9449
shoc: OpenCL - FFT SP | 752.837
shoc: OpenCL - MD5 Hash | 46.5084
deepsparse: NLP Document Classification, oBERT base uncased on IMDB - Asynchronous Multi-Stream | (no value listed)
TensorFlow 2.16.1 - Device: GPU - Batch Size: 512 - Model: VGG-16 [images/sec, More Is Better]: 2.55 (SE +/- 0.00, N = 3)
TensorFlow 2.16.1 - Device: GPU - Batch Size: 256 - Model: VGG-16 [images/sec, More Is Better]: 2.55 (SE +/- 0.01, N = 3)
TensorFlow 2.16.1 - Device: GPU - Batch Size: 512 - Model: ResNet-50 [images/sec, More Is Better]: 9.34 (SE +/- 0.03, N = 3)
Scikit-Learn 1.2.2 - Benchmark: Isotonic / Pathological [Seconds, Fewer Is Better]: 3843.26 (SE +/- 9.06, N = 3)
TensorFlow 2.16.1 - Device: GPU - Batch Size: 256 - Model: ResNet-50 [images/sec, More Is Better]: 9.36 (SE +/- 0.01, N = 3)
TensorFlow 2.16.1 - Device: GPU - Batch Size: 64 - Model: VGG-16 [images/sec, More Is Better]: 2.55 (SE +/- 0.00, N = 3)
Scikit-Learn 1.2.2 - Benchmark: Isotonic / Perturbed Logarithm [Seconds, Fewer Is Better]: 1528.97 (SE +/- 2.36, N = 3)
TensorFlow 2.16.1 - Device: GPU - Batch Size: 512 - Model: GoogLeNet [images/sec, More Is Better]: 28.73 (SE +/- 0.10, N = 3)
Scikit-Learn 1.2.2 - Benchmark: Isotonic / Logistic [Seconds, Fewer Is Better]: 1406.45 (SE +/- 0.82, N = 3)
TensorFlow 2.16.1 - Device: CPU - Batch Size: 512 - Model: VGG-16 [images/sec, More Is Better]: 30.43 (SE +/- 0.01, N = 3)
TensorFlow 2.16.1 - Device: CPU - Batch Size: 256 - Model: ResNet-50 [images/sec, More Is Better]: 59.70 (SE +/- 0.91, N = 9)
TensorFlow 2.16.1 - Device: GPU - Batch Size: 32 - Model: VGG-16 [images/sec, More Is Better]: 2.54 (SE +/- 0.00, N = 3)
TensorFlow 2.16.1 - Device: GPU - Batch Size: 512 - Model: AlexNet [images/sec, More Is Better]: 49.28 (SE +/- 0.09, N = 3)
LeelaChessZero 0.31.1 - Backend: BLAS [Nodes Per Second, More Is Better]: 184 (SE +/- 12.39, N = 9)
Build options, Scikit-Learn 1.2.2: (F9X) gfortran options: -O0
Build options, LeelaChessZero 0.31.1: (CXX) g++ options: -flto -pthread
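Each result above is reported as a mean with an "SE +/-" figure and a run count N. The following is a minimal sketch for putting those figures on a common scale, assuming SE is the standard error of the mean (s / sqrt(N)); this export does not state the formula explicitly. The function names are illustrative and not part of the Phoronix Test Suite.

```python
import math


def relative_standard_error(mean, se):
    """Standard error expressed as a percentage of the mean."""
    return 100.0 * se / mean


def implied_std_dev(se, n):
    """Sample standard deviation implied by SE = s / sqrt(n) (assumed formula)."""
    return se * math.sqrt(n)


# (label, mean, SE, N) copied from the results above
results = [
    ("TensorFlow GPU 512 GoogLeNet", 28.73, 0.10, 3),
    ("TensorFlow CPU 256 ResNet-50", 59.70, 0.91, 9),
    ("LeelaChessZero BLAS", 184.0, 12.39, 9),
]

for label, mean, se, n in results:
    rse = relative_standard_error(mean, se)
    sd = implied_std_dev(se, n)
    print(f"{label}: RSE {rse:.2f}%, implied s {sd:.2f} over {n} runs")
```

Under that assumption, the LeelaChessZero result stands out as the noisiest of the three, which is consistent with the suite extending it to 9 runs.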
Scikit-Learn 1.2.2 - Benchmark: SAGA [Seconds, Fewer Is Better]: 669.12 (SE +/- 3.66, N = 3)
TensorFlow 2.16.1 - Device: GPU - Batch Size: 256 - Model: GoogLeNet [images/sec, More Is Better]: 29.03 (SE +/- 0.01, N = 3)
TensorFlow 2.16.1 - Device: CPU - Batch Size: 512 - Model: ResNet-50 [images/sec, More Is Better]: 58.50 (SE +/- 0.33, N = 3)
TensorFlow 2.16.1 - Device: CPU - Batch Size: 256 - Model: VGG-16 [images/sec, More Is Better]: 30.16 (SE +/- 0.08, N = 3)
PyTorch 2.2.1 - Device: CPU - Batch Size: 512 - Model: Efficientnet_v2_l [batches/sec, More Is Better]: 9.94 (SE +/- 0.09, N = 12; MIN: 7.81 / MAX: 10.45)
TensorFlow 2.16.1 - Device: GPU - Batch Size: 64 - Model: ResNet-50 [images/sec, More Is Better]: 9.24 (SE +/- 0.00, N = 3)
TensorFlow 2.16.1 - Device: CPU - Batch Size: 512 - Model: GoogLeNet [images/sec, More Is Better]: 185.25 (SE +/- 1.69, N = 7)
TensorFlow 2.16.1 - Device: GPU - Batch Size: 16 - Model: VGG-16 [images/sec, More Is Better]: 2.51 (SE +/- 0.00, N = 3)
Build options, Scikit-Learn 1.2.2: (F9X) gfortran options: -O0
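In the TensorFlow results so far, every "Device: GPU" figure is lower than the matching "Device: CPU" figure for the same model and batch size. The export does not explain why, so the sketch below only quantifies the gap from the numbers listed above; the helper name is illustrative.

```python
# images/sec copied from the TensorFlow 2.16.1 results above,
# keyed by (model, batch size)
gpu = {
    ("VGG-16", 512): 2.55,
    ("VGG-16", 256): 2.55,
    ("ResNet-50", 512): 9.34,
    ("ResNet-50", 256): 9.36,
    ("GoogLeNet", 512): 28.73,
}
cpu = {
    ("VGG-16", 512): 30.43,
    ("VGG-16", 256): 30.16,
    ("ResNet-50", 512): 58.50,
    ("ResNet-50", 256): 59.70,
    ("GoogLeNet", 512): 185.25,
}


def cpu_over_gpu(cpu_rates, gpu_rates):
    """Ratio of CPU to GPU throughput for every configuration present in both."""
    shared = sorted(cpu_rates.keys() & gpu_rates.keys())
    return {key: cpu_rates[key] / gpu_rates[key] for key in shared}


ratios = cpu_over_gpu(cpu, gpu)
for (model, batch), ratio in ratios.items():
    print(f"{model} @ batch {batch}: CPU is {ratio:.1f}x the GPU figure")
```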
Scikit-Learn 1.2.2 - Benchmark: Sparse Random Projections / 100 Iterations [Seconds, Fewer Is Better]: 504.83 (SE +/- 2.62, N = 3)
Scikit-Learn 1.2.2 - Benchmark: Hist Gradient Boosting Adult [Seconds, Fewer Is Better]: 153.34 (SE +/- 1.23, N = 12)
Whisper.cpp 1.6.2 - Model: ggml-medium.en - Input: 2016 State of the Union [Seconds, Fewer Is Better]: 579.17 (SE +/- 1.41, N = 3)
TensorFlow 2.16.1 - Device: GPU - Batch Size: 256 - Model: AlexNet [images/sec, More Is Better]: 48.92 (SE +/- 0.03, N = 3)
Scikit-Learn 1.2.2 - Benchmark: Plot Parallel Pairwise [Seconds, Fewer Is Better]: 168.00 (SE +/- 4.47, N = 9)
Scikit-Learn 1.2.2 - Benchmark: Hist Gradient Boosting Higgs Boson [Seconds, Fewer Is Better]: 65.74 (SE +/- 0.83, N = 3)
Build options, Scikit-Learn 1.2.2: (F9X) gfortran options: -O0
Build options, Whisper.cpp 1.6.2: (CXX) g++ options: -O3 -std=c++11 -fPIC -pthread -msse3 -mssse3 -mavx -mf16c -mfma -mavx2 -mavx512f -mavx512cd -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512vnni
NCNN 20230517 - Target: CPU - Model: FastestDet [ms, Fewer Is Better]: 9.80 (SE +/- 0.30, N = 15; MIN: 6.73 / MAX: 273.49)
NCNN 20230517 - Target: CPU - Model: vision_transformer [ms, Fewer Is Better]: 40.59 (SE +/- 0.18, N = 15; MIN: 37.83 / MAX: 299.43)
NCNN 20230517 - Target: CPU - Model: regnety_400m [ms, Fewer Is Better]: 18.58 (SE +/- 0.12, N = 15; MIN: 17.32 / MAX: 295.21)
NCNN 20230517 - Target: CPU - Model: squeezenet_ssd [ms, Fewer Is Better]: 14.34 (SE +/- 0.12, N = 15; MIN: 13.13 / MAX: 263.27)
NCNN 20230517 - Target: CPU - Model: yolov4-tiny [ms, Fewer Is Better]: 24.14 (SE +/- 0.11, N = 15; MIN: 21.58 / MAX: 105.12)
NCNN 20230517 - Target: CPU - Model: mobilenetv2-yolov3 [ms, Fewer Is Better]: 13.85 (SE +/- 0.12, N = 15; MIN: 12.83 / MAX: 247.13)
NCNN 20230517 - Target: CPU - Model: resnet50 [ms, Fewer Is Better]: 13.33 (SE +/- 0.13, N = 15; MIN: 11.94 / MAX: 281.3)
NCNN 20230517 - Target: CPU - Model: alexnet [ms, Fewer Is Better]: 5.56 (SE +/- 0.07, N = 15; MIN: 5.07 / MAX: 35.35)
NCNN 20230517 - Target: CPU - Model: resnet18 [ms, Fewer Is Better]: 8.02 (SE +/- 0.07, N = 15; MIN: 7.54 / MAX: 17.96)
NCNN 20230517 - Target: CPU - Model: vgg16 [ms, Fewer Is Better]: 25.71 (SE +/- 0.34, N = 15; MIN: 22.56 / MAX: 344.2)
NCNN 20230517 - Target: CPU - Model: googlenet [ms, Fewer Is Better]: 16.42 (SE +/- 0.15, N = 15; MIN: 15.38 / MAX: 271.34)
NCNN 20230517 - Target: CPU - Model: blazeface [ms, Fewer Is Better]: 3.11 (SE +/- 0.02, N = 15; MIN: 2.85 / MAX: 11.53)
NCNN 20230517 - Target: CPU - Model: efficientnet-b0 [ms, Fewer Is Better]: 8.28 (SE +/- 0.09, N = 15; MIN: 7.57 / MAX: 296.15)
NCNN 20230517 - Target: CPU - Model: mnasnet [ms, Fewer Is Better]: 6.11 (SE +/- 0.11, N = 15; MIN: 5.26 / MAX: 321.29)
NCNN 20230517 - Target: CPU - Model: shufflenet-v2 [ms, Fewer Is Better]: 8.15 (SE +/- 0.09, N = 15; MIN: 7.52 / MAX: 291.18)
NCNN 20230517 - Target: CPU - Model: mobilenet-v3 [ms, Fewer Is Better]: 6.45 (SE +/- 0.04, N = 15; MIN: 5.96 / MAX: 63.77)
NCNN 20230517 - Target: CPU - Model: mobilenet-v2 [ms, Fewer Is Better]: 6.30 (SE +/- 0.05, N = 15; MIN: 5.63 / MAX: 33.15)
NCNN 20230517 - Target: CPU - Model: mobilenet [ms, Fewer Is Better]: 13.85 (SE +/- 0.12, N = 15; MIN: 12.83 / MAX: 247.13)
Build options, NCNN 20230517: (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
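Several NCNN CPU results above pair a mean of a few milliseconds with a MAX of several hundred. A small sketch, using a handful of the rows above, can flag results where the worst sample is far from the mean. The 10x threshold is a hypothetical cut-off chosen for illustration, not something the benchmark defines.

```python
# (model, mean ms, max ms) copied from the NCNN CPU results above
ncnn_cpu = [
    ("FastestDet", 9.80, 273.49),
    ("alexnet", 5.56, 35.35),
    ("resnet18", 8.02, 17.96),
    ("mnasnet", 6.11, 321.29),
    ("blazeface", 3.11, 11.53),
]

# Hypothetical cut-off: flag a result when MAX exceeds 10x the mean
THRESHOLD = 10.0


def tail_ratio(mean_ms, max_ms):
    """How many times larger the slowest sample is than the mean."""
    return max_ms / mean_ms


flagged = [
    model for model, mean_ms, max_ms in ncnn_cpu
    if tail_ratio(mean_ms, max_ms) > THRESHOLD
]

for model, mean_ms, max_ms in ncnn_cpu:
    marker = "  <-- outlier-heavy" if model in flagged else ""
    print(f"{model}: max/mean = {tail_ratio(mean_ms, max_ms):.1f}{marker}")
```

A high ratio with a small SE suggests a few isolated slow samples rather than generally unstable timing, though the export does not include the per-sample data needed to confirm that.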
Scikit-Learn 1.2.2 - Benchmark: Covertype Dataset Benchmark [Seconds, Fewer Is Better]: 320.39 (SE +/- 0.61, N = 3)
Scikit-Learn 1.2.2 - Benchmark: Lasso [Seconds, Fewer Is Better]: 308.23 (SE +/- 0.07, N = 3)
TensorFlow 2.16.1 - Device: GPU - Batch Size: 32 - Model: ResNet-50 [images/sec, More Is Better]: 9.14 (SE +/- 0.01, N = 3)
Build options, Scikit-Learn 1.2.2: (F9X) gfortran options: -O0
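The TensorFlow "Device: GPU" throughput reported so far barely moves with batch size. As a rough check, the sketch below computes the spread between the best and worst batch size for each model, using only the values listed above; `relative_spread` is an illustrative helper, not a Phoronix Test Suite metric.

```python
# images/sec by batch size, copied from the TensorFlow GPU results above
gpu_by_batch = {
    "VGG-16": {512: 2.55, 256: 2.55, 64: 2.55, 32: 2.54, 16: 2.51},
    "ResNet-50": {512: 9.34, 256: 9.36, 64: 9.24, 32: 9.14},
}


def relative_spread(values):
    """(max - min) as a percentage of max."""
    values = list(values)
    return 100.0 * (max(values) - min(values)) / max(values)


spreads = {
    model: relative_spread(rates.values())
    for model, rates in gpu_by_batch.items()
}

for model, spread in spreads.items():
    print(f"{model}: throughput varies by {spread:.2f}% across batch sizes")
```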
Scikit-Learn 1.2.2 - Benchmark: SGDOneClassSVM [Seconds, Fewer Is Better]: 233.41 (SE +/- 0.33, N = 3)
Scikit-Learn 1.2.2 - Benchmark: TSNE MNIST Dataset [Seconds, Fewer Is Better]: 247.56 (SE +/- 0.66, N = 3)
OpenVINO 2024.0 - Model: Noise Suppression Poconet-Like FP16 - Device: CPU [ms, Fewer Is Better]: 11.53 (SE +/- 0.18, N = 15; MIN: 5.76 / MAX: 42.76)
OpenVINO 2024.0 - Model: Noise Suppression Poconet-Like FP16 - Device: CPU [FPS, More Is Better]: 2052.02 (SE +/- 35.59, N = 15)
OpenVINO 2024.0 - Model: Person Detection FP16 - Device: CPU [ms, Fewer Is Better]: 93.29 (SE +/- 2.03, N = 15; MIN: 31.12 / MAX: 185.66)
OpenVINO 2024.0 - Model: Person Detection FP16 - Device: CPU [FPS, More Is Better]: 129.45 (SE +/- 3.20, N = 15)
OpenVINO 2024.0 - Model: Person Detection FP32 - Device: CPU [ms, Fewer Is Better]: 100.46 (SE +/- 0.82, N = 15; MIN: 32.5 / MAX: 161.81)
OpenVINO 2024.0 - Model: Person Detection FP32 - Device: CPU [FPS, More Is Better]: 119.41 (SE +/- 1.07, N = 15)
TensorFlow Lite 2022-05-18 - Model: Inception V4 [Microseconds, Fewer Is Better]: 20372.6 (SE +/- 520.83, N = 15)
TensorFlow Lite 2022-05-18 - Model: NASNet Mobile [Microseconds, Fewer Is Better]: 33662.5 (SE +/- 419.25, N = 15)
TensorFlow Lite 2022-05-18 - Model: SqueezeNet [Microseconds, Fewer Is Better]: 1836.28 (SE +/- 17.45, N = 15)
Build options, Scikit-Learn 1.2.2: (F9X) gfortran options: -O0
Build options, OpenVINO 2024.0: (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO 2024.0 - Model: Road Segmentation ADAS FP16 - Device: CPU [ms, Fewer Is Better]: 25.83 (SE +/- 0.43, N = 15; MIN: 10.2 / MAX: 57.07)
OpenVINO 2024.0 - Model: Road Segmentation ADAS FP16 - Device: CPU [FPS, More Is Better]: 465.86 (SE +/- 9.05, N = 15)
OpenVINO 2024.0 - Model: Vehicle Detection FP16 - Device: CPU [ms, Fewer Is Better]: 11.12 (SE +/- 0.14, N = 15; MIN: 4.52 / MAX: 34.07)
OpenVINO 2024.0 - Model: Vehicle Detection FP16 - Device: CPU [FPS, More Is Better]: 1077.87 (SE +/- 15.55, N = 15)
Build options, OpenVINO 2024.0: (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
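Each OpenVINO model above is reported twice, once as latency (ms) and once as throughput (FPS). Little's law (requests in flight = throughput x latency) gives a rough estimate of how many inference requests were being processed concurrently. This is a sketch under the assumption that the latency figure is the mean per-request latency and that both figures come from the same run; the export does not state the number of parallel requests.

```python
# (model, mean latency in ms, throughput in FPS) copied from the results above
openvino_cpu = [
    ("Noise Suppression Poconet-Like FP16", 11.53, 2052.02),
    ("Person Detection FP16", 93.29, 129.45),
    ("Person Detection FP32", 100.46, 119.41),
    ("Road Segmentation ADAS FP16", 25.83, 465.86),
    ("Vehicle Detection FP16", 11.12, 1077.87),
]


def in_flight(latency_ms, fps):
    """Little's law: average number of requests in the system."""
    return fps * latency_ms / 1000.0


for model, latency_ms, fps in openvino_cpu:
    print(f"{model}: ~{in_flight(latency_ms, fps):.1f} requests in flight")
```

With these numbers, four of the five models come out near 12 and the noise-suppression model near 24, which would fit a fixed number of parallel streams on a 24-core CPU; that reading is an inference from the arithmetic, not something the export confirms.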
ONNX Runtime 1.19 - Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallel [Inference Time Cost (ms), Fewer Is Better]: 4.92768 (SE +/- 0.04054, N = 15)
ONNX Runtime 1.19 - Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallel [Inferences Per Second, More Is Better]: 203.06 (SE +/- 1.63, N = 15)
Scikit-Learn 1.2.2 - Benchmark: Isolation Forest [Seconds, Fewer Is Better]: 176.29 (SE +/- 0.54, N = 3)
TensorFlow 2.16.1 - Device: GPU - Batch Size: 64 - Model: GoogLeNet [images/sec, More Is Better]: 28.53 (SE +/- 0.02, N = 3)
ONNX Runtime 1.19 - Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel [Inference Time Cost (ms), Fewer Is Better]: 834.57 (SE +/- 16.79, N = 12)
ONNX Runtime 1.19 - Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel [Inferences Per Second, More Is Better]: 1.20339 (SE +/- 0.02346, N = 12)
Build options, ONNX Runtime 1.19: (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
Build options, Scikit-Learn 1.2.2: (F9X) gfortran options: -O0
TensorFlow Device: CPU - Batch Size: 64 - Model: VGG-16 OpenBenchmarking.org images/sec, More Is Better TensorFlow 2.16.1 Device: CPU - Batch Size: 64 - Model: VGG-16 phoronix-ml.txt 7 14 21 28 35 SE +/- 0.02, N = 3 29.05
OpenVINO Model: Machine Translation EN To DE FP16 - Device: CPU OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2024.0 Model: Machine Translation EN To DE FP16 - Device: CPU phoronix-ml.txt 14 28 42 56 70 SE +/- 1.08, N = 12 63.99 MIN: 29.61 / MAX: 110.09 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO Model: Machine Translation EN To DE FP16 - Device: CPU OpenBenchmarking.org FPS, More Is Better OpenVINO 2024.0 Model: Machine Translation EN To DE FP16 - Device: CPU phoronix-ml.txt 40 80 120 160 200 SE +/- 3.37, N = 12 187.88 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
Scikit-Learn Benchmark: GLM OpenBenchmarking.org Seconds, Fewer Is Better Scikit-Learn 1.2.2 Benchmark: GLM phoronix-ml.txt 40 80 120 160 200 SE +/- 0.81, N = 3 168.33 1. (F9X) gfortran options: -O0
Scikit-Learn Benchmark: Hist Gradient Boosting OpenBenchmarking.org Seconds, Fewer Is Better Scikit-Learn 1.2.2 Benchmark: Hist Gradient Boosting phoronix-ml.txt 40 80 120 160 200 SE +/- 0.90, N = 3 166.78 1. (F9X) gfortran options: -O0
Whisper.cpp Model: ggml-small.en - Input: 2016 State of the Union OpenBenchmarking.org Seconds, Fewer Is Better Whisper.cpp 1.6.2 Model: ggml-small.en - Input: 2016 State of the Union phoronix-ml.txt 50 100 150 200 250 SE +/- 0.41, N = 3 218.19 1. (CXX) g++ options: -O3 -std=c++11 -fPIC -pthread -msse3 -mssse3 -mavx -mf16c -mfma -mavx2 -mavx512f -mavx512cd -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512vnni
TensorFlow Device: GPU - Batch Size: 16 - Model: ResNet-50 OpenBenchmarking.org images/sec, More Is Better TensorFlow 2.16.1 Device: GPU - Batch Size: 16 - Model: ResNet-50 phoronix-ml.txt 2 4 6 8 10 SE +/- 0.01, N = 3 8.94
PyTorch Device: CPU - Batch Size: 32 - Model: Efficientnet_v2_l OpenBenchmarking.org batches/sec, More Is Better PyTorch 2.2.1 Device: CPU - Batch Size: 32 - Model: Efficientnet_v2_l phoronix-ml.txt 3 6 9 12 15 SE +/- 0.07, N = 3 9.85 MIN: 8.27 / MAX: 10.3
PyTorch Device: CPU - Batch Size: 16 - Model: Efficientnet_v2_l OpenBenchmarking.org batches/sec, More Is Better PyTorch 2.2.1 Device: CPU - Batch Size: 16 - Model: Efficientnet_v2_l phoronix-ml.txt 3 6 9 12 15 SE +/- 0.09, N = 3 9.90 MIN: 8.07 / MAX: 10.35
PyTorch Device: CPU - Batch Size: 64 - Model: Efficientnet_v2_l OpenBenchmarking.org batches/sec, More Is Better PyTorch 2.2.1 Device: CPU - Batch Size: 64 - Model: Efficientnet_v2_l phoronix-ml.txt 3 6 9 12 15 SE +/- 0.05, N = 3 9.98 MIN: 7.98 / MAX: 10.29
Mobile Neural Network Model: inception-v3 OpenBenchmarking.org ms, Fewer Is Better Mobile Neural Network 2.9.b11b7037d Model: inception-v3 phoronix-ml.txt 8 16 24 32 40 SE +/- 0.04, N = 3 36.45 MIN: 36.16 / MAX: 50.97 1. (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -pthread -ldl
Mobile Neural Network Model: mobilenet-v1-1.0 OpenBenchmarking.org ms, Fewer Is Better Mobile Neural Network 2.9.b11b7037d Model: mobilenet-v1-1.0 phoronix-ml.txt 0.8514 1.7028 2.5542 3.4056 4.257 SE +/- 0.007, N = 3 3.784 MIN: 3.71 / MAX: 6.58 1. (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -pthread -ldl
Mobile Neural Network Model: MobileNetV2_224 OpenBenchmarking.org ms, Fewer Is Better Mobile Neural Network 2.9.b11b7037d Model: MobileNetV2_224 phoronix-ml.txt 0.7353 1.4706 2.2059 2.9412 3.6765 SE +/- 0.042, N = 3 3.268 MIN: 3.15 / MAX: 5.05 1. (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -pthread -ldl
Mobile Neural Network Model: SqueezeNetV1.0 OpenBenchmarking.org ms, Fewer Is Better Mobile Neural Network 2.9.b11b7037d Model: SqueezeNetV1.0 phoronix-ml.txt 2 4 6 8 10 SE +/- 0.200, N = 3 6.429 MIN: 5.97 / MAX: 7.14 1. (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -pthread -ldl
Mobile Neural Network Model: resnet-v2-50 OpenBenchmarking.org ms, Fewer Is Better Mobile Neural Network 2.9.b11b7037d Model: resnet-v2-50 phoronix-ml.txt 5 10 15 20 25 SE +/- 0.11, N = 3 18.53 MIN: 18.26 / MAX: 28.9 1. (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -pthread -ldl
Mobile Neural Network Model: squeezenetv1.1 OpenBenchmarking.org ms, Fewer Is Better Mobile Neural Network 2.9.b11b7037d Model: squeezenetv1.1 phoronix-ml.txt 0.9736 1.9472 2.9208 3.8944 4.868 SE +/- 0.117, N = 3 4.327 MIN: 3.96 / MAX: 6.65 1. (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -pthread -ldl
Mobile Neural Network Model: mobilenetV3 OpenBenchmarking.org ms, Fewer Is Better Mobile Neural Network 2.9.b11b7037d Model: mobilenetV3 phoronix-ml.txt 0.5706 1.1412 1.7118 2.2824 2.853 SE +/- 0.008, N = 3 2.536 MIN: 2.4 / MAX: 3.47 1. (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -pthread -ldl
Mobile Neural Network Model: nasnet OpenBenchmarking.org ms, Fewer Is Better Mobile Neural Network 2.9.b11b7037d Model: nasnet phoronix-ml.txt 4 8 12 16 20 SE +/- 0.02, N = 3 15.30 MIN: 14.65 / MAX: 21.59 1. (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -pthread -ldl
PyTorch Device: CPU - Batch Size: 256 - Model: Efficientnet_v2_l OpenBenchmarking.org batches/sec, More Is Better PyTorch 2.2.1 Device: CPU - Batch Size: 256 - Model: Efficientnet_v2_l phoronix-ml.txt 3 6 9 12 15 SE +/- 0.05, N = 3 10.11 MIN: 8.31 / MAX: 10.38
Scikit-Learn Benchmark: Plot Hierarchical OpenBenchmarking.org Seconds, Fewer Is Better Scikit-Learn 1.2.2 Benchmark: Plot Hierarchical phoronix-ml.txt 30 60 90 120 150 SE +/- 0.41, N = 3 141.46 1. (F9X) gfortran options: -O0
XNNPACK Model: QS8MobileNetV2 OpenBenchmarking.org us, Fewer Is Better XNNPACK b7b048 Model: QS8MobileNetV2 phoronix-ml.txt 300 600 900 1200 1500 SE +/- 7.84, N = 3 1398 1. (CXX) g++ options: -O3 -lrt -lm
XNNPACK Model: FP16MobileNetV3Small OpenBenchmarking.org us, Fewer Is Better XNNPACK b7b048 Model: FP16MobileNetV3Small phoronix-ml.txt 300 600 900 1200 1500 SE +/- 5.55, N = 3 1464 1. (CXX) g++ options: -O3 -lrt -lm
XNNPACK Model: FP16MobileNetV3Large OpenBenchmarking.org us, Fewer Is Better XNNPACK b7b048 Model: FP16MobileNetV3Large phoronix-ml.txt 500 1000 1500 2000 2500 SE +/- 6.66, N = 3 2128 1. (CXX) g++ options: -O3 -lrt -lm
XNNPACK Model: FP16MobileNetV2 OpenBenchmarking.org us, Fewer Is Better XNNPACK b7b048 Model: FP16MobileNetV2 phoronix-ml.txt 300 600 900 1200 1500 SE +/- 15.14, N = 3 1495 1. (CXX) g++ options: -O3 -lrt -lm
XNNPACK Model: FP16MobileNetV1 OpenBenchmarking.org us, Fewer Is Better XNNPACK b7b048 Model: FP16MobileNetV1 phoronix-ml.txt 200 400 600 800 1000 SE +/- 6.56, N = 3 1144 1. (CXX) g++ options: -O3 -lrt -lm
XNNPACK Model: FP32MobileNetV3Small OpenBenchmarking.org us, Fewer Is Better XNNPACK b7b048 Model: FP32MobileNetV3Small phoronix-ml.txt 300 600 900 1200 1500 SE +/- 3.61, N = 3 1503 1. (CXX) g++ options: -O3 -lrt -lm
XNNPACK Model: FP32MobileNetV3Large OpenBenchmarking.org us, Fewer Is Better XNNPACK b7b048 Model: FP32MobileNetV3Large phoronix-ml.txt 500 1000 1500 2000 2500 SE +/- 12.67, N = 3 2465 1. (CXX) g++ options: -O3 -lrt -lm
XNNPACK Model: FP32MobileNetV2 OpenBenchmarking.org us, Fewer Is Better XNNPACK b7b048 Model: FP32MobileNetV2 phoronix-ml.txt 400 800 1200 1600 2000 SE +/- 14.40, N = 3 1873 1. (CXX) g++ options: -O3 -lrt -lm
XNNPACK Model: FP32MobileNetV1 OpenBenchmarking.org us, Fewer Is Better XNNPACK b7b048 Model: FP32MobileNetV1 phoronix-ml.txt 300 600 900 1200 1500 SE +/- 2.52, N = 3 1233 1. (CXX) g++ options: -O3 -lrt -lm
SHOC Scalable HeterOgeneous Computing Target: OpenCL - Benchmark: S3D OpenBenchmarking.org GFLOPS, More Is Better SHOC Scalable HeterOgeneous Computing 2020-04-17 Target: OpenCL - Benchmark: S3D phoronix-ml.txt 70 140 210 280 350 SE +/- 3.81, N = 15 298.49 1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi
OpenCV Test: DNN - Deep Neural Network OpenBenchmarking.org ms, Fewer Is Better OpenCV 4.7 Test: DNN - Deep Neural Network phoronix-ml.txt 7K 14K 21K 28K 35K SE +/- 1066.17, N = 15 33080 1. (CXX) g++ options: -fsigned-char -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -msse -msse2 -msse3 -fvisibility=hidden -O3 -ldl -lm -lpthread -lrt
Scikit-Learn Benchmark: Hist Gradient Boosting Categorical Only OpenBenchmarking.org Seconds, Fewer Is Better Scikit-Learn 1.2.2 Benchmark: Hist Gradient Boosting Categorical Only phoronix-ml.txt 7 14 21 28 35 SE +/- 0.30, N = 15 30.19 1. (F9X) gfortran options: -O0
Scikit-Learn Benchmark: Plot Neighbors OpenBenchmarking.org Seconds, Fewer Is Better Scikit-Learn 1.2.2 Benchmark: Plot Neighbors phoronix-ml.txt 30 60 90 120 150 SE +/- 0.47, N = 3 114.84 1. (F9X) gfortran options: -O0
TensorFlow Device: GPU - Batch Size: 64 - Model: AlexNet OpenBenchmarking.org images/sec, More Is Better TensorFlow 2.16.1 Device: GPU - Batch Size: 64 - Model: AlexNet phoronix-ml.txt 11 22 33 44 55 SE +/- 0.04, N = 3 47.85
Scikit-Learn Benchmark: Sparsify OpenBenchmarking.org Seconds, Fewer Is Better Scikit-Learn 1.2.2 Benchmark: Sparsify phoronix-ml.txt 20 40 60 80 100 SE +/- 0.30, N = 3 108.46 1. (F9X) gfortran options: -O0
Scikit-Learn Benchmark: Plot Polynomial Kernel Approximation OpenBenchmarking.org Seconds, Fewer Is Better Scikit-Learn 1.2.2 Benchmark: Plot Polynomial Kernel Approximation phoronix-ml.txt 20 40 60 80 100 SE +/- 0.04, N = 3 104.70 1. (F9X) gfortran options: -O0
Scikit-Learn Benchmark: Feature Expansions OpenBenchmarking.org Seconds, Fewer Is Better Scikit-Learn 1.2.2 Benchmark: Feature Expansions phoronix-ml.txt 20 40 60 80 100 SE +/- 0.56, N = 3 100.35 1. (F9X) gfortran options: -O0
TensorFlow Device: GPU - Batch Size: 32 - Model: GoogLeNet OpenBenchmarking.org images/sec, More Is Better TensorFlow 2.16.1 Device: GPU - Batch Size: 32 - Model: GoogLeNet phoronix-ml.txt 7 14 21 28 35 SE +/- 0.04, N = 3 27.91
TensorFlow Device: CPU - Batch Size: 256 - Model: GoogLeNet OpenBenchmarking.org images/sec, More Is Better TensorFlow 2.16.1 Device: CPU - Batch Size: 256 - Model: GoogLeNet phoronix-ml.txt 50 100 150 200 250 SE +/- 0.25, N = 3 227.06
Scikit-Learn Benchmark: Plot Ward OpenBenchmarking.org Seconds, Fewer Is Better Scikit-Learn 1.2.2 Benchmark: Plot Ward phoronix-ml.txt 10 20 30 40 50 SE +/- 0.35, N = 8 42.10 1. (F9X) gfortran options: -O0
TensorFlow Device: CPU - Batch Size: 32 - Model: VGG-16 OpenBenchmarking.org images/sec, More Is Better TensorFlow 2.16.1 Device: CPU - Batch Size: 32 - Model: VGG-16 phoronix-ml.txt 7 14 21 28 35 SE +/- 0.05, N = 3 28.32
Scikit-Learn Benchmark: Sample Without Replacement OpenBenchmarking.org Seconds, Fewer Is Better Scikit-Learn 1.2.2 Benchmark: Sample Without Replacement phoronix-ml.txt 20 40 60 80 100 SE +/- 0.76, N = 3 90.64 1. (F9X) gfortran options: -O0
PyTorch Device: CPU - Batch Size: 64 - Model: ResNet-152 OpenBenchmarking.org batches/sec, More Is Better PyTorch 2.2.1 Device: CPU - Batch Size: 64 - Model: ResNet-152 phoronix-ml.txt 4 8 12 16 20 SE +/- 0.08, N = 3 17.64 MIN: 14.41 / MAX: 18.1
PyTorch Device: CPU - Batch Size: 256 - Model: ResNet-152 OpenBenchmarking.org batches/sec, More Is Better PyTorch 2.2.1 Device: CPU - Batch Size: 256 - Model: ResNet-152 phoronix-ml.txt 4 8 12 16 20 SE +/- 0.06, N = 3 17.92 MIN: 14.68 / MAX: 18.36
PyTorch Device: CPU - Batch Size: 512 - Model: ResNet-152 OpenBenchmarking.org batches/sec, More Is Better PyTorch 2.2.1 Device: CPU - Batch Size: 512 - Model: ResNet-152 phoronix-ml.txt 4 8 12 16 20 SE +/- 0.07, N = 3 17.99 MIN: 14.67 / MAX: 18.38
PyTorch Device: CPU - Batch Size: 32 - Model: ResNet-152 OpenBenchmarking.org batches/sec, More Is Better PyTorch 2.2.1 Device: CPU - Batch Size: 32 - Model: ResNet-152 phoronix-ml.txt 4 8 12 16 20 SE +/- 0.03, N = 3 17.78 MIN: 15.02 / MAX: 18.13
PyTorch Device: CPU - Batch Size: 16 - Model: ResNet-152 OpenBenchmarking.org batches/sec, More Is Better PyTorch 2.2.1 Device: CPU - Batch Size: 16 - Model: ResNet-152 phoronix-ml.txt 4 8 12 16 20 SE +/- 0.07, N = 3 17.97 MIN: 14.56 / MAX: 18.28
Numpy Benchmark OpenBenchmarking.org Score, More Is Better Numpy Benchmark phoronix-ml.txt 150 300 450 600 750 SE +/- 5.45, N = 3 715.50
TensorFlow Device: CPU - Batch Size: 64 - Model: ResNet-50 OpenBenchmarking.org images/sec, More Is Better TensorFlow 2.16.1 Device: CPU - Batch Size: 64 - Model: ResNet-50 phoronix-ml.txt 16 32 48 64 80 SE +/- 0.33, N = 3 70.89
Whisper.cpp Model: ggml-base.en - Input: 2016 State of the Union OpenBenchmarking.org Seconds, Fewer Is Better Whisper.cpp 1.6.2 Model: ggml-base.en - Input: 2016 State of the Union phoronix-ml.txt 20 40 60 80 100 SE +/- 0.44, N = 3 92.75 1. (CXX) g++ options: -O3 -std=c++11 -fPIC -pthread -msse3 -mssse3 -mavx -mf16c -mfma -mavx2 -mavx512f -mavx512cd -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512vnni
Scikit-Learn Benchmark: Tree OpenBenchmarking.org Seconds, Fewer Is Better Scikit-Learn 1.2.2 Benchmark: Tree phoronix-ml.txt 11 22 33 44 55 SE +/- 0.48, N = 5 46.97 1. (F9X) gfortran options: -O0
TensorFlow Device: CPU - Batch Size: 512 - Model: AlexNet OpenBenchmarking.org images/sec, More Is Better TensorFlow 2.16.1 Device: CPU - Batch Size: 512 - Model: AlexNet phoronix-ml.txt 140 280 420 560 700 SE +/- 1.19, N = 3 643.44
NCNN Target: Vulkan GPU - Model: FastestDet OpenBenchmarking.org ms, Fewer Is Better NCNN 20230517 Target: Vulkan GPU - Model: FastestDet phoronix-ml.txt 3 6 9 12 15 SE +/- 0.39, N = 3 9.82 MIN: 8.74 / MAX: 19.33 1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN Target: Vulkan GPU - Model: vision_transformer OpenBenchmarking.org ms, Fewer Is Better NCNN 20230517 Target: Vulkan GPU - Model: vision_transformer phoronix-ml.txt 9 18 27 36 45 SE +/- 0.27, N = 3 41.05 MIN: 38.99 / MAX: 101.63 1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN Target: Vulkan GPU - Model: regnety_400m OpenBenchmarking.org ms, Fewer Is Better NCNN 20230517 Target: Vulkan GPU - Model: regnety_400m phoronix-ml.txt 5 10 15 20 25 SE +/- 0.09, N = 3 18.63 MIN: 18.07 / MAX: 108.4 1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN Target: Vulkan GPU - Model: squeezenet_ssd OpenBenchmarking.org ms, Fewer Is Better NCNN 20230517 Target: Vulkan GPU - Model: squeezenet_ssd phoronix-ml.txt 4 8 12 16 20 SE +/- 1.72, N = 3 16.04 MIN: 13.26 / MAX: 580.79 1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN Target: Vulkan GPU - Model: yolov4-tiny OpenBenchmarking.org ms, Fewer Is Better NCNN 20230517 Target: Vulkan GPU - Model: yolov4-tiny phoronix-ml.txt 6 12 18 24 30 SE +/- 0.41, N = 3 23.64 MIN: 22.17 / MAX: 39.94 1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN Target: Vulkan GPU - Model: mobilenetv2-yolov3 OpenBenchmarking.org ms, Fewer Is Better NCNN 20230517 Target: Vulkan GPU - Model: mobilenetv2-yolov3 phoronix-ml.txt 4 8 12 16 20 SE +/- 0.10, N = 3 13.79 MIN: 13.15 / MAX: 23.26 1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN Target: Vulkan GPU - Model: resnet50 OpenBenchmarking.org ms, Fewer Is Better NCNN 20230517 Target: Vulkan GPU - Model: resnet50 phoronix-ml.txt 4 8 12 16 20 SE +/- 1.16, N = 3 14.65 MIN: 12.41 / MAX: 377.47 1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN Target: Vulkan GPU - Model: alexnet OpenBenchmarking.org ms, Fewer Is Better NCNN 20230517 Target: Vulkan GPU - Model: alexnet phoronix-ml.txt 1.188 2.376 3.564 4.752 5.94 SE +/- 0.02, N = 3 5.28 MIN: 5.1 / MAX: 15.43 1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN Target: Vulkan GPU - Model: resnet18 OpenBenchmarking.org ms, Fewer Is Better NCNN 20230517 Target: Vulkan GPU - Model: resnet18 phoronix-ml.txt 2 4 6 8 10 SE +/- 0.08, N = 3 7.86 MIN: 7.54 / MAX: 15.18 1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN Target: Vulkan GPU - Model: vgg16 OpenBenchmarking.org ms, Fewer Is Better NCNN 20230517 Target: Vulkan GPU - Model: vgg16 phoronix-ml.txt 6 12 18 24 30 SE +/- 0.31, N = 3 25.13 MIN: 23.1 / MAX: 139.05 1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN Target: Vulkan GPU - Model: googlenet OpenBenchmarking.org ms, Fewer Is Better NCNN 20230517 Target: Vulkan GPU - Model: googlenet phoronix-ml.txt 4 8 12 16 20 SE +/- 0.04, N = 3 16.01 MIN: 15.51 / MAX: 26.58 1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN Target: Vulkan GPU - Model: blazeface OpenBenchmarking.org ms, Fewer Is Better NCNN 20230517 Target: Vulkan GPU - Model: blazeface phoronix-ml.txt 0.7065 1.413 2.1195 2.826 3.5325 SE +/- 0.00, N = 3 3.14 MIN: 3.01 / MAX: 8.5 1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN Target: Vulkan GPU - Model: efficientnet-b0 OpenBenchmarking.org ms, Fewer Is Better NCNN 20230517 Target: Vulkan GPU - Model: efficientnet-b0 phoronix-ml.txt 2 4 6 8 10 SE +/- 0.37, N = 3 8.68 MIN: 7.76 / MAX: 202.36 1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN Target: Vulkan GPU - Model: mnasnet OpenBenchmarking.org ms, Fewer Is Better NCNN 20230517 Target: Vulkan GPU - Model: mnasnet phoronix-ml.txt 1.3478 2.6956 4.0434 5.3912 6.739 SE +/- 0.01, N = 3 5.99 MIN: 5.68 / MAX: 14.25 1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN Target: Vulkan GPU - Model: shufflenet-v2 OpenBenchmarking.org ms, Fewer Is Better NCNN 20230517 Target: Vulkan GPU - Model: shufflenet-v2 phoronix-ml.txt 2 4 6 8 10 SE +/- 0.05, N = 3 8.06 MIN: 7.83 / MAX: 15.08 1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN Target: Vulkan GPU - Model: mobilenet-v3 OpenBenchmarking.org ms, Fewer Is Better NCNN 20230517 Target: Vulkan GPU - Model: mobilenet-v3 phoronix-ml.txt 2 4 6 8 10 SE +/- 0.03, N = 3 6.49 MIN: 6.21 / MAX: 15.75 1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN Target: Vulkan GPU - Model: mobilenet-v2 OpenBenchmarking.org ms, Fewer Is Better NCNN 20230517 Target: Vulkan GPU - Model: mobilenet-v2 phoronix-ml.txt 2 4 6 8 10 SE +/- 0.02, N = 3 6.31 MIN: 5.98 / MAX: 14.56 1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN Target: Vulkan GPU - Model: mobilenet OpenBenchmarking.org ms, Fewer Is Better NCNN 20230517 Target: Vulkan GPU - Model: mobilenet phoronix-ml.txt 4 8 12 16 20 SE +/- 0.10, N = 3 13.79 MIN: 13.15 / MAX: 23.26 1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
Scikit-Learn Benchmark: Hist Gradient Boosting Threading OpenBenchmarking.org Seconds, Fewer Is Better Scikit-Learn 1.2.2 Benchmark: Hist Gradient Boosting Threading phoronix-ml.txt 12 24 36 48 60 SE +/- 0.61, N = 4 52.73 1. (F9X) gfortran options: -O0
Scikit-Learn Benchmark: SGD Regression OpenBenchmarking.org Seconds, Fewer Is Better Scikit-Learn 1.2.2 Benchmark: SGD Regression phoronix-ml.txt 14 28 42 56 70 SE +/- 0.08, N = 3 64.37 1. (F9X) gfortran options: -O0
Scikit-Learn Benchmark: Kernel PCA Solvers / Time vs. N Samples OpenBenchmarking.org Seconds, Fewer Is Better Scikit-Learn 1.2.2 Benchmark: Kernel PCA Solvers / Time vs. N Samples phoronix-ml.txt 14 28 42 56 70 SE +/- 0.24, N = 3 61.61 1. (F9X) gfortran options: -O0
PyTorch Device: CPU - Batch Size: 1 - Model: Efficientnet_v2_l OpenBenchmarking.org batches/sec, More Is Better PyTorch 2.2.1 Device: CPU - Batch Size: 1 - Model: Efficientnet_v2_l phoronix-ml.txt 4 8 12 16 20 SE +/- 0.19, N = 3 14.18 MIN: 12.11 / MAX: 14.81
ONNX Runtime Model: ZFNet-512 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: ZFNet-512 - Device: CPU - Executor: Parallel phoronix-ml.txt 4 8 12 16 20 SE +/- 0.20, N = 4 17.29 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: ZFNet-512 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: ZFNet-512 - Device: CPU - Executor: Parallel phoronix-ml.txt 13 26 39 52 65 SE +/- 0.67, N = 4 57.86 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: ZFNet-512 - Device: CPU - Executor: Standard OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: ZFNet-512 - Device: CPU - Executor: Standard phoronix-ml.txt 3 6 9 12 15 SE +/- 0.10702, N = 4 9.10376 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: ZFNet-512 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: ZFNet-512 - Device: CPU - Executor: Standard phoronix-ml.txt 20 40 60 80 100 SE +/- 1.32, N = 4 109.86 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: T5 Encoder - Device: CPU - Executor: Standard OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: T5 Encoder - Device: CPU - Executor: Standard phoronix-ml.txt 1.3084 2.6168 3.9252 5.2336 6.542 SE +/- 0.07434, N = 4 5.81489 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: T5 Encoder - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: T5 Encoder - Device: CPU - Executor: Standard phoronix-ml.txt 40 80 120 160 200 SE +/- 2.15, N = 4 172.04 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
TensorFlow Device: GPU - Batch Size: 32 - Model: AlexNet OpenBenchmarking.org images/sec, More Is Better TensorFlow 2.16.1 Device: GPU - Batch Size: 32 - Model: AlexNet phoronix-ml.txt 10 20 30 40 50 SE +/- 0.04, N = 3 46.14
oneDNN Harness: Recurrent Neural Network Training - Engine: CPU OpenBenchmarking.org ms, Fewer Is Better oneDNN 3.6 Harness: Recurrent Neural Network Training - Engine: CPU phoronix-ml.txt 300 600 900 1200 1500 SE +/- 9.63, N = 3 1261.40 MIN: 1196.78 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl
oneDNN Harness: IP Shapes 1D - Engine: CPU OpenBenchmarking.org ms, Fewer Is Better oneDNN 3.6 Harness: IP Shapes 1D - Engine: CPU phoronix-ml.txt 0.2557 0.5114 0.7671 1.0228 1.2785 SE +/- 0.00979, N = 15 1.13657 MIN: 1.01 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl
SHOC Scalable HeterOgeneous Computing Target: OpenCL - Benchmark: Max SP Flops OpenBenchmarking.org GFLOPS, More Is Better SHOC Scalable HeterOgeneous Computing 2020-04-17 Target: OpenCL - Benchmark: Max SP Flops phoronix-ml.txt 20K 40K 60K 80K 100K SE +/- 230.38, N = 3 93757.3 1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi
oneDNN Harness: Recurrent Neural Network Inference - Engine: CPU OpenBenchmarking.org ms, Fewer Is Better oneDNN 3.6 Harness: Recurrent Neural Network Inference - Engine: CPU phoronix-ml.txt 160 320 480 640 800 SE +/- 8.66, N = 3 736.40 MIN: 639.28 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl
Scikit-Learn Benchmark: MNIST Dataset OpenBenchmarking.org Seconds, Fewer Is Better Scikit-Learn 1.2.2 Benchmark: MNIST Dataset phoronix-ml.txt 12 24 36 48 60 SE +/- 0.42, N = 3 52.74 1. (F9X) gfortran options: -O0
TensorFlow Device: GPU - Batch Size: 16 - Model: GoogLeNet OpenBenchmarking.org images/sec, More Is Better TensorFlow 2.16.1 Device: GPU - Batch Size: 16 - Model: GoogLeNet phoronix-ml.txt 6 12 18 24 30 SE +/- 0.01, N = 3 26.91
TensorFlow Device: CPU - Batch Size: 16 - Model: VGG-16 OpenBenchmarking.org images/sec, More Is Better TensorFlow 2.16.1 Device: CPU - Batch Size: 16 - Model: VGG-16 phoronix-ml.txt 6 12 18 24 30 SE +/- 0.05, N = 3 27.34
Scikit-Learn Benchmark: Plot Incremental PCA OpenBenchmarking.org Seconds, Fewer Is Better Scikit-Learn 1.2.2 Benchmark: Plot Incremental PCA phoronix-ml.txt 7 14 21 28 35 SE +/- 0.08, N = 3 31.21 1. (F9X) gfortran options: -O0
Scikit-Learn Benchmark: Text Vectorizers OpenBenchmarking.org Seconds, Fewer Is Better Scikit-Learn 1.2.2 Benchmark: Text Vectorizers phoronix-ml.txt 10 20 30 40 50 SE +/- 0.18, N = 3 45.34 1. (F9X) gfortran options: -O0
OpenVINO Model: Face Detection FP16 - Device: CPU OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2024.0 Model: Face Detection FP16 - Device: CPU phoronix-ml.txt 130 260 390 520 650 SE +/- 2.89, N = 3 607.48 MIN: 575.29 / MAX: 657.06 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO Model: Face Detection FP16 - Device: CPU OpenBenchmarking.org FPS, More Is Better OpenVINO 2024.0 Model: Face Detection FP16 - Device: CPU phoronix-ml.txt 5 10 15 20 25 SE +/- 0.09, N = 3 19.69 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO Model: Face Detection FP16-INT8 - Device: CPU OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2024.0 Model: Face Detection FP16-INT8 - Device: CPU phoronix-ml.txt 70 140 210 280 350 SE +/- 0.15, N = 3 320.42 MIN: 299.06 / MAX: 380.18 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO Model: Face Detection FP16-INT8 - Device: CPU OpenBenchmarking.org FPS, More Is Better OpenVINO 2024.0 Model: Face Detection FP16-INT8 - Device: CPU phoronix-ml.txt 9 18 27 36 45 SE +/- 0.02, N = 3 37.36 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
ONNX Runtime Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Parallel phoronix-ml.txt 170 340 510 680 850 SE +/- 2.03, N = 3 770.91 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Parallel phoronix-ml.txt 0.2919 0.5838 0.8757 1.1676 1.4595 SE +/- 0.00341, N = 3 1.29718 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Standard OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Standard phoronix-ml.txt 100 200 300 400 500 SE +/- 1.06, N = 3 452.31 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Standard phoronix-ml.txt 0.4974 0.9948 1.4922 1.9896 2.487 SE +/- 0.00518, N = 3 2.21086 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: fcn-resnet101-11 - Device: CPU - Executor: Standard OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: fcn-resnet101-11 - Device: CPU - Executor: Standard phoronix-ml.txt 50 100 150 200 250 SE +/- 0.28, N = 3 242.91 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: fcn-resnet101-11 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: fcn-resnet101-11 - Device: CPU - Executor: Standard phoronix-ml.txt 0.9263 1.8526 2.7789 3.7052 4.6315 SE +/- 0.00482, N = 3 4.11676 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: yolov4 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: yolov4 - Device: CPU - Executor: Parallel phoronix-ml.txt 40 80 120 160 200 SE +/- 2.57, N = 3 185.36 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: yolov4 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: yolov4 - Device: CPU - Executor: Parallel phoronix-ml.txt 1.2143 2.4286 3.6429 4.8572 6.0715 SE +/- 0.07557, N = 3 5.39684 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: T5 Encoder - Device: CPU - Executor: Parallel OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: T5 Encoder - Device: CPU - Executor: Parallel phoronix-ml.txt 0.8264 1.6528 2.4792 3.3056 4.132 SE +/- 0.01909, N = 3 3.67307 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: T5 Encoder - Device: CPU - Executor: Parallel OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: T5 Encoder - Device: CPU - Executor: Parallel phoronix-ml.txt 60 120 180 240 300 SE +/- 1.41, N = 3 272.18 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: yolov4 - Device: CPU - Executor: Standard OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: yolov4 - Device: CPU - Executor: Standard phoronix-ml.txt 20 40 60 80 100 SE +/- 0.38, N = 3 103.58 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: yolov4 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: yolov4 - Device: CPU - Executor: Standard phoronix-ml.txt 3 6 9 12 15 SE +/- 0.03544, N = 3 9.65442 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
OpenVINO 2024.0 - Model: Road Segmentation ADAS FP16-INT8 - Device: CPU - ms, Fewer Is Better: 17.62 (SE +/- 0.06, N = 3; MIN: 9.01 / MAX: 33.69). (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO 2024.0 - Model: Road Segmentation ADAS FP16-INT8 - Device: CPU - FPS, More Is Better: 679.66 (SE +/- 2.47, N = 3). (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO 2024.0 - Model: Person Vehicle Bike Detection FP16 - Device: CPU - ms, Fewer Is Better: 6.18 (SE +/- 0.01, N = 3; MIN: 3.57 / MAX: 20.46). (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO 2024.0 - Model: Person Vehicle Bike Detection FP16 - Device: CPU - FPS, More Is Better: 1930.08 (SE +/- 3.67, N = 3). (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
TensorFlow Lite 2022-05-18 - Model: Inception ResNet V2 - Microseconds, Fewer Is Better: 33356.3 (SE +/- 434.39, N = 3)
TensorFlow Lite 2022-05-18 - Model: Mobilenet Float - Microseconds, Fewer Is Better: 1381.25 (SE +/- 4.96, N = 3)
TensorFlow Lite 2022-05-18 - Model: Mobilenet Quant - Microseconds, Fewer Is Better: 2501.21 (SE +/- 12.96, N = 3)
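Each result is reported as a mean with a standard error (SE) over N runs. Because the absolute SE scales with the magnitude of the result, results in different units are easier to compare for run-to-run noise when the SE is expressed as a percentage of the mean. A minimal sketch using the three TensorFlow Lite results above; the helper name is illustrative.

```python
# (mean in microseconds, standard error), copied from the
# TensorFlow Lite results above.
tflite = {
    "Inception ResNet V2": (33356.3, 434.39),
    "Mobilenet Float": (1381.25, 4.96),
    "Mobilenet Quant": (2501.21, 12.96),
}


def relative_se_percent(mean: float, se: float) -> float:
    """Standard error expressed as a percentage of the mean."""
    return 100.0 * se / mean


for model, (mean, se) in tflite.items():
    print(f"{model}: SE is {relative_se_percent(mean, se):.2f}% of the mean")
```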
OpenVINO 2024.0 - Model: Person Re-Identification Retail FP16 - Device: CPU - ms, Fewer Is Better: 4.72 (SE +/- 0.01, N = 3; MIN: 2.72 / MAX: 17.21). (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO 2024.0 - Model: Person Re-Identification Retail FP16 - Device: CPU - FPS, More Is Better: 2523.59 (SE +/- 3.45, N = 3). (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO 2024.0 - Model: Face Detection Retail FP16-INT8 - Device: CPU - ms, Fewer Is Better: 3.58 (SE +/- 0.00, N = 3; MIN: 1.95 / MAX: 16.91). (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO 2024.0 - Model: Face Detection Retail FP16-INT8 - Device: CPU - FPS, More Is Better: 6458.38 (SE +/- 8.36, N = 3). (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO 2024.0 - Model: Handwritten English Recognition FP16-INT8 - Device: CPU - ms, Fewer Is Better: 21.58 (SE +/- 0.08, N = 3; MIN: 16.4 / MAX: 43.47). (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO 2024.0 - Model: Handwritten English Recognition FP16-INT8 - Device: CPU - FPS, More Is Better: 1108.65 (SE +/- 4.28, N = 3). (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO 2024.0 - Model: Age Gender Recognition Retail 0013 FP16-INT8 - Device: CPU - ms, Fewer Is Better: 0.3 (SE +/- 0.00, N = 3; MIN: 0.17 / MAX: 9.44). (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO 2024.0 - Model: Age Gender Recognition Retail 0013 FP16-INT8 - Device: CPU - FPS, More Is Better: 67537.77 (SE +/- 38.55, N = 3). (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO 2024.0 - Model: Vehicle Detection FP16-INT8 - Device: CPU - ms, Fewer Is Better: 5.51 (SE +/- 0.01, N = 3; MIN: 2.97 / MAX: 19.15). (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO 2024.0 - Model: Vehicle Detection FP16-INT8 - Device: CPU - FPS, More Is Better: 2160.57 (SE +/- 3.34, N = 3). (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO 2024.0 - Model: Handwritten English Recognition FP16 - Device: CPU - ms, Fewer Is Better: 23.12 (SE +/- 0.09, N = 3; MIN: 14.9 / MAX: 38.92). (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO 2024.0 - Model: Handwritten English Recognition FP16 - Device: CPU - FPS, More Is Better: 1035.29 (SE +/- 4.12, N = 3). (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO 2024.0 - Model: Age Gender Recognition Retail 0013 FP16 - Device: CPU - ms, Fewer Is Better: 0.43 (SE +/- 0.00, N = 3; MIN: 0.23 / MAX: 11.96). (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO 2024.0 - Model: Age Gender Recognition Retail 0013 FP16 - Device: CPU - FPS, More Is Better: 48433.92 (SE +/- 17.34, N = 3). (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO 2024.0 - Model: Weld Porosity Detection FP16 - Device: CPU - ms, Fewer Is Better: 12.25 (SE +/- 0.01, N = 3; MIN: 6.32 / MAX: 26.6). (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO 2024.0 - Model: Weld Porosity Detection FP16 - Device: CPU - FPS, More Is Better: 1947.67 (SE +/- 1.95, N = 3). (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO 2024.0 - Model: Weld Porosity Detection FP16-INT8 - Device: CPU - ms, Fewer Is Better: 6.31 (SE +/- 0.01, N = 3; MIN: 3.31 / MAX: 21.37). (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO 2024.0 - Model: Weld Porosity Detection FP16-INT8 - Device: CPU - FPS, More Is Better: 3742.77 (SE +/- 3.08, N = 3). (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO 2024.0 - Model: Face Detection Retail FP16 - Device: CPU - ms, Fewer Is Better: 2.76 (SE +/- 0.01, N = 3; MIN: 1.41 / MAX: 15.61). (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO 2024.0 - Model: Face Detection Retail FP16 - Device: CPU - FPS, More Is Better: 4273.09 (SE +/- 13.60, N = 3). (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
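Unlike a single-stream benchmark, the OpenVINO FPS figures above are much higher than 1000 divided by the reported latency, which suggests several inference requests are processed concurrently. Little's law (average items in a system = throughput x time in system) gives a rough estimate of how many requests were in flight. The sketch below applies it to a few of the latency/FPS pairs above. Reading the result as a request count is an assumption, not something the result file states, and the latencies are rounded to two decimals, so the estimates are approximate.

```python
# (mean latency in ms, throughput in FPS), copied from the
# OpenVINO 2024.0 CPU results above.
openvino = {
    "Road Segmentation ADAS FP16-INT8": (17.62, 679.66),
    "Person Vehicle Bike Detection FP16": (6.18, 1930.08),
    "Face Detection Retail FP16-INT8": (3.58, 6458.38),
    "Weld Porosity Detection FP16": (12.25, 1947.67),
}


def implied_in_flight(latency_ms: float, fps: float) -> float:
    """Little's law: items in flight = throughput (per s) x latency (s)."""
    return fps * latency_ms / 1000.0


for model, (latency_ms, fps) in openvino.items():
    print(f"{model}: about {implied_in_flight(latency_ms, fps):.1f} requests in flight")
```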
ONNX Runtime 1.19 - Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard - Inference Time Cost (ms), Fewer Is Better: 1.56253 (SE +/- 0.01747, N = 3). (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard - Inferences Per Second, More Is Better: 639.91 (SE +/- 7.24, N = 3). (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel - Inference Time Cost (ms), Fewer Is Better: 9.18292 (SE +/- 0.03096, N = 3). (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel - Inferences Per Second, More Is Better: 108.88 (SE +/- 0.36, N = 3). (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard - Inference Time Cost (ms), Fewer Is Better: 3.04551 (SE +/- 0.00627, N = 3). (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard - Inferences Per Second, More Is Better: 328.29 (SE +/- 0.68, N = 3). (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: super-resolution-10 - Device: CPU - Executor: Parallel - Inference Time Cost (ms), Fewer Is Better: 8.08149 (SE +/- 0.06021, N = 3). (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: super-resolution-10 - Device: CPU - Executor: Parallel - Inferences Per Second, More Is Better: 123.74 (SE +/- 0.93, N = 3). (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: super-resolution-10 - Device: CPU - Executor: Standard - Inference Time Cost (ms), Fewer Is Better: 10.28 (SE +/- 0.02, N = 3). (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: super-resolution-10 - Device: CPU - Executor: Standard - Inferences Per Second, More Is Better: 97.32 (SE +/- 0.21, N = 3). (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
TensorFlow 2.16.1 - Device: CPU - Batch Size: 32 - Model: ResNet-50 - images/sec, More Is Better: 67.94 (SE +/- 0.07, N = 3)
Scikit-Learn 1.2.2 - Benchmark: Plot OMP vs. LARS - Seconds, Fewer Is Better: 41.48 (SE +/- 0.09, N = 3). (F9X) gfortran options: -O0
TensorFlow 2.16.1 - Device: GPU - Batch Size: 1 - Model: VGG-16 - images/sec, More Is Better: 2.20 (SE +/- 0.00, N = 3)
PyTorch 2.2.1 - Device: CPU - Batch Size: 1 - Model: ResNet-152 - batches/sec, More Is Better: 23.19 (SE +/- 0.14, N = 3; MIN: 19.02 / MAX: 24.35)
TensorFlow 2.16.1 - Device: GPU - Batch Size: 1 - Model: AlexNet - images/sec, More Is Better: 15.38 (SE +/- 0.14, N = 15)
TensorFlow 2.16.1 - Device: CPU - Batch Size: 256 - Model: AlexNet - images/sec, More Is Better: 627.50 (SE +/- 0.20, N = 3)
oneDNN 3.6 - Harness: IP Shapes 3D - Engine: CPU - ms, Fewer Is Better: 1.39591 (SE +/- 0.01618, N = 15; MIN: 1.16). (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl
PyTorch 2.2.1 - Device: CPU - Batch Size: 16 - Model: ResNet-50 - batches/sec, More Is Better: 45.59 (SE +/- 0.20, N = 3; MIN: 38.82 / MAX: 46.87)
PyTorch 2.2.1 - Device: CPU - Batch Size: 512 - Model: ResNet-50 - batches/sec, More Is Better: 45.75 (SE +/- 0.29, N = 3; MIN: 41.27 / MAX: 46.82)
PyTorch 2.2.1 - Device: CPU - Batch Size: 256 - Model: ResNet-50 - batches/sec, More Is Better: 46.06 (SE +/- 0.23, N = 3; MIN: 38.74 / MAX: 46.92)
PyTorch 2.2.1 - Device: CPU - Batch Size: 32 - Model: ResNet-50 - batches/sec, More Is Better: 46.42 (SE +/- 0.47, N = 3; MIN: 41.66 / MAX: 47.49)
PyTorch 2.2.1 - Device: CPU - Batch Size: 64 - Model: ResNet-50 - batches/sec, More Is Better: 46.69 (SE +/- 0.10, N = 3; MIN: 42.52 / MAX: 47.36)
TensorFlow 2.16.1 - Device: GPU - Batch Size: 16 - Model: AlexNet - images/sec, More Is Better: 42.43 (SE +/- 0.02, N = 3)
Scikit-Learn 1.2.2 - Benchmark: Kernel PCA Solvers / Time vs. N Components - Seconds, Fewer Is Better: 31.04 (SE +/- 0.33, N = 3). (F9X) gfortran options: -O0
DeepSpeech 0.6 - Acceleration: CPU - Seconds, Fewer Is Better: 46.23 (SE +/- 0.15, N = 3)
TensorFlow 2.16.1 - Device: CPU - Batch Size: 64 - Model: GoogLeNet - images/sec, More Is Better: 225.16 (SE +/- 0.29, N = 3)
Scikit-Learn 1.2.2 - Benchmark: LocalOutlierFactor - Seconds, Fewer Is Better: 21.62 (SE +/- 0.13, N = 3). (F9X) gfortran options: -O0
TensorFlow 2.16.1 - Device: CPU - Batch Size: 16 - Model: ResNet-50 - images/sec, More Is Better: 62.29 (SE +/- 0.06, N = 3)
oneDNN 3.6 - Harness: Deconvolution Batch shapes_1d - Engine: CPU - ms, Fewer Is Better: 3.77567 (SE +/- 0.00801, N = 3; MIN: 2.81). (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl
PyTorch 2.2.1 - Device: CPU - Batch Size: 1 - Model: ResNet-50 - batches/sec, More Is Better: 60.17 (SE +/- 0.03, N = 3; MIN: 49.64 / MAX: 62.8)
TensorFlow 2.16.1 - Device: GPU - Batch Size: 1 - Model: ResNet-50 - images/sec, More Is Better: 6.69 (SE +/- 0.03, N = 3)
R Benchmark - Seconds, Fewer Is Better: 0.1252 (SE +/- 0.0007, N = 3)
TensorFlow 2.16.1 - Device: CPU - Batch Size: 32 - Model: GoogLeNet - images/sec, More Is Better: 218.07 (SE +/- 0.23, N = 3)
Scikit-Learn 1.2.2 - Benchmark: 20 Newsgroups / Logistic Regression - Seconds, Fewer Is Better: 10.45 (SE +/- 0.06, N = 3). (F9X) gfortran options: -O0
TensorFlow 2.16.1 - Device: CPU - Batch Size: 64 - Model: AlexNet - images/sec, More Is Better: 516.18 (SE +/- 0.16, N = 3)
TensorFlow 2.16.1 - Device: CPU - Batch Size: 1 - Model: VGG-16 - images/sec, More Is Better: 9.70 (SE +/- 0.00, N = 3)
TensorFlow 2.16.1 - Device: CPU - Batch Size: 16 - Model: GoogLeNet - images/sec, More Is Better: 198.11 (SE +/- 0.22, N = 3)
TensorFlow 2.16.1 - Device: CPU - Batch Size: 32 - Model: AlexNet - images/sec, More Is Better: 409.56 (SE +/- 0.28, N = 3)
TensorFlow 2.16.1 - Device: CPU - Batch Size: 1 - Model: ResNet-50 - images/sec, More Is Better: 18.38 (SE +/- 0.11, N = 3)
oneDNN 3.6 - Harness: Convolution Batch Shapes Auto - Engine: CPU - ms, Fewer Is Better: 2.36317 (SE +/- 0.02737, N = 4; MIN: 1.97). (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl
RNNoise 0.2 - Input: 26 Minute Long Talking Sample - Seconds, Fewer Is Better: 7.852 (SE +/- 0.019, N = 3). (CC) gcc options: -O2 -pedantic -fvisibility=hidden
TensorFlow 2.16.1 - Device: CPU - Batch Size: 16 - Model: AlexNet - images/sec, More Is Better: 288.71 (SE +/- 0.39, N = 3)
TensorFlow 2.16.1 - Device: GPU - Batch Size: 1 - Model: GoogLeNet - images/sec, More Is Better: 21.03 (SE +/- 0.15, N = 3)
SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: Texture Read Bandwidth - GB/s, More Is Better: 1003.32 (SE +/- 5.65, N = 3). (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi
TensorFlow 2.16.1 - Device: CPU - Batch Size: 1 - Model: AlexNet - images/sec, More Is Better: 30.66 (SE +/- 0.00, N = 3)
TensorFlow 2.16.1 - Device: CPU - Batch Size: 1 - Model: GoogLeNet - images/sec, More Is Better: 60.92 (SE +/- 0.30, N = 3)
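The TensorFlow CPU results are listed out of order by batch size, which makes the scaling hard to see. The sketch below collects the CPU images/sec figures that appear in this part of the results for AlexNet, GoogLeNet and ResNet-50, and expresses each as a speedup over batch size 1. Only batch sizes listed above are included, and the helper name is illustrative.

```python
# images/sec keyed by batch size, copied from the TensorFlow 2.16.1
# CPU results above (only the batch sizes listed in this part).
tf_cpu = {
    "AlexNet": {1: 30.66, 16: 288.71, 32: 409.56, 64: 516.18, 256: 627.50},
    "GoogLeNet": {1: 60.92, 16: 198.11, 32: 218.07, 64: 225.16},
    "ResNet-50": {1: 18.38, 16: 62.29, 32: 67.94},
}


def speedup_over_batch_1(by_batch: dict[int, float]) -> dict[int, float]:
    """Throughput at each batch size divided by the batch-size-1 throughput."""
    baseline = by_batch[1]
    return {batch: rate / baseline for batch, rate in sorted(by_batch.items())}


for model, by_batch in tf_cpu.items():
    ratios = speedup_over_batch_1(by_batch)
    summary = ", ".join(f"bs {b}: {r:.2f}x" for b, r in ratios.items())
    print(f"{model}: {summary}")
```

On these figures AlexNet gains far more from larger batches than GoogLeNet or ResNet-50 do.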
oneDNN 3.6 - Harness: Deconvolution Batch shapes_3d - Engine: CPU - ms, Fewer Is Better: 1.85206 (SE +/- 0.01905, N = 4; MIN: 1.73). (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl
SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: Triad - GB/s, More Is Better: 23.05 (SE +/- 0.23, N = 6). (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi
SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: GEMM SGEMM_N - GFLOPS, More Is Better: 8470.02 (SE +/- 23.01, N = 3). (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi
SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: Bus Speed Download - GB/s, More Is Better: 24.99 (SE +/- 0.00, N = 3). (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi
SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: Bus Speed Readback - GB/s, More Is Better: 26.25 (SE +/- 0.00, N = 3). (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi
SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: Reduction - GB/s, More Is Better: 595.05 (SE +/- 0.41, N = 3). (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi
SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: FFT SP - GFLOPS, More Is Better: 2703.37 (SE +/- 2.81, N = 3). (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi
SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: MD5 Hash - GHash/s, More Is Better: 49.64 (SE +/- 0.68, N = 3). (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi
Phoronix Test Suite v10.8.5