Caffe

This is a benchmark of the Caffe deep learning framework and currently supports the AlexNet and Googlenet model and execution on both CPUs and NVIDIA GPUs.

To run this test with the Phoronix Test Suite, the basic command is: phoronix-test-suite benchmark caffe.

Project Site

caffe.berkeleyvision.org

Test Created

14 November 2015

Last Updated

26 September 2020

Test Maintainer

Michael Larabel

Test Type

System

Average Install Time

24 Seconds

Average Run Time

2 Minutes, 1 Second

Test Dependencies

C/C++ Compiler Toolchain + CMake + Python + BLAS (Basic Linear Algebra Sub-Routine) + C++ Boost + Linear Algebra Pack + Snappy Compression + GFlags + OpenCV + HDF5

Accolades

150k+ Downloads

Supported Platforms

* Uploading of benchmark result data to OpenBenchmarking.org is always optional (opt-in) via the Phoronix Test Suite for users wishing to share their results publicly.
** Data based on those opting to upload their test results to OpenBenchmarking.org and users enabling the opt-in anonymous statistics reporting while running benchmarks from an Internet-connected platform.
*** Test profile page view reporting began March 2021.
Data updated weekly as of 19 January 2025.

Revision History

pts/caffe-1.5.0 [View Source] Sat, 26 Sep 2020 21:35:45 GMT
Overhaul Caffe test profile with latest Git snapshot, switch to CMake build system, clean up test options, etc.

pts/caffe-1.4.0 [View Source] Sat, 29 Dec 2018 11:15:41 GMT
Update Caffe to latest Git snapshot to hopefully workaround build problems on newer distros.

pts/caffe-1.3.3 [View Source] Sun, 01 Apr 2018 18:50:19 GMT
Basic fix for OpenCV 3.4.

pts/caffe-1.3.2 [View Source] Wed, 04 Jan 2017 11:07:36 GMT
Fix for OpenCV 3.2.

pts/caffe-1.3.1 [View Source] Wed, 28 Dec 2016 20:36:42 GMT
Don't show title string of "Caffe AlexNet" but "Caffe" with recent test profile versions supporting more than just AlexNet.

pts/caffe-1.3.0 [View Source] Wed, 28 Dec 2016 20:34:27 GMT
Update to latest Git snapshot to fix OpenCV compatibility.

pts/caffe-1.2.0 [View Source] Mon, 15 Aug 2016 16:11:16 GMT
Add Googlenet support, decrease CPU only iteration count.

pts/caffe-1.1.1 [View Source] Sun, 12 Jun 2016 18:32:44 GMT
Add OpenCV and OpenBLAS support.

pts/caffe-1.1.0 [View Source] Sat, 11 Jun 2016 19:32:55 GMT
Update

pts/caffe-1.0.0 [View Source] Sat, 14 Nov 2015 15:29:45 GMT
Initial commit of Caffe deep learning framework and with this benchmark using the AlexNet model for benchmarking.

Suites Using This Test

Machine Learning

HPC - High Performance Computing

NVIDIA GPU Compute

Performance Metrics

Analyze Test Configuration:

Caffe 2020-02-13

Model: AlexNet - Acceleration: NVIDIA CUDA - Iterations: 100

OpenBenchmarking.org metrics for this test profile configuration based on 62 public results since 26 September 2020 with the latest data as of 23 February 2024.

Additional benchmark metrics will come after OpenBenchmarking.org has collected a sufficient data-set.

Based on OpenBenchmarking.org data, the selected test / test configuration (Caffe 2020-02-13 - Model: AlexNet - Acceleration: NVIDIA CUDA - Iterations: 100) has an average run-time of 2 minutes. By default this test profile is set to run at least 3 times but may increase if the standard deviation exceeds pre-defined defaults or other calculations deem additional runs necessary for greater statistical accuracy of the result.

Based on public OpenBenchmarking.org results, the selected test / test configuration has an average standard deviation of 0.2%.

Notable Instruction Set Usage

Notable instruction set extensions supported by this test, based on an automatic analysis by the Phoronix Test Suite / OpenBenchmarking.org analytics engine.

Instruction Set

Support

Instructions Detected

SSE2 (SSE2)

Used by default on supported hardware.

PUNPCKLQDQ MOVDQA MOVDQU CVTSS2SD MOVD ADDSD DIVSD CVTTSD2SI MOVUPD CVTPS2PD CVTPD2PS CVTSD2SS PSHUFD XORPD SHUFPD SUBSD MULSD CVTSI2SD MOVAPD UCOMISD UNPCKLPD CVTDQ2PS COMISD CVTDQ2PD SQRTSD ANDPD ANDNPD CMPNLESD ORPD DIVPD MULPD MINSD MINPD MAXPD MAXSD CMPLTPD ADDPD CMPLTSD MOVHPD SUBPD MOVLPD UNPCKHPD PMULUDQ PSRLDQ

Advanced Vector Extensions (AVX)

Requires passing a supported compiler/build flag (verified with targets: sandybridge, skylake, tigerlake, cascadelake, sapphirerapids, alderlake, znver2, znver3).
Found on Intel processors since Sandy Bridge (2011).
Found on AMD processors since Bulldozer (2011).

VZEROUPPER VINSERTF128 VEXTRACTF128 VPERM2F128 VPERMILPS VPERMILPD VBROADCASTSS VBROADCASTSD VMASKMOVPS

Advanced Vector Extensions 2 (AVX2)

Requires passing a supported compiler/build flag (verified with targets: skylake, tigerlake, cascadelake, sapphirerapids, alderlake, znver2, znver3).
Found on Intel processors since Haswell (2013).
Found on AMD processors since Excavator (2016).

VPERM2I128 VPERMD VPERMPD VPBROADCASTQ VPBROADCASTD VPERMQ VGATHERQPS VEXTRACTI128 VPMASKMOVD VINSERTI128 VPGATHERDD VPBROADCASTW

FMA (FMA)

VFMADD132SS VFMADD132SD VFMSUB213PS VFMSUB132SS VFMSUB213PD VFMSUB132SD VFNMADD213SD VFNMADD213SS VFMADD231SS VFNMADD231SS VFMADD213SS VFNMADD132SS VFMADD231SD VFNMADD132SD VFMADD213SD VFMADD132PS VFMADD132PD VFNMADD132PD VFNMADD213PD VFNMADD132PS VFNMADD213PS VFMSUB231SD VFNMADD231SD VFMADD231PD

Advanced Vector Extensions 512 (AVX512)

Requires passing a supported compiler/build flag (verified with targets: cascadelake, sapphirerapids).

(ZMM REGISTER USE)

The test / benchmark does honor compiler flag changes.

Last automated analysis: 17 January 2022

This test profile binary relies on the shared libraries libcaffe.so.1.0.0, libglog.so.0, libgflags.so.2.2, libprotobuf.so.23, libc.so.6, libm.so.6, liblmdb.so.0, libopenblas.so.0, libunwind.so.8, libpthread.so.0, libz.so.1, libcrypto.so.3, libcurl.so.4, libsz.so.2, libgfortran.so.5, liblzma.so.5, libnghttp2.so.14, libidn2.so.0, librtmp.so.1, libssh.so.4, libpsl.so.5, libssl.so.3, libldap-2.5.so.0, liblber-2.5.so.0, libzstd.so.1, libbrotlidec.so.1, libaec.so.0, libquadmath.so.0, libunistring.so.2, libgnutls.so.30, libhogweed.so.6, libnettle.so.8, libgmp.so.10, libkrb5.so.3, libk5crypto.so.3, libkrb5support.so.0, libsasl2.so.2, libbrotlicommon.so.1, libp11-kit.so.0, libtasn1.so.6, libkeyutils.so.1, libresolv.so.2, libffi.so.8.

Tested CPU Architectures

This benchmark has been successfully tested on the below mentioned architectures. The CPU architectures listed is where successful OpenBenchmarking.org result uploads occurred, namely for helping to determine if a given test is compatible with various alternative CPU architectures.

CPU Architecture

Kernel Identifier

Verified On

Intel / AMD x86 64-bit

x86_64

(Many Processors)

Recent Test Results

Compare

RTX 4070 SUPER 6 Systems - 199 Benchmark Results	Intel Core Ultra 9 285K - MSI MEG Z890 UNIFY-X - Intel Device ae7f Ubuntu 24.10 - 6.12.1-061201-generic - GNOME Shell 47.0
RTX 4070 SUPER 5 Systems - 211 Benchmark Results	Intel Core Ultra 9 285K - MSI MEG Z890 UNIFY-X - Intel Device ae7f Ubuntu 24.10 - 6.12.1-061201-generic - GNOME Shell 47.0
test_002 1 System - 113 Benchmark Results	AMD Ryzen 9 7900X3D 12-Core - ASUS ProArt B650-CREATOR - AMD Device 14d8 Ubuntu 24.04 - 6.8.0-47-generic - GNOME Shell 46.0
bbbbbb 1 System - 116 Benchmark Results	AMD Ryzen 7 3700X 8-Core - ASUS TUF GAMING X570-PLUS - AMD Starship RockyLinux 9.4 - 6.1.102-1.el9.elrepo.x86_64 - KDE Plasma 5.27.11
gooxi-ai 2 Systems - 91 Benchmark Results	Intel Xeon Gold 6226R - QEMU Standard PC - Intel 440FX 82441FX PMC Ubuntu 24.04 - 6.8.0-39-generic - X Server
pts-result-gpu-30-05-2024.list Featured Compiler Comparison	GCC 4.8.5 20150623 + CUDA 12.3 - GCC 8.3.0 + CUDA 12.3