Caffe

This is a benchmark of the Caffe deep learning framework. It currently supports the AlexNet and GoogleNet models, with execution on both CPUs and NVIDIA GPUs.

To run this test with the Phoronix Test Suite, the basic command is: phoronix-test-suite benchmark caffe.

Test Created

14 November 2015

Last Updated

26 September 2020

Test Maintainer

Michael Larabel 

Test Type

System

Average Install Time

24 Seconds

Average Run Time

2 Minutes, 1 Second

Test Dependencies

C/C++ Compiler Toolchain + CMake + Python + BLAS (Basic Linear Algebra Subprograms) + C++ Boost + LAPACK (Linear Algebra Package) + Snappy Compression + GFlags + OpenCV + HDF5

Accolades

150k+ Downloads

[Chart: Caffe AlexNet Popularity Statistics (pts/caffe), November 2015 to January 2024: Public Result Uploads*, Reported Installs**, Reported Test Completions**, Test Profile Page Views***]
* Uploading of benchmark result data to OpenBenchmarking.org is always optional (opt-in) via the Phoronix Test Suite for users wishing to share their results publicly.
** Data based on those opting to upload their test results to OpenBenchmarking.org and users enabling the opt-in anonymous statistics reporting while running benchmarks from an Internet-connected platform.
*** Test profile page view reporting began March 2021.
Data updated weekly as of 27 March 2024.
Model option popularity: AlexNet 51.5%, GoogleNet 48.5%.
Acceleration option popularity: CPU 87.3%, NVIDIA CUDA 12.7%.
Iterations option popularity: 200 (45.4%), 100 (43.1%), 1000 (11.5%).

Revision History

pts/caffe-1.5.0   Sat, 26 Sep 2020 21:35:45 GMT
Overhaul the Caffe test profile with the latest Git snapshot, switch to the CMake build system, clean up test options, etc.

pts/caffe-1.4.0   Sat, 29 Dec 2018 11:15:41 GMT
Update Caffe to the latest Git snapshot to hopefully work around build problems on newer distros.

pts/caffe-1.3.3   Sun, 01 Apr 2018 18:50:19 GMT
Basic fix for OpenCV 3.4.

pts/caffe-1.3.2   Wed, 04 Jan 2017 11:07:36 GMT
Fix for OpenCV 3.2.

pts/caffe-1.3.1   Wed, 28 Dec 2016 20:36:42 GMT
Show the title string "Caffe" rather than "Caffe AlexNet", since recent test profile versions support more than just AlexNet.

pts/caffe-1.3.0   Wed, 28 Dec 2016 20:34:27 GMT
Update to the latest Git snapshot to fix OpenCV compatibility.

pts/caffe-1.2.0   Mon, 15 Aug 2016 16:11:16 GMT
Add GoogleNet support; decrease the CPU-only iteration count.

pts/caffe-1.1.1   Sun, 12 Jun 2016 18:32:44 GMT
Add OpenCV and OpenBLAS support.

pts/caffe-1.1.0   Sat, 11 Jun 2016 19:32:55 GMT
Update

pts/caffe-1.0.0   Sat, 14 Nov 2015 15:29:45 GMT
Initial commit of the Caffe deep learning framework benchmark, using the AlexNet model.

Suites Using This Test

Machine Learning

HPC - High Performance Computing

NVIDIA GPU Compute


Performance Metrics

Analyzed test configuration:

Caffe 2020-02-13

Model: GoogleNet - Acceleration: CPU - Iterations: 200

OpenBenchmarking.org metrics for this test profile configuration based on 582 public results since 26 September 2020 with the latest data as of 16 January 2024.

Below is an overview of the generalized performance for components where there is sufficient statistically significant data based upon user-uploaded results. Keep in mind that, particularly in the Linux/open-source space, OS configurations can vary vastly; this overview is intended only as general guidance on performance expectations.

Percentile Rank   # Compatible Public Results   Milli-Seconds (Average)
97th              3                             159788 +/- 158
96th              3                             161086 +/- 443
95th              5                             163199 +/- 1087
95th              6                             163962 +/- 6183
94th              3                             165855 +/- 198
94th              3                             171773 +/- 1233
93rd              3                             174912 +/- 2501
91st              3                             181570 +/- 315
91st              9                             181996 +/- 2369
87th              8                             189382 +/- 4836
85th              16                            194832 +/- 7751
84th              3                             197890 +/- 608
84th              24                            198758 +/- 15961
82nd              6                             207320 +/- 777
81st              3                             210984 +/- 1322
78th              5                             219292 +/- 288
76th              3                             229563 +/- 506
Mid-Tier (75th percentile): > 229935
75th              6                             231392 +/- 300
74th              3                             232960 +/- 105
73rd              4                             235882 +/- 111
71st              3                             243080 +/- 876
71st              3                             243969 +/- 179
70th              6                             244991 +/- 12497
70th              3                             246544 +/- 1380
69th              10                            249764 +/- 6911
69th              4                             251655 +/- 7329
68th              4                             255692 +/- 419
67th              3                             257068 +/- 5470
64th              8                             260921 +/- 5831
64th              6                             261239 +/- 4280
63rd              4                             262272 +/- 1715
62nd              6                             264203 +/- 1714
60th              3                             269026 +/- 661
60th              3                             269149 +/- 116
59th              3                             269681 +/- 426
58th              3                             273643 +/- 497
56th              3                             285728 +/- 546
55th              3                             290616 +/- 1545
54th              3                             291823 +/- 266
53rd              4                             294777 +/- 9737
52nd              6                             295788 +/- 1476
52nd              3                             298529 +/- 2820
51st              3                             301802 +/- 142
Median (50th percentile): 301954
49th              3                             306952 +/- 1815
46th              3                             322939 +/- 535
43rd              4                             325003 +/- 137
43rd              3                             325846 +/- 11747
42nd              3                             329996 +/- 966
41st              3                             331143 +/- 366
41st              8                             331590 +/- 11623
41st              3                             332482 +/- 2140
41st              3                             332794 +/- 822
40th              8                             333401 +/- 11662
38th              12                            338520 +/- 1773
36th              6                             343450 +/- 2003
33rd              3                             347609 +/- 1606
33rd              8                             348329 +/- 42465
32nd              7                             355804 +/- 24362
31st              11                            358600 +/- 14280
31st              3                             359523 +/- 2017
31st              3                             361633 +/- 18141
29th              6                             366768 +/- 7614
28th              8                             368574 +/- 375
Low-Tier (25th percentile): > 370762
22nd              9                             374297 +/- 4388
22nd              8                             375642 +/- 9548
21st              6                             377014 +/- 1483
21st              8                             377556 +/- 9227
21st              8                             378717 +/- 10184
19th              8                             385203 +/- 36992
17th              3                             390644 +/- 420
16th              6                             397570 +/- 10930
15th              5                             399865 +/- 27933
15th              7                             407487 +/- 21163
13th              8                             415478 +/- 10342
8th               8                             439886 +/- 1732
7th               3                             474160 +/- 9586
4th               4                             517379 +/- 1769
3rd               3                             556082 +/- 16214
2nd               3                             628287 +/- 26813
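In the table above, faster (lower) times sit at higher percentiles. The following is a minimal sketch of one way to compute such a rank for a fewer-is-better metric. It assumes the rank is the share of public results that are strictly slower; this page does not state how OpenBenchmarking.org handles ties or interpolation, and the function name is hypothetical.

```python
from bisect import bisect_right


def percentile_rank_fewer_is_better(value_ms, all_results_ms):
    """Percentage of results strictly slower (larger) than value_ms.

    Assumption: tied results count as "not slower"; OpenBenchmarking.org's
    own tie handling is not documented on this page.
    """
    ordered = sorted(all_results_ms)
    # bisect_right gives the index of the first result strictly greater than value_ms.
    slower = len(ordered) - bisect_right(ordered, value_ms)
    return 100.0 * slower / len(ordered)
```

For example, `percentile_rank_fewer_is_better(200, [100, 200, 300, 400])` returns `50.0`, because two of the four results are slower than 200 ms.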
[Chart: Distribution Of Public Results - Model: GoogleNet - Acceleration: CPU - Iterations: 200. 582 results range from 120428 to 3048080 milli-seconds.]

Based on OpenBenchmarking.org data, the selected test configuration (Caffe 2020-02-13 - Model: GoogleNet - Acceleration: CPU - Iterations: 200) has an average run-time of 16 minutes. By default this test profile runs at least 3 times, but the run count may increase if the standard deviation exceeds pre-defined thresholds or other calculations deem additional runs necessary for greater statistical accuracy of the result.
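The dynamic run-count behavior described above can be sketched as follows: run a minimum number of times, then keep adding runs while the relative standard deviation stays above a threshold. This is a minimal illustration, not the Phoronix Test Suite's implementation. The `max_runs` cap and the `threshold_pct` value are assumptions chosen for illustration, and `run_once` stands in for a single benchmark run.

```python
import statistics


def relative_std_dev(samples):
    """Sample standard deviation expressed as a percentage of the mean."""
    return statistics.stdev(samples) / statistics.mean(samples) * 100.0


def run_until_stable(run_once, min_runs=3, max_runs=15, threshold_pct=3.5):
    """Run at least min_runs times, then add runs while the relative
    standard deviation exceeds threshold_pct, stopping at max_runs.

    The max_runs and threshold_pct defaults are illustrative assumptions,
    not values confirmed for the Phoronix Test Suite.
    """
    samples = [run_once() for _ in range(min_runs)]
    while len(samples) < max_runs and relative_std_dev(samples) > threshold_pct:
        samples.append(run_once())
    return samples
```

For example, three runs of 301954, 302100 and 301800 ms have a relative standard deviation of about 0.05%, so this sketch would not add any extra runs.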

[Chart: Time Required To Complete Benchmark (minutes) - Model: GoogleNet - Acceleration: CPU - Iterations: 200. Min: 6 / Avg: 15.57 / Max: 46]

Based on public OpenBenchmarking.org results, the selected test / test configuration has an average standard deviation of 0.1%.

[Chart: Average Deviation Between Runs (percent, fewer is better) - Model: GoogleNet - Acceleration: CPU - Iterations: 200. Min: 0 / Avg: 0.14 / Max: 4]

Does It Scale Well With Increasing Cores?

No. Based on automated analysis of the collected public benchmark data, this test configuration does not generally scale well with increasing CPU core counts. The analysis uses publicly available results for this test configuration: results are separated by vendor, divided by the reference CPU clock speed, grouped by matching physical CPU core count, and normalized against the smallest core count tested from each vendor, using only CPUs with a sufficient number of test samples and statistically significant data.
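The normalization steps described above can be sketched roughly as follows. The sketch assumes results arrive as hypothetical `(vendor, physical_cores, clock_ghz, time_ms)` tuples. Because this test reports milli-seconds (fewer is better), the sketch also assumes that time is first inverted into a throughput (1 / time), so that a value above 1.0 means better performance than the vendor's smallest core count. OpenBenchmarking.org's exact handling, including its sample-size and significance filtering, is not described on this page and is omitted here.

```python
from collections import defaultdict
from statistics import mean


def core_scaling(results):
    """results: iterable of (vendor, physical_cores, clock_ghz, time_ms) tuples.

    Returns {vendor: {cores: relative_scaling}}, where 1.0 is the vendor's
    smallest tested core count. Inverting time to throughput is an assumption.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for vendor, cores, clock_ghz, time_ms in results:
        # Per-clock throughput: faster runs raise the score, higher clocks lower it.
        grouped[vendor][cores].append((1.0 / time_ms) / clock_ghz)

    scaling = {}
    for vendor, by_cores in grouped.items():
        # Baseline is the mean per-clock throughput at the smallest core count.
        base = mean(by_cores[min(by_cores)])
        scaling[vendor] = {c: mean(v) / base for c, v in sorted(by_cores.items())}
    return scaling
```

For a workload that scales perfectly, doubling the core count at the same clock would roughly double the relative value. A result near 1.0 at higher core counts matches the "does not scale well" finding above.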

[Chart: Caffe AlexNet CPU Core Scaling (Intel, AMD) - Relative Core Scaling To Base - Model: GoogleNet - Acceleration: CPU - Iterations: 200; physical core counts from 4 to 64]

Notable Instruction Set Usage

Notable instruction set extensions supported by this test, based on an automatic analysis by the Phoronix Test Suite / OpenBenchmarking.org analytics engine.

SSE2 (SSE2)
Support: Used by default on supported hardware.
Instructions Detected: PUNPCKLQDQ MOVDQA MOVDQU CVTSS2SD MOVD ADDSD DIVSD CVTTSD2SI MOVUPD CVTPS2PD CVTPD2PS CVTSD2SS PSHUFD XORPD SHUFPD SUBSD MULSD CVTSI2SD MOVAPD UCOMISD UNPCKLPD CVTDQ2PS COMISD CVTDQ2PD SQRTSD ANDPD ANDNPD CMPNLESD ORPD DIVPD MULPD MINSD MINPD MAXPD MAXSD CMPLTPD ADDPD CMPLTSD MOVHPD SUBPD MOVLPD UNPCKHPD PMULUDQ PSRLDQ

Advanced Vector Extensions (AVX)
Support: Requires passing a supported compiler/build flag (verified with targets: sandybridge, skylake, tigerlake, cascadelake, sapphirerapids, alderlake, znver2, znver3). Found on Intel processors since Sandy Bridge (2011). Found on AMD processors since Bulldozer (2011).
Instructions Detected: VZEROUPPER VINSERTF128 VEXTRACTF128 VPERM2F128 VPERMILPS VPERMILPD VBROADCASTSS VBROADCASTSD VMASKMOVPS

Advanced Vector Extensions 2 (AVX2)
Support: Requires passing a supported compiler/build flag (verified with targets: skylake, tigerlake, cascadelake, sapphirerapids, alderlake, znver2, znver3). Found on Intel processors since Haswell (2013). Found on AMD processors since Excavator (2016).
Instructions Detected: VPERM2I128 VPERMD VPERMPD VPBROADCASTQ VPBROADCASTD VPERMQ VGATHERQPS VEXTRACTI128 VPMASKMOVD VINSERTI128 VPGATHERDD VPBROADCASTW

FMA (FMA)
Support: Requires passing a supported compiler/build flag (verified with targets: skylake, tigerlake, cascadelake, sapphirerapids, alderlake, znver2, znver3). Found on Intel processors since Haswell (2013). Found on AMD processors since Piledriver (2012).
Instructions Detected: VFMADD132SS VFMADD132SD VFMSUB213PS VFMSUB132SS VFMSUB213PD VFMSUB132SD VFNMADD213SD VFNMADD213SS VFMADD231SS VFNMADD231SS VFMADD213SS VFNMADD132SS VFMADD231SD VFNMADD132SD VFMADD213SD VFMADD132PS VFMADD132PD VFNMADD132PD VFNMADD213PD VFNMADD132PS VFNMADD213PS VFMSUB231SD VFNMADD231SD VFMADD231PD

Advanced Vector Extensions 512 (AVX512)
Support: Requires passing a supported compiler/build flag (verified with targets: cascadelake, sapphirerapids).
Instructions Detected: (ZMM register use)
The test / benchmark does honor compiler flag changes.
Last automated analysis: 17 January 2022
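To check which of the extensions listed above a given Linux x86_64 machine reports, you can inspect the `flags` line in `/proc/cpuinfo`. The following is a minimal sketch that assumes Linux's standard flag names (`sse2`, `avx`, `avx2`, `fma`, `avx512f`). It treats AVX-512 Foundation (`avx512f`) as a simplified stand-in for "AVX512", and the helper names are hypothetical. Hardware support alone does not mean the binary uses these instructions: as noted above, the AVX, AVX2, FMA and AVX512 paths require a supported compiler/build flag.

```python
# Extension names as used above, mapped to Linux /proc/cpuinfo flag names.
EXTENSION_FLAGS = {
    "SSE2": "sse2",
    "AVX": "avx",
    "AVX2": "avx2",
    "FMA": "fma",
    "AVX512": "avx512f",  # AVX-512 Foundation only; a simplification
}


def cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def supported_extensions(cpuinfo_text):
    """Map each extension name to whether the CPU reports its flag."""
    flags = cpu_flags(cpuinfo_text)
    return {name: flag in flags for name, flag in EXTENSION_FLAGS.items()}
```

On a Linux system, `supported_extensions(open("/proc/cpuinfo").read())` returns a dictionary such as `{"SSE2": True, "AVX": True, ...}` for the local CPU.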

This test profile binary relies on the shared libraries libcaffe.so.1.0.0, libglog.so.0, libgflags.so.2.2, libprotobuf.so.23, libc.so.6, libm.so.6, liblmdb.so.0, libopenblas.so.0, libunwind.so.8, libpthread.so.0, libz.so.1, libcrypto.so.3, libcurl.so.4, libsz.so.2, libgfortran.so.5, liblzma.so.5, libnghttp2.so.14, libidn2.so.0, librtmp.so.1, libssh.so.4, libpsl.so.5, libssl.so.3, libldap-2.5.so.0, liblber-2.5.so.0, libzstd.so.1, libbrotlidec.so.1, libaec.so.0, libquadmath.so.0, libunistring.so.2, libgnutls.so.30, libhogweed.so.6, libnettle.so.8, libgmp.so.10, libkrb5.so.3, libk5crypto.so.3, libkrb5support.so.0, libsasl2.so.2, libbrotlicommon.so.1, libp11-kit.so.0, libtasn1.so.6, libkeyutils.so.1, libresolv.so.2, libffi.so.8.
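If the benchmark fails to start, a missing shared library from the list above is a common cause. On glibc-based systems, `ldd` prints each dependency as `name => path (address)`, or `name => not found` when the dynamic loader cannot resolve it. The following minimal sketch filters that output. The binary path is not given on this page, so `path` is a placeholder, and the helper names are hypothetical.

```python
import subprocess


def missing_libraries(ldd_output):
    """Return library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        name, sep, target = line.strip().partition(" => ")
        # Lines without " => " (e.g. linux-vdso.so.1) are skipped.
        if sep and target.strip() == "not found":
            missing.append(name)
    return missing


def check_binary(path):
    """Run ldd on a binary (placeholder path) and list unresolved libraries."""
    out = subprocess.run(["ldd", path], capture_output=True, text=True).stdout
    return missing_libraries(out)
```

An empty list from `check_binary` means every dependency, including `libcaffe.so.1.0.0` and `libopenblas.so.0`, was resolved by the loader.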

Tested CPU Architectures

This benchmark has been successfully tested on the CPU architectures listed below. These are the architectures where successful OpenBenchmarking.org result uploads occurred, which helps determine whether a given test is compatible with various alternative CPU architectures.

CPU Architecture             Kernel Identifier   Verified On
Intel / AMD x86 64-bit       x86_64              (Many Processors)
IBM POWER (PowerPC) 64-bit   ppc64le             POWER9 44-Core
ARMv8 64-bit                 aarch64             ARMv8 Cortex-A72 6-Core, ARMv8 Neoverse-V1

Recent Test Results

1 System - 341 Benchmark Results

AMD Ryzen 9 7950X 16-Core - ASUS ProArt X670E-CREATOR WIFI - AMD Device 14d8

Pop 22.04 - 6.6.10-76060610-generic - GNOME Shell 42.5

1 System - 304 Benchmark Results

AMD A4-5300 APU - ASRock FM2A88M-HD+ R3.0 - AMD 15h

Ubuntu 20.04 - 5.15.0-89-generic - GNOME Shell 3.36.9

1 System - 295 Benchmark Results

AMD Ryzen 9 7950X3D 16-Core - ASUS PRIME X670E-PRO WIFI - AMD Device 14d8

Ubuntu 22.04 - 6.2.0-39-generic - GNOME Shell 42.9

1 System - 301 Benchmark Results

AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C+6G - ASRock A320M-HDV R4.0 - AMD 15h

Ubuntu 20.04 - 5.15.0-89-generic - GNOME Shell 3.36.9

1 System - 307 Benchmark Results

Intel Core i3-10100 - ASRock H510M-HVS - Intel Device 43ef

Ubuntu 20.04 - 5.15.0-88-generic - GNOME Shell 3.36.9

1 System - 264 Benchmark Results

Intel Pentium Gold G6405 - ASRock H510M-HDV/M.2 SE - Intel Comet Lake PCH

Ubuntu 20.04 - 5.15.0-88-generic - GNOME Shell 3.36.9

1 System - 12 Benchmark Results

Intel Core i9-10850K - ASUS ROG MAXIMUS XII APEX - Intel Comet Lake PCH

Ubuntu 20.04 - 5.15.0-88-generic - GNOME Shell 3.36.9

1 System - 264 Benchmark Results

Intel Pentium Gold G6400 - ASRock H510M-HDV/M.2 SE - Intel Comet Lake PCH

Ubuntu 20.04 - 5.15.0-86-generic - GNOME Shell 3.36.9

1 System - 279 Benchmark Results

Intel Core i3-10105 - ASRock H510M-HDV/M.2 SE - Intel Comet Lake PCH

Ubuntu 20.04 - 5.15.0-83-generic - GNOME Shell 3.36.9

1 System - 168 Benchmark Results

2 x AMD EPYC 9354 32-Core - Supermicro H13DSG-O-CPU v1.10 - AMD Device 14a4

Ubuntu 22.04 - 6.2.0-34-generic - GNOME Shell 42.9
