Caffe

This is a benchmark of the Caffe deep learning framework. It currently supports the AlexNet and GoogleNet models and execution on both CPUs and NVIDIA GPUs.

To run this test with the Phoronix Test Suite, the basic command is: phoronix-test-suite benchmark caffe.
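For scripted runs, the same command can be launched from Python. This is a minimal sketch rather than part of the test profile: it assumes `phoronix-test-suite` is installed and on the PATH, and the helper names `build_command` and `run_benchmark` are hypothetical.

```python
import shutil
import subprocess


def build_command(test="caffe"):
    """Return the argument list for the basic benchmark command."""
    return ["phoronix-test-suite", "benchmark", test]


def run_benchmark(test="caffe"):
    """Run the benchmark and return the process exit code.

    Returns None when phoronix-test-suite is not found on PATH.
    """
    cmd = build_command(test)
    if shutil.which(cmd[0]) is None:
        return None
    # stdin/stdout stay attached to the terminal because the suite may
    # prompt for the test options (model, acceleration, iterations).
    return subprocess.run(cmd).returncode


if __name__ == "__main__":
    # Only print the command here; call run_benchmark() to start a run.
    print(" ".join(build_command()))
```

Keeping command construction separate from execution makes it possible to log or inspect the command without starting a multi-minute run.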

Test Created

14 November 2015

Last Updated

26 September 2020

Test Maintainer

Michael Larabel 

Test Type

System

Average Install Time

27 Seconds

Average Run Time

2 Minutes, 1 Second

Test Dependencies

C/C++ Compiler Toolchain + CMake + Python + BLAS (Basic Linear Algebra Sub-Routine) + C++ Boost + Linear Algebra Pack + Snappy Compression + GFlags + OpenCV + HDF5

Accolades

100k+ Downloads

Supported Platforms


[Chart: Caffe AlexNet Popularity Statistics (pts/caffe), events per period from 2015.11 to 2021.12. Series: Public Result Uploads *, Reported Installs **, Reported Test Completions **, Test Profile Page Views ***. Source: OpenBenchmarking.org]
* Uploading of benchmark result data to OpenBenchmarking.org is always optional (opt-in) via the Phoronix Test Suite for users wishing to share their results publicly.
** Data based on those opting to upload their test results to OpenBenchmarking.org and users enabling the opt-in anonymous statistics reporting while running benchmarks from an Internet-connected platform.
*** Test profile page view reporting began March 2021.
Data current as of 7 December 2021.
Model Option Popularity: AlexNet 51.8%, GoogleNet 48.2%
Acceleration Option Popularity: CPU 84.0%, NVIDIA CUDA 16.0%
Iterations Option Popularity: 200 iterations 46.1%, 100 iterations 42.7%, 1000 iterations 11.2%

Revision History

pts/caffe-1.5.0   Sat, 26 Sep 2020 21:35:45 GMT
Overhaul Caffe test profile with latest Git snapshot, switch to CMake build system, clean up test options, etc.

pts/caffe-1.4.0   Sat, 29 Dec 2018 11:15:41 GMT
Update Caffe to latest Git snapshot to hopefully work around build problems on newer distros.

pts/caffe-1.3.3   Sun, 01 Apr 2018 18:50:19 GMT
Basic fix for OpenCV 3.4.

pts/caffe-1.3.2   Wed, 04 Jan 2017 11:07:36 GMT
Fix for OpenCV 3.2.

pts/caffe-1.3.1   Wed, 28 Dec 2016 20:36:42 GMT
Show the title string "Caffe" instead of "Caffe AlexNet", since recent test profile versions support more than just AlexNet.

pts/caffe-1.3.0   Wed, 28 Dec 2016 20:34:27 GMT
Update to latest Git snapshot to fix OpenCV compatibility.

pts/caffe-1.2.0   Mon, 15 Aug 2016 16:11:16 GMT
Add GoogleNet support, decrease CPU-only iteration count.

pts/caffe-1.1.1   Sun, 12 Jun 2016 18:32:44 GMT
Add OpenCV and OpenBLAS support.

pts/caffe-1.1.0   Sat, 11 Jun 2016 19:32:55 GMT
Update

pts/caffe-1.0.0   Sat, 14 Nov 2015 15:29:45 GMT
Initial commit of the Caffe deep learning framework test profile, with this benchmark using the AlexNet model.

Suites Using This Test

Machine Learning

HPC - High Performance Computing

NVIDIA GPU Compute


Performance Metrics

Analyze Test Configuration:

Caffe 2020-02-13

Model: AlexNet - Acceleration: CPU - Iterations: 100

OpenBenchmarking.org metrics for this test profile configuration based on 510 public results since 26 September 2020 with the latest data as of 29 September 2021.

Below is an overview of the generalized performance for components where there is sufficient statistically significant data based upon user-uploaded results. Keep in mind that, particularly in the Linux/open-source space, OS configurations can differ vastly, so this overview is intended to offer only general guidance on performance expectations.

Component   Percentile Rank   # Compatible Public Results   Milliseconds (Average)
            99th              3                             31313 +/- 140
            97th              3                             32010 +/- 67
            97th              9                             32838 +/- 3009
            96th              11                            34498 +/- 2825
            94th              3                             34822 +/- 9
            93rd              10                            35102 +/- 789
            89th              3                             37173 +/- 22
            89th              16                            37289 +/- 2684
            88th              4                             37366 +/- 34
            86th              3                             37955 +/- 132
            86th              3                             38315 +/- 72
            86th              13                            38567 +/- 2550
            83rd              7                             39617 +/- 443
            81st              3                             40100 +/- 99
            80th              7                             40411 +/- 5457
            80th              3                             40696 +/- 46
            79th              7                             41490 +/- 228
            77th              3                             42896 +/- 16
            77th              4                             43703 +/- 1085
            77th              3                             44032 +/- 123
Mid-Tier    75th                                            > 44676
            74th              3                             46908 +/- 38
            73rd              3                             48030 +/- 65
            72nd              6                             49030 +/- 55
            71st              3                             50195 +/- 92
            69th              3                             51637 +/- 187
            69th              3                             51775 +/- 77
            69th              6                             51822 +/- 2121
            65th              4                             53293 +/- 1907
            64th              3                             53894 +/- 254
            64th              5                             53938 +/- 2643
            63rd              3                             54246 +/- 97
            63rd              3                             54528 +/- 194
            62nd              3                             54762 +/- 50
            61st              3                             55743 +/- 343
            59th              3                             57031 +/- 72
            57th              3                             58792 +/- 45
            56th              3                             59244 +/- 55
            55th              4                             59847 +/- 31
            54th              3                             60055 +/- 341
            53rd              3                             60383 +/- 85
            52nd              3                             60847 +/- 97
            51st              3                             60956 +/- 44
Median      50th                                            61003
            50th              3                             61442 +/- 484
            47th              3                             62756 +/- 661
            46th              7                             63592 +/- 3987
            45th              4                             64167 +/- 112
            44th              4                             64360 +/- 4536
            44th              6                             64670 +/- 1103
            43rd              3                             65462 +/- 825
            43rd              3                             65525 +/- 14
            41st              3                             66966 +/- 92
            40th              3                             67388 +/- 194
            39th              3                             68781 +/- 164
            35th              12                            69338 +/- 471
            32nd              9                             70420 +/- 1606
            32nd              4                             70911 +/- 127
            31st              3                             71502 +/- 272
            31st              3                             71788 +/- 91
            30th              3                             73006 +/- 157
Low-Tier    25th                                            > 74949
            25th              8                             75063 +/- 162
            21st              4                             76325 +/- 1670
            19th              3                             79454 +/- 240
            17th              4                             81789 +/- 576
            12th              9                             85526 +/- 1817
            11th              3                             86210 +/- 644
            9th               5                             95251 +/- 13722
            9th               3                             96699 +/- 6657
            8th               3                             103280 +/- 3062
            4th               3                             121367 +/- 4283
            3rd               3                             124930 +/- 1779
            3rd               3                             127752 +/- 382
            2nd               3                             151504 +/- 48
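
To place a local result against the tier boundaries listed above, a small helper can compare an average run time with the 75th percentile, median, and 25th percentile values. This is an illustrative sketch only: the thresholds are copied from the table, lower times are treated as better, and the function names and the "top-tier" label are assumptions, not OpenBenchmarking.org terminology.

```python
# Thresholds in milliseconds, copied from the table above (lower is better).
MID_TIER_MS = 44676   # 75th percentile boundary
MEDIAN_MS = 61003     # 50th percentile
LOW_TIER_MS = 74949   # 25th percentile boundary


def classify(avg_ms):
    """Map an average run time in milliseconds to a coarse tier label."""
    if avg_ms <= MID_TIER_MS:
        return "top-tier"   # hypothetical label for results above the 75th percentile
    if avg_ms <= LOW_TIER_MS:
        return "mid-tier"
    return "low-tier"


def faster_than_median(avg_ms):
    """True when the result beats the median of the public results."""
    return avg_ms < MEDIAN_MS


if __name__ == "__main__":
    for sample in (31313, 61442, 151504):
        print(sample, classify(sample), faster_than_median(sample))
```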
[Chart: Distribution Of Public Results - Model: AlexNet - Acceleration: CPU - Iterations: 100. 510 results range from 29670 to 650940 milliseconds. Source: OpenBenchmarking.org]

Based on OpenBenchmarking.org data, the selected test / test configuration (Caffe 2020-02-13 - Model: AlexNet - Acceleration: CPU - Iterations: 100) has an average run-time of 4 minutes. By default this test profile is set to run at least 3 times but may increase if the standard deviation exceeds pre-defined defaults or other calculations deem additional runs necessary for greater statistical accuracy of the result.
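
The run-count behaviour described above can be sketched as a loop that always performs a minimum number of runs and adds more while the spread between runs stays high. The sketch below is an assumption-laden illustration, not the Phoronix Test Suite's actual logic: the `max_runs` and `threshold_pct` values, and the function names, are hypothetical.

```python
import statistics


def relative_std_dev(samples):
    """Sample standard deviation expressed as a percentage of the mean."""
    return statistics.stdev(samples) / statistics.mean(samples) * 100.0


def run_until_stable(run_once, min_runs=3, max_runs=6, threshold_pct=3.5):
    """Collect timings from run_once() until the results look stable.

    min_runs matches the "at least 3 times" default stated above;
    max_runs and threshold_pct are hypothetical values for illustration.
    """
    samples = [run_once() for _ in range(min_runs)]
    while len(samples) < max_runs and relative_std_dev(samples) > threshold_pct:
        samples.append(run_once())
    return samples


if __name__ == "__main__":
    # Simulated timings in milliseconds stand in for real benchmark runs.
    timings = iter([61000.0, 61200.0, 60900.0, 61100.0, 61050.0, 61010.0])
    print(run_until_stable(lambda: next(timings)))
```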

[Chart: Time Required To Complete Benchmark (minutes) - Model: AlexNet - Acceleration: CPU - Iterations: 100. Run-Time: Min: 2 / Avg: 4.09 / Max: 21. Source: OpenBenchmarking.org]

Based on public OpenBenchmarking.org results, the selected test / test configuration has an average standard deviation of 0.3%.

[Chart: Average Deviation Between Runs (percent, fewer is better) - Model: AlexNet - Acceleration: CPU - Iterations: 100. Deviation: Min: 0 / Avg: 0.3 / Max: 5. Source: OpenBenchmarking.org]

Does It Scale Well With Increasing Cores?

No, based on the automated analysis of the collected public benchmark data, this test / test settings does not generally scale well with increasing CPU core counts. Data based on publicly available results for this test / test settings, separated by vendor, result divided by the reference CPU clock speed, grouped by matching physical CPU core count, and normalized against the smallest core count tested from each vendor for each CPU having a sufficient number of test samples and statistically significant data.
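
The normalization steps described above can be approximated as follows. This is a rough sketch under stated assumptions, not the OpenBenchmarking.org analytics code: because run times are lower-is-better, each result is first inverted into a rate before being divided by clock speed (an assumption), the sample-count and significance filtering is omitted, and the `core_scaling` function and its tuple layout are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def core_scaling(results):
    """Compute relative core scaling per vendor.

    results: iterable of (vendor, cores, clock_ghz, avg_ms) tuples.
    Returns {vendor: {cores: scaling}}, where the smallest core count
    seen for each vendor is the 1.0 baseline.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for vendor, cores, clock_ghz, avg_ms in results:
        # Assumption: invert the run time into a rate, then divide by
        # the reference clock speed to get a per-clock score.
        grouped[vendor][cores].append((1.0 / avg_ms) / clock_ghz)

    scaling = {}
    for vendor, by_cores in grouped.items():
        averaged = {cores: mean(scores) for cores, scores in by_cores.items()}
        base = averaged[min(averaged)]  # smallest core count for this vendor
        scaling[vendor] = {c: averaged[c] / base for c in sorted(averaged)}
    return scaling


if __name__ == "__main__":
    # Made-up sample data for illustration only; not taken from public results.
    sample = [
        ("AMD", 4, 4.0, 80000.0),
        ("AMD", 8, 4.0, 40000.0),
        ("Intel", 4, 2.0, 100000.0),
    ]
    print(core_scaling(sample))
```

A value close to the ratio of core counts would indicate near-linear scaling; values that stay near 1.0 as cores increase match the "does not scale well" conclusion.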

[Chart: Caffe AlexNet CPU Core Scaling - Model: AlexNet - Acceleration: CPU - Iterations: 100. Y-axis: Relative Core Scaling To Base. Series: AMD, Intel. Source: OpenBenchmarking.org]

Notable Instruction Set Usage

Notable instruction set extensions supported by this test, based on an automatic analysis by the Phoronix Test Suite / OpenBenchmarking.org analytics engine.

Instruction Set: SSE2 (SSE2)
Support: Used by default on supported hardware.
Instructions Detected: PUNPCKLQDQ MOVDQA MOVDQU CVTSS2SD MOVD ADDSD DIVSD CVTTSD2SI MOVUPD CVTPS2PD CVTPD2PS CVTSD2SS PSHUFD XORPD SHUFPD SUBSD MULSD CVTSI2SD MOVAPD UCOMISD UNPCKLPD CVTDQ2PS COMISD CVTDQ2PD SQRTSD ANDPD ANDNPD CMPNLESD ORPD DIVPD MULPD MINSD MINPD MAXPD MAXSD CMPLTPD ADDPD CMPLTSD MOVHPD SUBPD MOVLPD UNPCKHPD PMULUDQ PSRLDQ

Instruction Set: AVX
Support: Requires passing a supported compiler/build flag (verified with targets: sandybridge, skylake, tigerlake, cascadelake, sapphirerapids, alderlake, znver2, znver3). Found on Intel processors since Sandy Bridge (2011). Found on AMD processors since Bulldozer (2011).
Instructions Detected: VZEROUPPER VINSERTF128 VEXTRACTF128 VPERM2F128 VPERMILPS VPERMILPD VBROADCASTSS VBROADCASTSD VMASKMOVPS

Instruction Set: AVX2
Support: Requires passing a supported compiler/build flag (verified with targets: skylake, tigerlake, cascadelake, sapphirerapids, alderlake, znver2, znver3). Found on Intel processors since Haswell (2013). Found on AMD processors since Excavator (2016).
Instructions Detected: VPERM2I128 VPERMD VPERMPD VPBROADCASTQ VPBROADCASTD VPERMQ VGATHERQPS VEXTRACTI128 VPMASKMOVD VINSERTI128 VPGATHERDD VPBROADCASTW

Instruction Set: FMA (FMA)
Support: Requires passing a supported compiler/build flag (verified with targets: skylake, tigerlake, cascadelake, sapphirerapids, alderlake, znver2, znver3). Found on Intel processors since Haswell (2013). Found on AMD processors since Bulldozer (2011).
Instructions Detected: VFMADD132SS VFMADD132SD VFMSUB213PS VFMSUB132SS VFMSUB213PD VFMSUB132SD VFNMADD213SD VFNMADD213SS VFMADD231SS VFNMADD231SS VFMADD213SS VFNMADD132SS VFMADD231SD VFNMADD132SD VFMADD213SD VFMADD132PS VFMADD132PD VFNMADD132PD VFNMADD213PD VFNMADD132PS VFNMADD213PS VFMSUB231SD VFNMADD231SD VFMADD231PD

The test / benchmark does honor compiler flag changes.
Last automated analysis: 30 January 2021

This test profile binary relies on the shared libraries libcaffe.so.1.0.0, libglog.so.0, libgflags.so.2.2, libprotobuf.so.23, libpthread.so.0, libsz.so.2, libz.so.1, libdl.so.2, libm.so.6, liblmdb.so.0, libopenblas.so.0, libc.so.6, libunwind.so.8, libaec.so.0, libgfortran.so.5, liblzma.so.5, libquadmath.so.0.

Recent Test Results

1 System - 2 Benchmark Results

AMD Ryzen Threadripper PRO 3955WX 16-Cores - ASUS Pro WS WRX80E-SAGE SE WIFI - AMD Starship

Ubuntu 21.04 - 5.11.0-36-generic - X Server 1.20.11

1 System - 86 Benchmark Results

AMD Ryzen 9 5900X 12-Core - ASRock X570 PG Velocita - AMD Starship

Pop 21.04 - 5.11.0-7620-generic - GNOME Shell 3.38.4

1 System - 86 Benchmark Results

AMD Ryzen 9 5900X 12-Core - ASRock X570 PG Velocita - AMD Starship

Pop 21.04 - 5.11.0-7620-generic - GNOME Shell 3.38.4

1 System - 1873 Benchmark Results

1 System - 1687 Benchmark Results

1 System - 1 Benchmark Result

AArch64 rev 4 - Qualcomm MSM8998 - 6GB

Debian GNU - 4.4.78-perf+ - GNOME Shell

1 System - 1 Benchmark Result

Unknown - Inspur CE3000F - 16GB

Ubuntu 20.04.2 LTS - 5.8.0-64-generic - GCC 9.3.0

1 System - 1 Benchmark Result

Loongson-3A5000LL - Loongson Loongson-LS3A5000-7A1000-1w-V0.1-CRB v1.0 - Loongson LLC Hyper Transport Bridge

Loongnix 20 - 4.19.0-12-loongson-3 - X Server 1.20.4

3 Systems - 240 Benchmark Results

AMD EPYC 7F32 8-Core - Supermicro H12SSL-i v1.01 - AMD Starship

Ubuntu 21.04 - 5.11.0-16-generic - GNOME Shell 3.38.4

3 Systems - 240 Benchmark Results

AMD EPYC 7F32 8-Core - Supermicro H12SSL-i v1.01 - AMD Starship

Ubuntu 21.04 - 5.11.0-16-generic - GNOME Shell 3.38.4

2 Systems - 118 Benchmark Results

Intel Core i9-11900K - ASUS ROG MAXIMUS XIII HERO - Intel Tiger Lake-H

Fedora 34 - 5.12.9-300.fc34.x86_64 - GNOME Shell 40.1

1 System - 6 Benchmark Results

Intel 0000 - LENOVO 31A5 - Intel Device 43ef

Ubuntu 20.04 - 5.8.0-55-generic - GNOME Shell 3.36.9

8 Systems - 195 Benchmark Results

AMD EPYC - Hetzner vServer v20171111 - 1 x 8000 MB RAM QEMU

Debian 11 - 5.10.35-rt39-xanmod1 - GCC 10.2.1 20210110

7 Systems - 195 Benchmark Results

AMD EPYC - Hetzner vServer v20171111 - 1 x 8000 MB RAM QEMU

Debian 11 - 5.12.8-xanmod1-cacule - GCC 10.2.1 20210110

6 Systems - 195 Benchmark Results

AMD EPYC - Hetzner vServer v20171111 - 1 x 8000 MB RAM QEMU

Debian 11 - 5.12.8-051208-lowlatency - GCC 10.2.1 20210110

Most Popular Test Results

3 Systems - 268 Benchmark Results

Intel Core i5-2520M - HP 161C - Intel 2nd Generation Core DRAM

Ubuntu 18.04 - 4.18.0-20-generic - GNOME Shell 3.28.3

3 Systems - 174 Benchmark Results

Intel Core i9-10900K - Gigabyte Z490 AORUS MASTER - Intel Comet Lake PCH

Fedora 33 - 5.8.11-300.fc33.x86_64 - GNOME Shell 3.38.0

8 Systems - 439 Benchmark Results

AMD Ryzen 9 5950X 16-Core - ASUS ROG CROSSHAIR VIII HERO - AMD Starship

Ubuntu 21.04 - 5.12.0-051200rc3daily20210315-generic - GNOME Shell 3.38.3

3 Systems - 46 Benchmark Results

AMD Ryzen Threadripper 3960X 24-Core - MSI Creator TRX40 - AMD Starship

Ubuntu 20.04 - 5.9.0-rc5-14sep-patch - GNOME Shell 3.36.4

8 Systems - 32 Benchmark Results

Intel Core i7-1065G7 - Dell 06CDVY - Intel Device 34ef

Ubuntu 20.10 - 5.8.0-25-generic - GNOME Shell 3.38.1

2 Systems - 403 Benchmark Results

Intel Core i9-10900K - Gigabyte Z490 AORUS MASTER - Intel Comet Lake PCH

Ubuntu 20.10 - 5.8.0-22-generic - GNOME Shell 3.38.0

3 Systems - 406 Benchmark Results

Intel Core i9-10900K - Gigabyte Z490 AORUS MASTER - Intel Comet Lake PCH

Ubuntu 20.04 - 5.4.0-48-generic - GNOME Shell 3.36.4

2 Systems - 297 Benchmark Results

AMD Ryzen Threadripper 3990X 64-Core - System76 Thelio Major - AMD Starship

Pop 20.10 - 5.8.0-7625-generic - GNOME Shell 3.38.1

3 Systems - 191 Benchmark Results

AMD Ryzen 3 2200G - ASUS PRIME B350M-E - AMD Raven

Ubuntu 20.10 - 5.8.0-38-generic - GNOME Shell 3.38.1

3 Systems - 131 Benchmark Results

Intel Core i5-10600K - ASUS PRIME Z490M-PLUS - Intel Comet Lake PCH

Ubuntu 20.04 - 5.9.0-050900daily20201012-generic - GNOME Shell 3.36.4

3 Systems - 40 Benchmark Results

2 x Intel Xeon Gold 5220R - TYAN S7106 - Intel Sky Lake-E DMI3 Registers

Ubuntu 20.04 - 5.9.0-050900rc6-generic - GNOME Shell 3.36.4

4 Systems - 513 Benchmark Results

Intel Core i7-5775C - CompuLab v1.0 - Intel Broadwell-U DMI

Ubuntu 20.10 - 5.8.0-26-generic - GNOME Shell 3.38.1

3 Systems - 32 Benchmark Results

AMD Ryzen 9 3900X 12-Core - ASUS TUF GAMING X570-PLUS - AMD Starship

Ubuntu 20.04 - 5.9.0-050900rc6daily20200922-generic - GNOME Shell 3.36.4

3 Systems - 202 Benchmark Results

Intel Core i7-7700K - MSI Z270-A PRO - Intel Xeon E3-1200 v6

Ubuntu 20.04 - 5.4.0-28-generic - GNOME Shell 3.36.1
