ACES DGEMM

This is a multi-threaded DGEMM benchmark.

To run this test with the Phoronix Test Suite, the basic command is: phoronix-test-suite benchmark mt-dgemm.
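
For readers unfamiliar with the term, DGEMM is the double-precision general matrix multiply from BLAS, C = alpha*A*B + beta*C, and the benchmark reports the floating-point rate sustained while computing it. The following is a minimal single-threaded Python sketch of that idea, not the benchmark's own code: the function names, the matrix size, and the 2*N^3 operation count used for the rate are assumptions chosen for illustration.

```python
import time


def dgemm(alpha, a, b, beta, c):
    """Return alpha * (a @ b) + beta * c for square lists of lists (naive triple loop)."""
    n = len(a)
    result = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = 0.0
            for k in range(n):
                acc += a[i][k] * b[k][j]
            result[i][j] = alpha * acc + beta * c[i][j]
    return result


def gflops(n, seconds, repeats=1):
    """Approximate rate, assuming 2*n^3 floating-point operations per multiply."""
    return (2.0 * n ** 3 * repeats) / seconds / 1e9


if __name__ == "__main__":
    n = 64  # illustrative size, far smaller than a real benchmark run
    a = [[2.0] * n for _ in range(n)]
    b = [[0.5] * n for _ in range(n)]
    c = [[1.0] * n for _ in range(n)]
    start = time.perf_counter()
    dgemm(1.0, a, b, 1.0, c)
    elapsed = time.perf_counter() - start
    print(f"{gflops(n, elapsed):.4f} GFLOP/s (pure Python, single thread)")
```

A pure-Python loop will report a rate orders of magnitude below the figures on this page; the sketch only shows what is being measured and how a GFLOP/s number is derived from a timing.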

Project Site

lanl.gov

Test Created

11 October 2019

Test Maintainer

Michael Larabel 

Test Type

Processor

Average Install Time

2 Seconds

Average Run Time

15 Minutes, 37 Seconds

Test Dependencies

C/C++ Compiler Toolchain

Accolades

5k+ Downloads

[Chart: ACES DGEMM Popularity Statistics for pts/mt-dgemm, monthly from 2019.10 to 2021.09. Series: Public Result Uploads *, Reported Installs **, Reported Test Completions **, Test Profile Page Views ***. Source: OpenBenchmarking.org]
* Uploading of benchmark result data to OpenBenchmarking.org is always optional (opt-in) via the Phoronix Test Suite for users wishing to share their results publicly.
** Data based on those opting to upload their test results to OpenBenchmarking.org and users enabling the opt-in anonymous statistics reporting while running benchmarks from an Internet-connected platform.
*** Test profile page view reporting began March 2021.
Data current as of 11 September 2021.

Revision History

pts/mt-dgemm-1.2.0 - Fri, 11 Oct 2019 15:29:15 GMT
Initial commit of ACES DGEMM

Suites Using This Test

Multi-Core

Scientific Computing

HPC - High Performance Computing

Linear Algebra

Programmer / Developer System Benchmarks


Performance Metrics

Analyze Test Configuration:

ACES DGEMM 1.0

Sustained Floating-Point Rate

OpenBenchmarking.org metrics for this test profile configuration based on 1,455 public results since 11 October 2019 with the latest data as of 16 September 2021.

Below is an overview of the generalized performance for components where there is sufficient statistically significant data based upon user-uploaded results. Keep in mind that, particularly in the Linux/open-source space, OS configurations can differ vastly, so this overview is intended to offer only general guidance on performance expectations.

Component   Percentile Rank   # Compatible Public Results   GFLOP/s (Average)
            100th             21                            38.3 +/- 0.5
            100th             12                            31.8 +/- 0.3
            97th              27                            27.4 +/- 3.2
            96th              10                            23.2 +/- 0.4
            96th              3                             22.5 +/- 1.1
            95th              13                            21.1 +/- 1.0
            94th              5                             20.5 +/- 0.3
            92nd              4                             20.1 +/- 1.7
            92nd              11                            20.1 +/- 1.0
            92nd              11                            19.8 +/- 0.7
            90th              5                             18.8 +/- 0.1
            89th              6                             17.5 +/- 1.0
            87th              12                            16.7 +/- 0.2
            86th              8                             16.3 +/- 0.3
            86th              3                             16.2 +/- 0.4
            86th              3                             15.8 +/- 0.3
            85th              12                            15.5 +/- 0.3
            84th              7                             15.1 +/- 0.2
            84th              4                             15.1 +/- 1.7
            82nd              33                            13.9 +/- 0.2
            81st              7                             13.6 +/- 0.4
            80th              7                             13.4 +/- 0.4
            80th              4                             13.3 +/- 1.6
            79th              11                            12.7 +/- 0.1
            78th              6                             12.6 +/- 0.5
            77th              10                            12.2 +/- 0.1
            77th              6                             12.1 +/- 0.6
            76th              6                             11.7 +/- 0.8
Mid-Tier    75th                                            < 11.3
            75th              7                             11.3 +/- 0.3
            75th              10                            11.3 +/- 0.3
            75th              4                             11.2 +/- 0.2
            74th              15                            10.9 +/- 1.6
            73rd              5                             10.7 +/- 0.4
            73rd              9                             10.6 +/- 1.1
            73rd              7                             10.6 +/- 0.1
            72nd              3                             10.6 +/- 0.2
            72nd              6                             10.2 +/- 0.6
            70th              11                            9.8 +/- 0.1
            67th              8                             9.4 +/- 0.4
            65th              14                            9.2 +/- 0.3
            64th              9                             9.1 +/- 0.2
            62nd              7                             8.9 +/- 0.1
            61st              42                            8.8 +/- 0.8
            60th              7                             8.3 +/- 0.2
            59th              3                             7.9 +/- 0.1
            58th              8                             7.6 +/- 0.2
            56th              17                            7.3 +/- 0.3
            55th              3                             7.2 +/- 0.2
            54th              9                             6.8 +/- 0.3
            53rd              3                             6.6 +/- 0.2
            52nd              3                             6.5 +/- 0.2
Median      50th                                            6.3
            50th              24                            6.3 +/- 0.2
            49th              4                             6.2 +/- 0.2
            49th              7                             5.9 +/- 0.2
            49th              11                            5.9 +/- 0.5
            47th              6                             5.7 +/- 0.4
            47th              3                             5.7 +/- 0.1
            46th              15                            5.5 +/- 0.1
            45th              10                            5.4 +/- 0.6
            44th              13                            5.2 +/- 0.2
            43rd              3                             5.0 +/- 0.1
            42nd              13                            4.9 +/- 0.3
            41st              16                            4.7 +/- 0.5
            39th              6                             4.4 +/- 0.3
            39th              6                             4.4 +/- 0.1
            37th              5                             4.2 +/- 0.2
            35th              29                            4.1 +/- 0.5
            34th              10                            3.8 +/- 0.3
            32nd              5                             3.6 +/- 0.4
            31st              9                             3.5 +/- 0.1
            30th              9                             3.3 +/- 0.2
Low-Tier    25th                                            < 2.6
            25th              10                            2.6 +/- 0.2
            25th              4                             2.5 +/- 0.1
            23rd              9                             2.3 +/- 0.1
            23rd              10                            2.3 +/- 0.1
            21st              5                             2.1 +/- 0.1
            19th              10                            1.8 +/- 0.1
            18th              3                             1.7 +/- 0.1
            17th              18                            1.6 +/- 0.1
            15th              7                             1.3 +/- 0.2
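
To make the percentile column concrete, here is a small Python sketch of one common way to compute a percentile rank: the share of all results that are at or below a given value. Whether OpenBenchmarking.org uses exactly this definition is an assumption, and the five-element population is made up for illustration.

```python
def percentile_rank(value, population):
    """Percent of entries in population that are <= value (one common definition)."""
    if not population:
        raise ValueError("population must not be empty")
    at_or_below = sum(1 for x in population if x <= value)
    return 100.0 * at_or_below / len(population)


if __name__ == "__main__":
    sample = [1.3, 2.6, 6.3, 11.3, 38.3]  # hypothetical GFLOP/s results
    for v in sample:
        print(f"{v:5.1f} GFLOP/s -> {percentile_rank(v, sample):.0f}th percentile")
```

Under this definition the fastest result in any population ranks at the 100th percentile, which is consistent with the top rows of the table.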
[Chart: Distribution Of Public Results - Sustained Floating-Point Rate. 1452 Results Range From 0 To 49 GFLOP/s. Source: OpenBenchmarking.org]

Based on OpenBenchmarking.org data, the selected test / test configuration (ACES DGEMM 1.0 - Sustained Floating-Point Rate) has an average run-time of 7 minutes. By default this test profile is set to run at least 3 times, but the run count may increase if the standard deviation exceeds pre-defined defaults or if other calculations deem additional runs necessary for greater statistical accuracy of the result.
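
The run-count behavior described above can be sketched roughly as follows: run a minimum number of times, then keep running while the relative standard deviation stays above a threshold, up to a cap. This is an illustrative Python sketch rather than the Phoronix Test Suite's actual logic; the 3.5% threshold, the cap of 15 runs, and the run_once callable are assumptions.

```python
import statistics


def relative_stddev_percent(samples):
    """Sample standard deviation as a percentage of the mean (expects positive results)."""
    return 100.0 * statistics.stdev(samples) / statistics.mean(samples)


def run_until_stable(run_once, min_runs=3, max_runs=15, threshold_percent=3.5):
    """Call run_once() at least min_runs times; add runs while results are too noisy."""
    results = [run_once() for _ in range(min_runs)]
    while (len(results) < max_runs
           and relative_stddev_percent(results) > threshold_percent):
        results.append(run_once())
    return results


if __name__ == "__main__":
    # Canned GFLOP/s values stand in for real benchmark runs (hypothetical data).
    canned = iter([10.0, 10.1, 9.9])
    runs = run_until_stable(lambda: next(canned))
    print(len(runs), "runs, deviation",
          f"{relative_stddev_percent(runs):.2f}%")
```

With the canned values the deviation is already below the assumed threshold after the minimum three runs, so no extra runs are added; noisier results would extend the loop until the cap.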

[Chart: Time Required To Complete Benchmark - Sustained Floating-Point Rate, Run-Time in Minutes. Min: 1 / Avg: 6.49 / Max: 93. Source: OpenBenchmarking.org]

Based on public OpenBenchmarking.org results, the selected test / test configuration has an average standard deviation of 1.3%.

[Chart: Average Deviation Between Runs - Sustained Floating-Point Rate, Percent, Fewer Is Better. Min: 0 / Avg: 1.32 / Max: 8. Source: OpenBenchmarking.org]

Does It Scale Well With Increasing Cores?

Yes. Based on the automated analysis of the collected public benchmark data, this test configuration generally scales well with increasing CPU core counts. The analysis uses publicly available results for this configuration: results are separated by vendor, divided by the reference CPU clock speed, grouped by matching physical CPU core count, and normalized against the smallest core count tested from each vendor. Only CPUs with a sufficient number of test samples and statistically significant data are included.
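
The normalization described above can be sketched in Python as follows. This is an illustrative reading of the paragraph, not the OpenBenchmarking.org implementation: the record layout, the use of a plain mean per group, and the sample numbers are all assumptions, and the filtering for sufficient sample counts is omitted.

```python
from collections import defaultdict
from statistics import mean


def relative_core_scaling(records):
    """records: iterable of (vendor, cores, clock_ghz, gflops) tuples.

    Returns {vendor: {cores: scaling relative to that vendor's smallest core count}}.
    """
    # Divide each result by the reference clock, grouped by vendor and core count.
    per_clock = defaultdict(lambda: defaultdict(list))
    for vendor, cores, clock_ghz, gflops in records:
        per_clock[vendor][cores].append(gflops / clock_ghz)

    scaling = {}
    for vendor, by_cores in per_clock.items():
        # Average each core-count group.
        averages = {cores: mean(values) for cores, values in by_cores.items()}
        # Normalize against the smallest core count seen for this vendor.
        base = averages[min(averages)]
        scaling[vendor] = {cores: avg / base
                           for cores, avg in sorted(averages.items())}
    return scaling


if __name__ == "__main__":
    sample = [  # hypothetical results, not taken from the public data set
        ("AMD", 4, 4.0, 4.0),
        ("AMD", 8, 4.0, 8.0),
        ("AMD", 8, 2.0, 4.0),
        ("Intel", 4, 2.0, 2.0),
        ("Intel", 16, 2.0, 7.0),
    ]
    for vendor, table in relative_core_scaling(sample).items():
        print(vendor, table)
```

Dividing by clock speed first keeps a high-clocked low-core-count CPU from masking the effect of core count, which is the quantity the scaling chart is meant to isolate.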

[Chart: ACES DGEMM CPU Core Scaling - Sustained Floating-Point Rate. Relative Core Scaling To Base for Intel and AMD, across physical core counts from 4 to 128. Source: OpenBenchmarking.org]

Notable Instruction Set Usage

Notable instruction set extensions supported by this test, based on an automatic analysis by the Phoronix Test Suite / OpenBenchmarking.org analytics engine.

Instruction Set: Advanced Vector Extensions (AVX)
Support: Used by default on supported hardware. Found on Intel processors since Sandy Bridge (2011). Found on AMD processors since Bulldozer (2011).
Instructions Detected: VZEROUPPER VEXTRACTF128 VINSERTF128

Instruction Set: FMA (FMA)
Support: Used by default on supported hardware. Found on Intel processors since Haswell (2013). Found on AMD processors since Bulldozer (2011).
Instructions Detected: VFMADD132SD VFMADD231SD

The test / benchmark does honor compiler flag changes.
Last automated analysis: 10 May 2021
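
To check whether a given x86 Linux machine advertises the instruction set extensions listed above, one option is to read the flags line in /proc/cpuinfo. The Python sketch below parses that text for the avx and fma flags. It is a host-side illustration only, not how the OpenBenchmarking.org analysis detects instructions in the benchmark binary, and the helper name cpu_flags is hypothetical.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text (x86 Linux)."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no flags line found, e.g. on a non-x86 system


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            flags = cpu_flags(handle.read())
    except OSError:
        flags = set()  # not Linux, or /proc is unavailable
    for name in ("avx", "fma"):
        print(f"{name}: {'yes' if name in flags else 'no'}")
```

On systems without a flags line, such as the ARMv8 machines that appear in the recent results, the sketch simply reports both extensions as absent.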

This test profile binary relies on the shared libraries libgomp.so.1, libc.so.6, libdl.so.2, libpthread.so.0.

Recent Test Results

2 Systems - 21 Benchmark Results

ARMv8 rev 0 - e3360_1099 - 32GB

Ubuntu 20.04 - 5.10.41-tegra - X Server

1 System - 31 Benchmark Results

ARMv8 rev 0 - e3360_1099 - 32GB

Ubuntu 20.04 - 5.10.41-tegra - X Server

2 Systems - 19 Benchmark Results

ARMv8 rev 0 - e3360_1099 - 32GB

Ubuntu 20.04 - 5.10.41-tegra - X Server

1 System - 29 Benchmark Results

ARMv8 rev 0 - e3360_1099 - 32GB

Ubuntu 20.04 - 5.10.41-tegra - X Server

2 Systems - 14 Benchmark Results

ARMv8 rev 0 - e3360_1099 - 32GB

Ubuntu 20.04 - 5.10.41-tegra - X Server

2 Systems - 13 Benchmark Results

ARMv8 rev 0 - e3360_1099 - 32GB

Ubuntu 20.04 - 5.10.41-tegra - X Server

2 Systems - 12 Benchmark Results

ARMv8 rev 0 - e3360_1099 - 32GB

Ubuntu 20.04 - 5.10.41-tegra - X Server

2 Systems - 11 Benchmark Results

ARMv8 rev 0 - e3360_1099 - 32GB

Ubuntu 20.04 - 5.10.41-tegra - X Server

2 Systems - 10 Benchmark Results

ARMv8 rev 0 - e3360_1099 - 32GB

Ubuntu 20.04 - 5.10.41-tegra - X Server

1 System - 19 Benchmark Results

ARMv8 rev 0 - e3360_1099 - 32GB

Ubuntu 20.04 - 5.10.41-tegra - X Server

2 Systems - 9 Benchmark Results

ARMv8 rev 0 - e3360_1099 - 32GB

Ubuntu 20.04 - 5.10.41-tegra - X Server

1 System - 18 Benchmark Results

ARMv8 rev 0 - e3360_1099 - 32GB

Ubuntu 20.04 - 5.10.41-tegra - X Server

1 System - 17 Benchmark Results

ARMv8 rev 0 - e3360_1099 - 32GB

Ubuntu 20.04 - 5.10.41-tegra - X Server

2 Systems - 8 Benchmark Results

ARMv8 rev 0 - e3360_1099 - 32GB

Ubuntu 20.04 - 5.10.41-tegra - X Server

1 System - 16 Benchmark Results

ARMv8 rev 0 - e3360_1099 - 32GB

Ubuntu 20.04 - 5.10.41-tegra - X Server

Most Popular Test Results

3 Systems - 268 Benchmark Results

Intel Core i5-2520M - HP 161C - Intel 2nd Generation Core DRAM

Ubuntu 18.04 - 4.18.0-20-generic - GNOME Shell 3.28.3

11 Systems - 217 Benchmark Results

AMD Ryzen 5 2600X Six-Core - ASUS ROG CROSSHAIR VIII HERO - AMD 17h

Ubuntu 20.04 - 5.9.0-050900-generic - GNOME Shell 3.36.4

12 Systems - 593 Benchmark Results

AMD Ryzen 5 3600XT 6-Core - MSI X470 GAMING M7 AC - AMD Starship

Ubuntu 20.04 - 5.8.0-050800daily20200622-generic - GNOME Shell 3.36.2

15 Systems - 38 Benchmark Results

2 x AMD EPYC 7742 64-Core - AMD DAYTONA_X - AMD Starship

Ubuntu 19.10 - 5.3.0-18-generic - GNOME Shell 3.34.1

8 Systems - 360 Benchmark Results

AMD Ryzen Threadripper 2970WX 24-Core - Gigabyte X399 AORUS Gaming 7 - AMD 17h

Ubuntu 19.10 - 5.4.0-999-generic - GNOME Shell 3.34.1

7 Systems - 62 Benchmark Results

Intel Core i9-7960X - 16384MB - 238GB

Ubuntu 18.04 - 4.4.0-18362-Microsoft - GCC 7.4.0

3 Systems - 301 Benchmark Results

Intel Core i5-7600K - Gigabyte Z270M-D3H-CF - Intel Xeon E3-1200 v6

Ubuntu 20.04 - 5.4.0-40-generic - GNOME Shell 3.36.3

2 Systems - 59 Benchmark Results

AMD Ryzen 7 3700X 8-Core - MSI MEG X570 GODLIKE - AMD Device 1480

Clear Linux OS 31480 - 5.3.8-854.native - GNOME Shell 3.34.1

12 Systems - 229 Benchmark Results

AMD Ryzen 5 2600 Six-Core - ASUS ROG CROSSHAIR VIII HERO - AMD 17h

Ubuntu 20.04 - 5.9.0-050900-generic - GNOME Shell 3.36.4

2 Systems - 1708 Benchmark Results

AMD Ryzen 3 3300X 4-Core - ASRock X570 Pro4 - AMD Starship

Ubuntu 20.04 - 5.7.0-rc6-amd-energy - GNOME Shell 3.36.2

18 Systems - 115 Benchmark Results

AMD Ryzen 9 3900X 12-Core - ASUS ROG CROSSHAIR VIII HERO - AMD Starship

Ubuntu 19.10 - 5.3.0-17-generic - GNOME Shell 3.34.1
