ss
AMD EPYC 3255 8-Core Temp testing with a congatec conga-B7E3 (5.13 BIOS) and MSI NVIDIA GeForce GTX 1050 2GB on Ubuntu 16.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2106293-IB-SS230099426&grs&rdt&export=txt .
ss
Result identifiers: sysbench1604Ph10, graphics-magick1604Ph10, ipc-benchmarking1604Ph10, ipc-benchmarking1024-1604Ph10, amg1604Ph10, tensorflow1604PH10, ramspeed1604Ph10, npb1604Ph10, scimark1604Ph10, cachebench1604Ph10, onednn1604Ph10, apache-ctx_clock1604Ph10, ctx-clock1604Ph10, hackbenchAll1604Ph10, hackbench1604Ph10, mbw1604Ph10, openssl1604Ph10, perf-bench1604Ph10, schbench8-16-1604Ph10, stress-ng1604Ph10, t-test1-1604Ph10, tinymembench1604Ph10
Processor: AMD EPYC 3255 8-Core Temp @ 2.50GHz (8 Cores / 16 Threads)
Motherboard: congatec conga-B7E3 (5.13 BIOS)
Chipset: AMD 17h
Memory: 32GB
Disk: 2000GB Samsung SSD 970 EVO 2TB + 2000GB Portable SSD T5
Graphics: MSI NVIDIA GeForce GTX 1050 2GB
Audio: NVIDIA GP107GL HD Audio
Network: Intel I211 + Intel I210 + 2 x AMD Device 1458 + 2 x AMD Device 1459
OS: Ubuntu 16.04
Kernel: 4.15.0-123-generic (x86_64)
Display Server: X Server
Display Driver: NVIDIA
OpenGL: 1.4 (2.1 Mesa 10.5.4)
Compiler: GCC 5.5.0 20171010
File-System: ext4
Screen Resolution: 800x600
Kernel Details - Transparent Huge Pages: madvise
Compiler Details - sysbench1604Ph10, graphics-magick1604Ph10, ipc-benchmarking1604Ph10, ipc-benchmarking1024-1604Ph10, amg1604Ph10, ramspeed1604Ph10, npb1604Ph10, scimark1604Ph10, cachebench1604Ph10, onednn1604Ph10, apache-ctx_clock1604Ph10, ctx-clock1604Ph10, hackbenchAll1604Ph10, hackbench1604Ph10, mbw1604Ph10, openssl1604Ph10, perf-bench1604Ph10, schbench8-16-1604Ph10, stress-ng1604Ph10, t-test1-1604Ph10, tinymembench1604Ph10: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-checking=release --enable-clocale=gnu --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --enable-libmpx --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib --with-tune=generic --without-cuda-driver -v
Processor Details - Scaling Governor: acpi-cpufreq ondemand (Boost: Enabled) - CPU Microcode: 0x8001250
Security Details - itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Full AMD retpoline IBPB: conditional STIBP: disabled RSB filling + srbds: Not affected + tsx_async_abort: Not affected
Python Details - tensorflow1604PH10: sh: 1: /opt/TensorRT/python: Permission denied + Python 3.5.2
ss
Results overview, grouped by result identifier
sysbench1604Ph10
  sysbench: CPU = 12018.18
graphics-magick1604Ph10
  graphics-magick: HWB Color Space = 711
  graphics-magick: Noise-Gaussian = 120
  graphics-magick: Resizing = 579
  graphics-magick: Enhanced = 120
  graphics-magick: Sharpen = 82
  graphics-magick: Rotate = 458
  graphics-magick: Swirl = 287
ipc-benchmarking1604Ph10
  ipc-benchmark: FIFO Named Pipe - 128 = 1811477
  ipc-benchmark: Unnamed Pipe - 128 = 1926153
  ipc-benchmark: TCP Socket - 128 = 1936803
  ipc-benchmark: Unnamed Unix Domain Socket - 128 = 1002040
ipc-benchmarking1024-1604Ph10
  ipc-benchmark: Unnamed Unix Domain Socket - 1024 = 1498550
  ipc-benchmark: FIFO Named Pipe - 1024 = 1590154
  ipc-benchmark: Unnamed Pipe - 1024 = 1647029
  ipc-benchmark: TCP Socket - 1024 = 1476380
amg1604Ph10
  amg = 97848307
ramspeed1604Ph10
  ramspeed: Average - Floating Point = 10488.67
  ramspeed: Scale - Floating Point = 9818.92
  ramspeed: Add - Floating Point = 11122.64
  ramspeed: Average - Integer = 10366.89
  ramspeed: Scale - Integer = 9654.84
  ramspeed: Add - Integer = 11115.05
npb1604Ph10
  npb: EP.D = 217.22
  npb: EP.C = 221.32
scimark1604Ph10
  scimark2: Jacobi Successive Over-Relaxation = 830.83
  scimark2: Dense LU Matrix Factorization = 314.80
  scimark2: Sparse Matrix Multiply = 500.28
  scimark2: Fast Fourier Transform = 167.16
  scimark2: Monte Carlo = 101.47
  scimark2: Composite = 382.91
cachebench1604Ph10
  cachebench: Read / Modify / Write = 21639.888567
  cachebench: Write = 10865.062305
  cachebench: Read = 2086.961474
onednn1604Ph10
  onednn: Matrix Multiply Batch Shapes Transformer - u8s8f32 - CPU = 6.41798
  onednn: Recurrent Neural Network Inference - bf16bf16bf16 - CPU = 8572.49
  onednn: Recurrent Neural Network Training - bf16bf16bf16 - CPU = 12262.6
  onednn: Matrix Multiply Batch Shapes Transformer - f32 - CPU = 10.8067
  onednn: Recurrent Neural Network Inference - u8s8f32 - CPU = 8577.57
  onednn: Recurrent Neural Network Training - u8s8f32 - CPU = 12314.2
  onednn: Recurrent Neural Network Inference - f32 - CPU = 8579.99
  onednn: Recurrent Neural Network Training - f32 - CPU = 12254.6
  onednn: Deconvolution Batch shapes_3d - u8s8f32 - CPU = 13.8261
  onednn: Deconvolution Batch shapes_1d - u8s8f32 - CPU = 10.2525
  onednn: Convolution Batch Shapes Auto - u8s8f32 - CPU = 43.9371
  onednn: Deconvolution Batch shapes_3d - f32 - CPU = 20.9738
  onednn: Deconvolution Batch shapes_1d - f32 - CPU = 14.6100
  onednn: Convolution Batch Shapes Auto - f32 - CPU = 38.3076
  onednn: IP Shapes 3D - u8s8f32 - CPU = 4.86345
  onednn: IP Shapes 1D - u8s8f32 - CPU = 9.77902
  onednn: IP Shapes 3D - f32 - CPU = 18.5927
  onednn: IP Shapes 1D - f32 - CPU = 17.7590
apache-ctx_clock1604Ph10
  apache: Static Web Page Serving = 19458.70
ctx-clock1604Ph10
  ctx-clock: Context Switch Time = 175
hackbench1604Ph10
  hackbench: 16 - Process = 53.076
  hackbench: 16 - Thread = 54.266
mbw1604Ph10
  mbw: Memory Copy, Fixed Block Size - 1024 MiB = 4729.510
  mbw: Memory Copy - 1024 MiB = 8058.320
openssl1604Ph10
  openssl: RSA 4096-bit Performance = 1177.1
perf-bench1604Ph10
  perf-bench: Syscall Basic = 14521622
  perf-bench: Futex Lock-Pi = 892
  perf-bench: Sched Pipe = 99785
  perf-bench: Memset 1MB = 39.593470
  perf-bench: Memcpy 1MB = 14.073058
  perf-bench: Futex Hash = 3649333
  perf-bench: Epoll Wait = 37459
schbench8-16-1604Ph10
  schbench: 8 - 16 = 104686
stress-ng1604Ph10
  stress-ng: System V Message Passing = 7377828.11
  stress-ng: Glibc Qsort Data Sorting = 81.02
  stress-ng: Glibc C String Functions = 534980.71
  stress-ng: Context Switching = 2924879.73
  stress-ng: Socket Activity = 3995.67
  stress-ng: Memory Copying = 600.61
  stress-ng: Vector Math = 38522.11
  stress-ng: Matrix Math = 26882.41
  stress-ng: Semaphores = 1237491.78
  stress-ng: CPU Stress = 2077.87
  stress-ng: CPU Cache = 18.30
  stress-ng: SENDFILE = 101151.48
  stress-ng: Forking = 27205.00
  stress-ng: Malloc = 26445743.26
  stress-ng: Crypto = 1490.69
  stress-ng: Atomic = 226821.97
  stress-ng: MEMFD = 307.67
  stress-ng: NUMA = 96.81
  stress-ng: MMAP = 118.16
t-test1-1604Ph10
  t-test1: 2 = 14.942
  t-test1: 1 = 47.910
tinymembench1604Ph10
  tinymembench: Standard Memset = 7800.5
  tinymembench: Standard Memcpy = 8984.2
Tinymembench 2018-05-28 - MB/s, More Is Better - tinymembench1604Ph10
  Standard Memset = 7800.5 (SE +/- 10.48, N = 3)
  Standard Memcpy = 8984.2 (SE +/- 22.06, N = 3)
  1. (CC) gcc-7 options: -O2 -lm
t-test1 2017-01-13 - Seconds, Fewer Is Better - t-test1-1604Ph10
  Threads: 2 = 14.94 (SE +/- 0.02, N = 3)
  Threads: 1 = 47.91 (SE +/- 0.20, N = 3)
  1. (CC) gcc-7 options: -pthread
Stress-NG 0.11.07 - Bogo Ops/s, More Is Better - stress-ng1604Ph10
  Test: System V Message Passing = 7377828.11 (SE +/- 92072.90, N = 15)
  Test: Glibc Qsort Data Sorting = 81.02 (SE +/- 0.46, N = 3)
  Test: Glibc C String Functions = 534980.71 (SE +/- 36.12, N = 3)
  Test: Context Switching = 2924879.73 (SE +/- 29565.69, N = 3)
  Test: Socket Activity = 3995.67 (SE +/- 51.78, N = 3)
  Test: Memory Copying = 600.61 (SE +/- 0.40, N = 3)
  Test: Vector Math = 38522.11 (SE +/- 20.76, N = 3)
  Test: Matrix Math = 26882.41 (SE +/- 48.19, N = 3)
  Test: Semaphores = 1237491.78 (SE +/- 332.73, N = 3)
  Test: CPU Stress = 2077.87 (SE +/- 20.72, N = 3)
  Test: CPU Cache = 18.30 (SE +/- 0.35, N = 3)
  Test: SENDFILE = 101151.48 (SE +/- 320.48, N = 3)
  Test: Forking = 27205.00 (SE +/- 369.68, N = 3)
  Test: Malloc = 26445743.26 (SE +/- 17076.95, N = 3)
  Test: Crypto = 1490.69 (SE +/- 0.39, N = 3)
  Test: Atomic = 226821.97 (SE +/- 292.77, N = 3)
  Test: MEMFD = 307.67 (SE +/- 0.20, N = 3)
  Test: NUMA = 96.81 (SE +/- 0.85, N = 3)
  Test: MMAP = 118.16 (SE +/- 0.36, N = 3)
  1. (CC) gcc-7 options: -O2 -std=gnu99 -lm -laio -lcrypt -lrt -lz -ldl -lpthread -lc
Schbench - usec, 99.9th Latency Percentile, Fewer Is Better - schbench8-16-1604Ph10
  Message Threads: 8 - Workers Per Message Thread: 16 = 104686 (SE +/- 1375.04, N = 7)
  1. (CC) gcc-7 options: -O2 -lpthread
perf-bench - More Is Better - perf-bench1604Ph10
  Benchmark: Syscall Basic = 14521622 ops/sec (SE +/- 36942.47, N = 3)
  Benchmark: Futex Lock-Pi = 892 ops/sec
  Benchmark: Sched Pipe = 99785 ops/sec (SE +/- 340.78, N = 3)
  Benchmark: Memset 1MB = 39.59 GB/sec (SE +/- 0.10, N = 3)
  Benchmark: Memcpy 1MB = 14.07 GB/sec (SE +/- 0.09, N = 3)
  Benchmark: Futex Hash = 3649333 ops/sec (SE +/- 8513.42, N = 3)
  Benchmark: Epoll Wait = 37459 ops/sec (SE +/- 36.75, N = 3)
  1. (CC) gcc-7 options: -O6 -ggdb3 -funwind-tables -std=gnu99 -Xlinker -export-dynamic -lpthread -lrt -lm -ldl -lcrypto -lperl -lc -lcrypt -lpython2.7 -lutil -lz -llzma -lnuma
OpenSSL 1.1.1 - Signs Per Second, More Is Better - openssl1604Ph10
  RSA 4096-bit Performance = 1177.1 (SE +/- 2.35, N = 3)
  1. (CC) gcc-7 options: -pthread -m64 -O3 -lssl -lcrypto -ldl
MBW 2018-09-08 - MiB/s, More Is Better - mbw1604Ph10
  Test: Memory Copy, Fixed Block Size - Array Size: 1024 MiB = 4729.51 (SE +/- 20.26, N = 3)
  Test: Memory Copy - Array Size: 1024 MiB = 8058.32 (SE +/- 14.77, N = 3)
  1. (CC) gcc-7 options: -O3 -march=native
Hackbench - Seconds, Fewer Is Better - hackbench1604Ph10
  Count: 16 - Type: Process = 53.08 (SE +/- 0.64, N = 3)
  Count: 16 - Type: Thread = 54.27 (SE +/- 0.77, N = 3)
  1. (CC) gcc-7 options: -lpthread
ctx_clock - Clocks, Fewer Is Better - ctx-clock1604Ph10
  Context Switch Time = 175
Apache Benchmark 2.4.29 - Requests Per Second, More Is Better - apache-ctx_clock1604Ph10
  Static Web Page Serving = 19458.70 (SE +/- 173.96, N = 3)
  1. (CC) gcc-7 options: -shared -fPIC -O2 -pthread
oneDNN 2.1.2 - ms, Fewer Is Better - onednn1604Ph10
  Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32 - Engine: CPU = 6.41798 (SE +/- 0.00458, N = 3; MIN: 6.09)
  Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU = 8572.49 (SE +/- 23.27, N = 3; MIN: 8526.54)
  Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU = 12262.6 (SE +/- 27.97, N = 3; MIN: 12196.2)
  Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU = 10.81 (SE +/- 0.06, N = 3; MIN: 10.37)
  Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU = 8577.57 (SE +/- 13.01, N = 3; MIN: 8526.62)
  Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU = 12314.2 (SE +/- 27.82, N = 3; MIN: 12240.5)
  Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU = 8579.99 (SE +/- 12.76, N = 3; MIN: 8534.51)
  Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU = 12254.6 (SE +/- 38.73, N = 3; MIN: 12140.4)
  Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU = 13.83 (SE +/- 0.06, N = 3; MIN: 13.37)
  Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU = 10.25 (SE +/- 0.01, N = 3; MIN: 9.57)
  Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU = 43.94 (SE +/- 0.02, N = 3; MIN: 43.44)
  Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU = 20.97 (SE +/- 0.06, N = 3; MIN: 19.86)
  Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU = 14.61 (SE +/- 0.03, N = 3; MIN: 13.03)
  Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU = 38.31 (SE +/- 0.00, N = 3; MIN: 37.4)
  Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU = 4.86345 (SE +/- 0.00067, N = 3; MIN: 4.61)
  Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPU = 9.77902 (SE +/- 0.00477, N = 3; MIN: 9.18)
  Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU = 18.59 (SE +/- 0.02, N = 3; MIN: 18.25)
  Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU = 17.76 (SE +/- 0.11, N = 3; MIN: 16.64)
  1. (CXX) g++-7 options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
CacheBench - MB/s, More Is Better - cachebench1604Ph10
  Test: Read / Modify / Write = 21639.89 (SE +/- 39.55, N = 3; MIN: 18656.61 / MAX: 22903.78)
  Test: Write = 10865.06 (SE +/- 20.48, N = 3; MIN: 9466.53 / MAX: 11482.88)
  Test: Read = 2086.96 (SE +/- 4.55, N = 3; MIN: 2078.1 / MAX: 2093.84)
  1. (CC) gcc-7 options: -lrt
SciMark 2.0 - Mflops, More Is Better - scimark1604Ph10
  Computational Test: Jacobi Successive Over-Relaxation = 830.83 (SE +/- 22.88, N = 3)
  Computational Test: Dense LU Matrix Factorization = 314.80 (SE +/- 1.26, N = 3)
  Computational Test: Sparse Matrix Multiply = 500.28 (SE +/- 1.11, N = 3)
  Computational Test: Fast Fourier Transform = 167.16 (SE +/- 0.58, N = 3)
  Computational Test: Monte Carlo = 101.47 (SE +/- 0.31, N = 3)
  Computational Test: Composite = 382.91 (SE +/- 3.97, N = 3)
  1. (CC) gcc-7 options: -lm
NAS Parallel Benchmarks 3.4 - Total Mop/s, More Is Better - npb1604Ph10
  Test / Class: EP.D = 217.22 (SE +/- 0.55, N = 3)
  Test / Class: EP.C = 221.32 (SE +/- 0.38, N = 3)
  1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi
  2. Open MPI 1.10.2
RAMspeed SMP 3.5.0 - MB/s, More Is Better - ramspeed1604Ph10
  Type: Average - Benchmark: Floating Point = 10488.67 (SE +/- 5.49, N = 3)
  Type: Scale - Benchmark: Floating Point = 9818.92 (SE +/- 5.81, N = 3)
  Type: Add - Benchmark: Floating Point = 11122.64 (SE +/- 8.14, N = 3)
  Type: Average - Benchmark: Integer = 10366.89 (SE +/- 18.60, N = 3)
  Type: Scale - Benchmark: Integer = 9654.84 (SE +/- 35.92, N = 3)
  Type: Add - Benchmark: Integer = 11115.05 (SE +/- 7.81, N = 3)
  1. (CC) gcc-7 options: -O3 -march=native
Algebraic Multi-Grid Benchmark 1.2 - Figure Of Merit, More Is Better - amg1604Ph10
  Result = 97848307 (SE +/- 47423.33, N = 3)
  1. (CC) gcc-7 options: -lparcsr_ls -lparcsr_mv -lseq_mv -lIJ_mv -lkrylov -lHYPRE_utilities -lm -fopenmp -pthread -lmpi
IPC_benchmark - Messages Per Second, More Is Better - ipc-benchmarking1024-1604Ph10
  Type: Unnamed Unix Domain Socket - Message Bytes: 1024 = 1498550 (SE +/- 7449.83, N = 3)
  Type: FIFO Named Pipe - Message Bytes: 1024 = 1590154 (SE +/- 9737.22, N = 3)
  Type: Unnamed Pipe - Message Bytes: 1024 = 1647029 (SE +/- 13168.11, N = 3)
  Type: TCP Socket - Message Bytes: 1024 = 1476380 (SE +/- 5988.24, N = 3)
IPC_benchmark - Messages Per Second, More Is Better - ipc-benchmarking1604Ph10
  Type: FIFO Named Pipe - Message Bytes: 128 = 1811477 (SE +/- 19705.06, N = 3)
  Type: Unnamed Pipe - Message Bytes: 128 = 1926153 (SE +/- 7932.21, N = 3)
  Type: TCP Socket - Message Bytes: 128 = 1936803 (SE +/- 4379.21, N = 3)
GraphicsMagick 1.3.33 - Iterations Per Minute, More Is Better - graphics-magick1604Ph10
  Operation: HWB Color Space = 711 (SE +/- 0.67, N = 3)
  Operation: Noise-Gaussian = 120
  Operation: Resizing = 579 (SE +/- 0.33, N = 3)
  Operation: Enhanced = 120 (SE +/- 0.33, N = 3)
  Operation: Sharpen = 82
  Operation: Rotate = 458 (SE +/- 1.86, N = 3)
  Operation: Swirl = 287 (SE +/- 0.88, N = 3)
  1. (CC) gcc-7 options: -fopenmp -O2 -pthread -ljbig -ltiff -ljasper -ljpeg -lXext -lSM -lICE -lX11 -llzma -lbz2 -lxml2 -lz -lm -lpthread
Sysbench 1.0.20 - Events Per Second, More Is Better - sysbench1604Ph10
  Test: CPU = 12018.18 (SE +/- 54.78, N = 3)
  1. (CC) gcc-7 options: -pthread -O2 -funroll-loops -rdynamic -ldl -laio -lm
IPC_benchmark - Messages Per Second, More Is Better - ipc-benchmarking1604Ph10
  Type: Unnamed Unix Domain Socket - Message Bytes: 128 = 1002040 (SE +/- 43657.25, N = 15)
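The "SE +/-" figures above are absolute, so they are hard to compare across tests whose units and magnitudes differ. As a minimal sketch, the Python below expresses the standard error as a percentage of the reported mean for a few results copied from this export. The helper names (relative_se_percent, summarize) and the choice of rows are illustrative assumptions and not part of the Phoronix Test Suite.

```python
# Relative standard error for a few results copied from this export.
# Helper names and the selected rows are illustrative, not PTS functionality.


def relative_se_percent(mean: float, se: float) -> float:
    """Return the standard error as a percentage of the reported mean."""
    return 100.0 * se / mean


# (label, reported mean, reported SE, N), values taken from the results above
RESULTS = [
    ("Tinymembench Standard Memset", 7800.5, 10.48, 3),
    ("Stress-NG System V Message Passing", 7377828.11, 92072.90, 15),
    ("Schbench 8 - 16", 104686.0, 1375.04, 7),
    ("IPC_benchmark Unnamed Unix Domain Socket, 128 bytes", 1002040.0, 43657.25, 15),
]


def summarize(results):
    """Return (label, N, relative SE in percent rounded to 2 places) per row."""
    return [
        (label, n, round(relative_se_percent(mean, se), 2))
        for label, mean, se, n in results
    ]


if __name__ == "__main__":
    for label, n, pct in summarize(RESULTS):
        print(f"{label}: N = {n}, SE = {pct}% of mean")
```

Read this way, the Tinymembench result varies by roughly a tenth of a percent between runs, while the 128-byte Unix domain socket result varies by a few percent even with N = 15.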
Phoronix Test Suite v10.8.5