ss
AMD EPYC 3255 8-Core Temp testing with a congatec conga-B7E3 (5.13 BIOS) and MSI NVIDIA GeForce GTX 1050 2GB on Ubuntu 16.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2106293-IB-SS230099426&export=pdf&grr&rdt&rro .
ss: system details (OpenBenchmarking.org)
Result identifiers: sysbench1604Ph10, graphics-magick1604Ph10, ipc-benchmarking1604Ph10, ipc-benchmarking1024-1604Ph10, amg1604Ph10, tensorflow1604PH10, ramspeed1604Ph10, npb1604Ph10, scimark1604Ph10, cachebench1604Ph10, onednn1604Ph10, apache-ctx_clock1604Ph10, ctx-clock1604Ph10, hackbenchAll1604Ph10, hackbench1604Ph10, mbw1604Ph10, openssl1604Ph10, perf-bench1604Ph10, schbench8-16-1604Ph10, stress-ng1604Ph10, t-test1-1604Ph10, tinymembench1604Ph10
Processor: AMD EPYC 3255 8-Core Temp @ 2.50GHz (8 Cores / 16 Threads)
Motherboard: congatec conga-B7E3 (5.13 BIOS)
Chipset: AMD 17h
Memory: 32GB
Disk: 2000GB Samsung SSD 970 EVO 2TB + 2000GB Portable SSD T5
Graphics: MSI NVIDIA GeForce GTX 1050 2GB
Audio: NVIDIA GP107GL HD Audio
Network: Intel I211 + Intel I210 + 2 x AMD Device 1458 + 2 x AMD Device 1459
OS: Ubuntu 16.04
Kernel: 4.15.0-123-generic (x86_64)
Display Server: X Server
Display Driver: NVIDIA
OpenGL: 1.4 (2.1 Mesa 10.5.4)
Compiler: GCC 5.5.0 20171010
File-System: ext4
Screen Resolution: 800x600
Kernel Details - Transparent Huge Pages: madvise
Compiler Details - sysbench1604Ph10, graphics-magick1604Ph10, ipc-benchmarking1604Ph10, ipc-benchmarking1024-1604Ph10, amg1604Ph10, ramspeed1604Ph10, npb1604Ph10, scimark1604Ph10, cachebench1604Ph10, onednn1604Ph10, apache-ctx_clock1604Ph10, ctx-clock1604Ph10, hackbenchAll1604Ph10, hackbench1604Ph10, mbw1604Ph10, openssl1604Ph10, perf-bench1604Ph10, schbench8-16-1604Ph10, stress-ng1604Ph10, t-test1-1604Ph10, tinymembench1604Ph10: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-checking=release --enable-clocale=gnu --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --enable-libmpx --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib --with-tune=generic --without-cuda-driver -v
Processor Details - Scaling Governor: acpi-cpufreq ondemand (Boost: Enabled) - CPU Microcode: 0x8001250
Security Details:
  itlb_multihit: Not affected
  l1tf: Not affected
  mds: Not affected
  meltdown: Not affected
  spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp
  spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
  spectre_v2: Mitigation of Full AMD retpoline IBPB: conditional STIBP: disabled RSB filling
  srbds: Not affected
  tsx_async_abort: Not affected
Python Details - tensorflow1604PH10: sh: 1: /opt/TensorRT/python: Permission denied + Python 3.5.2
ss: result overview (OpenBenchmarking.org)
Test | Result identifier | Result
npb: EP.D | npb1604Ph10 | 217.22
tinymembench: Standard Memset | tinymembench1604Ph10 | 7800.5
tinymembench: Standard Memcpy | tinymembench1604Ph10 | 8984.2
ramspeed: Average - Integer | ramspeed1604Ph10 | 10366.89
ramspeed: Scale - Integer | ramspeed1604Ph10 | 9654.84
ramspeed: Add - Integer | ramspeed1604Ph10 | 11115.05
ramspeed: Scale - Floating Point | ramspeed1604Ph10 | 9818.92
ramspeed: Add - Floating Point | ramspeed1604Ph10 | 11122.64
ramspeed: Average - Floating Point | ramspeed1604Ph10 | 10488.67
stress-ng: System V Message Passing | stress-ng1604Ph10 | 7377828.11
onednn: Recurrent Neural Network Training - u8s8f32 - CPU | onednn1604Ph10 | 12314.2
onednn: Recurrent Neural Network Training - bf16bf16bf16 - CPU | onednn1604Ph10 | 12262.6
onednn: Recurrent Neural Network Training - f32 - CPU | onednn1604Ph10 | 12254.6
cachebench: Read / Modify / Write | cachebench1604Ph10 | 21639.888567
cachebench: Write | cachebench1604Ph10 | 10865.062305
cachebench: Read | cachebench1604Ph10 | 2086.961474
onednn: Recurrent Neural Network Inference - u8s8f32 - CPU | onednn1604Ph10 | 8577.57
onednn: Recurrent Neural Network Inference - f32 - CPU | onednn1604Ph10 | 8579.99
onednn: Recurrent Neural Network Inference - bf16bf16bf16 - CPU | onednn1604Ph10 | 8572.49
sysbench: CPU | sysbench1604Ph10 | 12018.18
schbench: 8 - 16 | schbench8-16-1604Ph10 | 104686
graphics-magick: Sharpen | graphics-magick1604Ph10 | 82
graphics-magick: Enhanced | graphics-magick1604Ph10 | 120
graphics-magick: Swirl | graphics-magick1604Ph10 | 287
graphics-magick: Noise-Gaussian | graphics-magick1604Ph10 | 120
graphics-magick: Resizing | graphics-magick1604Ph10 | 579
graphics-magick: Rotate | graphics-magick1604Ph10 | 458
graphics-magick: HWB Color Space | graphics-magick1604Ph10 | 711
hackbench: 16 - Thread | hackbench1604Ph10 | 54.266
hackbench: 16 - Process | hackbench1604Ph10 | 53.076
apache: Static Web Page Serving | apache-ctx_clock1604Ph10 | 19458.70
perf-bench: Sched Pipe | perf-bench1604Ph10 | 99785
t-test1: 1 | t-test1-1604Ph10 | 47.910
ipc-benchmark: Unnamed Unix Domain Socket - 128 | ipc-benchmarking1604Ph10 | 1002040
amg: | amg1604Ph10 | 97848307
npb: EP.C | npb1604Ph10 | 221.32
perf-bench: Epoll Wait | perf-bench1604Ph10 | 37459
stress-ng: CPU Stress | stress-ng1604Ph10 | 2077.87
stress-ng: NUMA | stress-ng1604Ph10 | 96.81
stress-ng: Memory Copying | stress-ng1604Ph10 | 600.61
stress-ng: Malloc | stress-ng1604Ph10 | 26445743.26
stress-ng: Crypto | stress-ng1604Ph10 | 1490.69
stress-ng: SENDFILE | stress-ng1604Ph10 | 101151.48
stress-ng: CPU Cache | stress-ng1604Ph10 | 18.30
stress-ng: Atomic | stress-ng1604Ph10 | 226821.97
stress-ng: Glibc Qsort Data Sorting | stress-ng1604Ph10 | 81.02
stress-ng: Glibc C String Functions | stress-ng1604Ph10 | 534980.71
stress-ng: Context Switching | stress-ng1604Ph10 | 2924879.73
stress-ng: Socket Activity | stress-ng1604Ph10 | 3995.67
stress-ng: Vector Math | stress-ng1604Ph10 | 38522.11
stress-ng: Matrix Math | stress-ng1604Ph10 | 26882.41
stress-ng: Semaphores | stress-ng1604Ph10 | 1237491.78
stress-ng: Forking | stress-ng1604Ph10 | 27205.00
stress-ng: MEMFD | stress-ng1604Ph10 | 307.67
stress-ng: MMAP | stress-ng1604Ph10 | 118.16
perf-bench: Futex Lock-Pi | perf-bench1604Ph10 | 892
perf-bench: Futex Hash | perf-bench1604Ph10 | 3649333
perf-bench: Memcpy 1MB | perf-bench1604Ph10 | 14.073058
scimark2: Composite | scimark1604Ph10 | 382.91
mbw: Memory Copy, Fixed Block Size - 1024 MiB | mbw1604Ph10 | 4729.510
onednn: Deconvolution Batch shapes_1d - f32 - CPU | onednn1604Ph10 | 14.6100
onednn: Deconvolution Batch shapes_1d - u8s8f32 - CPU | onednn1604Ph10 | 10.2525
openssl: RSA 4096-bit Performance | openssl1604Ph10 | 1177.1
onednn: IP Shapes 1D - f32 - CPU | onednn1604Ph10 | 17.7590
onednn: IP Shapes 1D - u8s8f32 - CPU | onednn1604Ph10 | 9.77902
t-test1: 2 | t-test1-1604Ph10 | 14.942
mbw: Memory Copy - 1024 MiB | mbw1604Ph10 | 8058.320
ipc-benchmark: TCP Socket - 1024 | ipc-benchmarking1024-1604Ph10 | 1476380
onednn: Matrix Multiply Batch Shapes Transformer - f32 - CPU | onednn1604Ph10 | 10.8067
onednn: Matrix Multiply Batch Shapes Transformer - u8s8f32 - CPU | onednn1604Ph10 | 6.41798
perf-bench: Memset 1MB | perf-bench1604Ph10 | 39.593470
onednn: IP Shapes 3D - f32 - CPU | onednn1604Ph10 | 18.5927
onednn: IP Shapes 3D - u8s8f32 - CPU | onednn1604Ph10 | 4.86345
perf-bench: Syscall Basic | perf-bench1604Ph10 | 14521622
onednn: Convolution Batch Shapes Auto - u8s8f32 - CPU | onednn1604Ph10 | 43.9371
onednn: Convolution Batch Shapes Auto - f32 - CPU | onednn1604Ph10 | 38.3076
ipc-benchmark: Unnamed Unix Domain Socket - 1024 | ipc-benchmarking1024-1604Ph10 | 1498550
ipc-benchmark: FIFO Named Pipe - 1024 | ipc-benchmarking1024-1604Ph10 | 1590154
ipc-benchmark: TCP Socket - 128 | ipc-benchmarking1604Ph10 | 1936803
ipc-benchmark: Unnamed Pipe - 1024 | ipc-benchmarking1024-1604Ph10 | 1647029
ipc-benchmark: FIFO Named Pipe - 128 | ipc-benchmarking1604Ph10 | 1811477
ipc-benchmark: Unnamed Pipe - 128 | ipc-benchmarking1604Ph10 | 1926153
onednn: Deconvolution Batch shapes_3d - f32 - CPU | onednn1604Ph10 | 20.9738
onednn: Deconvolution Batch shapes_3d - u8s8f32 - CPU | onednn1604Ph10 | 13.8261
ctx-clock: Context Switch Time | ctx-clock1604Ph10 | 175
scimark2: Jacobi Successive Over-Relaxation | scimark1604Ph10 | 830.83
scimark2: Dense LU Matrix Factorization | scimark1604Ph10 | 314.80
scimark2: Sparse Matrix Multiply | scimark1604Ph10 | 500.28
scimark2: Fast Fourier Transform | scimark1604Ph10 | 167.16
scimark2: Monte Carlo | scimark1604Ph10 | 101.47
No results are listed in this overview for tensorflow1604PH10 or hackbenchAll1604Ph10.
NAS Parallel Benchmarks 3.4, Test / Class: EP.D. Total Mop/s, More Is Better. npb1604Ph10: 217.22 (SE +/- 0.55, N = 3). Notes: 1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi 2. Open MPI 1.10.2
Tinymembench 2018-05-28, Standard Memset. MB/s, More Is Better. tinymembench1604Ph10: 7800.5 (SE +/- 10.48, N = 3). Notes: 1. (CC) gcc-7 options: -O2 -lm
Tinymembench 2018-05-28, Standard Memcpy. MB/s, More Is Better. tinymembench1604Ph10: 8984.2 (SE +/- 22.06, N = 3). Notes: 1. (CC) gcc-7 options: -O2 -lm
RAMspeed SMP 3.5.0, Type: Average - Benchmark: Integer. MB/s, More Is Better. ramspeed1604Ph10: 10366.89 (SE +/- 18.60, N = 3). Notes: 1. (CC) gcc-7 options: -O3 -march=native
RAMspeed SMP 3.5.0, Type: Scale - Benchmark: Integer. MB/s, More Is Better. ramspeed1604Ph10: 9654.84 (SE +/- 35.92, N = 3). Notes: 1. (CC) gcc-7 options: -O3 -march=native
RAMspeed SMP 3.5.0, Type: Add - Benchmark: Integer. MB/s, More Is Better. ramspeed1604Ph10: 11115.05 (SE +/- 7.81, N = 3). Notes: 1. (CC) gcc-7 options: -O3 -march=native
RAMspeed SMP 3.5.0, Type: Scale - Benchmark: Floating Point. MB/s, More Is Better. ramspeed1604Ph10: 9818.92 (SE +/- 5.81, N = 3). Notes: 1. (CC) gcc-7 options: -O3 -march=native
RAMspeed SMP 3.5.0, Type: Add - Benchmark: Floating Point. MB/s, More Is Better. ramspeed1604Ph10: 11122.64 (SE +/- 8.14, N = 3). Notes: 1. (CC) gcc-7 options: -O3 -march=native
RAMspeed SMP 3.5.0, Type: Average - Benchmark: Floating Point. MB/s, More Is Better. ramspeed1604Ph10: 10488.67 (SE +/- 5.49, N = 3). Notes: 1. (CC) gcc-7 options: -O3 -march=native
Stress-NG 0.11.07, Test: System V Message Passing. Bogo Ops/s, More Is Better. stress-ng1604Ph10: 7377828.11 (SE +/- 92072.90, N = 15). Notes: 1. (CC) gcc-7 options: -O2 -std=gnu99 -lm -laio -lcrypt -lrt -lz -ldl -lpthread -lc
oneDNN 2.1.2, Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU. ms, Fewer Is Better. onednn1604Ph10: 12314.2 (SE +/- 27.82, N = 3; MIN: 12240.5). Notes: 1. (CXX) g++-7 options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 2.1.2, Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU. ms, Fewer Is Better. onednn1604Ph10: 12262.6 (SE +/- 27.97, N = 3; MIN: 12196.2). Notes: 1. (CXX) g++-7 options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 2.1.2, Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU. ms, Fewer Is Better. onednn1604Ph10: 12254.6 (SE +/- 38.73, N = 3; MIN: 12140.4). Notes: 1. (CXX) g++-7 options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
CacheBench, Test: Read / Modify / Write. MB/s, More Is Better. cachebench1604Ph10: 21639.89 (SE +/- 39.55, N = 3; MIN: 18656.61 / MAX: 22903.78). Notes: 1. (CC) gcc-7 options: -lrt
CacheBench, Test: Write. MB/s, More Is Better. cachebench1604Ph10: 10865.06 (SE +/- 20.48, N = 3; MIN: 9466.53 / MAX: 11482.88). Notes: 1. (CC) gcc-7 options: -lrt
CacheBench, Test: Read. MB/s, More Is Better. cachebench1604Ph10: 2086.96 (SE +/- 4.55, N = 3; MIN: 2078.1 / MAX: 2093.84). Notes: 1. (CC) gcc-7 options: -lrt
oneDNN 2.1.2, Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU. ms, Fewer Is Better. onednn1604Ph10: 8577.57 (SE +/- 13.01, N = 3; MIN: 8526.62). Notes: 1. (CXX) g++-7 options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 2.1.2, Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU. ms, Fewer Is Better. onednn1604Ph10: 8579.99 (SE +/- 12.76, N = 3; MIN: 8534.51). Notes: 1. (CXX) g++-7 options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 2.1.2, Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU. ms, Fewer Is Better. onednn1604Ph10: 8572.49 (SE +/- 23.27, N = 3; MIN: 8526.54). Notes: 1. (CXX) g++-7 options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
Sysbench 1.0.20, Test: CPU. Events Per Second, More Is Better. sysbench1604Ph10: 12018.18 (SE +/- 54.78, N = 3). Notes: 1. (CC) gcc-7 options: -pthread -O2 -funroll-loops -rdynamic -ldl -laio -lm
Schbench, Message Threads: 8 - Workers Per Message Thread: 16. usec, 99.9th Latency Percentile, Fewer Is Better. schbench8-16-1604Ph10: 104686 (SE +/- 1375.04, N = 7). Notes: 1. (CC) gcc-7 options: -O2 -lpthread
GraphicsMagick 1.3.33, Operation: Sharpen. Iterations Per Minute, More Is Better. graphics-magick1604Ph10: 82. Notes: 1. (CC) gcc-7 options: -fopenmp -O2 -pthread -ljbig -ltiff -ljasper -ljpeg -lXext -lSM -lICE -lX11 -llzma -lbz2 -lxml2 -lz -lm -lpthread
GraphicsMagick 1.3.33, Operation: Enhanced. Iterations Per Minute, More Is Better. graphics-magick1604Ph10: 120 (SE +/- 0.33, N = 3). Notes: 1. (CC) gcc-7 options: -fopenmp -O2 -pthread -ljbig -ltiff -ljasper -ljpeg -lXext -lSM -lICE -lX11 -llzma -lbz2 -lxml2 -lz -lm -lpthread
GraphicsMagick 1.3.33, Operation: Swirl. Iterations Per Minute, More Is Better. graphics-magick1604Ph10: 287 (SE +/- 0.88, N = 3). Notes: 1. (CC) gcc-7 options: -fopenmp -O2 -pthread -ljbig -ltiff -ljasper -ljpeg -lXext -lSM -lICE -lX11 -llzma -lbz2 -lxml2 -lz -lm -lpthread
GraphicsMagick 1.3.33, Operation: Noise-Gaussian. Iterations Per Minute, More Is Better. graphics-magick1604Ph10: 120. Notes: 1. (CC) gcc-7 options: -fopenmp -O2 -pthread -ljbig -ltiff -ljasper -ljpeg -lXext -lSM -lICE -lX11 -llzma -lbz2 -lxml2 -lz -lm -lpthread
GraphicsMagick 1.3.33, Operation: Resizing. Iterations Per Minute, More Is Better. graphics-magick1604Ph10: 579 (SE +/- 0.33, N = 3). Notes: 1. (CC) gcc-7 options: -fopenmp -O2 -pthread -ljbig -ltiff -ljasper -ljpeg -lXext -lSM -lICE -lX11 -llzma -lbz2 -lxml2 -lz -lm -lpthread
GraphicsMagick 1.3.33, Operation: Rotate. Iterations Per Minute, More Is Better. graphics-magick1604Ph10: 458 (SE +/- 1.86, N = 3). Notes: 1. (CC) gcc-7 options: -fopenmp -O2 -pthread -ljbig -ltiff -ljasper -ljpeg -lXext -lSM -lICE -lX11 -llzma -lbz2 -lxml2 -lz -lm -lpthread
GraphicsMagick 1.3.33, Operation: HWB Color Space. Iterations Per Minute, More Is Better. graphics-magick1604Ph10: 711 (SE +/- 0.67, N = 3). Notes: 1. (CC) gcc-7 options: -fopenmp -O2 -pthread -ljbig -ltiff -ljasper -ljpeg -lXext -lSM -lICE -lX11 -llzma -lbz2 -lxml2 -lz -lm -lpthread
Hackbench, Count: 16 - Type: Thread. Seconds, Fewer Is Better. hackbench1604Ph10: 54.27 (SE +/- 0.77, N = 3). Notes: 1. (CC) gcc-7 options: -lpthread
Hackbench, Count: 16 - Type: Process. Seconds, Fewer Is Better. hackbench1604Ph10: 53.08 (SE +/- 0.64, N = 3). Notes: 1. (CC) gcc-7 options: -lpthread
Apache Benchmark 2.4.29, Static Web Page Serving. Requests Per Second, More Is Better. apache-ctx_clock1604Ph10: 19458.70 (SE +/- 173.96, N = 3). Notes: 1. (CC) gcc-7 options: -shared -fPIC -O2 -pthread
perf-bench, Benchmark: Sched Pipe. ops/sec, More Is Better. perf-bench1604Ph10: 99785 (SE +/- 340.78, N = 3). Notes: 1. (CC) gcc-7 options: -O6 -ggdb3 -funwind-tables -std=gnu99 -Xlinker -export-dynamic -lpthread -lrt -lm -ldl -lcrypto -lperl -lc -lcrypt -lpython2.7 -lutil -lz -llzma -lnuma
t-test1 2017-01-13, Threads: 1. Seconds, Fewer Is Better. t-test1-1604Ph10: 47.91 (SE +/- 0.20, N = 3). Notes: 1. (CC) gcc-7 options: -pthread
IPC_benchmark, Type: Unnamed Unix Domain Socket - Message Bytes: 128. Messages Per Second, More Is Better. ipc-benchmarking1604Ph10: 1002040 (SE +/- 43657.25, N = 15).
Algebraic Multi-Grid Benchmark 1.2. Figure Of Merit, More Is Better. amg1604Ph10: 97848307 (SE +/- 47423.33, N = 3). Notes: 1. (CC) gcc-7 options: -lparcsr_ls -lparcsr_mv -lseq_mv -lIJ_mv -lkrylov -lHYPRE_utilities -lm -fopenmp -pthread -lmpi
NAS Parallel Benchmarks 3.4, Test / Class: EP.C. Total Mop/s, More Is Better. npb1604Ph10: 221.32 (SE +/- 0.38, N = 3). Notes: 1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi 2. Open MPI 1.10.2
perf-bench, Benchmark: Epoll Wait. ops/sec, More Is Better. perf-bench1604Ph10: 37459 (SE +/- 36.75, N = 3). Notes: 1. (CC) gcc-7 options: -O6 -ggdb3 -funwind-tables -std=gnu99 -Xlinker -export-dynamic -lpthread -lrt -lm -ldl -lcrypto -lperl -lc -lcrypt -lpython2.7 -lutil -lz -llzma -lnuma
Stress-NG 0.11.07, Test: CPU Stress. Bogo Ops/s, More Is Better. stress-ng1604Ph10: 2077.87 (SE +/- 20.72, N = 3). Notes: 1. (CC) gcc-7 options: -O2 -std=gnu99 -lm -laio -lcrypt -lrt -lz -ldl -lpthread -lc
Stress-NG 0.11.07, Test: NUMA. Bogo Ops/s, More Is Better. stress-ng1604Ph10: 96.81 (SE +/- 0.85, N = 3). Notes: 1. (CC) gcc-7 options: -O2 -std=gnu99 -lm -laio -lcrypt -lrt -lz -ldl -lpthread -lc
Stress-NG 0.11.07, Test: Memory Copying. Bogo Ops/s, More Is Better. stress-ng1604Ph10: 600.61 (SE +/- 0.40, N = 3). Notes: 1. (CC) gcc-7 options: -O2 -std=gnu99 -lm -laio -lcrypt -lrt -lz -ldl -lpthread -lc
Stress-NG 0.11.07, Test: Malloc. Bogo Ops/s, More Is Better. stress-ng1604Ph10: 26445743.26 (SE +/- 17076.95, N = 3). Notes: 1. (CC) gcc-7 options: -O2 -std=gnu99 -lm -laio -lcrypt -lrt -lz -ldl -lpthread -lc
Stress-NG 0.11.07, Test: Crypto. Bogo Ops/s, More Is Better. stress-ng1604Ph10: 1490.69 (SE +/- 0.39, N = 3). Notes: 1. (CC) gcc-7 options: -O2 -std=gnu99 -lm -laio -lcrypt -lrt -lz -ldl -lpthread -lc
Stress-NG 0.11.07, Test: SENDFILE. Bogo Ops/s, More Is Better. stress-ng1604Ph10: 101151.48 (SE +/- 320.48, N = 3). Notes: 1. (CC) gcc-7 options: -O2 -std=gnu99 -lm -laio -lcrypt -lrt -lz -ldl -lpthread -lc
Stress-NG 0.11.07, Test: CPU Cache. Bogo Ops/s, More Is Better. stress-ng1604Ph10: 18.30 (SE +/- 0.35, N = 3). Notes: 1. (CC) gcc-7 options: -O2 -std=gnu99 -lm -laio -lcrypt -lrt -lz -ldl -lpthread -lc
Stress-NG 0.11.07, Test: Atomic. Bogo Ops/s, More Is Better. stress-ng1604Ph10: 226821.97 (SE +/- 292.77, N = 3). Notes: 1. (CC) gcc-7 options: -O2 -std=gnu99 -lm -laio -lcrypt -lrt -lz -ldl -lpthread -lc
Stress-NG 0.11.07, Test: Glibc Qsort Data Sorting. Bogo Ops/s, More Is Better. stress-ng1604Ph10: 81.02 (SE +/- 0.46, N = 3). Notes: 1. (CC) gcc-7 options: -O2 -std=gnu99 -lm -laio -lcrypt -lrt -lz -ldl -lpthread -lc
Stress-NG 0.11.07, Test: Glibc C String Functions. Bogo Ops/s, More Is Better. stress-ng1604Ph10: 534980.71 (SE +/- 36.12, N = 3). Notes: 1. (CC) gcc-7 options: -O2 -std=gnu99 -lm -laio -lcrypt -lrt -lz -ldl -lpthread -lc
Stress-NG 0.11.07, Test: Context Switching. Bogo Ops/s, More Is Better. stress-ng1604Ph10: 2924879.73 (SE +/- 29565.69, N = 3). Notes: 1. (CC) gcc-7 options: -O2 -std=gnu99 -lm -laio -lcrypt -lrt -lz -ldl -lpthread -lc
Stress-NG 0.11.07, Test: Socket Activity. Bogo Ops/s, More Is Better. stress-ng1604Ph10: 3995.67 (SE +/- 51.78, N = 3). Notes: 1. (CC) gcc-7 options: -O2 -std=gnu99 -lm -laio -lcrypt -lrt -lz -ldl -lpthread -lc
Stress-NG 0.11.07, Test: Vector Math. Bogo Ops/s, More Is Better. stress-ng1604Ph10: 38522.11 (SE +/- 20.76, N = 3). Notes: 1. (CC) gcc-7 options: -O2 -std=gnu99 -lm -laio -lcrypt -lrt -lz -ldl -lpthread -lc
Stress-NG 0.11.07, Test: Matrix Math. Bogo Ops/s, More Is Better. stress-ng1604Ph10: 26882.41 (SE +/- 48.19, N = 3). Notes: 1. (CC) gcc-7 options: -O2 -std=gnu99 -lm -laio -lcrypt -lrt -lz -ldl -lpthread -lc
Stress-NG 0.11.07, Test: Semaphores. Bogo Ops/s, More Is Better. stress-ng1604Ph10: 1237491.78 (SE +/- 332.73, N = 3). Notes: 1. (CC) gcc-7 options: -O2 -std=gnu99 -lm -laio -lcrypt -lrt -lz -ldl -lpthread -lc
Stress-NG 0.11.07, Test: Forking. Bogo Ops/s, More Is Better. stress-ng1604Ph10: 27205.00 (SE +/- 369.68, N = 3). Notes: 1. (CC) gcc-7 options: -O2 -std=gnu99 -lm -laio -lcrypt -lrt -lz -ldl -lpthread -lc
Stress-NG 0.11.07, Test: MEMFD. Bogo Ops/s, More Is Better. stress-ng1604Ph10: 307.67 (SE +/- 0.20, N = 3). Notes: 1. (CC) gcc-7 options: -O2 -std=gnu99 -lm -laio -lcrypt -lrt -lz -ldl -lpthread -lc
Stress-NG 0.11.07, Test: MMAP. Bogo Ops/s, More Is Better. stress-ng1604Ph10: 118.16 (SE +/- 0.36, N = 3). Notes: 1. (CC) gcc-7 options: -O2 -std=gnu99 -lm -laio -lcrypt -lrt -lz -ldl -lpthread -lc
perf-bench, Benchmark: Futex Lock-Pi. ops/sec, More Is Better. perf-bench1604Ph10: 892. Notes: 1. (CC) gcc-7 options: -O6 -ggdb3 -funwind-tables -std=gnu99 -Xlinker -export-dynamic -lpthread -lrt -lm -ldl -lcrypto -lperl -lc -lcrypt -lpython2.7 -lutil -lz -llzma -lnuma
perf-bench, Benchmark: Futex Hash. ops/sec, More Is Better. perf-bench1604Ph10: 3649333 (SE +/- 8513.42, N = 3). Notes: 1. (CC) gcc-7 options: -O6 -ggdb3 -funwind-tables -std=gnu99 -Xlinker -export-dynamic -lpthread -lrt -lm -ldl -lcrypto -lperl -lc -lcrypt -lpython2.7 -lutil -lz -llzma -lnuma
perf-bench, Benchmark: Memcpy 1MB. GB/sec, More Is Better. perf-bench1604Ph10: 14.07 (SE +/- 0.09, N = 3). Notes: 1. (CC) gcc-7 options: -O6 -ggdb3 -funwind-tables -std=gnu99 -Xlinker -export-dynamic -lpthread -lrt -lm -ldl -lcrypto -lperl -lc -lcrypt -lpython2.7 -lutil -lz -llzma -lnuma
SciMark 2.0, Computational Test: Composite. Mflops, More Is Better. scimark1604Ph10: 382.91 (SE +/- 3.97, N = 3). Notes: 1. (CC) gcc-7 options: -lm
MBW 2018-09-08, Test: Memory Copy, Fixed Block Size - Array Size: 1024 MiB. MiB/s, More Is Better. mbw1604Ph10: 4729.51 (SE +/- 20.26, N = 3). Notes: 1. (CC) gcc-7 options: -O3 -march=native
oneDNN 2.1.2, Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU. ms, Fewer Is Better. onednn1604Ph10: 14.61 (SE +/- 0.03, N = 3; MIN: 13.03). Notes: 1. (CXX) g++-7 options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 2.1.2, Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU. ms, Fewer Is Better. onednn1604Ph10: 10.25 (SE +/- 0.01, N = 3; MIN: 9.57). Notes: 1. (CXX) g++-7 options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
OpenSSL 1.1.1, RSA 4096-bit Performance. Signs Per Second, More Is Better. openssl1604Ph10: 1177.1 (SE +/- 2.35, N = 3). Notes: 1. (CC) gcc-7 options: -pthread -m64 -O3 -lssl -lcrypto -ldl
oneDNN 2.1.2, Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU. ms, Fewer Is Better. onednn1604Ph10: 17.76 (SE +/- 0.11, N = 3; MIN: 16.64). Notes: 1. (CXX) g++-7 options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 2.1.2, Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPU. ms, Fewer Is Better. onednn1604Ph10: 9.77902 (SE +/- 0.00477, N = 3; MIN: 9.18). Notes: 1. (CXX) g++-7 options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
t-test1 2017-01-13, Threads: 2. Seconds, Fewer Is Better. t-test1-1604Ph10: 14.94 (SE +/- 0.02, N = 3). Notes: 1. (CC) gcc-7 options: -pthread
MBW 2018-09-08, Test: Memory Copy - Array Size: 1024 MiB. MiB/s, More Is Better. mbw1604Ph10: 8058.32 (SE +/- 14.77, N = 3). Notes: 1. (CC) gcc-7 options: -O3 -march=native
IPC_benchmark, Type: TCP Socket - Message Bytes: 1024. Messages Per Second, More Is Better. ipc-benchmarking1024-1604Ph10: 1476380 (SE +/- 5988.24, N = 3).
oneDNN 2.1.2, Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU. ms, Fewer Is Better. onednn1604Ph10: 10.81 (SE +/- 0.06, N = 3; MIN: 10.37). Notes: 1. (CXX) g++-7 options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 2.1.2, Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32 - Engine: CPU. ms, Fewer Is Better. onednn1604Ph10: 6.41798 (SE +/- 0.00458, N = 3; MIN: 6.09). Notes: 1. (CXX) g++-7 options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
perf-bench, Benchmark: Memset 1MB. GB/sec, More Is Better. perf-bench1604Ph10: 39.59 (SE +/- 0.10, N = 3). Notes: 1. (CC) gcc-7 options: -O6 -ggdb3 -funwind-tables -std=gnu99 -Xlinker -export-dynamic -lpthread -lrt -lm -ldl -lcrypto -lperl -lc -lcrypt -lpython2.7 -lutil -lz -llzma -lnuma
oneDNN 2.1.2, Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU. ms, Fewer Is Better. onednn1604Ph10: 18.59 (SE +/- 0.02, N = 3; MIN: 18.25). Notes: 1. (CXX) g++-7 options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 2.1.2, Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU. ms, Fewer Is Better. onednn1604Ph10: 4.86345 (SE +/- 0.00067, N = 3; MIN: 4.61). Notes: 1. (CXX) g++-7 options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
perf-bench, Benchmark: Syscall Basic. ops/sec, More Is Better. perf-bench1604Ph10: 14521622 (SE +/- 36942.47, N = 3). Notes: 1. (CC) gcc-7 options: -O6 -ggdb3 -funwind-tables -std=gnu99 -Xlinker -export-dynamic -lpthread -lrt -lm -ldl -lcrypto -lperl -lc -lcrypt -lpython2.7 -lutil -lz -llzma -lnuma
oneDNN 2.1.2, Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU. ms, Fewer Is Better. onednn1604Ph10: 43.94 (SE +/- 0.02, N = 3; MIN: 43.44). Notes: 1. (CXX) g++-7 options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 2.1.2, Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU. ms, Fewer Is Better. onednn1604Ph10: 38.31 (SE +/- 0.00, N = 3; MIN: 37.4). Notes: 1. (CXX) g++-7 options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
IPC_benchmark, Type: Unnamed Unix Domain Socket - Message Bytes: 1024. Messages Per Second, More Is Better. ipc-benchmarking1024-1604Ph10: 1498550 (SE +/- 7449.83, N = 3).
IPC_benchmark, Type: FIFO Named Pipe - Message Bytes: 1024. Messages Per Second, More Is Better. ipc-benchmarking1024-1604Ph10: 1590154 (SE +/- 9737.22, N = 3).
IPC_benchmark, Type: TCP Socket - Message Bytes: 128. Messages Per Second, More Is Better. ipc-benchmarking1604Ph10: 1936803 (SE +/- 4379.21, N = 3).
IPC_benchmark, Type: Unnamed Pipe - Message Bytes: 1024. Messages Per Second, More Is Better. ipc-benchmarking1024-1604Ph10: 1647029 (SE +/- 13168.11, N = 3).
IPC_benchmark, Type: FIFO Named Pipe - Message Bytes: 128. Messages Per Second, More Is Better. ipc-benchmarking1604Ph10: 1811477 (SE +/- 19705.06, N = 3).
IPC_benchmark, Type: Unnamed Pipe - Message Bytes: 128. Messages Per Second, More Is Better. ipc-benchmarking1604Ph10: 1926153 (SE +/- 7932.21, N = 3).
oneDNN 2.1.2, Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU. ms, Fewer Is Better. onednn1604Ph10: 20.97 (SE +/- 0.06, N = 3; MIN: 19.86). Notes: 1. (CXX) g++-7 options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 2.1.2, Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU. ms, Fewer Is Better. onednn1604Ph10: 13.83 (SE +/- 0.06, N = 3; MIN: 13.37). Notes: 1. (CXX) g++-7 options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
ctx_clock, Context Switch Time. Clocks, Fewer Is Better. ctx-clock1604Ph10: 175.
SciMark 2.0, Computational Test: Jacobi Successive Over-Relaxation. Mflops, More Is Better. scimark1604Ph10: 830.83 (SE +/- 22.88, N = 3). Notes: 1. (CC) gcc-7 options: -lm
SciMark 2.0, Computational Test: Dense LU Matrix Factorization. Mflops, More Is Better. scimark1604Ph10: 314.80 (SE +/- 1.26, N = 3). Notes: 1. (CC) gcc-7 options: -lm
SciMark 2.0, Computational Test: Sparse Matrix Multiply. Mflops, More Is Better. scimark1604Ph10: 500.28 (SE +/- 1.11, N = 3). Notes: 1. (CC) gcc-7 options: -lm
SciMark 2.0, Computational Test: Fast Fourier Transform. Mflops, More Is Better. scimark1604Ph10: 167.16 (SE +/- 0.58, N = 3). Notes: 1. (CC) gcc-7 options: -lm
SciMark 2.0, Computational Test: Monte Carlo. Mflops, More Is Better. scimark1604Ph10: 101.47 (SE +/- 0.31, N = 3). Notes: 1. (CC) gcc-7 options: -lm
Phoronix Test Suite v10.8.5