ss AMD EPYC 3255 8-Core Temp testing with a congatec conga-B7E3 (5.13 BIOS) and MSI NVIDIA GeForce GTX 1050 2GB on Ubuntu 16.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2106293-IB-SS230099426&sor&export=txt&gru .
ss: System Details

Result identifiers: sysbench1604Ph10, graphics-magick1604Ph10, ipc-benchmarking1604Ph10, ipc-benchmarking1024-1604Ph10, amg1604Ph10, tensorflow1604PH10, ramspeed1604Ph10, npb1604Ph10, scimark1604Ph10, cachebench1604Ph10, onednn1604Ph10, apache-ctx_clock1604Ph10, ctx-clock1604Ph10, hackbenchAll1604Ph10, hackbench1604Ph10, mbw1604Ph10, openssl1604Ph10, perf-bench1604Ph10, schbench8-16-1604Ph10, stress-ng1604Ph10, t-test1-1604Ph10, tinymembench1604Ph10

Processor: AMD EPYC 3255 8-Core Temp @ 2.50GHz (8 Cores / 16 Threads)
Motherboard: congatec conga-B7E3 (5.13 BIOS)
Chipset: AMD 17h
Memory: 32GB
Disk: 2000GB Samsung SSD 970 EVO 2TB + 2000GB Portable SSD T5
Graphics: MSI NVIDIA GeForce GTX 1050 2GB
Audio: NVIDIA GP107GL HD Audio
Network: Intel I211 + Intel I210 + 2 x AMD Device 1458 + 2 x AMD Device 1459
OS: Ubuntu 16.04
Kernel: 4.15.0-123-generic (x86_64)
Display Server: X Server
Display Driver: NVIDIA
OpenGL: 1.4 (2.1 Mesa 10.5.4)
Compiler: GCC 5.5.0 20171010
File-System: ext4
Screen Resolution: 800x600

Kernel Details - Transparent Huge Pages: madvise
Compiler Details - sysbench1604Ph10, graphics-magick1604Ph10, ipc-benchmarking1604Ph10, ipc-benchmarking1024-1604Ph10, amg1604Ph10, ramspeed1604Ph10, npb1604Ph10, scimark1604Ph10, cachebench1604Ph10, onednn1604Ph10, apache-ctx_clock1604Ph10, ctx-clock1604Ph10, hackbenchAll1604Ph10, hackbench1604Ph10, mbw1604Ph10, openssl1604Ph10, perf-bench1604Ph10, schbench8-16-1604Ph10, stress-ng1604Ph10, t-test1-1604Ph10, tinymembench1604Ph10: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-checking=release --enable-clocale=gnu --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --enable-libmpx --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib --with-tune=generic --without-cuda-driver -v
Processor Details - Scaling Governor: acpi-cpufreq ondemand (Boost: Enabled) - CPU Microcode: 0x8001250
Security Details - itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Full AMD retpoline IBPB: conditional STIBP: disabled RSB filling + srbds: Not affected + tsx_async_abort: Not affected
Python Details - tensorflow1604PH10: sh: 1: /opt/TensorRT/python: Permission denied + Python 3.5.2

ss: Results Overview

Each row reads: test = value [result identifier]. Result identifiers with no values in this overview: tensorflow1604PH10, hackbenchAll1604Ph10.

stress-ng: MMAP = 118.16 [stress-ng1604Ph10]
stress-ng: NUMA = 96.81 [stress-ng1604Ph10]
stress-ng: MEMFD = 307.67 [stress-ng1604Ph10]
stress-ng: Atomic = 226821.97 [stress-ng1604Ph10]
stress-ng: Crypto = 1490.69 [stress-ng1604Ph10]
stress-ng: Malloc = 26445743.26 [stress-ng1604Ph10]
stress-ng: Forking = 27205.00 [stress-ng1604Ph10]
stress-ng: SENDFILE = 101151.48 [stress-ng1604Ph10]
stress-ng: CPU Cache = 18.30 [stress-ng1604Ph10]
stress-ng: CPU Stress = 2077.87 [stress-ng1604Ph10]
stress-ng: Semaphores = 1237491.78 [stress-ng1604Ph10]
stress-ng: Matrix Math = 26882.41 [stress-ng1604Ph10]
stress-ng: Vector Math = 38522.11 [stress-ng1604Ph10]
stress-ng: Memory Copying = 600.61 [stress-ng1604Ph10]
stress-ng: Socket Activity = 3995.67 [stress-ng1604Ph10]
stress-ng: Context Switching = 2924879.73 [stress-ng1604Ph10]
stress-ng: Glibc C String Functions = 534980.71 [stress-ng1604Ph10]
stress-ng: Glibc Qsort Data Sorting = 81.02 [stress-ng1604Ph10]
stress-ng: System V Message Passing = 7377828.11 [stress-ng1604Ph10]
sysbench: CPU = 12018.18 [sysbench1604Ph10]
amg = 97848307 [amg1604Ph10]
perf-bench: Memcpy 1MB = 14.073058 [perf-bench1604Ph10]
perf-bench: Memset 1MB = 39.593470 [perf-bench1604Ph10]
graphics-magick: Swirl = 287 [graphics-magick1604Ph10]
graphics-magick: Rotate = 458 [graphics-magick1604Ph10]
graphics-magick: Sharpen = 82 [graphics-magick1604Ph10]
graphics-magick: Enhanced = 120 [graphics-magick1604Ph10]
graphics-magick: Resizing = 579 [graphics-magick1604Ph10]
graphics-magick: Noise-Gaussian = 120 [graphics-magick1604Ph10]
graphics-magick: HWB Color Space = 711 [graphics-magick1604Ph10]
ramspeed: Add - Integer = 11115.05 [ramspeed1604Ph10]
ramspeed: Scale - Integer = 9654.84 [ramspeed1604Ph10]
ramspeed: Average - Integer = 10366.89 [ramspeed1604Ph10]
ramspeed: Add - Floating Point = 11122.64 [ramspeed1604Ph10]
ramspeed: Scale - Floating Point = 9818.92 [ramspeed1604Ph10]
ramspeed: Average - Floating Point = 10488.67 [ramspeed1604Ph10]
cachebench: Read = 2086.961474 [cachebench1604Ph10]
cachebench: Write = 10865.062305 [cachebench1604Ph10]
cachebench: Read / Modify / Write = 21639.888567 [cachebench1604Ph10]
tinymembench: Standard Memcpy = 8984.2 [tinymembench1604Ph10]
tinymembench: Standard Memset = 7800.5 [tinymembench1604Ph10]
ipc-benchmark: TCP Socket - 128 = 1936803 [ipc-benchmarking1604Ph10]
ipc-benchmark: Unnamed Pipe - 128 = 1926153 [ipc-benchmarking1604Ph10]
ipc-benchmark: FIFO Named Pipe - 128 = 1811477 [ipc-benchmarking1604Ph10]
ipc-benchmark: Unnamed Unix Domain Socket - 128 = 1002040 [ipc-benchmarking1604Ph10]
ipc-benchmark: TCP Socket - 1024 = 1476380 [ipc-benchmarking1024-1604Ph10]
ipc-benchmark: Unnamed Pipe - 1024 = 1647029 [ipc-benchmarking1024-1604Ph10]
ipc-benchmark: FIFO Named Pipe - 1024 = 1590154 [ipc-benchmarking1024-1604Ph10]
ipc-benchmark: Unnamed Unix Domain Socket - 1024 = 1498550 [ipc-benchmarking1024-1604Ph10]
scimark2: Composite = 382.91 [scimark1604Ph10]
scimark2: Monte Carlo = 101.47 [scimark1604Ph10]
scimark2: Fast Fourier Transform = 167.16 [scimark1604Ph10]
scimark2: Sparse Matrix Multiply = 500.28 [scimark1604Ph10]
scimark2: Dense LU Matrix Factorization = 314.80 [scimark1604Ph10]
scimark2: Jacobi Successive Over-Relaxation = 830.83 [scimark1604Ph10]
mbw: Memory Copy - 1024 MiB = 8058.320 [mbw1604Ph10]
mbw: Memory Copy, Fixed Block Size - 1024 MiB = 4729.510 [mbw1604Ph10]
perf-bench: Epoll Wait = 37459 [perf-bench1604Ph10]
perf-bench: Futex Hash = 3649333 [perf-bench1604Ph10]
perf-bench: Sched Pipe = 99785 [perf-bench1604Ph10]
perf-bench: Futex Lock-Pi = 892 [perf-bench1604Ph10]
perf-bench: Syscall Basic = 14521622 [perf-bench1604Ph10]
apache: Static Web Page Serving = 19458.70 [apache-ctx_clock1604Ph10]
openssl: RSA 4096-bit Performance = 1177.1 [openssl1604Ph10]
npb: EP.C = 221.32 [npb1604Ph10]
npb: EP.D = 217.22 [npb1604Ph10]
ctx-clock: Context Switch Time = 175 [ctx-clock1604Ph10]
onednn: IP Shapes 1D - f32 - CPU = 17.7590 [onednn1604Ph10]
onednn: IP Shapes 3D - f32 - CPU = 18.5927 [onednn1604Ph10]
onednn: IP Shapes 1D - u8s8f32 - CPU = 9.77902 [onednn1604Ph10]
onednn: IP Shapes 3D - u8s8f32 - CPU = 4.86345 [onednn1604Ph10]
onednn: Convolution Batch Shapes Auto - f32 - CPU = 38.3076 [onednn1604Ph10]
onednn: Deconvolution Batch shapes_1d - f32 - CPU = 14.6100 [onednn1604Ph10]
onednn: Deconvolution Batch shapes_3d - f32 - CPU = 20.9738 [onednn1604Ph10]
onednn: Convolution Batch Shapes Auto - u8s8f32 - CPU = 43.9371 [onednn1604Ph10]
onednn: Deconvolution Batch shapes_1d - u8s8f32 - CPU = 10.2525 [onednn1604Ph10]
onednn: Deconvolution Batch shapes_3d - u8s8f32 - CPU = 13.8261 [onednn1604Ph10]
onednn: Recurrent Neural Network Training - f32 - CPU = 12254.6 [onednn1604Ph10]
onednn: Recurrent Neural Network Inference - f32 - CPU = 8579.99 [onednn1604Ph10]
onednn: Recurrent Neural Network Training - u8s8f32 - CPU = 12314.2 [onednn1604Ph10]
onednn: Recurrent Neural Network Inference - u8s8f32 - CPU = 8577.57 [onednn1604Ph10]
onednn: Matrix Multiply Batch Shapes Transformer - f32 - CPU = 10.8067 [onednn1604Ph10]
onednn: Recurrent Neural Network Training - bf16bf16bf16 - CPU = 12262.6 [onednn1604Ph10]
onednn: Recurrent Neural Network Inference - bf16bf16bf16 - CPU = 8572.49 [onednn1604Ph10]
onednn: Matrix Multiply Batch Shapes Transformer - u8s8f32 - CPU = 6.41798 [onednn1604Ph10]
hackbench: 16 - Thread = 54.266 [hackbench1604Ph10]
hackbench: 16 - Process = 53.076 [hackbench1604Ph10]
t-test1: 1 = 47.910 [t-test1-1604Ph10]
t-test1: 2 = 14.942 [t-test1-1604Ph10]
schbench: 8 - 16 = 104686 [schbench8-16-1604Ph10]

Stress-NG 0.11.07 (Bogo Ops/s, More Is Better; stress-ng1604Ph10)
Test: MMAP = 118.16 (SE +/- 0.36, N = 3)
Test: NUMA = 96.81 (SE +/- 0.85, N = 3)
Test: MEMFD = 307.67 (SE +/- 0.20, N = 3)
Test: Atomic = 226821.97 (SE +/- 292.77, N = 3)
Test: Crypto = 1490.69 (SE +/- 0.39, N = 3)
Test: Malloc = 26445743.26 (SE +/- 17076.95, N = 3)
Test: Forking = 27205.00 (SE +/- 369.68, N = 3)
Test: SENDFILE = 101151.48 (SE +/- 320.48, N = 3)
Test: CPU Cache = 18.30 (SE +/- 0.35, N = 3)
Test: CPU Stress = 2077.87 (SE +/- 20.72, N = 3)
Test: Semaphores = 1237491.78 (SE +/- 332.73, N = 3)
Test: Matrix Math = 26882.41 (SE +/- 48.19, N = 3)
Test: Vector Math = 38522.11 (SE +/- 20.76, N = 3)
Test: Memory Copying = 600.61 (SE +/- 0.40, N = 3)
Test: Socket Activity = 3995.67 (SE +/- 51.78, N = 3)
Test: Context Switching = 2924879.73 (SE +/- 29565.69, N = 3)
Test: Glibc C String Functions = 534980.71 (SE +/- 36.12, N = 3)
Test: Glibc Qsort Data Sorting = 81.02 (SE +/- 0.46, N = 3)
Test: System V Message Passing = 7377828.11 (SE +/- 92072.90, N = 15)
1. (CC) gcc-7 options: -O2 -std=gnu99 -lm -laio -lcrypt -lrt -lz -ldl -lpthread -lc

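The SE values in the Stress-NG results are easier to compare across tests when expressed relative to each mean. The following is a minimal sketch, assuming "SE" is the standard error of the mean reported alongside each result; `relative_se` is a hypothetical helper name, and the three (mean, SE) pairs are copied from the Stress-NG results above.

```python
def relative_se(mean: float, se: float) -> float:
    """Return the standard error as a percentage of the mean."""
    return 100.0 * se / mean


# (mean, SE) pairs copied from the Stress-NG results above.
results = {
    "MMAP": (118.16, 0.36),
    "Forking": (27205.00, 369.68),
    "System V Message Passing": (7377828.11, 92072.90),
}

# Relative standard error per test, in percent.
relative = {name: relative_se(mean, se) for name, (mean, se) in results.items()}

for name, pct in relative.items():
    print(f"{name}: {pct:.2f}% of the mean")
```

On this reading, MMAP varies by well under one percent between runs, while Forking and System V Message Passing both sit a little above one percent.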
Sysbench 1.0.20 (Events Per Second, More Is Better; sysbench1604Ph10)
Test: CPU = 12018.18 (SE +/- 54.78, N = 3)
1. (CC) gcc-7 options: -pthread -O2 -funroll-loops -rdynamic -ldl -laio -lm

Algebraic Multi-Grid Benchmark 1.2 (Figure Of Merit, More Is Better; amg1604Ph10)
Result = 97848307 (SE +/- 47423.33, N = 3)
1. (CC) gcc-7 options: -lparcsr_ls -lparcsr_mv -lseq_mv -lIJ_mv -lkrylov -lHYPRE_utilities -lm -fopenmp -pthread -lmpi

perf-bench (GB/sec, More Is Better; perf-bench1604Ph10)
Benchmark: Memcpy 1MB = 14.07 (SE +/- 0.09, N = 3)
Benchmark: Memset 1MB = 39.59 (SE +/- 0.10, N = 3)
1. (CC) gcc-7 options: -O6 -ggdb3 -funwind-tables -std=gnu99 -Xlinker -export-dynamic -lpthread -lrt -lm -ldl -lcrypto -lperl -lc -lcrypt -lpython2.7 -lutil -lz -llzma -lnuma

GraphicsMagick 1.3.33 (Iterations Per Minute, More Is Better; graphics-magick1604Ph10)
Operation: Swirl = 287 (SE +/- 0.88, N = 3)
Operation: Rotate = 458 (SE +/- 1.86, N = 3)
Operation: Sharpen = 82
Operation: Enhanced = 120 (SE +/- 0.33, N = 3)
Operation: Resizing = 579 (SE +/- 0.33, N = 3)
Operation: Noise-Gaussian = 120
Operation: HWB Color Space = 711 (SE +/- 0.67, N = 3)
1. (CC) gcc-7 options: -fopenmp -O2 -pthread -ljbig -ltiff -ljasper -ljpeg -lXext -lSM -lICE -lX11 -llzma -lbz2 -lxml2 -lz -lm -lpthread

RAMspeed SMP 3.5.0 (MB/s, More Is Better; ramspeed1604Ph10)
Type: Add - Benchmark: Integer = 11115.05 (SE +/- 7.81, N = 3)
Type: Scale - Benchmark: Integer = 9654.84 (SE +/- 35.92, N = 3)
Type: Average - Benchmark: Integer = 10366.89 (SE +/- 18.60, N = 3)
Type: Add - Benchmark: Floating Point = 11122.64 (SE +/- 8.14, N = 3)
Type: Scale - Benchmark: Floating Point = 9818.92 (SE +/- 5.81, N = 3)
Type: Average - Benchmark: Floating Point = 10488.67 (SE +/- 5.49, N = 3)
1. (CC) gcc-7 options: -O3 -march=native

CacheBench (MB/s, More Is Better; cachebench1604Ph10)
Test: Read = 2086.96 (SE +/- 4.55, N = 3; MIN: 2078.1 / MAX: 2093.84)
Test: Write = 10865.06 (SE +/- 20.48, N = 3; MIN: 9466.53 / MAX: 11482.88)
Test: Read / Modify / Write = 21639.89 (SE +/- 39.55, N = 3; MIN: 18656.61 / MAX: 22903.78)
1. (CC) gcc-7 options: -lrt

Tinymembench 2018-05-28 (MB/s, More Is Better; tinymembench1604Ph10)
Standard Memcpy = 8984.2 (SE +/- 22.06, N = 3)
Standard Memset = 7800.5 (SE +/- 10.48, N = 3)
1. (CC) gcc-7 options: -O2 -lm

IPC_benchmark (Messages Per Second, More Is Better)
Message Bytes: 128 (ipc-benchmarking1604Ph10)
Type: TCP Socket = 1936803 (SE +/- 4379.21, N = 3)
Type: Unnamed Pipe = 1926153 (SE +/- 7932.21, N = 3)
Type: FIFO Named Pipe = 1811477 (SE +/- 19705.06, N = 3)
Type: Unnamed Unix Domain Socket = 1002040 (SE +/- 43657.25, N = 15)
Message Bytes: 1024 (ipc-benchmarking1024-1604Ph10)
Type: TCP Socket = 1476380 (SE +/- 5988.24, N = 3)
Type: Unnamed Pipe = 1647029 (SE +/- 13168.11, N = 3)
Type: FIFO Named Pipe = 1590154 (SE +/- 9737.22, N = 3)
Type: Unnamed Unix Domain Socket = 1498550 (SE +/- 7449.83, N = 3)

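The IPC_benchmark rates are reported in messages per second, so the 128-byte and 1024-byte runs are not directly comparable as bandwidth. A rough sketch of the conversion follows, assuming the stated message size is the full payload and ignoring any protocol overhead; `payload_mb_per_s` is a hypothetical helper, and the rates are copied from the results above.

```python
def payload_mb_per_s(messages_per_s: float, message_bytes: int) -> float:
    """Convert a message rate into payload throughput in MB/s (10^6 bytes)."""
    return messages_per_s * message_bytes / 1e6


# (transport, message size in bytes) -> messages per second, from the results above.
rates = {
    ("TCP Socket", 128): 1936803,
    ("TCP Socket", 1024): 1476380,
    ("Unnamed Pipe", 128): 1926153,
    ("Unnamed Pipe", 1024): 1647029,
}

throughput = {key: payload_mb_per_s(rate, key[1]) for key, rate in rates.items()}

for (transport, size), mb in throughput.items():
    print(f"{transport}, {size} B: {mb:.2f} MB/s")
```

Under these assumptions the larger messages move several times more payload per second even though the message rate is lower.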
SciMark 2.0 (Mflops, More Is Better; scimark1604Ph10)
Computational Test: Composite = 382.91 (SE +/- 3.97, N = 3)
Computational Test: Monte Carlo = 101.47 (SE +/- 0.31, N = 3)
Computational Test: Fast Fourier Transform = 167.16 (SE +/- 0.58, N = 3)
Computational Test: Sparse Matrix Multiply = 500.28 (SE +/- 1.11, N = 3)
Computational Test: Dense LU Matrix Factorization = 314.80 (SE +/- 1.26, N = 3)
Computational Test: Jacobi Successive Over-Relaxation = 830.83 (SE +/- 22.88, N = 3)
1. (CC) gcc-7 options: -lm

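The SciMark composite figure appears to be the arithmetic mean of the five kernel scores, and the numbers reported here are consistent with that. A short check, using the kernel results above (the dictionary keys are just labels chosen for this sketch):

```python
# Kernel scores in Mflops, copied from the SciMark 2.0 results above.
kernels = {
    "Monte Carlo": 101.47,
    "Fast Fourier Transform": 167.16,
    "Sparse Matrix Multiply": 500.28,
    "Dense LU Matrix Factorization": 314.80,
    "Jacobi Successive Over-Relaxation": 830.83,
}

# Arithmetic mean of the five kernels.
composite = sum(kernels.values()) / len(kernels)

print(f"Mean of kernels: {composite:.2f} Mflops")
```

The mean agrees with the reported Composite of 382.91 Mflops to two decimal places. Because the mean is unweighted, the Jacobi Successive Over-Relaxation score, the largest of the five and the one with the largest SE, contributes most to the composite.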
MBW 2018-09-08 (MiB/s, More Is Better; mbw1604Ph10)
Test: Memory Copy - Array Size: 1024 MiB = 8058.32 (SE +/- 14.77, N = 3)
Test: Memory Copy, Fixed Block Size - Array Size: 1024 MiB = 4729.51 (SE +/- 20.26, N = 3)
1. (CC) gcc-7 options: -O3 -march=native

perf-bench (ops/sec, More Is Better; perf-bench1604Ph10)
Benchmark: Epoll Wait = 37459 (SE +/- 36.75, N = 3)
Benchmark: Futex Hash = 3649333 (SE +/- 8513.42, N = 3)
Benchmark: Sched Pipe = 99785 (SE +/- 340.78, N = 3)
Benchmark: Futex Lock-Pi = 892
Benchmark: Syscall Basic = 14521622 (SE +/- 36942.47, N = 3)
1. (CC) gcc-7 options: -O6 -ggdb3 -funwind-tables -std=gnu99 -Xlinker -export-dynamic -lpthread -lrt -lm -ldl -lcrypto -lperl -lc -lcrypt -lpython2.7 -lutil -lz -llzma -lnuma

Apache Benchmark 2.4.29 (Requests Per Second, More Is Better; apache-ctx_clock1604Ph10)
Static Web Page Serving = 19458.70 (SE +/- 173.96, N = 3)
1. (CC) gcc-7 options: -shared -fPIC -O2 -pthread

OpenSSL 1.1.1 (Signs Per Second, More Is Better; openssl1604Ph10)
RSA 4096-bit Performance = 1177.1 (SE +/- 2.35, N = 3)
1. (CC) gcc-7 options: -pthread -m64 -O3 -lssl -lcrypto -ldl

NAS Parallel Benchmarks 3.4 (Total Mop/s, More Is Better; npb1604Ph10)
Test / Class: EP.C = 221.32 (SE +/- 0.38, N = 3)
Test / Class: EP.D = 217.22 (SE +/- 0.55, N = 3)
1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi
2. Open MPI 1.10.2

ctx_clock (Clocks, Fewer Is Better; ctx-clock1604Ph10)
Context Switch Time = 175

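The ctx_clock result is given in clocks rather than time. As a rough conversion only, the sketch below assumes the counted clocks tick at the processor's nominal 2.50 GHz listed in the system details; with Boost enabled the cores may run at a different frequency, so the figure is an order-of-magnitude estimate. `clocks_to_ns` and `NOMINAL_GHZ` are names introduced for this illustration.

```python
# Assumption: clocks are counted at the nominal 2.50 GHz from the system details.
NOMINAL_GHZ = 2.50


def clocks_to_ns(clocks: float, ghz: float) -> float:
    """Convert a clock count to nanoseconds at the given frequency in GHz."""
    return clocks / ghz


# 175 clocks is the reported Context Switch Time.
context_switch_ns = clocks_to_ns(175, NOMINAL_GHZ)

print(f"Approximate context switch time: {context_switch_ns:.1f} ns")
```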
oneDNN 2.1.2 (ms, Fewer Is Better; Engine: CPU; onednn1604Ph10)
Harness: IP Shapes 1D - Data Type: f32 = 17.76 (SE +/- 0.11, N = 3; MIN: 16.64)
Harness: IP Shapes 3D - Data Type: f32 = 18.59 (SE +/- 0.02, N = 3; MIN: 18.25)
Harness: IP Shapes 1D - Data Type: u8s8f32 = 9.77902 (SE +/- 0.00477, N = 3; MIN: 9.18)
Harness: IP Shapes 3D - Data Type: u8s8f32 = 4.86345 (SE +/- 0.00067, N = 3; MIN: 4.61)
Harness: Convolution Batch Shapes Auto - Data Type: f32 = 38.31 (SE +/- 0.00, N = 3; MIN: 37.4)
Harness: Deconvolution Batch shapes_1d - Data Type: f32 = 14.61 (SE +/- 0.03, N = 3; MIN: 13.03)
Harness: Deconvolution Batch shapes_3d - Data Type: f32 = 20.97 (SE +/- 0.06, N = 3; MIN: 19.86)
Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 = 43.94 (SE +/- 0.02, N = 3; MIN: 43.44)
Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 = 10.25 (SE +/- 0.01, N = 3; MIN: 9.57)
Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 = 13.83 (SE +/- 0.06, N = 3; MIN: 13.37)
Harness: Recurrent Neural Network Training - Data Type: f32 = 12254.6 (SE +/- 38.73, N = 3; MIN: 12140.4)
Harness: Recurrent Neural Network Inference - Data Type: f32 = 8579.99 (SE +/- 12.76, N = 3; MIN: 8534.51)
Harness: Recurrent Neural Network Training - Data Type: u8s8f32 = 12314.2 (SE +/- 27.82, N = 3; MIN: 12240.5)
Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 = 8577.57 (SE +/- 13.01, N = 3; MIN: 8526.62)
Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 = 10.81 (SE +/- 0.06, N = 3; MIN: 10.37)
Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 = 12262.6 (SE +/- 27.97, N = 3; MIN: 12196.2)
Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 = 8572.49 (SE +/- 23.27, N = 3; MIN: 8526.54)
Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32 = 6.41798 (SE +/- 0.00458, N = 3; MIN: 6.09)
1. (CXX) g++-7 options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

Hackbench (Seconds, Fewer Is Better; hackbench1604Ph10)
Count: 16 - Type: Thread = 54.27 (SE +/- 0.77, N = 3)
Count: 16 - Type: Process = 53.08 (SE +/- 0.64, N = 3)
1. (CC) gcc-7 options: -lpthread

t-test1 2017-01-13 (Seconds, Fewer Is Better; t-test1-1604Ph10)
Threads: 1 = 47.91 (SE +/- 0.20, N = 3)
Threads: 2 = 14.94 (SE +/- 0.02, N = 3)
1. (CC) gcc-7 options: -pthread

Schbench (usec, 99.9th Latency Percentile, Fewer Is Better; schbench8-16-1604Ph10)
Message Threads: 8 - Workers Per Message Thread: 16 = 104686 (SE +/- 1375.04, N = 7)
1. (CC) gcc-7 options: -O2 -lpthread

Phoronix Test Suite v10.8.5