Threadripper 3960X GCC 10 LTO Testing
AMD Ryzen Threadripper 3960X 24-Core testing with an MSI Creator TRX40 (MS-7C59) v1.0 (1.12N1 BIOS) and Gigabyte AMD Radeon 540/540X/550/550X / RX 540X/550/550X 2GB on Ubuntu 19.10 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/1912215-PTS-THREADRI04&grr .
-O3 -march=native -flto
Processor: AMD Ryzen Threadripper 3960X 24-Core @ 3.80GHz (24 Cores / 48 Threads)
Motherboard: MSI Creator TRX40 (MS-7C59) v1.0 (1.12N1 BIOS)
Chipset: AMD Starship/Matisse
Memory: 32768MB
Disk: 1000GB Sabrent Rocket 4.0 1TB
Graphics: Gigabyte AMD Radeon 540/540X/550/550X / RX 540X/550/550X 2GB (1206/1750MHz)
Audio: AMD Baffin HDMI/DP
Monitor: ASUS VP28U
Network: Aquantia AQC107 NBase-T/IEEE + Intel I211 + Intel Device 2723
OS: Ubuntu 19.10
Kernel: 5.4.0-nvme-hwmon (x86_64)
Desktop: GNOME Shell 3.34.1
Display Server: X Server 1.20.5
Display Driver: modesetting 1.20.5
OpenGL: 4.5 Mesa 19.2.1 (LLVM 9.0.0)
Compiler: GCC 10.0.0 20191208
File-System: ext4
Screen Resolution: 3840x2160
- CXXFLAGS="-O3 -march=native -flto" CFLAGS="-O3 -march=native -flto"
- --disable-multilib --enable-checking=release
- NONE / errors=remount-ro,relatime,rw
- Scaling Governor: acpi-cpufreq ondemand
- CPU Microcode: 0x8301025
- itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Full AMD retpoline IBPB: conditional STIBP: conditional RSB filling + tsx_async_abort: Not affected
-O3 -march=native -flto
hpcc: G-HPL: 63.68487
qmcpack: 1895.1
fftw: Float + SSE - 2D FFT Size 4096: 24505
radiance: Serial: 556.135
fftw: Stock - 2D FFT Size 4096: 8969.7
mkl-dnn: Convolution Batch conv_googlenet_v3 - f32: 52.8269
byte: Dhrystone 2: 67070357.6
byte: Floating-Point Arithmetic: 1
byte: Register Arithmetic: 1
byte: Integer Arithmetic: 1
mrbayes: Primate Phylogeny Analysis: 69.132
pgbench: Buffer Test - Normal Load - Read Only: 701920.062619
pgbench: Buffer Test - Heavy Contention - Read Only: 703431.182912
askap: tConvolve MT - Degridding: 3366.19
askap: tConvolve MT - Gridding: 1949.81
build-imagemagick: Time To Compile: 75.246
gromacs: Water Benchmark: 2.517
rocksdb: Rand Fill Sync: 24409
rocksdb: Rand Fill: 916114
rocksdb: Read While Writing: 4901767
rocksdb: Rand Read: 147319777
himeno: Poisson Pressure Solver: 4766.396934
radiance: SMP Parallel: 174.534
sqlite-speedtest: Timed Time - Size 1,000: 56.442
stockfish: Total Time: 79613988
rocksdb: Seq Fill: 1010840
nginx: Static Web Page Serving: 43673.89
mkl-dnn: Recurrent Neural Network Training - f32: 194.665
minife: Small: 7720.98
mkl-dnn: Deconvolution Batch deconv_1d - f32: 2.31627
mt-dgemm: Sustained Floating-Point Rate: 8.761475
crafty: Elapsed Time: 9209287
ttsiod-renderer: Phong Rendering With Soft-Shadow Mapping: 950.330
openssl: RSA 4096-bit Performance: 7182.9
compress-xz: Compressing ubuntu-16.04.3-server-i386.img, Compression Level 9: 19.867
mkl-dnn: Convolution Batch conv_alexnet - f32: 126.971
sqlite: 1: 14.232
encode-flac: WAV To FLAC: 8.044
askap: tConvolve OpenMP - Degridding: 4117.58
askap: tConvolve OpenMP - Gridding: 5435.31
compress-zstd: Compressing ubuntu-16.04.3-server-i386.img, Compression Level 19: 10.168
encode-mp3: WAV To MP3: 6.622
fftw: Stock - 1D FFT Size 32: 11201
fftw: Float + SSE - 2D FFT Size 32: 46350
fftw: Stock - 2D FFT Size 32: 12677
fftw: Float + SSE - 1D FFT Size 32: 15488
tscp: AI Chess Performance: 1422472
hpcc: Max Ping Pong Bandwidth: 22951.428
hpcc: Rand Ring Bandwidth: 3.33117
hpcc: Rand Ring Latency: 0.45234
hpcc: G-Rand Access: 0.16722
hpcc: EP-STREAM Triad: 1.68874
hpcc: G-Ptrans: 5.78791
hpcc: EP-DGEMM: 32.75707
hpcc: G-Ffte: 15.21737
HPC Challenge 1.5.0 - Test / Class: G-HPL
GFLOPS, More Is Better
-O3 -march=native -flto: 63.68 (SE +/- 0.14, N = 3)
1. (CC) gcc options: -lblas -lm -pthread -lmpi -fomit-frame-pointer -O3 -march=native -flto -funroll-loops
2. ATLAS + Open MPI 3.1.3

QMCPACK 3.8
Total Execution Time - Seconds, Fewer Is Better
-O3 -march=native -flto: 1895.1
1. (CXX) g++ options: -O3 -march=native -flto -fopenmp -fomit-frame-pointer -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -ffast-math -lm

FFTW 3.3.6 - Build: Float + SSE - Size: 2D FFT Size 4096
Mflops, More Is Better
-O3 -march=native -flto: 24505 (SE +/- 153.78, N = 3)
1. (CC) gcc options: -pthread -O3 -march=native -flto -lm

Radiance Benchmark 5.0 - Test: Serial
Seconds, Fewer Is Better
-O3 -march=native -flto: 556.14

FFTW 3.3.6 - Build: Stock - Size: 2D FFT Size 4096
Mflops, More Is Better
-O3 -march=native -flto: 8969.7 (SE +/- 155.20, N = 3)
1. (CC) gcc options: -pthread -O3 -march=native -flto -lm

MKL-DNN DNNL 1.1 - Harness: Convolution Batch conv_googlenet_v3 - Data Type: f32
ms, Fewer Is Better
-O3 -march=native -flto: 52.83 (SE +/- 0.34, N = 3, MIN: 51.44)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -msse4.1 -fPIC -fopenmp -pie -lpthread -ldl

BYTE Unix Benchmark 3.6 - Computational Test: Dhrystone 2
LPS, More Is Better
-O3 -march=native -flto: 67070357.6 (SE +/- 508456.63, N = 3)
1. (CC) gcc options: -O3 -march=native -flto

BYTE Unix Benchmark 3.6 - Computational Test: Floating-Point Arithmetic
LPS, More Is Better
-O3 -march=native -flto: 1
1. (CC) gcc options: -O3 -march=native -flto

BYTE Unix Benchmark 3.6 - Computational Test: Register Arithmetic
LPS, More Is Better
-O3 -march=native -flto: 1
1. (CC) gcc options: -O3 -march=native -flto

BYTE Unix Benchmark 3.6 - Computational Test: Integer Arithmetic
LPS, More Is Better
-O3 -march=native -flto: 1
1. (CC) gcc options: -O3 -march=native -flto

Timed MrBayes Analysis 3.2.7 - Primate Phylogeny Analysis
Seconds, Fewer Is Better
-O3 -march=native -flto: 69.13 (SE +/- 0.87, N = 4)
1. (CC) gcc options: -mmmx -msse -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msse4a -msha -maes -mavx -mfma -mavx2 -mrdrnd -mbmi -mbmi2 -madx -mabm -O3 -std=c99 -pedantic -march=native -flto -lm

PostgreSQL pgbench 12.0 - Scaling: Buffer Test - Test: Normal Load - Mode: Read Only
TPS, More Is Better
-O3 -march=native -flto: 701920.06 (SE +/- 375.35, N = 3)
1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=native -flto -lpgcommon -lpgport -lpq -lpthread -lrt -lcrypt -ldl -lm

PostgreSQL pgbench 12.0 - Scaling: Buffer Test - Test: Heavy Contention - Mode: Read Only
TPS, More Is Better
-O3 -march=native -flto: 703431.18 (SE +/- 5480.07, N = 3)
1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=native -flto -lpgcommon -lpgport -lpq -lpthread -lrt -lcrypt -ldl -lm

ASKAP 2018-11-10 - Test: tConvolve MT - Degridding
Million Grid Points Per Second, More Is Better
-O3 -march=native -flto: 3366.19 (SE +/- 2.36, N = 3)
1. (CXX) g++ options: -lpthread

ASKAP 2018-11-10 - Test: tConvolve MT - Gridding
Million Grid Points Per Second, More Is Better
-O3 -march=native -flto: 1949.81 (SE +/- 2.80, N = 3)
1. (CXX) g++ options: -lpthread

Timed ImageMagick Compilation 6.9.0 - Time To Compile
Seconds, Fewer Is Better
-O3 -march=native -flto: 75.25 (SE +/- 0.02, N = 3)

GROMACS 2019.4 - Water Benchmark
Ns Per Day, More Is Better
-O3 -march=native -flto: 2.517 (SE +/- 0.001, N = 3)
1. (CXX) g++ options: -mavx2 -mfma -O3 -march=native -flto -std=c++11 -funroll-all-loops -pthread -lrt -lpthread -lm

Facebook RocksDB 6.3.6 - Test: Random Fill Sync
Op/s, More Is Better
-O3 -march=native -flto: 24409 (SE +/- 31.00, N = 3)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fno-builtin-memcmp -fno-rtti -rdynamic -lpthread

Facebook RocksDB 6.3.6 - Test: Random Fill
Op/s, More Is Better
-O3 -march=native -flto: 916114 (SE +/- 8977.10, N = 3)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fno-builtin-memcmp -fno-rtti -rdynamic -lpthread

Facebook RocksDB 6.3.6 - Test: Read While Writing
Op/s, More Is Better
-O3 -march=native -flto: 4901767 (SE +/- 9839.07, N = 3)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fno-builtin-memcmp -fno-rtti -rdynamic -lpthread

Facebook RocksDB 6.3.6 - Test: Random Read
Op/s, More Is Better
-O3 -march=native -flto: 147319777 (SE +/- 1236531.46, N = 3)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fno-builtin-memcmp -fno-rtti -rdynamic -lpthread

Himeno Benchmark 3.0 - Poisson Pressure Solver
MFLOPS, More Is Better
-O3 -march=native -flto: 4766.40 (SE +/- 22.94, N = 3)
1. (CC) gcc options: -O3 -march=native -flto -mavx2

Radiance Benchmark 5.0 - Test: SMP Parallel
Seconds, Fewer Is Better
-O3 -march=native -flto: 174.53

SQLite Speedtest 3.30 - Timed Time - Size 1,000
Seconds, Fewer Is Better
-O3 -march=native -flto: 56.44 (SE +/- 0.10, N = 3)
1. (CC) gcc options: -O3 -march=native -flto -ldl -lz -lpthread

Stockfish 9 - Total Time
Nodes Per Second, More Is Better
-O3 -march=native -flto: 79613988 (SE +/- 774628.71, N = 3)
1. (CXX) g++ options: -m64 -lpthread -O3 -march=native -flto -fno-exceptions -std=c++11 -pedantic -msse -msse3 -mpopcnt

Facebook RocksDB 6.3.6 - Test: Sequential Fill
Op/s, More Is Better
-O3 -march=native -flto: 1010840 (SE +/- 923.75, N = 3)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fno-builtin-memcmp -fno-rtti -rdynamic -lpthread

NGINX Benchmark 1.9.9 - Static Web Page Serving
Requests Per Second, More Is Better
-O3 -march=native -flto: 43673.89 (SE +/- 294.79, N = 3)
1. (CC) gcc options: -lpthread -lcrypt -lcrypto -lz -O3 -march=native -flto

MKL-DNN DNNL 1.1 - Harness: Recurrent Neural Network Training - Data Type: f32
ms, Fewer Is Better
-O3 -march=native -flto: 194.67 (SE +/- 0.33, N = 3, MIN: 192.64)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -msse4.1 -fPIC -fopenmp -pie -lpthread -ldl

miniFE 2.2 - Problem Size: Small
CG Mflops, More Is Better
-O3 -march=native -flto: 7720.98 (SE +/- 12.76, N = 3)
1. (CXX) g++ options: -O3 -fopenmp -pthread -lmpi_cxx -lmpi

MKL-DNN DNNL 1.1 - Harness: Deconvolution Batch deconv_1d - Data Type: f32
ms, Fewer Is Better
-O3 -march=native -flto: 2.31627 (SE +/- 0.00389, N = 4, MIN: 2.25)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -msse4.1 -fPIC -fopenmp -pie -lpthread -ldl

ACES DGEMM 1.0 - Sustained Floating-Point Rate
GFLOP/s, More Is Better
-O3 -march=native -flto: 8.761475 (SE +/- 0.132221, N = 3)
1. (CC) gcc options: -O3 -march=native -fopenmp -flto

Crafty 25.2 - Elapsed Time
Nodes Per Second, More Is Better
-O3 -march=native -flto: 9209287 (SE +/- 13239.35, N = 3)
1. (CC) gcc options: -pthread -lstdc++ -fprofile-use -lm

TTSIOD 3D Renderer 2.3b - Phong Rendering With Soft-Shadow Mapping
FPS, More Is Better
-O3 -march=native -flto: 950.33 (SE +/- 0.32, N = 3)
1. (CXX) g++ options: -O3 -march=native -flto -fomit-frame-pointer -ffast-math -mtune=native -msse -mrecip -mfpmath=sse -msse2 -mssse3 -lSDL -fopenmp -fwhole-program -lstdc++

OpenSSL 1.1.1 - RSA 4096-bit Performance
Signs Per Second, More Is Better
-O3 -march=native -flto: 7182.9 (SE +/- 21.57, N = 3)
1. (CC) gcc options: -pthread -m64 -O3 -march=native -flto -lssl -lcrypto -ldl

XZ Compression 5.2.4 - Compressing ubuntu-16.04.3-server-i386.img, Compression Level 9
Seconds, Fewer Is Better
-O3 -march=native -flto: 19.87 (SE +/- 0.06, N = 3)
1. (CC) gcc options: -pthread -fvisibility=hidden -O3 -march=native -flto

MKL-DNN DNNL 1.1 - Harness: Convolution Batch conv_alexnet - Data Type: f32
ms, Fewer Is Better
-O3 -march=native -flto: 126.97 (SE +/- 0.18, N = 3, MIN: 125.84)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -msse4.1 -fPIC -fopenmp -pie -lpthread -ldl

SQLite 3.30.1 - Threads / Copies: 1
Seconds, Fewer Is Better
-O3 -march=native -flto: 14.23 (SE +/- 0.05, N = 3)
1. (CC) gcc options: -O3 -march=native -flto -lz -lm -ldl -lpthread

FLAC Audio Encoding 1.3.2 - WAV To FLAC
Seconds, Fewer Is Better
-O3 -march=native -flto: 8.044 (SE +/- 0.006, N = 5)
1. (CXX) g++ options: -O3 -march=native -flto -fvisibility=hidden -logg -lm

ASKAP 2018-11-10 - Test: tConvolve OpenMP - Degridding
Million Grid Points Per Second, More Is Better
-O3 -march=native -flto: 4117.58 (SE +/- 21.33, N = 3)
1. (CXX) g++ options: -lpthread

ASKAP 2018-11-10 - Test: tConvolve OpenMP - Gridding
Million Grid Points Per Second, More Is Better
-O3 -march=native -flto: 5435.31 (SE +/- 64.06, N = 3)
1. (CXX) g++ options: -lpthread

Zstd Compression 1.3.4 - Compressing ubuntu-16.04.3-server-i386.img, Compression Level 19
Seconds, Fewer Is Better
-O3 -march=native -flto: 10.17 (SE +/- 0.02, N = 3)
1. (CC) gcc options: -O3 -march=native -flto -pthread -lz -llzma

LAME MP3 Encoding 3.100 - WAV To MP3
Seconds, Fewer Is Better
-O3 -march=native -flto: 6.622 (SE +/- 0.067, N = 3)
1. (CC) gcc options: -O3 -ffast-math -funroll-loops -fschedule-insns2 -fbranch-count-reg -fforce-addr -pipe -march=native -flto -lm

FFTW 3.3.6 - Build: Stock - Size: 1D FFT Size 32
Mflops, More Is Better
-O3 -march=native -flto: 11201 (SE +/- 55.87, N = 3)
1. (CC) gcc options: -pthread -O3 -march=native -flto -lm

FFTW 3.3.6 - Build: Float + SSE - Size: 2D FFT Size 32
Mflops, More Is Better
-O3 -march=native -flto: 46350 (SE +/- 82.72, N = 3)
1. (CC) gcc options: -pthread -O3 -march=native -flto -lm

FFTW 3.3.6 - Build: Stock - Size: 2D FFT Size 32
Mflops, More Is Better
-O3 -march=native -flto: 12677 (SE +/- 14.19, N = 3)
1. (CC) gcc options: -pthread -O3 -march=native -flto -lm

FFTW 3.3.6 - Build: Float + SSE - Size: 1D FFT Size 32
Mflops, More Is Better
-O3 -march=native -flto: 15488 (SE +/- 25.21, N = 3)
1. (CC) gcc options: -pthread -O3 -march=native -flto -lm

TSCP 1.81 - AI Chess Performance
Nodes Per Second, More Is Better
-O3 -march=native -flto: 1422472 (SE +/- 1795.91, N = 5)
1. (CC) gcc options: -O3 -march=native -flto

HPC Challenge 1.5.0 - Test / Class: Max Ping Pong Bandwidth
MB/s, More Is Better
-O3 -march=native -flto: 22951.43 (SE +/- 301.57, N = 3)
1. (CC) gcc options: -lblas -lm -pthread -lmpi -fomit-frame-pointer -O3 -march=native -flto -funroll-loops
2. ATLAS + Open MPI 3.1.3

HPC Challenge 1.5.0 - Test / Class: Random Ring Bandwidth
GB/s, More Is Better
-O3 -march=native -flto: 3.33117 (SE +/- 0.02086, N = 3)
1. (CC) gcc options: -lblas -lm -pthread -lmpi -fomit-frame-pointer -O3 -march=native -flto -funroll-loops
2. ATLAS + Open MPI 3.1.3

HPC Challenge 1.5.0 - Test / Class: Random Ring Latency
usecs, Fewer Is Better
-O3 -march=native -flto: 0.45234 (SE +/- 0.00082, N = 3)
1. (CC) gcc options: -lblas -lm -pthread -lmpi -fomit-frame-pointer -O3 -march=native -flto -funroll-loops
2. ATLAS + Open MPI 3.1.3

HPC Challenge 1.5.0 - Test / Class: G-Random Access
GUP/s, More Is Better
-O3 -march=native -flto: 0.16722 (SE +/- 0.00027, N = 3)
1. (CC) gcc options: -lblas -lm -pthread -lmpi -fomit-frame-pointer -O3 -march=native -flto -funroll-loops
2. ATLAS + Open MPI 3.1.3

HPC Challenge 1.5.0 - Test / Class: EP-STREAM Triad
GB/s, More Is Better
-O3 -march=native -flto: 1.68874 (SE +/- 0.00593, N = 3)
1. (CC) gcc options: -lblas -lm -pthread -lmpi -fomit-frame-pointer -O3 -march=native -flto -funroll-loops
2. ATLAS + Open MPI 3.1.3

HPC Challenge 1.5.0 - Test / Class: G-Ptrans
GB/s, More Is Better
-O3 -march=native -flto: 5.78791 (SE +/- 0.01376, N = 3)
1. (CC) gcc options: -lblas -lm -pthread -lmpi -fomit-frame-pointer -O3 -march=native -flto -funroll-loops
2. ATLAS + Open MPI 3.1.3

HPC Challenge 1.5.0 - Test / Class: EP-DGEMM
GFLOPS, More Is Better
-O3 -march=native -flto: 32.76 (SE +/- 0.64, N = 3)
1. (CC) gcc options: -lblas -lm -pthread -lmpi -fomit-frame-pointer -O3 -march=native -flto -funroll-loops
2. ATLAS + Open MPI 3.1.3

HPC Challenge 1.5.0 - Test / Class: G-Ffte
GFLOP/s, More Is Better
-O3 -march=native -flto: 15.22 (SE +/- 0.45, N = 3)
1. (CC) gcc options: -lblas -lm -pthread -lmpi -fomit-frame-pointer -O3 -march=native -flto -funroll-loops
2. ATLAS + Open MPI 3.1.3

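The SE figures reported with each result are easier to compare across tests when expressed relative to the mean. The following is a minimal sketch, not part of the Phoronix Test Suite output: the helper name `relative_se` and the selection of tests are illustrative assumptions, and the (mean, SE) pairs are copied from the results above.

```python
# Relative standard error (SE / mean, in percent) for a few results
# copied from this report. The helper name and the chosen subset of
# tests are illustrative assumptions, not Phoronix Test Suite output.

results = {
    "HPC Challenge G-HPL (GFLOPS)": (63.68, 0.14),
    "FFTW Stock 2D FFT Size 4096 (Mflops)": (8969.7, 155.20),
    "ACES DGEMM (GFLOP/s)": (8.761475, 0.132221),
    "Timed ImageMagick Compilation (Seconds)": (75.25, 0.02),
}


def relative_se(mean, se):
    """Return the standard error as a percentage of the mean."""
    return 100.0 * se / mean


# Sort from noisiest to most stable so outliers appear first.
ranked = sorted(results.items(), key=lambda kv: relative_se(*kv[1]), reverse=True)

for name, (mean, se) in ranked:
    print(f"{name}: {relative_se(mean, se):.2f}% relative SE")
```

With only N = 3 to 5 runs per test, these percentages are a rough guide to run-to-run noise rather than a precise estimate.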
Phoronix Test Suite v10.8.5