2 x AMD EPYC 7742 64-Core testing with a Supermicro H11DSU-iN (2.1b BIOS) and llvmpipe 504GB on Ubuntu 20.04 via the Phoronix Test Suite.
2 x AMD EPYC 7742 64-Core
Processor: 2 x AMD EPYC 7742 64-Core @ 2.25GHz (128 Cores / 256 Threads), Motherboard: Supermicro H11DSU-iN (2.1b BIOS), Chipset: AMD Starship/Matisse, Memory: 504GB, Disk: 2 x 3841GB Micron_9200_MTFDHAL3T8TCT, Graphics: llvmpipe 504GB, Network: 4 x Intel I350
OS: Ubuntu 20.04, Kernel: 5.4.0-39-generic (x86_64), Desktop: GNOME Shell 3.36.2, Display Server: X Server 1.20.8, Display Driver: modesetting 1.20.8, OpenGL: 3.3 Mesa 20.0.4 (LLVM 9.0.1 128 bits), Compiler: GCC 9.3.0, File-System: ext4, Screen Resolution: 1024x768
Compiler Notes: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none,hsa --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Notes: Scaling Governor: acpi-cpufreq ondemand - CPU Microcode: 0x8301038
Python Notes: Python 3.8.2
Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Full AMD retpoline IBPB: conditional IBRS_FW STIBP: conditional RSB filling + srbds: Not affected + tsx_async_abort: Not affected
Pauls EPYC Redo 77422
Test | 2 x AMD EPYC 7742 64-Core
mlpack: scikit_linearridgeregression | 228.59
pybench: Total For Average Test Times | 1153
blender: Pabellon Barcelona - CPU-Only | 73.77
blender: Pabellon Barcelona - OpenCL | 1088.31
blender: Barbershop - CPU-Only | 144.17
blender: Fishy Cat - CPU-Only | 44.66
blender: Classroom - CPU-Only | 46.72
blender: Barbershop - OpenCL | 588.93
blender: Fishy Cat - OpenCL | 1048.86
blender: Classroom - OpenCL | 231.48
blender: BMW27 - CPU-Only | 27.09
blender: BMW27 - OpenCL | 400.66
sysbench: CPU | 214875.5076
sysbench: Memory | 4622114.7571
redis: SET | 1534110.27
redis: GET | 2174159.67
redis: LPUSH | 1299705.47
redis: SADD | 1696127.25
redis: LPOP | 2331103.25
gromacs: Water Benchmark | 8.334
openssl: RSA 4096-bit Performance | 24890.0
y-cruncher: Calculating 500M Pi Digits | 14.267
c-ray: Total Time - 4K, 16 Rays Per Pixel | 6.048
build-llvm: Time To Compile | 195.574
stockfish: Total Time | 236757913
svt-av1: Enc Mode 8 - 1080p | 86.955
svt-av1: Enc Mode 4 - 1080p | 9.009
svt-av1: Enc Mode 0 - 1080p | 0.105
onednn: Matrix Multiply Batch Shapes Transformer - u8s8f32 - CPU | 0.827013
onednn: Matrix Multiply Batch Shapes Transformer - f32 - CPU | 0.763937
onednn: Recurrent Neural Network Inference - f32 - CPU | 376.463
onednn: Recurrent Neural Network Training - f32 - CPU | 1004.117
onednn: Deconvolution Batch deconv_3d - u8s8f32 - CPU | 1.20007
onednn: Deconvolution Batch deconv_1d - u8s8f32 - CPU | 2.21301
onednn: Deconvolution Batch deconv_3d - f32 - CPU | 2.74425
onednn: Deconvolution Batch deconv_1d - f32 - CPU | 2.92704
onednn: Convolution Batch Shapes Auto - f32 - CPU | 0.731886
onednn: IP Batch All - u8s8f32 - CPU | 10.2041
onednn: IP Batch 1D - u8s8f32 - CPU | 3.14757
onednn: IP Batch All - f32 - CPU | 21.8730
onednn: IP Batch 1D - f32 - CPU | 2.06958
byte: Floating-Point Arithmetic | 1
byte: Register Arithmetic | 1
byte: Integer Arithmetic | 1
byte: Dhrystone 2 | 35242187.2
fftw: Float + SSE - 2D FFT Size 4096 | 15715
fftw: Float + SSE - 2D FFT Size 2048 | 20771
fftw: Float + SSE - 2D FFT Size 1024 | 34269
fftw: Float + SSE - 1D FFT Size 4096 | 44174
fftw: Float + SSE - 1D FFT Size 2048 | 47018
fftw: Float + SSE - 1D FFT Size 1024 | 44655
fftw: Float + SSE - 2D FFT Size 512 | 33462
fftw: Float + SSE - 2D FFT Size 256 | 33906
fftw: Float + SSE - 2D FFT Size 128 | 36088
fftw: Float + SSE - 1D FFT Size 512 | 40273
fftw: Float + SSE - 1D FFT Size 256 | 30630
fftw: Float + SSE - 1D FFT Size 128 | 21129
fftw: Float + SSE - 2D FFT Size 64 | 36665
fftw: Float + SSE - 2D FFT Size 32 | 35928
fftw: Float + SSE - 1D FFT Size 64 | 16517
fftw: Float + SSE - 1D FFT Size 32 | 12117
fftw: Stock - 2D FFT Size 4096 | 5318.4
fftw: Stock - 2D FFT Size 2048 | 5570.4
fftw: Stock - 2D FFT Size 1024 | 6253.3
fftw: Stock - 1D FFT Size 4096 | 6891.8
fftw: Stock - 1D FFT Size 2048 | 7079.1
fftw: Stock - 1D FFT Size 1024 | 7312.3
fftw: Stock - 2D FFT Size 512 | 6795.5
fftw: Stock - 2D FFT Size 256 | 6813.5
fftw: Stock - 2D FFT Size 128 | 6999.0
fftw: Stock - 1D FFT Size 512 | 7147.7
fftw: Stock - 1D FFT Size 256 | 7122.2
fftw: Stock - 1D FFT Size 128 | 6767.9
fftw: Stock - 2D FFT Size 64 | 7528.5
fftw: Stock - 2D FFT Size 32 | 8897.8
fftw: Stock - 1D FFT Size 64 | 8205.9
fftw: Stock - 1D FFT Size 32 | 8832.6
namd: ATPase Simulation - 327,506 Atoms | 0.26084
npb: SP.B | 116398.05
npb: MG.C | 99974.26
npb: LU.C | 235826.23
npb: IS.D | 4353.14
npb: FT.C | 108907.76
npb: EP.D | 8461.55
npb: EP.C | 8140.38
npb: CG.C | 49999.81
npb: BT.C | 241482.30
stream: Add | 193641.0
stream: Triad | 199693.6
stream: Scale | 175933.3
stream: Copy | 184819.7
mlpack: scikit_svm | 28.40
onednn: Convolution Batch Shapes Auto - u8s8f32 - CPU | 2.74691
PyBench
This test profile reports the sum of the average test times measured by PyBench. PyBench reports average test times for different functions such as BuiltinFunctionCalls and NestedForLoops, and this total gives a rough estimate of Python's average performance on a given system. This test profile runs PyBench for 20 rounds each time. Learn more via the OpenBenchmarking.org test page.
PyBench 2018-02-16, Total For Average Test Times (Milliseconds, fewer is better), 2 x AMD EPYC 7742 64-Core: 1153 (SE +/- 2.52, N = 3)
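To help read the "SE +/-" figures that accompany each result, here is a minimal sketch. It assumes SE is the standard error of the mean over the N runs, which the result page does not state. Under that assumption, the relative error and the implied run-to-run standard deviation follow directly. The function name `describe_spread` is illustrative only.

```python
import math


def describe_spread(mean, standard_error, runs):
    """Return (relative SE in percent, implied sample standard deviation).

    Assumes standard_error = sample_std / sqrt(runs), i.e. the usual
    standard error of the mean. This is an assumption about how the
    figure was produced, not something stated in the results.
    """
    relative_se_pct = 100.0 * standard_error / mean
    implied_std = standard_error * math.sqrt(runs)
    return relative_se_pct, implied_std


# PyBench result from this page: 1153 ms, SE +/- 2.52, N = 3
rel, std = describe_spread(1153, 2.52, 3)
print(f"relative SE: {rel:.2f}%  implied std dev: {std:.2f} ms")
```

A relative SE well under 1%, as here, suggests the three runs agreed closely. Results with a larger N (for example N = 12 or N = 15) may have needed more runs to settle, though the page does not say why extra runs were made.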
Blender 2.82, Blend File: Pabellon Barcelona, Compute: OpenCL (Seconds, fewer is better), 2 x AMD EPYC 7742 64-Core: 1088.31 (SE +/- 4.64, N = 3)
Blender 2.82, Blend File: Barbershop, Compute: OpenCL (Seconds, fewer is better), 2 x AMD EPYC 7742 64-Core: 588.93 (SE +/- 4.46, N = 3)
Blender 2.82, Blend File: Fishy Cat, Compute: OpenCL (Seconds, fewer is better), 2 x AMD EPYC 7742 64-Core: 1048.86 (SE +/- 1.78, N = 3)
Sysbench 2018-07-28, Test: Memory (Events Per Second, more is better), 2 x AMD EPYC 7742 64-Core: 4622114.76 (SE +/- 2131.08, N = 3). (CC) gcc options: -pthread -O3 -funroll-loops -ggdb3 -march=amdfam10 -rdynamic -ldl -laio -lm
Redis 5.0.5, Test: GET (Requests Per Second, more is better), 2 x AMD EPYC 7742 64-Core: 2174159.67 (SE +/- 16374.26, N = 3)
Redis 5.0.5, Test: LPUSH (Requests Per Second, more is better), 2 x AMD EPYC 7742 64-Core: 1299705.47 (SE +/- 14918.14, N = 15)
Redis 5.0.5, Test: SADD (Requests Per Second, more is better), 2 x AMD EPYC 7742 64-Core: 1696127.25 (SE +/- 24020.68, N = 15)
Redis 5.0.5, Test: LPOP (Requests Per Second, more is better), 2 x AMD EPYC 7742 64-Core: 2331103.25 (SE +/- 10817.25, N = 3)
Redis build notes: (CXX) g++ options: -MM -MT -g3 -fvisibility=hidden -O3
OpenSSL
OpenSSL is an open-source toolkit that implements SSL (Secure Sockets Layer) and TLS (Transport Layer Security) protocols. This test measures the RSA 4096-bit performance of OpenSSL. Learn more via the OpenBenchmarking.org test page.
OpenSSL 1.1.1, RSA 4096-bit Performance (Signs Per Second, more is better), 2 x AMD EPYC 7742 64-Core: 24890.0 (SE +/- 24.94, N = 3). (CC) gcc options: -pthread -m64 -O3 -lssl -lcrypto -ldl
C-Ray
This is a test of C-Ray, a simple raytracer designed to test the floating-point CPU performance. This test is multi-threaded (16 threads per core), will shoot 8 rays per pixel for anti-aliasing, and will generate a 1600 x 1200 image. Learn more via the OpenBenchmarking.org test page.
C-Ray 1.1, Total Time - 4K, 16 Rays Per Pixel (Seconds, fewer is better), 2 x AMD EPYC 7742 64-Core: 6.048 (SE +/- 0.010, N = 3). (CC) gcc options: -lm -lpthread -O3
Stockfish
This is a test of Stockfish, an advanced C++11 chess benchmark that can scale up to 128 CPU cores. Learn more via the OpenBenchmarking.org test page.
Stockfish 9, Total Time (Nodes Per Second, more is better), 2 x AMD EPYC 7742 64-Core: 236757913 (SE +/- 1766557.68, N = 3). (CXX) g++ options: -m64 -lpthread -fno-exceptions -std=c++11 -pedantic -O3 -msse -msse3 -mpopcnt -flto
SVT-AV1
This is a test of the Intel Open Visual Cloud Scalable Video Technology SVT-AV1 CPU-based multi-threaded video encoder for the AV1 video format with a sample 1080p YUV video file. Learn more via the OpenBenchmarking.org test page.
SVT-AV1 0.8 results for 2 x AMD EPYC 7742 64-Core (Frames Per Second, more is better):
Encoder Mode: Enc Mode 8, Input: 1080p: 86.96 (SE +/- 0.98, N = 3)
Encoder Mode: Enc Mode 4, Input: 1080p: 9.009 (SE +/- 0.025, N = 3)
Encoder Mode: Enc Mode 0, Input: 1080p: 0.105 (SE +/- 0.000, N = 3)
SVT-AV1 build notes: (CXX) g++ options: -O3 -fcommon -fPIE -fPIC -pie
oneDNN
This is a test of Intel oneDNN, an Intel-optimized library for deep neural networks, using its built-in benchdnn functionality. The result is the total perf time reported. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and MKL-DNN before being rebranded as part of the oneAPI initiative. Learn more via the OpenBenchmarking.org test page.
oneDNN 1.5 results for 2 x AMD EPYC 7742 64-Core (ms, fewer is better; Engine: CPU):
Harness: Matrix Multiply Batch Shapes Transformer, Data Type: u8s8f32: 0.827013 (SE +/- 0.002264, N = 3, MIN: 0.75)
Harness: Matrix Multiply Batch Shapes Transformer, Data Type: f32: 0.763937 (SE +/- 0.010853, N = 3, MIN: 0.68)
Harness: Recurrent Neural Network Inference, Data Type: f32: 376.46 (SE +/- 0.53, N = 3, MIN: 349.42)
Harness: Recurrent Neural Network Training, Data Type: f32: 1004.12 (SE +/- 11.67, N = 3, MIN: 890.7)
Harness: Deconvolution Batch deconv_3d, Data Type: u8s8f32: 1.20007 (SE +/- 0.01456, N = 3, MIN: 1.06)
Harness: Deconvolution Batch deconv_1d, Data Type: u8s8f32: 2.21301 (SE +/- 0.00661, N = 3, MIN: 2.04)
Harness: Deconvolution Batch deconv_3d, Data Type: f32: 2.74425 (SE +/- 0.03237, N = 5, MIN: 2.43)
Harness: Deconvolution Batch deconv_1d, Data Type: f32: 2.92704 (SE +/- 0.00686, N = 3, MIN: 2.66)
Harness: Convolution Batch Shapes Auto, Data Type: f32: 0.731886 (SE +/- 0.000571, N = 3, MIN: 0.67)
Harness: IP Batch All, Data Type: u8s8f32: 10.20 (SE +/- 0.02, N = 3, MIN: 9.48)
Harness: IP Batch 1D, Data Type: u8s8f32: 3.14757 (SE +/- 0.01409, N = 3, MIN: 2.84)
Harness: IP Batch All, Data Type: f32: 21.87 (SE +/- 0.08, N = 3, MIN: 18.64)
Harness: IP Batch 1D, Data Type: f32: 2.06958 (SE +/- 0.01313, N = 3, MIN: 1.84)
oneDNN build notes: (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
FFTW
FFTW is a C subroutine library for computing the discrete Fourier transform (DFT) in one or more dimensions. Learn more via the OpenBenchmarking.org test page.
FFTW 3.3.6 results for 2 x AMD EPYC 7742 64-Core (Mflops, more is better; N = 3 for every entry):
Build: Float + SSE, Size: 2D FFT Size 4096: 15715 (SE +/- 193.22)
Build: Float + SSE, Size: 2D FFT Size 2048: 20771 (SE +/- 268.11)
Build: Float + SSE, Size: 2D FFT Size 1024: 34269 (SE +/- 260.64)
Build: Float + SSE, Size: 1D FFT Size 4096: 44174 (SE +/- 456.79)
Build: Float + SSE, Size: 1D FFT Size 2048: 47018 (SE +/- 191.44)
Build: Float + SSE, Size: 1D FFT Size 1024: 44655 (SE +/- 730.08)
Build: Float + SSE, Size: 2D FFT Size 512: 33462 (SE +/- 136.33)
Build: Float + SSE, Size: 2D FFT Size 256: 33906 (SE +/- 48.79)
Build: Float + SSE, Size: 2D FFT Size 128: 36088 (SE +/- 142.06)
Build: Float + SSE, Size: 1D FFT Size 512: 40273 (SE +/- 353.76)
Build: Float + SSE, Size: 1D FFT Size 256: 30630 (SE +/- 368.26)
Build: Float + SSE, Size: 1D FFT Size 128: 21129 (SE +/- 72.95)
Build: Float + SSE, Size: 2D FFT Size 64: 36665 (SE +/- 174.39)
Build: Float + SSE, Size: 2D FFT Size 32: 35928 (SE +/- 39.55)
Build: Float + SSE, Size: 1D FFT Size 64: 16517 (SE +/- 91.83)
Build: Float + SSE, Size: 1D FFT Size 32: 12117 (SE +/- 111.25)
Build: Stock, Size: 2D FFT Size 4096: 5318.4 (SE +/- 9.37)
Build: Stock, Size: 2D FFT Size 2048: 5570.4 (SE +/- 34.05)
Build: Stock, Size: 2D FFT Size 1024: 6253.3 (SE +/- 18.31)
Build: Stock, Size: 1D FFT Size 4096: 6891.8 (SE +/- 9.70)
Build: Stock, Size: 1D FFT Size 2048: 7079.1 (SE +/- 15.92)
Build: Stock, Size: 1D FFT Size 1024: 7312.3 (SE +/- 8.17)
Build: Stock, Size: 2D FFT Size 512: 6795.5 (SE +/- 4.84)
Build: Stock, Size: 2D FFT Size 256: 6813.5 (SE +/- 19.12)
Build: Stock, Size: 2D FFT Size 128: 6999.0 (SE +/- 28.47)
Build: Stock, Size: 1D FFT Size 512: 7147.7 (SE +/- 42.17)
Build: Stock, Size: 1D FFT Size 256: 7122.2 (SE +/- 23.45)
Build: Stock, Size: 1D FFT Size 128: 6767.9 (SE +/- 58.26)
Build: Stock, Size: 2D FFT Size 64: 7528.5 (SE +/- 6.03)
Build: Stock, Size: 2D FFT Size 32: 8897.8 (SE +/- 1.42)
Build: Stock, Size: 1D FFT Size 64: 8205.9 (SE +/- 2.97)
Build: Stock, Size: 1D FFT Size 32: 8832.6 (SE +/- 10.99)
FFTW build notes: (CC) gcc options: -pthread -O3 -fomit-frame-pointer -mtune=native -malign-double -fstrict-aliasing -fno-schedule-insns -ffast-math -lm
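The gap between the two FFTW builds is easier to see as a ratio. The sketch below copies a handful of Mflops values from the results on this page and divides the "Float + SSE" figure by the "Stock" figure for the same transform size. The dictionary and function names are illustrative. The two builds may differ in more than SIMD use (possibly also in floating-point precision), so treat the ratio as a build-to-build comparison rather than a pure SIMD speedup.

```python
# Mflops values copied from the FFTW 3.3.6 results on this page.
float_sse = {
    "1D 32": 12117,
    "1D 1024": 44655,
    "1D 4096": 44174,
    "2D 32": 35928,
    "2D 1024": 34269,
    "2D 4096": 15715,
}
stock = {
    "1D 32": 8832.6,
    "1D 1024": 7312.3,
    "1D 4096": 6891.8,
    "2D 32": 8897.8,
    "2D 1024": 6253.3,
    "2D 4096": 5318.4,
}


def speedups(fast, base):
    """Return fast/base for every transform size present in `fast`."""
    return {size: fast[size] / base[size] for size in fast}


ratios = speedups(float_sse, stock)
for size, ratio in ratios.items():
    print(f"{size:>8}: {ratio:.2f}x")
```

In this sample the ratio is smallest for the 32-point 1D transform and largest for the mid-to-large 1D sizes.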
NAMD
NAMD is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. NAMD was developed by the Theoretical and Computational Biophysics Group in the Beckman Institute for Advanced Science and Technology at the University of Illinois at Urbana-Champaign. Learn more via the OpenBenchmarking.org test page.
NAMD 2.13, ATPase Simulation - 327,506 Atoms (days/ns, fewer is better), 2 x AMD EPYC 7742 64-Core: 0.26084 (SE +/- 0.00168, N = 3)
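NAMD results are reported here in days per simulated nanosecond, while molecular dynamics throughput is often quoted the other way round, in ns/day. The conversion is a reciprocal. A minimal sketch, with an illustrative function name:

```python
def days_per_ns_to_ns_per_day(days_per_ns):
    """Convert a days/ns figure (lower is better) to ns/day (higher is better)."""
    if days_per_ns <= 0:
        raise ValueError("days_per_ns must be positive")
    return 1.0 / days_per_ns


# NAMD ATPase result from this page: 0.26084 days/ns
ns_per_day = days_per_ns_to_ns_per_day(0.26084)
print(f"{ns_per_day:.2f} ns/day")
```

For the result above this comes to roughly 3.83 ns/day.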
NAS Parallel Benchmarks
NPB, NAS Parallel Benchmarks, is a benchmark developed by NASA for high-end computer systems. This test profile currently uses the MPI version of NPB. This test profile offers selecting the different NPB tests/problems and varying problem sizes. Learn more via the OpenBenchmarking.org test page.
NAS Parallel Benchmarks 3.4 results for 2 x AMD EPYC 7742 64-Core (Total Mop/s, more is better; N = 3 for every entry):
Test / Class: SP.B: 116398.05 (SE +/- 1045.74)
Test / Class: MG.C: 99974.26 (SE +/- 761.18)
Test / Class: LU.C: 235826.23 (SE +/- 1756.51)
Test / Class: IS.D: 4353.14 (SE +/- 63.92)
Test / Class: FT.C: 108907.76 (SE +/- 1202.48)
Test / Class: EP.D: 8461.55 (SE +/- 22.78)
Test / Class: EP.C: 8140.38 (SE +/- 25.93)
Test / Class: CG.C: 49999.81 (SE +/- 204.92)
Test / Class: BT.C: 241482.30 (SE +/- 724.30)
NPB build notes: 1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi 2. Open MPI 4.0.3
Stream 2013-01-17 results for 2 x AMD EPYC 7742 64-Core (MB/s, more is better; N = 5 for every entry):
Type: Triad: 199693.6 (SE +/- 426.56)
Type: Scale: 175933.3 (SE +/- 246.48)
Type: Copy: 184819.7 (SE +/- 419.54)
Stream build notes: (CC) gcc options: -O3 -march=native -fopenmp
oneDNN 1.5, Harness: Convolution Batch Shapes Auto, Data Type: u8s8f32, Engine: CPU (ms, fewer is better), 2 x AMD EPYC 7742 64-Core: 2.74691 (SE +/- 0.19711, N = 12, MIN: 0.79). (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
Testing initiated at 28 June 2020 17:54 by user paul.