AMD EPYC 7601 2P 2021
Tests for a future article. 2 x AMD EPYC 7601 32-Core tested with a Dell 02MJ3T (1.2.5 BIOS) and llvmpipe on Ubuntu 19.10 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2101214-HA-AMDEPYC7619&export=pdf&sro&grs
Result identifiers: 1, 2, 3
Processor: 2 x AMD EPYC 7601 32-Core (64 Cores / 128 Threads)
Motherboard: Dell 02MJ3T (1.2.5 BIOS)
Chipset: AMD 17h
Memory: 504GB
Disk: 280GB INTEL SSDPED1D280GA + 12 x 500GB Samsung SSD 860 + 120GB INTEL SSDSCKJB120G7R
Graphics: llvmpipe
Monitor: VE228
Network: 2 x Broadcom BCM57416 NetXtreme-E Dual-Media 10G RDMA + 2 x Broadcom NetXtreme BCM5720 2-port PCIe
OS: Ubuntu 19.10
Kernel: 5.9.0-050900rc6daily20200922-generic (x86_64) 20200921
Desktop: GNOME Shell 3.34.1
Display Server: X Server 1.20.5
Display Driver: modesetting 1.20.5
OpenGL: 3.3 Mesa 19.2.8 (LLVM 9.0 128 bits)
Compiler: GCC 9.2.1 20191008
File-System: ext4
Screen Resolution: 1600x1200
Compiler Details: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-offload-targets=nvptx-none,hsa --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Details: CPU Microcode: 0x8001227
Python Details: Python 2.7.17rc1 + Python 3.7.5
Security Details: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Full AMD retpoline IBPB: conditional STIBP: disabled RSB filling + srbds: Not affected + tsx_async_abort: Not affected
Test | 1 | 2 | 3
onednn: Matrix Multiply Batch Shapes Transformer - f32 - CPU | 0.919909 | 0.902805 | 2.60366
onednn: IP Shapes 3D - u8s8f32 - CPU | 2.82028 | 2.67884 | 4.50130
onednn: IP Shapes 1D - f32 - CPU | 2.90299 | 2.52500 | 3.81118
onednn: Convolution Batch Shapes Auto - f32 - CPU | 17.7153 | 16.4879 | 20.4528
onednn: Convolution Batch Shapes Auto - u8s8f32 - CPU | 23.2410 | 22.7404 | 25.1595
onednn: Deconvolution Batch shapes_1d - f32 - CPU | 3.54557 | 3.42396 | 3.76272
onednn: IP Shapes 3D - f32 - CPU | 19.8661 | 19.2173 | 21.1172
onednn: Deconvolution Batch shapes_1d - u8s8f32 - CPU | 3.54912 | 3.49511 | 3.71634
dav1d: Summer Nature 1080p | 629.60 | 669.24 | 634.70
mnn: inception-v3 | 68.585 | 71.575 | -
dav1d: Chimera 1080p | 637.03 | 659.19 | 634.77
onednn: IP Shapes 1D - u8s8f32 - CPU | 3.86741 | 3.81910 | 3.96369
openfoam: Motorbike 30M | 34.62 | 34.23 | 35.32
qe: AUSURF112 | 1796.32 | 1754.21 | 1808.78
build-godot: Time To Compile | 107.098 | 104.004 | 103.914
etcpak: DXT1 | 1296.787 | 1323.798 | 1322.339
rav1e: 10 | 2.322 | 2.370 | 2.336
onednn: Matrix Multiply Batch Shapes Transformer - u8s8f32 - CPU | 1.37881 | 1.38889 | 1.40554
mnn: MobileNetV2_224 | 10.741 | 10.909 | -
rav1e: 1 | 0.258 | 0.262 | 0.261
rav1e: 6 | 1.016 | 1.031 | 1.030
lammps: Rhodopsin Protein | 23.311 | 23.382 | 23.083
cryptsetup: PBKDF2-sha512 | 1170727 | 1157453 | 1168547
mnn: SqueezeNetV1.0 | 14.955 | 14.812 | -
cryptsetup: AES-XTS 256b Encryption | 1444.3 | 1456.9 | 1454.5
lammps: 20k Atoms | 23.382 | 23.399 | 23.201
cryptsetup: PBKDF2-whirlpool | 510008 | 507751 | 506073
cryptsetup: AES-XTS 256b Decryption | 1445.4 | 1442.6 | 1453.1
cryptsetup: AES-XTS 512b Decryption | 1276.7 | 1285.0 | 1284.3
rav1e: 5 | 0.780 | 0.778 | 0.775
lulesh: | 16092.925 | 16073.086 | 15990.720
unpack-firefox: firefox-84.0.source.tar.xz | 25.907 | 25.754 | -
cryptsetup: AES-XTS 512b Encryption | 1279.4 | 1286.7 | 1285.4
openfoam: Motorbike 60M | 338.71 | 338.37 | 340.04
dav1d: Chimera 1080p 10-bit | 138.45 | 138.90 | 139.13
onednn: Deconvolution Batch shapes_3d - u8s8f32 - CPU | 3.16880 | 3.18118 | 3.18392
encode-opus: WAV To Opus Encode | 10.213 | 10.207 | 10.238
onnx: super-resolution-10 - OpenMP CPU | 2084 | 2078 | -
cryptsetup: Twofish-XTS 256b Encryption | 317.4 | 318.3 | 318.3
cryptsetup: Serpent-XTS 256b Encryption | 308.1 | 308.4 | 308.9
encode-ogg: WAV To Ogg | 26.603 | 26.627 | 26.665
cryptsetup: Twofish-XTS 512b Decryption | 316.0 | 316.6 | 316.7
tnn: CPU - MobileNet v2 | 369.108 | 369.669 | -
cryptsetup: Serpent-XTS 256b Decryption | 306.7 | 307.0 | 307.1
cryptsetup: Serpent-XTS 512b Encryption | 308.6 | 308.8 | 309
cryptsetup: Twofish-XTS 256b Decryption | 316.5 | 316.8 | 316.9
cryptsetup: Twofish-XTS 512b Encryption | 317.7 | 318.1 | 318.0
amg: | 709699800 | 709739033 | 708880233
encode-ape: WAV To APE | 18.344 | 18.322 | 18.336
cryptsetup: Serpent-XTS 512b Decryption | 306.7 | 306.9 | 307
relion: Basic - CPU | 548.379 | 548.427 | 547.939
etcpak: ETC1 + Dithering | 174.283 | 174.300 | 174.166
encode-wavpack: WAV To WavPack | 17.296 | 17.285 | -
tnn: CPU - SqueezeNet v1.1 | 333.403 | 333.272 | -
etcpak: ETC1 | 184.747 | 184.690 | 184.757
etcpak: ETC2 | 118.049 | 118.049 | 118.070
synthmark: VoiceMark_100 | 512.107 | 512.072 | 512.073
kripke: | 37882537 | 35226890 | -
onnx: shufflenet-v2-10 - OpenMP CPU | 2188 | 2318 | -
onnx: fcn-resnet101-11 - OpenMP CPU | 53 | 52 | -
onnx: bertsquad-10 - OpenMP CPU | 58 | 54 | -
onnx: yolov4 - OpenMP CPU | 76 | 71 | -
mnn: mobilenet-v1-1.0 | 6.761 | 7.012 | -
mnn: resnet-v2-50 | 54.121 | 52.860 | -
dav1d: Summer Nature 4K | 243.31 | 251.57 | 248.58
onednn: Recurrent Neural Network Inference - bf16bf16bf16 - CPU | 3698.78 | 3877.71 | 4137.52
onednn: Recurrent Neural Network Training - bf16bf16bf16 - CPU | 4707.16 | 4615.36 | 4885.17
onednn: Recurrent Neural Network Inference - u8s8f32 - CPU | 3557.56 | 3580.43 | 4199.67
onednn: Recurrent Neural Network Training - u8s8f32 - CPU | 4466.50 | 4580.21 | 4661.89
onednn: Recurrent Neural Network Inference - f32 - CPU | 3940.07 | 3865.58 | 4050.37
onednn: Recurrent Neural Network Training - f32 - CPU | 4554.32 | 4515.79 | 5020.04
onednn: Deconvolution Batch shapes_3d - f32 - CPU | 6.45462 | 6.56237 | 7.09130
qmcpack: simple-H2O | 41.839 | 46.249 | 50.655
cloverleaf: Lagrangian-Eulerian Hydrodynamics | 29.54 | 29.24 | 29.87
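The summary above lists raw values only, so the size of the gap between runs has to be worked out by hand. The following is a minimal sketch, not part of the Phoronix Test Suite, that expresses runs 2 and 3 as a percent change relative to run 1 for a few rows copied from the summary. The helper name `percent_change` and the choice of run 1 as the baseline are assumptions made for illustration. For these three tests lower values are better, so a positive change means a slower result.

```python
# Values copied from the result summary (runs 1, 2, 3).
# oneDNN values are in ms and QMCPACK is in seconds; fewer is better for all three.
results = {
    "onednn: Matrix Multiply Batch Shapes Transformer - f32 - CPU": (0.919909, 0.902805, 2.60366),
    "onednn: IP Shapes 3D - u8s8f32 - CPU": (2.82028, 2.67884, 4.50130),
    "qmcpack: simple-H2O": (41.839, 46.249, 50.655),
}


def percent_change(baseline, other):
    """Return the change of `other` relative to `baseline`, in percent."""
    return (other - baseline) / baseline * 100.0


for name, (run1, run2, run3) in results.items():
    # Run 1 is used as the baseline here purely for illustration.
    print(f"{name}: run 2 {percent_change(run1, run2):+.1f}%, "
          f"run 3 {percent_change(run1, run3):+.1f}%")
```

For "More Is Better" tests such as dav1d or Cryptsetup the same calculation applies, but a positive change then indicates a faster result.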
oneDNN 2.0 - Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  1: 0.919909 (SE +/- 0.007779, N = 3; MIN: 0.77)
  2: 0.902805 (SE +/- 0.002458, N = 3; MIN: 0.77)
  3: 2.603660 (SE +/- 0.009404, N = 3; MIN: 2.01)
  (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0 - Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  1: 2.82028 (SE +/- 0.03873, N = 3; MIN: 2.33)
  2: 2.67884 (SE +/- 0.02321, N = 3; MIN: 2.27)
  3: 4.50130 (SE +/- 0.01074, N = 3; MIN: 4.09)
  (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0 - Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  1: 2.90299 (SE +/- 0.04679, N = 3; MIN: 2.24)
  2: 2.52500 (SE +/- 0.02862, N = 3; MIN: 2.08)
  3: 3.81118 (SE +/- 0.05747, N = 3; MIN: 3.25)
  (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0 - Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  1: 17.72 (SE +/- 0.02, N = 3; MIN: 16.71)
  2: 16.49 (SE +/- 0.04, N = 3; MIN: 15.75)
  3: 20.45 (SE +/- 0.04, N = 3; MIN: 19.21)
  (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0 - Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  1: 23.24 (SE +/- 0.11, N = 3; MIN: 20.59)
  2: 22.74 (SE +/- 0.24, N = 3; MIN: 20.43)
  3: 25.16 (SE +/- 0.09, N = 3; MIN: 22.78)
  (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0 - Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  1: 3.54557 (SE +/- 0.03944, N = 15; MIN: 2.98)
  2: 3.42396 (SE +/- 0.03472, N = 15; MIN: 2.94)
  3: 3.76272 (SE +/- 0.04485, N = 15; MIN: 3.09)
  (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0 - Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  1: 19.87 (SE +/- 0.14, N = 3; MIN: 18.97)
  2: 19.22 (SE +/- 0.09, N = 3; MIN: 18.41)
  3: 21.12 (SE +/- 0.11, N = 3; MIN: 20.42)
  (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0 - Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  1: 3.54912 (SE +/- 0.03910, N = 3; MIN: 3)
  2: 3.49511 (SE +/- 0.04588, N = 5; MIN: 3)
  3: 3.71634 (SE +/- 0.01698, N = 3; MIN: 3.23)
  (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
dav1d 0.8.1 - Video Input: Summer Nature 1080p (FPS, More Is Better)
  1: 629.60 (SE +/- 8.65, N = 15; MIN: 194.05 / MAX: 739.75)
  2: 669.24 (SE +/- 4.94, N = 3; MIN: 231.81 / MAX: 754.48)
  3: 634.70 (SE +/- 9.48, N = 15; MIN: 194.36 / MAX: 755.24)
  (CC) gcc options: -pthread
Mobile Neural Network 1.1.1 - Model: inception-v3 (ms, Fewer Is Better)
  1: 68.59 (SE +/- 0.67, N = 3; MIN: 62.84 / MAX: 229.88)
  2: 71.58 (SE +/- 1.22, N = 3; MIN: 64.39 / MAX: 186.82)
  (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -rdynamic -pthread -ldl
dav1d 0.8.1 - Video Input: Chimera 1080p (FPS, More Is Better)
  1: 637.03 (SE +/- 9.95, N = 3; MIN: 344.69 / MAX: 796.13)
  2: 659.19 (SE +/- 1.84, N = 3; MIN: 348.24 / MAX: 815.29)
  3: 634.77 (SE +/- 9.40, N = 4; MIN: 349.39 / MAX: 815.27)
  (CC) gcc options: -pthread
oneDNN 2.0 - Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  1: 3.86741 (SE +/- 0.04801, N = 4; MIN: 3.27)
  2: 3.81910 (SE +/- 0.05005, N = 3; MIN: 3.31)
  3: 3.96369 (SE +/- 0.03574, N = 3; MIN: 3.38)
  (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
OpenFOAM 8 - Input: Motorbike 30M (Seconds, Fewer Is Better)
  1: 34.62 (SE +/- 0.10, N = 3)
  2: 34.23 (SE +/- 0.14, N = 3)
  3: 35.32 (SE +/- 0.33, N = 15)
  (CXX) g++ options: -std=c++11 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -ldynamicMesh -ldecompose -lgenericPatchFields -lmetisDecomp -lscotchDecomp -llagrangian -lregionModels -lOpenFOAM -ldl -lm
Quantum ESPRESSO 6.7 - Input: AUSURF112 (Seconds, Fewer Is Better)
  1: 1796.32 (SE +/- 19.90, N = 9)
  2: 1754.21 (SE +/- 5.15, N = 3)
  3: 1808.78 (SE +/- 18.18, N = 9)
  (F9X) gfortran options: -lopenblas -lFoX_dom -lFoX_sax -lFoX_wxml -lFoX_common -lFoX_utils -lFoX_fsys -lfftw3 -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi
Timed Godot Game Engine Compilation 3.2.3 - Time To Compile (Seconds, Fewer Is Better)
  1: 107.10 (SE +/- 1.83, N = 3)
  2: 104.00 (SE +/- 1.42, N = 3)
  3: 103.91 (SE +/- 1.34, N = 3)
Etcpak 0.7 - Configuration: DXT1 (Mpx/s, More Is Better)
  1: 1296.79 (SE +/- 1.47, N = 3)
  2: 1323.80 (SE +/- 0.97, N = 3)
  3: 1322.34 (SE +/- 0.70, N = 3)
  (CXX) g++ options: -O3 -march=native -std=c++11 -lpthread
rav1e 0.4 - Speed: 10 (Frames Per Second, More Is Better)
  1: 2.322 (SE +/- 0.021, N = 3)
  2: 2.370 (SE +/- 0.016, N = 3)
  3: 2.336 (SE +/- 0.011, N = 3)
oneDNN 2.0 - Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  1: 1.37881 (SE +/- 0.00740, N = 3; MIN: 1.12)
  2: 1.38889 (SE +/- 0.00510, N = 3; MIN: 1.21)
  3: 1.40554 (SE +/- 0.00370, N = 3; MIN: 1.26)
  (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
Mobile Neural Network 1.1.1 - Model: MobileNetV2_224 (ms, Fewer Is Better)
  1: 10.74 (SE +/- 0.19, N = 3; MIN: 10.16 / MAX: 11.79)
  2: 10.91 (SE +/- 0.30, N = 3; MIN: 10.26 / MAX: 12.2)
  (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -rdynamic -pthread -ldl
rav1e 0.4 - Speed: 1 (Frames Per Second, More Is Better)
  1: 0.258 (SE +/- 0.002, N = 3)
  2: 0.262 (SE +/- 0.001, N = 3)
  3: 0.261 (SE +/- 0.001, N = 3)
rav1e 0.4 - Speed: 6 (Frames Per Second, More Is Better)
  1: 1.016 (SE +/- 0.009, N = 3)
  2: 1.031 (SE +/- 0.005, N = 3)
  3: 1.030 (SE +/- 0.014, N = 3)
LAMMPS Molecular Dynamics Simulator 29Oct2020 - Model: Rhodopsin Protein (ns/day, More Is Better)
  1: 23.31 (SE +/- 0.04, N = 3)
  2: 23.38 (SE +/- 0.06, N = 3)
  3: 23.08 (SE +/- 0.23, N = 3)
  (CXX) g++ options: -O3 -pthread -lm
Cryptsetup - PBKDF2-sha512 (Iterations Per Second, More Is Better)
  1: 1170727 (SE +/- 1900.25, N = 3)
  2: 1157453 (SE +/- 12198.12, N = 7)
  3: 1168547 (SE +/- 434.00, N = 3)
Mobile Neural Network 1.1.1 - Model: SqueezeNetV1.0 (ms, Fewer Is Better)
  1: 14.96 (SE +/- 0.10, N = 3; MIN: 13.84 / MAX: 36.38)
  2: 14.81 (SE +/- 0.20, N = 3; MIN: 13.46 / MAX: 30.98)
  (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -rdynamic -pthread -ldl
Cryptsetup - AES-XTS 256b Encryption (MiB/s, More Is Better)
  1: 1444.3 (SE +/- 3.56, N = 3)
  2: 1456.9 (SE +/- 2.27, N = 7)
  3: 1454.5 (SE +/- 1.65, N = 3)
LAMMPS Molecular Dynamics Simulator 29Oct2020 - Model: 20k Atoms (ns/day, More Is Better)
  1: 23.38 (SE +/- 0.08, N = 3)
  2: 23.40 (SE +/- 0.09, N = 3)
  3: 23.20 (SE +/- 0.05, N = 3)
  (CXX) g++ options: -O3 -pthread -lm
Cryptsetup - PBKDF2-whirlpool (Iterations Per Second, More Is Better)
  1: 510008 (SE +/- 572.73, N = 3)
  2: 507751 (SE +/- 280.29, N = 7)
  3: 506073 (SE +/- 975.00, N = 3)
Cryptsetup - AES-XTS 256b Decryption (MiB/s, More Is Better)
  1: 1445.4 (SE +/- 2.14, N = 3)
  2: 1442.6 (SE +/- 12.89, N = 7)
  3: 1453.1 (SE +/- 2.05, N = 3)
Cryptsetup - AES-XTS 512b Decryption (MiB/s, More Is Better)
  1: 1276.7 (SE +/- 2.31, N = 3)
  2: 1285.0 (SE +/- 1.94, N = 7)
  3: 1284.3 (SE +/- 1.76, N = 3)
rav1e 0.4 - Speed: 5 (Frames Per Second, More Is Better)
  1: 0.780 (SE +/- 0.006, N = 3)
  2: 0.778 (SE +/- 0.005, N = 3)
  3: 0.775 (SE +/- 0.005, N = 3)
LULESH 2.0.3 (z/s, More Is Better)
  1: 16092.93 (SE +/- 42.67, N = 3)
  2: 16073.09 (SE +/- 42.52, N = 3)
  3: 15990.72 (SE +/- 65.00, N = 3)
  (CXX) g++ options: -O3 -fopenmp -lm -pthread -lmpi_cxx -lmpi
Unpacking Firefox 84.0 - Extracting: firefox-84.0.source.tar.xz (Seconds, Fewer Is Better)
  1: 25.91 (SE +/- 0.08, N = 4)
  2: 25.75 (SE +/- 0.03, N = 4)
Cryptsetup - AES-XTS 512b Encryption (MiB/s, More Is Better)
  1: 1279.4 (SE +/- 1.32, N = 3)
  2: 1286.7 (SE +/- 1.46, N = 7)
  3: 1285.4 (SE +/- 1.42, N = 3)
OpenFOAM 8 - Input: Motorbike 60M (Seconds, Fewer Is Better)
  1: 338.71 (SE +/- 0.27, N = 3)
  2: 338.37 (SE +/- 0.73, N = 3)
  3: 340.04 (SE +/- 0.68, N = 3)
  (CXX) g++ options: -std=c++11 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -ldynamicMesh -ldecompose -lgenericPatchFields -lmetisDecomp -lscotchDecomp -llagrangian -lregionModels -lOpenFOAM -ldl -lm
dav1d 0.8.1 - Video Input: Chimera 1080p 10-bit (FPS, More Is Better)
  1: 138.45 (SE +/- 0.41, N = 3; MIN: 95.91 / MAX: 217.11)
  2: 138.90 (SE +/- 0.30, N = 3; MIN: 96.19 / MAX: 217.56)
  3: 139.13 (SE +/- 0.14, N = 3; MIN: 96.19 / MAX: 219.5)
  (CC) gcc options: -pthread
oneDNN 2.0 - Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  1: 3.16880 (SE +/- 0.03490, N = 6; MIN: 2.83)
  2: 3.18118 (SE +/- 0.01629, N = 3; MIN: 2.88)
  3: 3.18392 (SE +/- 0.01545, N = 3; MIN: 2.91)
  (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
Opus Codec Encoding 1.3.1 - WAV To Opus Encode (Seconds, Fewer Is Better)
  1: 10.21 (SE +/- 0.00, N = 5)
  2: 10.21 (SE +/- 0.00, N = 5)
  3: 10.24 (SE +/- 0.00, N = 5)
  (CXX) g++ options: -fvisibility=hidden -logg -lm
ONNX Runtime 1.6 - Model: super-resolution-10 - Device: OpenMP CPU (Inferences Per Minute, More Is Better)
  1: 2084 (SE +/- 13.94, N = 3)
  2: 2078 (SE +/- 32.31, N = 3)
  (CXX) g++ options: -fopenmp -ffunction-sections -fdata-sections -O3 -ldl -lrt
Cryptsetup - Twofish-XTS 256b Encryption (MiB/s, More Is Better)
  1: 317.4 (SE +/- 0.37, N = 3)
  2: 318.3 (SE +/- 0.08, N = 7)
  3: 318.3 (SE +/- 0.03, N = 3)
Cryptsetup - Serpent-XTS 256b Encryption (MiB/s, More Is Better)
  1: 308.1 (SE +/- 0.58, N = 3)
  2: 308.4 (SE +/- 0.53, N = 7)
  3: 308.9 (SE +/- 0.03, N = 3)
Ogg Audio Encoding 1.3.4 - WAV To Ogg (Seconds, Fewer Is Better)
  1: 26.60 (SE +/- 0.02, N = 3)
  2: 26.63 (SE +/- 0.00, N = 3)
  3: 26.67 (SE +/- 0.01, N = 3)
  (CC) gcc options: -O2 -ffast-math -fsigned-char
Cryptsetup - Twofish-XTS 512b Decryption (MiB/s, More Is Better)
  1: 316.0 (SE +/- 0.05, N = 2)
  2: 316.6 (SE +/- 0.06, N = 7)
  3: 316.7 (SE +/- 0.12, N = 3)
TNN 0.2.3 - Target: CPU - Model: MobileNet v2 (ms, Fewer Is Better)
  1: 369.11 (SE +/- 1.37, N = 3; MIN: 357.24 / MAX: 557.3)
  2: 369.67 (SE +/- 0.16, N = 3; MIN: 358.63 / MAX: 519.86)
  (CXX) g++ options: -fopenmp -pthread -fvisibility=hidden -O3 -rdynamic -ldl
Cryptsetup - Serpent-XTS 256b Decryption (MiB/s, More Is Better)
  1: 306.7 (SE +/- 0.12, N = 3)
  2: 307.0 (SE +/- 0.10, N = 7)
  3: 307.1 (SE +/- 0.00, N = 3)
Cryptsetup - Serpent-XTS 512b Encryption (MiB/s, More Is Better)
  1: 308.6
  2: 308.8
  3: 309.0
  SE entries as exported (two entries for three results): SE +/- 0.10, N = 2; SE +/- 0.09, N = 6
Cryptsetup - Twofish-XTS 256b Decryption (MiB/s, More Is Better)
  1: 316.5 (SE +/- 0.06, N = 3)
  2: 316.8 (SE +/- 0.07, N = 7)
  3: 316.9 (SE +/- 0.10, N = 3)
Cryptsetup - Twofish-XTS 512b Encryption (MiB/s, More Is Better)
  1: 317.7 (SE +/- 0.19, N = 3)
  2: 318.1 (SE +/- 0.06, N = 7)
  3: 318.0 (SE +/- 0.20, N = 2)
Algebraic Multi-Grid Benchmark 1.2 (Figure Of Merit, More Is Better)
  1: 709699800 (SE +/- 1313948.59, N = 3)
  2: 709739033 (SE +/- 201948.45, N = 3)
  3: 708880233 (SE +/- 658905.95, N = 3)
  (CC) gcc options: -lparcsr_ls -lparcsr_mv -lseq_mv -lIJ_mv -lkrylov -lHYPRE_utilities -lm -fopenmp -pthread -lmpi
Monkey Audio Encoding 3.99.6 - WAV To APE (Seconds, Fewer Is Better)
  1: 18.34 (SE +/- 0.01, N = 5)
  2: 18.32 (SE +/- 0.00, N = 5)
  3: 18.34 (SE +/- 0.00, N = 5)
  (CXX) g++ options: -O3 -pedantic -rdynamic -lrt
Cryptsetup - Serpent-XTS 512b Decryption (MiB/s, More Is Better)
  1: 306.7
  2: 306.9
  3: 307.0
  SE entries as exported (two entries for three results): SE +/- 0.05, N = 2; SE +/- 0.17, N = 4
RELION 3.1.1 - Test: Basic - Device: CPU (Seconds, Fewer Is Better)
  1: 548.38 (SE +/- 0.24, N = 3)
  2: 548.43 (SE +/- 0.25, N = 3)
  3: 547.94 (SE +/- 0.25, N = 3)
  (CXX) g++ options: -fopenmp -std=c++0x -O3 -rdynamic -ldl -ltiff -lfftw3f -lfftw3 -lpng -pthread -lmpi_cxx -lmpi
Etcpak 0.7 - Configuration: ETC1 + Dithering (Mpx/s, More Is Better)
  1: 174.28 (SE +/- 0.02, N = 3)
  2: 174.30 (SE +/- 0.02, N = 3)
  3: 174.17 (SE +/- 0.05, N = 3)
  (CXX) g++ options: -O3 -march=native -std=c++11 -lpthread
WavPack Audio Encoding 5.3 - WAV To WavPack (Seconds, Fewer Is Better)
  1: 17.30 (SE +/- 0.00, N = 5)
  2: 17.29 (SE +/- 0.01, N = 5)
  (CXX) g++ options: -rdynamic
TNN 0.2.3 - Target: CPU - Model: SqueezeNet v1.1 (ms, Fewer Is Better)
  1: 333.40 (SE +/- 0.06, N = 3; MIN: 332.68 / MAX: 338.81)
  2: 333.27 (SE +/- 0.15, N = 3; MIN: 332.42 / MAX: 334.08)
  (CXX) g++ options: -fopenmp -pthread -fvisibility=hidden -O3 -rdynamic -ldl
Etcpak 0.7 - Configuration: ETC1 (Mpx/s, More Is Better)
  1: 184.75 (SE +/- 0.02, N = 3)
  2: 184.69 (SE +/- 0.02, N = 3)
  3: 184.76 (SE +/- 0.02, N = 3)
  (CXX) g++ options: -O3 -march=native -std=c++11 -lpthread
Etcpak 0.7 - Configuration: ETC2 (Mpx/s, More Is Better)
  1: 118.05 (SE +/- 0.01, N = 3)
  2: 118.05 (SE +/- 0.01, N = 3)
  3: 118.07 (SE +/- 0.00, N = 3)
  (CXX) g++ options: -O3 -march=native -std=c++11 -lpthread
Google SynthMark 20201109 - Test: VoiceMark_100 (Voices, More Is Better)
  1: 512.11 (SE +/- 0.00, N = 3)
  2: 512.07 (SE +/- 0.02, N = 3)
  3: 512.07 (SE +/- 0.02, N = 3)
  (CXX) g++ options: -lm -lpthread -std=c++11 -Ofast
Kripke 1.2.4 (Throughput FoM, More Is Better)
  1: 37882537 (SE +/- 1848270.09, N = 15)
  2: 35226890 (SE +/- 1684001.44, N = 12)
  (CXX) g++ options: -O3 -fopenmp
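Each result in this export carries an "SE +/-" figure and a sample count N, and for some tests (Kripke among them) the SE is large relative to the mean. The sketch below is one way to put that on a common scale. It assumes that "SE" denotes the standard error of the mean, i.e. the sample standard deviation divided by the square root of N; the export itself does not state the formula, and the function names are hypothetical.

```python
import math


def relative_se_percent(mean, se):
    """Express a standard error as a percentage of the reported mean."""
    return se / mean * 100.0


def stddev_from_se(se, n):
    """Recover the sample standard deviation, ASSUMING se = stddev / sqrt(n)."""
    return se * math.sqrt(n)


# Kripke, result 1: mean 37882537, SE +/- 1848270.09, N = 15 (values from this export).
kripke_mean, kripke_se, kripke_n = 37882537, 1848270.09, 15

print(f"relative SE: {relative_se_percent(kripke_mean, kripke_se):.2f}% of the mean")
print(f"implied standard deviation: {stddev_from_se(kripke_se, kripke_n):.0f}")
```

A relative SE of several percent, as here, suggests that small differences between results for this test should be read with caution.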
ONNX Runtime 1.6 - Model: shufflenet-v2-10 - Device: OpenMP CPU (Inferences Per Minute, More Is Better)
  1: 2188 (SE +/- 142.39, N = 12)
  2: 2318 (SE +/- 112.64, N = 12)
  (CXX) g++ options: -fopenmp -ffunction-sections -fdata-sections -O3 -ldl -lrt
ONNX Runtime 1.6 - Model: fcn-resnet101-11 - Device: OpenMP CPU (Inferences Per Minute, More Is Better)
  1: 53 (SE +/- 0.88, N = 3)
  2: 52 (SE +/- 1.02, N = 12)
  (CXX) g++ options: -fopenmp -ffunction-sections -fdata-sections -O3 -ldl -lrt
ONNX Runtime 1.6 - Model: bertsquad-10 - Device: OpenMP CPU (Inferences Per Minute, More Is Better)
  1: 58 (SE +/- 0.44, N = 3)
  2: 54 (SE +/- 2.87, N = 9)
  (CXX) g++ options: -fopenmp -ffunction-sections -fdata-sections -O3 -ldl -lrt
ONNX Runtime 1.6 - Model: yolov4 - Device: OpenMP CPU (Inferences Per Minute, More Is Better)
  1: 76 (SE +/- 2.66, N = 12)
  2: 71 (SE +/- 2.07, N = 12)
  (CXX) g++ options: -fopenmp -ffunction-sections -fdata-sections -O3 -ldl -lrt
Mobile Neural Network 1.1.1 - Model: mobilenet-v1-1.0 (ms, Fewer Is Better)
  1: 6.761 (SE +/- 0.103, N = 3; MIN: 6.2 / MAX: 8.27)
  2: 7.012 (SE +/- 0.613, N = 3; MIN: 6 / MAX: 24.38)
  (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -rdynamic -pthread -ldl
Mobile Neural Network 1.1.1 - Model: resnet-v2-50 (ms, Fewer Is Better)
  1: 54.12 (SE +/- 1.07, N = 3; MIN: 46.9 / MAX: 742.65)
  2: 52.86 (SE +/- 2.27, N = 3; MIN: 46.53 / MAX: 819.73)
  (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -rdynamic -pthread -ldl
dav1d 0.8.1 - Video Input: Summer Nature 4K (FPS, More Is Better)
  1: 243.31 (SE +/- 4.09, N = 12; MIN: 81.19 / MAX: 282.97)
  2: 251.57 (SE +/- 1.22, N = 3; MIN: 91.04 / MAX: 277.22)
  3: 248.58 (SE +/- 4.61, N = 12; MIN: 85.73 / MAX: 286.18)
  (CC) gcc options: -pthread
oneDNN 2.0 - Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
  1: 3698.78 (SE +/- 128.96, N = 15; MIN: 2872.13)
  2: 3877.71 (SE +/- 146.91, N = 15; MIN: 2904.51)
  3: 4137.52 (SE +/- 85.43, N = 15; MIN: 3448.07)
  (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0 - Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
  1: 4707.16 (SE +/- 128.63, N = 12; MIN: 3518.69)
  2: 4615.36 (SE +/- 149.33, N = 15; MIN: 3327.21)
  3: 4885.17 (SE +/- 125.34, N = 15; MIN: 3744.2)
  (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0 - Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  1: 3557.56 (SE +/- 35.53, N = 3; MIN: 3305.84)
  2: 3580.43 (SE +/- 114.13, N = 15; MIN: 3052.37)
  3: 4199.67 (SE +/- 89.60, N = 15; MIN: 3498.58)
  (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0 - Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  1: 4466.50 (SE +/- 216.78, N = 15; MIN: 2939.4)
  2: 4580.21 (SE +/- 178.79, N = 15; MIN: 3020.84)
  3: 4661.89 (SE +/- 148.69, N = 15; MIN: 3208.79)
  (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0 - Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  1: 3940.07 (SE +/- 64.36, N = 15; MIN: 3412.94)
  2: 3865.58 (SE +/- 125.32, N = 15; MIN: 3206.95)
  3: 4050.37 (SE +/- 94.46, N = 12; MIN: 3493.32)
  (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0 - Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  1: 4554.32 (SE +/- 120.82, N = 15; MIN: 3273.38)
  2: 4515.79 (SE +/- 153.50, N = 12; MIN: 2781.43)
  3: 5020.04 (SE +/- 112.92, N = 15; MIN: 3390.91)
  (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0 - Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  1: 6.45462 (SE +/- 0.15837, N = 15; MIN: 5.22)
  2: 6.56237 (SE +/- 0.12641, N = 15; MIN: 5.14)
  3: 7.09130 (SE +/- 0.10920, N = 3; MIN: 6.56)
  (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
QMCPACK 3.10 - Input: simple-H2O (Total Execution Time - Seconds, Fewer Is Better)
  1: 41.84 (SE +/- 0.22, N = 3)
  2: 46.25 (SE +/- 1.39, N = 15)
  3: 50.66 (SE +/- 1.69, N = 12)
  (CXX) g++ options: -fopenmp -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -march=native -O3 -fomit-frame-pointer -ffast-math -lm -pthread
CloverLeaf - Lagrangian-Eulerian Hydrodynamics (Seconds, Fewer Is Better)
  1: 29.54 (SE +/- 0.32, N = 15)
  2: 29.24 (SE +/- 0.23, N = 15)
  3: 29.87 (SE +/- 0.49, N = 15)
  (F9X) gfortran options: -O3 -march=native -funroll-loops -fopenmp
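Some gaps between results in this export are large relative to their reported SE (QMCPACK) while others are not (CloverLeaf). As a rough illustration, the sketch below divides the difference between two means by their combined standard error. This is a back-of-the-envelope heuristic rather than a formal significance test: it assumes the two results are independent and that "SE" is the standard error of the mean, neither of which the export states. The function name is hypothetical.

```python
import math


def gap_in_standard_errors(mean_a, se_a, mean_b, se_b):
    """Difference between two means, in units of their combined standard error."""
    combined_se = math.sqrt(se_a ** 2 + se_b ** 2)
    return abs(mean_a - mean_b) / combined_se


# QMCPACK simple-H2O, result 1 vs result 3 (seconds, values from this export).
qmcpack_gap = gap_in_standard_errors(41.84, 0.22, 50.66, 1.69)

# CloverLeaf, result 1 vs result 2 (seconds, values from this export).
cloverleaf_gap = gap_in_standard_errors(29.54, 0.32, 29.24, 0.23)

print(f"QMCPACK 1 vs 3: {qmcpack_gap:.1f} combined standard errors apart")
print(f"CloverLeaf 1 vs 2: {cloverleaf_gap:.1f} combined standard errors apart")
```

Under these assumptions, a gap of several combined standard errors is unlikely to be run-to-run noise alone, whereas a gap below one is well within it.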
Phoronix Test Suite v10.8.5