AMD EPYC 7601 2P 2021
Tests for a future article. 2 x AMD EPYC 7601 32-Core testing with a Dell 02MJ3T (1.2.5 BIOS) and llvmpipe on Ubuntu 19.10 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2101214-HA-AMDEPYC7619&sor&export=pdf&grr
Configurations: 1, 2, 3
Processor: 2 x AMD EPYC 7601 32-Core (64 Cores / 128 Threads)
Motherboard: Dell 02MJ3T (1.2.5 BIOS)
Chipset: AMD 17h
Memory: 504GB
Disk: 280GB INTEL SSDPED1D280GA + 12 x 500GB Samsung SSD 860 + 120GB INTEL SSDSCKJB120G7R
Graphics: llvmpipe
Monitor: VE228
Network: 2 x Broadcom BCM57416 NetXtreme-E Dual-Media 10G RDMA + 2 x Broadcom NetXtreme BCM5720 2-port PCIe
OS: Ubuntu 19.10
Kernel: 5.9.0-050900rc6daily20200922-generic (x86_64) 20200921
Desktop: GNOME Shell 3.34.1
Display Server: X Server 1.20.5
Display Driver: modesetting 1.20.5
OpenGL: 3.3 Mesa 19.2.8 (LLVM 9.0 128 bits)
Compiler: GCC 9.2.1 20191008
File-System: ext4
Screen Resolution: 1600x1200

Compiler Details
- --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-offload-targets=nvptx-none,hsa --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Details
- CPU Microcode: 0x8001227
Python Details
- Python 2.7.17rc1 + Python 3.7.5
Security Details
- itlb_multihit: Not affected
- l1tf: Not affected
- mds: Not affected
- meltdown: Not affected
- spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp
- spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
- spectre_v2: Mitigation of Full AMD retpoline IBPB: conditional STIBP: disabled RSB filling
- srbds: Not affected
- tsx_async_abort: Not affected
Results overview ("-" means no result is listed for that configuration)
1           2           3           Test
1796.32     1754.21     1808.78     qe: AUSURF112
548.379     548.427     547.939     relion: Basic - CPU
76          71          -           onnx: yolov4 - OpenMP CPU
2188        2318        -           onnx: shufflenet-v2-10 - OpenMP CPU
4466.50     4580.21     4661.89     onednn: Recurrent Neural Network Training - u8s8f32 - CPU
37882537    35226890    -           kripke:
338.71      338.37      340.04      openfoam: Motorbike 60M
3698.78     3877.71     4137.52     onednn: Recurrent Neural Network Inference - bf16bf16bf16 - CPU
4707.16     4615.36     4885.17     onednn: Recurrent Neural Network Training - bf16bf16bf16 - CPU
4554.32     4515.79     5020.04     onednn: Recurrent Neural Network Training - f32 - CPU
3940.07     3865.58     4050.37     onednn: Recurrent Neural Network Inference - f32 - CPU
23.382      23.399      23.201      lammps: 20k Atoms
3557.56     3580.43     4199.67     onednn: Recurrent Neural Network Inference - u8s8f32 - CPU
53          52          -           onnx: fcn-resnet101-11 - OpenMP CPU
58          54          -           onnx: bertsquad-10 - OpenMP CPU
68.585      71.575      -           mnn: inception-v3
6.761       7.012       -           mnn: mobilenet-v1-1.0
10.741      10.909      -           mnn: MobileNetV2_224
54.121      52.860      -           mnn: resnet-v2-50
14.955      14.812      -           mnn: SqueezeNetV1.0
41.839      46.249      50.655      qmcpack: simple-H2O
29.54       29.24       29.87       cloverleaf: Lagrangian-Eulerian Hydrodynamics
2084        2078        -           onnx: super-resolution-10 - OpenMP CPU
34.62       34.23       35.32       openfoam: Motorbike 30M
3.54557     3.42396     3.76272     onednn: Deconvolution Batch shapes_1d - f32 - CPU
107.098     104.004     103.914     build-godot: Time To Compile
138.45      138.90      139.13      dav1d: Chimera 1080p 10-bit
0.780       0.778       0.775       rav1e: 5
0.258       0.262       0.261       rav1e: 1
1.016       1.031       1.030       rav1e: 6
316.0       316.6       316.7       cryptsetup: Twofish-XTS 512b Decryption
317.7       318.1       318.0       cryptsetup: Twofish-XTS 512b Encryption
306.7       306.9       307         cryptsetup: Serpent-XTS 512b Decryption
308.6       308.8       309         cryptsetup: Serpent-XTS 512b Encryption
1276.7      1285.0      1284.3      cryptsetup: AES-XTS 512b Decryption
1279.4      1286.7      1285.4      cryptsetup: AES-XTS 512b Encryption
316.5       316.8       316.9       cryptsetup: Twofish-XTS 256b Decryption
317.4       318.3       318.3       cryptsetup: Twofish-XTS 256b Encryption
306.7       307.0       307.1       cryptsetup: Serpent-XTS 256b Decryption
308.1       308.4       308.9       cryptsetup: Serpent-XTS 256b Encryption
1445.4      1442.6      1453.1      cryptsetup: AES-XTS 256b Decryption
1444.3      1456.9      1454.5      cryptsetup: AES-XTS 256b Encryption
510008      507751      506073      cryptsetup: PBKDF2-whirlpool
1170727     1157453     1168547     cryptsetup: PBKDF2-sha512
709699800   709739033   708880233   amg:
243.31      251.57      248.58      dav1d: Summer Nature 4K
118.049     118.049     118.070     etcpak: ETC2
2.322       2.370       2.336       rav1e: 10
25.907      25.754      -           unpack-firefox: firefox-84.0.source.tar.xz
18.344      18.322      18.336      encode-ape: WAV To APE
512.107     512.072     512.073     synthmark: VoiceMark_100
17.296      17.285      -           encode-wavpack: WAV To WavPack
174.283     174.300     174.166     etcpak: ETC1 + Dithering
16092.925   16073.086   15990.720   lulesh:
26.603      26.627      26.665      encode-ogg: WAV To Ogg
184.747     184.690     184.757     etcpak: ETC1
3.54912     3.49511     3.71634     onednn: Deconvolution Batch shapes_1d - u8s8f32 - CPU
369.108     369.669     -           tnn: CPU - MobileNet v2
333.403     333.272     -           tnn: CPU - SqueezeNet v1.1
629.60      669.24      634.70      dav1d: Summer Nature 1080p
637.03      659.19      634.77      dav1d: Chimera 1080p
10.213      10.207      10.238      encode-opus: WAV To Opus Encode
3.86741     3.81910     3.96369     onednn: IP Shapes 1D - u8s8f32 - CPU
2.90299     2.52500     3.81118     onednn: IP Shapes 1D - f32 - CPU
0.919909    0.902805    2.60366     onednn: Matrix Multiply Batch Shapes Transformer - f32 - CPU
1.37881     1.38889     1.40554     onednn: Matrix Multiply Batch Shapes Transformer - u8s8f32 - CPU
6.45462     6.56237     7.09130     onednn: Deconvolution Batch shapes_3d - f32 - CPU
2.82028     2.67884     4.50130     onednn: IP Shapes 3D - u8s8f32 - CPU
19.8661     19.2173     21.1172     onednn: IP Shapes 3D - f32 - CPU
23.2410     22.7404     25.1595     onednn: Convolution Batch Shapes Auto - u8s8f32 - CPU
17.7153     16.4879     20.4528     onednn: Convolution Batch Shapes Auto - f32 - CPU
3.16880     3.18118     3.18392     onednn: Deconvolution Batch shapes_3d - u8s8f32 - CPU
1296.787    1323.798    1322.339    etcpak: DXT1
23.311      23.382      23.083      lammps: Rhodopsin Protein
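The overview figures are easier to compare as percentage changes against one configuration. The sketch below is illustrative only: it copies three rows from the overview, treats configuration 1 as the baseline (an arbitrary choice, not something the export specifies), and uses hypothetical helper names. The sign has to be read together with each test's "Fewer Is Better" or "More Is Better" direction.

```python
# Three rows copied from the results overview; the selection is illustrative.
results = {
    "qe: AUSURF112": {"1": 1796.32, "2": 1754.21, "3": 1808.78},
    "onednn: Matrix Multiply Batch Shapes Transformer - f32 - CPU": {
        "1": 0.919909, "2": 0.902805, "3": 2.60366,
    },
    "etcpak: DXT1": {"1": 1296.787, "2": 1323.798, "3": 1322.339},
}


def percent_change(baseline, value):
    """Signed change of value relative to baseline, in percent."""
    return (value - baseline) / baseline * 100.0


def relative_to_baseline(rows, baseline_key="1"):
    """Percent change of every other configuration against the baseline one."""
    table = {}
    for test, runs in rows.items():
        base = runs[baseline_key]
        table[test] = {
            key: round(percent_change(base, value), 2)
            for key, value in runs.items()
            if key != baseline_key
        }
    return table


deltas = relative_to_baseline(results)
for test, row in deltas.items():
    print(test, row)
```

A positive number means the configuration reported a larger value than configuration 1, which is a regression for time-based tests and an improvement for throughput-based ones.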
Quantum ESPRESSO 6.7 - Input: AUSURF112 (Seconds, Fewer Is Better)
  2: 1754.21 (SE +/- 5.15, N = 3)
  1: 1796.32 (SE +/- 19.90, N = 9)
  3: 1808.78 (SE +/- 18.18, N = 9)
  1. (F9X) gfortran options: -lopenblas -lFoX_dom -lFoX_sax -lFoX_wxml -lFoX_common -lFoX_utils -lFoX_fsys -lfftw3 -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi
RELION 3.1.1 - Test: Basic - Device: CPU (Seconds, Fewer Is Better)
  3: 547.94 (SE +/- 0.25, N = 3)
  1: 548.38 (SE +/- 0.24, N = 3)
  2: 548.43 (SE +/- 0.25, N = 3)
  1. (CXX) g++ options: -fopenmp -std=c++0x -O3 -rdynamic -ldl -ltiff -lfftw3f -lfftw3 -lpng -pthread -lmpi_cxx -lmpi
ONNX Runtime 1.6 - Model: yolov4 - Device: OpenMP CPU (Inferences Per Minute, More Is Better)
  1: 76 (SE +/- 2.66, N = 12)
  2: 71 (SE +/- 2.07, N = 12)
  1. (CXX) g++ options: -fopenmp -ffunction-sections -fdata-sections -O3 -ldl -lrt
ONNX Runtime 1.6 - Model: shufflenet-v2-10 - Device: OpenMP CPU (Inferences Per Minute, More Is Better)
  2: 2318 (SE +/- 112.64, N = 12)
  1: 2188 (SE +/- 142.39, N = 12)
  1. (CXX) g++ options: -fopenmp -ffunction-sections -fdata-sections -O3 -ldl -lrt
oneDNN 2.0 - Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  1: 4466.50 (SE +/- 216.78, N = 15; MIN: 2939.4)
  2: 4580.21 (SE +/- 178.79, N = 15; MIN: 3020.84)
  3: 4661.89 (SE +/- 148.69, N = 15; MIN: 3208.79)
  1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
Kripke 1.2.4 (Throughput FoM, More Is Better)
  1: 37882537 (SE +/- 1848270.09, N = 15)
  2: 35226890 (SE +/- 1684001.44, N = 12)
  1. (CXX) g++ options: -O3 -fopenmp
OpenFOAM 8 - Input: Motorbike 60M (Seconds, Fewer Is Better)
  2: 338.37 (SE +/- 0.73, N = 3)
  1: 338.71 (SE +/- 0.27, N = 3)
  3: 340.04 (SE +/- 0.68, N = 3)
  1. (CXX) g++ options: -std=c++11 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -ldynamicMesh -ldecompose -lgenericPatchFields -lmetisDecomp -lscotchDecomp -llagrangian -lregionModels -lOpenFOAM -ldl -lm
oneDNN 2.0 - Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
  1: 3698.78 (SE +/- 128.96, N = 15; MIN: 2872.13)
  2: 3877.71 (SE +/- 146.91, N = 15; MIN: 2904.51)
  3: 4137.52 (SE +/- 85.43, N = 15; MIN: 3448.07)
  1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0 - Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
  2: 4615.36 (SE +/- 149.33, N = 15; MIN: 3327.21)
  1: 4707.16 (SE +/- 128.63, N = 12; MIN: 3518.69)
  3: 4885.17 (SE +/- 125.34, N = 15; MIN: 3744.2)
  1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0 - Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  2: 4515.79 (SE +/- 153.50, N = 12; MIN: 2781.43)
  1: 4554.32 (SE +/- 120.82, N = 15; MIN: 3273.38)
  3: 5020.04 (SE +/- 112.92, N = 15; MIN: 3390.91)
  1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0 - Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  2: 3865.58 (SE +/- 125.32, N = 15; MIN: 3206.95)
  1: 3940.07 (SE +/- 64.36, N = 15; MIN: 3412.94)
  3: 4050.37 (SE +/- 94.46, N = 12; MIN: 3493.32)
  1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
LAMMPS Molecular Dynamics Simulator 29Oct2020 - Model: 20k Atoms (ns/day, More Is Better)
  2: 23.40 (SE +/- 0.09, N = 3)
  1: 23.38 (SE +/- 0.08, N = 3)
  3: 23.20 (SE +/- 0.05, N = 3)
  1. (CXX) g++ options: -O3 -pthread -lm
oneDNN 2.0 - Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  1: 3557.56 (SE +/- 35.53, N = 3; MIN: 3305.84)
  2: 3580.43 (SE +/- 114.13, N = 15; MIN: 3052.37)
  3: 4199.67 (SE +/- 89.60, N = 15; MIN: 3498.58)
  1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
ONNX Runtime 1.6 - Model: fcn-resnet101-11 - Device: OpenMP CPU (Inferences Per Minute, More Is Better)
  1: 53 (SE +/- 0.88, N = 3)
  2: 52 (SE +/- 1.02, N = 12)
  1. (CXX) g++ options: -fopenmp -ffunction-sections -fdata-sections -O3 -ldl -lrt
ONNX Runtime 1.6 - Model: bertsquad-10 - Device: OpenMP CPU (Inferences Per Minute, More Is Better)
  1: 58 (SE +/- 0.44, N = 3)
  2: 54 (SE +/- 2.87, N = 9)
  1. (CXX) g++ options: -fopenmp -ffunction-sections -fdata-sections -O3 -ldl -lrt
Mobile Neural Network 1.1.1 - Model: inception-v3 (ms, Fewer Is Better)
  1: 68.59 (SE +/- 0.67, N = 3; MIN: 62.84 / MAX: 229.88)
  2: 71.58 (SE +/- 1.22, N = 3; MIN: 64.39 / MAX: 186.82)
  1. (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -rdynamic -pthread -ldl
Mobile Neural Network 1.1.1 - Model: mobilenet-v1-1.0 (ms, Fewer Is Better)
  1: 6.761 (SE +/- 0.103, N = 3; MIN: 6.2 / MAX: 8.27)
  2: 7.012 (SE +/- 0.613, N = 3; MIN: 6 / MAX: 24.38)
  1. (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -rdynamic -pthread -ldl
Mobile Neural Network 1.1.1 - Model: MobileNetV2_224 (ms, Fewer Is Better)
  1: 10.74 (SE +/- 0.19, N = 3; MIN: 10.16 / MAX: 11.79)
  2: 10.91 (SE +/- 0.30, N = 3; MIN: 10.26 / MAX: 12.2)
  1. (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -rdynamic -pthread -ldl
Mobile Neural Network 1.1.1 - Model: resnet-v2-50 (ms, Fewer Is Better)
  2: 52.86 (SE +/- 2.27, N = 3; MIN: 46.53 / MAX: 819.73)
  1: 54.12 (SE +/- 1.07, N = 3; MIN: 46.9 / MAX: 742.65)
  1. (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -rdynamic -pthread -ldl
Mobile Neural Network 1.1.1 - Model: SqueezeNetV1.0 (ms, Fewer Is Better)
  2: 14.81 (SE +/- 0.20, N = 3; MIN: 13.46 / MAX: 30.98)
  1: 14.96 (SE +/- 0.10, N = 3; MIN: 13.84 / MAX: 36.38)
  1. (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -rdynamic -pthread -ldl
QMCPACK 3.10 - Input: simple-H2O (Total Execution Time - Seconds, Fewer Is Better)
  1: 41.84 (SE +/- 0.22, N = 3)
  2: 46.25 (SE +/- 1.39, N = 15)
  3: 50.66 (SE +/- 1.69, N = 12)
  1. (CXX) g++ options: -fopenmp -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -march=native -O3 -fomit-frame-pointer -ffast-math -lm -pthread
CloverLeaf - Lagrangian-Eulerian Hydrodynamics (Seconds, Fewer Is Better)
  2: 29.24 (SE +/- 0.23, N = 15)
  1: 29.54 (SE +/- 0.32, N = 15)
  3: 29.87 (SE +/- 0.49, N = 15)
  1. (F9X) gfortran options: -O3 -march=native -funroll-loops -fopenmp
ONNX Runtime 1.6 - Model: super-resolution-10 - Device: OpenMP CPU (Inferences Per Minute, More Is Better)
  1: 2084 (SE +/- 13.94, N = 3)
  2: 2078 (SE +/- 32.31, N = 3)
  1. (CXX) g++ options: -fopenmp -ffunction-sections -fdata-sections -O3 -ldl -lrt
OpenFOAM 8 - Input: Motorbike 30M (Seconds, Fewer Is Better)
  2: 34.23 (SE +/- 0.14, N = 3)
  1: 34.62 (SE +/- 0.10, N = 3)
  3: 35.32 (SE +/- 0.33, N = 15)
  1. (CXX) g++ options: -std=c++11 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -ldynamicMesh -ldecompose -lgenericPatchFields -lmetisDecomp -lscotchDecomp -llagrangian -lregionModels -lOpenFOAM -ldl -lm
oneDNN 2.0 - Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  2: 3.42396 (SE +/- 0.03472, N = 15; MIN: 2.94)
  1: 3.54557 (SE +/- 0.03944, N = 15; MIN: 2.98)
  3: 3.76272 (SE +/- 0.04485, N = 15; MIN: 3.09)
  1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
Timed Godot Game Engine Compilation 3.2.3 - Time To Compile (Seconds, Fewer Is Better)
  3: 103.91 (SE +/- 1.34, N = 3)
  2: 104.00 (SE +/- 1.42, N = 3)
  1: 107.10 (SE +/- 1.83, N = 3)
dav1d 0.8.1 - Video Input: Chimera 1080p 10-bit (FPS, More Is Better)
  3: 139.13 (SE +/- 0.14, N = 3; MIN: 96.19 / MAX: 219.5)
  2: 138.90 (SE +/- 0.30, N = 3; MIN: 96.19 / MAX: 217.56)
  1: 138.45 (SE +/- 0.41, N = 3; MIN: 95.91 / MAX: 217.11)
  1. (CC) gcc options: -pthread
rav1e 0.4 - Speed: 5 (Frames Per Second, More Is Better)
  1: 0.780 (SE +/- 0.006, N = 3)
  2: 0.778 (SE +/- 0.005, N = 3)
  3: 0.775 (SE +/- 0.005, N = 3)
rav1e 0.4 - Speed: 1 (Frames Per Second, More Is Better)
  2: 0.262 (SE +/- 0.001, N = 3)
  3: 0.261 (SE +/- 0.001, N = 3)
  1: 0.258 (SE +/- 0.002, N = 3)
rav1e 0.4 - Speed: 6 (Frames Per Second, More Is Better)
  2: 1.031 (SE +/- 0.005, N = 3)
  3: 1.030 (SE +/- 0.014, N = 3)
  1: 1.016 (SE +/- 0.009, N = 3)
Cryptsetup - Twofish-XTS 512b Decryption (MiB/s, More Is Better)
  3: 316.7 (SE +/- 0.12, N = 3)
  2: 316.6 (SE +/- 0.06, N = 7)
  1: 316.0 (SE +/- 0.05, N = 2)
Cryptsetup - Twofish-XTS 512b Encryption (MiB/s, More Is Better)
  2: 318.1 (SE +/- 0.06, N = 7)
  3: 318.0 (SE +/- 0.20, N = 2)
  1: 317.7 (SE +/- 0.19, N = 3)
Cryptsetup - Serpent-XTS 512b Decryption (MiB/s, More Is Better)
  3: 307.0
  2: 306.9
  1: 306.7
  SE +/- 0.17, N = 4; SE +/- 0.05, N = 2 (only two SE entries are listed for the three results)
Cryptsetup - Serpent-XTS 512b Encryption (MiB/s, More Is Better)
  3: 309.0
  2: 308.8
  1: 308.6
  SE +/- 0.09, N = 6; SE +/- 0.10, N = 2 (only two SE entries are listed for the three results)
Cryptsetup - AES-XTS 512b Decryption (MiB/s, More Is Better)
  2: 1285.0 (SE +/- 1.94, N = 7)
  3: 1284.3 (SE +/- 1.76, N = 3)
  1: 1276.7 (SE +/- 2.31, N = 3)
Cryptsetup - AES-XTS 512b Encryption (MiB/s, More Is Better)
  2: 1286.7 (SE +/- 1.46, N = 7)
  3: 1285.4 (SE +/- 1.42, N = 3)
  1: 1279.4 (SE +/- 1.32, N = 3)
Cryptsetup - Twofish-XTS 256b Decryption (MiB/s, More Is Better)
  3: 316.9 (SE +/- 0.10, N = 3)
  2: 316.8 (SE +/- 0.07, N = 7)
  1: 316.5 (SE +/- 0.06, N = 3)
Cryptsetup - Twofish-XTS 256b Encryption (MiB/s, More Is Better)
  3: 318.3 (SE +/- 0.03, N = 3)
  2: 318.3 (SE +/- 0.08, N = 7)
  1: 317.4 (SE +/- 0.37, N = 3)
Cryptsetup - Serpent-XTS 256b Decryption (MiB/s, More Is Better)
  3: 307.1 (SE +/- 0.00, N = 3)
  2: 307.0 (SE +/- 0.10, N = 7)
  1: 306.7 (SE +/- 0.12, N = 3)
Cryptsetup - Serpent-XTS 256b Encryption (MiB/s, More Is Better)
  3: 308.9 (SE +/- 0.03, N = 3)
  2: 308.4 (SE +/- 0.53, N = 7)
  1: 308.1 (SE +/- 0.58, N = 3)
Cryptsetup - AES-XTS 256b Decryption (MiB/s, More Is Better)
  3: 1453.1 (SE +/- 2.05, N = 3)
  1: 1445.4 (SE +/- 2.14, N = 3)
  2: 1442.6 (SE +/- 12.89, N = 7)
Cryptsetup - AES-XTS 256b Encryption (MiB/s, More Is Better)
  2: 1456.9 (SE +/- 2.27, N = 7)
  3: 1454.5 (SE +/- 1.65, N = 3)
  1: 1444.3 (SE +/- 3.56, N = 3)
Cryptsetup - PBKDF2-whirlpool (Iterations Per Second, More Is Better)
  1: 510008 (SE +/- 572.73, N = 3)
  2: 507751 (SE +/- 280.29, N = 7)
  3: 506073 (SE +/- 975.00, N = 3)
Cryptsetup - PBKDF2-sha512 (Iterations Per Second, More Is Better)
  1: 1170727 (SE +/- 1900.25, N = 3)
  3: 1168547 (SE +/- 434.00, N = 3)
  2: 1157453 (SE +/- 12198.12, N = 7)
Algebraic Multi-Grid Benchmark 1.2 (Figure Of Merit, More Is Better)
  2: 709739033 (SE +/- 201948.45, N = 3)
  1: 709699800 (SE +/- 1313948.59, N = 3)
  3: 708880233 (SE +/- 658905.95, N = 3)
  1. (CC) gcc options: -lparcsr_ls -lparcsr_mv -lseq_mv -lIJ_mv -lkrylov -lHYPRE_utilities -lm -fopenmp -pthread -lmpi
dav1d 0.8.1 - Video Input: Summer Nature 4K (FPS, More Is Better)
  2: 251.57 (SE +/- 1.22, N = 3; MIN: 91.04 / MAX: 277.22)
  3: 248.58 (SE +/- 4.61, N = 12; MIN: 85.73 / MAX: 286.18)
  1: 243.31 (SE +/- 4.09, N = 12; MIN: 81.19 / MAX: 282.97)
  1. (CC) gcc options: -pthread
Etcpak 0.7 - Configuration: ETC2 (Mpx/s, More Is Better)
  3: 118.07 (SE +/- 0.00, N = 3)
  2: 118.05 (SE +/- 0.01, N = 3)
  1: 118.05 (SE +/- 0.01, N = 3)
  1. (CXX) g++ options: -O3 -march=native -std=c++11 -lpthread
rav1e 0.4 - Speed: 10 (Frames Per Second, More Is Better)
  2: 2.370 (SE +/- 0.016, N = 3)
  3: 2.336 (SE +/- 0.011, N = 3)
  1: 2.322 (SE +/- 0.021, N = 3)
Unpacking Firefox 84.0 - Extracting: firefox-84.0.source.tar.xz (Seconds, Fewer Is Better)
  2: 25.75 (SE +/- 0.03, N = 4)
  1: 25.91 (SE +/- 0.08, N = 4)
Monkey Audio Encoding 3.99.6 - WAV To APE (Seconds, Fewer Is Better)
  2: 18.32 (SE +/- 0.00, N = 5)
  3: 18.34 (SE +/- 0.00, N = 5)
  1: 18.34 (SE +/- 0.01, N = 5)
  1. (CXX) g++ options: -O3 -pedantic -rdynamic -lrt
Google SynthMark 20201109 - Test: VoiceMark_100 (Voices, More Is Better)
  1: 512.11 (SE +/- 0.00, N = 3)
  3: 512.07 (SE +/- 0.02, N = 3)
  2: 512.07 (SE +/- 0.02, N = 3)
  1. (CXX) g++ options: -lm -lpthread -std=c++11 -Ofast
WavPack Audio Encoding 5.3 - WAV To WavPack (Seconds, Fewer Is Better)
  2: 17.29 (SE +/- 0.01, N = 5)
  1: 17.30 (SE +/- 0.00, N = 5)
  1. (CXX) g++ options: -rdynamic
Etcpak 0.7 - Configuration: ETC1 + Dithering (Mpx/s, More Is Better)
  2: 174.30 (SE +/- 0.02, N = 3)
  1: 174.28 (SE +/- 0.02, N = 3)
  3: 174.17 (SE +/- 0.05, N = 3)
  1. (CXX) g++ options: -O3 -march=native -std=c++11 -lpthread
LULESH 2.0.3 (z/s, More Is Better)
  1: 16092.93 (SE +/- 42.67, N = 3)
  2: 16073.09 (SE +/- 42.52, N = 3)
  3: 15990.72 (SE +/- 65.00, N = 3)
  1. (CXX) g++ options: -O3 -fopenmp -lm -pthread -lmpi_cxx -lmpi
Ogg Audio Encoding 1.3.4 - WAV To Ogg (Seconds, Fewer Is Better)
  1: 26.60 (SE +/- 0.02, N = 3)
  2: 26.63 (SE +/- 0.00, N = 3)
  3: 26.67 (SE +/- 0.01, N = 3)
  1. (CC) gcc options: -O2 -ffast-math -fsigned-char
Etcpak 0.7 - Configuration: ETC1 (Mpx/s, More Is Better)
  3: 184.76 (SE +/- 0.02, N = 3)
  1: 184.75 (SE +/- 0.02, N = 3)
  2: 184.69 (SE +/- 0.02, N = 3)
  1. (CXX) g++ options: -O3 -march=native -std=c++11 -lpthread
oneDNN 2.0 - Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  2: 3.49511 (SE +/- 0.04588, N = 5; MIN: 3)
  1: 3.54912 (SE +/- 0.03910, N = 3; MIN: 3)
  3: 3.71634 (SE +/- 0.01698, N = 3; MIN: 3.23)
  1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
TNN 0.2.3 - Target: CPU - Model: MobileNet v2 (ms, Fewer Is Better)
  1: 369.11 (SE +/- 1.37, N = 3; MIN: 357.24 / MAX: 557.3)
  2: 369.67 (SE +/- 0.16, N = 3; MIN: 358.63 / MAX: 519.86)
  1. (CXX) g++ options: -fopenmp -pthread -fvisibility=hidden -O3 -rdynamic -ldl
TNN 0.2.3 - Target: CPU - Model: SqueezeNet v1.1 (ms, Fewer Is Better)
  2: 333.27 (SE +/- 0.15, N = 3; MIN: 332.42 / MAX: 334.08)
  1: 333.40 (SE +/- 0.06, N = 3; MIN: 332.68 / MAX: 338.81)
  1. (CXX) g++ options: -fopenmp -pthread -fvisibility=hidden -O3 -rdynamic -ldl
dav1d 0.8.1 - Video Input: Summer Nature 1080p (FPS, More Is Better)
  2: 669.24 (SE +/- 4.94, N = 3; MIN: 231.81 / MAX: 754.48)
  3: 634.70 (SE +/- 9.48, N = 15; MIN: 194.36 / MAX: 755.24)
  1: 629.60 (SE +/- 8.65, N = 15; MIN: 194.05 / MAX: 739.75)
  1. (CC) gcc options: -pthread
dav1d 0.8.1 - Video Input: Chimera 1080p (FPS, More Is Better)
  2: 659.19 (SE +/- 1.84, N = 3; MIN: 348.24 / MAX: 815.29)
  1: 637.03 (SE +/- 9.95, N = 3; MIN: 344.69 / MAX: 796.13)
  3: 634.77 (SE +/- 9.40, N = 4; MIN: 349.39 / MAX: 815.27)
  1. (CC) gcc options: -pthread
Opus Codec Encoding 1.3.1 - WAV To Opus Encode (Seconds, Fewer Is Better)
  2: 10.21 (SE +/- 0.00, N = 5)
  1: 10.21 (SE +/- 0.00, N = 5)
  3: 10.24 (SE +/- 0.00, N = 5)
  1. (CXX) g++ options: -fvisibility=hidden -logg -lm
oneDNN 2.0 - Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  2: 3.81910 (SE +/- 0.05005, N = 3; MIN: 3.31)
  1: 3.86741 (SE +/- 0.04801, N = 4; MIN: 3.27)
  3: 3.96369 (SE +/- 0.03574, N = 3; MIN: 3.38)
  1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0 - Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  2: 2.52500 (SE +/- 0.02862, N = 3; MIN: 2.08)
  1: 2.90299 (SE +/- 0.04679, N = 3; MIN: 2.24)
  3: 3.81118 (SE +/- 0.05747, N = 3; MIN: 3.25)
  1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0 - Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  2: 0.902805 (SE +/- 0.002458, N = 3; MIN: 0.77)
  1: 0.919909 (SE +/- 0.007779, N = 3; MIN: 0.77)
  3: 2.603660 (SE +/- 0.009404, N = 3; MIN: 2.01)
  1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0 - Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  1: 1.37881 (SE +/- 0.00740, N = 3; MIN: 1.12)
  2: 1.38889 (SE +/- 0.00510, N = 3; MIN: 1.21)
  3: 1.40554 (SE +/- 0.00370, N = 3; MIN: 1.26)
  1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0 - Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  1: 6.45462 (SE +/- 0.15837, N = 15; MIN: 5.22)
  2: 6.56237 (SE +/- 0.12641, N = 15; MIN: 5.14)
  3: 7.09130 (SE +/- 0.10920, N = 3; MIN: 6.56)
  1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0 - Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  2: 2.67884 (SE +/- 0.02321, N = 3; MIN: 2.27)
  1: 2.82028 (SE +/- 0.03873, N = 3; MIN: 2.33)
  3: 4.50130 (SE +/- 0.01074, N = 3; MIN: 4.09)
  1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0 - Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  2: 19.22 (SE +/- 0.09, N = 3; MIN: 18.41)
  1: 19.87 (SE +/- 0.14, N = 3; MIN: 18.97)
  3: 21.12 (SE +/- 0.11, N = 3; MIN: 20.42)
  1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0 - Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  2: 22.74 (SE +/- 0.24, N = 3; MIN: 20.43)
  1: 23.24 (SE +/- 0.11, N = 3; MIN: 20.59)
  3: 25.16 (SE +/- 0.09, N = 3; MIN: 22.78)
  1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0 - Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  2: 16.49 (SE +/- 0.04, N = 3; MIN: 15.75)
  1: 17.72 (SE +/- 0.02, N = 3; MIN: 16.71)
  3: 20.45 (SE +/- 0.04, N = 3; MIN: 19.21)
  1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0 - Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  1: 3.16880 (SE +/- 0.03490, N = 6; MIN: 2.83)
  2: 3.18118 (SE +/- 0.01629, N = 3; MIN: 2.88)
  3: 3.18392 (SE +/- 0.01545, N = 3; MIN: 2.91)
  1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
Etcpak 0.7 - Configuration: DXT1 (Mpx/s, More Is Better)
  2: 1323.80 (SE +/- 0.97, N = 3)
  3: 1322.34 (SE +/- 0.70, N = 3)
  1: 1296.79 (SE +/- 1.47, N = 3)
  1. (CXX) g++ options: -O3 -march=native -std=c++11 -lpthread
LAMMPS Molecular Dynamics Simulator 29Oct2020 - Model: Rhodopsin Protein (ns/day, More Is Better)
  2: 23.38 (SE +/- 0.06, N = 3)
  1: 23.31 (SE +/- 0.04, N = 3)
  3: 23.08 (SE +/- 0.23, N = 3)
  1. (CXX) g++ options: -O3 -pthread -lm
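Each result above carries a standard error (SE), which gives a rough way to judge whether two configurations really differ. The sketch below is a hedged illustration, not part of the export: it expresses the gap between two means in units of their combined standard error, assuming the runs are independent and that SE entries are listed in the same order as the configuration labels. The function name and the threshold of about 2 are rule-of-thumb assumptions.

```python
import math


def gap_in_standard_errors(mean_a, se_a, mean_b, se_b):
    """Absolute difference of two means, in units of their combined SE.

    Assumes the two runs are independent, so the variances add.
    """
    combined = math.sqrt(se_a ** 2 + se_b ** 2)
    return abs(mean_a - mean_b) / combined


# (mean, SE) pairs copied from the per-test results above.
relion_run3 = (547.94, 0.25)
relion_run1 = (548.38, 0.24)
matmul_f32_run1 = (0.919909, 0.007779)
matmul_f32_run3 = (2.603660, 0.009404)

relion_gap = gap_in_standard_errors(*relion_run3, *relion_run1)
matmul_gap = gap_in_standard_errors(*matmul_f32_run1, *matmul_f32_run3)

# Rule of thumb (an assumption): a gap below roughly 2 is hard to
# tell apart from run-to-run noise.
print(f"RELION, 3 vs 1: {relion_gap:.2f} combined SEs")
print(f"oneDNN matmul f32, 1 vs 3: {matmul_gap:.2f} combined SEs")
```

Under these assumptions the RELION results are within noise of each other, while the oneDNN matrix-multiply f32 result for configuration 3 is far outside it.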
Phoronix Test Suite v10.8.5