AMD EPYC 7601 2P 2021
Tests for a future article. 2 x AMD EPYC 7601 32-Core testing with a Dell 02MJ3T (1.2.5 BIOS) and llvmpipe on Ubuntu 19.10 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2101214-HA-AMDEPYC7619&export=pdf&grt&sor .
AMD EPYC 7601 2P 2021 - System Details (runs 1, 2, 3)
Processor: 2 x AMD EPYC 7601 32-Core (64 Cores / 128 Threads)
Motherboard: Dell 02MJ3T (1.2.5 BIOS)
Chipset: AMD 17h
Memory: 504GB
Disk: 280GB INTEL SSDPED1D280GA + 12 x 500GB Samsung SSD 860 + 120GB INTEL SSDSCKJB120G7R
Graphics: llvmpipe
Monitor: VE228
Network: 2 x Broadcom BCM57416 NetXtreme-E Dual-Media 10G RDMA + 2 x Broadcom NetXtreme BCM5720 2-port PCIe
OS: Ubuntu 19.10
Kernel: 5.9.0-050900rc6daily20200922-generic (x86_64) 20200921
Desktop: GNOME Shell 3.34.1
Display Server: X Server 1.20.5
Display Driver: modesetting 1.20.5
OpenGL: 3.3 Mesa 19.2.8 (LLVM 9.0 128 bits)
Compiler: GCC 9.2.1 20191008
File-System: ext4
Screen Resolution: 1600x1200
Compiler Details - --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-offload-targets=nvptx-none,hsa --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Details - CPU Microcode: 0x8001227
Python Details - Python 2.7.17rc1 + Python 3.7.5
Security Details - itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Full AMD retpoline IBPB: conditional STIBP: disabled RSB filling + srbds: Not affected + tsx_async_abort: Not affected
AMD EPYC 7601 2P 2021 - Result Summary
Test | 1 | 2 | 3
amg | 709699800 | 709739033 | 708880233
cloverleaf: Lagrangian-Eulerian Hydrodynamics | 29.54 | 29.24 | 29.87
cryptsetup: PBKDF2-sha512 | 1170727 | 1157453 | 1168547
cryptsetup: PBKDF2-whirlpool | 510008 | 507751 | 506073
cryptsetup: AES-XTS 256b Encryption | 1444.3 | 1456.9 | 1454.5
cryptsetup: AES-XTS 256b Decryption | 1445.4 | 1442.6 | 1453.1
cryptsetup: Serpent-XTS 256b Encryption | 308.1 | 308.4 | 308.9
cryptsetup: Serpent-XTS 256b Decryption | 306.7 | 307.0 | 307.1
cryptsetup: Twofish-XTS 256b Encryption | 317.4 | 318.3 | 318.3
cryptsetup: Twofish-XTS 256b Decryption | 316.5 | 316.8 | 316.9
cryptsetup: AES-XTS 512b Encryption | 1279.4 | 1286.7 | 1285.4
cryptsetup: AES-XTS 512b Decryption | 1276.7 | 1285.0 | 1284.3
cryptsetup: Serpent-XTS 512b Encryption | 308.6 | 308.8 | 309
cryptsetup: Serpent-XTS 512b Decryption | 306.7 | 306.9 | 307
cryptsetup: Twofish-XTS 512b Encryption | 317.7 | 318.1 | 318.0
cryptsetup: Twofish-XTS 512b Decryption | 316.0 | 316.6 | 316.7
dav1d: Chimera 1080p | 637.03 | 659.19 | 634.77
dav1d: Summer Nature 4K | 243.31 | 251.57 | 248.58
dav1d: Summer Nature 1080p | 629.60 | 669.24 | 634.70
dav1d: Chimera 1080p 10-bit | 138.45 | 138.90 | 139.13
etcpak: DXT1 | 1296.787 | 1323.798 | 1322.339
etcpak: ETC1 | 184.747 | 184.690 | 184.757
etcpak: ETC2 | 118.049 | 118.049 | 118.070
etcpak: ETC1 + Dithering | 174.283 | 174.300 | 174.166
synthmark: VoiceMark_100 | 512.107 | 512.072 | 512.073
kripke | 37882537 | 35226890 | -
lammps: 20k Atoms | 23.382 | 23.399 | 23.201
lammps: Rhodopsin Protein | 23.311 | 23.382 | 23.083
lulesh | 16092.925 | 16073.086 | 15990.720
mnn: SqueezeNetV1.0 | 14.955 | 14.812 | -
mnn: resnet-v2-50 | 54.121 | 52.860 | -
mnn: MobileNetV2_224 | 10.741 | 10.909 | -
mnn: mobilenet-v1-1.0 | 6.761 | 7.012 | -
mnn: inception-v3 | 68.585 | 71.575 | -
encode-ape: WAV To APE | 18.344 | 18.322 | 18.336
encode-ogg: WAV To Ogg | 26.603 | 26.627 | 26.665
onednn: IP Shapes 1D - f32 - CPU | 2.90299 | 2.52500 | 3.81118
onednn: IP Shapes 3D - f32 - CPU | 19.8661 | 19.2173 | 21.1172
onednn: IP Shapes 1D - u8s8f32 - CPU | 3.86741 | 3.81910 | 3.96369
onednn: IP Shapes 3D - u8s8f32 - CPU | 2.82028 | 2.67884 | 4.50130
onednn: Convolution Batch Shapes Auto - f32 - CPU | 17.7153 | 16.4879 | 20.4528
onednn: Deconvolution Batch shapes_1d - f32 - CPU | 3.54557 | 3.42396 | 3.76272
onednn: Deconvolution Batch shapes_3d - f32 - CPU | 6.45462 | 6.56237 | 7.09130
onednn: Convolution Batch Shapes Auto - u8s8f32 - CPU | 23.2410 | 22.7404 | 25.1595
onednn: Deconvolution Batch shapes_1d - u8s8f32 - CPU | 3.54912 | 3.49511 | 3.71634
onednn: Deconvolution Batch shapes_3d - u8s8f32 - CPU | 3.16880 | 3.18118 | 3.18392
onednn: Recurrent Neural Network Training - f32 - CPU | 4554.32 | 4515.79 | 5020.04
onednn: Recurrent Neural Network Inference - f32 - CPU | 3940.07 | 3865.58 | 4050.37
onednn: Recurrent Neural Network Training - u8s8f32 - CPU | 4466.50 | 4580.21 | 4661.89
onednn: Recurrent Neural Network Inference - u8s8f32 - CPU | 3557.56 | 3580.43 | 4199.67
onednn: Matrix Multiply Batch Shapes Transformer - f32 - CPU | 0.919909 | 0.902805 | 2.60366
onednn: Recurrent Neural Network Training - bf16bf16bf16 - CPU | 4707.16 | 4615.36 | 4885.17
onednn: Recurrent Neural Network Inference - bf16bf16bf16 - CPU | 3698.78 | 3877.71 | 4137.52
onednn: Matrix Multiply Batch Shapes Transformer - u8s8f32 - CPU | 1.37881 | 1.38889 | 1.40554
onnx: yolov4 - OpenMP CPU | 76 | 71 | -
onnx: bertsquad-10 - OpenMP CPU | 58 | 54 | -
onnx: fcn-resnet101-11 - OpenMP CPU | 53 | 52 | -
onnx: shufflenet-v2-10 - OpenMP CPU | 2188 | 2318 | -
onnx: super-resolution-10 - OpenMP CPU | 2084 | 2078 | -
openfoam: Motorbike 30M | 34.62 | 34.23 | 35.32
openfoam: Motorbike 60M | 338.71 | 338.37 | 340.04
encode-opus: WAV To Opus Encode | 10.213 | 10.207 | 10.238
qmcpack: simple-H2O | 41.839 | 46.249 | 50.655
qe: AUSURF112 | 1796.32 | 1754.21 | 1808.78
rav1e: 1 | 0.258 | 0.262 | 0.261
rav1e: 5 | 0.780 | 0.778 | 0.775
rav1e: 6 | 1.016 | 1.031 | 1.030
rav1e: 10 | 2.322 | 2.370 | 2.336
relion: Basic - CPU | 548.379 | 548.427 | 547.939
build-godot: Time To Compile | 107.098 | 104.004 | 103.914
tnn: CPU - MobileNet v2 | 369.108 | 369.669 | -
tnn: CPU - SqueezeNet v1.1 | 333.403 | 333.272 | -
unpack-firefox: firefox-84.0.source.tar.xz | 25.907 | 25.754 | -
encode-wavpack: WAV To WavPack | 17.296 | 17.285 | -
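As a rough aid for reading the summary figures, the relative difference between two runs can be computed directly. The sketch below is illustrative only: the helper name `percent_change` is hypothetical, and the sample rows are run 1 and run 3 values copied from the summary. All three rows are "fewer is better" metrics, so a positive result means run 3 was slower than run 1.

```python
def percent_change(baseline: float, value: float) -> float:
    """Signed change of `value` relative to `baseline`, in percent."""
    return (value - baseline) / baseline * 100.0


# (run 1, run 3) pairs copied from the summary; all are "fewer is better".
samples = {
    "onednn: Matrix Multiply Batch Shapes Transformer - f32 - CPU": (0.919909, 2.60366),
    "onednn: IP Shapes 3D - u8s8f32 - CPU": (2.82028, 4.50130),
    "cloverleaf: Lagrangian-Eulerian Hydrodynamics": (29.54, 29.87),
}

for name, (run1, run3) in samples.items():
    # Positive percentages mean run 3 took longer than run 1.
    print(f"{name}: {percent_change(run1, run3):+.1f}%")
```

For "more is better" metrics the sign reads the other way round, so the direction printed with each result should be checked before comparing.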
Algebraic Multi-Grid Benchmark 1.2 (Figure Of Merit, More Is Better): Run 2: 709739033 (SE +/- 201948.45, N = 3); Run 1: 709699800 (SE +/- 1313948.59, N = 3); Run 3: 708880233 (SE +/- 658905.95, N = 3). (CC) gcc options: -lparcsr_ls -lparcsr_mv -lseq_mv -lIJ_mv -lkrylov -lHYPRE_utilities -lm -fopenmp -pthread -lmpi
CloverLeaf, Lagrangian-Eulerian Hydrodynamics (Seconds, Fewer Is Better): Run 2: 29.24 (SE +/- 0.23, N = 15); Run 1: 29.54 (SE +/- 0.32, N = 15); Run 3: 29.87 (SE +/- 0.49, N = 15). (F9X) gfortran options: -O3 -march=native -funroll-loops -fopenmp
Cryptsetup PBKDF2-sha512 (Iterations Per Second, More Is Better): Run 1: 1170727 (SE +/- 1900.25, N = 3); Run 3: 1168547 (SE +/- 434.00, N = 3); Run 2: 1157453 (SE +/- 12198.12, N = 7)
Cryptsetup PBKDF2-whirlpool (Iterations Per Second, More Is Better): Run 1: 510008 (SE +/- 572.73, N = 3); Run 2: 507751 (SE +/- 280.29, N = 7); Run 3: 506073 (SE +/- 975.00, N = 3)
Cryptsetup AES-XTS 256b Encryption (MiB/s, More Is Better): Run 2: 1456.9 (SE +/- 2.27, N = 7); Run 3: 1454.5 (SE +/- 1.65, N = 3); Run 1: 1444.3 (SE +/- 3.56, N = 3)
Cryptsetup AES-XTS 256b Decryption (MiB/s, More Is Better): Run 3: 1453.1 (SE +/- 2.05, N = 3); Run 1: 1445.4 (SE +/- 2.14, N = 3); Run 2: 1442.6 (SE +/- 12.89, N = 7)
Cryptsetup Serpent-XTS 256b Encryption (MiB/s, More Is Better): Run 3: 308.9 (SE +/- 0.03, N = 3); Run 2: 308.4 (SE +/- 0.53, N = 7); Run 1: 308.1 (SE +/- 0.58, N = 3)
Cryptsetup Serpent-XTS 256b Decryption (MiB/s, More Is Better): Run 3: 307.1 (SE +/- 0.00, N = 3); Run 2: 307.0 (SE +/- 0.10, N = 7); Run 1: 306.7 (SE +/- 0.12, N = 3)
Cryptsetup Twofish-XTS 256b Encryption (MiB/s, More Is Better): Run 3: 318.3 (SE +/- 0.03, N = 3); Run 2: 318.3 (SE +/- 0.08, N = 7); Run 1: 317.4 (SE +/- 0.37, N = 3)
Cryptsetup Twofish-XTS 256b Decryption (MiB/s, More Is Better): Run 3: 316.9 (SE +/- 0.10, N = 3); Run 2: 316.8 (SE +/- 0.07, N = 7); Run 1: 316.5 (SE +/- 0.06, N = 3)
Cryptsetup AES-XTS 512b Encryption (MiB/s, More Is Better): Run 2: 1286.7 (SE +/- 1.46, N = 7); Run 3: 1285.4 (SE +/- 1.42, N = 3); Run 1: 1279.4 (SE +/- 1.32, N = 3)
Cryptsetup AES-XTS 512b Decryption (MiB/s, More Is Better): Run 2: 1285.0 (SE +/- 1.94, N = 7); Run 3: 1284.3 (SE +/- 1.76, N = 3); Run 1: 1276.7 (SE +/- 2.31, N = 3)
Cryptsetup Serpent-XTS 512b Encryption (MiB/s, More Is Better): Run 3: 309.0; Run 2: 308.8; Run 1: 308.6. Only two SE entries are listed for the three runs, in this order: SE +/- 0.09, N = 6; SE +/- 0.10, N = 2
Cryptsetup Serpent-XTS 512b Decryption (MiB/s, More Is Better): Run 3: 307.0; Run 2: 306.9; Run 1: 306.7. Only two SE entries are listed for the three runs, in this order: SE +/- 0.17, N = 4; SE +/- 0.05, N = 2
Cryptsetup Twofish-XTS 512b Encryption (MiB/s, More Is Better): Run 2: 318.1 (SE +/- 0.06, N = 7); Run 3: 318.0 (SE +/- 0.20, N = 2); Run 1: 317.7 (SE +/- 0.19, N = 3)
Cryptsetup Twofish-XTS 512b Decryption (MiB/s, More Is Better): Run 3: 316.7 (SE +/- 0.12, N = 3); Run 2: 316.6 (SE +/- 0.06, N = 7); Run 1: 316.0 (SE +/- 0.05, N = 2)
dav1d 0.8.1, Video Input: Chimera 1080p (FPS, More Is Better): Run 2: 659.19 (SE +/- 1.84, N = 3; MIN: 348.24 / MAX: 815.29); Run 1: 637.03 (SE +/- 9.95, N = 3; MIN: 344.69 / MAX: 796.13); Run 3: 634.77 (SE +/- 9.40, N = 4; MIN: 349.39 / MAX: 815.27)
dav1d 0.8.1, Video Input: Summer Nature 4K (FPS, More Is Better): Run 2: 251.57 (SE +/- 1.22, N = 3; MIN: 91.04 / MAX: 277.22); Run 3: 248.58 (SE +/- 4.61, N = 12; MIN: 85.73 / MAX: 286.18); Run 1: 243.31 (SE +/- 4.09, N = 12; MIN: 81.19 / MAX: 282.97)
dav1d 0.8.1, Video Input: Summer Nature 1080p (FPS, More Is Better): Run 2: 669.24 (SE +/- 4.94, N = 3; MIN: 231.81 / MAX: 754.48); Run 3: 634.70 (SE +/- 9.48, N = 15; MIN: 194.36 / MAX: 755.24); Run 1: 629.60 (SE +/- 8.65, N = 15; MIN: 194.05 / MAX: 739.75)
dav1d 0.8.1, Video Input: Chimera 1080p 10-bit (FPS, More Is Better): Run 3: 139.13 (SE +/- 0.14, N = 3; MIN: 96.19 / MAX: 219.5); Run 2: 138.90 (SE +/- 0.30, N = 3; MIN: 96.19 / MAX: 217.56); Run 1: 138.45 (SE +/- 0.41, N = 3; MIN: 95.91 / MAX: 217.11)
dav1d, all four results: (CC) gcc options: -pthread
Etcpak 0.7, Configuration: DXT1 (Mpx/s, More Is Better): Run 2: 1323.80 (SE +/- 0.97, N = 3); Run 3: 1322.34 (SE +/- 0.70, N = 3); Run 1: 1296.79 (SE +/- 1.47, N = 3)
Etcpak 0.7, Configuration: ETC1 (Mpx/s, More Is Better): Run 3: 184.76 (SE +/- 0.02, N = 3); Run 1: 184.75 (SE +/- 0.02, N = 3); Run 2: 184.69 (SE +/- 0.02, N = 3)
Etcpak 0.7, Configuration: ETC2 (Mpx/s, More Is Better): Run 3: 118.07 (SE +/- 0.00, N = 3); Run 2: 118.05 (SE +/- 0.01, N = 3); Run 1: 118.05 (SE +/- 0.01, N = 3)
Etcpak 0.7, Configuration: ETC1 + Dithering (Mpx/s, More Is Better): Run 2: 174.30 (SE +/- 0.02, N = 3); Run 1: 174.28 (SE +/- 0.02, N = 3); Run 3: 174.17 (SE +/- 0.05, N = 3)
Etcpak, all four results: (CXX) g++ options: -O3 -march=native -std=c++11 -lpthread
Google SynthMark 20201109, Test: VoiceMark_100 (Voices, More Is Better): Run 1: 512.11 (SE +/- 0.00, N = 3); Run 3: 512.07 (SE +/- 0.02, N = 3); Run 2: 512.07 (SE +/- 0.02, N = 3). (CXX) g++ options: -lm -lpthread -std=c++11 -Ofast
Kripke 1.2.4 (Throughput FoM, More Is Better): Run 1: 37882537 (SE +/- 1848270.09, N = 15); Run 2: 35226890 (SE +/- 1684001.44, N = 12). (CXX) g++ options: -O3 -fopenmp
LAMMPS Molecular Dynamics Simulator 29Oct2020, Model: 20k Atoms (ns/day, More Is Better): Run 2: 23.40 (SE +/- 0.09, N = 3); Run 1: 23.38 (SE +/- 0.08, N = 3); Run 3: 23.20 (SE +/- 0.05, N = 3). (CXX) g++ options: -O3 -pthread -lm
LAMMPS Molecular Dynamics Simulator 29Oct2020, Model: Rhodopsin Protein (ns/day, More Is Better): Run 2: 23.38 (SE +/- 0.06, N = 3); Run 1: 23.31 (SE +/- 0.04, N = 3); Run 3: 23.08 (SE +/- 0.23, N = 3). (CXX) g++ options: -O3 -pthread -lm
LULESH 2.0.3 (z/s, More Is Better): Run 1: 16092.93 (SE +/- 42.67, N = 3); Run 2: 16073.09 (SE +/- 42.52, N = 3); Run 3: 15990.72 (SE +/- 65.00, N = 3). (CXX) g++ options: -O3 -fopenmp -lm -pthread -lmpi_cxx -lmpi
Mobile Neural Network 1.1.1, Model: SqueezeNetV1.0 (ms, Fewer Is Better): Run 2: 14.81 (SE +/- 0.20, N = 3; MIN: 13.46 / MAX: 30.98); Run 1: 14.96 (SE +/- 0.10, N = 3; MIN: 13.84 / MAX: 36.38)
Mobile Neural Network 1.1.1, Model: resnet-v2-50 (ms, Fewer Is Better): Run 2: 52.86 (SE +/- 2.27, N = 3; MIN: 46.53 / MAX: 819.73); Run 1: 54.12 (SE +/- 1.07, N = 3; MIN: 46.9 / MAX: 742.65)
Mobile Neural Network 1.1.1, Model: MobileNetV2_224 (ms, Fewer Is Better): Run 1: 10.74 (SE +/- 0.19, N = 3; MIN: 10.16 / MAX: 11.79); Run 2: 10.91 (SE +/- 0.30, N = 3; MIN: 10.26 / MAX: 12.2)
Mobile Neural Network 1.1.1, Model: mobilenet-v1-1.0 (ms, Fewer Is Better): Run 1: 6.761 (SE +/- 0.103, N = 3; MIN: 6.2 / MAX: 8.27); Run 2: 7.012 (SE +/- 0.613, N = 3; MIN: 6 / MAX: 24.38)
Mobile Neural Network 1.1.1, Model: inception-v3 (ms, Fewer Is Better): Run 1: 68.59 (SE +/- 0.67, N = 3; MIN: 62.84 / MAX: 229.88); Run 2: 71.58 (SE +/- 1.22, N = 3; MIN: 64.39 / MAX: 186.82)
Mobile Neural Network, all five results: (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -rdynamic -pthread -ldl
Monkey Audio Encoding 3.99.6, WAV To APE (Seconds, Fewer Is Better): Run 2: 18.32 (SE +/- 0.00, N = 5); Run 3: 18.34 (SE +/- 0.00, N = 5); Run 1: 18.34 (SE +/- 0.01, N = 5). (CXX) g++ options: -O3 -pedantic -rdynamic -lrt
Ogg Audio Encoding 1.3.4, WAV To Ogg (Seconds, Fewer Is Better): Run 1: 26.60 (SE +/- 0.02, N = 3); Run 2: 26.63 (SE +/- 0.00, N = 3); Run 3: 26.67 (SE +/- 0.01, N = 3). (CC) gcc options: -O2 -ffast-math -fsigned-char
oneDNN 2.0, Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU (ms, Fewer Is Better): Run 2: 2.52500 (SE +/- 0.02862, N = 3; MIN: 2.08); Run 1: 2.90299 (SE +/- 0.04679, N = 3; MIN: 2.24); Run 3: 3.81118 (SE +/- 0.05747, N = 3; MIN: 3.25)
oneDNN 2.0, Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU (ms, Fewer Is Better): Run 2: 19.22 (SE +/- 0.09, N = 3; MIN: 18.41); Run 1: 19.87 (SE +/- 0.14, N = 3; MIN: 18.97); Run 3: 21.12 (SE +/- 0.11, N = 3; MIN: 20.42)
oneDNN 2.0, Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better): Run 2: 3.81910 (SE +/- 0.05005, N = 3; MIN: 3.31); Run 1: 3.86741 (SE +/- 0.04801, N = 4; MIN: 3.27); Run 3: 3.96369 (SE +/- 0.03574, N = 3; MIN: 3.38)
oneDNN 2.0, Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better): Run 2: 2.67884 (SE +/- 0.02321, N = 3; MIN: 2.27); Run 1: 2.82028 (SE +/- 0.03873, N = 3; MIN: 2.33); Run 3: 4.50130 (SE +/- 0.01074, N = 3; MIN: 4.09)
oneDNN 2.0, Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU (ms, Fewer Is Better): Run 2: 16.49 (SE +/- 0.04, N = 3; MIN: 15.75); Run 1: 17.72 (SE +/- 0.02, N = 3; MIN: 16.71); Run 3: 20.45 (SE +/- 0.04, N = 3; MIN: 19.21)
oneDNN 2.0, Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU (ms, Fewer Is Better): Run 2: 3.42396 (SE +/- 0.03472, N = 15; MIN: 2.94); Run 1: 3.54557 (SE +/- 0.03944, N = 15; MIN: 2.98); Run 3: 3.76272 (SE +/- 0.04485, N = 15; MIN: 3.09)
oneDNN 2.0, Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU (ms, Fewer Is Better): Run 1: 6.45462 (SE +/- 0.15837, N = 15; MIN: 5.22); Run 2: 6.56237 (SE +/- 0.12641, N = 15; MIN: 5.14); Run 3: 7.09130 (SE +/- 0.10920, N = 3; MIN: 6.56)
oneDNN 2.0, Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better): Run 2: 22.74 (SE +/- 0.24, N = 3; MIN: 20.43); Run 1: 23.24 (SE +/- 0.11, N = 3; MIN: 20.59); Run 3: 25.16 (SE +/- 0.09, N = 3; MIN: 22.78)
oneDNN 2.0, Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better): Run 2: 3.49511 (SE +/- 0.04588, N = 5; MIN: 3); Run 1: 3.54912 (SE +/- 0.03910, N = 3; MIN: 3); Run 3: 3.71634 (SE +/- 0.01698, N = 3; MIN: 3.23)
oneDNN 2.0, Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better): Run 1: 3.16880 (SE +/- 0.03490, N = 6; MIN: 2.83); Run 2: 3.18118 (SE +/- 0.01629, N = 3; MIN: 2.88); Run 3: 3.18392 (SE +/- 0.01545, N = 3; MIN: 2.91)
oneDNN 2.0, Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU (ms, Fewer Is Better): Run 2: 4515.79 (SE +/- 153.50, N = 12; MIN: 2781.43); Run 1: 4554.32 (SE +/- 120.82, N = 15; MIN: 3273.38); Run 3: 5020.04 (SE +/- 112.92, N = 15; MIN: 3390.91)
oneDNN 2.0, Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU (ms, Fewer Is Better): Run 2: 3865.58 (SE +/- 125.32, N = 15; MIN: 3206.95); Run 1: 3940.07 (SE +/- 64.36, N = 15; MIN: 3412.94); Run 3: 4050.37 (SE +/- 94.46, N = 12; MIN: 3493.32)
oneDNN 2.0, Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better): Run 1: 4466.50 (SE +/- 216.78, N = 15; MIN: 2939.4); Run 2: 4580.21 (SE +/- 178.79, N = 15; MIN: 3020.84); Run 3: 4661.89 (SE +/- 148.69, N = 15; MIN: 3208.79)
oneDNN 2.0, Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better): Run 1: 3557.56 (SE +/- 35.53, N = 3; MIN: 3305.84); Run 2: 3580.43 (SE +/- 114.13, N = 15; MIN: 3052.37); Run 3: 4199.67 (SE +/- 89.60, N = 15; MIN: 3498.58)
oneDNN 2.0, Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU (ms, Fewer Is Better): Run 2: 0.902805 (SE +/- 0.002458, N = 3; MIN: 0.77); Run 1: 0.919909 (SE +/- 0.007779, N = 3; MIN: 0.77); Run 3: 2.603660 (SE +/- 0.009404, N = 3; MIN: 2.01)
oneDNN 2.0, Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better): Run 2: 4615.36 (SE +/- 149.33, N = 15; MIN: 3327.21); Run 1: 4707.16 (SE +/- 128.63, N = 12; MIN: 3518.69); Run 3: 4885.17 (SE +/- 125.34, N = 15; MIN: 3744.2)
oneDNN 2.0, Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better): Run 1: 3698.78 (SE +/- 128.96, N = 15; MIN: 2872.13); Run 2: 3877.71 (SE +/- 146.91, N = 15; MIN: 2904.51); Run 3: 4137.52 (SE +/- 85.43, N = 15; MIN: 3448.07)
oneDNN 2.0, Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better): Run 1: 1.37881 (SE +/- 0.00740, N = 3; MIN: 1.12); Run 2: 1.38889 (SE +/- 0.00510, N = 3; MIN: 1.21); Run 3: 1.40554 (SE +/- 0.00370, N = 3; MIN: 1.26)
oneDNN, all eighteen results: (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
ONNX Runtime 1.6, Model: yolov4 - Device: OpenMP CPU (Inferences Per Minute, More Is Better): Run 1: 76 (SE +/- 2.66, N = 12); Run 2: 71 (SE +/- 2.07, N = 12)
ONNX Runtime 1.6, Model: bertsquad-10 - Device: OpenMP CPU (Inferences Per Minute, More Is Better): Run 1: 58 (SE +/- 0.44, N = 3); Run 2: 54 (SE +/- 2.87, N = 9)
ONNX Runtime 1.6, Model: fcn-resnet101-11 - Device: OpenMP CPU (Inferences Per Minute, More Is Better): Run 1: 53 (SE +/- 0.88, N = 3); Run 2: 52 (SE +/- 1.02, N = 12)
ONNX Runtime 1.6, Model: shufflenet-v2-10 - Device: OpenMP CPU (Inferences Per Minute, More Is Better): Run 2: 2318 (SE +/- 112.64, N = 12); Run 1: 2188 (SE +/- 142.39, N = 12)
ONNX Runtime 1.6, Model: super-resolution-10 - Device: OpenMP CPU (Inferences Per Minute, More Is Better): Run 1: 2084 (SE +/- 13.94, N = 3); Run 2: 2078 (SE +/- 32.31, N = 3)
ONNX Runtime, all five results: (CXX) g++ options: -fopenmp -ffunction-sections -fdata-sections -O3 -ldl -lrt
OpenFOAM 8, Input: Motorbike 30M (Seconds, Fewer Is Better): Run 2: 34.23 (SE +/- 0.14, N = 3); Run 1: 34.62 (SE +/- 0.10, N = 3); Run 3: 35.32 (SE +/- 0.33, N = 15)
OpenFOAM 8, Input: Motorbike 60M (Seconds, Fewer Is Better): Run 2: 338.37 (SE +/- 0.73, N = 3); Run 1: 338.71 (SE +/- 0.27, N = 3); Run 3: 340.04 (SE +/- 0.68, N = 3)
OpenFOAM, both results: (CXX) g++ options: -std=c++11 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -ldynamicMesh -ldecompose -lgenericPatchFields -lmetisDecomp -lscotchDecomp -llagrangian -lregionModels -lOpenFOAM -ldl -lm
Opus Codec Encoding 1.3.1, WAV To Opus Encode (Seconds, Fewer Is Better): Run 2: 10.21 (SE +/- 0.00, N = 5); Run 1: 10.21 (SE +/- 0.00, N = 5); Run 3: 10.24 (SE +/- 0.00, N = 5). (CXX) g++ options: -fvisibility=hidden -logg -lm
QMCPACK 3.10, Input: simple-H2O (Total Execution Time - Seconds, Fewer Is Better): Run 1: 41.84 (SE +/- 0.22, N = 3); Run 2: 46.25 (SE +/- 1.39, N = 15); Run 3: 50.66 (SE +/- 1.69, N = 12). (CXX) g++ options: -fopenmp -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -march=native -O3 -fomit-frame-pointer -ffast-math -lm -pthread
Quantum ESPRESSO 6.7, Input: AUSURF112 (Seconds, Fewer Is Better): Run 2: 1754.21 (SE +/- 5.15, N = 3); Run 1: 1796.32 (SE +/- 19.90, N = 9); Run 3: 1808.78 (SE +/- 18.18, N = 9). (F9X) gfortran options: -lopenblas -lFoX_dom -lFoX_sax -lFoX_wxml -lFoX_common -lFoX_utils -lFoX_fsys -lfftw3 -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi
rav1e 0.4, Speed: 1 (Frames Per Second, More Is Better): Run 2: 0.262 (SE +/- 0.001, N = 3); Run 3: 0.261 (SE +/- 0.001, N = 3); Run 1: 0.258 (SE +/- 0.002, N = 3)
rav1e 0.4, Speed: 5 (Frames Per Second, More Is Better): Run 1: 0.780 (SE +/- 0.006, N = 3); Run 2: 0.778 (SE +/- 0.005, N = 3); Run 3: 0.775 (SE +/- 0.005, N = 3)
rav1e 0.4, Speed: 6 (Frames Per Second, More Is Better): Run 2: 1.031 (SE +/- 0.005, N = 3); Run 3: 1.030 (SE +/- 0.014, N = 3); Run 1: 1.016 (SE +/- 0.009, N = 3)
rav1e 0.4, Speed: 10 (Frames Per Second, More Is Better): Run 2: 2.370 (SE +/- 0.016, N = 3); Run 3: 2.336 (SE +/- 0.011, N = 3); Run 1: 2.322 (SE +/- 0.021, N = 3)
RELION 3.1.1, Test: Basic - Device: CPU (Seconds, Fewer Is Better): Run 3: 547.94 (SE +/- 0.25, N = 3); Run 1: 548.38 (SE +/- 0.24, N = 3); Run 2: 548.43 (SE +/- 0.25, N = 3). (CXX) g++ options: -fopenmp -std=c++0x -O3 -rdynamic -ldl -ltiff -lfftw3f -lfftw3 -lpng -pthread -lmpi_cxx -lmpi
Timed Godot Game Engine Compilation 3.2.3, Time To Compile (Seconds, Fewer Is Better): Run 3: 103.91 (SE +/- 1.34, N = 3); Run 2: 104.00 (SE +/- 1.42, N = 3); Run 1: 107.10 (SE +/- 1.83, N = 3)
TNN 0.2.3, Target: CPU - Model: MobileNet v2 (ms, Fewer Is Better): Run 1: 369.11 (SE +/- 1.37, N = 3; MIN: 357.24 / MAX: 557.3); Run 2: 369.67 (SE +/- 0.16, N = 3; MIN: 358.63 / MAX: 519.86)
TNN 0.2.3, Target: CPU - Model: SqueezeNet v1.1 (ms, Fewer Is Better): Run 2: 333.27 (SE +/- 0.15, N = 3; MIN: 332.42 / MAX: 334.08); Run 1: 333.40 (SE +/- 0.06, N = 3; MIN: 332.68 / MAX: 338.81)
TNN, both results: (CXX) g++ options: -fopenmp -pthread -fvisibility=hidden -O3 -rdynamic -ldl
Unpacking Firefox 84.0, Extracting: firefox-84.0.source.tar.xz (Seconds, Fewer Is Better): Run 2: 25.75 (SE +/- 0.03, N = 4); Run 1: 25.91 (SE +/- 0.08, N = 4)
WavPack Audio Encoding 5.3, WAV To WavPack (Seconds, Fewer Is Better): Run 2: 17.29 (SE +/- 0.01, N = 5); Run 1: 17.30 (SE +/- 0.00, N = 5). (CXX) g++ options: -rdynamic
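Many of the run-to-run gaps above are small next to the reported "SE +/-" figures. One rough way to judge whether a gap exceeds the noise is to divide the difference of two means by their combined standard error. The sketch below assumes the SE values are standard errors of the mean and that the runs are independent; the function name `noise_ratio` and the threshold of 2 are illustrative assumptions, not part of the Phoronix Test Suite.

```python
import math


def noise_ratio(mean_a: float, se_a: float, mean_b: float, se_b: float) -> float:
    """Absolute difference of two means divided by their combined standard error."""
    return abs(mean_a - mean_b) / math.sqrt(se_a ** 2 + se_b ** 2)


# (label, mean A, SE A, mean B, SE B), copied from the results above.
pairs = [
    ("Algebraic Multi-Grid, run 2 vs run 1", 709739033, 201948.45, 709699800, 1313948.59),
    ("oneDNN IP Shapes 1D f32, run 2 vs run 3", 2.52500, 0.02862, 3.81118, 0.05747),
]

THRESHOLD = 2.0  # assumed rule of thumb, not a formal significance test

for label, mean_a, se_a, mean_b, se_b in pairs:
    ratio = noise_ratio(mean_a, se_a, mean_b, se_b)
    verdict = "likely a real difference" if ratio > THRESHOLD else "within noise"
    print(f"{label}: ratio {ratio:.2f} ({verdict})")
```

A ratio well below the threshold suggests the two runs are indistinguishable on that test, while a large ratio points to a difference worth investigating.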
Phoronix Test Suite v10.8.5