AMD EPYC 7601 2P 2021
Tests for a future article. 2 x AMD EPYC 7601 32-Core testing with a Dell 02MJ3T (1.2.5 BIOS) and llvmpipe on Ubuntu 19.10 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2101214-HA-AMDEPYC7619&export=pdf&gru&rdt&rro .
System details (identical for runs 1, 2 and 3)
Processor: 2 x AMD EPYC 7601 32-Core (64 Cores / 128 Threads)
Motherboard: Dell 02MJ3T (1.2.5 BIOS)
Chipset: AMD 17h
Memory: 504GB
Disk: 280GB INTEL SSDPED1D280GA + 12 x 500GB Samsung SSD 860 + 120GB INTEL SSDSCKJB120G7R
Graphics: llvmpipe
Monitor: VE228
Network: 2 x Broadcom BCM57416 NetXtreme-E Dual-Media 10G RDMA + 2 x Broadcom NetXtreme BCM5720 2-port PCIe
OS: Ubuntu 19.10
Kernel: 5.9.0-050900rc6daily20200922-generic (x86_64) 20200921
Desktop: GNOME Shell 3.34.1
Display Server: X Server 1.20.5
Display Driver: modesetting 1.20.5
OpenGL: 3.3 Mesa 19.2.8 (LLVM 9.0 128 bits)
Compiler: GCC 9.2.1 20191008
File-System: ext4
Screen Resolution: 1600x1200

Compiler Details: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-offload-targets=nvptx-none,hsa --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Details: CPU Microcode: 0x8001227
Python Details: Python 2.7.17rc1 + Python 3.7.5
Security Details: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Full AMD retpoline IBPB: conditional STIBP: disabled RSB filling + srbds: Not affected + tsx_async_abort: Not affected
Results overview (a dash marks a test with no result for that run)
Test | 1 | 2 | 3
amg | 709699800 | 709739033 | 708880233
dav1d: Chimera 1080p | 637.03 | 659.19 | 634.77
dav1d: Summer Nature 4K | 243.31 | 251.57 | 248.58
dav1d: Summer Nature 1080p | 629.60 | 669.24 | 634.70
dav1d: Chimera 1080p 10-bit | 138.45 | 138.90 | 139.13
rav1e: 1 | 0.258 | 0.262 | 0.261
rav1e: 5 | 0.780 | 0.778 | 0.775
rav1e: 6 | 1.016 | 1.031 | 1.030
rav1e: 10 | 2.322 | 2.370 | 2.336
onnx: yolov4 - OpenMP CPU | 76 | 71 | -
onnx: bertsquad-10 - OpenMP CPU | 58 | 54 | -
onnx: fcn-resnet101-11 - OpenMP CPU | 53 | 52 | -
onnx: shufflenet-v2-10 - OpenMP CPU | 2188 | 2318 | -
onnx: super-resolution-10 - OpenMP CPU | 2084 | 2078 | -
cryptsetup: PBKDF2-sha512 | 1170727 | 1157453 | 1168547
cryptsetup: PBKDF2-whirlpool | 510008 | 507751 | 506073
cryptsetup: AES-XTS 256b Encryption | 1444.3 | 1456.9 | 1454.5
cryptsetup: AES-XTS 256b Decryption | 1445.4 | 1442.6 | 1453.1
cryptsetup: Serpent-XTS 256b Encryption | 308.1 | 308.4 | 308.9
cryptsetup: Serpent-XTS 256b Decryption | 306.7 | 307.0 | 307.1
cryptsetup: Twofish-XTS 256b Encryption | 317.4 | 318.3 | 318.3
cryptsetup: Twofish-XTS 256b Decryption | 316.5 | 316.8 | 316.9
cryptsetup: AES-XTS 512b Encryption | 1279.4 | 1286.7 | 1285.4
cryptsetup: AES-XTS 512b Decryption | 1276.7 | 1285.0 | 1284.3
cryptsetup: Serpent-XTS 512b Encryption | 308.6 | 308.8 | 309
cryptsetup: Serpent-XTS 512b Decryption | 306.7 | 306.9 | 307
cryptsetup: Twofish-XTS 512b Encryption | 317.7 | 318.1 | 318.0
cryptsetup: Twofish-XTS 512b Decryption | 316.0 | 316.6 | 316.7
etcpak: DXT1 | 1296.787 | 1323.798 | 1322.339
etcpak: ETC1 | 184.747 | 184.690 | 184.757
etcpak: ETC2 | 118.049 | 118.049 | 118.070
etcpak: ETC1 + Dithering | 174.283 | 174.300 | 174.166
lammps: 20k Atoms | 23.382 | 23.399 | 23.201
lammps: Rhodopsin Protein | 23.311 | 23.382 | 23.083
kripke | 37882537 | 35226890 | -
synthmark: VoiceMark_100 | 512.107 | 512.072 | 512.073
lulesh | 16092.925 | 16073.086 | 15990.720
onednn: IP Shapes 1D - f32 - CPU | 2.90299 | 2.52500 | 3.81118
onednn: IP Shapes 3D - f32 - CPU | 19.8661 | 19.2173 | 21.1172
onednn: IP Shapes 1D - u8s8f32 - CPU | 3.86741 | 3.81910 | 3.96369
onednn: IP Shapes 3D - u8s8f32 - CPU | 2.82028 | 2.67884 | 4.50130
onednn: Convolution Batch Shapes Auto - f32 - CPU | 17.7153 | 16.4879 | 20.4528
onednn: Deconvolution Batch shapes_1d - f32 - CPU | 3.54557 | 3.42396 | 3.76272
onednn: Deconvolution Batch shapes_3d - f32 - CPU | 6.45462 | 6.56237 | 7.09130
onednn: Convolution Batch Shapes Auto - u8s8f32 - CPU | 23.2410 | 22.7404 | 25.1595
onednn: Deconvolution Batch shapes_1d - u8s8f32 - CPU | 3.54912 | 3.49511 | 3.71634
onednn: Deconvolution Batch shapes_3d - u8s8f32 - CPU | 3.16880 | 3.18118 | 3.18392
onednn: Recurrent Neural Network Training - f32 - CPU | 4554.32 | 4515.79 | 5020.04
onednn: Recurrent Neural Network Inference - f32 - CPU | 3940.07 | 3865.58 | 4050.37
onednn: Recurrent Neural Network Training - u8s8f32 - CPU | 4466.50 | 4580.21 | 4661.89
onednn: Recurrent Neural Network Inference - u8s8f32 - CPU | 3557.56 | 3580.43 | 4199.67
onednn: Matrix Multiply Batch Shapes Transformer - f32 - CPU | 0.919909 | 0.902805 | 2.60366
onednn: Recurrent Neural Network Training - bf16bf16bf16 - CPU | 4707.16 | 4615.36 | 4885.17
onednn: Recurrent Neural Network Inference - bf16bf16bf16 - CPU | 3698.78 | 3877.71 | 4137.52
onednn: Matrix Multiply Batch Shapes Transformer - u8s8f32 - CPU | 1.37881 | 1.38889 | 1.40554
mnn: SqueezeNetV1.0 | 14.955 | 14.812 | -
mnn: resnet-v2-50 | 54.121 | 52.860 | -
mnn: MobileNetV2_224 | 10.741 | 10.909 | -
mnn: mobilenet-v1-1.0 | 6.761 | 7.012 | -
mnn: inception-v3 | 68.585 | 71.575 | -
tnn: CPU - MobileNet v2 | 369.108 | 369.669 | -
tnn: CPU - SqueezeNet v1.1 | 333.403 | 333.272 | -
cloverleaf: Lagrangian-Eulerian Hydrodynamics | 29.54 | 29.24 | 29.87
openfoam: Motorbike 30M | 34.62 | 34.23 | 35.32
openfoam: Motorbike 60M | 338.71 | 338.37 | 340.04
qe: AUSURF112 | 1796.32 | 1754.21 | 1808.78
relion: Basic - CPU | 548.379 | 548.427 | 547.939
build-godot: Time To Compile | 107.098 | 104.004 | 103.914
encode-ape: WAV To APE | 18.344 | 18.322 | 18.336
encode-ogg: WAV To Ogg | 26.603 | 26.627 | 26.665
encode-opus: WAV To Opus Encode | 10.213 | 10.207 | 10.238
encode-wavpack: WAV To WavPack | 17.296 | 17.285 | -
unpack-firefox: firefox-84.0.source.tar.xz | 25.907 | 25.754 | -
qmcpack: simple-H2O | 41.839 | 46.249 | 50.655
Algebraic Multi-Grid Benchmark 1.2 (Figure Of Merit, More Is Better)
Run 3: 708880233 (SE +/- 658905.95, N = 3)
Run 2: 709739033 (SE +/- 201948.45, N = 3)
Run 1: 709699800 (SE +/- 1313948.59, N = 3)
1. (CC) gcc options: -lparcsr_ls -lparcsr_mv -lseq_mv -lIJ_mv -lkrylov -lHYPRE_utilities -lm -fopenmp -pthread -lmpi
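One rough way to read the SE figures is to ask whether the gap between two runs is large compared with their combined standard error. The following is a minimal Python sketch, not part of the exported result: it assumes the reported SE is the standard error of the mean and that the runs are independent, and the helper names `z_score` and `percent_diff` are hypothetical. The input numbers are the Algebraic Multi-Grid values reported above.

```python
import math

# (mean, standard error) per run, copied from the Algebraic Multi-Grid result.
amg = {
    1: (709699800.0, 1313948.59),
    2: (709739033.0, 201948.45),
    3: (708880233.0, 658905.95),
}

def z_score(a, b):
    """Difference of two means in units of their combined standard error.

    Assumption: the runs are independent and SE is the standard error
    of the mean, so the errors add in quadrature.
    """
    (mean_a, se_a), (mean_b, se_b) = a, b
    return (mean_a - mean_b) / math.hypot(se_a, se_b)

def percent_diff(a, b):
    """Relative difference of run a versus run b, in percent."""
    return (a[0] - b[0]) / b[0] * 100.0

for i, j in [(2, 1), (3, 1), (3, 2)]:
    print(f"run {i} vs run {j}: "
          f"{percent_diff(amg[i], amg[j]):+.3f}%  "
          f"z = {z_score(amg[i], amg[j]):+.2f}")
```

Under these assumptions, a |z| well below 2 suggests the runs are not distinguishable on this test; with N = 3 samples per run this is only a coarse indication.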
dav1d 0.8.1, Video Input: Chimera 1080p (FPS, More Is Better)
Run 3: 634.77 (SE +/- 9.40, N = 4; MIN: 349.39 / MAX: 815.27)
Run 2: 659.19 (SE +/- 1.84, N = 3; MIN: 348.24 / MAX: 815.29)
Run 1: 637.03 (SE +/- 9.95, N = 3; MIN: 344.69 / MAX: 796.13)
1. (CC) gcc options: -pthread
dav1d 0.8.1, Video Input: Summer Nature 4K (FPS, More Is Better)
Run 3: 248.58 (SE +/- 4.61, N = 12; MIN: 85.73 / MAX: 286.18)
Run 2: 251.57 (SE +/- 1.22, N = 3; MIN: 91.04 / MAX: 277.22)
Run 1: 243.31 (SE +/- 4.09, N = 12; MIN: 81.19 / MAX: 282.97)
1. (CC) gcc options: -pthread
dav1d 0.8.1, Video Input: Summer Nature 1080p (FPS, More Is Better)
Run 3: 634.70 (SE +/- 9.48, N = 15; MIN: 194.36 / MAX: 755.24)
Run 2: 669.24 (SE +/- 4.94, N = 3; MIN: 231.81 / MAX: 754.48)
Run 1: 629.60 (SE +/- 8.65, N = 15; MIN: 194.05 / MAX: 739.75)
1. (CC) gcc options: -pthread
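The sample counts differ between runs here (N = 15 versus N = 3), so the SE values are not directly comparable as a measure of run-to-run noise. A minimal Python sketch, assuming the usual relation SE = SD / sqrt(N) holds for the exported figures, recovers an approximate standard deviation per run; the helper name `std_dev_from_se` is hypothetical.

```python
import math

def std_dev_from_se(se, n):
    """Recover the sample standard deviation, assuming SE = SD / sqrt(N)."""
    return se * math.sqrt(n)

# (run, mean FPS, SE, N) for dav1d 0.8.1, Summer Nature 1080p.
rows = [
    (3, 634.70, 9.48, 15),
    (2, 669.24, 4.94, 3),
    (1, 629.60, 8.65, 15),
]

for run, mean, se, n in rows:
    sd = std_dev_from_se(se, n)
    print(f"run {run}: SD ~ {sd:.2f} FPS ({sd / mean * 100:.1f}% of mean)")
```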
dav1d 0.8.1, Video Input: Chimera 1080p 10-bit (FPS, More Is Better)
Run 3: 139.13 (SE +/- 0.14, N = 3; MIN: 96.19 / MAX: 219.5)
Run 2: 138.90 (SE +/- 0.30, N = 3; MIN: 96.19 / MAX: 217.56)
Run 1: 138.45 (SE +/- 0.41, N = 3; MIN: 95.91 / MAX: 217.11)
1. (CC) gcc options: -pthread
rav1e 0.4, Speed: 1 (Frames Per Second, More Is Better)
Run 3: 0.261 (SE +/- 0.001, N = 3)
Run 2: 0.262 (SE +/- 0.001, N = 3)
Run 1: 0.258 (SE +/- 0.002, N = 3)
rav1e 0.4, Speed: 5 (Frames Per Second, More Is Better)
Run 3: 0.775 (SE +/- 0.005, N = 3)
Run 2: 0.778 (SE +/- 0.005, N = 3)
Run 1: 0.780 (SE +/- 0.006, N = 3)
rav1e 0.4, Speed: 6 (Frames Per Second, More Is Better)
Run 3: 1.030 (SE +/- 0.014, N = 3)
Run 2: 1.031 (SE +/- 0.005, N = 3)
Run 1: 1.016 (SE +/- 0.009, N = 3)
rav1e 0.4, Speed: 10 (Frames Per Second, More Is Better)
Run 3: 2.336 (SE +/- 0.011, N = 3)
Run 2: 2.370 (SE +/- 0.016, N = 3)
Run 1: 2.322 (SE +/- 0.021, N = 3)
ONNX Runtime 1.6, Model: yolov4 - Device: OpenMP CPU (Inferences Per Minute, More Is Better)
Run 2: 71 (SE +/- 2.07, N = 12)
Run 1: 76 (SE +/- 2.66, N = 12)
1. (CXX) g++ options: -fopenmp -ffunction-sections -fdata-sections -O3 -ldl -lrt
ONNX Runtime 1.6, Model: bertsquad-10 - Device: OpenMP CPU (Inferences Per Minute, More Is Better)
Run 2: 54 (SE +/- 2.87, N = 9)
Run 1: 58 (SE +/- 0.44, N = 3)
1. (CXX) g++ options: -fopenmp -ffunction-sections -fdata-sections -O3 -ldl -lrt
ONNX Runtime 1.6, Model: fcn-resnet101-11 - Device: OpenMP CPU (Inferences Per Minute, More Is Better)
Run 2: 52 (SE +/- 1.02, N = 12)
Run 1: 53 (SE +/- 0.88, N = 3)
1. (CXX) g++ options: -fopenmp -ffunction-sections -fdata-sections -O3 -ldl -lrt
ONNX Runtime 1.6, Model: shufflenet-v2-10 - Device: OpenMP CPU (Inferences Per Minute, More Is Better)
Run 2: 2318 (SE +/- 112.64, N = 12)
Run 1: 2188 (SE +/- 142.39, N = 12)
1. (CXX) g++ options: -fopenmp -ffunction-sections -fdata-sections -O3 -ldl -lrt
ONNX Runtime 1.6, Model: super-resolution-10 - Device: OpenMP CPU (Inferences Per Minute, More Is Better)
Run 2: 2078 (SE +/- 32.31, N = 3)
Run 1: 2084 (SE +/- 13.94, N = 3)
1. (CXX) g++ options: -fopenmp -ffunction-sections -fdata-sections -O3 -ldl -lrt
Cryptsetup, PBKDF2-sha512 (Iterations Per Second, More Is Better)
Run 3: 1168547 (SE +/- 434.00, N = 3)
Run 2: 1157453 (SE +/- 12198.12, N = 7)
Run 1: 1170727 (SE +/- 1900.25, N = 3)
Cryptsetup, PBKDF2-whirlpool (Iterations Per Second, More Is Better)
Run 3: 506073 (SE +/- 975.00, N = 3)
Run 2: 507751 (SE +/- 280.29, N = 7)
Run 1: 510008 (SE +/- 572.73, N = 3)
Cryptsetup, AES-XTS 256b Encryption (MiB/s, More Is Better)
Run 3: 1454.5 (SE +/- 1.65, N = 3)
Run 2: 1456.9 (SE +/- 2.27, N = 7)
Run 1: 1444.3 (SE +/- 3.56, N = 3)
Cryptsetup, AES-XTS 256b Decryption (MiB/s, More Is Better)
Run 3: 1453.1 (SE +/- 2.05, N = 3)
Run 2: 1442.6 (SE +/- 12.89, N = 7)
Run 1: 1445.4 (SE +/- 2.14, N = 3)
Cryptsetup, Serpent-XTS 256b Encryption (MiB/s, More Is Better)
Run 3: 308.9 (SE +/- 0.03, N = 3)
Run 2: 308.4 (SE +/- 0.53, N = 7)
Run 1: 308.1 (SE +/- 0.58, N = 3)
Cryptsetup, Serpent-XTS 256b Decryption (MiB/s, More Is Better)
Run 3: 307.1 (SE +/- 0.00, N = 3)
Run 2: 307.0 (SE +/- 0.10, N = 7)
Run 1: 306.7 (SE +/- 0.12, N = 3)
Cryptsetup, Twofish-XTS 256b Encryption (MiB/s, More Is Better)
Run 3: 318.3 (SE +/- 0.03, N = 3)
Run 2: 318.3 (SE +/- 0.08, N = 7)
Run 1: 317.4 (SE +/- 0.37, N = 3)
Cryptsetup, Twofish-XTS 256b Decryption (MiB/s, More Is Better)
Run 3: 316.9 (SE +/- 0.10, N = 3)
Run 2: 316.8 (SE +/- 0.07, N = 7)
Run 1: 316.5 (SE +/- 0.06, N = 3)
Cryptsetup, AES-XTS 512b Encryption (MiB/s, More Is Better)
Run 3: 1285.4 (SE +/- 1.42, N = 3)
Run 2: 1286.7 (SE +/- 1.46, N = 7)
Run 1: 1279.4 (SE +/- 1.32, N = 3)
Cryptsetup, AES-XTS 512b Decryption (MiB/s, More Is Better)
Run 3: 1284.3 (SE +/- 1.76, N = 3)
Run 2: 1285.0 (SE +/- 1.94, N = 7)
Run 1: 1276.7 (SE +/- 2.31, N = 3)
Cryptsetup, Serpent-XTS 512b Encryption (MiB/s, More Is Better)
Run 3: 309.0
Run 2: 308.8
Run 1: 308.6
SE entries as exported (only two are listed for the three runs): SE +/- 0.09, N = 6; SE +/- 0.10, N = 2
Cryptsetup, Serpent-XTS 512b Decryption (MiB/s, More Is Better)
Run 3: 307.0
Run 2: 306.9
Run 1: 306.7
SE entries as exported (only two are listed for the three runs): SE +/- 0.17, N = 4; SE +/- 0.05, N = 2
Cryptsetup, Twofish-XTS 512b Encryption (MiB/s, More Is Better)
Run 3: 318.0 (SE +/- 0.20, N = 2)
Run 2: 318.1 (SE +/- 0.06, N = 7)
Run 1: 317.7 (SE +/- 0.19, N = 3)
Cryptsetup, Twofish-XTS 512b Decryption (MiB/s, More Is Better)
Run 3: 316.7 (SE +/- 0.12, N = 3)
Run 2: 316.6 (SE +/- 0.06, N = 7)
Run 1: 316.0 (SE +/- 0.05, N = 2)
Etcpak 0.7, Configuration: DXT1 (Mpx/s, More Is Better)
Run 3: 1322.34 (SE +/- 0.70, N = 3)
Run 2: 1323.80 (SE +/- 0.97, N = 3)
Run 1: 1296.79 (SE +/- 1.47, N = 3)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -lpthread
Etcpak 0.7, Configuration: ETC1 (Mpx/s, More Is Better)
Run 3: 184.76 (SE +/- 0.02, N = 3)
Run 2: 184.69 (SE +/- 0.02, N = 3)
Run 1: 184.75 (SE +/- 0.02, N = 3)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -lpthread
Etcpak 0.7, Configuration: ETC2 (Mpx/s, More Is Better)
Run 3: 118.07 (SE +/- 0.00, N = 3)
Run 2: 118.05 (SE +/- 0.01, N = 3)
Run 1: 118.05 (SE +/- 0.01, N = 3)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -lpthread
Etcpak 0.7, Configuration: ETC1 + Dithering (Mpx/s, More Is Better)
Run 3: 174.17 (SE +/- 0.05, N = 3)
Run 2: 174.30 (SE +/- 0.02, N = 3)
Run 1: 174.28 (SE +/- 0.02, N = 3)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -lpthread
LAMMPS Molecular Dynamics Simulator 29Oct2020, Model: 20k Atoms (ns/day, More Is Better)
Run 3: 23.20 (SE +/- 0.05, N = 3)
Run 2: 23.40 (SE +/- 0.09, N = 3)
Run 1: 23.38 (SE +/- 0.08, N = 3)
1. (CXX) g++ options: -O3 -pthread -lm
LAMMPS Molecular Dynamics Simulator 29Oct2020, Model: Rhodopsin Protein (ns/day, More Is Better)
Run 3: 23.08 (SE +/- 0.23, N = 3)
Run 2: 23.38 (SE +/- 0.06, N = 3)
Run 1: 23.31 (SE +/- 0.04, N = 3)
1. (CXX) g++ options: -O3 -pthread -lm
Kripke 1.2.4 (Throughput FoM, More Is Better)
Run 2: 35226890 (SE +/- 1684001.44, N = 12)
Run 1: 37882537 (SE +/- 1848270.09, N = 15)
1. (CXX) g++ options: -O3 -fopenmp
Google SynthMark 20201109, Test: VoiceMark_100 (Voices, More Is Better)
Run 3: 512.07 (SE +/- 0.02, N = 3)
Run 2: 512.07 (SE +/- 0.02, N = 3)
Run 1: 512.11 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -lm -lpthread -std=c++11 -Ofast
LULESH 2.0.3 (z/s, More Is Better)
Run 3: 15990.72 (SE +/- 65.00, N = 3)
Run 2: 16073.09 (SE +/- 42.52, N = 3)
Run 1: 16092.93 (SE +/- 42.67, N = 3)
1. (CXX) g++ options: -O3 -fopenmp -lm -pthread -lmpi_cxx -lmpi
oneDNN 2.0, Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
Run 3: 3.81118 (SE +/- 0.05747, N = 3; MIN: 3.25)
Run 2: 2.52500 (SE +/- 0.02862, N = 3; MIN: 2.08)
Run 1: 2.90299 (SE +/- 0.04679, N = 3; MIN: 2.24)
1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0, Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
Run 3: 21.12 (SE +/- 0.11, N = 3; MIN: 20.42)
Run 2: 19.22 (SE +/- 0.09, N = 3; MIN: 18.41)
Run 1: 19.87 (SE +/- 0.14, N = 3; MIN: 18.97)
1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0, Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
Run 3: 3.96369 (SE +/- 0.03574, N = 3; MIN: 3.38)
Run 2: 3.81910 (SE +/- 0.05005, N = 3; MIN: 3.31)
Run 1: 3.86741 (SE +/- 0.04801, N = 4; MIN: 3.27)
1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0, Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
Run 3: 4.50130 (SE +/- 0.01074, N = 3; MIN: 4.09)
Run 2: 2.67884 (SE +/- 0.02321, N = 3; MIN: 2.27)
Run 1: 2.82028 (SE +/- 0.03873, N = 3; MIN: 2.33)
1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0, Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
Run 3: 20.45 (SE +/- 0.04, N = 3; MIN: 19.21)
Run 2: 16.49 (SE +/- 0.04, N = 3; MIN: 15.75)
Run 1: 17.72 (SE +/- 0.02, N = 3; MIN: 16.71)
1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0, Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
Run 3: 3.76272 (SE +/- 0.04485, N = 15; MIN: 3.09)
Run 2: 3.42396 (SE +/- 0.03472, N = 15; MIN: 2.94)
Run 1: 3.54557 (SE +/- 0.03944, N = 15; MIN: 2.98)
1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0, Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
Run 3: 7.09130 (SE +/- 0.10920, N = 3; MIN: 6.56)
Run 2: 6.56237 (SE +/- 0.12641, N = 15; MIN: 5.14)
Run 1: 6.45462 (SE +/- 0.15837, N = 15; MIN: 5.22)
1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0, Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
Run 3: 25.16 (SE +/- 0.09, N = 3; MIN: 22.78)
Run 2: 22.74 (SE +/- 0.24, N = 3; MIN: 20.43)
Run 1: 23.24 (SE +/- 0.11, N = 3; MIN: 20.59)
1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0, Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
Run 3: 3.71634 (SE +/- 0.01698, N = 3; MIN: 3.23)
Run 2: 3.49511 (SE +/- 0.04588, N = 5; MIN: 3)
Run 1: 3.54912 (SE +/- 0.03910, N = 3; MIN: 3)
1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0, Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
Run 3: 3.18392 (SE +/- 0.01545, N = 3; MIN: 2.91)
Run 2: 3.18118 (SE +/- 0.01629, N = 3; MIN: 2.88)
Run 1: 3.16880 (SE +/- 0.03490, N = 6; MIN: 2.83)
1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0, Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
Run 3: 5020.04 (SE +/- 112.92, N = 15; MIN: 3390.91)
Run 2: 4515.79 (SE +/- 153.50, N = 12; MIN: 2781.43)
Run 1: 4554.32 (SE +/- 120.82, N = 15; MIN: 3273.38)
1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0, Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
Run 3: 4050.37 (SE +/- 94.46, N = 12; MIN: 3493.32)
Run 2: 3865.58 (SE +/- 125.32, N = 15; MIN: 3206.95)
Run 1: 3940.07 (SE +/- 64.36, N = 15; MIN: 3412.94)
1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0, Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
Run 3: 4661.89 (SE +/- 148.69, N = 15; MIN: 3208.79)
Run 2: 4580.21 (SE +/- 178.79, N = 15; MIN: 3020.84)
Run 1: 4466.50 (SE +/- 216.78, N = 15; MIN: 2939.4)
1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0, Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
Run 3: 4199.67 (SE +/- 89.60, N = 15; MIN: 3498.58)
Run 2: 3580.43 (SE +/- 114.13, N = 15; MIN: 3052.37)
Run 1: 3557.56 (SE +/- 35.53, N = 3; MIN: 3305.84)
1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0, Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
Run 3: 2.603660 (SE +/- 0.009404, N = 3; MIN: 2.01)
Run 2: 0.902805 (SE +/- 0.002458, N = 3; MIN: 0.77)
Run 1: 0.919909 (SE +/- 0.007779, N = 3; MIN: 0.77)
1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
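For a "Fewer Is Better" metric such as this one, the gap between runs is easiest to express as a ratio of the mean times. The following Python sketch uses the three f32 Matrix Multiply Batch Shapes Transformer means reported above; the helper name `slowdown` is hypothetical and nothing here is produced by the Phoronix Test Suite itself.

```python
# Mean time in ms per run, f32 Matrix Multiply Batch Shapes Transformer.
matmul_f32_ms = {1: 0.919909, 2: 0.902805, 3: 2.603660}

def slowdown(times, run, baseline):
    """How many times slower `run` is than `baseline` on a lower-is-better metric."""
    return times[run] / times[baseline]

for run, baseline in [(3, 1), (3, 2), (2, 1)]:
    factor = slowdown(matmul_f32_ms, run, baseline)
    print(f"run {run} vs run {baseline}: {factor:.2f}x")
```

A factor above 1 means the first run took longer than the baseline; the export itself does not state why run 3 differs from runs 1 and 2 on this harness.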
oneDNN 2.0, Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
Run 3: 4885.17 (SE +/- 125.34, N = 15; MIN: 3744.2)
Run 2: 4615.36 (SE +/- 149.33, N = 15; MIN: 3327.21)
Run 1: 4707.16 (SE +/- 128.63, N = 12; MIN: 3518.69)
1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0, Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
Run 3: 4137.52 (SE +/- 85.43, N = 15; MIN: 3448.07)
Run 2: 3877.71 (SE +/- 146.91, N = 15; MIN: 2904.51)
Run 1: 3698.78 (SE +/- 128.96, N = 15; MIN: 2872.13)
1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0, Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
Run 3: 1.40554 (SE +/- 0.00370, N = 3; MIN: 1.26)
Run 2: 1.38889 (SE +/- 0.00510, N = 3; MIN: 1.21)
Run 1: 1.37881 (SE +/- 0.00740, N = 3; MIN: 1.12)
1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
Mobile Neural Network 1.1.1, Model: SqueezeNetV1.0 (ms, Fewer Is Better)
Run 2: 14.81 (SE +/- 0.20, N = 3; MIN: 13.46 / MAX: 30.98)
Run 1: 14.96 (SE +/- 0.10, N = 3; MIN: 13.84 / MAX: 36.38)
1. (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -rdynamic -pthread -ldl
Mobile Neural Network 1.1.1, Model: resnet-v2-50 (ms, Fewer Is Better)
Run 2: 52.86 (SE +/- 2.27, N = 3; MIN: 46.53 / MAX: 819.73)
Run 1: 54.12 (SE +/- 1.07, N = 3; MIN: 46.9 / MAX: 742.65)
1. (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -rdynamic -pthread -ldl
Mobile Neural Network 1.1.1, Model: MobileNetV2_224 (ms, Fewer Is Better)
Run 2: 10.91 (SE +/- 0.30, N = 3; MIN: 10.26 / MAX: 12.2)
Run 1: 10.74 (SE +/- 0.19, N = 3; MIN: 10.16 / MAX: 11.79)
1. (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -rdynamic -pthread -ldl
Mobile Neural Network 1.1.1, Model: mobilenet-v1-1.0 (ms, Fewer Is Better)
Run 2: 7.012 (SE +/- 0.613, N = 3; MIN: 6 / MAX: 24.38)
Run 1: 6.761 (SE +/- 0.103, N = 3; MIN: 6.2 / MAX: 8.27)
1. (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -rdynamic -pthread -ldl
Mobile Neural Network 1.1.1, Model: inception-v3 (ms, Fewer Is Better)
Run 2: 71.58 (SE +/- 1.22, N = 3; MIN: 64.39 / MAX: 186.82)
Run 1: 68.59 (SE +/- 0.67, N = 3; MIN: 62.84 / MAX: 229.88)
1. (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -rdynamic -pthread -ldl
TNN 0.2.3, Target: CPU - Model: MobileNet v2 (ms, Fewer Is Better)
Run 2: 369.67 (SE +/- 0.16, N = 3; MIN: 358.63 / MAX: 519.86)
Run 1: 369.11 (SE +/- 1.37, N = 3; MIN: 357.24 / MAX: 557.3)
1. (CXX) g++ options: -fopenmp -pthread -fvisibility=hidden -O3 -rdynamic -ldl
TNN 0.2.3, Target: CPU - Model: SqueezeNet v1.1 (ms, Fewer Is Better)
Run 2: 333.27 (SE +/- 0.15, N = 3; MIN: 332.42 / MAX: 334.08)
Run 1: 333.40 (SE +/- 0.06, N = 3; MIN: 332.68 / MAX: 338.81)
1. (CXX) g++ options: -fopenmp -pthread -fvisibility=hidden -O3 -rdynamic -ldl
CloverLeaf, Lagrangian-Eulerian Hydrodynamics (Seconds, Fewer Is Better)
Run 3: 29.87 (SE +/- 0.49, N = 15)
Run 2: 29.24 (SE +/- 0.23, N = 15)
Run 1: 29.54 (SE +/- 0.32, N = 15)
1. (F9X) gfortran options: -O3 -march=native -funroll-loops -fopenmp
OpenFOAM 8, Input: Motorbike 30M (Seconds, Fewer Is Better)
Run 3: 35.32 (SE +/- 0.33, N = 15)
Run 2: 34.23 (SE +/- 0.14, N = 3)
Run 1: 34.62 (SE +/- 0.10, N = 3)
1. (CXX) g++ options: -std=c++11 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -ldynamicMesh -ldecompose -lgenericPatchFields -lmetisDecomp -lscotchDecomp -llagrangian -lregionModels -lOpenFOAM -ldl -lm
OpenFOAM 8, Input: Motorbike 60M (Seconds, Fewer Is Better)
Run 3: 340.04 (SE +/- 0.68, N = 3)
Run 2: 338.37 (SE +/- 0.73, N = 3)
Run 1: 338.71 (SE +/- 0.27, N = 3)
1. (CXX) g++ options: -std=c++11 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -ldynamicMesh -ldecompose -lgenericPatchFields -lmetisDecomp -lscotchDecomp -llagrangian -lregionModels -lOpenFOAM -ldl -lm
Quantum ESPRESSO 6.7, Input: AUSURF112 (Seconds, Fewer Is Better)
Run 3: 1808.78 (SE +/- 18.18, N = 9)
Run 2: 1754.21 (SE +/- 5.15, N = 3)
Run 1: 1796.32 (SE +/- 19.90, N = 9)
1. (F9X) gfortran options: -lopenblas -lFoX_dom -lFoX_sax -lFoX_wxml -lFoX_common -lFoX_utils -lFoX_fsys -lfftw3 -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi
RELION 3.1.1, Test: Basic - Device: CPU (Seconds, Fewer Is Better)
Run 3: 547.94 (SE +/- 0.25, N = 3)
Run 2: 548.43 (SE +/- 0.25, N = 3)
Run 1: 548.38 (SE +/- 0.24, N = 3)
1. (CXX) g++ options: -fopenmp -std=c++0x -O3 -rdynamic -ldl -ltiff -lfftw3f -lfftw3 -lpng -pthread -lmpi_cxx -lmpi
Timed Godot Game Engine Compilation 3.2.3, Time To Compile (Seconds, Fewer Is Better)
Run 3: 103.91 (SE +/- 1.34, N = 3)
Run 2: 104.00 (SE +/- 1.42, N = 3)
Run 1: 107.10 (SE +/- 1.83, N = 3)
Monkey Audio Encoding 3.99.6, WAV To APE (Seconds, Fewer Is Better)
Run 3: 18.34 (SE +/- 0.00, N = 5)
Run 2: 18.32 (SE +/- 0.00, N = 5)
Run 1: 18.34 (SE +/- 0.01, N = 5)
1. (CXX) g++ options: -O3 -pedantic -rdynamic -lrt
Ogg Audio Encoding 1.3.4, WAV To Ogg (Seconds, Fewer Is Better)
Run 3: 26.67 (SE +/- 0.01, N = 3)
Run 2: 26.63 (SE +/- 0.00, N = 3)
Run 1: 26.60 (SE +/- 0.02, N = 3)
1. (CC) gcc options: -O2 -ffast-math -fsigned-char
Opus Codec Encoding 1.3.1, WAV To Opus Encode (Seconds, Fewer Is Better)
Run 3: 10.24 (SE +/- 0.00, N = 5)
Run 2: 10.21 (SE +/- 0.00, N = 5)
Run 1: 10.21 (SE +/- 0.00, N = 5)
1. (CXX) g++ options: -fvisibility=hidden -logg -lm
WavPack Audio Encoding 5.3, WAV To WavPack (Seconds, Fewer Is Better)
Run 2: 17.29 (SE +/- 0.01, N = 5)
Run 1: 17.30 (SE +/- 0.00, N = 5)
1. (CXX) g++ options: -rdynamic
Unpacking Firefox 84.0, Extracting: firefox-84.0.source.tar.xz (Seconds, Fewer Is Better)
Run 2: 25.75 (SE +/- 0.03, N = 4)
Run 1: 25.91 (SE +/- 0.08, N = 4)
QMCPACK 3.10, Input: simple-H2O (Total Execution Time - Seconds, Fewer Is Better)
Run 3: 50.66 (SE +/- 1.69, N = 12)
Run 2: 46.25 (SE +/- 1.39, N = 15)
Run 1: 41.84 (SE +/- 0.22, N = 3)
1. (CXX) g++ options: -fopenmp -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -march=native -O3 -fomit-frame-pointer -ffast-math -lm -pthread
Phoronix Test Suite v10.8.5