2 x AMD EPYC 7601 32-Core testing with a Dell 02MJ3T (1.2.5 BIOS) and llvmpipe on Ubuntu 19.10 via the Phoronix Test Suite.
Compiler Notes: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-offload-targets=nvptx-none,hsa --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Notes: CPU Microcode: 0x8001227
Python Notes: Python 2.7.17rc1 + Python 3.7.5
Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Full AMD retpoline IBPB: conditional STIBP: disabled RSB filling + srbds: Not affected + tsx_async_abort: Not affected
Processor: 2 x AMD EPYC 7601 32-Core (64 Cores / 128 Threads), Motherboard: Dell 02MJ3T (1.2.5 BIOS), Chipset: AMD 17h, Memory: 504GB, Disk: 280GB INTEL SSDPED1D280GA + 12 x 500GB Samsung SSD 860 + 120GB INTEL SSDSCKJB120G7R, Graphics: llvmpipe, Monitor: VE228, Network: 2 x Broadcom BCM57416 NetXtreme-E Dual-Media 10G RDMA + 2 x Broadcom NetXtreme BCM5720 2-port PCIe
OS: Ubuntu 19.10, Kernel: 5.9.0-050900rc6daily20200922-generic (x86_64) 20200921, Desktop: GNOME Shell 3.34.1, Display Server: X Server 1.20.5, Display Driver: modesetting 1.20.5, OpenGL: 3.3 Mesa 19.2.8 (LLVM 9.0 128 bits), Compiler: GCC 9.2.1 20191008, File-System: ext4, Screen Resolution: 1600x1200
Etcpak
Etcpak is the self-proclaimed "fastest ETC compressor on the planet", with a focus on providing open-source, very fast ETC and S3 texture compression support. Learn more via the OpenBenchmarking.org test page.
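The throughput figures for this test are reported in megapixels per second. Both ETC1 and DXT1 pack each 4x4 pixel block into 64 bits, i.e. 4 bits (0.5 bytes) per pixel, so a Mpx/s figure maps directly to compressed-output bandwidth. A minimal sketch of that conversion (the 1300 Mpx/s input value is illustrative, in the ballpark of the DXT1 results below):

```python
# ETC1 and DXT1 both store 64 bits per 4x4 block = 4 bits per pixel.
BYTES_PER_PIXEL = 0.5  # 64 bits / 16 pixels / 8 bits-per-byte

def compressed_mb_per_s(mpx_per_s: float) -> float:
    """Convert encode throughput (Mpx/s) to compressed output bandwidth (MB/s)."""
    return mpx_per_s * BYTES_PER_PIXEL

# A ~1300 Mpx/s DXT1 result corresponds to roughly 650 MB/s of compressed output.
print(compressed_mb_per_s(1300.0))
```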
Etcpak 0.7 (OpenBenchmarking.org; Mpx/s, more is better):
  Configuration: DXT1
    1: 1296.79 (SE +/- 1.47, N = 3)   2: 1323.80 (SE +/- 0.97, N = 3)   3: 1322.34 (SE +/- 0.70, N = 3)
  Configuration: ETC1
    1: 184.75 (SE +/- 0.02, N = 3)   2: 184.69 (SE +/- 0.02, N = 3)   3: 184.76 (SE +/- 0.02, N = 3)
  Configuration: ETC2
    1: 118.05 (SE +/- 0.01, N = 3)   2: 118.05 (SE +/- 0.01, N = 3)   3: 118.07 (SE +/- 0.00, N = 3)
  Configuration: ETC1 + Dithering
    1: 174.28 (SE +/- 0.02, N = 3)   2: 174.30 (SE +/- 0.02, N = 3)   3: 174.17 (SE +/- 0.05, N = 3)
  (CXX) g++ options: -O3 -march=native -std=c++11 -lpthread
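The "SE +/-" figures throughout this report are standard errors of the mean over the N recorded trials of each run. A sketch of how such a figure is derived (the trial values here are hypothetical; the report does not list its raw per-trial samples):

```python
import statistics

def standard_error(samples):
    """Standard error of the mean: sample standard deviation / sqrt(N)."""
    return statistics.stdev(samples) / len(samples) ** 0.5

# Hypothetical per-trial throughputs for one run; not the report's raw data.
trials = [1295.1, 1296.4, 1298.9]
mean = statistics.mean(trials)
se = standard_error(trials)
print(f"{mean:.2f} (SE +/- {se:.2f}, N = {len(trials)})")
```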
CloverLeaf
CloverLeaf is a Lagrangian-Eulerian hydrodynamics benchmark. This test profile currently makes use of CloverLeaf's OpenMP version and is benchmarked with the clover_bm.in input file (Problem 5). Learn more via the OpenBenchmarking.org test page.
CloverLeaf Lagrangian-Eulerian Hydrodynamics (OpenBenchmarking.org; Seconds, fewer is better):
  1: 29.54 (SE +/- 0.32, N = 15)   2: 29.24 (SE +/- 0.23, N = 15)   3: 29.87 (SE +/- 0.49, N = 15)
  (F9X) gfortran options: -O3 -march=native -funroll-loops -fopenmp
Algebraic Multi-Grid Benchmark
AMG is a parallel algebraic multigrid solver for linear systems arising from problems on unstructured grids. The driver provided with AMG builds linear systems for various 3-dimensional problems. Learn more via the OpenBenchmarking.org test page.
Algebraic Multi-Grid Benchmark 1.2 (OpenBenchmarking.org; Figure Of Merit, more is better):
  1: 709699800 (SE +/- 1313948.59, N = 3)   2: 709739033 (SE +/- 201948.45, N = 3)   3: 708880233 (SE +/- 658905.95, N = 3)
  (CC) gcc options: -lparcsr_ls -lparcsr_mv -lseq_mv -lIJ_mv -lkrylov -lHYPRE_utilities -lm -fopenmp -pthread -lmpi
QMCPACK
QMCPACK is a modern, high-performance, open-source production-level many-body ab initio Quantum Monte Carlo (QMC) code for computing the electronic structure of atoms, molecules, and solids, and is supported by the U.S. Department of Energy. This benchmark makes use of MPI and runs the simple-H2O example. Learn more via the OpenBenchmarking.org test page.
QMCPACK 3.10, Input: simple-H2O (OpenBenchmarking.org; Total Execution Time - Seconds, fewer is better):
  1: 41.84 (SE +/- 0.22, N = 3)   2: 46.25 (SE +/- 1.39, N = 15)   3: 50.66 (SE +/- 1.69, N = 12)
  (CXX) g++ options: -fopenmp -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -march=native -O3 -fomit-frame-pointer -ffast-math -lm -pthread
OpenFOAM
OpenFOAM is the leading free, open-source software for computational fluid dynamics (CFD). Learn more via the OpenBenchmarking.org test page.
OpenFOAM 8 (OpenBenchmarking.org; Seconds, fewer is better):
  Input: Motorbike 30M
    1: 34.62 (SE +/- 0.10, N = 3)   2: 34.23 (SE +/- 0.14, N = 3)   3: 35.32 (SE +/- 0.33, N = 15)
  Input: Motorbike 60M
    1: 338.71 (SE +/- 0.27, N = 3)   2: 338.37 (SE +/- 0.73, N = 3)   3: 340.04 (SE +/- 0.68, N = 3)
  (CXX) g++ options: -std=c++11 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -ldynamicMesh -ldecompose -lgenericPatchFields -lmetisDecomp -lscotchDecomp -llagrangian -lregionModels -lOpenFOAM -ldl -lm
Quantum ESPRESSO
Quantum ESPRESSO is an integrated suite of open-source computer codes for electronic-structure calculations and materials modeling at the nanoscale. It is based on density-functional theory, plane waves, and pseudopotentials. Learn more via the OpenBenchmarking.org test page.
Quantum ESPRESSO 6.7, Input: AUSURF112 (OpenBenchmarking.org; Seconds, fewer is better):
  1: 1796.32 (SE +/- 19.90, N = 9)   2: 1754.21 (SE +/- 5.15, N = 3)   3: 1808.78 (SE +/- 18.18, N = 9)
  (F9X) gfortran options: -lopenblas -lFoX_dom -lFoX_sax -lFoX_wxml -lFoX_common -lFoX_utils -lFoX_fsys -lfftw3 -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi
RELION
RELION (REgularised LIkelihood OptimisatioN) is a stand-alone computer program for Maximum A Posteriori refinement of (multiple) 3D reconstructions or 2D class averages in cryo-electron microscopy (cryo-EM). It is developed in the research group of Sjors Scheres at the MRC Laboratory of Molecular Biology. Learn more via the OpenBenchmarking.org test page.
RELION 3.1.1, Test: Basic - Device: CPU (OpenBenchmarking.org; Seconds, fewer is better):
  1: 548.38 (SE +/- 0.24, N = 3)   2: 548.43 (SE +/- 0.25, N = 3)   3: 547.94 (SE +/- 0.25, N = 3)
  (CXX) g++ options: -fopenmp -std=c++0x -O3 -rdynamic -ldl -ltiff -lfftw3f -lfftw3 -lpng -pthread -lmpi_cxx -lmpi
oneDNN
This is a test of Intel oneDNN, an Intel-optimized library for deep neural networks, making use of its built-in benchdnn functionality. The result is the total perf time reported. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and MKL-DNN before being rebranded as part of the Intel oneAPI initiative. Learn more via the OpenBenchmarking.org test page.
oneDNN 2.0, Engine: CPU (OpenBenchmarking.org; ms, fewer is better; all built with (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread):
  Harness: IP Shapes 1D - Data Type: f32
    1: 2.90299 (SE +/- 0.04679, N = 3, MIN: 2.24)   2: 2.52500 (SE +/- 0.02862, N = 3, MIN: 2.08)   3: 3.81118 (SE +/- 0.05747, N = 3, MIN: 3.25)
  Harness: IP Shapes 3D - Data Type: f32
    1: 19.87 (SE +/- 0.14, N = 3, MIN: 18.97)   2: 19.22 (SE +/- 0.09, N = 3, MIN: 18.41)   3: 21.12 (SE +/- 0.11, N = 3, MIN: 20.42)
  Harness: IP Shapes 1D - Data Type: u8s8f32
    1: 3.86741 (SE +/- 0.04801, N = 4, MIN: 3.27)   2: 3.81910 (SE +/- 0.05005, N = 3, MIN: 3.31)   3: 3.96369 (SE +/- 0.03574, N = 3, MIN: 3.38)
  Harness: IP Shapes 3D - Data Type: u8s8f32
    1: 2.82028 (SE +/- 0.03873, N = 3, MIN: 2.33)   2: 2.67884 (SE +/- 0.02321, N = 3, MIN: 2.27)   3: 4.50130 (SE +/- 0.01074, N = 3, MIN: 4.09)
  Harness: Convolution Batch Shapes Auto - Data Type: f32
    1: 17.72 (SE +/- 0.02, N = 3, MIN: 16.71)   2: 16.49 (SE +/- 0.04, N = 3, MIN: 15.75)   3: 20.45 (SE +/- 0.04, N = 3, MIN: 19.21)
  Harness: Deconvolution Batch shapes_1d - Data Type: f32
    1: 3.54557 (SE +/- 0.03944, N = 15, MIN: 2.98)   2: 3.42396 (SE +/- 0.03472, N = 15, MIN: 2.94)   3: 3.76272 (SE +/- 0.04485, N = 15, MIN: 3.09)
  Harness: Deconvolution Batch shapes_3d - Data Type: f32
    1: 6.45462 (SE +/- 0.15837, N = 15, MIN: 5.22)   2: 6.56237 (SE +/- 0.12641, N = 15, MIN: 5.14)   3: 7.09130 (SE +/- 0.10920, N = 3, MIN: 6.56)
  Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32
    1: 23.24 (SE +/- 0.11, N = 3, MIN: 20.59)   2: 22.74 (SE +/- 0.24, N = 3, MIN: 20.43)   3: 25.16 (SE +/- 0.09, N = 3, MIN: 22.78)
  Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32
    1: 3.54912 (SE +/- 0.03910, N = 3, MIN: 3)   2: 3.49511 (SE +/- 0.04588, N = 5, MIN: 3)   3: 3.71634 (SE +/- 0.01698, N = 3, MIN: 3.23)
  Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32
    1: 3.16880 (SE +/- 0.03490, N = 6, MIN: 2.83)   2: 3.18118 (SE +/- 0.01629, N = 3, MIN: 2.88)   3: 3.18392 (SE +/- 0.01545, N = 3, MIN: 2.91)
  Harness: Recurrent Neural Network Training - Data Type: f32
    1: 4554.32 (SE +/- 120.82, N = 15, MIN: 3273.38)   2: 4515.79 (SE +/- 153.50, N = 12, MIN: 2781.43)   3: 5020.04 (SE +/- 112.92, N = 15, MIN: 3390.91)
  Harness: Recurrent Neural Network Inference - Data Type: f32
    1: 3940.07 (SE +/- 64.36, N = 15, MIN: 3412.94)   2: 3865.58 (SE +/- 125.32, N = 15, MIN: 3206.95)   3: 4050.37 (SE +/- 94.46, N = 12, MIN: 3493.32)
  Harness: Recurrent Neural Network Training - Data Type: u8s8f32
    1: 4466.50 (SE +/- 216.78, N = 15, MIN: 2939.4)   2: 4580.21 (SE +/- 178.79, N = 15, MIN: 3020.84)   3: 4661.89 (SE +/- 148.69, N = 15, MIN: 3208.79)
  Harness: Recurrent Neural Network Inference - Data Type: u8s8f32
    1: 3557.56 (SE +/- 35.53, N = 3, MIN: 3305.84)   2: 3580.43 (SE +/- 114.13, N = 15, MIN: 3052.37)   3: 4199.67 (SE +/- 89.60, N = 15, MIN: 3498.58)
  Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32
    1: 0.919909 (SE +/- 0.007779, N = 3, MIN: 0.77)   2: 0.902805 (SE +/- 0.002458, N = 3, MIN: 0.77)   3: 2.603660 (SE +/- 0.009404, N = 3, MIN: 2.01)
  Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16
    1: 4707.16 (SE +/- 128.63, N = 12, MIN: 3518.69)   2: 4615.36 (SE +/- 149.33, N = 15, MIN: 3327.21)   3: 4885.17 (SE +/- 125.34, N = 15, MIN: 3744.2)
  Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16
    1: 3698.78 (SE +/- 128.96, N = 15, MIN: 2872.13)   2: 3877.71 (SE +/- 146.91, N = 15, MIN: 2904.51)   3: 4137.52 (SE +/- 85.43, N = 15, MIN: 3448.07)
  Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32
    1: 1.37881 (SE +/- 0.00740, N = 3, MIN: 1.12)   2: 1.38889 (SE +/- 0.00510, N = 3, MIN: 1.21)   3: 1.40554 (SE +/- 0.00370, N = 3, MIN: 1.26)
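With many separate oneDNN harnesses, run-to-run comparisons are easier on a summary statistic; the geometric mean is the conventional aggregate for ratio-scale benchmark results. A sketch using three of the f32 timings above (the choice of harnesses is illustrative, not an official composite):

```python
import math

def geomean(values):
    """Geometric mean: the usual aggregate for ratio-scale benchmark results."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Per-run oneDNN times (ms, fewer is better) for three of the f32 harnesses:
# IP Shapes 1D, Convolution Batch Shapes Auto, Matrix Multiply Transformer.
run1 = [2.90299, 17.72, 0.919909]
run2 = [2.52500, 16.49, 0.902805]
run3 = [3.81118, 20.45, 2.603660]

for name, run in (("1", run1), ("2", run2), ("3", run3)):
    print(f"run {name}: geomean {geomean(run):.3f} ms")
```

On this subset, run 2 comes out fastest and run 3 slowest, consistent with the per-harness numbers above.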
dav1d
dav1d is an open-source, speedy AV1 video decoder. This test profile times how long it takes to decode sample AV1 video content. Learn more via the OpenBenchmarking.org test page.
dav1d 0.8.1 (OpenBenchmarking.org; FPS, more is better; (CC) gcc options: -pthread):
  Video Input: Chimera 1080p
    1: 637.03 (SE +/- 9.95, N = 3, MIN: 344.69 / MAX: 796.13)   2: 659.19 (SE +/- 1.84, N = 3, MIN: 348.24 / MAX: 815.29)   3: 634.77 (SE +/- 9.40, N = 4, MIN: 349.39 / MAX: 815.27)
  Video Input: Summer Nature 4K
    1: 243.31 (SE +/- 4.09, N = 12, MIN: 81.19 / MAX: 282.97)   2: 251.57 (SE +/- 1.22, N = 3, MIN: 91.04 / MAX: 277.22)   3: 248.58 (SE +/- 4.61, N = 12, MIN: 85.73 / MAX: 286.18)
  Video Input: Summer Nature 1080p
    1: 629.60 (SE +/- 8.65, N = 15, MIN: 194.05 / MAX: 739.75)   2: 669.24 (SE +/- 4.94, N = 3, MIN: 231.81 / MAX: 754.48)   3: 634.70 (SE +/- 9.48, N = 15, MIN: 194.36 / MAX: 755.24)
  Video Input: Chimera 1080p 10-bit
    1: 138.45 (SE +/- 0.41, N = 3, MIN: 95.91 / MAX: 217.11)   2: 138.90 (SE +/- 0.30, N = 3, MIN: 96.19 / MAX: 217.56)   3: 139.13 (SE +/- 0.14, N = 3, MIN: 96.19 / MAX: 219.5)
rav1e
rav1e is an open-source AV1 video encoder written in Rust. This test measures encode performance at several speed levels. Learn more via the OpenBenchmarking.org test page.
rav1e 0.4 (OpenBenchmarking.org; Frames Per Second, more is better):
  Speed: 5
    1: 0.780 (SE +/- 0.006, N = 3)   2: 0.778 (SE +/- 0.005, N = 3)   3: 0.775 (SE +/- 0.005, N = 3)
  Speed: 6
    1: 1.016 (SE +/- 0.009, N = 3)   2: 1.031 (SE +/- 0.005, N = 3)   3: 1.030 (SE +/- 0.014, N = 3)
  Speed: 10
    1: 2.322 (SE +/- 0.021, N = 3)   2: 2.370 (SE +/- 0.016, N = 3)   3: 2.336 (SE +/- 0.011, N = 3)
Opus Codec Encoding
Opus is an open, lossy audio compression format designed primarily for interactive real-time applications over the Internet. This test uses Opus-Tools and measures the time required to encode a WAV file to Opus. Learn more via the OpenBenchmarking.org test page.
Opus Codec Encoding 1.3.1, WAV To Opus Encode (OpenBenchmarking.org; Seconds, fewer is better):
  1: 10.21 (SE +/- 0.00, N = 5)   2: 10.21 (SE +/- 0.00, N = 5)   3: 10.24 (SE +/- 0.00, N = 5)
  (CXX) g++ options: -fvisibility=hidden -logg -lm
Google SynthMark
SynthMark is a cross-platform tool for benchmarking CPU performance under a variety of real-time audio workloads. It uses a polyphonic synthesizer model to provide standardized tests for latency, jitter and computational throughput. Learn more via the OpenBenchmarking.org test page.
Google SynthMark 20201109, Test: VoiceMark_100 (OpenBenchmarking.org; Voices, more is better):
  1: 512.11 (SE +/- 0.00, N = 3)   2: 512.07 (SE +/- 0.02, N = 3)   3: 512.07 (SE +/- 0.02, N = 3)
  (CXX) g++ options: -lm -lpthread -std=c++11 -Ofast
Cryptsetup
Cryptsetup is a utility for setting up encrypted disks with dm-crypt. This is a benchmark of its built-in "cryptsetup benchmark" mode, covering PBKDF2 key derivation and several cipher configurations.
Cryptsetup (OpenBenchmarking.org; PBKDF2 in Iterations Per Second, ciphers in MiB/s; more is better):
  PBKDF2-whirlpool
    1: 510008 (SE +/- 572.73, N = 3)   2: 507751 (SE +/- 280.29, N = 7)   3: 506073 (SE +/- 975.00, N = 3)
  AES-XTS 256b Encryption
    1: 1444.3 (SE +/- 3.56, N = 3)   2: 1456.9 (SE +/- 2.27, N = 7)   3: 1454.5 (SE +/- 1.65, N = 3)
  AES-XTS 256b Decryption
    1: 1445.4 (SE +/- 2.14, N = 3)   2: 1442.6 (SE +/- 12.89, N = 7)   3: 1453.1 (SE +/- 2.05, N = 3)
  Serpent-XTS 256b Encryption
    1: 308.1 (SE +/- 0.58, N = 3)   2: 308.4 (SE +/- 0.53, N = 7)   3: 308.9 (SE +/- 0.03, N = 3)
  Serpent-XTS 256b Decryption
    1: 306.7 (SE +/- 0.12, N = 3)   2: 307.0 (SE +/- 0.10, N = 7)   3: 307.1 (SE +/- 0.00, N = 3)
  Twofish-XTS 256b Encryption
    1: 317.4 (SE +/- 0.37, N = 3)   2: 318.3 (SE +/- 0.08, N = 7)   3: 318.3 (SE +/- 0.03, N = 3)
  Twofish-XTS 256b Decryption
    1: 316.5 (SE +/- 0.06, N = 3)   2: 316.8 (SE +/- 0.07, N = 7)   3: 316.9 (SE +/- 0.10, N = 3)
  AES-XTS 512b Encryption
    1: 1279.4 (SE +/- 1.32, N = 3)   2: 1286.7 (SE +/- 1.46, N = 7)   3: 1285.4 (SE +/- 1.42, N = 3)
  AES-XTS 512b Decryption
    1: 1276.7 (SE +/- 2.31, N = 3)   2: 1285.0 (SE +/- 1.94, N = 7)   3: 1284.3 (SE +/- 1.76, N = 3)
  Twofish-XTS 512b Encryption
    1: 317.7 (SE +/- 0.19, N = 3)   2: 318.1 (SE +/- 0.06, N = 7)   3: 318.0 (SE +/- 0.20, N = 2)
  Twofish-XTS 512b Decryption
    1: 316.0 (SE +/- 0.05, N = 2)   2: 316.6 (SE +/- 0.06, N = 7)   3: 316.7 (SE +/- 0.12, N = 3)
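The PBKDF2 figure of merit is iterations per second: how many rounds of the key-derivation function the CPU can grind through in a fixed time budget. A minimal sketch of that style of measurement using Python's hashlib (which portably offers SHA-256 rather than the Whirlpool digest cryptsetup uses here, so the numbers are not comparable to the report's):

```python
import hashlib
import time

def pbkdf2_iterations_per_second(budget_s: float = 0.25) -> float:
    """Estimate PBKDF2 throughput in iterations per second.

    Uses PBKDF2-HMAC-SHA256 (not the Whirlpool digest that cryptsetup
    benchmarks above), so results are illustrative only.
    """
    iters_per_call = 10_000
    done = 0
    start = time.perf_counter()
    elapsed = 0.0
    while elapsed < budget_s:
        hashlib.pbkdf2_hmac("sha256", b"passphrase", b"salt", iters_per_call)
        done += iters_per_call
        elapsed = time.perf_counter() - start
    return done / elapsed

print(f"{pbkdf2_iterations_per_second():,.0f} PBKDF2-SHA256 iterations/s")
```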
Mobile Neural Network
MNN (Mobile Neural Network) is a highly efficient, lightweight deep learning framework developed by Alibaba. Learn more via the OpenBenchmarking.org test page.
Mobile Neural Network 1.1.1 (OpenBenchmarking.org; ms, fewer is better; runs 1 and 2 only; all built with (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -rdynamic -pthread -ldl):
  Model: SqueezeNetV1.0
    1: 14.96 (SE +/- 0.10, N = 3, MIN: 13.84 / MAX: 36.38)   2: 14.81 (SE +/- 0.20, N = 3, MIN: 13.46 / MAX: 30.98)
  Model: resnet-v2-50
    1: 54.12 (SE +/- 1.07, N = 3, MIN: 46.9 / MAX: 742.65)   2: 52.86 (SE +/- 2.27, N = 3, MIN: 46.53 / MAX: 819.73)
  Model: MobileNetV2_224
    1: 10.74 (SE +/- 0.19, N = 3, MIN: 10.16 / MAX: 11.79)   2: 10.91 (SE +/- 0.30, N = 3, MIN: 10.26 / MAX: 12.2)
  Model: mobilenet-v1-1.0
    1: 6.761 (SE +/- 0.103, N = 3, MIN: 6.2 / MAX: 8.27)   2: 7.012 (SE +/- 0.613, N = 3, MIN: 6 / MAX: 24.38)
  Model: inception-v3
    1: 68.59 (SE +/- 0.67, N = 3, MIN: 62.84 / MAX: 229.88)   2: 71.58 (SE +/- 1.22, N = 3, MIN: 64.39 / MAX: 186.82)
TNN
TNN is an open-source deep learning inference framework developed by Tencent. Learn more via the OpenBenchmarking.org test page.
TNN 0.2.3, Target: CPU (OpenBenchmarking.org; ms, fewer is better; runs 1 and 2 only; (CXX) g++ options: -fopenmp -pthread -fvisibility=hidden -O3 -rdynamic -ldl):
  Model: MobileNet v2
    1: 369.11 (SE +/- 1.37, N = 3, MIN: 357.24 / MAX: 557.3)   2: 369.67 (SE +/- 0.16, N = 3, MIN: 358.63 / MAX: 519.86)
  Model: SqueezeNet v1.1
    1: 333.40 (SE +/- 0.06, N = 3, MIN: 332.68 / MAX: 338.81)   2: 333.27 (SE +/- 0.15, N = 3, MIN: 332.42 / MAX: 334.08)
ONNX Runtime
ONNX Runtime is developed by Microsoft and partners as an open-source, cross-platform, high-performance machine learning inferencing and training accelerator. This test profile runs ONNX Runtime with various models available from the ONNX Model Zoo. Learn more via the OpenBenchmarking.org test page.
ONNX Runtime 1.6, Device: OpenMP CPU (OpenBenchmarking.org; Inferences Per Minute, more is better; runs 1 and 2 only; (CXX) g++ options: -fopenmp -ffunction-sections -fdata-sections -O3 -ldl -lrt):
  Model: yolov4
    1: 76 (SE +/- 2.66, N = 12)   2: 71 (SE +/- 2.07, N = 12)
  Model: bertsquad-10
    1: 58 (SE +/- 0.44, N = 3)   2: 54 (SE +/- 2.87, N = 9)
  Model: fcn-resnet101-11
    1: 53 (SE +/- 0.88, N = 3)   2: 52 (SE +/- 1.02, N = 12)
  Model: shufflenet-v2-10
    1: 2188 (SE +/- 142.39, N = 12)   2: 2318 (SE +/- 112.64, N = 12)
  Model: super-resolution-10
    1: 2084 (SE +/- 13.94, N = 3)   2: 2078 (SE +/- 32.31, N = 3)
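Inferences per minute is a throughput figure; dividing it into 60,000 ms gives the implied average time per inference. Note this is only the throughput inverse: with concurrent execution across threads, the latency of an individual request can differ. A sketch:

```python
def mean_latency_ms(inferences_per_minute: float) -> float:
    """Average time per inference (ms) implied by an inferences/minute figure."""
    return 60_000.0 / inferences_per_minute

# A yolov4-class result of ~76 inferences/min implies roughly 790 ms per inference,
# while a shufflenet-class result of ~2188/min implies under 30 ms.
print(round(mean_latency_ms(76), 1), round(mean_latency_ms(2188), 1))
```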
Kripke
Kripke is a simple, scalable, 3D Sn deterministic particle transport code. Its primary purpose is to research how data layout, programming paradigms and architectures affect the implementation and performance of Sn transport. Kripke is developed by LLNL. Learn more via the OpenBenchmarking.org test page.
Kripke 1.2.4 (OpenBenchmarking.org; Throughput FoM, more is better; runs 1 and 2 only):
  1: 37882537 (SE +/- 1848270.09, N = 15)   2: 35226890 (SE +/- 1684001.44, N = 12)
  (CXX) g++ options: -O3 -fopenmp
Run 1 testing initiated at 19 January 2021 18:46 by user phoronix.
Run 2 testing initiated at 20 January 2021 08:25 by user phoronix.
Run 3 testing initiated at 20 January 2021 18:26 by user phoronix.