5950X ASUS ROG CROSSHAIR VIII HERO WiFi BIOS
AMD Ryzen 9 5950X 16-Core testing with an ASUS ROG CROSSHAIR VIII HERO (WI-FI) (3202 BIOS) and AMD Radeon RX 5600 OEM/5600 XT / 5700/5700 8GB on Ubuntu 20.10 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2101203-HA-5950XASUS43&rdt&grw
5950X ASUS ROG CROSSHAIR VIII HERO WiFi BIOS
System details, 3003:
  Processor: AMD Ryzen 9 5950X 16-Core @ 3.40GHz (16 Cores / 32 Threads)
  Motherboard: ASUS ROG CROSSHAIR VIII HERO (WI-FI) (3003 BIOS)
  Chipset: AMD Starship/Matisse
  Memory: 32GB
  Disk: 2000GB Corsair Force MP600 + 2000GB
  Graphics: AMD Radeon RX 5600 OEM/5600 XT / 5700/5700 8GB (2100/875MHz)
  Audio: AMD Navi 10 HDMI Audio
  Monitor: ASUS MG28U
  Network: Realtek RTL8125 2.5GbE + Intel I211 + Intel Wi-Fi 6 AX200
  OS: Ubuntu 20.10
  Kernel: 5.11.0-051100rc2daily20210108-generic (x86_64) 20210107
  Desktop: GNOME Shell 3.38.1
  Display Server: X Server 1.20.9
  Display Driver: amdgpu 19.1.0
  OpenGL: 4.6 Mesa 21.0.0-devel (git-f01bca8 2021-01-08 groovy-oibaf-ppa) (LLVM 11.0.1)
  Vulkan: 1.2.164
  Compiler: GCC 10.2.0
  File-System: ext4
  Screen Resolution: 3840x2160
System details, 3202 (only the differing field is listed):
  Motherboard: ASUS ROG CROSSHAIR VIII HERO (WI-FI) (3202 BIOS)
Compiler Details
  - --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-10-JvwpWM/gcc-10-10.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-10-JvwpWM/gcc-10-10.2.0/debian/tmp-gcn/usr,hsa --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Details
  - Scaling Governor: acpi-cpufreq schedutil (Boost: Enabled)
  - CPU Microcode: 0xa201009
Graphics Details
  - GLAMOR
Python Details
  - Python 3.8.6
Security Details
  - itlb_multihit: Not affected
  - l1tf: Not affected
  - mds: Not affected
  - meltdown: Not affected
  - spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp
  - spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
  - spectre_v2: Mitigation of Full AMD retpoline IBPB: conditional IBRS_FW STIBP: always-on RSB filling
  - srbds: Not affected
  - tsx_async_abort: Not affected
Disk Details
  - 3202: NONE / errors=remount-ro,relatime,rw / Block Size: 4096
5950X ASUS ROG CROSSHAIR VIII HERO WiFi BIOS
Test | 3003 | 3202
yquake2: OpenGL 3.x - 3840 x 2160 | 979.3 | 986.5
compress-lz4: 1 - Compression Speed | 11911.06 | 11854.52
compress-lz4: 1 - Decompression Speed | 13410.2 | 13319.6
compress-lz4: 3 - Compression Speed | 68.88 | 69.10
compress-lz4: 9 - Compression Speed | 68.57 | 66.95
compress-lz4: 9 - Decompression Speed | 13093.1 | 12949.0
crafty: Elapsed Time | 11736507 | 11427830
brl-cad: VGR Performance Metric | 265419 | 262742
warsow: 3840 x 2160 | 429.6 | 430.5
compress-lz4: 3 - Decompression Speed | 13085.1 | 12946.4
encode-ape: WAV To APE | 9.805 | 9.851
encode-opus: WAV To Opus Encode | 6.126 | 6.150
encode-wavpack: WAV To WavPack | 10.947 | 11.112
astcenc: Fast | 4.11 | 4.10
astcenc: Medium | 5.35 | 5.43
astcenc: Thorough | 12.47 | 12.67
astcenc: Exhaustive | 99.00 | 100.90
etcpak: DXT1 | 1532.411 | 1562.887
etcpak: ETC1 | 382.932 | 387.141
etcpak: ETC2 | 236.507 | 244.998
etcpak: ETC1 + Dithering | 349.700 | 355.439
libraw: Post-Processing Benchmark | 52.99 | 53.09
webp: Quality 100 | 1.741 | 1.729
webp: Quality 100, Lossless | 13.039 | 12.873
webp: Quality 100, Highest Compression | 5.465 | 5.360
webp: Quality 100, Lossless, Highest Compression | 27.225 | 27.522
espeak: Text-To-Speech Synthesis | 21.560 | 21.692
synthmark: VoiceMark_100 | 958.662 | 957.951
tesseract: 3840 x 2160 | 356.1715 | 353.5707
xonotic: 3840 x 2160 - Ultimate | 257.4557090 | 288.9672535
relion: Basic - CPU | 1892.609 | 1875.48
hmmer: Pfam Database Search | 82.704 | 82.928
numpy | 514.13 | 514.08
hpcc: G-HPL | 53.16620 | 53.16863
hpcc: G-Ffte | 6.23452 | 6.55050
hpcc: EP-DGEMM | 16.85917 | 16.57023
hpcc: G-Ptrans | 2.45326 | 2.47841
hpcc: EP-STREAM Triad | 1.42468 | 1.47956
hpcc: G-Rand Access | 0.04998 | 0.04973
hpcc: Rand Ring Latency | 0.48933 | 0.49204
hpcc: Rand Ring Bandwidth | 1.92562 | 2.02474
hpcc: Max Ping Pong Bandwidth | 34205.275 | 34166.839
cloverleaf: Lagrangian-Eulerian Hydrodynamics | 136.80 | 134.74
dolfyn: Computational Fluid Dynamics | 12.898 | 13.292
deepspeech: CPU | 70.48086 | 70.53470
rnnoise | 15.181 | 15.234
mnn: SqueezeNetV1.0 | 5.232 | 5.153
mnn: resnet-v2-50 | 24.036 | 23.630
mnn: MobileNetV2_224 | 3.402 | 3.325
mnn: mobilenet-v1-1.0 | 2.501 | 2.481
mnn: inception-v3 | 31.005 | 30.332
onnx: yolov4 - OpenMP CPU | 438 | 428
onnx: bertsquad-10 - OpenMP CPU | 705 | 665
ior: 4MB - Default Test Directory | 1580.25 | 1482.95
ior: 256MB - Default Test Directory | 1339.91 | 1251.65
onnx: fcn-resnet101-11 - OpenMP CPU | 99 | 98
onnx: shufflenet-v2-10 - OpenMP CPU | 16013 | 15863
onnx: super-resolution-10 - OpenMP CPU | 7056 | 7324
tnn: CPU - MobileNet v2 | 218.293 | 220.329
tnn: CPU - SqueezeNet v1.1 | 211.459 | 212.620
ncnn: CPU - mobilenet | 12.04 | 11.99
ncnn: CPU-v2-v2 - mobilenet-v2 | 4.38 | 4.46
ncnn: CPU-v3-v3 - mobilenet-v3 | 4.07 | 4.15
ncnn: CPU - shufflenet-v2 | 4.38 | 4.42
ncnn: CPU - mnasnet | 3.90 | 3.96
ncnn: CPU - efficientnet-b0 | 5.29 | 5.37
ncnn: CPU - blazeface | 1.80 | 1.82
ncnn: CPU - googlenet | 12.99 | 13.00
ncnn: CPU - vgg16 | 60.60 | 59.90
ncnn: CPU - resnet18 | 14.51 | 14.57
ncnn: CPU - alexnet | 11.04 | 11.26
ncnn: CPU - resnet50 | 25.05 | 25.00
ncnn: CPU - yolov4-tiny | 21.14 | 21.29
ncnn: CPU - squeezenet_ssd | 14.64 | 14.60
ncnn: CPU - regnety_400m | 17.67 | 17.95
gromacs: Water Benchmark | 1.275 | 1.258
lammps: 20k Atoms | 13.539 | 13.390
lammps: Rhodopsin Protein | 13.104 | 13.113
namd: ATPase Simulation - 327,506 Atoms | 1.07519 | 1.08736
onednn: IP Shapes 1D - f32 - CPU | 3.98022 | 3.97992
onednn: IP Shapes 3D - f32 - CPU | 9.48452 | 9.51868
onednn: IP Shapes 1D - u8s8f32 - CPU | 0.816147 | 0.832251
onednn: IP Shapes 3D - u8s8f32 - CPU | 0.477138 | 0.484368
onednn: Convolution Batch Shapes Auto - f32 - CPU | 17.2340 | 17.2901
onednn: Deconvolution Batch shapes_1d - f32 - CPU | 2.38690 | 2.43213
onednn: Deconvolution Batch shapes_3d - f32 - CPU | 3.50445 | 3.53182
onednn: Convolution Batch Shapes Auto - u8s8f32 - CPU | 19.1307 | 19.2565
onednn: Deconvolution Batch shapes_1d - u8s8f32 - CPU | 2.95002 | 2.99402
onednn: Deconvolution Batch shapes_3d - u8s8f32 - CPU | 1.53431 | 1.59818
onednn: Recurrent Neural Network Training - f32 - CPU | 2753.77 | 2749.23
onednn: Recurrent Neural Network Inference - f32 - CPU | 1809.76 | 1823.19
onednn: Recurrent Neural Network Training - u8s8f32 - CPU | 2752.88 | 2763.90
onednn: Recurrent Neural Network Inference - u8s8f32 - CPU | 1783.66 | 1789.23
onednn: Matrix Multiply Batch Shapes Transformer - f32 - CPU | 0.625862 | 0.636735
onednn: Recurrent Neural Network Training - bf16bf16bf16 - CPU | 2769.09 | 2761.96
onednn: Recurrent Neural Network Inference - bf16bf16bf16 - CPU | 1818.11 | 1795.37
onednn: Matrix Multiply Batch Shapes Transformer - u8s8f32 - CPU | 1.45970 | 1.48786
amg | 207261167 | 210165200
kripke | 72385553 | 72580800
lulesh | 5041.8561 | 5066.7382
openfoam: Motorbike 30M | 104.48 | 97.87
openfoam: Motorbike 60M | 1391.39 | 1380.20
qmcpack: simple-H2O | 22.312 | 22.371
qe: AUSURF112 | 1199.39 | 1221.19
cp2k: Fayalite-FIST Data | 787.495 | 801.638
coremark: CoreMark Size 666 - Iterations Per Second | 829763.126732 | 815726.122263
compress-zstd: 3 | 4723.9 | 4738.4
compress-zstd: 19 | 43.4 | 43.3
build-linux-kernel: Time To Compile | 45.787 | 46.410
dav1d: Chimera 1080p | 590.32 | 592.56
dav1d: Summer Nature 4K | 224.40 | 228.62
dav1d: Summer Nature 1080p | 534.95 | 535.52
x265: Bosphorus 4K | 24.57 | 24.18
x265: Bosphorus 1080p | 47.87 | 47.62
rav1e: 6 | 1.958 | 1.965
rav1e: 10 | 3.362 | 3.446
build-godot: Time To Compile | 79.198 | 80.072
indigobench: CPU - Bedroom | 4.175 | 4.132
indigobench: CPU - Supercar | 8.803 | 8.669
build2: Time To Compile | 79.997 | 81.883
build-eigen: Time To Compile | 60.052 | 61.251
phpbench: PHP Benchmark Suite | 831939 | 834019
sqlite-speedtest: Timed Time - Size 1,000 | 41.631 | 42.862
ior: 2MB - Default Test Directory | 1540.02 | 1388.54
dav1d: Chimera 1080p 10-bit | 96.35 | 96.66
ior: 8MB - Default Test Directory | 1601.21 | 1461.18
rav1e: 5 | 1.497 | 1.505
ior: 512MB - Default Test Directory | 1748.21 | 1682.82
etlegacy: Renderer2 - 3840 x 2160 | 224.3 |
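The overview lists one result per test for each BIOS version, but the tests mix "More Is Better" and "Fewer Is Better" units, so raw differences are not directly comparable. The following is a minimal sketch, not part of the Phoronix Test Suite, of how the relative change from 3003 to 3202 could be computed with a consistent sign. The function name `percent_change` and the `samples` list are illustrative assumptions; the numbers are copied from this result file.

```python
def percent_change(baseline: float, candidate: float, higher_is_better: bool) -> float:
    """Return the change of candidate vs. baseline in percent.

    The sign is normalized so that a positive value always means the
    candidate (here: BIOS 3202) did better than the baseline (BIOS 3003).
    """
    change = (candidate - baseline) / baseline * 100.0
    return change if higher_is_better else -change


# (test, 3003 result, 3202 result, higher_is_better); values from this result file
samples = [
    ("yquake2: OpenGL 3.x - 3840 x 2160", 979.3, 986.5, True),
    ("crafty: Elapsed Time", 11736507, 11427830, True),
    ("astcenc: Exhaustive", 99.00, 100.90, False),
    ("xonotic: 3840 x 2160 - Ultimate", 257.46, 288.97, True),
]

for name, bios_3003, bios_3202, higher_is_better in samples:
    delta = percent_change(bios_3003, bios_3202, higher_is_better)
    print(f"{name}: {delta:+.2f}%")
```

A percentage alone does not say whether a difference is meaningful; the per-test standard errors reported further down should be taken into account as well.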
yquake2 7.45 - Renderer: OpenGL 3.x - Resolution: 3840 x 2160 (Frames Per Second, More Is Better)
  3003: 979.3 (SE +/- 1.03, N = 3)
  3202: 986.5 (SE +/- 1.35, N = 3)
  1. (CC) gcc options: -lm -ldl -rdynamic -shared -lSDL2 -O2 -pipe -fomit-frame-pointer -std=gnu99 -fno-strict-aliasing -fwrapv -fvisibility=hidden -MMD -mfpmath=sse -fPIC
LZ4 Compression 1.9.3 - Compression Level: 1 - Compression Speed (MB/s, More Is Better)
  3003: 11911.06 (SE +/- 49.00, N = 3)
  3202: 11854.52 (SE +/- 59.07, N = 3)
  1. (CC) gcc options: -O3
LZ4 Compression 1.9.3 - Compression Level: 1 - Decompression Speed (MB/s, More Is Better)
  3003: 13410.2 (SE +/- 19.80, N = 3)
  3202: 13319.6 (SE +/- 25.73, N = 3)
  1. (CC) gcc options: -O3
LZ4 Compression 1.9.3 - Compression Level: 3 - Compression Speed (MB/s, More Is Better)
  3003: 68.88 (SE +/- 0.92, N = 3)
  3202: 69.10 (SE +/- 0.20, N = 3)
  1. (CC) gcc options: -O3
LZ4 Compression 1.9.3 - Compression Level: 9 - Compression Speed (MB/s, More Is Better)
  3003: 68.57 (SE +/- 0.80, N = 12)
  3202: 66.95 (SE +/- 0.20, N = 3)
  1. (CC) gcc options: -O3
LZ4 Compression 1.9.3 - Compression Level: 9 - Decompression Speed (MB/s, More Is Better)
  3003: 13093.1 (SE +/- 11.50, N = 12)
  3202: 12949.0 (SE +/- 11.91, N = 3)
  1. (CC) gcc options: -O3
Crafty 25.2 - Elapsed Time (Nodes Per Second, More Is Better)
  3003: 11736507 (SE +/- 108514.09, N = 3)
  3202: 11427830 (SE +/- 40276.62, N = 3)
  1. (CC) gcc options: -pthread -lstdc++ -fprofile-use -lm
BRL-CAD 7.30.8 - VGR Performance Metric (VGR Performance Metric, More Is Better)
  3003: 265419
  3202: 262742
  1. (CXX) g++ options: -std=c++11 -pipe -fno-strict-aliasing -fno-common -fexceptions -ftemplate-depth-128 -m64 -ggdb3 -O3 -fipa-pta -fstrength-reduce -finline-functions -flto -pedantic -rdynamic -lSM -lICE -lXi -lGLU -lGL -lGLdispatch -lX11 -lXext -lXrender -lpthread -ldl -luuid -lm
Warsow 2.5 Beta - Resolution: 3840 x 2160 (Frames Per Second, More Is Better)
  3003: 429.6 (SE +/- 0.72, N = 3)
  3202: 430.5 (SE +/- 0.67, N = 3)
LZ4 Compression 1.9.3 - Compression Level: 3 - Decompression Speed (MB/s, More Is Better)
  3003: 13085.1 (SE +/- 7.36, N = 3)
  3202: 12946.4 (SE +/- 2.16, N = 3)
  1. (CC) gcc options: -O3
Monkey Audio Encoding 3.99.6 - WAV To APE (Seconds, Fewer Is Better)
  3003: 9.805 (SE +/- 0.045, N = 5)
  3202: 9.851 (SE +/- 0.041, N = 5)
  1. (CXX) g++ options: -O3 -pedantic -rdynamic -lrt
Opus Codec Encoding 1.3.1 - WAV To Opus Encode (Seconds, Fewer Is Better)
  3003: 6.126 (SE +/- 0.032, N = 5)
  3202: 6.150 (SE +/- 0.042, N = 5)
  1. (CXX) g++ options: -fvisibility=hidden -logg -lm
WavPack Audio Encoding 5.3 - WAV To WavPack (Seconds, Fewer Is Better)
  3003: 10.95 (SE +/- 0.06, N = 5)
  3202: 11.11 (SE +/- 0.03, N = 5)
  1. (CXX) g++ options: -rdynamic
ASTC Encoder 2.0 - Preset: Fast (Seconds, Fewer Is Better)
  3003: 4.11 (SE +/- 0.01, N = 3)
  3202: 4.10 (SE +/- 0.03, N = 3)
  1. (CXX) g++ options: -std=c++14 -fvisibility=hidden -O3 -flto -mfpmath=sse -mavx2 -mpopcnt -lpthread
ASTC Encoder 2.0 - Preset: Medium (Seconds, Fewer Is Better)
  3003: 5.35 (SE +/- 0.04, N = 3)
  3202: 5.43 (SE +/- 0.02, N = 3)
  1. (CXX) g++ options: -std=c++14 -fvisibility=hidden -O3 -flto -mfpmath=sse -mavx2 -mpopcnt -lpthread
ASTC Encoder 2.0 - Preset: Thorough (Seconds, Fewer Is Better)
  3003: 12.47 (SE +/- 0.03, N = 3)
  3202: 12.67 (SE +/- 0.02, N = 3)
  1. (CXX) g++ options: -std=c++14 -fvisibility=hidden -O3 -flto -mfpmath=sse -mavx2 -mpopcnt -lpthread
ASTC Encoder 2.0 - Preset: Exhaustive (Seconds, Fewer Is Better)
  3003: 99.00 (SE +/- 0.10, N = 3)
  3202: 100.90 (SE +/- 0.15, N = 3)
  1. (CXX) g++ options: -std=c++14 -fvisibility=hidden -O3 -flto -mfpmath=sse -mavx2 -mpopcnt -lpthread
Etcpak 0.7 - Configuration: DXT1 (Mpx/s, More Is Better)
  3003: 1532.41 (SE +/- 2.32, N = 3)
  3202: 1562.89 (SE +/- 21.94, N = 3)
  1. (CXX) g++ options: -O3 -march=native -std=c++11 -lpthread
Etcpak 0.7 - Configuration: ETC1 (Mpx/s, More Is Better)
  3003: 382.93 (SE +/- 1.98, N = 3)
  3202: 387.14 (SE +/- 1.60, N = 3)
  1. (CXX) g++ options: -O3 -march=native -std=c++11 -lpthread
Etcpak 0.7 - Configuration: ETC2 (Mpx/s, More Is Better)
  3003: 236.51 (SE +/- 0.59, N = 3)
  3202: 245.00 (SE +/- 1.66, N = 3)
  1. (CXX) g++ options: -O3 -march=native -std=c++11 -lpthread
Etcpak 0.7 - Configuration: ETC1 + Dithering (Mpx/s, More Is Better)
  3003: 349.70 (SE +/- 0.36, N = 3)
  3202: 355.44 (SE +/- 3.87, N = 3)
  1. (CXX) g++ options: -O3 -march=native -std=c++11 -lpthread
LibRaw 0.20 - Post-Processing Benchmark (Mpix/sec, More Is Better)
  3003: 52.99 (SE +/- 0.08, N = 3)
  3202: 53.09 (SE +/- 0.24, N = 3)
  1. (CXX) g++ options: -O2 -fopenmp -ljpeg -lz -lm
WebP Image Encode 1.1 - Encode Settings: Quality 100 (Encode Time - Seconds, Fewer Is Better)
  3003: 1.741 (SE +/- 0.001, N = 3)
  3202: 1.729 (SE +/- 0.004, N = 3)
  1. (CC) gcc options: -fvisibility=hidden -O2 -pthread -lm -ljpeg -lpng16 -ltiff
WebP Image Encode 1.1 - Encode Settings: Quality 100, Lossless (Encode Time - Seconds, Fewer Is Better)
  3003: 13.04 (SE +/- 0.01, N = 3)
  3202: 12.87 (SE +/- 0.05, N = 3)
  1. (CC) gcc options: -fvisibility=hidden -O2 -pthread -lm -ljpeg -lpng16 -ltiff
WebP Image Encode 1.1 - Encode Settings: Quality 100, Highest Compression (Encode Time - Seconds, Fewer Is Better)
  3003: 5.465 (SE +/- 0.059, N = 3)
  3202: 5.360 (SE +/- 0.062, N = 3)
  1. (CC) gcc options: -fvisibility=hidden -O2 -pthread -lm -ljpeg -lpng16 -ltiff
WebP Image Encode 1.1 - Encode Settings: Quality 100, Lossless, Highest Compression (Encode Time - Seconds, Fewer Is Better)
  3003: 27.23 (SE +/- 0.05, N = 3)
  3202: 27.52 (SE +/- 0.04, N = 3)
  1. (CC) gcc options: -fvisibility=hidden -O2 -pthread -lm -ljpeg -lpng16 -ltiff
eSpeak-NG Speech Engine 20200907 - Text-To-Speech Synthesis (Seconds, Fewer Is Better)
  3003: 21.56 (SE +/- 0.08, N = 4)
  3202: 21.69 (SE +/- 0.08, N = 4)
  1. (CC) gcc options: -O2 -std=c99
Google SynthMark 20201109 - Test: VoiceMark_100 (Voices, More Is Better)
  3003: 958.66 (SE +/- 3.00, N = 3)
  3202: 957.95 (SE +/- 4.96, N = 3)
  1. (CXX) g++ options: -lm -lpthread -std=c++11 -Ofast
Tesseract 2014-05-12 - Resolution: 3840 x 2160 (Frames Per Second, More Is Better)
  3003: 356.17 (SE +/- 3.53, N = 15)
  3202: 353.57 (SE +/- 4.21, N = 15)
Xonotic 0.8.2 - Resolution: 3840 x 2160 - Effects Quality: Ultimate (Frames Per Second, More Is Better)
  3003: 257.46 (SE +/- 4.83, N = 15; MIN: 55 / MAX: 623)
  3202: 288.97 (SE +/- 3.79, N = 3; MIN: 60 / MAX: 571)
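Each result is reported as a mean with a standard error (SE) over N runs, so a gap between the two BIOS versions is only interesting when it is large relative to those errors. The sketch below is a rough heuristic, not something the Phoronix Test Suite computes: it expresses the difference of two means in units of their combined standard error. The function name `difference_in_standard_errors` is hypothetical, and the approach assumes independent runs and ignores the differing run counts (N = 15 vs. N = 3 for Xonotic).

```python
import math


def difference_in_standard_errors(mean_a: float, se_a: float,
                                  mean_b: float, se_b: float) -> float:
    """Difference (b - a) divided by the combined standard error.

    Assumes the two means come from independent sets of runs, so their
    standard errors add in quadrature.
    """
    combined_se = math.sqrt(se_a ** 2 + se_b ** 2)
    return (mean_b - mean_a) / combined_se


# Values copied from this result file: (mean 3003, SE 3003, mean 3202, SE 3202)
results = {
    "Warsow 3840 x 2160": (429.6, 0.72, 430.5, 0.67),
    "Xonotic 3840 x 2160 - Ultimate": (257.46, 4.83, 288.97, 3.79),
}

for name, (mean_3003, se_3003, mean_3202, se_3202) in results.items():
    ratio = difference_in_standard_errors(mean_3003, se_3003, mean_3202, se_3202)
    verdict = "larger than run-to-run noise" if abs(ratio) > 2 else "within run-to-run noise"
    print(f"{name}: {ratio:+.2f} combined SEs ({verdict})")
```

The threshold of 2 combined standard errors is an arbitrary illustrative cutoff. For a test like Xonotic, where the reported MIN/MAX frame rates span a very wide range, it is best read as a prompt to rerun the test rather than as proof of a BIOS effect.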
RELION 3.1.1 - Test: Basic - Device: CPU (Seconds, Fewer Is Better)
  3003: 1892.61 (SE +/- 3.84, N = 3)
  3202: 1875.48 (SE +/- 6.24, N = 3)
  1. (CXX) g++ options: -fopenmp -std=c++0x -O3 -rdynamic -ldl -ltiff -lfftw3f -lfftw3 -lpng -pthread -lmpi_cxx -lmpi
Timed HMMer Search 3.3.1 - Pfam Database Search (Seconds, Fewer Is Better)
  3003: 82.70 (SE +/- 0.15, N = 3)
  3202: 82.93 (SE +/- 0.07, N = 3)
  1. (CC) gcc options: -O3 -pthread -lhmmer -leasel -lm
Numpy Benchmark (Score, More Is Better)
  3003: 514.13 (SE +/- 0.48, N = 3)
  3202: 514.08 (SE +/- 4.15, N = 3)
HPC Challenge 1.5.0 - Test / Class: G-HPL (GFLOPS, More Is Better)
  3003: 53.17 (SE +/- 0.08, N = 3)
  3202: 53.17 (SE +/- 0.05, N = 3)
  1. (CC) gcc options: -lblas -lm -pthread -lmpi -fomit-frame-pointer -funroll-loops
  2. OpenBLAS + Open MPI 4.0.3
HPC Challenge 1.5.0 - Test / Class: G-Ffte (GFLOPS, More Is Better)
  3003: 6.23452 (SE +/- 0.10692, N = 3)
  3202: 6.55050 (SE +/- 0.02613, N = 3)
  1. (CC) gcc options: -lblas -lm -pthread -lmpi -fomit-frame-pointer -funroll-loops
  2. OpenBLAS + Open MPI 4.0.3
HPC Challenge 1.5.0 - Test / Class: EP-DGEMM (GFLOPS, More Is Better)
  3003: 16.86 (SE +/- 0.11, N = 3)
  3202: 16.57 (SE +/- 0.17, N = 3)
  1. (CC) gcc options: -lblas -lm -pthread -lmpi -fomit-frame-pointer -funroll-loops
  2. OpenBLAS + Open MPI 4.0.3
HPC Challenge 1.5.0 - Test / Class: G-Ptrans (GB/s, More Is Better)
  3003: 2.45326 (SE +/- 0.00427, N = 3)
  3202: 2.47841 (SE +/- 0.01009, N = 3)
  1. (CC) gcc options: -lblas -lm -pthread -lmpi -fomit-frame-pointer -funroll-loops
  2. OpenBLAS + Open MPI 4.0.3
HPC Challenge 1.5.0 - Test / Class: EP-STREAM Triad (GB/s, More Is Better)
  3003: 1.42468 (SE +/- 0.00070, N = 3)
  3202: 1.47956 (SE +/- 0.00089, N = 3)
  1. (CC) gcc options: -lblas -lm -pthread -lmpi -fomit-frame-pointer -funroll-loops
  2. OpenBLAS + Open MPI 4.0.3
HPC Challenge 1.5.0 - Test / Class: G-Random Access (GUP/s, More Is Better)
  3003: 0.04998 (SE +/- 0.00046, N = 3)
  3202: 0.04973 (SE +/- 0.00017, N = 3)
  1. (CC) gcc options: -lblas -lm -pthread -lmpi -fomit-frame-pointer -funroll-loops
  2. OpenBLAS + Open MPI 4.0.3
HPC Challenge 1.5.0 - Test / Class: Random Ring Latency (usecs, Fewer Is Better)
  3003: 0.48933 (SE +/- 0.00404, N = 3)
  3202: 0.49204 (SE +/- 0.00234, N = 3)
  1. (CC) gcc options: -lblas -lm -pthread -lmpi -fomit-frame-pointer -funroll-loops
  2. OpenBLAS + Open MPI 4.0.3
HPC Challenge 1.5.0 - Test / Class: Random Ring Bandwidth (GB/s, More Is Better)
  3003: 1.92562 (SE +/- 0.02644, N = 3)
  3202: 2.02474 (SE +/- 0.02919, N = 3)
  1. (CC) gcc options: -lblas -lm -pthread -lmpi -fomit-frame-pointer -funroll-loops
  2. OpenBLAS + Open MPI 4.0.3
HPC Challenge 1.5.0 - Test / Class: Max Ping Pong Bandwidth (MB/s, More Is Better)
  3003: 34205.28 (SE +/- 146.47, N = 3)
  3202: 34166.84 (SE +/- 130.59, N = 3)
  1. (CC) gcc options: -lblas -lm -pthread -lmpi -fomit-frame-pointer -funroll-loops
  2. OpenBLAS + Open MPI 4.0.3
CloverLeaf - Lagrangian-Eulerian Hydrodynamics (Seconds, Fewer Is Better)
  3003: 136.80 (SE +/- 0.03, N = 3)
  3202: 134.74 (SE +/- 0.24, N = 3)
  1. (F9X) gfortran options: -O3 -march=native -funroll-loops -fopenmp
Dolfyn 0.527 - Computational Fluid Dynamics (Seconds, Fewer Is Better)
  3003: 12.90 (SE +/- 0.10, N = 3)
  3202: 13.29 (SE +/- 0.02, N = 3)
DeepSpeech 0.6 - Acceleration: CPU (Seconds, Fewer Is Better)
  3003: 70.48 (SE +/- 0.10, N = 3)
  3202: 70.53 (SE +/- 0.04, N = 3)
RNNoise 2020-06-28 (Seconds, Fewer Is Better)
  3003: 15.18 (SE +/- 0.07, N = 3)
  3202: 15.23 (SE +/- 0.18, N = 3)
  1. (CC) gcc options: -O2 -pedantic -fvisibility=hidden
Mobile Neural Network 1.1.1 - Model: SqueezeNetV1.0 (ms, Fewer Is Better)
  3003: 5.232 (SE +/- 0.062, N = 3; MIN: 5.02 / MAX: 8.4)
  3202: 5.153 (SE +/- 0.022, N = 3; MIN: 5.02 / MAX: 14.32)
  1. (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -rdynamic -pthread -ldl
Mobile Neural Network 1.1.1 - Model: resnet-v2-50 (ms, Fewer Is Better)
  3003: 24.04 (SE +/- 0.25, N = 3; MIN: 21.95 / MAX: 33.08)
  3202: 23.63 (SE +/- 0.19, N = 3; MIN: 22.31 / MAX: 33.12)
  1. (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -rdynamic -pthread -ldl
Mobile Neural Network 1.1.1 - Model: MobileNetV2_224 (ms, Fewer Is Better)
  3003: 3.402 (SE +/- 0.047, N = 3; MIN: 3.23 / MAX: 5.81)
  3202: 3.325 (SE +/- 0.039, N = 3; MIN: 3.16 / MAX: 4.02)
  1. (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -rdynamic -pthread -ldl
Mobile Neural Network 1.1.1 - Model: mobilenet-v1-1.0 (ms, Fewer Is Better)
  3003: 2.501 (SE +/- 0.094, N = 3; MIN: 2.32 / MAX: 4.53)
  3202: 2.481 (SE +/- 0.027, N = 3; MIN: 2.42 / MAX: 2.71)
  1. (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -rdynamic -pthread -ldl
Mobile Neural Network 1.1.1 - Model: inception-v3 (ms, Fewer Is Better)
  3003: 31.01 (SE +/- 0.18, N = 3; MIN: 29.92 / MAX: 38.55)
  3202: 30.33 (SE +/- 0.26, N = 3; MIN: 29.28 / MAX: 56.48)
  1. (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -rdynamic -pthread -ldl
ONNX Runtime 1.6 - Model: yolov4 - Device: OpenMP CPU (Inferences Per Minute, More Is Better)
  3003: 438 (SE +/- 1.17, N = 3)
  3202: 428 (SE +/- 3.35, N = 3)
  1. (CXX) g++ options: -fopenmp -ffunction-sections -fdata-sections -O3 -ldl -lrt
ONNX Runtime 1.6 - Model: bertsquad-10 - Device: OpenMP CPU (Inferences Per Minute, More Is Better)
  3003: 705 (SE +/- 1.32, N = 3)
  3202: 665 (SE +/- 11.02, N = 12)
  1. (CXX) g++ options: -fopenmp -ffunction-sections -fdata-sections -O3 -ldl -lrt
IOR 3.3.0 - Block Size: 4MB - Disk Target: Default Test Directory (MB/s, More Is Better)
  3003: 1580.25 (SE +/- 5.96, N = 3; MIN: 1161.4 / MAX: 2244.12)
  3202: 1482.95 (SE +/- 11.11, N = 11; MIN: 955.9 / MAX: 2484.92)
  1. (CC) gcc options: -O2 -lm -pthread -lmpi
IOR 3.3.0 - Block Size: 256MB - Disk Target: Default Test Directory (MB/s, More Is Better)
  3003: 1339.91 (SE +/- 19.35, N = 9; MIN: 282.98 / MAX: 2236.64)
  3202: 1251.65 (SE +/- 14.35, N = 9; MIN: 354.68 / MAX: 2107.13)
  1. (CC) gcc options: -O2 -lm -pthread -lmpi
ONNX Runtime 1.6 - Model: fcn-resnet101-11 - Device: OpenMP CPU (Inferences Per Minute, More Is Better)
  3003: 99
  3202: 98
  SE +/- 0.33, N = 3
  1. (CXX) g++ options: -fopenmp -ffunction-sections -fdata-sections -O3 -ldl -lrt
ONNX Runtime 1.6 - Model: shufflenet-v2-10 - Device: OpenMP CPU (Inferences Per Minute, More Is Better)
  3003: 16013 (SE +/- 20.67, N = 3)
  3202: 15863 (SE +/- 68.50, N = 3)
  1. (CXX) g++ options: -fopenmp -ffunction-sections -fdata-sections -O3 -ldl -lrt
ONNX Runtime 1.6 - Model: super-resolution-10 - Device: OpenMP CPU (Inferences Per Minute, More Is Better)
  3003: 7056 (SE +/- 202.78, N = 12)
  3202: 7324 (SE +/- 175.44, N = 12)
  1. (CXX) g++ options: -fopenmp -ffunction-sections -fdata-sections -O3 -ldl -lrt
TNN 0.2.3 - Target: CPU - Model: MobileNet v2 (ms, Fewer Is Better)
  3003: 218.29 (SE +/- 0.32, N = 3; MIN: 208.52 / MAX: 289.05)
  3202: 220.33 (SE +/- 0.67, N = 3; MIN: 216.95 / MAX: 261.2)
  1. (CXX) g++ options: -fopenmp -pthread -fvisibility=hidden -O3 -rdynamic -ldl
TNN 0.2.3 - Target: CPU - Model: SqueezeNet v1.1 (ms, Fewer Is Better)
  3003: 211.46 (SE +/- 0.43, N = 3; MIN: 210.71 / MAX: 212.37)
  3202: 212.62 (SE +/- 0.34, N = 3; MIN: 211.89 / MAX: 213.24)
  1. (CXX) g++ options: -fopenmp -pthread -fvisibility=hidden -O3 -rdynamic -ldl
NCNN 20201218 - Target: CPU - Model: mobilenet (ms, Fewer Is Better)
  3003: 12.04 (SE +/- 0.15, N = 3; MIN: 11.72 / MAX: 14.17)
  3202: 11.99 (SE +/- 0.01, N = 3; MIN: 11.79 / MAX: 12.19)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20201218 - Target: CPU-v2-v2 - Model: mobilenet-v2 (ms, Fewer Is Better)
  3003: 4.38 (SE +/- 0.01, N = 3; MIN: 4.24 / MAX: 7.32)
  3202: 4.46 (SE +/- 0.01, N = 3; MIN: 4.31 / MAX: 5.3)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20201218 - Target: CPU-v3-v3 - Model: mobilenet-v3 (ms, Fewer Is Better)
  3003: 4.07 (SE +/- 0.01, N = 3; MIN: 4.03 / MAX: 5.2)
  3202: 4.15 (SE +/- 0.00, N = 3; MIN: 4.11 / MAX: 5.03)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20201218 - Target: CPU - Model: shufflenet-v2 (ms, Fewer Is Better)
  3003: 4.38 (SE +/- 0.01, N = 3; MIN: 4.33 / MAX: 4.89)
  3202: 4.42 (SE +/- 0.01, N = 3; MIN: 4.35 / MAX: 5.28)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20201218 - Target: CPU - Model: mnasnet (ms, Fewer Is Better)
  3003: 3.90 (SE +/- 0.00, N = 3; MIN: 3.78 / MAX: 4.83)
  3202: 3.96 (SE +/- 0.01, N = 3; MIN: 3.84 / MAX: 5.26)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20201218 - Target: CPU - Model: efficientnet-b0 (ms, Fewer Is Better)
  3003: 5.29 (SE +/- 0.00, N = 3; MIN: 5.24 / MAX: 5.79)
  3202: 5.37 (SE +/- 0.01, N = 3; MIN: 5.31 / MAX: 7.18)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20201218 - Target: CPU - Model: blazeface (ms, Fewer Is Better)
  3003: 1.80 (SE +/- 0.00, N = 3; MIN: 1.78 / MAX: 2.28)
  3202: 1.82 (SE +/- 0.00, N = 3; MIN: 1.79 / MAX: 2.65)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20201218 - Target: CPU - Model: googlenet (ms, Fewer Is Better)
  3003: 12.99 (SE +/- 0.00, N = 3; MIN: 12.62 / MAX: 13.57)
  3202: 13.00 (SE +/- 0.02, N = 3; MIN: 12.64 / MAX: 20.99)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20201218 - Target: CPU - Model: vgg16 (ms, Fewer Is Better)
  3003: 60.60 (SE +/- 0.13, N = 3; MIN: 59.43 / MAX: 62.32)
  3202: 59.90 (SE +/- 0.09, N = 3; MIN: 58.71 / MAX: 61.76)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20201218 - Target: CPU - Model: resnet18 (ms, Fewer Is Better)
  3003: 14.51 (SE +/- 0.01, N = 3; MIN: 14.39 / MAX: 15.06)
  3202: 14.57 (SE +/- 0.01, N = 3; MIN: 14.45 / MAX: 23.09)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20201218 - Target: CPU - Model: alexnet (ms, Fewer Is Better)
  3003: 11.04 (SE +/- 0.01, N = 3; MIN: 10.95 / MAX: 11.89)
  3202: 11.26 (SE +/- 0.24, N = 3; MIN: 10.91 / MAX: 19.04)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20201218 - Target: CPU - Model: resnet50 (ms, Fewer Is Better)
  3003: 25.05 (SE +/- 0.13, N = 3; MIN: 24.65 / MAX: 26.23)
  3202: 25.00 (SE +/- 0.17, N = 3; MIN: 24.58 / MAX: 26.31)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20201218 - Target: CPU - Model: yolov4-tiny (ms, Fewer Is Better)
  3003: 21.14 (SE +/- 0.11, N = 3; MIN: 20.72 / MAX: 29.47)
  3202: 21.29 (SE +/- 0.08, N = 3; MIN: 20.98 / MAX: 21.78)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20201218 - Target: CPU - Model: squeezenet_ssd (ms, Fewer Is Better)
  3003: 14.64 (SE +/- 0.03, N = 3; MIN: 14.31 / MAX: 15.31)
  3202: 14.60 (SE +/- 0.04, N = 3; MIN: 14.21 / MAX: 15.07)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20201218 - Target: CPU - Model: regnety_400m (ms, Fewer Is Better)
  3003: 17.67 (SE +/- 0.04, N = 3; MIN: 17.47 / MAX: 18.02)
  3202: 17.95 (SE +/- 0.13, N = 3; MIN: 17.66 / MAX: 19.3)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
GROMACS 2020.3 - Water Benchmark (Ns Per Day, More Is Better)
  3003: 1.275 (SE +/- 0.001, N = 3)
  3202: 1.258 (SE +/- 0.001, N = 3)
  1. (CXX) g++ options: -O3 -pthread -lrt -lpthread -lm
LAMMPS Molecular Dynamics Simulator 29Oct2020 - Model: 20k Atoms (ns/day, More Is Better)
  3003: 13.54 (SE +/- 0.08, N = 3)
  3202: 13.39 (SE +/- 0.03, N = 3)
  1. (CXX) g++ options: -O3 -pthread -lm
LAMMPS Molecular Dynamics Simulator 29Oct2020 - Model: Rhodopsin Protein (ns/day, More Is Better)
  3003: 13.10 (SE +/- 0.15, N = 15)
  3202: 13.11 (SE +/- 0.13, N = 15)
  1. (CXX) g++ options: -O3 -pthread -lm
NAMD 2.14 - ATPase Simulation - 327,506 Atoms (days/ns, Fewer Is Better)
  3003: 1.07519 (SE +/- 0.00258, N = 3)
  3202: 1.08736 (SE +/- 0.00500, N = 3)
oneDNN 2.0 - Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU (ms, Fewer Is Better) - 3003: 3.98022 (SE +/- 0.01644, N = 3, MIN: 3.72); 3202: 3.97992 (SE +/- 0.00985, N = 3, MIN: 3.76). (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0 - Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU (ms, Fewer Is Better) - 3003: 9.48452 (SE +/- 0.01431, N = 3, MIN: 9.38); 3202: 9.51868 (SE +/- 0.00745, N = 3, MIN: 9.44). (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0 - Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better) - 3003: 0.816147 (SE +/- 0.001756, N = 3, MIN: 0.74); 3202: 0.832251 (SE +/- 0.001536, N = 3, MIN: 0.75). (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0 - Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better) - 3003: 0.477138 (SE +/- 0.002170, N = 3, MIN: 0.44); 3202: 0.484368 (SE +/- 0.003401, N = 15, MIN: 0.43). (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0 - Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU (ms, Fewer Is Better) - 3003: 17.23 (SE +/- 0.01, N = 3, MIN: 16.81); 3202: 17.29 (SE +/- 0.02, N = 3, MIN: 16.66). (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0 - Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU (ms, Fewer Is Better) - 3003: 2.38690 (SE +/- 0.00251, N = 3, MIN: 2.28); 3202: 2.43213 (SE +/- 0.00987, N = 3, MIN: 2.3). (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0 - Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU (ms, Fewer Is Better) - 3003: 3.50445 (SE +/- 0.00613, N = 3, MIN: 3.38); 3202: 3.53182 (SE +/- 0.00421, N = 3, MIN: 3.41). (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0 - Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better) - 3003: 19.13 (SE +/- 0.02, N = 3, MIN: 18.77); 3202: 19.26 (SE +/- 0.04, N = 3, MIN: 18.8). (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0 - Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better) - 3003: 2.95002 (SE +/- 0.00671, N = 3, MIN: 2.81); 3202: 2.99402 (SE +/- 0.00911, N = 3, MIN: 2.83). (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0 - Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better) - 3003: 1.53431 (SE +/- 0.00153, N = 3, MIN: 1.41); 3202: 1.59818 (SE +/- 0.00175, N = 3, MIN: 1.48). (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0 - Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU (ms, Fewer Is Better) - 3003: 2753.77 (SE +/- 13.63, N = 3, MIN: 2727.8); 3202: 2749.23 (SE +/- 2.89, N = 3, MIN: 2735.56). (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0 - Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU (ms, Fewer Is Better) - 3003: 1809.76 (SE +/- 15.33, N = 3, MIN: 1773.21); 3202: 1823.19 (SE +/- 9.60, N = 3, MIN: 1798.09). (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0 - Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better) - 3003: 2752.88 (SE +/- 11.23, N = 3, MIN: 2722.06); 3202: 2763.90 (SE +/- 9.00, N = 3, MIN: 2736.15). (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0 - Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better) - 3003: 1783.66 (SE +/- 8.25, N = 3, MIN: 1765.38); 3202: 1789.23 (SE +/- 17.01, N = 3, MIN: 1761.64). (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0 - Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU (ms, Fewer Is Better) - 3003: 0.625862 (SE +/- 0.000749, N = 3, MIN: 0.6); 3202: 0.636735 (SE +/- 0.000300, N = 3, MIN: 0.6). (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0 - Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better) - 3003: 2769.09 (SE +/- 9.46, N = 3, MIN: 2739.1); 3202: 2761.96 (SE +/- 4.41, N = 3, MIN: 2740.57). (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0 - Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better) - 3003: 1818.11 (SE +/- 20.56, N = 3, MIN: 1764.89); 3202: 1795.37 (SE +/- 19.66, N = 4, MIN: 1755.37). (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0 - Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better) - 3003: 1.45970 (SE +/- 0.00059, N = 3, MIN: 1.39); 3202: 1.48786 (SE +/- 0.00214, N = 3, MIN: 1.39). (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
Algebraic Multi-Grid Benchmark 1.2 (Figure Of Merit, More Is Better) - 3003: 207261167 (SE +/- 2159740.57, N = 3); 3202: 210165200 (SE +/- 569155.55, N = 3). (CC) gcc options: -lparcsr_ls -lparcsr_mv -lseq_mv -lIJ_mv -lkrylov -lHYPRE_utilities -lm -fopenmp -pthread -lmpi
Kripke 1.2.4 (Throughput FoM, More Is Better) - 3003: 72385553 (SE +/- 1023469.37, N = 3); 3202: 72580800 (SE +/- 319563.42, N = 3). (CXX) g++ options: -O3 -fopenmp
LULESH 2.0.3 (z/s, More Is Better) - 3003: 5041.86 (SE +/- 59.80, N = 3); 3202: 5066.74 (SE +/- 66.75, N = 3). (CXX) g++ options: -O3 -fopenmp -lm -pthread -lmpi_cxx -lmpi
OpenFOAM 8 - Input: Motorbike 30M (Seconds, Fewer Is Better) - 3003: 104.48 (SE +/- 0.09, N = 3); 3202: 97.87 (SE +/- 0.15, N = 3). (CXX) g++ options: -std=c++11 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -lfoamToVTK -ldynamicMesh -llagrangian -lgenericPatchFields -lfileFormats -lOpenFOAM -ldl -lm
OpenFOAM 8 - Input: Motorbike 60M (Seconds, Fewer Is Better) - 3003: 1391.39 (SE +/- 0.57, N = 3); 3202: 1380.20 (SE +/- 0.41, N = 3). (CXX) g++ options: -std=c++11 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -lfoamToVTK -ldynamicMesh -llagrangian -lgenericPatchFields -lfileFormats -lOpenFOAM -ldl -lm
QMCPACK 3.10 - Input: simple-H2O (Total Execution Time - Seconds, Fewer Is Better) - 3003: 22.31 (SE +/- 0.21, N = 7); 3202: 22.37 (SE +/- 0.14, N = 3). (CXX) g++ options: -fopenmp -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -march=native -O3 -fomit-frame-pointer -ffast-math -pthread -lm
Quantum ESPRESSO 6.7 - Input: AUSURF112 (Seconds, Fewer Is Better) - 3003: 1199.39 (SE +/- 0.92, N = 3); 3202: 1221.19 (SE +/- 3.58, N = 3). (F9X) gfortran options: -lopenblas -lFoX_dom -lFoX_sax -lFoX_wxml -lFoX_common -lFoX_utils -lFoX_fsys -lfftw3 -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -ldl -levent -levent_pthreads -lutil -lm -lrt -lz
CP2K Molecular Dynamics 8.1 - Fayalite-FIST Data (Seconds, Fewer Is Better) - 3003: 787.50; 3202: 801.64.
Coremark 1.0 - CoreMark Size 666 - Iterations Per Second (Iterations/Sec, More Is Better) - 3003: 829763.13 (SE +/- 1722.61, N = 3); 3202: 815726.12 (SE +/- 552.83, N = 3). (CC) gcc options: -O2 -lrt" -lrt
Zstd Compression 1.4.5 - Compression Level: 3 (MB/s, More Is Better) - 3003: 4723.9 (SE +/- 8.02, N = 3); 3202: 4738.4 (SE +/- 12.71, N = 3). (CC) gcc options: -O3 -pthread -lz -llzma
Zstd Compression 1.4.5 - Compression Level: 19 (MB/s, More Is Better) - 3003: 43.4 (SE +/- 0.03, N = 3); 3202: 43.3 (SE +/- 0.06, N = 3). (CC) gcc options: -O3 -pthread -lz -llzma
Timed Linux Kernel Compilation 5.4 - Time To Compile (Seconds, Fewer Is Better) - 3003: 45.79 (SE +/- 0.43, N = 3); 3202: 46.41 (SE +/- 0.49, N = 3).
dav1d 0.8.1 - Video Input: Chimera 1080p (FPS, More Is Better) - 3003: 590.32 (SE +/- 0.69, N = 3, MIN: 447.67 / MAX: 749.27); 3202: 592.56 (SE +/- 0.77, N = 3, MIN: 447.8 / MAX: 754.79). (CC) gcc options: -pthread
dav1d 0.8.1 - Video Input: Summer Nature 4K (FPS, More Is Better) - 3003: 224.40 (SE +/- 0.49, N = 3, MIN: 172.58 / MAX: 234.67); 3202: 228.62 (SE +/- 0.36, N = 3, MIN: 172.54 / MAX: 238.82). (CC) gcc options: -pthread
dav1d 0.8.1 - Video Input: Summer Nature 1080p (FPS, More Is Better) - 3003: 534.95 (SE +/- 4.29, N = 3, MIN: 432.57 / MAX: 611.74); 3202: 535.52 (SE +/- 1.59, N = 3, MIN: 453.14 / MAX: 589.94). (CC) gcc options: -pthread
x265 3.4 - Video Input: Bosphorus 4K (Frames Per Second, More Is Better) - 3003: 24.57 (SE +/- 0.35, N = 3); 3202: 24.18 (SE +/- 0.26, N = 4). (CXX) g++ options: -O3 -rdynamic -lpthread -lrt -ldl -lnuma
x265 3.4 - Video Input: Bosphorus 1080p (Frames Per Second, More Is Better) - 3003: 47.87 (SE +/- 0.08, N = 3); 3202: 47.62 (SE +/- 0.22, N = 3). (CXX) g++ options: -O3 -rdynamic -lpthread -lrt -ldl -lnuma
rav1e 0.4 - Speed: 6 (Frames Per Second, More Is Better) - 3003: 1.958 (SE +/- 0.005, N = 3); 3202: 1.965 (SE +/- 0.016, N = 3).
rav1e 0.4 - Speed: 10 (Frames Per Second, More Is Better) - 3003: 3.362 (SE +/- 0.037, N = 3); 3202: 3.446 (SE +/- 0.037, N = 15).
Timed Godot Game Engine Compilation 3.2.3 - Time To Compile (Seconds, Fewer Is Better) - 3003: 79.20 (SE +/- 0.15, N = 3); 3202: 80.07 (SE +/- 0.31, N = 3).
IndigoBench 4.4 - Acceleration: CPU - Scene: Bedroom (M samples/s, More Is Better) - 3003: 4.175 (SE +/- 0.006, N = 3); 3202: 4.132 (SE +/- 0.017, N = 3).
IndigoBench 4.4 - Acceleration: CPU - Scene: Supercar (M samples/s, More Is Better) - 3003: 8.803 (SE +/- 0.008, N = 3); 3202: 8.669 (SE +/- 0.011, N = 3).
Build2 0.13 - Time To Compile (Seconds, Fewer Is Better) - 3003: 80.00 (SE +/- 0.10, N = 3); 3202: 81.88 (SE +/- 0.12, N = 3).
Timed Eigen Compilation 3.3.9 - Time To Compile (Seconds, Fewer Is Better) - 3003: 60.05 (SE +/- 0.21, N = 3); 3202: 61.25 (SE +/- 0.01, N = 3).
PHPBench 0.8.1 - PHP Benchmark Suite (Score, More Is Better) - 3003: 831939 (SE +/- 8175.01, N = 3); 3202: 834019 (SE +/- 7522.35, N = 3).
SQLite Speedtest 3.30 - Timed Time - Size 1,000 (Seconds, Fewer Is Better) - 3003: 41.63 (SE +/- 0.34, N = 3); 3202: 42.86 (SE +/- 0.24, N = 15). (CC) gcc options: -O2 -ldl -lz -lpthread
IOR 3.3.0 - Block Size: 2MB - Disk Target: Default Test Directory (MB/s, More Is Better) - 3003: 1540.02 (SE +/- 2.40, N = 3, MIN: 1034.78 / MAX: 2149.91); 3202: 1388.54 (SE +/- 14.36, N = 3, MIN: 890.97 / MAX: 2113.75). (CC) gcc options: -O2 -lm -pthread -lmpi
dav1d 0.8.1 - Video Input: Chimera 1080p 10-bit (FPS, More Is Better) - 3003: 96.35 (SE +/- 0.06, N = 3, MIN: 61.49 / MAX: 217.11); 3202: 96.66 (SE +/- 0.05, N = 3, MIN: 61.56 / MAX: 221.22). (CC) gcc options: -pthread
IOR 3.3.0 - Block Size: 8MB - Disk Target: Default Test Directory (MB/s, More Is Better) - 3003: 1601.21 (SE +/- 6.28, N = 3, MIN: 1005.82 / MAX: 2534.37); 3202: 1461.18 (SE +/- 28.69, N = 13, MIN: 491.21 / MAX: 2711.8). (CC) gcc options: -O2 -lm -pthread -lmpi
rav1e 0.4 - Speed: 5 (Frames Per Second, More Is Better) - 3003: 1.497 (SE +/- 0.005, N = 3); 3202: 1.505 (SE +/- 0.007, N = 3).
IOR 3.3.0 - Block Size: 512MB - Disk Target: Default Test Directory (MB/s, More Is Better) - 3003: 1748.21 (SE +/- 11.67, N = 3, MIN: 534.9 / MAX: 2360.08); 3202: 1682.82 (SE +/- 22.61, N = 9, MIN: 251.69 / MAX: 2253.72). (CC) gcc options: -O2 -lm -pthread -lmpi
ET: Legacy 2.75 - Renderer: Renderer2 - Resolution: 3840 x 2160 (Frames Per Second, More Is Better) - 3003: 224.3.
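For readers who want to compare the 3003 and 3202 BIOS figures numerically, the following is a minimal Python sketch, not part of the Phoronix Test Suite. The helper names `percent_change` and `exceeds_noise`, and the threshold of two combined standard errors, are assumptions chosen for illustration. The three sample rows are copied from the results above.

```python
import math

def percent_change(old, new):
    """Relative change from the 3003 result to the 3202 result, in percent."""
    return (new - old) / old * 100.0

def exceeds_noise(old, se_old, new, se_new, k=2.0):
    """Rough check: is the difference larger than k combined standard errors?

    The threshold k=2.0 is an illustrative assumption, not a PTS rule.
    """
    combined_se = math.sqrt(se_old ** 2 + se_new ** 2)
    return abs(new - old) > k * combined_se

# (name, 3003 value, 3003 SE, 3202 value, 3202 SE), taken from the results above
samples = [
    ("OpenFOAM 8 Motorbike 30M (Seconds)", 104.48, 0.09, 97.87, 0.15),
    ("IOR 3.3.0 Block Size 2MB (MB/s)", 1540.02, 2.40, 1388.54, 14.36),
    ("LAMMPS Rhodopsin Protein (ns/day)", 13.10, 0.15, 13.11, 0.13),
]

for name, v_old, se_old, v_new, se_new in samples:
    delta = percent_change(v_old, v_new)
    flag = exceeds_noise(v_old, se_old, v_new, se_new)
    print(f"{name}: {delta:+.1f}% (beyond noise: {flag})")
```

Whether a positive or negative change is an improvement depends on the "More Is Better" or "Fewer Is Better" label of each test, so the sign needs to be read against that label.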
Phoronix Test Suite v10.8.5