Intel Xeon E5-2609 v4 testing with an MSI X99A RAIDER (MS-7885) v5.0 (P.50 BIOS) and llvmpipe on Ubuntu 20.04 via the Phoronix Test Suite.
A
Kernel Notes: Transparent Huge Pages: madvise
Compiler Notes: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none,hsa --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Notes: Scaling Governor: intel_cpufreq ondemand - CPU Microcode: 0xb000038
Python Notes: Python 2.7.18rc1 + Python 3.8.2
Security Notes: itlb_multihit: KVM: Mitigation of VMX disabled + l1tf: Mitigation of PTE Inversion; VMX: conditional cache flushes SMT disabled + mds: Mitigation of Clear buffers; SMT disabled + meltdown: Mitigation of PTI + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Full generic retpoline IBPB: conditional IBRS_FW STIBP: disabled RSB filling + srbds: Not affected + tsx_async_abort: Mitigation of Clear buffers; SMT disabled
B Processor: Intel Xeon E5-2609 v4 @ 1.70GHz (8 Cores), Motherboard: MSI X99A RAIDER (MS-7885) v5.0 (P.50 BIOS), Chipset: Intel Xeon E7 v4/Xeon, Memory: 16GB, Disk: 256GB CORSAIR FORCE LX, Graphics: llvmpipe, Audio: Realtek ALC892, Network: Intel I218-V
OS: Ubuntu 20.04, Kernel: 5.9.0-050900rc6daily20200926-generic (x86_64) 20200925, Desktop: GNOME Shell 3.36.2, Display Server: X Server 1.20.8, OpenGL: 3.3 Mesa 20.0.4 (LLVM 9.0.1 256 bits), Compiler: GCC 9.3.0, File-System: ext4, Screen Resolution: 1024x768
A vs. B Comparison (Phoronix Test Suite), relative difference between A and B
Test | Result | Difference
oneDNN | R.N.N.I - f32 - CPU | 5.6%
GraphicsMagick | HWB Color Space | 3.5%
GraphicsMagick | Rotate | 3%
AOM AV1 | Speed 6 Realtime - Bosphorus 4K | 2.2%
ClickHouse | 1.R.W.A.D.F.R.C.C | 2%
NCNN | CPU - resnet50 | 2%
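The percentages in the A vs. B comparison can be reproduced, at least approximately, from the raw results. The following is a minimal Python sketch, assuming each percentage is the larger result divided by the smaller one, minus one; the exact formula the Phoronix Test Suite uses is not stated here. The `percent_spread` helper is hypothetical, the input values are the A and B figures from the results overview, and the pairing of the abbreviated chart labels with full test names is an assumption based on matching values.

```python
def percent_spread(a: float, b: float) -> float:
    """Return how far the larger of two results is above the smaller one, in percent."""
    hi, lo = max(a, b), min(a, b)
    return (hi / lo - 1.0) * 100.0


# (A, B) pairs copied from the results overview; the test names are
# assumed expansions of the abbreviated labels in the comparison chart.
results = {
    "oneDNN: Recurrent Neural Network Inference - f32 - CPU": (4341.02, 4112.59),
    "GraphicsMagick: HWB Color Space": (371, 384),
    "GraphicsMagick: Rotate": (335, 345),
    "AOM AV1: Speed 6 Realtime - Bosphorus 4K": (9.6, 9.81),
    "ClickHouse: First Run / Cold Cache": (44.69, 43.80),
    "NCNN: CPU - resnet50": (27.45, 27.99),
}

for name, (a, b) in results.items():
    print(f"{name}: {percent_spread(a, b):.1f}%")
```

The helper deliberately ignores whether a test is "More Is Better" or "Fewer Is Better"; it only measures the size of the gap, not which run wins.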
xeon okt
Test | A | B
compress-7zip: Compression Rating | 22928 | 22984
compress-7zip: Decompression Rating | 15114 | 15213
aom-av1: Speed 0 Two-Pass - Bosphorus 4K | 0.05 | 0.05
aom-av1: Speed 4 Two-Pass - Bosphorus 4K | 1.84 | 1.85
aom-av1: Speed 6 Realtime - Bosphorus 4K | 9.6 | 9.81
aom-av1: Speed 6 Two-Pass - Bosphorus 4K | 3.38 | 3.36
aom-av1: Speed 8 Realtime - Bosphorus 4K | 16.73 | 16.63
aom-av1: Speed 9 Realtime - Bosphorus 4K | 23.51 | 23.41
aom-av1: Speed 10 Realtime - Bosphorus 4K | 24.34 | 24.23
aom-av1: Speed 0 Two-Pass - Bosphorus 1080p | 0.15 | 0.15
aom-av1: Speed 4 Two-Pass - Bosphorus 1080p | 4.4 | 4.41
aom-av1: Speed 6 Realtime - Bosphorus 1080p | 21.44 | 21.19
aom-av1: Speed 6 Two-Pass - Bosphorus 1080p | 10.82 | 10.73
aom-av1: Speed 8 Realtime - Bosphorus 1080p | 47.75 | 48.1
aom-av1: Speed 9 Realtime - Bosphorus 1080p | 62.36 | 62.74
aom-av1: Speed 10 Realtime - Bosphorus 1080p | 65.56 | 66.17
astcenc: Fast | 40.4501 | 40.4707
astcenc: Medium | 14.8031 | 14.8165
astcenc: Thorough | 1.8866 | 1.8842
astcenc: Exhaustive | 0.1856 | 0.1857
blender: BMW27 - CPU-Only | 479.3 | 479.79
brl-cad: VGR Performance Metric | 40981 | 41465
chia-vdf: Square Plain C++ | 54200 | 54100
chia-vdf: Square Assembly Optimized | 77800 | 78500
clickhouse: 100M Rows Web Analytics Dataset, First Run / Cold Cache | 44.69 | 43.80
clickhouse: 100M Rows Web Analytics Dataset, Second Run | 49.86 | 49.68
clickhouse: 100M Rows Web Analytics Dataset, Third Run | 50.33 | 50.82
encode-flac: WAV To FLAC | 52.616 | 52.848
glibc-bench: cos | 102.653 | 102.631
glibc-bench: exp | 27.7867 | 27.7312
glibc-bench: ffs | 6.02894 | 6.0309
glibc-bench: sin | 88.6781 | 88.6764
glibc-bench: log2 | 30.6745 | 30.664
glibc-bench: modf | 11.0044 | 11.0009
glibc-bench: sinh | 36.5101 | 36.5099
glibc-bench: sqrt | 8.03159 | 8.03153
glibc-bench: tanh | 46.7519 | 46.7552
glibc-bench: asinh | 41.6897 | 41.6225
glibc-bench: atanh | 50.2539 | 50.2277
glibc-bench: ffsll | 6.024 | 6.03178
glibc-bench: sincos | 54.1233 | 54.1258
glibc-bench: pthread_once | 7.00595 | 7.00562
graphics-magick: Swirl | 104 | 104
graphics-magick: Rotate | 335 | 345
graphics-magick: Sharpen | 43 | 43
graphics-magick: Enhanced | 71 | 71
graphics-magick: Resizing | 312 | 315
graphics-magick: Noise-Gaussian | 78 | 77
graphics-magick: HWB Color Space | 371 | 384
gromacs: MPI CPU - water_GMX50_bare | 0.456 | 0.456
jpegxl-decode: 1 | 15.28 | 15.45
jpegxl-decode: All | 93.25 | 94.94
jpegxl: PNG - 80 | 3.45 | 3.45
jpegxl: PNG - 90 | 3.35 | 3.38
jpegxl: JPEG - 80 | 3.33 | 3.33
jpegxl: JPEG - 90 | 3.24 | 3.26
jpegxl: PNG - 100 | 0.27 | 0.27
jpegxl: JPEG - 100 | 0.27 | 0.27
kvazaar: Bosphorus 4K - Very Fast | 5.12 | 5.17
kvazaar: Bosphorus 4K - Ultra Fast | 8.97 | 8.99
kvazaar: Bosphorus 1080p - Very Fast | 22.9 | 22.94
kvazaar: Bosphorus 1080p - Ultra Fast | 38.73 | 39.08
lammps: Rhodopsin Protein | 2.699 | 2.713
avifenc: 0 | 579.164 | 588.842
avifenc: 2 | 270.475 | 272.753
avifenc: 6 | 26.985 | 27.159
avifenc: 6, Lossless | 40.028 | 39.837
avifenc: 10, Lossless | 18.16 | 18.077
mnn: nasnet | 21.52 | 21.554
mnn: mobilenetV3 | 2.799 | 2.811
mnn: squeezenetv1.1 | 5.328 | 5.337
mnn: resnet-v2-50 | 40.677 | 40.488
mnn: SqueezeNetV1.0 | 7.646 | 7.64
mnn: MobileNetV2_224 | 5.3 | 5.287
mnn: mobilenet-v1-1.0 | 5.773 | 5.771
mnn: inception-v3 | 54.1 | 53.985
natron: Spaceship | 0.9 | 0.9
ncnn: CPU - mobilenet | 23.87 | 23.98
ncnn: CPU-v2-v2 - mobilenet-v2 | 7.04 | 7.06
ncnn: CPU-v3-v3 - mobilenet-v3 | 6.23 | 6.24
ncnn: CPU - shufflenet-v2 | 6.81 | 6.77
ncnn: CPU - mnasnet | 6.68 | 6.67
ncnn: CPU - efficientnet-b0 | 13.53 | 13.54
ncnn: CPU - blazeface | 2.07 | 2.07
ncnn: CPU - googlenet | 17.16 | 17.16
ncnn: CPU - vgg16 | 55.89 | 56.05
ncnn: CPU - resnet18 | 11.76 | 11.68
ncnn: CPU - alexnet | 9.69 | 9.67
ncnn: CPU - resnet50 | 27.45 | 27.99
ncnn: CPU - yolov4-tiny | 35.17 | 35.61
ncnn: CPU - squeezenet_ssd | 21.59 | 21.67
ncnn: CPU - regnety_400m | 20.48 | 20.59
ncnn: CPU - vision_transformer | 761.24 | 760.24
ncnn: CPU - FastestDet | 8.41 | 8.44
deepsparse: NLP Document Classification, oBERT base uncased on IMDB - Asynchronous Multi-Stream | 2.6413 | 2.6054
deepsparse: NLP Document Classification, oBERT base uncased on IMDB - Asynchronous Multi-Stream | 1497.9355 | 1503.5043
deepsparse: NLP Document Classification, oBERT base uncased on IMDB - Synchronous Single-Stream | 2.6027 | 2.6078
deepsparse: NLP Document Classification, oBERT base uncased on IMDB - Synchronous Single-Stream | 384.2074 | 383.4488
deepsparse: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Asynchronous Multi-Stream | 10.8624 | 10.8259
deepsparse: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Asynchronous Multi-Stream | 367.5583 | 368.5715
deepsparse: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Synchronous Single-Stream | 9.928 | 9.9442
deepsparse: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Synchronous Single-Stream | 100.7098 | 100.5442
deepsparse: CV Detection,YOLOv5s COCO - Asynchronous Multi-Stream | 16.8314 | 16.8184
deepsparse: CV Detection,YOLOv5s COCO - Asynchronous Multi-Stream | 237.4866 | 237.7222
deepsparse: CV Detection,YOLOv5s COCO - Synchronous Single-Stream | 16.3292 | 16.3111
deepsparse: CV Detection,YOLOv5s COCO - Synchronous Single-Stream | 61.2223 | 61.2903
deepsparse: CV Classification, ResNet-50 ImageNet - Asynchronous Multi-Stream | 33.7991 | 33.7523
deepsparse: CV Classification, ResNet-50 ImageNet - Asynchronous Multi-Stream | 118.0026 | 118.2881
deepsparse: CV Classification, ResNet-50 ImageNet - Synchronous Single-Stream | 30.5363 | 30.5676
deepsparse: CV Classification, ResNet-50 ImageNet - Synchronous Single-Stream | 32.7341 | 32.7022
deepsparse: NLP Text Classification, DistilBERT mnli - Asynchronous Multi-Stream | 23.1105 | 23.206
deepsparse: NLP Text Classification, DistilBERT mnli - Asynchronous Multi-Stream | 172.6931 | 172.2767
deepsparse: NLP Text Classification, DistilBERT mnli - Synchronous Single-Stream | 19.9431 | 20.0013
deepsparse: NLP Text Classification, DistilBERT mnli - Synchronous Single-Stream | 50.1327 | 49.9871
deepsparse: NLP Text Classification, BERT base uncased SST2 - Asynchronous Multi-Stream | 11.671 | 11.658
deepsparse: NLP Text Classification, BERT base uncased SST2 - Asynchronous Multi-Stream | 342.5351 | 341.936
deepsparse: NLP Text Classification, BERT base uncased SST2 - Synchronous Single-Stream | 10.0327 | 10.0072
deepsparse: NLP Text Classification, BERT base uncased SST2 - Synchronous Single-Stream | 99.6646 | 99.9187
deepsparse: NLP Token Classification, BERT base uncased conll2003 - Asynchronous Multi-Stream | 2.6288 | 2.6644
deepsparse: NLP Token Classification, BERT base uncased conll2003 - Asynchronous Multi-Stream | 1512.7836 | 1501.1013
deepsparse: NLP Token Classification, BERT base uncased conll2003 - Synchronous Single-Stream | 2.5933 | 2.5917
deepsparse: NLP Token Classification, BERT base uncased conll2003 - Synchronous Single-Stream | 385.6012 | 385.8361
node-web-tooling: | 5.04 | 5
onednn: IP Shapes 1D - f32 - CPU | 8.12857 | 8.17729
onednn: IP Shapes 3D - f32 - CPU | 8.17122 | 8.05773
onednn: IP Shapes 1D - u8s8f32 - CPU | 5.66887 | 5.66947
onednn: IP Shapes 3D - u8s8f32 - CPU | 3.8418 | 3.85965
onednn: Convolution Batch Shapes Auto - f32 - CPU | 14.1653 | 14.1456
onednn: Deconvolution Batch shapes_1d - f32 - CPU | 15.9854 | 15.9733
onednn: Deconvolution Batch shapes_3d - f32 - CPU | 14.6549 | 14.6726
onednn: Convolution Batch Shapes Auto - u8s8f32 - CPU | 17.1172 | 17.1056
onednn: Deconvolution Batch shapes_1d - u8s8f32 - CPU | 8.39234 | 8.34261
onednn: Deconvolution Batch shapes_3d - u8s8f32 - CPU | 11.4678 | 11.4522
onednn: Recurrent Neural Network Training - f32 - CPU | 7868.71 | 7890.48
onednn: Recurrent Neural Network Inference - f32 - CPU | 4341.02 | 4112.59
onednn: Recurrent Neural Network Training - u8s8f32 - CPU | 7874.27 | 7866.79
onednn: Recurrent Neural Network Inference - u8s8f32 - CPU | 4077.01 | 4098.73
onednn: Matrix Multiply Batch Shapes Transformer - f32 - CPU | 4.34039 | 4.26921
onednn: Recurrent Neural Network Training - bf16bf16bf16 - CPU | 7894.49 | 7935.85
onednn: Recurrent Neural Network Inference - bf16bf16bf16 - CPU | 4083.82 | 4081.6
onednn: Matrix Multiply Batch Shapes Transformer - u8s8f32 - CPU | 3.50581 | 3.5256
compress-pbzip2: FreeBSD-13.0-RELEASE-amd64-memstick.img Compression | 32.309 | 32.234
build-python: Default | 55.468 | 55.341
build-python: Released Build, PGO + LTO Optimized | 878.598 | 877.31
build-erlang: Time To Compile | 293.818 | 293.507
build-gem5: Time To Compile | 1396.872 | 1397.053
build-linux-kernel: defconfig | 311.827 | 311.547
build-linux-kernel: allmodconfig | 3885.787 | 3881.579
build-mplayer: Time To Compile | 119.904 | 119.696
build-nodejs: Time To Compile | 2040.12 | 2039.395
build-php: Time To Compile | 202.301 | 202.529
AOM AV1 3.5 (Frames Per Second, More Is Better)
Encoder Mode: Speed 4 Two-Pass - Input: Bosphorus 4K | A: 1.84 | B: 1.85
Encoder Mode: Speed 6 Realtime - Input: Bosphorus 4K | A: 9.60 | B: 9.81
Encoder Mode: Speed 6 Two-Pass - Input: Bosphorus 4K | A: 3.38 | B: 3.36
Encoder Mode: Speed 8 Realtime - Input: Bosphorus 4K | A: 16.73 | B: 16.63
Encoder Mode: Speed 9 Realtime - Input: Bosphorus 4K | A: 23.51 | B: 23.41
Encoder Mode: Speed 10 Realtime - Input: Bosphorus 4K | A: 24.34 | B: 24.23
Encoder Mode: Speed 0 Two-Pass - Input: Bosphorus 1080p | A: 0.15 | B: 0.15
Encoder Mode: Speed 4 Two-Pass - Input: Bosphorus 1080p | A: 4.40 | B: 4.41
Encoder Mode: Speed 6 Realtime - Input: Bosphorus 1080p | A: 21.44 | B: 21.19
Encoder Mode: Speed 6 Two-Pass - Input: Bosphorus 1080p | A: 10.82 | B: 10.73
Encoder Mode: Speed 8 Realtime - Input: Bosphorus 1080p | A: 47.75 | B: 48.10
Encoder Mode: Speed 9 Realtime - Input: Bosphorus 1080p | A: 62.36 | B: 62.74
Encoder Mode: Speed 10 Realtime - Input: Bosphorus 1080p | A: 65.56 | B: 66.17
(CXX) g++ options: -O3 -std=c++11 -U_FORTIFY_SOURCE -lm -lpthread
BRL-CAD
BRL-CAD is a cross-platform, open-source solid modeling system with a built-in benchmark mode. Learn more via the OpenBenchmarking.org test page.
BRL-CAD 7.32.6 (VGR Performance Metric, More Is Better)
VGR Performance Metric | A: 40981 | B: 41465
(CXX) g++ options: -std=c++11 -pipe -fvisibility=hidden -fno-strict-aliasing -fno-common -fexceptions -ftemplate-depth-128 -m64 -ggdb3 -O3 -fipa-pta -fstrength-reduce -finline-functions -flto -pedantic -pthread -ldl -lm
Chia Blockchain VDF
Chia is a blockchain and smart transaction platform based on proofs of space and time rather than the proofs of work used by other cryptocurrencies. This test profile benchmarks CPU performance using the Chia VDF benchmark, where VDF is the Chia Verifiable Delay Function (Proof of Time). Learn more via the OpenBenchmarking.org test page.
Chia Blockchain VDF 1.0.7 (IPS, More Is Better)
Test: Square Plain C++ | A: 54200 | B: 54100
(CXX) g++ options: -flto -no-pie -lgmpxx -lgmp -lboost_system -pthread
ClickHouse
ClickHouse is an open-source, high-performance OLAP data management system. This test profile uses ClickHouse's standard benchmark recommendations per https://clickhouse.com/docs/en/operations/performance-test/ with the 100 million rows web analytics dataset. The reported value is the geometric mean across all queries performed, charted here in queries per minute. Learn more via the OpenBenchmarking.org test page.
ClickHouse 22.5.4.19 (Queries Per Minute, Geo Mean, More Is Better)
100M Rows Web Analytics Dataset, First Run / Cold Cache | A: 44.69 (MIN: 4.76 / MAX: 6666.67) | B: 43.80 (MIN: 4.74 / MAX: 4615.38)
100M Rows Web Analytics Dataset, Second Run | A: 49.86 (MIN: 5.13 / MAX: 5000) | B: 49.68 (MIN: 4.93 / MAX: 7500)
100M Rows Web Analytics Dataset, Third Run | A: 50.33 (MIN: 4.81 / MAX: 7500) | B: 50.82 (MIN: 4.97 / MAX: 7500)
ClickHouse server version 22.5.4.19 (official build).
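The ClickHouse results are reported as a geometric mean over all queries. The short Python sketch below illustrates why that choice matters when per-query rates span several orders of magnitude, as the MIN / MAX figures do. The `geometric_mean` helper and the middle value in `rates` are hypothetical; only the 4.76 and 6666.67 endpoints come from run A's first-run result, and the real per-query values are not listed in this report.

```python
import math


def geometric_mean(values):
    """Geometric mean of strictly positive numbers, computed via logarithms."""
    if not values:
        raise ValueError("need at least one value")
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean requires positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# 4.76 and 6666.67 are run A's reported MIN and MAX for the first run;
# 60.0 is a hypothetical in-between query rate added for illustration.
rates = [4.76, 60.0, 6666.67]

arithmetic = sum(rates) / len(rates)
geometric = geometric_mean(rates)

print(f"arithmetic mean: {arithmetic:.2f} queries/min")
print(f"geometric mean:  {geometric:.2f} queries/min")
```

With inputs like these, a single very fast query dominates the arithmetic mean, while the geometric mean stays closer to the typical query rate.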
Glibc Benchmarks
The GNU C Library project provides the core libraries for the GNU system and GNU/Linux systems, as well as many other systems that use Linux as the kernel. These libraries provide critical APIs including ISO C11, POSIX.1-2008, BSD, OS-specific APIs and more. This test profile makes use of Glibc's "benchtests" integrated benchmark suite. Learn more via the OpenBenchmarking.org test page.
Glibc Benchmarks (ns, Fewer Is Better)
Benchmark: cos | A: 102.65 | B: 102.63
(CC) gcc options: -pie -nostdlib -nostartfiles -lgcc -lgcc_s
Graph500
This is a benchmark of the reference implementation of Graph500, an HPC benchmark focused on data-intensive loads and commonly tested on supercomputers for complex data problems. Graph500 primarily stresses the communication subsystem of the hardware under test. Learn more via the OpenBenchmarking.org test page.
Scale: 26
A: The test quit with a non-zero exit status. E: mpirun noticed that process rank 4 with PID 0 on node phoronix-MS-7885 exited on signal 9 (Killed).
B: The test quit with a non-zero exit status. E: mpirun noticed that process rank 0 with PID 0 on node phoronix-MS-7885 exited on signal 9 (Killed).
Scale: 29
A: The test quit with a non-zero exit status. E: mpirun noticed that process rank 1 with PID 0 on node phoronix-MS-7885 exited on signal 9 (Killed).
B: The test quit with a non-zero exit status. E: mpirun noticed that process rank 6 with PID 0 on node phoronix-MS-7885 exited on signal 9 (Killed).
GraphicsMagick 1.3.38 (Iterations Per Minute, More Is Better)
Operation: Rotate | A: 335 | B: 345
Operation: Sharpen | A: 43 | B: 43
Operation: Enhanced | A: 71 | B: 71
Operation: Resizing | A: 312 | B: 315
Operation: Noise-Gaussian | A: 78 | B: 77
Operation: HWB Color Space | A: 371 | B: 384
(CC) gcc options: -fopenmp -O2 -pthread -ljbig -ltiff -lfreetype -ljpeg -lXext -lSM -lICE -lX11 -llzma -lbz2 -lxml2 -lz -lm -lpthread
GROMACS
This test runs the GROMACS (GROningen MAchine for Chemical Simulations) molecular dynamics package with the water_GMX50 data. This test profile allows selecting between CPU and GPU-based GROMACS builds. Learn more via the OpenBenchmarking.org test page.
GROMACS 2022.1 (Ns Per Day, More Is Better)
Implementation: MPI CPU - Input: water_GMX50_bare | A: 0.456 | B: 0.456
(CXX) g++ options: -O3 -pthread
JPEG XL Decoding libjxl
The JPEG XL Image Coding System is designed to provide next-generation JPEG image capabilities, with JPEG XL offering better image quality and compression than legacy JPEG. This test profile measures JPEG XL decode performance to a PNG output file; the pts/jpegxl test covers encode performance. The JPEG XL encoding/decoding is done using the libjxl codebase. Learn more via the OpenBenchmarking.org test page.
JPEG XL Decoding libjxl 0.7 (MP/s, More Is Better)
CPU Threads: 1 | A: 15.28 | B: 15.45
JPEG XL libjxl
The JPEG XL Image Coding System is designed to provide next-generation JPEG image capabilities, with JPEG XL offering better image quality and compression than legacy JPEG. This test profile is currently focused on multi-threaded JPEG XL image encode performance using the reference libjxl library. Learn more via the OpenBenchmarking.org test page.
JPEG XL libjxl 0.7 (MP/s, More Is Better)
Input: PNG - Quality: 80 | A: 3.45 | B: 3.45
Input: PNG - Quality: 90 | A: 3.35 | B: 3.38
Input: JPEG - Quality: 80 | A: 3.33 | B: 3.33
Input: JPEG - Quality: 90 | A: 3.24 | B: 3.26
Input: PNG - Quality: 100 | A: 0.27 | B: 0.27
Input: JPEG - Quality: 100 | A: 0.27 | B: 0.27
(CXX) g++ options: -fno-rtti -funwind-tables -O3 -O2 -fPIE -pie -lm -pthread -latomic
Kvazaar
This is a test of Kvazaar as a CPU-based H.265/HEVC video encoder written in the C programming language and optimized in Assembly. Kvazaar is the winner of the 2016 ACM Open-Source Software Competition and developed at the Ultra Video Group, Tampere University, Finland. Learn more via the OpenBenchmarking.org test page.
Kvazaar 2.1 (Frames Per Second, More Is Better)
Video Input: Bosphorus 4K - Video Preset: Very Fast | A: 5.12 | B: 5.17
Video Input: Bosphorus 4K - Video Preset: Ultra Fast | A: 8.97 | B: 8.99
Video Input: Bosphorus 1080p - Video Preset: Very Fast | A: 22.90 | B: 22.94
Video Input: Bosphorus 1080p - Video Preset: Ultra Fast | A: 38.73 | B: 39.08
(CC) gcc options: -pthread -ftree-vectorize -fvisibility=hidden -O2 -lpthread -lm -lrt
Mobile Neural Network
MNN (Mobile Neural Network) is a highly efficient, lightweight deep learning framework developed by Alibaba. This test profile builds the OpenMP / CPU-threaded version for processor benchmarking and does not run any GPU-accelerated test. MNN can make use of AVX-512 extensions. Learn more via the OpenBenchmarking.org test page.
Mobile Neural Network 2.1 (ms, Fewer Is Better)
Model: nasnet | A: 21.52 (MIN: 21.31 / MAX: 40.96) | B: 21.55 (MIN: 21.33 / MAX: 40.77)
Model: mobilenetV3 | A: 2.799 (MIN: 2.77 / MAX: 3.18) | B: 2.811 (MIN: 2.79 / MAX: 3.18)
Model: squeezenetv1.1 | A: 5.328 (MIN: 5.29 / MAX: 7.62) | B: 5.337 (MIN: 5.31 / MAX: 7.55)
Model: resnet-v2-50 | A: 40.68 (MIN: 40.52 / MAX: 58.97) | B: 40.49 (MIN: 40.38 / MAX: 59.76)
Model: SqueezeNetV1.0 | A: 7.646 (MIN: 7.61 / MAX: 10.41) | B: 7.640 (MIN: 7.6 / MAX: 8.84)
Model: MobileNetV2_224 | A: 5.300 (MIN: 5.27 / MAX: 6.53) | B: 5.287 (MIN: 5.25 / MAX: 10.12)
Model: mobilenet-v1-1.0 | A: 5.773 (MIN: 5.74 / MAX: 7.01) | B: 5.771 (MIN: 5.73 / MAX: 25.01)
Model: inception-v3 | A: 54.10 (MIN: 53.89 / MAX: 101.31) | B: 53.99 (MIN: 53.83 / MAX: 73.18)
(CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -rdynamic -pthread -ldl
NCNN 20220729 (ms, Fewer Is Better)
Target: CPU-v2-v2 - Model: mobilenet-v2 | A: 7.04 (MIN: 6.98 / MAX: 7.12) | B: 7.06 (MIN: 7 / MAX: 7.4)
Target: CPU-v3-v3 - Model: mobilenet-v3 | A: 6.23 (MIN: 6.18 / MAX: 6.3) | B: 6.24 (MIN: 6.19 / MAX: 6.35)
Target: CPU - Model: shufflenet-v2 | A: 6.81 (MIN: 6.7 / MAX: 25.84) | B: 6.77 (MIN: 6.73 / MAX: 6.87)
Target: CPU - Model: mnasnet | A: 6.68 (MIN: 6.59 / MAX: 12.03) | B: 6.67 (MIN: 6.61 / MAX: 7.73)
Target: CPU - Model: efficientnet-b0 | A: 13.53 (MIN: 13.48 / MAX: 13.64) | B: 13.54 (MIN: 13.49 / MAX: 13.62)
Target: CPU - Model: blazeface | A: 2.07 (MIN: 2.04 / MAX: 2.17) | B: 2.07 (MIN: 2.04 / MAX: 2.18)
Target: CPU - Model: googlenet | A: 17.16 (MIN: 17.05 / MAX: 18.6) | B: 17.16 (MIN: 17.04 / MAX: 19)
Target: CPU - Model: vgg16 | A: 55.89 (MIN: 55.71 / MAX: 57.17) | B: 56.05 (MIN: 55.71 / MAX: 75.46)
Target: CPU - Model: resnet18 | A: 11.76 (MIN: 11.68 / MAX: 12.14) | B: 11.68 (MIN: 11.57 / MAX: 13.01)
Target: CPU - Model: alexnet | A: 9.69 (MIN: 9.64 / MAX: 9.83) | B: 9.67 (MIN: 9.62 / MAX: 9.87)
Target: CPU - Model: resnet50 | A: 27.45 (MIN: 27.29 / MAX: 29.43) | B: 27.99 (MIN: 27.87 / MAX: 28.35)
Target: CPU - Model: yolov4-tiny | A: 35.17 (MIN: 34.61 / MAX: 36.08) | B: 35.61 (MIN: 34.51 / MAX: 145.79)
Target: CPU - Model: squeezenet_ssd | A: 21.59 (MIN: 21.41 / MAX: 42.91) | B: 21.67 (MIN: 21.53 / MAX: 23.01)
Target: CPU - Model: regnety_400m | A: 20.48 (MIN: 20.41 / MAX: 20.9) | B: 20.59 (MIN: 20.52 / MAX: 21.93)
Target: CPU - Model: vision_transformer | A: 761.24 (MIN: 755.05 / MAX: 780.87) | B: 760.24 (MIN: 753.97 / MAX: 772.4)
Target: CPU - Model: FastestDet | A: 8.41 (MIN: 8.33 / MAX: 13.21) | B: 8.44 (MIN: 8.39 / MAX: 9.63)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread -pthread
Neural Magic DeepSparse
Neural Magic DeepSparse 1.1 - Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better)
A: 2.6413
B: 2.6054
oneDNN
oneDNN 2.7 - Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
A: 8.12857 (MIN: 8.08)
B: 8.17729 (MIN: 8.1)
oneDNN 2.7 - Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
A: 8.17122 (MIN: 8.15)
B: 8.05773 (MIN: 8.01)
oneDNN 2.7 - Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
A: 5.66887 (MIN: 5.65)
B: 5.66947 (MIN: 5.65)
oneDNN 2.7 - Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
A: 3.84180 (MIN: 3.81)
B: 3.85965 (MIN: 3.8)
(CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
Harness: IP Shapes 1D - Data Type: bf16bf16bf16 - Engine: CPU
A: The test run did not produce a result.
B: The test run did not produce a result.
Harness: IP Shapes 3D - Data Type: bf16bf16bf16 - Engine: CPU
A: The test run did not produce a result.
B: The test run did not produce a result.
oneDNN 2.7 - Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
A: 14.17 (MIN: 13.87)
B: 14.15 (MIN: 13.93)
oneDNN 2.7 - Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
A: 15.99 (MIN: 15.86)
B: 15.97 (MIN: 15.88)
oneDNN 2.7 - Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
A: 14.65 (MIN: 14.63)
B: 14.67 (MIN: 14.65)
oneDNN 2.7 - Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
A: 17.12 (MIN: 17.04)
B: 17.11 (MIN: 17.02)
oneDNN 2.7 - Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
A: 8.39234 (MIN: 8.31)
B: 8.34261 (MIN: 8.3)
oneDNN 2.7 - Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
A: 11.47 (MIN: 11.44)
B: 11.45 (MIN: 11.43)
oneDNN 2.7 - Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
A: 7868.71 (MIN: 7860.85)
B: 7890.48 (MIN: 7887.35)
oneDNN 2.7 - Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
A: 4341.02 (MIN: 4203.37)
B: 4112.59 (MIN: 4104.03)
oneDNN 2.7 - Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
A: 7874.27 (MIN: 7868.63)
B: 7866.79 (MIN: 7862.38)
Harness: Convolution Batch Shapes Auto - Data Type: bf16bf16bf16 - Engine: CPU
A: The test run did not produce a result.
B: The test run did not produce a result.
Harness: Deconvolution Batch shapes_1d - Data Type: bf16bf16bf16 - Engine: CPU
A: The test run did not produce a result.
B: The test run did not produce a result.
Harness: Deconvolution Batch shapes_3d - Data Type: bf16bf16bf16 - Engine: CPU
A: The test run did not produce a result.
B: The test run did not produce a result.
oneDNN 2.7 - Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
A: 4077.01 (MIN: 4074.41)
B: 4098.73 (MIN: 4092.22)
oneDNN 2.7 - Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
A: 4.34039 (MIN: 4.21)
B: 4.26921 (MIN: 4.2)
oneDNN 2.7 - Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
A: 7894.49 (MIN: 7862.58)
B: 7935.85 (MIN: 7928.92)
oneDNN 2.7 - Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
A: 4083.82 (MIN: 4076.89)
B: 4081.60 (MIN: 4073.9)
oneDNN 2.7 - Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
A: 3.50581 (MIN: 3.45)
B: 3.52560 (MIN: 3.47)
Harness: Matrix Multiply Batch Shapes Transformer - Data Type: bf16bf16bf16 - Engine: CPU
A: The test run did not produce a result.
B: The test run did not produce a result.
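Most A and B figures in these results sit within a percent or two of each other, so it can help to express the gap as a percentage and to account for the metric direction ("Fewer Is Better" versus "More Is Better"). The following is a minimal Python sketch for doing that by hand. It is not part of the Phoronix Test Suite, the helper names `percent_change` and `better_run` are hypothetical, and the three sample rows are simply copied from the results in this file.

```python
def percent_change(a: float, b: float) -> float:
    """Relative change of run B versus run A, in percent."""
    return (b - a) / a * 100.0


def better_run(a: float, b: float, fewer_is_better: bool) -> str:
    """Name the run with the better score for the given metric direction."""
    if a == b:
        return "tie"
    a_is_better = (a < b) if fewer_is_better else (a > b)
    return "A" if a_is_better else "B"


# (label, A, B, fewer_is_better): values copied from the results in this file.
SAMPLES = [
    ("oneDNN RNN Inference f32 (ms)", 4341.02, 4112.59, True),
    ("NCNN resnet50 (ms)", 27.45, 27.99, True),
    ("DeepSparse oBERT async multi-stream (items/sec)", 2.6413, 2.6054, False),
]

for label, a, b, fewer in SAMPLES:
    change = percent_change(a, b)
    print(f"{label}: B vs A {change:+.2f}%, better run: {better_run(a, b, fewer)}")
```

A single averaged figure per run says nothing about run-to-run noise, so small percentages from this sketch should be read with the reported MIN and MAX values in mind.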
A: Testing initiated at 25 October 2022 21:33 by user phoronix.
B: Testing initiated at 26 October 2022 04:30 by user phoronix.