AMD Ryzen 9 5950X 16-Core testing with an ASUS ROG CROSSHAIR VIII HERO (WI-FI) (4006 BIOS) and llvmpipe on Ubuntu 20.04 via the Phoronix Test Suite.
A
Kernel Notes: i915.force_probe=56a5 - Transparent Huge Pages: madvise
Compiler Notes: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-9-Av3uEd/gcc-9-9.4.0/debian/tmp-nvptx/usr,hsa --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Notes: Scaling Governor: acpi-cpufreq ondemand (Boost: Enabled) - CPU Microcode: 0xa201016
Python Notes: Python 2.7.18 + Python 3.8.10
Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: always-on RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
B, C
Processor: AMD Ryzen 9 5950X 16-Core @ 3.40GHz (16 Cores / 32 Threads)
Motherboard: ASUS ROG CROSSHAIR VIII HERO (WI-FI) (4006 BIOS)
Chipset: AMD Starship/Matisse
Memory: 32GB
Disk: 500GB Western Digital WDS500G3X0C-00SJG0
Graphics: llvmpipe (2450MHz)
Audio: Intel Device 4f92
Monitor: ASUS MG28U
Network: Realtek RTL8125 2.5GbE + Intel I211 + Intel Wi-Fi 6 AX200
OS: Ubuntu 20.04
Kernel: 6.0.0-060000rc5daily20220915-generic (x86_64)
Desktop: GNOME Shell 3.36.9
Display Server: X Server 1.20.13
OpenGL: 4.5 Mesa 21.2.6 (LLVM 12.0.0 256 bits)
OpenCL: OpenCL 3.0
Vulkan: 1.1.182
Compiler: GCC 9.4.0
File-System: ext4
Screen Resolution: 3840x2160
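The security-notes string above is a flat "name: status" list joined by " + ". As a minimal sketch of turning that string into a lookup table (the parse_security_notes helper is hypothetical, not part of the Phoronix Test Suite):

```python
def parse_security_notes(notes: str) -> dict:
    """Split a Phoronix-style 'name: status + name: status' string into a dict."""
    result = {}
    for entry in notes.split(" + "):
        # partition on the first ': ' so statuses containing colons stay intact
        name, _, status = entry.partition(": ")
        result[name.strip()] = status.strip()
    return result

notes = ("itlb_multihit: Not affected + l1tf: Not affected + "
         "spec_store_bypass: Mitigation of SSB disabled via prctl")
print(parse_security_notes(notes)["l1tf"])  # -> Not affected
```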
Result Overview (A vs. B vs. C; all three runs within roughly 1% of each other): SMHasher, oneDNN, OpenRadioss, Neural Magic DeepSparse, spaCy, TensorFlow, Y-Cruncher, QuadRay
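The overview chart condenses each suite into a single relative percentage per run. One common way to build such a comparison (not necessarily the exact method OpenBenchmarking.org uses) is to convert every test to a baseline-relative ratio, orienting lower-is-better and higher-is-better metrics consistently, and then take the geometric mean:

```python
import math

def relative_ratio(baseline, candidate, lower_is_better=True):
    """Per-test score of `candidate` relative to `baseline` (>1 means faster)."""
    return baseline / candidate if lower_is_better else candidate / baseline

def geo_mean(ratios):
    """Geometric mean, the usual choice for averaging performance ratios."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# Illustrative oneDNN timings (ms, lower is better) from runs A and B:
a = [8.18095, 1647.73, 2602.44]
b = [7.95740, 1666.79, 2624.27]
ratios = [relative_ratio(x, y) for x, y in zip(a, b)]
print(f"B vs A: {geo_mean(ratios) * 100:.1f}%")
```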
[Consolidated results table: per-run values for A, B, and C across every oneDNN, Neural Magic DeepSparse, OpenRadioss, TensorFlow, QuadRay, spaCy, SMHasher, and Y-Cruncher test; the individually graphed results follow below. The SMHasher (wyhash, Spooky32, fasthash32, t1ha0_aes_avx2 x86_64, t1ha2_atonce, FarmHash128, FarmHash32 x86_64 AVX, MeowHash x86_64 AES-NI, SHA3-256) and Y-Cruncher (500M, 1B) entries appear only in this table.]
oneDNN 2.7 - Engine: CPU (ms, fewer is better)
Harness: IP Shapes 3D - Data Type: f32
  A: 8.18095 (MIN: 7.86)   B: 7.95740 (MIN: 7.47)   C: 9.08580 (MIN: 8.89)   [SE +/- 0.05903 / +/- 0.00928, N = 3]
Harness: Recurrent Neural Network Inference - Data Type: f32
  A: 1647.73 (MIN: 1635.21)   B: 1666.79 (MIN: 1620.68)   C: 1744.78 (MIN: 1706.54)   [SE +/- 19.69 / +/- 19.41, N = 3]
Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16
  A: 1637.87 (MIN: 1622.93)   B: 1679.64 (MIN: 1619.79)   C: 1703.11 (MIN: 1634.78)   [SE +/- 12.64 (N = 11) / +/- 15.79 (N = 6)]
Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32
  A: 18.42 (MIN: 18.06)   B: 18.23 (MIN: 17.9)   C: 18.93 (MIN: 18.48)   [SE +/- 0.06 / +/- 0.05, N = 3]
Harness: IP Shapes 3D - Data Type: u8s8f32
  A: 0.467669 (MIN: 0.43)   B: 0.459909 (MIN: 0.41)   C: 0.473113 (MIN: 0.44)   [SE +/- 0.004656 / +/- 0.001374, N = 3]
Harness: Recurrent Neural Network Training - Data Type: f32
  A: 2602.44 (MIN: 2590.62)   B: 2624.27 (MIN: 2596.85)   C: 2670.40 (MIN: 2578.83)   [SE +/- 11.17 (N = 3) / +/- 28.26 (N = 5)]
Harness: IP Shapes 1D - Data Type: f32
  A: 3.86625 (MIN: 3.66)   B: 3.92386 (MIN: 3.68)   C: 3.96322 (MIN: 3.73)   [SE +/- 0.03093 / +/- 0.00111, N = 3]
Compiler (CXX): g++ -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
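Each graph reports a mean across N runs together with a standard error ("SE +/- ..., N = 3"). Assuming SE here means the standard error of the mean (sample standard deviation divided by sqrt(N)), it can be reproduced as follows; the per-run values are illustrative, not from the report:

```python
import math
import statistics

def standard_error(samples):
    # Standard error of the mean: sample standard deviation / sqrt(N)
    return statistics.stdev(samples) / math.sqrt(len(samples))

runs = [8.12, 8.18, 8.24]  # hypothetical per-run timings in ms
print(f"{statistics.mean(runs):.2f} ms, SE +/- {standard_error(runs):.3f}, N = {len(runs)}")
```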
Neural Magic DeepSparse 1.1 - Model: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Scenario: Asynchronous Multi-Stream
  ms/batch (fewer is better): A: 192.36   B: 192.83   C: 197.10   [SE +/- 0.11 / +/- 0.14, N = 3]
  items/sec (more is better): A: 41.58   B: 41.48   C: 40.58   [SE +/- 0.02 / +/- 0.03, N = 3]
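The ms/batch and items/sec figures for the same scenario are two views of one measurement. For run A, 8 x 1000 / 192.36 ms comes out near the reported 41.58 items/sec, which suggests roughly eight concurrent streams; that stream count is an inference from the numbers, not something the report states. A sketch of the conversion under that assumption:

```python
def items_per_sec(ms_per_batch, concurrent_streams=8, batch_size=1):
    # With N streams each completing a batch every ms_per_batch milliseconds,
    # aggregate throughput is N * batch_size * (1000 / ms_per_batch) items/sec.
    return concurrent_streams * batch_size * 1000.0 / ms_per_batch

# Run A reported 192.36 ms/batch alongside 41.58 items/sec for this model:
print(round(items_per_sec(192.3557), 2))  # close to the reported 41.58
```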
OpenRadioss
OpenRadioss is an open-source, AGPL-licensed finite element solver for dynamic event analysis. It is based on Altair Radioss, which was open-sourced in 2022. The solver is benchmarked with various example models available from https://www.openradioss.org/models/, currently using a reference OpenRadioss binary build offered via GitHub. Learn more via the OpenBenchmarking.org test page.
OpenRadioss 2022.10.13 - Model: Rubber O-Ring Seal Installation (Seconds, fewer is better)
  A: 92.08   B: 93.05   C: 94.30   [SE +/- 0.43 / +/- 0.41, N = 3]
OpenRadioss 2022.10.13 - Model: Bumper Beam (Seconds, fewer is better)
  A: 110.06   B: 110.29   C: 107.76   [SE +/- 0.53 / +/- 0.29, N = 3]
oneDNN 2.7 - Engine: CPU (ms, fewer is better)
Harness: Convolution Batch Shapes Auto - Data Type: f32
  A: 17.03 (MIN: 16.72)   B: 16.81 (MIN: 16.43)   C: 17.12 (MIN: 16.81)   [SE +/- 0.04 / +/- 0.03, N = 3]
Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32
  A: 1.65982 (MIN: 1.53)   B: 1.68678 (MIN: 1.52)   C: 1.67046 (MIN: 1.54)   [SE +/- 0.01458 / +/- 0.00419, N = 3]
Harness: Deconvolution Batch shapes_3d - Data Type: f32
  A: 3.57805 (MIN: 3.41)   B: 3.63445 (MIN: 3.41)   C: 3.60370 (MIN: 3.4)   [SE +/- 0.00327 / +/- 0.01247, N = 3]
Harness: Recurrent Neural Network Training - Data Type: u8s8f32
  A: 2594.59 (MIN: 2584.09)   B: 2590.10 (MIN: 2567.75)   C: 2627.85 (MIN: 2582.8)   [SE +/- 6.69 / +/- 31.84, N = 3]
Compiler (CXX): g++ -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
OpenRadioss 2022.10.13 - Model: Cell Phone Drop Test (Seconds, fewer is better)
  A: 87.55   B: 87.66   C: 88.81   [SE +/- 0.05 / +/- 0.52, N = 3]
oneDNN 2.7 - Engine: CPU (ms, fewer is better)
Harness: Recurrent Neural Network Inference - Data Type: u8s8f32
  A: 1641.15 (MIN: 1622.41)   B: 1655.47 (MIN: 1622.09)   C: 1664.44 (MIN: 1618.12)   [SE +/- 15.95 (N = 3) / +/- 17.63 (N = 5)]
Compiler (CXX): g++ -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
OpenRadioss 2022.10.13 - Model: INIVOL and Fluid Structure Interaction Drop Container (Seconds, fewer is better)
  A: 512.25   B: 511.38   C: 518.29   [SE +/- 0.30 (N = 3) / +/- 4.41 (N = 8)]
oneDNN 2.7 - Engine: CPU (ms, fewer is better)
Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32
  A: 0.786243 (MIN: 0.7)   B: 0.795473 (MIN: 0.7)   C: 0.789910 (MIN: 0.7)   [SE +/- 0.008536 / +/- 0.002635, N = 3]
Harness: Deconvolution Batch shapes_1d - Data Type: f32
  A: 4.40752 (MIN: 3.5)   B: 4.36002 (MIN: 3.52)   C: 4.39504 (MIN: 3.54)   [SE +/- 0.03778 (N = 8) / +/- 0.01829 (N = 3)]
Compiler (CXX): g++ -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
Neural Magic DeepSparse 1.1 - Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Synchronous Single-Stream
  items/sec (more is better): A: 11.43   B: 11.31   C: 11.35   [SE +/- 0.01 / +/- 0.03, N = 3]
  ms/batch (fewer is better): A: 87.49   B: 88.43   C: 88.13   [SE +/- 0.09 / +/- 0.22, N = 3]
QuadRay VectorChief's QuadRay is a real-time ray-tracing engine written to support SIMD across ARM, MIPS, PPC, and x86/x86_64 processors. QuadRay supports SSE/SSE2/SSE4 and AVX/AVX2/AVX-512 usage on Intel/AMD CPUs. Learn more via the OpenBenchmarking.org test page.
QuadRay 2022.05.25 - Scene: 3 - Resolution: 4K (FPS, more is better)
  A: 2.87   B: 2.90   C: 2.90   [SE +/- 0.01 / +/- 0.01, N = 3]
  Compiler (CXX): g++ -O3 -pthread -lm -lstdc++ -lX11 -lXext -lpthread
TensorFlow
This is a benchmark of the TensorFlow deep learning framework using the TensorFlow reference benchmarks (tensorflow/benchmarks with tf_cnn_benchmarks.py). Note that the Phoronix Test Suite also provides pts/tensorflow-lite for benchmarking the TensorFlow Lite binaries. Learn more via the OpenBenchmarking.org test page.
TensorFlow 2.10 - Device: CPU - Batch Size: 16 - Model: AlexNet (images/sec, more is better)
  A: 56.17   B: 56.67   C: 56.51   [SE +/- 0.14 / +/- 0.15, N = 3]
OpenRadioss 2022.10.13 - Model: Bird Strike on Windshield (Seconds, fewer is better)
  A: 225.54   B: 226.96   C: 227.52   [SE +/- 0.53 / +/- 0.78, N = 3]
Neural Magic DeepSparse 1.1 - Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Asynchronous Multi-Stream
  ms/batch (fewer is better): A: 622.84   B: 625.55   C: 628.29   [SE +/- 2.50 / +/- 2.78, N = 3]
QuadRay 2022.05.25 - Scene: 5 - Resolution: 1080p (FPS, more is better)
  A: 3.54   B: 3.55   C: 3.57   [SE +/- 0.01 / +/- 0.00, N = 3]
  Compiler (CXX): g++ -O3 -pthread -lm -lstdc++ -lX11 -lXext -lpthread
Neural Magic DeepSparse 1.1 - Model: CV Classification, ResNet-50 ImageNet - Scenario: Asynchronous Multi-Stream
  ms/batch (fewer is better): A: 50.10   B: 50.42   C: 50.52   [SE +/- 0.23 / +/- 0.18, N = 3]
QuadRay 2022.05.25 - Scene: 2 - Resolution: 1080p (FPS, more is better)
  A: 13.44   B: 13.33   C: 13.36   [SE +/- 0.02 / +/- 0.04, N = 3]
  Compiler (CXX): g++ -O3 -pthread -lm -lstdc++ -lX11 -lXext -lpthread
Neural Magic DeepSparse 1.1 - Model: CV Classification, ResNet-50 ImageNet - Scenario: Asynchronous Multi-Stream
  items/sec (more is better): A: 159.54   B: 158.59   C: 158.25   [SE +/- 0.71 / +/- 0.56, N = 3]
Neural Magic DeepSparse 1.1 - Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Asynchronous Multi-Stream
  ms/batch (fewer is better): A: 626.73   B: 626.01   C: 631.12   [SE +/- 1.55 / +/- 1.66, N = 3]
spaCy
The spaCy library is a leading open-source, Python-based solution for advanced natural language processing (NLP). This test profile times spaCy CPU performance with various models. Learn more via the OpenBenchmarking.org test page.
spaCy 3.4.1 - Model: en_core_web_trf (tokens/sec, more is better)
  A: 1074   B: 1076   C: 1069   [SE +/- 3.48 / +/- 4.93, N = 3]
QuadRay 2022.05.25 - Scene: 1 - Resolution: 1080p (FPS, more is better)
  A: 46.03   B: 45.94   C: 45.75   [SE +/- 0.09 / +/- 0.04, N = 3]
  Compiler (CXX): g++ -O3 -pthread -lm -lstdc++ -lX11 -lXext -lpthread
TensorFlow 2.10 - Device: CPU - Batch Size: 32 - Model: AlexNet (images/sec, more is better)
  A: 80.01   B: 79.67   C: 79.54   [SE +/- 0.15 / +/- 0.12, N = 3]
QuadRay 2022.05.25 - Scene: 2 - Resolution: 4K (FPS, more is better)
  A: 3.41   B: 3.39   C: 3.39   [SE +/- 0.01 / +/- 0.01, N = 3]
  Compiler (CXX): g++ -O3 -pthread -lm -lstdc++ -lX11 -lXext -lpthread
Neural Magic DeepSparse 1.1 - Model: NLP Text Classification, DistilBERT mnli - Scenario: Synchronous Single-Stream
  items/sec (more is better): A: 80.50   B: 80.33   C: 80.04   [SE +/- 0.08 / +/- 0.18, N = 3]
TensorFlow 2.10 - Device: CPU - Batch Size: 32 - Model: GoogLeNet (images/sec, more is better)
  A: 32.36   B: 32.48   C: 32.53   [SE +/- 0.02 / +/- 0.02, N = 3]
oneDNN 2.7 - Engine: CPU (ms, fewer is better)
Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32
  A: 1.07159 (MIN: 0.97)   B: 1.07245 (MIN: 0.97)   C: 1.07602 (MIN: 0.97)   [SE +/- 0.00134 / +/- 0.00183, N = 3]
Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16
  A: 2608.12 (MIN: 2597.2)   B: 2597.67 (MIN: 2576.33)   C: 2598.17 (MIN: 2573.5)   [SE +/- 8.68 / +/- 7.36, N = 3]
Compiler (CXX): g++ -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
Neural Magic DeepSparse 1.1 - Model: NLP Text Classification, DistilBERT mnli - Scenario: Asynchronous Multi-Stream
  items/sec (more is better): A: 119.37   B: 119.28   C: 118.96   [SE +/- 0.08 / +/- 0.06, N = 3]
oneDNN 2.7 - Engine: CPU (ms, fewer is better)
Harness: IP Shapes 1D - Data Type: u8s8f32
  A: 0.776119 (MIN: 0.7)   B: 0.777507 (MIN: 0.71)   C: 0.775055 (MIN: 0.7)   [SE +/- 0.000399 / +/- 0.001575, N = 3]
Compiler (CXX): g++ -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
Neural Magic DeepSparse 1.1 - Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Synchronous Single-Stream
  ms/batch (fewer is better): A: 88.01   B: 88.10   C: 88.28   [SE +/- 0.13 / +/- 0.06, N = 3]
Neural Magic DeepSparse 1.1 - Model: NLP Text Classification, BERT base uncased SST2 - Scenario: Synchronous Single-Stream
  ms/batch (fewer is better): A: 24.18   B: 24.21   C: 24.25   [SE +/- 0.01 / +/- 0.03, N = 3]
TensorFlow 2.10 - Device: CPU - Batch Size: 32 - Model: ResNet-50 (images/sec, more is better)
  A: 11.50   B: 11.49   C: 11.47   [SE +/- 0.01 / +/- 0.02, N = 3]
Neural Magic DeepSparse 1.1 - Model: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Scenario: Synchronous Single-Stream
  ms/batch (fewer is better): A: 31.33   B: 31.36   C: 31.41   [SE +/- 0.06 / +/- 0.10, N = 3]
  items/sec (more is better): A: 31.91   B: 31.88   C: 31.83   [SE +/- 0.06 / +/- 0.10, N = 3]
spaCy 3.4.1 - Model: en_core_web_lg (tokens/sec, more is better)
  A: 16000   B: 16021   C: 16016   [SE +/- 18.22 / +/- 4.58, N = 3]
Neural Magic DeepSparse 1.1 - Model: NLP Text Classification, BERT base uncased SST2 - Scenario: Asynchronous Multi-Stream
  items/sec (more is better): A: 55.30   B: 55.24   C: 55.22   [SE +/- 0.08 / +/- 0.05, N = 3]
oneDNN 2.7 - Engine: CPU (ms, fewer is better)
Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32
  A: 0.681937 (MIN: 0.59)   B: 0.682747 (MIN: 0.58)   C: 0.682228 (MIN: 0.58)   [SE +/- 0.002859 / +/- 0.002683, N = 3]
Compiler (CXX): g++ -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
QuadRay 2022.05.25 (FPS, more is better)
Scene: 1 - Resolution: 4K
  A: 11.10   B: 11.11   C: 11.11   [SE +/- 0.03 / +/- 0.02, N = 3]
Scene: 3 - Resolution: 1080p
  A: 11.44   B: 11.44   C: 11.44   [SE +/- 0.03 / +/- 0.01, N = 3]
Scene: 5 - Resolution: 4K
  A: 0.89   B: 0.89   C: 0.89   [SE +/- 0.00 / +/- 0.00, N = 3]
Compiler (CXX): g++ -O3 -pthread -lm -lstdc++ -lX11 -lXext -lpthread
oneDNN
None of the A, B, or C runs produced a result for the following bf16bf16bf16 CPU-engine harnesses:
- Harness: Matrix Multiply Batch Shapes Transformer - Data Type: bf16bf16bf16 - Engine: CPU
- Harness: Deconvolution Batch shapes_3d - Data Type: bf16bf16bf16 - Engine: CPU
- Harness: Deconvolution Batch shapes_1d - Data Type: bf16bf16bf16 - Engine: CPU
- Harness: Convolution Batch Shapes Auto - Data Type: bf16bf16bf16 - Engine: CPU
- Harness: IP Shapes 3D - Data Type: bf16bf16bf16 - Engine: CPU
- Harness: IP Shapes 1D - Data Type: bf16bf16bf16 - Engine: CPU
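A plausible explanation for the missing bf16 results is that the Ryzen 9 5950X (Zen 3) exposes AVX2 but none of the AVX-512 or BF16 instruction-set extensions that oneDNN's bf16 CPU kernels target, so the harnesses bail out without producing a number. The sketch below, which assumes oneDNN's bf16 path needs the `avx512_bf16` or `amx_bf16` CPU flag, shows one way to check a Linux machine before queuing these harnesses; the helper names `cpu_flags` and `supports_onednn_bf16` are hypothetical, not part of any tool used here.

```python
def cpu_flags(cpuinfo_text):
    """Parse the ISA flag set from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # "flags : fpu sse avx2 ..." -> set of individual flag names
            return set(line.split(":", 1)[1].split())
    return set()

def supports_onednn_bf16(flags):
    # Assumption: oneDNN's JIT bf16 primitives require AVX-512 BF16
    # or AMX-BF16 capable hardware to run natively.
    return "avx512_bf16" in flags or "amx_bf16" in flags

# Abbreviated flag line resembling a Zen 3 CPU such as the 5950X:
# AVX2/FMA are present, but no avx512_* or amx_* extensions.
zen3_like = "flags\t\t: fpu sse sse2 avx avx2 fma sha_ni"
print(supports_onednn_bf16(cpu_flags(zen3_like)))  # → False
```

On a real system the text would come from `open("/proc/cpuinfo").read()`; a `False` result here would predict exactly the "did not produce a result" rows above.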
A: Testing initiated at 13 October 2022 19:27 by user phoronix.
B: Testing initiated at 13 October 2022 20:42 by user phoronix.
C: OS: Ubuntu 20.04, Kernel: 6.0.0-060000rc5daily20220915-generic (x86_64), Desktop: GNOME Shell 3.36.9, Display Server: X Server 1.20.13, OpenGL: 4.5 Mesa 21.2.6 (LLVM 12.0.0 256 bits), OpenCL: OpenCL 3.0, Vulkan: 1.1.182, Compiler: GCC 9.4.0, File-System: ext4, Screen Resolution: 3840x2160
Testing initiated at 14 October 2022 04:46 by user phoronix.