AMD Ryzen 9 5950X 16-Core testing with an ASUS ROG CROSSHAIR VIII HERO (WI-FI) (4006 BIOS) and llvmpipe on Ubuntu 20.04 via the Phoronix Test Suite.
A
Kernel Notes: i915.force_probe=56a5 - Transparent Huge Pages: madvise
Compiler Notes: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-9-Av3uEd/gcc-9-9.4.0/debian/tmp-nvptx/usr,hsa --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Notes: Scaling Governor: acpi-cpufreq ondemand (Boost: Enabled) - CPU Microcode: 0xa201016
Python Notes: Python 2.7.18 + Python 3.8.10
Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: always-on RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
B C Processor: AMD Ryzen 9 5950X 16-Core @ 3.40GHz (16 Cores / 32 Threads), Motherboard: ASUS ROG CROSSHAIR VIII HERO (WI-FI) (4006 BIOS), Chipset: AMD Starship/Matisse, Memory: 32GB, Disk: 500GB Western Digital WDS500G3X0C-00SJG0, Graphics: llvmpipe (2450MHz), Audio: Intel Device 4f92, Monitor: ASUS MG28U, Network: Realtek RTL8125 2.5GbE + Intel I211 + Intel Wi-Fi 6 AX200
OS: Ubuntu 20.04, Kernel: 6.0.0-060000rc5daily20220915-generic (x86_64), Desktop: GNOME Shell 3.36.9, Display Server: X Server 1.20.13, OpenGL: 4.5 Mesa 21.2.6 (LLVM 12.0.0 256 bits), OpenCL: OpenCL 3.0, Vulkan: 1.1.182, Compiler: GCC 9.4.0, File-System: ext4, Screen Resolution: 3840x2160
Result Overview (Phoronix Test Suite / OpenBenchmarking.org): across SMHasher, oneDNN, OpenRadioss, Neural Magic DeepSparse, spaCy, TensorFlow, Y-Cruncher, and QuadRay, runs A, B, and C landed within roughly 1% of one another (100-101%).
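The overview percentages are relative-performance ratios. A minimal sketch of how such a figure can be derived (the function name and normalization are illustrative, not taken from the Phoronix Test Suite source); fewer-is-better metrics are inverted so that higher is always better:

```python
def relative_percent(value, baseline, lower_is_better=False):
    """Return performance relative to a baseline, as a percentage.

    For lower-is-better metrics (seconds, ms), the ratio is inverted
    so a faster (smaller) result scores above 100%.
    """
    ratio = baseline / value if lower_is_better else value / baseline
    return 100.0 * ratio

# OpenRadioss "Bumper Beam" (seconds, fewer is better): C = 107.76 vs A = 110.06
print(round(relative_percent(107.76, 110.06, lower_is_better=True), 1))  # -> 102.1
```

With mixed metric directions normalized this way, the per-suite percentages can be aggregated (OpenBenchmarking.org typically uses a geometric mean for composite figures).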
Result Summary (values listed as A / B / C; units and better-direction per test as in the detailed results below):
openradioss: INIVOL and Fluid Structure Interaction Drop Container | 512.25 / 511.38 / 518.29
smhasher: SHA3-256 | 1852.225 / 1898.738 / 1889.905
smhasher: SHA3-256 | 209.85 / 204.91 / 206.29
tensorflow: CPU - 32 - ResNet-50 | 11.5 / 11.49 / 11.47
openradioss: Bird Strike on Windshield | 225.54 / 226.96 / 227.52
onednn: Recurrent Neural Network Inference - bf16bf16bf16 - CPU | 1637.87 / 1679.64 / 1703.11
tensorflow: CPU - 16 - ResNet-50 | 11.79 / 11.83 / 11.78
tensorflow: CPU - 32 - GoogLeNet | 32.36 / 32.48 / 32.53
openradioss: Bumper Beam | 110.06 / 110.29 / 107.76
onednn: Recurrent Neural Network Training - f32 - CPU | 2602.44 / 2624.27 / 2670.40
onednn: Recurrent Neural Network Inference - u8s8f32 - CPU | 1641.15 / 1655.47 / 1664.44
openradioss: Rubber O-Ring Seal Installation | 92.08 / 93.05 / 94.30
openradioss: Cell Phone Drop Test | 87.55 / 87.66 / 88.81
onednn: Recurrent Neural Network Training - u8s8f32 - CPU | 2594.59 / 2590.10 / 2627.85
onednn: Recurrent Neural Network Training - bf16bf16bf16 - CPU | 2608.12 / 2597.67 / 2598.17
onednn: Recurrent Neural Network Inference - f32 - CPU | 1647.73 / 1666.79 / 1744.78
spacy: en_core_web_trf | 1074 / 1076 / 1069
spacy: en_core_web_lg | 16000 / 16021 / 16016
tensorflow: CPU - 16 - GoogLeNet | 34.05 / 34.16 / 34.20
deepsparse: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Asynchronous Multi-Stream | 192.3557 / 192.8251 / 197.0963
deepsparse: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Asynchronous Multi-Stream | 41.582 / 41.4807 / 40.5820
deepsparse: NLP Document Classification, oBERT base uncased on IMDB - Asynchronous Multi-Stream | 622.8384 / 625.5465 / 628.2856
deepsparse: NLP Document Classification, oBERT base uncased on IMDB - Asynchronous Multi-Stream | 12.7346 / 12.7366 / 12.6662
deepsparse: NLP Token Classification, BERT base uncased conll2003 - Asynchronous Multi-Stream | 626.7286 / 626.0087 / 631.1154
deepsparse: NLP Token Classification, BERT base uncased conll2003 - Asynchronous Multi-Stream | 12.7051 / 12.6875 / 12.6218
deepsparse: NLP Text Classification, BERT base uncased SST2 - Asynchronous Multi-Stream | 144.4733 / 144.7011 / 144.7734
deepsparse: NLP Text Classification, BERT base uncased SST2 - Asynchronous Multi-Stream | 55.2955 / 55.2420 / 55.2232
deepsparse: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Synchronous Single-Stream | 31.3265 / 31.3600 / 31.4060
deepsparse: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Synchronous Single-Stream | 31.9148 / 31.8808 / 31.8345
tensorflow: CPU - 32 - AlexNet | 80.01 / 79.67 / 79.54
deepsparse: NLP Text Classification, BERT base uncased SST2 - Synchronous Single-Stream | 24.1805 / 24.2127 / 24.2486
deepsparse: NLP Text Classification, BERT base uncased SST2 - Synchronous Single-Stream | 41.3464 / 41.2918 / 41.2307
deepsparse: NLP Document Classification, oBERT base uncased on IMDB - Synchronous Single-Stream | 88.0114 / 88.0959 / 88.2760
deepsparse: NLP Document Classification, oBERT base uncased on IMDB - Synchronous Single-Stream | 11.3614 / 11.3506 / 11.3274
deepsparse: NLP Token Classification, BERT base uncased conll2003 - Synchronous Single-Stream | 87.4875 / 88.4311 / 88.1329
deepsparse: NLP Token Classification, BERT base uncased conll2003 - Synchronous Single-Stream | 11.4295 / 11.3075 / 11.3460
y-cruncher: 1B | 38.82 / 38.819 / 38.710
deepsparse: NLP Text Classification, DistilBERT mnli - Asynchronous Multi-Stream | 66.9923 / 67.0315 / 67.2081
deepsparse: NLP Text Classification, DistilBERT mnli - Asynchronous Multi-Stream | 119.3666 / 119.2849 / 118.9598
deepsparse: CV Detection, YOLOv5s COCO - Asynchronous Multi-Stream | 117.8457 / 116.9655 / 118.4057
deepsparse: CV Detection, YOLOv5s COCO - Asynchronous Multi-Stream | 67.8059 / 68.3274 / 67.4911
deepsparse: NLP Text Classification, DistilBERT mnli - Synchronous Single-Stream | 12.418 / 12.4438 / 12.4880
deepsparse: NLP Text Classification, DistilBERT mnli - Synchronous Single-Stream | 80.4961 / 80.3277 / 80.0432
deepsparse: CV Detection, YOLOv5s COCO - Synchronous Single-Stream | 17.7758 / 17.7512 / 17.8768
deepsparse: CV Detection, YOLOv5s COCO - Synchronous Single-Stream | 56.2278 / 56.3044 / 55.9096
deepsparse: CV Classification, ResNet-50 ImageNet - Asynchronous Multi-Stream | 50.1021 / 50.4242 / 50.5237
deepsparse: CV Classification, ResNet-50 ImageNet - Asynchronous Multi-Stream | 159.543 / 158.5903 / 158.2496
onednn: Deconvolution Batch shapes_1d - f32 - CPU | 4.40752 / 4.36002 / 4.39504
deepsparse: CV Classification, ResNet-50 ImageNet - Synchronous Single-Stream | 8.8777 / 8.8852 / 8.8930
deepsparse: CV Classification, ResNet-50 ImageNet - Synchronous Single-Stream | 112.5572 / 112.4632 / 112.3645
tensorflow: CPU - 16 - AlexNet | 56.17 / 56.67 / 56.51
onednn: Deconvolution Batch shapes_1d - u8s8f32 - CPU | 1.07159 / 1.07245 / 1.07602
quadray: 5 - 4K | 0.89 / 0.89 / 0.89
quadray: 2 - 4K | 3.41 / 3.39 / 3.39
quadray: 3 - 4K | 2.87 / 2.90 / 2.90
quadray: 1 - 4K | 11.1 / 11.11 / 11.11
quadray: 5 - 1080p | 3.54 / 3.55 / 3.57
quadray: 3 - 1080p | 11.44 / 11.44 / 11.44
quadray: 2 - 1080p | 13.44 / 13.33 / 13.36
quadray: 1 - 1080p | 46.03 / 45.94 / 45.75
y-cruncher: 500M | 18.058 / 18.126 / 18.118
onednn: IP Shapes 1D - f32 - CPU | 3.86625 / 3.92386 / 3.96322
onednn: IP Shapes 1D - u8s8f32 - CPU | 0.776119 / 0.777507 / 0.775055
onednn: Matrix Multiply Batch Shapes Transformer - f32 - CPU | 0.786243 / 0.795473 / 0.789910
onednn: Matrix Multiply Batch Shapes Transformer - u8s8f32 - CPU | 0.681937 / 0.682747 / 0.682228
smhasher: MeowHash x86_64 AES-NI | 53.591 / 53.462 / 53.361
smhasher: MeowHash x86_64 AES-NI | 46922.83 / 46726.11 / 47376.73
smhasher: FarmHash128 | 52.114 / 51.832 / 51.630
smhasher: FarmHash128 | 20617.68 / 20202.81 / 20364.50
smhasher: t1ha0_aes_avx2 x86_64 | 22.097 / 21.950 / 22.097
smhasher: t1ha0_aes_avx2 x86_64 | 83813.93 / 82904.44 / 81425.19
onednn: IP Shapes 3D - f32 - CPU | 8.18095 / 7.95740 / 9.08580
onednn: IP Shapes 3D - u8s8f32 - CPU | 0.467669 / 0.459909 / 0.473113
smhasher: Spooky32 | 29.944 / 30.815 / 31.198
smhasher: Spooky32 | 19879.93 / 19280.72 / 19028.97
smhasher: FarmHash32 x86_64 AVX | 28.725 / 28.523 / 28.560
smhasher: FarmHash32 x86_64 AVX | 33990.69 / 34077.20 / 33941.33
smhasher: t1ha2_atonce | 22.878 / 23.111 / 22.880
smhasher: t1ha2_atonce | 18113.85 / 18237.82 / 18384.27
smhasher: fasthash32 | 23.51 / 24.690 / 24.316
smhasher: fasthash32 | 8321.97 / 7974.21 / 8089.02
onednn: Convolution Batch Shapes Auto - f32 - CPU | 17.0332 / 16.8122 / 17.1213
onednn: Convolution Batch Shapes Auto - u8s8f32 - CPU | 18.4224 / 18.2313 / 18.9283
smhasher: wyhash | 14.968 / 16.315 / 15.420
smhasher: wyhash | 28600.24 / 27320.49 / 27857.67
onednn: Deconvolution Batch shapes_3d - f32 - CPU | 3.57805 / 3.63445 / 3.60370
onednn: Deconvolution Batch shapes_3d - u8s8f32 - CPU | 1.65982 / 1.68678 / 1.67046
onednn: IP Shapes 1D - bf16bf16bf16 - CPU | (no result)
OpenRadioss
OpenRadioss is an open-source, AGPL-licensed finite element solver for dynamic event analysis. It is based on Altair Radioss, which was open-sourced in 2022. The solver is benchmarked here with various example models available from https://www.openradioss.org/models/, using a reference OpenRadioss binary build offered via GitHub. Learn more via the OpenBenchmarking.org test page.
OpenRadioss 2022.10.13 - Model: INIVOL and Fluid Structure Interaction Drop Container (Seconds, Fewer Is Better)
A: 512.25 | B: 511.38 | C: 518.29 (SE +/- 0.30, N = 3; SE +/- 4.41, N = 8)
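Each result in this report carries an SE figure, the standard error of the mean across its N runs. A minimal sketch of that calculation (not the Phoronix Test Suite's actual implementation; the sample timings are hypothetical):

```python
import statistics

def standard_error(samples):
    """Standard error of the mean: sample standard deviation / sqrt(N)."""
    n = len(samples)
    return statistics.stdev(samples) / n ** 0.5

# Three hypothetical run timings in seconds
runs = [512.0, 512.3, 512.8]
print(round(standard_error(runs), 3))  # -> 0.233
```

A smaller SE relative to the mean indicates the runs were consistent, which is why sub-1% differences between A, B, and C here are mostly within noise.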
TensorFlow
This is a benchmark of the TensorFlow deep learning framework using the TensorFlow reference benchmarks (tensorflow/benchmarks with tf_cnn_benchmarks.py). Note that the Phoronix Test Suite also offers pts/tensorflow-lite for benchmarking TensorFlow Lite binaries. Learn more via the OpenBenchmarking.org test page.
TensorFlow 2.10 - Device: CPU - Batch Size: 32 - Model: ResNet-50 (images/sec, More Is Better)
A: 11.50 | B: 11.49 | C: 11.47 (SE +/- 0.01, N = 3; SE +/- 0.02, N = 3)
OpenRadioss
OpenRadioss 2022.10.13 - Model: Bird Strike on Windshield (Seconds, Fewer Is Better)
A: 225.54 | B: 226.96 | C: 227.52 (SE +/- 0.53, N = 3; SE +/- 0.78, N = 3)
oneDNN
oneDNN 2.7 - Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
A: 1637.87 | B: 1679.64 | C: 1703.11 (MIN: 1622.93 / 1619.79 / 1634.78; SE +/- 12.64, N = 11; SE +/- 15.79, N = 6)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
TensorFlow
TensorFlow 2.10 - Device: CPU - Batch Size: 16 - Model: ResNet-50 (images/sec, More Is Better)
A: 11.79 | B: 11.83 | C: 11.78 (SE +/- 0.01, N = 3; SE +/- 0.03, N = 3)
OpenRadioss
OpenRadioss 2022.10.13 - Model: Bumper Beam (Seconds, Fewer Is Better)
A: 110.06 | B: 110.29 | C: 107.76 (SE +/- 0.53, N = 3; SE +/- 0.29, N = 3)
oneDNN
oneDNN 2.7 - Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
A: 2602.44 | B: 2624.27 | C: 2670.40 (MIN: 2590.62 / 2596.85 / 2578.83; SE +/- 11.17, N = 3; SE +/- 28.26, N = 5)
oneDNN 2.7 - Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
A: 1641.15 | B: 1655.47 | C: 1664.44 (MIN: 1622.41 / 1622.09 / 1618.12; SE +/- 15.95, N = 3; SE +/- 17.63, N = 5)
OpenRadioss
OpenRadioss 2022.10.13 - Model: Rubber O-Ring Seal Installation (Seconds, Fewer Is Better)
A: 92.08 | B: 93.05 | C: 94.30 (SE +/- 0.43, N = 3; SE +/- 0.41, N = 3)
oneDNN OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.7 Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU A B C 600 1200 1800 2400 3000 SE +/- 6.69, N = 3 SE +/- 31.84, N = 3 2594.59 2590.10 2627.85 MIN: 2584.09 MIN: 2567.75 MIN: 2582.8 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.7 Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU A B C 600 1200 1800 2400 3000 SE +/- 8.68, N = 3 SE +/- 7.36, N = 3 2608.12 2597.67 2598.17 MIN: 2597.2 MIN: 2576.33 MIN: 2573.5 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.7 Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU A B C 400 800 1200 1600 2000 SE +/- 19.69, N = 3 SE +/- 19.41, N = 3 1647.73 1666.79 1744.78 MIN: 1635.21 MIN: 1620.68 MIN: 1706.54 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
spaCy
The spaCy library is an open-source, Python-based solution for advanced natural language processing (NLP). This test profile times spaCy CPU performance with various models. Learn more via the OpenBenchmarking.org test page.
spaCy 3.4.1 - Model: en_core_web_trf (tokens/sec, More Is Better)
A: 1074 | B: 1076 | C: 1069 (SE +/- 3.48, N = 3; SE +/- 4.93, N = 3)
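The tokens/sec metric is throughput over wall-clock time. A hedged sketch of such a measurement, using a plain callable as a stand-in for a loaded spaCy pipeline (e.g. the object returned by spacy.load) so the example stays self-contained:

```python
import time

def tokens_per_second(pipeline, texts, token_count):
    """Time a text-processing callable and report a tokens/sec figure.

    `pipeline` stands in for an NLP model such as a spaCy pipeline;
    here it is any callable taking one text, to keep the sketch
    free of external dependencies.
    """
    start = time.perf_counter()
    for text in texts:
        pipeline(text)
    elapsed = time.perf_counter() - start
    return token_count / elapsed

# Stand-in "pipeline": whitespace tokenization over 1000 short texts
docs = ["the quick brown fox"] * 1000
n_tokens = sum(len(t.split()) for t in docs)
rate = tokens_per_second(lambda t: t.split(), docs, n_tokens)
print(rate > 0)  # -> True
```

A real harness would additionally warm the model up and repeat the measurement several times, which is where the SE and N figures in these results come from.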
TensorFlow
TensorFlow 2.10 - Device: CPU - Batch Size: 16 - Model: GoogLeNet (images/sec, More Is Better)
A: 34.05 | B: 34.16 | C: 34.20 (SE +/- 0.02, N = 3; SE +/- 0.02, N = 3)
Neural Magic DeepSparse
Neural Magic DeepSparse 1.1 - Model: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better)
A: 192.36 | B: 192.83 | C: 197.10 (SE +/- 0.11, N = 3; SE +/- 0.14, N = 3)
Neural Magic DeepSparse 1.1 - Model: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better)
A: 41.58 | B: 41.48 | C: 40.58 (SE +/- 0.02, N = 3; SE +/- 0.03, N = 3)
Neural Magic DeepSparse 1.1 - Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better)
A: 622.84 | B: 625.55 | C: 628.29 (SE +/- 2.50, N = 3; SE +/- 2.78, N = 3)
Neural Magic DeepSparse 1.1 - Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better)
A: 626.73 | B: 626.01 | C: 631.12 (SE +/- 1.55, N = 3; SE +/- 1.66, N = 3)
Neural Magic DeepSparse 1.1 - Model: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Scenario: Synchronous Single-Stream (ms/batch, Fewer Is Better)
A: 31.33 | B: 31.36 | C: 31.41 (SE +/- 0.06, N = 3; SE +/- 0.10, N = 3)
Neural Magic DeepSparse 1.1 - Model: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Scenario: Synchronous Single-Stream (items/sec, More Is Better)
A: 31.91 | B: 31.88 | C: 31.83 (SE +/- 0.06, N = 3; SE +/- 0.10, N = 3)
TensorFlow
TensorFlow 2.10 - Device: CPU - Batch Size: 32 - Model: AlexNet (images/sec, More Is Better)
A: 80.01 | B: 79.67 | C: 79.54 (SE +/- 0.15, N = 3; SE +/- 0.12, N = 3)
Neural Magic DeepSparse
Neural Magic DeepSparse 1.1 - Model: NLP Text Classification, BERT base uncased SST2 - Scenario: Synchronous Single-Stream (ms/batch, Fewer Is Better)
A: 24.18 | B: 24.21 | C: 24.25 (SE +/- 0.01, N = 3; SE +/- 0.03, N = 3)
Neural Magic DeepSparse 1.1 - Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Synchronous Single-Stream (ms/batch, Fewer Is Better)
A: 88.01 | B: 88.10 | C: 88.28 (SE +/- 0.13, N = 3; SE +/- 0.06, N = 3)
Neural Magic DeepSparse 1.1 - Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Synchronous Single-Stream (ms/batch, Fewer Is Better)
A: 87.49 | B: 88.43 | C: 88.13 (SE +/- 0.09, N = 3; SE +/- 0.22, N = 3)
Neural Magic DeepSparse 1.1 - Model: NLP Text Classification, DistilBERT mnli - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better)
A: 66.99 | B: 67.03 | C: 67.21 (SE +/- 0.03, N = 3; SE +/- 0.02, N = 3)
oneDNN
oneDNN 2.7 - Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
A: 4.40752 | B: 4.36002 | C: 4.39504 (MIN: 3.5 / 3.52 / 3.54; SE +/- 0.03778, N = 8; SE +/- 0.01829, N = 3)
Neural Magic DeepSparse
Neural Magic DeepSparse 1.1 - Model: CV Classification, ResNet-50 ImageNet - Scenario: Synchronous Single-Stream (ms/batch, Fewer Is Better)
A: 8.8777 | B: 8.8852 | C: 8.8930 (SE +/- 0.0017, N = 3; SE +/- 0.0051, N = 3)
TensorFlow
TensorFlow 2.10 - Device: CPU - Batch Size: 16 - Model: AlexNet (images/sec, More Is Better)
A: 56.17 | B: 56.67 | C: 56.51 (SE +/- 0.14, N = 3; SE +/- 0.15, N = 3)
oneDNN OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.7 Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU A B C 0.2421 0.4842 0.7263 0.9684 1.2105 SE +/- 0.00134, N = 3 SE +/- 0.00183, N = 3 1.07159 1.07245 1.07602 MIN: 0.97 MIN: 0.97 MIN: 0.97 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
QuadRay VectorChief's QuadRay is a real-time ray-tracing engine written to support SIMD across ARM, MIPS, PPC, and x86/x86_64 processors. QuadRay supports SSE/SSE2/SSE4 and AVX/AVX2/AVX-512 usage on Intel/AMD CPUs. Learn more via the OpenBenchmarking.org test page.
QuadRay 2022.05.25 (FPS, More Is Better)
Scene: 5 - Resolution: 4K - A: 0.89 | B: 0.89 | C: 0.89 (SE +/- 0.00, N = 3; SE +/- 0.00, N = 3)
Scene: 2 - Resolution: 4K - A: 3.41 | B: 3.39 | C: 3.39 (SE +/- 0.01, N = 3; SE +/- 0.01, N = 3)
Scene: 3 - Resolution: 4K - A: 2.87 | B: 2.90 | C: 2.90 (SE +/- 0.01, N = 3; SE +/- 0.01, N = 3)
Scene: 1 - Resolution: 4K - A: 11.10 | B: 11.11 | C: 11.11 (SE +/- 0.03, N = 3; SE +/- 0.02, N = 3)
Scene: 5 - Resolution: 1080p - A: 3.54 | B: 3.55 | C: 3.57 (SE +/- 0.01, N = 3; SE +/- 0.00, N = 3)
Scene: 3 - Resolution: 1080p - A: 11.44 | B: 11.44 | C: 11.44 (SE +/- 0.03, N = 3; SE +/- 0.01, N = 3)
Scene: 2 - Resolution: 1080p - A: 13.44 | B: 13.33 | C: 13.36 (SE +/- 0.02, N = 3; SE +/- 0.04, N = 3)
Scene: 1 - Resolution: 1080p - A: 46.03 | B: 45.94 | C: 45.75 (SE +/- 0.09, N = 3; SE +/- 0.04, N = 3)
1. (CXX) g++ options: -O3 -pthread -lm -lstdc++ -lX11 -lXext -lpthread
oneDNN
oneDNN 2.7 (ms, Fewer Is Better)
Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU - A: 3.86625 | B: 3.92386 | C: 3.96322 (MIN: 3.66 / 3.68 / 3.73; SE +/- 0.03093, N = 3; SE +/- 0.00111, N = 3)
Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPU - A: 0.776119 | B: 0.777507 | C: 0.775055 (MIN: 0.7 / 0.71 / 0.7; SE +/- 0.000399, N = 3; SE +/- 0.001575, N = 3)
Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU - A: 0.786243 | B: 0.795473 | C: 0.789910 (MIN: 0.7 / 0.7 / 0.7; SE +/- 0.008536, N = 3; SE +/- 0.002635, N = 3)
Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32 - Engine: CPU - A: 0.681937 | B: 0.682747 | C: 0.682228 (MIN: 0.59 / 0.58 / 0.58; SE +/- 0.002859, N = 3; SE +/- 0.002683, N = 3)
SMHasher 2022-08-22 (cycles/hash, Fewer Is Better)
Hash: FarmHash128 - A: 52.11 | B: 51.83 | C: 51.63 (SE +/- 0.44, N = 3; SE +/- 0.45, N = 3)
Hash: t1ha0_aes_avx2 x86_64 - A: 22.10 | B: 21.95 | C: 22.10 (SE +/- 0.17, N = 9; SE +/- 0.00, N = 3)
1. (CXX) g++ options: -march=native -O3 -flto -fno-fat-lto-objects -lpthread
oneDNN
oneDNN 2.7 (ms, Fewer Is Better)
Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU - A: 8.18095 | B: 7.95740 | C: 9.08580 (MIN: 7.86 / 7.47 / 8.89; SE +/- 0.05903, N = 3; SE +/- 0.00928, N = 3)
Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU - A: 0.467669 | B: 0.459909 | C: 0.473113 (MIN: 0.43 / 0.41 / 0.44; SE +/- 0.004656, N = 3; SE +/- 0.001374, N = 3)
SMHasher 2022-08-22 (cycles/hash, Fewer Is Better)
Hash: FarmHash32 x86_64 AVX - A: 28.73 | B: 28.52 | C: 28.56 (SE +/- 0.01, N = 3; SE +/- 0.02, N = 3)
Hash: t1ha2_atonce - A: 22.88 | B: 23.11 | C: 22.88 (SE +/- 0.27, N = 3; SE +/- 0.43, N = 4)
Hash: fasthash32 - A: 23.51 | B: 24.69 | C: 24.32 (SE +/- 0.10, N = 3; SE +/- 0.16, N = 3)
oneDNN
oneDNN 2.7 (ms, Fewer Is Better)
Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU - A: 17.03 | B: 16.81 | C: 17.12 (MIN: 16.72 / 16.43 / 16.81; SE +/- 0.04, N = 3; SE +/- 0.03, N = 3)
Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU - A: 18.42 | B: 18.23 | C: 18.93 (MIN: 18.06 / 17.9 / 18.48; SE +/- 0.06, N = 3; SE +/- 0.05, N = 3)
Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU - A: 3.57805 | B: 3.63445 | C: 3.60370 (MIN: 3.41 / 3.41 / 3.4; SE +/- 0.00327, N = 3; SE +/- 0.01247, N = 3)
Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU - A: 1.65982 | B: 1.68678 | C: 1.67046 (MIN: 1.53 / 1.52 / 1.54; SE +/- 0.01458, N = 3; SE +/- 0.00419, N = 3)
The following oneDNN bf16bf16bf16 configurations did not produce a result on any run (A, B, or C):
Harness: Matrix Multiply Batch Shapes Transformer - Data Type: bf16bf16bf16 - Engine: CPU
Harness: Deconvolution Batch shapes_3d - Data Type: bf16bf16bf16 - Engine: CPU
Harness: Deconvolution Batch shapes_1d - Data Type: bf16bf16bf16 - Engine: CPU
Harness: Convolution Batch Shapes Auto - Data Type: bf16bf16bf16 - Engine: CPU
Harness: IP Shapes 3D - Data Type: bf16bf16bf16 - Engine: CPU
Harness: IP Shapes 1D - Data Type: bf16bf16bf16 - Engine: CPU
A
Testing initiated at 13 October 2022 19:27 by user phoronix.
B
Testing initiated at 13 October 2022 20:42 by user phoronix.
C
Testing initiated at 14 October 2022 04:46 by user phoronix.