Google Cloud c3 Sapphire Rapids

Benchmarks by Michael Larabel for a future article.

Compare your own system(s) to this result file with the Phoronix Test Suite by running the command: phoronix-test-suite benchmark 2303218-PTS-2303217N47

Result Identifier: c3-highcpu-8 SPR
Date Run: March 21 2023
Test Duration: 13 Hours, 29 Minutes


Google Cloud C3 Sapphire Rapids Performance

Processor: Intel Xeon Platinum 8481C (4 Cores / 8 Threads)
Motherboard: Google Compute Engine c3-highcpu-8
Chipset: Intel 440FX 82441FX PMC
Memory: 16GB
Disk: 322GB nvme_card-pd
Network: Google Compute Engine Virtual
OS: Ubuntu 22.10
Kernel: 5.19.0-1015-gcp (x86_64)
Vulkan: 1.3.224
Compiler: GCC 12.2.0
File-System: ext4
System Layer: KVM

System Logs
- Transparent Huge Pages: madvise
- --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-12-U8K4Qv/gcc-12-12.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-12-U8K4Qv/gcc-12-12.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
- CPU Microcode: 0xffffffff
- Python 3.10.7
- itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced IBRS IBPB: conditional RSB filling PBRSB-eIBRS: SW sequence + srbds: Not affected + tsx_async_abort: Not affected
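To compare your own system against the security mitigation entries in the system logs above, the following is a minimal sketch that reads the same kind of status strings on Linux. It assumes the kernel exposes them as one file per vulnerability under /sys/devices/system/cpu/vulnerabilities; the function names are hypothetical, and this is not how the Phoronix Test Suite itself collects the data.

```python
from pathlib import Path

# Assumed location of the kernel's per-vulnerability status files on Linux.
DEFAULT_VULN_DIR = Path("/sys/devices/system/cpu/vulnerabilities")


def read_mitigations(vuln_dir: Path = DEFAULT_VULN_DIR) -> dict[str, str]:
    """Return {vulnerability_name: status_text} for every file in vuln_dir."""
    if not vuln_dir.is_dir():
        # Non-Linux system, or a kernel that does not provide this directory.
        return {}
    return {
        entry.name: entry.read_text().strip()
        for entry in sorted(vuln_dir.iterdir())
        if entry.is_file()
    }


def format_mitigations(status: dict[str, str]) -> str:
    """Join entries in the 'name: status + name: status' style used in the log."""
    return " + ".join(f"{name}: {text}" for name, text in status.items())


if __name__ == "__main__":
    print(format_mitigations(read_mitigations()))
```

The directory is a parameter so the helper can be pointed at a copy of the files taken from another machine.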

Google Cloud c3 Sapphire Rapids: result overview

Test | c3-highcpu-8 SPR
opencv: Object Detection | 38999
opencv: Image Processing | 128163
opencv: Stitching | 214760
opencv: Graph API | 219931
opencv: Video | 31654
opencv: Core | 87372
brl-cad: VGR Performance Metric | 71072
nginx: 4000 | 32814.75
nginx: 1000 | 32118.58
nginx: 500 | 34672.65
nginx: 200 | 35602.10
nginx: 100 | 36310.35
blender: BMW27 - CPU-Only | 315.24
draco: Church Facade | 7573
draco: Lion | 6250
deepsparse: NLP Token Classification, BERT base uncased conll2003 - Asynchronous Multi-Stream (ms/batch) | 530.6100
deepsparse: NLP Token Classification, BERT base uncased conll2003 - Asynchronous Multi-Stream (items/sec) | 3.7693
deepsparse: NLP Text Classification, BERT base uncased SST2 - Asynchronous Multi-Stream (ms/batch) | 123.2931
deepsparse: NLP Text Classification, BERT base uncased SST2 - Asynchronous Multi-Stream (items/sec) | 16.2194
deepsparse: CV Segmentation, 90% Pruned YOLACT Pruned - Asynchronous Multi-Stream (ms/batch) | 305.8991
deepsparse: CV Segmentation, 90% Pruned YOLACT Pruned - Asynchronous Multi-Stream (items/sec) | 6.5372
deepsparse: NLP Text Classification, DistilBERT mnli - Asynchronous Multi-Stream (ms/batch) | 60.3858
deepsparse: NLP Text Classification, DistilBERT mnli - Asynchronous Multi-Stream (items/sec) | 33.1064
deepsparse: CV Classification, ResNet-50 ImageNet - Asynchronous Multi-Stream (ms/batch) | 30.7952
deepsparse: CV Classification, ResNet-50 ImageNet - Asynchronous Multi-Stream (items/sec) | 64.8747
deepsparse: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Asynchronous Multi-Stream (ms/batch) | 104.3102
deepsparse: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Asynchronous Multi-Stream (items/sec) | 19.1677
deepsparse: NLP Sentiment Analysis, 80% Pruned Quantized BERT Base Uncased - Asynchronous Multi-Stream (ms/batch) | 38.5120
deepsparse: NLP Sentiment Analysis, 80% Pruned Quantized BERT Base Uncased - Asynchronous Multi-Stream (items/sec) | 51.8926
deepsparse: NLP Document Classification, oBERT base uncased on IMDB - Asynchronous Multi-Stream (ms/batch) | 528.0193
deepsparse: NLP Document Classification, oBERT base uncased on IMDB - Asynchronous Multi-Stream (items/sec) | 3.7873
tensorflow: CPU - 64 - ResNet-50 | 15.69
tensorflow: CPU - 32 - ResNet-50 | 14.93
tensorflow: CPU - 16 - ResNet-50 | 14.20
pgbench: 100 - 1000 - Read Only - Average Latency | 3.405
pgbench: 100 - 1000 - Read Only | 293725
pgbench: 100 - 800 - Read Only - Average Latency | 2.565
pgbench: 100 - 800 - Read Only | 311942
mysqlslap: 4096 | 317
mysqlslap: 2048 | 332
gromacs: MPI CPU - water_GMX50_bare | 0.777
memcached: 1:100 | 1030937.28
memcached: 1:10 | 1044947.13
cockroach: KV, 95% Reads - 128 | 24960.1
cockroach: KV, 50% Reads - 128 | 19321.6
cockroach: MoVR - 128 | 458.8
openssl: ChaCha20-Poly1305 | 15970781140
openssl: AES-256-GCM | 48008361573
openssl: AES-128-GCM | 57594077823
openssl: ChaCha20 | 22091557637
openssl: RSA4096 (verify/s) | 67857.7
openssl: RSA4096 (sign/s) | 2062.7
openssl: SHA512 | 1568572920
openssl: SHA256 | 4283873987
ospray-studio: 3 - 4K - 1 - Path Tracer | 25819
ospray-studio: 1 - 4K - 1 - Path Tracer | 20952
onednn: Matrix Multiply Batch Shapes Transformer - bf16bf16bf16 - CPU | 0.968986
onednn: Recurrent Neural Network Inference - bf16bf16bf16 - CPU | 2337.31
onednn: Recurrent Neural Network Training - bf16bf16bf16 - CPU | 4660.70
onednn: Deconvolution Batch shapes_3d - bf16bf16bf16 - CPU | 3.54944
onednn: Deconvolution Batch shapes_1d - bf16bf16bf16 - CPU | 1.47145
onednn: Convolution Batch Shapes Auto - bf16bf16bf16 - CPU | 4.17707
onednn: IP Shapes 3D - bf16bf16bf16 - CPU | 5.34218
onednn: IP Shapes 1D - bf16bf16bf16 - CPU | 1.50004
build-linux-kernel: defconfig | 244.800
build-ffmpeg: Time To Compile | 120.438
compress-7zip: Decompression Rating | 20468
compress-7zip: Compression Rating | 35306
openvkl: vklBenchmark ISPC | 98
oidn: RTLightmap.hdr.4096x4096 | 0.12
oidn: RT.hdr_alb_nrm.3840x2160 | 0.24
uvg266: Bosphorus 1080p - Ultra Fast | 42.24
uvg266: Bosphorus 1080p - Super Fast | 34.50
uvg266: Bosphorus 1080p - Very Fast | 32.39
uvg266: Bosphorus 4K - Ultra Fast | 9.12
uvg266: Bosphorus 4K - Super Fast | 7.48
uvg266: Bosphorus 4K - Very Fast | 6.99
embree: Pathtracer ISPC - Asian Dragon | 7.3642
embree: Pathtracer ISPC - Crown | 5.8475
john-the-ripper: MD5 | 765713
john-the-ripper: HMAC-SHA512 | 37509000
john-the-ripper: Blowfish | 6930
john-the-ripper: WPA PSK | 28818
john-the-ripper: bcrypt | 6932
compress-zstd: 19, Long Mode - Decompression Speed | 907.2
compress-zstd: 19, Long Mode - Compression Speed | 6.5
compress-zstd: 19 - Decompression Speed | 905.2
compress-zstd: 19 - Compression Speed | 10.3
specfem3d: Water-layered Halfspace | 321.625251668
specfem3d: Homogeneous Halfspace | 179.545901627
specfem3d: Tomographic Model | 143.937022875
specfem3d: Layered Halfspace | 372.089480857
specfem3d: Mount St. Helens | 139.524469614
openradioss: Rubber O-Ring Seal Installation | 367.00
openradioss: Bird Strike on Windshield | 595.14
openradioss: Cell Phone Drop Test | 219.73
openradioss: Bumper Beam | 303.38
openfoam: drivaerFastback, Small Mesh Size - Execution Time | 422.67836
openfoam: drivaerFastback, Small Mesh Size - Mesh Time | 62.044277
incompact3d: input.i3d 129 Cells Per Direction | 32.3943163
nekrs: TurboPipe Periodic | 30667900000
namd: ATPase Simulation - 327,506 Atoms | 3.35779
minibude: OpenMP - BM1 | 7.546
minibude: OpenMP - BM1 | 188.651
lczero: Eigen | 1221
lczero: BLAS | 1272

OpenCV

This is a benchmark of the OpenCV (Computer Vision) library's built-in performance tests. Learn more via the OpenBenchmarking.org test page.

OpenCV 4.7, c3-highcpu-8 SPR (ms, Fewer Is Better)
Test: Object Detection | 38999 | SE +/- 384.74, N = 5
Test: Image Processing | 128163 | SE +/- 1624.35, N = 12
Test: Stitching | 214760 | SE +/- 1973.06, N = 7
Test: Graph API | 219931 | SE +/- 931.36, N = 3
Test: Video | 31654 | SE +/- 198.80, N = 3
Test: Core | 87372 | SE +/- 280.31, N = 3
1. (CXX) g++ options: -fsigned-char -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -msse -msse2 -msse3 -fvisibility=hidden -O3 -ldl -lm -lpthread -lrt
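Every result on this page carries an "SE +/- x, N = n" annotation: the standard error across N runs. As a rough guide to reading it, the sketch below computes a standard error from raw run times and expresses a reported SE as a percentage of the mean. It assumes the usual definition (sample standard deviation divided by the square root of N); whether the Phoronix Test Suite uses exactly this estimator is not stated in this result file, and the function names are hypothetical.

```python
import math
import statistics


def standard_error(samples: list[float]) -> float:
    """Standard error of the mean: sample standard deviation / sqrt(N)."""
    return statistics.stdev(samples) / math.sqrt(len(samples))


def relative_se_percent(mean: float, se: float) -> float:
    """Express a standard error as a percentage of the reported mean."""
    return 100.0 * se / mean


# Reported OpenCV figures from this page: (mean in ms, SE, N).
opencv = {
    "Object Detection": (38999, 384.74, 5),
    "Image Processing": (128163, 1624.35, 12),
    "Stitching": (214760, 1973.06, 7),
}

for test, (mean, se, n) in opencv.items():
    print(f"{test}: SE is {relative_se_percent(mean, se):.2f}% of the mean over {n} runs")
```

A relative SE near or below one percent, as for Object Detection here, suggests the runs were fairly consistent; the larger N values on some tests indicate that more runs were taken for those tests.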

BRL-CAD

BRL-CAD is a cross-platform, open-source solid modeling system with built-in benchmark mode. Learn more via the OpenBenchmarking.org test page.

BRL-CAD 7.34, c3-highcpu-8 SPR (VGR Performance Metric, More Is Better)
VGR Performance Metric | 71072
1. (CXX) g++ options: -std=c++14 -pipe -fvisibility=hidden -fno-strict-aliasing -fno-common -fexceptions -ftemplate-depth-128 -m64 -ggdb3 -O3 -fipa-pta -fstrength-reduce -finline-functions -flto -ltcl8.6 -lregex_brl -lz_brl -lnetpbm -ldl -lm -ltk8.6

nginx

This is a benchmark of the lightweight Nginx HTTP(S) web server. This Nginx web server benchmark test profile makes use of the wrk program for facilitating the HTTP requests over a fixed period of time with a configurable number of concurrent clients/connections. HTTPS with a self-signed OpenSSL certificate is used by this test for local benchmarking. Learn more via the OpenBenchmarking.org test page.

nginx 1.23.2, c3-highcpu-8 SPR (Requests Per Second, More Is Better)
Connections: 4000 | 32814.75 | SE +/- 27.93, N = 3
Connections: 1000 | 32118.58 | SE +/- 22.42, N = 3
Connections: 500 | 34672.65 | SE +/- 321.85, N = 3
Connections: 200 | 35602.10 | SE +/- 83.52, N = 3
Connections: 100 | 36310.35 | SE +/- 23.95, N = 3
1. (CC) gcc options: -lluajit-5.1 -lm -lssl -lcrypto -lpthread -ldl -std=c99 -O2

Blender

Blender is an open-source 3D creation and modeling software project. This test is of Blender's Cycles performance with various sample files. GPU computing via NVIDIA OptiX and NVIDIA CUDA is currently supported as well as HIP for AMD Radeon GPUs and Intel oneAPI for Intel Graphics. Learn more via the OpenBenchmarking.org test page.

Blender 3.4, c3-highcpu-8 SPR (Seconds, Fewer Is Better)
Blend File: BMW27 - Compute: CPU-Only | 315.24 | SE +/- 1.13, N = 3

Google Draco

Draco is a library developed by Google for compressing/decompressing 3D geometric meshes and point clouds. This test profile uses some Artec3D PLY models as the sample 3D model input formats for Draco compression/decompression. Learn more via the OpenBenchmarking.org test page.

Google Draco 1.5.6, c3-highcpu-8 SPR (ms, Fewer Is Better)
Model: Church Facade | 7573 | SE +/- 10.48, N = 3
Model: Lion | 6250 | SE +/- 80.23, N = 15
1. (CXX) g++ options: -O3

Neural Magic DeepSparse

This is a benchmark of Neural Magic's DeepSparse using its built-in deepsparse.benchmark utility and various models from their SparseZoo (https://sparsezoo.neuralmagic.com/). Learn more via the OpenBenchmarking.org test page.

Neural Magic DeepSparse 1.3.2, c3-highcpu-8 SPR (ms/batch: Fewer Is Better; items/sec: More Is Better)
Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Asynchronous Multi-Stream | ms/batch | 530.61 | SE +/- 3.34, N = 3
Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Asynchronous Multi-Stream | items/sec | 3.7693 | SE +/- 0.0239, N = 3
Model: NLP Text Classification, BERT base uncased SST2 - Scenario: Asynchronous Multi-Stream | ms/batch | 123.29 | SE +/- 0.95, N = 3
Model: NLP Text Classification, BERT base uncased SST2 - Scenario: Asynchronous Multi-Stream | items/sec | 16.22 | SE +/- 0.13, N = 3
Model: CV Segmentation, 90% Pruned YOLACT Pruned - Scenario: Asynchronous Multi-Stream | ms/batch | 305.90 | SE +/- 0.37, N = 3
Model: CV Segmentation, 90% Pruned YOLACT Pruned - Scenario: Asynchronous Multi-Stream | items/sec | 6.5372 | SE +/- 0.0078, N = 3
Model: NLP Text Classification, DistilBERT mnli - Scenario: Asynchronous Multi-Stream | ms/batch | 60.39 | SE +/- 0.28, N = 3
Model: NLP Text Classification, DistilBERT mnli - Scenario: Asynchronous Multi-Stream | items/sec | 33.11 | SE +/- 0.15, N = 3
Model: CV Classification, ResNet-50 ImageNet - Scenario: Asynchronous Multi-Stream | ms/batch | 30.80 | SE +/- 0.02, N = 3
Model: CV Classification, ResNet-50 ImageNet - Scenario: Asynchronous Multi-Stream | items/sec | 64.87 | SE +/- 0.04, N = 3
Model: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Scenario: Asynchronous Multi-Stream | ms/batch | 104.31 | SE +/- 0.14, N = 3
Model: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Scenario: Asynchronous Multi-Stream | items/sec | 19.17 | SE +/- 0.03, N = 3
Model: NLP Sentiment Analysis, 80% Pruned Quantized BERT Base Uncased - Scenario: Asynchronous Multi-Stream | ms/batch | 38.51 | SE +/- 0.04, N = 3
Model: NLP Sentiment Analysis, 80% Pruned Quantized BERT Base Uncased - Scenario: Asynchronous Multi-Stream | items/sec | 51.89 | SE +/- 0.05, N = 3
Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Asynchronous Multi-Stream | ms/batch | 528.02 | SE +/- 1.78, N = 3
Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Asynchronous Multi-Stream | items/sec | 3.7873 | SE +/- 0.0130, N = 3
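The DeepSparse results pair a latency figure (ms/batch) with a throughput figure (items/sec) for each model. Multiplying the two gives the average number of items in flight at once (Little's law). The sketch below applies that to the reported numbers; the function name is hypothetical, and reading the product as a count of concurrent streams is an inference from the figures, not something stated in this result file.

```python
def items_in_flight(latency_ms: float, items_per_sec: float) -> float:
    """Little's law: average concurrency = throughput x latency."""
    return items_per_sec * (latency_ms / 1000.0)


# (ms/batch, items/sec) as reported for c3-highcpu-8 SPR on this page.
RESULTS = {
    "Token Classification, BERT conll2003": (530.61, 3.7693),
    "Text Classification, BERT SST2": (123.29, 16.22),
    "CV Segmentation, YOLACT": (305.90, 6.5372),
    "Text Classification, DistilBERT mnli": (60.39, 33.11),
    "CV Classification, ResNet-50": (30.80, 64.87),
    "Question Answering, BERT SQuaD": (104.31, 19.17),
    "Sentiment Analysis, BERT": (38.51, 51.89),
    "Document Classification, oBERT IMDB": (528.02, 3.7873),
}

for model, (latency_ms, throughput) in RESULTS.items():
    print(f"{model}: ~{items_in_flight(latency_ms, throughput):.2f} items in flight")
```

For every model the product comes out very close to 2, which is consistent with roughly two requests being processed concurrently on this 4-core / 8-thread instance. The latency and throughput columns are therefore two views of the same measurement, not independent results.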

TensorFlow

This is a benchmark of the TensorFlow deep learning framework using the TensorFlow reference benchmarks (tensorflow/benchmarks with tf_cnn_benchmarks.py). Note that the Phoronix Test Suite also provides pts/tensorflow-lite for benchmarking the TensorFlow Lite binaries. Learn more via the OpenBenchmarking.org test page.

TensorFlow 2.10, c3-highcpu-8 SPR (images/sec, More Is Better)
Device: CPU - Batch Size: 64 - Model: ResNet-50 | 15.69 | SE +/- 0.01, N = 3
Device: CPU - Batch Size: 32 - Model: ResNet-50 | 14.93 | SE +/- 0.01, N = 3
Device: CPU - Batch Size: 16 - Model: ResNet-50 | 14.20 | SE +/- 0.04, N = 3

PostgreSQL

This is a benchmark of PostgreSQL using the integrated pgbench for facilitating the database benchmarks. Learn more via the OpenBenchmarking.org test page.

PostgreSQL 15, c3-highcpu-8 SPR (ms: Fewer Is Better; TPS: More Is Better)
Scaling Factor: 100 - Clients: 1000 - Mode: Read Only - Average Latency | ms | 3.405 | SE +/- 0.028, N = 3
Scaling Factor: 100 - Clients: 1000 - Mode: Read Only | TPS | 293725 | SE +/- 2414.33, N = 3
Scaling Factor: 100 - Clients: 800 - Mode: Read Only - Average Latency | ms | 2.565 | SE +/- 0.028, N = 3
Scaling Factor: 100 - Clients: 800 - Mode: Read Only | TPS | 311942 | SE +/- 3369.27, N = 3
1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lm

MariaDB

This is a MariaDB MySQL database server benchmark making use of mysqlslap. Learn more via the OpenBenchmarking.org test page.

MariaDB 11.0.1, c3-highcpu-8 SPR (Queries Per Second, More Is Better)
Clients: 4096 | 317 | SE +/- 2.62, N = 3
Clients: 2048 | 332 | SE +/- 3.01, N = 3
1. (CXX) g++ options: -pie -fPIC -fstack-protector -O3 -lnuma -lcrypt -lz -lm -lssl -lcrypto -lpthread -ldl

GROMACS

This test runs the GROMACS (GROningen MAchine for Chemical Simulations) molecular dynamics package with the water_GMX50 data. This test profile allows selecting between CPU and GPU-based GROMACS builds. Learn more via the OpenBenchmarking.org test page.

GROMACS 2023, c3-highcpu-8 SPR (Ns Per Day, More Is Better)
Implementation: MPI CPU - Input: water_GMX50_bare | 0.777 | SE +/- 0.001, N = 3
1. (CXX) g++ options: -O3

Memcached

Memcached is a high-performance, distributed memory object caching system. This Memcached test profile makes use of memtier_benchmark for executing this CPU/memory-focused server benchmark. Learn more via the OpenBenchmarking.org test page.

Memcached 1.6.18, c3-highcpu-8 SPR (Ops/sec, More Is Better)
Set To Get Ratio: 1:100 | 1030937.28 | SE +/- 11157.73, N = 3
Set To Get Ratio: 1:10 | 1044947.13 | SE +/- 3280.78, N = 3
1. (CXX) g++ options: -O2 -levent_openssl -levent -lcrypto -lssl -lpthread -lz -lpcre

CockroachDB

CockroachDB is a cloud-native, distributed SQL database for data-intensive applications. This test profile uses a server-less CockroachDB configuration to test various Cockroach workloads on the local host with a single node. Learn more via the OpenBenchmarking.org test page.

CockroachDB 22.2, c3-highcpu-8 SPR (ops/s, More Is Better)
Workload: KV, 95% Reads - Concurrency: 128 | 24960.1 | SE +/- 127.78, N = 3
Workload: KV, 50% Reads - Concurrency: 128 | 19321.6 | SE +/- 40.86, N = 3
Workload: MoVR - Concurrency: 128 | 458.8 | SE +/- 5.62, N = 15

OpenSSL

OpenSSL is an open-source toolkit that implements SSL (Secure Sockets Layer) and TLS (Transport Layer Security) protocols. This test profile makes use of the built-in "openssl speed" benchmarking capabilities. Learn more via the OpenBenchmarking.org test page.

OpenSSL 3.1, c3-highcpu-8 SPR (More Is Better)
Algorithm: ChaCha20-Poly1305 | byte/s | 15970781140 | SE +/- 21990039.72, N = 3
Algorithm: AES-256-GCM | byte/s | 48008361573 | SE +/- 48074910.93, N = 3
Algorithm: AES-128-GCM | byte/s | 57594077823 | SE +/- 33921495.47, N = 3
Algorithm: ChaCha20 | byte/s | 22091557637 | SE +/- 35781789.46, N = 3
Algorithm: RSA4096 | verify/s | 67857.7 | SE +/- 8.53, N = 3
Algorithm: RSA4096 | sign/s | 2062.7 | SE +/- 1.17, N = 3
Algorithm: SHA512 | byte/s | 1568572920 | SE +/- 1152941.98, N = 3
Algorithm: SHA256 | byte/s | 4283873987 | SE +/- 2722265.85, N = 3
1. (CC) gcc options: -pthread -m64 -O3 -lssl -lcrypto -ldl
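The OpenSSL throughput figures are reported in raw byte/s, which is hard to read at eleven digits. The sketch below converts the reported values to GiB/s and computes the ratio of RSA4096 verify to sign operations. It only restates numbers from this page; the helper name and the choice of binary GiB (rather than decimal GB) are assumptions made for illustration.

```python
GIB = 1024 ** 3  # bytes per GiB (binary units, chosen for this example)


def to_gib_per_sec(bytes_per_sec: float) -> float:
    """Convert a byte/s rate to GiB/s."""
    return bytes_per_sec / GIB


# byte/s results as reported for c3-highcpu-8 SPR.
THROUGHPUT = {
    "ChaCha20-Poly1305": 15970781140,
    "AES-256-GCM": 48008361573,
    "AES-128-GCM": 57594077823,
    "ChaCha20": 22091557637,
    "SHA512": 1568572920,
    "SHA256": 4283873987,
}

RSA4096_VERIFY_PER_SEC = 67857.7
RSA4096_SIGN_PER_SEC = 2062.7

for algorithm, rate in THROUGHPUT.items():
    print(f"{algorithm}: {to_gib_per_sec(rate):.2f} GiB/s")

verify_to_sign = RSA4096_VERIFY_PER_SEC / RSA4096_SIGN_PER_SEC
print(f"RSA4096: about {verify_to_sign:.1f} verifications per signature")
```

On these numbers AES-128-GCM runs at roughly 53.6 GiB/s, and RSA4096 verification is about 33 times faster than signing.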

OSPRay Studio

Intel OSPRay Studio is an open-source, interactive visualization and ray-tracing software package. OSPRay Studio makes use of Intel OSPRay, a portable ray-tracing engine for high-performance, high-fidelity visualizations. OSPRay builds off Intel's Embree and Intel SPMD Program Compiler (ISPC) components as part of the oneAPI rendering toolkit. Learn more via the OpenBenchmarking.org test page.

OSPRay Studio 0.11, c3-highcpu-8 SPR (ms, Fewer Is Better)
Camera: 3 - Resolution: 4K - Samples Per Pixel: 1 - Renderer: Path Tracer | 25819 | SE +/- 364.36, N = 3
Camera: 1 - Resolution: 4K - Samples Per Pixel: 1 - Renderer: Path Tracer | 20952 | SE +/- 4.91, N = 3
1. (CXX) g++ options: -O3 -lm -ldl

oneDNN

This is a test of Intel oneDNN, an Intel-optimized library for Deep Neural Networks, making use of its built-in benchdnn functionality. The result is the total perf time reported. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and MKL-DNN before being rebranded as part of the Intel oneAPI toolkit. Learn more via the OpenBenchmarking.org test page.

oneDNN 3.0, c3-highcpu-8 SPR (ms, Fewer Is Better)
Harness: Matrix Multiply Batch Shapes Transformer - Data Type: bf16bf16bf16 - Engine: CPU | 0.968986 | SE +/- 0.013114, N = 3 | MIN: 0.92
Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU | 2337.31 | SE +/- 1.76, N = 3 | MIN: 2326.77
Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU | 4660.70 | SE +/- 0.86, N = 3 | MIN: 4648.25
Harness: Deconvolution Batch shapes_3d - Data Type: bf16bf16bf16 - Engine: CPU | 3.54944 | SE +/- 0.01072, N = 3 | MIN: 3.47
Harness: Deconvolution Batch shapes_1d - Data Type: bf16bf16bf16 - Engine: CPU | 1.47145 | SE +/- 0.00472, N = 3 | MIN: 1.4
Harness: Convolution Batch Shapes Auto - Data Type: bf16bf16bf16 - Engine: CPU | 4.17707 | SE +/- 0.00870, N = 3 | MIN: 4.04
Harness: IP Shapes 3D - Data Type: bf16bf16bf16 - Engine: CPU | 5.34218 | SE +/- 0.00535, N = 3 | MIN: 4.94
Harness: IP Shapes 1D - Data Type: bf16bf16bf16 - Engine: CPU | 1.50004 | SE +/- 0.00340, N = 3 | MIN: 1.37
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl

Timed Linux Kernel Compilation

This test times how long it takes to build the Linux kernel in a default configuration (defconfig) for the architecture being tested or alternatively an allmodconfig for building all possible kernel modules for the build. Learn more via the OpenBenchmarking.org test page.

Timed Linux Kernel Compilation 6.1, c3-highcpu-8 SPR (Seconds, Fewer Is Better)
Build: defconfig | 244.80 | SE +/- 0.64, N = 3

Timed FFmpeg Compilation

This test times how long it takes to build the FFmpeg multimedia library. Learn more via the OpenBenchmarking.org test page.

Timed FFmpeg Compilation 6.0, c3-highcpu-8 SPR (Seconds, Fewer Is Better)
Time To Compile | 120.44 | SE +/- 0.10, N = 3

7-Zip Compression

This is a test of 7-Zip compression/decompression with its integrated benchmark feature. Learn more via the OpenBenchmarking.org test page.

7-Zip Compression 22.01, c3-highcpu-8 SPR (MIPS, More Is Better)
Test: Decompression Rating | 20468 | SE +/- 183.58, N = 15
Test: Compression Rating | 35306 | SE +/- 246.17, N = 15
1. (CXX) g++ options: -lpthread -ldl -O2 -fPIC

OpenVKL

OpenVKL is the Intel Open Volume Kernel Library, which offers high-performance volume computation kernels and is part of the Intel oneAPI rendering toolkit. Learn more via the OpenBenchmarking.org test page.

OpenVKL 1.3.1, c3-highcpu-8 SPR (Items / Sec, More Is Better)
Benchmark: vklBenchmark ISPC | 98 | SE +/- 0.33, N = 3 | MIN: 11 / MAX: 1579

Intel Open Image Denoise

Open Image Denoise is a denoising library for ray-tracing and is part of the oneAPI rendering toolkit. Learn more via the OpenBenchmarking.org test page.

Intel Open Image Denoise 1.4.0, c3-highcpu-8 SPR (Images / Sec, More Is Better)
Run: RTLightmap.hdr.4096x4096 | 0.12 | SE +/- 0.00, N = 3
Run: RT.hdr_alb_nrm.3840x2160 | 0.24 | SE +/- 0.00, N = 3

VVenC

VVenC 1.7, c3-highcpu-8 SPR (Frames Per Second, More Is Better)
Video Input: Bosphorus 1080p - Video Preset: Faster | 12.92 | SE +/- 0.01, N = 3
Video Input: Bosphorus 1080p - Video Preset: Fast | 5.305 | SE +/- 0.018, N = 3
Video Input: Bosphorus 4K - Video Preset: Faster | 3.524 | SE +/- 0.000, N = 3
Video Input: Bosphorus 4K - Video Preset: Fast | 1.556 | SE +/- 0.005, N = 3
1. (CXX) g++ options: -O3 -flto=auto -fno-fat-lto-objects

uvg266

uvg266 is an open-source VVC/H.266 (Versatile Video Coding) encoder based on Kvazaar as part of the Ultra Video Group, Tampere University, Finland. Learn more via the OpenBenchmarking.org test page.

uvg266 0.4.1, c3-highcpu-8 SPR (Frames Per Second, More Is Better)
Video Input: Bosphorus 1080p - Video Preset: Ultra Fast | 42.24 | SE +/- 0.01, N = 3
Video Input: Bosphorus 1080p - Video Preset: Super Fast | 34.50 | SE +/- 0.03, N = 3
Video Input: Bosphorus 1080p - Video Preset: Very Fast | 32.39 | SE +/- 0.25, N = 3
Video Input: Bosphorus 4K - Video Preset: Ultra Fast | 9.12 | SE +/- 0.01, N = 3
Video Input: Bosphorus 4K - Video Preset: Super Fast | 7.48 | SE +/- 0.00, N = 3
Video Input: Bosphorus 4K - Video Preset: Very Fast | 6.99 | SE +/- 0.03, N = 3

Embree

Embree 4.0.1, c3-highcpu-8 SPR (Frames Per Second, More Is Better)
Binary: Pathtracer ISPC - Model: Asian Dragon | 7.3642 | SE +/- 0.0079, N = 3 | MIN: 7.33 / MAX: 7.44
Binary: Pathtracer ISPC - Model: Crown | 5.8475 | SE +/- 0.0120, N = 3 | MIN: 5.81 / MAX: 5.92

John The Ripper

This is a benchmark of John The Ripper, which is a password cracker. Learn more via the OpenBenchmarking.org test page.

John The Ripper 2023.03.14, c3-highcpu-8 SPR (Real C/S, More Is Better)
Test: MD5 | 765713 | SE +/- 1472.13, N = 3
Test: HMAC-SHA512 | 37509000 | SE +/- 21071.31, N = 3
Test: Blowfish | 6930 | SE +/- 2.08, N = 3
Test: WPA PSK | 28818 | SE +/- 27.15, N = 3
Test: bcrypt | 6932 | SE +/- 0.67, N = 3
1. (CC) gcc options: -m64 -lssl -lcrypto -fopenmp -lm -lrt -lz -ldl -lcrypt

Zstd Compression

This test measures the time needed to compress/decompress a sample file (silesia.tar) using Zstd (Zstandard) compression with options for different compression levels / settings. Learn more via the OpenBenchmarking.org test page.

Zstd Compression 1.5.4 - Compression Level: 19, Long Mode - Decompression Speed (MB/s, More Is Better) - c3-highcpu-8 SPR: 907.2 (SE +/- 1.48, N = 3)

Zstd Compression 1.5.4 - Compression Level: 19, Long Mode - Compression Speed (MB/s, More Is Better) - c3-highcpu-8 SPR: 6.5 (SE +/- 0.00, N = 3)

Zstd Compression 1.5.4 - Compression Level: 19 - Decompression Speed (MB/s, More Is Better) - c3-highcpu-8 SPR: 905.2 (SE +/- 1.50, N = 3)

Zstd Compression 1.5.4 - Compression Level: 19 - Compression Speed (MB/s, More Is Better) - c3-highcpu-8 SPR: 10.3 (SE +/- 0.12, N = 3)

1. (CC) gcc options: -O3 -pthread -lz -llzma

SPECFEM3D

SPECFEM3D simulates acoustic (fluid), elastic (solid), coupled acoustic/elastic, poroelastic or seismic wave propagation in any type of conforming mesh of hexahedra. This test profile currently relies on CPU-based execution for SPECFEM3D and uses a variety of its built-in examples/models for benchmarking. Learn more via the OpenBenchmarking.org test page.

SPECFEM3D 4.0 - Model: Water-layered Halfspace (Seconds, Fewer Is Better) - c3-highcpu-8 SPR: 321.63 (SE +/- 0.31, N = 3)

SPECFEM3D 4.0 - Model: Homogeneous Halfspace (Seconds, Fewer Is Better) - c3-highcpu-8 SPR: 179.55 (SE +/- 0.09, N = 3)

SPECFEM3D 4.0 - Model: Tomographic Model (Seconds, Fewer Is Better) - c3-highcpu-8 SPR: 143.94 (SE +/- 1.65, N = 3)

SPECFEM3D 4.0 - Model: Layered Halfspace (Seconds, Fewer Is Better) - c3-highcpu-8 SPR: 372.09 (SE +/- 0.64, N = 3)

SPECFEM3D 4.0 - Model: Mount St. Helens (Seconds, Fewer Is Better) - c3-highcpu-8 SPR: 139.52 (SE +/- 0.10, N = 3)

1. (F9X) gfortran options: -O2 -fopenmp -std=f2003 -fimplicit-none -fmax-errors=10 -pedantic -pedantic-errors -O3 -finline-functions -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

OpenRadioss

OpenRadioss is an open-source AGPL-licensed finite element solver for dynamic event analysis. OpenRadioss is based on Altair Radioss and was open-sourced in 2022. This open-source finite element solver is benchmarked with various example models available from https://www.openradioss.org/models/. This test is currently using a reference OpenRadioss binary build offered via GitHub. Learn more via the OpenBenchmarking.org test page.

OpenRadioss 2022.10.13 - Model: Rubber O-Ring Seal Installation (Seconds, Fewer Is Better) - c3-highcpu-8 SPR: 367.00 (SE +/- 0.20, N = 3)

OpenRadioss 2022.10.13 - Model: Bird Strike on Windshield (Seconds, Fewer Is Better) - c3-highcpu-8 SPR: 595.14 (SE +/- 0.30, N = 3)

OpenRadioss 2022.10.13 - Model: Cell Phone Drop Test (Seconds, Fewer Is Better) - c3-highcpu-8 SPR: 219.73 (SE +/- 0.38, N = 3)

OpenRadioss 2022.10.13 - Model: Bumper Beam (Seconds, Fewer Is Better) - c3-highcpu-8 SPR: 303.38 (SE +/- 0.51, N = 3)

OpenFOAM

OpenFOAM is the leading free, open-source software for computational fluid dynamics (CFD). This test profile currently uses the drivaerFastback test case for analyzing automotive aerodynamics or alternatively the older motorBike input. Learn more via the OpenBenchmarking.org test page.

OpenFOAM 10 - Input: drivaerFastback, Small Mesh Size - Execution Time (Seconds, Fewer Is Better) - c3-highcpu-8 SPR: 422.68

OpenFOAM 10 - Input: drivaerFastback, Small Mesh Size - Mesh Time (Seconds, Fewer Is Better) - c3-highcpu-8 SPR: 62.04

1. (CXX) g++ options: -std=c++14 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -lphysicalProperties -lspecie -lfiniteVolume -lfvModels -lgenericPatchFields -lmeshTools -lsampling -lOpenFOAM -ldl -lm

Xcompact3d Incompact3d

Xcompact3d Incompact3d is a Fortran-MPI based, finite difference high-performance code for solving the incompressible Navier-Stokes equations and as many scalar transport equations as needed. Learn more via the OpenBenchmarking.org test page.

Xcompact3d Incompact3d 2021-03-11 - Input: input.i3d 129 Cells Per Direction (Seconds, Fewer Is Better) - c3-highcpu-8 SPR: 32.39 (SE +/- 0.02, N = 3)

1. (F9X) gfortran options: -cpp -O2 -funroll-loops -floop-optimize -fcray-pointer -fbacktrace -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

nekRS

nekRS is an open-source Navier-Stokes solver based on the spectral element method. nekRS supports both CPU and GPU/accelerator execution, though this test profile is currently configured for CPU execution. nekRS is part of Nek5000 from the Mathematics and Computer Science (MCS) division at Argonne National Laboratory. This nekRS benchmark is primarily relevant to large core count HPC servers and otherwise may be very time consuming. Learn more via the OpenBenchmarking.org test page.

nekRS 22.0 - Input: TurboPipe Periodic (FLOP/s, More Is Better) - c3-highcpu-8 SPR: 30667900000 (SE +/- 92013205.57, N = 3)

1. (CXX) g++ options: -fopenmp -O2 -march=native -mtune=native -ftree-vectorize -lmpi_cxx -lmpi

NAMD

NAMD is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. NAMD was developed by the Theoretical and Computational Biophysics Group in the Beckman Institute for Advanced Science and Technology at the University of Illinois at Urbana-Champaign. Learn more via the OpenBenchmarking.org test page.

NAMD 2.14 - ATPase Simulation - 327,506 Atoms (days/ns, Fewer Is Better) - c3-highcpu-8 SPR: 3.35779 (SE +/- 0.00254, N = 3)
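The NAMD result is reported in days/ns (days of wall-clock time per simulated nanosecond), where fewer is better. Molecular dynamics throughput is often quoted the other way around, in ns/day, which is simply the reciprocal. A minimal sketch of that conversion follows, using the 3.35779 days/ns figure from this page; the function name is hypothetical.

```python
def days_per_ns_to_ns_per_day(days_per_ns: float) -> float:
    """Convert a days/ns figure (lower is better) to ns/day (higher is better)."""
    if days_per_ns <= 0:
        raise ValueError("days_per_ns must be positive")
    return 1.0 / days_per_ns


# NAMD 2.14, ATPase simulation, c3-highcpu-8 SPR result from this page
namd_days_per_ns = 3.35779
ns_per_day = days_per_ns_to_ns_per_day(namd_days_per_ns)
print(f"{namd_days_per_ns} days/ns is about {ns_per_day:.3f} ns/day")
```

On this 8-thread instance the result works out to roughly 0.3 ns of simulated time per day.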

miniBUDE

MiniBUDE is a mini application for the core computation of the Bristol University Docking Engine (BUDE). This test profile currently makes use of the OpenMP implementation of miniBUDE for CPU benchmarking. Learn more via the OpenBenchmarking.org test page.

miniBUDE 20210901 - Implementation: OpenMP - Input Deck: BM1 (Billion Interactions/s, More Is Better) - c3-highcpu-8 SPR: 7.546 (SE +/- 0.001, N = 3)

miniBUDE 20210901 - Implementation: OpenMP - Input Deck: BM1 (GFInst/s, More Is Better) - c3-highcpu-8 SPR: 188.65 (SE +/- 0.03, N = 3)

1. (CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm

LeelaChessZero

LeelaChessZero (lc0 / lczero) is a chess engine automated via neural networks. This test profile can be used for OpenCL, CUDA + cuDNN, and BLAS (CPU-based) benchmarking. Learn more via the OpenBenchmarking.org test page.

LeelaChessZero 0.28 - Backend: Eigen (Nodes Per Second, More Is Better) - c3-highcpu-8 SPR: 1221 (SE +/- 7.69, N = 3)

LeelaChessZero 0.28 - Backend: BLAS (Nodes Per Second, More Is Better) - c3-highcpu-8 SPR: 1272 (SE +/- 16.59, N = 3)

1. (CXX) g++ options: -flto -pthread
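The two LeelaChessZero backends land close together (1221 vs 1272 nodes per second), so it helps to set the gap against the reported standard errors. The sketch below computes the percent difference and a combined standard error for the difference. It assumes the Eigen and BLAS runs are independent, which this page does not state, and with N = 3 per backend it is only a rough sanity check, not a significance test. The function names are hypothetical.

```python
import math


def percent_difference(baseline: float, candidate: float) -> float:
    """Percent change of candidate relative to baseline."""
    return (candidate - baseline) / baseline * 100.0


def combined_standard_error(se_a: float, se_b: float) -> float:
    """SE of a difference of two means, assuming the two samples are independent."""
    return math.hypot(se_a, se_b)


# LeelaChessZero 0.28 results from this page (nodes per second, SE)
eigen_nps, eigen_se = 1221.0, 7.69
blas_nps, blas_se = 1272.0, 16.59

gap = blas_nps - eigen_nps
gap_se = combined_standard_error(eigen_se, blas_se)

print(f"BLAS vs Eigen: {percent_difference(eigen_nps, blas_nps):+.1f}%")
print(f"Gap of {gap:.0f} nodes/s is {gap / gap_se:.1f}x the combined SE ({gap_se:.1f})")
```

Under those assumptions the BLAS lead is about 4% and a bit under three combined standard errors, so it is likely real but small.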

110 Results Shown

OpenCV:
  Object Detection
  Image Processing
  Stitching
  Graph API
  Video
  Core
BRL-CAD
nginx:
  4000
  1000
  500
  200
  100
Blender
Google Draco:
  Church Facade
  Lion
Neural Magic DeepSparse:
  NLP Token Classification, BERT base uncased conll2003 - Asynchronous Multi-Stream:
    ms/batch
    items/sec
  NLP Text Classification, BERT base uncased SST2 - Asynchronous Multi-Stream:
    ms/batch
    items/sec
  CV Segmentation, 90% Pruned YOLACT Pruned - Asynchronous Multi-Stream:
    ms/batch
    items/sec
  NLP Text Classification, DistilBERT mnli - Asynchronous Multi-Stream:
    ms/batch
    items/sec
  CV Classification, ResNet-50 ImageNet - Asynchronous Multi-Stream:
    ms/batch
    items/sec
  NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Asynchronous Multi-Stream:
    ms/batch
    items/sec
  NLP Sentiment Analysis, 80% Pruned Quantized BERT Base Uncased - Asynchronous Multi-Stream:
    ms/batch
    items/sec
  NLP Document Classification, oBERT base uncased on IMDB - Asynchronous Multi-Stream:
    ms/batch
    items/sec
TensorFlow:
  CPU - 64 - ResNet-50
  CPU - 32 - ResNet-50
  CPU - 16 - ResNet-50
PostgreSQL:
  100 - 1000 - Read Only - Average Latency
  100 - 1000 - Read Only
  100 - 800 - Read Only - Average Latency
  100 - 800 - Read Only
MariaDB:
  4096
  2048
GROMACS
Memcached:
  1:100
  1:10
CockroachDB:
  KV, 95% Reads - 128
  KV, 50% Reads - 128
  MoVR - 128
OpenSSL:
  ChaCha20-Poly1305
  AES-256-GCM
  AES-128-GCM
  ChaCha20
  RSA4096
  RSA4096
  SHA512
  SHA256
OSPRay Studio:
  3 - 4K - 1 - Path Tracer
  1 - 4K - 1 - Path Tracer
oneDNN:
  Matrix Multiply Batch Shapes Transformer - bf16bf16bf16 - CPU
  Recurrent Neural Network Inference - bf16bf16bf16 - CPU
  Recurrent Neural Network Training - bf16bf16bf16 - CPU
  Deconvolution Batch shapes_3d - bf16bf16bf16 - CPU
  Deconvolution Batch shapes_1d - bf16bf16bf16 - CPU
  Convolution Batch Shapes Auto - bf16bf16bf16 - CPU
  IP Shapes 3D - bf16bf16bf16 - CPU
  IP Shapes 1D - bf16bf16bf16 - CPU
Timed Linux Kernel Compilation
Timed FFmpeg Compilation
7-Zip Compression:
  Decompression Rating
  Compression Rating
OpenVKL
Intel Open Image Denoise:
  RTLightmap.hdr.4096x4096
  RT.hdr_alb_nrm.3840x2160
VVenC:
  Bosphorus 1080p - Faster
  Bosphorus 1080p - Fast
  Bosphorus 4K - Faster
  Bosphorus 4K - Fast
uvg266:
  Bosphorus 1080p - Ultra Fast
  Bosphorus 1080p - Super Fast
  Bosphorus 1080p - Very Fast
  Bosphorus 4K - Ultra Fast
  Bosphorus 4K - Super Fast
  Bosphorus 4K - Very Fast
Embree:
  Pathtracer ISPC - Asian Dragon
  Pathtracer ISPC - Crown
John The Ripper:
  MD5
  HMAC-SHA512
  Blowfish
  WPA PSK
  bcrypt
Zstd Compression:
  19, Long Mode - Decompression Speed
  19, Long Mode - Compression Speed
  19 - Decompression Speed
  19 - Compression Speed
SPECFEM3D:
  Water-layered Halfspace
  Homogeneous Halfspace
  Tomographic Model
  Layered Halfspace
  Mount St. Helens
OpenRadioss:
  Rubber O-Ring Seal Installation
  Bird Strike on Windshield
  Cell Phone Drop Test
  Bumper Beam
OpenFOAM:
  drivaerFastback, Small Mesh Size - Execution Time
  drivaerFastback, Small Mesh Size - Mesh Time
Xcompact3d Incompact3d
nekRS
NAMD
miniBUDE:
  OpenMP - BM1:
    Billion Interactions/s
    GFInst/s
LeelaChessZero:
  Eigen
  BLAS