new xeon

Tests for a future article. Intel Xeon Gold 6421N testing with a Quanta Cloud S6Q-MB-MPS (3A10.uh BIOS) and ASPEED on Ubuntu 22.04 via the Phoronix Test Suite.

Compare your own system(s) to this result file with the Phoronix Test Suite by running the command: phoronix-test-suite benchmark 2307315-NE-NEWXEON9432
Jump To Table - Results

View

Do Not Show Noisy Results
Do Not Show Results With Incomplete Data
Do Not Show Results With Little Change/Spread
List Notable Results
Show Result Confidence Charts

Limit displaying results to tests within:

C++ Boost Tests 2 Tests
Timed Code Compilation 4 Tests
C/C++ Compiler Tests 3 Tests
CPU Massive 7 Tests
Creator Workloads 3 Tests
Database Test Suite 3 Tests
Fortran Tests 2 Tests
HPC - High Performance Computing 3 Tests
Multi-Core 8 Tests
OpenMPI Tests 5 Tests
Programmer / Developer System Benchmarks 4 Tests
Python Tests 2 Tests
Software Defined Radio 2 Tests
Server 3 Tests
Server CPU Tests 5 Tests
Common Workstation Benchmarks 2 Tests

Statistics

Show Overall Harmonic Mean(s)
Show Overall Geometric Mean
Show Geometric Means Per-Suite/Category
Show Wins / Losses Counts (Pie Chart)
Normalize Results
Remove Outliers Before Calculating Averages

Graph Settings

Force Line Graphs Where Applicable
Convert To Scalar Where Applicable
Prefer Vertical Bar Graphs

Multi-Way Comparison

Condense Multi-Option Tests Into Single Result Graphs

Table

Show Detailed System Result Table

Run Management

Highlight
Result
Hide
Result
Result
Identifier
View Logs
Performance Per
Dollar
Date
Run
  Test
  Duration
a
July 30 2023
  5 Hours, 55 Minutes
b
July 31 2023
  5 Hours, 22 Minutes
Invert Hiding All Results Option
  5 Hours, 38 Minutes
Only show results matching title/arguments (delimit multiple options with a comma):
Do not show results matching title/arguments (delimit multiple options with a comma):


new xeonOpenBenchmarking.orgPhoronix Test SuiteIntel Xeon Gold 6421N @ 3.60GHz (32 Cores / 64 Threads)Quanta Cloud S6Q-MB-MPS (3A10.uh BIOS)Intel Device 1bce512GB3 x 3841GB Micron_9300_MTFDHAL3T8TDPASPEEDVGA HDMI4 x Intel E810-C for QSFPUbuntu 22.045.15.0-47-generic (x86_64)GNOME Shell 42.4X Server 1.21.1.31.2.204GCC 11.2.0ext41600x1200ProcessorMotherboardChipsetMemoryDiskGraphicsMonitorNetworkOSKernelDesktopDisplay ServerVulkanCompilerFile-SystemScreen ResolutionNew Xeon BenchmarksSystem Logs- Transparent Huge Pages: madvise- --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-11-gBFGDP/gcc-11-11.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-gBFGDP/gcc-11-11.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v - Scaling Governor: intel_pstate performance (EPP: performance) - CPU Microcode: 0x2b0000c0 - OpenJDK Runtime Environment (build 11.0.16+8-post-Ubuntu-0ubuntu122.04)- Python 3.10.6- itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced IBRS IBPB: conditional RSB filling + srbds: Not affected + tsx_async_abort: Not affected

a vs. b ComparisonPhoronix Test SuiteBaseline+9.5%+9.5%+19%+19%+28.5%+28.5%37.8%22.7%7.1%6.5%6.2%6.2%5.8%4.8%4.4%4.4%4%3.9%3.2%2.8%2.8%2.7%2.7%2.7%2.3%2%100 - 100 - 200100 - 100 - 20026%CPU Cache25615.9%200 - 100 - 200100 - 100 - 500500 - 1 - 5006.2%c2c - Stock - double - 128B.L.N.Q.A - A.M.SRedis - 100 - 1:106.2%200 - 100 - 2005.9%B.L.N.Q.A - A.M.S100 - 100 - 5005.4%500 - 1 - 500Cloning4.4%N.T.C.B.b.u.S - A.M.SN.T.C.B.b.u.S - A.M.S500 - 1 - 200r2c - Stock - float - 256500 - 1 - 2003.6%c2c - FFTW - double - 1283.4%Futex3.3%P.P.B.T.TPiper2c - FFTW - float - 256100 - 1 - 200200 - 1 - 200SENDFILERedis - 100 - 1:52.6%Matrix Math2.5%500 - 100 - 5002.5%200 - 100 - 5002.4%200 - 1 - 5002.4%200 - 100 - 50016 - 256 - 512Apache IoTDBApache IoTDBStress-NGlibxsmmApache IoTDBApache IoTDBApache IoTDBHeFFTe - Highly Efficient FFT for ExascaleNeural Magic DeepSparseRedis 7.0.12 + memtier_benchmarkApache IoTDBNeural Magic DeepSparseApache IoTDBApache IoTDBStress-NGNeural Magic DeepSparseNeural Magic DeepSparseApache IoTDBHeFFTe - Highly Efficient FFT for ExascaleApache IoTDBHeFFTe - Highly Efficient FFT for ExascaleStress-NGsrsRAN ProjectStress-NGHeFFTe - Highly Efficient FFT for ExascaleApache IoTDBApache IoTDBStress-NGRedis 7.0.12 + memtier_benchmarkStress-NGApache IoTDBApache IoTDBApache IoTDBApache IoTDBLiquid-DSPab

new xeonopenfoam: drivaerFastback, Medium Mesh Size - Execution Timeopenfoam: drivaerFastback, Medium Mesh Size - Mesh Timebrl-cad: VGR Performance Metricblender: Barbershop - CPU-Onlybuild-linux-kernel: allmodconfigbuild-llvm: Unix Makefileshpcg: 160 160 160 - 60libxsmm: 128build-llvm: Ninjahpcg: 144 144 144 - 60blender: Pabellon Barcelona - CPU-Onlylaghos: Sedov Blast Wave, ube_922_hex.meshhpcg: 104 104 104 - 60libxsmm: 256blender: Classroom - CPU-Onlycassandra: Writesdeepsparse: BERT-Large, NLP Question Answering - Asynchronous Multi-Streamdeepsparse: BERT-Large, NLP Question Answering - Asynchronous Multi-Streamvvenc: Bosphorus 4K - Fastopenfoam: drivaerFastback, Small Mesh Size - Execution Timeopenfoam: drivaerFastback, Small Mesh Size - Mesh Timepalabos: 100memtier-benchmark: Redis - 100 - 1:10palabos: 400memtier-benchmark: Redis - 100 - 1:5apache-iotdb: 500 - 100 - 500apache-iotdb: 500 - 100 - 500memtier-benchmark: Redis - 50 - 1:5memtier-benchmark: Redis - 50 - 1:10palabos: 500blender: Fishy Cat - CPU-Onlylaghos: Triple Point Problemdeepsparse: BERT-Large, NLP Question Answering, Sparse INT8 - Asynchronous Multi-Streamdeepsparse: BERT-Large, NLP Question Answering, Sparse INT8 - Asynchronous Multi-Streamvvenc: Bosphorus 4K - Fasterdeepsparse: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Asynchronous Multi-Streamdeepsparse: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Asynchronous Multi-Streamdeepsparse: NLP Text Classification, BERT base uncased SST2 - Asynchronous Multi-Streamdeepsparse: NLP Text Classification, BERT base uncased SST2 - Asynchronous Multi-Streamdeepsparse: NLP Document Classification, oBERT base uncased on IMDB - Asynchronous Multi-Streamdeepsparse: NLP Document Classification, oBERT base uncased on IMDB - Asynchronous Multi-Streamdeepsparse: NLP Token Classification, BERT base uncased conll2003 - Asynchronous Multi-Streamdeepsparse: NLP Token Classification, BERT base uncased conll2003 - Asynchronous Multi-Streamdeepsparse: CV Segmentation, 90% Pruned YOLACT Pruned - Asynchronous Multi-Streamdeepsparse: CV Segmentation, 90% Pruned YOLACT Pruned - Asynchronous Multi-Streamblender: BMW27 - CPU-Onlyheffte: c2c - Stock - double - 512heffte: c2c - FFTW - double - 512apache-iotdb: 200 - 100 - 500apache-iotdb: 200 - 100 - 500deepsparse: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Asynchronous Multi-Streamdeepsparse: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Asynchronous Multi-Streamdeepsparse: NLP Sentiment Analysis, 80% Pruned Quantized BERT Base Uncased - Asynchronous Multi-Streamdeepsparse: NLP Sentiment Analysis, 80% Pruned Quantized BERT Base Uncased - Asynchronous Multi-Streambuild-php: Time To Compilebuild-gdb: Time To Compilebuild-linux-kernel: defconfigdeepsparse: NLP Text Classification, DistilBERT mnli - Asynchronous Multi-Streamdeepsparse: NLP Text Classification, DistilBERT mnli - Asynchronous Multi-Streamdeepsparse: CV Detection, YOLOv5s COCO, Sparse INT8 - Asynchronous Multi-Streamdeepsparse: CV Detection, YOLOv5s COCO, Sparse INT8 - Asynchronous Multi-Streamdeepsparse: CV Detection, YOLOv5s COCO - Asynchronous Multi-Streamdeepsparse: CV Detection, YOLOv5s COCO - Asynchronous Multi-Streamvvenc: Bosphorus 1080p - Fastdeepsparse: ResNet-50, Baseline - Asynchronous Multi-Streamdeepsparse: ResNet-50, Baseline - Asynchronous Multi-Streamdeepsparse: CV Classification, ResNet-50 ImageNet - Asynchronous Multi-Streamdeepsparse: CV Classification, ResNet-50 ImageNet - Asynchronous Multi-Streamdeepsparse: ResNet-50, Sparse INT8 - Asynchronous Multi-Streamdeepsparse: ResNet-50, Sparse INT8 - Asynchronous Multi-Streamapache-iotdb: 500 - 100 - 200apache-iotdb: 500 - 100 - 200srsran: PUSCH Processor Benchmark, Throughput Totalstress-ng: IO_uringstress-ng: Atomicapache-iotdb: 500 - 1 - 500apache-iotdb: 500 - 1 - 500stress-ng: CPU Cachestress-ng: MMAPstress-ng: Cloningstress-ng: Mallocstress-ng: MEMFDstress-ng: Zlibstress-ng: Glibc Qsort Data Sortingstress-ng: Fused Multiply-Addstress-ng: Pthreadstress-ng: System V Message Passingstress-ng: Hashstress-ng: Vector Mathliquid-dsp: 64 - 256 - 512stress-ng: Futexstress-ng: Socket Activitystress-ng: Vector Shufflestress-ng: Matrix 3D Mathstress-ng: NUMAstress-ng: Vector Floating Pointstress-ng: Pipestress-ng: Wide Vector Mathstress-ng: x86_64 RdRandstress-ng: AVL Treestress-ng: Forkingstress-ng: CPU Stressstress-ng: Glibc C String Functionsstress-ng: Function Callstress-ng: Matrix Mathstress-ng: SENDFILEstress-ng: Cryptostress-ng: Mutexstress-ng: Context Switchingliquid-dsp: 32 - 256 - 512stress-ng: Floating Pointstress-ng: Memory Copyingstress-ng: Semaphoresstress-ng: Pollliquid-dsp: 16 - 256 - 512liquid-dsp: 64 - 256 - 57liquid-dsp: 64 - 256 - 32liquid-dsp: 32 - 256 - 57liquid-dsp: 32 - 256 - 32liquid-dsp: 16 - 256 - 57liquid-dsp: 16 - 256 - 32heffte: c2c - Stock - float - 512heffte: r2c - FFTW - double - 512heffte: r2c - Stock - double - 512apache-iotdb: 100 - 100 - 500apache-iotdb: 100 - 100 - 500heffte: c2c - FFTW - float - 512apache-iotdb: 200 - 100 - 200apache-iotdb: 200 - 100 - 200apache-iotdb: 200 - 1 - 500apache-iotdb: 200 - 1 - 500apache-iotdb: 500 - 1 - 200apache-iotdb: 500 - 1 - 200vvenc: Bosphorus 1080p - Fastersrsran: Downlink Processor Benchmarkapache-iotdb: 100 - 100 - 200apache-iotdb: 100 - 100 - 200apache-iotdb: 100 - 1 - 500apache-iotdb: 100 - 1 - 500apache-iotdb: 200 - 1 - 200apache-iotdb: 200 - 1 - 200apache-iotdb: 100 - 1 - 200apache-iotdb: 100 - 1 - 200libxsmm: 64heffte: r2c - Stock - float - 512heffte: r2c - FFTW - float - 512libxsmm: 32srsran: PUSCH Processor Benchmark, Throughput Threadheffte: c2c - FFTW - double - 256heffte: c2c - Stock - double - 256heffte: r2c - FFTW - double - 256heffte: c2c - Stock - float - 256heffte: c2c - FFTW - float - 256heffte: r2c - Stock - double - 256heffte: r2c - FFTW - float - 256heffte: r2c - Stock - float - 256heffte: c2c - Stock - double - 128heffte: c2c - FFTW - double - 128heffte: c2c - Stock - float - 128heffte: r2c - Stock - double - 128heffte: c2c - FFTW - float - 128heffte: r2c - FFTW - double - 128heffte: r2c - Stock - float - 128heffte: r2c - FFTW - float - 128ab615.99074144.69646466686493.45445.385323.85627.50861211.8263.15427.4213159.94216.8627.7808879.6127.78155626453.480235.15295.84267.70733127.965214235.1862447092.01287.2682285996.1768.3467607191.642211638.652316281.26300.27664.07177.7831.6750504.611411.02014.86001074.8218116.3761137.3780460.781834.5311468.804633.9358345.149146.330747.1540.743843.9665101.2545677447.24131.4497121.689340.9109390.907642.35141.90540.43854.0674295.829576.5807208.847176.5597208.897516.10033.3278479.787633.3894478.91084.94163227.095431.5856894390.615372.91529665.98133.8322.971916642.91537111.20861.289740.5799373474.31549.942647.81696.6534197705.63136846.015852281.715577252.32151386.315131350001541676.3624947.14167204.219599.93390.8758243.3835837711.851745029.27331416.52294.2689918.2164111.1126067360.6022028.03160653.44582724.6350240.0915147444.512572801.7538355500010587.487176.1962126446.213669281.6924394000017288500001577300000132810000084708500084843500055794500072.560974.473476.611069.0859041436.6478.829129.5454224351.126.291505080.349.491576432.2530.946705.831.8343074031.8428.271191500.8811.861045806.8114.58710382.44833.8137.536141.41440.0240.438.930438.961372.289375.089276.029976.9042149.825157.86746.635764.426385.739892.3973131.656121.794149.935207.244615.46018144.93674493.61445.380319.85227.39781225.0262.88427.3890217.1927.8405758.9127.76428.669537.32535.91767.56316327.948717234.8742304730.19285.7612227152.0268.0165935725.672217192.122293467.62300.85564.01176.9231.6488505.130910.99214.84731075.9571111.4976143.4387460.758834.5539460.670734.5447343.517046.549147.2240.664844.006498.8746726912.46131.0664122.036740.8061391.912542.38242.00640.45153.3291299.927775.7218211.227076.4684208.990816.24933.2781480.522333.3680479.22414.93123233.958831.6956137174.75543.71503623.79132.6121.632009050.461885833.11856.149326.0999251227.28549.552648.81696.9234050669.23136709.815854201.785583978.14151431.155130400001492979.4625282.31167202.079605.30392.0858232.7036852791.121750003.43331423.04294.6689966.2964118.8726125214.8422106.49156668.43598173.5650243.4815192892.592571092.6937865000010601.107180.4361651485.433671617.9724882000017337000001576850000132390000084767500086219500055865500072.539174.714876.604173.5656018457.8778.960531.6351199962.1126.641469808.899.871521587.430.927710.943.8634191814.8628.451185338.0212.181042859.0314.98697217.55839.9137.740141.193444.6236.338.518238.675772.198174.928675.300177.0345154.053164.04749.523062.297485.485090.9851130.982122.460151.803206.217OpenBenchmarking.org

OpenFOAM

OpenFOAM is the leading free, open-source software for computational fluid dynamics (CFD). This test profile currently uses the drivaerFastback test case for analyzing automotive aerodynamics or alternatively the older motorBike input. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgSeconds, Fewer Is BetterOpenFOAM 10Input: drivaerFastback, Medium Mesh Size - Execution Timeab130260390520650SE +/- 0.42, N = 2SE +/- 0.03, N = 2615.99615.461. (CXX) g++ options: -std=c++14 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -lfiniteVolume -lmeshTools -lparallel -llagrangian -lregionModels -lgenericPatchFields -lOpenFOAM -ldl -lm

OpenBenchmarking.orgSeconds, Fewer Is BetterOpenFOAM 10Input: drivaerFastback, Medium Mesh Size - Mesh Timeab306090120150SE +/- 0.01, N = 2SE +/- 0.08, N = 2144.70144.941. (CXX) g++ options: -std=c++14 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -lfiniteVolume -lmeshTools -lparallel -llagrangian -lregionModels -lgenericPatchFields -lOpenFOAM -ldl -lm

BRL-CAD

BRL-CAD is a cross-platform, open-source solid modeling system with built-in benchmark mode. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgVGR Performance Metric, More Is BetterBRL-CAD 7.36VGR Performance Metrica100K200K300K400K500KSE +/- 3768.50, N = 24666861. (CXX) g++ options: -std=c++14 -pipe -fvisibility=hidden -fno-strict-aliasing -fno-common -fexceptions -ftemplate-depth-128 -m64 -ggdb3 -O3 -fipa-pta -fstrength-reduce -finline-functions -flto -ltcl8.6 -lregex_brl -lz_brl -lnetpbm -ldl -lm -ltk8.6

Blender

OpenBenchmarking.orgSeconds, Fewer Is BetterBlender 3.6Blend File: Barbershop - Compute: CPU-Onlyab110220330440550SE +/- 0.22, N = 2SE +/- 0.42, N = 2493.45493.61

Timed Linux Kernel Compilation

This test times how long it takes to build the Linux kernel in a default configuration (defconfig) for the architecture being tested or alternatively an allmodconfig for building all possible kernel modules for the build. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgSeconds, Fewer Is BetterTimed Linux Kernel Compilation 6.1Build: allmodconfigab100200300400500SE +/- 1.46, N = 2SE +/- 1.13, N = 2445.39445.38

Timed LLVM Compilation

This test times how long it takes to compile/build the LLVM compiler stack. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgSeconds, Fewer Is BetterTimed LLVM Compilation 16.0Build System: Unix Makefilesab70140210280350SE +/- 5.08, N = 2SE +/- 5.88, N = 2323.86319.85

High Performance Conjugate Gradient

HPCG is the High Performance Conjugate Gradient and is a new scientific benchmark from Sandia National Lans focused for super-computer testing with modern real-world workloads compared to HPCC. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgGFLOP/s, More Is BetterHigh Performance Conjugate Gradient 3.1X Y Z: 160 160 160 - RT: 60ab612182430SE +/- 0.03, N = 2SE +/- 0.07, N = 227.5127.401. (CXX) g++ options: -O3 -ffast-math -ftree-vectorize -lmpi_cxx -lmpi

libxsmm

Libxsmm is an open-source library for specialized dense and sparse matrix operations and deep learning primitives. Libxsmm supports making use of Intel AMX, AVX-512, and other modern CPU instruction set capabilities. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgGFLOPS/s, More Is Betterlibxsmm 2-1.17-3645M N K: 128ab30060090012001500SE +/- 4.60, N = 2SE +/- 1.10, N = 21211.81225.01. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -msse4.2

Timed LLVM Compilation

This test times how long it takes to compile/build the LLVM compiler stack. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgSeconds, Fewer Is BetterTimed LLVM Compilation 16.0Build System: Ninjaab60120180240300SE +/- 0.15, N = 2SE +/- 0.15, N = 2263.15262.88

High Performance Conjugate Gradient

HPCG is the High Performance Conjugate Gradient and is a new scientific benchmark from Sandia National Lans focused for super-computer testing with modern real-world workloads compared to HPCC. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgGFLOP/s, More Is BetterHigh Performance Conjugate Gradient 3.1X Y Z: 144 144 144 - RT: 60ab612182430SE +/- 0.01, N = 2SE +/- 0.06, N = 227.4227.391. (CXX) g++ options: -O3 -ffast-math -ftree-vectorize -lmpi_cxx -lmpi

Blender

OpenBenchmarking.orgSeconds, Fewer Is BetterBlender 3.6Blend File: Pabellon Barcelona - Compute: CPU-Onlya4080120160200SE +/- 0.04, N = 2159.94

Laghos

Laghos (LAGrangian High-Order Solver) is a miniapp that solves the time-dependent Euler equations of compressible gas dynamics in a moving Lagrangian frame using unstructured high-order finite element spatial discretization and explicit high-order time-stepping. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgMajor Kernels Total Rate, More Is BetterLaghos 3.1Test: Sedov Blast Wave, ube_922_hex.meshab50100150200250SE +/- 0.24, N = 2SE +/- 0.18, N = 2216.86217.191. (CXX) g++ options: -O3 -std=c++11 -lmfem -lHYPRE -lmetis -lrt -lmpi_cxx -lmpi

High Performance Conjugate Gradient

HPCG is the High Performance Conjugate Gradient and is a new scientific benchmark from Sandia National Lans focused for super-computer testing with modern real-world workloads compared to HPCC. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgGFLOP/s, More Is BetterHigh Performance Conjugate Gradient 3.1X Y Z: 104 104 104 - RT: 60ab714212835SE +/- 0.03, N = 2SE +/- 0.01, N = 227.7827.841. (CXX) g++ options: -O3 -ffast-math -ftree-vectorize -lmpi_cxx -lmpi

libxsmm

Libxsmm is an open-source library for specialized dense and sparse matrix operations and deep learning primitives. Libxsmm supports making use of Intel AMX, AVX-512, and other modern CPU instruction set capabilities. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgGFLOPS/s, More Is Betterlibxsmm 2-1.17-3645M N K: 256ab2004006008001000SE +/- 0.65, N = 2SE +/- 5.75, N = 2879.6758.91. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -msse4.2

Blender

OpenBenchmarking.orgSeconds, Fewer Is BetterBlender 3.6Blend File: Classroom - Compute: CPU-Onlyab306090120150SE +/- 0.05, N = 2SE +/- 0.13, N = 2127.78127.76

Apache Cassandra

This is a benchmark of the Apache Cassandra NoSQL database management system making use of cassandra-stress. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgOp/s, More Is BetterApache Cassandra 4.1.3Test: Writesa30K60K90K120K150KSE +/- 803.50, N = 2155626

Neural Magic DeepSparse

OpenBenchmarking.orgms/batch, Fewer Is BetterNeural Magic DeepSparse 1.5Model: BERT-Large, NLP Question Answering - Scenario: Asynchronous Multi-Streamab100200300400500SE +/- 0.26, N = 2SE +/- 4.41, N = 2453.48428.67

OpenBenchmarking.orgitems/sec, More Is BetterNeural Magic DeepSparse 1.5Model: BERT-Large, NLP Question Answering - Scenario: Asynchronous Multi-Streamab918273645SE +/- 0.01, N = 2SE +/- 0.38, N = 235.1537.33

VVenC

VVenC is the Fraunhofer Versatile Video Encoder as a fast/efficient H.266/VVC encoder. The vvenc encoder makes use of SIMD Everywhere (SIMDe). The vvenc software is published under the Clear BSD License. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgFrames Per Second, More Is BetterVVenC 1.9Video Input: Bosphorus 4K - Video Preset: Fastab1.33132.66263.99395.32526.6565SE +/- 0.074, N = 2SE +/- 0.015, N = 25.8425.9171. (CXX) g++ options: -O3 -flto -fno-fat-lto-objects -flto=auto

OpenFOAM

OpenFOAM is the leading free, open-source software for computational fluid dynamics (CFD). This test profile currently uses the drivaerFastback test case for analyzing automotive aerodynamics or alternatively the older motorBike input. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgSeconds, Fewer Is BetterOpenFOAM 10Input: drivaerFastback, Small Mesh Size - Execution Timeab1530456075SE +/- 0.09, N = 2SE +/- 0.11, N = 267.7167.561. (CXX) g++ options: -std=c++14 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -lfiniteVolume -lmeshTools -lparallel -llagrangian -lregionModels -lgenericPatchFields -lOpenFOAM -ldl -lm

OpenBenchmarking.orgSeconds, Fewer Is BetterOpenFOAM 10Input: drivaerFastback, Small Mesh Size - Mesh Timeab714212835SE +/- 0.02, N = 2SE +/- 0.05, N = 227.9727.951. (CXX) g++ options: -std=c++14 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -lfiniteVolume -lmeshTools -lparallel -llagrangian -lregionModels -lgenericPatchFields -lOpenFOAM -ldl -lm

Palabos

The Palabos library is a framework for general purpose Computational Fluid Dynamics (CFD). Palabos uses a kernel based on the Lattice Boltzmann method. This test profile uses the Palabos MPI-based Cavity3D benchmark. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgMega Site Updates Per Second, More Is BetterPalabos 2.3Grid Size: 100ab50100150200250SE +/- 0.02, N = 2SE +/- 0.34, N = 2235.19234.871. (CXX) g++ options: -std=c++17 -pedantic -O3 -rdynamic -lcrypto -lcurl -lsz -lz -ldl -lm

Redis 7.0.12 + memtier_benchmark

Memtier_benchmark is a NoSQL Redis/Memcache traffic generation plus benchmarking tool developed by Redis Labs. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgOps/sec, More Is BetterRedis 7.0.12 + memtier_benchmark 2.0Protocol: Redis - Clients: 100 - Set To Get Ratio: 1:10ab500K1000K1500K2000K2500KSE +/- 114392.77, N = 2SE +/- 12975.09, N = 22447092.012304730.191. (CXX) g++ options: -O2 -levent_openssl -levent -lcrypto -lssl -lpthread -lz -lpcre

Palabos

The Palabos library is a framework for general purpose Computational Fluid Dynamics (CFD). Palabos uses a kernel based on the Lattice Boltzmann method. This test profile uses the Palabos MPI-based Cavity3D benchmark. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgMega Site Updates Per Second, More Is BetterPalabos 2.3Grid Size: 400ab60120180240300SE +/- 0.49, N = 2SE +/- 1.54, N = 2287.27285.761. (CXX) g++ options: -std=c++17 -pedantic -O3 -rdynamic -lcrypto -lcurl -lsz -lz -ldl -lm

Redis 7.0.12 + memtier_benchmark

Memtier_benchmark is a NoSQL Redis/Memcache traffic generation plus benchmarking tool developed by Redis Labs. Learn more via the OpenBenchmarking.org test page.

Protocol: Redis - Clients: 500 - Set To Get Ratio: 1:10

a: The test run did not produce a result.

b: The test run did not produce a result.

Protocol: Redis - Clients: 500 - Set To Get Ratio: 1:5

a: The test run did not produce a result.

b: The test run did not produce a result.

OpenBenchmarking.orgOps/sec, More Is BetterRedis 7.0.12 + memtier_benchmark 2.0Protocol: Redis - Clients: 100 - Set To Get Ratio: 1:5ab500K1000K1500K2000K2500KSE +/- 6000.63, N = 2SE +/- 3990.38, N = 22285996.172227152.021. (CXX) g++ options: -O2 -levent_openssl -levent -lcrypto -lssl -lpthread -lz -lpcre

Apache IoTDB

OpenBenchmarking.orgAverage Latency, More Is BetterApache IoTDB 1.1.2Device Count: 500 - Batch Size Per Write: 100 - Sensor Count: 500ab153045607568.3468.01MAX: 2006.68MAX: 1606.75

OpenBenchmarking.orgpoint/sec, More Is BetterApache IoTDB 1.1.2Device Count: 500 - Batch Size Per Write: 100 - Sensor Count: 500ab14M28M42M56M70M67607191.6465935725.67

Redis 7.0.12 + memtier_benchmark

Memtier_benchmark is a NoSQL Redis/Memcache traffic generation plus benchmarking tool developed by Redis Labs. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgOps/sec, More Is BetterRedis 7.0.12 + memtier_benchmark 2.0Protocol: Redis - Clients: 50 - Set To Get Ratio: 1:5ab500K1000K1500K2000K2500KSE +/- 31848.80, N = 2SE +/- 39004.04, N = 22211638.652217192.121. (CXX) g++ options: -O2 -levent_openssl -levent -lcrypto -lssl -lpthread -lz -lpcre

OpenBenchmarking.orgOps/sec, More Is BetterRedis 7.0.12 + memtier_benchmark 2.0Protocol: Redis - Clients: 50 - Set To Get Ratio: 1:10ab500K1000K1500K2000K2500KSE +/- 13610.76, N = 2SE +/- 4548.93, N = 22316281.262293467.621. (CXX) g++ options: -O2 -levent_openssl -levent -lcrypto -lssl -lpthread -lz -lpcre

Palabos

The Palabos library is a framework for general purpose Computational Fluid Dynamics (CFD). Palabos uses a kernel based on the Lattice Boltzmann method. This test profile uses the Palabos MPI-based Cavity3D benchmark. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgMega Site Updates Per Second, More Is BetterPalabos 2.3Grid Size: 500ab70140210280350SE +/- 1.63, N = 2SE +/- 1.17, N = 2300.28300.861. (CXX) g++ options: -std=c++17 -pedantic -O3 -rdynamic -lcrypto -lcurl -lsz -lz -ldl -lm

Blender

OpenBenchmarking.orgSeconds, Fewer Is BetterBlender 3.6Blend File: Fishy Cat - Compute: CPU-Onlyab1428425670SE +/- 0.08, N = 2SE +/- 0.20, N = 264.0764.01

Laghos

Laghos (LAGrangian High-Order Solver) is a miniapp that solves the time-dependent Euler equations of compressible gas dynamics in a moving Lagrangian frame using unstructured high-order finite element spatial discretization and explicit high-order time-stepping. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgMajor Kernels Total Rate, More Is BetterLaghos 3.1Test: Triple Point Problemab4080120160200SE +/- 0.13, N = 2SE +/- 0.02, N = 2177.78176.921. (CXX) g++ options: -O3 -std=c++11 -lmfem -lHYPRE -lmetis -lrt -lmpi_cxx -lmpi

Neural Magic DeepSparse

OpenBenchmarking.orgms/batch, Fewer Is BetterNeural Magic DeepSparse 1.5Model: BERT-Large, NLP Question Answering, Sparse INT8 - Scenario: Asynchronous Multi-Streamab714212835SE +/- 0.01, N = 2SE +/- 0.01, N = 231.6831.65

OpenBenchmarking.orgitems/sec, More Is BetterNeural Magic DeepSparse 1.5Model: BERT-Large, NLP Question Answering, Sparse INT8 - Scenario: Asynchronous Multi-Streamab110220330440550SE +/- 0.18, N = 2SE +/- 0.12, N = 2504.61505.13

VVenC

VVenC is the Fraunhofer Versatile Video Encoder as a fast/efficient H.266/VVC encoder. The vvenc encoder makes use of SIMD Everywhere (SIMDe). The vvenc software is published under the Clear BSD License. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgFrames Per Second, More Is BetterVVenC 1.9Video Input: Bosphorus 4K - Video Preset: Fasterab3691215SE +/- 0.00, N = 2SE +/- 0.03, N = 211.0210.991. (CXX) g++ options: -O3 -flto -fno-fat-lto-objects -flto=auto

Neural Magic DeepSparse

OpenBenchmarking.orgms/batch, Fewer Is BetterNeural Magic DeepSparse 1.5Model: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Scenario: Asynchronous Multi-Streamab48121620SE +/- 0.01, N = 2SE +/- 0.01, N = 214.8614.85

OpenBenchmarking.orgitems/sec, More Is BetterNeural Magic DeepSparse 1.5Model: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Scenario: Asynchronous Multi-Streamab2004006008001000SE +/- 0.57, N = 2SE +/- 1.01, N = 21074.821075.96

OpenBenchmarking.orgms/batch, Fewer Is BetterNeural Magic DeepSparse 1.5Model: NLP Text Classification, BERT base uncased SST2 - Scenario: Asynchronous Multi-Streamab306090120150SE +/- 3.45, N = 2SE +/- 0.59, N = 2116.38111.50

OpenBenchmarking.orgitems/sec, More Is BetterNeural Magic DeepSparse 1.5Model: NLP Text Classification, BERT base uncased SST2 - Scenario: Asynchronous Multi-Streamab306090120150SE +/- 4.10, N = 2SE +/- 0.80, N = 2137.38143.44

OpenBenchmarking.orgms/batch, Fewer Is BetterNeural Magic DeepSparse 1.5Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Asynchronous Multi-Streamab100200300400500SE +/- 0.42, N = 2SE +/- 2.44, N = 2460.78460.76

OpenBenchmarking.orgitems/sec, More Is BetterNeural Magic DeepSparse 1.5Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Asynchronous Multi-Streamab816243240SE +/- 0.06, N = 2SE +/- 0.12, N = 234.5334.55

OpenBenchmarking.orgms/batch, Fewer Is BetterNeural Magic DeepSparse 1.5Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Asynchronous Multi-Streamab100200300400500SE +/- 1.46, N = 2SE +/- 0.20, N = 2468.80460.67

OpenBenchmarking.orgitems/sec, More Is BetterNeural Magic DeepSparse 1.5Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Asynchronous Multi-Streamab816243240SE +/- 0.07, N = 2SE +/- 0.03, N = 233.9434.54

OpenBenchmarking.orgms/batch, Fewer Is BetterNeural Magic DeepSparse 1.5Model: CV Segmentation, 90% Pruned YOLACT Pruned - Scenario: Asynchronous Multi-Streamab80160240320400SE +/- 0.15, N = 2SE +/- 1.63, N = 2345.15343.52

OpenBenchmarking.orgitems/sec, More Is BetterNeural Magic DeepSparse 1.5Model: CV Segmentation, 90% Pruned YOLACT Pruned - Scenario: Asynchronous Multi-Streamab1122334455SE +/- 0.02, N = 2SE +/- 0.20, N = 246.3346.55

Blender

OpenBenchmarking.orgSeconds, Fewer Is BetterBlender 3.6Blend File: BMW27 - Compute: CPU-Onlyab1122334455SE +/- 0.02, N = 2SE +/- 0.08, N = 247.1547.22

HeFFTe - Highly Efficient FFT for Exascale

HeFFTe is the Highly Efficient FFT for Exascale software developed as part of the Exascale Computing Project. This test profile uses HeFFTe's built-in speed benchmarks under a variety of configuration options and currently catering to CPU/processor testing. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgGFLOP/s, More Is BetterHeFFTe - Highly Efficient FFT for Exascale 2.3Test: c2c - Backend: Stock - Precision: double - X Y Z: 512ab918273645SE +/- 0.05, N = 2SE +/- 0.00, N = 240.7440.661. (CXX) g++ options: -O3

OpenBenchmarking.orgGFLOP/s, More Is BetterHeFFTe - Highly Efficient FFT for Exascale 2.3Test: c2c - Backend: FFTW - Precision: double - X Y Z: 512ab1020304050SE +/- 0.04, N = 2SE +/- 0.02, N = 243.9744.011. (CXX) g++ options: -O3

Apache IoTDB

OpenBenchmarking.orgAverage Latency, More Is BetterApache IoTDB 1.1.2Device Count: 200 - Batch Size Per Write: 100 - Sensor Count: 500ab20406080100101.2598.87MAX: 3631.89MAX: 3564.64

OpenBenchmarking.orgpoint/sec, More Is BetterApache IoTDB 1.1.2Device Count: 200 - Batch Size Per Write: 100 - Sensor Count: 500ab10M20M30M40M50M45677447.2446726912.46

Neural Magic DeepSparse

OpenBenchmarking.orgms/batch, Fewer Is BetterNeural Magic DeepSparse 1.5Model: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Scenario: Asynchronous Multi-Streamab306090120150SE +/- 0.05, N = 2SE +/- 0.22, N = 2131.45131.07

OpenBenchmarking.orgitems/sec, More Is BetterNeural Magic DeepSparse 1.5Model: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Scenario: Asynchronous Multi-Streamab306090120150SE +/- 0.05, N = 2SE +/- 0.22, N = 2121.69122.04

OpenBenchmarking.orgms/batch, Fewer Is BetterNeural Magic DeepSparse 1.5Model: NLP Sentiment Analysis, 80% Pruned Quantized BERT Base Uncased - Scenario: Asynchronous Multi-Streamab918273645SE +/- 0.11, N = 2SE +/- 0.01, N = 240.9140.81

OpenBenchmarking.orgitems/sec, More Is BetterNeural Magic DeepSparse 1.5Model: NLP Sentiment Analysis, 80% Pruned Quantized BERT Base Uncased - Scenario: Asynchronous Multi-Streamab90180270360450SE +/- 1.01, N = 2SE +/- 0.12, N = 2390.91391.91

Timed PHP Compilation

This test times how long it takes to build PHP. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgSeconds, Fewer Is BetterTimed PHP Compilation 8.1.9Time To Compileab1020304050SE +/- 0.34, N = 2SE +/- 0.48, N = 242.3542.38

Timed GDB GNU Debugger Compilation

This test times how long it takes to build the GNU Debugger (GDB) in a default configuration. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgSeconds, Fewer Is BetterTimed GDB GNU Debugger Compilation 10.2Time To Compileab1020304050SE +/- 0.06, N = 2SE +/- 0.12, N = 241.9142.01

Timed Linux Kernel Compilation

This test times how long it takes to build the Linux kernel in a default configuration (defconfig) for the architecture being tested or alternatively an allmodconfig for building all possible kernel modules for the build. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgSeconds, Fewer Is BetterTimed Linux Kernel Compilation 6.1Build: defconfigab918273645SE +/- 0.72, N = 2SE +/- 0.69, N = 240.4440.45

Neural Magic DeepSparse

OpenBenchmarking.orgms/batch, Fewer Is BetterNeural Magic DeepSparse 1.5Model: NLP Text Classification, DistilBERT mnli - Scenario: Asynchronous Multi-Streamab1224364860SE +/- 0.01, N = 2SE +/- 0.09, N = 254.0753.33

OpenBenchmarking.orgitems/sec, More Is BetterNeural Magic DeepSparse 1.5Model: NLP Text Classification, DistilBERT mnli - Scenario: Asynchronous Multi-Streamab70140210280350SE +/- 0.05, N = 2SE +/- 0.51, N = 2295.83299.93

OpenBenchmarking.orgms/batch, Fewer Is BetterNeural Magic DeepSparse 1.5Model: CV Detection, YOLOv5s COCO, Sparse INT8 - Scenario: Asynchronous Multi-Streamab20406080100SE +/- 0.13, N = 2SE +/- 0.04, N = 276.5875.72

OpenBenchmarking.orgitems/sec, More Is BetterNeural Magic DeepSparse 1.5Model: CV Detection, YOLOv5s COCO, Sparse INT8 - Scenario: Asynchronous Multi-Streamab50100150200250SE +/- 0.34, N = 2SE +/- 0.12, N = 2208.85211.23

OpenBenchmarking.orgms/batch, Fewer Is BetterNeural Magic DeepSparse 1.5Model: CV Detection, YOLOv5s COCO - Scenario: Asynchronous Multi-Streamab20406080100SE +/- 0.03, N = 2SE +/- 0.04, N = 276.5676.47

OpenBenchmarking.orgitems/sec, More Is BetterNeural Magic DeepSparse 1.5Model: CV Detection, YOLOv5s COCO - Scenario: Asynchronous Multi-Streamab50100150200250SE +/- 0.10, N = 2SE +/- 0.05, N = 2208.90208.99

VVenC

VVenC is the Fraunhofer Versatile Video Encoder as a fast/efficient H.266/VVC encoder. The vvenc encoder makes use of SIMD Everywhere (SIMDe). The vvenc software is published under the Clear BSD License. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgFrames Per Second, More Is BetterVVenC 1.9Video Input: Bosphorus 1080p - Video Preset: Fastab48121620SE +/- 0.17, N = 2SE +/- 0.02, N = 216.1016.251. (CXX) g++ options: -O3 -flto -fno-fat-lto-objects -flto=auto

Neural Magic DeepSparse

OpenBenchmarking.orgms/batch, Fewer Is BetterNeural Magic DeepSparse 1.5Model: ResNet-50, Baseline - Scenario: Asynchronous Multi-Streamab816243240SE +/- 0.01, N = 2SE +/- 0.04, N = 233.3333.28

OpenBenchmarking.orgitems/sec, More Is BetterNeural Magic DeepSparse 1.5Model: ResNet-50, Baseline - Scenario: Asynchronous Multi-Streamab100200300400500SE +/- 0.12, N = 2SE +/- 0.54, N = 2479.79480.52

OpenBenchmarking.orgms/batch, Fewer Is BetterNeural Magic DeepSparse 1.5Model: CV Classification, ResNet-50 ImageNet - Scenario: Asynchronous Multi-Streamab816243240SE +/- 0.00, N = 2SE +/- 0.00, N = 233.3933.37

OpenBenchmarking.orgitems/sec, More Is BetterNeural Magic DeepSparse 1.5Model: CV Classification, ResNet-50 ImageNet - Scenario: Asynchronous Multi-Streamab100200300400500SE +/- 0.05, N = 2SE +/- 0.02, N = 2478.91479.22

OpenBenchmarking.orgms/batch, Fewer Is BetterNeural Magic DeepSparse 1.5Model: ResNet-50, Sparse INT8 - Scenario: Asynchronous Multi-Streamab1.11192.22383.33574.44765.5595SE +/- 0.0128, N = 2SE +/- 0.0056, N = 24.94164.9312

OpenBenchmarking.orgitems/sec, More Is BetterNeural Magic DeepSparse 1.5Model: ResNet-50, Sparse INT8 - Scenario: Asynchronous Multi-Streamab7001400210028003500SE +/- 8.40, N = 2SE +/- 3.51, N = 23227.103233.96

Apache IoTDB

OpenBenchmarking.orgAverage Latency, More Is BetterApache IoTDB 1.1.2Device Count: 500 - Batch Size Per Write: 100 - Sensor Count: 200ab71421283531.5831.69MAX: 1920.32MAX: 1610.79

OpenBenchmarking.orgpoint/sec, More Is BetterApache IoTDB 1.1.2Device Count: 500 - Batch Size Per Write: 100 - Sensor Count: 200ab12M24M36M48M60M56894390.6156137174.70

srsRAN Project

srsRAN Project is a complete ORAN-native 5G RAN solution created by Software Radio Systems (SRS). The srsRAN Project radio suite was formerly known as srsLTE and can be used for building your own software-defined radio (SDR) 4G/5G mobile network. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgMbps, More Is BettersrsRAN Project 23.5Test: PUSCH Processor Benchmark, Throughput Totalab12002400360048006000SE +/- 143.30, N = 2SE +/- 95.40, N = 25372.95543.71. (CXX) g++ options: -march=native -mfma -O3 -fno-trapping-math -fno-math-errno -lgtest

Stress-NG

Stress-NG is a Linux stress tool developed by Colin Ian King. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgBogo Ops/s, More Is BetterStress-NG 0.15.10Test: IO_uringab300K600K900K1200K1500KSE +/- 22482.34, N = 2SE +/- 5229.94, N = 21529665.981503623.791. (CXX) g++ options: -O2 -std=gnu99 -lc

OpenBenchmarking.orgBogo Ops/s, More Is BetterStress-NG 0.15.10Test: Atomicab306090120150SE +/- 1.05, N = 2SE +/- 0.20, N = 2133.83132.611. (CXX) g++ options: -O2 -std=gnu99 -lc

Apache IoTDB

OpenBenchmarking.orgAverage Latency, More Is BetterApache IoTDB 1.1.2Device Count: 500 - Batch Size Per Write: 1 - Sensor Count: 500ab61218243022.9721.63MAX: 864.74MAX: 867.44

OpenBenchmarking.orgpoint/sec, More Is BetterApache IoTDB 1.1.2Device Count: 500 - Batch Size Per Write: 1 - Sensor Count: 500ab400K800K1200K1600K2000K1916642.902009050.46

Stress-NG

Stress-NG is a Linux stress tool developed by Colin Ian King. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgBogo Ops/s, More Is BetterStress-NG 0.15.10Test: CPU Cacheab400K800K1200K1600K2000KSE +/- 31294.95, N = 2SE +/- 234949.06, N = 21537111.201885833.111. (CXX) g++ options: -O2 -std=gnu99 -lc

OpenBenchmarking.orgBogo Ops/s, More Is BetterStress-NG 0.15.10Test: MMAPab2004006008001000SE +/- 3.32, N = 2SE +/- 2.06, N = 2861.28856.141. (CXX) g++ options: -O2 -std=gnu99 -lc

OpenBenchmarking.orgBogo Ops/s, More Is BetterStress-NG 0.15.10Test: Cloningab2K4K6K8K10KSE +/- 114.33, N = 2SE +/- 100.16, N = 29740.579326.091. (CXX) g++ options: -O2 -std=gnu99 -lc

OpenBenchmarking.orgBogo Ops/s, More Is BetterStress-NG 0.15.10Test: Mallocab20M40M60M80M100MSE +/- 129754.02, N = 2SE +/- 83929.32, N = 299373474.3199251227.281. (CXX) g++ options: -O2 -std=gnu99 -lc

OpenBenchmarking.orgBogo Ops/s, More Is BetterStress-NG 0.15.10Test: MEMFDab120240360480600SE +/- 1.31, N = 2SE +/- 1.20, N = 2549.94549.551. (CXX) g++ options: -O2 -std=gnu99 -lc

OpenBenchmarking.orgBogo Ops/s, More Is BetterStress-NG 0.15.10Test: Zlibab6001200180024003000SE +/- 0.06, N = 2SE +/- 0.65, N = 22647.812648.811. (CXX) g++ options: -O2 -std=gnu99 -lc

OpenBenchmarking.orgBogo Ops/s, More Is BetterStress-NG 0.15.10Test: Glibc Qsort Data Sortingab150300450600750SE +/- 0.40, N = 2SE +/- 0.46, N = 2696.65696.921. (CXX) g++ options: -O2 -std=gnu99 -lc

OpenBenchmarking.orgBogo Ops/s, More Is BetterStress-NG 0.15.10Test: Fused Multiply-Addab7M14M21M28M35MSE +/- 137631.48, N = 2SE +/- 285.63, N = 234197705.6334050669.231. (CXX) g++ options: -O2 -std=gnu99 -lc

OpenBenchmarking.orgBogo Ops/s, More Is BetterStress-NG 0.15.10Test: Pthreadab30K60K90K120K150KSE +/- 971.78, N = 2SE +/- 102.07, N = 2136846.01136709.811. (CXX) g++ options: -O2 -std=gnu99 -lc

OpenBenchmarking.orgBogo Ops/s, More Is BetterStress-NG 0.15.10Test: System V Message Passingab1.3M2.6M3.9M5.2M6.5MSE +/- 7174.98, N = 2SE +/- 9802.94, N = 25852281.715854201.781. (CXX) g++ options: -O2 -std=gnu99 -lc

OpenBenchmarking.orgBogo Ops/s, More Is BetterStress-NG 0.15.10Test: Hashab1.2M2.4M3.6M4.8M6MSE +/- 3166.95, N = 2SE +/- 2865.25, N = 25577252.325583978.141. (CXX) g++ options: -O2 -std=gnu99 -lc

OpenBenchmarking.orgBogo Ops/s, More Is BetterStress-NG 0.15.10Test: Vector Mathab30K60K90K120K150KSE +/- 47.16, N = 2SE +/- 5.98, N = 2151386.31151431.151. (CXX) g++ options: -O2 -std=gnu99 -lc

Liquid-DSP

LiquidSDR's Liquid-DSP is a software-defined radio (SDR) digital signal processing library. This test profile runs a multi-threaded benchmark of this SDR/DSP library focused on embedded platform usage. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgsamples/s, More Is BetterLiquid-DSP 1.6Threads: 64 - Buffer Length: 256 - Filter Length: 512ab110M220M330M440M550MSE +/- 385000.00, N = 2SE +/- 800000.00, N = 25131350005130400001. (CC) gcc options: -O3 -pthread -lm -lc -lliquid

Stress-NG

Stress-NG is a Linux stress tool developed by Colin Ian King. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgBogo Ops/s, More Is BetterStress-NG 0.15.10Test: Futexab300K600K900K1200K1500KSE +/- 56630.43, N = 2SE +/- 45385.58, N = 21541676.361492979.461. (CXX) g++ options: -O2 -std=gnu99 -lc

OpenBenchmarking.orgBogo Ops/s, More Is BetterStress-NG 0.15.10Test: Socket Activityab5K10K15K20K25KSE +/- 72.57, N = 2SE +/- 267.39, N = 224947.1425282.311. (CXX) g++ options: -O2 -std=gnu99 -lc

OpenBenchmarking.orgBogo Ops/s, More Is BetterStress-NG 0.15.10Test: Vector Shuffleab40K80K120K160K200KSE +/- 6.63, N = 2SE +/- 6.04, N = 2167204.21167202.071. (CXX) g++ options: -O2 -std=gnu99 -lc

OpenBenchmarking.orgBogo Ops/s, More Is BetterStress-NG 0.15.10Test: Matrix 3D Mathab2K4K6K8K10KSE +/- 34.45, N = 2SE +/- 4.08, N = 29599.939605.301. (CXX) g++ options: -O2 -std=gnu99 -lc

OpenBenchmarking.orgBogo Ops/s, More Is BetterStress-NG 0.15.10Test: NUMAab90180270360450SE +/- 0.88, N = 2SE +/- 0.05, N = 2390.87392.081. (CXX) g++ options: -O2 -std=gnu99 -lc

OpenBenchmarking.orgBogo Ops/s, More Is BetterStress-NG 0.15.10Test: Vector Floating Pointab12K24K36K48K60KSE +/- 30.71, N = 2SE +/- 4.11, N = 258243.3858232.701. (CXX) g++ options: -O2 -std=gnu99 -lc

OpenBenchmarking.orgBogo Ops/s, More Is BetterStress-NG 0.15.10Test: Pipeab8M16M24M32M40MSE +/- 1105250.10, N = 2SE +/- 79631.10, N = 235837711.8536852791.121. (CXX) g++ options: -O2 -std=gnu99 -lc

OpenBenchmarking.orgBogo Ops/s, More Is BetterStress-NG 0.15.10Test: Wide Vector Mathab400K800K1200K1600K2000KSE +/- 918.08, N = 2SE +/- 4139.63, N = 21745029.271750003.431. (CXX) g++ options: -O2 -std=gnu99 -lc

OpenBenchmarking.orgBogo Ops/s, More Is BetterStress-NG 0.15.10Test: x86_64 RdRandab70K140K210K280K350KSE +/- 2.35, N = 2SE +/- 1.14, N = 2331416.52331423.041. (CXX) g++ options: -O2 -std=gnu99 -lc

OpenBenchmarking.orgBogo Ops/s, More Is BetterStress-NG 0.15.10Test: AVL Treeab60120180240300SE +/- 0.32, N = 2SE +/- 0.85, N = 2294.26294.661. (CXX) g++ options: -O2 -std=gnu99 -lc

OpenBenchmarking.orgBogo Ops/s, More Is BetterStress-NG 0.15.10Test: Forkingab20K40K60K80K100KSE +/- 469.20, N = 2SE +/- 421.24, N = 289918.2189966.291. (CXX) g++ options: -O2 -std=gnu99 -lc

OpenBenchmarking.orgBogo Ops/s, More Is BetterStress-NG 0.15.10Test: CPU Stressab14K28K42K56K70KSE +/- 12.73, N = 2SE +/- 38.95, N = 264111.1164118.871. (CXX) g++ options: -O2 -std=gnu99 -lc

OpenBenchmarking.orgBogo Ops/s, More Is BetterStress-NG 0.15.10Test: Glibc C String Functionsab6M12M18M24M30MSE +/- 150617.25, N = 2SE +/- 69329.81, N = 226067360.6026125214.841. (CXX) g++ options: -O2 -std=gnu99 -lc

OpenBenchmarking.orgBogo Ops/s, More Is BetterStress-NG 0.15.10Test: Function Callab5K10K15K20K25KSE +/- 80.03, N = 2SE +/- 74.09, N = 222028.0322106.491. (CXX) g++ options: -O2 -std=gnu99 -lc

OpenBenchmarking.orgBogo Ops/s, More Is BetterStress-NG 0.15.10Test: Matrix Mathab30K60K90K120K150KSE +/- 2867.57, N = 2SE +/- 332.46, N = 2160653.44156668.431. (CXX) g++ options: -O2 -std=gnu99 -lc

OpenBenchmarking.orgBogo Ops/s, More Is BetterStress-NG 0.15.10Test: SENDFILEab130K260K390K520K650KSE +/- 6799.74, N = 2SE +/- 243.97, N = 2582724.63598173.561. (CXX) g++ options: -O2 -std=gnu99 -lc

OpenBenchmarking.orgBogo Ops/s, More Is BetterStress-NG 0.15.10Test: Cryptoab11K22K33K44K55KSE +/- 3.65, N = 2SE +/- 18.13, N = 250240.0950243.481. (CXX) g++ options: -O2 -std=gnu99 -lc

OpenBenchmarking.orgBogo Ops/s, More Is BetterStress-NG 0.15.10Test: Mutexab3M6M9M12M15MSE +/- 23940.47, N = 2SE +/- 2864.48, N = 215147444.5115192892.591. (CXX) g++ options: -O2 -std=gnu99 -lc

OpenBenchmarking.orgBogo Ops/s, More Is BetterStress-NG 0.15.10Test: Context Switchingab600K1200K1800K2400K3000KSE +/- 678.57, N = 2SE +/- 604.17, N = 22572801.752571092.691. (CXX) g++ options: -O2 -std=gnu99 -lc

Liquid-DSP

LiquidSDR's Liquid-DSP is a software-defined radio (SDR) digital signal processing library. This test profile runs a multi-threaded benchmark of this SDR/DSP library focused on embedded platform usage. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgsamples/s, More Is BetterLiquid-DSP 1.6Threads: 32 - Buffer Length: 256 - Filter Length: 512ab80M160M240M320M400MSE +/- 1955000.00, N = 2SE +/- 4920000.00, N = 23835550003786500001. (CC) gcc options: -O3 -pthread -lm -lc -lliquid

Stress-NG

Stress-NG is a Linux stress tool developed by Colin Ian King. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgBogo Ops/s, More Is BetterStress-NG 0.15.10Test: Floating Pointab2K4K6K8K10KSE +/- 1.07, N = 2SE +/- 17.77, N = 210587.4810601.101. (CXX) g++ options: -O2 -std=gnu99 -lc

OpenBenchmarking.orgBogo Ops/s, More Is BetterStress-NG 0.15.10Test: Memory Copyingab15003000450060007500SE +/- 8.71, N = 2SE +/- 11.04, N = 27176.197180.431. (CXX) g++ options: -O2 -std=gnu99 -lc

OpenBenchmarking.orgBogo Ops/s, More Is BetterStress-NG 0.15.10Test: Semaphoresab13M26M39M52M65MSE +/- 2077286.42, N = 2SE +/- 466593.23, N = 262126446.2161651485.431. (CXX) g++ options: -O2 -std=gnu99 -lc

OpenBenchmarking.orgBogo Ops/s, More Is BetterStress-NG 0.15.10Test: Pollab800K1600K2400K3200K4000KSE +/- 2536.76, N = 2SE +/- 1953.54, N = 23669281.693671617.971. (CXX) g++ options: -O2 -std=gnu99 -lc

Liquid-DSP

LiquidSDR's Liquid-DSP is a software-defined radio (SDR) digital signal processing library. This test profile runs a multi-threaded benchmark of this SDR/DSP library focused on embedded platform usage. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgsamples/s, More Is BetterLiquid-DSP 1.6Threads: 16 - Buffer Length: 256 - Filter Length: 512ab50M100M150M200M250MSE +/- 1950000.00, N = 2SE +/- 3170000.00, N = 22439400002488200001. (CC) gcc options: -O3 -pthread -lm -lc -lliquid

OpenBenchmarking.orgsamples/s, More Is BetterLiquid-DSP 1.6Threads: 64 - Buffer Length: 256 - Filter Length: 57ab400M800M1200M1600M2000MSE +/- 550000.00, N = 2SE +/- 900000.00, N = 2172885000017337000001. (CC) gcc options: -O3 -pthread -lm -lc -lliquid

OpenBenchmarking.orgsamples/s, More Is BetterLiquid-DSP 1.6Threads: 64 - Buffer Length: 256 - Filter Length: 32ab300M600M900M1200M1500MSE +/- 300000.00, N = 2SE +/- 450000.00, N = 2157730000015768500001. (CC) gcc options: -O3 -pthread -lm -lc -lliquid

OpenBenchmarking.orgsamples/s, More Is BetterLiquid-DSP 1.6Threads: 32 - Buffer Length: 256 - Filter Length: 57ab300M600M900M1200M1500MSE +/- 300000.00, N = 2SE +/- 4400000.00, N = 2132810000013239000001. (CC) gcc options: -O3 -pthread -lm -lc -lliquid

OpenBenchmarking.orgsamples/s, More Is BetterLiquid-DSP 1.6Threads: 32 - Buffer Length: 256 - Filter Length: 32ab200M400M600M800M1000MSE +/- 25000.00, N = 2SE +/- 85000.00, N = 28470850008476750001. (CC) gcc options: -O3 -pthread -lm -lc -lliquid

OpenBenchmarking.orgsamples/s, More Is BetterLiquid-DSP 1.6Threads: 16 - Buffer Length: 256 - Filter Length: 57ab200M400M600M800M1000MSE +/- 14365000.00, N = 2SE +/- 695000.00, N = 28484350008621950001. (CC) gcc options: -O3 -pthread -lm -lc -lliquid

OpenBenchmarking.orgsamples/s, More Is BetterLiquid-DSP 1.6Threads: 16 - Buffer Length: 256 - Filter Length: 32ab120M240M360M480M600MSE +/- 2065000.00, N = 2SE +/- 605000.00, N = 25579450005586550001. (CC) gcc options: -O3 -pthread -lm -lc -lliquid

HeFFTe - Highly Efficient FFT for Exascale

HeFFTe is the Highly Efficient FFT for Exascale software developed as part of the Exascale Computing Project. This test profile uses HeFFTe's built-in speed benchmarks under a variety of configuration options and currently catering to CPU/processor testing. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgGFLOP/s, More Is BetterHeFFTe - Highly Efficient FFT for Exascale 2.3Test: c2c - Backend: Stock - Precision: float - X Y Z: 512ab1632486480SE +/- 0.21, N = 2SE +/- 0.00, N = 272.5672.541. (CXX) g++ options: -O3

OpenBenchmarking.orgGFLOP/s, More Is BetterHeFFTe - Highly Efficient FFT for Exascale 2.3Test: r2c - Backend: FFTW - Precision: double - X Y Z: 512ab20406080100SE +/- 0.48, N = 2SE +/- 0.16, N = 274.4774.711. (CXX) g++ options: -O3

OpenBenchmarking.orgGFLOP/s, More Is BetterHeFFTe - Highly Efficient FFT for Exascale 2.3Test: r2c - Backend: Stock - Precision: double - X Y Z: 512ab20406080100SE +/- 0.01, N = 2SE +/- 0.11, N = 276.6176.601. (CXX) g++ options: -O3

Apache IoTDB

OpenBenchmarking.orgAverage Latency, More Is BetterApache IoTDB 1.1.2Device Count: 100 - Batch Size Per Write: 100 - Sensor Count: 500ab163248648069.0873.56MAX: 1049.85MAX: 1309.93

OpenBenchmarking.orgpoint/sec, More Is BetterApache IoTDB 1.1.2Device Count: 100 - Batch Size Per Write: 100 - Sensor Count: 500ab13M26M39M52M65M59041436.6456018457.87

HeFFTe - Highly Efficient FFT for Exascale

HeFFTe is the Highly Efficient FFT for Exascale software developed as part of the Exascale Computing Project. This test profile uses HeFFTe's built-in speed benchmarks under a variety of configuration options and currently catering to CPU/processor testing. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgGFLOP/s, More Is BetterHeFFTe - Highly Efficient FFT for Exascale 2.3Test: c2c - Backend: FFTW - Precision: float - X Y Z: 512ab20406080100SE +/- 0.36, N = 2SE +/- 0.06, N = 278.8378.961. (CXX) g++ options: -O3

Apache IoTDB

OpenBenchmarking.orgAverage Latency, More Is BetterApache IoTDB 1.1.2Device Count: 200 - Batch Size Per Write: 100 - Sensor Count: 200ab71421283529.5431.63MAX: 746.57MAX: 718.08

OpenBenchmarking.orgpoint/sec, More Is BetterApache IoTDB 1.1.2Device Count: 200 - Batch Size Per Write: 100 - Sensor Count: 200ab12M24M36M48M60M54224351.1051199962.11

OpenBenchmarking.orgAverage Latency, More Is BetterApache IoTDB 1.1.2Device Count: 200 - Batch Size Per Write: 1 - Sensor Count: 500ab61218243026.2926.64MAX: 620.79MAX: 636.93

OpenBenchmarking.orgpoint/sec, More Is BetterApache IoTDB 1.1.2Device Count: 200 - Batch Size Per Write: 1 - Sensor Count: 500ab300K600K900K1200K1500K1505080.341469808.89

OpenBenchmarking.orgAverage Latency, More Is BetterApache IoTDB 1.1.2Device Count: 500 - Batch Size Per Write: 1 - Sensor Count: 200ab36912159.499.87MAX: 845.95MAX: 820.85

OpenBenchmarking.orgpoint/sec, More Is BetterApache IoTDB 1.1.2Device Count: 500 - Batch Size Per Write: 1 - Sensor Count: 200ab300K600K900K1200K1500K1576432.251521587.40

VVenC

VVenC is the Fraunhofer Versatile Video Encoder as a fast/efficient H.266/VVC encoder. The vvenc encoder makes use of SIMD Everywhere (SIMDe). The vvenc software is published under the Clear BSD License. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgFrames Per Second, More Is BetterVVenC 1.9Video Input: Bosphorus 1080p - Video Preset: Fasterab714212835SE +/- 0.06, N = 2SE +/- 0.04, N = 230.9530.931. (CXX) g++ options: -O3 -flto -fno-fat-lto-objects -flto=auto

srsRAN Project

srsRAN Project is a complete ORAN-native 5G RAN solution created by Software Radio Systems (SRS). The srsRAN Project radio suite was formerly known as srsLTE and can be used for building your own software-defined radio (SDR) 4G/5G mobile network. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgMbps, More Is BettersrsRAN Project 23.5Test: Downlink Processor Benchmarkab150300450600750SE +/- 5.15, N = 2SE +/- 1.60, N = 2705.8710.91. (CXX) g++ options: -march=native -mfma -O3 -fno-trapping-math -fno-math-errno -lgtest

Apache IoTDB

OpenBenchmarking.orgAverage Latency, More Is BetterApache IoTDB 1.1.2Device Count: 100 - Batch Size Per Write: 100 - Sensor Count: 200ab102030405031.8343.86MAX: 790.74MAX: 2550.76

OpenBenchmarking.orgpoint/sec, More Is BetterApache IoTDB 1.1.2Device Count: 100 - Batch Size Per Write: 100 - Sensor Count: 200ab9M18M27M36M45M43074031.8434191814.86

OpenBenchmarking.orgAverage Latency, More Is BetterApache IoTDB 1.1.2Device Count: 100 - Batch Size Per Write: 1 - Sensor Count: 500ab71421283528.2728.45MAX: 671.77MAX: 664.29

OpenBenchmarking.orgpoint/sec, More Is BetterApache IoTDB 1.1.2Device Count: 100 - Batch Size Per Write: 1 - Sensor Count: 500ab300K600K900K1200K1500K1191500.881185338.02

OpenBenchmarking.orgAverage Latency, More Is BetterApache IoTDB 1.1.2Device Count: 200 - Batch Size Per Write: 1 - Sensor Count: 200ab369121511.8612.18MAX: 573.1MAX: 586.62

OpenBenchmarking.orgpoint/sec, More Is BetterApache IoTDB 1.1.2Device Count: 200 - Batch Size Per Write: 1 - Sensor Count: 200ab200K400K600K800K1000K1045806.811042859.03

OpenBenchmarking.orgAverage Latency, More Is BetterApache IoTDB 1.1.2Device Count: 100 - Batch Size Per Write: 1 - Sensor Count: 200ab4812162014.5814.98MAX: 679.89MAX: 612.21

OpenBenchmarking.orgpoint/sec, More Is BetterApache IoTDB 1.1.2Device Count: 100 - Batch Size Per Write: 1 - Sensor Count: 200ab150K300K450K600K750K710382.44697217.55

libxsmm

Libxsmm is an open-source library for specialized dense and sparse matrix operations and deep learning primitives. Libxsmm supports making use of Intel AMX, AVX-512, and other modern CPU instruction set capabilities. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgGFLOPS/s, More Is Betterlibxsmm 2-1.17-3645M N K: 64ab2004006008001000SE +/- 1.05, N = 2SE +/- 0.20, N = 2833.8839.91. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -msse4.2

HeFFTe - Highly Efficient FFT for Exascale

HeFFTe is the Highly Efficient FFT for Exascale software developed as part of the Exascale Computing Project. This test profile uses HeFFTe's built-in speed benchmarks under a variety of configuration options and currently catering to CPU/processor testing. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgGFLOP/s, More Is BetterHeFFTe - Highly Efficient FFT for Exascale 2.3Test: r2c - Backend: Stock - Precision: float - X Y Z: 512ab306090120150SE +/- 0.00, N = 2SE +/- 0.33, N = 2137.54137.741. (CXX) g++ options: -O3

OpenBenchmarking.orgGFLOP/s, More Is BetterHeFFTe - Highly Efficient FFT for Exascale 2.3Test: r2c - Backend: FFTW - Precision: float - X Y Z: 512ab306090120150SE +/- 0.63, N = 2SE +/- 0.20, N = 2141.41141.191. (CXX) g++ options: -O3

libxsmm

Libxsmm is an open-source library for specialized dense and sparse matrix operations and deep learning primitives. Libxsmm supports making use of Intel AMX, AVX-512, and other modern CPU instruction set capabilities. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgGFLOPS/s, More Is Betterlibxsmm 2-1.17-3645M N K: 32ab100200300400500SE +/- 0.25, N = 2SE +/- 0.15, N = 2440.0444.61. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -msse4.2

srsRAN Project

srsRAN Project is a complete ORAN-native 5G RAN solution created by Software Radio Systems (SRS). The srsRAN Project radio suite was formerly known as srsLTE and can be used for building your own software-defined radio (SDR) 4G/5G mobile network. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgMbps, More Is BettersrsRAN Project 23.5Test: PUSCH Processor Benchmark, Throughput Threadab50100150200250SE +/- 3.55, N = 2SE +/- 0.10, N = 2240.4236.31. (CXX) g++ options: -march=native -mfma -O3 -fno-trapping-math -fno-math-errno -lgtest

HeFFTe - Highly Efficient FFT for Exascale

HeFFTe is the Highly Efficient FFT for Exascale software developed as part of the Exascale Computing Project. This test profile uses HeFFTe's built-in speed benchmarks under a variety of configuration options and currently catering to CPU/processor testing. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgGFLOP/s, More Is BetterHeFFTe - Highly Efficient FFT for Exascale 2.3Test: c2c - Backend: FFTW - Precision: double - X Y Z: 256ab918273645SE +/- 0.25, N = 2SE +/- 0.16, N = 238.9338.521. (CXX) g++ options: -O3

OpenBenchmarking.orgGFLOP/s, More Is BetterHeFFTe - Highly Efficient FFT for Exascale 2.3Test: c2c - Backend: Stock - Precision: double - X Y Z: 256ab918273645SE +/- 0.07, N = 2SE +/- 0.07, N = 238.9638.681. (CXX) g++ options: -O3

OpenBenchmarking.orgGFLOP/s, More Is BetterHeFFTe - Highly Efficient FFT for Exascale 2.3Test: r2c - Backend: FFTW - Precision: double - X Y Z: 256ab1632486480SE +/- 0.44, N = 2SE +/- 0.12, N = 272.2972.201. (CXX) g++ options: -O3

OpenBenchmarking.orgGFLOP/s, More Is BetterHeFFTe - Highly Efficient FFT for Exascale 2.3Test: c2c - Backend: Stock - Precision: float - X Y Z: 256ab20406080100SE +/- 0.48, N = 2SE +/- 0.10, N = 275.0974.931. (CXX) g++ options: -O3

OpenBenchmarking.orgGFLOP/s, More Is BetterHeFFTe - Highly Efficient FFT for Exascale 2.3Test: c2c - Backend: FFTW - Precision: float - X Y Z: 256ab20406080100SE +/- 0.70, N = 2SE +/- 0.08, N = 276.0375.301. (CXX) g++ options: -O3

OpenBenchmarking.orgGFLOP/s, More Is BetterHeFFTe - Highly Efficient FFT for Exascale 2.3Test: r2c - Backend: Stock - Precision: double - X Y Z: 256ab20406080100SE +/- 0.40, N = 2SE +/- 0.65, N = 276.9077.031. (CXX) g++ options: -O3

OpenBenchmarking.orgGFLOP/s, More Is BetterHeFFTe - Highly Efficient FFT for Exascale 2.3Test: r2c - Backend: FFTW - Precision: float - X Y Z: 256ab306090120150SE +/- 3.76, N = 2SE +/- 1.59, N = 2149.83154.051. (CXX) g++ options: -O3

OpenBenchmarking.orgGFLOP/s, More Is BetterHeFFTe - Highly Efficient FFT for Exascale 2.3Test: r2c - Backend: Stock - Precision: float - X Y Z: 256ab4080120160200SE +/- 6.51, N = 2SE +/- 3.21, N = 2157.87164.051. (CXX) g++ options: -O3

OpenBenchmarking.orgGFLOP/s, More Is BetterHeFFTe - Highly Efficient FFT for Exascale 2.3Test: c2c - Backend: Stock - Precision: double - X Y Z: 128ab1122334455SE +/- 0.26, N = 2SE +/- 3.39, N = 246.6449.521. (CXX) g++ options: -O3

OpenBenchmarking.orgGFLOP/s, More Is BetterHeFFTe - Highly Efficient FFT for Exascale 2.3Test: c2c - Backend: FFTW - Precision: double - X Y Z: 128ab1428425670SE +/- 2.73, N = 2SE +/- 2.41, N = 264.4362.301. (CXX) g++ options: -O3

OpenBenchmarking.orgGFLOP/s, More Is BetterHeFFTe - Highly Efficient FFT for Exascale 2.3Test: c2c - Backend: Stock - Precision: float - X Y Z: 128ab20406080100SE +/- 1.30, N = 2SE +/- 0.88, N = 285.7485.491. (CXX) g++ options: -O3

OpenBenchmarking.orgGFLOP/s, More Is BetterHeFFTe - Highly Efficient FFT for Exascale 2.3Test: r2c - Backend: Stock - Precision: double - X Y Z: 128ab20406080100SE +/- 0.90, N = 2SE +/- 0.09, N = 292.4090.991. (CXX) g++ options: -O3

OpenBenchmarking.orgGFLOP/s, More Is BetterHeFFTe - Highly Efficient FFT for Exascale 2.3Test: c2c - Backend: FFTW - Precision: float - X Y Z: 128ab306090120150SE +/- 0.77, N = 2SE +/- 0.61, N = 2131.66130.981. (CXX) g++ options: -O3

OpenBenchmarking.orgGFLOP/s, More Is BetterHeFFTe - Highly Efficient FFT for Exascale 2.3Test: r2c - Backend: FFTW - Precision: double - X Y Z: 128ab306090120150SE +/- 0.56, N = 2SE +/- 1.22, N = 2121.79122.461. (CXX) g++ options: -O3

OpenBenchmarking.orgGFLOP/s, More Is BetterHeFFTe - Highly Efficient FFT for Exascale 2.3Test: r2c - Backend: Stock - Precision: float - X Y Z: 128ab306090120150SE +/- 1.93, N = 2SE +/- 1.24, N = 2149.94151.801. (CXX) g++ options: -O3

OpenBenchmarking.orgGFLOP/s, More Is BetterHeFFTe - Highly Efficient FFT for Exascale 2.3Test: r2c - Backend: FFTW - Precision: float - X Y Z: 128ab50100150200250SE +/- 0.61, N = 2SE +/- 0.19, N = 2207.24206.221. (CXX) g++ options: -O3

164 Results Shown

OpenFOAM:
  drivaerFastback, Medium Mesh Size - Execution Time
  drivaerFastback, Medium Mesh Size - Mesh Time
BRL-CAD
Blender
Timed Linux Kernel Compilation
Timed LLVM Compilation
High Performance Conjugate Gradient
libxsmm
Timed LLVM Compilation
High Performance Conjugate Gradient
Blender
Laghos
High Performance Conjugate Gradient
libxsmm
Blender
Apache Cassandra
Neural Magic DeepSparse:
  BERT-Large, NLP Question Answering - Asynchronous Multi-Stream:
    ms/batch
    items/sec
VVenC
OpenFOAM:
  drivaerFastback, Small Mesh Size - Execution Time
  drivaerFastback, Small Mesh Size - Mesh Time
Palabos
Redis 7.0.12 + memtier_benchmark
Palabos
Redis 7.0.12 + memtier_benchmark
Apache IoTDB:
  500 - 100 - 500:
    Average Latency
    point/sec
Redis 7.0.12 + memtier_benchmark:
  Redis - 50 - 1:5
  Redis - 50 - 1:10
Palabos
Blender
Laghos
Neural Magic DeepSparse:
  BERT-Large, NLP Question Answering, Sparse INT8 - Asynchronous Multi-Stream:
    ms/batch
    items/sec
VVenC
Neural Magic DeepSparse:
  NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Asynchronous Multi-Stream:
    ms/batch
    items/sec
  NLP Text Classification, BERT base uncased SST2 - Asynchronous Multi-Stream:
    ms/batch
    items/sec
  NLP Document Classification, oBERT base uncased on IMDB - Asynchronous Multi-Stream:
    ms/batch
    items/sec
  NLP Token Classification, BERT base uncased conll2003 - Asynchronous Multi-Stream:
    ms/batch
    items/sec
  CV Segmentation, 90% Pruned YOLACT Pruned - Asynchronous Multi-Stream:
    ms/batch
    items/sec
Blender
HeFFTe - Highly Efficient FFT for Exascale:
  c2c - Stock - double - 512
  c2c - FFTW - double - 512
Apache IoTDB:
  200 - 100 - 500:
    Average Latency
    point/sec
Neural Magic DeepSparse:
  NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Asynchronous Multi-Stream:
    ms/batch
    items/sec
  NLP Sentiment Analysis, 80% Pruned Quantized BERT Base Uncased - Asynchronous Multi-Stream:
    ms/batch
    items/sec
Timed PHP Compilation
Timed GDB GNU Debugger Compilation
Timed Linux Kernel Compilation
Neural Magic DeepSparse:
  NLP Text Classification, DistilBERT mnli - Asynchronous Multi-Stream:
    ms/batch
    items/sec
  CV Detection, YOLOv5s COCO, Sparse INT8 - Asynchronous Multi-Stream:
    ms/batch
    items/sec
  CV Detection, YOLOv5s COCO - Asynchronous Multi-Stream:
    ms/batch
    items/sec
VVenC
Neural Magic DeepSparse:
  ResNet-50, Baseline - Asynchronous Multi-Stream:
    ms/batch
    items/sec
  CV Classification, ResNet-50 ImageNet - Asynchronous Multi-Stream:
    ms/batch
    items/sec
  ResNet-50, Sparse INT8 - Asynchronous Multi-Stream:
    ms/batch
    items/sec
Apache IoTDB:
  500 - 100 - 200:
    Average Latency
    point/sec
srsRAN Project
Stress-NG:
  IO_uring
  Atomic
Apache IoTDB:
  500 - 1 - 500:
    Average Latency
    point/sec
Stress-NG:
  CPU Cache
  MMAP
  Cloning
  Malloc
  MEMFD
  Zlib
  Glibc Qsort Data Sorting
  Fused Multiply-Add
  Pthread
  System V Message Passing
  Hash
  Vector Math
Liquid-DSP
Stress-NG:
  Futex
  Socket Activity
  Vector Shuffle
  Matrix 3D Math
  NUMA
  Vector Floating Point
  Pipe
  Wide Vector Math
  x86_64 RdRand
  AVL Tree
  Forking
  CPU Stress
  Glibc C String Functions
  Function Call
  Matrix Math
  SENDFILE
  Crypto
  Mutex
  Context Switching
Liquid-DSP
Stress-NG:
  Floating Point
  Memory Copying
  Semaphores
  Poll
Liquid-DSP:
  16 - 256 - 512
  64 - 256 - 57
  64 - 256 - 32
  32 - 256 - 57
  32 - 256 - 32
  16 - 256 - 57
  16 - 256 - 32
HeFFTe - Highly Efficient FFT for Exascale:
  c2c - Stock - float - 512
  r2c - FFTW - double - 512
  r2c - Stock - double - 512
Apache IoTDB:
  100 - 100 - 500:
    Average Latency
    point/sec
HeFFTe - Highly Efficient FFT for Exascale
Apache IoTDB:
  200 - 100 - 200:
    Average Latency
    point/sec
  200 - 1 - 500:
    Average Latency
    point/sec
  500 - 1 - 200:
    Average Latency
    point/sec
VVenC
srsRAN Project
Apache IoTDB:
  100 - 100 - 200:
    Average Latency
    point/sec
  100 - 1 - 500:
    Average Latency
    point/sec
  200 - 1 - 200:
    Average Latency
    point/sec
  100 - 1 - 200:
    Average Latency
    point/sec
libxsmm
HeFFTe - Highly Efficient FFT for Exascale:
  r2c - Stock - float - 512
  r2c - FFTW - float - 512
libxsmm
srsRAN Project
HeFFTe - Highly Efficient FFT for Exascale:
  c2c - FFTW - double - 256
  c2c - Stock - double - 256
  r2c - FFTW - double - 256
  c2c - Stock - float - 256
  c2c - FFTW - float - 256
  r2c - Stock - double - 256
  r2c - FFTW - float - 256
  r2c - Stock - float - 256
  c2c - Stock - double - 128
  c2c - FFTW - double - 128
  c2c - Stock - float - 128
  r2c - Stock - double - 128
  c2c - FFTW - float - 128
  r2c - FFTW - double - 128
  r2c - Stock - float - 128
  r2c - FFTW - float - 128