new xeon
Tests for a future article. Intel Xeon Gold 6421N testing with a Quanta Cloud S6Q-MB-MPS (3A10.uh BIOS) and ASPEED on Ubuntu 22.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2307315-NE-NEWXEON9432&grs&sor .
new xeon: system details (a, b)
Processor: Intel Xeon Gold 6421N @ 3.60GHz (32 Cores / 64 Threads)
Motherboard: Quanta Cloud S6Q-MB-MPS (3A10.uh BIOS)
Chipset: Intel Device 1bce
Memory: 512GB
Disk: 3 x 3841GB Micron_9300_MTFDHAL3T8TDP
Graphics: ASPEED
Monitor: VGA HDMI
Network: 4 x Intel E810-C for QSFP
OS: Ubuntu 22.04
Kernel: 5.15.0-47-generic (x86_64)
Desktop: GNOME Shell 42.4
Display Server: X Server 1.21.1.3
Vulkan: 1.2.204
Compiler: GCC 11.2.0
File-System: ext4
Screen Resolution: 1600x1200
Kernel Details - Transparent Huge Pages: madvise
Compiler Details - --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-11-gBFGDP/gcc-11-11.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-gBFGDP/gcc-11-11.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Details - Scaling Governor: intel_pstate performance (EPP: performance) - CPU Microcode: 0x2b0000c0
Java Details - OpenJDK Runtime Environment (build 11.0.16+8-post-Ubuntu-0ubuntu122.04)
Python Details - Python 3.10.6
Security Details - itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced IBRS IBPB: conditional RSB filling + srbds: Not affected + tsx_async_abort: Not affected
new xeon: result overview
Test | a | b
apache-iotdb: 100 - 100 - 200 | 31.83 | 43.86
apache-iotdb: 100 - 100 - 200 | 43074031.84 | 34191814.86
stress-ng: CPU Cache | 1537111.20 | 1885833.11
libxsmm: 256 | 879.6 | 758.9
apache-iotdb: 200 - 100 - 200 | 29.54 | 31.63
apache-iotdb: 100 - 100 - 500 | 69.08 | 73.56
apache-iotdb: 500 - 1 - 500 | 22.97 | 21.63
heffte: c2c - Stock - double - 128 | 46.6357 | 49.5230
deepsparse: BERT-Large, NLP Question Answering - Asynchronous Multi-Stream | 35.1529 | 37.3253
memtier-benchmark: Redis - 100 - 1:10 | 2447092.01 | 2304730.19
apache-iotdb: 200 - 100 - 200 | 54224351.1 | 51199962.11
deepsparse: BERT-Large, NLP Question Answering - Asynchronous Multi-Stream | 453.4802 | 428.6695
apache-iotdb: 100 - 100 - 500 | 59041436.64 | 56018457.87
apache-iotdb: 500 - 1 - 500 | 1916642.9 | 2009050.46
stress-ng: Cloning | 9740.57 | 9326.09
deepsparse: NLP Text Classification, BERT base uncased SST2 - Asynchronous Multi-Stream | 137.3780 | 143.4387
deepsparse: NLP Text Classification, BERT base uncased SST2 - Asynchronous Multi-Stream | 116.3761 | 111.4976
apache-iotdb: 500 - 1 - 200 | 9.49 | 9.87
heffte: r2c - Stock - float - 256 | 157.867 | 164.047
apache-iotdb: 500 - 1 - 200 | 1576432.25 | 1521587.4
heffte: c2c - FFTW - double - 128 | 64.4263 | 62.2974
stress-ng: Futex | 1541676.36 | 1492979.46
srsran: PUSCH Processor Benchmark, Throughput Total | 5372.9 | 5543.7
stress-ng: Pipe | 35837711.85 | 36852791.12
heffte: r2c - FFTW - float - 256 | 149.825 | 154.053
apache-iotdb: 100 - 1 - 200 | 14.58 | 14.98
apache-iotdb: 200 - 1 - 200 | 11.86 | 12.18
stress-ng: SENDFILE | 582724.63 | 598173.56
memtier-benchmark: Redis - 100 - 1:5 | 2285996.17 | 2227152.02
stress-ng: Matrix Math | 160653.44 | 156668.43
apache-iotdb: 500 - 100 - 500 | 67607191.64 | 65935725.67
apache-iotdb: 200 - 100 - 500 | 101.25 | 98.87
apache-iotdb: 200 - 1 - 500 | 1505080.34 | 1469808.89
apache-iotdb: 200 - 100 - 500 | 45677447.24 | 46726912.46
liquid-dsp: 16 - 256 - 512 | 243940000 | 248820000
apache-iotdb: 100 - 1 - 200 | 710382.44 | 697217.55
deepsparse: NLP Token Classification, BERT base uncased conll2003 - Asynchronous Multi-Stream | 33.9358 | 34.5447
deepsparse: NLP Token Classification, BERT base uncased conll2003 - Asynchronous Multi-Stream | 468.8046 | 460.6707
srsran: PUSCH Processor Benchmark, Throughput Thread | 240.4 | 236.3
stress-ng: IO_uring | 1529665.98 | 1503623.79
liquid-dsp: 16 - 256 - 57 | 848435000 | 862195000
heffte: r2c - Stock - double - 128 | 92.3973 | 90.9851
deepsparse: NLP Text Classification, DistilBERT mnli - Asynchronous Multi-Stream | 295.8295 | 299.9277
deepsparse: NLP Text Classification, DistilBERT mnli - Asynchronous Multi-Stream | 54.0674 | 53.3291
apache-iotdb: 500 - 100 - 200 | 56894390.61 | 56137174.7
stress-ng: Socket Activity | 24947.14 | 25282.31
apache-iotdb: 200 - 1 - 500 | 26.29 | 26.64
liquid-dsp: 32 - 256 - 512 | 383555000 | 378650000
vvenc: Bosphorus 4K - Fast | 5.842 | 5.917
build-llvm: Unix Makefiles | 323.856 | 319.852
heffte: r2c - Stock - float - 128 | 149.935 | 151.803
deepsparse: CV Detection, YOLOv5s COCO, Sparse INT8 - Asynchronous Multi-Stream | 208.8471 | 211.2270
deepsparse: CV Detection, YOLOv5s COCO, Sparse INT8 - Asynchronous Multi-Stream | 76.5807 | 75.7218
libxsmm: 128 | 1211.8 | 1225.0
heffte: c2c - FFTW - double - 256 | 38.9304 | 38.5182
libxsmm: 32 | 440.0 | 444.6
memtier-benchmark: Redis - 50 - 1:10 | 2316281.26 | 2293467.62
heffte: c2c - FFTW - float - 256 | 76.0299 | 75.3001
vvenc: Bosphorus 1080p - Fast | 16.100 | 16.249
stress-ng: Atomic | 133.83 | 132.61
stress-ng: Semaphores | 62126446.21 | 61651485.43
heffte: c2c - Stock - double - 256 | 38.9613 | 38.6757
libxsmm: 64 | 833.8 | 839.9
srsran: Downlink Processor Benchmark | 705.8 | 710.9
apache-iotdb: 100 - 1 - 500 | 28.27 | 28.45
stress-ng: MMAP | 861.28 | 856.14
heffte: r2c - FFTW - double - 128 | 121.794 | 122.460
palabos: 400 | 287.268 | 285.761
apache-iotdb: 100 - 1 - 500 | 1191500.88 | 1185338.02
heffte: c2c - FFTW - float - 128 | 131.656 | 130.982
heffte: r2c - FFTW - float - 128 | 207.244 | 206.217
laghos: Triple Point Problem | 177.78 | 176.92
apache-iotdb: 500 - 100 - 500 | 68.34 | 68.01
deepsparse: CV Segmentation, 90% Pruned YOLACT Pruned - Asynchronous Multi-Stream | 345.1491 | 343.5170
deepsparse: CV Segmentation, 90% Pruned YOLACT Pruned - Asynchronous Multi-Stream | 46.3307 | 46.5491
stress-ng: Fused Multiply-Add | 34197705.63 | 34050669.23
hpcg: 160 160 160 - 60 | 27.5086 | 27.3978
stress-ng: Function Call | 22028.03 | 22106.49
apache-iotdb: 500 - 100 - 200 | 31.58 | 31.69
heffte: r2c - FFTW - double - 512 | 74.4734 | 74.7148
liquid-dsp: 32 - 256 - 57 | 1328100000 | 1323900000
stress-ng: NUMA | 390.87 | 392.08
stress-ng: Mutex | 15147444.51 | 15192892.59
heffte: c2c - Stock - float - 128 | 85.7398 | 85.4850
deepsparse: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Asynchronous Multi-Stream | 131.4497 | 131.0664
deepsparse: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Asynchronous Multi-Stream | 121.6893 | 122.0367
stress-ng: Wide Vector Math | 1745029.27 | 1750003.43
apache-iotdb: 200 - 1 - 200 | 1045806.81 | 1042859.03
liquid-dsp: 64 - 256 - 57 | 1728850000 | 1733700000
deepsparse: NLP Sentiment Analysis, 80% Pruned Quantized BERT Base Uncased - Asynchronous Multi-Stream | 390.9076 | 391.9125
deepsparse: NLP Sentiment Analysis, 80% Pruned Quantized BERT Base Uncased - Asynchronous Multi-Stream | 40.9109 | 40.8061
vvenc: Bosphorus 4K - Faster | 11.020 | 10.992
memtier-benchmark: Redis - 50 - 1:5 | 2211638.65 | 2217192.12
build-gdb: Time To Compile | 41.905 | 42.006
stress-ng: Glibc C String Functions | 26067360.60 | 26125214.84
hpcg: 104 104 104 - 60 | 27.7808 | 27.8405
heffte: c2c - Stock - float - 256 | 75.0892 | 74.9286
openfoam: drivaerFastback, Small Mesh Size - Execution Time | 67.707331 | 67.563163
deepsparse: ResNet-50, Sparse INT8 - Asynchronous Multi-Stream | 3227.0954 | 3233.9588
deepsparse: ResNet-50, Sparse INT8 - Asynchronous Multi-Stream | 4.9416 | 4.9312
heffte: c2c - Stock - double - 512 | 40.7438 | 40.6648
palabos: 500 | 300.276 | 300.855
heffte: r2c - Stock - double - 256 | 76.9042 | 77.0345
heffte: c2c - FFTW - float - 512 | 78.8291 | 78.9605
openfoam: drivaerFastback, Medium Mesh Size - Mesh Time | 144.69646 | 144.93674
heffte: r2c - FFTW - float - 512 | 141.41 | 141.193
deepsparse: ResNet-50, Baseline - Asynchronous Multi-Stream | 479.7876 | 480.5223
laghos: Sedov Blast Wave, ube_922_hex.mesh | 216.86 | 217.19
deepsparse: ResNet-50, Baseline - Asynchronous Multi-Stream | 33.3278 | 33.2781
blender: BMW27 - CPU-Only | 47.15 | 47.22
heffte: r2c - Stock - float - 512 | 137.536 | 137.740
stress-ng: AVL Tree | 294.26 | 294.66
palabos: 100 | 235.186 | 234.874
stress-ng: Floating Point | 10587.48 | 10601.10
liquid-dsp: 16 - 256 - 32 | 557945000 | 558655000
heffte: r2c - FFTW - double - 256 | 72.2893 | 72.1981
stress-ng: Malloc | 99373474.31 | 99251227.28
stress-ng: Hash | 5577252.32 | 5583978.14
deepsparse: CV Detection, YOLOv5s COCO - Asynchronous Multi-Stream | 76.5597 | 76.4684
hpcg: 144 144 144 - 60 | 27.4213 | 27.3890
deepsparse: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Asynchronous Multi-Stream | 1074.8218 | 1075.9571
deepsparse: BERT-Large, NLP Question Answering, Sparse INT8 - Asynchronous Multi-Stream | 504.6114 | 505.1309
build-llvm: Ninja | 263.154 | 262.884
stress-ng: Pthread | 136846.01 | 136709.81
blender: Fishy Cat - CPU-Only | 64.07 | 64.01
heffte: c2c - FFTW - double - 512 | 43.9665 | 44.0064
openfoam: drivaerFastback, Medium Mesh Size - Execution Time | 615.99074 | 615.46018
deepsparse: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Asynchronous Multi-Stream | 14.8600 | 14.8473
deepsparse: BERT-Large, NLP Question Answering, Sparse INT8 - Asynchronous Multi-Stream | 31.6750 | 31.6488
build-php: Time To Compile | 42.351 | 42.382
stress-ng: MEMFD | 549.94 | 549.55
liquid-dsp: 32 - 256 - 32 | 847085000 | 847675000
stress-ng: Context Switching | 2572801.75 | 2571092.69
deepsparse: NLP Document Classification, oBERT base uncased on IMDB - Asynchronous Multi-Stream | 34.5311 | 34.5539
deepsparse: CV Classification, ResNet-50 ImageNet - Asynchronous Multi-Stream | 478.9108 | 479.2241
deepsparse: CV Classification, ResNet-50 ImageNet - Asynchronous Multi-Stream | 33.3894 | 33.3680
stress-ng: Poll | 3669281.69 | 3671617.97
vvenc: Bosphorus 1080p - Faster | 30.946 | 30.927
stress-ng: Memory Copying | 7176.19 | 7180.43
openfoam: drivaerFastback, Small Mesh Size - Mesh Time | 27.965214 | 27.948717
stress-ng: Matrix 3D Math | 9599.93 | 9605.30
stress-ng: Forking | 89918.21 | 89966.29
deepsparse: CV Detection, YOLOv5s COCO - Asynchronous Multi-Stream | 208.8975 | 208.9908
stress-ng: Glibc Qsort Data Sorting | 696.65 | 696.92
stress-ng: Zlib | 2647.81 | 2648.81
stress-ng: System V Message Passing | 5852281.71 | 5854201.78
blender: Barbershop - CPU-Only | 493.45 | 493.61
build-linux-kernel: defconfig | 40.438 | 40.451
heffte: c2c - Stock - float - 512 | 72.5609 | 72.5391
stress-ng: Vector Math | 151386.31 | 151431.15
liquid-dsp: 64 - 256 - 32 | 1577300000 | 1576850000
liquid-dsp: 64 - 256 - 512 | 513135000 | 513040000
stress-ng: Vector Floating Point | 58243.38 | 58232.70
blender: Classroom - CPU-Only | 127.78 | 127.76
stress-ng: CPU Stress | 64111.11 | 64118.87
heffte: r2c - Stock - double - 512 | 76.6110 | 76.6041
stress-ng: Crypto | 50240.09 | 50243.48
deepsparse: NLP Document Classification, oBERT base uncased on IMDB - Asynchronous Multi-Stream | 460.7818 | 460.7588
stress-ng: x86_64 RdRand | 331416.52 | 331423.04
stress-ng: Vector Shuffle | 167204.21 | 167202.07
build-linux-kernel: allmodconfig | 445.385 | 445.380
brl-cad: VGR Performance Metric | 466686 |
cassandra: Writes | 155626 |
blender: Pabellon Barcelona - CPU-Only | 159.94 |
memtier-benchmark: Redis - 500 - 1:5 | |
Apache IoTDB 1.1.2 - Device Count: 100 - Batch Size Per Write: 100 - Sensor Count: 200. Average Latency, More Is Better. b: 43.86 (MAX: 2550.76); a: 31.83 (MAX: 790.74).
Apache IoTDB 1.1.2 - Device Count: 100 - Batch Size Per Write: 100 - Sensor Count: 200. point/sec, More Is Better. a: 43074031.84; b: 34191814.86.
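The gap between the two runs in the point/sec result above is easier to judge as a percentage. The following is a minimal sketch, not part of the Phoronix Test Suite output; the helper name `percent_difference` is hypothetical, and the two values are copied from the Apache IoTDB 100 / 100 / 200 point/sec result.

```python
def percent_difference(a: float, b: float) -> float:
    """Return how much larger (or smaller) a is than b, as a percentage of b."""
    return (a - b) / b * 100.0


# Values copied from the Apache IoTDB 100 / 100 / 200 point/sec result.
run_a = 43074031.84
run_b = 34191814.86

# Positive means run a reported the higher throughput.
print(f"a vs b: {percent_difference(run_a, run_b):+.1f}%")
```

For these two values the difference comes out at roughly 26% in favour of run a, which is large for two runs on the same hardware and software configuration.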
Stress-NG 0.15.10 - Test: CPU Cache. Bogo Ops/s, More Is Better. b: 1885833.11 (SE +/- 234949.06, N = 2); a: 1537111.20 (SE +/- 31294.95, N = 2). (CXX) g++ options: -O2 -std=gnu99 -lc
libxsmm 2-1.17-3645 - M N K: 256. GFLOPS/s, More Is Better. a: 879.6 (SE +/- 0.65, N = 2); b: 758.9 (SE +/- 5.75, N = 2). (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -msse4.2
Apache IoTDB 1.1.2 - Device Count: 200 - Batch Size Per Write: 100 - Sensor Count: 200. Average Latency, More Is Better. b: 31.63 (MAX: 718.08); a: 29.54 (MAX: 746.57).
Apache IoTDB 1.1.2 - Device Count: 100 - Batch Size Per Write: 100 - Sensor Count: 500. Average Latency, More Is Better. b: 73.56 (MAX: 1309.93); a: 69.08 (MAX: 1049.85).
Apache IoTDB 1.1.2 - Device Count: 500 - Batch Size Per Write: 1 - Sensor Count: 500. Average Latency, More Is Better. a: 22.97 (MAX: 864.74); b: 21.63 (MAX: 867.44).
HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: c2c - Backend: Stock - Precision: double - X Y Z: 128. GFLOP/s, More Is Better. b: 49.52 (SE +/- 3.39, N = 2); a: 46.64 (SE +/- 0.26, N = 2). (CXX) g++ options: -O3
Neural Magic DeepSparse 1.5 - Model: BERT-Large, NLP Question Answering - Scenario: Asynchronous Multi-Stream. items/sec, More Is Better. b: 37.33 (SE +/- 0.38, N = 2); a: 35.15 (SE +/- 0.01, N = 2).
Redis 7.0.12 + memtier_benchmark 2.0 - Protocol: Redis - Clients: 100 - Set To Get Ratio: 1:10. Ops/sec, More Is Better. a: 2447092.01 (SE +/- 114392.77, N = 2); b: 2304730.19 (SE +/- 12975.09, N = 2). (CXX) g++ options: -O2 -levent_openssl -levent -lcrypto -lssl -lpthread -lz -lpcre
Apache IoTDB 1.1.2 - Device Count: 200 - Batch Size Per Write: 100 - Sensor Count: 200. point/sec, More Is Better. a: 54224351.10; b: 51199962.11.
Neural Magic DeepSparse 1.5 - Model: BERT-Large, NLP Question Answering - Scenario: Asynchronous Multi-Stream. ms/batch, Fewer Is Better. b: 428.67 (SE +/- 4.41, N = 2); a: 453.48 (SE +/- 0.26, N = 2).
Apache IoTDB 1.1.2 - Device Count: 100 - Batch Size Per Write: 100 - Sensor Count: 500. point/sec, More Is Better. a: 59041436.64; b: 56018457.87.
Apache IoTDB 1.1.2 - Device Count: 500 - Batch Size Per Write: 1 - Sensor Count: 500. point/sec, More Is Better. b: 2009050.46; a: 1916642.90.
Stress-NG 0.15.10 - Test: Cloning. Bogo Ops/s, More Is Better. a: 9740.57 (SE +/- 114.33, N = 2); b: 9326.09 (SE +/- 100.16, N = 2). (CXX) g++ options: -O2 -std=gnu99 -lc
Neural Magic DeepSparse 1.5 - Model: NLP Text Classification, BERT base uncased SST2 - Scenario: Asynchronous Multi-Stream. items/sec, More Is Better. b: 143.44 (SE +/- 0.80, N = 2); a: 137.38 (SE +/- 4.10, N = 2).
Neural Magic DeepSparse 1.5 - Model: NLP Text Classification, BERT base uncased SST2 - Scenario: Asynchronous Multi-Stream. ms/batch, Fewer Is Better. b: 111.50 (SE +/- 0.59, N = 2); a: 116.38 (SE +/- 3.45, N = 2).
Apache IoTDB 1.1.2 - Device Count: 500 - Batch Size Per Write: 1 - Sensor Count: 200. Average Latency, More Is Better. b: 9.87 (MAX: 820.85); a: 9.49 (MAX: 845.95).
HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: r2c - Backend: Stock - Precision: float - X Y Z: 256. GFLOP/s, More Is Better. b: 164.05 (SE +/- 3.21, N = 2); a: 157.87 (SE +/- 6.51, N = 2). (CXX) g++ options: -O3
Apache IoTDB 1.1.2 - Device Count: 500 - Batch Size Per Write: 1 - Sensor Count: 200. point/sec, More Is Better. a: 1576432.25; b: 1521587.40.
HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: c2c - Backend: FFTW - Precision: double - X Y Z: 128. GFLOP/s, More Is Better. a: 64.43 (SE +/- 2.73, N = 2); b: 62.30 (SE +/- 2.41, N = 2). (CXX) g++ options: -O3
Stress-NG 0.15.10 - Test: Futex. Bogo Ops/s, More Is Better. a: 1541676.36 (SE +/- 56630.43, N = 2); b: 1492979.46 (SE +/- 45385.58, N = 2). (CXX) g++ options: -O2 -std=gnu99 -lc
srsRAN Project 23.5 - Test: PUSCH Processor Benchmark, Throughput Total. Mbps, More Is Better. b: 5543.7 (SE +/- 95.40, N = 2); a: 5372.9 (SE +/- 143.30, N = 2). (CXX) g++ options: -march=native -mfma -O3 -fno-trapping-math -fno-math-errno -lgtest
Stress-NG 0.15.10 - Test: Pipe. Bogo Ops/s, More Is Better. b: 36852791.12 (SE +/- 79631.10, N = 2); a: 35837711.85 (SE +/- 1105250.10, N = 2). (CXX) g++ options: -O2 -std=gnu99 -lc
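Several results above report a standard error (SE) that is large relative to the gap between runs a and b. A rough way to read such results is to compare the gap against the combined standard error of the two runs. The sketch below is an illustration only: the helper `within_noise` and the threshold `k = 2.0` are assumptions rather than anything the Phoronix Test Suite computes, and with N = 2 per run the check is a rule of thumb, not a significance test. All numbers are copied from the Stress-NG Pipe and libxsmm M N K: 256 results.

```python
import math


def within_noise(x: float, se_x: float, y: float, se_y: float, k: float = 2.0) -> bool:
    """Return True when |x - y| is below k combined standard errors.

    The combined SE is sqrt(se_x**2 + se_y**2), so it does not matter
    which of the two SE values belongs to which run.
    """
    combined_se = math.hypot(se_x, se_y)
    return abs(x - y) < k * combined_se


# Stress-NG Pipe: the gap is smaller than the run-to-run spread.
pipe_is_noise = within_noise(36852791.12, 79631.10, 35837711.85, 1105250.10)

# libxsmm M N K: 256: the gap is far larger than the spread.
libxsmm_is_noise = within_noise(879.6, 0.65, 758.9, 5.75)

print("Stress-NG Pipe within noise:", pipe_is_noise)
print("libxsmm 256 within noise:", libxsmm_is_noise)
```

Under this assumed threshold the Pipe difference is indistinguishable from run-to-run variation, while the libxsmm 256 difference is not.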
HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: r2c - Backend: FFTW - Precision: float - X Y Z: 256. GFLOP/s, More Is Better. b: 154.05 (SE +/- 1.59, N = 2); a: 149.83 (SE +/- 3.76, N = 2). (CXX) g++ options: -O3
Apache IoTDB 1.1.2 - Device Count: 100 - Batch Size Per Write: 1 - Sensor Count: 200. Average Latency, More Is Better. b: 14.98 (MAX: 612.21); a: 14.58 (MAX: 679.89).
Apache IoTDB 1.1.2 - Device Count: 200 - Batch Size Per Write: 1 - Sensor Count: 200. Average Latency, More Is Better. b: 12.18 (MAX: 586.62); a: 11.86 (MAX: 573.1).
Stress-NG 0.15.10 - Test: SENDFILE. Bogo Ops/s, More Is Better. b: 598173.56 (SE +/- 243.97, N = 2); a: 582724.63 (SE +/- 6799.74, N = 2). (CXX) g++ options: -O2 -std=gnu99 -lc
Redis 7.0.12 + memtier_benchmark 2.0 - Protocol: Redis - Clients: 100 - Set To Get Ratio: 1:5. Ops/sec, More Is Better. a: 2285996.17 (SE +/- 6000.63, N = 2); b: 2227152.02 (SE +/- 3990.38, N = 2). (CXX) g++ options: -O2 -levent_openssl -levent -lcrypto -lssl -lpthread -lz -lpcre
Stress-NG 0.15.10 - Test: Matrix Math. Bogo Ops/s, More Is Better. a: 160653.44 (SE +/- 2867.57, N = 2); b: 156668.43 (SE +/- 332.46, N = 2). (CXX) g++ options: -O2 -std=gnu99 -lc
Apache IoTDB 1.1.2 - Device Count: 500 - Batch Size Per Write: 100 - Sensor Count: 500. point/sec, More Is Better. a: 67607191.64; b: 65935725.67.
Apache IoTDB 1.1.2 - Device Count: 200 - Batch Size Per Write: 100 - Sensor Count: 500. Average Latency, More Is Better. a: 101.25 (MAX: 3631.89); b: 98.87 (MAX: 3564.64).
Apache IoTDB 1.1.2 - Device Count: 200 - Batch Size Per Write: 1 - Sensor Count: 500. point/sec, More Is Better. a: 1505080.34; b: 1469808.89.
Apache IoTDB 1.1.2 - Device Count: 200 - Batch Size Per Write: 100 - Sensor Count: 500. point/sec, More Is Better. b: 46726912.46; a: 45677447.24.
Liquid-DSP 1.6 - Threads: 16 - Buffer Length: 256 - Filter Length: 512. samples/s, More Is Better. b: 248820000 (SE +/- 3170000.00, N = 2); a: 243940000 (SE +/- 1950000.00, N = 2). (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Apache IoTDB 1.1.2 - Device Count: 100 - Batch Size Per Write: 1 - Sensor Count: 200. point/sec, More Is Better. a: 710382.44; b: 697217.55.
Neural Magic DeepSparse 1.5 - Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Asynchronous Multi-Stream. items/sec, More Is Better. b: 34.54 (SE +/- 0.03, N = 2); a: 33.94 (SE +/- 0.07, N = 2).
Neural Magic DeepSparse 1.5 - Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Asynchronous Multi-Stream. ms/batch, Fewer Is Better. b: 460.67 (SE +/- 0.20, N = 2); a: 468.80 (SE +/- 1.46, N = 2).
srsRAN Project 23.5 - Test: PUSCH Processor Benchmark, Throughput Thread. Mbps, More Is Better. a: 240.4 (SE +/- 3.55, N = 2); b: 236.3 (SE +/- 0.10, N = 2). (CXX) g++ options: -march=native -mfma -O3 -fno-trapping-math -fno-math-errno -lgtest
Stress-NG 0.15.10 - Test: IO_uring. Bogo Ops/s, More Is Better. a: 1529665.98 (SE +/- 22482.34, N = 2); b: 1503623.79 (SE +/- 5229.94, N = 2). (CXX) g++ options: -O2 -std=gnu99 -lc
Liquid-DSP 1.6 - Threads: 16 - Buffer Length: 256 - Filter Length: 57. samples/s, More Is Better. b: 862195000 (SE +/- 695000.00, N = 2); a: 848435000 (SE +/- 14365000.00, N = 2). (CC) gcc options: -O3 -pthread -lm -lc -lliquid
HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: r2c - Backend: Stock - Precision: double - X Y Z: 128. GFLOP/s, More Is Better. a: 92.40 (SE +/- 0.90, N = 2); b: 90.99 (SE +/- 0.09, N = 2). (CXX) g++ options: -O3
Neural Magic DeepSparse 1.5 - Model: NLP Text Classification, DistilBERT mnli - Scenario: Asynchronous Multi-Stream. items/sec, More Is Better. b: 299.93 (SE +/- 0.51, N = 2); a: 295.83 (SE +/- 0.05, N = 2).
Neural Magic DeepSparse 1.5 - Model: NLP Text Classification, DistilBERT mnli - Scenario: Asynchronous Multi-Stream. ms/batch, Fewer Is Better. b: 53.33 (SE +/- 0.09, N = 2); a: 54.07 (SE +/- 0.01, N = 2).
Apache IoTDB 1.1.2 - Device Count: 500 - Batch Size Per Write: 100 - Sensor Count: 200. point/sec, More Is Better. a: 56894390.61; b: 56137174.70.
Stress-NG 0.15.10 - Test: Socket Activity. Bogo Ops/s, More Is Better. b: 25282.31 (SE +/- 267.39, N = 2); a: 24947.14 (SE +/- 72.57, N = 2). (CXX) g++ options: -O2 -std=gnu99 -lc
Apache IoTDB 1.1.2 - Device Count: 200 - Batch Size Per Write: 1 - Sensor Count: 500. Average Latency, More Is Better. b: 26.64 (MAX: 636.93); a: 26.29 (MAX: 620.79).
Liquid-DSP 1.6 - Threads: 32 - Buffer Length: 256 - Filter Length: 512. samples/s, More Is Better. a: 383555000 (SE +/- 1955000.00, N = 2); b: 378650000 (SE +/- 4920000.00, N = 2). (CC) gcc options: -O3 -pthread -lm -lc -lliquid
VVenC 1.9 - Video Input: Bosphorus 4K - Video Preset: Fast. Frames Per Second, More Is Better. b: 5.917 (SE +/- 0.015, N = 2); a: 5.842 (SE +/- 0.074, N = 2). (CXX) g++ options: -O3 -flto -fno-fat-lto-objects -flto=auto
Timed LLVM Compilation 16.0 - Build System: Unix Makefiles. Seconds, Fewer Is Better. b: 319.85 (SE +/- 5.88, N = 2); a: 323.86 (SE +/- 5.08, N = 2).
HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: r2c - Backend: Stock - Precision: float - X Y Z: 128. GFLOP/s, More Is Better. b: 151.80 (SE +/- 1.24, N = 2); a: 149.94 (SE +/- 1.93, N = 2). (CXX) g++ options: -O3
Neural Magic DeepSparse 1.5 - Model: CV Detection, YOLOv5s COCO, Sparse INT8 - Scenario: Asynchronous Multi-Stream. items/sec, More Is Better. b: 211.23 (SE +/- 0.12, N = 2); a: 208.85 (SE +/- 0.34, N = 2).
Neural Magic DeepSparse 1.5 - Model: CV Detection, YOLOv5s COCO, Sparse INT8 - Scenario: Asynchronous Multi-Stream. ms/batch, Fewer Is Better. b: 75.72 (SE +/- 0.04, N = 2); a: 76.58 (SE +/- 0.13, N = 2).
libxsmm 2-1.17-3645 - M N K: 128. GFLOPS/s, More Is Better. b: 1225.0 (SE +/- 1.10, N = 2); a: 1211.8 (SE +/- 4.60, N = 2). (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -msse4.2
HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: c2c - Backend: FFTW - Precision: double - X Y Z: 256. GFLOP/s, More Is Better. a: 38.93 (SE +/- 0.25, N = 2); b: 38.52 (SE +/- 0.16, N = 2). (CXX) g++ options: -O3
libxsmm 2-1.17-3645 - M N K: 32. GFLOPS/s, More Is Better. b: 444.6 (SE +/- 0.15, N = 2); a: 440.0 (SE +/- 0.25, N = 2). (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -msse4.2
Redis 7.0.12 + memtier_benchmark 2.0 - Protocol: Redis - Clients: 50 - Set To Get Ratio: 1:10. Ops/sec, More Is Better. a: 2316281.26 (SE +/- 13610.76, N = 2); b: 2293467.62 (SE +/- 4548.93, N = 2). (CXX) g++ options: -O2 -levent_openssl -levent -lcrypto -lssl -lpthread -lz -lpcre
HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: c2c - Backend: FFTW - Precision: float - X Y Z: 256. GFLOP/s, More Is Better. a: 76.03 (SE +/- 0.70, N = 2); b: 75.30 (SE +/- 0.08, N = 2). (CXX) g++ options: -O3
VVenC 1.9 - Video Input: Bosphorus 1080p - Video Preset: Fast. Frames Per Second, More Is Better. b: 16.25 (SE +/- 0.02, N = 2); a: 16.10 (SE +/- 0.17, N = 2). (CXX) g++ options: -O3 -flto -fno-fat-lto-objects -flto=auto
Stress-NG 0.15.10 - Test: Atomic. Bogo Ops/s, More Is Better. a: 133.83 (SE +/- 1.05, N = 2); b: 132.61 (SE +/- 0.20, N = 2). (CXX) g++ options: -O2 -std=gnu99 -lc
Stress-NG 0.15.10 - Test: Semaphores. Bogo Ops/s, More Is Better. a: 62126446.21 (SE +/- 2077286.42, N = 2); b: 61651485.43 (SE +/- 466593.23, N = 2). (CXX) g++ options: -O2 -std=gnu99 -lc
HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: c2c - Backend: Stock - Precision: double - X Y Z: 256. GFLOP/s, More Is Better. a: 38.96 (SE +/- 0.07, N = 2); b: 38.68 (SE +/- 0.07, N = 2). (CXX) g++ options: -O3
libxsmm 2-1.17-3645 - M N K: 64. GFLOPS/s, More Is Better. b: 839.9 (SE +/- 0.20, N = 2); a: 833.8 (SE +/- 1.05, N = 2). (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -msse4.2
srsRAN Project 23.5 - Test: Downlink Processor Benchmark. Mbps, More Is Better. b: 710.9 (SE +/- 1.60, N = 2); a: 705.8 (SE +/- 5.15, N = 2). (CXX) g++ options: -march=native -mfma -O3 -fno-trapping-math -fno-math-errno -lgtest
Apache IoTDB 1.1.2 - Device Count: 100 - Batch Size Per Write: 1 - Sensor Count: 500. Average Latency, More Is Better. b: 28.45 (MAX: 664.29); a: 28.27 (MAX: 671.77).
Stress-NG 0.15.10 - Test: MMAP. Bogo Ops/s, More Is Better. a: 861.28 (SE +/- 3.32, N = 2); b: 856.14 (SE +/- 2.06, N = 2). (CXX) g++ options: -O2 -std=gnu99 -lc
HeFFTe - Highly Efficient FFT for Exascale Test: r2c - Backend: FFTW - Precision: double - X Y Z: 128 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: r2c - Backend: FFTW - Precision: double - X Y Z: 128 b a 30 60 90 120 150 SE +/- 1.22, N = 2 SE +/- 0.56, N = 2 122.46 121.79 1. (CXX) g++ options: -O3
Palabos 2.3 - Grid Size: 400 (Mega Site Updates Per Second, More Is Better): a: 287.27 (SE +/- 0.49, N = 2); b: 285.76 (SE +/- 1.54, N = 2). (CXX) g++ options: -std=c++17 -pedantic -O3 -rdynamic -lcrypto -lcurl -lsz -lz -ldl -lm
Apache IoTDB 1.1.2 - Device Count: 100 - Batch Size Per Write: 1 - Sensor Count: 500 (point/sec, More Is Better): a: 1191500.88; b: 1185338.02.
HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: c2c - Backend: FFTW - Precision: float - X Y Z: 128 (GFLOP/s, More Is Better): a: 131.66 (SE +/- 0.77, N = 2); b: 130.98 (SE +/- 0.61, N = 2). (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: r2c - Backend: FFTW - Precision: float - X Y Z: 128 (GFLOP/s, More Is Better): a: 207.24 (SE +/- 0.61, N = 2); b: 206.22 (SE +/- 0.19, N = 2). (CXX) g++ options: -O3
Laghos 3.1 - Test: Triple Point Problem (Major Kernels Total Rate, More Is Better): a: 177.78 (SE +/- 0.13, N = 2); b: 176.92 (SE +/- 0.02, N = 2). (CXX) g++ options: -O3 -std=c++11 -lmfem -lHYPRE -lmetis -lrt -lmpi_cxx -lmpi
Apache IoTDB 1.1.2 - Device Count: 500 - Batch Size Per Write: 100 - Sensor Count: 500 (Average Latency, More Is Better): a: 68.34 (MAX: 2006.68); b: 68.01 (MAX: 1606.75).
Neural Magic DeepSparse 1.5 - Model: CV Segmentation, 90% Pruned YOLACT Pruned - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better): b: 343.52 (SE +/- 1.63, N = 2); a: 345.15 (SE +/- 0.15, N = 2).
Neural Magic DeepSparse 1.5 - Model: CV Segmentation, 90% Pruned YOLACT Pruned - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better): b: 46.55 (SE +/- 0.20, N = 2); a: 46.33 (SE +/- 0.02, N = 2).
Stress-NG 0.15.10 - Test: Fused Multiply-Add (Bogo Ops/s, More Is Better): a: 34197705.63 (SE +/- 137631.48, N = 2); b: 34050669.23 (SE +/- 285.63, N = 2). (CXX) g++ options: -O2 -std=gnu99 -lc
High Performance Conjugate Gradient 3.1 - X Y Z: 160 160 160 - RT: 60 (GFLOP/s, More Is Better): a: 27.51 (SE +/- 0.03, N = 2); b: 27.40 (SE +/- 0.07, N = 2). (CXX) g++ options: -O3 -ffast-math -ftree-vectorize -lmpi_cxx -lmpi
Stress-NG 0.15.10 - Test: Function Call (Bogo Ops/s, More Is Better): b: 22106.49 (SE +/- 74.09, N = 2); a: 22028.03 (SE +/- 80.03, N = 2). (CXX) g++ options: -O2 -std=gnu99 -lc
Apache IoTDB 1.1.2 - Device Count: 500 - Batch Size Per Write: 100 - Sensor Count: 200 (Average Latency, More Is Better): b: 31.69 (MAX: 1610.79); a: 31.58 (MAX: 1920.32).
HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: r2c - Backend: FFTW - Precision: double - X Y Z: 512 (GFLOP/s, More Is Better): b: 74.71 (SE +/- 0.16, N = 2); a: 74.47 (SE +/- 0.48, N = 2). (CXX) g++ options: -O3
Liquid-DSP 1.6 - Threads: 32 - Buffer Length: 256 - Filter Length: 57 (samples/s, More Is Better): a: 1328100000 (SE +/- 300000.00, N = 2); b: 1323900000 (SE +/- 4400000.00, N = 2). (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Stress-NG 0.15.10 - Test: NUMA (Bogo Ops/s, More Is Better): b: 392.08 (SE +/- 0.05, N = 2); a: 390.87 (SE +/- 0.88, N = 2). (CXX) g++ options: -O2 -std=gnu99 -lc
Stress-NG 0.15.10 - Test: Mutex (Bogo Ops/s, More Is Better): b: 15192892.59 (SE +/- 2864.48, N = 2); a: 15147444.51 (SE +/- 23940.47, N = 2). (CXX) g++ options: -O2 -std=gnu99 -lc
HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: c2c - Backend: Stock - Precision: float - X Y Z: 128 (GFLOP/s, More Is Better): a: 85.74 (SE +/- 1.30, N = 2); b: 85.49 (SE +/- 0.88, N = 2). (CXX) g++ options: -O3
Neural Magic DeepSparse 1.5 - Model: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better): b: 131.07 (SE +/- 0.22, N = 2); a: 131.45 (SE +/- 0.05, N = 2).
Neural Magic DeepSparse 1.5 - Model: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better): b: 122.04 (SE +/- 0.22, N = 2); a: 121.69 (SE +/- 0.05, N = 2).
Stress-NG 0.15.10 - Test: Wide Vector Math (Bogo Ops/s, More Is Better): b: 1750003.43 (SE +/- 4139.63, N = 2); a: 1745029.27 (SE +/- 918.08, N = 2). (CXX) g++ options: -O2 -std=gnu99 -lc
Apache IoTDB 1.1.2 - Device Count: 200 - Batch Size Per Write: 1 - Sensor Count: 200 (point/sec, More Is Better): a: 1045806.81; b: 1042859.03.
Liquid-DSP 1.6 - Threads: 64 - Buffer Length: 256 - Filter Length: 57 (samples/s, More Is Better): b: 1733700000 (SE +/- 900000.00, N = 2); a: 1728850000 (SE +/- 550000.00, N = 2). (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Neural Magic DeepSparse 1.5 - Model: NLP Sentiment Analysis, 80% Pruned Quantized BERT Base Uncased - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better): b: 391.91 (SE +/- 0.12, N = 2); a: 390.91 (SE +/- 1.01, N = 2).
Neural Magic DeepSparse 1.5 - Model: NLP Sentiment Analysis, 80% Pruned Quantized BERT Base Uncased - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better): b: 40.81 (SE +/- 0.01, N = 2); a: 40.91 (SE +/- 0.11, N = 2).
VVenC 1.9 - Video Input: Bosphorus 4K - Video Preset: Faster (Frames Per Second, More Is Better): a: 11.02 (SE +/- 0.00, N = 2); b: 10.99 (SE +/- 0.03, N = 2). (CXX) g++ options: -O3 -flto -fno-fat-lto-objects -flto=auto
Redis 7.0.12 + memtier_benchmark 2.0 - Protocol: Redis - Clients: 50 - Set To Get Ratio: 1:5 (Ops/sec, More Is Better): b: 2217192.12 (SE +/- 39004.04, N = 2); a: 2211638.65 (SE +/- 31848.80, N = 2). (CXX) g++ options: -O2 -levent_openssl -levent -lcrypto -lssl -lpthread -lz -lpcre
Timed GDB GNU Debugger Compilation 10.2 - Time To Compile (Seconds, Fewer Is Better): a: 41.91 (SE +/- 0.06, N = 2); b: 42.01 (SE +/- 0.12, N = 2).
Stress-NG 0.15.10 - Test: Glibc C String Functions (Bogo Ops/s, More Is Better): b: 26125214.84 (SE +/- 69329.81, N = 2); a: 26067360.60 (SE +/- 150617.25, N = 2). (CXX) g++ options: -O2 -std=gnu99 -lc
High Performance Conjugate Gradient 3.1 - X Y Z: 104 104 104 - RT: 60 (GFLOP/s, More Is Better): b: 27.84 (SE +/- 0.01, N = 2); a: 27.78 (SE +/- 0.03, N = 2). (CXX) g++ options: -O3 -ffast-math -ftree-vectorize -lmpi_cxx -lmpi
HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: c2c - Backend: Stock - Precision: float - X Y Z: 256 (GFLOP/s, More Is Better): a: 75.09 (SE +/- 0.48, N = 2); b: 74.93 (SE +/- 0.10, N = 2). (CXX) g++ options: -O3
OpenFOAM 10 - Input: drivaerFastback, Small Mesh Size - Execution Time (Seconds, Fewer Is Better): b: 67.56 (SE +/- 0.11, N = 2); a: 67.71 (SE +/- 0.09, N = 2). (CXX) g++ options: -std=c++14 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -lfiniteVolume -lmeshTools -lparallel -llagrangian -lregionModels -lgenericPatchFields -lOpenFOAM -ldl -lm
Neural Magic DeepSparse 1.5 - Model: ResNet-50, Sparse INT8 - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better): b: 3233.96 (SE +/- 3.51, N = 2); a: 3227.10 (SE +/- 8.40, N = 2).
Neural Magic DeepSparse 1.5 - Model: ResNet-50, Sparse INT8 - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better): b: 4.9312 (SE +/- 0.0056, N = 2); a: 4.9416 (SE +/- 0.0128, N = 2).
HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: c2c - Backend: Stock - Precision: double - X Y Z: 512 (GFLOP/s, More Is Better): a: 40.74 (SE +/- 0.05, N = 2); b: 40.66 (SE +/- 0.00, N = 2). (CXX) g++ options: -O3
Palabos 2.3 - Grid Size: 500 (Mega Site Updates Per Second, More Is Better): b: 300.86 (SE +/- 1.17, N = 2); a: 300.28 (SE +/- 1.63, N = 2). (CXX) g++ options: -std=c++17 -pedantic -O3 -rdynamic -lcrypto -lcurl -lsz -lz -ldl -lm
HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: r2c - Backend: Stock - Precision: double - X Y Z: 256 (GFLOP/s, More Is Better): b: 77.03 (SE +/- 0.65, N = 2); a: 76.90 (SE +/- 0.40, N = 2). (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: c2c - Backend: FFTW - Precision: float - X Y Z: 512 (GFLOP/s, More Is Better): b: 78.96 (SE +/- 0.06, N = 2); a: 78.83 (SE +/- 0.36, N = 2). (CXX) g++ options: -O3
OpenFOAM 10 - Input: drivaerFastback, Medium Mesh Size - Mesh Time (Seconds, Fewer Is Better): a: 144.70 (SE +/- 0.01, N = 2); b: 144.94 (SE +/- 0.08, N = 2). (CXX) g++ options: -std=c++14 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -lfiniteVolume -lmeshTools -lparallel -llagrangian -lregionModels -lgenericPatchFields -lOpenFOAM -ldl -lm
HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: r2c - Backend: FFTW - Precision: float - X Y Z: 512 (GFLOP/s, More Is Better): a: 141.41 (SE +/- 0.63, N = 2); b: 141.19 (SE +/- 0.20, N = 2). (CXX) g++ options: -O3
Neural Magic DeepSparse 1.5 - Model: ResNet-50, Baseline - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better): b: 480.52 (SE +/- 0.54, N = 2); a: 479.79 (SE +/- 0.12, N = 2).
Laghos 3.1 - Test: Sedov Blast Wave, ube_922_hex.mesh (Major Kernels Total Rate, More Is Better): b: 217.19 (SE +/- 0.18, N = 2); a: 216.86 (SE +/- 0.24, N = 2). (CXX) g++ options: -O3 -std=c++11 -lmfem -lHYPRE -lmetis -lrt -lmpi_cxx -lmpi
Neural Magic DeepSparse 1.5 - Model: ResNet-50, Baseline - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better): b: 33.28 (SE +/- 0.04, N = 2); a: 33.33 (SE +/- 0.01, N = 2).
Blender 3.6 - Blend File: BMW27 - Compute: CPU-Only (Seconds, Fewer Is Better): a: 47.15 (SE +/- 0.02, N = 2); b: 47.22 (SE +/- 0.08, N = 2).
HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: r2c - Backend: Stock - Precision: float - X Y Z: 512 (GFLOP/s, More Is Better): b: 137.74 (SE +/- 0.33, N = 2); a: 137.54 (SE +/- 0.00, N = 2). (CXX) g++ options: -O3
Stress-NG 0.15.10 - Test: AVL Tree (Bogo Ops/s, More Is Better): b: 294.66 (SE +/- 0.85, N = 2); a: 294.26 (SE +/- 0.32, N = 2). (CXX) g++ options: -O2 -std=gnu99 -lc
Palabos 2.3 - Grid Size: 100 (Mega Site Updates Per Second, More Is Better): a: 235.19 (SE +/- 0.02, N = 2); b: 234.87 (SE +/- 0.34, N = 2). (CXX) g++ options: -std=c++17 -pedantic -O3 -rdynamic -lcrypto -lcurl -lsz -lz -ldl -lm
Stress-NG 0.15.10 - Test: Floating Point (Bogo Ops/s, More Is Better): b: 10601.10 (SE +/- 17.77, N = 2); a: 10587.48 (SE +/- 1.07, N = 2). (CXX) g++ options: -O2 -std=gnu99 -lc
Liquid-DSP 1.6 - Threads: 16 - Buffer Length: 256 - Filter Length: 32 (samples/s, More Is Better): b: 558655000 (SE +/- 605000.00, N = 2); a: 557945000 (SE +/- 2065000.00, N = 2). (CC) gcc options: -O3 -pthread -lm -lc -lliquid
HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: r2c - Backend: FFTW - Precision: double - X Y Z: 256 (GFLOP/s, More Is Better): a: 72.29 (SE +/- 0.44, N = 2); b: 72.20 (SE +/- 0.12, N = 2). (CXX) g++ options: -O3
Stress-NG 0.15.10 - Test: Malloc (Bogo Ops/s, More Is Better): a: 99373474.31 (SE +/- 129754.02, N = 2); b: 99251227.28 (SE +/- 83929.32, N = 2). (CXX) g++ options: -O2 -std=gnu99 -lc
Stress-NG 0.15.10 - Test: Hash (Bogo Ops/s, More Is Better): b: 5583978.14 (SE +/- 2865.25, N = 2); a: 5577252.32 (SE +/- 3166.95, N = 2). (CXX) g++ options: -O2 -std=gnu99 -lc
Neural Magic DeepSparse 1.5 - Model: CV Detection, YOLOv5s COCO - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better): b: 76.47 (SE +/- 0.04, N = 2); a: 76.56 (SE +/- 0.03, N = 2).
High Performance Conjugate Gradient 3.1 - X Y Z: 144 144 144 - RT: 60 (GFLOP/s, More Is Better): a: 27.42 (SE +/- 0.01, N = 2); b: 27.39 (SE +/- 0.06, N = 2). (CXX) g++ options: -O3 -ffast-math -ftree-vectorize -lmpi_cxx -lmpi
Neural Magic DeepSparse 1.5 - Model: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better): b: 1075.96 (SE +/- 1.01, N = 2); a: 1074.82 (SE +/- 0.57, N = 2).
Neural Magic DeepSparse 1.5 - Model: BERT-Large, NLP Question Answering, Sparse INT8 - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better): b: 505.13 (SE +/- 0.12, N = 2); a: 504.61 (SE +/- 0.18, N = 2).
Timed LLVM Compilation 16.0 - Build System: Ninja (Seconds, Fewer Is Better): b: 262.88 (SE +/- 0.15, N = 2); a: 263.15 (SE +/- 0.15, N = 2).
Stress-NG 0.15.10 - Test: Pthread (Bogo Ops/s, More Is Better): a: 136846.01 (SE +/- 971.78, N = 2); b: 136709.81 (SE +/- 102.07, N = 2). (CXX) g++ options: -O2 -std=gnu99 -lc
Blender 3.6 - Blend File: Fishy Cat - Compute: CPU-Only (Seconds, Fewer Is Better): b: 64.01 (SE +/- 0.20, N = 2); a: 64.07 (SE +/- 0.08, N = 2).
HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: c2c - Backend: FFTW - Precision: double - X Y Z: 512 (GFLOP/s, More Is Better): b: 44.01 (SE +/- 0.02, N = 2); a: 43.97 (SE +/- 0.04, N = 2). (CXX) g++ options: -O3
OpenFOAM 10 - Input: drivaerFastback, Medium Mesh Size - Execution Time (Seconds, Fewer Is Better): b: 615.46 (SE +/- 0.03, N = 2); a: 615.99 (SE +/- 0.42, N = 2). (CXX) g++ options: -std=c++14 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -lfiniteVolume -lmeshTools -lparallel -llagrangian -lregionModels -lgenericPatchFields -lOpenFOAM -ldl -lm
Neural Magic DeepSparse 1.5 - Model: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better): b: 14.85 (SE +/- 0.01, N = 2); a: 14.86 (SE +/- 0.01, N = 2).
Neural Magic DeepSparse 1.5 - Model: BERT-Large, NLP Question Answering, Sparse INT8 - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better): b: 31.65 (SE +/- 0.01, N = 2); a: 31.68 (SE +/- 0.01, N = 2).
Timed PHP Compilation 8.1.9 - Time To Compile (Seconds, Fewer Is Better): a: 42.35 (SE +/- 0.34, N = 2); b: 42.38 (SE +/- 0.48, N = 2).
Stress-NG 0.15.10 - Test: MEMFD (Bogo Ops/s, More Is Better): a: 549.94 (SE +/- 1.31, N = 2); b: 549.55 (SE +/- 1.20, N = 2). (CXX) g++ options: -O2 -std=gnu99 -lc
Liquid-DSP 1.6 - Threads: 32 - Buffer Length: 256 - Filter Length: 32 (samples/s, More Is Better): b: 847675000 (SE +/- 85000.00, N = 2); a: 847085000 (SE +/- 25000.00, N = 2). (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Stress-NG 0.15.10 - Test: Context Switching (Bogo Ops/s, More Is Better): a: 2572801.75 (SE +/- 678.57, N = 2); b: 2571092.69 (SE +/- 604.17, N = 2). (CXX) g++ options: -O2 -std=gnu99 -lc
Neural Magic DeepSparse 1.5 - Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better): b: 34.55 (SE +/- 0.12, N = 2); a: 34.53 (SE +/- 0.06, N = 2).
Neural Magic DeepSparse 1.5 - Model: CV Classification, ResNet-50 ImageNet - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better): b: 479.22 (SE +/- 0.02, N = 2); a: 478.91 (SE +/- 0.05, N = 2).
Neural Magic DeepSparse 1.5 - Model: CV Classification, ResNet-50 ImageNet - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better): b: 33.37 (SE +/- 0.00, N = 2); a: 33.39 (SE +/- 0.00, N = 2).
Stress-NG 0.15.10 - Test: Poll (Bogo Ops/s, More Is Better): b: 3671617.97 (SE +/- 1953.54, N = 2); a: 3669281.69 (SE +/- 2536.76, N = 2). (CXX) g++ options: -O2 -std=gnu99 -lc
VVenC 1.9 - Video Input: Bosphorus 1080p - Video Preset: Faster (Frames Per Second, More Is Better): a: 30.95 (SE +/- 0.06, N = 2); b: 30.93 (SE +/- 0.04, N = 2). (CXX) g++ options: -O3 -flto -fno-fat-lto-objects -flto=auto
Stress-NG 0.15.10 - Test: Memory Copying (Bogo Ops/s, More Is Better): b: 7180.43 (SE +/- 11.04, N = 2); a: 7176.19 (SE +/- 8.71, N = 2). (CXX) g++ options: -O2 -std=gnu99 -lc
OpenFOAM 10 - Input: drivaerFastback, Small Mesh Size - Mesh Time (Seconds, Fewer Is Better): b: 27.95 (SE +/- 0.05, N = 2); a: 27.97 (SE +/- 0.02, N = 2). (CXX) g++ options: -std=c++14 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -lfiniteVolume -lmeshTools -lparallel -llagrangian -lregionModels -lgenericPatchFields -lOpenFOAM -ldl -lm
Stress-NG 0.15.10 - Test: Matrix 3D Math (Bogo Ops/s, More Is Better): b: 9605.30 (SE +/- 4.08, N = 2); a: 9599.93 (SE +/- 34.45, N = 2). (CXX) g++ options: -O2 -std=gnu99 -lc
Stress-NG 0.15.10 - Test: Forking (Bogo Ops/s, More Is Better): b: 89966.29 (SE +/- 421.24, N = 2); a: 89918.21 (SE +/- 469.20, N = 2). (CXX) g++ options: -O2 -std=gnu99 -lc
Neural Magic DeepSparse 1.5 - Model: CV Detection, YOLOv5s COCO - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better): b: 208.99 (SE +/- 0.05, N = 2); a: 208.90 (SE +/- 0.10, N = 2).
Stress-NG 0.15.10 - Test: Glibc Qsort Data Sorting (Bogo Ops/s, More Is Better): b: 696.92 (SE +/- 0.46, N = 2); a: 696.65 (SE +/- 0.40, N = 2). (CXX) g++ options: -O2 -std=gnu99 -lc
Stress-NG 0.15.10 - Test: Zlib (Bogo Ops/s, More Is Better): b: 2648.81 (SE +/- 0.65, N = 2); a: 2647.81 (SE +/- 0.06, N = 2). (CXX) g++ options: -O2 -std=gnu99 -lc
Stress-NG 0.15.10 - Test: System V Message Passing (Bogo Ops/s, More Is Better): b: 5854201.78 (SE +/- 9802.94, N = 2); a: 5852281.71 (SE +/- 7174.98, N = 2). (CXX) g++ options: -O2 -std=gnu99 -lc
Blender 3.6 - Blend File: Barbershop - Compute: CPU-Only (Seconds, Fewer Is Better): a: 493.45 (SE +/- 0.22, N = 2); b: 493.61 (SE +/- 0.42, N = 2).
Timed Linux Kernel Compilation 6.1 - Build: defconfig (Seconds, Fewer Is Better): a: 40.44 (SE +/- 0.72, N = 2); b: 40.45 (SE +/- 0.69, N = 2).
HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: c2c - Backend: Stock - Precision: float - X Y Z: 512 (GFLOP/s, More Is Better): a: 72.56 (SE +/- 0.21, N = 2); b: 72.54 (SE +/- 0.00, N = 2). (CXX) g++ options: -O3
Stress-NG 0.15.10 - Test: Vector Math (Bogo Ops/s, More Is Better): b: 151431.15 (SE +/- 5.98, N = 2); a: 151386.31 (SE +/- 47.16, N = 2). (CXX) g++ options: -O2 -std=gnu99 -lc
Liquid-DSP 1.6 - Threads: 64 - Buffer Length: 256 - Filter Length: 32 (samples/s, More Is Better): a: 1577300000 (SE +/- 300000.00, N = 2); b: 1576850000 (SE +/- 450000.00, N = 2). (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP 1.6 - Threads: 64 - Buffer Length: 256 - Filter Length: 512 (samples/s, More Is Better): a: 513135000 (SE +/- 385000.00, N = 2); b: 513040000 (SE +/- 800000.00, N = 2). (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Stress-NG 0.15.10 - Test: Vector Floating Point (Bogo Ops/s, More Is Better): a: 58243.38 (SE +/- 30.71, N = 2); b: 58232.70 (SE +/- 4.11, N = 2). (CXX) g++ options: -O2 -std=gnu99 -lc
Blender 3.6 - Blend File: Classroom - Compute: CPU-Only (Seconds, Fewer Is Better): b: 127.76 (SE +/- 0.13, N = 2); a: 127.78 (SE +/- 0.05, N = 2).
Stress-NG 0.15.10 - Test: CPU Stress (Bogo Ops/s, More Is Better): b: 64118.87 (SE +/- 38.95, N = 2); a: 64111.11 (SE +/- 12.73, N = 2). (CXX) g++ options: -O2 -std=gnu99 -lc
HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: r2c - Backend: Stock - Precision: double - X Y Z: 512 (GFLOP/s, More Is Better): a: 76.61 (SE +/- 0.01, N = 2); b: 76.60 (SE +/- 0.11, N = 2). (CXX) g++ options: -O3
Stress-NG 0.15.10 - Test: Crypto (Bogo Ops/s, More Is Better): b: 50243.48 (SE +/- 18.13, N = 2); a: 50240.09 (SE +/- 3.65, N = 2). (CXX) g++ options: -O2 -std=gnu99 -lc
Neural Magic DeepSparse 1.5 - Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better): b: 460.76 (SE +/- 2.44, N = 2); a: 460.78 (SE +/- 0.42, N = 2).
Stress-NG 0.15.10 - Test: x86_64 RdRand (Bogo Ops/s, More Is Better): b: 331423.04 (SE +/- 1.14, N = 2); a: 331416.52 (SE +/- 2.35, N = 2). (CXX) g++ options: -O2 -std=gnu99 -lc
Stress-NG 0.15.10 - Test: Vector Shuffle (Bogo Ops/s, More Is Better): a: 167204.21 (SE +/- 6.63, N = 2); b: 167202.07 (SE +/- 6.04, N = 2). (CXX) g++ options: -O2 -std=gnu99 -lc
Timed Linux Kernel Compilation 6.1 - Build: allmodconfig (Seconds, Fewer Is Better): b: 445.38 (SE +/- 1.13, N = 2); a: 445.39 (SE +/- 1.46, N = 2).
BRL-CAD 7.36 - VGR Performance Metric (VGR Performance Metric, More Is Better): a: 466686 (SE +/- 3768.50, N = 2). (CXX) g++ options: -std=c++14 -pipe -fvisibility=hidden -fno-strict-aliasing -fno-common -fexceptions -ftemplate-depth-128 -m64 -ggdb3 -O3 -fipa-pta -fstrength-reduce -finline-functions -flto -ltcl8.6 -lregex_brl -lz_brl -lnetpbm -ldl -lm -ltk8.6
Apache Cassandra 4.1.3 - Test: Writes (Op/s, More Is Better): a: 155626 (SE +/- 803.50, N = 2).
Blender 3.6 - Blend File: Pabellon Barcelona - Compute: CPU-Only (Seconds, Fewer Is Better): a: 159.94 (SE +/- 0.04, N = 2).
Phoronix Test Suite v10.8.5