Benchmarks by Michael Larabel for a future article.
G242-P36
Kernel Notes: Transparent Huge Pages: madvise
Compiler Notes: --build=aarch64-linux-gnu --disable-libquadmath --disable-libquadmath-support --disable-werror --enable-bootstrap --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-fix-cortex-a53-843419 --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-nls --enable-objc-gc=auto --enable-plugin --enable-shared --enable-threads=posix --host=aarch64-linux-gnu --program-prefix=aarch64-linux-gnu- --target=aarch64-linux-gnu --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-target-system-zlib=auto -v
Processor Notes: Scaling Governor: cppc_cpufreq performance (Boost: Disabled)
Python Notes: Python 3.11.6
Security Notes: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of __user pointer sanitization + spectre_v2: Mitigation of CSV2 BHB + srbds: Not affected + tsx_async_abort: Not affected
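The kernel note reports the Transparent Huge Pages mode as `madvise`, meaning huge pages are only used for memory regions that applications request with `madvise(MADV_HUGEPAGE)`. On Linux the setting is exposed in `/sys/kernel/mm/transparent_hugepage/enabled`, with the active choice shown in square brackets (for example `always [madvise] never`). A minimal sketch for checking it on a test system, assuming that sysfs path is present; the helper names are hypothetical and not part of the Phoronix Test Suite:

```python
from __future__ import annotations

from pathlib import Path
from typing import Optional

# Standard Linux sysfs location of the THP policy (assumed present on the test system).
THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def parse_thp_mode(text: str) -> Optional[str]:
    """Return the active THP mode: the token wrapped in [brackets], if any."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None  # unexpected format: no bracketed token


def current_thp_mode(path: Path = THP_ENABLED) -> Optional[str]:
    """Read the THP mode from sysfs; return None if the file cannot be read."""
    try:
        return parse_thp_mode(path.read_text())
    except OSError:
        return None


if __name__ == "__main__":
    # Prints the active mode, or None when sysfs is unavailable (e.g. non-Linux hosts).
    print(current_thp_mode())
```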
gig, dd
Processor: ARMv8 Neoverse-N1 @ 3.00GHz (128 Cores)
Motherboard: GIGABYTE G242-P36-00 MP32-AR2-00 v01000100 (F31k SCP: 2.10.20220531 BIOS)
Chipset: Ampere Computing LLC Altra PCI Root Complex A
Memory: 16 x 32 GB DDR4-3200MT/s Samsung M393A4K40DB3-CWE
Disk: 800GB Micron_7450_MTFDKBA800TFS
Graphics: ASPEED
Monitor: VGA HDMI
Network: 2 x Intel I350
OS: Ubuntu 23.10
Kernel: 6.5.0-13-generic (aarch64)
Compiler: GCC 13.2.0
File-System: ext4
Screen Resolution: 1920x1080
Gigabyte G242-P36 Ampere Altra Max Server Benchmarks (OpenBenchmarking.org, Phoronix Test Suite)
Result Overview (Phoronix Test Suite; G242-P36, gig, dd; relative performance, chart axis from 100% to 121%): Stockfish, Llama.cpp, LeelaChessZero, Quicksilver, RocksDB, Timed Linux Kernel Compilation, Stress-NG, Timed LLVM Compilation, Speedb, Neural Magic DeepSparse, 7-Zip Compression, OpenSSL, CacheBench
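The overview chart expresses results as percentages of a baseline. The exact normalization the Phoronix Test Suite uses for this chart is not stated in this export; a common approach, assumed in this sketch, is to set the slowest run of each test to 100% and invert the ratio for "Fewer Is Better" metrics. The inputs are the Stockfish (nodes per second) and DeepSparse NLP Document Classification (ms/batch) results reported in this file:

```python
from __future__ import annotations


def relative_to_slowest(results: dict[str, float], higher_is_better: bool) -> dict[str, float]:
    """Express each result as a percentage of the slowest run (slowest = 100%)."""
    if higher_is_better:
        # Throughput-style metric: the slowest run has the smallest value.
        slowest = min(results.values())
        return {name: 100.0 * value / slowest for name, value in results.items()}
    # Time-style metric (e.g. ms/batch): the slowest run has the largest value.
    slowest = max(results.values())
    return {name: 100.0 * slowest / value for name, value in results.items()}


# Stockfish 15, nodes per second (More Is Better).
stockfish = {"G242-P36": 188653177, "dd": 226859548, "gig": 177653916}
# DeepSparse NLP Document Classification, ms/batch (Fewer Is Better).
doc_class = {"G242-P36": 1830.58, "dd": 1843.24, "gig": 1830.72}

for label, data, higher in (("Stockfish", stockfish, True), ("DeepSparse doc", doc_class, False)):
    for name, pct in relative_to_slowest(data, higher).items():
        print(f"{label} {name}: {pct:.1f}%")
```

Inverting the time-based metric keeps "higher percentage = faster" consistent across both kinds of test, which is what allows them to share one chart.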
Result summary (columns: G242-P36 | gig | dd; "-" = no result in the export)
Test | G242-P36 | gig | dd
pytorch: CPU - 1 - Efficientnet_v2_l | 0.30 | - | -
pytorch: CPU - 16 - ResNet-152 | 0.67 | - | -
pytorch: CPU - 16 - ResNet-50 | 1.83 | - | -
pytorch: CPU - 1 - ResNet-152 | 0.68 | - | -
pytorch: CPU - 1 - ResNet-50 | 1.91 | - | -
xmrig: Wownero - 1M | 1935.2 | - | -
speedb: Seq Fill | 295079 | 290059 | 285766
xmrig: Monero - 1M | 4201.7 | - | -
deepsparse: ResNet-50, Sparse INT8 - Asynchronous Multi-Stream | 23.5004 | 24.0125 | 23.4209
deepsparse: ResNet-50, Sparse INT8 - Asynchronous Multi-Stream | 2677.0708 | 2624.7719 | 2684.8341
build-llvm: Unix Makefiles | 411.521 | 408.271 | 407.19
lczero: BLAS | 62 | 59 | 60
lczero: Eigen | 48 | 47 | 48
deepsparse: CV Segmentation, 90% Pruned YOLACT Pruned - Asynchronous Multi-Stream | 1358.1773 | 1334.5433 | 1336.3924
deepsparse: CV Segmentation, 90% Pruned YOLACT Pruned - Asynchronous Multi-Stream | 45.6788 | 46.7273 | 46.4998
quicksilver: CTS2 | 16203333 | 16460000 | 16430000
build-linux-kernel: allmodconfig | 308.297 | 309.477 | 310.137
stress-ng: Atomic | 7.29 | 5.64 | 6.8
build-llvm: Ninja | 266.333 | 267.86 | 264.744
llama-cpp: llama-2-70b-chat.Q5_0.gguf | 3.07 | 3.13 | 3.14
stockfish: Total Time | 188653177 | 177653916 | 226859548
openssl: ChaCha20-Poly1305 | 112213448840 | 112250396400 | -
openssl: ChaCha20 | 161732226070 | 161791663040 | -
openssl: AES-256-GCM | 306487842680 | 306544534870 | -
speedb: Read While Writing | 12905035 | 13255341 | 13785530
rocksdb: Rand Read | 434052355 | 450500912 | 404291813
quicksilver: CORAL2 P2 | 25543333 | 25520000 | 24460000
openssl: AES-128-GCM | 382688207300 | 382856328260 | 382793028680
openssl: SHA256 | 101322961753 | 100039593750 | 101321237450
openssl: SHA512 | 34478769590 | 34453399030 | 34448701700
speedb: Rand Read | 409571625 | 418448304 | 420437471
rocksdb: Read While Writing | 8558845 | 8516060 | 8636563
llama-cpp: llama-2-13b.Q4_0.gguf | 13.90 | 14.02 | 14.11
cachebench: Read | 11438.276516 | 11438.666161 | 11438.863847
cachebench: Read / Modify / Write | 45034.976156 | 45027.472701 | 45041.154853
cachebench: Write | 38239.970730 | 38251.591924 | 38252.62844
rocksdb: Read Rand Write Rand | 3320337 | 3449038 | 3537322
stress-ng: Futex | 343012.75 | 323012.96 | 318037.93
stress-ng: Context Switching | 20365273.28 | 19654874.85 | 20708288.98
deepsparse: BERT-Large, NLP Question Answering - Asynchronous Multi-Stream | 1320.1354 | 1326.9581 | 1327.9962
deepsparse: BERT-Large, NLP Question Answering - Asynchronous Multi-Stream | 47.0250 | 46.5531 | 46.4194
amg | 1057064333 | 1060136000 | -
build-linux-kernel: defconfig | 78.703 | 80.078 | 80.243
deepsparse: BERT-Large, NLP Question Answering, Sparse INT8 - Asynchronous Multi-Stream | 146.7462 | 149.4774 | 145.2837
deepsparse: BERT-Large, NLP Question Answering, Sparse INT8 - Asynchronous Multi-Stream | 430.1375 | 421.345 | 433.8593
gromacs: MPI CPU - water_GMX50_bare | 4.588 | 4.688 | -
stress-ng: MEMFD | 574.85 | 576.53 | 569.36
speedb: Rand Fill | 284987 | 278264 | 285316
speedb: Rand Fill Sync | 207376 | 204410 | 207891
speedb: Update Rand | 272275 | 264998 | 264748
speedb: Read Rand Write Rand | 2419683 | 2518519 | 2473336
rocksdb: Update Rand | 431406 | 427908 | 443804
openssl: RSA4096 | 517886.0 | 518115.9 | 518085.7
openssl: RSA4096 | 6342.8 | 6345.6 | 6345.3
deepsparse: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Asynchronous Multi-Stream | 55.5703 | 55.4233 | 55.7222
deepsparse: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Asynchronous Multi-Stream | 1137.781 | 1141.451 | 1135.4365
deepsparse: NLP Document Classification, oBERT base uncased on IMDB - Asynchronous Multi-Stream | 1830.5760 | 1830.7174 | 1843.2396
deepsparse: NLP Document Classification, oBERT base uncased on IMDB - Asynchronous Multi-Stream | 33.7473 | 33.869 | 33.1592
deepsparse: NLP Token Classification, BERT base uncased conll2003 - Asynchronous Multi-Stream | 1834.5799 | 1832.1154 | 1850.2264
deepsparse: NLP Token Classification, BERT base uncased conll2003 - Asynchronous Multi-Stream | 33.6229 | 33.5823 | 33.2422
deepsparse: CV Detection, YOLOv5s COCO, Sparse INT8 - Asynchronous Multi-Stream | 310.8403 | 310.6422 | 311.5325
deepsparse: CV Detection, YOLOv5s COCO, Sparse INT8 - Asynchronous Multi-Stream | 202.2279 | 202.6332 | 201.7554
quicksilver: CORAL2 P1 | 25273333 | 25810000 | 25510000
deepsparse: CV Detection, YOLOv5s COCO - Asynchronous Multi-Stream | 314.1452 | 315.8962 | 316.9118
deepsparse: CV Detection, YOLOv5s COCO - Asynchronous Multi-Stream | 200.0280 | 198.9064 | 198.3147
deepsparse: NLP Text Classification, DistilBERT mnli - Asynchronous Multi-Stream | 185.3571 | 185.8675 | 183.6437
deepsparse: NLP Text Classification, DistilBERT mnli - Asynchronous Multi-Stream | 339.9765 | 339.5239 | 343.7639
deepsparse: ResNet-50, Baseline - Asynchronous Multi-Stream | 132.3047 | 132.3228 | 132.2627
deepsparse: ResNet-50, Baseline - Asynchronous Multi-Stream | 476.3781 | 477.0964 | 477.6899
deepsparse: CV Classification, ResNet-50 ImageNet - Asynchronous Multi-Stream | 132.1032 | 133.4484 | 130.6575
deepsparse: CV Classification, ResNet-50 ImageNet - Asynchronous Multi-Stream | 477.8141 | 472.0699 | 483.308
compress-7zip: Decompression Rating | 537647 | 541204 | 541552
compress-7zip: Compression Rating | 333316 | 333057 | 331579
stress-ng: IO_uring | 604943.76 | 612149.93 | 583751.83
stress-ng: MMAP | 1088.77 | 1104.19 | 1092.25
stress-ng: Cloning | 7795.96 | 7312.78 | 6918.49
stress-ng: Malloc | 164364343.39 | 164067515.18 | 164592319.96
stress-ng: CPU Cache | 879814.35 | 882510.28 | 882225.34
stress-ng: Pthread | 113551.87 | 112993.15 | 113379.28
stress-ng: Zlib | 5987.88 | 5993.74 | 5985.69
stress-ng: Vector Shuffle | 86218.95 | 86375.79 | 86257.77
stress-ng: Vector Math | 398869.87 | 398993.46 | 399042.09
stress-ng: Wide Vector Math | 2346519.63 | 2355564.94 | 2354926.97
stress-ng: Matrix Math | 681885.30 | 682490.75 | 682554.33
stress-ng: Function Call | 72283.18 | 72298.23 | 72290.81
stress-ng: Matrix 3D Math | 5099.81 | 5082.65 | 5089.19
stress-ng: CPU Stress | 33761.08 | 33765.26 | 33559.87
stress-ng: AVL Tree | 299.50 | 299.1 | 299.99
stress-ng: Crypto | 252315.26 | 251986.12 | 251996.36
stress-ng: Fused Multiply-Add | 151220570.51 | 151387869.76 | 151037296.46
stress-ng: Hash | 15671801.48 | 15654462.92 | 15654282.58
stress-ng: SENDFILE | 1624492.92 | 1624969.46 | 1624702.09
stress-ng: AVX-512 VNNI | 4690386.64 | 4691697.85 | 4692452.8
stress-ng: Glibc Qsort Data Sorting | 2020.18 | 2022.01 | 2020.3
stress-ng: Vector Floating Point | 102535.35 | 102553.11 | 102604.74
stress-ng: Floating Point | 22213.54 | 22219.8 | 22220.7
stress-ng: Poll | 7330369.96 | 7392099.82 | 7395099.64
stress-ng: Glibc C String Functions | 62783286.48 | 62867317.16 | 62845443.53
stress-ng: System V Message Passing | 21143237.72 | 21054213.79 | 21119614.31
stress-ng: Forking | 52250.53 | 50130.97 | 50686.58
stress-ng: Memory Copying | 27153.74 | 27162.14 | 27159.07
stress-ng: Semaphores | 167637763.59 | 167850957.68 | 166379337.67
stress-ng: Mutex | 37172432.66 | 37215286.04 | 37267646.91
stress-ng: Mixed Scheduler | 36794.33 | 36309.29 | 36361.29
stress-ng: NUMA | 1419.06 | 1416.03 | 1426.45
stress-ng: Pipe | 30330081.18 | 29805509.12 | 30776841.73
stress-ng: Socket Activity | 28009.07 | 27959.85 | 27536.79
llama-cpp: llama-2-7b.Q4_0.gguf | 21.58 | 21.9 | 26.64
minife: Small | 23996.0 | 24150.7 | -
mt-dgemm: Sustained Floating-Point Rate | 17.784983 | 18.27275 | -
PyTorch This is a benchmark of PyTorch making use of pytorch-benchmark [https://github.com/LukasHedegaard/pytorch-benchmark]. Currently this test profile is catered to CPU-based testing. Learn more via the OpenBenchmarking.org test page.
PyTorch 2.1, Device: CPU - Batch Size: 1 - Model: Efficientnet_v2_l (batches/sec, More Is Better): G242-P36 0.30 (MIN: 0.27 / MAX: 0.4); SE +/- 0.00, N = 3
PyTorch 2.1, Device: CPU - Batch Size: 16 - Model: ResNet-152 (batches/sec, More Is Better): G242-P36 0.67 (MIN: 0.65 / MAX: 0.7); SE +/- 0.00, N = 2
PyTorch 2.1, Device: CPU - Batch Size: 16 - Model: ResNet-50 (batches/sec, More Is Better): G242-P36 1.83 (MIN: 1.7 / MAX: 2.02); SE +/- 0.02, N = 5
PyTorch 2.1, Device: CPU - Batch Size: 1 - Model: ResNet-152 (batches/sec, More Is Better): G242-P36 0.68 (MIN: 0.65 / MAX: 0.7); SE +/- 0.00, N = 3
PyTorch 2.1, Device: CPU - Batch Size: 1 - Model: ResNet-50 (batches/sec, More Is Better): G242-P36 1.91 (MIN: 1.8 / MAX: 2.09); SE +/- 0.00, N = 3
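Each result carries a standard error ("SE +/-") over N runs. Assuming the conventional definition, the sample standard deviation of the runs divided by the square root of N, it can be reproduced as below. The per-run values are hypothetical, since this export only lists the averages:

```python
from __future__ import annotations

import math
import statistics


def standard_error(runs: list[float]) -> float:
    """Standard error of the mean: sample standard deviation divided by sqrt(N)."""
    if len(runs) < 2:
        raise ValueError("at least two runs are needed to estimate the standard error")
    return statistics.stdev(runs) / math.sqrt(len(runs))


# Hypothetical per-run results (batches/sec) for a three-run test; not from this result file.
runs = [1.80, 1.83, 1.86]
print(f"mean {statistics.mean(runs):.2f}, SE +/- {standard_error(runs):.2f}, N = {len(runs)}")
```

A small SE relative to the mean (as in most of the charts here) indicates the runs agreed closely; larger N values such as 15 typically appear where the test repeated runs to get a stable result.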
Xmrig Xmrig is an open-source, cross-platform CPU/GPU miner for RandomX, KawPow, CryptoNight and AstroBWT. This test profile is set up to measure Xmrig's CPU mining performance. Learn more via the OpenBenchmarking.org test page.
Xmrig 6.21, Variant: Wownero - Hash Count: 1M (H/s, More Is Better): G242-P36 1935.2; SE +/- 2.92, N = 3. (CXX) g++ options: -fexceptions -fno-rtti -O3 -Ofast -static-libgcc -static-libstdc++ -rdynamic -lssl -lcrypto -luv -lpthread -lrt -ldl -lhwloc
Speedb Speedb is a next-generation, RocksDB-compatible key-value storage engine that aims for stability, efficiency and performance. Learn more via the OpenBenchmarking.org test page.
Speedb 2.7, Test: Sequential Fill (Op/s, More Is Better): G242-P36 295079, dd 285766, gig 290059; SE +/- 3101.60, N = 5. (CXX) g++ options: -O3 -march=native -pthread -fno-builtin-memcmp -fno-rtti -lpthread
Xmrig 6.21, Variant: Monero - Hash Count: 1M (H/s, More Is Better): G242-P36 4201.7; SE +/- 17.55, N = 3. (CXX) g++ options: -fexceptions -fno-rtti -O3 -Ofast -static-libgcc -static-libstdc++ -rdynamic -lssl -lcrypto -luv -lpthread -lrt -ldl -lhwloc
Quicksilver Quicksilver is a proxy application that represents some elements of the Mercury workload by solving a simplified dynamic Monte Carlo particle transport problem. Quicksilver is developed by Lawrence Livermore National Laboratory (LLNL) and this test profile currently makes use of the OpenMP CPU threaded code path. Learn more via the OpenBenchmarking.org test page.
Quicksilver 20230818, Input: CTS2 (Figure Of Merit, More Is Better): G242-P36 16203333, dd 16430000, gig 16460000; SE +/- 42557.15, N = 3. (CXX) g++ options: -fopenmp -O3 -march=native
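Quicksilver's physics is far richer than this, but the basic Monte Carlo particle transport idea it builds on can be illustrated with a toy one-dimensional slab: particles fly exponentially distributed distances between collisions and, at each collision, are either absorbed or scattered forward or backward. This is a hypothetical illustration, not Quicksilver's algorithm or code; `slab_transport` and its parameters are invented for the example:

```python
from __future__ import annotations

import math
import random


def slab_transport(n_particles: int, sigma_t: float, thickness: float,
                   scatter_prob: float, seed: int = 0) -> tuple[float, float, float]:
    """Toy 1-D Monte Carlo transport through a homogeneous slab.

    Particles enter at x = 0 moving in +x. Between collisions they fly an
    exponentially distributed distance with mean 1 / sigma_t. At each collision
    they scatter (forward or backward with equal probability) with probability
    scatter_prob; otherwise they are absorbed.
    Returns the fractions (transmitted, reflected, absorbed).
    """
    rng = random.Random(seed)
    transmitted = reflected = absorbed = 0
    for _ in range(n_particles):
        x, direction = 0.0, 1.0
        while True:
            # 1 - random() lies in (0, 1], so the logarithm is always defined.
            x += direction * (-math.log(1.0 - rng.random()) / sigma_t)
            if x >= thickness:
                transmitted += 1
                break
            if x < 0.0:
                reflected += 1
                break
            if rng.random() >= scatter_prob:
                absorbed += 1
                break
            direction = 1.0 if rng.random() < 0.5 else -1.0
    n = float(n_particles)
    return transmitted / n, reflected / n, absorbed / n


if __name__ == "__main__":
    # Pure absorber: transmission should approach exp(-sigma_t * thickness).
    t, r, a = slab_transport(100_000, sigma_t=1.0, thickness=1.0, scatter_prob=0.0)
    print(f"transmitted {t:.3f} (analytic {math.exp(-1.0):.3f}), reflected {r:.3f}, absorbed {a:.3f}")
```

Each particle history is independent, which is why this style of code parallelizes well across threads, as Quicksilver does with OpenMP.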
Llama.cpp Llama.cpp is a port of Facebook's LLaMA model in C/C++, developed by Georgi Gerganov. Llama.cpp allows inference of LLaMA and other supported models in C/C++. For CPU inference, Llama.cpp supports AVX2/AVX-512, ARM NEON and other modern ISAs, along with features such as OpenBLAS acceleration. Learn more via the OpenBenchmarking.org test page.
Llama.cpp b1808, Model: llama-2-70b-chat.Q5_0.gguf (Tokens Per Second, More Is Better): G242-P36 3.07, dd 3.14, gig 3.13; SE +/- 0.03, N = 8. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -mcpu=native -lopenblas
Stockfish This is a test of Stockfish, an advanced open-source C++ chess engine that can scale up to 512 CPU threads. Learn more via the OpenBenchmarking.org test page.
Stockfish 15, Total Time (Nodes Per Second, More Is Better): G242-P36 188653177, dd 226859548, gig 177653916; SE +/- 6857171.33, N = 15. (CXX) g++ options: -lgcov -lpthread -fno-exceptions -std=c++17 -fno-peel-loops -fno-tracer -pedantic -O3 -flto -flto=jobserver
OpenSSL OpenSSL is an open-source toolkit that implements the SSL (Secure Sockets Layer) and TLS (Transport Layer Security) protocols. This test profile uses OpenSSL's built-in "openssl speed" benchmarking capabilities. Learn more via the OpenBenchmarking.org test page.
OpenSSL 3.1, Algorithm: ChaCha20-Poly1305 (byte/s, More Is Better): G242-P36 112213448840, gig 112250396400; SE +/- 361309.16, N = 3. (CC) gcc options: -pthread -O3 -lssl -lcrypto -ldl
OpenSSL 3.1, Algorithm: ChaCha20 (byte/s, More Is Better): G242-P36 161732226070, gig 161791663040; SE +/- 10001054.79, N = 3. (CC) gcc options: -pthread -O3 -lssl -lcrypto -ldl
OpenSSL 3.1, Algorithm: AES-256-GCM (byte/s, More Is Better): G242-P36 306487842680, gig 306544534870; SE +/- 40660594.45, N = 3. (CC) gcc options: -pthread -O3 -lssl -lcrypto -ldl
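The OpenSSL results are reported in raw bytes per second, which are hard to read at this scale. A small conversion sketch, applied to the G242-P36 ChaCha20-Poly1305, ChaCha20 and AES-256-GCM figures; the choice of decimal GB versus binary GiB is a presentation assumption, so both are shown:

```python
def to_gb_per_s(bytes_per_second: float) -> float:
    """Convert bytes/s to decimal gigabytes/s (1 GB = 10**9 bytes)."""
    return bytes_per_second / 1e9


def to_gib_per_s(bytes_per_second: float) -> float:
    """Convert bytes/s to binary gibibytes/s (1 GiB = 2**30 bytes)."""
    return bytes_per_second / 2**30


# G242-P36 results from the OpenSSL 3.1 charts (byte/s).
g242_results = {
    "ChaCha20-Poly1305": 112213448840,
    "ChaCha20": 161732226070,
    "AES-256-GCM": 306487842680,
}
for algorithm, rate in g242_results.items():
    print(f"{algorithm}: {to_gb_per_s(rate):.1f} GB/s ({to_gib_per_s(rate):.1f} GiB/s)")
```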
Speedb 2.7, Test: Read While Writing (Op/s, More Is Better): G242-P36 12905035, dd 13785530, gig 13255341; SE +/- 201662.23, N = 15. (CXX) g++ options: -O3 -march=native -pthread -fno-builtin-memcmp -fno-rtti -lpthread
RocksDB This is a benchmark of Meta/Facebook's RocksDB, an embeddable persistent key-value store for fast storage that is based on Google's LevelDB. Learn more via the OpenBenchmarking.org test page.
RocksDB 8.0, Test: Random Read (Op/s, More Is Better): G242-P36 434052355, dd 404291813, gig 450500912; SE +/- 4162622.50, N = 15. (CXX) g++ options: -O3 -march=native -pthread -fno-builtin-memcmp -fno-rtti -lpthread
Quicksilver 20230818, Input: CORAL2 P2 (Figure Of Merit, More Is Better): G242-P36 25543333, dd 24460000, gig 25520000; SE +/- 84129.53, N = 3. (CXX) g++ options: -fopenmp -O3 -march=native
OpenSSL 3.1, Algorithm: AES-128-GCM (byte/s, More Is Better): G242-P36 382688207300, dd 382793028680, gig 382856328260; SE +/- 3586455.40, N = 3. (CC) gcc options: -pthread -O3 -lssl -lcrypto -ldl
OpenSSL 3.1, Algorithm: SHA256 (byte/s, More Is Better): G242-P36 101322961753, dd 101321237450, gig 100039593750; SE +/- 64411674.99, N = 3. (CC) gcc options: -pthread -O3 -lssl -lcrypto -ldl
OpenSSL 3.1, Algorithm: SHA512 (byte/s, More Is Better): G242-P36 34478769590, dd 34448701700, gig 34453399030; SE +/- 8688088.34, N = 3. (CC) gcc options: -pthread -O3 -lssl -lcrypto -ldl
Speedb 2.7, Test: Random Read (Op/s, More Is Better): G242-P36 409571625, dd 420437471, gig 418448304; SE +/- 2947408.87, N = 11. (CXX) g++ options: -O3 -march=native -pthread -fno-builtin-memcmp -fno-rtti -lpthread
RocksDB 8.0, Test: Read While Writing (Op/s, More Is Better): G242-P36 8558845, dd 8636563, gig 8516060; SE +/- 68677.29, N = 9. (CXX) g++ options: -O3 -march=native -pthread -fno-builtin-memcmp -fno-rtti -lpthread
Llama.cpp b1808, Model: llama-2-13b.Q4_0.gguf (Tokens Per Second, More Is Better): G242-P36 13.90, dd 14.11, gig 14.02; SE +/- 0.16, N = 15. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -mcpu=native -lopenblas
CacheBench This is a performance test of CacheBench, which is part of LLCbench. CacheBench is designed to test memory and cache bandwidth performance. Learn more via the OpenBenchmarking.org test page.
CacheBench, Test: Read (MB/s, More Is Better): G242-P36 11438.28 (MIN: 11437.32 / MAX: 11438.59), dd 11438.86 (MIN: 11438.05 / MAX: 11439.05), gig 11438.67 (MIN: 11438.33 / MAX: 11438.85); SE +/- 0.01, N = 3. (CC) gcc options: -O3 -lrt
CacheBench, Test: Read / Modify / Write (MB/s, More Is Better): G242-P36 45034.98 (MIN: 43692.22 / MAX: 45639.26), dd 45041.15 (MIN: 43693.38 / MAX: 45647.65), gig 45027.47 (MIN: 43694.36 / MAX: 45640.07); SE +/- 2.04, N = 3. (CC) gcc options: -O3 -lrt
CacheBench, Test: Write (MB/s, More Is Better): G242-P36 38239.97 (MIN: 35288.52 / MAX: 41382), dd 38252.63 (MIN: 35291.37 / MAX: 41384.3), gig 38251.59 (MIN: 35289.91 / MAX: 41383.99); SE +/- 1.22, N = 3. (CC) gcc options: -O3 -lrt
RocksDB 8.0, Test: Read Random Write Random (Op/s, More Is Better): G242-P36 3320337, dd 3537322, gig 3449038; SE +/- 30568.75, N = 7. (CXX) g++ options: -O3 -march=native -pthread -fno-builtin-memcmp -fno-rtti -lpthread
Stress-NG 0.16.04, Test: Context Switching (Bogo Ops/s, More Is Better): G242-P36 20365273.28, dd 20708288.98, gig 19654874.85; SE +/- 174052.70, N = 15. (CXX) g++ options: -O2 -std=gnu99 -lc
Algebraic Multi-Grid Benchmark AMG is a parallel algebraic multigrid solver for linear systems arising from problems on unstructured grids. The driver provided with AMG builds linear systems for various 3-dimensional problems. Learn more via the OpenBenchmarking.org test page.
Algebraic Multi-Grid Benchmark 1.2 (Figure Of Merit, More Is Better): G242-P36 1057064333, gig 1060136000; SE +/- 47484.50, N = 3. (CC) gcc options: -lparcsr_ls -lparcsr_mv -lseq_mv -lIJ_mv -lkrylov -lHYPRE_utilities -lm -fopenmp -lmpi
GROMACS This test runs the GROMACS (GROningen MAchine for Chemical Simulations) molecular dynamics package with the water_GMX50 data. The test profile allows selecting between CPU- and GPU-based GROMACS builds. Learn more via the OpenBenchmarking.org test page.
GROMACS 2023, Implementation: MPI CPU - Input: water_GMX50_bare (Ns Per Day, More Is Better): G242-P36 4.588, gig 4.688; SE +/- 0.002, N = 3. (CXX) g++ options: -O3
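GROMACS reports throughput as nanoseconds of simulated time per day of wall-clock time. The arithmetic behind the metric is sketched below; the step count, 2 fs timestep and wall time in the first example are hypothetical inputs, while the second example reuses the G242-P36 result of 4.588 ns/day to estimate how long one simulated nanosecond takes:

```python
SECONDS_PER_DAY = 86_400
FS_PER_NS = 1_000_000  # 1 ns = 10**6 fs


def ns_per_day(steps: int, timestep_fs: float, wall_seconds: float) -> float:
    """Simulated nanoseconds per day of wall-clock time."""
    simulated_ns = steps * timestep_fs / FS_PER_NS
    return simulated_ns * SECONDS_PER_DAY / wall_seconds


def wall_hours_for(target_ns: float, rate_ns_per_day: float) -> float:
    """Wall-clock hours needed to simulate target_ns at a given ns/day rate."""
    return 24.0 * target_ns / rate_ns_per_day


if __name__ == "__main__":
    # Hypothetical run: 1,000,000 steps of 2 fs that finish in exactly one day.
    print(ns_per_day(1_000_000, 2.0, 86_400))
    # Hours per simulated nanosecond at the G242-P36 rate of 4.588 ns/day.
    print(round(wall_hours_for(1.0, 4.588), 2))
```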
Speedb 2.7, Test: Random Fill (Op/s, More Is Better): G242-P36 284987, dd 285316, gig 278264; SE +/- 1985.22, N = 3. (CXX) g++ options: -O3 -march=native -pthread -fno-builtin-memcmp -fno-rtti -lpthread
Speedb 2.7, Test: Random Fill Sync (Op/s, More Is Better): G242-P36 207376, dd 207891, gig 204410; SE +/- 1986.97, N = 3. (CXX) g++ options: -O3 -march=native -pthread -fno-builtin-memcmp -fno-rtti -lpthread
Speedb 2.7, Test: Update Random (Op/s, More Is Better): G242-P36 272275, dd 264748, gig 264998; SE +/- 1573.56, N = 3. (CXX) g++ options: -O3 -march=native -pthread -fno-builtin-memcmp -fno-rtti -lpthread
Speedb 2.7, Test: Read Random Write Random (Op/s, More Is Better): G242-P36 2419683, dd 2473336, gig 2518519; SE +/- 21596.32, N = 3. (CXX) g++ options: -O3 -march=native -pthread -fno-builtin-memcmp -fno-rtti -lpthread
RocksDB 8.0, Test: Update Random (Op/s, More Is Better): G242-P36 431406, dd 443804, gig 427908; SE +/- 4409.44, N = 3. (CXX) g++ options: -O3 -march=native -pthread -fno-builtin-memcmp -fno-rtti -lpthread
OpenSSL 3.1, Algorithm: RSA4096 (verify/s, More Is Better): G242-P36 517886.0, dd 518085.7, gig 518115.9; SE +/- 27.21, N = 3. (CC) gcc options: -pthread -O3 -lssl -lcrypto -ldl
OpenSSL 3.1, Algorithm: RSA4096 (sign/s, More Is Better): G242-P36 6342.8, dd 6345.3, gig 6345.6; SE +/- 0.10, N = 3. (CC) gcc options: -pthread -O3 -lssl -lcrypto -ldl
Neural Magic DeepSparse 1.6, Model: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better): G242-P36 1137.78, dd 1135.44, gig 1141.45; SE +/- 1.48, N = 3
Neural Magic DeepSparse 1.6, Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better): G242-P36 1830.58, dd 1843.24, gig 1830.72; SE +/- 0.45, N = 3
Neural Magic DeepSparse 1.6, Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better): G242-P36 1834.58, dd 1850.23, gig 1832.12; SE +/- 1.29, N = 3
OpenBenchmarking.org Figure Of Merit, More Is Better Quicksilver 20230818 Input: CORAL2 P1 G242-P36 dd gig 6M 12M 18M 24M 30M SE +/- 81103.50, N = 3 25273333 25510000 25810000 1. (CXX) g++ options: -fopenmp -O3 -march=native
OpenBenchmarking.org Bogo Ops/s, More Is Better Stress-NG 0.16.04 Test: Malloc G242-P36 dd gig 40M 80M 120M 160M 200M SE +/- 296218.44, N = 3 164364343.39 164592319.96 164067515.18 1. (CXX) g++ options: -O2 -std=gnu99 -lc
OpenBenchmarking.org Bogo Ops/s, More Is Better Stress-NG 0.16.04 Test: CPU Cache G242-P36 dd gig 200K 400K 600K 800K 1000K SE +/- 1033.74, N = 3 879814.35 882225.34 882510.28 1. (CXX) g++ options: -O2 -std=gnu99 -lc
OpenBenchmarking.org Bogo Ops/s, More Is Better Stress-NG 0.16.04 Test: Pthread G242-P36 dd gig 20K 40K 60K 80K 100K SE +/- 65.20, N = 3 113551.87 113379.28 112993.15 1. (CXX) g++ options: -O2 -std=gnu99 -lc
OpenBenchmarking.org Bogo Ops/s, More Is Better Stress-NG 0.16.04 Test: Zlib G242-P36 dd gig 1300 2600 3900 5200 6500 SE +/- 0.87, N = 3 5987.88 5985.69 5993.74 1. (CXX) g++ options: -O2 -std=gnu99 -lc
OpenBenchmarking.org Bogo Ops/s, More Is Better Stress-NG 0.16.04 Test: Vector Shuffle G242-P36 dd gig 20K 40K 60K 80K 100K SE +/- 3.20, N = 3 86218.95 86257.77 86375.79 1. (CXX) g++ options: -O2 -std=gnu99 -lc
OpenBenchmarking.org Bogo Ops/s, More Is Better Stress-NG 0.16.04 Test: Vector Math G242-P36 dd gig 90K 180K 270K 360K 450K SE +/- 4.53, N = 3 398869.87 399042.09 398993.46 1. (CXX) g++ options: -O2 -std=gnu99 -lc
OpenBenchmarking.org Bogo Ops/s, More Is Better Stress-NG 0.16.04 Test: Wide Vector Math G242-P36 dd gig 500K 1000K 1500K 2000K 2500K SE +/- 6960.54, N = 3 2346519.63 2354926.97 2355564.94 1. (CXX) g++ options: -O2 -std=gnu99 -lc
OpenBenchmarking.org Bogo Ops/s, More Is Better Stress-NG 0.16.04 Test: Matrix Math G242-P36 dd gig 150K 300K 450K 600K 750K SE +/- 404.39, N = 3 681885.30 682554.33 682490.75 1. (CXX) g++ options: -O2 -std=gnu99 -lc
OpenBenchmarking.org Bogo Ops/s, More Is Better Stress-NG 0.16.04 Test: Function Call G242-P36 dd gig 15K 30K 45K 60K 75K SE +/- 1.53, N = 3 72283.18 72290.81 72298.23 1. (CXX) g++ options: -O2 -std=gnu99 -lc
OpenBenchmarking.org Bogo Ops/s, More Is Better Stress-NG 0.16.04 Test: Matrix 3D Math G242-P36 dd gig 1100 2200 3300 4400 5500 SE +/- 3.74, N = 3 5099.81 5089.19 5082.65 1. (CXX) g++ options: -O2 -std=gnu99 -lc
OpenBenchmarking.org Bogo Ops/s, More Is Better Stress-NG 0.16.04 Test: CPU Stress G242-P36 dd gig 7K 14K 21K 28K 35K SE +/- 1.60, N = 3 33761.08 33559.87 33765.26 1. (CXX) g++ options: -O2 -std=gnu99 -lc
OpenBenchmarking.org Bogo Ops/s, More Is Better Stress-NG 0.16.04 Test: Crypto G242-P36 dd gig 50K 100K 150K 200K 250K SE +/- 928.63, N = 3 252315.26 251996.36 251986.12 1. (CXX) g++ options: -O2 -std=gnu99 -lc
OpenBenchmarking.org Bogo Ops/s, More Is Better Stress-NG 0.16.04 Test: Fused Multiply-Add G242-P36 dd gig 30M 60M 90M 120M 150M SE +/- 110268.18, N = 3 151220570.51 151037296.46 151387869.76 1. (CXX) g++ options: -O2 -std=gnu99 -lc
OpenBenchmarking.org Bogo Ops/s, More Is Better Stress-NG 0.16.04 Test: Hash G242-P36 dd gig 3M 6M 9M 12M 15M SE +/- 9429.94, N = 3 15671801.48 15654282.58 15654462.92 1. (CXX) g++ options: -O2 -std=gnu99 -lc
OpenBenchmarking.org Bogo Ops/s, More Is Better Stress-NG 0.16.04 Test: SENDFILE G242-P36 dd gig 300K 600K 900K 1200K 1500K SE +/- 18.53, N = 3 1624492.92 1624702.09 1624969.46 1. (CXX) g++ options: -O2 -std=gnu99 -lc
Stress-NG 0.16.04 (Bogo Ops/s, More Is Better)
Test                        G242-P36        dd              gig             SE (N = 3)
AVX-512 VNNI                4690386.64      4692452.80      4691697.85      +/- 401.84
Glibc Qsort Data Sorting    2020.18         2020.30         2022.01         +/- 0.78
Vector Floating Point       102535.35       102604.74       102553.11       +/- 25.89
Floating Point              22213.54        22220.70        22219.80        +/- 0.42
Poll                        7330369.96      7395099.64      7392099.82      +/- 12697.25
Glibc C String Functions    62783286.48     62845443.53     62867317.16     +/- 17918.08
System V Message Passing    21143237.72     21119614.31     21054213.79     +/- 32907.24
Forking                     52250.53        50686.58        50130.97        +/- 410.62
Memory Copying              27153.74        27159.07        27162.14        +/- 1.16
Semaphores                  167637763.59    166379337.67    167850957.68    +/- 217685.76
Mutex                       37172432.66     37267646.91     37215286.04     +/- 9463.26
Mixed Scheduler             36794.33        36361.29        36309.29        +/- 141.59
Pipe                        30330081.18     30776841.73     29805509.12     +/- 95784.06
Socket Activity             28009.07        27536.79        27959.85        +/- 159.43
All tests: (CXX) g++ options: -O2 -std=gnu99 -lc
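Because the three runs land close together on most of these stressors, a quick relative-spread calculation helps show which results actually moved. This is a minimal Python sketch, assuming the values map to G242-P36, dd, and gig in the order the result labels are listed; `spread_pct` is a hypothetical helper, and "spread" is defined here simply as (max - min) / min.

```python
# Run-to-run spread for the Stress-NG results (Bogo Ops/s).
# Assumption: each tuple is ordered (G242-P36, dd, gig), following the label order.
results = {
    "AVX-512 VNNI": (4690386.64, 4692452.80, 4691697.85),
    "Glibc Qsort Data Sorting": (2020.18, 2020.30, 2022.01),
    "Vector Floating Point": (102535.35, 102604.74, 102553.11),
    "Floating Point": (22213.54, 22220.70, 22219.80),
    "Poll": (7330369.96, 7395099.64, 7392099.82),
    "Glibc C String Functions": (62783286.48, 62845443.53, 62867317.16),
    "System V Message Passing": (21143237.72, 21119614.31, 21054213.79),
    "Forking": (52250.53, 50686.58, 50130.97),
    "Memory Copying": (27153.74, 27159.07, 27162.14),
    "Semaphores": (167637763.59, 166379337.67, 167850957.68),
    "Mutex": (37172432.66, 37267646.91, 37215286.04),
    "Mixed Scheduler": (36794.33, 36361.29, 36309.29),
    "Pipe": (30330081.18, 30776841.73, 29805509.12),
    "Socket Activity": (28009.07, 27536.79, 27959.85),
}


def spread_pct(values):
    """Hypothetical helper: (max - min) / min, expressed as a percentage."""
    lo, hi = min(values), max(values)
    return (hi - lo) / lo * 100.0


# Print tests from largest to smallest spread.
for test, values in sorted(results.items(), key=lambda kv: spread_pct(kv[1]), reverse=True):
    print(f"{test:<26} {spread_pct(values):6.2f}%")
```

With these numbers, Forking shows the largest spread (about 4.2%), followed by Pipe (about 3.3%); every other test stays under 2%.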
Llama.cpp
Llama.cpp is a port of Facebook's LLaMA model in C/C++ developed by Georgi Gerganov. It allows inference of LLaMA and other supported models in C/C++. For CPU inference, Llama.cpp supports AVX2/AVX-512, ARM NEON, and other modern ISAs, along with features such as OpenBLAS acceleration.
Llama.cpp b1808 (Tokens Per Second, More Is Better)
Model                   G242-P36        dd              gig             SE (N = 6)
llama-2-7b.Q4_0.gguf    21.58           26.64           21.90           +/- 0.21
(CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -mcpu=native -lopenblas
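To put the Llama.cpp numbers in relative terms, the following minimal Python sketch expresses each run as a percentage change over the G242-P36 result. It assumes the values follow the label order (G242-P36, dd, gig) of the result; `relative_pct` is a hypothetical helper.

```python
# Relative Llama.cpp throughput (tokens/s) versus the G242-P36 run.
# Assumption: values map to configurations in the order the labels are listed.
results = {"G242-P36": 21.58, "dd": 26.64, "gig": 21.90}
baseline = results["G242-P36"]


def relative_pct(value, base):
    """Hypothetical helper: percentage change of value relative to base."""
    return (value / base - 1.0) * 100.0


for name, tps in results.items():
    print(f"{name:<9} {tps:6.2f} tok/s  {relative_pct(tps, baseline):+6.1f}%")
```

This works out to roughly +23.4% for dd and +1.5% for gig; the reported SE of +/- 0.21 (N = 6) is small next to the roughly 5 tokens/s gap between dd and the other two runs.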
G242-P36 Kernel Notes: Transparent Huge Pages: madvise
Compiler Notes: --build=aarch64-linux-gnu --disable-libquadmath --disable-libquadmath-support --disable-werror --enable-bootstrap --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-fix-cortex-a53-843419 --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-nls --enable-objc-gc=auto --enable-plugin --enable-shared --enable-threads=posix --host=aarch64-linux-gnu --program-prefix=aarch64-linux-gnu- --target=aarch64-linux-gnu --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-target-system-zlib=auto -v
Processor Notes: Scaling Governor: cppc_cpufreq performance (Boost: Disabled)
Python Notes: Python 3.11.6
Security Notes: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of __user pointer sanitization + spectre_v2: Mitigation of CSV2 BHB + srbds: Not affected + tsx_async_abort: Not affected
Testing initiated at 16 January 2024 23:01 by user phoronix.
gig Kernel Notes: Transparent Huge Pages: madvise
Compiler Notes: --build=aarch64-linux-gnu --disable-libquadmath --disable-libquadmath-support --disable-werror --enable-bootstrap --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-fix-cortex-a53-843419 --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-nls --enable-objc-gc=auto --enable-plugin --enable-shared --enable-threads=posix --host=aarch64-linux-gnu --program-prefix=aarch64-linux-gnu- --target=aarch64-linux-gnu --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-target-system-zlib=auto -v
Processor Notes: Scaling Governor: cppc_cpufreq performance (Boost: Disabled)
Python Notes: Python 3.11.6
Security Notes: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of __user pointer sanitization + spectre_v2: Mitigation of CSV2 BHB + srbds: Not affected + tsx_async_abort: Not affected
Testing initiated at 17 January 2024 18:09 by user phoronix.
dd Processor: ARMv8 Neoverse-N1 @ 3.00GHz (128 Cores), Motherboard: GIGABYTE G242-P36-00 MP32-AR2-00 v01000100 (F31k SCP: 2.10.20220531 BIOS), Chipset: Ampere Computing LLC Altra PCI Root Complex A, Memory: 16 x 32 GB DDR4-3200MT/s Samsung M393A4K40DB3-CWE, Disk: 800GB Micron_7450_MTFDKBA800TFS, Graphics: ASPEED, Monitor: VGA HDMI, Network: 2 x Intel I350
OS: Ubuntu 23.10, Kernel: 6.5.0-13-generic (aarch64), Compiler: GCC 13.2.0, File-System: ext4, Screen Resolution: 1920x1080
Kernel Notes: Transparent Huge Pages: madvise
Compiler Notes: --build=aarch64-linux-gnu --disable-libquadmath --disable-libquadmath-support --disable-werror --enable-bootstrap --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-fix-cortex-a53-843419 --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-nls --enable-objc-gc=auto --enable-plugin --enable-shared --enable-threads=posix --host=aarch64-linux-gnu --program-prefix=aarch64-linux-gnu- --target=aarch64-linux-gnu --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-target-system-zlib=auto -v
Processor Notes: Scaling Governor: cppc_cpufreq performance (Boost: Disabled)
Python Notes: Python 3.11.6
Security Notes: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of __user pointer sanitization + spectre_v2: Mitigation of CSV2 BHB + srbds: Not affected + tsx_async_abort: Not affected
Testing initiated at 17 January 2024 20:45 by user phoronix.