AmpereOne A192-32X vs. AWS Graviton4 CPU Performance Benchmarks

AmpereOne versus AWS Graviton4 ARM64 CPU benchmarks by Michael Larabel for a future article.

AmpereOne A192-32X:

Processor: AmpereOne @ 3.20GHz (192 Cores), Motherboard: Supermicro ARS-211M-NR R13SPD v1.02 (T20240726102529 BIOS), Chipset: Ampere Computing LLC Device e208, Memory: 8 x 64GB DDR5-5200MT/s, Disk: 3841GB SAMSUNG MZQL23T8HCLS-00A07 + 960GB SAMSUNG MZ1L2960HCJR-00A07, Graphics: ASPEED, Monitor: VGA HDMI, Network: 2 x Broadcom BCM57414 NetXtreme-E 10Gb/25Gb + 2 x Mellanox MT2892

OS: Ubuntu 24.04, Kernel: 6.8.0-39-generic-64k (aarch64), Compiler: GCC 13.2.0, File-System: ext4, Screen Resolution: 1920x1080

Graviton4 192 vCPUs:

Processor: ARMv8 Neoverse-V2 (192 Cores), Motherboard: Amazon EC2 r8g.48xlarge (1.0 BIOS), Chipset: Amazon Device 0200, Memory: 1520GB, Disk: 429GB Amazon Elastic Block Store, Network: Amazon Elastic

OS: Ubuntu 24.04, Kernel: 6.8.0-41-generic-64k (aarch64), Compiler: GCC 13.2.0, File-System: ext4, System Layer: amazon

7-Zip Compression 22.01
Test: Compression Rating
MIPS > Higher Is Better
AmpereOne A192-32X .. 756681 |======================================
Graviton4 192 vCPUs . 969091 |=================================================

7-Zip Compression 22.01
Test: Decompression Rating
MIPS > Higher Is Better
AmpereOne A192-32X .. 877305 |=================================================
Graviton4 192 vCPUs . 874238 |=================================================

Algebraic Multi-Grid Benchmark 1.2
Figure Of Merit > Higher Is Better
AmpereOne A192-32X .. 1828265333 |==============
Graviton4 192 vCPUs . 5820992333 |=============================================

ASKAP 1.0
Test: tConvolve MPI - Degridding
Mpix/sec > Higher Is Better
AmpereOne A192-32X .. 20809.2 |===================
Graviton4 192 vCPUs . 53520.6 |================================================

ASKAP 1.0
Test: tConvolve MPI - Gridding
Mpix/sec > Higher Is Better
AmpereOne A192-32X .. 18414.7 |==================
Graviton4 192 vCPUs . 49717.8 |================================================

ASTC Encoder 4.7
Preset: Thorough
MT/s > Higher Is Better
AmpereOne A192-32X .. 50.02 |==========================
Graviton4 192 vCPUs . 94.49 |==================================================

ASTC Encoder 4.7
Preset: Very Thorough
MT/s > Higher Is Better
AmpereOne A192-32X .. 7.3748 |=========================
Graviton4 192 vCPUs . 13.9615 |================================================

ASTC Encoder 4.7
Preset: Exhaustive
MT/s > Higher Is Better
AmpereOne A192-32X .. 4.5930 |==========================
Graviton4 192 vCPUs . 8.6372 |=================================================

ClickHouse 22.12.3.5
100M Rows Hits Dataset, First Run / Cold Cache
Queries Per Minute, Geo Mean > Higher Is Better
AmpereOne A192-32X .. 402.87 |============================
Graviton4 192 vCPUs . 713.11 |=================================================

ClickHouse 22.12.3.5
100M Rows Hits Dataset, Second Run
Queries Per Minute, Geo Mean > Higher Is Better
AmpereOne A192-32X .. 400.27 |===========================
Graviton4 192 vCPUs . 728.98 |=================================================

ClickHouse 22.12.3.5
100M Rows Hits Dataset, Third Run
Queries Per Minute, Geo Mean > Higher Is Better
AmpereOne A192-32X .. 406.36 |============================
Graviton4 192 vCPUs . 724.05 |=================================================

CloverLeaf 1.3
Input: clover_bm64_short
Seconds < Lower Is Better
AmpereOne A192-32X .. 39.23 |==================================================
Graviton4 192 vCPUs . 22.06 |============================

CloverLeaf 1.3
Input: clover_bm16
Seconds < Lower Is Better
AmpereOne A192-32X .. 349.49 |=================================================
Graviton4 192 vCPUs . 323.86 |=============================================

Coremark 1.0
CoreMark Size 666 - Iterations Per Second
Iterations/Sec > Higher Is Better
AmpereOne A192-32X .. 4561177.34 |=========================================
Graviton4 192 vCPUs . 5066982.50 |=============================================

GPAW 23.6
Input: Carbon Nanotube
Seconds < Lower Is Better
AmpereOne A192-32X .. 41.06 |==================================================
Graviton4 192 vCPUs . 28.58 |===================================

GraphicsMagick 1.3.43
Operation: Noise-Gaussian
Iterations Per Minute > Higher Is Better
AmpereOne A192-32X .. 245 |=====================
Graviton4 192 vCPUs . 602 |====================================================

GraphicsMagick 1.3.43
Operation: Enhanced
Iterations Per Minute > Higher Is Better
AmpereOne A192-32X .. 370 |===========================
Graviton4 192 vCPUs . 715 |====================================================

GraphicsMagick 1.3.43
Operation: Sharpen
Iterations Per Minute > Higher Is Better
AmpereOne A192-32X .. 469 |==============================
Graviton4 192 vCPUs . 817 |====================================================

GraphicsMagick 1.3.43
Operation: Swirl
Iterations Per Minute > Higher Is Better
AmpereOne A192-32X .. 953 |===============================
Graviton4 192 vCPUs . 1575 |===================================================

GROMACS 2024
Implementation: MPI CPU - Input: water_GMX50_bare
Ns Per Day > Higher Is Better
AmpereOne A192-32X .. 7.039 |===========================
Graviton4 192 vCPUs . 12.708 |=================================================

Helsing 1.0-beta
Digit Range: 14 digit
Seconds < Lower Is Better
AmpereOne A192-32X .. 30.47 |==================================================
Graviton4 192 vCPUs . 26.48 |===========================================

High Performance Conjugate Gradient 3.1
X Y Z: 144 144 144 - RT: 60
GFLOP/s > Higher Is Better
AmpereOne A192-32X .. 33.36 |==============
Graviton4 192 vCPUs . 114.95 |=================================================

John The Ripper 2023.03.14
Test: Blowfish
Real C/S > Higher Is Better
AmpereOne A192-32X .. 176306 |=================================================
Graviton4 192 vCPUs . 139322 |=======================================

John The Ripper 2023.03.14
Test: bcrypt
Real C/S > Higher Is Better
AmpereOne A192-32X .. 174048 |=================================================
Graviton4 192 vCPUs . 139374 |=======================================

LAMMPS Molecular Dynamics Simulator 23Jun2022
Model: Rhodopsin Protein
ns/day > Higher Is Better
AmpereOne A192-32X .. 50.54 |================================
Graviton4 192 vCPUs . 78.55 |==================================================

LAMMPS Molecular Dynamics Simulator 23Jun2022
Model: 20k Atoms
ns/day > Higher Is Better
AmpereOne A192-32X .. 53.86 |===================================
Graviton4 192 vCPUs . 76.53 |==================================================

Liquid-DSP 1.6
Threads: 128 - Buffer Length: 256 - Filter Length: 32
samples/s > Higher Is Better
AmpereOne A192-32X .. 2983900000 |=====================
Graviton4 192 vCPUs . 6268333333 |=============================================

LULESH 2.0.3
z/s > Higher Is Better
AmpereOne A192-32X .. 41890.41 |=================
Graviton4 192 vCPUs . 114373.35 |==============================================

m-queens 1.2
Time To Solve
Seconds < Lower Is Better
AmpereOne A192-32X .. 5.357 |===============================================
Graviton4 192 vCPUs . 5.740 |==================================================

Memcached 1.6.19
Set To Get Ratio: 1:100
Ops/sec > Higher Is Better
AmpereOne A192-32X .. 3895708.19 |==========================================
Graviton4 192 vCPUs . 4205545.72 |=============================================

miniFE 2.2
Problem Size: Small
CG Mflops > Higher Is Better
AmpereOne A192-32X .. 38349.2 |===================================
Graviton4 192 vCPUs . 53285.0 |================================================

NAS Parallel Benchmarks 3.4
Test / Class: EP.D
Total Mop/s > Higher Is Better
AmpereOne A192-32X .. 7557.72 |===================================
Graviton4 192 vCPUs . 10218.17 |===============================================

NAS Parallel Benchmarks 3.4
Test / Class: SP.C
Total Mop/s > Higher Is Better
AmpereOne A192-32X .. 33612.41 |================================
Graviton4 192 vCPUs . 49186.02 |===============================================

NAS Parallel Benchmarks 3.4
Test / Class: IS.D
Total Mop/s > Higher Is Better
AmpereOne A192-32X .. 2417.04 |============================
Graviton4 192 vCPUs . 4203.95 |================================================

Numpy Benchmark
Score > Higher Is Better
AmpereOne A192-32X .. 313.43 |==================================
Graviton4 192 vCPUs . 449.08 |=================================================

NWChem 7.0.2
Input: C240 Buckyball
Seconds < Lower Is Better
AmpereOne A192-32X .. 2293.2 |=================================================
Graviton4 192 vCPUs . 1576.3 |==================================

OpenFOAM 10
Input: drivaerFastback, Small Mesh Size - Mesh Time
Seconds < Lower Is Better
AmpereOne A192-32X .. 28.42 |==================================================
Graviton4 192 vCPUs . 17.30 |==============================

OpenFOAM 10
Input: drivaerFastback, Small Mesh Size - Execution Time
Seconds < Lower Is Better
AmpereOne A192-32X .. 32.36 |==================================================
Graviton4 192 vCPUs . 22.62 |===================================

OpenFOAM 10
Input: drivaerFastback, Medium Mesh Size - Mesh Time
Seconds < Lower Is Better
AmpereOne A192-32X .. 152.18 |=================================================
Graviton4 192 vCPUs . 83.28 |===========================

OpenFOAM 10
Input: drivaerFastback, Medium Mesh Size - Execution Time
Seconds < Lower Is Better
AmpereOne A192-32X .. 315.66 |=================================================
Graviton4 192 vCPUs . 110.33 |=================

Parallel BZIP2 Compression 1.1.13
FreeBSD-13.0-RELEASE-amd64-memstick.img Compression
Seconds < Lower Is Better
AmpereOne A192-32X .. 1.644485 |===============================================
Graviton4 192 vCPUs . 1.526932 |============================================

Pennant 1.0.1
Test: leblancbig
Hydro Cycle Time - Seconds < Lower Is Better
AmpereOne A192-32X .. 3.349439 |===============================================
Graviton4 192 vCPUs . 1.314359 |==================

Pennant 1.0.1
Test: sedovbig
Hydro Cycle Time - Seconds < Lower Is Better
AmpereOne A192-32X .. 4.290586 |===============================================
Graviton4 192 vCPUs . 1.742032 |===================

PostgreSQL 16
Scaling Factor: 100 - Clients: 1000 - Mode: Read Only
TPS > Higher Is Better
AmpereOne A192-32X .. 2746730 |================================================
Graviton4 192 vCPUs . 2396138 |==========================================

PostgreSQL 16
Scaling Factor: 100 - Clients: 1000 - Mode: Read Only - Average Latency
ms < Lower Is Better
AmpereOne A192-32X .. 0.364 |============================================
Graviton4 192 vCPUs . 0.417 |==================================================

Primesieve 12.1
Length: 1e13
Seconds < Lower Is Better
AmpereOne A192-32X .. 14.01 |===============================================
Graviton4 192 vCPUs . 14.75 |==================================================

PyBench 2018-02-16
Total For Average Test Times
Milliseconds < Lower Is Better
AmpereOne A192-32X .. 1246 |===================================================
Graviton4 192 vCPUs . 845 |===================================

PyTorch 2.2.1
Device: CPU - Batch Size: 512 - Model: ResNet-50
batches/sec > Higher Is Better
AmpereOne A192-32X .. 19.98 |==================================================

QMCPACK 3.17.1
Input: Li2_STO_ae
Total Execution Time - Seconds < Lower Is Better
AmpereOne A192-32X .. 106.91 |=================================================
Graviton4 192 vCPUs . 73.96 |==================================

QuantLib 1.32
Configuration: Multi-Threaded
MFLOPS > Higher Is Better
AmpereOne A192-32X .. 300839.9 |==========================
Graviton4 192 vCPUs . 547281.9 |===============================================

RocksDB 9.0
Test: Random Read
Op/s > Higher Is Better
AmpereOne A192-32X .. 719203371 |===========================
Graviton4 192 vCPUs . 1208755766 |=============================================

RocksDB 9.0
Test: Read While Writing
Op/s > Higher Is Better
AmpereOne A192-32X .. 9803710 |================================================
Graviton4 192 vCPUs . 9720737 |================================================

Speedb 2.7
Test: Random Read
Op/s > Higher Is Better
AmpereOne A192-32X .. 734401635 |===========================
Graviton4 192 vCPUs . 1222618271 |=============================================

srsRAN Project 23.10.1-20240325
Test: PUSCH Processor Benchmark, Throughput Total
Mbps > Higher Is Better
AmpereOne A192-32X .. 2824.7 |===========================================
Graviton4 192 vCPUs . 3208.5 |=================================================

srsRAN Project 23.10.1-20240325
Test: PUSCH Processor Benchmark, Throughput Thread
Mbps > Higher Is Better
AmpereOne A192-32X .. 50.2 |================================================
Graviton4 192 vCPUs . 53.0 |===================================================

srsRAN Project 23.10.1-20240325
Test: PDSCH Processor Benchmark, Throughput Total
Mbps > Higher Is Better
AmpereOne A192-32X .. 24257.2 |=======================================
Graviton4 192 vCPUs . 29796.9 |================================================

srsRAN Project 23.10.1-20240325
Test: PDSCH Processor Benchmark, Throughput Thread
Mbps > Higher Is Better
AmpereOne A192-32X .. 193.7 |=========================================
Graviton4 192 vCPUs . 239.1 |==================================================

Stockfish 16.1
Chess Benchmark
Nodes Per Second > Higher Is Better
AmpereOne A192-32X .. 128602136 |==========================
Graviton4 192 vCPUs . 230111588 |==============================================

Timed Gem5 Compilation 23.0.1
Time To Compile
Seconds < Lower Is Better
AmpereOne A192-32X .. 195.57 |=================================================
Graviton4 192 vCPUs . 145.17 |====================================

Timed LLVM Compilation 16.0
Build System: Ninja
Seconds < Lower Is Better
AmpereOne A192-32X .. 177.10 |=================================================
Graviton4 192 vCPUs . 113.28 |===============================

Timed Mesa Compilation 24.0
Time To Compile
Seconds < Lower Is Better
AmpereOne A192-32X .. 17.19 |==================================================
Graviton4 192 vCPUs . 13.78 |========================================

Timed Node.js Compilation 21.7.2
Time To Compile
Seconds < Lower Is Better
AmpereOne A192-32X .. 214.16 |=================================================
Graviton4 192 vCPUs . 159.97 |=====================================

WRF 4.2.2
Input: conus 2.5km
Seconds < Lower Is Better
AmpereOne A192-32X .. 9102.32 |================================================
Graviton4 192 vCPUs . 3759.19 |====================

Xcompact3d Incompact3d 2021-03-11
Input: input.i3d 193 Cells Per Direction
Seconds < Lower Is Better
AmpereOne A192-32X .. 8.95581484 |=============================================
Graviton4 192 vCPUs . 2.90287603 |===============

Xcompact3d Incompact3d 2021-03-11
Input: X3D-benchmarking input.i3d
Seconds < Lower Is Better
AmpereOne A192-32X .. 290.28 |=================================================
Graviton4 192 vCPUs . 91.89 |================

Xmrig 6.21
Variant: GhostRider - Hash Count: 1M
H/s > Higher Is Better
AmpereOne A192-32X .. 17812.6 |============================================
Graviton4 192 vCPUs . 19482.7 |================================================
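
Because the results mix "Higher Is Better" and "Lower Is Better" units, raw numbers cannot be compared across tests directly. The following is a minimal sketch, not part of the original result file, of one way to normalize a pair of results into a single ratio where a value above 1.0 means the Graviton4 instance was ahead. The four entries in `SAMPLE` are copied from the results above; the names `SAMPLE`, `graviton4_advantage`, and `geometric_mean` are hypothetical helpers introduced here for illustration, and the geometric mean covers only this small sample, not the full result set.

```python
from math import exp, log

# (test label, direction, AmpereOne A192-32X result, Graviton4 192 vCPUs result)
# Values are copied from the results above; the labels are shortened for readability.
SAMPLE = [
    ("7-Zip Compression 22.01 - Compression Rating", "higher", 756681, 969091),
    ("CloverLeaf 1.3 - clover_bm64_short", "lower", 39.23, 22.06),
    ("John The Ripper 2023.03.14 - Blowfish", "higher", 176306, 139322),
    ("Primesieve 12.1 - 1e13", "lower", 14.01, 14.75),
]


def graviton4_advantage(direction, ampereone, graviton4):
    """Return a ratio where > 1.0 means Graviton4 did better on this test."""
    if direction == "higher":
        # Throughput-style metric: a bigger number wins.
        return graviton4 / ampereone
    if direction == "lower":
        # Time-style metric: a smaller number wins, so invert the ratio.
        return ampereone / graviton4
    raise ValueError(f"unknown direction: {direction!r}")


def geometric_mean(values):
    """Geometric mean of positive numbers, computed via logarithms."""
    values = list(values)
    return exp(sum(log(v) for v in values) / len(values))


ratios = {name: graviton4_advantage(d, a, g) for name, d, a, g in SAMPLE}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.3f}x")
print(f"Geometric mean over this sample only: {geometric_mean(ratios.values()):.3f}x")
```

A geometric mean is used rather than an arithmetic mean so that a 2x win and a 2x loss cancel out; a test with only one result (such as the PyTorch entry) would have to be skipped before applying this.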