AmpereOne A192-32X vs. AWS Graviton4 CPU Performance Benchmarks

AmpereOne versus AWS Graviton4 ARM64 CPU benchmarks by Michael Larabel for a future article.

HTML result view exported from: https://openbenchmarking.org/result/2409052-NE-GRAVITON412&rdt&grt.

AmpereOne A192-32X
  Processor: AmpereOne @ 3.20GHz (192 Cores)
  Motherboard: Supermicro ARS-211M-NR R13SPD v1.02 (T20240726102529 BIOS)
  Chipset: Ampere Computing LLC Device e208
  Memory: 8 x 64GB DDR5-5200MT/s
  Disk: 3841GB SAMSUNG MZQL23T8HCLS-00A07 + 960GB SAMSUNG MZ1L2960HCJR-00A07
  Graphics: ASPEED
  Monitor: VGA HDMI
  Network: 2 x Broadcom BCM57414 NetXtreme-E 10Gb/25Gb + 2 x Mellanox MT2892
  OS: Ubuntu 24.04
  Kernel: 6.8.0-39-generic-64k (aarch64)
  Compiler: GCC 13.2.0
  File-System: ext4
  Screen Resolution: 1920x1080

Graviton4 192 vCPUs
  Processor: ARMv8 Neoverse-V2 (192 Cores)
  Motherboard: Amazon EC2 r8g.48xlarge (1.0 BIOS)
  Chipset: Amazon Device 0200
  Memory: 1520GB
  Disk: 429GB Amazon Elastic Block Store
  Network: Amazon Elastic
  Kernel: 6.8.0-41-generic-64k (aarch64)
  System Layer: amazon

Kernel Details
  - AmpereOne A192-32X: Transparent Huge Pages: always
  - Graviton4 192 vCPUs: nvme_core.io_timeout=4294967295 - Transparent Huge Pages: madvise

Compiler Details
  - --build=aarch64-linux-gnu --disable-libquadmath --disable-libquadmath-support --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-fix-cortex-a53-843419 --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-dIwDw0/gcc-13-13.2.0/debian/tmp-nvptx/usr --enable-plugin --enable-shared --enable-threads=posix --host=aarch64-linux-gnu --program-prefix=aarch64-linux-gnu- --target=aarch64-linux-gnu --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-target-system-zlib=auto --without-cuda-driver -v

Processor Details
  - AmpereOne A192-32X: Scaling Governor: cppc_cpufreq performance (Boost: Disabled)

Python Details
  - Python 3.12.3

Security Details
  - AmpereOne A192-32X: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + reg_file_data_sampling: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Not affected + spectre_v1: Mitigation of __user pointer sanitization + spectre_v2: Not affected + srbds: Not affected + tsx_async_abort: Not affected
  - Graviton4 192 vCPUs: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + reg_file_data_sampling: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of __user pointer sanitization + spectre_v2: Not affected + srbds: Not affected + tsx_async_abort: Not affected

7-Zip Compression

Test: Compression Rating

7-Zip Compression 22.01 (MIPS, More Is Better)
AmpereOne A192-32X: 756681 (SE +/- 7822.06, N = 3)
Graviton4 192 vCPUs: 969091 (SE +/- 4521.10, N = 3)
1. (CXX) g++ options: -lpthread -ldl -O2 -fPIC
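
As a rough illustration of how the paired results in this export can be compared, the sketch below computes a relative speedup from two reported values, taking the "More Is Better" or "Fewer Is Better" direction into account. The helper name relative_speedup and its parameters are hypothetical and not part of the Phoronix Test Suite; the input numbers are the 7-Zip compression ratings reported above.

```python
def relative_speedup(result_a: float, result_b: float, higher_is_better: bool) -> float:
    """Return how many times faster system B is than system A.

    Hypothetical helper for illustration only. For "More Is Better" metrics
    the ratio is b / a; for "Fewer Is Better" metrics (e.g. seconds) it is a / b.
    """
    if result_a <= 0 or result_b <= 0:
        raise ValueError("results must be positive")
    return result_b / result_a if higher_is_better else result_a / result_b


# 7-Zip compression rating in MIPS (more is better), taken from the result above.
ampereone_mips = 756681
graviton4_mips = 969091

speedup = relative_speedup(ampereone_mips, graviton4_mips, higher_is_better=True)
print(f"Graviton4 relative to AmpereOne: {speedup:.2f}x")
```

A value above 1.0 means the second system is ahead; for time-based results, pass higher_is_better=False so the ratio keeps the same meaning.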

7-Zip Compression

Test: Decompression Rating

7-Zip Compression 22.01 (MIPS, More Is Better)
AmpereOne A192-32X: 877305 (SE +/- 2055.40, N = 3)
Graviton4 192 vCPUs: 874238 (SE +/- 24922.61, N = 3)
1. (CXX) g++ options: -lpthread -ldl -O2 -fPIC

Algebraic Multi-Grid Benchmark

Algebraic Multi-Grid Benchmark 1.2 (Figure Of Merit, More Is Better)
AmpereOne A192-32X: 1828265333 (SE +/- 1007826.76, N = 3)
Graviton4 192 vCPUs: 5820992333 (SE +/- 2157483.59, N = 3)
1. (CC) gcc options: -lparcsr_ls -lparcsr_mv -lseq_mv -lIJ_mv -lkrylov -lHYPRE_utilities -lm -fopenmp -lmpi

ASKAP

Test: tConvolve MPI - Degridding

ASKAP 1.0 (Mpix/sec, More Is Better)
AmpereOne A192-32X: 20809.2 (SE +/- 159.94, N = 3)
Graviton4 192 vCPUs: 53520.6 (SE +/- 152.03, N = 3)
1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp

ASKAP

Test: tConvolve MPI - Gridding

ASKAP 1.0 (Mpix/sec, More Is Better)
AmpereOne A192-32X: 18414.7 (SE +/- 93.27, N = 3)
Graviton4 192 vCPUs: 49717.8 (SE +/- 131.17, N = 3)
1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp

ASTC Encoder

Preset: Thorough

ASTC Encoder 4.7 (MT/s, More Is Better)
AmpereOne A192-32X: 50.02 (SE +/- 0.01, N = 4)
Graviton4 192 vCPUs: 94.49 (SE +/- 0.03, N = 3)
1. (CXX) g++ options: -O3 -flto -pthread

ASTC Encoder

Preset: Very Thorough

ASTC Encoder 4.7 (MT/s, More Is Better)
AmpereOne A192-32X: 7.3748 (SE +/- 0.0004, N = 3)
Graviton4 192 vCPUs: 13.9615 (SE +/- 0.0026, N = 3)
1. (CXX) g++ options: -O3 -flto -pthread

ASTC Encoder

Preset: Exhaustive

ASTC Encoder 4.7 (MT/s, More Is Better)
AmpereOne A192-32X: 4.5930 (SE +/- 0.0011, N = 3)
Graviton4 192 vCPUs: 8.6372 (SE +/- 0.0006, N = 3)
1. (CXX) g++ options: -O3 -flto -pthread

ClickHouse

100M Rows Hits Dataset, First Run / Cold Cache

ClickHouse 22.12.3.5 (Queries Per Minute, Geo Mean, More Is Better)
AmpereOne A192-32X: 402.87 (SE +/- 17.49, N = 9; MIN: 23.07 / MAX: 5000)
Graviton4 192 vCPUs: 713.11 (SE +/- 1.29, N = 3; MIN: 90.5 / MAX: 5454.55)

ClickHouse

100M Rows Hits Dataset, Second Run

ClickHouse 22.12.3.5 (Queries Per Minute, Geo Mean, More Is Better)
AmpereOne A192-32X: 400.27 (SE +/- 15.66, N = 9; MIN: 30.88 / MAX: 5000)
Graviton4 192 vCPUs: 728.98 (SE +/- 7.44, N = 3; MIN: 89.42 / MAX: 6000)

ClickHouse

100M Rows Hits Dataset, Third Run

ClickHouse 22.12.3.5 (Queries Per Minute, Geo Mean, More Is Better)
AmpereOne A192-32X: 406.36 (SE +/- 13.28, N = 9; MIN: 30.14 / MAX: 5000)
Graviton4 192 vCPUs: 724.05 (SE +/- 6.40, N = 3; MIN: 90.09 / MAX: 6000)

CloverLeaf

Input: clover_bm64_short

CloverLeaf 1.3 (Seconds, Fewer Is Better)
AmpereOne A192-32X: 39.23 (SE +/- 0.13, N = 3)
Graviton4 192 vCPUs: 22.06 (SE +/- 0.23, N = 15)
1. (F9X) gfortran options: -O3 -march=native -funroll-loops -fopenmp

CloverLeaf

Input: clover_bm16

CloverLeaf 1.3 (Seconds, Fewer Is Better)
AmpereOne A192-32X: 349.49 (SE +/- 0.18, N = 3)
Graviton4 192 vCPUs: 323.86 (SE +/- 0.91, N = 3)
1. (F9X) gfortran options: -O3 -march=native -funroll-loops -fopenmp

Coremark

CoreMark Size 666 - Iterations Per Second

Coremark 1.0 (Iterations/Sec, More Is Better)
AmpereOne A192-32X: 4561177.34 (SE +/- 3500.33, N = 3)
Graviton4 192 vCPUs: 5066982.50 (SE +/- 61511.29, N = 4)
1. (CC) gcc options: -O2 -lrt" -lrt

GPAW

Input: Carbon Nanotube

GPAW 23.6 (Seconds, Fewer Is Better)
AmpereOne A192-32X: 41.06 (SE +/- 0.10, N = 3)
Graviton4 192 vCPUs: 28.58 (SE +/- 0.26, N = 15)
1. (CC) gcc options: -shared -fwrapv -O2 -lxc -lblas -lmpi

GraphicsMagick

Operation: Noise-Gaussian

GraphicsMagick 1.3.43 (Iterations Per Minute, More Is Better)
AmpereOne A192-32X: 245 (SE +/- 1.20, N = 3)
Graviton4 192 vCPUs: 602 (SE +/- 8.81, N = 12)
Additional options: -ltiff -ljbig -lsharpyuv -lwebp -lwebpmux -lzstd -llzma
1. (CC) gcc options: -fopenmp -O2 -lfreetype -ljpeg -lXext -lSM -lICE -lX11 -lxml2 -lbz2 -lz -lm -lpthread -lgomp

GraphicsMagick

Operation: Enhanced

GraphicsMagick 1.3.43 (Iterations Per Minute, More Is Better)
AmpereOne A192-32X: 370 (SE +/- 0.33, N = 3)
Graviton4 192 vCPUs: 715 (SE +/- 1.53, N = 3)
Additional options: -ltiff -ljbig -lsharpyuv -lwebp -lwebpmux -lzstd -llzma
1. (CC) gcc options: -fopenmp -O2 -lfreetype -ljpeg -lXext -lSM -lICE -lX11 -lxml2 -lbz2 -lz -lm -lpthread -lgomp

GraphicsMagick

Operation: Sharpen

GraphicsMagick 1.3.43 (Iterations Per Minute, More Is Better)
AmpereOne A192-32X: 469 (SE +/- 4.33, N = 3)
Graviton4 192 vCPUs: 817 (SE +/- 4.36, N = 3)
Additional options: -ltiff -ljbig -lsharpyuv -lwebp -lwebpmux -lzstd -llzma
1. (CC) gcc options: -fopenmp -O2 -lfreetype -ljpeg -lXext -lSM -lICE -lX11 -lxml2 -lbz2 -lz -lm -lpthread -lgomp

GraphicsMagick

Operation: Swirl

GraphicsMagick 1.3.43 (Iterations Per Minute, More Is Better)
AmpereOne A192-32X: 953 (SE +/- 2.52, N = 3)
Graviton4 192 vCPUs: 1575 (SE +/- 6.77, N = 3)
Additional options: -ltiff -ljbig -lsharpyuv -lwebp -lwebpmux -lzstd -llzma
1. (CC) gcc options: -fopenmp -O2 -lfreetype -ljpeg -lXext -lSM -lICE -lX11 -lxml2 -lbz2 -lz -lm -lpthread -lgomp

GROMACS

Implementation: MPI CPU - Input: water_GMX50_bare

GROMACS 2024 (Ns Per Day, More Is Better)
AmpereOne A192-32X: 7.039 (SE +/- 0.007, N = 3)
Graviton4 192 vCPUs: 12.708 (SE +/- 0.002, N = 3)
1. (CXX) g++ options: -O3 -lm

Helsing

Digit Range: 14 digit

Helsing 1.0-beta (Seconds, Fewer Is Better)
AmpereOne A192-32X: 30.47 (SE +/- 0.11, N = 3)
Graviton4 192 vCPUs: 26.48 (SE +/- 0.01, N = 3)
1. (CC) gcc options: -O2 -pthread

High Performance Conjugate Gradient

X Y Z: 144 144 144 - RT: 60

High Performance Conjugate Gradient 3.1 (GFLOP/s, More Is Better)
AmpereOne A192-32X: 33.36 (SE +/- 0.39, N = 3)
Graviton4 192 vCPUs: 114.95 (SE +/- 0.04, N = 3)
1. (CXX) g++ options: -O3 -ffast-math -ftree-vectorize -lmpi_cxx -lmpi

John The Ripper

Test: Blowfish

John The Ripper 2023.03.14 (Real C/S, More Is Better)
AmpereOne A192-32X: 176306 (SE +/- 564.17, N = 3)
Graviton4 192 vCPUs: 139322 (SE +/- 351.56, N = 3)
1. (CC) gcc options: -lssl -lcrypto -fopenmp -lm -lrt -lz -ldl -lcrypt -lbz2

John The Ripper

Test: bcrypt

John The Ripper 2023.03.14 (Real C/S, More Is Better)
AmpereOne A192-32X: 174048 (SE +/- 1839.06, N = 3)
Graviton4 192 vCPUs: 139374 (SE +/- 201.45, N = 3)
1. (CC) gcc options: -lssl -lcrypto -fopenmp -lm -lrt -lz -ldl -lcrypt -lbz2

LAMMPS Molecular Dynamics Simulator

Model: Rhodopsin Protein

LAMMPS Molecular Dynamics Simulator 23Jun2022 (ns/day, More Is Better)
AmpereOne A192-32X: 50.54 (SE +/- 3.06, N = 12)
Graviton4 192 vCPUs: 78.55 (SE +/- 0.56, N = 3)
1. (CXX) g++ options: -O3 -lm -ldl

LAMMPS Molecular Dynamics Simulator

Model: 20k Atoms

LAMMPS Molecular Dynamics Simulator 23Jun2022 (ns/day, More Is Better)
AmpereOne A192-32X: 53.86 (SE +/- 0.04, N = 3)
Graviton4 192 vCPUs: 76.53 (SE +/- 0.04, N = 3)
1. (CXX) g++ options: -O3 -lm -ldl

Liquid-DSP

Threads: 128 - Buffer Length: 256 - Filter Length: 32

Liquid-DSP 1.6 (samples/s, More Is Better)
AmpereOne A192-32X: 2983900000 (SE +/- 100000.00, N = 3)
Graviton4 192 vCPUs: 6268333333 (SE +/- 1545243.60, N = 3)
1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid

LULESH

LULESH 2.0.3 (z/s, More Is Better)
AmpereOne A192-32X: 41890.41 (SE +/- 79.76, N = 3)
Graviton4 192 vCPUs: 114373.35 (SE +/- 74.63, N = 3)
1. (CXX) g++ options: -O3 -fopenmp -lm -lmpi_cxx -lmpi

m-queens

Time To Solve

m-queens 1.2 (Seconds, Fewer Is Better)
AmpereOne A192-32X: 5.357 (SE +/- 0.010, N = 7)
Graviton4 192 vCPUs: 5.740 (SE +/- 0.102, N = 15)
1. (CXX) g++ options: -fopenmp -O2 -march=native

Memcached

Set To Get Ratio: 1:100

Memcached 1.6.19 (Ops/sec, More Is Better)
AmpereOne A192-32X: 3895708.19 (SE +/- 76864.47, N = 12)
Graviton4 192 vCPUs: 4205545.72 (SE +/- 241067.66, N = 13)
1. (CXX) g++ options: -O2 -levent_openssl -levent -lcrypto -lssl -lpthread -lz -lpcre

miniFE

Problem Size: Small

miniFE 2.2 (CG Mflops, More Is Better)
AmpereOne A192-32X: 38349.2 (SE +/- 628.57, N = 15)
Graviton4 192 vCPUs: 53285.0 (SE +/- 1848.80, N = 15)
1. (CXX) g++ options: -O3 -fopenmp -lmpi_cxx -lmpi

NAS Parallel Benchmarks

Test / Class: EP.D

NAS Parallel Benchmarks 3.4 (Total Mop/s, More Is Better)
AmpereOne A192-32X: 7557.72 (SE +/- 27.71, N = 3)
Graviton4 192 vCPUs: 10218.17 (SE +/- 30.70, N = 3)
1. (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

NAS Parallel Benchmarks

Test / Class: SP.C

NAS Parallel Benchmarks 3.4 (Total Mop/s, More Is Better)
AmpereOne A192-32X: 33612.41 (SE +/- 8.41, N = 3)
Graviton4 192 vCPUs: 49186.02 (SE +/- 94.96, N = 3)
1. (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

NAS Parallel Benchmarks

Test / Class: IS.D

NAS Parallel Benchmarks 3.4 (Total Mop/s, More Is Better)
AmpereOne A192-32X: 2417.04 (SE +/- 10.73, N = 3)
Graviton4 192 vCPUs: 4203.95 (SE +/- 8.10, N = 3)
1. (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

Numpy Benchmark

Numpy Benchmark (Score, More Is Better)
AmpereOne A192-32X: 313.43 (SE +/- 0.70, N = 3)
Graviton4 192 vCPUs: 449.08 (SE +/- 0.13, N = 3)

NWChem

Input: C240 Buckyball

NWChem 7.0.2 (Seconds, Fewer Is Better)
AmpereOne A192-32X: 2293.2
Graviton4 192 vCPUs: 1576.3
1. (F9X) gfortran options: -lnwctask -lccsd -lmcscf -lselci -lmp2 -lmoints -lstepper -ldriver -loptim -lnwdft -lgradients -lcphf -lesp -lddscf -ldangchang -lguess -lhessian -lvib -lnwcutil -lrimp2 -lproperty -lsolvation -lnwints -lprepar -lnwmd -lnwpw -lofpw -lpaw -lpspw -lband -lnwpwlib -lcafe -lspace -lanalyze -lqhop -lpfft -ldplot -ldrdy -lvscf -lqmmm -lqmd -letrans -ltce -lbq -lmm -lcons -lperfm -ldntmc -lccca -ldimqm -lga -larmci -lpeigs -l64to32 -lopenblas -lpthread -lrt -llapack -lnwcblas -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz -lcomex -ffast-math -std=legacy -fdefault-integer-8 -finline-functions -O2

OpenFOAM

Input: drivaerFastback, Small Mesh Size - Mesh Time

OpenFOAM 10 (Seconds, Fewer Is Better)
AmpereOne A192-32X: 28.42
Graviton4 192 vCPUs: 17.30
1. (CXX) g++ options: -std=c++14 -O3 -mcpu=native -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -lfiniteVolume -lmeshTools -lparallel -llagrangian -lregionModels -lgenericPatchFields -lOpenFOAM -ldl -lm

OpenFOAM

Input: drivaerFastback, Small Mesh Size - Execution Time

OpenFOAM 10 (Seconds, Fewer Is Better)
AmpereOne A192-32X: 32.36
Graviton4 192 vCPUs: 22.62
1. (CXX) g++ options: -std=c++14 -O3 -mcpu=native -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -lfiniteVolume -lmeshTools -lparallel -llagrangian -lregionModels -lgenericPatchFields -lOpenFOAM -ldl -lm

OpenFOAM

Input: drivaerFastback, Medium Mesh Size - Mesh Time

OpenFOAM 10 (Seconds, Fewer Is Better)
AmpereOne A192-32X: 152.18
Graviton4 192 vCPUs: 83.28
1. (CXX) g++ options: -std=c++14 -O3 -mcpu=native -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -lfiniteVolume -lmeshTools -lparallel -llagrangian -lregionModels -lgenericPatchFields -lOpenFOAM -ldl -lm

OpenFOAM

Input: drivaerFastback, Medium Mesh Size - Execution Time

OpenFOAM 10 (Seconds, Fewer Is Better)
AmpereOne A192-32X: 315.66
Graviton4 192 vCPUs: 110.33
1. (CXX) g++ options: -std=c++14 -O3 -mcpu=native -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -lfiniteVolume -lmeshTools -lparallel -llagrangian -lregionModels -lgenericPatchFields -lOpenFOAM -ldl -lm

Parallel BZIP2 Compression

FreeBSD-13.0-RELEASE-amd64-memstick.img Compression

Parallel BZIP2 Compression 1.1.13 (Seconds, Fewer Is Better)
AmpereOne A192-32X: 1.644485 (SE +/- 0.008273, N = 11)
Graviton4 192 vCPUs: 1.526932 (SE +/- 0.418215, N = 15)
1. (CXX) g++ options: -O2 -pthread -lbz2 -lpthread

Pennant

Test: leblancbig

Pennant 1.0.1 (Hydro Cycle Time - Seconds, Fewer Is Better)
AmpereOne A192-32X: 3.349439 (SE +/- 0.062865, N = 15)
Graviton4 192 vCPUs: 1.314359 (SE +/- 0.001158, N = 3)
1. (CXX) g++ options: -fopenmp -lmpi_cxx -lmpi

Pennant

Test: sedovbig

Pennant 1.0.1 (Hydro Cycle Time - Seconds, Fewer Is Better)
AmpereOne A192-32X: 4.290586 (SE +/- 0.065187, N = 15)
Graviton4 192 vCPUs: 1.742032 (SE +/- 0.012223, N = 3)
1. (CXX) g++ options: -fopenmp -lmpi_cxx -lmpi

PostgreSQL

Scaling Factor: 100 - Clients: 1000 - Mode: Read Only

PostgreSQL 16 (TPS, More Is Better)
AmpereOne A192-32X: 2746730 (SE +/- 20807.81, N = 12)
Graviton4 192 vCPUs: 2396138 (SE +/- 28406.87, N = 4)
1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lm

PostgreSQL

Scaling Factor: 100 - Clients: 1000 - Mode: Read Only - Average Latency

PostgreSQL 16 (ms, Fewer Is Better)
AmpereOne A192-32X: 0.364 (SE +/- 0.003, N = 12)
Graviton4 192 vCPUs: 0.417 (SE +/- 0.005, N = 4)
1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lm

Primesieve

Length: 1e13

Primesieve 12.1 (Seconds, Fewer Is Better)
AmpereOne A192-32X: 14.01 (SE +/- 0.06, N = 4)
Graviton4 192 vCPUs: 14.75 (SE +/- 0.01, N = 3)
1. (CXX) g++ options: -O3

PyBench

Total For Average Test Times

PyBench 2018-02-16 (Milliseconds, Fewer Is Better)
AmpereOne A192-32X: 1246 (SE +/- 0.33, N = 3)
Graviton4 192 vCPUs: 845 (SE +/- 0.00, N = 3)

PyTorch

Device: CPU - Batch Size: 512 - Model: ResNet-50

PyTorch 2.2.1 (batches/sec, More Is Better)
AmpereOne A192-32X: 19.98 (SE +/- 0.20, N = 3; MIN: 12.43 / MAX: 21.16)

QMCPACK

Input: Li2_STO_ae

QMCPACK 3.17.1 (Total Execution Time - Seconds, Fewer Is Better)
AmpereOne A192-32X: 106.91 (SE +/- 0.15, N = 3)
Graviton4 192 vCPUs: 73.96 (SE +/- 0.20, N = 3)
1. (CXX) g++ options: -fopenmp -foffload=disable -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -ffast-math -mcpu=native -O3 -lm -ldl

QuantLib

Configuration: Multi-Threaded

QuantLib 1.32 (MFLOPS, More Is Better)
AmpereOne A192-32X: 300839.9 (SE +/- 97.87, N = 3)
Graviton4 192 vCPUs: 547281.9 (SE +/- 302.91, N = 3)
1. (CXX) g++ options: -O3 -march=native -fPIE -pie

RocksDB

Test: Random Read

RocksDB 9.0 (Op/s, More Is Better)
AmpereOne A192-32X: 719203371 (SE +/- 904748.20, N = 3)
Graviton4 192 vCPUs: 1208755766 (SE +/- 683858.75, N = 3)
1. (CXX) g++ options: -O3 -march=native -pthread -fno-builtin-memcmp -fno-rtti

RocksDB

Test: Read While Writing

RocksDB 9.0 (Op/s, More Is Better)
AmpereOne A192-32X: 9803710 (SE +/- 27064.97, N = 3)
Graviton4 192 vCPUs: 9720737 (SE +/- 332194.39, N = 15)
1. (CXX) g++ options: -O3 -march=native -pthread -fno-builtin-memcmp -fno-rtti

Speedb

Test: Random Read

Speedb 2.7 (Op/s, More Is Better)
AmpereOne A192-32X: 734401635 (SE +/- 1533105.14, N = 3)
Graviton4 192 vCPUs: 1222618271 (SE +/- 420037.99, N = 3)
1. (CXX) g++ options: -O3 -march=native -pthread -fno-builtin-memcmp -fno-rtti

srsRAN Project

Test: PUSCH Processor Benchmark, Throughput Total

srsRAN Project 23.10.1-20240325 (Mbps, More Is Better)
AmpereOne A192-32X: 2824.7 (SE +/- 0.12, N = 3; MIN: 1697.4 / MAX: 2824.9)
Graviton4 192 vCPUs: 3208.5 (SE +/- 0.20, N = 3; MIN: 1852.1 / MAX: 3208.9)
1. (CXX) g++ options: -O3 -fno-trapping-math -fno-math-errno -ldl

srsRAN Project

Test: PUSCH Processor Benchmark, Throughput Thread

srsRAN Project 23.10.1-20240325 (Mbps, More Is Better)
AmpereOne A192-32X: 50.2 (SE +/- 0.00, N = 3; MIN: 31.7)
Graviton4 192 vCPUs: 53.0 (SE +/- 0.00, N = 3; MIN: 35.3)
1. (CXX) g++ options: -O3 -fno-trapping-math -fno-math-errno -ldl

srsRAN Project

Test: PDSCH Processor Benchmark, Throughput Total

srsRAN Project 23.10.1-20240325 (Mbps, More Is Better)
AmpereOne A192-32X: 24257.2 (SE +/- 45.30, N = 3)
Graviton4 192 vCPUs: 29796.9 (SE +/- 334.78, N = 3)
1. (CXX) g++ options: -O3 -fno-trapping-math -fno-math-errno -ldl

srsRAN Project

Test: PDSCH Processor Benchmark, Throughput Thread

srsRAN Project 23.10.1-20240325 (Mbps, More Is Better)
AmpereOne A192-32X: 193.7 (SE +/- 0.24, N = 4)
Graviton4 192 vCPUs: 239.1 (SE +/- 0.06, N = 3)
1. (CXX) g++ options: -O3 -fno-trapping-math -fno-math-errno -ldl

Stockfish

Chess Benchmark

Stockfish 16.1 (Nodes Per Second, More Is Better)
AmpereOne A192-32X: 128602136 (SE +/- 4643374.23, N = 11)
Graviton4 192 vCPUs: 230111588 (SE +/- 4060681.61, N = 15)
1. (CXX) g++ options: -lgcov -lpthread -fno-exceptions -std=c++17 -fno-peel-loops -fno-tracer -pedantic -O3 -funroll-loops -flto -flto-partition=one -flto=jobserver

Timed Gem5 Compilation

Time To Compile

Timed Gem5 Compilation 23.0.1 (Seconds, Fewer Is Better)
AmpereOne A192-32X: 195.57 (SE +/- 0.58, N = 3)
Graviton4 192 vCPUs: 145.17 (SE +/- 0.73, N = 3)

Timed LLVM Compilation

Build System: Ninja

Timed LLVM Compilation 16.0 (Seconds, Fewer Is Better)
AmpereOne A192-32X: 177.10 (SE +/- 0.57, N = 3)
Graviton4 192 vCPUs: 113.28 (SE +/- 0.18, N = 3)

Timed Mesa Compilation

Time To Compile

Timed Mesa Compilation 24.0 (Seconds, Fewer Is Better)
AmpereOne A192-32X: 17.19 (SE +/- 0.07, N = 3)
Graviton4 192 vCPUs: 13.78 (SE +/- 0.12, N = 3)

Timed Node.js Compilation

Time To Compile

Timed Node.js Compilation 21.7.2 (Seconds, Fewer Is Better)
AmpereOne A192-32X: 214.16 (SE +/- 0.34, N = 3)
Graviton4 192 vCPUs: 159.97 (SE +/- 0.49, N = 3)

WRF

Input: conus 2.5km

WRF 4.2.2 (Seconds, Fewer Is Better)
AmpereOne A192-32X: 9102.32
Graviton4 192 vCPUs: 3759.19
1. (F9X) gfortran options: -O2 -ftree-vectorize -funroll-loops -ffree-form -fconvert=big-endian -frecord-marker=4 -fallow-invalid-boz -lesmf_time -lwrfio_nf -lnetcdff -lnetcdf -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

Xcompact3d Incompact3d

Input: input.i3d 193 Cells Per Direction

Xcompact3d Incompact3d 2021-03-11 (Seconds, Fewer Is Better)
AmpereOne A192-32X: 8.95581484 (SE +/- 0.04916545, N = 4)
Graviton4 192 vCPUs: 2.90287603 (SE +/- 0.02410444, N = 9)
1. (F9X) gfortran options: -cpp -O2 -funroll-loops -floop-optimize -fcray-pointer -fbacktrace -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

Xcompact3d Incompact3d

Input: X3D-benchmarking input.i3d

Xcompact3d Incompact3d 2021-03-11 (Seconds, Fewer Is Better)
AmpereOne A192-32X: 290.28 (SE +/- 0.24, N = 3)
Graviton4 192 vCPUs: 91.89 (SE +/- 0.06, N = 3)
1. (F9X) gfortran options: -cpp -O2 -funroll-loops -floop-optimize -fcray-pointer -fbacktrace -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

Xmrig

Variant: GhostRider - Hash Count: 1M

Xmrig 6.21 (H/s, More Is Better)
AmpereOne A192-32X: 17812.6 (SE +/- 274.17, N = 15)
Graviton4 192 vCPUs: 19482.7 (SE +/- 112.11, N = 3)
1. (CXX) g++ options: -fexceptions -fno-rtti -O3 -Ofast -static-libgcc -static-libstdc++ -rdynamic -lssl -lcrypto -luv -lpthread -lrt -ldl -lhwloc


Phoronix Test Suite v10.8.5