12700k HPC+OpenCL AVX512 performance profiling

Intel Core i7-12700K testing with a MSI PRO Z690-A DDR4(MS-7D25) v1.0 (1.15 BIOS) and Gigabyte AMD Radeon RX 5600 OEM/5600 XT / 5700/5700 6GB on Pop 21.04 via the Phoronix Test Suite.

Compare your own system(s) to this result file with the Phoronix Test Suite by running the command: phoronix-test-suite benchmark 2112125-TJ-12700KHPC62
Jump To Table - Results

View

Do Not Show Noisy Results
Do Not Show Results With Incomplete Data
Do Not Show Results With Little Change/Spread
List Notable Results

Limit displaying results to tests within:

Bioinformatics 5 Tests
BLAS (Basic Linear Algebra Sub-Routine) Tests 5 Tests
C++ Boost Tests 4 Tests
C/C++ Compiler Tests 6 Tests
CPU Massive 13 Tests
Creator Workloads 4 Tests
HPC - High Performance Computing 32 Tests
LAPACK (Linear Algebra Pack) Tests 3 Tests
Linear Algebra 3 Tests
Machine Learning 9 Tests
Molecular Dynamics 7 Tests
MPI Benchmarks 7 Tests
Multi-Core 9 Tests
NVIDIA GPU Compute 6 Tests
OpenCL 4 Tests
OpenMPI Tests 15 Tests
Programmer / Developer System Benchmarks 3 Tests
Python Tests 3 Tests
Scientific Computing 17 Tests
Server CPU Tests 5 Tests
Single-Threaded 3 Tests
Speech 2 Tests
Telephony 2 Tests
Common Workstation Benchmarks 2 Tests

Statistics

Show Overall Harmonic Mean(s)
Show Overall Geometric Mean
Show Geometric Means Per-Suite/Category
Show Wins / Losses Counts (Pie Chart)
Normalize Results
Remove Outliers Before Calculating Averages

Graph Settings

Force Line Graphs Where Applicable
Convert To Scalar Where Applicable
Disable Color Branding
Prefer Vertical Bar Graphs

Multi-Way Comparison

Condense Multi-Option Tests Into Single Result Graphs

Table

Show Detailed System Result Table

Run Management

Highlight
Result
Hide
Result
Result
Identifier
Performance Per
Dollar
Date
Run
  Test
  Duration
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt
December 09 2021
  10 Hours, 32 Minutes
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt
December 11 2021
  8 Hours, 28 Minutes
Invert Hiding All Results Option
  9 Hours, 30 Minutes
Only show results matching title/arguments (delimit multiple options with a comma):
Do not show results matching title/arguments (delimit multiple options with a comma):


12700k HPC+OpenCL AVX512 performance profiling - Phoronix Test Suite

12700k HPC+OpenCL AVX512 performance profiling

Intel Core i7-12700K testing with a MSI PRO Z690-A DDR4(MS-7D25) v1.0 (1.15 BIOS) and Gigabyte AMD Radeon RX 5600 OEM/5600 XT / 5700/5700 6GB on Pop 21.04 via the Phoronix Test Suite.

HTML result view exported from: https://openbenchmarking.org/result/2112125-TJ-12700KHPC62&sor&gru.

12700k HPC+OpenCL AVX512 performance profilingProcessorMotherboardChipsetMemoryDiskGraphicsAudioMonitorNetworkOSKernelDesktopDisplay ServerOpenGLOpenCLVulkanCompilerFile-SystemScreen Resolution12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xtIntel Core i7-12700K @ 6.30GHz (8 Cores / 16 Threads)MSI PRO Z690-A DDR4(MS-7D25) v1.0 (1.15 BIOS)Intel Device 7aa732GB500GB Western Digital WDS500G2B0C-00PXH0 + 3 x 10001GB Seagate ST10000DM0004-1Z + 128GB HP SSD S700 ProGigabyte AMD Radeon RX 5600 OEM/5600 XT / 5700/5700 6GB (1650/750MHz)Realtek ALC897LG HDR WQHDIntel I225-VPop 21.045.15.5-76051505-generic (x86_64)GNOME Shell 3.38.4X Server 1.20.114.6 Mesa 21.2.2 (LLVM 12.0.0)OpenCL 2.2 AMD-APP (3361.0)1.2.185GCC 11.1.0ext43440x1440500GB Western Digital WDS500G2B0C-00PXH0 + 3 x 10001GB Seagate ST10000DM0004-1Z + 300GB Western Digital WD3000GLFS-0 + 128GB HP SSD S700 ProOpenBenchmarking.orgKernel Details- Transparent Huge Pages: madviseEnvironment Details- 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: CXXFLAGS="-O3 -march=sapphirerapids -mno-amx-tile -mno-amx-int8 -mno-amx-bf16" CFLAGS="-O3 -march=sapphirerapids -mno-amx-tile -mno-amx-int8 -mno-amx-bf16"- 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: CXXFLAGS="-O3 -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect" CFLAGS="-O3 -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect" FFLAGS="-O3 -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect" Compiler Details- --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-11-RPS7jb/gcc-11-11.1.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-RPS7jb/gcc-11-11.1.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v Disk Details- NONE / errors=remount-ro,noatime,rw / Block Size: 4096Processor Details- Scaling Governor: intel_pstate powersave - CPU Microcode: 0x15 - Thermald 2.4.3Graphics Details- GLAMOR - BAR1 / Visible vRAM Size: 6128 MBPython Details- Python 2.7.18 + Python 3.9.5Security Details- itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced IBRS IBPB: conditional RSB filling + srbds: Not affected + tsx_async_abort: Not affected

12700k HPC+OpenCL AVX512 performance profilingintel-mpi: IMB-MPI1 Exchangeintel-mpi: IMB-MPI1 PingPongintel-mpi: IMB-MPI1 Sendrecvintel-mpi: IMB-P2P PingPongminife: Smallamg: shoc: OpenCL - Triadshoc: OpenCL - Reductionshoc: OpenCL - Bus Speed Downloadshoc: OpenCL - Bus Speed Readbackshoc: OpenCL - Texture Read Bandwidthcl-mem: Copycl-mem: Readcl-mem: Writemt-dgemm: Sustained Floating-Point Rateshoc: OpenCL - S3Dshoc: OpenCL - FFT SPshoc: OpenCL - GEMM SGEMM_Nshoc: OpenCL - Max SP Flopshpl: arrayfire: BLAS CPUshoc: OpenCL - MD5 Hashaskap: Hogbom Clean OpenMPfftw: Stock - 1D FFT Size 32fftw: Stock - 2D FFT Size 32fftw: Stock - 1D FFT Size 4096fftw: Stock - 2D FFT Size 4096fftw: Float + SSE - 1D FFT Size 32fftw: Float + SSE - 2D FFT Size 32fftw: Float + SSE - 1D FFT Size 4096fftw: Float + SSE - 2D FFT Size 4096himeno: Poisson Pressure Solveraskap: tConvolve MT - Griddingaskap: tConvolve MT - Degriddingaskap: tConvolve OpenMP - Griddingaskap: tConvolve OpenMP - Degriddingaskap: tConvolve MPI - Degriddingaskap: tConvolve MPI - Griddinglczero: BLASgromacs: MPI CPU - water_GMX50_barenumpy: daphne: OpenMP - NDT Mappingdaphne: OpenMP - Points2Imagedaphne: OpenMP - Euclidean Clusterlulesh: intel-mpi: IMB-MPI1 Exchangeintel-mpi: IMB-MPI1 Sendrecvnamd: ATPase Simulation - 327,506 Atomspennant: sedovbigpennant: leblancbigtensorflow-lite: SqueezeNettensorflow-lite: Inception V4tensorflow-lite: NASNet Mobiletensorflow-lite: Mobilenet Floattensorflow-lite: Mobilenet Quanttensorflow-lite: Inception ResNet V2caffe: AlexNet - CPU - 100caffe: AlexNet - CPU - 200caffe: AlexNet - CPU - 1000caffe: GoogleNet - CPU - 100caffe: GoogleNet - CPU - 200caffe: GoogleNet - CPU - 1000onednn: IP Shapes 1D - f32 - CPUonednn: IP Shapes 3D - f32 - CPUonednn: IP Shapes 1D - u8s8f32 - CPUonednn: IP Shapes 3D - u8s8f32 - CPUonednn: IP Shapes 1D - bf16bf16bf16 - CPUonednn: IP Shapes 3D - bf16bf16bf16 - CPUonednn: Convolution Batch Shapes Auto - f32 - CPUonednn: Deconvolution Batch shapes_1d - f32 - CPUonednn: Deconvolution Batch shapes_3d - f32 - CPUonednn: Convolution Batch Shapes Auto - u8s8f32 - CPUonednn: Deconvolution Batch shapes_1d - u8s8f32 - CPUonednn: Deconvolution Batch shapes_3d - u8s8f32 - CPUonednn: Recurrent Neural Network Training - f32 - CPUonednn: Recurrent Neural Network Inference - f32 - CPUonednn: Recurrent Neural Network Training - u8s8f32 - CPUonednn: Convolution Batch Shapes Auto - bf16bf16bf16 - CPUonednn: Deconvolution Batch shapes_1d - bf16bf16bf16 - CPUonednn: Deconvolution Batch shapes_3d - bf16bf16bf16 - CPUonednn: Recurrent Neural Network Inference - u8s8f32 - CPUonednn: Matrix Multiply Batch Shapes Transformer - f32 - CPUonednn: Recurrent Neural Network Training - bf16bf16bf16 - CPUonednn: Recurrent Neural Network Inference - bf16bf16bf16 - CPUonednn: Matrix Multiply Batch Shapes Transformer - u8s8f32 - CPUonednn: Matrix Multiply Batch Shapes Transformer - bf16bf16bf16 - CPUparboil: OpenMP LBMparboil: OpenMP CUTCPparboil: OpenMP Stencilparboil: OpenMP MRI Griddingcp2k: Fayalite-FISTmrbayes: Primate Phylogeny Analysishmmer: Pfam Database Searchmafft: Multiple Sequence Alignment - LSU RNAopenfoam: Motorbike 30Mopenfoam: Motorbike 60Mrelion: Basic - CPUdeepspeech: CPUrbenchmark: rnnoise: darktable: Boat - OpenCLdarktable: Masskrug - OpenCLdarktable: Server Rack - OpenCLdarktable: Server Room - OpenCLoctave-benchmark: qmcpack: simple-H2O12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt15189.7610004.4612273.5486289826411.4330397510012.5997254.12620.148720.3871349.340198.4263.6255.34.875598125.078680.8831841.75837663797.5541205.099.3041267.389227742294818293137703218080496103960439009471.7383691245.642054.711866.303614.484859.185046.079061.180618.631033.4435022.1454310971671.516872.8297109.0153.501.1624969.8943449.93021145830208011012485997253.998345.318787302312246624247624796741572727265412.586378.782680.6851321.933472.445453.5683713.39576.277874.1316913.28360.8726831.068562540.281388.342596.326.773347.432044.677891334.312.336132585.151337.470.5656991.18583114.0688173.17714115.01145749.150711398.34573.79482.4827.703137.71867.201684.70248.851450.104416.5093.9713.8270.1333.0125.08017.95015023.9810294.2512471.8985134966407.1230261763312.3131254.40719.978221.0509349.012195.2261.2248.45.016124125.070682.0171859.409031599100.5931207.879.3179269.303227772282718502140203156082515104417429359554.0568011248.932054.381906.523599.354889.745046.079591.186612.181033.7436114.8083322581667.436878.6229108.2152.661.1787167.7735847.38740145241208282312494197559.598287.018816632365947045238435678521361876801772.578858.894500.6908731.925392.464743.5278813.41156.602454.2366813.40800.8785881.076962532.451341.612560.486.349106.724554.672931328.422.332062578.541339.830.5505881.18705114.0966623.18310914.98037548.567082372.74474.33982.6107.523135.71864.001656.71348.960520.105017.1194.1513.8180.1312.9995.06418.126OpenBenchmarking.org

Intel MPI Benchmarks

Test: IMB-MPI1 Exchange

OpenBenchmarking.orgAverage Mbytes/sec, More Is BetterIntel MPI Benchmarks 2019.3Test: IMB-MPI1 Exchange12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt3K6K9K12K15KSE +/- 185.82, N = 15SE +/- 218.78, N = 1515189.7615023.98-mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MAX: 65915.24-march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MAX: 64515.641. (CXX) g++ options: -O3 -O0 -pedantic -fopenmp -pthread -lmpi_cxx -lmpi

Intel MPI Benchmarks

Test: IMB-MPI1 PingPong

OpenBenchmarking.orgAverage Mbytes/sec, More Is BetterIntel MPI Benchmarks 2019.3Test: IMB-MPI1 PingPong12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt2K4K6K8K10KSE +/- 131.85, N = 3SE +/- 140.45, N = 1510294.2510004.46-march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 10.93 / MAX: 34708.09-mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 6.66 / MAX: 34960.721. (CXX) g++ options: -O3 -O0 -pedantic -fopenmp -pthread -lmpi_cxx -lmpi

Intel MPI Benchmarks

Test: IMB-MPI1 Sendrecv

OpenBenchmarking.orgAverage Mbytes/sec, More Is BetterIntel MPI Benchmarks 2019.3Test: IMB-MPI1 Sendrecv12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt3K6K9K12K15KSE +/- 81.05, N = 3SE +/- 111.25, N = 312471.8912273.54-march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MAX: 66000.84-mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MAX: 66577.11. (CXX) g++ options: -O3 -O0 -pedantic -fopenmp -pthread -lmpi_cxx -lmpi

Intel MPI Benchmarks

Test: IMB-P2P PingPong

OpenBenchmarking.orgAverage Msg/sec, More Is BetterIntel MPI Benchmarks 2019.3Test: IMB-P2P PingPong12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt2M4M6M8M10MSE +/- 23849.34, N = 3SE +/- 102028.44, N = 386289828513496-mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 1994 / MAX: 22289308-march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 1946 / MAX: 220823601. (CXX) g++ options: -O3 -O0 -pedantic -fopenmp -pthread -lmpi_cxx -lmpi

miniFE

Problem Size: Small

OpenBenchmarking.orgCG Mflops, More Is BetterminiFE 2.2Problem Size: Small12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt14002800420056007000SE +/- 0.36, N = 3SE +/- 1.22, N = 36411.436407.121. (CXX) g++ options: -O3 -fopenmp -pthread -lmpi_cxx -lmpi

Algebraic Multi-Grid Benchmark

OpenBenchmarking.orgFigure Of Merit, More Is BetterAlgebraic Multi-Grid Benchmark 1.212700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt70M140M210M280M350MSE +/- 51637.29, N = 3SE +/- 414229.41, N = 33039751003026176331. (CC) gcc options: -lparcsr_ls -lparcsr_mv -lseq_mv -lIJ_mv -lkrylov -lHYPRE_utilities -lm -fopenmp -pthread -lmpi

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Triad

OpenBenchmarking.orgGB/s, More Is BetterSHOC Scalable HeterOgeneous Computing 2020-04-17Target: OpenCL - Benchmark: Triad12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt3691215SE +/- 0.14, N = 3SE +/- 0.13, N = 612.6012.31-mno-amx-tile -mno-amx-int8 -mno-amx-bf16-march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect1. (CXX) g++ options: -O3 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Reduction

OpenBenchmarking.orgGB/s, More Is BetterSHOC Scalable HeterOgeneous Computing 2020-04-17Target: OpenCL - Benchmark: Reduction12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt60120180240300SE +/- 0.16, N = 3SE +/- 0.22, N = 3254.41254.13-march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect-mno-amx-tile -mno-amx-int8 -mno-amx-bf161. (CXX) g++ options: -O3 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Bus Speed Download

OpenBenchmarking.orgGB/s, More Is BetterSHOC Scalable HeterOgeneous Computing 2020-04-17Target: OpenCL - Benchmark: Bus Speed Download12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt510152025SE +/- 0.24, N = 3SE +/- 0.15, N = 320.1519.98-mno-amx-tile -mno-amx-int8 -mno-amx-bf16-march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect1. (CXX) g++ options: -O3 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Bus Speed Readback

OpenBenchmarking.orgGB/s, More Is BetterSHOC Scalable HeterOgeneous Computing 2020-04-17Target: OpenCL - Benchmark: Bus Speed Readback12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt510152025SE +/- 0.21, N = 15SE +/- 0.01, N = 321.0520.39-march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect-mno-amx-tile -mno-amx-int8 -mno-amx-bf161. (CXX) g++ options: -O3 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Texture Read Bandwidth

OpenBenchmarking.orgGB/s, More Is BetterSHOC Scalable HeterOgeneous Computing 2020-04-17Target: OpenCL - Benchmark: Texture Read Bandwidth12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt80160240320400SE +/- 1.54, N = 3SE +/- 1.09, N = 3349.34349.01-mno-amx-tile -mno-amx-int8 -mno-amx-bf16-march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect1. (CXX) g++ options: -O3 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

cl-mem

Benchmark: Copy

OpenBenchmarking.orgGB/s, More Is Bettercl-mem 2017-01-13Benchmark: Copy12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt4080120160200SE +/- 0.12, N = 3SE +/- 0.12, N = 3198.4195.21. (CC) gcc options: -O2 -flto -lOpenCL

cl-mem

Benchmark: Read

OpenBenchmarking.orgGB/s, More Is Bettercl-mem 2017-01-13Benchmark: Read12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt60120180240300SE +/- 0.06, N = 3SE +/- 0.07, N = 3263.6261.21. (CC) gcc options: -O2 -flto -lOpenCL

cl-mem

Benchmark: Write

OpenBenchmarking.orgGB/s, More Is Bettercl-mem 2017-01-13Benchmark: Write12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt60120180240300SE +/- 0.17, N = 3SE +/- 0.15, N = 3255.3248.41. (CC) gcc options: -O2 -flto -lOpenCL

ACES DGEMM

Sustained Floating-Point Rate

OpenBenchmarking.orgGFLOP/s, More Is BetterACES DGEMM 1.0Sustained Floating-Point Rate12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt1.12862.25723.38584.51445.643SE +/- 0.013832, N = 3SE +/- 0.034356, N = 35.0161244.875598-mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect-mno-amx-tile -mno-amx-int8 -mno-amx-bf161. (CC) gcc options: -O3 -march=native -fopenmp

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: S3D

OpenBenchmarking.orgGFLOPS, More Is BetterSHOC Scalable HeterOgeneous Computing 2020-04-17Target: OpenCL - Benchmark: S3D12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt306090120150SE +/- 0.71, N = 3SE +/- 0.56, N = 3125.08125.07-mno-amx-tile -mno-amx-int8 -mno-amx-bf16-march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect1. (CXX) g++ options: -O3 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: FFT SP

OpenBenchmarking.orgGFLOPS, More Is BetterSHOC Scalable HeterOgeneous Computing 2020-04-17Target: OpenCL - Benchmark: FFT SP12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt150300450600750SE +/- 0.08, N = 3SE +/- 0.96, N = 3682.02680.88-march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect-mno-amx-tile -mno-amx-int8 -mno-amx-bf161. (CXX) g++ options: -O3 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: GEMM SGEMM_N

OpenBenchmarking.orgGFLOPS, More Is BetterSHOC Scalable HeterOgeneous Computing 2020-04-17Target: OpenCL - Benchmark: GEMM SGEMM_N12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt400800120016002000SE +/- 17.38, N = 3SE +/- 8.67, N = 31859.401841.75-march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect-mno-amx-tile -mno-amx-int8 -mno-amx-bf161. (CXX) g++ options: -O3 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Max SP Flops

OpenBenchmarking.orgGFLOPS, More Is BetterSHOC Scalable HeterOgeneous Computing 2020-04-17Target: OpenCL - Benchmark: Max SP Flops12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt2M4M6M8M10MSE +/- 142226.18, N = 9SE +/- 65495.62, N = 390315998376637-march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect-mno-amx-tile -mno-amx-int8 -mno-amx-bf161. (CXX) g++ options: -O3 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

HPL Linpack

OpenBenchmarking.orgGFLOPS, More Is BetterHPL Linpack 2.312700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt20406080100SE +/- 1.07, N = 3SE +/- 0.14, N = 3100.5997.55-march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect-mno-amx-tile -mno-amx-int8 -mno-amx-bf161. (CC) gcc options: -O3 -lopenblas -lm -pthread -lmpi

ArrayFire

Test: BLAS CPU

OpenBenchmarking.orgGFLOPS, More Is BetterArrayFire 3.7Test: BLAS CPU12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt30060090012001500SE +/- 0.54, N = 3SE +/- 0.76, N = 31207.871205.09-march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect-mno-amx-tile -mno-amx-int8 -mno-amx-bf161. (CXX) g++ options: -O3 -rdynamic

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: MD5 Hash

OpenBenchmarking.orgGHash/s, More Is BetterSHOC Scalable HeterOgeneous Computing 2020-04-17Target: OpenCL - Benchmark: MD5 Hash12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt3691215SE +/- 0.0002, N = 3SE +/- 0.0009, N = 39.31799.3041-march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect-mno-amx-tile -mno-amx-int8 -mno-amx-bf161. (CXX) g++ options: -O3 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

ASKAP

Test: Hogbom Clean OpenMP

OpenBenchmarking.orgIterations Per Second, More Is BetterASKAP 1.0Test: Hogbom Clean OpenMP12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt60120180240300SE +/- 0.64, N = 3SE +/- 1.10, N = 3269.30267.391. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp

FFTW

Build: Stock - Size: 1D FFT Size 32

OpenBenchmarking.orgMflops, More Is BetterFFTW 3.3.6Build: Stock - Size: 1D FFT Size 3212700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt5K10K15K20K25KSE +/- 4.36, N = 3SE +/- 0.67, N = 32277722774-march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect-mno-amx-tile -mno-amx-int8 -mno-amx-bf161. (CC) gcc options: -pthread -O3 -lm

FFTW

Build: Stock - Size: 2D FFT Size 32

OpenBenchmarking.orgMflops, More Is BetterFFTW 3.3.6Build: Stock - Size: 2D FFT Size 3212700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt5K10K15K20K25KSE +/- 297.88, N = 3SE +/- 135.68, N = 32294822827-mno-amx-tile -mno-amx-int8 -mno-amx-bf16-march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect1. (CC) gcc options: -pthread -O3 -lm

FFTW

Build: Stock - Size: 1D FFT Size 4096

OpenBenchmarking.orgMflops, More Is BetterFFTW 3.3.6Build: Stock - Size: 1D FFT Size 409612700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt4K8K12K16K20KSE +/- 161.26, N = 3SE +/- 196.08, N = 31850218293-march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect-mno-amx-tile -mno-amx-int8 -mno-amx-bf161. (CC) gcc options: -pthread -O3 -lm

FFTW

Build: Stock - Size: 2D FFT Size 4096

OpenBenchmarking.orgMflops, More Is BetterFFTW 3.3.6Build: Stock - Size: 2D FFT Size 409612700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt3K6K9K12K15KSE +/- 33.65, N = 3SE +/- 62.76, N = 31402013770-march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect-mno-amx-tile -mno-amx-int8 -mno-amx-bf161. (CC) gcc options: -pthread -O3 -lm

FFTW

Build: Float + SSE - Size: 1D FFT Size 32

OpenBenchmarking.orgMflops, More Is BetterFFTW 3.3.6Build: Float + SSE - Size: 1D FFT Size 3212700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt7K14K21K28K35KSE +/- 18.34, N = 3SE +/- 195.07, N = 33218031560-mno-amx-tile -mno-amx-int8 -mno-amx-bf16-march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect1. (CC) gcc options: -pthread -O3 -lm

FFTW

Build: Float + SSE - Size: 2D FFT Size 32

OpenBenchmarking.orgMflops, More Is BetterFFTW 3.3.6Build: Float + SSE - Size: 2D FFT Size 3212700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt20K40K60K80K100KSE +/- 920.77, N = 3SE +/- 272.26, N = 38251580496-march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect-mno-amx-tile -mno-amx-int8 -mno-amx-bf161. (CC) gcc options: -pthread -O3 -lm

FFTW

Build: Float + SSE - Size: 1D FFT Size 4096

OpenBenchmarking.orgMflops, More Is BetterFFTW 3.3.6Build: Float + SSE - Size: 1D FFT Size 409612700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt20K40K60K80K100KSE +/- 1211.30, N = 3SE +/- 1023.44, N = 3104417103960-march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect-mno-amx-tile -mno-amx-int8 -mno-amx-bf161. (CC) gcc options: -pthread -O3 -lm

FFTW

Build: Float + SSE - Size: 2D FFT Size 4096

OpenBenchmarking.orgMflops, More Is BetterFFTW 3.3.6Build: Float + SSE - Size: 2D FFT Size 409612700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt9K18K27K36K45KSE +/- 484.91, N = 5SE +/- 90.35, N = 34390042935-mno-amx-tile -mno-amx-int8 -mno-amx-bf16-march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect1. (CC) gcc options: -pthread -O3 -lm

Himeno Benchmark

Poisson Pressure Solver

OpenBenchmarking.orgMFLOPS, More Is BetterHimeno Benchmark 3.0Poisson Pressure Solver12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt2K4K6K8K10KSE +/- 6.17, N = 3SE +/- 123.70, N = 39554.069471.74-march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect-mno-amx-tile -mno-amx-int8 -mno-amx-bf161. (CC) gcc options: -O3 -mavx2

ASKAP

Test: tConvolve MT - Gridding

OpenBenchmarking.orgMillion Grid Points Per Second, More Is BetterASKAP 1.0Test: tConvolve MT - Gridding12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt30060090012001500SE +/- 0.92, N = 3SE +/- 0.56, N = 31248.931245.641. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp

ASKAP

Test: tConvolve MT - Degridding

OpenBenchmarking.orgMillion Grid Points Per Second, More Is BetterASKAP 1.0Test: tConvolve MT - Degridding12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt400800120016002000SE +/- 1.84, N = 3SE +/- 1.75, N = 32054.712054.381. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp

ASKAP

Test: tConvolve OpenMP - Gridding

OpenBenchmarking.orgMillion Grid Points Per Second, More Is BetterASKAP 1.0Test: tConvolve OpenMP - Gridding12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt400800120016002000SE +/- 12.08, N = 3SE +/- 4.37, N = 31906.521866.301. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp

ASKAP

Test: tConvolve OpenMP - Degridding

OpenBenchmarking.orgMillion Grid Points Per Second, More Is BetterASKAP 1.0Test: tConvolve OpenMP - Degridding12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt8001600240032004000SE +/- 16.43, N = 3SE +/- 47.99, N = 33614.483599.351. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp

ASKAP

Test: tConvolve MPI - Degridding

OpenBenchmarking.orgMpix/sec, More Is BetterASKAP 1.0Test: tConvolve MPI - Degridding12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt10002000300040005000SE +/- 30.56, N = 3SE +/- 0.00, N = 34889.744859.181. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp

ASKAP

Test: tConvolve MPI - Gridding

OpenBenchmarking.orgMpix/sec, More Is BetterASKAP 1.0Test: tConvolve MPI - Gridding12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt11002200330044005500SE +/- 0.00, N = 3SE +/- 0.00, N = 35046.075046.071. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp

LeelaChessZero

Backend: BLAS

OpenBenchmarking.orgNodes Per Second, More Is BetterLeelaChessZero 0.28Backend: BLAS12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt2004006008001000SE +/- 11.95, N = 4SE +/- 12.84, N = 9959906-march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect-mno-amx-tile -mno-amx-int8 -mno-amx-bf161. (CXX) g++ options: -flto -O3 -pthread

GROMACS

Implementation: MPI CPU - Input: water_GMX50_bare

OpenBenchmarking.orgNs Per Day, More Is BetterGROMACS 2021.2Implementation: MPI CPU - Input: water_GMX50_bare12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt0.26690.53380.80071.06761.3345SE +/- 0.005, N = 3SE +/- 0.001, N = 31.1861.180-march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect-mno-amx-tile -mno-amx-int8 -mno-amx-bf161. (CXX) g++ options: -O3 -pthread

Numpy Benchmark

OpenBenchmarking.orgScore, More Is BetterNumpy Benchmark12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt130260390520650SE +/- 3.76, N = 3SE +/- 4.12, N = 3618.63612.18

Darmstadt Automotive Parallel Heterogeneous Suite

Backend: OpenMP - Kernel: NDT Mapping

OpenBenchmarking.orgTest Cases Per Minute, More Is BetterDarmstadt Automotive Parallel Heterogeneous SuiteBackend: OpenMP - Kernel: NDT Mapping12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt2004006008001000SE +/- 6.53, N = 3SE +/- 11.85, N = 31033.741033.441. (CXX) g++ options: -O3 -std=c++11 -fopenmp

Darmstadt Automotive Parallel Heterogeneous Suite

Backend: OpenMP - Kernel: Points2Image

OpenBenchmarking.orgTest Cases Per Minute, More Is BetterDarmstadt Automotive Parallel Heterogeneous SuiteBackend: OpenMP - Kernel: Points2Image12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt8K16K24K32K40KSE +/- 233.74, N = 3SE +/- 389.77, N = 336114.8135022.151. (CXX) g++ options: -O3 -std=c++11 -fopenmp

Darmstadt Automotive Parallel Heterogeneous Suite

Backend: OpenMP - Kernel: Euclidean Cluster

OpenBenchmarking.orgTest Cases Per Minute, More Is BetterDarmstadt Automotive Parallel Heterogeneous SuiteBackend: OpenMP - Kernel: Euclidean Cluster12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt400800120016002000SE +/- 0.74, N = 3SE +/- 15.22, N = 31671.511667.431. (CXX) g++ options: -O3 -std=c++11 -fopenmp

LULESH

OpenBenchmarking.orgz/s, More Is BetterLULESH 2.0.312700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt15003000450060007500SE +/- 67.35, N = 3SE +/- 82.30, N = 46878.626872.831. (CXX) g++ options: -O3 -fopenmp -lm -pthread -lmpi_cxx -lmpi

Intel MPI Benchmarks

Test: IMB-MPI1 Exchange

OpenBenchmarking.orgAverage usec, Fewer Is BetterIntel MPI Benchmarks 2019.3Test: IMB-MPI1 Exchange12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt20406080100SE +/- 0.88, N = 15SE +/- 0.91, N = 15108.21109.01-march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 0.28 / MAX: 3672.41-mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 0.28 / MAX: 3601.441. (CXX) g++ options: -O3 -O0 -pedantic -fopenmp -pthread -lmpi_cxx -lmpi

Intel MPI Benchmarks

Test: IMB-MPI1 Sendrecv

OpenBenchmarking.orgAverage usec, Fewer Is BetterIntel MPI Benchmarks 2019.3Test: IMB-MPI1 Sendrecv12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt1224364860SE +/- 0.28, N = 3SE +/- 0.35, N = 352.6653.50-march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 0.19 / MAX: 1702.76-mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 0.19 / MAX: 1786.281. (CXX) g++ options: -O3 -O0 -pedantic -fopenmp -pthread -lmpi_cxx -lmpi

NAMD

ATPase Simulation - 327,506 Atoms

OpenBenchmarking.orgdays/ns, Fewer Is BetterNAMD 2.14ATPase Simulation - 327,506 Atoms12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt0.26520.53040.79561.06081.326SE +/- 0.00085, N = 3SE +/- 0.00713, N = 31.162491.17871

Pennant

Test: sedovbig

OpenBenchmarking.orgHydro Cycle Time - Seconds, Fewer Is BetterPennant 1.0.1Test: sedovbig12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt1632486480SE +/- 0.13, N = 3SE +/- 0.35, N = 367.7769.891. (CXX) g++ options: -fopenmp -pthread -lmpi_cxx -lmpi

Pennant

Test: leblancbig

OpenBenchmarking.orgHydro Cycle Time - Seconds, Fewer Is BetterPennant 1.0.1Test: leblancbig12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt1122334455SE +/- 0.03, N = 3SE +/- 0.05, N = 347.3949.931. (CXX) g++ options: -fopenmp -pthread -lmpi_cxx -lmpi

TensorFlow Lite

Model: SqueezeNet

OpenBenchmarking.orgMicroseconds, Fewer Is BetterTensorFlow Lite 2020-08-23Model: SqueezeNet12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt30K60K90K120K150KSE +/- 572.70, N = 3SE +/- 601.51, N = 3145241145830

TensorFlow Lite

Model: Inception V4

OpenBenchmarking.orgMicroseconds, Fewer Is BetterTensorFlow Lite 2020-08-23Model: Inception V412700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt400K800K1200K1600K2000KSE +/- 1250.33, N = 3SE +/- 4943.42, N = 320801102082823

TensorFlow Lite

Model: NASNet Mobile

OpenBenchmarking.orgMicroseconds, Fewer Is BetterTensorFlow Lite 2020-08-23Model: NASNet Mobile12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt30K60K90K120K150KSE +/- 671.08, N = 3SE +/- 607.87, N = 3124859124941

TensorFlow Lite

Model: Mobilenet Float

OpenBenchmarking.orgMicroseconds, Fewer Is BetterTensorFlow Lite 2020-08-23Model: Mobilenet Float12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt20K40K60K80K100KSE +/- 155.46, N = 3SE +/- 138.23, N = 397253.997559.5

TensorFlow Lite

Model: Mobilenet Quant

OpenBenchmarking.orgMicroseconds, Fewer Is BetterTensorFlow Lite 2020-08-23Model: Mobilenet Quant12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt20K40K60K80K100KSE +/- 93.88, N = 3SE +/- 62.98, N = 398287.098345.3

TensorFlow Lite

Model: Inception ResNet V2

OpenBenchmarking.orgMicroseconds, Fewer Is BetterTensorFlow Lite 2020-08-23Model: Inception ResNet V212700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt400K800K1200K1600K2000KSE +/- 4029.02, N = 3SE +/- 2515.38, N = 318787301881663

Caffe

Model: AlexNet - Acceleration: CPU - Iterations: 100

OpenBenchmarking.orgMilli-Seconds, Fewer Is BetterCaffe 2020-02-13Model: AlexNet - Acceleration: CPU - Iterations: 10012700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt5K10K15K20K25KSE +/- 319.02, N = 3SE +/- 282.19, N = 32312223659-mno-amx-tile -mno-amx-int8 -mno-amx-bf16-march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect1. (CXX) g++ options: -O3 -fPIC -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas

Caffe

Model: AlexNet - Acceleration: CPU - Iterations: 200

OpenBenchmarking.orgMilli-Seconds, Fewer Is BetterCaffe 2020-02-13Model: AlexNet - Acceleration: CPU - Iterations: 20012700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt10K20K30K40K50KSE +/- 442.71, N = 3SE +/- 218.90, N = 34662447045-mno-amx-tile -mno-amx-int8 -mno-amx-bf16-march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect1. (CXX) g++ options: -O3 -fPIC -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas

Caffe

Model: AlexNet - Acceleration: CPU - Iterations: 1000

OpenBenchmarking.orgMilli-Seconds, Fewer Is BetterCaffe 2020-02-13Model: AlexNet - Acceleration: CPU - Iterations: 100012700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt50K100K150K200K250KSE +/- 1961.39, N = 3SE +/- 3023.73, N = 9238435247624-march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect-mno-amx-tile -mno-amx-int8 -mno-amx-bf161. (CXX) g++ options: -O3 -fPIC -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas

Caffe

Model: GoogleNet - Acceleration: CPU - Iterations: 100

OpenBenchmarking.orgMilli-Seconds, Fewer Is BetterCaffe 2020-02-13Model: GoogleNet - Acceleration: CPU - Iterations: 10012700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt20K40K60K80K100KSE +/- 292.08, N = 3SE +/- 640.77, N = 156785279674-march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect-mno-amx-tile -mno-amx-int8 -mno-amx-bf161. (CXX) g++ options: -O3 -fPIC -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas

Caffe

Model: GoogleNet - Acceleration: CPU - Iterations: 200

OpenBenchmarking.orgMilli-Seconds, Fewer Is BetterCaffe 2020-02-13Model: GoogleNet - Acceleration: CPU - Iterations: 20012700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt30K60K90K120K150KSE +/- 778.54, N = 3SE +/- 1277.18, N = 3136187157272-march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect-mno-amx-tile -mno-amx-int8 -mno-amx-bf161. (CXX) g++ options: -O3 -fPIC -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas

Caffe

Model: GoogleNet - Acceleration: CPU - Iterations: 1000

OpenBenchmarking.orgMilli-Seconds, Fewer Is BetterCaffe 2020-02-13Model: GoogleNet - Acceleration: CPU - Iterations: 100012700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt160K320K480K640K800KSE +/- 2693.85, N = 3SE +/- 11125.73, N = 9680177726541-march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect-mno-amx-tile -mno-amx-int8 -mno-amx-bf161. (CXX) g++ options: -O3 -fPIC -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas

oneDNN

Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU

OpenBenchmarking.orgms, Fewer Is BetteroneDNN 2.1.2Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt0.58191.16381.74572.32762.9095SE +/- 0.00417, N = 3SE +/- 0.00632, N = 32.578852.58637-mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 2.34-mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 2.331. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN

Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU

OpenBenchmarking.orgms, Fewer Is BetteroneDNN 2.1.2Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt246810SE +/- 0.01096, N = 3SE +/- 0.13446, N = 148.782688.89450-mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 8.63-mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 8.611. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN

Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPU

OpenBenchmarking.orgms, Fewer Is BetteroneDNN 2.1.2Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPU12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt0.15540.31080.46620.62160.777SE +/- 0.006714, N = 3SE +/- 0.006051, N = 30.6851320.690873-mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 0.6-mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 0.61. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN

Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU

OpenBenchmarking.orgms, Fewer Is BetteroneDNN 2.1.2Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt0.4350.871.3051.742.175SE +/- 0.00843, N = 3SE +/- 0.00686, N = 31.925391.93347-mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 1.86-mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 1.851. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN

Harness: IP Shapes 1D - Data Type: bf16bf16bf16 - Engine: CPU

OpenBenchmarking.orgms, Fewer Is BetteroneDNN 2.1.2Harness: IP Shapes 1D - Data Type: bf16bf16bf16 - Engine: CPU12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt0.55461.10921.66382.21842.773SE +/- 0.02457, N = 3SE +/- 0.02589, N = 52.445452.46474-mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 2.14-mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 2.191. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN

Harness: IP Shapes 3D - Data Type: bf16bf16bf16 - Engine: CPU

OpenBenchmarking.orgms, Fewer Is BetteroneDNN 2.1.2Harness: IP Shapes 3D - Data Type: bf16bf16bf16 - Engine: CPU12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt0.80291.60582.40873.21164.0145SE +/- 0.02720, N = 10SE +/- 0.02891, N = 33.527883.56837-mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 3.15-mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 3.161. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN

Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU

OpenBenchmarking.orgms, Fewer Is BetteroneDNN 2.1.2Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt3691215SE +/- 0.01, N = 3SE +/- 0.00, N = 313.4013.41-mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 13.16-mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 13.191. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN

Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU

OpenBenchmarking.orgms, Fewer Is BetteroneDNN 2.1.2Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt246810SE +/- 0.12013, N = 15SE +/- 0.13247, N = 126.277876.60245-mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 3.58-mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 3.531. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN

Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU

OpenBenchmarking.orgms, Fewer Is BetteroneDNN 2.1.2Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt0.95331.90662.85993.81324.7665SE +/- 0.01117, N = 3SE +/- 0.08547, N = 154.131694.23668-mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 4.05-mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 4.021. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN

Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU

OpenBenchmarking.orgms, Fewer Is BetteroneDNN 2.1.2Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt3691215SE +/- 0.01, N = 3SE +/- 0.02, N = 313.2813.41-mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 12.99-mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 13.071. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN

Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU

OpenBenchmarking.orgms, Fewer Is BetteroneDNN 2.1.2Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt0.19770.39540.59310.79080.9885SE +/- 0.001641, N = 3SE +/- 0.007466, N = 30.8726830.878588-mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 0.82-mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 0.81. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN

Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU

OpenBenchmarking.orgms, Fewer Is BetteroneDNN 2.1.2Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt0.24230.48460.72690.96921.2115SE +/- 0.01500, N = 15SE +/- 0.02280, N = 121.068561.07696-mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 0.98-mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 0.991. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN

Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU

OpenBenchmarking.orgms, Fewer Is BetteroneDNN 2.1.2Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt5001000150020002500SE +/- 7.55, N = 3SE +/- 4.59, N = 32532.452540.28-mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 2401.58-mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 2405.61. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN

Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU

OpenBenchmarking.orgms, Fewer Is BetteroneDNN 2.1.2Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt30060090012001500SE +/- 3.44, N = 3SE +/- 11.91, N = 81341.611388.34-mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 1262.01-mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 1265.311. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN

Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU

OpenBenchmarking.orgms, Fewer Is BetteroneDNN 2.1.2Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt6001200180024003000SE +/- 26.69, N = 3SE +/- 3.87, N = 32560.482596.32-mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 2405.01-mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 2454.761. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN

Harness: Convolution Batch Shapes Auto - Data Type: bf16bf16bf16 - Engine: CPU

OpenBenchmarking.orgms, Fewer Is BetteroneDNN 2.1.2Harness: Convolution Batch Shapes Auto - Data Type: bf16bf16bf16 - Engine: CPU12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt246810SE +/- 0.00598, N = 3SE +/- 0.06030, N = 36.349106.77334-mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 6.03-mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 6.151. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN

Harness: Deconvolution Batch shapes_1d - Data Type: bf16bf16bf16 - Engine: CPU

OpenBenchmarking.orgms, Fewer Is BetteroneDNN 2.1.2Harness: Deconvolution Batch shapes_1d - Data Type: bf16bf16bf16 - Engine: CPU12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt246810SE +/- 0.01945, N = 3SE +/- 0.04340, N = 36.724557.43204-mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 6.2-mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 6.531. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN

Harness: Deconvolution Batch shapes_3d - Data Type: bf16bf16bf16 - Engine: CPU

OpenBenchmarking.orgms, Fewer Is BetteroneDNN 2.1.2Harness: Deconvolution Batch shapes_3d - Data Type: bf16bf16bf16 - Engine: CPU12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt1.05252.1053.15754.215.2625SE +/- 0.09231, N = 15SE +/- 0.07736, N = 154.672934.67789-mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 4.21-mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 4.271. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN

Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU

OpenBenchmarking.orgms, Fewer Is BetteroneDNN 2.1.2Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt30060090012001500SE +/- 2.17, N = 3SE +/- 1.94, N = 31328.421334.31-mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 1262.56-mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 1263.191. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN

Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU

OpenBenchmarking.orgms, Fewer Is BetteroneDNN 2.1.2Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt0.52561.05121.57682.10242.628SE +/- 0.02354, N = 3SE +/- 0.00302, N = 32.332062.33613-mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 2.07-mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 2.051. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN

Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU

OpenBenchmarking.orgms, Fewer Is BetteroneDNN 2.1.2Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt6001200180024003000SE +/- 20.40, N = 3SE +/- 25.72, N = 32578.542585.15-mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 2395.45-mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 2405.471. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN

Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU

OpenBenchmarking.orgms, Fewer Is BetteroneDNN 2.1.2Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt30060090012001500SE +/- 2.19, N = 3SE +/- 5.65, N = 31337.471339.83-mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 1270.85-mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 1261.621. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN

Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32 - Engine: CPU

OpenBenchmarking.orgms, Fewer Is BetteroneDNN 2.1.2Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32 - Engine: CPU12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt0.12730.25460.38190.50920.6365SE +/- 0.005334, N = 6SE +/- 0.005655, N = 30.5505880.565699-mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 0.45-mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 0.471. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN

Harness: Matrix Multiply Batch Shapes Transformer - Data Type: bf16bf16bf16 - Engine: CPU

OpenBenchmarking.orgms, Fewer Is BetteroneDNN 2.1.2Harness: Matrix Multiply Batch Shapes Transformer - Data Type: bf16bf16bf16 - Engine: CPU12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt0.26710.53420.80131.06841.3355SE +/- 0.01307, N = 3SE +/- 0.01157, N = 31.185831.18705-mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 1.01-mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 11. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

Parboil

Test: OpenMP LBM

OpenBenchmarking.orgSeconds, Fewer Is BetterParboil 2.5Test: OpenMP LBM12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt306090120150SE +/- 0.03, N = 3SE +/- 0.02, N = 3114.07114.101. (CXX) g++ options: -lm -lpthread -lgomp -O3 -ffast-math -fopenmp

Parboil

Test: OpenMP CUTCP

OpenBenchmarking.orgSeconds, Fewer Is BetterParboil 2.5Test: OpenMP CUTCP12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt0.71621.43242.14862.86483.581SE +/- 0.006360, N = 3SE +/- 0.009619, N = 33.1771413.1831091. (CXX) g++ options: -lm -lpthread -lgomp -O3 -ffast-math -fopenmp

Parboil

Test: OpenMP Stencil

OpenBenchmarking.orgSeconds, Fewer Is BetterParboil 2.5Test: OpenMP Stencil12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt48121620SE +/- 0.05, N = 3SE +/- 0.07, N = 314.9815.011. (CXX) g++ options: -lm -lpthread -lgomp -O3 -ffast-math -fopenmp

Parboil

Test: OpenMP MRI Gridding

OpenBenchmarking.orgSeconds, Fewer Is BetterParboil 2.5Test: OpenMP MRI Gridding12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt1122334455SE +/- 0.99, N = 12SE +/- 0.68, N = 1548.5749.151. (CXX) g++ options: -lm -lpthread -lgomp -O3 -ffast-math -fopenmp

CP2K Molecular Dynamics

Input: Fayalite-FIST

OpenBenchmarking.orgSeconds, Fewer Is BetterCP2K Molecular Dynamics 8.2Input: Fayalite-FIST12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt90180270360450372.74398.35

Timed MrBayes Analysis

Primate Phylogeny Analysis

OpenBenchmarking.orgSeconds, Fewer Is BetterTimed MrBayes Analysis 3.2.7Primate Phylogeny Analysis12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt20406080100SE +/- 0.37, N = 3SE +/- 0.15, N = 373.7974.34-mno-amx-tile -mno-amx-int8 -mno-amx-bf16-march=native -mavx512bf16 -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect1. (CC) gcc options: -mmmx -msse -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msha -maes -mavx -mfma -mavx2 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi -mrdrnd -mbmi -mbmi2 -madx -mabm -O3 -std=c99 -pedantic -lm -lreadline

Timed HMMer Search

Pfam Database Search

OpenBenchmarking.orgSeconds, Fewer Is BetterTimed HMMer Search 3.3.2Pfam Database Search12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt20406080100SE +/- 0.11, N = 3SE +/- 0.24, N = 382.4882.61-mno-amx-tile -mno-amx-int8 -mno-amx-bf16-march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect1. (CC) gcc options: -O3 -pthread -lhmmer -leasel -lm -lmpi

Timed MAFFT Alignment

Multiple Sequence Alignment - LSU RNA

OpenBenchmarking.orgSeconds, Fewer Is BetterTimed MAFFT Alignment 7.471Multiple Sequence Alignment - LSU RNA12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt246810SE +/- 0.012, N = 3SE +/- 0.014, N = 37.5237.7031. (CC) gcc options: -std=c99 -O3 -lm -lpthread

OpenFOAM

Input: Motorbike 30M

OpenBenchmarking.orgSeconds, Fewer Is BetterOpenFOAM 8Input: Motorbike 30M12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt306090120150SE +/- 0.13, N = 3SE +/- 0.70, N = 3135.71137.71-lspecie -lfiniteVolume -lfvOptions -lmeshTools -lsampling-ldynamicMesh1. (CXX) g++ options: -std=c++11 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -lgenericPatchFields -lOpenFOAM -ldl -lm

OpenFOAM

Input: Motorbike 60M

OpenBenchmarking.orgSeconds, Fewer Is BetterOpenFOAM 8Input: Motorbike 60M12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt2004006008001000SE +/- 0.58, N = 3SE +/- 2.30, N = 3864.00867.20-lspecie -lfiniteVolume -lfvOptions -lmeshTools -lsampling-ldynamicMesh1. (CXX) g++ options: -std=c++11 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -lgenericPatchFields -lOpenFOAM -ldl -lm

RELION

Test: Basic - Device: CPU

OpenBenchmarking.orgSeconds, Fewer Is BetterRELION 3.1.1Test: Basic - Device: CPU12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt400800120016002000SE +/- 0.43, N = 3SE +/- 3.94, N = 31656.711684.70-march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect-mno-amx-tile -mno-amx-int8 -mno-amx-bf161. (CXX) g++ options: -O3 -fopenmp -std=c++0x -rdynamic -ldl -ltiff -lfftw3f -lfftw3 -lpng -pthread -lmpi_cxx -lmpi

DeepSpeech

Acceleration: CPU

OpenBenchmarking.orgSeconds, Fewer Is BetterDeepSpeech 0.6Acceleration: CPU12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt1122334455SE +/- 0.34, N = 3SE +/- 0.29, N = 348.8548.96

R Benchmark

OpenBenchmarking.orgSeconds, Fewer Is BetterR Benchmark12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt0.02360.04720.07080.09440.118SE +/- 0.0005, N = 3SE +/- 0.0008, N = 150.10440.10501. R scripting front-end version 4.0.4 (2021-02-15)

RNNoise

OpenBenchmarking.orgSeconds, Fewer Is BetterRNNoise 2020-06-2812700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt48121620SE +/- 0.21, N = 3SE +/- 0.01, N = 316.5117.12-mno-amx-tile -mno-amx-int8 -mno-amx-bf16-march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect1. (CC) gcc options: -O3 -pedantic -fvisibility=hidden

Darktable

Test: Boat - Acceleration: OpenCL

OpenBenchmarking.orgSeconds, Fewer Is BetterDarktable 3.4.1Test: Boat - Acceleration: OpenCL12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt0.9341.8682.8023.7364.67SE +/- 0.049, N = 3SE +/- 0.043, N = 33.9714.151

Darktable

Test: Masskrug - Acceleration: OpenCL

OpenBenchmarking.orgSeconds, Fewer Is BetterDarktable 3.4.1Test: Masskrug - Acceleration: OpenCL12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt0.86111.72222.58333.44444.3055SE +/- 0.007, N = 3SE +/- 0.012, N = 33.8183.827

Darktable

Test: Server Rack - Acceleration: OpenCL

OpenBenchmarking.orgSeconds, Fewer Is BetterDarktable 3.4.1Test: Server Rack - Acceleration: OpenCL12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt0.02990.05980.08970.11960.1495SE +/- 0.000, N = 3SE +/- 0.001, N = 30.1310.133

Darktable

Test: Server Room - Acceleration: OpenCL

OpenBenchmarking.orgSeconds, Fewer Is BetterDarktable 3.4.1Test: Server Room - Acceleration: OpenCL12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt0.67771.35542.03312.71083.3885SE +/- 0.007, N = 3SE +/- 0.004, N = 32.9993.012

GNU Octave Benchmark

OpenBenchmarking.orgSeconds, Fewer Is BetterGNU Octave Benchmark 6.1.1~hg.2021.01.2612700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt1.1432.2863.4294.5725.715SE +/- 0.026, N = 5SE +/- 0.018, N = 55.0645.080

QMCPACK

Input: simple-H2O

OpenBenchmarking.orgTotal Execution Time - Seconds, Fewer Is BetterQMCPACK 3.11Input: simple-H2O12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt48121620SE +/- 0.21, N = 14SE +/- 0.10, N = 317.9518.13-mno-amx-tile -mno-amx-int8 -mno-amx-bf16-march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect1. (CXX) g++ options: -O3 -fopenmp -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -fomit-frame-pointer -ffast-math -pthread -lm -ldl


Phoronix Test Suite v10.8.4