12700k HPC+OpenCL AVX512 performance profiling

Intel Core i7-12700K testing with an MSI PRO Z690-A DDR4 (MS-7D25) v1.0 (1.15 BIOS) and a Gigabyte AMD Radeon RX 5600 OEM/5600 XT / 5700/5700 6GB on Pop 21.04 via the Phoronix Test Suite.

HTML result view exported from: https://openbenchmarking.org/result/2112125-TJ-12700KHPC62&grs&rdt.
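This comparison can typically be re-run locally by pointing the Phoronix Test Suite at the public result ID in the URL above. A minimal sketch, assuming a working phoronix-test-suite install and that the result is still published (the exact invocation is an assumption, not part of this export):

    # Re-run the same test selection against the published result
    # (result ID copied from the openbenchmarking.org URL above)
    phoronix-test-suite benchmark 2112125-TJ-12700KHPC62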

System Details

Tested configurations (abbreviated in the results below as "march=sapphirerapids" and "march=native + AVX512"):
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt

Processor: Intel Core i7-12700K @ 6.30GHz (8 Cores / 16 Threads)
Motherboard: MSI PRO Z690-A DDR4 (MS-7D25) v1.0 (1.15 BIOS)
Chipset: Intel Device 7aa7
Memory: 32GB
Disk: 500GB Western Digital WDS500G2B0C-00PXH0 + 3 x 10001GB Seagate ST10000DM0004-1Z + 128GB HP SSD S700 Pro (the march=native + AVX512 run additionally lists a 300GB Western Digital WD3000GLFS-0)
Graphics: Gigabyte AMD Radeon RX 5600 OEM/5600 XT / 5700/5700 6GB (1650/750MHz)
Audio: Realtek ALC897
Monitor: LG HDR WQHD
Network: Intel I225-V
OS: Pop 21.04
Kernel: 5.15.5-76051505-generic (x86_64)
Desktop: GNOME Shell 3.38.4
Display Server: X Server 1.20.11
OpenGL: 4.6 Mesa 21.2.2 (LLVM 12.0.0)
OpenCL: OpenCL 2.2 AMD-APP (3361.0)
Vulkan: 1.2.185
Compiler: GCC 11.1.0
File-System: ext4
Screen Resolution: 3440x1440

Kernel Details: Transparent Huge Pages: madvise

Environment Details:
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: CXXFLAGS="-O3 -march=sapphirerapids -mno-amx-tile -mno-amx-int8 -mno-amx-bf16" CFLAGS="-O3 -march=sapphirerapids -mno-amx-tile -mno-amx-int8 -mno-amx-bf16"
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: CXXFLAGS="-O3 -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect", with CFLAGS and FFLAGS set to the same flag string

Compiler Details: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-11-RPS7jb/gcc-11-11.1.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-RPS7jb/gcc-11-11.1.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v

Disk Details: NONE / errors=remount-ro,noatime,rw / Block Size: 4096
Processor Details: Scaling Governor: intel_pstate powersave - CPU Microcode: 0x15 - Thermald 2.4.3
Graphics Details: GLAMOR - BAR1 / Visible vRAM Size: 6128 MB
Python Details: Python 2.7.18 + Python 3.9.5
Security Details: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced IBRS IBPB: conditional RSB filling + srbds: Not affected + tsx_async_abort: Not affected
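The Environment Details above capture how the two runs differ: one build targets -march=sapphirerapids with the AMX instruction groups masked off, the other uses -march=native plus every AVX-512 extension flag spelled out explicitly. A minimal shell sketch of how such an environment could be set up before building a test through the Phoronix Test Suite (the flag strings are copied from Environment Details; the choice of the caffe test profile is only illustrative):

    # Configuration 1: Sapphire Rapids ISA baseline, AMX masked off
    export CFLAGS="-O3 -march=sapphirerapids -mno-amx-tile -mno-amx-int8 -mno-amx-bf16"
    export CXXFLAGS="$CFLAGS"
    phoronix-test-suite benchmark caffe

    # Configuration 2: -march=native plus explicit AVX-512 extensions
    export CFLAGS="-O3 -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect"
    export CXXFLAGS="$CFLAGS"
    export FFLAGS="$CFLAGS"
    phoronix-test-suite benchmark caffe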

Results Overview

Each test in this comparison is reported individually below, with the result for both configurations, the standard error (SE), and the number of runs (N).

Caffe

Model: GoogleNet - Acceleration: CPU - Iterations: 100

Milli-Seconds, Fewer Is Better (Caffe 2020-02-13)
march=sapphirerapids: 79674 (SE +/- 640.77, N = 15)
march=native + AVX512: 67852 (SE +/- 292.08, N = 3)
1. (CXX) g++ options: -O3 -fPIC -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas

Caffe

Model: GoogleNet - Acceleration: CPU - Iterations: 200

Milli-Seconds, Fewer Is Better (Caffe 2020-02-13)
march=sapphirerapids: 157272 (SE +/- 1277.18, N = 3)
march=native + AVX512: 136187 (SE +/- 778.54, N = 3)
1. (CXX) g++ options: -O3 -fPIC -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas

oneDNN

Harness: Deconvolution Batch shapes_1d - Data Type: bf16bf16bf16 - Engine: CPU

ms, Fewer Is Better (oneDNN 2.1.2)
march=sapphirerapids: 7.43204 (SE +/- 0.04340, N = 3; MIN: 6.53)
march=native + AVX512: 6.72455 (SE +/- 0.01945, N = 3; MIN: 6.2)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Max SP Flops

GFLOPS, More Is Better (SHOC Scalable HeterOgeneous Computing 2020-04-17)
march=sapphirerapids: 8376637 (SE +/- 65495.62, N = 3)
march=native + AVX512: 9031599 (SE +/- 142226.18, N = 9)
1. (CXX) g++ options: -O3 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

CP2K Molecular Dynamics

Input: Fayalite-FIST

Seconds, Fewer Is Better (CP2K Molecular Dynamics 8.2)
march=sapphirerapids: 398.35
march=native + AVX512: 372.74

Caffe

Model: GoogleNet - Acceleration: CPU - Iterations: 1000

Milli-Seconds, Fewer Is Better (Caffe 2020-02-13)
march=sapphirerapids: 726541 (SE +/- 11125.73, N = 9)
march=native + AVX512: 680177 (SE +/- 2693.85, N = 3)
1. (CXX) g++ options: -O3 -fPIC -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas

oneDNN

Harness: Convolution Batch Shapes Auto - Data Type: bf16bf16bf16 - Engine: CPU

ms, Fewer Is Better (oneDNN 2.1.2)
march=sapphirerapids: 6.77334 (SE +/- 0.06030, N = 3; MIN: 6.15)
march=native + AVX512: 6.34910 (SE +/- 0.00598, N = 3; MIN: 6.03)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

LeelaChessZero

Backend: BLAS

Nodes Per Second, More Is Better (LeelaChessZero 0.28)
march=sapphirerapids: 906 (SE +/- 12.84, N = 9)
march=native + AVX512: 959 (SE +/- 11.95, N = 4)
1. (CXX) g++ options: -flto -O3 -pthread

Pennant

Test: leblancbig

Hydro Cycle Time - Seconds, Fewer Is Better (Pennant 1.0.1)
march=sapphirerapids: 49.93 (SE +/- 0.05, N = 3)
march=native + AVX512: 47.39 (SE +/- 0.03, N = 3)
1. (CXX) g++ options: -fopenmp -pthread -lmpi_cxx -lmpi

Darktable

Test: Boat - Acceleration: OpenCL

Seconds, Fewer Is Better (Darktable 3.4.1)
march=sapphirerapids: 3.971 (SE +/- 0.049, N = 3)
march=native + AVX512: 4.151 (SE +/- 0.043, N = 3)

Caffe

Model: AlexNet - Acceleration: CPU - Iterations: 1000

Milli-Seconds, Fewer Is Better (Caffe 2020-02-13)
march=sapphirerapids: 247624 (SE +/- 3023.73, N = 9)
march=native + AVX512: 238435 (SE +/- 1961.39, N = 3)
1. (CXX) g++ options: -O3 -fPIC -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas

RNNoise

Seconds, Fewer Is Better (RNNoise 2020-06-28)
march=sapphirerapids: 16.51 (SE +/- 0.21, N = 3)
march=native + AVX512: 17.12 (SE +/- 0.01, N = 3)
1. (CC) gcc options: -O3 -pedantic -fvisibility=hidden

oneDNN

Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU

ms, Fewer Is Better (oneDNN 2.1.2)
march=sapphirerapids: 1388.34 (SE +/- 11.91, N = 8; MIN: 1265.31)
march=native + AVX512: 1341.61 (SE +/- 3.44, N = 3; MIN: 1262.01)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Bus Speed Readback

GB/s, More Is Better (SHOC Scalable HeterOgeneous Computing 2020-04-17)
march=sapphirerapids: 20.39 (SE +/- 0.01, N = 3)
march=native + AVX512: 21.05 (SE +/- 0.21, N = 15)
1. (CXX) g++ options: -O3 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

Pennant

Test: sedovbig

Hydro Cycle Time - Seconds, Fewer Is Better (Pennant 1.0.1)
march=sapphirerapids: 69.89 (SE +/- 0.35, N = 3)
march=native + AVX512: 67.77 (SE +/- 0.13, N = 3)
1. (CXX) g++ options: -fopenmp -pthread -lmpi_cxx -lmpi

Darmstadt Automotive Parallel Heterogeneous Suite

Backend: OpenMP - Kernel: Points2Image

Test Cases Per Minute, More Is Better (Darmstadt Automotive Parallel Heterogeneous Suite)
march=sapphirerapids: 35022.15 (SE +/- 389.77, N = 3)
march=native + AVX512: 36114.81 (SE +/- 233.74, N = 3)
1. (CXX) g++ options: -O3 -std=c++11 -fopenmp

HPL Linpack

GFLOPS, More Is Better (HPL Linpack 2.3)
march=sapphirerapids: 97.55 (SE +/- 0.14, N = 3)
march=native + AVX512: 100.59 (SE +/- 1.07, N = 3)
1. (CC) gcc options: -O3 -lopenblas -lm -pthread -lmpi

Intel MPI Benchmarks

Test: IMB-MPI1 PingPong

Average Mbytes/sec, More Is Better (Intel MPI Benchmarks 2019.3)
march=sapphirerapids: 10004.46 (SE +/- 140.45, N = 15; MIN: 6.66 / MAX: 34960.72)
march=native + AVX512: 10294.25 (SE +/- 131.85, N = 3; MIN: 10.93 / MAX: 34708.09)
1. (CXX) g++ options: -O3 -O0 -pedantic -fopenmp -pthread -lmpi_cxx -lmpi

ACES DGEMM

Sustained Floating-Point Rate

GFLOP/s, More Is Better (ACES DGEMM 1.0)
march=sapphirerapids: 4.875598 (SE +/- 0.034356, N = 3)
march=native + AVX512: 5.016124 (SE +/- 0.013832, N = 3)
1. (CC) gcc options: -O3 -march=native -fopenmp

cl-mem

Benchmark: Write

GB/s, More Is Better (cl-mem 2017-01-13)
march=sapphirerapids: 255.3 (SE +/- 0.17, N = 3)
march=native + AVX512: 248.4 (SE +/- 0.15, N = 3)
1. (CC) gcc options: -O2 -flto -lOpenCL

oneDNN

Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32 - Engine: CPU

ms, Fewer Is Better (oneDNN 2.1.2)
march=sapphirerapids: 0.565699 (SE +/- 0.005655, N = 3; MIN: 0.47)
march=native + AVX512: 0.550588 (SE +/- 0.005334, N = 6; MIN: 0.45)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

FFTW

Build: Float + SSE - Size: 2D FFT Size 32

Mflops, More Is Better (FFTW 3.3.6)
march=sapphirerapids: 80496 (SE +/- 272.26, N = 3)
march=native + AVX512: 82515 (SE +/- 920.77, N = 3)
1. (CC) gcc options: -pthread -O3 -lm

Timed MAFFT Alignment

Multiple Sequence Alignment - LSU RNA

Seconds, Fewer Is Better (Timed MAFFT Alignment 7.471)
march=sapphirerapids: 7.703 (SE +/- 0.014, N = 3)
march=native + AVX512: 7.523 (SE +/- 0.012, N = 3)
1. (CC) gcc options: -std=c99 -O3 -lm -lpthread

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Triad

GB/s, More Is Better (SHOC Scalable HeterOgeneous Computing 2020-04-17)
march=sapphirerapids: 12.60 (SE +/- 0.14, N = 3)
march=native + AVX512: 12.31 (SE +/- 0.13, N = 6)
1. (CXX) g++ options: -O3 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

Caffe

Model: AlexNet - Acceleration: CPU - Iterations: 100

Milli-Seconds, Fewer Is Better (Caffe 2020-02-13)
march=sapphirerapids: 23122 (SE +/- 319.02, N = 3)
march=native + AVX512: 23659 (SE +/- 282.19, N = 3)
1. (CXX) g++ options: -O3 -fPIC -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas

FFTW

Build: Float + SSE - Size: 2D FFT Size 4096

Mflops, More Is Better (FFTW 3.3.6)
march=sapphirerapids: 43900 (SE +/- 484.91, N = 5)
march=native + AVX512: 42935 (SE +/- 90.35, N = 3)
1. (CC) gcc options: -pthread -O3 -lm

ASKAP

Test: tConvolve OpenMP - Gridding

Million Grid Points Per Second, More Is Better (ASKAP 1.0)
march=sapphirerapids: 1866.30 (SE +/- 4.37, N = 3)
march=native + AVX512: 1906.52 (SE +/- 12.08, N = 3)
1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp

FFTW

Build: Float + SSE - Size: 1D FFT Size 32

Mflops, More Is Better (FFTW 3.3.6)
march=sapphirerapids: 32180 (SE +/- 18.34, N = 3)
march=native + AVX512: 31560 (SE +/- 195.07, N = 3)
1. (CC) gcc options: -pthread -O3 -lm

FFTW

Build: Stock - Size: 2D FFT Size 4096

Mflops, More Is Better (FFTW 3.3.6)
march=sapphirerapids: 13770 (SE +/- 62.76, N = 3)
march=native + AVX512: 14020 (SE +/- 33.65, N = 3)
1. (CC) gcc options: -pthread -O3 -lm

RELION

Test: Basic - Device: CPU

Seconds, Fewer Is Better (RELION 3.1.1)
march=sapphirerapids: 1684.70 (SE +/- 3.94, N = 3)
march=native + AVX512: 1656.71 (SE +/- 0.43, N = 3)
1. (CXX) g++ options: -O3 -fopenmp -std=c++0x -rdynamic -ldl -ltiff -lfftw3f -lfftw3 -lpng -pthread -lmpi_cxx -lmpi

cl-mem

Benchmark: Copy

GB/s, More Is Better (cl-mem 2017-01-13)
march=sapphirerapids: 198.4 (SE +/- 0.12, N = 3)
march=native + AVX512: 195.2 (SE +/- 0.12, N = 3)
1. (CC) gcc options: -O2 -flto -lOpenCL

Intel MPI Benchmarks

Test: IMB-MPI1 Sendrecv

Average Mbytes/sec, More Is Better (Intel MPI Benchmarks 2019.3)
march=sapphirerapids: 12273.54 (SE +/- 111.25, N = 3; MAX: 66577.1)
march=native + AVX512: 12471.89 (SE +/- 81.05, N = 3; MAX: 66000.84)
1. (CXX) g++ options: -O3 -O0 -pedantic -fopenmp -pthread -lmpi_cxx -lmpi

Intel MPI Benchmarks

Test: IMB-MPI1 Sendrecv

Average usec, Fewer Is Better (Intel MPI Benchmarks 2019.3)
march=sapphirerapids: 53.50 (SE +/- 0.35, N = 3; MIN: 0.19 / MAX: 1786.28)
march=native + AVX512: 52.66 (SE +/- 0.28, N = 3; MIN: 0.19 / MAX: 1702.76)
1. (CXX) g++ options: -O3 -O0 -pedantic -fopenmp -pthread -lmpi_cxx -lmpi

Darktable

Test: Server Rack - Acceleration: OpenCL

Seconds, Fewer Is Better (Darktable 3.4.1)
march=sapphirerapids: 0.133 (SE +/- 0.001, N = 3)
march=native + AVX512: 0.131 (SE +/- 0.000, N = 3)

OpenFOAM

Input: Motorbike 30M

Seconds, Fewer Is Better (OpenFOAM 8)
march=sapphirerapids: 137.71 (SE +/- 0.70, N = 3; -ldynamicMesh)
march=native + AVX512: 135.71 (SE +/- 0.13, N = 3; -lspecie -lfiniteVolume -lfvOptions -lmeshTools -lsampling)
1. (CXX) g++ options: -std=c++11 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -lgenericPatchFields -lOpenFOAM -ldl -lm

oneDNN

Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU

ms, Fewer Is Better (oneDNN 2.1.2)
march=sapphirerapids: 2596.32 (SE +/- 3.87, N = 3; MIN: 2454.76)
march=native + AVX512: 2560.48 (SE +/- 26.69, N = 3; MIN: 2405.01)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

NAMD

ATPase Simulation - 327,506 Atoms

days/ns, Fewer Is Better (NAMD 2.14)
march=sapphirerapids: 1.16249 (SE +/- 0.00085, N = 3)
march=native + AVX512: 1.17871 (SE +/- 0.00713, N = 3)

Intel MPI Benchmarks

Test: IMB-P2P PingPong

Average Msg/sec, More Is Better (Intel MPI Benchmarks 2019.3)
march=sapphirerapids: 8628982 (SE +/- 23849.34, N = 3; MIN: 1994 / MAX: 22289308)
march=native + AVX512: 8513496 (SE +/- 102028.44, N = 3; MIN: 1946 / MAX: 22082360)
1. (CXX) g++ options: -O3 -O0 -pedantic -fopenmp -pthread -lmpi_cxx -lmpi

oneDNN

Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU

ms, Fewer Is Better (oneDNN 2.1.2)
march=sapphirerapids: 8.78268 (SE +/- 0.01096, N = 3; MIN: 8.63)
march=native + AVX512: 8.89450 (SE +/- 0.13446, N = 14; MIN: 8.61)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN

Harness: IP Shapes 3D - Data Type: bf16bf16bf16 - Engine: CPU

ms, Fewer Is Better (oneDNN 2.1.2)
march=sapphirerapids: 3.56837 (SE +/- 0.02891, N = 3; MIN: 3.16)
march=native + AVX512: 3.52788 (SE +/- 0.02720, N = 10; MIN: 3.15)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

FFTW

Build: Stock - Size: 1D FFT Size 4096

Mflops, More Is Better (FFTW 3.3.6)
march=sapphirerapids: 18293 (SE +/- 196.08, N = 3)
march=native + AVX512: 18502 (SE +/- 161.26, N = 3)
1. (CC) gcc options: -pthread -O3 -lm

Intel MPI Benchmarks

Test: IMB-MPI1 Exchange

Average Mbytes/sec, More Is Better (Intel MPI Benchmarks 2019.3)
march=sapphirerapids: 15189.76 (SE +/- 185.82, N = 15; MAX: 65915.24)
march=native + AVX512: 15023.98 (SE +/- 218.78, N = 15; MAX: 64515.64)
1. (CXX) g++ options: -O3 -O0 -pedantic -fopenmp -pthread -lmpi_cxx -lmpi

Numpy Benchmark

Score, More Is Better (Numpy Benchmark)
march=sapphirerapids: 618.63 (SE +/- 3.76, N = 3)
march=native + AVX512: 612.18 (SE +/- 4.12, N = 3)

QMCPACK

Input: simple-H2O

Total Execution Time - Seconds, Fewer Is Better (QMCPACK 3.11)
march=sapphirerapids: 17.95 (SE +/- 0.21, N = 14)
march=native + AVX512: 18.13 (SE +/- 0.10, N = 3)
1. (CXX) g++ options: -O3 -fopenmp -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -fomit-frame-pointer -ffast-math -pthread -lm -ldl

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: GEMM SGEMM_N

GFLOPS, More Is Better (SHOC Scalable HeterOgeneous Computing 2020-04-17)
march=sapphirerapids: 1841.75 (SE +/- 8.67, N = 3)
march=native + AVX512: 1859.40 (SE +/- 17.38, N = 3)
1. (CXX) g++ options: -O3 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

oneDNN

Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU

ms, Fewer Is Better (oneDNN 2.1.2)
march=sapphirerapids: 13.28 (SE +/- 0.01, N = 3; MIN: 12.99)
march=native + AVX512: 13.41 (SE +/- 0.02, N = 3; MIN: 13.07)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

cl-mem

Benchmark: Read

GB/s, More Is Better (cl-mem 2017-01-13)
march=sapphirerapids: 263.6 (SE +/- 0.06, N = 3)
march=native + AVX512: 261.2 (SE +/- 0.07, N = 3)
1. (CC) gcc options: -O2 -flto -lOpenCL

Caffe

Model: AlexNet - Acceleration: CPU - Iterations: 200

Milli-Seconds, Fewer Is Better (Caffe 2020-02-13)
march=sapphirerapids: 46624 (SE +/- 442.71, N = 3)
march=native + AVX512: 47045 (SE +/- 218.90, N = 3)
1. (CXX) g++ options: -O3 -fPIC -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas

Himeno Benchmark

Poisson Pressure Solver

MFLOPS, More Is Better (Himeno Benchmark 3.0)
march=sapphirerapids: 9471.74 (SE +/- 123.70, N = 3)
march=native + AVX512: 9554.06 (SE +/- 6.17, N = 3)
1. (CC) gcc options: -O3 -mavx2

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Bus Speed Download

GB/s, More Is Better (SHOC Scalable HeterOgeneous Computing 2020-04-17)
march=sapphirerapids: 20.15 (SE +/- 0.24, N = 3)
march=native + AVX512: 19.98 (SE +/- 0.15, N = 3)
1. (CXX) g++ options: -O3 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

oneDNN

Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPU

ms, Fewer Is Better (oneDNN 2.1.2)
march=sapphirerapids: 0.685132 (SE +/- 0.006714, N = 3; MIN: 0.6)
march=native + AVX512: 0.690873 (SE +/- 0.006051, N = 3; MIN: 0.6)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN

Harness: IP Shapes 1D - Data Type: bf16bf16bf16 - Engine: CPU

ms, Fewer Is Better (oneDNN 2.1.2)
march=sapphirerapids: 2.44545 (SE +/- 0.02457, N = 3; MIN: 2.14)
march=native + AVX512: 2.46474 (SE +/- 0.02589, N = 5; MIN: 2.19)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

Intel MPI Benchmarks

Test: IMB-MPI1 Exchange

Average usec, Fewer Is Better (Intel MPI Benchmarks 2019.3)
march=sapphirerapids: 109.01 (SE +/- 0.91, N = 15; MIN: 0.28 / MAX: 3601.44)
march=native + AVX512: 108.21 (SE +/- 0.88, N = 15; MIN: 0.28 / MAX: 3672.41)
1. (CXX) g++ options: -O3 -O0 -pedantic -fopenmp -pthread -lmpi_cxx -lmpi

Timed MrBayes Analysis

Primate Phylogeny Analysis

Seconds, Fewer Is Better (Timed MrBayes Analysis 3.2.7)
march=sapphirerapids: 73.79 (SE +/- 0.37, N = 3)
march=native + AVX512: 74.34 (SE +/- 0.15, N = 3)
1. (CC) gcc options: -mmmx -msse -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msha -maes -mavx -mfma -mavx2 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi -mrdrnd -mbmi -mbmi2 -madx -mabm -O3 -std=c99 -pedantic -lm -lreadline

ASKAP

Test: Hogbom Clean OpenMP

Iterations Per Second, More Is Better (ASKAP 1.0)
march=sapphirerapids: 267.39 (SE +/- 1.10, N = 3)
march=native + AVX512: 269.30 (SE +/- 0.64, N = 3)
1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp

oneDNN

Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU

ms, Fewer Is Better (oneDNN 2.1.2)
march=sapphirerapids: 0.872683 (SE +/- 0.001641, N = 3; MIN: 0.82)
march=native + AVX512: 0.878588 (SE +/- 0.007466, N = 3; MIN: 0.8)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

ASKAP

Test: tConvolve MPI - Degridding

Mpix/sec, More Is Better (ASKAP 1.0)
march=sapphirerapids: 4859.18 (SE +/- 0.00, N = 3)
march=native + AVX512: 4889.74 (SE +/- 30.56, N = 3)
1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp

R Benchmark

Seconds, Fewer Is Better (R Benchmark)
march=sapphirerapids: 0.1044 (SE +/- 0.0005, N = 3)
march=native + AVX512: 0.1050 (SE +/- 0.0008, N = 15)
1. R scripting front-end version 4.0.4 (2021-02-15)

FFTW

Build: Stock - Size: 2D FFT Size 32

Mflops, More Is Better (FFTW 3.3.6)
march=sapphirerapids: 22948 (SE +/- 297.88, N = 3)
march=native + AVX512: 22827 (SE +/- 135.68, N = 3)
1. (CC) gcc options: -pthread -O3 -lm

GROMACS

Implementation: MPI CPU - Input: water_GMX50_bare

Ns Per Day, More Is Better (GROMACS 2021.2)
march=sapphirerapids: 1.180 (SE +/- 0.001, N = 3)
march=native + AVX512: 1.186 (SE +/- 0.005, N = 3)
1. (CXX) g++ options: -O3 -pthread

Algebraic Multi-Grid Benchmark

Figure Of Merit, More Is Better (Algebraic Multi-Grid Benchmark 1.2)
march=sapphirerapids: 303975100 (SE +/- 51637.29, N = 3)
march=native + AVX512: 302617633 (SE +/- 414229.41, N = 3)
1. (CC) gcc options: -lparcsr_ls -lparcsr_mv -lseq_mv -lIJ_mv -lkrylov -lHYPRE_utilities -lm -fopenmp -pthread -lmpi

oneDNN

Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU

ms, Fewer Is Better (oneDNN 2.1.2)
march=sapphirerapids: 1334.31 (SE +/- 1.94, N = 3; MIN: 1263.19)
march=native + AVX512: 1328.42 (SE +/- 2.17, N = 3; MIN: 1262.56)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

FFTW

Build: Float + SSE - Size: 1D FFT Size 4096

Mflops, More Is Better (FFTW 3.3.6)
march=sapphirerapids: 103960 (SE +/- 1023.44, N = 3)
march=native + AVX512: 104417 (SE +/- 1211.30, N = 3)
1. (CC) gcc options: -pthread -O3 -lm

Darktable

Test: Server Room - Acceleration: OpenCL

Seconds, Fewer Is Better (Darktable 3.4.1)
march=sapphirerapids: 3.012 (SE +/- 0.004, N = 3)
march=native + AVX512: 2.999 (SE +/- 0.007, N = 3)

ASKAP

Test: tConvolve OpenMP - Degridding

Million Grid Points Per Second, More Is Better (ASKAP 1.0)
march=sapphirerapids: 3614.48 (SE +/- 16.43, N = 3)
march=native + AVX512: 3599.35 (SE +/- 47.99, N = 3)
1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp

oneDNN

Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU

ms, Fewer Is Better (oneDNN 2.1.2)
march=sapphirerapids: 1.93347 (SE +/- 0.00686, N = 3; MIN: 1.85)
march=native + AVX512: 1.92539 (SE +/- 0.00843, N = 3; MIN: 1.86)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

TensorFlow Lite

Model: SqueezeNet

Microseconds, Fewer Is Better (TensorFlow Lite 2020-08-23)
march=sapphirerapids: 145830 (SE +/- 601.51, N = 3)
march=native + AVX512: 145241 (SE +/- 572.70, N = 3)

OpenFOAM

Input: Motorbike 60M

Seconds, Fewer Is Better (OpenFOAM 8)
march=sapphirerapids: 867.20 (SE +/- 2.30, N = 3; -ldynamicMesh)
march=native + AVX512: 864.00 (SE +/- 0.58, N = 3; -lspecie -lfiniteVolume -lfvOptions -lmeshTools -lsampling)
1. (CXX) g++ options: -std=c++11 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -lgenericPatchFields -lOpenFOAM -ldl -lm

GNU Octave Benchmark

Seconds, Fewer Is Better (GNU Octave Benchmark 6.1.1~hg.2021.01.26)
march=sapphirerapids: 5.080 (SE +/- 0.018, N = 5)
march=native + AVX512: 5.064 (SE +/- 0.026, N = 5)

TensorFlow Lite

Model: Mobilenet Float

Microseconds, Fewer Is Better (TensorFlow Lite 2020-08-23)
march=sapphirerapids: 97253.9 (SE +/- 155.46, N = 3)
march=native + AVX512: 97559.5 (SE +/- 138.23, N = 3)

oneDNN

Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU

ms, Fewer Is Better (oneDNN 2.1.2)
march=sapphirerapids: 2540.28 (SE +/- 4.59, N = 3; MIN: 2405.6)
march=native + AVX512: 2532.45 (SE +/- 7.55, N = 3; MIN: 2401.58)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN

Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU

ms, Fewer Is Better (oneDNN 2.1.2)
march=sapphirerapids: 2.58637 (SE +/- 0.00632, N = 3; MIN: 2.33)
march=native + AVX512: 2.57885 (SE +/- 0.00417, N = 3; MIN: 2.34)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

ASKAP

Test: tConvolve MT - Gridding

Million Grid Points Per Second, More Is Better (ASKAP 1.0)
march=sapphirerapids: 1245.64 (SE +/- 0.56, N = 3)
march=native + AVX512: 1248.93 (SE +/- 0.92, N = 3)
1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp

oneDNN

Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU

ms, Fewer Is Better (oneDNN 2.1.2)
march=sapphirerapids: 2585.15 (SE +/- 25.72, N = 3; MIN: 2405.47)
march=native + AVX512: 2578.54 (SE +/- 20.40, N = 3; MIN: 2395.45)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

Darmstadt Automotive Parallel Heterogeneous Suite

Backend: OpenMP - Kernel: Euclidean Cluster

Test Cases Per Minute, More Is Better (Darmstadt Automotive Parallel Heterogeneous Suite)
march=sapphirerapids: 1671.51 (SE +/- 0.74, N = 3)
march=native + AVX512: 1667.43 (SE +/- 15.22, N = 3)
1. (CXX) g++ options: -O3 -std=c++11 -fopenmp

Darktable

Test: Masskrug - Acceleration: OpenCL

Seconds, Fewer Is Better (Darktable 3.4.1)
march=sapphirerapids: 3.827 (SE +/- 0.012, N = 3)
march=native + AVX512: 3.818 (SE +/- 0.007, N = 3)

ArrayFire

Test: BLAS CPU

GFLOPS, More Is Better (ArrayFire 3.7)
march=sapphirerapids: 1205.09 (SE +/- 0.76, N = 3)
march=native + AVX512: 1207.87 (SE +/- 0.54, N = 3)
1. (CXX) g++ options: -O3 -rdynamic

DeepSpeech

Acceleration: CPU

Seconds, Fewer Is Better (DeepSpeech 0.6)
march=sapphirerapids: 48.85 (SE +/- 0.34, N = 3)
march=native + AVX512: 48.96 (SE +/- 0.29, N = 3)

Parboil

Test: OpenMP Stencil

Seconds, Fewer Is Better (Parboil 2.5)
march=sapphirerapids: 15.01 (SE +/- 0.07, N = 3)
march=native + AVX512: 14.98 (SE +/- 0.05, N = 3)
1. (CXX) g++ options: -lm -lpthread -lgomp -O3 -ffast-math -fopenmp

Parboil

Test: OpenMP CUTCP

Seconds, Fewer Is Better (Parboil 2.5)
march=sapphirerapids: 3.177141 (SE +/- 0.006360, N = 3)
march=native + AVX512: 3.183109 (SE +/- 0.009619, N = 3)
1. (CXX) g++ options: -lm -lpthread -lgomp -O3 -ffast-math -fopenmp

oneDNN

Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU

ms, Fewer Is Better (oneDNN 2.1.2)
march=sapphirerapids: 1337.47 (SE +/- 2.19, N = 3; MIN: 1270.85)
march=native + AVX512: 1339.83 (SE +/- 5.65, N = 3; MIN: 1261.62)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN

Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU

ms, Fewer Is Better (oneDNN 2.1.2)
march=sapphirerapids: 2.33613 (SE +/- 0.00302, N = 3; MIN: 2.05)
march=native + AVX512: 2.33206 (SE +/- 0.02354, N = 3; MIN: 2.07)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: FFT SP

GFLOPS, More Is Better (SHOC Scalable HeterOgeneous Computing 2020-04-17)
march=sapphirerapids: 680.88 (SE +/- 0.96, N = 3)
march=native + AVX512: 682.02 (SE +/- 0.08, N = 3)
1. (CXX) g++ options: -O3 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

TensorFlow Lite

Model: Inception ResNet V2

Microseconds, Fewer Is Better (TensorFlow Lite 2020-08-23)
march=sapphirerapids: 1878730 (SE +/- 4029.02, N = 3)
march=native + AVX512: 1881663 (SE +/- 2515.38, N = 3)

Timed HMMer Search

Pfam Database Search

Seconds, Fewer Is Better (Timed HMMer Search 3.3.2)
march=sapphirerapids: 82.48 (SE +/- 0.11, N = 3)
march=native + AVX512: 82.61 (SE +/- 0.24, N = 3)
1. (CC) gcc options: -O3 -pthread -lhmmer -leasel -lm -lmpi

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: MD5 Hash

GHash/s, More Is Better (SHOC Scalable HeterOgeneous Computing 2020-04-17)
march=sapphirerapids: 9.3041 (SE +/- 0.0009, N = 3)
march=native + AVX512: 9.3179 (SE +/- 0.0002, N = 3)
1. (CXX) g++ options: -O3 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

TensorFlow Lite

Model: Inception V4

Microseconds, Fewer Is Better (TensorFlow Lite 2020-08-23)
march=sapphirerapids: 2080110 (SE +/- 1250.33, N = 3)
march=native + AVX512: 2082823 (SE +/- 4943.42, N = 3)

oneDNN

Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU

ms, Fewer Is Better (oneDNN 2.1.2)
march=sapphirerapids: 13.40 (SE +/- 0.01, N = 3; MIN: 13.16)
march=native + AVX512: 13.41 (SE +/- 0.00, N = 3; MIN: 13.19)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Reduction

SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: Reduction - GB/s, More Is Better
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 254.13 (SE +/- 0.22, N = 3) - -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 254.41 (SE +/- 0.16, N = 3) - -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
1. (CXX) g++ options: -O3 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

oneDNN

Harness: Matrix Multiply Batch Shapes Transformer - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 2.1.2 - Harness: Matrix Multiply Batch Shapes Transformer - Data Type: bf16bf16bf16 - Engine: CPU - ms, Fewer Is Better
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 1.18583 (SE +/- 0.01307, N = 3) - -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 1.01
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 1.18705 (SE +/- 0.01157, N = 3) - -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 1
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Texture Read Bandwidth

SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: Texture Read Bandwidth - GB/s, More Is Better
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 349.34 (SE +/- 1.54, N = 3) - -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 349.01 (SE +/- 1.09, N = 3) - -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
1. (CXX) g++ options: -O3 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

LULESH

LULESH 2.0.3 - z/s, More Is Better
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 6872.83 (SE +/- 82.30, N = 4)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 6878.62 (SE +/- 67.35, N = 3)
1. (CXX) g++ options: -O3 -fopenmp -lm -pthread -lmpi_cxx -lmpi

miniFE

Problem Size: Small

miniFE 2.2 - Problem Size: Small - CG Mflops, More Is Better
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 6411.43 (SE +/- 0.36, N = 3)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 6407.12 (SE +/- 1.22, N = 3)
1. (CXX) g++ options: -O3 -fopenmp -pthread -lmpi_cxx -lmpi

TensorFlow Lite

Model: NASNet Mobile

TensorFlow Lite 2020-08-23 - Model: NASNet Mobile - Microseconds, Fewer Is Better
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 124859 (SE +/- 671.08, N = 3)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 124941 (SE +/- 607.87, N = 3)

TensorFlow Lite

Model: Mobilenet Quant

TensorFlow Lite 2020-08-23 - Model: Mobilenet Quant - Microseconds, Fewer Is Better
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 98345.3 (SE +/- 62.98, N = 3)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 98287.0 (SE +/- 93.88, N = 3)

Darmstadt Automotive Parallel Heterogeneous Suite

Backend: OpenMP - Kernel: NDT Mapping

Darmstadt Automotive Parallel Heterogeneous Suite - Backend: OpenMP - Kernel: NDT Mapping - Test Cases Per Minute, More Is Better
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 1033.44 (SE +/- 11.85, N = 3)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 1033.74 (SE +/- 6.53, N = 3)
1. (CXX) g++ options: -O3 -std=c++11 -fopenmp

Parboil

Test: OpenMP LBM

Parboil 2.5 - Test: OpenMP LBM - Seconds, Fewer Is Better
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 114.07 (SE +/- 0.03, N = 3)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 114.10 (SE +/- 0.02, N = 3)
1. (CXX) g++ options: -lm -lpthread -lgomp -O3 -ffast-math -fopenmp

ASKAP

Test: tConvolve MT - Degridding

ASKAP 1.0 - Test: tConvolve MT - Degridding - Million Grid Points Per Second, More Is Better
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 2054.71 (SE +/- 1.84, N = 3)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 2054.38 (SE +/- 1.75, N = 3)
1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp

FFTW

Build: Stock - Size: 1D FFT Size 32

FFTW 3.3.6 - Build: Stock - Size: 1D FFT Size 32 - Mflops, More Is Better
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 22774 (SE +/- 0.67, N = 3) - -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 22777 (SE +/- 4.36, N = 3) - -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
1. (CC) gcc options: -pthread -O3 -lm
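The FFTW result above is reported in Mflops rather than raw transform time. A common convention (used by FFTW's own benchFFT harness for complex transforms, and assumed here rather than stated in this export) is Mflops = 5 * N * log2(N) / time-in-microseconds, so the sketch below shows how a ~22.8k Mflops figure maps back to an approximate per-transform time for a size-32 1D FFT.

    # Sketch of the assumed benchFFT-style Mflops metric for a complex 1D FFT of size N.
    import math

    def mflops_to_microseconds(n: int, mflops: float) -> float:
        """Invert mflops = 5 * n * log2(n) / t_us to recover the per-transform time."""
        return 5 * n * math.log2(n) / mflops

    # Using the reported ~22774 Mflops for the size-32 stock build:
    print(f"{mflops_to_microseconds(32, 22774):.4f} us per transform")  # ~0.0351 us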

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: S3D

SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: S3D - GFLOPS, More Is Better
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 125.08 (SE +/- 0.71, N = 3) - -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 125.07 (SE +/- 0.56, N = 3) - -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
1. (CXX) g++ options: -O3 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

ASKAP

Test: tConvolve MPI - Gridding

ASKAP 1.0 - Test: tConvolve MPI - Gridding - Mpix/sec, More Is Better
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 5046.07 (SE +/- 0.00, N = 3)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 5046.07 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp

oneDNN

Harness: Deconvolution Batch shapes_3d - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 2.1.2 - Harness: Deconvolution Batch shapes_3d - Data Type: bf16bf16bf16 - Engine: CPU - ms, Fewer Is Better
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 4.67789 (SE +/- 0.07736, N = 15) - -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 4.27
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 4.67293 (SE +/- 0.09231, N = 15) - -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 4.21
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN

Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU

oneDNN 2.1.2 - Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU - ms, Fewer Is Better
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 1.06856 (SE +/- 0.01500, N = 15) - -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 0.98
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 1.07696 (SE +/- 0.02280, N = 12) - -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 0.99
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN

Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU

oneDNN 2.1.2 - Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU - ms, Fewer Is Better
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 4.13169 (SE +/- 0.01117, N = 3) - -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 4.05
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 4.23668 (SE +/- 0.08547, N = 15) - -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 4.02
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN

Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU

oneDNN 2.1.2 - Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU - ms, Fewer Is Better
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 6.27787 (SE +/- 0.12013, N = 15) - -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 3.58
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 6.60245 (SE +/- 0.13247, N = 12) - -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 3.53
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
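Across this section the two flag sets track each other within noise on most tests; the Deconvolution Batch shapes_1d f32 case above shows the widest gap, at roughly 5%. For a lower-is-better metric, that spread can be quantified from the reported means with a small helper like the one below (an illustrative calculation, not part of the Phoronix Test Suite output).

    # Relative difference between the two configurations for a lower-is-better result.
    def pct_slower(candidate_ms: float, baseline_ms: float) -> float:
        """Percent by which candidate_ms trails baseline_ms (positive = slower)."""
        return (candidate_ms - baseline_ms) / baseline_ms * 100.0

    # oneDNN Deconvolution Batch shapes_1d, f32: march=native+AVX512 vs march=sapphirerapids
    print(f"{pct_slower(6.60245, 6.27787):.2f}% slower")  # ~5.17%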

Parboil

Test: OpenMP MRI Gridding

Parboil 2.5 - Test: OpenMP MRI Gridding - Seconds, Fewer Is Better
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 49.15 (SE +/- 0.68, N = 15)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 48.57 (SE +/- 0.99, N = 12)
1. (CXX) g++ options: -lm -lpthread -lgomp -O3 -ffast-math -fopenmp


Phoronix Test Suite v10.8.5