12700k HPC+OpenCL AVX512 performance profiling

Intel Core i7-12700K testing with an MSI PRO Z690-A DDR4(MS-7D25) v1.0 (1.15 BIOS) and Gigabyte AMD Radeon RX 5600 OEM/5600 XT / 5700/5700 6GB on Pop 21.04 via the Phoronix Test Suite.

Compare your own system(s) to this result file with the Phoronix Test Suite by running the command: phoronix-test-suite benchmark 2112125-TJ-12700KHPC62
Runs:
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: December 09 2021, test duration 10 Hours, 32 Minutes
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: December 11 2021, test duration 8 Hours, 28 Minutes
Average test duration: 9 Hours, 30 Minutes


12700k HPC+OpenCL AVX512 performance profiling: system details (OpenBenchmarking.org / Phoronix Test Suite)

Processor: Intel Core i7-12700K @ 6.30GHz (8 Cores / 16 Threads)
Motherboard: MSI PRO Z690-A DDR4(MS-7D25) v1.0 (1.15 BIOS)
Chipset: Intel Device 7aa7
Memory: 32GB
Disks: 500GB Western Digital WDS500G2B0C-00PXH0 + 3 x 10001GB Seagate ST10000DM0004-1Z + 128GB HP SSD S700 Pro (one run); 500GB Western Digital WDS500G2B0C-00PXH0 + 3 x 10001GB Seagate ST10000DM0004-1Z + 300GB Western Digital WD3000GLFS-0 + 128GB HP SSD S700 Pro (other run)
Graphics: Gigabyte AMD Radeon RX 5600 OEM/5600 XT / 5700/5700 6GB (1650/750MHz)
Audio: Realtek ALC897
Monitor: LG HDR WQHD
Network: Intel I225-V
OS: Pop 21.04
Kernel: 5.15.5-76051505-generic (x86_64)
Desktop: GNOME Shell 3.38.4
Display Server: X Server 1.20.11
OpenGL: 4.6 Mesa 21.2.2 (LLVM 12.0.0)
OpenCL: OpenCL 2.2 AMD-APP (3361.0)
Vulkan: 1.2.185
Compiler: GCC 11.1.0
File-System: ext4
Screen Resolution: 3440x1440

System Logs:
- Transparent Huge Pages: madvise
- 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: CXXFLAGS="-O3 -march=sapphirerapids -mno-amx-tile -mno-amx-int8 -mno-amx-bf16" CFLAGS="-O3 -march=sapphirerapids -mno-amx-tile -mno-amx-int8 -mno-amx-bf16"
- 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: CXXFLAGS="-O3 -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect" CFLAGS="-O3 -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect" FFLAGS="-O3 -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect"
- GCC configuration: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-11-RPS7jb/gcc-11-11.1.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-RPS7jb/gcc-11-11.1.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
- Disk: NONE / errors=remount-ro,noatime,rw / Block Size: 4096
- Scaling Governor: intel_pstate powersave
- CPU Microcode: 0x15
- Thermald 2.4.3
- Graphics: GLAMOR; BAR1 / Visible vRAM Size: 6128 MB
- Python 2.7.18 + Python 3.9.5
- Security: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced IBRS IBPB: conditional RSB filling + srbds: Not affected + tsx_async_abort: Not affected
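Since the two runs differ only in compiler flags, a quick way to see exactly which flags changed is a set difference over the two CFLAGS strings. The following is a minimal Python sketch using the CFLAGS copied from the system logs above. It compares flag text only, so it cannot show which instruction-set extensions `-march=sapphirerapids` or `-march=native` enable implicitly; GCC can list the resolved target options with `gcc -march=native -Q --help=target`.

```python
import shlex

# CFLAGS for each run, copied from the system logs above.
SAPPHIRERAPIDS_CFLAGS = "-O3 -march=sapphirerapids -mno-amx-tile -mno-amx-int8 -mno-amx-bf16"
NATIVE_CFLAGS = (
    "-O3 -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw "
    "-mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni "
    "-mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect"
)


def flag_diff(a: str, b: str):
    """Return (flags only in a, flags only in b) as sorted lists."""
    flags_a, flags_b = set(shlex.split(a)), set(shlex.split(b))
    return sorted(flags_a - flags_b), sorted(flags_b - flags_a)


only_sapphirerapids, only_native = flag_diff(SAPPHIRERAPIDS_CFLAGS, NATIVE_CFLAGS)
print("Only in the sapphirerapids run:", " ".join(only_sapphirerapids))
print("Only in the native run:", " ".join(only_native))
```

Only `-O3` is shared; the sapphirerapids run adds a target plus three AMX opt-outs, while the native run adds `-march=native` and thirteen explicit AVX-512 flags.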

[Comparison chart: 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt vs. 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt, showing the percentage spread between the two runs for each test. The largest spreads are Caffe GoogleNet CPU 100 iterations (17.4%), Caffe GoogleNet CPU 200 iterations (15.5%), and oneDNN Deconvolution Batch shapes_1d bf16bf16bf16 (10.5%).]
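The comparison percentages appear to be how much larger the bigger of the two results is than the smaller one, regardless of whether higher or lower is better for that test. That is an inference, not a documented formula, but this short Python sketch reproduces the top entries from the per-test values on this page.

```python
def spread_percent(a: float, b: float) -> float:
    """Percentage by which the larger of two results exceeds the smaller."""
    hi, lo = max(a, b), min(a, b)
    return (hi / lo - 1.0) * 100.0


# (sapphirerapids run, native run) values taken from the per-test results.
results = {
    "Caffe GoogleNet - CPU - 100 (ms)": (79674, 67852),
    "Caffe GoogleNet - CPU - 200 (ms)": (157272, 136187),
    "oneDNN Deconvolution shapes_1d bf16 (ms)": (7.43204, 6.72455),
    "SHOC OpenCL Max SP Flops (GFLOPS)": (8376637, 9031599),
}

for name, (sapphirerapids, native) in results.items():
    print(f"{name}: {spread_percent(sapphirerapids, native):.1f}%")
```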

[Results overview table: every test in this result file with one value per run; see the individual test results for each value.]

Caffe

This is a benchmark of the Caffe deep learning framework and currently supports the AlexNet and GoogleNet models and execution on both CPUs and NVIDIA GPUs. Learn more via the OpenBenchmarking.org test page.

Caffe 2020-02-13 - Model: GoogleNet - Acceleration: CPU - Iterations: 100 (Milli-Seconds, Fewer Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 79674 (SE +/- 640.77, N = 15)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 67852 (SE +/- 292.08, N = 3)
Additional flags, march=sapphirerapids run: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
Additional flags, march=native run: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
1. (CXX) g++ options: -O3 -fPIC -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas
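Each result reports a standard error (SE) over N runs. As a rough check of whether a gap between the two configurations exceeds run-to-run noise, this Python sketch divides the difference of the means by the combined standard error, assuming the two SE values are independent so their variances add. This simple ratio is an illustration only, not a statistic the Phoronix Test Suite reports; the SHOC OpenCL Triad values come from that test's result on this page.

```python
import math


def separation(mean_a: float, se_a: float, mean_b: float, se_b: float) -> float:
    """|mean_a - mean_b| expressed in units of the combined standard error.

    Assumes se_a and se_b are independent standard errors of the mean.
    """
    combined_se = math.sqrt(se_a ** 2 + se_b ** 2)
    return abs(mean_a - mean_b) / combined_se


# Caffe GoogleNet, CPU, 100 iterations (ms): sapphirerapids run vs. native run
googlenet = separation(79674, 640.77, 67852, 292.08)
# SHOC OpenCL Triad (GB/s): native run vs. sapphirerapids run
triad = separation(12.31, 0.13, 12.60, 0.14)

print(f"GoogleNet 100: {googlenet:.1f} combined SEs apart")
print(f"SHOC Triad:    {triad:.1f} combined SEs apart")
```

With these values the GoogleNet gap is about 17 combined SEs while the Triad gap is about 1.5, which suggests only the former is clearly larger than the reported noise.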

Caffe 2020-02-13 - Model: GoogleNet - Acceleration: CPU - Iterations: 200 (Milli-Seconds, Fewer Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 157272 (SE +/- 1277.18, N = 3)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 136187 (SE +/- 778.54, N = 3)
Additional flags, march=sapphirerapids run: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
Additional flags, march=native run: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
1. (CXX) g++ options: -O3 -fPIC -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas

oneDNN

This is a test of Intel oneDNN, an Intel-optimized library for deep neural networks, using its built-in benchdnn functionality. The result is the total perf time reported. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and MKL-DNN before being rebranded as part of the Intel oneAPI initiative. Learn more via the OpenBenchmarking.org test page.

oneDNN 2.1.2 - Harness: Deconvolution Batch shapes_1d - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 7.43204 (SE +/- 0.04340, N = 3; MIN: 6.53)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 6.72455 (SE +/- 0.01945, N = 3; MIN: 6.2)
Additional flags, march=sapphirerapids run: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
Additional flags, march=native run: -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

SHOC Scalable HeterOgeneous Computing

The CUDA and OpenCL version of Vetter's Scalable HeterOgeneous Computing benchmark suite. SHOC provides a number of different benchmark programs for evaluating the performance and stability of compute devices. Learn more via the OpenBenchmarking.org test page.

SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: Max SP Flops (GFLOPS, More Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 8376637 (SE +/- 65495.62, N = 3)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 9031599 (SE +/- 142226.18, N = 9)
Additional flags, march=sapphirerapids run: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
Additional flags, march=native run: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
1. (CXX) g++ options: -O3 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

CP2K Molecular Dynamics

CP2K is an open-source molecular dynamics software package focused on quantum chemistry and solid-state physics. This test profile currently uses the SSMP (OpenMP) version of cp2k. Learn more via the OpenBenchmarking.org test page.

CP2K Molecular Dynamics 8.2 - Input: Fayalite-FIST (Seconds, Fewer Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 398.35
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 372.74

Caffe

Caffe 2020-02-13 - Model: GoogleNet - Acceleration: CPU - Iterations: 1000 (Milli-Seconds, Fewer Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 726541 (SE +/- 11125.73, N = 9)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 680177 (SE +/- 2693.85, N = 3)
Additional flags, march=sapphirerapids run: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
Additional flags, march=native run: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
1. (CXX) g++ options: -O3 -fPIC -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas

oneDNN

oneDNN 2.1.2 - Harness: Convolution Batch Shapes Auto - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 6.77334 (SE +/- 0.06030, N = 3; MIN: 6.15)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 6.34910 (SE +/- 0.00598, N = 3; MIN: 6.03)
Additional flags, march=sapphirerapids run: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
Additional flags, march=native run: -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

LeelaChessZero

LeelaChessZero (lc0 / lczero) is a chess engine automated via neural networks. This test profile can be used for OpenCL, CUDA + cuDNN, and BLAS (CPU-based) benchmarking. Learn more via the OpenBenchmarking.org test page.

LeelaChessZero 0.28 - Backend: BLAS (Nodes Per Second, More Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 906 (SE +/- 12.84, N = 9)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 959 (SE +/- 11.95, N = 4)
Additional flags, march=sapphirerapids run: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
Additional flags, march=native run: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
1. (CXX) g++ options: -flto -O3 -pthread

Pennant

Pennant is an application focused on hydrodynamics on general unstructured meshes in 2D. Learn more via the OpenBenchmarking.org test page.

Pennant 1.0.1 - Test: leblancbig (Hydro Cycle Time - Seconds, Fewer Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 49.93 (SE +/- 0.05, N = 3)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 47.39 (SE +/- 0.03, N = 3)
1. (CXX) g++ options: -fopenmp -pthread -lmpi_cxx -lmpi

Darktable

Darktable is an open-source photography / workflow application. This test uses any system-installed Darktable program or, on Windows, automatically downloads the pre-built binary from the project. Learn more via the OpenBenchmarking.org test page.

Darktable 3.4.1 - Test: Boat - Acceleration: OpenCL (Seconds, Fewer Is Better)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 4.151 (SE +/- 0.043, N = 3)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 3.971 (SE +/- 0.049, N = 3)

Caffe

Caffe 2020-02-13 - Model: AlexNet - Acceleration: CPU - Iterations: 1000 (Milli-Seconds, Fewer Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 247624 (SE +/- 3023.73, N = 9)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 238435 (SE +/- 1961.39, N = 3)
Additional flags, march=sapphirerapids run: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
Additional flags, march=native run: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
1. (CXX) g++ options: -O3 -fPIC -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas

RNNoise

RNNoise is a recurrent neural network for audio noise reduction developed by Mozilla and Xiph.Org. This test profile is single-threaded and measures the time to denoise a sample 26-minute-long 16-bit RAW audio file using this recurrent neural network noise suppression library. Learn more via the OpenBenchmarking.org test page.

RNNoise 2020-06-28 (Seconds, Fewer Is Better)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 17.12 (SE +/- 0.01, N = 3)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 16.51 (SE +/- 0.21, N = 3)
Additional flags, march=native run: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
Additional flags, march=sapphirerapids run: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
1. (CC) gcc options: -O3 -pedantic -fvisibility=hidden
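Because the test input length is stated, the RNNoise times can be turned into a real-time factor: seconds of audio denoised per second of processing. This Python sketch assumes the sample is exactly 26 minutes (1560 s) long; the description only says "26 minute".

```python
AUDIO_SECONDS = 26 * 60  # assumption: the sample is exactly 26 minutes long


def realtime_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Seconds of audio denoised per second of wall-clock time."""
    return audio_seconds / processing_seconds


runs = {
    "12700k AVX512 march=native + AVX512": 17.12,
    "12700k AVX512 march=sapphirerapids": 16.51,
}
for name, seconds in runs.items():
    print(f"{name}: {realtime_factor(AUDIO_SECONDS, seconds):.1f}x real time")
```

Under that assumption, both builds denoise this single-threaded workload at roughly 91x to 95x real time.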

oneDNN

oneDNN 2.1.2 - Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 1388.34 (SE +/- 11.91, N = 8; MIN: 1265.31)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 1341.61 (SE +/- 3.44, N = 3; MIN: 1262.01)
Additional flags, march=sapphirerapids run: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
Additional flags, march=native run: -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

SHOC Scalable HeterOgeneous Computing

SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: Bus Speed Readback (GB/s, More Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 20.39 (SE +/- 0.01, N = 3)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 21.05 (SE +/- 0.21, N = 15)
Additional flags, march=sapphirerapids run: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
Additional flags, march=native run: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
1. (CXX) g++ options: -O3 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

Pennant

Pennant 1.0.1 - Test: sedovbig (Hydro Cycle Time - Seconds, Fewer Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 69.89 (SE +/- 0.35, N = 3)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 67.77 (SE +/- 0.13, N = 3)
1. (CXX) g++ options: -fopenmp -pthread -lmpi_cxx -lmpi

Darmstadt Automotive Parallel Heterogeneous Suite

DAPHNE is the Darmstadt Automotive Parallel HeterogeNEous Benchmark Suite, with OpenCL, CUDA, and OpenMP test cases for evaluating programming models in the context of autonomous vehicle driving. Learn more via the OpenBenchmarking.org test page.

Darmstadt Automotive Parallel Heterogeneous Suite - Backend: OpenMP - Kernel: Points2Image (Test Cases Per Minute, More Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 35022.15 (SE +/- 389.77, N = 3)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 36114.81 (SE +/- 233.74, N = 3)
1. (CXX) g++ options: -O3 -std=c++11 -fopenmp

HPL Linpack

HPL is a well-known portable Linpack implementation for distributed memory systems. This test profile tests upstream HPL directly, outside the scope of the HPC Challenge test profile also available through the Phoronix Test Suite (hpcc). The test profile attempts to generate an optimized HPL.dat input file based on the CPU/memory under test. The automated HPL.dat input generation is still being tuned, so for now this test profile remains "experimental". Learn more via the OpenBenchmarking.org test page.
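The test profile's own HPL.dat generation logic is not described here. For orientation, a widely used rule of thumb is to pick the problem size N so the N x N matrix of doubles fills about 80% of RAM, round N down to a multiple of the block size NB, and choose a nearly square P x Q process grid with P <= Q. This Python sketch applies that heuristic; the 80% fraction, NB = 192, treating the 32GB of memory as 32 GiB, and 8 MPI processes are all assumptions, not values taken from this test profile.

```python
import math


def hpl_problem_size(mem_bytes: int, fraction: float = 0.8, nb: int = 192) -> int:
    """Largest N, rounded down to a multiple of the block size nb, such that an
    N x N matrix of 8-byte doubles fits in `fraction` of mem_bytes."""
    n = math.isqrt(int(mem_bytes * fraction) // 8)
    return (n // nb) * nb


def process_grid(nprocs: int) -> tuple:
    """Most nearly square P x Q process grid with P <= Q and P * Q == nprocs."""
    p = math.isqrt(nprocs)
    while nprocs % p:
        p -= 1
    return p, nprocs // p


print(hpl_problem_size(32 * 2**30))  # 58560 (assumes 32 GiB of RAM)
print(process_grid(8))               # (2, 4) (assumes 8 MPI ranks)
```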

HPL Linpack 2.3 (GFLOPS, More Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 97.55 (SE +/- 0.14, N = 3)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 100.59 (SE +/- 1.07, N = 3)
Additional flags, march=sapphirerapids run: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
Additional flags, march=native run: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
1. (CC) gcc options: -O3 -lopenblas -lm -pthread -lmpi

Intel MPI Benchmarks

The Intel MPI Benchmarks stress MPI implementations. At this point the test profile aggregates results for some common MPI functionality. Learn more via the OpenBenchmarking.org test page.

Intel MPI Benchmarks 2019.3 - Test: IMB-MPI1 PingPong (Average Mbytes/sec, More Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 10004.46 (SE +/- 140.45, N = 15; MIN: 6.66 / MAX: 34960.72)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 10294.25 (SE +/- 131.85, N = 3; MIN: 10.93 / MAX: 34708.09)
Additional flags, march=sapphirerapids run: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
Additional flags, march=native run: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
1. (CXX) g++ options: -O3 -O0 -pedantic -fopenmp -pthread -lmpi_cxx -lmpi

ACES DGEMM

This is a multi-threaded DGEMM benchmark. Learn more via the OpenBenchmarking.org test page.
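A DGEMM rate is normally derived from the conventional 2 * n^3 operation count for multiplying two n x n double-precision matrices (one multiply and one add per inner-loop step). ACES DGEMM's exact accounting is not stated on this page, so treat this Python sketch as an illustration of the standard formula only. The pure-Python triple loop is orders of magnitude slower than an optimized BLAS and exists only to have something to time.

```python
import random
import time


def dgemm_gflops(n: int, seconds: float) -> float:
    """GFLOP/s for one n x n by n x n multiply, using the 2 * n**3 count."""
    return 2.0 * n ** 3 / seconds / 1e9


def naive_dgemm(a, b):
    """Plain triple-loop C = A * B on lists of lists (i-k-j loop order)."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        c_row = c[i]
        for k in range(n):
            a_ik = a[i][k]
            b_row = b[k]
            for j in range(n):
                c_row[j] += a_ik * b_row[j]  # one multiply + one add
    return c


n = 64
a = [[random.random() for _ in range(n)] for _ in range(n)]
b = [[random.random() for _ in range(n)] for _ in range(n)]
start = time.perf_counter()
naive_dgemm(a, b)
elapsed = time.perf_counter() - start
print(f"n={n}: {dgemm_gflops(n, elapsed):.4f} GFLOP/s (pure Python, for illustration)")
```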

ACES DGEMM 1.0 - Sustained Floating-Point Rate (GFLOP/s, More Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 4.875598 (SE +/- 0.034356, N = 3)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 5.016124 (SE +/- 0.013832, N = 3)
Additional flags, march=sapphirerapids run: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
Additional flags, march=native run: -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
1. (CC) gcc options: -O3 -march=native -fopenmp

cl-mem

A basic OpenCL memory benchmark. Learn more via the OpenBenchmarking.org test page.

cl-mem 2017-01-13 - Benchmark: Write (GB/s, More Is Better)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 248.4 (SE +/- 0.15, N = 3)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 255.3 (SE +/- 0.17, N = 3)
1. (CC) gcc options: -O2 -flto -lOpenCL

oneDNN

oneDNN 2.1.2 - Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 0.565699 (SE +/- 0.005655, N = 3; MIN: 0.47)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 0.550588 (SE +/- 0.005334, N = 6; MIN: 0.45)
Additional flags, march=sapphirerapids run: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
Additional flags, march=native run: -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

FFTW

FFTW is a C subroutine library for computing the discrete Fourier transform (DFT) in one or more dimensions. Learn more via the OpenBenchmarking.org test page.
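FFTW's benchmarking methodology (benchFFT) reports "Mflops" as a normalized rate rather than a count of executed instructions: 5 N log2(N) divided by the time for one transform in microseconds for complex data, and half that for real data, where N is the total number of points. Assuming the FFTW results here follow that convention and are complex transforms, this Python sketch converts between the reported Mflops and time per transform.

```python
import math


def fftw_mflops(n_points: int, time_us: float, real_data: bool = False) -> float:
    """Normalized 'Mflops' in the benchFFT convention.

    n_points is the total transform size (e.g. 32 * 32 for a 2D 32x32 FFT);
    time_us is the time for one transform in microseconds.
    """
    flops = 5.0 * n_points * math.log2(n_points)
    if real_data:
        flops /= 2.0  # real-data transforms are credited half the work
    return flops / time_us


def time_per_transform_us(n_points: int, mflops: float, real_data: bool = False) -> float:
    """Invert fftw_mflops: microseconds per transform implied by a reported rate."""
    return fftw_mflops(n_points, 1.0, real_data) / mflops


# Float + SSE, 2D FFT size 32 (32 x 32 = 1024 points), sapphirerapids run: 80496 Mflops
print(f"{time_per_transform_us(32 * 32, 80496):.3f} us per transform")  # 0.636
```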

FFTW 3.3.6 - Build: Float + SSE - Size: 2D FFT Size 32 (Mflops, More Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 80496 (SE +/- 272.26, N = 3)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 82515 (SE +/- 920.77, N = 3)
Additional flags, march=sapphirerapids run: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
Additional flags, march=native run: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
1. (CC) gcc options: -pthread -O3 -lm

Timed MAFFT Alignment

This test performs an alignment of 100 pyruvate decarboxylase sequences. Learn more via the OpenBenchmarking.org test page.

Timed MAFFT Alignment 7.471 - Multiple Sequence Alignment - LSU RNA (Seconds, Fewer Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 7.703 (SE +/- 0.014, N = 3)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 7.523 (SE +/- 0.012, N = 3)
1. (CC) gcc options: -std=c99 -O3 -lm -lpthread

SHOC Scalable HeterOgeneous Computing

SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: Triad (GB/s, More Is Better)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 12.31 (SE +/- 0.13, N = 6)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 12.60 (SE +/- 0.14, N = 3)
Additional flags, march=native run: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
Additional flags, march=sapphirerapids run: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
1. (CXX) g++ options: -O3 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

Caffe

Caffe 2020-02-13 - Model: AlexNet - Acceleration: CPU - Iterations: 100 (Milli-Seconds, Fewer Is Better)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 23659 (SE +/- 282.19, N = 3)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 23122 (SE +/- 319.02, N = 3)
Additional flags, march=native run: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
Additional flags, march=sapphirerapids run: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
1. (CXX) g++ options: -O3 -fPIC -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas

FFTW

FFTW 3.3.6 - Build: Float + SSE - Size: 2D FFT Size 4096 (Mflops, More Is Better)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 42935 (SE +/- 90.35, N = 3)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 43900 (SE +/- 484.91, N = 5)
Additional flags, march=native run: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
Additional flags, march=sapphirerapids run: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
1. (CC) gcc options: -pthread -O3 -lm

ASKAP

ASKAP is a set of benchmarks from the Australian SKA Pathfinder. The principal ASKAP benchmarks are the Hogbom Clean Benchmark (tHogbomClean) and the Convolutional Resampling Benchmark (tConvolve). Some earlier ASKAP benchmarks are also included for OpenCL and CUDA execution of tConvolve. Learn more via the OpenBenchmarking.org test page.

ASKAP 1.0 - Test: tConvolve OpenMP - Gridding (Million Grid Points Per Second, More Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 1866.30 (SE +/- 4.37, N = 3)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 1906.52 (SE +/- 12.08, N = 3)
1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp
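Hogbom CLEAN deconvolves a dirty image by repeatedly finding the brightest residual pixel and subtracting a scaled copy of the point spread function (PSF) centred there. Below is a minimal 1D Python sketch of that loop. The `gain`, `threshold`, and `max_iter` defaults are illustrative values, not ASKAP's settings, and the ASKAP benchmark itself operates on 2D images.

```python
def hogbom_clean(dirty, psf, gain=0.1, threshold=1e-3, max_iter=1000):
    """1D Hogbom CLEAN sketch.

    dirty: dirty image as a list of floats.
    psf:   point spread function with its peak at index len(psf) // 2.
    Returns (model, residual): the accumulated CLEAN components and what
    is left of the image after subtraction.
    """
    residual = list(dirty)
    model = [0.0] * len(dirty)
    centre = len(psf) // 2
    for _ in range(max_iter):
        # Locate the strongest residual pixel (by absolute value).
        peak_pos = max(range(len(residual)), key=lambda i: abs(residual[i]))
        peak = residual[peak_pos]
        if abs(peak) < threshold:
            break
        # Record a fraction of the peak as a model component ...
        model[peak_pos] += gain * peak
        # ... and subtract the correspondingly scaled, shifted PSF.
        for j, p in enumerate(psf):
            idx = peak_pos + j - centre
            if 0 <= idx < len(residual):
                residual[idx] -= gain * peak * p
    return model, residual
```

With a PSF of `[0.5, 1.0, 0.5]` and a dirty image made of that PSF scaled by 2 at index 5, the loop shrinks the residual by a factor of (1 - gain) per iteration. The model converges to about 2.0 at index 5.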

FFTW

FFTW 3.3.6 - Build: Float + SSE - Size: 1D FFT Size 32 (Mflops, More Is Better)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 31560 (SE +/- 195.07, N = 3); flags: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 32180 (SE +/- 18.34, N = 3); flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
1. (CC) gcc options: -pthread -O3 -lm

FFTW 3.3.6 - Build: Stock - Size: 2D FFT Size 4096 (Mflops, More Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 13770 (SE +/- 62.76, N = 3); flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 14020 (SE +/- 33.65, N = 3); flags: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
1. (CC) gcc options: -pthread -O3 -lm

RELION

RELION - REgularised LIkelihood OptimisatioN - is a stand-alone computer program for Maximum A Posteriori refinement of (multiple) 3D reconstructions or 2D class averages in cryo-electron microscopy (cryo-EM). It is developed in the research group of Sjors Scheres at the MRC Laboratory of Molecular Biology. Learn more via the OpenBenchmarking.org test page.

RELION 3.1.1 - Test: Basic - Device: CPU (Seconds, Fewer Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 1684.70 (SE +/- 3.94, N = 3); flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 1656.71 (SE +/- 0.43, N = 3); flags: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
1. (CXX) g++ options: -O3 -fopenmp -std=c++0x -rdynamic -ldl -ltiff -lfftw3f -lfftw3 -lpng -pthread -lmpi_cxx -lmpi

cl-mem

A basic OpenCL memory benchmark. Learn more via the OpenBenchmarking.org test page.

cl-mem 2017-01-13 - Benchmark: Copy (GB/s, More Is Better)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 195.2 (SE +/- 0.12, N = 3)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 198.4 (SE +/- 0.12, N = 3)
1. (CC) gcc options: -O2 -flto -lOpenCL

Intel MPI Benchmarks

Intel MPI Benchmarks stress MPI implementations. At present, the test profile aggregates results for some common MPI functionality. Learn more via the OpenBenchmarking.org test page.

Intel MPI Benchmarks 2019.3 - Test: IMB-MPI1 Sendrecv (Average Mbytes/sec, More Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 12273.54 (SE +/- 111.25, N = 3; MAX: 66577.1); flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 12471.89 (SE +/- 81.05, N = 3; MAX: 66000.84); flags: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
1. (CXX) g++ options: -O3 -O0 -pedantic -fopenmp -pthread -lmpi_cxx -lmpi

Intel MPI Benchmarks 2019.3 - Test: IMB-MPI1 Sendrecv (Average usec, Fewer Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 53.50 (SE +/- 0.35, N = 3; MIN: 0.19 / MAX: 1786.28); flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 52.66 (SE +/- 0.28, N = 3; MIN: 0.19 / MAX: 1702.76); flags: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
1. (CXX) g++ options: -O3 -O0 -pedantic -fopenmp -pthread -lmpi_cxx -lmpi

Darktable

Darktable is an open-source photography/workflow application. This test uses any system-installed Darktable program or, on Windows, automatically downloads the pre-built binary from the project. Learn more via the OpenBenchmarking.org test page.

Darktable 3.4.1 - Test: Server Rack - Acceleration: OpenCL (Seconds, Fewer Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 0.133 (SE +/- 0.001, N = 3)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 0.131 (SE +/- 0.000, N = 3)

OpenFOAM

OpenFOAM is the leading free, open source software for computational fluid dynamics (CFD). Learn more via the OpenBenchmarking.org test page.

OpenFOAM 8 - Input: Motorbike 30M (Seconds, Fewer Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 137.71 (SE +/- 0.70, N = 3); flags: -ldynamicMesh
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 135.71 (SE +/- 0.13, N = 3); flags: -lspecie -lfiniteVolume -lfvOptions -lmeshTools -lsampling
1. (CXX) g++ options: -std=c++11 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -lgenericPatchFields -lOpenFOAM -ldl -lm

oneDNN

This is a test of Intel oneDNN, an Intel-optimized library for Deep Neural Networks, using its built-in benchdnn functionality. The result is the total perf time reported. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and, before that, MKL-DNN, before being rebranded as part of the Intel oneAPI initiative. Learn more via the OpenBenchmarking.org test page.

oneDNN 2.1.2 - Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 2596.32 (SE +/- 3.87, N = 3; MIN: 2454.76); flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 2560.48 (SE +/- 26.69, N = 3; MIN: 2405.01); flags: -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
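In benchdnn naming, a data type such as u8s8f32 lists the source, weight, and destination types: unsigned 8-bit inputs, signed 8-bit weights, and a 32-bit float result. "IP" in the IP Shapes harnesses refers to the inner-product (fully connected) primitive. The sketch below is a plain-Python illustration of that arithmetic under those assumptions, not oneDNN's kernel. The function name and the single combined `scale` factor are hypothetical simplifications. Python floats are double precision, so the final f32 rounding is not modelled.

```python
def ip_u8s8f32(src_u8, weights_s8, scale):
    """Inner product in the u8s8f32 style (illustrative sketch).

    src_u8:     list of ints in [0, 255] (unsigned 8-bit activations).
    weights_s8: list of output rows, each a list of ints in [-128, 127].
    scale:      combined dequantisation factor applied to each accumulator.
    Products are accumulated exactly in integers (a real int8 kernel
    typically uses a 32-bit integer accumulator), then scaled to float.
    """
    if any(not 0 <= v <= 255 for v in src_u8):
        raise ValueError("source value out of u8 range")
    out = []
    for row in weights_s8:
        if len(row) != len(src_u8):
            raise ValueError("shape mismatch between source and weights")
        if any(not -128 <= w <= 127 for w in row):
            raise ValueError("weight out of s8 range")
        acc = sum(a * w for a, w in zip(src_u8, row))  # integer accumulation
        out.append(acc * scale)  # convert to a floating-point output
    return out
```

For instance, `ip_u8s8f32([1, 2, 3], [[1, -1, 2]], 0.5)` accumulates 1 - 2 + 6 = 5 in the integer domain and returns `[2.5]`.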

NAMD

NAMD is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. NAMD was developed by the Theoretical and Computational Biophysics Group in the Beckman Institute for Advanced Science and Technology at the University of Illinois at Urbana-Champaign. Learn more via the OpenBenchmarking.org test page.

NAMD 2.14 - ATPase Simulation - 327,506 Atoms (days/ns, Fewer Is Better)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 1.17871 (SE +/- 0.00713, N = 3)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 1.16249 (SE +/- 0.00085, N = 3)
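NAMD reports days/ns (fewer is better), whereas the GROMACS result on this page uses ns/day (more is better). The two units are reciprocals. Converting the NAMD results above for the march=native + AVX512 and march=sapphirerapids runs respectively, rounded to three decimals, gives the figures below. This is only a unit conversion: the two programs run different workloads, so the numbers are not directly comparable across tests.

```latex
\text{ns/day} = \frac{1}{\text{days/ns}}, \qquad
\frac{1}{1.17871} \approx 0.848\ \text{ns/day}, \qquad
\frac{1}{1.16249} \approx 0.860\ \text{ns/day}
```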

Intel MPI Benchmarks

Intel MPI Benchmarks 2019.3 - Test: IMB-P2P PingPong (Average Msg/sec, More Is Better)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 8513496 (SE +/- 102028.44, N = 3; MIN: 1946 / MAX: 22082360); flags: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 8628982 (SE +/- 23849.34, N = 3; MIN: 1994 / MAX: 22289308); flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
1. (CXX) g++ options: -O3 -O0 -pedantic -fopenmp -pthread -lmpi_cxx -lmpi

oneDNN

oneDNN 2.1.2 - Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 8.89450 (SE +/- 0.13446, N = 14; MIN: 8.61); flags: -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 8.78268 (SE +/- 0.01096, N = 3; MIN: 8.63); flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 2.1.2 - Harness: IP Shapes 3D - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 3.56837 (SE +/- 0.02891, N = 3; MIN: 3.16); flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 3.52788 (SE +/- 0.02720, N = 10; MIN: 3.15); flags: -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

FFTW

FFTW 3.3.6 - Build: Stock - Size: 1D FFT Size 4096 (Mflops, More Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 18293 (SE +/- 196.08, N = 3); flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 18502 (SE +/- 161.26, N = 3); flags: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
1. (CC) gcc options: -pthread -O3 -lm

Intel MPI Benchmarks

Intel MPI Benchmarks 2019.3 - Test: IMB-MPI1 Exchange (Average Mbytes/sec, More Is Better)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 15023.98 (SE +/- 218.78, N = 15; MAX: 64515.64); flags: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 15189.76 (SE +/- 185.82, N = 15; MAX: 65915.24); flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
1. (CXX) g++ options: -O3 -O0 -pedantic -fopenmp -pthread -lmpi_cxx -lmpi

Numpy Benchmark

This test measures general NumPy performance. Learn more via the OpenBenchmarking.org test page.

Numpy Benchmark (Score, More Is Better)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 612.18 (SE +/- 4.12, N = 3)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 618.63 (SE +/- 3.76, N = 3)

QMCPACK

QMCPACK is a modern, high-performance, open-source, production-level many-body ab initio Quantum Monte Carlo (QMC) code for computing the electronic structure of atoms, molecules, and solids. This benchmark uses MPI to run the H2O example. QMCPACK is supported by the U.S. Department of Energy. Learn more via the OpenBenchmarking.org test page.

QMCPACK 3.11 - Input: simple-H2O (Total Execution Time - Seconds, Fewer Is Better)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 18.13 (SE +/- 0.10, N = 3); flags: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 17.95 (SE +/- 0.21, N = 14); flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
1. (CXX) g++ options: -O3 -fopenmp -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -fomit-frame-pointer -ffast-math -pthread -lm -ldl

SHOC Scalable HeterOgeneous Computing

The CUDA and OpenCL version of Vetter's Scalable HeterOgeneous Computing benchmark suite. SHOC provides a number of different benchmark programs for evaluating the performance and stability of compute devices. Learn more via the OpenBenchmarking.org test page.

SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: GEMM SGEMM_N (GFLOPS, More Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 1841.75 (SE +/- 8.67, N = 3); flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 1859.40 (SE +/- 17.38, N = 3); flags: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
1. (CXX) g++ options: -O3 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi
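SHOC's Triad program, reported earlier on this page, is modelled on the STREAM triad kernel a[i] = b[i] + s * c[i]. The sketch below shows that kernel on the CPU, plus a hypothetical helper that applies the usual STREAM byte count of two arrays read and one written. It is an illustration under that accounting assumption only. SHOC's OpenCL implementation, and the bytes and timing it includes in its GB/s figure, may differ.

```python
def triad(b, c, s):
    """STREAM-style triad: returns a with a[i] = b[i] + s * c[i]."""
    if len(b) != len(c):
        raise ValueError("input arrays must have the same length")
    return [bi + s * ci for bi, ci in zip(b, c)]


def triad_bandwidth_gbs(n, elem_bytes, seconds):
    """Nominal bandwidth of one triad pass in GB/s (1e9 bytes).

    Assumes STREAM accounting: b and c are read and a is written, so
    3 * n * elem_bytes bytes move per pass.
    """
    return 3 * n * elem_bytes / seconds / 1e9
```

For example, a pass over one million 4-byte floats that takes 1 ms corresponds to 3 * 10^6 * 4 bytes / 0.001 s = 12 GB/s under this accounting.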

oneDNN

oneDNN 2.1.2 - Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 13.41 (SE +/- 0.02, N = 3; MIN: 13.07); flags: -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 13.28 (SE +/- 0.01, N = 3; MIN: 12.99); flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

cl-mem

cl-mem 2017-01-13 - Benchmark: Read (GB/s, More Is Better)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 261.2 (SE +/- 0.07, N = 3)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 263.6 (SE +/- 0.06, N = 3)
1. (CC) gcc options: -O2 -flto -lOpenCL

Caffe

Caffe 2020-02-13 - Model: AlexNet - Acceleration: CPU - Iterations: 200 (Milli-Seconds, Fewer Is Better)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 47045 (SE +/- 218.90, N = 3); flags: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 46624 (SE +/- 442.71, N = 3); flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
1. (CXX) g++ options: -O3 -fPIC -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas

Himeno Benchmark

The Himeno benchmark is a linear solver for the pressure Poisson equation using a point-Jacobi method. Learn more via the OpenBenchmarking.org test page.

Himeno Benchmark 3.0 - Poisson Pressure Solver (MFLOPS, More Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 9471.74 (SE +/- 123.70, N = 3); flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 9554.06 (SE +/- 6.17, N = 3); flags: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
1. (CC) gcc options: -O3 -mavx2
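A point-Jacobi solver updates every grid point using only its neighbours' values from the previous sweep. The Himeno kernel itself uses a 19-point 3D stencil with coefficient arrays and a relaxation factor. The sketch below is a deliberately simplified 2D, 5-point version for the Poisson problem laplacian(p) = f with p = 0 on the boundary. The function and parameter names are hypothetical.

```python
def jacobi_poisson_2d(f, h, iterations):
    """Point-Jacobi sweeps for the 2D Poisson problem laplacian(p) = f.

    f: square grid (list of lists) of right-hand-side values.
    h: grid spacing. Boundary values of p are held at 0.
    Each interior point is set from its four neighbours in the previous
    sweep: p[i][j] = (sum of neighbours - h^2 * f[i][j]) / 4.
    """
    n = len(f)
    p = [[0.0] * n for _ in range(n)]
    for _ in range(iterations):
        new = [row[:] for row in p]  # Jacobi: read old values, write new grid
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                new[i][j] = 0.25 * (p[i - 1][j] + p[i + 1][j]
                                    + p[i][j - 1] + p[i][j + 1]
                                    - h * h * f[i][j])
        p = new
    return p
```

The discrete 5-point Laplacian is exact for p(x, y) = x(1 - x)y(1 - y). Running enough sweeps with f = -2[x(1 - x) + y(1 - y)] therefore reproduces that function at the grid points, which makes a convenient correctness check.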

SHOC Scalable HeterOgeneous Computing

SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: Bus Speed Download (GB/s, More Is Better)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 19.98 (SE +/- 0.15, N = 3); flags: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 20.15 (SE +/- 0.24, N = 3); flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
1. (CXX) g++ options: -O3 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

oneDNN

oneDNN 2.1.2 - Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 0.690873 (SE +/- 0.006051, N = 3; MIN: 0.6); flags: -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 0.685132 (SE +/- 0.006714, N = 3; MIN: 0.6); flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 2.1.2 - Harness: IP Shapes 1D - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 2.46474 (SE +/- 0.02589, N = 5; MIN: 2.19); flags: -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 2.44545 (SE +/- 0.02457, N = 3; MIN: 2.14); flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

Intel MPI Benchmarks

Intel MPI Benchmarks 2019.3 - Test: IMB-MPI1 Exchange (Average usec, Fewer Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 109.01 (SE +/- 0.91, N = 15; MIN: 0.28 / MAX: 3601.44); flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 108.21 (SE +/- 0.88, N = 15; MIN: 0.28 / MAX: 3672.41); flags: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
1. (CXX) g++ options: -O3 -O0 -pedantic -fopenmp -pthread -lmpi_cxx -lmpi

Timed MrBayes Analysis

This test performs a Bayesian analysis of a set of primate genome sequences to estimate their phylogeny. Learn more via the OpenBenchmarking.org test page.

Timed MrBayes Analysis 3.2.7 - Primate Phylogeny Analysis (Seconds, Fewer Is Better)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 74.34 (SE +/- 0.15, N = 3); flags: -march=native -mavx512bf16 -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 73.79 (SE +/- 0.37, N = 3); flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
1. (CC) gcc options: -mmmx -msse -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msha -maes -mavx -mfma -mavx2 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi -mrdrnd -mbmi -mbmi2 -madx -mabm -O3 -std=c99 -pedantic -lm -lreadline

ASKAP

ASKAP 1.0 - Test: Hogbom Clean OpenMP (Iterations Per Second, More Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 267.39 (SE +/- 1.10, N = 3)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 269.30 (SE +/- 0.64, N = 3)
1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp

oneDNN

oneDNN 2.1.2 - Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 0.878588 (SE +/- 0.007466, N = 3; MIN: 0.8); flags: -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 0.872683 (SE +/- 0.001641, N = 3; MIN: 0.82); flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

ASKAP

ASKAP 1.0 - Test: tConvolve MPI - Degridding (Mpix/sec, More Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 4859.18 (SE +/- 0.00, N = 3)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 4889.74 (SE +/- 30.56, N = 3)
1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp

R Benchmark

This test is a quick-running survey of general R performance. Learn more via the OpenBenchmarking.org test page.

R Benchmark (Seconds, Fewer Is Better)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 0.1050 (SE +/- 0.0008, N = 15)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 0.1044 (SE +/- 0.0005, N = 3)
1. R scripting front-end version 4.0.4 (2021-02-15)

FFTW

FFTW 3.3.6 - Build: Stock - Size: 2D FFT Size 32 (Mflops, More Is Better)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 22827 (SE +/- 135.68, N = 3); flags: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 22948 (SE +/- 297.88, N = 3); flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
1. (CC) gcc options: -pthread -O3 -lm

GROMACS

This tests the GROMACS (GROningen MAchine for Chemical Simulations) molecular dynamics package with the water_GMX50 data. This test profile allows selecting between CPU and GPU-based GROMACS builds. Learn more via the OpenBenchmarking.org test page.

GROMACS 2021.2 - Implementation: MPI CPU - Input: water_GMX50_bare (Ns Per Day, More Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 1.180 (SE +/- 0.001, N = 3); flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 1.186 (SE +/- 0.005, N = 3); flags: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
1. (CXX) g++ options: -O3 -pthread

Algebraic Multi-Grid Benchmark

AMG is a parallel algebraic multigrid solver for linear systems arising from problems on unstructured grids. The driver provided with AMG builds linear systems for various 3-dimensional problems. Learn more via the OpenBenchmarking.org test page.

Algebraic Multi-Grid Benchmark 1.2 (Figure Of Merit, More Is Better)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 302617633 (SE +/- 414229.41, N = 3)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 303975100 (SE +/- 51637.29, N = 3)
1. (CC) gcc options: -lparcsr_ls -lparcsr_mv -lseq_mv -lIJ_mv -lkrylov -lHYPRE_utilities -lm -fopenmp -pthread -lmpi

oneDNN

oneDNN 2.1.2 - Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 1334.31 (SE +/- 1.94, N = 3; MIN: 1263.19); flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 1328.42 (SE +/- 2.17, N = 3; MIN: 1262.56); flags: -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

FFTW

FFTW 3.3.6 - Build: Float + SSE - Size: 1D FFT Size 4096 (Mflops, More Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 103960 (SE +/- 1023.44, N = 3); flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 104417 (SE +/- 1211.30, N = 3); flags: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
1. (CC) gcc options: -pthread -O3 -lm

Darktable

Darktable 3.4.1 - Test: Server Room - Acceleration: OpenCL (Seconds, Fewer Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 3.012 (SE +/- 0.004, N = 3)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 2.999 (SE +/- 0.007, N = 3)

ASKAP

ASKAP 1.0 - Test: tConvolve OpenMP - Degridding (Million Grid Points Per Second, More Is Better)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 3599.35 (SE +/- 47.99, N = 3)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 3614.48 (SE +/- 16.43, N = 3)
1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp

oneDNN

oneDNN 2.1.2 - Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 1.93347 (SE +/- 0.00686, N = 3; MIN: 1.85); flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 1.92539 (SE +/- 0.00843, N = 3; MIN: 1.86); flags: -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

TensorFlow Lite

This is a benchmark of the TensorFlow Lite implementation. Linux support is currently limited to running on CPUs. This test profile measures the average inference time. Learn more via the OpenBenchmarking.org test page.

TensorFlow Lite 2020-08-23 - Model: SqueezeNet (Microseconds, Fewer Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 145830 (SE +/- 601.51, N = 3)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 145241 (SE +/- 572.70, N = 3)

OpenFOAM

OpenFOAM 8 - Input: Motorbike 60M (Seconds, Fewer Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 867.20 (SE +/- 2.30, N = 3); flags: -ldynamicMesh
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 864.00 (SE +/- 0.58, N = 3); flags: -lspecie -lfiniteVolume -lfvOptions -lmeshTools -lsampling
1. (CXX) g++ options: -std=c++11 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -lgenericPatchFields -lOpenFOAM -ldl -lm

GNU Octave Benchmark

This test profile measures how long it takes to complete several reference GNU Octave files via octave-benchmark. GNU Octave is used for numerical computations and is an open-source alternative to MATLAB. Learn more via the OpenBenchmarking.org test page.

GNU Octave Benchmark 6.1.1~hg.2021.01.26 (Seconds, Fewer Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 5.080 (SE +/- 0.018, N = 5)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 5.064 (SE +/- 0.026, N = 5)

TensorFlow Lite

TensorFlow Lite 2020-08-23 - Model: Mobilenet Float (Microseconds, Fewer Is Better)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 97559.5 (SE +/- 138.23, N = 3)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 97253.9 (SE +/- 155.46, N = 3)

oneDNN

This is a test of Intel oneDNN, an Intel-optimized library for deep neural networks, using its built-in benchdnn functionality. The result is the total perf time reported. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and MKL-DNN before being rebranded as part of the Intel oneAPI initiative. Learn more via the OpenBenchmarking.org test page.

oneDNN 2.1.2 - Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 2540.28 (SE +/- 4.59, N = 3, MIN: 2405.6; flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 2532.45 (SE +/- 7.55, N = 3, MIN: 2401.58; flags: -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 2.1.2 - Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 2.58637 (SE +/- 0.00632, N = 3, MIN: 2.33; flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 2.57885 (SE +/- 0.00417, N = 3, MIN: 2.34; flags: -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

ASKAP

ASKAP is a set of benchmarks from the Australian SKA Pathfinder. The principal ASKAP benchmarks are the Hogbom Clean Benchmark (tHogbomClean) and the Convolutional Resampling Benchmark (tConvolve); some earlier ASKAP benchmarks are also included for OpenCL and CUDA execution of tConvolve. Learn more via the OpenBenchmarking.org test page.

ASKAP 1.0 - Test: tConvolve MT - Gridding (Million Grid Points Per Second, More Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 1245.64 (SE +/- 0.56, N = 3)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 1248.93 (SE +/- 0.92, N = 3)
1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp
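tHogbomClean is built around Hogbom's CLEAN deconvolution: repeatedly find the brightest pixel in the residual image, add a fraction of it (the loop gain) to a model, and subtract a correspondingly scaled copy of the point spread function (PSF) centred on that pixel. The sketch below is a simplified one-dimensional Python version written only to illustrate the idea; it is not the ASKAP benchmark's code, and the function name, the PSF layout (length 2n - 1, peak of 1 at the centre) and the default parameters are assumptions.

```python
def hogbom_clean_1d(dirty, psf, gain=0.1, threshold=1e-3, max_iter=1000):
    """Minimal 1-D Hogbom CLEAN.

    `psf` must have length 2 * len(dirty) - 1 with its peak (normalised to 1)
    at the centre, so it can be shifted to any pixel without running off the end.
    Returns (model, residual).
    """
    n = len(dirty)
    if len(psf) != 2 * n - 1:
        raise ValueError("psf must have length 2 * len(dirty) - 1")
    centre = n - 1
    residual = list(dirty)
    model = [0.0] * n
    for _ in range(max_iter):
        # Brightest pixel (by absolute value) in the current residual.
        peak = max(range(n), key=lambda i: abs(residual[i]))
        if abs(residual[peak]) < threshold:
            break
        flux = gain * residual[peak]
        model[peak] += flux
        # Subtract the PSF, shifted so that its centre lands on `peak`.
        for i in range(n):
            residual[i] -= flux * psf[centre + i - peak]
    return model, residual


psf = [0.0, 0.0, 0.0, 0.25, 1.0, 0.25, 0.0, 0.0, 0.0]
dirty = [0.0, 0.5, 2.0, 0.5, 0.0]  # a point source of flux 2 at pixel 2
model, residual = hogbom_clean_1d(dirty, psf)
print([round(v, 3) for v in model])
```

A small gain (0.1 here) removes flux gradually, which is the usual trade-off: more iterations in exchange for better behaviour when sources overlap.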

oneDNN

oneDNN 2.1.2 - Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 2585.15 (SE +/- 25.72, N = 3, MIN: 2405.47; flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 2578.54 (SE +/- 20.40, N = 3, MIN: 2395.45; flags: -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

Darmstadt Automotive Parallel Heterogeneous Suite

DAPHNE is the Darmstadt Automotive Parallel HeterogeNEous Benchmark Suite, with OpenCL / CUDA / OpenMP implementations of automotive benchmarks for evaluating programming models in the context of autonomous driving. Learn more via the OpenBenchmarking.org test page.

Darmstadt Automotive Parallel Heterogeneous Suite - Backend: OpenMP - Kernel: Euclidean Cluster (Test Cases Per Minute, More Is Better)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 1667.43 (SE +/- 15.22, N = 3)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 1671.51 (SE +/- 0.74, N = 3)
1. (CXX) g++ options: -O3 -std=c++11 -fopenmp

Darktable

Darktable is an open-source photography / workflow application. This test will use any system-installed Darktable program or, on Windows, will automatically download the pre-built binary from the project. Learn more via the OpenBenchmarking.org test page.

Darktable 3.4.1 - Test: Masskrug - Acceleration: OpenCL (Seconds, Fewer Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 3.827 (SE +/- 0.012, N = 3)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 3.818 (SE +/- 0.007, N = 3)

ArrayFire

ArrayFire is a GPU and CPU numeric processing library; this test uses the built-in CPU and OpenCL ArrayFire benchmarks. Learn more via the OpenBenchmarking.org test page.

ArrayFire 3.7 - Test: BLAS CPU (GFLOPS, More Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 1205.09 (SE +/- 0.76, N = 3; flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 1207.87 (SE +/- 0.54, N = 3; flags: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect)
1. (CXX) g++ options: -O3 -rdynamic

DeepSpeech

Mozilla DeepSpeech is a speech-to-text engine powered by TensorFlow for machine learning and derived from Baidu's Deep Speech research paper. This test profile times the speech-to-text process for a roughly three minute audio recording. Learn more via the OpenBenchmarking.org test page.

DeepSpeech 0.6 - Acceleration: CPU (Seconds, Fewer Is Better)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 48.96 (SE +/- 0.29, N = 3)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 48.85 (SE +/- 0.34, N = 3)

Parboil

The Parboil Benchmarks from the IMPACT Research Group at the University of Illinois are a set of throughput-computing applications for studying computing architectures and compilers. Parboil test cases support OpenMP, OpenCL, and CUDA multi-processing environments; at this time, however, the test profile only uses the OpenMP and OpenCL workloads. Learn more via the OpenBenchmarking.org test page.

Parboil 2.5 - Test: OpenMP Stencil (Seconds, Fewer Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 15.01 (SE +/- 0.07, N = 3)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 14.98 (SE +/- 0.05, N = 3)
1. (CXX) g++ options: -lm -lpthread -lgomp -O3 -ffast-math -fopenmp

Parboil 2.5 - Test: OpenMP CUTCP (Seconds, Fewer Is Better)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 3.183109 (SE +/- 0.009619, N = 3)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 3.177141 (SE +/- 0.006360, N = 3)
1. (CXX) g++ options: -lm -lpthread -lgomp -O3 -ffast-math -fopenmp

oneDNN

oneDNN 2.1.2 - Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 1339.83 (SE +/- 5.65, N = 3, MIN: 1261.62; flags: -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 1337.47 (SE +/- 2.19, N = 3, MIN: 1270.85; flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 2.1.2 - Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 2.33613 (SE +/- 0.00302, N = 3, MIN: 2.05; flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 2.33206 (SE +/- 0.02354, N = 3, MIN: 2.07; flags: -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

SHOC Scalable HeterOgeneous Computing

The CUDA and OpenCL version of Vetter's Scalable HeterOgeneous Computing benchmark suite. SHOC provides a number of different benchmark programs for evaluating the performance and stability of compute devices. Learn more via the OpenBenchmarking.org test page.

SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: FFT SP (GFLOPS, More Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 680.88 (SE +/- 0.96, N = 3; flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 682.02 (SE +/- 0.08, N = 3; flags: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect)
1. (CXX) g++ options: -O3 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

TensorFlow Lite

TensorFlow Lite 2020-08-23 - Model: Inception ResNet V2 (Microseconds, Fewer Is Better)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 1881663 (SE +/- 2515.38, N = 3)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 1878730 (SE +/- 4029.02, N = 3)

Timed HMMer Search

This test searches through the Pfam database of profile hidden Markov models. The search finds the domain structure of the Drosophila Sevenless protein. Learn more via the OpenBenchmarking.org test page.

Timed HMMer Search 3.3.2 - Pfam Database Search (Seconds, Fewer Is Better)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 82.61 (SE +/- 0.24, N = 3; flags: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 82.48 (SE +/- 0.11, N = 3; flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16)
1. (CC) gcc options: -O3 -pthread -lhmmer -leasel -lm -lmpi
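HMMER scores sequences against profile hidden Markov models using dynamic-programming algorithms such as Viterbi and Forward. As a much smaller illustration of that family, the sketch below is a textbook Viterbi decoder for a two-state toy HMM. The model, its state names and probabilities are invented for the example and have nothing to do with Pfam or HMMER's profile architecture.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (most likely state path, its probability) for a discrete HMM."""
    # best[t][s]: probability of the best path that ends in state s at time t.
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, prob = max(
                ((p, best[t - 1][p] * trans_p[p][s]) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = prob * emit_p[s][observations[t]]
            back[t][s] = prev
    # Pick the best final state, then follow back-pointers to the start.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]


states = ("Healthy", "Fever")
start_p = {"Healthy": 0.6, "Fever": 0.4}
trans_p = {
    "Healthy": {"Healthy": 0.7, "Fever": 0.3},
    "Fever": {"Healthy": 0.4, "Fever": 0.6},
}
emit_p = {
    "Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
    "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6},
}
print(viterbi(["normal", "cold", "dizzy"], states, start_p, trans_p, emit_p))
```

Real search tools work in log space to avoid underflow on long sequences; plain probabilities are kept here only because the toy input has three symbols.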

SHOC Scalable HeterOgeneous Computing

SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: MD5 Hash (GHash/s, More Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 9.3041 (SE +/- 0.0009, N = 3; flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 9.3179 (SE +/- 0.0002, N = 3; flags: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect)
1. (CXX) g++ options: -O3 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

TensorFlow Lite

TensorFlow Lite 2020-08-23 - Model: Inception V4 (Microseconds, Fewer Is Better)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 2082823 (SE +/- 4943.42, N = 3)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 2080110 (SE +/- 1250.33, N = 3)

oneDNN

oneDNN 2.1.2 - Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 13.41 (SE +/- 0.00, N = 3, MIN: 13.19; flags: -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 13.40 (SE +/- 0.01, N = 3, MIN: 13.16; flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

SHOC Scalable HeterOgeneous Computing

SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: Reduction (GB/s, More Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 254.13 (SE +/- 0.22, N = 3; flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 254.41 (SE +/- 0.16, N = 3; flags: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect)
1. (CXX) g++ options: -O3 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

oneDNN

oneDNN 2.1.2 - Harness: Matrix Multiply Batch Shapes Transformer - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 1.18705 (SE +/- 0.01157, N = 3, MIN: 1; flags: -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 1.18583 (SE +/- 0.01307, N = 3, MIN: 1.01; flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

SHOC Scalable HeterOgeneous Computing

SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: Texture Read Bandwidth (GB/s, More Is Better)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 349.01 (SE +/- 1.09, N = 3; flags: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 349.34 (SE +/- 1.54, N = 3; flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16)
1. (CXX) g++ options: -O3 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

LULESH

LULESH is the Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics benchmark. Learn more via the OpenBenchmarking.org test page.

LULESH 2.0.3 (z/s, More Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 6872.83 (SE +/- 82.30, N = 4)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 6878.62 (SE +/- 67.35, N = 3)
1. (CXX) g++ options: -O3 -fopenmp -lm -pthread -lmpi_cxx -lmpi

miniFE

miniFE is a proxy application for unstructured implicit finite element codes. Learn more via the OpenBenchmarking.org test page.

miniFE 2.2 - Problem Size: Small (CG Mflops, More Is Better)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 6407.12 (SE +/- 1.22, N = 3)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 6411.43 (SE +/- 0.36, N = 3)
1. (CXX) g++ options: -O3 -fopenmp -pthread -lmpi_cxx -lmpi
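The "CG Mflops" unit refers to the conjugate-gradient (CG) solve that miniFE runs on the sparse linear system it assembles. The sketch below is a plain, unpreconditioned CG loop on a small dense symmetric positive-definite matrix, written in pure Python only to show the iteration; miniFE itself works on sparse matrices in C++, and the helper names here are illustrative.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    return [dot(row, x) for row in A]


def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Solve A x = b for a symmetric positive-definite A given as a list of rows."""
    n = len(b)
    if max_iter is None:
        max_iter = 10 * n
    x = [0.0] * n
    r = list(b)          # residual b - A x for the initial guess x = 0
    p = list(r)          # first search direction is the residual itself
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if math.sqrt(rs_old) < tol:
            break
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)                     # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old                          # keeps directions A-conjugate
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


print(conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0]))
```

For this 2 x 2 system the exact solution is (1/11, 7/11), and in exact arithmetic CG reaches it in at most two iterations. Each iteration costs one matrix-vector product plus a few vector operations, which is the work a CG Mflops figure counts.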

TensorFlow Lite

TensorFlow Lite 2020-08-23 - Model: NASNet Mobile (Microseconds, Fewer Is Better)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 124941 (SE +/- 607.87, N = 3)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 124859 (SE +/- 671.08, N = 3)

TensorFlow Lite 2020-08-23 - Model: Mobilenet Quant (Microseconds, Fewer Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 98345.3 (SE +/- 62.98, N = 3)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 98287.0 (SE +/- 93.88, N = 3)

Darmstadt Automotive Parallel Heterogeneous Suite

Darmstadt Automotive Parallel Heterogeneous Suite - Backend: OpenMP - Kernel: NDT Mapping (Test Cases Per Minute, More Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 1033.44 (SE +/- 11.85, N = 3)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 1033.74 (SE +/- 6.53, N = 3)
1. (CXX) g++ options: -O3 -std=c++11 -fopenmp

Parboil

Parboil 2.5 - Test: OpenMP LBM (Seconds, Fewer Is Better)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 114.10 (SE +/- 0.02, N = 3)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 114.07 (SE +/- 0.03, N = 3)
1. (CXX) g++ options: -lm -lpthread -lgomp -O3 -ffast-math -fopenmp

ASKAP

ASKAP 1.0 - Test: tConvolve MT - Degridding (Million Grid Points Per Second, More Is Better)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 2054.38 (SE +/- 1.75, N = 3)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 2054.71 (SE +/- 1.84, N = 3)
1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp

FFTW

FFTW is a C subroutine library for computing the discrete Fourier transform (DFT) in one or more dimensions. Learn more via the OpenBenchmarking.org test page.

FFTW 3.3.6 - Build: Stock - Size: 1D FFT Size 32 (Mflops, More Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 22774 (SE +/- 0.67, N = 3; flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 22777 (SE +/- 4.36, N = 3; flags: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect)
1. (CC) gcc options: -pthread -O3 -lm
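The transform FFTW computes is the discrete Fourier transform X[k] = sum over n of x[n] * exp(-2*pi*i*k*n/N); the minus sign matches FFTW's forward-transform convention. FFTW evaluates it with fast O(N log N) algorithms, so the direct O(N^2) sum below is not how FFTW works. It is a small reference sketch, handy for spot-checking a fast transform on tiny inputs such as the size-32 case benchmarked above.

```python
import cmath


def naive_dft(x):
    """Direct O(N^2) forward DFT: X[k] = sum_n x[n] * exp(-2*pi*i*k*n/N)."""
    n_total = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_total) for n in range(n_total))
        for k in range(n_total)
    ]


# An impulse transforms to a flat spectrum; a constant signal puts all
# of its energy into the k = 0 (DC) bin.
print([round(abs(v), 6) for v in naive_dft([1.0, 0.0, 0.0, 0.0])])
print([round(abs(v), 6) for v in naive_dft([1.0, 1.0, 1.0, 1.0])])
```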

SHOC Scalable HeterOgeneous Computing

SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: S3D (GFLOPS, More Is Better)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 125.07 (SE +/- 0.56, N = 3; flags: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 125.08 (SE +/- 0.71, N = 3; flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16)
1. (CXX) g++ options: -O3 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

ASKAP

ASKAP 1.0 - Test: tConvolve MPI - Gridding (Mpix/sec, More Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 5046.07 (SE +/- 0.00, N = 3)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 5046.07 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp

oneDNN

oneDNN 2.1.2 - Harness: Deconvolution Batch shapes_3d - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 4.67789 (SE +/- 0.07736, N = 15, MIN: 4.27; flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 4.67293 (SE +/- 0.09231, N = 15, MIN: 4.21; flags: -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 2.1.2 - Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 1.07696 (SE +/- 0.02280, N = 12, MIN: 0.99; flags: -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 1.06856 (SE +/- 0.01500, N = 15, MIN: 0.98; flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 2.1.2 - Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 4.23668 (SE +/- 0.08547, N = 15, MIN: 4.02; flags: -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 4.13169 (SE +/- 0.01117, N = 3, MIN: 4.05; flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 2.1.2 - Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 6.60245 (SE +/- 0.13247, N = 12, MIN: 3.53; flags: -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 6.27787 (SE +/- 0.12013, N = 15, MIN: 3.58; flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

Parboil

Parboil 2.5 - Test: OpenMP MRI Gridding (Seconds, Fewer Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 49.15 (SE +/- 0.68, N = 15)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 48.57 (SE +/- 0.99, N = 12)
1. (CXX) g++ options: -lm -lpthread -lgomp -O3 -ffast-math -fopenmp
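The result viewer offers an overall geometric mean, and a common way to summarise a two-configuration comparison like this one is the geometric mean of per-test ratios. The sketch below assumes each result is first turned into a ratio where a value above 1 always means "the native build did better", whatever the metric's direction. It uses only three results from this page for illustration and is not the Phoronix Test Suite's own calculation.

```python
import math


def speedup(baseline, candidate, higher_is_better):
    """Ratio above 1 means `candidate` beat `baseline` on this result."""
    if higher_is_better:
        return candidate / baseline
    return baseline / candidate


def geometric_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))


# (sapphirerapids, native, higher_is_better) for three results on this page.
results = [
    (1245.64, 1248.93, True),    # ASKAP tConvolve MT - Gridding
    (15.01, 14.98, False),       # Parboil OpenMP Stencil
    (6.27787, 6.60245, False),   # oneDNN Deconvolution Batch shapes_1d - f32
]

ratios = [speedup(s, n, hib) for s, n, hib in results]
print(f"native vs sapphirerapids, geometric mean ratio: {geometric_mean(ratios):.4f}")
```

The geometric mean is used rather than the arithmetic mean so that a 2x win and a 2x loss cancel out exactly, and so that the outcome does not depend on which configuration is chosen as the baseline.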

106 Results Shown
