12700k HPC+OpenCL AVX512 performance profiling: Intel Core i7-12700K testing with an MSI PRO Z690-A DDR4 (MS-7D25) v1.0 (1.15 BIOS) and a Gigabyte AMD Radeon RX 5600 OEM/5600 XT / 5700/5700 6GB on Pop!_OS 21.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2112125-TJ-12700KHPC62
System Details

Two configurations were tested, differing in compiler flags (and one attached disk):
- 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt
- 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt

Processor: Intel Core i7-12700K @ 6.30GHz (8 Cores / 16 Threads)
Motherboard: MSI PRO Z690-A DDR4 (MS-7D25) v1.0 (1.15 BIOS)
Chipset: Intel Device 7aa7
Memory: 32GB
Disk: 500GB Western Digital WDS500G2B0C-00PXH0 + 3 x 10001GB Seagate ST10000DM0004-1Z + 128GB HP SSD S700 Pro (one configuration additionally lists a 300GB Western Digital WD3000GLFS-0)
Graphics: Gigabyte AMD Radeon RX 5600 OEM/5600 XT / 5700/5700 6GB (1650/750MHz)
Audio: Realtek ALC897
Monitor: LG HDR WQHD
Network: Intel I225-V
OS: Pop!_OS 21.04
Kernel: 5.15.5-76051505-generic (x86_64)
Desktop: GNOME Shell 3.38.4
Display Server: X Server 1.20.11
OpenGL: 4.6 Mesa 21.2.2 (LLVM 12.0.0)
OpenCL: OpenCL 2.2 AMD-APP (3361.0)
Vulkan: 1.2.185
Compiler: GCC 11.1.0
File-System: ext4
Screen Resolution: 3440x1440

Kernel Details - Transparent Huge Pages: madvise

Environment Details:
- 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: CXXFLAGS="-O3 -march=sapphirerapids -mno-amx-tile -mno-amx-int8 -mno-amx-bf16" CFLAGS="-O3 -march=sapphirerapids -mno-amx-tile -mno-amx-int8 -mno-amx-bf16"
- 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: CXXFLAGS, CFLAGS, and FFLAGS all set to "-O3 -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect"

Compiler Details - --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-11-RPS7jb/gcc-11-11.1.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-RPS7jb/gcc-11-11.1.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v

Disk Details - NONE / errors=remount-ro,noatime,rw / Block Size: 4096
Processor Details - Scaling Governor: intel_pstate powersave - CPU Microcode: 0x15 - Thermald 2.4.3
Graphics Details - GLAMOR - BAR1 / Visible vRAM Size: 6128 MB
Python Details - Python 2.7.18 + Python 3.9.5
Security Details - itlb_multihit: Not affected; l1tf: Not affected; mds: Not affected; meltdown: Not affected; spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp; spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization; spectre_v2: Mitigation of Enhanced IBRS, IBPB: conditional, RSB filling; srbds: Not affected; tsx_async_abort: Not affected
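A note on the flags above: the second configuration forces every -mavx512* subset at compile time, which assumes the CPU actually exposes them at runtime. As a sketch (not part of the benchmark harness), GCC's cpu-detection builtins let a program verify this before taking an AVX-512 code path:

```c
#include <stdio.h>

/* Runtime check for the AVX-512 subsets forced on via -mavx512* above.
 * __builtin_cpu_supports() queries CPUID; forcing the flags at compile
 * time without such a check can emit instructions an unsupporting chip
 * will fault on. */
int cpu_has_avx512f(void)
{
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx512f") != 0;
}

int cpu_has_avx512bw(void)
{
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx512bw") != 0;
}
```

A caller would gate its vectorized path on these returning 1 and fall back to a scalar or AVX2 path otherwise.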
Result Summary (first value: 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt; second value: 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt; units and better-direction are per test, as given in the detailed entries):

shoc: OpenCL - S3D: 125.078 / 125.070
shoc: OpenCL - Triad: 12.5997 / 12.3131
shoc: OpenCL - FFT SP: 680.883 / 682.017
shoc: OpenCL - MD5 Hash: 9.3041 / 9.3179
shoc: OpenCL - Reduction: 254.126 / 254.407
shoc: OpenCL - GEMM SGEMM_N: 1841.75 / 1859.40
shoc: OpenCL - Max SP Flops: 8376637 / 9031599
shoc: OpenCL - Bus Speed Download: 20.1487 / 19.9782
shoc: OpenCL - Bus Speed Readback: 20.3871 / 21.0509
shoc: OpenCL - Texture Read Bandwidth: 349.340 / 349.012
cl-mem: Copy: 198.4 / 195.2
cl-mem: Read: 263.6 / 261.2
cl-mem: Write: 255.3 / 248.4
hpl: 97.554 / 100.593
lczero: BLAS: 906 / 959
parboil: OpenMP LBM: 114.068817 / 114.096662
parboil: OpenMP CUTCP: 3.177141 / 3.183109
parboil: OpenMP Stencil: 15.011457 / 14.980375
parboil: OpenMP MRI Gridding: 49.150711 / 48.567082
minife: Small: 6411.43 / 6407.12
cp2k: Fayalite-FIST: 398.345 / 372.744
namd: ATPase Simulation - 327,506 Atoms: 1.16249 / 1.17871
amg: 303975100 / 302617633
fftw: Stock - 1D FFT Size 32: 22774 / 22777
fftw: Stock - 2D FFT Size 32: 22948 / 22827
fftw: Stock - 1D FFT Size 4096: 18293 / 18502
fftw: Stock - 2D FFT Size 4096: 13770 / 14020
fftw: Float + SSE - 1D FFT Size 32: 32180 / 31560
fftw: Float + SSE - 2D FFT Size 32: 80496 / 82515
fftw: Float + SSE - 1D FFT Size 4096: 103960 / 104417
fftw: Float + SSE - 2D FFT Size 4096: 43900 / 42935
pennant: sedovbig: 69.89434 / 67.77358
pennant: leblancbig: 49.93021 / 47.38740
mrbayes: Primate Phylogeny Analysis: 73.794 / 74.339
qmcpack: simple-H2O: 17.950 / 18.126
hmmer: Pfam Database Search: 82.482 / 82.610
mafft: Multiple Sequence Alignment - LSU RNA: 7.703 / 7.523
openfoam: Motorbike 30M: 137.71 / 135.71
openfoam: Motorbike 60M: 867.20 / 864.00
relion: Basic - CPU: 1684.702 / 1656.713
lulesh: 6872.8297 / 6878.6229
arrayfire: BLAS CPU: 1205.09 / 1207.87
mt-dgemm: Sustained Floating-Point Rate: 4.875598 / 5.016124
himeno: Poisson Pressure Solver: 9471.738369 / 9554.056801
onednn: IP Shapes 1D - f32 - CPU: 2.58637 / 2.57885
onednn: IP Shapes 3D - f32 - CPU: 8.78268 / 8.89450
onednn: IP Shapes 1D - u8s8f32 - CPU: 0.685132 / 0.690873
onednn: IP Shapes 3D - u8s8f32 - CPU: 1.93347 / 1.92539
onednn: IP Shapes 1D - bf16bf16bf16 - CPU: 2.44545 / 2.46474
onednn: IP Shapes 3D - bf16bf16bf16 - CPU: 3.56837 / 3.52788
onednn: Convolution Batch Shapes Auto - f32 - CPU: 13.3957 / 13.4115
onednn: Deconvolution Batch shapes_1d - f32 - CPU: 6.27787 / 6.60245
onednn: Deconvolution Batch shapes_3d - f32 - CPU: 4.13169 / 4.23668
onednn: Convolution Batch Shapes Auto - u8s8f32 - CPU: 13.2836 / 13.4080
onednn: Deconvolution Batch shapes_1d - u8s8f32 - CPU: 0.872683 / 0.878588
onednn: Deconvolution Batch shapes_3d - u8s8f32 - CPU: 1.06856 / 1.07696
onednn: Recurrent Neural Network Training - f32 - CPU: 2540.28 / 2532.45
onednn: Recurrent Neural Network Inference - f32 - CPU: 1388.34 / 1341.61
onednn: Recurrent Neural Network Training - u8s8f32 - CPU: 2596.32 / 2560.48
onednn: Convolution Batch Shapes Auto - bf16bf16bf16 - CPU: 6.77334 / 6.34910
onednn: Deconvolution Batch shapes_1d - bf16bf16bf16 - CPU: 7.43204 / 6.72455
onednn: Deconvolution Batch shapes_3d - bf16bf16bf16 - CPU: 4.67789 / 4.67293
onednn: Recurrent Neural Network Inference - u8s8f32 - CPU: 1334.31 / 1328.42
onednn: Matrix Multiply Batch Shapes Transformer - f32 - CPU: 2.33613 / 2.33206
onednn: Recurrent Neural Network Training - bf16bf16bf16 - CPU: 2585.15 / 2578.54
onednn: Recurrent Neural Network Inference - bf16bf16bf16 - CPU: 1337.47 / 1339.83
onednn: Matrix Multiply Batch Shapes Transformer - u8s8f32 - CPU: 0.565699 / 0.550588
onednn: Matrix Multiply Batch Shapes Transformer - bf16bf16bf16 - CPU: 1.18583 / 1.18705
numpy: 618.63 / 612.18
deepspeech: CPU: 48.85145 / 48.96052
rbenchmark: 0.1044 / 0.1050
rnnoise: 16.509 / 17.119
askap: tConvolve MT - Gridding: 1245.64 / 1248.93
askap: tConvolve MT - Degridding: 2054.71 / 2054.38
askap: tConvolve MPI - Degridding: 4859.18 / 4889.74
askap: tConvolve MPI - Gridding: 5046.07 / 5046.07
askap: tConvolve OpenMP - Gridding: 1866.30 / 1906.52
askap: tConvolve OpenMP - Degridding: 3614.48 / 3599.35
askap: Hogbom Clean OpenMP: 267.389 / 269.303
intel-mpi: IMB-P2P PingPong: 8628982 / 8513496
intel-mpi: IMB-MPI1 Exchange: 15189.76 / 15023.98
intel-mpi: IMB-MPI1 Exchange: 109.01 / 108.21
intel-mpi: IMB-MPI1 PingPong: 10004.46 / 10294.25
intel-mpi: IMB-MPI1 Sendrecv: 12273.54 / 12471.89
intel-mpi: IMB-MPI1 Sendrecv: 53.50 / 52.66
gromacs: MPI CPU - water_GMX50_bare: 1.180 / 1.186
daphne: OpenMP - NDT Mapping: 1033.44 / 1033.74
daphne: OpenMP - Points2Image: 35022.145431097 / 36114.808332258
daphne: OpenMP - Euclidean Cluster: 1671.51 / 1667.43
tensorflow-lite: SqueezeNet: 145830 / 145241
tensorflow-lite: Inception V4: 2080110 / 2082823
tensorflow-lite: NASNet Mobile: 124859 / 124941
tensorflow-lite: Mobilenet Float: 97253.9 / 97559.5
tensorflow-lite: Mobilenet Quant: 98345.3 / 98287.0
tensorflow-lite: Inception ResNet V2: 1878730 / 1881663
darktable: Boat - OpenCL: 3.971 / 4.151
darktable: Masskrug - OpenCL: 3.827 / 3.818
darktable: Server Rack - OpenCL: 0.133 / 0.131
darktable: Server Room - OpenCL: 3.012 / 2.999
octave-benchmark: 5.080 / 5.064
caffe: AlexNet - CPU - 100: 23122 / 23659
caffe: AlexNet - CPU - 200: 46624 / 47045
caffe: AlexNet - CPU - 1000: 247624 / 238435
caffe: GoogleNet - CPU - 100: 79674 / 67852
caffe: GoogleNet - CPU - 200: 157272 / 136187
caffe: GoogleNet - CPU - 1000: 726541 / 680177
SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: S3D (GFLOPS, more is better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 125.08 (SE +/- 0.71, N = 3)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 125.07 (SE +/- 0.56, N = 3)
  Per-configuration flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 (sapphirerapids run); -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect (native run)
  1. (CXX) g++ options: -O3 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi
SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: Triad (GB/s, more is better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 12.60 (SE +/- 0.14, N = 3)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 12.31 (SE +/- 0.13, N = 6)
  (Per-configuration flags and g++ options as in the S3D entry.)
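For orientation, the operation SHOC's Triad benchmark times is a STREAM-style fused multiply-add over three vectors; SHOC runs it as an OpenCL kernel interleaved with PCIe transfers, which is why the score is in GB/s. The plain-C loop below is only a sketch of the arithmetic being measured, not SHOC's implementation:

```c
#include <stddef.h>

/* Triad kernel, scalar reference form: a[i] = b[i] + s * c[i].
 * Each element moves 3 loads/stores of 4 bytes for 2 FLOPs, so the
 * benchmark is bandwidth-bound rather than compute-bound. */
void triad(float *a, const float *b, const float *c, float s, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + s * c[i];
}
```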
SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: FFT SP (GFLOPS, more is better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 680.88 (SE +/- 0.96, N = 3)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 682.02 (SE +/- 0.08, N = 3)
  (Per-configuration flags and g++ options as in the S3D entry.)
SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: MD5 Hash (GHash/s, more is better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 9.3041 (SE +/- 0.0009, N = 3)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 9.3179 (SE +/- 0.0002, N = 3)
  (Per-configuration flags and g++ options as in the S3D entry.)
SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: Reduction (GB/s, more is better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 254.13 (SE +/- 0.22, N = 3)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 254.41 (SE +/- 0.16, N = 3)
  (Per-configuration flags and g++ options as in the S3D entry.)
SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: GEMM SGEMM_N (GFLOPS, more is better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 1841.75 (SE +/- 8.67, N = 3)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 1859.40 (SE +/- 17.38, N = 3)
  (Per-configuration flags and g++ options as in the S3D entry.)
SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: Max SP Flops (GFLOPS, more is better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 8376637 (SE +/- 65495.62, N = 3)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 9031599 (SE +/- 142226.18, N = 9)
  (Per-configuration flags and g++ options as in the S3D entry.)
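As a sanity check on the Max SP Flops figures: a GPU's theoretical single-precision peak is shaders x 2 FLOPs per clock (one FMA) x clock rate. The sketch below assumes 2304 stream processors (the commonly quoted RX 5600 XT count; the exact OEM variant here may differ) and the 1650 MHz core clock from the system table. The result, roughly 7.6 TFLOPS, suggests the 8.4M-9.0M values above are likely MFLOPS despite the GFLOPS label, i.e. about 8.4-9.0 TFLOPS measured at boost clocks:

```c
/* Back-of-envelope GPU peak: shaders * 2 FLOPs (FMA) * clock (GHz)
 * gives GFLOPS. Shader count is an assumption, not from this report. */
double peak_sp_gflops(int shaders, double clock_ghz)
{
    return shaders * 2.0 * clock_ghz;
}
```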
SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: Bus Speed Download (GB/s, more is better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 20.15 (SE +/- 0.24, N = 3)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 19.98 (SE +/- 0.15, N = 3)
  (Per-configuration flags and g++ options as in the S3D entry.)
SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: Bus Speed Readback (GB/s, more is better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 20.39 (SE +/- 0.01, N = 3)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 21.05 (SE +/- 0.21, N = 15)
  (Per-configuration flags and g++ options as in the S3D entry.)
SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: Texture Read Bandwidth (GB/s, more is better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 349.34 (SE +/- 1.54, N = 3)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 349.01 (SE +/- 1.09, N = 3)
  (Per-configuration flags and g++ options as in the S3D entry.)
cl-mem 2017-01-13 - Benchmark: Copy (GB/s, more is better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 198.4 (SE +/- 0.12, N = 3)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 195.2 (SE +/- 0.12, N = 3)
  1. (CC) gcc options: -O2 -flto -lOpenCL
cl-mem 2017-01-13 - Benchmark: Read (GB/s, more is better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 263.6 (SE +/- 0.06, N = 3)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 261.2 (SE +/- 0.07, N = 3)
  (gcc options as in the Copy entry.)
cl-mem 2017-01-13 - Benchmark: Write (GB/s, more is better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 255.3 (SE +/- 0.17, N = 3)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 248.4 (SE +/- 0.15, N = 3)
  (gcc options as in the Copy entry.)
HPL Linpack 2.3 (GFLOPS, more is better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 97.55 (SE +/- 0.14, N = 3)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 100.59 (SE +/- 1.07, N = 3)
  Per-configuration flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 (sapphirerapids run); -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect (native run)
  1. (CC) gcc options: -O3 -lopenblas -lm -pthread -lmpi
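For reference, HPL's GFLOPS score is not a raw instruction count: it divides the standard LU-factorization operation count for a dense N x N system by the wall-clock solve time. A minimal sketch of that convention:

```c
/* HPL convention: solving a dense N x N linear system by LU
 * factorization is charged (2/3)N^3 + 2N^2 floating-point operations,
 * regardless of what the implementation actually executes. */
double hpl_gflops(double n, double seconds)
{
    return ((2.0 / 3.0) * n * n * n + 2.0 * n * n) / (seconds * 1e9);
}
```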
LeelaChessZero 0.28 - Backend: BLAS (Nodes Per Second, more is better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 906 (SE +/- 12.84, N = 9)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 959 (SE +/- 11.95, N = 4)
  Per-configuration flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 (sapphirerapids run); -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect (native run)
  1. (CXX) g++ options: -flto -O3 -pthread
Parboil 2.5 - Test: OpenMP LBM (Seconds, fewer is better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 114.07 (SE +/- 0.03, N = 3)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 114.10 (SE +/- 0.02, N = 3)
  1. (CXX) g++ options: -lm -lpthread -lgomp -O3 -ffast-math -fopenmp
Parboil 2.5 - Test: OpenMP CUTCP (Seconds, fewer is better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 3.177141 (SE +/- 0.006360, N = 3)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 3.183109 (SE +/- 0.009619, N = 3)
  (g++ options as in the OpenMP LBM entry.)
Parboil 2.5 - Test: OpenMP Stencil (Seconds, fewer is better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 15.01 (SE +/- 0.07, N = 3)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 14.98 (SE +/- 0.05, N = 3)
  (g++ options as in the OpenMP LBM entry.)
Parboil 2.5 - Test: OpenMP MRI Gridding (Seconds, fewer is better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 49.15 (SE +/- 0.68, N = 15)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 48.57 (SE +/- 0.99, N = 12)
  (g++ options as in the OpenMP LBM entry.)
miniFE 2.2 - Problem Size: Small (CG Mflops, more is better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 6411.43 (SE +/- 0.36, N = 3)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 6407.12 (SE +/- 1.22, N = 3)
  1. (CXX) g++ options: -O3 -fopenmp -pthread -lmpi_cxx -lmpi
CP2K Molecular Dynamics 8.2 - Input: Fayalite-FIST (Seconds, fewer is better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 398.35
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 372.74
NAMD 2.14 - ATPase Simulation - 327,506 Atoms (days/ns, fewer is better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 1.16249 (SE +/- 0.00085, N = 3)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 1.17871 (SE +/- 0.00713, N = 3)
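NAMD's days/ns metric (wall-clock days per simulated nanosecond, lower is better) is the reciprocal of the more commonly quoted ns/day; e.g. the 1.16249 days/ns above is about 0.86 ns/day. The conversion, as a one-liner:

```c
/* Convert NAMD's days/ns figure to the conventional ns/day. */
double ns_per_day(double days_per_ns)
{
    return 1.0 / days_per_ns;
}
```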
Algebraic Multi-Grid Benchmark 1.2 (Figure Of Merit, more is better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 303975100 (SE +/- 51637.29, N = 3)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 302617633 (SE +/- 414229.41, N = 3)
  1. (CC) gcc options: -lparcsr_ls -lparcsr_mv -lseq_mv -lIJ_mv -lkrylov -lHYPRE_utilities -lm -fopenmp -pthread -lmpi
FFTW 3.3.6 - Build: Stock - Size: 1D FFT Size 32 (Mflops, more is better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 22774 (SE +/- 0.67, N = 3)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 22777 (SE +/- 4.36, N = 3)
  Per-configuration flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 (sapphirerapids run); -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect (native run)
  1. (CC) gcc options: -pthread -O3 -lm
FFTW 3.3.6 - Build: Stock - Size: 2D FFT Size 32 (Mflops, more is better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 22948 (SE +/- 297.88, N = 3)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 22827 (SE +/- 135.68, N = 3)
  (Flags and gcc options as in the Stock 1D FFT Size 32 entry.)
FFTW 3.3.6 - Build: Stock - Size: 1D FFT Size 4096 (Mflops, more is better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 18293 (SE +/- 196.08, N = 3)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 18502 (SE +/- 161.26, N = 3)
  (Flags and gcc options as in the Stock 1D FFT Size 32 entry.)
FFTW 3.3.6 - Build: Stock - Size: 2D FFT Size 4096 (Mflops, more is better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 13770 (SE +/- 62.76, N = 3)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 14020 (SE +/- 33.65, N = 3)
  (Flags and gcc options as in the Stock 1D FFT Size 32 entry.)
FFTW 3.3.6 - Build: Float + SSE - Size: 1D FFT Size 32 (Mflops, more is better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 32180 (SE +/- 18.34, N = 3)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 31560 (SE +/- 195.07, N = 3)
  (Flags and gcc options as in the Stock 1D FFT Size 32 entry.)
FFTW 3.3.6 - Build: Float + SSE - Size: 2D FFT Size 32 (Mflops, more is better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 80496 (SE +/- 272.26, N = 3)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 82515 (SE +/- 920.77, N = 3)
  (Flags and gcc options as in the Stock 1D FFT Size 32 entry.)
FFTW 3.3.6 - Build: Float + SSE - Size: 1D FFT Size 4096 (Mflops, more is better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 103960 (SE +/- 1023.44, N = 3)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 104417 (SE +/- 1211.30, N = 3)
  (Flags and gcc options as in the Stock 1D FFT Size 32 entry.)
FFTW 3.3.6 - Build: Float + SSE - Size: 2D FFT Size 4096 (Mflops, more is better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 43900 (SE +/- 484.91, N = 5)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 42935 (SE +/- 90.35, N = 3)
  (Flags and gcc options as in the Stock 1D FFT Size 32 entry.)
Pennant 1.0.1 - Test: sedovbig (Hydro Cycle Time - Seconds, fewer is better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 69.89 (SE +/- 0.35, N = 3)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 67.77 (SE +/- 0.13, N = 3)
  1. (CXX) g++ options: -fopenmp -pthread -lmpi_cxx -lmpi
Pennant 1.0.1 - Test: leblancbig (Hydro Cycle Time - Seconds, fewer is better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 49.93 (SE +/- 0.05, N = 3)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 47.39 (SE +/- 0.03, N = 3)
  (g++ options as in the sedovbig entry.)
Timed MrBayes Analysis 3.2.7 - Primate Phylogeny Analysis (Seconds, fewer is better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 73.79 (SE +/- 0.37, N = 3)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 74.34 (SE +/- 0.15, N = 3)
  Per-configuration flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 (sapphirerapids run); -march=native -mavx512bf16 -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect (native run)
  1. (CC) gcc options: -mmmx -msse -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msha -maes -mavx -mfma -mavx2 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi -mrdrnd -mbmi -mbmi2 -madx -mabm -O3 -std=c99 -pedantic -lm -lreadline
QMCPACK 3.11 - Input: simple-H2O (Total Execution Time - Seconds, fewer is better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 17.95 (SE +/- 0.21, N = 14)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 18.13 (SE +/- 0.10, N = 3)
  Per-configuration flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 (sapphirerapids run); -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect (native run)
  1. (CXX) g++ options: -O3 -fopenmp -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -fomit-frame-pointer -ffast-math -pthread -lm -ldl
Timed HMMer Search 3.3.2 - Pfam Database Search (Seconds, fewer is better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 82.48 (SE +/- 0.11, N = 3)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 82.61 (SE +/- 0.24, N = 3)
  Per-configuration flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 (sapphirerapids run); -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect (native run)
  1. (CC) gcc options: -O3 -pthread -lhmmer -leasel -lm -lmpi
Timed MAFFT Alignment 7.471 - Multiple Sequence Alignment - LSU RNA (Seconds, fewer is better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 7.703 (SE +/- 0.014, N = 3)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 7.523 (SE +/- 0.012, N = 3)
  1. (CC) gcc options: -std=c99 -O3 -lm -lpthread
OpenFOAM 8 - Input: Motorbike 30M (Seconds, fewer is better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 137.71 (SE +/- 0.70, N = 3)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 135.71 (SE +/- 0.13, N = 3)
  Linked libraries: -ldynamicMesh -lspecie -lfiniteVolume -lfvOptions -lmeshTools -lsampling
  1. (CXX) g++ options: -std=c++11 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -lgenericPatchFields -lOpenFOAM -ldl -lm
OpenFOAM 8 - Input: Motorbike 60M (Seconds, fewer is better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 867.20 (SE +/- 2.30, N = 3)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 864.00 (SE +/- 0.58, N = 3)
  (Libraries and g++ options as in the Motorbike 30M entry.)
RELION 3.1.1 - Test: Basic - Device: CPU (Seconds, fewer is better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 1684.70 (SE +/- 3.94, N = 3)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 1656.71 (SE +/- 0.43, N = 3)
  Per-configuration flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 (sapphirerapids run); -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect (native run)
  1. (CXX) g++ options: -O3 -fopenmp -std=c++0x -rdynamic -ldl -ltiff -lfftw3f -lfftw3 -lpng -pthread -lmpi_cxx -lmpi
LULESH 2.0.3 (z/s, more is better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 6872.83 (SE +/- 82.30, N = 4)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 6878.62 (SE +/- 67.35, N = 3)
  1. (CXX) g++ options: -O3 -fopenmp -lm -pthread -lmpi_cxx -lmpi
ArrayFire 3.7 - Test: BLAS CPU (GFLOPS, more is better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 1205.09 (SE +/- 0.76, N = 3)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 1207.87 (SE +/- 0.54, N = 3)
  Per-configuration flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 (sapphirerapids run); -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect (native run)
  1. (CXX) g++ options: -O3 -rdynamic
ACES DGEMM 1.0 - Sustained Floating-Point Rate (GFLOP/s, more is better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 4.875598 (SE +/- 0.034356, N = 3)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 5.016124 (SE +/- 0.013832, N = 3)
  Per-configuration flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 (sapphirerapids run); -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect (native run)
  1. (CC) gcc options: -O3 -march=native -fopenmp
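For context on how a "sustained DGEMM GFLOP/s" figure is derived: C += A*B over M x K and K x N operands performs 2*M*N*K floating-point operations (one multiply plus one add per inner-loop step), so the rate is 2*M*N*K / (seconds * 1e9). The naive loop below is only the reference definition of the operation; ACES DGEMM itself uses a tuned, vectorized kernel:

```c
#include <stddef.h>

/* Reference (non-optimized) DGEMM: C += A * B, row-major storage. */
void dgemm_ref(size_t m, size_t n, size_t k,
               const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < m; i++)
        for (size_t j = 0; j < n; j++)
            for (size_t p = 0; p < k; p++)
                C[i * n + j] += A[i * k + p] * B[p * n + j];
}

/* Nominal operation count used to turn a timing into GFLOP/s. */
double dgemm_flops(size_t m, size_t n, size_t k)
{
    return 2.0 * (double)m * (double)n * (double)k;
}
```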
Himeno Benchmark Poisson Pressure Solver OpenBenchmarking.org MFLOPS, More Is Better Himeno Benchmark 3.0 Poisson Pressure Solver 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 2K 4K 6K 8K 10K SE +/- 123.70, N = 3 SE +/- 6.17, N = 3 9471.74 9554.06 -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect 1. (CC) gcc options: -O3 -mavx2
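To put gaps like the DGEMM result above into percentage terms, a small percent-change helper can be applied to any more-is-better pair. This is an illustrative sketch, not part of the Phoronix Test Suite; the values are taken from the results above.

```python
def percent_change(baseline: float, value: float) -> float:
    """Percent change of value relative to baseline (positive = higher throughput)."""
    return (value - baseline) / baseline * 100.0

# ACES DGEMM sustained GFLOP/s from the run above
sapphirerapids = 4.875598
native_avx512 = 5.016124

delta = percent_change(sapphirerapids, native_avx512)
print(f"march=native + AVX512: {delta:+.2f}% vs march=sapphirerapids")  # about +2.88%
```

The same helper works for any of the more-is-better results in this report (GFLOPS, MFLOPS, Mpix/sec, and so on).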
oneDNN 2.1.2 - Engine: CPU (ms, fewer is better; value +/- SE [N], per-run MIN in last column)

Harness                                   | Data Type     | march=sapphirerapids     | march=native + AVX512    | MIN (sr / native)
IP Shapes 1D                              | f32           | 2.58637 +/- 0.00632 [3]  | 2.57885 +/- 0.00417 [3]  | 2.33 / 2.34
IP Shapes 3D                              | f32           | 8.78268 +/- 0.01096 [3]  | 8.89450 +/- 0.13446 [14] | 8.63 / 8.61
IP Shapes 1D                              | u8s8f32       | 0.685132 +/- 0.006714 [3]| 0.690873 +/- 0.006051 [3]| 0.6 / 0.6
IP Shapes 3D                              | u8s8f32       | 1.93347 +/- 0.00686 [3]  | 1.92539 +/- 0.00843 [3]  | 1.85 / 1.86
IP Shapes 1D                              | bf16bf16bf16  | 2.44545 +/- 0.02457 [3]  | 2.46474 +/- 0.02589 [5]  | 2.14 / 2.19
IP Shapes 3D                              | bf16bf16bf16  | 3.56837 +/- 0.02891 [3]  | 3.52788 +/- 0.02720 [10] | 3.16 / 3.15
Convolution Batch Shapes Auto             | f32           | 13.40 +/- 0.01 [3]       | 13.41 +/- 0.00 [3]       | 13.16 / 13.19
Deconvolution Batch shapes_1d             | f32           | 6.27787 +/- 0.12013 [15] | 6.60245 +/- 0.13247 [12] | 3.58 / 3.53
Deconvolution Batch shapes_3d             | f32           | 4.13169 +/- 0.01117 [3]  | 4.23668 +/- 0.08547 [15] | 4.05 / 4.02
Convolution Batch Shapes Auto             | u8s8f32       | 13.28 +/- 0.01 [3]       | 13.41 +/- 0.02 [3]       | 12.99 / 13.07
Deconvolution Batch shapes_1d             | u8s8f32       | 0.872683 +/- 0.001641 [3]| 0.878588 +/- 0.007466 [3]| 0.82 / 0.8
Deconvolution Batch shapes_3d             | u8s8f32       | 1.06856 +/- 0.01500 [15] | 1.07696 +/- 0.02280 [12] | 0.98 / 0.99
Convolution Batch Shapes Auto             | bf16bf16bf16  | 6.77334 +/- 0.06030 [3]  | 6.34910 +/- 0.00598 [3]  | 6.15 / 6.03
Deconvolution Batch shapes_1d             | bf16bf16bf16  | 7.43204 +/- 0.04340 [3]  | 6.72455 +/- 0.01945 [3]  | 6.53 / 6.2
Deconvolution Batch shapes_3d             | bf16bf16bf16  | 4.67789 +/- 0.07736 [15] | 4.67293 +/- 0.09231 [15] | 4.27 / 4.21
Matrix Multiply Batch Shapes Transformer  | f32           | 2.33613 +/- 0.00302 [3]  | 2.33206 +/- 0.02354 [3]  | 2.05 / 2.07
Matrix Multiply Batch Shapes Transformer  | u8s8f32       | 0.565699 +/- 0.005655 [3]| 0.550588 +/- 0.005334 [6]| 0.47 / 0.45
Matrix Multiply Batch Shapes Transformer  | bf16bf16bf16  | 1.18583 +/- 0.01307 [3]  | 1.18705 +/- 0.01157 [3]  | 1.01 / 1
Recurrent Neural Network Training         | f32           | 2540.28 +/- 4.59 [3]     | 2532.45 +/- 7.55 [3]     | 2405.6 / 2401.58
Recurrent Neural Network Training         | u8s8f32       | 2596.32 +/- 3.87 [3]     | 2560.48 +/- 26.69 [3]    | 2454.76 / 2405.01
Recurrent Neural Network Training         | bf16bf16bf16  | 2585.15 +/- 25.72 [3]    | 2578.54 +/- 20.40 [3]    | 2405.47 / 2395.45
Recurrent Neural Network Inference        | f32           | 1388.34 +/- 11.91 [8]    | 1341.61 +/- 3.44 [3]     | 1265.31 / 1262.01
Recurrent Neural Network Inference        | u8s8f32       | 1334.31 +/- 1.94 [3]     | 1328.42 +/- 2.17 [3]     | 1263.19 / 1262.56
Recurrent Neural Network Inference        | bf16bf16bf16  | 1337.47 +/- 2.19 [3]     | 1339.83 +/- 5.65 [3]     | 1270.85 / 1261.62

All oneDNN runs: (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl, plus the per-run AVX-512/AMX CXXFLAGS listed in Environment Details
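For fewer-is-better timings like the oneDNN latencies, a speedup factor (baseline / candidate) is often easier to read than raw milliseconds. A minimal sketch, using the bf16 Deconvolution Batch shapes_1d numbers reported above (the largest gap in this run); this helper is illustrative and not part of the benchmark harness:

```python
def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """Speedup factor for fewer-is-better timings (>1 means candidate is faster)."""
    return baseline_ms / candidate_ms

# oneDNN Deconvolution Batch shapes_1d, bf16bf16bf16 (ms), from the table above
sr = 7.43204      # march=sapphirerapids
native = 6.72455  # march=native + AVX512

factor = speedup(sr, native)
print(f"march=native + AVX512 is {factor:.3f}x faster "
      f"({(1 - native / sr) * 100:.1f}% lower latency)")
```

Most rows in the table land within a fraction of a percent of 1.0x, which matches the overall picture of the two flag sets performing nearly identically.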
Numpy Benchmark (Score, more is better)
  march=sapphirerapids:   618.63  (SE +/- 3.76, N = 3)
  march=native + AVX512:  612.18  (SE +/- 4.12, N = 3)

DeepSpeech 0.6 - Acceleration: CPU (Seconds, fewer is better)
  march=sapphirerapids:   48.85  (SE +/- 0.34, N = 3)
  march=native + AVX512:  48.96  (SE +/- 0.29, N = 3)

R Benchmark (Seconds, fewer is better)
  march=sapphirerapids:   0.1044  (SE +/- 0.0005, N = 3)
  march=native + AVX512:  0.1050  (SE +/- 0.0008, N = 15)
  R scripting front-end version 4.0.4 (2021-02-15)

RNNoise 2020-06-28 (Seconds, fewer is better)
  march=sapphirerapids:   16.51  (SE +/- 0.21, N = 3)
  march=native + AVX512:  17.12  (SE +/- 0.01, N = 3)
  (CC) gcc options: -O3 -pedantic -fvisibility=hidden, plus the per-run AVX-512/AMX CFLAGS listed in Environment Details
ASKAP 1.0 (more is better)

Test                           | Unit                         | march=sapphirerapids   | march=native + AVX512
tConvolve MT - Gridding        | Million Grid Points/s        | 1245.64 +/- 0.56 [3]   | 1248.93 +/- 0.92 [3]
tConvolve MT - Degridding      | Million Grid Points/s        | 2054.71 +/- 1.84 [3]   | 2054.38 +/- 1.75 [3]
tConvolve MPI - Degridding     | Mpix/sec                     | 4859.18 +/- 0.00 [3]   | 4889.74 +/- 30.56 [3]
tConvolve MPI - Gridding       | Mpix/sec                     | 5046.07 +/- 0.00 [3]   | 5046.07 +/- 0.00 [3]
tConvolve OpenMP - Gridding    | Million Grid Points/s        | 1866.30 +/- 4.37 [3]   | 1906.52 +/- 12.08 [3]
tConvolve OpenMP - Degridding  | Million Grid Points/s        | 3614.48 +/- 16.43 [3]  | 3599.35 +/- 47.99 [3]
Hogbom Clean OpenMP            | Iterations/s                 | 267.39 +/- 1.10 [3]    | 269.30 +/- 0.64 [3]

All ASKAP runs: (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp
Intel MPI Benchmarks 2019.3

IMB-P2P PingPong (Average Msg/sec, more is better)
  march=sapphirerapids:   8628982  (SE +/- 23849.34, N = 3; MIN 1994 / MAX 22289308)
  march=native + AVX512:  8513496  (SE +/- 102028.44, N = 3; MIN 1946 / MAX 22082360)

IMB-MPI1 Exchange (Average Mbytes/sec, more is better)
  march=sapphirerapids:   15189.76  (SE +/- 185.82, N = 15; MAX 65915.24)
  march=native + AVX512:  15023.98  (SE +/- 218.78, N = 15; MAX 64515.64)

IMB-MPI1 Exchange (Average usec, fewer is better)
  march=sapphirerapids:   109.01  (SE +/- 0.91, N = 15; MIN 0.28 / MAX 3601.44)
  march=native + AVX512:  108.21  (SE +/- 0.88, N = 15; MIN 0.28 / MAX 3672.41)

IMB-MPI1 PingPong (Average Mbytes/sec, more is better)
  march=sapphirerapids:   10004.46  (SE +/- 140.45, N = 15; MIN 6.66 / MAX 34960.72)
  march=native + AVX512:  10294.25  (SE +/- 131.85, N = 3; MIN 10.93 / MAX 34708.09)

IMB-MPI1 Sendrecv (Average Mbytes/sec, more is better)
  march=sapphirerapids:   12273.54  (SE +/- 111.25, N = 3; MAX 66577.1)
  march=native + AVX512:  12471.89  (SE +/- 81.05, N = 3; MAX 66000.84)

IMB-MPI1 Sendrecv (Average usec, fewer is better)
  march=sapphirerapids:   53.50  (SE +/- 0.35, N = 3; MIN 0.19 / MAX 1786.28)
  march=native + AVX512:  52.66  (SE +/- 0.28, N = 3; MIN 0.19 / MAX 1702.76)

All Intel MPI Benchmarks runs: (CXX) g++ options: -O3 -O0 -pedantic -fopenmp -pthread -lmpi_cxx -lmpi, plus the per-run AVX-512/AMX CXXFLAGS listed in Environment Details
GROMACS 2021.2 - Implementation: MPI CPU - Input: water_GMX50_bare (Ns Per Day, more is better)
  march=sapphirerapids:   1.180  (SE +/- 0.001, N = 3)
  march=native + AVX512:  1.186  (SE +/- 0.005, N = 3)
  (CXX) g++ options: -O3 -pthread, plus the per-run AVX-512/AMX CXXFLAGS listed in Environment Details
Darmstadt Automotive Parallel Heterogeneous Suite - Backend: OpenMP (Test Cases Per Minute, more is better)

Kernel             | march=sapphirerapids     | march=native + AVX512
NDT Mapping        | 1033.44 +/- 11.85 [3]    | 1033.74 +/- 6.53 [3]
Points2Image       | 35022.15 +/- 389.77 [3]  | 36114.81 +/- 233.74 [3]
Euclidean Cluster  | 1671.51 +/- 0.74 [3]     | 1667.43 +/- 15.22 [3]

All DAPHS runs: (CXX) g++ options: -O3 -std=c++11 -fopenmp
TensorFlow Lite 2020-08-23 (Microseconds, fewer is better)

Model                | march=sapphirerapids     | march=native + AVX512
SqueezeNet           | 145830 +/- 601.51 [3]    | 145241 +/- 572.70 [3]
Inception V4         | 2080110 +/- 1250.33 [3]  | 2082823 +/- 4943.42 [3]
NASNet Mobile        | 124859 +/- 671.08 [3]    | 124941 +/- 607.87 [3]
Mobilenet Float      | 97253.9 +/- 155.46 [3]   | 97559.5 +/- 138.23 [3]
Mobilenet Quant      | 98345.3 +/- 62.98 [3]    | 98287.0 +/- 93.88 [3]
Inception ResNet V2  | 1878730 +/- 4029.02 [3]  | 1881663 +/- 2515.38 [3]
Darktable 3.4.1 - Acceleration: OpenCL (Seconds, fewer is better)

Test         | march=sapphirerapids  | march=native + AVX512
Boat         | 3.971 +/- 0.049 [3]   | 4.151 +/- 0.043 [3]
Masskrug     | 3.827 +/- 0.012 [3]   | 3.818 +/- 0.007 [3]
Server Rack  | 0.133 +/- 0.001 [3]   | 0.131 +/- 0.000 [3]
Server Room  | 3.012 +/- 0.004 [3]   | 2.999 +/- 0.007 [3]
GNU Octave Benchmark 6.1.1~hg.2021.01.26 (Seconds, fewer is better)
  march=sapphirerapids:   5.080  (SE +/- 0.018, N = 5)
  march=native + AVX512:  5.064  (SE +/- 0.026, N = 5)
Caffe 2020-02-13 - Model: AlexNet - Acceleration: CPU - Iterations: 100 (Milli-Seconds, Fewer Is Better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 23122 (SE +/- 319.02, N = 3)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 23659 (SE +/- 282.19, N = 3)
Caffe 2020-02-13 - Model: AlexNet - Acceleration: CPU - Iterations: 200 (Milli-Seconds, Fewer Is Better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 46624 (SE +/- 442.71, N = 3)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 47045 (SE +/- 218.90, N = 3)
Caffe 2020-02-13 - Model: AlexNet - Acceleration: CPU - Iterations: 1000 (Milli-Seconds, Fewer Is Better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 247624 (SE +/- 3023.73, N = 9)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 238435 (SE +/- 1961.39, N = 3)
Caffe 2020-02-13 - Model: GoogleNet - Acceleration: CPU - Iterations: 100 (Milli-Seconds, Fewer Is Better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 79674 (SE +/- 640.77, N = 15)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 67852 (SE +/- 292.08, N = 3)
Caffe 2020-02-13 - Model: GoogleNet - Acceleration: CPU - Iterations: 200 (Milli-Seconds, Fewer Is Better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 157272 (SE +/- 1277.18, N = 3)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 136187 (SE +/- 778.54, N = 3)
Caffe 2020-02-13 - Model: GoogleNet - Acceleration: CPU - Iterations: 1000 (Milli-Seconds, Fewer Is Better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 726541 (SE +/- 11125.73, N = 9)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 680177 (SE +/- 2693.85, N = 3)
Caffe per-configuration compiler flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 (march=sapphirerapids run); -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect (march=native run)
1. (CXX) g++ options: -O3 -fPIC -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas
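The SE +/- figures reported above are the standard error of the mean across the N runs of each benchmark. A minimal sketch of that computation (the run times below are hypothetical, chosen only to illustrate the formula):

```python
import math

def standard_error(samples):
    """Standard error of the mean: sample standard deviation / sqrt(N)."""
    n = len(samples)
    mean = sum(samples) / n
    # Sample variance with Bessel's correction (N - 1 denominator)
    var = sum((x - mean) ** 2 for x in samples) / (n - 1)
    return math.sqrt(var) / math.sqrt(n)

# Hypothetical timings (seconds) for a 3-run benchmark
runs = [3.92, 3.97, 4.02]
print(round(standard_error(runs), 3))  # → 0.029
```

Read against the results above, an SE that is small relative to the gap between the two configurations (as in the GoogleNet runs) suggests a real difference, while an SE comparable to the gap (as in most Darktable and TensorFlow Lite runs) suggests noise.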
Phoronix Test Suite v10.8.5