Intel Core i7-12700K testing with a MSI PRO Z690-A DDR4(MS-7D25) v1.0 (1.15 BIOS) and Gigabyte AMD Radeon RX 5600 OEM/5600 XT / 5700/5700 6GB on Pop 21.04 via the Phoronix Test Suite.
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt Processor: Intel Core i7-12700K @ 6.30GHz (8 Cores / 16 Threads), Motherboard: MSI PRO Z690-A DDR4(MS-7D25) v1.0 (1.15 BIOS), Chipset: Intel Device 7aa7, Memory: 32GB, Disk: 500GB Western Digital WDS500G2B0C-00PXH0 + 3 x 10001GB Seagate ST10000DM0004-1Z + 128GB HP SSD S700 Pro, Graphics: Gigabyte AMD Radeon RX 5600 OEM/5600 XT / 5700/5700 6GB (1650/750MHz), Audio: Realtek ALC897, Monitor: LG HDR WQHD, Network: Intel I225-V
OS: Pop 21.04, Kernel: 5.15.5-76051505-generic (x86_64), Desktop: GNOME Shell 3.38.4, Display Server: X Server 1.20.11, OpenGL: 4.6 Mesa 21.2.2 (LLVM 12.0.0), OpenCL: OpenCL 2.2 AMD-APP (3361.0), Vulkan: 1.2.185, Compiler: GCC 11.1.0, File-System: ext4, Screen Resolution: 3440x1440
Kernel Notes: Transparent Huge Pages: madviseEnvironment Notes: CXXFLAGS="-O3 -march=sapphirerapids -mno-amx-tile -mno-amx-int8 -mno-amx-bf16" CFLAGS="-O3 -march=sapphirerapids -mno-amx-tile -mno-amx-int8 -mno-amx-bf16"Compiler Notes: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-11-RPS7jb/gcc-11-11.1.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-RPS7jb/gcc-11-11.1.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -vDisk Notes: NONE / errors=remount-ro,noatime,rw / Block Size: 4096Processor Notes: Scaling Governor: intel_pstate powersave - CPU Microcode: 0x15 - Thermald 2.4.3Graphics Notes: GLAMOR - BAR1 / Visible vRAM Size: 6128 MBPython Notes: Python 2.7.18 + Python 3.9.5Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced IBRS IBPB: conditional RSB filling + srbds: Not affected + tsx_async_abort: Not affected
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt Changed Disk to 500GB Western Digital WDS500G2B0C-00PXH0 + 3 x 10001GB Seagate ST10000DM0004-1Z + 300GB Western Digital WD3000GLFS-0 + 128GB HP SSD S700 Pro .
Environment Change: CXXFLAGS="-O3 -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect" CFLAGS="-O3 -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect" FFLAGS="-O3 -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect"
RELION RELION - REgularised LIkelihood OptimisatioN - is a stand-alone computer program for Maximum A Posteriori refinement of (multiple) 3D reconstructions or 2D class averages in cryo-electron microscopy (cryo-EM). It is developed in the research group of Sjors Scheres at the MRC Laboratory of Molecular Biology. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Seconds, Fewer Is Better RELION 3.1.1 Test: Basic - Device: CPU 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 400 800 1200 1600 2000 SE +/- 0.43, N = 3 SE +/- 3.94, N = 3 1656.71 1684.70 -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 1. (CXX) g++ options: -O3 -fopenmp -std=c++0x -rdynamic -ldl -ltiff -lfftw3f -lfftw3 -lpng -pthread -lmpi_cxx -lmpi
Caffe This is a benchmark of the Caffe deep learning framework and currently supports the AlexNet and Googlenet model and execution on both CPUs and NVIDIA GPUs. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Milli-Seconds, Fewer Is Better Caffe 2020-02-13 Model: GoogleNet - Acceleration: CPU - Iterations: 1000 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 160K 320K 480K 640K 800K SE +/- 2693.85, N = 3 SE +/- 11125.73, N = 9 680177 726541 -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 1. (CXX) g++ options: -O3 -fPIC -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas
OpenFOAM OpenFOAM is the leading free, open source software for computational fluid dynamics (CFD). Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Seconds, Fewer Is Better OpenFOAM 8 Input: Motorbike 60M 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 200 400 600 800 1000 SE +/- 0.58, N = 3 SE +/- 2.30, N = 3 864.00 867.20 -lspecie -lfiniteVolume -lfvOptions -lmeshTools -lsampling -ldynamicMesh 1. (CXX) g++ options: -std=c++11 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -lgenericPatchFields -lOpenFOAM -ldl -lm
HPL Linpack HPL is a well known portable Linpack implementation for distributed memory systems. This test profile is testing HPL upstream directly, outside the scope of the HPC Challenge test profile also available through the Phoronix Test Suite (hpcc). The test profile attempts to generate an optimized HPL.dat input file based on the CPU/memory under test. The automated HPL.dat input generation is still being tuned and thus for now this test profile remains "experimental". Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org GFLOPS, More Is Better HPL Linpack 2.3 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 20 40 60 80 100 SE +/- 1.07, N = 3 SE +/- 0.14, N = 3 100.59 97.55 -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 1. (CC) gcc options: -O3 -lopenblas -lm -pthread -lmpi
SHOC Scalable HeterOgeneous Computing The CUDA and OpenCL version of Vetter's Scalable HeterOgeneous Computing benchmark suite. SHOC provides a number of different benchmark programs for evaluating the performance and stability of compute devices. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org GFLOPS, More Is Better SHOC Scalable HeterOgeneous Computing 2020-04-17 Target: OpenCL - Benchmark: Max SP Flops 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 2M 4M 6M 8M 10M SE +/- 142226.18, N = 9 SE +/- 65495.62, N = 3 9031599 8376637 -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 1. (CXX) g++ options: -O3 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi
LeelaChessZero LeelaChessZero (lc0 / lczero) is a chess engine automated vian neural networks. This test profile can be used for OpenCL, CUDA + cuDNN, and BLAS (CPU-based) benchmarking. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Nodes Per Second, More Is Better LeelaChessZero 0.28 Backend: BLAS 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 200 400 600 800 1000 SE +/- 11.95, N = 4 SE +/- 12.84, N = 9 959 906 -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 1. (CXX) g++ options: -flto -O3 -pthread
Caffe This is a benchmark of the Caffe deep learning framework and currently supports the AlexNet and Googlenet model and execution on both CPUs and NVIDIA GPUs. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Milli-Seconds, Fewer Is Better Caffe 2020-02-13 Model: AlexNet - Acceleration: CPU - Iterations: 1000 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 50K 100K 150K 200K 250K SE +/- 1961.39, N = 3 SE +/- 3023.73, N = 9 238435 247624 -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 1. (CXX) g++ options: -O3 -fPIC -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas
OpenBenchmarking.org Milli-Seconds, Fewer Is Better Caffe 2020-02-13 Model: GoogleNet - Acceleration: CPU - Iterations: 100 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 20K 40K 60K 80K 100K SE +/- 292.08, N = 3 SE +/- 640.77, N = 15 67852 79674 -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 1. (CXX) g++ options: -O3 -fPIC -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas
FFTW FFTW is a C subroutine library for computing the discrete Fourier transform (DFT) in one or more dimensions. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Mflops, More Is Better FFTW 3.3.6 Build: Float + SSE - Size: 2D FFT Size 4096 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 9K 18K 27K 36K 45K SE +/- 484.91, N = 5 SE +/- 90.35, N = 3 43900 42935 -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect 1. (CC) gcc options: -pthread -O3 -lm
Parboil The Parboil Benchmarks from the IMPACT Research Group at University of Illinois are a set of throughput computing applications for looking at computing architecture and compilers. Parboil test-cases support OpenMP, OpenCL, and CUDA multi-processing environments. However, at this time the test profile is just making use of the OpenMP and OpenCL test workloads. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Seconds, Fewer Is Better Parboil 2.5 Test: OpenMP MRI Gridding 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 11 22 33 44 55 SE +/- 0.99, N = 12 SE +/- 0.68, N = 15 48.57 49.15 1. (CXX) g++ options: -lm -lpthread -lgomp -O3 -ffast-math -fopenmp
GROMACS The GROMACS (GROningen MAchine for Chemical Simulations) molecular dynamics package testing with the water_GMX50 data. This test profile allows selecting between CPU and GPU-based GROMACS builds. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Ns Per Day, More Is Better GROMACS 2021.2 Implementation: MPI CPU - Input: water_GMX50_bare 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 0.2669 0.5338 0.8007 1.0676 1.3345 SE +/- 0.005, N = 3 SE +/- 0.001, N = 3 1.186 1.180 -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 1. (CXX) g++ options: -O3 -pthread
Caffe This is a benchmark of the Caffe deep learning framework and currently supports the AlexNet and Googlenet model and execution on both CPUs and NVIDIA GPUs. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Milli-Seconds, Fewer Is Better Caffe 2020-02-13 Model: GoogleNet - Acceleration: CPU - Iterations: 200 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 30K 60K 90K 120K 150K SE +/- 778.54, N = 3 SE +/- 1277.18, N = 3 136187 157272 -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 1. (CXX) g++ options: -O3 -fPIC -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas
OpenFOAM OpenFOAM is the leading free, open source software for computational fluid dynamics (CFD). Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Seconds, Fewer Is Better OpenFOAM 8 Input: Motorbike 30M 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 30 60 90 120 150 SE +/- 0.13, N = 3 SE +/- 0.70, N = 3 135.71 137.71 -lspecie -lfiniteVolume -lfvOptions -lmeshTools -lsampling -ldynamicMesh 1. (CXX) g++ options: -std=c++11 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -lgenericPatchFields -lOpenFOAM -ldl -lm
oneDNN This is a test of the Intel oneDNN as an Intel-optimized library for Deep Neural Networks and making use of its built-in benchdnn functionality. The result is the total perf time reported. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and MKL-DNN before being rebranded as part of the Intel oneAPI initiative. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.1.2 Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 300 600 900 1200 1500 SE +/- 3.44, N = 3 SE +/- 11.91, N = 8 1341.61 1388.34 -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 1262.01 -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 1265.31 1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
Parboil The Parboil Benchmarks from the IMPACT Research Group at University of Illinois are a set of throughput computing applications for looking at computing architecture and compilers. Parboil test-cases support OpenMP, OpenCL, and CUDA multi-processing environments. However, at this time the test profile is just making use of the OpenMP and OpenCL test workloads. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Seconds, Fewer Is Better Parboil 2.5 Test: OpenMP LBM 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 30 60 90 120 150 SE +/- 0.03, N = 3 SE +/- 0.02, N = 3 114.07 114.10 1. (CXX) g++ options: -lm -lpthread -lgomp -O3 -ffast-math -fopenmp
FFTW FFTW is a C subroutine library for computing the discrete Fourier transform (DFT) in one or more dimensions. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Mflops, More Is Better FFTW 3.3.6 Build: Stock - Size: 2D FFT Size 4096 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 3K 6K 9K 12K 15K SE +/- 33.65, N = 3 SE +/- 62.76, N = 3 14020 13770 -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 1. (CC) gcc options: -pthread -O3 -lm
TensorFlow Lite This is a benchmark of the TensorFlow Lite implementation. The current Linux support is limited to running on CPUs. This test profile is measuring the average inference time. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Microseconds, Fewer Is Better TensorFlow Lite 2020-08-23 Model: Inception V4 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 400K 800K 1200K 1600K 2000K SE +/- 1250.33, N = 3 SE +/- 4943.42, N = 3 2080110 2082823
Intel MPI Benchmarks Intel MPI Benchmarks for stressing MPI implementations. At this point the test profile aggregates results for some common MPI functionality. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Average usec, Fewer Is Better Intel MPI Benchmarks 2019.3 Test: IMB-MPI1 Exchange 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 20 40 60 80 100 SE +/- 0.88, N = 15 SE +/- 0.91, N = 15 108.21 109.01 -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 0.28 / MAX: 3672.41 -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 0.28 / MAX: 3601.44 1. (CXX) g++ options: -O3 -O0 -pedantic -fopenmp -pthread -lmpi_cxx -lmpi
OpenBenchmarking.org Average Mbytes/sec, More Is Better Intel MPI Benchmarks 2019.3 Test: IMB-MPI1 Exchange 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 3K 6K 9K 12K 15K SE +/- 185.82, N = 15 SE +/- 218.78, N = 15 15189.76 15023.98 -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MAX: 65915.24 -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MAX: 64515.64 1. (CXX) g++ options: -O3 -O0 -pedantic -fopenmp -pthread -lmpi_cxx -lmpi
TensorFlow Lite This is a benchmark of the TensorFlow Lite implementation. The current Linux support is limited to running on CPUs. This test profile is measuring the average inference time. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Microseconds, Fewer Is Better TensorFlow Lite 2020-08-23 Model: Inception ResNet V2 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 400K 800K 1200K 1600K 2000K SE +/- 4029.02, N = 3 SE +/- 2515.38, N = 3 1878730 1881663
oneDNN This is a test of the Intel oneDNN as an Intel-optimized library for Deep Neural Networks and making use of its built-in benchdnn functionality. The result is the total perf time reported. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and MKL-DNN before being rebranded as part of the Intel oneAPI initiative. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.1.2 Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 2 4 6 8 10 SE +/- 0.12013, N = 15 SE +/- 0.13247, N = 12 6.27787 6.60245 -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 3.58 -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 3.53 1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
Timed HMMer Search This test searches through the Pfam database of profile hidden markov models. The search finds the domain structure of Drosophila Sevenless protein. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Seconds, Fewer Is Better Timed HMMer Search 3.3.2 Pfam Database Search 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 20 40 60 80 100 SE +/- 0.11, N = 3 SE +/- 0.24, N = 3 82.48 82.61 -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect 1. (CC) gcc options: -O3 -pthread -lhmmer -leasel -lm -lmpi
oneDNN This is a test of the Intel oneDNN as an Intel-optimized library for Deep Neural Networks and making use of its built-in benchdnn functionality. The result is the total perf time reported. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and MKL-DNN before being rebranded as part of the Intel oneAPI initiative. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.1.2 Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 600 1200 1800 2400 3000 SE +/- 26.69, N = 3 SE +/- 3.87, N = 3 2560.48 2596.32 -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 2405.01 -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 2454.76 1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.1.2 Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 600 1200 1800 2400 3000 SE +/- 20.40, N = 3 SE +/- 25.72, N = 3 2578.54 2585.15 -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 2395.45 -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 2405.47 1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.1.2 Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 500 1000 1500 2000 2500 SE +/- 7.55, N = 3 SE +/- 4.59, N = 3 2532.45 2540.28 -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 2401.58 -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 2405.6 1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.1.2 Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 300 600 900 1200 1500 SE +/- 2.17, N = 3 SE +/- 1.94, N = 3 1328.42 1334.31 -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 1262.56 -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 1263.19 1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.1.2 Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 300 600 900 1200 1500 SE +/- 2.19, N = 3 SE +/- 5.65, N = 3 1337.47 1339.83 -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 1270.85 -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 1261.62 1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
Timed MrBayes Analysis This test performs a bayesian analysis of a set of primate genome sequences in order to estimate their phylogeny. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Seconds, Fewer Is Better Timed MrBayes Analysis 3.2.7 Primate Phylogeny Analysis 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 20 40 60 80 100 SE +/- 0.37, N = 3 SE +/- 0.15, N = 3 73.79 74.34 -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 -march=native -mavx512bf16 -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect 1. (CC) gcc options: -mmmx -msse -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msha -maes -mavx -mfma -mavx2 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi -mrdrnd -mbmi -mbmi2 -madx -mabm -O3 -std=c99 -pedantic -lm -lreadline
Pennant Pennant is an application focused on hydrodynamics on general unstructured meshes in 2D. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Hydro Cycle Time - Seconds, Fewer Is Better Pennant 1.0.1 Test: sedovbig 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 16 32 48 64 80 SE +/- 0.13, N = 3 SE +/- 0.35, N = 3 67.77 69.89 1. (CXX) g++ options: -fopenmp -pthread -lmpi_cxx -lmpi
ASKAP ASKAP is a set of benchmarks from the Australian SKA Pathfinder. The principal ASKAP benchmarks are the Hogbom Clean Benchmark (tHogbomClean) and Convolutional Resamping Benchmark (tConvolve) as well as some previous ASKAP benchmarks being included as well for OpenCL and CUDA execution of tConvolve. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Million Grid Points Per Second, More Is Better ASKAP 1.0 Test: tConvolve MT - Degridding 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 400 800 1200 1600 2000 SE +/- 1.84, N = 3 SE +/- 1.75, N = 3 2054.71 2054.38 1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp
OpenBenchmarking.org Million Grid Points Per Second, More Is Better ASKAP 1.0 Test: tConvolve MT - Gridding 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 300 600 900 1200 1500 SE +/- 0.92, N = 3 SE +/- 0.56, N = 3 1248.93 1245.64 1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp
NAMD NAMD is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. NAMD was developed by the Theoretical and Computational Biophysics Group in the Beckman Institute for Advanced Science and Technology at the University of Illinois at Urbana-Champaign. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org days/ns, Fewer Is Better NAMD 2.14 ATPase Simulation - 327,506 Atoms 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 0.2652 0.5304 0.7956 1.0608 1.326 SE +/- 0.00085, N = 3 SE +/- 0.00713, N = 3 1.16249 1.17871
TensorFlow Lite This is a benchmark of the TensorFlow Lite implementation. The current Linux support is limited to running on CPUs. This test profile is measuring the average inference time. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Microseconds, Fewer Is Better TensorFlow Lite 2020-08-23 Model: Mobilenet Quant 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 20K 40K 60K 80K 100K SE +/- 93.88, N = 3 SE +/- 62.98, N = 3 98287.0 98345.3
OpenBenchmarking.org Microseconds, Fewer Is Better TensorFlow Lite 2020-08-23 Model: SqueezeNet 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 30K 60K 90K 120K 150K SE +/- 572.70, N = 3 SE +/- 601.51, N = 3 145241 145830
OpenBenchmarking.org Microseconds, Fewer Is Better TensorFlow Lite 2020-08-23 Model: NASNet Mobile 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 30K 60K 90K 120K 150K SE +/- 671.08, N = 3 SE +/- 607.87, N = 3 124859 124941
OpenBenchmarking.org Microseconds, Fewer Is Better TensorFlow Lite 2020-08-23 Model: Mobilenet Float 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 20K 40K 60K 80K 100K SE +/- 155.46, N = 3 SE +/- 138.23, N = 3 97253.9 97559.5
Darmstadt Automotive Parallel Heterogeneous Suite DAPHNE is the Darmstadt Automotive Parallel HeterogeNEous Benchmark Suite with OpenCL / CUDA / OpenMP test cases for these automotive benchmarks for evaluating programming models in context to vehicle autonomous driving capabilities. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Test Cases Per Minute, More Is Better Darmstadt Automotive Parallel Heterogeneous Suite Backend: OpenMP - Kernel: Points2Image 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 8K 16K 24K 32K 40K SE +/- 233.74, N = 3 SE +/- 389.77, N = 3 36114.81 35022.15 1. (CXX) g++ options: -O3 -std=c++11 -fopenmp
Himeno Benchmark The Himeno benchmark is a linear solver of pressure Poisson using a point-Jacobi method. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org MFLOPS, More Is Better Himeno Benchmark 3.0 Poisson Pressure Solver 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 2K 4K 6K 8K 10K SE +/- 6.17, N = 3 SE +/- 123.70, N = 3 9554.06 9471.74 -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 1. (CC) gcc options: -O3 -mavx2
SHOC Scalable HeterOgeneous Computing The CUDA and OpenCL version of Vetter's Scalable HeterOgeneous Computing benchmark suite. SHOC provides a number of different benchmark programs for evaluating the performance and stability of compute devices. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org GFLOPS, More Is Better SHOC Scalable HeterOgeneous Computing 2020-04-17 Target: OpenCL - Benchmark: S3D 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 30 60 90 120 150 SE +/- 0.71, N = 3 SE +/- 0.56, N = 3 125.08 125.07 -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect 1. (CXX) g++ options: -O3 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi
QMCPACK QMCPACK is a modern high-performance open-source Quantum Monte Carlo (QMC) simulation code making use of MPI for this benchmark of the H20 example code. QMCPACK is an open-source production level many-body ab initio Quantum Monte Carlo code for computing the electronic structure of atoms, molecules, and solids. QMCPACK is supported by the U.S. Department of Energy. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Total Execution Time - Seconds, Fewer Is Better QMCPACK 3.11 Input: simple-H2O 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 4 8 12 16 20 SE +/- 0.21, N = 14 SE +/- 0.10, N = 3 17.95 18.13 -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect 1. (CXX) g++ options: -O3 -fopenmp -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -fomit-frame-pointer -ffast-math -pthread -lm -ldl
Pennant Pennant is an application focused on hydrodynamics on general unstructured meshes in 2D. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Hydro Cycle Time - Seconds, Fewer Is Better Pennant 1.0.1 Test: leblancbig 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 11 22 33 44 55 SE +/- 0.03, N = 3 SE +/- 0.05, N = 3 47.39 49.93 1. (CXX) g++ options: -fopenmp -pthread -lmpi_cxx -lmpi
Caffe This is a benchmark of the Caffe deep learning framework and currently supports the AlexNet and Googlenet model and execution on both CPUs and NVIDIA GPUs. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Milli-Seconds, Fewer Is Better Caffe 2020-02-13 Model: AlexNet - Acceleration: CPU - Iterations: 200 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 10K 20K 30K 40K 50K SE +/- 442.71, N = 3 SE +/- 218.90, N = 3 46624 47045 -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect 1. (CXX) g++ options: -O3 -fPIC -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas
ACES DGEMM This is a multi-threaded DGEMM benchmark. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org GFLOP/s, More Is Better ACES DGEMM 1.0 Sustained Floating-Point Rate 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 1.1286 2.2572 3.3858 4.5144 5.643 SE +/- 0.013832, N = 3 SE +/- 0.034356, N = 3 5.016124 4.875598 -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 1. (CC) gcc options: -O3 -march=native -fopenmp
miniFE MiniFE Finite Element is an application for unstructured implicit finite element codes. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org CG Mflops, More Is Better miniFE 2.2 Problem Size: Small 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 1400 2800 4200 5600 7000 SE +/- 0.36, N = 3 SE +/- 1.22, N = 3 6411.43 6407.12 1. (CXX) g++ options: -O3 -fopenmp -pthread -lmpi_cxx -lmpi
R Benchmark This test is a quick-running survey of general R performance Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Seconds, Fewer Is Better R Benchmark 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 0.0236 0.0472 0.0708 0.0944 0.118 SE +/- 0.0005, N = 3 SE +/- 0.0008, N = 15 0.1044 0.1050 1. R scripting front-end version 4.0.4 (2021-02-15)
DeepSpeech Mozilla DeepSpeech is a speech-to-text engine powered by TensorFlow for machine learning and derived from Baidu's Deep Speech research paper. This test profile times the speech-to-text process for a roughly three minute audio recording. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Seconds, Fewer Is Better DeepSpeech 0.6 Acceleration: CPU 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 11 22 33 44 55 SE +/- 0.34, N = 3 SE +/- 0.29, N = 3 48.85 48.96
ASKAP ASKAP is a set of benchmarks from the Australian SKA Pathfinder. The principal ASKAP benchmarks are the Hogbom Clean Benchmark (tHogbomClean) and Convolutional Resamping Benchmark (tConvolve) as well as some previous ASKAP benchmarks being included as well for OpenCL and CUDA execution of tConvolve. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Mpix/sec, More Is Better ASKAP 1.0 Test: tConvolve MPI - Gridding 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 1100 2200 3300 4400 5500 SE +/- 0.00, N = 3 SE +/- 0.00, N = 3 5046.07 5046.07 1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp
OpenBenchmarking.org Mpix/sec, More Is Better ASKAP 1.0 Test: tConvolve MPI - Degridding 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 1000 2000 3000 4000 5000 SE +/- 30.56, N = 3 SE +/- 0.00, N = 3 4889.74 4859.18 1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp
oneDNN This is a test of the Intel oneDNN as an Intel-optimized library for Deep Neural Networks and making use of its built-in benchdnn functionality. The result is the total perf time reported. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and MKL-DNN before being rebranded as part of the Intel oneAPI initiative. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.1.2 Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 2 4 6 8 10 SE +/- 0.01096, N = 3 SE +/- 0.13446, N = 14 8.78268 8.89450 -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 8.63 -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 8.61 1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
Caffe This is a benchmark of the Caffe deep learning framework and currently supports the AlexNet and Googlenet model and execution on both CPUs and NVIDIA GPUs. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Milli-Seconds, Fewer Is Better Caffe 2020-02-13 Model: AlexNet - Acceleration: CPU - Iterations: 100 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 5K 10K 15K 20K 25K SE +/- 319.02, N = 3 SE +/- 282.19, N = 3 23122 23659 -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect 1. (CXX) g++ options: -O3 -fPIC -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas
oneDNN This is a test of the Intel oneDNN as an Intel-optimized library for Deep Neural Networks and making use of its built-in benchdnn functionality. The result is the total perf time reported. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and MKL-DNN before being rebranded as part of the Intel oneAPI initiative. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.1.2 Harness: Deconvolution Batch shapes_1d - Data Type: bf16bf16bf16 - Engine: CPU 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 2 4 6 8 10 SE +/- 0.01945, N = 3 SE +/- 0.04340, N = 3 6.72455 7.43204 -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 6.2 -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 6.53 1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.1.2 Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 0.1977 0.3954 0.5931 0.7908 0.9885 SE +/- 0.001641, N = 3 SE +/- 0.007466, N = 3 0.872683 0.878588 -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 0.82 -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 0.8 1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
Intel MPI Benchmarks Intel MPI Benchmarks for stressing MPI implementations. At this point the test profile aggregates results for some common MPI functionality. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Average Mbytes/sec, More Is Better Intel MPI Benchmarks 2019.3 Test: IMB-MPI1 PingPong 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 2K 4K 6K 8K 10K SE +/- 131.85, N = 3 SE +/- 140.45, N = 15 10294.25 10004.46 -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 10.93 / MAX: 34708.09 -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 6.66 / MAX: 34960.72 1. (CXX) g++ options: -O3 -O0 -pedantic -fopenmp -pthread -lmpi_cxx -lmpi
oneDNN This is a test of the Intel oneDNN as an Intel-optimized library for Deep Neural Networks and making use of its built-in benchdnn functionality. The result is the total perf time reported. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and MKL-DNN before being rebranded as part of the Intel oneAPI initiative. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.1.2 Harness: IP Shapes 1D - Data Type: bf16bf16bf16 - Engine: CPU 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 0.5546 1.1092 1.6638 2.2184 2.773 SE +/- 0.02457, N = 3 SE +/- 0.02589, N = 5 2.44545 2.46474 -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 2.14 -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 2.19 1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.1.2 Harness: IP Shapes 3D - Data Type: bf16bf16bf16 - Engine: CPU 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 0.8029 1.6058 2.4087 3.2116 4.0145 SE +/- 0.02720, N = 10 SE +/- 0.02891, N = 3 3.52788 3.56837 -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 3.15 -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 3.16 1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.1.2 Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32 - Engine: CPU 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 0.1273 0.2546 0.3819 0.5092 0.6365 SE +/- 0.005334, N = 6 SE +/- 0.005655, N = 3 0.550588 0.565699 -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 0.45 -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 0.47 1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
Darmstadt Automotive Parallel Heterogeneous Suite DAPHNE is the Darmstadt Automotive Parallel HeterogeNEous Benchmark Suite with OpenCL / CUDA / OpenMP test cases for these automotive benchmarks for evaluating programming models in context to vehicle autonomous driving capabilities. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Test Cases Per Minute, More Is Better Darmstadt Automotive Parallel Heterogeneous Suite Backend: OpenMP - Kernel: NDT Mapping 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 200 400 600 800 1000 SE +/- 6.53, N = 3 SE +/- 11.85, N = 3 1033.74 1033.44 1. (CXX) g++ options: -O3 -std=c++11 -fopenmp
Intel MPI Benchmarks Intel MPI Benchmarks for stressing MPI implementations. At this point the test profile aggregates results for some common MPI functionality. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Average usec, Fewer Is Better Intel MPI Benchmarks 2019.3 Test: IMB-MPI1 Sendrecv 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 12 24 36 48 60 SE +/- 0.28, N = 3 SE +/- 0.35, N = 3 52.66 53.50 -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 0.19 / MAX: 1702.76 -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 0.19 / MAX: 1786.28 1. (CXX) g++ options: -O3 -O0 -pedantic -fopenmp -pthread -lmpi_cxx -lmpi
OpenBenchmarking.org Average Mbytes/sec, More Is Better Intel MPI Benchmarks 2019.3 Test: IMB-MPI1 Sendrecv 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 3K 6K 9K 12K 15K SE +/- 81.05, N = 3 SE +/- 111.25, N = 3 12471.89 12273.54 -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MAX: 66000.84 -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MAX: 66577.1 1. (CXX) g++ options: -O3 -O0 -pedantic -fopenmp -pthread -lmpi_cxx -lmpi
RNNoise RNNoise is a recurrent neural network for audio noise reduction developed by Mozilla and Xiph.Org. This test profile is a single-threaded test measuring the time to denoise a sample 26 minute long 16-bit RAW audio file using this recurrent neural network noise suppression library. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Seconds, Fewer Is Better RNNoise 2020-06-28 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 4 8 12 16 20 SE +/- 0.21, N = 3 SE +/- 0.01, N = 3 16.51 17.12 -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect 1. (CC) gcc options: -O3 -pedantic -fvisibility=hidden
Intel MPI Benchmarks Intel MPI Benchmarks for stressing MPI implementations. At this point the test profile aggregates results for some common MPI functionality. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Average Msg/sec, More Is Better Intel MPI Benchmarks 2019.3 Test: IMB-P2P PingPong 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 2M 4M 6M 8M 10M SE +/- 23849.34, N = 3 SE +/- 102028.44, N = 3 8628982 8513496 -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 1994 / MAX: 22289308 -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 1946 / MAX: 22082360 1. (CXX) g++ options: -O3 -O0 -pedantic -fopenmp -pthread -lmpi_cxx -lmpi
Parboil The Parboil Benchmarks from the IMPACT Research Group at University of Illinois are a set of throughput computing applications for looking at computing architecture and compilers. Parboil test-cases support OpenMP, OpenCL, and CUDA multi-processing environments. However, at this time the test profile is just making use of the OpenMP and OpenCL test workloads. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Seconds, Fewer Is Better Parboil 2.5 Test: OpenMP Stencil 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 4 8 12 16 20 SE +/- 0.05, N = 3 SE +/- 0.07, N = 3 14.98 15.01 1. (CXX) g++ options: -lm -lpthread -lgomp -O3 -ffast-math -fopenmp
ArrayFire ArrayFire is an GPU and CPU numeric processing library, this test uses the built-in CPU and OpenCL ArrayFire benchmarks. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org GFLOPS, More Is Better ArrayFire 3.7 Test: BLAS CPU 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 300 600 900 1200 1500 SE +/- 0.54, N = 3 SE +/- 0.76, N = 3 1207.87 1205.09 -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 1. (CXX) g++ options: -O3 -rdynamic
oneDNN This is a test of the Intel oneDNN as an Intel-optimized library for Deep Neural Networks and making use of its built-in benchdnn functionality. The result is the total perf time reported. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and MKL-DNN before being rebranded as part of the Intel oneAPI initiative. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.1.2 Harness: Deconvolution Batch shapes_3d - Data Type: bf16bf16bf16 - Engine: CPU 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 1.0525 2.105 3.1575 4.21 5.2625 SE +/- 0.09231, N = 15 SE +/- 0.07736, N = 15 4.67293 4.67789 -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 4.21 -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 4.27 1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.1.2 Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 0.5819 1.1638 1.7457 2.3276 2.9095 SE +/- 0.00417, N = 3 SE +/- 0.00632, N = 3 2.57885 2.58637 -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 2.34 -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 2.33 1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.1.2 Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPU 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 0.1554 0.3108 0.4662 0.6216 0.777 SE +/- 0.006714, N = 3 SE +/- 0.006051, N = 3 0.685132 0.690873 -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 0.6 -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 0.6 1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
Algebraic Multi-Grid Benchmark AMG is a parallel algebraic multigrid solver for linear systems arising from problems on unstructured grids. The driver provided with AMG builds linear systems for various 3-dimensional problems. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Figure Of Merit, More Is Better Algebraic Multi-Grid Benchmark 1.2 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 70M 140M 210M 280M 350M SE +/- 51637.29, N = 3 SE +/- 414229.41, N = 3 303975100 302617633 1. (CC) gcc options: -lparcsr_ls -lparcsr_mv -lseq_mv -lIJ_mv -lkrylov -lHYPRE_utilities -lm -fopenmp -pthread -lmpi
oneDNN This is a test of the Intel oneDNN as an Intel-optimized library for Deep Neural Networks and making use of its built-in benchdnn functionality. The result is the total perf time reported. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and MKL-DNN before being rebranded as part of the Intel oneAPI initiative. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.1.2 Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 0.2423 0.4846 0.7269 0.9692 1.2115 SE +/- 0.01500, N = 15 SE +/- 0.02280, N = 12 1.06856 1.07696 -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 0.98 -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 0.99 1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
Darmstadt Automotive Parallel Heterogeneous Suite DAPHNE is the Darmstadt Automotive Parallel HeterogeNEous Benchmark Suite with OpenCL / CUDA / OpenMP test cases for these automotive benchmarks for evaluating programming models in context to vehicle autonomous driving capabilities. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Test Cases Per Minute, More Is Better Darmstadt Automotive Parallel Heterogeneous Suite Backend: OpenMP - Kernel: Euclidean Cluster 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 400 800 1200 1600 2000 SE +/- 0.74, N = 3 SE +/- 15.22, N = 3 1671.51 1667.43 1. (CXX) g++ options: -O3 -std=c++11 -fopenmp
ASKAP ASKAP is a set of benchmarks from the Australian SKA Pathfinder. The principal ASKAP benchmarks are the Hogbom Clean Benchmark (tHogbomClean) and Convolutional Resamping Benchmark (tConvolve) as well as some previous ASKAP benchmarks being included as well for OpenCL and CUDA execution of tConvolve. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Iterations Per Second, More Is Better ASKAP 1.0 Test: Hogbom Clean OpenMP 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 60 120 180 240 300 SE +/- 0.64, N = 3 SE +/- 1.10, N = 3 269.30 267.39 1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp
oneDNN This is a test of the Intel oneDNN as an Intel-optimized library for Deep Neural Networks and making use of its built-in benchdnn functionality. The result is the total perf time reported. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and MKL-DNN before being rebranded as part of the Intel oneAPI initiative. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.1.2 Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 0.5256 1.0512 1.5768 2.1024 2.628 SE +/- 0.02354, N = 3 SE +/- 0.00302, N = 3 2.33206 2.33613 -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 2.07 -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 2.05 1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.1.2 Harness: Matrix Multiply Batch Shapes Transformer - Data Type: bf16bf16bf16 - Engine: CPU 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 0.2671 0.5342 0.8013 1.0684 1.3355 SE +/- 0.01307, N = 3 SE +/- 0.01157, N = 3 1.18583 1.18705 -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 1.01 -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 1 1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
LULESH LULESH is the Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org z/s, More Is Better LULESH 2.0.3 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 1500 3000 4500 6000 7500 SE +/- 67.35, N = 3 SE +/- 82.30, N = 4 6878.62 6872.83 1. (CXX) g++ options: -O3 -fopenmp -lm -pthread -lmpi_cxx -lmpi
oneDNN This is a test of the Intel oneDNN as an Intel-optimized library for Deep Neural Networks and making use of its built-in benchdnn functionality. The result is the total perf time reported. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and MKL-DNN before being rebranded as part of the Intel oneAPI initiative. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.1.2 Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 0.435 0.87 1.305 1.74 2.175 SE +/- 0.00843, N = 3 SE +/- 0.00686, N = 3 1.92539 1.93347 -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 1.86 -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 1.85 1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.1.2 Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 0.9533 1.9066 2.8599 3.8132 4.7665 SE +/- 0.01117, N = 3 SE +/- 0.08547, N = 15 4.13169 4.23668 -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 4.05 -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 4.02 1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
SHOC Scalable HeterOgeneous Computing The CUDA and OpenCL version of Vetter's Scalable HeterOgeneous Computing benchmark suite. SHOC provides a number of different benchmark programs for evaluating the performance and stability of compute devices. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org GB/s, More Is Better SHOC Scalable HeterOgeneous Computing 2020-04-17 Target: OpenCL - Benchmark: Texture Read Bandwidth 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 80 160 240 320 400 SE +/- 1.54, N = 3 SE +/- 1.09, N = 3 349.34 349.01 -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect 1. (CXX) g++ options: -O3 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi
ASKAP ASKAP is a set of benchmarks from the Australian SKA Pathfinder. The principal ASKAP benchmarks are the Hogbom Clean Benchmark (tHogbomClean) and Convolutional Resamping Benchmark (tConvolve) as well as some previous ASKAP benchmarks being included as well for OpenCL and CUDA execution of tConvolve. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Million Grid Points Per Second, More Is Better ASKAP 1.0 Test: tConvolve OpenMP - Degridding 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 800 1600 2400 3200 4000 SE +/- 16.43, N = 3 SE +/- 47.99, N = 3 3614.48 3599.35 1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp
OpenBenchmarking.org Million Grid Points Per Second, More Is Better ASKAP 1.0 Test: tConvolve OpenMP - Gridding 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 400 800 1200 1600 2000 SE +/- 12.08, N = 3 SE +/- 4.37, N = 3 1906.52 1866.30 1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp
FFTW FFTW is a C subroutine library for computing the discrete Fourier transform (DFT) in one or more dimensions. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Mflops, More Is Better FFTW 3.3.6 Build: Float + SSE - Size: 1D FFT Size 4096 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 20K 40K 60K 80K 100K SE +/- 1211.30, N = 3 SE +/- 1023.44, N = 3 104417 103960 -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 1. (CC) gcc options: -pthread -O3 -lm
OpenBenchmarking.org GB/s, More Is Better cl-mem 2017-01-13 Benchmark: Read 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 60 120 180 240 300 SE +/- 0.06, N = 3 SE +/- 0.07, N = 3 263.6 261.2 1. (CC) gcc options: -O2 -flto -lOpenCL
OpenBenchmarking.org GB/s, More Is Better cl-mem 2017-01-13 Benchmark: Write 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 60 120 180 240 300 SE +/- 0.17, N = 3 SE +/- 0.15, N = 3 255.3 248.4 1. (CC) gcc options: -O2 -flto -lOpenCL
oneDNN This is a test of the Intel oneDNN as an Intel-optimized library for Deep Neural Networks and making use of its built-in benchdnn functionality. The result is the total perf time reported. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and MKL-DNN before being rebranded as part of the Intel oneAPI initiative. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.1.2 Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 3 6 9 12 15 SE +/- 0.01, N = 3 SE +/- 0.00, N = 3 13.40 13.41 -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 13.16 -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 13.19 1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.1.2 Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 3 6 9 12 15 SE +/- 0.01, N = 3 SE +/- 0.02, N = 3 13.28 13.41 -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 12.99 -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 13.07 1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.1.2 Harness: Convolution Batch Shapes Auto - Data Type: bf16bf16bf16 - Engine: CPU 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 2 4 6 8 10 SE +/- 0.00598, N = 3 SE +/- 0.06030, N = 3 6.34910 6.77334 -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 6.03 -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 6.15 1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
Darktable Darktable is an open-source photography / workflow application this will use any system-installed Darktable program or on Windows will automatically download the pre-built binary from the project. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Seconds, Fewer Is Better Darktable 3.4.1 Test: Boat - Acceleration: OpenCL 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 0.934 1.868 2.802 3.736 4.67 SE +/- 0.049, N = 3 SE +/- 0.043, N = 3 3.971 4.151
OpenBenchmarking.org Seconds, Fewer Is Better Darktable 3.4.1 Test: Masskrug - Acceleration: OpenCL 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 0.8611 1.7222 2.5833 3.4444 4.3055 SE +/- 0.007, N = 3 SE +/- 0.012, N = 3 3.818 3.827
SHOC Scalable HeterOgeneous Computing The CUDA and OpenCL version of Vetter's Scalable HeterOgeneous Computing benchmark suite. SHOC provides a number of different benchmark programs for evaluating the performance and stability of compute devices. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org GFLOPS, More Is Better SHOC Scalable HeterOgeneous Computing 2020-04-17 Target: OpenCL - Benchmark: GEMM SGEMM_N 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 400 800 1200 1600 2000 SE +/- 17.38, N = 3 SE +/- 8.67, N = 3 1859.40 1841.75 -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 1. (CXX) g++ options: -O3 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi
Darktable Darktable is an open-source photography / workflow application this will use any system-installed Darktable program or on Windows will automatically download the pre-built binary from the project. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Seconds, Fewer Is Better Darktable 3.4.1 Test: Server Room - Acceleration: OpenCL 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 0.6777 1.3554 2.0331 2.7108 3.3885 SE +/- 0.007, N = 3 SE +/- 0.004, N = 3 2.999 3.012
FFTW FFTW is a C subroutine library for computing the discrete Fourier transform (DFT) in one or more dimensions. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Mflops, More Is Better FFTW 3.3.6 Build: Stock - Size: 1D FFT Size 4096 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 4K 8K 12K 16K 20K SE +/- 161.26, N = 3 SE +/- 196.08, N = 3 18502 18293 -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 1. (CC) gcc options: -pthread -O3 -lm
SHOC Scalable HeterOgeneous Computing The CUDA and OpenCL version of Vetter's Scalable HeterOgeneous Computing benchmark suite. SHOC provides a number of different benchmark programs for evaluating the performance and stability of compute devices. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org GB/s, More Is Better SHOC Scalable HeterOgeneous Computing 2020-04-17 Target: OpenCL - Benchmark: Bus Speed Readback 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 5 10 15 20 25 SE +/- 0.21, N = 15 SE +/- 0.01, N = 3 21.05 20.39 -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 1. (CXX) g++ options: -O3 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi
FFTW FFTW is a C subroutine library for computing the discrete Fourier transform (DFT) in one or more dimensions. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Mflops, More Is Better FFTW 3.3.6 Build: Stock - Size: 2D FFT Size 32 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 5K 10K 15K 20K 25K SE +/- 297.88, N = 3 SE +/- 135.68, N = 3 22948 22827 -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect 1. (CC) gcc options: -pthread -O3 -lm
OpenBenchmarking.org Mflops, More Is Better FFTW 3.3.6 Build: Stock - Size: 1D FFT Size 32 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 5K 10K 15K 20K 25K SE +/- 4.36, N = 3 SE +/- 0.67, N = 3 22777 22774 -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 1. (CC) gcc options: -pthread -O3 -lm
Parboil The Parboil Benchmarks from the IMPACT Research Group at University of Illinois are a set of throughput computing applications for looking at computing architecture and compilers. Parboil test-cases support OpenMP, OpenCL, and CUDA multi-processing environments. However, at this time the test profile is just making use of the OpenMP and OpenCL test workloads. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Seconds, Fewer Is Better Parboil 2.5 Test: OpenMP CUTCP 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 0.7162 1.4324 2.1486 2.8648 3.581 SE +/- 0.006360, N = 3 SE +/- 0.009619, N = 3 3.177141 3.183109 1. (CXX) g++ options: -lm -lpthread -lgomp -O3 -ffast-math -fopenmp
FFTW FFTW is a C subroutine library for computing the discrete Fourier transform (DFT) in one or more dimensions. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Mflops, More Is Better FFTW 3.3.6 Build: Float + SSE - Size: 1D FFT Size 32 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 7K 14K 21K 28K 35K SE +/- 18.34, N = 3 SE +/- 195.07, N = 3 32180 31560 -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect 1. (CC) gcc options: -pthread -O3 -lm
OpenBenchmarking.org Mflops, More Is Better FFTW 3.3.6 Build: Float + SSE - Size: 2D FFT Size 32 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 20K 40K 60K 80K 100K SE +/- 920.77, N = 3 SE +/- 272.26, N = 3 82515 80496 -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 1. (CC) gcc options: -pthread -O3 -lm
SHOC Scalable HeterOgeneous Computing The CUDA and OpenCL version of Vetter's Scalable HeterOgeneous Computing benchmark suite. SHOC provides a number of different benchmark programs for evaluating the performance and stability of compute devices. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org GB/s, More Is Better SHOC Scalable HeterOgeneous Computing 2020-04-17 Target: OpenCL - Benchmark: Triad 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 3 6 9 12 15 SE +/- 0.14, N = 3 SE +/- 0.13, N = 6 12.60 12.31 -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect 1. (CXX) g++ options: -O3 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi
OpenBenchmarking.org GHash/s, More Is Better SHOC Scalable HeterOgeneous Computing 2020-04-17 Target: OpenCL - Benchmark: MD5 Hash 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 3 6 9 12 15 SE +/- 0.0002, N = 3 SE +/- 0.0009, N = 3 9.3179 9.3041 -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 1. (CXX) g++ options: -O3 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi
OpenBenchmarking.org GB/s, More Is Better SHOC Scalable HeterOgeneous Computing 2020-04-17 Target: OpenCL - Benchmark: Reduction 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 60 120 180 240 300 SE +/- 0.16, N = 3 SE +/- 0.22, N = 3 254.41 254.13 -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 1. (CXX) g++ options: -O3 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi
OpenBenchmarking.org GFLOPS, More Is Better SHOC Scalable HeterOgeneous Computing 2020-04-17 Target: OpenCL - Benchmark: FFT SP 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 150 300 450 600 750 SE +/- 0.08, N = 3 SE +/- 0.96, N = 3 682.02 680.88 -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 1. (CXX) g++ options: -O3 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi
Darktable Darktable is an open-source photography / workflow application this will use any system-installed Darktable program or on Windows will automatically download the pre-built binary from the project. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Seconds, Fewer Is Better Darktable 3.4.1 Test: Server Rack - Acceleration: OpenCL 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 0.0299 0.0598 0.0897 0.1196 0.1495 SE +/- 0.000, N = 3 SE +/- 0.001, N = 3 0.131 0.133
SHOC Scalable HeterOgeneous Computing The CUDA and OpenCL version of Vetter's Scalable HeterOgeneous Computing benchmark suite. SHOC provides a number of different benchmark programs for evaluating the performance and stability of compute devices. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org GB/s, More Is Better SHOC Scalable HeterOgeneous Computing 2020-04-17 Target: OpenCL - Benchmark: Bus Speed Download 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt 5 10 15 20 25 SE +/- 0.24, N = 3 SE +/- 0.15, N = 3 20.15 19.98 -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect 1. (CXX) g++ options: -O3 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt Processor: Intel Core i7-12700K @ 6.30GHz (8 Cores / 16 Threads), Motherboard: MSI PRO Z690-A DDR4(MS-7D25) v1.0 (1.15 BIOS), Chipset: Intel Device 7aa7, Memory: 32GB, Disk: 500GB Western Digital WDS500G2B0C-00PXH0 + 3 x 10001GB Seagate ST10000DM0004-1Z + 128GB HP SSD S700 Pro, Graphics: Gigabyte AMD Radeon RX 5600 OEM/5600 XT / 5700/5700 6GB (1650/750MHz), Audio: Realtek ALC897, Monitor: LG HDR WQHD, Network: Intel I225-V
OS: Pop 21.04, Kernel: 5.15.5-76051505-generic (x86_64), Desktop: GNOME Shell 3.38.4, Display Server: X Server 1.20.11, OpenGL: 4.6 Mesa 21.2.2 (LLVM 12.0.0), OpenCL: OpenCL 2.2 AMD-APP (3361.0), Vulkan: 1.2.185, Compiler: GCC 11.1.0, File-System: ext4, Screen Resolution: 3440x1440
Kernel Notes: Transparent Huge Pages: madviseEnvironment Notes: CXXFLAGS="-O3 -march=sapphirerapids -mno-amx-tile -mno-amx-int8 -mno-amx-bf16" CFLAGS="-O3 -march=sapphirerapids -mno-amx-tile -mno-amx-int8 -mno-amx-bf16"Compiler Notes: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-11-RPS7jb/gcc-11-11.1.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-RPS7jb/gcc-11-11.1.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -vDisk Notes: NONE / errors=remount-ro,noatime,rw / Block Size: 4096Processor Notes: Scaling Governor: intel_pstate powersave - CPU Microcode: 0x15 - Thermald 2.4.3Graphics Notes: GLAMOR - BAR1 / Visible vRAM Size: 6128 MBPython Notes: Python 2.7.18 + Python 3.9.5Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced IBRS IBPB: conditional RSB filling + srbds: Not affected + tsx_async_abort: Not affected
Testing initiated at 9 December 2021 07:03 by user felix.
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt Processor: Intel Core i7-12700K @ 6.30GHz (8 Cores / 16 Threads), Motherboard: MSI PRO Z690-A DDR4(MS-7D25) v1.0 (1.15 BIOS), Chipset: Intel Device 7aa7, Memory: 32GB, Disk: 500GB Western Digital WDS500G2B0C-00PXH0 + 3 x 10001GB Seagate ST10000DM0004-1Z + 300GB Western Digital WD3000GLFS-0 + 128GB HP SSD S700 Pro, Graphics: Gigabyte AMD Radeon RX 5600 OEM/5600 XT / 5700/5700 6GB (1650/750MHz), Audio: Realtek ALC897, Monitor: LG HDR WQHD, Network: Intel I225-V
OS: Pop 21.04, Kernel: 5.15.5-76051505-generic (x86_64), Desktop: GNOME Shell 3.38.4, Display Server: X Server 1.20.11, OpenGL: 4.6 Mesa 21.2.2 (LLVM 12.0.0), OpenCL: OpenCL 2.2 AMD-APP (3361.0), Vulkan: 1.2.185, Compiler: GCC 11.1.0, File-System: ext4, Screen Resolution: 3440x1440
Kernel Notes: Transparent Huge Pages: madviseEnvironment Notes: CXXFLAGS="-O3 -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect" CFLAGS="-O3 -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect" FFLAGS="-O3 -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect"Compiler Notes: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-11-RPS7jb/gcc-11-11.1.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-RPS7jb/gcc-11-11.1.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -vDisk Notes: NONE / errors=remount-ro,noatime,rw / Block Size: 4096Processor Notes: Scaling Governor: intel_pstate powersave - CPU Microcode: 0x15 - Thermald 2.4.3Graphics Notes: GLAMOR - BAR1 / Visible vRAM Size: 6128 MBPython Notes: Python 2.7.18 + Python 3.9.5Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced IBRS IBPB: conditional RSB filling + srbds: Not affected + tsx_async_abort: Not affected
Testing initiated at 11 December 2021 11:22 by user felix.