multicore-40c-shared Processor: 2 x Intel Xeon (Skylake IBRS) (40 Cores), Motherboard: OpenStack Foundation Nova v21.0.0 (1.13.0-1ubuntu1.1 BIOS), Chipset: Intel 440FX 82441FX PMC, Memory: 16384 MB + 16384 MB + 16384 MB + 16384 MB + 16384 MB + 16384 MB + 16384 MB + 16384 MB + 16384 MB + 16384 MB + 16384 MB + 16384 MB + 16384 MB + 16384 MB + 16384 MB + 16384 MB + 16384 MB + 16384 MB + 16384 MB + 16384 MB + 16384 MB + 16384 MB + 8192 MB RAM, Disk: 148GB, Graphics: Cirrus Logic GD 5446, Network: Red Hat Virtio device
OS: Ubuntu 20.04, Kernel: 5.4.0-74-generic (x86_64), Compiler: GCC 9.3.0, File-System: ext4, Screen Resolution: 1024x768, System Layer: KVM
Kernel Notes: Transparent Huge Pages: madviseCompiler Notes: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-9-HskZEa/gcc-9-9.3.0/debian/tmp-nvptx/usr,hsa --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -vProcessor Notes: CPU Microcode: 0x1Java Notes: OpenJDK Runtime Environment (build 11.0.11+9-Ubuntu-0ubuntu2.20.04)Python Notes: Python 3.8.5Security Notes: itlb_multihit: Not affected + l1tf: Mitigation of PTE Inversion; VMX: flush not necessary SMT disabled + mds: Mitigation of Clear buffers; SMT Host state unknown + meltdown: Mitigation of PTI + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Full generic retpoline IBPB: conditional IBRS_FW STIBP: disabled RSB filling + srbds: Not affected + tsx_async_abort: Mitigation of Clear buffers; SMT Host state unknown
OpenBenchmarking.org Average Mbytes/sec, More Is Better Intel MPI Benchmarks 2019.3 Test: IMB-MPI1 PingPong multicore-40c-shared 800 1600 2400 3200 4000 SE +/- 48.82, N = 3 3670.34 MIN: 2.91 / MAX: 13415.29 1. (CXX) g++ options: -O0 -pedantic -fopenmp -pthread -lmpi_cxx -lmpi
OpenBenchmarking.org Average Mbytes/sec, More Is Better Intel MPI Benchmarks 2019.3 Test: IMB-MPI1 Sendrecv multicore-40c-shared 600 1200 1800 2400 3000 SE +/- 22.33, N = 3 2697.00 MIN: 1.52 / MAX: 12350.67 1. (CXX) g++ options: -O0 -pedantic -fopenmp -pthread -lmpi_cxx -lmpi
OpenBenchmarking.org Average Msg/sec, More Is Better Intel MPI Benchmarks 2019.3 Test: IMB-P2P PingPong multicore-40c-shared 2M 4M 6M 8M 10M SE +/- 70926.46, N = 3 10137234 MIN: 4304 / MAX: 27600565 1. (CXX) g++ options: -O0 -pedantic -fopenmp -pthread -lmpi_cxx -lmpi
OpenBenchmarking.org FPS, More Is Better libgav1 0.16.3 Video Input: Summer Nature 4K multicore-40c-shared 7 14 21 28 35 SE +/- 0.27, N = 15 31.21 1. (CXX) g++ options: -O3 -lpthread -lrt
OpenBenchmarking.org FPS, More Is Better libgav1 0.16.3 Video Input: Summer Nature 1080p multicore-40c-shared 20 40 60 80 100 SE +/- 0.57, N = 3 94.11 1. (CXX) g++ options: -O3 -lpthread -lrt
OpenBenchmarking.org FPS, More Is Better libgav1 0.16.3 Video Input: Chimera 1080p 10-bit multicore-40c-shared 9 18 27 36 45 SE +/- 0.07, N = 3 40.12 1. (CXX) g++ options: -O3 -lpthread -lrt
OpenBenchmarking.org FPS, More Is Better dav1d 0.9.0 Video Input: Summer Nature 4K multicore-40c-shared 50 100 150 200 250 SE +/- 1.41, N = 3 230.35 MIN: 118.79 / MAX: 245.58 1. (CC) gcc options: -pthread -lm
OpenBenchmarking.org FPS, More Is Better dav1d 0.9.0 Video Input: Summer Nature 1080p multicore-40c-shared 120 240 360 480 600 SE +/- 2.20, N = 3 567.52 MIN: 325.87 / MAX: 619.74 1. (CC) gcc options: -pthread -lm
OpenBenchmarking.org FPS, More Is Better dav1d 0.9.0 Video Input: Chimera 1080p 10-bit multicore-40c-shared 90 180 270 360 450 SE +/- 0.35, N = 3 404.76 MIN: 315.36 / MAX: 564.58 1. (CC) gcc options: -pthread -lm
OSPray Intel OSPray is a portable ray-tracing engine for high-performance, high-fidenlity scientific visualizations. OSPray builds off Intel's Embree and Intel SPMD Program Compiler (ISPC) components as part of the oneAPI rendering toolkit. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org FPS, More Is Better OSPray 1.8.5 Demo: San Miguel - Renderer: SciVis multicore-40c-shared 7 14 21 28 35 SE +/- 0.00, N = 3 31.25 MIN: 30.3 / MAX: 32.26
OpenBenchmarking.org FPS, More Is Better OSPray 1.8.5 Demo: XFrog Forest - Renderer: SciVis multicore-40c-shared 1.1925 2.385 3.5775 4.77 5.9625 SE +/- 0.05, N = 3 5.30 MIN: 5.18 / MAX: 5.46
OpenBenchmarking.org FPS, More Is Better OSPray 1.8.5 Demo: San Miguel - Renderer: Path Tracer multicore-40c-shared 0.585 1.17 1.755 2.34 2.925 SE +/- 0.00, N = 3 2.60 MIN: 2.5 / MAX: 2.62
OpenBenchmarking.org FPS, More Is Better OSPray 1.8.5 Demo: XFrog Forest - Renderer: Path Tracer multicore-40c-shared 0.6683 1.3366 2.0049 2.6732 3.3415 SE +/- 0.03, N = 3 2.97 MIN: 2.85 / MAX: 3.03
OpenBenchmarking.org FPS, More Is Better OSPray 1.8.5 Demo: Magnetic Reconnection - Renderer: SciVis multicore-40c-shared 8 16 24 32 40 SE +/- 0.00, N = 3 34.48 MIN: 33.33 / MAX: 35.71
OpenBenchmarking.org FPS, More Is Better OSPray 1.8.5 Demo: NASA Streamlines - Renderer: Path Tracer multicore-40c-shared 2 4 6 8 10 SE +/- 0.00, N = 3 7.25 MIN: 6.94 / MAX: 7.35
OpenBenchmarking.org FPS, More Is Better OSPray 1.8.5 Demo: Magnetic Reconnection - Renderer: Path Tracer multicore-40c-shared 70 140 210 280 350 SE +/- 0.00, N = 3 333.33 MIN: 250 / MAX: 500
TTSIOD 3D Renderer A portable GPL 3D software renderer that supports OpenMP and Intel Threading Building Blocks with many different rendering modes. This version does not use OpenGL but is entirely CPU/software based. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org FPS, More Is Better TTSIOD 3D Renderer 2.3b Phong Rendering With Soft-Shadow Mapping multicore-40c-shared 110 220 330 440 550 SE +/- 3.12, N = 3 527.78 1. (CXX) g++ options: -O3 -fomit-frame-pointer -ffast-math -mtune=native -flto -msse -mrecip -mfpmath=sse -msse2 -mssse3 -lSDL -fopenmp -fwhole-program -lstdc++
OpenBenchmarking.org Frames Per Second, More Is Better AOM AV1 3.1 Encoder Mode: Speed 4 Two-Pass - Input: Bosphorus 4K multicore-40c-shared 0.6998 1.3996 2.0994 2.7992 3.499 SE +/- 0.01, N = 3 3.11 1. (CXX) g++ options: -O3 -std=c++11 -U_FORTIFY_SOURCE -lm -lpthread
OpenBenchmarking.org Frames Per Second, More Is Better AOM AV1 3.1 Encoder Mode: Speed 6 Realtime - Input: Bosphorus 4K multicore-40c-shared 3 6 9 12 15 SE +/- 0.08, N = 3 12.67 1. (CXX) g++ options: -O3 -std=c++11 -U_FORTIFY_SOURCE -lm -lpthread
OpenBenchmarking.org Frames Per Second, More Is Better AOM AV1 3.1 Encoder Mode: Speed 6 Two-Pass - Input: Bosphorus 4K multicore-40c-shared 1.2195 2.439 3.6585 4.878 6.0975 SE +/- 0.02, N = 3 5.42 1. (CXX) g++ options: -O3 -std=c++11 -U_FORTIFY_SOURCE -lm -lpthread
OpenBenchmarking.org Frames Per Second, More Is Better AOM AV1 3.1 Encoder Mode: Speed 8 Realtime - Input: Bosphorus 4K multicore-40c-shared 7 14 21 28 35 SE +/- 0.11, N = 3 28.58 1. (CXX) g++ options: -O3 -std=c++11 -U_FORTIFY_SOURCE -lm -lpthread
OpenBenchmarking.org Frames Per Second, More Is Better AOM AV1 3.1 Encoder Mode: Speed 9 Realtime - Input: Bosphorus 4K multicore-40c-shared 8 16 24 32 40 SE +/- 0.37, N = 3 33.20 1. (CXX) g++ options: -O3 -std=c++11 -U_FORTIFY_SOURCE -lm -lpthread
OpenBenchmarking.org Frames Per Second, More Is Better AOM AV1 3.1 Encoder Mode: Speed 0 Two-Pass - Input: Bosphorus 1080p multicore-40c-shared 0.0878 0.1756 0.2634 0.3512 0.439 SE +/- 0.00, N = 3 0.39 1. (CXX) g++ options: -O3 -std=c++11 -U_FORTIFY_SOURCE -lm -lpthread
OpenBenchmarking.org Frames Per Second, More Is Better AOM AV1 3.1 Encoder Mode: Speed 4 Two-Pass - Input: Bosphorus 1080p multicore-40c-shared 1.1565 2.313 3.4695 4.626 5.7825 SE +/- 0.03, N = 3 5.14 1. (CXX) g++ options: -O3 -std=c++11 -U_FORTIFY_SOURCE -lm -lpthread
OpenBenchmarking.org Frames Per Second, More Is Better AOM AV1 3.1 Encoder Mode: Speed 6 Realtime - Input: Bosphorus 1080p multicore-40c-shared 5 10 15 20 25 SE +/- 0.25, N = 3 19.87 1. (CXX) g++ options: -O3 -std=c++11 -U_FORTIFY_SOURCE -lm -lpthread
OpenBenchmarking.org Frames Per Second, More Is Better AOM AV1 3.1 Encoder Mode: Speed 6 Two-Pass - Input: Bosphorus 1080p multicore-40c-shared 4 8 12 16 20 SE +/- 0.09, N = 3 14.29 1. (CXX) g++ options: -O3 -std=c++11 -U_FORTIFY_SOURCE -lm -lpthread
OpenBenchmarking.org Frames Per Second, More Is Better AOM AV1 3.1 Encoder Mode: Speed 8 Realtime - Input: Bosphorus 1080p multicore-40c-shared 15 30 45 60 75 SE +/- 0.33, N = 3 68.24 1. (CXX) g++ options: -O3 -std=c++11 -U_FORTIFY_SOURCE -lm -lpthread
OpenBenchmarking.org Frames Per Second, More Is Better AOM AV1 3.1 Encoder Mode: Speed 9 Realtime - Input: Bosphorus 1080p multicore-40c-shared 20 40 60 80 100 SE +/- 0.49, N = 3 74.45 1. (CXX) g++ options: -O3 -std=c++11 -U_FORTIFY_SOURCE -lm -lpthread
Embree Intel Embree is a collection of high-performance ray-tracing kernels for execution on CPUs and supporting instruction sets such as SSE, AVX, AVX2, and AVX-512. Embree also supports making use of the Intel SPMD Program Compiler (ISPC). Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Frames Per Second, More Is Better Embree 3.13 Binary: Pathtracer - Model: Crown multicore-40c-shared 5 10 15 20 25 SE +/- 0.00, N = 3 21.43 MIN: 21.21 / MAX: 21.74
OpenBenchmarking.org Frames Per Second, More Is Better Embree 3.13 Binary: Pathtracer ISPC - Model: Crown multicore-40c-shared 5 10 15 20 25 SE +/- 0.16, N = 3 18.85 MIN: 18.37 / MAX: 19.55
OpenBenchmarking.org Frames Per Second, More Is Better Embree 3.13 Binary: Pathtracer - Model: Asian Dragon multicore-40c-shared 5 10 15 20 25 SE +/- 0.19, N = 3 22.50 MIN: 21.94 / MAX: 23
OpenBenchmarking.org Frames Per Second, More Is Better Embree 3.13 Binary: Pathtracer - Model: Asian Dragon Obj multicore-40c-shared 5 10 15 20 25 SE +/- 0.13, N = 3 20.75 MIN: 20.45 / MAX: 21.38
OpenBenchmarking.org Frames Per Second, More Is Better Embree 3.13 Binary: Pathtracer ISPC - Model: Asian Dragon multicore-40c-shared 5 10 15 20 25 SE +/- 0.12, N = 3 22.76 MIN: 22.34 / MAX: 23.27
OpenBenchmarking.org Frames Per Second, More Is Better Embree 3.13 Binary: Pathtracer ISPC - Model: Asian Dragon Obj multicore-40c-shared 5 10 15 20 25 SE +/- 0.14, N = 3 20.27 MIN: 19.85 / MAX: 20.77
Kvazaar This is a test of Kvazaar as a CPU-based H.265 video encoder written in the C programming language and optimized in Assembly. Kvazaar is the winner of the 2016 ACM Open-Source Software Competition and developed at the Ultra Video Group, Tampere University, Finland. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Frames Per Second, More Is Better Kvazaar 2.0 Video Input: Bosphorus 4K - Video Preset: Slow multicore-40c-shared 3 6 9 12 15 SE +/- 0.08, N = 15 9.55 1. (CC) gcc options: -pthread -ftree-vectorize -fvisibility=hidden -O2 -lpthread -lm -lrt
OpenBenchmarking.org Frames Per Second, More Is Better Kvazaar 2.0 Video Input: Bosphorus 4K - Video Preset: Medium multicore-40c-shared 3 6 9 12 15 SE +/- 0.02, N = 3 10.05 1. (CC) gcc options: -pthread -ftree-vectorize -fvisibility=hidden -O2 -lpthread -lm -lrt
OpenBenchmarking.org Frames Per Second, More Is Better Kvazaar 2.0 Video Input: Bosphorus 1080p - Video Preset: Slow multicore-40c-shared 7 14 21 28 35 SE +/- 0.26, N = 12 31.63 1. (CC) gcc options: -pthread -ftree-vectorize -fvisibility=hidden -O2 -lpthread -lm -lrt
OpenBenchmarking.org Frames Per Second, More Is Better Kvazaar 2.0 Video Input: Bosphorus 1080p - Video Preset: Medium multicore-40c-shared 8 16 24 32 40 SE +/- 0.03, N = 3 32.95 1. (CC) gcc options: -pthread -ftree-vectorize -fvisibility=hidden -O2 -lpthread -lm -lrt
OpenBenchmarking.org Frames Per Second, More Is Better Kvazaar 2.0 Video Input: Bosphorus 4K - Video Preset: Very Fast multicore-40c-shared 5 10 15 20 25 SE +/- 0.03, N = 3 22.47 1. (CC) gcc options: -pthread -ftree-vectorize -fvisibility=hidden -O2 -lpthread -lm -lrt
OpenBenchmarking.org Frames Per Second, More Is Better Kvazaar 2.0 Video Input: Bosphorus 4K - Video Preset: Ultra Fast multicore-40c-shared 8 16 24 32 40 SE +/- 0.02, N = 3 35.08 1. (CC) gcc options: -pthread -ftree-vectorize -fvisibility=hidden -O2 -lpthread -lm -lrt
OpenBenchmarking.org Frames Per Second, More Is Better Kvazaar 2.0 Video Input: Bosphorus 1080p - Video Preset: Very Fast multicore-40c-shared 16 32 48 64 80 SE +/- 0.42, N = 3 69.70 1. (CC) gcc options: -pthread -ftree-vectorize -fvisibility=hidden -O2 -lpthread -lm -lrt
OpenBenchmarking.org Frames Per Second, More Is Better Kvazaar 2.0 Video Input: Bosphorus 1080p - Video Preset: Ultra Fast multicore-40c-shared 30 60 90 120 150 SE +/- 0.04, N = 3 114.09 1. (CC) gcc options: -pthread -ftree-vectorize -fvisibility=hidden -O2 -lpthread -lm -lrt
SVT-AV1 This is a benchmark of the SVT-AV1 open-source video encoder/decoder. SVT-AV1 was originally developed by Intel as part of their Open Visual Cloud / Scalable Video Technology (SVT). Development of SVT-AV1 has since moved to the Alliance for Open Media as part of upstream AV1 development. SVT-AV1 is a CPU-based multi-threaded video encoder for the AV1 video format with a sample YUV video file. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 0.8.7 Encoder Mode: Preset 4 - Input: Bosphorus 4K multicore-40c-shared 0.3132 0.6264 0.9396 1.2528 1.566 SE +/- 0.010, N = 3 1.392 1. (CXX) g++ options: -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq -pie
OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 0.8.7 Encoder Mode: Preset 8 - Input: Bosphorus 4K multicore-40c-shared 4 8 12 16 20 SE +/- 0.23, N = 12 15.67 1. (CXX) g++ options: -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq -pie
OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 0.8.7 Encoder Mode: Preset 4 - Input: Bosphorus 1080p multicore-40c-shared 1.0197 2.0394 3.0591 4.0788 5.0985 SE +/- 0.030, N = 3 4.532 1. (CXX) g++ options: -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq -pie
OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 0.8.7 Encoder Mode: Preset 8 - Input: Bosphorus 1080p multicore-40c-shared 14 28 42 56 70 SE +/- 0.82, N = 12 60.55 1. (CXX) g++ options: -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq -pie
SVT-HEVC This is a test of the Intel Open Visual Cloud Scalable Video Technology SVT-HEVC CPU-based multi-threaded video encoder for the HEVC / H.265 video format with a sample 1080p YUV video file. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Frames Per Second, More Is Better SVT-HEVC 1.5.0 Tuning: 1 - Input: Bosphorus 1080p multicore-40c-shared 5 10 15 20 25 SE +/- 0.19, N = 5 18.39 1. (CC) gcc options: -fPIE -fPIC -O3 -O2 -pie -rdynamic -lpthread -lrt
OpenBenchmarking.org Frames Per Second, More Is Better SVT-HEVC 1.5.0 Tuning: 7 - Input: Bosphorus 1080p multicore-40c-shared 50 100 150 200 250 SE +/- 0.14, N = 3 219.86 1. (CC) gcc options: -fPIE -fPIC -O3 -O2 -pie -rdynamic -lpthread -lrt
OpenBenchmarking.org Frames Per Second, More Is Better SVT-HEVC 1.5.0 Tuning: 10 - Input: Bosphorus 1080p multicore-40c-shared 80 160 240 320 400 SE +/- 0.97, N = 3 383.96 1. (CC) gcc options: -fPIE -fPIC -O3 -O2 -pie -rdynamic -lpthread -lrt
SVT-VP9 This is a test of the Intel Open Visual Cloud Scalable Video Technology SVT-VP9 CPU-based multi-threaded video encoder for the VP9 video format with a sample YUV input video file. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Frames Per Second, More Is Better SVT-VP9 0.3 Tuning: VMAF Optimized - Input: Bosphorus 1080p multicore-40c-shared 60 120 180 240 300 SE +/- 18.82, N = 12 296.97 1. (CC) gcc options: -O3 -fcommon -fPIE -fPIC -fvisibility=hidden -pie -rdynamic -lpthread -lrt -lm
OpenBenchmarking.org Frames Per Second, More Is Better SVT-VP9 0.3 Tuning: PSNR/SSIM Optimized - Input: Bosphorus 1080p multicore-40c-shared 70 140 210 280 350 SE +/- 1.55, N = 3 313.44 1. (CC) gcc options: -O3 -fcommon -fPIE -fPIC -fvisibility=hidden -pie -rdynamic -lpthread -lrt -lm
OpenBenchmarking.org Frames Per Second, More Is Better SVT-VP9 0.3 Tuning: Visual Quality Optimized - Input: Bosphorus 1080p multicore-40c-shared 50 100 150 200 250 SE +/- 0.60, N = 3 249.12 1. (CC) gcc options: -O3 -fcommon -fPIE -fPIC -fvisibility=hidden -pie -rdynamic -lpthread -lrt -lm
OpenBenchmarking.org Frames Per Second, More Is Better VP9 libvpx Encoding 1.10.0 Speed: Speed 5 - Input: Bosphorus 4K multicore-40c-shared 2 4 6 8 10 SE +/- 0.10, N = 3 7.45 1. (CXX) g++ options: -m64 -lm -lpthread -O3 -fPIC -U_FORTIFY_SOURCE -std=gnu++11
OpenBenchmarking.org Frames Per Second, More Is Better VP9 libvpx Encoding 1.10.0 Speed: Speed 0 - Input: Bosphorus 1080p multicore-40c-shared 2 4 6 8 10 SE +/- 0.12, N = 3 8.58 1. (CXX) g++ options: -m64 -lm -lpthread -O3 -fPIC -U_FORTIFY_SOURCE -std=gnu++11
OpenBenchmarking.org Frames Per Second, More Is Better VP9 libvpx Encoding 1.10.0 Speed: Speed 5 - Input: Bosphorus 1080p multicore-40c-shared 4 8 12 16 20 SE +/- 0.17, N = 3 15.45 1. (CXX) g++ options: -m64 -lm -lpthread -O3 -fPIC -U_FORTIFY_SOURCE -std=gnu++11
x264 This is a simple test of the x264 encoder run on the CPU (OpenCL support disabled) with a sample video file. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Frames Per Second, More Is Better x264 2019-12-17 H.264 Video Encoding multicore-40c-shared 30 60 90 120 150 SE +/- 6.78, N = 12 140.51 1. (CC) gcc options: -ldl -lavformat -lavcodec -lavutil -lswscale -m64 -lm -lpthread -O3 -ffast-math -std=gnu99 -fPIC -fomit-frame-pointer -fno-tree-vectorize
OpenBenchmarking.org Frames Per Second, More Is Better x265 3.4 Video Input: Bosphorus 1080p multicore-40c-shared 13 26 39 52 65 SE +/- 0.42, N = 14 57.63 1. (CXX) g++ options: -O3 -rdynamic -lpthread -lrt -ldl -lnuma
OpenBenchmarking.org Items / Sec, More Is Better OpenVKL 0.9 Benchmark: vklBenchmarkVdbVolume multicore-40c-shared 5M 10M 15M 20M 25M SE +/- 82579.12, N = 3 23780483 MIN: 866256 / MAX: 131388624
OpenBenchmarking.org Items / Sec, More Is Better OpenVKL 0.9 Benchmark: vklBenchmarkStructuredVolume multicore-40c-shared 20M 40M 60M 80M 100M SE +/- 371048.93, N = 3 93888931 MIN: 1022989 / MAX: 995673960
OpenBenchmarking.org Items / Sec, More Is Better OpenVKL 0.9 Benchmark: vklBenchmarkUnstructuredVolume multicore-40c-shared 400K 800K 1200K 1600K 2000K SE +/- 13759.58, N = 3 1864738 MIN: 13506 / MAX: 6225496
GraphicsMagick This is a test of GraphicsMagick with its OpenMP implementation that performs various imaging tests on a sample 6000x4000 pixel JPEG image. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Iterations Per Minute, More Is Better GraphicsMagick 1.3.33 Operation: Swirl multicore-40c-shared 200 400 600 800 1000 SE +/- 9.60, N = 4 822 1. (CC) gcc options: -fopenmp -O2 -pthread -ljbig -lwebp -lwebpmux -ltiff -lfreetype -ljpeg -lXext -lSM -lICE -lX11 -llzma -lbz2 -lxml2 -lz -lm -lpthread
OpenBenchmarking.org Iterations Per Minute, More Is Better GraphicsMagick 1.3.33 Operation: Rotate multicore-40c-shared 100 200 300 400 500 SE +/- 3.77, N = 9 452 1. (CC) gcc options: -fopenmp -O2 -pthread -ljbig -lwebp -lwebpmux -ltiff -lfreetype -ljpeg -lXext -lSM -lICE -lX11 -llzma -lbz2 -lxml2 -lz -lm -lpthread
OpenBenchmarking.org Iterations Per Minute, More Is Better GraphicsMagick 1.3.33 Operation: Sharpen multicore-40c-shared 50 100 150 200 250 SE +/- 1.45, N = 3 229 1. (CC) gcc options: -fopenmp -O2 -pthread -ljbig -lwebp -lwebpmux -ltiff -lfreetype -ljpeg -lXext -lSM -lICE -lX11 -llzma -lbz2 -lxml2 -lz -lm -lpthread
OpenBenchmarking.org Iterations Per Minute, More Is Better GraphicsMagick 1.3.33 Operation: Enhanced multicore-40c-shared 110 220 330 440 550 SE +/- 0.58, N = 3 514 1. (CC) gcc options: -fopenmp -O2 -pthread -ljbig -lwebp -lwebpmux -ltiff -lfreetype -ljpeg -lXext -lSM -lICE -lX11 -llzma -lbz2 -lxml2 -lz -lm -lpthread
OpenBenchmarking.org Iterations Per Minute, More Is Better GraphicsMagick 1.3.33 Operation: Resizing multicore-40c-shared 400 800 1200 1600 2000 SE +/- 5.33, N = 3 1787 1. (CC) gcc options: -fopenmp -O2 -pthread -ljbig -lwebp -lwebpmux -ltiff -lfreetype -ljpeg -lXext -lSM -lICE -lX11 -llzma -lbz2 -lxml2 -lz -lm -lpthread
OpenBenchmarking.org Iterations Per Minute, More Is Better GraphicsMagick 1.3.33 Operation: Noise-Gaussian multicore-40c-shared 70 140 210 280 350 SE +/- 3.67, N = 3 331 1. (CC) gcc options: -fopenmp -O2 -pthread -ljbig -lwebp -lwebpmux -ltiff -lfreetype -ljpeg -lXext -lSM -lICE -lX11 -llzma -lbz2 -lxml2 -lz -lm -lpthread
OpenBenchmarking.org Iterations Per Minute, More Is Better GraphicsMagick 1.3.33 Operation: HWB Color Space multicore-40c-shared 130 260 390 520 650 SE +/- 4.14, N = 15 604 1. (CC) gcc options: -fopenmp -O2 -pthread -ljbig -lwebp -lwebpmux -ltiff -lfreetype -ljpeg -lXext -lSM -lICE -lX11 -llzma -lbz2 -lxml2 -lz -lm -lpthread
ASKAP ASKAP is a set of benchmarks from the Australian SKA Pathfinder. The principal ASKAP benchmarks are the Hogbom Clean Benchmark (tHogbomClean) and Convolutional Resamping Benchmark (tConvolve) as well as some previous ASKAP benchmarks being included as well for OpenCL and CUDA execution of tConvolve. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Iterations Per Second, More Is Better ASKAP 1.0 Test: Hogbom Clean OpenMP multicore-40c-shared 120 240 360 480 600 SE +/- 7.48, N = 3 536.88 1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp
LuxCoreRender LuxCoreRender is an open-source 3D physically based renderer formerly known as LuxRender. LuxCoreRender supports CPU-based rendering as well as GPU acceleration via OpenCL, NVIDIA CUDA, and NVIDIA OptiX interfaces. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org M samples/sec, More Is Better LuxCoreRender 2.5 Scene: DLSC - Acceleration: CPU multicore-40c-shared 0.7695 1.539 2.3085 3.078 3.8475 SE +/- 0.03, N = 5 3.42 MIN: 3.18 / MAX: 3.78
OpenBenchmarking.org M samples/sec, More Is Better LuxCoreRender 2.5 Scene: Danish Mood - Acceleration: CPU multicore-40c-shared 0.5535 1.107 1.6605 2.214 2.7675 SE +/- 0.02, N = 3 2.46 MIN: 0.81 / MAX: 2.92
OpenBenchmarking.org M samples/sec, More Is Better LuxCoreRender 2.5 Scene: Orange Juice - Acceleration: CPU multicore-40c-shared 1.2083 2.4166 3.6249 4.8332 6.0415 SE +/- 0.02, N = 3 5.37 MIN: 4.95 / MAX: 5.47
OpenBenchmarking.org M samples/sec, More Is Better LuxCoreRender 2.5 Scene: LuxCore Benchmark - Acceleration: CPU multicore-40c-shared 0.5175 1.035 1.5525 2.07 2.5875 SE +/- 0.09, N = 12 2.30 MIN: 0.59 / MAX: 3.3
OpenBenchmarking.org M samples/sec, More Is Better LuxCoreRender 2.5 Scene: Rainbow Colors and Prism - Acceleration: CPU multicore-40c-shared 1.2038 2.4076 3.6114 4.8152 6.019 SE +/- 0.36, N = 15 5.35 MIN: 2.74 / MAX: 7.63
Zstd Compression This test measures the time needed to compress/decompress a sample file (a FreeBSD disk image - FreeBSD-12.2-RELEASE-amd64-memstick.img) using Zstd compression with options for different compression levels / settings. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org MB/s, More Is Better Zstd Compression 1.5.0 Compression Level: 3 - Compression Speed multicore-40c-shared 400 800 1200 1600 2000 SE +/- 10.37, N = 3 1676.0 1. (CC) gcc options: -O3 -pthread -lz -llzma
OpenBenchmarking.org MB/s, More Is Better Zstd Compression 1.5.0 Compression Level: 3 - Decompression Speed multicore-40c-shared 400 800 1200 1600 2000 SE +/- 4.62, N = 3 2002.3 1. (CC) gcc options: -O3 -pthread -lz -llzma
OpenBenchmarking.org MB/s, More Is Better Zstd Compression 1.5.0 Compression Level: 8 - Decompression Speed multicore-40c-shared 400 800 1200 1600 2000 SE +/- 2.08, N = 3 2004.9 1. (CC) gcc options: -O3 -pthread -lz -llzma
OpenBenchmarking.org MB/s, More Is Better Zstd Compression 1.5.0 Compression Level: 19 - Decompression Speed multicore-40c-shared 400 800 1200 1600 2000 SE +/- 7.49, N = 3 1652.2 1. (CC) gcc options: -O3 -pthread -lz -llzma
OpenBenchmarking.org MB/s, More Is Better Zstd Compression 1.5.0 Compression Level: 3, Long Mode - Compression Speed multicore-40c-shared 130 260 390 520 650 SE +/- 2.42, N = 3 600.5 1. (CC) gcc options: -O3 -pthread -lz -llzma
OpenBenchmarking.org MB/s, More Is Better Zstd Compression 1.5.0 Compression Level: 3, Long Mode - Decompression Speed multicore-40c-shared 400 800 1200 1600 2000 SE +/- 6.18, N = 3 2079.1 1. (CC) gcc options: -O3 -pthread -lz -llzma
OpenBenchmarking.org MB/s, More Is Better Zstd Compression 1.5.0 Compression Level: 8, Long Mode - Compression Speed multicore-40c-shared 130 260 390 520 650 SE +/- 1.34, N = 3 614.9 1. (CC) gcc options: -O3 -pthread -lz -llzma
OpenBenchmarking.org MB/s, More Is Better Zstd Compression 1.5.0 Compression Level: 8, Long Mode - Decompression Speed multicore-40c-shared 400 800 1200 1600 2000 SE +/- 9.87, N = 3 2034.8 1. (CC) gcc options: -O3 -pthread -lz -llzma
OpenBenchmarking.org MB/s, More Is Better Zstd Compression 1.5.0 Compression Level: 19, Long Mode - Decompression Speed multicore-40c-shared 400 800 1200 1600 2000 SE +/- 1.47, N = 3 1652.9 1. (CC) gcc options: -O3 -pthread -lz -llzma
ASKAP ASKAP is a set of benchmarks from the Australian SKA Pathfinder. The principal ASKAP benchmarks are the Hogbom Clean Benchmark (tHogbomClean) and Convolutional Resamping Benchmark (tConvolve) as well as some previous ASKAP benchmarks being included as well for OpenCL and CUDA execution of tConvolve. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Million Grid Points Per Second, More Is Better ASKAP 1.0 Test: tConvolve MT - Gridding multicore-40c-shared 600 1200 1800 2400 3000 SE +/- 21.51, N = 3 2986.91 1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp
OpenBenchmarking.org Million Grid Points Per Second, More Is Better ASKAP 1.0 Test: tConvolve MT - Degridding multicore-40c-shared 700 1400 2100 2800 3500 SE +/- 19.87, N = 3 3210.73 1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp
OpenBenchmarking.org Million Grid Points Per Second, More Is Better ASKAP 1.0 Test: tConvolve OpenMP - Gridding multicore-40c-shared 1500 3000 4500 6000 7500 SE +/- 59.89, N = 3 6946.85 1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp
OpenBenchmarking.org Million Grid Points Per Second, More Is Better ASKAP 1.0 Test: tConvolve OpenMP - Degridding multicore-40c-shared 2K 4K 6K 8K 10K SE +/- 0.00, N = 3 8068.36 1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp
ASKAP ASKAP is a set of benchmarks from the Australian SKA Pathfinder. The principal ASKAP benchmarks are the Hogbom Clean Benchmark (tHogbomClean) and Convolutional Resamping Benchmark (tConvolve) as well as some previous ASKAP benchmarks being included as well for OpenCL and CUDA execution of tConvolve. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Mpix/sec, More Is Better ASKAP 1.0 Test: tConvolve MPI - Degridding multicore-40c-shared 1000 2000 3000 4000 5000 SE +/- 32.18, N = 15 4685.51 1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp
OpenBenchmarking.org Mpix/sec, More Is Better ASKAP 1.0 Test: tConvolve MPI - Gridding multicore-40c-shared 1300 2600 3900 5200 6500 SE +/- 41.10, N = 15 6130.95 1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp
Stockfish This is a test of Stockfish, an advanced open-source C++11 chess benchmark that can scale up to 512 CPU threads. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Nodes Per Second, More Is Better Stockfish 13 Total Time multicore-40c-shared 11M 22M 33M 44M 55M SE +/- 423071.20, N = 15 53451711 1. (CXX) g++ options: -lgcov -m64 -lpthread -fno-exceptions -std=c++17 -fprofile-use -fno-peel-loops -fno-tracer -pedantic -O3 -msse -msse3 -mpopcnt -mavx2 -mavx512f -mavx512bw -msse4.1 -mssse3 -msse2 -mbmi2 -flto -flto=jobserver
GROMACS The GROMACS (GROningen MAchine for Chemical Simulations) molecular dynamics package testing with the water_GMX50 data. This test profile allows selecting between CPU and GPU-based GROMACS builds. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Ns Per Day, More Is Better GROMACS 2021.2 Implementation: MPI CPU - Input: water_GMX50_bare multicore-40c-shared 0.529 1.058 1.587 2.116 2.645 SE +/- 0.091, N = 15 2.351 1. (CXX) g++ options: -O3 -pthread
OpenBenchmarking.org Queries Per Second, More Is Better MariaDB 10.5.2 Clients: 4 multicore-40c-shared 160 320 480 640 800 SE +/- 37.34, N = 9 738 1. (CXX) g++ options: -pie -fPIC -fstack-protector -O2 -lpthread -llzma -lbz2 -laio -lnuma -lpcre2-8 -lcrypt -lz -lm -lssl -lcrypto -ldl
OpenBenchmarking.org Queries Per Second, More Is Better MariaDB 10.5.2 Clients: 8 multicore-40c-shared 200 400 600 800 1000 SE +/- 3.87, N = 3 796 1. (CXX) g++ options: -pie -fPIC -fstack-protector -O2 -lpthread -llzma -lbz2 -laio -lnuma -lpcre2-8 -lcrypt -lz -lm -lssl -lcrypto -ldl
OpenBenchmarking.org Queries Per Second, More Is Better MariaDB 10.5.2 Clients: 16 multicore-40c-shared 140 280 420 560 700 SE +/- 26.25, N = 6 654 1. (CXX) g++ options: -pie -fPIC -fstack-protector -O2 -lpthread -llzma -lbz2 -laio -lnuma -lpcre2-8 -lcrypt -lz -lm -lssl -lcrypto -ldl
OpenBenchmarking.org Queries Per Second, More Is Better MariaDB 10.5.2 Clients: 32 multicore-40c-shared 110 220 330 440 550 SE +/- 16.55, N = 9 502 1. (CXX) g++ options: -pie -fPIC -fstack-protector -O2 -lpthread -llzma -lbz2 -laio -lnuma -lpcre2-8 -lcrypt -lz -lm -lssl -lcrypto -ldl
OpenBenchmarking.org Queries Per Second, More Is Better MariaDB 10.5.2 Clients: 64 multicore-40c-shared 90 180 270 360 450 SE +/- 10.75, N = 6 436 1. (CXX) g++ options: -pie -fPIC -fstack-protector -O2 -lpthread -llzma -lbz2 -laio -lnuma -lpcre2-8 -lcrypt -lz -lm -lssl -lcrypto -ldl
OpenBenchmarking.org Queries Per Second, More Is Better MariaDB 10.5.2 Clients: 128 multicore-40c-shared 70 140 210 280 350 SE +/- 3.78, N = 3 329 1. (CXX) g++ options: -pie -fPIC -fstack-protector -O2 -lpthread -llzma -lbz2 -laio -lnuma -lpcre2-8 -lcrypt -lz -lm -lssl -lcrypto -ldl
OpenBenchmarking.org Queries Per Second, More Is Better MariaDB 10.5.2 Clients: 256 multicore-40c-shared 60 120 180 240 300 SE +/- 2.02, N = 3 292 1. (CXX) g++ options: -pie -fPIC -fstack-protector -O2 -lpthread -llzma -lbz2 -laio -lnuma -lpcre2-8 -lcrypt -lz -lm -lssl -lcrypto -ldl
OpenBenchmarking.org Queries Per Second, More Is Better MariaDB 10.5.2 Clients: 512 multicore-40c-shared 60 120 180 240 300 SE +/- 3.03, N = 3 294 1. (CXX) g++ options: -pie -fPIC -fstack-protector -O2 -lpthread -llzma -lbz2 -laio -lnuma -lpcre2-8 -lcrypt -lz -lm -lssl -lcrypto -ldl
OpenBenchmarking.org Real C/S, More Is Better John The Ripper 1.9.0-jumbo-1 Test: MD5 multicore-40c-shared 900K 1800K 2700K 3600K 4500K SE +/- 8647.41, N = 3 4318333 1. (CC) gcc options: -m64 -lssl -lcrypto -fopenmp -lgmp -pthread -lm -lz -ldl -lcrypt -lbz2
NAS Parallel Benchmarks NPB, NAS Parallel Benchmarks, is a benchmark developed by NASA for high-end computer systems. This test profile currently uses the MPI version of NPB. This test profile offers selecting the different NPB tests/problems and varying problem sizes. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Total Mop/s, More Is Better NAS Parallel Benchmarks 3.4 Test / Class: BT.C multicore-40c-shared 20K 40K 60K 80K 100K SE +/- 323.57, N = 3 79136.06 1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi 2. Open MPI 4.0.3
OpenBenchmarking.org Total Mop/s, More Is Better NAS Parallel Benchmarks 3.4 Test / Class: CG.C multicore-40c-shared 4K 8K 12K 16K 20K SE +/- 108.53, N = 3 16758.30 1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi 2. Open MPI 4.0.3
OpenBenchmarking.org Total Mop/s, More Is Better NAS Parallel Benchmarks 3.4 Test / Class: EP.C multicore-40c-shared 900 1800 2700 3600 4500 SE +/- 43.38, N = 5 4097.83 1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi 2. Open MPI 4.0.3
OpenBenchmarking.org Total Mop/s, More Is Better NAS Parallel Benchmarks 3.4 Test / Class: EP.D multicore-40c-shared 800 1600 2400 3200 4000 SE +/- 127.03, N = 12 3951.07 1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi 2. Open MPI 4.0.3
OpenBenchmarking.org Total Mop/s, More Is Better NAS Parallel Benchmarks 3.4 Test / Class: FT.C multicore-40c-shared 9K 18K 27K 36K 45K SE +/- 470.96, N = 3 40867.99 1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi 2. Open MPI 4.0.3
OpenBenchmarking.org Total Mop/s, More Is Better NAS Parallel Benchmarks 3.4 Test / Class: IS.D multicore-40c-shared 300 600 900 1200 1500 SE +/- 18.23, N = 3 1367.79 1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi 2. Open MPI 4.0.3
OpenBenchmarking.org Total Mop/s, More Is Better NAS Parallel Benchmarks 3.4 Test / Class: LU.C multicore-40c-shared 16K 32K 48K 64K 80K SE +/- 714.10, N = 15 76891.25 1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi 2. Open MPI 4.0.3
OpenBenchmarking.org Total Mop/s, More Is Better NAS Parallel Benchmarks 3.4 Test / Class: MG.C multicore-40c-shared 9K 18K 27K 36K 45K SE +/- 558.29, N = 15 40309.83 1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi 2. Open MPI 4.0.3
OpenBenchmarking.org Total Mop/s, More Is Better NAS Parallel Benchmarks 3.4 Test / Class: SP.B multicore-40c-shared 9K 18K 27K 36K 45K SE +/- 332.50, N = 15 40434.00 1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi 2. Open MPI 4.0.3
OpenBenchmarking.org Total Mop/s, More Is Better NAS Parallel Benchmarks 3.4 Test / Class: SP.C multicore-40c-shared 7K 14K 21K 28K 35K SE +/- 632.95, N = 15 32677.55 1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi 2. Open MPI 4.0.3
OpenBenchmarking.org Average usec, Fewer Is Better Intel MPI Benchmarks 2019.3 Test: IMB-MPI1 Sendrecv multicore-40c-shared 50 100 150 200 250 SE +/- 2.35, N = 3 248.25 MIN: 1.17 / MAX: 4065.6 1. (CXX) g++ options: -O0 -pedantic -fopenmp -pthread -lmpi_cxx -lmpi
NAMD NAMD is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. NAMD was developed by the Theoretical and Computational Biophysics Group in the Beckman Institute for Advanced Science and Technology at the University of Illinois at Urbana-Champaign. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org days/ns, Fewer Is Better NAMD 2.14 ATPase Simulation - 327,506 Atoms multicore-40c-shared 0.1527 0.3054 0.4581 0.6108 0.7635 SE +/- 0.00038, N = 3 0.67854
OpenBenchmarking.org Hydro Cycle Time - Seconds, Fewer Is Better Pennant 1.0.1 Test: leblancbig multicore-40c-shared 4 8 12 16 20 SE +/- 0.07, N = 3 14.53 1. (CXX) g++ options: -fopenmp -pthread -lmpi_cxx -lmpi
oneDNN This is a test of the Intel oneDNN as an Intel-optimized library for Deep Neural Networks and making use of its built-in benchdnn functionality. The result is the total perf time reported. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and MKL-DNN before being rebranded as part of the Intel oneAPI initiative. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.1.2 Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU multicore-40c-shared 0.3255 0.651 0.9765 1.302 1.6275 SE +/- 0.01770, N = 3 1.44653 MIN: 1.37 1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.1.2 Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU multicore-40c-shared 0.8796 1.7592 2.6388 3.5184 4.398 SE +/- 0.00812, N = 3 3.90947 MIN: 3.65 1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.1.2 Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPU multicore-40c-shared 0.1935 0.387 0.5805 0.774 0.9675 SE +/- 0.003828, N = 3 0.860158 MIN: 0.82 1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.1.2 Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU multicore-40c-shared 0.2428 0.4856 0.7284 0.9712 1.214 SE +/- 0.00144, N = 3 1.07902 MIN: 1.03 1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.1.2 Harness: IP Shapes 1D - Data Type: bf16bf16bf16 - Engine: CPU multicore-40c-shared 1.0039 2.0078 3.0117 4.0156 5.0195 SE +/- 0.01067, N = 3 4.46188 MIN: 4.3 1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.1.2 Harness: IP Shapes 3D - Data Type: bf16bf16bf16 - Engine: CPU multicore-40c-shared 0.9989 1.9978 2.9967 3.9956 4.9945 SE +/- 0.00773, N = 3 4.43945 MIN: 4.26 1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.1.2 Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU multicore-40c-shared 0.8651 1.7302 2.5953 3.4604 4.3255 SE +/- 0.00622, N = 3 3.84481 MIN: 3.68 1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.1.2 Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU multicore-40c-shared 1.2449 2.4898 3.7347 4.9796 6.2245 SE +/- 0.00262, N = 3 5.53268 MIN: 3.63 1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.1.2 Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU multicore-40c-shared 0.3947 0.7894 1.1841 1.5788 1.9735 SE +/- 0.00302, N = 3 1.75413 MIN: 1.67 1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.1.2 Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU multicore-40c-shared 0.8436 1.6872 2.5308 3.3744 4.218 SE +/- 0.01542, N = 3 3.74933 MIN: 3.6 1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.1.2 Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU multicore-40c-shared 0.1708 0.3416 0.5124 0.6832 0.854 SE +/- 0.000566, N = 3 0.759291 MIN: 0.73 1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.1.2 Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU multicore-40c-shared 0.231 0.462 0.693 0.924 1.155 SE +/- 0.00262, N = 3 1.02688 MIN: 0.99 1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.1.2 Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU multicore-40c-shared 200 400 600 800 1000 SE +/- 6.05, N = 3 1017.74 MIN: 998.86 1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.1.2 Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU multicore-40c-shared 130 260 390 520 650 SE +/- 4.47, N = 3 609.21 MIN: 593.64 1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.1.2 Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU multicore-40c-shared 200 400 600 800 1000 SE +/- 4.34, N = 3 1017.17 MIN: 1003.2 1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.1.2 Harness: Convolution Batch Shapes Auto - Data Type: bf16bf16bf16 - Engine: CPU multicore-40c-shared 1.1327 2.2654 3.3981 4.5308 5.6635 SE +/- 0.00177, N = 3 5.03433 MIN: 5 1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.1.2 Harness: Deconvolution Batch shapes_1d - Data Type: bf16bf16bf16 - Engine: CPU multicore-40c-shared 2 4 6 8 10 SE +/- 0.04336, N = 3 7.45023 MIN: 7.29 1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.1.2 Harness: Deconvolution Batch shapes_3d - Data Type: bf16bf16bf16 - Engine: CPU multicore-40c-shared 2 4 6 8 10 SE +/- 0.00230, N = 3 6.87810 MIN: 6.78 1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.1.2 Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU multicore-40c-shared 130 260 390 520 650 SE +/- 3.32, N = 3 606.54 MIN: 595.57 1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.1.2 Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU multicore-40c-shared 0.106 0.212 0.318 0.424 0.53 SE +/- 0.011059, N = 15 0.471047 MIN: 0.37 1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.1.2 Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU multicore-40c-shared 200 400 600 800 1000 SE +/- 4.92, N = 3 1032.10 MIN: 1015.67 1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.1.2 Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU multicore-40c-shared 130 260 390 520 650 SE +/- 0.91, N = 3 613.01 MIN: 606.88 1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.1.2 Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32 - Engine: CPU multicore-40c-shared 0.0958 0.1916 0.2874 0.3832 0.479 SE +/- 0.004188, N = 15 0.425939 MIN: 0.39 1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.1.2 Harness: Matrix Multiply Batch Shapes Transformer - Data Type: bf16bf16bf16 - Engine: CPU multicore-40c-shared 0.2801 0.5602 0.8403 1.1204 1.4005 SE +/- 0.00658, N = 3 1.24498 MIN: 1.21 1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2021.1 Model: Age Gender Recognition Retail 0013 FP16 - Device: CPU multicore-40c-shared 0.1238 0.2476 0.3714 0.4952 0.619 SE +/- 0.00, N = 3 0.55
OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2021.1 Model: Age Gender Recognition Retail 0013 FP32 - Device: CPU multicore-40c-shared 0.1238 0.2476 0.3714 0.4952 0.619 SE +/- 0.00, N = 3 0.55
Rodinia Rodinia is a suite focused upon accelerating compute-intensive applications with accelerators. CUDA, OpenMP, and OpenCL parallel models are supported by the included applications. This profile utilizes select OpenCL, NVIDIA CUDA and OpenMP test binaries at the moment. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Seconds, Fewer Is Better Rodinia 3.1 Test: OpenMP LavaMD multicore-40c-shared 20 40 60 80 100 SE +/- 0.17, N = 3 110.10 1. (CXX) g++ options: -O2 -lOpenCL
OpenBenchmarking.org Seconds, Fewer Is Better Rodinia 3.1 Test: OpenMP HotSpot3D multicore-40c-shared 30 60 90 120 150 SE +/- 0.12, N = 3 113.57 1. (CXX) g++ options: -O2 -lOpenCL
OpenBenchmarking.org Seconds, Fewer Is Better Rodinia 3.1 Test: OpenMP Leukocyte multicore-40c-shared 11 22 33 44 55 SE +/- 0.62, N = 3 46.46 1. (CXX) g++ options: -O2 -lOpenCL
OpenBenchmarking.org Seconds, Fewer Is Better Rodinia 3.1 Test: OpenMP CFD Solver multicore-40c-shared 3 6 9 12 15 SE +/- 0.010, N = 3 9.547 1. (CXX) g++ options: -O2 -lOpenCL
OpenBenchmarking.org Seconds, Fewer Is Better Rodinia 3.1 Test: OpenMP Streamcluster multicore-40c-shared 4 8 12 16 20 SE +/- 0.03, N = 3 17.78 1. (CXX) g++ options: -O2 -lOpenCL
C-Ray This is a test of C-Ray, a simple raytracer designed to test the floating-point CPU performance. This test is multi-threaded (16 threads per core), will shoot 8 rays per pixel for anti-aliasing, and will generate a 1600 x 1200 image. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Seconds, Fewer Is Better C-Ray 1.1 Total Time - 4K, 16 Rays Per Pixel multicore-40c-shared 6 12 18 24 30 SE +/- 0.03, N = 3 24.71 1. (CC) gcc options: -lm -lpthread -O3
POV-Ray This is a test of POV-Ray, the Persistence of Vision Raytracer. POV-Ray is used to create 3D graphics using ray-tracing. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Seconds, Fewer Is Better POV-Ray 3.7.0.7 Trace Time multicore-40c-shared 15 30 45 60 75 SE +/- 6.77, N = 12 65.16 1. (CXX) g++ options: -pipe -O3 -ffast-math -march=native -pthread -lSDL -lXpm -lSM -lICE -lX11 -lIlmImf -lImath -lHalf -lIex -lIexMath -lIlmThread -lpthread -ltiff -ljpeg -lpng -lz -lrt -lm -lboost_thread -lboost_system
Rust Mandelbrot This test profile is of the combined time for the serial and parallel Mandelbrot sets written in Rustlang via willi-kappler/mandel-rust. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Seconds, Fewer Is Better Rust Mandelbrot Time To Complete Serial/Parallel Mandelbrot multicore-40c-shared 11 22 33 44 55 SE +/- 0.58, N = 3 50.44 1. (CC) gcc options: -m64 -pie -nodefaultlibs -ldl -lrt -lpthread -lgcc_s -lc -lm -lutil
Smallpt Smallpt is a C++ global illumination renderer written in less than 100 lines of code. Global illumination is done via unbiased Monte Carlo path tracing and there is multi-threading support via the OpenMP library. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Seconds, Fewer Is Better Smallpt 1.0 Global Illumination Renderer; 128 Samples multicore-40c-shared 2 4 6 8 10 SE +/- 0.165, N = 15 6.515 1. (CXX) g++ options: -fopenmp -O3
Tungsten Renderer Tungsten is a C++ physically based renderer that makes use of Intel's Embree ray tracing library. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Seconds, Fewer Is Better Tungsten Renderer 0.2.2 Scene: Hair multicore-40c-shared 4 8 12 16 20 SE +/- 0.02, N = 3 13.93 1. (CXX) g++ options: -std=c++0x -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -mfma -mbmi2 -mavx512f -mavx512vl -mavx512cd -mavx512dq -mavx512bw -mno-sse4a -mno-avx -mno-avx2 -mno-xop -mno-fma4 -mno-avx512pf -mno-avx512er -mno-avx512ifma -mno-avx512vbmi -fstrict-aliasing -O3 -rdynamic -lIlmImf -lIlmThread -lImath -lHalf -lIex -lz -ljpeg -lpthread -ldl
OpenBenchmarking.org Seconds, Fewer Is Better Tungsten Renderer 0.2.2 Scene: Water Caustic multicore-40c-shared 6 12 18 24 30 SE +/- 0.08, N = 3 27.37 1. (CXX) g++ options: -std=c++0x -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -mfma -mbmi2 -mavx512f -mavx512vl -mavx512cd -mavx512dq -mavx512bw -mno-sse4a -mno-avx -mno-avx2 -mno-xop -mno-fma4 -mno-avx512pf -mno-avx512er -mno-avx512ifma -mno-avx512vbmi -fstrict-aliasing -O3 -rdynamic -lIlmImf -lIlmThread -lImath -lHalf -lIex -lz -ljpeg -lpthread -ldl
OpenBenchmarking.org Seconds, Fewer Is Better Tungsten Renderer 0.2.2 Scene: Non-Exponential multicore-40c-shared 3 6 9 12 15 SE +/- 0.05258, N = 3 9.35835 1. (CXX) g++ options: -std=c++0x -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -mfma -mbmi2 -mavx512f -mavx512vl -mavx512cd -mavx512dq -mavx512bw -mno-sse4a -mno-avx -mno-avx2 -mno-xop -mno-fma4 -mno-avx512pf -mno-avx512er -mno-avx512ifma -mno-avx512vbmi -fstrict-aliasing -O3 -rdynamic -lIlmImf -lIlmThread -lImath -lHalf -lIex -lz -ljpeg -lpthread -ldl
OpenBenchmarking.org Seconds, Fewer Is Better Tungsten Renderer 0.2.2 Scene: Volumetric Caustic multicore-40c-shared 4 8 12 16 20 SE +/- 0.23, N = 3 16.49 1. (CXX) g++ options: -std=c++0x -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -mfma -mbmi2 -mavx512f -mavx512vl -mavx512cd -mavx512dq -mavx512bw -mno-sse4a -mno-avx -mno-avx2 -mno-xop -mno-fma4 -mno-avx512pf -mno-avx512er -mno-avx512ifma -mno-avx512vbmi -fstrict-aliasing -O3 -rdynamic -lIlmImf -lIlmThread -lImath -lHalf -lIex -lz -ljpeg -lpthread -ldl
Timed Wasmer Compilation This test times how long it takes to compile Wasmer. Wasmer is written in the Rust programming language and is a WebAssembly runtime implementation that supports WASI and EmScripten. This test profile builds Wasmer with the Cranelift and Singlepast compiler features enabled. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Seconds, Fewer Is Better Timed Wasmer Compilation 1.0.2 Time To Compile multicore-40c-shared 20 40 60 80 100 SE +/- 0.27, N = 3 85.85 1. (CC) gcc options: -m64 -pie -nodefaultlibs -ldl -lrt -lpthread -lgcc_s -lc -lm -lutil
FFmpeg This test uses FFmpeg for testing the system's audio/video encoding performance. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Seconds, Fewer Is Better FFmpeg 4.0.2 H.264 HD To NTSC DV multicore-40c-shared 2 4 6 8 10 SE +/- 0.031, N = 3 7.728 1. (CC) gcc options: -lavdevice -lavfilter -lavformat -lavcodec -lswresample -lswscale -lavutil -lXv -lX11 -lXext -lm -lxcb -lasound -pthread -lva -lbz2 -llzma -lva-drm -lva-x11 -std=c11 -fomit-frame-pointer -fPIC -O3 -fno-math-errno -fno-signed-zeros -fno-tree-vectorize -MMD -MF -MT
Blender Blender is an open-source 3D creation and modeling software project. This test is of Blender's Cycles benchmark with various sample files. GPU computing via OpenCL, NVIDIA OptiX, and NVIDIA CUDA is supported. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org Seconds, Fewer Is Better Blender 2.92 Blend File: BMW27 - Compute: CPU-Only multicore-40c-shared 15 30 45 60 75 SE +/- 0.23, N = 3 66.87
OpenBenchmarking.org Seconds, Fewer Is Better Blender 2.92 Blend File: Pabellon Barcelona - Compute: CPU-Only multicore-40c-shared 50 100 150 200 250 SE +/- 1.39, N = 3 232.35
multicore-40c-shared Processor: 2 x Intel Xeon (Skylake IBRS) (40 Cores), Motherboard: OpenStack Foundation Nova v21.0.0 (1.13.0-1ubuntu1.1 BIOS), Chipset: Intel 440FX 82441FX PMC, Memory: 16384 MB + 16384 MB + 16384 MB + 16384 MB + 16384 MB + 16384 MB + 16384 MB + 16384 MB + 16384 MB + 16384 MB + 16384 MB + 16384 MB + 16384 MB + 16384 MB + 16384 MB + 16384 MB + 16384 MB + 16384 MB + 16384 MB + 16384 MB + 16384 MB + 16384 MB + 8192 MB RAM, Disk: 148GB, Graphics: Cirrus Logic GD 5446, Network: Red Hat Virtio device
OS: Ubuntu 20.04, Kernel: 5.4.0-74-generic (x86_64), Compiler: GCC 9.3.0, File-System: ext4, Screen Resolution: 1024x768, System Layer: KVM
Kernel Notes: Transparent Huge Pages: madviseCompiler Notes: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-9-HskZEa/gcc-9-9.3.0/debian/tmp-nvptx/usr,hsa --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -vProcessor Notes: CPU Microcode: 0x1Java Notes: OpenJDK Runtime Environment (build 11.0.11+9-Ubuntu-0ubuntu2.20.04)Python Notes: Python 3.8.5Security Notes: itlb_multihit: Not affected + l1tf: Mitigation of PTE Inversion; VMX: flush not necessary SMT disabled + mds: Mitigation of Clear buffers; SMT Host state unknown + meltdown: Mitigation of PTI + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Full generic retpoline IBPB: conditional IBRS_FW STIBP: disabled RSB filling + srbds: Not affected + tsx_async_abort: Mitigation of Clear buffers; SMT Host state unknown
Testing initiated at 8 June 2021 04:02 by user root.