AMD EPYC 7502 testing of various benchmarks under AMD AOCC 2.3, GCC 10.2, LLVM Clang 11. CFLAGS/CXXFLAGS of "-O3 -march=znver2" throughout. Benchmarks by Michael Larabel for a future article.
GCC 10.2
Processor: AMD EPYC 7502 32-Core @ 2.50GHz (32 Cores / 64 Threads), Motherboard: ASRockRack EPYCD8 (P2.10 BIOS), Chipset: AMD Starship/Matisse, Memory: 126GB, Disk: 280GB INTEL SSDPED1D280GA, Graphics: ASPEED, Audio: AMD Starship/Matisse, Monitor: VE228, Network: 2 x Intel I350
OS: Ubuntu 20.10, Kernel: 5.8.0-31-generic (x86_64), Desktop: GNOME Shell 3.38.1, Display Server: X Server 1.20.9, Display Driver: modesetting 1.20.9, Compiler: GCC 10.2.0, File-System: ext4, Screen Resolution: 1920x1080
Environment Notes: CXXFLAGS="-O3 -march=znver2" CFLAGS="-O3 -march=znver2"
Compiler Notes: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-10-JvwpWM/gcc-10-10.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-10-JvwpWM/gcc-10-10.2.0/debian/tmp-gcn/usr,hsa --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Notes: Scaling Governor: acpi-cpufreq ondemand (Boost: Enabled) - CPU Microcode: 0x830101c
Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Full AMD retpoline IBPB: conditional IBRS_FW STIBP: always-on RSB filling + srbds: Not affected + tsx_async_abort: Not affected
LLVM Clang 11
OS: Ubuntu 20.10, Kernel: 5.8.0-31-generic (x86_64), Desktop: GNOME Shell 3.38.1, Display Server: X Server 1.20.9, Display Driver: modesetting 1.20.9, Compiler: Clang 11.0.0-2, File-System: ext4, Screen Resolution: 1920x1080
Environment Notes: CXXFLAGS="-O3 -march=znver2" CFLAGS="-O3 -march=znver2"
Processor Notes: Scaling Governor: acpi-cpufreq ondemand (Boost: Enabled) - CPU Microcode: 0x830101c
Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Full AMD retpoline IBPB: conditional IBRS_FW STIBP: always-on RSB filling + srbds: Not affected + tsx_async_abort: Not affected
AMD AOCC 2.3
OS: Ubuntu 20.10, Kernel: 5.8.0-31-generic (x86_64), Desktop: GNOME Shell 3.38.1, Display Server: X Server 1.20.9, Display Driver: modesetting 1.20.9, Compiler: Clang 11.0.0, File-System: ext4, Screen Resolution: 1920x1080
Environment Notes: CXXFLAGS="-O3 -march=znver2" CFLAGS="-O3 -march=znver2"
Compiler Notes: Optimized build with assertions; Default target: x86_64-unknown-linux-gnu; Host CPU: znver2
Processor Notes: Scaling Governor: acpi-cpufreq ondemand (Boost: Enabled) - CPU Microcode: 0x830101c
Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Full AMD retpoline IBPB: conditional IBRS_FW STIBP: always-on RSB filling + srbds: Not affected + tsx_async_abort: Not affected
Basis Universal
Basis Universal is a GPU texture codec. This test times how long it takes to convert sRGB PNGs into Basis Universal assets with various settings. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org: Basis Universal 1.12 - Settings: UASTC Level 2 + RDO Post-Processing (Seconds, Fewer Is Better)
  GCC 10.2: 755.52 (SE +/- 0.14, N = 3)
  AMD AOCC 2.3: 833.54 (SE +/- 0.04, N = 3)
  LLVM Clang 11: 837.54 (SE +/- 0.24, N = 3)
  1. (CXX) g++ options: -std=c++11 -fvisibility=hidden -fPIC -fno-strict-aliasing -O3 -rdynamic -lm -lpthread
Darmstadt Automotive Parallel Heterogeneous Suite
DAPHNE is the Darmstadt Automotive Parallel HeterogeNEous Benchmark Suite, with OpenCL / CUDA / OpenMP test cases for evaluating programming models in the context of autonomous driving capabilities. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org: Darmstadt Automotive Parallel Heterogeneous Suite - Backend: OpenMP - Kernel: Points2Image (Test Cases Per Minute, More Is Better)
  GCC 10.2: 18452.62 (SE +/- 140.90, N = 3)
  AMD AOCC 2.3: 13720.16 (SE +/- 163.63, N = 6)
  LLVM Clang 11: 11946.57 (SE +/- 96.60, N = 3)
  1. (CXX) g++ options: -O3 -std=c++11 -fopenmp
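Because some charts are "Fewer Is Better" and others "More Is Better", the raw numbers are not directly comparable across tests. A minimal sketch of one way to normalize them, assuming GCC 10.2 is taken as the baseline; the helper name `relative_to_baseline` is hypothetical and is not part of any benchmarking tool. The values are the Basis Universal (UASTC Level 2 + RDO) and DAPHNE Points2Image results listed above.

```python
def relative_to_baseline(value, baseline, higher_is_better):
    """Return performance relative to the baseline; > 1.0 means faster."""
    if value <= 0 or baseline <= 0:
        raise ValueError("results must be positive")
    # For throughput-style results a larger value is faster; for
    # time-style results a smaller value is faster, so invert the ratio.
    return value / baseline if higher_is_better else baseline / value


# Basis Universal 1.12, UASTC Level 2 + RDO Post-Processing (seconds, fewer is better)
basis = {"GCC 10.2": 755.52, "AMD AOCC 2.3": 833.54, "LLVM Clang 11": 837.54}
# DAPHNE Points2Image (test cases per minute, more is better)
daphne = {"GCC 10.2": 18452.62, "AMD AOCC 2.3": 13720.16, "LLVM Clang 11": 11946.57}

basis_rel = {
    name: relative_to_baseline(v, basis["GCC 10.2"], higher_is_better=False)
    for name, v in basis.items()
}
daphne_rel = {
    name: relative_to_baseline(v, daphne["GCC 10.2"], higher_is_better=True)
    for name, v in daphne.items()
}

for name in basis:
    print(f"{name}: Basis Universal {basis_rel[name]:.3f}x, DAPHNE {daphne_rel[name]:.3f}x")
```

With this convention the baseline is always 1.0 and every other entry reads as a speed ratio, regardless of the chart's direction.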
NGINX Benchmark
This is a test of ab, which is the Apache Benchmark program running against nginx. This test profile measures how many requests per second a given system can sustain when carrying out 2,000,000 requests with 500 requests being carried out concurrently. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org: NGINX Benchmark 1.9.9 - Static Web Page Serving (Requests Per Second, More Is Better)
  AMD AOCC 2.3: 31368.86 (SE +/- 254.59, N = 15)
  LLVM Clang 11: 30676.68 (SE +/- 159.74, N = 3)
  GCC 10.2: 30658.67 (SE +/- 381.73, N = 4)
  1. (CC) gcc options: -lpthread -lcrypt -lcrypto -lz -O3 -march=native -march=znver2
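Each chart reports a figure such as "SE +/- 254.59, N = 15". The page does not define it; assuming SE denotes the standard error of the mean over N runs, it could be computed as in the sketch below. The individual run values are hypothetical (the result file only lists the mean, SE, and N), and `standard_error` is an illustrative helper name.

```python
import math
import statistics


def standard_error(samples):
    """Standard error of the mean: sample standard deviation / sqrt(N)."""
    if len(samples) < 2:
        raise ValueError("need at least two runs to estimate the spread")
    return statistics.stdev(samples) / math.sqrt(len(samples))


# Hypothetical requests-per-second values from three runs (not from the source).
runs = [30500.0, 30700.0, 30830.0]

mean = statistics.mean(runs)
se = standard_error(runs)
print(f"{mean:.2f} (SE +/- {se:.2f}, N = {len(runs)})")
```

Under this assumption, a result with a larger N (such as N = 15) suggests that more runs were performed for that configuration, which is worth keeping in mind when comparing SE values between compilers.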
SVT-AV1
This is a test of the Intel Open Visual Cloud Scalable Video Technology SVT-AV1 CPU-based multi-threaded video encoder for the AV1 video format with a sample 1080p YUV video file. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org: SVT-AV1 0.8 - Encoder Mode: Enc Mode 0 - Input: 1080p (Frames Per Second, More Is Better)
  AMD AOCC 2.3: 0.146 (SE +/- 0.000, N = 3)
  LLVM Clang 11: 0.145 (SE +/- 0.000, N = 3)
  GCC 10.2: 0.103 (SE +/- 0.000, N = 3)
  1. (CXX) g++ options: -O3 -fcommon -fPIE -fPIC -pie
NCNN
NCNN is a high-performance neural network inference framework developed by Tencent and optimized for mobile and other platforms. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org: NCNN 20200916 - Target: CPU - Model: yolov4-tiny (ms, Fewer Is Better)
  AMD AOCC 2.3: 29.38 (SE +/- 0.15, N = 3; -lomp; MIN: 28.77 / MAX: 31.68)
  GCC 10.2: 29.47 (SE +/- 0.12, N = 3; -lgomp; MIN: 28.89 / MAX: 31.49)
  LLVM Clang 11: 30.70 (SE +/- 0.19, N = 15; -lomp; MIN: 29.08 / MAX: 40.6)
  1. (CXX) g++ options: -O3 -march=znver2 -rdynamic -lpthread
OpenBenchmarking.org: NCNN 20200916 - Target: CPU - Model: resnet50 (ms, Fewer Is Better)
  AMD AOCC 2.3: 19.70 (SE +/- 0.18, N = 3; -lomp; MIN: 19.16 / MAX: 22.41)
  LLVM Clang 11: 22.23 (SE +/- 0.20, N = 15; -lomp; MIN: 20.63 / MAX: 31.79)
  GCC 10.2: 23.70 (SE +/- 0.16, N = 3; -lgomp; MIN: 23.21 / MAX: 25.68)
  1. (CXX) g++ options: -O3 -march=znver2 -rdynamic -lpthread
OpenBenchmarking.org: NCNN 20200916 - Target: CPU - Model: alexnet (ms, Fewer Is Better)
  AMD AOCC 2.3: 8.94 (SE +/- 0.01, N = 3; -lomp; MIN: 8.81 / MAX: 13.49)
  GCC 10.2: 9.46 (SE +/- 0.14, N = 3; -lgomp; MIN: 9.17 / MAX: 11.49)
  LLVM Clang 11: 10.83 (SE +/- 0.18, N = 15; -lomp; MIN: 9.15 / MAX: 60.4)
  1. (CXX) g++ options: -O3 -march=znver2 -rdynamic -lpthread
OpenBenchmarking.org: NCNN 20200916 - Target: CPU - Model: resnet18 (ms, Fewer Is Better)
  AMD AOCC 2.3: 11.99 (SE +/- 0.15, N = 3; -lomp; MIN: 11.71 / MAX: 14.27)
  GCC 10.2: 13.12 (SE +/- 0.09, N = 3; -lgomp; MIN: 12.79 / MAX: 15.11)
  LLVM Clang 11: 13.19 (SE +/- 0.12, N = 15; -lomp; MIN: 12.17 / MAX: 23.53)
  1. (CXX) g++ options: -O3 -march=znver2 -rdynamic -lpthread
OpenBenchmarking.org: NCNN 20200916 - Target: CPU - Model: vgg16 (ms, Fewer Is Better)
  GCC 10.2: 30.89 (SE +/- 0.30, N = 3; -lgomp; MIN: 30.04 / MAX: 50.21)
  AMD AOCC 2.3: 32.03 (SE +/- 0.31, N = 3; -lomp; MIN: 30.86 / MAX: 35.2)
  LLVM Clang 11: 36.60 (SE +/- 0.42, N = 15; -lomp; MIN: 32.93 / MAX: 48.08)
  1. (CXX) g++ options: -O3 -march=znver2 -rdynamic -lpthread
OpenBenchmarking.org: NCNN 20200916 - Target: CPU - Model: googlenet (ms, Fewer Is Better)
  AMD AOCC 2.3: 16.51 (SE +/- 0.05, N = 3; -lomp; MIN: 16.26 / MAX: 18.75)
  LLVM Clang 11: 18.88 (SE +/- 0.27, N = 15; -lomp; MIN: 16.91 / MAX: 23.31)
  GCC 10.2: 20.85 (SE +/- 0.10, N = 3; -lgomp; MIN: 19.8 / MAX: 22.84)
  1. (CXX) g++ options: -O3 -march=znver2 -rdynamic -lpthread
OpenBenchmarking.org: NCNN 20200916 - Target: CPU - Model: blazeface (ms, Fewer Is Better)
  AMD AOCC 2.3: 2.19 (SE +/- 0.03, N = 3; -lomp; MIN: 2.1 / MAX: 2.43)
  LLVM Clang 11: 2.84 (SE +/- 0.02, N = 15; -lomp; MIN: 2.67 / MAX: 4.67)
  GCC 10.2: 3.99 (SE +/- 0.10, N = 3; -lgomp; MIN: 3.77 / MAX: 4.73)
  1. (CXX) g++ options: -O3 -march=znver2 -rdynamic -lpthread
OpenBenchmarking.org: NCNN 20200916 - Target: CPU - Model: efficientnet-b0 (ms, Fewer Is Better)
  AMD AOCC 2.3: 6.94 (SE +/- 0.05, N = 3; -lomp; MIN: 6.72 / MAX: 9)
  LLVM Clang 11: 8.77 (SE +/- 0.09, N = 15; -lomp; MIN: 7.99 / MAX: 13.6)
  GCC 10.2: 11.32 (SE +/- 0.09, N = 3; -lgomp; MIN: 11.03 / MAX: 13.16)
  1. (CXX) g++ options: -O3 -march=znver2 -rdynamic -lpthread
OpenBenchmarking.org: NCNN 20200916 - Target: CPU - Model: mnasnet (ms, Fewer Is Better)
  AMD AOCC 2.3: 5.01 (SE +/- 0.03, N = 3; -lomp; MIN: 4.87 / MAX: 5.39)
  LLVM Clang 11: 6.26 (SE +/- 0.09, N = 15; -lomp; MIN: 5.63 / MAX: 8.5)
  GCC 10.2: 8.95 (SE +/- 0.29, N = 3; -lgomp; MIN: 8.26 / MAX: 11.03)
  1. (CXX) g++ options: -O3 -march=znver2 -rdynamic -lpthread
OpenBenchmarking.org: NCNN 20200916 - Target: CPU - Model: shufflenet-v2 (ms, Fewer Is Better)
  AMD AOCC 2.3: 6.25 (SE +/- 0.12, N = 3; -lomp; MIN: 5.96 / MAX: 6.53)
  LLVM Clang 11: 7.35 (SE +/- 0.02, N = 15; -lomp; MIN: 7.09 / MAX: 11.01)
  GCC 10.2: 10.59 (SE +/- 0.68, N = 3; -lgomp; MIN: 9.42 / MAX: 13.72)
  1. (CXX) g++ options: -O3 -march=znver2 -rdynamic -lpthread
OpenBenchmarking.org: NCNN 20200916 - Target: CPU-v3-v3 - Model: mobilenet-v3 (ms, Fewer Is Better)
  AMD AOCC 2.3: 5.20 (SE +/- 0.04, N = 3; -lomp; MIN: 5.05 / MAX: 7.61)
  LLVM Clang 11: 6.87 (SE +/- 0.06, N = 15; -lomp; MIN: 6.16 / MAX: 8.64)
  GCC 10.2: 8.88 (SE +/- 0.11, N = 3; -lgomp; MIN: 8.58 / MAX: 10.83)
  1. (CXX) g++ options: -O3 -march=znver2 -rdynamic -lpthread
OpenBenchmarking.org: NCNN 20200916 - Target: CPU-v2-v2 - Model: mobilenet-v2 (ms, Fewer Is Better)
  AMD AOCC 2.3: 5.62 (SE +/- 0.08, N = 3; -lomp; MIN: 5.35 / MAX: 7.49)
  LLVM Clang 11: 6.88 (SE +/- 0.06, N = 15; -lomp; MIN: 6.35 / MAX: 16.49)
  GCC 10.2: 9.37 (SE +/- 0.24, N = 3; -lgomp; MIN: 8.62 / MAX: 11)
  1. (CXX) g++ options: -O3 -march=znver2 -rdynamic -lpthread
OpenBenchmarking.org: NCNN 20200916 - Target: CPU - Model: mobilenet (ms, Fewer Is Better)
  AMD AOCC 2.3: 17.02 (SE +/- 0.34, N = 3; -lomp; MIN: 16.39 / MAX: 20.11)
  LLVM Clang 11: 17.85 (SE +/- 0.13, N = 15; -lomp; MIN: 16.89 / MAX: 21.04)
  GCC 10.2: 19.46 (SE +/- 0.12, N = 3; -lgomp; MIN: 18.87 / MAX: 31.76)
  1. (CXX) g++ options: -O3 -march=znver2 -rdynamic -lpthread
OpenBenchmarking.org: NCNN 20200916 - Target: CPU - Model: squeezenet (ms, Fewer Is Better)
  AMD AOCC 2.3: 14.38 (SE +/- 0.10, N = 3; -lomp; MIN: 13.96 / MAX: 16.87)
  LLVM Clang 11: 15.35 (SE +/- 0.14, N = 15; -lomp; MIN: 14.29 / MAX: 18.88)
  GCC 10.2: 17.29 (SE +/- 0.12, N = 3; -lgomp; MIN: 16.89 / MAX: 19.32)
  1. (CXX) g++ options: -O3 -march=znver2 -rdynamic -lpthread
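With fourteen NCNN models, a single summary figure can be easier to read than the individual charts. A minimal sketch, assuming the geometric mean of per-model ratios is an acceptable summary (a common choice for ratios, though the page itself does not state one). It uses the AMD AOCC 2.3 and GCC 10.2 latencies listed above; `geometric_mean` is an illustrative helper.

```python
import math

# (AMD AOCC 2.3 ms, GCC 10.2 ms) per NCNN model, taken from the results above.
ncnn_ms = {
    "yolov4-tiny": (29.38, 29.47),
    "resnet50": (19.70, 23.70),
    "alexnet": (8.94, 9.46),
    "resnet18": (11.99, 13.12),
    "vgg16": (32.03, 30.89),
    "googlenet": (16.51, 20.85),
    "blazeface": (2.19, 3.99),
    "efficientnet-b0": (6.94, 11.32),
    "mnasnet": (5.01, 8.95),
    "shufflenet-v2": (6.25, 10.59),
    "mobilenet-v3": (5.20, 8.88),
    "mobilenet-v2": (5.62, 9.37),
    "mobilenet": (17.02, 19.46),
    "squeezenet": (14.38, 17.29),
}


def geometric_mean(values):
    """Geometric mean via the mean of logarithms; inputs must be positive."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Latency is "fewer is better", so GCC time / AOCC time > 1.0 means AOCC is faster.
speedups = {model: gcc / aocc for model, (aocc, gcc) in ncnn_ms.items()}
geo = geometric_mean(speedups.values())

for model, s in speedups.items():
    print(f"{model}: {s:.2f}x")
print(f"geometric mean (AOCC vs GCC): {geo:.2f}x")
```

The geometric mean is used rather than the arithmetic mean so that a 2x win on one model and a 2x loss on another cancel out.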
dav1d
Dav1d is an open-source, speedy AV1 video decoder. This test profile times how long it takes to decode sample AV1 video content. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org: dav1d 0.7.0 - Video Input: Chimera 1080p 10-bit (FPS, More Is Better)
  GCC 10.2: 143.02 (SE +/- 0.18, N = 3; MIN: 98.76 / MAX: 246.17)
  AMD AOCC 2.3: 122.39 (SE +/- 0.34, N = 3; MIN: 85.78 / MAX: 202.39)
  LLVM Clang 11: 92.56 (SE +/- 0.26, N = 3; MIN: 61.05 / MAX: 158.66)
  1. (CC) gcc options: -O3 -march=znver2 -pthread
VP9 libvpx Encoding
This is a standard video encoding performance test of Google's libvpx library and the vpxenc command for the VP9/WebM format using a sample 1080p video. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org: VP9 libvpx Encoding 1.8.2 - Speed: Speed 0 (Frames Per Second, More Is Better)
  AMD AOCC 2.3: 6.65 (SE +/- 0.00, N = 3)
  GCC 10.2: 6.27 (SE +/- 0.02, N = 3)
  LLVM Clang 11: 5.96 (SE +/- 0.09, N = 3)
  1. (CXX) g++ options: -m64 -lm -lpthread -O3 -march=znver2 -fPIC -U_FORTIFY_SOURCE -std=c++11
Timed MrBayes Analysis
This test performs a Bayesian analysis of a set of primate genome sequences in order to estimate their phylogeny. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org: Timed MrBayes Analysis 3.2.7 - Primate Phylogeny Analysis (Seconds, Fewer Is Better)
  AMD AOCC 2.3: 90.76 (SE +/- 0.04, N = 3)
  LLVM Clang 11: 93.84 (SE +/- 0.04, N = 3)
  GCC 10.2: 94.28 (SE +/- 0.17, N = 3)
  Additional per-configuration flag (not attributed in the source): -mabm
  1. (CC) gcc options: -mmmx -msse -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msse4a -msha -maes -mavx -mfma -mavx2 -mrdrnd -mbmi -mbmi2 -madx -O3 -std=c99 -pedantic -march=znver2 -lm
oneDNN
This is a test of Intel oneDNN, an Intel-optimized library for deep neural networks, using its built-in benchdnn functionality. The result is the total perf time reported. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and MKL-DNN before being rebranded as part of the oneAPI initiative. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org: oneDNN 1.5 - Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  AMD AOCC 2.3: 30.72 (SE +/- 0.32, N = 3; -fopenmp=libomp; MIN: 29.35)
  LLVM Clang 11: 64.99 (SE +/- 2.37, N = 15; -fopenmp=libomp; MIN: 50.95)
  GCC 10.2: 79.25 (SE +/- 0.80, N = 3; -fopenmp; MIN: 77.35)
  1. (CXX) g++ options: -O3 -march=native -std=c++11 -msse4.1 -fPIC -pie -lpthread -ldl
ASTC Encoder
ASTC Encoder (astcenc) is for the Adaptive Scalable Texture Compression (ASTC) format commonly used with OpenGL, OpenGL ES, and Vulkan graphics APIs. This test profile does a coding test of both compression/decompression. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org: ASTC Encoder 2.0 - Preset: Exhaustive (Seconds, Fewer Is Better)
  AMD AOCC 2.3: 66.04 (SE +/- 0.05, N = 3)
  LLVM Clang 11: 67.17 (SE +/- 0.01, N = 3)
  GCC 10.2: 73.46 (SE +/- 0.02, N = 3)
  1. (CXX) g++ options: -std=c++14 -fvisibility=hidden -O3 -flto -mfpmath=sse -mavx2 -mpopcnt -lpthread
OpenBenchmarking.org: LZ4 Compression 1.9.3 - Compression Level: 3 - Compression Speed (MB/s, More Is Better)
  AMD AOCC 2.3: 48.52 (SE +/- 0.00, N = 3)
  LLVM Clang 11: 48.40 (SE +/- 0.04, N = 3)
  GCC 10.2: 45.57 (SE +/- 0.66, N = 3)
  1. (CC) gcc options: -O3
GraphicsMagick
This is a test of GraphicsMagick with its OpenMP implementation that performs various imaging tests on a sample 6000x4000 pixel JPEG image. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org: GraphicsMagick 1.3.33 - Operation: Resizing (Iterations Per Minute, More Is Better)
  GCC 10.2: 2092 (SE +/- 17.89, N = 3)
  LLVM Clang 11: 1827 (SE +/- 10.17, N = 3)
  AMD AOCC 2.3: 1653 (SE +/- 22.27, N = 3)
  1. (CC) gcc options: -fopenmp -O3 -march=znver2 -pthread -ljpeg -lX11 -lz -lm -lpthread
OpenBenchmarking.org: GraphicsMagick 1.3.33 - Operation: Sharpen (Iterations Per Minute, More Is Better)
  GCC 10.2: 434
  LLVM Clang 11: 384
  AMD AOCC 2.3: 316
  1. (CC) gcc options: -fopenmp -O3 -march=znver2 -pthread -ljpeg -lX11 -lz -lm -lpthread
OpenBenchmarking.org: GraphicsMagick 1.3.33 - Operation: Rotate (Iterations Per Minute, More Is Better)
  GCC 10.2: 535
  LLVM Clang 11: 527
  AMD AOCC 2.3: 525
  SE values listed for only two of the three results (not attributed in the source): SE +/- 5.24, N = 3; SE +/- 1.00, N = 3
  1. (CC) gcc options: -fopenmp -O3 -march=znver2 -pthread -ljpeg -lX11 -lz -lm -lpthread
OpenBenchmarking.org: GraphicsMagick 1.3.33 - Operation: Enhanced (Iterations Per Minute, More Is Better)
  LLVM Clang 11: 627
  GCC 10.2: 590
  AMD AOCC 2.3: 526
  1. (CC) gcc options: -fopenmp -O3 -march=znver2 -pthread -ljpeg -lX11 -lz -lm -lpthread
OpenBenchmarking.org: GraphicsMagick 1.3.33 - Operation: Swirl (Iterations Per Minute, More Is Better)
  AMD AOCC 2.3: 1368 (SE +/- 4.26, N = 3)
  LLVM Clang 11: 1323 (SE +/- 2.40, N = 3)
  GCC 10.2: 1295 (SE +/- 1.00, N = 3)
  1. (CC) gcc options: -fopenmp -O3 -march=znver2 -pthread -ljpeg -lX11 -lz -lm -lpthread
oneDNN
This is a test of Intel oneDNN, an Intel-optimized library for deep neural networks, using its built-in benchdnn functionality. The result is the total perf time reported. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and MKL-DNN before being rebranded as part of the oneAPI initiative. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org: oneDNN 1.5 - Harness: IP Batch All - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  AMD AOCC 2.3: 11.92 (SE +/- 0.02, N = 3; -fopenmp=libomp; MIN: 11.65)
  LLVM Clang 11: 13.46 (SE +/- 0.04, N = 3; -fopenmp=libomp; MIN: 13.14)
  GCC 10.2: 13.88 (SE +/- 0.06, N = 3; -fopenmp; MIN: 13.35)
  1. (CXX) g++ options: -O3 -march=native -std=c++11 -msse4.1 -fPIC -pie -lpthread -ldl
WebP Image Encode
This is a test of Google's libwebp with the cwebp image encode utility and using a sample 6000x4000 pixel JPEG image as the input. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org: WebP Image Encode 1.1 - Encode Settings: Quality 100, Lossless, Highest Compression (Encode Time - Seconds, Fewer Is Better)
  LLVM Clang 11: 42.38 (SE +/- 0.12, N = 3)
  AMD AOCC 2.3: 42.70 (SE +/- 0.01, N = 3)
  GCC 10.2: 44.38 (SE +/- 0.12, N = 3)
  1. (CC) gcc options: -fvisibility=hidden -O3 -march=znver2 -pthread -lm -ljpeg
Stockfish
This is a test of Stockfish, an advanced C++11 chess benchmark that can scale up to 128 CPU cores. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org: Stockfish 12 - Total Time (Nodes Per Second, More Is Better)
  LLVM Clang 11: 62434784 (SE +/- 877956.02, N = 4; -flto=thin)
  AMD AOCC 2.3: 62375605 (SE +/- 379890.17, N = 3; -flto=thin)
  GCC 10.2: 58347386 (SE +/- 872585.44, N = 3; -flto -flto=jobserver)
  1. (CXX) g++ options: -m64 -lpthread -O3 -march=znver2 -fno-exceptions -std=c++17 -pedantic -msse -msse3 -mpopcnt -msse4.1 -mssse3 -msse2
OpenBenchmarking.org: Redis 6.0.9 - Test: GET (Requests Per Second, More Is Better)
  LLVM Clang 11: 2122749.43 (SE +/- 49885.99, N = 15)
  AMD AOCC 2.3: 1874175.91 (SE +/- 30693.11, N = 15)
  GCC 10.2: 1809976.83 (SE +/- 18838.90, N = 3)
  1. (CXX) g++ options: -MM -MT -g3 -fvisibility=hidden -O3 -march=znver2
AOBench
AOBench is a lightweight ambient occlusion renderer, written in C. The test profile is using a size of 2048 x 2048. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org: AOBench - Size: 2048 x 2048 - Total Time (Seconds, Fewer Is Better)
  GCC 10.2: 35.78 (SE +/- 0.02, N = 3)
  AMD AOCC 2.3: 37.76 (SE +/- 0.04, N = 3)
  LLVM Clang 11: 41.66 (SE +/- 0.02, N = 3)
  1. (CC) gcc options: -lm -O3 -march=znver2
oneDNN
This is a test of Intel oneDNN, an Intel-optimized library for deep neural networks, using its built-in benchdnn functionality. The result is the total perf time reported. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and MKL-DNN before being rebranded as part of the oneAPI initiative. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org: oneDNN 1.5 - Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  AMD AOCC 2.3: 147.52 (SE +/- 0.06, N = 3; -fopenmp=libomp; MIN: 146.31)
  LLVM Clang 11: 172.51 (SE +/- 1.10, N = 3; -fopenmp=libomp; MIN: 169.54)
  GCC 10.2: 254.61 (SE +/- 0.73, N = 3; -fopenmp; MIN: 251.67)
  1. (CXX) g++ options: -O3 -march=native -std=c++11 -msse4.1 -fPIC -pie -lpthread -ldl
Darmstadt Automotive Parallel Heterogeneous Suite
DAPHNE is the Darmstadt Automotive Parallel HeterogeNEous Benchmark Suite, with OpenCL / CUDA / OpenMP test cases for evaluating programming models in the context of autonomous driving capabilities. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org: Darmstadt Automotive Parallel Heterogeneous Suite - Backend: OpenMP - Kernel: NDT Mapping (Test Cases Per Minute, More Is Better)
  LLVM Clang 11: 945.32 (SE +/- 2.29, N = 3)
  AMD AOCC 2.3: 923.68 (SE +/- 1.07, N = 3)
  GCC 10.2: 874.54 (SE +/- 2.97, N = 3)
  1. (CXX) g++ options: -O3 -std=c++11 -fopenmp
VP9 libvpx Encoding
This is a standard video encoding performance test of Google's libvpx library and the vpxenc command for the VP9/WebM format using a sample 1080p video. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org: VP9 libvpx Encoding 1.8.2 - Speed: Speed 5 (Frames Per Second, More Is Better)
  AMD AOCC 2.3: 20.12 (SE +/- 0.10, N = 3)
  GCC 10.2: 19.03 (SE +/- 0.04, N = 3)
  LLVM Clang 11: 18.23 (SE +/- 0.07, N = 3)
  1. (CXX) g++ options: -m64 -lm -lpthread -O3 -march=znver2 -fPIC -U_FORTIFY_SOURCE -std=c++11
PostgreSQL pgbench
This is a benchmark of PostgreSQL that uses pgbench to drive the database workloads. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org: PostgreSQL pgbench 13.0 - Scaling Factor: 1 - Clients: 1 - Mode: Read Write - Average Latency (ms, Fewer Is Better)
  AMD AOCC 2.3: 0.259 (SE +/- 0.001, N = 3)
  LLVM Clang 11: 0.265 (SE +/- 0.000, N = 3)
  GCC 10.2: 0.268 (SE +/- 0.003, N = 5)
  1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=znver2 -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm
OpenBenchmarking.org: PostgreSQL pgbench 13.0 - Scaling Factor: 1 - Clients: 1 - Mode: Read Write (TPS, More Is Better)
  AMD AOCC 2.3: 3865 (SE +/- 18.01, N = 3)
  LLVM Clang 11: 3772 (SE +/- 3.59, N = 3)
  GCC 10.2: 3739 (SE +/- 46.08, N = 5)
  1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=znver2 -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm
SciMark
This test runs the ANSI C version of SciMark 2.0, which is a benchmark for scientific and numerical computing developed by programmers at the National Institute of Standards and Technology. This test is made up of Fast Fourier Transform, Jacobi Successive Over-relaxation, Monte Carlo, Sparse Matrix Multiply, and dense LU matrix factorization benchmarks. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org: SciMark 2.0 - Computational Test: Composite (Mflops, More Is Better)
  AMD AOCC 2.3: 2779.62 (SE +/- 43.99, N = 3)
  GCC 10.2: 2759.35 (SE +/- 14.97, N = 3)
  LLVM Clang 11: 2673.79 (SE +/- 7.04, N = 3)
  1. (CC) gcc options: -O3 -march=znver2 -lm
Darmstadt Automotive Parallel Heterogeneous Suite
DAPHNE is the Darmstadt Automotive Parallel HeterogeNEous Benchmark Suite, with OpenCL / CUDA / OpenMP test cases for evaluating programming models in the context of autonomous driving capabilities. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org: Darmstadt Automotive Parallel Heterogeneous Suite - Backend: OpenMP - Kernel: Euclidean Cluster (Test Cases Per Minute, More Is Better)
  GCC 10.2: 890.71 (SE +/- 0.77, N = 3)
  AMD AOCC 2.3: 678.51 (SE +/- 0.47, N = 3)
  LLVM Clang 11: 674.01 (SE +/- 2.53, N = 3)
  1. (CXX) g++ options: -O3 -std=c++11 -fopenmp
LibRaw
LibRaw is a RAW image decoder for digital camera photos. This test profile runs LibRaw's post-processing benchmark. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org: LibRaw 0.20 - Post-Processing Benchmark (Mpix/sec, More Is Better)
  GCC 10.2: 52.54 (SE +/- 0.16, N = 3)
  AMD AOCC 2.3: 38.47 (SE +/- 0.10, N = 3)
  LLVM Clang 11: 36.98 (SE +/- 0.13, N = 3)
  1. (CXX) g++ options: -O3 -march=znver2 -fopenmp -ljpeg -lz -lm
C-Ray
This is a test of C-Ray, a simple raytracer designed to test the floating-point CPU performance. This test is multi-threaded (16 threads per core), will shoot 8 rays per pixel for anti-aliasing, and will generate a 1600 x 1200 image. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org: C-Ray 1.1 - Total Time - 4K, 16 Rays Per Pixel (Seconds, Fewer Is Better)
  GCC 10.2: 18.92 (SE +/- 0.02, N = 3)
  LLVM Clang 11: 30.65 (SE +/- 0.08, N = 3)
  AMD AOCC 2.3: 33.28 (SE +/- 0.06, N = 3)
  1. (CC) gcc options: -lm -lpthread -O3 -march=znver2
x265
This is a simple test of the x265 encoder run on the CPU, with 1080p and 4K options, measuring H.265 video encode performance. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org: x265 3.4 - Video Input: Bosphorus 4K (Frames Per Second, More Is Better)
  AMD AOCC 2.3: 23.69 (SE +/- 0.05, N = 3)
  GCC 10.2: 22.83 (SE +/- 0.06, N = 3)
  LLVM Clang 11: 22.44 (SE +/- 0.03, N = 3)
  1. (CXX) g++ options: -O3 -march=znver2 -rdynamic -lpthread -lrt -ldl -lnuma
Basis Universal
Basis Universal is a GPU texture codec. This test times how long it takes to convert sRGB PNGs into Basis Universal assets with various settings. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org: Basis Universal 1.12 - Settings: UASTC Level 3 (Seconds, Fewer Is Better)
  AMD AOCC 2.3: 25.21 (SE +/- 0.00, N = 3)
  LLVM Clang 11: 25.32 (SE +/- 0.01, N = 3)
  GCC 10.2: 25.52 (SE +/- 0.01, N = 3)
  1. (CXX) g++ options: -std=c++11 -fvisibility=hidden -fPIC -fno-strict-aliasing -O3 -rdynamic -lm -lpthread
PostgreSQL pgbench
This is a benchmark of PostgreSQL that uses pgbench to drive the database workloads. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org: PostgreSQL pgbench 13.0 - Scaling Factor: 1 - Clients: 50 - Mode: Read Write - Average Latency (ms, Fewer Is Better)
  GCC 10.2: 14.49 (SE +/- 0.02, N = 3)
  LLVM Clang 11: 14.53 (SE +/- 0.02, N = 3)
  AMD AOCC 2.3: 14.66 (SE +/- 0.02, N = 3)
  1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=znver2 -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm
OpenBenchmarking.org: PostgreSQL pgbench 13.0 - Scaling Factor: 1 - Clients: 50 - Mode: Read Write (TPS, More Is Better)
  GCC 10.2: 3453 (SE +/- 5.86, N = 3)
  LLVM Clang 11: 3443 (SE +/- 4.43, N = 3)
  AMD AOCC 2.3: 3413 (SE +/- 5.71, N = 3)
  1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=znver2 -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm
OpenBenchmarking.org: PostgreSQL pgbench 13.0 - Scaling Factor: 1 - Clients: 50 - Mode: Read Only - Average Latency (ms, Fewer Is Better)
  AMD AOCC 2.3: 0.092 (SE +/- 0.000, N = 3)
  LLVM Clang 11: 0.094 (SE +/- 0.001, N = 3)
  GCC 10.2: 0.096 (SE +/- 0.001, N = 3)
  1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=znver2 -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm
OpenBenchmarking.org: PostgreSQL pgbench 13.0 - Scaling Factor: 1 - Clients: 50 - Mode: Read Only (TPS, More Is Better)
  AMD AOCC 2.3: 541314 (SE +/- 798.75, N = 3)
  LLVM Clang 11: 530684 (SE +/- 4278.21, N = 3)
  GCC 10.2: 521332 (SE +/- 4440.25, N = 3)
  1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=znver2 -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm
OpenBenchmarking.org: PostgreSQL pgbench 13.0 - Scaling Factor: 1 - Clients: 1 - Mode: Read Only - Average Latency (ms, Fewer Is Better)
  AMD AOCC 2.3: 0.034 (SE +/- 0.000, N = 3)
  GCC 10.2: 0.035 (SE +/- 0.000, N = 3)
  LLVM Clang 11: 0.035 (SE +/- 0.000, N = 3)
  1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=znver2 -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm
OpenBenchmarking.org: PostgreSQL pgbench 13.0 - Scaling Factor: 1 - Clients: 1 - Mode: Read Only (TPS, More Is Better)
  AMD AOCC 2.3: 29645 (SE +/- 73.04, N = 3)
  LLVM Clang 11: 28921 (SE +/- 252.07, N = 3)
  GCC 10.2: 28919 (SE +/- 137.51, N = 3)
  1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=znver2 -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm
TNN
TNN is an open-source deep learning inference framework developed by Tencent. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org: TNN 0.2.3 - Target: CPU - Model: MobileNet v2 (ms, Fewer Is Better)
  GCC 10.2: 324.17 (SE +/- 0.62, N = 3; -fopenmp; MIN: 311.9 / MAX: 354.48)
  AMD AOCC 2.3: 365.39 (SE +/- 0.24, N = 3; -fopenmp=libomp; MIN: 364.57 / MAX: 366.34)
  LLVM Clang 11: 392.20 (SE +/- 0.45, N = 3; -fopenmp=libomp; MIN: 390.71 / MAX: 393.87)
  1. (CXX) g++ options: -O3 -march=znver2 -pthread -fvisibility=hidden -rdynamic -ldl
OpenBenchmarking.org: TNN 0.2.3 - Target: CPU - Model: SqueezeNet v1.1 (ms, Fewer Is Better)
  AMD AOCC 2.3: 304.18 (SE +/- 0.66, N = 3; -fopenmp=libomp; MIN: 302.78 / MAX: 315.91)
  GCC 10.2: 305.13 (SE +/- 0.20, N = 3; -fopenmp; MIN: 304.36 / MAX: 306.06)
  LLVM Clang 11: 311.68 (SE +/- 1.92, N = 3; -fopenmp=libomp; MIN: 306.99 / MAX: 314.32)
  1. (CXX) g++ options: -O3 -march=znver2 -pthread -fvisibility=hidden -rdynamic -ldl
RNNoise
RNNoise is a recurrent neural network for audio noise reduction developed by Mozilla and Xiph.Org. This test profile is a single-threaded test measuring the time to denoise a sample 26-minute-long 16-bit RAW audio file using this recurrent neural network noise suppression library. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org: RNNoise 2020-06-28 (Seconds, Fewer Is Better)
  AMD AOCC 2.3: 20.76 (SE +/- 0.01, N = 3)
  LLVM Clang 11: 21.54 (SE +/- 0.00, N = 3)
  GCC 10.2: 21.72 (SE +/- 0.02, N = 3)
  1. (CC) gcc options: -O3 -march=znver2 -pedantic -fvisibility=hidden
oneDNN
This is a test of Intel oneDNN, an Intel-optimized library for deep neural networks, using its built-in benchdnn functionality. The result is the total perf time reported. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and MKL-DNN before being rebranded as part of the oneAPI initiative. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org: oneDNN 1.5 - Harness: Deconvolution Batch deconv_1d - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  AMD AOCC 2.3: 1.99141 (SE +/- 0.00112, N = 3; -fopenmp=libomp; MIN: 1.92)
  LLVM Clang 11: 2.05539 (SE +/- 0.00233, N = 3; -fopenmp=libomp; MIN: 1.96)
  GCC 10.2: 2.17957 (SE +/- 0.00145, N = 3; -fopenmp; MIN: 2.05)
  1. (CXX) g++ options: -O3 -march=native -std=c++11 -msse4.1 -fPIC -pie -lpthread -ldl
OpenBenchmarking.org: oneDNN 1.5 - Harness: Deconvolution Batch deconv_1d - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  AMD AOCC 2.3: 1.66772 (SE +/- 0.00186, N = 3; -fopenmp=libomp; MIN: 1.61)
  LLVM Clang 11: 1.69390 (SE +/- 0.01023, N = 3; -fopenmp=libomp; MIN: 1.63)
  GCC 10.2: 1.95695 (SE +/- 0.01336, N = 3; -fopenmp; MIN: 1.87)
  1. (CXX) g++ options: -O3 -march=native -std=c++11 -msse4.1 -fPIC -pie -lpthread -ldl
WebP Image Encode
This is a test of Google's libwebp with the cwebp image encode utility and using a sample 6000x4000 pixel JPEG image as the input. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org: WebP Image Encode 1.1 - Encode Settings: Quality 100, Lossless (Encode Time - Seconds, Fewer Is Better)
  LLVM Clang 11: 20.72 (SE +/- 0.02, N = 3)
  GCC 10.2: 20.80 (SE +/- 0.08, N = 3)
  AMD AOCC 2.3: 20.90 (SE +/- 0.03, N = 3)
  1. (CC) gcc options: -fvisibility=hidden -O3 -march=znver2 -pthread -lm -ljpeg
dav1d
Dav1d is an open-source, speedy AV1 video decoder. This test profile times how long it takes to decode sample AV1 video content. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org: dav1d 0.7.0 - Video Input: Chimera 1080p (FPS, More Is Better)
  AMD AOCC 2.3: 575.22 (SE +/- 0.49, N = 3; MIN: 414.12 / MAX: 729.95)
  LLVM Clang 11: 572.19 (SE +/- 0.97, N = 3; MIN: 404.8 / MAX: 726.7)
  GCC 10.2: 564.96 (SE +/- 1.05, N = 3; MIN: 399.64 / MAX: 724.27)
  1. (CC) gcc options: -O3 -march=znver2 -pthread
OpenSSL OpenSSL is an open-source toolkit that implements SSL (Secure Sockets Layer) and TLS (Transport Layer Security) protocols. This test measures the RSA 4096-bit performance of OpenSSL. Learn more via the OpenBenchmarking.org test page.
OpenSSL 1.1.1 - RSA 4096-bit Performance (Signs Per Second, More Is Better)
  GCC 10.2: 7395.4 (SE +/- 0.73, N = 3)
  AMD AOCC 2.3: 5413.5 (SE +/- 0.64, N = 3; -Qunused-arguments)
  LLVM Clang 11: 5412.5 (SE +/- 1.37, N = 3; -Qunused-arguments)
  1. (CC) gcc options: -pthread -m64 -O3 -march=znver2 -lssl -lcrypto -ldl
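The OpenSSL result shows the widest spread in this section, and the size of GCC's lead is easier to read as a percentage. A short sketch of that arithmetic, using the signs-per-second values reported above (the function name `percent_faster` is an illustrative choice, not something the test suite provides):

```python
def percent_faster(candidate, baseline):
    """Percent advantage of candidate over baseline for a more-is-better metric."""
    return (candidate / baseline - 1.0) * 100.0


# Signs per second, copied from the RSA 4096-bit result.
signs_per_second = {
    "GCC 10.2": 7395.4,
    "AMD AOCC 2.3": 5413.5,
    "LLVM Clang 11": 5412.5,
}

gcc = signs_per_second["GCC 10.2"]
for name, value in signs_per_second.items():
    if name != "GCC 10.2":
        print(f"GCC 10.2 vs {name}: {percent_faster(gcc, value):+.1f}%")
```

For a fewer-is-better metric such as encode time, the ratio would need to be inverted (baseline divided by candidate) before reading it as a speedup.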
Basis Universal Basis Universal is a GPU texture codec. This test times how long it takes to convert sRGB PNGs into Basis Universal assets with various settings. Learn more via the OpenBenchmarking.org test page.
Basis Universal 1.12 - Settings: UASTC Level 2 (Seconds, Fewer Is Better)
  AMD AOCC 2.3: 16.32 (SE +/- 0.01, N = 3)
  LLVM Clang 11: 16.49 (SE +/- 0.01, N = 3)
  GCC 10.2: 16.51 (SE +/- 0.02, N = 3)
  1. (CXX) g++ options: -std=c++11 -fvisibility=hidden -fPIC -fno-strict-aliasing -O3 -rdynamic -lm -lpthread
oneDNN This is a test of the Intel oneDNN as an Intel-optimized library for Deep Neural Networks and making use of its built-in benchdnn functionality. The result is the total perf time reported. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and MKL-DNN before being rebranded as part of the oneAPI initiative. Learn more via the OpenBenchmarking.org test page.
oneDNN 1.5 - Harness: IP Batch 1D - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  AMD AOCC 2.3: 1.40879 (SE +/- 0.00432, N = 3; MIN: 1.36; -fopenmp=libomp)
  GCC 10.2: 1.67503 (SE +/- 0.00519, N = 3; MIN: 1.59; -fopenmp)
  LLVM Clang 11: 1.72247 (SE +/- 0.00256, N = 3; MIN: 1.62; -fopenmp=libomp)
  1. (CXX) g++ options: -O3 -march=native -std=c++11 -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 1.5 - Harness: IP Batch 1D - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  AMD AOCC 2.3: 1.00926 (SE +/- 0.00275, N = 3; MIN: 0.95; -fopenmp=libomp)
  LLVM Clang 11: 1.03137 (SE +/- 0.00137, N = 3; MIN: 0.97; -fopenmp=libomp)
  GCC 10.2: 1.17556 (SE +/- 0.00350, N = 3; MIN: 1.13; -fopenmp)
  1. (CXX) g++ options: -O3 -march=native -std=c++11 -msse4.1 -fPIC -pie -lpthread -ldl
ASTC Encoder ASTC Encoder (astcenc) is for the Adaptive Scalable Texture Compression (ASTC) format commonly used with OpenGL, OpenGL ES, and Vulkan graphics APIs. This test profile does a coding test of both compression/decompression. Learn more via the OpenBenchmarking.org test page.
ASTC Encoder 2.0 - Preset: Thorough (Seconds, Fewer Is Better)
  AMD AOCC 2.3: 8.16 (SE +/- 0.00, N = 3)
  LLVM Clang 11: 8.32 (SE +/- 0.00, N = 3)
  GCC 10.2: 9.61 (SE +/- 0.00, N = 3)
  1. (CXX) g++ options: -std=c++14 -fvisibility=hidden -O3 -flto -mfpmath=sse -mavx2 -mpopcnt -lpthread
dav1d Dav1d is an open-source, speedy AV1 video decoder. This test profile times how long it takes to decode sample AV1 video content. Learn more via the OpenBenchmarking.org test page.
dav1d 0.7.0 - Video Input: Summer Nature 4K (FPS, More Is Better)
  LLVM Clang 11: 275.41 (SE +/- 1.08, N = 3; MIN: 155.99 / MAX: 295.35)
  AMD AOCC 2.3: 274.20 (SE +/- 1.22, N = 3; MIN: 151.98 / MAX: 293.44)
  GCC 10.2: 269.72 (SE +/- 0.57, N = 3; MIN: 160.12 / MAX: 288.9)
  1. (CC) gcc options: -O3 -march=znver2 -pthread
SVT-AV1 This is a test of the Intel Open Visual Cloud Scalable Video Technology SVT-AV1 CPU-based multi-threaded video encoder for the AV1 video format with a sample 1080p YUV video file. Learn more via the OpenBenchmarking.org test page.
SVT-AV1 0.8 - Encoder Mode: Enc Mode 4 - Input: 1080p (Frames Per Second, More Is Better)
  AMD AOCC 2.3: 8.657 (SE +/- 0.042, N = 3)
  LLVM Clang 11: 8.591 (SE +/- 0.026, N = 3)
  GCC 10.2: 6.775 (SE +/- 0.025, N = 3)
  1. (CXX) g++ options: -O3 -fcommon -fPIE -fPIC -pie
oneDNN This is a test of the Intel oneDNN as an Intel-optimized library for Deep Neural Networks and making use of its built-in benchdnn functionality. The result is the total perf time reported. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and MKL-DNN before being rebranded as part of the oneAPI initiative. Learn more via the OpenBenchmarking.org test page.
oneDNN 1.5 - Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  AMD AOCC 2.3: 0.464235 (SE +/- 0.000544, N = 3; MIN: 0.45; -fopenmp=libomp)
  GCC 10.2: 0.532944 (SE +/- 0.001691, N = 3; MIN: 0.51; -fopenmp)
  LLVM Clang 11: 0.575864 (SE +/- 0.001767, N = 3; MIN: 0.55; -fopenmp=libomp)
  1. (CXX) g++ options: -O3 -march=native -std=c++11 -msse4.1 -fPIC -pie -lpthread -ldl
x265 This is a simple test of the x265 encoder run on the CPU with 1080p and 4K options for H.265 video encode performance with x265. Learn more via the OpenBenchmarking.org test page.
x265 3.4 - Video Input: Bosphorus 1080p (Frames Per Second, More Is Better)
  AMD AOCC 2.3: 50.60 (SE +/- 0.14, N = 3)
  LLVM Clang 11: 50.02 (SE +/- 0.09, N = 3)
  GCC 10.2: 49.05 (SE +/- 0.13, N = 3)
  1. (CXX) g++ options: -O3 -march=znver2 -rdynamic -lpthread -lrt -ldl -lnuma
LAME MP3 Encoding LAME is an MP3 encoder licensed under the LGPL. This test measures the time required to encode a WAV file to MP3 format. Learn more via the OpenBenchmarking.org test page.
LAME MP3 Encoding 3.100 - WAV To MP3 (Seconds, Fewer Is Better)
  GCC 10.2: 8.798 (SE +/- 0.004, N = 3)
  LLVM Clang 11: 10.078 (SE +/- 0.003, N = 3)
  AMD AOCC 2.3: 11.026 (SE +/- 0.004, N = 3)
  Extra flags reported for one build (the chart text does not say which): -ffast-math -funroll-loops -fschedule-insns2 -fbranch-count-reg -fforce-addr
  1. (CC) gcc options: -O3 -pipe -march=znver2 -lncurses -lm
WebP Image Encode This is a test of Google's libwebp with the cwebp image encode utility and using a sample 6000x4000 pixel JPEG image as the input. Learn more via the OpenBenchmarking.org test page.
WebP Image Encode 1.1 - Encode Settings: Quality 100, Highest Compression (Encode Time in Seconds, Fewer Is Better)
  AMD AOCC 2.3: 7.546 (SE +/- 0.008, N = 3)
  LLVM Clang 11: 7.655 (SE +/- 0.008, N = 3)
  GCC 10.2: 8.861 (SE +/- 0.003, N = 3)
  1. (CC) gcc options: -fvisibility=hidden -O3 -march=znver2 -pthread -lm -ljpeg
SVT-AV1 This is a test of the Intel Open Visual Cloud Scalable Video Technology SVT-AV1 CPU-based multi-threaded video encoder for the AV1 video format with a sample 1080p YUV video file. Learn more via the OpenBenchmarking.org test page.
SVT-AV1 0.8 - Encoder Mode: Enc Mode 8 - Input: 1080p (Frames Per Second, More Is Better)
  AMD AOCC 2.3: 70.70 (SE +/- 0.46, N = 3)
  LLVM Clang 11: 70.23 (SE +/- 0.30, N = 3)
  GCC 10.2: 55.71 (SE +/- 0.29, N = 3)
  1. (CXX) g++ options: -O3 -fcommon -fPIE -fPIC -pie
ASTC Encoder ASTC Encoder (astcenc) is for the Adaptive Scalable Texture Compression (ASTC) format commonly used with OpenGL, OpenGL ES, and Vulkan graphics APIs. This test profile does a coding test of both compression/decompression. Learn more via the OpenBenchmarking.org test page.
ASTC Encoder 2.0 - Preset: Medium (Seconds, Fewer Is Better)
  AMD AOCC 2.3: 5.81 (SE +/- 0.01, N = 3)
  LLVM Clang 11: 6.05 (SE +/- 0.01, N = 3)
  GCC 10.2: 7.00 (SE +/- 0.01, N = 3)
  1. (CXX) g++ options: -std=c++14 -fvisibility=hidden -O3 -flto -mfpmath=sse -mavx2 -mpopcnt -lpthread
dav1d Dav1d is an open-source, speedy AV1 video decoder. This test profile times how long it takes to decode sample AV1 video content. Learn more via the OpenBenchmarking.org test page.
dav1d 0.7.0 - Video Input: Summer Nature 1080p (FPS, More Is Better)
  AMD AOCC 2.3: 588.62 (SE +/- 2.09, N = 3; MIN: 345.64 / MAX: 651.08)
  LLVM Clang 11: 584.65 (SE +/- 0.95, N = 3; MIN: 337.56 / MAX: 641.35)
  GCC 10.2: 567.42 (SE +/- 0.36, N = 3; MIN: 337.19 / MAX: 625.71)
  1. (CC) gcc options: -O3 -march=znver2 -pthread
x264 This is a simple test of the x264 encoder run on the CPU (OpenCL support disabled) with a sample video file. Learn more via the OpenBenchmarking.org test page.
x264 2019-12-17 - H.264 Video Encoding (Frames Per Second, More Is Better)
  AMD AOCC 2.3: 151.92 (SE +/- 0.67, N = 3; -mstack-alignment=64)
  GCC 10.2: 149.35 (SE +/- 1.24, N = 3)
  LLVM Clang 11: 146.43 (SE +/- 0.59, N = 3; -mstack-alignment=64)
  1. (CC) gcc options: -ldl -m64 -lm -lpthread -O3 -ffast-math -march=znver2 -std=gnu99 -fPIC -fomit-frame-pointer -fno-tree-vectorize
oneDNN This is a test of the Intel oneDNN as an Intel-optimized library for Deep Neural Networks and making use of its built-in benchdnn functionality. The result is the total perf time reported. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and MKL-DNN before being rebranded as part of the oneAPI initiative. Learn more via the OpenBenchmarking.org test page.
oneDNN 1.5 - Harness: Deconvolution Batch deconv_3d - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  AMD AOCC 2.3: 1.87485 (SE +/- 0.00189, N = 3; MIN: 1.82; -fopenmp=libomp)
  GCC 10.2: 1.94380 (SE +/- 0.00392, N = 3; MIN: 1.82; -fopenmp)
  LLVM Clang 11: 1.98927 (SE +/- 0.00691, N = 3; MIN: 1.9; -fopenmp=libomp)
  1. (CXX) g++ options: -O3 -march=native -std=c++11 -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 1.5 - Harness: Deconvolution Batch deconv_3d - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  AMD AOCC 2.3: 3.66096 (SE +/- 0.01101, N = 3; MIN: 3.49; -fopenmp=libomp)
  GCC 10.2: 3.68073 (SE +/- 0.01624, N = 3; MIN: 3.53; -fopenmp)
  LLVM Clang 11: 3.72354 (SE +/- 0.00775, N = 3; MIN: 3.57; -fopenmp=libomp)
  1. (CXX) g++ options: -O3 -march=native -std=c++11 -msse4.1 -fPIC -pie -lpthread -ldl
SVT-VP9 This is a test of the Intel Open Visual Cloud Scalable Video Technology SVT-VP9 CPU-based multi-threaded video encoder for the VP9 video format with a sample 1080p YUV video file. Learn more via the OpenBenchmarking.org test page.
SVT-VP9 0.1 - Tuning: Visual Quality Optimized - Input: Bosphorus 1080p (Frames Per Second, More Is Better)
  AMD AOCC 2.3: 291.22 (SE +/- 0.90, N = 3)
  LLVM Clang 11: 286.24 (SE +/- 1.68, N = 3)
  GCC 10.2: 279.73 (SE +/- 1.18, N = 3)
  1. (CC) gcc options: -O3 -fcommon -fPIE -fPIC -fvisibility=hidden -pie -rdynamic -lpthread -lrt -lm
TSCP This is a performance test of TSCP, Tom Kerrigan's Simple Chess Program, which has a built-in performance benchmark. Learn more via the OpenBenchmarking.org test page.
TSCP 1.81 - AI Chess Performance (Nodes Per Second, More Is Better)
  LLVM Clang 11: 1143642 (SE +/- 582.00, N = 5)
  AMD AOCC 2.3: 1138442 (SE +/- 471.20, N = 5)
  GCC 10.2: 1007283 (SE +/- 1467.20, N = 5)
  1. (CC) gcc options: -O3 -march=znver2 -march=native
SVT-VP9 This is a test of the Intel Open Visual Cloud Scalable Video Technology SVT-VP9 CPU-based multi-threaded video encoder for the VP9 video format with a sample 1080p YUV video file. Learn more via the OpenBenchmarking.org test page.
SVT-VP9 0.1 - Tuning: PSNR/SSIM Optimized - Input: Bosphorus 1080p (Frames Per Second, More Is Better)
  AMD AOCC 2.3: 368.17 (SE +/- 0.54, N = 3)
  LLVM Clang 11: 365.57 (SE +/- 1.74, N = 3)
  GCC 10.2: 348.24 (SE +/- 1.42, N = 3)
  1. (CC) gcc options: -O3 -fcommon -fPIE -fPIC -fvisibility=hidden -pie -rdynamic -lpthread -lrt -lm
SVT-VP9 0.1 - Tuning: VMAF Optimized - Input: Bosphorus 1080p (Frames Per Second, More Is Better)
  AMD AOCC 2.3: 366.61 (SE +/- 1.39, N = 3)
  LLVM Clang 11: 363.72 (SE +/- 1.70, N = 3)
  GCC 10.2: 354.98 (SE +/- 2.15, N = 3)
  1. (CC) gcc options: -O3 -fcommon -fPIE -fPIC -fvisibility=hidden -pie -rdynamic -lpthread -lrt -lm
GCC 10.2 Processor: AMD EPYC 7502 32-Core @ 2.50GHz (32 Cores / 64 Threads), Motherboard: ASRockRack EPYCD8 (P2.10 BIOS), Chipset: AMD Starship/Matisse, Memory: 126GB, Disk: 280GB INTEL SSDPED1D280GA, Graphics: ASPEED, Audio: AMD Starship/Matisse, Monitor: VE228, Network: 2 x Intel I350
OS: Ubuntu 20.10, Kernel: 5.8.0-31-generic (x86_64), Desktop: GNOME Shell 3.38.1, Display Server: X Server 1.20.9, Display Driver: modesetting 1.20.9, Compiler: GCC 10.2.0, File-System: ext4, Screen Resolution: 1920x1080
Environment Notes: CXXFLAGS="-O3 -march=znver2" CFLAGS="-O3 -march=znver2"
Compiler Notes: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-10-JvwpWM/gcc-10-10.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-10-JvwpWM/gcc-10-10.2.0/debian/tmp-gcn/usr,hsa --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Notes: Scaling Governor: acpi-cpufreq ondemand (Boost: Enabled) - CPU Microcode: 0x830101c
Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Full AMD retpoline IBPB: conditional IBRS_FW STIBP: always-on RSB filling + srbds: Not affected + tsx_async_abort: Not affected
Testing initiated at 7 December 2020 16:58 by user phoronix.
LLVM Clang 11 Processor: AMD EPYC 7502 32-Core @ 2.50GHz (32 Cores / 64 Threads), Motherboard: ASRockRack EPYCD8 (P2.10 BIOS), Chipset: AMD Starship/Matisse, Memory: 126GB, Disk: 280GB INTEL SSDPED1D280GA, Graphics: ASPEED, Audio: AMD Starship/Matisse, Monitor: VE228, Network: 2 x Intel I350
OS: Ubuntu 20.10, Kernel: 5.8.0-31-generic (x86_64), Desktop: GNOME Shell 3.38.1, Display Server: X Server 1.20.9, Display Driver: modesetting 1.20.9, Compiler: Clang 11.0.0-2, File-System: ext4, Screen Resolution: 1920x1080
Environment Notes: CXXFLAGS="-O3 -march=znver2" CFLAGS="-O3 -march=znver2"
Processor Notes: Scaling Governor: acpi-cpufreq ondemand (Boost: Enabled) - CPU Microcode: 0x830101c
Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Full AMD retpoline IBPB: conditional IBRS_FW STIBP: always-on RSB filling + srbds: Not affected + tsx_async_abort: Not affected
Testing initiated at 8 December 2020 06:09 by user phoronix.
AMD AOCC 2.3 Processor: AMD EPYC 7502 32-Core @ 2.50GHz (32 Cores / 64 Threads), Motherboard: ASRockRack EPYCD8 (P2.10 BIOS), Chipset: AMD Starship/Matisse, Memory: 126GB, Disk: 280GB INTEL SSDPED1D280GA, Graphics: ASPEED, Audio: AMD Starship/Matisse, Monitor: VE228, Network: 2 x Intel I350
OS: Ubuntu 20.10, Kernel: 5.8.0-31-generic (x86_64), Desktop: GNOME Shell 3.38.1, Display Server: X Server 1.20.9, Display Driver: modesetting 1.20.9, Compiler: Clang 11.0.0, File-System: ext4, Screen Resolution: 1920x1080
Environment Notes: CXXFLAGS="-O3 -march=znver2" CFLAGS="-O3 -march=znver2"
Compiler Notes: Optimized build with assertions; Default target: x86_64-unknown-linux-gnu; Host CPU: znver2
Processor Notes: Scaling Governor: acpi-cpufreq ondemand (Boost: Enabled) - CPU Microcode: 0x830101c
Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Full AMD retpoline IBPB: conditional IBRS_FW STIBP: always-on RSB filling + srbds: Not affected + tsx_async_abort: Not affected
Testing initiated at 7 December 2020 10:27 by user phoronix.