Benchmarks for a future article.
GCC 10.2
Processor: AMD Ryzen 9 5950X 16-Core @ 3.40GHz (16 Cores / 32 Threads), Motherboard: ASUS ROG CROSSHAIR VIII HERO (WI-FI) (3204 BIOS), Chipset: AMD Starship/Matisse, Memory: 32GB, Disk: 2000GB Corsair Force MP600 + 2000GB, Graphics: AMD NAVY_FLOUNDER 12GB (2855/1000MHz), Audio: AMD Device ab28, Monitor: ASUS MG28U, Network: Realtek RTL8125 2.5GbE + Intel I211 + Intel Wi-Fi 6 AX200
OS: Ubuntu 20.10, Kernel: 5.11.6-051106-generic (x86_64), Desktop: GNOME Shell 3.38.2, Display Server: X Server 1.20.9, OpenGL: 4.6 Mesa 21.1.0-devel (git-684f97d 2021-03-12 groovy-oibaf-ppa) (LLVM 11.0.1), Vulkan: 1.2.168, Compiler: GCC 10.2.0, File-System: ext4, Screen Resolution: 3840x2160
Kernel Notes: Transparent Huge Pages: madvise
Environment Notes: CXXFLAGS="-O3 -march=native" CFLAGS="-O3 -march=native"
Compiler Notes: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-10-JvwpWM/gcc-10-10.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-10-JvwpWM/gcc-10-10.2.0/debian/tmp-gcn/usr,hsa --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Notes: Scaling Governor: acpi-cpufreq performance (Boost: Enabled) - CPU Microcode: 0xa201009
Python Notes: Python 3.8.6
Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Full AMD retpoline IBPB: conditional IBRS_FW STIBP: always-on RSB filling + srbds: Not affected + tsx_async_abort: Not affected
AMD AOCC 2.3
OS: Ubuntu 20.10, Kernel: 5.11.6-051106-generic (x86_64), Desktop: GNOME Shell 3.38.2, Display Server: X Server 1.20.9, OpenGL: 4.6 Mesa 21.1.0-devel (git-684f97d 2021-03-12 groovy-oibaf-ppa) (LLVM 11.0.1), Vulkan: 1.2.168, Compiler: Clang 11.0.0, File-System: ext4, Screen Resolution: 3840x2160
Kernel Notes: Transparent Huge Pages: madvise
Environment Notes: CXXFLAGS="-O3 -march=native" CFLAGS="-O3 -march=native"
Compiler Notes: Optimized build with assertions; Default target: x86_64-unknown-linux-gnu; Host CPU: (unknown)
Processor Notes: Scaling Governor: acpi-cpufreq performance (Boost: Enabled) - CPU Microcode: 0xa201009
Python Notes: Python 3.8.6
Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Full AMD retpoline IBPB: conditional IBRS_FW STIBP: always-on RSB filling + srbds: Not affected + tsx_async_abort: Not affected
LLVM Clang 12
OS: Ubuntu 20.10, Kernel: 5.11.6-051106-generic (x86_64), Desktop: GNOME Shell 3.38.2, Display Server: X Server 1.20.9, OpenGL: 4.6 Mesa 21.1.0-devel (git-684f97d 2021-03-12 groovy-oibaf-ppa) (LLVM 11.0.1), Vulkan: 1.2.168, Compiler: Clang 12.0.0-++rc3-1~exp1~oibaf~g, File-System: ext4, Screen Resolution: 3840x2160
Kernel Notes: Transparent Huge Pages: madvise
Environment Notes: CXXFLAGS="-O3 -march=native" CFLAGS="-O3 -march=native"
Processor Notes: Scaling Governor: acpi-cpufreq performance (Boost: Enabled) - CPU Microcode: 0xa201009
Python Notes: Python 3.8.6
Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Full AMD retpoline IBPB: conditional IBRS_FW STIBP: always-on RSB filling + srbds: Not affected + tsx_async_abort: Not affected
AMD AOCC 3.0
OS: Ubuntu 20.10, Kernel: 5.11.6-051106-generic (x86_64), Desktop: GNOME Shell 3.38.2, Display Server: X Server 1.20.9, OpenGL: 4.6 Mesa 21.1.0-devel (git-684f97d 2021-03-12 groovy-oibaf-ppa) (LLVM 11.0.1), Vulkan: 1.2.168, Compiler: Clang 12.0.0, File-System: ext4, Screen Resolution: 3840x2160
Kernel Notes: Transparent Huge Pages: madvise
Environment Notes: CXXFLAGS="-O3 -march=native" CFLAGS="-O3 -march=native"
Compiler Notes: Optimized build with assertions; Default target: x86_64-unknown-linux-gnu; Host CPU: (unknown)
Processor Notes: Scaling Governor: acpi-cpufreq performance (Boost: Enabled) - CPU Microcode: 0xa201009
Python Notes: Python 3.8.6
Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Full AMD retpoline IBPB: conditional IBRS_FW STIBP: always-on RSB filling + srbds: Not affected + tsx_async_abort: Not affected
Sysbench This is a benchmark of Sysbench with the built-in CPU and memory sub-tests. Sysbench is a scriptable multi-threaded benchmark tool based on LuaJIT. Learn more via the OpenBenchmarking.org test page.
Sysbench 1.0.20, Test: CPU (Events Per Second, More Is Better)
  AMD AOCC 3.0: 210533984.51 (SE +/- 204338.54, N = 3)
  LLVM Clang 12: 2445437.63 (SE +/- 5355.39, N = 3)
  AMD AOCC 2.3: 210804861.92 (SE +/- 301436.60, N = 3)
  GCC 10.2: 91743.72 (SE +/- 115.96, N = 3)
  1. (CC) gcc options: -pthread -O2 -funroll-loops -O3 -march=native -rdynamic -ldl -laio -lm
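The Sysbench CPU figures differ by several orders of magnitude between compilers, which is easier to see as ratios than as raw event counts. The following is a minimal sketch that divides each result by the GCC 10.2 result; the helper name `relative_to` is hypothetical and the numbers are copied from the results above.

```python
# Sysbench 1.0.20 CPU results (events per second, more is better),
# copied from the figures reported above.
results = {
    "AMD AOCC 3.0": 210533984.51,
    "LLVM Clang 12": 2445437.63,
    "AMD AOCC 2.3": 210804861.92,
    "GCC 10.2": 91743.72,
}


def relative_to(baseline, scores):
    """Return each score divided by the baseline configuration's score.

    Hypothetical helper for illustration; assumes "more is better".
    """
    base = scores[baseline]
    return {name: value / base for name, value in scores.items()}


ratios = relative_to("GCC 10.2", results)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:,.1f}x GCC 10.2")
```

A gap this wide on a CPU-bound loop is worth inspecting in the generated code before it is read as a genuine throughput difference.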
Etcpak Etcpak is the self-proclaimed "fastest ETC compressor on the planet", focused on providing open-source, very fast ETC and S3 texture compression support. Learn more via the OpenBenchmarking.org test page.
Etcpak 0.7, Configuration: DXT1 (Mpx/s, More Is Better)
  AMD AOCC 3.0: 3583.01 (SE +/- 7.06, N = 3)
  LLVM Clang 12: 3669.19 (SE +/- 26.84, N = 3)
  AMD AOCC 2.3: 2986.71 (SE +/- 5.75, N = 3)
  GCC 10.2: 1546.30 (SE +/- 2.21, N = 3)
  1. (CXX) g++ options: -O3 -march=native -std=c++11 -lpthread
C-Ray This is a test of C-Ray, a simple raytracer designed to test the floating-point CPU performance. This test is multi-threaded (16 threads per core), will shoot 8 rays per pixel for anti-aliasing, and will generate a 1600 x 1200 image. Learn more via the OpenBenchmarking.org test page.
C-Ray 1.1, Total Time - 4K, 16 Rays Per Pixel (Seconds, Fewer Is Better)
  AMD AOCC 3.0: 44.33 (SE +/- 0.08, N = 3)
  LLVM Clang 12: 44.89 (SE +/- 0.08, N = 3)
  AMD AOCC 2.3: 44.53 (SE +/- 0.06, N = 3)
  GCC 10.2: 25.09 (SE +/- 0.07, N = 3)
  1. (CC) gcc options: -lm -lpthread -O3 -march=native
GraphicsMagick This is a test of GraphicsMagick with its OpenMP implementation that performs various imaging tests on a sample 6000x4000 pixel JPEG image. Learn more via the OpenBenchmarking.org test page.
GraphicsMagick 1.3.33, Operation: Sharpen (Iterations Per Minute, More Is Better)
  AMD AOCC 3.0: 240 (SE +/- 0.67, N = 3)
  LLVM Clang 12: 237 (SE +/- 0.58, N = 3)
  AMD AOCC 2.3: 241 (SE +/- 0.58, N = 3)
  GCC 10.2: 375 (SE +/- 1.00, N = 3)
  1. (CC) gcc options: -fopenmp -O3 -march=native -pthread -ljbig -lwebp -lwebpmux -ltiff -lfreetype -ljpeg -lXext -lSM -lICE -lX11 -llzma -lbz2 -lxml2 -lz -lm -lpthread
LibRaw LibRaw is a RAW image decoder for digital camera photos. This test profile runs LibRaw's post-processing benchmark. Learn more via the OpenBenchmarking.org test page.
LibRaw 0.20, Post-Processing Benchmark (Mpix/sec, More Is Better)
  AMD AOCC 3.0: 52.68 (SE +/- 0.09, N = 3)
  LLVM Clang 12: 54.14 (SE +/- 0.11, N = 3)
  AMD AOCC 2.3: 50.37 (SE +/- 0.08, N = 3)
  GCC 10.2: 78.66 (SE +/- 0.16, N = 3)
  1. (CXX) g++ options: -O3 -march=native -fopenmp -ljpeg -lz -lm
NCNN NCNN is a high performance neural network inference framework optimized for mobile and other platforms developed by Tencent. Learn more via the OpenBenchmarking.org test page.
NCNN 20201218, Target: CPU - Model: regnety_400m (ms, Fewer Is Better)
  AMD AOCC 3.0: 12.30 (SE +/- 0.09, N = 3; -lomp - MIN: 11.96 / MAX: 17.61)
  LLVM Clang 12: 17.06 (SE +/- 0.06, N = 3; -lomp - MIN: 16.85 / MAX: 20.53)
  AMD AOCC 2.3: 12.19 (SE +/- 0.11, N = 3; -lomp - MIN: 11.89 / MAX: 13.6)
  GCC 10.2: 17.61 (SE +/- 0.06, N = 15; -lgomp - MIN: 16.94 / MAX: 25.97)
  1. (CXX) g++ options: -O3 -march=native -rdynamic -lpthread
GraphicsMagick This is a test of GraphicsMagick with its OpenMP implementation that performs various imaging tests on a sample 6000x4000 pixel JPEG image. Learn more via the OpenBenchmarking.org test page.
GraphicsMagick 1.3.33, Operation: HWB Color Space (Iterations Per Minute, More Is Better)
  AMD AOCC 3.0: 805
  LLVM Clang 12: 844
  AMD AOCC 2.3: 848
  GCC 10.2: 1115
  Standard errors as reported (three values for four results): SE +/- 3.06, N = 3; SE +/- 1.00, N = 3; SE +/- 1.33, N = 3
  1. (CC) gcc options: -fopenmp -O3 -march=native -pthread -ljbig -lwebp -lwebpmux -ltiff -lfreetype -ljpeg -lXext -lSM -lICE -lX11 -llzma -lbz2 -lxml2 -lz -lm -lpthread
ASTC Encoder ASTC Encoder (astcenc) is for the Adaptive Scalable Texture Compression (ASTC) format commonly used with OpenGL, OpenGL ES, and Vulkan graphics APIs. This test profile does a coding test of both compression/decompression. Learn more via the OpenBenchmarking.org test page.
ASTC Encoder 2.4, Preset: Thorough (Seconds, Fewer Is Better)
  AMD AOCC 3.0: 9.3493 (SE +/- 0.0148, N = 3)
  LLVM Clang 12: 9.4996 (SE +/- 0.0075, N = 3)
  AMD AOCC 2.3: 9.2012 (SE +/- 0.0090, N = 3)
  GCC 10.2: 6.9922 (SE +/- 0.0057, N = 3)
  1. (CXX) g++ options: -O3 -march=native -flto -pthread
Etcpak Etcpak is the self-proclaimed "fastest ETC compressor on the planet", focused on providing open-source, very fast ETC and S3 texture compression support. Learn more via the OpenBenchmarking.org test page.
Etcpak 0.7, Configuration: ETC1 (Mpx/s, More Is Better)
  AMD AOCC 3.0: 286.93 (SE +/- 0.06, N = 3)
  LLVM Clang 12: 383.47 (SE +/- 0.14, N = 3)
  AMD AOCC 2.3: 285.29 (SE +/- 1.16, N = 3)
  GCC 10.2: 386.56 (SE +/- 0.37, N = 3)
  1. (CXX) g++ options: -O3 -march=native -std=c++11 -lpthread
SVT-AV1 This is a test of the Intel Open Visual Cloud Scalable Video Technology SVT-AV1 CPU-based multi-threaded video encoder for the AV1 video format with a sample 1080p YUV video file. Learn more via the OpenBenchmarking.org test page.
SVT-AV1 0.8, Encoder Mode: Enc Mode 8 - Input: 1080p (Frames Per Second, More Is Better)
  AMD AOCC 3.0: 64.28 (SE +/- 0.66, N = 3)
  LLVM Clang 12: 65.31 (SE +/- 0.13, N = 3)
  AMD AOCC 2.3: 65.75 (SE +/- 0.13, N = 3)
  GCC 10.2: 51.77 (SE +/- 0.24, N = 3)
  1. (CXX) g++ options: -O3 -fcommon -fPIE -fPIC -pie
GraphicsMagick This is a test of GraphicsMagick with its OpenMP implementation that performs various imaging tests on a sample 6000x4000 pixel JPEG image. Learn more via the OpenBenchmarking.org test page.
GraphicsMagick 1.3.33, Operation: Resizing (Iterations Per Minute, More Is Better)
  AMD AOCC 3.0: 1720
  LLVM Clang 12: 1789
  AMD AOCC 2.3: 1824
  GCC 10.2: 2165
  Standard errors as reported (three values for four results): SE +/- 2.65, N = 3; SE +/- 1.15, N = 3; SE +/- 1.45, N = 3
  1. (CC) gcc options: -fopenmp -O3 -march=native -pthread -ljbig -lwebp -lwebpmux -ltiff -lfreetype -ljpeg -lXext -lSM -lICE -lX11 -llzma -lbz2 -lxml2 -lz -lm -lpthread
NCNN NCNN is a high performance neural network inference framework optimized for mobile and other platforms developed by Tencent. Learn more via the OpenBenchmarking.org test page.
NCNN 20201218, Target: CPU-v2-v2 - Model: mobilenet-v2 (ms, Fewer Is Better)
  AMD AOCC 3.0: 3.53 (SE +/- 0.06, N = 4; -lomp - MIN: 3.27 / MAX: 4.75)
  LLVM Clang 12: 3.79 (SE +/- 0.04, N = 3; -lomp - MIN: 3.63 / MAX: 5.2)
  AMD AOCC 2.3: 3.52 (SE +/- 0.03, N = 3; -lomp - MIN: 3.34 / MAX: 4.84)
  GCC 10.2: 4.43 (SE +/- 0.01, N = 15; -lgomp - MIN: 4.19 / MAX: 11.09)
  1. (CXX) g++ options: -O3 -march=native -rdynamic -lpthread
NCNN 20201218, Target: CPU-v3-v3 - Model: mobilenet-v3 (ms, Fewer Is Better)
  AMD AOCC 3.0: 3.07 (SE +/- 0.07, N = 4; -lomp - MIN: 2.9 / MAX: 4.41)
  LLVM Clang 12: 3.33 (SE +/- 0.05, N = 3; -lomp - MIN: 3.19 / MAX: 5.6)
  AMD AOCC 2.3: 3.06 (SE +/- 0.03, N = 3; -lomp - MIN: 2.98 / MAX: 4.3)
  GCC 10.2: 3.85 (SE +/- 0.02, N = 15; -lgomp - MIN: 3.74 / MAX: 10.85)
  1. (CXX) g++ options: -O3 -march=native -rdynamic -lpthread
TNN TNN is an open-source deep learning reasoning framework developed by Tencent. Learn more via the OpenBenchmarking.org test page.
TNN 0.2.3, Target: CPU - Model: MobileNet v2 (ms, Fewer Is Better)
  AMD AOCC 3.0: 260.66 (SE +/- 0.69, N = 3; -fopenmp=libomp - MIN: 257.51 / MAX: 262.88)
  LLVM Clang 12: 270.79 (SE +/- 0.58, N = 3; -fopenmp=libomp - MIN: 268.42 / MAX: 272.22)
  AMD AOCC 2.3: 252.45 (SE +/- 0.35, N = 3; -fopenmp=libomp - MIN: 250.25 / MAX: 255.53)
  GCC 10.2: 216.28 (SE +/- 0.56, N = 3; -fopenmp - MIN: 215.1 / MAX: 218.26)
  1. (CXX) g++ options: -O3 -march=native -pthread -fvisibility=hidden -rdynamic -ldl
NCNN NCNN is a high performance neural network inference framework optimized for mobile and other platforms developed by Tencent. Learn more via the OpenBenchmarking.org test page.
NCNN 20201218, Target: CPU - Model: mnasnet (ms, Fewer Is Better)
  AMD AOCC 3.0: 3.18 (SE +/- 0.03, N = 4; -lomp - MIN: 3.04 / MAX: 4.48)
  LLVM Clang 12: 3.45 (SE +/- 0.02, N = 3; -lomp - MIN: 3.37 / MAX: 4.6)
  AMD AOCC 2.3: 3.16 (SE +/- 0.03, N = 3; -lomp - MIN: 3.06 / MAX: 4.05)
  GCC 10.2: 3.93 (SE +/- 0.02, N = 15; -lgomp - MIN: 3.71 / MAX: 6.06)
  1. (CXX) g++ options: -O3 -march=native -rdynamic -lpthread
Ogg Audio Encoding This test times how long it takes to encode a sample WAV file to Ogg format using the reference Xiph.org tools/libraries. Learn more via the OpenBenchmarking.org test page.
Ogg Audio Encoding 1.3.4, WAV To Ogg (Seconds, Fewer Is Better)
  AMD AOCC 3.0: 16.54 (SE +/- 0.09, N = 3)
  LLVM Clang 12: 13.37 (SE +/- 0.12, N = 3)
  AMD AOCC 2.3: 16.56 (SE +/- 0.08, N = 3)
  GCC 10.2: 13.58 (SE +/- 0.04, N = 3)
  1. (CC) gcc options: -O2 -ffast-math -fsigned-char -O3 -march=native
ASTC Encoder ASTC Encoder (astcenc) is for the Adaptive Scalable Texture Compression (ASTC) format commonly used with OpenGL, OpenGL ES, and Vulkan graphics APIs. This test profile does a coding test of both compression/decompression. Learn more via the OpenBenchmarking.org test page.
ASTC Encoder 2.4, Preset: Medium (Seconds, Fewer Is Better)
  AMD AOCC 3.0: 3.4040 (SE +/- 0.0017, N = 3)
  LLVM Clang 12: 3.5076 (SE +/- 0.0018, N = 3)
  AMD AOCC 2.3: 3.2899 (SE +/- 0.0273, N = 3)
  GCC 10.2: 4.0524 (SE +/- 0.0178, N = 3)
  1. (CXX) g++ options: -O3 -march=native -flto -pthread
Google SynthMark SynthMark is a cross platform tool for benchmarking CPU performance under a variety of real-time audio workloads. It uses a polyphonic synthesizer model to provide standardized tests for latency, jitter and computational throughput. Learn more via the OpenBenchmarking.org test page.
Google SynthMark 20201109, Test: VoiceMark_100 (Voices, More Is Better)
  AMD AOCC 3.0: 789.22 (SE +/- 5.41, N = 3)
  LLVM Clang 12: 795.81 (SE +/- 4.01, N = 3)
  AMD AOCC 2.3: 807.37 (SE +/- 5.04, N = 3)
  GCC 10.2: 966.30 (SE +/- 1.26, N = 3)
  1. (CXX) g++ options: -lm -lpthread -std=c++11 -Ofast
Zstd Compression This test measures the time needed to compress/decompress a sample file (a FreeBSD disk image - FreeBSD-12.2-RELEASE-amd64-memstick.img) using Zstd compression with options for different compression levels / settings. Learn more via the OpenBenchmarking.org test page.
Zstd Compression 1.4.9, Compression Level: 3, Long Mode - Compression Speed (MB/s, More Is Better)
  AMD AOCC 3.0: 1186.0 (SE +/- 2.80, N = 3)
  LLVM Clang 12: 1191.6 (SE +/- 1.19, N = 3)
  AMD AOCC 2.3: 1166.4 (SE +/- 4.71, N = 3)
  GCC 10.2: 1425.9 (SE +/- 2.43, N = 3)
  1. (CC) gcc options: -O3 -march=native -pthread -lz -llzma
GraphicsMagick This is a test of GraphicsMagick with its OpenMP implementation that performs various imaging tests on a sample 6000x4000 pixel JPEG image. Learn more via the OpenBenchmarking.org test page.
GraphicsMagick 1.3.33, Operation: Rotate (Iterations Per Minute, More Is Better)
  AMD AOCC 3.0: 867 (SE +/- 2.03, N = 3)
  LLVM Clang 12: 1016 (SE +/- 1.86, N = 3)
  AMD AOCC 2.3: 928 (SE +/- 8.67, N = 3)
  GCC 10.2: 1056 (SE +/- 3.51, N = 3)
  1. (CC) gcc options: -fopenmp -O3 -march=native -pthread -ljbig -lwebp -lwebpmux -ltiff -lfreetype -ljpeg -lXext -lSM -lICE -lX11 -llzma -lbz2 -lxml2 -lz -lm -lpthread
NCNN NCNN is a high performance neural network inference framework optimized for mobile and other platforms developed by Tencent. Learn more via the OpenBenchmarking.org test page.
NCNN 20201218, Target: CPU - Model: efficientnet-b0 (ms, Fewer Is Better)
  AMD AOCC 3.0: 4.50 (SE +/- 0.03, N = 4; -lomp - MIN: 4.34 / MAX: 5.7)
  LLVM Clang 12: 4.80 (SE +/- 0.02, N = 3; -lomp - MIN: 4.71 / MAX: 6.61)
  AMD AOCC 2.3: 4.53 (SE +/- 0.07, N = 3; -lomp - MIN: 4.35 / MAX: 6.86)
  GCC 10.2: 5.32 (SE +/- 0.02, N = 15; -lgomp - MIN: 5.15 / MAX: 13.83)
  1. (CXX) g++ options: -O3 -march=native -rdynamic -lpthread
TSCP This is a performance test of TSCP, Tom Kerrigan's Simple Chess Program, which has a built-in performance benchmark. Learn more via the OpenBenchmarking.org test page.
TSCP 1.81, AI Chess Performance (Nodes Per Second, More Is Better)
  AMD AOCC 3.0: 2283512 (SE +/- 3546.09, N = 5)
  LLVM Clang 12: 2148154 (SE +/- 4267.44, N = 5)
  AMD AOCC 2.3: 2314225 (SE +/- 4348.49, N = 5)
  GCC 10.2: 1965773 (SE +/- 7442.75, N = 5)
  1. (CC) gcc options: -O3 -march=native
NCNN NCNN is a high performance neural network inference framework optimized for mobile and other platforms developed by Tencent. Learn more via the OpenBenchmarking.org test page.
NCNN 20201218, Target: CPU - Model: blazeface (ms, Fewer Is Better)
  AMD AOCC 3.0: 1.56 (SE +/- 0.02, N = 4; -lomp - MIN: 1.46 / MAX: 6.9)
  LLVM Clang 12: 1.73 (SE +/- 0.02, N = 3; -lomp - MIN: 1.68 / MAX: 1.79)
  AMD AOCC 2.3: 1.57 (SE +/- 0.01, N = 3; -lomp - MIN: 1.54 / MAX: 1.75)
  GCC 10.2: 1.83 (SE +/- 0.01, N = 15; -lgomp - MIN: 1.77 / MAX: 3.9)
  1. (CXX) g++ options: -O3 -march=native -rdynamic -lpthread
QuantLib QuantLib is an open-source library/framework around quantitative finance for modeling, trading and risk management scenarios. QuantLib is written in C++ with Boost, and the built-in benchmark used here reports the QuantLib Benchmark Index score. Learn more via the OpenBenchmarking.org test page.
QuantLib 1.21 (MFLOPS, More Is Better)
  AMD AOCC 3.0: 3646.4 (SE +/- 27.64, N = 10)
  LLVM Clang 12: 3538.5 (SE +/- 49.56, N = 3)
  AMD AOCC 2.3: 3710.4 (SE +/- 28.46, N = 10)
  GCC 10.2: 3196.9 (SE +/- 33.41, N = 5)
  1. (CXX) g++ options: -O3 -march=native -rdynamic
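Each result carries an "SE +/-" figure and a run count N, which helps separate real differences from run-to-run noise. The sketch below assumes that SE is the standard error of the mean (standard deviation divided by the square root of N) and that runs are independent; both helper names are hypothetical, and the figures are the QuantLib results above.

```python
import math

# QuantLib 1.21 results (MFLOPS, more is better): (mean, standard error, runs).
quantlib = {
    "AMD AOCC 3.0": (3646.4, 27.64, 10),
    "LLVM Clang 12": (3538.5, 49.56, 3),
    "AMD AOCC 2.3": (3710.4, 28.46, 10),
    "GCC 10.2": (3196.9, 33.41, 5),
}


def gap_in_standard_errors(a, b):
    """Difference of two means divided by their combined standard error.

    Assumes independent runs, so the two errors add in quadrature.
    """
    mean_a, se_a, _ = a
    mean_b, se_b, _ = b
    return abs(mean_a - mean_b) / math.sqrt(se_a ** 2 + se_b ** 2)


def sample_std_dev(result):
    """Recover the run-to-run standard deviation, assuming SE = sd / sqrt(N)."""
    _, se, n = result
    return se * math.sqrt(n)


aocc_gap = gap_in_standard_errors(quantlib["AMD AOCC 3.0"], quantlib["AMD AOCC 2.3"])
gcc_gap = gap_in_standard_errors(quantlib["AMD AOCC 2.3"], quantlib["GCC 10.2"])
print(f"AOCC 3.0 vs AOCC 2.3: {aocc_gap:.1f} combined standard errors")
print(f"AOCC 2.3 vs GCC 10.2: {gcc_gap:.1f} combined standard errors")
```

Under these assumptions, a gap of only one or two combined standard errors is hard to distinguish from noise, while a gap of ten or more is not.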
GraphicsMagick This is a test of GraphicsMagick with its OpenMP implementation that performs various imaging tests on a sample 6000x4000 pixel JPEG image. Learn more via the OpenBenchmarking.org test page.
GraphicsMagick 1.3.33, Operation: Noise-Gaussian (Iterations Per Minute, More Is Better)
  AMD AOCC 3.0: 392
  LLVM Clang 12: 398
  AMD AOCC 2.3: 402
  GCC 10.2: 454
  Standard errors as reported (three values for four results): SE +/- 0.33, N = 3; SE +/- 0.33, N = 3; SE +/- 0.67, N = 3
  1. (CC) gcc options: -fopenmp -O3 -march=native -pthread -ljbig -lwebp -lwebpmux -ltiff -lfreetype -ljpeg -lXext -lSM -lICE -lX11 -llzma -lbz2 -lxml2 -lz -lm -lpthread
Etcpak Etcpak is the self-proclaimed "fastest ETC compressor on the planet", focused on providing open-source, very fast ETC and S3 texture compression support. Learn more via the OpenBenchmarking.org test page.
Etcpak 0.7, Configuration: ETC2 (Mpx/s, More Is Better)
  AMD AOCC 3.0: 242.03 (SE +/- 0.45, N = 3)
  LLVM Clang 12: 272.99 (SE +/- 0.09, N = 3)
  AMD AOCC 2.3: 236.47 (SE +/- 2.43, N = 3)
  GCC 10.2: 245.04 (SE +/- 1.65, N = 3)
  1. (CXX) g++ options: -O3 -march=native -std=c++11 -lpthread
Liquid-DSP LiquidSDR's Liquid-DSP is a software-defined radio (SDR) digital signal processing library. This test profile runs a multi-threaded benchmark of this SDR/DSP library focused on embedded platform usage. Learn more via the OpenBenchmarking.org test page.
Liquid-DSP 2021.01.31, Threads: 32 - Buffer Length: 256 - Filter Length: 57 (samples/s, More Is Better)
  AMD AOCC 3.0: 1334900000 (SE +/- 1422439.22, N = 3)
  LLVM Clang 12: 1335233333 (SE +/- 240370.09, N = 3)
  AMD AOCC 2.3: 1332333333 (SE +/- 1125956.38, N = 3)
  GCC 10.2: 1164966667 (SE +/- 497772.82, N = 3)
  1. (CC) gcc options: -O3 -march=native -pthread -lm -lc -lliquid
ONNX Runtime ONNX Runtime is developed by Microsoft and partners as an open-source, cross-platform, high performance machine learning inferencing and training accelerator. This test profile runs the ONNX Runtime with various models available from the ONNX Zoo. Learn more via the OpenBenchmarking.org test page.
ONNX Runtime 1.6, Model: shufflenet-v2-10 - Device: OpenMP CPU (Inferences Per Minute, More Is Better)
  AMD AOCC 3.0: 15474 (SE +/- 177.70, N = 3; -fopenmp=libomp)
  LLVM Clang 12: 14972 (SE +/- 123.46, N = 12; -fopenmp=libomp)
  AMD AOCC 2.3: 17105 (SE +/- 193.85, N = 4; -fopenmp=libomp)
  GCC 10.2: 15049 (SE +/- 134.84, N = 3; -fopenmp)
  1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -ldl -lrt
JPEG XL Decoding The JPEG XL Image Coding System is designed to provide next-generation JPEG image capabilities, with JPEG XL offering better image quality and compression than legacy JPEG. This test profile measures JPEG XL decode performance to a PNG output file; the pts/jpexl test is for encode performance. Learn more via the OpenBenchmarking.org test page.
JPEG XL Decoding 0.3.3, CPU Threads: 1 (MP/s, More Is Better)
  AMD AOCC 3.0: 59.92 (SE +/- 0.03, N = 3)
  LLVM Clang 12: 62.27 (SE +/- 0.04, N = 3)
  AMD AOCC 2.3: 64.34 (SE +/- 0.11, N = 3)
  GCC 10.2: 56.53 (SE +/- 0.05, N = 3)
WebP Image Encode This is a test of Google's libwebp with the cwebp image encode utility and using a sample 6000x4000 pixel JPEG image as the input. Learn more via the OpenBenchmarking.org test page.
WebP Image Encode 1.1, Encode Settings: Quality 100, Highest Compression (Encode Time - Seconds, Fewer Is Better)
  AMD AOCC 3.0: 4.937 (SE +/- 0.020, N = 3)
  LLVM Clang 12: 4.674 (SE +/- 0.055, N = 3)
  AMD AOCC 2.3: 4.609 (SE +/- 0.016, N = 3)
  GCC 10.2: 5.242 (SE +/- 0.018, N = 3)
  1. (CC) gcc options: -fvisibility=hidden -O3 -march=native -pthread -lm -ljpeg -lpng16 -ltiff
AOM AV1 This is a test of the AOMedia AV1 encoder (libaom) run on the CPU with a sample 1080p video file. Learn more via the OpenBenchmarking.org test page.
AOM AV1 2.1-rc, Encoder Mode: Speed 0 Two-Pass (Frames Per Second, More Is Better)
  LLVM Clang 12: 0.42 (SE +/- 0.00, N = 3)
  GCC 10.2: 0.37 (SE +/- 0.00, N = 3)
  1. (CXX) g++ options: -O3 -march=native -std=c++11 -U_FORTIFY_SOURCE -lm -lpthread
NCNN NCNN is a high performance neural network inference framework optimized for mobile and other platforms developed by Tencent. Learn more via the OpenBenchmarking.org test page.
NCNN 20201218, Target: CPU - Model: mobilenet (ms, Fewer Is Better)
  AMD AOCC 3.0: 11.28 (SE +/- 0.12, N = 4; -lomp - MIN: 10.61 / MAX: 20.99)
  LLVM Clang 12: 11.53 (SE +/- 0.05, N = 3; -lomp - MIN: 11.09 / MAX: 12.2)
  AMD AOCC 2.3: 10.96 (SE +/- 0.09, N = 3; -lomp - MIN: 10.51 / MAX: 16.79)
  GCC 10.2: 12.42 (SE +/- 0.16, N = 15; -lgomp - MIN: 11.7 / MAX: 20.08)
  1. (CXX) g++ options: -O3 -march=native -rdynamic -lpthread
SVT-AV1 This is a test of the Intel Open Visual Cloud Scalable Video Technology SVT-AV1 CPU-based multi-threaded video encoder for the AV1 video format with a sample 1080p YUV video file. Learn more via the OpenBenchmarking.org test page.
SVT-AV1 0.8, Encoder Mode: Enc Mode 4 - Input: 1080p (Frames Per Second, More Is Better)
  AMD AOCC 3.0: 6.917 (SE +/- 0.055, N = 3)
  LLVM Clang 12: 6.823 (SE +/- 0.033, N = 3)
  AMD AOCC 2.3: 6.859 (SE +/- 0.004, N = 3)
  GCC 10.2: 6.137 (SE +/- 0.014, N = 3)
  1. (CXX) g++ options: -O3 -fcommon -fPIE -fPIC -pie
NCNN NCNN is a high performance neural network inference framework optimized for mobile and other platforms developed by Tencent. Learn more via the OpenBenchmarking.org test page.
NCNN 20201218, Target: CPU - Model: shufflenet-v2 (ms, Fewer Is Better)
  AMD AOCC 3.0: 3.87 (SE +/- 0.06, N = 4; -lomp - MIN: 3.67 / MAX: 12.94)
  LLVM Clang 12: 4.04 (SE +/- 0.05, N = 3; -lomp - MIN: 3.88 / MAX: 5.03)
  AMD AOCC 2.3: 3.79 (SE +/- 0.05, N = 3; -lomp - MIN: 3.64 / MAX: 4.86)
  GCC 10.2: 4.23 (SE +/- 0.01, N = 15; -lgomp - MIN: 4.15 / MAX: 9.05)
  1. (CXX) g++ options: -O3 -march=native -rdynamic -lpthread
JPEG XL Decoding The JPEG XL Image Coding System is designed to provide next-generation JPEG image capabilities, with JPEG XL offering better image quality and compression than legacy JPEG. This test profile measures JPEG XL decode performance to a PNG output file; the pts/jpexl test is for encode performance. Learn more via the OpenBenchmarking.org test page.
JPEG XL Decoding 0.3.3, CPU Threads: All (MP/s, More Is Better)
  AMD AOCC 3.0: 191.91 (SE +/- 0.21, N = 3)
  LLVM Clang 12: 196.34 (SE +/- 0.05, N = 3)
  AMD AOCC 2.3: 213.67 (SE +/- 0.40, N = 3)
  GCC 10.2: 210.99 (SE +/- 0.29, N = 3)
NCNN NCNN is a high performance neural network inference framework optimized for mobile and other platforms developed by Tencent. Learn more via the OpenBenchmarking.org test page.
NCNN 20201218, Target: CPU - Model: squeezenet_ssd (ms, Fewer Is Better)
  AMD AOCC 3.0: 12.40 (SE +/- 0.24, N = 4; -lomp - MIN: 11.72 / MAX: 15.69)
  LLVM Clang 12: 12.50 (SE +/- 0.13, N = 3; -lomp - MIN: 12.23 / MAX: 16.6)
  AMD AOCC 2.3: 12.74 (SE +/- 0.10, N = 3; -lomp - MIN: 12 / MAX: 19.89)
  GCC 10.2: 13.77 (SE +/- 0.06, N = 15; -lgomp - MIN: 13.25 / MAX: 23.45)
  1. (CXX) g++ options: -O3 -march=native -rdynamic -lpthread
JPEG XL The JPEG XL Image Coding System is designed to provide next-generation JPEG image capabilities with JPEG XL offering better image quality and compression over legacy JPEG. This test profile is currently focused on the multi-threaded JPEG XL image encode performance. Learn more via the OpenBenchmarking.org test page.
JPEG XL 0.3.3, Input: JPEG - Encode Speed: 8 (MP/s, More Is Better)
  AMD AOCC 3.0: 34.34 (SE +/- 0.05, N = 3; -Xclang -mrelax-all)
  LLVM Clang 12: 36.44 (SE +/- 0.06, N = 3; -Xclang -mrelax-all)
  AMD AOCC 2.3: 35.87 (SE +/- 0.09, N = 3; -Xclang -mrelax-all)
  GCC 10.2: 38.13 (SE +/- 0.02, N = 3)
  1. (CXX) g++ options: -O3 -march=native -funwind-tables -O2 -pthread -fPIE -pie -ldl
NCNN NCNN is a high performance neural network inference framework optimized for mobile and other platforms developed by Tencent. Learn more via the OpenBenchmarking.org test page.
NCNN 20201218, Target: CPU - Model: resnet50 (ms, Fewer Is Better)
  AMD AOCC 3.0: 23.29 (SE +/- 0.20, N = 4; -lomp - MIN: 22.43 / MAX: 25.17)
  LLVM Clang 12: 23.54 (SE +/- 0.20, N = 3; -lomp - MIN: 22.92 / MAX: 26.57)
  AMD AOCC 2.3: 23.31 (SE +/- 0.09, N = 3; -lomp - MIN: 22.75 / MAX: 33.51)
  GCC 10.2: 25.67 (SE +/- 0.21, N = 15; -lgomp - MIN: 24.52 / MAX: 35.96)
  1. (CXX) g++ options: -O3 -march=native -rdynamic -lpthread
Zstd Compression This test measures the time needed to compress/decompress a sample file (a FreeBSD disk image - FreeBSD-12.2-RELEASE-amd64-memstick.img) using Zstd compression with options for different compression levels / settings. Learn more via the OpenBenchmarking.org test page.
Zstd Compression 1.4.9, Compression Level: 19, Long Mode - Decompression Speed (MB/s, More Is Better)
  AMD AOCC 3.0: 3978.2 (SE +/- 39.89, N = 3)
  LLVM Clang 12: 3957.9 (SE +/- 25.47, N = 3)
  AMD AOCC 2.3: 4024.7 (SE +/- 16.46, N = 3)
  GCC 10.2: 4350.9 (SE +/- 72.38, N = 3)
  1. (CC) gcc options: -O3 -march=native -pthread -lz -llzma
simdjson This is a benchmark of SIMDJSON, a high performance JSON parser. SIMDJSON aims to be the fastest JSON parser and is used by projects like Microsoft FishStore, Yandex ClickHouse, Shopify, and others. Learn more via the OpenBenchmarking.org test page.
simdjson 0.8.2, Throughput Test: LargeRandom (GB/s, More Is Better)
  AMD AOCC 3.0: 1.12 (SE +/- 0.01, N = 3)
  LLVM Clang 12: 1.14 (SE +/- 0.01, N = 3)
  AMD AOCC 2.3: 1.11 (SE +/- 0.01, N = 3)
  GCC 10.2: 1.22 (SE +/- 0.01, N = 3)
  1. (CXX) g++ options: -O3 -march=native -pthread
Zstd Compression This test measures the time needed to compress/decompress a sample file (a FreeBSD disk image - FreeBSD-12.2-RELEASE-amd64-memstick.img) using Zstd compression with options for different compression levels / settings. Learn more via the OpenBenchmarking.org test page.
Zstd Compression 1.4.9, Compression Level: 8, Long Mode - Compression Speed (MB/s, More Is Better)
  AMD AOCC 3.0: 1024.5 (SE +/- 5.57, N = 3)
  LLVM Clang 12: 1034.5 (SE +/- 2.69, N = 3)
  AMD AOCC 2.3: 1025.4 (SE +/- 6.07, N = 3)
  GCC 10.2: 1122.6 (SE +/- 2.15, N = 3)
  1. (CC) gcc options: -O3 -march=native -pthread -lz -llzma
JPEG XL The JPEG XL Image Coding System is designed to provide next-generation JPEG image capabilities with JPEG XL offering better image quality and compression over legacy JPEG. This test profile is currently focused on the multi-threaded JPEG XL image encode performance. Learn more via the OpenBenchmarking.org test page.
JPEG XL 0.3.3, Input: PNG - Encode Speed: 8 (MP/s, More Is Better)
  AMD AOCC 3.0: 1.04 (SE +/- 0.00, N = 3; -Xclang -mrelax-all)
  LLVM Clang 12: 1.06 (SE +/- 0.01, N = 3; -Xclang -mrelax-all)
  AMD AOCC 2.3: 1.04 (SE +/- 0.01, N = 3; -Xclang -mrelax-all)
  GCC 10.2: 1.14 (SE +/- 0.00, N = 3)
  1. (CXX) g++ options: -O3 -march=native -funwind-tables -O2 -pthread -fPIE -pie -ldl
simdjson This is a benchmark of SIMDJSON, a high performance JSON parser. SIMDJSON aims to be the fastest JSON parser and is used by projects like Microsoft FishStore, Yandex ClickHouse, Shopify, and others. Learn more via the OpenBenchmarking.org test page.
simdjson 0.8.2, Throughput Test: PartialTweets (GB/s, More Is Better)
  AMD AOCC 3.0: 6.04 (SE +/- 0.01, N = 3)
  LLVM Clang 12: 6.18 (SE +/- 0.03, N = 3)
  AMD AOCC 2.3: 5.92 (SE +/- 0.03, N = 3)
  GCC 10.2: 5.64 (SE +/- 0.05, N = 3)
  1. (CXX) g++ options: -O3 -march=native -pthread
Basis Universal Basis Universal is a GPU texture codec. This test times how long it takes to convert sRGB PNGs into Basis Universal assets with various settings. Learn more via the OpenBenchmarking.org test page.
Basis Universal 1.13, Settings: UASTC Level 0 (Seconds, Fewer Is Better)
  AMD AOCC 3.0: 5.639 (SE +/- 0.015, N = 3)
  LLVM Clang 12: 5.522 (SE +/- 0.015, N = 3)
  AMD AOCC 2.3: 5.453 (SE +/- 0.009, N = 3)
  GCC 10.2: 5.157 (SE +/- 0.023, N = 3)
  1. (CXX) g++ options: -std=c++11 -fvisibility=hidden -fPIC -fno-strict-aliasing -O3 -rdynamic -lm -lpthread
simdjson This is a benchmark of SIMDJSON, a high performance JSON parser. SIMDJSON aims to be the fastest JSON parser and is used by projects like Microsoft FishStore, Yandex ClickHouse, Shopify, and others. Learn more via the OpenBenchmarking.org test page.
simdjson 0.8.2, Throughput Test: DistinctUserID (GB/s, More Is Better)
AMD AOCC 3.0: 6.23 (SE +/- 0.02, N = 3)
LLVM Clang 12: 6.26 (SE +/- 0.04, N = 3)
AMD AOCC 2.3: 6.12 (SE +/- 0.06, N = 3)
GCC 10.2: 5.73 (SE +/- 0.02, N = 3)
1. (CXX) g++ options: -O3 -march=native -pthread
ONNX Runtime ONNX Runtime is developed by Microsoft and partners as an open-source, cross-platform, high-performance machine learning inferencing and training accelerator. This test profile runs the ONNX Runtime with various models available from the ONNX Zoo. Learn more via the OpenBenchmarking.org test page.
ONNX Runtime 1.6, Model: yolov4 - Device: OpenMP CPU (Inferences Per Minute, More Is Better)
AMD AOCC 3.0: 465 (SE +/- 1.36, N = 3; -fopenmp=libomp)
LLVM Clang 12: 426 (SE +/- 2.13, N = 3; -fopenmp=libomp)
AMD AOCC 2.3: 456 (SE +/- 2.95, N = 3; -fopenmp=libomp)
GCC 10.2: 433 (SE +/- 1.96, N = 3; -fopenmp)
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -ldl -lrt
Liquid-DSP LiquidSDR's Liquid-DSP is a software-defined radio (SDR) digital signal processing library. This test profile runs a multi-threaded benchmark of this SDR/DSP library focused on embedded platform usage. Learn more via the OpenBenchmarking.org test page.
Liquid-DSP 2021.01.31, Threads: 1 - Buffer Length: 256 - Filter Length: 57 (samples/s, More Is Better)
AMD AOCC 3.0: 78734000 (SE +/- 601612.28, N = 3)
LLVM Clang 12: 77794333 (SE +/- 171803.51, N = 3)
AMD AOCC 2.3: 75031667 (SE +/- 78876.13, N = 3)
GCC 10.2: 81844000 (SE +/- 828458.69, N = 5)
1. (CC) gcc options: -O3 -march=native -pthread -lm -lc -lliquid
POV-Ray This is a test of POV-Ray, the Persistence of Vision Raytracer. POV-Ray is used to create 3D graphics using ray-tracing. Learn more via the OpenBenchmarking.org test page.
POV-Ray 3.7.0.7, Trace Time (Seconds, Fewer Is Better)
AMD AOCC 3.0: 22.54 (SE +/- 0.02, N = 3)
LLVM Clang 12: 22.12 (SE +/- 0.04, N = 3)
AMD AOCC 2.3: 22.54 (SE +/- 0.01, N = 3)
GCC 10.2: 24.09 (SE +/- 0.09, N = 3)
1. (CXX) g++ options: -pipe -O3 -ffast-math -march=native -pthread -lSDL -lXpm -lSM -lICE -lX11 -lIlmImf -lIlmImf-2_5 -lImath-2_5 -lHalf-2_5 -lIex-2_5 -lIexMath-2_5 -lIlmThread-2_5 -lIlmThread -ltiff -ljpeg -lpng -lz -lrt -lm -lboost_thread -lboost_system
WebP2 Image Encode This is a test of Google's libwebp2 library with the WebP2 image encode utility, using a sample 6000x4000 pixel JPEG image as the input, similar to the WebP/libwebp test profile. WebP2 is currently experimental and under heavy development as the eventual successor to WebP. Compared to WebP, WebP2 supports 10-bit HDR, more efficient lossy compression, improved lossless compression, animation support, and full multi-threading support. Learn more via the OpenBenchmarking.org test page.
WebP2 Image Encode 20210126, Encode Settings: Quality 75, Compression Effort 7 (Seconds, Fewer Is Better)
AMD AOCC 3.0: 105.72 (SE +/- 0.88, N = 3)
LLVM Clang 12: 103.01 (SE +/- 0.95, N = 3)
AMD AOCC 2.3: 106.38 (SE +/- 0.42, N = 3)
GCC 10.2: 111.80 (SE +/- 1.06, N = 3)
1. (CXX) g++ options: -O3 -march=native -fno-rtti -rdynamic -ljpeg -lgif -lwebp -lwebpdemux -lpthread
WebP2 Image Encode 20210126, Encode Settings: Quality 95, Compression Effort 7 (Seconds, Fewer Is Better)
AMD AOCC 3.0: 193.43 (SE +/- 0.56, N = 3)
LLVM Clang 12: 188.31 (SE +/- 0.09, N = 3)
AMD AOCC 2.3: 193.60 (SE +/- 0.03, N = 3)
GCC 10.2: 203.81 (SE +/- 0.04, N = 3)
1. (CXX) g++ options: -O3 -march=native -fno-rtti -rdynamic -ljpeg -lgif -lwebp -lwebpdemux -lpthread
WebP2 Image Encode 20210126, Encode Settings: Quality 100, Compression Effort 5 (Seconds, Fewer Is Better)
AMD AOCC 3.0: 6.364 (SE +/- 0.010, N = 3)
LLVM Clang 12: 6.789 (SE +/- 0.010, N = 3)
AMD AOCC 2.3: 6.288 (SE +/- 0.015, N = 3)
GCC 10.2: 6.414 (SE +/- 0.011, N = 3)
1. (CXX) g++ options: -O3 -march=native -fno-rtti -rdynamic -ljpeg -lgif -lwebp -lwebpdemux -lpthread
GraphicsMagick This is a test of GraphicsMagick with its OpenMP implementation that performs various imaging tests on a sample 6000x4000 pixel JPEG image. Learn more via the OpenBenchmarking.org test page.
GraphicsMagick 1.3.33, Operation: Swirl (Iterations Per Minute, More Is Better)
AMD AOCC 3.0: 1083 (SE +/- 3.84, N = 3)
LLVM Clang 12: 1108 (SE +/- 4.48, N = 3)
AMD AOCC 2.3: 1131 (SE +/- 3.71, N = 3)
GCC 10.2: 1166 (SE +/- 3.67, N = 3)
1. (CC) gcc options: -fopenmp -O3 -march=native -pthread -ljbig -lwebp -lwebpmux -ltiff -lfreetype -ljpeg -lXext -lSM -lICE -lX11 -llzma -lbz2 -lxml2 -lz -lm -lpthread
Basis Universal 1.13, Settings: ETC1S (Seconds, Fewer Is Better)
AMD AOCC 3.0: 21.37 (SE +/- 0.21, N = 3)
LLVM Clang 12: 21.38 (SE +/- 0.07, N = 3)
AMD AOCC 2.3: 21.22 (SE +/- 0.18, N = 3)
GCC 10.2: 19.90 (SE +/- 0.04, N = 3)
1. (CXX) g++ options: -std=c++11 -fvisibility=hidden -fPIC -fno-strict-aliasing -O3 -rdynamic -lm -lpthread
JPEG XL 0.3.3, Input: JPEG - Encode Speed: 5 (MP/s, More Is Better)
AMD AOCC 3.0: 83.54 (SE +/- 0.09, N = 3; -Xclang -mrelax-all)
LLVM Clang 12: 85.67 (SE +/- 0.24, N = 3; -Xclang -mrelax-all)
AMD AOCC 2.3: 89.51 (SE +/- 0.25, N = 3; -Xclang -mrelax-all)
GCC 10.2: 87.35 (SE +/- 0.14, N = 3)
1. (CXX) g++ options: -O3 -march=native -funwind-tables -O2 -pthread -fPIE -pie -ldl
Zstd Compression This test measures the time needed to compress/decompress a sample file (a FreeBSD disk image - FreeBSD-12.2-RELEASE-amd64-memstick.img) using Zstd compression with options for different compression levels / settings. Learn more via the OpenBenchmarking.org test page.
Zstd Compression 1.4.9, Compression Level: 8 - Compression Speed (MB/s, More Is Better)
AMD AOCC 3.0: 1096.3 (SE +/- 9.53, N = 15)
LLVM Clang 12: 1078.1 (SE +/- 11.95, N = 4)
AMD AOCC 2.3: 1117.6 (SE +/- 9.29, N = 3)
GCC 10.2: 1057.4 (SE +/- 3.93, N = 3)
1. (CC) gcc options: -O3 -march=native -pthread -lz -llzma
NCNN NCNN is a high performance neural network inference framework optimized for mobile and other platforms developed by Tencent. Learn more via the OpenBenchmarking.org test page.
NCNN 20201218, Target: CPU - Model: googlenet (ms, Fewer Is Better)
AMD AOCC 3.0: 11.96 (SE +/- 0.13, N = 4; MIN: 11.46 / MAX: 13.27; -lomp)
LLVM Clang 12: 12.53 (SE +/- 0.12, N = 3; MIN: 12.12 / MAX: 17.12; -lomp)
AMD AOCC 2.3: 11.94 (SE +/- 0.08, N = 3; MIN: 11.62 / MAX: 12.42; -lomp)
GCC 10.2: 12.76 (SE +/- 0.06, N = 15; MIN: 12.19 / MAX: 19.36; -lgomp)
1. (CXX) g++ options: -O3 -march=native -rdynamic -lpthread
JPEG XL 0.3.3, Input: JPEG - Encode Speed: 7 (MP/s, More Is Better)
AMD AOCC 3.0: 83.63 (SE +/- 0.01, N = 3; -Xclang -mrelax-all)
LLVM Clang 12: 85.81 (SE +/- 0.20, N = 3; -Xclang -mrelax-all)
AMD AOCC 2.3: 89.32 (SE +/- 0.25, N = 3; -Xclang -mrelax-all)
GCC 10.2: 87.07 (SE +/- 0.19, N = 3)
1. (CXX) g++ options: -O3 -march=native -funwind-tables -O2 -pthread -fPIE -pie -ldl
dav1d Dav1d is an open-source, speedy AV1 video decoder. This test profile times how long it takes to decode sample AV1 video content. Learn more via the OpenBenchmarking.org test page.
dav1d 0.8.2, Video Input: Summer Nature 4K (FPS, More Is Better)
AMD AOCC 3.0: 229.03 (SE +/- 0.43, N = 3; MIN: 171.52 / MAX: 237.17)
LLVM Clang 12: 244.37 (SE +/- 0.04, N = 3; MIN: 182.08 / MAX: 252.22)
AMD AOCC 2.3: 244.15 (SE +/- 0.32, N = 3; MIN: 180.82 / MAX: 252.96)
GCC 10.2: 243.69 (SE +/- 0.47, N = 3; MIN: 181.29 / MAX: 252.3; -lm)
1. (CC) gcc options: -O3 -march=native -pthread
WebP Image Encode This is a test of Google's libwebp with the cwebp image encode utility and using a sample 6000x4000 pixel JPEG image as the input. Learn more via the OpenBenchmarking.org test page.
WebP Image Encode 1.1, Encode Settings: Default (Encode Time - Seconds, Fewer Is Better)
AMD AOCC 3.0: 1.007 (SE +/- 0.005, N = 3)
LLVM Clang 12: 0.977 (SE +/- 0.014, N = 3)
AMD AOCC 2.3: 0.979 (SE +/- 0.006, N = 3)
GCC 10.2: 1.042 (SE +/- 0.008, N = 3)
1. (CC) gcc options: -fvisibility=hidden -O3 -march=native -pthread -lm -ljpeg -lpng16 -ltiff
WebP2 Image Encode 20210126, Encode Settings: Default (Seconds, Fewer Is Better)
AMD AOCC 3.0: 2.165 (SE +/- 0.025, N = 3)
LLVM Clang 12: 2.134 (SE +/- 0.011, N = 3)
AMD AOCC 2.3: 2.144 (SE +/- 0.024, N = 3)
GCC 10.2: 2.274 (SE +/- 0.005, N = 3)
1. (CXX) g++ options: -O3 -march=native -fno-rtti -rdynamic -ljpeg -lgif -lwebp -lwebpdemux -lpthread
SVT-VP9 This is a test of the Intel Open Visual Cloud Scalable Video Technology SVT-VP9 CPU-based multi-threaded video encoder for the VP9 video format with a sample 1080p YUV video file. Learn more via the OpenBenchmarking.org test page.
SVT-VP9 0.1, Tuning: PSNR/SSIM Optimized - Input: Bosphorus 1080p (Frames Per Second, More Is Better)
AMD AOCC 3.0: 225.17 (SE +/- 2.47, N = 12)
LLVM Clang 12: 223.50 (SE +/- 2.40, N = 12)
AMD AOCC 2.3: 238.11 (SE +/- 2.24, N = 13)
GCC 10.2: 235.04 (SE +/- 2.40, N = 12)
1. (CC) gcc options: -O3 -fcommon -fPIE -fPIC -fvisibility=hidden -pie -rdynamic -lpthread -lrt -lm
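The SVT-VP9 result above has comparatively large standard errors, so some of the gaps between compilers may be within run-to-run noise. A rough way to check is to divide the difference between two means by their combined standard error. The sketch below assumes the runs are independent and that the reported "SE" is the standard error of the mean; the helper name `gap_in_standard_errors` is hypothetical.

```python
import math


def gap_in_standard_errors(mean_a, se_a, mean_b, se_b):
    """Return |mean_a - mean_b| in units of the combined standard error."""
    # Assumes independent runs, so the errors add in quadrature.
    combined = math.sqrt(se_a ** 2 + se_b ** 2)
    return abs(mean_a - mean_b) / combined


# SVT-VP9 0.1, PSNR/SSIM Optimized, Bosphorus 1080p: (mean FPS, SE).
aocc_30 = (225.17, 2.47)
aocc_23 = (238.11, 2.24)
gcc_102 = (235.04, 2.40)

close_pair = gap_in_standard_errors(*aocc_23, *gcc_102)
wide_pair = gap_in_standard_errors(*aocc_23, *aocc_30)

print(f"AOCC 2.3 vs GCC 10.2: {close_pair:.2f} combined standard errors apart")
print(f"AOCC 2.3 vs AOCC 3.0: {wide_pair:.2f} combined standard errors apart")
```

A gap of less than about one combined standard error is hard to distinguish from noise, while a gap of several is more likely to reflect a real difference. This is only a rule of thumb, not a formal significance test.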
Zstd Compression 1.4.9, Compression Level: 3, Long Mode - Decompression Speed (MB/s, More Is Better)
AMD AOCC 3.0: 4543.8 (SE +/- 33.94, N = 3)
LLVM Clang 12: 4456.6 (SE +/- 2.17, N = 3)
AMD AOCC 2.3: 4586.2 (SE +/- 31.63, N = 3)
GCC 10.2: 4737.1 (SE +/- 46.74, N = 3)
1. (CC) gcc options: -O3 -march=native -pthread -lz -llzma
Redis Redis is an open-source in-memory data structure store, used as a database, cache, and message broker. Learn more via the OpenBenchmarking.org test page.
Redis 6.0.9, Test: LPUSH (Requests Per Second, More Is Better)
AMD AOCC 3.0: 2345671.03 (SE +/- 35143.82, N = 15)
LLVM Clang 12: 2212779.00 (SE +/- 30760.12, N = 3)
AMD AOCC 2.3: 2351340.56 (SE +/- 27675.97, N = 4)
GCC 10.2: 2222217.52 (SE +/- 23396.73, N = 15)
1. (CXX) g++ options: -MM -MT -g3 -fvisibility=hidden -O3 -march=native
Redis 6.0.9, Test: LPOP (Requests Per Second, More Is Better)
AMD AOCC 3.0: 3766645.92 (SE +/- 45854.80, N = 15)
LLVM Clang 12: 3649832.58 (SE +/- 31635.54, N = 3)
AMD AOCC 2.3: 3589202.59 (SE +/- 30792.29, N = 8)
GCC 10.2: 3549910.50 (SE +/- 26197.04, N = 3)
1. (CXX) g++ options: -MM -MT -g3 -fvisibility=hidden -O3 -march=native
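The Redis runs above needed up to 15 samples, which suggests noticeable run-to-run variance. Assuming the reported "SE" is the standard error of the mean (standard deviation divided by the square root of N), the spread of the individual runs can be estimated from the figures given. The sketch below does this for the AOCC 3.0 LPUSH result; the helper names are hypothetical.

```python
import math


def sd_from_se(se, n):
    """Recover the sample standard deviation from a standard error of the mean."""
    return se * math.sqrt(n)


def se_percent(mean, se):
    """Express the standard error as a percentage of the mean."""
    return 100.0 * se / mean


# Redis 6.0.9 LPUSH, AMD AOCC 3.0 build: mean requests/s, SE, and run count N.
mean, se, n = 2345671.03, 35143.82, 15

print(f"SE is {se_percent(mean, se):.2f}% of the mean")
print(f"implied run-to-run standard deviation: {sd_from_se(se, n):.0f} requests/s")
```

Comparing the standard error as a percentage of the mean across tests gives a quick sense of which results are stable enough to support small differences between compilers.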
Zstd Compression 1.4.9, Compression Level: 8 - Decompression Speed (MB/s, More Is Better)
AMD AOCC 3.0: 4463.1
LLVM Clang 12: 4352.4
AMD AOCC 2.3: 4468.2
GCC 10.2: 4617.1
Standard errors as listed (three entries for four results): SE +/- 8.16, N = 11; SE +/- 37.90, N = 2; SE +/- 26.73, N = 3
1. (CC) gcc options: -O3 -march=native -pthread -lz -llzma
ONNX Runtime 1.6, Model: bertsquad-10 - Device: OpenMP CPU (Inferences Per Minute, More Is Better)
AMD AOCC 3.0: 649 (SE +/- 5.59, N = 12; -fopenmp=libomp)
LLVM Clang 12: 634 (SE +/- 5.80, N = 3; -fopenmp=libomp)
AMD AOCC 2.3: 646 (SE +/- 6.07, N = 12; -fopenmp=libomp)
GCC 10.2: 614 (SE +/- 6.71, N = 3; -fopenmp)
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -ldl -lrt
NCNN 20201218, Target: CPU - Model: yolov4-tiny (ms, Fewer Is Better)
AMD AOCC 3.0: 21.93 (SE +/- 0.11, N = 4; MIN: 21.28 / MAX: 24.81; -lomp)
LLVM Clang 12: 21.70 (SE +/- 0.13, N = 3; MIN: 21.21 / MAX: 30.18; -lomp)
AMD AOCC 2.3: 21.66 (SE +/- 0.14, N = 3; MIN: 21.17 / MAX: 27.18; -lomp)
GCC 10.2: 20.77 (SE +/- 0.17, N = 15; MIN: 19.69 / MAX: 43.19; -lgomp)
1. (CXX) g++ options: -O3 -march=native -rdynamic -lpthread
simdjson 0.8.2, Throughput Test: Kostya (GB/s, More Is Better)
AMD AOCC 3.0: 3.61 (SE +/- 0.01, N = 3)
LLVM Clang 12: 3.71 (SE +/- 0.03, N = 3)
AMD AOCC 2.3: 3.53 (SE +/- 0.03, N = 3)
GCC 10.2: 3.72 (SE +/- 0.03, N = 3)
1. (CXX) g++ options: -O3 -march=native -pthread
WebP2 Image Encode 20210126, Encode Settings: Quality 100, Lossless Compression (Seconds, Fewer Is Better)
AMD AOCC 3.0: 356.82 (SE +/- 1.38, N = 3)
LLVM Clang 12: 349.02 (SE +/- 0.58, N = 3)
AMD AOCC 2.3: 357.90 (SE +/- 1.28, N = 3)
GCC 10.2: 367.37 (SE +/- 0.42, N = 3)
1. (CXX) g++ options: -O3 -march=native -fno-rtti -rdynamic -ljpeg -lgif -lwebp -lwebpdemux -lpthread
SVT-VP9 0.1, Tuning: Visual Quality Optimized - Input: Bosphorus 1080p (Frames Per Second, More Is Better)
AMD AOCC 3.0: 221.59 (SE +/- 0.39, N = 3)
LLVM Clang 12: 219.12 (SE +/- 0.82, N = 3)
AMD AOCC 2.3: 230.19 (SE +/- 0.90, N = 3)
GCC 10.2: 228.96 (SE +/- 0.68, N = 3)
1. (CC) gcc options: -O3 -fcommon -fPIE -fPIC -fvisibility=hidden -pie -rdynamic -lpthread -lrt -lm
ONNX Runtime 1.6, Model: fcn-resnet101-11 - Device: OpenMP CPU (Inferences Per Minute, More Is Better)
AMD AOCC 3.0: 102 (SE +/- 0.44, N = 3; -fopenmp=libomp)
LLVM Clang 12: 104 (SE +/- 0.29, N = 3; -fopenmp=libomp)
AMD AOCC 2.3: 103 (SE +/- 0.44, N = 3; -fopenmp=libomp)
GCC 10.2: 99 (SE +/- 0.17, N = 3; -fopenmp)
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -ldl -lrt
GraphicsMagick 1.3.33, Operation: Enhanced (Iterations Per Minute, More Is Better)
AMD AOCC 3.0: 452
LLVM Clang 12: 457
AMD AOCC 2.3: 461
GCC 10.2: 439
Standard errors as listed (three entries for four results): SE +/- 0.33, N = 3; SE +/- 0.33, N = 3; SE +/- 0.33, N = 3
1. (CC) gcc options: -fopenmp -O3 -march=native -pthread -ljbig -lwebp -lwebpmux -ltiff -lfreetype -ljpeg -lXext -lSM -lICE -lX11 -llzma -lbz2 -lxml2 -lz -lm -lpthread
LZ4 Compression This test measures the time needed to compress/decompress a sample file (an Ubuntu ISO) using LZ4 compression. Learn more via the OpenBenchmarking.org test page.
LZ4 Compression 1.9.3, Compression Level: 1 - Decompression Speed (MB/s, More Is Better)
AMD AOCC 3.0: 13144.8 (SE +/- 106.53, N = 3)
LLVM Clang 12: 13305.3 (SE +/- 93.71, N = 3)
AMD AOCC 2.3: 13595.3 (SE +/- 59.50, N = 3)
GCC 10.2: 13771.1 (SE +/- 38.97, N = 3)
1. (CC) gcc options: -O3
Redis 6.0.9, Test: SET (Requests Per Second, More Is Better)
AMD AOCC 3.0: 2719036.20 (SE +/- 14014.88, N = 3)
LLVM Clang 12: 2762047.50 (SE +/- 28596.87, N = 3)
AMD AOCC 2.3: 2719539.83 (SE +/- 23132.25, N = 3)
GCC 10.2: 2640316.17 (SE +/- 26145.63, N = 15)
1. (CXX) g++ options: -MM -MT -g3 -fvisibility=hidden -O3 -march=native
ASTC Encoder ASTC Encoder (astcenc) is for the Adaptive Scalable Texture Compression (ASTC) format commonly used with OpenGL, OpenGL ES, and Vulkan graphics APIs. This test profile does a coding test of both compression/decompression. Learn more via the OpenBenchmarking.org test page.
ASTC Encoder 2.4, Preset: Exhaustive (Seconds, Fewer Is Better)
AMD AOCC 3.0: 51.45 (SE +/- 0.09, N = 3)
LLVM Clang 12: 51.66 (SE +/- 0.07, N = 3)
AMD AOCC 2.3: 50.81 (SE +/- 0.07, N = 3)
GCC 10.2: 52.93 (SE +/- 0.09, N = 3)
1. (CXX) g++ options: -O3 -march=native -flto -pthread
Liquid-DSP 2021.01.31, Threads: 16 - Buffer Length: 256 - Filter Length: 57 (samples/s, More Is Better)
AMD AOCC 3.0: 1086033333 (SE +/- 3939684.14, N = 3)
LLVM Clang 12: 1067666667 (SE +/- 3699249.17, N = 3)
AMD AOCC 2.3: 1067266667 (SE +/- 3628743.28, N = 3)
GCC 10.2: 1111200000 (SE +/- 5768882.04, N = 3)
1. (CC) gcc options: -O3 -march=native -pthread -lm -lc -lliquid
oneDNN This is a test of the Intel oneDNN as an Intel-optimized library for Deep Neural Networks and making use of its built-in benchdnn functionality. The result is the total perf time reported. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and MKL-DNN before being rebranded as part of the Intel oneAPI initiative. Learn more via the OpenBenchmarking.org test page.
oneDNN 2.1.2, Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
AMD AOCC 3.0: 4.09663 (SE +/- 0.01273, N = 3; MIN: 3.9; -fopenmp=libomp)
LLVM Clang 12: 4.11930 (SE +/- 0.01294, N = 3; MIN: 3.88; -fopenmp=libomp)
GCC 10.2: 3.95979 (SE +/- 0.00506, N = 3; MIN: 3.76; -fopenmp)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -msse4.1 -fPIC -pie -lpthread -ldl
x265 This is a simple test of the x265 encoder run on the CPU with 1080p and 4K options for H.265 video encode performance with x265. Learn more via the OpenBenchmarking.org test page.
x265 3.4, Video Input: Bosphorus 4K (Frames Per Second, More Is Better)
AMD AOCC 3.0: 26.96 (SE +/- 0.12, N = 3)
LLVM Clang 12: 28.02 (SE +/- 0.09, N = 3)
AMD AOCC 2.3: 27.49 (SE +/- 0.08, N = 3)
GCC 10.2: 27.83 (SE +/- 0.16, N = 3)
1. (CXX) g++ options: -O3 -march=native -rdynamic -lpthread -lrt -ldl -lnuma
Tachyon This is a test of the threaded Tachyon, a parallel ray-tracing system, measuring the time to ray-trace a sample scene. Learn more via the OpenBenchmarking.org test page.
Tachyon 0.99b6, Total Time (Seconds, Fewer Is Better)
AMD AOCC 3.0: 45.00 (SE +/- 0.14, N = 3)
LLVM Clang 12: 45.07 (SE +/- 0.09, N = 3)
AMD AOCC 2.3: 46.13 (SE +/- 0.20, N = 3)
GCC 10.2: 44.39 (SE +/- 0.13, N = 3)
1. (CC) gcc options: -m64 -O3 -fomit-frame-pointer -ffast-math -ltachyon -lm -lpthread
x265 3.4, Video Input: Bosphorus 1080p (Frames Per Second, More Is Better)
AMD AOCC 3.0: 88.70 (SE +/- 0.35, N = 3)
LLVM Clang 12: 92.16 (SE +/- 0.13, N = 3)
AMD AOCC 2.3: 89.74 (SE +/- 0.28, N = 3)
GCC 10.2: 89.80 (SE +/- 0.19, N = 3)
1. (CXX) g++ options: -O3 -march=native -rdynamic -lpthread -lrt -ldl -lnuma
TNN TNN is an open-source deep learning reasoning framework developed by Tencent. Learn more via the OpenBenchmarking.org test page.
TNN 0.2.3, Target: CPU - Model: SqueezeNet v1.1 (ms, Fewer Is Better)
AMD AOCC 3.0: 204.60 (SE +/- 0.48, N = 3; MIN: 203.72 / MAX: 206.33; -fopenmp=libomp)
LLVM Clang 12: 206.36 (SE +/- 1.02, N = 3; MIN: 204.24 / MAX: 209.24; -fopenmp=libomp)
AMD AOCC 2.3: 203.64 (SE +/- 0.85, N = 3; MIN: 201.91 / MAX: 206.13; -fopenmp=libomp)
GCC 10.2: 211.57 (SE +/- 0.57, N = 3; MIN: 206.88 / MAX: 212.83; -fopenmp)
1. (CXX) g++ options: -O3 -march=native -pthread -fvisibility=hidden -rdynamic -ldl
GNU Radio GNU Radio is a free software development toolkit providing signal processing blocks to implement software-defined radios (SDR) and signal processing systems. Learn more via the OpenBenchmarking.org test page.
GNU Radio, Test: Hilbert Transform (MiB/s, More Is Better)
AMD AOCC 3.0: 523.6 (SE +/- 1.23, N = 8)
LLVM Clang 12: 522.8 (SE +/- 0.58, N = 9)
AMD AOCC 2.3: 534.8 (SE +/- 1.95, N = 9)
GCC 10.2: 515.8 (SE +/- 0.63, N = 9)
1. 3.8.1.0
oneDNN 2.1.2, Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
AMD AOCC 3.0: 9.57194 (SE +/- 0.01936, N = 3; MIN: 9.46; -fopenmp=libomp)
LLVM Clang 12: 9.59442 (SE +/- 0.01452, N = 3; MIN: 9.47; -fopenmp=libomp)
GCC 10.2: 9.25967 (SE +/- 0.01340, N = 3; MIN: 9.1; -fopenmp)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -msse4.1 -fPIC -pie -lpthread -ldl
Ngspice Ngspice is an open-source SPICE circuit simulator. Ngspice was originally based on the Berkeley SPICE electronic circuit simulator. Ngspice supports basic threading using OpenMP. This test profile is making use of the ISCAS 85 benchmark circuits. Learn more via the OpenBenchmarking.org test page.
Ngspice 34, Circuit: C7552 (Seconds, Fewer Is Better)
AMD AOCC 3.0: 64.91 (SE +/- 0.08, N = 3)
LLVM Clang 12: 64.52 (SE +/- 0.20, N = 3)
AMD AOCC 2.3: 64.89 (SE +/- 0.54, N = 3)
GCC 10.2: 62.82 (SE +/- 0.15, N = 3)
1. (CC) gcc options: -O3 -march=native -fopenmp -lm -lstdc++ -lfftw3 -lXaw -lXmu -lXt -lXext -lX11 -lSM -lICE
Timed MrBayes Analysis This test performs a Bayesian analysis of a set of primate genome sequences in order to estimate their phylogeny. Learn more via the OpenBenchmarking.org test page.
Timed MrBayes Analysis 3.2.7, Primate Phylogeny Analysis (Seconds, Fewer Is Better)
AMD AOCC 3.0: 57.99 (SE +/- 0.10, N = 3)
LLVM Clang 12: 59.23 (SE +/- 0.81, N = 3)
AMD AOCC 2.3: 59.07 (SE +/- 0.10, N = 3)
GCC 10.2: 59.87 (SE +/- 0.15, N = 3)
Extra flag listed for one build: -mabm
1. (CC) gcc options: -mmmx -msse -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msse4a -msha -maes -mavx -mfma -mavx2 -mrdrnd -mbmi -mbmi2 -madx -O3 -std=c99 -pedantic -march=native -lm -lreadline
LZ4 Compression 1.9.3, Compression Level: 9 - Decompression Speed (MB/s, More Is Better)
AMD AOCC 3.0: 12981.8 (SE +/- 24.92, N = 6)
LLVM Clang 12: 13129.4 (SE +/- 21.95, N = 3)
AMD AOCC 2.3: 13188.4 (SE +/- 53.22, N = 3)
GCC 10.2: 13397.7 (SE +/- 35.65, N = 6)
1. (CC) gcc options: -O3
Redis 6.0.9, Test: SADD (Requests Per Second, More Is Better)
AMD AOCC 3.0: 2948093.68 (SE +/- 27502.49, N = 15)
LLVM Clang 12: 2954866.80 (SE +/- 29853.73, N = 3)
AMD AOCC 2.3: 2961165.67 (SE +/- 40118.44, N = 3)
GCC 10.2: 3041527.37 (SE +/- 39730.96, N = 15)
1. (CXX) g++ options: -MM -MT -g3 -fvisibility=hidden -O3 -march=native
RNNoise RNNoise is a recurrent neural network for audio noise reduction developed by Mozilla and Xiph.Org. This test profile is a single-threaded test measuring the time to denoise a sample 26 minute long 16-bit RAW audio file using this recurrent neural network noise suppression library. Learn more via the OpenBenchmarking.org test page.
RNNoise 2020-06-28 (Seconds, Fewer Is Better)
AMD AOCC 3.0: 14.47 (SE +/- 0.16, N = 3)
LLVM Clang 12: 14.33 (SE +/- 0.17, N = 3)
AMD AOCC 2.3: 14.04 (SE +/- 0.17, N = 3)
GCC 10.2: 14.20 (SE +/- 0.04, N = 3)
1. (CC) gcc options: -O3 -march=native -pedantic -fvisibility=hidden
LZ4 Compression 1.9.3, Compression Level: 3 - Decompression Speed (MB/s, More Is Better)
AMD AOCC 3.0: 13010.7 (SE +/- 30.75, N = 3)
LLVM Clang 12: 13082.4 (SE +/- 46.92, N = 3)
AMD AOCC 2.3: 13212.6 (SE +/- 15.17, N = 5)
GCC 10.2: 13400.1 (SE +/- 48.22, N = 3)
1. (CC) gcc options: -O3
NCNN 20201218, Target: CPU - Model: alexnet (ms, Fewer Is Better)
AMD AOCC 3.0: 11.11 (SE +/- 0.04, N = 4; MIN: 10.92 / MAX: 15.76; -lomp)
LLVM Clang 12: 11.14 (SE +/- 0.04, N = 3; MIN: 10.96 / MAX: 13.28; -lomp)
AMD AOCC 2.3: 11.01 (SE +/- 0.00, N = 2; MIN: 10.84 / MAX: 12.26; -lomp)
GCC 10.2: 10.82 (SE +/- 0.09, N = 15; MIN: 10.41 / MAX: 17.59; -lgomp)
1. (CXX) g++ options: -O3 -march=native -rdynamic -lpthread
LZ4 Compression 1.9.3, Compression Level: 1 - Compression Speed (MB/s, More Is Better)
AMD AOCC 3.0: 12124.81 (SE +/- 49.14, N = 3)
LLVM Clang 12: 12227.47 (SE +/- 66.76, N = 3)
AMD AOCC 2.3: 12456.17 (SE +/- 57.72, N = 3)
GCC 10.2: 12330.56 (SE +/- 76.55, N = 3)
1. (CC) gcc options: -O3
Gcrypt Library Libgcrypt is a general purpose cryptographic library developed as part of the GnuPG project. This is a benchmark of libgcrypt's integrated benchmark, measuring the time to run the benchmark command with a cipher/mac/hash repetition count of 50 as a simple, high-level look at the overall crypto performance of the system under test. Learn more via the OpenBenchmarking.org test page.
Gcrypt Library 1.9 (Seconds, Fewer Is Better)
AMD AOCC 3.0: 175.69 (SE +/- 0.17, N = 3)
LLVM Clang 12: 172.90 (SE +/- 1.10, N = 3)
AMD AOCC 2.3: 173.31 (SE +/- 1.67, N = 3)
GCC 10.2: 171.19 (SE +/- 0.29, N = 3)
1. (CC) gcc options: -O3 -march=native -fvisibility=hidden -lgpg-error
WebP Image Encode 1.1, Encode Settings: Quality 100, Lossless (Encode Time - Seconds, Fewer Is Better)
AMD AOCC 3.0: 13.84 (SE +/- 0.05, N = 3)
LLVM Clang 12: 13.91 (SE +/- 0.12, N = 3)
AMD AOCC 2.3: 13.64 (SE +/- 0.11, N = 3)
GCC 10.2: 13.99 (SE +/- 0.11, N = 3)
1. (CC) gcc options: -fvisibility=hidden -O3 -march=native -pthread -lm -ljpeg -lpng16 -ltiff
oneDNN 2.1.2, Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
AMD AOCC 3.0: 3.58364 (SE +/- 0.00604, N = 3; MIN: 3.44; -fopenmp=libomp)
LLVM Clang 12: 3.64485 (SE +/- 0.01444, N = 3; MIN: 3.5; -fopenmp=libomp)
GCC 10.2: 3.55467 (SE +/- 0.00753, N = 3; MIN: 3.46; -fopenmp)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -msse4.1 -fPIC -pie -lpthread -ldl
NCNN 20201218, Target: CPU - Model: vgg16 (ms, Fewer Is Better)
AMD AOCC 3.0: 58.96 (SE +/- 0.19, N = 4; MIN: 57.64 / MAX: 66.68; -lomp)
LLVM Clang 12: 57.51 (SE +/- 0.17, N = 3; MIN: 56.17 / MAX: 62.53; -lomp)
AMD AOCC 2.3: 58.11 (SE +/- 0.01, N = 3; MIN: 56.81 / MAX: 67.53; -lomp)
GCC 10.2: 57.89 (SE +/- 0.12, N = 15; MIN: 55.89 / MAX: 80.86; -lgomp)
1. (CXX) g++ options: -O3 -march=native -rdynamic -lpthread
Crypto++ Crypto++ is a C++ class library of cryptographic algorithms. Learn more via the OpenBenchmarking.org test page.
Crypto++ 8.2, Test: Unkeyed Algorithms (MiB/second, More Is Better)
AMD AOCC 3.0: 538.88 (SE +/- 2.13, N = 3)
LLVM Clang 12: 552.38 (SE +/- 1.73, N = 3)
AMD AOCC 2.3: 550.46 (SE +/- 1.69, N = 3)
GCC 10.2: 545.91 (SE +/- 3.29, N = 15)
1. (CXX) g++ options: -O3 -march=native -fPIC -pthread -pipe
OpenFOAM OpenFOAM is the leading free, open source software for computational fluid dynamics (CFD). Learn more via the OpenBenchmarking.org test page.
OpenFOAM 8, Input: Motorbike 30M (Seconds, Fewer Is Better)
LLVM Clang 12: 100.16 (SE +/- 0.06, N = 3)
GCC 10.2: 97.75 (SE +/- 0.08, N = 3)
1. (CXX) g++ options: -std=c++11 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -lfoamToVTK -ldynamicMesh -llagrangian -lgenericPatchFields -lfileFormats -lOpenFOAM -ldl -lm
Ngspice Ngspice is an open-source SPICE circuit simulator. Ngspice was originally based on the Berkeley SPICE electronic circuit simulator. Ngspice supports basic threading using OpenMP. This test profile is making use of the ISCAS 85 benchmark circuits. Learn more via the OpenBenchmarking.org test page.
Ngspice 34, Circuit: C2670 (Seconds, Fewer Is Better)
AMD AOCC 3.0: 73.28 (SE +/- 0.13, N = 3)
LLVM Clang 12: 72.46 (SE +/- 0.12, N = 3)
AMD AOCC 2.3: 72.78 (SE +/- 0.06, N = 3)
GCC 10.2: 71.60 (SE +/- 0.21, N = 3)
1. (CC) gcc options: -O3 -march=native -fopenmp -lm -lstdc++ -lfftw3 -lXaw -lXmu -lXt -lXext -lX11 -lSM -lICE
GNU Radio GNU Radio is a free software development toolkit providing signal processing blocks to implement software-defined radios (SDR) and signal processing systems. Learn more via the OpenBenchmarking.org test page.
GNU Radio, Test: Signal Source (Cosine) (MiB/s, More Is Better)
AMD AOCC 3.0: 4704.7 (SE +/- 60.32, N = 8)
LLVM Clang 12: 4769.8 (SE +/- 10.26, N = 9)
AMD AOCC 2.3: 4661.6 (SE +/- 22.36, N = 9)
GCC 10.2: 4715.4 (SE +/- 16.39, N = 9)
1. 3.8.1.0
x264 This is a simple test of the x264 encoder run on the CPU (OpenCL support disabled) with a sample video file. Learn more via the OpenBenchmarking.org test page.
x264 2019-12-17, H.264 Video Encoding (Frames Per Second, More Is Better)
AMD AOCC 3.0: 210.35 (SE +/- 1.75, N = 9)
LLVM Clang 12: 213.57 (SE +/- 1.62, N = 12)
AMD AOCC 2.3: 210.72 (SE +/- 1.86, N = 8)
GCC 10.2: 208.93 (SE +/- 1.66, N = 9)
Extra flag on three of the four results: -mstack-alignment=64
1. (CC) gcc options: -ldl -lavformat -lavcodec -lavutil -lswscale -m64 -lm -lpthread -O3 -ffast-math -march=native -std=gnu99 -fPIC -fomit-frame-pointer -fno-tree-vectorize
WebP Image Encode This is a test of Google's libwebp with the cwebp image encode utility and using a sample 6000x4000 pixel JPEG image as the input. Learn more via the OpenBenchmarking.org test page.
WebP Image Encode 1.1, Encode Settings: Quality 100, Lossless, Highest Compression (Encode Time - Seconds, Fewer Is Better)
AMD AOCC 3.0: 28.19 (SE +/- 0.04, N = 3)
LLVM Clang 12: 28.67 (SE +/- 0.03, N = 3)
AMD AOCC 2.3: 28.43 (SE +/- 0.06, N = 3)
GCC 10.2: 28.81 (SE +/- 0.08, N = 3)
1. (CC) gcc options: -fvisibility=hidden -O3 -march=native -pthread -lm -ljpeg -lpng16 -ltiff
Zstd Compression This test measures the time needed to compress/decompress a sample file (a FreeBSD disk image - FreeBSD-12.2-RELEASE-amd64-memstick.img) using Zstd compression with options for different compression levels / settings. Learn more via the OpenBenchmarking.org test page.
Zstd Compression 1.4.9, Compression Level: 19 - Compression Speed (MB/s, More Is Better)
AMD AOCC 3.0: 50.5 (SE +/- 0.06, N = 3)
LLVM Clang 12: 51.3 (SE +/- 0.03, N = 3)
AMD AOCC 2.3: 50.7 (SE +/- 0.07, N = 3)
GCC 10.2: 51.6 (SE +/- 0.20, N = 3)
1. (CC) gcc options: -O3 -march=native -pthread -lz -llzma
JPEG XL The JPEG XL Image Coding System is designed to provide next-generation JPEG image capabilities with JPEG XL offering better image quality and compression over legacy JPEG. This test profile is currently focused on the multi-threaded JPEG XL image encode performance. Learn more via the OpenBenchmarking.org test page.
JPEG XL 0.3.3, Input: PNG - Encode Speed: 7 (MP/s, More Is Better)
AMD AOCC 3.0: 11.17 (SE +/- 0.06, N = 3)
LLVM Clang 12: 11.41 (SE +/- 0.06, N = 3)
AMD AOCC 2.3: 11.31 (SE +/- 0.03, N = 3)
GCC 10.2: 11.20 (SE +/- 0.03, N = 3)
Extra flags on three of the four results: -Xclang -mrelax-all
1. (CXX) g++ options: -O3 -march=native -funwind-tables -O2 -pthread -fPIE -pie -ldl
GNU Radio
GNU Radio, Test: IIR Filter (MiB/s, More Is Better)
AMD AOCC 3.0: 838.3 (SE +/- 2.72, N = 8)
LLVM Clang 12: 835.4 (SE +/- 1.22, N = 9)
AMD AOCC 2.3: 853.3 (SE +/- 2.72, N = 9)
GCC 10.2: 843.1 (SE +/- 1.32, N = 9)
1. 3.8.1.0
WebP Image Encode
WebP Image Encode 1.1, Encode Settings: Quality 100 (Encode Time - Seconds, Fewer Is Better)
AMD AOCC 3.0: 1.669 (SE +/- 0.011, N = 3)
LLVM Clang 12: 1.638 (SE +/- 0.003, N = 3)
AMD AOCC 2.3: 1.673 (SE +/- 0.006, N = 3)
GCC 10.2: 1.652 (SE +/- 0.018, N = 4)
1. (CC) gcc options: -fvisibility=hidden -O3 -march=native -pthread -lm -ljpeg -lpng16 -ltiff
dav1d Dav1d is an open-source, speedy AV1 video decoder. This test profile times how long it takes to decode sample AV1 video content. Learn more via the OpenBenchmarking.org test page.
dav1d 0.8.2, Video Input: Summer Nature 1080p (FPS, More Is Better)
AMD AOCC 3.0: 959.29 (SE +/- 1.29, N = 3; MIN: 714.89 / MAX: 1039.77)
LLVM Clang 12: 979.34 (SE +/- 2.97, N = 3; MIN: 717.55 / MAX: 1062.34)
AMD AOCC 2.3: 976.93 (SE +/- 9.00, N = 3; MIN: 633.01 / MAX: 1069.88)
GCC 10.2: 971.79 (SE +/- 1.38, N = 3; MIN: 732.02 / MAX: 1055.82; -lm)
1. (CC) gcc options: -O3 -march=native -pthread
Opus Codec Encoding Opus is an open audio codec. Opus is a lossy audio compression format designed primarily for interactive real-time applications over the Internet. This test uses Opus-Tools and measures the time required to encode a WAV file to Opus. Learn more via the OpenBenchmarking.org test page.
Opus Codec Encoding 1.3.1, WAV To Opus Encode (Seconds, Fewer Is Better)
LLVM Clang 12: 5.589 (SE +/- 0.037, N = 5)
GCC 10.2: 5.484 (SE +/- 0.031, N = 5)
Extra flag on one of the two results: -fvisibility=hidden
1. (CXX) g++ options: -O3 -march=native -logg -lm
GNU Radio
GNU Radio, Test: FIR Filter (MiB/s, More Is Better)
AMD AOCC 3.0: 1065.4 (SE +/- 3.65, N = 8)
LLVM Clang 12: 1060.1 (SE +/- 3.22, N = 9)
AMD AOCC 2.3: 1080.3 (SE +/- 4.68, N = 9)
GCC 10.2: 1063.5 (SE +/- 2.09, N = 9)
1. 3.8.1.0
NCNN
NCNN 20201218, Target: CPU - Model: resnet18 (ms, Fewer Is Better)
AMD AOCC 3.0: 13.86 (SE +/- 0.11, N = 4; MIN: 13.44 / MAX: 16.35; -lomp)
LLVM Clang 12: 13.91 (SE +/- 0.08, N = 3; MIN: 13.66 / MAX: 14.54; -lomp)
AMD AOCC 2.3: 14.08 (SE +/- 0.21, N = 3; MIN: 13.56 / MAX: 21.13; -lomp)
GCC 10.2: 14.11 (SE +/- 0.05, N = 15; MIN: 13.84 / MAX: 23.15; -lgomp)
1. (CXX) g++ options: -O3 -march=native -rdynamic -lpthread
oneDNN
oneDNN 2.1.2, Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
AMD AOCC 3.0: 1760.57 (SE +/- 3.74, N = 3; MIN: 1745.87; -fopenmp=libomp)
LLVM Clang 12: 1792.27 (SE +/- 9.12, N = 3; MIN: 1766.32; -fopenmp=libomp)
GCC 10.2: 1773.67 (SE +/- 5.00, N = 3; MIN: 1750.26; -fopenmp)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -msse4.1 -fPIC -pie -lpthread -ldl
Zstd Compression
Zstd Compression 1.4.9, Compression Level: 8, Long Mode - Decompression Speed (MB/s, More Is Better)
AMD AOCC 2.3: 4805.6
GCC 10.2: 4886.2
Only one standard error is listed for the two results: SE +/- 29.99, N = 3
1. (CC) gcc options: -O3 -march=native -pthread -lz -llzma
JPEG XL
JPEG XL 0.3.3, Input: PNG - Encode Speed: 5 (MP/s, More Is Better)
AMD AOCC 3.0: 73.64 (SE +/- 0.15, N = 3)
LLVM Clang 12: 74.77 (SE +/- 0.04, N = 3)
AMD AOCC 2.3: 74.61 (SE +/- 0.11, N = 3)
GCC 10.2: 74.12 (SE +/- 0.03, N = 3)
Extra flags on three of the four results: -Xclang -mrelax-all
1. (CXX) g++ options: -O3 -march=native -funwind-tables -O2 -pthread -fPIE -pie -ldl
libavif avifenc This is a test of the AOMedia libavif library testing the encoding of a JPEG image to AV1 Image Format (AVIF). Learn more via the OpenBenchmarking.org test page.
libavif avifenc 0.9.0, Encoder Speed: 10, Lossless (Seconds, Fewer Is Better)
AMD AOCC 3.0: 4.837 (SE +/- 0.015, N = 3)
LLVM Clang 12: 4.807 (SE +/- 0.038, N = 3)
AMD AOCC 2.3: 4.832 (SE +/- 0.041, N = 3)
GCC 10.2: 4.875 (SE +/- 0.022, N = 3)
1. (CXX) g++ options: -O3 -fPIC -lm
Zstd Compression
Zstd Compression 1.4.9, Compression Level: 19, Long Mode - Compression Speed (MB/s, More Is Better)
AMD AOCC 3.0: 36.4 (SE +/- 0.00, N = 3)
LLVM Clang 12: 36.8 (SE +/- 0.03, N = 3)
AMD AOCC 2.3: 36.7 (SE +/- 0.03, N = 3)
GCC 10.2: 36.6 (SE +/- 0.03, N = 3)
1. (CC) gcc options: -O3 -march=native -pthread -lz -llzma
Basis Universal Basis Universal is a GPU texture codec. This test times how long it takes to convert sRGB PNGs into Basis Universal assets with various settings. Learn more via the OpenBenchmarking.org test page.
Basis Universal 1.13, Settings: UASTC Level 2 (Seconds, Fewer Is Better)
AMD AOCC 3.0: 16.06 (SE +/- 0.02, N = 3)
LLVM Clang 12: 15.91 (SE +/- 0.06, N = 3)
AMD AOCC 2.3: 15.98 (SE +/- 0.02, N = 3)
GCC 10.2: 15.90 (SE +/- 0.05, N = 3)
1. (CXX) g++ options: -std=c++11 -fvisibility=hidden -fPIC -fno-strict-aliasing -O3 -rdynamic -lm -lpthread
oneDNN
oneDNN 2.1.2, Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
AMD AOCC 3.0: 0.643093 (SE +/- 0.004823, N = 3; MIN: 0.61; -fopenmp=libomp)
LLVM Clang 12: 0.641231 (SE +/- 0.000908, N = 3; MIN: 0.61; -fopenmp=libomp)
GCC 10.2: 0.638664 (SE +/- 0.000722, N = 3; MIN: 0.61; -fopenmp)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -msse4.1 -fPIC -pie -lpthread -ldl
libavif avifenc
libavif avifenc 0.9.0, Encoder Speed: 10 (Seconds, Fewer Is Better)
AMD AOCC 3.0: 2.941 (SE +/- 0.006, N = 3)
LLVM Clang 12: 2.952 (SE +/- 0.016, N = 3)
AMD AOCC 2.3: 2.933 (SE +/- 0.035, N = 3)
GCC 10.2: 2.934 (SE +/- 0.014, N = 3)
1. (CXX) g++ options: -O3 -fPIC -lm
GNU Radio
GNU Radio, Test: FM Deemphasis Filter (MiB/s, More Is Better)
AMD AOCC 3.0: 1055.8 (SE +/- 3.09, N = 8)
LLVM Clang 12: 1054.9 (SE +/- 0.98, N = 9)
AMD AOCC 2.3: 1061.0 (SE +/- 15.16, N = 9)
GCC 10.2: 1055.0 (SE +/- 0.78, N = 9)
1. 3.8.1.0
oneDNN
oneDNN 2.1.2, Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
AMD AOCC 3.0: 17.36 (SE +/- 0.03, N = 3; MIN: 16.83; -fopenmp=libomp)
LLVM Clang 12: 17.34 (SE +/- 0.01, N = 3; MIN: 16.81; -fopenmp=libomp)
GCC 10.2: 17.29 (SE +/- 0.09, N = 3; MIN: 16.58; -fopenmp)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -msse4.1 -fPIC -pie -lpthread -ldl
Basis Universal
Basis Universal 1.13, Settings: UASTC Level 3 (Seconds, Fewer Is Better)
AMD AOCC 3.0: 28.17 (SE +/- 0.03, N = 3)
LLVM Clang 12: 28.22 (SE +/- 0.04, N = 3)
AMD AOCC 2.3: 28.15 (SE +/- 0.05, N = 3)
GCC 10.2: 28.13 (SE +/- 0.04, N = 3)
1. (CXX) g++ options: -std=c++11 -fvisibility=hidden -fPIC -fno-strict-aliasing -O3 -rdynamic -lm -lpthread
oneDNN
oneDNN 2.1.2, Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
AMD AOCC 3.0: 2760.94 (SE +/- 17.19, N = 3; MIN: 2717.59; -fopenmp=libomp)
LLVM Clang 12: 2757.75 (SE +/- 5.95, N = 3; MIN: 2734.73; -fopenmp=libomp)
GCC 10.2: 2757.52 (SE +/- 2.01, N = 3; MIN: 2719.35; -fopenmp)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -msse4.1 -fPIC -pie -lpthread -ldl
Mobile Neural Network MNN (Mobile Neural Network) is a highly efficient, lightweight deep learning framework developed by Alibaba. Learn more via the OpenBenchmarking.org test page.
Mobile Neural Network 1.1.3, Model: inception-v3 (ms, Fewer Is Better)
GCC 10.2: 32.34 (SE +/- 0.09, N = 3; MIN: 31.33 / MAX: 42.61)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -rdynamic -pthread -ldl
Mobile Neural Network 1.1.3, Model: mobilenet-v1-1.0 (ms, Fewer Is Better)
GCC 10.2: 2.351 (SE +/- 0.027, N = 3; MIN: 2.27 / MAX: 7.49)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -rdynamic -pthread -ldl
Mobile Neural Network 1.1.3, Model: MobileNetV2_224 (ms, Fewer Is Better)
GCC 10.2: 3.240 (SE +/- 0.049, N = 3; MIN: 3.12 / MAX: 11.31)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -rdynamic -pthread -ldl
Mobile Neural Network 1.1.3, Model: resnet-v2-50 (ms, Fewer Is Better)
GCC 10.2: 25.07 (SE +/- 0.02, N = 3; MIN: 23.97 / MAX: 39.95)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -rdynamic -pthread -ldl
Mobile Neural Network 1.1.3, Model: SqueezeNetV1.0 (ms, Fewer Is Better)
GCC 10.2: 5.081 (SE +/- 0.010, N = 3; MIN: 4.92 / MAX: 14.74)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -rdynamic -pthread -ldl
Smallpt Smallpt is a C++ global illumination renderer written in less than 100 lines of code. Global illumination is done via unbiased Monte Carlo path tracing and there is multi-threading support via the OpenMP library. Learn more via the OpenBenchmarking.org test page.
Smallpt 1.0, Global Illumination Renderer; 128 Samples (Seconds, Fewer Is Better)
GCC 10.2: 4.674 (SE +/- 0.015, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -march=native
ONNX Runtime ONNX Runtime is developed by Microsoft and partners as an open-source, cross-platform, high-performance machine learning inferencing and training accelerator. This test profile runs the ONNX Runtime with various models available from the ONNX Zoo. Learn more via the OpenBenchmarking.org test page.
ONNX Runtime 1.6, Model: super-resolution-10 - Device: OpenMP CPU (Inferences Per Minute, More Is Better)
AMD AOCC 3.0: 5976 (SE +/- 38.28, N = 3; -fopenmp=libomp)
LLVM Clang 12: 5937 (SE +/- 55.09, N = 12; -fopenmp=libomp)
AMD AOCC 2.3: 6067 (SE +/- 34.74, N = 3; -fopenmp=libomp)
GCC 10.2: 6721 (SE +/- 215.50, N = 12; -fopenmp)
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -ldl -lrt
Redis Redis is an open-source in-memory data structure store, used as a database, cache, and message broker. Learn more via the OpenBenchmarking.org test page.
Redis 6.0.9, Test: GET (Requests Per Second, More Is Better)
AMD AOCC 3.0: 3545388.92 (SE +/- 11517.61, N = 3)
LLVM Clang 12: 3624414.37 (SE +/- 47796.61, N = 15)
AMD AOCC 2.3: 3658044.77 (SE +/- 58906.79, N = 15)
GCC 10.2: 3470419.90 (SE +/- 36718.95, N = 15)
1. (CXX) g++ options: -MM -MT -g3 -fvisibility=hidden -O3 -march=native
oneDNN
oneDNN 2.1.2, Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
AMD AOCC 3.0: 2.46850 (SE +/- 0.00340, N = 3; MIN: 2.36; -fopenmp=libomp)
LLVM Clang 12: 2.46561 (SE +/- 0.00451, N = 3; MIN: 2.33; -fopenmp=libomp)
GCC 10.2: 4.46777 (SE +/- 0.30276, N = 15; MIN: 2.86; -fopenmp)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -msse4.1 -fPIC -pie -lpthread -ldl
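One way to read the SE and N figures in the Deconvolution Batch shapes_1d result is sketched below in Python. It assumes that "SE" is the standard error of the mean (standard deviation divided by the square root of N) and that the values are listed in the same order as the compiler names; neither is stated on this page. The helper names stddev_from_se and gap_in_combined_se are hypothetical, and the combined-error ratio is only a rough yardstick, not a formal significance test.

```python
import math

def stddev_from_se(se, n):
    """Recover the sample standard deviation, assuming SE = stddev / sqrt(n)."""
    return se * math.sqrt(n)

def gap_in_combined_se(mean_a, se_a, mean_b, se_b):
    """Express the gap between two means in units of their combined standard error."""
    return abs(mean_a - mean_b) / math.sqrt(se_a ** 2 + se_b ** 2)

# (mean in ms, SE, N) copied from the Deconvolution Batch shapes_1d result.
# The pairing of values to compilers is assumed from the listing order.
aocc30 = (2.46850, 0.00340, 3)
clang12 = (2.46561, 0.00451, 3)
gcc102 = (4.46777, 0.30276, 15)

if __name__ == "__main__":
    print("GCC 10.2 implied stddev (ms):", round(stddev_from_se(gcc102[1], gcc102[2]), 3))
    print("GCC vs Clang gap / combined SE:",
          round(gap_in_combined_se(gcc102[0], gcc102[1], clang12[0], clang12[1]), 2))
    print("AOCC 3.0 vs Clang gap / combined SE:",
          round(gap_in_combined_se(aocc30[0], aocc30[1], clang12[0], clang12[1]), 2))
```

Under these assumptions the GCC 10.2 gap is several combined standard errors wide, while the two Clang-based results sit well within one of each other.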
GNU Radio
GNU Radio, Test: Five Back to Back FIR Filters (MiB/s, More Is Better)
AMD AOCC 3.0: 929.1 (SE +/- 20.61, N = 8)
LLVM Clang 12: 911.2 (SE +/- 17.91, N = 9)
AMD AOCC 2.3: 931.6 (SE +/- 20.04, N = 9)
GCC 10.2: 920.8 (SE +/- 19.67, N = 9)
1. 3.8.1.0
Zstd Compression
Zstd Compression 1.4.9, Compression Level: 19 - Decompression Speed (MB/s, More Is Better)
AMD AOCC 3.0: 3608.8 (SE +/- 417.40, N = 3)
LLVM Clang 12: 4000.9 (SE +/- 50.52, N = 3)
AMD AOCC 2.3: 4097.3 (SE +/- 12.18, N = 3)
GCC 10.2: 4251.7 (SE +/- 6.53, N = 3)
1. (CC) gcc options: -O3 -march=native -pthread -lz -llzma
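To put throughput results such as the Zstd level 19 decompression figures on a common scale, each value can be expressed as a percentage difference from a chosen baseline. The Python sketch below uses GCC 10.2 as the baseline. The choice of baseline, the helper name percent_vs_baseline, and the pairing of values to compilers by listing order are assumptions made for illustration only.

```python
# Throughput in MB/s copied from the Zstd level 19 decompression result.
# The pairing of values to compilers is assumed from the listing order.
results_mb_s = {
    "AMD AOCC 3.0": 3608.8,
    "LLVM Clang 12": 4000.9,
    "AMD AOCC 2.3": 4097.3,
    "GCC 10.2": 4251.7,
}

def percent_vs_baseline(results, baseline):
    """Return each entry's percent difference relative to the baseline entry."""
    base = results[baseline]
    return {name: (value - base) / base * 100.0 for name, value in results.items()}

if __name__ == "__main__":
    for name, delta in percent_vs_baseline(results_mb_s, "GCC 10.2").items():
        print(f"{name:14s} {delta:+6.2f}%")
```

For a "More Is Better" metric a negative percentage means slower than the baseline; for the "Fewer Is Better" results on this page the sign reads the other way round.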
GCC 10.2 Processor: AMD Ryzen 9 5950X 16-Core @ 3.40GHz (16 Cores / 32 Threads), Motherboard: ASUS ROG CROSSHAIR VIII HERO (WI-FI) (3204 BIOS), Chipset: AMD Starship/Matisse, Memory: 32GB, Disk: 2000GB Corsair Force MP600 + 2000GB, Graphics: AMD NAVY_FLOUNDER 12GB (2855/1000MHz), Audio: AMD Device ab28, Monitor: ASUS MG28U, Network: Realtek RTL8125 2.5GbE + Intel I211 + Intel Wi-Fi 6 AX200
OS: Ubuntu 20.10, Kernel: 5.11.6-051106-generic (x86_64), Desktop: GNOME Shell 3.38.2, Display Server: X Server 1.20.9, OpenGL: 4.6 Mesa 21.1.0-devel (git-684f97d 2021-03-12 groovy-oibaf-ppa) (LLVM 11.0.1), Vulkan: 1.2.168, Compiler: GCC 10.2.0, File-System: ext4, Screen Resolution: 3840x2160
Kernel Notes: Transparent Huge Pages: madvise
Environment Notes: CXXFLAGS="-O3 -march=native" CFLAGS="-O3 -march=native"
Compiler Notes: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-10-JvwpWM/gcc-10-10.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-10-JvwpWM/gcc-10-10.2.0/debian/tmp-gcn/usr,hsa --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Notes: Scaling Governor: acpi-cpufreq performance (Boost: Enabled) - CPU Microcode: 0xa201009
Python Notes: Python 3.8.6
Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Full AMD retpoline IBPB: conditional IBRS_FW STIBP: always-on RSB filling + srbds: Not affected + tsx_async_abort: Not affected
Testing initiated at 14 March 2021 08:46 by user pts.
AMD AOCC 2.3 Processor: AMD Ryzen 9 5950X 16-Core @ 3.40GHz (16 Cores / 32 Threads), Motherboard: ASUS ROG CROSSHAIR VIII HERO (WI-FI) (3204 BIOS), Chipset: AMD Starship/Matisse, Memory: 32GB, Disk: 2000GB Corsair Force MP600 + 2000GB, Graphics: AMD NAVY_FLOUNDER 12GB (2855/1000MHz), Audio: AMD Device ab28, Monitor: ASUS MG28U, Network: Realtek RTL8125 2.5GbE + Intel I211 + Intel Wi-Fi 6 AX200
OS: Ubuntu 20.10, Kernel: 5.11.6-051106-generic (x86_64), Desktop: GNOME Shell 3.38.2, Display Server: X Server 1.20.9, OpenGL: 4.6 Mesa 21.1.0-devel (git-684f97d 2021-03-12 groovy-oibaf-ppa) (LLVM 11.0.1), Vulkan: 1.2.168, Compiler: Clang 11.0.0, File-System: ext4, Screen Resolution: 3840x2160
Kernel Notes: Transparent Huge Pages: madvise
Environment Notes: CXXFLAGS="-O3 -march=native" CFLAGS="-O3 -march=native"
Compiler Notes: Optimized build with assertions; Default target: x86_64-unknown-linux-gnu; Host CPU: (unknown)
Processor Notes: Scaling Governor: acpi-cpufreq performance (Boost: Enabled) - CPU Microcode: 0xa201009
Python Notes: Python 3.8.6
Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Full AMD retpoline IBPB: conditional IBRS_FW STIBP: always-on RSB filling + srbds: Not affected + tsx_async_abort: Not affected
Testing initiated at 14 March 2021 17:02 by user pts.
LLVM Clang 12 Processor: AMD Ryzen 9 5950X 16-Core @ 3.40GHz (16 Cores / 32 Threads), Motherboard: ASUS ROG CROSSHAIR VIII HERO (WI-FI) (3204 BIOS), Chipset: AMD Starship/Matisse, Memory: 32GB, Disk: 2000GB Corsair Force MP600 + 2000GB, Graphics: AMD NAVY_FLOUNDER 12GB (2855/1000MHz), Audio: AMD Device ab28, Monitor: ASUS MG28U, Network: Realtek RTL8125 2.5GbE + Intel I211 + Intel Wi-Fi 6 AX200
OS: Ubuntu 20.10, Kernel: 5.11.6-051106-generic (x86_64), Desktop: GNOME Shell 3.38.2, Display Server: X Server 1.20.9, OpenGL: 4.6 Mesa 21.1.0-devel (git-684f97d 2021-03-12 groovy-oibaf-ppa) (LLVM 11.0.1), Vulkan: 1.2.168, Compiler: Clang 12.0.0-++rc3-1~exp1~oibaf~g, File-System: ext4, Screen Resolution: 3840x2160
Kernel Notes: Transparent Huge Pages: madvise
Environment Notes: CXXFLAGS="-O3 -march=native" CFLAGS="-O3 -march=native"
Processor Notes: Scaling Governor: acpi-cpufreq performance (Boost: Enabled) - CPU Microcode: 0xa201009
Python Notes: Python 3.8.6
Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Full AMD retpoline IBPB: conditional IBRS_FW STIBP: always-on RSB filling + srbds: Not affected + tsx_async_abort: Not affected
Testing initiated at 15 March 2021 05:52 by user pts.
AMD AOCC 3.0 Processor: AMD Ryzen 9 5950X 16-Core @ 3.40GHz (16 Cores / 32 Threads), Motherboard: ASUS ROG CROSSHAIR VIII HERO (WI-FI) (3204 BIOS), Chipset: AMD Starship/Matisse, Memory: 32GB, Disk: 2000GB Corsair Force MP600 + 2000GB, Graphics: AMD NAVY_FLOUNDER 12GB (2855/1000MHz), Audio: AMD Device ab28, Monitor: ASUS MG28U, Network: Realtek RTL8125 2.5GbE + Intel I211 + Intel Wi-Fi 6 AX200
OS: Ubuntu 20.10, Kernel: 5.11.6-051106-generic (x86_64), Desktop: GNOME Shell 3.38.2, Display Server: X Server 1.20.9, OpenGL: 4.6 Mesa 21.1.0-devel (git-684f97d 2021-03-12 groovy-oibaf-ppa) (LLVM 11.0.1), Vulkan: 1.2.168, Compiler: Clang 12.0.0, File-System: ext4, Screen Resolution: 3840x2160
Kernel Notes: Transparent Huge Pages: madvise
Environment Notes: CXXFLAGS="-O3 -march=native" CFLAGS="-O3 -march=native"
Compiler Notes: Optimized build with assertions; Default target: x86_64-unknown-linux-gnu; Host CPU: (unknown)
Processor Notes: Scaling Governor: acpi-cpufreq performance (Boost: Enabled) - CPU Microcode: 0xa201009
Python Notes: Python 3.8.6
Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Full AMD retpoline IBPB: conditional IBRS_FW STIBP: always-on RSB filling + srbds: Not affected + tsx_async_abort: Not affected
Testing initiated at 15 March 2021 13:34 by user pts.