AMD EPYC 7773X GCC / Clang / AOCC compiler benchmarking by Michael Larabel for a future article.
GCC 11.2
Processor: 2 x AMD EPYC 7773X 64-Core @ 2.20GHz (128 Cores / 256 Threads), Motherboard: AMD DAYTONA_X (TYM1008C BIOS), Chipset: AMD Starship/Matisse, Memory: 16 x 32 GB DDR4-3200MT/s 36ASF4G72PZ-3G2E2, Disk: 800GB INTEL SSDPF21Q800GB, Graphics: ASPEED, Monitor: VE228, Network: 2 x Mellanox MT27710
OS: Ubuntu 22.04, Kernel: 5.17.0-051700rc8-generic (x86_64), Desktop: GNOME Shell 42.0, Display Server: X Server, Vulkan: 1.2.204, Compiler: GCC 11.2.0, File-System: ext4, Screen Resolution: 1920x1080
Kernel Notes: Transparent Huge Pages: madvise
Environment Notes: CXXFLAGS="-O3 -march=native -flto" CFLAGS="-O3 -march=native -flto"
Compiler Notes: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-11-gBFGDP/gcc-11-11.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-gBFGDP/gcc-11-11.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Disk Notes: NONE / errors=remount-ro,relatime,rw / Block Size: 4096
Processor Notes: Scaling Governor: acpi-cpufreq performance (Boost: Enabled) - CPU Microcode: 0xa001228
Python Notes: Python 3.10.4
Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: always-on RSB filling + srbds: Not affected + tsx_async_abort: Not affected
Clang 14.0
OS: Ubuntu 22.04, Kernel: 5.17.0-051700rc8-generic (x86_64), Desktop: GNOME Shell 42.0, Display Server: X Server, Vulkan: 1.2.204, Compiler: Clang 14.0.0-1ubuntu1, File-System: ext4, Screen Resolution: 1920x1080
Kernel Notes: Transparent Huge Pages: madvise
Environment Notes: CXXFLAGS="-O3 -march=native -flto" CFLAGS="-O3 -march=native -flto"
Processor Notes: Scaling Governor: acpi-cpufreq performance (Boost: Enabled) - CPU Microcode: 0xa001228
Python Notes: Python 3.10.4
Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: always-on RSB filling + srbds: Not affected + tsx_async_abort: Not affected
AMD AOCC 3.2
OS: Ubuntu 22.04, Kernel: 5.17.0-051700rc8-generic (x86_64), Desktop: GNOME Shell 42.0, Display Server: X Server, Vulkan: 1.2.204, Compiler: Clang 13.0.0, File-System: ext4, Screen Resolution: 1920x1080
Kernel Notes: Transparent Huge Pages: madvise
Environment Notes: CXXFLAGS="-O3 -march=native -flto" CFLAGS="-O3 -march=native -flto"
Compiler Notes: Optimized build with assertions; Default target: x86_64-unknown-linux-gnu; Host CPU: znver3
Processor Notes: Scaling Governor: acpi-cpufreq performance (Boost: Enabled) - CPU Microcode: 0xa001228
Python Notes: Python 3.10.4
Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: always-on RSB filling + srbds: Not affected + tsx_async_abort: Not affected
Result Overview (Phoronix Test Suite chart): relative performance of GCC 11.2, Clang 14.0, and AMD AOCC 3.2 per test, on an axis from 100% to 199%. Tests shown: GraphicsMagick, Etcpak, x265, LeelaChessZero, Coremark, SVT-HEVC, JPEG XL Decoding libjxl, JPEG XL libjxl, TSCP, SVT-VP9, LAME MP3 Encoding, Xmrig, ASTC Encoder, libjpeg-turbo tjbench, Liquid-DSP, WebP Image Encode, SVT-AV1, Google Draco, libavif avifenc, QuantLib, Zstd Compression, AOBench, Kvazaar, OpenSSL, OpenJPEG, LAMMPS Molecular Dynamics Simulator, KTX-Software toktx, Primesieve, FLAC Audio Encoding.
AMD EPYC 7773X Compilers
Test | GCC 11.2 | Clang 14.0 | AMD AOCC 3.2
tscp: AI Chess Performance | 1094141 | 1231667 | 1172438
encode-flac: WAV To FLAC | 21.516 | 21.834 | 21.804
encode-mp3: WAV To MP3 | 8.988 | 9.986 | 9.782
tjbench: Decompression Throughput | 163.486990 | 150.720021 | 160.291046
astcenc: Thorough | 6.4222 | 5.9580 | 5.7378
astcenc: Exhaustive | 5.8792 | 5.7737 | 5.5686
etcpak: DXT1 | 844.045 | 1926.467 | 2012.565
etcpak: ETC2 | 134.043 | 166.392 | 158.347
draco: Lion | 5952 | 6176 | 5859
toktx: UASTC 3 | 4.604 | 4.738 | 4.684
toktx: Zstd Compression 9 | 3.839 | 3.886 | 3.832
toktx: Zstd Compression 19 | 21.441 | 22.989 | 21.813
toktx: UASTC 3 + Zstd Compression 19 | 9.015 | 9.389 | 9.071
toktx: UASTC 4 + Zstd Compression 19 | 35.583 | 35.531 | 35.386
jpegxl-decode: 1 | 46.70 | 53.78 | 57.14
jpegxl-decode: All | 564.28 | 605.18 | 599.49
jpegxl: PNG - 8 | 0.72 | 0.67 | 0.79
jpegxl: JPEG - 7 | 71.13 | 78.02 | 82.75
openjpeg: NASA Curiosity Panorama M34 | 362483 | 349040 | 352848
webp: Default | 1.695 | 1.701 | 1.660
webp: Quality 100 | 2.906 | 2.871 | 2.759
webp: Quality 100, Lossless | 24.561 | 23.868 | 22.759
webp: Quality 100, Highest Compression | 8.798 | 8.585 | 8.026
webp: Quality 100, Lossless, Highest Compression | 50.642 | 45.089 | 45.901
xmrig: Monero - 1M | 41100.5 | 40338.2 | 38790.7
xmrig: Wownero - 1M | 42309.7 | 40922.7 | 36998.0
quantlib: | 2125.3 | 2240.3 | 2251.4
lczero: BLAS | 4159 | 4224 | 4551
lczero: Eigen | 4187 | 5107 | 5570
lammps: 20k Atoms | 35.972 | 36.010 | 36.316
lammps: Rhodopsin Protein | 28.468 | 29.947 | 30.353
coremark: CoreMark Size 666 - Iterations Per Second | 4447513.211801 | 3705497.015609 | 4242648.798194
primesieve: 1e12 Prime Number Generation | 2.613 | 2.611 | 2.564
compress-zstd: 19 - Compression Speed | 98.8 | 99.0 | 98.7
compress-zstd: 19 - Decompression Speed | 2269.9 | 2086.0 | 2311.1
kvazaar: Bosphorus 4K - Medium | 31.52 | 33.21 | 32.92
kvazaar: Bosphorus 4K - Very Fast | 43.64 | 45.28 | 45.36
aobench: 2048 x 2048 - Total Time | 49.744 | 50.303 | 47.941
graphics-magick: Rotate | 537 | 450 | 616
graphics-magick: Resizing | 164 | 93 | 268
svt-vp9: PSNR/SSIM Optimized - Bosphorus 1080p | 330.00 | 362.42 | 367.81
svt-av1: Preset 4 - Bosphorus 4K | 4.082 | 4.005 | 4.483
svt-av1: Preset 10 - Bosphorus 4K | 107.652 | 104.559 | 110.839
svt-av1: Preset 12 - Bosphorus 4K | 140.461 | 129.325 | 132.782
x265: Bosphorus 4K | 19.37 | 21.31 | 25.03
svt-hevc: 7 - Bosphorus 1080p | 288.97 | 303.09 | 308.11
svt-hevc: 10 - Bosphorus 1080p | 376.34 | 437.24 | 472.24
avifenc: 0 | 88.757 | 87.251 | 84.815
avifenc: 2 | 48.775 | 47.512 | 46.271
avifenc: 6 | 5.020 | 4.911 | 4.635
avifenc: 6, Lossless | 8.318 | 7.976 | 7.822
liquid-dsp: 128 - 256 - 57 | 5825600000 | 5867933333 | 5949000000
liquid-dsp: 256 - 256 - 57 | 6127133333 | 6229033333 | 7010600000
openssl: SHA256 | 156404481030 | 170350109150 | 176385148413
openssl: RSA4096 (sign/s) | 26996.8 | 26924.9 | 26972.4
openssl: RSA4096 (verify/s) | 1770209.9 | 1774479.0 | 1768233.2
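The raw figures above mix "more is better" and "fewer is better" units, so comparing compilers requires normalizing each test against a baseline first. The following is a minimal sketch of that arithmetic, using GCC 11.2 as the baseline and AMD AOCC 3.2 as the candidate. The helper names `relative` and `geometric_mean` are hypothetical, and the five rows are only an illustrative subset, so the geometric mean printed here is not the overall result for the full test set.

```python
from math import prod

# (test, higher_is_better, GCC 11.2, AMD AOCC 3.2); values copied from the results above.
ROWS = [
    ("tscp: AI Chess Performance", True, 1094141, 1172438),
    ("encode-mp3: WAV To MP3", False, 8.988, 9.782),  # seconds, fewer is better
    ("etcpak: DXT1", True, 844.045, 2012.565),
    ("x265: Bosphorus 4K", True, 19.37, 25.03),
    ("openssl: RSA4096 sign/s", True, 26996.8, 26972.4),
]


def relative(higher_is_better, baseline, candidate):
    """Candidate performance relative to baseline (1.0 = parity, > 1.0 = faster)."""
    return candidate / baseline if higher_is_better else baseline / candidate


def geometric_mean(values):
    """Geometric mean, the usual way to average performance ratios."""
    values = list(values)
    return prod(values) ** (1.0 / len(values))


ratios = {name: relative(hib, gcc, aocc) for name, hib, gcc, aocc in ROWS}
for name, ratio in ratios.items():
    print(f"{name:30s} {ratio * 100:6.1f}%")
print(f"geometric mean of this subset: {geometric_mean(ratios.values()) * 100:.1f}%")
```

Inverting the ratio for time-based tests keeps every percentage on the same "higher means faster" scale, which is what allows the results to be averaged together.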
TSCP
This is a performance test of TSCP, Tom Kerrigan's Simple Chess Program, which has a built-in performance benchmark. Learn more via the OpenBenchmarking.org test page.
TSCP 1.81 - AI Chess Performance (Nodes Per Second, More Is Better)
Clang 14.0: 1231667 (SE +/- 4052.15, N = 5)
AMD AOCC 3.2: 1172438 (SE +/- 4599.34, N = 5)
GCC 11.2: 1094141 (SE +/- 2612.05, N = 5)
1. (CC) gcc options: -O3 -march=native -flto
LAME MP3 Encoding
LAME is an MP3 encoder licensed under the LGPL. This test measures the time required to encode a WAV file to MP3 format. Learn more via the OpenBenchmarking.org test page.
LAME MP3 Encoding 3.100 - WAV To MP3 (Seconds, Fewer Is Better)
GCC 11.2: 8.988 (SE +/- 0.024, N = 3)
AMD AOCC 3.2: 9.782 (SE +/- 0.020, N = 3)
Clang 14.0: 9.986 (SE +/- 0.017, N = 3)
Per-result extra options: -ffast-math -funroll-loops -fschedule-insns2 -fbranch-count-reg -fforce-addr
1. (CC) gcc options: -O3 -pipe -march=native -flto -lncurses -lm
libjpeg-turbo tjbench
tjbench is a JPEG decompression/compression benchmark that is part of libjpeg-turbo, a JPEG image codec library optimized for SIMD instructions on modern CPU architectures. Learn more via the OpenBenchmarking.org test page.
libjpeg-turbo tjbench 2.1.0 - Test: Decompression Throughput (Megapixels/sec, More Is Better)
GCC 11.2: 163.49 (SE +/- 0.17, N = 3)
AMD AOCC 3.2: 160.29 (SE +/- 0.62, N = 3)
Clang 14.0: 150.72 (SE +/- 0.18, N = 3)
1. (CC) gcc options: -O3 -march=native -flto -rdynamic -lm
ASTC Encoder
ASTC Encoder (astcenc) is for the Adaptive Scalable Texture Compression (ASTC) format commonly used with OpenGL, OpenGL ES, and Vulkan graphics APIs. This test profile does a coding test of both compression/decompression. Learn more via the OpenBenchmarking.org test page.
ASTC Encoder 3.2 - Preset: Thorough (Seconds, Fewer Is Better)
AMD AOCC 3.2: 5.7378 (SE +/- 0.0468, N = 15)
Clang 14.0: 5.9580 (SE +/- 0.0253, N = 3)
GCC 11.2: 6.4222 (SE +/- 0.0328, N = 3)
1. (CXX) g++ options: -O3 -march=native -flto -pthread
ASTC Encoder 3.2 - Preset: Exhaustive (Seconds, Fewer Is Better)
AMD AOCC 3.2: 5.5686 (SE +/- 0.0070, N = 3)
Clang 14.0: 5.7737 (SE +/- 0.0041, N = 3)
GCC 11.2: 5.8792 (SE +/- 0.0090, N = 3)
1. (CXX) g++ options: -O3 -march=native -flto -pthread
Etcpak
Etcpak is the self-proclaimed "fastest ETC compressor on the planet", focused on providing open-source, very fast ETC and S3 texture compression support. Learn more via the OpenBenchmarking.org test page.
Etcpak 0.7 - Configuration: DXT1 (Mpx/s, More Is Better)
AMD AOCC 3.2: 2012.57 (SE +/- 23.71, N = 15)
Clang 14.0: 1926.47 (SE +/- 29.34, N = 15)
GCC 11.2: 844.05 (SE +/- 2.54, N = 3)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -lpthread
Etcpak 0.7 - Configuration: ETC2 (Mpx/s, More Is Better)
Clang 14.0: 166.39 (SE +/- 0.04, N = 3)
AMD AOCC 3.2: 158.35 (SE +/- 1.75, N = 4)
GCC 11.2: 134.04 (SE +/- 1.46, N = 3)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -lpthread
Google Draco
Draco is a library developed by Google for compressing/decompressing 3D geometric meshes and point clouds. This test profile uses some Artec3D PLY models as the sample 3D model input formats for Draco compression/decompression. Learn more via the OpenBenchmarking.org test page.
Google Draco 1.5.0 - Model: Lion (ms, Fewer Is Better)
AMD AOCC 3.2: 5859 (SE +/- 8.33, N = 3)
GCC 11.2: 5952 (SE +/- 7.94, N = 3)
Clang 14.0: 6176 (SE +/- 17.33, N = 3)
1. (CXX) g++ options: -O3 -march=native -flto
KTX-Software toktx
This is a benchmark of The Khronos Group's KTX-Software library and tools. KTX-Software provides "toktx" for converting/creating in the KTX container format for image textures. This benchmark times how long it takes to convert to KTX 2.0 format with various settings using a reference PNG sample input. Learn more via the OpenBenchmarking.org test page.
KTX-Software toktx 4.0 - Settings: UASTC 3 (Seconds, Fewer Is Better)
GCC 11.2: 4.604 (SE +/- 0.054, N = 3)
AMD AOCC 3.2: 4.684 (SE +/- 0.059, N = 3)
Clang 14.0: 4.738 (SE +/- 0.045, N = 15)
KTX-Software toktx 4.0 - Settings: Zstd Compression 9 (Seconds, Fewer Is Better)
AMD AOCC 3.2: 3.832 (SE +/- 0.021, N = 3)
GCC 11.2: 3.839 (SE +/- 0.052, N = 3)
Clang 14.0: 3.886 (SE +/- 0.036, N = 3)
KTX-Software toktx 4.0 - Settings: UASTC 3 + Zstd Compression 19 (Seconds, Fewer Is Better)
GCC 11.2: 9.015 (SE +/- 0.100, N = 3)
AMD AOCC 3.2: 9.071 (SE +/- 0.024, N = 3)
Clang 14.0: 9.389 (SE +/- 0.066, N = 15)
KTX-Software toktx 4.0 - Settings: UASTC 4 + Zstd Compression 19 (Seconds, Fewer Is Better)
AMD AOCC 3.2: 35.39 (SE +/- 0.17, N = 3)
Clang 14.0: 35.53 (SE +/- 0.09, N = 3)
GCC 11.2: 35.58 (SE +/- 0.46, N = 3)
JPEG XL Decoding libjxl
The JPEG XL Image Coding System is designed to provide next-generation JPEG image capabilities, with JPEG XL offering better image quality and compression over legacy JPEG. This test profile is suited for JPEG XL decode performance testing to PNG output file; the pts/jpegxl test is for encode performance. The JPEG XL encoding/decoding is done using the libjxl codebase. Learn more via the OpenBenchmarking.org test page.
JPEG XL Decoding libjxl 0.6.1 - CPU Threads: 1 (MP/s, More Is Better)
AMD AOCC 3.2: 57.14 (SE +/- 0.15, N = 3)
Clang 14.0: 53.78 (SE +/- 0.08, N = 3)
GCC 11.2: 46.70 (SE +/- 0.08, N = 3)
JPEG XL libjxl
The JPEG XL Image Coding System is designed to provide next-generation JPEG image capabilities with JPEG XL offering better image quality and compression over legacy JPEG. This test profile is currently focused on the multi-threaded JPEG XL image encode performance using the reference libjxl library. Learn more via the OpenBenchmarking.org test page.
JPEG XL libjxl 0.6.1 - Input: PNG - Encode Speed: 8 (MP/s, More Is Better)
AMD AOCC 3.2: 0.79 (SE +/- 0.00, N = 3)
GCC 11.2: 0.72 (SE +/- 0.00, N = 3)
Clang 14.0: 0.67 (SE +/- 0.00, N = 3)
Per-result extra options: -Xclang -mrelax-all -Xclang -mrelax-all
1. (CXX) g++ options: -O3 -march=native -flto -funwind-tables -O2 -fPIE -pie
JPEG XL libjxl 0.6.1 - Input: JPEG - Encode Speed: 7 (MP/s, More Is Better)
AMD AOCC 3.2: 82.75 (SE +/- 0.90, N = 4)
Clang 14.0: 78.02 (SE +/- 1.11, N = 3)
GCC 11.2: 71.13 (SE +/- 1.50, N = 15)
Per-result extra options: -Xclang -mrelax-all -Xclang -mrelax-all
1. (CXX) g++ options: -O3 -march=native -flto -funwind-tables -O2 -fPIE -pie
OpenJPEG
OpenJPEG is an open-source JPEG 2000 codec written in the C programming language. The default input for this test profile is the NASA/JPL-Caltech/MSSS Curiosity panorama 717MB TIFF image file converting to JPEG2000 format. Learn more via the OpenBenchmarking.org test page.
OpenJPEG 2.4 - Encode: NASA Curiosity Panorama M34 (ms, Fewer Is Better)
Clang 14.0: 349040 (SE +/- 935.91, N = 3)
AMD AOCC 3.2: 352848 (SE +/- 3533.23, N = 15)
GCC 11.2: 362483 (SE +/- 3417.00, N = 3)
1. (CXX) g++ options: -O3 -march=native -flto -rdynamic
WebP Image Encode
This is a test of Google's libwebp with the cwebp image encode utility and using a sample 6000x4000 pixel JPEG image as the input. Learn more via the OpenBenchmarking.org test page.
WebP Image Encode 1.1 - Encode Settings: Default (Encode Time - Seconds, Fewer Is Better)
AMD AOCC 3.2: 1.660 (SE +/- 0.003, N = 3)
GCC 11.2: 1.695 (SE +/- 0.001, N = 3)
Clang 14.0: 1.701 (SE +/- 0.015, N = 15)
1. (CC) gcc options: -fvisibility=hidden -O3 -march=native -flto -lm -lpng16 -ljpeg
WebP Image Encode 1.1 - Encode Settings: Quality 100 (Encode Time - Seconds, Fewer Is Better)
AMD AOCC 3.2: 2.759 (SE +/- 0.004, N = 3)
Clang 14.0: 2.871 (SE +/- 0.029, N = 15)
GCC 11.2: 2.906 (SE +/- 0.029, N = 3)
1. (CC) gcc options: -fvisibility=hidden -O3 -march=native -flto -lm -lpng16 -ljpeg
WebP Image Encode 1.1 - Encode Settings: Quality 100, Lossless (Encode Time - Seconds, Fewer Is Better)
AMD AOCC 3.2: 22.76 (SE +/- 0.01, N = 3)
Clang 14.0: 23.87 (SE +/- 0.22, N = 15)
GCC 11.2: 24.56 (SE +/- 0.22, N = 3)
1. (CC) gcc options: -fvisibility=hidden -O3 -march=native -flto -lm -lpng16 -ljpeg
WebP Image Encode 1.1 - Encode Settings: Quality 100, Highest Compression (Encode Time - Seconds, Fewer Is Better)
AMD AOCC 3.2: 8.026 (SE +/- 0.005, N = 3)
Clang 14.0: 8.585 (SE +/- 0.013, N = 3)
GCC 11.2: 8.798 (SE +/- 0.004, N = 3)
1. (CC) gcc options: -fvisibility=hidden -O3 -march=native -flto -lm -lpng16 -ljpeg
WebP Image Encode 1.1 - Encode Settings: Quality 100, Lossless, Highest Compression (Encode Time - Seconds, Fewer Is Better)
Clang 14.0: 45.09 (SE +/- 0.11, N = 3)
AMD AOCC 3.2: 45.90 (SE +/- 0.02, N = 3)
GCC 11.2: 50.64 (SE +/- 0.51, N = 5)
1. (CC) gcc options: -fvisibility=hidden -O3 -march=native -flto -lm -lpng16 -ljpeg
Xmrig
Xmrig is an open-source cross-platform CPU/GPU miner for RandomX, KawPow, CryptoNight and AstroBWT. This test profile is set up to measure the Xmrig CPU mining performance. Learn more via the OpenBenchmarking.org test page.
Xmrig 6.12.1 - Variant: Monero - Hash Count: 1M (H/s, More Is Better)
GCC 11.2: 41100.5 (SE +/- 47.93, N = 3)
Clang 14.0: 40338.2 (SE +/- 142.04, N = 3)
AMD AOCC 3.2: 38790.7 (SE +/- 276.55, N = 3)
Per-result extra options: -static-libgcc -static-libstdc++ -funroll-loops -funroll-loops
1. (CXX) g++ options: -O3 -march=native -flto -fexceptions -fno-rtti -maes -Ofast -rdynamic -lssl -lcrypto -luv -lpthread -lrt -ldl -lhwloc
Xmrig 6.12.1 - Variant: Wownero - Hash Count: 1M (H/s, More Is Better)
GCC 11.2: 42309.7 (SE +/- 63.43, N = 3)
Clang 14.0: 40922.7 (SE +/- 526.98, N = 3)
AMD AOCC 3.2: 36998.0 (SE +/- 352.95, N = 3)
Per-result extra options: -static-libgcc -static-libstdc++ -funroll-loops -funroll-loops
1. (CXX) g++ options: -O3 -march=native -flto -fexceptions -fno-rtti -maes -Ofast -rdynamic -lssl -lcrypto -luv -lpthread -lrt -ldl -lhwloc
QuantLib
QuantLib is an open-source library/framework around quantitative finance for modeling, trading and risk management scenarios. QuantLib is written in C++ with Boost, and its built-in benchmark reports the QuantLib Benchmark Index score. Learn more via the OpenBenchmarking.org test page.
QuantLib 1.21 (MFLOPS, More Is Better)
AMD AOCC 3.2: 2251.4 (SE +/- 14.43, N = 3)
Clang 14.0: 2240.3 (SE +/- 5.25, N = 3)
GCC 11.2: 2125.3 (SE +/- 9.30, N = 3)
1. (CXX) g++ options: -O3 -march=native -rdynamic
LeelaChessZero
LeelaChessZero (lc0 / lczero) is a chess engine automated via neural networks. This test profile can be used for OpenCL, CUDA + cuDNN, and BLAS (CPU-based) benchmarking. Learn more via the OpenBenchmarking.org test page.
LeelaChessZero 0.28 - Backend: BLAS (Nodes Per Second, More Is Better)
AMD AOCC 3.2: 4551 (SE +/- 44.95, N = 9)
Clang 14.0: 4224 (SE +/- 23.81, N = 3)
GCC 11.2: 4159 (SE +/- 38.97, N = 3)
1. (CXX) g++ options: -flto -O3 -march=native -pthread
LeelaChessZero 0.28 - Backend: Eigen (Nodes Per Second, More Is Better)
AMD AOCC 3.2: 5570 (SE +/- 67.86, N = 9)
Clang 14.0: 5107 (SE +/- 57.43, N = 9)
GCC 11.2: 4187 (SE +/- 48.22, N = 3)
1. (CXX) g++ options: -flto -O3 -march=native -pthread
Coremark
This is a test of EEMBC CoreMark processor benchmark. Learn more via the OpenBenchmarking.org test page.
Coremark 1.0 - CoreMark Size 666 - Iterations Per Second (Iterations/Sec, More Is Better)
GCC 11.2: 4447513.21 (SE +/- 9967.53, N = 3)
AMD AOCC 3.2: 4242648.80 (SE +/- 14710.43, N = 3)
Clang 14.0: 3705497.02 (SE +/- 34949.41, N = 3)
1. (CC) gcc options: -O2 -O3 -march=native -flto -lrt" -lrt
Primesieve
Primesieve generates prime numbers using a highly optimized sieve of Eratosthenes implementation. Primesieve benchmarks the CPU's L1/L2 cache performance. Learn more via the OpenBenchmarking.org test page.
Primesieve 7.7 - 1e12 Prime Number Generation (Seconds, Fewer Is Better)
AMD AOCC 3.2: 2.564 (SE +/- 0.028, N = 5)
Clang 14.0: 2.611 (SE +/- 0.030, N = 15)
GCC 11.2: 2.613 (SE +/- 0.029, N = 5)
1. (CXX) g++ options: -O3 -march=native -flto
Zstd Compression
This test measures the time needed to compress/decompress a sample file (a FreeBSD disk image - FreeBSD-12.2-RELEASE-amd64-memstick.img) using Zstd compression with options for different compression levels / settings. Learn more via the OpenBenchmarking.org test page.
Zstd Compression 1.5.0 - Compression Level: 19 - Compression Speed (MB/s, More Is Better)
Clang 14.0: 99.0 (SE +/- 1.18, N = 4)
GCC 11.2: 98.8 (SE +/- 1.34, N = 3)
AMD AOCC 3.2: 98.7 (SE +/- 1.21, N = 3)
1. (CC) gcc options: -O3 -march=native -flto -pthread -lz -llzma
Zstd Compression 1.5.0 - Compression Level: 19 - Decompression Speed (MB/s, More Is Better)
AMD AOCC 3.2: 2311.1 (SE +/- 6.09, N = 3)
GCC 11.2: 2269.9 (SE +/- 4.90, N = 3)
Clang 14.0: 2086.0 (SE +/- 64.15, N = 4)
1. (CC) gcc options: -O3 -march=native -flto -pthread -lz -llzma
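Each result is reported with a standard error (SE) over N runs, which helps judge whether a gap between two compilers is larger than run-to-run noise. A rough sketch of one such check follows, using the Zstd level 19 figures above. It assumes the SE values are listed in the same order as the compilers, and the two-standard-error threshold and the function name `distinguishable` are illustrative choices here, not something the Phoronix Test Suite applies.

```python
from math import hypot


def distinguishable(mean_a, se_a, mean_b, se_b, k=2.0):
    """True if the gap between two means exceeds k combined standard errors.

    The combined standard error of a difference of two independent means
    is sqrt(se_a**2 + se_b**2), computed here with math.hypot.
    """
    return abs(mean_a - mean_b) > k * hypot(se_a, se_b)


# Zstd level 19 compression speed (MB/s): Clang 14.0 vs AMD AOCC 3.2.
compress_gap = distinguishable(99.0, 1.18, 98.7, 1.21)

# Zstd level 19 decompression speed (MB/s): AMD AOCC 3.2 vs Clang 14.0.
decompress_gap = distinguishable(2311.1, 6.09, 2086.0, 64.15)

print("compression gap exceeds noise:", compress_gap)
print("decompression gap exceeds noise:", decompress_gap)
```

With so few runs per result (N = 3 or 4), this is only a coarse screen and not a formal significance test.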
Kvazaar
This is a test of Kvazaar as a CPU-based H.265/HEVC video encoder written in the C programming language and optimized in Assembly. Kvazaar is the winner of the 2016 ACM Open-Source Software Competition and developed at the Ultra Video Group, Tampere University, Finland. Learn more via the OpenBenchmarking.org test page.
Kvazaar 2.1 - Video Input: Bosphorus 4K - Video Preset: Medium (Frames Per Second, More Is Better)
Clang 14.0: 33.21 (SE +/- 0.03, N = 3)
AMD AOCC 3.2: 32.92 (SE +/- 0.20, N = 3)
GCC 11.2: 31.52 (SE +/- 0.17, N = 3)
Per-result extra options: -lpthread
1. (CC) gcc options: -pthread -ftree-vectorize -fvisibility=hidden -O3 -march=native -flto -lm -lrt
Kvazaar 2.1 - Video Input: Bosphorus 4K - Video Preset: Very Fast (Frames Per Second, More Is Better)
AMD AOCC 3.2: 45.36 (SE +/- 0.64, N = 15)
Clang 14.0: 45.28 (SE +/- 0.50, N = 15)
GCC 11.2: 43.64 (SE +/- 0.56, N = 3)
Per-result extra options: -lpthread
1. (CC) gcc options: -pthread -ftree-vectorize -fvisibility=hidden -O3 -march=native -flto -lm -lrt
AOBench
AOBench is a lightweight ambient occlusion renderer, written in C. The test profile is using a size of 2048 x 2048. Learn more via the OpenBenchmarking.org test page.
AOBench - Size: 2048 x 2048 - Total Time (Seconds, Fewer Is Better)
AMD AOCC 3.2: 47.94 (SE +/- 0.41, N = 3)
GCC 11.2: 49.74 (SE +/- 0.13, N = 3)
Clang 14.0: 50.30 (SE +/- 0.25, N = 3)
1. (CC) gcc options: -lm -O3 -march=native -flto
GraphicsMagick
This is a test of GraphicsMagick with its OpenMP implementation that performs various imaging tests on a sample 6000x4000 pixel JPEG image. Learn more via the OpenBenchmarking.org test page.
GraphicsMagick 1.3.33 - Operation: Rotate (Iterations Per Minute, More Is Better)
AMD AOCC 3.2: 616 (SE +/- 3.76, N = 3)
GCC 11.2: 537 (SE +/- 7.17, N = 3)
Clang 14.0: 450 (SE +/- 8.14, N = 15)
1. (CC) gcc options: -fopenmp -O3 -march=native -flto -ljbig -lwebp -lwebpmux -ltiff -lfreetype -ljpeg -lXext -lSM -lICE -lX11 -llzma -lbz2 -lxml2 -lz -lm -lpthread
GraphicsMagick 1.3.33 - Operation: Resizing (Iterations Per Minute, More Is Better)
AMD AOCC 3.2: 268 (SE +/- 3.79, N = 3)
GCC 11.2: 164 (SE +/- 10.44, N = 15)
Clang 14.0: 93 (SE +/- 1.33, N = 3)
1. (CC) gcc options: -fopenmp -O3 -march=native -flto -ljbig -lwebp -lwebpmux -ltiff -lfreetype -ljpeg -lXext -lSM -lICE -lX11 -llzma -lbz2 -lxml2 -lz -lm -lpthread
SVT-VP9
This is a test of the Intel Open Visual Cloud Scalable Video Technology SVT-VP9 CPU-based multi-threaded video encoder for the VP9 video format with a sample YUV input video file. Learn more via the OpenBenchmarking.org test page.
SVT-VP9 0.3 - Tuning: PSNR/SSIM Optimized - Input: Bosphorus 1080p (Frames Per Second, More Is Better)
AMD AOCC 3.2: 367.81 (SE +/- 0.54, N = 3)
Clang 14.0: 362.42 (SE +/- 3.50, N = 3)
GCC 11.2: 330.00 (SE +/- 1.46, N = 3)
1. (CC) gcc options: -O3 -fcommon -march=native -flto -fPIE -fPIC -fvisibility=hidden -pie -rdynamic -lpthread -lrt -lm
SVT-AV1
This is a benchmark of the SVT-AV1 open-source video encoder/decoder. SVT-AV1 was originally developed by Intel as part of their Open Visual Cloud / Scalable Video Technology (SVT). Development of SVT-AV1 has since moved to the Alliance for Open Media as part of upstream AV1 development. SVT-AV1 is a CPU-based multi-threaded video encoder for the AV1 video format with a sample YUV video file. Learn more via the OpenBenchmarking.org test page.
SVT-AV1 0.9 - Encoder Mode: Preset 4 - Input: Bosphorus 4K (Frames Per Second, More Is Better)
AMD AOCC 3.2: 4.483 (SE +/- 0.052, N = 3)
GCC 11.2: 4.082 (SE +/- 0.036, N = 3)
Clang 14.0: 4.005 (SE +/- 0.015, N = 3)
1. (CXX) g++ options: -O3 -march=native -flto -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq -pie
SVT-AV1 0.9 - Encoder Mode: Preset 10 - Input: Bosphorus 4K (Frames Per Second, More Is Better)
AMD AOCC 3.2: 110.84 (SE +/- 0.22, N = 3)
GCC 11.2: 107.65 (SE +/- 0.35, N = 3)
Clang 14.0: 104.56 (SE +/- 0.85, N = 3)
1. (CXX) g++ options: -O3 -march=native -flto -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq -pie
SVT-AV1 0.9 - Encoder Mode: Preset 12 - Input: Bosphorus 4K (Frames Per Second, More Is Better)
GCC 11.2: 140.46 (SE +/- 1.52, N = 5)
AMD AOCC 3.2: 132.78 (SE +/- 1.72, N = 3)
Clang 14.0: 129.33 (SE +/- 0.12, N = 3)
1. (CXX) g++ options: -O3 -march=native -flto -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq -pie
x265
This is a simple test of the x265 encoder run on the CPU with 1080p and 4K options for H.265 video encode performance with x265. Learn more via the OpenBenchmarking.org test page.
x265 3.4 - Video Input: Bosphorus 4K (Frames Per Second, More Is Better)
AMD AOCC 3.2: 25.03 (SE +/- 0.18, N = 3)
Clang 14.0: 21.31 (SE +/- 0.21, N = 15)
GCC 11.2: 19.37 (SE +/- 0.21, N = 15)
1. (CXX) g++ options: -O3 -march=native -flto -rdynamic -lpthread -lrt -ldl -lnuma
SVT-HEVC
This is a test of the Intel Open Visual Cloud Scalable Video Technology SVT-HEVC CPU-based multi-threaded video encoder for the HEVC / H.265 video format with a sample 1080p YUV video file. Learn more via the OpenBenchmarking.org test page.
SVT-HEVC 1.5.0 - Tuning: 7 - Input: Bosphorus 1080p (Frames Per Second, More Is Better)
AMD AOCC 3.2: 308.11 (SE +/- 2.88, N = 3)
Clang 14.0: 303.09 (SE +/- 1.03, N = 3)
GCC 11.2: 288.97 (SE +/- 2.37, N = 3)
1. (CC) gcc options: -O3 -march=native -flto -fPIE -fPIC -O2 -pie -rdynamic -lpthread -lrt
SVT-HEVC 1.5.0 - Tuning: 10 - Input: Bosphorus 1080p (Frames Per Second, More Is Better)
AMD AOCC 3.2: 472.24 (SE +/- 3.33, N = 3)
Clang 14.0: 437.24 (SE +/- 5.37, N = 3)
GCC 11.2: 376.34 (SE +/- 3.15, N = 13)
1. (CC) gcc options: -O3 -march=native -flto -fPIE -fPIC -O2 -pie -rdynamic -lpthread -lrt
libavif avifenc 0.10 - Encoder Speed: 2 (Seconds, Fewer Is Better)
AMD AOCC 3.2: 46.27 (SE +/- 0.17, N = 3)
Clang 14.0: 47.51 (SE +/- 0.27, N = 3)
GCC 11.2: 48.78 (SE +/- 0.13, N = 3)
1. (CXX) g++ options: -O3 -fPIC -march=native -flto -lm
libavif avifenc 0.10 - Encoder Speed: 6 (Seconds, Fewer Is Better)
AMD AOCC 3.2: 4.635 (SE +/- 0.047, N = 3)
Clang 14.0: 4.911 (SE +/- 0.057, N = 4)
GCC 11.2: 5.020 (SE +/- 0.046, N = 7)
1. (CXX) g++ options: -O3 -fPIC -march=native -flto -lm
libavif avifenc 0.10 - Encoder Speed: 6, Lossless (Seconds, Fewer Is Better)
AMD AOCC 3.2: 7.822 (SE +/- 0.109, N = 3)
Clang 14.0: 7.976 (SE +/- 0.038, N = 3)
GCC 11.2: 8.318 (SE +/- 0.057, N = 3)
1. (CXX) g++ options: -O3 -fPIC -march=native -flto -lm
Liquid-DSP
LiquidSDR's Liquid-DSP is a software-defined radio (SDR) digital signal processing library. This test profile runs a multi-threaded benchmark of this SDR/DSP library focused on embedded platform usage. Learn more via the OpenBenchmarking.org test page.
Liquid-DSP 2021.01.31 - Threads: 128 - Buffer Length: 256 - Filter Length: 57 (samples/s, More Is Better)
AMD AOCC 3.2: 5949000000 (SE +/- 3257811.13, N = 3)
Clang 14.0: 5867933333 (SE +/- 2434018.17, N = 3)
GCC 11.2: 5825600000 (SE +/- 7813023.32, N = 3)
1. (CC) gcc options: -O3 -march=native -flto -pthread -lm -lc -lliquid
Liquid-DSP 2021.01.31 - Threads: 256 - Buffer Length: 256 - Filter Length: 57 (samples/s, More Is Better)
AMD AOCC 3.2: 7010600000 (SE +/- 1311487.70, N = 3)
Clang 14.0: 6229033333 (SE +/- 1414606.34, N = 3)
GCC 11.2: 6127133333 (SE +/- 1604507.54, N = 3)
1. (CC) gcc options: -O3 -march=native -flto -pthread -lm -lc -lliquid
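Liquid-DSP is the one test here run at two thread counts, so the figures above also show how much each compiler's build gains when moving from 128 threads (one per physical core) to 256 threads (one per hardware thread). A small sketch of that calculation follows; the helper name `scaling_gain` is hypothetical and the numbers are copied from the two results above.

```python
# samples/s at 128 and 256 threads, copied from the Liquid-DSP results above.
RESULTS = {
    "GCC 11.2": (5825600000, 6127133333),
    "Clang 14.0": (5867933333, 6229033333),
    "AMD AOCC 3.2": (5949000000, 7010600000),
}


def scaling_gain(at_128, at_256):
    """Fractional throughput gain when going from 128 to 256 threads."""
    return at_256 / at_128 - 1.0


gains = {name: scaling_gain(*pair) for name, pair in RESULTS.items()}
for name, gain in gains.items():
    print(f"{name}: {gain * 100:+.1f}% from 128 to 256 threads")
```

The calculation only describes these two data points; it says nothing about why the builds scale differently.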
OpenSSL
OpenSSL is an open-source toolkit that implements SSL (Secure Sockets Layer) and TLS (Transport Layer Security) protocols. This test profile makes use of the built-in "openssl speed" benchmarking capabilities. Learn more via the OpenBenchmarking.org test page.
OpenSSL 3.0 - Algorithm: SHA256 (byte/s, More Is Better)
AMD AOCC 3.2: 176385148413 (SE +/- 144731144.77, N = 3)
Clang 14.0: 170350109150 (SE +/- 326147908.01, N = 3)
GCC 11.2: 156404481030 (SE +/- 330832787.20, N = 3)
Per-result extra options: -Qunused-arguments -Qunused-arguments
1. (CC) gcc options: -pthread -m64 -O3 -march=native -flto -lssl -lcrypto -ldl
OpenSSL 3.0 - Algorithm: RSA4096 (sign/s, More Is Better)
GCC 11.2: 26996.8 (SE +/- 8.41, N = 3)
AMD AOCC 3.2: 26972.4 (SE +/- 9.62, N = 3)
Clang 14.0: 26924.9 (SE +/- 26.21, N = 3)
Per-result extra options: -Qunused-arguments -Qunused-arguments
1. (CC) gcc options: -pthread -m64 -O3 -march=native -flto -lssl -lcrypto -ldl
OpenSSL 3.0 - Algorithm: RSA4096 (verify/s, More Is Better)
Clang 14.0: 1774479.0 (SE +/- 686.16, N = 3)
GCC 11.2: 1770209.9 (SE +/- 368.08, N = 3)
AMD AOCC 3.2: 1768233.2 (SE +/- 526.74, N = 3)
Per-result extra options: -Qunused-arguments -Qunused-arguments
1. (CC) gcc options: -pthread -m64 -O3 -march=native -flto -lssl -lcrypto -ldl
GCC 11.2 Processor: 2 x AMD EPYC 7773X 64-Core @ 2.20GHz (128 Cores / 256 Threads), Motherboard: AMD DAYTONA_X (TYM1008C BIOS), Chipset: AMD Starship/Matisse, Memory: 16 x 32 GB DDR4-3200MT/s 36ASF4G72PZ-3G2E2, Disk: 800GB INTEL SSDPF21Q800GB, Graphics: ASPEED, Monitor: VE228, Network: 2 x Mellanox MT27710
OS: Ubuntu 22.04, Kernel: 5.17.0-051700rc8-generic (x86_64), Desktop: GNOME Shell 42.0, Display Server: X Server, Vulkan: 1.2.204, Compiler: GCC 11.2.0, File-System: ext4, Screen Resolution: 1920x1080
Kernel Notes: Transparent Huge Pages: madvise
Environment Notes: CXXFLAGS="-O3 -march=native -flto" CFLAGS="-O3 -march=native -flto"
Compiler Notes: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-11-gBFGDP/gcc-11-11.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-gBFGDP/gcc-11-11.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Disk Notes: NONE / errors=remount-ro,relatime,rw / Block Size: 4096
Processor Notes: Scaling Governor: acpi-cpufreq performance (Boost: Enabled) - CPU Microcode: 0xa001228
Python Notes: Python 3.10.4
Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: always-on RSB filling + srbds: Not affected + tsx_async_abort: Not affected
Testing initiated at 9 April 2022 16:00 by user root.
Clang 14.0 Processor: 2 x AMD EPYC 7773X 64-Core @ 2.20GHz (128 Cores / 256 Threads), Motherboard: AMD DAYTONA_X (TYM1008C BIOS), Chipset: AMD Starship/Matisse, Memory: 16 x 32 GB DDR4-3200MT/s 36ASF4G72PZ-3G2E2, Disk: 800GB INTEL SSDPF21Q800GB, Graphics: ASPEED, Monitor: VE228, Network: 2 x Mellanox MT27710
OS: Ubuntu 22.04, Kernel: 5.17.0-051700rc8-generic (x86_64), Desktop: GNOME Shell 42.0, Display Server: X Server, Vulkan: 1.2.204, Compiler: Clang 14.0.0-1ubuntu1, File-System: ext4, Screen Resolution: 1920x1080
Kernel Notes: Transparent Huge Pages: madvise
Environment Notes: CXXFLAGS="-O3 -march=native -flto" CFLAGS="-O3 -march=native -flto"
Processor Notes: Scaling Governor: acpi-cpufreq performance (Boost: Enabled) - CPU Microcode: 0xa001228
Python Notes: Python 3.10.4
Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: always-on RSB filling + srbds: Not affected + tsx_async_abort: Not affected
Testing initiated at 10 April 2022 18:43 by user root.
AMD AOCC 3.2 Processor: 2 x AMD EPYC 7773X 64-Core @ 2.20GHz (128 Cores / 256 Threads), Motherboard: AMD DAYTONA_X (TYM1008C BIOS), Chipset: AMD Starship/Matisse, Memory: 16 x 32 GB DDR4-3200MT/s 36ASF4G72PZ-3G2E2, Disk: 800GB INTEL SSDPF21Q800GB, Graphics: ASPEED, Monitor: VE228, Network: 2 x Mellanox MT27710
OS: Ubuntu 22.04, Kernel: 5.17.0-051700rc8-generic (x86_64), Desktop: GNOME Shell 42.0, Display Server: X Server, Vulkan: 1.2.204, Compiler: Clang 13.0.0, File-System: ext4, Screen Resolution: 1920x1080
Kernel Notes: Transparent Huge Pages: madvise
Environment Notes: CXXFLAGS="-O3 -march=native -flto" CFLAGS="-O3 -march=native -flto"
Compiler Notes: Optimized build with assertions; Default target: x86_64-unknown-linux-gnu; Host CPU: znver3
Processor Notes: Scaling Governor: acpi-cpufreq performance (Boost: Enabled) - CPU Microcode: 0xa001228
Python Notes: Python 3.10.4
Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: always-on RSB filling + srbds: Not affected + tsx_async_abort: Not affected
Testing initiated at 11 April 2022 07:51 by user root.