AMD EPYC 9654 GCC 13 development compiler benchmarks by Michael Larabel for a future article.
Znver4
Kernel Notes: Transparent Huge Pages: madvise
Environment Notes: CXXFLAGS="-O3 -march=native -flto" CFLAGS="-O3 -march=native -flto"
Compiler Notes: --disable-multilib --enable-checking=release
Processor Notes: Scaling Governor: amd-pstate performance (Boost: Enabled) - CPU Microcode: 0xa10110d
Python Notes: Python 3.10.9
Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: always-on RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
Znver4 + Prefer AVX-512
Kernel Notes: Transparent Huge Pages: madvise
Environment Notes: CXXFLAGS="-O3 -march=native -mprefer-vector-width=512 -flto" CFLAGS="-O3 -march=native -mprefer-vector-width=512 -flto"
Compiler Notes: --disable-multilib --enable-checking=release
Processor Notes: Scaling Governor: amd-pstate performance (Boost: Enabled) - CPU Microcode: 0xa10110d
Python Notes: Python 3.10.9
Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: always-on RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
Znver3
Kernel Notes: Transparent Huge Pages: madvise
Environment Notes: CXXFLAGS="-O3 -march=znver3 -flto" CFLAGS="-O3 -march=znver3 -flto"
Compiler Notes: --disable-multilib --enable-checking=release
Processor Notes: Scaling Governor: amd-pstate performance (Boost: Enabled) - CPU Microcode: 0xa10110d
Python Notes: Python 3.10.9
Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: always-on RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
Znver3 + AVX-512
Processor: 2 x AMD EPYC 9654 96-Core @ 3.71GHz (192 Cores / 384 Threads), Motherboard: AMD Titanite_4G (RTI1002E BIOS), Chipset: AMD Device 14a4, Memory: 1520GB, Disk: 800GB INTEL SSDPF21Q800GB, Graphics: ASPEED, Monitor: VGA HDMI, Network: Broadcom NetXtreme BCM5720 PCIe
OS: Ubuntu 23.04, Kernel: 5.19.0-21-generic (x86_64), Desktop: GNOME Shell 43.1, Display Server: X Server 1.21.1.4, Vulkan: 1.3.224, Compiler: GCC 13.0.0 20230103, File-System: ext4, Screen Resolution: 1920x1080
Kernel Notes: Transparent Huge Pages: madvise
Environment Notes: CXXFLAGS="-O3 -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi -flto" CFLAGS="-O3 -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi -flto"
Compiler Notes: --disable-multilib --enable-checking=release
Processor Notes: Scaling Governor: amd-pstate performance (Boost: Enabled) - CPU Microcode: 0xa10110d
Python Notes: Python 3.10.9
Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: always-on RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
miniBUDE
MiniBUDE is a mini application for the core computation of the Bristol University Docking Engine (BUDE). This test profile currently makes use of the OpenMP implementation of miniBUDE for CPU benchmarking. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org: miniBUDE 20210901, Implementation: OpenMP - Input Deck: BM1 (Billion Interactions/s, More Is Better)
Znver3 + AVX-512: 214.63 (SE +/- 2.13, N = 3)
Znver4: 214.50 (SE +/- 2.07, N = 3)
Znver3: 214.08 (SE +/- 2.52, N = 3)
Znver4 + Prefer AVX-512: 211.02 (SE +/- 2.51, N = 4)
1. (CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm
OpenBenchmarking.org: miniBUDE 20210901, Implementation: OpenMP - Input Deck: BM2 (Billion Interactions/s, More Is Better)
Znver3 + AVX-512: 266.16 (SE +/- 1.06, N = 3)
Znver4 + Prefer AVX-512: 265.52 (SE +/- 0.73, N = 3)
Znver3: 264.28 (SE +/- 0.73, N = 3)
Znver4: 264.15 (SE +/- 1.75, N = 3)
1. (CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm
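One way to read the standard errors (SE) reported with each result is to compare the gap between two configurations against their combined standard error. The sketch below does this for the fastest and slowest miniBUDE BM1 results. The helper name `gap_in_combined_se` is hypothetical, and treating the two sets of runs as independent is an assumption, not something the result file states.

```python
import math


def gap_in_combined_se(mean_a, se_a, mean_b, se_b):
    """Return |mean_a - mean_b| expressed in units of the combined standard error.

    Assumes the two measurements are independent, so their standard
    errors combine in quadrature.
    """
    combined_se = math.sqrt(se_a ** 2 + se_b ** 2)
    return abs(mean_a - mean_b) / combined_se


# miniBUDE BM1, Billion Interactions/s (mean, SE) copied from the results above.
fastest = (214.63, 2.13)  # Znver3 + AVX-512
slowest = (211.02, 2.51)  # Znver4 + Prefer AVX-512

ratio = gap_in_combined_se(*fastest, *slowest)
print(f"gap = {ratio:.2f} combined standard errors")
```

A ratio near 1, as here, suggests the spread between configurations is of the same order as the run-to-run noise, so the ordering of these four results should be read with caution.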
OpenSSL
OpenSSL is an open-source toolkit that implements SSL (Secure Sockets Layer) and TLS (Transport Layer Security) protocols. This test profile makes use of the built-in "openssl speed" benchmarking capabilities. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org: OpenSSL 3.0, Algorithm: SHA256 (byte/s, More Is Better)
Znver3 + AVX-512: 266899089453 (SE +/- 143443542.20, N = 3), -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
Znver3: 266464361193 (SE +/- 18811489.95, N = 3), -march=znver3
Znver4 + Prefer AVX-512: 266230124070 (SE +/- 172236238.05, N = 3), -march=native
Znver4: 265326713587 (SE +/- 150345864.28, N = 3), -march=native
1. (CC) gcc options: -pthread -m64 -O3 -flto -lssl -lcrypto -ldl
Kvazaar
This is a test of Kvazaar as a CPU-based H.265/HEVC video encoder written in the C programming language and optimized in Assembly. Kvazaar is the winner of the 2016 ACM Open-Source Software Competition and developed at the Ultra Video Group, Tampere University, Finland. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org: Kvazaar 2.1, Video Input: Bosphorus 4K - Video Preset: Medium (Frames Per Second, More Is Better)
Znver4: 64.35 (SE +/- 0.52, N = 3), -march=native
Znver3: 64.27 (SE +/- 0.36, N = 3), -march=znver3
Znver4 + Prefer AVX-512: 64.05 (SE +/- 0.46, N = 3), -march=native
Znver3 + AVX-512: 63.80 (SE +/- 0.10, N = 3), -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
1. (CC) gcc options: -pthread -ftree-vectorize -fvisibility=hidden -O3 -flto -lpthread -lm -lrt
OpenBenchmarking.org: Kvazaar 2.1, Video Input: Bosphorus 4K - Video Preset: Very Fast (Frames Per Second, More Is Better)
Znver3: 72.01 (SE +/- 0.28, N = 3), -march=znver3
Znver4 + Prefer AVX-512: 70.81 (SE +/- 0.92, N = 3), -march=native
Znver3 + AVX-512: 70.71 (SE +/- 0.87, N = 4), -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
Znver4: 70.71 (SE +/- 0.83, N = 3), -march=native
1. (CC) gcc options: -pthread -ftree-vectorize -fvisibility=hidden -O3 -flto -lpthread -lm -lrt
OpenBenchmarking.org: Kvazaar 2.1, Video Input: Bosphorus 4K - Video Preset: Ultra Fast (Frames Per Second, More Is Better)
Znver4: 78.69 (SE +/- 1.01, N = 3), -march=native
Znver3: 78.01 (SE +/- 0.62, N = 3), -march=znver3
Znver3 + AVX-512: 76.76 (SE +/- 0.64, N = 15), -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
Znver4 + Prefer AVX-512: 74.98 (SE +/- 0.73, N = 15), -march=native
1. (CC) gcc options: -pthread -ftree-vectorize -fvisibility=hidden -O3 -flto -lpthread -lm -lrt
SVT-AV1
This is a benchmark of the SVT-AV1 open-source video encoder/decoder. SVT-AV1 was originally developed by Intel as part of their Open Visual Cloud / Scalable Video Technology (SVT). Development of SVT-AV1 has since moved to the Alliance for Open Media as part of upstream AV1 development. SVT-AV1 is a CPU-based multi-threaded video encoder for the AV1 video format; this test uses a sample YUV video file. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org: SVT-AV1 1.4, Encoder Mode: Preset 4 - Input: Bosphorus 4K (Frames Per Second, More Is Better)
Znver4 + Prefer AVX-512: 5.423 (SE +/- 0.043, N = 3)
Znver4: 5.392 (SE +/- 0.015, N = 3)
Znver3 + AVX-512: 5.374 (SE +/- 0.037, N = 3), -march=znver3 -mavx512cd -mavx512vl -mavx512ifma -mavx512vbmi
Znver3: 5.360 (SE +/- 0.012, N = 3), -march=znver3
1. (CXX) g++ options: -O3 -march=native -flto -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
OpenBenchmarking.org: SVT-AV1 1.4, Encoder Mode: Preset 8 - Input: Bosphorus 4K (Frames Per Second, More Is Better)
Znver3: 95.12 (SE +/- 0.46, N = 3), -march=znver3
Znver4 + Prefer AVX-512: 94.34 (SE +/- 0.32, N = 3)
Znver4: 93.76 (SE +/- 0.33, N = 3)
Znver3 + AVX-512: 92.96 (SE +/- 0.52, N = 3), -march=znver3 -mavx512cd -mavx512vl -mavx512ifma -mavx512vbmi
1. (CXX) g++ options: -O3 -flto -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
OpenBenchmarking.org: SVT-AV1 1.4, Encoder Mode: Preset 12 - Input: Bosphorus 4K (Frames Per Second, More Is Better)
Znver3: 222.90 (SE +/- 5.39, N = 15), -march=znver3
Znver4: 210.39 (SE +/- 4.58, N = 15)
Znver3 + AVX-512: 206.68 (SE +/- 3.02, N = 15), -march=znver3 -mavx512cd -mavx512vl -mavx512ifma -mavx512vbmi
Znver4 + Prefer AVX-512: 196.80 (SE +/- 1.78, N = 3)
1. (CXX) g++ options: -O3 -flto -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
OpenBenchmarking.org: SVT-AV1 1.4, Encoder Mode: Preset 13 - Input: Bosphorus 4K (Frames Per Second, More Is Better)
Znver3: 219.57 (SE +/- 4.56, N = 12), -march=znver3
Znver3 + AVX-512: 210.43 (SE +/- 2.81, N = 15), -march=znver3 -mavx512cd -mavx512vl -mavx512ifma -mavx512vbmi
Znver4 + Prefer AVX-512: 208.64 (SE +/- 3.20, N = 15)
Znver4: 196.12 (SE +/- 3.33, N = 15)
1. (CXX) g++ options: -O3 -flto -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
simdjson
This is a benchmark of SIMDJSON, a high performance JSON parser. SIMDJSON aims to be the fastest JSON parser and is used by projects like Microsoft FishStore, Yandex ClickHouse, Shopify, and others. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org: simdjson 2.0, Throughput Test: Kostya (GB/s, More Is Better)
Znver3: 4.20 (SE +/- 0.01, N = 3), -march=znver3
Znver4 + Prefer AVX-512: 4.19 (SE +/- 0.01, N = 3), -march=native
Znver4: 4.17 (SE +/- 0.02, N = 3), -march=native
Znver3 + AVX-512: 4.16 (SE +/- 0.01, N = 3), -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
1. (CXX) g++ options: -O3 -flto
OpenBenchmarking.org: simdjson 2.0, Throughput Test: TopTweet (GB/s, More Is Better)
Znver4: 6.96 (SE +/- 0.05, N = 3), -march=native
Znver4 + Prefer AVX-512: 6.91 (SE +/- 0.07, N = 3), -march=native
Znver3 + AVX-512: 6.73 (SE +/- 0.02, N = 3), -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
Znver3: 6.61 (SE +/- 0.05, N = 3), -march=znver3
1. (CXX) g++ options: -O3 -flto
OpenBenchmarking.org: simdjson 2.0, Throughput Test: LargeRandom (GB/s, More Is Better)
Znver4: 1.27 (SE +/- 0.00, N = 3), -march=native
Znver3 + AVX-512: 1.25 (SE +/- 0.00, N = 3), -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
Znver3: 1.25 (SE +/- 0.00, N = 3), -march=znver3
Znver4 + Prefer AVX-512: 1.24 (SE +/- 0.00, N = 3), -march=native
1. (CXX) g++ options: -O3 -flto
OpenBenchmarking.org: simdjson 2.0, Throughput Test: PartialTweets (GB/s, More Is Better)
Znver3: 6.74 (SE +/- 0.03, N = 3), -march=znver3
Znver3 + AVX-512: 6.68 (SE +/- 0.03, N = 3), -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
Znver4: 6.63 (SE +/- 0.03, N = 3), -march=native
Znver4 + Prefer AVX-512: 6.43 (SE +/- 0.05, N = 3), -march=native
1. (CXX) g++ options: -O3 -flto
OpenBenchmarking.org: simdjson 2.0, Throughput Test: DistinctUserID (GB/s, More Is Better)
Znver4 + Prefer AVX-512: 6.86 (SE +/- 0.07, N = 3), -march=native
Znver3 + AVX-512: 6.61 (SE +/- 0.04, N = 3), -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
Znver4: 6.53 (SE +/- 0.04, N = 3), -march=native
Znver3: 6.46 (SE +/- 0.02, N = 3), -march=znver3
1. (CXX) g++ options: -O3 -flto
miniBUDE
MiniBUDE is a mini application for the core computation of the Bristol University Docking Engine (BUDE). This test profile currently makes use of the OpenMP implementation of miniBUDE for CPU benchmarking. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org: miniBUDE 20210901, Implementation: OpenMP - Input Deck: BM1 (GFInst/s, More Is Better)
Znver3 + AVX-512: 5365.61 (SE +/- 53.26, N = 3)
Znver4: 5362.54 (SE +/- 51.63, N = 3)
Znver3: 5351.94 (SE +/- 62.89, N = 3)
Znver4 + Prefer AVX-512: 5275.43 (SE +/- 62.65, N = 4)
1. (CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm
OpenBenchmarking.org: miniBUDE 20210901, Implementation: OpenMP - Input Deck: BM2 (GFInst/s, More Is Better)
Znver3 + AVX-512: 6653.86 (SE +/- 26.43, N = 3)
Znver4 + Prefer AVX-512: 6638.01 (SE +/- 18.27, N = 3)
Znver3: 6607.09 (SE +/- 18.35, N = 3)
Znver4: 6603.68 (SE +/- 43.78, N = 3)
1. (CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm
ACES DGEMM
This is a multi-threaded DGEMM benchmark. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org: ACES DGEMM 1.0, Sustained Floating-Point Rate (GFLOP/s, More Is Better)
Znver3: 70.38 (SE +/- 0.06, N = 3), -march=znver3
Znver4 + Prefer AVX-512: 70.30 (SE +/- 0.07, N = 3)
Znver3 + AVX-512: 70.19 (SE +/- 0.28, N = 3), -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
Znver4: 70.05 (SE +/- 0.10, N = 3)
1. (CC) gcc options: -O3 -march=native -fopenmp -flto
GraphicsMagick
This is a test of GraphicsMagick with its OpenMP implementation that performs various imaging tests on a sample 6000x4000 pixel JPEG image. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org: GraphicsMagick 1.3.38, Operation: Swirl (Iterations Per Minute, More Is Better)
Znver4: 2862 (SE +/- 26.19, N = 7), -march=native
Znver3 + AVX-512: 2826 (SE +/- 28.99, N = 3), -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
Znver4 + Prefer AVX-512: 2681 (SE +/- 8.09, N = 3), -march=native
Znver3: 2563 (SE +/- 36.67, N = 3), -march=znver3
1. (CC) gcc options: -fopenmp -O3 -flto -ljbig -ltiff -lfreetype -ljpeg -lXext -lSM -lICE -lX11 -llzma -lz -lm -lpthread
OpenBenchmarking.org: GraphicsMagick 1.3.38, Operation: Rotate (Iterations Per Minute, More Is Better)
Znver4: 673 (SE +/- 1.73, N = 3), -march=native
Znver4 + Prefer AVX-512: 656 (SE +/- 1.00, N = 3), -march=native
Znver3: 645 (SE +/- 1.20, N = 3), -march=znver3
Znver3 + AVX-512: 605 (SE +/- 3.84, N = 3), -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
1. (CC) gcc options: -fopenmp -O3 -flto -ljbig -ltiff -lfreetype -ljpeg -lXext -lSM -lICE -lX11 -llzma -lz -lm -lpthread
OpenBenchmarking.org: GraphicsMagick 1.3.38, Operation: Sharpen (Iterations Per Minute, More Is Better)
Znver4: 1359 (SE +/- 6.56, N = 3), -march=native
Znver3 + AVX-512: 1321 (SE +/- 1.15, N = 3), -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
Znver4 + Prefer AVX-512: 1314 (SE +/- 13.17, N = 3), -march=native
Znver3: 1285 (SE +/- 10.73, N = 3), -march=znver3
1. (CC) gcc options: -fopenmp -O3 -flto -ljbig -ltiff -lfreetype -ljpeg -lXext -lSM -lICE -lX11 -llzma -lz -lm -lpthread
OpenBenchmarking.org: GraphicsMagick 1.3.38, Operation: Enhanced (Iterations Per Minute, More Is Better)
Znver4: 2234 (SE +/- 2.00, N = 3), -march=native
Znver3 + AVX-512: 2208 (SE +/- 14.95, N = 3), -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
Znver4 + Prefer AVX-512: 2150 (SE +/- 5.00, N = 3), -march=native
Znver3: 1837 (SE +/- 19.19, N = 3), -march=znver3
1. (CC) gcc options: -fopenmp -O3 -flto -ljbig -ltiff -lfreetype -ljpeg -lXext -lSM -lICE -lX11 -llzma -lz -lm -lpthread
OpenBenchmarking.org: GraphicsMagick 1.3.38, Operation: Resizing (Iterations Per Minute, More Is Better)
Znver3: 89 (SE +/- 1.27, N = 15), -march=znver3
Znver4: 88 (SE +/- 1.00, N = 15), -march=native
Znver4 + Prefer AVX-512: 87 (SE +/- 0.94, N = 15), -march=native
Znver3 + AVX-512: 86 (SE +/- 0.90, N = 15), -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
1. (CC) gcc options: -fopenmp -O3 -flto -ljbig -ltiff -lfreetype -ljpeg -lXext -lSM -lICE -lX11 -llzma -lz -lm -lpthread
OpenBenchmarking.org: GraphicsMagick 1.3.38, Operation: Noise-Gaussian (Iterations Per Minute, More Is Better)
Znver4: 1024 (SE +/- 6.60, N = 15), -march=native
Znver3: 1018 (SE +/- 5.13, N = 3), -march=znver3
Znver4 + Prefer AVX-512: 1013 (SE +/- 11.39, N = 15), -march=native
Znver3 + AVX-512: 975 (SE +/- 11.43, N = 15), -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
1. (CC) gcc options: -fopenmp -O3 -flto -ljbig -ltiff -lfreetype -ljpeg -lXext -lSM -lICE -lX11 -llzma -lz -lm -lpthread
OpenBenchmarking.org: GraphicsMagick 1.3.38, Operation: HWB Color Space (Iterations Per Minute, More Is Better)
Znver4: 1180 (SE +/- 4.84, N = 3), -march=native
Znver4 + Prefer AVX-512: 1167 (SE +/- 15.31, N = 15), -march=native
Znver3: 1134 (SE +/- 11.42, N = 15), -march=znver3
Znver3 + AVX-512: 1062 (SE +/- 1.86, N = 3), -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
1. (CC) gcc options: -fopenmp -O3 -flto -ljbig -ltiff -lfreetype -ljpeg -lXext -lSM -lICE -lX11 -llzma -lz -lm -lpthread
Coremark
This is a test of EEMBC CoreMark processor benchmark. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org: Coremark 1.0, CoreMark Size 666 - Iterations Per Second (Iterations/Sec, More Is Better)
Znver3 + AVX-512: 7871273.93 (SE +/- 2823.14, N = 3), -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
Znver4 + Prefer AVX-512: 7861097.86 (SE +/- 83462.96, N = 3), -march=native
Znver4: 7694653.90 (SE +/- 11460.97, N = 3), -march=native
Znver3: 7640546.27 (SE +/- 62802.17, N = 3), -march=znver3
1. (CC) gcc options: -O2 -O3 -flto -lrt" -lrt
Cpuminer-Opt
Cpuminer-Opt is a fork of cpuminer-multi that carries a wide range of CPU performance optimizations for measuring the potential cryptocurrency mining performance of the CPU/processor with a wide variety of cryptocurrencies. The benchmark reports the hash speed for the CPU mining performance for the selected cryptocurrency. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org: Cpuminer-Opt 3.20.3, Algorithm: Magi (kH/s, More Is Better)
Znver3: 8490.73 (SE +/- 51.27, N = 3), -march=znver3
Znver4 + Prefer AVX-512: 8467.24 (SE +/- 64.85, N = 3), -march=native
Znver4: 8440.79 (SE +/- 47.28, N = 3), -march=native
Znver3 + AVX-512: 8355.75 (SE +/- 50.94, N = 3), -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
1. (CXX) g++ options: -O3 -flto -lcurl -lz -lpthread -lssl -lcrypto -lgmp
OpenBenchmarking.org: Cpuminer-Opt 3.20.3, Algorithm: x25x (kH/s, More Is Better)
Znver4 + Prefer AVX-512: 8217.70 (SE +/- 90.39, N = 3), -march=native
Znver4: 8042.88 (SE +/- 74.42, N = 3), -march=native
Znver3 + AVX-512: 7941.38 (SE +/- 15.17, N = 3), -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
Znver3: 6116.97 (SE +/- 17.26, N = 3), -march=znver3
1. (CXX) g++ options: -O3 -flto -lcurl -lz -lpthread -lssl -lcrypto -lgmp
OpenBenchmarking.org: Cpuminer-Opt 3.20.3, Algorithm: scrypt (kH/s, More Is Better)
Znver4: 4790.11 (SE +/- 0.45, N = 3), -march=native
Znver4 + Prefer AVX-512: 4782.74 (SE +/- 1.22, N = 3), -march=native
Znver3 + AVX-512: 4763.91 (SE +/- 1.99, N = 3), -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
Znver3: 2959.15 (SE +/- 0.45, N = 3), -march=znver3
1. (CXX) g++ options: -O3 -flto -lcurl -lz -lpthread -lssl -lcrypto -lgmp
OpenBenchmarking.org: Cpuminer-Opt 3.20.3, Algorithm: Deepcoin (kH/s, More Is Better)
Znver3: 164993 (SE +/- 218.28, N = 3), -march=znver3
Znver4 + Prefer AVX-512: 162242 (SE +/- 1703.12, N = 5), -march=native
Znver3 + AVX-512: 160157 (SE +/- 880.18, N = 3), -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
Znver4: 159147 (SE +/- 81.72, N = 3), -march=native
1. (CXX) g++ options: -O3 -flto -lcurl -lz -lpthread -lssl -lcrypto -lgmp
OpenBenchmarking.org: Cpuminer-Opt 3.20.3, Algorithm: Garlicoin (kH/s, More Is Better)
Znver3 + AVX-512: 72837 (SE +/- 742.93, N = 3), -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
Znver4: 72413 (SE +/- 461.75, N = 3), -march=native
Znver4 + Prefer AVX-512: 72130 (SE +/- 295.35, N = 3), -march=native
Znver3: 49523 (SE +/- 66.92, N = 3), -march=znver3
1. (CXX) g++ options: -O3 -flto -lcurl -lz -lpthread -lssl -lcrypto -lgmp
OpenBenchmarking.org: Cpuminer-Opt 3.20.3, Algorithm: Skeincoin (kH/s, More Is Better)
Znver4: 2014770 (SE +/- 10323.18, N = 3), -march=native
Znver4 + Prefer AVX-512: 2009367 (SE +/- 7108.55, N = 3), -march=native
Znver3 + AVX-512: 2004990 (SE +/- 24878.62, N = 3), -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
Znver3: 1414047 (SE +/- 5094.00, N = 3), -march=znver3
1. (CXX) g++ options: -O3 -flto -lcurl -lz -lpthread -lssl -lcrypto -lgmp
OpenBenchmarking.org: Cpuminer-Opt 3.20.3, Algorithm: LBC, LBRY Credits (kH/s, More Is Better)
Znver4 + Prefer AVX-512: 1085827 (SE +/- 12123.53, N = 3), -march=native
Znver3 + AVX-512: 1067743 (SE +/- 1800.04, N = 3), -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
Znver4: 1065487 (SE +/- 829.54, N = 3), -march=native
Znver3: 497020 (SE +/- 3568.28, N = 3), -march=znver3
1. (CXX) g++ options: -O3 -flto -lcurl -lz -lpthread -lssl -lcrypto -lgmp
OpenBenchmarking.org: Cpuminer-Opt 3.20.3, Algorithm: Quad SHA-256, Pyrite (kH/s, More Is Better)
Znver4 + Prefer AVX-512: 2323995 (SE +/- 25368.27, N = 4), -march=native
Znver3 + AVX-512: 2264747 (SE +/- 22716.96, N = 3), -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
Znver4: 2251067 (SE +/- 10306.20, N = 3), -march=native
Znver3: 1378987 (SE +/- 5899.09, N = 3), -march=znver3
1. (CXX) g++ options: -O3 -flto -lcurl -lz -lpthread -lssl -lcrypto -lgmp
OpenBenchmarking.org: Cpuminer-Opt 3.20.3, Algorithm: Triple SHA-256, Onecoin (kH/s, More Is Better)
Znver3 + AVX-512: 3323680 (SE +/- 26057.45, N = 3), -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
Znver4: 3306643 (SE +/- 26496.25, N = 3), -march=native
Znver4 + Prefer AVX-512: 3301217 (SE +/- 22336.69, N = 3), -march=native
Znver3: 3255253 (SE +/- 4313.22, N = 3), -march=znver3
1. (CXX) g++ options: -O3 -flto -lcurl -lz -lpthread -lssl -lcrypto -lgmp
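The Cpuminer-Opt results vary widely by algorithm, so it can help to express each gap as a percentage change relative to the plain Znver3 build. The sketch below does that for three of the algorithms above, comparing Znver3 against Znver4. The function name `percent_uplift` and the choice of algorithms are illustrative only; the kH/s values are copied from the results.

```python
def percent_uplift(baseline, candidate):
    """Percentage change of candidate relative to baseline (positive = faster)."""
    return (candidate - baseline) / baseline * 100.0


# kH/s values copied from the Cpuminer-Opt results above: (Znver3, Znver4).
results = {
    "scrypt": (2959.15, 4790.11),
    "LBC, LBRY Credits": (497020, 1065487),
    "Magi": (8490.73, 8440.79),
}

uplifts = {name: percent_uplift(z3, z4) for name, (z3, z4) in results.items()}
for name, pct in uplifts.items():
    print(f"{name}: {pct:+.1f}%")
```

With these inputs, scrypt and LBC show large gains for the Znver4 build while Magi is essentially flat, which matches the pattern visible in the raw numbers.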
SecureMark
SecureMark is an objective, standardized benchmarking framework for measuring the efficiency of cryptographic processing solutions developed by EEMBC. SecureMark-TLS is benchmarking Transport Layer Security performance with a focus on IoT/edge computing. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org: SecureMark 1.0.4, Benchmark: SecureMark-TLS (marks, More Is Better)
Znver3 + AVX-512: 296575 (SE +/- 380.50, N = 3)
Znver4: 296548 (SE +/- 383.28, N = 3)
Znver4 + Prefer AVX-512: 294122 (SE +/- 640.61, N = 3)
Znver3: 294057 (SE +/- 1276.92, N = 3)
1. (CC) gcc options: -pedantic -O3
Zstd Compression
This test measures the time needed to compress/decompress a sample file (a FreeBSD disk image - FreeBSD-12.2-RELEASE-amd64-memstick.img) using Zstd compression with options for different compression levels / settings. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org: Zstd Compression 1.5.0, Compression Level: 19 - Compression Speed (MB/s, More Is Better)
Znver4 + Prefer AVX-512: 105.3 (SE +/- 1.30, N = 3), -march=native
Znver3: 104.4 (SE +/- 0.91, N = 15), -march=znver3
Znver3 + AVX-512: 102.9 (SE +/- 1.27, N = 3), -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
Znver4: 102.1 (SE +/- 1.03, N = 6), -march=native
1. (CC) gcc options: -O3 -flto -pthread -lz -llzma
OpenBenchmarking.org: Zstd Compression 1.5.0, Compression Level: 19 - Decompression Speed (MB/s, More Is Better)
Znver3 + AVX-512: 3594.8 (SE +/- 32.49, N = 3), -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
Znver4: 3584.1 (SE +/- 17.00, N = 6), -march=native
Znver4 + Prefer AVX-512: 3581.6 (SE +/- 36.60, N = 3), -march=native
Znver3: 3574.9 (SE +/- 12.56, N = 15), -march=znver3
1. (CC) gcc options: -O3 -flto -pthread -lz -llzma
OpenBenchmarking.org: Zstd Compression 1.5.0, Compression Level: 19, Long Mode - Compression Speed (MB/s, More Is Better)
Znver3 + AVX-512: 43.9 (SE +/- 0.79, N = 15), -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
Znver4 + Prefer AVX-512: 42.5 (SE +/- 0.59, N = 15), -march=native
Znver4: 40.8 (SE +/- 0.40, N = 15), -march=native
Znver3: 39.8 (SE +/- 0.40, N = 15), -march=znver3
1. (CC) gcc options: -O3 -flto -pthread -lz -llzma
OpenBenchmarking.org: Zstd Compression 1.5.0, Compression Level: 19, Long Mode - Decompression Speed (MB/s, More Is Better)
Znver4: 3708.8 (SE +/- 12.59, N = 15), -march=native
Znver3: 3695.5 (SE +/- 11.99, N = 15), -march=znver3
Znver4 + Prefer AVX-512: 3685.7 (SE +/- 13.90, N = 15), -march=native
Znver3 + AVX-512: 3684.7 (SE +/- 12.89, N = 15), -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
1. (CC) gcc options: -O3 -flto -pthread -lz -llzma
QuantLib
QuantLib is an open-source library/framework around quantitative finance for modeling, trading and risk management scenarios. QuantLib is written in C++ with Boost, and its built-in benchmark reports the QuantLib Benchmark Index score. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org: QuantLib 1.21 (MFLOPS, More Is Better)
Znver3: 3120.5 (SE +/- 4.52, N = 3)
Znver3 + AVX-512: 3114.9 (SE +/- 2.74, N = 3)
Znver4 + Prefer AVX-512: 3112.6 (SE +/- 1.71, N = 3)
Znver4: 3096.9 (SE +/- 1.99, N = 3)
1. (CXX) g++ options: -O3 -march=native -rdynamic
SMHasher
SMHasher is a hash function tester supporting various algorithms and able to make use of AVX and other modern CPU instruction set extensions. Learn more via the OpenBenchmarking.org test page.
Result
OpenBenchmarking.org: SMHasher 2022-08-22, Hash: FarmHash32 x86_64 AVX (MiB/sec, More Is Better)
Znver3: 40565.72 (SE +/- 0.13, N = 3), -march=znver3
Znver4: 40565.07 (SE +/- 0.95, N = 3)
Znver4 + Prefer AVX-512: 40563.96 (SE +/- 1.44, N = 3)
Znver3 + AVX-512: 40559.33 (SE +/- 1.84, N = 3), -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
1. (CXX) g++ options: -O3 -flto -march=native -flto=auto -fno-fat-lto-objects
cycles/hash
OpenBenchmarking.org: SMHasher 2022-08-22, Hash: FarmHash32 x86_64 AVX (cycles/hash, Fewer Is Better)
Znver4: 26.49 (SE +/- 0.00, N = 3)
Znver3: 26.49 (SE +/- 0.00, N = 3), -march=znver3
Znver3 + AVX-512: 26.49 (SE +/- 0.00, N = 3), -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
Znver4 + Prefer AVX-512: 26.50 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -O3 -march=native -flto -flto=auto -fno-fat-lto-objects
Result
OpenBenchmarking.org: SMHasher 2022-08-22, Hash: t1ha0_aes_avx2 x86_64 (MiB/sec, More Is Better)
Znver3 + AVX-512: 102403.92 (SE +/- 19.58, N = 3), -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
Znver4 + Prefer AVX-512: 102399.74 (SE +/- 11.55, N = 3)
Znver4: 102354.57 (SE +/- 20.00, N = 3)
Znver3: 102351.87 (SE +/- 8.85, N = 3), -march=znver3
1. (CXX) g++ options: -O3 -flto -march=native -flto=auto -fno-fat-lto-objects
cycles/hash
OpenBenchmarking.org: SMHasher 2022-08-22, Hash: t1ha0_aes_avx2 x86_64 (cycles/hash, Fewer Is Better)
Znver4: 20.52 (SE +/- 0.01, N = 3)
Znver3 + AVX-512: 20.53 (SE +/- 0.01, N = 3), -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
Znver3: 20.53 (SE +/- 0.01, N = 3), -march=znver3
Znver4 + Prefer AVX-512: 20.81 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -O3 -march=native -flto -flto=auto -fno-fat-lto-objects
Result
OpenBenchmarking.org: SMHasher 2022-08-22, Hash: MeowHash x86_64 AES-NI (MiB/sec, More Is Better)
Znver4 + Prefer AVX-512: 54297.04 (SE +/- 9.47, N = 3)
Znver3 + AVX-512: 54284.65 (SE +/- 10.52, N = 3), -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
Znver3: 54281.94 (SE +/- 9.98, N = 3), -march=znver3
Znver4: 54272.14 (SE +/- 7.42, N = 3)
1. (CXX) g++ options: -O3 -march=native -flto -flto=auto -fno-fat-lto-objects
cycles/hash
OpenBenchmarking.org: SMHasher 2022-08-22, Hash: MeowHash x86_64 AES-NI (cycles/hash, Fewer Is Better)
Znver4 + Prefer AVX-512: 44.95 (SE +/- 0.01, N = 3)
Znver3: 44.96 (SE +/- 0.01, N = 3), -march=znver3
Znver3 + AVX-512: 44.97 (SE +/- 0.00, N = 3), -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
Znver4: 44.98 (SE +/- 0.01, N = 3)
1. (CXX) g++ options: -O3 -march=native -flto -flto=auto -fno-fat-lto-objects
JPEG XL libjxl
The JPEG XL Image Coding System is designed to provide next-generation JPEG image capabilities with JPEG XL offering better image quality and compression over legacy JPEG. This test profile is currently focused on the multi-threaded JPEG XL image encode performance using the reference libjxl library. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org: JPEG XL libjxl 0.7, Input: PNG - Quality: 90 (MP/s, More Is Better)
Znver3 + AVX-512: 9.86 (SE +/- 0.07, N = 3), -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
Znver4: 9.83 (SE +/- 0.03, N = 3), -march=native
Znver4 + Prefer AVX-512: 9.81 (SE +/- 0.04, N = 3), -march=native
Znver3: 9.61 (SE +/- 0.01, N = 3), -march=znver3
1. (CXX) g++ options: -O3 -flto -fno-rtti -funwind-tables -O2 -fPIE -pie -latomic
OpenBenchmarking.org: JPEG XL libjxl 0.7, Input: JPEG - Quality: 90 (MP/s, More Is Better)
Znver4: 9.48 (SE +/- 0.05, N = 3), -march=native
Znver4 + Prefer AVX-512: 9.43 (SE +/- 0.03, N = 3), -march=native
Znver3 + AVX-512: 9.27 (SE +/- 0.13, N = 3), -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
Znver3: 9.18 (SE +/- 0.03, N = 3), -march=znver3
1. (CXX) g++ options: -O3 -flto -fno-rtti -funwind-tables -O2 -fPIE -pie -latomic
OpenBenchmarking.org: JPEG XL libjxl 0.7, Input: PNG - Quality: 100 (MP/s, More Is Better)
Znver4: 0.83 (SE +/- 0.00, N = 3), -march=native
Znver3: 0.82 (SE +/- 0.00, N = 3), -march=znver3
Znver4 + Prefer AVX-512: 0.82 (SE +/- 0.00, N = 3), -march=native
Znver3 + AVX-512: 0.81 (SE +/- 0.00, N = 3), -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
1. (CXX) g++ options: -O3 -flto -fno-rtti -funwind-tables -O2 -fPIE -pie -latomic
OpenBenchmarking.org: JPEG XL libjxl 0.7, Input: JPEG - Quality: 100 (MP/s, More Is Better)
Znver4 + Prefer AVX-512: 0.75 (SE +/- 0.01, N = 3), -march=native
Znver3: 0.74 (SE +/- 0.01, N = 6), -march=znver3
Znver4: 0.73 (SE +/- 0.01, N = 9), -march=native
Znver3 + AVX-512: 0.71 (SE +/- 0.01, N = 9), -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
1. (CXX) g++ options: -O3 -flto -fno-rtti -funwind-tables -O2 -fPIE -pie -latomic
JPEG XL Decoding libjxl The JPEG XL Image Coding System is designed to provide next-generation JPEG image capabilities with JPEG XL offering better image quality and compression over legacy JPEG. This test profile measures JPEG XL decode performance with output to a PNG file; the pts/jpegxl test profile covers encode performance. The JPEG XL encoding/decoding is done using the libjxl codebase. Learn more via the OpenBenchmarking.org test page.
JPEG XL Decoding libjxl 0.7, CPU Threads: 1 (MP/s, More Is Better)
  Znver4: 48.64 (SE +/- 0.12, N = 3)
  Znver3: 48.22 (SE +/- 0.12, N = 3)
  Znver4 + Prefer AVX-512: 48.21 (SE +/- 0.12, N = 3)
  Znver3 + AVX-512: 47.02 (SE +/- 0.25, N = 3)
JPEG XL Decoding libjxl 0.7, CPU Threads: All (MP/s, More Is Better)
  Znver4: 277.20 (SE +/- 1.41, N = 3)
  Znver4 + Prefer AVX-512: 272.31 (SE +/- 0.30, N = 3)
  Znver3: 269.59 (SE +/- 1.29, N = 3)
  Znver3 + AVX-512: 266.53 (SE +/- 1.24, N = 3)
WebP Image Encode This is a test of Google's libwebp with the cwebp image encode utility and using a sample 6000x4000 pixel JPEG image as the input. Learn more via the OpenBenchmarking.org test page.
WebP Image Encode 1.2.4, Encode Settings: Default (MP/s, More Is Better)
  Znver3: 18.99 (SE +/- 0.02, N = 3); flags: -march=znver3 -lgif -ltiff
  Znver4 + Prefer AVX-512: 18.97 (SE +/- 0.01, N = 3); flags: -march=native -ljpeg -ltiff
  Znver4: 18.95 (SE +/- 0.00, N = 3); flags: -march=native -ljpeg
  Znver3 + AVX-512: 18.85 (SE +/- 0.07, N = 3); flags: -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi -ljpeg -ltiff
  (CC) gcc options: -fvisibility=hidden -O3 -flto -lpng16 -lm
WebP Image Encode 1.2.4, Encode Settings: Quality 100 (MP/s, More Is Better)
  Znver4 + Prefer AVX-512: 11.54 (SE +/- 0.01, N = 3); flags: -march=native -ljpeg -ltiff
  Znver4: 11.54 (SE +/- 0.03, N = 3); flags: -march=native -ljpeg
  Znver3: 11.48 (SE +/- 0.01, N = 3); flags: -march=znver3 -lgif -ltiff
  Znver3 + AVX-512: 11.38 (SE +/- 0.01, N = 3); flags: -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi -ljpeg -ltiff
  (CC) gcc options: -fvisibility=hidden -O3 -flto -lm -lpng16
WebP Image Encode 1.2.4, Encode Settings: Quality 100, Lossless (MP/s, More Is Better)
  Znver3: 1.47 (SE +/- 0.00, N = 3); flags: -march=znver3 -lgif -ltiff
  Znver4 + Prefer AVX-512: 1.47 (SE +/- 0.00, N = 3); flags: -march=native -ljpeg -ltiff
  Znver3 + AVX-512: 1.45 (SE +/- 0.00, N = 3); flags: -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi -ljpeg -ltiff
  Znver4: 1.45 (SE +/- 0.00, N = 3); flags: -march=native -ljpeg
  (CC) gcc options: -fvisibility=hidden -O3 -flto -lpng16 -lm
WebP Image Encode 1.2.4, Encode Settings: Quality 100, Highest Compression (MP/s, More Is Better)
  Znver3: 3.64 (SE +/- 0.00, N = 3); flags: -march=znver3 -lgif -ltiff
  Znver4: 3.25 (SE +/- 0.00, N = 3); flags: -march=native -ljpeg
  Znver4 + Prefer AVX-512: 3.23 (SE +/- 0.01, N = 3); flags: -march=native -ljpeg -ltiff
  Znver3 + AVX-512: 3.11 (SE +/- 0.00, N = 3); flags: -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi -ljpeg -ltiff
  (CC) gcc options: -fvisibility=hidden -O3 -flto -lpng16 -lm
WebP Image Encode 1.2.4, Encode Settings: Quality 100, Lossless, Highest Compression (MP/s, More Is Better)
  Znver3 + AVX-512: 0.58 (SE +/- 0.00, N = 3); flags: -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi -ljpeg -ltiff
  Znver3: 0.58 (SE +/- 0.00, N = 3); flags: -march=znver3 -lgif -ltiff
  Znver4 + Prefer AVX-512: 0.58 (SE +/- 0.00, N = 3); flags: -march=native -ljpeg -ltiff
  Znver4: 0.57 (SE +/- 0.00, N = 3); flags: -march=native -ljpeg
  (CC) gcc options: -fvisibility=hidden -O3 -flto -lm -lpng16
ASTC Encoder ASTC Encoder (astcenc) is for the Adaptive Scalable Texture Compression (ASTC) format commonly used with OpenGL, OpenGL ES, and Vulkan graphics APIs. This test profile does a coding test of both compression/decompression. Learn more via the OpenBenchmarking.org test page.
ASTC Encoder 4.0, Preset: Medium (MT/s, More Is Better)
  Znver3 + AVX-512: 511.52 (SE +/- 5.68, N = 3); flags: -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
  Znver4: 493.22 (SE +/- 3.36, N = 13); flags: -march=native
  Znver3: 459.69 (SE +/- 6.17, N = 3); flags: -march=znver3
  Znver4 + Prefer AVX-512: 420.70 (SE +/- 0.56, N = 3); flags: -march=native
  (CXX) g++ options: -O3 -flto -pthread
ASTC Encoder 4.0, Preset: Thorough (MT/s, More Is Better)
  Znver4 + Prefer AVX-512: 118.99 (SE +/- 0.08, N = 3); flags: -march=native
  Znver4: 118.74 (SE +/- 0.12, N = 3); flags: -march=native
  Znver3 + AVX-512: 117.69 (SE +/- 0.06, N = 3); flags: -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
  Znver3: 116.09 (SE +/- 0.10, N = 3); flags: -march=znver3
  (CXX) g++ options: -O3 -flto -pthread
ASTC Encoder 4.0, Preset: Exhaustive (MT/s, More Is Better)
  Znver4 + Prefer AVX-512: 13.09 (SE +/- 0.05, N = 3); flags: -march=native
  Znver3 + AVX-512: 13.03 (SE +/- 0.01, N = 3); flags: -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
  Znver4: 12.94 (SE +/- 0.00, N = 3); flags: -march=native
  Znver3: 12.22 (SE +/- 0.03, N = 3); flags: -march=znver3
  (CXX) g++ options: -O3 -flto -pthread
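As a rough aid for reading the ASTC Encoder numbers, the following Python sketch expresses each configuration's Medium-preset throughput as a percent change relative to the plain Znver3 build. The helper name percent_vs_baseline is hypothetical and is not part of the Phoronix Test Suite; the values are copied from the Medium-preset result.

```python
# Throughput (MT/s, more is better) for ASTC Encoder 4.0, Preset: Medium,
# copied from the result for that preset.
astc_medium = {
    "Znver3 + AVX-512": 511.52,
    "Znver4": 493.22,
    "Znver3": 459.69,
    "Znver4 + Prefer AVX-512": 420.70,
}


def percent_vs_baseline(results, baseline):
    """Return each result as a percent change relative to results[baseline].

    Hypothetical helper for illustration only.
    """
    base = results[baseline]
    return {name: (value - base) / base * 100.0 for name, value in results.items()}


changes = percent_vs_baseline(astc_medium, "Znver3")
for name, pct in sorted(changes.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {pct:+.1f}%")
```

On these numbers the Znver3 + AVX-512 build lands roughly 11% ahead of plain Znver3 and the Znver4 + Prefer AVX-512 build roughly 8% behind it, which is the widest spread among the three ASTC presets.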
GROMACS The GROMACS (GROningen MAchine for Chemical Simulations) molecular dynamics package, tested with the water_GMX50 data. This test profile allows selecting between CPU and GPU-based GROMACS builds. Learn more via the OpenBenchmarking.org test page.
GROMACS 2022.1, Implementation: MPI CPU - Input: water_GMX50_bare (Ns Per Day, More Is Better)
  Znver4: 19.49 (SE +/- 0.02, N = 3); flags: -march=native
  Znver4 + Prefer AVX-512: 19.44 (SE +/- 0.01, N = 3); flags: -march=native
  Znver3 + AVX-512: 19.09 (SE +/- 0.01, N = 3); flags: -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
  Znver3: 18.23 (SE +/- 0.01, N = 3); flags: -march=znver3
  (CXX) g++ options: -O3 -flto
LAMMPS Molecular Dynamics Simulator LAMMPS is a classical molecular dynamics code, and an acronym for Large-scale Atomic/Molecular Massively Parallel Simulator. Learn more via the OpenBenchmarking.org test page.
LAMMPS Molecular Dynamics Simulator 23Jun2022, Model: 20k Atoms (ns/day, More Is Better)
  Znver3 + AVX-512: 56.04 (SE +/- 0.08, N = 3); flags: -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
  Znver4 + Prefer AVX-512: 55.74 (SE +/- 0.21, N = 3); flags: -march=native
  Znver3: 55.72 (SE +/- 0.04, N = 3); flags: -march=znver3
  Znver4: 55.62 (SE +/- 0.15, N = 3); flags: -march=native
  (CXX) g++ options: -O3 -flto -lm -ldl
LAMMPS Molecular Dynamics Simulator 23Jun2022, Model: Rhodopsin Protein (ns/day, More Is Better)
  Znver4 + Prefer AVX-512: 52.29 (SE +/- 0.48, N = 3); flags: -march=native
  Znver3 + AVX-512: 51.82 (SE +/- 0.40, N = 10); flags: -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
  Znver4: 51.59 (SE +/- 0.21, N = 3); flags: -march=native
  Znver3: 51.46 (SE +/- 0.43, N = 3); flags: -march=znver3
  (CXX) g++ options: -O3 -flto -lm -ldl
Stargate Digital Audio Workstation Stargate is an open-source, cross-platform digital audio workstation (DAW) software package with "a unique and carefully curated experience" with scalability from old systems up through modern multi-core systems. Stargate is GPLv3 licensed and makes use of Qt5 (PyQt5) for its user-interface. Learn more via the OpenBenchmarking.org test page.
Stargate Digital Audio Workstation 22.11.5, Sample Rate: 96000 - Buffer Size: 1024 (Render Ratio, More Is Better)
  Znver3 + AVX-512: 4.413379 (SE +/- 0.011844, N = 3)
  Znver4 + Prefer AVX-512: 4.408290 (SE +/- 0.013873, N = 3)
  Znver3: 4.356446 (SE +/- 0.016378, N = 3)
  Znver4: 4.331230 (SE +/- 0.006484, N = 3)
  (CXX) g++ options: -lpthread -lsndfile -lm -O3 -march=native -ffast-math -funroll-loops -fstrength-reduce -fstrict-aliasing -finline-functions
Stargate Digital Audio Workstation 22.11.5, Sample Rate: 192000 - Buffer Size: 1024 (Render Ratio, More Is Better)
  Znver3 + AVX-512: 2.868373 (SE +/- 0.000776, N = 3)
  Znver4 + Prefer AVX-512: 2.867233 (SE +/- 0.003508, N = 3)
  Znver3: 2.820079 (SE +/- 0.014903, N = 3)
  Znver4: 2.783717 (SE +/- 0.009628, N = 3)
  (CXX) g++ options: -lpthread -lsndfile -lm -O3 -march=native -ffast-math -funroll-loops -fstrength-reduce -fstrict-aliasing -finline-functions
PJSIP PJSIP is a free and open source multimedia communication library written in the C language, implementing standards-based protocols such as SIP, SDP, RTP, STUN, TURN, and ICE. It combines the signaling protocol (SIP) with a rich multimedia framework and NAT traversal functionality into a high-level API that is portable and suitable for almost any type of system, ranging from desktops and embedded systems to mobile handsets. This test profile makes use of pjsip-perf with both the client and server on the same system. More details on the PJSIP benchmark at https://www.pjsip.org/high-performance-sip.htm Learn more via the OpenBenchmarking.org test page.
PJSIP 2.11, Method: INVITE (Responses Per Second, More Is Better)
  Znver4: 5200 (SE +/- 19.91, N = 3); flags: -march=native
  Znver3: 5149 (SE +/- 15.00, N = 3); flags: -march=znver3
  Znver3 + AVX-512: 5132 (SE +/- 24.85, N = 3); flags: -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
  Znver4 + Prefer AVX-512: 5084 (SE +/- 51.36, N = 5); flags: -march=native
  (CC) gcc options: -lavformat -lavcodec -lswscale -lavutil -lstdc++ -lopus -lssl -lcrypto -lm -lrt -lpthread -lasound -O3 -flto
PJSIP 2.11, Method: OPTIONS, Stateful (Responses Per Second, More Is Better)
  Znver4 + Prefer AVX-512: 9288 (SE +/- 32.42, N = 3); flags: -march=native
  Znver4: 9237 (SE +/- 23.25, N = 3); flags: -march=native
  Znver3 + AVX-512: 9236 (SE +/- 36.35, N = 3); flags: -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
  Znver3: 9226 (SE +/- 72.95, N = 3); flags: -march=znver3
  (CC) gcc options: -lavformat -lavcodec -lswscale -lavutil -lstdc++ -lopus -lssl -lcrypto -lm -lrt -lpthread -lasound -O3 -flto
PJSIP 2.11, Method: OPTIONS, Stateless (Responses Per Second, More Is Better)
  Znver3: 336885 (SE +/- 4433.83, N = 12); flags: -march=znver3
  Znver4 + Prefer AVX-512: 336791 (SE +/- 5240.59, N = 15); flags: -march=native
  Znver3 + AVX-512: 336615 (SE +/- 3613.75, N = 15); flags: -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
  Znver4: 335767 (SE +/- 2531.21, N = 3); flags: -march=native
  (CC) gcc options: -lavformat -lavcodec -lswscale -lavutil -lstdc++ -lopus -lssl -lcrypto -lm -lrt -lpthread -lasound -O3 -flto
Liquid-DSP LiquidSDR's Liquid-DSP is a software-defined radio (SDR) digital signal processing library. This test profile runs a multi-threaded benchmark of this SDR/DSP library focused on embedded platform usage. Learn more via the OpenBenchmarking.org test page.
Liquid-DSP 2021.01.31, Threads: 128 - Buffer Length: 256 - Filter Length: 57 (samples/s, More Is Better)
  Znver3 + AVX-512: 6999233333 (SE +/- 4658445.14, N = 3); flags: -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
  Znver4: 6990866667 (SE +/- 3192874.01, N = 3); flags: -march=native
  Znver4 + Prefer AVX-512: 6977633333 (SE +/- 6145549.43, N = 3); flags: -march=native
  Znver3: 6940833333 (SE +/- 6548367.06, N = 3); flags: -march=znver3
  (CC) gcc options: -O3 -flto -pthread -lm -lc -lliquid
Liquid-DSP 2021.01.31, Threads: 256 - Buffer Length: 256 - Filter Length: 57 (samples/s, More Is Better)
  Znver3 + AVX-512: 9813700000 (SE +/- 4697162.26, N = 3); flags: -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
  Znver4 + Prefer AVX-512: 9809800000 (SE +/- 8533658.85, N = 3); flags: -march=native
  Znver4: 9789500000 (SE +/- 5651843.36, N = 3); flags: -march=native
  Znver3: 9735666667 (SE +/- 24004606.04, N = 3); flags: -march=znver3
  (CC) gcc options: -O3 -flto -pthread -lm -lc -lliquid
Liquid-DSP 2021.01.31, Threads: 384 - Buffer Length: 256 - Filter Length: 57 (samples/s, More Is Better)
  Znver3 + AVX-512: 11301666667 (SE +/- 2403700.85, N = 3); flags: -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
  Znver4 + Prefer AVX-512: 11270666667 (SE +/- 7218802.61, N = 3); flags: -march=native
  Znver4: 11249333333 (SE +/- 7356025.50, N = 3); flags: -march=native
  Znver3: 11176000000 (SE +/- 10066445.91, N = 3); flags: -march=znver3
  (CC) gcc options: -O3 -flto -pthread -lm -lc -lliquid
OpenSSL OpenSSL is an open-source toolkit that implements SSL (Secure Sockets Layer) and TLS (Transport Layer Security) protocols. This test profile makes use of the built-in "openssl speed" benchmarking capabilities. Learn more via the OpenBenchmarking.org test page.
OpenSSL 3.0, Algorithm: RSA4096 (sign/s, More Is Better)
  Znver3 + AVX-512: 44499.3 (SE +/- 0.44, N = 3); flags: -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
  Znver4: 44490.1 (SE +/- 8.03, N = 3); flags: -march=native
  Znver3: 44435.3 (SE +/- 22.49, N = 3); flags: -march=znver3
  Znver4 + Prefer AVX-512: 44301.8 (SE +/- 151.57, N = 3); flags: -march=native
  (CC) gcc options: -pthread -m64 -O3 -flto -lssl -lcrypto -ldl
Kripke Kripke is a simple, scalable, 3D Sn deterministic particle transport code. Its primary purpose is to research how data layout, programming paradigms and architectures affect the implementation and performance of Sn transport. Kripke is developed by LLNL. Learn more via the OpenBenchmarking.org test page.
Kripke 1.2.4 (Throughput FoM, More Is Better)
  Znver3 + AVX-512: 271735708 (SE +/- 3107958.27, N = 12); flags: -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
  Znver3: 263812847 (SE +/- 2950522.72, N = 15); flags: -march=znver3
  Znver4: 261562280 (SE +/- 3293132.01, N = 15); flags: -march=native
  Znver4 + Prefer AVX-512: 254648533 (SE +/- 2000945.99, N = 3); flags: -march=native
  (CXX) g++ options: -O3 -flto -fopenmp
OpenSSL 3.0, Algorithm: RSA4096 (verify/s, More Is Better)
  Znver4: 2939503.5 (SE +/- 140.73, N = 3); flags: -march=native
  Znver3: 2938488.5 (SE +/- 228.34, N = 3); flags: -march=znver3
  Znver3 + AVX-512: 2935372.1 (SE +/- 1641.03, N = 3); flags: -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
  Znver4 + Prefer AVX-512: 2924586.4 (SE +/- 10617.65, N = 3); flags: -march=native
  (CC) gcc options: -pthread -m64 -O3 -flto -lssl -lcrypto -ldl
oneDNN This is a test of Intel oneDNN, an Intel-optimized library for Deep Neural Networks, making use of its built-in benchdnn functionality. The result is the total perf time reported. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and MKL-DNN before being rebranded as part of the Intel oneAPI toolkit. Learn more via the OpenBenchmarking.org test page.
oneDNN 3.0, Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  Znver3 + AVX-512: 14.25 (SE +/- 0.42, N = 12; MIN: 6.05); flags: -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
  Znver4 + Prefer AVX-512: 14.29 (SE +/- 0.24, N = 15; MIN: 8.38)
  Znver3: 14.62 (SE +/- 0.34, N = 12; MIN: 8.56); flags: -march=znver3
  Znver4: 14.82 (SE +/- 0.11, N = 3; MIN: 9.81)
  (CXX) g++ options: -O3 -march=native -flto -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
oneDNN 3.0, Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  Znver4 + Prefer AVX-512: 0.881303 (SE +/- 0.004696, N = 3; MIN: 0.74)
  Znver4: 0.886642 (SE +/- 0.004516, N = 3; MIN: 0.76)
  Znver3 + AVX-512: 0.902303 (SE +/- 0.002136, N = 3; MIN: 0.75); flags: -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
  Znver3: 0.907941 (SE +/- 0.011617, N = 3; MIN: 0.75); flags: -march=znver3
  (CXX) g++ options: -O3 -march=native -flto -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
oneDNN 3.0, Harness: IP Shapes 1D - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
  Znver3: 19.59 (SE +/- 0.82, N = 15; MIN: 9.21); flags: -march=znver3
  Znver4: 21.14 (SE +/- 0.95, N = 15; MIN: 9.61)
  Znver3 + AVX-512: 22.94 (SE +/- 0.76, N = 15; MIN: 9.35); flags: -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
  Znver4 + Prefer AVX-512: 23.53 (SE +/- 0.56, N = 12; MIN: 10.06)
  (CXX) g++ options: -O3 -march=native -flto -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
oneDNN 3.0, Harness: IP Shapes 3D - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
  Znver4: 4.26600 (SE +/- 0.04728, N = 15; MIN: 2.83)
  Znver4 + Prefer AVX-512: 4.33366 (SE +/- 0.04436, N = 15; MIN: 2.77)
  Znver3 + AVX-512: 4.33658 (SE +/- 0.03566, N = 9; MIN: 2.98); flags: -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
  Znver3: 4.52440 (SE +/- 0.04256, N = 15; MIN: 3.03); flags: -march=znver3
  (CXX) g++ options: -O3 -march=native -flto -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
oneDNN 3.0, Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  Znver4: 0.380486 (SE +/- 0.000892, N = 3; MIN: 0.28)
  Znver4 + Prefer AVX-512: 0.388483 (SE +/- 0.003460, N = 3; MIN: 0.28)
  Znver3: 0.392765 (SE +/- 0.000630, N = 3; MIN: 0.28); flags: -march=znver3
  Znver3 + AVX-512: 0.392875 (SE +/- 0.004423, N = 3; MIN: 0.28); flags: -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
  (CXX) g++ options: -O3 -march=native -flto -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
oneDNN 3.0, Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  Znver3 + AVX-512: 0.936374 (SE +/- 0.011238, N = 4; MIN: 0.79); flags: -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
  Znver4: 0.937164 (SE +/- 0.012915, N = 3; MIN: 0.77)
  Znver3: 0.947013 (SE +/- 0.009578, N = 5; MIN: 0.79); flags: -march=znver3
  Znver4 + Prefer AVX-512: 0.954140 (SE +/- 0.009537, N = 3; MIN: 0.78)
  (CXX) g++ options: -O3 -march=native -flto -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
oneDNN 3.0, Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  Znver4 + Prefer AVX-512: 0.274662 (SE +/- 0.001864, N = 3; MIN: 0.24)
  Znver4: 0.274769 (SE +/- 0.001957, N = 12; MIN: 0.24)
  Znver3 + AVX-512: 0.276965 (SE +/- 0.003530, N = 3; MIN: 0.24); flags: -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
  Znver3: 0.279655 (SE +/- 0.002168, N = 15; MIN: 0.23); flags: -march=znver3
  (CXX) g++ options: -O3 -march=native -flto -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
oneDNN 3.0, Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  Znver4: 2020.24 (SE +/- 16.34, N = 3; MIN: 1878.54)
  Znver3 + AVX-512: 2099.43 (SE +/- 18.84, N = 15; MIN: 1917.53); flags: -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
  Znver3: 2108.46 (SE +/- 26.49, N = 3; MIN: 1945.83); flags: -march=znver3
  Znver4 + Prefer AVX-512: 2123.35 (SE +/- 26.64, N = 12; MIN: 1873.09)
  (CXX) g++ options: -O3 -march=native -flto -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
oneDNN 3.0, Harness: Convolution Batch Shapes Auto - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
  Znver3 + AVX-512: 0.431299 (SE +/- 0.004441, N = 15; MIN: 0.33); flags: -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
  Znver3: 0.442748 (SE +/- 0.002906, N = 3; MIN: 0.34); flags: -march=znver3
  Znver4: 0.443891 (SE +/- 0.001884, N = 3; MIN: 0.37)
  Znver4 + Prefer AVX-512: 0.445532 (SE +/- 0.000431, N = 3; MIN: 0.34)
  (CXX) g++ options: -O3 -march=native -flto -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
oneDNN 3.0, Harness: Deconvolution Batch shapes_1d - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
  Znver4 + Prefer AVX-512: 2.29880 (SE +/- 0.00909, N = 3; MIN: 1.92)
  Znver3: 2.31858 (SE +/- 0.00532, N = 3; MIN: 1.91); flags: -march=znver3
  Znver4: 2.33363 (SE +/- 0.00839, N = 3; MIN: 1.95)
  Znver3 + AVX-512: 2.36166 (SE +/- 0.00584, N = 3; MIN: 1.92); flags: -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
  (CXX) g++ options: -O3 -march=native -flto -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
oneDNN 3.0, Harness: Deconvolution Batch shapes_3d - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
  Znver3 + AVX-512: 0.647087 (SE +/- 0.003927, N = 3; MIN: 0.56); flags: -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
  Znver4 + Prefer AVX-512: 0.650067 (SE +/- 0.001081, N = 3; MIN: 0.53)
  Znver3: 0.656342 (SE +/- 0.006925, N = 4; MIN: 0.54); flags: -march=znver3
  Znver4: 0.658101 (SE +/- 0.005902, N = 3; MIN: 0.53)
  (CXX) g++ options: -O3 -march=native -flto -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
oneDNN 3.0, Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  Znver4 + Prefer AVX-512: 2359.29 (SE +/- 31.29, N = 15; MIN: 2066.84)
  Znver3 + AVX-512: 2377.17 (SE +/- 26.39, N = 15; MIN: 2080.2); flags: -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
  Znver4: 2405.63 (SE +/- 20.65, N = 15; MIN: 2158.4)
  Znver3: 2442.35 (SE +/- 32.77, N = 12; MIN: 2123.49); flags: -march=znver3
  (CXX) g++ options: -O3 -march=native -flto -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
oneDNN 3.0, Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
  Znver4 + Prefer AVX-512: 2070.13 (SE +/- 16.85, N = 3; MIN: 1938.32)
  Znver3: 2093.75 (SE +/- 27.68, N = 15; MIN: 1826.26); flags: -march=znver3
  Znver3 + AVX-512: 2103.41 (SE +/- 15.62, N = 15; MIN: 1883.21); flags: -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
  Znver4: 2134.85 (SE +/- 9.82, N = 3; MIN: 2008.7)
  (CXX) g++ options: -O3 -march=native -flto -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
oneDNN 3.0, Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
  Znver3: 2418.28 (SE +/- 24.11, N = 15; MIN: 2149.67); flags: -march=znver3
  Znver4 + Prefer AVX-512: 2444.52 (SE +/- 25.52, N = 3; MIN: 2258.45)
  Znver4: 2479.72 (SE +/- 24.81, N = 3; MIN: 2315.89)
  Znver3 + AVX-512: 2531.42 (SE +/- 16.53, N = 3; MIN: 2281.87); flags: -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
  (CXX) g++ options: -O3 -march=native -flto -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
oneDNN 3.0, Harness: Matrix Multiply Batch Shapes Transformer - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
  Znver4 + Prefer AVX-512: 0.483979 (SE +/- 0.003087, N = 3; MIN: 0.39)
  Znver4: 0.500579 (SE +/- 0.004211, N = 3; MIN: 0.39)
  Znver3 + AVX-512: 0.504842 (SE +/- 0.006958, N = 3; MIN: 0.39); flags: -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
  Znver3: 0.510939 (SE +/- 0.006281, N = 3; MIN: 0.39); flags: -march=znver3
  (CXX) g++ options: -O3 -march=native -flto -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
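Several of the oneDNN gaps are small next to the reported standard errors. A minimal Python sketch, assuming the runs are independent and that each reported SE is the standard error of the mean, expresses a gap in units of the combined standard error sqrt(se_a^2 + se_b^2). The function name is hypothetical, and reading a value above roughly 2 as a real difference is a common rule of thumb rather than anything the result page states.

```python
import math


def difference_in_se_units(mean_a, se_a, mean_b, se_b):
    """Difference of two means divided by the standard error of that difference.

    Assumes independent runs, so the SE of the difference is
    sqrt(se_a**2 + se_b**2). Hypothetical helper for illustration only.
    """
    return (mean_b - mean_a) / math.sqrt(se_a ** 2 + se_b ** 2)


# oneDNN 3.0, IP Shapes 1D, bf16bf16bf16 (ms, fewer is better): (mean, SE).
znver3 = (19.59, 0.82)
znver4_prefer_512 = (23.53, 0.56)
wide_gap = difference_in_se_units(*znver3, *znver4_prefer_512)

# oneDNN 3.0, Deconvolution Batch shapes_3d, u8s8f32: (mean, SE).
znver4_prefer_512_deconv = (0.274662, 0.001864)
znver4_deconv = (0.274769, 0.001957)
narrow_gap = difference_in_se_units(*znver4_prefer_512_deconv, *znver4_deconv)

print(f"IP Shapes 1D bf16: {wide_gap:.1f} combined SEs apart")
print(f"Deconvolution shapes_3d u8s8f32: {narrow_gap:.2f} combined SEs apart")
```

Under these assumptions the IP Shapes 1D bf16 gap is about four combined standard errors, while the Deconvolution shapes_3d gap is a small fraction of one and is best read as a tie.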
libavif avifenc This is a test of the AOMedia libavif library testing the encoding of a JPEG image to AV1 Image Format (AVIF). Learn more via the OpenBenchmarking.org test page.
libavif avifenc 0.11, Encoder Speed: 0 (Seconds, Fewer Is Better)
  Znver4 + Prefer AVX-512: 61.07 (SE +/- 0.07, N = 3); flags: -march=native
  Znver4: 61.12 (SE +/- 0.13, N = 3); flags: -march=native
  Znver3: 61.66 (SE +/- 0.30, N = 3); flags: -march=znver3
  Znver3 + AVX-512: 62.68 (SE +/- 0.02, N = 3); flags: -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
  (CXX) g++ options: -O3 -fPIC -flto -lm
libavif avifenc 0.11, Encoder Speed: 2 (Seconds, Fewer Is Better)
  Znver4 + Prefer AVX-512: 33.90 (SE +/- 0.19, N = 3); flags: -march=native
  Znver3: 33.91 (SE +/- 0.25, N = 3); flags: -march=znver3
  Znver4: 34.10 (SE +/- 0.26, N = 3); flags: -march=native
  Znver3 + AVX-512: 34.23 (SE +/- 0.15, N = 3); flags: -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
  (CXX) g++ options: -O3 -fPIC -flto -lm
libavif avifenc 0.11, Encoder Speed: 6 (Seconds, Fewer Is Better)
  Znver4 + Prefer AVX-512: 2.317 (SE +/- 0.002, N = 3); flags: -march=native
  Znver3: 2.331 (SE +/- 0.010, N = 3); flags: -march=znver3
  Znver4: 2.347 (SE +/- 0.014, N = 3); flags: -march=native
  Znver3 + AVX-512: 2.406 (SE +/- 0.005, N = 3); flags: -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
  (CXX) g++ options: -O3 -fPIC -flto -lm
libavif avifenc 0.11, Encoder Speed: 6, Lossless (Seconds, Fewer Is Better)
  Znver4: 4.398 (SE +/- 0.052, N = 4); flags: -march=native
  Znver3: 4.437 (SE +/- 0.027, N = 3); flags: -march=znver3
  Znver4 + Prefer AVX-512: 4.462 (SE +/- 0.064, N = 3); flags: -march=native
  Znver3 + AVX-512: 4.514 (SE +/- 0.010, N = 3); flags: -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
  (CXX) g++ options: -O3 -fPIC -flto -lm
libavif avifenc 0.11, Encoder Speed: 10, Lossless (Seconds, Fewer Is Better)
  Znver4: 3.541 (SE +/- 0.012, N = 3); flags: -march=native
  Znver4 + Prefer AVX-512: 3.572 (SE +/- 0.017, N = 3); flags: -march=native
  Znver3: 3.581 (SE +/- 0.016, N = 3); flags: -march=znver3
  Znver3 + AVX-512: 3.647 (SE +/- 0.038, N = 3); flags: -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
  (CXX) g++ options: -O3 -fPIC -flto -lm
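To summarize the five libavif results in one number per configuration, a geometric mean is a common choice because it weights the 60-second and 2-second tests by ratio rather than by absolute time. The sketch below is an illustration only and does not reproduce any summary computed by OpenBenchmarking.org; the dictionary name is hypothetical and the times are copied from the five avifenc results.

```python
from statistics import geometric_mean

# libavif avifenc 0.11 encode times in seconds (fewer is better), in the order:
# Speed 0, Speed 2, Speed 6, Speed 6 Lossless, Speed 10 Lossless.
avifenc_seconds = {
    "Znver4": [61.12, 34.10, 2.347, 4.398, 3.541],
    "Znver4 + Prefer AVX-512": [61.07, 33.90, 2.317, 4.462, 3.572],
    "Znver3": [61.66, 33.91, 2.331, 4.437, 3.581],
    "Znver3 + AVX-512": [62.68, 34.23, 2.406, 4.514, 3.647],
}

# One geometric mean per configuration; lower is better for timings.
geomeans = {name: geometric_mean(times) for name, times in avifenc_seconds.items()}

for name, value in sorted(geomeans.items(), key=lambda item: item[1]):
    print(f"{name}: {value:.3f} s")

slowest = max(geomeans, key=geomeans.get)
print(f"Slowest configuration by geometric mean: {slowest}")
```

Because Znver3 + AVX-512 is the slowest configuration in each of the five tests, it also has the largest geometric mean; the other three configurations sit within about one percent of each other.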
Ngspice Ngspice is an open-source SPICE circuit simulator. Ngspice was originally based on the Berkeley SPICE electronic circuit simulator. Ngspice supports basic threading using OpenMP. This test profile is making use of the ISCAS 85 benchmark circuits. Learn more via the OpenBenchmarking.org test page.
Ngspice 34, Circuit: C2670 (Seconds, Fewer Is Better)
  Znver4: 95.07 (SE +/- 0.76, N = 3); flags: -march=native
  Znver3: 95.18 (SE +/- 0.09, N = 3); flags: -march=znver3
  Znver3 + AVX-512: 95.19 (SE +/- 0.24, N = 3); flags: -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
  Znver4 + Prefer AVX-512: 95.60 (SE +/- 0.17, N = 3); flags: -march=native
  (CC) gcc options: -O3 -flto -fopenmp -lm -lfftw3 -lXaw -lXmu -lXt -lXext -lX11 -lSM -lICE
Ngspice 34, Circuit: C7552 (Seconds, Fewer Is Better)
  Znver4: 92.10 (SE +/- 1.00, N = 3); flags: -march=native
  Znver3: 92.94 (SE +/- 0.28, N = 3); flags: -march=znver3
  Znver4 + Prefer AVX-512: 93.45 (SE +/- 0.14, N = 3); flags: -march=native
  Znver3 + AVX-512: 93.52 (SE +/- 0.28, N = 3); flags: -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
  (CC) gcc options: -O3 -flto -fopenmp -lm -lfftw3 -lXaw -lXmu -lXt -lXext -lX11 -lSM -lICE
GPAW GPAW is a density-functional theory (DFT) Python code based on the projector-augmented wave (PAW) method and the atomic simulation environment (ASE). Learn more via the OpenBenchmarking.org test page.
GPAW 22.1, Input: Carbon Nanotube (Seconds, Fewer Is Better)
  Znver3: 21.72 (SE +/- 0.26, N = 4); flags: -march=znver3
  Znver4: 22.35 (SE +/- 0.15, N = 3); flags: -march=native
  Znver4 + Prefer AVX-512: 22.82 (SE +/- 0.28, N = 3); flags: -march=native
  Znver3 + AVX-512: 22.82 (SE +/- 0.07, N = 3); flags: -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
  (CC) gcc options: -shared -fwrapv -O2 -O3 -flto -lxc -lblas -lmpi
Znver4
Kernel Notes: Transparent Huge Pages: madvise
Environment Notes: CXXFLAGS="-O3 -march=native -flto" CFLAGS="-O3 -march=native -flto"
Compiler Notes: --disable-multilib --enable-checking=release
Processor Notes: Scaling Governor: amd-pstate performance (Boost: Enabled) - CPU Microcode: 0xa10110d
Python Notes: Python 3.10.9
Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: always-on RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
Testing initiated at 3 January 2023 08:40 by user phoronix.
Znver4 + Prefer AVX-512
Kernel Notes: Transparent Huge Pages: madvise
Environment Notes: CXXFLAGS="-O3 -march=native -mprefer-vector-width=512 -flto" CFLAGS="-O3 -march=native -mprefer-vector-width=512 -flto"
Compiler Notes: --disable-multilib --enable-checking=release
Processor Notes: Scaling Governor: amd-pstate performance (Boost: Enabled) - CPU Microcode: 0xa10110d
Python Notes: Python 3.10.9
Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: always-on RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
Testing initiated at 3 January 2023 18:42 by user phoronix.
Znver3
Kernel Notes: Transparent Huge Pages: madvise
Environment Notes: CXXFLAGS="-O3 -march=znver3 -flto" CFLAGS="-O3 -march=znver3 -flto"
Compiler Notes: --disable-multilib --enable-checking=release
Processor Notes: Scaling Governor: amd-pstate performance (Boost: Enabled) - CPU Microcode: 0xa10110d
Python Notes: Python 3.10.9
Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: always-on RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
Testing initiated at 4 January 2023 05:51 by user phoronix.
Znver3 + AVX-512
Processor: 2 x AMD EPYC 9654 96-Core @ 3.71GHz (192 Cores / 384 Threads), Motherboard: AMD Titanite_4G (RTI1002E BIOS), Chipset: AMD Device 14a4, Memory: 1520GB, Disk: 800GB INTEL SSDPF21Q800GB, Graphics: ASPEED, Monitor: VGA HDMI, Network: Broadcom NetXtreme BCM5720 PCIe
OS: Ubuntu 23.04, Kernel: 5.19.0-21-generic (x86_64), Desktop: GNOME Shell 43.1, Display Server: X Server 1.21.1.4, Vulkan: 1.3.224, Compiler: GCC 13.0.0 20230103, File-System: ext4, Screen Resolution: 1920x1080
Kernel Notes: Transparent Huge Pages: madvise
Environment Notes: CXXFLAGS="-O3 -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi -flto" CFLAGS="-O3 -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi -flto"
Compiler Notes: --disable-multilib --enable-checking=release
Processor Notes: Scaling Governor: amd-pstate performance (Boost: Enabled) - CPU Microcode: 0xa10110d
Python Notes: Python 3.10.9
Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: always-on RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
Testing initiated at 4 January 2023 13:46 by user phoronix.
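The four configurations differ only in a handful of compiler flags. As a small aid, the Python sketch below lists the flags each variant adds on top of its base configuration, using the CFLAGS strings from the Environment Notes. The helper extra_flags is hypothetical, and the sketch only compares flag strings; it does not inspect what GCC actually enables for -march=native on this system.

```python
import shlex

# CFLAGS strings copied from the Environment Notes of each configuration.
cflags = {
    "Znver4": "-O3 -march=native -flto",
    "Znver4 + Prefer AVX-512": "-O3 -march=native -mprefer-vector-width=512 -flto",
    "Znver3": "-O3 -march=znver3 -flto",
    "Znver3 + AVX-512": (
        "-O3 -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw "
        "-mavx512dq -mavx512ifma -mavx512vbmi -flto"
    ),
}


def extra_flags(config, baseline):
    """Flags present in `config` but not in `baseline`, in their original order.

    Hypothetical helper for illustration only.
    """
    base = set(shlex.split(cflags[baseline]))
    return [flag for flag in shlex.split(cflags[config]) if flag not in base]


print(extra_flags("Znver4 + Prefer AVX-512", "Znver4"))
print(extra_flags("Znver3 + AVX-512", "Znver3"))
```

The first call shows that the Prefer AVX-512 run differs from the plain Znver4 run by a single flag, while the Znver3 + AVX-512 run adds seven individual AVX-512 feature flags to the Znver3 target.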