AMD EPYC 9654 GCC 13 development compiler benchmarks by Michael Larabel for a future article.
Znver4
Kernel Notes: Transparent Huge Pages: madvise
Environment Notes: CXXFLAGS="-O3 -march=native -flto" CFLAGS="-O3 -march=native -flto"
Compiler Notes: --disable-multilib --enable-checking=release
Processor Notes: Scaling Governor: amd-pstate performance (Boost: Enabled) - CPU Microcode: 0xa10110d
Python Notes: Python 3.10.9
Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: always-on RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
Znver4 + Prefer AVX-512
Kernel Notes: Transparent Huge Pages: madvise
Environment Notes: CXXFLAGS="-O3 -march=native -mprefer-vector-width=512 -flto" CFLAGS="-O3 -march=native -mprefer-vector-width=512 -flto"
Compiler Notes: --disable-multilib --enable-checking=release
Processor Notes: Scaling Governor: amd-pstate performance (Boost: Enabled) - CPU Microcode: 0xa10110d
Python Notes: Python 3.10.9
Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: always-on RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
Znver3
Kernel Notes: Transparent Huge Pages: madvise
Environment Notes: CXXFLAGS="-O3 -march=znver3 -flto" CFLAGS="-O3 -march=znver3 -flto"
Compiler Notes: --disable-multilib --enable-checking=release
Processor Notes: Scaling Governor: amd-pstate performance (Boost: Enabled) - CPU Microcode: 0xa10110d
Python Notes: Python 3.10.9
Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: always-on RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
Znver3 + AVX-512
Processor: 2 x AMD EPYC 9654 96-Core @ 3.71GHz (192 Cores / 384 Threads), Motherboard: AMD Titanite_4G (RTI1002E BIOS), Chipset: AMD Device 14a4, Memory: 1520GB, Disk: 800GB INTEL SSDPF21Q800GB, Graphics: ASPEED, Monitor: VGA HDMI, Network: Broadcom NetXtreme BCM5720 PCIe
OS: Ubuntu 23.04, Kernel: 5.19.0-21-generic (x86_64), Desktop: GNOME Shell 43.1, Display Server: X Server 1.21.1.4, Vulkan: 1.3.224, Compiler: GCC 13.0.0 20230103, File-System: ext4, Screen Resolution: 1920x1080
Kernel Notes: Transparent Huge Pages: madvise
Environment Notes: CXXFLAGS="-O3 -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi -flto" CFLAGS="-O3 -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi -flto"
Compiler Notes: --disable-multilib --enable-checking=release
Processor Notes: Scaling Governor: amd-pstate performance (Boost: Enabled) - CPU Microcode: 0xa10110d
Python Notes: Python 3.10.9
Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: always-on RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
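To make the difference between the four build configurations easier to see, here is a minimal Python sketch that compares the flag strings from the Environment Notes. The flag strings are copied from the notes above; the names `CONFIGS` and `extra_flags` are hypothetical helpers introduced only for this illustration.

```python
# Flag strings copied from the Environment Notes (CFLAGS and CXXFLAGS are identical per configuration).
CONFIGS = {
    "Znver4": "-O3 -march=native -flto",
    "Znver4 + Prefer AVX-512": "-O3 -march=native -mprefer-vector-width=512 -flto",
    "Znver3": "-O3 -march=znver3 -flto",
    "Znver3 + AVX-512": (
        "-O3 -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw "
        "-mavx512dq -mavx512ifma -mavx512vbmi -flto"
    ),
}


def extra_flags(config, baseline):
    """Return the flags used by `config` that `baseline` does not use, in order."""
    base = set(CONFIGS[baseline].split())
    return [flag for flag in CONFIGS[config].split() if flag not in base]


if __name__ == "__main__":
    print(extra_flags("Znver3 + AVX-512", "Znver3"))
    print(extra_flags("Znver4 + Prefer AVX-512", "Znver4"))
```

Read this way, the "Znver3 + AVX-512" build is the Znver3 build plus seven explicit AVX-512 feature flags, and the "Prefer AVX-512" build differs from plain Znver4 only by `-mprefer-vector-width=512`.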
miniBUDE MiniBUDE is a mini application for the core computation of the Bristol University Docking Engine (BUDE). This test profile currently makes use of the OpenMP implementation of miniBUDE for CPU benchmarking. Learn more via the OpenBenchmarking.org test page.
miniBUDE 20210901 - Implementation: OpenMP - Input Deck: BM1 (Billion Interactions/s, More Is Better)
  Znver3: 214.08 (SE +/- 2.52, N = 3)
  Znver3 + AVX-512: 214.63 (SE +/- 2.13, N = 3)
  Znver4: 214.50 (SE +/- 2.07, N = 3)
  Znver4 + Prefer AVX-512: 211.02 (SE +/- 2.51, N = 4)
miniBUDE 20210901 - Implementation: OpenMP - Input Deck: BM2 (Billion Interactions/s, More Is Better)
  Znver3: 264.28 (SE +/- 0.73, N = 3)
  Znver3 + AVX-512: 266.16 (SE +/- 1.06, N = 3)
  Znver4: 264.15 (SE +/- 1.75, N = 3)
  Znver4 + Prefer AVX-512: 265.52 (SE +/- 0.73, N = 3)
1. (CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm
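As a rough way to judge whether the miniBUDE differences exceed run-to-run noise, the reported means and standard errors can be compared directly. The sketch below divides the difference of two means by the standard error of that difference. The function name `z_score` and the |z| < 2 rule of thumb are assumptions of this sketch, not part of the benchmark output, and with only 3 or 4 runs per configuration a proper t-test would be the more careful choice.

```python
import math


def z_score(mean_a, se_a, mean_b, se_b):
    """Difference of two means divided by the standard error of that difference.

    Assumes the two sets of runs are independent.
    """
    return (mean_a - mean_b) / math.sqrt(se_a ** 2 + se_b ** 2)


# BM1: Znver3 (214.08, SE 2.52) vs Znver4 + Prefer AVX-512 (211.02, SE 2.51)
z_bm1 = z_score(214.08, 2.52, 211.02, 2.51)

# BM2: Znver3 + AVX-512 (266.16, SE 1.06) vs Znver4 (264.15, SE 1.75)
z_bm2 = z_score(266.16, 1.06, 264.15, 1.75)

if __name__ == "__main__":
    for label, z in (("BM1", z_bm1), ("BM2", z_bm2)):
        verdict = "within noise" if abs(z) < 2 else "likely a real difference"
        print(f"{label}: z = {z:.2f} ({verdict})")
```

Under these assumptions, both of the widest miniBUDE gaps stay below the threshold, so the four configurations are effectively tied on this test.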
OpenSSL OpenSSL is an open-source toolkit that implements SSL (Secure Sockets Layer) and TLS (Transport Layer Security) protocols. This test profile makes use of the built-in "openssl speed" benchmarking capabilities. Learn more via the OpenBenchmarking.org test page.
OpenSSL 3.0 - Algorithm: SHA256 (byte/s, More Is Better)
  Znver3: 266464361193 (SE +/- 18811489.95, N = 3)
  Znver3 + AVX-512: 266899089453 (SE +/- 143443542.20, N = 3)
  Znver4: 265326713587 (SE +/- 150345864.28, N = 3)
  Znver4 + Prefer AVX-512: 266230124070 (SE +/- 172236238.05, N = 3)
Per-configuration options: -march=znver3 | -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi | -march=native | -march=native
1. (CC) gcc options: -pthread -m64 -O3 -flto -lssl -lcrypto -ldl
Kvazaar This is a test of Kvazaar as a CPU-based H.265/HEVC video encoder written in the C programming language and optimized in Assembly. Kvazaar is the winner of the 2016 ACM Open-Source Software Competition and developed at the Ultra Video Group, Tampere University, Finland. Learn more via the OpenBenchmarking.org test page.
Kvazaar 2.1 - Video Input: Bosphorus 4K - Video Preset: Medium (Frames Per Second, More Is Better)
  Znver3: 64.27 (SE +/- 0.36, N = 3)
  Znver3 + AVX-512: 63.80 (SE +/- 0.10, N = 3)
  Znver4: 64.35 (SE +/- 0.52, N = 3)
  Znver4 + Prefer AVX-512: 64.05 (SE +/- 0.46, N = 3)
Kvazaar 2.1 - Video Input: Bosphorus 4K - Video Preset: Very Fast (Frames Per Second, More Is Better)
  Znver3: 72.01 (SE +/- 0.28, N = 3)
  Znver3 + AVX-512: 70.71 (SE +/- 0.87, N = 4)
  Znver4: 70.71 (SE +/- 0.83, N = 3)
  Znver4 + Prefer AVX-512: 70.81 (SE +/- 0.92, N = 3)
Kvazaar 2.1 - Video Input: Bosphorus 4K - Video Preset: Ultra Fast (Frames Per Second, More Is Better)
  Znver3: 78.01 (SE +/- 0.62, N = 3)
  Znver3 + AVX-512: 76.76 (SE +/- 0.64, N = 15)
  Znver4: 78.69 (SE +/- 1.01, N = 3)
  Znver4 + Prefer AVX-512: 74.98 (SE +/- 0.73, N = 15)
Per-configuration options: -march=znver3 | -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi | -march=native | -march=native
1. (CC) gcc options: -pthread -ftree-vectorize -fvisibility=hidden -O3 -flto -lpthread -lm -lrt
SVT-AV1 This is a benchmark of the SVT-AV1 open-source video encoder/decoder. SVT-AV1 was originally developed by Intel as part of their Open Visual Cloud / Scalable Video Technology (SVT). Development of SVT-AV1 has since moved to the Alliance for Open Media as part of upstream AV1 development. SVT-AV1 is a CPU-based multi-threaded video encoder for the AV1 video format with a sample YUV video file. Learn more via the OpenBenchmarking.org test page.
SVT-AV1 1.4 - Encoder Mode: Preset 4 - Input: Bosphorus 4K (Frames Per Second, More Is Better)
  Znver3: 5.360 (SE +/- 0.012, N = 3)
  Znver3 + AVX-512: 5.374 (SE +/- 0.037, N = 3)
  Znver4: 5.392 (SE +/- 0.015, N = 3)
  Znver4 + Prefer AVX-512: 5.423 (SE +/- 0.043, N = 3)
SVT-AV1 1.4 - Encoder Mode: Preset 8 - Input: Bosphorus 4K (Frames Per Second, More Is Better)
  Znver3: 95.12 (SE +/- 0.46, N = 3)
  Znver3 + AVX-512: 92.96 (SE +/- 0.52, N = 3)
  Znver4: 93.76 (SE +/- 0.33, N = 3)
  Znver4 + Prefer AVX-512: 94.34 (SE +/- 0.32, N = 3)
SVT-AV1 1.4 - Encoder Mode: Preset 12 - Input: Bosphorus 4K (Frames Per Second, More Is Better)
  Znver3: 222.90 (SE +/- 5.39, N = 15)
  Znver3 + AVX-512: 206.68 (SE +/- 3.02, N = 15)
  Znver4: 210.39 (SE +/- 4.58, N = 15)
  Znver4 + Prefer AVX-512: 196.80 (SE +/- 1.78, N = 3)
SVT-AV1 1.4 - Encoder Mode: Preset 13 - Input: Bosphorus 4K (Frames Per Second, More Is Better)
  Znver3: 219.57 (SE +/- 4.56, N = 12)
  Znver3 + AVX-512: 210.43 (SE +/- 2.81, N = 15)
  Znver4: 196.12 (SE +/- 3.33, N = 15)
  Znver4 + Prefer AVX-512: 208.64 (SE +/- 3.20, N = 15)
Per-configuration options: -march=znver3 | -march=znver3 -mavx512cd -mavx512vl -mavx512ifma -mavx512vbmi
1. (CXX) g++ options: -O3 -flto -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
simdjson This is a benchmark of SIMDJSON, a high performance JSON parser. SIMDJSON aims to be the fastest JSON parser and is used by projects like Microsoft FishStore, Yandex ClickHouse, Shopify, and others. Learn more via the OpenBenchmarking.org test page.
simdjson 2.0 - Throughput Test: Kostya (GB/s, More Is Better)
  Znver3: 4.20 (SE +/- 0.01, N = 3)
  Znver3 + AVX-512: 4.16 (SE +/- 0.01, N = 3)
  Znver4: 4.17 (SE +/- 0.02, N = 3)
  Znver4 + Prefer AVX-512: 4.19 (SE +/- 0.01, N = 3)
simdjson 2.0 - Throughput Test: TopTweet (GB/s, More Is Better)
  Znver3: 6.61 (SE +/- 0.05, N = 3)
  Znver3 + AVX-512: 6.73 (SE +/- 0.02, N = 3)
  Znver4: 6.96 (SE +/- 0.05, N = 3)
  Znver4 + Prefer AVX-512: 6.91 (SE +/- 0.07, N = 3)
simdjson 2.0 - Throughput Test: LargeRandom (GB/s, More Is Better)
  Znver3: 1.25 (SE +/- 0.00, N = 3)
  Znver3 + AVX-512: 1.25 (SE +/- 0.00, N = 3)
  Znver4: 1.27 (SE +/- 0.00, N = 3)
  Znver4 + Prefer AVX-512: 1.24 (SE +/- 0.00, N = 3)
simdjson 2.0 - Throughput Test: PartialTweets (GB/s, More Is Better)
  Znver3: 6.74 (SE +/- 0.03, N = 3)
  Znver3 + AVX-512: 6.68 (SE +/- 0.03, N = 3)
  Znver4: 6.63 (SE +/- 0.03, N = 3)
  Znver4 + Prefer AVX-512: 6.43 (SE +/- 0.05, N = 3)
simdjson 2.0 - Throughput Test: DistinctUserID (GB/s, More Is Better)
  Znver3: 6.46 (SE +/- 0.02, N = 3)
  Znver3 + AVX-512: 6.61 (SE +/- 0.04, N = 3)
  Znver4: 6.53 (SE +/- 0.04, N = 3)
  Znver4 + Prefer AVX-512: 6.86 (SE +/- 0.07, N = 3)
Per-configuration options: -march=znver3 | -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi | -march=native | -march=native
1. (CXX) g++ options: -O3 -flto
miniBUDE MiniBUDE is a mini application for the core computation of the Bristol University Docking Engine (BUDE). This test profile currently makes use of the OpenMP implementation of miniBUDE for CPU benchmarking. Learn more via the OpenBenchmarking.org test page.
miniBUDE 20210901 - Implementation: OpenMP - Input Deck: BM1 (GFInst/s, More Is Better)
  Znver3: 5351.94 (SE +/- 62.89, N = 3)
  Znver3 + AVX-512: 5365.61 (SE +/- 53.26, N = 3)
  Znver4: 5362.54 (SE +/- 51.63, N = 3)
  Znver4 + Prefer AVX-512: 5275.43 (SE +/- 62.65, N = 4)
miniBUDE 20210901 - Implementation: OpenMP - Input Deck: BM2 (GFInst/s, More Is Better)
  Znver3: 6607.09 (SE +/- 18.35, N = 3)
  Znver3 + AVX-512: 6653.86 (SE +/- 26.43, N = 3)
  Znver4: 6603.68 (SE +/- 43.78, N = 3)
  Znver4 + Prefer AVX-512: 6638.01 (SE +/- 18.27, N = 3)
1. (CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm
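The GFInst/s figures track the Billion Interactions/s figures reported earlier for the same miniBUDE runs almost exactly: in these numbers the ratio sits very close to 25. The sketch below only checks that arithmetic on the reported values. The names `PAIRS` and `ratios` are hypothetical, and reading the factor as "instructions per interaction" is an assumption that the result page does not state.

```python
# (Billion Interactions/s, GFInst/s) pairs copied from the miniBUDE results in this file.
PAIRS = {
    ("BM1", "Znver3"): (214.08, 5351.94),
    ("BM1", "Znver3 + AVX-512"): (214.63, 5365.61),
    ("BM1", "Znver4"): (214.50, 5362.54),
    ("BM1", "Znver4 + Prefer AVX-512"): (211.02, 5275.43),
    ("BM2", "Znver3"): (264.28, 6607.09),
    ("BM2", "Znver3 + AVX-512"): (266.16, 6653.86),
    ("BM2", "Znver4"): (264.15, 6603.68),
    ("BM2", "Znver4 + Prefer AVX-512"): (265.52, 6638.01),
}


def ratios(pairs):
    """Return GFInst/s divided by Billion Interactions/s for every (deck, config) pair."""
    return {key: ginst / interactions for key, (interactions, ginst) in pairs.items()}


if __name__ == "__main__":
    for (deck, config), ratio in ratios(PAIRS).items():
        print(f"{deck} {config}: {ratio:.4f}")
```

Because the two metrics appear to differ only by a constant factor, the GFInst/s graphs rank the four configurations the same way as the interaction-rate graphs.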
ACES DGEMM This is a multi-threaded DGEMM benchmark. Learn more via the OpenBenchmarking.org test page.
ACES DGEMM 1.0 - Sustained Floating-Point Rate (GFLOP/s, More Is Better)
  Znver3: 70.38 (SE +/- 0.06, N = 3)
  Znver3 + AVX-512: 70.19 (SE +/- 0.28, N = 3)
  Znver4: 70.05 (SE +/- 0.10, N = 3)
  Znver4 + Prefer AVX-512: 70.30 (SE +/- 0.07, N = 3)
Per-configuration options: -march=znver3 | -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
1. (CC) gcc options: -O3 -march=native -fopenmp -flto
GraphicsMagick This is a test of GraphicsMagick with its OpenMP implementation that performs various imaging tests on a sample 6000x4000 pixel JPEG image. Learn more via the OpenBenchmarking.org test page.
GraphicsMagick 1.3.38 - Operation: Swirl (Iterations Per Minute, More Is Better)
  Znver3: 2563 (SE +/- 36.67, N = 3)
  Znver3 + AVX-512: 2826 (SE +/- 28.99, N = 3)
  Znver4: 2862 (SE +/- 26.19, N = 7)
  Znver4 + Prefer AVX-512: 2681 (SE +/- 8.09, N = 3)
GraphicsMagick 1.3.38 - Operation: Rotate (Iterations Per Minute, More Is Better)
  Znver3: 645 (SE +/- 1.20, N = 3)
  Znver3 + AVX-512: 605 (SE +/- 3.84, N = 3)
  Znver4: 673 (SE +/- 1.73, N = 3)
  Znver4 + Prefer AVX-512: 656 (SE +/- 1.00, N = 3)
GraphicsMagick 1.3.38 - Operation: Sharpen (Iterations Per Minute, More Is Better)
  Znver3: 1285 (SE +/- 10.73, N = 3)
  Znver3 + AVX-512: 1321 (SE +/- 1.15, N = 3)
  Znver4: 1359 (SE +/- 6.56, N = 3)
  Znver4 + Prefer AVX-512: 1314 (SE +/- 13.17, N = 3)
GraphicsMagick 1.3.38 - Operation: Enhanced (Iterations Per Minute, More Is Better)
  Znver3: 1837 (SE +/- 19.19, N = 3)
  Znver3 + AVX-512: 2208 (SE +/- 14.95, N = 3)
  Znver4: 2234 (SE +/- 2.00, N = 3)
  Znver4 + Prefer AVX-512: 2150 (SE +/- 5.00, N = 3)
GraphicsMagick 1.3.38 - Operation: Resizing (Iterations Per Minute, More Is Better)
  Znver3: 89 (SE +/- 1.27, N = 15)
  Znver3 + AVX-512: 86 (SE +/- 0.90, N = 15)
  Znver4: 88 (SE +/- 1.00, N = 15)
  Znver4 + Prefer AVX-512: 87 (SE +/- 0.94, N = 15)
GraphicsMagick 1.3.38 - Operation: Noise-Gaussian (Iterations Per Minute, More Is Better)
  Znver3: 1018 (SE +/- 5.13, N = 3)
  Znver3 + AVX-512: 975 (SE +/- 11.43, N = 15)
  Znver4: 1024 (SE +/- 6.60, N = 15)
  Znver4 + Prefer AVX-512: 1013 (SE +/- 11.39, N = 15)
GraphicsMagick 1.3.38 - Operation: HWB Color Space (Iterations Per Minute, More Is Better)
  Znver3: 1134 (SE +/- 11.42, N = 15)
  Znver3 + AVX-512: 1062 (SE +/- 1.86, N = 3)
  Znver4: 1180 (SE +/- 4.84, N = 3)
  Znver4 + Prefer AVX-512: 1167 (SE +/- 15.31, N = 15)
Per-configuration options: -march=znver3 | -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi | -march=native | -march=native
1. (CC) gcc options: -fopenmp -O3 -flto -ljbig -ltiff -lfreetype -ljpeg -lXext -lSM -lICE -lX11 -llzma -lz -lm -lpthread
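With seven GraphicsMagick operations on very different scales (from under 100 to nearly 3000 iterations per minute), a geometric mean is one common way to fold them into a single figure per configuration. The sketch below does that with the values reported above. Choosing the geometric mean and the helper name `geomean_per_config` are assumptions of this illustration; the result page itself does not report such a summary.

```python
from statistics import geometric_mean

CONFIGS = ("Znver3", "Znver3 + AVX-512", "Znver4", "Znver4 + Prefer AVX-512")

# Iterations per minute, copied from the GraphicsMagick 1.3.38 results, in CONFIGS order.
GM_RESULTS = {
    "Swirl": (2563, 2826, 2862, 2681),
    "Rotate": (645, 605, 673, 656),
    "Sharpen": (1285, 1321, 1359, 1314),
    "Enhanced": (1837, 2208, 2234, 2150),
    "Resizing": (89, 86, 88, 87),
    "Noise-Gaussian": (1018, 975, 1024, 1013),
    "HWB Color Space": (1134, 1062, 1180, 1167),
}


def geomean_per_config(results):
    """Geometric mean across all operations, computed separately for each configuration."""
    columns = zip(*results.values())  # one tuple of seven values per configuration
    return {name: geometric_mean(column) for name, column in zip(CONFIGS, columns)}


if __name__ == "__main__":
    for name, value in geomean_per_config(GM_RESULTS).items():
        print(f"{name}: {value:.1f}")
```

The geometric mean weights each operation's relative change equally, so the large absolute numbers for Swirl and Enhanced do not swamp the small ones for Resizing.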
Coremark This is a test of EEMBC CoreMark processor benchmark. Learn more via the OpenBenchmarking.org test page.
Coremark 1.0 - CoreMark Size 666 - Iterations Per Second (Iterations/Sec, More Is Better)
  Znver3: 7640546.27 (SE +/- 62802.17, N = 3)
  Znver3 + AVX-512: 7871273.93 (SE +/- 2823.14, N = 3)
  Znver4: 7694653.90 (SE +/- 11460.97, N = 3)
  Znver4 + Prefer AVX-512: 7861097.86 (SE +/- 83462.96, N = 3)
Per-configuration options: -march=znver3 | -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi | -march=native | -march=native
1. (CC) gcc options: -O2 -O3 -flto -lrt" -lrt
Cpuminer-Opt Cpuminer-Opt is a fork of cpuminer-multi that carries a wide range of CPU performance optimizations for measuring the potential cryptocurrency mining performance of the CPU/processor with a wide variety of cryptocurrencies. The benchmark reports the hash speed for the CPU mining performance for the selected cryptocurrency. Learn more via the OpenBenchmarking.org test page.
Cpuminer-Opt 3.20.3 - Algorithm: Magi (kH/s, More Is Better)
  Znver3: 8490.73 (SE +/- 51.27, N = 3)
  Znver3 + AVX-512: 8355.75 (SE +/- 50.94, N = 3)
  Znver4: 8440.79 (SE +/- 47.28, N = 3)
  Znver4 + Prefer AVX-512: 8467.24 (SE +/- 64.85, N = 3)
Cpuminer-Opt 3.20.3 - Algorithm: x25x (kH/s, More Is Better)
  Znver3: 6116.97 (SE +/- 17.26, N = 3)
  Znver3 + AVX-512: 7941.38 (SE +/- 15.17, N = 3)
  Znver4: 8042.88 (SE +/- 74.42, N = 3)
  Znver4 + Prefer AVX-512: 8217.70 (SE +/- 90.39, N = 3)
Cpuminer-Opt 3.20.3 - Algorithm: scrypt (kH/s, More Is Better)
  Znver3: 2959.15 (SE +/- 0.45, N = 3)
  Znver3 + AVX-512: 4763.91 (SE +/- 1.99, N = 3)
  Znver4: 4790.11 (SE +/- 0.45, N = 3)
  Znver4 + Prefer AVX-512: 4782.74 (SE +/- 1.22, N = 3)
Cpuminer-Opt 3.20.3 - Algorithm: Deepcoin (kH/s, More Is Better)
  Znver3: 164993 (SE +/- 218.28, N = 3)
  Znver3 + AVX-512: 160157 (SE +/- 880.18, N = 3)
  Znver4: 159147 (SE +/- 81.72, N = 3)
  Znver4 + Prefer AVX-512: 162242 (SE +/- 1703.12, N = 5)
Cpuminer-Opt 3.20.3 - Algorithm: Garlicoin (kH/s, More Is Better)
  Znver3: 49523 (SE +/- 66.92, N = 3)
  Znver3 + AVX-512: 72837 (SE +/- 742.93, N = 3)
  Znver4: 72413 (SE +/- 461.75, N = 3)
  Znver4 + Prefer AVX-512: 72130 (SE +/- 295.35, N = 3)
Cpuminer-Opt 3.20.3 - Algorithm: Skeincoin (kH/s, More Is Better)
  Znver3: 1414047 (SE +/- 5094.00, N = 3)
  Znver3 + AVX-512: 2004990 (SE +/- 24878.62, N = 3)
  Znver4: 2014770 (SE +/- 10323.18, N = 3)
  Znver4 + Prefer AVX-512: 2009367 (SE +/- 7108.55, N = 3)
Cpuminer-Opt 3.20.3 - Algorithm: LBC, LBRY Credits (kH/s, More Is Better)
  Znver3: 497020 (SE +/- 3568.28, N = 3)
  Znver3 + AVX-512: 1067743 (SE +/- 1800.04, N = 3)
  Znver4: 1065487 (SE +/- 829.54, N = 3)
  Znver4 + Prefer AVX-512: 1085827 (SE +/- 12123.53, N = 3)
Cpuminer-Opt 3.20.3 - Algorithm: Quad SHA-256, Pyrite (kH/s, More Is Better)
  Znver3: 1378987 (SE +/- 5899.09, N = 3)
  Znver3 + AVX-512: 2264747 (SE +/- 22716.96, N = 3)
  Znver4: 2251067 (SE +/- 10306.20, N = 3)
  Znver4 + Prefer AVX-512: 2323995 (SE +/- 25368.27, N = 4)
Cpuminer-Opt 3.20.3 - Algorithm: Triple SHA-256, Onecoin (kH/s, More Is Better)
  Znver3: 3255253 (SE +/- 4313.22, N = 3)
  Znver3 + AVX-512: 3323680 (SE +/- 26057.45, N = 3)
  Znver4: 3306643 (SE +/- 26496.25, N = 3)
  Znver4 + Prefer AVX-512: 3301217 (SE +/- 22336.69, N = 3)
Per-configuration options: -march=znver3 | -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi | -march=native | -march=native
1. (CXX) g++ options: -O3 -flto -lcurl -lz -lpthread -lssl -lcrypto -lgmp
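The Cpuminer-Opt results are where enabling AVX-512 changes the picture most, and the effect is easiest to read as a speedup relative to the plain Znver3 build. The sketch below computes that ratio from the kH/s values reported above. The names `KHS` and `speedups` are hypothetical, and treating Znver3 as the baseline is a choice made for this illustration.

```python
CONFIGS = ("Znver3", "Znver3 + AVX-512", "Znver4", "Znver4 + Prefer AVX-512")

# kH/s values copied from the Cpuminer-Opt 3.20.3 results, in CONFIGS order.
KHS = {
    "Magi": (8490.73, 8355.75, 8440.79, 8467.24),
    "x25x": (6116.97, 7941.38, 8042.88, 8217.70),
    "scrypt": (2959.15, 4763.91, 4790.11, 4782.74),
    "Deepcoin": (164993, 160157, 159147, 162242),
    "Garlicoin": (49523, 72837, 72413, 72130),
    "Skeincoin": (1414047, 2004990, 2014770, 2009367),
    "LBC, LBRY Credits": (497020, 1067743, 1065487, 1085827),
    "Quad SHA-256, Pyrite": (1378987, 2264747, 2251067, 2323995),
    "Triple SHA-256, Onecoin": (3255253, 3323680, 3306643, 3301217),
}


def speedups(results, baseline_index=0):
    """Divide every configuration's result by the baseline configuration's result."""
    return {
        algo: tuple(value / values[baseline_index] for value in values)
        for algo, values in results.items()
    }


if __name__ == "__main__":
    for algo, ratios in speedups(KHS).items():
        row = ", ".join(f"{name}: {r:.2f}x" for name, r in zip(CONFIGS, ratios))
        print(f"{algo} -> {row}")
```

On these numbers the three AVX-512-enabled builds land close to one another, while the gap to plain Znver3 ranges from none (Magi, Deepcoin, Triple SHA-256) to more than 2x (LBC, LBRY Credits).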
SecureMark SecureMark is an objective, standardized benchmarking framework for measuring the efficiency of cryptographic processing solutions developed by EEMBC. SecureMark-TLS is benchmarking Transport Layer Security performance with a focus on IoT/edge computing. Learn more via the OpenBenchmarking.org test page.
SecureMark 1.0.4 - Benchmark: SecureMark-TLS (marks, More Is Better)
  Znver3: 294057 (SE +/- 1276.92, N = 3)
  Znver3 + AVX-512: 296575 (SE +/- 380.50, N = 3)
  Znver4: 296548 (SE +/- 383.28, N = 3)
  Znver4 + Prefer AVX-512: 294122 (SE +/- 640.61, N = 3)
1. (CC) gcc options: -pedantic -O3
Zstd Compression This test measures the time needed to compress/decompress a sample file (a FreeBSD disk image - FreeBSD-12.2-RELEASE-amd64-memstick.img) using Zstd compression with options for different compression levels / settings. Learn more via the OpenBenchmarking.org test page.
Zstd Compression 1.5.0 - Compression Level: 19 - Compression Speed (MB/s, More Is Better)
  Znver3: 104.4 (SE +/- 0.91, N = 15)
  Znver3 + AVX-512: 102.9 (SE +/- 1.27, N = 3)
  Znver4: 102.1 (SE +/- 1.03, N = 6)
  Znver4 + Prefer AVX-512: 105.3 (SE +/- 1.30, N = 3)
Zstd Compression 1.5.0 - Compression Level: 19 - Decompression Speed (MB/s, More Is Better)
  Znver3: 3574.9 (SE +/- 12.56, N = 15)
  Znver3 + AVX-512: 3594.8 (SE +/- 32.49, N = 3)
  Znver4: 3584.1 (SE +/- 17.00, N = 6)
  Znver4 + Prefer AVX-512: 3581.6 (SE +/- 36.60, N = 3)
Zstd Compression 1.5.0 - Compression Level: 19, Long Mode - Compression Speed (MB/s, More Is Better)
  Znver3: 39.8 (SE +/- 0.40, N = 15)
  Znver3 + AVX-512: 43.9 (SE +/- 0.79, N = 15)
  Znver4: 40.8 (SE +/- 0.40, N = 15)
  Znver4 + Prefer AVX-512: 42.5 (SE +/- 0.59, N = 15)
Zstd Compression 1.5.0 - Compression Level: 19, Long Mode - Decompression Speed (MB/s, More Is Better)
  Znver3: 3695.5 (SE +/- 11.99, N = 15)
  Znver3 + AVX-512: 3684.7 (SE +/- 12.89, N = 15)
  Znver4: 3708.8 (SE +/- 12.59, N = 15)
  Znver4 + Prefer AVX-512: 3685.7 (SE +/- 13.90, N = 15)
Per-configuration options: -march=znver3 | -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi | -march=native | -march=native
1. (CC) gcc options: -O3 -flto -pthread -lz -llzma
QuantLib QuantLib is an open-source library/framework around quantitative finance for modeling, trading and risk management scenarios. QuantLib is written in C++ with Boost, and its built-in benchmark reports the QuantLib Benchmark Index score. Learn more via the OpenBenchmarking.org test page.
QuantLib 1.21 (MFLOPS, More Is Better)
  Znver3: 3120.5 (SE +/- 4.52, N = 3)
  Znver3 + AVX-512: 3114.9 (SE +/- 2.74, N = 3)
  Znver4: 3096.9 (SE +/- 1.99, N = 3)
  Znver4 + Prefer AVX-512: 3112.6 (SE +/- 1.71, N = 3)
1. (CXX) g++ options: -O3 -march=native -rdynamic
SMHasher SMHasher is a hash function tester supporting various algorithms and able to make use of AVX and other modern CPU instruction set extensions. Learn more via the OpenBenchmarking.org test page.
SMHasher 2022-08-22 - Hash: FarmHash32 x86_64 AVX - Result (MiB/sec, More Is Better)
  Znver3: 40565.72 (SE +/- 0.13, N = 3)
  Znver3 + AVX-512: 40559.33 (SE +/- 1.84, N = 3)
  Znver4: 40565.07 (SE +/- 0.95, N = 3)
  Znver4 + Prefer AVX-512: 40563.96 (SE +/- 1.44, N = 3)
SMHasher 2022-08-22 - Hash: FarmHash32 x86_64 AVX (cycles/hash, Fewer Is Better)
  Znver3: 26.49 (SE +/- 0.00, N = 3)
  Znver3 + AVX-512: 26.49 (SE +/- 0.00, N = 3)
  Znver4: 26.49 (SE +/- 0.00, N = 3)
  Znver4 + Prefer AVX-512: 26.50 (SE +/- 0.00, N = 3)
SMHasher 2022-08-22 - Hash: t1ha0_aes_avx2 x86_64 - Result (MiB/sec, More Is Better)
  Znver3: 102351.87 (SE +/- 8.85, N = 3)
  Znver3 + AVX-512: 102403.92 (SE +/- 19.58, N = 3)
  Znver4: 102354.57 (SE +/- 20.00, N = 3)
  Znver4 + Prefer AVX-512: 102399.74 (SE +/- 11.55, N = 3)
SMHasher 2022-08-22 - Hash: t1ha0_aes_avx2 x86_64 (cycles/hash, Fewer Is Better)
  Znver3: 20.53 (SE +/- 0.01, N = 3)
  Znver3 + AVX-512: 20.53 (SE +/- 0.01, N = 3)
  Znver4: 20.52 (SE +/- 0.01, N = 3)
  Znver4 + Prefer AVX-512: 20.81 (SE +/- 0.00, N = 3)
SMHasher 2022-08-22 - Hash: MeowHash x86_64 AES-NI - Result (MiB/sec, More Is Better)
  Znver3: 54281.94 (SE +/- 9.98, N = 3)
  Znver3 + AVX-512: 54284.65 (SE +/- 10.52, N = 3)
  Znver4: 54272.14 (SE +/- 7.42, N = 3)
  Znver4 + Prefer AVX-512: 54297.04 (SE +/- 9.47, N = 3)
SMHasher 2022-08-22 - Hash: MeowHash x86_64 AES-NI (cycles/hash, Fewer Is Better)
  Znver3: 44.96 (SE +/- 0.01, N = 3)
  Znver3 + AVX-512: 44.97 (SE +/- 0.00, N = 3)
  Znver4: 44.98 (SE +/- 0.01, N = 3)
  Znver4 + Prefer AVX-512: 44.95 (SE +/- 0.01, N = 3)
Per-configuration options: -march=znver3 | -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi
1. (CXX) g++ options: -O3 -flto -march=native -flto=auto -fno-fat-lto-objects
JPEG XL libjxl The JPEG XL Image Coding System is designed to provide next-generation JPEG image capabilities with JPEG XL offering better image quality and compression over legacy JPEG. This test profile is currently focused on the multi-threaded JPEG XL image encode performance using the reference libjxl library. Learn more via the OpenBenchmarking.org test page.
JPEG XL libjxl 0.7 - Input: PNG - Quality: 90 (MP/s, More Is Better)
  Znver3: 9.61 (SE +/- 0.01, N = 3)
  Znver3 + AVX-512: 9.86 (SE +/- 0.07, N = 3)
  Znver4: 9.83 (SE +/- 0.03, N = 3)
  Znver4 + Prefer AVX-512: 9.81 (SE +/- 0.04, N = 3)
JPEG XL libjxl 0.7 - Input: JPEG - Quality: 90 (MP/s, More Is Better)
  Znver3: 9.18 (SE +/- 0.03, N = 3)
  Znver3 + AVX-512: 9.27 (SE +/- 0.13, N = 3)
  Znver4: 9.48 (SE +/- 0.05, N = 3)
  Znver4 + Prefer AVX-512: 9.43 (SE +/- 0.03, N = 3)
JPEG XL libjxl 0.7 - Input: PNG - Quality: 100 (MP/s, More Is Better)
  Znver3: 0.82 (SE +/- 0.00, N = 3)
  Znver3 + AVX-512: 0.81 (SE +/- 0.00, N = 3)
  Znver4: 0.83 (SE +/- 0.00, N = 3)
  Znver4 + Prefer AVX-512: 0.82 (SE +/- 0.00, N = 3)
JPEG XL libjxl 0.7 - Input: JPEG - Quality: 100 (MP/s, More Is Better)
  Znver3: 0.74 (SE +/- 0.01, N = 6)
  Znver3 + AVX-512: 0.71 (SE +/- 0.01, N = 9)
  Znver4: 0.73 (SE +/- 0.01, N = 9)
  Znver4 + Prefer AVX-512: 0.75 (SE +/- 0.01, N = 3)
Per-configuration options: -march=znver3 | -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi | -march=native | -march=native
1. (CXX) g++ options: -O3 -flto -fno-rtti -funwind-tables -O2 -fPIE -pie -latomic
JPEG XL Decoding libjxl The JPEG XL Image Coding System is designed to provide next-generation JPEG image capabilities with JPEG XL offering better image quality and compression over legacy JPEG. This test profile measures JPEG XL decode performance when writing to a PNG output file; the pts/jpegxl test covers encode performance. The JPEG XL encoding/decoding is done using the libjxl codebase. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org: JPEG XL Decoding libjxl 0.7 (MP/s, More Is Better)

CPU Threads: 1
Znver3: 48.22 (SE +/- 0.12, N = 3)
Znver3 + AVX-512: 47.02 (SE +/- 0.25, N = 3)
Znver4: 48.64 (SE +/- 0.12, N = 3)
Znver4 + Prefer AVX-512: 48.21 (SE +/- 0.12, N = 3)

CPU Threads: All
Znver3: 269.59 (SE +/- 1.29, N = 3)
Znver3 + AVX-512: 266.53 (SE +/- 1.24, N = 3)
Znver4: 277.20 (SE +/- 1.41, N = 3)
Znver4 + Prefer AVX-512: 272.31 (SE +/- 0.30, N = 3)
WebP Image Encode This is a test of Google's libwebp with the cwebp image encode utility and using a sample 6000x4000 pixel JPEG image as the input. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org: WebP Image Encode 1.2.4 (MP/s, More Is Better)

Encode Settings: Default
Znver3: 18.99 (SE +/- 0.02, N = 3)
Znver3 + AVX-512: 18.85 (SE +/- 0.07, N = 3)
Znver4: 18.95 (SE +/- 0.00, N = 3)
Znver4 + Prefer AVX-512: 18.97 (SE +/- 0.01, N = 3)

Encode Settings: Quality 100
Znver3: 11.48 (SE +/- 0.01, N = 3)
Znver3 + AVX-512: 11.38 (SE +/- 0.01, N = 3)
Znver4: 11.54 (SE +/- 0.03, N = 3)
Znver4 + Prefer AVX-512: 11.54 (SE +/- 0.01, N = 3)

Encode Settings: Quality 100, Lossless
Znver3: 1.47 (SE +/- 0.00, N = 3)
Znver3 + AVX-512: 1.45 (SE +/- 0.00, N = 3)
Znver4: 1.45 (SE +/- 0.00, N = 3)
Znver4 + Prefer AVX-512: 1.47 (SE +/- 0.00, N = 3)

Encode Settings: Quality 100, Highest Compression
Znver3: 3.64 (SE +/- 0.00, N = 3)
Znver3 + AVX-512: 3.11 (SE +/- 0.00, N = 3)
Znver4: 3.25 (SE +/- 0.00, N = 3)
Znver4 + Prefer AVX-512: 3.23 (SE +/- 0.01, N = 3)

Encode Settings: Quality 100, Lossless, Highest Compression
Znver3: 0.58 (SE +/- 0.00, N = 3)
Znver3 + AVX-512: 0.58 (SE +/- 0.00, N = 3)
Znver4: 0.57 (SE +/- 0.00, N = 3)
Znver4 + Prefer AVX-512: 0.58 (SE +/- 0.00, N = 3)

Per-configuration flags for every result above: Znver3: -march=znver3 -lgif -ltiff; Znver3 + AVX-512: -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi -ljpeg -ltiff; Znver4: -march=native -ljpeg; Znver4 + Prefer AVX-512: -march=native -ljpeg -ltiff
(CC) gcc options: -fvisibility=hidden -O3 -flto -lpng16 -lm
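As a reading aid for the WebP numbers above, the following minimal Python sketch converts the "Quality 100, Highest Compression" results into percent change relative to the Znver3 run. The `percent_change` helper and the `results` dictionary are illustrative names, not part of the Phoronix Test Suite; the values are copied from the result above.

```python
def percent_change(baseline: float, value: float) -> float:
    """Return the change of `value` relative to `baseline`, in percent."""
    return (value - baseline) / baseline * 100.0


# MP/s for "Quality 100, Highest Compression" (more is better).
results = {
    "Znver3": 3.64,
    "Znver3 + AVX-512": 3.11,
    "Znver4": 3.25,
    "Znver4 + Prefer AVX-512": 3.23,
}

baseline = results["Znver3"]
for name, mp_per_s in results.items():
    # A negative percentage means lower throughput than the Znver3 build.
    print(f"{name}: {percent_change(baseline, mp_per_s):+.1f}%")
```

With these inputs, the Znver3 + AVX-512 build comes out roughly 14.6% below the plain Znver3 build, which is the largest gap in the WebP results.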
ASTC Encoder ASTC Encoder (astcenc) is for the Adaptive Scalable Texture Compression (ASTC) format commonly used with OpenGL, OpenGL ES, and Vulkan graphics APIs. This test profile does a coding test of both compression/decompression. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org: ASTC Encoder 4.0 (MT/s, More Is Better)

Preset: Medium
Znver3: 459.69 (SE +/- 6.17, N = 3)
Znver3 + AVX-512: 511.52 (SE +/- 5.68, N = 3)
Znver4: 493.22 (SE +/- 3.36, N = 13)
Znver4 + Prefer AVX-512: 420.70 (SE +/- 0.56, N = 3)

Preset: Thorough
Znver3: 116.09 (SE +/- 0.10, N = 3)
Znver3 + AVX-512: 117.69 (SE +/- 0.06, N = 3)
Znver4: 118.74 (SE +/- 0.12, N = 3)
Znver4 + Prefer AVX-512: 118.99 (SE +/- 0.08, N = 3)

Preset: Exhaustive
Znver3: 12.22 (SE +/- 0.03, N = 3)
Znver3 + AVX-512: 13.03 (SE +/- 0.01, N = 3)
Znver4: 12.94 (SE +/- 0.00, N = 3)
Znver4 + Prefer AVX-512: 13.09 (SE +/- 0.05, N = 3)

Per-configuration flags for every result above: Znver3: -march=znver3; Znver3 + AVX-512: -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi; Znver4: -march=native; Znver4 + Prefer AVX-512: -march=native
(CXX) g++ options: -O3 -flto -pthread
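One rough way to judge whether two of the ASTC results above differ by more than run-to-run noise is to divide the difference of the means by the combined standard error. The sketch below assumes the reported SE values are standard errors of the mean and that the runs are independent; `separation` is a hypothetical helper name, and with N as small as 3 this is a heuristic rather than a formal significance test.

```python
import math


def separation(mean_a: float, se_a: float, mean_b: float, se_b: float) -> float:
    """Difference of two means divided by their combined standard error."""
    return (mean_a - mean_b) / math.sqrt(se_a ** 2 + se_b ** 2)


# (mean MT/s, SE) for Znver4 and Znver4 + Prefer AVX-512.
medium = ((493.22, 3.36), (420.70, 0.56))
thorough = ((118.74, 0.12), (118.99, 0.08))

for preset, (a, b) in (("Medium", medium), ("Thorough", thorough)):
    print(f"{preset}: {separation(*a, *b):+.1f} combined standard errors")
```

Under these assumptions the Medium preset gap is many standard errors wide, while the Thorough preset gap is under two, so only the former looks clearly outside the noise.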
GROMACS This test runs the GROMACS (GROningen MAchine for Chemical Simulations) molecular dynamics package with the water_GMX50 data. This test profile allows selecting between CPU and GPU-based GROMACS builds. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org: GROMACS 2022.1, Implementation: MPI CPU - Input: water_GMX50_bare (Ns Per Day, More Is Better)
Znver3: 18.23 (SE +/- 0.01, N = 3)
Znver3 + AVX-512: 19.09 (SE +/- 0.01, N = 3)
Znver4: 19.49 (SE +/- 0.02, N = 3)
Znver4 + Prefer AVX-512: 19.44 (SE +/- 0.01, N = 3)
Per-configuration flags: Znver3: -march=znver3; Znver3 + AVX-512: -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi; Znver4: -march=native; Znver4 + Prefer AVX-512: -march=native
(CXX) g++ options: -O3 -flto
LAMMPS Molecular Dynamics Simulator LAMMPS is a classical molecular dynamics code, and an acronym for Large-scale Atomic/Molecular Massively Parallel Simulator. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org: LAMMPS Molecular Dynamics Simulator 23Jun2022 (ns/day, More Is Better)

Model: 20k Atoms
Znver3: 55.72 (SE +/- 0.04, N = 3)
Znver3 + AVX-512: 56.04 (SE +/- 0.08, N = 3)
Znver4: 55.62 (SE +/- 0.15, N = 3)
Znver4 + Prefer AVX-512: 55.74 (SE +/- 0.21, N = 3)

Model: Rhodopsin Protein
Znver3: 51.46 (SE +/- 0.43, N = 3)
Znver3 + AVX-512: 51.82 (SE +/- 0.40, N = 10)
Znver4: 51.59 (SE +/- 0.21, N = 3)
Znver4 + Prefer AVX-512: 52.29 (SE +/- 0.48, N = 3)

Per-configuration flags for every result above: Znver3: -march=znver3; Znver3 + AVX-512: -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi; Znver4: -march=native; Znver4 + Prefer AVX-512: -march=native
(CXX) g++ options: -O3 -flto -lm -ldl
Stargate Digital Audio Workstation Stargate is an open-source, cross-platform digital audio workstation (DAW) software package with "a unique and carefully curated experience" with scalability from old systems up through modern multi-core systems. Stargate is GPLv3 licensed and makes use of Qt5 (PyQt5) for its user-interface. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org: Stargate Digital Audio Workstation 22.11.5 (Render Ratio, More Is Better)

Sample Rate: 96000 - Buffer Size: 1024
Znver3: 4.356446 (SE +/- 0.016378, N = 3)
Znver3 + AVX-512: 4.413379 (SE +/- 0.011844, N = 3)
Znver4: 4.331230 (SE +/- 0.006484, N = 3)
Znver4 + Prefer AVX-512: 4.408290 (SE +/- 0.013873, N = 3)

Sample Rate: 192000 - Buffer Size: 1024
Znver3: 2.820079 (SE +/- 0.014903, N = 3)
Znver3 + AVX-512: 2.868373 (SE +/- 0.000776, N = 3)
Znver4: 2.783717 (SE +/- 0.009628, N = 3)
Znver4 + Prefer AVX-512: 2.867233 (SE +/- 0.003508, N = 3)

(CXX) g++ options: -lpthread -lsndfile -lm -O3 -march=native -ffast-math -funroll-loops -fstrength-reduce -fstrict-aliasing -finline-functions
PJSIP PJSIP is a free and open-source multimedia communication library written in C that implements standards-based protocols such as SIP, SDP, RTP, STUN, TURN, and ICE. It combines a signaling protocol (SIP) with a rich multimedia framework and NAT traversal functionality in a high-level API that is portable and suitable for almost any type of system, ranging from desktops and embedded systems to mobile handsets. This test profile makes use of pjsip-perf with both the client and server on the same system. More details on the PJSIP benchmark at https://www.pjsip.org/high-performance-sip.htm Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org: PJSIP 2.11 (Responses Per Second, More Is Better)

Method: INVITE
Znver3: 5149 (SE +/- 15.00, N = 3)
Znver3 + AVX-512: 5132 (SE +/- 24.85, N = 3)
Znver4: 5200 (SE +/- 19.91, N = 3)
Znver4 + Prefer AVX-512: 5084 (SE +/- 51.36, N = 5)

Method: OPTIONS, Stateful
Znver3: 9226 (SE +/- 72.95, N = 3)
Znver3 + AVX-512: 9236 (SE +/- 36.35, N = 3)
Znver4: 9237 (SE +/- 23.25, N = 3)
Znver4 + Prefer AVX-512: 9288 (SE +/- 32.42, N = 3)

Method: OPTIONS, Stateless
Znver3: 336885 (SE +/- 4433.83, N = 12)
Znver3 + AVX-512: 336615 (SE +/- 3613.75, N = 15)
Znver4: 335767 (SE +/- 2531.21, N = 3)
Znver4 + Prefer AVX-512: 336791 (SE +/- 5240.59, N = 15)

Per-configuration flags for every result above: Znver3: -march=znver3; Znver3 + AVX-512: -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi; Znver4: -march=native; Znver4 + Prefer AVX-512: -march=native
(CC) gcc options: -lavformat -lavcodec -lswscale -lavutil -lstdc++ -lopus -lssl -lcrypto -lm -lrt -lpthread -lasound -O3 -flto
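The PJSIP "OPTIONS, Stateless" results above carry large SE values and high run counts, which suggests noisy runs. The following sketch expresses each SE as a percentage of its mean and backs out an approximate sample standard deviation, assuming the reported SE is the standard error of the mean (SE = SD / sqrt(N)). The helper names are illustrative only.

```python
import math


def relative_se_percent(mean: float, se: float) -> float:
    """Standard error expressed as a percentage of the mean."""
    return se / mean * 100.0


def sample_sd(se: float, n: int) -> float:
    """Approximate sample standard deviation, assuming SE = SD / sqrt(N)."""
    return se * math.sqrt(n)


# (mean responses per second, SE, N) for "OPTIONS, Stateless".
runs = {
    "Znver3": (336885, 4433.83, 12),
    "Znver3 + AVX-512": (336615, 3613.75, 15),
    "Znver4": (335767, 2531.21, 3),
    "Znver4 + Prefer AVX-512": (336791, 5240.59, 15),
}

for name, (mean, se, n) in runs.items():
    print(f"{name}: SE = {relative_se_percent(mean, se):.2f}% of mean, "
          f"SD ~ {sample_sd(se, n):.0f} over {n} runs")
```

Every configuration's mean lies within about one relative SE of the others here, so the four stateless results are not distinguishable on this basis.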
Liquid-DSP LiquidSDR's Liquid-DSP is a software-defined radio (SDR) digital signal processing library. This test profile runs a multi-threaded benchmark of this SDR/DSP library focused on embedded platform usage. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org: Liquid-DSP 2021.01.31, Buffer Length: 256 - Filter Length: 57 (samples/s, More Is Better)

Threads: 128
Znver3: 6940833333 (SE +/- 6548367.06, N = 3)
Znver3 + AVX-512: 6999233333 (SE +/- 4658445.14, N = 3)
Znver4: 6990866667 (SE +/- 3192874.01, N = 3)
Znver4 + Prefer AVX-512: 6977633333 (SE +/- 6145549.43, N = 3)

Threads: 256
Znver3: 9735666667 (SE +/- 24004606.04, N = 3)
Znver3 + AVX-512: 9813700000 (SE +/- 4697162.26, N = 3)
Znver4: 9789500000 (SE +/- 5651843.36, N = 3)
Znver4 + Prefer AVX-512: 9809800000 (SE +/- 8533658.85, N = 3)

Threads: 384
Znver3: 11176000000 (SE +/- 10066445.91, N = 3)
Znver3 + AVX-512: 11301666667 (SE +/- 2403700.85, N = 3)
Znver4: 11249333333 (SE +/- 7356025.50, N = 3)
Znver4 + Prefer AVX-512: 11270666667 (SE +/- 7218802.61, N = 3)

Per-configuration flags for every result above: Znver3: -march=znver3; Znver3 + AVX-512: -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi; Znver4: -march=native; Znver4 + Prefer AVX-512: -march=native
(CC) gcc options: -O3 -flto -pthread -lm -lc -lliquid
OpenSSL OpenSSL is an open-source toolkit that implements SSL (Secure Sockets Layer) and TLS (Transport Layer Security) protocols. This test profile makes use of the built-in "openssl speed" benchmarking capabilities. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org: OpenSSL 3.0, Algorithm: RSA4096 (sign/s, More Is Better)
Znver3: 44435.3 (SE +/- 22.49, N = 3)
Znver3 + AVX-512: 44499.3 (SE +/- 0.44, N = 3)
Znver4: 44490.1 (SE +/- 8.03, N = 3)
Znver4 + Prefer AVX-512: 44301.8 (SE +/- 151.57, N = 3)
Per-configuration flags: Znver3: -march=znver3; Znver3 + AVX-512: -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi; Znver4: -march=native; Znver4 + Prefer AVX-512: -march=native
(CC) gcc options: -pthread -m64 -O3 -flto -lssl -lcrypto -ldl
Kripke Kripke is a simple, scalable, 3D Sn deterministic particle transport code. Its primary purpose is to research how data layout, programming paradigms, and architectures affect the implementation and performance of Sn transport. Kripke is developed by LLNL. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org: Kripke 1.2.4 (Throughput FoM, More Is Better)
Znver3: 263812847 (SE +/- 2950522.72, N = 15)
Znver3 + AVX-512: 271735708 (SE +/- 3107958.27, N = 12)
Znver4: 261562280 (SE +/- 3293132.01, N = 15)
Znver4 + Prefer AVX-512: 254648533 (SE +/- 2000945.99, N = 3)
Per-configuration flags: Znver3: -march=znver3; Znver3 + AVX-512: -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi; Znver4: -march=native; Znver4 + Prefer AVX-512: -march=native
(CXX) g++ options: -O3 -flto -fopenmp
OpenSSL OpenSSL is an open-source toolkit that implements SSL (Secure Sockets Layer) and TLS (Transport Layer Security) protocols. This test profile makes use of the built-in "openssl speed" benchmarking capabilities. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org: OpenSSL 3.0, Algorithm: RSA4096 (verify/s, More Is Better)
Znver3: 2938488.5 (SE +/- 228.34, N = 3)
Znver3 + AVX-512: 2935372.1 (SE +/- 1641.03, N = 3)
Znver4: 2939503.5 (SE +/- 140.73, N = 3)
Znver4 + Prefer AVX-512: 2924586.4 (SE +/- 10617.65, N = 3)
Per-configuration flags: Znver3: -march=znver3; Znver3 + AVX-512: -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi; Znver4: -march=native; Znver4 + Prefer AVX-512: -march=native
(CC) gcc options: -pthread -m64 -O3 -flto -lssl -lcrypto -ldl
oneDNN This is a test of Intel oneDNN, an Intel-optimized library for deep neural networks, using its built-in benchdnn functionality. The result is the total perf time reported. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and MKL-DNN before being rebranded as part of the Intel oneAPI toolkit. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org: oneDNN 3.0, Engine: CPU (ms, Fewer Is Better)

Harness: IP Shapes 1D - Data Type: u8s8f32
Znver3: 14.62 (SE +/- 0.34, N = 12, MIN: 8.56)
Znver3 + AVX-512: 14.25 (SE +/- 0.42, N = 12, MIN: 6.05)
Znver4: 14.82 (SE +/- 0.11, N = 3, MIN: 9.81)
Znver4 + Prefer AVX-512: 14.29 (SE +/- 0.24, N = 15, MIN: 8.38)

Harness: IP Shapes 3D - Data Type: u8s8f32
Znver3: 0.907941 (SE +/- 0.011617, N = 3, MIN: 0.75)
Znver3 + AVX-512: 0.902303 (SE +/- 0.002136, N = 3, MIN: 0.75)
Znver4: 0.886642 (SE +/- 0.004516, N = 3, MIN: 0.76)
Znver4 + Prefer AVX-512: 0.881303 (SE +/- 0.004696, N = 3, MIN: 0.74)

Harness: IP Shapes 1D - Data Type: bf16bf16bf16
Znver3: 19.59 (SE +/- 0.82, N = 15, MIN: 9.21)
Znver3 + AVX-512: 22.94 (SE +/- 0.76, N = 15, MIN: 9.35)
Znver4: 21.14 (SE +/- 0.95, N = 15, MIN: 9.61)
Znver4 + Prefer AVX-512: 23.53 (SE +/- 0.56, N = 12, MIN: 10.06)

Harness: IP Shapes 3D - Data Type: bf16bf16bf16
Znver3: 4.52440 (SE +/- 0.04256, N = 15, MIN: 3.03)
Znver3 + AVX-512: 4.33658 (SE +/- 0.03566, N = 9, MIN: 2.98)
Znver4: 4.26600 (SE +/- 0.04728, N = 15, MIN: 2.83)
Znver4 + Prefer AVX-512: 4.33366 (SE +/- 0.04436, N = 15, MIN: 2.77)

Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32
Znver3: 0.392765 (SE +/- 0.000630, N = 3, MIN: 0.28)
Znver3 + AVX-512: 0.392875 (SE +/- 0.004423, N = 3, MIN: 0.28)
Znver4: 0.380486 (SE +/- 0.000892, N = 3, MIN: 0.28)
Znver4 + Prefer AVX-512: 0.388483 (SE +/- 0.003460, N = 3, MIN: 0.28)

Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32
Znver3: 0.947013 (SE +/- 0.009578, N = 5, MIN: 0.79)
Znver3 + AVX-512: 0.936374 (SE +/- 0.011238, N = 4, MIN: 0.79)
Znver4: 0.937164 (SE +/- 0.012915, N = 3, MIN: 0.77)
Znver4 + Prefer AVX-512: 0.954140 (SE +/- 0.009537, N = 3, MIN: 0.78)

Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32
Znver3: 0.279655 (SE +/- 0.002168, N = 15, MIN: 0.23)
Znver3 + AVX-512: 0.276965 (SE +/- 0.003530, N = 3, MIN: 0.24)
Znver4: 0.274769 (SE +/- 0.001957, N = 12, MIN: 0.24)
Znver4 + Prefer AVX-512: 0.274662 (SE +/- 0.001864, N = 3, MIN: 0.24)

Harness: Recurrent Neural Network Training - Data Type: u8s8f32
Znver3: 2108.46 (SE +/- 26.49, N = 3, MIN: 1945.83)
Znver3 + AVX-512: 2099.43 (SE +/- 18.84, N = 15, MIN: 1917.53)
Znver4: 2020.24 (SE +/- 16.34, N = 3, MIN: 1878.54)
Znver4 + Prefer AVX-512: 2123.35 (SE +/- 26.64, N = 12, MIN: 1873.09)

Harness: Convolution Batch Shapes Auto - Data Type: bf16bf16bf16
Znver3: 0.442748 (SE +/- 0.002906, N = 3, MIN: 0.34)
Znver3 + AVX-512: 0.431299 (SE +/- 0.004441, N = 15, MIN: 0.33)
Znver4: 0.443891 (SE +/- 0.001884, N = 3, MIN: 0.37)
Znver4 + Prefer AVX-512: 0.445532 (SE +/- 0.000431, N = 3, MIN: 0.34)

Harness: Deconvolution Batch shapes_1d - Data Type: bf16bf16bf16
Znver3: 2.31858 (SE +/- 0.00532, N = 3, MIN: 1.91)
Znver3 + AVX-512: 2.36166 (SE +/- 0.00584, N = 3, MIN: 1.92)
Znver4: 2.33363 (SE +/- 0.00839, N = 3, MIN: 1.95)
Znver4 + Prefer AVX-512: 2.29880 (SE +/- 0.00909, N = 3, MIN: 1.92)

Harness: Deconvolution Batch shapes_3d - Data Type: bf16bf16bf16
Znver3: 0.656342 (SE +/- 0.006925, N = 4, MIN: 0.54)
Znver3 + AVX-512: 0.647087 (SE +/- 0.003927, N = 3, MIN: 0.56)
Znver4: 0.658101 (SE +/- 0.005902, N = 3, MIN: 0.53)
Znver4 + Prefer AVX-512: 0.650067 (SE +/- 0.001081, N = 3, MIN: 0.53)

Harness: Recurrent Neural Network Inference - Data Type: u8s8f32
Znver3: 2442.35 (SE +/- 32.77, N = 12, MIN: 2123.49)
Znver3 + AVX-512: 2377.17 (SE +/- 26.39, N = 15, MIN: 2080.2)
Znver4: 2405.63 (SE +/- 20.65, N = 15, MIN: 2158.4)
Znver4 + Prefer AVX-512: 2359.29 (SE +/- 31.29, N = 15, MIN: 2066.84)

Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16
Znver3: 2093.75 (SE +/- 27.68, N = 15, MIN: 1826.26)
Znver3 + AVX-512: 2103.41 (SE +/- 15.62, N = 15, MIN: 1883.21)
Znver4: 2134.85 (SE +/- 9.82, N = 3, MIN: 2008.7)
Znver4 + Prefer AVX-512: 2070.13 (SE +/- 16.85, N = 3, MIN: 1938.32)

Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16
Znver3: 2418.28 (SE +/- 24.11, N = 15, MIN: 2149.67)
Znver3 + AVX-512: 2531.42 (SE +/- 16.53, N = 3, MIN: 2281.87)
Znver4: 2479.72 (SE +/- 24.81, N = 3, MIN: 2315.89)
Znver4 + Prefer AVX-512: 2444.52 (SE +/- 25.52, N = 3, MIN: 2258.45)

Harness: Matrix Multiply Batch Shapes Transformer - Data Type: bf16bf16bf16
Znver3: 0.510939 (SE +/- 0.006281, N = 3, MIN: 0.39)
Znver3 + AVX-512: 0.504842 (SE +/- 0.006958, N = 3, MIN: 0.39)
Znver4: 0.500579 (SE +/- 0.004211, N = 3, MIN: 0.39)
Znver4 + Prefer AVX-512: 0.483979 (SE +/- 0.003087, N = 3, MIN: 0.39)

Per-configuration flags for every result above: Znver3: -march=znver3; Znver3 + AVX-512: -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi; none listed for Znver4 and Znver4 + Prefer AVX-512
(CXX) g++ options: -O3 -march=native -flto -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
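Because the oneDNN harnesses above report times on very different scales, a geometric mean of per-test ratios is one common way to summarize them. The sketch below does this for only three of the bf16bf16bf16 harnesses, chosen for brevity, with Znver3 as the baseline; since fewer milliseconds is better, each ratio is baseline time divided by configuration time. The names `TIMES_MS`, `geometric_mean`, and `relative_speed` are illustrative, and the output is not a result from the source page.

```python
import math

CONFIGS = ("Znver3", "Znver3 + AVX-512", "Znver4", "Znver4 + Prefer AVX-512")

# Mean times in ms (fewer is better), in CONFIGS order, for three
# bf16bf16bf16 harnesses from the oneDNN results.
TIMES_MS = {
    "IP Shapes 1D": (19.59, 22.94, 21.14, 23.53),
    "IP Shapes 3D": (4.52440, 4.33658, 4.26600, 4.33366),
    "Matrix Multiply Batch Shapes Transformer": (0.510939, 0.504842, 0.500579, 0.483979),
}


def geometric_mean(values):
    """Geometric mean of a sequence of positive numbers."""
    values = list(values)
    return math.prod(values) ** (1.0 / len(values))


def relative_speed(config_index: int, baseline_index: int = 0) -> float:
    """Geometric mean of baseline_time / config_time; above 1.0 means faster."""
    ratios = [t[baseline_index] / t[config_index] for t in TIMES_MS.values()]
    return geometric_mean(ratios)


for i, name in enumerate(CONFIGS):
    print(f"{name}: {relative_speed(i):.3f}x the speed of Znver3")
```

A different subset of harnesses would give a different summary, so this is a way of reading the data rather than a ranking of the configurations.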
libavif avifenc This is a test of the AOMedia libavif library testing the encoding of a JPEG image to AV1 Image Format (AVIF). Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org: libavif avifenc 0.11 (Seconds, Fewer Is Better)

Encoder Speed: 0
Znver3: 61.66 (SE +/- 0.30, N = 3)
Znver3 + AVX-512: 62.68 (SE +/- 0.02, N = 3)
Znver4: 61.12 (SE +/- 0.13, N = 3)
Znver4 + Prefer AVX-512: 61.07 (SE +/- 0.07, N = 3)

Encoder Speed: 2
Znver3: 33.91 (SE +/- 0.25, N = 3)
Znver3 + AVX-512: 34.23 (SE +/- 0.15, N = 3)
Znver4: 34.10 (SE +/- 0.26, N = 3)
Znver4 + Prefer AVX-512: 33.90 (SE +/- 0.19, N = 3)

Encoder Speed: 6
Znver3: 2.331 (SE +/- 0.010, N = 3)
Znver3 + AVX-512: 2.406 (SE +/- 0.005, N = 3)
Znver4: 2.347 (SE +/- 0.014, N = 3)
Znver4 + Prefer AVX-512: 2.317 (SE +/- 0.002, N = 3)

Encoder Speed: 6, Lossless
Znver3: 4.437 (SE +/- 0.027, N = 3)
Znver3 + AVX-512: 4.514 (SE +/- 0.010, N = 3)
Znver4: 4.398 (SE +/- 0.052, N = 4)
Znver4 + Prefer AVX-512: 4.462 (SE +/- 0.064, N = 3)

Encoder Speed: 10, Lossless
Znver3: 3.581 (SE +/- 0.016, N = 3)
Znver3 + AVX-512: 3.647 (SE +/- 0.038, N = 3)
Znver4: 3.541 (SE +/- 0.012, N = 3)
Znver4 + Prefer AVX-512: 3.572 (SE +/- 0.017, N = 3)

Per-configuration flags for every result above: Znver3: -march=znver3; Znver3 + AVX-512: -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi; Znver4: -march=native; Znver4 + Prefer AVX-512: -march=native
(CXX) g++ options: -O3 -fPIC -flto -lm
Ngspice Ngspice is an open-source SPICE circuit simulator. Ngspice was originally based on the Berkeley SPICE electronic circuit simulator. Ngspice supports basic threading using OpenMP. This test profile is making use of the ISCAS 85 benchmark circuits. Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org: Ngspice 34 (Seconds, Fewer Is Better)

Circuit: C2670
Znver3: 95.18 (SE +/- 0.09, N = 3)
Znver3 + AVX-512: 95.19 (SE +/- 0.24, N = 3)
Znver4: 95.07 (SE +/- 0.76, N = 3)
Znver4 + Prefer AVX-512: 95.60 (SE +/- 0.17, N = 3)

Circuit: C7552
Znver3: 92.94 (SE +/- 0.28, N = 3)
Znver3 + AVX-512: 93.52 (SE +/- 0.28, N = 3)
Znver4: 92.10 (SE +/- 1.00, N = 3)
Znver4 + Prefer AVX-512: 93.45 (SE +/- 0.14, N = 3)

Per-configuration flags for every result above: Znver3: -march=znver3; Znver3 + AVX-512: -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi; Znver4: -march=native; Znver4 + Prefer AVX-512: -march=native
(CC) gcc options: -O3 -flto -fopenmp -lm -lfftw3 -lXaw -lXmu -lXt -lXext -lX11 -lSM -lICE
GPAW GPAW is a density-functional theory (DFT) Python code based on the projector-augmented wave (PAW) method and the atomic simulation environment (ASE). Learn more via the OpenBenchmarking.org test page.
OpenBenchmarking.org: GPAW 22.1, Input: Carbon Nanotube (Seconds, Fewer Is Better)
Znver3: 21.72 (SE +/- 0.26, N = 4)
Znver3 + AVX-512: 22.82 (SE +/- 0.07, N = 3)
Znver4: 22.35 (SE +/- 0.15, N = 3)
Znver4 + Prefer AVX-512: 22.82 (SE +/- 0.28, N = 3)
Per-configuration flags: Znver3: -march=znver3; Znver3 + AVX-512: -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi; Znver4: -march=native; Znver4 + Prefer AVX-512: -march=native
(CC) gcc options: -shared -fwrapv -O2 -O3 -flto -lxc -lblas -lmpi
Znver4
Kernel Notes: Transparent Huge Pages: madvise
Environment Notes: CXXFLAGS="-O3 -march=native -flto" CFLAGS="-O3 -march=native -flto"
Compiler Notes: --disable-multilib --enable-checking=release
Processor Notes: Scaling Governor: amd-pstate performance (Boost: Enabled) - CPU Microcode: 0xa10110d
Python Notes: Python 3.10.9
Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: always-on RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
Testing initiated at 3 January 2023 08:40 by user phoronix.
Znver4 + Prefer AVX-512
Kernel Notes: Transparent Huge Pages: madvise
Environment Notes: CXXFLAGS="-O3 -march=native -mprefer-vector-width=512 -flto" CFLAGS="-O3 -march=native -mprefer-vector-width=512 -flto"
Compiler Notes: --disable-multilib --enable-checking=release
Processor Notes: Scaling Governor: amd-pstate performance (Boost: Enabled) - CPU Microcode: 0xa10110d
Python Notes: Python 3.10.9
Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: always-on RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
Testing initiated at 3 January 2023 18:42 by user phoronix.
Znver3
Kernel Notes: Transparent Huge Pages: madvise
Environment Notes: CXXFLAGS="-O3 -march=znver3 -flto" CFLAGS="-O3 -march=znver3 -flto"
Compiler Notes: --disable-multilib --enable-checking=release
Processor Notes: Scaling Governor: amd-pstate performance (Boost: Enabled) - CPU Microcode: 0xa10110d
Python Notes: Python 3.10.9
Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: always-on RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
Testing initiated at 4 January 2023 05:51 by user phoronix.
Znver3 + AVX-512 Processor: 2 x AMD EPYC 9654 96-Core @ 3.71GHz (192 Cores / 384 Threads), Motherboard: AMD Titanite_4G (RTI1002E BIOS), Chipset: AMD Device 14a4, Memory: 1520GB, Disk: 800GB INTEL SSDPF21Q800GB, Graphics: ASPEED, Monitor: VGA HDMI, Network: Broadcom NetXtreme BCM5720 PCIe
OS: Ubuntu 23.04, Kernel: 5.19.0-21-generic (x86_64), Desktop: GNOME Shell 43.1, Display Server: X Server 1.21.1.4, Vulkan: 1.3.224, Compiler: GCC 13.0.0 20230103, File-System: ext4, Screen Resolution: 1920x1080
Kernel Notes: Transparent Huge Pages: madvise
Environment Notes: CXXFLAGS="-O3 -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi -flto" CFLAGS="-O3 -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi -flto"
Compiler Notes: --disable-multilib --enable-checking=release
Processor Notes: Scaling Governor: amd-pstate performance (Boost: Enabled) - CPU Microcode: 0xa10110d
Python Notes: Python 3.10.9
Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: always-on RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
Testing initiated at 4 January 2023 13:46 by user phoronix.
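The four configurations differ only in their CFLAGS/CXXFLAGS, so a small sketch can make the differences explicit. The flag strings below are copied from the Environment Notes of each run; the `CFLAGS` dictionary and the `added_flags` helper are illustrative names and nothing here invokes the compiler.

```python
# CFLAGS per configuration, copied from the Environment Notes of each run.
CFLAGS = {
    "Znver3": "-O3 -march=znver3 -flto",
    "Znver3 + AVX-512": (
        "-O3 -march=znver3 -mavx512f -mavx512cd -mavx512vl -mavx512bw "
        "-mavx512dq -mavx512ifma -mavx512vbmi -flto"
    ),
    "Znver4": "-O3 -march=native -flto",
    "Znver4 + Prefer AVX-512": "-O3 -march=native -mprefer-vector-width=512 -flto",
}


def added_flags(config: str, baseline: str) -> list:
    """Flags present in `config` but not in `baseline`, in original order."""
    base = set(CFLAGS[baseline].split())
    return [flag for flag in CFLAGS[config].split() if flag not in base]


print("Znver3 + AVX-512 adds:", added_flags("Znver3 + AVX-512", "Znver3"))
print("Znver4 + Prefer AVX-512 adds:", added_flags("Znver4 + Prefer AVX-512", "Znver4"))
```

This shows that the Znver3 + AVX-512 run layers seven explicit AVX-512 feature flags on top of -march=znver3, while the Znver4 + Prefer AVX-512 run adds only -mprefer-vector-width=512 to -march=native.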