EPYC 7502 AOCC 2.3 Compiler Comparison
AMD EPYC 7502 testing of various benchmarks under AMD AOCC 2.3, GCC 10.2, and LLVM Clang 11, with CFLAGS/CXXFLAGS of "-O3 -march=znver2" throughout. Benchmarks by Michael Larabel for a future article.
HTML result view exported from: https://openbenchmarking.org/result/2012080-HA-EPYC7502A97&gru&sro .
System details (configurations: GCC 10.2, LLVM Clang 11, AMD AOCC 2.3)
    Processor: AMD EPYC 7502 32-Core @ 2.50GHz (32 Cores / 64 Threads)
    Motherboard: ASRockRack EPYCD8 (P2.10 BIOS)
    Chipset: AMD Starship/Matisse
    Memory: 126GB
    Disk: 280GB INTEL SSDPED1D280GA
    Graphics: ASPEED
    Audio: AMD Starship/Matisse
    Monitor: VE228
    Network: 2 x Intel I350
    OS: Ubuntu 20.10
    Kernel: 5.8.0-31-generic (x86_64)
    Desktop: GNOME Shell 3.38.1
    Display Server: X Server 1.20.9
    Display Driver: modesetting 1.20.9
    Compiler: GCC 10.2.0 (GCC 10.2); Clang 11.0.0-2 (LLVM Clang 11); Target: Clang 11.0.0 (AMD AOCC 2.3, as reported)
    File-System: ext4
    Screen Resolution: 1920x1080
Environment Details
    CXXFLAGS="-O3 -march=znver2" CFLAGS="-O3 -march=znver2"
Compiler Details
    GCC 10.2: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-10-JvwpWM/gcc-10-10.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-10-JvwpWM/gcc-10-10.2.0/debian/tmp-gcn/usr,hsa --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
    AMD AOCC 2.3: Optimized build with assertions; Default target: x86_64-unknown-linux-gnu; Host CPU: znver2
Processor Details
    Scaling Governor: acpi-cpufreq ondemand (Boost: Enabled)
    CPU Microcode: 0x830101c
Security Details
    itlb_multihit: Not affected
    l1tf: Not affected
    mds: Not affected
    meltdown: Not affected
    spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp
    spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
    spectre_v2: Mitigation of Full AMD retpoline IBPB: conditional IBRS_FW STIBP: always-on RSB filling
    srbds: Not affected
    tsx_async_abort: Not affected
Result overview
Test | GCC 10.2 | LLVM Clang 11 | AMD AOCC 2.3
dav1d: Chimera 1080p | 564.96 | 572.19 | 575.22
dav1d: Summer Nature 4K | 269.72 | 275.41 | 274.20
dav1d: Summer Nature 1080p | 567.42 | 584.65 | 588.62
dav1d: Chimera 1080p 10-bit | 143.02 | 92.56 | 122.39
svt-av1: Enc Mode 0 - 1080p | 0.103 | 0.145 | 0.146
svt-av1: Enc Mode 4 - 1080p | 6.775 | 8.591 | 8.657
svt-av1: Enc Mode 8 - 1080p | 55.713 | 70.229 | 70.698
svt-vp9: VMAF Optimized - Bosphorus 1080p | 354.98 | 363.72 | 366.61
svt-vp9: PSNR/SSIM Optimized - Bosphorus 1080p | 348.24 | 365.57 | 368.17
svt-vp9: Visual Quality Optimized - Bosphorus 1080p | 279.73 | 286.24 | 291.22
vpxenc: Speed 0 | 6.27 | 5.96 | 6.65
vpxenc: Speed 5 | 19.03 | 18.23 | 20.12
x264: H.264 Video Encoding | 149.35 | 146.43 | 151.92
x265: Bosphorus 4K | 22.83 | 22.44 | 23.69
x265: Bosphorus 1080p | 49.05 | 50.02 | 50.60
graphics-magick: Swirl | 1295 | 1323 | 1368
graphics-magick: Rotate | 535 | 527 | 525
graphics-magick: Sharpen | 434 | 384 | 316
graphics-magick: Enhanced | 590 | 627 | 526
graphics-magick: Resizing | 2092 | 1827 | 1653
compress-lz4: 1 - Compression Speed | 9459.69 | 9838.54 | 9780.39
compress-lz4: 3 - Compression Speed | 45.57 | 48.40 | 48.52
compress-lz4: 9 - Compression Speed | 44.72 | 44.30 | 45.76
compress-zstd: 3 | 7849.6 | 7866.2 | 7937.8
compress-zstd: 19 | 109.9 | 111.2 | 114.6
tjbench: Decompression Throughput | 171.888797 | 174.991039 | 176.873794
scimark2: Composite | 2759.35 | 2673.79 | 2779.62
cryptopp: Unkeyed Algorithms | 305.900410 | 314.389914 | 312.637840
libraw: Post-Processing Benchmark | 52.54 | 36.98 | 38.47
tscp: AI Chess Performance | 1007283 | 1143642 | 1138442
stockfish: Total Time | 58347386 | 62434784 | 62375605
hint: FLOAT | 291925951.64210 | 292874904.53733 | 294314450.61706
redis: LPUSH | 1212068.06 | 1304842.23 | 1380024.29
redis: GET | 1809976.83 | 2122749.43 | 1874175.91
redis: SET | 1369757.13 | 1483322.31 | 1446872.35
nginx: Static Web Page Serving | 30658.67 | 30676.68 | 31368.86
openssl: RSA 4096-bit Performance | 7395.4 | 5412.5 | 5413.5
daphne: OpenMP - NDT Mapping | 874.54 | 945.32 | 923.68
daphne: OpenMP - Points2Image | 18452.623694826 | 11946.566719313 | 13720.155459109
daphne: OpenMP - Euclidean Cluster | 890.71 | 674.01 | 678.51
pgbench: 1 - 1 - Read Only | 28919 | 28921 | 29645
pgbench: 1 - 1 - Read Write | 3739 | 3772 | 3865
pgbench: 1 - 50 - Read Only | 521332 | 530684 | 541314
pgbench: 1 - 50 - Read Write | 3453 | 3443 | 3413
webp: Quality 100, Lossless | 20.799 | 20.719 | 20.904
webp: Quality 100, Highest Compression | 8.861 | 7.655 | 7.546
webp: Quality 100, Lossless, Highest Compression | 44.383 | 42.377 | 42.704
onednn: IP Batch 1D - f32 - CPU | 1.67503 | 1.72247 | 1.40879
onednn: IP Batch 1D - u8s8f32 - CPU | 1.17556 | 1.03137 | 1.00926
onednn: IP Batch All - u8s8f32 - CPU | 13.8793 | 13.4649 | 11.9233
onednn: Deconvolution Batch deconv_1d - f32 - CPU | 1.95695 | 1.69390 | 1.66772
onednn: Deconvolution Batch deconv_3d - f32 - CPU | 3.68073 | 3.72354 | 3.66096
onednn: Deconvolution Batch deconv_1d - u8s8f32 - CPU | 2.17957 | 2.05539 | 1.99141
onednn: Deconvolution Batch deconv_3d - u8s8f32 - CPU | 1.94380 | 1.98927 | 1.87485
onednn: Recurrent Neural Network Training - f32 - CPU | 254.610 | 172.513 | 147.523
onednn: Recurrent Neural Network Inference - f32 - CPU | 79.2495 | 64.9912 | 30.7182
onednn: Matrix Multiply Batch Shapes Transformer - f32 - CPU | 0.532944 | 0.575864 | 0.464235
pgbench: 1 - 1 - Read Only - Average Latency | 0.035 | 0.035 | 0.034
pgbench: 1 - 1 - Read Write - Average Latency | 0.268 | 0.265 | 0.259
pgbench: 1 - 50 - Read Only - Average Latency | 0.096 | 0.094 | 0.092
pgbench: 1 - 50 - Read Write - Average Latency | 14.488 | 14.528 | 14.655
ncnn: CPU - squeezenet | 17.29 | 15.35 | 14.38
ncnn: CPU - mobilenet | 19.46 | 17.85 | 17.02
ncnn: CPU-v2-v2 - mobilenet-v2 | 9.37 | 6.88 | 5.62
ncnn: CPU-v3-v3 - mobilenet-v3 | 8.88 | 6.87 | 5.20
ncnn: CPU - shufflenet-v2 | 10.59 | 7.35 | 6.25
ncnn: CPU - mnasnet | 8.95 | 6.26 | 5.01
ncnn: CPU - efficientnet-b0 | 11.32 | 8.77 | 6.94
ncnn: CPU - blazeface | 3.99 | 2.84 | 2.19
ncnn: CPU - googlenet | 20.85 | 18.88 | 16.51
ncnn: CPU - vgg16 | 30.89 | 36.60 | 32.03
ncnn: CPU - resnet18 | 13.12 | 13.19 | 11.99
ncnn: CPU - alexnet | 9.46 | 10.83 | 8.94
ncnn: CPU - resnet50 | 23.70 | 22.23 | 19.70
ncnn: CPU - yolov4-tiny | 29.47 | 30.70 | 29.38
tnn: CPU - MobileNet v2 | 324.172 | 392.204 | 365.390
tnn: CPU - SqueezeNet v1.1 | 305.134 | 311.682 | 304.178
mrbayes: Primate Phylogeny Analysis | 94.284 | 93.841 | 90.759
c-ray: Total Time - 4K, 16 Rays Per Pixel | 18.922 | 30.646 | 33.275
aobench: 2048 x 2048 - Total Time | 35.783 | 41.655 | 37.758
encode-mp3: WAV To MP3 | 8.798 | 10.078 | 11.026
rnnoise | 21.723 | 21.540 | 20.756
astcenc: Medium | 7.00 | 6.05 | 5.81
astcenc: Thorough | 9.61 | 8.32 | 8.16
astcenc: Exhaustive | 73.46 | 67.17 | 66.04
basis: UASTC Level 2 | 16.507 | 16.485 | 16.315
basis: UASTC Level 3 | 25.522 | 25.321 | 25.210
basis: UASTC Level 2 + RDO Post-Processing | 755.517 | 837.541 | 833.536
sqlite-speedtest: Timed Time - Size 1,000 | 75.140 | 77.642 | 78.067
dav1d 0.7.0 - Video Input: Chimera 1080p (FPS, More Is Better)
    AMD AOCC 2.3: 575.22 (SE +/- 0.49, N = 3; MIN: 414.12 / MAX: 729.95)
    GCC 10.2: 564.96 (SE +/- 1.05, N = 3; MIN: 399.64 / MAX: 724.27)
    LLVM Clang 11: 572.19 (SE +/- 0.97, N = 3; MIN: 404.8 / MAX: 726.7)
    1. (CC) gcc options: -O3 -march=znver2 -pthread
dav1d 0.7.0 - Video Input: Summer Nature 4K (FPS, More Is Better)
    AMD AOCC 2.3: 274.20 (SE +/- 1.22, N = 3; MIN: 151.98 / MAX: 293.44)
    GCC 10.2: 269.72 (SE +/- 0.57, N = 3; MIN: 160.12 / MAX: 288.9)
    LLVM Clang 11: 275.41 (SE +/- 1.08, N = 3; MIN: 155.99 / MAX: 295.35)
    1. (CC) gcc options: -O3 -march=znver2 -pthread
dav1d 0.7.0 - Video Input: Summer Nature 1080p (FPS, More Is Better)
    AMD AOCC 2.3: 588.62 (SE +/- 2.09, N = 3; MIN: 345.64 / MAX: 651.08)
    GCC 10.2: 567.42 (SE +/- 0.36, N = 3; MIN: 337.19 / MAX: 625.71)
    LLVM Clang 11: 584.65 (SE +/- 0.95, N = 3; MIN: 337.56 / MAX: 641.35)
    1. (CC) gcc options: -O3 -march=znver2 -pthread
dav1d 0.7.0 - Video Input: Chimera 1080p 10-bit (FPS, More Is Better)
    AMD AOCC 2.3: 122.39 (SE +/- 0.34, N = 3; MIN: 85.78 / MAX: 202.39)
    GCC 10.2: 143.02 (SE +/- 0.18, N = 3; MIN: 98.76 / MAX: 246.17)
    LLVM Clang 11: 92.56 (SE +/- 0.26, N = 3; MIN: 61.05 / MAX: 158.66)
    1. (CC) gcc options: -O3 -march=znver2 -pthread
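
As a rough illustration of how the dav1d figures above compare, the following Python sketch expresses each result as a percentage difference from GCC 10.2. The FPS values are copied from the results above; using GCC 10.2 as the baseline and the helper name `percent_vs_baseline` are assumptions made for illustration and are not part of the exported result file.

```python
# dav1d 0.7.0 results in FPS (more is better), copied from the results above.
DAV1D_FPS = {
    "Chimera 1080p": {"AMD AOCC 2.3": 575.22, "GCC 10.2": 564.96, "LLVM Clang 11": 572.19},
    "Summer Nature 4K": {"AMD AOCC 2.3": 274.20, "GCC 10.2": 269.72, "LLVM Clang 11": 275.41},
    "Summer Nature 1080p": {"AMD AOCC 2.3": 588.62, "GCC 10.2": 567.42, "LLVM Clang 11": 584.65},
    "Chimera 1080p 10-bit": {"AMD AOCC 2.3": 122.39, "GCC 10.2": 143.02, "LLVM Clang 11": 92.56},
}


def percent_vs_baseline(values, baseline="GCC 10.2"):
    """Hypothetical helper: percent difference of each result from the baseline."""
    base = values[baseline]
    return {name: (value / base - 1.0) * 100.0 for name, value in values.items()}


if __name__ == "__main__":
    for test, values in DAV1D_FPS.items():
        deltas = percent_vs_baseline(values)
        summary = ", ".join(f"{name}: {delta:+.1f}%" for name, delta in deltas.items())
        print(f"{test}: {summary}")
```

For throughput metrics such as FPS, a positive percentage means the compiler produced a faster binary than the chosen baseline; the 10-bit Chimera case is the one dav1d input here where both Clang-based compilers fall behind GCC.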
SVT-AV1 0.8 - Encoder Mode: Enc Mode 0 - Input: 1080p (Frames Per Second, More Is Better)
    AMD AOCC 2.3: 0.146 (SE +/- 0.000, N = 3)
    GCC 10.2: 0.103 (SE +/- 0.000, N = 3)
    LLVM Clang 11: 0.145 (SE +/- 0.000, N = 3)
    1. (CXX) g++ options: -O3 -fcommon -fPIE -fPIC -pie
SVT-AV1 0.8 - Encoder Mode: Enc Mode 4 - Input: 1080p (Frames Per Second, More Is Better)
    AMD AOCC 2.3: 8.657 (SE +/- 0.042, N = 3)
    GCC 10.2: 6.775 (SE +/- 0.025, N = 3)
    LLVM Clang 11: 8.591 (SE +/- 0.026, N = 3)
    1. (CXX) g++ options: -O3 -fcommon -fPIE -fPIC -pie
SVT-AV1 0.8 - Encoder Mode: Enc Mode 8 - Input: 1080p (Frames Per Second, More Is Better)
    AMD AOCC 2.3: 70.70 (SE +/- 0.46, N = 3)
    GCC 10.2: 55.71 (SE +/- 0.29, N = 3)
    LLVM Clang 11: 70.23 (SE +/- 0.30, N = 3)
    1. (CXX) g++ options: -O3 -fcommon -fPIE -fPIC -pie
SVT-VP9 0.1 - Tuning: VMAF Optimized - Input: Bosphorus 1080p (Frames Per Second, More Is Better)
    AMD AOCC 2.3: 366.61 (SE +/- 1.39, N = 3)
    GCC 10.2: 354.98 (SE +/- 2.15, N = 3)
    LLVM Clang 11: 363.72 (SE +/- 1.70, N = 3)
    1. (CC) gcc options: -O3 -fcommon -fPIE -fPIC -fvisibility=hidden -pie -rdynamic -lpthread -lrt -lm
SVT-VP9 0.1 - Tuning: PSNR/SSIM Optimized - Input: Bosphorus 1080p (Frames Per Second, More Is Better)
    AMD AOCC 2.3: 368.17 (SE +/- 0.54, N = 3)
    GCC 10.2: 348.24 (SE +/- 1.42, N = 3)
    LLVM Clang 11: 365.57 (SE +/- 1.74, N = 3)
    1. (CC) gcc options: -O3 -fcommon -fPIE -fPIC -fvisibility=hidden -pie -rdynamic -lpthread -lrt -lm
SVT-VP9 0.1 - Tuning: Visual Quality Optimized - Input: Bosphorus 1080p (Frames Per Second, More Is Better)
    AMD AOCC 2.3: 291.22 (SE +/- 0.90, N = 3)
    GCC 10.2: 279.73 (SE +/- 1.18, N = 3)
    LLVM Clang 11: 286.24 (SE +/- 1.68, N = 3)
    1. (CC) gcc options: -O3 -fcommon -fPIE -fPIC -fvisibility=hidden -pie -rdynamic -lpthread -lrt -lm
VP9 libvpx Encoding 1.8.2 - Speed: Speed 0 (Frames Per Second, More Is Better)
    AMD AOCC 2.3: 6.65 (SE +/- 0.00, N = 3)
    GCC 10.2: 6.27 (SE +/- 0.02, N = 3)
    LLVM Clang 11: 5.96 (SE +/- 0.09, N = 3)
    1. (CXX) g++ options: -m64 -lm -lpthread -O3 -march=znver2 -fPIC -U_FORTIFY_SOURCE -std=c++11
VP9 libvpx Encoding 1.8.2 - Speed: Speed 5 (Frames Per Second, More Is Better)
    AMD AOCC 2.3: 20.12 (SE +/- 0.10, N = 3)
    GCC 10.2: 19.03 (SE +/- 0.04, N = 3)
    LLVM Clang 11: 18.23 (SE +/- 0.07, N = 3)
    1. (CXX) g++ options: -m64 -lm -lpthread -O3 -march=znver2 -fPIC -U_FORTIFY_SOURCE -std=c++11
x264 2019-12-17 - H.264 Video Encoding (Frames Per Second, More Is Better)
    AMD AOCC 2.3: 151.92 (SE +/- 0.67, N = 3)
    GCC 10.2: 149.35 (SE +/- 1.24, N = 3)
    LLVM Clang 11: 146.43 (SE +/- 0.59, N = 3)
    Additional option listed for two of the three builds: -mstack-alignment=64
    1. (CC) gcc options: -ldl -m64 -lm -lpthread -O3 -ffast-math -march=znver2 -std=gnu99 -fPIC -fomit-frame-pointer -fno-tree-vectorize
x265 3.4 - Video Input: Bosphorus 4K (Frames Per Second, More Is Better)
    AMD AOCC 2.3: 23.69 (SE +/- 0.05, N = 3)
    GCC 10.2: 22.83 (SE +/- 0.06, N = 3)
    LLVM Clang 11: 22.44 (SE +/- 0.03, N = 3)
    1. (CXX) g++ options: -O3 -march=znver2 -rdynamic -lpthread -lrt -ldl -lnuma
x265 3.4 - Video Input: Bosphorus 1080p (Frames Per Second, More Is Better)
    AMD AOCC 2.3: 50.60 (SE +/- 0.14, N = 3)
    GCC 10.2: 49.05 (SE +/- 0.13, N = 3)
    LLVM Clang 11: 50.02 (SE +/- 0.09, N = 3)
    1. (CXX) g++ options: -O3 -march=znver2 -rdynamic -lpthread -lrt -ldl -lnuma
GraphicsMagick 1.3.33 - Operation: Swirl (Iterations Per Minute, More Is Better)
    AMD AOCC 2.3: 1368 (SE +/- 4.26, N = 3)
    GCC 10.2: 1295 (SE +/- 1.00, N = 3)
    LLVM Clang 11: 1323 (SE +/- 2.40, N = 3)
    1. (CC) gcc options: -fopenmp -O3 -march=znver2 -pthread -ljpeg -lX11 -lz -lm -lpthread
GraphicsMagick 1.3.33 - Operation: Rotate (Iterations Per Minute, More Is Better)
    AMD AOCC 2.3: 525
    GCC 10.2: 535
    LLVM Clang 11: 527
    Standard errors listed for only two of the three results: SE +/- 1.00, N = 3; SE +/- 5.24, N = 3
    1. (CC) gcc options: -fopenmp -O3 -march=znver2 -pthread -ljpeg -lX11 -lz -lm -lpthread
GraphicsMagick 1.3.33 - Operation: Sharpen (Iterations Per Minute, More Is Better)
    AMD AOCC 2.3: 316
    GCC 10.2: 434
    LLVM Clang 11: 384
    1. (CC) gcc options: -fopenmp -O3 -march=znver2 -pthread -ljpeg -lX11 -lz -lm -lpthread
GraphicsMagick 1.3.33 - Operation: Enhanced (Iterations Per Minute, More Is Better)
    AMD AOCC 2.3: 526
    GCC 10.2: 590
    LLVM Clang 11: 627
    1. (CC) gcc options: -fopenmp -O3 -march=znver2 -pthread -ljpeg -lX11 -lz -lm -lpthread
GraphicsMagick 1.3.33 - Operation: Resizing (Iterations Per Minute, More Is Better)
    AMD AOCC 2.3: 1653 (SE +/- 22.27, N = 3)
    GCC 10.2: 2092 (SE +/- 17.89, N = 3)
    LLVM Clang 11: 1827 (SE +/- 10.17, N = 3)
    1. (CC) gcc options: -fopenmp -O3 -march=znver2 -pthread -ljpeg -lX11 -lz -lm -lpthread
LZ4 Compression 1.9.3 - Compression Level: 1 - Compression Speed (MB/s, More Is Better)
    AMD AOCC 2.3: 9780.39 (SE +/- 45.16, N = 3)
    GCC 10.2: 9459.69 (SE +/- 56.41, N = 3)
    LLVM Clang 11: 9838.54 (SE +/- 55.02, N = 3)
    1. (CC) gcc options: -O3
LZ4 Compression 1.9.3 - Compression Level: 3 - Compression Speed (MB/s, More Is Better)
    AMD AOCC 2.3: 48.52 (SE +/- 0.00, N = 3)
    GCC 10.2: 45.57 (SE +/- 0.66, N = 3)
    LLVM Clang 11: 48.40 (SE +/- 0.04, N = 3)
    1. (CC) gcc options: -O3
LZ4 Compression 1.9.3 - Compression Level: 9 - Compression Speed (MB/s, More Is Better)
    AMD AOCC 2.3: 45.76 (SE +/- 0.03, N = 3)
    GCC 10.2: 44.72 (SE +/- 0.60, N = 3)
    LLVM Clang 11: 44.30 (SE +/- 0.02, N = 3)
    1. (CC) gcc options: -O3
Zstd Compression 1.4.5 - Compression Level: 3 (MB/s, More Is Better)
    AMD AOCC 2.3: 7937.8 (SE +/- 3.73, N = 3)
    GCC 10.2: 7849.6 (SE +/- 6.11, N = 3)
    LLVM Clang 11: 7866.2 (SE +/- 30.93, N = 3)
    1. (CC) gcc options: -O3 -march=znver2 -pthread -lz
Zstd Compression 1.4.5 - Compression Level: 19 (MB/s, More Is Better)
    AMD AOCC 2.3: 114.6 (SE +/- 0.26, N = 3)
    GCC 10.2: 109.9 (SE +/- 0.23, N = 3)
    LLVM Clang 11: 111.2 (SE +/- 0.30, N = 3)
    1. (CC) gcc options: -O3 -march=znver2 -pthread -lz
libjpeg-turbo tjbench 2.0.2 - Test: Decompression Throughput (Megapixels/sec, More Is Better)
    AMD AOCC 2.3: 176.87 (SE +/- 0.02, N = 3)
    GCC 10.2: 171.89 (SE +/- 0.04, N = 3)
    LLVM Clang 11: 174.99 (SE +/- 0.21, N = 3)
    1. (CC) gcc options: -O3 -march=znver2 -rdynamic
SciMark 2.0 - Computational Test: Composite (Mflops, More Is Better)
    AMD AOCC 2.3: 2779.62 (SE +/- 43.99, N = 3)
    GCC 10.2: 2759.35 (SE +/- 14.97, N = 3)
    LLVM Clang 11: 2673.79 (SE +/- 7.04, N = 3)
    1. (CC) gcc options: -O3 -march=znver2 -lm
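
Some gaps between compilers are small relative to the reported standard errors. The Python sketch below divides the gap between two means by their combined standard error, using the SciMark 2.0 Composite values above. It assumes the SE values are listed in the same order as the results (AMD AOCC 2.3, GCC 10.2, LLVM Clang 11); the helper name `difference_in_se_units` is hypothetical. With only N = 3 runs per result this is a rough heuristic, not a formal significance test.

```python
import math


def difference_in_se_units(mean_a, se_a, mean_b, se_b):
    """Hypothetical helper: gap between two means divided by their combined standard error."""
    combined_se = math.sqrt(se_a ** 2 + se_b ** 2)
    return (mean_a - mean_b) / combined_se


# SciMark 2.0 Composite in Mflops as (mean, SE), copied from the results above.
# Assumption: SE values appear in the same order as the results.
AOCC = (2779.62, 43.99)
GCC = (2759.35, 14.97)
CLANG = (2673.79, 7.04)

if __name__ == "__main__":
    print(f"AOCC vs GCC:  {difference_in_se_units(*AOCC, *GCC):.2f} combined SEs")
    print(f"GCC vs Clang: {difference_in_se_units(*GCC, *CLANG):.2f} combined SEs")
```

A ratio well below 1 suggests the two results are hard to distinguish from run-to-run noise, while a ratio of several combined standard errors suggests a more robust gap.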
Crypto++ 8.2 - Test: Unkeyed Algorithms (MiB/second, More Is Better)
    AMD AOCC 2.3: 312.64 (SE +/- 0.16, N = 3)
    GCC 10.2: 305.90 (SE +/- 0.09, N = 3)
    LLVM Clang 11: 314.39 (SE +/- 0.20, N = 3)
    1. (CXX) g++ options: -O3 -march=znver2 -fPIC -pthread -pipe
LibRaw 0.20 - Post-Processing Benchmark (Mpix/sec, More Is Better)
    AMD AOCC 2.3: 38.47 (SE +/- 0.10, N = 3)
    GCC 10.2: 52.54 (SE +/- 0.16, N = 3)
    LLVM Clang 11: 36.98 (SE +/- 0.13, N = 3)
    1. (CXX) g++ options: -O3 -march=znver2 -fopenmp -ljpeg -lz -lm
TSCP 1.81 - AI Chess Performance (Nodes Per Second, More Is Better)
    AMD AOCC 2.3: 1138442 (SE +/- 471.20, N = 5)
    GCC 10.2: 1007283 (SE +/- 1467.20, N = 5)
    LLVM Clang 11: 1143642 (SE +/- 582.00, N = 5)
    1. (CC) gcc options: -O3 -march=znver2 -march=native
Stockfish 12 - Total Time (Nodes Per Second, More Is Better)
    AMD AOCC 2.3: 62375605 (SE +/- 379890.17, N = 3; -flto=thin)
    GCC 10.2: 58347386 (SE +/- 872585.44, N = 3; -flto -flto=jobserver)
    LLVM Clang 11: 62434784 (SE +/- 877956.02, N = 4; -flto=thin)
    1. (CXX) g++ options: -m64 -lpthread -O3 -march=znver2 -fno-exceptions -std=c++17 -pedantic -msse -msse3 -mpopcnt -msse4.1 -mssse3 -msse2
Hierarchical INTegration 1.0 - Test: FLOAT (QUIPs, More Is Better)
    AMD AOCC 2.3: 294314450.62 (SE +/- 30193.74, N = 3)
    GCC 10.2: 291925951.64 (SE +/- 15707.07, N = 3)
    LLVM Clang 11: 292874904.54 (SE +/- 170353.62, N = 3)
    1. (CC) gcc options: -O3 -march=znver2 -march=native -lm
Redis 6.0.9 - Test: LPUSH (Requests Per Second, More Is Better)
    AMD AOCC 2.3: 1380024.29 (SE +/- 18763.42, N = 3)
    GCC 10.2: 1212068.06 (SE +/- 21030.89, N = 15)
    LLVM Clang 11: 1304842.23 (SE +/- 22719.73, N = 15)
    1. (CXX) g++ options: -MM -MT -g3 -fvisibility=hidden -O3 -march=znver2
Redis 6.0.9 - Test: GET (Requests Per Second, More Is Better)
    AMD AOCC 2.3: 1874175.91 (SE +/- 30693.11, N = 15)
    GCC 10.2: 1809976.83 (SE +/- 18838.90, N = 3)
    LLVM Clang 11: 2122749.43 (SE +/- 49885.99, N = 15)
    1. (CXX) g++ options: -MM -MT -g3 -fvisibility=hidden -O3 -march=znver2
Redis 6.0.9 - Test: SET (Requests Per Second, More Is Better)
    AMD AOCC 2.3: 1446872.35 (SE +/- 27170.26, N = 15)
    GCC 10.2: 1369757.13 (SE +/- 24810.19, N = 15)
    LLVM Clang 11: 1483322.31 (SE +/- 22282.05, N = 15)
    1. (CXX) g++ options: -MM -MT -g3 -fvisibility=hidden -O3 -march=znver2
NGINX Benchmark 1.9.9 - Static Web Page Serving (Requests Per Second, More Is Better)
    AMD AOCC 2.3: 31368.86 (SE +/- 254.59, N = 15)
    GCC 10.2: 30658.67 (SE +/- 381.73, N = 4)
    LLVM Clang 11: 30676.68 (SE +/- 159.74, N = 3)
    1. (CC) gcc options: -lpthread -lcrypt -lcrypto -lz -O3 -march=native -march=znver2
OpenSSL 1.1.1 - RSA 4096-bit Performance (Signs Per Second, More Is Better)
    AMD AOCC 2.3: 5413.5 (SE +/- 0.64, N = 3)
    GCC 10.2: 7395.4 (SE +/- 0.73, N = 3)
    LLVM Clang 11: 5412.5 (SE +/- 1.37, N = 3)
    Additional option listed for two of the three builds: -Qunused-arguments
    1. (CC) gcc options: -pthread -m64 -O3 -march=znver2 -lssl -lcrypto -ldl
Darmstadt Automotive Parallel Heterogeneous Suite - Backend: OpenMP - Kernel: NDT Mapping (Test Cases Per Minute, More Is Better)
    AMD AOCC 2.3: 923.68 (SE +/- 1.07, N = 3)
    GCC 10.2: 874.54 (SE +/- 2.97, N = 3)
    LLVM Clang 11: 945.32 (SE +/- 2.29, N = 3)
    1. (CXX) g++ options: -O3 -std=c++11 -fopenmp
Darmstadt Automotive Parallel Heterogeneous Suite - Backend: OpenMP - Kernel: Points2Image (Test Cases Per Minute, More Is Better)
    AMD AOCC 2.3: 13720.16 (SE +/- 163.63, N = 6)
    GCC 10.2: 18452.62 (SE +/- 140.90, N = 3)
    LLVM Clang 11: 11946.57 (SE +/- 96.60, N = 3)
    1. (CXX) g++ options: -O3 -std=c++11 -fopenmp
Darmstadt Automotive Parallel Heterogeneous Suite - Backend: OpenMP - Kernel: Euclidean Cluster (Test Cases Per Minute, More Is Better)
    AMD AOCC 2.3: 678.51 (SE +/- 0.47, N = 3)
    GCC 10.2: 890.71 (SE +/- 0.77, N = 3)
    LLVM Clang 11: 674.01 (SE +/- 2.53, N = 3)
    1. (CXX) g++ options: -O3 -std=c++11 -fopenmp
PostgreSQL pgbench 13.0 - Scaling Factor: 1 - Clients: 1 - Mode: Read Only (TPS, More Is Better)
    AMD AOCC 2.3: 29645 (SE +/- 73.04, N = 3)
    GCC 10.2: 28919 (SE +/- 137.51, N = 3)
    LLVM Clang 11: 28921 (SE +/- 252.07, N = 3)
    1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=znver2 -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm
PostgreSQL pgbench 13.0 - Scaling Factor: 1 - Clients: 1 - Mode: Read Write (TPS, More Is Better)
    AMD AOCC 2.3: 3865 (SE +/- 18.01, N = 3)
    GCC 10.2: 3739 (SE +/- 46.08, N = 5)
    LLVM Clang 11: 3772 (SE +/- 3.59, N = 3)
    1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=znver2 -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm
PostgreSQL pgbench 13.0 - Scaling Factor: 1 - Clients: 50 - Mode: Read Only (TPS, More Is Better)
    AMD AOCC 2.3: 541314 (SE +/- 798.75, N = 3)
    GCC 10.2: 521332 (SE +/- 4440.25, N = 3)
    LLVM Clang 11: 530684 (SE +/- 4278.21, N = 3)
    1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=znver2 -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm
PostgreSQL pgbench 13.0 - Scaling Factor: 1 - Clients: 50 - Mode: Read Write (TPS, More Is Better)
    AMD AOCC 2.3: 3413 (SE +/- 5.71, N = 3)
    GCC 10.2: 3453 (SE +/- 5.86, N = 3)
    LLVM Clang 11: 3443 (SE +/- 4.43, N = 3)
    1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=znver2 -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm
WebP Image Encode 1.1 - Encode Settings: Quality 100, Lossless (Encode Time - Seconds, Fewer Is Better)
    AMD AOCC 2.3: 20.90 (SE +/- 0.03, N = 3)
    GCC 10.2: 20.80 (SE +/- 0.08, N = 3)
    LLVM Clang 11: 20.72 (SE +/- 0.02, N = 3)
    1. (CC) gcc options: -fvisibility=hidden -O3 -march=znver2 -pthread -lm -ljpeg
WebP Image Encode 1.1 - Encode Settings: Quality 100, Highest Compression (Encode Time - Seconds, Fewer Is Better)
    AMD AOCC 2.3: 7.546 (SE +/- 0.008, N = 3)
    GCC 10.2: 8.861 (SE +/- 0.003, N = 3)
    LLVM Clang 11: 7.655 (SE +/- 0.008, N = 3)
    1. (CC) gcc options: -fvisibility=hidden -O3 -march=znver2 -pthread -lm -ljpeg
WebP Image Encode 1.1 - Encode Settings: Quality 100, Lossless, Highest Compression (Encode Time - Seconds, Fewer Is Better)
    AMD AOCC 2.3: 42.70 (SE +/- 0.01, N = 3)
    GCC 10.2: 44.38 (SE +/- 0.12, N = 3)
    LLVM Clang 11: 42.38 (SE +/- 0.12, N = 3)
    1. (CC) gcc options: -fvisibility=hidden -O3 -march=znver2 -pthread -lm -ljpeg
oneDNN 1.5 - Harness: IP Batch 1D - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
    AMD AOCC 2.3: 1.40879 (SE +/- 0.00432, N = 3; -fopenmp=libomp; MIN: 1.36)
    GCC 10.2: 1.67503 (SE +/- 0.00519, N = 3; -fopenmp; MIN: 1.59)
    LLVM Clang 11: 1.72247 (SE +/- 0.00256, N = 3; -fopenmp=libomp; MIN: 1.62)
    1. (CXX) g++ options: -O3 -march=native -std=c++11 -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 1.5 - Harness: IP Batch 1D - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
    AMD AOCC 2.3: 1.00926 (SE +/- 0.00275, N = 3; -fopenmp=libomp; MIN: 0.95)
    GCC 10.2: 1.17556 (SE +/- 0.00350, N = 3; -fopenmp; MIN: 1.13)
    LLVM Clang 11: 1.03137 (SE +/- 0.00137, N = 3; -fopenmp=libomp; MIN: 0.97)
    1. (CXX) g++ options: -O3 -march=native -std=c++11 -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 1.5 - Harness: IP Batch All - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
    AMD AOCC 2.3: 11.92 (SE +/- 0.02, N = 3; -fopenmp=libomp; MIN: 11.65)
    GCC 10.2: 13.88 (SE +/- 0.06, N = 3; -fopenmp; MIN: 13.35)
    LLVM Clang 11: 13.46 (SE +/- 0.04, N = 3; -fopenmp=libomp; MIN: 13.14)
    1. (CXX) g++ options: -O3 -march=native -std=c++11 -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 1.5 - Harness: Deconvolution Batch deconv_1d - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
    AMD AOCC 2.3: 1.66772 (SE +/- 0.00186, N = 3; -fopenmp=libomp; MIN: 1.61)
    GCC 10.2: 1.95695 (SE +/- 0.01336, N = 3; -fopenmp; MIN: 1.87)
    LLVM Clang 11: 1.69390 (SE +/- 0.01023, N = 3; -fopenmp=libomp; MIN: 1.63)
    1. (CXX) g++ options: -O3 -march=native -std=c++11 -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 1.5 - Harness: Deconvolution Batch deconv_3d - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
    AMD AOCC 2.3: 3.66096 (SE +/- 0.01101, N = 3; -fopenmp=libomp; MIN: 3.49)
    GCC 10.2: 3.68073 (SE +/- 0.01624, N = 3; -fopenmp; MIN: 3.53)
    LLVM Clang 11: 3.72354 (SE +/- 0.00775, N = 3; -fopenmp=libomp; MIN: 3.57)
    1. (CXX) g++ options: -O3 -march=native -std=c++11 -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 1.5 - Harness: Deconvolution Batch deconv_1d - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
    AMD AOCC 2.3: 1.99141 (SE +/- 0.00112, N = 3; -fopenmp=libomp; MIN: 1.92)
    GCC 10.2: 2.17957 (SE +/- 0.00145, N = 3; -fopenmp; MIN: 2.05)
    LLVM Clang 11: 2.05539 (SE +/- 0.00233, N = 3; -fopenmp=libomp; MIN: 1.96)
    1. (CXX) g++ options: -O3 -march=native -std=c++11 -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 1.5 - Harness: Deconvolution Batch deconv_3d - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
    AMD AOCC 2.3: 1.87485 (SE +/- 0.00189, N = 3; -fopenmp=libomp; MIN: 1.82)
    GCC 10.2: 1.94380 (SE +/- 0.00392, N = 3; -fopenmp; MIN: 1.82)
    LLVM Clang 11: 1.98927 (SE +/- 0.00691, N = 3; -fopenmp=libomp; MIN: 1.9)
    1. (CXX) g++ options: -O3 -march=native -std=c++11 -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 1.5 - Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
    AMD AOCC 2.3: 147.52 (SE +/- 0.06, N = 3; -fopenmp=libomp; MIN: 146.31)
    GCC 10.2: 254.61 (SE +/- 0.73, N = 3; -fopenmp; MIN: 251.67)
    LLVM Clang 11: 172.51 (SE +/- 1.10, N = 3; -fopenmp=libomp; MIN: 169.54)
    1. (CXX) g++ options: -O3 -march=native -std=c++11 -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 1.5 - Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
    AMD AOCC 2.3: 30.72 (SE +/- 0.32, N = 3; -fopenmp=libomp; MIN: 29.35)
    GCC 10.2: 79.25 (SE +/- 0.80, N = 3; -fopenmp; MIN: 77.35)
    LLVM Clang 11: 64.99 (SE +/- 2.37, N = 15; -fopenmp=libomp; MIN: 50.95)
    1. (CXX) g++ options: -O3 -march=native -std=c++11 -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 1.5 - Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
    AMD AOCC 2.3: 0.464235 (SE +/- 0.000544, N = 3; -fopenmp=libomp; MIN: 0.45)
    GCC 10.2: 0.532944 (SE +/- 0.001691, N = 3; -fopenmp; MIN: 0.51)
    LLVM Clang 11: 0.575864 (SE +/- 0.001767, N = 3; -fopenmp=libomp; MIN: 0.55)
    1. (CXX) g++ options: -O3 -march=native -std=c++11 -msse4.1 -fPIC -pie -lpthread -ldl
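
For "Fewer Is Better" metrics such as the oneDNN timings above, a speedup factor is obtained by dividing the baseline time by the measured time, which is the inverse of the ratio used for throughput metrics. The Python sketch below applies that to the two recurrent neural network harnesses, using values copied from the results above. The choice of GCC 10.2 as the baseline and the helper name `speedup_vs_baseline` are assumptions made for illustration.

```python
# oneDNN 1.5 timings in ms (fewer is better), copied from the results above.
ONEDNN_MS = {
    "Recurrent Neural Network Training - f32": {
        "AMD AOCC 2.3": 147.52, "GCC 10.2": 254.61, "LLVM Clang 11": 172.51,
    },
    "Recurrent Neural Network Inference - f32": {
        "AMD AOCC 2.3": 30.72, "GCC 10.2": 79.25, "LLVM Clang 11": 64.99,
    },
}


def speedup_vs_baseline(times_ms, baseline="GCC 10.2"):
    """Hypothetical helper: baseline time divided by each time (values > 1 are faster)."""
    base = times_ms[baseline]
    return {name: base / t for name, t in times_ms.items()}


if __name__ == "__main__":
    for harness, times in ONEDNN_MS.items():
        factors = speedup_vs_baseline(times)
        summary = ", ".join(f"{name}: {factor:.2f}x" for name, factor in factors.items())
        print(f"{harness}: {summary}")
```

Note that these three builds do not all use the same OpenMP flag (the result lines list `-fopenmp=libomp` for two builds and `-fopenmp` for the third), so the factors reflect the whole toolchain rather than code generation alone.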
PostgreSQL pgbench 13.0 - Scaling Factor: 1 - Clients: 1 - Mode: Read Only - Average Latency (ms, Fewer Is Better)
    AMD AOCC 2.3: 0.034 (SE +/- 0.000, N = 3)
    GCC 10.2: 0.035 (SE +/- 0.000, N = 3)
    LLVM Clang 11: 0.035 (SE +/- 0.000, N = 3)
    1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=znver2 -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm
PostgreSQL pgbench 13.0 - Scaling Factor: 1 - Clients: 1 - Mode: Read Write - Average Latency (ms, Fewer Is Better)
    AMD AOCC 2.3: 0.259 (SE +/- 0.001, N = 3)
    GCC 10.2: 0.268 (SE +/- 0.003, N = 5)
    LLVM Clang 11: 0.265 (SE +/- 0.000, N = 3)
    1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=znver2 -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm
PostgreSQL pgbench 13.0 - Scaling Factor: 1 - Clients: 50 - Mode: Read Only - Average Latency (ms, Fewer Is Better)
    AMD AOCC 2.3: 0.092 (SE +/- 0.000, N = 3)
    GCC 10.2: 0.096 (SE +/- 0.001, N = 3)
    LLVM Clang 11: 0.094 (SE +/- 0.001, N = 3)
    1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=znver2 -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm
PostgreSQL pgbench 13.0 - Scaling Factor: 1 - Clients: 50 - Mode: Read Write - Average Latency (ms, Fewer Is Better)
    AMD AOCC 2.3: 14.66 (SE +/- 0.02, N = 3)
    GCC 10.2: 14.49 (SE +/- 0.02, N = 3)
    LLVM Clang 11: 14.53 (SE +/- 0.02, N = 3)
    1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=znver2 -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm
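
The pgbench average latencies above appear to be consistent with the TPS results reported for the same configurations, under the relation latency (ms) = clients / TPS * 1000. The Python sketch below checks that relation against the AMD AOCC 2.3 numbers; it is an observation about the reported values, not a statement of how pgbench was invoked, and the helper name `latency_ms_from_tps` is hypothetical.

```python
def latency_ms_from_tps(clients, tps):
    """Hypothetical helper: average latency in ms implied by a client count and TPS."""
    return clients / tps * 1000.0


# (mode, clients, TPS, reported average latency in ms) for AMD AOCC 2.3,
# copied from the pgbench results in this export.
AOCC_RUNS = [
    ("Read Only", 1, 29645, 0.034),
    ("Read Write", 1, 3865, 0.259),
    ("Read Only", 50, 541314, 0.092),
    ("Read Write", 50, 3413, 14.66),
]

if __name__ == "__main__":
    for mode, clients, tps, reported in AOCC_RUNS:
        estimate = latency_ms_from_tps(clients, tps)
        print(f"{clients} client(s), {mode}: estimated {estimate:.3f} ms, reported {reported} ms")
```

Because of this relation, the latency results carry essentially the same information as the TPS results and should not be counted as independent evidence when comparing compilers.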
NCNN Target: CPU - Model: squeezenet OpenBenchmarking.org ms, Fewer Is Better NCNN 20200916 Target: CPU - Model: squeezenet AMD AOCC 2.3 GCC 10.2 LLVM Clang 11 4 8 12 16 20 SE +/- 0.10, N = 3 SE +/- 0.12, N = 3 SE +/- 0.14, N = 15 14.38 17.29 15.35 -lomp - MIN: 13.96 / MAX: 16.87 -lgomp - MIN: 16.89 / MAX: 19.32 -lomp - MIN: 14.29 / MAX: 18.88 1. (CXX) g++ options: -O3 -march=znver2 -rdynamic -lpthread
NCNN Target: CPU - Model: mobilenet OpenBenchmarking.org ms, Fewer Is Better NCNN 20200916 Target: CPU - Model: mobilenet AMD AOCC 2.3 GCC 10.2 LLVM Clang 11 5 10 15 20 25 SE +/- 0.34, N = 3 SE +/- 0.12, N = 3 SE +/- 0.13, N = 15 17.02 19.46 17.85 -lomp - MIN: 16.39 / MAX: 20.11 -lgomp - MIN: 18.87 / MAX: 31.76 -lomp - MIN: 16.89 / MAX: 21.04 1. (CXX) g++ options: -O3 -march=znver2 -rdynamic -lpthread
NCNN 20200916 - Target: CPU-v2-v2 - Model: mobilenet-v2
ms, Fewer Is Better
  AMD AOCC 2.3:   5.62 (SE +/- 0.08, N = 3)   MIN: 5.35 / MAX: 7.49    -lomp
  GCC 10.2:       9.37 (SE +/- 0.24, N = 3)   MIN: 8.62 / MAX: 11      -lgomp
  LLVM Clang 11:  6.88 (SE +/- 0.06, N = 15)  MIN: 6.35 / MAX: 16.49   -lomp
1. (CXX) g++ options: -O3 -march=znver2 -rdynamic -lpthread
NCNN 20200916 - Target: CPU-v3-v3 - Model: mobilenet-v3
ms, Fewer Is Better
  AMD AOCC 2.3:   5.20 (SE +/- 0.04, N = 3)   MIN: 5.05 / MAX: 7.61    -lomp
  GCC 10.2:       8.88 (SE +/- 0.11, N = 3)   MIN: 8.58 / MAX: 10.83   -lgomp
  LLVM Clang 11:  6.87 (SE +/- 0.06, N = 15)  MIN: 6.16 / MAX: 8.64    -lomp
1. (CXX) g++ options: -O3 -march=znver2 -rdynamic -lpthread
NCNN 20200916 - Target: CPU - Model: shufflenet-v2
ms, Fewer Is Better
  AMD AOCC 2.3:   6.25 (SE +/- 0.12, N = 3)    MIN: 5.96 / MAX: 6.53    -lomp
  GCC 10.2:       10.59 (SE +/- 0.68, N = 3)   MIN: 9.42 / MAX: 13.72   -lgomp
  LLVM Clang 11:  7.35 (SE +/- 0.02, N = 15)   MIN: 7.09 / MAX: 11.01   -lomp
1. (CXX) g++ options: -O3 -march=znver2 -rdynamic -lpthread
NCNN 20200916 - Target: CPU - Model: mnasnet
ms, Fewer Is Better
  AMD AOCC 2.3:   5.01 (SE +/- 0.03, N = 3)   MIN: 4.87 / MAX: 5.39    -lomp
  GCC 10.2:       8.95 (SE +/- 0.29, N = 3)   MIN: 8.26 / MAX: 11.03   -lgomp
  LLVM Clang 11:  6.26 (SE +/- 0.09, N = 15)  MIN: 5.63 / MAX: 8.5     -lomp
1. (CXX) g++ options: -O3 -march=znver2 -rdynamic -lpthread
NCNN 20200916 - Target: CPU - Model: efficientnet-b0
ms, Fewer Is Better
  AMD AOCC 2.3:   6.94 (SE +/- 0.05, N = 3)    MIN: 6.72 / MAX: 9        -lomp
  GCC 10.2:       11.32 (SE +/- 0.09, N = 3)   MIN: 11.03 / MAX: 13.16   -lgomp
  LLVM Clang 11:  8.77 (SE +/- 0.09, N = 15)   MIN: 7.99 / MAX: 13.6     -lomp
1. (CXX) g++ options: -O3 -march=znver2 -rdynamic -lpthread
NCNN 20200916 - Target: CPU - Model: blazeface
ms, Fewer Is Better
  AMD AOCC 2.3:   2.19 (SE +/- 0.03, N = 3)   MIN: 2.1 / MAX: 2.43    -lomp
  GCC 10.2:       3.99 (SE +/- 0.10, N = 3)   MIN: 3.77 / MAX: 4.73   -lgomp
  LLVM Clang 11:  2.84 (SE +/- 0.02, N = 15)  MIN: 2.67 / MAX: 4.67   -lomp
1. (CXX) g++ options: -O3 -march=znver2 -rdynamic -lpthread
NCNN 20200916 - Target: CPU - Model: googlenet
ms, Fewer Is Better
  AMD AOCC 2.3:   16.51 (SE +/- 0.05, N = 3)   MIN: 16.26 / MAX: 18.75   -lomp
  GCC 10.2:       20.85 (SE +/- 0.10, N = 3)   MIN: 19.8 / MAX: 22.84    -lgomp
  LLVM Clang 11:  18.88 (SE +/- 0.27, N = 15)  MIN: 16.91 / MAX: 23.31   -lomp
1. (CXX) g++ options: -O3 -march=znver2 -rdynamic -lpthread
NCNN 20200916 - Target: CPU - Model: vgg16
ms, Fewer Is Better
  AMD AOCC 2.3:   32.03 (SE +/- 0.31, N = 3)   MIN: 30.86 / MAX: 35.2    -lomp
  GCC 10.2:       30.89 (SE +/- 0.30, N = 3)   MIN: 30.04 / MAX: 50.21   -lgomp
  LLVM Clang 11:  36.60 (SE +/- 0.42, N = 15)  MIN: 32.93 / MAX: 48.08   -lomp
1. (CXX) g++ options: -O3 -march=znver2 -rdynamic -lpthread
NCNN 20200916 - Target: CPU - Model: resnet18
ms, Fewer Is Better
  AMD AOCC 2.3:   11.99 (SE +/- 0.15, N = 3)   MIN: 11.71 / MAX: 14.27   -lomp
  GCC 10.2:       13.12 (SE +/- 0.09, N = 3)   MIN: 12.79 / MAX: 15.11   -lgomp
  LLVM Clang 11:  13.19 (SE +/- 0.12, N = 15)  MIN: 12.17 / MAX: 23.53   -lomp
1. (CXX) g++ options: -O3 -march=znver2 -rdynamic -lpthread
NCNN 20200916 - Target: CPU - Model: alexnet
ms, Fewer Is Better
  AMD AOCC 2.3:   8.94 (SE +/- 0.01, N = 3)    MIN: 8.81 / MAX: 13.49   -lomp
  GCC 10.2:       9.46 (SE +/- 0.14, N = 3)    MIN: 9.17 / MAX: 11.49   -lgomp
  LLVM Clang 11:  10.83 (SE +/- 0.18, N = 15)  MIN: 9.15 / MAX: 60.4    -lomp
1. (CXX) g++ options: -O3 -march=znver2 -rdynamic -lpthread
NCNN 20200916 - Target: CPU - Model: resnet50
ms, Fewer Is Better
  AMD AOCC 2.3:   19.70 (SE +/- 0.18, N = 3)   MIN: 19.16 / MAX: 22.41   -lomp
  GCC 10.2:       23.70 (SE +/- 0.16, N = 3)   MIN: 23.21 / MAX: 25.68   -lgomp
  LLVM Clang 11:  22.23 (SE +/- 0.20, N = 15)  MIN: 20.63 / MAX: 31.79   -lomp
1. (CXX) g++ options: -O3 -march=znver2 -rdynamic -lpthread
NCNN 20200916 - Target: CPU - Model: yolov4-tiny
ms, Fewer Is Better
  AMD AOCC 2.3:   29.38 (SE +/- 0.15, N = 3)   MIN: 28.77 / MAX: 31.68   -lomp
  GCC 10.2:       29.47 (SE +/- 0.12, N = 3)   MIN: 28.89 / MAX: 31.49   -lgomp
  LLVM Clang 11:  30.70 (SE +/- 0.19, N = 15)  MIN: 29.08 / MAX: 40.6    -lomp
1. (CXX) g++ options: -O3 -march=znver2 -rdynamic -lpthread
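The fourteen NCNN models differ widely in absolute latency, so a geometric mean is one common way to condense them into a single figure per compiler without letting the slowest models dominate. The sketch below is an illustration only and is not output from the result file: the dictionary name `NCNN_MS` and the helper `geometric_mean` are hypothetical, and the values are the mean latencies copied from the NCNN results above.

```python
import math

# NCNN 20200916 mean inference times in ms (fewer is better).
# Tuple order: (AMD AOCC 2.3, GCC 10.2, LLVM Clang 11).
NCNN_MS = {
    "squeezenet":      (14.38, 17.29, 15.35),
    "mobilenet":       (17.02, 19.46, 17.85),
    "mobilenet-v2":    (5.62, 9.37, 6.88),
    "mobilenet-v3":    (5.20, 8.88, 6.87),
    "shufflenet-v2":   (6.25, 10.59, 7.35),
    "mnasnet":         (5.01, 8.95, 6.26),
    "efficientnet-b0": (6.94, 11.32, 8.77),
    "blazeface":       (2.19, 3.99, 2.84),
    "googlenet":       (16.51, 20.85, 18.88),
    "vgg16":           (32.03, 30.89, 36.60),
    "resnet18":        (11.99, 13.12, 13.19),
    "alexnet":         (8.94, 9.46, 10.83),
    "resnet50":        (19.70, 23.70, 22.23),
    "yolov4-tiny":     (29.38, 29.47, 30.70),
}

COMPILERS = ("AMD AOCC 2.3", "GCC 10.2", "LLVM Clang 11")


def geometric_mean(values):
    """Geometric mean of positive numbers, computed via logarithms."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


# One geometric mean per compiler, taken across all models.
geomeans = {
    name: geometric_mean(row[i] for row in NCNN_MS.values())
    for i, name in enumerate(COMPILERS)
}

for name, value in geomeans.items():
    print(f"{name}: {value:.2f} ms (geometric mean, fewer is better)")
```

The summary hides per-model reversals such as vgg16, where GCC 10.2 posts the lowest time, so it complements the individual results and does not replace them.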
TNN 0.2.3 - Target: CPU - Model: MobileNet v2
ms, Fewer Is Better
  AMD AOCC 2.3:   365.39 (SE +/- 0.24, N = 3)   MIN: 364.57 / MAX: 366.34   -fopenmp=libomp
  GCC 10.2:       324.17 (SE +/- 0.62, N = 3)   MIN: 311.9 / MAX: 354.48    -fopenmp
  LLVM Clang 11:  392.20 (SE +/- 0.45, N = 3)   MIN: 390.71 / MAX: 393.87   -fopenmp=libomp
1. (CXX) g++ options: -O3 -march=znver2 -pthread -fvisibility=hidden -rdynamic -ldl
TNN 0.2.3 - Target: CPU - Model: SqueezeNet v1.1
ms, Fewer Is Better
  AMD AOCC 2.3:   304.18 (SE +/- 0.66, N = 3)   MIN: 302.78 / MAX: 315.91   -fopenmp=libomp
  GCC 10.2:       305.13 (SE +/- 0.20, N = 3)   MIN: 304.36 / MAX: 306.06   -fopenmp
  LLVM Clang 11:  311.68 (SE +/- 1.92, N = 3)   MIN: 306.99 / MAX: 314.32   -fopenmp=libomp
1. (CXX) g++ options: -O3 -march=znver2 -pthread -fvisibility=hidden -rdynamic -ldl
Timed MrBayes Analysis 3.2.7 - Primate Phylogeny Analysis
Seconds, Fewer Is Better
  AMD AOCC 2.3:   90.76 (SE +/- 0.04, N = 3)
  GCC 10.2:       94.28 (SE +/- 0.17, N = 3)
  LLVM Clang 11:  93.84 (SE +/- 0.04, N = 3)
Options not common to all builds: -mabm
1. (CC) gcc options: -mmmx -msse -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msse4a -msha -maes -mavx -mfma -mavx2 -mrdrnd -mbmi -mbmi2 -madx -O3 -std=c99 -pedantic -march=znver2 -lm
C-Ray 1.1 - Total Time - 4K, 16 Rays Per Pixel
Seconds, Fewer Is Better
  AMD AOCC 2.3:   33.28 (SE +/- 0.06, N = 3)
  GCC 10.2:       18.92 (SE +/- 0.02, N = 3)
  LLVM Clang 11:  30.65 (SE +/- 0.08, N = 3)
1. (CC) gcc options: -lm -lpthread -O3 -march=znver2
AOBench - Size: 2048 x 2048 - Total Time
Seconds, Fewer Is Better
  AMD AOCC 2.3:   37.76 (SE +/- 0.04, N = 3)
  GCC 10.2:       35.78 (SE +/- 0.02, N = 3)
  LLVM Clang 11:  41.66 (SE +/- 0.02, N = 3)
1. (CC) gcc options: -lm -O3 -march=znver2
LAME MP3 Encoding 3.100 - WAV To MP3
Seconds, Fewer Is Better
  AMD AOCC 2.3:   11.026 (SE +/- 0.004, N = 3)
  GCC 10.2:       8.798 (SE +/- 0.004, N = 3)
  LLVM Clang 11:  10.078 (SE +/- 0.003, N = 3)
Options not common to all builds: -ffast-math -funroll-loops -fschedule-insns2 -fbranch-count-reg -fforce-addr
1. (CC) gcc options: -O3 -pipe -march=znver2 -lncurses -lm
RNNoise 2020-06-28
Seconds, Fewer Is Better
  AMD AOCC 2.3:   20.76 (SE +/- 0.01, N = 3)
  GCC 10.2:       21.72 (SE +/- 0.02, N = 3)
  LLVM Clang 11:  21.54 (SE +/- 0.00, N = 3)
1. (CC) gcc options: -O3 -march=znver2 -pedantic -fvisibility=hidden
ASTC Encoder 2.0 - Preset: Medium
Seconds, Fewer Is Better
  AMD AOCC 2.3:   5.81 (SE +/- 0.01, N = 3)
  GCC 10.2:       7.00 (SE +/- 0.01, N = 3)
  LLVM Clang 11:  6.05 (SE +/- 0.01, N = 3)
1. (CXX) g++ options: -std=c++14 -fvisibility=hidden -O3 -flto -mfpmath=sse -mavx2 -mpopcnt -lpthread
ASTC Encoder 2.0 - Preset: Thorough
Seconds, Fewer Is Better
  AMD AOCC 2.3:   8.16 (SE +/- 0.00, N = 3)
  GCC 10.2:       9.61 (SE +/- 0.00, N = 3)
  LLVM Clang 11:  8.32 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -std=c++14 -fvisibility=hidden -O3 -flto -mfpmath=sse -mavx2 -mpopcnt -lpthread
ASTC Encoder 2.0 - Preset: Exhaustive
Seconds, Fewer Is Better
  AMD AOCC 2.3:   66.04 (SE +/- 0.05, N = 3)
  GCC 10.2:       73.46 (SE +/- 0.02, N = 3)
  LLVM Clang 11:  67.17 (SE +/- 0.01, N = 3)
1. (CXX) g++ options: -std=c++14 -fvisibility=hidden -O3 -flto -mfpmath=sse -mavx2 -mpopcnt -lpthread
Basis Universal 1.12 - Settings: UASTC Level 2
Seconds, Fewer Is Better
  AMD AOCC 2.3:   16.32 (SE +/- 0.01, N = 3)
  GCC 10.2:       16.51 (SE +/- 0.02, N = 3)
  LLVM Clang 11:  16.49 (SE +/- 0.01, N = 3)
1. (CXX) g++ options: -std=c++11 -fvisibility=hidden -fPIC -fno-strict-aliasing -O3 -rdynamic -lm -lpthread
Basis Universal 1.12 - Settings: UASTC Level 3
Seconds, Fewer Is Better
  AMD AOCC 2.3:   25.21 (SE +/- 0.00, N = 3)
  GCC 10.2:       25.52 (SE +/- 0.01, N = 3)
  LLVM Clang 11:  25.32 (SE +/- 0.01, N = 3)
1. (CXX) g++ options: -std=c++11 -fvisibility=hidden -fPIC -fno-strict-aliasing -O3 -rdynamic -lm -lpthread
Basis Universal 1.12 - Settings: UASTC Level 2 + RDO Post-Processing
Seconds, Fewer Is Better
  AMD AOCC 2.3:   833.54 (SE +/- 0.04, N = 3)
  GCC 10.2:       755.52 (SE +/- 0.14, N = 3)
  LLVM Clang 11:  837.54 (SE +/- 0.24, N = 3)
1. (CXX) g++ options: -std=c++11 -fvisibility=hidden -fPIC -fno-strict-aliasing -O3 -rdynamic -lm -lpthread
SQLite Speedtest 3.30 - Timed Time - Size 1,000
Seconds, Fewer Is Better
  AMD AOCC 2.3:   78.07 (SE +/- 0.09, N = 3)
  GCC 10.2:       75.14 (SE +/- 0.13, N = 3)
  LLVM Clang 11:  77.64 (SE +/- 0.18, N = 3)
1. (CC) gcc options: -O3 -march=znver2 -ldl -lz -lpthread
Phoronix Test Suite v10.8.5