EPYC 7502 AOCC 2.3 Compiler Comparison
AMD EPYC 7502 testing of various benchmarks under AMD AOCC 2.3, GCC 10.2, and LLVM Clang 11, with CFLAGS/CXXFLAGS of "-O3 -march=znver2" set throughout (individual test builds may add or override flags, as listed with each result). Benchmarks by Michael Larabel for a future article.
HTML result view exported from: https://openbenchmarking.org/result/2012080-HA-EPYC7502A97&sgm=1&swl=1&grs&sor
System configuration (shared by all three runs except where noted):
Processor: AMD EPYC 7502 32-Core @ 2.50GHz (32 Cores / 64 Threads)
Motherboard: ASRockRack EPYCD8 (P2.10 BIOS)
Chipset: AMD Starship/Matisse
Memory: 126GB
Disk: 280GB INTEL SSDPED1D280GA
Graphics: ASPEED
Audio: AMD Starship/Matisse
Monitor: VE228
Network: 2 x Intel I350
OS: Ubuntu 20.10
Kernel: 5.8.0-31-generic (x86_64)
Desktop: GNOME Shell 3.38.1
Display Server: X Server 1.20.9
Display Driver: modesetting 1.20.9
Compiler: GCC 10.2.0 (GCC 10.2 run); Clang 11.0.0-2 (LLVM Clang 11 run); Clang 11.0.0 (AMD AOCC 2.3 run)
File-System: ext4
Screen Resolution: 1920x1080
Environment Details: CXXFLAGS="-O3 -march=znver2" CFLAGS="-O3 -march=znver2"
Compiler Details (GCC 10.2): --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-10-JvwpWM/gcc-10-10.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-10-JvwpWM/gcc-10-10.2.0/debian/tmp-gcn/usr,hsa --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Compiler Details (AMD AOCC 2.3): Optimized build with assertions; Default target: x86_64-unknown-linux-gnu; Host CPU: znver2
Processor Details: Scaling Governor: acpi-cpufreq ondemand (Boost: Enabled); CPU Microcode: 0x830101c
Security Details: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Full AMD retpoline IBPB: conditional IBRS_FW STIBP: always-on RSB filling + srbds: Not affected + tsx_async_abort: Not affected
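The Security Details and Scaling Governor entries above match the status strings the Linux kernel exposes under /sys/devices/system/cpu (the vulnerabilities/ directory and each CPU's cpufreq/ directory). When repeating a comparison like this on another machine, it can help to record the same state first. The following is a minimal sketch assuming a Linux host with those sysfs files present; it is an illustration only, not how OpenBenchmarking.org itself gathers this information.

```python
from pathlib import Path

CPU_SYSFS = Path("/sys/devices/system/cpu")


def security_details(root: Path = CPU_SYSFS) -> str:
    """Join each vulnerabilities/<name> status file as 'name: status', separated by ' + '."""
    vuln_dir = root / "vulnerabilities"
    parts = [
        f"{entry.name}: {entry.read_text().strip()}"
        for entry in sorted(vuln_dir.iterdir())
        if entry.is_file()
    ]
    return " + ".join(parts)


def scaling_governor(root: Path = CPU_SYSFS, cpu: int = 0) -> str:
    """Return '<driver> <governor>' for one CPU, e.g. 'acpi-cpufreq ondemand'."""
    cpufreq = root / f"cpu{cpu}" / "cpufreq"
    driver = (cpufreq / "scaling_driver").read_text().strip()
    governor = (cpufreq / "scaling_governor").read_text().strip()
    return f"{driver} {governor}"


if __name__ == "__main__":
    # Containers and non-Linux hosts may lack these files, so each read is guarded.
    try:
        print("Scaling Governor:", scaling_governor())
    except OSError as exc:
        print("Scaling Governor: unavailable", exc)
    try:
        print("Security Details:", security_details())
    except OSError as exc:
        print("Security Details: unavailable", exc)
```

The `root` parameter exists so the functions can be pointed at a copy of the sysfs tree, which also makes them easy to test.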
Result overview (raw values; units and whether more or fewer is better vary by test, see each test's individual result):
Test | GCC 10.2 | LLVM Clang 11 | AMD AOCC 2.3
ncnn: CPU - blazeface | 3.99 | 2.84 | 2.19
ncnn: CPU - mnasnet | 8.95 | 6.26 | 5.01
c-ray: Total Time - 4K, 16 Rays Per Pixel | 18.922 | 30.646 | 33.275
onednn: Recurrent Neural Network Training - f32 - CPU | 254.610 | 172.513 | 147.523
ncnn: CPU-v3-v3 - mobilenet-v3 | 8.88 | 6.87 | 5.20
ncnn: CPU-v2-v2 - mobilenet-v2 | 9.37 | 6.88 | 5.62
ncnn: CPU - efficientnet-b0 | 11.32 | 8.77 | 6.94
dav1d: Chimera 1080p 10-bit | 143.02 | 92.56 | 122.39
daphne: OpenMP - Points2Image | 18452.623694826 | 11946.566719313 | 13720.155459109
libraw: Post-Processing Benchmark | 52.54 | 36.98 | 38.47
svt-av1: Enc Mode 0 - 1080p | 0.103 | 0.145 | 0.146
graphics-magick: Sharpen | 434 | 384 | 316
openssl: RSA 4096-bit Performance | 7395.4 | 5412.5 | 5413.5
daphne: OpenMP - Euclidean Cluster | 890.71 | 674.01 | 678.51
svt-av1: Enc Mode 4 - 1080p | 6.775 | 8.591 | 8.657
svt-av1: Enc Mode 8 - 1080p | 55.713 | 70.229 | 70.698
graphics-magick: Resizing | 2092 | 1827 | 1653
ncnn: CPU - googlenet | 20.85 | 18.88 | 16.51
encode-mp3: WAV To MP3 | 8.798 | 10.078 | 11.026
onednn: Matrix Multiply Batch Shapes Transformer - f32 - CPU | 0.532944 | 0.575864 | 0.464235
onednn: IP Batch 1D - f32 - CPU | 1.67503 | 1.72247 | 1.40879
tnn: CPU - MobileNet v2 | 324.172 | 392.204 | 365.390
astcenc: Medium | 7.00 | 6.05 | 5.81
ncnn: CPU - resnet50 | 23.70 | 22.23 | 19.70
ncnn: CPU - squeezenet | 17.29 | 15.35 | 14.38
graphics-magick: Enhanced | 590 | 627 | 526
ncnn: CPU - vgg16 | 30.89 | 36.60 | 32.03
astcenc: Thorough | 9.61 | 8.32 | 8.16
webp: Quality 100, Highest Compression | 8.861 | 7.655 | 7.546
onednn: Deconvolution Batch deconv_1d - f32 - CPU | 1.95695 | 1.69390 | 1.66772
onednn: IP Batch 1D - u8s8f32 - CPU | 1.17556 | 1.03137 | 1.00926
aobench: 2048 x 2048 - Total Time | 35.783 | 41.655 | 37.758
onednn: IP Batch All - u8s8f32 - CPU | 13.8793 | 13.4649 | 11.9233
ncnn: CPU - mobilenet | 19.46 | 17.85 | 17.02
tscp: AI Chess Performance | 1007283 | 1143642 | 1138442
vpxenc: Speed 0 | 6.27 | 5.96 | 6.65
astcenc: Exhaustive | 73.46 | 67.17 | 66.04
basis: UASTC Level 2 + RDO Post-Processing | 755.517 | 837.541 | 833.536
vpxenc: Speed 5 | 19.03 | 18.23 | 20.12
ncnn: CPU - resnet18 | 13.12 | 13.19 | 11.99
onednn: Deconvolution Batch deconv_1d - u8s8f32 - CPU | 2.17957 | 2.05539 | 1.99141
daphne: OpenMP - NDT Mapping | 874.54 | 945.32 | 923.68
stockfish: Total Time | 58347386 | 62434784 | 62375605
compress-lz4: 3 - Compression Speed | 45.57 | 48.40 | 48.52
onednn: Deconvolution Batch deconv_3d - u8s8f32 - CPU | 1.94380 | 1.98927 | 1.87485
svt-vp9: PSNR/SSIM Optimized - Bosphorus 1080p | 348.24 | 365.57 | 368.17
graphics-magick: Swirl | 1295 | 1323 | 1368
x265: Bosphorus 4K | 22.83 | 22.44 | 23.69
webp: Quality 100, Lossless, Highest Compression | 44.383 | 42.377 | 42.704
rnnoise | 21.723 | 21.540 | 20.756
ncnn: CPU - yolov4-tiny | 29.47 | 30.70 | 29.38
pgbench: 1 - 50 - Read Only - Average Latency | 0.096 | 0.094 | 0.092
compress-zstd: 19 | 109.9 | 111.2 | 114.6
svt-vp9: Visual Quality Optimized - Bosphorus 1080p | 279.73 | 286.24 | 291.22
compress-lz4: 1 - Compression Speed | 9459.69 | 9838.54 | 9780.39
scimark2: Composite | 2759.35 | 2673.79 | 2779.62
sqlite-speedtest: Timed Time - Size 1,000 | 75.140 | 77.642 | 78.067
mrbayes: Primate Phylogeny Analysis | 94.284 | 93.841 | 90.759
pgbench: 1 - 50 - Read Only | 521332 | 530684 | 541314
x264: H.264 Video Encoding | 149.35 | 146.43 | 151.92
dav1d: Summer Nature 1080p | 567.42 | 584.65 | 588.62
pgbench: 1 - 1 - Read Write - Average Latency | 0.268 | 0.265 | 0.259
pgbench: 1 - 1 - Read Write | 3739 | 3772 | 3865
compress-lz4: 9 - Compression Speed | 44.72 | 44.30 | 45.76
svt-vp9: VMAF Optimized - Bosphorus 1080p | 354.98 | 363.72 | 366.61
x265: Bosphorus 1080p | 49.05 | 50.02 | 50.60
pgbench: 1 - 1 - Read Only - Average Latency | 0.035 | 0.035 | 0.034
tjbench: Decompression Throughput | 171.888797 | 174.991039 | 176.873794
cryptopp: Unkeyed Algorithms | 305.900410 | 314.389914 | 312.637840
pgbench: 1 - 1 - Read Only | 28919 | 28921 | 29645
tnn: CPU - SqueezeNet v1.1 | 305.134 | 311.682 | 304.178
nginx: Static Web Page Serving | 30658.67 | 30676.68 | 31368.86
dav1d: Summer Nature 4K | 269.72 | 275.41 | 274.20
graphics-magick: Rotate | 535 | 527 | 525
dav1d: Chimera 1080p | 564.96 | 572.19 | 575.22
onednn: Deconvolution Batch deconv_3d - f32 - CPU | 3.68073 | 3.72354 | 3.66096
basis: UASTC Level 3 | 25.522 | 25.321 | 25.210
basis: UASTC Level 2 | 16.507 | 16.485 | 16.315
pgbench: 1 - 50 - Read Write | 3453 | 3443 | 3413
pgbench: 1 - 50 - Read Write - Average Latency | 14.488 | 14.528 | 14.655
compress-zstd: 3 | 7849.6 | 7866.2 | 7937.8
webp: Quality 100, Lossless | 20.799 | 20.719 | 20.904
hint: FLOAT | 291925951.64210 | 292874904.53733 | 294314450.61706
ncnn: CPU - alexnet | 9.46 | 10.83 | 8.94
ncnn: CPU - shufflenet-v2 | 10.59 | 7.35 | 6.25
redis: SET | 1369757.13 | 1483322.31 | 1446872.35
redis: GET | 1809976.83 | 2122749.43 | 1874175.91
redis: LPUSH | 1212068.06 | 1304842.23 | 1380024.29
onednn: Recurrent Neural Network Inference - f32 - CPU | 79.2495 | 64.9912 | 30.7182
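A table that mixes "more is better" and "fewer is better" results can be condensed into one number per compiler by taking the geometric mean of per-test ratios against a baseline, after orienting each ratio so that values above 1.0 always mean "better". The sketch below assumes GCC 10.2 as the baseline and uses only four rows copied from the overview for illustration; it is not necessarily how OpenBenchmarking.org computes its own geometric mean, and four rows say nothing about the full 89-test picture.

```python
import math

# Four rows copied from the overview: (test, more_is_better, {compiler: value}).
# Directions come from the individual results: ms/seconds are fewer-is-better,
# signs per second and nodes per second are more-is-better.
ROWS = [
    ("ncnn: CPU - blazeface (ms)", False,
     {"GCC 10.2": 3.99, "LLVM Clang 11": 2.84, "AMD AOCC 2.3": 2.19}),
    ("c-ray: 4K, 16 rays per pixel (s)", False,
     {"GCC 10.2": 18.922, "LLVM Clang 11": 30.646, "AMD AOCC 2.3": 33.275}),
    ("openssl: RSA 4096-bit (signs/s)", True,
     {"GCC 10.2": 7395.4, "LLVM Clang 11": 5412.5, "AMD AOCC 2.3": 5413.5}),
    ("stockfish: total time (nodes/s)", True,
     {"GCC 10.2": 58347386, "LLVM Clang 11": 62434784, "AMD AOCC 2.3": 62375605}),
]


def relative_score(value, baseline, more_is_better):
    """Ratio oriented so that > 1.0 means better than the baseline."""
    return value / baseline if more_is_better else baseline / value


def geomean_vs_baseline(rows, baseline="GCC 10.2"):
    """Geometric mean of oriented ratios for every compiler in the rows."""
    result = {}
    for compiler in rows[0][2]:
        logs = [math.log(relative_score(vals[compiler], vals[baseline], more))
                for _, more, vals in rows]
        result[compiler] = math.exp(sum(logs) / len(logs))
    return result


for compiler, score in geomean_vs_baseline(ROWS).items():
    print(f"{compiler}: {score:.3f}")
```

Averaging logarithms rather than raw ratios keeps a single large win (such as blazeface) from dominating, and makes the result independent of which compiler is chosen as the baseline, up to a constant factor.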
NCNN 20200916 - Target: CPU - Model: blazeface (ms, fewer is better)
AMD AOCC 2.3: 2.19 (SE +/- 0.03, N = 3; -lomp; MIN: 2.1 / MAX: 2.43)
LLVM Clang 11: 2.84 (SE +/- 0.02, N = 15; -lomp; MIN: 2.67 / MAX: 4.67)
GCC 10.2: 3.99 (SE +/- 0.10, N = 3; -lgomp; MIN: 3.77 / MAX: 4.73)
Build: (CXX) g++ options: -O3 -march=znver2 -rdynamic -lpthread
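Each result reports a mean over N runs together with its standard error (SE). Assuming the usual definition (sample standard deviation divided by the square root of N), the sketch below computes SE from per-run values and applies an informal "more than two combined standard errors apart" rule of thumb to decide whether two compilers' means are clearly distinguishable. The per-run values are hypothetical, and the rule of thumb is a rough heuristic, not a significance test used by OpenBenchmarking.org.

```python
import math
import statistics


def standard_error(samples):
    """Standard error of the mean: sample standard deviation / sqrt(N)."""
    return statistics.stdev(samples) / math.sqrt(len(samples))


def clearly_different(mean_a, se_a, mean_b, se_b, k=2.0):
    """Rough check: do two means differ by more than k combined standard errors?"""
    return abs(mean_a - mean_b) > k * math.sqrt(se_a ** 2 + se_b ** 2)


runs = [2.16, 2.19, 2.22]  # hypothetical per-run times in ms, not from the source
print(f"mean={statistics.mean(runs):.2f} ms, SE={standard_error(runs):.4f}")

# blazeface: AMD AOCC 2.3 2.19 (SE 0.03) vs LLVM Clang 11 2.84 (SE 0.02)
print(clearly_different(2.19, 0.03, 2.84, 0.02))
# yolov4-tiny: AMD AOCC 2.3 29.38 (SE 0.15) vs GCC 10.2 29.47 (SE 0.12)
print(clearly_different(29.38, 0.15, 29.47, 0.12))
```

By this heuristic the blazeface gap is well outside the noise, while the yolov4-tiny results are effectively tied.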
NCNN 20200916 - Target: CPU - Model: mnasnet (ms, fewer is better)
AMD AOCC 2.3: 5.01 (SE +/- 0.03, N = 3; -lomp; MIN: 4.87 / MAX: 5.39)
LLVM Clang 11: 6.26 (SE +/- 0.09, N = 15; -lomp; MIN: 5.63 / MAX: 8.5)
GCC 10.2: 8.95 (SE +/- 0.29, N = 3; -lgomp; MIN: 8.26 / MAX: 11.03)
Build: (CXX) g++ options: -O3 -march=znver2 -rdynamic -lpthread
C-Ray 1.1 - Total Time - 4K, 16 Rays Per Pixel (Seconds, fewer is better)
GCC 10.2: 18.92 (SE +/- 0.02, N = 3)
LLVM Clang 11: 30.65 (SE +/- 0.08, N = 3)
AMD AOCC 2.3: 33.28 (SE +/- 0.06, N = 3)
Build: (CC) gcc options: -lm -lpthread -O3 -march=znver2
oneDNN 1.5 - Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU (ms, fewer is better)
AMD AOCC 2.3: 147.52 (SE +/- 0.06, N = 3; -fopenmp=libomp; MIN: 146.31)
LLVM Clang 11: 172.51 (SE +/- 1.10, N = 3; -fopenmp=libomp; MIN: 169.54)
GCC 10.2: 254.61 (SE +/- 0.73, N = 3; -fopenmp; MIN: 251.67)
Build: (CXX) g++ options: -O3 -march=native -std=c++11 -msse4.1 -fPIC -pie -lpthread -ldl
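The -fopenmp (GCC 10.2) and -fopenmp=libomp (LLVM Clang 11, AMD AOCC 2.3) annotations on this result, like -lgomp and -lomp on the NCNN results, indicate that the builds use different OpenMP runtimes: GCC's libgomp versus LLVM's libomp. One way to confirm which runtime a given binary actually loads on Linux is to inspect its `ldd` output. The sketch below assumes `ldd` is available and that the runtimes keep their usual library names (libgomp, libomp, and Intel's libiomp5); the mapping table is illustrative rather than exhaustive.

```python
import re
import subprocess

# Shared-library name prefixes of common OpenMP runtimes (illustrative, not exhaustive).
OPENMP_RUNTIMES = {
    "libgomp": "GNU libgomp (GCC)",
    "libomp": "LLVM libomp (Clang/AOCC)",
    "libiomp5": "Intel libiomp5",
}


def openmp_runtimes(ldd_output: str) -> list:
    """Return the OpenMP runtimes named in `ldd` output text."""
    found = []
    for line in ldd_output.splitlines():
        # Library name is the text before the first ".so", e.g. "libgomp" in "libgomp.so.1".
        match = re.match(r"\s*(\S+?)\.so", line)
        if match and match.group(1) in OPENMP_RUNTIMES:
            found.append(OPENMP_RUNTIMES[match.group(1)])
    return found


def runtimes_for_binary(path: str) -> list:
    """Run `ldd` on a binary (Linux only) and report the OpenMP runtime(s) it loads."""
    out = subprocess.run(["ldd", path], capture_output=True, text=True, check=True).stdout
    return openmp_runtimes(out)
```

For example, `runtimes_for_binary("./benchncnn")` would list the runtime for a locally built NCNN benchmark binary; the binary name here is hypothetical.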
NCNN 20200916 - Target: CPU-v3-v3 - Model: mobilenet-v3 (ms, fewer is better)
AMD AOCC 2.3: 5.20 (SE +/- 0.04, N = 3; -lomp; MIN: 5.05 / MAX: 7.61)
LLVM Clang 11: 6.87 (SE +/- 0.06, N = 15; -lomp; MIN: 6.16 / MAX: 8.64)
GCC 10.2: 8.88 (SE +/- 0.11, N = 3; -lgomp; MIN: 8.58 / MAX: 10.83)
Build: (CXX) g++ options: -O3 -march=znver2 -rdynamic -lpthread
NCNN 20200916 - Target: CPU-v2-v2 - Model: mobilenet-v2 (ms, fewer is better)
AMD AOCC 2.3: 5.62 (SE +/- 0.08, N = 3; -lomp; MIN: 5.35 / MAX: 7.49)
LLVM Clang 11: 6.88 (SE +/- 0.06, N = 15; -lomp; MIN: 6.35 / MAX: 16.49)
GCC 10.2: 9.37 (SE +/- 0.24, N = 3; -lgomp; MIN: 8.62 / MAX: 11)
Build: (CXX) g++ options: -O3 -march=znver2 -rdynamic -lpthread
NCNN 20200916 - Target: CPU - Model: efficientnet-b0 (ms, fewer is better)
AMD AOCC 2.3: 6.94 (SE +/- 0.05, N = 3; -lomp; MIN: 6.72 / MAX: 9)
LLVM Clang 11: 8.77 (SE +/- 0.09, N = 15; -lomp; MIN: 7.99 / MAX: 13.6)
GCC 10.2: 11.32 (SE +/- 0.09, N = 3; -lgomp; MIN: 11.03 / MAX: 13.16)
Build: (CXX) g++ options: -O3 -march=znver2 -rdynamic -lpthread
dav1d 0.7.0 - Video Input: Chimera 1080p 10-bit (FPS, more is better)
GCC 10.2: 143.02 (SE +/- 0.18, N = 3; MIN: 98.76 / MAX: 246.17)
AMD AOCC 2.3: 122.39 (SE +/- 0.34, N = 3; MIN: 85.78 / MAX: 202.39)
LLVM Clang 11: 92.56 (SE +/- 0.26, N = 3; MIN: 61.05 / MAX: 158.66)
Build: (CC) gcc options: -O3 -march=znver2 -pthread
Darmstadt Automotive Parallel Heterogeneous Suite - Backend: OpenMP - Kernel: Points2Image (Test Cases Per Minute, more is better)
GCC 10.2: 18452.62 (SE +/- 140.90, N = 3)
AMD AOCC 2.3: 13720.16 (SE +/- 163.63, N = 6)
LLVM Clang 11: 11946.57 (SE +/- 96.60, N = 3)
Build: (CXX) g++ options: -O3 -std=c++11 -fopenmp
LibRaw 0.20 - Post-Processing Benchmark (Mpix/sec, more is better)
GCC 10.2: 52.54 (SE +/- 0.16, N = 3)
AMD AOCC 2.3: 38.47 (SE +/- 0.10, N = 3)
LLVM Clang 11: 36.98 (SE +/- 0.13, N = 3)
Build: (CXX) g++ options: -O3 -march=znver2 -fopenmp -ljpeg -lz -lm
SVT-AV1 0.8 - Encoder Mode: Enc Mode 0 - Input: 1080p (Frames Per Second, more is better)
AMD AOCC 2.3: 0.146 (SE +/- 0.000, N = 3)
LLVM Clang 11: 0.145 (SE +/- 0.000, N = 3)
GCC 10.2: 0.103 (SE +/- 0.000, N = 3)
Build: (CXX) g++ options: -O3 -fcommon -fPIE -fPIC -pie
GraphicsMagick 1.3.33 - Operation: Sharpen (Iterations Per Minute, more is better)
GCC 10.2: 434
LLVM Clang 11: 384
AMD AOCC 2.3: 316
Build: (CC) gcc options: -fopenmp -O3 -march=znver2 -pthread -ljpeg -lX11 -lz -lm -lpthread
OpenSSL 1.1.1 - RSA 4096-bit Performance (Signs Per Second, more is better)
GCC 10.2: 7395.4 (SE +/- 0.73, N = 3)
AMD AOCC 2.3: 5413.5 (SE +/- 0.64, N = 3; -Qunused-arguments)
LLVM Clang 11: 5412.5 (SE +/- 1.37, N = 3; -Qunused-arguments)
Build: (CC) gcc options: -pthread -m64 -O3 -march=znver2 -lssl -lcrypto -ldl
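Because some results are "more is better" (OpenSSL signs per second) and others "fewer is better" (the C-Ray total time in seconds), a ratio has to be oriented before two compilers can be compared. A small sketch of that arithmetic, using the OpenSSL result here and the C-Ray result reported earlier; the helper names are illustrative.

```python
def advantage(a: float, b: float, more_is_better: bool) -> float:
    """How many times better result `a` is than result `b` (> 1.0 means `a` wins)."""
    return a / b if more_is_better else b / a


def as_percent(ratio: float) -> str:
    """Express an oriented ratio as a signed percentage advantage."""
    return f"{(ratio - 1.0) * 100:+.1f}%"


# OpenSSL RSA 4096-bit, signs per second (more is better): GCC 10.2 vs AMD AOCC 2.3
openssl = advantage(7395.4, 5413.5, more_is_better=True)
# C-Ray 4K total time in seconds (fewer is better): GCC 10.2 vs AMD AOCC 2.3
cray = advantage(18.92, 33.28, more_is_better=False)

print(f"OpenSSL: GCC {openssl:.2f}x ({as_percent(openssl)})")
print(f"C-Ray:   GCC {cray:.2f}x ({as_percent(cray)})")
```

In both cases GCC 10.2 comes out ahead: roughly 37% more RSA signs per second and roughly 76% less C-Ray render time relative to AMD AOCC 2.3's speed.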
Darmstadt Automotive Parallel Heterogeneous Suite - Backend: OpenMP - Kernel: Euclidean Cluster (Test Cases Per Minute, more is better)
GCC 10.2: 890.71 (SE +/- 0.77, N = 3)
AMD AOCC 2.3: 678.51 (SE +/- 0.47, N = 3)
LLVM Clang 11: 674.01 (SE +/- 2.53, N = 3)
Build: (CXX) g++ options: -O3 -std=c++11 -fopenmp
SVT-AV1 0.8 - Encoder Mode: Enc Mode 4 - Input: 1080p (Frames Per Second, more is better)
AMD AOCC 2.3: 8.657 (SE +/- 0.042, N = 3)
LLVM Clang 11: 8.591 (SE +/- 0.026, N = 3)
GCC 10.2: 6.775 (SE +/- 0.025, N = 3)
Build: (CXX) g++ options: -O3 -fcommon -fPIE -fPIC -pie
SVT-AV1 0.8 - Encoder Mode: Enc Mode 8 - Input: 1080p (Frames Per Second, more is better)
AMD AOCC 2.3: 70.70 (SE +/- 0.46, N = 3)
LLVM Clang 11: 70.23 (SE +/- 0.30, N = 3)
GCC 10.2: 55.71 (SE +/- 0.29, N = 3)
Build: (CXX) g++ options: -O3 -fcommon -fPIE -fPIC -pie
GraphicsMagick 1.3.33 - Operation: Resizing (Iterations Per Minute, more is better)
GCC 10.2: 2092 (SE +/- 17.89, N = 3)
LLVM Clang 11: 1827 (SE +/- 10.17, N = 3)
AMD AOCC 2.3: 1653 (SE +/- 22.27, N = 3)
Build: (CC) gcc options: -fopenmp -O3 -march=znver2 -pthread -ljpeg -lX11 -lz -lm -lpthread
NCNN 20200916 - Target: CPU - Model: googlenet (ms, fewer is better)
AMD AOCC 2.3: 16.51 (SE +/- 0.05, N = 3; -lomp; MIN: 16.26 / MAX: 18.75)
LLVM Clang 11: 18.88 (SE +/- 0.27, N = 15; -lomp; MIN: 16.91 / MAX: 23.31)
GCC 10.2: 20.85 (SE +/- 0.10, N = 3; -lgomp; MIN: 19.8 / MAX: 22.84)
Build: (CXX) g++ options: -O3 -march=znver2 -rdynamic -lpthread
LAME MP3 Encoding 3.100 - WAV To MP3 (Seconds, fewer is better)
GCC 10.2: 8.798 (SE +/- 0.004, N = 3)
LLVM Clang 11: 10.078 (SE +/- 0.003, N = 3)
AMD AOCC 2.3: 11.026 (SE +/- 0.004, N = 3)
Additional flags listed for one build: -ffast-math -funroll-loops -fschedule-insns2 -fbranch-count-reg -fforce-addr
Build: (CC) gcc options: -O3 -pipe -march=znver2 -lncurses -lm
oneDNN 1.5 - Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU (ms, fewer is better)
AMD AOCC 2.3: 0.464235 (SE +/- 0.000544, N = 3; -fopenmp=libomp; MIN: 0.45)
GCC 10.2: 0.532944 (SE +/- 0.001691, N = 3; -fopenmp; MIN: 0.51)
LLVM Clang 11: 0.575864 (SE +/- 0.001767, N = 3; -fopenmp=libomp; MIN: 0.55)
Build: (CXX) g++ options: -O3 -march=native -std=c++11 -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 1.5 - Harness: IP Batch 1D - Data Type: f32 - Engine: CPU (ms, fewer is better)
AMD AOCC 2.3: 1.40879 (SE +/- 0.00432, N = 3; -fopenmp=libomp; MIN: 1.36)
GCC 10.2: 1.67503 (SE +/- 0.00519, N = 3; -fopenmp; MIN: 1.59)
LLVM Clang 11: 1.72247 (SE +/- 0.00256, N = 3; -fopenmp=libomp; MIN: 1.62)
Build: (CXX) g++ options: -O3 -march=native -std=c++11 -msse4.1 -fPIC -pie -lpthread -ldl
TNN 0.2.3 - Target: CPU - Model: MobileNet v2 (ms, fewer is better)
GCC 10.2: 324.17 (SE +/- 0.62, N = 3; -fopenmp; MIN: 311.9 / MAX: 354.48)
AMD AOCC 2.3: 365.39 (SE +/- 0.24, N = 3; -fopenmp=libomp; MIN: 364.57 / MAX: 366.34)
LLVM Clang 11: 392.20 (SE +/- 0.45, N = 3; -fopenmp=libomp; MIN: 390.71 / MAX: 393.87)
Build: (CXX) g++ options: -O3 -march=znver2 -pthread -fvisibility=hidden -rdynamic -ldl
ASTC Encoder 2.0 - Preset: Medium (Seconds, fewer is better)
AMD AOCC 2.3: 5.81 (SE +/- 0.01, N = 3)
LLVM Clang 11: 6.05 (SE +/- 0.01, N = 3)
GCC 10.2: 7.00 (SE +/- 0.01, N = 3)
Build: (CXX) g++ options: -std=c++14 -fvisibility=hidden -O3 -flto -mfpmath=sse -mavx2 -mpopcnt -lpthread
NCNN 20200916 - Target: CPU - Model: resnet50 (ms, fewer is better)
AMD AOCC 2.3: 19.70 (SE +/- 0.18, N = 3; -lomp; MIN: 19.16 / MAX: 22.41)
LLVM Clang 11: 22.23 (SE +/- 0.20, N = 15; -lomp; MIN: 20.63 / MAX: 31.79)
GCC 10.2: 23.70 (SE +/- 0.16, N = 3; -lgomp; MIN: 23.21 / MAX: 25.68)
Build: (CXX) g++ options: -O3 -march=znver2 -rdynamic -lpthread
NCNN 20200916 - Target: CPU - Model: squeezenet (ms, fewer is better)
AMD AOCC 2.3: 14.38 (SE +/- 0.10, N = 3; -lomp; MIN: 13.96 / MAX: 16.87)
LLVM Clang 11: 15.35 (SE +/- 0.14, N = 15; -lomp; MIN: 14.29 / MAX: 18.88)
GCC 10.2: 17.29 (SE +/- 0.12, N = 3; -lgomp; MIN: 16.89 / MAX: 19.32)
Build: (CXX) g++ options: -O3 -march=znver2 -rdynamic -lpthread
GraphicsMagick 1.3.33 - Operation: Enhanced (Iterations Per Minute, more is better)
LLVM Clang 11: 627
GCC 10.2: 590
AMD AOCC 2.3: 526
Build: (CC) gcc options: -fopenmp -O3 -march=znver2 -pthread -ljpeg -lX11 -lz -lm -lpthread
NCNN 20200916 - Target: CPU - Model: vgg16 (ms, fewer is better)
GCC 10.2: 30.89 (SE +/- 0.30, N = 3; -lgomp; MIN: 30.04 / MAX: 50.21)
AMD AOCC 2.3: 32.03 (SE +/- 0.31, N = 3; -lomp; MIN: 30.86 / MAX: 35.2)
LLVM Clang 11: 36.60 (SE +/- 0.42, N = 15; -lomp; MIN: 32.93 / MAX: 48.08)
Build: (CXX) g++ options: -O3 -march=znver2 -rdynamic -lpthread
ASTC Encoder 2.0 - Preset: Thorough (Seconds, fewer is better)
AMD AOCC 2.3: 8.16 (SE +/- 0.00, N = 3)
LLVM Clang 11: 8.32 (SE +/- 0.00, N = 3)
GCC 10.2: 9.61 (SE +/- 0.00, N = 3)
Build: (CXX) g++ options: -std=c++14 -fvisibility=hidden -O3 -flto -mfpmath=sse -mavx2 -mpopcnt -lpthread
WebP Image Encode 1.1 - Encode Settings: Quality 100, Highest Compression (Encode Time - Seconds, fewer is better)
AMD AOCC 2.3: 7.546 (SE +/- 0.008, N = 3)
LLVM Clang 11: 7.655 (SE +/- 0.008, N = 3)
GCC 10.2: 8.861 (SE +/- 0.003, N = 3)
Build: (CC) gcc options: -fvisibility=hidden -O3 -march=znver2 -pthread -lm -ljpeg
oneDNN 1.5 - Harness: Deconvolution Batch deconv_1d - Data Type: f32 - Engine: CPU (ms, fewer is better)
AMD AOCC 2.3: 1.66772 (SE +/- 0.00186, N = 3; -fopenmp=libomp; MIN: 1.61)
LLVM Clang 11: 1.69390 (SE +/- 0.01023, N = 3; -fopenmp=libomp; MIN: 1.63)
GCC 10.2: 1.95695 (SE +/- 0.01336, N = 3; -fopenmp; MIN: 1.87)
Build: (CXX) g++ options: -O3 -march=native -std=c++11 -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 1.5 - Harness: IP Batch 1D - Data Type: u8s8f32 - Engine: CPU (ms, fewer is better)
AMD AOCC 2.3: 1.00926 (SE +/- 0.00275, N = 3; -fopenmp=libomp; MIN: 0.95)
LLVM Clang 11: 1.03137 (SE +/- 0.00137, N = 3; -fopenmp=libomp; MIN: 0.97)
GCC 10.2: 1.17556 (SE +/- 0.00350, N = 3; -fopenmp; MIN: 1.13)
Build: (CXX) g++ options: -O3 -march=native -std=c++11 -msse4.1 -fPIC -pie -lpthread -ldl
AOBench - Size: 2048 x 2048 - Total Time (Seconds, fewer is better)
GCC 10.2: 35.78 (SE +/- 0.02, N = 3)
AMD AOCC 2.3: 37.76 (SE +/- 0.04, N = 3)
LLVM Clang 11: 41.66 (SE +/- 0.02, N = 3)
Build: (CC) gcc options: -lm -O3 -march=znver2
oneDNN 1.5 - Harness: IP Batch All - Data Type: u8s8f32 - Engine: CPU (ms, fewer is better)
AMD AOCC 2.3: 11.92 (SE +/- 0.02, N = 3; -fopenmp=libomp; MIN: 11.65)
LLVM Clang 11: 13.46 (SE +/- 0.04, N = 3; -fopenmp=libomp; MIN: 13.14)
GCC 10.2: 13.88 (SE +/- 0.06, N = 3; -fopenmp; MIN: 13.35)
Build: (CXX) g++ options: -O3 -march=native -std=c++11 -msse4.1 -fPIC -pie -lpthread -ldl
NCNN 20200916 - Target: CPU - Model: mobilenet (ms, fewer is better)
AMD AOCC 2.3: 17.02 (SE +/- 0.34, N = 3; -lomp; MIN: 16.39 / MAX: 20.11)
LLVM Clang 11: 17.85 (SE +/- 0.13, N = 15; -lomp; MIN: 16.89 / MAX: 21.04)
GCC 10.2: 19.46 (SE +/- 0.12, N = 3; -lgomp; MIN: 18.87 / MAX: 31.76)
Build: (CXX) g++ options: -O3 -march=znver2 -rdynamic -lpthread
TSCP 1.81 - AI Chess Performance (Nodes Per Second, more is better)
LLVM Clang 11: 1143642 (SE +/- 582.00, N = 5)
AMD AOCC 2.3: 1138442 (SE +/- 471.20, N = 5)
GCC 10.2: 1007283 (SE +/- 1467.20, N = 5)
Build: (CC) gcc options: -O3 -march=znver2 -march=native
VP9 libvpx Encoding 1.8.2 - Speed: Speed 0 (Frames Per Second, more is better)
AMD AOCC 2.3: 6.65 (SE +/- 0.00, N = 3)
GCC 10.2: 6.27 (SE +/- 0.02, N = 3)
LLVM Clang 11: 5.96 (SE +/- 0.09, N = 3)
Build: (CXX) g++ options: -m64 -lm -lpthread -O3 -march=znver2 -fPIC -U_FORTIFY_SOURCE -std=c++11
ASTC Encoder 2.0 - Preset: Exhaustive (Seconds, fewer is better)
AMD AOCC 2.3: 66.04 (SE +/- 0.05, N = 3)
LLVM Clang 11: 67.17 (SE +/- 0.01, N = 3)
GCC 10.2: 73.46 (SE +/- 0.02, N = 3)
Build: (CXX) g++ options: -std=c++14 -fvisibility=hidden -O3 -flto -mfpmath=sse -mavx2 -mpopcnt -lpthread
Basis Universal 1.12 - Settings: UASTC Level 2 + RDO Post-Processing (Seconds, fewer is better)
GCC 10.2: 755.52 (SE +/- 0.14, N = 3)
AMD AOCC 2.3: 833.54 (SE +/- 0.04, N = 3)
LLVM Clang 11: 837.54 (SE +/- 0.24, N = 3)
Build: (CXX) g++ options: -std=c++11 -fvisibility=hidden -fPIC -fno-strict-aliasing -O3 -rdynamic -lm -lpthread
VP9 libvpx Encoding 1.8.2 - Speed: Speed 5 (Frames Per Second, more is better)
AMD AOCC 2.3: 20.12 (SE +/- 0.10, N = 3)
GCC 10.2: 19.03 (SE +/- 0.04, N = 3)
LLVM Clang 11: 18.23 (SE +/- 0.07, N = 3)
Build: (CXX) g++ options: -m64 -lm -lpthread -O3 -march=znver2 -fPIC -U_FORTIFY_SOURCE -std=c++11
NCNN 20200916 - Target: CPU - Model: resnet18 (ms, fewer is better)
AMD AOCC 2.3: 11.99 (SE +/- 0.15, N = 3; -lomp; MIN: 11.71 / MAX: 14.27)
GCC 10.2: 13.12 (SE +/- 0.09, N = 3; -lgomp; MIN: 12.79 / MAX: 15.11)
LLVM Clang 11: 13.19 (SE +/- 0.12, N = 15; -lomp; MIN: 12.17 / MAX: 23.53)
Build: (CXX) g++ options: -O3 -march=znver2 -rdynamic -lpthread
oneDNN 1.5 - Harness: Deconvolution Batch deconv_1d - Data Type: u8s8f32 - Engine: CPU (ms, fewer is better)
AMD AOCC 2.3: 1.99141 (SE +/- 0.00112, N = 3; -fopenmp=libomp; MIN: 1.92)
LLVM Clang 11: 2.05539 (SE +/- 0.00233, N = 3; -fopenmp=libomp; MIN: 1.96)
GCC 10.2: 2.17957 (SE +/- 0.00145, N = 3; -fopenmp; MIN: 2.05)
Build: (CXX) g++ options: -O3 -march=native -std=c++11 -msse4.1 -fPIC -pie -lpthread -ldl
Darmstadt Automotive Parallel Heterogeneous Suite - Backend: OpenMP - Kernel: NDT Mapping (Test Cases Per Minute, more is better)
LLVM Clang 11: 945.32 (SE +/- 2.29, N = 3)
AMD AOCC 2.3: 923.68 (SE +/- 1.07, N = 3)
GCC 10.2: 874.54 (SE +/- 2.97, N = 3)
Build: (CXX) g++ options: -O3 -std=c++11 -fopenmp
Stockfish 12 - Total Time (Nodes Per Second, more is better)
LLVM Clang 11: 62434784 (SE +/- 877956.02, N = 4; -flto=thin)
AMD AOCC 2.3: 62375605 (SE +/- 379890.17, N = 3; -flto=thin)
GCC 10.2: 58347386 (SE +/- 872585.44, N = 3; -flto -flto=jobserver)
Build: (CXX) g++ options: -m64 -lpthread -O3 -march=znver2 -fno-exceptions -std=c++17 -pedantic -msse -msse3 -mpopcnt -msse4.1 -mssse3 -msse2
LZ4 Compression 1.9.3 - Compression Level: 3 - Compression Speed (MB/s, more is better)
AMD AOCC 2.3: 48.52 (SE +/- 0.00, N = 3)
LLVM Clang 11: 48.40 (SE +/- 0.04, N = 3)
GCC 10.2: 45.57 (SE +/- 0.66, N = 3)
Build: (CC) gcc options: -O3
oneDNN 1.5 - Harness: Deconvolution Batch deconv_3d - Data Type: u8s8f32 - Engine: CPU (ms, fewer is better)
AMD AOCC 2.3: 1.87485 (SE +/- 0.00189, N = 3; -fopenmp=libomp; MIN: 1.82)
GCC 10.2: 1.94380 (SE +/- 0.00392, N = 3; -fopenmp; MIN: 1.82)
LLVM Clang 11: 1.98927 (SE +/- 0.00691, N = 3; -fopenmp=libomp; MIN: 1.9)
Build: (CXX) g++ options: -O3 -march=native -std=c++11 -msse4.1 -fPIC -pie -lpthread -ldl
SVT-VP9 0.1 - Tuning: PSNR/SSIM Optimized - Input: Bosphorus 1080p (Frames Per Second, more is better)
AMD AOCC 2.3: 368.17 (SE +/- 0.54, N = 3)
LLVM Clang 11: 365.57 (SE +/- 1.74, N = 3)
GCC 10.2: 348.24 (SE +/- 1.42, N = 3)
Build: (CC) gcc options: -O3 -fcommon -fPIE -fPIC -fvisibility=hidden -pie -rdynamic -lpthread -lrt -lm
GraphicsMagick 1.3.33 - Operation: Swirl (Iterations Per Minute, more is better)
AMD AOCC 2.3: 1368 (SE +/- 4.26, N = 3)
LLVM Clang 11: 1323 (SE +/- 2.40, N = 3)
GCC 10.2: 1295 (SE +/- 1.00, N = 3)
Build: (CC) gcc options: -fopenmp -O3 -march=znver2 -pthread -ljpeg -lX11 -lz -lm -lpthread
x265 3.4 - Video Input: Bosphorus 4K (Frames Per Second, more is better)
AMD AOCC 2.3: 23.69 (SE +/- 0.05, N = 3)
GCC 10.2: 22.83 (SE +/- 0.06, N = 3)
LLVM Clang 11: 22.44 (SE +/- 0.03, N = 3)
Build: (CXX) g++ options: -O3 -march=znver2 -rdynamic -lpthread -lrt -ldl -lnuma
WebP Image Encode 1.1 - Encode Settings: Quality 100, Lossless, Highest Compression (Encode Time - Seconds, fewer is better)
LLVM Clang 11: 42.38 (SE +/- 0.12, N = 3)
AMD AOCC 2.3: 42.70 (SE +/- 0.01, N = 3)
GCC 10.2: 44.38 (SE +/- 0.12, N = 3)
Build: (CC) gcc options: -fvisibility=hidden -O3 -march=znver2 -pthread -lm -ljpeg
RNNoise 2020-06-28 (Seconds, fewer is better)
AMD AOCC 2.3: 20.76 (SE +/- 0.01, N = 3)
LLVM Clang 11: 21.54 (SE +/- 0.00, N = 3)
GCC 10.2: 21.72 (SE +/- 0.02, N = 3)
Build: (CC) gcc options: -O3 -march=znver2 -pedantic -fvisibility=hidden
NCNN 20200916 - Target: CPU - Model: yolov4-tiny (ms, fewer is better)
AMD AOCC 2.3: 29.38 (SE +/- 0.15, N = 3; -lomp; MIN: 28.77 / MAX: 31.68)
GCC 10.2: 29.47 (SE +/- 0.12, N = 3; -lgomp; MIN: 28.89 / MAX: 31.49)
LLVM Clang 11: 30.70 (SE +/- 0.19, N = 15; -lomp; MIN: 29.08 / MAX: 40.6)
Build: (CXX) g++ options: -O3 -march=znver2 -rdynamic -lpthread
PostgreSQL pgbench 13.0 - Scaling Factor: 1 - Clients: 50 - Mode: Read Only - Average Latency (ms, fewer is better)
AMD AOCC 2.3: 0.092 (SE +/- 0.000, N = 3)
LLVM Clang 11: 0.094 (SE +/- 0.001, N = 3)
GCC 10.2: 0.096 (SE +/- 0.001, N = 3)
Build: (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=znver2 -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm
Zstd Compression 1.4.5 - Compression Level: 19 (MB/s, more is better)
AMD AOCC 2.3: 114.6 (SE +/- 0.26, N = 3)
LLVM Clang 11: 111.2 (SE +/- 0.30, N = 3)
GCC 10.2: 109.9 (SE +/- 0.23, N = 3)
Build: (CC) gcc options: -O3 -march=znver2 -pthread -lz
SVT-VP9 0.1 - Tuning: Visual Quality Optimized - Input: Bosphorus 1080p (Frames Per Second, more is better)
AMD AOCC 2.3: 291.22 (SE +/- 0.90, N = 3)
LLVM Clang 11: 286.24 (SE +/- 1.68, N = 3)
GCC 10.2: 279.73 (SE +/- 1.18, N = 3)
Build: (CC) gcc options: -O3 -fcommon -fPIE -fPIC -fvisibility=hidden -pie -rdynamic -lpthread -lrt -lm
LZ4 Compression 1.9.3 - Compression Level: 1 - Compression Speed (MB/s, more is better)
LLVM Clang 11: 9838.54 (SE +/- 55.02, N = 3)
AMD AOCC 2.3: 9780.39 (SE +/- 45.16, N = 3)
GCC 10.2: 9459.69 (SE +/- 56.41, N = 3)
Build: (CC) gcc options: -O3
SciMark 2.0 - Computational Test: Composite (Mflops, more is better)
AMD AOCC 2.3: 2779.62 (SE +/- 43.99, N = 3)
GCC 10.2: 2759.35 (SE +/- 14.97, N = 3)
LLVM Clang 11: 2673.79 (SE +/- 7.04, N = 3)
Build: (CC) gcc options: -O3 -march=znver2 -lm
SQLite Speedtest 3.30 - Timed Time - Size 1,000 (Seconds, fewer is better)
GCC 10.2: 75.14 (SE +/- 0.13, N = 3)
LLVM Clang 11: 77.64 (SE +/- 0.18, N = 3)
AMD AOCC 2.3: 78.07 (SE +/- 0.09, N = 3)
Build: (CC) gcc options: -O3 -march=znver2 -ldl -lz -lpthread
Timed MrBayes Analysis 3.2.7 - Primate Phylogeny Analysis (Seconds, fewer is better)
AMD AOCC 2.3: 90.76 (SE +/- 0.04, N = 3)
LLVM Clang 11: 93.84 (SE +/- 0.04, N = 3)
GCC 10.2: 94.28 (SE +/- 0.17, N = 3)
Additional flag listed for one build: -mabm
Build: (CC) gcc options: -mmmx -msse -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msse4a -msha -maes -mavx -mfma -mavx2 -mrdrnd -mbmi -mbmi2 -madx -O3 -std=c99 -pedantic -march=znver2 -lm
PostgreSQL pgbench 13.0 - Scaling Factor: 1 - Clients: 50 - Mode: Read Only (TPS, more is better)
AMD AOCC 2.3: 541314 (SE +/- 798.75, N = 3)
LLVM Clang 11: 530684 (SE +/- 4278.21, N = 3)
GCC 10.2: 521332 (SE +/- 4440.25, N = 3)
Build: (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=znver2 -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm
x264 2019-12-17 - H.264 Video Encoding (Frames Per Second, more is better)
AMD AOCC 2.3: 151.92 (SE +/- 0.67, N = 3; -mstack-alignment=64)
GCC 10.2: 149.35 (SE +/- 1.24, N = 3)
LLVM Clang 11: 146.43 (SE +/- 0.59, N = 3; -mstack-alignment=64)
Build: (CC) gcc options: -ldl -m64 -lm -lpthread -O3 -ffast-math -march=znver2 -std=gnu99 -fPIC -fomit-frame-pointer -fno-tree-vectorize
dav1d 0.7.0 - Video Input: Summer Nature 1080p (FPS, more is better)
AMD AOCC 2.3: 588.62 (SE +/- 2.09, N = 3; MIN: 345.64 / MAX: 651.08)
LLVM Clang 11: 584.65 (SE +/- 0.95, N = 3; MIN: 337.56 / MAX: 641.35)
GCC 10.2: 567.42 (SE +/- 0.36, N = 3; MIN: 337.19 / MAX: 625.71)
Build: (CC) gcc options: -O3 -march=znver2 -pthread
PostgreSQL pgbench 13.0 - Scaling Factor: 1 - Clients: 1 - Mode: Read Write - Average Latency (ms, fewer is better)
AMD AOCC 2.3: 0.259 (SE +/- 0.001, N = 3)
LLVM Clang 11: 0.265 (SE +/- 0.000, N = 3)
GCC 10.2: 0.268 (SE +/- 0.003, N = 5)
Build: (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=znver2 -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm
PostgreSQL pgbench 13.0 - Scaling Factor: 1 - Clients: 1 - Mode: Read Write (TPS, more is better)
AMD AOCC 2.3: 3865 (SE +/- 18.01, N = 3)
LLVM Clang 11: 3772 (SE +/- 3.59, N = 3)
GCC 10.2: 3739 (SE +/- 46.08, N = 5)
Build: (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=znver2 -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm
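The pgbench latency and TPS figures are two views of the same runs. With a fixed number of clients each issuing transactions back to back, Little's law gives average latency ≈ clients / TPS, so one can be checked against the other. The sketch below does that for several pgbench results in this comparison (the 1-client Read Only TPS value comes from the result overview); small last-digit mismatches are expected because each metric is averaged across runs separately.

```python
def expected_latency_ms(clients: int, tps: float) -> float:
    """Little's law for a closed loop: latency = concurrency / throughput, in milliseconds."""
    return clients / tps * 1000.0


# (clients, reported TPS, reported average latency in ms) from the pgbench results
CASES = {
    "50 clients RO, AMD AOCC 2.3": (50, 541314, 0.092),
    "50 clients RO, GCC 10.2": (50, 521332, 0.096),
    "1 client RW, AMD AOCC 2.3": (1, 3865, 0.259),
    "1 client RW, GCC 10.2": (1, 3739, 0.268),
    "1 client RO, AMD AOCC 2.3": (1, 29645, 0.034),
}

for name, (clients, tps, reported) in CASES.items():
    predicted = expected_latency_ms(clients, tps)
    print(f"{name}: predicted {predicted:.3f} ms, reported {reported:.3f} ms")
```

All five predictions land within about half a microsecond of the reported averages, which suggests the latency and throughput results are consistent with each other.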
LZ4 Compression 1.9.3 - Compression Level: 9 - Compression Speed (MB/s, More Is Better)
  AMD AOCC 2.3: 45.76 (SE +/- 0.03, N = 3)
  GCC 10.2: 44.72 (SE +/- 0.60, N = 3)
  LLVM Clang 11: 44.30 (SE +/- 0.02, N = 3)
  (CC) gcc options: -O3
SVT-VP9 0.1 - Tuning: VMAF Optimized - Input: Bosphorus 1080p (Frames Per Second, More Is Better)
  AMD AOCC 2.3: 366.61 (SE +/- 1.39, N = 3)
  LLVM Clang 11: 363.72 (SE +/- 1.70, N = 3)
  GCC 10.2: 354.98 (SE +/- 2.15, N = 3)
  (CC) gcc options: -O3 -fcommon -fPIE -fPIC -fvisibility=hidden -pie -rdynamic -lpthread -lrt -lm
x265 3.4 - Video Input: Bosphorus 1080p (Frames Per Second, More Is Better)
  AMD AOCC 2.3: 50.60 (SE +/- 0.14, N = 3)
  LLVM Clang 11: 50.02 (SE +/- 0.09, N = 3)
  GCC 10.2: 49.05 (SE +/- 0.13, N = 3)
  (CXX) g++ options: -O3 -march=znver2 -rdynamic -lpthread -lrt -ldl -lnuma
PostgreSQL pgbench 13.0 - Scaling Factor: 1 - Clients: 1 - Mode: Read Only - Average Latency (ms, Fewer Is Better)
  AMD AOCC 2.3: 0.034 (SE +/- 0.000, N = 3)
  GCC 10.2: 0.035 (SE +/- 0.000, N = 3)
  LLVM Clang 11: 0.035 (SE +/- 0.000, N = 3)
  (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=znver2 -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm
libjpeg-turbo tjbench 2.0.2 - Test: Decompression Throughput (Megapixels/sec, More Is Better)
  AMD AOCC 2.3: 176.87 (SE +/- 0.02, N = 3)
  LLVM Clang 11: 174.99 (SE +/- 0.21, N = 3)
  GCC 10.2: 171.89 (SE +/- 0.04, N = 3)
  (CC) gcc options: -O3 -march=znver2 -rdynamic
Crypto++ 8.2 - Test: Unkeyed Algorithms (MiB/second, More Is Better)
  LLVM Clang 11: 314.39 (SE +/- 0.20, N = 3)
  AMD AOCC 2.3: 312.64 (SE +/- 0.16, N = 3)
  GCC 10.2: 305.90 (SE +/- 0.09, N = 3)
  (CXX) g++ options: -O3 -march=znver2 -fPIC -pthread -pipe
PostgreSQL pgbench 13.0 - Scaling Factor: 1 - Clients: 1 - Mode: Read Only (TPS, More Is Better)
  AMD AOCC 2.3: 29645 (SE +/- 73.04, N = 3)
  LLVM Clang 11: 28921 (SE +/- 252.07, N = 3)
  GCC 10.2: 28919 (SE +/- 137.51, N = 3)
  (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=znver2 -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm
TNN 0.2.3 - Target: CPU - Model: SqueezeNet v1.1 (ms, Fewer Is Better)
  AMD AOCC 2.3: 304.18 (SE +/- 0.66, N = 3; -fopenmp=libomp; MIN: 302.78 / MAX: 315.91)
  GCC 10.2: 305.13 (SE +/- 0.20, N = 3; -fopenmp; MIN: 304.36 / MAX: 306.06)
  LLVM Clang 11: 311.68 (SE +/- 1.92, N = 3; -fopenmp=libomp; MIN: 306.99 / MAX: 314.32)
  (CXX) g++ options: -O3 -march=znver2 -pthread -fvisibility=hidden -rdynamic -ldl
NGINX Benchmark 1.9.9 - Static Web Page Serving (Requests Per Second, More Is Better)
  AMD AOCC 2.3: 31368.86 (SE +/- 254.59, N = 15)
  LLVM Clang 11: 30676.68 (SE +/- 159.74, N = 3)
  GCC 10.2: 30658.67 (SE +/- 381.73, N = 4)
  (CC) gcc options: -lpthread -lcrypt -lcrypto -lz -O3 -march=native -march=znver2
dav1d 0.7.0 - Video Input: Summer Nature 4K (FPS, More Is Better)
  LLVM Clang 11: 275.41 (SE +/- 1.08, N = 3; MIN: 155.99 / MAX: 295.35)
  AMD AOCC 2.3: 274.20 (SE +/- 1.22, N = 3; MIN: 151.98 / MAX: 293.44)
  GCC 10.2: 269.72 (SE +/- 0.57, N = 3; MIN: 160.12 / MAX: 288.9)
  (CC) gcc options: -O3 -march=znver2 -pthread
GraphicsMagick 1.3.33 - Operation: Rotate (Iterations Per Minute, More Is Better)
  GCC 10.2: 535
  LLVM Clang 11: 527
  AMD AOCC 2.3: 525
  Standard errors reported (two entries for three results): SE +/- 5.24, N = 3; SE +/- 1.00, N = 3
  (CC) gcc options: -fopenmp -O3 -march=znver2 -pthread -ljpeg -lX11 -lz -lm -lpthread
dav1d 0.7.0 - Video Input: Chimera 1080p (FPS, More Is Better)
  AMD AOCC 2.3: 575.22 (SE +/- 0.49, N = 3; MIN: 414.12 / MAX: 729.95)
  LLVM Clang 11: 572.19 (SE +/- 0.97, N = 3; MIN: 404.8 / MAX: 726.7)
  GCC 10.2: 564.96 (SE +/- 1.05, N = 3; MIN: 399.64 / MAX: 724.27)
  (CC) gcc options: -O3 -march=znver2 -pthread
oneDNN 1.5 - Harness: Deconvolution Batch deconv_3d - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  AMD AOCC 2.3: 3.66096 (SE +/- 0.01101, N = 3; -fopenmp=libomp; MIN: 3.49)
  GCC 10.2: 3.68073 (SE +/- 0.01624, N = 3; -fopenmp; MIN: 3.53)
  LLVM Clang 11: 3.72354 (SE +/- 0.00775, N = 3; -fopenmp=libomp; MIN: 3.57)
  (CXX) g++ options: -O3 -march=native -std=c++11 -msse4.1 -fPIC -pie -lpthread -ldl
Basis Universal 1.12 - Settings: UASTC Level 3 (Seconds, Fewer Is Better)
  AMD AOCC 2.3: 25.21 (SE +/- 0.00, N = 3)
  LLVM Clang 11: 25.32 (SE +/- 0.01, N = 3)
  GCC 10.2: 25.52 (SE +/- 0.01, N = 3)
  (CXX) g++ options: -std=c++11 -fvisibility=hidden -fPIC -fno-strict-aliasing -O3 -rdynamic -lm -lpthread
Basis Universal 1.12 - Settings: UASTC Level 2 (Seconds, Fewer Is Better)
  AMD AOCC 2.3: 16.32 (SE +/- 0.01, N = 3)
  LLVM Clang 11: 16.49 (SE +/- 0.01, N = 3)
  GCC 10.2: 16.51 (SE +/- 0.02, N = 3)
  (CXX) g++ options: -std=c++11 -fvisibility=hidden -fPIC -fno-strict-aliasing -O3 -rdynamic -lm -lpthread
PostgreSQL pgbench 13.0 - Scaling Factor: 1 - Clients: 50 - Mode: Read Write (TPS, More Is Better)
  GCC 10.2: 3453 (SE +/- 5.86, N = 3)
  LLVM Clang 11: 3443 (SE +/- 4.43, N = 3)
  AMD AOCC 2.3: 3413 (SE +/- 5.71, N = 3)
  (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=znver2 -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm
PostgreSQL pgbench 13.0 - Scaling Factor: 1 - Clients: 50 - Mode: Read Write - Average Latency (ms, Fewer Is Better)
  GCC 10.2: 14.49 (SE +/- 0.02, N = 3)
  LLVM Clang 11: 14.53 (SE +/- 0.02, N = 3)
  AMD AOCC 2.3: 14.66 (SE +/- 0.02, N = 3)
  (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=znver2 -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm
Zstd Compression 1.4.5 - Compression Level: 3 (MB/s, More Is Better)
  AMD AOCC 2.3: 7937.8 (SE +/- 3.73, N = 3)
  LLVM Clang 11: 7866.2 (SE +/- 30.93, N = 3)
  GCC 10.2: 7849.6 (SE +/- 6.11, N = 3)
  (CC) gcc options: -O3 -march=znver2 -pthread -lz
WebP Image Encode 1.1 - Encode Settings: Quality 100, Lossless (Encode Time - Seconds, Fewer Is Better)
  LLVM Clang 11: 20.72 (SE +/- 0.02, N = 3)
  GCC 10.2: 20.80 (SE +/- 0.08, N = 3)
  AMD AOCC 2.3: 20.90 (SE +/- 0.03, N = 3)
  (CC) gcc options: -fvisibility=hidden -O3 -march=znver2 -pthread -lm -ljpeg
Hierarchical INTegration 1.0 - Test: FLOAT (QUIPs, More Is Better)
  AMD AOCC 2.3: 294314450.62 (SE +/- 30193.74, N = 3)
  LLVM Clang 11: 292874904.54 (SE +/- 170353.62, N = 3)
  GCC 10.2: 291925951.64 (SE +/- 15707.07, N = 3)
  (CC) gcc options: -O3 -march=znver2 -march=native -lm
NCNN 20200916 - Target: CPU - Model: alexnet (ms, Fewer Is Better)
  AMD AOCC 2.3: 8.94 (SE +/- 0.01, N = 3; -lomp; MIN: 8.81 / MAX: 13.49)
  GCC 10.2: 9.46 (SE +/- 0.14, N = 3; -lgomp; MIN: 9.17 / MAX: 11.49)
  LLVM Clang 11: 10.83 (SE +/- 0.18, N = 15; -lomp; MIN: 9.15 / MAX: 60.4)
  (CXX) g++ options: -O3 -march=znver2 -rdynamic -lpthread
NCNN 20200916 - Target: CPU - Model: shufflenet-v2 (ms, Fewer Is Better)
  AMD AOCC 2.3: 6.25 (SE +/- 0.12, N = 3; -lomp; MIN: 5.96 / MAX: 6.53)
  LLVM Clang 11: 7.35 (SE +/- 0.02, N = 15; -lomp; MIN: 7.09 / MAX: 11.01)
  GCC 10.2: 10.59 (SE +/- 0.68, N = 3; -lgomp; MIN: 9.42 / MAX: 13.72)
  (CXX) g++ options: -O3 -march=znver2 -rdynamic -lpthread
Redis 6.0.9 - Test: SET (Requests Per Second, More Is Better)
  LLVM Clang 11: 1483322.31 (SE +/- 22282.05, N = 15)
  AMD AOCC 2.3: 1446872.35 (SE +/- 27170.26, N = 15)
  GCC 10.2: 1369757.13 (SE +/- 24810.19, N = 15)
  (CXX) g++ options: -MM -MT -g3 -fvisibility=hidden -O3 -march=znver2
Redis 6.0.9 - Test: GET (Requests Per Second, More Is Better)
  LLVM Clang 11: 2122749.43 (SE +/- 49885.99, N = 15)
  AMD AOCC 2.3: 1874175.91 (SE +/- 30693.11, N = 15)
  GCC 10.2: 1809976.83 (SE +/- 18838.90, N = 3)
  (CXX) g++ options: -MM -MT -g3 -fvisibility=hidden -O3 -march=znver2
Redis 6.0.9 - Test: LPUSH (Requests Per Second, More Is Better)
  AMD AOCC 2.3: 1380024.29 (SE +/- 18763.42, N = 3)
  LLVM Clang 11: 1304842.23 (SE +/- 22719.73, N = 15)
  GCC 10.2: 1212068.06 (SE +/- 21030.89, N = 15)
  (CXX) g++ options: -MM -MT -g3 -fvisibility=hidden -O3 -march=znver2
oneDNN 1.5 - Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  AMD AOCC 2.3: 30.72 (SE +/- 0.32, N = 3; -fopenmp=libomp; MIN: 29.35)
  LLVM Clang 11: 64.99 (SE +/- 2.37, N = 15; -fopenmp=libomp; MIN: 50.95)
  GCC 10.2: 79.25 (SE +/- 0.80, N = 3; -fopenmp; MIN: 77.35)
  (CXX) g++ options: -O3 -march=native -std=c++11 -msse4.1 -fPIC -pie -lpthread -ldl
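This comparison mixes more-is-better metrics (FPS, TPS, requests per second) with fewer-is-better ones (ms, seconds). A speedup ratio therefore has to be inverted for the latter. The sketch below uses a hypothetical `relative_performance` helper on two results from this file:

- On the oneDNN recurrent neural network inference test, AMD AOCC 2.3 comes out roughly 2.58x faster than GCC 10.2.
- On Redis GET, LLVM Clang 11 handles roughly 1.17x the requests per second of GCC 10.2.

```python
def relative_performance(baseline, other, higher_is_better):
    """How many times better `other` performs than `baseline`.

    Higher-is-better metrics (FPS, TPS, requests/s) use other / baseline;
    fewer-is-better metrics (ms, seconds) invert the ratio to baseline / other.
    """
    if baseline <= 0 or other <= 0:
        raise ValueError("results must be positive")
    if higher_is_better:
        return other / baseline
    return baseline / other


# oneDNN RNN inference, ms (fewer is better): GCC 10.2 as baseline, AMD AOCC 2.3 as other.
print(round(relative_performance(79.25, 30.72, higher_is_better=False), 2))  # 2.58
# Redis GET, requests per second (more is better): GCC 10.2 baseline, LLVM Clang 11 other.
print(round(relative_performance(1809976.83, 2122749.43, higher_is_better=True), 2))  # 1.17
```

Expressing every result as "times better than the baseline" keeps the two kinds of metric on the same scale, so values above 1 always favor `other`.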
Geometric Mean Of All Test Results - Result Composite - EPYC 7502 AOCC 2.3 Compiler Comparison (Geometric Mean, More Is Better)
  AMD AOCC 2.3: 121.37
  LLVM Clang 11: 115.39
  GCC 10.2: 113.07
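A geometric mean is a common way to combine results with different units into one composite, because it works on ratios rather than raw values. The sketch below shows the technique on three AMD AOCC 2.3 vs GCC 10.2 results from this file. Each result is normalized so that a value above 1 favors AOCC.

This is only an illustration under that assumption. OpenBenchmarking.org's exact normalization for the 89-test composite is not described here, and three tests will not reproduce the 121.37 and 113.07 figures.

```python
import math


def geometric_mean(values):
    """Geometric mean computed as exp(mean(log(v))), which avoids overflow on long lists."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs a non-empty list of positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# (AMD AOCC 2.3 result, GCC 10.2 result, higher_is_better), copied from this comparison.
tests = [
    (151.92, 149.35, True),   # x264, Frames Per Second
    (304.18, 305.13, False),  # TNN SqueezeNet v1.1, ms
    (30.72, 79.25, False),    # oneDNN RNN inference, ms
]

# Normalize every test so that a ratio above 1 means AOCC 2.3 did better.
ratios = [aocc / gcc if hib else gcc / aocc for aocc, gcc, hib in tests]
print(f"geometric mean of AOCC/GCC ratios over {len(tests)} tests: {geometric_mean(ratios):.3f}")
```

Unlike an arithmetic mean of raw values, this composite is not dominated by whichever test happens to report the largest numbers. For example, the Hierarchical INTegration results in QUIPs run to hundreds of millions.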
Number Of First Place Finishes (Wins) - 89 Tests
  LLVM Clang 11: 11 [12.4%]
  GCC 10.2: 17 [19.1%]
  AMD AOCC 2.3: 61 [68.5%]
Number Of Last Place Finishes (Losses) - 89 Tests
  AMD AOCC 2.3: 10 [11.2%]
  LLVM Clang 11: 23 [25.8%]
  GCC 10.2: 56 [62.9%]
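The percentages in the win and loss tallies are each count divided by the 89 tests. The small Python check below reproduces the rounded figures. The `shares` helper is hypothetical, and it assumes, as holds for this file, that each tally sums to 89.

```python
TOTAL_TESTS = 89

# Counts copied from the first-place and last-place tallies in this comparison.
wins = {"LLVM Clang 11": 11, "GCC 10.2": 17, "AMD AOCC 2.3": 61}
losses = {"AMD AOCC 2.3": 10, "LLVM Clang 11": 23, "GCC 10.2": 56}


def shares(counts, total):
    """Percentage share of each compiler, rounded to one decimal place."""
    if sum(counts.values()) != total:
        raise ValueError("expected the counts to cover every test exactly once")
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


print(shares(wins, TOTAL_TESTS))
print(shares(losses, TOTAL_TESTS))
```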
Phoronix Test Suite v10.8.5