AMD EPYC 7763 64-Core testing with a Supermicro H12SSL-i v1.01 (2.0 BIOS) and ASPEED on Ubuntu 20.04 via the Phoronix Test Suite.
Compare your own system(s) to this result file with the
Phoronix Test Suite by running the command:
phoronix-test-suite benchmark 2104140-IB-EPYC7763L31
HTML result view exported from: https://openbenchmarking.org/result/2104140-IB-EPYC7763L31&gru&sro
EPYC 7763 LLVM Clang Compiler Tests

Configurations tested: Clang 12.0, Clang 11.0, Clang 12.0 LTO, GCC 9.3, GCC 10.3, GCC 11.0.1, AMD AOCC 3.0

System (common to all configurations):
- Processor: AMD EPYC 7763 64-Core @ 2.45GHz (64 Cores / 128 Threads)
- Motherboard: Supermicro H12SSL-i v1.01 (2.0 BIOS)
- Chipset: AMD Starship/Matisse
- Memory: 126GB
- Disk: 3841GB Micron_9300_MTFDHAL3T8TDP
- Graphics: ASPEED
- Network: 2 x Broadcom NetXtreme BCM5720 2-port PCIe
- OS: Ubuntu 20.04
- Kernel: 5.12.0-051200rc6daily20210408-generic (x86_64) 20210407
- Desktop: GNOME Shell 3.36.4
- Display Server: X Server 1.20.8
- File-System: ext4
- Screen Resolution: 1024x768

Compiler (per configuration):
- Clang 12.0: Clang 12.0.0-++20210409092622+fa0971b87fb2-1~exp1~20210409193326.73
- Clang 11.0: Clang 11.0.0-2~ubuntu20.04.1
- Clang 12.0 LTO: Clang 12.0.0-++20210409092622+fa0971b87fb2-1~exp1~20210409193326.73
- GCC 9.3: GCC 9.3.0
- GCC 10.3: GCC 10.3.0
- GCC 11.0.1: GCC 11.0.1 20210413
- AMD AOCC 3.0: Clang 12.0.0

OpenBenchmarking.org

Kernel Details
- Transparent Huge Pages: madvise

Environment Details
- Clang 12.0: CXXFLAGS="-O3 -march=native" CFLAGS="-O3 -march=native"
- Clang 11.0: CXXFLAGS="-O3 -march=native" CFLAGS="-O3 -march=native"
- Clang 12.0 LTO: CXXFLAGS="-O3 -march=native -flto" CFLAGS="-O3 -march=native -flto"
- GCC 9.3: CXXFLAGS="-O3 -march=native" CFLAGS="-O3 -march=native"
- GCC 10.3: CXXFLAGS="-O3 -march=native" CFLAGS="-O3 -march=native"
- GCC 11.0.1: CXXFLAGS="-O3 -march=native" CFLAGS="-O3 -march=native"
- AMD AOCC 3.0: CXXFLAGS="-O3 -march=native" CFLAGS="-O3 -march=native"

Processor Details
- Scaling Governor: acpi-cpufreq ondemand (Boost: Enabled)
- CPU Microcode: 0xa001119

Python Details
- Python 3.8.2

Security Details
- itlb_multihit: Not affected
- l1tf: Not affected
- mds: Not affected
- meltdown: Not affected
- spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp
- spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
- spectre_v2: Mitigation of Full AMD retpoline IBPB: conditional IBRS_FW STIBP: always-on RSB filling
- srbds: Not affected
- tsx_async_abort: Not affected

Compiler Details
- GCC 9.3: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-9-HskZEa/gcc-9-9.3.0/debian/tmp-nvptx/usr,hsa --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
- GCC 10.3: --disable-multilib --enable-checking=release
- GCC 11.0.1: --disable-multilib --enable-checking=release
- AMD AOCC 3.0: Optimized build with assertions; Default target: x86_64-unknown-linux-gnu; Host CPU: (unknown)
EPYC 7763 LLVM Clang Compiler Tests: result overview

Tests, in column order:
dav1d: Chimera 1080p
dav1d: Summer Nature 4K
dav1d: Summer Nature 1080p
dav1d: Chimera 1080p 10-bit
aom-av1: Speed 0 Two-Pass - Bosphorus 4K
aom-av1: Speed 4 Two-Pass - Bosphorus 4K
aom-av1: Speed 6 Realtime - Bosphorus 4K
aom-av1: Speed 6 Two-Pass - Bosphorus 4K
aom-av1: Speed 8 Realtime - Bosphorus 4K
aom-av1: Speed 9 Realtime - Bosphorus 4K
aom-av1: Speed 0 Two-Pass - Bosphorus 1080p
aom-av1: Speed 4 Two-Pass - Bosphorus 1080p
aom-av1: Speed 6 Realtime - Bosphorus 1080p
aom-av1: Speed 6 Two-Pass - Bosphorus 1080p
aom-av1: Speed 8 Realtime - Bosphorus 1080p
aom-av1: Speed 9 Realtime - Bosphorus 1080p
svt-av1: Enc Mode 0 - 1080p
svt-av1: Enc Mode 4 - 1080p
svt-av1: Enc Mode 8 - 1080p
svt-hevc: 1 - Bosphorus 1080p
svt-hevc: 7 - Bosphorus 1080p
svt-hevc: 10 - Bosphorus 1080p
svt-vp9: VMAF Optimized - Bosphorus 1080p
svt-vp9: PSNR/SSIM Optimized - Bosphorus 1080p
svt-vp9: Visual Quality Optimized - Bosphorus 1080p
x265: Bosphorus 4K
x265: Bosphorus 1080p
simdjson: Kostya
simdjson: LargeRand
simdjson: PartialTweets
simdjson: DistinctUserID
viennacl: CPU BLAS - sCOPY
viennacl: CPU BLAS - sAXPY
viennacl: CPU BLAS - sDOT
viennacl: CPU BLAS - dCOPY
viennacl: CPU BLAS - dAXPY
viennacl: CPU BLAS - dDOT
viennacl: CPU BLAS - dGEMV-N
viennacl: CPU BLAS - dGEMV-T
viennacl: CPU BLAS - dGEMM-NN
viennacl: CPU BLAS - dGEMM-NT
viennacl: CPU BLAS - dGEMM-TN
viennacl: CPU BLAS - dGEMM-TT
onnx: yolov4 - OpenMP CPU
onnx: bertsquad-10 - OpenMP CPU
onnx: fcn-resnet101-11 - OpenMP CPU
onnx: shufflenet-v2-10 - OpenMP CPU
onnx: super-resolution-10 - OpenMP CPU
graphics-magick: Swirl
graphics-magick: Rotate
graphics-magick: Sharpen
graphics-magick: Enhanced
graphics-magick: Resizing
graphics-magick: Noise-Gaussian
graphics-magick: HWB Color Space
coremark: CoreMark Size 666 - Iterations Per Second
securemark: SecureMark-TLS
compress-lz4: 3 - Compression Speed
compress-lz4: 3 - Decompression Speed
compress-lz4: 9 - Compression Speed
compress-lz4: 9 - Decompression Speed
quantlib:
fftw: Stock - 1D FFT Size 32
fftw: Stock - 1D FFT Size 1024
fftw: Stock - 1D FFT Size 2048
fftw: Stock - 1D FFT Size 4096
fftw: Stock - 2D FFT Size 1024
fftw: Stock - 2D FFT Size 2048
fftw: Stock - 2D FFT Size 4096
fftw: Float + SSE - 1D FFT Size 32
fftw: Float + SSE - 1D FFT Size 1024
fftw: Float + SSE - 1D FFT Size 2048
fftw: Float + SSE - 1D FFT Size 4096
fftw: Float + SSE - 2D FFT Size 1024
fftw: Float + SSE - 2D FFT Size 2048
fftw: Float + SSE - 2D FFT Size 4096
scimark2: Composite
scimark2: Monte Carlo
scimark2: Fast Fourier Transform
scimark2: Sparse Matrix Multiply
scimark2: Dense LU Matrix Factorization
scimark2: Jacobi Successive Over-Relaxation
botan: KASUMI
botan: KASUMI - Decrypt
botan: AES-256
botan: AES-256 - Decrypt
botan: Twofish
botan: Twofish - Decrypt
botan: Blowfish
botan: Blowfish - Decrypt
botan: CAST-256
botan: CAST-256 - Decrypt
botan: ChaCha20Poly1305
botan: ChaCha20Poly1305 - Decrypt
jpegxl: PNG - 5
jpegxl: PNG - 7
jpegxl: PNG - 8
jpegxl: JPEG - 5
jpegxl: JPEG - 7
jpegxl: JPEG - 8
libraw: Post-Processing Benchmark
etcpak: DXT1
etcpak: ETC1
etcpak: ETC2
tscp: AI Chess Performance
liquid-dsp: 1 - 256 - 57
liquid-dsp: 32 - 256 - 57
liquid-dsp: 64 - 256 - 57
liquid-dsp: 128 - 256 - 57
pgbench: 100 - 1 - Read Only
pgbench: 100 - 1 - Read Write
pgbench: 100 - 100 - Read Only
pgbench: 100 - 250 - Read Only
pgbench: 100 - 100 - Read Write
pgbench: 100 - 250 - Read Write
webp: Default
webp: Quality 100
webp: Quality 100, Lossless
webp: Quality 100, Highest Compression
webp: Quality 100, Lossless, Highest Compression
toybrot: TBB
toybrot: OpenMP
toybrot: C++ Tasks
toybrot: C++ Threads
onednn: IP Shapes 1D - f32 - CPU
onednn: IP Shapes 3D - f32 - CPU
onednn: IP Shapes 1D - u8s8f32 - CPU
onednn: IP Shapes 3D - u8s8f32 - CPU
onednn: Convolution Batch Shapes Auto - f32 - CPU
onednn: Deconvolution Batch shapes_1d - f32 - CPU
onednn: Deconvolution Batch shapes_3d - f32 - CPU
onednn: Convolution Batch Shapes Auto - u8s8f32 - CPU
onednn: Deconvolution Batch shapes_1d - u8s8f32 - CPU
onednn: Deconvolution Batch shapes_3d - u8s8f32 - CPU
onednn: Recurrent Neural Network Training - f32 - CPU
onednn: Recurrent Neural Network Inference - f32 - CPU
onednn: Recurrent Neural Network Training - u8s8f32 - CPU
onednn: Recurrent Neural Network Inference - u8s8f32 - CPU
onednn: Matrix Multiply Batch Shapes Transformer - f32 - CPU
onednn: Recurrent Neural Network Training - bf16bf16bf16 - CPU
onednn: Recurrent Neural Network Inference - bf16bf16bf16 - CPU
onednn: Matrix Multiply Batch Shapes Transformer - u8s8f32 - CPU
financebench: Repo OpenMP
financebench: Bonds OpenMP
pgbench: 100 - 1 - Read Only - Average Latency
pgbench: 100 - 1 - Read Write - Average Latency
pgbench: 100 - 100 - Read Only - Average Latency
pgbench: 100 - 250 - Read Only - Average Latency
pgbench: 100 - 100 - Read Write - Average Latency
pgbench: 100 - 250 - Read Write - Average Latency
mrbayes: Primate Phylogeny Analysis
avifenc: 0
avifenc: 2
avifenc: 6
avifenc: 10
avifenc: 6, Lossless
avifenc: 10, Lossless
c-ray: Total Time - 4K, 16 Rays Per Pixel
povray: Trace Time
encode-flac: WAV To FLAC
encode-mp3: WAV To MP3
encode-opus: WAV To Opus Encode
gcrypt:
ngspice: C2670
ngspice: C7552
tachyon: Total Time
webp2: Default
webp2: Quality 75, Compression Effort 7
webp2: Quality 95, Compression Effort 7
webp2: Quality 100, Compression Effort 5
webp2: Quality 100, Lossless Compression
astcenc: Medium
astcenc: Thorough
astcenc: Exhaustive

Results per configuration, in the test order above. Configurations that did not run every test list fewer values.

Clang 12.0: 1198.22 541.56 1244.11 308.32 0.21 4.87 17.22 8.99 33.39 38.11 0.53 7.10 26.85 22.13 88.78 103.17 0.183 11.474 118.067 41.09 345.30 643.58 487.43 488.23 372.49 30.32 74.00 2.75 0.84 4.60 4.62 471 357 434 604 878 819 69.1 626 48.6 65.7 51.9 73.0 333 498 112 9904 4456 1993 712 614 1076 2136 457 605 1785466.283969 265204 52.07 13911.5 48.50 13926.5 2653.8 13333 10805 10467 9862.0 9088.3 7789.9 6744.1 15649 50350 51254 45428 36239 31935 22797 3190.62 675.13 363.85 4280.22 8848.40 1785.50 82.644 84.229 4659.338 4682.455 315.409 321.190 380.054 351.284 132.820 133.048 850.496 843.404 74.27 12.15 0.82 66.66 66.38 28.13 41.78 2718.525 284.642 202.085 1570966 55663000 1564833333 3070633333 3643766667 24310 3281 1069022 1071209 62319 56684 1.331 2.199 19.016 6.309 38.449 6780 7507 7437 7220 1.07701 3.28507 1.07507 0.710124 1.22132 1.44425 2.36797 2.03606 0.491940 0.779776 1302.70 593.972 1307.49 590.182 0.313689 1305.10 597.481 1.17258 33246.837239 51596.867187 0.041 0.305 0.094 0.234 1.607 4.431 89.116 47.884 25.175 9.510 3.361 25.220 5.746 15.870 9.296 7.854 8.256 7.567 236.924 118.870 95.956 16.0468 2.739 109.525 207.008 6.690 374.035 4.0058 6.7647 18.9936

Clang 11.0: 1190.41 543.43 1251.25 184.19 0.21 4.95 17.13 9.14 33.14 37.28 0.53 7.20 26.61 22.00 86.09 100.55 0.181 11.821 117.392 41.01 346.89 652.74 481.05 482.02 373.99 29.94 73.36 2.68 0.81 4.41 4.41 495 412 462 1877 1043 933 51.2 677 83.6 79.3 88.3 84.0 346 471 108 9797 4523 1915 665 613 1068 2034 463 616 1790837.010000 260119 52.35 13840.3 49.01 13927.9 2640.2 13324 10564 10004.2 9438.6 8809.6 7878.5 6823.8 14590 50740 50084 46676 36181 31741 22913 3319.34 674.86 399.16 4590.37 9146.88 1785.42 79.149 80.221 4901.127 4895.558 299.214 302.405 319.234 351.075 128.586 127.740 848.236 840.637 78.41 12.01 0.8 65.58 65.43 27.24 38.71 1872.759 205.065 168.819 1638265 56307000 1578400000 3051366667 3596533333 24943 3312 1069367 1065506 61616 54488 1.336 2.240 18.573 6.243 37.727 6247 7029 6836 6395 1.08011 3.52787 1.07577 0.594729 0.841169 1.45757 2.31859 1.60540 0.489278 0.779101 1276.04 563.200 1277.62 562.970 0.315522 1271.91 563.247 1.15140 33178.498698 51900.434896 0.040 0.302 0.094 0.235 1.626 4.603 88.620 47.894 25.472 9.536 3.429 26.034 5.879 15.599 9.408 7.979 8.250 7.392 240.205 103.826 90.527 16.4099 2.743 109.636 203.634 7.366 392.849 3.9837 6.7674 19.0255

Clang 12.0 LTO: 50.93 13715.0 48.47 13698.7 2657.8 2719.985 284.763 202.101 7085 7367 7143 93.633

GCC 9.3: 1145.50 530.82 1228.63 305.36 0.2 4.78 16.29 9.57 34.56 39.12 0.5 6.69 24.84 21.42 91.97 106.55 0.129 9.325 92.984 38.41 322.42 605.50 463.12 464.57 354.21 28.91 72.14 2.75 0.94 3.93 3.98 1217 813 636 1587 1521 1133 65.0 798 98.5 95.3 100.9 97.9 351 495 116 9419 5183 2129 709 806 1217 1238 547 785 2086609.978010 238935 53.83 13793.4 51.97 13895.3 2338.9 14399 11689 11053 10548 9798.6 8408.5 7007.3 16590 53275 52749 52099 36321 31341 25068 3229.22 668.10 384.03 3765.88 9178.97 2149.15 84.864 84.130 5484.676 5391.990 337.355 339.069 412.846 412.072 127.298 127.343 616.096 611.977 60.20 1082.365 269.673 174.812 1446372 61404000 1721900000 2940466667 3012066667 23895 3298 1057125 1067486 59364 53825 1.397 2.273 19.298 7.053 39.072 5107 5451 5414 5142 1.17486 3.67278 1.17434 0.654010 0.869308 7.19213 2.99759 1.66260 0.599140 0.786762 1357.29 658.660 1358.56 659.191 0.376992 1356.91 657.876 0.717782 42399.807757 76805.580729 0.042 0.303 0.095 0.235 1.688 4.657 89.163 52.217 27.784 10.399 3.659 29.080 6.131 9.158 9.968 8.534 7.011 7.504 232.572 101.535 89.091 15.6837 2.778 118.447 220.944 6.753 388.946 4.8745 7.8537 19.4794

GCC 10.3: 1171.04 536.71 1245.11 316.14 0.21 4.84 17.03 9.10 35.26 39.32 0.52 6.87 26.49 21.64 93.05 107.46 0.169 11.230 109.697 39.03 330.53 615.62 472.61 477.67 364.12 28.60 72.60 2.77 0.9 4.02 4.13 1065.60 1350.0 592.97 1461.2 2158.4 1056.42 56.2 741.4 98.7 94.4 104 98.5 351 505 115 10197 5559 2112 689 807 1039 1208 544 772 2110880.427978 242700 52.87 13906.1 52.36 13806.6 2392.6 12576 11319 10711 10179 9247.3 8134.5 6974.0 16650 52054 53497 52130 35973 32061 23774 3235.94 682.87 388.98 3820.77 9248.89 2038.15 79.115 81.453 5525.710 5529.402 341.847 325.389 422.138 420.853 127.741 127.775 485.019 476.175 58.90 1114.603 281.146 173.226 1467179 62467333 1718000000 2942866667 3005033333 24845 3369 1076357 1089731 58894 53019 1.372 2.225 18.883 7.078 38.548 5181 5524 5610 5383 1.17894 3.61144 1.19747 0.646252 0.870784 7.23686 3.00341 1.64268 0.602155 0.782476 1382.41 659.265 1379.51 658.277 0.377733 1375.71 658.038 0.788192 34979.294271 51770.509114 0.040 0.297 0.093 0.230 1.701 4.731 93.656 51.454 27.386 10.417 3.643 26.911 6.107 9.029 9.570 8.567 7.231 7.469 231.238 103.598 90.432 16.1468 2.918 116.655 215.565 6.934 406.027 4.8699 7.8370 19.4583

GCC 11.0.1: 1180.44 538.28 1249.74 334.35 0.21 4.84 17.37 9.41 35.26 39.71 0.52 6.95 27.01 22.11 94.40 111.27 0.176 11.905 110.702 38.86 329.32 611.73 472.32 478.16 366.39 28.79 71.79 1210 1496 649 1599 2359 1153 63.9 794 100.5 95.0 104 99.3 2161 694 809 1082 1188 550 771 2176407.665929 243861 51.32 13882.2 51.17 13857.4 12765 11044 10675 10205 9238.4 8231.1 6948.2 16590 51706 54710 51391 35718 31662 24888 3182.35 647.82 388.88 3462.66 9263.55 2148.84 57.24 1494250 60886333 1679800000 2989400000 3055766667 23661 3383 1090824 1090160 56369 53102 1.386 2.274 18.314 7.003 37.948 34199.600260 51376.816406 0.042 0.296 0.092 0.230 1.777 4.722 89.432 51.034 27.103 10.291 3.607 27.057 6.149 9.227 8.709 7.473 7.381 233.514 103.005 90.264 15.4989 4.8160 7.6989 19.6189

AMD AOCC 3.0: 1188.43 541.58 1251.91 192.00 0.183 11.690 116.493 40.95 343.85 638.10 476.95 478.62 373.89 30.44 73.51 2.73 0.82 4.33 4.47 531 326 477 1944 1017 1165 55.2 783 84.0 78.8 90.0 84.4 386 459 122 11325 4383 1929 660 617 1057 1866 466 614 1720060.441307 264637 53.77 13562.5 50.32 13561.5 2725.7 13192 10669 10227 9603.2 8902.1 7784.8 6875.3 16146 49685 44412 45521 36100 31013 23111 3298.29 690.94 398.96 4594.27 9021.83 1785.45 82.827 82.949 4891.072 4887.573 304.996 303.806 319.787 355.059 127.768 128.008 845.141 838.089 79.23 11.37 0.81 65.57 65.68 27.29 41.64 2654.721 211.733 178.852 1697846 57411333 1609633333 3100400000 3606466667 1.351 2.262 19.126 6.578 38.338 6945 7477 7189 7144 1.03899 3.41583 1.04484 0.554231 0.833921 1.37059 2.28755 1.59597 0.459724 0.773233 1267.18 544.099 1259.59 544.306 0.301885 1268.08 544.600 1.17044 33146.028646 51885.519531 86.742 48.127 25.598 9.725 3.543 25.783 5.948 15.649 9.494 9.280 8.142 240.405 103.929 91.986 16.0581 2.816 109.811 205.034 7.403 382.985 3.8811 6.6409 18.9127
dav1d 0.8.2
Video Input: Chimera 1080p
FPS, More Is Better
- AMD AOCC 3.0: 1188.43 (SE +/- 0.97, N = 3; MIN: 703.73 / MAX: 1484.94; -lm)
- Clang 11.0: 1190.41 (SE +/- 6.69, N = 3; MIN: 685.16 / MAX: 1496.36; -lm)
- Clang 12.0: 1198.22 (SE +/- 2.95, N = 3; MIN: 700.24 / MAX: 1494.16)
- GCC 10.3: 1171.04 (SE +/- 3.74, N = 3; MIN: 683.28 / MAX: 1473.51; -lm)
- GCC 11.0.1: 1180.44 (SE +/- 1.75, N = 3; MIN: 680.31 / MAX: 1485.74; -lm)
- GCC 9.3: 1145.50 (SE +/- 5.12, N = 3; MIN: 664.19 / MAX: 1441.54; -lm)
1. (CC) gcc options: -O3 -march=native -pthread
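Each result is reported as a mean with a standard error (SE) over N runs. One rough way to judge whether two compilers really differ on a test is to compare the gap between their means with the combined standard error. The sketch below does this for the Clang 12.0 and GCC 9.3 figures from the dav1d Chimera 1080p result. The function name is hypothetical, and the root-sum-of-squares combination assumes the two sets of runs are independent. With N = 3, treat the ratio as a rough indication, not a formal significance test.

```python
import math


def difference_in_standard_errors(mean_a, se_a, mean_b, se_b):
    """Return (difference, combined SE, difference / combined SE).

    Assumes the two results come from independent runs, so the
    standard errors combine as a root sum of squares.
    """
    diff = mean_a - mean_b
    combined_se = math.sqrt(se_a ** 2 + se_b ** 2)
    return diff, combined_se, diff / combined_se


# dav1d 0.8.2, Chimera 1080p (FPS): Clang 12.0 vs GCC 9.3, values from the result above.
diff, combined_se, ratio = difference_in_standard_errors(1198.22, 2.95, 1145.50, 5.12)
print(f"difference = {diff:.2f} FPS, combined SE = {combined_se:.2f}, ratio = {ratio:.1f}")
```

A ratio well above 2 or 3 suggests the gap is larger than run-to-run noise. A ratio near 1 means the two results are hard to tell apart.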
dav1d 0.8.2
Video Input: Summer Nature 4K
FPS, More Is Better
- AMD AOCC 3.0: 541.58 (SE +/- 1.13, N = 3; MIN: 259.4 / MAX: 585.8; -lm)
- Clang 11.0: 543.43 (SE +/- 1.43, N = 3; MIN: 256.75 / MAX: 593.99; -lm)
- Clang 12.0: 541.56 (SE +/- 1.79, N = 3; MIN: 252.01 / MAX: 587.53)
- GCC 10.3: 536.71 (SE +/- 0.67, N = 3; MIN: 256.44 / MAX: 577.82; -lm)
- GCC 11.0.1: 538.28 (SE +/- 2.51, N = 3; MIN: 251.6 / MAX: 584.38; -lm)
- GCC 9.3: 530.82 (SE +/- 1.35, N = 3; MIN: 248.84 / MAX: 574.28; -lm)
1. (CC) gcc options: -O3 -march=native -pthread

dav1d 0.8.2
Video Input: Summer Nature 1080p
FPS, More Is Better
- AMD AOCC 3.0: 1251.91 (SE +/- 4.95, N = 3; MIN: 543.89 / MAX: 1394.16; -lm)
- Clang 11.0: 1251.25 (SE +/- 2.13, N = 3; MIN: 556.46 / MAX: 1394.06; -lm)
- Clang 12.0: 1244.11 (SE +/- 7.87, N = 3; MIN: 549.81 / MAX: 1390.03)
- GCC 10.3: 1245.11 (SE +/- 8.15, N = 3; MIN: 539.07 / MAX: 1398.87; -lm)
- GCC 11.0.1: 1249.74 (SE +/- 1.96, N = 3; MIN: 559.74 / MAX: 1387.11; -lm)
- GCC 9.3: 1228.63 (SE +/- 2.25, N = 3; MIN: 555.28 / MAX: 1361.68; -lm)
1. (CC) gcc options: -O3 -march=native -pthread

dav1d 0.8.2
Video Input: Chimera 1080p 10-bit
FPS, More Is Better
- AMD AOCC 3.0: 192.00 (SE +/- 0.39, N = 3; MIN: 118.57 / MAX: 324.98; -lm)
- Clang 11.0: 184.19 (SE +/- 0.48, N = 3; MIN: 114.52 / MAX: 310.5; -lm)
- Clang 12.0: 308.32 (SE +/- 0.93, N = 3; MIN: 220.53 / MAX: 490.51)
- GCC 10.3: 316.14 (SE +/- 0.21, N = 3; MIN: 218.19 / MAX: 515.85; -lm)
- GCC 11.0.1: 334.35 (SE +/- 1.11, N = 3; MIN: 234.24 / MAX: 544.9; -lm)
- GCC 9.3: 305.36 (SE +/- 0.71, N = 3; MIN: 210.86 / MAX: 493.21; -lm)
1. (CC) gcc options: -O3 -march=native -pthread
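The Chimera 1080p 10-bit result shows the widest spread among the dav1d tests. As a minimal sketch, the gaps can be expressed as percentages relative to one configuration. Clang 12.0 is chosen as the baseline here purely for illustration. The helper name is hypothetical and the FPS values are copied from the result above.

```python
# dav1d 0.8.2, Chimera 1080p 10-bit, FPS (more is better); values from the result above.
results_fps = {
    "AMD AOCC 3.0": 192.00,
    "Clang 11.0": 184.19,
    "Clang 12.0": 308.32,
    "GCC 10.3": 316.14,
    "GCC 11.0.1": 334.35,
    "GCC 9.3": 305.36,
}


def percent_vs_baseline(results, baseline):
    """Return each result as a percentage difference from the baseline entry."""
    base = results[baseline]
    return {name: (value / base - 1.0) * 100.0 for name, value in results.items()}


relative = percent_vs_baseline(results_fps, "Clang 12.0")
# Print fastest first.
for name, pct in sorted(relative.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:14s} {pct:+6.1f}%")
```

For a "fewer is better" metric such as encode time, the sign would need to be read the other way round.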
AOM AV1 3.0
Encoder Mode: Speed 0 Two-Pass - Input: Bosphorus 4K
Frames Per Second, More Is Better
- Clang 11.0: 0.21 (SE +/- 0.00, N = 3)
- Clang 12.0: 0.21 (SE +/- 0.00, N = 3)
- GCC 10.3: 0.21 (SE +/- 0.00, N = 3)
- GCC 11.0.1: 0.21 (SE +/- 0.00, N = 3)
- GCC 9.3: 0.20 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -U_FORTIFY_SOURCE -lm -lpthread

AOM AV1 3.0
Encoder Mode: Speed 4 Two-Pass - Input: Bosphorus 4K
Frames Per Second, More Is Better
- Clang 11.0: 4.95 (SE +/- 0.07, N = 3)
- Clang 12.0: 4.87 (SE +/- 0.04, N = 3)
- GCC 10.3: 4.84 (SE +/- 0.04, N = 3)
- GCC 11.0.1: 4.84 (SE +/- 0.06, N = 3)
- GCC 9.3: 4.78 (SE +/- 0.01, N = 3)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -U_FORTIFY_SOURCE -lm -lpthread

AOM AV1 3.0
Encoder Mode: Speed 6 Realtime - Input: Bosphorus 4K
Frames Per Second, More Is Better
- Clang 11.0: 17.13 (SE +/- 0.11, N = 3)
- Clang 12.0: 17.22 (SE +/- 0.11, N = 3)
- GCC 10.3: 17.03 (SE +/- 0.05, N = 3)
- GCC 11.0.1: 17.37 (SE +/- 0.08, N = 3)
- GCC 9.3: 16.29 (SE +/- 0.05, N = 3)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -U_FORTIFY_SOURCE -lm -lpthread

AOM AV1 3.0
Encoder Mode: Speed 6 Two-Pass - Input: Bosphorus 4K
Frames Per Second, More Is Better
- Clang 11.0: 9.14 (SE +/- 0.03, N = 3)
- Clang 12.0: 8.99 (SE +/- 0.10, N = 3)
- GCC 10.3: 9.10 (SE +/- 0.04, N = 3)
- GCC 11.0.1: 9.41 (SE +/- 0.01, N = 3)
- GCC 9.3: 9.57 (SE +/- 0.11, N = 6)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -U_FORTIFY_SOURCE -lm -lpthread

AOM AV1 3.0
Encoder Mode: Speed 8 Realtime - Input: Bosphorus 4K
Frames Per Second, More Is Better
- Clang 11.0: 33.14 (SE +/- 0.22, N = 3)
- Clang 12.0: 33.39 (SE +/- 0.48, N = 3)
- GCC 10.3: 35.26 (SE +/- 0.19, N = 3)
- GCC 11.0.1: 35.26 (SE +/- 0.47, N = 3)
- GCC 9.3: 34.56 (SE +/- 0.12, N = 3)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -U_FORTIFY_SOURCE -lm -lpthread

AOM AV1 3.0
Encoder Mode: Speed 9 Realtime - Input: Bosphorus 4K
Frames Per Second, More Is Better
- Clang 11.0: 37.28 (SE +/- 0.31, N = 3)
- Clang 12.0: 38.11 (SE +/- 0.43, N = 3)
- GCC 10.3: 39.32 (SE +/- 0.19, N = 3)
- GCC 11.0.1: 39.71 (SE +/- 0.29, N = 3)
- GCC 9.3: 39.12 (SE +/- 0.38, N = 3)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -U_FORTIFY_SOURCE -lm -lpthread

AOM AV1 3.0
Encoder Mode: Speed 0 Two-Pass - Input: Bosphorus 1080p
Frames Per Second, More Is Better
- Clang 11.0: 0.53 (SE +/- 0.00, N = 3)
- Clang 12.0: 0.53 (SE +/- 0.00, N = 3)
- GCC 10.3: 0.52 (SE +/- 0.00, N = 3)
- GCC 11.0.1: 0.52 (SE +/- 0.00, N = 3)
- GCC 9.3: 0.50 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -U_FORTIFY_SOURCE -lm -lpthread

AOM AV1 3.0
Encoder Mode: Speed 4 Two-Pass - Input: Bosphorus 1080p
Frames Per Second, More Is Better
- Clang 11.0: 7.20 (SE +/- 0.01, N = 3)
- Clang 12.0: 7.10 (SE +/- 0.04, N = 3)
- GCC 10.3: 6.87 (SE +/- 0.00, N = 3)
- GCC 11.0.1: 6.95 (SE +/- 0.01, N = 3)
- GCC 9.3: 6.69 (SE +/- 0.03, N = 3)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -U_FORTIFY_SOURCE -lm -lpthread

AOM AV1 3.0
Encoder Mode: Speed 6 Realtime - Input: Bosphorus 1080p
Frames Per Second, More Is Better
- Clang 11.0: 26.61 (SE +/- 0.13, N = 3)
- Clang 12.0: 26.85 (SE +/- 0.27, N = 3)
- GCC 10.3: 26.49 (SE +/- 0.25, N = 3)
- GCC 11.0.1: 27.01 (SE +/- 0.28, N = 3)
- GCC 9.3: 24.84 (SE +/- 0.13, N = 3)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -U_FORTIFY_SOURCE -lm -lpthread

AOM AV1 3.0
Encoder Mode: Speed 6 Two-Pass - Input: Bosphorus 1080p
Frames Per Second, More Is Better
- Clang 11.0: 22.00 (SE +/- 0.15, N = 3)
- Clang 12.0: 22.13 (SE +/- 0.05, N = 3)
- GCC 10.3: 21.64 (SE +/- 0.09, N = 3)
- GCC 11.0.1: 22.11 (SE +/- 0.06, N = 3)
- GCC 9.3: 21.42 (SE +/- 0.11, N = 3)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -U_FORTIFY_SOURCE -lm -lpthread

AOM AV1 3.0
Encoder Mode: Speed 8 Realtime - Input: Bosphorus 1080p
Frames Per Second, More Is Better
- Clang 11.0: 86.09 (SE +/- 0.51, N = 3)
- Clang 12.0: 88.78 (SE +/- 1.07, N = 3)
- GCC 10.3: 93.05 (SE +/- 0.65, N = 3)
- GCC 11.0.1: 94.40 (SE +/- 0.47, N = 3)
- GCC 9.3: 91.97 (SE +/- 0.89, N = 3)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -U_FORTIFY_SOURCE -lm -lpthread

AOM AV1 3.0
Encoder Mode: Speed 9 Realtime - Input: Bosphorus 1080p
Frames Per Second, More Is Better
- Clang 11.0: 100.55 (SE +/- 0.53, N = 3)
- Clang 12.0: 103.17 (SE +/- 0.31, N = 3)
- GCC 10.3: 107.46 (SE +/- 1.76, N = 3)
- GCC 11.0.1: 111.27 (SE +/- 1.15, N = 8)
- GCC 9.3: 106.55 (SE +/- 1.10, N = 8)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -U_FORTIFY_SOURCE -lm -lpthread
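The AOM AV1 results range from well under 1 FPS (Speed 0) to over 100 FPS (Speed 9), so an arithmetic mean across them would be dominated by the fastest presets. A geometric mean is one common way to summarize results on such different scales. The sketch below applies it to the six Bosphorus 1080p results for Clang 12.0 and GCC 11.0.1, which is an arbitrary pair chosen for illustration. The variable names are hypothetical and the values are copied from the results above.

```python
from statistics import geometric_mean  # available since Python 3.8

# AOM AV1 3.0, Bosphorus 1080p, frames per second, in the order:
# Speed 0 Two-Pass, Speed 4 Two-Pass, Speed 6 Realtime,
# Speed 6 Two-Pass, Speed 8 Realtime, Speed 9 Realtime.
aom_1080p_fps = {
    "Clang 12.0": [0.53, 7.10, 26.85, 22.13, 88.78, 103.17],
    "GCC 11.0.1": [0.52, 6.95, 27.01, 22.11, 94.40, 111.27],
}

geo_means = {name: geometric_mean(values) for name, values in aom_1080p_fps.items()}
ratio = geo_means["GCC 11.0.1"] / geo_means["Clang 12.0"]

for name, value in geo_means.items():
    print(f"{name}: geometric mean = {value:.2f} FPS")
print(f"GCC 11.0.1 relative to Clang 12.0: {ratio:.3f}x")
```

Because every test contributes its ratio equally, the slow two-pass presets count as much as the realtime ones in this summary.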
SVT-AV1 0.8
Encoder Mode: Enc Mode 0 - Input: 1080p
Frames Per Second, More Is Better
- AMD AOCC 3.0: 0.183 (SE +/- 0.000, N = 3)
- Clang 11.0: 0.181 (SE +/- 0.000, N = 3)
- Clang 12.0: 0.183 (SE +/- 0.000, N = 3)
- GCC 10.3: 0.169 (SE +/- 0.000, N = 3)
- GCC 11.0.1: 0.176 (SE +/- 0.000, N = 3)
- GCC 9.3: 0.129 (SE +/- 0.000, N = 3)
1. (CXX) g++ options: -O3 -fcommon -fPIE -fPIC -pie

SVT-AV1 0.8
Encoder Mode: Enc Mode 4 - Input: 1080p
Frames Per Second, More Is Better
- AMD AOCC 3.0: 11.690 (SE +/- 0.189, N = 3)
- Clang 11.0: 11.821 (SE +/- 0.164, N = 4)
- Clang 12.0: 11.474 (SE +/- 0.170, N = 3)
- GCC 10.3: 11.230 (SE +/- 0.111, N = 9)
- GCC 11.0.1: 11.905 (SE +/- 0.139, N = 3)
- GCC 9.3: 9.325 (SE +/- 0.086, N = 3)
1. (CXX) g++ options: -O3 -fcommon -fPIE -fPIC -pie

SVT-AV1 0.8
Encoder Mode: Enc Mode 8 - Input: 1080p
Frames Per Second, More Is Better
- AMD AOCC 3.0: 116.49 (SE +/- 0.33, N = 3)
- Clang 11.0: 117.39 (SE +/- 0.46, N = 3)
- Clang 12.0: 118.07 (SE +/- 0.10, N = 3)
- GCC 10.3: 109.70 (SE +/- 1.05, N = 3)
- GCC 11.0.1: 110.70 (SE +/- 0.18, N = 3)
- GCC 9.3: 92.98 (SE +/- 0.83, N = 3)
1. (CXX) g++ options: -O3 -fcommon -fPIE -fPIC -pie
SVT-HEVC 1.5.0
Tuning: 1 - Input: Bosphorus 1080p
Frames Per Second, More Is Better
- AMD AOCC 3.0: 40.95 (SE +/- 0.08, N = 3)
- Clang 11.0: 41.01 (SE +/- 0.09, N = 3)
- Clang 12.0: 41.09 (SE +/- 0.17, N = 3)
- GCC 10.3: 39.03 (SE +/- 0.05, N = 3)
- GCC 11.0.1: 38.86 (SE +/- 0.17, N = 3)
- GCC 9.3: 38.41 (SE +/- 0.18, N = 3)
1. (CC) gcc options: -O3 -march=native -fPIE -fPIC -O2 -pie -rdynamic -lpthread -lrt

SVT-HEVC 1.5.0
Tuning: 7 - Input: Bosphorus 1080p
Frames Per Second, More Is Better
- AMD AOCC 3.0: 343.85 (SE +/- 1.09, N = 3)
- Clang 11.0: 346.89 (SE +/- 3.43, N = 3)
- Clang 12.0: 345.30 (SE +/- 1.56, N = 3)
- GCC 10.3: 330.53 (SE +/- 1.54, N = 3)
- GCC 11.0.1: 329.32 (SE +/- 1.51, N = 3)
- GCC 9.3: 322.42 (SE +/- 1.20, N = 3)
1. (CC) gcc options: -O3 -march=native -fPIE -fPIC -O2 -pie -rdynamic -lpthread -lrt

SVT-HEVC 1.5.0
Tuning: 10 - Input: Bosphorus 1080p
Frames Per Second, More Is Better
- AMD AOCC 3.0: 638.10 (SE +/- 3.03, N = 3)
- Clang 11.0: 652.74 (SE +/- 5.55, N = 3)
- Clang 12.0: 643.58 (SE +/- 3.01, N = 3)
- GCC 10.3: 615.62 (SE +/- 2.42, N = 3)
- GCC 11.0.1: 611.73 (SE +/- 5.75, N = 3)
- GCC 9.3: 605.50 (SE +/- 3.83, N = 3)
1. (CC) gcc options: -O3 -march=native -fPIE -fPIC -O2 -pie -rdynamic -lpthread -lrt
SVT-VP9 0.3
Tuning: VMAF Optimized - Input: Bosphorus 1080p
Frames Per Second, More Is Better
- AMD AOCC 3.0: 476.95 (SE +/- 2.67, N = 3)
- Clang 11.0: 481.05 (SE +/- 0.23, N = 3)
- Clang 12.0: 487.43 (SE +/- 1.37, N = 3)
- GCC 10.3: 472.61 (SE +/- 0.24, N = 3)
- GCC 11.0.1: 472.32 (SE +/- 1.15, N = 3)
- GCC 9.3: 463.12 (SE +/- 0.82, N = 3)
1. (CC) gcc options: -O3 -fcommon -march=native -fPIE -fPIC -fvisibility=hidden -pie -rdynamic -lpthread -lrt -lm

SVT-VP9 0.3
Tuning: PSNR/SSIM Optimized - Input: Bosphorus 1080p
Frames Per Second, More Is Better
- AMD AOCC 3.0: 478.62 (SE +/- 1.94, N = 3)
- Clang 11.0: 482.02 (SE +/- 1.76, N = 3)
- Clang 12.0: 488.23 (SE +/- 0.73, N = 3)
- GCC 10.3: 477.67 (SE +/- 2.08, N = 3)
- GCC 11.0.1: 478.16 (SE +/- 1.13, N = 3)
- GCC 9.3: 464.57 (SE +/- 0.32, N = 3)
1. (CC) gcc options: -O3 -fcommon -march=native -fPIE -fPIC -fvisibility=hidden -pie -rdynamic -lpthread -lrt -lm

SVT-VP9 0.3
Tuning: Visual Quality Optimized - Input: Bosphorus 1080p
Frames Per Second, More Is Better
- AMD AOCC 3.0: 373.89 (SE +/- 2.72, N = 3)
- Clang 11.0: 373.99 (SE +/- 1.91, N = 3)
- Clang 12.0: 372.49 (SE +/- 1.11, N = 3)
- GCC 10.3: 364.12 (SE +/- 0.47, N = 3)
- GCC 11.0.1: 366.39 (SE +/- 0.70, N = 3)
- GCC 9.3: 354.21 (SE +/- 3.83, N = 3)
1. (CC) gcc options: -O3 -fcommon -march=native -fPIE -fPIC -fvisibility=hidden -pie -rdynamic -lpthread -lrt -lm
x265 3.4
Video Input: Bosphorus 4K
Frames Per Second, More Is Better
- AMD AOCC 3.0: 30.44 (SE +/- 0.13, N = 3)
- Clang 11.0: 29.94 (SE +/- 0.25, N = 3)
- Clang 12.0: 30.32 (SE +/- 0.23, N = 3)
- GCC 10.3: 28.60 (SE +/- 0.09, N = 3)
- GCC 11.0.1: 28.79 (SE +/- 0.07, N = 3)
- GCC 9.3: 28.91 (SE +/- 0.09, N = 3)
1. (CXX) g++ options: -O3 -march=native -rdynamic -lpthread -lrt -ldl -lnuma

x265 3.4
Video Input: Bosphorus 1080p
Frames Per Second, More Is Better
- AMD AOCC 3.0: 73.51 (SE +/- 0.63, N = 3)
- Clang 11.0: 73.36 (SE +/- 0.49, N = 3)
- Clang 12.0: 74.00 (SE +/- 0.49, N = 3)
- GCC 10.3: 72.60 (SE +/- 0.32, N = 3)
- GCC 11.0.1: 71.79 (SE +/- 0.56, N = 3)
- GCC 9.3: 72.14 (SE +/- 0.26, N = 3)
1. (CXX) g++ options: -O3 -march=native -rdynamic -lpthread -lrt -ldl -lnuma
simdjson 0.8.2
Throughput Test: Kostya
GB/s, More Is Better
- AMD AOCC 3.0: 2.73 (SE +/- 0.00, N = 3)
- Clang 11.0: 2.68 (SE +/- 0.00, N = 3)
- Clang 12.0: 2.75 (SE +/- 0.01, N = 3)
- GCC 10.3: 2.77 (SE +/- 0.00, N = 3)
- GCC 9.3: 2.75 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -O3 -march=native -pthread

simdjson 0.8.2
Throughput Test: LargeRandom
GB/s, More Is Better
- AMD AOCC 3.0: 0.82 (SE +/- 0.00, N = 3)
- Clang 11.0: 0.81 (SE +/- 0.00, N = 3)
- Clang 12.0: 0.84 (SE +/- 0.00, N = 3)
- GCC 10.3: 0.90 (SE +/- 0.00, N = 3)
- GCC 9.3: 0.94 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -O3 -march=native -pthread

simdjson 0.8.2
Throughput Test: PartialTweets
GB/s, More Is Better
- AMD AOCC 3.0: 4.33 (SE +/- 0.01, N = 3)
- Clang 11.0: 4.41 (SE +/- 0.01, N = 3)
- Clang 12.0: 4.60 (SE +/- 0.01, N = 3)
- GCC 10.3: 4.02 (SE +/- 0.00, N = 3)
- GCC 9.3: 3.93 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -O3 -march=native -pthread

simdjson 0.8.2
Throughput Test: DistinctUserID
GB/s, More Is Better
- AMD AOCC 3.0: 4.47 (SE +/- 0.00, N = 3)
- Clang 11.0: 4.41 (SE +/- 0.00, N = 3)
- Clang 12.0: 4.62 (SE +/- 0.00, N = 3)
- GCC 10.3: 4.13 (SE +/- 0.01, N = 3)
- GCC 9.3: 3.98 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -O3 -march=native -pthread
ViennaCL 1.7.1
Test: CPU BLAS - sCOPY
GB/s, More Is Better
- AMD AOCC 3.0: 531.00 (SE +/- 32.29, N = 12; -fopenmp=libomp)
- Clang 11.0: 495.00 (SE +/- 36.50, N = 15; -fopenmp=libomp)
- Clang 12.0: 471.00 (SE +/- 15.30, N = 12; -fopenmp=libomp)
- GCC 10.3: 1065.60 (SE +/- 101.07, N = 12; -fopenmp)
- GCC 11.0.1: 1210.00 (SE +/- 25.34, N = 15; -fopenmp)
- GCC 9.3: 1217.00 (SE +/- 25.85, N = 15; -fopenmp)
1. (CXX) g++ options: -O3 -march=native -rdynamic -lOpenCL
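The ViennaCL results carry much larger standard errors and run counts (N = 12 to 15) than the video tests, which suggests noisy runs. One quick check is the relative standard error, SE divided by the mean. The sketch below computes it for the sCOPY result. The 5% cut-off and the names are arbitrary assumptions for illustration, not something the result file defines.

```python
# ViennaCL 1.7.1, CPU BLAS - sCOPY: (mean GB/s, SE) per configuration, from the result above.
scopy = {
    "AMD AOCC 3.0": (531.00, 32.29),
    "Clang 11.0": (495.00, 36.50),
    "Clang 12.0": (471.00, 15.30),
    "GCC 10.3": (1065.60, 101.07),
    "GCC 11.0.1": (1210.00, 25.34),
    "GCC 9.3": (1217.00, 25.85),
}

NOISE_THRESHOLD_PCT = 5.0  # hypothetical cut-off, chosen only for this example

relative_se_pct = {name: se / mean * 100.0 for name, (mean, se) in scopy.items()}
noisy = sorted(name for name, pct in relative_se_pct.items() if pct > NOISE_THRESHOLD_PCT)

for name, pct in relative_se_pct.items():
    flag = "  <- noisy" if name in noisy else ""
    print(f"{name:14s} relative SE = {pct:4.1f}%{flag}")
```

Configurations flagged this way are the ones where small differences in the mean should be read with the most caution.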
ViennaCL 1.7.1
Test: CPU BLAS - sAXPY
GB/s, More Is Better
- AMD AOCC 3.0: 326.0 (SE +/- 26.90, N = 12; -fopenmp=libomp)
- Clang 11.0: 412.0 (SE +/- 34.43, N = 15; -fopenmp=libomp)
- Clang 12.0: 357.0 (SE +/- 15.69, N = 12; -fopenmp=libomp)
- GCC 10.3: 1350.0 (SE +/- 132.58, N = 12; -fopenmp)
- GCC 11.0.1: 1496.0 (SE +/- 62.40, N = 15; -fopenmp)
- GCC 9.3: 813.0 (SE +/- 2.85, N = 15; -fopenmp)
1. (CXX) g++ options: -O3 -march=native -rdynamic -lOpenCL

ViennaCL 1.7.1
Test: CPU BLAS - sDOT
GB/s, More Is Better
- AMD AOCC 3.0: 477.00 (SE +/- 37.59, N = 12; -fopenmp=libomp)
- Clang 11.0: 462.00 (SE +/- 38.96, N = 15; -fopenmp=libomp)
- Clang 12.0: 434.00 (SE +/- 35.24, N = 12; -fopenmp=libomp)
- GCC 10.3: 592.97 (SE +/- 53.43, N = 12; -fopenmp)
- GCC 11.0.1: 649.00 (SE +/- 2.60, N = 15; -fopenmp)
- GCC 9.3: 636.00 (SE +/- 0.80, N = 15; -fopenmp)
1. (CXX) g++ options: -O3 -march=native -rdynamic -lOpenCL

ViennaCL 1.7.1
Test: CPU BLAS - dCOPY
GB/s, More Is Better
- AMD AOCC 3.0: 1944.0 (SE +/- 9.88, N = 12; -fopenmp=libomp)
- Clang 11.0: 1877.0 (SE +/- 8.32, N = 15; -fopenmp=libomp)
- Clang 12.0: 604.0 (SE +/- 15.32, N = 11; -fopenmp=libomp)
- GCC 10.3: 1461.2 (SE +/- 131.59, N = 12; -fopenmp)
- GCC 11.0.1: 1599.0 (SE +/- 2.67, N = 15; -fopenmp)
- GCC 9.3: 1587.0 (SE +/- 9.19, N = 15; -fopenmp)
1. (CXX) g++ options: -O3 -march=native -rdynamic -lOpenCL

ViennaCL 1.7.1
Test: CPU BLAS - dAXPY
GB/s, More Is Better
- AMD AOCC 3.0: 1017.0 (SE +/- 3.59, N = 12; -fopenmp=libomp)
- Clang 11.0: 1043.0 (SE +/- 1.59, N = 15; -fopenmp=libomp)
- Clang 12.0: 878.0 (SE +/- 20.06, N = 12; -fopenmp=libomp)
- GCC 10.3: 2158.4 (SE +/- 194.35, N = 12; -fopenmp)
- GCC 11.0.1: 2359.0 (SE +/- 2.74, N = 15; -fopenmp)
- GCC 9.3: 1521.0 (SE +/- 2.06, N = 15; -fopenmp)
1. (CXX) g++ options: -O3 -march=native -rdynamic -lOpenCL

ViennaCL 1.7.1
Test: CPU BLAS - dDOT
GB/s, More Is Better
- AMD AOCC 3.0: 1165.00 (SE +/- 2.61, N = 12; -fopenmp=libomp)
- Clang 11.0: 933.00 (SE +/- 1.49, N = 15; -fopenmp=libomp)
- Clang 12.0: 819.00 (SE +/- 17.06, N = 12; -fopenmp=libomp)
- GCC 10.3: 1056.42 (SE +/- 95.41, N = 12; -fopenmp)
- GCC 11.0.1: 1153.00 (SE +/- 1.87, N = 15; -fopenmp)
- GCC 9.3: 1133.00 (SE +/- 1.59, N = 15; -fopenmp)
1. (CXX) g++ options: -O3 -march=native -rdynamic -lOpenCL

ViennaCL 1.7.1
Test: CPU BLAS - dGEMV-N
GB/s, More Is Better
- AMD AOCC 3.0: 55.2 (SE +/- 3.44, N = 12; -fopenmp=libomp)
- Clang 11.0: 51.2 (SE +/- 3.65, N = 15; -fopenmp=libomp)
- Clang 12.0: 69.1 (SE +/- 2.22, N = 12; -fopenmp=libomp)
- GCC 10.3: 56.2 (SE +/- 5.30, N = 12; -fopenmp)
- GCC 11.0.1: 63.9 (SE +/- 3.83, N = 15; -fopenmp)
- GCC 9.3: 65.0 (SE +/- 4.17, N = 15; -fopenmp)
1. (CXX) g++ options: -O3 -march=native -rdynamic -lOpenCL

ViennaCL 1.7.1
Test: CPU BLAS - dGEMV-T
GB/s, More Is Better
- AMD AOCC 3.0: 783.0 (SE +/- 1.94, N = 12; -fopenmp=libomp)
- Clang 11.0: 677.0 (SE +/- 1.41, N = 14; -fopenmp=libomp)
- Clang 12.0: 626.0 (SE +/- 4.04, N = 12; -fopenmp=libomp)
- GCC 10.3: 741.4 (SE +/- 66.49, N = 12; -fopenmp)
- GCC 11.0.1: 794.0 (SE +/- 2.88, N = 15; -fopenmp)
- GCC 9.3: 798.0 (SE +/- 2.10, N = 14; -fopenmp)
1. (CXX) g++ options: -O3 -march=native -rdynamic -lOpenCL

ViennaCL 1.7.1
Test: CPU BLAS - dGEMM-NN
GFLOPs/s, More Is Better
- AMD AOCC 3.0: 84.0 (SE +/- 0.04, N = 12; -fopenmp=libomp)
- Clang 11.0: 83.6 (SE +/- 0.06, N = 15; -fopenmp=libomp)
- Clang 12.0: 48.6 (SE +/- 0.05, N = 12; -fopenmp=libomp)
- GCC 10.3: 98.7 (SE +/- 1.05, N = 12; -fopenmp)
- GCC 11.0.1: 100.5 (SE +/- 0.29, N = 15; -fopenmp)
- GCC 9.3: 98.5 (SE +/- 0.16, N = 15; -fopenmp)
1. (CXX) g++ options: -O3 -march=native -rdynamic -lOpenCL
ViennaCL Test: CPU BLAS - dGEMM-NT OpenBenchmarking.org GFLOPs/s, More Is Better ViennaCL 1.7.1 Test: CPU BLAS - dGEMM-NT AMD AOCC 3.0 Clang 11.0 Clang 12.0 GCC 10.3 GCC 11.0.1 GCC 9.3 20 40 60 80 100 SE +/- 0.07, N = 12 SE +/- 0.03, N = 15 SE +/- 0.56, N = 12 SE +/- 0.59, N = 12 SE +/- 0.08, N = 15 SE +/- 0.07, N = 15 78.8 79.3 65.7 94.4 95.0 95.3 -fopenmp=libomp -fopenmp=libomp -fopenmp=libomp -fopenmp -fopenmp -fopenmp 1. (CXX) g++ options: -O3 -march=native -rdynamic -lOpenCL
ViennaCL 1.7.1, Test: CPU BLAS - dGEMM-TN (GFLOPs/s, More Is Better). AMD AOCC 3.0: 90.0 [-fopenmp=libomp]; Clang 11.0: 88.3 [-fopenmp=libomp]; Clang 12.0: 51.9 [-fopenmp=libomp]; GCC 10.3: 104.0 [-fopenmp]; GCC 11.0.1: 104.0 [-fopenmp]; GCC 9.3: 100.9 [-fopenmp]. SE as exported: SE +/- 0.05, N = 12; SE +/- 0.02, N = 15; SE +/- 0.09, N = 12; SE +/- 0.62, N = 12; SE +/- 0.08, N = 15. 1. (CXX) g++ options: -O3 -march=native -rdynamic -lOpenCL
ViennaCL 1.7.1, Test: CPU BLAS - dGEMM-TT (GFLOPs/s, More Is Better). AMD AOCC 3.0: 84.4 (SE +/- 0.08, N = 12) [-fopenmp=libomp]; Clang 11.0: 84.0 (SE +/- 0.02, N = 14) [-fopenmp=libomp]; Clang 12.0: 73.0 (SE +/- 0.07, N = 12) [-fopenmp=libomp]; GCC 10.3: 98.5 (SE +/- 0.60, N = 12) [-fopenmp]; GCC 11.0.1: 99.3 (SE +/- 0.05, N = 15) [-fopenmp]; GCC 9.3: 97.9 (SE +/- 0.05, N = 15) [-fopenmp]. 1. (CXX) g++ options: -O3 -march=native -rdynamic -lOpenCL
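Several ViennaCL results carry large standard errors, so small gaps between compilers deserve a sanity check before being read as real differences. The Python sketch below divides the gap between two means by their combined standard error. It assumes the runs are independent and roughly normally distributed; the function name `gap_in_se` and the threshold of 2 are illustrative choices, not anything the Phoronix Test Suite reports.

```python
import math


def gap_in_se(mean_a, se_a, mean_b, se_b):
    """Gap between two means, in units of their combined standard error."""
    combined_se = math.sqrt(se_a ** 2 + se_b ** 2)
    return abs(mean_a - mean_b) / combined_se


# sCOPY figures (GB/s) copied from the ViennaCL results above.
gcc11_vs_gcc9 = gap_in_se(1210.00, 25.34, 1217.00, 25.85)
clang12_vs_gcc9 = gap_in_se(471.00, 15.30, 1217.00, 25.85)

THRESHOLD = 2.0  # assumed rule of thumb, not a formal significance test

print(f"GCC 11.0.1 vs GCC 9.3: {gcc11_vs_gcc9:.2f} combined SE")
print(f"Clang 12.0 vs GCC 9.3: {clang12_vs_gcc9:.2f} combined SE")
```

Under those assumptions the GCC 11.0.1 and GCC 9.3 sCOPY results sit well within noise of each other, while the Clang 12.0 gap is far outside it.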
ONNX Runtime 1.6, Model: yolov4 - Device: OpenMP CPU (Inferences Per Minute, More Is Better). AMD AOCC 3.0: 386 (SE +/- 2.50, N = 3) [-fopenmp=libomp]; Clang 11.0: 346 (SE +/- 1.42, N = 3) [-fopenmp=libomp]; Clang 12.0: 333 (SE +/- 4.15, N = 4) [-fopenmp=libomp]; GCC 10.3: 351 (SE +/- 0.17, N = 3) [-fopenmp]; GCC 9.3: 351 (SE +/- 0.50, N = 3) [-fopenmp]. 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -ldl -lrt
ONNX Runtime 1.6, Model: bertsquad-10 - Device: OpenMP CPU (Inferences Per Minute, More Is Better). AMD AOCC 3.0: 459 (SE +/- 10.39, N = 12) [-fopenmp=libomp]; Clang 11.0: 471 (SE +/- 5.55, N = 3) [-fopenmp=libomp]; Clang 12.0: 498 (SE +/- 10.30, N = 12) [-fopenmp=libomp]; GCC 10.3: 505 (SE +/- 0.87, N = 3) [-fopenmp]; GCC 9.3: 495 (SE +/- 4.64, N = 12) [-fopenmp]. 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -ldl -lrt
ONNX Runtime 1.6, Model: fcn-resnet101-11 - Device: OpenMP CPU (Inferences Per Minute, More Is Better). AMD AOCC 3.0: 122 (SE +/- 0.50, N = 3) [-fopenmp=libomp]; Clang 11.0: 108 (SE +/- 0.29, N = 3) [-fopenmp=libomp]; Clang 12.0: 112 (SE +/- 0.50, N = 3) [-fopenmp=libomp]; GCC 10.3: 115 (SE +/- 0.17, N = 3) [-fopenmp]; GCC 9.3: 116 (SE +/- 0.44, N = 3) [-fopenmp]. 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -ldl -lrt
ONNX Runtime 1.6, Model: shufflenet-v2-10 - Device: OpenMP CPU (Inferences Per Minute, More Is Better). AMD AOCC 3.0: 11325 (SE +/- 171.77, N = 3) [-fopenmp=libomp]; Clang 11.0: 9797 (SE +/- 102.76, N = 8) [-fopenmp=libomp]; Clang 12.0: 9904 (SE +/- 88.25, N = 12) [-fopenmp=libomp]; GCC 10.3: 10197 (SE +/- 7.52, N = 3) [-fopenmp]; GCC 9.3: 9419 (SE +/- 138.76, N = 3) [-fopenmp]. 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -ldl -lrt
ONNX Runtime 1.6, Model: super-resolution-10 - Device: OpenMP CPU (Inferences Per Minute, More Is Better). AMD AOCC 3.0: 4383 (SE +/- 174.98, N = 12) [-fopenmp=libomp]; Clang 11.0: 4523 (SE +/- 169.87, N = 9) [-fopenmp=libomp]; Clang 12.0: 4456 (SE +/- 126.29, N = 12) [-fopenmp=libomp]; GCC 10.3: 5559 (SE +/- 17.50, N = 3) [-fopenmp]; GCC 9.3: 5183 (SE +/- 2.40, N = 3) [-fopenmp]. 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -ldl -lrt
GraphicsMagick 1.3.33, Operation: Swirl (Iterations Per Minute, More Is Better). AMD AOCC 3.0: 1929 (SE +/- 4.63, N = 3); Clang 11.0: 1915 (SE +/- 12.41, N = 3); Clang 12.0: 1993 (SE +/- 6.57, N = 3); GCC 10.3: 2112 (SE +/- 1.20, N = 3); GCC 11.0.1: 2161 (SE +/- 4.81, N = 3); GCC 9.3: 2129 (SE +/- 1.20, N = 3). 1. (CC) gcc options: -fopenmp -O3 -march=native -pthread -ljbig -lwebp -lwebpmux -ltiff -lfreetype -ljpeg -lXext -lSM -lICE -lX11 -llzma -lbz2 -lxml2 -lz -lm -lpthread
GraphicsMagick 1.3.33, Operation: Rotate (Iterations Per Minute, More Is Better). AMD AOCC 3.0: 660; Clang 11.0: 665; Clang 12.0: 712; GCC 10.3: 689; GCC 11.0.1: 694; GCC 9.3: 709. SE as exported: SE +/- 1.33, N = 3; SE +/- 2.60, N = 3; SE +/- 5.21, N = 3; SE +/- 6.43, N = 3. 1. (CC) gcc options: -fopenmp -O3 -march=native -pthread -ljbig -lwebp -lwebpmux -ltiff -lfreetype -ljpeg -lXext -lSM -lICE -lX11 -llzma -lbz2 -lxml2 -lz -lm -lpthread
GraphicsMagick 1.3.33, Operation: Sharpen (Iterations Per Minute, More Is Better). AMD AOCC 3.0: 617; Clang 11.0: 613; Clang 12.0: 614; GCC 10.3: 807; GCC 11.0.1: 809; GCC 9.3: 806. SE as exported: SE +/- 0.58, N = 3; SE +/- 2.03, N = 3. 1. (CC) gcc options: -fopenmp -O3 -march=native -pthread -ljbig -lwebp -lwebpmux -ltiff -lfreetype -ljpeg -lXext -lSM -lICE -lX11 -llzma -lbz2 -lxml2 -lz -lm -lpthread
GraphicsMagick 1.3.33, Operation: Enhanced (Iterations Per Minute, More Is Better). AMD AOCC 3.0: 1057; Clang 11.0: 1068; Clang 12.0: 1076; GCC 10.3: 1039; GCC 11.0.1: 1082; GCC 9.3: 1217. SE as exported: SE +/- 1.53, N = 3; SE +/- 1.86, N = 3; SE +/- 0.88, N = 3; SE +/- 0.33, N = 3; SE +/- 1.53, N = 3. 1. (CC) gcc options: -fopenmp -O3 -march=native -pthread -ljbig -lwebp -lwebpmux -ltiff -lfreetype -ljpeg -lXext -lSM -lICE -lX11 -llzma -lbz2 -lxml2 -lz -lm -lpthread
GraphicsMagick 1.3.33, Operation: Resizing (Iterations Per Minute, More Is Better). AMD AOCC 3.0: 1866 (SE +/- 52.84, N = 15); Clang 11.0: 2034 (SE +/- 27.29, N = 3); Clang 12.0: 2136 (SE +/- 41.63, N = 12); GCC 10.3: 1208 (SE +/- 14.93, N = 3); GCC 11.0.1: 1188 (SE +/- 17.34, N = 3); GCC 9.3: 1238 (SE +/- 18.77, N = 3). 1. (CC) gcc options: -fopenmp -O3 -march=native -pthread -ljbig -lwebp -lwebpmux -ltiff -lfreetype -ljpeg -lXext -lSM -lICE -lX11 -llzma -lbz2 -lxml2 -lz -lm -lpthread
GraphicsMagick 1.3.33, Operation: Noise-Gaussian (Iterations Per Minute, More Is Better). AMD AOCC 3.0: 466; Clang 11.0: 463; Clang 12.0: 457; GCC 10.3: 544; GCC 11.0.1: 550; GCC 9.3: 547. SE as exported: SE +/- 0.33, N = 3; SE +/- 1.00, N = 3; SE +/- 1.00, N = 3. 1. (CC) gcc options: -fopenmp -O3 -march=native -pthread -ljbig -lwebp -lwebpmux -ltiff -lfreetype -ljpeg -lXext -lSM -lICE -lX11 -llzma -lbz2 -lxml2 -lz -lm -lpthread
GraphicsMagick 1.3.33, Operation: HWB Color Space (Iterations Per Minute, More Is Better). AMD AOCC 3.0: 614; Clang 11.0: 616; Clang 12.0: 605; GCC 10.3: 772; GCC 11.0.1: 771; GCC 9.3: 785. SE as exported: SE +/- 1.20, N = 3; SE +/- 0.88, N = 3; SE +/- 0.67, N = 3; SE +/- 0.67, N = 3. 1. (CC) gcc options: -fopenmp -O3 -march=native -pthread -ljbig -lwebp -lwebpmux -ltiff -lfreetype -ljpeg -lXext -lSM -lICE -lX11 -llzma -lbz2 -lxml2 -lz -lm -lpthread
Coremark 1.0, CoreMark Size 666 - Iterations Per Second (Iterations/Sec, More Is Better). AMD AOCC 3.0: 1720060.44 (SE +/- 3670.84, N = 3); Clang 11.0: 1790837.01 (SE +/- 971.31, N = 3); Clang 12.0: 1785466.28 (SE +/- 984.68, N = 3); GCC 10.3: 2110880.43 (SE +/- 2170.85, N = 3); GCC 11.0.1: 2176407.67 (SE +/- 5755.65, N = 3); GCC 9.3: 2086609.98 (SE +/- 4791.32, N = 3). 1. (CC) gcc options: -O2 -O3 -march=native -lrt" -lrt
SecureMark 1.0.4, Benchmark: SecureMark-TLS (marks, More Is Better). AMD AOCC 3.0: 264637 (SE +/- 251.99, N = 3); Clang 11.0: 260119 (SE +/- 407.86, N = 3); Clang 12.0: 265204 (SE +/- 1778.47, N = 3); GCC 10.3: 242700 (SE +/- 1024.96, N = 3); GCC 11.0.1: 243861 (SE +/- 675.55, N = 3); GCC 9.3: 238935 (SE +/- 537.86, N = 3). 1. (CC) gcc options: -pedantic -O3
LZ4 Compression 1.9.3, Compression Level: 3 - Compression Speed (MB/s, More Is Better). AMD AOCC 3.0: 53.77 (SE +/- 0.48, N = 3); Clang 11.0: 52.35 (SE +/- 0.33, N = 3); Clang 12.0: 52.07 (SE +/- 0.80, N = 3); Clang 12.0 LTO: 50.93 (SE +/- 0.02, N = 3); GCC 10.3: 52.87 (SE +/- 0.01, N = 3); GCC 11.0.1: 51.32 (SE +/- 0.73, N = 4); GCC 9.3: 53.83 (SE +/- 0.77, N = 4). 1. (CC) gcc options: -O3
LZ4 Compression 1.9.3, Compression Level: 3 - Decompression Speed (MB/s, More Is Better). AMD AOCC 3.0: 13562.5 (SE +/- 73.30, N = 3); Clang 11.0: 13840.3 (SE +/- 15.91, N = 3); Clang 12.0: 13911.5 (SE +/- 71.01, N = 3); Clang 12.0 LTO: 13715.0 (SE +/- 60.82, N = 3); GCC 10.3: 13906.1 (SE +/- 42.32, N = 3); GCC 11.0.1: 13882.2 (SE +/- 34.44, N = 4); GCC 9.3: 13793.4 (SE +/- 37.19, N = 4). 1. (CC) gcc options: -O3
LZ4 Compression 1.9.3, Compression Level: 9 - Compression Speed (MB/s, More Is Better). AMD AOCC 3.0: 50.32 (SE +/- 0.26, N = 3); Clang 11.0: 49.01 (SE +/- 0.46, N = 3); Clang 12.0: 48.50 (SE +/- 0.42, N = 3); Clang 12.0 LTO: 48.47 (SE +/- 0.74, N = 3); GCC 10.3: 52.36 (SE +/- 0.72, N = 4); GCC 11.0.1: 51.17 (SE +/- 0.65, N = 3); GCC 9.3: 51.97 (SE +/- 0.65, N = 5). 1. (CC) gcc options: -O3
LZ4 Compression 1.9.3, Compression Level: 9 - Decompression Speed (MB/s, More Is Better). AMD AOCC 3.0: 13561.5 (SE +/- 33.89, N = 3); Clang 11.0: 13927.9 (SE +/- 23.21, N = 3); Clang 12.0: 13926.5 (SE +/- 65.90, N = 3); Clang 12.0 LTO: 13698.7 (SE +/- 46.50, N = 3); GCC 10.3: 13806.6 (SE +/- 6.60, N = 4); GCC 11.0.1: 13857.4 (SE +/- 62.74, N = 3); GCC 9.3: 13895.3 (SE +/- 17.75, N = 5). 1. (CC) gcc options: -O3
QuantLib 1.21 (MFLOPS, More Is Better). AMD AOCC 3.0: 2725.7 (SE +/- 2.28, N = 3); Clang 11.0: 2640.2 (SE +/- 1.01, N = 3); Clang 12.0: 2653.8 (SE +/- 1.92, N = 3); Clang 12.0 LTO: 2657.8 (SE +/- 1.62, N = 3); GCC 10.3: 2392.6 (SE +/- 2.06, N = 3); GCC 9.3: 2338.9 (SE +/- 4.53, N = 3). 1. (CXX) g++ options: -O3 -march=native -rdynamic
FFTW 3.3.6, Build: Stock - Size: 1D FFT Size 32 (Mflops, More Is Better). AMD AOCC 3.0: 13192 (SE +/- 41.35, N = 3); Clang 11.0: 13324 (SE +/- 20.33, N = 3); Clang 12.0: 13333 (SE +/- 24.25, N = 3); GCC 10.3: 12576 (SE +/- 16.05, N = 3); GCC 11.0.1: 12765 (SE +/- 45.16, N = 3); GCC 9.3: 14399 (SE +/- 67.28, N = 3). 1. (CC) gcc options: -pthread -O3 -march=native -lm
FFTW 3.3.6, Build: Stock - Size: 1D FFT Size 1024 (Mflops, More Is Better). AMD AOCC 3.0: 10669 (SE +/- 34.64, N = 3); Clang 11.0: 10564 (SE +/- 35.53, N = 3); Clang 12.0: 10805 (SE +/- 27.10, N = 3); GCC 10.3: 11319 (SE +/- 32.26, N = 3); GCC 11.0.1: 11044 (SE +/- 189.35, N = 3); GCC 9.3: 11689 (SE +/- 44.20, N = 3). 1. (CC) gcc options: -pthread -O3 -march=native -lm
FFTW 3.3.6, Build: Stock - Size: 1D FFT Size 2048 (Mflops, More Is Better). AMD AOCC 3.0: 10227.0 (SE +/- 39.89, N = 3); Clang 11.0: 10004.2 (SE +/- 28.76, N = 3); Clang 12.0: 10467.0 (SE +/- 7.75, N = 3); GCC 10.3: 10711.0 (SE +/- 14.75, N = 3); GCC 11.0.1: 10675.0 (SE +/- 55.19, N = 3); GCC 9.3: 11053.0 (SE +/- 37.69, N = 3). 1. (CC) gcc options: -pthread -O3 -march=native -lm
FFTW 3.3.6, Build: Stock - Size: 1D FFT Size 4096 (Mflops, More Is Better). AMD AOCC 3.0: 9603.2 (SE +/- 43.38, N = 3); Clang 11.0: 9438.6 (SE +/- 15.16, N = 3); Clang 12.0: 9862.0 (SE +/- 101.36, N = 3); GCC 10.3: 10179.0 (SE +/- 57.26, N = 3); GCC 11.0.1: 10205.0 (SE +/- 48.56, N = 3); GCC 9.3: 10548.0 (SE +/- 20.21, N = 3). 1. (CC) gcc options: -pthread -O3 -march=native -lm
FFTW 3.3.6, Build: Stock - Size: 2D FFT Size 1024 (Mflops, More Is Better). AMD AOCC 3.0: 8902.1 (SE +/- 14.28, N = 3); Clang 11.0: 8809.6 (SE +/- 45.95, N = 3); Clang 12.0: 9088.3 (SE +/- 48.25, N = 3); GCC 10.3: 9247.3 (SE +/- 41.68, N = 3); GCC 11.0.1: 9238.4 (SE +/- 25.87, N = 3); GCC 9.3: 9798.6 (SE +/- 19.46, N = 3). 1. (CC) gcc options: -pthread -O3 -march=native -lm
FFTW 3.3.6, Build: Stock - Size: 2D FFT Size 2048 (Mflops, More Is Better). AMD AOCC 3.0: 7784.8 (SE +/- 19.99, N = 3); Clang 11.0: 7878.5 (SE +/- 27.38, N = 3); Clang 12.0: 7789.9 (SE +/- 65.76, N = 3); GCC 10.3: 8134.5 (SE +/- 56.49, N = 3); GCC 11.0.1: 8231.1 (SE +/- 36.00, N = 3); GCC 9.3: 8408.5 (SE +/- 50.36, N = 3). 1. (CC) gcc options: -pthread -O3 -march=native -lm
FFTW 3.3.6, Build: Stock - Size: 2D FFT Size 4096 (Mflops, More Is Better). AMD AOCC 3.0: 6875.3 (SE +/- 65.81, N = 3); Clang 11.0: 6823.8 (SE +/- 60.67, N = 3); Clang 12.0: 6744.1 (SE +/- 35.20, N = 3); GCC 10.3: 6974.0 (SE +/- 25.90, N = 3); GCC 11.0.1: 6948.2 (SE +/- 23.67, N = 3); GCC 9.3: 7007.3 (SE +/- 30.40, N = 3). 1. (CC) gcc options: -pthread -O3 -march=native -lm
FFTW 3.3.6, Build: Float + SSE - Size: 1D FFT Size 32 (Mflops, More Is Better). AMD AOCC 3.0: 16146 (SE +/- 5.33, N = 3); Clang 11.0: 14590 (SE +/- 129.55, N = 3); Clang 12.0: 15649 (SE +/- 48.79, N = 3); GCC 10.3: 16650 (SE +/- 108.41, N = 3); GCC 11.0.1: 16590 (SE +/- 168.99, N = 3); GCC 9.3: 16590 (SE +/- 170.19, N = 8). 1. (CC) gcc options: -pthread -O3 -march=native -lm
FFTW 3.3.6, Build: Float + SSE - Size: 1D FFT Size 1024 (Mflops, More Is Better). AMD AOCC 3.0: 49685 (SE +/- 621.84, N = 15); Clang 11.0: 50740 (SE +/- 585.78, N = 3); Clang 12.0: 50350 (SE +/- 952.64, N = 12); GCC 10.3: 52054 (SE +/- 439.64, N = 15); GCC 11.0.1: 51706 (SE +/- 568.96, N = 3); GCC 9.3: 53275 (SE +/- 788.42, N = 3). 1. (CC) gcc options: -pthread -O3 -march=native -lm
FFTW 3.3.6, Build: Float + SSE - Size: 1D FFT Size 2048 (Mflops, More Is Better). AMD AOCC 3.0: 44412 (SE +/- 756.91, N = 3); Clang 11.0: 50084 (SE +/- 582.34, N = 3); Clang 12.0: 51254 (SE +/- 439.50, N = 3); GCC 10.3: 53497 (SE +/- 743.81, N = 3); GCC 11.0.1: 54710 (SE +/- 156.75, N = 3); GCC 9.3: 52749 (SE +/- 725.00, N = 3). 1. (CC) gcc options: -pthread -O3 -march=native -lm
FFTW 3.3.6, Build: Float + SSE - Size: 1D FFT Size 4096 (Mflops, More Is Better). AMD AOCC 3.0: 45521 (SE +/- 542.47, N = 15); Clang 11.0: 46676 (SE +/- 413.24, N = 15); Clang 12.0: 45428 (SE +/- 671.66, N = 15); GCC 10.3: 52130 (SE +/- 844.19, N = 3); GCC 11.0.1: 51391 (SE +/- 227.13, N = 3); GCC 9.3: 52099 (SE +/- 228.68, N = 3). 1. (CC) gcc options: -pthread -O3 -march=native -lm
FFTW 3.3.6, Build: Float + SSE - Size: 2D FFT Size 1024 (Mflops, More Is Better). AMD AOCC 3.0: 36100 (SE +/- 455.21, N = 12); Clang 11.0: 36181 (SE +/- 530.09, N = 4); Clang 12.0: 36239 (SE +/- 165.99, N = 3); GCC 10.3: 35973 (SE +/- 301.69, N = 3); GCC 11.0.1: 35718 (SE +/- 442.82, N = 3); GCC 9.3: 36321 (SE +/- 79.87, N = 3). 1. (CC) gcc options: -pthread -O3 -march=native -lm
FFTW 3.3.6, Build: Float + SSE - Size: 2D FFT Size 2048 (Mflops, More Is Better). AMD AOCC 3.0: 31013 (SE +/- 378.89, N = 6); Clang 11.0: 31741 (SE +/- 146.10, N = 3); Clang 12.0: 31935 (SE +/- 77.17, N = 3); GCC 10.3: 32061 (SE +/- 14.99, N = 3); GCC 11.0.1: 31662 (SE +/- 209.56, N = 3); GCC 9.3: 31341 (SE +/- 37.37, N = 3). 1. (CC) gcc options: -pthread -O3 -march=native -lm
FFTW 3.3.6, Build: Float + SSE - Size: 2D FFT Size 4096 (Mflops, More Is Better). AMD AOCC 3.0: 23111 (SE +/- 349.17, N = 9); Clang 11.0: 22913 (SE +/- 220.77, N = 3); Clang 12.0: 22797 (SE +/- 348.10, N = 9); GCC 10.3: 23774 (SE +/- 538.47, N = 9); GCC 11.0.1: 24888 (SE +/- 160.97, N = 3); GCC 9.3: 25068 (SE +/- 106.49, N = 3). 1. (CC) gcc options: -pthread -O3 -march=native -lm
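With seven FFT sizes per build, a single summary ratio can be easier to compare than seven separate charts. One common way to combine per-test ratios is the geometric mean. The Python sketch below applies it to the Float + SSE results for GCC 9.3 and Clang 12.0; the choice of these two compilers and the variable names are illustrative, and the result ignores the reported standard errors.

```python
from statistics import geometric_mean

# Float + SSE results (Mflops) copied from the FFTW lines above, in this order:
# 1D 32, 1D 1024, 1D 2048, 1D 4096, 2D 1024, 2D 2048, 2D 4096.
clang_12 = [15649, 50350, 51254, 45428, 36239, 31935, 22797]
gcc_9_3 = [16590, 53275, 52749, 52099, 36321, 31341, 25068]

# Per-size ratio; values above 1.0 favor GCC 9.3 because more Mflops is better.
ratios = [g / c for g, c in zip(gcc_9_3, clang_12)]
overall = geometric_mean(ratios)

print(f"GCC 9.3 vs Clang 12.0, geometric mean ratio: {overall:.3f}")
```

On these figures the ratio comes out at roughly 5% in favor of GCC 9.3, although the 2D FFT Size 2048 case goes the other way.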
SciMark 2.0, Computational Test: Composite (Mflops, More Is Better). AMD AOCC 3.0: 3298.29 (SE +/- 1.29, N = 3); Clang 11.0: 3319.34 (SE +/- 15.12, N = 3); Clang 12.0: 3190.62 (SE +/- 1.11, N = 3); GCC 10.3: 3235.94 (SE +/- 6.50, N = 3); GCC 11.0.1: 3182.35 (SE +/- 5.19, N = 3); GCC 9.3: 3229.22 (SE +/- 5.86, N = 3). 1. (CC) gcc options: -O3 -march=native -lm
SciMark 2.0, Computational Test: Monte Carlo (Mflops, More Is Better). AMD AOCC 3.0: 690.94 (SE +/- 0.18, N = 3); Clang 11.0: 674.86 (SE +/- 0.40, N = 3); Clang 12.0: 675.13 (SE +/- 0.40, N = 3); GCC 10.3: 682.87 (SE +/- 1.71, N = 3); GCC 11.0.1: 647.82 (SE +/- 0.29, N = 3); GCC 9.3: 668.10 (SE +/- 0.14, N = 3). 1. (CC) gcc options: -O3 -march=native -lm
SciMark 2.0, Computational Test: Fast Fourier Transform (Mflops, More Is Better). AMD AOCC 3.0: 398.96 (SE +/- 0.70, N = 3); Clang 11.0: 399.16 (SE +/- 0.67, N = 3); Clang 12.0: 363.85 (SE +/- 0.46, N = 3); GCC 10.3: 388.98 (SE +/- 0.25, N = 3); GCC 11.0.1: 388.88 (SE +/- 1.03, N = 3); GCC 9.3: 384.03 (SE +/- 0.66, N = 3). 1. (CC) gcc options: -O3 -march=native -lm
SciMark 2.0, Computational Test: Sparse Matrix Multiply (Mflops, More Is Better). AMD AOCC 3.0: 4594.27 (SE +/- 5.98, N = 3); Clang 11.0: 4590.37 (SE +/- 3.87, N = 3); Clang 12.0: 4280.22 (SE +/- 10.41, N = 3); GCC 10.3: 3820.77 (SE +/- 0.86, N = 3); GCC 11.0.1: 3462.66 (SE +/- 0.39, N = 3); GCC 9.3: 3765.88 (SE +/- 1.69, N = 3). 1. (CC) gcc options: -O3 -march=native -lm
SciMark 2.0, Computational Test: Dense LU Matrix Factorization (Mflops, More Is Better). AMD AOCC 3.0: 9021.83 (SE +/- 0.22, N = 3); Clang 11.0: 9146.88 (SE +/- 77.81, N = 3); Clang 12.0: 8848.40 (SE +/- 7.16, N = 3); GCC 10.3: 9248.89 (SE +/- 33.93, N = 3); GCC 11.0.1: 9263.55 (SE +/- 25.06, N = 3); GCC 9.3: 9178.97 (SE +/- 28.39, N = 3). 1. (CC) gcc options: -O3 -march=native -lm
SciMark 2.0, Computational Test: Jacobi Successive Over-Relaxation (Mflops, More Is Better). AMD AOCC 3.0: 1785.45 (SE +/- 0.02, N = 3); Clang 11.0: 1785.42 (SE +/- 0.12, N = 3); Clang 12.0: 1785.50 (SE +/- 0.08, N = 3); GCC 10.3: 2038.15 (SE +/- 0.07, N = 3); GCC 11.0.1: 2148.84 (SE +/- 0.10, N = 3); GCC 9.3: 2149.15 (SE +/- 0.12, N = 3). 1. (CC) gcc options: -O3 -march=native -lm
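The SciMark Composite figures above are consistent with a plain arithmetic mean of the five kernel scores (Monte Carlo, Fast Fourier Transform, Sparse Matrix Multiply, Dense LU Matrix Factorization, Jacobi Successive Over-Relaxation). The Python sketch below checks that reading for two of the compilers; the helper name `composite` is illustrative.

```python
# Kernel scores (Mflops) copied from the SciMark 2.0 results above, in this order:
# Monte Carlo, FFT, Sparse Matrix Multiply, Dense LU, Jacobi SOR.
kernels = {
    "AMD AOCC 3.0": [690.94, 398.96, 4594.27, 9021.83, 1785.45],
    "GCC 11.0.1": [647.82, 388.88, 3462.66, 9263.55, 2148.84],
}
# Composite scores as reported in the result file.
reported = {"AMD AOCC 3.0": 3298.29, "GCC 11.0.1": 3182.35}


def composite(scores):
    """Arithmetic mean of the kernel scores."""
    return sum(scores) / len(scores)


for compiler, scores in kernels.items():
    print(f"{compiler}: computed {composite(scores):.2f}, reported {reported[compiler]:.2f}")
```

Because the composite is an unweighted mean, the Dense LU kernel, with the largest absolute Mflops figure, contributes the most to it.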
Botan 2.17.3, Test: KASUMI (MiB/s, More Is Better). AMD AOCC 3.0: 82.83 (SE +/- 0.02, N = 3); Clang 11.0: 79.15 (SE +/- 0.06, N = 3); Clang 12.0: 82.64 (SE +/- 0.01, N = 3); GCC 10.3: 79.12 (SE +/- 0.01, N = 3); GCC 9.3: 84.86 (SE +/- 0.02, N = 3). 1. (CXX) g++ options: -fstack-protector -m64 -pthread -lbotan-2 -ldl -lrt
Botan 2.17.3, Test: KASUMI - Decrypt (MiB/s, More Is Better). AMD AOCC 3.0: 82.95 (SE +/- 0.02, N = 3); Clang 11.0: 80.22 (SE +/- 0.04, N = 3); Clang 12.0: 84.23 (SE +/- 0.06, N = 3); GCC 10.3: 81.45 (SE +/- 0.01, N = 3); GCC 9.3: 84.13 (SE +/- 0.01, N = 3). 1. (CXX) g++ options: -fstack-protector -m64 -pthread -lbotan-2 -ldl -lrt
Botan 2.17.3, Test: AES-256 (MiB/s, More Is Better). AMD AOCC 3.0: 4891.07 (SE +/- 0.05, N = 3); Clang 11.0: 4901.13 (SE +/- 2.16, N = 3); Clang 12.0: 4659.34 (SE +/- 2.14, N = 3); GCC 10.3: 5525.71 (SE +/- 4.47, N = 3); GCC 9.3: 5484.68 (SE +/- 42.69, N = 3). 1. (CXX) g++ options: -fstack-protector -m64 -pthread -lbotan-2 -ldl -lrt
Botan 2.17.3, Test: AES-256 - Decrypt (MiB/s, More Is Better). AMD AOCC 3.0: 4887.57 (SE +/- 3.70, N = 3); Clang 11.0: 4895.56 (SE +/- 1.35, N = 3); Clang 12.0: 4682.46 (SE +/- 4.78, N = 3); GCC 10.3: 5529.40 (SE +/- 5.42, N = 3); GCC 9.3: 5391.99 (SE +/- 11.31, N = 3). 1. (CXX) g++ options: -fstack-protector -m64 -pthread -lbotan-2 -ldl -lrt
Botan 2.17.3, Test: Twofish (MiB/s, More Is Better). AMD AOCC 3.0: 305.00 (SE +/- 0.03, N = 3); Clang 11.0: 299.21 (SE +/- 0.09, N = 3); Clang 12.0: 315.41 (SE +/- 0.13, N = 3); GCC 10.3: 341.85 (SE +/- 0.52, N = 3); GCC 9.3: 337.36 (SE +/- 0.04, N = 3). 1. (CXX) g++ options: -fstack-protector -m64 -pthread -lbotan-2 -ldl -lrt
Botan 2.17.3, Test: Twofish - Decrypt (MiB/s, More Is Better). AMD AOCC 3.0: 303.81 (SE +/- 0.06, N = 3); Clang 11.0: 302.41 (SE +/- 0.15, N = 3); Clang 12.0: 321.19 (SE +/- 0.16, N = 3); GCC 10.3: 325.39 (SE +/- 0.44, N = 3); GCC 9.3: 339.07 (SE +/- 0.04, N = 3). 1. (CXX) g++ options: -fstack-protector -m64 -pthread -lbotan-2 -ldl -lrt
Botan 2.17.3, Test: Blowfish (MiB/s, More Is Better). AMD AOCC 3.0: 319.79 (SE +/- 1.14, N = 3); Clang 11.0: 319.23 (SE +/- 1.73, N = 3); Clang 12.0: 380.05 (SE +/- 0.05, N = 3); GCC 10.3: 422.14 (SE +/- 0.11, N = 3); GCC 9.3: 412.85 (SE +/- 0.09, N = 3). 1. (CXX) g++ options: -fstack-protector -m64 -pthread -lbotan-2 -ldl -lrt
Botan 2.17.3, Test: Blowfish - Decrypt (MiB/s, More Is Better). AMD AOCC 3.0: 355.06 (SE +/- 1.17, N = 3); Clang 11.0: 351.08 (SE +/- 2.03, N = 3); Clang 12.0: 351.28 (SE +/- 0.04, N = 3); GCC 10.3: 420.85 (SE +/- 0.95, N = 3); GCC 9.3: 412.07 (SE +/- 0.12, N = 3). 1. (CXX) g++ options: -fstack-protector -m64 -pthread -lbotan-2 -ldl -lrt
Botan 2.17.3, Test: CAST-256 (MiB/s, More Is Better). AMD AOCC 3.0: 127.77 (SE +/- 0.06, N = 3); Clang 11.0: 128.59 (SE +/- 0.02, N = 3); Clang 12.0: 132.82 (SE +/- 0.02, N = 3); GCC 10.3: 127.74 (SE +/- 0.33, N = 3); GCC 9.3: 127.30 (SE +/- 0.09, N = 3). 1. (CXX) g++ options: -fstack-protector -m64 -pthread -lbotan-2 -ldl -lrt
Botan 2.17.3, Test: CAST-256 - Decrypt (MiB/s, More Is Better). AMD AOCC 3.0: 128.01 (SE +/- 0.08, N = 3); Clang 11.0: 127.74 (SE +/- 0.01, N = 3); Clang 12.0: 133.05 (SE +/- 0.01, N = 3); GCC 10.3: 127.78 (SE +/- 0.32, N = 3); GCC 9.3: 127.34 (SE +/- 0.08, N = 3). 1. (CXX) g++ options: -fstack-protector -m64 -pthread -lbotan-2 -ldl -lrt
Botan 2.17.3, Test: ChaCha20Poly1305 (MiB/s, More Is Better). AMD AOCC 3.0: 845.14 (SE +/- 3.15, N = 3); Clang 11.0: 848.24 (SE +/- 0.62, N = 3); Clang 12.0: 850.50 (SE +/- 4.85, N = 3); GCC 10.3: 485.02 (SE +/- 0.28, N = 3); GCC 9.3: 616.10 (SE +/- 0.13, N = 3). 1. (CXX) g++ options: -fstack-protector -m64 -pthread -lbotan-2 -ldl -lrt
Botan 2.17.3, Test: ChaCha20Poly1305 - Decrypt (MiB/s, More Is Better). AMD AOCC 3.0: 838.09 (SE +/- 3.17, N = 3); Clang 11.0: 840.64 (SE +/- 0.16, N = 3); Clang 12.0: 843.40 (SE +/- 4.64, N = 3); GCC 10.3: 476.18 (SE +/- 0.02, N = 3); GCC 9.3: 611.98 (SE +/- 0.40, N = 3). 1. (CXX) g++ options: -fstack-protector -m64 -pthread -lbotan-2 -ldl -lrt
JPEG XL 0.3.3, Input: PNG - Encode Speed: 5 (MP/s, More Is Better). AMD AOCC 3.0: 79.23 (SE +/- 0.41, N = 3); Clang 11.0: 78.41 (SE +/- 0.24, N = 3); Clang 12.0: 74.27 (SE +/- 0.17, N = 3). 1. (CXX) g++ options: -O3 -march=native -funwind-tables -Xclang -mrelax-all -O2 -fPIE -pie -pthread -ldl
JPEG XL 0.3.3, Input: PNG - Encode Speed: 7 (MP/s, More Is Better). AMD AOCC 3.0: 11.37 (SE +/- 0.08, N = 3); Clang 11.0: 12.01 (SE +/- 0.02, N = 3); Clang 12.0: 12.15 (SE +/- 0.05, N = 3). 1. (CXX) g++ options: -O3 -march=native -funwind-tables -Xclang -mrelax-all -O2 -fPIE -pie -pthread -ldl
JPEG XL 0.3.3, Input: PNG - Encode Speed: 8 (MP/s, More Is Better). AMD AOCC 3.0: 0.81 (SE +/- 0.00, N = 3); Clang 11.0: 0.80 (SE +/- 0.00, N = 3); Clang 12.0: 0.82 (SE +/- 0.00, N = 3). 1. (CXX) g++ options: -O3 -march=native -funwind-tables -Xclang -mrelax-all -O2 -fPIE -pie -pthread -ldl
JPEG XL 0.3.3, Input: JPEG - Encode Speed: 5 (MP/s, More Is Better). AMD AOCC 3.0: 65.57 (SE +/- 0.17, N = 3); Clang 11.0: 65.58 (SE +/- 0.20, N = 3); Clang 12.0: 66.66 (SE +/- 0.14, N = 3). 1. (CXX) g++ options: -O3 -march=native -funwind-tables -Xclang -mrelax-all -O2 -fPIE -pie -pthread -ldl
JPEG XL 0.3.3, Input: JPEG - Encode Speed: 7 (MP/s, More Is Better). AMD AOCC 3.0: 65.68 (SE +/- 0.05, N = 3); Clang 11.0: 65.43 (SE +/- 0.08, N = 3); Clang 12.0: 66.38 (SE +/- 0.16, N = 3). 1. (CXX) g++ options: -O3 -march=native -funwind-tables -Xclang -mrelax-all -O2 -fPIE -pie -pthread -ldl
JPEG XL 0.3.3, Input: JPEG - Encode Speed: 8 (MP/s, More Is Better). AMD AOCC 3.0: 27.29 (SE +/- 0.06, N = 3); Clang 11.0: 27.24 (SE +/- 0.01, N = 3); Clang 12.0: 28.13 (SE +/- 0.03, N = 3). 1. (CXX) g++ options: -O3 -march=native -funwind-tables -Xclang -mrelax-all -O2 -fPIE -pie -pthread -ldl
LibRaw 0.20, Post-Processing Benchmark (Mpix/sec, More Is Better). AMD AOCC 3.0: 41.64 (SE +/- 0.04, N = 3); Clang 11.0: 38.71 (SE +/- 0.33, N = 3); Clang 12.0: 41.78 (SE +/- 0.12, N = 3); GCC 10.3: 58.90 (SE +/- 0.23, N = 3); GCC 11.0.1: 57.24 (SE +/- 0.16, N = 3); GCC 9.3: 60.20 (SE +/- 0.19, N = 3). 1. (CXX) g++ options: -O3 -march=native -fopenmp -ljpeg -lz -lm
Etcpak 0.7, Configuration: DXT1 (Mpx/s, More Is Better). AMD AOCC 3.0: 2654.72 (SE +/- 8.09, N = 3); Clang 11.0: 1872.76 (SE +/- 1.69, N = 3); Clang 12.0: 2718.53 (SE +/- 2.64, N = 3); Clang 12.0 LTO: 2719.99 (SE +/- 6.09, N = 3); GCC 10.3: 1114.60 (SE +/- 0.48, N = 3); GCC 9.3: 1082.37 (SE +/- 0.16, N = 3). 1. (CXX) g++ options: -O3 -march=native -std=c++11 -lpthread
Etcpak 0.7, Configuration: ETC1 (Mpx/s, More Is Better). AMD AOCC 3.0: 211.73 (SE +/- 0.01, N = 3); Clang 11.0: 205.07 (SE +/- 0.03, N = 3); Clang 12.0: 284.64 (SE +/- 0.11, N = 3); Clang 12.0 LTO: 284.76 (SE +/- 0.06, N = 3); GCC 10.3: 281.15 (SE +/- 0.03, N = 3); GCC 9.3: 269.67 (SE +/- 0.00, N = 3). 1. (CXX) g++ options: -O3 -march=native -std=c++11 -lpthread
Etcpak 0.7, Configuration: ETC2 (Mpx/s, More Is Better). AMD AOCC 3.0: 178.85 (SE +/- 0.03, N = 3); Clang 11.0: 168.82 (SE +/- 0.02, N = 3); Clang 12.0: 202.09 (SE +/- 0.02, N = 3); Clang 12.0 LTO: 202.10 (SE +/- 0.04, N = 3); GCC 10.3: 173.23 (SE +/- 0.04, N = 3); GCC 9.3: 174.81 (SE +/- 0.09, N = 3). 1. (CXX) g++ options: -O3 -march=native -std=c++11 -lpthread
TSCP 1.81, AI Chess Performance (Nodes Per Second, More Is Better). AMD AOCC 3.0: 1697846 (SE +/- 2098.00, N = 5); Clang 11.0: 1638265 (SE +/- 2852.59, N = 5); Clang 12.0: 1570966 (SE +/- 1798.40, N = 5); GCC 10.3: 1467179 (SE +/- 956.77, N = 5); GCC 11.0.1: 1494250 (SE +/- 1626.80, N = 5); GCC 9.3: 1446372 (SE +/- 760.80, N = 5). 1. (CC) gcc options: -O3 -march=native
Liquid-DSP Threads: 1 - Buffer Length: 256 - Filter Length: 57 OpenBenchmarking.org samples/s, More Is Better Liquid-DSP 2021.01.31 Threads: 1 - Buffer Length: 256 - Filter Length: 57 AMD AOCC 3.0 Clang 11.0 Clang 12.0 GCC 10.3 GCC 11.0.1 GCC 9.3 13M 26M 39M 52M 65M SE +/- 47026.00, N = 3 SE +/- 40360.87, N = 3 SE +/- 790005.27, N = 3 SE +/- 6887.99, N = 3 SE +/- 318169.94, N = 3 SE +/- 870702.21, N = 3 57411333 56307000 55663000 62467333 60886333 61404000 1. (CC) gcc options: -O3 -march=native -pthread -lm -lc -lliquid
Liquid-DSP Threads: 32 - Buffer Length: 256 - Filter Length: 57 OpenBenchmarking.org samples/s, More Is Better Liquid-DSP 2021.01.31 Threads: 32 - Buffer Length: 256 - Filter Length: 57 AMD AOCC 3.0 Clang 11.0 Clang 12.0 GCC 10.3 GCC 11.0.1 GCC 9.3 400M 800M 1200M 1600M 2000M SE +/- 2130988.29, N = 3 SE +/- 1331665.62, N = 3 SE +/- 2255610.29, N = 3 SE +/- 15763988.50, N = 3 SE +/- 17297784.06, N = 3 SE +/- 4864497.23, N = 3 1609633333 1578400000 1564833333 1718000000 1679800000 1721900000 1. (CC) gcc options: -O3 -march=native -pthread -lm -lc -lliquid
Liquid-DSP Threads: 64 - Buffer Length: 256 - Filter Length: 57 OpenBenchmarking.org samples/s, More Is Better Liquid-DSP 2021.01.31 Threads: 64 - Buffer Length: 256 - Filter Length: 57 AMD AOCC 3.0 Clang 11.0 Clang 12.0 GCC 10.3 GCC 11.0.1 GCC 9.3 700M 1400M 2100M 2800M 3500M SE +/- 1234233.91, N = 3 SE +/- 2452436.43, N = 3 SE +/- 6045475.81, N = 3 SE +/- 4643753.27, N = 3 SE +/- 1154700.54, N = 3 SE +/- 2961043.36, N = 3 3100400000 3051366667 3070633333 2942866667 2989400000 2940466667 1. (CC) gcc options: -O3 -march=native -pthread -lm -lc -lliquid
Liquid-DSP 2021.01.31 - Threads: 128 - Buffer Length: 256 - Filter Length: 57 (samples/s, More Is Better): AMD AOCC 3.0: 3606466667 (SE +/- 1543084.93, N = 3); Clang 11.0: 3596533333 (SE +/- 1559202.08, N = 3); Clang 12.0: 3643766667 (SE +/- 883804.91, N = 3); GCC 10.3: 3005033333 (SE +/- 1679616.36, N = 3); GCC 11.0.1: 3055766667 (SE +/- 6016181.88, N = 3); GCC 9.3: 3012066667 (SE +/- 3384441.53, N = 3). 1. (CC) gcc options: -O3 -march=native -pthread -lm -lc -lliquid
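The four Liquid-DSP thread counts above can be turned into a speedup and parallel-efficiency figure per compiler. The sketch below is illustrative only: it uses two of the six compilers, with throughput values copied from the results above, and the function name `scaling` is hypothetical.

```python
# Throughput in samples/s by thread count, copied from the Liquid-DSP results above.
liquid_dsp = {
    "AMD AOCC 3.0": {1: 57411333, 32: 1609633333, 64: 3100400000, 128: 3606466667},
    "GCC 10.3": {1: 62467333, 32: 1718000000, 64: 2942866667, 128: 3005033333},
}


def scaling(throughput_by_threads):
    """Return {threads: (speedup over 1 thread, speedup / threads)}."""
    base = throughput_by_threads[1]
    return {
        threads: (value / base, value / base / threads)
        for threads, value in sorted(throughput_by_threads.items())
    }


for compiler, series in liquid_dsp.items():
    for threads, (speedup, efficiency) in scaling(series).items():
        print(f"{compiler:13s} {threads:3d} threads: "
              f"{speedup:6.2f}x speedup, {efficiency:.0%} efficiency")
```

On this 64-core / 128-thread part, efficiency at 128 threads is computed against logical threads, so values well below 100% are expected once SMT siblings are in use.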
PostgreSQL pgbench 13.0 - Scaling Factor: 100 - Clients: 1 - Mode: Read Only (TPS, More Is Better): Clang 11.0: 24943 (SE +/- 289.16, N = 3); Clang 12.0: 24310 (SE +/- 303.43, N = 3); GCC 10.3: 24845 (SE +/- 118.05, N = 3); GCC 11.0.1: 23661 (SE +/- 281.76, N = 15); GCC 9.3: 23895 (SE +/- 41.57, N = 3). 1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=native -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm
PostgreSQL pgbench 13.0 - Scaling Factor: 100 - Clients: 1 - Mode: Read Write (TPS, More Is Better): Clang 11.0: 3312 (SE +/- 14.62, N = 3); Clang 12.0: 3281 (SE +/- 3.48, N = 3); GCC 10.3: 3369 (SE +/- 11.40, N = 3); GCC 11.0.1: 3383 (SE +/- 28.00, N = 3); GCC 9.3: 3298 (SE +/- 4.79, N = 3). 1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=native -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm
PostgreSQL pgbench 13.0 - Scaling Factor: 100 - Clients: 100 - Mode: Read Only (TPS, More Is Better): Clang 11.0: 1069367 (SE +/- 1740.88, N = 3); Clang 12.0: 1069022 (SE +/- 720.87, N = 3); GCC 10.3: 1076357 (SE +/- 183.22, N = 3); GCC 11.0.1: 1090824 (SE +/- 1514.63, N = 3); GCC 9.3: 1057125 (SE +/- 1623.23, N = 3). 1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=native -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm
PostgreSQL pgbench 13.0 - Scaling Factor: 100 - Clients: 250 - Mode: Read Only (TPS, More Is Better): Clang 11.0: 1065506 (SE +/- 13844.42, N = 3); Clang 12.0: 1071209 (SE +/- 6289.60, N = 3); GCC 10.3: 1089731 (SE +/- 8859.63, N = 3); GCC 11.0.1: 1090160 (SE +/- 8885.95, N = 3); GCC 9.3: 1067486 (SE +/- 8843.08, N = 3). 1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=native -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm
PostgreSQL pgbench 13.0 - Scaling Factor: 100 - Clients: 100 - Mode: Read Write (TPS, More Is Better): Clang 11.0: 61616 (SE +/- 400.92, N = 3); Clang 12.0: 62319 (SE +/- 162.92, N = 3); GCC 10.3: 58894 (SE +/- 469.64, N = 3); GCC 11.0.1: 56369 (SE +/- 899.82, N = 3); GCC 9.3: 59364 (SE +/- 994.44, N = 3). 1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=native -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm
PostgreSQL pgbench 13.0 - Scaling Factor: 100 - Clients: 250 - Mode: Read Write (TPS, More Is Better): Clang 11.0: 54488 (SE +/- 883.12, N = 3); Clang 12.0: 56684 (SE +/- 702.52, N = 15); GCC 10.3: 53019 (SE +/- 591.89, N = 7); GCC 11.0.1: 53102 (SE +/- 211.73, N = 3); GCC 9.3: 53825 (SE +/- 396.40, N = 3). 1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=native -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm
WebP Image Encode 1.1 - Encode Settings: Default (Encode Time - Seconds, Fewer Is Better): AMD AOCC 3.0: 1.351 (SE +/- 0.002, N = 3); Clang 11.0: 1.336 (SE +/- 0.001, N = 3); Clang 12.0: 1.331 (SE +/- 0.001, N = 3); GCC 10.3: 1.372 (SE +/- 0.002, N = 3); GCC 11.0.1: 1.386 (SE +/- 0.000, N = 3); GCC 9.3: 1.397 (SE +/- 0.000, N = 3). Per-configuration extra options (attribution not preserved): -ltiff. 1. (CC) gcc options: -fvisibility=hidden -O3 -march=native -pthread -lm -ljpeg -lpng16
WebP Image Encode 1.1 - Encode Settings: Quality 100 (Encode Time - Seconds, Fewer Is Better): AMD AOCC 3.0: 2.262 (SE +/- 0.001, N = 3); Clang 11.0: 2.240 (SE +/- 0.000, N = 3); Clang 12.0: 2.199 (SE +/- 0.001, N = 3); GCC 10.3: 2.225 (SE +/- 0.010, N = 3); GCC 11.0.1: 2.274 (SE +/- 0.007, N = 3); GCC 9.3: 2.273 (SE +/- 0.005, N = 3). Per-configuration extra options (attribution not preserved): -ltiff. 1. (CC) gcc options: -fvisibility=hidden -O3 -march=native -pthread -lm -ljpeg -lpng16
WebP Image Encode 1.1 - Encode Settings: Quality 100, Lossless (Encode Time - Seconds, Fewer Is Better): AMD AOCC 3.0: 19.13 (SE +/- 0.03, N = 3); Clang 11.0: 18.57 (SE +/- 0.13, N = 3); Clang 12.0: 19.02 (SE +/- 0.02, N = 3); GCC 10.3: 18.88 (SE +/- 0.05, N = 3); GCC 11.0.1: 18.31 (SE +/- 0.04, N = 3); GCC 9.3: 19.30 (SE +/- 0.13, N = 3). Per-configuration extra options (attribution not preserved): -ltiff. 1. (CC) gcc options: -fvisibility=hidden -O3 -march=native -pthread -lm -ljpeg -lpng16
WebP Image Encode 1.1 - Encode Settings: Quality 100, Highest Compression (Encode Time - Seconds, Fewer Is Better): AMD AOCC 3.0: 6.578 (SE +/- 0.009, N = 3); Clang 11.0: 6.243 (SE +/- 0.018, N = 3); Clang 12.0: 6.309 (SE +/- 0.004, N = 3); GCC 10.3: 7.078 (SE +/- 0.006, N = 3); GCC 11.0.1: 7.003 (SE +/- 0.021, N = 3); GCC 9.3: 7.053 (SE +/- 0.009, N = 3). Per-configuration extra options (attribution not preserved): -ltiff. 1. (CC) gcc options: -fvisibility=hidden -O3 -march=native -pthread -lm -ljpeg -lpng16
WebP Image Encode 1.1 - Encode Settings: Quality 100, Lossless, Highest Compression (Encode Time - Seconds, Fewer Is Better): AMD AOCC 3.0: 38.34 (SE +/- 0.05, N = 3); Clang 11.0: 37.73 (SE +/- 0.08, N = 3); Clang 12.0: 38.45 (SE +/- 0.07, N = 3); GCC 10.3: 38.55 (SE +/- 0.06, N = 3); GCC 11.0.1: 37.95 (SE +/- 0.07, N = 3); GCC 9.3: 39.07 (SE +/- 0.03, N = 3). Per-configuration extra options (attribution not preserved): -ltiff. 1. (CC) gcc options: -fvisibility=hidden -O3 -march=native -pthread -lm -ljpeg -lpng16
toyBrot Fractal Generator 2020-11-18 - Implementation: TBB (ms, Fewer Is Better): AMD AOCC 3.0: 6945 (SE +/- 52.54, N = 3); Clang 11.0: 6247 (SE +/- 67.11, N = 7); Clang 12.0: 6780 (SE +/- 87.21, N = 3); Clang 12.0 LTO: 7085 (SE +/- 86.43, N = 3); GCC 10.3: 5181 (SE +/- 67.68, N = 3); GCC 9.3: 5107 (SE +/- 74.84, N = 3). Per-configuration extra options (attribution not preserved): -lm -lgcc -lgcc_s -lc -lm -lgcc -lgcc_s -lc -lm -lgcc -lgcc_s -lc -flto -lm -lgcc -lgcc_s -lc -lm -lgcc -lgcc_s -lc. 1. (CXX) g++ options: -O3 -march=native -lpthread
toyBrot Fractal Generator 2020-11-18 - Implementation: OpenMP (ms, Fewer Is Better): AMD AOCC 3.0: 7477 (SE +/- 22.73, N = 3); Clang 11.0: 7029 (SE +/- 20.42, N = 3); Clang 12.0: 7507 (SE +/- 14.89, N = 3); GCC 10.3: 5524 (SE +/- 2.60, N = 3); GCC 9.3: 5451 (SE +/- 3.18, N = 3). 1. (CXX) g++ options: -O3 -march=native -lpthread -lm -lgcc -lgcc_s -lc
toyBrot Fractal Generator 2020-11-18 - Implementation: C++ Tasks (ms, Fewer Is Better): AMD AOCC 3.0: 7189 (SE +/- 41.46, N = 3); Clang 11.0: 6836 (SE +/- 7.31, N = 3); Clang 12.0: 7437 (SE +/- 33.67, N = 3); Clang 12.0 LTO: 7367 (SE +/- 17.21, N = 3); GCC 10.3: 5610 (SE +/- 31.52, N = 3); GCC 9.3: 5414 (SE +/- 49.08, N = 3). Per-configuration extra options (attribution not preserved): -lm -lgcc -lgcc_s -lc -lm -lgcc -lgcc_s -lc -lm -lgcc -lgcc_s -lc -flto -lm -lgcc -lgcc_s -lc -lm -lgcc -lgcc_s -lc. 1. (CXX) g++ options: -O3 -march=native -lpthread
toyBrot Fractal Generator 2020-11-18 - Implementation: C++ Threads (ms, Fewer Is Better): AMD AOCC 3.0: 7144 (SE +/- 24.26, N = 3); Clang 11.0: 6395 (SE +/- 25.04, N = 3); Clang 12.0: 7220 (SE +/- 30.90, N = 3); Clang 12.0 LTO: 7143 (SE +/- 15.06, N = 3); GCC 10.3: 5383 (SE +/- 6.12, N = 3); GCC 9.3: 5142 (SE +/- 8.33, N = 3). Per-configuration extra options (attribution not preserved): -lm -lgcc -lgcc_s -lc -lm -lgcc -lgcc_s -lc -lm -lgcc -lgcc_s -lc -flto -lm -lgcc -lgcc_s -lc -lm -lgcc -lgcc_s -lc. 1. (CXX) g++ options: -O3 -march=native -lpthread
oneDNN 2.1.2 - Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU (ms, Fewer Is Better): AMD AOCC 3.0: 1.03899 (SE +/- 0.00160, N = 3; MIN: 0.99; -fopenmp=libomp); Clang 11.0: 1.08011 (SE +/- 0.00127, N = 3; MIN: 1.03; -fopenmp=libomp); Clang 12.0: 1.07701 (SE +/- 0.00199, N = 3; MIN: 1.04; -fopenmp=libomp); GCC 10.3: 1.17894 (SE +/- 0.00296, N = 3; MIN: 1.12; -fopenmp); GCC 9.3: 1.17486 (SE +/- 0.00349, N = 3; MIN: 1.12; -fopenmp). 1. (CXX) g++ options: -O3 -march=native -std=c++11 -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 2.1.2 - Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU (ms, Fewer Is Better): AMD AOCC 3.0: 3.41583 (SE +/- 0.02018, N = 3; MIN: 3.24; -fopenmp=libomp); Clang 11.0: 3.52787 (SE +/- 0.04735, N = 3; MIN: 3.29; -fopenmp=libomp); Clang 12.0: 3.28507 (SE +/- 0.01639, N = 3; MIN: 3.15; -fopenmp=libomp); GCC 10.3: 3.61144 (SE +/- 0.02637, N = 3; MIN: 3.37; -fopenmp); GCC 9.3: 3.67278 (SE +/- 0.03246, N = 3; MIN: 3.39; -fopenmp). 1. (CXX) g++ options: -O3 -march=native -std=c++11 -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 2.1.2 - Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better): AMD AOCC 3.0: 1.04484 (SE +/- 0.00668, N = 3; MIN: 0.83; -fopenmp=libomp); Clang 11.0: 1.07577 (SE +/- 0.00395, N = 3; MIN: 0.86; -fopenmp=libomp); Clang 12.0: 1.07507 (SE +/- 0.00286, N = 3; MIN: 0.87; -fopenmp=libomp); GCC 10.3: 1.19747 (SE +/- 0.00438, N = 3; MIN: 0.98; -fopenmp); GCC 9.3: 1.17434 (SE +/- 0.00597, N = 3; MIN: 0.96; -fopenmp). 1. (CXX) g++ options: -O3 -march=native -std=c++11 -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 2.1.2 - Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better): AMD AOCC 3.0: 0.554231 (SE +/- 0.000764, N = 3; MIN: 0.5; -fopenmp=libomp); Clang 11.0: 0.594729 (SE +/- 0.008914, N = 3; MIN: 0.53; -fopenmp=libomp); Clang 12.0: 0.710124 (SE +/- 0.011383, N = 3; MIN: 0.64; -fopenmp=libomp); GCC 10.3: 0.646252 (SE +/- 0.003112, N = 3; MIN: 0.6; -fopenmp); GCC 9.3: 0.654010 (SE +/- 0.003317, N = 3; MIN: 0.59; -fopenmp). 1. (CXX) g++ options: -O3 -march=native -std=c++11 -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 2.1.2 - Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU (ms, Fewer Is Better): AMD AOCC 3.0: 0.833921 (SE +/- 0.000645, N = 3; MIN: 0.81; -fopenmp=libomp); Clang 11.0: 0.841169 (SE +/- 0.000480, N = 3; MIN: 0.82; -fopenmp=libomp); Clang 12.0: 1.221320 (SE +/- 0.018279, N = 4; MIN: 1.13; -fopenmp=libomp); GCC 10.3: 0.870784 (SE +/- 0.001247, N = 3; MIN: 0.84; -fopenmp); GCC 9.3: 0.869308 (SE +/- 0.001032, N = 3; MIN: 0.84; -fopenmp). 1. (CXX) g++ options: -O3 -march=native -std=c++11 -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 2.1.2 - Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU (ms, Fewer Is Better): AMD AOCC 3.0: 1.37059 (SE +/- 0.00485, N = 3; MIN: 1.28; -fopenmp=libomp); Clang 11.0: 1.45757 (SE +/- 0.00568, N = 3; MIN: 1.35; -fopenmp=libomp); Clang 12.0: 1.44425 (SE +/- 0.00123, N = 3; MIN: 1.34; -fopenmp=libomp); GCC 10.3: 7.23686 (SE +/- 0.03687, N = 3; MIN: 6.18; -fopenmp); GCC 9.3: 7.19213 (SE +/- 0.02683, N = 3; MIN: 6.14; -fopenmp). 1. (CXX) g++ options: -O3 -march=native -std=c++11 -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 2.1.2 - Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU (ms, Fewer Is Better): AMD AOCC 3.0: 2.28755 (SE +/- 0.00564, N = 3; MIN: 1.91; -fopenmp=libomp); Clang 11.0: 2.31859 (SE +/- 0.02389, N = 3; MIN: 1.92; -fopenmp=libomp); Clang 12.0: 2.36797 (SE +/- 0.02100, N = 3; MIN: 2.01; -fopenmp=libomp); GCC 10.3: 3.00341 (SE +/- 0.00883, N = 3; MIN: 2.35; -fopenmp); GCC 9.3: 2.99759 (SE +/- 0.00845, N = 3; MIN: 2.24; -fopenmp). 1. (CXX) g++ options: -O3 -march=native -std=c++11 -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 2.1.2 - Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better): AMD AOCC 3.0: 1.59597 (SE +/- 0.00195, N = 3; MIN: 1.54; -fopenmp=libomp); Clang 11.0: 1.60540 (SE +/- 0.00118, N = 3; MIN: 1.55; -fopenmp=libomp); Clang 12.0: 2.03606 (SE +/- 0.01922, N = 12; MIN: 1.81; -fopenmp=libomp); GCC 10.3: 1.64268 (SE +/- 0.00384, N = 3; MIN: 1.58; -fopenmp); GCC 9.3: 1.66260 (SE +/- 0.01150, N = 3; MIN: 1.59; -fopenmp). 1. (CXX) g++ options: -O3 -march=native -std=c++11 -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 2.1.2 - Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better): AMD AOCC 3.0: 0.459724 (SE +/- 0.000365, N = 3; MIN: 0.44; -fopenmp=libomp); Clang 11.0: 0.489278 (SE +/- 0.001652, N = 3; MIN: 0.46; -fopenmp=libomp); Clang 12.0: 0.491940 (SE +/- 0.002843, N = 3; MIN: 0.47; -fopenmp=libomp); GCC 10.3: 0.602155 (SE +/- 0.001964, N = 3; MIN: 0.57; -fopenmp); GCC 9.3: 0.599140 (SE +/- 0.001469, N = 3; MIN: 0.56; -fopenmp). 1. (CXX) g++ options: -O3 -march=native -std=c++11 -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 2.1.2 - Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better): AMD AOCC 3.0: 0.773233 (SE +/- 0.001713, N = 3; MIN: 0.72; -fopenmp=libomp); Clang 11.0: 0.779101 (SE +/- 0.001200, N = 3; MIN: 0.73; -fopenmp=libomp); Clang 12.0: 0.779776 (SE +/- 0.004246, N = 3; MIN: 0.73; -fopenmp=libomp); GCC 10.3: 0.782476 (SE +/- 0.002532, N = 3; MIN: 0.73; -fopenmp); GCC 9.3: 0.786762 (SE +/- 0.002405, N = 3; MIN: 0.75; -fopenmp). 1. (CXX) g++ options: -O3 -march=native -std=c++11 -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 2.1.2 - Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU (ms, Fewer Is Better): AMD AOCC 3.0: 1267.18 (SE +/- 5.94, N = 3; MIN: 1248.35; -fopenmp=libomp); Clang 11.0: 1276.04 (SE +/- 9.46, N = 3; MIN: 1249.65; -fopenmp=libomp); Clang 12.0: 1302.70 (SE +/- 3.92, N = 3; MIN: 1289.86; -fopenmp=libomp); GCC 10.3: 1382.41 (SE +/- 3.72, N = 3; MIN: 1360.58; -fopenmp); GCC 9.3: 1357.29 (SE +/- 4.44, N = 3; MIN: 1335.63; -fopenmp). 1. (CXX) g++ options: -O3 -march=native -std=c++11 -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 2.1.2 - Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU (ms, Fewer Is Better): AMD AOCC 3.0: 544.10 (SE +/- 0.53, N = 3; MIN: 532.32; -fopenmp=libomp); Clang 11.0: 563.20 (SE +/- 0.83, N = 3; MIN: 550.23; -fopenmp=libomp); Clang 12.0: 593.97 (SE +/- 9.50, N = 3; MIN: 570.44; -fopenmp=libomp); GCC 10.3: 659.27 (SE +/- 0.61, N = 3; MIN: 642.67; -fopenmp); GCC 9.3: 658.66 (SE +/- 0.64, N = 3; MIN: 639.86; -fopenmp). 1. (CXX) g++ options: -O3 -march=native -std=c++11 -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 2.1.2 - Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better): AMD AOCC 3.0: 1259.59 (SE +/- 1.97, N = 3; MIN: 1247.29; -fopenmp=libomp); Clang 11.0: 1277.62 (SE +/- 7.11, N = 3; MIN: 1252.39; -fopenmp=libomp); Clang 12.0: 1307.49 (SE +/- 3.61, N = 3; MIN: 1293.38; -fopenmp=libomp); GCC 10.3: 1379.51 (SE +/- 1.75, N = 3; MIN: 1361.6; -fopenmp); GCC 9.3: 1358.56 (SE +/- 3.05, N = 3; MIN: 1337.17; -fopenmp). 1. (CXX) g++ options: -O3 -march=native -std=c++11 -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 2.1.2 - Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better): AMD AOCC 3.0: 544.31 (SE +/- 0.90, N = 3; MIN: 531.9; -fopenmp=libomp); Clang 11.0: 562.97 (SE +/- 0.25, N = 3; MIN: 551.49; -fopenmp=libomp); Clang 12.0: 590.18 (SE +/- 1.89, N = 3; MIN: 575.41; -fopenmp=libomp); GCC 10.3: 658.28 (SE +/- 0.83, N = 3; MIN: 639.78; -fopenmp); GCC 9.3: 659.19 (SE +/- 1.25, N = 3; MIN: 642.05; -fopenmp). 1. (CXX) g++ options: -O3 -march=native -std=c++11 -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 2.1.2 - Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU (ms, Fewer Is Better): AMD AOCC 3.0: 0.301885 (SE +/- 0.000492, N = 3; MIN: 0.29; -fopenmp=libomp); Clang 11.0: 0.315522 (SE +/- 0.000247, N = 3; MIN: 0.3; -fopenmp=libomp); Clang 12.0: 0.313689 (SE +/- 0.000321, N = 3; MIN: 0.3; -fopenmp=libomp); GCC 10.3: 0.377733 (SE +/- 0.004341, N = 3; MIN: 0.36; -fopenmp); GCC 9.3: 0.376992 (SE +/- 0.000576, N = 3; MIN: 0.36; -fopenmp). 1. (CXX) g++ options: -O3 -march=native -std=c++11 -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 2.1.2 - Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better): AMD AOCC 3.0: 1268.08 (SE +/- 0.58, N = 3; MIN: 1257.35; -fopenmp=libomp); Clang 11.0: 1271.91 (SE +/- 9.75, N = 3; MIN: 1252.33; -fopenmp=libomp); Clang 12.0: 1305.10 (SE +/- 1.78, N = 3; MIN: 1294.76; -fopenmp=libomp); GCC 10.3: 1375.71 (SE +/- 2.65, N = 3; MIN: 1355.68; -fopenmp); GCC 9.3: 1356.91 (SE +/- 4.57, N = 3; MIN: 1335.04; -fopenmp). 1. (CXX) g++ options: -O3 -march=native -std=c++11 -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 2.1.2 - Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better): AMD AOCC 3.0: 544.60 (SE +/- 0.62, N = 3; MIN: 532.91; -fopenmp=libomp); Clang 11.0: 563.25 (SE +/- 0.10, N = 3; MIN: 551.31; -fopenmp=libomp); Clang 12.0: 597.48 (SE +/- 3.02, N = 3; MIN: 580.8; -fopenmp=libomp); GCC 10.3: 658.04 (SE +/- 1.86, N = 3; MIN: 635.78; -fopenmp); GCC 9.3: 657.88 (SE +/- 0.52, N = 3; MIN: 638.35; -fopenmp). 1. (CXX) g++ options: -O3 -march=native -std=c++11 -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 2.1.2 - Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better): AMD AOCC 3.0: 1.170440 (SE +/- 0.004625, N = 3; MIN: 1.11; -fopenmp=libomp); Clang 11.0: 1.151400 (SE +/- 0.006530, N = 3; MIN: 1.09; -fopenmp=libomp); Clang 12.0: 1.172580 (SE +/- 0.004576, N = 3; MIN: 1.12; -fopenmp=libomp); GCC 10.3: 0.788192 (SE +/- 0.005622, N = 3; MIN: 0.74; -fopenmp); GCC 9.3: 0.717782 (SE +/- 0.003430, N = 3; MIN: 0.67; -fopenmp). 1. (CXX) g++ options: -O3 -march=native -std=c++11 -msse4.1 -fPIC -pie -lpthread -ldl
FinanceBench 2016-07-25 - Benchmark: Repo OpenMP (ms, Fewer Is Better): AMD AOCC 3.0: 33146.03 (SE +/- 9.32, N = 3); Clang 11.0: 33178.50 (SE +/- 0.81, N = 3); Clang 12.0: 33246.84 (SE +/- 64.93, N = 3); GCC 10.3: 34979.29 (SE +/- 102.23, N = 3); GCC 11.0.1: 34199.60 (SE +/- 3.94, N = 3); GCC 9.3: 42399.81 (SE +/- 453.41, N = 14). 1. (CXX) g++ options: -O3 -march=native -fopenmp
FinanceBench 2016-07-25 - Benchmark: Bonds OpenMP (ms, Fewer Is Better): AMD AOCC 3.0: 51885.52 (SE +/- 242.64, N = 3); Clang 11.0: 51900.43 (SE +/- 4.51, N = 3); Clang 12.0: 51596.87 (SE +/- 10.95, N = 3); GCC 10.3: 51770.51 (SE +/- 23.55, N = 3); GCC 11.0.1: 51376.82 (SE +/- 42.62, N = 3); GCC 9.3: 76805.58 (SE +/- 971.24, N = 3). 1. (CXX) g++ options: -O3 -march=native -fopenmp
PostgreSQL pgbench 13.0 - Scaling Factor: 100 - Clients: 1 - Mode: Read Only - Average Latency (ms, Fewer Is Better): Clang 11.0: 0.040 (SE +/- 0.001, N = 3); Clang 12.0: 0.041 (SE +/- 0.001, N = 3); GCC 10.3: 0.040 (SE +/- 0.000, N = 3); GCC 11.0.1: 0.042 (SE +/- 0.001, N = 15); GCC 9.3: 0.042 (SE +/- 0.000, N = 3). 1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=native -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm
PostgreSQL pgbench 13.0 - Scaling Factor: 100 - Clients: 1 - Mode: Read Write - Average Latency (ms, Fewer Is Better): Clang 11.0: 0.302 (SE +/- 0.002, N = 3); Clang 12.0: 0.305 (SE +/- 0.000, N = 3); GCC 10.3: 0.297 (SE +/- 0.001, N = 3); GCC 11.0.1: 0.296 (SE +/- 0.002, N = 3); GCC 9.3: 0.303 (SE +/- 0.001, N = 3). 1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=native -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm
PostgreSQL pgbench 13.0 - Scaling Factor: 100 - Clients: 100 - Mode: Read Only - Average Latency (ms, Fewer Is Better): Clang 11.0: 0.094 (SE +/- 0.000, N = 3); Clang 12.0: 0.094 (SE +/- 0.000, N = 3); GCC 10.3: 0.093 (SE +/- 0.000, N = 3); GCC 11.0.1: 0.092 (SE +/- 0.000, N = 3); GCC 9.3: 0.095 (SE +/- 0.000, N = 3). 1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=native -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm
PostgreSQL pgbench 13.0 - Scaling Factor: 100 - Clients: 250 - Mode: Read Only - Average Latency (ms, Fewer Is Better): Clang 11.0: 0.235 (SE +/- 0.003, N = 3); Clang 12.0: 0.234 (SE +/- 0.001, N = 3); GCC 10.3: 0.230 (SE +/- 0.002, N = 3); GCC 11.0.1: 0.230 (SE +/- 0.002, N = 3); GCC 9.3: 0.235 (SE +/- 0.002, N = 3). 1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=native -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm
PostgreSQL pgbench 13.0 - Scaling Factor: 100 - Clients: 100 - Mode: Read Write - Average Latency (ms, Fewer Is Better): Clang 11.0: 1.626 (SE +/- 0.011, N = 3); Clang 12.0: 1.607 (SE +/- 0.004, N = 3); GCC 10.3: 1.701 (SE +/- 0.013, N = 3); GCC 11.0.1: 1.777 (SE +/- 0.029, N = 3); GCC 9.3: 1.688 (SE +/- 0.028, N = 3). 1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=native -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm
PostgreSQL pgbench 13.0 - Scaling Factor: 100 - Clients: 250 - Mode: Read Write - Average Latency (ms, Fewer Is Better): Clang 11.0: 4.603 (SE +/- 0.074, N = 3); Clang 12.0: 4.431 (SE +/- 0.054, N = 15); GCC 10.3: 4.731 (SE +/- 0.052, N = 7); GCC 11.0.1: 4.722 (SE +/- 0.021, N = 3); GCC 9.3: 4.657 (SE +/- 0.034, N = 3). 1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=native -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm
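The pgbench latency results above largely restate the TPS results: with a fixed number of clients, average latency is roughly clients / TPS. The sketch below checks that relation for the Clang 11.0 column, using values copied from the TPS and Average Latency results. The function names are hypothetical, and small mismatches are expected because each figure is an average over several runs.

```python
# (clients, mode, TPS, reported average latency in ms) for Clang 11.0,
# copied from the pgbench TPS and Average Latency results above.
CLANG_11_RUNS = [
    (1, "Read Only", 24943, 0.040),
    (1, "Read Write", 3312, 0.302),
    (100, "Read Only", 1069367, 0.094),
    (250, "Read Only", 1065506, 0.235),
    (100, "Read Write", 61616, 1.626),
    (250, "Read Write", 54488, 4.603),
]


def expected_latency_ms(clients, tps):
    """Average latency implied by throughput when all clients stay busy."""
    return 1000.0 * clients / tps


def compare(runs):
    """Return (clients, mode, expected ms, reported ms, relative difference)."""
    rows = []
    for clients, mode, tps, reported in runs:
        expected = expected_latency_ms(clients, tps)
        rows.append((clients, mode, expected, reported,
                     (expected - reported) / reported))
    return rows


for clients, mode, expected, reported, rel in compare(CLANG_11_RUNS):
    print(f"{clients:3d} clients, {mode:10s}: expected {expected:.3f} ms, "
          f"reported {reported:.3f} ms ({rel:+.1%})")
```

Because the two metrics track each other this closely, the latency charts add little independent information beyond the TPS charts.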
Timed MrBayes Analysis 3.2.7 - Primate Phylogeny Analysis (Seconds, Fewer Is Better): AMD AOCC 3.0: 86.74 (SE +/- 0.26, N = 3); Clang 11.0: 88.62 (SE +/- 0.98, N = 3); Clang 12.0: 89.12 (SE +/- 0.98, N = 3); Clang 12.0 LTO: 93.63 (SE +/- 1.09, N = 3); GCC 10.3: 93.66 (SE +/- 1.29, N = 4); GCC 11.0.1: 89.43 (SE +/- 0.33, N = 3); GCC 9.3: 89.16 (SE +/- 0.16, N = 3). Per-configuration extra options (attribution not preserved): -flto -mabm -mabm -mabm. 1. (CC) gcc options: -mmmx -msse -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msse4a -msha -maes -mavx -mfma -mavx2 -mrdrnd -mbmi -mbmi2 -madx -O3 -std=c99 -pedantic -march=native -lm
libavif avifenc 0.9.0 - Encoder Speed: 0 (Seconds, Fewer Is Better): AMD AOCC 3.0: 48.13 (SE +/- 0.06, N = 3); Clang 11.0: 47.89 (SE +/- 0.07, N = 3); Clang 12.0: 47.88 (SE +/- 0.04, N = 3); GCC 10.3: 51.45 (SE +/- 0.05, N = 3); GCC 11.0.1: 51.03 (SE +/- 0.07, N = 3); GCC 9.3: 52.22 (SE +/- 0.08, N = 3). 1. (CXX) g++ options: -O3 -fPIC -lm
libavif avifenc 0.9.0 - Encoder Speed: 2 (Seconds, Fewer Is Better): AMD AOCC 3.0: 25.60 (SE +/- 0.01, N = 3); Clang 11.0: 25.47 (SE +/- 0.06, N = 3); Clang 12.0: 25.18 (SE +/- 0.06, N = 3); GCC 10.3: 27.39 (SE +/- 0.07, N = 3); GCC 11.0.1: 27.10 (SE +/- 0.03, N = 3); GCC 9.3: 27.78 (SE +/- 0.04, N = 3). 1. (CXX) g++ options: -O3 -fPIC -lm
libavif avifenc 0.9.0 - Encoder Speed: 6 (Seconds, Fewer Is Better): AMD AOCC 3.0: 9.725 (SE +/- 0.016, N = 3); Clang 11.0: 9.536 (SE +/- 0.022, N = 3); Clang 12.0: 9.510 (SE +/- 0.014, N = 3); GCC 10.3: 10.417 (SE +/- 0.032, N = 3); GCC 11.0.1: 10.291 (SE +/- 0.052, N = 3); GCC 9.3: 10.399 (SE +/- 0.031, N = 3). 1. (CXX) g++ options: -O3 -fPIC -lm
libavif avifenc 0.9.0 - Encoder Speed: 10 (Seconds, Fewer Is Better): AMD AOCC 3.0: 3.543 (SE +/- 0.004, N = 3); Clang 11.0: 3.429 (SE +/- 0.010, N = 3); Clang 12.0: 3.361 (SE +/- 0.014, N = 3); GCC 10.3: 3.643 (SE +/- 0.022, N = 3); GCC 11.0.1: 3.607 (SE +/- 0.002, N = 3); GCC 9.3: 3.659 (SE +/- 0.016, N = 3). 1. (CXX) g++ options: -O3 -fPIC -lm
libavif avifenc 0.9.0 - Encoder Speed: 6, Lossless (Seconds, Fewer Is Better): AMD AOCC 3.0: 25.78 (SE +/- 0.06, N = 3); Clang 11.0: 26.03 (SE +/- 0.22, N = 3); Clang 12.0: 25.22 (SE +/- 0.04, N = 3); GCC 10.3: 26.91 (SE +/- 0.04, N = 3); GCC 11.0.1: 27.06 (SE +/- 0.05, N = 3); GCC 9.3: 29.08 (SE +/- 0.06, N = 3). 1. (CXX) g++ options: -O3 -fPIC -lm
libavif avifenc 0.9.0 - Encoder Speed: 10, Lossless (Seconds, Fewer Is Better): AMD AOCC 3.0: 5.948 (SE +/- 0.022, N = 3); Clang 11.0: 5.879 (SE +/- 0.011, N = 3); Clang 12.0: 5.746 (SE +/- 0.013, N = 3); GCC 10.3: 6.107 (SE +/- 0.007, N = 3); GCC 11.0.1: 6.149 (SE +/- 0.017, N = 3); GCC 9.3: 6.131 (SE +/- 0.022, N = 3). 1. (CXX) g++ options: -O3 -fPIC -lm
C-Ray 1.1 - Total Time - 4K, 16 Rays Per Pixel (Seconds, Fewer Is Better): AMD AOCC 3.0: 15.649 (SE +/- 0.063, N = 3); Clang 11.0: 15.599 (SE +/- 0.009, N = 3); Clang 12.0: 15.870 (SE +/- 0.023, N = 3); GCC 10.3: 9.029 (SE +/- 0.014, N = 3); GCC 11.0.1: 9.227 (SE +/- 0.027, N = 3); GCC 9.3: 9.158 (SE +/- 0.014, N = 3). 1. (CC) gcc options: -lm -lpthread -O3 -march=native
POV-Ray 3.7.0.7 - Trace Time (Seconds, Fewer Is Better): AMD AOCC 3.0: 9.494 (SE +/- 0.026, N = 3); Clang 11.0: 9.408 (SE +/- 0.032, N = 3); Clang 12.0: 9.296 (SE +/- 0.041, N = 3); GCC 10.3: 9.570 (SE +/- 0.049, N = 3); GCC 9.3: 9.968 (SE +/- 0.053, N = 3). 1. (CXX) g++ options: -pipe -O3 -ffast-math -march=native -pthread -lSDL -lXpm -lSM -lICE -lX11 -lIlmImf -lImath -lHalf -lIex -lIexMath -lIlmThread -lpthread -ltiff -ljpeg -lpng -lz -lrt -lm -lboost_thread -lboost_system
FLAC Audio Encoding 1.3.2 - WAV To FLAC (Seconds, Fewer Is Better): AMD AOCC 3.0: 9.280 (SE +/- 0.006, N = 5); Clang 11.0: 7.979 (SE +/- 0.006, N = 5); Clang 12.0: 7.854 (SE +/- 0.007, N = 5); GCC 10.3: 8.567 (SE +/- 0.008, N = 5); GCC 11.0.1: 8.709 (SE +/- 0.006, N = 5); GCC 9.3: 8.534 (SE +/- 0.011, N = 5). Per-configuration extra options (attribution not preserved): -fvisibility=hidden -fvisibility=hidden -fvisibility=hidden. 1. (CXX) g++ options: -O3 -march=native -logg -lm
LAME MP3 Encoding 3.100 - WAV To MP3 (Seconds, Fewer Is Better): AMD AOCC 3.0: 8.142 (SE +/- 0.008, N = 3); Clang 11.0: 8.250 (SE +/- 0.021, N = 3); Clang 12.0: 8.256 (SE +/- 0.003, N = 3); GCC 10.3: 7.231 (SE +/- 0.006, N = 3); GCC 11.0.1: 7.473 (SE +/- 0.005, N = 3); GCC 9.3: 7.011 (SE +/- 0.019, N = 3). Per-configuration extra options (attribution not preserved): -ffast-math -funroll-loops -fschedule-insns2 -fbranch-count-reg -fforce-addr -ffast-math -funroll-loops -fschedule-insns2 -fbranch-count-reg -fforce-addr -ffast-math -funroll-loops -fschedule-insns2 -fbranch-count-reg -fforce-addr. 1. (CC) gcc options: -O3 -pipe -march=native -lncurses -lm
Opus Codec Encoding 1.3.1 - WAV To Opus Encode (Seconds, Fewer Is Better): Clang 11.0: 7.392 (SE +/- 0.002, N = 5); Clang 12.0: 7.567 (SE +/- 0.013, N = 5); GCC 10.3: 7.469 (SE +/- 0.003, N = 5); GCC 11.0.1: 7.381 (SE +/- 0.002, N = 5); GCC 9.3: 7.504 (SE +/- 0.002, N = 5). Per-configuration extra options (attribution not preserved): -fvisibility=hidden -fvisibility=hidden -fvisibility=hidden. 1. (CXX) g++ options: -O3 -march=native -logg -lm
Gcrypt Library 1.9 (Seconds, Fewer Is Better): AMD AOCC 3.0: 240.41 (SE +/- 0.82, N = 3); Clang 11.0: 240.21 (SE +/- 0.28, N = 3); Clang 12.0: 236.92 (SE +/- 0.44, N = 3); GCC 10.3: 231.24 (SE +/- 0.32, N = 3); GCC 11.0.1: 233.51 (SE +/- 0.18, N = 3); GCC 9.3: 232.57 (SE +/- 0.54, N = 3). 1. (CC) gcc options: -O3 -march=native -fvisibility=hidden
Ngspice 34 - Circuit: C2670 (Seconds, Fewer Is Better): AMD AOCC 3.0: 103.93 (SE +/- 0.22, N = 3); Clang 11.0: 103.83 (SE +/- 0.06, N = 3); Clang 12.0: 118.87 (SE +/- 0.53, N = 3); GCC 10.3: 103.60 (SE +/- 0.48, N = 3); GCC 11.0.1: 103.01 (SE +/- 1.53, N = 3); GCC 9.3: 101.54 (SE +/- 1.32, N = 3). Per-configuration extra options (attribution not preserved): -lstdc++ -lstdc++ -lstdc++ -lstdc++. 1. (CC) gcc options: -O3 -march=native -fopenmp -lm -lfftw3 -lXaw -lXmu -lXt -lXext -lX11 -lSM -lICE
Ngspice 34 - Circuit: C7552 (Seconds, Fewer Is Better): AMD AOCC 3.0: 91.99 (SE +/- 0.12, N = 3); Clang 11.0: 90.53 (SE +/- 1.37, N = 3); Clang 12.0: 95.96 (SE +/- 1.11, N = 6); GCC 10.3: 90.43 (SE +/- 0.12, N = 3); GCC 11.0.1: 90.26 (SE +/- 0.43, N = 3); GCC 9.3: 89.09 (SE +/- 0.60, N = 3). Per-configuration extra options (attribution not preserved): -lstdc++ -lstdc++ -lstdc++ -lstdc++. 1. (CC) gcc options: -O3 -march=native -fopenmp -lm -lfftw3 -lXaw -lXmu -lXt -lXext -lX11 -lSM -lICE
Tachyon 0.99b6 - Total Time (Seconds, Fewer Is Better): AMD AOCC 3.0: 16.06 (SE +/- 0.01, N = 3); Clang 11.0: 16.41 (SE +/- 0.02, N = 3); Clang 12.0: 16.05 (SE +/- 0.06, N = 3); GCC 10.3: 16.15 (SE +/- 0.02, N = 3); GCC 11.0.1: 15.50 (SE +/- 0.02, N = 3); GCC 9.3: 15.68 (SE +/- 0.02, N = 3). 1. (CC) gcc options: -m64 -O3 -fomit-frame-pointer -ffast-math -ltachyon -lm -lpthread
WebP2 Image Encode 20210126 - Encode Settings: Default (Seconds, Fewer Is Better): AMD AOCC 3.0: 2.816 (SE +/- 0.010, N = 3); Clang 11.0: 2.743 (SE +/- 0.031, N = 3); Clang 12.0: 2.739 (SE +/- 0.027, N = 3); GCC 10.3: 2.918 (SE +/- 0.032, N = 7); GCC 9.3: 2.778 (SE +/- 0.038, N = 3). 1. (CXX) g++ options: -O3 -march=native -fno-rtti -rdynamic -lpthread -ljpeg -lgif -lwebp -lwebpdemux
WebP2 Image Encode 20210126 - Encode Settings: Quality 75, Compression Effort 7 (Seconds, Fewer Is Better): AMD AOCC 3.0: 109.81 (SE +/- 0.09, N = 3); Clang 11.0: 109.64 (SE +/- 0.10, N = 3); Clang 12.0: 109.53 (SE +/- 0.10, N = 3); GCC 10.3: 116.66 (SE +/- 0.16, N = 3); GCC 9.3: 118.45 (SE +/- 0.10, N = 3). 1. (CXX) g++ options: -O3 -march=native -fno-rtti -rdynamic -lpthread -ljpeg -lgif -lwebp -lwebpdemux
WebP2 Image Encode 20210126 - Encode Settings: Quality 95, Compression Effort 7
OpenBenchmarking.org - Seconds, Fewer Is Better
  AMD AOCC 3.0    205.03   SE +/- 0.17, N = 3
  Clang 11.0      203.63   SE +/- 0.66, N = 3
  Clang 12.0      207.01   SE +/- 0.07, N = 3
  GCC 10.3        215.57   SE +/- 0.46, N = 3
  GCC 9.3         220.94   SE +/- 1.32, N = 3
  1. (CXX) g++ options: -O3 -march=native -fno-rtti -rdynamic -lpthread -ljpeg -lgif -lwebp -lwebpdemux
WebP2 Image Encode 20210126 - Encode Settings: Quality 100, Compression Effort 5
OpenBenchmarking.org - Seconds, Fewer Is Better
  AMD AOCC 3.0    7.403   SE +/- 0.028, N = 3
  Clang 11.0      7.366   SE +/- 0.022, N = 3
  Clang 12.0      6.690   SE +/- 0.006, N = 3
  GCC 10.3        6.934   SE +/- 0.017, N = 3
  GCC 9.3         6.753   SE +/- 0.017, N = 3
  1. (CXX) g++ options: -O3 -march=native -fno-rtti -rdynamic -lpthread -ljpeg -lgif -lwebp -lwebpdemux
WebP2 Image Encode 20210126 - Encode Settings: Quality 100, Lossless Compression
OpenBenchmarking.org - Seconds, Fewer Is Better
  AMD AOCC 3.0    382.99   SE +/- 0.39, N = 3
  Clang 11.0      392.85   SE +/- 0.17, N = 3
  Clang 12.0      374.04   SE +/- 0.49, N = 3
  GCC 10.3        406.03   SE +/- 3.10, N = 3
  GCC 9.3         388.95   SE +/- 1.92, N = 3
  1. (CXX) g++ options: -O3 -march=native -fno-rtti -rdynamic -lpthread -ljpeg -lgif -lwebp -lwebpdemux
ASTC Encoder 2.4 - Preset: Medium
OpenBenchmarking.org - Seconds, Fewer Is Better
  AMD AOCC 3.0    3.8811   SE +/- 0.0042, N = 3
  Clang 11.0      3.9837   SE +/- 0.0013, N = 3
  Clang 12.0      4.0058   SE +/- 0.0116, N = 3
  GCC 10.3        4.8699   SE +/- 0.0047, N = 3
  GCC 11.0.1      4.8160   SE +/- 0.0099, N = 3
  GCC 9.3         4.8745   SE +/- 0.0035, N = 3
  1. (CXX) g++ options: -O3 -march=native -flto -pthread
ASTC Encoder 2.4 - Preset: Thorough
OpenBenchmarking.org - Seconds, Fewer Is Better
  AMD AOCC 3.0    6.6409   SE +/- 0.0015, N = 3
  Clang 11.0      6.7674   SE +/- 0.0026, N = 3
  Clang 12.0      6.7647   SE +/- 0.0028, N = 3
  GCC 10.3        7.8370   SE +/- 0.0011, N = 3
  GCC 11.0.1      7.6989   SE +/- 0.0034, N = 3
  GCC 9.3         7.8537   SE +/- 0.0029, N = 3
  1. (CXX) g++ options: -O3 -march=native -flto -pthread
ASTC Encoder 2.4 - Preset: Exhaustive
OpenBenchmarking.org - Seconds, Fewer Is Better
  AMD AOCC 3.0    18.91   SE +/- 0.00, N = 3
  Clang 11.0      19.03   SE +/- 0.01, N = 3
  Clang 12.0      18.99   SE +/- 0.01, N = 3
  GCC 10.3        19.46   SE +/- 0.01, N = 3
  GCC 11.0.1      19.62   SE +/- 0.00, N = 3
  GCC 9.3         19.48   SE +/- 0.01, N = 3
  1. (CXX) g++ options: -O3 -march=native -flto -pthread
Phoronix Test Suite v10.8.4