AOCC 4.0 AMD EPYC 9374F 2P Compiler Benchmarks

Tests for a future article by Michael Larabel.

HTML result view exported from: https://openbenchmarking.org/result/2212135-NE-AOCC40AMD35&sro.

Configurations tested: AOCC 4.0, GCC 12.2

Processor: 2 x AMD EPYC 9374F 32-Core @ 4.31GHz (64 Cores / 128 Threads)
Motherboard: AMD Titanite_4G (RTI1002E BIOS)
Chipset: AMD Device 14a4
Memory: 1520GB
Disk: 800GB INTEL SSDPF21Q800GB
Graphics: ASPEED
Monitor: VGA HDMI
Network: Broadcom NetXtreme BCM5720 PCIe
OS: Ubuntu 22.10
Kernel: 5.19.0-26-generic (x86_64)
Desktop: GNOME Shell 43.0
Display Server: X Server 1.21.1.4
Vulkan: 1.3.224
Compiler: Clang 14.0.6 (AOCC 4.0); GCC 12.2.0 (GCC 12.2)
File-System: ext4
Screen Resolution: 1920x1080

Kernel Details
- Transparent Huge Pages: madvise

Environment Details
- CXXFLAGS="-O3 -march=native" CFLAGS="-O3 -march=native"

Compiler Details
- AOCC 4.0: Optimized build with assertions; Default target: x86_64-unknown-linux-gnu; Host CPU: znver4
- GCC 12.2: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-12-U8K4Qv/gcc-12-12.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-12-U8K4Qv/gcc-12-12.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v

Processor Details
- Scaling Governor: amd-pstate performance (Boost: Enabled)
- CPU Microcode: 0xa10110d

Python Details
- Python 3.10.7

Security Details
- itlb_multihit: Not affected
- l1tf: Not affected
- mds: Not affected
- meltdown: Not affected
- mmio_stale_data: Not affected
- retbleed: Not affected
- spec_store_bypass: Mitigation of SSB disabled via prctl
- spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
- spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: always-on RSB filling PBRSB-eIBRS: Not affected
- srbds: Not affected
- tsx_async_abort: Not affected

AOCC 4.0        GCC 12.2        Test
796.930644      779.239837      cryptopp: Keyed Algorithms
534.930083      448.186823      cryptopp: Unkeyed Algorithms
11900           11110           lczero: BLAS
16982           11707           lczero: Eigen
4638.340        4539.557        minibude: OpenMP - BM2 (GFInst/s)
185.534         181.582         minibude: OpenMP - BM2 (Billion Interactions/s)
43.544          42.170          lammps: 20k Atoms
8.54            7.59            simdjson: TopTweet
9.17            7.84            simdjson: PartialTweets
9.55            8.03            simdjson: DistinctUserID
6155.2          6060.7          compress-zstd: 8 - Compression Speed
4722.5          4913.7          compress-zstd: 8 - Decompression Speed
104.6           101.4           compress-zstd: 19 - Compression Speed
4123.1          4203.8          compress-zstd: 19 - Decompression Speed
60.57           54.96           jpegxl-decode: 1
297.49          280.47          jpegxl-decode: All
22.78           21.89           webp: Default
4.86            3.59            webp: Quality 100, Highest Compression
192833333       189500000       srsran: OFDM_Test
476.6           481.0           srsran: 4G PHY_DL_Test 100 PRB MIMO 64-QAM
760             732             graphics-magick: Rotate
1150            864             graphics-magick: Sharpen
1393            1407            graphics-magick: Enhanced
20.98           17.66           aom-av1: Speed 6 Two-Pass - Bosphorus 4K
37.39           36.32           aom-av1: Speed 9 Realtime - Bosphorus 4K
34.23           32.42           kvazaar: Bosphorus 4K - Medium
71.17           66.46           kvazaar: Bosphorus 4K - Very Fast
82.71           82.07           kvazaar: Bosphorus 4K - Ultra Fast
96.552          92.543          svt-av1: Preset 8 - Bosphorus 4K
220.119         187.192         svt-av1: Preset 13 - Bosphorus 4K
147.93          121.95          svt-hevc: 7 - Bosphorus 4K
157.12          125.58          svt-hevc: 10 - Bosphorus 4K
193.13          173.56          svt-vp9: VMAF Optimized - Bosphorus 4K
189.03          169.47          svt-vp9: PSNR/SSIM Optimized - Bosphorus 4K
160.53          120.03          svt-vp9: Visual Quality Optimized - Bosphorus 4K
7.99            7.89            vpxenc: Speed 0 - Bosphorus 4K
19.51           18.99           vpxenc: Speed 5 - Bosphorus 4K
4.962698        4.477606        stargate: 96000 - 512
3.057509        2.813028        stargate: 192000 - 512
5.425232        4.960055        stargate: 96000 - 1024
3.411187        3.221105        stargate: 192000 - 1024
53.154          57.292          avifenc: 0
29.833          31.974          avifenc: 2
2.421           2.573           avifenc: 6
4.342           4.737           avifenc: 6, Lossless
3.256           3.422           avifenc: 10, Lossless
0.459440        4.28328         onednn: IP Shapes 1D - bf16bf16bf16 - CPU
0.278467        1.568020        onednn: IP Shapes 3D - bf16bf16bf16 - CPU
0.320302        0.399774        onednn: Convolution Batch Shapes Auto - bf16bf16bf16 - CPU
1.72218         2.03703         onednn: Deconvolution Batch shapes_1d - bf16bf16bf16 - CPU
0.587671        0.637696        onednn: Deconvolution Batch shapes_3d - bf16bf16bf16 - CPU
524.101         869.007         onednn: Recurrent Neural Network Training - bf16bf16bf16 - CPU
292.766         664.985         onednn: Recurrent Neural Network Inference - bf16bf16bf16 - CPU
0.112382        0.250318        onednn: Matrix Multiply Batch Shapes Transformer - bf16bf16bf16 - CPU
370262          339503          securemark: SecureMark-TLS
122861312743    119198709000    openssl: SHA256
19492.8         19467.8         openssl: RSA4096 (sign/s)
1270745.6       1266612.9       openssl: RSA4096 (verify/s)
2884000000      2675133333      liquid-dsp: 32 - 256 - 57
5559600000      5280200000      liquid-dsp: 64 - 256 - 57
5868866667      5451600000      liquid-dsp: 128 - 256 - 57
457.1693        403.5316        astcenc: Medium
64.0015         62.9653         astcenc: Thorough
32.671          35.003          gpaw: Carbon Nanotube
343.974         229.793         tnn: CPU - MobileNet v2
54.618          52.410          tnn: CPU - SqueezeNet v2
287.194         226.006         tnn: CPU - SqueezeNet v1.1
340106367       308884107       kripke
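
The overview mixes "More Is Better" and "Fewer Is Better" metrics, so raw values cannot be compared directly across rows. The sketch below is one way to normalize them into an AOCC-over-GCC ratio and combine the ratios with a geometric mean. The four rows are copied from this export purely for illustration; the function names and the choice of rows are assumptions, and the printed mean says nothing about the full result set.

```python
from math import prod

# (test, AOCC 4.0, GCC 12.2, higher_is_better)
# Values are copied from this export; the direction flag follows the
# "More Is Better" / "Fewer Is Better" label of each per-test result.
RESULTS = [
    ("cryptopp: Keyed Algorithms", 796.930644, 779.239837, True),
    ("simdjson: TopTweet", 8.54, 7.59, True),
    ("avifenc: 0", 53.154, 57.292, False),
    ("tnn: CPU - MobileNet v2", 343.974, 229.793, False),
]


def aocc_speedup(aocc, gcc, higher_is_better):
    """Return a ratio where > 1 means the AOCC 4.0 build was faster."""
    return aocc / gcc if higher_is_better else gcc / aocc


def geometric_mean(values):
    """Geometric mean, the usual way to average ratios."""
    values = list(values)
    return prod(values) ** (1.0 / len(values))


speedups = {name: aocc_speedup(a, g, hib) for name, a, g, hib in RESULTS}

for name, ratio in speedups.items():
    print(f"{name:32s} {ratio:.3f}x")
print(f"geometric mean of these rows: {geometric_mean(speedups.values()):.3f}x")
```

Feeding all rows of the overview into `RESULTS` with the correct direction flag would give an overall figure; this hand-picked subset does not.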

Crypto++

Test: Keyed Algorithms

Crypto++ 8.2 (MiB/second, More Is Better)
AOCC 4.0: 796.93 (SE +/- 1.21, N = 3)
GCC 12.2: 779.24 (SE +/- 0.35, N = 3)
1. (CXX) g++ options: -O3 -march=native -fPIC -pthread -pipe

Crypto++

Test: Unkeyed Algorithms

Crypto++ 8.2 (MiB/second, More Is Better)
AOCC 4.0: 534.93 (SE +/- 0.08, N = 3)
GCC 12.2: 448.19 (SE +/- 0.18, N = 3)
1. (CXX) g++ options: -O3 -march=native -fPIC -pthread -pipe

LeelaChessZero

Backend: BLAS

LeelaChessZero 0.28 (Nodes Per Second, More Is Better)
AOCC 4.0: 11900 (SE +/- 62.91, N = 3)
GCC 12.2: 11110 (SE +/- 56.50, N = 3)
1. (CXX) g++ options: -flto -O3 -march=native -pthread

LeelaChessZero

Backend: Eigen

LeelaChessZero 0.28 (Nodes Per Second, More Is Better)
AOCC 4.0: 16982 (SE +/- 173.43, N = 9)
GCC 12.2: 11707 (SE +/- 95.82, N = 9)
1. (CXX) g++ options: -flto -O3 -march=native -pthread

miniBUDE

Implementation: OpenMP - Input Deck: BM2

miniBUDE 20210901 (GFInst/s, More Is Better)
AOCC 4.0: 4638.34 (SE +/- 4.03, N = 3)
GCC 12.2: 4539.56 (SE +/- 9.63, N = 3)
1. (CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm

miniBUDE

Implementation: OpenMP - Input Deck: BM2

miniBUDE 20210901 (Billion Interactions/s, More Is Better)
AOCC 4.0: 185.53 (SE +/- 0.16, N = 3)
GCC 12.2: 181.58 (SE +/- 0.39, N = 3)
1. (CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm

LAMMPS Molecular Dynamics Simulator

Model: 20k Atoms

LAMMPS Molecular Dynamics Simulator 23Jun2022 (ns/day, More Is Better)
AOCC 4.0: 43.54 (SE +/- 0.06, N = 3)
GCC 12.2: 42.17 (SE +/- 0.09, N = 3)
1. (CXX) g++ options: -O3 -march=native -lm -ldl

simdjson

Throughput Test: TopTweet

simdjson 2.0 (GB/s, More Is Better)
AOCC 4.0: 8.54 (SE +/- 0.02, N = 3)
GCC 12.2: 7.59 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -O3 -march=native

simdjson

Throughput Test: PartialTweets

simdjson 2.0 (GB/s, More Is Better)
AOCC 4.0: 9.17 (SE +/- 0.05, N = 3)
GCC 12.2: 7.84 (SE +/- 0.06, N = 3)
1. (CXX) g++ options: -O3 -march=native

simdjson

Throughput Test: DistinctUserID

simdjson 2.0 (GB/s, More Is Better)
AOCC 4.0: 9.55 (SE +/- 0.07, N = 3)
GCC 12.2: 8.03 (SE +/- 0.05, N = 3)
1. (CXX) g++ options: -O3 -march=native

Zstd Compression

Compression Level: 8 - Compression Speed

Zstd Compression 1.5.0 (MB/s, More Is Better)
AOCC 4.0: 6155.2 (SE +/- 60.88, N = 3)
GCC 12.2: 6060.7 (SE +/- 65.79, N = 4)
1. (CC) gcc options: -O3 -march=native -pthread -lz -llzma

Zstd Compression

Compression Level: 8 - Decompression Speed

Zstd Compression 1.5.0 (MB/s, More Is Better)
AOCC 4.0: 4722.5 (SE +/- 10.29, N = 3)
GCC 12.2: 4913.7 (SE +/- 29.00, N = 4)
1. (CC) gcc options: -O3 -march=native -pthread -lz -llzma

Zstd Compression

Compression Level: 19 - Compression Speed

Zstd Compression 1.5.0 (MB/s, More Is Better)
AOCC 4.0: 104.6 (SE +/- 1.53, N = 12)
GCC 12.2: 101.4 (SE +/- 0.61, N = 3)
1. (CC) gcc options: -O3 -march=native -pthread -lz -llzma

Zstd Compression

Compression Level: 19 - Decompression Speed

Zstd Compression 1.5.0 (MB/s, More Is Better)
AOCC 4.0: 4123.1 (SE +/- 17.78, N = 12)
GCC 12.2: 4203.8 (SE +/- 53.58, N = 3)
1. (CC) gcc options: -O3 -march=native -pthread -lz -llzma

JPEG XL Decoding libjxl

CPU Threads: 1

JPEG XL Decoding libjxl 0.7 (MP/s, More Is Better)
AOCC 4.0: 60.57 (SE +/- 0.22, N = 3)
GCC 12.2: 54.96 (SE +/- 0.05, N = 3)

JPEG XL Decoding libjxl

CPU Threads: All

JPEG XL Decoding libjxl 0.7 (MP/s, More Is Better)
AOCC 4.0: 297.49 (SE +/- 2.41, N = 3)
GCC 12.2: 280.47 (SE +/- 0.55, N = 3)

WebP Image Encode

Encode Settings: Default

WebP Image Encode 1.2.4 (MP/s, More Is Better)
AOCC 4.0: 22.78 (SE +/- 0.03, N = 3)
GCC 12.2: 21.89 (SE +/- 0.02, N = 3)
1. (CC) gcc options: -fvisibility=hidden -O3 -march=native -lm

WebP Image Encode

Encode Settings: Quality 100, Highest Compression

WebP Image Encode 1.2.4 (MP/s, More Is Better)
AOCC 4.0: 4.86 (SE +/- 0.00, N = 3)
GCC 12.2: 3.59 (SE +/- 0.00, N = 3)
1. (CC) gcc options: -fvisibility=hidden -O3 -march=native -lm

srsRAN

Test: OFDM_Test

srsRAN 22.04.1 (Samples / Second, More Is Better)
AOCC 4.0: 192833333 (SE +/- 1299145.02, N = 3)
GCC 12.2: 189500000 (SE +/- 781024.97, N = 3)
Result-specific option: -latomic
1. (CXX) g++ options: -O3 -march=native -std=c++14 -fno-strict-aliasing -mfpmath=sse -mavx2 -fvisibility=hidden -fno-trapping-math -fno-math-errno -mavx512f -mavx512cd -mavx512bw -mavx512dq -ldl -lpthread -lm

srsRAN

Test: 4G PHY_DL_Test 100 PRB MIMO 64-QAM

srsRAN 22.04.1 (eNb Mb/s, More Is Better)
AOCC 4.0: 476.6 (SE +/- 0.50, N = 3)
GCC 12.2: 481.0 (SE +/- 1.94, N = 3)
Result-specific option: -latomic
1. (CXX) g++ options: -O3 -march=native -std=c++14 -fno-strict-aliasing -mfpmath=sse -mavx2 -fvisibility=hidden -fno-trapping-math -fno-math-errno -mavx512f -mavx512cd -mavx512bw -mavx512dq -ldl -lpthread -lm

GraphicsMagick

Operation: Rotate

GraphicsMagick 1.3.38 (Iterations Per Minute, More Is Better)
AOCC 4.0: 760 (SE +/- 0.67, N = 3)
GCC 12.2: 732 (SE +/- 1.53, N = 3)
1. (CC) gcc options: -fopenmp -O3 -march=native -ljbig -ltiff -lfreetype -ljpeg -lXext -lSM -lICE -lX11 -llzma -lbz2 -lz -lm -lpthread

GraphicsMagick

Operation: Sharpen

GraphicsMagick 1.3.38 (Iterations Per Minute, More Is Better)
AOCC 4.0: 1150 (SE +/- 2.65, N = 3)
GCC 12.2: 864 (SE +/- 3.61, N = 3)
1. (CC) gcc options: -fopenmp -O3 -march=native -ljbig -ltiff -lfreetype -ljpeg -lXext -lSM -lICE -lX11 -llzma -lbz2 -lz -lm -lpthread

GraphicsMagick

Operation: Enhanced

GraphicsMagick 1.3.38 (Iterations Per Minute, More Is Better)
AOCC 4.0: 1393 (SE +/- 6.33, N = 3)
GCC 12.2: 1407 (SE +/- 1.86, N = 3)
1. (CC) gcc options: -fopenmp -O3 -march=native -ljbig -ltiff -lfreetype -ljpeg -lXext -lSM -lICE -lX11 -llzma -lbz2 -lz -lm -lpthread

AOM AV1

Encoder Mode: Speed 6 Two-Pass - Input: Bosphorus 4K

AOM AV1 3.5 (Frames Per Second, More Is Better)
AOCC 4.0: 20.98 (SE +/- 0.37, N = 12)
GCC 12.2: 17.66 (SE +/- 0.18, N = 15)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -U_FORTIFY_SOURCE -lm

AOM AV1

Encoder Mode: Speed 9 Realtime - Input: Bosphorus 4K

AOM AV1 3.5 (Frames Per Second, More Is Better)
AOCC 4.0: 37.39 (SE +/- 0.45, N = 3)
GCC 12.2: 36.32 (SE +/- 0.49, N = 15)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -U_FORTIFY_SOURCE -lm

Kvazaar

Video Input: Bosphorus 4K - Video Preset: Medium

Kvazaar 2.1 (Frames Per Second, More Is Better)
AOCC 4.0: 34.23 (SE +/- 0.00, N = 3)
GCC 12.2: 32.42 (SE +/- 0.10, N = 3)
Result-specific option: -lpthread
1. (CC) gcc options: -pthread -ftree-vectorize -fvisibility=hidden -O3 -march=native -lm -lrt

Kvazaar

Video Input: Bosphorus 4K - Video Preset: Very Fast

Kvazaar 2.1 (Frames Per Second, More Is Better)
AOCC 4.0: 71.17 (SE +/- 0.20, N = 3)
GCC 12.2: 66.46 (SE +/- 0.07, N = 3)
Result-specific option: -lpthread
1. (CC) gcc options: -pthread -ftree-vectorize -fvisibility=hidden -O3 -march=native -lm -lrt

Kvazaar

Video Input: Bosphorus 4K - Video Preset: Ultra Fast

Kvazaar 2.1 (Frames Per Second, More Is Better)
AOCC 4.0: 82.71 (SE +/- 0.41, N = 3)
GCC 12.2: 82.07 (SE +/- 0.55, N = 3)
Result-specific option: -lpthread
1. (CC) gcc options: -pthread -ftree-vectorize -fvisibility=hidden -O3 -march=native -lm -lrt

SVT-AV1

Encoder Mode: Preset 8 - Input: Bosphorus 4K

SVT-AV1 1.4 (Frames Per Second, More Is Better)
AOCC 4.0: 96.55 (SE +/- 0.24, N = 3)
GCC 12.2: 92.54 (SE +/- 1.00, N = 3)
1. (CXX) g++ options: -O3 -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq

SVT-AV1

Encoder Mode: Preset 13 - Input: Bosphorus 4K

SVT-AV1 1.4 (Frames Per Second, More Is Better)
AOCC 4.0: 220.12 (SE +/- 2.79, N = 3)
GCC 12.2: 187.19 (SE +/- 2.73, N = 15)
1. (CXX) g++ options: -O3 -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq

SVT-HEVC

Tuning: 7 - Input: Bosphorus 4K

SVT-HEVC 1.5.0 (Frames Per Second, More Is Better)
AOCC 4.0: 147.93 (SE +/- 1.90, N = 3)
GCC 12.2: 121.95 (SE +/- 1.03, N = 3)
1. (CC) gcc options: -O3 -march=native -fPIE -fPIC -O2 -pie -rdynamic -lpthread -lrt

SVT-HEVC

Tuning: 10 - Input: Bosphorus 4K

SVT-HEVC 1.5.0 (Frames Per Second, More Is Better)
AOCC 4.0: 157.12 (SE +/- 1.46, N = 3)
GCC 12.2: 125.58 (SE +/- 0.86, N = 15)
1. (CC) gcc options: -O3 -march=native -fPIE -fPIC -O2 -pie -rdynamic -lpthread -lrt

SVT-VP9

Tuning: VMAF Optimized - Input: Bosphorus 4K

SVT-VP9 0.3 (Frames Per Second, More Is Better)
AOCC 4.0: 193.13 (SE +/- 1.04, N = 3)
GCC 12.2: 173.56 (SE +/- 1.38, N = 15)
1. (CC) gcc options: -O3 -fcommon -march=native -fPIE -fPIC -fvisibility=hidden -pie -rdynamic -lpthread -lrt -lm

SVT-VP9

Tuning: PSNR/SSIM Optimized - Input: Bosphorus 4K

SVT-VP9 0.3 (Frames Per Second, More Is Better)
AOCC 4.0: 189.03 (SE +/- 1.57, N = 9)
GCC 12.2: 169.47 (SE +/- 1.94, N = 15)
1. (CC) gcc options: -O3 -fcommon -march=native -fPIE -fPIC -fvisibility=hidden -pie -rdynamic -lpthread -lrt -lm

SVT-VP9

Tuning: Visual Quality Optimized - Input: Bosphorus 4K

SVT-VP9 0.3 (Frames Per Second, More Is Better)
AOCC 4.0: 160.53 (SE +/- 1.60, N = 15)
GCC 12.2: 120.03 (SE +/- 0.72, N = 3)
1. (CC) gcc options: -O3 -fcommon -march=native -fPIE -fPIC -fvisibility=hidden -pie -rdynamic -lpthread -lrt -lm

VP9 libvpx Encoding

Speed: Speed 0 - Input: Bosphorus 4K

VP9 libvpx Encoding 1.10.0 (Frames Per Second, More Is Better)
AOCC 4.0: 7.99 (SE +/- 0.06, N = 12)
GCC 12.2: 7.89 (SE +/- 0.06, N = 3)
1. (CXX) g++ options: -m64 -lm -lpthread -O3 -march=native -fPIC -U_FORTIFY_SOURCE -std=gnu++11

VP9 libvpx Encoding

Speed: Speed 5 - Input: Bosphorus 4K

VP9 libvpx Encoding 1.10.0 (Frames Per Second, More Is Better)
AOCC 4.0: 19.51 (SE +/- 0.17, N = 3)
GCC 12.2: 18.99 (SE +/- 0.23, N = 15)
1. (CXX) g++ options: -m64 -lm -lpthread -O3 -march=native -fPIC -U_FORTIFY_SOURCE -std=gnu++11

Stargate Digital Audio Workstation

Sample Rate: 96000 - Buffer Size: 512

Stargate Digital Audio Workstation 22.11.5 (Render Ratio, More Is Better)
AOCC 4.0: 4.962698 (SE +/- 0.008965, N = 3)
GCC 12.2: 4.477606 (SE +/- 0.016850, N = 3)
1. (CXX) g++ options: -lpthread -lsndfile -lm -O3 -march=native -ffast-math -funroll-loops -fstrength-reduce -fstrict-aliasing -finline-functions

Stargate Digital Audio Workstation

Sample Rate: 192000 - Buffer Size: 512

Stargate Digital Audio Workstation 22.11.5 (Render Ratio, More Is Better)
AOCC 4.0: 3.057509 (SE +/- 0.001667, N = 3)
GCC 12.2: 2.813028 (SE +/- 0.012940, N = 3)
1. (CXX) g++ options: -lpthread -lsndfile -lm -O3 -march=native -ffast-math -funroll-loops -fstrength-reduce -fstrict-aliasing -finline-functions

Stargate Digital Audio Workstation

Sample Rate: 96000 - Buffer Size: 1024

Stargate Digital Audio Workstation 22.11.5 (Render Ratio, More Is Better)
AOCC 4.0: 5.425232 (SE +/- 0.005282, N = 3)
GCC 12.2: 4.960055 (SE +/- 0.012443, N = 3)
1. (CXX) g++ options: -lpthread -lsndfile -lm -O3 -march=native -ffast-math -funroll-loops -fstrength-reduce -fstrict-aliasing -finline-functions

Stargate Digital Audio Workstation

Sample Rate: 192000 - Buffer Size: 1024

Stargate Digital Audio Workstation 22.11.5 (Render Ratio, More Is Better)
AOCC 4.0: 3.411187 (SE +/- 0.006797, N = 3)
GCC 12.2: 3.221105 (SE +/- 0.009147, N = 3)
1. (CXX) g++ options: -lpthread -lsndfile -lm -O3 -march=native -ffast-math -funroll-loops -fstrength-reduce -fstrict-aliasing -finline-functions

libavif avifenc

Encoder Speed: 0

libavif avifenc 0.11 (Seconds, Fewer Is Better)
AOCC 4.0: 53.15 (SE +/- 0.15, N = 3)
GCC 12.2: 57.29 (SE +/- 0.49, N = 3)
1. (CXX) g++ options: -O3 -fPIC -march=native -lm

libavif avifenc

Encoder Speed: 2

libavif avifenc 0.11 (Seconds, Fewer Is Better)
AOCC 4.0: 29.83 (SE +/- 0.28, N = 3)
GCC 12.2: 31.97 (SE +/- 0.21, N = 3)
1. (CXX) g++ options: -O3 -fPIC -march=native -lm

libavif avifenc

Encoder Speed: 6

libavif avifenc 0.11 (Seconds, Fewer Is Better)
AOCC 4.0: 2.421 (SE +/- 0.019, N = 3)
GCC 12.2: 2.573 (SE +/- 0.011, N = 3)
1. (CXX) g++ options: -O3 -fPIC -march=native -lm

libavif avifenc

Encoder Speed: 6, Lossless

libavif avifenc 0.11 (Seconds, Fewer Is Better)
AOCC 4.0: 4.342 (SE +/- 0.008, N = 3)
GCC 12.2: 4.737 (SE +/- 0.028, N = 3)
1. (CXX) g++ options: -O3 -fPIC -march=native -lm

libavif avifenc

Encoder Speed: 10, Lossless

libavif avifenc 0.11 (Seconds, Fewer Is Better)
AOCC 4.0: 3.256 (SE +/- 0.004, N = 3)
GCC 12.2: 3.422 (SE +/- 0.011, N = 3)
1. (CXX) g++ options: -O3 -fPIC -march=native -lm

oneDNN

Harness: IP Shapes 1D - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 2.7 (ms, Fewer Is Better)
AOCC 4.0: 0.45944 (SE +/- 0.00415, N = 3)
GCC 12.2: 4.28328 (SE +/- 0.05940, N = 15) -fopenmp - MIN: 2.97
1. (CXX) g++ options: -O3 -march=native -msse4.1 -fPIC -pie -ldl -lpthread

oneDNN

Harness: IP Shapes 3D - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 2.7 (ms, Fewer Is Better)
AOCC 4.0: 0.278467 (SE +/- 0.000429, N = 3) -fopenmp=libomp - MIN: 0.24
GCC 12.2: 1.568020 (SE +/- 0.018545, N = 3) -fopenmp - MIN: 1.2
1. (CXX) g++ options: -O3 -march=native -msse4.1 -fPIC -pie -ldl -lpthread

oneDNN

Harness: Convolution Batch Shapes Auto - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 2.7 (ms, Fewer Is Better)
AOCC 4.0: 0.320302 (SE +/- 0.000615, N = 3) -fopenmp=libomp - MIN: 0.31
GCC 12.2: 0.399774 (SE +/- 0.002555, N = 3) -fopenmp - MIN: 0.36
1. (CXX) g++ options: -O3 -march=native -msse4.1 -fPIC -pie -ldl -lpthread

oneDNN

Harness: Deconvolution Batch shapes_1d - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 2.7 (ms, Fewer Is Better)
AOCC 4.0: 1.72218 (SE +/- 0.00581, N = 3) -fopenmp=libomp - MIN: 1.53
GCC 12.2: 2.03703 (SE +/- 0.00916, N = 3) -fopenmp - MIN: 1.86
1. (CXX) g++ options: -O3 -march=native -msse4.1 -fPIC -pie -ldl -lpthread

oneDNN

Harness: Deconvolution Batch shapes_3d - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 2.7 (ms, Fewer Is Better)
AOCC 4.0: 0.587671 (SE +/- 0.001000, N = 3) -fopenmp=libomp - MIN: 0.54
GCC 12.2: 0.637696 (SE +/- 0.003124, N = 3) -fopenmp - MIN: 0.61
1. (CXX) g++ options: -O3 -march=native -msse4.1 -fPIC -pie -ldl -lpthread

oneDNN

Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 2.7 (ms, Fewer Is Better)
AOCC 4.0: 524.10 (SE +/- 0.35, N = 3) -fopenmp=libomp - MIN: 510.54
GCC 12.2: 869.01 (SE +/- 6.48, N = 3) -fopenmp - MIN: 839.81
1. (CXX) g++ options: -O3 -march=native -msse4.1 -fPIC -pie -ldl -lpthread

oneDNN

Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 2.7 (ms, Fewer Is Better)
AOCC 4.0: 292.77 (SE +/- 0.18, N = 3) -fopenmp=libomp - MIN: 283.8
GCC 12.2: 664.99 (SE +/- 4.69, N = 12) -fopenmp - MIN: 635.26
1. (CXX) g++ options: -O3 -march=native -msse4.1 -fPIC -pie -ldl -lpthread

oneDNN

Harness: Matrix Multiply Batch Shapes Transformer - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 2.7 (ms, Fewer Is Better)
AOCC 4.0: 0.112382 (SE +/- 0.001257, N = 3) -fopenmp=libomp - MIN: 0.1
GCC 12.2: 0.250318 (SE +/- 0.002764, N = 4) -fopenmp - MIN: 0.22
1. (CXX) g++ options: -O3 -march=native -msse4.1 -fPIC -pie -ldl -lpthread

SecureMark

Benchmark: SecureMark-TLS

SecureMark 1.0.4 (marks, More Is Better)
AOCC 4.0: 370262 (SE +/- 398.31, N = 3)
GCC 12.2: 339503 (SE +/- 1573.48, N = 3)
1. (CC) gcc options: -pedantic -O3

OpenSSL

Algorithm: SHA256

OpenSSL 3.0 (byte/s, More Is Better)
AOCC 4.0: 122861312743 (SE +/- 7229089.97, N = 3)
GCC 12.2: 119198709000 (SE +/- 3427994.13, N = 3)
Result-specific option: -Qunused-arguments
1. (CC) gcc options: -pthread -m64 -O3 -march=native -lssl -lcrypto -ldl

OpenSSL

Algorithm: RSA4096

OpenSSL 3.0 (sign/s, More Is Better)
AOCC 4.0: 19492.8 (SE +/- 0.40, N = 3)
GCC 12.2: 19467.8 (SE +/- 18.34, N = 3)
Result-specific option: -Qunused-arguments
1. (CC) gcc options: -pthread -m64 -O3 -march=native -lssl -lcrypto -ldl
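
The RSA4096 sign/s figures above differ by only 25 signs per second while the GCC 12.2 standard error is 18.34, so the gap is small relative to the run-to-run noise. A minimal sketch of one way to check this follows: it divides the difference of two means by their combined standard error. It assumes the reported SE is the standard error of the mean and that the two sets of runs are independent; with N = 3 this is a rough heuristic, not a formal significance test. The function name is hypothetical.

```python
from math import sqrt


def separation(mean_a, se_a, mean_b, se_b):
    """Difference between two means in units of their combined standard error."""
    return abs(mean_a - mean_b) / sqrt(se_a ** 2 + se_b ** 2)


# OpenSSL 3.0 RSA4096 sign/s: AOCC 4.0 vs GCC 12.2 (values from this export)
rsa_sign = separation(19492.8, 0.40, 19467.8, 18.34)

# Crypto++ 8.2 Keyed Algorithms, MiB/second (values from this export)
cryptopp_keyed = separation(796.93, 1.21, 779.24, 0.35)

print(f"RSA4096 sign/s:  {rsa_sign:.2f} combined SEs apart")
print(f"Crypto++ keyed:  {cryptopp_keyed:.2f} combined SEs apart")
```

A value near or below roughly 2 suggests the two compilers are hard to tell apart on that test; a value well above that suggests the gap exceeds the measured noise.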

OpenSSL

Algorithm: RSA4096

OpenSSL 3.0 (verify/s, More Is Better)
AOCC 4.0: 1270745.6 (SE +/- 66.97, N = 3)
GCC 12.2: 1266612.9 (SE +/- 2377.84, N = 3)
Result-specific option: -Qunused-arguments
1. (CC) gcc options: -pthread -m64 -O3 -march=native -lssl -lcrypto -ldl

Liquid-DSP

Threads: 32 - Buffer Length: 256 - Filter Length: 57

Liquid-DSP 2021.01.31 (samples/s, More Is Better)
AOCC 4.0: 2884000000 (SE +/- 4590206.97, N = 3)
GCC 12.2: 2675133333 (SE +/- 12676794.20, N = 3)
1. (CC) gcc options: -O3 -march=native -pthread -lm -lc -lliquid

Liquid-DSP

Threads: 64 - Buffer Length: 256 - Filter Length: 57

Liquid-DSP 2021.01.31 (samples/s, More Is Better)
AOCC 4.0: 5559600000 (SE +/- 7275300.68, N = 3)
GCC 12.2: 5280200000 (SE +/- 18956353.38, N = 3)
1. (CC) gcc options: -O3 -march=native -pthread -lm -lc -lliquid

Liquid-DSP

Threads: 128 - Buffer Length: 256 - Filter Length: 57

Liquid-DSP 2021.01.31 (samples/s, More Is Better)
AOCC 4.0: 5868866667 (SE +/- 2562117.18, N = 3)
GCC 12.2: 5451600000 (SE +/- 5550075.07, N = 3)
1. (CC) gcc options: -O3 -march=native -pthread -lm -lc -lliquid

ASTC Encoder

Preset: Medium

ASTC Encoder 4.0 (MT/s, More Is Better)
AOCC 4.0: 457.17 (SE +/- 0.25, N = 3)
GCC 12.2: 403.53 (SE +/- 0.24, N = 3)
1. (CXX) g++ options: -O3 -march=native -flto -pthread

ASTC Encoder

Preset: Thorough

ASTC Encoder 4.0 (MT/s, More Is Better)
AOCC 4.0: 64.00 (SE +/- 0.03, N = 3)
GCC 12.2: 62.97 (SE +/- 0.03, N = 3)
1. (CXX) g++ options: -O3 -march=native -flto -pthread

GPAW

Input: Carbon Nanotube

GPAW 22.1 (Seconds, Fewer Is Better)
AOCC 4.0: 32.67 (SE +/- 0.03, N = 3)
GCC 12.2: 35.00 (SE +/- 0.11, N = 3)
1. (CC) gcc options: -shared -fwrapv -O2 -O3 -march=native -lxc -lblas -lmpi

TNN

Target: CPU - Model: MobileNet v2

TNN 0.3 (ms, Fewer Is Better)
AOCC 4.0: 343.97 (SE +/- 4.23, N = 15) -fopenmp=libomp - MIN: 274.57 / MAX: 495.33
GCC 12.2: 229.79 (SE +/- 0.12, N = 3) -fopenmp - MIN: 227.65 / MAX: 234.53
1. (CXX) g++ options: -O3 -march=native -pthread -fvisibility=hidden -fvisibility=default -rdynamic -ldl
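
The two MobileNet v2 results above were collected with different run counts (N = 15 for AOCC 4.0, N = 3 for GCC 12.2), and the AOCC run also shows a much wider MIN/MAX range. The sketch below estimates the run-to-run spread behind each mean from the reported SE and N. It assumes SE is the standard error of the mean, so the sample standard deviation is approximately SE * sqrt(N); the function name is hypothetical. The higher run count may reflect the test suite adding runs when variance is high, but this export does not state that.

```python
from math import sqrt


def approx_cv_percent(mean, se, n):
    """Approximate coefficient of variation (%) recovered from SE and N."""
    stdev = se * sqrt(n)  # assumes SE = stdev / sqrt(N)
    return 100.0 * stdev / mean


# TNN 0.3, CPU, MobileNet v2 (ms, Fewer Is Better); values from this export
aocc_cv = approx_cv_percent(343.97, 4.23, 15)
gcc_cv = approx_cv_percent(229.79, 0.12, 3)

print(f"AOCC 4.0 spread: about {aocc_cv:.2f}% of the mean")
print(f"GCC 12.2 spread: about {gcc_cv:.2f}% of the mean")
```

A spread of several percent on one side and well under one percent on the other is a hint that the slower result is also the noisier one, which is worth keeping in mind when reading the averages.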

TNN

Target: CPU - Model: SqueezeNet v2

TNN 0.3 (ms, Fewer Is Better)
AOCC 4.0: 54.62 (SE +/- 0.59, N = 5) -fopenmp=libomp - MIN: 52.11 / MAX: 57.47
GCC 12.2: 52.41 (SE +/- 0.31, N = 3) -fopenmp - MIN: 51.53 / MAX: 54.74
1. (CXX) g++ options: -O3 -march=native -pthread -fvisibility=hidden -fvisibility=default -rdynamic -ldl

TNN

Target: CPU - Model: SqueezeNet v1.1

TNN 0.3 (ms, Fewer Is Better)
AOCC 4.0: 287.19 (SE +/- 0.01, N = 3) -fopenmp=libomp - MIN: 286.77 / MAX: 288
GCC 12.2: 226.01 (SE +/- 0.01, N = 3) -fopenmp - MIN: 225.65 / MAX: 227.03
1. (CXX) g++ options: -O3 -march=native -pthread -fvisibility=hidden -fvisibility=default -rdynamic -ldl

Kripke

Kripke 1.2.4 (Throughput FoM, More Is Better)
AOCC 4.0: 340106367 (SE +/- 1235045.49, N = 3) -fopenmp=libomp
GCC 12.2: 308884107 (SE +/- 4347432.93, N = 15) -fopenmp
1. (CXX) g++ options: -O3 -march=native


Phoronix Test Suite v10.8.4