AMD AOCC 3.1 Compiler Comparison

AMD EPYC 7543 testing of AMD AOCC 3.1 compiler benchmarks by Michael Larabel for a future article.

HTML result view exported from: https://openbenchmarking.org/result/2107288-IB-AOCC31BEN82&sor&sgm=1&hgv=AOCC+3.1.

Processor: AMD EPYC 7543 32-Core @ 2.80GHz (32 Cores / 64 Threads)
Motherboard: TYAN S8036GM2NE-LE (V2.00.B21 BIOS)
Chipset: AMD Starship/Matisse
Memory: 64GB
Disk: 1000GB Corsair Force MP600
Graphics: ASPEED
Monitor: VE228
Network: 2 x Broadcom NetXtreme BCM5720 2-port PCIe
OS: Ubuntu 21.04
Kernel: 5.11.0-25-generic (x86_64)
Desktop: GNOME Shell 3.38.4
Display Server: X Server
File-System: ext4
Screen Resolution: 1920x1080
Compiler:
- AOCC 3.1: Clang 12.0.0
- Clang 12.0: Clang 12.0.1-++20210630032617+fed41342a82f-1~exp1~20210630133328.128
- GCC 11.1: GCC 11.1.0

Kernel Details
- Transparent Huge Pages: madvise

Environment Details
- CXXFLAGS="-O3 -march=znver3" CFLAGS="-O3 -march=znver3"

Compiler Details
- AOCC 3.1: Optimized build with assertions; Default target: x86_64-unknown-linux-gnu; Host CPU: znver3
- GCC 11.1: --disable-multilib --enable-checking=release

Processor Details
- Scaling Governor: acpi-cpufreq schedutil (Boost: Enabled)
- CPU Microcode: 0xa001119

Python Details
- Python 3.9.5

Security Details
- itlb_multihit: Not affected
- l1tf: Not affected
- mds: Not affected
- meltdown: Not affected
- spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp
- spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
- spectre_v2: Mitigation of Full AMD retpoline IBPB: conditional IBRS_FW STIBP: always-on RSB filling
- srbds: Not affected
- tsx_async_abort: Not affected


C-Blosc

Compressor: blosclz

C-Blosc 2.0 (MB/s, more is better)
GCC 11.1: 25536.1 (SE +/- 95.08, N = 3)
Clang 12.0: 25417.8 (SE +/- 118.87, N = 3)
AOCC 3.1: 25069.6 (SE +/- 100.31, N = 3)
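Each result in this export carries an "SE +/- x, N = n" annotation, that is, a standard error over N runs. The Python sketch below illustrates that statistic under the assumption that it is the sample standard deviation divided by the square root of N; the export does not state exactly how the Phoronix Test Suite computes it. The helper name `standard_error` and the per-run values are hypothetical and are not taken from this result file.

```python
import math
import statistics


def standard_error(samples):
    """Standard error of the mean: sample standard deviation / sqrt(N)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two runs to estimate a standard error")
    return statistics.stdev(samples) / math.sqrt(n)


# Hypothetical per-run throughputs in MB/s (illustrative only).
runs = [25400.0, 25500.0, 25600.0]
mean = statistics.fmean(runs)
se = standard_error(runs)
print(f"{mean:.1f} MB/s (SE +/- {se:.2f}, N = {len(runs)})")
```

A small SE relative to the mean suggests the runs were consistent; results with a larger N in this file are presumably ones where extra runs were needed to settle the variance.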

QuantLib

QuantLib 1.21 (MFLOPS, more is better)
GCC 11.1: 2861.2 (SE +/- 23.43, N = 9)
AOCC 3.1: 2854.9 (SE +/- 5.98, N = 3)
Clang 12.0: 2838.6 (SE +/- 4.58, N = 3)
(CXX) g++ options: -O3 -march=native -rdynamic

Etcpak

Configuration: DXT1

Etcpak 0.7 (Mpx/s, more is better)
AOCC 3.1: 2897.28 (SE +/- 4.90, N = 3)
Clang 12.0: 2862.40 (SE +/- 6.55, N = 3)
GCC 11.1: 1170.47 (SE +/- 0.56, N = 3)
(CXX) g++ options: -O3 -march=native -std=c++11 -lpthread
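To put a gap like the Etcpak DXT1 one in relative terms, the Python sketch below divides one compiler's result by another's. The helper name `relative_performance` is hypothetical; the input values are the Etcpak DXT1 figures reported above.

```python
def relative_performance(result, baseline, more_is_better=True):
    """Return how many times better `result` is than `baseline`.

    For "fewer is better" metrics (seconds, ms) the ratio is inverted so
    that a value above 1.0 always means `result` is the better one.
    """
    if result <= 0 or baseline <= 0:
        raise ValueError("results must be positive")
    return result / baseline if more_is_better else baseline / result


# Etcpak 0.7 DXT1 results in Mpx/s (more is better).
aocc, clang, gcc = 2897.28, 2862.40, 1170.47
print(f"AOCC 3.1 vs GCC 11.1:   {relative_performance(aocc, gcc):.2f}x")
print(f"Clang 12.0 vs GCC 11.1: {relative_performance(clang, gcc):.2f}x")
```

The same helper can be applied to the timing results later in the file by passing `more_is_better=False`.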

Etcpak

Configuration: ETC2

Etcpak 0.7 (Mpx/s, more is better)
Clang 12.0: 214.17 (SE +/- 0.01, N = 3)
AOCC 3.1: 207.55 (SE +/- 0.03, N = 3)
GCC 11.1: 183.80 (SE +/- 0.04, N = 3)
(CXX) g++ options: -O3 -march=native -std=c++11 -lpthread

LZ4 Compression

Compression Level: 1 - Compression Speed

LZ4 Compression 1.9.3 (MB/s, more is better)
GCC 11.1: 12074.05 (SE +/- 86.05, N = 3)
AOCC 3.1: 10964.11 (SE +/- 7.15, N = 3)
Clang 12.0: 10375.18 (SE +/- 14.32, N = 3)
(CC) gcc options: -O3

Zstd Compression

Compression Level: 8 - Compression Speed

Zstd Compression 1.5.0 (MB/s, more is better)
AOCC 3.1: 2852.4 (SE +/- 17.26, N = 3)
GCC 11.1: 2792.3 (SE +/- 4.24, N = 3)
Clang 12.0: 2708.4 (SE +/- 23.16, N = 3)
(CC) gcc options: -O3 -march=znver3 -pthread -lz -llzma

Zstd Compression

Compression Level: 19, Long Mode - Compression Speed

Zstd Compression 1.5.0 (MB/s, more is better)
Clang 12.0: 42.9 (SE +/- 0.12, N = 3)
AOCC 3.1: 42.1 (SE +/- 0.38, N = 3)
GCC 11.1: 37.6 (SE +/- 0.10, N = 3)
(CC) gcc options: -O3 -march=znver3 -pthread -lz -llzma

Zstd Compression

Compression Level: 19, Long Mode - Decompression Speed

Zstd Compression 1.5.0 (MB/s, more is better)
GCC 11.1: 3434.5 (SE +/- 30.46, N = 3)
Clang 12.0: 3315.9 (SE +/- 18.29, N = 3)
AOCC 3.1: 3264.4 (SE +/- 6.78, N = 3)
(CC) gcc options: -O3 -march=znver3 -pthread -lz -llzma

JPEG XL

Input: PNG - Encode Speed: 7

JPEG XL 0.3.3 (MP/s, more is better)
Clang 12.0: 9.49 (SE +/- 0.01, N = 3)
AOCC 3.1: 9.04 (SE +/- 0.01, N = 3)
(CXX) g++ options: -O3 -march=znver3 -funwind-tables -Xclang -mrelax-all -O2 -pthread -fPIE -pie -ldl

JPEG XL

Input: JPEG - Encode Speed: 7

JPEG XL 0.3.3 (MP/s, more is better)
AOCC 3.1: 65.89 (SE +/- 0.18, N = 3)
Clang 12.0: 64.35 (SE +/- 0.19, N = 3)
(CXX) g++ options: -O3 -march=znver3 -funwind-tables -Xclang -mrelax-all -O2 -pthread -fPIE -pie -ldl

JPEG XL

Input: JPEG - Encode Speed: 8

JPEG XL 0.3.3 (MP/s, more is better)
AOCC 3.1: 28.30 (SE +/- 0.04, N = 3)
Clang 12.0: 27.78 (SE +/- 0.10, N = 3)
(CXX) g++ options: -O3 -march=znver3 -funwind-tables -Xclang -mrelax-all -O2 -pthread -fPIE -pie -ldl

Botan

Test: AES-256

Botan 2.17.3 (MiB/s, more is better)
GCC 11.1: 5849.13 (SE +/- 4.89, N = 3)
Clang 12.0: 4929.67 (SE +/- 9.74, N = 3)
AOCC 3.1: 4859.66 (SE +/- 3.96, N = 3)
(CXX) g++ options: -fstack-protector -m64 -pthread -lbotan-2 -ldl -lrt

Botan

Test: AES-256 - Decrypt

Botan 2.17.3 (MiB/s, more is better)
GCC 11.1: 5866.48 (SE +/- 6.69, N = 3)
Clang 12.0: 4924.97 (SE +/- 1.46, N = 3)
AOCC 3.1: 4876.96 (SE +/- 6.85, N = 3)
(CXX) g++ options: -fstack-protector -m64 -pthread -lbotan-2 -ldl -lrt

Botan

Test: Twofish

Botan 2.17.3 (MiB/s, more is better)
Clang 12.0: 334.18 (SE +/- 0.11, N = 3)
AOCC 3.1: 332.81 (SE +/- 0.05, N = 3)
GCC 11.1: 331.41 (SE +/- 2.97, N = 12)
(CXX) g++ options: -fstack-protector -m64 -pthread -lbotan-2 -ldl -lrt

Botan

Test: Twofish - Decrypt

Botan 2.17.3 (MiB/s, more is better)
AOCC 3.1: 339.97 (SE +/- 0.13, N = 3)
Clang 12.0: 339.90 (SE +/- 0.14, N = 3)
GCC 11.1: 332.40 (SE +/- 2.99, N = 12)
(CXX) g++ options: -fstack-protector -m64 -pthread -lbotan-2 -ldl -lrt

Botan

Test: Blowfish

Botan 2.17.3 (MiB/s, more is better)
AOCC 3.1: 417.05 (SE +/- 0.17, N = 3)
GCC 11.1: 407.15 (SE +/- 0.11, N = 3)
Clang 12.0: 398.71 (SE +/- 0.05, N = 3)
(CXX) g++ options: -fstack-protector -m64 -pthread -lbotan-2 -ldl -lrt

Botan

Test: Blowfish - Decrypt

Botan 2.17.3 (MiB/s, more is better)
GCC 11.1: 407.58 (SE +/- 0.07, N = 3)
AOCC 3.1: 405.31 (SE +/- 0.11, N = 3)
Clang 12.0: 404.37 (SE +/- 0.03, N = 3)
(CXX) g++ options: -fstack-protector -m64 -pthread -lbotan-2 -ldl -lrt

Botan

Test: CAST-256

Botan 2.17.3 (MiB/s, more is better)
Clang 12.0: 140.29 (SE +/- 0.16, N = 3)
AOCC 3.1: 134.37 (SE +/- 0.02, N = 3)
GCC 11.1: 133.68 (SE +/- 0.07, N = 3)
(CXX) g++ options: -fstack-protector -m64 -pthread -lbotan-2 -ldl -lrt

Botan

Test: CAST-256 - Decrypt

Botan 2.17.3 (MiB/s, more is better)
Clang 12.0: 140.54 (SE +/- 0.12, N = 3)
AOCC 3.1: 137.91 (SE +/- 0.02, N = 3)
GCC 11.1: 133.62 (SE +/- 0.06, N = 3)
(CXX) g++ options: -fstack-protector -m64 -pthread -lbotan-2 -ldl -lrt

LibRaw

Post-Processing Benchmark

LibRaw 0.20 (Mpix/sec, more is better)
GCC 11.1: 60.93 (SE +/- 0.28, N = 3)
AOCC 3.1: 45.11 (SE +/- 0.48, N = 3)
Clang 12.0: 43.68 (SE +/- 0.35, N = 3)
(CXX) g++ options: -O3 -march=znver3 -fopenmp -ljpeg -lz -lm

John The Ripper

Test: Blowfish

John The Ripper 1.9.0-jumbo-1 (Real C/S, more is better)
Clang 12.0: 62745 (SE +/- 601.59, N = 6)
AOCC 3.1: 61842 (SE +/- 125.66, N = 3)
(CC) gcc options: -m64 -lssl -lcrypto -fopenmp -lgmp -pthread -lm -lz -ldl -lcrypt

John The Ripper

Test: MD5

John The Ripper 1.9.0-jumbo-1 (Real C/S, more is better)
AOCC 3.1: 2240000 (SE +/- 23259.41, N = 3)
Clang 12.0: 2043333 (SE +/- 4666.67, N = 3)
(CC) gcc options: -m64 -lssl -lcrypto -fopenmp -lgmp -pthread -lm -lz -ldl -lcrypt

GraphicsMagick

Operation: Rotate

GraphicsMagick 1.3.33 (Iterations Per Minute, more is better)
GCC 11.1: 848 (SE +/- 1.33, N = 3)
Clang 12.0: 820 (SE +/- 2.08, N = 3)
AOCC 3.1: 792 (SE +/- 3.71, N = 3)
(CC) gcc options: -fopenmp -O3 -march=znver3 -pthread -ljbig -ltiff -lfreetype -ljpeg -lXext -lSM -lICE -lX11 -llzma -lxml2 -lz -lm -lpthread

GraphicsMagick

Operation: Enhanced

GraphicsMagick 1.3.33 (Iterations Per Minute, more is better)
AOCC 3.1: 696 (SE +/- 0.33, N = 3)
Clang 12.0: 666 (SE +/- 0.58, N = 3)
GCC 11.1: 658 (SE +/- 3.48, N = 3)
(CC) gcc options: -fopenmp -O3 -march=znver3 -pthread -ljbig -ltiff -lfreetype -ljpeg -lXext -lSM -lICE -lX11 -llzma -lxml2 -lz -lm -lpthread

SVT-AV1

Encoder Mode: Preset 4 - Input: Bosphorus 4K

SVT-AV1 0.8.7 (Frames Per Second, more is better)
Clang 12.0: 1.988 (SE +/- 0.004, N = 3)
AOCC 3.1: 1.987 (SE +/- 0.001, N = 3)
GCC 11.1: 1.937 (SE +/- 0.005, N = 3)
(CXX) g++ options: -O3 -march=znver3 -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq -pie

SVT-AV1

Encoder Mode: Preset 8 - Input: Bosphorus 4K

SVT-AV1 0.8.7 (Frames Per Second, more is better)
AOCC 3.1: 19.47 (SE +/- 0.10, N = 3)
Clang 12.0: 19.40 (SE +/- 0.04, N = 3)
GCC 11.1: 18.87 (SE +/- 0.10, N = 3)
(CXX) g++ options: -O3 -march=znver3 -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq -pie

SVT-HEVC

Tuning: 7 - Input: Bosphorus 1080p

SVT-HEVC 1.5.0 (Frames Per Second, more is better)
AOCC 3.1: 303.75 (SE +/- 4.16, N = 3)
GCC 11.1: 302.07 (SE +/- 1.06, N = 3)
Clang 12.0: 295.67 (SE +/- 3.03, N = 15)
(CC) gcc options: -O3 -march=znver3 -fPIE -fPIC -O2 -pie -rdynamic -lpthread -lrt

SVT-HEVC

Tuning: 10 - Input: Bosphorus 1080p

SVT-HEVC 1.5.0 (Frames Per Second, more is better)
GCC 11.1: 529.50 (SE +/- 4.86, N = 3)
AOCC 3.1: 408.51 (SE +/- 5.79, N = 15)
Clang 12.0: 388.86 (SE +/- 4.14, N = 3)
(CC) gcc options: -O3 -march=znver3 -fPIE -fPIC -O2 -pie -rdynamic -lpthread -lrt

SVT-VP9

Tuning: Visual Quality Optimized - Input: Bosphorus 1080p

SVT-VP9 0.3 (Frames Per Second, more is better)
GCC 11.1: 256.91 (SE +/- 0.79, N = 3)
Clang 12.0: 240.65 (SE +/- 3.15, N = 15)
AOCC 3.1: 234.18 (SE +/- 1.81, N = 3)
(CC) gcc options: -O3 -fcommon -march=znver3 -fPIE -fPIC -fvisibility=hidden -pie -rdynamic -lpthread -lrt -lm

VP9 libvpx Encoding

Speed: Speed 5 - Input: Bosphorus 4K

VP9 libvpx Encoding 1.10.0 (Frames Per Second, more is better)
Clang 12.0: 17.12 (SE +/- 0.12, N = 3)
GCC 11.1: 16.25 (SE +/- 0.07, N = 3)
AOCC 3.1: 15.62 (SE +/- 0.12, N = 3)
(CXX) g++ options: -m64 -lm -lpthread -O3 -march=znver3 -fPIC -U_FORTIFY_SOURCE -std=gnu++11

Himeno Benchmark

Poisson Pressure Solver

Himeno Benchmark 3.0 (MFLOPS, more is better)
GCC 11.1: 4237.38 (SE +/- 19.17, N = 3)
Clang 12.0: 4113.00 (SE +/- 58.62, N = 15)
AOCC 3.1: 3819.99 (SE +/- 26.55, N = 3)
(CC) gcc options: -O3 -march=znver3 -mavx2

Stockfish

Total Time

Stockfish 13 (Nodes Per Second, more is better)
GCC 11.1: 91863789 (SE +/- 1240215.58, N = 3)
AOCC 3.1: 90448030 (SE +/- 655436.02, N = 15)
Additional per-build options (as exported): -lgcov -mbmi2 -fno-peel-loops -fno-tracer -flto=jobserver
(CXX) g++ options: -m64 -lpthread -O3 -march=znver3 -fno-exceptions -std=c++17 -pedantic -msse -msse3 -mpopcnt -mavx2 -msse4.1 -mssse3 -msse2 -flto -fprofile-use

libavif avifenc

Encoder Speed: 2

libavif avifenc 0.9.0 (Seconds, fewer is better)
Clang 12.0: 24.40 (SE +/- 0.07, N = 3)
AOCC 3.1: 24.77 (SE +/- 0.25, N = 3)
GCC 11.1: 26.21 (SE +/- 0.11, N = 3)
(CXX) g++ options: -O3 -fPIC -lm

libavif avifenc

Encoder Speed: 6

libavif avifenc 0.9.0 (Seconds, fewer is better)
Clang 12.0: 9.632 (SE +/- 0.128, N = 3)
AOCC 3.1: 9.672 (SE +/- 0.067, N = 3)
GCC 11.1: 10.376 (SE +/- 0.102, N = 3)
(CXX) g++ options: -O3 -fPIC -lm

libavif avifenc

Encoder Speed: 10

libavif avifenc 0.9.0 (Seconds, fewer is better)
AOCC 3.1: 3.497 (SE +/- 0.022, N = 3)
Clang 12.0: 3.517 (SE +/- 0.025, N = 15)
GCC 11.1: 3.698 (SE +/- 0.045, N = 3)
(CXX) g++ options: -O3 -fPIC -lm

libavif avifenc

Encoder Speed: 6, Lossless

libavif avifenc 0.9.0 (Seconds, fewer is better)
Clang 12.0: 26.10 (SE +/- 0.14, N = 3)
AOCC 3.1: 26.67 (SE +/- 0.14, N = 3)
GCC 11.1: 27.89 (SE +/- 0.10, N = 3)
(CXX) g++ options: -O3 -fPIC -lm

libavif avifenc

Encoder Speed: 10, Lossless

libavif avifenc 0.9.0 (Seconds, fewer is better)
AOCC 3.1: 5.800 (SE +/- 0.053, N = 3)
Clang 12.0: 5.966 (SE +/- 0.002, N = 3)
GCC 11.1: 6.212 (SE +/- 0.065, N = 3)
(CXX) g++ options: -O3 -fPIC -lm

POV-Ray

Trace Time

POV-Ray 3.7.0.7 (Seconds, fewer is better)
AOCC 3.1: 14.78 (SE +/- 0.04, N = 3)
Clang 12.0: 15.14 (SE +/- 0.02, N = 3)
GCC 11.1: 15.36 (SE +/- 0.13, N = 3)
Additional per-build options (as exported): -R/usr/lib
(CXX) g++ options: -pipe -O3 -ffast-math -march=native -march=znver3 -pthread -lXpm -lSM -lICE -lX11 -lIlmImf -lIlmImf-2_5 -lImath-2_5 -lHalf-2_5 -lIex-2_5 -lIexMath-2_5 -lIlmThread-2_5 -lIlmThread -ltiff -ljpeg -lpng -lz -lrt -lm -lboost_thread -lboost_system

oneDNN

Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU

oneDNN 2.1.2 (ms, fewer is better)
AOCC 3.1: 1.40164 (SE +/- 0.00055, N = 3; -fopenmp=libomp; MIN: 1.27)
Clang 12.0: 1.40867 (SE +/- 0.00211, N = 3; -fopenmp=libomp; MIN: 1.29)
GCC 11.1: 1.42555 (SE +/- 0.00145, N = 3; -fopenmp; MIN: 1.28)
(CXX) g++ options: -O3 -march=native -march=znver3 -std=c++11 -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN

Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU

oneDNN 2.1.2 (ms, fewer is better)
Clang 12.0: 5.00842 (SE +/- 0.00188, N = 3; -fopenmp=libomp; MIN: 4.9)
GCC 11.1: 5.03445 (SE +/- 0.03095, N = 14; -fopenmp; MIN: 4.87)
AOCC 3.1: 5.07547 (SE +/- 0.00433, N = 3; -fopenmp=libomp; MIN: 4.96)
(CXX) g++ options: -O3 -march=native -march=znver3 -std=c++11 -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN

Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU

oneDNN 2.1.2 (ms, fewer is better)
AOCC 3.1: 1.40674 (SE +/- 0.01062, N = 3; -fopenmp=libomp; MIN: 1.25)
GCC 11.1: 1.91827 (SE +/- 0.00417, N = 3; -fopenmp; MIN: 1.2)
Clang 12.0: 2.77765 (SE +/- 0.01715, N = 3; -fopenmp=libomp; MIN: 2.54)
(CXX) g++ options: -O3 -march=native -march=znver3 -std=c++11 -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN

Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU

oneDNN 2.1.2 (ms, fewer is better)
AOCC 3.1: 1.80096 (SE +/- 0.00353, N = 3; -fopenmp=libomp; MIN: 1.57)
Clang 12.0: 1.83657 (SE +/- 0.00389, N = 3; -fopenmp=libomp; MIN: 1.61)
GCC 11.1: 4.59273 (SE +/- 0.03320, N = 15; -fopenmp; MIN: 3.81)
(CXX) g++ options: -O3 -march=native -march=znver3 -std=c++11 -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN

Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU

oneDNN 2.1.2 (ms, fewer is better)
GCC 11.1: 2.95414 (SE +/- 0.00916, N = 3; -fopenmp; MIN: 2.68)
Clang 12.0: 3.10964 (SE +/- 0.00207, N = 3; -fopenmp=libomp; MIN: 2.66)
AOCC 3.1: 3.13186 (SE +/- 0.00672, N = 3; -fopenmp=libomp; MIN: 2.76)
(CXX) g++ options: -O3 -march=native -march=znver3 -std=c++11 -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN

Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU

oneDNN 2.1.2 (ms, fewer is better)
GCC 11.1: 6221.98 (SE +/- 85.54, N = 15; -fopenmp; MIN: 5154.78)
AOCC 3.1: 6432.49 (SE +/- 68.78, N = 3; -fopenmp=libomp; MIN: 6265.76)
Clang 12.0: 6828.04 (SE +/- 58.03, N = 3; -fopenmp=libomp; MIN: 6494)
(CXX) g++ options: -O3 -march=native -march=znver3 -std=c++11 -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN

Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU

oneDNN 2.1.2 (ms, fewer is better)
AOCC 3.1: 668.79 (SE +/- 0.23, N = 3; -fopenmp=libomp; MIN: 659.59)
Clang 12.0: 679.05 (SE +/- 2.18, N = 3; -fopenmp=libomp; MIN: 671.01)
GCC 11.1: 702.08 (SE +/- 6.29, N = 7; -fopenmp; MIN: 683.48)
(CXX) g++ options: -O3 -march=native -march=znver3 -std=c++11 -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN

Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU

oneDNN 2.1.2 (ms, fewer is better)
AOCC 3.1: 0.435999 (SE +/- 0.000575, N = 3; -fopenmp=libomp; MIN: 0.39)
Clang 12.0: 0.436906 (SE +/- 0.000360, N = 3; -fopenmp=libomp; MIN: 0.4)
GCC 11.1: 0.457865 (SE +/- 0.000447, N = 3; -fopenmp; MIN: 0.42)
(CXX) g++ options: -O3 -march=native -march=znver3 -std=c++11 -msse4.1 -fPIC -pie -lpthread -ldl

FLAC Audio Encoding

WAV To FLAC

FLAC Audio Encoding 1.3.2 (Seconds, fewer is better)
Clang 12.0: 7.326 (SE +/- 0.007, N = 5)
GCC 11.1: 8.161 (SE +/- 0.007, N = 5)
AOCC 3.1: 8.700 (SE +/- 0.002, N = 5)
Additional per-build options (as exported): -fvisibility=hidden
(CXX) g++ options: -O3 -march=znver3 -logg -lm

LAME MP3 Encoding

WAV To MP3

LAME MP3 Encoding 3.100 (Seconds, fewer is better)
GCC 11.1: 7.099 (SE +/- 0.016, N = 3)
Clang 12.0: 7.826 (SE +/- 0.003, N = 3)
AOCC 3.1: 7.828 (SE +/- 0.003, N = 3)
Additional per-build options (as exported): -ffast-math -funroll-loops -fschedule-insns2 -fbranch-count-reg -fforce-addr -lncurses -lncurses
(CC) gcc options: -O3 -pipe -march=znver3 -lm

Ngspice

Circuit: C2670

Ngspice 34 (Seconds, fewer is better)
AOCC 3.1: 91.52 (SE +/- 0.15, N = 3)
Clang 12.0: 92.73 (SE +/- 0.63, N = 3)
GCC 11.1: 99.09 (SE +/- 0.74, N = 3)
Additional per-build options (as exported): -lstdc++ -lstdc++
(CC) gcc options: -O3 -march=znver3 -fopenmp -lm -lfftw3 -lXaw -lXmu -lXt -lXext -lX11 -lXft -lfontconfig -lXrender -lfreetype -lSM -lICE

Ngspice

Circuit: C7552

Ngspice 34 (Seconds, fewer is better)
AOCC 3.1: 82.28 (SE +/- 0.29, N = 3)
Clang 12.0: 83.89 (SE +/- 0.30, N = 3)
GCC 11.1: 84.14 (SE +/- 0.69, N = 9)
Additional per-build options (as exported): -lstdc++ -lstdc++
(CC) gcc options: -O3 -march=znver3 -fopenmp -lm -lfftw3 -lXaw -lXmu -lXt -lXext -lX11 -lXft -lfontconfig -lXrender -lfreetype -lSM -lICE

RNNoise

RNNoise 2020-06-28 (Seconds, fewer is better)
GCC 11.1: 18.01 (SE +/- 0.01, N = 3)
Clang 12.0: 18.56 (SE +/- 0.01, N = 3)
AOCC 3.1: 18.73 (SE +/- 0.05, N = 3)
(CC) gcc options: -O3 -march=znver3 -pedantic -fvisibility=hidden

Tachyon

Total Time

Tachyon 0.99b6 (Seconds, fewer is better)
GCC 11.1: 27.64 (SE +/- 0.04, N = 3)
Clang 12.0: 28.66 (SE +/- 0.03, N = 3)
AOCC 3.1: 28.66 (SE +/- 0.03, N = 3)
(CC) gcc options: -m64 -O3 -fomit-frame-pointer -ffast-math -ltachyon -lm -lpthread

Google SynthMark

Test: VoiceMark_100

Google SynthMark 20201109 (Voices, more is better)
AOCC 3.1: 621.63 (SE +/- 0.49, N = 3)
Clang 12.0: 621.43 (SE +/- 0.68, N = 3)
(CXX) g++ options: -lm -lpthread -std=c++11 -Ofast

SecureMark

Benchmark: SecureMark-TLS

SecureMark 1.0.4 (marks, more is better)
Clang 12.0: 285172 (SE +/- 513.21, N = 3)
AOCC 3.1: 279931 (SE +/- 250.58, N = 3)
GCC 11.1: 257191 (SE +/- 178.03, N = 3)
(CC) gcc options: -pedantic -O3

Liquid-DSP

Threads: 16 - Buffer Length: 256 - Filter Length: 57

Liquid-DSP 2021.01.31 (samples/s, more is better)
GCC 11.1: 954493333 (SE +/- 1548777.30, N = 3)
Clang 12.0: 886793333 (SE +/- 1271918.94, N = 3)
AOCC 3.1: 806510000 (SE +/- 596852.86, N = 3)
(CC) gcc options: -O3 -march=znver3 -pthread -lm -lc -lliquid

Liquid-DSP

Threads: 32 - Buffer Length: 256 - Filter Length: 57

Liquid-DSP 2021.01.31 (samples/s, more is better)
GCC 11.1: 1719166667 (SE +/- 3779035.74, N = 3)
Clang 12.0: 1651733333 (SE +/- 1570916.22, N = 3)
AOCC 3.1: 1527933333 (SE +/- 635959.47, N = 3)
(CC) gcc options: -O3 -march=znver3 -pthread -lm -lc -lliquid

Liquid-DSP

Threads: 64 - Buffer Length: 256 - Filter Length: 57

Liquid-DSP 2021.01.31 (samples/s, more is better)
Clang 12.0: 2068366667 (SE +/- 88191.71, N = 3)
AOCC 3.1: 1894966667 (SE +/- 983756.97, N = 3)
GCC 11.1: 1873666667 (SE +/- 581186.53, N = 3)
(CC) gcc options: -O3 -march=znver3 -pthread -lm -lc -lliquid

FinanceBench

Benchmark: Repo OpenMP

FinanceBench 2016-07-25 (ms, fewer is better)
Clang 12.0: 33674.40 (SE +/- 257.66, N = 3)
AOCC 3.1: 33763.58 (SE +/- 308.02, N = 3)
GCC 11.1: 35022.33 (SE +/- 418.97, N = 4)
(CXX) g++ options: -O3 -march=native -fopenmp

FinanceBench

Benchmark: Bonds OpenMP

FinanceBench 2016-07-25 (ms, fewer is better)
GCC 11.1: 50564.02 (SE +/- 192.12, N = 3)
AOCC 3.1: 51185.49 (SE +/- 150.62, N = 3)
Clang 12.0: 51304.84 (SE +/- 370.43, N = 3)
(CXX) g++ options: -O3 -march=native -fopenmp

libjpeg-turbo tjbench

Test: Decompression Throughput

libjpeg-turbo tjbench 2.1.0 (Megapixels/sec, more is better)
AOCC 3.1: 220.67 (SE +/- 0.05, N = 3)
Clang 12.0: 214.48 (SE +/- 0.22, N = 3)
GCC 11.1: 210.80 (SE +/- 0.09, N = 3)
(CC) gcc options: -O3 -march=znver3 -rdynamic

ASTC Encoder

Preset: Exhaustive

ASTC Encoder 3.0 (Seconds, fewer is better)
AOCC 3.1: 21.89 (SE +/- 0.01, N = 3)
GCC 11.1: 23.26 (SE +/- 0.00, N = 3)
(CXX) g++ options: -O3 -march=znver3 -flto -pthread

SQLite Speedtest

Timed Time - Size 1,000

SQLite Speedtest 3.30 (Seconds, fewer is better)
GCC 11.1: 53.78 (SE +/- 0.18, N = 3)
Clang 12.0: 54.80 (SE +/- 0.02, N = 3)
AOCC 3.1: 55.24 (SE +/- 0.06, N = 3)
(CC) gcc options: -O3 -march=znver3 -ldl -lz -lpthread

Google Draco

Model: Lion

Google Draco 1.4.1 (ms, fewer is better)
AOCC 3.1: 4960 (SE +/- 25.93, N = 3)
Clang 12.0: 5490 (SE +/- 6.01, N = 3)
(CXX) g++ options: -O3 -march=znver3

Google Draco

Model: Church Facade

Google Draco 1.4.1 (ms, fewer is better)
AOCC 3.1: 6488 (SE +/- 49.64, N = 14)
Clang 12.0: 7469 (SE +/- 22.32, N = 3)
(CXX) g++ options: -O3 -march=znver3

NCNN

Target: CPU-v2-v2 - Model: mobilenet-v2

NCNN 20210525 (ms, fewer is better)
AOCC 3.1: 5.74 (SE +/- 0.03, N = 15; -lomp; MIN: 5.3 / MAX: 32.98)
Clang 12.0: 6.82 (SE +/- 0.05, N = 3; -lomp; MIN: 6.56 / MAX: 7.47)
GCC 11.1: 7.71 (SE +/- 0.01, N = 3; -lgomp; MIN: 7.5 / MAX: 16.34)
(CXX) g++ options: -O3 -march=znver3 -rdynamic -lpthread -pthread

NCNN

Target: CPU-v3-v3 - Model: mobilenet-v3

NCNN 20210525 (ms, fewer is better)
AOCC 3.1: 5.60 (SE +/- 0.04, N = 15; -lomp; MIN: 4.66 / MAX: 56.3)
Clang 12.0: 6.77 (SE +/- 0.19, N = 3; -lomp; MIN: 5.93 / MAX: 47.97)
GCC 11.1: 7.09 (SE +/- 0.04, N = 3; -lgomp; MIN: 6.52 / MAX: 39.19)
(CXX) g++ options: -O3 -march=znver3 -rdynamic -lpthread -pthread

NCNN

Target: CPU - Model: vgg16

NCNN 20210525 (ms, fewer is better)
AOCC 3.1: 51.60 (SE +/- 1.18, N = 15; -lomp; MIN: 29.8 / MAX: 198.51)
GCC 11.1: 56.42 (SE +/- 1.12, N = 3; -lgomp; MIN: 47.61 / MAX: 93.66)
Clang 12.0: 59.74 (SE +/- 2.17, N = 3; -lomp; MIN: 51.34 / MAX: 157.2)
(CXX) g++ options: -O3 -march=znver3 -rdynamic -lpthread -pthread

NCNN

Target: CPU - Model: resnet18

NCNN 20210525 (ms, fewer is better)
GCC 11.1: 11.19 (SE +/- 0.11, N = 3; -lgomp; MIN: 10.93 / MAX: 50.17)
AOCC 3.1: 11.88 (SE +/- 0.09, N = 15; -lomp; MIN: 10.11 / MAX: 110.24)
Clang 12.0: 13.15 (SE +/- 0.04, N = 3; -lomp; MIN: 12.7 / MAX: 14.36)
(CXX) g++ options: -O3 -march=znver3 -rdynamic -lpthread -pthread

TNN

Target: CPU - Model: SqueezeNet v2

TNN 0.3 (ms, fewer is better)
AOCC 3.1: 60.66 (SE +/- 0.03, N = 3; -fopenmp=libomp; MIN: 60.48 / MAX: 60.97)
Clang 12.0: 61.22 (SE +/- 0.62, N = 3; -fopenmp=libomp; MIN: 60.1 / MAX: 67.06)
GCC 11.1: 64.79 (SE +/- 0.23, N = 3; -fopenmp; MIN: 64.25 / MAX: 65.31)
(CXX) g++ options: -O3 -march=znver3 -pthread -fvisibility=hidden -fvisibility=default -rdynamic -ldl

TNN

Target: CPU - Model: SqueezeNet v1.1

TNN 0.3 (ms, fewer is better)
Clang 12.0: 265.12 (SE +/- 0.09, N = 3; -fopenmp=libomp; MIN: 264.65 / MAX: 271.09)
AOCC 3.1: 266.24 (SE +/- 0.08, N = 3; -fopenmp=libomp; MIN: 265.84 / MAX: 266.54)
GCC 11.1: 270.54 (SE +/- 0.20, N = 3; -fopenmp; MIN: 270.13 / MAX: 271.09)
(CXX) g++ options: -O3 -march=znver3 -pthread -fvisibility=hidden -fvisibility=default -rdynamic -ldl

Facebook RocksDB

Test: Update Random

Facebook RocksDB 6.22.1 (Op/s, more is better)
AOCC 3.1: 832651 (SE +/- 1446.17, N = 3; -latomic)
GCC 11.1: 832118 (SE +/- 3250.14, N = 3; -fno-builtin-memcmp)
Clang 12.0: 812665 (SE +/- 1551.72, N = 3; -latomic)
(CXX) g++ options: -O3 -march=native -pthread -fno-rtti -lpthread

Facebook RocksDB

Test: Read While Writing

Facebook RocksDB 6.22.1 (Op/s, more is better)
GCC 11.1: 6369004 (SE +/- 51448.06, N = 9; -fno-builtin-memcmp)
AOCC 3.1: 6135560 (SE +/- 55702.89, N = 3; -latomic)
Clang 12.0: 5822258 (SE +/- 43820.68, N = 15; -latomic)
(CXX) g++ options: -O3 -march=native -pthread -fno-rtti -lpthread

Facebook RocksDB

Test: Read Random Write Random

Facebook RocksDB 6.22.1 (Op/s, more is better)
GCC 11.1: 3139993 (SE +/- 12186.30, N = 3; -fno-builtin-memcmp)
AOCC 3.1: 3059068 (SE +/- 32512.94, N = 4; -latomic)
Clang 12.0: 2875994 (SE +/- 18538.77, N = 14; -latomic)
(CXX) g++ options: -O3 -march=native -pthread -fno-rtti -lpthread

ONNX Runtime

Model: bertsquad-10 - Device: OpenMP CPU

ONNX Runtime 1.6 (Inferences Per Minute, more is better)
AOCC 3.1: 487 (SE +/- 4.81, N = 5)
Clang 12.0: 480 (SE +/- 5.04, N = 3)
GCC 11.1: 432 (SE +/- 5.78, N = 3)
(CXX) g++ options: -O3 -march=znver3 -fopenmp=libomp -ffunction-sections -fdata-sections -ldl -lrt

ONNX Runtime

Model: fcn-resnet101-11 - Device: OpenMP CPU

ONNX Runtime 1.6 (Inferences Per Minute, more is better)
AOCC 3.1: 90 (SE +/- 1.74, N = 11)
GCC 11.1: 89 (SE +/- 3.97, N = 9)
Clang 12.0: 85 (SE +/- 1.22, N = 12)
(CXX) g++ options: -O3 -march=znver3 -fopenmp=libomp -ffunction-sections -fdata-sections -ldl -lrt

ONNX Runtime

Model: super-resolution-10 - Device: OpenMP CPU

ONNX Runtime 1.6 (Inferences Per Minute, more is better)
GCC 11.1: 4922 (SE +/- 51.87, N = 3)
Clang 12.0: 4630 (SE +/- 23.74, N = 3)
AOCC 3.1: 4568 (SE +/- 49.29, N = 5)
(CXX) g++ options: -O3 -march=znver3 -fopenmp=libomp -ffunction-sections -fdata-sections -ldl -lrt

WavPack Audio Encoding

WAV To WavPack

WavPack Audio Encoding 5.3 (Seconds, fewer is better)
GCC 11.1: 13.13 (SE +/- 0.00, N = 5)
Clang 12.0: 13.17 (SE +/- 0.00, N = 5)
AOCC 3.1: 13.18 (SE +/- 0.00, N = 5)
(CXX) g++ options: -O3 -march=znver3 -rdynamic

GnuPG

2.7GB Sample File Encryption

GnuPG 2.2.27 (Seconds, fewer is better)
GCC 11.1: 67.49 (SE +/- 0.12, N = 3)
AOCC 3.1: 67.61 (SE +/- 0.24, N = 3)
Clang 12.0: 67.90 (SE +/- 0.51, N = 3)
(CC) gcc options: -O3 -march=znver3

Geometric Mean Of All Test Results

Result Composite - AMD AOCC 3.1 Compiler Comparison

Geometric Mean Of All Test Results (Geometric Mean, more is better)
AOCC 3.1: 154.72
Clang 12.0: 152.20
GCC 11.1: 150.45
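The composite above is a geometric mean over all test results. This export does not state how the Phoronix Test Suite normalizes units or handles fewer-is-better results before averaging, so the Python sketch below only illustrates the general technique. The function names `geometric_mean` and `composite` are hypothetical, and inverting fewer-is-better values is an assumption made for this example.

```python
import math


def geometric_mean(values):
    """Geometric mean, computed via logarithms to avoid overflow."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be non-empty and strictly positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def composite(results):
    """Combine (value, more_is_better) pairs for one compiler.

    Assumption: fewer-is-better values (seconds, ms) are inverted so that
    a higher composite always means better overall performance.
    """
    oriented = [v if more_is_better else 1.0 / v for v, more_is_better in results]
    return geometric_mean(oriented)


# Two AOCC 3.1 results from this file: C-Blosc (MB/s) and POV-Ray (seconds).
aocc = [(25069.6, True), (14.78, False)]
print(f"Illustrative two-test composite: {composite(aocc):.3f}")
```

Because it multiplies ratios rather than summing raw values, a geometric mean keeps a single test with very large numbers (such as Liquid-DSP's samples/s) from dominating the composite.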


Phoronix Test Suite v10.8.4