AMD EPYC 7702 64-Core testing with an ASRockRack EPYCD8 (P2.40 BIOS) and ASPEED on Ubuntu 20.04 via the Phoronix Test Suite.
1
Kernel Notes: Transparent Huge Pages: madvise
Compiler Notes: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-9-HskZEa/gcc-9-9.3.0/debian/tmp-nvptx/usr,hsa --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Notes: Scaling Governor: acpi-cpufreq ondemand (Boost: Enabled) - CPU Microcode: 0x8301034
Python Notes: Python 3.8.2
Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Full AMD retpoline IBPB: conditional IBRS_FW STIBP: conditional RSB filling + srbds: Not affected + tsx_async_abort: Not affected
2 3 Processor: AMD EPYC 7702 64-Core @ 2.00GHz (64 Cores / 128 Threads), Motherboard: ASRockRack EPYCD8 (P2.40 BIOS), Chipset: AMD Starship/Matisse, Memory: 126GB, Disk: 3841GB Micron_9300_MTFDHAL3T8TDP, Graphics: ASPEED, Monitor: VE228, Network: 2 x Intel I350
OS: Ubuntu 20.04, Kernel: 5.9.0-050900rc6daily20200921-generic (x86_64) 20200920, Desktop: GNOME Shell 3.36.4, Display Server: X Server 1.20.8, Compiler: GCC 9.3.0, File-System: ext4, Screen Resolution: 1920x1080
Result Overview (Phoronix Test Suite): relative results of runs 1, 2, and 3, plotted on an axis from 100% to 115%, for SVT-VP9, Timed Linux Kernel Compilation, Xcompact3d Incompact3d, Stockfish, AOM AV1, GNU Radio, ViennaCL, Zstd Compression, libavif avifenc, SVT-HEVC, oneDNN, Timed Erlang/OTP Compilation, Timed Mesa Compilation, Timed Node.js Compilation, Blender, Sysbench, simdjson, GNU GMP GMPbench, LuaRadio, Botan, Liquid-DSP, and toyBrot Fractal Generator.
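This export does not state how the per-suite percentages in the overview are derived. As a rough sketch, assuming each suite's score is the geometric mean of its per-test results normalized to run 1 (an assumption, not something this file confirms), the three SVT-VP9 results reported in this document can be combined as follows. The function name `relative_geomean` is hypothetical.

```python
import math

# SVT-VP9 frames per second for runs 1, 2, 3, copied from this result file.
svt_vp9 = {
    "VMAF Optimized": (238.14, 361.00, 360.74),
    "PSNR/SSIM Optimized": (362.92, 361.43, 363.64),
    "Visual Quality Optimized": (279.10, 273.59, 277.87),
}

def relative_geomean(results, baseline=0):
    """Geometric mean of each run's results relative to the baseline run.

    Assumes "more is better" for every test in `results`.
    """
    n_runs = len(next(iter(results.values())))
    scores = []
    for run in range(n_runs):
        # Average the log-ratios, then exponentiate: the geometric mean.
        log_sum = sum(math.log(v[run] / v[baseline]) for v in results.values())
        scores.append(math.exp(log_sum / len(results)))
    return scores

scores = relative_geomean(svt_vp9)
for run, score in enumerate(scores, start=1):
    print(f"run {run}: {score:.1%}")
```

Under that assumption, runs 2 and 3 land roughly 14 to 15% above run 1, which is at least consistent with the 100% to 115% axis of the overview.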
EPYC 7702 April 2021
Test | 1 | 2 | 3
sysbench: CPU | 96671.09 | 96717.84 | 96694.41
aom-av1: Speed 4 Two-Pass - Bosphorus 4K | 3.82 | 3.82 | 3.83
aom-av1: Speed 6 Realtime - Bosphorus 4K | 11.89 | 11.74 | 11.74
aom-av1: Speed 6 Two-Pass - Bosphorus 4K | 6.72 | 6.98 | 6.79
aom-av1: Speed 8 Realtime - Bosphorus 4K | 22.22 | 22.18 | 21.67
aom-av1: Speed 9 Realtime - Bosphorus 4K | 25.94 | 25.71 | 25.99
aom-av1: Speed 4 Two-Pass - Bosphorus 1080p | 5.79 | 5.77 | 5.76
aom-av1: Speed 6 Realtime - Bosphorus 1080p | 18.24 | 18.22 | 18.17
aom-av1: Speed 6 Two-Pass - Bosphorus 1080p | 16.22 | 16.09 | 16.04
aom-av1: Speed 8 Realtime - Bosphorus 1080p | 58.37 | 54.47 | 54.32
aom-av1: Speed 9 Realtime - Bosphorus 1080p | 67.34 | 65.79 | 66.10
svt-hevc: 1 - Bosphorus 1080p | 34.41 | 34.59 | 34.45
svt-hevc: 7 - Bosphorus 1080p | 277.91 | 276.07 | 279.74
svt-hevc: 10 - Bosphorus 1080p | 462.61 | 460.86 | 462.91
svt-vp9: VMAF Optimized - Bosphorus 1080p | 238.14 | 361.00 | 360.74
svt-vp9: PSNR/SSIM Optimized - Bosphorus 1080p | 362.92 | 361.43 | 363.64
svt-vp9: Visual Quality Optimized - Bosphorus 1080p | 279.1 | 273.59 | 277.87
simdjson: Kostya | 2.12 | 2.13 | 2.13
simdjson: LargeRand | 0.66 | 0.66 | 0.66
simdjson: PartialTweets | 3.55 | 3.53 | 3.53
simdjson: DistinctUserID | 3.59 | 3.61 | 3.59
viennacl: CPU BLAS - sCOPY | 757 | 764 | 753
viennacl: CPU BLAS - sAXPY | 687 | 679 | 688
viennacl: CPU BLAS - sDOT | 461 | 507 | 479
viennacl: CPU BLAS - dCOPY | 1390 | 1367 | 1383
viennacl: CPU BLAS - dAXPY | 1320 | 1283 | 1327
viennacl: CPU BLAS - dDOT | 896 | 904 | 904
viennacl: CPU BLAS - dGEMV-N | 30.9 | 28.5 | 27.3
viennacl: CPU BLAS - dGEMV-T | 636 | 620 | 653
viennacl: CPU BLAS - dGEMM-NN | 93 | 91.4 | 92.5
viennacl: CPU BLAS - dGEMM-NT | 90.5 | 89.7 | 89.8
viennacl: CPU BLAS - dGEMM-TN | 95.1 | 94.6 | 94.7
viennacl: CPU BLAS - dGEMM-TT | 92.9 | 92.9 | 92.5
gmpbench: Total Time | 4542 | 4546.4 | 4542.4
compress-zstd: 3 - Compression Speed | 5279.4 | 5243.6 | 5250.3
compress-zstd: 8 - Compression Speed | 2510 | 2441.1 | 2386.1
compress-zstd: 8 - Decompression Speed | 2763.5 | 2770.2 | 2768.5
compress-zstd: 19 - Compression Speed | 83.5 | 82.1 | 82.3
compress-zstd: 19 - Decompression Speed | 2559.5 | 2564.8 | 2555.8
compress-zstd: 3, Long Mode - Compression Speed | 410.5 | 404.5 | 408.0
compress-zstd: 3, Long Mode - Decompression Speed | 2892.4 | 2888.0 | 2884.6
compress-zstd: 8, Long Mode - Compression Speed | 444.1 | 444.0 | 473.1
compress-zstd: 8, Long Mode - Decompression Speed | 2986.2 | 2992.6 | 2989.6
compress-zstd: 19, Long Mode - Compression Speed | 41.5 | 41.1 | 41.2
compress-zstd: 19, Long Mode - Decompression Speed | 2552.2 | 2553.7 | 2553.0
botan: KASUMI | 77.653 | 77.661 | 77.663
botan: KASUMI - Decrypt | 75.091 | 75.101 | 75.103
botan: AES-256 | 4560.376 | 4558.626 | 4555.625
botan: AES-256 - Decrypt | 4560.395 | 4559.088 | 4555.311
botan: Twofish | 302.438 | 302.492 | 302.285
botan: Twofish - Decrypt | 302.234 | 302.337 | 302.180
botan: Blowfish | 369.321 | 369.219 | 368.993
botan: Blowfish - Decrypt | 367.42 | 367.797 | 367.804
botan: CAST-256 | 120.055 | 120.065 | 120.055
botan: CAST-256 - Decrypt | 120.066 | 120.058 | 120.096
botan: ChaCha20Poly1305 | 633.174 | 635.007 | 635.338
botan: ChaCha20Poly1305 - Decrypt | 632.656 | 631.089 | 630.505
luaradio: Five Back to Back FIR Filters | 451.2 | 452.0 | 452.2
luaradio: FM Deemphasis Filter | 339.2 | 339.8 | 339.9
luaradio: Hilbert Transform | 82.7 | 82.7 | 82.7
luaradio: Complex Phase | 515.5 | 513.8 | 514.5
gnuradio: Five Back to Back FIR Filters | 331.7 | 323.0 | 322.6
gnuradio: Signal Source (Cosine) | 2913.8 | 2868.2 | 2879.4
gnuradio: FIR Filter | 534.5 | 537.3 | 535.1
gnuradio: IIR Filter | 486.7 | 489.7 | 488.3
gnuradio: FM Deemphasis Filter | 744 | 734.1 | 730.1
gnuradio: Hilbert Transform | 346.5 | 346.1 | 347.1
sysbench: RAM / Memory | 6128.19 | 6104.81 | 6121.17
stockfish: Total Time | 134641813 | 133169802 | 132630158
liquid-dsp: 1 - 256 - 57 | 59654000 | 59569333 | 59520333
liquid-dsp: 2 - 256 - 57 | 118920000 | 119036667 | 119086667
liquid-dsp: 4 - 256 - 57 | 237970000 | 238236667 | 238053333
liquid-dsp: 8 - 256 - 57 | 475650000 | 476116667 | 476420000
liquid-dsp: 16 - 256 - 57 | 937750000 | 936776667 | 936146667
liquid-dsp: 32 - 256 - 57 | 1690600000 | 1693066667 | 1691266667
liquid-dsp: 64 - 256 - 57 | 2715700000 | 2718266667 | 2727266667
liquid-dsp: 128 - 256 - 57 | 3129700000 | 3124233333 | 3119400000
toybrot: TBB | 7471 | 7244 | 7119
toybrot: OpenMP | 7848 | 7941 | 7921
toybrot: C++ Tasks | 7599 | 7579 | 7611
toybrot: C++ Threads | 7207 | 7278 | 7258
onednn: IP Shapes 1D - f32 - CPU | 1.27985 | 1.27544 | 1.28764
onednn: IP Shapes 3D - f32 - CPU | 5.04307 | 5.04320 | 5.05103
onednn: IP Shapes 1D - u8s8f32 - CPU | 1.48314 | 1.47048 | 1.46707
onednn: IP Shapes 3D - u8s8f32 - CPU | 0.998801 | 0.980323 | 0.986806
onednn: Convolution Batch Shapes Auto - f32 - CPU | 1.00901 | 1.00899 | 1.00894
onednn: Deconvolution Batch shapes_1d - f32 - CPU | 1.71993 | 1.74774 | 1.73568
onednn: Deconvolution Batch shapes_3d - f32 - CPU | 2.67016 | 2.67036 | 2.66877
onednn: Convolution Batch Shapes Auto - u8s8f32 - CPU | 3.454 | 3.46141 | 3.45314
onednn: Deconvolution Batch shapes_1d - u8s8f32 - CPU | 0.911802 | 0.872769 | 0.893830
onednn: Deconvolution Batch shapes_3d - u8s8f32 - CPU | 1.22494 | 1.22224 | 1.21841
onednn: Recurrent Neural Network Training - f32 - CPU | 2089.27 | 2098.23 | 2095.26
onednn: Recurrent Neural Network Inference - f32 - CPU | 732.05 | 732.625 | 732.382
onednn: Recurrent Neural Network Training - u8s8f32 - CPU | 2081.53 | 2085.38 | 2122.66
onednn: Recurrent Neural Network Inference - u8s8f32 - CPU | 730.784 | 732.810 | 733.512
onednn: Matrix Multiply Batch Shapes Transformer - f32 - CPU | 0.433838 | 0.446141 | 0.450310
onednn: Recurrent Neural Network Training - bf16bf16bf16 - CPU | 2082.99 | 2085.08 | 2094.12
onednn: Recurrent Neural Network Inference - bf16bf16bf16 - CPU | 734.173 | 732.294 | 732.483
onednn: Matrix Multiply Batch Shapes Transformer - u8s8f32 - CPU | 1.14965 | 1.15166 | 1.15391
incompact3d: X3D-benchmarking input.i3d | 689.604065 | 690.480672 | 691.061747
incompact3d: input.i3d 129 Cells Per Direction | 5.54909182 | 5.96101077 | 5.68558963
incompact3d: input.i3d 193 Cells Per Direction | 25.8832493 | 25.5524209 | 25.6117503
avifenc: 0 | 53.306 | 53.216 | 53.287
avifenc: 2 | 28.412 | 28.391 | 28.414
avifenc: 6 | 10.442 | 10.468 | 10.488
avifenc: 10 | 3.712 | 3.754 | 3.773
avifenc: 6, Lossless | 29.438 | 29.500 | 29.556
avifenc: 10, Lossless | 6.767 | 6.796 | 6.809
build-linux-kernel: Time To Compile | 33.55 | 31.155 | 31.162
build-mesa: Time To Compile | 23.304 | 23.232 | 23.232
build-nodejs: Time To Compile | 128.184 | 128.468 | 128.378
build-erlang: Time To Compile | 157.679 | 157.479 | 157.302
blender: BMW27 - CPU-Only | 38.74 | 38.88 | 39.04
blender: Classroom - CPU-Only | 100.03 | 100.58 | 100.08
blender: Fishy Cat - CPU-Only | 55 | 54.80 | 55.02
blender: Barbershop - CPU-Only | 142.58 | 142.94 | 142.46
blender: Pabellon Barcelona - CPU-Only | 116.52 | 117.14 | 116.66
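To spot results that differ noticeably between the three runs, the run-to-run spread can be computed per row of the summary table. The sketch below uses five rows copied from that table; the `spread` helper and the 5% `THRESHOLD` are illustrative assumptions, not part of the Phoronix Test Suite.

```python
# A few rows from the summary table (runs 1, 2, 3), copied from this result file.
rows = {
    "sysbench: CPU": (96671.09, 96717.84, 96694.41),
    "aom-av1: Speed 8 Realtime - Bosphorus 1080p": (58.37, 54.47, 54.32),
    "svt-vp9: VMAF Optimized - Bosphorus 1080p": (238.14, 361.00, 360.74),
    "viennacl: CPU BLAS - dGEMV-N": (30.9, 28.5, 27.3),
    "botan: AES-256": (4560.376, 4558.626, 4555.625),
}

def spread(values):
    """Gap between the highest and lowest run, as a fraction of the lowest."""
    return (max(values) - min(values)) / min(values)

THRESHOLD = 0.05  # hypothetical cut-off for "worth a second look"

# Keep only the rows whose spread exceeds the cut-off.
flagged = {name: spread(v) for name, v in rows.items() if spread(v) > THRESHOLD}

# Print the flagged rows, largest spread first.
for name, s in sorted(flagged.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {s:.1%}")
```

Because the spread is measured between the highest and lowest value, the same helper works for "more is better" and "fewer is better" tests alike. Of these five rows, only the SVT-VP9 VMAF, ViennaCL dGEMV-N, and AOM AV1 Speed 8 Realtime 1080p results exceed the cut-off.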
Sysbench
This is a benchmark of Sysbench with the built-in CPU and memory sub-tests. Sysbench is a scriptable multi-threaded benchmark tool based on LuaJIT. Learn more via the OpenBenchmarking.org test page.
Sysbench 1.0.20 (Events Per Second, More Is Better)
Test | 1 | 2 | 3 | Reported SE
CPU | 96671.09 | 96717.84 | 96694.41 | SE +/- 9.51, N = 3; SE +/- 20.57, N = 3
(CC) gcc options: -pthread -O2 -funroll-loops -O3 -march=native -rdynamic -ldl -laio -lm
AOM AV1
This is a test of the AOMedia AV1 encoder (libaom) developed by AOMedia and Google. Learn more via the OpenBenchmarking.org test page.
AOM AV1 3.0 (Frames Per Second, More Is Better)
Encoder Mode - Input | 1 | 2 | 3 | Reported SE
Speed 4 Two-Pass - Bosphorus 4K | 3.82 | 3.82 | 3.83 | SE +/- 0.01, N = 3; SE +/- 0.01, N = 3
Speed 6 Realtime - Bosphorus 4K | 11.89 | 11.74 | 11.74 | SE +/- 0.04, N = 3; SE +/- 0.03, N = 3
Speed 6 Two-Pass - Bosphorus 4K | 6.72 | 6.98 | 6.79 | SE +/- 0.01, N = 3; SE +/- 0.04, N = 3
Speed 8 Realtime - Bosphorus 4K | 22.22 | 22.18 | 21.67 | SE +/- 0.20, N = 3; SE +/- 0.30, N = 3
Speed 9 Realtime - Bosphorus 4K | 25.94 | 25.71 | 25.99 | SE +/- 0.21, N = 3; SE +/- 0.12, N = 3
Speed 4 Two-Pass - Bosphorus 1080p | 5.79 | 5.77 | 5.76 | SE +/- 0.01, N = 3; SE +/- 0.01, N = 3
Speed 6 Realtime - Bosphorus 1080p | 18.24 | 18.22 | 18.17 | SE +/- 0.01, N = 3; SE +/- 0.05, N = 3
Speed 6 Two-Pass - Bosphorus 1080p | 16.22 | 16.09 | 16.04 | SE +/- 0.03, N = 3; SE +/- 0.03, N = 3
Speed 8 Realtime - Bosphorus 1080p | 58.37 | 54.47 | 54.32 | SE +/- 0.25, N = 3; SE +/- 0.38, N = 3
Speed 9 Realtime - Bosphorus 1080p | 67.34 | 65.79 | 66.10 | SE +/- 0.53, N = 3; SE +/- 0.03, N = 3
(CXX) g++ options: -O3 -std=c++11 -U_FORTIFY_SOURCE -lm -lpthread
SVT-HEVC
This is a test of the Intel Open Visual Cloud Scalable Video Technology SVT-HEVC CPU-based multi-threaded video encoder for the HEVC / H.265 video format with a sample 1080p YUV video file. Learn more via the OpenBenchmarking.org test page.
SVT-HEVC 1.5.0 (Frames Per Second, More Is Better)
Tuning - Input | 1 | 2 | 3 | Reported SE
1 - Bosphorus 1080p | 34.41 | 34.59 | 34.45 | SE +/- 0.08, N = 3; SE +/- 0.05, N = 3
7 - Bosphorus 1080p | 277.91 | 276.07 | 279.74 | SE +/- 0.37, N = 3; SE +/- 1.54, N = 3
10 - Bosphorus 1080p | 462.61 | 460.86 | 462.91 | SE +/- 2.49, N = 3; SE +/- 3.83, N = 3
(CC) gcc options: -O3 -march=native -fPIE -fPIC -O2 -pie -rdynamic -lpthread -lrt
SVT-VP9
This is a test of the Intel Open Visual Cloud Scalable Video Technology SVT-VP9 CPU-based multi-threaded video encoder for the VP9 video format with a sample YUV input video file. Learn more via the OpenBenchmarking.org test page.
SVT-VP9 0.3 (Frames Per Second, More Is Better)
Tuning - Input | 1 | 2 | 3 | Reported SE
VMAF Optimized - Bosphorus 1080p | 238.14 | 361.00 | 360.74 | SE +/- 0.55, N = 3; SE +/- 0.35, N = 3
PSNR/SSIM Optimized - Bosphorus 1080p | 362.92 | 361.43 | 363.64 | SE +/- 1.33, N = 3; SE +/- 3.53, N = 3
Visual Quality Optimized - Bosphorus 1080p | 279.10 | 273.59 | 277.87 | SE +/- 2.47, N = 3; SE +/- 3.78, N = 3
(CC) gcc options: -O3 -fcommon -march=native -fPIE -fPIC -fvisibility=hidden -pie -rdynamic -lpthread -lrt -lm
simdjson
This is a benchmark of SIMDJSON, a high performance JSON parser. SIMDJSON aims to be the fastest JSON parser and is used by projects like Microsoft FishStore, Yandex ClickHouse, Shopify, and others. Learn more via the OpenBenchmarking.org test page.
simdjson 0.8.2 (GB/s, More Is Better)
Throughput Test | 1 | 2 | 3 | Reported SE
Kostya | 2.12 | 2.13 | 2.13 | SE +/- 0.00, N = 3; SE +/- 0.00, N = 3
LargeRandom | 0.66 | 0.66 | 0.66 | SE +/- 0.00, N = 3; SE +/- 0.00, N = 3
PartialTweets | 3.55 | 3.53 | 3.53 | SE +/- 0.01, N = 3; SE +/- 0.00, N = 3
DistinctUserID | 3.59 | 3.61 | 3.59 | SE +/- 0.00, N = 3; SE +/- 0.01, N = 3
(CXX) g++ options: -O3 -march=native -pthread
ViennaCL 1.7.1 (More Is Better)
Test | Unit | 1 | 2 | 3 | Reported SE
CPU BLAS - sAXPY | GB/s | 687 | 679 | 688 | SE +/- 6.89, N = 3; SE +/- 0.58, N = 3
CPU BLAS - sDOT | GB/s | 461 | 507 | 479 | SE +/- 27.14, N = 3; SE +/- 13.58, N = 3
CPU BLAS - dCOPY | GB/s | 1390 | 1367 | 1383 | SE +/- 14.53, N = 3; SE +/- 8.82, N = 3
CPU BLAS - dAXPY | GB/s | 1320 | 1283 | 1327 | SE +/- 41.77, N = 3; SE +/- 3.33, N = 3
CPU BLAS - dDOT | GB/s | 896 | 904 | 904 | SE +/- 6.84, N = 3
CPU BLAS - dGEMV-N | GB/s | 30.9 | 28.5 | 27.3 | SE +/- 0.47, N = 3; SE +/- 1.05, N = 3
CPU BLAS - dGEMV-T | GB/s | 636 | 620 | 653 | SE +/- 14.00, N = 3; SE +/- 7.86, N = 3
CPU BLAS - dGEMM-NN | GFLOPs/s | 93.0 | 91.4 | 92.5 | SE +/- 0.35, N = 3; SE +/- 0.17, N = 3
CPU BLAS - dGEMM-NT | GFLOPs/s | 90.5 | 89.7 | 89.8 | SE +/- 0.15, N = 3; SE +/- 0.15, N = 3
CPU BLAS - dGEMM-TN | GFLOPs/s | 95.1 | 94.6 | 94.7 | SE +/- 0.03, N = 3; SE +/- 0.15, N = 3
CPU BLAS - dGEMM-TT | GFLOPs/s | 92.9 | 92.9 | 92.5 | SE +/- 0.20, N = 2; SE +/- 0.30, N = 3
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
Zstd Compression
This test measures the time needed to compress/decompress a sample file (a FreeBSD disk image - FreeBSD-12.2-RELEASE-amd64-memstick.img) using Zstd compression with options for different compression levels / settings. Learn more via the OpenBenchmarking.org test page.
Zstd Compression 1.4.9 (MB/s, More Is Better)
Compression Level | 1 | 2 | 3 | Reported SE
3 - Compression Speed | 5279.4 | 5243.6 | 5250.3 | SE +/- 2.38, N = 3; SE +/- 3.83, N = 3
8 - Compression Speed | 2510.0 | 2441.1 | 2386.1 | SE +/- 26.18, N = 3; SE +/- 11.70, N = 3
8 - Decompression Speed | 2763.5 | 2770.2 | 2768.5 | SE +/- 1.83, N = 3; SE +/- 5.72, N = 3
19 - Compression Speed | 83.5 | 82.1 | 82.3 | SE +/- 0.44, N = 3; SE +/- 1.26, N = 3
19 - Decompression Speed | 2559.5 | 2564.8 | 2555.8 | SE +/- 5.02, N = 3; SE +/- 7.03, N = 3
3, Long Mode - Compression Speed | 410.5 | 404.5 | 408.0 | SE +/- 6.05, N = 3; SE +/- 3.33, N = 3
3, Long Mode - Decompression Speed | 2892.4 | 2888.0 | 2884.6 | SE +/- 0.59, N = 3; SE +/- 3.16, N = 3
8, Long Mode - Compression Speed | 444.1 | 444.0 | 473.1 | SE +/- 0.80, N = 3; SE +/- 8.81, N = 15
8, Long Mode - Decompression Speed | 2986.2 | 2992.6 | 2989.6 | SE +/- 5.14, N = 3; SE +/- 2.45, N = 15
19, Long Mode - Compression Speed | 41.5 | 41.1 | 41.2 | SE +/- 0.09, N = 3; SE +/- 0.23, N = 3
19, Long Mode - Decompression Speed | 2552.2 | 2553.7 | 2553.0 | SE +/- 3.23, N = 3; SE +/- 1.71, N = 3
(CC) gcc options: -O3 -march=native -pthread -lz -llzma
Botan
Botan is a BSD-licensed cross-platform open-source C++ crypto library "cryptography toolkit" that supports most publicly known cryptographic algorithms. The project's stated goal is to be "the best option for cryptography in C++ by offering the tools necessary to implement a range of practical systems, such as TLS protocol, X.509 certificates, modern AEAD ciphers, PKCS#11 and TPM hardware support, password hashing, and post quantum crypto schemes." Learn more via the OpenBenchmarking.org test page.
Botan 2.17.3 (MiB/s, More Is Better)
Test | 1 | 2 | 3 | Reported SE
KASUMI | 77.65 | 77.66 | 77.66 | SE +/- 0.01, N = 3; SE +/- 0.01, N = 3
KASUMI - Decrypt | 75.09 | 75.10 | 75.10 | SE +/- 0.01, N = 3; SE +/- 0.03, N = 3
AES-256 | 4560.38 | 4558.63 | 4555.63 | SE +/- 0.67, N = 3; SE +/- 4.16, N = 3
AES-256 - Decrypt | 4560.40 | 4559.09 | 4555.31 | SE +/- 1.20, N = 3; SE +/- 5.30, N = 3
Twofish | 302.44 | 302.49 | 302.29 | SE +/- 0.02, N = 3; SE +/- 0.11, N = 3
Twofish - Decrypt | 302.23 | 302.34 | 302.18 | SE +/- 0.03, N = 3; SE +/- 0.04, N = 3
Blowfish | 369.32 | 369.22 | 368.99 | SE +/- 0.12, N = 3; SE +/- 0.14, N = 3
Blowfish - Decrypt | 367.42 | 367.80 | 367.80 | SE +/- 0.08, N = 3; SE +/- 0.08, N = 3
CAST-256 | 120.06 | 120.07 | 120.06 | SE +/- 0.02, N = 3; SE +/- 0.01, N = 3
CAST-256 - Decrypt | 120.07 | 120.06 | 120.10 | SE +/- 0.01, N = 3; SE +/- 0.02, N = 3
ChaCha20Poly1305 | 633.17 | 635.01 | 635.34 | SE +/- 0.48, N = 3; SE +/- 0.61, N = 3
ChaCha20Poly1305 - Decrypt | 632.66 | 631.09 | 630.51 | SE +/- 0.23, N = 3; SE +/- 0.53, N = 3
(CXX) g++ options: -fstack-protector -m64 -pthread -lbotan-2 -ldl -lrt
LuaRadio
LuaRadio is a lightweight software-defined radio (SDR) framework built atop LuaJIT. LuaRadio provides a suite of source, sink, and processing blocks, with a simple API for defining flow graphs, running flow graphs, creating blocks, and creating data types. Learn more via the OpenBenchmarking.org test page.
LuaRadio 0.9.1 (MiB/s, More Is Better)
Test | 1 | 2 | 3 | Reported SE
Five Back to Back FIR Filters | 451.2 | 452.0 | 452.2 | SE +/- 0.38, N = 3; SE +/- 1.34, N = 3
GNU Radio (MiB/s, More Is Better)
Test | 1 | 2 | 3 | Reported SE
Signal Source (Cosine) | 2913.8 | 2868.2 | 2879.4 | SE +/- 4.36, N = 3; SE +/- 27.86, N = 3
FIR Filter | 534.5 | 537.3 | 535.1 | SE +/- 0.95, N = 3; SE +/- 0.73, N = 3
IIR Filter | 486.7 | 489.7 | 488.3 | SE +/- 1.77, N = 3; SE +/- 0.90, N = 3
FM Deemphasis Filter | 744.0 | 734.1 | 730.1 | SE +/- 2.41, N = 3; SE +/- 5.82, N = 3
Hilbert Transform | 346.5 | 346.1 | 347.1 | SE +/- 0.32, N = 3; SE +/- 2.03, N = 3
Footnote: 3.8.1.0
Sysbench
This is a benchmark of Sysbench with the built-in CPU and memory sub-tests. Sysbench is a scriptable multi-threaded benchmark tool based on LuaJIT. Learn more via the OpenBenchmarking.org test page.
Sysbench 1.0.20 (MiB/sec, More Is Better)
Test | 1 | 2 | 3 | Reported SE
RAM / Memory | 6128.19 | 6104.81 | 6121.17 | SE +/- 9.71, N = 3; SE +/- 9.05, N = 3
(CC) gcc options: -pthread -O2 -funroll-loops -O3 -march=native -rdynamic -ldl -laio -lm
Stockfish
This is a test of Stockfish, an advanced open-source C++11 chess benchmark that can scale up to 512 CPU threads. Learn more via the OpenBenchmarking.org test page.
Stockfish 13 (Nodes Per Second, More Is Better)
Test | 1 | 2 | 3 | Reported SE
Total Time | 134641813 | 133169802 | 132630158 | SE +/- 1363379.57, N = 8; SE +/- 1858448.97, N = 4
(CXX) g++ options: -fprofile-use -m64 -lpthread -O3 -march=native -fno-exceptions -std=c++17 -pedantic -msse -msse3 -mpopcnt -mavx2 -msse4.1 -mssse3 -msse2 -flto
Liquid-DSP
LiquidSDR's Liquid-DSP is a software-defined radio (SDR) digital signal processing library. This test profile runs a multi-threaded benchmark of this SDR/DSP library focused on embedded platform usage. Learn more via the OpenBenchmarking.org test page.
Liquid-DSP 2021.01.31, Buffer Length: 256, Filter Length: 57 (samples/s, More Is Better)
Threads | 1 | 2 | 3 | Reported SE
1 | 59654000 | 59569333 | 59520333 | SE +/- 43364.09, N = 3; SE +/- 66338.36, N = 3
2 | 118920000 | 119036667 | 119086667 | SE +/- 28480.01, N = 3; SE +/- 60092.52, N = 3
4 | 237970000 | 238236667 | 238053333 | SE +/- 92616.29, N = 3; SE +/- 116237.31, N = 3
8 | 475650000 | 476116667 | 476420000 | SE +/- 493637.29, N = 3; SE +/- 230289.67, N = 3
16 | 937750000 | 936776667 | 936146667 | SE +/- 935420.29, N = 3; SE +/- 1134097.78, N = 3
32 | 1690600000 | 1693066667 | 1691266667 | SE +/- 88191.71, N = 3; SE +/- 218581.28, N = 3
64 | 2715700000 | 2718266667 | 2727266667 | SE +/- 4836091.17, N = 3; SE +/- 1003881.36, N = 3
128 | 3129700000 | 3124233333 | 3119400000 | SE +/- 2649108.86, N = 3; SE +/- 503322.30, N = 3
(CC) gcc options: -O3 -march=native -pthread -lm -lc -lliquid
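The Liquid-DSP results cover thread counts from 1 to 128, so they can be read as a scaling curve. As a small sketch, the run 1 throughput figures reported above can be turned into speedup and parallel efficiency relative to the single-thread result; the `scaling` helper is a hypothetical name introduced here for illustration.

```python
# Run 1 Liquid-DSP throughput (samples/s) by thread count, copied from this result file.
throughput = {
    1: 59654000,
    2: 118920000,
    4: 237970000,
    8: 475650000,
    16: 937750000,
    32: 1690600000,
    64: 2715700000,
    128: 3129700000,
}

def scaling(table):
    """Return {threads: (speedup, efficiency)} relative to the 1-thread result."""
    base = table[1]
    return {t: (v / base, v / base / t) for t, v in table.items()}

report = scaling(throughput)
for threads, (speedup, efficiency) in report.items():
    print(f"{threads:>3} threads: {speedup:6.2f}x speedup, {efficiency:.0%} efficiency")
```

On this 64-core / 128-thread processor, throughput stays close to linear through 16 threads and falls away beyond 32; the 128-thread result is about 52 times the single-thread result.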
oneDNN This is a test of the Intel oneDNN as an Intel-optimized library for Deep Neural Networks and making use of its built-in benchdnn functionality. The result is the total perf time reported. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and MKL-DNN before being rebranded as part of the Intel oneAPI initiative. Learn more via the OpenBenchmarking.org test page.
oneDNN 2.1.2 - Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU (ms, Fewer Is Better). Results for 1, 2, 3: 1.27985, 1.27544, 1.28764. SE +/- 0.00139, N = 3; SE +/- 0.01102, N = 3. MIN: 1.22, 1.22, 1.22.
oneDNN 2.1.2 - Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU (ms, Fewer Is Better). Results for 1, 2, 3: 5.04307, 5.04320, 5.05103. SE +/- 0.00358, N = 3; SE +/- 0.00299, N = 3. MIN: 4.9, 4.85, 4.87.
oneDNN 2.1.2 - Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better). Results for 1, 2, 3: 1.48314, 1.47048, 1.46707. SE +/- 0.00614, N = 3; SE +/- 0.00090, N = 3. MIN: 1.24, 1.21, 1.21.
oneDNN 2.1.2 - Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better). Results for 1, 2, 3: 0.998801, 0.980323, 0.986806. SE +/- 0.001130, N = 3; SE +/- 0.001469, N = 3. MIN: 0.94, 0.94, 0.9.
oneDNN 2.1.2 - Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU (ms, Fewer Is Better). Results for 1, 2, 3: 1.00901, 1.00899, 1.00894. SE +/- 0.00095, N = 3; SE +/- 0.00167, N = 3. MIN: 0.97, 0.97, 0.97.
oneDNN 2.1.2 - Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU (ms, Fewer Is Better). Results for 1, 2, 3: 1.71993, 1.74774, 1.73568. SE +/- 0.00765, N = 3; SE +/- 0.00100, N = 3. MIN: 1.64, 1.65, 1.65.
oneDNN 2.1.2 - Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU (ms, Fewer Is Better). Results for 1, 2, 3: 2.67016, 2.67036, 2.66877. SE +/- 0.00818, N = 3; SE +/- 0.00016, N = 3. MIN: 2.51, 2.5, 2.48.
oneDNN 2.1.2 - Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better). Results for 1, 2, 3: 3.45400, 3.46141, 3.45314. SE +/- 0.00371, N = 3; SE +/- 0.00642, N = 3. MIN: 3.4, 3.38, 3.38.
oneDNN 2.1.2 - Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better). Results for 1, 2, 3: 0.911802, 0.872769, 0.893830. SE +/- 0.011992, N = 4; SE +/- 0.007517, N = 3. MIN: 0.81, 0.8, 0.81.
oneDNN 2.1.2 - Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better). Results for 1, 2, 3: 1.22494, 1.22224, 1.21841. SE +/- 0.00091, N = 3; SE +/- 0.00239, N = 3. MIN: 1.15, 1.13, 1.12.
oneDNN 2.1.2 - Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU (ms, Fewer Is Better). Results for 1, 2, 3: 2089.27, 2098.23, 2095.26. SE +/- 11.83, N = 3; SE +/- 3.90, N = 3. MIN: 2067.67, 2065.31, 2072.7.
oneDNN 2.1.2 - Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU (ms, Fewer Is Better). Results for 1, 2, 3: 732.05, 732.63, 732.38. SE +/- 1.26, N = 3; SE +/- 1.01, N = 3. MIN: 720.72, 720.33, 718.93.
oneDNN 2.1.2 - Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better). Results for 1, 2, 3: 2081.53, 2085.38, 2122.66. SE +/- 3.09, N = 3; SE +/- 10.59, N = 3. MIN: 2065.05, 2066.13, 2083.97.
oneDNN 2.1.2 - Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better). Results for 1, 2, 3: 730.78, 732.81, 733.51. SE +/- 1.60, N = 3; SE +/- 1.21, N = 3. MIN: 719.75, 720.04, 720.15.
oneDNN 2.1.2 - Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU (ms, Fewer Is Better). Results for 1, 2, 3: 0.433838, 0.446141, 0.450310. SE +/- 0.001158, N = 3; SE +/- 0.001517, N = 3. MIN: 0.39, 0.38, 0.39.
oneDNN 2.1.2 - Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better). Results for 1, 2, 3: 2082.99, 2085.08, 2094.12. SE +/- 1.02, N = 3; SE +/- 6.39, N = 3. MIN: 2067.76, 2067.79, 2069.05.
oneDNN 2.1.2 - Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better). Results for 1, 2, 3: 734.17, 732.29, 732.48. SE +/- 0.35, N = 3; SE +/- 0.14, N = 3. MIN: 722.64, 720.4, 720.58.
oneDNN 2.1.2 - Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better). Results for 1, 2, 3: 1.14965, 1.15166, 1.15391. SE +/- 0.00070, N = 3; SE +/- 0.00189, N = 3. MIN: 1.09, 1.08, 1.09.
(CXX) g++ options for all oneDNN 2.1.2 results above: -O3 -march=native -std=c++11 -fopenmp=libomp -msse4.1 -fPIC -pie -lpthread -ldl
Xcompact3d Incompact3d
Xcompact3d Incompact3d is a Fortran-MPI based, finite-difference, high-performance code for solving the incompressible Navier-Stokes equations together with as many scalar transport equations as needed. Learn more via the OpenBenchmarking.org test page.
Xcompact3d Incompact3d 2021-03-11 - Input: X3D-benchmarking input.i3d (Seconds, Fewer Is Better). Results for 1, 2, 3: 689.60, 690.48, 691.06. SE +/- 0.64, N = 3; SE +/- 0.84, N = 3.
Xcompact3d Incompact3d 2021-03-11 - Input: input.i3d 129 Cells Per Direction (Seconds, Fewer Is Better). Results for 1, 2, 3: 5.54909182, 5.96101077, 5.68558963. SE +/- 0.08892473, N = 3; SE +/- 0.08476458, N = 3.
Xcompact3d Incompact3d 2021-03-11 - Input: input.i3d 193 Cells Per Direction (Seconds, Fewer Is Better). Results for 1, 2, 3: 25.88, 25.55, 25.61. SE +/- 0.22, N = 3; SE +/- 0.28, N = 3.
(F9X) gfortran options for all Xcompact3d Incompact3d results above: -cpp -O2 -funroll-loops -floop-optimize -fcray-pointer -fbacktrace -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi
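To judge how much the three results differ from one another, the Xcompact3d timings above can be reduced to a simple spread figure. This is a minimal sketch under the assumption that the three values per input belong to results 1, 2 and 3 in the order listed; `spread_percent` and `runs` are illustrative names, and the range-over-mean metric is one choice among several.

```python
def spread_percent(values):
    """Range of the values (max - min) as a percentage of their mean."""
    avg = sum(values) / len(values)
    return (max(values) - min(values)) / avg * 100.0


# Wall-clock seconds copied from the Xcompact3d Incompact3d results above.
runs = {
    "X3D-benchmarking input.i3d": [689.60, 690.48, 691.06],
    "input.i3d 129 Cells Per Direction": [5.54909182, 5.96101077, 5.68558963],
    "input.i3d 193 Cells Per Direction": [25.88, 25.55, 25.61],
}

for name, seconds in runs.items():
    print(f"{name}: spread {spread_percent(seconds):.2f}% of the mean")
```

The shortest input shows by far the largest relative spread, which is typical of benchmarks that finish in a few seconds: fixed startup costs and scheduling noise make up a larger share of the measured time.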
Blender
Blender is an open-source 3D creation and modeling software project. This test runs Blender's Cycles benchmark with various sample files. GPU computing via OpenCL, NVIDIA OptiX, and NVIDIA CUDA is supported. Learn more via the OpenBenchmarking.org test page.
Blender 2.92 - Blend File: BMW27 - Compute: CPU-Only (Seconds, Fewer Is Better). Results for 1, 2, 3: 38.74, 38.88, 39.04. SE +/- 0.11, N = 3; SE +/- 0.24, N = 3.
Blender 2.92 - Blend File: Classroom - Compute: CPU-Only (Seconds, Fewer Is Better). Results for 1, 2, 3: 100.03, 100.58, 100.08. SE +/- 0.60, N = 3; SE +/- 0.04, N = 3.
Blender 2.92 - Blend File: Fishy Cat - Compute: CPU-Only (Seconds, Fewer Is Better). Results for 1, 2, 3: 55.00, 54.80, 55.02. SE +/- 0.05, N = 3; SE +/- 0.03, N = 3.
Blender 2.92 - Blend File: Barbershop - Compute: CPU-Only (Seconds, Fewer Is Better). Results for 1, 2, 3: 142.58, 142.94, 142.46. SE +/- 0.05, N = 3; SE +/- 0.10, N = 3.
Blender 2.92 - Blend File: Pabellon Barcelona - Compute: CPU-Only (Seconds, Fewer Is Better). Results for 1, 2, 3: 116.52, 117.14, 116.66. SE +/- 0.36, N = 3; SE +/- 0.22, N = 3.
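A single per-result figure for the five Blender scenes can be obtained with a geometric mean, which keeps the long Barbershop render from dominating the short BMW27 one. The sketch below is an illustration rather than something the result file itself reports; it assumes the three values per scene belong to results 1, 2 and 3 in the order listed, and `blender_seconds` and `per_run_geomean` are hypothetical names.

```python
from statistics import geometric_mean  # available since Python 3.8

# Render times in seconds, copied from the Blender 2.92 CPU-Only results above.
blender_seconds = {
    "BMW27": [38.74, 38.88, 39.04],
    "Classroom": [100.03, 100.58, 100.08],
    "Fishy Cat": [55.00, 54.80, 55.02],
    "Barbershop": [142.58, 142.94, 142.46],
    "Pabellon Barcelona": [116.52, 117.14, 116.66],
}


def per_run_geomean(table):
    """Geometric mean across all scenes, computed separately for each result column."""
    n_runs = len(next(iter(table.values())))
    return [
        geometric_mean([times[i] for times in table.values()])
        for i in range(n_runs)
    ]


for run_id, gmean in enumerate(per_run_geomean(blender_seconds), start=1):
    print(f"Result {run_id}: geometric mean render time {gmean:.2f} s")
```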
1: Testing initiated at 3 April 2021 18:15 by user phoronix.
2: Testing initiated at 3 April 2021 20:11 by user phoronix.
3: Testing initiated at 4 April 2021 05:18 by user phoronix.