Tests for a future article: AMD EPYC 9384X 32-Core testing with an AMD Titanite_4G (RTI1007B BIOS) motherboard and ASPEED graphics on Ubuntu 22.04 via the Phoronix Test Suite.
E
Kernel Notes: Transparent Huge Pages: madvise
Compiler Notes: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-11-gBFGDP/gcc-11-11.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-gBFGDP/gcc-11-11.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Notes: Scaling Governor: amd-pstate performance (Boost: Enabled) - CPU Microcode: 0xa101121
Java Notes: OpenJDK Runtime Environment (build 11.0.16+8-post-Ubuntu-0ubuntu122.04)
Python Notes: Python 3.10.6
Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: always-on RSB filling + srbds: Not affected + tsx_async_abort: Not affected
F
Processor: AMD EPYC 9384X 32-Core @ 3.91GHz (32 Cores / 64 Threads), Motherboard: AMD Titanite_4G (RTI1007B BIOS), Chipset: AMD Device 14a4, Memory: 768GB, Disk: 3841GB Micron_9300_MTFDHAL3T8TDP, Graphics: ASPEED, Network: Broadcom NetXtreme BCM5720 PCIe
OS: Ubuntu 22.04, Kernel: 5.15.0-47-generic (x86_64), Desktop: GNOME Shell 42.4, Display Server: X Server 1.21.1.3, Vulkan: 1.2.204, Compiler: GCC 11.2.0, File-System: ext4, Screen Resolution: 1024x768
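The kernel note "Transparent Huge Pages: madvise" reflects the THP mode the kernel was running with. On Linux that mode can be read from sysfs, where the active choice is shown in brackets (for example `always [madvise] never`). A minimal Python sketch, assuming the standard `/sys/kernel/mm/transparent_hugepage/enabled` path; the helper name `active_thp_mode` is our own:

```python
from pathlib import Path

# Standard sysfs location of the THP mode on Linux; the active mode is bracketed,
# e.g. "always [madvise] never".
THP_PATH = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def active_thp_mode(text: str) -> str:
    """Return the bracketed (active) entry from a THP sysfs string."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active THP mode found in {text!r}")


if __name__ == "__main__":
    try:
        print(active_thp_mode(THP_PATH.read_text()))
    except OSError:
        # Not Linux, or sysfs is not readable in this environment.
        print("THP sysfs file not available on this system")
```

The same bracket convention is used by several other sysfs knobs, so the parsing helper is deliberately independent of the file it reads.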
Bergamo Extra Benchmarks
E vs. F Comparison (Phoronix Test Suite): largest differences between the two runs. Each percentage is the leading run's advantage over the other; the leading run is given in parentheses.
SVT-AV1, Preset 8 - Bosphorus 4K: 25.4% (F)
Stress-NG, AVX-512 VNNI: 17.9% (E)
NAS Parallel Benchmarks, EP.D: 17.5% (F)
Stress-NG, Cloning: 15.3% (F)
Apache IoTDB, 200 - 1 - 500 (latency): 10.2% (E)
Stress-NG, Pipe: 9.5% (F)
Apache IoTDB, 100 - 1 - 500 (latency): 8.9% (E)
Apache IoTDB, 200 - 1 - 500 (throughput): 8.7% (E)
Apache IoTDB, 500 - 100 - 500 (latency): 8.6% (F)
ASKAP, tConvolve OpenMP - Gridding: 8.3% (E)
Apache IoTDB, 200 - 100 - 200 (latency): 7.3% (F)
Apache IoTDB, 200 - 1 - 200 (latency): 6.5% (E)
Apache IoTDB, 100 - 1 - 500 (throughput): 6.3% (E)
Apache IoTDB, 200 - 100 - 200 (throughput): 6.2% (F)
Apache IoTDB, 500 - 100 - 500 (throughput): 5.7% (F)
NAS Parallel Benchmarks, MG.C: 5.5% (E)
Stress-NG, Futex: 5% (E)
Apache IoTDB, 200 - 1 - 200 (throughput): 4.6% (E)
Apache IoTDB, 100 - 1 - 200 (latency): 4.3% (F)
NAS Parallel Benchmarks, CG.C: 4.2% (F)
Stress-NG, SENDFILE: 3.9% (F)
Apache IoTDB, 100 - 100 - 500 (throughput): 3.7% (F)
NAS Parallel Benchmarks, LU.C: 3.6% (E)
Stress-NG, Semaphores: 3.3% (E)
Apache IoTDB, 100 - 100 - 500 (latency): 3.2% (F)
Apache IoTDB, 100 - 100 - 200 (latency): 3.2% (F)
Apache IoTDB, 100 - 100 - 200 (throughput): 2.7% (F)
Apache IoTDB, 500 - 1 - 500 (throughput): 2.7% (F)
Stress-NG, MMAP: 2.5% (E)
Apache IoTDB, 100 - 1 - 200 (throughput): 2.5% (F)
NCNN, CPU - blazeface: 2.5% (F)
SVT-AV1, Preset 12 - Bosphorus 1080p: 2.1% (E)
Apache IoTDB, 200 - 100 - 500 (latency): 2.1% (F)
Apache IoTDB, 500 - 1 - 500 (latency): 2% (F)
NCNN, CPU - vgg16: 2% (E)
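The comparison percentages are consistent with dividing the larger result by the smaller one and subtracting one, with the leader decided by whether higher or lower values are better for that test. A small sketch of that calculation (our reconstruction, not the Phoronix Test Suite's own code; `lead_percent` is a hypothetical helper name), checked against a few of the charted entries:

```python
def lead_percent(e: float, f: float, higher_is_better: bool = True) -> tuple[str, float]:
    """Return the leading run ("E", "F", or "tie") and its advantage in percent.

    The advantage is the larger result divided by the smaller one, minus one,
    rounded to one decimal place. Which run leads depends on whether higher or
    lower values are better for the test.
    """
    if e <= 0 or f <= 0:
        raise ValueError("benchmark results must be positive")
    if e == f:
        return "tie", 0.0
    if higher_is_better:
        leader = "E" if e > f else "F"
    else:
        leader = "E" if e < f else "F"
    ratio = max(e, f) / min(e, f)
    return leader, round((ratio - 1.0) * 100.0, 1)


if __name__ == "__main__":
    # SVT-AV1 Preset 8 - Bosphorus 4K, frames per second (higher is better)
    print(lead_percent(68.736, 86.174))
    # NCNN blazeface, ms (lower is better)
    print(lead_percent(4.13, 4.03, higher_is_better=False))
```

Running it prints `('F', 25.4)` and `('F', 2.5)`, matching the SVT-AV1 Preset 8 4K and NCNN blazeface entries. Using max/min for the ratio keeps the percentage symmetric, so the same function works for throughput and latency metrics.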
All results (raw values; units and whether higher or lower is better vary by test)
Test | E | F
stress-ng: Hash | 7179982.15 | 7192739.29
stress-ng: MMAP | 625.09 | 609.72
stress-ng: NUMA | 647.21 | 645.72
stress-ng: Pipe | 18207317.96 | 19946044.78
stress-ng: Poll | 4343727.88 | 4271308.93
stress-ng: Zlib | 3889.71 | 3893.79
stress-ng: Futex | 2852066.71 | 2717400.09
stress-ng: MEMFD | 415.36 | 411.49
stress-ng: Mutex | 23255699.79 | 22935233.04
stress-ng: Atomic | 247.05 | 246.37
stress-ng: Crypto | 79338.51 | 79616.41
stress-ng: Malloc | 159403243.38 | 158836969.43
stress-ng: Cloning | 6870.08 | 7924.08
stress-ng: Forking | 26368.36 | 26317.3
stress-ng: Pthread | 104885.48 | 104500.93
stress-ng: AVL Tree | 693.33 | 694.87
stress-ng: IO_uring | 5064110.48 | 5015105.12
stress-ng: SENDFILE | 649786.78 | 675353.65
stress-ng: CPU Cache | 1117509.74 | 1117723.88
stress-ng: CPU Stress | 76099.17 | 76433.87
stress-ng: Semaphores | 81406001.02 | 78775153.73
stress-ng: Matrix Math | 166863.57 | 166859.04
stress-ng: Vector Math | 214246.51 | 214233.97
stress-ng: AVX-512 VNNI | 4093266.57 | 3473026.51
stress-ng: Function Call | 25044.49 | 25183.91
stress-ng: x86_64 RdRand | 11942773.88 | 11878217.33
stress-ng: Floating Point | 11457.56 | 11511.71
stress-ng: Matrix 3D Math | 28914.23 | 28742.76
stress-ng: Memory Copying | 12746.29 | 12744.05
stress-ng: Vector Shuffle | 24688.71 | 24689.76
stress-ng: Mixed Scheduler | 33179 | 33150.73
stress-ng: Socket Activity | 31093.74 | 31108.14
stress-ng: Wide Vector Math | 1415448.85 | 1409726.96
stress-ng: Context Switching | 14202532.49 | 14196798.48
stress-ng: Fused Multiply-Add | 30155491.73 | 30114608.52
stress-ng: Vector Floating Point | 101972.69 | 101749.29
stress-ng: Glibc C String Functions | 31352208.83 | 31613431.85
stress-ng: Glibc Qsort Data Sorting | 818.41 | 818.44
stress-ng: System V Message Passing | 7967662.12 | 7980751.91
ncnn: CPU - mobilenet | 17.4 | 17.26
ncnn: CPU-v2-v2 - mobilenet-v2 | 8.31 | 8.27
ncnn: CPU-v3-v3 - mobilenet-v3 | 8.49 | 8.48
ncnn: CPU - shufflenet-v2 | 10.01 | 9.92
ncnn: CPU - mnasnet | 7.43 | 7.44
ncnn: CPU - efficientnet-b0 | 10.74 | 10.67
ncnn: CPU - blazeface | 4.13 | 4.03
ncnn: CPU - googlenet | 20.67 | 20.55
ncnn: CPU - vgg16 | 26.04 | 26.57
ncnn: CPU - resnet18 | 9.07 | 9.12
ncnn: CPU - alexnet | 5.91 | 5.87
ncnn: CPU - resnet50 | 17.9 | 18.01
ncnn: CPU - yolov4-tiny | 26.35 | 26.32
ncnn: CPU - squeezenet_ssd | 16.55 | 16.41
ncnn: CPU - regnety_400m | 24.53 | 24.3
ncnn: CPU - vision_transformer | 45 | 45.62
ncnn: CPU - FastestDet | 12.67 | 12.59
npb: BT.C | 130842.77 | 130890.96
npb: EP.C | 4107.38 | 4134.38
npb: EP.D | 3555.93 | 4177.87
npb: FT.C | 91806.3 | 91355.37
npb: LU.C | 195117.13 | 188290.63
npb: SP.B | 119825.49 | 119401.29
npb: SP.C | 105770.9 | 105402.56
npb: IS.D | 4202.57 | 4255.49
npb: MG.C | 121134.91 | 114790.76
npb: CG.C | 47029.42 | 49017.63
askap: tConvolve MPI - Degridding | 23854.1 | 23854.1
askap: tConvolve MPI - Gridding | 30870.1 | 30870.1
askap: tConvolve OpenMP - Gridding | 22188 | 20481.2
askap: tConvolve OpenMP - Degridding | 33282 | 33282
askap: tConvolve MT - Gridding | 11262.6 | 11330
libxsmm: 32 | 981.1 | 967.1
askap: tConvolve MT - Degridding | 15576.2 | 15633.4
askap: Hogbom Clean OpenMP | 1149.43 | 1149.43
minife: Small | 57367.1 | 57438.2
incompact3d: input.i3d 129 Cells Per Direction | 3.76278901 | 3.82253599
incompact3d: input.i3d 193 Cells Per Direction | 12.3650045 | 12.255085
incompact3d: X3D-benchmarking input.i3d | 426.737732 | 422.494141
compress-7zip: Compression Rating | 301517 | 302670
compress-7zip: Decompression Rating | 248227 | 246009
build-linux-kernel: defconfig | 31.94 | 32.065
build-linux-kernel: allmodconfig | 334.195 | 333.555
kvazaar: Bosphorus 1080p - Slow | 69.99 | 70
kvazaar: Bosphorus 1080p - Medium | 73.29 | 72.72
kvazaar: Bosphorus 1080p - Very Fast | 162.16 | 161.28
kvazaar: Bosphorus 1080p - Super Fast | 224.67 | 223.65
kvazaar: Bosphorus 1080p - Ultra Fast | 263.45 | 266.37
kvazaar: Bosphorus 4K - Slow | 22.7 | 22.93
kvazaar: Bosphorus 4K - Medium | 23.49 | 23.49
kvazaar: Bosphorus 4K - Very Fast | 48.9 | 48.77
kvazaar: Bosphorus 4K - Super Fast | 61.87 | 61.71
kvazaar: Bosphorus 4K - Ultra Fast | 72.57 | 72.53
svt-av1: Preset 8 - Bosphorus 4K | 68.736 | 86.174
svt-av1: Preset 12 - Bosphorus 4K | 161.629 | 163.265
svt-av1: Preset 13 - Bosphorus 4K | 161.793 | 162.457
svt-av1: Preset 4 - Bosphorus 1080p | 13.535 | 13.668
svt-av1: Preset 8 - Bosphorus 1080p | 141.111 | 139.795
svt-av1: Preset 12 - Bosphorus 1080p | 515.851 | 505.005
svt-av1: Preset 13 - Bosphorus 1080p | 615.958 | 605.256
blender: BMW27 - CPU-Only | 39.98 | 40.01
blender: Classroom - CPU-Only | 101.62 | 101.45
blender: Fishy Cat - CPU-Only | 49.68 | 49.79
blender: Pabellon Barcelona - CPU-Only | 124.16 | 123.67
blender: Barbershop - CPU-Only | 355.8 | 355.94
avifenc: 0 | 72.293 | 72.602
avifenc: 2 | 38.121 | 37.935
avifenc: 6 | 2.867 | 2.857
avifenc: 6, Lossless | 5.65 | 5.611
avifenc: 10, Lossless | 4.082 | 4.085
build-godot: Time To Compile | 112.549 | 112.619
build-gem5: Time To Compile | 155.586 | 155.667
build-mesa: Time To Compile | 17.533 | 17.527
build-nodejs: Time To Compile | 171.749 | 171.149
openssl: RSA4096 (sign/s) | 17152.6 | 17202.9
openssl: RSA4096 (verify/s) | 549644.5 | 549345.8
openssl: SHA256 | 48327010260 | 48277353550
openssl: SHA512 | 15424545380 | 15419482790
openssl: AES-128-GCM | 350368000140 | 351016588900
openssl: AES-256-GCM | 301537503100 | 301560675530
openssl: ChaCha20 | 190340976640 | 190276666710
openssl: ChaCha20-Poly1305 | 133613467510 | 133611469760
apache-iotdb: 100 - 1 - 200 (throughput) | 588587.22 | 603375.23
apache-iotdb: 100 - 1 - 200 (latency) | 20.42 | 19.58
apache-iotdb: 100 - 1 - 500 (throughput) | 1016068.17 | 955839.49
apache-iotdb: 100 - 1 - 500 (latency) | 35.56 | 38.71
apache-iotdb: 200 - 1 - 200 (throughput) | 902188.66 | 862346.98
apache-iotdb: 200 - 1 - 200 (latency) | 15.3 | 16.3
apache-iotdb: 200 - 1 - 500 (throughput) | 1244290.54 | 1144274.29
apache-iotdb: 200 - 1 - 500 (latency) | 33.33 | 36.74
apache-iotdb: 500 - 1 - 200 (throughput) | 1178643.97 | 1193616.95
apache-iotdb: 500 - 1 - 200 (latency) | 13.94 | 13.7
apache-iotdb: 500 - 1 - 500 (throughput) | 1408314.28 | 1445916.13
apache-iotdb: 500 - 1 - 500 (latency) | 32.07 | 31.43
apache-iotdb: 100 - 100 - 200 (throughput) | 38111412.39 | 39129510.65
apache-iotdb: 100 - 100 - 200 (latency) | 37.95 | 36.79
apache-iotdb: 100 - 100 - 500 (throughput) | 52556675.96 | 54498343.71
apache-iotdb: 100 - 100 - 500 (latency) | 78.92 | 76.47
apache-iotdb: 200 - 100 - 200 (throughput) | 47100474.32 | 50016758.44
apache-iotdb: 200 - 100 - 200 (latency) | 35.05 | 32.66
apache-iotdb: 200 - 100 - 500 (throughput) | 49539921.59 | 50107256.21
apache-iotdb: 200 - 100 - 500 (latency) | 92.91 | 91.03
apache-iotdb: 500 - 100 - 200 (throughput) | 60821535.47 | 60811785.77
apache-iotdb: 500 - 100 - 200 (latency) | 29.07 | 29.37
apache-iotdb: 500 - 100 - 500 (throughput) | 67181470.11 | 71000860.09
apache-iotdb: 500 - 100 - 500 (latency) | 70.79 | 65.16
libxsmm: 64 | 1246.6 | 1244
svt-av1: Preset 4 - Bosphorus 4K | 5.285 | 5.348
NCNN 20230517 (ms, fewer is better; each entry gives the mean, with MIN / MAX in parentheses)
Target: CPU-v2-v2 - Model: mobilenet-v2: F 8.27 (7.89 / 9.92), E 8.31 (7.9 / 8.98)
Target: CPU-v3-v3 - Model: mobilenet-v3: F 8.48 (8.35 / 10.37), E 8.49 (8.38 / 8.91)
Target: CPU - Model: shufflenet-v2: F 9.92 (9.83 / 10.35), E 10.01 (9.89 / 11.99)
Target: CPU - Model: mnasnet: F 7.44 (7.37 / 7.88), E 7.43 (7.36 / 7.87)
Target: CPU - Model: efficientnet-b0: F 10.67 (10.57 / 11), E 10.74 (10.65 / 11.26)
Target: CPU - Model: blazeface: F 4.03 (3.96 / 4.49), E 4.13 (4.07 / 6.23)
Target: CPU - Model: googlenet: F 20.55 (20.36 / 28.56), E 20.67 (20.55 / 21.11)
Target: CPU - Model: vgg16: F 26.57 (26.31 / 28.59), E 26.04 (25.76 / 26.78)
Target: CPU - Model: resnet18: F 9.12 (9.04 / 9.54), E 9.07 (9 / 9.53)
Target: CPU - Model: alexnet: F 5.87 (5.81 / 6.3), E 5.91 (5.74 / 13.78)
Target: CPU - Model: resnet50: F 18.01 (17.88 / 18.68), E 17.90 (17.78 / 19.74)
Target: CPU - Model: yolov4-tiny: F 26.32 (26.16 / 27.75), E 26.35 (26.14 / 26.85)
Target: CPU - Model: squeezenet_ssd: F 16.41 (16.32 / 18.41), E 16.55 (16.46 / 17.02)
Target: CPU - Model: regnety_400m: F 24.30 (23.83 / 52.68), E 24.53 (24.4 / 26.12)
Target: CPU - Model: vision_transformer: F 45.62 (45.16 / 54.44), E 45.00 (44.46 / 52.79)
Target: CPU - Model: FastestDet: F 12.59 (12.44 / 14.79), E 12.67 (12.55 / 13.14)
Built with: (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
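With 16 NCNN models differing by a few percent in either direction, a single summary number can help. One common choice (our suggestion; the source does not report it) is the geometric mean of the per-model time ratios. A minimal sketch using the mean times from the NCNN charts; `NCNN_MS` and `geomean_speedup` are hypothetical names:

```python
import math

# Mean inference times in ms (fewer is better) as (E, F), copied from the NCNN charts.
NCNN_MS = {
    "mobilenet-v2": (8.31, 8.27),
    "mobilenet-v3": (8.49, 8.48),
    "shufflenet-v2": (10.01, 9.92),
    "mnasnet": (7.43, 7.44),
    "efficientnet-b0": (10.74, 10.67),
    "blazeface": (4.13, 4.03),
    "googlenet": (20.67, 20.55),
    "vgg16": (26.04, 26.57),
    "resnet18": (9.07, 9.12),
    "alexnet": (5.91, 5.87),
    "resnet50": (17.90, 18.01),
    "yolov4-tiny": (26.35, 26.32),
    "squeezenet_ssd": (16.55, 16.41),
    "regnety_400m": (24.53, 24.30),
    "vision_transformer": (45.00, 45.62),
    "FastestDet": (12.67, 12.59),
}


def geomean_speedup(times):
    """Geometric mean of E/F time ratios; values above 1 mean F was faster overall."""
    logs = [math.log(e / f) for e, f in times]
    return math.exp(sum(logs) / len(logs))


if __name__ == "__main__":
    print(f"F vs. E geometric-mean speedup: {geomean_speedup(NCNN_MS.values()):.4f}")
```

With these numbers the result lands within about half a percent of 1, so the two runs are essentially tied on NCNN overall. The geometric mean is used because it treats a 2% gain on a 4 ms model and on a 45 ms model equally.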
NAS Parallel Benchmarks
NAS Parallel Benchmarks (NPB) is a benchmark suite developed by NASA for high-end computer systems. This test profile uses the MPI version of NPB and allows selecting among the different NPB tests/problems and problem sizes. Learn more via the OpenBenchmarking.org test page.
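The EP (embarrassingly parallel) kernel, run here as EP.C and EP.D, is the easiest NPB test to picture: it generates pairs of Gaussian random deviates and tallies them in square annuli. A rough single-threaded sketch of that idea, assuming Python's `random` module in place of NPB's own linear congruential generator and an arbitrary seed; `ep_sketch` is a hypothetical name and this is not the NPB code:

```python
import math
import random


def ep_sketch(num_pairs: int, seed: int = 42) -> tuple[int, list[int]]:
    """Illustrate the NPB EP idea (not the NPB implementation).

    Draws uniform pairs in [-1, 1), keeps those inside the unit disc, turns each
    accepted pair into two Gaussian deviates with the Marsaglia polar method, and
    counts pairs by square annulus l <= max(|X|, |Y|) < l + 1 for l = 0..9.
    """
    rng = random.Random(seed)  # stand-in for NPB's linear congruential generator
    counts = [0] * 10
    accepted = 0
    for _ in range(num_pairs):
        x = 2.0 * rng.random() - 1.0
        y = 2.0 * rng.random() - 1.0
        t = x * x + y * y
        if t > 1.0 or t == 0.0:
            continue  # rejected: outside the unit disc, or exactly at the origin
        factor = math.sqrt(-2.0 * math.log(t) / t)
        gx, gy = x * factor, y * factor
        annulus = int(max(abs(gx), abs(gy)))
        if annulus < len(counts):
            counts[annulus] += 1
        accepted += 1
    return accepted, counts


if __name__ == "__main__":
    acc, tally = ep_sketch(100_000)
    print(acc, tally)
```

Because every pair is independent, the work splits across processes with almost no communication, which is why EP mostly reflects raw per-core arithmetic throughput.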
NAS Parallel Benchmarks 3.4 (Total Mop/s, more is better)
Test / Class: BT.C: F 130890.96, E 130842.77
Test / Class: EP.C: F 4134.38, E 4107.38
Test / Class: EP.D: F 4177.87, E 3555.93
Test / Class: FT.C: F 91355.37, E 91806.30
Test / Class: LU.C: F 188290.63, E 195117.13
Test / Class: SP.B: F 119401.29, E 119825.49
Test / Class: SP.C: F 105402.56, E 105770.90
Test / Class: IS.D: F 4255.49, E 4202.57
Test / Class: MG.C: F 114790.76, E 121134.91
Test / Class: CG.C: F 49017.63, E 47029.42
Built with: (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz; Open MPI 4.1.2
ASKAP
ASKAP is a set of benchmarks from the Australian SKA Pathfinder. The principal ASKAP benchmarks are the Hogbom Clean Benchmark (tHogbomClean) and the Convolutional Resampling Benchmark (tConvolve); some earlier ASKAP benchmarks are also included for OpenCL and CUDA execution of tConvolve. Learn more via the OpenBenchmarking.org test page.
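Hogbom CLEAN, the algorithm behind the tHogbomClean benchmark, repeatedly finds the brightest point in the residual image, records a fraction of it as a clean component, and subtracts the correspondingly scaled point spread function (PSF). A minimal one-dimensional sketch of that loop; the loop gain, threshold, and 1-D layout are illustrative assumptions, `hogbom_clean_1d` is a hypothetical name, and ASKAP's benchmark works on 2-D images with its own parameters:

```python
def hogbom_clean_1d(dirty, psf, gain=0.1, threshold=0.01, max_iter=1000):
    """Minimal 1-D Hogbom CLEAN (illustrative, not the ASKAP tHogbomClean code).

    dirty: dirty image samples; psf: point spread function with its peak (value 1)
    at the centre index of an odd-length list. Returns (components, residual).
    """
    residual = list(dirty)
    components = [0.0] * len(dirty)
    centre = len(psf) // 2
    for _ in range(max_iter):
        # 1. Find the peak of the absolute residual.
        peak_idx = max(range(len(residual)), key=lambda i: abs(residual[i]))
        peak_val = residual[peak_idx]
        if abs(peak_val) < threshold:
            break
        # 2. Record a fraction (the loop gain) of the peak as a clean component.
        flux = gain * peak_val
        components[peak_idx] += flux
        # 3. Subtract the scaled PSF, centred on the peak, from the residual.
        for j, p in enumerate(psf):
            k = peak_idx + j - centre
            if 0 <= k < len(residual):
                residual[k] -= flux * p
    return components, residual
```

For a dirty image that is exactly twice the PSF centred at one pixel, the loop converges to a single component of nearly 2.0 at that pixel. The peak search over the whole image in every iteration is what makes the benchmark sensitive to memory bandwidth and parallel reduction speed.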
ASKAP 1.0 (more is better)
Test: tConvolve MPI - Degridding (Mpix/sec): F 23854.1, E 23854.1
Test: tConvolve MPI - Gridding (Mpix/sec): F 30870.1, E 30870.1
Test: tConvolve OpenMP - Gridding (Million Grid Points Per Second): F 20481.2, E 22188.0
Test: tConvolve OpenMP - Degridding (Million Grid Points Per Second): F 33282, E 33282
Test: tConvolve MT - Gridding (Million Grid Points Per Second): F 11330.0, E 11262.6
Built with: (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp
libxsmm
Libxsmm is an open-source library for specialized dense and sparse matrix operations and deep learning primitives. It can make use of Intel AMX, AVX-512, and other modern CPU instruction set extensions. Learn more via the OpenBenchmarking.org test page.
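The libxsmm results are reported for square products where M = N = K (32 and 64). Throughput figures of this kind are usually derived by counting 2*M*N*K floating-point operations per matrix product (one multiply and one add per inner-product term) and dividing by elapsed time. A pure-Python sketch of that bookkeeping; `matmul` and `gflops` are hypothetical helpers, and the naive triple loop is only a stand-in that says nothing about how libxsmm's generated kernels work:

```python
import time


def matmul(a, b, m, n, k):
    """Naive C = A x B on row-major flat lists: A is m x k, B is k x n."""
    c = [0.0] * (m * n)
    for i in range(m):
        for p in range(k):
            a_ip = a[i * k + p]
            for j in range(n):
                c[i * n + j] += a_ip * b[p * n + j]
    return c


def gflops(m, n, k, seconds, repetitions=1):
    """Each multiply-add counts as 2 floating-point operations: 2*M*N*K per product."""
    return 2.0 * m * n * k * repetitions / seconds / 1e9


if __name__ == "__main__":
    size = 32  # M = N = K, as in the "M N K: 32" test
    a = [1.0] * (size * size)
    b = [1.0] * (size * size)
    reps = 10
    start = time.perf_counter()
    for _ in range(reps):
        matmul(a, b, size, size, size)
    elapsed = time.perf_counter() - start
    print(f"{gflops(size, size, size, elapsed, reps):.4f} GFLOPS (pure Python, one thread)")
```

Small sizes like 32 and 64 are where call overhead dominates, which is the case libxsmm targets with specialized kernels.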
libxsmm 2-1.17-3645, M N K: 32 (GFLOPS/s, more is better): F 967.1, E 981.1
Built with: (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -msse4.2
ASKAP (continued)
ASKAP 1.0 (more is better)
Test: tConvolve MT - Degridding (Million Grid Points Per Second): F 15633.4, E 15576.2
Test: Hogbom Clean OpenMP (Iterations Per Second): F 1149.43, E 1149.43
Built with: (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp
Xcompact3d Incompact3d
Xcompact3d Incompact3d is a Fortran/MPI-based, high-performance finite-difference code for solving the incompressible Navier-Stokes equations along with as many scalar transport equations as needed. Learn more via the OpenBenchmarking.org test page.
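A scalar transport equation of the kind Incompact3d solves alongside the flow has, in one dimension, the form dc/dt + u dc/dx = nu d2c/dx2 (advection plus diffusion). A deliberately simple periodic, explicit finite-difference step shows the basic idea of discretising it; the grid, time step, second-order central differences, and the name `transport_step` are illustrative assumptions, not Incompact3d's higher-order compact schemes:

```python
def transport_step(c, u, nu, dx, dt):
    """One explicit Euler step of 1-D advection-diffusion on a periodic grid.

    Uses second-order central differences, far simpler than the schemes a
    production code such as Incompact3d relies on.
    """
    n = len(c)
    new = [0.0] * n
    for i in range(n):
        left, right = c[i - 1], c[(i + 1) % n]  # c[-1] wraps around: periodic
        dcdx = (right - left) / (2.0 * dx)
        d2cdx2 = (right - 2.0 * c[i] + left) / (dx * dx)
        new[i] = c[i] + dt * (-u * dcdx + nu * d2cdx2)
    return new


if __name__ == "__main__":
    field = [0.0] * 20
    field[10] = 1.0  # a single spike of scalar
    for _ in range(100):
        field = transport_step(field, u=1.0, nu=0.1, dx=0.1, dt=0.001)
    print(f"total scalar after 100 steps: {sum(field):.6f}")
```

On a periodic grid these central differences conserve the total amount of scalar, which makes a handy sanity check. The explicit step stays stable here only because dt * nu / dx**2 is small (0.01).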
Xcompact3d Incompact3d 2021-03-11 (seconds, fewer is better)
Input: input.i3d 129 Cells Per Direction: F 3.82253599, E 3.76278901
Input: input.i3d 193 Cells Per Direction: F 12.26, E 12.37
Input: X3D-benchmarking input.i3d: F 422.49, E 426.74
Built with: (F9X) gfortran options: -cpp -O2 -funroll-loops -floop-optimize -fcray-pointer -fbacktrace -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
Kvazaar
This is a test of Kvazaar, a CPU-based H.265/HEVC video encoder written in C and optimized with assembly. Kvazaar won the 2016 ACM Open-Source Software Competition and is developed by the Ultra Video Group at Tampere University, Finland. Learn more via the OpenBenchmarking.org test page.
Kvazaar 2.2 (frames per second, more is better)
Video Input: Bosphorus 1080p - Video Preset: Slow: F 70.00, E 69.99
Video Input: Bosphorus 1080p - Video Preset: Medium: F 72.72, E 73.29
Video Input: Bosphorus 1080p - Video Preset: Very Fast: F 161.28, E 162.16
Video Input: Bosphorus 1080p - Video Preset: Super Fast: F 223.65, E 224.67
Video Input: Bosphorus 1080p - Video Preset: Ultra Fast: F 266.37, E 263.45
Video Input: Bosphorus 4K - Video Preset: Slow: F 22.93, E 22.70
Video Input: Bosphorus 4K - Video Preset: Medium: F 23.49, E 23.49
Video Input: Bosphorus 4K - Video Preset: Very Fast: F 48.77, E 48.90
Video Input: Bosphorus 4K - Video Preset: Super Fast: F 61.71, E 61.87
Video Input: Bosphorus 4K - Video Preset: Ultra Fast: F 72.53, E 72.57
Built with: (CC) gcc options: -pthread -ftree-vectorize -fvisibility=hidden -O2 -lpthread -lm -lrt
SVT-AV1
SVT-AV1 1.7 (frames per second, more is better)
Encoder Mode: Preset 8 - Input: Bosphorus 4K: F 86.17, E 68.74
Encoder Mode: Preset 12 - Input: Bosphorus 4K: F 163.27, E 161.63
Encoder Mode: Preset 13 - Input: Bosphorus 4K: F 162.46, E 161.79
Encoder Mode: Preset 4 - Input: Bosphorus 1080p: F 13.67, E 13.54
Encoder Mode: Preset 8 - Input: Bosphorus 1080p: F 139.80, E 141.11
Encoder Mode: Preset 12 - Input: Bosphorus 1080p: F 505.01, E 515.85
Encoder Mode: Preset 13 - Input: Bosphorus 1080p: F 605.26, E 615.96
Built with: (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
OpenSSL
OpenSSL is an open-source toolkit that implements the SSL (Secure Sockets Layer) and TLS (Transport Layer Security) protocols. This test profile uses OpenSSL's built-in "openssl speed" benchmarking capability. Learn more via the OpenBenchmarking.org test page.
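`openssl speed` reports how many bytes an algorithm processes per second over a timed run. The same kind of measurement can be sketched with Python's `hashlib`, which is backed by OpenSSL on most CPython builds; the 16384-byte block size, the short duration, and the helper name `sha256_throughput` are our assumptions, and a single Python thread is not directly comparable to the numbers in the charts:

```python
import hashlib
import time


def sha256_throughput(block_size: int = 16384, duration: float = 0.2) -> float:
    """Approximate single-threaded SHA-256 throughput in bytes per second.

    Hashes the same block repeatedly until `duration` seconds have elapsed,
    then divides the bytes processed by the measured time.
    """
    block = b"\0" * block_size
    processed = 0
    start = time.perf_counter()
    while True:
        hashlib.sha256(block).digest()
        processed += block_size
        elapsed = time.perf_counter() - start
        if elapsed >= duration:
            return processed / elapsed


if __name__ == "__main__":
    print(f"SHA-256: {sha256_throughput() / 1e9:.2f} GB/s on one thread")
```

Larger blocks amortize per-call overhead, which is why throughput benchmarks like this usually quote the biggest block size.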
OpenSSL 3.1 (more is better)
Algorithm: RSA4096 (sign/s): F 17202.9, E 17152.6
Algorithm: RSA4096 (verify/s): F 549345.8, E 549644.5
Algorithm: SHA256 (byte/s): F 48277353550, E 48327010260
Algorithm: SHA512 (byte/s): F 15419482790, E 15424545380
Algorithm: AES-128-GCM (byte/s): F 351016588900, E 350368000140
Algorithm: AES-256-GCM (byte/s): F 301560675530, E 301537503100
Algorithm: ChaCha20 (byte/s): F 190276666710, E 190340976640
Algorithm: ChaCha20-Poly1305 (byte/s): F 133611469760, E 133613467510
Built with: (CC) gcc options: -pthread -m64 -O3 -lssl -lcrypto -ldl
libxsmm (continued)
libxsmm 2-1.17-3645, M N K: 64 (GFLOPS/s, more is better): F 1244.0, E 1246.6
Built with: (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -msse4.2
SVT-AV1 (continued)
SVT-AV1 1.7, Encoder Mode: Preset 4 - Input: Bosphorus 4K (frames per second, more is better): F 5.348, E 5.285
Built with: (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
E: Testing initiated at 29 August 2023 09:35 by user phoronix.
F: Hardware, software, and system notes identical to those listed above.
F: Testing initiated at 29 August 2023 11:27 by user phoronix.