AMD EPYC 9655P memory benchmarks by Michael Larabel for a future article.
12c DDR5-6000 Processor: AMD EPYC 9655P 96-Core @ 2.60GHz (96 Cores / 192 Threads), Motherboard: Supermicro Super Server H13SSL-N v1.01 (3.0 BIOS), Chipset: AMD 1Ah, Memory: 12 x 64GB DDR5-6000MT/s Micron MTC40F2046S1RC64BDY QSFF, Disk: 3201GB Micron_7450_MTFDKCB3T2TFS, Graphics: ASPEED, Network: 2 x Broadcom NetXtreme BCM5720 PCIe
OS: Ubuntu 24.10, Kernel: 6.12.0-rc7-phx (x86_64), Desktop: GNOME Shell 47.0, Display Server: X Server, Compiler: GCC 14.2.0, File-System: ext4, Screen Resolution: 1024x768
Kernel Notes: Transparent Huge Pages: madvise
Compiler Notes: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2,rust --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-14-zdkDXv/gcc-14-14.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-14-zdkDXv/gcc-14-14.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Notes: Scaling Governor: acpi-cpufreq performance (Boost: Enabled) - CPU Microcode: 0xb002116
Python Notes: Python 3.12.7
Security Notes: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + reg_file_data_sampling: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; STIBP: always-on; RSB filling; PBRSB-eIBRS: Not affected; BHI: Not affected + srbds: Not affected + tsx_async_abort: Not affected
8c DDR5-6000: Changed Memory to 8 x 64GB DDR5-6000MT/s.
Java Notes: OpenJDK Runtime Environment (build 21.0.5+11-Ubuntu-1ubuntu124.10)
AMD EPYC Turin 8c vs. 12c Memory Channel DDR5 Comparison
12c DDR5-6000 vs. 8c DDR5-6000 Comparison (Phoronix Test Suite)
[Bar chart: per-test percentage difference between the two configurations, with axis marks at Baseline, +13.8%, +27.6% and +41.4%. Largest differences: Stream Copy 55.1%, OpenVINO Person Detection FP16 - CPU 53.9% and 53.5%, Stream Scale 53.9%, Stream Add 52.5%, Stream Triad 51.7%, High Performance Conjugate Gradient 104 104 104 - 60 51.4% and 144 144 144 - 60 50.6%, Algebraic Multi-Grid Benchmark 47.5%, NAS Parallel Benchmarks SP.C 42.7%, OpenRadioss Chrysler Neon 1M 42.3%. The smallest differences shown are about 2%.]
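The percentages in the comparison above can be reproduced from the raw results. The following is a minimal sketch, assuming the chart reports the larger of the two results divided by the smaller, minus one; that assumption matches the Stream Copy figure (higher is better) and the OpenVINO Person Detection latency figure (lower is better). The function name `percent_gap` is hypothetical and not part of the Phoronix Test Suite.

```python
def percent_gap(result_a: float, result_b: float) -> float:
    """Relative gap between two positive results, as a percentage of the smaller one."""
    low, high = sorted((result_a, result_b))
    return (high / low - 1.0) * 100.0


# Stream Copy in MB/s (12c, 8c), taken from the results table.
stream_copy = percent_gap(343229.3, 221234.4)

# OpenVINO Person Detection FP16 latency in ms (12c, 8c), taken from the results table.
person_detection_latency = percent_gap(69.48, 106.96)

print(f"Stream Copy: {stream_copy:.1f}%")
print(f"OpenVINO Person Detection latency: {person_detection_latency:.1f}%")
```

Because the gap is taken relative to the smaller value, the same function works for both "more is better" and "fewer is better" metrics, but it does not say which configuration is ahead; that has to be read from the units.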
Test | 12c DDR5-6000 | 8c DDR5-6000
graph500: 26 | 1505710000 | 1394450000
graph500: 26 | 1478180000 | 1365800000
minibude: OpenMP - BM2 | 286.858 | 288.351
amg: | 3076286333 | 2085218000
openvino: Person Detection FP16 - CPU | 689.50 | 449.23
openvino: Face Detection FP16-INT8 - CPU | 145.10 | 145.11
openvino: Vehicle Detection FP16-INT8 - CPU | 8240.40 | 8240.98
openvino: Face Detection Retail FP16-INT8 - CPU | 23427.50 | 23443.86
openvino: Road Segmentation ADAS FP16-INT8 - CPU | 2437.93 | 2291.09
openvino: Machine Translation EN To DE FP16 - CPU | 844.93 | 740.13
openvino: Weld Porosity Detection FP16-INT8 - CPU | 14006.15 | 14026.36
openvino: Person Vehicle Bike Detection FP16 - CPU | 6380.58 | 6329.60
openvino: Noise Suppression Poconet-Like FP16 - CPU | 7934.40 | 7314.92
openvino: Person Re-Identification Retail FP16 - CPU | 10462.56 | 10448.92
openvino: Handwritten English Recognition FP16-INT8 - CPU | 3551.35 | 3543.79
openvino: Age Gender Recognition Retail 0013 FP16-INT8 - CPU | 166983.33 | 166363.91
kvazaar: Bosphorus 4K - Slow | 44.79 | 44.76
kvazaar: Bosphorus 4K - Medium | 45.30 | 45.38
kvazaar: Bosphorus 4K - Very Fast | 90.46 | 89.33
kvazaar: Bosphorus 4K - Super Fast | 91.54 | 93.40
kvazaar: Bosphorus 4K - Ultra Fast | 96.07 | 96.72
svt-av1: Preset 3 - Bosphorus 4K | 15.684 | 15.603
svt-av1: Preset 5 - Bosphorus 4K | 56.438 | 56.088
svt-av1: Preset 8 - Bosphorus 4K | 186.687 | 182.608
svt-av1: Preset 13 - Bosphorus 4K | 435.382 | 430.868
svt-av1: Preset 3 - Beauty 4K 10-bit | 1.852 | 1.846
svt-av1: Preset 5 - Beauty 4K 10-bit | 7.699 | 7.687
svt-av1: Preset 8 - Beauty 4K 10-bit | 13.554 | 13.512
svt-av1: Preset 13 - Beauty 4K 10-bit | 17.145 | 17.110
uvg266: Bosphorus 4K - Slow | 30.40 | 30.30
uvg266: Bosphorus 4K - Medium | 33.04 | 32.92
uvg266: Bosphorus 4K - Very Fast | 67.24 | 66.12
uvg266: Bosphorus 4K - Super Fast | 67.26 | 65.52
uvg266: Bosphorus 4K - Ultra Fast | 67.65 | 66.86
vvenc: Bosphorus 4K - Fast | 9.643 | 9.598
vvenc: Bosphorus 4K - Faster | 22.960 | 22.635
minibude: OpenMP - BM2 | 7171.451 | 7208.779
hpcg: 104 104 104 - 60 | 62.5441 | 41.3186
hpcg: 144 144 144 - 60 | 61.7722 | 41.0157
tensorflow: CPU - 512 - ResNet-50 | 241.55 | 196.47
onnx: GPT-2 - CPU - Standard | 195.276 | 187.125
onnx: yolov4 - CPU - Standard | 12.2149 | 12.0605
onnx: ZFNet-512 - CPU - Standard | 163.976 | 153.637
onnx: T5 Encoder - CPU - Standard | 243.328 | 235.596
onnx: bertsquad-12 - CPU - Standard | 22.0238 | 21.6478
onnx: CaffeNet 12-int8 - CPU - Standard | 1041.95 | 958.576
onnx: fcn-resnet101-11 - CPU - Standard | 9.44076 | 9.18611
openvkl: vklBenchmarkCPU ISPC | 2825 | 2762
luxcorerender: DLSC - CPU | 16.49 | 17.01
luxcorerender: Danish Mood - CPU | 12.03 | 12.06
luxcorerender: Orange Juice - CPU | 24.19 | 24.15
luxcorerender: LuxCore Benchmark - CPU | 12.39 | 12.92
luxcorerender: Rainbow Colors and Prism - CPU | 28.23 | 27.85
ramspeed: Add - Integer | 120526.51 | 99666.50
ramspeed: Copy - Integer | 139450.46 | 115580.47
ramspeed: Scale - Integer | 139548.85 | 117864.04
ramspeed: Triad - Integer | 122052.34 | 104337.42
ramspeed: Average - Integer | 130318.77 | 110436.65
stream: Copy | 343229.3 | 221234.4
stream: Scale | 314946.5 | 204677.8
stream: Triad | 348347.3 | 229604.0
stream: Add | 355858.4 | 233276.2
srsran: PDSCH Processor Benchmark, Throughput Total | 25941.9 | 26209.9
srsran: PUSCH Processor Benchmark, Throughput Total | 6237.7 | 6238.1
mbw: Memory Copy - 4096 MiB | 25160.632 | 23448.633
mbw: Memory Copy - 8192 MiB | 25126.492 | 23357.889
mbw: Memory Copy, Fixed Block Size - 4096 MiB | 22520.674 | 18332.409
mbw: Memory Copy, Fixed Block Size - 8192 MiB | 22449.826 | 18188.111
compress-7zip: Compression Rating | 779242 | 727045
compress-7zip: Decompression Rating | 627215 | 626553
astcenc: Fast | 1109.9501 | 1072.7819
astcenc: Medium | 568.8262 | 568.3363
astcenc: Thorough | 82.2795 | 82.1896
astcenc: Exhaustive | 7.1334 | 7.1339
astcenc: Very Thorough | 11.6277 | 11.6303
gromacs: MPI CPU - water_GMX50_bare | 17.787 | 16.654
namd: ATPase with 327,506 Atoms | 12.28998 | 11.62457
namd: STMV with 1,066,628 Atoms | 4.21717 | 3.55953
cassandra: Writes | 447176 | 422312
memcached: 1:5 | 3701189.23 | 3711825.75
memcached: 1:10 | 6670118.94 | 6752055.12
memcached: 1:100 | 11923788.43 | 12031156.03
clickhouse: 100M Rows Hits Dataset, First Run / Cold Cache | 740.15 | 637.67
clickhouse: 100M Rows Hits Dataset, Second Run | 771.05 | 652.53
clickhouse: 100M Rows Hits Dataset, Third Run | 765.07 | 651.48
john-the-ripper: bcrypt | 237542 | 237055
john-the-ripper: WPA PSK | 1023333 | 1019333
john-the-ripper: Blowfish | 237515 | 237458
john-the-ripper: HMAC-SHA512 | 413474333 | 413326333
john-the-ripper: MD5 | 21681000 | 21408333
stargate: 96000 - 512 | 5.943979 | 5.918595
stargate: 192000 - 512 | 3.932179 | 3.896311
stargate: 96000 - 1024 | 6.539322 | 6.517349
stargate: 192000 - 1024 | 4.429570 | 4.388974
nginx: 500 | 505484.86 | 523766.78
apache: 500 | 170467.37 | 159864.99
liquid-dsp: 64 - 256 - 57 | 3195900000 | 3214133333
liquid-dsp: 128 - 256 - 57 | 4963433333 | 5053200000
liquid-dsp: 192 - 256 - 57 | 5958600000 | 5955433333
liquid-dsp: 64 - 256 - 512 | 1412966667 | 1417633333
liquid-dsp: 128 - 256 - 512 | 1952100000 | 1956333333
liquid-dsp: 192 - 256 - 512 | 2149200000 | 2136766667
graph500: 26 | 747138000 | 676709000
graph500: 26 | 554936000 | 497798000
npb: BT.C | 361153.06 | 284718.53
npb: CG.C | 65380.58 | 60572.20
npb: EP.C | 11230.81 | 11375.83
npb: EP.D | 13685.97 | 14658.25
npb: FT.C | 164903.45 | 120202.20
npb: IS.D | 6970.89 | 5653.35
npb: LU.C | 311883.80 | 275011.41
npb: MG.C | 147097.41 | 105206.58
npb: SP.B | 199255.89 | 169436.81
npb: SP.C | 154965.84 | 108558.21
pgbench: 1000 - 800 - Read Only | 3155036 | 3149277
pgbench: 1000 - 800 - Read Write | 113745 | 112175
v-ray: CPU | 199769 | 198145
onnx: GPT-2 - CPU - Standard | 5.11928 | 5.34359
onnx: yolov4 - CPU - Standard | 81.8816 | 82.9434
onnx: ZFNet-512 - CPU - Standard | 6.09718 | 6.50761
onnx: T5 Encoder - CPU - Standard | 4.10880 | 4.24372
onnx: bertsquad-12 - CPU - Standard | 45.4067 | 46.2026
onnx: CaffeNet 12-int8 - CPU - Standard | 0.959531 | 1.047505
onnx: fcn-resnet101-11 - CPU - Standard | 106.2133 | 109.005
litert: DeepLab V3 | 11450.9 | 11817.9
litert: SqueezeNet | 7051.59 | 7048.39
litert: Inception V4 | 44088.3 | 43959.9
litert: NASNet Mobile | 99408.3 | 100519.4
litert: Mobilenet Float | 4399.48 | 4441.27
litert: Mobilenet Quant | 5800.65 | 6260.21
litert: Inception ResNet V2 | 58054.3 | 59977.8
litert: Quantized COCO SSD MobileNet v1 | 7211.99 | 7110.12
onednn: IP Shapes 1D - CPU | 0.531984 | 0.529052
onednn: IP Shapes 3D - CPU | 0.266791 | 0.265558
onednn: Convolution Batch Shapes Auto - CPU | 0.343593 | 0.341374
onednn: Deconvolution Batch shapes_1d - CPU | 6.76736 | 6.84402
onednn: Deconvolution Batch shapes_3d - CPU | 0.716373 | 0.715363
onednn: Recurrent Neural Network Training - CPU | 425.784 | 429.684
onednn: Recurrent Neural Network Inference - CPU | 277.643 | 277.996
pgbench: 1000 - 800 - Read Only - Average Latency | 0.254 | 0.254
pgbench: 1000 - 800 - Read Write - Average Latency | 7.037 | 7.136
openvino: Person Detection FP16 - CPU | 69.48 | 106.96
openvino: Face Detection FP16-INT8 - CPU | 330.01 | 330.13
openvino: Vehicle Detection FP16-INT8 - CPU | 5.80 | 5.80
openvino: Face Detection Retail FP16-INT8 - CPU | 3.96 | 3.97
openvino: Road Segmentation ADAS FP16-INT8 - CPU | 19.61 | 20.85
openvino: Machine Translation EN To DE FP16 - CPU | 56.71 | 64.75
openvino: Weld Porosity Detection FP16-INT8 - CPU | 6.71 | 6.70
openvino: Person Vehicle Bike Detection FP16 - CPU | 7.48 | 7.53
openvino: Noise Suppression Poconet-Like FP16 - CPU | 11.57 | 12.42
openvino: Person Re-Identification Retail FP16 - CPU | 4.56 | 4.56
openvino: Handwritten English Recognition FP16-INT8 - CPU | 27.01 | 27.06
openvino: Age Gender Recognition Retail 0013 FP16-INT8 - CPU | 0.33 | 0.33
incompact3d: X3D-benchmarking input.i3d | 203.937052 | 292.578786
incompact3d: input.i3d 193 Cells Per Direction | 7.08492692 | 9.01416429
openfoam: drivaerFastback, Small Mesh Size - Mesh Time | 22.944605 | 22.297072
openfoam: drivaerFastback, Small Mesh Size - Execution Time | 22.408568 | 22.648378
openfoam: drivaerFastback, Medium Mesh Size - Mesh Time | 106.31177 | 109.19543
openfoam: drivaerFastback, Medium Mesh Size - Execution Time | 192.22721 | 254.29552
openradioss: Chrysler Neon 1M | 90.77 | 129.17
openradioss: INIVOL and Fluid Structure Interaction Drop Container | 79.42 | 79.99
specfem3d: Mount St. Helens | 5.778714613 | 5.855350933
specfem3d: Layered Halfspace | 13.700075229 | 13.861252099
specfem3d: Tomographic Model | 6.777958300 | 6.626369356
specfem3d: Homogeneous Halfspace | 8.194862498 | 8.198005837
specfem3d: Water-layered Halfspace | 14.275555136 | 14.355377904
build-godot: Time To Compile | 80.552 | 80.874
build-linux-kernel: defconfig | 22.224 | 22.555
build-linux-kernel: allmodconfig | 193.346 | 196.240
build2: Time To Compile | 53.439 | 53.551
rawtherapee: Total Benchmark Time | 35.854 | 36.228
gpaw: Carbon Nanotube | 25.660 | 28.560
blender: BMW27 - CPU-Only | 13.14 | 13.09
blender: Junkshop - CPU-Only | 16.75 | 16.80
blender: Classroom - CPU-Only | 35.43 | 35.51
blender: Fishy Cat - CPU-Only | 18.46 | 18.55
blender: Barbershop - CPU-Only | 120.83 | 121.15
blender: Pabellon Barcelona - CPU-Only | 41.25 | 41.18
whisperfile: Small | 91.05176 | 96.68799
whisperfile: Medium | 202.15909 | 215.52289
epoch: Cone | 186.85 | 196.03
warpx: Uniform Plasma | 16.27457152 | 16.45373580
warpx: Plasma Acceleration | 25.85109772 | 25.76282272
m-queens: Time To Solve | 7.214 | 7.229
xnnpack: FP32MobileNetV1 | 4629 | 4596
xnnpack: FP32MobileNetV2 | 9242 | 9236
xnnpack: FP32MobileNetV3Large | 13643 | 14189
xnnpack: FP32MobileNetV3Small | 10418 | 10534
xnnpack: FP16MobileNetV1 | 4611 | 4570
xnnpack: FP16MobileNetV2 | 8982 | 9094
xnnpack: FP16MobileNetV3Large | 13074 | 13257
xnnpack: FP16MobileNetV3Small | 10559 | 10810
xnnpack: QS8MobileNetV2 | 10095 | 9968
Graph500
This is a benchmark of the reference implementation of Graph500, an HPC benchmark focused on data intensive loads and commonly tested on supercomputers for complex data problems. Graph500 primarily stresses the communication subsystem of the hardware under test. Learn more via the OpenBenchmarking.org test page.
Graph500 3.0 - Scale: 26 (bfs max_TEPS, More Is Better): 8c DDR5-6000: 1394450000; 12c DDR5-6000: 1505710000
Graph500 3.0 - Scale: 26 (bfs median_TEPS, More Is Better): 8c DDR5-6000: 1365800000; 12c DDR5-6000: 1478180000
(CC) gcc options: -fcommon -O3 -lpthread -lm -lmpi
miniBUDE
MiniBUDE is a mini application for the core computation of the Bristol University Docking Engine (BUDE). This test profile currently makes use of the OpenMP implementation of miniBUDE for CPU benchmarking. Learn more via the OpenBenchmarking.org test page.
miniBUDE 20210901 - Implementation: OpenMP - Input Deck: BM2 (Billion Interactions/s, More Is Better): 8c DDR5-6000: 288.35 (SE +/- 0.50, N = 3); 12c DDR5-6000: 286.86 (SE +/- 2.24, N = 3)
(CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm
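The miniBUDE result above shows the two configurations within a fraction of a percent of each other, with a reported standard error (SE) for each. One rough way to judge whether such a gap is distinguishable from run-to-run noise is to express it in units of the combined standard error. This is only a heuristic sketch: with N = 3 runs it is not a formal significance test, and the function name `gap_in_standard_errors` is hypothetical.

```python
import math


def gap_in_standard_errors(mean_a: float, se_a: float,
                           mean_b: float, se_b: float) -> float:
    """Absolute difference of two means divided by their combined standard error."""
    combined_se = math.sqrt(se_a ** 2 + se_b ** 2)
    return abs(mean_a - mean_b) / combined_se


# miniBUDE OpenMP BM2, Billion Interactions/s: 8c then 12c, with their reported SE.
minibude_gap = gap_in_standard_errors(288.35, 0.50, 286.86, 2.24)

print(f"miniBUDE gap: {minibude_gap:.2f} combined standard errors")
```

A value well below 1, as here, suggests the difference is within measurement noise; a gap of many standard errors (as in the memory bandwidth tests) is much less likely to be noise.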
OpenVINO
This is a test of Intel OpenVINO, a toolkit for neural networks, using its built-in benchmarking support and analyzing the throughput and latency for various models. Learn more via the OpenBenchmarking.org test page.
OpenVINO 2024.0 - Model: Person Detection FP16 - Device: CPU (FPS, More Is Better): 8c DDR5-6000: 449.23 (SE +/- 6.83, N = 15); 12c DDR5-6000: 689.50 (SE +/- 1.81, N = 3)
OpenVINO 2024.0 - Model: Face Detection FP16-INT8 - Device: CPU (FPS, More Is Better): 8c DDR5-6000: 145.11 (SE +/- 0.12, N = 3); 12c DDR5-6000: 145.10 (SE +/- 0.26, N = 3)
OpenVINO 2024.0 - Model: Vehicle Detection FP16-INT8 - Device: CPU (FPS, More Is Better): 8c DDR5-6000: 8240.98 (SE +/- 5.48, N = 3); 12c DDR5-6000: 8240.40 (SE +/- 10.65, N = 3)
OpenVINO 2024.0 - Model: Face Detection Retail FP16-INT8 - Device: CPU (FPS, More Is Better): 8c DDR5-6000: 23443.86 (SE +/- 11.10, N = 3); 12c DDR5-6000: 23427.50 (SE +/- 29.34, N = 3)
OpenVINO 2024.0 - Model: Road Segmentation ADAS FP16-INT8 - Device: CPU (FPS, More Is Better): 8c DDR5-6000: 2291.09 (SE +/- 2.70, N = 3); 12c DDR5-6000: 2437.93 (SE +/- 1.15, N = 3)
OpenVINO 2024.0 - Model: Machine Translation EN To DE FP16 - Device: CPU (FPS, More Is Better): 8c DDR5-6000: 740.13 (SE +/- 4.52, N = 3); 12c DDR5-6000: 844.93 (SE +/- 1.30, N = 3)
OpenVINO 2024.0 - Model: Weld Porosity Detection FP16-INT8 - Device: CPU (FPS, More Is Better): 8c DDR5-6000: 14026.36 (SE +/- 1.42, N = 3); 12c DDR5-6000: 14006.15 (SE +/- 11.52, N = 3)
OpenVINO 2024.0 - Model: Person Vehicle Bike Detection FP16 - Device: CPU (FPS, More Is Better): 8c DDR5-6000: 6329.60 (SE +/- 17.72, N = 3); 12c DDR5-6000: 6380.58 (SE +/- 12.38, N = 3)
OpenVINO 2024.0 - Model: Noise Suppression Poconet-Like FP16 - Device: CPU (FPS, More Is Better): 8c DDR5-6000: 7314.92 (SE +/- 10.57, N = 3); 12c DDR5-6000: 7934.40 (SE +/- 20.16, N = 3)
OpenVINO 2024.0 - Model: Person Re-Identification Retail FP16 - Device: CPU (FPS, More Is Better): 8c DDR5-6000: 10448.92 (SE +/- 7.25, N = 3); 12c DDR5-6000: 10462.56 (SE +/- 5.49, N = 3)
OpenVINO 2024.0 - Model: Handwritten English Recognition FP16-INT8 - Device: CPU (FPS, More Is Better): 8c DDR5-6000: 3543.79 (SE +/- 1.42, N = 3); 12c DDR5-6000: 3551.35 (SE +/- 4.91, N = 3)
OpenVINO 2024.0 - Model: Age Gender Recognition Retail 0013 FP16-INT8 - Device: CPU (FPS, More Is Better): 8c DDR5-6000: 166363.91 (SE +/- 234.32, N = 3); 12c DDR5-6000: 166983.33 (SE +/- 251.97, N = 3)
(CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
Kvazaar
This is a test of Kvazaar as a CPU-based H.265/HEVC video encoder written in the C programming language and optimized in Assembly. Kvazaar is the winner of the 2016 ACM Open-Source Software Competition and developed at the Ultra Video Group, Tampere University, Finland. Learn more via the OpenBenchmarking.org test page.
Kvazaar 2.2 - Video Input: Bosphorus 4K - Video Preset: Slow (Frames Per Second, More Is Better): 8c DDR5-6000: 44.76 (SE +/- 0.11, N = 3); 12c DDR5-6000: 44.79 (SE +/- 0.08, N = 3)
Kvazaar 2.2 - Video Input: Bosphorus 4K - Video Preset: Medium (Frames Per Second, More Is Better): 8c DDR5-6000: 45.38 (SE +/- 0.09, N = 3); 12c DDR5-6000: 45.30 (SE +/- 0.06, N = 3)
Kvazaar 2.2 - Video Input: Bosphorus 4K - Video Preset: Very Fast (Frames Per Second, More Is Better): 8c DDR5-6000: 89.33 (SE +/- 0.20, N = 3); 12c DDR5-6000: 90.46 (SE +/- 0.61, N = 3)
Kvazaar 2.2 - Video Input: Bosphorus 4K - Video Preset: Super Fast (Frames Per Second, More Is Better): 8c DDR5-6000: 93.40 (SE +/- 0.53, N = 3); 12c DDR5-6000: 91.54 (SE +/- 0.52, N = 3)
Kvazaar 2.2 - Video Input: Bosphorus 4K - Video Preset: Ultra Fast (Frames Per Second, More Is Better): 8c DDR5-6000: 96.72 (SE +/- 0.36, N = 3); 12c DDR5-6000: 96.07 (SE +/- 0.29, N = 3)
(CC) gcc options: -pthread -ftree-vectorize -fvisibility=hidden -O2 -lpthread -lm -lrt
SVT-AV1
SVT-AV1 2.3 - Encoder Mode: Preset 3 - Input: Bosphorus 4K (Frames Per Second, More Is Better): 8c DDR5-6000: 15.60 (SE +/- 0.00, N = 3); 12c DDR5-6000: 15.68 (SE +/- 0.03, N = 3)
SVT-AV1 2.3 - Encoder Mode: Preset 5 - Input: Bosphorus 4K (Frames Per Second, More Is Better): 8c DDR5-6000: 56.09 (SE +/- 0.10, N = 3); 12c DDR5-6000: 56.44 (SE +/- 0.28, N = 3)
SVT-AV1 2.3 - Encoder Mode: Preset 8 - Input: Bosphorus 4K (Frames Per Second, More Is Better): 8c DDR5-6000: 182.61 (SE +/- 0.19, N = 3); 12c DDR5-6000: 186.69 (SE +/- 1.50, N = 3)
SVT-AV1 2.3 - Encoder Mode: Preset 13 - Input: Bosphorus 4K (Frames Per Second, More Is Better): 8c DDR5-6000: 430.87 (SE +/- 0.57, N = 3); 12c DDR5-6000: 435.38 (SE +/- 1.17, N = 3)
SVT-AV1 2.3 - Encoder Mode: Preset 3 - Input: Beauty 4K 10-bit (Frames Per Second, More Is Better): 8c DDR5-6000: 1.846 (SE +/- 0.002, N = 3); 12c DDR5-6000: 1.852 (SE +/- 0.007, N = 3)
SVT-AV1 2.3 - Encoder Mode: Preset 5 - Input: Beauty 4K 10-bit (Frames Per Second, More Is Better): 8c DDR5-6000: 7.687 (SE +/- 0.026, N = 3); 12c DDR5-6000: 7.699 (SE +/- 0.039, N = 3)
SVT-AV1 2.3 - Encoder Mode: Preset 8 - Input: Beauty 4K 10-bit (Frames Per Second, More Is Better): 8c DDR5-6000: 13.51 (SE +/- 0.02, N = 3); 12c DDR5-6000: 13.55 (SE +/- 0.00, N = 3)
SVT-AV1 2.3 - Encoder Mode: Preset 13 - Input: Beauty 4K 10-bit (Frames Per Second, More Is Better): 8c DDR5-6000: 17.11 (SE +/- 0.00, N = 3); 12c DDR5-6000: 17.15 (SE +/- 0.01, N = 3)
(CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
uvg266
uvg266 0.8.0 - Video Input: Bosphorus 4K - Video Preset: Slow (Frames Per Second, More Is Better): 8c DDR5-6000: 30.30 (SE +/- 0.04, N = 3); 12c DDR5-6000: 30.40 (SE +/- 0.01, N = 3)
uvg266 0.8.0 - Video Input: Bosphorus 4K - Video Preset: Medium (Frames Per Second, More Is Better): 8c DDR5-6000: 32.92 (SE +/- 0.02, N = 3); 12c DDR5-6000: 33.04 (SE +/- 0.06, N = 3)
uvg266 0.8.0 - Video Input: Bosphorus 4K - Video Preset: Very Fast (Frames Per Second, More Is Better): 8c DDR5-6000: 66.12 (SE +/- 0.33, N = 3); 12c DDR5-6000: 67.24 (SE +/- 0.21, N = 3)
uvg266 0.8.0 - Video Input: Bosphorus 4K - Video Preset: Super Fast (Frames Per Second, More Is Better): 8c DDR5-6000: 65.52 (SE +/- 0.32, N = 3); 12c DDR5-6000: 67.26 (SE +/- 0.11, N = 3)
uvg266 0.8.0 - Video Input: Bosphorus 4K - Video Preset: Ultra Fast (Frames Per Second, More Is Better): 8c DDR5-6000: 66.86 (SE +/- 0.79, N = 3); 12c DDR5-6000: 67.65 (SE +/- 0.18, N = 3)
VVenC
VVenC is the Fraunhofer Versatile Video Encoder as a fast/efficient H.266/VVC encoder. The vvenc encoder makes use of SIMD Everywhere (SIMDe). The vvenc software is published under the Clear BSD License. Learn more via the OpenBenchmarking.org test page.
VVenC 1.11 - Video Input: Bosphorus 4K - Video Preset: Fast (Frames Per Second, More Is Better): 8c DDR5-6000: 9.598 (SE +/- 0.064, N = 3); 12c DDR5-6000: 9.643 (SE +/- 0.082, N = 3)
VVenC 1.11 - Video Input: Bosphorus 4K - Video Preset: Faster (Frames Per Second, More Is Better): 8c DDR5-6000: 22.64 (SE +/- 0.02, N = 3); 12c DDR5-6000: 22.96 (SE +/- 0.06, N = 3)
(CXX) g++ options: -O3 -flto=auto -fno-fat-lto-objects
miniBUDE 20210901 - Implementation: OpenMP - Input Deck: BM2 (GFInst/s, More Is Better): 8c DDR5-6000: 7208.78 (SE +/- 12.59, N = 3); 12c DDR5-6000: 7171.45 (SE +/- 56.00, N = 3)
(CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm
TensorFlow
This is a benchmark of the TensorFlow deep learning framework using the TensorFlow reference benchmarks (tensorflow/benchmarks with tf_cnn_benchmarks.py). Note that the Phoronix Test Suite also offers pts/tensorflow-lite for benchmarking the TensorFlow Lite binaries if complementary metrics are desired. Learn more via the OpenBenchmarking.org test page.
TensorFlow 2.16.1 - Device: CPU - Batch Size: 512 - Model: ResNet-50 (images/sec, More Is Better): 8c DDR5-6000: 196.47 (SE +/- 0.27, N = 3); 12c DDR5-6000: 241.55 (SE +/- 0.11, N = 3)
OpenVKL
OpenVKL is the Intel Open Volume Kernel Library, which offers high-performance volume computation kernels and is part of the Intel oneAPI rendering toolkit. Learn more via the OpenBenchmarking.org test page.
OpenVKL 2.0.0 - Benchmark: vklBenchmarkCPU ISPC (Items / Sec, More Is Better): 8c DDR5-6000: 2762 (SE +/- 1.53, N = 3; MIN: 216 / MAX: 33882); 12c DDR5-6000: 2825 (SE +/- 1.76, N = 3; MIN: 217 / MAX: 36373)
LuxCoreRender
LuxCoreRender is an open-source 3D physically based renderer formerly known as LuxRender. LuxCoreRender supports CPU-based rendering as well as GPU acceleration via OpenCL, NVIDIA CUDA, and NVIDIA OptiX interfaces. Learn more via the OpenBenchmarking.org test page.
LuxCoreRender 2.6 - Scene: DLSC - Acceleration: CPU (M samples/sec, More Is Better): 8c DDR5-6000: 17.01 (SE +/- 0.21, N = 15; MIN: 15.67 / MAX: 20.15); 12c DDR5-6000: 16.49 (SE +/- 0.19, N = 3; MIN: 15.79 / MAX: 20.03)
LuxCoreRender 2.6 - Scene: Danish Mood - Acceleration: CPU (M samples/sec, More Is Better): 8c DDR5-6000: 12.06 (SE +/- 0.11, N = 15; MIN: 5.76 / MAX: 14.38); 12c DDR5-6000: 12.03 (SE +/- 0.09, N = 15; MIN: 5.72 / MAX: 14.2)
LuxCoreRender 2.6 - Scene: Orange Juice - Acceleration: CPU (M samples/sec, More Is Better): 8c DDR5-6000: 24.15 (SE +/- 0.09, N = 3; MIN: 21.4 / MAX: 32.92); 12c DDR5-6000: 24.19 (SE +/- 0.09, N = 3; MIN: 21.48 / MAX: 33.13)
LuxCoreRender 2.6 - Scene: LuxCore Benchmark - Acceleration: CPU (M samples/sec, More Is Better): 8c DDR5-6000: 12.92 (SE +/- 0.17, N = 3; MIN: 6.3 / MAX: 15.04); 12c DDR5-6000: 12.39 (SE +/- 0.12, N = 15; MIN: 5.62 / MAX: 14.88)
LuxCoreRender 2.6 - Scene: Rainbow Colors and Prism - Acceleration: CPU (M samples/sec, More Is Better): 8c DDR5-6000: 27.85 (SE +/- 0.02, N = 3; MIN: 24.8 / MAX: 28.48); 12c DDR5-6000: 28.23 (SE +/- 0.19, N = 3; MIN: 25.05 / MAX: 28.69)
RAMspeed SMP 3.5.0 (MB/s, More Is Better)
Type: Copy - Benchmark: Integer: 8c DDR5-6000 = 115580.47 (SE +/- 779.65, N = 3); 12c DDR5-6000 = 139450.46 (SE +/- 266.61, N = 3)
Type: Scale - Benchmark: Integer: 8c DDR5-6000 = 117864.04 (SE +/- 291.38, N = 3); 12c DDR5-6000 = 139548.85 (SE +/- 283.77, N = 3)
Type: Triad - Benchmark: Integer: 8c DDR5-6000 = 104337.42 (SE +/- 795.56, N = 3); 12c DDR5-6000 = 122052.34 (SE +/- 253.90, N = 3)
Type: Average - Benchmark: Integer: 8c DDR5-6000 = 110436.65 (SE +/- 807.29, N = 3); 12c DDR5-6000 = 130318.77 (SE +/- 252.96, N = 3)
1. (CC) gcc options: -O3 -march=native
Stream 2013-01-17 (MB/s, More Is Better)
Type: Scale: 8c DDR5-6000 = 204677.8 (SE +/- 630.12, N = 5); 12c DDR5-6000 = 314946.5 (SE +/- 478.12, N = 5)
Type: Triad: 8c DDR5-6000 = 229604.0 (SE +/- 936.98, N = 5); 12c DDR5-6000 = 348347.3 (SE +/- 1895.03, N = 5)
Type: Add: 8c DDR5-6000 = 233276.2 (SE +/- 1797.63, N = 5); 12c DDR5-6000 = 355858.4 (SE +/- 1639.97, N = 5)
1. (CC) gcc options: -mcmodel=medium -O3 -march=native -fopenmp
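For context, the Stream figures can be set against the theoretical peak bandwidth of each memory configuration. The sketch below assumes that "8c" and "12c" mean 8 and 12 populated DDR5-6000 channels and that each channel moves 8 bytes per transfer (a 64-bit data path). Neither assumption is stated in the results, and the function name is illustrative.

```python
# Rough theoretical peak vs. measured Stream Triad bandwidth.
# Assumption (not from the results page): 8 bytes per transfer per channel.
BYTES_PER_TRANSFER = 8


def peak_mb_per_s(channels: int, mega_transfers_per_s: int) -> int:
    """Theoretical peak bandwidth in MB/s for the given channel count."""
    return channels * mega_transfers_per_s * BYTES_PER_TRANSFER


# Measured Stream Triad results (MB/s) from the results above,
# keyed by the assumed number of populated channels.
triad = {8: 229604.0, 12: 348347.3}

efficiency = {}
for channels, measured in triad.items():
    peak = peak_mb_per_s(channels, 6000)
    efficiency[channels] = measured / peak
    print(f"{channels}c: peak {peak} MB/s, measured {measured} MB/s, "
          f"{efficiency[channels]:.1%} of peak")
```

Under these assumptions both configurations land at roughly 60% of their theoretical peak on Triad, so the measured gain tracks the added channels fairly closely.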
srsRAN Project srsRAN Project is a complete ORAN-native 5G RAN solution created by Software Radio Systems (SRS). The srsRAN Project radio suite was formerly known as srsLTE and can be used for building your own software-defined radio (SDR) 4G/5G mobile network. Learn more via the OpenBenchmarking.org test page.
srsRAN Project 23.10.1-20240325 (Mbps, More Is Better)
Test: PDSCH Processor Benchmark, Throughput Total: 8c DDR5-6000 = 26209.9 (SE +/- 185.11, N = 12); 12c DDR5-6000 = 25941.9 (SE +/- 311.92, N = 3)
Test: PUSCH Processor Benchmark, Throughput Total: 8c DDR5-6000 = 6238.1 (SE +/- 0.12, N = 3; MIN: 3819.5 / MAX: 6238.3); 12c DDR5-6000 = 6237.7 (SE +/- 0.38, N = 3; MIN: 3818.9 / MAX: 6238.4)
1. (CXX) g++ options: -O3 -march=native -mavx2 -mavx -msse4.1 -mfma -mavx512f -mavx512cd -mavx512bw -mavx512dq -fno-trapping-math -fno-math-errno -ldl
MBW 2018-09-08 (MiB/s, More Is Better)
Test: Memory Copy - Array Size: 8192 MiB: 8c DDR5-6000 = 23357.89 (SE +/- 19.02, N = 3); 12c DDR5-6000 = 25126.49 (SE +/- 31.99, N = 3)
Test: Memory Copy, Fixed Block Size - Array Size: 4096 MiB: 8c DDR5-6000 = 18332.41 (SE +/- 54.43, N = 3); 12c DDR5-6000 = 22520.67 (SE +/- 55.98, N = 3)
Test: Memory Copy, Fixed Block Size - Array Size: 8192 MiB: 8c DDR5-6000 = 18188.11 (SE +/- 60.40, N = 3); 12c DDR5-6000 = 22449.83 (SE +/- 64.32, N = 3)
1. (CC) gcc options: -O3 -march=native
7-Zip Compression 24.05 (MIPS, More Is Better)
Test: Compression Rating: 8c DDR5-6000 = 727045 (SE +/- 1850.34, N = 3); 12c DDR5-6000 = 779242 (SE +/- 879.82, N = 3)
Test: Decompression Rating: 8c DDR5-6000 = 626553 (SE +/- 340.29, N = 3); 12c DDR5-6000 = 627215 (SE +/- 169.10, N = 3)
1. (CXX) g++ options: -lpthread -ldl -O2 -fPIC
ASTC Encoder 5.0 (MT/s, More Is Better)
Preset: Fast: 8c DDR5-6000 = 1072.78 (SE +/- 7.37, N = 13); 12c DDR5-6000 = 1109.95 (SE +/- 14.18, N = 3)
Preset: Medium: 8c DDR5-6000 = 568.34 (SE +/- 0.83, N = 3); 12c DDR5-6000 = 568.83 (SE +/- 0.76, N = 3)
Preset: Thorough: 8c DDR5-6000 = 82.19 (SE +/- 0.04, N = 3); 12c DDR5-6000 = 82.28 (SE +/- 0.02, N = 3)
Preset: Exhaustive: 8c DDR5-6000 = 7.1339 (SE +/- 0.0004, N = 3); 12c DDR5-6000 = 7.1334 (SE +/- 0.0018, N = 3)
Preset: Very Thorough: 8c DDR5-6000 = 11.63 (SE +/- 0.00, N = 3); 12c DDR5-6000 = 11.63 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -O3 -flto -pthread
GROMACS This test runs the GROMACS (GROningen MAchine for Chemical Simulations) molecular dynamics package with the water_GMX50 data. This test profile allows selecting between CPU and GPU-based GROMACS builds. Learn more via the OpenBenchmarking.org test page.
GROMACS 2024 (Ns Per Day, More Is Better)
Implementation: MPI CPU - Input: water_GMX50_bare: 8c DDR5-6000 = 16.65 (SE +/- 0.03, N = 3); 12c DDR5-6000 = 17.79 (SE +/- 0.02, N = 3)
1. (CXX) g++ options: -O3 -lm
NAMD 3.0 (ns/day, More Is Better)
Input: ATPase with 327,506 Atoms: 8c DDR5-6000 = 11.62 (SE +/- 0.03, N = 3); 12c DDR5-6000 = 12.29 (SE +/- 0.03, N = 3)
Input: STMV with 1,066,628 Atoms: 8c DDR5-6000 = 3.55953 (SE +/- 0.00613, N = 3); 12c DDR5-6000 = 4.21717 (SE +/- 0.00356, N = 3)
Memcached Memcached is a high-performance, distributed memory object caching system. This Memcached test profile makes use of memtier_benchmark for executing this CPU/memory-focused server benchmark. Learn more via the OpenBenchmarking.org test page.
Memcached 1.6.19 (Ops/sec, More Is Better)
Set To Get Ratio: 1:5: 8c DDR5-6000 = 3711825.75 (SE +/- 3725.22, N = 3); 12c DDR5-6000 = 3701189.23 (SE +/- 16589.87, N = 3)
Set To Get Ratio: 1:10: 8c DDR5-6000 = 6752055.12 (SE +/- 5728.08, N = 3); 12c DDR5-6000 = 6670118.94 (SE +/- 23004.60, N = 3)
Set To Get Ratio: 1:100: 8c DDR5-6000 = 12031156.03 (SE +/- 48609.95, N = 3); 12c DDR5-6000 = 11923788.43 (SE +/- 20796.91, N = 3)
1. (CXX) g++ options: -O2 -levent_openssl -levent -lcrypto -lssl -lpthread -lz -lpcre
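Each result carries a standard error (SE) and a run count (N), which helps separate real differences from run-to-run noise. The sketch below is a rough heuristic rather than a formal significance test: it assumes the two configurations were measured independently and expresses the gap between their means in units of the combined standard error. The function name is illustrative.

```python
import math


def difference_in_standard_errors(mean_a, se_a, mean_b, se_b):
    """Return |mean_a - mean_b| divided by the combined standard error."""
    combined_se = math.sqrt(se_a ** 2 + se_b ** 2)
    return abs(mean_a - mean_b) / combined_se


# Memcached, Set To Get Ratio 1:5 (Ops/sec): 8c vs. 12c.
memcached_1_5 = difference_in_standard_errors(
    3711825.75, 3725.22, 3701189.23, 16589.87)

# Stream Triad (MB/s): 8c vs. 12c.
stream_triad = difference_in_standard_errors(
    229604.0, 936.98, 348347.3, 1895.03)

print(f"Memcached 1:5: {memcached_1_5:.2f} combined SEs apart")
print(f"Stream Triad:  {stream_triad:.2f} combined SEs apart")
```

The Memcached 1:5 gap is smaller than one combined standard error, so it is plausibly noise, whereas the Stream Triad gap is many times larger than its combined standard error. With only three to five runs per result this should be read as a sanity check, not a statistical conclusion.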
ClickHouse ClickHouse is an open-source, high performance OLAP data management system. This test profile uses ClickHouse's standard benchmark recommendations per https://clickhouse.com/docs/en/operations/performance-test/ / https://github.com/ClickHouse/ClickBench/tree/main/clickhouse with the 100 million rows web analytics dataset. The reported value is the query processing time using the geometric mean of all separate queries performed as an aggregate. Learn more via the OpenBenchmarking.org test page.
ClickHouse 22.12.3.5 (Queries Per Minute, Geo Mean, More Is Better)
100M Rows Hits Dataset, First Run / Cold Cache: 8c DDR5-6000 = 637.67 (SE +/- 1.98, N = 3; MIN: 67.95 / MAX: 7500); 12c DDR5-6000 = 740.15 (SE +/- 3.26, N = 3; MIN: 69.2 / MAX: 7500)
100M Rows Hits Dataset, Second Run: 8c DDR5-6000 = 652.53 (SE +/- 0.96, N = 3; MIN: 69.28 / MAX: 7500); 12c DDR5-6000 = 771.05 (SE +/- 2.95, N = 3; MIN: 69.77 / MAX: 8571.43)
100M Rows Hits Dataset, Third Run: 8c DDR5-6000 = 651.48 (SE +/- 1.81, N = 3; MIN: 70.92 / MAX: 6666.67); 12c DDR5-6000 = 765.07 (SE +/- 4.74, N = 3; MIN: 69.77 / MAX: 8571.43)
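The ClickHouse description says the reported value is a geometric mean over all queries, and the charts give it in queries per minute. A minimal sketch of such an aggregate follows, assuming each query's time in seconds is converted to a rate as 60 / time before averaging. That conversion is an inference (the MAX values of 7500 and 8571.43 match 60 / 0.008 and 60 / 0.007), not something the test profile states, and the sample times are hypothetical.

```python
import math


def geo_mean_queries_per_minute(query_times_s):
    """Geometric mean of per-query rates, in queries per minute."""
    rates = [60.0 / t for t in query_times_s]
    # Average the logarithms instead of multiplying the rates directly,
    # which avoids overflow when there are many queries.
    return math.exp(sum(math.log(r) for r in rates) / len(rates))


# Hypothetical per-query times in seconds (not taken from the results).
sample_times = [0.008, 0.1, 1.0]
result = geo_mean_queries_per_minute(sample_times)
print(f"{result:.2f} queries per minute (geometric mean)")
```

A geometric mean keeps a handful of very fast queries from dominating the aggregate, which an arithmetic mean of the rates would allow.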
John The Ripper 2023.03.14 (Real C/S, More Is Better)
Test: WPA PSK: 8c DDR5-6000 = 1019333 (SE +/- 1666.67, N = 3); 12c DDR5-6000 = 1023333 (SE +/- 2185.81, N = 3)
Test: Blowfish: 8c DDR5-6000 = 237458 (SE +/- 16.64, N = 3); 12c DDR5-6000 = 237515 (SE +/- 16.90, N = 3)
Test: HMAC-SHA512: 8c DDR5-6000 = 413326333 (SE +/- 211622.57, N = 3); 12c DDR5-6000 = 413474333 (SE +/- 2811379.81, N = 3)
Test: MD5: 8c DDR5-6000 = 21408333 (SE +/- 32518.37, N = 3); 12c DDR5-6000 = 21681000 (SE +/- 20808.65, N = 3)
1. (CC) gcc options: -m64 -lssl -lcrypto -fopenmp -lgmp -lm -lrt -lz -ldl -lcrypt -lbz2
Stargate Digital Audio Workstation Stargate is an open-source, cross-platform digital audio workstation (DAW) software package with "a unique and carefully curated experience" with scalability from old systems up through modern multi-core systems. Stargate is GPLv3 licensed and makes use of Qt5 (PyQt5) for its user-interface. Learn more via the OpenBenchmarking.org test page.
Stargate Digital Audio Workstation 22.11.5 (Render Ratio, More Is Better)
Sample Rate: 96000 - Buffer Size: 512: 8c DDR5-6000 = 5.918595 (SE +/- 0.033577, N = 3); 12c DDR5-6000 = 5.943979 (SE +/- 0.018441, N = 3)
Sample Rate: 192000 - Buffer Size: 512: 8c DDR5-6000 = 3.896311 (SE +/- 0.040462, N = 3); 12c DDR5-6000 = 3.932179 (SE +/- 0.003414, N = 3)
Sample Rate: 96000 - Buffer Size: 1024: 8c DDR5-6000 = 6.517349 (SE +/- 0.002866, N = 3); 12c DDR5-6000 = 6.539322 (SE +/- 0.024623, N = 3)
Sample Rate: 192000 - Buffer Size: 1024: 8c DDR5-6000 = 4.388974 (SE +/- 0.014254, N = 3); 12c DDR5-6000 = 4.429570 (SE +/- 0.018753, N = 3)
1. (CXX) g++ options: -lpthread -lsndfile -lm -O3 -march=native -ffast-math -funroll-loops -fstrength-reduce -fstrict-aliasing -finline-functions
nginx This is a benchmark of the lightweight Nginx HTTP(S) web server. This Nginx web server benchmark test profile makes use of the wrk program for facilitating the HTTP requests over a fixed period of time with a configurable number of concurrent clients/connections. HTTPS with a self-signed OpenSSL certificate is used by this test for local benchmarking. Learn more via the OpenBenchmarking.org test page.
nginx 1.23.2 (Requests Per Second, More Is Better)
Connections: 500: 8c DDR5-6000 = 523766.78 (SE +/- 3145.55, N = 3); 12c DDR5-6000 = 505484.86 (SE +/- 328.79, N = 3)
1. (CC) gcc options: -lluajit-5.1 -lm -lssl -lcrypto -lpthread -ldl -std=c99 -O2
Apache HTTP Server This is a test of the Apache HTTPD web server. This Apache HTTPD web server benchmark test profile makes use of the wrk program for facilitating the HTTP requests over a fixed period of time with a configurable number of concurrent clients. Learn more via the OpenBenchmarking.org test page.
Apache HTTP Server 2.4.56 (Requests Per Second, More Is Better)
Concurrent Requests: 500: 8c DDR5-6000 = 159864.99 (SE +/- 177.80, N = 3); 12c DDR5-6000 = 170467.37 (SE +/- 413.18, N = 3)
1. (CC) gcc options: -lluajit-5.1 -lm -lssl -lcrypto -lpthread -ldl -std=c99 -O2
Liquid-DSP LiquidSDR's Liquid-DSP is a software-defined radio (SDR) digital signal processing library. This test profile runs a multi-threaded benchmark of this SDR/DSP library focused on embedded platform usage. Learn more via the OpenBenchmarking.org test page.
Liquid-DSP 1.6 (samples/s, More Is Better)
Threads: 64 - Buffer Length: 256 - Filter Length: 57: 8c DDR5-6000 = 3214133333 (SE +/- 3636084.59, N = 3); 12c DDR5-6000 = 3195900000 (SE +/- 1814754.35, N = 3)
Threads: 128 - Buffer Length: 256 - Filter Length: 57: 8c DDR5-6000 = 5053200000 (SE +/- 8868107.65, N = 3); 12c DDR5-6000 = 4963433333 (SE +/- 8326730.72, N = 3)
Threads: 192 - Buffer Length: 256 - Filter Length: 57: 8c DDR5-6000 = 5955433333 (SE +/- 8738103.02, N = 3); 12c DDR5-6000 = 5958600000 (SE +/- 9761830.43, N = 3)
Threads: 64 - Buffer Length: 256 - Filter Length: 512: 8c DDR5-6000 = 1417633333 (SE +/- 2356079.61, N = 3); 12c DDR5-6000 = 1412966667 (SE +/- 1419311.26, N = 3)
Threads: 128 - Buffer Length: 256 - Filter Length: 512: 8c DDR5-6000 = 1956333333 (SE +/- 1072898.46, N = 3); 12c DDR5-6000 = 1952100000 (SE +/- 6251666.44, N = 3)
Threads: 192 - Buffer Length: 256 - Filter Length: 512: 8c DDR5-6000 = 2136766667 (SE +/- 821245.67, N = 3); 12c DDR5-6000 = 2149200000 (SE +/- 1582192.57, N = 3)
1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Graph500 This is a benchmark of the reference implementation of Graph500, an HPC benchmark focused on data intensive loads and commonly tested on supercomputers for complex data problems. Graph500 primarily stresses the communication subsystem of the hardware under test. Learn more via the OpenBenchmarking.org test page.
Graph500 3.0, Scale: 26 (More Is Better)
sssp max_TEPS: 8c DDR5-6000 = 676709000; 12c DDR5-6000 = 747138000
sssp median_TEPS: 8c DDR5-6000 = 497798000; 12c DDR5-6000 = 554936000
1. (CC) gcc options: -fcommon -O3 -lpthread -lm -lmpi
NAS Parallel Benchmarks NPB, NAS Parallel Benchmarks, is a benchmark developed by NASA for high-end computer systems. This test profile currently uses the MPI version of NPB. This test profile offers selecting the different NPB tests/problems and varying problem sizes. Learn more via the OpenBenchmarking.org test page.
NAS Parallel Benchmarks 3.4 (Total Mop/s, More Is Better)
Test / Class: BT.C: 8c DDR5-6000 = 284718.53 (SE +/- 377.10, N = 3); 12c DDR5-6000 = 361153.06 (SE +/- 2078.86, N = 3)
Test / Class: CG.C: 8c DDR5-6000 = 60572.20 (SE +/- 631.87, N = 15); 12c DDR5-6000 = 65380.58 (SE +/- 928.22, N = 15)
Test / Class: EP.C: 8c DDR5-6000 = 11375.83 (SE +/- 91.88, N = 3); 12c DDR5-6000 = 11230.81 (SE +/- 85.90, N = 3)
Test / Class: EP.D: 8c DDR5-6000 = 14658.25 (SE +/- 133.00, N = 3); 12c DDR5-6000 = 13685.97 (SE +/- 614.47, N = 12)
Test / Class: FT.C: 8c DDR5-6000 = 120202.20 (SE +/- 1124.88, N = 3); 12c DDR5-6000 = 164903.45 (SE +/- 1877.63, N = 4)
Test / Class: IS.D: 8c DDR5-6000 = 5653.35 (SE +/- 31.49, N = 3); 12c DDR5-6000 = 6970.89 (SE +/- 26.78, N = 3)
Test / Class: LU.C: 8c DDR5-6000 = 275011.41 (SE +/- 1080.76, N = 3); 12c DDR5-6000 = 311883.80 (SE +/- 2348.10, N = 3)
Test / Class: MG.C: 8c DDR5-6000 = 105206.58 (SE +/- 1082.06, N = 15); 12c DDR5-6000 = 147097.41 (SE +/- 2056.93, N = 3)
Test / Class: SP.B: 8c DDR5-6000 = 169436.81 (SE +/- 1524.43, N = 15); 12c DDR5-6000 = 199255.89 (SE +/- 2749.03, N = 3)
Test / Class: SP.C: 8c DDR5-6000 = 108558.21 (SE +/- 459.55, N = 3); 12c DDR5-6000 = 154965.84 (SE +/- 622.44, N = 3)
1. (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz 2. Open MPI 4.1.6
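To see which NPB kernels respond most to the memory configuration, the Total Mop/s values above can be reduced to a 12c / 8c ratio per test. This is a small illustrative sketch. The variable names are made up here, and the remark about a 1.5x ceiling assumes that "8c" and "12c" refer to populated memory channels.

```python
# NAS Parallel Benchmarks Total Mop/s from the results above: (8c, 12c).
npb = {
    "BT.C": (284718.53, 361153.06),
    "CG.C": (60572.20, 65380.58),
    "EP.C": (11375.83, 11230.81),
    "EP.D": (14658.25, 13685.97),
    "FT.C": (120202.20, 164903.45),
    "IS.D": (5653.35, 6970.89),
    "LU.C": (275011.41, 311883.80),
    "MG.C": (105206.58, 147097.41),
    "SP.B": (169436.81, 199255.89),
    "SP.C": (108558.21, 154965.84),
}

# A ratio above 1.0 means the 12c configuration was faster.
speedup = {name: twelve / eight for name, (eight, twelve) in npb.items()}

# Print the tests from most to least sensitive to the memory configuration.
for name, ratio in sorted(speedup.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {ratio:.3f}x")
```

SP.C, MG.C and FT.C gain the most, while the EP (embarrassingly parallel) tests show no benefit, which fits a workload that does little memory traffic. No test exceeds the 1.5x ratio that the channel counts alone would allow under the stated assumption.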
PostgreSQL 17, Scaling Factor: 1000 - Clients: 800 (TPS, More Is Better)
Mode: Read Only: 8c DDR5-6000 = 3149277 (SE +/- 8342.05, N = 3); 12c DDR5-6000 = 3155036 (SE +/- 1685.88, N = 3)
Mode: Read Write: 8c DDR5-6000 = 112175 (SE +/- 836.14, N = 11); 12c DDR5-6000 = 113745 (SE +/- 1257.14, N = 5)
1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpq -lpgcommon -lpgport -lm
Chaos Group V-RAY This is a test of Chaos Group's V-RAY benchmark. V-RAY is a commercial renderer that can integrate with various creator software products like SketchUp and 3ds Max. The V-RAY benchmark is standalone and supports CPU and NVIDIA CUDA/RTX based rendering. Learn more via the OpenBenchmarking.org test page.
Chaos Group V-RAY 6.0 (vsamples, More Is Better)
Mode: CPU: 8c DDR5-6000 = 198145 (SE +/- 357.23, N = 3); 12c DDR5-6000 = 199769 (SE +/- 252.42, N = 3)
LiteRT 2024-10-15 (Microseconds, Fewer Is Better)
Model: NASNet Mobile: 8c DDR5-6000 = 100519.4 (SE +/- 1096.54, N = 15); 12c DDR5-6000 = 99408.3 (SE +/- 1271.15, N = 3)
Model: Mobilenet Float: 8c DDR5-6000 = 4441.27 (SE +/- 18.31, N = 3); 12c DDR5-6000 = 4399.48 (SE +/- 16.88, N = 3)
Model: Mobilenet Quant: 8c DDR5-6000 = 6260.21 (SE +/- 155.36, N = 15); 12c DDR5-6000 = 5800.65 (SE +/- 109.66, N = 15)
Model: Inception ResNet V2: 8c DDR5-6000 = 59977.8 (SE +/- 811.71, N = 3); 12c DDR5-6000 = 58054.3 (SE +/- 598.16, N = 3)
Model: Quantized COCO SSD MobileNet v1: 8c DDR5-6000 = 7110.12 (SE +/- 68.84, N = 15); 12c DDR5-6000 = 7211.99 (SE +/- 28.16, N = 3)
oneDNN 3.6 (ms, Fewer Is Better)
Harness: IP Shapes 1D - Engine: CPU: 8c DDR5-6000 = 0.529052 (SE +/- 0.004111, N = 3; MIN: 0.47); 12c DDR5-6000 = 0.531984 (SE +/- 0.003116, N = 3; MIN: 0.48)
Harness: IP Shapes 3D - Engine: CPU: 8c DDR5-6000 = 0.265558 (SE +/- 0.000649, N = 3; MIN: 0.25); 12c DDR5-6000 = 0.266791 (SE +/- 0.000721, N = 3; MIN: 0.25)
Harness: Convolution Batch Shapes Auto - Engine: CPU: 8c DDR5-6000 = 0.341374 (SE +/- 0.000677, N = 3; MIN: 0.33); 12c DDR5-6000 = 0.343593 (SE +/- 0.000927, N = 3; MIN: 0.32)
Harness: Deconvolution Batch shapes_1d - Engine: CPU: 8c DDR5-6000 = 6.84402 (SE +/- 0.01476, N = 3; MIN: 5.14); 12c DDR5-6000 = 6.76736 (SE +/- 0.02120, N = 3; MIN: 4.07)
Harness: Deconvolution Batch shapes_3d - Engine: CPU: 8c DDR5-6000 = 0.715363 (SE +/- 0.001800, N = 3; MIN: 0.62); 12c DDR5-6000 = 0.716373 (SE +/- 0.002304, N = 3; MIN: 0.62)
Harness: Recurrent Neural Network Training - Engine: CPU: 8c DDR5-6000 = 429.68 (SE +/- 0.28, N = 3; MIN: 423.83); 12c DDR5-6000 = 425.78 (SE +/- 0.20, N = 3; MIN: 420.18)
Harness: Recurrent Neural Network Inference - Engine: CPU: 8c DDR5-6000 = 278.00 (SE +/- 0.67, N = 3; MIN: 271.73); 12c DDR5-6000 = 277.64 (SE +/- 0.81, N = 3; MIN: 269.94)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl
PostgreSQL 17, Scaling Factor: 1000 - Clients: 800 (ms, Fewer Is Better)
Mode: Read Only - Average Latency: 8c DDR5-6000 = 0.254 (SE +/- 0.001, N = 3); 12c DDR5-6000 = 0.254 (SE +/- 0.000, N = 3)
Mode: Read Write - Average Latency: 8c DDR5-6000 = 7.136 (SE +/- 0.053, N = 11); 12c DDR5-6000 = 7.037 (SE +/- 0.079, N = 5)
1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpq -lpgcommon -lpgport -lm
OpenVINO This is a test of Intel OpenVINO, a toolkit around neural networks, using its built-in benchmarking support and analyzing the throughput and latency for various models. Learn more via the OpenBenchmarking.org test page.
OpenVINO 2024.0, Device: CPU (ms, Fewer Is Better)
Model: Person Detection FP16: 8c DDR5-6000 = 106.96 (SE +/- 1.46, N = 15; MIN: 37.41 / MAX: 380.85); 12c DDR5-6000 = 69.48 (SE +/- 0.19, N = 3; MIN: 34.01 / MAX: 134.55)
Model: Face Detection FP16-INT8: 8c DDR5-6000 = 330.13 (SE +/- 0.26, N = 3; MIN: 257.44 / MAX: 357.54); 12c DDR5-6000 = 330.01 (SE +/- 0.62, N = 3; MIN: 242.42 / MAX: 353.46)
Model: Vehicle Detection FP16-INT8: 8c DDR5-6000 = 5.80 (SE +/- 0.00, N = 3; MIN: 2.26 / MAX: 26.37); 12c DDR5-6000 = 5.80 (SE +/- 0.01, N = 3; MIN: 2.41 / MAX: 27.54)
Model: Face Detection Retail FP16-INT8: 8c DDR5-6000 = 3.97 (SE +/- 0.00, N = 3; MIN: 1.61 / MAX: 20.64); 12c DDR5-6000 = 3.96 (SE +/- 0.01, N = 3; MIN: 1.68 / MAX: 20.25)
Model: Road Segmentation ADAS FP16-INT8: 8c DDR5-6000 = 20.85 (SE +/- 0.02, N = 3; MIN: 10.51 / MAX: 40.51); 12c DDR5-6000 = 19.61 (SE +/- 0.01, N = 3; MIN: 10.37 / MAX: 41.25)
Model: Machine Translation EN To DE FP16: 8c DDR5-6000 = 64.75 (SE +/- 0.40, N = 3; MIN: 28.81 / MAX: 99.11); 12c DDR5-6000 = 56.71 (SE +/- 0.09, N = 3; MIN: 34.97 / MAX: 96.61)
Model: Weld Porosity Detection FP16-INT8: 8c DDR5-6000 = 6.70 (SE +/- 0.00, N = 3; MIN: 2.45 / MAX: 24); 12c DDR5-6000 = 6.71 (SE +/- 0.01, N = 3; MIN: 2.58 / MAX: 23.11)
Model: Person Vehicle Bike Detection FP16: 8c DDR5-6000 = 7.53 (SE +/- 0.02, N = 3; MIN: 4.35 / MAX: 22.92); 12c DDR5-6000 = 7.48 (SE +/- 0.01, N = 3; MIN: 4.42 / MAX: 23.35)
Model: Noise Suppression Poconet-Like FP16: 8c DDR5-6000 = 12.42 (SE +/- 0.02, N = 3; MIN: 6.11 / MAX: 28.55); 12c DDR5-6000 = 11.57 (SE +/- 0.03, N = 3; MIN: 6.95 / MAX: 30.79)
Model: Person Re-Identification Retail FP16: 8c DDR5-6000 = 4.56 (SE +/- 0.01, N = 3; MIN: 2.23 / MAX: 20.98); 12c DDR5-6000 = 4.56 (SE +/- 0.00, N = 3; MIN: 2.45 / MAX: 18.95)
Model: Handwritten English Recognition FP16-INT8: 8c DDR5-6000 = 27.06 (SE +/- 0.01, N = 3; MIN: 16.36 / MAX: 45.56); 12c DDR5-6000 = 27.01 (SE +/- 0.04, N = 3; MIN: 16.35 / MAX: 47.96)
Model: Age Gender Recognition Retail 0013 FP16-INT8: 8c DDR5-6000 = 0.33 (SE +/- 0.00, N = 3; MIN: 0.13 / MAX: 24.63); 12c DDR5-6000 = 0.33 (SE +/- 0.00, N = 3; MIN: 0.12 / MAX: 23.43)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
Xcompact3d Incompact3d Xcompact3d Incompact3d is a Fortran-MPI based, finite difference high-performance code for solving the incompressible Navier-Stokes equations and as many scalar transport equations as needed. Learn more via the OpenBenchmarking.org test page.
Xcompact3d Incompact3d 2021-03-11 (Seconds, Fewer Is Better)
Input: X3D-benchmarking input.i3d: 8c DDR5-6000 = 292.58 (SE +/- 3.79, N = 3); 12c DDR5-6000 = 203.94 (SE +/- 2.76, N = 3)
Input: input.i3d 193 Cells Per Direction: 8c DDR5-6000 = 9.01416429 (SE +/- 0.05824182, N = 15); 12c DDR5-6000 = 7.08492692 (SE +/- 0.06792326, N = 3)
1. (F9X) gfortran options: -cpp -O2 -funroll-loops -floop-optimize -fcray-pointer -fbacktrace -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
OpenFOAM OpenFOAM is the leading free, open-source software for computational fluid dynamics (CFD). This test profile currently uses the drivaerFastback test case for analyzing automotive aerodynamics or alternatively the older motorBike input. Learn more via the OpenBenchmarking.org test page.
OpenFOAM 10 (Seconds, Fewer Is Better)
Input: drivaerFastback, Small Mesh Size - Mesh Time: 8c DDR5-6000 = 22.30; 12c DDR5-6000 = 22.94
Input: drivaerFastback, Small Mesh Size - Execution Time: 8c DDR5-6000 = 22.65; 12c DDR5-6000 = 22.41
Input: drivaerFastback, Medium Mesh Size - Mesh Time: 8c DDR5-6000 = 109.20; 12c DDR5-6000 = 106.31
Input: drivaerFastback, Medium Mesh Size - Execution Time: 8c DDR5-6000 = 254.30; 12c DDR5-6000 = 192.23
1. (CXX) g++ options: -std=c++14 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -lfiniteVolume -lmeshTools -lparallel -llagrangian -lregionModels -lgenericPatchFields -lOpenFOAM -ldl -lm
OpenRadioss
OpenRadioss is an open-source, AGPL-licensed finite element solver for dynamic event analysis. OpenRadioss is based on Altair Radioss and was open-sourced in 2022. This open-source finite element solver is benchmarked with various example models available from https://www.openradioss.org/models/ and https://github.com/OpenRadioss/ModelExchange/tree/main/Examples. This test currently uses a reference OpenRadioss binary build offered via GitHub. Learn more via the OpenBenchmarking.org test page.
OpenRadioss 2023.09.15, Model: Chrysler Neon 1M (Seconds, Fewer Is Better)
8c DDR5-6000: 129.17 (SE +/- 0.94, N = 11)
12c DDR5-6000: 90.77 (SE +/- 0.26, N = 3)
SPECFEM3D simulates acoustic (fluid), elastic (solid), coupled acoustic/elastic, poroelastic or seismic wave propagation in any type of conforming mesh of hexahedra. This test profile currently relies on CPU-based execution for SPECFEM3D and using a variety of their built-in examples/models for benchmarking. Learn more via the OpenBenchmarking.org test page.
SPECFEM3D 4.1.1, Model: Mount St. Helens (Seconds, Fewer Is Better)
8c DDR5-6000: 5.855350933 (SE +/- 0.034598254, N = 3)
12c DDR5-6000: 5.778714613 (SE +/- 0.055041694, N = 6)
SPECFEM3D 4.1.1, Model: Layered Halfspace (Seconds, Fewer Is Better)
8c DDR5-6000: 13.86 (SE +/- 0.11, N = 3)
12c DDR5-6000: 13.70 (SE +/- 0.07, N = 3)
SPECFEM3D 4.1.1, Model: Tomographic Model (Seconds, Fewer Is Better)
8c DDR5-6000: 6.626369356 (SE +/- 0.080343793, N = 3)
12c DDR5-6000: 6.777958300 (SE +/- 0.086144824, N = 3)
SPECFEM3D 4.1.1, Model: Homogeneous Halfspace (Seconds, Fewer Is Better)
8c DDR5-6000: 8.198005837 (SE +/- 0.043840287, N = 3)
12c DDR5-6000: 8.194862498 (SE +/- 0.106640607, N = 3)
SPECFEM3D 4.1.1, Model: Water-layered Halfspace (Seconds, Fewer Is Better)
8c DDR5-6000: 14.36 (SE +/- 0.07, N = 3)
12c DDR5-6000: 14.28 (SE +/- 0.08, N = 3)
1. (F9X) gfortran options: -O2 -fopenmp -std=f2008 -fimplicit-none -fmax-errors=10 -pedantic -pedantic-errors -O3 -finline-functions -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
Build2
Build2 0.17, Time To Compile (Seconds, Fewer Is Better)
8c DDR5-6000: 53.55 (SE +/- 0.16, N = 3)
12c DDR5-6000: 53.44 (SE +/- 0.01, N = 3)
GPAW
GPAW is a density-functional theory (DFT) Python code based on the projector-augmented wave (PAW) method and the atomic simulation environment (ASE). Learn more via the OpenBenchmarking.org test page.
GPAW 23.6, Input: Carbon Nanotube (Seconds, Fewer Is Better)
8c DDR5-6000: 28.56 (SE +/- 0.24, N = 3)
12c DDR5-6000: 25.66 (SE +/- 0.23, N = 3)
1. (CC) gcc options: -shared -fwrapv -O2 -lxc -lblas -lmpi
Blender
Blender 4.2, Blend File: BMW27 - Compute: CPU-Only (Seconds, Fewer Is Better)
8c DDR5-6000: 13.09 (SE +/- 0.05, N = 3)
12c DDR5-6000: 13.14 (SE +/- 0.05, N = 3)
Blender 4.2, Blend File: Junkshop - Compute: CPU-Only (Seconds, Fewer Is Better)
8c DDR5-6000: 16.80 (SE +/- 0.02, N = 3)
12c DDR5-6000: 16.75 (SE +/- 0.01, N = 3)
Blender 4.2, Blend File: Classroom - Compute: CPU-Only (Seconds, Fewer Is Better)
8c DDR5-6000: 35.51 (SE +/- 0.08, N = 3)
12c DDR5-6000: 35.43 (SE +/- 0.04, N = 3)
Blender 4.2, Blend File: Fishy Cat - Compute: CPU-Only (Seconds, Fewer Is Better)
8c DDR5-6000: 18.55 (SE +/- 0.03, N = 3)
12c DDR5-6000: 18.46 (SE +/- 0.05, N = 3)
Blender 4.2, Blend File: Barbershop - Compute: CPU-Only (Seconds, Fewer Is Better)
8c DDR5-6000: 121.15 (SE +/- 0.12, N = 3)
12c DDR5-6000: 120.83 (SE +/- 0.11, N = 3)
Blender 4.2, Blend File: Pabellon Barcelona - Compute: CPU-Only (Seconds, Fewer Is Better)
8c DDR5-6000: 41.18 (SE +/- 0.15, N = 3)
12c DDR5-6000: 41.25 (SE +/- 0.14, N = 3)
Epoch
Epoch 4.19.4, Epoch3D Deck: Cone (Seconds, Fewer Is Better)
8c DDR5-6000: 196.03 (SE +/- 0.52, N = 3)
12c DDR5-6000: 186.85 (SE +/- 0.55, N = 3)
1. (F9X) gfortran options: -O3 -std=f2003 -Jobj -lsdf -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
WarpX
WarpX 24.10, Input: Uniform Plasma (Seconds, Fewer Is Better)
8c DDR5-6000: 16.45 (SE +/- 0.24, N = 12)
12c DDR5-6000: 16.27 (SE +/- 0.23, N = 15)
WarpX 24.10, Input: Plasma Acceleration (Seconds, Fewer Is Better)
8c DDR5-6000: 25.76 (SE +/- 0.17, N = 3)
12c DDR5-6000: 25.85 (SE +/- 0.22, N = 3)
1. (CXX) g++ options: -O3 -lm
XNNPACK
XNNPACK b7b048, Model: FP32MobileNetV1 (us, Fewer Is Better)
8c DDR5-6000: 4596 (SE +/- 41.29, N = 3)
12c DDR5-6000: 4629 (SE +/- 57.47, N = 3)
XNNPACK b7b048, Model: FP32MobileNetV2 (us, Fewer Is Better)
8c DDR5-6000: 9236 (SE +/- 46.71, N = 3)
12c DDR5-6000: 9242 (SE +/- 73.61, N = 3)
XNNPACK b7b048, Model: FP32MobileNetV3Large (us, Fewer Is Better)
8c DDR5-6000: 14189 (SE +/- 140.55, N = 3)
12c DDR5-6000: 13643 (SE +/- 63.76, N = 3)
XNNPACK b7b048, Model: FP32MobileNetV3Small (us, Fewer Is Better)
8c DDR5-6000: 10534 (SE +/- 8.95, N = 3)
12c DDR5-6000: 10418 (SE +/- 54.11, N = 3)
XNNPACK b7b048, Model: FP16MobileNetV1 (us, Fewer Is Better)
8c DDR5-6000: 4570 (SE +/- 11.46, N = 3)
12c DDR5-6000: 4611 (SE +/- 43.59, N = 3)
XNNPACK b7b048, Model: FP16MobileNetV2 (us, Fewer Is Better)
8c DDR5-6000: 9094 (SE +/- 49.08, N = 3)
12c DDR5-6000: 8982 (SE +/- 96.09, N = 3)
XNNPACK b7b048, Model: FP16MobileNetV3Large (us, Fewer Is Better)
8c DDR5-6000: 13257 (SE +/- 75.41, N = 3)
12c DDR5-6000: 13074 (SE +/- 58.09, N = 3)
XNNPACK b7b048, Model: FP16MobileNetV3Small (us, Fewer Is Better)
8c DDR5-6000: 10810 (SE +/- 162.45, N = 3)
12c DDR5-6000: 10559 (SE +/- 181.63, N = 3)
XNNPACK b7b048, Model: QS8MobileNetV2 (us, Fewer Is Better)
8c DDR5-6000: 9968 (SE +/- 103.51, N = 3)
12c DDR5-6000: 10095 (SE +/- 60.71, N = 3)
1. (CXX) g++ options: -O3 -lrt -lm
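For the fewer-is-better results above, one way to read the gap between the two memory configurations is the relative reduction in run time, (t_8c - t_12c) / t_8c. The following is a minimal Python sketch of that arithmetic, not part of the original result file: the helper name `percent_faster` and the choice of which results to include are assumptions made for illustration, and the timings are copied from the charts above.

```python
def percent_faster(time_8c: float, time_12c: float) -> float:
    """Percent reduction in run time going from the 8-channel to the
    12-channel configuration. Positive means 12c finished sooner.
    (Hypothetical helper, not part of the Phoronix Test Suite.)"""
    return (time_8c - time_12c) / time_8c * 100.0


# (8c seconds, 12c seconds), copied from the results above.
results = {
    "Xcompact3d input.i3d 193 Cells Per Direction": (9.01416429, 7.08492692),
    "OpenFOAM drivaerFastback Medium - Execution Time": (254.30, 192.23),
    "OpenRadioss Chrysler Neon 1M": (129.17, 90.77),
    "Blender BMW27": (13.09, 13.14),
}

for name, (t_8c, t_12c) in results.items():
    print(f"{name}: {percent_faster(t_8c, t_12c):+.1f}%")
```

A negative value means the 8-channel run was quicker. Differences that fall within the reported standard errors are better read as noise than as a real change.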
12c DDR5-6000 Processor: AMD EPYC 9655P 96-Core @ 2.60GHz (96 Cores / 192 Threads), Motherboard: Supermicro Super Server H13SSL-N v1.01 (3.0 BIOS), Chipset: AMD 1Ah, Memory: 12 x 64GB DDR5-6000MT/s Micron MTC40F2046S1RC64BDY QSFF, Disk: 3201GB Micron_7450_MTFDKCB3T2TFS, Graphics: ASPEED, Network: 2 x Broadcom NetXtreme BCM5720 PCIe
OS: Ubuntu 24.10, Kernel: 6.12.0-rc7-phx (x86_64), Desktop: GNOME Shell 47.0, Display Server: X Server, Compiler: GCC 14.2.0, File-System: ext4, Screen Resolution: 1024x768
Kernel Notes: Transparent Huge Pages: madvise
Compiler Notes: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2,rust --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-14-zdkDXv/gcc-14-14.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-14-zdkDXv/gcc-14-14.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Notes: Scaling Governor: acpi-cpufreq performance (Boost: Enabled) - CPU Microcode: 0xb002116
Python Notes: Python 3.12.7
Security Notes: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + reg_file_data_sampling: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; STIBP: always-on; RSB filling; PBRSB-eIBRS: Not affected; BHI: Not affected + srbds: Not affected + tsx_async_abort: Not affected
Testing initiated at 15 November 2024 11:35 by user phoronix.
8c DDR5-6000 Processor: AMD EPYC 9655P 96-Core @ 2.60GHz (96 Cores / 192 Threads), Motherboard: Supermicro Super Server H13SSL-N v1.01 (3.0 BIOS), Chipset: AMD 1Ah, Memory: 8 x 64GB DDR5-6000MT/s, Disk: 3201GB Micron_7450_MTFDKCB3T2TFS, Graphics: ASPEED, Network: 2 x Broadcom NetXtreme BCM5720 PCIe
OS: Ubuntu 24.10, Kernel: 6.12.0-rc7-phx (x86_64), Desktop: GNOME Shell 47.0, Display Server: X Server, Compiler: GCC 14.2.0, File-System: ext4, Screen Resolution: 1024x768
Kernel Notes: Transparent Huge Pages: madvise
Compiler Notes: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2,rust --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-14-zdkDXv/gcc-14-14.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-14-zdkDXv/gcc-14-14.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Notes: Scaling Governor: acpi-cpufreq performance (Boost: Enabled) - CPU Microcode: 0xb002116
Java Notes: OpenJDK Runtime Environment (build 21.0.5+11-Ubuntu-1ubuntu124.10)
Python Notes: Python 3.12.7
Security Notes: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + reg_file_data_sampling: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; STIBP: always-on; RSB filling; PBRSB-eIBRS: Not affected; BHI: Not affected + srbds: Not affected + tsx_async_abort: Not affected
Testing initiated at 15 November 2024 21:45 by user phoronix.