AMD Threadripper 7995WX NPS / SNC2 SNC4 Benchmarks

AMD Ryzen Threadripper PRO 7995WX 96-Cores testing of NPS/SNC settings with default (disabled), SNC2, and SNC4 modes. Benchmarks by Michael Larabel for a future article.

HTML result view exported from: https://openbenchmarking.org/result/2311288-NE-TR7995WXN68&grs&sor
Configurations: Default - Disabled, SNC2, SNC4

  Processor: AMD Ryzen Threadripper PRO 7995WX 96-Cores @ 6.44GHz (96 Cores / 192 Threads)
  Motherboard: HP 8B24 (U65 Ver. 01.01.04 BIOS)
  Chipset: AMD Device 14a4
  Memory: 128GB
  Disk: 2 x 1024GB SAMSUNG MZVL21T0HCLR-00BH1
  Graphics: NVIDIA RTX A4000 16GB
  Audio: NVIDIA GA104 HD Audio
  Monitor: ASUS VP28U
  Network: Realtek RTL8111/8168/8411
  OS: Ubuntu 23.10
  Kernel: 6.5.0-13-generic (x86_64)
  Desktop: GNOME Shell 45.0
  Display Server: X Server 1.21.1.7
  Display Driver: NVIDIA 535.129.03
  OpenGL: 4.6.0
  OpenCL: OpenCL 3.0 CUDA 12.2.147
  Compiler: GCC 13.2.0
  File-System: ext4
  Screen Resolution: 3840x2160

Kernel Details
  Transparent Huge Pages: madvise

Compiler Details
  --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-XYspKM/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-XYspKM/gcc-13-13.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v

Processor Details
  Scaling Governor: amd-pstate-epp powersave (EPP: balance_performance)
  CPU Microcode: 0xa108105

OpenCL Details
  GPU Compute Cores: 6144

Python Details
  Python 3.11.6

Security Details
  gather_data_sampling: Not affected
  itlb_multihit: Not affected
  l1tf: Not affected
  mds: Not affected
  meltdown: Not affected
  mmio_stale_data: Not affected
  retbleed: Not affected
  spec_rstack_overflow: Mitigation of safe RET
  spec_store_bypass: Mitigation of SSB disabled via prctl
  spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
  spectre_v2: Mitigation of Enhanced / Automatic IBRS IBPB: conditional STIBP: always-on RSB filling PBRSB-eIBRS: Not affected
  srbds: Not affected
  tsx_async_abort: Not affected
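The system details above report Transparent Huge Pages as madvise, and the three configurations differ only in the NPS/SNC firmware setting. As a rough sketch of how one might confirm that a boot picked up the intended setting before running tests, the Python below reads two standard Linux sysfs locations. The helper names `count_numa_nodes` and `active_thp_mode` are hypothetical, and the sketch assumes the firmware setting shows up as a different number of NUMA nodes exposed by the kernel, which the result file itself does not state.

```python
from pathlib import Path
import re


def count_numa_nodes(sysfs_root="/sys"):
    """Count nodeN directories under <sysfs_root>/devices/system/node.

    Hypothetical helper: returns 0 when the directory does not exist.
    """
    node_dir = Path(sysfs_root) / "devices" / "system" / "node"
    if not node_dir.is_dir():
        return 0
    # Only entries named node<digits> are NUMA nodes; other files such as
    # "possible" or "online" live in the same directory and are skipped.
    return sum(1 for entry in node_dir.iterdir()
               if re.fullmatch(r"node\d+", entry.name))


def active_thp_mode(sysfs_root="/sys"):
    """Return the bracketed value from transparent_hugepage/enabled.

    The kernel marks the active mode with square brackets, for example
    "always [madvise] never". Returns None if the file is missing.
    """
    path = Path(sysfs_root) / "kernel" / "mm" / "transparent_hugepage" / "enabled"
    if not path.is_file():
        return None
    match = re.search(r"\[(\w+)\]", path.read_text())
    return match.group(1) if match else None
```

Calling `count_numa_nodes()` and `active_thp_mode()` with no arguments reads the live system; passing a different root makes the helpers easy to exercise against a fixture directory.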
AMD Threadripper 7995WX NPS / SNC2 SNC4 Benchmarks askap: Hogbom Clean OpenMP askap: tConvolve OpenMP - Gridding openvino: Vehicle Detection FP16 - CPU askap: tConvolve OpenMP - Degridding openvino: Road Segmentation ADAS FP16 - CPU openvino: Person Detection FP16 - CPU openvino: Person Detection FP32 - CPU openvino: Face Detection FP16 - CPU openvino: Road Segmentation ADAS FP16-INT8 - CPU openvino: Face Detection FP16-INT8 - CPU openvino: Vehicle Detection FP16-INT8 - CPU openvino: Machine Translation EN To DE FP16 - CPU openvino: Face Detection Retail FP16 - CPU openvino: Age Gender Recognition Retail 0013 FP16-INT8 - CPU openvino: Person Vehicle Bike Detection FP16 - CPU lulesh: build-linux-kernel: allmodconfig openvino: Age Gender Recognition Retail 0013 FP16 - CPU openvino: Age Gender Recognition Retail 0013 FP16-INT8 - CPU graph500: 26 graph500: 26 openvino: Vehicle Detection FP16 - CPU openvino: Age Gender Recognition Retail 0013 FP16 - CPU build-linux-kernel: defconfig openvino: Road Segmentation ADAS FP16 - CPU cloverleaf: clover_bm16 openvino: Person Vehicle Bike Detection FP16 - CPU build-llvm: Ninja askap: tConvolve MT - Gridding graph500: 26 openvkl: vklBenchmarkCPU ISPC askap: tConvolve MT - Degridding openvino: Handwritten English Recognition FP16 - CPU openvino: Handwritten English Recognition FP16 - CPU asmfish: 1024 Hash Memory, 26 Depth build-nodejs: Time To Compile openvino: Face Detection Retail FP16 - CPU build-gem5: Time To Compile openfoam: drivaerFastback, Medium Mesh Size - Execution Time graph500: 26 npb: CG.C uvg266: Bosphorus 4K - Ultra Fast john-the-ripper: MD5 vvenc: Bosphorus 4K - Faster openvino: Person Detection FP16 - CPU openvino: Machine Translation EN To DE FP16 - CPU uvg266: Bosphorus 4K - Super Fast incompact3d: input.i3d 129 Cells Per Direction uvg266: Bosphorus 4K - Very Fast openvino: Person Detection FP32 - CPU cloverleaf: clover_bm embree: Pathtracer ISPC - Asian Dragon Obj tensorflow: CPU - 64 - ResNet-50 
john-the-ripper: Blowfish cloverleaf: clover_bm64_short john-the-ripper: bcrypt amg: vvenc: Bosphorus 4K - Fast tensorflow: CPU - 32 - ResNet-50 specfem3d: Tomographic Model qe: AUSURF112 npb: FT.C tensorflow: CPU - 512 - ResNet-50 tensorflow: CPU - 256 - ResNet-50 rodinia: OpenMP Leukocyte pytorch: CPU - 1 - Efficientnet_v2_l luxcorerender: Orange Juice - CPU embree: Pathtracer ISPC - Asian Dragon pytorch: CPU - 32 - ResNet-152 pytorch: CPU - 64 - ResNet-152 tensorflow: CPU - 16 - ResNet-50 pytorch: CPU - 16 - ResNet-152 rodinia: OpenMP Streamcluster npb: SP.B radiance: SMP Parallel luxcorerender: LuxCore Benchmark - CPU rodinia: OpenMP HotSpot3D specfem3d: Layered Halfspace openfoam: drivaerFastback, Medium Mesh Size - Mesh Time pytorch: CPU - 16 - ResNet-50 vvenc: Bosphorus 1080p - Fast openvino: Face Detection FP16 - CPU npb: MG.C memcached: 1:100 pytorch: CPU - 64 - ResNet-50 pytorch: CPU - 32 - ResNet-50 npb: IS.D john-the-ripper: WPA PSK luxcorerender: Danish Mood - CPU pytorch: CPU - 1 - ResNet-152 john-the-ripper: HMAC-SHA512 vvenc: Bosphorus 1080p - Faster incompact3d: input.i3d 193 Cells Per Direction build-llvm: Unix Makefiles specfem3d: Homogeneous Halfspace specfem3d: Mount St. 
Helens openvino: Vehicle Detection FP16-INT8 - CPU openvino: Face Detection Retail FP16-INT8 - CPU gpaw: Carbon Nanotube openvino: Face Detection Retail FP16-INT8 - CPU uvg266: Bosphorus 1080p - Super Fast uvg266: Bosphorus 1080p - Very Fast specfem3d: Water-layered Halfspace pytorch: CPU - 1 - ResNet-50 compress-7zip: Decompression Rating petsc: Streams uvg266: Bosphorus 1080p - Ultra Fast qmcpack: Li2_STO_ae uvg266: Bosphorus 4K - Slow compress-7zip: Compression Rating quantlib: Multi-Threaded npb: SP.C libxsmm: 128 npb: LU.C pgbench: 1000 - 1000 - Read Only pgbench: 1000 - 1000 - Read Only - Average Latency embree: Pathtracer ISPC - Crown numpy: memcached: 1:10 build-python: Released Build, PGO + LTO Optimized luxcorerender: DLSC - CPU blender: Fishy Cat - CPU-Only uvg266: Bosphorus 4K - Medium liquid-dsp: 192 - 256 - 512 openvino: Weld Porosity Detection FP16 - CPU libxsmm: 256 liquid-dsp: 32 - 256 - 57 ospray-studio: 2 - 4K - 32 - Path Tracer - CPU openvino: Weld Porosity Detection FP16 - CPU pgbench: 100 - 1000 - Read Only ospray-studio: 3 - 4K - 32 - Path Tracer - CPU liquid-dsp: 64 - 256 - 512 openradioss: Chrysler Neon 1M namd: ATPase Simulation - 327,506 Atoms libxsmm: 32 uvg266: Bosphorus 1080p - Slow openvino: Weld Porosity Detection FP16-INT8 - CPU pgbench: 100 - 1000 - Read Only - Average Latency openssl: SHA512 ospray-studio: 3 - 4K - 1 - Path Tracer - CPU openvino: Weld Porosity Detection FP16-INT8 - CPU liquid-dsp: 128 - 256 - 512 ospray-studio: 1 - 4K - 32 - Path Tracer - CPU memcached: 1:5 ospray-studio: 1 - 4K - 1 - Path Tracer - CPU openvino: Face Detection FP16-INT8 - CPU ospray-studio: 3 - 4K - 16 - Path Tracer - CPU openvino: Road Segmentation ADAS FP16-INT8 - CPU ospray-studio: 2 - 4K - 16 - Path Tracer - CPU blender: Classroom - CPU-Only uvg266: Bosphorus 1080p - Medium libxsmm: 64 npb: BT.C openssl: AES-128-GCM openssl: RSA4096 mt-dgemm: Sustained Floating-Point Rate liquid-dsp: 128 - 256 - 57 liquid-dsp: 64 - 256 - 32 blender: BMW27 - 
CPU-Only ospray-studio: 1 - 4K - 16 - Path Tracer - CPU blender: Pabellon Barcelona - CPU-Only liquid-dsp: 192 - 256 - 57 liquid-dsp: 32 - 256 - 32 openvino: Handwritten English Recognition FP16-INT8 - CPU ospray-studio: 2 - 4K - 1 - Path Tracer - CPU liquid-dsp: 128 - 256 - 32 openvino: Handwritten English Recognition FP16-INT8 - CPU liquid-dsp: 192 - 256 - 32 openssl: ChaCha20-Poly1305 rodinia: OpenMP LavaMD openssl: AES-256-GCM liquid-dsp: 64 - 256 - 57 liquid-dsp: 32 - 256 - 512 openssl: ChaCha20 rodinia: OpenMP CFD Solver openssl: RSA4096 blender: Barbershop - CPU-Only openssl: SHA256 pytorch: CPU - 64 - Efficientnet_v2_l pytorch: CPU - 32 - Efficientnet_v2_l pytorch: CPU - 16 - Efficientnet_v2_l pgbench: 1000 - 1000 - Read Write - Average Latency pgbench: 1000 - 1000 - Read Write pgbench: 100 - 1000 - Read Write - Average Latency pgbench: 100 - 1000 - Read Write gromacs: MPI CPU - water_GMX50_bare askap: tConvolve MPI - Gridding askap: tConvolve MPI - Degridding clickhouse: 100M Rows Hits Dataset, Third Run clickhouse: 100M Rows Hits Dataset, Second Run clickhouse: 100M Rows Hits Dataset, First Run / Cold Cache stockfish: Total Time easywave: e2Asean Grid + BengkuluSept2007 Source - 2400 easywave: e2Asean Grid + BengkuluSept2007 Source - 1200 luxcorerender: Rainbow Colors and Prism - CPU npb: EP.C Default - Disabled SNC2 SNC4 1127.85 19018.3 17.55 20481.2 36.57 141.78 141.54 965.67 25.26 493.57 8.07 93.23 3.81 0.62 8.49 23294.844 264.077 86958.12 113350.17 756165000 727662000 2730.60 0.86 31.454 1310.86 329.61 5643.89 121.868 8655.84 462057000 2153 11871.2 2592.59 37.01 248215163 111.370 12547.91 150.970 331.85586 357127000 52079.13 69.02 14600667 15.245 338.09 514.22 67.47 2.61738705 66.32 338.74 10.96 112.8287 90.46 173145 39.32 174335 1662894000 8.715 70.35 7.987430452 326.06 100728.49 135.02 118.86 29.386 11.16 22.24 129.1719 16.16 15.91 51.77 16.06 4.703 145525.94 119.192 12.39 58.866 18.741871937 138.64581 38.68 24.298 49.41 95501.50 7743874.04 39.06 
39.21 4279.98 614263 10.65 18.99 296518667 41.397 10.6618586 175.468 9.942989762 7.724890032 5936.37 5.38 38.128 17809.43 211.36 208.36 19.268505905 47.70 655203 183161.1076 207.97 103.90 29.92 546936 310771.1 89217.91 2043.5 255135.24 1986807 0.504 108.0805 746.05 5811757.93 188.351 15.23 19.47 33.23 1514200000 19.33 2564.6 1730566667 38264 4962.13 3792698 43999 938493333 157.35 0.25803 555.6 89.19 9.72 0.264 42723911653 1252 9866.45 1314200000 37890 3359726.87 1064 96.95 20017 1898.14 17193 38.02 98.33 1055.4 214445.97 941442701447 1533067.1 43.879451 4495533333 2646100000 15.25 16939 46.77 5407200000 1435333333 45.25 1076 4228133333 2120.23 5526733333 361852375860 26.953 815175705397 3020900000 522560000 511694244653 5.577 49897.3 136.29 131629499893 6.36 6.35 6.34 60.659 16623 55.396 18068 10.380 43532.9 40198.3 504.59 490.52 457.06 287331097 58.646 23.986 32.75 8869.84 564.984 12083.7 6.18 14122.5 14.51 67.22 67.17 468.35 12.52 245.02 4.14 49.80 2.09 0.36 5.06 33106.244 174.354 134566.00 166444.72 1069560000 1015300000 3875.39 0.61 24.579 1652.91 355.04 4737.07 104.601 8223.25 522187000 2089 11209.2 2435.28 39.36 272254945 102.322 11414.89 140.813 310.65963 389235000 54560.03 65.34 13888800 14.306 356.72 481.41 64.65 2.49659332 62.99 357.03 11.82 111.3601 85.54 169287 41.21 171734 1732149333 8.278 66.13 7.544143736 316.50 97435.24 130.65 115.90 30.894 10.65 22.77 128.0099 15.36 16.03 49.51 15.30 4.884 151244.49 114.284 11.94 59.859 18.359442232 135.10014 40.09 23.714 51.06 97068.01 7704668.27 40.27 40.41 4282.36 611669 10.85 18.44 289221400 40.222 10.4893306 170.828 9.742806803 7.756233737 5783.01 5.52 37.803 17366.29 206.23 203.48 19.215607236 48.83 640928 185070.8277 204.49 106.01 29.47 535713 313010.5 88668.13 2026.2 256378.22 1950507 0.513 106.2642 758.02 5818020.66 188.073 15.19 19.76 33.12 1499133333 19.08 2583.6 1744433333 38015 5023.42 3746006 44300 935170000 155.50 0.25612 553.3 88.68 9.63 0.267 43151618910 1259 9948.30 1302833333 38009 3359880.55 
1074 97.76 20144 1914.89 17113 38.33 98.22 1052.3 215684.06 943365732580 1529451.7 43.766920 4493966667 2632233333 15.32 16999 46.97 5393633333 1436433333 45.22 1080 4233566667 2121.18 5519933333 361987547683 26.905 815393358357 3016100000 523603333 511318347957 5.581 49862.1 136.20 131627463983 5.38 5.31 5.36 63.438 15769 72.148 14096 10.450 43138.9 40552.4 444.12 438.54 417.30 295780798 59.249 25.480 31.82 9712.81 325.394 6602.28 6.20 7607.31 14.56 65.32 65.40 467.39 12.54 244.80 4.13 50.56 2.09 0.36 5.07 38146.304 168.153 134176.02 164363.19 1104900000 1056980000 3863.88 0.61 24.087 1647.09 396.53 4721.81 104.693 7467.14 527864000 1905 10522.5 2333.46 41.08 275458088 100.616 11421.25 137.479 302.69164 390682000 56924.56 63.23 13419000 14.014 367.12 474.11 62.25 2.41614302 61.26 366.59 10.95 104.8031 84.50 161790 42.00 163259 1773758333 8.186 66.88 7.519415000 307.06 94921.27 127.24 112.08 31.123 10.58 21.60 122.5871 15.61 15.24 49.27 15.39 4.933 151839.30 117.44 11.89 61.234 18.044192295 133.7447 39.80 23.452 51.19 98739.50 7507583.68 39.07 39.29 4155.68 596180 10.97 18.62 287955500 40.203 10.3613243 170.540 9.668759733 7.545176178 5797.28 5.51 37.163 17402.76 206.92 207.62 18.821734642 48.36 649120 187197.6338 208.97 103.77 29.30 536820 317145.3 87467.23 2004.7 259883.62 1956509 0.511 106.4294 753.64 5728479.09 185.505 15.00 19.71 32.75 1493000000 19.06 2599.9 1721266667 38523 5026.69 3759094 44535 927276667 156.21 0.25502 559.8 88.18 9.61 0.266 43206641983 1266 9973.18 1300233333 38284 3327393.35 1073 97.85 20199 1911.68 17262 38.24 97.60 1059.1 215764.82 946933378620 1538165.3 43.630978 4518133333 2643433333 15.31 17014 46.88 5416600000 1440933333 45.08 1079 4243600000 2127.42 5535500000 362765500023 26.971 817131387107 3023166667 522996667 512318580417 5.585 49924.9 136.28 131706698427 3.68 4.07 4.15 65.328 15314 72.650 14067 10.662 35893.7 34358.5 400.96 397.01 388.42 300267148 75.172 28.739 34.72 10411.60 OpenBenchmarking.org
ASKAP 1.0, Test: Hogbom Clean OpenMP (Iterations Per Second, More Is Better)
  Default - Disabled: 1127.85 (SE +/- 4.25, N = 3)
  SNC2: 564.98 (SE +/- 1.84, N = 3)
  SNC4: 325.39 (SE +/- 1.54, N = 3)
  1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp
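Because some results are "More Is Better" and others "Fewer Is Better", comparing configurations is easier once each value is expressed relative to the Default - Disabled run. The following is a minimal sketch of that arithmetic using the Hogbom Clean OpenMP figures above; the function name `relative_performance` is an illustrative choice and not part of any benchmarking tool.

```python
def relative_performance(baseline, candidate, more_is_better=True):
    """Express candidate as a multiple of baseline performance.

    A return value above 1.0 means the candidate performed better than the
    baseline, regardless of whether the metric is higher-is-better
    (throughput) or lower-is-better (latency, seconds).
    """
    if baseline <= 0 or candidate <= 0:
        raise ValueError("results must be positive")
    if more_is_better:
        return candidate / baseline
    return baseline / candidate


# Hogbom Clean OpenMP, Iterations Per Second (More Is Better).
hogbom = {"Default - Disabled": 1127.85, "SNC2": 564.98, "SNC4": 325.39}
hogbom_relative = {
    name: relative_performance(hogbom["Default - Disabled"], value)
    for name, value in hogbom.items()
}
```

For a "Fewer Is Better" metric, pass `more_is_better=False` so that a shorter time still maps to a ratio above 1.0.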
ASKAP 1.0, Test: tConvolve OpenMP - Gridding (Million Grid Points Per Second, More Is Better)
  Default - Disabled: 19018.30 (SE +/- 0.00, N = 3)
  SNC2: 12083.70 (SE +/- 122.89, N = 15)
  SNC4: 6602.28 (SE +/- 54.12, N = 3)
  1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp
OpenVINO 2023.2.dev, Model: Vehicle Detection FP16 - Device: CPU (ms, Fewer Is Better)
  SNC2: 6.18 (SE +/- 0.01, N = 3; MIN: 5.27 / MAX: 16.2)
  SNC4: 6.20 (SE +/- 0.01, N = 3; MIN: 5.02 / MAX: 17.36)
  Default - Disabled: 17.55 (SE +/- 0.06, N = 3; MIN: 5.74 / MAX: 43.74)
  1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
ASKAP 1.0, Test: tConvolve OpenMP - Degridding (Million Grid Points Per Second, More Is Better)
  Default - Disabled: 20481.20
  SNC2: 14122.50
  SNC4: 7607.31
  SE as listed: SE +/- 100.74, N = 15; SE +/- 0.00, N = 3
  1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp
OpenVINO 2023.2.dev, Model: Road Segmentation ADAS FP16 - Device: CPU (ms, Fewer Is Better)
  SNC2: 14.51 (SE +/- 0.04, N = 3; MIN: 12.17 / MAX: 35.53)
  SNC4: 14.56 (SE +/- 0.01, N = 3; MIN: 11.83 / MAX: 32.54)
  Default - Disabled: 36.57 (SE +/- 0.36, N = 3; MIN: 15.82 / MAX: 76.01)
  1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
OpenVINO 2023.2.dev, Model: Person Detection FP16 - Device: CPU (ms, Fewer Is Better)
  SNC4: 65.32 (SE +/- 0.13, N = 3; MIN: 36.8 / MAX: 87.75)
  SNC2: 67.22 (SE +/- 0.29, N = 3; MIN: 37.86 / MAX: 90.7)
  Default - Disabled: 141.78 (SE +/- 0.52, N = 3; MIN: 54.39 / MAX: 210.64)
  1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
OpenVINO 2023.2.dev, Model: Person Detection FP32 - Device: CPU (ms, Fewer Is Better)
  SNC4: 65.40 (SE +/- 0.17, N = 3; MIN: 44.03 / MAX: 91.21)
  SNC2: 67.17 (SE +/- 0.43, N = 3; MIN: 38.08 / MAX: 93.26)
  Default - Disabled: 141.54 (SE +/- 0.37, N = 3; MIN: 50.69 / MAX: 212.29)
  1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
OpenVINO 2023.2.dev, Model: Face Detection FP16 - Device: CPU (ms, Fewer Is Better)
  SNC4: 467.39 (SE +/- 1.50, N = 3; MIN: 391.77 / MAX: 504.29)
  SNC2: 468.35 (SE +/- 0.78, N = 3; MIN: 396.69 / MAX: 533.02)
  Default - Disabled: 965.67 (SE +/- 0.53, N = 3; MIN: 767.19 / MAX: 1026.9)
  1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
OpenVINO 2023.2.dev, Model: Road Segmentation ADAS FP16-INT8 - Device: CPU (ms, Fewer Is Better)
  SNC2: 12.52 (SE +/- 0.05, N = 3; MIN: 10.66 / MAX: 27.97)
  SNC4: 12.54 (SE +/- 0.05, N = 3; MIN: 10.7 / MAX: 28.81)
  Default - Disabled: 25.26 (SE +/- 0.05, N = 3; MIN: 12.5 / MAX: 49.19)
  1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
OpenVINO 2023.2.dev, Model: Face Detection FP16-INT8 - Device: CPU (ms, Fewer Is Better)
  SNC4: 244.80 (SE +/- 0.34, N = 3; MIN: 212 / MAX: 263.67)
  SNC2: 245.02 (SE +/- 0.27, N = 3; MIN: 201.96 / MAX: 285.01)
  Default - Disabled: 493.57 (SE +/- 0.76, N = 3; MIN: 246.03 / MAX: 522.95)
  1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
OpenVINO 2023.2.dev, Model: Vehicle Detection FP16-INT8 - Device: CPU (ms, Fewer Is Better)
  SNC4: 4.13 (SE +/- 0.01, N = 3; MIN: 3.67 / MAX: 12.51)
  SNC2: 4.14 (SE +/- 0.01, N = 3; MIN: 3.73 / MAX: 12.01)
  Default - Disabled: 8.07 (SE +/- 0.02, N = 3; MIN: 4.25 / MAX: 26.67)
  1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
OpenVINO 2023.2.dev, Model: Machine Translation EN To DE FP16 - Device: CPU (ms, Fewer Is Better)
  SNC2: 49.80 (SE +/- 0.22, N = 3; MIN: 38.67 / MAX: 117.73)
  SNC4: 50.56 (SE +/- 0.15, N = 3; MIN: 38.52 / MAX: 103.27)
  Default - Disabled: 93.23 (SE +/- 0.07, N = 3; MIN: 42.51 / MAX: 145.87)
  1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
OpenVINO 2023.2.dev, Model: Face Detection Retail FP16 - Device: CPU (ms, Fewer Is Better)
  SNC2: 2.09 (SE +/- 0.01, N = 3; MIN: 1.85 / MAX: 7.8)
  SNC4: 2.09 (SE +/- 0.01, N = 3; MIN: 1.87 / MAX: 8.99)
  Default - Disabled: 3.81 (SE +/- 0.02, N = 3; MIN: 2.1 / MAX: 21.93)
  1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
OpenVINO 2023.2.dev, Model: Age Gender Recognition Retail 0013 FP16-INT8 - Device: CPU (ms, Fewer Is Better)
  SNC2: 0.36 (SE +/- 0.00, N = 3; MIN: 0.27 / MAX: 31.55)
  SNC4: 0.36 (SE +/- 0.00, N = 3; MIN: 0.27 / MAX: 41.09)
  Default - Disabled: 0.62 (SE +/- 0.00, N = 3; MIN: 0.21 / MAX: 39.81)
  1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
OpenVINO 2023.2.dev, Model: Person Vehicle Bike Detection FP16 - Device: CPU (ms, Fewer Is Better)
  SNC2: 5.06 (SE +/- 0.01, N = 3; MIN: 4.47 / MAX: 12.75)
  SNC4: 5.07 (SE +/- 0.01, N = 3; MIN: 4.35 / MAX: 13.85)
  Default - Disabled: 8.49 (SE +/- 0.02, N = 3; MIN: 5.66 / MAX: 25.99)
  1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
LULESH 2.0.3 (z/s, More Is Better)
  SNC4: 38146.30 (SE +/- 142.33, N = 3)
  SNC2: 33106.24 (SE +/- 116.87, N = 3)
  Default - Disabled: 23294.84 (SE +/- 398.51, N = 12)
  1. (CXX) g++ options: -O3 -fopenmp -lm -lmpi_cxx -lmpi
Timed Linux Kernel Compilation 6.1, Build: allmodconfig (Seconds, Fewer Is Better)
  SNC4: 168.15 (SE +/- 0.44, N = 3)
  SNC2: 174.35 (SE +/- 0.85, N = 3)
  Default - Disabled: 264.08 (SE +/- 1.24, N = 3)
OpenVINO 2023.2.dev, Model: Age Gender Recognition Retail 0013 FP16 - Device: CPU (FPS, More Is Better)
  SNC2: 134566.00 (SE +/- 365.77, N = 3)
  SNC4: 134176.02 (SE +/- 140.85, N = 3)
  Default - Disabled: 86958.12 (SE +/- 195.22, N = 3)
  1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
OpenVINO 2023.2.dev, Model: Age Gender Recognition Retail 0013 FP16-INT8 - Device: CPU (FPS, More Is Better)
  SNC2: 166444.72 (SE +/- 805.14, N = 3)
  SNC4: 164363.19 (SE +/- 183.36, N = 3)
  Default - Disabled: 113350.17 (SE +/- 251.60, N = 3)
  1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
Graph500 3.0, Scale: 26 (bfs max_TEPS, More Is Better)
  SNC4: 1104900000
  SNC2: 1069560000
  Default - Disabled: 756165000
  1. (CC) gcc options: -fcommon -O3 -lpthread -lm -lmpi
Graph500 3.0, Scale: 26 (bfs median_TEPS, More Is Better)
  SNC4: 1056980000
  SNC2: 1015300000
  Default - Disabled: 727662000
  1. (CC) gcc options: -fcommon -O3 -lpthread -lm -lmpi
OpenVINO 2023.2.dev, Model: Vehicle Detection FP16 - Device: CPU (FPS, More Is Better)
  SNC2: 3875.39 (SE +/- 8.11, N = 3)
  SNC4: 3863.88 (SE +/- 8.94, N = 3)
  Default - Disabled: 2730.60 (SE +/- 9.07, N = 3)
  1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
OpenVINO 2023.2.dev, Model: Age Gender Recognition Retail 0013 FP16 - Device: CPU (ms, Fewer Is Better)
  SNC2: 0.61 (SE +/- 0.00, N = 3; MIN: 0.44 / MAX: 16.14)
  SNC4: 0.61 (SE +/- 0.00, N = 3; MIN: 0.46 / MAX: 16.98)
  Default - Disabled: 0.86 (SE +/- 0.00, N = 3; MIN: 0.28 / MAX: 17.81)
  1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
Timed Linux Kernel Compilation 6.1, Build: defconfig (Seconds, Fewer Is Better)
  SNC4: 24.09 (SE +/- 0.29, N = 4)
  SNC2: 24.58 (SE +/- 0.26, N = 5)
  Default - Disabled: 31.45 (SE +/- 0.34, N = 4)
OpenVINO 2023.2.dev, Model: Road Segmentation ADAS FP16 - Device: CPU (FPS, More Is Better)
  SNC2: 1652.91 (SE +/- 4.22, N = 3)
  SNC4: 1647.09 (SE +/- 0.95, N = 3)
  Default - Disabled: 1310.86 (SE +/- 12.81, N = 3)
  1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
CloverLeaf 1.3, Input: clover_bm16 (Seconds, Fewer Is Better)
  Default - Disabled: 329.61 (SE +/- 0.23, N = 3)
  SNC2: 355.04 (SE +/- 1.55, N = 3)
  SNC4: 396.53 (SE +/- 1.41, N = 3)
  1. (F9X) gfortran options: -O3 -march=native -funroll-loops -fopenmp
OpenVINO 2023.2.dev, Model: Person Vehicle Bike Detection FP16 - Device: CPU (FPS, More Is Better)
  Default - Disabled: 5643.89 (SE +/- 12.20, N = 3)
  SNC2: 4737.07 (SE +/- 6.31, N = 3)
  SNC4: 4721.81 (SE +/- 10.32, N = 3)
  1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
Timed LLVM Compilation 16.0, Build System: Ninja (Seconds, Fewer Is Better)
  SNC2: 104.60 (SE +/- 0.30, N = 3)
  SNC4: 104.69 (SE +/- 0.52, N = 3)
  Default - Disabled: 121.87 (SE +/- 0.08, N = 3)
ASKAP 1.0, Test: tConvolve MT - Gridding (Million Grid Points Per Second, More Is Better)
  Default - Disabled: 8655.84 (SE +/- 13.20, N = 3)
  SNC2: 8223.25 (SE +/- 5.09, N = 3)
  SNC4: 7467.14 (SE +/- 78.17, N = 3)
  1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp
Graph500 3.0, Scale: 26 (sssp max_TEPS, More Is Better)
  SNC4: 527864000
  SNC2: 522187000
  Default - Disabled: 462057000
  1. (CC) gcc options: -fcommon -O3 -lpthread -lm -lmpi
OpenVKL 2.0.0, Benchmark: vklBenchmarkCPU ISPC (Items / Sec, More Is Better)
  Default - Disabled: 2153 (SE +/- 0.88, N = 3; MIN: 179 / MAX: 27831)
  SNC2: 2089 (SE +/- 2.73, N = 3; MIN: 178 / MAX: 27767)
  SNC4: 1905 (SE +/- 5.29, N = 3; MIN: 180 / MAX: 27886)
ASKAP 1.0, Test: tConvolve MT - Degridding (Million Grid Points Per Second, More Is Better)
  Default - Disabled: 11871.2 (SE +/- 75.70, N = 3)
  SNC2: 11209.2 (SE +/- 13.79, N = 3)
  SNC4: 10522.5 (SE +/- 138.85, N = 3)
  1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp
OpenVINO 2023.2.dev, Model: Handwritten English Recognition FP16 - Device: CPU (FPS, More Is Better)
  Default - Disabled: 2592.59 (SE +/- 19.25, N = 3)
  SNC2: 2435.28 (SE +/- 24.21, N = 3)
  SNC4: 2333.46 (SE +/- 9.95, N = 3)
  1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
OpenVINO 2023.2.dev, Model: Handwritten English Recognition FP16 - Device: CPU (ms, Fewer Is Better)
  Default - Disabled: 37.01 (SE +/- 0.27, N = 3; MIN: 20.68 / MAX: 66.67)
  SNC2: 39.36 (SE +/- 0.40, N = 3; MIN: 32.78 / MAX: 55.31)
  SNC4: 41.08 (SE +/- 0.17, N = 3; MIN: 31.72 / MAX: 55.31)
  1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
asmFish 2018-07-23, 1024 Hash Memory, 26 Depth (Nodes/second, More Is Better)
  SNC4: 275458088 (SE +/- 1785028.61, N = 3)
  SNC2: 272254945 (SE +/- 2257424.10, N = 3)
  Default - Disabled: 248215163 (SE +/- 1898256.07, N = 3)
Timed Node.js Compilation 19.8.1, Time To Compile (Seconds, Fewer Is Better)
  SNC4: 100.62 (SE +/- 0.39, N = 3)
  SNC2: 102.32 (SE +/- 0.40, N = 3)
  Default - Disabled: 111.37 (SE +/- 1.21, N = 5)
OpenVINO 2023.2.dev, Model: Face Detection Retail FP16 - Device: CPU (FPS, More Is Better)
  Default - Disabled: 12547.91 (SE +/- 73.24, N = 3)
  SNC4: 11421.25 (SE +/- 61.71, N = 3)
  SNC2: 11414.89 (SE +/- 56.50, N = 3)
  1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
Timed Gem5 Compilation 23.0.1, Time To Compile (Seconds, Fewer Is Better)
  SNC4: 137.48 (SE +/- 1.46, N = 5)
  SNC2: 140.81 (SE +/- 1.48, N = 3)
  Default - Disabled: 150.97 (SE +/- 1.82, N = 3)
OpenFOAM 10, Input: drivaerFastback, Medium Mesh Size - Execution Time (Seconds, Fewer Is Better)
  SNC4: 302.69
  SNC2: 310.66
  Default - Disabled: 331.86
  1. (CXX) g++ options: -std=c++14 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -lfiniteVolume -lmeshTools -lparallel -llagrangian -lregionModels -lgenericPatchFields -lOpenFOAM -ldl -lm
Graph500 3.0, Scale: 26 (sssp median_TEPS, More Is Better)
  SNC4: 390682000
  SNC2: 389235000
  Default - Disabled: 357127000
  1. (CC) gcc options: -fcommon -O3 -lpthread -lm -lmpi
NAS Parallel Benchmarks 3.4, Test / Class: CG.C (Total Mop/s, More Is Better)
  SNC4: 56924.56 (SE +/- 191.96, N = 3)
  SNC2: 54560.03 (SE +/- 324.22, N = 3)
  Default - Disabled: 52079.13 (SE +/- 476.77, N = 3)
  1. (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
  2. Open MPI 4.1.5
uvg266 0.4.1, Video Input: Bosphorus 4K - Video Preset: Ultra Fast (Frames Per Second, More Is Better)
  Default - Disabled: 69.02 (SE +/- 0.15, N = 3)
  SNC2: 65.34 (SE +/- 0.36, N = 3)
  SNC4: 63.23 (SE +/- 0.25, N = 3)
John The Ripper 2023.03.14, Test: MD5 (Real C/S, More Is Better)
  Default - Disabled: 14600667 (SE +/- 43978.53, N = 3)
  SNC2: 13888800 (SE +/- 193319.26, N = 15)
  SNC4: 13419000 (SE +/- 174193.35, N = 15)
  1. (CC) gcc options: -m64 -lssl -lcrypto -fopenmp -lm -lrt -lz -ldl -lcrypt -lbz2
VVenC 1.9, Video Input: Bosphorus 4K - Video Preset: Faster (Frames Per Second, More Is Better)
  Default - Disabled: 15.25 (SE +/- 0.02, N = 3)
  SNC2: 14.31 (SE +/- 0.12, N = 3)
  SNC4: 14.01 (SE +/- 0.09, N = 3)
  1. (CXX) g++ options: -O3 -flto=auto -fno-fat-lto-objects
OpenVINO 2023.2.dev, Model: Person Detection FP16 - Device: CPU (FPS, More Is Better)
  SNC4: 367.12 (SE +/- 0.76, N = 3)
  SNC2: 356.72 (SE +/- 1.55, N = 3)
  Default - Disabled: 338.09 (SE +/- 1.27, N = 3)
  1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
OpenVINO 2023.2.dev, Model: Machine Translation EN To DE FP16 - Device: CPU (FPS, More Is Better)
  Default - Disabled: 514.22 (SE +/- 0.33, N = 3)
  SNC2: 481.41 (SE +/- 2.07, N = 3)
  SNC4: 474.11 (SE +/- 1.37, N = 3)
  1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
uvg266 0.4.1, Video Input: Bosphorus 4K - Video Preset: Super Fast (Frames Per Second, More Is Better)
  Default - Disabled: 67.47 (SE +/- 0.24, N = 3)
  SNC2: 64.65 (SE +/- 0.12, N = 3)
  SNC4: 62.25 (SE +/- 0.42, N = 3)
Xcompact3d Incompact3d 2021-03-11, Input: input.i3d 129 Cells Per Direction (Seconds, Fewer Is Better)
  SNC4: 2.41614302 (SE +/- 0.00587555, N = 3)
  SNC2: 2.49659332 (SE +/- 0.01815766, N = 3)
  Default - Disabled: 2.61738705 (SE +/- 0.03355279, N = 3)
  1. (F9X) gfortran options: -cpp -O2 -funroll-loops -floop-optimize -fcray-pointer -fbacktrace -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
uvg266 0.4.1, Video Input: Bosphorus 4K - Video Preset: Very Fast (Frames Per Second, More Is Better). Default - Disabled: 66.32 (SE +/- 0.38, N = 3); SNC2: 62.99 (SE +/- 0.60, N = 3); SNC4: 61.26 (SE +/- 0.22, N = 3).
OpenVINO 2023.2.dev, Model: Person Detection FP32 - Device: CPU (FPS, More Is Better). SNC4: 366.59 (SE +/- 0.96, N = 3); SNC2: 357.03 (SE +/- 2.29, N = 3); Default - Disabled: 338.74 (SE +/- 0.89, N = 3). 1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
CloverLeaf 1.3, Input: clover_bm (Seconds, Fewer Is Better). SNC4: 10.95 (SE +/- 0.07, N = 3); Default - Disabled: 10.96 (SE +/- 0.07, N = 14); SNC2: 11.82 (SE +/- 0.09, N = 15). 1. (F9X) gfortran options: -O3 -march=native -funroll-loops -fopenmp
Embree 4.3, Binary: Pathtracer ISPC - Model: Asian Dragon Obj (Frames Per Second, More Is Better). Default - Disabled: 112.83 (SE +/- 0.59, N = 3, MIN: 110.06 / MAX: 116.96); SNC2: 111.36 (SE +/- 0.58, N = 3, MIN: 108.45 / MAX: 114.81); SNC4: 104.80 (SE +/- 0.54, N = 3, MIN: 100.15 / MAX: 112.2).
TensorFlow 2.12, Device: CPU - Batch Size: 64 - Model: ResNet-50 (images/sec, More Is Better). Default - Disabled: 90.46 (SE +/- 0.05, N = 3); SNC2: 85.54 (SE +/- 0.22, N = 3); SNC4: 84.50 (SE +/- 0.09, N = 3).
John The Ripper 2023.03.14, Test: Blowfish (Real C/S, More Is Better). Default - Disabled: 173145 (SE +/- 66.40, N = 3); SNC2: 169287 (SE +/- 1828.86, N = 12); SNC4: 161790 (SE +/- 1944.34, N = 15). 1. (CC) gcc options: -m64 -lssl -lcrypto -fopenmp -lm -lrt -lz -ldl -lcrypt -lbz2
CloverLeaf 1.3, Input: clover_bm64_short (Seconds, Fewer Is Better). Default - Disabled: 39.32 (SE +/- 0.01, N = 3); SNC2: 41.21 (SE +/- 0.26, N = 3); SNC4: 42.00 (SE +/- 0.08, N = 3). 1. (F9X) gfortran options: -O3 -march=native -funroll-loops -fopenmp
John The Ripper 2023.03.14, Test: bcrypt (Real C/S, More Is Better). Default - Disabled: 174335 (SE +/- 1219.45, N = 3); SNC2: 171734 (SE +/- 2018.22, N = 4); SNC4: 163259 (SE +/- 1540.98, N = 15). 1. (CC) gcc options: -m64 -lssl -lcrypto -fopenmp -lm -lrt -lz -ldl -lcrypt -lbz2
Algebraic Multi-Grid Benchmark 1.2 (Figure Of Merit, More Is Better). SNC4: 1773758333 (SE +/- 14033805.07, N = 3); SNC2: 1732149333 (SE +/- 5512706.85, N = 3); Default - Disabled: 1662894000 (SE +/- 1222576.38, N = 3). 1. (CC) gcc options: -lparcsr_ls -lparcsr_mv -lseq_mv -lIJ_mv -lkrylov -lHYPRE_utilities -lm -fopenmp -lmpi
VVenC 1.9, Video Input: Bosphorus 4K - Video Preset: Fast (Frames Per Second, More Is Better). Default - Disabled: 8.715 (SE +/- 0.094, N = 3); SNC2: 8.278 (SE +/- 0.051, N = 3); SNC4: 8.186 (SE +/- 0.118, N = 3). 1. (CXX) g++ options: -O3 -flto=auto -fno-fat-lto-objects
TensorFlow 2.12, Device: CPU - Batch Size: 32 - Model: ResNet-50 (images/sec, More Is Better). Default - Disabled: 70.35 (SE +/- 0.29, N = 3); SNC4: 66.88 (SE +/- 0.74, N = 3); SNC2: 66.13 (SE +/- 0.19, N = 3).
SPECFEM3D 4.0, Model: Tomographic Model (Seconds, Fewer Is Better). SNC4: 7.519415000 (SE +/- 0.068276455, N = 3); SNC2: 7.544143736 (SE +/- 0.094423144, N = 3); Default - Disabled: 7.987430452 (SE +/- 0.088577121, N = 3). 1. (F9X) gfortran options: -O2 -fopenmp -std=f2003 -fimplicit-none -fmax-errors=10 -pedantic -pedantic-errors -O3 -finline-functions -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
Quantum ESPRESSO 7.0, Input: AUSURF112 (Seconds, Fewer Is Better). SNC4: 307.06 (SE +/- 0.32, N = 3); SNC2: 316.50 (SE +/- 0.87, N = 3); Default - Disabled: 326.06 (SE +/- 0.35, N = 3). 1. (F9X) gfortran options: -pthread -fopenmp -ldevXlib -lopenblas -lFoX_dom -lFoX_sax -lFoX_wxml -lFoX_common -lFoX_utils -lFoX_fsys -lfftw3_omp -lfftw3 -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
NAS Parallel Benchmarks 3.4, Test / Class: FT.C (Total Mop/s, More Is Better). Default - Disabled: 100728.49 (SE +/- 245.52, N = 3); SNC2: 97435.24 (SE +/- 925.13, N = 3); SNC4: 94921.27 (SE +/- 107.70, N = 3). 1. (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz 2. Open MPI 4.1.5
TensorFlow 2.12, Device: CPU - Batch Size: 512 - Model: ResNet-50 (images/sec, More Is Better). Default - Disabled: 135.02 (SE +/- 0.04, N = 3); SNC2: 130.65 (SE +/- 0.21, N = 3); SNC4: 127.24 (SE +/- 0.33, N = 3).
TensorFlow 2.12, Device: CPU - Batch Size: 256 - Model: ResNet-50 (images/sec, More Is Better). Default - Disabled: 118.86 (SE +/- 0.12, N = 3); SNC2: 115.90 (SE +/- 0.23, N = 3); SNC4: 112.08 (SE +/- 0.12, N = 3).
Rodinia 3.1, Test: OpenMP Leukocyte (Seconds, Fewer Is Better). Default - Disabled: 29.39 (SE +/- 0.23, N = 3); SNC2: 30.89 (SE +/- 0.14, N = 3); SNC4: 31.12 (SE +/- 0.05, N = 3). 1. (CXX) g++ options: -O2 -lOpenCL
PyTorch 2.1, Device: CPU - Batch Size: 1 - Model: Efficientnet_v2_l (batches/sec, More Is Better). Default - Disabled: 11.16 (SE +/- 0.08, N = 3, MIN: 10.83 / MAX: 11.47); SNC2: 10.65 (SE +/- 0.10, N = 3, MIN: 6.15 / MAX: 11.04); SNC4: 10.58 (SE +/- 0.04, N = 3, MIN: 5.94 / MAX: 11.15).
LuxCoreRender 2.6, Scene: Orange Juice - Acceleration: CPU (M samples/sec, More Is Better). SNC2: 22.77 (SE +/- 0.29, N = 15, MIN: 18.36 / MAX: 29.05); Default - Disabled: 22.24 (SE +/- 0.26, N = 15, MIN: 18.56 / MAX: 28.74); SNC4: 21.60 (SE +/- 0.06, N = 3, MIN: 18.58 / MAX: 28.17).
Embree 4.3, Binary: Pathtracer ISPC - Model: Asian Dragon (Frames Per Second, More Is Better). Default - Disabled: 129.17 (SE +/- 0.63, N = 3, MIN: 126.31 / MAX: 136); SNC2: 128.01 (SE +/- 0.56, N = 3, MIN: 125.3 / MAX: 131.84); SNC4: 122.59 (SE +/- 1.43, N = 4, MIN: 115.06 / MAX: 129.69).
PyTorch 2.1, Device: CPU - Batch Size: 32 - Model: ResNet-152 (batches/sec, More Is Better). Default - Disabled: 16.16 (SE +/- 0.10, N = 3, MIN: 15.77 / MAX: 16.51); SNC4: 15.61 (SE +/- 0.13, N = 3, MIN: 8.77 / MAX: 16.18); SNC2: 15.36 (SE +/- 0.19, N = 3, MIN: 8.91 / MAX: 15.79).
PyTorch 2.1, Device: CPU - Batch Size: 64 - Model: ResNet-152 (batches/sec, More Is Better). SNC2: 16.03 (SE +/- 0.10, N = 3, MIN: 9.32 / MAX: 16.38); Default - Disabled: 15.91 (SE +/- 0.11, N = 3, MIN: 15.34 / MAX: 16.26); SNC4: 15.24 (SE +/- 0.06, N = 3, MIN: 8.23 / MAX: 15.81).
TensorFlow 2.12, Device: CPU - Batch Size: 16 - Model: ResNet-50 (images/sec, More Is Better). Default - Disabled: 51.77 (SE +/- 0.23, N = 3); SNC2: 49.51 (SE +/- 0.59, N = 3); SNC4: 49.27 (SE +/- 0.63, N = 3).
PyTorch 2.1, Device: CPU - Batch Size: 16 - Model: ResNet-152 (batches/sec, More Is Better). Default - Disabled: 16.06 (SE +/- 0.16, N = 5, MIN: 15.37 / MAX: 16.74); SNC4: 15.39 (SE +/- 0.17, N = 3, MIN: 8.86 / MAX: 16.02); SNC2: 15.30 (SE +/- 0.14, N = 3, MIN: 8.79 / MAX: 15.89).
Rodinia 3.1, Test: OpenMP Streamcluster (Seconds, Fewer Is Better). Default - Disabled: 4.703 (SE +/- 0.007, N = 3); SNC2: 4.884 (SE +/- 0.019, N = 3); SNC4: 4.933 (SE +/- 0.027, N = 3). 1. (CXX) g++ options: -O2 -lOpenCL
NAS Parallel Benchmarks 3.4, Test / Class: SP.B (Total Mop/s, More Is Better). SNC4: 151839.30 (SE +/- 1739.78, N = 3); SNC2: 151244.49 (SE +/- 674.53, N = 3); Default - Disabled: 145525.94 (SE +/- 676.26, N = 3). 1. (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz 2. Open MPI 4.1.5
Radiance Benchmark 5.0, Test: SMP Parallel (Seconds, Fewer Is Better). SNC2: 114.28; SNC4: 117.44; Default - Disabled: 119.19.
LuxCoreRender 2.6, Scene: LuxCore Benchmark - Acceleration: CPU (M samples/sec, More Is Better). Default - Disabled: 12.39 (SE +/- 0.11, N = 3, MIN: 5.87 / MAX: 14.11); SNC2: 11.94 (SE +/- 0.09, N = 3, MIN: 5.67 / MAX: 13.6); SNC4: 11.89 (SE +/- 0.15, N = 3, MIN: 5.42 / MAX: 13.74).
Rodinia 3.1, Test: OpenMP HotSpot3D (Seconds, Fewer Is Better). Default - Disabled: 58.87 (SE +/- 0.82, N = 3); SNC2: 59.86 (SE +/- 0.62, N = 15); SNC4: 61.23 (SE +/- 0.53, N = 15). 1. (CXX) g++ options: -O2 -lOpenCL
SPECFEM3D 4.0, Model: Layered Halfspace (Seconds, Fewer Is Better). SNC4: 18.04 (SE +/- 0.14, N = 3); SNC2: 18.36 (SE +/- 0.24, N = 3); Default - Disabled: 18.74 (SE +/- 0.16, N = 3). 1. (F9X) gfortran options: -O2 -fopenmp -std=f2003 -fimplicit-none -fmax-errors=10 -pedantic -pedantic-errors -O3 -finline-functions -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
OpenFOAM 10, Input: drivaerFastback, Medium Mesh Size - Mesh Time (Seconds, Fewer Is Better). SNC4: 133.74; SNC2: 135.10; Default - Disabled: 138.65. 1. (CXX) g++ options: -std=c++14 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -lfiniteVolume -lmeshTools -lparallel -llagrangian -lregionModels -lgenericPatchFields -lOpenFOAM -ldl -lm
PyTorch 2.1, Device: CPU - Batch Size: 16 - Model: ResNet-50 (batches/sec, More Is Better). SNC2: 40.09 (SE +/- 0.17, N = 3, MIN: 22.35 / MAX: 41.64); SNC4: 39.80 (SE +/- 0.17, N = 3, MIN: 26.68 / MAX: 41.28); Default - Disabled: 38.68 (SE +/- 0.12, N = 3, MIN: 37.08 / MAX: 39.76).
VVenC 1.9, Video Input: Bosphorus 1080p - Video Preset: Fast (Frames Per Second, More Is Better). Default - Disabled: 24.30 (SE +/- 0.17, N = 3); SNC2: 23.71 (SE +/- 0.29, N = 3); SNC4: 23.45 (SE +/- 0.24, N = 3). 1. (CXX) g++ options: -O3 -flto=auto -fno-fat-lto-objects
OpenVINO 2023.2.dev, Model: Face Detection FP16 - Device: CPU (FPS, More Is Better). SNC4: 51.19 (SE +/- 0.15, N = 3); SNC2: 51.06 (SE +/- 0.09, N = 3); Default - Disabled: 49.41 (SE +/- 0.02, N = 3). 1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
NAS Parallel Benchmarks 3.4, Test / Class: MG.C (Total Mop/s, More Is Better). SNC4: 98739.50 (SE +/- 517.95, N = 3); SNC2: 97068.01 (SE +/- 59.39, N = 3); Default - Disabled: 95501.50 (SE +/- 1045.55, N = 4). 1. (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz 2. Open MPI 4.1.5
Memcached 1.6.19, Set To Get Ratio: 1:100 (Ops/sec, More Is Better). Default - Disabled: 7743874.04 (SE +/- 23469.53, N = 3); SNC2: 7704668.27 (SE +/- 52363.50, N = 3); SNC4: 7507583.68 (SE +/- 35640.28, N = 3). 1. (CXX) g++ options: -O2 -levent_openssl -levent -lcrypto -lssl -lpthread -lz -lpcre
PyTorch 2.1, Device: CPU - Batch Size: 64 - Model: ResNet-50 (batches/sec, More Is Better). SNC2: 40.27 (SE +/- 0.20, N = 3, MIN: 23.49 / MAX: 42.09); SNC4: 39.07 (SE +/- 0.12, N = 3, MIN: 19.69 / MAX: 41.1); Default - Disabled: 39.06 (SE +/- 0.05, N = 3, MIN: 37.1 / MAX: 40.22).
PyTorch 2.1, Device: CPU - Batch Size: 32 - Model: ResNet-50 (batches/sec, More Is Better). SNC2: 40.41 (SE +/- 0.40, N = 3, MIN: 34.93 / MAX: 42.02); SNC4: 39.29 (SE +/- 0.45, N = 3, MIN: 25.78 / MAX: 41.34); Default - Disabled: 39.21 (SE +/- 0.10, N = 3, MIN: 36.81 / MAX: 40.31).
NAS Parallel Benchmarks 3.4, Test / Class: IS.D (Total Mop/s, More Is Better). SNC2: 4282.36 (SE +/- 8.86, N = 3); Default - Disabled: 4279.98 (SE +/- 21.28, N = 3); SNC4: 4155.68 (SE +/- 45.26, N = 4). 1. (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz 2. Open MPI 4.1.5
John The Ripper 2023.03.14, Test: WPA PSK (Real C/S, More Is Better). Default - Disabled: 614263 (SE +/- 2146.03, N = 3); SNC2: 611669 (SE +/- 4996.06, N = 3); SNC4: 596180 (SE +/- 4063.87, N = 3). 1. (CC) gcc options: -m64 -lssl -lcrypto -fopenmp -lm -lrt -lz -ldl -lcrypt -lbz2
LuxCoreRender 2.6, Scene: Danish Mood - Acceleration: CPU (M samples/sec, More Is Better). SNC4: 10.97 (SE +/- 0.11, N = 3, MIN: 5.05 / MAX: 12.72); SNC2: 10.85 (SE +/- 0.12, N = 3, MIN: 5.02 / MAX: 12.4); Default - Disabled: 10.65 (SE +/- 0.08, N = 3, MIN: 4.82 / MAX: 12.11).
PyTorch 2.1, Device: CPU - Batch Size: 1 - Model: ResNet-152 (batches/sec, More Is Better). Default - Disabled: 18.99 (SE +/- 0.25, N = 3, MIN: 17.88 / MAX: 19.86); SNC4: 18.62 (SE +/- 0.20, N = 3, MIN: 10.5 / MAX: 19.64); SNC2: 18.44 (SE +/- 0.21, N = 3, MIN: 9.73 / MAX: 19.35).
John The Ripper 2023.03.14, Test: HMAC-SHA512 (Real C/S, More Is Better). Default - Disabled: 296518667 (SE +/- 1312178.38, N = 3); SNC2: 289221400 (SE +/- 1930805.92, N = 15); SNC4: 287955500 (SE +/- 2535178.67, N = 12). 1. (CC) gcc options: -m64 -lssl -lcrypto -fopenmp -lm -lrt -lz -ldl -lcrypt -lbz2
VVenC 1.9, Video Input: Bosphorus 1080p - Video Preset: Faster (Frames Per Second, More Is Better). Default - Disabled: 41.40 (SE +/- 0.17, N = 3); SNC2: 40.22 (SE +/- 0.32, N = 3); SNC4: 40.20 (SE +/- 0.45, N = 3). 1. (CXX) g++ options: -O3 -flto=auto -fno-fat-lto-objects
Xcompact3d Incompact3d 2021-03-11, Input: input.i3d 193 Cells Per Direction (Seconds, Fewer Is Better). SNC4: 10.36 (SE +/- 0.08, N = 3); SNC2: 10.49 (SE +/- 0.05, N = 3); Default - Disabled: 10.66 (SE +/- 0.03, N = 3). 1. (F9X) gfortran options: -cpp -O2 -funroll-loops -floop-optimize -fcray-pointer -fbacktrace -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
Timed LLVM Compilation 16.0, Build System: Unix Makefiles (Seconds, Fewer Is Better). SNC4: 170.54 (SE +/- 0.32, N = 3); SNC2: 170.83 (SE +/- 0.79, N = 3); Default - Disabled: 175.47 (SE +/- 0.35, N = 3).
SPECFEM3D 4.0, Model: Homogeneous Halfspace (Seconds, Fewer Is Better). SNC4: 9.668759733 (SE +/- 0.063880399, N = 3); SNC2: 9.742806803 (SE +/- 0.112412006, N = 4); Default - Disabled: 9.942989762 (SE +/- 0.062415502, N = 15). 1. (F9X) gfortran options: -O2 -fopenmp -std=f2003 -fimplicit-none -fmax-errors=10 -pedantic -pedantic-errors -O3 -finline-functions -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
SPECFEM3D 4.0, Model: Mount St. Helens (Seconds, Fewer Is Better). SNC4: 7.545176178 (SE +/- 0.073100963, N = 3); Default - Disabled: 7.724890032 (SE +/- 0.077041220, N = 3); SNC2: 7.756233737 (SE +/- 0.022697239, N = 3). 1. (F9X) gfortran options: -O2 -fopenmp -std=f2003 -fimplicit-none -fmax-errors=10 -pedantic -pedantic-errors -O3 -finline-functions -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
OpenVINO 2023.2.dev, Model: Vehicle Detection FP16-INT8 - Device: CPU (FPS, More Is Better). Default - Disabled: 5936.37 (SE +/- 10.15, N = 3); SNC4: 5797.28 (SE +/- 15.69, N = 3); SNC2: 5783.01 (SE +/- 14.13, N = 3). 1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
OpenVINO 2023.2.dev, Model: Face Detection Retail FP16-INT8 - Device: CPU (ms, Fewer Is Better). Default - Disabled: 5.38 (SE +/- 0.00, N = 3, MIN: 3.31 / MAX: 23.31); SNC4: 5.51 (SE +/- 0.00, N = 3, MIN: 4.84 / MAX: 15.48); SNC2: 5.52 (SE +/- 0.01, N = 3, MIN: 4.91 / MAX: 14.41). 1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
GPAW 23.6, Input: Carbon Nanotube (Seconds, Fewer Is Better). SNC4: 37.16 (SE +/- 0.25, N = 3); SNC2: 37.80 (SE +/- 0.23, N = 3); Default - Disabled: 38.13 (SE +/- 0.34, N = 3). 1. (CC) gcc options: -shared -fwrapv -O2 -lxc -lblas -lmpi
OpenVINO 2023.2.dev, Model: Face Detection Retail FP16-INT8 - Device: CPU (FPS, More Is Better). Default - Disabled: 17809.43 (SE +/- 14.08, N = 3); SNC4: 17402.76 (SE +/- 11.11, N = 3); SNC2: 17366.29 (SE +/- 14.02, N = 3). 1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
uvg266 0.4.1, Video Input: Bosphorus 1080p - Video Preset: Super Fast (Frames Per Second, More Is Better). Default - Disabled: 211.36 (SE +/- 0.37, N = 3); SNC4: 206.92 (SE +/- 2.03, N = 3); SNC2: 206.23 (SE +/- 1.37, N = 3).
uvg266 0.4.1, Video Input: Bosphorus 1080p - Video Preset: Very Fast (Frames Per Second, More Is Better). Default - Disabled: 208.36 (SE +/- 1.32, N = 3); SNC4: 207.62 (SE +/- 1.28, N = 3); SNC2: 203.48 (SE +/- 0.63, N = 3).
SPECFEM3D 4.0, Model: Water-layered Halfspace (Seconds, Fewer Is Better). SNC4: 18.82 (SE +/- 0.06, N = 3); SNC2: 19.22 (SE +/- 0.19, N = 3); Default - Disabled: 19.27 (SE +/- 0.08, N = 3). 1. (F9X) gfortran options: -O2 -fopenmp -std=f2003 -fimplicit-none -fmax-errors=10 -pedantic -pedantic-errors -O3 -finline-functions -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
PyTorch 2.1, Device: CPU - Batch Size: 1 - Model: ResNet-50 (batches/sec, More Is Better). SNC2: 48.83 (SE +/- 0.25, N = 3, MIN: 28.77 / MAX: 50.91); SNC4: 48.36 (SE +/- 0.25, N = 3, MIN: 25.09 / MAX: 50.72); Default - Disabled: 47.70 (SE +/- 0.07, N = 3, MIN: 44.82 / MAX: 49.01).
7-Zip Compression 22.01, Test: Decompression Rating (MIPS, More Is Better). Default - Disabled: 655203 (SE +/- 364.51, N = 3); SNC4: 649120 (SE +/- 5454.96, N = 3); SNC2: 640928 (SE +/- 11076.26, N = 3). 1. (CXX) g++ options: -lpthread -ldl -O2 -fPIC
PETSc 3.19, Test: Streams (MB/s, More Is Better). SNC4: 187197.63 (SE +/- 451.28, N = 3); SNC2: 185070.83 (SE +/- 708.72, N = 3); Default - Disabled: 183161.11 (SE +/- 64.39, N = 3). 1. (CC) gcc options: -fPIC -O3 -O2 -lpthread -lpciaccess -lm
uvg266 0.4.1, Video Input: Bosphorus 1080p - Video Preset: Ultra Fast (Frames Per Second, More Is Better). SNC4: 208.97 (SE +/- 2.40, N = 3); Default - Disabled: 207.97 (SE +/- 0.58, N = 3); SNC2: 204.49 (SE +/- 0.83, N = 3).
QMCPACK 3.17.1, Input: Li2_STO_ae (Total Execution Time - Seconds, Fewer Is Better). SNC4: 103.77 (SE +/- 0.06, N = 3); Default - Disabled: 103.90 (SE +/- 0.55, N = 3); SNC2: 106.01 (SE +/- 1.13, N = 3). 1. (CXX) g++ options: -fopenmp -foffload=disable -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -ffast-math -march=native -O3 -lm -ldl
uvg266 0.4.1, Video Input: Bosphorus 4K - Video Preset: Slow (Frames Per Second, More Is Better). Default - Disabled: 29.92 (SE +/- 0.09, N = 3); SNC2: 29.47 (SE +/- 0.09, N = 3); SNC4: 29.30 (SE +/- 0.13, N = 3).
7-Zip Compression 22.01, Test: Compression Rating (MIPS, More Is Better). Default - Disabled: 546936 (SE +/- 410.41, N = 3); SNC4: 536820 (SE +/- 2055.13, N = 3); SNC2: 535713 (SE +/- 2142.48, N = 3). 1. (CXX) g++ options: -lpthread -ldl -O2 -fPIC
QuantLib 1.32, Configuration: Multi-Threaded (MFLOPS, More Is Better). SNC4: 317145.3 (SE +/- 3981.12, N = 3); SNC2: 313010.5 (SE +/- 1378.05, N = 3); Default - Disabled: 310771.1 (SE +/- 716.12, N = 3). 1. (CXX) g++ options: -O3 -march=native -fPIE -pie
NAS Parallel Benchmarks 3.4, Test / Class: SP.C (Total Mop/s, More Is Better). Default - Disabled: 89217.91 (SE +/- 61.36, N = 3); SNC2: 88668.13 (SE +/- 271.76, N = 3); SNC4: 87467.23 (SE +/- 287.86, N = 3). 1. (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz 2. Open MPI 4.1.5
libxsmm 2-1.17-3645, M N K: 128 (GFLOPS/s, More Is Better). Default - Disabled: 2043.5 (SE +/- 2.22, N = 3); SNC2: 2026.2 (SE +/- 0.82, N = 3); SNC4: 2004.7 (SE +/- 11.98, N = 3). 1. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -msse4.2
NAS Parallel Benchmarks 3.4, Test / Class: LU.C (Total Mop/s, More Is Better). SNC4: 259883.62 (SE +/- 1304.12, N = 3); SNC2: 256378.22 (SE +/- 427.36, N = 3); Default - Disabled: 255135.24 (SE +/- 1442.62, N = 3). 1. (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz 2. Open MPI 4.1.5
PostgreSQL 16, Scaling Factor: 1000 - Clients: 1000 - Mode: Read Only (TPS, More Is Better). Default - Disabled: 1986807 (SE +/- 7755.83, N = 3); SNC4: 1956509 (SE +/- 14067.03, N = 3); SNC2: 1950507 (SE +/- 19687.71, N = 6). 1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lm
PostgreSQL 16, Scaling Factor: 1000 - Clients: 1000 - Mode: Read Only - Average Latency (ms, Fewer Is Better). Default - Disabled: 0.504 (SE +/- 0.002, N = 3); SNC4: 0.511 (SE +/- 0.004, N = 3); SNC2: 0.513 (SE +/- 0.005, N = 6). 1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lm
Embree 4.3, Binary: Pathtracer ISPC - Model: Crown (Frames Per Second, More Is Better). Default - Disabled: 108.08 (SE +/- 0.46, N = 3, MIN: 105.13 / MAX: 117.76); SNC4: 106.43 (SE +/- 0.40, N = 3, MIN: 103.16 / MAX: 113.77); SNC2: 106.26 (SE +/- 0.53, N = 3, MIN: 102.99 / MAX: 115.22).
Numpy Benchmark (Score, More Is Better). SNC2: 758.02 (SE +/- 1.99, N = 3); SNC4: 753.64 (SE +/- 7.89, N = 3); Default - Disabled: 746.05 (SE +/- 5.73, N = 3).
Memcached 1.6.19, Set To Get Ratio: 1:10 (Ops/sec, More Is Better). SNC2: 5818020.66 (SE +/- 17431.43, N = 3); Default - Disabled: 5811757.93 (SE +/- 23305.65, N = 3); SNC4: 5728479.09 (SE +/- 8577.07, N = 3). 1. (CXX) g++ options: -O2 -levent_openssl -levent -lcrypto -lssl -lpthread -lz -lpcre
Timed CPython Compilation 3.10.6, Build Configuration: Released Build, PGO + LTO Optimized (Seconds, Fewer Is Better). SNC4: 185.51; SNC2: 188.07; Default - Disabled: 188.35.
LuxCoreRender 2.6, Scene: DLSC - Acceleration: CPU (M samples/sec, More Is Better). Default - Disabled: 15.23 (SE +/- 0.02, N = 3, MIN: 14.85 / MAX: 18.76); SNC2: 15.19 (SE +/- 0.07, N = 3, MIN: 14.66 / MAX: 18.88); SNC4: 15.00 (SE +/- 0.02, N = 3, MIN: 14.61 / MAX: 18.76).
Blender 4.0, Blend File: Fishy Cat - Compute: CPU-Only (Seconds, Fewer Is Better). Default - Disabled: 19.47 (SE +/- 0.09, N = 3); SNC4: 19.71 (SE +/- 0.05, N = 3); SNC2: 19.76 (SE +/- 0.20, N = 3).
uvg266 0.4.1, Video Input: Bosphorus 4K - Video Preset: Medium (Frames Per Second, More Is Better). Default - Disabled: 33.23 (SE +/- 0.12, N = 3); SNC2: 33.12 (SE +/- 0.17, N = 3); SNC4: 32.75 (SE +/- 0.08, N = 3).
Liquid-DSP 1.6, Threads: 192 - Buffer Length: 256 - Filter Length: 512 (samples/s, More Is Better). Default - Disabled: 1514200000 (SE +/- 4629254.80, N = 3); SNC2: 1499133333 (SE +/- 5353295.97, N = 3); SNC4: 1493000000 (SE +/- 5108163.40, N = 3). 1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
OpenVINO 2023.2.dev, Model: Weld Porosity Detection FP16 - Device: CPU (ms, Fewer Is Better). SNC4: 19.06 (SE +/- 0.03, N = 3, MIN: 17.12 / MAX: 40.94); SNC2: 19.08 (SE +/- 0.04, N = 3, MIN: 15.77 / MAX: 56.75); Default - Disabled: 19.33 (SE +/- 0.03, N = 3, MIN: 9.33 / MAX: 85.74). 1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
libxsmm 2-1.17-3645, M N K: 256 (GFLOPS/s, More Is Better). SNC4: 2599.9 (SE +/- 21.00, N = 3); SNC2: 2583.6 (SE +/- 29.68, N = 3); Default - Disabled: 2564.6 (SE +/- 18.22, N = 3). 1. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -msse4.2
Liquid-DSP 1.6, Threads: 32 - Buffer Length: 256 - Filter Length: 57 (samples/s, More Is Better). SNC2: 1744433333 (SE +/- 3811532.21, N = 3); Default - Disabled: 1730566667 (SE +/- 3985947.54, N = 3); SNC4: 1721266667 (SE +/- 6406333.67, N = 3). 1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
OSPRay Studio 0.13, Camera: 2 - Resolution: 4K - Samples Per Pixel: 32 - Renderer: Path Tracer - Acceleration: CPU (ms, Fewer Is Better). SNC2: 38015 (SE +/- 192.95, N = 3); Default - Disabled: 38264 (SE +/- 105.83, N = 3); SNC4: 38523 (SE +/- 65.34, N = 3).
OpenVINO 2023.2.dev, Model: Weld Porosity Detection FP16 - Device: CPU (FPS, More Is Better). SNC4: 5026.69 (SE +/- 6.67, N = 3); SNC2: 5023.42 (SE +/- 9.79, N = 3); Default - Disabled: 4962.13 (SE +/- 6.69, N = 3). 1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
PostgreSQL 16, Scaling Factor: 100 - Clients: 1000 - Mode: Read Only (TPS, More Is Better). Default - Disabled: 3792698 (SE +/- 49179.25, N = 3); SNC4: 3759094 (SE +/- 37681.71, N = 6); SNC2: 3746006 (SE +/- 21467.51, N = 3). 1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lm
OSPRay Studio 0.13, Camera: 3 - Resolution: 4K - Samples Per Pixel: 32 - Renderer: Path Tracer - Acceleration: CPU (ms, Fewer Is Better). Default - Disabled: 43999 (SE +/- 70.44, N = 3); SNC2: 44300 (SE +/- 80.70, N = 3); SNC4: 44535 (SE +/- 146.66, N = 3).
Liquid-DSP 1.6, Threads: 64 - Buffer Length: 256 - Filter Length: 512 (samples/s, More Is Better). Default - Disabled: 938493333 (SE +/- 8999104.28, N = 3); SNC2: 935170000 (SE +/- 8992613.64, N = 3); SNC4: 927276667 (SE +/- 6276308.19, N = 3). 1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
OpenRadioss 2023.09.15, Model: Chrysler Neon 1M (Seconds, Fewer Is Better). SNC2: 155.50 (SE +/- 0.28, N = 3); SNC4: 156.21 (SE +/- 0.27, N = 3); Default - Disabled: 157.35 (SE +/- 0.18, N = 3).
NAMD 2.14, ATPase Simulation - 327,506 Atoms (days/ns, Fewer Is Better). SNC4: 0.25502 (SE +/- 0.00197, N = 3); SNC2: 0.25612 (SE +/- 0.00284, N = 3); Default - Disabled: 0.25803 (SE +/- 0.00087, N = 3).
libxsmm 2-1.17-3645, M N K: 32 (GFLOPS/s, More Is Better). SNC4: 559.8 (SE +/- 1.96, N = 3); Default - Disabled: 555.6 (SE +/- 0.58, N = 3); SNC2: 553.3 (SE +/- 2.72, N = 3). 1. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -msse4.2
uvg266 0.4.1, Video Input: Bosphorus 1080p - Video Preset: Slow (Frames Per Second, More Is Better). Default - Disabled: 89.19 (SE +/- 0.27, N = 3); SNC2: 88.68 (SE +/- 0.19, N = 3); SNC4: 88.18 (SE +/- 0.40, N = 3).
OpenVINO 2023.2.dev - Model: Weld Porosity Detection FP16-INT8 - Device: CPU (ms, Fewer Is Better): SNC4: 9.61 (SE +/- 0.03, N = 3; MIN: 8.18 / MAX: 19.44); SNC2: 9.63 (SE +/- 0.02, N = 3; MIN: 8.28 / MAX: 16.71); Default - Disabled: 9.72 (SE +/- 0.03, N = 3; MIN: 4.95 / MAX: 29.09). (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
PostgreSQL 16 - Scaling Factor: 100 - Clients: 1000 - Mode: Read Only - Average Latency (ms, Fewer Is Better): Default - Disabled: 0.264 (SE +/- 0.003, N = 3); SNC4: 0.266 (SE +/- 0.003, N = 6); SNC2: 0.267 (SE +/- 0.002, N = 3). (CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lm
OpenSSL 3.1 - Algorithm: SHA512 (byte/s, More Is Better): SNC4: 43206641983 (SE +/- 16399997.56, N = 3); SNC2: 43151618910 (SE +/- 19206282.00, N = 3); Default - Disabled: 42723911653 (SE +/- 32408039.38, N = 3). (CC) gcc options: -pthread -m64 -O3 -ldl
OSPRay Studio 0.13 - Camera: 3 - Resolution: 4K - Samples Per Pixel: 1 - Renderer: Path Tracer - Acceleration: CPU (ms, Fewer Is Better): Default - Disabled: 1252 (SE +/- 1.53, N = 3); SNC2: 1259 (SE +/- 2.40, N = 3); SNC4: 1266 (SE +/- 4.33, N = 3).
OpenVINO 2023.2.dev - Model: Weld Porosity Detection FP16-INT8 - Device: CPU (FPS, More Is Better): SNC4: 9973.18 (SE +/- 26.50, N = 3); SNC2: 9948.30 (SE +/- 17.38, N = 3); Default - Disabled: 9866.45 (SE +/- 30.53, N = 3). (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
Liquid-DSP 1.6 - Threads: 128 - Buffer Length: 256 - Filter Length: 512 (samples/s, More Is Better): Default - Disabled: 1314200000 (SE +/- 6222807.51, N = 3); SNC2: 1302833333 (SE +/- 6948700.92, N = 3); SNC4: 1300233333 (SE +/- 7198688.15, N = 3). (CC) gcc options: -O3 -pthread -lm -lc -lliquid
OSPRay Studio 0.13 - Camera: 1 - Resolution: 4K - Samples Per Pixel: 32 - Renderer: Path Tracer - Acceleration: CPU (ms, Fewer Is Better): Default - Disabled: 37890 (SE +/- 51.31, N = 3); SNC2: 38009 (SE +/- 106.88, N = 3); SNC4: 38284 (SE +/- 154.47, N = 3).
Memcached 1.6.19 - Set To Get Ratio: 1:5 (Ops/sec, More Is Better): SNC2: 3359880.55 (SE +/- 33459.28, N = 3); Default - Disabled: 3359726.87 (SE +/- 15559.68, N = 3); SNC4: 3327393.35 (SE +/- 44466.09, N = 3). (CXX) g++ options: -O2 -levent_openssl -levent -lcrypto -lssl -lpthread -lz -lpcre
OSPRay Studio 0.13 - Camera: 1 - Resolution: 4K - Samples Per Pixel: 1 - Renderer: Path Tracer - Acceleration: CPU (ms, Fewer Is Better): Default - Disabled: 1064 (SE +/- 4.18, N = 3); SNC4: 1073 (SE +/- 1.86, N = 3); SNC2: 1074 (SE +/- 2.73, N = 3).
OpenVINO 2023.2.dev - Model: Face Detection FP16-INT8 - Device: CPU (FPS, More Is Better): SNC4: 97.85 (SE +/- 0.15, N = 3); SNC2: 97.76 (SE +/- 0.12, N = 3); Default - Disabled: 96.95 (SE +/- 0.15, N = 3). (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
OSPRay Studio 0.13 - Camera: 3 - Resolution: 4K - Samples Per Pixel: 16 - Renderer: Path Tracer - Acceleration: CPU (ms, Fewer Is Better): Default - Disabled: 20017 (SE +/- 42.71, N = 3); SNC2: 20144 (SE +/- 18.75, N = 3); SNC4: 20199 (SE +/- 57.00, N = 3).
OpenVINO 2023.2.dev - Model: Road Segmentation ADAS FP16-INT8 - Device: CPU (FPS, More Is Better): SNC2: 1914.89 (SE +/- 7.84, N = 3); SNC4: 1911.68 (SE +/- 7.31, N = 3); Default - Disabled: 1898.14 (SE +/- 4.08, N = 3). (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
OSPRay Studio 0.13 - Camera: 2 - Resolution: 4K - Samples Per Pixel: 16 - Renderer: Path Tracer - Acceleration: CPU (ms, Fewer Is Better): SNC2: 17113 (SE +/- 54.03, N = 3); Default - Disabled: 17193 (SE +/- 94.37, N = 3); SNC4: 17262 (SE +/- 70.54, N = 3).
Blender 4.0 - Blend File: Classroom - Compute: CPU-Only (Seconds, Fewer Is Better): Default - Disabled: 38.02 (SE +/- 0.09, N = 3); SNC4: 38.24 (SE +/- 0.07, N = 3); SNC2: 38.33 (SE +/- 0.03, N = 3).
uvg266 0.4.1 - Video Input: Bosphorus 1080p - Video Preset: Medium (Frames Per Second, More Is Better): Default - Disabled: 98.33 (SE +/- 0.23, N = 3); SNC2: 98.22 (SE +/- 0.18, N = 3); SNC4: 97.60 (SE +/- 0.34, N = 3).
libxsmm 2-1.17-3645 - M N K: 64 (GFLOPS/s, More Is Better): SNC4: 1059.1 (SE +/- 1.09, N = 3); Default - Disabled: 1055.4 (SE +/- 0.53, N = 3); SNC2: 1052.3 (SE +/- 2.33, N = 3). (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -msse4.2
NAS Parallel Benchmarks 3.4 - Test / Class: BT.C (Total Mop/s, More Is Better): SNC4: 215764.82 (SE +/- 112.43, N = 3); SNC2: 215684.06 (SE +/- 322.09, N = 3); Default - Disabled: 214445.97 (SE +/- 280.16, N = 3). (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz; Open MPI 4.1.5
OpenSSL 3.1 - Algorithm: AES-128-GCM (byte/s, More Is Better): SNC4: 946933378620 (SE +/- 179395215.71, N = 3); SNC2: 943365732580 (SE +/- 1081587572.26, N = 3); Default - Disabled: 941442701447 (SE +/- 2155035998.71, N = 3). (CC) gcc options: -pthread -m64 -O3 -ldl
OpenSSL 3.1 - Algorithm: RSA4096 (verify/s, More Is Better): SNC4: 1538165.3 (SE +/- 3270.23, N = 3); Default - Disabled: 1533067.1 (SE +/- 3444.37, N = 3); SNC2: 1529451.7 (SE +/- 486.05, N = 3). (CC) gcc options: -pthread -m64 -O3 -ldl
ACES DGEMM 1.0 - Sustained Floating-Point Rate (GFLOP/s, More Is Better): Default - Disabled: 43.88 (SE +/- 0.18, N = 3); SNC2: 43.77 (SE +/- 0.36, N = 3); SNC4: 43.63 (SE +/- 0.42, N = 3). (CC) gcc options: -O3 -march=native -fopenmp
Liquid-DSP 1.6 - Threads: 128 - Buffer Length: 256 - Filter Length: 57 (samples/s, More Is Better): SNC4: 4518133333 (SE +/- 22778084.01, N = 3); Default - Disabled: 4495533333 (SE +/- 17623122.44, N = 3); SNC2: 4493966667 (SE +/- 10235938.86, N = 3). (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP 1.6 - Threads: 64 - Buffer Length: 256 - Filter Length: 32 (samples/s, More Is Better): Default - Disabled: 2646100000 (SE +/- 26479677.74, N = 3); SNC4: 2643433333 (SE +/- 26768908.17, N = 3); SNC2: 2632233333 (SE +/- 22995168.57, N = 3). (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Blender 4.0 - Blend File: BMW27 - Compute: CPU-Only (Seconds, Fewer Is Better): Default - Disabled: 15.25 (SE +/- 0.04, N = 3); SNC4: 15.31 (SE +/- 0.04, N = 3); SNC2: 15.32 (SE +/- 0.06, N = 3).
OSPRay Studio 0.13 - Camera: 1 - Resolution: 4K - Samples Per Pixel: 16 - Renderer: Path Tracer - Acceleration: CPU (ms, Fewer Is Better): Default - Disabled: 16939 (SE +/- 35.36, N = 3); SNC2: 16999 (SE +/- 20.23, N = 3); SNC4: 17014 (SE +/- 2.67, N = 3).
Blender 4.0 - Blend File: Pabellon Barcelona - Compute: CPU-Only (Seconds, Fewer Is Better): Default - Disabled: 46.77 (SE +/- 0.20, N = 3); SNC4: 46.88 (SE +/- 0.29, N = 3); SNC2: 46.97 (SE +/- 0.23, N = 3).
Liquid-DSP 1.6 - Threads: 192 - Buffer Length: 256 - Filter Length: 57 (samples/s, More Is Better): SNC4: 5416600000 (SE +/- 14304195.19, N = 3); Default - Disabled: 5407200000 (SE +/- 18999298.23, N = 3); SNC2: 5393633333 (SE +/- 15429013.07, N = 3). (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP 1.6 - Threads: 32 - Buffer Length: 256 - Filter Length: 32 (samples/s, More Is Better): SNC4: 1440933333 (SE +/- 3769320.60, N = 3); SNC2: 1436433333 (SE +/- 5024716.69, N = 3); Default - Disabled: 1435333333 (SE +/- 1937638.88, N = 3). (CC) gcc options: -O3 -pthread -lm -lc -lliquid
OpenVINO 2023.2.dev - Model: Handwritten English Recognition FP16-INT8 - Device: CPU (ms, Fewer Is Better): SNC4: 45.08 (SE +/- 0.16, N = 3; MIN: 36.6 / MAX: 52.28); SNC2: 45.22 (SE +/- 0.11, N = 3; MIN: 37.77 / MAX: 56.52); Default - Disabled: 45.25 (SE +/- 0.07, N = 3; MIN: 34.3 / MAX: 61.3). (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
OSPRay Studio 0.13 - Camera: 2 - Resolution: 4K - Samples Per Pixel: 1 - Renderer: Path Tracer - Acceleration: CPU (ms, Fewer Is Better): Default - Disabled: 1076 (SE +/- 3.79, N = 3); SNC4: 1079 (SE +/- 3.71, N = 3); SNC2: 1080 (SE +/- 3.61, N = 3).
Liquid-DSP 1.6 - Threads: 128 - Buffer Length: 256 - Filter Length: 32 (samples/s, More Is Better): SNC4: 4243600000 (SE +/- 26314254.69, N = 3); SNC2: 4233566667 (SE +/- 26426018.32, N = 3); Default - Disabled: 4228133333 (SE +/- 18076719.22, N = 3). (CC) gcc options: -O3 -pthread -lm -lc -lliquid
OpenVINO 2023.2.dev - Model: Handwritten English Recognition FP16-INT8 - Device: CPU (FPS, More Is Better): SNC4: 2127.42 (SE +/- 7.69, N = 3); SNC2: 2121.18 (SE +/- 5.01, N = 3); Default - Disabled: 2120.23 (SE +/- 3.37, N = 3). (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
Liquid-DSP 1.6 - Threads: 192 - Buffer Length: 256 - Filter Length: 32 (samples/s, More Is Better): SNC4: 5535500000 (SE +/- 29526993.30, N = 3); Default - Disabled: 5526733333 (SE +/- 19784955.00, N = 3); SNC2: 5519933333 (SE +/- 18095333.96, N = 3). (CC) gcc options: -O3 -pthread -lm -lc -lliquid
OpenSSL 3.1 - Algorithm: ChaCha20-Poly1305 (byte/s, More Is Better): SNC4: 362765500023 (SE +/- 129282934.95, N = 3); SNC2: 361987547683 (SE +/- 140024932.44, N = 3); Default - Disabled: 361852375860 (SE +/- 176665463.84, N = 3). (CC) gcc options: -pthread -m64 -O3 -ldl
Rodinia 3.1 - Test: OpenMP LavaMD (Seconds, Fewer Is Better): SNC2: 26.91 (SE +/- 0.09, N = 3); Default - Disabled: 26.95 (SE +/- 0.07, N = 3); SNC4: 26.97 (SE +/- 0.09, N = 3). (CXX) g++ options: -O2 -lOpenCL
OpenSSL 3.1 - Algorithm: AES-256-GCM (byte/s, More Is Better): SNC4: 817131387107 (SE +/- 1023563571.48, N = 3); SNC2: 815393358357 (SE +/- 991457061.79, N = 3); Default - Disabled: 815175705397 (SE +/- 1400557991.78, N = 3). (CC) gcc options: -pthread -m64 -O3 -ldl
Liquid-DSP 1.6 - Threads: 64 - Buffer Length: 256 - Filter Length: 57 (samples/s, More Is Better): SNC4: 3023166667 (SE +/- 1844210.16, N = 3); Default - Disabled: 3020900000 (SE +/- 20219380.14, N = 3); SNC2: 3016100000 (SE +/- 14068522.78, N = 3). (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP 1.6 - Threads: 32 - Buffer Length: 256 - Filter Length: 512 (samples/s, More Is Better): SNC2: 523603333 (SE +/- 3417076.40, N = 3); SNC4: 522996667 (SE +/- 3819181.12, N = 3); Default - Disabled: 522560000 (SE +/- 2767116.43, N = 3). (CC) gcc options: -O3 -pthread -lm -lc -lliquid
OpenSSL 3.1 - Algorithm: ChaCha20 (byte/s, More Is Better): SNC4: 512318580417 (SE +/- 56755200.60, N = 3); Default - Disabled: 511694244653 (SE +/- 157967680.92, N = 3); SNC2: 511318347957 (SE +/- 45048908.57, N = 3). (CC) gcc options: -pthread -m64 -O3 -ldl
Rodinia 3.1 - Test: OpenMP CFD Solver (Seconds, Fewer Is Better): Default - Disabled: 5.577 (SE +/- 0.036, N = 3); SNC2: 5.581 (SE +/- 0.002, N = 3); SNC4: 5.585 (SE +/- 0.013, N = 3). (CXX) g++ options: -O2 -lOpenCL
OpenSSL 3.1 - Algorithm: RSA4096 (sign/s, More Is Better): SNC4: 49924.9 (SE +/- 90.07, N = 3); Default - Disabled: 49897.3 (SE +/- 49.04, N = 3); SNC2: 49862.1 (SE +/- 67.46, N = 3). (CC) gcc options: -pthread -m64 -O3 -ldl
Blender 4.0 - Blend File: Barbershop - Compute: CPU-Only (Seconds, Fewer Is Better): SNC2: 136.20 (SE +/- 0.26, N = 3); SNC4: 136.28 (SE +/- 0.23, N = 3); Default - Disabled: 136.29 (SE +/- 0.18, N = 3).
OpenSSL 3.1 - Algorithm: SHA256 (byte/s, More Is Better): SNC4: 131706698427 (SE +/- 282602364.23, N = 3); Default - Disabled: 131629499893 (SE +/- 311439878.18, N = 3); SNC2: 131627463983 (SE +/- 241913832.01, N = 3). (CC) gcc options: -pthread -m64 -O3 -ldl
PyTorch 2.1 - Device: CPU - Batch Size: 64 - Model: Efficientnet_v2_l (batches/sec, More Is Better): Default - Disabled: 6.36 (SE +/- 0.01, N = 3; MIN: 5.69 / MAX: 6.62); SNC2: 5.38 (SE +/- 0.02, N = 3; MIN: 3.75 / MAX: 5.87); SNC4: 3.68 (SE +/- 0.13, N = 6; MIN: 1.19 / MAX: 6.36).
PyTorch 2.1 - Device: CPU - Batch Size: 32 - Model: Efficientnet_v2_l (batches/sec, More Is Better): Default - Disabled: 6.35 (SE +/- 0.02, N = 3; MIN: 5.72 / MAX: 6.64); SNC2: 5.31 (SE +/- 0.04, N = 3; MIN: 3.81 / MAX: 5.85); SNC4: 4.07 (SE +/- 0.29, N = 6; MIN: 1.11 / MAX: 6.41).
PyTorch 2.1 - Device: CPU - Batch Size: 16 - Model: Efficientnet_v2_l (batches/sec, More Is Better): Default - Disabled: 6.34 (SE +/- 0.02, N = 3; MIN: 5.68 / MAX: 6.64); SNC2: 5.36 (SE +/- 0.06, N = 3; MIN: 3.75 / MAX: 5.85); SNC4: 4.15 (SE +/- 0.23, N = 9; MIN: 1.09 / MAX: 6.41).
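The PyTorch results above can be put on a common scale by expressing each mode as a percent change from the Default - Disabled mean. The following is a minimal Python sketch and not part of the Phoronix Test Suite output: the helper name relative_change is hypothetical, and the inputs are the Batch Size: 64 means reported above.

```python
def relative_change(baseline: float, value: float) -> float:
    """Percent change of `value` relative to `baseline` (hypothetical helper)."""
    return (value - baseline) / baseline * 100.0


# Batch Size: 64 means in batches/sec, copied from the PyTorch result above.
batch64 = {"Default - Disabled": 6.36, "SNC2": 5.38, "SNC4": 3.68}

baseline = batch64["Default - Disabled"]
for mode, mean in batch64.items():
    # More Is Better here, so a negative change is a slowdown versus the default.
    print(f"{mode}: {relative_change(baseline, mean):+.1f}%")
```

The same helper applies to the Fewer Is Better results, where the sign reads the other way: a positive change is a regression.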
PostgreSQL 16 - Scaling Factor: 1000 - Clients: 1000 - Mode: Read Write - Average Latency (ms, Fewer Is Better): Default - Disabled: 60.66 (SE +/- 1.61, N = 12); SNC2: 63.44 (SE +/- 0.82, N = 3); SNC4: 65.33 (SE +/- 0.79, N = 4). (CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lm
PostgreSQL 16 - Scaling Factor: 1000 - Clients: 1000 - Mode: Read Write (TPS, More Is Better): Default - Disabled: 16623 (SE +/- 472.23, N = 12); SNC2: 15769 (SE +/- 204.24, N = 3); SNC4: 15314 (SE +/- 181.92, N = 4). (CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lm
PostgreSQL 16 - Scaling Factor: 100 - Clients: 1000 - Mode: Read Write - Average Latency (ms, Fewer Is Better): Default - Disabled: 55.40 (SE +/- 0.50, N = 12); SNC2: 72.15 (SE +/- 3.12, N = 12); SNC4: 72.65 (SE +/- 3.47, N = 12). (CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lm
PostgreSQL 16 - Scaling Factor: 100 - Clients: 1000 - Mode: Read Write (TPS, More Is Better): Default - Disabled: 18068 (SE +/- 160.41, N = 12); SNC2: 14096 (SE +/- 496.25, N = 12); SNC4: 14067 (SE +/- 577.54, N = 12). (CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lm
GROMACS 2023 - Implementation: MPI CPU - Input: water_GMX50_bare (Ns Per Day, More Is Better): SNC4: 10.66 (SE +/- 0.04, N = 3); SNC2: 10.45 (SE +/- 0.07, N = 3); Default - Disabled: 10.38 (SE +/- 0.38, N = 9). (CXX) g++ options: -O3
ASKAP 1.0 - Test: tConvolve MPI - Gridding (Mpix/sec, More Is Better): Default - Disabled: 43532.9 (SE +/- 199.70, N = 3); SNC2: 43138.9 (SE +/- 341.22, N = 3); SNC4: 35893.7 (SE +/- 743.57, N = 15). (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp
ASKAP 1.0 - Test: tConvolve MPI - Degridding (Mpix/sec, More Is Better): SNC2: 40552.4 (SE +/- 463.37, N = 3); Default - Disabled: 40198.3 (SE +/- 170.33, N = 3); SNC4: 34358.5 (SE +/- 691.76, N = 15). (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp
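Because each result carries a standard error (SE), one rough way to judge whether two modes really differ is to divide the gap between their means by the combined standard error. The sketch below is an illustration only and not something the Phoronix Test Suite computes: the helper name se_separation and the cutoff of 2 are assumptions. The run counts also differ between modes (N = 3 versus N = 15), so this is a rule of thumb, not a formal significance test. The inputs are the ASKAP Gridding figures reported above.

```python
import math


def se_separation(mean_a: float, se_a: float, mean_b: float, se_b: float) -> float:
    """Gap between two means in units of their combined standard error.

    Hypothetical helper; assumes the two results are independent.
    """
    return abs(mean_a - mean_b) / math.sqrt(se_a ** 2 + se_b ** 2)


# (mean, SE) in Mpix/sec, copied from the ASKAP tConvolve MPI - Gridding result.
default = (43532.9, 199.70)
snc2 = (43138.9, 341.22)
snc4 = (35893.7, 743.57)

THRESHOLD = 2.0  # assumed rule-of-thumb cutoff, not taken from the source

for name, other in (("SNC2", snc2), ("SNC4", snc4)):
    score = se_separation(*default, *other)
    verdict = "likely a real difference" if score > THRESHOLD else "within run-to-run noise"
    print(f"Default - Disabled vs {name}: {score:.2f} combined SEs ({verdict})")
```

On these figures the SNC2 gap stays under the assumed cutoff while the SNC4 gap is well above it, which matches how the raw numbers read.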
ClickHouse 22.12.3.5 - 100M Rows Hits Dataset, Third Run (Queries Per Minute, Geo Mean, More Is Better): Default - Disabled: 504.59 (SE +/- 5.43, N = 3; MIN: 58.03 / MAX: 3750); SNC2: 444.12 (SE +/- 8.91, N = 12; MIN: 48.62 / MAX: 7500); SNC4: 400.96 (SE +/- 10.68, N = 12; MIN: 41.1 / MAX: 6000).
ClickHouse 22.12.3.5 - 100M Rows Hits Dataset, Second Run (Queries Per Minute, Geo Mean, More Is Better): Default - Disabled: 490.52 (SE +/- 6.17, N = 3; MIN: 34.27 / MAX: 4615.38); SNC2: 438.54 (SE +/- 8.80, N = 12; MIN: 35.21 / MAX: 6666.67); SNC4: 397.01 (SE +/- 10.89, N = 12; MIN: 39.66 / MAX: 6000).
ClickHouse 22.12.3.5 - 100M Rows Hits Dataset, First Run / Cold Cache (Queries Per Minute, Geo Mean, More Is Better): Default - Disabled: 457.06 (SE +/- 4.35, N = 3; MIN: 47.36 / MAX: 4285.71); SNC2: 417.30 (SE +/- 8.46, N = 12; MIN: 32.68 / MAX: 6000); SNC4: 388.42 (SE +/- 10.37, N = 12; MIN: 30.26 / MAX: 6000).
Stockfish 15 - Total Time (Nodes Per Second, More Is Better): SNC4: 300267148 (SE +/- 6406046.35, N = 15); SNC2: 295780798 (SE +/- 3381815.04, N = 15); Default - Disabled: 287331097 (SE +/- 2377868.67, N = 3). (CXX) g++ options: -lgcov -m64 -lpthread -fno-exceptions -std=c++17 -fno-peel-loops -fno-tracer -pedantic -O3 -msse -msse3 -mpopcnt -mavx2 -mavx512f -mavx512bw -mavx512vnni -mavx512dq -mavx512vl -msse4.1 -mssse3 -msse2 -mbmi2 -flto -flto=jobserver
easyWave r34 - Input: e2Asean Grid + BengkuluSept2007 Source - Time: 2400 (Seconds, Fewer Is Better): Default - Disabled: 58.65 (SE +/- 0.47, N = 9); SNC2: 59.25 (SE +/- 0.49, N = 3); SNC4: 75.17 (SE +/- 3.38, N = 12). (CXX) g++ options: -O3 -fopenmp
easyWave r34 - Input: e2Asean Grid + BengkuluSept2007 Source - Time: 1200 (Seconds, Fewer Is Better): Default - Disabled: 23.99 (SE +/- 0.32, N = 15); SNC2: 25.48 (SE +/- 0.46, N = 15); SNC4: 28.74 (SE +/- 0.57, N = 12). (CXX) g++ options: -O3 -fopenmp
LuxCoreRender 2.6 - Scene: Rainbow Colors and Prism - Acceleration: CPU (M samples/sec, More Is Better): SNC4: 34.72 (SE +/- 0.24, N = 3; MIN: 34.26 / MAX: 35.1); Default - Disabled: 32.75 (SE +/- 1.27, N = 15; MIN: 17.73 / MAX: 35.6); SNC2: 31.82 (SE +/- 1.17, N = 15; MIN: 18.52 / MAX: 35.42).
NAS Parallel Benchmarks 3.4 - Test / Class: EP.C (Total Mop/s, More Is Better): SNC4: 10411.60 (SE +/- 304.30, N = 12); SNC2: 9712.81 (SE +/- 257.16, N = 15); Default - Disabled: 8869.84 (SE +/- 60.80, N = 3). (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz; Open MPI 4.1.5
Phoronix Test Suite v10.8.5