AMD Threadripper 7995WX NPS / SNC2 SNC4 Benchmarks

Testing of NPS/SNC settings on the 96-core AMD Ryzen Threadripper PRO 7995WX in three modes: default (disabled), SNC2, and SNC4. Benchmarks by Michael Larabel for a future article.

HTML result view exported from: https://openbenchmarking.org/result/2311288-NE-TR7995WXN68&grs&sor.

System details (one shared entry for the Default - Disabled, SNC2, and SNC4 runs):

Processor: AMD Ryzen Threadripper PRO 7995WX 96-Cores @ 6.44GHz (96 Cores / 192 Threads)
Motherboard: HP 8B24 (U65 Ver. 01.01.04 BIOS)
Chipset: AMD Device 14a4
Memory: 128GB
Disk: 2 x 1024GB SAMSUNG MZVL21T0HCLR-00BH1
Graphics: NVIDIA RTX A4000 16GB
Audio: NVIDIA GA104 HD Audio
Monitor: ASUS VP28U
Network: Realtek RTL8111/8168/8411
OS: Ubuntu 23.10
Kernel: 6.5.0-13-generic (x86_64)
Desktop: GNOME Shell 45.0
Display Server: X Server 1.21.1.7
Display Driver: NVIDIA 535.129.03
OpenGL: 4.6.0
OpenCL: OpenCL 3.0 CUDA 12.2.147
Compiler: GCC 13.2.0
File-System: ext4
Screen Resolution: 3840x2160

Kernel Details: Transparent Huge Pages: madvise

Compiler Details: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-XYspKM/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-XYspKM/gcc-13-13.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v

Processor Details: Scaling Governor: amd-pstate-epp powersave (EPP: balance_performance) - CPU Microcode: 0xa108105

OpenCL Details: GPU Compute Cores: 6144

Python Details: Python 3.11.6

Security Details: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_rstack_overflow: Mitigation of safe RET + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS IBPB: conditional STIBP: always-on RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected

Result overview table: all tests for the Default - Disabled, SNC2, and SNC4 configurations. The individual results follow.

ASKAP

Test: Hogbom Clean OpenMP

ASKAP 1.0 - Iterations Per Second, More Is Better
Default - Disabled: 1127.85 (SE +/- 4.25, N = 3)
SNC2: 564.98 (SE +/- 1.84, N = 3)
SNC4: 325.39 (SE +/- 1.54, N = 3)
1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp
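
To compare the three Hogbom Clean results on one scale, each value can be divided by the Default - Disabled result. The following is a minimal sketch of that arithmetic only; the helper name `relative_to_baseline` is hypothetical and is not part of any benchmarking tool used here.

```python
def relative_to_baseline(results, baseline, higher_is_better=True):
    """Return each result as a ratio against the baseline configuration.

    A ratio above 1.0 means the configuration performed better than the
    baseline; below 1.0 means it performed worse.
    """
    base = results[baseline]
    relative = {}
    for name, value in results.items():
        # For "fewer is better" metrics the ratio is inverted so that
        # values above 1.0 still mean "better than baseline".
        relative[name] = value / base if higher_is_better else base / value
    return relative


# Iterations per second from the Hogbom Clean OpenMP result above.
hogbom = {"Default - Disabled": 1127.85, "SNC2": 564.98, "SNC4": 325.39}
hogbom_rel = relative_to_baseline(hogbom, "Default - Disabled")

for name, ratio in hogbom_rel.items():
    print(f"{name}: {ratio:.2f}x of baseline")
```

Passing `higher_is_better=False` makes the same helper usable for the latency and build-time results, where lower numbers are better.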

ASKAP

Test: tConvolve OpenMP - Gridding

ASKAP 1.0 - Million Grid Points Per Second, More Is Better
Default - Disabled: 19018.30 (SE +/- 0.00, N = 3)
SNC2: 12083.70 (SE +/- 122.89, N = 15)
SNC4: 6602.28 (SE +/- 54.12, N = 3)
1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp

OpenVINO

Model: Vehicle Detection FP16 - Device: CPU

OpenVINO 2023.2.dev - ms, Fewer Is Better
SNC2: 6.18 (SE +/- 0.01, N = 3; MIN: 5.27 / MAX: 16.2)
SNC4: 6.20 (SE +/- 0.01, N = 3; MIN: 5.02 / MAX: 17.36)
Default - Disabled: 17.55 (SE +/- 0.06, N = 3; MIN: 5.74 / MAX: 43.74)
1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
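
For a "fewer is better" latency result such as this one, the change is often easier to read as a percentage reduction from the Default - Disabled value. The sketch below shows only that calculation; `latency_reduction_pct` is a hypothetical helper name introduced for illustration.

```python
def latency_reduction_pct(baseline_ms, new_ms):
    """Percentage by which new_ms is lower than baseline_ms.

    A positive result means latency went down; negative means it went up.
    """
    return (baseline_ms - new_ms) / baseline_ms * 100.0


# Average latency in ms from the Vehicle Detection FP16 result above.
vehicle_fp16_ms = {"SNC2": 6.18, "SNC4": 6.20, "Default - Disabled": 17.55}
baseline_ms = vehicle_fp16_ms["Default - Disabled"]

reductions = {
    name: latency_reduction_pct(baseline_ms, ms)
    for name, ms in vehicle_fp16_ms.items()
    if name != "Default - Disabled"
}

for name, pct in reductions.items():
    print(f"{name}: {pct:.1f}% lower average latency than Default - Disabled")
```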

ASKAP

Test: tConvolve OpenMP - Degridding

ASKAP 1.0 - Million Grid Points Per Second, More Is Better
Default - Disabled: 20481.20
SNC2: 14122.50
SNC4: 7607.31
Standard errors as exported (only two are listed, so they cannot be matched to configurations): SE +/- 100.74, N = 15; SE +/- 0.00, N = 3
1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp

OpenVINO

Model: Road Segmentation ADAS FP16 - Device: CPU

OpenVINO 2023.2.dev - ms, Fewer Is Better
SNC2: 14.51 (SE +/- 0.04, N = 3; MIN: 12.17 / MAX: 35.53)
SNC4: 14.56 (SE +/- 0.01, N = 3; MIN: 11.83 / MAX: 32.54)
Default - Disabled: 36.57 (SE +/- 0.36, N = 3; MIN: 15.82 / MAX: 76.01)
1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie

OpenVINO

Model: Person Detection FP16 - Device: CPU

OpenVINO 2023.2.dev - ms, Fewer Is Better
SNC4: 65.32 (SE +/- 0.13, N = 3; MIN: 36.8 / MAX: 87.75)
SNC2: 67.22 (SE +/- 0.29, N = 3; MIN: 37.86 / MAX: 90.7)
Default - Disabled: 141.78 (SE +/- 0.52, N = 3; MIN: 54.39 / MAX: 210.64)
1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie

OpenVINO

Model: Person Detection FP32 - Device: CPU

OpenVINO 2023.2.dev - ms, Fewer Is Better
SNC4: 65.40 (SE +/- 0.17, N = 3; MIN: 44.03 / MAX: 91.21)
SNC2: 67.17 (SE +/- 0.43, N = 3; MIN: 38.08 / MAX: 93.26)
Default - Disabled: 141.54 (SE +/- 0.37, N = 3; MIN: 50.69 / MAX: 212.29)
1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie

OpenVINO

Model: Face Detection FP16 - Device: CPU

OpenVINO 2023.2.dev - ms, Fewer Is Better
SNC4: 467.39 (SE +/- 1.50, N = 3; MIN: 391.77 / MAX: 504.29)
SNC2: 468.35 (SE +/- 0.78, N = 3; MIN: 396.69 / MAX: 533.02)
Default - Disabled: 965.67 (SE +/- 0.53, N = 3; MIN: 767.19 / MAX: 1026.9)
1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie

OpenVINO

Model: Road Segmentation ADAS FP16-INT8 - Device: CPU

OpenVINO 2023.2.dev - ms, Fewer Is Better
SNC2: 12.52 (SE +/- 0.05, N = 3; MIN: 10.66 / MAX: 27.97)
SNC4: 12.54 (SE +/- 0.05, N = 3; MIN: 10.7 / MAX: 28.81)
Default - Disabled: 25.26 (SE +/- 0.05, N = 3; MIN: 12.5 / MAX: 49.19)
1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie

OpenVINO

Model: Face Detection FP16-INT8 - Device: CPU

OpenVINO 2023.2.dev - ms, Fewer Is Better
SNC4: 244.80 (SE +/- 0.34, N = 3; MIN: 212 / MAX: 263.67)
SNC2: 245.02 (SE +/- 0.27, N = 3; MIN: 201.96 / MAX: 285.01)
Default - Disabled: 493.57 (SE +/- 0.76, N = 3; MIN: 246.03 / MAX: 522.95)
1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie

OpenVINO

Model: Vehicle Detection FP16-INT8 - Device: CPU

OpenVINO 2023.2.dev - ms, Fewer Is Better
SNC4: 4.13 (SE +/- 0.01, N = 3; MIN: 3.67 / MAX: 12.51)
SNC2: 4.14 (SE +/- 0.01, N = 3; MIN: 3.73 / MAX: 12.01)
Default - Disabled: 8.07 (SE +/- 0.02, N = 3; MIN: 4.25 / MAX: 26.67)
1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie

OpenVINO

Model: Machine Translation EN To DE FP16 - Device: CPU

OpenVINO 2023.2.dev - ms, Fewer Is Better
SNC2: 49.80 (SE +/- 0.22, N = 3; MIN: 38.67 / MAX: 117.73)
SNC4: 50.56 (SE +/- 0.15, N = 3; MIN: 38.52 / MAX: 103.27)
Default - Disabled: 93.23 (SE +/- 0.07, N = 3; MIN: 42.51 / MAX: 145.87)
1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie

OpenVINO

Model: Face Detection Retail FP16 - Device: CPU

OpenVINO 2023.2.dev - ms, Fewer Is Better
SNC2: 2.09 (SE +/- 0.01, N = 3; MIN: 1.85 / MAX: 7.8)
SNC4: 2.09 (SE +/- 0.01, N = 3; MIN: 1.87 / MAX: 8.99)
Default - Disabled: 3.81 (SE +/- 0.02, N = 3; MIN: 2.1 / MAX: 21.93)
1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie

OpenVINO

Model: Age Gender Recognition Retail 0013 FP16-INT8 - Device: CPU

OpenVINO 2023.2.dev - ms, Fewer Is Better
SNC2: 0.36 (SE +/- 0.00, N = 3; MIN: 0.27 / MAX: 31.55)
SNC4: 0.36 (SE +/- 0.00, N = 3; MIN: 0.27 / MAX: 41.09)
Default - Disabled: 0.62 (SE +/- 0.00, N = 3; MIN: 0.21 / MAX: 39.81)
1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie

OpenVINO

Model: Person Vehicle Bike Detection FP16 - Device: CPU

OpenVINO 2023.2.dev - ms, Fewer Is Better
SNC2: 5.06 (SE +/- 0.01, N = 3; MIN: 4.47 / MAX: 12.75)
SNC4: 5.07 (SE +/- 0.01, N = 3; MIN: 4.35 / MAX: 13.85)
Default - Disabled: 8.49 (SE +/- 0.02, N = 3; MIN: 5.66 / MAX: 25.99)
1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie

LULESH

LULESH 2.0.3 - z/s, More Is Better
SNC4: 38146.30 (SE +/- 142.33, N = 3)
SNC2: 33106.24 (SE +/- 116.87, N = 3)
Default - Disabled: 23294.84 (SE +/- 398.51, N = 12)
1. (CXX) g++ options: -O3 -fopenmp -lm -lmpi_cxx -lmpi

Timed Linux Kernel Compilation

Build: allmodconfig

Timed Linux Kernel Compilation 6.1 - Seconds, Fewer Is Better
SNC4: 168.15 (SE +/- 0.44, N = 3)
SNC2: 174.35 (SE +/- 0.85, N = 3)
Default - Disabled: 264.08 (SE +/- 1.24, N = 3)

OpenVINO

Model: Age Gender Recognition Retail 0013 FP16 - Device: CPU

OpenVINO 2023.2.dev - FPS, More Is Better
SNC2: 134566.00 (SE +/- 365.77, N = 3)
SNC4: 134176.02 (SE +/- 140.85, N = 3)
Default - Disabled: 86958.12 (SE +/- 195.22, N = 3)
1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie

OpenVINO

Model: Age Gender Recognition Retail 0013 FP16-INT8 - Device: CPU

OpenVINO 2023.2.dev - FPS, More Is Better
SNC2: 166444.72 (SE +/- 805.14, N = 3)
SNC4: 164363.19 (SE +/- 183.36, N = 3)
Default - Disabled: 113350.17 (SE +/- 251.60, N = 3)
1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie

Graph500

Scale: 26

Graph500 3.0 - bfs max_TEPS, More Is Better
SNC4: 1104900000
SNC2: 1069560000
Default - Disabled: 756165000
1. (CC) gcc options: -fcommon -O3 -lpthread -lm -lmpi

Graph500

Scale: 26

Graph500 3.0 - bfs median_TEPS, More Is Better
SNC4: 1056980000
SNC2: 1015300000
Default - Disabled: 727662000
1. (CC) gcc options: -fcommon -O3 -lpthread -lm -lmpi

OpenVINO

Model: Vehicle Detection FP16 - Device: CPU

OpenVINO 2023.2.dev - FPS, More Is Better
SNC2: 3875.39 (SE +/- 8.11, N = 3)
SNC4: 3863.88 (SE +/- 8.94, N = 3)
Default - Disabled: 2730.60 (SE +/- 9.07, N = 3)
1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie

OpenVINO

Model: Age Gender Recognition Retail 0013 FP16 - Device: CPU

OpenVINO 2023.2.dev - ms, Fewer Is Better
SNC2: 0.61 (SE +/- 0.00, N = 3; MIN: 0.44 / MAX: 16.14)
SNC4: 0.61 (SE +/- 0.00, N = 3; MIN: 0.46 / MAX: 16.98)
Default - Disabled: 0.86 (SE +/- 0.00, N = 3; MIN: 0.28 / MAX: 17.81)
1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie

Timed Linux Kernel Compilation

Build: defconfig

Timed Linux Kernel Compilation 6.1 - Seconds, Fewer Is Better
SNC4: 24.09 (SE +/- 0.29, N = 4)
SNC2: 24.58 (SE +/- 0.26, N = 5)
Default - Disabled: 31.45 (SE +/- 0.34, N = 4)

OpenVINO

Model: Road Segmentation ADAS FP16 - Device: CPU

OpenVINO 2023.2.dev - FPS, More Is Better
SNC2: 1652.91 (SE +/- 4.22, N = 3)
SNC4: 1647.09 (SE +/- 0.95, N = 3)
Default - Disabled: 1310.86 (SE +/- 12.81, N = 3)
1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie

CloverLeaf

Input: clover_bm16

CloverLeaf 1.3 - Seconds, Fewer Is Better
Default - Disabled: 329.61 (SE +/- 0.23, N = 3)
SNC2: 355.04 (SE +/- 1.55, N = 3)
SNC4: 396.53 (SE +/- 1.41, N = 3)
1. (F9X) gfortran options: -O3 -march=native -funroll-loops -fopenmp

OpenVINO

Model: Person Vehicle Bike Detection FP16 - Device: CPU

OpenVINO 2023.2.dev - FPS, More Is Better
Default - Disabled: 5643.89 (SE +/- 12.20, N = 3)
SNC2: 4737.07 (SE +/- 6.31, N = 3)
SNC4: 4721.81 (SE +/- 10.32, N = 3)
1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie

Timed LLVM Compilation

Build System: Ninja

Timed LLVM Compilation 16.0 - Seconds, Fewer Is Better
SNC2: 104.60 (SE +/- 0.30, N = 3)
SNC4: 104.69 (SE +/- 0.52, N = 3)
Default - Disabled: 121.87 (SE +/- 0.08, N = 3)

ASKAP

Test: tConvolve MT - Gridding

ASKAP 1.0 - Million Grid Points Per Second, More Is Better
Default - Disabled: 8655.84 (SE +/- 13.20, N = 3)
SNC2: 8223.25 (SE +/- 5.09, N = 3)
SNC4: 7467.14 (SE +/- 78.17, N = 3)
1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp

Graph500

Scale: 26

Graph500 3.0 - sssp max_TEPS, More Is Better
SNC4: 527864000
SNC2: 522187000
Default - Disabled: 462057000
1. (CC) gcc options: -fcommon -O3 -lpthread -lm -lmpi

OpenVKL

Benchmark: vklBenchmarkCPU ISPC

OpenVKL 2.0.0 - Items / Sec, More Is Better
Default - Disabled: 2153 (SE +/- 0.88, N = 3; MIN: 179 / MAX: 27831)
SNC2: 2089 (SE +/- 2.73, N = 3; MIN: 178 / MAX: 27767)
SNC4: 1905 (SE +/- 5.29, N = 3; MIN: 180 / MAX: 27886)

ASKAP

Test: tConvolve MT - Degridding

ASKAP 1.0 - Million Grid Points Per Second, More Is Better
Default - Disabled: 11871.2 (SE +/- 75.70, N = 3)
SNC2: 11209.2 (SE +/- 13.79, N = 3)
SNC4: 10522.5 (SE +/- 138.85, N = 3)
1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp

OpenVINO

Model: Handwritten English Recognition FP16 - Device: CPU

OpenVINO 2023.2.dev - FPS, More Is Better
Default - Disabled: 2592.59 (SE +/- 19.25, N = 3)
SNC2: 2435.28 (SE +/- 24.21, N = 3)
SNC4: 2333.46 (SE +/- 9.95, N = 3)
1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie

OpenVINO

Model: Handwritten English Recognition FP16 - Device: CPU

OpenVINO 2023.2.dev - ms, Fewer Is Better
Default - Disabled: 37.01 (SE +/- 0.27, N = 3; MIN: 20.68 / MAX: 66.67)
SNC2: 39.36 (SE +/- 0.40, N = 3; MIN: 32.78 / MAX: 55.31)
SNC4: 41.08 (SE +/- 0.17, N = 3; MIN: 31.72 / MAX: 55.31)
1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie

asmFish

1024 Hash Memory, 26 Depth

asmFish 2018-07-23 - Nodes/second, More Is Better
SNC4: 275458088 (SE +/- 1785028.61, N = 3)
SNC2: 272254945 (SE +/- 2257424.10, N = 3)
Default - Disabled: 248215163 (SE +/- 1898256.07, N = 3)

Timed Node.js Compilation

Time To Compile

Timed Node.js Compilation 19.8.1 - Seconds, Fewer Is Better
SNC4: 100.62 (SE +/- 0.39, N = 3)
SNC2: 102.32 (SE +/- 0.40, N = 3)
Default - Disabled: 111.37 (SE +/- 1.21, N = 5)

OpenVINO

Model: Face Detection Retail FP16 - Device: CPU

OpenVINO 2023.2.dev - FPS, More Is Better
Default - Disabled: 12547.91 (SE +/- 73.24, N = 3)
SNC4: 11421.25 (SE +/- 61.71, N = 3)
SNC2: 11414.89 (SE +/- 56.50, N = 3)
1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie

Timed Gem5 Compilation

Time To Compile

Timed Gem5 Compilation 23.0.1 - Seconds, Fewer Is Better
SNC4: 137.48 (SE +/- 1.46, N = 5)
SNC2: 140.81 (SE +/- 1.48, N = 3)
Default - Disabled: 150.97 (SE +/- 1.82, N = 3)

OpenFOAM

Input: drivaerFastback, Medium Mesh Size - Execution Time

OpenFOAM 10 - Seconds, Fewer Is Better
SNC4: 302.69
SNC2: 310.66
Default - Disabled: 331.86
1. (CXX) g++ options: -std=c++14 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -lfiniteVolume -lmeshTools -lparallel -llagrangian -lregionModels -lgenericPatchFields -lOpenFOAM -ldl -lm

Graph500

Scale: 26

Graph500 3.0 - sssp median_TEPS, More Is Better
SNC4: 390682000
SNC2: 389235000
Default - Disabled: 357127000
1. (CC) gcc options: -fcommon -O3 -lpthread -lm -lmpi

NAS Parallel Benchmarks

Test / Class: CG.C

NAS Parallel Benchmarks 3.4 - Total Mop/s, More Is Better
SNC4: 56924.56 (SE +/- 191.96, N = 3)
SNC2: 54560.03 (SE +/- 324.22, N = 3)
Default - Disabled: 52079.13 (SE +/- 476.77, N = 3)
1. (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
2. Open MPI 4.1.5

uvg266

Video Input: Bosphorus 4K - Video Preset: Ultra Fast

uvg266 0.4.1 - Frames Per Second, More Is Better
Default - Disabled: 69.02 (SE +/- 0.15, N = 3)
SNC2: 65.34 (SE +/- 0.36, N = 3)
SNC4: 63.23 (SE +/- 0.25, N = 3)

John The Ripper

Test: MD5

John The Ripper 2023.03.14 - Real C/S, More Is Better
Default - Disabled: 14600667 (SE +/- 43978.53, N = 3)
SNC2: 13888800 (SE +/- 193319.26, N = 15)
SNC4: 13419000 (SE +/- 174193.35, N = 15)
1. (CC) gcc options: -m64 -lssl -lcrypto -fopenmp -lm -lrt -lz -ldl -lcrypt -lbz2

VVenC

Video Input: Bosphorus 4K - Video Preset: Faster

VVenC 1.9 - Frames Per Second, More Is Better
Default - Disabled: 15.25 (SE +/- 0.02, N = 3)
SNC2: 14.31 (SE +/- 0.12, N = 3)
SNC4: 14.01 (SE +/- 0.09, N = 3)
1. (CXX) g++ options: -O3 -flto=auto -fno-fat-lto-objects

OpenVINO

Model: Person Detection FP16 - Device: CPU

OpenVINO 2023.2.dev - FPS, More Is Better
SNC4: 367.12 (SE +/- 0.76, N = 3)
SNC2: 356.72 (SE +/- 1.55, N = 3)
Default - Disabled: 338.09 (SE +/- 1.27, N = 3)
1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie

OpenVINO

Model: Machine Translation EN To DE FP16 - Device: CPU

OpenVINO 2023.2.dev - FPS, More Is Better
Default - Disabled: 514.22 (SE +/- 0.33, N = 3)
SNC2: 481.41 (SE +/- 2.07, N = 3)
SNC4: 474.11 (SE +/- 1.37, N = 3)
1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie

uvg266

Video Input: Bosphorus 4K - Video Preset: Super Fast

uvg266 0.4.1 - Frames Per Second, More Is Better
Default - Disabled: 67.47 (SE +/- 0.24, N = 3)
SNC2: 64.65 (SE +/- 0.12, N = 3)
SNC4: 62.25 (SE +/- 0.42, N = 3)

Xcompact3d Incompact3d

Input: input.i3d 129 Cells Per Direction

Xcompact3d Incompact3d 2021-03-11 - Seconds, Fewer Is Better
SNC4: 2.41614302 (SE +/- 0.00587555, N = 3)
SNC2: 2.49659332 (SE +/- 0.01815766, N = 3)
Default - Disabled: 2.61738705 (SE +/- 0.03355279, N = 3)
1. (F9X) gfortran options: -cpp -O2 -funroll-loops -floop-optimize -fcray-pointer -fbacktrace -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

uvg266

Video Input: Bosphorus 4K - Video Preset: Very Fast

uvg266 0.4.1 - Frames Per Second, More Is Better
Default - Disabled: 66.32 (SE +/- 0.38, N = 3)
SNC2: 62.99 (SE +/- 0.60, N = 3)
SNC4: 61.26 (SE +/- 0.22, N = 3)

OpenVINO

Model: Person Detection FP32 - Device: CPU

OpenVINO 2023.2.dev - FPS, More Is Better
SNC4: 366.59 (SE +/- 0.96, N = 3)
SNC2: 357.03 (SE +/- 2.29, N = 3)
Default - Disabled: 338.74 (SE +/- 0.89, N = 3)
1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie

CloverLeaf

Input: clover_bm

CloverLeaf 1.3 - Seconds, Fewer Is Better
SNC4: 10.95 (SE +/- 0.07, N = 3)
Default - Disabled: 10.96 (SE +/- 0.07, N = 14)
SNC2: 11.82 (SE +/- 0.09, N = 15)
1. (F9X) gfortran options: -O3 -march=native -funroll-loops -fopenmp
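
When two results are as close as SNC4 and Default - Disabled are here, the reported standard errors give a rough sense of whether the gap exceeds run-to-run noise. The sketch below expresses the difference between two means in units of their combined standard error. This is only an informal heuristic with so few runs, not a rigorous significance test, and `difference_in_se_units` is a hypothetical helper name.

```python
import math


def difference_in_se_units(mean_a, se_a, mean_b, se_b):
    """Absolute difference of two means divided by their combined SE.

    The combined SE assumes the two sets of runs are independent.
    Values well below about 2 suggest the gap is within noise.
    """
    combined_se = math.sqrt(se_a ** 2 + se_b ** 2)
    return abs(mean_a - mean_b) / combined_se


# Seconds and standard errors from the clover_bm result above.
snc4_vs_default = difference_in_se_units(10.95, 0.07, 10.96, 0.07)
snc2_vs_default = difference_in_se_units(11.82, 0.09, 10.96, 0.07)

print(f"SNC4 vs Default - Disabled: {snc4_vs_default:.2f} combined SEs")
print(f"SNC2 vs Default - Disabled: {snc2_vs_default:.2f} combined SEs")
```

By this measure the SNC4 and Default - Disabled times are indistinguishable, while the SNC2 slowdown is several combined standard errors wide.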

Embree

Binary: Pathtracer ISPC - Model: Asian Dragon Obj

Embree 4.3 - Frames Per Second, More Is Better
Default - Disabled: 112.83 (SE +/- 0.59, N = 3; MIN: 110.06 / MAX: 116.96)
SNC2: 111.36 (SE +/- 0.58, N = 3; MIN: 108.45 / MAX: 114.81)
SNC4: 104.80 (SE +/- 0.54, N = 3; MIN: 100.15 / MAX: 112.2)

TensorFlow

Device: CPU - Batch Size: 64 - Model: ResNet-50

TensorFlow 2.12 (images/sec, More Is Better)
Default - Disabled: 90.46 (SE +/- 0.05, N = 3)
SNC2: 85.54 (SE +/- 0.22, N = 3)
SNC4: 84.50 (SE +/- 0.09, N = 3)

John The Ripper

Test: Blowfish

John The Ripper 2023.03.14 (Real C/S, More Is Better)
Default - Disabled: 173145 (SE +/- 66.40, N = 3)
SNC2: 169287 (SE +/- 1828.86, N = 12)
SNC4: 161790 (SE +/- 1944.34, N = 15)
1. (CC) gcc options: -m64 -lssl -lcrypto -fopenmp -lm -lrt -lz -ldl -lcrypt -lbz2

CloverLeaf

Input: clover_bm64_short

CloverLeaf 1.3 (Seconds, Fewer Is Better)
Default - Disabled: 39.32 (SE +/- 0.01, N = 3)
SNC2: 41.21 (SE +/- 0.26, N = 3)
SNC4: 42.00 (SE +/- 0.08, N = 3)
1. (F9X) gfortran options: -O3 -march=native -funroll-loops -fopenmp

John The Ripper

Test: bcrypt

John The Ripper 2023.03.14 (Real C/S, More Is Better)
Default - Disabled: 174335 (SE +/- 1219.45, N = 3)
SNC2: 171734 (SE +/- 2018.22, N = 4)
SNC4: 163259 (SE +/- 1540.98, N = 15)
1. (CC) gcc options: -m64 -lssl -lcrypto -fopenmp -lm -lrt -lz -ldl -lcrypt -lbz2

Algebraic Multi-Grid Benchmark

Algebraic Multi-Grid Benchmark 1.2 (Figure Of Merit, More Is Better)
SNC4: 1773758333 (SE +/- 14033805.07, N = 3)
SNC2: 1732149333 (SE +/- 5512706.85, N = 3)
Default - Disabled: 1662894000 (SE +/- 1222576.38, N = 3)
1. (CC) gcc options: -lparcsr_ls -lparcsr_mv -lseq_mv -lIJ_mv -lkrylov -lHYPRE_utilities -lm -fopenmp -lmpi

VVenC

Video Input: Bosphorus 4K - Video Preset: Fast

VVenC 1.9 (Frames Per Second, More Is Better)
Default - Disabled: 8.715 (SE +/- 0.094, N = 3)
SNC2: 8.278 (SE +/- 0.051, N = 3)
SNC4: 8.186 (SE +/- 0.118, N = 3)
1. (CXX) g++ options: -O3 -flto=auto -fno-fat-lto-objects

TensorFlow

Device: CPU - Batch Size: 32 - Model: ResNet-50

TensorFlow 2.12 (images/sec, More Is Better)
Default - Disabled: 70.35 (SE +/- 0.29, N = 3)
SNC4: 66.88 (SE +/- 0.74, N = 3)
SNC2: 66.13 (SE +/- 0.19, N = 3)

SPECFEM3D

Model: Tomographic Model

SPECFEM3D 4.0 (Seconds, Fewer Is Better)
SNC4: 7.519415000 (SE +/- 0.068276455, N = 3)
SNC2: 7.544143736 (SE +/- 0.094423144, N = 3)
Default - Disabled: 7.987430452 (SE +/- 0.088577121, N = 3)
1. (F9X) gfortran options: -O2 -fopenmp -std=f2003 -fimplicit-none -fmax-errors=10 -pedantic -pedantic-errors -O3 -finline-functions -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

Quantum ESPRESSO

Input: AUSURF112

Quantum ESPRESSO 7.0 (Seconds, Fewer Is Better)
SNC4: 307.06 (SE +/- 0.32, N = 3)
SNC2: 316.50 (SE +/- 0.87, N = 3)
Default - Disabled: 326.06 (SE +/- 0.35, N = 3)
1. (F9X) gfortran options: -pthread -fopenmp -ldevXlib -lopenblas -lFoX_dom -lFoX_sax -lFoX_wxml -lFoX_common -lFoX_utils -lFoX_fsys -lfftw3_omp -lfftw3 -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

NAS Parallel Benchmarks

Test / Class: FT.C

NAS Parallel Benchmarks 3.4 (Total Mop/s, More Is Better)
Default - Disabled: 100728.49 (SE +/- 245.52, N = 3)
SNC2: 97435.24 (SE +/- 925.13, N = 3)
SNC4: 94921.27 (SE +/- 107.70, N = 3)
1. (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
2. Open MPI 4.1.5

TensorFlow

Device: CPU - Batch Size: 512 - Model: ResNet-50

TensorFlow 2.12 (images/sec, More Is Better)
Default - Disabled: 135.02 (SE +/- 0.04, N = 3)
SNC2: 130.65 (SE +/- 0.21, N = 3)
SNC4: 127.24 (SE +/- 0.33, N = 3)

TensorFlow

Device: CPU - Batch Size: 256 - Model: ResNet-50

TensorFlow 2.12 (images/sec, More Is Better)
Default - Disabled: 118.86 (SE +/- 0.12, N = 3)
SNC2: 115.90 (SE +/- 0.23, N = 3)
SNC4: 112.08 (SE +/- 0.12, N = 3)

Rodinia

Test: OpenMP Leukocyte

Rodinia 3.1 (Seconds, Fewer Is Better)
Default - Disabled: 29.39 (SE +/- 0.23, N = 3)
SNC2: 30.89 (SE +/- 0.14, N = 3)
SNC4: 31.12 (SE +/- 0.05, N = 3)
1. (CXX) g++ options: -O2 -lOpenCL

PyTorch

Device: CPU - Batch Size: 1 - Model: Efficientnet_v2_l

PyTorch 2.1 (batches/sec, More Is Better)
Default - Disabled: 11.16 (SE +/- 0.08, N = 3; MIN: 10.83 / MAX: 11.47)
SNC2: 10.65 (SE +/- 0.10, N = 3; MIN: 6.15 / MAX: 11.04)
SNC4: 10.58 (SE +/- 0.04, N = 3; MIN: 5.94 / MAX: 11.15)

LuxCoreRender

Scene: Orange Juice - Acceleration: CPU

LuxCoreRender 2.6 (M samples/sec, More Is Better)
SNC2: 22.77 (SE +/- 0.29, N = 15; MIN: 18.36 / MAX: 29.05)
Default - Disabled: 22.24 (SE +/- 0.26, N = 15; MIN: 18.56 / MAX: 28.74)
SNC4: 21.60 (SE +/- 0.06, N = 3; MIN: 18.58 / MAX: 28.17)

Embree

Binary: Pathtracer ISPC - Model: Asian Dragon

Embree 4.3 (Frames Per Second, More Is Better)
Default - Disabled: 129.17 (SE +/- 0.63, N = 3; MIN: 126.31 / MAX: 136)
SNC2: 128.01 (SE +/- 0.56, N = 3; MIN: 125.3 / MAX: 131.84)
SNC4: 122.59 (SE +/- 1.43, N = 4; MIN: 115.06 / MAX: 129.69)

PyTorch

Device: CPU - Batch Size: 32 - Model: ResNet-152

PyTorch 2.1 (batches/sec, More Is Better)
Default - Disabled: 16.16 (SE +/- 0.10, N = 3; MIN: 15.77 / MAX: 16.51)
SNC4: 15.61 (SE +/- 0.13, N = 3; MIN: 8.77 / MAX: 16.18)
SNC2: 15.36 (SE +/- 0.19, N = 3; MIN: 8.91 / MAX: 15.79)

PyTorch

Device: CPU - Batch Size: 64 - Model: ResNet-152

PyTorch 2.1 (batches/sec, More Is Better)
SNC2: 16.03 (SE +/- 0.10, N = 3; MIN: 9.32 / MAX: 16.38)
Default - Disabled: 15.91 (SE +/- 0.11, N = 3; MIN: 15.34 / MAX: 16.26)
SNC4: 15.24 (SE +/- 0.06, N = 3; MIN: 8.23 / MAX: 15.81)

TensorFlow

Device: CPU - Batch Size: 16 - Model: ResNet-50

TensorFlow 2.12 (images/sec, More Is Better)
Default - Disabled: 51.77 (SE +/- 0.23, N = 3)
SNC2: 49.51 (SE +/- 0.59, N = 3)
SNC4: 49.27 (SE +/- 0.63, N = 3)

PyTorch

Device: CPU - Batch Size: 16 - Model: ResNet-152

PyTorch 2.1 (batches/sec, More Is Better)
Default - Disabled: 16.06 (SE +/- 0.16, N = 5; MIN: 15.37 / MAX: 16.74)
SNC4: 15.39 (SE +/- 0.17, N = 3; MIN: 8.86 / MAX: 16.02)
SNC2: 15.30 (SE +/- 0.14, N = 3; MIN: 8.79 / MAX: 15.89)

Rodinia

Test: OpenMP Streamcluster

Rodinia 3.1 (Seconds, Fewer Is Better)
Default - Disabled: 4.703 (SE +/- 0.007, N = 3)
SNC2: 4.884 (SE +/- 0.019, N = 3)
SNC4: 4.933 (SE +/- 0.027, N = 3)
1. (CXX) g++ options: -O2 -lOpenCL

NAS Parallel Benchmarks

Test / Class: SP.B

NAS Parallel Benchmarks 3.4 (Total Mop/s, More Is Better)
SNC4: 151839.30 (SE +/- 1739.78, N = 3)
SNC2: 151244.49 (SE +/- 674.53, N = 3)
Default - Disabled: 145525.94 (SE +/- 676.26, N = 3)
1. (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
2. Open MPI 4.1.5

Radiance Benchmark

Test: SMP Parallel

Radiance Benchmark 5.0 (Seconds, Fewer Is Better)
SNC2: 114.28
SNC4: 117.44
Default - Disabled: 119.19

LuxCoreRender

Scene: LuxCore Benchmark - Acceleration: CPU

LuxCoreRender 2.6 (M samples/sec, More Is Better)
Default - Disabled: 12.39 (SE +/- 0.11, N = 3; MIN: 5.87 / MAX: 14.11)
SNC2: 11.94 (SE +/- 0.09, N = 3; MIN: 5.67 / MAX: 13.6)
SNC4: 11.89 (SE +/- 0.15, N = 3; MIN: 5.42 / MAX: 13.74)

Rodinia

Test: OpenMP HotSpot3D

Rodinia 3.1 (Seconds, Fewer Is Better)
Default - Disabled: 58.87 (SE +/- 0.82, N = 3)
SNC2: 59.86 (SE +/- 0.62, N = 15)
SNC4: 61.23 (SE +/- 0.53, N = 15)
1. (CXX) g++ options: -O2 -lOpenCL

SPECFEM3D

Model: Layered Halfspace

SPECFEM3D 4.0 (Seconds, Fewer Is Better)
SNC4: 18.04 (SE +/- 0.14, N = 3)
SNC2: 18.36 (SE +/- 0.24, N = 3)
Default - Disabled: 18.74 (SE +/- 0.16, N = 3)
1. (F9X) gfortran options: -O2 -fopenmp -std=f2003 -fimplicit-none -fmax-errors=10 -pedantic -pedantic-errors -O3 -finline-functions -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

OpenFOAM

Input: drivaerFastback, Medium Mesh Size - Mesh Time

OpenFOAM 10 (Seconds, Fewer Is Better)
SNC4: 133.74
SNC2: 135.10
Default - Disabled: 138.65
1. (CXX) g++ options: -std=c++14 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -lfiniteVolume -lmeshTools -lparallel -llagrangian -lregionModels -lgenericPatchFields -lOpenFOAM -ldl -lm

PyTorch

Device: CPU - Batch Size: 16 - Model: ResNet-50

PyTorch 2.1 (batches/sec, More Is Better)
SNC2: 40.09 (SE +/- 0.17, N = 3; MIN: 22.35 / MAX: 41.64)
SNC4: 39.80 (SE +/- 0.17, N = 3; MIN: 26.68 / MAX: 41.28)
Default - Disabled: 38.68 (SE +/- 0.12, N = 3; MIN: 37.08 / MAX: 39.76)

VVenC

Video Input: Bosphorus 1080p - Video Preset: Fast

VVenC 1.9 (Frames Per Second, More Is Better)
Default - Disabled: 24.30 (SE +/- 0.17, N = 3)
SNC2: 23.71 (SE +/- 0.29, N = 3)
SNC4: 23.45 (SE +/- 0.24, N = 3)
1. (CXX) g++ options: -O3 -flto=auto -fno-fat-lto-objects

OpenVINO

Model: Face Detection FP16 - Device: CPU

OpenVINO 2023.2.dev (FPS, More Is Better)
SNC4: 51.19 (SE +/- 0.15, N = 3)
SNC2: 51.06 (SE +/- 0.09, N = 3)
Default - Disabled: 49.41 (SE +/- 0.02, N = 3)
1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie

NAS Parallel Benchmarks

Test / Class: MG.C

NAS Parallel Benchmarks 3.4 (Total Mop/s, More Is Better)
SNC4: 98739.50 (SE +/- 517.95, N = 3)
SNC2: 97068.01 (SE +/- 59.39, N = 3)
Default - Disabled: 95501.50 (SE +/- 1045.55, N = 4)
1. (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
2. Open MPI 4.1.5

Memcached

Set To Get Ratio: 1:100

Memcached 1.6.19 (Ops/sec, More Is Better)
Default - Disabled: 7743874.04 (SE +/- 23469.53, N = 3)
SNC2: 7704668.27 (SE +/- 52363.50, N = 3)
SNC4: 7507583.68 (SE +/- 35640.28, N = 3)
1. (CXX) g++ options: -O2 -levent_openssl -levent -lcrypto -lssl -lpthread -lz -lpcre

PyTorch

Device: CPU - Batch Size: 64 - Model: ResNet-50

PyTorch 2.1 (batches/sec, More Is Better)
SNC2: 40.27 (SE +/- 0.20, N = 3; MIN: 23.49 / MAX: 42.09)
SNC4: 39.07 (SE +/- 0.12, N = 3; MIN: 19.69 / MAX: 41.1)
Default - Disabled: 39.06 (SE +/- 0.05, N = 3; MIN: 37.1 / MAX: 40.22)

PyTorch

Device: CPU - Batch Size: 32 - Model: ResNet-50

PyTorch 2.1 (batches/sec, More Is Better)
SNC2: 40.41 (SE +/- 0.40, N = 3; MIN: 34.93 / MAX: 42.02)
SNC4: 39.29 (SE +/- 0.45, N = 3; MIN: 25.78 / MAX: 41.34)
Default - Disabled: 39.21 (SE +/- 0.10, N = 3; MIN: 36.81 / MAX: 40.31)

NAS Parallel Benchmarks

Test / Class: IS.D

NAS Parallel Benchmarks 3.4 (Total Mop/s, More Is Better)
SNC2: 4282.36 (SE +/- 8.86, N = 3)
Default - Disabled: 4279.98 (SE +/- 21.28, N = 3)
SNC4: 4155.68 (SE +/- 45.26, N = 4)
1. (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
2. Open MPI 4.1.5

John The Ripper

Test: WPA PSK

John The Ripper 2023.03.14 (Real C/S, More Is Better)
Default - Disabled: 614263 (SE +/- 2146.03, N = 3)
SNC2: 611669 (SE +/- 4996.06, N = 3)
SNC4: 596180 (SE +/- 4063.87, N = 3)
1. (CC) gcc options: -m64 -lssl -lcrypto -fopenmp -lm -lrt -lz -ldl -lcrypt -lbz2

LuxCoreRender

Scene: Danish Mood - Acceleration: CPU

LuxCoreRender 2.6 (M samples/sec, More Is Better)
SNC4: 10.97 (SE +/- 0.11, N = 3; MIN: 5.05 / MAX: 12.72)
SNC2: 10.85 (SE +/- 0.12, N = 3; MIN: 5.02 / MAX: 12.4)
Default - Disabled: 10.65 (SE +/- 0.08, N = 3; MIN: 4.82 / MAX: 12.11)

PyTorch

Device: CPU - Batch Size: 1 - Model: ResNet-152

PyTorch 2.1 (batches/sec, More Is Better)
Default - Disabled: 18.99 (SE +/- 0.25, N = 3; MIN: 17.88 / MAX: 19.86)
SNC4: 18.62 (SE +/- 0.20, N = 3; MIN: 10.5 / MAX: 19.64)
SNC2: 18.44 (SE +/- 0.21, N = 3; MIN: 9.73 / MAX: 19.35)

John The Ripper

Test: HMAC-SHA512

John The Ripper 2023.03.14 (Real C/S, More Is Better)
Default - Disabled: 296518667 (SE +/- 1312178.38, N = 3)
SNC2: 289221400 (SE +/- 1930805.92, N = 15)
SNC4: 287955500 (SE +/- 2535178.67, N = 12)
1. (CC) gcc options: -m64 -lssl -lcrypto -fopenmp -lm -lrt -lz -ldl -lcrypt -lbz2

VVenC

Video Input: Bosphorus 1080p - Video Preset: Faster

VVenC 1.9 (Frames Per Second, More Is Better)
Default - Disabled: 41.40 (SE +/- 0.17, N = 3)
SNC2: 40.22 (SE +/- 0.32, N = 3)
SNC4: 40.20 (SE +/- 0.45, N = 3)
1. (CXX) g++ options: -O3 -flto=auto -fno-fat-lto-objects

Xcompact3d Incompact3d

Input: input.i3d 193 Cells Per Direction

Xcompact3d Incompact3d 2021-03-11 (Seconds, Fewer Is Better)
SNC4: 10.36 (SE +/- 0.08, N = 3)
SNC2: 10.49 (SE +/- 0.05, N = 3)
Default - Disabled: 10.66 (SE +/- 0.03, N = 3)
1. (F9X) gfortran options: -cpp -O2 -funroll-loops -floop-optimize -fcray-pointer -fbacktrace -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

Timed LLVM Compilation

Build System: Unix Makefiles

Timed LLVM Compilation 16.0 (Seconds, Fewer Is Better)
SNC4: 170.54 (SE +/- 0.32, N = 3)
SNC2: 170.83 (SE +/- 0.79, N = 3)
Default - Disabled: 175.47 (SE +/- 0.35, N = 3)

SPECFEM3D

Model: Homogeneous Halfspace

SPECFEM3D 4.0 (Seconds, Fewer Is Better)
SNC4: 9.668759733 (SE +/- 0.063880399, N = 3)
SNC2: 9.742806803 (SE +/- 0.112412006, N = 4)
Default - Disabled: 9.942989762 (SE +/- 0.062415502, N = 15)
1. (F9X) gfortran options: -O2 -fopenmp -std=f2003 -fimplicit-none -fmax-errors=10 -pedantic -pedantic-errors -O3 -finline-functions -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

SPECFEM3D

Model: Mount St. Helens

SPECFEM3D 4.0 (Seconds, Fewer Is Better)
SNC4: 7.545176178 (SE +/- 0.073100963, N = 3)
Default - Disabled: 7.724890032 (SE +/- 0.077041220, N = 3)
SNC2: 7.756233737 (SE +/- 0.022697239, N = 3)
1. (F9X) gfortran options: -O2 -fopenmp -std=f2003 -fimplicit-none -fmax-errors=10 -pedantic -pedantic-errors -O3 -finline-functions -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

OpenVINO

Model: Vehicle Detection FP16-INT8 - Device: CPU

OpenVINO 2023.2.dev (FPS, More Is Better)
Default - Disabled: 5936.37 (SE +/- 10.15, N = 3)
SNC4: 5797.28 (SE +/- 15.69, N = 3)
SNC2: 5783.01 (SE +/- 14.13, N = 3)
1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie

OpenVINO

Model: Face Detection Retail FP16-INT8 - Device: CPU

OpenVINO 2023.2.dev (ms, Fewer Is Better)
Default - Disabled: 5.38 (SE +/- 0.00, N = 3; MIN: 3.31 / MAX: 23.31)
SNC4: 5.51 (SE +/- 0.00, N = 3; MIN: 4.84 / MAX: 15.48)
SNC2: 5.52 (SE +/- 0.01, N = 3; MIN: 4.91 / MAX: 14.41)
1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie

GPAW

Input: Carbon Nanotube

GPAW 23.6 (Seconds, Fewer Is Better)
SNC4: 37.16 (SE +/- 0.25, N = 3)
SNC2: 37.80 (SE +/- 0.23, N = 3)
Default - Disabled: 38.13 (SE +/- 0.34, N = 3)
1. (CC) gcc options: -shared -fwrapv -O2 -lxc -lblas -lmpi

OpenVINO

Model: Face Detection Retail FP16-INT8 - Device: CPU

OpenVINO 2023.2.dev (FPS, More Is Better)
Default - Disabled: 17809.43 (SE +/- 14.08, N = 3)
SNC4: 17402.76 (SE +/- 11.11, N = 3)
SNC2: 17366.29 (SE +/- 14.02, N = 3)
1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie

uvg266

Video Input: Bosphorus 1080p - Video Preset: Super Fast

uvg266 0.4.1 (Frames Per Second, More Is Better)
Default - Disabled: 211.36 (SE +/- 0.37, N = 3)
SNC4: 206.92 (SE +/- 2.03, N = 3)
SNC2: 206.23 (SE +/- 1.37, N = 3)

uvg266

Video Input: Bosphorus 1080p - Video Preset: Very Fast

uvg266 0.4.1 (Frames Per Second, More Is Better)
Default - Disabled: 208.36 (SE +/- 1.32, N = 3)
SNC4: 207.62 (SE +/- 1.28, N = 3)
SNC2: 203.48 (SE +/- 0.63, N = 3)

SPECFEM3D

Model: Water-layered Halfspace

SPECFEM3D 4.0 (Seconds, Fewer Is Better)
SNC4: 18.82 (SE +/- 0.06, N = 3)
SNC2: 19.22 (SE +/- 0.19, N = 3)
Default - Disabled: 19.27 (SE +/- 0.08, N = 3)
1. (F9X) gfortran options: -O2 -fopenmp -std=f2003 -fimplicit-none -fmax-errors=10 -pedantic -pedantic-errors -O3 -finline-functions -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

PyTorch

Device: CPU - Batch Size: 1 - Model: ResNet-50

PyTorch 2.1 (batches/sec, More Is Better)
SNC2: 48.83 (SE +/- 0.25, N = 3; MIN: 28.77 / MAX: 50.91)
SNC4: 48.36 (SE +/- 0.25, N = 3; MIN: 25.09 / MAX: 50.72)
Default - Disabled: 47.70 (SE +/- 0.07, N = 3; MIN: 44.82 / MAX: 49.01)

7-Zip Compression

Test: Decompression Rating

7-Zip Compression 22.01 (MIPS, More Is Better)
Default - Disabled: 655203 (SE +/- 364.51, N = 3)
SNC4: 649120 (SE +/- 5454.96, N = 3)
SNC2: 640928 (SE +/- 11076.26, N = 3)
1. (CXX) g++ options: -lpthread -ldl -O2 -fPIC

PETSc

Test: Streams

PETSc 3.19 (MB/s, More Is Better)
SNC4: 187197.63 (SE +/- 451.28, N = 3)
SNC2: 185070.83 (SE +/- 708.72, N = 3)
Default - Disabled: 183161.11 (SE +/- 64.39, N = 3)
1. (CC) gcc options: -fPIC -O3 -O2 -lpthread -lpciaccess -lm

uvg266

Video Input: Bosphorus 1080p - Video Preset: Ultra Fast

uvg266 0.4.1 (Frames Per Second, More Is Better)
SNC4: 208.97 (SE +/- 2.40, N = 3)
Default - Disabled: 207.97 (SE +/- 0.58, N = 3)
SNC2: 204.49 (SE +/- 0.83, N = 3)

QMCPACK

Input: Li2_STO_ae

QMCPACK 3.17.1 (Total Execution Time - Seconds, Fewer Is Better)
SNC4: 103.77 (SE +/- 0.06, N = 3)
Default - Disabled: 103.90 (SE +/- 0.55, N = 3)
SNC2: 106.01 (SE +/- 1.13, N = 3)
1. (CXX) g++ options: -fopenmp -foffload=disable -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -ffast-math -march=native -O3 -lm -ldl

uvg266

Video Input: Bosphorus 4K - Video Preset: Slow

uvg266 0.4.1 (Frames Per Second, More Is Better)
Default - Disabled: 29.92 (SE +/- 0.09, N = 3)
SNC2: 29.47 (SE +/- 0.09, N = 3)
SNC4: 29.30 (SE +/- 0.13, N = 3)

7-Zip Compression

Test: Compression Rating

7-Zip Compression 22.01 (MIPS, More Is Better)
Default - Disabled: 546936 (SE +/- 410.41, N = 3)
SNC4: 536820 (SE +/- 2055.13, N = 3)
SNC2: 535713 (SE +/- 2142.48, N = 3)
1. (CXX) g++ options: -lpthread -ldl -O2 -fPIC

QuantLib

Configuration: Multi-Threaded

QuantLib 1.32 (MFLOPS, More Is Better)
SNC4: 317145.3 (SE +/- 3981.12, N = 3)
SNC2: 313010.5 (SE +/- 1378.05, N = 3)
Default - Disabled: 310771.1 (SE +/- 716.12, N = 3)
1. (CXX) g++ options: -O3 -march=native -fPIE -pie

NAS Parallel Benchmarks

Test / Class: SP.C

NAS Parallel Benchmarks 3.4 (Total Mop/s, More Is Better)
Default - Disabled: 89217.91 (SE +/- 61.36, N = 3)
SNC2: 88668.13 (SE +/- 271.76, N = 3)
SNC4: 87467.23 (SE +/- 287.86, N = 3)
1. (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
2. Open MPI 4.1.5

libxsmm

M N K: 128

libxsmm 2-1.17-3645 (GFLOPS/s, More Is Better)
Default - Disabled: 2043.5 (SE +/- 2.22, N = 3)
SNC2: 2026.2 (SE +/- 0.82, N = 3)
SNC4: 2004.7 (SE +/- 11.98, N = 3)
1. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -msse4.2

NAS Parallel Benchmarks

Test / Class: LU.C

NAS Parallel Benchmarks 3.4 (Total Mop/s, More Is Better)
SNC4: 259883.62 (SE +/- 1304.12, N = 3)
SNC2: 256378.22 (SE +/- 427.36, N = 3)
Default - Disabled: 255135.24 (SE +/- 1442.62, N = 3)
1. (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
2. Open MPI 4.1.5

PostgreSQL

Scaling Factor: 1000 - Clients: 1000 - Mode: Read Only

PostgreSQL 16 (TPS, More Is Better)
Default - Disabled: 1986807 (SE +/- 7755.83, N = 3)
SNC4: 1956509 (SE +/- 14067.03, N = 3)
SNC2: 1950507 (SE +/- 19687.71, N = 6)
1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lm

PostgreSQL

Scaling Factor: 1000 - Clients: 1000 - Mode: Read Only - Average Latency

PostgreSQL 16 (ms, Fewer Is Better)
Default - Disabled: 0.504 (SE +/- 0.002, N = 3)
SNC4: 0.511 (SE +/- 0.004, N = 3)
SNC2: 0.513 (SE +/- 0.005, N = 6)
1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lm

Embree

Binary: Pathtracer ISPC - Model: Crown

Embree 4.3 (Frames Per Second, More Is Better)
Default - Disabled: 108.08 (SE +/- 0.46, N = 3; MIN: 105.13 / MAX: 117.76)
SNC4: 106.43 (SE +/- 0.40, N = 3; MIN: 103.16 / MAX: 113.77)
SNC2: 106.26 (SE +/- 0.53, N = 3; MIN: 102.99 / MAX: 115.22)

Numpy Benchmark

Numpy Benchmark (Score, More Is Better)
SNC2: 758.02 (SE +/- 1.99, N = 3)
SNC4: 753.64 (SE +/- 7.89, N = 3)
Default - Disabled: 746.05 (SE +/- 5.73, N = 3)

Memcached

Set To Get Ratio: 1:10

Memcached 1.6.19 (Ops/sec, More Is Better)
SNC2: 5818020.66 (SE +/- 17431.43, N = 3)
Default - Disabled: 5811757.93 (SE +/- 23305.65, N = 3)
SNC4: 5728479.09 (SE +/- 8577.07, N = 3)
1. (CXX) g++ options: -O2 -levent_openssl -levent -lcrypto -lssl -lpthread -lz -lpcre

Timed CPython Compilation

Build Configuration: Released Build, PGO + LTO Optimized

Timed CPython Compilation 3.10.6 (Seconds, Fewer Is Better)
SNC4: 185.51
SNC2: 188.07
Default - Disabled: 188.35

LuxCoreRender

Scene: DLSC - Acceleration: CPU

LuxCoreRender 2.6 (M samples/sec, More Is Better)
Default - Disabled: 15.23 (SE +/- 0.02, N = 3; MIN: 14.85 / MAX: 18.76)
SNC2: 15.19 (SE +/- 0.07, N = 3; MIN: 14.66 / MAX: 18.88)
SNC4: 15.00 (SE +/- 0.02, N = 3; MIN: 14.61 / MAX: 18.76)

Blender

Blend File: Fishy Cat - Compute: CPU-Only

Blender 4.0 (Seconds, Fewer Is Better)
Default - Disabled: 19.47 (SE +/- 0.09, N = 3)
SNC4: 19.71 (SE +/- 0.05, N = 3)
SNC2: 19.76 (SE +/- 0.20, N = 3)

uvg266

Video Input: Bosphorus 4K - Video Preset: Medium

uvg266 0.4.1 (Frames Per Second, More Is Better)
Default - Disabled: 33.23 (SE +/- 0.12, N = 3)
SNC2: 33.12 (SE +/- 0.17, N = 3)
SNC4: 32.75 (SE +/- 0.08, N = 3)

Liquid-DSP

Threads: 192 - Buffer Length: 256 - Filter Length: 512

Liquid-DSP 1.6 (samples/s, More Is Better)
Default - Disabled: 1514200000 (SE +/- 4629254.80, N = 3)
SNC2: 1499133333 (SE +/- 5353295.97, N = 3)
SNC4: 1493000000 (SE +/- 5108163.40, N = 3)
1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid

OpenVINO

Model: Weld Porosity Detection FP16 - Device: CPU

OpenVINO 2023.2.dev (ms, Fewer Is Better)
SNC4: 19.06 (SE +/- 0.03, N = 3; MIN: 17.12 / MAX: 40.94)
SNC2: 19.08 (SE +/- 0.04, N = 3; MIN: 15.77 / MAX: 56.75)
Default - Disabled: 19.33 (SE +/- 0.03, N = 3; MIN: 9.33 / MAX: 85.74)
1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie

libxsmm

M N K: 256

libxsmm 2-1.17-3645 (GFLOPS/s, More Is Better)
SNC4: 2599.9 (SE +/- 21.00, N = 3)
SNC2: 2583.6 (SE +/- 29.68, N = 3)
Default - Disabled: 2564.6 (SE +/- 18.22, N = 3)
1. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -msse4.2

Liquid-DSP

Threads: 32 - Buffer Length: 256 - Filter Length: 57

Liquid-DSP 1.6 (samples/s, More Is Better)
SNC2: 1744433333 (SE +/- 3811532.21, N = 3)
Default - Disabled: 1730566667 (SE +/- 3985947.54, N = 3)
SNC4: 1721266667 (SE +/- 6406333.67, N = 3)
1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid

OSPRay Studio

Camera: 2 - Resolution: 4K - Samples Per Pixel: 32 - Renderer: Path Tracer - Acceleration: CPU

OSPRay Studio 0.13 (ms, Fewer Is Better)
SNC2: 38015 (SE +/- 192.95, N = 3)
Default - Disabled: 38264 (SE +/- 105.83, N = 3)
SNC4: 38523 (SE +/- 65.34, N = 3)

OpenVINO

Model: Weld Porosity Detection FP16 - Device: CPU

OpenVINO 2023.2.dev (FPS, More Is Better)
SNC4: 5026.69 (SE +/- 6.67, N = 3)
SNC2: 5023.42 (SE +/- 9.79, N = 3)
Default - Disabled: 4962.13 (SE +/- 6.69, N = 3)
1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie

PostgreSQL

Scaling Factor: 100 - Clients: 1000 - Mode: Read Only

PostgreSQL 16 (TPS, More Is Better)
Default - Disabled: 3792698 (SE +/- 49179.25, N = 3)
SNC4: 3759094 (SE +/- 37681.71, N = 6)
SNC2: 3746006 (SE +/- 21467.51, N = 3)
1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lm

OSPRay Studio

Camera: 3 - Resolution: 4K - Samples Per Pixel: 32 - Renderer: Path Tracer - Acceleration: CPU

OSPRay Studio 0.13 (ms, Fewer Is Better)
Default - Disabled: 43999 (SE +/- 70.44, N = 3)
SNC2: 44300 (SE +/- 80.70, N = 3)
SNC4: 44535 (SE +/- 146.66, N = 3)

Liquid-DSP

Threads: 64 - Buffer Length: 256 - Filter Length: 512

Liquid-DSP 1.6 (samples/s, More Is Better)
Default - Disabled: 938493333 (SE +/- 8999104.28, N = 3)
SNC2: 935170000 (SE +/- 8992613.64, N = 3)
SNC4: 927276667 (SE +/- 6276308.19, N = 3)
1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid

OpenRadioss

Model: Chrysler Neon 1M

OpenRadioss 2023.09.15 (Seconds, Fewer Is Better)
SNC2: 155.50 (SE +/- 0.28, N = 3)
SNC4: 156.21 (SE +/- 0.27, N = 3)
Default - Disabled: 157.35 (SE +/- 0.18, N = 3)

NAMD

ATPase Simulation - 327,506 Atoms

NAMD 2.14 (days/ns, Fewer Is Better)
SNC4: 0.25502 (SE +/- 0.00197, N = 3)
SNC2: 0.25612 (SE +/- 0.00284, N = 3)
Default - Disabled: 0.25803 (SE +/- 0.00087, N = 3)

libxsmm

M N K: 32

libxsmm 2-1.17-3645 (GFLOPS/s, More Is Better)
SNC4: 559.8 (SE +/- 1.96, N = 3)
Default - Disabled: 555.6 (SE +/- 0.58, N = 3)
SNC2: 553.3 (SE +/- 2.72, N = 3)
1. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -msse4.2

uvg266

Video Input: Bosphorus 1080p - Video Preset: Slow

uvg266 0.4.1 (Frames Per Second, More Is Better)
Default - Disabled: 89.19 (SE +/- 0.27, N = 3)
SNC2: 88.68 (SE +/- 0.19, N = 3)
SNC4: 88.18 (SE +/- 0.40, N = 3)

OpenVINO

Model: Weld Porosity Detection FP16-INT8 - Device: CPU

OpenVINO 2023.2.dev (ms, Fewer Is Better)
SNC4: 9.61 (SE +/- 0.03, N = 3; MIN: 8.18 / MAX: 19.44)
SNC2: 9.63 (SE +/- 0.02, N = 3; MIN: 8.28 / MAX: 16.71)
Default - Disabled: 9.72 (SE +/- 0.03, N = 3; MIN: 4.95 / MAX: 29.09)
1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie

PostgreSQL

Scaling Factor: 100 - Clients: 1000 - Mode: Read Only - Average Latency

PostgreSQL 16 (ms, Fewer Is Better)
Default - Disabled: 0.264 (SE +/- 0.003, N = 3)
SNC4: 0.266 (SE +/- 0.003, N = 6)
SNC2: 0.267 (SE +/- 0.002, N = 3)
1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lm

OpenSSL

Algorithm: SHA512

OpenSSL 3.1 (byte/s, More Is Better)
SNC4: 43206641983 (SE +/- 16399997.56, N = 3)
SNC2: 43151618910 (SE +/- 19206282.00, N = 3)
Default - Disabled: 42723911653 (SE +/- 32408039.38, N = 3)
1. (CC) gcc options: -pthread -m64 -O3 -ldl

OSPRay Studio

Camera: 3 - Resolution: 4K - Samples Per Pixel: 1 - Renderer: Path Tracer - Acceleration: CPU

OSPRay Studio 0.13 (ms, Fewer Is Better)
Default - Disabled: 1252 (SE +/- 1.53, N = 3)
SNC2: 1259 (SE +/- 2.40, N = 3)
SNC4: 1266 (SE +/- 4.33, N = 3)

OpenVINO

Model: Weld Porosity Detection FP16-INT8 - Device: CPU

OpenVINO 2023.2.dev (FPS, More Is Better)
SNC4: 9973.18 (SE +/- 26.50, N = 3)
SNC2: 9948.30 (SE +/- 17.38, N = 3)
Default - Disabled: 9866.45 (SE +/- 30.53, N = 3)
1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie

Liquid-DSP

Threads: 128 - Buffer Length: 256 - Filter Length: 512

Liquid-DSP 1.6 (samples/s, More Is Better)
Default - Disabled: 1314200000 (SE +/- 6222807.51, N = 3)
SNC2: 1302833333 (SE +/- 6948700.92, N = 3)
SNC4: 1300233333 (SE +/- 7198688.15, N = 3)
1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid

OSPRay Studio

Camera: 1 - Resolution: 4K - Samples Per Pixel: 32 - Renderer: Path Tracer - Acceleration: CPU

OSPRay Studio 0.13 (ms, Fewer Is Better)
Default - Disabled: 37890 (SE +/- 51.31, N = 3)
SNC2: 38009 (SE +/- 106.88, N = 3)
SNC4: 38284 (SE +/- 154.47, N = 3)

Memcached

Set To Get Ratio: 1:5

Memcached 1.6.19 (Ops/sec, More Is Better)
SNC2: 3359880.55 (SE +/- 33459.28, N = 3)
Default - Disabled: 3359726.87 (SE +/- 15559.68, N = 3)
SNC4: 3327393.35 (SE +/- 44466.09, N = 3)
1. (CXX) g++ options: -O2 -levent_openssl -levent -lcrypto -lssl -lpthread -lz -lpcre

OSPRay Studio

Camera: 1 - Resolution: 4K - Samples Per Pixel: 1 - Renderer: Path Tracer - Acceleration: CPU

OSPRay Studio 0.13 (ms, Fewer Is Better)
Default - Disabled: 1064 (SE +/- 4.18, N = 3)
SNC4: 1073 (SE +/- 1.86, N = 3)
SNC2: 1074 (SE +/- 2.73, N = 3)

OpenVINO

Model: Face Detection FP16-INT8 - Device: CPU

OpenVINO 2023.2.dev (FPS, More Is Better)
SNC4: 97.85 (SE +/- 0.15, N = 3)
SNC2: 97.76 (SE +/- 0.12, N = 3)
Default - Disabled: 96.95 (SE +/- 0.15, N = 3)
1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie

OSPRay Studio

Camera: 3 - Resolution: 4K - Samples Per Pixel: 16 - Renderer: Path Tracer - Acceleration: CPU

OSPRay Studio 0.13 (ms, Fewer Is Better)
Default - Disabled: 20017 (SE +/- 42.71, N = 3)
SNC2: 20144 (SE +/- 18.75, N = 3)
SNC4: 20199 (SE +/- 57.00, N = 3)

OpenVINO

Model: Road Segmentation ADAS FP16-INT8 - Device: CPU

OpenVINO 2023.2.dev (FPS, More Is Better)
SNC2: 1914.89 (SE +/- 7.84, N = 3)
SNC4: 1911.68 (SE +/- 7.31, N = 3)
Default - Disabled: 1898.14 (SE +/- 4.08, N = 3)
1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie

OSPRay Studio

Camera: 2 - Resolution: 4K - Samples Per Pixel: 16 - Renderer: Path Tracer - Acceleration: CPU

OSPRay Studio 0.13 (ms, Fewer Is Better)
SNC2: 17113 (SE +/- 54.03, N = 3)
Default - Disabled: 17193 (SE +/- 94.37, N = 3)
SNC4: 17262 (SE +/- 70.54, N = 3)

Blender

Blend File: Classroom - Compute: CPU-Only

Blender 4.0 (Seconds, Fewer Is Better)
Default - Disabled: 38.02 (SE +/- 0.09, N = 3)
SNC4: 38.24 (SE +/- 0.07, N = 3)
SNC2: 38.33 (SE +/- 0.03, N = 3)

uvg266

Video Input: Bosphorus 1080p - Video Preset: Medium

uvg266 0.4.1 (Frames Per Second, More Is Better)
Default - Disabled: 98.33 (SE +/- 0.23, N = 3)
SNC2: 98.22 (SE +/- 0.18, N = 3)
SNC4: 97.60 (SE +/- 0.34, N = 3)

libxsmm

M N K: 64

libxsmm 2-1.17-3645 (GFLOPS/s, More Is Better)
SNC4: 1059.1 (SE +/- 1.09, N = 3)
Default - Disabled: 1055.4 (SE +/- 0.53, N = 3)
SNC2: 1052.3 (SE +/- 2.33, N = 3)
1. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -msse4.2

NAS Parallel Benchmarks

Test / Class: BT.C

NAS Parallel Benchmarks 3.4 (Total Mop/s, More Is Better)
SNC4: 215764.82 (SE +/- 112.43, N = 3)
SNC2: 215684.06 (SE +/- 322.09, N = 3)
Default - Disabled: 214445.97 (SE +/- 280.16, N = 3)
1. (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
2. Open MPI 4.1.5

OpenSSL

Algorithm: AES-128-GCM

OpenSSL 3.1 (byte/s, More Is Better)
SNC4: 946933378620 (SE +/- 179395215.71, N = 3)
SNC2: 943365732580 (SE +/- 1081587572.26, N = 3)
Default - Disabled: 941442701447 (SE +/- 2155035998.71, N = 3)
1. (CC) gcc options: -pthread -m64 -O3 -ldl

OpenSSL

Algorithm: RSA4096

OpenSSL 3.1 (verify/s, More Is Better)
SNC4: 1538165.3 (SE +/- 3270.23, N = 3)
Default - Disabled: 1533067.1 (SE +/- 3444.37, N = 3)
SNC2: 1529451.7 (SE +/- 486.05, N = 3)
1. (CC) gcc options: -pthread -m64 -O3 -ldl

ACES DGEMM

Sustained Floating-Point Rate

ACES DGEMM 1.0 (GFLOP/s, More Is Better)
Default - Disabled: 43.88 (SE +/- 0.18, N = 3)
SNC2: 43.77 (SE +/- 0.36, N = 3)
SNC4: 43.63 (SE +/- 0.42, N = 3)
1. (CC) gcc options: -O3 -march=native -fopenmp

Liquid-DSP

Threads: 128 - Buffer Length: 256 - Filter Length: 57

Liquid-DSP 1.6 (samples/s, More Is Better)
SNC4: 4518133333 (SE +/- 22778084.01, N = 3)
Default - Disabled: 4495533333 (SE +/- 17623122.44, N = 3)
SNC2: 4493966667 (SE +/- 10235938.86, N = 3)
1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid

Liquid-DSP

Threads: 64 - Buffer Length: 256 - Filter Length: 32

Liquid-DSP 1.6 (samples/s, More Is Better)
Default - Disabled: 2646100000 (SE +/- 26479677.74, N = 3)
SNC4: 2643433333 (SE +/- 26768908.17, N = 3)
SNC2: 2632233333 (SE +/- 22995168.57, N = 3)
1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid

Blender

Blend File: BMW27 - Compute: CPU-Only

Blender 4.0 (Seconds, Fewer Is Better)
Default - Disabled: 15.25 (SE +/- 0.04, N = 3)
SNC4: 15.31 (SE +/- 0.04, N = 3)
SNC2: 15.32 (SE +/- 0.06, N = 3)

OSPRay Studio

Camera: 1 - Resolution: 4K - Samples Per Pixel: 16 - Renderer: Path Tracer - Acceleration: CPU

OSPRay Studio 0.13 (ms, Fewer Is Better)
Default - Disabled: 16939 (SE +/- 35.36, N = 3)
SNC2: 16999 (SE +/- 20.23, N = 3)
SNC4: 17014 (SE +/- 2.67, N = 3)

Blender

Blend File: Pabellon Barcelona - Compute: CPU-Only

Blender 4.0 (Seconds, Fewer Is Better)
Default - Disabled: 46.77 (SE +/- 0.20, N = 3)
SNC4: 46.88 (SE +/- 0.29, N = 3)
SNC2: 46.97 (SE +/- 0.23, N = 3)

Liquid-DSP

Threads: 192 - Buffer Length: 256 - Filter Length: 57

Liquid-DSP 1.6 (samples/s, More Is Better)
SNC4: 5416600000 (SE +/- 14304195.19, N = 3)
Default - Disabled: 5407200000 (SE +/- 18999298.23, N = 3)
SNC2: 5393633333 (SE +/- 15429013.07, N = 3)
1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid

Liquid-DSP

Threads: 32 - Buffer Length: 256 - Filter Length: 32

Liquid-DSP 1.6 (samples/s, More Is Better)
SNC4: 1440933333 (SE +/- 3769320.60, N = 3)
SNC2: 1436433333 (SE +/- 5024716.69, N = 3)
Default - Disabled: 1435333333 (SE +/- 1937638.88, N = 3)
1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid

OpenVINO

Model: Handwritten English Recognition FP16-INT8 - Device: CPU

OpenVINO 2023.2.dev (ms, Fewer Is Better)
SNC4: 45.08 (SE +/- 0.16, N = 3; MIN: 36.6 / MAX: 52.28)
SNC2: 45.22 (SE +/- 0.11, N = 3; MIN: 37.77 / MAX: 56.52)
Default - Disabled: 45.25 (SE +/- 0.07, N = 3; MIN: 34.3 / MAX: 61.3)
1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie

OSPRay Studio

Camera: 2 - Resolution: 4K - Samples Per Pixel: 1 - Renderer: Path Tracer - Acceleration: CPU

OSPRay Studio 0.13 (ms, Fewer Is Better)
Default - Disabled: 1076 (SE +/- 3.79, N = 3)
SNC4: 1079 (SE +/- 3.71, N = 3)
SNC2: 1080 (SE +/- 3.61, N = 3)

Liquid-DSP

Threads: 128 - Buffer Length: 256 - Filter Length: 32

Liquid-DSP 1.6 (samples/s, More Is Better)
SNC4: 4243600000 (SE +/- 26314254.69, N = 3)
SNC2: 4233566667 (SE +/- 26426018.32, N = 3)
Default - Disabled: 4228133333 (SE +/- 18076719.22, N = 3)
1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid

OpenVINO

Model: Handwritten English Recognition FP16-INT8 - Device: CPU

OpenVINO 2023.2.dev (FPS, More Is Better)
SNC4: 2127.42 (SE +/- 7.69, N = 3)
SNC2: 2121.18 (SE +/- 5.01, N = 3)
Default - Disabled: 2120.23 (SE +/- 3.37, N = 3)
1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie

Liquid-DSP

Threads: 192 - Buffer Length: 256 - Filter Length: 32

Liquid-DSP 1.6 (samples/s, More Is Better)
SNC4: 5535500000 (SE +/- 29526993.30, N = 3)
Default - Disabled: 5526733333 (SE +/- 19784955.00, N = 3)
SNC2: 5519933333 (SE +/- 18095333.96, N = 3)
1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid

OpenSSL

Algorithm: ChaCha20-Poly1305

OpenSSL 3.1 (byte/s, More Is Better)
SNC4: 362765500023 (SE +/- 129282934.95, N = 3)
SNC2: 361987547683 (SE +/- 140024932.44, N = 3)
Default - Disabled: 361852375860 (SE +/- 176665463.84, N = 3)
1. (CC) gcc options: -pthread -m64 -O3 -ldl

Rodinia

Test: OpenMP LavaMD

Rodinia 3.1 (Seconds, Fewer Is Better)
SNC2: 26.91 (SE +/- 0.09, N = 3)
Default - Disabled: 26.95 (SE +/- 0.07, N = 3)
SNC4: 26.97 (SE +/- 0.09, N = 3)
1. (CXX) g++ options: -O2 -lOpenCL

OpenSSL

Algorithm: AES-256-GCM

OpenSSL 3.1 (byte/s, More Is Better)
SNC4: 817131387107 (SE +/- 1023563571.48, N = 3)
SNC2: 815393358357 (SE +/- 991457061.79, N = 3)
Default - Disabled: 815175705397 (SE +/- 1400557991.78, N = 3)
1. (CC) gcc options: -pthread -m64 -O3 -ldl

Liquid-DSP

Threads: 64 - Buffer Length: 256 - Filter Length: 57

Liquid-DSP 1.6 (samples/s, More Is Better)
SNC4: 3023166667 (SE +/- 1844210.16, N = 3)
Default - Disabled: 3020900000 (SE +/- 20219380.14, N = 3)
SNC2: 3016100000 (SE +/- 14068522.78, N = 3)
1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid

Liquid-DSP

Threads: 32 - Buffer Length: 256 - Filter Length: 512

Liquid-DSP 1.6 (samples/s, More Is Better)
SNC2: 523603333 (SE +/- 3417076.40, N = 3)
SNC4: 522996667 (SE +/- 3819181.12, N = 3)
Default - Disabled: 522560000 (SE +/- 2767116.43, N = 3)
1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid

OpenSSL

Algorithm: ChaCha20

OpenSSL 3.1 (byte/s, More Is Better)
SNC4: 512318580417 (SE +/- 56755200.60, N = 3)
Default - Disabled: 511694244653 (SE +/- 157967680.92, N = 3)
SNC2: 511318347957 (SE +/- 45048908.57, N = 3)
1. (CC) gcc options: -pthread -m64 -O3 -ldl

Rodinia

Test: OpenMP CFD Solver

Rodinia 3.1 (Seconds, Fewer Is Better)
Default - Disabled: 5.577 (SE +/- 0.036, N = 3)
SNC2: 5.581 (SE +/- 0.002, N = 3)
SNC4: 5.585 (SE +/- 0.013, N = 3)
1. (CXX) g++ options: -O2 -lOpenCL

OpenSSL

Algorithm: RSA4096

OpenSSL 3.1 (sign/s, More Is Better)
SNC4: 49924.9 (SE +/- 90.07, N = 3)
Default - Disabled: 49897.3 (SE +/- 49.04, N = 3)
SNC2: 49862.1 (SE +/- 67.46, N = 3)
1. (CC) gcc options: -pthread -m64 -O3 -ldl

Blender

Blend File: Barbershop - Compute: CPU-Only

Blender 4.0 (Seconds, Fewer Is Better)
SNC2: 136.20 (SE +/- 0.26, N = 3)
SNC4: 136.28 (SE +/- 0.23, N = 3)
Default - Disabled: 136.29 (SE +/- 0.18, N = 3)

OpenSSL

Algorithm: SHA256

OpenSSL 3.1 (byte/s, More Is Better)
SNC4: 131706698427 (SE +/- 282602364.23, N = 3)
Default - Disabled: 131629499893 (SE +/- 311439878.18, N = 3)
SNC2: 131627463983 (SE +/- 241913832.01, N = 3)
1. (CC) gcc options: -pthread -m64 -O3 -ldl

PyTorch

Device: CPU - Batch Size: 64 - Model: Efficientnet_v2_l

PyTorch 2.1 (batches/sec, More Is Better)
Default - Disabled: 6.36 (SE +/- 0.01, N = 3; MIN: 5.69 / MAX: 6.62)
SNC2: 5.38 (SE +/- 0.02, N = 3; MIN: 3.75 / MAX: 5.87)
SNC4: 3.68 (SE +/- 0.13, N = 6; MIN: 1.19 / MAX: 6.36)

PyTorch

Device: CPU - Batch Size: 32 - Model: Efficientnet_v2_l

PyTorch 2.1 (batches/sec, More Is Better)
Default - Disabled: 6.35 (SE +/- 0.02, N = 3; MIN: 5.72 / MAX: 6.64)
SNC2: 5.31 (SE +/- 0.04, N = 3; MIN: 3.81 / MAX: 5.85)
SNC4: 4.07 (SE +/- 0.29, N = 6; MIN: 1.11 / MAX: 6.41)

PyTorch

Device: CPU - Batch Size: 16 - Model: Efficientnet_v2_l

PyTorch 2.1 (batches/sec, More Is Better)
Default - Disabled: 6.34 (SE +/- 0.02, N = 3; MIN: 5.68 / MAX: 6.64)
SNC2: 5.36 (SE +/- 0.06, N = 3; MIN: 3.75 / MAX: 5.85)
SNC4: 4.15 (SE +/- 0.23, N = 9; MIN: 1.09 / MAX: 6.41)

PostgreSQL

Scaling Factor: 1000 - Clients: 1000 - Mode: Read Write - Average Latency

PostgreSQL 16 (ms, Fewer Is Better)
Default - Disabled: 60.66 (SE +/- 1.61, N = 12)
SNC2: 63.44 (SE +/- 0.82, N = 3)
SNC4: 65.33 (SE +/- 0.79, N = 4)
1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lm

PostgreSQL

Scaling Factor: 1000 - Clients: 1000 - Mode: Read Write

PostgreSQL 16 (TPS, More Is Better)
Default - Disabled: 16623 (SE +/- 472.23, N = 12)
SNC2: 15769 (SE +/- 204.24, N = 3)
SNC4: 15314 (SE +/- 181.92, N = 4)
1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lm

PostgreSQL

Scaling Factor: 100 - Clients: 1000 - Mode: Read Write - Average Latency

PostgreSQL 16 (ms, Fewer Is Better)
Default - Disabled: 55.40 (SE +/- 0.50, N = 12)
SNC2: 72.15 (SE +/- 3.12, N = 12)
SNC4: 72.65 (SE +/- 3.47, N = 12)
1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lm

PostgreSQL

Scaling Factor: 100 - Clients: 1000 - Mode: Read Write

PostgreSQL 16 (TPS, More Is Better)
Default - Disabled: 18068 (SE +/- 160.41, N = 12)
SNC2: 14096 (SE +/- 496.25, N = 12)
SNC4: 14067 (SE +/- 577.54, N = 12)
1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lm

GROMACS

Implementation: MPI CPU - Input: water_GMX50_bare

GROMACS 2023 (Ns Per Day, More Is Better)
SNC4: 10.66 (SE +/- 0.04, N = 3)
SNC2: 10.45 (SE +/- 0.07, N = 3)
Default - Disabled: 10.38 (SE +/- 0.38, N = 9)
1. (CXX) g++ options: -O3

ASKAP

Test: tConvolve MPI - Gridding

ASKAP 1.0 (Mpix/sec, More Is Better)
Default - Disabled: 43532.9 (SE +/- 199.70, N = 3)
SNC2: 43138.9 (SE +/- 341.22, N = 3)
SNC4: 35893.7 (SE +/- 743.57, N = 15)
1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp

ASKAP

Test: tConvolve MPI - Degridding

ASKAP 1.0 (Mpix/sec, More Is Better)
SNC2: 40552.4 (SE +/- 463.37, N = 3)
Default - Disabled: 40198.3 (SE +/- 170.33, N = 3)
SNC4: 34358.5 (SE +/- 691.76, N = 15)
1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp

ClickHouse

100M Rows Hits Dataset, Third Run

ClickHouse 22.12.3.5 (Queries Per Minute, Geo Mean, More Is Better)
Default - Disabled: 504.59 (SE +/- 5.43, N = 3; MIN: 58.03 / MAX: 3750)
SNC2: 444.12 (SE +/- 8.91, N = 12; MIN: 48.62 / MAX: 7500)
SNC4: 400.96 (SE +/- 10.68, N = 12; MIN: 41.1 / MAX: 6000)

ClickHouse

100M Rows Hits Dataset, Second Run

ClickHouse 22.12.3.5 (Queries Per Minute, Geo Mean, More Is Better)
Default - Disabled: 490.52 (SE +/- 6.17, N = 3; MIN: 34.27 / MAX: 4615.38)
SNC2: 438.54 (SE +/- 8.80, N = 12; MIN: 35.21 / MAX: 6666.67)
SNC4: 397.01 (SE +/- 10.89, N = 12; MIN: 39.66 / MAX: 6000)

ClickHouse

100M Rows Hits Dataset, First Run / Cold Cache

ClickHouse 22.12.3.5 (Queries Per Minute, Geo Mean, More Is Better)
Default - Disabled: 457.06 (SE +/- 4.35, N = 3; MIN: 47.36 / MAX: 4285.71)
SNC2: 417.30 (SE +/- 8.46, N = 12; MIN: 32.68 / MAX: 6000)
SNC4: 388.42 (SE +/- 10.37, N = 12; MIN: 30.26 / MAX: 6000)

Stockfish

Total Time

Stockfish 15 (Nodes Per Second, More Is Better)
SNC4: 300267148 (SE +/- 6406046.35, N = 15)
SNC2: 295780798 (SE +/- 3381815.04, N = 15)
Default - Disabled: 287331097 (SE +/- 2377868.67, N = 3)
1. (CXX) g++ options: -lgcov -m64 -lpthread -fno-exceptions -std=c++17 -fno-peel-loops -fno-tracer -pedantic -O3 -msse -msse3 -mpopcnt -mavx2 -mavx512f -mavx512bw -mavx512vnni -mavx512dq -mavx512vl -msse4.1 -mssse3 -msse2 -mbmi2 -flto -flto=jobserver

easyWave

Input: e2Asean Grid + BengkuluSept2007 Source - Time: 2400

easyWave r34 (Seconds, Fewer Is Better)
Default - Disabled: 58.65 (SE +/- 0.47, N = 9)
SNC2: 59.25 (SE +/- 0.49, N = 3)
SNC4: 75.17 (SE +/- 3.38, N = 12)
1. (CXX) g++ options: -O3 -fopenmp

easyWave

Input: e2Asean Grid + BengkuluSept2007 Source - Time: 1200

easyWave r34 (Seconds, Fewer Is Better)
Default - Disabled: 23.99 (SE +/- 0.32, N = 15)
SNC2: 25.48 (SE +/- 0.46, N = 15)
SNC4: 28.74 (SE +/- 0.57, N = 12)
1. (CXX) g++ options: -O3 -fopenmp

LuxCoreRender

Scene: Rainbow Colors and Prism - Acceleration: CPU

LuxCoreRender 2.6 (M samples/sec, More Is Better)
SNC4: 34.72 (SE +/- 0.24, N = 3; MIN: 34.26 / MAX: 35.1)
Default - Disabled: 32.75 (SE +/- 1.27, N = 15; MIN: 17.73 / MAX: 35.6)
SNC2: 31.82 (SE +/- 1.17, N = 15; MIN: 18.52 / MAX: 35.42)

NAS Parallel Benchmarks

Test / Class: EP.C

NAS Parallel Benchmarks 3.4 (Total Mop/s, More Is Better)
SNC4: 10411.60 (SE +/- 304.30, N = 12)
SNC2: 9712.81 (SE +/- 257.16, N = 15)
Default - Disabled: 8869.84 (SE +/- 60.80, N = 3)
1. (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
2. Open MPI 4.1.5


Phoronix Test Suite v10.8.5