Xeon Platinum 8380 AVX-512 Workloads

Benchmarks for a future article. 2 x Intel Xeon Platinum 8380 testing with an Intel M50CYP2SB2U (SE5C6200.86B.0022.D08.2103221623 BIOS) and ASPEED on Ubuntu 22.10 via the Phoronix Test Suite.

HTML result view exported from: https://openbenchmarking.org/result/2308099-NE-XEONPLATI49&grr&sro&rro.

Xeon Platinum 8380 AVX-512 Workloads: system details (OpenBenchmarking.org)

Configurations: 0xd000390, 0xd0003a5

Processor: 2 x Intel Xeon Platinum 8380 @ 3.40GHz (80 Cores / 160 Threads)
Motherboard: Intel M50CYP2SB2U (SE5C6200.86B.0022.D08.2103221623 BIOS)
Chipset: Intel Ice Lake IEH
Memory: 512GB
Disk: 7682GB INTEL SSDPF2KX076TZ
Graphics: ASPEED
Monitor: VE228
Network: 2 x Intel X710 for 10GBASE-T + 2 x Intel E810-C for QSFP
OS: Ubuntu 22.10
Kernel: 6.5.0-060500rc4daily20230804-generic (x86_64) for 0xd000390; 6.5.0-rc5-phx-tues (x86_64) for 0xd0003a5
Desktop: GNOME Shell 43.0
Display Server: X Server 1.21.1.3
Vulkan: 1.3.224
Compiler: GCC 12.2.0
File-System: ext4
Screen Resolution: 1920x1080

Kernel Details
- Transparent Huge Pages: madvise

Compiler Details
- --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-12-U8K4Qv/gcc-12-12.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-12-U8K4Qv/gcc-12-12.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v

Processor Details
- 0xd000390: Scaling Governor: intel_pstate performance (EPP: performance) - CPU Microcode: 0xd000390
- 0xd0003a5: Scaling Governor: intel_pstate performance (EPP: performance) - CPU Microcode: 0xd0003a5

Python Details
- Python 3.10.7

Security Details
- 0xd000390: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Mitigation of Clear buffers; SMT vulnerable + retbleed: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS IBPB: conditional RSB filling PBRSB-eIBRS: SW sequence + srbds: Not affected + tsx_async_abort: Not affected
- 0xd0003a5: gather_data_sampling: Mitigation of Microcode + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Mitigation of Clear buffers; SMT vulnerable + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS IBPB: conditional RSB filling PBRSB-eIBRS: SW sequence + srbds: Not affected + tsx_async_abort: Not affected
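The two configurations are distinguished by CPU microcode revision, and only 0xd0003a5 reports a gather_data_sampling mitigation. As a minimal sketch (not part of the Phoronix Test Suite), the Python below shows one way to read the same two facts on a Linux host. The helper names `parse_microcode` and `read_text_or_none` are hypothetical; the `microcode` field of `/proc/cpuinfo` and the `gather_data_sampling` entry under `/sys/devices/system/cpu/vulnerabilities/` are assumed to be present, which depends on the kernel version.

```python
from pathlib import Path
from typing import Optional


def parse_microcode(cpuinfo_text: str) -> Optional[str]:
    """Return the first 'microcode' value in /proc/cpuinfo-style text, or None."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "microcode":
            return value.strip()
    return None


def read_text_or_none(path: str) -> Optional[str]:
    """Read a small text file, returning None if it is missing or unreadable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    cpuinfo = read_text_or_none("/proc/cpuinfo") or ""
    print("CPU microcode:", parse_microcode(cpuinfo))
    # This sysfs entry only exists on kernels that know about the issue.
    gds = read_text_or_none(
        "/sys/devices/system/cpu/vulnerabilities/gather_data_sampling"
    )
    print("gather_data_sampling:", gds)
```

On a kernel without the sysfs entry the second line prints `None`, which matches the 0xd000390 security details above not listing gather_data_sampling at all.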

Xeon Platinum 8380 AVX-512 Workloads: result overview table (OpenBenchmarking.org) comparing 0xd000390 and 0xd0003a5 across all tests: tensorflow, libxsmm, onnx, openvkl, onednn, qmcpack, ospray, mrbayes, palabos, cpuminer-opt, deepsparse, ncnn, vvenc, laghos, simdjson, openvino, svt-hevc, vpxenc, minibude, specfem3d, blender, gromacs, dav1d, oidn, svt-av1, remhos, cloverleaf, incompact3d, embree, heffte. Per-test results follow.

TensorFlow

Device: CPU - Batch Size: 512 - Model: ResNet-50

TensorFlow 2.12 (images/sec, More Is Better)
0xd0003a5: 84.84 (SE +/- 1.17, N = 3)
0xd000390: 85.97 (SE +/- 1.13, N = 3)

TensorFlow

Device: CPU - Batch Size: 256 - Model: ResNet-50

TensorFlow 2.12 (images/sec, More Is Better)
0xd0003a5: 86.93 (SE +/- 0.80, N = 9)
0xd000390: 83.89 (SE +/- 0.37, N = 3)

libxsmm

M N K: 128

libxsmm 2-1.17-3645 (GFLOPS/s, More Is Better)
0xd0003a5: 1978.9 (SE +/- 20.92, N = 3)
0xd000390: 1941.1 (SE +/- 32.53, N = 7)
(CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -msse4.2

ONNX Runtime

Model: fcn-resnet101-11 - Device: CPU - Executor: Standard

ONNX Runtime 1.14 (Inference Time Cost (ms), Fewer Is Better)
0xd0003a5: 117.55 (SE +/- 3.29, N = 15)
0xd000390: 110.32 (SE +/- 1.29, N = 15)
(CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: fcn-resnet101-11 - Device: CPU - Executor: Standard

ONNX Runtime 1.14 (Inferences Per Second, More Is Better)
0xd0003a5: 8.59908 (SE +/- 0.23491, N = 15)
0xd000390: 9.08067 (SE +/- 0.10283, N = 15)
(CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt
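Each ONNX Runtime model is reported twice, once as inference time cost in milliseconds and once as inferences per second. The following is a rough sketch of how the two views relate, using the fcn-resnet101-11 figures above. The function name is hypothetical, and the conversion 1000 / ms is an assumption about how the metrics correspond. The reported throughput need not match it exactly, for instance if it is averaged per run rather than derived from the mean latency.

```python
def latency_ms_to_per_second(latency_ms: float) -> float:
    """Convert a mean per-inference latency in milliseconds to inferences/sec."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


# (reported latency in ms, reported inferences per second) for fcn-resnet101-11
reported = {
    "0xd0003a5": (117.55, 8.59908),
    "0xd000390": (110.32, 9.08067),
}

for label, (latency_ms, per_second) in reported.items():
    derived = latency_ms_to_per_second(latency_ms)
    gap_pct = 100.0 * (per_second - derived) / per_second
    print(f"{label}: derived {derived:.3f}/s vs reported {per_second:.3f}/s "
          f"(gap {gap_pct:.2f}%)")
```

A small gap between the derived and reported values suggests the two charts describe the same runs, so either one can be used when comparing the microcode revisions.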

OpenVKL

Benchmark: vklBenchmark ISPC

OpenVKL 1.3.1 (Items / Sec, More Is Better)
0xd0003a5: 856 (SE +/- 0.88, N = 3; MIN: 137 / MAX: 7211)
0xd000390: 912 (SE +/- 1.53, N = 3; MIN: 140 / MAX: 7236)

oneDNN

Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 3.1 (ms, Fewer Is Better)
0xd0003a5: 816.51 (SE +/- 14.20, N = 12; MIN: 723.44)
0xd000390: 832.45 (SE +/- 16.59, N = 15; MIN: 714.88)
(CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

ONNX Runtime

Model: yolov4 - Device: CPU - Executor: Standard

ONNX Runtime 1.14 (Inference Time Cost (ms), Fewer Is Better)
0xd0003a5: 89.83 (SE +/- 1.07, N = 15)
0xd000390: 85.80 (SE +/- 0.79, N = 7)
(CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: yolov4 - Device: CPU - Executor: Standard

ONNX Runtime 1.14 (Inferences Per Second, More Is Better)
0xd0003a5: 11.15 (SE +/- 0.13, N = 15)
0xd000390: 11.66 (SE +/- 0.11, N = 7)
(CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: GPT-2 - Device: CPU - Executor: Standard

ONNX Runtime 1.14 (Inference Time Cost (ms), Fewer Is Better)
0xd0003a5: 5.28095 (SE +/- 0.11956, N = 15)
0xd000390: 5.54783 (SE +/- 0.00245, N = 3)
(CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: GPT-2 - Device: CPU - Executor: Standard

ONNX Runtime 1.14 (Inferences Per Second, More Is Better)
0xd0003a5: 190.57 (SE +/- 4.37, N = 15)
0xd000390: 180.16 (SE +/- 0.08, N = 3)
(CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard

ONNX Runtime 1.14 (Inference Time Cost (ms), Fewer Is Better)
0xd0003a5: 25.86 (SE +/- 0.26, N = 15)
0xd000390: 25.57 (SE +/- 0.30, N = 3)
(CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard

ONNX Runtime 1.14 (Inferences Per Second, More Is Better)
0xd0003a5: 38.72 (SE +/- 0.38, N = 15)
0xd000390: 39.12 (SE +/- 0.46, N = 3)
(CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard

ONNX Runtime 1.14 (Inference Time Cost (ms), Fewer Is Better)
0xd0003a5: 1.51029 (SE +/- 0.02722, N = 15)
0xd000390: 1.43407 (SE +/- 0.00633, N = 3)
(CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard

ONNX Runtime 1.14 (Inferences Per Second, More Is Better)
0xd0003a5: 664.54 (SE +/- 12.01, N = 15)
0xd000390: 696.73 (SE +/- 3.08, N = 3)
(CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: super-resolution-10 - Device: CPU - Executor: Standard

ONNX Runtime 1.14 (Inference Time Cost (ms), Fewer Is Better)
0xd0003a5: 6.71968 (SE +/- 0.43315, N = 15)
0xd000390: 6.31428 (SE +/- 0.00502, N = 3)
(CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: super-resolution-10 - Device: CPU - Executor: Standard

ONNX Runtime 1.14 (Inferences Per Second, More Is Better)
0xd0003a5: 158.01 (SE +/- 10.25, N = 15)
0xd000390: 158.36 (SE +/- 0.13, N = 3)
(CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

QMCPACK

Input: FeCO6_b3lyp_gms

QMCPACK 3.16 (Total Execution Time - Seconds, Fewer Is Better)
0xd0003a5: 263.23 (SE +/- 2.19, N = 3)
0xd000390: 268.56 (SE +/- 3.60, N = 3)
(CXX) g++ options: -fopenmp -foffload=disable -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -ffast-math -march=native -O3 -lm -ldl

ONNX Runtime

Model: bertsquad-12 - Device: CPU - Executor: Standard

ONNX Runtime 1.14 (Inference Time Cost (ms), Fewer Is Better)
0xd0003a5: 61.51 (SE +/- 1.14, N = 12)
0xd000390: 59.84 (SE +/- 0.28, N = 3)
(CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: bertsquad-12 - Device: CPU - Executor: Standard

ONNX Runtime 1.14 (Inferences Per Second, More Is Better)
0xd0003a5: 16.31 (SE +/- 0.26, N = 12)
0xd000390: 16.71 (SE +/- 0.08, N = 3)
(CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

libxsmm

M N K: 256

libxsmm 2-1.17-3645 (GFLOPS/s, More Is Better)
0xd0003a5: 600.2 (SE +/- 2.54, N = 3)
0xd000390: 594.6 (SE +/- 2.33, N = 3)
(CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -msse4.2

TensorFlow

Device: CPU - Batch Size: 512 - Model: GoogLeNet

TensorFlow 2.12 (images/sec, More Is Better)
0xd0003a5: 323.79 (SE +/- 0.43, N = 3)
0xd000390: 317.27 (SE +/- 0.87, N = 3)

OSPRay

Benchmark: particle_volume/pathtracer/real_time

OSPRay 2.12 (Items Per Second, More Is Better)
0xd0003a5: 136.85 (SE +/- 0.45, N = 3)
0xd000390: 150.28 (SE +/- 0.54, N = 3)

Timed MrBayes Analysis

Primate Phylogeny Analysis

Timed MrBayes Analysis 3.2.7 (Seconds, Fewer Is Better)
0xd0003a5: 165.42 (SE +/- 0.98, N = 3)
0xd000390: 166.53 (SE +/- 1.36, N = 3)
(CC) gcc options: -mmmx -msse -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msha -maes -mavx -mfma -mavx2 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi -mrdrnd -mbmi -mbmi2 -madx -mabm -O3 -std=c99 -pedantic -lm -lreadline

ONNX Runtime

Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard

ONNX Runtime 1.14 (Inference Time Cost (ms), Fewer Is Better)
0xd0003a5: 4.62085 (SE +/- 0.04028, N = 8)
0xd000390: 4.52403 (SE +/- 0.02854, N = 3)
(CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard

ONNX Runtime 1.14 (Inferences Per Second, More Is Better)
0xd0003a5: 216.48 (SE +/- 1.91, N = 8)
0xd000390: 221.02 (SE +/- 1.39, N = 3)
(CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

QMCPACK

Input: FeCO6_b3lyp_gms

QMCPACK 3.16 (Total Execution Time - Seconds, Fewer Is Better)
0xd0003a5: 178.19 (SE +/- 0.31, N = 3)
0xd000390: 147.51 (SE +/- 0.12, N = 3)
(CXX) g++ options: -fopenmp -foffload=disable -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -ffast-math -march=native -O3 -lm -ldl

Palabos

Grid Size: 100

Palabos 2.3 (Mega Site Updates Per Second, More Is Better)
0xd0003a5: 312.53 (SE +/- 0.20, N = 3)
0xd000390: 312.20 (SE +/- 0.89, N = 3)
(CXX) g++ options: -std=c++17 -pedantic -O3 -rdynamic -lcrypto -lcurl -lsz -lz -ldl -lm

Cpuminer-Opt

Algorithm: Garlicoin

Cpuminer-Opt 3.20.3 (kH/s, More Is Better)
0xd0003a5: 22086.25 (SE +/- 3833.17, N = 12)
0xd000390: 29203.00 (SE +/- 330.17, N = 3)
(CXX) g++ options: -O2 -lcurl -lz -lpthread -lssl -lcrypto -lgmp
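The Garlicoin result for 0xd0003a5 carries a standard error that is large relative to its mean, which is worth quantifying before reading much into the gap. A minimal sketch follows, assuming the reported "SE" is the standard error of the mean, so that SD = SE * sqrt(N). The function names are hypothetical and this is not how the Phoronix Test Suite itself reports variability.

```python
import math


def sample_sd_from_se(se: float, n: int) -> float:
    """Recover the run-to-run standard deviation from SE of the mean and N."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return se * math.sqrt(n)


def coefficient_of_variation(mean: float, se: float, n: int) -> float:
    """Standard deviation as a fraction of the mean (unitless)."""
    return sample_sd_from_se(se, n) / mean


# Garlicoin kH/s figures copied from the result above.
runs = {
    "0xd0003a5": (22086.25, 3833.17, 12),
    "0xd000390": (29203.00, 330.17, 3),
}

for label, (mean, se, n) in runs.items():
    cv = coefficient_of_variation(mean, se, n)
    print(f"{label}: SD ~ {sample_sd_from_se(se, n):.0f} kH/s, CV ~ {cv:.1%}")
```

A coefficient of variation this much higher on one configuration than the other points to unstable individual runs, so the difference in means for this test should be treated with caution.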

OSPRay

Benchmark: particle_volume/scivis/real_time

OSPRay 2.12 (Items Per Second, More Is Better)
0xd0003a5: 16.38 (SE +/- 0.01, N = 3)
0xd000390: 24.95 (SE +/- 0.01, N = 3)
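For this OSPRay scene the two microcode revisions differ by far more than the reported standard errors. The sketch below is one hedged way to express that: a percent change relative to 0xd000390, plus the gap measured in combined standard errors. Both helper names are hypothetical, and treating the two means as independent with the reported SE is an assumption, not something the result file states.

```python
import math


def percent_change(baseline: float, new: float) -> float:
    """Relative change of `new` versus `baseline`, in percent."""
    return 100.0 * (new - baseline) / baseline


def gap_in_standard_errors(a: float, se_a: float, b: float, se_b: float) -> float:
    """Absolute difference of two means divided by their combined SE."""
    combined = math.hypot(se_a, se_b)
    if combined == 0.0:
        # Some results report SE +/- 0.00; avoid dividing by zero.
        return math.inf if a != b else 0.0
    return abs(a - b) / combined


# particle_volume/scivis/real_time, items per second (more is better)
old_mean, old_se = 24.95, 0.01   # 0xd000390
new_mean, new_se = 16.38, 0.01   # 0xd0003a5

print(f"change: {percent_change(old_mean, new_mean):.1f}%")
print(f"gap: {gap_in_standard_errors(old_mean, old_se, new_mean, new_se):.0f} "
      "combined standard errors")
```

The same two helpers can be applied to any pair of results in this file; a gap of only one or two combined standard errors would suggest the difference is within run-to-run noise.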

Neural Magic DeepSparse

Model: BERT-Large, NLP Question Answering - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5 (ms/batch, Fewer Is Better)
0xd0003a5: 421.75 (SE +/- 2.15, N = 3)
0xd000390: 405.97 (SE +/- 0.42, N = 3)

Neural Magic DeepSparse

Model: BERT-Large, NLP Question Answering - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5 (items/sec, More Is Better)
0xd0003a5: 94.26 (SE +/- 0.51, N = 3)
0xd000390: 98.34 (SE +/- 0.12, N = 3)

Palabos

Grid Size: 400

Palabos 2.3 (Mega Site Updates Per Second, More Is Better)
0xd0003a5: 393.84 (SE +/- 0.06, N = 3)
0xd000390: 388.48 (SE +/- 0.94, N = 3)
(CXX) g++ options: -std=c++17 -pedantic -O3 -rdynamic -lcrypto -lcurl -lsz -lz -ldl -lm

QMCPACK

Input: Li2_STO_ae

QMCPACK 3.16 (Total Execution Time - Seconds, Fewer Is Better)
0xd0003a5: 123.26 (SE +/- 1.03, N = 3)
0xd000390: 124.23 (SE +/- 1.55, N = 3)
(CXX) g++ options: -fopenmp -foffload=disable -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -ffast-math -march=native -O3 -lm -ldl

Palabos

Grid Size: 500

Palabos 2.3 (Mega Site Updates Per Second, More Is Better)
0xd0003a5: 417.48 (SE +/- 0.72, N = 3)
0xd000390: 413.21 (SE +/- 0.42, N = 3)
(CXX) g++ options: -std=c++17 -pedantic -O3 -rdynamic -lcrypto -lcurl -lsz -lz -ldl -lm

NCNN

Target: CPU - Model: FastestDet

NCNN 20230517 (ms, Fewer Is Better)
0xd0003a5: 9.71 (SE +/- 0.07, N = 3; MIN: 9.22 / MAX: 27.29)
0xd000390: 10.20 (SE +/- 0.64, N = 3; MIN: 9.01 / MAX: 500.17)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: vision_transformer

NCNN 20230517 (ms, Fewer Is Better)
0xd0003a5: 45.23 (SE +/- 0.31, N = 3; MIN: 43.43 / MAX: 73.39)
0xd000390: 46.92 (SE +/- 1.17, N = 3; MIN: 43.11 / MAX: 881.49)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: regnety_400m

NCNN 20230517 (ms, Fewer Is Better)
0xd0003a5: 38.85 (SE +/- 0.50, N = 3; MIN: 37.13 / MAX: 233.44)
0xd000390: 45.54 (SE +/- 7.69, N = 3; MIN: 36.01 / MAX: 3343.68)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: squeezenet_ssd

NCNN 20230517 (ms, Fewer Is Better)
0xd0003a5: 16.10 (SE +/- 0.25, N = 3; MIN: 15.43 / MAX: 48.02)
0xd000390: 15.34 (SE +/- 0.12, N = 3; MIN: 14.63 / MAX: 39.65)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: yolov4-tiny

NCNN 20230517 (ms, Fewer Is Better)
0xd0003a5: 24.10 (SE +/- 0.17, N = 3; MIN: 23.25 / MAX: 51.47)
0xd000390: 23.71 (SE +/- 0.22, N = 3; MIN: 22.57 / MAX: 46.11)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: resnet50

NCNN 20230517 (ms, Fewer Is Better)
0xd0003a5: 18.51 (SE +/- 0.69, N = 3; MIN: 17.32 / MAX: 299.93)
0xd000390: 17.15 (SE +/- 0.52, N = 3; MIN: 16.19 / MAX: 41.83)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: alexnet

NCNN 20230517 (ms, Fewer Is Better)
0xd0003a5: 5.46 (SE +/- 0.15, N = 3; MIN: 5.01 / MAX: 29.14)
0xd000390: 5.39 (SE +/- 0.16, N = 3; MIN: 4.83 / MAX: 151.52)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: resnet18

NCNN 20230517 (ms, Fewer Is Better)
0xd0003a5: 9.42 (SE +/- 0.09, N = 2; MIN: 9.21 / MAX: 32.8)
0xd000390: 8.97 (SE +/- 0.14, N = 3; MIN: 8.63 / MAX: 27.27)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: vgg16

NCNN 20230517 (ms, Fewer Is Better)
0xd0003a5: 25.71 (SE +/- 0.19, N = 3; MIN: 24.88 / MAX: 62.96)
0xd000390: 23.86 (SE +/- 0.25, N = 3; MIN: 23.05 / MAX: 47.41)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: googlenet

NCNN 20230517 (ms, Fewer Is Better)
0xd0003a5: 16.58 (SE +/- 0.48, N = 3; MIN: 15.29 / MAX: 39.63)
0xd000390: 15.36 (SE +/- 0.17, N = 3; MIN: 14.6 / MAX: 182.22)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: blazeface

NCNN 20230517 (ms, Fewer Is Better)
0xd0003a5: 4.54 (SE +/- 0.07, N = 3; MIN: 4.37 / MAX: 5.17)
0xd000390: 4.35 (SE +/- 0.06, N = 3; MIN: 4.16 / MAX: 4.97)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: efficientnet-b0

NCNN 20230517 (ms, Fewer Is Better)
0xd0003a5: 11.71 (SE +/- 0.11, N = 3; MIN: 11.15 / MAX: 19.87)
0xd000390: 11.62 (SE +/- 0.38, N = 3; MIN: 10.82 / MAX: 21.03)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: mnasnet

NCNN 20230517 (ms, Fewer Is Better)
0xd0003a5: 7.41 (SE +/- 0.04, N = 3; MIN: 7.15 / MAX: 31.17)
0xd000390: 7.57 (SE +/- 0.10, N = 3; MIN: 7.28 / MAX: 30.26)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: shufflenet-v2

NCNN 20230517 (ms, Fewer Is Better)
0xd0003a5: 9.76 (SE +/- 0.13, N = 3; MIN: 9.32 / MAX: 33.4)
0xd000390: 9.89 (SE +/- 0.10, N = 3; MIN: 9.61 / MAX: 33.59)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU-v3-v3 - Model: mobilenet-v3

NCNN 20230517 (ms, Fewer Is Better)
0xd0003a5: 8.76 (SE +/- 0.12, N = 3; MIN: 8.3 / MAX: 32.6)
0xd000390: 8.88 (SE +/- 0.05, N = 3; MIN: 8.69 / MAX: 32.33)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU-v2-v2 - Model: mobilenet-v2

NCNN 20230517 (ms, Fewer Is Better)
0xd0003a5: 7.96 (SE +/- 0.04, N = 3; MIN: 7.75 / MAX: 31.75)
0xd000390: 8.03 (SE +/- 0.09, N = 3; MIN: 7.8 / MAX: 31.34)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: mobilenet

NCNN 20230517 (ms, Fewer Is Better)
0xd0003a5: 15.66 (SE +/- 0.07, N = 3; MIN: 15.24 / MAX: 38.75)
0xd000390: 15.46 (SE +/- 0.09, N = 3; MIN: 14.85 / MAX: 106.56)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
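The seventeen NCNN models above move in both directions between the two microcode revisions, so a single summary figure can help. One common choice, sketched here as an illustration and not taken from the result file, is the geometric mean of per-model latency ratios. The dictionary and function names are hypothetical; the numbers are the mean latencies listed above.

```python
from statistics import geometric_mean

# model: (0xd0003a5 ms, 0xd000390 ms), copied from the NCNN results above
ncnn_ms = {
    "FastestDet": (9.71, 10.20),
    "vision_transformer": (45.23, 46.92),
    "regnety_400m": (38.85, 45.54),
    "squeezenet_ssd": (16.10, 15.34),
    "yolov4-tiny": (24.10, 23.71),
    "resnet50": (18.51, 17.15),
    "alexnet": (5.46, 5.39),
    "resnet18": (9.42, 8.97),
    "vgg16": (25.71, 23.86),
    "googlenet": (16.58, 15.36),
    "blazeface": (4.54, 4.35),
    "efficientnet-b0": (11.71, 11.62),
    "mnasnet": (7.41, 7.57),
    "shufflenet-v2": (9.76, 9.89),
    "mobilenet-v3": (8.76, 8.88),
    "mobilenet-v2": (7.96, 8.03),
    "mobilenet": (15.66, 15.46),
}


def geomean_latency_ratio(pairs) -> float:
    """Geometric mean of new/old latency ratios.

    Latency is "fewer is better", so a value above 1.0 means the new
    configuration is slower on average and below 1.0 means it is faster.
    """
    return geometric_mean([new / old for new, old in pairs])


ratio = geomean_latency_ratio(ncnn_ms.values())
print(f"0xd0003a5 / 0xd000390 geometric-mean latency ratio: {ratio:.3f}")
```

Several of the 0xd000390 runs have large standard errors and outlying MAX values (regnety_400m in particular), so this summary should be read as indicative only.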

VVenC

Video Input: Bosphorus 4K - Video Preset: Fast

VVenC 1.9 (Frames Per Second, More Is Better)
0xd0003a5: 5.705 (SE +/- 0.029, N = 3)
0xd000390: 5.722 (SE +/- 0.033, N = 3)
(CXX) g++ options: -O3 -flto=auto -fno-fat-lto-objects

OSPRay

Benchmark: particle_volume/ao/real_time

OSPRay 2.12 (Items Per Second, More Is Better)
0xd0003a5: 16.46 (SE +/- 0.01, N = 3)
0xd000390: 24.75 (SE +/- 0.11, N = 3)

TensorFlow

Device: CPU - Batch Size: 256 - Model: GoogLeNet

TensorFlow 2.12 (images/sec, More Is Better)
0xd0003a5: 321.72 (SE +/- 2.10, N = 3)
0xd000390: 309.63 (SE +/- 1.89, N = 3)

Cpuminer-Opt

Algorithm: Myriad-Groestl

Cpuminer-Opt 3.20.3 (kH/s, More Is Better)
0xd0003a5: 43450 (SE +/- 406.32, N = 3)
0xd000390: 43127 (SE +/- 386.00, N = 15)
(CXX) g++ options: -O2 -lcurl -lz -lpthread -lssl -lcrypto -lgmp

Laghos

Test: Sedov Blast Wave, ube_922_hex.mesh

Laghos 3.1 (Major Kernels Total Rate, More Is Better)
0xd0003a5: 386.08 (SE +/- 0.80, N = 3)
0xd000390: 385.89 (SE +/- 0.23, N = 3)
(CXX) g++ options: -O3 -std=c++11 -lmfem -lHYPRE -lmetis -lrt -lmpi_cxx -lmpi

oneDNN

Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 3.1 - ms, Fewer Is Better
0xd0003a5: 521.74 (SE +/- 2.82, N = 3; MIN: 505.72)
0xd000390: 524.38 (SE +/- 7.19, N = 3; MIN: 499.75)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

simdjson

Throughput Test: PartialTweets

simdjson 2.0 - GB/s, More Is Better
0xd0003a5: 4.77 (SE +/- 0.01, N = 3)
0xd000390: 4.62 (SE +/- 0.01, N = 3)
1. (CXX) g++ options: -O3

simdjson

Throughput Test: DistinctUserID

simdjson 2.0 - GB/s, More Is Better
0xd0003a5: 5.71 (SE +/- 0.00, N = 3)
0xd000390: 5.52 (SE +/- 0.02, N = 3)
1. (CXX) g++ options: -O3

simdjson

Throughput Test: TopTweet

simdjson 2.0 - GB/s, More Is Better
0xd0003a5: 5.75 (SE +/- 0.01, N = 3)
0xd000390: 5.60 (SE +/- 0.03, N = 3)
1. (CXX) g++ options: -O3

TensorFlow

Device: CPU - Batch Size: 512 - Model: AlexNet

TensorFlow 2.12 - images/sec, More Is Better
0xd0003a5: 839.41 (SE +/- 2.61, N = 3)
0xd000390: 760.15 (SE +/- 3.77, N = 3)

Neural Magic DeepSparse

Model: CV Detection, YOLOv5s COCO, Sparse INT8 - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5 - ms/batch, Fewer Is Better
0xd0003a5: 100.19 (SE +/- 1.28, N = 4)
0xd000390: 93.54 (SE +/- 0.07, N = 3)

Neural Magic DeepSparse

Model: CV Detection, YOLOv5s COCO, Sparse INT8 - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5 - items/sec, More Is Better
0xd0003a5: 398.72 (SE +/- 4.97, N = 4)
0xd000390: 426.95 (SE +/- 0.21, N = 3)

OpenVINO

Model: Person Detection FP16 - Device: CPU

OpenVINO 2022.3 - ms, Fewer Is Better
0xd0003a5: 1496.19 (SE +/- 0.81, N = 3; MIN: 1043.7 / MAX: 1711.44)
0xd000390: 1490.76 (SE +/- 2.35, N = 3; MIN: 1074.22 / MAX: 1692.48)
1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO

Model: Person Detection FP16 - Device: CPU

OpenVINO 2022.3 - FPS, More Is Better
0xd0003a5: 13.25 (SE +/- 0.01, N = 3)
0xd000390: 13.29 (SE +/- 0.02, N = 3)
1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO

Model: Person Detection FP32 - Device: CPU

OpenVINO 2022.3 - ms, Fewer Is Better
0xd0003a5: 1519.84 (SE +/- 0.84, N = 3; MIN: 1074.08 / MAX: 1721.59)
0xd000390: 1517.69 (SE +/- 0.52, N = 3; MIN: 1081.6 / MAX: 1690.6)
1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO

Model: Person Detection FP32 - Device: CPU

OpenVINO 2022.3 - FPS, More Is Better
0xd0003a5: 13.04 (SE +/- 0.01, N = 3)
0xd000390: 13.03 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO

Model: Face Detection FP16 - Device: CPU

OpenVINO 2022.3 - ms, Fewer Is Better
0xd0003a5: 823.09 (SE +/- 0.71, N = 3; MIN: 550.41 / MAX: 926.21)
0xd000390: 827.51 (SE +/- 0.42, N = 3; MIN: 628.21 / MAX: 980.48)
1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO

Model: Face Detection FP16 - Device: CPU

OpenVINO 2022.3 - FPS, More Is Better
0xd0003a5: 24.18 (SE +/- 0.02, N = 3)
0xd000390: 24.04 (SE +/- 0.01, N = 3)
1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO

Model: Face Detection FP16-INT8 - Device: CPU

OpenVINO 2022.3 - ms, Fewer Is Better
0xd0003a5: 209.02 (SE +/- 0.14, N = 3; MIN: 152.83 / MAX: 234.28)
0xd000390: 209.31 (SE +/- 0.03, N = 3; MIN: 160.46 / MAX: 249.09)
1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO

Model: Face Detection FP16-INT8 - Device: CPU

OpenVINO 2022.3 - FPS, More Is Better
0xd0003a5: 95.54 (SE +/- 0.07, N = 3)
0xd000390: 95.42 (SE +/- 0.01, N = 3)
1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO

Model: Machine Translation EN To DE FP16 - Device: CPU

OpenVINO 2022.3 - ms, Fewer Is Better
0xd0003a5: 78.18 (SE +/- 0.19, N = 3; MIN: 66.08 / MAX: 194.38)
0xd000390: 79.47 (SE +/- 0.22, N = 3; MIN: 62.57 / MAX: 232.01)
1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO

Model: Machine Translation EN To DE FP16 - Device: CPU

OpenVINO 2022.3 - FPS, More Is Better
0xd0003a5: 255.09 (SE +/- 0.63, N = 3)
0xd000390: 251.00 (SE +/- 0.68, N = 3)
1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO

Model: Person Vehicle Bike Detection FP16 - Device: CPU

OpenVINO 2022.3 - ms, Fewer Is Better
0xd0003a5: 9.63 (SE +/- 0.02, N = 3; MIN: 8.33 / MAX: 19.33)
0xd000390: 9.77 (SE +/- 0.01, N = 3; MIN: 7.83 / MAX: 20.01)
1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO

Model: Person Vehicle Bike Detection FP16 - Device: CPU

OpenVINO 2022.3 - FPS, More Is Better
0xd0003a5: 2070.72 (SE +/- 3.75, N = 3)
0xd000390: 2039.63 (SE +/- 1.77, N = 3)
1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

VVenC

Video Input: Bosphorus 4K - Video Preset: Faster

VVenC 1.9 - Frames Per Second, More Is Better
0xd0003a5: 10.42 (SE +/- 0.06, N = 3)
0xd000390: 10.36 (SE +/- 0.09, N = 3)
1. (CXX) g++ options: -O3 -flto=auto -fno-fat-lto-objects

OpenVINO

Model: Weld Porosity Detection FP16 - Device: CPU

OpenVINO 2022.3 - ms, Fewer Is Better
0xd0003a5: 33.63 (SE +/- 0.06, N = 3; MIN: 29.54 / MAX: 113.55)
0xd000390: 33.89 (SE +/- 0.05, N = 3; MIN: 29.83 / MAX: 113.13)
1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO

Model: Weld Porosity Detection FP16 - Device: CPU

OpenVINO 2022.3 - FPS, More Is Better
0xd0003a5: 2362.84 (SE +/- 3.91, N = 3)
0xd000390: 2344.97 (SE +/- 3.01, N = 3)
1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO

Model: Age Gender Recognition Retail 0013 FP16-INT8 - Device: CPU

OpenVINO 2022.3 - ms, Fewer Is Better
0xd0003a5: 1.16 (SE +/- 0.00, N = 3; MIN: 0.86 / MAX: 17.98)
0xd000390: 1.16 (SE +/- 0.00, N = 3; MIN: 0.88 / MAX: 12.61)
1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO

Model: Age Gender Recognition Retail 0013 FP16-INT8 - Device: CPU

OpenVINO 2022.3 - FPS, More Is Better
0xd0003a5: 67754.34 (SE +/- 29.72, N = 3)
0xd000390: 67604.00 (SE +/- 17.48, N = 3)
1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

Neural Magic DeepSparse

Model: BERT-Large, NLP Question Answering, Sparse INT8 - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5 - ms/batch, Fewer Is Better
0xd0003a5: 45.96 (SE +/- 0.10, N = 3)
0xd000390: 37.99 (SE +/- 0.07, N = 3)

Neural Magic DeepSparse

Model: BERT-Large, NLP Question Answering, Sparse INT8 - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5 - items/sec, More Is Better
0xd0003a5: 869.04 (SE +/- 1.83, N = 3)
0xd000390: 1051.79 (SE +/- 1.88, N = 3)

OpenVINO

Model: Weld Porosity Detection FP16-INT8 - Device: CPU

OpenVINO 2022.3 - ms, Fewer Is Better
0xd0003a5: 8.48 (SE +/- 0.00, N = 3; MIN: 7.15 / MAX: 22.53)
0xd000390: 8.50 (SE +/- 0.00, N = 3; MIN: 7.17 / MAX: 18.39)
1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO

Model: Weld Porosity Detection FP16-INT8 - Device: CPU

OpenVINO 2022.3 - FPS, More Is Better
0xd0003a5: 9419.77 (SE +/- 2.19, N = 3)
0xd000390: 9396.52 (SE +/- 3.74, N = 3)
1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO

Model: Age Gender Recognition Retail 0013 FP16 - Device: CPU

OpenVINO 2022.3 - ms, Fewer Is Better
0xd0003a5: 1.33 (SE +/- 0.00, N = 3; MIN: 0.97 / MAX: 13.43)
0xd000390: 1.33 (SE +/- 0.00, N = 3; MIN: 0.97 / MAX: 13)
1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO

Model: Age Gender Recognition Retail 0013 FP16 - Device: CPU

OpenVINO 2022.3 - FPS, More Is Better
0xd0003a5: 59377.96 (SE +/- 18.68, N = 3)
0xd000390: 59274.06 (SE +/- 17.38, N = 3)
1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO

Model: Vehicle Detection FP16-INT8 - Device: CPU

OpenVINO 2022.3 - ms, Fewer Is Better
0xd0003a5: 4.49 (SE +/- 0.00, N = 3; MIN: 4.04 / MAX: 15.85)
0xd000390: 4.51 (SE +/- 0.00, N = 3; MIN: 4.02 / MAX: 13.95)
1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO

Model: Vehicle Detection FP16-INT8 - Device: CPU

OpenVINO 2022.3 - FPS, More Is Better
0xd0003a5: 4442.98 (SE +/- 2.64, N = 3)
0xd000390: 4419.17 (SE +/- 3.29, N = 3)
1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO

Model: Vehicle Detection FP16 - Device: CPU

OpenVINO 2022.3 - ms, Fewer Is Better
0xd0003a5: 17.87 (SE +/- 0.02, N = 3; MIN: 12.24 / MAX: 38.45)
0xd000390: 17.80 (SE +/- 0.01, N = 3; MIN: 12.83 / MAX: 32.64)
1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO

Model: Vehicle Detection FP16 - Device: CPU

OpenVINO 2022.3 - FPS, More Is Better
0xd0003a5: 1117.09 (SE +/- 1.14, N = 3)
0xd000390: 1121.56 (SE +/- 0.88, N = 3)
1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

SVT-HEVC

Tuning: 1 - Input: Bosphorus 4K

SVT-HEVC 1.5.0 - Frames Per Second, More Is Better
0xd0003a5: 10.45 (SE +/- 0.01, N = 3)
0xd000390: 10.46 (SE +/- 0.06, N = 3)
1. (CC) gcc options: -fPIE -fPIC -O3 -O2 -pie -rdynamic -lpthread -lrt

simdjson

Throughput Test: Kostya

simdjson 2.0 - GB/s, More Is Better
0xd0003a5: 2.87 (SE +/- 0.00, N = 3)
0xd000390: 2.61 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -O3

Neural Magic DeepSparse

Model: CV Segmentation, 90% Pruned YOLACT Pruned - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5 - ms/batch, Fewer Is Better
0xd0003a5: 478.85 (SE +/- 1.39, N = 3)
0xd000390: 474.69 (SE +/- 1.00, N = 3)

Neural Magic DeepSparse

Model: CV Segmentation, 90% Pruned YOLACT Pruned - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5 - items/sec, More Is Better
0xd0003a5: 83.20 (SE +/- 0.18, N = 3)
0xd000390: 84.02 (SE +/- 0.32, N = 3)

Neural Magic DeepSparse

Model: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5 - ms/batch, Fewer Is Better
0xd0003a5: 19.31 (SE +/- 0.03, N = 3)
0xd000390: 17.35 (SE +/- 0.01, N = 3)

Neural Magic DeepSparse

Model: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5 - items/sec, More Is Better
0xd0003a5: 2068.30 (SE +/- 3.15, N = 3)
0xd000390: 2301.16 (SE +/- 1.14, N = 3)

OSPRay

Benchmark: gravity_spheres_volume/dim_512/scivis/real_time

OSPRay 2.12 - Items Per Second, More Is Better
0xd0003a5: 18.46 (SE +/- 0.03, N = 3)
0xd000390: 20.58 (SE +/- 0.08, N = 3)

OSPRay

Benchmark: gravity_spheres_volume/dim_512/ao/real_time

OSPRay 2.12 - Items Per Second, More Is Better
0xd0003a5: 18.89 (SE +/- 0.02, N = 3)
0xd000390: 21.08 (SE +/- 0.05, N = 3)

Neural Magic DeepSparse

Model: NLP Text Classification, BERT base uncased SST2 - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5 - ms/batch, Fewer Is Better
0xd0003a5: 136.18 (SE +/- 0.60, N = 3)
0xd000390: 129.81 (SE +/- 0.25, N = 3)

Neural Magic DeepSparse

Model: NLP Text Classification, BERT base uncased SST2 - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5 - items/sec, More Is Better
0xd0003a5: 293.12 (SE +/- 1.05, N = 3)
0xd000390: 307.87 (SE +/- 0.57, N = 3)

Neural Magic DeepSparse

Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5 - ms/batch, Fewer Is Better
0xd0003a5: 553.27 (SE +/- 1.19, N = 3)
0xd000390: 551.12 (SE +/- 0.77, N = 3)

Neural Magic DeepSparse

Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5 - items/sec, More Is Better
0xd0003a5: 71.89 (SE +/- 0.10, N = 3)
0xd000390: 72.07 (SE +/- 0.09, N = 3)

Neural Magic DeepSparse

Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5 - ms/batch, Fewer Is Better
0xd0003a5: 555.09 (SE +/- 1.12, N = 3)
0xd000390: 551.82 (SE +/- 0.62, N = 3)

Neural Magic DeepSparse

Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5 - items/sec, More Is Better
0xd0003a5: 71.54 (SE +/- 0.12, N = 3)
0xd000390: 72.18 (SE +/- 0.15, N = 3)

simdjson

Throughput Test: LargeRandom

simdjson 2.0 - GB/s, More Is Better
0xd0003a5: 0.96 (SE +/- 0.00, N = 3)
0xd000390: 0.85 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -O3

VP9 libvpx Encoding

Speed: Speed 5 - Input: Bosphorus 4K

VP9 libvpx Encoding 1.13 - Frames Per Second, More Is Better
0xd0003a5: 12.33 (SE +/- 0.13, N = 3)
0xd000390: 12.63 (SE +/- 0.12, N = 3)
1. (CXX) g++ options: -m64 -lm -lpthread -O3 -fPIC -U_FORTIFY_SOURCE -std=gnu++11

miniBUDE

Implementation: OpenMP - Input Deck: BM2

miniBUDE 20210901 - Billion Interactions/s, More Is Better
0xd0003a5: 104.05 (SE +/- 0.38, N = 3)
0xd000390: 101.08 (SE +/- 0.39, N = 3)
1. (CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm

miniBUDE

Implementation: OpenMP - Input Deck: BM2

miniBUDE 20210901 - GFInst/s, More Is Better
0xd0003a5: 2601.21 (SE +/- 9.53, N = 3)
0xd000390: 2526.89 (SE +/- 9.78, N = 3)
1. (CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm

SPECFEM3D

Model: Water-layered Halfspace

SPECFEM3D 4.0 - Seconds, Fewer Is Better
0xd0003a5: 31.38 (SE +/- 0.31, N = 5)
0xd000390: 31.15 (SE +/- 0.24, N = 3)
1. (F9X) gfortran options: -O2 -fopenmp -std=f2003 -fimplicit-none -fmax-errors=10 -pedantic -pedantic-errors -O3 -finline-functions -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

oneDNN

Harness: IP Shapes 3D - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 3.1 - ms, Fewer Is Better
0xd0003a5: 2.57437 (SE +/- 0.02489, N = 15; MIN: 1.99)
0xd000390: 2.59271 (SE +/- 0.03265, N = 15; MIN: 1.91)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

Neural Magic DeepSparse

Model: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5 - ms/batch, Fewer Is Better
0xd0003a5: 172.56 (SE +/- 1.64, N = 3)
0xd000390: 173.79 (SE +/- 0.34, N = 3)

Neural Magic DeepSparse

Model: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5 - items/sec, More Is Better
0xd0003a5: 231.43 (SE +/- 2.37, N = 3)
0xd000390: 230.05 (SE +/- 0.46, N = 3)

Neural Magic DeepSparse

Model: NLP Sentiment Analysis, 80% Pruned Quantized BERT Base Uncased - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5 - ms/batch, Fewer Is Better
0xd0003a5: 42.94 (SE +/- 0.06, N = 3)
0xd000390: 43.30 (SE +/- 0.53, N = 3)

Neural Magic DeepSparse

Model: NLP Sentiment Analysis, 80% Pruned Quantized BERT Base Uncased - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5 - items/sec, More Is Better
0xd0003a5: 930.75 (SE +/- 1.22, N = 3)
0xd000390: 922.93 (SE +/- 11.47, N = 3)

Laghos

Test: Triple Point Problem

Laghos 3.1 - Major Kernels Total Rate, More Is Better
0xd0003a5: 256.87 (SE +/- 0.94, N = 3)
0xd000390: 256.27 (SE +/- 0.32, N = 3)
1. (CXX) g++ options: -O3 -std=c++11 -lmfem -lHYPRE -lmetis -lrt -lmpi_cxx -lmpi

Neural Magic DeepSparse

Model: NLP Text Classification, DistilBERT mnli - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5 - ms/batch, Fewer Is Better
0xd0003a5: 64.44 (SE +/- 0.78, N = 3)
0xd000390: 62.01 (SE +/- 0.23, N = 3)

Neural Magic DeepSparse

Model: NLP Text Classification, DistilBERT mnli - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5 - items/sec, More Is Better
0xd0003a5: 620.25 (SE +/- 7.47, N = 3)
0xd000390: 644.44 (SE +/- 2.45, N = 3)

QMCPACK

Input: simple-H2O

QMCPACK 3.16 - Total Execution Time - Seconds, Fewer Is Better
0xd0003a5: 41.25 (SE +/- 0.02, N = 3)
0xd000390: 39.56 (SE +/- 0.12, N = 3)
1. (CXX) g++ options: -fopenmp -foffload=disable -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -ffast-math -march=native -O3 -lm -ldl

TensorFlow

Device: CPU - Batch Size: 256 - Model: AlexNet

TensorFlow 2.12 - images/sec, More Is Better
0xd0003a5: 781.25 (SE +/- 4.27, N = 3)
0xd000390: 723.27 (SE +/- 2.99, N = 3)

Neural Magic DeepSparse

Model: CV Detection, YOLOv5s COCO - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5 - ms/batch, Fewer Is Better
0xd0003a5: 96.94 (SE +/- 0.29, N = 3)
0xd000390: 94.61 (SE +/- 0.07, N = 3)

Neural Magic DeepSparse

Model: CV Detection, YOLOv5s COCO - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5 - items/sec, More Is Better
0xd0003a5: 412.05 (SE +/- 1.24, N = 3)
0xd000390: 422.12 (SE +/- 0.36, N = 3)

Neural Magic DeepSparse

Model: CV Classification, ResNet-50 ImageNet - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5 - ms/batch, Fewer Is Better
0xd0003a5: 40.45 (SE +/- 0.10, N = 3)
0xd000390: 39.73 (SE +/- 0.06, N = 3)

Neural Magic DeepSparse

Model: CV Classification, ResNet-50 ImageNet - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5 - items/sec, More Is Better
0xd0003a5: 987.55 (SE +/- 2.17, N = 3)
0xd000390: 1005.58 (SE +/- 1.41, N = 3)

Neural Magic DeepSparse

Model: ResNet-50, Baseline - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5 - ms/batch, Fewer Is Better
0xd0003a5: 39.41 (SE +/- 0.01, N = 3)
0xd000390: 39.70 (SE +/- 0.02, N = 3)

Neural Magic DeepSparse

Model: ResNet-50, Baseline - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5 - items/sec, More Is Better
0xd0003a5: 1013.82 (SE +/- 0.31, N = 3)
0xd000390: 1006.34 (SE +/- 0.56, N = 3)

Neural Magic DeepSparse

Model: ResNet-50, Sparse INT8 - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5 - ms/batch, Fewer Is Better
0xd0003a5: 5.6210 (SE +/- 0.0018, N = 3)
0xd000390: 5.2210 (SE +/- 0.0056, N = 3)

Neural Magic DeepSparse

Model: ResNet-50, Sparse INT8 - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5 - items/sec, More Is Better
0xd0003a5: 7092.50 (SE +/- 2.44, N = 3)
0xd000390: 7633.26 (SE +/- 8.14, N = 3)

libxsmm

M N K: 64

libxsmm 2-1.17-3645 - GFLOPS/s, More Is Better
0xd0003a5: 1177.1 (SE +/- 13.09, N = 15)
0xd000390: 1098.8 (SE +/- 8.60, N = 3)
1. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -msse4.2

SPECFEM3D

Model: Layered Halfspace

SPECFEM3D 4.0 - Seconds, Fewer Is Better
0xd0003a5: 29.37 (SE +/- 0.18, N = 3)
0xd000390: 29.50 (SE +/- 0.05, N = 3)
1. (F9X) gfortran options: -O2 -fopenmp -std=f2003 -fimplicit-none -fmax-errors=10 -pedantic -pedantic-errors -O3 -finline-functions -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

Blender

Blend File: Fishy Cat - Compute: CPU-Only

Blender 3.6 - Seconds, Fewer Is Better
0xd0003a5: 30.90 (SE +/- 0.02, N = 3)
0xd000390: 30.74 (SE +/- 0.06, N = 3)

libxsmm

M N K: 32

libxsmm 2-1.17-3645 - GFLOPS/s, More Is Better
0xd0003a5: 609.2 (SE +/- 4.92, N = 3)
0xd000390: 604.7 (SE +/- 8.71, N = 15)
1. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -msse4.2

Cpuminer-Opt

Algorithm: LBC, LBRY Credits

Cpuminer-Opt 3.20.3 - kH/s, More Is Better
0xd0003a5: 423130 (SE +/- 860.95, N = 3)
0xd000390: 421660 (SE +/- 313.42, N = 3)
1. (CXX) g++ options: -O2 -lcurl -lz -lpthread -lssl -lcrypto -lgmp

Cpuminer-Opt

Algorithm: scrypt

Cpuminer-Opt 3.20.3 - kH/s, More Is Better
0xd0003a5: 2321.74 (SE +/- 8.35, N = 3)
0xd000390: 2319.31 (SE +/- 1.08, N = 3)
1. (CXX) g++ options: -O2 -lcurl -lz -lpthread -lssl -lcrypto -lgmp

Cpuminer-Opt

Algorithm: Skeincoin

Cpuminer-Opt 3.20.3 - kH/s, More Is Better
0xd0003a5: 617130 (SE +/- 3788.20, N = 3)
0xd000390: 613333 (SE +/- 1652.89, N = 3)
1. (CXX) g++ options: -O2 -lcurl -lz -lpthread -lssl -lcrypto -lgmp

Cpuminer-Opt

Algorithm: Blake-2 S

Cpuminer-Opt 3.20.3 - kH/s, More Is Better
0xd0003a5: 4466653 (SE +/- 8676.85, N = 3)
0xd000390: 4462327 (SE +/- 8325.80, N = 3)
1. (CXX) g++ options: -O2 -lcurl -lz -lpthread -lssl -lcrypto -lgmp

Cpuminer-Opt

Algorithm: Magi

Cpuminer-Opt 3.20.3 - kH/s, More Is Better
0xd0003a5: 2308.66 (SE +/- 1.08, N = 3)
0xd000390: 2309.47 (SE +/- 3.91, N = 3)
1. (CXX) g++ options: -O2 -lcurl -lz -lpthread -lssl -lcrypto -lgmp

Cpuminer-Opt

Algorithm: Deepcoin

Cpuminer-Opt 3.20.3 - kH/s, More Is Better
0xd0003a5: 64897 (SE +/- 187.02, N = 3)
0xd000390: 64677 (SE +/- 89.69, N = 3)
1. (CXX) g++ options: -O2 -lcurl -lz -lpthread -lssl -lcrypto -lgmp

Cpuminer-Opt

Algorithm: Triple SHA-256, Onecoin

Cpuminer-Opt 3.20.3 - kH/s, More Is Better
0xd0003a5: 1333117 (SE +/- 7235.75, N = 3)
0xd000390: 1332237 (SE +/- 7105.40, N = 3)
1. (CXX) g++ options: -O2 -lcurl -lz -lpthread -lssl -lcrypto -lgmp

Cpuminer-Opt

Algorithm: x25x

Cpuminer-Opt 3.20.3 - kH/s, More Is Better
0xd0003a5: 2659.55 (SE +/- 5.54, N = 3)
0xd000390: 2659.17 (SE +/- 4.75, N = 3)
1. (CXX) g++ options: -O2 -lcurl -lz -lpthread -lssl -lcrypto -lgmp

Cpuminer-Opt

Algorithm: Quad SHA-256, Pyrite

Cpuminer-Opt 3.20.3 - kH/s, More Is Better
0xd0003a5: 926277 (SE +/- 1690.96, N = 3)
0xd000390: 921730 (SE +/- 3352.41, N = 3)
1. (CXX) g++ options: -O2 -lcurl -lz -lpthread -lssl -lcrypto -lgmp

GROMACS

Implementation: MPI CPU - Input: water_GMX50_bare

GROMACS 2023 - Ns Per Day, More Is Better
0xd0003a5: 9.094 (SE +/- 0.026, N = 3)
0xd000390: 9.234 (SE +/- 0.021, N = 3)
1. (CXX) g++ options: -O3

Blender

Blend File: BMW27 - Compute: CPU-Only

Blender 3.6 - Seconds, Fewer Is Better
0xd0003a5: 23.72 (SE +/- 0.03, N = 3)
0xd000390: 23.83 (SE +/- 0.06, N = 3)

oneDNN

Harness: Deconvolution Batch shapes_1d - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 3.1 - ms, Fewer Is Better
0xd0003a5: 3.91322 (SE +/- 0.00080, N = 3; MIN: 3.69)
0xd000390: 3.90360 (SE +/- 0.00202, N = 3; MIN: 3.68)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

SPECFEM3D

Model: Homogeneous Halfspace

SPECFEM3D 4.0 - Seconds, Fewer Is Better
0xd0003a5: 17.75 (SE +/- 0.16, N = 3)
0xd000390: 18.02 (SE +/- 0.15, N = 3)
1. (F9X) gfortran options: -O2 -fopenmp -std=f2003 -fimplicit-none -fmax-errors=10 -pedantic -pedantic-errors -O3 -finline-functions -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

dav1d

Video Input: Chimera 1080p

dav1d 1.2.1 - FPS, More Is Better
0xd0003a5: 514.58 (SE +/- 0.51, N = 3)
0xd000390: 515.81 (SE +/- 0.80, N = 3)
1. (CC) gcc options: -pthread -lm

SPECFEM3D

Model: Tomographic Model

SPECFEM3D 4.0 (Seconds, Fewer Is Better)
0xd0003a5: 14.15 (SE +/- 0.15, N = 3)
0xd000390: 14.57 (SE +/- 0.02, N = 3)
(F9X) gfortran options: -O2 -fopenmp -std=f2003 -fimplicit-none -fmax-errors=10 -pedantic -pedantic-errors -O3 -finline-functions -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

SPECFEM3D

Model: Mount St. Helens

SPECFEM3D 4.0 (Seconds, Fewer Is Better)
0xd0003a5: 12.95 (SE +/- 0.12, N = 3)
0xd000390: 13.15 (SE +/- 0.18, N = 3)
(F9X) gfortran options: -O2 -fopenmp -std=f2003 -fimplicit-none -fmax-errors=10 -pedantic -pedantic-errors -O3 -finline-functions -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

Intel Open Image Denoise

Run: RTLightmap.hdr.4096x4096 - Device: CPU-Only

Intel Open Image Denoise 2.0 (Images / Sec, More Is Better)
0xd0003a5: 1.46 (SE +/- 0.00, N = 3)
0xd000390: 1.46 (SE +/- 0.00, N = 3)

dav1d

Video Input: Summer Nature 4K

dav1d 1.2.1 (FPS, More Is Better)
0xd0003a5: 280.84 (SE +/- 0.81, N = 3)
0xd000390: 281.36 (SE +/- 1.06, N = 3)
(CC) gcc options: -pthread -lm

SVT-AV1

Encoder Mode: Preset 8 - Input: Bosphorus 4K

SVT-AV1 1.6 (Frames Per Second, More Is Better)
0xd0003a5: 66.46 (SE +/- 0.46, N = 3)
0xd000390: 67.17 (SE +/- 0.47, N = 3)
(CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq

Remhos

Test: Sample Remap Example

Remhos 1.0 (Seconds, Fewer Is Better)
0xd0003a5: 12.40 (SE +/- 0.04, N = 3)
0xd000390: 12.25 (SE +/- 0.03, N = 3)
(CXX) g++ options: -O3 -std=c++11 -lmfem -lHYPRE -lmetis -lrt -lmpi_cxx -lmpi

CloverLeaf

Lagrangian-Eulerian Hydrodynamics

CloverLeaf (Seconds, Fewer Is Better)
0xd0003a5: 11.98 (SE +/- 0.09, N = 3)
0xd000390: 12.04 (SE +/- 0.06, N = 3)
(F9X) gfortran options: -O3 -march=native -funroll-loops -fopenmp

Xcompact3d Incompact3d

Input: input.i3d 193 Cells Per Direction

Xcompact3d Incompact3d 2021-03-11 (Seconds, Fewer Is Better)
0xd0003a5: 11.00 (SE +/- 0.02, N = 3)
0xd000390: 11.02 (SE +/- 0.02, N = 3)
(F9X) gfortran options: -cpp -O2 -funroll-loops -floop-optimize -fcray-pointer -fbacktrace -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

miniBUDE

Implementation: OpenMP - Input Deck: BM1

miniBUDE 20210901 (Billion Interactions/s, More Is Better)
0xd0003a5: 94.90 (SE +/- 0.44, N = 3)
0xd000390: 94.14 (SE +/- 0.14, N = 3)
(CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm

miniBUDE

Implementation: OpenMP - Input Deck: BM1

miniBUDE 20210901 (GFInst/s, More Is Better)
0xd0003a5: 2372.42 (SE +/- 10.90, N = 3)
0xd000390: 2353.39 (SE +/- 3.60, N = 3)
(CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm

Embree

Binary: Pathtracer ISPC - Model: Crown

Embree 4.1 (Frames Per Second, More Is Better)
0xd0003a5: 83.46 (SE +/- 0.32, N = 3; MIN: 80.27 / MAX: 87.69)
0xd000390: 88.19 (SE +/- 0.07, N = 3; MIN: 85.24 / MAX: 92.72)

Intel Open Image Denoise

Run: RT.ldr_alb_nrm.3840x2160 - Device: CPU-Only

Intel Open Image Denoise 2.0 (Images / Sec, More Is Better)
0xd0003a5: 3.03 (SE +/- 0.00, N = 3)
0xd000390: 3.03 (SE +/- 0.00, N = 3)

Embree

Binary: Pathtracer ISPC - Model: Asian Dragon

Embree 4.1 (Frames Per Second, More Is Better)
0xd0003a5: 101.10 (SE +/- 0.19, N = 3; MIN: 98.59 / MAX: 105.53)
0xd000390: 104.68 (SE +/- 0.43, N = 3; MIN: 101.9 / MAX: 109.48)

SVT-HEVC

Tuning: 7 - Input: Bosphorus 4K

SVT-HEVC 1.5.0 (Frames Per Second, More Is Better)
0xd0003a5: 138.49 (SE +/- 0.44, N = 3)
0xd000390: 138.75 (SE +/- 0.60, N = 3)
(CC) gcc options: -fPIE -fPIC -O3 -O2 -pie -rdynamic -lpthread -lrt

oneDNN

Harness: Convolution Batch Shapes Auto - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 3.1 (ms, Fewer Is Better)
0xd0003a5: 2.06967 (SE +/- 0.00191, N = 3; MIN: 2.03)
0xd000390: 2.06936 (SE +/- 0.00038, N = 3; MIN: 2.03)
(CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

SVT-HEVC

Tuning: 10 - Input: Bosphorus 4K

SVT-HEVC 1.5.0 (Frames Per Second, More Is Better)
0xd0003a5: 182.74 (SE +/- 0.38, N = 3)
0xd000390: 184.38 (SE +/- 2.03, N = 3)
(CC) gcc options: -fPIE -fPIC -O3 -O2 -pie -rdynamic -lpthread -lrt

SVT-AV1

Encoder Mode: Preset 12 - Input: Bosphorus 4K

SVT-AV1 1.6 (Frames Per Second, More Is Better)
0xd0003a5: 177.72 (SE +/- 1.20, N = 3)
0xd000390: 180.97 (SE +/- 1.29, N = 3)
(CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq

SVT-AV1

Encoder Mode: Preset 13 - Input: Bosphorus 4K

SVT-AV1 1.6 (Frames Per Second, More Is Better)
0xd0003a5: 177.20 (SE +/- 2.02, N = 3)
0xd000390: 175.10 (SE +/- 0.79, N = 3)
(CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: FFTW - Precision: double - X Y Z: 256

HeFFTe - Highly Efficient FFT for Exascale 2.3 (GFLOP/s, More Is Better)
0xd0003a5: 46.98 (SE +/- 0.61, N = 3)
0xd000390: 46.35 (SE +/- 0.25, N = 3)
(CXX) g++ options: -O3

oneDNN

Harness: Deconvolution Batch shapes_3d - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 3.1 (ms, Fewer Is Better)
0xd0003a5: 3.62266 (SE +/- 0.00622, N = 3; MIN: 3.53)
0xd000390: 3.62526 (SE +/- 0.00896, N = 3; MIN: 3.54)
(CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: FFTW - Precision: double - X Y Z: 256

HeFFTe - Highly Efficient FFT for Exascale 2.3 (GFLOP/s, More Is Better)
0xd0003a5: 94.86 (SE +/- 1.13, N = 3)
0xd000390: 93.75 (SE +/- 1.09, N = 3)
(CXX) g++ options: -O3

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: FFTW - Precision: float - X Y Z: 256

HeFFTe - Highly Efficient FFT for Exascale 2.3 (GFLOP/s, More Is Better)
0xd0003a5: 102.92 (SE +/- 0.20, N = 3)
0xd000390: 101.98 (SE +/- 0.27, N = 3)
(CXX) g++ options: -O3

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: FFTW - Precision: float - X Y Z: 256

HeFFTe - Highly Efficient FFT for Exascale 2.3 (GFLOP/s, More Is Better)
0xd0003a5: 226.78 (SE +/- 2.85, N = 3)
0xd000390: 224.42 (SE +/- 2.25, N = 3)
(CXX) g++ options: -O3

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: FFTW - Precision: float - X Y Z: 128

HeFFTe - Highly Efficient FFT for Exascale 2.3 (GFLOP/s, More Is Better)
0xd0003a5: 159.10 (SE +/- 1.01, N = 3)
0xd000390: 154.73 (SE +/- 1.53, N = 5)
(CXX) g++ options: -O3

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: FFTW - Precision: double - X Y Z: 128

HeFFTe - Highly Efficient FFT for Exascale 2.3 (GFLOP/s, More Is Better)
0xd0003a5: 153.79 (SE +/- 1.71, N = 3)
0xd000390: 144.94 (SE +/- 1.81, N = 4)
(CXX) g++ options: -O3

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: FFTW - Precision: double - X Y Z: 128

HeFFTe - Highly Efficient FFT for Exascale 2.3 (GFLOP/s, More Is Better)
0xd0003a5: 93.92 (SE +/- 0.28, N = 3)
0xd000390: 93.09 (SE +/- 0.15, N = 3)
(CXX) g++ options: -O3

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: FFTW - Precision: float - X Y Z: 128

HeFFTe - Highly Efficient FFT for Exascale 2.3 (GFLOP/s, More Is Better)
0xd0003a5: 198.87 (SE +/- 1.08, N = 3)
0xd000390: 195.20 (SE +/- 0.92, N = 3)
(CXX) g++ options: -O3
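
Reading the results

Each result above reports a mean and a standard error (SE) per microcode revision. One rough way to judge whether a gap between 0xd0003a5 and 0xd000390 is larger than run-to-run noise is to compare the difference of the means with the combined standard error. The Python sketch below does that for two results from this file (Embree Pathtracer ISPC, Crown, and Cpuminer-Opt x25x). The `compare` helper and the idea of using the combined SE as a yardstick are illustrative assumptions, not part of the Phoronix Test Suite output, and with N = 3 runs the ratio is only a coarse indicator.

```python
import math


def compare(a_mean, a_se, b_mean, b_se):
    """Return (percent change of b relative to a, |difference| / combined SE).

    Hypothetical helper for illustration; not a Phoronix Test Suite function.
    """
    diff = b_mean - a_mean
    pct = 100.0 * diff / a_mean
    combined_se = math.sqrt(a_se ** 2 + b_se ** 2)
    return pct, abs(diff) / combined_se


# (name, 0xd0003a5 mean, SE, 0xd000390 mean, SE), values copied from the results above
results = [
    ("Embree 4.1, Pathtracer ISPC, Crown (FPS)", 83.46, 0.32, 88.19, 0.07),
    ("Cpuminer-Opt 3.20.3, x25x (kH/s)", 2659.55, 5.54, 2659.17, 4.75),
]

for name, a_mean, a_se, b_mean, b_se in results:
    pct, ratio = compare(a_mean, a_se, b_mean, b_se)
    print(f"{name}: 0xd000390 vs 0xd0003a5 = {pct:+.2f}%, "
          f"difference = {ratio:.1f} x combined SE")
```

A ratio well above 1 (as for the Embree Crown result) suggests the gap exceeds the measured run-to-run variation, while a ratio well below 1 (as for x25x) suggests the two revisions are indistinguishable on that test.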


Phoronix Test Suite v10.8.5