10980XE onednn onnx

Intel Core i9-10980XE testing with a ASRock X299 Steel Legend (P1.30 BIOS) and NVIDIA GeForce GTX 1080 Ti 11GB on Ubuntu 22.04 via the Phoronix Test Suite.

HTML result view exported from: https://openbenchmarking.org/result/2204024-PTS-10980XEO27&grs&sro.

10980XE onednn onnxProcessorMotherboardChipsetMemoryDiskGraphicsAudioMonitorNetworkOSKernelDesktopDisplay ServerDisplay DriverOpenGLOpenCLVulkanCompilerFile-SystemScreen ResolutionABCDIntel Core i9-10980XE @ 4.80GHz (18 Cores / 36 Threads)ASRock X299 Steel Legend (P1.30 BIOS)Intel Sky Lake-E DMI3 Registers32GBSamsung SSD 970 PRO 512GBNVIDIA GeForce GTX 1080 Ti 11GBRealtek ALC1220ASUS VP28UIntel I219-V + Intel I211Ubuntu 22.045.15.0-17-generic (x86_64)GNOME Shell 40.5X Server 1.20.13NVIDIA 495.464.6.0OpenCL 3.0 CUDA 11.5.1031.2.186GCC 11.2.0ext43840x2160OpenBenchmarking.orgKernel Details- Transparent Huge Pages: madviseCompiler Details- --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-11-pWTZs6/gcc-11-11.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-pWTZs6/gcc-11-11.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v Processor Details- Scaling Governor: intel_cpufreq schedutil - CPU Microcode: 0x5003102Java Details- OpenJDK Runtime Environment (build 11.0.13+8-Ubuntu-0ubuntu1)Python Details- Python 3.9.9Security Details- itlb_multihit: KVM: Mitigation of VMX disabled + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced IBRS IBPB: conditional RSB filling + srbds: Not affected + tsx_async_abort: Mitigation of TSX disabled

10980XE onednn onnxonednn: Deconvolution Batch shapes_1d - u8s8f32 - CPUonednn: Deconvolution Batch shapes_3d - u8s8f32 - CPUspeedtest-cli: Internet Latencyonnx: super-resolution-10 - CPU - Standardfast-cli: Internet Latencyperf-bench: Sched Pipeonednn: Matrix Multiply Batch Shapes Transformer - f32 - CPUonnx: ArcFace ResNet-100 - CPU - Standardonednn: Recurrent Neural Network Training - f32 - CPUonnx: fcn-resnet101-11 - CPU - Standardonnx: yolov4 - CPU - Standardfast-cli: Internet Upload Speedspeedtest-cli: Internet Download Speedonednn: Deconvolution Batch shapes_1d - bf16bf16bf16 - CPUperf-bench: Epoll Waitfast-cli: Internet Loaded Latency (Bufferbloat)onednn: Deconvolution Batch shapes_1d - f32 - CPUspeedtest-cli: Internet Upload Speedonednn: Matrix Multiply Batch Shapes Transformer - u8s8f32 - CPUperf-bench: Futex Lock-Pionnx: GPT-2 - CPU - Standardfast-cli: Internet Download Speedonednn: Deconvolution Batch shapes_3d - bf16bf16bf16 - CPUonednn: IP Shapes 1D - u8s8f32 - CPUperf-bench: Memcpy 1MBonednn: Recurrent Neural Network Inference - u8s8f32 - CPUonnx: GPT-2 - CPU - Parallelperf-bench: Memset 1MBonednn: Deconvolution Batch shapes_3d - f32 - CPUonednn: Recurrent Neural Network Inference - bf16bf16bf16 - CPUonednn: Recurrent Neural Network Training - u8s8f32 - CPUonednn: Convolution Batch Shapes Auto - bf16bf16bf16 - CPUonednn: Matrix Multiply Batch Shapes Transformer - bf16bf16bf16 - CPUonednn: IP Shapes 1D - f32 - CPUonednn: IP Shapes 3D - u8s8f32 - CPUonednn: Recurrent Neural Network Inference - f32 - CPUonednn: Recurrent Neural Network Training - bf16bf16bf16 - CPUonnx: bertsquad-12 - CPU - Standardonnx: bertsquad-12 - CPU - Parallelperf-bench: Syscall Basiconednn: IP Shapes 3D - f32 - CPUonednn: IP Shapes 3D - bf16bf16bf16 - CPUonednn: IP Shapes 1D - bf16bf16bf16 - CPUonnx: ArcFace ResNet-100 - CPU - Parallelonednn: Convolution Batch Shapes Auto - f32 - CPUonnx: super-resolution-10 - CPU - Parallelonednn: Convolution Batch Shapes Auto - u8s8f32 - CPUonnx: yolov4 - CPU - Parallelonnx: fcn-resnet101-11 - CPU - Parallelperf-bench: Futex Hashjava-jmh: ThroughputABCD50.29969.3334523.87616888698235.2021205323584.21486656.7324.65116.514311577399.189.4730.7784236874937022.399683.898417.38256428356.5579966.05773219.08562715523516.127.001637.793787.069344.430426069.622990.48977821714054558.075855.929393.3073146424.0564589023.7444444101449439433246015808.70250.15539.4042415.487988487830136.3562153823366.81196636.9321.74135.4733061564109.8089.2234.5221215881237023.902188.358616.76649827247.5579568.32113618.698727353.622790.925.887139.048589.024644.216826549.222726.79117741722154458.289556.061393.432147424.384597523.7723448102449341533247767440.9890.7749191.6290728.7821311101426.99118304.96807.9280.18130.9583555664107.7148.430.8774222959539022.764589.515317.77937728624.4611469.50083619.668728513.622788.526.180437.782387.062645.834726033.722605.39237861714583257.245856.731293.275324.421824.0619447101451394433236796629.9254.02629.3523522.67710255107623731.2678201423371.51485696.8327.12118.1543520570112.6238.9533.3777231959136023.019589.009917.27710628758.8607268.90292318.944828429.223844.125.941537.553190.322644.399926872.523319.68977961750414658.436155.582791.696149124.3468596923.7889449101449908133246121586.081OpenBenchmarking.org

oneDNN

Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU

OpenBenchmarking.orgms, Fewer Is BetteroneDNN 2.6Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPUABCD122436486050.29960050.1553000.77491954.026200MIN: 9.93MIN: 10.62MIN: 10.841. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -ldl -lpthread

oneDNN

Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU

OpenBenchmarking.orgms, Fewer Is BetteroneDNN 2.6Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPUABCD36912159.333459.404241.629079.35235MIN: 0.66MIN: 8.8MIN: 8.081. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -ldl -lpthread

speedtest-cli

Internet Latency

OpenBenchmarking.orgms, Fewer Is Betterspeedtest-cli 2.1.3Internet LatencyABCD71421283523.8715.4928.7822.68

ONNX Runtime

Model: super-resolution-10 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInferences Per Minute, More Is BetterONNX Runtime 1.11Model: super-resolution-10 - Device: CPU - Executor: StandardABD2K4K6K8K10K61689884102551. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt

fast-cli

Internet Latency

OpenBenchmarking.orgms, Fewer Is Betterfast-cliInternet LatencyABCD3691215881310

perf-bench

Benchmark: Sched Pipe

OpenBenchmarking.orgops/sec, More Is Betterperf-benchBenchmark: Sched PipeABCD20K40K60K80K100K8698278301111014762371. (CC) gcc options: -O6 -ggdb3 -funwind-tables -std=gnu99 -lunwind-x86_64 -lunwind -llzma -Xlinker -lpthread -lrt -lm -ldl -lelf -lslang -lz -lnuma

oneDNN

Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU

OpenBenchmarking.orgms, Fewer Is BetteroneDNN 2.6Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPUABCD81624324035.2036.3626.9931.27MIN: 15.38MIN: 25.91MIN: 1.4MIN: 11.611. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -ldl -lpthread

ONNX Runtime

Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInferences Per Minute, More Is BetterONNX Runtime 1.11Model: ArcFace ResNet-100 - Device: CPU - Executor: StandardABD4008001200160020002053153820141. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt

oneDNN

Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU

OpenBenchmarking.orgms, Fewer Is BetteroneDNN 2.6Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPUABCD5K10K15K20K25K23584.223366.818304.923371.5MIN: 21773.9MIN: 19714.8MIN: 16470MIN: 20963.91. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -ldl -lpthread

ONNX Runtime

Model: fcn-resnet101-11 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInferences Per Minute, More Is BetterONNX Runtime 1.11Model: fcn-resnet101-11 - Device: CPU - Executor: StandardABD3060901201501481191481. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: yolov4 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInferences Per Minute, More Is BetterONNX Runtime 1.11Model: yolov4 - Device: CPU - Executor: StandardABCD1503004506007506656636805691. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt

fast-cli

Internet Upload Speed

OpenBenchmarking.orgMbit/s, More Is Betterfast-cliInternet Upload SpeedABCD2468106.76.97.96.8

speedtest-cli

Internet Download Speed

OpenBenchmarking.orgMbit/s, More Is Betterspeedtest-cli 2.1.3Internet Download SpeedABCD70140210280350324.65321.74280.18327.12

oneDNN

Harness: Deconvolution Batch shapes_1d - Data Type: bf16bf16bf16 - Engine: CPU

OpenBenchmarking.orgms, Fewer Is BetteroneDNN 2.6Harness: Deconvolution Batch shapes_1d - Data Type: bf16bf16bf16 - Engine: CPUABCD306090120150116.51135.47130.96118.15MIN: 32.08MIN: 51.31MIN: 35.29MIN: 16.461. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -ldl -lpthread

perf-bench

Benchmark: Epoll Wait

OpenBenchmarking.orgops/sec, More Is Betterperf-benchBenchmark: Epoll WaitABCD8K16K24K32K40K311573061535556352051. (CC) gcc options: -O6 -ggdb3 -funwind-tables -std=gnu99 -lunwind-x86_64 -lunwind -llzma -Xlinker -lpthread -lrt -lm -ldl -lelf -lslang -lz -lnuma

fast-cli

Internet Loaded Latency (Bufferbloat)

OpenBenchmarking.orgms, Fewer Is Betterfast-cliInternet Loaded Latency (Bufferbloat)ABCD163248648073646470

oneDNN

Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU

OpenBenchmarking.orgms, Fewer Is BetteroneDNN 2.6Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPUABCD30609012015099.18109.81107.71112.62MIN: 6.42MIN: 24.19MIN: 23.72MIN: 65.191. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -ldl -lpthread

speedtest-cli

Internet Upload Speed

OpenBenchmarking.orgMbit/s, More Is Betterspeedtest-cli 2.1.3Internet Upload SpeedABCD36912159.479.228.408.95

oneDNN

Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32 - Engine: CPU

OpenBenchmarking.orgms, Fewer Is BetteroneDNN 2.6Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32 - Engine: CPUABCD81624324030.7834.5230.8833.38MIN: 21.88MIN: 16.34MIN: 10.41MIN: 15.351. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -ldl -lpthread

perf-bench

Benchmark: Futex Lock-Pi

OpenBenchmarking.orgops/sec, More Is Betterperf-benchBenchmark: Futex Lock-PiABCD501001502002502362152222311. (CC) gcc options: -O6 -ggdb3 -funwind-tables -std=gnu99 -lunwind-x86_64 -lunwind -llzma -Xlinker -lpthread -lrt -lm -ldl -lelf -lslang -lz -lnuma

ONNX Runtime

Model: GPT-2 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInferences Per Minute, More Is BetterONNX Runtime 1.11Model: GPT-2 - Device: CPU - Executor: StandardABCD2K4K6K8K10K87498812959595911. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt

fast-cli

Internet Download Speed

OpenBenchmarking.orgMbit/s, More Is Betterfast-cliInternet Download SpeedABCD80160240320400370370390360

oneDNN

Harness: Deconvolution Batch shapes_3d - Data Type: bf16bf16bf16 - Engine: CPU

OpenBenchmarking.orgms, Fewer Is BetteroneDNN 2.6Harness: Deconvolution Batch shapes_3d - Data Type: bf16bf16bf16 - Engine: CPUABCD61218243022.4023.9022.7623.02MIN: 17.08MIN: 19.53MIN: 19.46MIN: 19.251. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -ldl -lpthread

oneDNN

Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPU

OpenBenchmarking.orgms, Fewer Is BetteroneDNN 2.6Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPUABCD2040608010083.9088.3689.5289.01MIN: 35.65MIN: 57.57MIN: 54.63MIN: 64.341. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -ldl -lpthread

perf-bench

Benchmark: Memcpy 1MB

OpenBenchmarking.orgGB/sec, More Is Betterperf-benchBenchmark: Memcpy 1MBABCD4812162017.3816.7717.7817.281. (CC) gcc options: -O6 -ggdb3 -funwind-tables -std=gnu99 -lunwind-x86_64 -lunwind -llzma -Xlinker -lpthread -lrt -lm -ldl -lelf -lslang -lz -lnuma

oneDNN

Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU

OpenBenchmarking.orgms, Fewer Is BetteroneDNN 2.6Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPUABCD6K12K18K24K30K28356.527247.528624.428758.8MIN: 24027MIN: 22350MIN: 25457.4MIN: 237471. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -ldl -lpthread

ONNX Runtime

Model: GPT-2 - Device: CPU - Executor: Parallel

OpenBenchmarking.orgInferences Per Minute, More Is BetterONNX Runtime 1.11Model: GPT-2 - Device: CPU - Executor: ParallelABCD1300260039005200650057995795611460721. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt

perf-bench

Benchmark: Memset 1MB

OpenBenchmarking.orgGB/sec, More Is Betterperf-benchBenchmark: Memset 1MBABCD153045607566.0668.3269.5068.901. (CC) gcc options: -O6 -ggdb3 -funwind-tables -std=gnu99 -lunwind-x86_64 -lunwind -llzma -Xlinker -lpthread -lrt -lm -ldl -lelf -lslang -lz -lnuma

oneDNN

Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU

OpenBenchmarking.orgms, Fewer Is BetteroneDNN 2.6Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPUABCD51015202519.0918.7019.6718.94MIN: 2.83MIN: 11.49MIN: 3.02MIN: 16.511. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -ldl -lpthread

oneDNN

Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU

OpenBenchmarking.orgms, Fewer Is BetteroneDNN 2.6Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPUABCD6K12K18K24K30K27155.027353.628513.628429.2MIN: 21711.8MIN: 23193.9MIN: 24411.3MIN: 24528.91. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -ldl -lpthread

oneDNN

Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU

OpenBenchmarking.orgms, Fewer Is BetteroneDNN 2.6Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPUABCD5K10K15K20K25K23516.122790.922788.523844.1MIN: 21274.9MIN: 19530.9MIN: 19335.2MIN: 22253.51. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -ldl -lpthread

oneDNN

Harness: Convolution Batch Shapes Auto - Data Type: bf16bf16bf16 - Engine: CPU

OpenBenchmarking.orgms, Fewer Is BetteroneDNN 2.6Harness: Convolution Batch Shapes Auto - Data Type: bf16bf16bf16 - Engine: CPUABCD61218243027.0025.8926.1825.94MIN: 12.07MIN: 11.87MIN: 12.79MIN: 12.271. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -ldl -lpthread

oneDNN

Harness: Matrix Multiply Batch Shapes Transformer - Data Type: bf16bf16bf16 - Engine: CPU

OpenBenchmarking.orgms, Fewer Is BetteroneDNN 2.6Harness: Matrix Multiply Batch Shapes Transformer - Data Type: bf16bf16bf16 - Engine: CPUABCD91827364537.7939.0537.7837.55MIN: 20.51MIN: 28.78MIN: 17.54MIN: 24.181. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -ldl -lpthread

oneDNN

Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU

OpenBenchmarking.orgms, Fewer Is BetteroneDNN 2.6Harness: IP Shapes 1D - Data Type: f32 - Engine: CPUABCD2040608010087.0789.0287.0690.32MIN: 32.83MIN: 33.04MIN: 49.86MIN: 70.911. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -ldl -lpthread

oneDNN

Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU

OpenBenchmarking.orgms, Fewer Is BetteroneDNN 2.6Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPUABCD102030405044.4344.2245.8344.40MIN: 19.44MIN: 32.09MIN: 24.2MIN: 6.121. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -ldl -lpthread

oneDNN

Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU

OpenBenchmarking.orgms, Fewer Is BetteroneDNN 2.6Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPUABCD6K12K18K24K30K26069.626549.226033.726872.5MIN: 20933.3MIN: 21084.1MIN: 20193.5MIN: 201701. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -ldl -lpthread

oneDNN

Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU

OpenBenchmarking.orgms, Fewer Is BetteroneDNN 2.6Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPUABCD5K10K15K20K25K22990.422726.722605.323319.6MIN: 19958.1MIN: 19307.6MIN: 19157.8MIN: 20370.91. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -ldl -lpthread

ONNX Runtime

Model: bertsquad-12 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInferences Per Minute, More Is BetterONNX Runtime 1.11Model: bertsquad-12 - Device: CPU - Executor: StandardABCD20040060080010008979119238971. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: bertsquad-12 - Device: CPU - Executor: Parallel

OpenBenchmarking.orgInferences Per Minute, More Is BetterONNX Runtime 1.11Model: bertsquad-12 - Device: CPU - Executor: ParallelABCD20040060080010007827747867961. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt

perf-bench

Benchmark: Syscall Basic

OpenBenchmarking.orgops/sec, More Is Betterperf-benchBenchmark: Syscall BasicABCD4M8M12M16M20M171405451722154417145832175041461. (CC) gcc options: -O6 -ggdb3 -funwind-tables -std=gnu99 -lunwind-x86_64 -lunwind -llzma -Xlinker -lpthread -lrt -lm -ldl -lelf -lslang -lz -lnuma

oneDNN

Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU

OpenBenchmarking.orgms, Fewer Is BetteroneDNN 2.6Harness: IP Shapes 3D - Data Type: f32 - Engine: CPUABCD132639526558.0858.2957.2558.44MIN: 42.18MIN: 50.37MIN: 14.92MIN: 39.611. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -ldl -lpthread

oneDNN

Harness: IP Shapes 3D - Data Type: bf16bf16bf16 - Engine: CPU

OpenBenchmarking.orgms, Fewer Is BetteroneDNN 2.6Harness: IP Shapes 3D - Data Type: bf16bf16bf16 - Engine: CPUABCD132639526555.9356.0656.7355.58MIN: 29.18MIN: 16.51MIN: 32.47MIN: 30.281. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -ldl -lpthread

oneDNN

Harness: IP Shapes 1D - Data Type: bf16bf16bf16 - Engine: CPU

OpenBenchmarking.orgms, Fewer Is BetteroneDNN 2.6Harness: IP Shapes 1D - Data Type: bf16bf16bf16 - Engine: CPUABCD2040608010093.3193.4393.2891.70MIN: 46.42MIN: 43.74MIN: 59.18MIN: 42.741. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -ldl -lpthread

ONNX Runtime

Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallel

OpenBenchmarking.orgInferences Per Minute, More Is BetterONNX Runtime 1.11Model: ArcFace ResNet-100 - Device: CPU - Executor: ParallelABD300600900120015001464147414911. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt

oneDNN

Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU

OpenBenchmarking.orgms, Fewer Is BetteroneDNN 2.6Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPUABCD61218243024.0624.3824.4224.35MIN: 12.14MIN: 14.3MIN: 14.77MIN: 14.811. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -ldl -lpthread

ONNX Runtime

Model: super-resolution-10 - Device: CPU - Executor: Parallel

OpenBenchmarking.orgInferences Per Minute, More Is BetterONNX Runtime 1.11Model: super-resolution-10 - Device: CPU - Executor: ParallelABD130026003900520065005890597559691. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt

oneDNN

Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU

OpenBenchmarking.orgms, Fewer Is BetteroneDNN 2.6Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPUABCD61218243023.7423.7724.0623.79MIN: 13.82MIN: 22.07MIN: 13.26MIN: 10.191. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -ldl -lpthread

ONNX Runtime

Model: yolov4 - Device: CPU - Executor: Parallel

OpenBenchmarking.orgInferences Per Minute, More Is BetterONNX Runtime 1.11Model: yolov4 - Device: CPU - Executor: ParallelABCD1002003004005004444484474491. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel

OpenBenchmarking.orgInferences Per Minute, More Is BetterONNX Runtime 1.11Model: fcn-resnet101-11 - Device: CPU - Executor: ParallelABCD204060801001011021011011. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt

perf-bench

Benchmark: Futex Hash

OpenBenchmarking.orgops/sec, More Is Betterperf-benchBenchmark: Futex HashABCD1000K2000K3000K4000K5000K44943944493415451394444990811. (CC) gcc options: -O6 -ggdb3 -funwind-tables -std=gnu99 -lunwind-x86_64 -lunwind -llzma -Xlinker -lpthread -lrt -lm -ldl -lelf -lslang -lz -lnuma

Java JMH

Throughput

OpenBenchmarking.orgOps/s, More Is BetterJava JMHThroughputABCD7000M14000M21000M28000M35000M33246015808.7033247767440.9933236796629.9233246121586.08


Phoronix Test Suite v10.8.4