MBP M1 Max Machine Learning, sys76-kudu-ML

Apple M1 Max testing with an Apple MacBook Pro and Apple M1 Max on macOS 12.1 via the Phoronix Test Suite. sys76-kudu-ML: AMD Ryzen 9 5900HX testing with a System76 Kudu (1.07.09RSA1 BIOS) and AMD Cezanne on Pop 21.10 via the Phoronix Test Suite.

HTML result view exported from: https://openbenchmarking.org/result/2202161-NE-MBPM1MAXM40,2202165-NE-SYS76KUDU88&grr.

MBP M1 Max Machine Learning, sys76-kudu-ML

MBP M1 Max Machine Learning
Processor: Apple M1 Max (10 Cores)
Motherboard: Apple MacBook Pro
Memory: 64GB
Disk: 1859GB
Graphics: Apple M1 Max
Monitor: Color LCD
OS: macOS 12.1
Kernel: 21.2.0 (arm64)
OpenCL: OpenCL 1.2 (Nov 13 2021 00:45:09)
Compiler: GCC 13.0.0 + Clang 13.0.0
File-System: APFS
Screen Resolution: 3456x2234

ML Tests
Processor: AMD Ryzen 9 5900HX @ 3.30GHz (8 Cores / 16 Threads)
Motherboard: System76 Kudu (1.07.09RSA1 BIOS)
Chipset: AMD Renoir/Cezanne
Memory: 16GB
Disk: Samsung SSD 970 EVO Plus 500GB
Graphics: AMD Cezanne (2100/400MHz)
Audio: AMD Renoir Radeon HD Audio
Network: Realtek RTL8125 2.5GbE + Intel Wi-Fi 6 AX200
OS: Pop 21.10
Kernel: 5.15.15-76051515-generic (x86_64)
Desktop: GNOME Shell 40.5
Display Server: X Server 1.20.13
OpenGL: 4.6 Mesa 21.2.2 (LLVM 12.0.1)
Vulkan: 1.2.182
Compiler: GCC 11.2.0
File-System: ext4
Screen Resolution: 1920x1080

Environment Details
- MBP M1 Max Machine Learning: XPC_FLAGS=0x0

Python Details
- MBP M1 Max Machine Learning: Python 2.7.18 + Python 3.8.9
- ML Tests: Python 3.9.7

Kernel Details
- ML Tests: Transparent Huge Pages: madvise

Compiler Details
- ML Tests: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-11-ZPT0kp/gcc-11-11.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-ZPT0kp/gcc-11-11.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v

Processor Details
- ML Tests: Scaling Governor: acpi-cpufreq schedutil (Boost: Enabled) - CPU Microcode: 0xa50000c

Graphics Details
- ML Tests: GLAMOR - BAR1 / Visible vRAM Size: 512 MB

Security Details
- ML Tests: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Full AMD retpoline IBPB: conditional IBRS_FW STIBP: always-on RSB filling + srbds: Not affected + tsx_async_abort: Not affected

MBP M1 Max Machine Learning, sys76-kudu-ML

Test | MBP M1 Max Machine Learning | ML Tests
caffe: GoogleNet - CPU - 1000 | | 868758
lczero: BLAS | | 563
ecp-candle: P3B1 | | 1463.722
mnn: inception-v3 | 58.253 | 31.576
mnn: mobilenet-v1-1.0 | 8.205 | 2.440
mnn: MobileNetV2_224 | 10.677 | 2.387
mnn: SqueezeNetV1.0 | 9.967 | 4.546
mnn: resnet-v2-50 | 42.428 | 22.441
mnn: squeezenetv1.1 | 7.274 | 2.803
mnn: mobilenetV3 | 9.152 | 1.202
caffe: AlexNet - CPU - 1000 | | 325884
plaidml: No - Inference - ResNet 50 - CPU | | 6.88
ecp-candle: P3B2 | | 730.736
plaidml: No - Inference - VGG16 - CPU | | 12.47
tnn: CPU - DenseNet | | 2736.173
onednn: Recurrent Neural Network Inference - bf16bf16bf16 - CPU | | 2237.65
caffe: GoogleNet - CPU - 200 | | 173671
tensorflow-lite: Inception V4 | | 2749623
tensorflow-lite: Inception ResNet V2 | | 2479080
mlpack: scikit_qda | | 65.69
numpy | | 422.45
ncnn: CPU - regnety_400m | 7.18 | 6.90
ncnn: CPU - squeezenet_ssd | 20.53 | 18.56
ncnn: CPU - yolov4-tiny | 30.24 | 24.97
ncnn: CPU - resnet50 | 43.16 | 25.17
ncnn: CPU - alexnet | 29.93 | 14.55
ncnn: CPU - resnet18 | 16.82 | 15.78
ncnn: CPU - vgg16 | 71.01 | 71.97
ncnn: CPU - googlenet | 24.96 | 13.74
ncnn: CPU - blazeface | 1.65 | 1.20
ncnn: CPU - efficientnet-b0 | 8.69 | 5.22
ncnn: CPU - mnasnet | 5.40 | 3.25
ncnn: CPU - shufflenet-v2 | 3.47 | 2.75
ncnn: CPU-v3-v3 - mobilenet-v3 | 4.36 | 3.41
ncnn: CPU-v2-v2 - mobilenet-v2 | 5.33 | 3.99
ncnn: CPU - mobilenet | 20.32 | 15.95
ncnn: Vulkan GPU - regnety_400m | 7.19 | 5.28
ncnn: Vulkan GPU - squeezenet_ssd | 20.55 | 15.36
ncnn: Vulkan GPU - yolov4-tiny | 30.33 | 18.82
ncnn: Vulkan GPU - resnet50 | 43.08 | 13.12
ncnn: Vulkan GPU - alexnet | 29.89 | 6.32
ncnn: Vulkan GPU - resnet18 | 16.80 | 6.09
ncnn: Vulkan GPU - vgg16 | 70.89 | 43.99
ncnn: Vulkan GPU - googlenet | 24.9 | 8.73
ncnn: Vulkan GPU - blazeface | 1.64 | 1.35
ncnn: Vulkan GPU - efficientnet-b0 | 8.71 | 10.07
ncnn: Vulkan GPU - mnasnet | 5.37 | 3.89
ncnn: Vulkan GPU - shufflenet-v2 | 3.46 | 3.02
ncnn: Vulkan GPU-v3-v3 - mobilenet-v3 | 4.35 | 4.69
ncnn: Vulkan GPU-v2-v2 - mobilenet-v2 | 5.30 | 3.88
ncnn: Vulkan GPU - mobilenet | 20.30 | 10.27
caffe: GoogleNet - CPU - 100 | | 86567
opencv: DNN - Deep Neural Network | | 13787
caffe: AlexNet - CPU - 200 | | 65986
tensorflow-lite: SqueezeNet | | 189764
tensorflow-lite: NASNet Mobile | | 152186
tensorflow-lite: Mobilenet Quant | | 141174
tensorflow-lite: Mobilenet Float | | 127818
mlpack: scikit_ica | | 48.40
onednn: Recurrent Neural Network Training - bf16bf16bf16 - CPU | | 3577.00
onednn: Recurrent Neural Network Training - u8s8f32 - CPU | | 3587.17
onednn: Recurrent Neural Network Training - f32 - CPU | | 3579.00
mlpack: scikit_linearridgeregression | | 2.10
onednn: Recurrent Neural Network Inference - u8s8f32 - CPU | | 2228.17
onednn: Recurrent Neural Network Inference - f32 - CPU | | 2219.13
caffe: AlexNet - CPU - 100 | | 33496
deepspeech: CPU | | 74.44043
mlpack: scikit_svm | | 17.60
rbenchmark | | 0.1293
onednn: IP Shapes 1D - f32 - CPU | | 4.25855
tnn: CPU - MobileNet v2 | | 249.477
rnnoise | | 16.137
tnn: CPU - SqueezeNet v1.1 | | 222.326
ecp-candle: P1B2 | | 37.51
onednn: Deconvolution Batch shapes_1d - f32 - CPU | | 8.34789
onednn: Deconvolution Batch shapes_1d - u8s8f32 - CPU | | 2.11771
onednn: IP Shapes 1D - u8s8f32 - CPU | | 1.62985
onednn: Matrix Multiply Batch Shapes Transformer - f32 - CPU | | 4.59343
onednn: Matrix Multiply Batch Shapes Transformer - u8s8f32 - CPU | | 2.98659
onednn: IP Shapes 3D - f32 - CPU | | 12.0926
onednn: IP Shapes 3D - u8s8f32 - CPU | | 2.69210
tnn: CPU - SqueezeNet v2 | | 55.434
onednn: Convolution Batch Shapes Auto - u8s8f32 - CPU | | 23.7674
onednn: Convolution Batch Shapes Auto - f32 - CPU | | 22.7926
onednn: Deconvolution Batch shapes_3d - f32 - CPU | | 6.74559
onednn: Deconvolution Batch shapes_3d - u8s8f32 - CPU | | 3.24784
openvino: Face Detection 0106 FP16 - Intel GPU | |

Caffe

Model: GoogleNet - Acceleration: CPU - Iterations: 1000

Milli-Seconds, Fewer Is Better (Caffe 2020-02-13)
ML Tests: 868758 (SE +/- 470.76, N = 3)
(CXX) g++ options: -fPIC -O3 -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas

LeelaChessZero

Backend: BLAS

Nodes Per Second, More Is Better (LeelaChessZero 0.28)
ML Tests: 563 (SE +/- 5.14, N = 7)
(CXX) g++ options: -flto -pthread

ECP-CANDLE

Benchmark: P3B1

Seconds, Fewer Is Better (ECP-CANDLE 0.4)
ML Tests: 1463.72

Mobile Neural Network

Model: inception-v3

ms, Fewer Is Better (Mobile Neural Network 1.2)
MBP M1 Max Machine Learning: 58.25 (SE +/- 6.12, N = 9; MIN: 30.46 / MAX: 200.21; -arch -isysroot)
ML Tests: 31.58 (SE +/- 0.42, N = 3; MIN: 29.6 / MAX: 48.32; -fomit-frame-pointer -rdynamic -pthread -ldl)
(CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions
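The two systems in the inception-v3 result differ a lot in run-to-run spread, which is easier to see when the reported SE is expressed relative to the mean. The following is a minimal sketch, assuming that "SE" is the standard error of the mean over the N runs; the helper names `relative_standard_error` and `approx_std_dev` are hypothetical and not part of the Phoronix Test Suite.

```python
import math


def relative_standard_error(mean: float, se: float) -> float:
    """Return the standard error as a fraction of the mean."""
    return se / mean


def approx_std_dev(se: float, n: int) -> float:
    """Recover the sample standard deviation, assuming SE = SD / sqrt(N)."""
    return se * math.sqrt(n)


# Values copied from the Mobile Neural Network inception-v3 result.
mbp_rse = relative_standard_error(58.25, 6.12)  # N = 9
ml_rse = relative_standard_error(31.58, 0.42)   # N = 3

print(f"MBP M1 Max Machine Learning: {mbp_rse:.1%} relative SE")
print(f"ML Tests: {ml_rse:.1%} relative SE")
```

A large relative SE, together with a wide MIN / MAX range, suggests that the mean for that system should be read with more caution.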

Mobile Neural Network

Model: mobilenet-v1-1.0

ms, Fewer Is Better (Mobile Neural Network 1.2)
MBP M1 Max Machine Learning: 8.205 (SE +/- 0.384, N = 9; MIN: 4.27 / MAX: 48.5; -arch -isysroot)
ML Tests: 2.440 (SE +/- 0.019, N = 3; MIN: 2.17 / MAX: 18; -fomit-frame-pointer -rdynamic -pthread -ldl)
(CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions

Mobile Neural Network

Model: MobileNetV2_224

ms, Fewer Is Better (Mobile Neural Network 1.2)
MBP M1 Max Machine Learning: 10.677 (SE +/- 0.187, N = 9; MIN: 5.12 / MAX: 61.59; -arch -isysroot)
ML Tests: 2.387 (SE +/- 0.018, N = 3; MIN: 2.24 / MAX: 17.04; -fomit-frame-pointer -rdynamic -pthread -ldl)
(CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions

Mobile Neural Network

Model: SqueezeNetV1.0

ms, Fewer Is Better (Mobile Neural Network 1.2)
MBP M1 Max Machine Learning: 9.967 (SE +/- 0.664, N = 9; MIN: 4.34 / MAX: 49.52; -arch -isysroot)
ML Tests: 4.546 (SE +/- 0.040, N = 3; MIN: 4.32 / MAX: 20.48; -fomit-frame-pointer -rdynamic -pthread -ldl)
(CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions

Mobile Neural Network

Model: resnet-v2-50

ms, Fewer Is Better (Mobile Neural Network 1.2)
MBP M1 Max Machine Learning: 42.43 (SE +/- 4.17, N = 9; MIN: 24 / MAX: 197.77; -arch -isysroot)
ML Tests: 22.44 (SE +/- 0.09, N = 3; MIN: 21.5 / MAX: 43.07; -fomit-frame-pointer -rdynamic -pthread -ldl)
(CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions

Mobile Neural Network

Model: squeezenetv1.1

ms, Fewer Is Better (Mobile Neural Network 1.2)
MBP M1 Max Machine Learning: 7.274 (SE +/- 0.345, N = 9; MIN: 2.75 / MAX: 117.92; -arch -isysroot)
ML Tests: 2.803 (SE +/- 0.009, N = 3; MIN: 2.6 / MAX: 17.21; -fomit-frame-pointer -rdynamic -pthread -ldl)
(CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions

Mobile Neural Network

Model: mobilenetV3

ms, Fewer Is Better (Mobile Neural Network 1.2)
MBP M1 Max Machine Learning: 9.152 (SE +/- 0.487, N = 9; MIN: 3.37 / MAX: 58.79; -arch -isysroot)
ML Tests: 1.202 (SE +/- 0.005, N = 3)
(CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions

Caffe

Model: AlexNet - Acceleration: CPU - Iterations: 1000

Milli-Seconds, Fewer Is Better (Caffe 2020-02-13)
ML Tests: 325884 (SE +/- 469.87, N = 3)
(CXX) g++ options: -fPIC -O3 -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas

PlaidML

FP16: No - Mode: Inference - Network: ResNet 50 - Device: CPU

FPS, More Is Better (PlaidML)
ML Tests: 6.88 (SE +/- 0.02, N = 3)

ECP-CANDLE

Benchmark: P3B2

Seconds, Fewer Is Better (ECP-CANDLE 0.4)
ML Tests: 730.74

PlaidML

FP16: No - Mode: Inference - Network: VGG16 - Device: CPU

FPS, More Is Better (PlaidML)
ML Tests: 12.47 (SE +/- 0.07, N = 3)

TNN

Target: CPU - Model: DenseNet

ms, Fewer Is Better (TNN 0.3)
ML Tests: 2736.17 (SE +/- 0.83, N = 3; MIN: 2687.97 / MAX: 2827.52)
(CXX) g++ options: -fopenmp -pthread -fvisibility=hidden -fvisibility=default -O3 -rdynamic -ldl

oneDNN

Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU

ms, Fewer Is Better (oneDNN 2.1.2)
ML Tests: 2237.65 (SE +/- 14.65, N = 14; MIN: 2174.76)
(CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

Caffe

Model: GoogleNet - Acceleration: CPU - Iterations: 200

Milli-Seconds, Fewer Is Better (Caffe 2020-02-13)
ML Tests: 173671 (SE +/- 318.11, N = 3)
(CXX) g++ options: -fPIC -O3 -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas

TensorFlow Lite

Model: Inception V4

Microseconds, Fewer Is Better (TensorFlow Lite 2020-08-23)
ML Tests: 2749623 (SE +/- 1719.91, N = 3)

TensorFlow Lite

Model: Inception ResNet V2

Microseconds, Fewer Is Better (TensorFlow Lite 2020-08-23)
ML Tests: 2479080 (SE +/- 1189.89, N = 3)

Mlpack Benchmark

Benchmark: scikit_qda

Seconds, Fewer Is Better (Mlpack Benchmark)
ML Tests: 65.69 (SE +/- 0.03, N = 3)

Numpy Benchmark

Score, More Is Better (Numpy Benchmark)
ML Tests: 422.45 (SE +/- 0.84, N = 3)

NCNN

Target: CPU - Model: regnety_400m

ms, Fewer Is Better (NCNN 20210720)
MBP M1 Max Machine Learning: 7.18 (SE +/- 0.00, N = 3; MIN: 7.14 / MAX: 8.13; -arch -isysroot)
ML Tests: 6.90 (SE +/- 0.04, N = 3; MIN: 6.35 / MAX: 21.53; -rdynamic -lgomp -lpthread)
(CXX) g++ options: -O3

NCNN

Target: CPU - Model: squeezenet_ssd

ms, Fewer Is Better (NCNN 20210720)
MBP M1 Max Machine Learning: 20.53 (SE +/- 0.05, N = 3; MIN: 20.37 / MAX: 21.53; -arch -isysroot)
ML Tests: 18.56 (SE +/- 0.15, N = 3; MIN: 17.64 / MAX: 34.93; -rdynamic -lgomp -lpthread)
(CXX) g++ options: -O3

NCNN

Target: CPU - Model: yolov4-tiny

ms, Fewer Is Better (NCNN 20210720)
MBP M1 Max Machine Learning: 30.24 (SE +/- 0.03, N = 3; MIN: 29.85 / MAX: 31.87; -arch -isysroot)
ML Tests: 24.97 (SE +/- 0.12, N = 3; MIN: 23.9 / MAX: 38.98; -rdynamic -lgomp -lpthread)
(CXX) g++ options: -O3

NCNN

Target: CPU - Model: resnet50

ms, Fewer Is Better (NCNN 20210720)
MBP M1 Max Machine Learning: 43.16 (SE +/- 0.07, N = 3; MIN: 42.92 / MAX: 44.81; -arch -isysroot)
ML Tests: 25.17 (SE +/- 0.06, N = 3; MIN: 23.91 / MAX: 41.27; -rdynamic -lgomp -lpthread)
(CXX) g++ options: -O3

NCNN

Target: CPU - Model: alexnet

ms, Fewer Is Better (NCNN 20210720)
MBP M1 Max Machine Learning: 29.93 (SE +/- 0.05, N = 3; MIN: 29.79 / MAX: 31.03; -arch -isysroot)
ML Tests: 14.55 (SE +/- 0.06, N = 3; MIN: 13.9 / MAX: 33.49; -rdynamic -lgomp -lpthread)
(CXX) g++ options: -O3

NCNN

Target: CPU - Model: resnet18

ms, Fewer Is Better (NCNN 20210720)
MBP M1 Max Machine Learning: 16.82 (SE +/- 0.04, N = 3; MIN: 16.69 / MAX: 17.58; -arch -isysroot)
ML Tests: 15.78 (SE +/- 0.43, N = 3; MIN: 14.59 / MAX: 30.81; -rdynamic -lgomp -lpthread)
(CXX) g++ options: -O3

NCNN

Target: CPU - Model: vgg16

ms, Fewer Is Better (NCNN 20210720)
MBP M1 Max Machine Learning: 71.01 (SE +/- 0.15, N = 3; MIN: 70.58 / MAX: 74.44; -arch -isysroot)
ML Tests: 71.97 (SE +/- 0.16, N = 3; MIN: 69.95 / MAX: 94.76; -rdynamic -lgomp -lpthread)
(CXX) g++ options: -O3

NCNN

Target: CPU - Model: googlenet

ms, Fewer Is Better (NCNN 20210720)
MBP M1 Max Machine Learning: 24.96 (SE +/- 0.07, N = 3; MIN: 24.82 / MAX: 25.91; -arch -isysroot)
ML Tests: 13.74 (SE +/- 0.28, N = 3; MIN: 12.47 / MAX: 28.56; -rdynamic -lgomp -lpthread)
(CXX) g++ options: -O3

NCNN

Target: CPU - Model: blazeface

ms, Fewer Is Better (NCNN 20210720)
MBP M1 Max Machine Learning: 1.65 (SE +/- 0.01, N = 3; MIN: 1.64 / MAX: 1.72; -arch -isysroot)
ML Tests: 1.20 (SE +/- 0.01, N = 3; MIN: 1.16 / MAX: 1.78; -rdynamic -lgomp -lpthread)
(CXX) g++ options: -O3

NCNN

Target: CPU - Model: efficientnet-b0

ms, Fewer Is Better (NCNN 20210720)
MBP M1 Max Machine Learning: 8.69 (SE +/- 0.04, N = 3; MIN: 8.59 / MAX: 9.15; -arch -isysroot)
ML Tests: 5.22 (SE +/- 0.01, N = 3; MIN: 4.86 / MAX: 20.63; -rdynamic -lgomp -lpthread)
(CXX) g++ options: -O3

NCNN

Target: CPU - Model: mnasnet

ms, Fewer Is Better (NCNN 20210720)
MBP M1 Max Machine Learning: 5.40 (SE +/- 0.03, N = 3; MIN: 5.35 / MAX: 5.68; -arch -isysroot)
ML Tests: 3.25 (SE +/- 0.03, N = 3; MIN: 2.82 / MAX: 16.82; -rdynamic -lgomp -lpthread)
(CXX) g++ options: -O3

NCNN

Target: CPU - Model: shufflenet-v2

ms, Fewer Is Better (NCNN 20210720)
MBP M1 Max Machine Learning: 3.47 (SE +/- 0.02, N = 3; MIN: 3.43 / MAX: 3.84; -arch -isysroot)
ML Tests: 2.75 (SE +/- 0.04, N = 3; MIN: 2.48 / MAX: 16.13; -rdynamic -lgomp -lpthread)
(CXX) g++ options: -O3

NCNN

Target: CPU-v3-v3 - Model: mobilenet-v3

ms, Fewer Is Better (NCNN 20210720)
MBP M1 Max Machine Learning: 4.36 (SE +/- 0.03, N = 3; MIN: 4.32 / MAX: 4.61; -arch -isysroot)
ML Tests: 3.41 (SE +/- 0.02, N = 3; MIN: 3.11 / MAX: 17.47; -rdynamic -lgomp -lpthread)
(CXX) g++ options: -O3

NCNN

Target: CPU-v2-v2 - Model: mobilenet-v2

ms, Fewer Is Better (NCNN 20210720)
MBP M1 Max Machine Learning: 5.33 (SE +/- 0.03, N = 3; MIN: 5.27 / MAX: 5.61; -arch -isysroot)
ML Tests: 3.99 (SE +/- 0.02, N = 3; MIN: 3.71 / MAX: 19.11; -rdynamic -lgomp -lpthread)
(CXX) g++ options: -O3

NCNN

Target: CPU - Model: mobilenet

ms, Fewer Is Better (NCNN 20210720)
MBP M1 Max Machine Learning: 20.32 (SE +/- 0.02, N = 3; MIN: 20.23 / MAX: 21.33; -arch -isysroot)
ML Tests: 15.95 (SE +/- 0.09, N = 3; MIN: 14.92 / MAX: 35.66; -rdynamic -lgomp -lpthread)
(CXX) g++ options: -O3
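One way to summarize the fifteen NCNN CPU results as a single figure is the geometric mean of the per-model time ratios. The sketch below is only an illustration: the values are copied from the NCNN CPU results, while the dictionary layout and the helper name `time_ratios` are assumptions made for this example. Because these are "fewer is better" timings, a ratio above 1 means ML Tests finished faster than MBP M1 Max Machine Learning.

```python
from statistics import geometric_mean

# (MBP M1 Max Machine Learning ms, ML Tests ms), NCNN 20210720, Target: CPU
ncnn_cpu_ms = {
    "regnety_400m": (7.18, 6.90),
    "squeezenet_ssd": (20.53, 18.56),
    "yolov4-tiny": (30.24, 24.97),
    "resnet50": (43.16, 25.17),
    "alexnet": (29.93, 14.55),
    "resnet18": (16.82, 15.78),
    "vgg16": (71.01, 71.97),
    "googlenet": (24.96, 13.74),
    "blazeface": (1.65, 1.20),
    "efficientnet-b0": (8.69, 5.22),
    "mnasnet": (5.40, 3.25),
    "shufflenet-v2": (3.47, 2.75),
    "mobilenet-v3": (4.36, 3.41),
    "mobilenet-v2": (5.33, 3.99),
    "mobilenet": (20.32, 15.95),
}


def time_ratios(results):
    """Map each model to MBP time divided by ML Tests time."""
    return {name: mbp / ml for name, (mbp, ml) in results.items()}


ratios = time_ratios(ncnn_cpu_ms)
overall = geometric_mean(list(ratios.values()))

for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{name:16s} {ratio:.2f}x")
print(f"geometric mean   {overall:.2f}x")
```

The geometric mean is used here instead of the arithmetic mean so that a model that is twice as fast and one that is half as fast cancel out.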

NCNN

Target: Vulkan GPU - Model: regnety_400m

ms, Fewer Is Better (NCNN 20210720)
MBP M1 Max Machine Learning: 7.19 (SE +/- 0.00, N = 3; MIN: 7.15 / MAX: 7.72; -arch -isysroot)
ML Tests: 5.28 (SE +/- 0.06, N = 3; MIN: 4.68 / MAX: 6.44; -rdynamic -lgomp -lpthread)
(CXX) g++ options: -O3

NCNN

Target: Vulkan GPU - Model: squeezenet_ssd

ms, Fewer Is Better (NCNN 20210720)
MBP M1 Max Machine Learning: 20.55 (SE +/- 0.05, N = 3; MIN: 20.39 / MAX: 22.13; -arch -isysroot)
ML Tests: 15.36 (SE +/- 0.36, N = 3; MIN: 14.17 / MAX: 22.53; -rdynamic -lgomp -lpthread)
(CXX) g++ options: -O3

NCNN

Target: Vulkan GPU - Model: yolov4-tiny

ms, Fewer Is Better (NCNN 20210720)
MBP M1 Max Machine Learning: 30.33 (SE +/- 0.07, N = 3; MIN: 29.85 / MAX: 32.58; -arch -isysroot)
ML Tests: 18.82 (SE +/- 0.42, N = 3; MIN: 17.12 / MAX: 24.45; -rdynamic -lgomp -lpthread)
(CXX) g++ options: -O3

NCNN

Target: Vulkan GPU - Model: resnet50

ms, Fewer Is Better (NCNN 20210720)
MBP M1 Max Machine Learning: 43.08 (SE +/- 0.01, N = 3; MIN: 42.9 / MAX: 45.66; -arch -isysroot)
ML Tests: 13.12 (SE +/- 0.06, N = 3; MIN: 12.27 / MAX: 15.04; -rdynamic -lgomp -lpthread)
(CXX) g++ options: -O3

NCNN

Target: Vulkan GPU - Model: alexnet

ms, Fewer Is Better (NCNN 20210720)
MBP M1 Max Machine Learning: 29.89 (SE +/- 0.00, N = 3; MIN: 29.79 / MAX: 31.07; -arch -isysroot)
ML Tests: 6.32 (SE +/- 0.03, N = 3; MIN: 5.95 / MAX: 7.49; -rdynamic -lgomp -lpthread)
(CXX) g++ options: -O3

NCNN

Target: Vulkan GPU - Model: resnet18

ms, Fewer Is Better (NCNN 20210720)
MBP M1 Max Machine Learning: 16.80 (SE +/- 0.01, N = 3; MIN: 16.69 / MAX: 18.25; -arch -isysroot)
ML Tests: 6.09 (SE +/- 0.08, N = 3; MIN: 5.63 / MAX: 7.52; -rdynamic -lgomp -lpthread)
(CXX) g++ options: -O3

NCNN

Target: Vulkan GPU - Model: vgg16

ms, Fewer Is Better (NCNN 20210720)
MBP M1 Max Machine Learning: 70.89 (SE +/- 0.02, N = 3; MIN: 70.59 / MAX: 73.62; -arch -isysroot)
ML Tests: 43.99 (SE +/- 0.07, N = 3; MIN: 43.17 / MAX: 45.59; -rdynamic -lgomp -lpthread)
(CXX) g++ options: -O3

NCNN

Target: Vulkan GPU - Model: googlenet

ms, Fewer Is Better (NCNN 20210720)
MBP M1 Max Machine Learning: 24.90 (SE +/- 0.00, N = 3; MIN: 24.82 / MAX: 25.79; -arch -isysroot)
ML Tests: 8.73 (SE +/- 0.25, N = 3; MIN: 7.89 / MAX: 10.64; -rdynamic -lgomp -lpthread)
(CXX) g++ options: -O3

NCNN

Target: Vulkan GPU - Model: blazeface

ms, Fewer Is Better (NCNN 20210720)
MBP M1 Max Machine Learning: 1.64 (SE +/- 0.00, N = 3; MIN: 1.63 / MAX: 1.79; -arch -isysroot)
ML Tests: 1.35 (SE +/- 0.01, N = 3; MIN: 1.17 / MAX: 2.43; -rdynamic -lgomp -lpthread)
(CXX) g++ options: -O3

NCNN

Target: Vulkan GPU - Model: efficientnet-b0

ms, Fewer Is Better (NCNN 20210720)
MBP M1 Max Machine Learning: 8.71 (SE +/- 0.02, N = 3; MIN: 8.6 / MAX: 9.43; -arch -isysroot)
ML Tests: 10.07 (SE +/- 0.06, N = 3; MIN: 9.06 / MAX: 11.43; -rdynamic -lgomp -lpthread)
(CXX) g++ options: -O3

NCNN

Target: Vulkan GPU - Model: mnasnet

ms, Fewer Is Better (NCNN 20210720)
MBP M1 Max Machine Learning: 5.37 (SE +/- 0.00, N = 3; MIN: 5.35 / MAX: 5.62; -arch -isysroot)
ML Tests: 3.89 (SE +/- 0.11, N = 2; MIN: 3.56 / MAX: 5.01; -rdynamic -lgomp -lpthread)
(CXX) g++ options: -O3

NCNN

Target: Vulkan GPU - Model: shufflenet-v2

ms, Fewer Is Better (NCNN 20210720)
MBP M1 Max Machine Learning: 3.46 (SE +/- 0.01, N = 3; MIN: 3.44 / MAX: 3.82; -arch -isysroot)
ML Tests: 3.02 (SE +/- 0.07, N = 3; MIN: 2.54 / MAX: 4.38; -rdynamic -lgomp -lpthread)
(CXX) g++ options: -O3

NCNN

Target: Vulkan GPU-v3-v3 - Model: mobilenet-v3

ms, Fewer Is Better (NCNN 20210720)
MBP M1 Max Machine Learning: 4.35 (SE +/- 0.00, N = 3; MIN: 4.32 / MAX: 4.63; -arch -isysroot)
ML Tests: 4.69 (SE +/- 0.12, N = 3; MIN: 4.29 / MAX: 5.94; -rdynamic -lgomp -lpthread)
(CXX) g++ options: -O3

NCNN

Target: Vulkan GPU-v2-v2 - Model: mobilenet-v2

ms, Fewer Is Better (NCNN 20210720)
MBP M1 Max Machine Learning: 5.30 (SE +/- 0.01, N = 3; MIN: 5.28 / MAX: 5.98; -arch -isysroot)
ML Tests: 3.88 (SE +/- 0.06, N = 3; MIN: 3.49 / MAX: 5.25; -rdynamic -lgomp -lpthread)
(CXX) g++ options: -O3

NCNN

Target: Vulkan GPU - Model: mobilenet

ms, Fewer Is Better (NCNN 20210720)
MBP M1 Max Machine Learning: 20.30 (SE +/- 0.02, N = 3; MIN: 20.23 / MAX: 21.48; -arch -isysroot)
ML Tests: 10.27 (SE +/- 0.09, N = 3; MIN: 9.59 / MAX: 17.84; -rdynamic -lgomp -lpthread)
(CXX) g++ options: -O3
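The NCNN results are reported once for the CPU target and once for the Vulkan GPU target, so the effect of the Vulkan target can be checked per system by dividing the CPU time by the Vulkan time. This is a small sketch using three models whose values are copied from the NCNN results; the helper name `vulkan_speedup` and the choice of models are assumptions made for illustration. It only computes ratios and makes no claim about why they differ.

```python
def vulkan_speedup(cpu_ms: float, vulkan_ms: float) -> float:
    """Return CPU time / Vulkan time; above 1 means the Vulkan target was faster."""
    return cpu_ms / vulkan_ms


# (CPU ms, Vulkan GPU ms) per model, NCNN 20210720
results = {
    "ML Tests": {
        "resnet50": (25.17, 13.12),
        "alexnet": (14.55, 6.32),
        "efficientnet-b0": (5.22, 10.07),
    },
    "MBP M1 Max Machine Learning": {
        "resnet50": (43.16, 43.08),
        "alexnet": (29.93, 29.89),
        "efficientnet-b0": (8.69, 8.71),
    },
}

speedups = {
    system: {model: vulkan_speedup(cpu, gpu) for model, (cpu, gpu) in models.items()}
    for system, models in results.items()
}

for system, per_model in speedups.items():
    for model, value in per_model.items():
        print(f"{system}: {model} {value:.2f}x")
```

For these three models the ML Tests ratios vary widely, while the MBP M1 Max Machine Learning ratios stay within about one percent of 1.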

Caffe

Model: GoogleNet - Acceleration: CPU - Iterations: 100

Milli-Seconds, Fewer Is Better (Caffe 2020-02-13)
ML Tests: 86567 (SE +/- 103.35, N = 3)
(CXX) g++ options: -fPIC -O3 -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas

OpenCV

Test: DNN - Deep Neural Network

ms, Fewer Is Better (OpenCV 4.5.4)
ML Tests: 13787 (SE +/- 269.19, N = 15)
(CXX) g++ options: -fPIC -fsigned-char -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -msse -msse2 -msse3 -fvisibility=hidden -O3 -shared

Caffe

Model: AlexNet - Acceleration: CPU - Iterations: 200

Milli-Seconds, Fewer Is Better (Caffe 2020-02-13)
ML Tests: 65986 (SE +/- 167.60, N = 3)
(CXX) g++ options: -fPIC -O3 -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas

TensorFlow Lite

Model: SqueezeNet

Microseconds, Fewer Is Better (TensorFlow Lite 2020-08-23)
ML Tests: 189764 (SE +/- 108.90, N = 3)

TensorFlow Lite

Model: NASNet Mobile

Microseconds, Fewer Is Better (TensorFlow Lite 2020-08-23)
ML Tests: 152186 (SE +/- 344.97, N = 3)

TensorFlow Lite

Model: Mobilenet Quant

Microseconds, Fewer Is Better (TensorFlow Lite 2020-08-23)
ML Tests: 141174 (SE +/- 38.25, N = 3)

TensorFlow Lite

Model: Mobilenet Float

Microseconds, Fewer Is Better (TensorFlow Lite 2020-08-23)
ML Tests: 127818 (SE +/- 174.79, N = 3)

Mlpack Benchmark

Benchmark: scikit_ica

Seconds, Fewer Is Better (Mlpack Benchmark)
ML Tests: 48.40 (SE +/- 0.12, N = 3)

oneDNN

Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU

ms, Fewer Is Better (oneDNN 2.1.2)
ML Tests: 3577.00 (SE +/- 4.93, N = 3; MIN: 3514.72)
(CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

oneDNN

Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU

ms, Fewer Is Better (oneDNN 2.1.2)
ML Tests: 3587.17 (SE +/- 4.16, N = 3; MIN: 3527.15)
(CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

oneDNN

Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU

ms, Fewer Is Better (oneDNN 2.1.2)
ML Tests: 3579.00 (SE +/- 7.39, N = 3; MIN: 3519.93)
(CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

Mlpack Benchmark

Benchmark: scikit_linearridgeregression

Seconds, Fewer Is Better (Mlpack Benchmark)
ML Tests: 2.10 (SE +/- 0.01, N = 3)

oneDNN

Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU

ms, Fewer Is Better (oneDNN 2.1.2)
ML Tests: 2228.17 (SE +/- 6.34, N = 3; MIN: 2189.62)
(CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

oneDNN

Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU

ms, Fewer Is Better (oneDNN 2.1.2)
ML Tests: 2219.13 (SE +/- 1.17, N = 3; MIN: 2182.15)
(CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

Caffe

Model: AlexNet - Acceleration: CPU - Iterations: 100

Milli-Seconds, Fewer Is Better (Caffe 2020-02-13)
ML Tests: 33496 (SE +/- 37.32, N = 3)
(CXX) g++ options: -fPIC -O3 -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas

DeepSpeech

Acceleration: CPU

Seconds, Fewer Is Better (DeepSpeech 0.6)
ML Tests: 74.44 (SE +/- 0.17, N = 3)

Mlpack Benchmark

Benchmark: scikit_svm

Seconds, Fewer Is Better (Mlpack Benchmark)
ML Tests: 17.60 (SE +/- 0.02, N = 3)

R Benchmark

Seconds, Fewer Is Better (R Benchmark)
ML Tests: 0.1293 (SE +/- 0.0003, N = 3)
R scripting front-end version 4.0.4 (2021-02-15)

oneDNN

Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU

ms, Fewer Is Better (oneDNN 2.1.2)
ML Tests: 4.25855 (SE +/- 0.03780, N = 7; MIN: 3.88)
(CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

TNN

Target: CPU - Model: MobileNet v2

ms, Fewer Is Better (TNN 0.3)
ML Tests: 249.48 (SE +/- 0.40, N = 3; MIN: 247.22 / MAX: 255.16)
(CXX) g++ options: -fopenmp -pthread -fvisibility=hidden -fvisibility=default -O3 -rdynamic -ldl

RNNoise

Seconds, Fewer Is Better (RNNoise 2020-06-28)
ML Tests: 16.14 (SE +/- 0.02, N = 3)
(CC) gcc options: -O2 -pedantic -fvisibility=hidden

TNN

Target: CPU - Model: SqueezeNet v1.1

ms, Fewer Is Better (TNN 0.3)
ML Tests: 222.33 (SE +/- 0.13, N = 3; MIN: 221.49 / MAX: 224.65)
(CXX) g++ options: -fopenmp -pthread -fvisibility=hidden -fvisibility=default -O3 -rdynamic -ldl

ECP-CANDLE

Benchmark: P1B2

Seconds, Fewer Is Better (ECP-CANDLE 0.4)
ML Tests: 37.51

oneDNN

Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU

ms, Fewer Is Better (oneDNN 2.1.2)
ML Tests: 8.34789 (SE +/- 0.02843, N = 3; MIN: 4.75)
(CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

oneDNN

Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU

ms, Fewer Is Better (oneDNN 2.1.2)
ML Tests: 2.11771 (SE +/- 0.00458, N = 3; MIN: 1.91)
(CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

oneDNN

Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPU

ms, Fewer Is Better (oneDNN 2.1.2)
ML Tests: 1.62985 (SE +/- 0.00920, N = 3; MIN: 1.49)
(CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

oneDNN

Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU

ms, Fewer Is Better (oneDNN 2.1.2)
ML Tests: 4.59343 (SE +/- 0.00541, N = 3; MIN: 4.39)
(CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

oneDNN

Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32 - Engine: CPU

ms, Fewer Is Better (oneDNN 2.1.2)
ML Tests: 2.98659 (SE +/- 0.01068, N = 3; MIN: 2.72)
(CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

oneDNN

Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU

ms, Fewer Is Better (oneDNN 2.1.2)
ML Tests: 12.09 (SE +/- 0.02, N = 3; MIN: 11.92)
(CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

oneDNN

Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU

ms, Fewer Is Better (oneDNN 2.1.2)
ML Tests: 2.69210 (SE +/- 0.00222, N = 3; MIN: 2.57)
(CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

TNN

Target: CPU - Model: SqueezeNet v2

ms, Fewer Is Better (TNN 0.3)
ML Tests: 55.43 (SE +/- 0.62, N = 3; MIN: 54.24 / MAX: 57.06)
(CXX) g++ options: -fopenmp -pthread -fvisibility=hidden -fvisibility=default -O3 -rdynamic -ldl

oneDNN

Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU

ms, Fewer Is Better (oneDNN 2.1.2)
ML Tests: 23.77 (SE +/- 0.02, N = 3; MIN: 22.9)
(CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

oneDNN

Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU

ms, Fewer Is Better (oneDNN 2.1.2)
ML Tests: 22.79 (SE +/- 0.03, N = 3; MIN: 21.94)
(CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

oneDNN

Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU

ms, Fewer Is Better (oneDNN 2.1.2)
ML Tests: 6.74559 (SE +/- 0.01002, N = 3; MIN: 6.52)
(CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

oneDNN

Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU

ms, Fewer Is Better (oneDNN 2.1.2)
ML Tests: 3.24784 (SE +/- 0.02694, N = 3; MIN: 2.76)
(CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread


Phoronix Test Suite v10.8.5