MBP M1 Max Machine Learning, sys76-kudu-ML

Apple M1 Max testing with an Apple MacBook Pro and Apple M1 Max on macOS 12.1 via the Phoronix Test Suite. sys76-kudu-ML: AMD Ryzen 9 5900HX testing with a System76 Kudu (1.07.09RSA1 BIOS) and AMD Cezanne on Pop 21.10 via the Phoronix Test Suite.

HTML result view exported from: https://openbenchmarking.org/result/2202161-NE-MBPM1MAXM40,2202165-NE-SYS76KUDU88&grt&rdt.

ML Tests
  Processor: AMD Ryzen 9 5900HX @ 3.30GHz (8 Cores / 16 Threads)
  Motherboard: System76 Kudu (1.07.09RSA1 BIOS)
  Chipset: AMD Renoir/Cezanne
  Memory: 16GB
  Disk: Samsung SSD 970 EVO Plus 500GB
  Graphics: AMD Cezanne (2100/400MHz)
  Audio: AMD Renoir Radeon HD Audio
  Network: Realtek RTL8125 2.5GbE + Intel Wi-Fi 6 AX200
  OS: Pop 21.10
  Kernel: 5.15.15-76051515-generic (x86_64)
  Desktop: GNOME Shell 40.5
  Display Server: X Server 1.20.13
  OpenGL: 4.6 Mesa 21.2.2 (LLVM 12.0.1)
  Vulkan: 1.2.182
  Compiler: GCC 11.2.0
  File-System: ext4
  Screen Resolution: 1920x1080

MBP M1 Max Machine Learning
  Processor: Apple M1 Max (10 Cores)
  Motherboard: Apple MacBook Pro
  Memory: 64GB
  Disk: 1859GB
  Graphics: Apple M1 Max
  Monitor: Color LCD
  OS: macOS 12.1
  Kernel: 21.2.0 (arm64)
  OpenCL: OpenCL 1.2 (Nov 13 2021 00:45:09)
  Compiler: GCC 13.0.0 + Clang 13.0.0
  File-System: APFS
  Screen Resolution: 3456x2234

Kernel Details
- ML Tests: Transparent Huge Pages: madvise

Compiler Details
- ML Tests: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-11-ZPT0kp/gcc-11-11.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-ZPT0kp/gcc-11-11.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v

Processor Details
- ML Tests: Scaling Governor: acpi-cpufreq schedutil (Boost: Enabled) - CPU Microcode: 0xa50000c

Graphics Details
- ML Tests: GLAMOR - BAR1 / Visible vRAM Size: 512 MB

Python Details
- ML Tests: Python 3.9.7
- MBP M1 Max Machine Learning: Python 2.7.18 + Python 3.8.9

Security Details
- ML Tests: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Full AMD retpoline IBPB: conditional IBRS_FW STIBP: always-on RSB filling + srbds: Not affected + tsx_async_abort: Not affected

Environment Details
- MBP M1 Max Machine Learning: XPC_FLAGS=0x0

Test | ML Tests | MBP M1 Max Machine Learning
caffe: AlexNet - CPU - 100 | 33496 | -
caffe: AlexNet - CPU - 200 | 65986 | -
caffe: AlexNet - CPU - 1000 | 325884 | -
caffe: GoogleNet - CPU - 100 | 86567 | -
caffe: GoogleNet - CPU - 200 | 173671 | -
caffe: GoogleNet - CPU - 1000 | 868758 | -
deepspeech: CPU | 74.44043 | -
ecp-candle: P1B2 | 37.51 | -
ecp-candle: P3B1 | 1463.722 | -
ecp-candle: P3B2 | 730.736 | -
lczero: BLAS | 563 | -
mlpack: scikit_ica | 48.40 | -
mlpack: scikit_qda | 65.69 | -
mlpack: scikit_svm | 17.60 | -
mlpack: scikit_linearridgeregression | 2.10 | -
mnn: mobilenetV3 | 1.202 | 9.152
mnn: squeezenetv1.1 | 2.803 | 7.274
mnn: resnet-v2-50 | 22.441 | 42.428
mnn: SqueezeNetV1.0 | 4.546 | 9.967
mnn: MobileNetV2_224 | 2.387 | 10.677
mnn: mobilenet-v1-1.0 | 2.440 | 8.205
mnn: inception-v3 | 31.576 | 58.253
ncnn: CPU - mobilenet | 15.95 | 20.32
ncnn: CPU-v2-v2 - mobilenet-v2 | 3.99 | 5.33
ncnn: CPU-v3-v3 - mobilenet-v3 | 3.41 | 4.36
ncnn: CPU - shufflenet-v2 | 2.75 | 3.47
ncnn: CPU - mnasnet | 3.25 | 5.40
ncnn: CPU - efficientnet-b0 | 5.22 | 8.69
ncnn: CPU - blazeface | 1.20 | 1.65
ncnn: CPU - googlenet | 13.74 | 24.96
ncnn: CPU - vgg16 | 71.97 | 71.01
ncnn: CPU - resnet18 | 15.78 | 16.82
ncnn: CPU - alexnet | 14.55 | 29.93
ncnn: CPU - resnet50 | 25.17 | 43.16
ncnn: CPU - yolov4-tiny | 24.97 | 30.24
ncnn: CPU - squeezenet_ssd | 18.56 | 20.53
ncnn: CPU - regnety_400m | 6.90 | 7.18
ncnn: Vulkan GPU - mobilenet | 10.27 | 20.30
ncnn: Vulkan GPU-v2-v2 - mobilenet-v2 | 3.88 | 5.30
ncnn: Vulkan GPU-v3-v3 - mobilenet-v3 | 4.69 | 4.35
ncnn: Vulkan GPU - shufflenet-v2 | 3.02 | 3.46
ncnn: Vulkan GPU - mnasnet | 3.89 | 5.37
ncnn: Vulkan GPU - efficientnet-b0 | 10.07 | 8.71
ncnn: Vulkan GPU - blazeface | 1.35 | 1.64
ncnn: Vulkan GPU - googlenet | 8.73 | 24.9
ncnn: Vulkan GPU - vgg16 | 43.99 | 70.89
ncnn: Vulkan GPU - resnet18 | 6.09 | 16.80
ncnn: Vulkan GPU - alexnet | 6.32 | 29.89
ncnn: Vulkan GPU - resnet50 | 13.12 | 43.08
ncnn: Vulkan GPU - yolov4-tiny | 18.82 | 30.33
ncnn: Vulkan GPU - squeezenet_ssd | 15.36 | 20.55
ncnn: Vulkan GPU - regnety_400m | 5.28 | 7.19
numpy | 422.45 | -
onednn: IP Shapes 1D - f32 - CPU | 4.25855 | -
onednn: IP Shapes 3D - f32 - CPU | 12.0926 | -
onednn: IP Shapes 1D - u8s8f32 - CPU | 1.62985 | -
onednn: IP Shapes 3D - u8s8f32 - CPU | 2.69210 | -
onednn: Convolution Batch Shapes Auto - f32 - CPU | 22.7926 | -
onednn: Deconvolution Batch shapes_1d - f32 - CPU | 8.34789 | -
onednn: Deconvolution Batch shapes_3d - f32 - CPU | 6.74559 | -
onednn: Convolution Batch Shapes Auto - u8s8f32 - CPU | 23.7674 | -
onednn: Deconvolution Batch shapes_1d - u8s8f32 - CPU | 2.11771 | -
onednn: Deconvolution Batch shapes_3d - u8s8f32 - CPU | 3.24784 | -
onednn: Recurrent Neural Network Training - f32 - CPU | 3579.00 | -
onednn: Recurrent Neural Network Inference - f32 - CPU | 2219.13 | -
onednn: Recurrent Neural Network Training - u8s8f32 - CPU | 3587.17 | -
onednn: Recurrent Neural Network Inference - u8s8f32 - CPU | 2228.17 | -
onednn: Matrix Multiply Batch Shapes Transformer - f32 - CPU | 4.59343 | -
onednn: Recurrent Neural Network Training - bf16bf16bf16 - CPU | 3577.00 | -
onednn: Recurrent Neural Network Inference - bf16bf16bf16 - CPU | 2237.65 | -
onednn: Matrix Multiply Batch Shapes Transformer - u8s8f32 - CPU | 2.98659 | -
opencv: DNN - Deep Neural Network | 13787 | -
plaidml: No - Inference - VGG16 - CPU | 12.47 | -
plaidml: No - Inference - ResNet 50 - CPU | 6.88 | -
rbenchmark | 0.1293 | -
rnnoise | 16.137 | -
tensorflow-lite: SqueezeNet | 189764 | -
tensorflow-lite: Inception V4 | 2749623 | -
tensorflow-lite: NASNet Mobile | 152186 | -
tensorflow-lite: Mobilenet Float | 127818 | -
tensorflow-lite: Mobilenet Quant | 141174 | -
tensorflow-lite: Inception ResNet V2 | 2479080 | -
tnn: CPU - DenseNet | 2736.173 | -
tnn: CPU - MobileNet v2 | 249.477 | -
tnn: CPU - SqueezeNet v2 | 55.434 | -
tnn: CPU - SqueezeNet v1.1 | 222.326 | -

Caffe

Model: AlexNet - Acceleration: CPU - Iterations: 100

Caffe 2020-02-13 - Milli-Seconds, Fewer Is Better
ML Tests: 33496 (SE +/- 37.32, N = 3)
1. (CXX) g++ options: -fPIC -O3 -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas

Caffe

Model: AlexNet - Acceleration: CPU - Iterations: 200

Caffe 2020-02-13 - Milli-Seconds, Fewer Is Better
ML Tests: 65986 (SE +/- 167.60, N = 3)
1. (CXX) g++ options: -fPIC -O3 -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas

Caffe

Model: AlexNet - Acceleration: CPU - Iterations: 1000

Caffe 2020-02-13 - Milli-Seconds, Fewer Is Better
ML Tests: 325884 (SE +/- 469.87, N = 3)
1. (CXX) g++ options: -fPIC -O3 -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas

Caffe

Model: GoogleNet - Acceleration: CPU - Iterations: 100

Caffe 2020-02-13 - Milli-Seconds, Fewer Is Better
ML Tests: 86567 (SE +/- 103.35, N = 3)
1. (CXX) g++ options: -fPIC -O3 -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas

Caffe

Model: GoogleNet - Acceleration: CPU - Iterations: 200

Caffe 2020-02-13 - Milli-Seconds, Fewer Is Better
ML Tests: 173671 (SE +/- 318.11, N = 3)
1. (CXX) g++ options: -fPIC -O3 -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas

Caffe

Model: GoogleNet - Acceleration: CPU - Iterations: 1000

Caffe 2020-02-13 - Milli-Seconds, Fewer Is Better
ML Tests: 868758 (SE +/- 470.76, N = 3)
1. (CXX) g++ options: -fPIC -O3 -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas
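
The three iteration counts per model allow a quick consistency check: dividing each Caffe total by its iteration count gives the time per iteration. The sketch below does that arithmetic on the ML Tests values reported above. The names `caffe_ms` and `per_iteration_ms` are illustrative assumptions and are not part of the Phoronix Test Suite.

```python
# Caffe totals for the "ML Tests" system, copied from the results above:
# (model, iterations) -> total time in milliseconds.
caffe_ms = {
    ("AlexNet", 100): 33496,
    ("AlexNet", 200): 65986,
    ("AlexNet", 1000): 325884,
    ("GoogleNet", 100): 86567,
    ("GoogleNet", 200): 173671,
    ("GoogleNet", 1000): 868758,
}


def per_iteration_ms(results):
    """Divide each total time by its iteration count."""
    return {key: total / key[1] for key, total in results.items()}


per_iter = per_iteration_ms(caffe_ms)
for (model, iterations), ms in per_iter.items():
    print(f"{model:9s} {iterations:5d} iterations: {ms:7.2f} ms/iteration")
```

For AlexNet this works out to roughly 335, 330 and 326 ms per iteration, so the per-iteration cost stays close to constant across the three run lengths.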

DeepSpeech

Acceleration: CPU

DeepSpeech 0.6 - Seconds, Fewer Is Better
ML Tests: 74.44 (SE +/- 0.17, N = 3)

ECP-CANDLE

Benchmark: P1B2

ECP-CANDLE 0.4 - Seconds, Fewer Is Better
ML Tests: 37.51

ECP-CANDLE

Benchmark: P3B1

ECP-CANDLE 0.4 - Seconds, Fewer Is Better
ML Tests: 1463.72

ECP-CANDLE

Benchmark: P3B2

ECP-CANDLE 0.4 - Seconds, Fewer Is Better
ML Tests: 730.74

LeelaChessZero

Backend: BLAS

LeelaChessZero 0.28 - Nodes Per Second, More Is Better
ML Tests: 563 (SE +/- 5.14, N = 7)
1. (CXX) g++ options: -flto -pthread

Mlpack Benchmark

Benchmark: scikit_ica

Mlpack Benchmark - Seconds, Fewer Is Better
ML Tests: 48.40 (SE +/- 0.12, N = 3)

Mlpack Benchmark

Benchmark: scikit_qda

Mlpack Benchmark - Seconds, Fewer Is Better
ML Tests: 65.69 (SE +/- 0.03, N = 3)

Mlpack Benchmark

Benchmark: scikit_svm

Mlpack Benchmark - Seconds, Fewer Is Better
ML Tests: 17.60 (SE +/- 0.02, N = 3)

Mlpack Benchmark

Benchmark: scikit_linearridgeregression

Mlpack Benchmark - Seconds, Fewer Is Better
ML Tests: 2.10 (SE +/- 0.01, N = 3)

Mobile Neural Network

Model: mobilenetV3

Mobile Neural Network 1.2 - ms, Fewer Is Better
ML Tests: 1.202 (SE +/- 0.005, N = 3)
MBP M1 Max Machine Learning: 9.152 (SE +/- 0.487, N = 9) -arch -isysroot - MIN: 3.37 / MAX: 58.79
1. (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions

Mobile Neural Network

Model: squeezenetv1.1

Mobile Neural Network 1.2 - ms, Fewer Is Better
ML Tests: 2.803 (SE +/- 0.009, N = 3) -fomit-frame-pointer -rdynamic -pthread -ldl - MIN: 2.6 / MAX: 17.21
MBP M1 Max Machine Learning: 7.274 (SE +/- 0.345, N = 9) -arch -isysroot - MIN: 2.75 / MAX: 117.92
1. (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions

Mobile Neural Network

Model: resnet-v2-50

Mobile Neural Network 1.2 - ms, Fewer Is Better
ML Tests: 22.44 (SE +/- 0.09, N = 3) -fomit-frame-pointer -rdynamic -pthread -ldl - MIN: 21.5 / MAX: 43.07
MBP M1 Max Machine Learning: 42.43 (SE +/- 4.17, N = 9) -arch -isysroot - MIN: 24 / MAX: 197.77
1. (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions

Mobile Neural Network

Model: SqueezeNetV1.0

Mobile Neural Network 1.2 - ms, Fewer Is Better
ML Tests: 4.546 (SE +/- 0.040, N = 3) -fomit-frame-pointer -rdynamic -pthread -ldl - MIN: 4.32 / MAX: 20.48
MBP M1 Max Machine Learning: 9.967 (SE +/- 0.664, N = 9) -arch -isysroot - MIN: 4.34 / MAX: 49.52
1. (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions

Mobile Neural Network

Model: MobileNetV2_224

Mobile Neural Network 1.2 - ms, Fewer Is Better
ML Tests: 2.387 (SE +/- 0.018, N = 3) -fomit-frame-pointer -rdynamic -pthread -ldl - MIN: 2.24 / MAX: 17.04
MBP M1 Max Machine Learning: 10.677 (SE +/- 0.187, N = 9) -arch -isysroot - MIN: 5.12 / MAX: 61.59
1. (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions

Mobile Neural Network

Model: mobilenet-v1-1.0

Mobile Neural Network 1.2 - ms, Fewer Is Better
ML Tests: 2.440 (SE +/- 0.019, N = 3) -fomit-frame-pointer -rdynamic -pthread -ldl - MIN: 2.17 / MAX: 18
MBP M1 Max Machine Learning: 8.205 (SE +/- 0.384, N = 9) -arch -isysroot - MIN: 4.27 / MAX: 48.5
1. (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions

Mobile Neural Network

Model: inception-v3

Mobile Neural Network 1.2 - ms, Fewer Is Better
ML Tests: 31.58 (SE +/- 0.42, N = 3) -fomit-frame-pointer -rdynamic -pthread -ldl - MIN: 29.6 / MAX: 48.32
MBP M1 Max Machine Learning: 58.25 (SE +/- 6.12, N = 9) -arch -isysroot - MIN: 30.46 / MAX: 200.21
1. (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions
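
The Mobile Neural Network results for the MacBook Pro carry larger standard errors (N = 9) than the ML Tests results (N = 3). One way to compare the spread across models is the relative standard error, SE divided by the mean. The following is a minimal sketch using the means and SE values reported above; the dictionary layout and the function names are illustrative assumptions, and "MBP" abbreviates "MBP M1 Max Machine Learning".

```python
# (mean latency in ms, standard error) per model, copied from the
# Mobile Neural Network 1.2 results above.
mnn = {
    "mobilenetV3":      {"ML Tests": (1.202, 0.005), "MBP": (9.152, 0.487)},
    "squeezenetv1.1":   {"ML Tests": (2.803, 0.009), "MBP": (7.274, 0.345)},
    "resnet-v2-50":     {"ML Tests": (22.44, 0.09),  "MBP": (42.43, 4.17)},
    "SqueezeNetV1.0":   {"ML Tests": (4.546, 0.040), "MBP": (9.967, 0.664)},
    "MobileNetV2_224":  {"ML Tests": (2.387, 0.018), "MBP": (10.677, 0.187)},
    "mobilenet-v1-1.0": {"ML Tests": (2.440, 0.019), "MBP": (8.205, 0.384)},
    "inception-v3":     {"ML Tests": (31.58, 0.42),  "MBP": (58.25, 6.12)},
}


def relative_se_percent(mean, se):
    """Standard error expressed as a percentage of the mean."""
    return 100.0 * se / mean


def slowdown(model):
    """MBP mean latency divided by ML Tests mean latency (above 1: MBP is slower)."""
    return mnn[model]["MBP"][0] / mnn[model]["ML Tests"][0]


for model, systems in mnn.items():
    ml_rse = relative_se_percent(*systems["ML Tests"])
    mbp_rse = relative_se_percent(*systems["MBP"])
    print(f"{model:17s} MBP/ML = {slowdown(model):5.2f}x  "
          f"rel. SE: ML {ml_rse:4.1f}%  MBP {mbp_rse:4.1f}%")
```

A relative SE near 10%, as for inception-v3 and resnet-v2-50 on the MacBook Pro, means the reported mean is considerably less stable than the sub-2% figures on the ML Tests system, which is worth keeping in mind when reading the ratios.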

NCNN

Target: CPU - Model: mobilenet

NCNN 20210720 - ms, Fewer Is Better
ML Tests: 15.95 (SE +/- 0.09, N = 3) -rdynamic -lgomp -lpthread - MIN: 14.92 / MAX: 35.66
MBP M1 Max Machine Learning: 20.32 (SE +/- 0.02, N = 3) -arch -isysroot - MIN: 20.23 / MAX: 21.33
1. (CXX) g++ options: -O3

NCNN

Target: CPU-v2-v2 - Model: mobilenet-v2

NCNN 20210720 - ms, Fewer Is Better
ML Tests: 3.99 (SE +/- 0.02, N = 3) -rdynamic -lgomp -lpthread - MIN: 3.71 / MAX: 19.11
MBP M1 Max Machine Learning: 5.33 (SE +/- 0.03, N = 3) -arch -isysroot - MIN: 5.27 / MAX: 5.61
1. (CXX) g++ options: -O3

NCNN

Target: CPU-v3-v3 - Model: mobilenet-v3

NCNN 20210720 - ms, Fewer Is Better
ML Tests: 3.41 (SE +/- 0.02, N = 3) -rdynamic -lgomp -lpthread - MIN: 3.11 / MAX: 17.47
MBP M1 Max Machine Learning: 4.36 (SE +/- 0.03, N = 3) -arch -isysroot - MIN: 4.32 / MAX: 4.61
1. (CXX) g++ options: -O3

NCNN

Target: CPU - Model: shufflenet-v2

NCNN 20210720 - ms, Fewer Is Better
ML Tests: 2.75 (SE +/- 0.04, N = 3) -rdynamic -lgomp -lpthread - MIN: 2.48 / MAX: 16.13
MBP M1 Max Machine Learning: 3.47 (SE +/- 0.02, N = 3) -arch -isysroot - MIN: 3.43 / MAX: 3.84
1. (CXX) g++ options: -O3

NCNN

Target: CPU - Model: mnasnet

NCNN 20210720 - ms, Fewer Is Better
ML Tests: 3.25 (SE +/- 0.03, N = 3) -rdynamic -lgomp -lpthread - MIN: 2.82 / MAX: 16.82
MBP M1 Max Machine Learning: 5.40 (SE +/- 0.03, N = 3) -arch -isysroot - MIN: 5.35 / MAX: 5.68
1. (CXX) g++ options: -O3

NCNN

Target: CPU - Model: efficientnet-b0

NCNN 20210720 - ms, Fewer Is Better
ML Tests: 5.22 (SE +/- 0.01, N = 3) -rdynamic -lgomp -lpthread - MIN: 4.86 / MAX: 20.63
MBP M1 Max Machine Learning: 8.69 (SE +/- 0.04, N = 3) -arch -isysroot - MIN: 8.59 / MAX: 9.15
1. (CXX) g++ options: -O3

NCNN

Target: CPU - Model: blazeface

NCNN 20210720 - ms, Fewer Is Better
ML Tests: 1.20 (SE +/- 0.01, N = 3) -rdynamic -lgomp -lpthread - MIN: 1.16 / MAX: 1.78
MBP M1 Max Machine Learning: 1.65 (SE +/- 0.01, N = 3) -arch -isysroot - MIN: 1.64 / MAX: 1.72
1. (CXX) g++ options: -O3

NCNN

Target: CPU - Model: googlenet

NCNN 20210720 - ms, Fewer Is Better
ML Tests: 13.74 (SE +/- 0.28, N = 3) -rdynamic -lgomp -lpthread - MIN: 12.47 / MAX: 28.56
MBP M1 Max Machine Learning: 24.96 (SE +/- 0.07, N = 3) -arch -isysroot - MIN: 24.82 / MAX: 25.91
1. (CXX) g++ options: -O3

NCNN

Target: CPU - Model: vgg16

NCNN 20210720 - ms, Fewer Is Better
ML Tests: 71.97 (SE +/- 0.16, N = 3) -rdynamic -lgomp -lpthread - MIN: 69.95 / MAX: 94.76
MBP M1 Max Machine Learning: 71.01 (SE +/- 0.15, N = 3) -arch -isysroot - MIN: 70.58 / MAX: 74.44
1. (CXX) g++ options: -O3

NCNN

Target: CPU - Model: resnet18

NCNN 20210720 - ms, Fewer Is Better
ML Tests: 15.78 (SE +/- 0.43, N = 3) -rdynamic -lgomp -lpthread - MIN: 14.59 / MAX: 30.81
MBP M1 Max Machine Learning: 16.82 (SE +/- 0.04, N = 3) -arch -isysroot - MIN: 16.69 / MAX: 17.58
1. (CXX) g++ options: -O3

NCNN

Target: CPU - Model: alexnet

NCNN 20210720 - ms, Fewer Is Better
ML Tests: 14.55 (SE +/- 0.06, N = 3) -rdynamic -lgomp -lpthread - MIN: 13.9 / MAX: 33.49
MBP M1 Max Machine Learning: 29.93 (SE +/- 0.05, N = 3) -arch -isysroot - MIN: 29.79 / MAX: 31.03
1. (CXX) g++ options: -O3

NCNN

Target: CPU - Model: resnet50

NCNN 20210720 - ms, Fewer Is Better
ML Tests: 25.17 (SE +/- 0.06, N = 3) -rdynamic -lgomp -lpthread - MIN: 23.91 / MAX: 41.27
MBP M1 Max Machine Learning: 43.16 (SE +/- 0.07, N = 3) -arch -isysroot - MIN: 42.92 / MAX: 44.81
1. (CXX) g++ options: -O3

NCNN

Target: CPU - Model: yolov4-tiny

NCNN 20210720 - ms, Fewer Is Better
ML Tests: 24.97 (SE +/- 0.12, N = 3) -rdynamic -lgomp -lpthread - MIN: 23.9 / MAX: 38.98
MBP M1 Max Machine Learning: 30.24 (SE +/- 0.03, N = 3) -arch -isysroot - MIN: 29.85 / MAX: 31.87
1. (CXX) g++ options: -O3

NCNN

Target: CPU - Model: squeezenet_ssd

NCNN 20210720 - ms, Fewer Is Better
ML Tests: 18.56 (SE +/- 0.15, N = 3) -rdynamic -lgomp -lpthread - MIN: 17.64 / MAX: 34.93
MBP M1 Max Machine Learning: 20.53 (SE +/- 0.05, N = 3) -arch -isysroot - MIN: 20.37 / MAX: 21.53
1. (CXX) g++ options: -O3

NCNN

Target: CPU - Model: regnety_400m

NCNN 20210720 - ms, Fewer Is Better
ML Tests: 6.90 (SE +/- 0.04, N = 3) -rdynamic -lgomp -lpthread - MIN: 6.35 / MAX: 21.53
MBP M1 Max Machine Learning: 7.18 (SE +/- 0.00, N = 3) -arch -isysroot - MIN: 7.14 / MAX: 8.13
1. (CXX) g++ options: -O3

NCNN

Target: Vulkan GPU - Model: mobilenet

NCNN 20210720 - ms, Fewer Is Better
ML Tests: 10.27 (SE +/- 0.09, N = 3) -rdynamic -lgomp -lpthread - MIN: 9.59 / MAX: 17.84
MBP M1 Max Machine Learning: 20.30 (SE +/- 0.02, N = 3) -arch -isysroot - MIN: 20.23 / MAX: 21.48
1. (CXX) g++ options: -O3

NCNN

Target: Vulkan GPU-v2-v2 - Model: mobilenet-v2

NCNN 20210720 - ms, Fewer Is Better
ML Tests: 3.88 (SE +/- 0.06, N = 3) -rdynamic -lgomp -lpthread - MIN: 3.49 / MAX: 5.25
MBP M1 Max Machine Learning: 5.30 (SE +/- 0.01, N = 3) -arch -isysroot - MIN: 5.28 / MAX: 5.98
1. (CXX) g++ options: -O3

NCNN

Target: Vulkan GPU-v3-v3 - Model: mobilenet-v3

NCNN 20210720 - ms, Fewer Is Better
ML Tests: 4.69 (SE +/- 0.12, N = 3) -rdynamic -lgomp -lpthread - MIN: 4.29 / MAX: 5.94
MBP M1 Max Machine Learning: 4.35 (SE +/- 0.00, N = 3) -arch -isysroot - MIN: 4.32 / MAX: 4.63
1. (CXX) g++ options: -O3

NCNN

Target: Vulkan GPU - Model: shufflenet-v2

NCNN 20210720 - ms, Fewer Is Better
ML Tests: 3.02 (SE +/- 0.07, N = 3) -rdynamic -lgomp -lpthread - MIN: 2.54 / MAX: 4.38
MBP M1 Max Machine Learning: 3.46 (SE +/- 0.01, N = 3) -arch -isysroot - MIN: 3.44 / MAX: 3.82
1. (CXX) g++ options: -O3

NCNN

Target: Vulkan GPU - Model: mnasnet

NCNN 20210720 - ms, Fewer Is Better
ML Tests: 3.89 (SE +/- 0.11, N = 2) -rdynamic -lgomp -lpthread - MIN: 3.56 / MAX: 5.01
MBP M1 Max Machine Learning: 5.37 (SE +/- 0.00, N = 3) -arch -isysroot - MIN: 5.35 / MAX: 5.62
1. (CXX) g++ options: -O3

NCNN

Target: Vulkan GPU - Model: efficientnet-b0

NCNN 20210720 - ms, Fewer Is Better
ML Tests: 10.07 (SE +/- 0.06, N = 3) -rdynamic -lgomp -lpthread - MIN: 9.06 / MAX: 11.43
MBP M1 Max Machine Learning: 8.71 (SE +/- 0.02, N = 3) -arch -isysroot - MIN: 8.6 / MAX: 9.43
1. (CXX) g++ options: -O3

NCNN

Target: Vulkan GPU - Model: blazeface

NCNN 20210720 - ms, Fewer Is Better
ML Tests: 1.35 (SE +/- 0.01, N = 3) -rdynamic -lgomp -lpthread - MIN: 1.17 / MAX: 2.43
MBP M1 Max Machine Learning: 1.64 (SE +/- 0.00, N = 3) -arch -isysroot - MIN: 1.63 / MAX: 1.79
1. (CXX) g++ options: -O3

NCNN

Target: Vulkan GPU - Model: googlenet

NCNN 20210720 - ms, Fewer Is Better
ML Tests: 8.73 (SE +/- 0.25, N = 3) -rdynamic -lgomp -lpthread - MIN: 7.89 / MAX: 10.64
MBP M1 Max Machine Learning: 24.90 (SE +/- 0.00, N = 3) -arch -isysroot - MIN: 24.82 / MAX: 25.79
1. (CXX) g++ options: -O3

NCNN

Target: Vulkan GPU - Model: vgg16

NCNN 20210720 - ms, Fewer Is Better
ML Tests: 43.99 (SE +/- 0.07, N = 3) -rdynamic -lgomp -lpthread - MIN: 43.17 / MAX: 45.59
MBP M1 Max Machine Learning: 70.89 (SE +/- 0.02, N = 3) -arch -isysroot - MIN: 70.59 / MAX: 73.62
1. (CXX) g++ options: -O3

NCNN

Target: Vulkan GPU - Model: resnet18

NCNN 20210720 - ms, Fewer Is Better
ML Tests: 6.09 (SE +/- 0.08, N = 3) -rdynamic -lgomp -lpthread - MIN: 5.63 / MAX: 7.52
MBP M1 Max Machine Learning: 16.80 (SE +/- 0.01, N = 3) -arch -isysroot - MIN: 16.69 / MAX: 18.25
1. (CXX) g++ options: -O3

NCNN

Target: Vulkan GPU - Model: alexnet

NCNN 20210720 - ms, Fewer Is Better
ML Tests: 6.32 (SE +/- 0.03, N = 3) -rdynamic -lgomp -lpthread - MIN: 5.95 / MAX: 7.49
MBP M1 Max Machine Learning: 29.89 (SE +/- 0.00, N = 3) -arch -isysroot - MIN: 29.79 / MAX: 31.07
1. (CXX) g++ options: -O3

NCNN

Target: Vulkan GPU - Model: resnet50

NCNN 20210720 - ms, Fewer Is Better
ML Tests: 13.12 (SE +/- 0.06, N = 3) -rdynamic -lgomp -lpthread - MIN: 12.27 / MAX: 15.04
MBP M1 Max Machine Learning: 43.08 (SE +/- 0.01, N = 3) -arch -isysroot - MIN: 42.9 / MAX: 45.66
1. (CXX) g++ options: -O3

NCNN

Target: Vulkan GPU - Model: yolov4-tiny

NCNN 20210720 - ms, Fewer Is Better
ML Tests: 18.82 (SE +/- 0.42, N = 3) -rdynamic -lgomp -lpthread - MIN: 17.12 / MAX: 24.45
MBP M1 Max Machine Learning: 30.33 (SE +/- 0.07, N = 3) -arch -isysroot - MIN: 29.85 / MAX: 32.58
1. (CXX) g++ options: -O3

NCNN

Target: Vulkan GPU - Model: squeezenet_ssd

NCNN 20210720 - ms, Fewer Is Better
ML Tests: 15.36 (SE +/- 0.36, N = 3) -rdynamic -lgomp -lpthread - MIN: 14.17 / MAX: 22.53
MBP M1 Max Machine Learning: 20.55 (SE +/- 0.05, N = 3) -arch -isysroot - MIN: 20.39 / MAX: 22.13
1. (CXX) g++ options: -O3

NCNN

Target: Vulkan GPU - Model: regnety_400m

NCNN 20210720 - ms, Fewer Is Better
ML Tests: 5.28 (SE +/- 0.06, N = 3) -rdynamic -lgomp -lpthread - MIN: 4.68 / MAX: 6.44
MBP M1 Max Machine Learning: 7.19 (SE +/- 0.00, N = 3) -arch -isysroot - MIN: 7.15 / MAX: 7.72
1. (CXX) g++ options: -O3
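
Because NCNN was run with both a CPU and a Vulkan GPU target on each system, the two targets can be compared directly by dividing the CPU latency by the Vulkan latency for the same model. The sketch below does this for five of the fifteen models, using the means reported above; the selection of models and the names `ncnn_ms` and `vulkan_speedup` are illustrative assumptions.

```python
# Mean latency in ms, copied from the NCNN 20210720 results above,
# stored as (CPU target, Vulkan GPU target).
ncnn_ms = {
    "ML Tests": {
        "mobilenet": (15.95, 10.27),
        "googlenet": (13.74, 8.73),
        "vgg16": (71.97, 43.99),
        "alexnet": (14.55, 6.32),
        "resnet50": (25.17, 13.12),
    },
    "MBP M1 Max Machine Learning": {
        "mobilenet": (20.32, 20.30),
        "googlenet": (24.96, 24.90),
        "vgg16": (71.01, 70.89),
        "alexnet": (29.93, 29.89),
        "resnet50": (43.16, 43.08),
    },
}


def vulkan_speedup(system, model):
    """CPU latency divided by Vulkan latency (above 1: Vulkan target is faster)."""
    cpu_ms, vulkan_ms = ncnn_ms[system][model]
    return cpu_ms / vulkan_ms


for system, models in ncnn_ms.items():
    for model in models:
        print(f"{system:28s} {model:10s} {vulkan_speedup(system, model):.2f}x")
```

On the ML Tests system these five models run between roughly 1.5x and 2.3x faster with the Vulkan target, while on the MacBook Pro every ratio is within 1% of 1.0. The export does not state which device executed the Vulkan target on macOS, so the near-identical numbers should be read as an observation rather than an explanation.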

Numpy Benchmark

Numpy Benchmark - Score, More Is Better
ML Tests: 422.45 (SE +/- 0.84, N = 3)

oneDNN

Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU

oneDNN 2.1.2 - ms, Fewer Is Better
ML Tests: 4.25855 (SE +/- 0.03780, N = 7) MIN: 3.88
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

oneDNN

Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU

oneDNN 2.1.2 - ms, Fewer Is Better
ML Tests: 12.09 (SE +/- 0.02, N = 3) MIN: 11.92
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

oneDNN

Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPU

oneDNN 2.1.2 - ms, Fewer Is Better
ML Tests: 1.62985 (SE +/- 0.00920, N = 3) MIN: 1.49
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

oneDNN

Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU

oneDNN 2.1.2 - ms, Fewer Is Better
ML Tests: 2.69210 (SE +/- 0.00222, N = 3) MIN: 2.57
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

oneDNN

Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU

oneDNN 2.1.2 - ms, Fewer Is Better
ML Tests: 22.79 (SE +/- 0.03, N = 3) MIN: 21.94
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

oneDNN

Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU

oneDNN 2.1.2 - ms, Fewer Is Better
ML Tests: 8.34789 (SE +/- 0.02843, N = 3) MIN: 4.75
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

oneDNN

Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU

oneDNN 2.1.2 - ms, Fewer Is Better
ML Tests: 6.74559 (SE +/- 0.01002, N = 3) MIN: 6.52
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

oneDNN

Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU

oneDNN 2.1.2 - ms, Fewer Is Better
ML Tests: 23.77 (SE +/- 0.02, N = 3) MIN: 22.9
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

oneDNN

Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU

oneDNN 2.1.2 - ms, Fewer Is Better
ML Tests: 2.11771 (SE +/- 0.00458, N = 3) MIN: 1.91
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

oneDNN

Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU

oneDNN 2.1.2 - ms, Fewer Is Better
ML Tests: 3.24784 (SE +/- 0.02694, N = 3) MIN: 2.76
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

oneDNN

Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU

oneDNN 2.1.2 - ms, Fewer Is Better
ML Tests: 3579.00 (SE +/- 7.39, N = 3) MIN: 3519.93
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

oneDNN

Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU

oneDNN 2.1.2 - ms, Fewer Is Better
ML Tests: 2219.13 (SE +/- 1.17, N = 3) MIN: 2182.15
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

oneDNN

Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU

oneDNN 2.1.2 - ms, Fewer Is Better
ML Tests: 3587.17 (SE +/- 4.16, N = 3) MIN: 3527.15
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

oneDNN

Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU

oneDNN 2.1.2 - ms, Fewer Is Better
ML Tests: 2228.17 (SE +/- 6.34, N = 3) MIN: 2189.62
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

oneDNN

Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU

oneDNN 2.1.2 - ms, Fewer Is Better
ML Tests: 4.59343 (SE +/- 0.00541, N = 3) MIN: 4.39
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

oneDNN

Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 2.1.2 - ms, Fewer Is Better
ML Tests: 3577.00 (SE +/- 4.93, N = 3) MIN: 3514.72
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

oneDNN

Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 2.1.2 - ms, Fewer Is Better
ML Tests: 2237.65 (SE +/- 14.65, N = 14) MIN: 2174.76
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

oneDNN

Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32 - Engine: CPU

oneDNN 2.1.2 - ms, Fewer Is Better
ML Tests: 2.98659 (SE +/- 0.01068, N = 3) MIN: 2.72
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

OpenCV

Test: DNN - Deep Neural Network

OpenCV 4.5.4 - ms, Fewer Is Better
ML Tests: 13787 (SE +/- 269.19, N = 15)
1. (CXX) g++ options: -fPIC -fsigned-char -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -msse -msse2 -msse3 -fvisibility=hidden -O3 -shared

PlaidML

FP16: No - Mode: Inference - Network: VGG16 - Device: CPU

PlaidML - FPS, More Is Better
ML Tests: 12.47 (SE +/- 0.07, N = 3)

PlaidML

FP16: No - Mode: Inference - Network: ResNet 50 - Device: CPU

PlaidML - FPS, More Is Better
ML Tests: 6.88 (SE +/- 0.02, N = 3)

R Benchmark

R Benchmark - Seconds, Fewer Is Better
ML Tests: 0.1293 (SE +/- 0.0003, N = 3)
1. R scripting front-end version 4.0.4 (2021-02-15)

RNNoise

RNNoise 2020-06-28 - Seconds, Fewer Is Better
ML Tests: 16.14 (SE +/- 0.02, N = 3)
1. (CC) gcc options: -O2 -pedantic -fvisibility=hidden

TensorFlow Lite

Model: SqueezeNet

TensorFlow Lite 2020-08-23 - Microseconds, Fewer Is Better
ML Tests: 189764 (SE +/- 108.90, N = 3)

TensorFlow Lite

Model: Inception V4

TensorFlow Lite 2020-08-23 - Microseconds, Fewer Is Better
ML Tests: 2749623 (SE +/- 1719.91, N = 3)

TensorFlow Lite

Model: NASNet Mobile

TensorFlow Lite 2020-08-23 - Microseconds, Fewer Is Better
ML Tests: 152186 (SE +/- 344.97, N = 3)

TensorFlow Lite

Model: Mobilenet Float

TensorFlow Lite 2020-08-23 - Microseconds, Fewer Is Better
ML Tests: 127818 (SE +/- 174.79, N = 3)

TensorFlow Lite

Model: Mobilenet Quant

TensorFlow Lite 2020-08-23 - Microseconds, Fewer Is Better
ML Tests: 141174 (SE +/- 38.25, N = 3)

TensorFlow Lite

Model: Inception ResNet V2

TensorFlow Lite 2020-08-23 - Microseconds, Fewer Is Better
ML Tests: 2479080 (SE +/- 1189.89, N = 3)

TNN

Target: CPU - Model: DenseNet

TNN 0.3 - ms, Fewer Is Better
ML Tests: 2736.17 (SE +/- 0.83, N = 3) MIN: 2687.97 / MAX: 2827.52
1. (CXX) g++ options: -fopenmp -pthread -fvisibility=hidden -fvisibility=default -O3 -rdynamic -ldl

TNN

Target: CPU - Model: MobileNet v2

TNN 0.3 - ms, Fewer Is Better
ML Tests: 249.48 (SE +/- 0.40, N = 3) MIN: 247.22 / MAX: 255.16
1. (CXX) g++ options: -fopenmp -pthread -fvisibility=hidden -fvisibility=default -O3 -rdynamic -ldl

TNN

Target: CPU - Model: SqueezeNet v2

TNN 0.3 - ms, Fewer Is Better
ML Tests: 55.43 (SE +/- 0.62, N = 3) MIN: 54.24 / MAX: 57.06
1. (CXX) g++ options: -fopenmp -pthread -fvisibility=hidden -fvisibility=default -O3 -rdynamic -ldl

TNN

Target: CPU - Model: SqueezeNet v1.1

TNN 0.3 - ms, Fewer Is Better
ML Tests: 222.33 (SE +/- 0.13, N = 3) MIN: 221.49 / MAX: 224.65
1. (CXX) g++ options: -fopenmp -pthread -fvisibility=hidden -fvisibility=default -O3 -rdynamic -ldl


Phoronix Test Suite v10.8.5