MBP M1 Max Machine Learning, sys76-kudu-ML
MBP M1 Max Machine Learning: Apple M1 Max testing with an Apple MacBook Pro and Apple M1 Max on macOS 12.1 via the Phoronix Test Suite.
sys76-kudu-ML: AMD Ryzen 9 5900HX testing with a System76 Kudu (1.07.09RSA1 BIOS) and AMD Cezanne on Pop 21.10 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2202161-NE-MBPM1MAXM40,2202165-NE-SYS76KUDU88&grw&rdt .
MBP M1 Max Machine Learning, sys76-kudu-ML: system details

ML Tests
  Processor: AMD Ryzen 9 5900HX @ 3.30GHz (8 Cores / 16 Threads)
  Motherboard: System76 Kudu (1.07.09RSA1 BIOS)
  Chipset: AMD Renoir/Cezanne
  Memory: 16GB
  Disk: Samsung SSD 970 EVO Plus 500GB
  Graphics: AMD Cezanne (2100/400MHz)
  Audio: AMD Renoir Radeon HD Audio
  Network: Realtek RTL8125 2.5GbE + Intel Wi-Fi 6 AX200
  OS: Pop 21.10
  Kernel: 5.15.15-76051515-generic (x86_64)
  Desktop: GNOME Shell 40.5
  Display Server: X Server 1.20.13
  OpenGL: 4.6 Mesa 21.2.2 (LLVM 12.0.1)
  Vulkan: 1.2.182
  Compiler: GCC 11.2.0
  File-System: ext4
  Screen Resolution: 1920x1080

MBP M1 Max Machine Learning
  Processor: Apple M1 Max (10 Cores)
  Motherboard: Apple MacBook Pro
  Memory: 64GB
  Disk: 1859GB
  Graphics: Apple M1 Max
  Monitor: Color LCD
  OS: macOS 12.1
  Kernel: 21.2.0 (arm64)
  OpenCL: OpenCL 1.2 (Nov 13 2021 00:45:09)
  Compiler: GCC 13.0.0 + Clang 13.0.0
  File-System: APFS
  Screen Resolution: 3456x2234

Kernel Details - ML Tests: Transparent Huge Pages: madvise
Compiler Details - ML Tests: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-11-ZPT0kp/gcc-11-11.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-ZPT0kp/gcc-11-11.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Details - ML Tests: Scaling Governor: acpi-cpufreq schedutil (Boost: Enabled) - CPU Microcode: 0xa50000c
Graphics Details - ML Tests: GLAMOR - BAR1 / Visible vRAM Size: 512 MB
Python Details - ML Tests: Python 3.9.7
Python Details - MBP M1 Max Machine Learning: Python 2.7.18 + Python 3.8.9
Security Details - ML Tests: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Full AMD retpoline IBPB: conditional IBRS_FW STIBP: always-on RSB filling + srbds: Not affected + tsx_async_abort: Not affected
Environment Details - MBP M1 Max Machine Learning: XPC_FLAGS=0x0
MBP M1 Max Machine Learning, sys76-kudu-ML: results overview
A dash marks a test with no result listed for that system.

Test | ML Tests | MBP M1 Max Machine Learning
plaidml: No - Inference - VGG16 - CPU | 12.47 | -
plaidml: No - Inference - ResNet 50 - CPU | 6.88 | -
lczero: BLAS | 563 | -
rbenchmark | 0.1293 | -
numpy | 422.45 | -
deepspeech: CPU | 74.44043 | -
rnnoise | 16.137 | -
ecp-candle: P1B2 | 37.51 | -
ecp-candle: P3B1 | 1463.722 | -
ecp-candle: P3B2 | 730.736 | -
mnn: mobilenetV3 | 1.202 | 9.152
mnn: squeezenetv1.1 | 2.803 | 7.274
mnn: resnet-v2-50 | 22.441 | 42.428
mnn: SqueezeNetV1.0 | 4.546 | 9.967
mnn: MobileNetV2_224 | 2.387 | 10.677
mnn: mobilenet-v1-1.0 | 2.440 | 8.205
mnn: inception-v3 | 31.576 | 58.253
opencv: DNN - Deep Neural Network | 13787 | -
tensorflow-lite: SqueezeNet | 189764 | -
tensorflow-lite: Inception V4 | 2749623 | -
tensorflow-lite: NASNet Mobile | 152186 | -
tensorflow-lite: Mobilenet Float | 127818 | -
tensorflow-lite: Mobilenet Quant | 141174 | -
tensorflow-lite: Inception ResNet V2 | 2479080 | -
tnn: CPU - DenseNet | 2736.173 | -
tnn: CPU - MobileNet v2 | 249.477 | -
tnn: CPU - SqueezeNet v2 | 55.434 | -
tnn: CPU - SqueezeNet v1.1 | 222.326 | -
caffe: AlexNet - CPU - 100 | 33496 | -
caffe: AlexNet - CPU - 200 | 65986 | -
caffe: AlexNet - CPU - 1000 | 325884 | -
caffe: GoogleNet - CPU - 100 | 86567 | -
caffe: GoogleNet - CPU - 200 | 173671 | -
caffe: GoogleNet - CPU - 1000 | 868758 | -
ncnn: CPU - mobilenet | 15.95 | 20.32
ncnn: CPU-v2-v2 - mobilenet-v2 | 3.99 | 5.33
ncnn: CPU-v3-v3 - mobilenet-v3 | 3.41 | 4.36
ncnn: CPU - shufflenet-v2 | 2.75 | 3.47
ncnn: CPU - mnasnet | 3.25 | 5.40
ncnn: CPU - efficientnet-b0 | 5.22 | 8.69
ncnn: CPU - blazeface | 1.20 | 1.65
ncnn: CPU - googlenet | 13.74 | 24.96
ncnn: CPU - vgg16 | 71.97 | 71.01
ncnn: CPU - resnet18 | 15.78 | 16.82
ncnn: CPU - alexnet | 14.55 | 29.93
ncnn: CPU - resnet50 | 25.17 | 43.16
ncnn: CPU - yolov4-tiny | 24.97 | 30.24
ncnn: CPU - squeezenet_ssd | 18.56 | 20.53
ncnn: CPU - regnety_400m | 6.90 | 7.18
ncnn: Vulkan GPU - mobilenet | 10.27 | 20.30
ncnn: Vulkan GPU-v2-v2 - mobilenet-v2 | 3.88 | 5.30
ncnn: Vulkan GPU-v3-v3 - mobilenet-v3 | 4.69 | 4.35
ncnn: Vulkan GPU - shufflenet-v2 | 3.02 | 3.46
ncnn: Vulkan GPU - mnasnet | 3.89 | 5.37
ncnn: Vulkan GPU - efficientnet-b0 | 10.07 | 8.71
ncnn: Vulkan GPU - blazeface | 1.35 | 1.64
ncnn: Vulkan GPU - googlenet | 8.73 | 24.9
ncnn: Vulkan GPU - vgg16 | 43.99 | 70.89
ncnn: Vulkan GPU - resnet18 | 6.09 | 16.80
ncnn: Vulkan GPU - alexnet | 6.32 | 29.89
ncnn: Vulkan GPU - resnet50 | 13.12 | 43.08
ncnn: Vulkan GPU - yolov4-tiny | 18.82 | 30.33
ncnn: Vulkan GPU - squeezenet_ssd | 15.36 | 20.55
ncnn: Vulkan GPU - regnety_400m | 5.28 | 7.19
mlpack: scikit_ica | 48.40 | -
mlpack: scikit_qda | 65.69 | -
mlpack: scikit_svm | 17.60 | -
mlpack: scikit_linearridgeregression | 2.10 | -
onednn: IP Shapes 1D - f32 - CPU | 4.25855 | -
onednn: IP Shapes 3D - f32 - CPU | 12.0926 | -
onednn: IP Shapes 1D - u8s8f32 - CPU | 1.62985 | -
onednn: IP Shapes 3D - u8s8f32 - CPU | 2.69210 | -
onednn: Convolution Batch Shapes Auto - f32 - CPU | 22.7926 | -
onednn: Deconvolution Batch shapes_1d - f32 - CPU | 8.34789 | -
onednn: Deconvolution Batch shapes_3d - f32 - CPU | 6.74559 | -
onednn: Convolution Batch Shapes Auto - u8s8f32 - CPU | 23.7674 | -
onednn: Deconvolution Batch shapes_1d - u8s8f32 - CPU | 2.11771 | -
onednn: Deconvolution Batch shapes_3d - u8s8f32 - CPU | 3.24784 | -
onednn: Recurrent Neural Network Training - f32 - CPU | 3579.00 | -
onednn: Recurrent Neural Network Inference - f32 - CPU | 2219.13 | -
onednn: Recurrent Neural Network Training - u8s8f32 - CPU | 3587.17 | -
onednn: Recurrent Neural Network Inference - u8s8f32 - CPU | 2228.17 | -
onednn: Matrix Multiply Batch Shapes Transformer - f32 - CPU | 4.59343 | -
onednn: Recurrent Neural Network Training - bf16bf16bf16 - CPU | 3577.00 | -
onednn: Recurrent Neural Network Inference - bf16bf16bf16 - CPU | 2237.65 | -
onednn: Matrix Multiply Batch Shapes Transformer - u8s8f32 - CPU | 2.98659 | -
openvino: Age Gender Recognition Retail 0013 FP32 - Intel GPU | - | -
PlaidML - FP16: No - Mode: Inference - Network: VGG16 - Device: CPU (FPS, More Is Better): ML Tests 12.47 (SE +/- 0.07, N = 3)
PlaidML - FP16: No - Mode: Inference - Network: ResNet 50 - Device: CPU (FPS, More Is Better): ML Tests 6.88 (SE +/- 0.02, N = 3)
LeelaChessZero 0.28 - Backend: BLAS (Nodes Per Second, More Is Better): ML Tests 563 (SE +/- 5.14, N = 7). (CXX) g++ options: -flto -pthread
R Benchmark (Seconds, Fewer Is Better): ML Tests 0.1293 (SE +/- 0.0003, N = 3). R scripting front-end version 4.0.4 (2021-02-15)
Numpy Benchmark (Score, More Is Better): ML Tests 422.45 (SE +/- 0.84, N = 3)
DeepSpeech 0.6 - Acceleration: CPU (Seconds, Fewer Is Better): ML Tests 74.44 (SE +/- 0.17, N = 3)
RNNoise 2020-06-28 (Seconds, Fewer Is Better): ML Tests 16.14 (SE +/- 0.02, N = 3). (CC) gcc options: -O2 -pedantic -fvisibility=hidden
ECP-CANDLE 0.4 - Benchmark: P1B2 (Seconds, Fewer Is Better): ML Tests 37.51
ECP-CANDLE 0.4 - Benchmark: P3B1 (Seconds, Fewer Is Better): ML Tests 1463.72
ECP-CANDLE 0.4 - Benchmark: P3B2 (Seconds, Fewer Is Better): ML Tests 730.74
Mobile Neural Network 1.2 - Model: mobilenetV3 (ms, Fewer Is Better): ML Tests 1.202 (SE +/- 0.005, N = 3); MBP M1 Max Machine Learning 9.152 (SE +/- 0.487, N = 9; -arch -isysroot - MIN: 3.37 / MAX: 58.79)
Mobile Neural Network 1.2 - Model: squeezenetv1.1 (ms, Fewer Is Better): ML Tests 2.803 (SE +/- 0.009, N = 3; -fomit-frame-pointer -rdynamic -pthread -ldl - MIN: 2.6 / MAX: 17.21); MBP M1 Max Machine Learning 7.274 (SE +/- 0.345, N = 9; -arch -isysroot - MIN: 2.75 / MAX: 117.92)
Mobile Neural Network 1.2 - Model: resnet-v2-50 (ms, Fewer Is Better): ML Tests 22.44 (SE +/- 0.09, N = 3; -fomit-frame-pointer -rdynamic -pthread -ldl - MIN: 21.5 / MAX: 43.07); MBP M1 Max Machine Learning 42.43 (SE +/- 4.17, N = 9; -arch -isysroot - MIN: 24 / MAX: 197.77)
Mobile Neural Network 1.2 - Model: SqueezeNetV1.0 (ms, Fewer Is Better): ML Tests 4.546 (SE +/- 0.040, N = 3; -fomit-frame-pointer -rdynamic -pthread -ldl - MIN: 4.32 / MAX: 20.48); MBP M1 Max Machine Learning 9.967 (SE +/- 0.664, N = 9; -arch -isysroot - MIN: 4.34 / MAX: 49.52)
Mobile Neural Network 1.2 - Model: MobileNetV2_224 (ms, Fewer Is Better): ML Tests 2.387 (SE +/- 0.018, N = 3; -fomit-frame-pointer -rdynamic -pthread -ldl - MIN: 2.24 / MAX: 17.04); MBP M1 Max Machine Learning 10.677 (SE +/- 0.187, N = 9; -arch -isysroot - MIN: 5.12 / MAX: 61.59)
Mobile Neural Network 1.2 - Model: mobilenet-v1-1.0 (ms, Fewer Is Better): ML Tests 2.440 (SE +/- 0.019, N = 3; -fomit-frame-pointer -rdynamic -pthread -ldl - MIN: 2.17 / MAX: 18); MBP M1 Max Machine Learning 8.205 (SE +/- 0.384, N = 9; -arch -isysroot - MIN: 4.27 / MAX: 48.5)
Mobile Neural Network 1.2 - Model: inception-v3 (ms, Fewer Is Better): ML Tests 31.58 (SE +/- 0.42, N = 3; -fomit-frame-pointer -rdynamic -pthread -ldl - MIN: 29.6 / MAX: 48.32); MBP M1 Max Machine Learning 58.25 (SE +/- 6.12, N = 9; -arch -isysroot - MIN: 30.46 / MAX: 200.21)
1. (CXX) g++ options (all Mobile Neural Network results): -std=c++11 -O3 -fvisibility=hidden -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions
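As a rough way to summarize the seven Mobile Neural Network results, the sketch below divides each MBP M1 Max Machine Learning latency by the corresponding ML Tests latency and takes the geometric mean of the ratios. The name `mnn_ms` and the choice of a geometric mean are illustrative assumptions, not part of the Phoronix Test Suite export; the numbers are copied from the results above.

```python
from statistics import geometric_mean

# Mean latency in ms per model: (ML Tests, MBP M1 Max Machine Learning).
# `mnn_ms` is a hypothetical name used only for this illustration.
mnn_ms = {
    "mobilenetV3": (1.202, 9.152),
    "squeezenetv1.1": (2.803, 7.274),
    "resnet-v2-50": (22.441, 42.428),
    "SqueezeNetV1.0": (4.546, 9.967),
    "MobileNetV2_224": (2.387, 10.677),
    "mobilenet-v1-1.0": (2.440, 8.205),
    "inception-v3": (31.576, 58.253),
}

# The unit is ms (fewer is better), so a ratio above 1 means the
# M1 Max run took longer than the Ryzen 9 5900HX run for that model.
ratios = {model: m1 / amd for model, (amd, m1) in mnn_ms.items()}
overall = geometric_mean(list(ratios.values()))

for model, ratio in ratios.items():
    print(f"{model:18s} {ratio:.2f}x")
print(f"geometric mean     {overall:.2f}x")
```

With these inputs the geometric mean comes out at roughly 3x. The M1 Max runs also show much wider MIN/MAX ranges and larger standard errors (N = 9), so the ratios are best read as indicative rather than precise.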
OpenCV 4.5.4 - Test: DNN - Deep Neural Network (ms, Fewer Is Better): ML Tests 13787 (SE +/- 269.19, N = 15). (CXX) g++ options: -fPIC -fsigned-char -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -msse -msse2 -msse3 -fvisibility=hidden -O3 -shared
TensorFlow Lite 2020-08-23 - Model: SqueezeNet (Microseconds, Fewer Is Better): ML Tests 189764 (SE +/- 108.90, N = 3)
TensorFlow Lite 2020-08-23 - Model: Inception V4 (Microseconds, Fewer Is Better): ML Tests 2749623 (SE +/- 1719.91, N = 3)
TensorFlow Lite 2020-08-23 - Model: NASNet Mobile (Microseconds, Fewer Is Better): ML Tests 152186 (SE +/- 344.97, N = 3)
TensorFlow Lite 2020-08-23 - Model: Mobilenet Float (Microseconds, Fewer Is Better): ML Tests 127818 (SE +/- 174.79, N = 3)
TensorFlow Lite 2020-08-23 - Model: Mobilenet Quant (Microseconds, Fewer Is Better): ML Tests 141174 (SE +/- 38.25, N = 3)
TensorFlow Lite 2020-08-23 - Model: Inception ResNet V2 (Microseconds, Fewer Is Better): ML Tests 2479080 (SE +/- 1189.89, N = 3)
TNN 0.3 - Target: CPU - Model: DenseNet (ms, Fewer Is Better): ML Tests 2736.17 (SE +/- 0.83, N = 3; MIN: 2687.97 / MAX: 2827.52)
TNN 0.3 - Target: CPU - Model: MobileNet v2 (ms, Fewer Is Better): ML Tests 249.48 (SE +/- 0.40, N = 3; MIN: 247.22 / MAX: 255.16)
TNN 0.3 - Target: CPU - Model: SqueezeNet v2 (ms, Fewer Is Better): ML Tests 55.43 (SE +/- 0.62, N = 3; MIN: 54.24 / MAX: 57.06)
TNN 0.3 - Target: CPU - Model: SqueezeNet v1.1 (ms, Fewer Is Better): ML Tests 222.33 (SE +/- 0.13, N = 3; MIN: 221.49 / MAX: 224.65)
1. (CXX) g++ options (all TNN results): -fopenmp -pthread -fvisibility=hidden -fvisibility=default -O3 -rdynamic -ldl
Caffe 2020-02-13 - Model: AlexNet - Acceleration: CPU - Iterations: 100 (Milli-Seconds, Fewer Is Better): ML Tests 33496 (SE +/- 37.32, N = 3)
Caffe 2020-02-13 - Model: AlexNet - Acceleration: CPU - Iterations: 200 (Milli-Seconds, Fewer Is Better): ML Tests 65986 (SE +/- 167.60, N = 3)
Caffe 2020-02-13 - Model: AlexNet - Acceleration: CPU - Iterations: 1000 (Milli-Seconds, Fewer Is Better): ML Tests 325884 (SE +/- 469.87, N = 3)
Caffe 2020-02-13 - Model: GoogleNet - Acceleration: CPU - Iterations: 100 (Milli-Seconds, Fewer Is Better): ML Tests 86567 (SE +/- 103.35, N = 3)
Caffe 2020-02-13 - Model: GoogleNet - Acceleration: CPU - Iterations: 200 (Milli-Seconds, Fewer Is Better): ML Tests 173671 (SE +/- 318.11, N = 3)
Caffe 2020-02-13 - Model: GoogleNet - Acceleration: CPU - Iterations: 1000 (Milli-Seconds, Fewer Is Better): ML Tests 868758 (SE +/- 470.76, N = 3)
1. (CXX) g++ options (all Caffe results): -fPIC -O3 -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas
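The Caffe figures grow with the iteration count, so a per-iteration time makes the six runs easier to compare. The sketch below assumes each reported value is the total time in milliseconds for the whole run, which the export does not state explicitly; `caffe_total_ms` is a hypothetical name, and the numbers are the ML Tests results listed above.

```python
# Reported time in Milli-Seconds for each (model, iterations) run on ML Tests.
# Assumption: the value covers the whole run, not a single iteration.
caffe_total_ms = {
    ("AlexNet", 100): 33496,
    ("AlexNet", 200): 65986,
    ("AlexNet", 1000): 325884,
    ("GoogleNet", 100): 86567,
    ("GoogleNet", 200): 173671,
    ("GoogleNet", 1000): 868758,
}

# Divide each total by its iteration count to get ms per iteration.
per_iteration_ms = {
    (model, iters): total / iters
    for (model, iters), total in caffe_total_ms.items()
}

for (model, iters), ms in per_iteration_ms.items():
    print(f"{model:9s} {iters:5d} iterations: {ms:.2f} ms per iteration")
```

Under that assumption, AlexNet stays near 326 to 335 ms per iteration and GoogleNet near 866 to 869 ms per iteration across all three iteration counts.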
NCNN 20210720 - Target: CPU - Model: mobilenet (ms, Fewer Is Better): ML Tests 15.95 (SE +/- 0.09, N = 3; MIN: 14.92 / MAX: 35.66); MBP M1 Max Machine Learning 20.32 (SE +/- 0.02, N = 3; MIN: 20.23 / MAX: 21.33)
NCNN 20210720 - Target: CPU-v2-v2 - Model: mobilenet-v2 (ms, Fewer Is Better): ML Tests 3.99 (SE +/- 0.02, N = 3; MIN: 3.71 / MAX: 19.11); MBP M1 Max Machine Learning 5.33 (SE +/- 0.03, N = 3; MIN: 5.27 / MAX: 5.61)
NCNN 20210720 - Target: CPU-v3-v3 - Model: mobilenet-v3 (ms, Fewer Is Better): ML Tests 3.41 (SE +/- 0.02, N = 3; MIN: 3.11 / MAX: 17.47); MBP M1 Max Machine Learning 4.36 (SE +/- 0.03, N = 3; MIN: 4.32 / MAX: 4.61)
NCNN 20210720 - Target: CPU - Model: shufflenet-v2 (ms, Fewer Is Better): ML Tests 2.75 (SE +/- 0.04, N = 3; MIN: 2.48 / MAX: 16.13); MBP M1 Max Machine Learning 3.47 (SE +/- 0.02, N = 3; MIN: 3.43 / MAX: 3.84)
NCNN 20210720 - Target: CPU - Model: mnasnet (ms, Fewer Is Better): ML Tests 3.25 (SE +/- 0.03, N = 3; MIN: 2.82 / MAX: 16.82); MBP M1 Max Machine Learning 5.40 (SE +/- 0.03, N = 3; MIN: 5.35 / MAX: 5.68)
NCNN 20210720 - Target: CPU - Model: efficientnet-b0 (ms, Fewer Is Better): ML Tests 5.22 (SE +/- 0.01, N = 3; MIN: 4.86 / MAX: 20.63); MBP M1 Max Machine Learning 8.69 (SE +/- 0.04, N = 3; MIN: 8.59 / MAX: 9.15)
NCNN 20210720 - Target: CPU - Model: blazeface (ms, Fewer Is Better): ML Tests 1.20 (SE +/- 0.01, N = 3; MIN: 1.16 / MAX: 1.78); MBP M1 Max Machine Learning 1.65 (SE +/- 0.01, N = 3; MIN: 1.64 / MAX: 1.72)
NCNN 20210720 - Target: CPU - Model: googlenet (ms, Fewer Is Better): ML Tests 13.74 (SE +/- 0.28, N = 3; MIN: 12.47 / MAX: 28.56); MBP M1 Max Machine Learning 24.96 (SE +/- 0.07, N = 3; MIN: 24.82 / MAX: 25.91)
NCNN 20210720 - Target: CPU - Model: vgg16 (ms, Fewer Is Better): ML Tests 71.97 (SE +/- 0.16, N = 3; MIN: 69.95 / MAX: 94.76); MBP M1 Max Machine Learning 71.01 (SE +/- 0.15, N = 3; MIN: 70.58 / MAX: 74.44)
NCNN 20210720 - Target: CPU - Model: resnet18 (ms, Fewer Is Better): ML Tests 15.78 (SE +/- 0.43, N = 3; MIN: 14.59 / MAX: 30.81); MBP M1 Max Machine Learning 16.82 (SE +/- 0.04, N = 3; MIN: 16.69 / MAX: 17.58)
NCNN 20210720 - Target: CPU - Model: alexnet (ms, Fewer Is Better): ML Tests 14.55 (SE +/- 0.06, N = 3; MIN: 13.9 / MAX: 33.49); MBP M1 Max Machine Learning 29.93 (SE +/- 0.05, N = 3; MIN: 29.79 / MAX: 31.03)
NCNN 20210720 - Target: CPU - Model: resnet50 (ms, Fewer Is Better): ML Tests 25.17 (SE +/- 0.06, N = 3; MIN: 23.91 / MAX: 41.27); MBP M1 Max Machine Learning 43.16 (SE +/- 0.07, N = 3; MIN: 42.92 / MAX: 44.81)
NCNN 20210720 - Target: CPU - Model: yolov4-tiny (ms, Fewer Is Better): ML Tests 24.97 (SE +/- 0.12, N = 3; MIN: 23.9 / MAX: 38.98); MBP M1 Max Machine Learning 30.24 (SE +/- 0.03, N = 3; MIN: 29.85 / MAX: 31.87)
NCNN 20210720 - Target: CPU - Model: squeezenet_ssd (ms, Fewer Is Better): ML Tests 18.56 (SE +/- 0.15, N = 3; MIN: 17.64 / MAX: 34.93); MBP M1 Max Machine Learning 20.53 (SE +/- 0.05, N = 3; MIN: 20.37 / MAX: 21.53)
NCNN 20210720 - Target: CPU - Model: regnety_400m (ms, Fewer Is Better): ML Tests 6.90 (SE +/- 0.04, N = 3; MIN: 6.35 / MAX: 21.53); MBP M1 Max Machine Learning 7.18 (SE +/- 0.00, N = 3; MIN: 7.14 / MAX: 8.13)
Flags listed with each NCNN CPU result: ML Tests -rdynamic -lgomp -lpthread; MBP M1 Max Machine Learning -arch -isysroot
1. (CXX) g++ options (all NCNN CPU results): -O3
NCNN 20210720 - Target: Vulkan GPU - Model: mobilenet (ms, Fewer Is Better): ML Tests 10.27 (SE +/- 0.09, N = 3; MIN: 9.59 / MAX: 17.84); MBP M1 Max Machine Learning 20.30 (SE +/- 0.02, N = 3; MIN: 20.23 / MAX: 21.48)
NCNN 20210720 - Target: Vulkan GPU-v2-v2 - Model: mobilenet-v2 (ms, Fewer Is Better): ML Tests 3.88 (SE +/- 0.06, N = 3; MIN: 3.49 / MAX: 5.25); MBP M1 Max Machine Learning 5.30 (SE +/- 0.01, N = 3; MIN: 5.28 / MAX: 5.98)
NCNN 20210720 - Target: Vulkan GPU-v3-v3 - Model: mobilenet-v3 (ms, Fewer Is Better): ML Tests 4.69 (SE +/- 0.12, N = 3; MIN: 4.29 / MAX: 5.94); MBP M1 Max Machine Learning 4.35 (SE +/- 0.00, N = 3; MIN: 4.32 / MAX: 4.63)
NCNN 20210720 - Target: Vulkan GPU - Model: shufflenet-v2 (ms, Fewer Is Better): ML Tests 3.02 (SE +/- 0.07, N = 3; MIN: 2.54 / MAX: 4.38); MBP M1 Max Machine Learning 3.46 (SE +/- 0.01, N = 3; MIN: 3.44 / MAX: 3.82)
NCNN 20210720 - Target: Vulkan GPU - Model: mnasnet (ms, Fewer Is Better): ML Tests 3.89 (SE +/- 0.11, N = 2; MIN: 3.56 / MAX: 5.01); MBP M1 Max Machine Learning 5.37 (SE +/- 0.00, N = 3; MIN: 5.35 / MAX: 5.62)
NCNN 20210720 - Target: Vulkan GPU - Model: efficientnet-b0 (ms, Fewer Is Better): ML Tests 10.07 (SE +/- 0.06, N = 3; MIN: 9.06 / MAX: 11.43); MBP M1 Max Machine Learning 8.71 (SE +/- 0.02, N = 3; MIN: 8.6 / MAX: 9.43)
NCNN 20210720 - Target: Vulkan GPU - Model: blazeface (ms, Fewer Is Better): ML Tests 1.35 (SE +/- 0.01, N = 3; MIN: 1.17 / MAX: 2.43); MBP M1 Max Machine Learning 1.64 (SE +/- 0.00, N = 3; MIN: 1.63 / MAX: 1.79)
NCNN 20210720 - Target: Vulkan GPU - Model: googlenet (ms, Fewer Is Better): ML Tests 8.73 (SE +/- 0.25, N = 3; MIN: 7.89 / MAX: 10.64); MBP M1 Max Machine Learning 24.90 (SE +/- 0.00, N = 3; MIN: 24.82 / MAX: 25.79)
NCNN 20210720 - Target: Vulkan GPU - Model: vgg16 (ms, Fewer Is Better): ML Tests 43.99 (SE +/- 0.07, N = 3; MIN: 43.17 / MAX: 45.59); MBP M1 Max Machine Learning 70.89 (SE +/- 0.02, N = 3; MIN: 70.59 / MAX: 73.62)
NCNN 20210720 - Target: Vulkan GPU - Model: resnet18 (ms, Fewer Is Better): ML Tests 6.09 (SE +/- 0.08, N = 3; MIN: 5.63 / MAX: 7.52); MBP M1 Max Machine Learning 16.80 (SE +/- 0.01, N = 3; MIN: 16.69 / MAX: 18.25)
NCNN 20210720 - Target: Vulkan GPU - Model: alexnet (ms, Fewer Is Better): ML Tests 6.32 (SE +/- 0.03, N = 3; MIN: 5.95 / MAX: 7.49); MBP M1 Max Machine Learning 29.89 (SE +/- 0.00, N = 3; MIN: 29.79 / MAX: 31.07)
NCNN 20210720 - Target: Vulkan GPU - Model: resnet50 (ms, Fewer Is Better): ML Tests 13.12 (SE +/- 0.06, N = 3; MIN: 12.27 / MAX: 15.04); MBP M1 Max Machine Learning 43.08 (SE +/- 0.01, N = 3; MIN: 42.9 / MAX: 45.66)
NCNN 20210720 - Target: Vulkan GPU - Model: yolov4-tiny (ms, Fewer Is Better): ML Tests 18.82 (SE +/- 0.42, N = 3; MIN: 17.12 / MAX: 24.45); MBP M1 Max Machine Learning 30.33 (SE +/- 0.07, N = 3; MIN: 29.85 / MAX: 32.58)
NCNN 20210720 - Target: Vulkan GPU - Model: squeezenet_ssd (ms, Fewer Is Better): ML Tests 15.36 (SE +/- 0.36, N = 3; MIN: 14.17 / MAX: 22.53); MBP M1 Max Machine Learning 20.55 (SE +/- 0.05, N = 3; MIN: 20.39 / MAX: 22.13)
NCNN 20210720 - Target: Vulkan GPU - Model: regnety_400m (ms, Fewer Is Better): ML Tests 5.28 (SE +/- 0.06, N = 3; MIN: 4.68 / MAX: 6.44); MBP M1 Max Machine Learning 7.19 (SE +/- 0.00, N = 3; MIN: 7.15 / MAX: 7.72)
Flags listed with each NCNN Vulkan GPU result: ML Tests -rdynamic -lgomp -lpthread; MBP M1 Max Machine Learning -arch -isysroot
1. (CXX) g++ options (all NCNN Vulkan GPU results): -O3
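To see how much the Vulkan GPU target changes the NCNN latency on each system, the sketch below divides the CPU latency by the Vulkan GPU latency for a four-model subset of the results above. The name `ncnn_ms` and the choice of models are illustrative assumptions; the numbers are copied from the NCNN 20210720 results.

```python
# (CPU ms, Vulkan GPU ms) per model for each system; a subset of the
# NCNN results. `ncnn_ms` is a hypothetical name for this illustration.
ncnn_ms = {
    "ML Tests": {
        "googlenet": (13.74, 8.73),
        "vgg16": (71.97, 43.99),
        "alexnet": (14.55, 6.32),
        "resnet50": (25.17, 13.12),
    },
    "MBP M1 Max Machine Learning": {
        "googlenet": (24.96, 24.90),
        "vgg16": (71.01, 70.89),
        "alexnet": (29.93, 29.89),
        "resnet50": (43.16, 43.08),
    },
}

# A ratio above 1 means the Vulkan GPU target was faster than the CPU target.
speedup = {
    system: {model: cpu / gpu for model, (cpu, gpu) in models.items()}
    for system, models in ncnn_ms.items()
}

for system, models in speedup.items():
    for model, ratio in models.items():
        print(f"{system:28s} {model:10s} CPU/Vulkan = {ratio:.2f}")
```

For this subset the ML Tests ratios fall between roughly 1.6 and 2.3, while the MBP M1 Max Machine Learning ratios stay within 1% of 1.0. The export does not say why the two M1 Max targets match so closely, so no cause is inferred here.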
Mlpack Benchmark - Benchmark: scikit_ica (Seconds, Fewer Is Better): ML Tests 48.40 (SE +/- 0.12, N = 3)
Mlpack Benchmark - Benchmark: scikit_qda (Seconds, Fewer Is Better): ML Tests 65.69 (SE +/- 0.03, N = 3)
Mlpack Benchmark - Benchmark: scikit_svm (Seconds, Fewer Is Better): ML Tests 17.60 (SE +/- 0.02, N = 3)
Mlpack Benchmark - Benchmark: scikit_linearridgeregression (Seconds, Fewer Is Better): ML Tests 2.10 (SE +/- 0.01, N = 3)
oneDNN 2.1.2 - Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU (ms, Fewer Is Better): ML Tests 4.25855 (SE +/- 0.03780, N = 7; MIN: 3.88)
oneDNN 2.1.2 - Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU (ms, Fewer Is Better): ML Tests 12.09 (SE +/- 0.02, N = 3; MIN: 11.92)
oneDNN 2.1.2 - Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better): ML Tests 1.62985 (SE +/- 0.00920, N = 3; MIN: 1.49)
oneDNN 2.1.2 - Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better): ML Tests 2.69210 (SE +/- 0.00222, N = 3; MIN: 2.57)
oneDNN 2.1.2 - Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU (ms, Fewer Is Better): ML Tests 22.79 (SE +/- 0.03, N = 3; MIN: 21.94)
oneDNN 2.1.2 - Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU (ms, Fewer Is Better): ML Tests 8.34789 (SE +/- 0.02843, N = 3; MIN: 4.75)
oneDNN 2.1.2 - Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU (ms, Fewer Is Better): ML Tests 6.74559 (SE +/- 0.01002, N = 3; MIN: 6.52)
oneDNN 2.1.2 - Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better): ML Tests 23.77 (SE +/- 0.02, N = 3; MIN: 22.9)
oneDNN 2.1.2 - Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better): ML Tests 2.11771 (SE +/- 0.00458, N = 3; MIN: 1.91)
oneDNN 2.1.2 - Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better): ML Tests 3.24784 (SE +/- 0.02694, N = 3; MIN: 2.76)
oneDNN 2.1.2 - Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU (ms, Fewer Is Better): ML Tests 3579.00 (SE +/- 7.39, N = 3; MIN: 3519.93)
oneDNN 2.1.2 - Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU (ms, Fewer Is Better): ML Tests 2219.13 (SE +/- 1.17, N = 3; MIN: 2182.15)
oneDNN 2.1.2 - Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better): ML Tests 3587.17 (SE +/- 4.16, N = 3; MIN: 3527.15)
oneDNN 2.1.2 - Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better): ML Tests 2228.17 (SE +/- 6.34, N = 3; MIN: 2189.62)
oneDNN 2.1.2 - Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU (ms, Fewer Is Better): ML Tests 4.59343 (SE +/- 0.00541, N = 3; MIN: 4.39)
oneDNN 2.1.2 - Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better): ML Tests 3577.00 (SE +/- 4.93, N = 3; MIN: 3514.72)
oneDNN 2.1.2 - Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better): ML Tests 2237.65 (SE +/- 14.65, N = 14; MIN: 2174.76)
oneDNN 2.1.2 - Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better): ML Tests 2.98659 (SE +/- 0.01068, N = 3; MIN: 2.72)
1. (CXX) g++ options (all oneDNN results): -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
Phoronix Test Suite v10.8.5