MBP M1 Max Machine Learning, sys76-kudu-ML
MBP M1 Max Machine Learning: Apple M1 Max testing with an Apple MacBook Pro and Apple M1 Max on macOS 12.1 via the Phoronix Test Suite.
sys76-kudu-ML: AMD Ryzen 9 5900HX testing with a System76 Kudu (1.07.09RSA1 BIOS) and AMD Cezanne on Pop 21.10 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2202161-NE-MBPM1MAXM40,2202165-NE-SYS76KUDU88&grr&sro
Component | MBP M1 Max Machine Learning | ML Tests (sys76-kudu-ML)
Processor | Apple M1 Max (10 Cores) | AMD Ryzen 9 5900HX @ 3.30GHz (8 Cores / 16 Threads)
Motherboard | Apple MacBook Pro | System76 Kudu (1.07.09RSA1 BIOS)
Chipset | - | AMD Renoir/Cezanne
Memory | 64GB | 16GB
Disk | 1859GB | Samsung SSD 970 EVO Plus 500GB
Graphics | Apple M1 Max | AMD Cezanne (2100/400MHz)
Monitor | Color LCD | -
Audio | - | AMD Renoir Radeon HD Audio
Network | - | Realtek RTL8125 2.5GbE + Intel Wi-Fi 6 AX200
OS | macOS 12.1 | Pop 21.10
Kernel | 21.2.0 (arm64) | 5.15.15-76051515-generic (x86_64)
Desktop | - | GNOME Shell 40.5
Display Server | - | X Server 1.20.13
OpenGL | - | 4.6 Mesa 21.2.2 (LLVM 12.0.1)
OpenCL | OpenCL 1.2 (Nov 13 2021 00:45:09) | -
Vulkan | - | 1.2.182
Compiler | GCC 13.0.0 + Clang 13.0.0 | GCC 11.2.0
File-System | APFS | ext4
Screen Resolution | 3456x2234 | 1920x1080

Environment Details - MBP M1 Max Machine Learning: XPC_FLAGS=0x0
Python Details - MBP M1 Max Machine Learning: Python 2.7.18 + Python 3.8.9; ML Tests: Python 3.9.7
Kernel Details - ML Tests: Transparent Huge Pages: madvise
Compiler Details - ML Tests: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-11-ZPT0kp/gcc-11-11.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-ZPT0kp/gcc-11-11.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Details - ML Tests: Scaling Governor: acpi-cpufreq schedutil (Boost: Enabled); CPU Microcode: 0xa50000c
Graphics Details - ML Tests: GLAMOR; BAR1 / Visible vRAM Size: 512 MB
Security Details - ML Tests: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Full AMD retpoline IBPB: conditional IBRS_FW STIBP: always-on RSB filling + srbds: Not affected + tsx_async_abort: Not affected
Result overview (- = no result reported)

Test | MBP M1 Max Machine Learning | ML Tests
caffe: GoogleNet - CPU - 1000 | - | 868758
lczero: BLAS | - | 563
ecp-candle: P3B1 | - | 1463.722
mnn: inception-v3 | 58.253 | 31.576
mnn: mobilenet-v1-1.0 | 8.205 | 2.440
mnn: MobileNetV2_224 | 10.677 | 2.387
mnn: SqueezeNetV1.0 | 9.967 | 4.546
mnn: resnet-v2-50 | 42.428 | 22.441
mnn: squeezenetv1.1 | 7.274 | 2.803
mnn: mobilenetV3 | 9.152 | 1.202
caffe: AlexNet - CPU - 1000 | - | 325884
plaidml: No - Inference - ResNet 50 - CPU | - | 6.88
ecp-candle: P3B2 | - | 730.736
plaidml: No - Inference - VGG16 - CPU | - | 12.47
tnn: CPU - DenseNet | - | 2736.173
onednn: Recurrent Neural Network Inference - bf16bf16bf16 - CPU | - | 2237.65
caffe: GoogleNet - CPU - 200 | - | 173671
tensorflow-lite: Inception V4 | - | 2749623
tensorflow-lite: Inception ResNet V2 | - | 2479080
mlpack: scikit_qda | - | 65.69
numpy | - | 422.45
ncnn: CPU - regnety_400m | 7.18 | 6.90
ncnn: CPU - squeezenet_ssd | 20.53 | 18.56
ncnn: CPU - yolov4-tiny | 30.24 | 24.97
ncnn: CPU - resnet50 | 43.16 | 25.17
ncnn: CPU - alexnet | 29.93 | 14.55
ncnn: CPU - resnet18 | 16.82 | 15.78
ncnn: CPU - vgg16 | 71.01 | 71.97
ncnn: CPU - googlenet | 24.96 | 13.74
ncnn: CPU - blazeface | 1.65 | 1.20
ncnn: CPU - efficientnet-b0 | 8.69 | 5.22
ncnn: CPU - mnasnet | 5.40 | 3.25
ncnn: CPU - shufflenet-v2 | 3.47 | 2.75
ncnn: CPU-v3-v3 - mobilenet-v3 | 4.36 | 3.41
ncnn: CPU-v2-v2 - mobilenet-v2 | 5.33 | 3.99
ncnn: CPU - mobilenet | 20.32 | 15.95
ncnn: Vulkan GPU - regnety_400m | 7.19 | 5.28
ncnn: Vulkan GPU - squeezenet_ssd | 20.55 | 15.36
ncnn: Vulkan GPU - yolov4-tiny | 30.33 | 18.82
ncnn: Vulkan GPU - resnet50 | 43.08 | 13.12
ncnn: Vulkan GPU - alexnet | 29.89 | 6.32
ncnn: Vulkan GPU - resnet18 | 16.80 | 6.09
ncnn: Vulkan GPU - vgg16 | 70.89 | 43.99
ncnn: Vulkan GPU - googlenet | 24.9 | 8.73
ncnn: Vulkan GPU - blazeface | 1.64 | 1.35
ncnn: Vulkan GPU - efficientnet-b0 | 8.71 | 10.07
ncnn: Vulkan GPU - mnasnet | 5.37 | 3.89
ncnn: Vulkan GPU - shufflenet-v2 | 3.46 | 3.02
ncnn: Vulkan GPU-v3-v3 - mobilenet-v3 | 4.35 | 4.69
ncnn: Vulkan GPU-v2-v2 - mobilenet-v2 | 5.30 | 3.88
ncnn: Vulkan GPU - mobilenet | 20.30 | 10.27
caffe: GoogleNet - CPU - 100 | - | 86567
opencv: DNN - Deep Neural Network | - | 13787
caffe: AlexNet - CPU - 200 | - | 65986
tensorflow-lite: SqueezeNet | - | 189764
tensorflow-lite: NASNet Mobile | - | 152186
tensorflow-lite: Mobilenet Quant | - | 141174
tensorflow-lite: Mobilenet Float | - | 127818
mlpack: scikit_ica | - | 48.40
onednn: Recurrent Neural Network Training - bf16bf16bf16 - CPU | - | 3577.00
onednn: Recurrent Neural Network Training - u8s8f32 - CPU | - | 3587.17
onednn: Recurrent Neural Network Training - f32 - CPU | - | 3579.00
mlpack: scikit_linearridgeregression | - | 2.10
onednn: Recurrent Neural Network Inference - u8s8f32 - CPU | - | 2228.17
onednn: Recurrent Neural Network Inference - f32 - CPU | - | 2219.13
caffe: AlexNet - CPU - 100 | - | 33496
deepspeech: CPU | - | 74.44043
mlpack: scikit_svm | - | 17.60
rbenchmark | - | 0.1293
onednn: IP Shapes 1D - f32 - CPU | - | 4.25855
tnn: CPU - MobileNet v2 | - | 249.477
rnnoise | - | 16.137
tnn: CPU - SqueezeNet v1.1 | - | 222.326
ecp-candle: P1B2 | - | 37.51
onednn: Deconvolution Batch shapes_1d - f32 - CPU | - | 8.34789
onednn: Deconvolution Batch shapes_1d - u8s8f32 - CPU | - | 2.11771
onednn: IP Shapes 1D - u8s8f32 - CPU | - | 1.62985
onednn: Matrix Multiply Batch Shapes Transformer - f32 - CPU | - | 4.59343
onednn: Matrix Multiply Batch Shapes Transformer - u8s8f32 - CPU | - | 2.98659
onednn: IP Shapes 3D - f32 - CPU | - | 12.0926
onednn: IP Shapes 3D - u8s8f32 - CPU | - | 2.69210
tnn: CPU - SqueezeNet v2 | - | 55.434
onednn: Convolution Batch Shapes Auto - u8s8f32 - CPU | - | 23.7674
onednn: Convolution Batch Shapes Auto - f32 - CPU | - | 22.7926
onednn: Deconvolution Batch shapes_3d - f32 - CPU | - | 6.74559
onednn: Deconvolution Batch shapes_3d - u8s8f32 - CPU | - | 3.24784
openvino: Face Detection 0106 FP16 - Intel GPU | - | -
Caffe 2020-02-13, Model: GoogleNet - Acceleration: CPU - Iterations: 1000 (Milli-Seconds, Fewer Is Better)
ML Tests: 868758 (SE +/- 470.76, N = 3)
1. (CXX) g++ options: -fPIC -O3 -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas
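Every result is reported as a mean plus "SE +/- x, N = n". The sketch below shows how such a figure is usually computed and how to put it in proportion. It assumes the conventional standard error of the mean, SE = s / sqrt(N) with s the sample standard deviation; the export does not state the exact formula the Phoronix Test Suite uses. The per-run values are hypothetical, because the export publishes only the mean, SE and N. The (SE, mean) pairs are taken from this export: 470.76 on 868758 ms is roughly 0.05% of the mean, while the MBP's MNN inception-v3 result (6.12 on 58.25 ms) is roughly 10.5%.

```python
import math
import statistics
from typing import Sequence


def standard_error(runs: Sequence[float]) -> float:
    """Standard error of the mean: sample standard deviation divided by sqrt(N)."""
    if len(runs) < 2:
        raise ValueError("need at least two runs to estimate the spread")
    return statistics.stdev(runs) / math.sqrt(len(runs))


def relative_se(se: float, mean: float) -> float:
    """Standard error as a fraction of the mean, so results in different units compare."""
    return se / mean


# Hypothetical per-run totals in ms; the export lists only mean, SE and N.
hypothetical_runs = [868300.0, 868758.0, 869216.0]
print(f"SE of hypothetical runs: {standard_error(hypothetical_runs):.2f} ms")

# Reported (SE, mean) pairs copied from this export.
print(f"Caffe GoogleNet, 1000 iterations (ML Tests): {relative_se(470.76, 868758):.4%}")
print(f"MNN inception-v3 (MBP M1 Max Machine Learning): {relative_se(6.12, 58.25):.2%}")
```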
LeelaChessZero 0.28, Backend: BLAS (Nodes Per Second, More Is Better)
ML Tests: 563 (SE +/- 5.14, N = 7)
1. (CXX) g++ options: -flto -pthread

ECP-CANDLE 0.4, Benchmark: P3B1 (Seconds, Fewer Is Better)
ML Tests: 1463.72

Mobile Neural Network 1.2, Model: inception-v3 (ms, Fewer Is Better)
MBP M1 Max Machine Learning: 58.25 (SE +/- 6.12, N = 9; MIN: 30.46 / MAX: 200.21; additional options: -arch -isysroot)
ML Tests: 31.58 (SE +/- 0.42, N = 3; MIN: 29.6 / MAX: 48.32; additional options: -fomit-frame-pointer -rdynamic -pthread -ldl)
1. (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions

Mobile Neural Network 1.2, Model: mobilenet-v1-1.0 (ms, Fewer Is Better)
MBP M1 Max Machine Learning: 8.205 (SE +/- 0.384, N = 9; MIN: 4.27 / MAX: 48.5; additional options: -arch -isysroot)
ML Tests: 2.440 (SE +/- 0.019, N = 3; MIN: 2.17 / MAX: 18; additional options: -fomit-frame-pointer -rdynamic -pthread -ldl)
1. (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions

Mobile Neural Network 1.2, Model: MobileNetV2_224 (ms, Fewer Is Better)
MBP M1 Max Machine Learning: 10.677 (SE +/- 0.187, N = 9; MIN: 5.12 / MAX: 61.59; additional options: -arch -isysroot)
ML Tests: 2.387 (SE +/- 0.018, N = 3; MIN: 2.24 / MAX: 17.04; additional options: -fomit-frame-pointer -rdynamic -pthread -ldl)
1. (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions

Mobile Neural Network 1.2, Model: SqueezeNetV1.0 (ms, Fewer Is Better)
MBP M1 Max Machine Learning: 9.967 (SE +/- 0.664, N = 9; MIN: 4.34 / MAX: 49.52; additional options: -arch -isysroot)
ML Tests: 4.546 (SE +/- 0.040, N = 3; MIN: 4.32 / MAX: 20.48; additional options: -fomit-frame-pointer -rdynamic -pthread -ldl)
1. (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions

Mobile Neural Network 1.2, Model: resnet-v2-50 (ms, Fewer Is Better)
MBP M1 Max Machine Learning: 42.43 (SE +/- 4.17, N = 9; MIN: 24 / MAX: 197.77; additional options: -arch -isysroot)
ML Tests: 22.44 (SE +/- 0.09, N = 3; MIN: 21.5 / MAX: 43.07; additional options: -fomit-frame-pointer -rdynamic -pthread -ldl)
1. (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions

Mobile Neural Network 1.2, Model: squeezenetv1.1 (ms, Fewer Is Better)
MBP M1 Max Machine Learning: 7.274 (SE +/- 0.345, N = 9; MIN: 2.75 / MAX: 117.92; additional options: -arch -isysroot)
ML Tests: 2.803 (SE +/- 0.009, N = 3; MIN: 2.6 / MAX: 17.21; additional options: -fomit-frame-pointer -rdynamic -pthread -ldl)
1. (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions

Mobile Neural Network 1.2, Model: mobilenetV3 (ms, Fewer Is Better)
MBP M1 Max Machine Learning: 9.152 (SE +/- 0.487, N = 9; MIN: 3.37 / MAX: 58.79; additional options: -arch -isysroot)
ML Tests: 1.202 (SE +/- 0.005, N = 3)
1. (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions
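Only the MNN and NCNN tests ran on both systems, and all of them are "Fewer Is Better", so a time ratio is a natural way to compare the MBP with ML Tests (the Ryzen 9 5900HX system). The sketch below divides the MBP time by the ML Tests time for each MNN model and summarizes the ratios with a geometric mean. The geometric-mean summary is our own choice and is not something the export computes. The ratio is above 1 for every MNN model, from about 1.8x (inception-v3) to about 7.6x (mobilenetV3). Because the MBP runs show N = 9 and much larger SE and MAX values, treat these ratios as indicative only.

```python
from statistics import geometric_mean

# MNN 1.2 mean times in ms as (MBP M1 Max Machine Learning, ML Tests),
# copied from the result overview (Fewer Is Better).
MNN_MS = {
    "inception-v3": (58.253, 31.576),
    "mobilenet-v1-1.0": (8.205, 2.440),
    "MobileNetV2_224": (10.677, 2.387),
    "SqueezeNetV1.0": (9.967, 4.546),
    "resnet-v2-50": (42.428, 22.441),
    "squeezenetv1.1": (7.274, 2.803),
    "mobilenetV3": (9.152, 1.202),
}


def time_ratio(mbp_ms: float, ml_tests_ms: float) -> float:
    """MBP time divided by ML Tests time; a value above 1 means ML Tests was faster."""
    return mbp_ms / ml_tests_ms


ratios = {model: time_ratio(*times) for model, times in MNN_MS.items()}
for model, ratio in ratios.items():
    print(f"{model:18s} {ratio:5.2f}x")
print(f"{'geometric mean':18s} {geometric_mean(ratios.values()):5.2f}x")
```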
Caffe 2020-02-13, Model: AlexNet - Acceleration: CPU - Iterations: 1000 (Milli-Seconds, Fewer Is Better)
ML Tests: 325884 (SE +/- 469.87, N = 3)
1. (CXX) g++ options: -fPIC -O3 -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas

PlaidML, FP16: No - Mode: Inference - Network: ResNet 50 - Device: CPU (FPS, More Is Better)
ML Tests: 6.88 (SE +/- 0.02, N = 3)

ECP-CANDLE 0.4, Benchmark: P3B2 (Seconds, Fewer Is Better)
ML Tests: 730.74

PlaidML, FP16: No - Mode: Inference - Network: VGG16 - Device: CPU (FPS, More Is Better)
ML Tests: 12.47 (SE +/- 0.07, N = 3)

TNN 0.3, Target: CPU - Model: DenseNet (ms, Fewer Is Better)
ML Tests: 2736.17 (SE +/- 0.83, N = 3; MIN: 2687.97 / MAX: 2827.52)
1. (CXX) g++ options: -fopenmp -pthread -fvisibility=hidden -fvisibility=default -O3 -rdynamic -ldl

oneDNN 2.1.2, Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
ML Tests: 2237.65 (SE +/- 14.65, N = 14; MIN: 2174.76)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

Caffe 2020-02-13, Model: GoogleNet - Acceleration: CPU - Iterations: 200 (Milli-Seconds, Fewer Is Better)
ML Tests: 173671 (SE +/- 318.11, N = 3)
1. (CXX) g++ options: -fPIC -O3 -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas

TensorFlow Lite 2020-08-23, Model: Inception V4 (Microseconds, Fewer Is Better)
ML Tests: 2749623 (SE +/- 1719.91, N = 3)

TensorFlow Lite 2020-08-23, Model: Inception ResNet V2 (Microseconds, Fewer Is Better)
ML Tests: 2479080 (SE +/- 1189.89, N = 3)

Mlpack Benchmark, Benchmark: scikit_qda (Seconds, Fewer Is Better)
ML Tests: 65.69 (SE +/- 0.03, N = 3)

Numpy Benchmark (Score, More Is Better)
ML Tests: 422.45 (SE +/- 0.84, N = 3)

NCNN 20210720, Target: CPU - Model: regnety_400m (ms, Fewer Is Better)
MBP M1 Max Machine Learning: 7.18 (SE +/- 0.00, N = 3; MIN: 7.14 / MAX: 8.13; additional options: -arch -isysroot)
ML Tests: 6.90 (SE +/- 0.04, N = 3; MIN: 6.35 / MAX: 21.53; additional options: -rdynamic -lgomp -lpthread)
1. (CXX) g++ options: -O3

NCNN 20210720, Target: CPU - Model: squeezenet_ssd (ms, Fewer Is Better)
MBP M1 Max Machine Learning: 20.53 (SE +/- 0.05, N = 3; MIN: 20.37 / MAX: 21.53; additional options: -arch -isysroot)
ML Tests: 18.56 (SE +/- 0.15, N = 3; MIN: 17.64 / MAX: 34.93; additional options: -rdynamic -lgomp -lpthread)
1. (CXX) g++ options: -O3

NCNN 20210720, Target: CPU - Model: yolov4-tiny (ms, Fewer Is Better)
MBP M1 Max Machine Learning: 30.24 (SE +/- 0.03, N = 3; MIN: 29.85 / MAX: 31.87; additional options: -arch -isysroot)
ML Tests: 24.97 (SE +/- 0.12, N = 3; MIN: 23.9 / MAX: 38.98; additional options: -rdynamic -lgomp -lpthread)
1. (CXX) g++ options: -O3

NCNN 20210720, Target: CPU - Model: resnet50 (ms, Fewer Is Better)
MBP M1 Max Machine Learning: 43.16 (SE +/- 0.07, N = 3; MIN: 42.92 / MAX: 44.81; additional options: -arch -isysroot)
ML Tests: 25.17 (SE +/- 0.06, N = 3; MIN: 23.91 / MAX: 41.27; additional options: -rdynamic -lgomp -lpthread)
1. (CXX) g++ options: -O3

NCNN 20210720, Target: CPU - Model: alexnet (ms, Fewer Is Better)
MBP M1 Max Machine Learning: 29.93 (SE +/- 0.05, N = 3; MIN: 29.79 / MAX: 31.03; additional options: -arch -isysroot)
ML Tests: 14.55 (SE +/- 0.06, N = 3; MIN: 13.9 / MAX: 33.49; additional options: -rdynamic -lgomp -lpthread)
1. (CXX) g++ options: -O3

NCNN 20210720, Target: CPU - Model: resnet18 (ms, Fewer Is Better)
MBP M1 Max Machine Learning: 16.82 (SE +/- 0.04, N = 3; MIN: 16.69 / MAX: 17.58; additional options: -arch -isysroot)
ML Tests: 15.78 (SE +/- 0.43, N = 3; MIN: 14.59 / MAX: 30.81; additional options: -rdynamic -lgomp -lpthread)
1. (CXX) g++ options: -O3

NCNN 20210720, Target: CPU - Model: vgg16 (ms, Fewer Is Better)
MBP M1 Max Machine Learning: 71.01 (SE +/- 0.15, N = 3; MIN: 70.58 / MAX: 74.44; additional options: -arch -isysroot)
ML Tests: 71.97 (SE +/- 0.16, N = 3; MIN: 69.95 / MAX: 94.76; additional options: -rdynamic -lgomp -lpthread)
1. (CXX) g++ options: -O3

NCNN 20210720, Target: CPU - Model: googlenet (ms, Fewer Is Better)
MBP M1 Max Machine Learning: 24.96 (SE +/- 0.07, N = 3; MIN: 24.82 / MAX: 25.91; additional options: -arch -isysroot)
ML Tests: 13.74 (SE +/- 0.28, N = 3; MIN: 12.47 / MAX: 28.56; additional options: -rdynamic -lgomp -lpthread)
1. (CXX) g++ options: -O3

NCNN 20210720, Target: CPU - Model: blazeface (ms, Fewer Is Better)
MBP M1 Max Machine Learning: 1.65 (SE +/- 0.01, N = 3; MIN: 1.64 / MAX: 1.72; additional options: -arch -isysroot)
ML Tests: 1.20 (SE +/- 0.01, N = 3; MIN: 1.16 / MAX: 1.78; additional options: -rdynamic -lgomp -lpthread)
1. (CXX) g++ options: -O3

NCNN 20210720, Target: CPU - Model: efficientnet-b0 (ms, Fewer Is Better)
MBP M1 Max Machine Learning: 8.69 (SE +/- 0.04, N = 3; MIN: 8.59 / MAX: 9.15; additional options: -arch -isysroot)
ML Tests: 5.22 (SE +/- 0.01, N = 3; MIN: 4.86 / MAX: 20.63; additional options: -rdynamic -lgomp -lpthread)
1. (CXX) g++ options: -O3

NCNN 20210720, Target: CPU - Model: mnasnet (ms, Fewer Is Better)
MBP M1 Max Machine Learning: 5.40 (SE +/- 0.03, N = 3; MIN: 5.35 / MAX: 5.68; additional options: -arch -isysroot)
ML Tests: 3.25 (SE +/- 0.03, N = 3; MIN: 2.82 / MAX: 16.82; additional options: -rdynamic -lgomp -lpthread)
1. (CXX) g++ options: -O3

NCNN 20210720, Target: CPU - Model: shufflenet-v2 (ms, Fewer Is Better)
MBP M1 Max Machine Learning: 3.47 (SE +/- 0.02, N = 3; MIN: 3.43 / MAX: 3.84; additional options: -arch -isysroot)
ML Tests: 2.75 (SE +/- 0.04, N = 3; MIN: 2.48 / MAX: 16.13; additional options: -rdynamic -lgomp -lpthread)
1. (CXX) g++ options: -O3

NCNN 20210720, Target: CPU-v3-v3 - Model: mobilenet-v3 (ms, Fewer Is Better)
MBP M1 Max Machine Learning: 4.36 (SE +/- 0.03, N = 3; MIN: 4.32 / MAX: 4.61; additional options: -arch -isysroot)
ML Tests: 3.41 (SE +/- 0.02, N = 3; MIN: 3.11 / MAX: 17.47; additional options: -rdynamic -lgomp -lpthread)
1. (CXX) g++ options: -O3

NCNN 20210720, Target: CPU-v2-v2 - Model: mobilenet-v2 (ms, Fewer Is Better)
MBP M1 Max Machine Learning: 5.33 (SE +/- 0.03, N = 3; MIN: 5.27 / MAX: 5.61; additional options: -arch -isysroot)
ML Tests: 3.99 (SE +/- 0.02, N = 3; MIN: 3.71 / MAX: 19.11; additional options: -rdynamic -lgomp -lpthread)
1. (CXX) g++ options: -O3

NCNN 20210720, Target: CPU - Model: mobilenet (ms, Fewer Is Better)
MBP M1 Max Machine Learning: 20.32 (SE +/- 0.02, N = 3; MIN: 20.23 / MAX: 21.33; additional options: -arch -isysroot)
ML Tests: 15.95 (SE +/- 0.09, N = 3; MIN: 14.92 / MAX: 35.66; additional options: -rdynamic -lgomp -lpthread)
1. (CXX) g++ options: -O3

NCNN 20210720, Target: Vulkan GPU - Model: regnety_400m (ms, Fewer Is Better)
MBP M1 Max Machine Learning: 7.19 (SE +/- 0.00, N = 3; MIN: 7.15 / MAX: 7.72; additional options: -arch -isysroot)
ML Tests: 5.28 (SE +/- 0.06, N = 3; MIN: 4.68 / MAX: 6.44; additional options: -rdynamic -lgomp -lpthread)
1. (CXX) g++ options: -O3

NCNN 20210720, Target: Vulkan GPU - Model: squeezenet_ssd (ms, Fewer Is Better)
MBP M1 Max Machine Learning: 20.55 (SE +/- 0.05, N = 3; MIN: 20.39 / MAX: 22.13; additional options: -arch -isysroot)
ML Tests: 15.36 (SE +/- 0.36, N = 3; MIN: 14.17 / MAX: 22.53; additional options: -rdynamic -lgomp -lpthread)
1. (CXX) g++ options: -O3

NCNN 20210720, Target: Vulkan GPU - Model: yolov4-tiny (ms, Fewer Is Better)
MBP M1 Max Machine Learning: 30.33 (SE +/- 0.07, N = 3; MIN: 29.85 / MAX: 32.58; additional options: -arch -isysroot)
ML Tests: 18.82 (SE +/- 0.42, N = 3; MIN: 17.12 / MAX: 24.45; additional options: -rdynamic -lgomp -lpthread)
1. (CXX) g++ options: -O3

NCNN 20210720, Target: Vulkan GPU - Model: resnet50 (ms, Fewer Is Better)
MBP M1 Max Machine Learning: 43.08 (SE +/- 0.01, N = 3; MIN: 42.9 / MAX: 45.66; additional options: -arch -isysroot)
ML Tests: 13.12 (SE +/- 0.06, N = 3; MIN: 12.27 / MAX: 15.04; additional options: -rdynamic -lgomp -lpthread)
1. (CXX) g++ options: -O3

NCNN 20210720, Target: Vulkan GPU - Model: alexnet (ms, Fewer Is Better)
MBP M1 Max Machine Learning: 29.89 (SE +/- 0.00, N = 3; MIN: 29.79 / MAX: 31.07; additional options: -arch -isysroot)
ML Tests: 6.32 (SE +/- 0.03, N = 3; MIN: 5.95 / MAX: 7.49; additional options: -rdynamic -lgomp -lpthread)
1. (CXX) g++ options: -O3

NCNN 20210720, Target: Vulkan GPU - Model: resnet18 (ms, Fewer Is Better)
MBP M1 Max Machine Learning: 16.80 (SE +/- 0.01, N = 3; MIN: 16.69 / MAX: 18.25; additional options: -arch -isysroot)
ML Tests: 6.09 (SE +/- 0.08, N = 3; MIN: 5.63 / MAX: 7.52; additional options: -rdynamic -lgomp -lpthread)
1. (CXX) g++ options: -O3

NCNN 20210720, Target: Vulkan GPU - Model: vgg16 (ms, Fewer Is Better)
MBP M1 Max Machine Learning: 70.89 (SE +/- 0.02, N = 3; MIN: 70.59 / MAX: 73.62; additional options: -arch -isysroot)
ML Tests: 43.99 (SE +/- 0.07, N = 3; MIN: 43.17 / MAX: 45.59; additional options: -rdynamic -lgomp -lpthread)
1. (CXX) g++ options: -O3

NCNN 20210720, Target: Vulkan GPU - Model: googlenet (ms, Fewer Is Better)
MBP M1 Max Machine Learning: 24.90 (SE +/- 0.00, N = 3; MIN: 24.82 / MAX: 25.79; additional options: -arch -isysroot)
ML Tests: 8.73 (SE +/- 0.25, N = 3; MIN: 7.89 / MAX: 10.64; additional options: -rdynamic -lgomp -lpthread)
1. (CXX) g++ options: -O3

NCNN 20210720, Target: Vulkan GPU - Model: blazeface (ms, Fewer Is Better)
MBP M1 Max Machine Learning: 1.64 (SE +/- 0.00, N = 3; MIN: 1.63 / MAX: 1.79; additional options: -arch -isysroot)
ML Tests: 1.35 (SE +/- 0.01, N = 3; MIN: 1.17 / MAX: 2.43; additional options: -rdynamic -lgomp -lpthread)
1. (CXX) g++ options: -O3

NCNN 20210720, Target: Vulkan GPU - Model: efficientnet-b0 (ms, Fewer Is Better)
MBP M1 Max Machine Learning: 8.71 (SE +/- 0.02, N = 3; MIN: 8.6 / MAX: 9.43; additional options: -arch -isysroot)
ML Tests: 10.07 (SE +/- 0.06, N = 3; MIN: 9.06 / MAX: 11.43; additional options: -rdynamic -lgomp -lpthread)
1. (CXX) g++ options: -O3

NCNN 20210720, Target: Vulkan GPU - Model: mnasnet (ms, Fewer Is Better)
MBP M1 Max Machine Learning: 5.37 (SE +/- 0.00, N = 3; MIN: 5.35 / MAX: 5.62; additional options: -arch -isysroot)
ML Tests: 3.89 (SE +/- 0.11, N = 2; MIN: 3.56 / MAX: 5.01; additional options: -rdynamic -lgomp -lpthread)
1. (CXX) g++ options: -O3

NCNN 20210720, Target: Vulkan GPU - Model: shufflenet-v2 (ms, Fewer Is Better)
MBP M1 Max Machine Learning: 3.46 (SE +/- 0.01, N = 3; MIN: 3.44 / MAX: 3.82; additional options: -arch -isysroot)
ML Tests: 3.02 (SE +/- 0.07, N = 3; MIN: 2.54 / MAX: 4.38; additional options: -rdynamic -lgomp -lpthread)
1. (CXX) g++ options: -O3

NCNN 20210720, Target: Vulkan GPU-v3-v3 - Model: mobilenet-v3 (ms, Fewer Is Better)
MBP M1 Max Machine Learning: 4.35 (SE +/- 0.00, N = 3; MIN: 4.32 / MAX: 4.63; additional options: -arch -isysroot)
ML Tests: 4.69 (SE +/- 0.12, N = 3; MIN: 4.29 / MAX: 5.94; additional options: -rdynamic -lgomp -lpthread)
1. (CXX) g++ options: -O3

NCNN 20210720, Target: Vulkan GPU-v2-v2 - Model: mobilenet-v2 (ms, Fewer Is Better)
MBP M1 Max Machine Learning: 5.30 (SE +/- 0.01, N = 3; MIN: 5.28 / MAX: 5.98; additional options: -arch -isysroot)
ML Tests: 3.88 (SE +/- 0.06, N = 3; MIN: 3.49 / MAX: 5.25; additional options: -rdynamic -lgomp -lpthread)
1. (CXX) g++ options: -O3

NCNN 20210720, Target: Vulkan GPU - Model: mobilenet (ms, Fewer Is Better)
MBP M1 Max Machine Learning: 20.30 (SE +/- 0.02, N = 3; MIN: 20.23 / MAX: 21.48; additional options: -arch -isysroot)
ML Tests: 10.27 (SE +/- 0.09, N = 3; MIN: 9.59 / MAX: 17.84; additional options: -rdynamic -lgomp -lpthread)
1. (CXX) g++ options: -O3
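Dividing each NCNN CPU time by the matching Vulkan GPU time shows where the GPU path helps. A value above 1 means the Vulkan target was faster. On ML Tests this holds for alexnet, for example (14.55 / 6.32, about 2.3x). A value below 1 means it was slower, as with efficientnet-b0 (5.22 / 10.07, about 0.52x). On the MBP, every Vulkan GPU figure is within about 1% of the matching CPU figure; the export does not say why. The sketch below is minimal, and the speedup definition is our own, not a metric reported by the Phoronix Test Suite. The CPU-v3-v3/Vulkan GPU-v3-v3 and CPU-v2-v2/Vulkan GPU-v2-v2 targets are keyed by model name only.

```python
# NCNN 20210720 mean times in ms as (CPU, Vulkan GPU) pairs, copied from the results above.
NCNN_MS = {
    "ML Tests": {
        "regnety_400m": (6.90, 5.28),
        "squeezenet_ssd": (18.56, 15.36),
        "yolov4-tiny": (24.97, 18.82),
        "resnet50": (25.17, 13.12),
        "alexnet": (14.55, 6.32),
        "resnet18": (15.78, 6.09),
        "vgg16": (71.97, 43.99),
        "googlenet": (13.74, 8.73),
        "blazeface": (1.20, 1.35),
        "efficientnet-b0": (5.22, 10.07),
        "mnasnet": (3.25, 3.89),
        "shufflenet-v2": (2.75, 3.02),
        "mobilenet-v3": (3.41, 4.69),
        "mobilenet-v2": (3.99, 3.88),
        "mobilenet": (15.95, 10.27),
    },
    "MBP M1 Max Machine Learning": {
        "regnety_400m": (7.18, 7.19),
        "squeezenet_ssd": (20.53, 20.55),
        "yolov4-tiny": (30.24, 30.33),
        "resnet50": (43.16, 43.08),
        "alexnet": (29.93, 29.89),
        "resnet18": (16.82, 16.80),
        "vgg16": (71.01, 70.89),
        "googlenet": (24.96, 24.90),
        "blazeface": (1.65, 1.64),
        "efficientnet-b0": (8.69, 8.71),
        "mnasnet": (5.40, 5.37),
        "shufflenet-v2": (3.47, 3.46),
        "mobilenet-v3": (4.36, 4.35),
        "mobilenet-v2": (5.33, 5.30),
        "mobilenet": (20.32, 20.30),
    },
}


def vulkan_speedup(cpu_ms: float, vulkan_ms: float) -> float:
    """CPU time divided by Vulkan GPU time; above 1 means the Vulkan target was faster."""
    return cpu_ms / vulkan_ms


for system, models in NCNN_MS.items():
    print(system)
    for model, (cpu_ms, vulkan_ms) in models.items():
        speedup = vulkan_speedup(cpu_ms, vulkan_ms)
        flag = "" if speedup >= 1 else "  (Vulkan slower)"
        print(f"  {model:16s} {speedup:5.2f}x{flag}")
```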
Caffe 2020-02-13, Model: GoogleNet - Acceleration: CPU - Iterations: 100 (Milli-Seconds, Fewer Is Better)
ML Tests: 86567 (SE +/- 103.35, N = 3)
1. (CXX) g++ options: -fPIC -O3 -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas

OpenCV 4.5.4, Test: DNN - Deep Neural Network (ms, Fewer Is Better)
ML Tests: 13787 (SE +/- 269.19, N = 15)
1. (CXX) g++ options: -fPIC -fsigned-char -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -msse -msse2 -msse3 -fvisibility=hidden -O3 -shared

Caffe 2020-02-13, Model: AlexNet - Acceleration: CPU - Iterations: 200 (Milli-Seconds, Fewer Is Better)
ML Tests: 65986 (SE +/- 167.60, N = 3)
1. (CXX) g++ options: -fPIC -O3 -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas

TensorFlow Lite 2020-08-23, Model: SqueezeNet (Microseconds, Fewer Is Better)
ML Tests: 189764 (SE +/- 108.90, N = 3)

TensorFlow Lite 2020-08-23, Model: NASNet Mobile (Microseconds, Fewer Is Better)
ML Tests: 152186 (SE +/- 344.97, N = 3)

TensorFlow Lite 2020-08-23, Model: Mobilenet Quant (Microseconds, Fewer Is Better)
ML Tests: 141174 (SE +/- 38.25, N = 3)

TensorFlow Lite 2020-08-23, Model: Mobilenet Float (Microseconds, Fewer Is Better)
ML Tests: 127818 (SE +/- 174.79, N = 3)

Mlpack Benchmark, Benchmark: scikit_ica (Seconds, Fewer Is Better)
ML Tests: 48.40 (SE +/- 0.12, N = 3)

oneDNN 2.1.2, Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
ML Tests: 3577.00 (SE +/- 4.93, N = 3; MIN: 3514.72)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

oneDNN 2.1.2, Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
ML Tests: 3587.17 (SE +/- 4.16, N = 3; MIN: 3527.15)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

oneDNN 2.1.2, Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
ML Tests: 3579.00 (SE +/- 7.39, N = 3; MIN: 3519.93)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

Mlpack Benchmark, Benchmark: scikit_linearridgeregression (Seconds, Fewer Is Better)
ML Tests: 2.10 (SE +/- 0.01, N = 3)

oneDNN 2.1.2, Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
ML Tests: 2228.17 (SE +/- 6.34, N = 3; MIN: 2189.62)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

oneDNN 2.1.2, Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
ML Tests: 2219.13 (SE +/- 1.17, N = 3; MIN: 2182.15)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

Caffe 2020-02-13, Model: AlexNet - Acceleration: CPU - Iterations: 100 (Milli-Seconds, Fewer Is Better)
ML Tests: 33496 (SE +/- 37.32, N = 3)
1. (CXX) g++ options: -fPIC -O3 -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas
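The Caffe results grow almost exactly in proportion to the iteration count. That suggests the reported Milli-Seconds value is the total time for all iterations. This reading is our assumption; the export does not say so explicitly. Under that assumption, dividing by the iteration count puts the 100, 200 and 1000 iteration runs on a common per-iteration scale. On ML Tests, GoogleNet stays near 866 to 869 ms per iteration (a spread of about 0.4%), and AlexNet stays near 326 to 335 ms (about 2.8%).

```python
from typing import Dict, Iterable

# Caffe 2020-02-13 CPU results for ML Tests in Milli-Seconds, keyed by iteration count.
CAFFE_TOTAL_MS = {
    "GoogleNet": {100: 86567, 200: 173671, 1000: 868758},
    "AlexNet": {100: 33496, 200: 65986, 1000: 325884},
}


def per_iteration(totals: Dict[int, float]) -> Dict[int, float]:
    """Divide each total by its iteration count (assumes the total covers every iteration)."""
    return {iters: total_ms / iters for iters, total_ms in totals.items()}


def relative_spread(values: Iterable[float]) -> float:
    """(max - min) / min of the per-iteration figures."""
    vals = list(values)
    return (max(vals) - min(vals)) / min(vals)


for model, totals in CAFFE_TOTAL_MS.items():
    per_iter = per_iteration(totals)
    cells = ", ".join(f"{iters}: {ms:.2f} ms" for iters, ms in per_iter.items())
    print(f"{model}: {cells} (spread {relative_spread(per_iter.values()):.1%})")
```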
DeepSpeech 0.6, Acceleration: CPU (Seconds, Fewer Is Better)
ML Tests: 74.44 (SE +/- 0.17, N = 3)

Mlpack Benchmark, Benchmark: scikit_svm (Seconds, Fewer Is Better)
ML Tests: 17.60 (SE +/- 0.02, N = 3)

R Benchmark (Seconds, Fewer Is Better)
ML Tests: 0.1293 (SE +/- 0.0003, N = 3)
1. R scripting front-end version 4.0.4 (2021-02-15)

oneDNN 2.1.2, Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
ML Tests: 4.25855 (SE +/- 0.03780, N = 7; MIN: 3.88)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

TNN 0.3, Target: CPU - Model: MobileNet v2 (ms, Fewer Is Better)
ML Tests: 249.48 (SE +/- 0.40, N = 3; MIN: 247.22 / MAX: 255.16)
1. (CXX) g++ options: -fopenmp -pthread -fvisibility=hidden -fvisibility=default -O3 -rdynamic -ldl

RNNoise 2020-06-28 (Seconds, Fewer Is Better)
ML Tests: 16.14 (SE +/- 0.02, N = 3)
1. (CC) gcc options: -O2 -pedantic -fvisibility=hidden

TNN 0.3, Target: CPU - Model: SqueezeNet v1.1 (ms, Fewer Is Better)
ML Tests: 222.33 (SE +/- 0.13, N = 3; MIN: 221.49 / MAX: 224.65)
1. (CXX) g++ options: -fopenmp -pthread -fvisibility=hidden -fvisibility=default -O3 -rdynamic -ldl

ECP-CANDLE 0.4, Benchmark: P1B2 (Seconds, Fewer Is Better)
ML Tests: 37.51

oneDNN 2.1.2, Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
ML Tests: 8.34789 (SE +/- 0.02843, N = 3; MIN: 4.75)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

oneDNN 2.1.2, Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
ML Tests: 2.11771 (SE +/- 0.00458, N = 3; MIN: 1.91)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

oneDNN 2.1.2, Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
ML Tests: 1.62985 (SE +/- 0.00920, N = 3; MIN: 1.49)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

oneDNN 2.1.2, Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
ML Tests: 4.59343 (SE +/- 0.00541, N = 3; MIN: 4.39)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

oneDNN 2.1.2, Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
ML Tests: 2.98659 (SE +/- 0.01068, N = 3; MIN: 2.72)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

oneDNN 2.1.2, Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
ML Tests: 12.09 (SE +/- 0.02, N = 3; MIN: 11.92)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

oneDNN 2.1.2, Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
ML Tests: 2.69210 (SE +/- 0.00222, N = 3; MIN: 2.57)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

TNN 0.3, Target: CPU - Model: SqueezeNet v2 (ms, Fewer Is Better)
ML Tests: 55.43 (SE +/- 0.62, N = 3; MIN: 54.24 / MAX: 57.06)
1. (CXX) g++ options: -fopenmp -pthread -fvisibility=hidden -fvisibility=default -O3 -rdynamic -ldl

oneDNN 2.1.2, Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
ML Tests: 23.77 (SE +/- 0.02, N = 3; MIN: 22.9)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

oneDNN 2.1.2, Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
ML Tests: 22.79 (SE +/- 0.03, N = 3; MIN: 21.94)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

oneDNN 2.1.2, Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
ML Tests: 6.74559 (SE +/- 0.01002, N = 3; MIN: 6.52)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

oneDNN 2.1.2, Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
ML Tests: 3.24784 (SE +/- 0.02694, N = 3; MIN: 2.76)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
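Several oneDNN harnesses were run on ML Tests with both the f32 and the u8s8f32 data types. The u8s8f32 label is our reading of oneDNN's benchdnn naming: u8 source, s8 weights, f32 destination. The export itself does not spell this out. The sketch below divides the f32 time by the u8s8f32 time, so a value above 1 means the int8 configuration was faster. The largest gain is on IP Shapes 3D (about 4.5x). Convolution Batch Shapes Auto and both RNN harnesses come out at or slightly below 1. The ratio is our own summary and is not a figure the export reports.

```python
# oneDNN 2.1.2 CPU results for ML Tests in ms as (f32, u8s8f32), copied from the results above.
ONEDNN_MS = {
    "IP Shapes 1D": (4.25855, 1.62985),
    "IP Shapes 3D": (12.0926, 2.69210),
    "Deconvolution Batch shapes_1d": (8.34789, 2.11771),
    "Deconvolution Batch shapes_3d": (6.74559, 3.24784),
    "Matrix Multiply Batch Shapes Transformer": (4.59343, 2.98659),
    "Convolution Batch Shapes Auto": (22.7926, 23.7674),
    "Recurrent Neural Network Inference": (2219.13, 2228.17),
    "Recurrent Neural Network Training": (3579.00, 3587.17),
}


def int8_speedup(f32_ms: float, u8s8f32_ms: float) -> float:
    """f32 time divided by u8s8f32 time; above 1 means the int8 configuration was faster."""
    return f32_ms / u8s8f32_ms


# Print harnesses from the largest to the smallest int8 speedup.
ranked = sorted(ONEDNN_MS.items(), key=lambda item: int8_speedup(*item[1]), reverse=True)
for harness, (f32_ms, u8_ms) in ranked:
    print(f"{harness:42s} {int8_speedup(f32_ms, u8_ms):5.2f}x")
```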
Phoronix Test Suite v10.8.5