kubuntu-2404-nvme

ARMv8 Cortex-A76 testing with a Raspberry Pi 5 Model B Rev 1.0 and V3D 7.1.7 8GB on Ubuntu 24.04 via the Phoronix Test Suite.

Compare your own system(s) to this result file with the Phoronix Test Suite by running the command: phoronix-test-suite benchmark 2411044-NE-KUBUNTU2463

Run Management

Result Identifier    Date Run        Test Duration
ARMv8 Cortex-A76     September 27    1 Day, 23 Minutes
V3D 7.1.7            November 04     9 Minutes
Average                              12 Hours, 16 Minutes
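The 12 Hours, 16 Minutes figure equals the arithmetic mean of the two run durations listed (1 Day, 23 Minutes and 9 Minutes). A minimal Python sketch of that check follows; the dictionary and function names are illustrative and not part of the Phoronix Test Suite.

```python
from datetime import timedelta

# Test durations as listed for the two runs in this result file.
durations = {
    "ARMv8 Cortex-A76": timedelta(days=1, minutes=23),
    "V3D 7.1.7": timedelta(minutes=9),
}


def mean_duration(values):
    """Arithmetic mean of a collection of timedelta objects."""
    values = list(values)
    return sum(values, timedelta()) / len(values)


average = mean_duration(durations.values())
print(average)  # 12:16:00
```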

kubuntu-2404-nvme system details

ARMv8 Cortex-A76
Processor: ARMv8 Cortex-A76 @ 2.40GHz (4 Cores)
Motherboard: Raspberry Pi 5 Model B Rev 1.0
Chipset: Broadcom BCM2712
Memory: 8GB
Disk: 500GB KINGSTON SNV2S500G
Graphics: V3D 7.1.7 8GB
Monitor: PA247CV
Network: Raspberry Pi RP1 PCIe 2.0 South Bridge
OS: Ubuntu 24.04
Kernel: 6.8.0-1012-raspi (aarch64)
Desktop: KDE Plasma 5.27.11
Display Server: X Server 1.21.1.11
OpenGL: 3.1 Mesa 24.0.9-0ubuntu0.1
File-System: ext4
Screen Resolution: 1920x1080

V3D 7.1.7 (fields that differ from ARMv8 Cortex-A76)
Disk: 500GB KINGSTON SNV2S500G + 62GB DataTraveler 3.0
Kernel: 6.8.0-1013-raspi (aarch64)
OpenGL: 3.1 Mesa 24.0.9-0ubuntu0.2
Compiler: GCC 13.2.0

Kernel Details
- Transparent Huge Pages: madvise

Processor Details
- Scaling Governor: cpufreq-dt ondemand

Security Details
- gather_data_sampling: Not affected
- itlb_multihit: Not affected
- l1tf: Not affected
- mds: Not affected
- meltdown: Not affected
- mmio_stale_data: Not affected
- reg_file_data_sampling: Not affected
- retbleed: Not affected
- spec_rstack_overflow: Not affected
- spec_store_bypass: Mitigation of SSB disabled via prctl
- spectre_v1: Mitigation of __user pointer sanitization
- spectre_v2: Mitigation of CSV2 BHB
- srbds: Not affected
- tsx_async_abort: Not affected

Compiler Details
- V3D 7.1.7: --build=aarch64-linux-gnu --disable-libquadmath --disable-libquadmath-support --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-fix-cortex-a53-843419 --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-dIwDw0/gcc-13-13.2.0/debian/tmp-nvptx/usr --enable-plugin --enable-shared --enable-threads=posix --host=aarch64-linux-gnu --program-prefix=aarch64-linux-gnu- --target=aarch64-linux-gnu --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-target-system-zlib=auto --without-cuda-driver -v

kubuntu-2404-nvme result overview

Test profiles covered: whisper-cpp, whisperfile, llama-cpp, mnn, opencv, llamafile, build-ffmpeg, xnnpack, onednn, numpy, ncnn, x265, vkmark, build-apache, onnx, rbenchmark, tensorflow-lite, x264.
Results are reported for ARMv8 Cortex-A76, except vkmark, which is reported for V3D 7.1.7.

Llamafile

Test: wizardcoder-python-34b-v1.0.Q6_K - Acceleration: CPU

ARMv8 Cortex-A76: The test quit with a non-zero exit status.

Test: Meta-Llama-3-8B-Instruct.F16 - Acceleration: CPU

ARMv8 Cortex-A76: The test quit with a non-zero exit status.

Whisper.cpp

Whisper.cpp 1.6.2 - Model: ggml-medium.en - Input: 2016 State of the Union
Seconds, Fewer Is Better
ARMv8 Cortex-A76: 5899.94
(CXX) g++ options: -O3 -std=c++11 -fPIC -pthread -mcpu=native

Whisperfile

Whisperfile 20Aug24 - Model Size: Medium
Seconds, Fewer Is Better
ARMv8 Cortex-A76: 5162.45

Llama.cpp

Llama.cpp b3067 - Model: Meta-Llama-3-8B-Instruct-Q8_0.gguf
Tokens Per Second, More Is Better
ARMv8 Cortex-A76: 0.12
(CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -mcpu=native -lopenblas

Whisper.cpp

Whisper.cpp 1.6.2 - Model: ggml-small.en - Input: 2016 State of the Union
Seconds, Fewer Is Better
ARMv8 Cortex-A76: 1928.49
(CXX) g++ options: -O3 -std=c++11 -fPIC -pthread -mcpu=native

Whisperfile

Whisperfile 20Aug24 - Model Size: Small
Seconds, Fewer Is Better
ARMv8 Cortex-A76: 1779.12

OpenCV

This is a benchmark of the OpenCV (Computer Vision) library's built-in performance tests. Learn more via the OpenBenchmarking.org test page.

Test: Graph API

ARMv8 Cortex-A76: The test quit with a non-zero exit status. E: AbsExact error: G-API output and reference output matrixes are not bitexact equal.

Mobile Neural Network

Mobile Neural Network 2.9.b11b7037d
ms, Fewer Is Better
ARMv8 Cortex-A76

Model              Result    MIN       MAX
inception-v3       150.38    141.2     217.08
mobilenet-v1-1.0   21.55     20.1      44.01
MobileNetV2_224    14.96     13.95     32.71
SqueezeNetV1.0     30.07     28.25     64.53
resnet-v2-50       122.56    112.59    278.69
squeezenetv1.1     14.31     13.36     34.42
mobilenetV3        3.975     3.66      12.57
nasnet             28.91     26.83     75.52

(CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -pthread -ldl

Whisper.cpp

Whisper.cpp 1.6.2 - Model: ggml-base.en - Input: 2016 State of the Union
Seconds, Fewer Is Better
ARMv8 Cortex-A76: 646.23
(CXX) g++ options: -O3 -std=c++11 -fPIC -pthread -mcpu=native

OpenCV

OpenCV 4.7 - Test: Image Processing
ms, Fewer Is Better
ARMv8 Cortex-A76: 628264
(CXX) g++ options: -fsigned-char -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -fvisibility=hidden -O3 -ldl -lm -lpthread -lrt

Llamafile

Llamafile 0.8.6 - Test: mistral-7b-instruct-v0.2.Q5_K_M - Acceleration: GPU AUTO
Tokens Per Second, More Is Better
ARMv8 Cortex-A76: 1.88

OpenCV

OpenCV 4.7 - Test: DNN - Deep Neural Network
ms, Fewer Is Better
ARMv8 Cortex-A76: 472235
(CXX) g++ options: -fsigned-char -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -fvisibility=hidden -O3 -ldl -lm -lpthread -lrt

Timed FFmpeg Compilation

This test times how long it takes to build the FFmpeg multimedia library. Learn more via the OpenBenchmarking.org test page.

Timed FFmpeg Compilation 7.0 - Time To Compile
Seconds, Fewer Is Better
ARMv8 Cortex-A76: 462.73

OpenCV

OpenCV 4.7 - Test: Stitching
ms, Fewer Is Better
ARMv8 Cortex-A76: 439495
(CXX) g++ options: -fsigned-char -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -fvisibility=hidden -O3 -ldl -lm -lpthread -lrt

XNNPACK

XNNPACK 2cd86b
us, Fewer Is Better
ARMv8 Cortex-A76

Model                   Result
QU8MobileNetV3Small     3337
QU8MobileNetV3Large     10338
QU8MobileNetV2          11904
FP16MobileNetV3Small    4046
FP16MobileNetV3Large    10478
FP16MobileNetV2         10166
FP32MobileNetV3Small    7492
FP32MobileNetV3Large    20354
FP32MobileNetV2         20522

(CXX) g++ options: -O3 -lrt -lm
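To compare the three precisions, the XNNPACK times can be expressed as a speedup relative to FP32. The sketch below is only an illustration using the values reported above; the dictionary layout and the `speedup_vs_fp32` helper are hypothetical names, not part of XNNPACK or the Phoronix Test Suite.

```python
# Mean inference times in microseconds, copied from the XNNPACK results above.
xnnpack_us = {
    "MobileNetV3Small": {"QU8": 3337, "FP16": 4046, "FP32": 7492},
    "MobileNetV3Large": {"QU8": 10338, "FP16": 10478, "FP32": 20354},
    "MobileNetV2": {"QU8": 11904, "FP16": 10166, "FP32": 20522},
}


def speedup_vs_fp32(times):
    """Return how many times faster each precision is than FP32."""
    return {precision: times["FP32"] / t for precision, t in times.items()}


for model, times in xnnpack_us.items():
    ratios = speedup_vs_fp32(times)
    summary = ", ".join(f"{p}: {r:.2f}x" for p, r in ratios.items())
    print(f"{model:18s} {summary}")
```

On these numbers both reduced-precision variants run roughly 1.7x to 2.2x faster than FP32, and for MobileNetV2 the FP16 result is faster than the QU8 one.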

oneDNN

This is a test of Intel oneDNN, an Intel-optimized library for deep neural networks, using its built-in benchdnn functionality. The result is the total perf time reported. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and MKL-DNN before being rebranded as part of the Intel oneAPI toolkit. Learn more via the OpenBenchmarking.org test page.

oneDNN 3.4 - Harness: Recurrent Neural Network Training - Engine: CPU
ms, Fewer Is Better
ARMv8 Cortex-A76: 65557.7 (MIN: 65037.4)
(CXX) g++ options: -O3 -march=native -fopenmp -mcpu=generic -fPIC -pie -ldl

Llamafile

Llamafile 0.8.6 - Test: mistral-7b-instruct-v0.2.Q5_K_M - Acceleration: CPU
Tokens Per Second, More Is Better
ARMv8 Cortex-A76: 1.89

Numpy Benchmark

This is a test to obtain the general Numpy performance. Learn more via the OpenBenchmarking.org test page.

Numpy Benchmark
Score, More Is Better
ARMv8 Cortex-A76: 136.85

NCNN

NCNN is a high performance neural network inference framework optimized for mobile and other platforms developed by Tencent. Learn more via the OpenBenchmarking.org test page.

NCNN 20230517 - Target: Vulkan GPU
ms, Fewer Is Better
ARMv8 Cortex-A76

Model                 Result    MIN       MAX
FastestDet            6.13      5.6       44.41
vision_transformer    629.58    585.91    719.32
regnety_400m          12.86     11.6      77.32
squeezenet_ssd        42.51     39.68     110.68
yolov4-tiny           57.05     54.01     101.26
mobilenetv2-yolov3    48.83     45.66     115.75
resnet50              58.31     53.92     113.75
alexnet               25.89     24.36     55.56
resnet18              25.01     23.31     70.28
vgg16                 163.32    155.28    202.59
googlenet             34.49     32.05     115.75
blazeface             1.9       1.52      40.05
efficientnet-b0       16.06     14.54     81.05
mnasnet               9.13      8.28      66.07
shufflenet-v2         3.97      3.46      65.74
mobilenet-v3          9.76      8.95      77.1
mobilenet-v2          14.12     12.74     76.44
mobilenet             48.83     45.66     115.75

NCNN 20230517 - Target: CPU
ms, Fewer Is Better
ARMv8 Cortex-A76

Model                 Result    MIN       MAX
FastestDet            6.32      5.61      67.67
vision_transformer    627.86    594.98    660.35
regnety_400m          12.82     11.59     75.68
squeezenet_ssd        43.11     39.81     109.11
yolov4-tiny           57.21     54.26     106.97
mobilenetv2-yolov3    48.66     45.43     106.55
resnet50              57.94     53.77     115
alexnet               25.69     24.31     55.13
resnet18              24.95     23.29     74.28
vgg16                 163.14    155.59    207.56
googlenet             34.79     32.19     113.23
blazeface             1.57      1.5       8.76
efficientnet-b0       15.99     14.54     78.1
mnasnet               9.16      8.33      41.87
shufflenet-v2         3.83      3.41      68.92
mobilenet-v3          10.08     9.01      81.44
mobilenet-v2          13.94     12.76     75.48
mobilenet             48.66     45.43     106.55

(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
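The NCNN Vulkan GPU and CPU targets report very similar times on this system. The following sketch quantifies the gap for a handful of the models above as a signed percentage relative to the CPU result; the selection of models and the `percent_delta` helper are illustrative choices, not something defined by NCNN or the test profile.

```python
# (vulkan_ms, cpu_ms) mean latencies for a few models, copied from the NCNN results above.
ncnn_ms = {
    "FastestDet": (6.13, 6.32),
    "vision_transformer": (629.58, 627.86),
    "resnet50": (58.31, 57.94),
    "vgg16": (163.32, 163.14),
    "blazeface": (1.9, 1.57),
    "mobilenet-v3": (9.76, 10.08),
}


def percent_delta(vulkan_ms, cpu_ms):
    """Signed difference of the Vulkan result relative to the CPU result, in percent."""
    return (vulkan_ms - cpu_ms) / cpu_ms * 100.0


for model, (vk, cpu) in ncnn_ms.items():
    print(f"{model:20s} {percent_delta(vk, cpu):+7.2f}%")
```

For the models sampled here the two targets differ by a few percent at most, with blazeface as the exception: its Vulkan GPU result is about 21% higher than the CPU one.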

Whisperfile

Whisperfile 20Aug24 - Model Size: Tiny
Seconds, Fewer Is Better
ARMv8 Cortex-A76: 332.12

OpenCV

OpenCV 4.7 - Test: Core
ms, Fewer Is Better
ARMv8 Cortex-A76: 281470
(CXX) g++ options: -fsigned-char -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -fvisibility=hidden -O3 -ldl -lm -lpthread -lrt

oneDNN

oneDNN 3.4 - Harness: Recurrent Neural Network Inference - Engine: CPU
ms, Fewer Is Better
ARMv8 Cortex-A76: 35072.3 (MIN: 34622.4)
(CXX) g++ options: -O3 -march=native -fopenmp -mcpu=generic -fPIC -pie -ldl

Llamafile

Llamafile 0.8.6 - Test: llava-v1.6-mistral-7b.Q8_0 - Acceleration: CPU
Tokens Per Second, More Is Better
ARMv8 Cortex-A76: 1.96

OpenCV

OpenCV 4.7 - Test: Features 2D
ms, Fewer Is Better
ARMv8 Cortex-A76: 240888
(CXX) g++ options: -fsigned-char -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -fvisibility=hidden -O3 -ldl -lm -lpthread -lrt

Llamafile

Llamafile 0.8.6 - Test: llava-v1.6-mistral-7b.Q8_0 - Acceleration: GPU AUTO
Tokens Per Second, More Is Better
ARMv8 Cortex-A76: 1.95

Test: Meta-Llama-3-8B-Instruct.F16 - Acceleration: GPU AUTO

ARMv8 Cortex-A76: The test quit with a non-zero exit status.

OpenCV

OpenCV 4.7 - Test: Video
ms, Fewer Is Better
ARMv8 Cortex-A76: 161859
(CXX) g++ options: -fsigned-char -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -fvisibility=hidden -O3 -ldl -lm -lpthread -lrt

x265

This is a simple test of the x265 encoder run on the CPU, with 1080p and 4K input options, measuring H.265 video encode performance. Learn more via the OpenBenchmarking.org test page.

x265 3.6 - Video Input: Bosphorus 1080p
Frames Per Second, More Is Better
ARMv8 Cortex-A76: 3.71
(CXX) g++ options: -O3 -rdynamic -lpthread -lrt -ldl -lnuma

VKMark

VKMark is a collection of open-source Vulkan tests / rendering benchmarks. Learn more via the OpenBenchmarking.org test page.

VKMark 2022-05-16 - Present Mode: Mailbox
VKMark Score, More Is Better
V3D 7.1.7

Resolution     Result
1920 x 1080    131
1280 x 1024    192
1024 x 768     302
800 x 600      467

(CXX) g++ options: -pthread -ldl -std=c++14 -O0 -MD -MQ -MF
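The VKMark scores fall as the resolution rises. One rough way to look at this is to multiply each score by the number of pixels rendered. The sketch below does that with the four results above; treating this product as a throughput-like figure is an assumption made for illustration, and `score_times_megapixels` is a hypothetical helper, not a VKMark metric.

```python
# VKMark scores for the V3D 7.1.7 run, keyed by (width, height).
vkmark_scores = {
    (1920, 1080): 131,
    (1280, 1024): 192,
    (1024, 768): 302,
    (800, 600): 467,
}


def score_times_megapixels(width, height, score):
    """Score multiplied by the resolution expressed in megapixels."""
    return score * width * height / 1e6


for (width, height), score in vkmark_scores.items():
    product = score_times_megapixels(width, height, score)
    print(f"{width} x {height}: score {score}, score x megapixels {product:.1f}")
```

The products stay within a fairly narrow band (roughly 224 to 272) even though the raw scores differ by more than 3.5x, which is consistent with the score scaling approximately inversely with pixel count on this GPU.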

Timed Apache Compilation

This test times how long it takes to build the Apache HTTPD web server. Learn more via the OpenBenchmarking.org test page.

Timed Apache Compilation 2.4.41 - Time To Compile
Seconds, Fewer Is Better
ARMv8 Cortex-A76: 104.93

Llamafile

Test: wizardcoder-python-34b-v1.0.Q6_K - Acceleration: GPU AUTO

ARMv8 Cortex-A76: The test quit with a non-zero exit status.

ONNX Runtime

ONNX Runtime 1.19 - Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Parallel
ARMv8 Cortex-A76
Inference Time Cost (ms), Fewer Is Better: 23144.3
Inferences Per Second, More Is Better: 0.0432072
(CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

R Benchmark

This test is a quick-running survey of general R performance. Learn more via the OpenBenchmarking.org test page.

R Benchmark
Seconds, Fewer Is Better
ARMv8 Cortex-A76: 0.6038

ONNX Runtime

ONNX Runtime 1.19 - Device: CPU - Executor: Parallel
ARMv8 Cortex-A76

Model               Inference Time Cost (ms), Fewer Is Better    Inferences Per Second, More Is Better
fcn-resnet101-11    9958.27                                      0.100419
yolov4              1874.14                                      0.533576
ZFNet-512           104.17                                       9.59958
T5 Encoder          59.14                                        16.91

(CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
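Each ONNX Runtime model is reported twice, once as inference time cost in milliseconds and once as inferences per second. The two figures are reciprocals of each other (throughput = 1000 / latency in ms), which the sketch below checks against the values in this result file. The function and variable names are illustrative only.

```python
def inferences_per_second(latency_ms):
    """Convert a per-inference latency in milliseconds to inferences per second."""
    return 1000.0 / latency_ms


# model: (reported latency in ms, reported inferences per second)
onnx_results = {
    "ResNet101_DUC_HDC-12": (23144.3, 0.0432072),
    "fcn-resnet101-11": (9958.27, 0.100419),
    "yolov4": (1874.14, 0.533576),
    "ZFNet-512": (104.17, 9.59958),
    "T5 Encoder": (59.14, 16.91),
}

for model, (latency_ms, reported_ips) in onnx_results.items():
    derived = inferences_per_second(latency_ms)
    # Small differences are expected because the page rounds both figures.
    print(f"{model:22s} derived {derived:.6g}/s, reported {reported_ips:.6g}/s")
```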

TensorFlow Lite

This is a benchmark of the TensorFlow Lite implementation focused on TensorFlow machine learning for mobile, IoT, edge, and other cases. The current Linux support is limited to running on CPUs. This test profile is measuring the average inference time. Learn more via the OpenBenchmarking.org test page.

TensorFlow Lite 2022-05-18
Microseconds, Fewer Is Better
ARMv8 Cortex-A76

Model                  Result
Inception V4           246550
Inception ResNet V2    215253

OpenCV

OpenCV 4.7 - Test: Object Detection
ms, Fewer Is Better
ARMv8 Cortex-A76: 61443
(CXX) g++ options: -fsigned-char -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -fvisibility=hidden -O3 -ldl -lm -lpthread -lrt

TensorFlow Lite

TensorFlow Lite 2022-05-18 - Model: NASNet Mobile
Microseconds, Fewer Is Better
ARMv8 Cortex-A76: 52846.6

TensorFlow Lite 2022-05-18 - Model: Mobilenet Float
Microseconds, Fewer Is Better
ARMv8 Cortex-A76: 18684.6

ONNX Runtime

ONNX Runtime 1.19 - Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallel
Inference Time Cost (ms), Fewer Is Better
ARMv8 Cortex-A76: 22.13
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.19 - Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallel
Inferences Per Second, More Is Better
ARMv8 Cortex-A76: 45.17
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

TensorFlow Lite

TensorFlow Lite 2022-05-18 - Model: SqueezeNet
Microseconds, Fewer Is Better
ARMv8 Cortex-A76: 26872.6

TensorFlow Lite 2022-05-18 - Model: Mobilenet Quant
Microseconds, Fewer Is Better
ARMv8 Cortex-A76: 5774.24

ONNX Runtime

ONNX Runtime 1.19 - Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel
Inference Time Cost (ms), Fewer Is Better
ARMv8 Cortex-A76: 80.01
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.19 - Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel
Inferences Per Second, More Is Better
ARMv8 Cortex-A76: 12.50
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.19 - Model: super-resolution-10 - Device: CPU - Executor: Parallel
Inference Time Cost (ms), Fewer Is Better
ARMv8 Cortex-A76: 115.01
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.19 - Model: super-resolution-10 - Device: CPU - Executor: Parallel
Inferences Per Second, More Is Better
ARMv8 Cortex-A76: 8.69434
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

x264

This is a multi-threaded test of the x264 video encoder run on the CPU with a choice of 1080p or 4K video input. Learn more via the OpenBenchmarking.org test page.

x264 2022-02-22 - Video Input: Bosphorus 1080p
Frames Per Second, More Is Better
ARMv8 Cortex-A76: 18.14
1. (CC) gcc options: -ldl -lavformat -lavcodec -lavutil -lswscale -lm -lpthread -O3 -flto

oneDNN

This is a test of Intel oneDNN, an Intel-optimized library for deep neural networks, using its built-in benchdnn functionality. The result is the total perf time reported. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and MKL-DNN before being rebranded as part of the Intel oneAPI toolkit. Learn more via the OpenBenchmarking.org test page.

oneDNN 3.4 - Harness: Deconvolution Batch shapes_1d - Engine: CPU
ms, Fewer Is Better
ARMv8 Cortex-A76: 560.13 (MIN: 541.24)
1. (CXX) g++ options: -O3 -march=native -fopenmp -mcpu=generic -fPIC -pie -ldl

oneDNN 3.4 - Harness: IP Shapes 1D - Engine: CPU
ms, Fewer Is Better
ARMv8 Cortex-A76: 58.65 (MIN: 54.79)
1. (CXX) g++ options: -O3 -march=native -fopenmp -mcpu=generic -fPIC -pie -ldl

oneDNN 3.4 - Harness: IP Shapes 3D - Engine: CPU
ms, Fewer Is Better
ARMv8 Cortex-A76: 62.98 (MIN: 60.87)
1. (CXX) g++ options: -O3 -march=native -fopenmp -mcpu=generic -fPIC -pie -ldl

PyTorch

This is a benchmark of PyTorch making use of pytorch-benchmark [https://github.com/LukasHedegaard/pytorch-benchmark]. Learn more via the OpenBenchmarking.org test page.

Device: CPU - Batch Size: 512 - Model: Efficientnet_v2_l

ARMv8 Cortex-A76: The test quit with a non-zero exit status. E: KeyError: 'brand_raw'

Device: CPU - Batch Size: 16 - Model: Efficientnet_v2_l

ARMv8 Cortex-A76: The test quit with a non-zero exit status. E: KeyError: 'brand_raw'

Device: CPU - Batch Size: 256 - Model: Efficientnet_v2_l

ARMv8 Cortex-A76: The test quit with a non-zero exit status. E: KeyError: 'brand_raw'

Device: CPU - Batch Size: 1 - Model: Efficientnet_v2_l

ARMv8 Cortex-A76: The test quit with a non-zero exit status. E: KeyError: 'brand_raw'

Device: CPU - Batch Size: 32 - Model: Efficientnet_v2_l

ARMv8 Cortex-A76: The test quit with a non-zero exit status. E: KeyError: 'brand_raw'

Device: CPU - Batch Size: 64 - Model: Efficientnet_v2_l

ARMv8 Cortex-A76: The test quit with a non-zero exit status. E: KeyError: 'brand_raw'
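
Every PyTorch run above fails with the same `KeyError: 'brand_raw'`, which suggests the benchmark reads a CPU brand string from a CPU-information dictionary that lacks that key on this ARM board. The following is a minimal sketch of a tolerant lookup, assuming the data is a plain mapping similar to the one returned by the py-cpuinfo package; the helper name `cpu_brand` and the chosen fallback keys are illustrative assumptions, not code from pytorch-benchmark.

```python
def cpu_brand(info: dict) -> str:
    """Return a human-readable CPU name without raising when 'brand_raw' is absent."""
    # Fallback keys are assumptions; adjust them to whatever the mapping provides.
    for key in ("brand_raw", "hardware_raw", "arch_string_raw", "arch"):
        value = info.get(key)
        if value:
            return str(value)
    return "unknown CPU"


# Hypothetical mapping of the kind an ARM system might report (no 'brand_raw' key).
arm_like_info = {"arch": "ARM_8", "arch_string_raw": "aarch64", "count": 4}
print(cpu_brand(arm_like_info))
```

Using `dict.get` with an ordered list of fallbacks keeps the benchmark metadata step from aborting the whole run when one descriptive field is missing.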

oneDNN

oneDNN 3.4 - Harness: Convolution Batch Shapes Auto - Engine: CPU
ms, Fewer Is Better
ARMv8 Cortex-A76: 264.11 (MIN: 254.72)
1. (CXX) g++ options: -O3 -march=native -fopenmp -mcpu=generic -fPIC -pie -ldl

Llamafile

Llamafile 0.8.6 - Test: TinyLlama-1.1B-Chat-v1.0.BF16 - Acceleration: CPU
Tokens Per Second, More Is Better
ARMv8 Cortex-A76: 15

Llamafile 0.8.6 - Test: TinyLlama-1.1B-Chat-v1.0.BF16 - Acceleration: GPU AUTO
Tokens Per Second, More Is Better
ARMv8 Cortex-A76: 4.38

PyTorch

Device: CPU - Batch Size: 256 - Model: ResNet-152

ARMv8 Cortex-A76: The test quit with a non-zero exit status. E: KeyError: 'brand_raw'

Device: CPU - Batch Size: 64 - Model: ResNet-152

ARMv8 Cortex-A76: The test quit with a non-zero exit status. E: KeyError: 'brand_raw'

Device: CPU - Batch Size: 512 - Model: ResNet-152

ARMv8 Cortex-A76: The test quit with a non-zero exit status. E: KeyError: 'brand_raw'

Device: CPU - Batch Size: 32 - Model: ResNet-152

ARMv8 Cortex-A76: The test quit with a non-zero exit status. E: KeyError: 'brand_raw'

Device: CPU - Batch Size: 16 - Model: ResNet-152

ARMv8 Cortex-A76: The test quit with a non-zero exit status. E: KeyError: 'brand_raw'

Device: CPU - Batch Size: 1 - Model: ResNet-152

ARMv8 Cortex-A76: The test quit with a non-zero exit status. E: KeyError: 'brand_raw'

Device: CPU - Batch Size: 1 - Model: ResNet-50

ARMv8 Cortex-A76: The test quit with a non-zero exit status. E: KeyError: 'brand_raw'

Device: CPU - Batch Size: 256 - Model: ResNet-50

ARMv8 Cortex-A76: The test quit with a non-zero exit status. E: KeyError: 'brand_raw'

Device: CPU - Batch Size: 512 - Model: ResNet-50

ARMv8 Cortex-A76: The test quit with a non-zero exit status. E: KeyError: 'brand_raw'

Device: CPU - Batch Size: 16 - Model: ResNet-50

ARMv8 Cortex-A76: The test quit with a non-zero exit status. E: KeyError: 'brand_raw'

Device: CPU - Batch Size: 64 - Model: ResNet-50

ARMv8 Cortex-A76: The test quit with a non-zero exit status. E: KeyError: 'brand_raw'

Device: CPU - Batch Size: 32 - Model: ResNet-50

ARMv8 Cortex-A76: The test quit with a non-zero exit status. E: KeyError: 'brand_raw'

oneDNN

oneDNN 3.4 - Harness: Deconvolution Batch shapes_3d - Engine: CPU
ms, Fewer Is Better
ARMv8 Cortex-A76: 103.12 (MIN: 95.69)
1. (CXX) g++ options: -O3 -march=native -fopenmp -mcpu=generic -fPIC -pie -ldl

Mlpack Benchmark

Mlpack benchmark scripts for machine learning libraries. Learn more via the OpenBenchmarking.org test page.

Benchmark: scikit_svm

ARMv8 Cortex-A76: The test quit with a non-zero exit status. E: ModuleNotFoundError: No module named 'imp'

Benchmark: scikit_qda

ARMv8 Cortex-A76: The test quit with a non-zero exit status. E: ModuleNotFoundError: No module named 'imp'

Benchmark: scikit_linearridgeregression

ARMv8 Cortex-A76: The test quit with a non-zero exit status. E: ModuleNotFoundError: No module named 'imp'

Benchmark: scikit_ica

ARMv8 Cortex-A76: The test quit with a non-zero exit status. E: ModuleNotFoundError: No module named 'imp'
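
The `No module named 'imp'` failures are consistent with the Python version on this system: the `imp` module was removed from the standard library in Python 3.12, which is the default interpreter on Ubuntu 24.04. The following is a minimal sketch of an `importlib`-based stand-in, assuming the benchmark scripts use something like `imp.load_source`; which `imp` function they actually call is not shown in the log, and the names `load_source` and `demo_mod` are illustrative.

```python
import importlib.util
import sys
import tempfile
from pathlib import Path


def load_source(name, path):
    """Load a module from a file path, similar in spirit to imp.load_source."""
    spec = importlib.util.spec_from_file_location(name, str(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {name!r} from {path}")
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module  # register before executing, as the import system does
    spec.loader.exec_module(module)
    return module


# Demonstration with a throwaway file (hypothetical module name "demo_mod").
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "demo_mod.py"
    source.write_text("ANSWER = 42\n")
    demo = load_source("demo_mod", source)

print(demo.ANSWER)
```

Running the scripts under an older interpreter that still ships `imp` would be another way to work around the failure.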

Numenta Anomaly Benchmark

Numenta Anomaly Benchmark (NAB) is a benchmark for evaluating algorithms for anomaly detection in streaming, real-time applications. It comprises over 50 labeled real-world and artificial time-series data files plus a novel scoring mechanism designed for real-time applications. This test profile currently measures the time to run various detectors. Learn more via the OpenBenchmarking.org test page.

Detector: Bayesian Changepoint

ARMv8 Cortex-A76: The test quit with a non-zero exit status. E: ModuleNotFoundError: No module named 'pandas'

Detector: Contextual Anomaly Detector OSE

ARMv8 Cortex-A76: The test quit with a non-zero exit status. E: ModuleNotFoundError: No module named 'pandas'

Detector: KNN CAD

ARMv8 Cortex-A76: The test quit with a non-zero exit status. E: ModuleNotFoundError: No module named 'pandas'

Detector: Earthgecko Skyline

ARMv8 Cortex-A76: The test quit with a non-zero exit status. E: ModuleNotFoundError: No module named 'pandas'

AI Benchmark Alpha

AI Benchmark Alpha is a Python library for evaluating artificial intelligence (AI) performance on diverse hardware platforms and relies upon the TensorFlow machine learning library. Learn more via the OpenBenchmarking.org test page.

ARMv8 Cortex-A76: The test quit with a non-zero exit status. E: ModuleNotFoundError: No module named 'tensorflow'

Numenta Anomaly Benchmark

Detector: Relative Entropy

ARMv8 Cortex-A76: The test quit with a non-zero exit status. E: ModuleNotFoundError: No module named 'pandas'

Detector: Windowed Gaussian

ARMv8 Cortex-A76: The test quit with a non-zero exit status. E: ModuleNotFoundError: No module named 'pandas'
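
The Numenta Anomaly Benchmark and AI Benchmark Alpha failures above are all missing-dependency errors (`pandas`, `tensorflow`). The following is a minimal sketch of a pre-flight check that could be run before starting the suite, assuming only top-level module names need to be verified; the helper name `missing_modules` is hypothetical, and the list of required modules is taken from the error messages rather than from the test profiles themselves.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found by the import system."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Modules named in the ModuleNotFoundError messages above.
required = ["pandas", "tensorflow"]
absent = missing_modules(required)

if absent:
    print("Missing Python modules:", ", ".join(absent))
else:
    print("All required Python modules are importable.")
```

`importlib.util.find_spec` only locates a module without executing it, so the check stays cheap even for heavy packages.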

LeelaChessZero

Backend: Eigen

ARMv8 Cortex-A76: The test run did not produce a result.

Backend: BLAS

ARMv8 Cortex-A76: The test run did not produce a result.

ONNX Runtime

Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallel

ARMv8 Cortex-A76: The test quit with a non-zero exit status. E: onnxruntime/onnxruntime/test/onnx/onnx_model_info.cc:45 void OnnxModelInfo::InitOnnxModelInfo(const std::filesystem::__cxx11::path&) open file "resnet100/resnet100.onnx" failed: No such file or directory

Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallel

ARMv8 Cortex-A76: The test quit with a non-zero exit status. E: onnxruntime/onnxruntime/test/onnx/onnx_model_info.cc:45 void OnnxModelInfo::InitOnnxModelInfo(const std::filesystem::__cxx11::path&) open file "FasterRCNN-12-int8/FasterRCNN-12-int8.onnx" failed: No such file or directory

Model: GPT-2 - Device: CPU - Executor: Parallel

ARMv8 Cortex-A76: The test quit with a non-zero exit status. E: onnxruntime/onnxruntime/test/onnx/onnx_model_info.cc:45 void OnnxModelInfo::InitOnnxModelInfo(const std::filesystem::__cxx11::path&) open file "GPT2/model.onnx" failed: No such file or directory

Model: bertsquad-12 - Device: CPU - Executor: Parallel

ARMv8 Cortex-A76: The test quit with a non-zero exit status. E: onnxruntime/onnxruntime/test/onnx/onnx_model_info.cc:45 void OnnxModelInfo::InitOnnxModelInfo(const std::filesystem::__cxx11::path&) open file "bertsquad-12/bertsquad-12.onnx" failed: No such file or directory
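
The four ONNX Runtime failures above all report that a model file could not be opened, which points to missing or incomplete model downloads rather than a runtime fault. The following is a minimal sketch that checks for the relative paths quoted in the errors, assuming they are resolved against a single base directory; the helper name `missing_model_files` is hypothetical, and the temporary directory stands in for the real test installation directory, which the log does not show.

```python
import tempfile
from pathlib import Path


def missing_model_files(base_dir, relative_paths):
    """Return the relative paths that do not exist as files under base_dir."""
    base = Path(base_dir)
    return [rel for rel in relative_paths if not (base / rel).is_file()]


# Relative paths quoted in the error messages above.
EXPECTED_MODELS = [
    "resnet100/resnet100.onnx",
    "FasterRCNN-12-int8/FasterRCNN-12-int8.onnx",
    "GPT2/model.onnx",
    "bertsquad-12/bertsquad-12.onnx",
]

# Demonstration against a throwaway directory containing only one of the files.
with tempfile.TemporaryDirectory() as tmp:
    present = Path(tmp) / "GPT2" / "model.onnx"
    present.parent.mkdir(parents=True)
    present.write_bytes(b"")
    still_missing = missing_model_files(tmp, EXPECTED_MODELS)

print(still_missing)
```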

System Temperature Monitor

Phoronix Test Suite System Monitoring, Celsius
V3D 7.1.7: Min 44.1, Avg 47.2, Max 50.7
ARMv8 Cortex-A76: Min 44.7, Avg 55.9, Max 73.8

System Fan Speed Monitor

Phoronix Test Suite System Monitoring, RPM
V3D 7.1.7: Min 1730, Avg 3391, Max 3465
ARMv8 Cortex-A76 (Min, Avg, Max run together): 1148228294

Memory Usage Monitor

Phoronix Test Suite System Monitoring, Megabytes
V3D 7.1.7: Min 915, Avg 975, Max 1075
ARMv8 Cortex-A76: Min 247, Avg 1954, Max 3664

CPU Usage (Summary) Monitor

Phoronix Test Suite System Monitoring, Percent
V3D 7.1.7: Min 0.0, Avg 4.6, Max 47.5
ARMv8 Cortex-A76: Min 0.0, Avg 87.9, Max 100.0

CPU Temperature Monitor

Phoronix Test Suite System Monitoring, Celsius
V3D 7.1.7: Min 43.6, Avg 47.2, Max 50.2
ARMv8 Cortex-A76: Min 44.7, Avg 55.9, Max 74.4

CPU Peak Freq (Highest CPU Core Frequency) Monitor

Phoronix Test Suite System Monitoring, Megahertz
V3D 7.1.7: Min 1500, Avg 1570, Max 2400
ARMv8 Cortex-A76: Min 1500, Avg 2086, Max 2400

CPU Fan Speed Monitor

Phoronix Test Suite System Monitoring, RPM
V3D 7.1.7: Min 1730, Avg 3391, Max 3465
ARMv8 Cortex-A76 (Min, Avg, Max run together): 1148228294

119 Results Shown

Whisper.cpp
Whisperfile
Llama.cpp
Whisper.cpp
Whisperfile
Mobile Neural Network:
  inception-v3
  mobilenet-v1-1.0
  MobileNetV2_224
  SqueezeNetV1.0
  resnet-v2-50
  squeezenetv1.1
  mobilenetV3
  nasnet
Whisper.cpp
OpenCV
Llamafile
OpenCV
Timed FFmpeg Compilation
OpenCV
XNNPACK:
  QU8MobileNetV3Small
  QU8MobileNetV3Large
  QU8MobileNetV2
  FP16MobileNetV3Small
  FP16MobileNetV3Large
  FP16MobileNetV2
  FP32MobileNetV3Small
  FP32MobileNetV3Large
  FP32MobileNetV2
oneDNN
Llamafile
Numpy Benchmark
NCNN:
  Vulkan GPU - FastestDet
  Vulkan GPU - vision_transformer
  Vulkan GPU - regnety_400m
  Vulkan GPU - squeezenet_ssd
  Vulkan GPU - yolov4-tiny
  Vulkan GPU - mobilenetv2-yolov3
  Vulkan GPU - resnet50
  Vulkan GPU - alexnet
  Vulkan GPU - resnet18
  Vulkan GPU - vgg16
  Vulkan GPU - googlenet
  Vulkan GPU - blazeface
  Vulkan GPU - efficientnet-b0
  Vulkan GPU - mnasnet
  Vulkan GPU - shufflenet-v2
  Vulkan GPU - mobilenet-v3
  Vulkan GPU - mobilenet-v2
  Vulkan GPU - mobilenet
  CPU - FastestDet
  CPU - vision_transformer
  CPU - regnety_400m
  CPU - squeezenet_ssd
  CPU - yolov4-tiny
  CPU - mobilenetv2-yolov3
  CPU - resnet50
  CPU - alexnet
  CPU - resnet18
  CPU - vgg16
  CPU - googlenet
  CPU - blazeface
  CPU - efficientnet-b0
  CPU - mnasnet
  CPU - shufflenet-v2
  CPU - mobilenet-v3
  CPU - mobilenet-v2
  CPU - mobilenet
Whisperfile
OpenCV
oneDNN
Llamafile
OpenCV
Llamafile
OpenCV
x265
VKMark:
  1920 x 1080 - Mailbox
  1280 x 1024 - Mailbox
  1024 x 768 - Mailbox
  800 x 600 - Mailbox
Timed Apache Compilation
ONNX Runtime:
  ResNet101_DUC_HDC-12 - CPU - Parallel:
    Inference Time Cost (ms)
    Inferences Per Second
R Benchmark
ONNX Runtime:
  fcn-resnet101-11 - CPU - Parallel:
    Inference Time Cost (ms)
    Inferences Per Second
  yolov4 - CPU - Parallel:
    Inference Time Cost (ms)
    Inferences Per Second
  ZFNet-512 - CPU - Parallel:
    Inference Time Cost (ms)
    Inferences Per Second
  T5 Encoder - CPU - Parallel:
    Inference Time Cost (ms)
    Inferences Per Second
TensorFlow Lite:
  Inception V4
  Inception ResNet V2
OpenCV
TensorFlow Lite:
  NASNet Mobile
  Mobilenet Float
ONNX Runtime:
  CaffeNet 12-int8 - CPU - Parallel:
    Inference Time Cost (ms)
    Inferences Per Second
TensorFlow Lite:
  SqueezeNet
  Mobilenet Quant
ONNX Runtime:
  ResNet50 v1-12-int8 - CPU - Parallel:
    Inference Time Cost (ms)
    Inferences Per Second
  super-resolution-10 - CPU - Parallel:
    Inference Time Cost (ms)
    Inferences Per Second
x264
oneDNN:
  Deconvolution Batch shapes_1d - CPU
  IP Shapes 1D - CPU
  IP Shapes 3D - CPU
  Convolution Batch Shapes Auto - CPU
Llamafile:
  TinyLlama-1.1B-Chat-v1.0.BF16 - CPU
  TinyLlama-1.1B-Chat-v1.0.BF16 - GPU AUTO
oneDNN
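Phoronix Test Suite System Monitoring:
  System Temperature Monitor (Celsius)
  System Fan Speed Monitor (RPM)
  Memory Usage Monitor (Megabytes)
  CPU Usage (Summary) Monitor (Percent)
  CPU Temperature Monitor (Celsius)
  CPU Peak Freq (Highest CPU Core Frequency) Monitor (Megahertz)
  CPU Fan Speed Monitor (RPM)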