installres1

AMD Ryzen 5 5600X 6-Core testing with an ASRock X570 Phantom Gaming-ITX/TB3 (P3.00 BIOS) and NVIDIA GeForce RTX 3090 24GB on Ubuntu 18.04 via the Phoronix Test Suite.

Compare your own system(s) to this result file with the Phoronix Test Suite by running the command: phoronix-test-suite benchmark 2103310-HA-INSTALLRE08

Result Identifier: NVIDIA GeForce RTX 3090
Date Run: March 29 2021
Test Duration: 6 Hours, 54 Minutes


installres1 system details (NVIDIA GeForce RTX 3090):
  Processor: AMD Ryzen 5 5600X 6-Core @ 3.70GHz (6 Cores / 12 Threads)
  Motherboard: ASRock X570 Phantom Gaming-ITX/TB3 (P3.00 BIOS)
  Chipset: AMD Device 1480
  Memory: 64GB
  Disk: 2000GB Samsung SSD 970 EVO Plus 2TB + 4001GB Samsung SSD 870 + ProductCode
  Graphics: NVIDIA GeForce RTX 3090 24GB
  Audio: NVIDIA Device 1aef
  Monitor: marantz-AVR
  Network: Intel I211 + Intel Device 2723
  OS: Ubuntu 18.04
  Kernel: 5.4.0-70-generic (x86_64)
  Desktop: GNOME Shell 3.28.4
  Display Server: X Server 1.20.8
  Display Driver: NVIDIA 460.32.03
  OpenGL: 4.6.0
  OpenCL: OpenCL 1.2 CUDA 11.2.109
  Vulkan: 1.2.155
  Compiler: GCC 7.5.0 + CUDA 11.2
  File-System: ext4
  Screen Resolution: 1920x1080

System Logs:
  - Transparent Huge Pages: madvise
  - --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --enable-libmpx --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib --with-tune=generic --without-cuda-driver -v
  - Scaling Governor: acpi-cpufreq ondemand (Boost: Enabled) - CPU Microcode: 0xa201009
  - GPU Compute Cores: 10496
  - Python 3.8.5
  - itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Full AMD retpoline IBPB: conditional IBRS_FW STIBP: always-on RSB filling + srbds: Not affected + tsx_async_abort: Not affected

installres1 results overview (NVIDIA GeForce RTX 3090):
plaidml:
  No - Inference - VGG16 - CPU: 10.99
  No - Inference - ResNet 50 - CPU: 9.40
shoc:
  OpenCL - Triad: 12.8371
  OpenCL - Reduction: 383.062
  OpenCL - Bus Speed Download: 13.0704
  OpenCL - Bus Speed Readback: 13.1776
  OpenCL - Texture Read Bandwidth: 2156.05
  OpenCL - S3D: 428.140
  OpenCL - FFT SP: 2346.70
  OpenCL - GEMM SGEMM_N: 8373.79
  OpenCL - Max SP Flops: 39360.4
  OpenCL - MD5 Hash: 42.9143
numpy: 466.33
ai-benchmark:
  Device Inference Score: 1111
  Device Training Score: 1163
  Device AI Score: 2274
tensorflow-lite:
  SqueezeNet: 234227
  Inception V4: 3457463
  NASNet Mobile: 192684
  Mobilenet Float: 159375
  Mobilenet Quant: 174205
  Inception ResNet V2: 3121703
onednn:
  IP Shapes 1D - f32 - CPU: 4.64288
  IP Shapes 3D - f32 - CPU: 7.09188
  IP Shapes 1D - u8s8f32 - CPU: 1.99499
  IP Shapes 3D - u8s8f32 - CPU: 1.47217
  Convolution Batch Shapes Auto - f32 - CPU: 13.5351
  Deconvolution Batch shapes_1d - f32 - CPU: 10.5388
  Deconvolution Batch shapes_3d - f32 - CPU: 8.33697
  Convolution Batch Shapes Auto - u8s8f32 - CPU: 12.8089
  Deconvolution Batch shapes_1d - u8s8f32 - CPU: 2.61408
  Deconvolution Batch shapes_3d - u8s8f32 - CPU: 4.37194
  Recurrent Neural Network Training - f32 - CPU: 4007.49
  Recurrent Neural Network Inference - f32 - CPU: 2107.13
  Recurrent Neural Network Training - u8s8f32 - CPU: 3999.33
  Recurrent Neural Network Inference - u8s8f32 - CPU: 2135.24
  Matrix Multiply Batch Shapes Transformer - f32 - CPU: 2.25636
  Recurrent Neural Network Training - bf16bf16bf16 - CPU: 4001.99
  Recurrent Neural Network Inference - bf16bf16bf16 - CPU: 2109.64
  Matrix Multiply Batch Shapes Transformer - u8s8f32 - CPU: 3.53795
mnn:
  SqueezeNetV1.0: 4.143
  resnet-v2-50: 23.676
  MobileNetV2_224: 2.099
  mobilenet-v1-1.0: 2.734
  inception-v3: 28.194
ncnn:
  CPU - mobilenet: 14.22
  CPU-v2-v2 - mobilenet-v2: 3.68
  CPU-v3-v3 - mobilenet-v3: 3.11
  CPU - shufflenet-v2: 4.40
  CPU - mnasnet: 3.51
  CPU - efficientnet-b0: 5.37
  CPU - blazeface: 1.40
  CPU - googlenet: 13.13
  CPU - vgg16: 55.98
  CPU - resnet18: 15.15
  CPU - alexnet: 13.48
  CPU - resnet50: 27.47
  CPU - yolov4-tiny: 22.15
  CPU - squeezenet_ssd: 16.70
  CPU - regnety_400m: 9.73
  Vulkan GPU - mobilenet: 14.18
  Vulkan GPU-v2-v2 - mobilenet-v2: 3.68
  Vulkan GPU-v3-v3 - mobilenet-v3: 3.18
  Vulkan GPU - shufflenet-v2: 4.31
  Vulkan GPU - mnasnet: 3.43
  Vulkan GPU - efficientnet-b0: 5.09
  Vulkan GPU - blazeface: 1.40
  Vulkan GPU - googlenet: 12.97
  Vulkan GPU - vgg16: 55.85
  Vulkan GPU - resnet18: 15.11
  Vulkan GPU - alexnet: 13.42
  Vulkan GPU - resnet50: 27.56
  Vulkan GPU - yolov4-tiny: 22.63
  Vulkan GPU - squeezenet_ssd: 16.93
  Vulkan GPU - regnety_400m: 9.62
tnn:
  CPU - MobileNet v2: 213.475
  CPU - SqueezeNet v1.1: 214.405
deepspeech:
  CPU: 63.92980
rnnoise: 15.974
ecp-candle:
  P1B2: 31.596
  P3B1: 896.151
  P3B2: 453.131
numenta-nab:
  EXPoSE: 319.329
  Relative Entropy: 19.078
  Windowed Gaussian: 13.084
  Earthgecko Skyline: 141.338
  Bayesian Changepoint: 43.282
mlpack:
  scikit_ica: 33.52
  scikit_qda: 120.45
  scikit_svm: 17.81
  scikit_linearridgeregression: 2.43
scikit-learn: 8.735

PlaidML

This test profile uses the PlaidML deep learning framework, developed by Intel, to run various benchmarks. Learn more via the OpenBenchmarking.org test page.

PlaidML results, NVIDIA GeForce RTX 3090 (FPS, More Is Better):
  FP16: No - Mode: Inference - Network: VGG16 - Device: CPU - Result: 10.99 (SE +/- 0.17, N = 3)
  FP16: No - Mode: Inference - Network: ResNet 50 - Device: CPU - Result: 9.40 (SE +/- 0.03, N = 3)
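The "SE +/-" figure attached to each result is read here as the standard error over the N runs. As a rough reading aid, the sketch below expresses it as a percentage of the reported mean for the two PlaidML results. The helper name `relative_se_percent` is an illustrative assumption and not part of the Phoronix Test Suite.

```python
def relative_se_percent(mean: float, se: float) -> float:
    """Return the standard error as a percentage of the mean."""
    if mean == 0:
        raise ValueError("mean must be non-zero")
    return 100.0 * se / abs(mean)


# (mean FPS, reported SE) copied from the PlaidML results in this file.
plaidml_results = {
    "VGG16": (10.99, 0.17),
    "ResNet 50": (9.40, 0.03),
}

for network, (mean, se) in plaidml_results.items():
    print(f"{network}: {relative_se_percent(mean, se):.2f}% relative SE")
```

A small percentage suggests the runs were closely grouped; it says nothing about how the result compares to other systems.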

SHOC Scalable HeterOgeneous Computing

The CUDA and OpenCL versions of Vetter's Scalable HeterOgeneous Computing benchmark suite. SHOC provides a number of different benchmark programs for evaluating the performance and stability of compute devices. Learn more via the OpenBenchmarking.org test page.

SHOC Scalable HeterOgeneous Computing 2020-04-17 results, NVIDIA GeForce RTX 3090 (More Is Better):
  Target: OpenCL - Benchmark: Triad - Result: 12.84 GB/s (SE +/- 0.00, N = 3)
  Target: OpenCL - Benchmark: Reduction - Result: 383.06 GB/s (SE +/- 0.34, N = 3)
  Target: OpenCL - Benchmark: Bus Speed Download - Result: 13.07 GB/s (SE +/- 0.00, N = 3)
  Target: OpenCL - Benchmark: Bus Speed Readback - Result: 13.18 GB/s (SE +/- 0.00, N = 3)
  Target: OpenCL - Benchmark: Texture Read Bandwidth - Result: 2156.05 GB/s (SE +/- 0.75, N = 3)
  Target: OpenCL - Benchmark: S3D - Result: 428.14 GFLOPS (SE +/- 0.05, N = 3)
  Target: OpenCL - Benchmark: FFT SP - Result: 2346.70 GFLOPS (SE +/- 0.45, N = 3)
  Target: OpenCL - Benchmark: GEMM SGEMM_N - Result: 8373.79 GFLOPS (SE +/- 70.49, N = 3)
  Target: OpenCL - Benchmark: Max SP Flops - Result: 39360.4 GFLOPS (SE +/- 160.97, N = 3)
  Target: OpenCL - Benchmark: MD5 Hash - Result: 42.91 GHash/s (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

Numpy Benchmark

This test measures general NumPy performance. Learn more via the OpenBenchmarking.org test page.

Numpy Benchmark result, NVIDIA GeForce RTX 3090 (Score, More Is Better): 466.33 (SE +/- 0.17, N = 3)

AI Benchmark Alpha

AI Benchmark Alpha is a Python library for evaluating artificial intelligence (AI) performance on diverse hardware platforms and relies upon the TensorFlow machine learning library. Learn more via the OpenBenchmarking.org test page.

AI Benchmark Alpha 0.1.2 results, NVIDIA GeForce RTX 3090 (Score, More Is Better):
  Device Inference Score: 1111
  Device Training Score: 1163
  Device AI Score: 2274
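In this result file the Device AI Score matches the sum of the Device Inference Score and the Device Training Score. The short check below only verifies that arithmetic for the three numbers reported here; whether the benchmark always defines the combined score this way is an assumption that this page does not confirm.

```python
# Scores copied from the AI Benchmark Alpha 0.1.2 results in this file.
inference_score = 1111
training_score = 1163
ai_score = 2274

# Assumption being checked: combined score = inference + training.
combined = inference_score + training_score
print(f"inference + training = {combined}, reported AI score = {ai_score}")
print("matches:", combined == ai_score)
```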

TensorFlow Lite

This is a benchmark of the TensorFlow Lite implementation. The current Linux support is limited to running on CPUs. This test profile measures the average inference time. Learn more via the OpenBenchmarking.org test page.

TensorFlow Lite 2020-08-23 results, NVIDIA GeForce RTX 3090 (Microseconds, Fewer Is Better):
  Model: SqueezeNet - Result: 234227 (SE +/- 102.89, N = 3)
  Model: Inception V4 - Result: 3457463 (SE +/- 684.89, N = 3)
  Model: NASNet Mobile - Result: 192684 (SE +/- 231.45, N = 3)
  Model: Mobilenet Float - Result: 159375 (SE +/- 31.27, N = 3)
  Model: Mobilenet Quant - Result: 174205 (SE +/- 78.39, N = 3)
  Model: Inception ResNet V2 - Result: 3121703 (SE +/- 624.96, N = 3)
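The TensorFlow Lite results are reported in microseconds, which makes the larger models hard to compare at a glance. The sketch below converts the reported averages to milliseconds and to an approximate inferences-per-second rate. It assumes each reported value is the average time of one inference, as the test description states; the helper names are illustrative.

```python
US_PER_MS = 1_000
US_PER_S = 1_000_000

# Average inference times in microseconds, copied from the results in this file.
tflite_us = {
    "SqueezeNet": 234227,
    "Inception V4": 3457463,
    "NASNet Mobile": 192684,
    "Mobilenet Float": 159375,
    "Mobilenet Quant": 174205,
    "Inception ResNet V2": 3121703,
}


def to_ms(microseconds: float) -> float:
    """Convert microseconds to milliseconds."""
    return microseconds / US_PER_MS


def inferences_per_second(microseconds: float) -> float:
    """Approximate throughput if one inference takes the given time."""
    return US_PER_S / microseconds


for model, us in tflite_us.items():
    print(f"{model}: {to_ms(us):.1f} ms, {inferences_per_second(us):.2f} inferences/s")
```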

oneDNN

This is a test of Intel oneDNN, an Intel-optimized library for deep neural networks, using its built-in benchdnn functionality. The result is the total perf time reported. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and MKL-DNN before being rebranded as part of the Intel oneAPI initiative. Learn more via the OpenBenchmarking.org test page.

oneDNN 2.1.2 results, NVIDIA GeForce RTX 3090 (ms, Fewer Is Better):
  Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU - Result: 4.64288 (SE +/- 0.01355, N = 3; MIN: 4.39)
  Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU - Result: 7.09188 (SE +/- 0.02249, N = 3; MIN: 6.94)
  Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPU - Result: 1.99499 (SE +/- 0.01040, N = 3; MIN: 1.81)
  Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU - Result: 1.47217 (SE +/- 0.00109, N = 3; MIN: 1.41)
  Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU - Result: 13.54 (SE +/- 0.02, N = 3; MIN: 13.26)
  Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU - Result: 10.54 (SE +/- 0.02, N = 3; MIN: 6.44)
  Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU - Result: 8.33697 (SE +/- 0.02918, N = 3; MIN: 8.01)
  Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU - Result: 12.81 (SE +/- 0.20, N = 4; MIN: 12.25)
  Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU - Result: 2.61408 (SE +/- 0.00540, N = 3; MIN: 2.41)
  Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU - Result: 4.37194 (SE +/- 0.00941, N = 3; MIN: 4.17)
  Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU - Result: 4007.49 (SE +/- 3.18, N = 3; MIN: 3971.88)
  Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU - Result: 2107.13 (SE +/- 1.73, N = 3; MIN: 2086.2)
  Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU - Result: 3999.33 (SE +/- 5.89, N = 3; MIN: 3962.35)
  Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU - Result: 2135.24 (SE +/- 27.02, N = 3; MIN: 2083.3)
  Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU - Result: 2.25636 (SE +/- 0.00794, N = 3; MIN: 2.17)
  Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU - Result: 4001.99 (SE +/- 3.64, N = 3; MIN: 3964.95)
  Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU - Result: 2109.64 (SE +/- 2.08, N = 3; MIN: 2084.37)
  Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32 - Engine: CPU - Result: 3.53795 (SE +/- 0.00554, N = 3; MIN: 3.28)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

Mobile Neural Network

MNN (Mobile Neural Network) is a highly efficient, lightweight deep learning framework developed by Alibaba. Learn more via the OpenBenchmarking.org test page.

Mobile Neural Network 1.1.3 results, NVIDIA GeForce RTX 3090 (ms, Fewer Is Better):
  Model: SqueezeNetV1.0 - Result: 4.143 (SE +/- 0.044, N = 3; MIN: 3.94 / MAX: 60.31)
  Model: resnet-v2-50 - Result: 23.68 (SE +/- 0.06, N = 3; MIN: 23.08 / MAX: 54.3)
  Model: MobileNetV2_224 - Result: 2.099 (SE +/- 0.002, N = 3; MIN: 2.05 / MAX: 6.64)
  Model: mobilenet-v1-1.0 - Result: 2.734 (SE +/- 0.007, N = 3; MIN: 2.67 / MAX: 7.63)
  Model: inception-v3 - Result: 28.19 (SE +/- 0.05, N = 3; MIN: 27.64 / MAX: 58.71)
1. (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -rdynamic -pthread -ldl

NCNN

NCNN is a high-performance neural network inference framework, developed by Tencent and optimized for mobile and other platforms. Learn more via the OpenBenchmarking.org test page.

NCNN 20201218 results, NVIDIA GeForce RTX 3090 (ms, Fewer Is Better):
  Target: CPU - Model: mobilenet - Result: 14.22 (SE +/- 0.22, N = 4; MIN: 13.84 / MAX: 23.1)
  Target: CPU-v2-v2 - Model: mobilenet-v2 - Result: 3.68 (SE +/- 0.01, N = 4; MIN: 3.58 / MAX: 8.31)
  Target: CPU-v3-v3 - Model: mobilenet-v3 - Result: 3.11 (SE +/- 0.01, N = 4; MIN: 3.06 / MAX: 8.02)
  Target: CPU - Model: shufflenet-v2 - Result: 4.40 (SE +/- 0.04, N = 4; MIN: 4.29 / MAX: 29.98)
  Target: CPU - Model: mnasnet - Result: 3.51 (SE +/- 0.04, N = 4; MIN: 3.32 / MAX: 22.87)
  Target: CPU - Model: efficientnet-b0 - Result: 5.37 (SE +/- 0.11, N = 4; MIN: 5 / MAX: 19.55)
  Target: CPU - Model: blazeface - Result: 1.40 (SE +/- 0.03, N = 4; MIN: 1.33 / MAX: 5.8)
  Target: CPU - Model: googlenet - Result: 13.13 (SE +/- 0.01, N = 4; MIN: 12.48 / MAX: 23.04)
  Target: CPU - Model: vgg16 - Result: 55.98 (SE +/- 0.16, N = 4; MIN: 54.46 / MAX: 67.13)
  Target: CPU - Model: resnet18 - Result: 15.15 (SE +/- 0.04, N = 4; MIN: 14.72 / MAX: 25.09)
  Target: CPU - Model: alexnet - Result: 13.48 (SE +/- 0.11, N = 4; MIN: 12.81 / MAX: 24.99)
  Target: CPU - Model: resnet50 - Result: 27.47 (SE +/- 0.10, N = 4; MIN: 26.81 / MAX: 52.64)
  Target: CPU - Model: yolov4-tiny - Result: 22.15 (SE +/- 0.16, N = 4; MIN: 21.24 / MAX: 31.44)
  Target: CPU - Model: squeezenet_ssd - Result: 16.70 (SE +/- 0.02, N = 4; MIN: 16.4 / MAX: 35.06)
  Target: CPU - Model: regnety_400m - Result: 9.73 (SE +/- 0.08, N = 4; MIN: 9.45 / MAX: 34.2)
  Target: Vulkan GPU - Model: mobilenet - Result: 14.18 (SE +/- 0.08, N = 3; MIN: 13.89 / MAX: 18.64)
  Target: Vulkan GPU-v2-v2 - Model: mobilenet-v2 - Result: 3.68 (SE +/- 0.01, N = 3; MIN: 3.6 / MAX: 8.6)
  Target: Vulkan GPU-v3-v3 - Model: mobilenet-v3 - Result: 3.18 (SE +/- 0.04, N = 3; MIN: 3.03 / MAX: 20.19)
  Target: Vulkan GPU - Model: shufflenet-v2 - Result: 4.31 (SE +/- 0.07, N = 3; MIN: 4.11 / MAX: 22.87)
  Target: Vulkan GPU - Model: mnasnet - Result: 3.43 (SE +/- 0.07, N = 3; MIN: 3.3 / MAX: 29.72)
  Target: Vulkan GPU - Model: efficientnet-b0 - Result: 5.09 (SE +/- 0.09, N = 3; MIN: 4.94 / MAX: 9.41)
  Target: Vulkan GPU - Model: blazeface - Result: 1.40 (SE +/- 0.05, N = 3; MIN: 1.32 / MAX: 3.76)
  Target: Vulkan GPU - Model: googlenet - Result: 12.97 (SE +/- 0.06, N = 3; MIN: 12.29 / MAX: 24.79)
  Target: Vulkan GPU - Model: vgg16 - Result: 55.85 (SE +/- 0.32, N = 3; MIN: 54.73 / MAX: 68)
  Target: Vulkan GPU - Model: resnet18 - Result: 15.11 (SE +/- 0.02, N = 3; MIN: 14.69 / MAX: 24.68)
  Target: Vulkan GPU - Model: alexnet - Result: 13.42 (SE +/- 0.18, N = 3; MIN: 12.87 / MAX: 23.34)
  Target: Vulkan GPU - Model: resnet50 - Result: 27.56 (SE +/- 0.04, N = 3; MIN: 26.94 / MAX: 58.65)
  Target: Vulkan GPU - Model: yolov4-tiny - Result: 22.63 (SE +/- 0.02, N = 3; MIN: 21.64 / MAX: 47.8)
  Target: Vulkan GPU - Model: squeezenet_ssd - Result: 16.93 (SE +/- 0.07, N = 3; MIN: 16.52 / MAX: 24.65)
  Target: Vulkan GPU - Model: regnety_400m - Result: 9.62 (SE +/- 0.04, N = 3; MIN: 9.41 / MAX: 14.08)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
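The NCNN results list each model twice, once with the CPU target and once with the Vulkan GPU target. A quick way to compare the two is the percentage change of the Vulkan GPU time relative to the CPU time, sketched below for four of the models. The helper name `percent_change` is illustrative, and the sketch makes no claim about why the two targets come out so close in this run.

```python
# (CPU ms, Vulkan GPU ms) pairs copied from the NCNN results in this file.
ncnn_ms = {
    "mobilenet": (14.22, 14.18),
    "vgg16": (55.98, 55.85),
    "resnet50": (27.47, 27.56),
    "yolov4-tiny": (22.15, 22.63),
}


def percent_change(baseline: float, other: float) -> float:
    """Percentage change of `other` relative to `baseline`.

    Negative means `other` is lower (faster, since fewer ms is better).
    """
    return 100.0 * (other - baseline) / baseline


for model, (cpu_ms, vulkan_ms) in ncnn_ms.items():
    change = percent_change(cpu_ms, vulkan_ms)
    print(f"{model}: Vulkan GPU vs CPU = {change:+.2f}%")
```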

TNN

TNN is an open-source deep learning inference framework developed by Tencent. Learn more via the OpenBenchmarking.org test page.

TNN 0.2.3 results, NVIDIA GeForce RTX 3090 (ms, Fewer Is Better):
  Target: CPU - Model: MobileNet v2 - Result: 213.48 (SE +/- 0.11, N = 3; MIN: 212.57 / MAX: 222.67)
  Target: CPU - Model: SqueezeNet v1.1 - Result: 214.41 (SE +/- 0.06, N = 3; MIN: 214.21 / MAX: 217.45)
1. (CXX) g++ options: -fopenmp -pthread -fvisibility=hidden -O3 -rdynamic -ldl

DeepSpeech

Mozilla DeepSpeech is a speech-to-text engine powered by TensorFlow for machine learning and derived from Baidu's Deep Speech research paper. This test profile times the speech-to-text process for a roughly three-minute audio recording. Learn more via the OpenBenchmarking.org test page.

DeepSpeech 0.6 result, NVIDIA GeForce RTX 3090 (Seconds, Fewer Is Better):
  Acceleration: CPU - Result: 63.93 (SE +/- 0.22, N = 3)

RNNoise

RNNoise is a recurrent neural network for audio noise reduction developed by Mozilla and Xiph.Org. This single-threaded test profile measures the time to denoise a sample 26-minute, 16-bit RAW audio file using this recurrent neural network noise suppression library. Learn more via the OpenBenchmarking.org test page.

RNNoise 2020-06-28 result, NVIDIA GeForce RTX 3090 (Seconds, Fewer Is Better): 15.97 (SE +/- 0.02, N = 3)
1. (CC) gcc options: -O2 -pedantic -fvisibility=hidden

ECP-CANDLE

The CANDLE benchmark codes implement deep learning architectures relevant to problems in cancer. These architectures address problems at different biological scales, specifically problems at the molecular, cellular and population scales. Learn more via the OpenBenchmarking.org test page.

ECP-CANDLE 0.3 results, NVIDIA GeForce RTX 3090 (Seconds, Fewer Is Better):
  Benchmark: P1B2 - Result: 31.60
  Benchmark: P3B1 - Result: 896.15
  Benchmark: P3B2 - Result: 453.13

Numenta Anomaly Benchmark

Numenta Anomaly Benchmark (NAB) is a benchmark for evaluating algorithms for anomaly detection in streaming, real-time applications. It comprises over 50 labeled real-world and artificial time-series data files plus a novel scoring mechanism designed for real-time applications. This test profile currently measures the time to run various detectors. Learn more via the OpenBenchmarking.org test page.

Numenta Anomaly Benchmark 1.1 results, NVIDIA GeForce RTX 3090 (Seconds, Fewer Is Better):
  Detector: EXPoSE - Result: 319.33 (SE +/- 0.35, N = 3)
  Detector: Relative Entropy - Result: 19.08 (SE +/- 0.26, N = 3)
  Detector: Windowed Gaussian - Result: 13.08 (SE +/- 0.06, N = 3)
  Detector: Earthgecko Skyline - Result: 141.34 (SE +/- 0.95, N = 3)
  Detector: Bayesian Changepoint - Result: 43.28 (SE +/- 0.35, N = 3)

Mlpack Benchmark

Mlpack benchmark scripts for machine learning libraries. Learn more via the OpenBenchmarking.org test page.

Mlpack Benchmark results, NVIDIA GeForce RTX 3090 (Seconds, Fewer Is Better):
  Benchmark: scikit_ica - Result: 33.52 (SE +/- 0.02, N = 3)
  Benchmark: scikit_qda - Result: 120.45 (SE +/- 0.77, N = 3)
  Benchmark: scikit_svm - Result: 17.81 (SE +/- 0.07, N = 3)
  Benchmark: scikit_linearridgeregression - Result: 2.43 (SE +/- 0.01, N = 3)

Scikit-Learn

Scikit-learn is a Python module for machine learning. Learn more via the OpenBenchmarking.org test page.

Scikit-Learn 0.22.1 result, NVIDIA GeForce RTX 3090 (Seconds, Fewer Is Better): 8.735 (SE +/- 0.085, N = 3)

92 Results Shown

PlaidML:
  No - Inference - VGG16 - CPU
  No - Inference - ResNet 50 - CPU
SHOC Scalable HeterOgeneous Computing:
  OpenCL - Triad
  OpenCL - Reduction
  OpenCL - Bus Speed Download
  OpenCL - Bus Speed Readback
  OpenCL - Texture Read Bandwidth
  OpenCL - S3D
  OpenCL - FFT SP
  OpenCL - GEMM SGEMM_N
  OpenCL - Max SP Flops
  OpenCL - MD5 Hash
Numpy Benchmark
AI Benchmark Alpha:
  Device Inference Score
  Device Training Score
  Device AI Score
TensorFlow Lite:
  SqueezeNet
  Inception V4
  NASNet Mobile
  Mobilenet Float
  Mobilenet Quant
  Inception ResNet V2
oneDNN:
  IP Shapes 1D - f32 - CPU
  IP Shapes 3D - f32 - CPU
  IP Shapes 1D - u8s8f32 - CPU
  IP Shapes 3D - u8s8f32 - CPU
  Convolution Batch Shapes Auto - f32 - CPU
  Deconvolution Batch shapes_1d - f32 - CPU
  Deconvolution Batch shapes_3d - f32 - CPU
  Convolution Batch Shapes Auto - u8s8f32 - CPU
  Deconvolution Batch shapes_1d - u8s8f32 - CPU
  Deconvolution Batch shapes_3d - u8s8f32 - CPU
  Recurrent Neural Network Training - f32 - CPU
  Recurrent Neural Network Inference - f32 - CPU
  Recurrent Neural Network Training - u8s8f32 - CPU
  Recurrent Neural Network Inference - u8s8f32 - CPU
  Matrix Multiply Batch Shapes Transformer - f32 - CPU
  Recurrent Neural Network Training - bf16bf16bf16 - CPU
  Recurrent Neural Network Inference - bf16bf16bf16 - CPU
  Matrix Multiply Batch Shapes Transformer - u8s8f32 - CPU
Mobile Neural Network:
  SqueezeNetV1.0
  resnet-v2-50
  MobileNetV2_224
  mobilenet-v1-1.0
  inception-v3
NCNN:
  CPU - mobilenet
  CPU-v2-v2 - mobilenet-v2
  CPU-v3-v3 - mobilenet-v3
  CPU - shufflenet-v2
  CPU - mnasnet
  CPU - efficientnet-b0
  CPU - blazeface
  CPU - googlenet
  CPU - vgg16
  CPU - resnet18
  CPU - alexnet
  CPU - resnet50
  CPU - yolov4-tiny
  CPU - squeezenet_ssd
  CPU - regnety_400m
  Vulkan GPU - mobilenet
  Vulkan GPU-v2-v2 - mobilenet-v2
  Vulkan GPU-v3-v3 - mobilenet-v3
  Vulkan GPU - shufflenet-v2
  Vulkan GPU - mnasnet
  Vulkan GPU - efficientnet-b0
  Vulkan GPU - blazeface
  Vulkan GPU - googlenet
  Vulkan GPU - vgg16
  Vulkan GPU - resnet18
  Vulkan GPU - alexnet
  Vulkan GPU - resnet50
  Vulkan GPU - yolov4-tiny
  Vulkan GPU - squeezenet_ssd
  Vulkan GPU - regnety_400m
TNN:
  CPU - MobileNet v2
  CPU - SqueezeNet v1.1
DeepSpeech
RNNoise
ECP-CANDLE:
  P1B2
  P3B1
  P3B2
Numenta Anomaly Benchmark:
  EXPoSE
  Relative Entropy
  Windowed Gaussian
  Earthgecko Skyline
  Bayesian Changepoint
Mlpack Benchmark:
  scikit_ica
  scikit_qda
  scikit_svm
  scikit_linearridgeregression
Scikit-Learn