sys76-kudu-ml-nvidia
AMD Ryzen 9 5900HX testing with a System76 Kudu (1.07.09RSA1 BIOS) and NVIDIA GeForce RTX 3060 Laptop GPU 6GB on Pop 21.10 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2202175-NE-SYS76KUDU28&grr .
sys76-kudu-ml-nvidia
NVIDIA GeForce RTX 3060 Laptop GPU
Processor: AMD Ryzen 9 5900HX @ 3.30GHz (8 Cores / 16 Threads)
Motherboard: System76 Kudu (1.07.09RSA1 BIOS)
Chipset: AMD Renoir/Cezanne
Memory: 16GB
Disk: Samsung SSD 970 EVO Plus 500GB
Graphics: NVIDIA GeForce RTX 3060 Laptop GPU 6GB
Audio: NVIDIA Device 228e
Network: Realtek RTL8125 2.5GbE + Intel Wi-Fi 6 AX200
OS: Pop 21.10
Kernel: 5.15.15-76051515-generic (x86_64)
Desktop: GNOME Shell 40.5
Display Server: X Server 1.20.13
Display Driver: NVIDIA 470.86
OpenGL: 4.6.0
OpenCL: OpenCL 3.0 CUDA 11.4.158
Vulkan: 1.2.182
Compiler: GCC 11.2.0
File-System: ext4
Screen Resolution: 1920x1080
- Transparent Huge Pages: madvise
- --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-11-ZPT0kp/gcc-11-11.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-ZPT0kp/gcc-11-11.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
- Scaling Governor: acpi-cpufreq schedutil (Boost: Enabled)
- CPU Microcode: 0xa50000c
- GLAMOR
- BAR1 / Visible vRAM Size: 8192 MiB
- GPU Compute Cores: 3840
- Python 3.9.7
- itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Full AMD retpoline IBPB: conditional IBRS_FW STIBP: always-on RSB filling + srbds: Not affected + tsx_async_abort: Not affected
sys76-kudu-ml-nvidia
Test | NVIDIA GeForce RTX 3060 Laptop GPU
shoc: OpenCL - Max SP Flops | 15066.7
caffe: GoogleNet - CPU - 1000 | 866136
plaidml: No - Inference - VGG16 - CPU | 13.26
ecp-candle: P3B1 | 1473.428
lczero: BLAS | 565
caffe: AlexNet - CPU - 1000 | 325342
plaidml: No - Inference - ResNet 50 - CPU | 7.00
ecp-candle: P3B2 | 716.514
tnn: CPU - DenseNet | 2721.407
numpy: | 432.86
caffe: GoogleNet - CPU - 200 | 174373
opencv: DNN - Deep Neural Network | 28019
tensorflow-lite: Inception V4 | 2758593
tensorflow-lite: Inception ResNet V2 | 2486377
mlpack: scikit_qda | 66.04
caffe: GoogleNet - CPU - 100 | 86800
mnn: inception-v3 | 31.923
mnn: mobilenet-v1-1.0 | 2.411
mnn: MobileNetV2_224 | 2.385
mnn: SqueezeNetV1.0 | 4.490
mnn: resnet-v2-50 | 22.606
mnn: squeezenetv1.1 | 2.773
mnn: mobilenetV3 | 1.199
onednn: Recurrent Neural Network Training - f32 - CPU | 3570.14
onednn: Recurrent Neural Network Training - u8s8f32 - CPU | 3566.36
onednn: Recurrent Neural Network Training - bf16bf16bf16 - CPU | 3558.73
onednn: Recurrent Neural Network Inference - bf16bf16bf16 - CPU | 2164.42
onednn: Recurrent Neural Network Inference - u8s8f32 - CPU | 2164.93
onednn: Recurrent Neural Network Inference - f32 - CPU | 2135.56
ncnn: CPU - regnety_400m | 6.85
ncnn: CPU - squeezenet_ssd | 18.56
ncnn: CPU - yolov4-tiny | 25.11
ncnn: CPU - resnet50 | 25.27
ncnn: CPU - alexnet | 14.63
ncnn: CPU - resnet18 | 15.92
ncnn: CPU - vgg16 | 70.90
ncnn: CPU - googlenet | 13.44
ncnn: CPU - blazeface | 1.19
ncnn: CPU - efficientnet-b0 | 5.26
ncnn: CPU - mnasnet | 3.24
ncnn: CPU - shufflenet-v2 | 2.74
ncnn: CPU-v3-v3 - mobilenet-v3 | 3.48
ncnn: CPU-v2-v2 - mobilenet-v2 | 4.00
ncnn: CPU - mobilenet | 15.69
ncnn: Vulkan GPU - regnety_400m | 5.79
ncnn: Vulkan GPU - squeezenet_ssd | 19.63
ncnn: Vulkan GPU - yolov4-tiny | 19.01
ncnn: Vulkan GPU - resnet50 | 13.36
ncnn: Vulkan GPU - alexnet | 6.73
ncnn: Vulkan GPU - resnet18 | 6.51
ncnn: Vulkan GPU - vgg16 | 44.99
ncnn: Vulkan GPU - googlenet | 9.01
ncnn: Vulkan GPU - blazeface | 1.34
ncnn: Vulkan GPU - efficientnet-b0 | 10.03
ncnn: Vulkan GPU - mnasnet | 3.96
ncnn: Vulkan GPU - shufflenet-v2 | 2.88
ncnn: Vulkan GPU-v3-v3 - mobilenet-v3 | 4.66
ncnn: Vulkan GPU-v2-v2 - mobilenet-v2 | 4.02
ncnn: Vulkan GPU - mobilenet | 10.6
caffe: AlexNet - CPU - 200 | 65872
tensorflow-lite: NASNet Mobile | 151814
tensorflow-lite: Mobilenet Quant | 141345
tensorflow-lite: SqueezeNet | 190603
tensorflow-lite: Mobilenet Float | 127871
mlpack: scikit_ica | 48.31
deepspeech: CPU | 68.95180
onednn: IP Shapes 3D - u8s8f32 - CPU | 2.47936
mlpack: scikit_linearridgeregression | 2.09
caffe: AlexNet - CPU - 100 | 33278
mlpack: scikit_svm | 17.59
onednn: Deconvolution Batch shapes_1d - f32 - CPU | 8.34154
onednn: Deconvolution Batch shapes_1d - u8s8f32 - CPU | 2.12066
rbenchmark: | 0.1261
onednn: IP Shapes 1D - f32 - CPU | 4.30842
tnn: CPU - MobileNet v2 | 250.129
rnnoise: | 16.110
tnn: CPU - SqueezeNet v1.1 | 222.651
onednn: IP Shapes 1D - u8s8f32 - CPU | 1.63127
shoc: OpenCL - Texture Read Bandwidth | 1309.85
onednn: Matrix Multiply Batch Shapes Transformer - f32 - CPU | 4.60966
onednn: Matrix Multiply Batch Shapes Transformer - u8s8f32 - CPU | 3.00110
ecp-candle: P1B2 | 35.184
onednn: IP Shapes 3D - f32 - CPU | 11.2735
onednn: Convolution Batch Shapes Auto - f32 - CPU | 22.5331
onednn: Convolution Batch Shapes Auto - u8s8f32 - CPU | 23.3045
shoc: OpenCL - GEMM SGEMM_N | 2765.33
tnn: CPU - SqueezeNet v2 | 54.863
shoc: OpenCL - Bus Speed Readback | 6.7648
shoc: OpenCL - S3D | 166.282
onednn: Deconvolution Batch shapes_3d - f32 - CPU | 6.71586
onednn: Deconvolution Batch shapes_3d - u8s8f32 - CPU | 3.18077
shoc: OpenCL - Bus Speed Download | 6.6926
shoc: OpenCL - Triad | 6.5530
shoc: OpenCL - FFT SP | 877.307
shoc: OpenCL - Reduction | 295.567
shoc: OpenCL - MD5 Hash | 16.2706
openvino: Face Detection 0106 FP16 - CPU |
SHOC Scalable HeterOgeneous Computing 2020-04-17, Target: OpenCL - Benchmark: Max SP Flops. GFLOPS, More Is Better. NVIDIA GeForce RTX 3060 Laptop GPU: 15066.7 (SE +/- 19.26, N = 3). (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi
Caffe 2020-02-13, Model: GoogleNet - Acceleration: CPU - Iterations: 1000. Milli-Seconds, Fewer Is Better. NVIDIA GeForce RTX 3060 Laptop GPU: 866136 (SE +/- 360.03, N = 3). (CXX) g++ options: -fPIC -O3 -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas
PlaidML, FP16: No - Mode: Inference - Network: VGG16 - Device: CPU. FPS, More Is Better. NVIDIA GeForce RTX 3060 Laptop GPU: 13.26 (SE +/- 0.11, N = 12)
ECP-CANDLE 0.4, Benchmark: P3B1. Seconds, Fewer Is Better. NVIDIA GeForce RTX 3060 Laptop GPU: 1473.43
LeelaChessZero 0.28, Backend: BLAS. Nodes Per Second, More Is Better. NVIDIA GeForce RTX 3060 Laptop GPU: 565 (SE +/- 6.66, N = 3). (CXX) g++ options: -flto -pthread
Caffe 2020-02-13, Model: AlexNet - Acceleration: CPU - Iterations: 1000. Milli-Seconds, Fewer Is Better. NVIDIA GeForce RTX 3060 Laptop GPU: 325342 (SE +/- 901.36, N = 3). (CXX) g++ options: -fPIC -O3 -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas
PlaidML, FP16: No - Mode: Inference - Network: ResNet 50 - Device: CPU. FPS, More Is Better. NVIDIA GeForce RTX 3060 Laptop GPU: 7.00 (SE +/- 0.02, N = 3)
ECP-CANDLE 0.4, Benchmark: P3B2. Seconds, Fewer Is Better. NVIDIA GeForce RTX 3060 Laptop GPU: 716.51
TNN 0.3, Target: CPU - Model: DenseNet. ms, Fewer Is Better. NVIDIA GeForce RTX 3060 Laptop GPU: 2721.41 (SE +/- 2.86, N = 3; MIN: 2675.12 / MAX: 2805.37). (CXX) g++ options: -fopenmp -pthread -fvisibility=hidden -fvisibility=default -O3 -rdynamic -ldl
Numpy Benchmark. Score, More Is Better. NVIDIA GeForce RTX 3060 Laptop GPU: 432.86 (SE +/- 0.83, N = 3)
Caffe 2020-02-13, Model: GoogleNet - Acceleration: CPU - Iterations: 200. Milli-Seconds, Fewer Is Better. NVIDIA GeForce RTX 3060 Laptop GPU: 174373 (SE +/- 519.84, N = 3). (CXX) g++ options: -fPIC -O3 -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas
OpenCV 4.5.4, Test: DNN - Deep Neural Network. ms, Fewer Is Better. NVIDIA GeForce RTX 3060 Laptop GPU: 28019 (SE +/- 568.38, N = 15). (CXX) g++ options: -fPIC -fsigned-char -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -msse -msse2 -msse3 -fvisibility=hidden -O3 -shared
TensorFlow Lite 2020-08-23, Model: Inception V4. Microseconds, Fewer Is Better. NVIDIA GeForce RTX 3060 Laptop GPU: 2758593 (SE +/- 636.98, N = 3)
TensorFlow Lite 2020-08-23, Model: Inception ResNet V2. Microseconds, Fewer Is Better. NVIDIA GeForce RTX 3060 Laptop GPU: 2486377 (SE +/- 1929.87, N = 3)
Mlpack Benchmark, Benchmark: scikit_qda. Seconds, Fewer Is Better. NVIDIA GeForce RTX 3060 Laptop GPU: 66.04 (SE +/- 0.17, N = 3)
Caffe 2020-02-13, Model: GoogleNet - Acceleration: CPU - Iterations: 100. Milli-Seconds, Fewer Is Better. NVIDIA GeForce RTX 3060 Laptop GPU: 86800 (SE +/- 83.58, N = 3). (CXX) g++ options: -fPIC -O3 -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas
Mobile Neural Network 1.2, Model: inception-v3. ms, Fewer Is Better. NVIDIA GeForce RTX 3060 Laptop GPU: 31.92 (SE +/- 0.30, N = 3; MIN: 29.4 / MAX: 49.5). (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -rdynamic -pthread -ldl
Mobile Neural Network 1.2, Model: mobilenet-v1-1.0. ms, Fewer Is Better. NVIDIA GeForce RTX 3060 Laptop GPU: 2.411 (SE +/- 0.039, N = 3; MIN: 2.17 / MAX: 19.27). (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -rdynamic -pthread -ldl
Mobile Neural Network 1.2, Model: MobileNetV2_224. ms, Fewer Is Better. NVIDIA GeForce RTX 3060 Laptop GPU: 2.385 (SE +/- 0.012, N = 3; MIN: 2.24 / MAX: 20.06). (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -rdynamic -pthread -ldl
Mobile Neural Network 1.2, Model: SqueezeNetV1.0. ms, Fewer Is Better. NVIDIA GeForce RTX 3060 Laptop GPU: 4.490 (SE +/- 0.077, N = 3; MIN: 4.31 / MAX: 10.28). (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -rdynamic -pthread -ldl
Mobile Neural Network 1.2, Model: resnet-v2-50. ms, Fewer Is Better. NVIDIA GeForce RTX 3060 Laptop GPU: 22.61 (SE +/- 0.23, N = 3; MIN: 21.44 / MAX: 48.44). (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -rdynamic -pthread -ldl
Mobile Neural Network 1.2, Model: squeezenetv1.1. ms, Fewer Is Better. NVIDIA GeForce RTX 3060 Laptop GPU: 2.773 (SE +/- 0.014, N = 3; MIN: 2.57 / MAX: 11.72). (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -rdynamic -pthread -ldl
Mobile Neural Network 1.2, Model: mobilenetV3. ms, Fewer Is Better. NVIDIA GeForce RTX 3060 Laptop GPU: 1.199 (SE +/- 0.004, N = 3; MIN: 1.14 / MAX: 9.8). (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -rdynamic -pthread -ldl
oneDNN 2.1.2, Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU. ms, Fewer Is Better. NVIDIA GeForce RTX 3060 Laptop GPU: 3570.14 (SE +/- 11.16, N = 3; MIN: 3520.53). (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
oneDNN 2.1.2, Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU. ms, Fewer Is Better. NVIDIA GeForce RTX 3060 Laptop GPU: 3566.36 (SE +/- 5.55, N = 3; MIN: 3515.62). (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
oneDNN 2.1.2, Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU. ms, Fewer Is Better. NVIDIA GeForce RTX 3060 Laptop GPU: 3558.73 (SE +/- 6.65, N = 3; MIN: 3513.19). (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
oneDNN 2.1.2, Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU. ms, Fewer Is Better. NVIDIA GeForce RTX 3060 Laptop GPU: 2164.42 (SE +/- 15.52, N = 3; MIN: 2113.31). (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
oneDNN 2.1.2, Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU. ms, Fewer Is Better. NVIDIA GeForce RTX 3060 Laptop GPU: 2164.93 (SE +/- 1.37, N = 3; MIN: 2138.25). (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
oneDNN 2.1.2, Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU. ms, Fewer Is Better. NVIDIA GeForce RTX 3060 Laptop GPU: 2135.56 (SE +/- 5.46, N = 3; MIN: 2104.01). (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
NCNN 20210720, Target: CPU - Model: regnety_400m. ms, Fewer Is Better. NVIDIA GeForce RTX 3060 Laptop GPU: 6.85 (SE +/- 0.02, N = 3; MIN: 6.35 / MAX: 16.05). (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20210720, Target: CPU - Model: squeezenet_ssd. ms, Fewer Is Better. NVIDIA GeForce RTX 3060 Laptop GPU: 18.56 (SE +/- 0.06, N = 3; MIN: 17.91 / MAX: 26.49). (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20210720, Target: CPU - Model: yolov4-tiny. ms, Fewer Is Better. NVIDIA GeForce RTX 3060 Laptop GPU: 25.11 (SE +/- 0.08, N = 3; MIN: 24.08 / MAX: 47.42). (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20210720, Target: CPU - Model: resnet50. ms, Fewer Is Better. NVIDIA GeForce RTX 3060 Laptop GPU: 25.27 (SE +/- 0.08, N = 3; MIN: 24.15 / MAX: 39.62). (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20210720, Target: CPU - Model: alexnet. ms, Fewer Is Better. NVIDIA GeForce RTX 3060 Laptop GPU: 14.63 (SE +/- 0.02, N = 3; MIN: 14.11 / MAX: 22.9). (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20210720, Target: CPU - Model: resnet18. ms, Fewer Is Better. NVIDIA GeForce RTX 3060 Laptop GPU: 15.92 (SE +/- 0.44, N = 3; MIN: 14.75 / MAX: 26.77). (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20210720, Target: CPU - Model: vgg16. ms, Fewer Is Better. NVIDIA GeForce RTX 3060 Laptop GPU: 70.90 (SE +/- 0.04, N = 3; MIN: 69.81 / MAX: 84.03). (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20210720, Target: CPU - Model: googlenet. ms, Fewer Is Better. NVIDIA GeForce RTX 3060 Laptop GPU: 13.44 (SE +/- 0.37, N = 3; MIN: 12.49 / MAX: 28.69). (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20210720, Target: CPU - Model: blazeface. ms, Fewer Is Better. NVIDIA GeForce RTX 3060 Laptop GPU: 1.19 (SE +/- 0.01, N = 3; MIN: 1.15 / MAX: 6.67). (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20210720, Target: CPU - Model: efficientnet-b0. ms, Fewer Is Better. NVIDIA GeForce RTX 3060 Laptop GPU: 5.26 (SE +/- 0.01, N = 3; MIN: 4.89 / MAX: 15.18). (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20210720, Target: CPU - Model: mnasnet. ms, Fewer Is Better. NVIDIA GeForce RTX 3060 Laptop GPU: 3.24 (SE +/- 0.01, N = 3; MIN: 2.91 / MAX: 8.64). (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20210720, Target: CPU - Model: shufflenet-v2. ms, Fewer Is Better. NVIDIA GeForce RTX 3060 Laptop GPU: 2.74 (SE +/- 0.02, N = 3; MIN: 2.46 / MAX: 14.43). (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20210720, Target: CPU-v3-v3 - Model: mobilenet-v3. ms, Fewer Is Better. NVIDIA GeForce RTX 3060 Laptop GPU: 3.48 (SE +/- 0.01, N = 3; MIN: 3.18 / MAX: 9.98). (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20210720, Target: CPU-v2-v2 - Model: mobilenet-v2. ms, Fewer Is Better. NVIDIA GeForce RTX 3060 Laptop GPU: 4.00 (SE +/- 0.01, N = 3; MIN: 3.73 / MAX: 9.6). (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20210720, Target: CPU - Model: mobilenet. ms, Fewer Is Better. NVIDIA GeForce RTX 3060 Laptop GPU: 15.69 (SE +/- 0.22, N = 3; MIN: 14.93 / MAX: 22). (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20210720, Target: Vulkan GPU - Model: regnety_400m. ms, Fewer Is Better. NVIDIA GeForce RTX 3060 Laptop GPU: 5.79 (SE +/- 0.14, N = 3; MIN: 4.79 / MAX: 9.38). (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20210720, Target: Vulkan GPU - Model: squeezenet_ssd. ms, Fewer Is Better. NVIDIA GeForce RTX 3060 Laptop GPU: 19.63 (SE +/- 4.25, N = 3; MIN: 13.92 / MAX: 38.56). (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20210720, Target: Vulkan GPU - Model: yolov4-tiny. ms, Fewer Is Better. NVIDIA GeForce RTX 3060 Laptop GPU: 19.01 (SE +/- 0.43, N = 3; MIN: 17.19 / MAX: 30.25). (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20210720, Target: Vulkan GPU - Model: resnet50. ms, Fewer Is Better. NVIDIA GeForce RTX 3060 Laptop GPU: 13.36 (SE +/- 0.05, N = 3; MIN: 12.46 / MAX: 19.03). (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20210720, Target: Vulkan GPU - Model: alexnet. ms, Fewer Is Better. NVIDIA GeForce RTX 3060 Laptop GPU: 6.73 (SE +/- 0.20, N = 3; MIN: 6.02 / MAX: 10.21). (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20210720, Target: Vulkan GPU - Model: resnet18. ms, Fewer Is Better. NVIDIA GeForce RTX 3060 Laptop GPU: 6.51 (SE +/- 0.21, N = 3; MIN: 5.75 / MAX: 13.73). (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20210720, Target: Vulkan GPU - Model: vgg16. ms, Fewer Is Better. NVIDIA GeForce RTX 3060 Laptop GPU: 44.99 (SE +/- 0.01, N = 3; MIN: 44.03 / MAX: 51.73). (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20210720, Target: Vulkan GPU - Model: googlenet. ms, Fewer Is Better. NVIDIA GeForce RTX 3060 Laptop GPU: 9.01 (SE +/- 0.01, N = 3; MIN: 8.1 / MAX: 13.74). (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20210720, Target: Vulkan GPU - Model: blazeface. ms, Fewer Is Better. NVIDIA GeForce RTX 3060 Laptop GPU: 1.34 (SE +/- 0.03, N = 3; MIN: 1.18 / MAX: 2.63). (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20210720, Target: Vulkan GPU - Model: efficientnet-b0. ms, Fewer Is Better. NVIDIA GeForce RTX 3060 Laptop GPU: 10.03 (SE +/- 0.10, N = 3; MIN: 9.08 / MAX: 16.52). (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20210720, Target: Vulkan GPU - Model: mnasnet. ms, Fewer Is Better. NVIDIA GeForce RTX 3060 Laptop GPU: 3.96 (SE +/- 0.04, N = 3; MIN: 3.55 / MAX: 7.81). (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20210720, Target: Vulkan GPU - Model: shufflenet-v2. ms, Fewer Is Better. NVIDIA GeForce RTX 3060 Laptop GPU: 2.88 (SE +/- 0.01, N = 3; MIN: 2.33 / MAX: 4.23). (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20210720, Target: Vulkan GPU-v3-v3 - Model: mobilenet-v3. ms, Fewer Is Better. NVIDIA GeForce RTX 3060 Laptop GPU: 4.66 (SE +/- 0.07, N = 3; MIN: 4.22 / MAX: 8.42). (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20210720, Target: Vulkan GPU-v2-v2 - Model: mobilenet-v2. ms, Fewer Is Better. NVIDIA GeForce RTX 3060 Laptop GPU: 4.02 (SE +/- 0.09, N = 3; MIN: 3.54 / MAX: 6.66). (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20210720, Target: Vulkan GPU - Model: mobilenet. ms, Fewer Is Better. NVIDIA GeForce RTX 3060 Laptop GPU: 10.6 (SE +/- 0.09, N = 3; MIN: 9.57 / MAX: 13.42). (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
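The NCNN results above cover the same fifteen models on both the CPU and Vulkan GPU targets, so the two sets can be compared directly. The following is a minimal Python sketch of that comparison, not part of the exported result: the dictionary names and the `speedups` helper are hypothetical, and the values are the mean latencies in ms copied from the lines above. It ignores the standard errors, which matters for noisy entries such as Vulkan GPU squeezenet_ssd (SE +/- 4.25).

```python
# Mean inference latencies in ms, copied from the NCNN 20210720 results above.
# The variable and function names are illustrative, not part of the export.
cpu_ms = {
    "regnety_400m": 6.85, "squeezenet_ssd": 18.56, "yolov4-tiny": 25.11,
    "resnet50": 25.27, "alexnet": 14.63, "resnet18": 15.92, "vgg16": 70.90,
    "googlenet": 13.44, "blazeface": 1.19, "efficientnet-b0": 5.26,
    "mnasnet": 3.24, "shufflenet-v2": 2.74, "mobilenet-v3": 3.48,
    "mobilenet-v2": 4.00, "mobilenet": 15.69,
}
vulkan_ms = {
    "regnety_400m": 5.79, "squeezenet_ssd": 19.63, "yolov4-tiny": 19.01,
    "resnet50": 13.36, "alexnet": 6.73, "resnet18": 6.51, "vgg16": 44.99,
    "googlenet": 9.01, "blazeface": 1.34, "efficientnet-b0": 10.03,
    "mnasnet": 3.96, "shufflenet-v2": 2.88, "mobilenet-v3": 4.66,
    "mobilenet-v2": 4.02, "mobilenet": 10.6,
}


def speedups(cpu, gpu):
    """Return {model: cpu_time / gpu_time}; a ratio above 1.0 means the GPU run was faster."""
    return {model: cpu[model] / gpu[model] for model in cpu}


ratios = speedups(cpu_ms, vulkan_ms)

# Print models from largest to smallest GPU advantage.
for model, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    faster = "Vulkan GPU" if ratio > 1.0 else "CPU"
    print(f"{model:16s} {ratio:5.2f}x  ({faster} faster)")
```

On these figures the larger networks (resnet18, alexnet, resnet50, vgg16) run faster on the Vulkan GPU target, while several small mobile-oriented models (efficientnet-b0, mobilenet-v3, mnasnet, blazeface) are faster on the CPU.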
Caffe 2020-02-13, Model: AlexNet - Acceleration: CPU - Iterations: 200. Milli-Seconds, Fewer Is Better. NVIDIA GeForce RTX 3060 Laptop GPU: 65872 (SE +/- 77.42, N = 3). (CXX) g++ options: -fPIC -O3 -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas
TensorFlow Lite 2020-08-23, Model: NASNet Mobile. Microseconds, Fewer Is Better. NVIDIA GeForce RTX 3060 Laptop GPU: 151814 (SE +/- 523.89, N = 3)
TensorFlow Lite 2020-08-23, Model: Mobilenet Quant. Microseconds, Fewer Is Better. NVIDIA GeForce RTX 3060 Laptop GPU: 141345 (SE +/- 185.90, N = 3)
TensorFlow Lite 2020-08-23, Model: SqueezeNet. Microseconds, Fewer Is Better. NVIDIA GeForce RTX 3060 Laptop GPU: 190603 (SE +/- 32.83, N = 3)
TensorFlow Lite 2020-08-23, Model: Mobilenet Float. Microseconds, Fewer Is Better. NVIDIA GeForce RTX 3060 Laptop GPU: 127871 (SE +/- 44.24, N = 3)
Mlpack Benchmark, Benchmark: scikit_ica. Seconds, Fewer Is Better. NVIDIA GeForce RTX 3060 Laptop GPU: 48.31 (SE +/- 0.07, N = 3)
DeepSpeech 0.6, Acceleration: CPU. Seconds, Fewer Is Better. NVIDIA GeForce RTX 3060 Laptop GPU: 68.95 (SE +/- 0.10, N = 3)
oneDNN 2.1.2, Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU. ms, Fewer Is Better. NVIDIA GeForce RTX 3060 Laptop GPU: 2.47936 (SE +/- 0.03298, N = 15; MIN: 2.32). (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
Mlpack Benchmark, Benchmark: scikit_linearridgeregression. Seconds, Fewer Is Better. NVIDIA GeForce RTX 3060 Laptop GPU: 2.09 (SE +/- 0.01, N = 3)
Caffe 2020-02-13, Model: AlexNet - Acceleration: CPU - Iterations: 100. Milli-Seconds, Fewer Is Better. NVIDIA GeForce RTX 3060 Laptop GPU: 33278 (SE +/- 36.37, N = 3). (CXX) g++ options: -fPIC -O3 -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas
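The six Caffe results above differ only in model and iteration count, which allows a quick consistency check. The sketch below assumes that each reported Milli-Seconds figure is the total time for the whole run, an assumption the export does not state explicitly; under that reading, dividing by the iteration count should give a roughly constant per-iteration cost for each model. The names `caffe_ms` and `per_iteration_ms` are hypothetical.

```python
# (model, iterations) -> reported time in ms, copied from the Caffe 2020-02-13 results above.
# Assumption: the reported time covers the whole run, not a single iteration.
caffe_ms = {
    ("GoogleNet", 100): 86800,
    ("GoogleNet", 200): 174373,
    ("GoogleNet", 1000): 866136,
    ("AlexNet", 100): 33278,
    ("AlexNet", 200): 65872,
    ("AlexNet", 1000): 325342,
}


def per_iteration_ms(results):
    """Divide each total time by its iteration count (the second element of the key)."""
    return {key: total / key[1] for key, total in results.items()}


per_iter = per_iteration_ms(caffe_ms)

for (model, iters), ms in sorted(per_iter.items()):
    print(f"{model:10s} {iters:5d} iterations: {ms:8.2f} ms/iteration")
```

Under that assumption, GoogleNet stays between roughly 866 and 872 ms per iteration and AlexNet between roughly 325 and 333 ms, so the totals scale close to linearly with the iteration count.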
Mlpack Benchmark Benchmark: scikit_svm OpenBenchmarking.org Seconds, Fewer Is Better Mlpack Benchmark Benchmark: scikit_svm NVIDIA GeForce RTX 3060 Laptop GPU 4 8 12 16 20 SE +/- 0.04, N = 3 17.59
oneDNN Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.1.2 Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU NVIDIA GeForce RTX 3060 Laptop GPU 2 4 6 8 10 SE +/- 0.01115, N = 3 8.34154 MIN: 4.81 1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
oneDNN Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.1.2 Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU NVIDIA GeForce RTX 3060 Laptop GPU 0.4771 0.9542 1.4313 1.9084 2.3855 SE +/- 0.00175, N = 3 2.12066 MIN: 1.95 1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
R Benchmark OpenBenchmarking.org Seconds, Fewer Is Better R Benchmark NVIDIA GeForce RTX 3060 Laptop GPU 0.0284 0.0568 0.0852 0.1136 0.142 SE +/- 0.0004, N = 3 0.1261 1. R scripting front-end version 4.0.4 (2021-02-15)
oneDNN Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.1.2 Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU NVIDIA GeForce RTX 3060 Laptop GPU 0.9694 1.9388 2.9082 3.8776 4.847 SE +/- 0.05326, N = 4 4.30842 MIN: 3.94 1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
TNN Target: CPU - Model: MobileNet v2 OpenBenchmarking.org ms, Fewer Is Better TNN 0.3 Target: CPU - Model: MobileNet v2 NVIDIA GeForce RTX 3060 Laptop GPU 50 100 150 200 250 SE +/- 0.45, N = 3 250.13 MIN: 247.81 / MAX: 257.68 1. (CXX) g++ options: -fopenmp -pthread -fvisibility=hidden -fvisibility=default -O3 -rdynamic -ldl
RNNoise OpenBenchmarking.org Seconds, Fewer Is Better RNNoise 2020-06-28 NVIDIA GeForce RTX 3060 Laptop GPU 4 8 12 16 20 SE +/- 0.02, N = 3 16.11 1. (CC) gcc options: -O2 -pedantic -fvisibility=hidden
TNN Target: CPU - Model: SqueezeNet v1.1 OpenBenchmarking.org ms, Fewer Is Better TNN 0.3 Target: CPU - Model: SqueezeNet v1.1 NVIDIA GeForce RTX 3060 Laptop GPU 50 100 150 200 250 SE +/- 0.12, N = 3 222.65 MIN: 221.97 / MAX: 223.85 1. (CXX) g++ options: -fopenmp -pthread -fvisibility=hidden -fvisibility=default -O3 -rdynamic -ldl
oneDNN Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPU OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.1.2 Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPU NVIDIA GeForce RTX 3060 Laptop GPU 0.367 0.734 1.101 1.468 1.835 SE +/- 0.00737, N = 3 1.63127 MIN: 1.49 1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
SHOC Scalable HeterOgeneous Computing Target: OpenCL - Benchmark: Texture Read Bandwidth OpenBenchmarking.org GB/s, More Is Better SHOC Scalable HeterOgeneous Computing 2020-04-17 Target: OpenCL - Benchmark: Texture Read Bandwidth NVIDIA GeForce RTX 3060 Laptop GPU 300 600 900 1200 1500 SE +/- 2.56, N = 3 1309.85 1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi
oneDNN Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.1.2 Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU NVIDIA GeForce RTX 3060 Laptop GPU 1.0372 2.0744 3.1116 4.1488 5.186 SE +/- 0.00366, N = 3 4.60966 MIN: 4.42 1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
oneDNN Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32 - Engine: CPU OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.1.2 Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32 - Engine: CPU NVIDIA GeForce RTX 3060 Laptop GPU 0.6752 1.3504 2.0256 2.7008 3.376 SE +/- 0.00612, N = 3 3.00110 MIN: 2.77 1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
ECP-CANDLE Benchmark: P1B2 OpenBenchmarking.org Seconds, Fewer Is Better ECP-CANDLE 0.4 Benchmark: P1B2 NVIDIA GeForce RTX 3060 Laptop GPU 8 16 24 32 40 35.18
oneDNN Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.1.2 Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU NVIDIA GeForce RTX 3060 Laptop GPU 3 6 9 12 15 SE +/- 0.01, N = 3 11.27 MIN: 10.74 1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
oneDNN 2.1.2 - Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU (ms, Fewer Is Better) NVIDIA GeForce RTX 3060 Laptop GPU: 22.53 (SE +/- 0.04, N = 3, MIN: 21.51). (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
oneDNN 2.1.2 - Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better) NVIDIA GeForce RTX 3060 Laptop GPU: 23.30 (SE +/- 0.05, N = 3, MIN: 22.33). (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: GEMM SGEMM_N (GFLOPS, More Is Better) NVIDIA GeForce RTX 3060 Laptop GPU: 2765.33 (SE +/- 14.73, N = 3). (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi
TNN 0.3 - Target: CPU - Model: SqueezeNet v2 (ms, Fewer Is Better) NVIDIA GeForce RTX 3060 Laptop GPU: 54.86 (SE +/- 0.15, N = 3, MIN: 54.44 / MAX: 55.55). (CXX) g++ options: -fopenmp -pthread -fvisibility=hidden -fvisibility=default -O3 -rdynamic -ldl
SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: Bus Speed Readback (GB/s, More Is Better) NVIDIA GeForce RTX 3060 Laptop GPU: 6.7648 (SE +/- 0.0000, N = 3). (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi
SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: S3D (GFLOPS, More Is Better) NVIDIA GeForce RTX 3060 Laptop GPU: 166.28 (SE +/- 0.03, N = 3). (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi
oneDNN 2.1.2 - Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU (ms, Fewer Is Better) NVIDIA GeForce RTX 3060 Laptop GPU: 6.71586 (SE +/- 0.01705, N = 3, MIN: 6.54). (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
oneDNN 2.1.2 - Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better) NVIDIA GeForce RTX 3060 Laptop GPU: 3.18077 (SE +/- 0.03473, N = 3, MIN: 2.72). (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: Bus Speed Download (GB/s, More Is Better) NVIDIA GeForce RTX 3060 Laptop GPU: 6.6926 (SE +/- 0.0004, N = 3). (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi
SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: Triad (GB/s, More Is Better) NVIDIA GeForce RTX 3060 Laptop GPU: 6.5530 (SE +/- 0.0003, N = 3). (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi
SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: FFT SP (GFLOPS, More Is Better) NVIDIA GeForce RTX 3060 Laptop GPU: 877.31 (SE +/- 0.02, N = 3). (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi
SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: Reduction (GB/s, More Is Better) NVIDIA GeForce RTX 3060 Laptop GPU: 295.57 (SE +/- 0.07, N = 3). (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi
SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: MD5 Hash (GHash/s, More Is Better) NVIDIA GeForce RTX 3060 Laptop GPU: 16.27 (SE +/- 0.02, N = 3). (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi
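The results above report a mean together with a standard error (SE) over N = 3 runs. As a rough aid for judging run-to-run noise, the sketch below expresses the SE as a percentage of the mean for a few of the listed results. The function name `relative_se_percent` and the `results` dictionary are illustrative assumptions and are not part of the Phoronix Test Suite; the numbers are copied from the listing above.

```python
# Relative standard error: SE as a percentage of the reported mean.
# The names below are illustrative, not part of the Phoronix Test Suite.

def relative_se_percent(mean, se):
    """Return the standard error as a percentage of the mean."""
    return 100.0 * se / mean

# (mean, SE) pairs copied from the results listed above.
results = {
    "oneDNN IP Shapes 1D u8s8f32 (ms)": (1.63127, 0.00737),
    "SHOC Texture Read Bandwidth (GB/s)": (1309.85, 2.56),
    "SHOC GEMM SGEMM_N (GFLOPS)": (2765.33, 14.73),
    "oneDNN Deconvolution shapes_3d u8s8f32 (ms)": (3.18077, 0.03473),
}

for name, (mean, se) in results.items():
    print(f"{name}: {relative_se_percent(mean, se):.2f}% relative SE")
```

A small percentage suggests the three runs agreed closely. With only three samples per test, the figure is a coarse indicator rather than a confidence bound.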
Phoronix Test Suite v10.8.5