2990WX March
AMD Ryzen Threadripper 2990WX 32-Core testing with an ASUS ROG ZENITH EXTREME (1701 BIOS) and Gigabyte AMD Radeon RX 470/480/570/570X/580/580X/590 4GB on Ubuntu 20.10 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2103152-HA-2990WXMAR69 .
2990WX March: system details (runs 1, 2, 3)
Processor: AMD Ryzen Threadripper 2990WX 32-Core @ 3.00GHz (32 Cores / 64 Threads)
Motherboard: ASUS ROG ZENITH EXTREME (1701 BIOS)
Chipset: AMD 17h
Memory: 32GB
Disk: Samsung SSD 970 EVO 500GB + 250GB Western Digital WDS250G2X0C-00L350
Graphics: Gigabyte AMD Radeon RX 470/480/570/570X/580/580X/590 4GB (1244/1750MHz)
Audio: Realtek ALC1220
Monitor: LG Ultra HD
Network: Intel I211 + Qualcomm Atheros QCA6174 802.11ac + Wilocity Wil6200 802.11ad
OS: Ubuntu 20.10
Kernel: 5.8.0-44-generic (x86_64)
Desktop: GNOME Shell 3.38.1
Display Server: X Server 1.20.9
OpenGL: 4.6 Mesa 20.2.1 (LLVM 11.0.0)
Vulkan: 1.2.131
Compiler: GCC 10.2.0
File-System: ext4
Screen Resolution: 1920x1080
Kernel Details: Transparent Huge Pages: madvise
Compiler Details: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-10-JvwpWM/gcc-10-10.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-10-JvwpWM/gcc-10-10.2.0/debian/tmp-gcn/usr,hsa --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Details: Scaling Governor: acpi-cpufreq ondemand (Boost: Enabled); CPU Microcode: 0x800820d
Python Details: Python 3.8.6
Security Details:
  itlb_multihit: Not affected
  l1tf: Not affected
  mds: Not affected
  meltdown: Not affected
  spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp
  spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
  spectre_v2: Mitigation of Full AMD retpoline IBPB: conditional STIBP: disabled RSB filling
  srbds: Not affected
  tsx_async_abort: Not affected
2990WX March: result overview
Test | 1 | 2 | 3
simdjson: Kostya | 2.28 | 2.29 | 2.29
simdjson: LargeRandom | 0.92 | 0.92 | 0.92
simdjson: PartialTweets | 2.96 | 2.97 | 2.97
simdjson: DistinctUserID | 3.32 | 3.33 | 3.33
jpegxl: PNG - 5 | 58.84 | 58.35 | 58.67
jpegxl: PNG - 7 | 8.26 | 8.26 | 8.25
jpegxl: PNG - 8 | 0.72 | 0.71 | 0.72
jpegxl: JPEG - 5 | 53.26 | 53.60 | 52.85
jpegxl: JPEG - 7 | 53.47 | 54.08 | 53.19
jpegxl: JPEG - 8 | 22.91 | 23.49 | 23.30
jpegxl-decode: 1 | 32.45 | 32.59 | 32.55
jpegxl-decode: All | 173.91 | 174.35 | 174.95
srslte: OFDM_Test | 82300000 | 82266667 | 82733333
srslte: PHY_DL_Test (eNb Mb/s) | 227.9 | 228.7 | 229.1
srslte: PHY_DL_Test (UE Mb/s) | 86.2 | 86.2 | 86.5
luaradio: Five Back to Back FIR Filters | 493.1 | 488.1 | 491.9
luaradio: FM Deemphasis Filter | 405.0 | 403.2 | 404.7
luaradio: Hilbert Transform | 98.1 | 97.4 | 97.4
luaradio: Complex Phase | 570.6 | 565.6 | 567.3
gnuradio: Five Back to Back FIR Filters | 369.9 | 384.1 | 373.6
gnuradio: Signal Source (Cosine) | 3139.8 | 3176.9 | 3144.5
gnuradio: FIR Filter | 601.9 | 599.5 | 607.0
gnuradio: IIR Filter | 575.8 | 584.0 | 580.3
gnuradio: FM Deemphasis Filter | 809.1 | 804.0 | 791.9
gnuradio: Hilbert Transform | 411.9 | 410.3 | 412.4
aom-av1: Speed 0 Two-Pass | 0.21 | 0.21 | 0.21
aom-av1: Speed 4 Two-Pass | 4.51 | 4.47 | 4.50
aom-av1: Speed 6 Realtime | 17.24 | 17.15 | 17.19
aom-av1: Speed 6 Two-Pass | 13.33 | 13.38 | 13.33
aom-av1: Speed 8 Realtime | 52.03 | 53.32 | 53.85
onednn: IP Shapes 1D - f32 - CPU | 6.68194 | 6.68145 | 6.72476
onednn: IP Shapes 3D - f32 - CPU | 12.4504 | 12.0962 | 12.5269
onednn: IP Shapes 1D - u8s8f32 - CPU | 2.51177 | 2.52878 | 2.53747
onednn: IP Shapes 3D - u8s8f32 - CPU | 3.57521 | 3.59224 | 3.60454
onednn: Convolution Batch Shapes Auto - f32 - CPU | 20.4034 | 20.3116 | 20.4069
onednn: Deconvolution Batch shapes_1d - f32 - CPU | 6.96031 | 6.96671 | 6.98731
onednn: Deconvolution Batch shapes_3d - f32 - CPU | 5.93567 | 5.92992 | 5.93542
onednn: Convolution Batch Shapes Auto - u8s8f32 - CPU | 25.4645 | 25.3540 | 25.4172
onednn: Deconvolution Batch shapes_1d - u8s8f32 - CPU | 2.63122 | 2.62957 | 2.62773
onednn: Deconvolution Batch shapes_3d - u8s8f32 - CPU | 3.80016 | 3.79185 | 3.78904
onednn: Recurrent Neural Network Training - f32 - CPU | 13898.8 | 13707.4 | 13632.5
onednn: Recurrent Neural Network Inference - f32 - CPU | 3886.86 | 3917.80 | 3817.64
onednn: Recurrent Neural Network Training - u8s8f32 - CPU | 13965.6 | 13883.6 | 13706.6
onednn: Recurrent Neural Network Inference - u8s8f32 - CPU | 3918.54 | 3897.24 | 3912.25
onednn: Matrix Multiply Batch Shapes Transformer - f32 - CPU | 1.91617 | 1.89112 | 1.76706
onednn: Recurrent Neural Network Training - bf16bf16bf16 - CPU | 14208.1 | 14041.2 | 13829.2
onednn: Recurrent Neural Network Inference - bf16bf16bf16 - CPU | 3947.67 | 3911.83 | 3928.48
onednn: Matrix Multiply Batch Shapes Transformer - u8s8f32 - CPU | 1.55988 | 1.55683 | 1.56511
liquid-dsp: 1 - 256 - 57 | 60627667 | 59917000 | 60272667
liquid-dsp: 2 - 256 - 57 | 120953333 | 119076667 | 120150000
liquid-dsp: 4 - 256 - 57 | 236713333 | 236780000 | 237340000
liquid-dsp: 8 - 256 - 57 | 460310000 | 460586667 | 460676667
liquid-dsp: 16 - 256 - 57 | 835613333 | 835380000 | 836323333
liquid-dsp: 32 - 256 - 57 | 1477766667 | 1474633333 | 1478866667
liquid-dsp: 64 - 256 - 57 | 1616966667 | 1616600000 | 1614700000
astcenc: Medium | 5.2065 | 5.1920 | 5.2163
astcenc: Thorough | 6.3944 | 6.3871 | 6.4015
astcenc: Exhaustive | 45.6420 | 46.0109 | 45.7101
basis: ETC1S | 27.434 | 27.108 | 27.516
basis: UASTC Level 0 | 7.493 | 7.477 | 7.495
basis: UASTC Level 2 | 15.671 | 15.723 | 15.792
basis: UASTC Level 3 | 25.058 | 25.163 | 25.160
mnn: SqueezeNetV1.0 | 9.062 | 8.783 | 8.912
mnn: resnet-v2-50 | 38.015 | 38.151 | 37.561
mnn: MobileNetV2_224 | 5.871 | 5.910 | 5.718
mnn: mobilenet-v1-1.0 | 4.310 | 4.234 | 4.041
mnn: inception-v3 | 48.048 | 48.082 | 47.210
sysbench: RAM / Memory | 6791.12 | 6809.31 | 6733.70
sysbench: CPU | 57045.06 | 57054.82 | 57053.00
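The three result columns are repeated runs on the same system, so one quick way to read the overview is to look at how far the runs drift apart for each test. The following is a minimal sketch of that arithmetic, not part of the Phoronix Test Suite; the helper name `spread_percent` and the choice of (max - min) / mean as the spread metric are assumptions made for illustration, and only three rows of the overview are copied in.

```python
def spread_percent(values):
    """Return the run-to-run spread as (max - min) / mean, in percent."""
    mean = sum(values) / len(values)
    return (max(values) - min(values)) / mean * 100.0


# Values for runs 1, 2, 3 copied from the result overview.
results = {
    "gnuradio: Five Back to Back FIR Filters": [369.9, 384.1, 373.6],
    "onednn: Matrix Multiply Batch Shapes Transformer - f32 - CPU": [1.91617, 1.89112, 1.76706],
    "sysbench: CPU": [57045.06, 57054.82, 57053.00],
}

spreads = {name: spread_percent(vals) for name, vals in results.items()}

for name, pct in spreads.items():
    print(f"{name}: {pct:.2f}% spread across runs 1-3")
```

With these inputs the oneDNN matrix-multiply test shows the widest spread of the three and Sysbench CPU the narrowest, which is a hint about which results to treat as noisy before comparing runs.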
simdjson 0.8.2 - Throughput Test (GB/s, More Is Better)
(CXX) g++ options: -O3 -pthread
Kostya: 1 = 2.28 (SE +/- 0.00, N = 3); 2 = 2.29 (SE +/- 0.00, N = 3); 3 = 2.29 (SE +/- 0.00, N = 3)
LargeRandom: 1 = 0.92 (SE +/- 0.00, N = 3); 2 = 0.92 (SE +/- 0.00, N = 3); 3 = 0.92 (SE +/- 0.00, N = 3)
PartialTweets: 1 = 2.96 (SE +/- 0.00, N = 3); 2 = 2.97 (SE +/- 0.00, N = 3); 3 = 2.97 (SE +/- 0.00, N = 3)
DistinctUserID: 1 = 3.32 (SE +/- 0.00, N = 3); 2 = 3.33 (SE +/- 0.00, N = 3); 3 = 3.33 (SE +/- 0.00, N = 3)
JPEG XL 0.3.3 - Encode (MP/s, More Is Better)
(CXX) g++ options: -funwind-tables -O3 -O2 -pthread -fPIE -pie -ldl
Input: PNG - Encode Speed: 5: 1 = 58.84 (SE +/- 0.09, N = 3); 2 = 58.35 (SE +/- 0.38, N = 3); 3 = 58.67 (SE +/- 0.89, N = 3)
Input: PNG - Encode Speed: 7: 1 = 8.26 (SE +/- 0.02, N = 3); 2 = 8.26 (SE +/- 0.05, N = 3); 3 = 8.25 (SE +/- 0.04, N = 3)
Input: PNG - Encode Speed: 8: 1 = 0.72 (SE +/- 0.00, N = 3); 2 = 0.71 (SE +/- 0.00, N = 3); 3 = 0.72 (SE +/- 0.00, N = 3)
Input: JPEG - Encode Speed: 5: 1 = 53.26 (SE +/- 0.28, N = 3); 2 = 53.60 (SE +/- 0.61, N = 3); 3 = 52.85 (SE +/- 0.76, N = 3)
Input: JPEG - Encode Speed: 7: 1 = 53.47 (SE +/- 0.06, N = 3); 2 = 54.08 (SE +/- 0.05, N = 3); 3 = 53.19 (SE +/- 0.40, N = 3)
Input: JPEG - Encode Speed: 8: 1 = 22.91 (SE +/- 0.16, N = 3); 2 = 23.49 (SE +/- 0.15, N = 3); 3 = 23.30 (SE +/- 0.15, N = 3)
JPEG XL Decoding 0.3.3 (MP/s, More Is Better)
CPU Threads: 1: 1 = 32.45 (SE +/- 0.01, N = 3); 2 = 32.59 (SE +/- 0.07, N = 3); 3 = 32.55 (SE +/- 0.04, N = 3)
CPU Threads: All: 1 = 173.91 (SE +/- 0.36, N = 3); 2 = 174.35 (SE +/- 0.60, N = 3); 3 = 174.95 (SE +/- 0.29, N = 3)
srsLTE 20.10.1 (More Is Better)
(CXX) g++ options: -std=c++11 -fno-strict-aliasing -march=native -mfpmath=sse -mavx2 -fvisibility=hidden -O3 -fno-trapping-math -fno-math-errno -rdynamic -lpthread -lmbedcrypto -lconfig++ -lsctp -lbladeRF -lm -lfftw3f
Test: OFDM_Test (Samples / Second): 1 = 82300000 (SE +/- 57735.03, N = 3); 2 = 82266667 (SE +/- 166666.67, N = 3); 3 = 82733333 (SE +/- 133333.33, N = 3)
Test: PHY_DL_Test (eNb Mb/s): 1 = 227.9 (SE +/- 1.36, N = 3); 2 = 228.7 (SE +/- 1.76, N = 3); 3 = 229.1 (SE +/- 0.86, N = 3)
Test: PHY_DL_Test (UE Mb/s): 1 = 86.2 (SE +/- 0.48, N = 3); 2 = 86.2 (SE +/- 0.43, N = 3); 3 = 86.5 (SE +/- 0.37, N = 3)
LuaRadio 0.9.1 (MiB/s, More Is Better)
Test: Five Back to Back FIR Filters: 1 = 493.1 (SE +/- 1.15, N = 3); 2 = 488.1 (SE +/- 0.58, N = 3); 3 = 491.9 (SE +/- 1.14, N = 3)
Test: FM Deemphasis Filter: 1 = 405.0 (SE +/- 0.51, N = 3); 2 = 403.2 (SE +/- 2.02, N = 3); 3 = 404.7 (SE +/- 0.90, N = 3)
Test: Hilbert Transform: 1 = 98.1 (SE +/- 0.17, N = 3); 2 = 97.4 (SE +/- 0.52, N = 3); 3 = 97.4 (SE +/- 0.45, N = 3)
Test: Complex Phase: 1 = 570.6 (SE +/- 0.23, N = 3); 2 = 565.6 (SE +/- 2.48, N = 3); 3 = 567.3 (SE +/- 3.02, N = 3)
GNU Radio (MiB/s, More Is Better)
Footnote 1: 3.8.1.0
Test: Five Back to Back FIR Filters: 1 = 369.9 (SE +/- 5.54, N = 9); 2 = 384.1 (SE +/- 3.80, N = 3); 3 = 373.6 (SE +/- 4.94, N = 9)
Test: Signal Source (Cosine): 1 = 3139.8 (SE +/- 11.69, N = 9); 2 = 3176.9 (SE +/- 18.55, N = 3); 3 = 3144.5 (SE +/- 12.27, N = 9)
Test: FIR Filter: 1 = 601.9 (SE +/- 4.29, N = 9); 2 = 599.5 (SE +/- 2.03, N = 3); 3 = 607.0 (SE +/- 1.71, N = 9)
Test: IIR Filter: 1 = 575.8 (SE +/- 3.61, N = 9); 2 = 584.0 (SE +/- 1.47, N = 3); 3 = 580.3 (SE +/- 1.18, N = 9)
Test: FM Deemphasis Filter: 1 = 809.1 (SE +/- 2.77, N = 9); 2 = 804.0 (SE +/- 2.87, N = 3); 3 = 791.9 (SE +/- 11.33, N = 9)
Test: Hilbert Transform: 1 = 411.9 (SE +/- 2.03, N = 9); 2 = 410.3 (SE +/- 1.57, N = 3); 3 = 412.4 (SE +/- 1.42, N = 9)
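The GNU Radio results report different sample counts per run (N = 9, 3, 9), so the "SE +/-" figures are not directly comparable as a measure of scatter. Assuming "SE" here is the usual standard error of the mean (standard deviation divided by the square root of N), the per-trial standard deviation can be recovered as sketched below. The helper name `stddev_from_se` is hypothetical; the inputs are the Five Back to Back FIR Filters figures from this section.

```python
import math


def stddev_from_se(se, n):
    """Recover the sample standard deviation from a standard error of the mean.

    Assumes SE = stddev / sqrt(N), the conventional definition.
    """
    return se * math.sqrt(n)


# (SE, N) for runs 1, 2, 3 of GNU Radio "Five Back to Back FIR Filters".
runs = {"1": (5.54, 9), "2": (3.80, 3), "3": (4.94, 9)}

stddevs = {run: stddev_from_se(se, n) for run, (se, n) in runs.items()}

for run, sd in stddevs.items():
    print(f"run {run}: approx. standard deviation {sd:.2f} MiB/s")
```

Under that assumption, runs 1 and 3 have a noticeably larger per-trial scatter than run 2, even though their reported standard errors look only moderately larger.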
AOM AV1 2.1-rc (Frames Per Second, More Is Better)
(CXX) g++ options: -O3 -std=c++11 -U_FORTIFY_SOURCE -lm -lpthread
Encoder Mode: Speed 0 Two-Pass: 1 = 0.21 (SE +/- 0.00, N = 3); 2 = 0.21 (SE +/- 0.00, N = 3); 3 = 0.21 (SE +/- 0.00, N = 3)
Encoder Mode: Speed 4 Two-Pass: 1 = 4.51 (SE +/- 0.02, N = 3); 2 = 4.47 (SE +/- 0.01, N = 3); 3 = 4.50 (SE +/- 0.02, N = 3)
Encoder Mode: Speed 6 Realtime: 1 = 17.24 (SE +/- 0.02, N = 3); 2 = 17.15 (SE +/- 0.04, N = 3); 3 = 17.19 (SE +/- 0.09, N = 3)
Encoder Mode: Speed 6 Two-Pass: 1 = 13.33 (SE +/- 0.09, N = 3); 2 = 13.38 (SE +/- 0.12, N = 3); 3 = 13.33 (SE +/- 0.08, N = 3)
Encoder Mode: Speed 8 Realtime: 1 = 52.03 (SE +/- 0.83, N = 3); 2 = 53.32 (SE +/- 0.82, N = 3); 3 = 53.85 (SE +/- 0.00, N = 3)
oneDNN 2.1.2 - Engine: CPU (ms, Fewer Is Better)
(CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
Harness: IP Shapes 1D - Data Type: f32: 1 = 6.68194 (SE +/- 0.09926, N = 3, MIN: 5.95); 2 = 6.68145 (SE +/- 0.02263, N = 3, MIN: 5.95); 3 = 6.72476 (SE +/- 0.01807, N = 3, MIN: 5.9)
Harness: IP Shapes 3D - Data Type: f32: 1 = 12.45 (SE +/- 0.13, N = 15, MIN: 11.43); 2 = 12.10 (SE +/- 0.16, N = 4, MIN: 11.23); 3 = 12.53 (SE +/- 0.14, N = 15, MIN: 11.33)
Harness: IP Shapes 1D - Data Type: u8s8f32: 1 = 2.51177 (SE +/- 0.00876, N = 3, MIN: 2.32); 2 = 2.52878 (SE +/- 0.02431, N = 3, MIN: 2.33); 3 = 2.53747 (SE +/- 0.02758, N = 3, MIN: 2.32)
Harness: IP Shapes 3D - Data Type: u8s8f32: 1 = 3.57521 (SE +/- 0.00442, N = 3, MIN: 3.17); 2 = 3.59224 (SE +/- 0.00737, N = 3, MIN: 3.18); 3 = 3.60454 (SE +/- 0.01752, N = 3, MIN: 1.97)
Harness: Convolution Batch Shapes Auto - Data Type: f32: 1 = 20.40 (SE +/- 0.01, N = 3, MIN: 19.97); 2 = 20.31 (SE +/- 0.13, N = 3, MIN: 18.74); 3 = 20.41 (SE +/- 0.01, N = 3, MIN: 19.86)
Harness: Deconvolution Batch shapes_1d - Data Type: f32: 1 = 6.96031 (SE +/- 0.01479, N = 3, MIN: 6.16); 2 = 6.96671 (SE +/- 0.01848, N = 3, MIN: 6.12); 3 = 6.98731 (SE +/- 0.02797, N = 3, MIN: 6.09)
Harness: Deconvolution Batch shapes_3d - Data Type: f32: 1 = 5.93567 (SE +/- 0.01143, N = 3, MIN: 5.52); 2 = 5.92992 (SE +/- 0.00690, N = 3, MIN: 5.5); 3 = 5.93542 (SE +/- 0.00102, N = 3, MIN: 5.49)
Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32: 1 = 25.46 (SE +/- 0.03, N = 3, MIN: 24.65); 2 = 25.35 (SE +/- 0.02, N = 3, MIN: 24.3); 3 = 25.42 (SE +/- 0.06, N = 3, MIN: 24.2)
Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32: 1 = 2.63122 (SE +/- 0.00877, N = 3, MIN: 2.5); 2 = 2.62957 (SE +/- 0.00861, N = 3, MIN: 2.5); 3 = 2.62773 (SE +/- 0.00732, N = 3, MIN: 2.51)
Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32: 1 = 3.80016 (SE +/- 0.00430, N = 3, MIN: 3.72); 2 = 3.79185 (SE +/- 0.00167, N = 3, MIN: 3.71); 3 = 3.78904 (SE +/- 0.00762, N = 3, MIN: 3.71)
Harness: Recurrent Neural Network Training - Data Type: f32: 1 = 13898.8 (SE +/- 154.68, N = 7, MIN: 13032.7); 2 = 13707.4 (SE +/- 145.74, N = 7, MIN: 12922.4); 3 = 13632.5 (SE +/- 176.85, N = 5, MIN: 12827.2)
Harness: Recurrent Neural Network Inference - Data Type: f32: 1 = 3886.86 (SE +/- 17.43, N = 3, MIN: 3655.72); 2 = 3917.80 (SE +/- 27.48, N = 3, MIN: 3769.84); 3 = 3817.64 (SE +/- 39.34, N = 3, MIN: 3695.79)
Harness: Recurrent Neural Network Training - Data Type: u8s8f32: 1 = 13965.6 (SE +/- 178.47, N = 5, MIN: 13115.7); 2 = 13883.6 (SE +/- 182.34, N = 5, MIN: 12751.7); 3 = 13706.6 (SE +/- 149.77, N = 12, MIN: 12260.3)
Harness: Recurrent Neural Network Inference - Data Type: u8s8f32: 1 = 3918.54 (SE +/- 20.29, N = 3, MIN: 3855); 2 = 3897.24 (SE +/- 11.92, N = 3, MIN: 3835.75); 3 = 3912.25 (SE +/- 15.33, N = 3, MIN: 3811.14)
Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32: 1 = 1.91617 (SE +/- 0.05033, N = 15, MIN: 1.23); 2 = 1.89112 (SE +/- 0.06076, N = 15, MIN: 1.25); 3 = 1.76706 (SE +/- 0.06494, N = 15, MIN: 1.22)
Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16: 1 = 14208.1 (SE +/- 92.65, N = 3, MIN: 13948.2); 2 = 14041.2 (SE +/- 125.70, N = 3, MIN: 13720.3); 3 = 13829.2 (SE +/- 92.53, N = 3, MIN: 13621)
Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16: 1 = 3947.67 (SE +/- 27.19, N = 3, MIN: 3865.39); 2 = 3911.83 (SE +/- 8.79, N = 3, MIN: 3788.89); 3 = 3928.48 (SE +/- 15.75, N = 3, MIN: 3822.06)
Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32: 1 = 1.55988 (SE +/- 0.00396, N = 3, MIN: 1.43); 2 = 1.55683 (SE +/- 0.00329, N = 3, MIN: 1.43); 3 = 1.56511 (SE +/- 0.00128, N = 3, MIN: 1.44)
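Some oneDNN results differ visibly between runs, for example Matrix Multiply Batch Shapes Transformer (f32) at 1.91617 ms in run 1 versus 1.76706 ms in run 3. A rough way to judge whether such a gap is larger than the reported noise is to express the difference in units of the combined standard error. This is only a sketch under the assumption that the two SE values are independent standard errors of the mean; the function name `diff_in_combined_se` is hypothetical, and the result is a rule-of-thumb indicator, not a formal significance test.

```python
import math


def diff_in_combined_se(a, se_a, b, se_b):
    """Return |a - b| divided by sqrt(se_a^2 + se_b^2).

    Assumes the two standard errors are independent.
    """
    combined = math.sqrt(se_a ** 2 + se_b ** 2)
    return abs(a - b) / combined


# oneDNN Matrix Multiply Batch Shapes Transformer, f32: run 1 vs run 3.
ratio = diff_in_combined_se(1.91617, 0.05033, 1.76706, 0.06494)

print(f"run 1 vs run 3 differ by {ratio:.2f} combined standard errors")
```

A ratio below roughly 2 is commonly read as "within noise", so under these assumptions the apparent 8% gap between run 1 and run 3 is not strong evidence of a real difference.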
Liquid-DSP 2021.01.31 - Buffer Length: 256 - Filter Length: 57 (samples/s, More Is Better)
(CC) gcc options: -O3 -pthread -lm -lc -lliquid
Threads: 1: 1 = 60627667 (SE +/- 6385.75, N = 3); 2 = 59917000 (SE +/- 65592.17, N = 3); 3 = 60272667 (SE +/- 229485.17, N = 3)
Threads: 2: 1 = 120953333 (SE +/- 8819.17, N = 3); 2 = 119076667 (SE +/- 524859.77, N = 3); 3 = 120150000 (SE +/- 165025.25, N = 3)
Threads: 4: 1 = 236713333 (SE +/- 161279.61, N = 3); 2 = 236780000 (SE +/- 528488.41, N = 3); 3 = 237340000 (SE +/- 585946.53, N = 3)
Threads: 8: 1 = 460310000 (SE +/- 920887.25, N = 3); 2 = 460586667 (SE +/- 668539.04, N = 3); 3 = 460676667 (SE +/- 870868.02, N = 3)
Threads: 16: 1 = 835613333 (SE +/- 414782.41, N = 3); 2 = 835380000 (SE +/- 1629918.20, N = 3); 3 = 836323333 (SE +/- 670232.13, N = 3)
Threads: 32: 1 = 1477766667 (SE +/- 2370185.18, N = 3); 2 = 1474633333 (SE +/- 5967783.88, N = 3); 3 = 1478866667 (SE +/- 4056818.68, N = 3)
Threads: 64: 1 = 1616966667 (SE +/- 1354416.64, N = 3); 2 = 1616600000 (SE +/- 1026320.29, N = 3); 3 = 1614700000 (SE +/- 1193035.34, N = 3)
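The Liquid-DSP series is the only test here that sweeps the thread count (1 to 64), so it lends itself to a simple scaling calculation. The sketch below uses the run 1 throughput figures from this section; the helper name `scaling_table` and the definitions (speedup = throughput relative to 1 thread, efficiency = speedup divided by thread count) are assumptions chosen for illustration.

```python
def scaling_table(results):
    """Map thread count -> (speedup vs. 1 thread, parallel efficiency)."""
    base = results[1]
    return {
        threads: (value / base, value / base / threads)
        for threads, value in results.items()
    }


# Liquid-DSP samples/s for run 1, keyed by thread count.
run1 = {
    1: 60627667,
    2: 120953333,
    4: 236713333,
    8: 460310000,
    16: 835613333,
    32: 1477766667,
    64: 1616966667,
}

table = scaling_table(run1)

for threads, (speedup, efficiency) in table.items():
    print(f"{threads:>2} threads: {speedup:6.2f}x speedup, {efficiency:6.1%} efficiency")
```

The step from 32 to 64 threads adds comparatively little throughput, which is consistent with the processor having 32 physical cores and 64 hardware threads: the last doubling relies on SMT rather than additional cores.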
ASTC Encoder 2.4 (Seconds, Fewer Is Better)
(CXX) g++ options: -O3 -flto -pthread
Preset: Medium: 1 = 5.2065 (SE +/- 0.0362, N = 3); 2 = 5.1920 (SE +/- 0.0034, N = 3); 3 = 5.2163 (SE +/- 0.0222, N = 3)
Preset: Thorough: 1 = 6.3944 (SE +/- 0.0026, N = 3); 2 = 6.3871 (SE +/- 0.0067, N = 3); 3 = 6.4015 (SE +/- 0.0153, N = 3)
Preset: Exhaustive: 1 = 45.64 (SE +/- 0.05, N = 3); 2 = 46.01 (SE +/- 0.30, N = 3); 3 = 45.71 (SE +/- 0.06, N = 3)
Basis Universal 1.13 (Seconds, Fewer Is Better)
(CXX) g++ options: -std=c++11 -fvisibility=hidden -fPIC -fno-strict-aliasing -O3 -rdynamic -lm -lpthread
Settings: ETC1S: 1 = 27.43 (SE +/- 0.27, N = 3); 2 = 27.11 (SE +/- 0.16, N = 3); 3 = 27.52 (SE +/- 0.05, N = 3)
Settings: UASTC Level 0: 1 = 7.493 (SE +/- 0.013, N = 3); 2 = 7.477 (SE +/- 0.011, N = 3); 3 = 7.495 (SE +/- 0.038, N = 3)
Settings: UASTC Level 2: 1 = 15.67 (SE +/- 0.04, N = 3); 2 = 15.72 (SE +/- 0.03, N = 3); 3 = 15.79 (SE +/- 0.04, N = 3)
Settings: UASTC Level 3: 1 = 25.06 (SE +/- 0.06, N = 3); 2 = 25.16 (SE +/- 0.09, N = 3); 3 = 25.16 (SE +/- 0.04, N = 3)
Mobile Neural Network 1.1.3 (ms, Fewer Is Better)
(CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -rdynamic -pthread -ldl
Model: SqueezeNetV1.0: 1 = 9.062 (SE +/- 0.095, N = 15, MIN: 8.06 / MAX: 18.98); 2 = 8.783 (SE +/- 0.105, N = 15, MIN: 8 / MAX: 22.86); 3 = 8.912 (SE +/- 0.108, N = 3, MIN: 8.28 / MAX: 17.91)
Model: resnet-v2-50: 1 = 38.02 (SE +/- 0.30, N = 15, MIN: 35.17 / MAX: 122.04); 2 = 38.15 (SE +/- 0.34, N = 15, MIN: 35.3 / MAX: 124.03); 3 = 37.56 (SE +/- 0.53, N = 3, MIN: 35.2 / MAX: 119.58)
Model: MobileNetV2_224: 1 = 5.871 (SE +/- 0.069, N = 15, MIN: 5.4 / MAX: 14.78); 2 = 5.910 (SE +/- 0.058, N = 15, MIN: 5.42 / MAX: 7.12); 3 = 5.718 (SE +/- 0.211, N = 3, MIN: 5.38 / MAX: 6.46)
Model: mobilenet-v1-1.0: 1 = 4.310 (SE +/- 0.063, N = 15, MIN: 3.38 / MAX: 40.78); 2 = 4.234 (SE +/- 0.056, N = 15, MIN: 3.39 / MAX: 44.28); 3 = 4.041 (SE +/- 0.114, N = 3, MIN: 3.38 / MAX: 29.44)
Model: inception-v3: 1 = 48.05 (SE +/- 0.34, N = 15, MIN: 44.63 / MAX: 136.06); 2 = 48.08 (SE +/- 0.26, N = 15, MIN: 45.19 / MAX: 105.01); 3 = 47.21 (SE +/- 0.37, N = 3, MIN: 43.85 / MAX: 97.15)
Sysbench 1.0.20 (More Is Better)
(CC) gcc options: -pthread -O2 -funroll-loops -rdynamic -ldl -laio -lm
Test: RAM / Memory (MiB/sec): 1 = 6791.12 (SE +/- 64.27, N = 3); 2 = 6809.31 (SE +/- 32.29, N = 3); 3 = 6733.70 (SE +/- 33.94, N = 3)
Test: CPU (Events Per Second): 1 = 57045.06 (SE +/- 2.08, N = 3); 2 = 57054.82 (SE +/- 3.63, N = 3); 3 = 57053.00 (SE +/- 1.70, N = 3)
Phoronix Test Suite v10.8.4