onnx 1.11 Threadripper
AMD Ryzen Threadripper 3990X 64-Core testing with a Gigabyte TRX40 AORUS PRO WIFI (F4p BIOS) and AMD Radeon RX 5700 8GB on Pop 21.10 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2203274-PTS-ONNX111T03&export=pdf&sro&gru
System configuration (identical for runs A, B, C, and D):
Processor: AMD Ryzen Threadripper 3990X 64-Core @ 2.90GHz (64 Cores / 128 Threads)
Motherboard: Gigabyte TRX40 AORUS PRO WIFI (F4p BIOS)
Chipset: AMD Starship/Matisse
Memory: 128GB
Disk: Samsung SSD 970 EVO Plus 500GB
Graphics: AMD Radeon RX 5700 8GB (1750/875MHz)
Audio: AMD Navi 10 HDMI Audio
Monitor: DELL P2415Q
Network: Intel I211 + Intel Wi-Fi 6 AX200
OS: Pop 21.10
Kernel: 5.17.0-rc1-sched-core-phx (x86_64)
Desktop: GNOME Shell 40.5
Display Server: X Server
OpenGL: 4.6 Mesa 21.2.2 (LLVM 12.0.1)
Vulkan: 1.2.182
Compiler: GCC 11.2.0
File-System: ext4
Screen Resolution: 3840x2160

Kernel Details: Transparent Huge Pages: madvise
Compiler Details: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-11-ZPT0kp/gcc-11-11.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-ZPT0kp/gcc-11-11.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Details: Scaling Governor: acpi-cpufreq schedutil (Boost: Enabled); CPU Microcode: 0x8301039
Python Details: Python 3.9.7
Security Details: itlb_multihit: Not affected; l1tf: Not affected; mds: Not affected; meltdown: Not affected; spec_store_bypass: Mitigation of SSB disabled via prctl; spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization; spectre_v2: Mitigation of Full AMD retpoline IBPB: conditional STIBP: conditional RSB filling; srbds: Not affected; tsx_async_abort: Not affected
Results summary (all tests: more is better):
Test                             Unit             A           B           C           D
perf-bench: Memcpy 1MB           GB/sec           12.078737   11.959034   12.086277   11.973617
perf-bench: Memset 1MB           GB/sec           69.637544   69.555229   69.494542   69.496049
onnx: GPT-2 - CPU                inferences/min   4120        3850        3868        3869
onnx: yolov4 - CPU               inferences/min   380         368         370         369
onnx: bertsquad-12 - CPU         inferences/min   435         426         431         431
onnx: fcn-resnet101-11 - CPU     inferences/min   81          81          81          81
onnx: ArcFace ResNet-100 - CPU   inferences/min   1096        1092        1086        1092
onnx: super-resolution-10 - CPU  inferences/min   3634        3662        3692        3787
perf-bench: Epoll Wait           ops/sec          5044        5166        5158        5213
perf-bench: Futex Hash           ops/sec          3553285     3578390     3483369     3540563
perf-bench: Sched Pipe           ops/sec          255487      246745      256473      264487
perf-bench: Futex Lock-Pi        ops/sec          93          93          93          93
perf-bench: Syscall Basic        ops/sec          16605197    16727281    16659267    16878142
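One quick way to read the summary is to compute each test's run-to-run spread, (max - min) / min, across runs A to D. The sketch below is a minimal illustration and not part of the Phoronix Test Suite output; the names `RESULTS` and `spread_pct` are hypothetical, and the numbers are copied from the summary results.

```python
# Per-run results (A, B, C, D) copied from the onnx 1.11 Threadripper summary.
RESULTS = {
    "perf-bench: Memcpy 1MB": [12.078737, 11.959034, 12.086277, 11.973617],
    "perf-bench: Memset 1MB": [69.637544, 69.555229, 69.494542, 69.496049],
    "onnx: GPT-2 - CPU": [4120, 3850, 3868, 3869],
    "onnx: yolov4 - CPU": [380, 368, 370, 369],
    "onnx: bertsquad-12 - CPU": [435, 426, 431, 431],
    "onnx: fcn-resnet101-11 - CPU": [81, 81, 81, 81],
    "onnx: ArcFace ResNet-100 - CPU": [1096, 1092, 1086, 1092],
    "onnx: super-resolution-10 - CPU": [3634, 3662, 3692, 3787],
    "perf-bench: Epoll Wait": [5044, 5166, 5158, 5213],
    "perf-bench: Futex Hash": [3553285, 3578390, 3483369, 3540563],
    "perf-bench: Sched Pipe": [255487, 246745, 256473, 264487],
    "perf-bench: Futex Lock-Pi": [93, 93, 93, 93],
    "perf-bench: Syscall Basic": [16605197, 16727281, 16659267, 16878142],
}


def spread_pct(values):
    """Run-to-run spread as a percentage: (max - min) / min * 100."""
    lo, hi = min(values), max(values)
    return 100.0 * (hi - lo) / lo


# Print tests from the most to the least variable across the four runs.
for name in sorted(RESULTS, key=lambda n: spread_pct(RESULTS[n]), reverse=True):
    print(f"{name:33s} {spread_pct(RESULTS[name]):5.2f} %")
```

On these numbers Sched Pipe (about 7.2 %) and GPT-2 (about 7.0 %) vary the most between runs, while fcn-resnet101-11 and Futex Lock-Pi are identical in all four.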
perf-bench Benchmark: Memcpy 1MB (GB/sec, More Is Better)
A: 12.08 (SE +/- 0.04, N = 3)
B: 11.96 (SE +/- 0.13, N = 15)
C: 12.09 (SE +/- 0.08, N = 13)
D: 11.97 (SE +/- 0.13, N = 3)
1. (CC) gcc options: -pthread -shared -lunwind-x86_64 -lunwind -llzma -Xlinker -O6 -ggdb3 -funwind-tables -std=gnu99 -lnuma
perf-bench Benchmark: Memset 1MB (GB/sec, More Is Better)
A: 69.64 (SE +/- 0.56, N = 9)
B: 69.56 (SE +/- 0.56, N = 3)
C: 69.49 (SE +/- 0.81, N = 15)
D: 69.50 (SE +/- 0.65, N = 15)
1. (CC) gcc options: -pthread -shared -lunwind-x86_64 -lunwind -llzma -Xlinker -O6 -ggdb3 -funwind-tables -std=gnu99 -lnuma
ONNX Runtime 1.11 Model: GPT-2 - Device: CPU (Inferences Per Minute, More Is Better)
A: 4120 (SE +/- 14.70, N = 3)
B: 3850 (SE +/- 34.56, N = 3)
C: 3868 (SE +/- 10.15, N = 3)
D: 3869 (SE +/- 1.15, N = 3)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt
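Run A's GPT-2 result is roughly 6.5 to 7 % above runs B to D. A rough way to check whether that gap exceeds the trial-to-trial noise is to divide the difference of two run means by their combined standard error, sqrt(SE_a^2 + SE_b^2). This sketch assumes the reported SE is the standard error of each run's mean and that runs are independent; with only N = 3 trials the normal approximation is optimistic, and a Welch t-test would be the more careful choice. The helper `approx_z` and the dictionary `GPT2` are hypothetical names.

```python
import math


def approx_z(mean_a, se_a, mean_b, se_b):
    """Gap between two run means in units of their combined standard error.

    Assumes each SE is the standard error of that run's mean and that the
    runs are independent; with N = 3 this normal approximation is rough.
    """
    return (mean_a - mean_b) / math.sqrt(se_a ** 2 + se_b ** 2)


# GPT-2 (inferences per minute): (mean, SE) per run, copied from the result.
GPT2 = {"A": (4120, 14.70), "B": (3850, 34.56), "C": (3868, 10.15), "D": (3869, 1.15)}

for run in ("B", "C", "D"):
    z = approx_z(*GPT2["A"], *GPT2[run])
    print(f"A vs {run}: z = {z:.1f}")
```

All three comparisons land well above 3, so run A's GPT-2 lead looks larger than the reported trial-to-trial noise.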
ONNX Runtime 1.11 Model: yolov4 - Device: CPU (Inferences Per Minute, More Is Better)
A: 380 (SE +/- 1.26, N = 3)
B: 368 (SE +/- 0.87, N = 3)
C: 370 (SE +/- 0.44, N = 3)
D: 369 (SE +/- 0.76, N = 3)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.11 Model: bertsquad-12 - Device: CPU (Inferences Per Minute, More Is Better)
A: 435 (SE +/- 1.32, N = 3)
B: 426 (SE +/- 0.73, N = 3)
C: 431 (SE +/- 0.93, N = 3)
D: 431 (SE +/- 0.93, N = 3)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.11 Model: fcn-resnet101-11 - Device: CPU (Inferences Per Minute, More Is Better)
A: 81 (SE +/- 0.29, N = 3)
B: 81 (SE +/- 0.17, N = 3)
C: 81 (SE +/- 0.17, N = 3)
D: 81 (SE +/- 0.17, N = 3)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.11 Model: ArcFace ResNet-100 - Device: CPU (Inferences Per Minute, More Is Better)
A: 1096 (SE +/- 7.58, N = 3)
B: 1092 (SE +/- 3.77, N = 3)
C: 1086 (SE +/- 14.26, N = 3)
D: 1092 (SE +/- 2.68, N = 3)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.11 Model: super-resolution-10 - Device: CPU (Inferences Per Minute, More Is Better)
A: 3634 (SE +/- 24.28, N = 3)
B: 3662 (SE +/- 7.29, N = 3)
C: 3692 (SE +/- 17.99, N = 3)
D: 3787 (SE +/- 9.97, N = 3)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt
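Run D leads on super-resolution-10 by about 4 % over run A. Because every run here has N = 3 trials, one way to put the SE values in context is a 95 % confidence interval, mean +/- t * SE, using Student's t with N - 1 = 2 degrees of freedom (t of about 4.303). This is a sketch under the assumption that the reported SE is the standard error of the mean; `SUPER_RES`, `ci95`, and `overlaps` are hypothetical names.

```python
# Two-sided 95 % Student's t critical value for N - 1 = 2 degrees of freedom
# (every super-resolution-10 run has N = 3 trials).
T_975_DF2 = 4.303

# super-resolution-10 (inferences per minute): (mean, SE) per run, from the result.
SUPER_RES = {"A": (3634, 24.28), "B": (3662, 7.29), "C": (3692, 17.99), "D": (3787, 9.97)}


def ci95(mean, se, t=T_975_DF2):
    """95 % confidence interval for a run mean: mean +/- t * SE."""
    half_width = t * se
    return (mean - half_width, mean + half_width)


def overlaps(a, b):
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


intervals = {run: ci95(mean, se) for run, (mean, se) in SUPER_RES.items()}
for run, (lo, hi) in intervals.items():
    print(f"{run}: {lo:7.1f} .. {hi:7.1f}")

# Compare each other run's interval against run D's.
for run in ("A", "B", "C"):
    verdict = "overlap" if overlaps(intervals[run], intervals["D"]) else "no overlap"
    print(f"{run} vs D: {verdict}")
```

Non-overlapping intervals (A and B against D) are a conservative hint that D really was faster; overlap (C against D) does not by itself rule out a real difference.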
perf-bench Benchmark: Epoll Wait (ops/sec, More Is Better)
A: 5044 (SE +/- 60.84, N = 4)
B: 5166 (SE +/- 56.49, N = 3)
C: 5158 (SE +/- 24.06, N = 3)
D: 5213 (SE +/- 4.63, N = 3)
1. (CC) gcc options: -pthread -shared -lunwind-x86_64 -lunwind -llzma -Xlinker -O6 -ggdb3 -funwind-tables -std=gnu99 -lnuma
perf-bench Benchmark: Futex Hash (ops/sec, More Is Better)
A: 3553285 (SE +/- 9635.16, N = 3)
B: 3578390 (SE +/- 21250.47, N = 3)
C: 3483369 (SE +/- 10646.17, N = 3)
D: 3540563 (SE +/- 7910.60, N = 3)
1. (CC) gcc options: -pthread -shared -lunwind-x86_64 -lunwind -llzma -Xlinker -O6 -ggdb3 -funwind-tables -std=gnu99 -lnuma
perf-bench Benchmark: Sched Pipe (ops/sec, More Is Better)
A: 255487 (SE +/- 13951.36, N = 12)
B: 246745 (SE +/- 10178.25, N = 15)
C: 256473 (SE +/- 11466.52, N = 15)
D: 264487 (SE +/- 12339.61, N = 15)
1. (CC) gcc options: -pthread -shared -lunwind-x86_64 -lunwind -llzma -Xlinker -O6 -ggdb3 -funwind-tables -std=gnu99 -lnuma
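Sched Pipe needed 12 to 15 trials per run, and its SE values are large relative to the means. Assuming the reported SE is SD / sqrt(N), the per-trial standard deviation can be recovered as SE * sqrt(N), which gives a coefficient of variation (CV = SD / mean) for individual trials. The sketch below also prints SE / mean, the relative uncertainty of each run's average. `SCHED_PIPE` and `cv_from_se` are hypothetical names.

```python
import math

# Sched Pipe (ops/sec): (mean, SE, N) per run, copied from the result.
SCHED_PIPE = {
    "A": (255487, 13951.36, 12),
    "B": (246745, 10178.25, 15),
    "C": (256473, 11466.52, 15),
    "D": (264487, 12339.61, 15),
}


def cv_from_se(mean, se, n):
    """Per-trial coefficient of variation, assuming SE = SD / sqrt(N)."""
    sd = se * math.sqrt(n)
    return sd / mean


for run, (mean, se, n) in SCHED_PIPE.items():
    print(f"{run}: per-trial CV = {100 * cv_from_se(mean, se, n):4.1f} %, "
          f"SE/mean = {100 * se / mean:3.1f} %")

# Gap between the slowest and fastest run averages.
means = [mean for mean, _, _ in SCHED_PIPE.values()]
print(f"run-to-run spread: {100 * (max(means) - min(means)) / min(means):.1f} %")
```

Individual trials vary by roughly 16 to 19 %, and each run's average is itself uncertain by about 4 to 5.5 %, so the roughly 7 % gap between the slowest (B) and fastest (D) run is small compared with the noise.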
perf-bench Benchmark: Futex Lock-Pi (ops/sec, More Is Better)
A: 93 (SE +/- 0.00, N = 3)
B: 93 (SE +/- 0.33, N = 3)
C: 93 (SE +/- 0.00, N = 3)
D: 93 (SE +/- 0.33, N = 3)
1. (CC) gcc options: -pthread -shared -lunwind-x86_64 -lunwind -llzma -Xlinker -O6 -ggdb3 -funwind-tables -std=gnu99 -lnuma
perf-bench Benchmark: Syscall Basic (ops/sec, More Is Better)
A: 16605197 (SE +/- 92684.37, N = 3)
B: 16727281 (SE +/- 124978.04, N = 3)
C: 16659267 (SE +/- 167024.68, N = 6)
D: 16878142 (SE +/- 14405.48, N = 3)
1. (CC) gcc options: -pthread -shared -lunwind-x86_64 -lunwind -llzma -Xlinker -O6 -ggdb3 -funwind-tables -std=gnu99 -lnuma
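Throughputs in the millions of ops/sec are often easier to compare as time per operation. The reciprocal, 1e9 / (ops/sec), gives the average nanoseconds per operation implied by the reported throughput; it is a straight unit conversion, not a separately measured latency. `SYSCALL_BASIC` and `ns_per_op` are hypothetical names.

```python
# Syscall Basic throughput (ops/sec) per run, copied from the result.
SYSCALL_BASIC = {"A": 16605197, "B": 16727281, "C": 16659267, "D": 16878142}


def ns_per_op(ops_per_sec):
    """Average time per operation implied by a throughput, in nanoseconds."""
    return 1e9 / ops_per_sec


for run, ops in SYSCALL_BASIC.items():
    print(f"{run}: {ns_per_op(ops):.2f} ns/op")
```

Expressed this way, the four runs fall within about 1 ns of each other (roughly 59.2 to 60.2 ns per operation).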
Phoronix Test Suite v10.8.5