onnx 1.11 Threadripper

AMD Ryzen Threadripper 3990X 64-Core testing with a Gigabyte TRX40 AORUS PRO WIFI (F4p BIOS) and AMD Radeon RX 5700 8GB on Pop 21.10 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2203274-PTS-ONNX111T03&grw&sor .
onnx 1.11 Threadripper: system details
Runs: A, B, C, D (one shared system configuration)

Processor: AMD Ryzen Threadripper 3990X 64-Core @ 2.90GHz (64 Cores / 128 Threads)
Motherboard: Gigabyte TRX40 AORUS PRO WIFI (F4p BIOS)
Chipset: AMD Starship/Matisse
Memory: 128GB
Disk: Samsung SSD 970 EVO Plus 500GB
Graphics: AMD Radeon RX 5700 8GB (1750/875MHz)
Audio: AMD Navi 10 HDMI Audio
Monitor: DELL P2415Q
Network: Intel I211 + Intel Wi-Fi 6 AX200
OS: Pop 21.10
Kernel: 5.17.0-rc1-sched-core-phx (x86_64)
Desktop: GNOME Shell 40.5
Display Server: X Server
OpenGL: 4.6 Mesa 21.2.2 (LLVM 12.0.1)
Vulkan: 1.2.182
Compiler: GCC 11.2.0
File-System: ext4
Screen Resolution: 3840x2160

Kernel Details
- Transparent Huge Pages: madvise

Compiler Details
- --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-11-ZPT0kp/gcc-11-11.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-ZPT0kp/gcc-11-11.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v

Processor Details
- Scaling Governor: acpi-cpufreq schedutil (Boost: Enabled)
- CPU Microcode: 0x8301039

Python Details
- Python 3.9.7

Security Details
- itlb_multihit: Not affected
- l1tf: Not affected
- mds: Not affected
- meltdown: Not affected
- spec_store_bypass: Mitigation of SSB disabled via prctl
- spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
- spectre_v2: Mitigation of Full AMD retpoline IBPB: conditional STIBP: conditional RSB filling
- srbds: Not affected
- tsx_async_abort: Not affected
onnx 1.11 Threadripper: results overview

Test                              A            B            C            D
onnx: GPT-2 - CPU                 4120         3850         3868         3869
onnx: yolov4 - CPU                380          368          370          369
onnx: bertsquad-12 - CPU          435          426          431          431
onnx: fcn-resnet101-11 - CPU      81           81           81           81
onnx: ArcFace ResNet-100 - CPU    1096         1092         1086         1092
onnx: super-resolution-10 - CPU   3634         3662         3692         3787
perf-bench: Epoll Wait            5044         5166         5158         5213
perf-bench: Futex Hash            3553285      3578390      3483369      3540563
perf-bench: Memcpy 1MB            12.078737    11.959034    12.086277    11.973617
perf-bench: Memset 1MB            69.637544    69.555229    69.494542    69.496049
perf-bench: Sched Pipe            255487       246745       256473       264487
perf-bench: Futex Lock-Pi         93           93           93           93
perf-bench: Syscall Basic         16605197     16727281     16659267     16878142
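
To make the run-to-run differences in the overview easier to read, the following is a minimal sketch that computes, for each test, the fastest run and the spread between the slowest and fastest run as a percentage of the slowest. The values are copied from the overview above; the helper names `spread_percent` and `fastest_run` are illustrative and are not part of the Phoronix Test Suite. It assumes higher is better for every test, which matches the "More Is Better" label on each graph.

```python
# Values copied from the results overview (runs A-D).
# Higher is better for every test listed here.
results = {
    "onnx: GPT-2 - CPU": {"A": 4120, "B": 3850, "C": 3868, "D": 3869},
    "onnx: yolov4 - CPU": {"A": 380, "B": 368, "C": 370, "D": 369},
    "onnx: bertsquad-12 - CPU": {"A": 435, "B": 426, "C": 431, "D": 431},
    "onnx: fcn-resnet101-11 - CPU": {"A": 81, "B": 81, "C": 81, "D": 81},
    "onnx: ArcFace ResNet-100 - CPU": {"A": 1096, "B": 1092, "C": 1086, "D": 1092},
    "onnx: super-resolution-10 - CPU": {"A": 3634, "B": 3662, "C": 3692, "D": 3787},
    "perf-bench: Epoll Wait": {"A": 5044, "B": 5166, "C": 5158, "D": 5213},
    "perf-bench: Futex Hash": {"A": 3553285, "B": 3578390, "C": 3483369, "D": 3540563},
    "perf-bench: Memcpy 1MB": {"A": 12.078737, "B": 11.959034, "C": 12.086277, "D": 11.973617},
    "perf-bench: Memset 1MB": {"A": 69.637544, "B": 69.555229, "C": 69.494542, "D": 69.496049},
    "perf-bench: Sched Pipe": {"A": 255487, "B": 246745, "C": 256473, "D": 264487},
    "perf-bench: Futex Lock-Pi": {"A": 93, "B": 93, "C": 93, "D": 93},
    "perf-bench: Syscall Basic": {"A": 16605197, "B": 16727281, "C": 16659267, "D": 16878142},
}


def spread_percent(runs):
    """Return (max - min) / min as a percentage for one test."""
    lo = min(runs.values())
    hi = max(runs.values())
    return (hi - lo) / lo * 100.0


def fastest_run(runs):
    """Return the label of the run with the highest result (first one on ties)."""
    return max(runs, key=runs.get)


for test, runs in results.items():
    print(f"{test:32s} fastest={fastest_run(runs)} spread={spread_percent(runs):5.2f}%")
```

A spread of a few percent should be read together with the standard errors reported per test before concluding that one run is genuinely faster.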
ONNX Runtime 1.11, Model: GPT-2 - Device: CPU (Inferences Per Minute, More Is Better)
  A: 4120 (SE +/- 14.70, N = 3)
  D: 3869 (SE +/- 1.15, N = 3)
  C: 3868 (SE +/- 10.15, N = 3)
  B: 3850 (SE +/- 34.56, N = 3)
  1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt
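
The GPT-2 result is the one case where run A stands apart from the others, so it is a useful place to illustrate how the reported standard errors can be used. The sketch below divides the difference between two means by their combined standard error. It assumes the SE values are listed in the same order as the run labels (A, D, C, B), and the function name `separation` is illustrative. With only N = 3 samples per run this is a rough indicator, not a formal significance test.

```python
import math


def separation(mean_a, se_a, mean_b, se_b):
    """Difference between two means in units of their combined standard error."""
    combined_se = math.sqrt(se_a ** 2 + se_b ** 2)
    return abs(mean_a - mean_b) / combined_se


# GPT-2, run A vs. run B (means and SE values taken from the GPT-2 result)
a_vs_b = separation(4120, 14.70, 3850, 34.56)

# GPT-2, run D vs. run C
d_vs_c = separation(3869, 1.15, 3868, 10.15)

print(f"A vs B: {a_vs_b:.2f} combined standard errors apart")
print(f"D vs C: {d_vs_c:.2f} combined standard errors apart")
```

A ratio well above 2 suggests the gap is larger than run-to-run noise, while a ratio below 1 means the two runs cannot be told apart from these samples.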
ONNX Runtime 1.11, Model: yolov4 - Device: CPU (Inferences Per Minute, More Is Better)
  A: 380 (SE +/- 1.26, N = 3)
  C: 370 (SE +/- 0.44, N = 3)
  D: 369 (SE +/- 0.76, N = 3)
  B: 368 (SE +/- 0.87, N = 3)
  1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.11, Model: bertsquad-12 - Device: CPU (Inferences Per Minute, More Is Better)
  A: 435 (SE +/- 1.32, N = 3)
  D: 431 (SE +/- 0.93, N = 3)
  C: 431 (SE +/- 0.93, N = 3)
  B: 426 (SE +/- 0.73, N = 3)
  1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.11, Model: fcn-resnet101-11 - Device: CPU (Inferences Per Minute, More Is Better)
  D: 81 (SE +/- 0.17, N = 3)
  C: 81 (SE +/- 0.17, N = 3)
  B: 81 (SE +/- 0.17, N = 3)
  A: 81 (SE +/- 0.29, N = 3)
  1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.11, Model: ArcFace ResNet-100 - Device: CPU (Inferences Per Minute, More Is Better)
  A: 1096 (SE +/- 7.58, N = 3)
  D: 1092 (SE +/- 2.68, N = 3)
  B: 1092 (SE +/- 3.77, N = 3)
  C: 1086 (SE +/- 14.26, N = 3)
  1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.11, Model: super-resolution-10 - Device: CPU (Inferences Per Minute, More Is Better)
  D: 3787 (SE +/- 9.97, N = 3)
  C: 3692 (SE +/- 17.99, N = 3)
  B: 3662 (SE +/- 7.29, N = 3)
  A: 3634 (SE +/- 24.28, N = 3)
  1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt
perf-bench, Benchmark: Epoll Wait (ops/sec, More Is Better)
  D: 5213 (SE +/- 4.63, N = 3)
  B: 5166 (SE +/- 56.49, N = 3)
  C: 5158 (SE +/- 24.06, N = 3)
  A: 5044 (SE +/- 60.84, N = 4)
  1. (CC) gcc options: -pthread -shared -lunwind-x86_64 -lunwind -llzma -Xlinker -O6 -ggdb3 -funwind-tables -std=gnu99 -lnuma
perf-bench, Benchmark: Futex Hash (ops/sec, More Is Better)
  B: 3578390 (SE +/- 21250.47, N = 3)
  A: 3553285 (SE +/- 9635.16, N = 3)
  D: 3540563 (SE +/- 7910.60, N = 3)
  C: 3483369 (SE +/- 10646.17, N = 3)
  1. (CC) gcc options: -pthread -shared -lunwind-x86_64 -lunwind -llzma -Xlinker -O6 -ggdb3 -funwind-tables -std=gnu99 -lnuma
perf-bench, Benchmark: Memcpy 1MB (GB/sec, More Is Better)
  C: 12.09 (SE +/- 0.08, N = 13)
  A: 12.08 (SE +/- 0.04, N = 3)
  D: 11.97 (SE +/- 0.13, N = 3)
  B: 11.96 (SE +/- 0.13, N = 15)
  1. (CC) gcc options: -pthread -shared -lunwind-x86_64 -lunwind -llzma -Xlinker -O6 -ggdb3 -funwind-tables -std=gnu99 -lnuma
perf-bench, Benchmark: Memset 1MB (GB/sec, More Is Better)
  A: 69.64 (SE +/- 0.56, N = 9)
  B: 69.56 (SE +/- 0.56, N = 3)
  D: 69.50 (SE +/- 0.65, N = 15)
  C: 69.49 (SE +/- 0.81, N = 15)
  1. (CC) gcc options: -pthread -shared -lunwind-x86_64 -lunwind -llzma -Xlinker -O6 -ggdb3 -funwind-tables -std=gnu99 -lnuma
perf-bench, Benchmark: Sched Pipe (ops/sec, More Is Better)
  D: 264487 (SE +/- 12339.61, N = 15)
  C: 256473 (SE +/- 11466.52, N = 15)
  A: 255487 (SE +/- 13951.36, N = 12)
  B: 246745 (SE +/- 10178.25, N = 15)
  1. (CC) gcc options: -pthread -shared -lunwind-x86_64 -lunwind -llzma -Xlinker -O6 -ggdb3 -funwind-tables -std=gnu99 -lnuma
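
Sched Pipe reports the largest standard errors relative to its means, along with higher sample counts (N = 12 to 15). As a rough aid for judging how noisy such a result is, the sketch below derives the relative standard error and the implied sample standard deviation from a reported mean, SE, and N, using the standard relation SD = SE * sqrt(N). The function names are illustrative, and the pairing of SE values with run labels assumes both are listed in the same order.

```python
import math


def relative_se_percent(mean, se):
    """Standard error expressed as a percentage of the mean."""
    return se / mean * 100.0


def implied_std_dev(se, n):
    """Sample standard deviation implied by a standard error and sample count."""
    return se * math.sqrt(n)


# Sched Pipe, run D: mean 264487 ops/sec, SE +/- 12339.61, N = 15
rse_d = relative_se_percent(264487, 12339.61)
sd_d = implied_std_dev(12339.61, 15)

print(f"relative SE: {rse_d:.2f}%")
print(f"implied sample standard deviation: {sd_d:.0f} ops/sec")
```

When the relative standard error is of the same order as the gap between runs, as it is here, the ordering of the runs should not be over-interpreted.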
perf-bench, Benchmark: Futex Lock-Pi (ops/sec, More Is Better)
  D: 93 (SE +/- 0.33, N = 3)
  C: 93 (SE +/- 0.00, N = 3)
  B: 93 (SE +/- 0.33, N = 3)
  A: 93 (SE +/- 0.00, N = 3)
  1. (CC) gcc options: -pthread -shared -lunwind-x86_64 -lunwind -llzma -Xlinker -O6 -ggdb3 -funwind-tables -std=gnu99 -lnuma
perf-bench, Benchmark: Syscall Basic (ops/sec, More Is Better)
  D: 16878142 (SE +/- 14405.48, N = 3)
  B: 16727281 (SE +/- 124978.04, N = 3)
  C: 16659267 (SE +/- 167024.68, N = 6)
  A: 16605197 (SE +/- 92684.37, N = 3)
  1. (CC) gcc options: -pthread -shared -lunwind-x86_64 -lunwind -llzma -Xlinker -O6 -ggdb3 -funwind-tables -std=gnu99 -lnuma
Phoronix Test Suite v10.8.5