onnx 1.11 Threadripper

AMD Ryzen Threadripper 3990X 64-Core testing with a Gigabyte TRX40 AORUS PRO WIFI (F4p BIOS) and AMD Radeon RX 5700 8GB on Pop 21.10 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2203274-PTS-ONNX111T03&grs&sor .

System configuration (identical for runs A, B, C, D):

  Processor:          AMD Ryzen Threadripper 3990X 64-Core @ 2.90GHz (64 Cores / 128 Threads)
  Motherboard:        Gigabyte TRX40 AORUS PRO WIFI (F4p BIOS)
  Chipset:            AMD Starship/Matisse
  Memory:             128GB
  Disk:               Samsung SSD 970 EVO Plus 500GB
  Graphics:           AMD Radeon RX 5700 8GB (1750/875MHz)
  Audio:              AMD Navi 10 HDMI Audio
  Monitor:            DELL P2415Q
  Network:            Intel I211 + Intel Wi-Fi 6 AX200
  OS:                 Pop 21.10
  Kernel:             5.17.0-rc1-sched-core-phx (x86_64)
  Desktop:            GNOME Shell 40.5
  Display Server:     X Server
  OpenGL:             4.6 Mesa 21.2.2 (LLVM 12.0.1)
  Vulkan:             1.2.182
  Compiler:           GCC 11.2.0
  File-System:        ext4
  Screen Resolution:  3840x2160

Kernel Details:
  - Transparent Huge Pages: madvise

Compiler Details:
  - --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-11-ZPT0kp/gcc-11-11.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-ZPT0kp/gcc-11-11.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v

Processor Details:
  - Scaling Governor: acpi-cpufreq schedutil (Boost: Enabled)
  - CPU Microcode: 0x8301039

Python Details:
  - Python 3.9.7

Security Details:
  - itlb_multihit: Not affected
  - l1tf: Not affected
  - mds: Not affected
  - meltdown: Not affected
  - spec_store_bypass: Mitigation of SSB disabled via prctl
  - spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
  - spectre_v2: Mitigation of Full AMD retpoline IBPB: conditional STIBP: conditional RSB filling
  - srbds: Not affected
  - tsx_async_abort: Not affected

Results overview:

  Test                                A           B           C           D
  onnx: GPT-2 - CPU                   4120        3850        3868        3869
  onnx: super-resolution-10 - CPU     3634        3662        3692        3787
  perf-bench: Epoll Wait              5044        5166        5158        5213
  onnx: yolov4 - CPU                  380         368         370         369
  perf-bench: Futex Hash              3553285     3578390     3483369     3540563
  onnx: bertsquad-12 - CPU            435         426         431         431
  perf-bench: Syscall Basic           16605197    16727281    16659267    16878142
  perf-bench: Memcpy 1MB              12.078737   11.959034   12.086277   11.973617
  onnx: ArcFace ResNet-100 - CPU      1096        1092        1086        1092
  perf-bench: Memset 1MB              69.637544   69.555229   69.494542   69.496049
  onnx: fcn-resnet101-11 - CPU        81          81          81          81
  perf-bench: Futex Lock-Pi           93          93          93          93
  perf-bench: Sched Pipe              255487      246745      256473      264487

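To see how far apart the four runs are, the overview values can be reduced to a relative spread per test. The following is a minimal Python sketch, not part of the Phoronix Test Suite output. It covers only a handful of tests, with values copied from the overview for runs A, B, C, D in that order. The names RESULTS, spread_percent, and summarize are illustrative, and the spread definition (max minus min, divided by min) is an assumption chosen for simplicity.

```python
# Per-run results copied from the results overview (runs A, B, C, D, in that order).
# Only a subset of the tests is listed here for brevity.
RESULTS = {
    "onnx: GPT-2 - CPU": (4120, 3850, 3868, 3869),
    "onnx: super-resolution-10 - CPU": (3634, 3662, 3692, 3787),
    "perf-bench: Epoll Wait": (5044, 5166, 5158, 5213),
    "onnx: yolov4 - CPU": (380, 368, 370, 369),
    "perf-bench: Sched Pipe": (255487, 246745, 256473, 264487),
}


def spread_percent(values):
    """Return (max - min) / min as a percentage (assumed spread definition)."""
    lo, hi = min(values), max(values)
    return (hi - lo) / lo * 100.0


def summarize(results):
    """Map each test name to its run-to-run spread, rounded to two decimals."""
    return {name: round(spread_percent(vals), 2) for name, vals in results.items()}


if __name__ == "__main__":
    for name, pct in summarize(RESULTS).items():
        print(f"{name}: {pct:.2f}% spread across runs")
```

A spread like this should be read together with the per-test standard errors reported with each result, since some tests (for example Sched Pipe) show large run-to-run variance.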
ONNX Runtime 1.11
Model: GPT-2 - Device: CPU
Inferences Per Minute, More Is Better
  A: 4120 (SE +/- 14.70, N = 3)
  D: 3869 (SE +/- 1.15, N = 3)
  C: 3868 (SE +/- 10.15, N = 3)
  B: 3850 (SE +/- 34.56, N = 3)
  1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.11
Model: super-resolution-10 - Device: CPU
Inferences Per Minute, More Is Better
  D: 3787 (SE +/- 9.97, N = 3)
  C: 3692 (SE +/- 17.99, N = 3)
  B: 3662 (SE +/- 7.29, N = 3)
  A: 3634 (SE +/- 24.28, N = 3)
  1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt

perf-bench
Benchmark: Epoll Wait
ops/sec, More Is Better
  D: 5213 (SE +/- 4.63, N = 3)
  B: 5166 (SE +/- 56.49, N = 3)
  C: 5158 (SE +/- 24.06, N = 3)
  A: 5044 (SE +/- 60.84, N = 4)
  1. (CC) gcc options: -pthread -shared -lunwind-x86_64 -lunwind -llzma -Xlinker -O6 -ggdb3 -funwind-tables -std=gnu99 -lnuma

ONNX Runtime 1.11
Model: yolov4 - Device: CPU
Inferences Per Minute, More Is Better
  A: 380 (SE +/- 1.26, N = 3)
  C: 370 (SE +/- 0.44, N = 3)
  D: 369 (SE +/- 0.76, N = 3)
  B: 368 (SE +/- 0.87, N = 3)
  1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt

perf-bench
Benchmark: Futex Hash
ops/sec, More Is Better
  B: 3578390 (SE +/- 21250.47, N = 3)
  A: 3553285 (SE +/- 9635.16, N = 3)
  D: 3540563 (SE +/- 7910.60, N = 3)
  C: 3483369 (SE +/- 10646.17, N = 3)
  1. (CC) gcc options: -pthread -shared -lunwind-x86_64 -lunwind -llzma -Xlinker -O6 -ggdb3 -funwind-tables -std=gnu99 -lnuma

ONNX Runtime 1.11
Model: bertsquad-12 - Device: CPU
Inferences Per Minute, More Is Better
  A: 435 (SE +/- 1.32, N = 3)
  D: 431 (SE +/- 0.93, N = 3)
  C: 431 (SE +/- 0.93, N = 3)
  B: 426 (SE +/- 0.73, N = 3)
  1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt

perf-bench
Benchmark: Syscall Basic
ops/sec, More Is Better
  D: 16878142 (SE +/- 14405.48, N = 3)
  B: 16727281 (SE +/- 124978.04, N = 3)
  C: 16659267 (SE +/- 167024.68, N = 6)
  A: 16605197 (SE +/- 92684.37, N = 3)
  1. (CC) gcc options: -pthread -shared -lunwind-x86_64 -lunwind -llzma -Xlinker -O6 -ggdb3 -funwind-tables -std=gnu99 -lnuma

perf-bench
Benchmark: Memcpy 1MB
GB/sec, More Is Better
  C: 12.09 (SE +/- 0.08, N = 13)
  A: 12.08 (SE +/- 0.04, N = 3)
  D: 11.97 (SE +/- 0.13, N = 3)
  B: 11.96 (SE +/- 0.13, N = 15)
  1. (CC) gcc options: -pthread -shared -lunwind-x86_64 -lunwind -llzma -Xlinker -O6 -ggdb3 -funwind-tables -std=gnu99 -lnuma

ONNX Runtime 1.11
Model: ArcFace ResNet-100 - Device: CPU
Inferences Per Minute, More Is Better
  A: 1096 (SE +/- 7.58, N = 3)
  D: 1092 (SE +/- 2.68, N = 3)
  B: 1092 (SE +/- 3.77, N = 3)
  C: 1086 (SE +/- 14.26, N = 3)
  1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt

perf-bench
Benchmark: Memset 1MB
GB/sec, More Is Better
  A: 69.64 (SE +/- 0.56, N = 9)
  B: 69.56 (SE +/- 0.56, N = 3)
  D: 69.50 (SE +/- 0.65, N = 15)
  C: 69.49 (SE +/- 0.81, N = 15)
  1. (CC) gcc options: -pthread -shared -lunwind-x86_64 -lunwind -llzma -Xlinker -O6 -ggdb3 -funwind-tables -std=gnu99 -lnuma

ONNX Runtime 1.11
Model: fcn-resnet101-11 - Device: CPU
Inferences Per Minute, More Is Better
  D: 81 (SE +/- 0.17, N = 3)
  C: 81 (SE +/- 0.17, N = 3)
  B: 81 (SE +/- 0.17, N = 3)
  A: 81 (SE +/- 0.29, N = 3)
  1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt

perf-bench
Benchmark: Futex Lock-Pi
ops/sec, More Is Better
  D: 93 (SE +/- 0.33, N = 3)
  C: 93 (SE +/- 0.00, N = 3)
  B: 93 (SE +/- 0.33, N = 3)
  A: 93 (SE +/- 0.00, N = 3)
  1. (CC) gcc options: -pthread -shared -lunwind-x86_64 -lunwind -llzma -Xlinker -O6 -ggdb3 -funwind-tables -std=gnu99 -lnuma

perf-bench
Benchmark: Sched Pipe
ops/sec, More Is Better
  D: 264487 (SE +/- 12339.61, N = 15)
  C: 256473 (SE +/- 11466.52, N = 15)
  A: 255487 (SE +/- 13951.36, N = 12)
  B: 246745 (SE +/- 10178.25, N = 15)
  1. (CC) gcc options: -pthread -shared -lunwind-x86_64 -lunwind -llzma -Xlinker -O6 -ggdb3 -funwind-tables -std=gnu99 -lnuma

Phoronix Test Suite v10.8.5