onnx 1.11 Threadripper

AMD Ryzen Threadripper 3990X 64-Core testing with a Gigabyte TRX40 AORUS PRO WIFI (F4p BIOS) and AMD Radeon RX 5700 8GB on Pop 21.10 via the Phoronix Test Suite.

HTML result view exported from: https://openbenchmarking.org/result/2203274-PTS-ONNX111T03.

onnx 1.11 ThreadripperProcessorMotherboardChipsetMemoryDiskGraphicsAudioMonitorNetworkOSKernelDesktopDisplay ServerOpenGLVulkanCompilerFile-SystemScreen ResolutionABCDAMD Ryzen Threadripper 3990X 64-Core @ 2.90GHz (64 Cores / 128 Threads)Gigabyte TRX40 AORUS PRO WIFI (F4p BIOS)AMD Starship/Matisse128GBSamsung SSD 970 EVO Plus 500GBAMD Radeon RX 5700 8GB (1750/875MHz)AMD Navi 10 HDMI AudioDELL P2415QIntel I211 + Intel Wi-Fi 6 AX200Pop 21.105.17.0-rc1-sched-core-phx (x86_64)GNOME Shell 40.5X Server4.6 Mesa 21.2.2 (LLVM 12.0.1)1.2.182GCC 11.2.0ext43840x2160OpenBenchmarking.orgKernel Details- Transparent Huge Pages: madviseCompiler Details- --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-11-ZPT0kp/gcc-11-11.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-ZPT0kp/gcc-11-11.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v Processor Details- Scaling Governor: acpi-cpufreq schedutil (Boost: Enabled) - CPU Microcode: 0x8301039Python Details- Python 3.9.7Security Details- itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Full AMD retpoline IBPB: conditional STIBP: conditional RSB filling + srbds: Not affected + tsx_async_abort: Not affected

onnx 1.11 Threadripperperf-bench: Epoll Waitperf-bench: Futex Hashperf-bench: Memcpy 1MBperf-bench: Memset 1MBperf-bench: Sched Pipeperf-bench: Futex Lock-Piperf-bench: Syscall Basiconnx: GPT-2 - CPUonnx: yolov4 - CPUonnx: bertsquad-12 - CPUonnx: fcn-resnet101-11 - CPUonnx: ArcFace ResNet-100 - CPUonnx: super-resolution-10 - CPUABCD5044355328512.07873769.6375442554879316605197412038043581109636345166357839011.95903469.5552292467459316727281385036842681109236625158348336912.08627769.4945422564739316659267386837043181108636925213354056311.97361769.496049264487931687814238693694318110923787OpenBenchmarking.org

perf-bench

Benchmark: Epoll Wait

OpenBenchmarking.orgops/sec, More Is Betterperf-benchBenchmark: Epoll WaitABCD11002200330044005500SE +/- 60.84, N = 4SE +/- 56.49, N = 3SE +/- 24.06, N = 3SE +/- 4.63, N = 350445166515852131. (CC) gcc options: -pthread -shared -lunwind-x86_64 -lunwind -llzma -Xlinker -O6 -ggdb3 -funwind-tables -std=gnu99 -lnuma

perf-bench

Benchmark: Futex Hash

OpenBenchmarking.orgops/sec, More Is Betterperf-benchBenchmark: Futex HashABCD800K1600K2400K3200K4000KSE +/- 9635.16, N = 3SE +/- 21250.47, N = 3SE +/- 10646.17, N = 3SE +/- 7910.60, N = 335532853578390348336935405631. (CC) gcc options: -pthread -shared -lunwind-x86_64 -lunwind -llzma -Xlinker -O6 -ggdb3 -funwind-tables -std=gnu99 -lnuma

perf-bench

Benchmark: Memcpy 1MB

OpenBenchmarking.orgGB/sec, More Is Betterperf-benchBenchmark: Memcpy 1MBABCD3691215SE +/- 0.04, N = 3SE +/- 0.13, N = 15SE +/- 0.08, N = 13SE +/- 0.13, N = 312.0811.9612.0911.971. (CC) gcc options: -pthread -shared -lunwind-x86_64 -lunwind -llzma -Xlinker -O6 -ggdb3 -funwind-tables -std=gnu99 -lnuma

perf-bench

Benchmark: Memset 1MB

OpenBenchmarking.orgGB/sec, More Is Betterperf-benchBenchmark: Memset 1MBABCD1530456075SE +/- 0.56, N = 9SE +/- 0.56, N = 3SE +/- 0.81, N = 15SE +/- 0.65, N = 1569.6469.5669.4969.501. (CC) gcc options: -pthread -shared -lunwind-x86_64 -lunwind -llzma -Xlinker -O6 -ggdb3 -funwind-tables -std=gnu99 -lnuma

perf-bench

Benchmark: Sched Pipe

OpenBenchmarking.orgops/sec, More Is Betterperf-benchBenchmark: Sched PipeABCD60K120K180K240K300KSE +/- 13951.36, N = 12SE +/- 10178.25, N = 15SE +/- 11466.52, N = 15SE +/- 12339.61, N = 152554872467452564732644871. (CC) gcc options: -pthread -shared -lunwind-x86_64 -lunwind -llzma -Xlinker -O6 -ggdb3 -funwind-tables -std=gnu99 -lnuma

perf-bench

Benchmark: Futex Lock-Pi

OpenBenchmarking.orgops/sec, More Is Betterperf-benchBenchmark: Futex Lock-PiABCD20406080100SE +/- 0.00, N = 3SE +/- 0.33, N = 3SE +/- 0.00, N = 3SE +/- 0.33, N = 3939393931. (CC) gcc options: -pthread -shared -lunwind-x86_64 -lunwind -llzma -Xlinker -O6 -ggdb3 -funwind-tables -std=gnu99 -lnuma

perf-bench

Benchmark: Syscall Basic

OpenBenchmarking.orgops/sec, More Is Betterperf-benchBenchmark: Syscall BasicABCD4M8M12M16M20MSE +/- 92684.37, N = 3SE +/- 124978.04, N = 3SE +/- 167024.68, N = 6SE +/- 14405.48, N = 3166051971672728116659267168781421. (CC) gcc options: -pthread -shared -lunwind-x86_64 -lunwind -llzma -Xlinker -O6 -ggdb3 -funwind-tables -std=gnu99 -lnuma

ONNX Runtime

Model: GPT-2 - Device: CPU

OpenBenchmarking.orgInferences Per Minute, More Is BetterONNX Runtime 1.11Model: GPT-2 - Device: CPUABCD9001800270036004500SE +/- 14.70, N = 3SE +/- 34.56, N = 3SE +/- 10.15, N = 3SE +/- 1.15, N = 341203850386838691. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: yolov4 - Device: CPU

OpenBenchmarking.orgInferences Per Minute, More Is BetterONNX Runtime 1.11Model: yolov4 - Device: CPUABCD80160240320400SE +/- 1.26, N = 3SE +/- 0.87, N = 3SE +/- 0.44, N = 3SE +/- 0.76, N = 33803683703691. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: bertsquad-12 - Device: CPU

OpenBenchmarking.orgInferences Per Minute, More Is BetterONNX Runtime 1.11Model: bertsquad-12 - Device: CPUABCD90180270360450SE +/- 1.32, N = 3SE +/- 0.73, N = 3SE +/- 0.93, N = 3SE +/- 0.93, N = 34354264314311. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: fcn-resnet101-11 - Device: CPU

OpenBenchmarking.orgInferences Per Minute, More Is BetterONNX Runtime 1.11Model: fcn-resnet101-11 - Device: CPUABCD20406080100SE +/- 0.29, N = 3SE +/- 0.17, N = 3SE +/- 0.17, N = 3SE +/- 0.17, N = 3818181811. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: ArcFace ResNet-100 - Device: CPU

OpenBenchmarking.orgInferences Per Minute, More Is BetterONNX Runtime 1.11Model: ArcFace ResNet-100 - Device: CPUABCD2004006008001000SE +/- 7.58, N = 3SE +/- 3.77, N = 3SE +/- 14.26, N = 3SE +/- 2.68, N = 310961092108610921. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: super-resolution-10 - Device: CPU

OpenBenchmarking.orgInferences Per Minute, More Is BetterONNX Runtime 1.11Model: super-resolution-10 - Device: CPUABCD8001600240032004000SE +/- 24.28, N = 3SE +/- 7.29, N = 3SE +/- 17.99, N = 3SE +/- 9.97, N = 336343662369237871. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt


Phoronix Test Suite v10.8.4