CUDA NVIDIA Tegra X1 GPGPU Linux Tests

Benchmarks by Michael Larabel for a future article on Phoronix.com, providing various GPGPU benchmark results for reference purposes.

HTML result view exported from: https://openbenchmarking.org/result/1812280-SK-1511154HA50&sor&grw.

Jetson TX1

Processor: Cortex A57 rev 1 @ 1.91GHz (4 Cores)
Motherboard: jetson_tx1
Memory: 4096MB
Disk: 16GB 016G32 + 16GB SL16G
Graphics: NVIDIA TEGRA
OS: Ubuntu 14.04
Kernel: 3.10.67-g3a5c467 (aarch64)
Desktop: Unity 7.2.2
Display Server: X Server 1.15.1
Display Driver: NVIDIA 1.0.0
Compiler: GCC 4.8.4 + CUDA 7.0
File-System: ext4
Screen Resolution: 3840x2160

NVIDIA GTX 650 Ti

Processor: Intel Core i5-2400 @ 3.40GHz (4 Cores)
Motherboard: ASUS P8H67-M PRO (3802 BIOS)
Chipset: Intel 2nd Generation Core Family DRAM
Memory: 16384MB
Disk: 1000GB Western Digital WD10EALX-009 + 250GB Western Digital WD2500AAKX-7 + SSD 240GB
Graphics: Intel 2nd Generation Core Family IGP 981MB
Audio: Realtek ALC892
Monitor: DELL E178WFP
Network: Realtek RTL8111/8168/8411
OS: Ubuntu 16.04
Kernel: 4.4.0-140-generic (x86_64)
Desktop: Unity 7.4.5
Display Server: X Server 1.18.4
Display Driver: modesetting 1.18.4
Compiler: GCC 5.5.0 20171010 + CUDA 9.0
Screen Resolution: 1366x768

Compiler Details

- Jetson TX1: --build=arm-linux-gnueabihf --disable-browser-plugin --disable-libitm --disable-libmudflap --disable-libquadmath --disable-sjlj-exceptions --disable-werror --enable-checking=release --enable-clocale=gnu --enable-gnu-unique-object --enable-gtk-cairo --enable-java-awt=gtk --enable-java-home --enable-languages=c,c++,java,go,d,fortran,objc,obj-c++ --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc --enable-plugin --enable-shared --enable-threads=posix --host=arm-linux-gnueabihf --target=arm-linux-gnueabihf --with-arch-directory=arm --with-arch=armv7-a --with-float=hard --with-fpu=vfpv3-d16 --with-mode=thumb -v
- NVIDIA GTX 650 Ti: --build=x86_64-linux-gnu --disable-browser-plugin --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-gnu-unique-object --enable-gtk-cairo --enable-java-awt=gtk --enable-java-home --enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --enable-libmpx --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-arch-directory=amd64 --with-default-libstdcxx-abi=new --with-multilib-list=m32,m64,mx32 --with-tune=generic -v

Processor Details

- Jetson TX1: Scaling Governor: tegra interactive
- NVIDIA GTX 650 Ti: Scaling Governor: intel_pstate powersave

Security Details

- NVIDIA GTX 650 Ti: KPTI + __user pointer sanitization + Full generic retpoline IBPB (Intel v4) IBRS_FW + SSB disabled via prctl and seccomp + PTE Inversion

Test                                        Jetson TX1    NVIDIA GTX 650 Ti
cuda-mini-nbody: Flush Denormals To Zero    538.07
cuda-mini-nbody: Loop Unrolling             236.40        2.09
cuda-mini-nbody: SOA Data Layout            529.59
cuda-mini-nbody: Original                   513.47
cuda-mini-nbody: Cache Blocking             277.48
shoc: CUDA - FFT SP                         3.92
shoc: CUDA - MD5 Hash                       0.62
shoc: CUDA - Texture Read Bandwidth         46.62
askap: Gridding                             262.83
askap: Degridding                           649.05
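As a rough aid for reading the mini-nbody figures above, the sketch below expresses each Jetson TX1 variant as a speedup relative to the "Original" run. The times are copied from the overview. The dictionary name `jetson_seconds` and the helper `speedup_vs_original` are illustrative names, not part of the Phoronix Test Suite. Standard errors are ignored here, so small differences between variants should not be over-interpreted.

```python
# Jetson TX1 mini-nbody run times in seconds (fewer is better),
# copied from the overview figures in this result file.
jetson_seconds = {
    "Original": 513.47,
    "Flush Denormals To Zero": 538.07,
    "Loop Unrolling": 236.40,
    "SOA Data Layout": 529.59,
    "Cache Blocking": 277.48,
}


def speedup_vs_original(times):
    """Return baseline_time / variant_time per variant (> 1 means faster)."""
    baseline = times["Original"]
    return {name: baseline / seconds for name, seconds in times.items()}


speedups = speedup_vs_original(jetson_seconds)

# Print the variants from fastest to slowest relative to the baseline.
for name, factor in sorted(speedups.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<26} {factor:5.2f}x")
```

A factor below 1 means the variant ran slower than the "Original" configuration on this board.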

CUDA Mini-Nbody

Test: Flush Denormals To Zero

CUDA Mini-Nbody 2015-11-10 (Seconds, Fewer Is Better)
Jetson TX1: 538.07 (SE +/- 5.74, N = 3)

CUDA Mini-Nbody

Test: Loop Unrolling

CUDA Mini-Nbody 2015-11-10 (Seconds, Fewer Is Better)
NVIDIA GTX 650 Ti: 2.09 (SE +/- 0.03, N = 3)
Jetson TX1: 236.40 (SE +/- 3.29, N = 6)
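Loop Unrolling is the only test in this file with a result for both systems, so it is the only place a direct ratio can be computed. A minimal sketch of that arithmetic follows. The variable names are illustrative. Note that the two systems ran different CUDA releases (7.0 and 9.0) and different sample counts, so the ratio is indicative only.

```python
# Loop Unrolling run times in seconds (fewer is better), from the result above.
jetson_tx1_seconds = 236.40
gtx_650_ti_seconds = 2.09

# How many times longer the Jetson TX1 run took than the GTX 650 Ti run.
ratio = jetson_tx1_seconds / gtx_650_ti_seconds

print(f"Jetson TX1 took {ratio:.1f}x as long as the NVIDIA GTX 650 Ti")
```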

CUDA Mini-Nbody

Test: SOA Data Layout

CUDA Mini-Nbody 2015-11-10 (Seconds, Fewer Is Better)
Jetson TX1: 529.59 (SE +/- 8.35, N = 3)

CUDA Mini-Nbody

Test: Original

CUDA Mini-Nbody 2015-11-10 (Seconds, Fewer Is Better)
Jetson TX1: 513.47 (SE +/- 9.09, N = 6)

CUDA Mini-Nbody

Test: Cache Blocking

CUDA Mini-Nbody 2015-11-10 (Seconds, Fewer Is Better)
Jetson TX1: 277.48 (SE +/- 7.87, N = 6)

SHOC Scalable HeterOgeneous Computing

Target: CUDA - Benchmark: FFT SP

SHOC Scalable HeterOgeneous Computing 2015-11-10 (GFLOPS, More Is Better)
Jetson TX1: 3.92 (SE +/- 0.23, N = 6)
1. (CXX) g++ options: -O2 -lSHOCCommon -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft

SHOC Scalable HeterOgeneous Computing

Target: CUDA - Benchmark: MD5 Hash

SHOC Scalable HeterOgeneous Computing 2015-11-10 (GHash/s, More Is Better)
Jetson TX1: 0.62 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -O2 -lSHOCCommon -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft

SHOC Scalable HeterOgeneous Computing

Target: CUDA - Benchmark: Texture Read Bandwidth

SHOC Scalable HeterOgeneous Computing 2015-11-10 (GB/s, More Is Better)
Jetson TX1: 46.62 (SE +/- 0.84, N = 3)
1. (CXX) g++ options: -O2 -lSHOCCommon -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft

ASKAP tConvolveCuda

Processing: Gridding

ASKAP tConvolveCuda 2015-11-10 (Million Grid Points Per Second, More Is Better)
Jetson TX1: 262.83 (SE +/- 7.41, N = 6)
1. (CXX) g++ options: -fPIC -O3 -lcudadevrt -lcudart_static -lrt -lpthread -ldl

ASKAP tConvolveCuda

Processing: Degridding

ASKAP tConvolveCuda 2015-11-10 (Million Grid Points Per Second, More Is Better)
Jetson TX1: 649.05 (SE +/- 7.47, N = 3)
1. (CXX) g++ options: -fPIC -O3 -lcudadevrt -lcudart_static -lrt -lpthread -ldl
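Each result in this file is reported with a standard error (SE) and a sample count (N). One way to compare run-to-run noise across tests with different units is to express the SE as a percentage of the reported value. The sketch below does that for the Jetson TX1 results. The list name `jetson_results` and the helper `relative_se_percent` are illustrative names, and the figures are copied from the results in this file.

```python
# (test name, reported value, standard error) for the Jetson TX1,
# copied from the per-test results in this file.
jetson_results = [
    ("mini-nbody: Flush Denormals To Zero", 538.07, 5.74),
    ("mini-nbody: Loop Unrolling", 236.40, 3.29),
    ("mini-nbody: SOA Data Layout", 529.59, 8.35),
    ("mini-nbody: Original", 513.47, 9.09),
    ("mini-nbody: Cache Blocking", 277.48, 7.87),
    ("shoc: FFT SP", 3.92, 0.23),
    ("shoc: MD5 Hash", 0.62, 0.00),
    ("shoc: Texture Read Bandwidth", 46.62, 0.84),
    ("askap: Gridding", 262.83, 7.41),
    ("askap: Degridding", 649.05, 7.47),
]


def relative_se_percent(value, standard_error):
    """Standard error expressed as a percentage of the reported value."""
    return 100.0 * standard_error / value


relative = {name: relative_se_percent(v, se) for name, v, se in jetson_results}
noisiest = max(relative, key=relative.get)

for name, pct in relative.items():
    print(f"{name:<38} {pct:5.2f}%")
print(f"Largest relative SE: {noisiest}")
```

The MD5 Hash entry shows 0.00 only because the SE is rounded to two decimals in the export, so its relative SE here is a lower bound rather than an exact zero.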


Phoronix Test Suite v10.8.4