CUDA NVIDIA Tegra X1 GPGPU Linux Tests

Benchmarks by Michael Larabel for a future article on Phoronix.com, providing various GPGPU benchmark results for reference purposes.

HTML result view exported from: https://openbenchmarking.org/result/1703105-RI-1511154HA83.

Test systems

Jetson TX1
  Processor:          Cortex A57 rev 1 @ 1.91GHz (4 Cores)
  Motherboard:        jetson_tx1
  Memory:             4096MB
  Disk:               16GB 016G32 + 16GB SL16G
  Graphics:           NVIDIA TEGRA
  OS:                 Ubuntu 14.04
  Kernel:             3.10.67-g3a5c467 (aarch64)
  Desktop:            Unity 7.2.2
  Display Server:     X Server 1.15.1
  Display Driver:     NVIDIA 1.0.0
  Compiler:           GCC 4.8.4 + CUDA 7.0
  File-System:        ext4
  Screen Resolution:  3840x2160

Desktop
  Processor:          Intel Core i7-7700K @ 4.20GHz (8 Cores)
  Motherboard:        ASRock Z270 Extreme4
  Chipset:            Intel Device 591f
  Memory:             32768MB
  Disk:               525GB Crucial_CT525MX3 + 3001GB TOSHIBA DT01ACA3
  Graphics:           NVIDIA GeForce GTX 1080 8192MB (101/405MHz)
  Audio:              Realtek Generic
  Network:            Intel Connection
  OS:                 Ubuntu 16.04
  Kernel:             4.4.0-66-generic (x86_64)
  Desktop:            Unity 7.4.0
  Display Server:     X Server 1.18.4
  Display Driver:     NVIDIA 375.26
  OpenGL:             4.5.0
  Vulkan:             1.0.24
  Compiler:           GCC 5.4.0 20160609 + CUDA 8.0
  Screen Resolution:  3840x1080

Compiler Details
- Jetson TX1: --build=arm-linux-gnueabihf --disable-browser-plugin --disable-libitm --disable-libmudflap --disable-libquadmath --disable-sjlj-exceptions --disable-werror --enable-checking=release --enable-clocale=gnu --enable-gnu-unique-object --enable-gtk-cairo --enable-java-awt=gtk --enable-java-home --enable-languages=c,c++,java,go,d,fortran,objc,obj-c++ --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc --enable-plugin --enable-shared --enable-threads=posix --host=arm-linux-gnueabihf --target=arm-linux-gnueabihf --with-arch-directory=arm --with-arch=armv7-a --with-float=hard --with-fpu=vfpv3-d16 --with-mode=thumb -v
- Desktop: --build=x86_64-linux-gnu --disable-browser-plugin --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-gnu-unique-object --enable-gtk-cairo --enable-java-awt=gtk --enable-java-home --enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --enable-libmpx --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-arch-directory=amd64 --with-default-libstdcxx-abi=new --with-multilib-list=m32,m64,mx32 --with-tune=generic -v

Processor Details
- Jetson TX1: Scaling Governor: tegra interactive
- Desktop: Scaling Governor: acpi-cpufreq ondemand

OpenCL Details
- Desktop: GPU Compute Cores: 2560

System Details
- Desktop: GPU Compute Cores: 2560

Results overview

Test                                         Unit                      Jetson TX1     Desktop
shoc: CUDA - FFT SP                          GFLOPS                          3.92      493.46
shoc: CUDA - MD5 Hash                        GHash/s                         0.62       12.80
shoc: CUDA - Texture Read Bandwidth          GB/s                           46.62      551.68
askap: Gridding                              Million Grid Points/s         262.83     8588.90
askap: Degridding                            Million Grid Points/s         649.05    15372.07
cuda-mini-nbody: Original                    Seconds                       513.47       30.47
cuda-mini-nbody: Cache Blocking              Seconds                       277.48       12.96
cuda-mini-nbody: Loop Unrolling              Seconds                       236.40       13.37
cuda-mini-nbody: SOA Data Layout             Seconds                       529.59       26.68
cuda-mini-nbody: Flush Denormals To Zero     Seconds                       538.07       26.72
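The results in this export mix "More Is Better" metrics (GFLOPS, GHash/s, GB/s, grid points per second) with "Fewer Is Better" timings (seconds), so the two systems are easier to compare as ratios. The following is a minimal Python sketch of that arithmetic, using only the mean values reported in this export. The names `RESULTS`, `desktop_advantage`, and `summarize` are illustrative assumptions and are not part of the Phoronix Test Suite.

```python
# Mean results copied from this export:
# test name -> (Jetson TX1, Desktop, higher_is_better)
RESULTS = {
    "shoc: CUDA - FFT SP": (3.92, 493.46, True),
    "shoc: CUDA - MD5 Hash": (0.62, 12.80, True),
    "shoc: CUDA - Texture Read Bandwidth": (46.62, 551.68, True),
    "askap: Gridding": (262.83, 8588.90, True),
    "askap: Degridding": (649.05, 15372.07, True),
    "cuda-mini-nbody: Original": (513.47, 30.47, False),
    "cuda-mini-nbody: Cache Blocking": (277.48, 12.96, False),
    "cuda-mini-nbody: Loop Unrolling": (236.40, 13.37, False),
    "cuda-mini-nbody: SOA Data Layout": (529.59, 26.68, False),
    "cuda-mini-nbody: Flush Denormals To Zero": (538.07, 26.72, False),
}


def desktop_advantage(jetson, desktop, higher_is_better):
    """Return the Desktop result as a multiple of the Jetson TX1 result.

    For throughput metrics the ratio is desktop / jetson; for timings
    (fewer is better) it is inverted so that a value above 1 always
    means the Desktop system performed better.
    """
    if higher_is_better:
        return desktop / jetson
    return jetson / desktop


def summarize(results=RESULTS):
    """Map each test name to its Desktop-over-Jetson ratio."""
    return {name: desktop_advantage(*vals) for name, vals in results.items()}


if __name__ == "__main__":
    for name, ratio in summarize().items():
        print(f"{name}: Desktop is about {ratio:.1f}x the Jetson TX1")
```

These ratios only compare the reported means. They ignore the standard errors listed with each result and the differing CUDA and driver versions on the two systems.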

SHOC Scalable HeterOgeneous Computing

Target: CUDA - Benchmark: FFT SP

GFLOPS, More Is Better (SHOC Scalable HeterOgeneous Computing 2015-11-10)
Jetson TX1: 3.92 (SE +/- 0.23, N = 6)
Desktop: 493.46 (SE +/- 2.73, N = 3)
(CXX) g++ options: -O2 -lSHOCCommon -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft

SHOC Scalable HeterOgeneous Computing

Target: CUDA - Benchmark: MD5 Hash

GHash/s, More Is Better (SHOC Scalable HeterOgeneous Computing 2015-11-10)
Jetson TX1: 0.62 (SE +/- 0.00, N = 3)
Desktop: 12.80 (SE +/- 0.00, N = 3)
(CXX) g++ options: -O2 -lSHOCCommon -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft

SHOC Scalable HeterOgeneous Computing

Target: CUDA - Benchmark: Texture Read Bandwidth

GB/s, More Is Better (SHOC Scalable HeterOgeneous Computing 2015-11-10)
Jetson TX1: 46.62 (SE +/- 0.84, N = 3)
Desktop: 551.68 (SE +/- 2.47, N = 3)
(CXX) g++ options: -O2 -lSHOCCommon -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft

ASKAP tConvolveCuda

Processing: Gridding

Million Grid Points Per Second, More Is Better (ASKAP tConvolveCuda 2015-11-10)
Jetson TX1: 262.83 (SE +/- 7.41, N = 6)
Desktop: 8588.90 (SE +/- 0.00, N = 3)
Per-result compiler flag noted on the chart: -m64
(CXX) g++ options: -fPIC -O3 -lcudadevrt -lcudart_static -lrt -lpthread -ldl

ASKAP tConvolveCuda

Processing: Degridding

Million Grid Points Per Second, More Is Better (ASKAP tConvolveCuda 2015-11-10)
Jetson TX1: 649.05 (SE +/- 7.47, N = 3)
Desktop: 15372.07 (SE +/- 290.03, N = 3)
Per-result compiler flag noted on the chart: -m64
(CXX) g++ options: -fPIC -O3 -lcudadevrt -lcudart_static -lrt -lpthread -ldl

CUDA Mini-Nbody

Test: Original

Seconds, Fewer Is Better (CUDA Mini-Nbody 2015-11-10)
Jetson TX1: 513.47 (SE +/- 9.09, N = 6)
Desktop: 30.47 (SE +/- 0.07, N = 3)

CUDA Mini-Nbody

Test: Cache Blocking

Seconds, Fewer Is Better (CUDA Mini-Nbody 2015-11-10)
Jetson TX1: 277.48 (SE +/- 7.87, N = 6)
Desktop: 12.96 (SE +/- 0.02, N = 3)

CUDA Mini-Nbody

Test: Loop Unrolling

Seconds, Fewer Is Better (CUDA Mini-Nbody 2015-11-10)
Jetson TX1: 236.40 (SE +/- 3.29, N = 6)
Desktop: 13.37 (SE +/- 0.03, N = 3)

CUDA Mini-Nbody

Test: SOA Data Layout

Seconds, Fewer Is Better (CUDA Mini-Nbody 2015-11-10)
Jetson TX1: 529.59 (SE +/- 8.35, N = 3)
Desktop: 26.68 (SE +/- 0.01, N = 3)

CUDA Mini-Nbody

Test: Flush Denormals To Zero

Seconds, Fewer Is Better (CUDA Mini-Nbody 2015-11-10)
Jetson TX1: 538.07 (SE +/- 5.74, N = 3)
Desktop: 26.72 (SE +/- 0.01, N = 3)


Phoronix Test Suite v10.8.4