CUDA NVIDIA Tegra X1 GPGPU Linux Tests

Benchmarks by Michael Larabel for a future article on Phoronix.com just delivering various GPGPU benchmarks for reference purposes.

Jetson TX1:

  Processor: Cortex A57 rev 1 @ 1.91GHz (4 Cores), Motherboard: jetson_tx1, Memory: 4096MB, Disk: 16GB 016G32 + 16GB SL16G, Graphics: NVIDIA TEGRA

  OS: Ubuntu 14.04, Kernel: 3.10.67-g3a5c467 (aarch64), Desktop: Unity 7.2.2, Display Server: X Server 1.15.1, Display Driver: NVIDIA 1.0.0, Compiler: GCC 4.8.4 + CUDA 7.0, File-System: ext4, Screen Resolution: 3840x2160

Jetson TX2 Hogh-P:

  Processor: ARMv8 rev 3 @ 2.04GHz (6 Cores), Motherboard: quill, Memory: 8192MB, Disk: 31GB 032G34, Graphics: NVIDIA TEGRA

  OS: Ubuntu 16.04, Kernel: 4.4.38-tegra (aarch64), Desktop: Unity 7.4.5, Display Server: X Server 1.18.4, Display Driver: NVIDIA 1.0.0, Compiler: GCC 5.4.0 20160609 + CUDA 9.0, File-System: ext4, Screen Resolution: 1366x768

NVIDIA GTX 650 Ti:

  Processor: Intel Core i5-2400 @ 3.40GHz (4 Cores), Motherboard: ASUS P8H67-M PRO (3802 BIOS), Chipset: Intel 2nd Generation Core Family DRAM, Memory: 16384MB, Disk: 1000GB Western Digital WD10EALX-009 + 250GB Western Digital WD2500AAKX-7 + SSD 240GB, Graphics: Intel 2nd Generation Core Family IGP 981MB, Audio: Realtek ALC892, Monitor: DELL E178WFP, Network: Realtek RTL8111/8168/8411

  OS: Ubuntu 16.04, Kernel: 4.4.0-140-generic (x86_64), Desktop: Unity 7.4.5, Display Server: X Server 1.18.4, Display Driver: modesetting 1.18.4, OpenCL: OpenCL 1.2 CUDA 10.0.211, Compiler: GCC 5.5.0 20171010 + CUDA 9.0, File-System: ext4, Screen Resolution: 1366x768

SHOC Scalable HeterOgeneous Computing 2015-11-10
Target: CUDA - Benchmark: FFT SP
GFLOPS > Higher Is Better
Jetson TX1 ........ 3.92 |=========================
Jetson TX2 Hogh-P . 8.24 |=====================================================

SHOC Scalable HeterOgeneous Computing 2015-11-10
Target: CUDA - Benchmark: MD5 Hash
GHash/s > Higher Is Better
Jetson TX1 ........ 0.62 |==================================
Jetson TX2 Hogh-P . 0.98 |=====================================================

SHOC Scalable HeterOgeneous Computing 2015-11-10
Target: CUDA - Benchmark: Texture Read Bandwidth
GB/s > Higher Is Better
Jetson TX1 ........ 46.62 |===============================
Jetson TX2 Hogh-P . 79.04 |====================================================

ASKAP tConvolveCuda 2015-11-10
Processing: Gridding
Million Grid Points Per Second > Higher Is Better
Jetson TX1 ........ 263 |================
Jetson TX2 Hogh-P . 905 |======================================================

ASKAP tConvolveCuda 2015-11-10
Processing: Degridding
Million Grid Points Per Second > Higher Is Better
Jetson TX1 ........ 649  |=======================
Jetson TX2 Hogh-P . 1513 |=====================================================

CUDA Mini-Nbody 2015-11-10
Test: Original
Seconds < Lower Is Better
Jetson TX1 ........ 513 |===================================
Jetson TX2 Hogh-P . 781 |======================================================

CUDA Mini-Nbody 2015-11-10
Test: Cache Blocking
Seconds < Lower Is Better
Jetson TX1 ........ 277 |======================================================
Jetson TX2 Hogh-P . 176 |==================================

CUDA Mini-Nbody 2015-11-10
Test: Loop Unrolling
Seconds < Lower Is Better
Jetson TX1 ........ 236.00 |===================================================
Jetson TX2 Hogh-P . 200.00 |===========================================
NVIDIA GTX 650 Ti . 2.14   |

CUDA Mini-Nbody 2015-11-10
Test: SOA Data Layout
Seconds < Lower Is Better
Jetson TX1 ........ 530 |===================================================
Jetson TX2 Hogh-P . 557 |======================================================

CUDA Mini-Nbody 2015-11-10
Test: Flush Denormals To Zero
Seconds < Lower Is Better
Jetson TX1 ........ 538 |====================================================
Jetson TX2 Hogh-P . 554 |======================================================
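
The results mix "Higher Is Better" and "Lower Is Better" metrics, so the two Jetson boards are easier to compare once every result is expressed as a relative speedup. The Python sketch below is one possible way to do that and is not part of the original export. The names `RESULTS`, `speedup`, and `summarize` are hypothetical helpers introduced for illustration. The numbers are copied from the listing above.

```python
# Each entry: (test name, unit, higher_is_better, Jetson TX1, Jetson TX2 Hogh-P).
# Values are copied from the result listing; the structure itself is an assumption.
RESULTS = [
    ("SHOC FFT SP", "GFLOPS", True, 3.92, 8.24),
    ("SHOC MD5 Hash", "GHash/s", True, 0.62, 0.98),
    ("SHOC Texture Read Bandwidth", "GB/s", True, 46.62, 79.04),
    ("tConvolveCuda Gridding", "Mpoints/s", True, 263.0, 905.0),
    ("tConvolveCuda Degridding", "Mpoints/s", True, 649.0, 1513.0),
    ("Mini-Nbody Original", "s", False, 513.0, 781.0),
    ("Mini-Nbody Cache Blocking", "s", False, 277.0, 176.0),
    ("Mini-Nbody Loop Unrolling", "s", False, 236.0, 200.0),
    ("Mini-Nbody SOA Data Layout", "s", False, 530.0, 557.0),
    ("Mini-Nbody Flush Denormals To Zero", "s", False, 538.0, 554.0),
]


def speedup(baseline, candidate, higher_is_better):
    """Return how many times faster `candidate` is than `baseline`.

    For throughput metrics the ratio is candidate / baseline; for
    time-based metrics it is inverted so that values above 1.0
    always mean the candidate is faster.
    """
    if higher_is_better:
        return candidate / baseline
    return baseline / candidate


def summarize(results):
    """Return (test name, TX2-over-TX1 speedup) pairs for all results."""
    return [
        (name, speedup(tx1, tx2, hib))
        for name, _unit, hib, tx1, tx2 in results
    ]


if __name__ == "__main__":
    for name, ratio in summarize(RESULTS):
        print(f"{name:36s} TX2 vs TX1: {ratio:.2f}x")
```

A ratio below 1.0 marks a test where the TX2 run was slower than the TX1 run, as in the Mini-Nbody Original, SOA Data Layout, and Flush Denormals To Zero tests. The same `speedup` helper can be applied to the single GTX 650 Ti result in the Loop Unrolling test.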