Xavier Carmel CPU Core Compiler Tests

NVIDIA Jetson Xavier ARMv8 compiler benchmarks on GCC and LLVM Clang for a future article on Phoronix.

GCC 7.3.0:

  Processor: ARMv8 rev 0 @ 2.27GHz (8 Cores), Motherboard: jetson-xavier, Memory: 16384MB, Disk: 31GB HBG4a2, Graphics: NVIDIA TEGRA

  OS: Ubuntu 18.04, Kernel: 4.9.108-tegra (aarch64), Desktop: Unity 7.5.0, Display Server: X Server 1.19.6, Display Driver: NVIDIA 1.0.0, Vulkan: 1.1.76, Compiler: GCC 7.3.0 + CUDA 10.0, File-System: ext4, Screen Resolution: 1920x2160

GCC 8.2.0:

  Processor: ARMv8 rev 0 @ 2.27GHz (8 Cores), Motherboard: jetson-xavier, Memory: 16384MB, Disk: 31GB HBG4a2, Graphics: NVIDIA TEGRA

  OS: Ubuntu 18.04, Kernel: 4.9.108-tegra (aarch64), Desktop: Unity 7.5.0, Display Server: X Server 1.19.6, Display Driver: NVIDIA 1.0.0, Vulkan: 1.1.76, Compiler: GCC 8.2.0 + clang (GCC) 8.2.0 + CUDA 10.0, File-System: ext4, Screen Resolution: 1920x2160

GCC 9.0.0:

  Processor: ARMv8 rev 0 @ 2.27GHz (8 Cores), Motherboard: jetson-xavier, Memory: 16384MB, Disk: 31GB HBG4a2, Graphics: NVIDIA TEGRA

  OS: Ubuntu 18.04, Kernel: 4.9.108-tegra (aarch64), Desktop: Unity 7.5.0, Display Server: X Server 1.19.6, Display Driver: NVIDIA 1.0.0, Vulkan: 1.1.76, Compiler: GCC 9.0.0 20181230 + clang (GCC) 9.0.0 20181230 (experimental) + CUDA 10.0, File-System: ext4, Screen Resolution: 1920x2160

Clang 6.0:

  Processor: ARMv8 rev 0 @ 2.27GHz (8 Cores), Motherboard: jetson-xavier, Memory: 16384MB, Disk: 31GB HBG4a2, Graphics: NVIDIA TEGRA

  OS: Ubuntu 18.04, Kernel: 4.9.108-tegra (aarch64), Desktop: Unity 7.5.0, Display Server: X Server 1.19.6, Display Driver: NVIDIA 1.0.0, Vulkan: 1.1.76, Compiler: Clang 6.0.0-1ubuntu2 + CUDA 10.0, File-System: ext4, Screen Resolution: 1920x2160

Clang 7.0.1:

  Processor: ARMv8 rev 0 @ 2.27GHz (8 Cores), Motherboard: jetson-xavier, Memory: 16384MB, Disk: 31GB HBG4a2, Graphics: NVIDIA TEGRA

  OS: Ubuntu 18.04, Kernel: 4.9.108-tegra (aarch64), Desktop: Unity 7.5.0, Display Server: X Server 1.19.6, Display Driver: NVIDIA 1.0.0, Vulkan: 1.1.76, Compiler: Clang 7.0.1 + LLVM 7.0.1 + CUDA 10.0, File-System: ext4, Screen Resolution: 1920x2160

Clang 8.0 SVN:

  Processor: ARMv8 rev 0 @ 2.27GHz (8 Cores), Motherboard: jetson-xavier, Memory: 16384MB, Disk: 31GB HBG4a2, Graphics: NVIDIA TEGRA

  OS: Ubuntu 18.04, Kernel: 4.9.108-tegra (aarch64), Desktop: Unity 7.5.0, Display Server: X Server 1.19.6, Display Driver: NVIDIA 1.0.0, Vulkan: 1.1.76, Compiler: Clang 8.0.0 (SVN 350356) + LLVM 8.0.0svn + CUDA 10.0, File-System: ext4, Screen Resolution: 1920x2160

SciMark 2.0
Computational Test: Composite
Mflops > Higher Is Better
GCC 7.3.0 ..... 572 |==========================================================
GCC 8.2.0 ..... 554 |========================================================
GCC 9.0.0 ..... 126 |=============
Clang 6.0 ..... 540 |=======================================================
Clang 7.0.1 ... 102 |==========
Clang 8.0 SVN . 102 |==========

SciMark 2.0
Computational Test: Monte Carlo
Mflops > Higher Is Better
GCC 7.3.0 ..... 217.00 |=======================================================
GCC 8.2.0 ..... 197.00 |==================================================
GCC 9.0.0 .....  47.45 |============
Clang 6.0 ..... 214.00 |======================================================
Clang 7.0.1 ...  34.33 |=========
Clang 8.0 SVN .  34.30 |=========

SciMark 2.0
Computational Test: Fast Fourier Transform
Mflops > Higher Is Better
GCC 7.3.0 ..... 183.00 |=======================================================
GCC 8.2.0 ..... 164.00 |=================================================
GCC 9.0.0 .....  69.79 |=====================
Clang 6.0 ..... 163.00 |=================================================
Clang 7.0.1 ...  52.89 |================
Clang 8.0 SVN .  52.86 |================

SciMark 2.0
Computational Test: Sparse Matrix Multiply
Mflops > Higher Is Better
GCC 7.3.0 ..... 620.00 |=======================================================
GCC 8.2.0 ..... 565.00 |==================================================
GCC 9.0.0 ..... 118.00 |==========
Clang 6.0 ..... 546.00 |================================================
Clang 7.0.1 ...  78.11 |=======
Clang 8.0 SVN .  78.18 |=======

SciMark 2.0
Computational Test: Dense LU Matrix Factorization
Mflops > Higher Is Better
GCC 7.3.0 ..... 962 |==========================================================
GCC 8.2.0 ..... 966 |==========================================================
GCC 9.0.0 ..... 151 |=========
Clang 6.0 ..... 840 |==================================================
Clang 7.0.1 ... 197 |============
Clang 8.0 SVN . 198 |============

SciMark 2.0
Computational Test: Jacobi Successive Over-Relaxation
Mflops > Higher Is Better
GCC 7.3.0 ..... 879 |=======================================================
GCC 8.2.0 ..... 879 |=======================================================
GCC 9.0.0 ..... 245 |===============
Clang 6.0 ..... 935 |==========================================================
Clang 7.0.1 ... 148 |=========
Clang 8.0 SVN . 148 |=========

TTSIOD 3D Renderer 2.3b
Phong Rendering With Soft-Shadow Mapping
FPS > Higher Is Better
GCC 7.3.0 . 146 |==============================================================
GCC 8.2.0 . 127 |======================================================
Clang 6.0 . 137 |==========================================================

Himeno Benchmark 3.0
Poisson Pressure Solver
MFLOPS > Higher Is Better
GCC 7.3.0 ..... 269 |==========================================
GCC 8.2.0 ..... 248 |=======================================
GCC 9.0.0 ..... 368 |==========================================================
Clang 6.0 ..... 265 |==========================================
Clang 7.0.1 ... 281 |============================================
Clang 8.0 SVN . 270 |===========================================

7-Zip Compression 16.02
Compress Speed Test
MIPS > Higher Is Better
GCC 7.3.0 . 18139 |======================================================
GCC 8.2.0 . 20239 |============================================================
GCC 9.0.0 .  8651 |==========================

asmFish 2018-07-23
1024 Hash Memory, 26 Depth
Nodes/second > Higher Is Better
GCC 7.3.0 ..... 8295044 |======================================================
GCC 8.2.0 ..... 7195166 |===============================================
GCC 9.0.0 ..... 4991900 |================================
Clang 6.0 ..... 8305398 |======================================================
Clang 7.0.1 ... 5083834 |=================================
Clang 8.0 SVN . 5046980 |=================================

Timed Linux Kernel Compilation 4.18
Time To Compile
Seconds < Lower Is Better
GCC 7.3.0 .  831 |=======================================
GCC 8.2.0 . 1312 |=============================================================
GCC 9.0.0 .  963 |=============================================

Timed PHP Compilation 7.1.9
Time To Compile
Seconds < Lower Is Better
GCC 7.3.0 ..... 276 |=======================================
GCC 8.2.0 ..... 409 |==========================================================
Clang 6.0 ..... 319 |=============================================
Clang 7.0.1 ... 337 |================================================
Clang 8.0 SVN . 355 |==================================================

C-Ray 1.1
Total Time - 4K, 16 Rays Per Pixel
Seconds < Lower Is Better
GCC 7.3.0 ..... 258 |===================
GCC 8.2.0 ..... 469 |==================================
GCC 9.0.0 ..... 451 |=================================
Clang 6.0 ..... 408 |==============================
Clang 7.0.1 ... 796 |==========================================================
Clang 8.0 SVN . 796 |==========================================================

Primesieve 7.2
1e12 Prime Number Generation
Seconds < Lower Is Better
GCC 7.3.0 ..... 106 |===========================================
GCC 8.2.0 ..... 134 |======================================================
Clang 6.0 ..... 106 |===========================================
Clang 7.0.1 ... 144 |==========================================================
Clang 8.0 SVN . 137 |=======================================================

Bullet Physics Engine 2.81
Test: Raytests
Seconds < Lower Is Better
GCC 7.3.0 .  5.53 |========================
GCC 8.2.0 . 13.92 |============================================================
GCC 9.0.0 . 13.02 |========================================================

Bullet Physics Engine 2.81
Test: 3000 Fall
Seconds < Lower Is Better
GCC 7.3.0 . 12.68 |========================================
GCC 8.2.0 . 19.23 |============================================================
GCC 9.0.0 . 17.13 |=====================================================

Bullet Physics Engine 2.81
Test: 1000 Stack
Seconds < Lower Is Better
GCC 7.3.0 . 12.86 |===============================
GCC 8.2.0 . 25.01 |============================================================
GCC 9.0.0 . 22.16 |=====================================================

Bullet Physics Engine 2.81
Test: 1000 Convex
Seconds < Lower Is Better
GCC 7.3.0 .  9.95 |======================
GCC 8.2.0 . 26.96 |============================================================
GCC 9.0.0 . 25.50 |=========================================================

Bullet Physics Engine 2.81
Test: 136 Ragdolls
Seconds < Lower Is Better
GCC 7.3.0 .  7.30 |==================================
GCC 8.2.0 . 12.93 |============================================================
GCC 9.0.0 . 11.47 |=====================================================

Bullet Physics Engine 2.81
Test: Prim Trimesh
Seconds < Lower Is Better
GCC 7.3.0 . 2.12 |==================================
GCC 8.2.0 . 3.85 |=============================================================
GCC 9.0.0 . 3.35 |=====================================================

Bullet Physics Engine 2.81
Test: Convex Trimesh
Seconds < Lower Is Better
GCC 7.3.0 . 2.32 |==========================
GCC 8.2.0 . 5.52 |=============================================================
GCC 9.0.0 . 5.04 |========================================================

FLAC Audio Encoding 1.3.2
WAV To FLAC
Seconds < Lower Is Better
GCC 7.3.0 .....  57.27 |===================
GCC 8.2.0 ..... 152.90 |==================================================
GCC 9.0.0 ..... 140.12 |==============================================
Clang 6.0 .....  54.61 |==================
Clang 7.0.1 ... 166.52 |=======================================================
Clang 8.0 SVN . 166.90 |=======================================================

LAME MP3 Encoding 3.100
WAV To MP3
Seconds < Lower Is Better
GCC 7.3.0 .....  31.83 |===========
GCC 8.2.0 ..... 160.36 |=======================================================
GCC 9.0.0 ..... 139.86 |================================================
Clang 6.0 .....  18.86 |======
Clang 7.0.1 ...  49.86 |=================
Clang 8.0 SVN .  53.26 |==================

Redis 4.0.8
Test: LPOP
Requests Per Second > Higher Is Better
GCC 7.3.0 ..... 1105270 |================================================
GCC 8.2.0 .....  417642 |==================
GCC 9.0.0 .....  377319 |================
Clang 6.0 ..... 1254672 |======================================================
Clang 7.0.1 ...  377274 |================
Clang 8.0 SVN .  372959 |================

Redis 4.0.8
Test: SADD
Requests Per Second > Higher Is Better
GCC 7.3.0 ..... 899066 |======================================================
GCC 8.2.0 ..... 310886 |===================
GCC 9.0.0 ..... 303100 |==================
Clang 6.0 ..... 923930 |=======================================================
Clang 7.0.1 ... 302329 |==================
Clang 8.0 SVN . 299731 |==================

Redis 4.0.8
Test: LPUSH
Requests Per Second > Higher Is Better
GCC 7.3.0 ..... 705232 |====================================================
GCC 8.2.0 ..... 254896 |===================
GCC 9.0.0 ..... 239389 |==================
Clang 6.0 ..... 744500 |=======================================================
Clang 7.0.1 ... 234910 |=================
Clang 8.0 SVN . 237439 |==================

Redis 4.0.8
Test: GET
Requests Per Second > Higher Is Better
GCC 7.3.0 .....  973500 |==============================================
GCC 8.2.0 .....  391000 |===================
GCC 9.0.0 .....  363700 |=================
Clang 6.0 ..... 1138493 |======================================================
Clang 7.0.1 ...  366301 |=================
Clang 8.0 SVN .  366870 |=================

Redis 4.0.8
Test: SET
Requests Per Second > Higher Is Better
GCC 7.3.0 ..... 756550 |===================================================
GCC 8.2.0 ..... 274012 |==================
GCC 9.0.0 ..... 274302 |==================
Clang 6.0 ..... 816377 |=======================================================
Clang 7.0.1 ... 268782 |==================
Clang 8.0 SVN . 262650 |==================

Apache Benchmark 2.4.29
Static Web Page Serving
Requests Per Second > Higher Is Better
GCC 7.3.0 ..... 10782 |========================================================
GCC 8.2.0 .....  6218 |================================
GCC 9.0.0 .....  6009 |===============================
Clang 6.0 ..... 10741 |========================================================
Clang 7.0.1 ...  6040 |===============================
Clang 8.0 SVN .  6061 |===============================
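
To compare compilers across tests that report in different units, each result can be expressed as a ratio against one baseline configuration. The following is a minimal sketch and not part of the original report: the names `ROW`, `parse_rows`, `relative_to`, and `SAMPLE` are hypothetical, the row pattern is an assumption based on the "label, dot leader, value, bar" layout of the result rows, and the sample rows are the SciMark 2.0 Composite values with the bars shortened.

```python
import re

# Assumed row layout: "<label> <dot leader> <value> |<bar>",
# for example "GCC 7.3.0 ..... 572 |=====".
ROW = re.compile(r"^(?P<label>.+?)\s+\.+\s+(?P<value>\d+(?:\.\d+)?)\s*\|=*\s*$")


def parse_rows(lines):
    """Return {configuration label: numeric result} for rows that match ROW."""
    results = {}
    for line in lines:
        match = ROW.match(line.strip())
        if match:
            results[match.group("label")] = float(match.group("value"))
    return results


def relative_to(results, baseline):
    """Divide every result by the baseline configuration's result."""
    base = results[baseline]
    return {label: value / base for label, value in results.items()}


# SciMark 2.0 Composite rows from the report (Mflops, higher is better);
# the bars are shortened here for readability.
SAMPLE = [
    "GCC 7.3.0 ..... 572 |=====",
    "GCC 8.2.0 ..... 554 |=====",
    "GCC 9.0.0 ..... 126 |=",
    "Clang 6.0 ..... 540 |=====",
    "Clang 7.0.1 ... 102 |=",
    "Clang 8.0 SVN . 102 |=",
]

if __name__ == "__main__":
    parsed = parse_rows(SAMPLE)
    for label, ratio in relative_to(parsed, "GCC 7.3.0").items():
        print(f"{label:<14} {ratio:.2f}x")
```

For "Higher Is Better" tests a ratio below 1 means the configuration is slower than the baseline; for "Lower Is Better" tests (the timed runs reported in seconds) the interpretation is reversed, so a ratio above 1 means slower.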