cuda

2 x Intel Xeon E5-2630 v3 testing with a Supermicro X10DRG-H v1.02 and ASPEED ASPEED Family on Ubuntu 14.04 via the Phoronix Test Suite.

HTML result view exported from: https://openbenchmarking.org/result/1602024-HA-CUDA0016144.

System details

Result identifiers: cuda-mini, mini-nbody, 0202, cuda-test, 2 x Intel Xeon E5-2630 v3 - ASPEED ASPEED Family -

Processor: 2 x Intel Xeon E5-2630 v3 @ 2.40GHz (32 Cores)
Motherboard: Supermicro X10DRG-H v1.02
Chipset: Intel Haswell-E DMI2
Memory: 64512MB
Disk: 500GB HGST HTS545050A7
Graphics: LLVMpipe
Monitor: SyncMaster
Network: Intel I350 Gigabit Connection
OS: Ubuntu 14.04
Kernel: 3.13.0-32-generic (x86_64)
Desktop: Unity 7.2.2
Display Driver: modesetting 0.8.1
OpenGL: 2.1 Mesa 10.1.3 Gallium 0.4
Compiler: GCC 4.8.4 + CUDA 7.5
File-System: ext4
Screen Resolution: 1024x768
Display Server: X Server 1.15.1

Differing values listed for later systems:
Compiler: GCC 4.8.4 + CUDA 7.0
Graphics: ASPEED ASPEED Family
Screen Resolution: 1280x1024

Compiler Details: --build=x86_64-linux-gnu --disable-browser-plugin --disable-libmudflap --disable-werror --enable-checking=release --enable-clocale=gnu --enable-gnu-unique-object --enable-gtk-cairo --enable-java-awt=gtk --enable-java-home --enable-languages=c,c++,java,go,d,fortran,objc,obj-c++ --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-nls --enable-objc-gc --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-arch-directory=amd64 --with-multilib-list=m32,m64,mx32 --with-tune=generic -v

Processor Details: Scaling Governor: acpi-cpufreq ondemand

Results overview

Test                                      cuda-mini  mini-nbody  0202   cuda-test
shoc: CUDA - Triad                        12.89      -           -      15.73
shoc: CUDA - FFT SP                       285.46     -           -      291.41
shoc: CUDA - MD5 Hash                     7.77       -           -      7.54
shoc: CUDA - Max SP Flops                 6557.12    -           -      6561.66
shoc: CUDA - Bus Speed Download           10.30      -           -      8.70
shoc: CUDA - Bus Speed Readback           12.69      -           -      12.57
shoc: CUDA - Texture Read Bandwidth       335.13     -           -      335.04
askap: Gridding                           7006.74    -           -      7132.99
askap: Degridding                         14532.50   -           -      14013.50
cuda-mini-nbody: Original                 58.34      59.55       59.11  59.10
cuda-mini-nbody: Cache Blocking           44.74      -           -      44.90
cuda-mini-nbody: Loop Unrolling           44.45      -           -      44.91
cuda-mini-nbody: SOA Data Layout          65.28      -           -      65.45
cuda-mini-nbody: Flush Denormals To Zero  65.59      -           -      65.50
shoc: OpenCL - Triad                      -          -           -      11.84
shoc: OpenCL - FFT SP                     -          -           -      174.73
shoc: OpenCL - MD5 Hash                   -          -           -      7.73
shoc: OpenCL - Max SP Flops               -          -           -      6505.95
shoc: OpenCL - Bus Speed Download         -          -           -      12.46
shoc: OpenCL - Bus Speed Readback         -          -           -      13.07
shoc: OpenCL - Texture Read Bandwidth     -          -           -      331.70
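
As a reading aid, the relative difference between the cuda-mini and cuda-test runs can be computed directly from the overview values. The following is an illustrative Python sketch, not part of the Phoronix Test Suite; the helper name `percent_diff` and the choice of cuda-mini as the baseline are assumptions, and only three rows are copied in.

```python
# Values copied from the results overview: (cuda-mini, cuda-test).
results = {
    "shoc: CUDA - Triad": (12.89, 15.73),
    "shoc: CUDA - Bus Speed Download": (10.30, 8.70),
    "askap: Gridding": (7006.74, 7132.99),
}


def percent_diff(baseline, other):
    """Signed difference of `other` relative to `baseline`, in percent."""
    return (other - baseline) / baseline * 100.0


# cuda-mini is treated as the baseline here (an arbitrary choice).
for name, (mini, test) in results.items():
    print(f"{name}: {percent_diff(mini, test):+.1f}%")
```

A positive value means cuda-test reported the higher number; since all three rows are "more is better" metrics, positive also means cuda-test was faster on that test.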

SHOC Scalable HeterOgeneous Computing

Target: CUDA - Benchmark: Triad

GB/s, More Is Better (SHOC Scalable HeterOgeneous Computing 2015-11-10)
cuda-mini: 12.89 (SE +/- 1.80, N = 6)
cuda-test: 15.73 (SE +/- 0.02, N = 3)
(CXX) g++ options: -O2 -lSHOCCommon -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft
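
One rough way to judge whether the Triad gap between cuda-mini and cuda-test exceeds run-to-run noise is to divide the difference of the means by the combined standard error. The sketch below assumes the two runs are independent, that the reported SE values are standard errors of the mean, and that the first SE listed (1.80) belongs to cuda-mini; `gap_in_se` is a hypothetical helper, not a Phoronix Test Suite function.

```python
import math


def gap_in_se(mean_a, se_a, mean_b, se_b):
    """Absolute difference of two means in units of their combined SE."""
    combined = math.sqrt(se_a ** 2 + se_b ** 2)
    return abs(mean_a - mean_b) / combined


# Triad, CUDA target: cuda-mini 12.89 (SE 1.80), cuda-test 15.73 (SE 0.02).
triad_gap = gap_in_se(12.89, 1.80, 15.73, 0.02)
print(round(triad_gap, 2))  # about 1.58
```

A ratio under about 2 is commonly read as weak evidence of a real difference, so the apparent Triad advantage of cuda-test may largely reflect the high variance of the cuda-mini run.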

SHOC Scalable HeterOgeneous Computing

Target: CUDA - Benchmark: FFT SP

GFLOPS, More Is Better (SHOC Scalable HeterOgeneous Computing 2015-11-10)
cuda-mini: 285.46 (SE +/- 1.86, N = 3)
cuda-test: 291.41 (SE +/- 5.52, N = 6)
(CXX) g++ options: -O2 -lSHOCCommon -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft

SHOC Scalable HeterOgeneous Computing

Target: CUDA - Benchmark: MD5 Hash

GHash/s, More Is Better (SHOC Scalable HeterOgeneous Computing 2015-11-10)
cuda-mini: 7.77 (SE +/- 0.12, N = 3)
cuda-test: 7.54 (SE +/- 0.13, N = 4)
(CXX) g++ options: -O2 -lSHOCCommon -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft

SHOC Scalable HeterOgeneous Computing

Target: CUDA - Benchmark: Max SP Flops

GFLOPS, More Is Better (SHOC Scalable HeterOgeneous Computing 2015-11-10)
cuda-mini: 6557.12 (SE +/- 3.17, N = 3)
cuda-test: 6561.66 (SE +/- 0.15, N = 3)
(CXX) g++ options: -O2 -lSHOCCommon -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft

SHOC Scalable HeterOgeneous Computing

Target: CUDA - Benchmark: Bus Speed Download

GB/s, More Is Better (SHOC Scalable HeterOgeneous Computing 2015-11-10)
cuda-mini: 10.30 (SE +/- 0.97, N = 6)
cuda-test: 8.70 (SE +/- 0.78, N = 6)
(CXX) g++ options: -O2 -lSHOCCommon -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft

SHOC Scalable HeterOgeneous Computing

Target: CUDA - Benchmark: Bus Speed Readback

GB/s, More Is Better (SHOC Scalable HeterOgeneous Computing 2015-11-10)
cuda-mini: 12.69 (SE +/- 0.33, N = 6)
cuda-test: 12.57 (SE +/- 0.31, N = 6)
(CXX) g++ options: -O2 -lSHOCCommon -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft

SHOC Scalable HeterOgeneous Computing

Target: CUDA - Benchmark: Texture Read Bandwidth

GB/s, More Is Better (SHOC Scalable HeterOgeneous Computing 2015-11-10)
cuda-mini: 335.13 (SE +/- 0.40, N = 3)
cuda-test: 335.04 (SE +/- 0.27, N = 3)
(CXX) g++ options: -O2 -lSHOCCommon -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft

ASKAP tConvolveCuda

Processing: Gridding

Million Grid Points Per Second, More Is Better (ASKAP tConvolveCuda 2015-11-10)
cuda-mini: 7006.74 (SE +/- 0.00, N = 3)
cuda-test: 7132.99 (SE +/- 63.12, N = 3)
(CXX) g++ options: -fPIC -O3 -m64 -lcudadevrt -lcudart_static -lrt -lpthread -ldl

ASKAP tConvolveCuda

Processing: Degridding

Million Grid Points Per Second, More Is Better (ASKAP tConvolveCuda 2015-11-10)
cuda-mini: 14532.50 (SE +/- 259.50, N = 3)
cuda-test: 14013.50 (SE +/- 0.00, N = 3)
(CXX) g++ options: -fPIC -O3 -m64 -lcudadevrt -lcudart_static -lrt -lpthread -ldl

CUDA Mini-Nbody

Test: Original

Seconds, Fewer Is Better (CUDA Mini-Nbody 2015-11-10)
cuda-mini: 58.34 (SE +/- 0.21, N = 3)
mini-nbody: 59.55 (SE +/- 0.94, N = 6)
0202: 59.11 (SE +/- 0.10, N = 3)
cuda-test: 59.10 (SE +/- 0.25, N = 3)

CUDA Mini-Nbody

Test: Cache Blocking

Seconds, Fewer Is Better (CUDA Mini-Nbody 2015-11-10)
cuda-mini: 44.74 (SE +/- 0.06, N = 3)
cuda-test: 44.90 (SE +/- 0.17, N = 3)

CUDA Mini-Nbody

Test: Loop Unrolling

Seconds, Fewer Is Better (CUDA Mini-Nbody 2015-11-10)
cuda-mini: 44.45 (SE +/- 0.42, N = 3)
cuda-test: 44.91 (SE +/- 0.14, N = 3)

CUDA Mini-Nbody

Test: SOA Data Layout

Seconds, Fewer Is Better (CUDA Mini-Nbody 2015-11-10)
cuda-mini: 65.28 (SE +/- 0.11, N = 3)
cuda-test: 65.45 (SE +/- 0.08, N = 3)

CUDA Mini-Nbody

Test: Flush Denormals To Zero

Seconds, Fewer Is Better (CUDA Mini-Nbody 2015-11-10)
cuda-mini: 65.59 (SE +/- 0.22, N = 3)
cuda-test: 65.50 (SE +/- 0.20, N = 3)
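
To compare the five Mini-Nbody variants at a glance, each run time can be expressed as a ratio against the Original test. The sketch below is illustrative Python using the cuda-mini times reported above; `relative_to_original` is a hypothetical helper, and the comparison assumes every variant runs the same workload, which this export does not state.

```python
# Mean run times in seconds for cuda-mini, copied from the results above.
seconds = {
    "Original": 58.34,
    "Cache Blocking": 44.74,
    "Loop Unrolling": 44.45,
    "SOA Data Layout": 65.28,
    "Flush Denormals To Zero": 65.59,
}


def relative_to_original(times):
    """Ratio of the Original time to each variant's time (>1 means faster)."""
    base = times["Original"]
    return {name: base / t for name, t in times.items()}


for name, ratio in relative_to_original(seconds).items():
    print(f"{name}: {ratio:.2f}x")
```

Under that assumption, Cache Blocking and Loop Unrolling finish about 1.3x faster than Original, while SOA Data Layout and Flush Denormals To Zero take longer (ratios near 0.89).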

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Triad

GB/s, More Is Better (SHOC Scalable HeterOgeneous Computing 2015-11-10)
cuda-test: 11.84 (SE +/- 0.03, N = 3)
(CXX) g++ options: -O2 -lSHOCCommon -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: FFT SP

GFLOPS, More Is Better (SHOC Scalable HeterOgeneous Computing 2015-11-10)
cuda-test: 174.73 (SE +/- 1.40, N = 3)
(CXX) g++ options: -O2 -lSHOCCommon -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: MD5 Hash

GHash/s, More Is Better (SHOC Scalable HeterOgeneous Computing 2015-11-10)
cuda-test: 7.73 (SE +/- 0.12, N = 3)
(CXX) g++ options: -O2 -lSHOCCommon -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Max SP Flops

GFLOPS, More Is Better (SHOC Scalable HeterOgeneous Computing 2015-11-10)
cuda-test: 6505.95 (SE +/- 6.06, N = 3)
(CXX) g++ options: -O2 -lSHOCCommon -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Bus Speed Download

GB/s, More Is Better (SHOC Scalable HeterOgeneous Computing 2015-11-10)
cuda-test: 12.46 (SE +/- 0.00, N = 3)
(CXX) g++ options: -O2 -lSHOCCommon -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Bus Speed Readback

GB/s, More Is Better (SHOC Scalable HeterOgeneous Computing 2015-11-10)
cuda-test: 13.07 (SE +/- 0.14, N = 3)
(CXX) g++ options: -O2 -lSHOCCommon -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Texture Read Bandwidth

GB/s, More Is Better (SHOC Scalable HeterOgeneous Computing 2015-11-10)
cuda-test: 331.70 (SE +/- 0.26, N = 3)
(CXX) g++ options: -O2 -lSHOCCommon -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft
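
Because cuda-test ran every SHOC benchmark under both targets, the OpenCL results can be expressed as a fraction of the CUDA results on the same system. The sketch below is illustrative Python using only values reported in this export; `opencl_over_cuda` is a hypothetical helper name.

```python
# (CUDA, OpenCL) results for cuda-test, copied from the results above.
shoc = {
    "Triad": (15.73, 11.84),
    "FFT SP": (291.41, 174.73),
    "MD5 Hash": (7.54, 7.73),
    "Max SP Flops": (6561.66, 6505.95),
    "Bus Speed Download": (8.70, 12.46),
    "Bus Speed Readback": (12.57, 13.07),
    "Texture Read Bandwidth": (335.04, 331.70),
}


def opencl_over_cuda(pairs):
    """OpenCL result divided by CUDA result; >1 means OpenCL scored higher."""
    return {name: ocl / cuda for name, (cuda, ocl) in pairs.items()}


for name, ratio in opencl_over_cuda(shoc).items():
    print(f"{name}: {ratio:.2f}")
```

All seven metrics are "more is better", so a ratio below 1 (FFT SP at roughly 0.60, Triad at roughly 0.75) favors CUDA and a ratio above 1 (Bus Speed Download at roughly 1.43) favors OpenCL. The CUDA Bus Speed Download figure carries a comparatively large standard error, so that last ratio is best read with caution.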


Phoronix Test Suite v10.8.4