gpu3Multicore1
AMD Ryzen Threadripper 2950X 16-Core testing with an ASRock X399 Professional Gaming (P3.80 BIOS) and MSI NVIDIA GeForce GTX 1080 8GB on Ubuntu 16.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2102113-HA-GPU3MULTI38&grs .
gpu3Multicore1
Processor: AMD Ryzen Threadripper 2950X 16-Core @ 3.50GHz (16 Cores / 32 Threads)
Motherboard: ASRock X399 Professional Gaming (P3.80 BIOS)
Chipset: AMD 17h
Memory: 126GB
Disk: 1000GB Samsung SSD 860
Graphics: MSI NVIDIA GeForce GTX 1080 8GB
Audio: NVIDIA GP104 HD Audio
Network: Aquantia AQC107 NBase-T/IEEE + 2 x Intel I211 + Intel Dual Band-AC 3168NGW
OS: Ubuntu 16.04
Kernel: 4.19.174-custom (x86_64)
Display Server: X Server 1.19.6
Display Driver: NVIDIA
OpenCL: OpenCL 1.2 CUDA 10.1.120
Vulkan: 1.1.99
Compiler: GCC 5.4.0 20160609 + Clang 3.8.0-2ubuntu4 + CUDA 9.2
File-System: ext4
Screen Resolution: 640x480

- Transparent Huge Pages: madvise
- --build=x86_64-linux-gnu --disable-browser-plugin --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-gnu-unique-object --enable-gtk-cairo --enable-java-awt=gtk --enable-java-home --enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --enable-libmpx --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-arch-directory=amd64 --with-default-libstdcxx-abi=new --with-multilib-list=m32,m64,mx32 --with-tune=generic -v
- Scaling Governor: acpi-cpufreq ondemand (Boost: Enabled)
- CPU Microcode: 0x800820b
- Python 2.7.12 + Python 3.5.2
- itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Full AMD retpoline IBPB: conditional STIBP: disabled RSB filling + srbds: Not affected + tsx_async_abort: Not affected
gpu3Multicore1
xsbench = 2979914
blender: Pabellon Barcelona - NVIDIA OptiX = 954.94
blender: Pabellon Barcelona - CPU-Only = 339.34
blender: Pabellon Barcelona - OpenCL = 957.47
blender: Pabellon Barcelona - CUDA = 192.15
blender: Barbershop - NVIDIA OptiX = 585.55
blender: Fishy Cat - NVIDIA OptiX = 852.90
blender: Classroom - NVIDIA OptiX = 321.26
blender: Barbershop - CPU-Only = 449.85
blender: Fishy Cat - CPU-Only = 143.83
blender: Classroom - CPU-Only = 290.41
blender: BMW27 - NVIDIA OptiX = 316.48
blender: Barbershop - OpenCL = 575.76
blender: Fishy Cat - OpenCL = 854.84
blender: Classroom - OpenCL = 324.61
blender: Barbershop - CUDA = 236.16
blender: Fishy Cat - CUDA = 70.47
blender: Classroom - CUDA = 67.35
blender: BMW27 - CPU-Only = 99.46
blender: BMW27 - OpenCL = 316.48
blender: BMW27 - CUDA = 27.40
sysbench: CPU = 34334.3745
sysbench: Memory = 7093621.1853
mysqlslap: 512 = 170
mysqlslap: 256 = 171
mysqlslap: 128 = 177
mysqlslap: 64 = 219
mysqlslap: 16 = 945
mysqlslap: 8 = 1018
mysqlslap: 4 = 1073
mysqlslap: 1 = 1813
gromacs: Water Benchmark = 1.130
intel-mpi: IMB-P2P PingPong = 6329420
askap: Hogbom Clean OpenMP = 268.337
askap: tConvolve OpenMP - Degridding = 2716.9
askap: tConvolve OpenMP - Gridding = 1857.74
askap: tConvolve MT - Degridding = 2273.88
askap: tConvolve MT - Gridding = 1575.19
aircrack-ng = 28284.282
tachyon: Total Time = 55.4636
radiance: SMP Parallel = 235.272
radiance: Serial = 758.359
n-queens: Elapsed Time = 8.010
m-queens: Time To Solve = 36.964
ffmpeg: H.264 HD To NTSC DV = 7.688
build-eigen: Time To Compile = 63.576
aobench: 2048 x 2048 - Total Time = 44.496
tungsten: Volumetric Caustic = 10.1462
tungsten: Non-Exponential = 10.1038
tungsten: Water Caustic = 26.4170
tungsten: Hair = 16.8214
smallpt: Global Illumination Renderer; 128 Samples = 6.618
rust-prime: Prime Number Test To 200,000,000 = 20.465
rust-mandel: Time To Complete Serial/Parallel Mandelbrot = 38.554
primesieve: 1e12 Prime Number Generation = 13.639
compress-pbzip2: 256MB File Compression = 2.430
c-ray: Total Time - 4K, 16 Rays Per Pixel = 34.584
build2: Time To Compile = 90.045
build-mplayer: Time To Compile = 25.106
build-llvm: Time To Compile = 429.220
build-linux-kernel: Time To Compile = 49.810
build-imagemagick: Time To Compile = 20.801
build-gcc: Time To Compile = 947.290
build-ffmpeg: Time To Compile = 39.720
ebizzy = 444948
swet: Average = 757710060
asmfish: 1024 Hash Memory, 26 Depth = 39041533
compress-7zip: Compress Speed Test = 73186
coremark: CoreMark Size 666 - Iterations Per Second = 490007.450348
openvkl: vklBenchmarkUnstructuredVolume = 1358541
openvkl: vklBenchmarkStructuredVolume = 65718519
openvkl: vklBenchmarkVdbVolume = 16110776
openvkl: vklBenchmark = 179
oidn: Memorial = 6.93
mt-dgemm: Sustained Floating-Point Rate = 1.610873
x265: Bosphorus 1080p = 14.65
x265: Bosphorus 4K = 5.07
vpxenc: Speed 5 = 18.27
vpxenc: Speed 0 = 5.93
svt-hevc: 1080p 8-bit YUV To HEVC Video Encode = 72.35
svt-av1: Enc Mode 8 - 1080p = 35.694
svt-av1: Enc Mode 4 - 1080p = 4.703
svt-av1: Enc Mode 0 - 1080p = 0.126
rav1e: 10 = 3.032
rav1e: 6 = 1.371
rav1e: 5 = 1.042
rav1e: 1 = 0.347
kvazaar: Bosphorus 1080p - Ultra Fast = 106.81
kvazaar: Bosphorus 1080p - Very Fast = 59.84
kvazaar: Bosphorus 4K - Ultra Fast = 31.16
kvazaar: Bosphorus 4K - Very Fast = 16.77
kvazaar: Bosphorus 1080p - Medium = 26.92
kvazaar: Bosphorus 1080p - Slow = 25.99
kvazaar: Bosphorus 4K - Medium = 7.06
kvazaar: Bosphorus 4K - Slow = 6.94
aom-av1: Speed 8 Realtime = 27.12
aom-av1: Speed 6 Two-Pass = 2.97
aom-av1: Speed 6 Realtime = 15.06
aom-av1: Speed 4 Two-Pass = 1.87
aom-av1: Speed 0 Two-Pass = 0.26
ospray: Magnetic Reconnection - Path Tracer = 166.67
ospray: NASA Streamlines - Path Tracer = 4.42
ospray: Magnetic Reconnection - SciVis = 12.66
ospray: XFrog Forest - Path Tracer = 1.58
ospray: NASA Streamlines - SciVis = 22.22
ospray: San Miguel - Path Tracer = 1.33
ospray: XFrog Forest - SciVis = 3.05
ospray: San Miguel - SciVis = 17.54
onednn: Matrix Multiply Batch Shapes Transformer - u8s8f32 - CPU = 2.45445
onednn: Recurrent Neural Network Inference - bf16bf16bf16 - CPU = 2153.61
onednn: Recurrent Neural Network Training - bf16bf16bf16 - CPU = 4043.52
onednn: Matrix Multiply Batch Shapes Transformer - f32 - CPU = 2.59347
onednn: Recurrent Neural Network Inference - u8s8f32 - CPU = 2150.86
onednn: Recurrent Neural Network Training - u8s8f32 - CPU = 4055.95
onednn: Recurrent Neural Network Inference - f32 - CPU = 2153.94
onednn: Recurrent Neural Network Training - f32 - CPU = 4043.69
onednn: Deconvolution Batch shapes_3d - u8s8f32 - CPU = 5.31369
onednn: Deconvolution Batch shapes_1d - u8s8f32 - CPU = 8.31723
onednn: Convolution Batch Shapes Auto - u8s8f32 - CPU = 13.4320
onednn: Deconvolution Batch shapes_3d - f32 - CPU = 7.52129
onednn: Deconvolution Batch shapes_1d - f32 - CPU = 4.75016
onednn: Convolution Batch Shapes Auto - f32 - CPU = 10.0448
onednn: IP Shapes 3D - u8s8f32 - CPU = 1.50128
onednn: IP Shapes 1D - u8s8f32 - CPU = 2.93945
onednn: IP Shapes 3D - f32 - CPU = 5.03966
onednn: IP Shapes 1D - f32 - CPU = 4.10359
nero2d: Total Time = 32.454
john-the-ripper: MD5 = 1052667
john-the-ripper: Blowfish = 34430
arrayfire: BLAS CPU = 410.332
compress-zstd: 19 = 45.4
compress-zstd: 3 = 5059.1
libgav1: Chimera 1080p 10-bit = 15.29
libgav1: Summer Nature 1080p = 53.71
libgav1: Summer Nature 4K = 17.03
libgav1: Chimera 1080p = 37.25
lammps: Rhodopsin Protein = 10.019
lammps: 20k Atoms = 10.921
pennant: leblancbig = 38.66881
pennant: sedovbig = 51.03195
namd: ATPase Simulation - 327,506 Atoms = 1.30977
rodinia: OpenMP Streamcluster = 19.508
rodinia: OpenMP CFD Solver = 17.988
rodinia: OpenMP Leukocyte = 108.620
rodinia: OpenMP HotSpot3D = 97.437
rodinia: OpenMP LavaMD = 330.705
parboil: OpenMP MRI Gridding = 170.793706
parboil: OpenMP Stencil = 8.251748
parboil: OpenMP MRI-Q = 6.089053
parboil: OpenMP CUTCP = 2.122072
parboil: OpenMP LBM = 71.679357
npb: SP.B = 13344.88
npb: MG.C = 17236.60
npb: LU.C = 42621.39
npb: IS.D = 877.87
npb: FT.C = 19497.22
npb: EP.D = 621.18
npb: EP.C = 631.90
npb: CG.C = 8001.35
npb: BT.C = 37405.63
hpcg = 7.02881
mysqlslap: 32 = 322
intel-mpi: IMB-MPI1 Sendrecv = 270.90
intel-mpi: IMB-MPI1 Sendrecv = 2001.41
intel-mpi: IMB-MPI1 PingPong = 2338.83
intel-mpi: IMB-MPI1 Exchange = 575.42
intel-mpi: IMB-MPI1 Exchange = 1701.77

Xsbench 2017-07-06 (Lookups/s, More Is Better)
gpu3Multicore1: 2979914 (SE +/- 920.47, N = 3)
1. (CC) gcc options: -std=gnu99 -fopenmp -O3 -lm

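Each result in this export pairs a mean with an "SE +/-" figure and a run count N. A rough way to compare run-to-run noise across tests with very different units is to express the standard error as a percentage of the mean. The sketch below does that for the Xsbench figures above; the helper name `relative_standard_error` is hypothetical and is not part of the Phoronix Test Suite.

```python
def relative_standard_error(mean: float, se: float) -> float:
    """Return the standard error as a percentage of the mean."""
    if mean == 0:
        raise ValueError("mean must be non-zero")
    return abs(se / mean) * 100.0


# Figures copied from the Xsbench result above: 2979914 Lookups/s, SE +/- 920.47.
xsbench_rse = relative_standard_error(2979914, 920.47)
print(f"Xsbench relative SE: {xsbench_rse:.3f}%")
```

A small percentage here only says the three runs agreed closely with each other; it says nothing about whether the configuration itself is representative.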
Blender 2.90 (Seconds, Fewer Is Better)
Blend File: Pabellon Barcelona - Compute: NVIDIA OptiX
gpu3Multicore1: 954.94 (SE +/- 5.63, N = 3)

Blender 2.90 (Seconds, Fewer Is Better)
Blend File: Pabellon Barcelona - Compute: CPU-Only
gpu3Multicore1: 339.34 (SE +/- 0.86, N = 3)

Blender 2.90 (Seconds, Fewer Is Better)
Blend File: Pabellon Barcelona - Compute: OpenCL
gpu3Multicore1: 957.47 (SE +/- 1.53, N = 3)

Blender 2.90 (Seconds, Fewer Is Better)
Blend File: Pabellon Barcelona - Compute: CUDA
gpu3Multicore1: 192.15 (SE +/- 1.07, N = 3)

Blender 2.90 (Seconds, Fewer Is Better)
Blend File: Barbershop - Compute: NVIDIA OptiX
gpu3Multicore1: 585.55 (SE +/- 2.13, N = 3)

Blender 2.90 (Seconds, Fewer Is Better)
Blend File: Fishy Cat - Compute: NVIDIA OptiX
gpu3Multicore1: 852.90 (SE +/- 4.49, N = 3)

Blender 2.90 (Seconds, Fewer Is Better)
Blend File: Classroom - Compute: NVIDIA OptiX
gpu3Multicore1: 321.26 (SE +/- 0.41, N = 3)

Blender 2.90 (Seconds, Fewer Is Better)
Blend File: Barbershop - Compute: CPU-Only
gpu3Multicore1: 449.85 (SE +/- 0.88, N = 3)

Blender 2.90 (Seconds, Fewer Is Better)
Blend File: Fishy Cat - Compute: CPU-Only
gpu3Multicore1: 143.83 (SE +/- 0.04, N = 3)

Blender 2.90 (Seconds, Fewer Is Better)
Blend File: Classroom - Compute: CPU-Only
gpu3Multicore1: 290.41 (SE +/- 0.30, N = 3)

Blender 2.90 (Seconds, Fewer Is Better)
Blend File: BMW27 - Compute: NVIDIA OptiX
gpu3Multicore1: 316.48 (SE +/- 0.62, N = 3)

Blender 2.90 (Seconds, Fewer Is Better)
Blend File: Barbershop - Compute: OpenCL
gpu3Multicore1: 575.76 (SE +/- 2.24, N = 3)

Blender 2.90 (Seconds, Fewer Is Better)
Blend File: Fishy Cat - Compute: OpenCL
gpu3Multicore1: 854.84 (SE +/- 4.14, N = 3)

Blender 2.90 (Seconds, Fewer Is Better)
Blend File: Classroom - Compute: OpenCL
gpu3Multicore1: 324.61 (SE +/- 1.77, N = 3)

Blender 2.90 (Seconds, Fewer Is Better)
Blend File: Barbershop - Compute: CUDA
gpu3Multicore1: 236.16 (SE +/- 1.46, N = 3)

Blender 2.90 (Seconds, Fewer Is Better)
Blend File: Fishy Cat - Compute: CUDA
gpu3Multicore1: 70.47 (SE +/- 0.72, N = 3)

Blender 2.90 (Seconds, Fewer Is Better)
Blend File: Classroom - Compute: CUDA
gpu3Multicore1: 67.35 (SE +/- 0.53, N = 3)

Blender 2.90 (Seconds, Fewer Is Better)
Blend File: BMW27 - Compute: CPU-Only
gpu3Multicore1: 99.46 (SE +/- 0.23, N = 3)

Blender 2.90 (Seconds, Fewer Is Better)
Blend File: BMW27 - Compute: OpenCL
gpu3Multicore1: 316.48 (SE +/- 0.57, N = 3)

Blender 2.90 (Seconds, Fewer Is Better)
Blend File: BMW27 - Compute: CUDA
gpu3Multicore1: 27.40 (SE +/- 0.02, N = 3)

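The Blender results cover five scenes across four compute back ends, which is easier to read as a per-scene comparison. The following sketch, using only the render times reported above, finds the fastest back end for each scene and the CUDA speedup over CPU-Only. The dictionary layout and the helper names `fastest_backend` and `speedup_over_cpu` are illustrative assumptions, not part of the exported result.

```python
# Render times in seconds copied from the Blender 2.90 results above (fewer is better).
blender_seconds = {
    "Pabellon Barcelona": {"NVIDIA OptiX": 954.94, "CPU-Only": 339.34, "OpenCL": 957.47, "CUDA": 192.15},
    "Barbershop": {"NVIDIA OptiX": 585.55, "CPU-Only": 449.85, "OpenCL": 575.76, "CUDA": 236.16},
    "Fishy Cat": {"NVIDIA OptiX": 852.90, "CPU-Only": 143.83, "OpenCL": 854.84, "CUDA": 70.47},
    "Classroom": {"NVIDIA OptiX": 321.26, "CPU-Only": 290.41, "OpenCL": 324.61, "CUDA": 67.35},
    "BMW27": {"NVIDIA OptiX": 316.48, "CPU-Only": 99.46, "OpenCL": 316.48, "CUDA": 27.40},
}


def fastest_backend(times: dict) -> str:
    """Return the back end with the lowest render time."""
    return min(times, key=times.get)


def speedup_over_cpu(times: dict, backend: str) -> float:
    """Return CPU-Only time divided by the given back end's time (>1 means faster than CPU)."""
    return times["CPU-Only"] / times[backend]


fastest = {scene: fastest_backend(t) for scene, t in blender_seconds.items()}
cuda_speedup = {scene: speedup_over_cpu(t, "CUDA") for scene, t in blender_seconds.items()}

for scene in blender_seconds:
    print(f"{scene}: fastest = {fastest[scene]}, CUDA vs CPU-Only = {cuda_speedup[scene]:.2f}x")
```

On these numbers CUDA is the fastest back end in every scene, while the OptiX and OpenCL runs are slower than CPU-Only. The export does not say why, so that pattern is best treated as an observation rather than a conclusion.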
Sysbench 2018-07-28 (Events Per Second, More Is Better)
Test: CPU
gpu3Multicore1: 34334.37 (SE +/- 2.71, N = 3)
1. (CC) gcc options: -pthread -O3 -funroll-loops -ggdb3 -march=amdfam10 -rdynamic -ldl -lm

Sysbench 2018-07-28 (Events Per Second, More Is Better)
Test: Memory
gpu3Multicore1: 7093621.19 (SE +/- 3413.58, N = 3)
1. (CC) gcc options: -pthread -O3 -funroll-loops -ggdb3 -march=amdfam10 -rdynamic -ldl -lm

MariaDB 10.5.2 (Queries Per Second, More Is Better)
Clients: 512
gpu3Multicore1: 170 (SE +/- 0.29, N = 3)
1. (CXX) g++ options: -pie -fPIC -fstack-protector -O2 -lpthread -lbz2 -lnuma -lcrypt -lz -lm -lssl -lcrypto -ldl

MariaDB 10.5.2 (Queries Per Second, More Is Better)
Clients: 256
gpu3Multicore1: 171 (SE +/- 0.37, N = 3)
1. (CXX) g++ options: -pie -fPIC -fstack-protector -O2 -lpthread -lbz2 -lnuma -lcrypt -lz -lm -lssl -lcrypto -ldl

MariaDB 10.5.2 (Queries Per Second, More Is Better)
Clients: 128
gpu3Multicore1: 177 (SE +/- 0.10, N = 3)
1. (CXX) g++ options: -pie -fPIC -fstack-protector -O2 -lpthread -lbz2 -lnuma -lcrypt -lz -lm -lssl -lcrypto -ldl

MariaDB 10.5.2 (Queries Per Second, More Is Better)
Clients: 64
gpu3Multicore1: 219 (SE +/- 0.35, N = 3)
1. (CXX) g++ options: -pie -fPIC -fstack-protector -O2 -lpthread -lbz2 -lnuma -lcrypt -lz -lm -lssl -lcrypto -ldl

MariaDB 10.5.2 (Queries Per Second, More Is Better)
Clients: 16
gpu3Multicore1: 945 (SE +/- 1.66, N = 3)
1. (CXX) g++ options: -pie -fPIC -fstack-protector -O2 -lpthread -lbz2 -lnuma -lcrypt -lz -lm -lssl -lcrypto -ldl

MariaDB 10.5.2 (Queries Per Second, More Is Better)
Clients: 8
gpu3Multicore1: 1018 (SE +/- 3.01, N = 3)
1. (CXX) g++ options: -pie -fPIC -fstack-protector -O2 -lpthread -lbz2 -lnuma -lcrypt -lz -lm -lssl -lcrypto -ldl

MariaDB 10.5.2 (Queries Per Second, More Is Better)
Clients: 4
gpu3Multicore1: 1073 (SE +/- 2.39, N = 3)
1. (CXX) g++ options: -pie -fPIC -fstack-protector -O2 -lpthread -lbz2 -lnuma -lcrypt -lz -lm -lssl -lcrypto -ldl

MariaDB 10.5.2 (Queries Per Second, More Is Better)
Clients: 1
gpu3Multicore1: 1813 (SE +/- 7.16, N = 3)
1. (CXX) g++ options: -pie -fPIC -fstack-protector -O2 -lpthread -lbz2 -lnuma -lcrypt -lz -lm -lssl -lcrypto -ldl

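The MariaDB results report queries per second at eight client counts, and the trend is easier to see when each figure is normalised against the single-client run. The sketch below does that with the values listed above. The variable names are illustrative, and the 32-client figure that appears only in the summary table is deliberately left out.

```python
# Queries per second by client count, copied from the MariaDB 10.5.2 results above.
mariadb_qps = {1: 1813, 4: 1073, 8: 1018, 16: 945, 64: 219, 128: 177, 256: 171, 512: 170}

baseline = mariadb_qps[1]

# Throughput of each configuration relative to the single-client run.
relative_qps = {
    clients: qps / baseline for clients, qps in sorted(mariadb_qps.items())
}

for clients, ratio in relative_qps.items():
    print(f"{clients:>3} clients: {mariadb_qps[clients]:>5} QPS ({ratio:.1%} of 1 client)")
```

In this run throughput falls as clients are added rather than rising, so the single-client figure is the peak. The export gives no explanation for this, and the sketch only restates the reported numbers.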
GROMACS 2020.3 (Ns Per Day, More Is Better)
Water Benchmark
gpu3Multicore1: 1.130 (SE +/- 0.002, N = 3)
1. (CXX) g++ options: -O3 -pthread -lrt -lpthread -lm

Intel MPI Benchmarks 2019.3 (Average Msg/sec, More Is Better)
Test: IMB-P2P PingPong
gpu3Multicore1: 6329420 (SE +/- 35072.49, N = 3; MIN: 1185 / MAX: 15350414)
1. (CXX) g++ options: -O0 -pedantic -fopenmp -pthread -lmpi_cxx -lmpi

ASKAP 1.0 (Iterations Per Second, More Is Better)
Test: Hogbom Clean OpenMP
gpu3Multicore1: 268.34 (SE +/- 0.24, N = 3)
1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp

ASKAP 1.0 (Million Grid Points Per Second, More Is Better)
Test: tConvolve OpenMP - Degridding
gpu3Multicore1: 2716.9 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp

ASKAP 1.0 (Million Grid Points Per Second, More Is Better)
Test: tConvolve OpenMP - Gridding
gpu3Multicore1: 1857.74 (SE +/- 11.39, N = 3)
1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp

ASKAP 1.0 (Million Grid Points Per Second, More Is Better)
Test: tConvolve MT - Degridding
gpu3Multicore1: 2273.88 (SE +/- 2.64, N = 3)
1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp

ASKAP 1.0 (Million Grid Points Per Second, More Is Better)
Test: tConvolve MT - Gridding
gpu3Multicore1: 1575.19 (SE +/- 0.61, N = 3)
1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp

Aircrack-ng 1.5.2 (k/s, More Is Better)
gpu3Multicore1: 28284.28 (SE +/- 68.07, N = 3)
1. (CXX) g++ options: -O3 -fvisibility=hidden -masm=intel -fcommon -rdynamic -lpthread -lz -lcrypto -lhwloc -ldl -lm -pthread

Tachyon 0.99b6 (Seconds, Fewer Is Better)
Total Time
gpu3Multicore1: 55.46 (SE +/- 0.15, N = 3)
1. (CC) gcc options: -m64 -O3 -fomit-frame-pointer -ffast-math -ltachyon -lm -lpthread

Radiance Benchmark 5.0 (Seconds, Fewer Is Better)
Test: SMP Parallel
gpu3Multicore1: 235.27

Radiance Benchmark 5.0 (Seconds, Fewer Is Better)
Test: Serial
gpu3Multicore1: 758.36

N-Queens 1.0 (Seconds, Fewer Is Better)
Elapsed Time
gpu3Multicore1: 8.010 (SE +/- 0.001, N = 3)
1. (CC) gcc options: -static -fopenmp -O3 -march=native

m-queens 1.2 (Seconds, Fewer Is Better)
Time To Solve
gpu3Multicore1: 36.96 (SE +/- 0.08, N = 3)
1. (CXX) g++ options: -fopenmp -O2 -march=native

FFmpeg 4.0.2 (Seconds, Fewer Is Better)
H.264 HD To NTSC DV
gpu3Multicore1: 7.688 (SE +/- 0.050, N = 3)
1. (CC) gcc options: -lavdevice -lavfilter -lavformat -lavcodec -lswresample -lswscale -lavutil -lm -lxcb -lxcb-shape -lxcb-xfixes -lxcb-render -pthread -lbz2 -std=c11 -fomit-frame-pointer -O3 -fno-math-errno -fno-signed-zeros -fno-tree-vectorize -MMD -MF -MT

Timed Eigen Compilation 3.3.9 (Seconds, Fewer Is Better)
Time To Compile
gpu3Multicore1: 63.58 (SE +/- 0.06, N = 3)

AOBench (Seconds, Fewer Is Better)
Size: 2048 x 2048 - Total Time
gpu3Multicore1: 44.50 (SE +/- 0.14, N = 3)
1. (CC) gcc options: -lm -O3

Tungsten Renderer 0.2.2 (Seconds, Fewer Is Better)
Scene: Volumetric Caustic
gpu3Multicore1: 10.15 (SE +/- 0.01, N = 3)
1. (CXX) g++ options: -std=c++0x -march=broadwell -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msse4a -mfma -mbmi2 -mno-avx -mno-avx2 -mno-xop -mno-fma4 -mno-avx512f -mno-avx512vl -mno-avx512pf -mno-avx512er -mno-avx512cd -mno-avx512dq -mno-avx512bw -mno-avx512ifma -mno-avx512vbmi -fstrict-aliasing -O3 -rdynamic -lpthread -ldl

Tungsten Renderer 0.2.2 (Seconds, Fewer Is Better)
Scene: Non-Exponential
gpu3Multicore1: 10.10 (SE +/- 0.04, N = 3)
1. (CXX) g++ options: -std=c++0x -march=broadwell -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msse4a -mfma -mbmi2 -mno-avx -mno-avx2 -mno-xop -mno-fma4 -mno-avx512f -mno-avx512vl -mno-avx512pf -mno-avx512er -mno-avx512cd -mno-avx512dq -mno-avx512bw -mno-avx512ifma -mno-avx512vbmi -fstrict-aliasing -O3 -rdynamic -lpthread -ldl

Tungsten Renderer 0.2.2 (Seconds, Fewer Is Better)
Scene: Water Caustic
gpu3Multicore1: 26.42 (SE +/- 0.09, N = 3)
1. (CXX) g++ options: -std=c++0x -march=broadwell -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msse4a -mfma -mbmi2 -mno-avx -mno-avx2 -mno-xop -mno-fma4 -mno-avx512f -mno-avx512vl -mno-avx512pf -mno-avx512er -mno-avx512cd -mno-avx512dq -mno-avx512bw -mno-avx512ifma -mno-avx512vbmi -fstrict-aliasing -O3 -rdynamic -lpthread -ldl

Tungsten Renderer 0.2.2 (Seconds, Fewer Is Better)
Scene: Hair
gpu3Multicore1: 16.82 (SE +/- 0.03, N = 3)
1. (CXX) g++ options: -std=c++0x -march=broadwell -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msse4a -mfma -mbmi2 -mno-avx -mno-avx2 -mno-xop -mno-fma4 -mno-avx512f -mno-avx512vl -mno-avx512pf -mno-avx512er -mno-avx512cd -mno-avx512dq -mno-avx512bw -mno-avx512ifma -mno-avx512vbmi -fstrict-aliasing -O3 -rdynamic -lpthread -ldl

Smallpt 1.0 (Seconds, Fewer Is Better)
Global Illumination Renderer; 128 Samples
gpu3Multicore1: 6.618 (SE +/- 0.008, N = 3)
1. (CXX) g++ options: -fopenmp -O3

Rust Prime Benchmark (Seconds, Fewer Is Better)
Prime Number Test To 200,000,000
gpu3Multicore1: 20.47 (SE +/- 0.01, N = 3)
1. (CC) gcc options: -m64 -pie -nodefaultlibs -ldl -lrt -lpthread -lgcc_s -lc -lm -lutil

Rust Mandelbrot (Seconds, Fewer Is Better)
Time To Complete Serial/Parallel Mandelbrot
gpu3Multicore1: 38.55 (SE +/- 0.01, N = 3)
1. (CC) gcc options: -m64 -pie -nodefaultlibs -ldl -lrt -lpthread -lgcc_s -lc -lm -lutil

Primesieve 7.4 (Seconds, Fewer Is Better)
1e12 Prime Number Generation
gpu3Multicore1: 13.64 (SE +/- 0.03, N = 3)
1. (CXX) g++ options: -O3 -lpthread

Parallel BZIP2 Compression 1.1.12 (Seconds, Fewer Is Better)
256MB File Compression
gpu3Multicore1: 2.430 (SE +/- 0.009, N = 3)
1. (CXX) g++ options: -O2 -pthread -lbz2 -lpthread

C-Ray 1.1 (Seconds, Fewer Is Better)
Total Time - 4K, 16 Rays Per Pixel
gpu3Multicore1: 34.58 (SE +/- 0.06, N = 3)
1. (CC) gcc options: -lm -lpthread -O3

Build2 0.13 (Seconds, Fewer Is Better)
Time To Compile
gpu3Multicore1: 90.05 (SE +/- 0.16, N = 3)

Timed MPlayer Compilation 1.4 (Seconds, Fewer Is Better)
Time To Compile
gpu3Multicore1: 25.11 (SE +/- 0.07, N = 3)

Timed LLVM Compilation 10.0 (Seconds, Fewer Is Better)
Time To Compile
gpu3Multicore1: 429.22 (SE +/- 2.48, N = 3)

Timed Linux Kernel Compilation 5.4 (Seconds, Fewer Is Better)
Time To Compile
gpu3Multicore1: 49.81 (SE +/- 0.53, N = 3)

Timed ImageMagick Compilation 6.9.0 (Seconds, Fewer Is Better)
Time To Compile
gpu3Multicore1: 20.80 (SE +/- 0.10, N = 3)

Timed GCC Compilation 9.3.0 (Seconds, Fewer Is Better)
Time To Compile
gpu3Multicore1: 947.29 (SE +/- 0.65, N = 3)

Timed FFmpeg Compilation 4.2.2 (Seconds, Fewer Is Better)
Time To Compile
gpu3Multicore1: 39.72 (SE +/- 0.09, N = 3)

ebizzy 0.3 (Records/s, More Is Better)
gpu3Multicore1: 444948 (SE +/- 5403.48, N = 15)
1. (CC) gcc options: -pthread -lpthread -O3 -march=native

Swet 1.5.16 (Operations Per Second, More Is Better)
Average
gpu3Multicore1: 757710060 (SE +/- 9820613.43, N = 3)
1. (CC) gcc options: -lm -lpthread -lcurses -lrt

asmFish 2018-07-23 (Nodes/second, More Is Better)
1024 Hash Memory, 26 Depth
gpu3Multicore1: 39041533 (SE +/- 41169.03, N = 3)

7-Zip Compression 16.02 (MIPS, More Is Better)
Compress Speed Test
gpu3Multicore1: 73186 (SE +/- 211.29, N = 3)
1. (CXX) g++ options: -pipe -lpthread

Coremark 1.0 (Iterations/Sec, More Is Better)
CoreMark Size 666 - Iterations Per Second
gpu3Multicore1: 490007.45 (SE +/- 885.70, N = 3)
1. (CC) gcc options: -O2 -lrt" -lrt

OpenVKL 0.9 (Items / Sec, More Is Better)
Benchmark: vklBenchmarkUnstructuredVolume
gpu3Multicore1: 1358541 (SE +/- 2593.98, N = 3; MIN: 17295 / MAX: 4612260)

OpenVKL 0.9 (Items / Sec, More Is Better)
Benchmark: vklBenchmarkStructuredVolume
gpu3Multicore1: 65718519 (SE +/- 249259.82, N = 3; MIN: 429600 / MAX: 792670464)

OpenVKL 0.9 (Items / Sec, More Is Better)
Benchmark: vklBenchmarkVdbVolume
gpu3Multicore1: 16110776 (SE +/- 189330.67, N = 3; MIN: 463741 / MAX: 96050880)

OpenVKL 0.9 (Items / Sec, More Is Better)
Benchmark: vklBenchmark
gpu3Multicore1: 179 (MIN: 1 / MAX: 587)

Intel Open Image Denoise 1.2.0 (Images / Sec, More Is Better)
Scene: Memorial
gpu3Multicore1: 6.93 (SE +/- 0.05, N = 3)

ACES DGEMM 1.0 (GFLOP/s, More Is Better)
Sustained Floating-Point Rate
gpu3Multicore1: 1.610873 (SE +/- 0.007697, N = 3)
1. (CC) gcc options: -O3 -march=native -fopenmp

x265 3.4 (Frames Per Second, More Is Better)
Video Input: Bosphorus 1080p
gpu3Multicore1: 14.65 (SE +/- 0.06, N = 3)
1. (CXX) g++ options: -O3 -rdynamic -lpthread -lrt -ldl -lnuma

x265 3.4 (Frames Per Second, More Is Better)
Video Input: Bosphorus 4K
gpu3Multicore1: 5.07 (SE +/- 0.06, N = 3)
1. (CXX) g++ options: -O3 -rdynamic -lpthread -lrt -ldl -lnuma

VP9 libvpx Encoding 1.8.2 (Frames Per Second, More Is Better)
Speed: Speed 5
gpu3Multicore1: 18.27 (SE +/- 0.05, N = 3)
1. (CXX) g++ options: -m64 -lm -lpthread -O3 -fPIC -U_FORTIFY_SOURCE -std=c++11

VP9 libvpx Encoding 1.8.2 (Frames Per Second, More Is Better)
Speed: Speed 0
gpu3Multicore1: 5.93 (SE +/- 0.01, N = 3)
1. (CXX) g++ options: -m64 -lm -lpthread -O3 -fPIC -U_FORTIFY_SOURCE -std=c++11

SVT-HEVC 1.4.1 (Frames Per Second, More Is Better)
1080p 8-bit YUV To HEVC Video Encode
gpu3Multicore1: 72.35 (SE +/- 0.05, N = 3)
1. (CC) gcc options: -fPIE -fPIC -O3 -O2 -pie -rdynamic -lpthread -lrt

SVT-AV1 0.8 (Frames Per Second, More Is Better)
Encoder Mode: Enc Mode 8 - Input: 1080p
gpu3Multicore1: 35.69 (SE +/- 0.12, N = 3)
1. (CXX) g++ options: -O3 -fcommon -fPIE -fPIC -pie

SVT-AV1 0.8 (Frames Per Second, More Is Better)
Encoder Mode: Enc Mode 4 - Input: 1080p
gpu3Multicore1: 4.703 (SE +/- 0.015, N = 3)
1. (CXX) g++ options: -O3 -fcommon -fPIE -fPIC -pie

SVT-AV1 0.8 (Frames Per Second, More Is Better)
Encoder Mode: Enc Mode 0 - Input: 1080p
gpu3Multicore1: 0.126 (SE +/- 0.000, N = 3)
1. (CXX) g++ options: -O3 -fcommon -fPIE -fPIC -pie

rav1e 0.4 (Frames Per Second, More Is Better)
Speed: 10
gpu3Multicore1: 3.032 (SE +/- 0.009, N = 3)

rav1e 0.4 (Frames Per Second, More Is Better)
Speed: 6
gpu3Multicore1: 1.371 (SE +/- 0.002, N = 3)

rav1e 0.4 (Frames Per Second, More Is Better)
Speed: 5
gpu3Multicore1: 1.042 (SE +/- 0.001, N = 3)

rav1e 0.4 (Frames Per Second, More Is Better)
Speed: 1
gpu3Multicore1: 0.347 (SE +/- 0.001, N = 3)

Kvazaar 2.0 (Frames Per Second, More Is Better)
Video Input: Bosphorus 1080p - Video Preset: Ultra Fast
gpu3Multicore1: 106.81 (SE +/- 0.08, N = 3)
1. (CC) gcc options: -pthread -ftree-vectorize -fvisibility=hidden -O2 -lpthread -lm -lrt

Kvazaar 2.0 (Frames Per Second, More Is Better)
Video Input: Bosphorus 1080p - Video Preset: Very Fast
gpu3Multicore1: 59.84 (SE +/- 0.04, N = 3)
1. (CC) gcc options: -pthread -ftree-vectorize -fvisibility=hidden -O2 -lpthread -lm -lrt

Kvazaar 2.0 (Frames Per Second, More Is Better)
Video Input: Bosphorus 4K - Video Preset: Ultra Fast
gpu3Multicore1: 31.16 (SE +/- 0.05, N = 3)
1. (CC) gcc options: -pthread -ftree-vectorize -fvisibility=hidden -O2 -lpthread -lm -lrt

Kvazaar 2.0 (Frames Per Second, More Is Better)
Video Input: Bosphorus 4K - Video Preset: Very Fast
gpu3Multicore1: 16.77 (SE +/- 0.01, N = 3)
1. (CC) gcc options: -pthread -ftree-vectorize -fvisibility=hidden -O2 -lpthread -lm -lrt

Kvazaar 2.0 (Frames Per Second, More Is Better)
Video Input: Bosphorus 1080p - Video Preset: Medium
gpu3Multicore1: 26.92 (SE +/- 0.04, N = 3)
1. (CC) gcc options: -pthread -ftree-vectorize -fvisibility=hidden -O2 -lpthread -lm -lrt

Kvazaar 2.0 (Frames Per Second, More Is Better)
Video Input: Bosphorus 1080p - Video Preset: Slow
gpu3Multicore1: 25.99 (SE +/- 0.03, N = 3)
1. (CC) gcc options: -pthread -ftree-vectorize -fvisibility=hidden -O2 -lpthread -lm -lrt

Kvazaar 2.0 (Frames Per Second, More Is Better)
Video Input: Bosphorus 4K - Video Preset: Medium
gpu3Multicore1: 7.06 (SE +/- 0.01, N = 3)
1. (CC) gcc options: -pthread -ftree-vectorize -fvisibility=hidden -O2 -lpthread -lm -lrt

Kvazaar 2.0 (Frames Per Second, More Is Better)
Video Input: Bosphorus 4K - Video Preset: Slow
gpu3Multicore1: 6.94 (SE +/- 0.02, N = 3)
1. (CC) gcc options: -pthread -ftree-vectorize -fvisibility=hidden -O2 -lpthread -lm -lrt

AOM AV1 2.0 (Frames Per Second, More Is Better)
Encoder Mode: Speed 8 Realtime
gpu3Multicore1: 27.12 (SE +/- 0.17, N = 3)
1. (CXX) g++ options: -O3 -std=c++11 -U_FORTIFY_SOURCE -lm -lpthread

AOM AV1 2.0 (Frames Per Second, More Is Better)
Encoder Mode: Speed 6 Two-Pass
gpu3Multicore1: 2.97 (SE +/- 0.01, N = 3)
1. (CXX) g++ options: -O3 -std=c++11 -U_FORTIFY_SOURCE -lm -lpthread

AOM AV1 2.0 (Frames Per Second, More Is Better)
Encoder Mode: Speed 6 Realtime
gpu3Multicore1: 15.06 (SE +/- 0.02, N = 3)
1. (CXX) g++ options: -O3 -std=c++11 -U_FORTIFY_SOURCE -lm -lpthread

AOM AV1 2.0 (Frames Per Second, More Is Better)
Encoder Mode: Speed 4 Two-Pass
gpu3Multicore1: 1.87 (SE +/- 0.01, N = 3)
1. (CXX) g++ options: -O3 -std=c++11 -U_FORTIFY_SOURCE -lm -lpthread

AOM AV1 2.0 (Frames Per Second, More Is Better)
Encoder Mode: Speed 0 Two-Pass
gpu3Multicore1: 0.26 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -O3 -std=c++11 -U_FORTIFY_SOURCE -lm -lpthread

OSPray 1.8.5, Demo: Magnetic Reconnection - Renderer: Path Tracer. FPS, More Is Better. gpu3Multicore1: 166.67 (SE +/- 0.00, N = 3; MIN: 125 / MAX: 200)
OSPray 1.8.5, Demo: NASA Streamlines - Renderer: Path Tracer. FPS, More Is Better. gpu3Multicore1: 4.42 (SE +/- 0.01, N = 3; MIN: 4.35 / MAX: 4.55)
OSPray 1.8.5, Demo: Magnetic Reconnection - Renderer: SciVis. FPS, More Is Better. gpu3Multicore1: 12.66 (SE +/- 0.00, N = 3; MIN: 12.5 / MAX: 12.82)
OSPray 1.8.5, Demo: XFrog Forest - Renderer: Path Tracer. FPS, More Is Better. gpu3Multicore1: 1.58 (SE +/- 0.00, N = 3; MIN: 1.56 / MAX: 1.61)
OSPray 1.8.5, Demo: NASA Streamlines - Renderer: SciVis. FPS, More Is Better. gpu3Multicore1: 22.22 (SE +/- 0.00, N = 3; MIN: 21.74 / MAX: 22.73)
OSPray 1.8.5, Demo: San Miguel - Renderer: Path Tracer. FPS, More Is Better. gpu3Multicore1: 1.33 (SE +/- 0.00, N = 3; MIN: 1.32 / MAX: 1.34)
OSPray 1.8.5, Demo: XFrog Forest - Renderer: SciVis. FPS, More Is Better. gpu3Multicore1: 3.05 (SE +/- 0.00, N = 3; MIN: 3.01 / MAX: 3.09)
OSPray 1.8.5, Demo: San Miguel - Renderer: SciVis. FPS, More Is Better. gpu3Multicore1: 17.54 (SE +/- 0.00, N = 3; MIN: 16.95 / MAX: 18.52)
oneDNN 2.0, Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32 - Engine: CPU. ms, Fewer Is Better. gpu3Multicore1: 2.45445 (SE +/- 0.00098, N = 3; MIN: 2.29). (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0, Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU. ms, Fewer Is Better. gpu3Multicore1: 2153.61 (SE +/- 1.16, N = 3; MIN: 2144.87). (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0, Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU. ms, Fewer Is Better. gpu3Multicore1: 4043.52 (SE +/- 6.67, N = 3; MIN: 4021.89). (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0, Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU. ms, Fewer Is Better. gpu3Multicore1: 2.59347 (SE +/- 0.00134, N = 3; MIN: 2.53). (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0, Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU. ms, Fewer Is Better. gpu3Multicore1: 2150.86 (SE +/- 1.08, N = 3; MIN: 2141.22). (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0, Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU. ms, Fewer Is Better. gpu3Multicore1: 4055.95 (SE +/- 0.97, N = 3; MIN: 4045.37). (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0, Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU. ms, Fewer Is Better. gpu3Multicore1: 2153.94 (SE +/- 2.49, N = 3; MIN: 2140.59). (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0, Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU. ms, Fewer Is Better. gpu3Multicore1: 4043.69 (SE +/- 10.83, N = 3; MIN: 4015.01). (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0, Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU. ms, Fewer Is Better. gpu3Multicore1: 5.31369 (SE +/- 0.00230, N = 3; MIN: 5.08). (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0, Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU. ms, Fewer Is Better. gpu3Multicore1: 8.31723 (SE +/- 0.09293, N = 3; MIN: 7.94). (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0, Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU. ms, Fewer Is Better. gpu3Multicore1: 13.43 (SE +/- 0.01, N = 3; MIN: 13.21). (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0, Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU. ms, Fewer Is Better. gpu3Multicore1: 7.52129 (SE +/- 0.00364, N = 3; MIN: 6.83). (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0, Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU. ms, Fewer Is Better. gpu3Multicore1: 4.75016 (SE +/- 0.01438, N = 3; MIN: 4.35). (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0, Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU. ms, Fewer Is Better. gpu3Multicore1: 10.04 (SE +/- 0.01, N = 3; MIN: 9.89). (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0, Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU. ms, Fewer Is Better. gpu3Multicore1: 1.50128 (SE +/- 0.00670, N = 3; MIN: 1.41). (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0, Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPU. ms, Fewer Is Better. gpu3Multicore1: 2.93945 (SE +/- 0.00193, N = 3; MIN: 2.79). (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0, Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU. ms, Fewer Is Better. gpu3Multicore1: 5.03966 (SE +/- 0.00243, N = 3; MIN: 4.98). (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0, Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU. ms, Fewer Is Better. gpu3Multicore1: 4.10359 (SE +/- 0.01135, N = 3; MIN: 3.82). (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
Open FMM Nero2D 2.0.2, Total Time. Seconds, Fewer Is Better. gpu3Multicore1: 32.45 (SE +/- 0.08, N = 3). (CXX) g++ options: -O2 -lfftw3 -llapack -lblas -lgfortran -lquadmath -lm -pthread -lmpi_cxx -lmpi
John The Ripper 1.9.0-jumbo-1, Test: MD5. Real C/S, More Is Better. gpu3Multicore1: 1052667 (SE +/- 6489.31, N = 3). (CC) gcc options: -m64 -lssl -lcrypto -fopenmp -pthread -lm -lz -ldl -lcrypt -lbz2
John The Ripper 1.9.0-jumbo-1, Test: Blowfish. Real C/S, More Is Better. gpu3Multicore1: 34430 (SE +/- 93.23, N = 3). (CC) gcc options: -m64 -lssl -lcrypto -fopenmp -pthread -lm -lz -ldl -lcrypt -lbz2
ArrayFire 3.7, Test: BLAS CPU. GFLOPS, More Is Better. gpu3Multicore1: 410.33 (SE +/- 0.49, N = 3). (CXX) g++ options: -rdynamic
Zstd Compression 1.4.5, Compression Level: 19. MB/s, More Is Better. gpu3Multicore1: 45.4 (SE +/- 0.03, N = 3). (CC) gcc options: -O3 -pthread -lz
Zstd Compression 1.4.5, Compression Level: 3. MB/s, More Is Better. gpu3Multicore1: 5059.1 (SE +/- 3.85, N = 3). (CC) gcc options: -O3 -pthread -lz
libgav1 2019-10-05, Video Input: Chimera 1080p 10-bit. FPS, More Is Better. gpu3Multicore1: 15.29 (SE +/- 0.15, N = 3). (CXX) g++ options: -O3 -lpthread
libgav1 2019-10-05, Video Input: Summer Nature 1080p. FPS, More Is Better. gpu3Multicore1: 53.71 (SE +/- 0.04, N = 3). (CXX) g++ options: -O3 -lpthread
libgav1 2019-10-05, Video Input: Summer Nature 4K. FPS, More Is Better. gpu3Multicore1: 17.03 (SE +/- 0.01, N = 3). (CXX) g++ options: -O3 -lpthread
libgav1 2019-10-05, Video Input: Chimera 1080p. FPS, More Is Better. gpu3Multicore1: 37.25 (SE +/- 0.05, N = 3). (CXX) g++ options: -O3 -lpthread
LAMMPS Molecular Dynamics Simulator 29Oct2020, Model: Rhodopsin Protein. ns/day, More Is Better. gpu3Multicore1: 10.02 (SE +/- 0.11, N = 15). (CXX) g++ options: -O3 -pthread -lm
LAMMPS Molecular Dynamics Simulator 29Oct2020, Model: 20k Atoms. ns/day, More Is Better. gpu3Multicore1: 10.92 (SE +/- 0.03, N = 3). (CXX) g++ options: -O3 -pthread -lm
Pennant 1.0.1, Test: leblancbig. Hydro Cycle Time - Seconds, Fewer Is Better. gpu3Multicore1: 38.67 (SE +/- 0.03, N = 3). (CXX) g++ options: -fopenmp -pthread -lmpi_cxx -lmpi
Pennant 1.0.1, Test: sedovbig. Hydro Cycle Time - Seconds, Fewer Is Better. gpu3Multicore1: 51.03 (SE +/- 0.04, N = 3). (CXX) g++ options: -fopenmp -pthread -lmpi_cxx -lmpi
NAMD 2.14, ATPase Simulation - 327,506 Atoms. days/ns, Fewer Is Better. gpu3Multicore1: 1.30977 (SE +/- 0.00223, N = 3)
Rodinia 3.1, Test: OpenMP Streamcluster. Seconds, Fewer Is Better. gpu3Multicore1: 19.51 (SE +/- 0.23, N = 15). (CXX) g++ options: -m64 -lm -lcuda -lcudart -lcudadevrt -lcudart_static -lrt -lpthread -ldl
Rodinia 3.1, Test: OpenMP CFD Solver. Seconds, Fewer Is Better. gpu3Multicore1: 17.99 (SE +/- 0.06, N = 3). (CXX) g++ options: -m64 -lm -lcuda -lcudart -lcudadevrt -lcudart_static -lrt -lpthread -ldl
Rodinia 3.1, Test: OpenMP Leukocyte. Seconds, Fewer Is Better. gpu3Multicore1: 108.62 (SE +/- 0.57, N = 3). (CXX) g++ options: -m64 -lm -lcuda -lcudart -lcudadevrt -lcudart_static -lrt -lpthread -ldl
Rodinia 3.1, Test: OpenMP HotSpot3D. Seconds, Fewer Is Better. gpu3Multicore1: 97.44 (SE +/- 0.83, N = 3). (CXX) g++ options: -m64 -lm -lcuda -lcudart -lcudadevrt -lcudart_static -lrt -lpthread -ldl
Rodinia 3.1, Test: OpenMP LavaMD. Seconds, Fewer Is Better. gpu3Multicore1: 330.71 (SE +/- 0.71, N = 3). (CXX) g++ options: -m64 -lm -lcuda -lcudart -lcudadevrt -lcudart_static -lrt -lpthread -ldl
Parboil 2.5, Test: OpenMP MRI Gridding. Seconds, Fewer Is Better. gpu3Multicore1: 170.79 (SE +/- 0.80, N = 3). (CXX) g++ options: -lm -lpthread -lgomp -O3 -ffast-math -fopenmp
Parboil 2.5, Test: OpenMP Stencil. Seconds, Fewer Is Better. gpu3Multicore1: 8.251748 (SE +/- 0.029315, N = 3). (CXX) g++ options: -lm -lpthread -lgomp -O3 -ffast-math -fopenmp
Parboil 2.5, Test: OpenMP MRI-Q. Seconds, Fewer Is Better. gpu3Multicore1: 6.089053 (SE +/- 0.001530, N = 3). (CXX) g++ options: -lm -lpthread -lgomp -O3 -ffast-math -fopenmp
Parboil 2.5, Test: OpenMP CUTCP. Seconds, Fewer Is Better. gpu3Multicore1: 2.122072 (SE +/- 0.020422, N = 3). (CXX) g++ options: -lm -lpthread -lgomp -O3 -ffast-math -fopenmp
Parboil 2.5, Test: OpenMP LBM. Seconds, Fewer Is Better. gpu3Multicore1: 71.68 (SE +/- 0.02, N = 3). (CXX) g++ options: -lm -lpthread -lgomp -O3 -ffast-math -fopenmp
NAS Parallel Benchmarks 3.4, Test / Class: SP.B. Total Mop/s, More Is Better. gpu3Multicore1: 13344.88 (SE +/- 29.92, N = 3). (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi; Open MPI 1.10.2
NAS Parallel Benchmarks 3.4, Test / Class: MG.C. Total Mop/s, More Is Better. gpu3Multicore1: 17236.60 (SE +/- 8.84, N = 3). (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi; Open MPI 1.10.2
NAS Parallel Benchmarks 3.4, Test / Class: LU.C. Total Mop/s, More Is Better. gpu3Multicore1: 42621.39 (SE +/- 122.06, N = 3). (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi; Open MPI 1.10.2
NAS Parallel Benchmarks 3.4, Test / Class: IS.D. Total Mop/s, More Is Better. gpu3Multicore1: 877.87 (SE +/- 0.89, N = 3). (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi; Open MPI 1.10.2
NAS Parallel Benchmarks 3.4, Test / Class: FT.C. Total Mop/s, More Is Better. gpu3Multicore1: 19497.22 (SE +/- 18.12, N = 3). (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi; Open MPI 1.10.2
NAS Parallel Benchmarks 3.4, Test / Class: EP.D. Total Mop/s, More Is Better. gpu3Multicore1: 621.18 (SE +/- 0.35, N = 3). (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi; Open MPI 1.10.2
NAS Parallel Benchmarks 3.4, Test / Class: EP.C. Total Mop/s, More Is Better. gpu3Multicore1: 631.90 (SE +/- 0.54, N = 3). (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi; Open MPI 1.10.2
NAS Parallel Benchmarks 3.4, Test / Class: CG.C. Total Mop/s, More Is Better. gpu3Multicore1: 8001.35 (SE +/- 8.06, N = 3). (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi; Open MPI 1.10.2
NAS Parallel Benchmarks 3.4, Test / Class: BT.C. Total Mop/s, More Is Better. gpu3Multicore1: 37405.63 (SE +/- 37.34, N = 3). (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi; Open MPI 1.10.2
High Performance Conjugate Gradient 3.1. GFLOP/s, More Is Better. gpu3Multicore1: 7.02881 (SE +/- 0.01315, N = 3). (CXX) g++ options: -O3 -ffast-math -ftree-vectorize -pthread -lmpi_cxx -lmpi
MariaDB 10.5.2, Clients: 32. Queries Per Second, More Is Better. gpu3Multicore1: 322 (SE +/- 62.25, N = 9). (CXX) g++ options: -pie -fPIC -fstack-protector -O2 -lpthread -lbz2 -lnuma -lcrypt -lz -lm -lssl -lcrypto -ldl
Intel MPI Benchmarks 2019.3, Test: IMB-MPI1 Sendrecv. Average usec, Fewer Is Better. gpu3Multicore1: 270.90 (SE +/- 5.97, N = 15; MIN: 0.16 / MAX: 7916.44). (CXX) g++ options: -O0 -pedantic -fopenmp -pthread -lmpi_cxx -lmpi
Intel MPI Benchmarks 2019.3, Test: IMB-MPI1 Sendrecv. Average Mbytes/sec, More Is Better. gpu3Multicore1: 2001.41 (SE +/- 147.96, N = 15; MAX: 19536.02). (CXX) g++ options: -O0 -pedantic -fopenmp -pthread -lmpi_cxx -lmpi
Intel MPI Benchmarks 2019.3, Test: IMB-MPI1 PingPong. Average Mbytes/sec, More Is Better. gpu3Multicore1: 2338.83 (SE +/- 442.13, N = 12; MIN: 3.77 / MAX: 11785.09). (CXX) g++ options: -O0 -pedantic -fopenmp -pthread -lmpi_cxx -lmpi
Intel MPI Benchmarks 2019.3, Test: IMB-MPI1 Exchange. Average usec, Fewer Is Better. gpu3Multicore1: 575.42 (SE +/- 17.24, N = 12; MIN: 0.3 / MAX: 17330.78). (CXX) g++ options: -O0 -pedantic -fopenmp -pthread -lmpi_cxx -lmpi
Intel MPI Benchmarks 2019.3, Test: IMB-MPI1 Exchange. Average Mbytes/sec, More Is Better. gpu3Multicore1: 1701.77 (SE +/- 136.55, N = 12; MAX: 18194.89). (CXX) g++ options: -O0 -pedantic -fopenmp -pthread -lmpi_cxx -lmpi
Phoronix Test Suite v10.8.4