gpu3Multicore1
AMD Ryzen Threadripper 2950X 16-Core testing with an ASRock X399 Professional Gaming (P3.80 BIOS) and MSI NVIDIA GeForce GTX 1080 8GB on Ubuntu 16.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2102113-HA-GPU3MULTI38&grr .
gpu3Multicore1
Processor: AMD Ryzen Threadripper 2950X 16-Core @ 3.50GHz (16 Cores / 32 Threads)
Motherboard: ASRock X399 Professional Gaming (P3.80 BIOS)
Chipset: AMD 17h
Memory: 126GB
Disk: 1000GB Samsung SSD 860
Graphics: MSI NVIDIA GeForce GTX 1080 8GB
Audio: NVIDIA GP104 HD Audio
Network: Aquantia AQC107 NBase-T/IEEE + 2 x Intel I211 + Intel Dual Band-AC 3168NGW
OS: Ubuntu 16.04
Kernel: 4.19.174-custom (x86_64)
Display Server: X Server 1.19.6
Display Driver: NVIDIA
OpenCL: OpenCL 1.2 CUDA 10.1.120
Vulkan: 1.1.99
Compiler: GCC 5.4.0 20160609 + Clang 3.8.0-2ubuntu4 + CUDA 9.2
File-System: ext4
Screen Resolution: 640x480
- Transparent Huge Pages: madvise
- --build=x86_64-linux-gnu --disable-browser-plugin --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-gnu-unique-object --enable-gtk-cairo --enable-java-awt=gtk --enable-java-home --enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --enable-libmpx --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-arch-directory=amd64 --with-default-libstdcxx-abi=new --with-multilib-list=m32,m64,mx32 --with-tune=generic -v
- Scaling Governor: acpi-cpufreq ondemand (Boost: Enabled)
- CPU Microcode: 0x800820b
- Python 2.7.12 + Python 3.5.2
- itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Full AMD retpoline IBPB: conditional STIBP: disabled RSB filling + srbds: Not affected + tsx_async_abort: Not affected
Test | gpu3Multicore1
mysqlslap: 32 | 322
mysqlslap: 512 | 170
mysqlslap: 256 | 171
mysqlslap: 128 | 177
mysqlslap: 64 | 219
blender: Pabellon Barcelona - OpenCL | 957.47
blender: Pabellon Barcelona - NVIDIA OptiX | 954.94
build-gcc: Time To Compile | 947.290
openvkl: vklBenchmarkUnstructuredVolume | 1358541
blender: Fishy Cat - OpenCL | 854.84
blender: Fishy Cat - NVIDIA OptiX | 852.90
lammps: 20k Atoms | 10.921
libgav1: Chimera 1080p 10-bit | 15.29
blender: Barbershop - NVIDIA OptiX | 585.55
blender: Barbershop - OpenCL | 575.76
blender: Barbershop - CPU-Only | 449.85
build-llvm: Time To Compile | 429.220
openvkl: vklBenchmark | 179
blender: Pabellon Barcelona - CPU-Only | 339.34
rodinia: OpenMP LavaMD | 330.705
blender: Classroom - OpenCL | 324.61
blender: Classroom - NVIDIA OptiX | 321.26
blender: BMW27 - NVIDIA OptiX | 316.48
blender: BMW27 - OpenCL | 316.48
blender: Classroom - CPU-Only | 290.41
radiance: Serial | 758.359
libgav1: Chimera 1080p | 37.25
blender: Barbershop - CUDA | 236.16
mysqlslap: 16 | 945
intel-mpi: IMB-MPI1 Exchange | 575.42
intel-mpi: IMB-MPI1 Exchange | 1701.77
npb: EP.D | 621.18
mysqlslap: 8 | 1018
libgav1: Summer Nature 4K | 17.03
hpcg: | 7.02881
mysqlslap: 4 | 1073
blender: Pabellon Barcelona - CUDA | 192.15
ospray: San Miguel - Path Tracer | 1.33
asmfish: 1024 Hash Memory, 26 Depth | 39041533
parboil: OpenMP MRI Gridding | 170.793706
intel-mpi: IMB-MPI1 Sendrecv | 270.90
intel-mpi: IMB-MPI1 Sendrecv | 2001.41
svt-av1: Enc Mode 0 - 1080p | 0.126
gromacs: Water Benchmark | 1.130
blender: Fishy Cat - CPU-Only | 143.83
mt-dgemm: Sustained Floating-Point Rate | 1.610873
askap: tConvolve MT - Degridding | 2273.88
askap: tConvolve MT - Gridding | 1575.19
ospray: XFrog Forest - Path Tracer | 1.58
mysqlslap: 1 | 1813
x265: Bosphorus 4K | 5.07
rodinia: OpenMP Leukocyte | 108.620
vpxenc: Speed 0 | 5.93
blender: BMW27 - CPU-Only | 99.46
ebizzy: | 444948
rodinia: OpenMP Streamcluster | 19.508
rodinia: OpenMP HotSpot3D | 97.437
build2: Time To Compile | 90.045
kvazaar: Bosphorus 4K - Slow | 6.94
kvazaar: Bosphorus 4K - Medium | 7.06
onednn: Recurrent Neural Network Training - u8s8f32 - CPU | 4055.95
onednn: Recurrent Neural Network Training - bf16bf16bf16 - CPU | 4043.52
onednn: Recurrent Neural Network Training - f32 - CPU | 4043.69
namd: ATPase Simulation - 327,506 Atoms | 1.30977
radiance: SMP Parallel | 235.272
onednn: Recurrent Neural Network Inference - bf16bf16bf16 - CPU | 2153.61
onednn: Recurrent Neural Network Inference - f32 - CPU | 2153.94
onednn: Recurrent Neural Network Inference - u8s8f32 - CPU | 2150.86
npb: BT.C | 37405.63
parboil: OpenMP LBM | 71.679357
ospray: XFrog Forest - SciVis | 3.05
blender: Fishy Cat - CUDA | 70.47
libgav1: Summer Nature 1080p | 53.71
blender: Classroom - CUDA | 67.35
openvkl: vklBenchmarkVdbVolume | 16110776
compress-zstd: 19 | 45.4
build-eigen: Time To Compile | 63.576
john-the-ripper: MD5 | 1052667
openvkl: vklBenchmarkStructuredVolume | 65718519
rav1e: 1 | 0.347
rav1e: 5 | 1.042
tachyon: Total Time | 55.4636
pennant: sedovbig | 51.03195
ospray: NASA Streamlines - Path Tracer | 4.42
build-linux-kernel: Time To Compile | 49.810
npb: LU.C | 42621.39
npb: IS.D | 877.87
compress-7zip: Compress Speed Test | 73186
aobench: 2048 x 2048 - Total Time | 44.496
intel-mpi: IMB-MPI1 PingPong | 2338.83
rav1e: 6 | 1.371
intel-mpi: IMB-P2P PingPong | 6329420
x265: Bosphorus 1080p | 14.65
aom-av1: Speed 6 Realtime | 15.06
build-ffmpeg: Time To Compile | 39.720
aom-av1: Speed 0 Two-Pass | 0.26
pennant: leblancbig | 38.66881
ospray: San Miguel - SciVis | 17.54
rust-mandel: Time To Complete Serial/Parallel Mandelbrot | 38.554
m-queens: Time To Solve | 36.964
kvazaar: Bosphorus 4K - Very Fast | 16.77
c-ray: Total Time - 4K, 16 Rays Per Pixel | 34.584
aom-av1: Speed 6 Two-Pass | 2.97
vpxenc: Speed 5 | 18.27
nero2d: Total Time | 32.454
compress-zstd: 3 | 5059.1
john-the-ripper: Blowfish | 34430
aircrack-ng: | 28284.282
rav1e: 10 | 3.032
oidn: Memorial | 6.93
askap: Hogbom Clean OpenMP | 268.337
blender: BMW27 - CUDA | 27.40
tungsten: Water Caustic | 26.4170
npb: SP.B | 13344.88
aom-av1: Speed 4 Two-Pass | 1.87
build-mplayer: Time To Compile | 25.106
coremark: CoreMark Size 666 - Iterations Per Second | 490007.450348
kvazaar: Bosphorus 1080p - Slow | 25.99
aom-av1: Speed 8 Realtime | 27.12
kvazaar: Bosphorus 1080p - Medium | 26.92
xsbench: | 2979914
npb: FT.C | 19497.22
onednn: Deconvolution Batch shapes_1d - u8s8f32 - CPU | 8.31723
onednn: Deconvolution Batch shapes_1d - f32 - CPU | 4.75016
build-imagemagick: Time To Compile | 20.801
rust-prime: Prime Number Test To 200,000,000 | 20.465
swet: Average | 757710060
kvazaar: Bosphorus 4K - Ultra Fast | 31.16
svt-av1: Enc Mode 4 - 1080p | 4.703
npb: CG.C | 8001.35
rodinia: OpenMP CFD Solver | 17.988
ospray: Magnetic Reconnection - SciVis | 12.66
tungsten: Hair | 16.8214
arrayfire: BLAS CPU | 410.332
askap: tConvolve OpenMP - Degridding | 2716.9
askap: tConvolve OpenMP - Gridding | 1857.74
onednn: IP Shapes 1D - f32 - CPU | 4.10359
onednn: IP Shapes 1D - u8s8f32 - CPU | 2.93945
npb: EP.C | 631.90
primesieve: 1e12 Prime Number Generation | 13.639
onednn: Matrix Multiply Batch Shapes Transformer - f32 - CPU | 2.59347
onednn: Matrix Multiply Batch Shapes Transformer - u8s8f32 - CPU | 2.45445
ospray: NASA Streamlines - SciVis | 22.22
lammps: Rhodopsin Protein | 10.019
svt-av1: Enc Mode 8 - 1080p | 35.694
npb: MG.C | 17236.60
tungsten: Volumetric Caustic | 10.1462
tungsten: Non-Exponential | 10.1038
parboil: OpenMP Stencil | 8.251748
kvazaar: Bosphorus 1080p - Very Fast | 59.84
sysbench: CPU | 34334.3745
sysbench: Memory | 7093621.1853
onednn: IP Shapes 3D - u8s8f32 - CPU | 1.50128
onednn: IP Shapes 3D - f32 - CPU | 5.03966
svt-hevc: 1080p 8-bit YUV To HEVC Video Encode | 72.35
n-queens: Elapsed Time | 8.010
ffmpeg: H.264 HD To NTSC DV | 7.688
smallpt: Global Illumination Renderer; 128 Samples | 6.618
parboil: OpenMP MRI-Q | 6.089053
onednn: Convolution Batch Shapes Auto - u8s8f32 - CPU | 13.4320
onednn: Convolution Batch Shapes Auto - f32 - CPU | 10.0448
kvazaar: Bosphorus 1080p - Ultra Fast | 106.81
onednn: Deconvolution Batch shapes_3d - u8s8f32 - CPU | 5.31369
onednn: Deconvolution Batch shapes_3d - f32 - CPU | 7.52129
ospray: Magnetic Reconnection - Path Tracer | 166.67
compress-pbzip2: 256MB File Compression | 2.430
parboil: OpenMP CUTCP | 2.122072
MariaDB 10.5.2, Clients: 32 [Queries Per Second, More Is Better] gpu3Multicore1: 322 (SE +/- 62.25, N = 9). (CXX) g++ options: -pie -fPIC -fstack-protector -O2 -lpthread -lbz2 -lnuma -lcrypt -lz -lm -lssl -lcrypto -ldl
MariaDB 10.5.2, Clients: 512 [Queries Per Second, More Is Better] gpu3Multicore1: 170 (SE +/- 0.29, N = 3). (CXX) g++ options: -pie -fPIC -fstack-protector -O2 -lpthread -lbz2 -lnuma -lcrypt -lz -lm -lssl -lcrypto -ldl
MariaDB 10.5.2, Clients: 256 [Queries Per Second, More Is Better] gpu3Multicore1: 171 (SE +/- 0.37, N = 3). (CXX) g++ options: -pie -fPIC -fstack-protector -O2 -lpthread -lbz2 -lnuma -lcrypt -lz -lm -lssl -lcrypto -ldl
MariaDB 10.5.2, Clients: 128 [Queries Per Second, More Is Better] gpu3Multicore1: 177 (SE +/- 0.10, N = 3). (CXX) g++ options: -pie -fPIC -fstack-protector -O2 -lpthread -lbz2 -lnuma -lcrypt -lz -lm -lssl -lcrypto -ldl
MariaDB 10.5.2, Clients: 64 [Queries Per Second, More Is Better] gpu3Multicore1: 219 (SE +/- 0.35, N = 3). (CXX) g++ options: -pie -fPIC -fstack-protector -O2 -lpthread -lbz2 -lnuma -lcrypt -lz -lm -lssl -lcrypto -ldl
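The five MariaDB results above each report a standard error (SE) and a run count (N). One way to compare their run-to-run noise is to express the SE as a percentage of the mean. The sketch below does that for the values listed above; the 5% cutoff and the helper name `relative_se_pct` are assumptions made for illustration, not something the Phoronix Test Suite defines.

```python
# (label, mean queries per second, standard error, run count),
# copied from the MariaDB results above.
results = [
    ("Clients: 32", 322, 62.25, 9),
    ("Clients: 512", 170, 0.29, 3),
    ("Clients: 256", 171, 0.37, 3),
    ("Clients: 128", 177, 0.10, 3),
    ("Clients: 64", 219, 0.35, 3),
]

# Hypothetical cutoff chosen for this example only.
NOISE_THRESHOLD_PCT = 5.0


def relative_se_pct(mean, se):
    """Return the standard error as a percentage of the mean."""
    return 100.0 * se / mean


noisy = []
for label, mean, se, runs in results:
    pct = relative_se_pct(mean, se)
    if pct > NOISE_THRESHOLD_PCT:
        noisy.append(label)
    print(f"{label:<13} mean={mean:>4} SE={se:>6.2f} N={runs} relative SE={pct:5.2f}%")

print("Above threshold:", noisy)
```

With these inputs only the 32-client result exceeds the cutoff: its SE is roughly 19% of the mean, against well under 1% for the other four.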
Blender 2.90, Blend File: Pabellon Barcelona - Compute: OpenCL [Seconds, Fewer Is Better] gpu3Multicore1: 957.47 (SE +/- 1.53, N = 3)
Blender 2.90, Blend File: Pabellon Barcelona - Compute: NVIDIA OptiX [Seconds, Fewer Is Better] gpu3Multicore1: 954.94 (SE +/- 5.63, N = 3)
Timed GCC Compilation 9.3.0, Time To Compile [Seconds, Fewer Is Better] gpu3Multicore1: 947.29 (SE +/- 0.65, N = 3)
OpenVKL 0.9, Benchmark: vklBenchmarkUnstructuredVolume [Items / Sec, More Is Better] gpu3Multicore1: 1358541 (SE +/- 2593.98, N = 3; MIN: 17295 / MAX: 4612260)
Blender 2.90, Blend File: Fishy Cat - Compute: OpenCL [Seconds, Fewer Is Better] gpu3Multicore1: 854.84 (SE +/- 4.14, N = 3)
Blender 2.90, Blend File: Fishy Cat - Compute: NVIDIA OptiX [Seconds, Fewer Is Better] gpu3Multicore1: 852.90 (SE +/- 4.49, N = 3)
LAMMPS Molecular Dynamics Simulator 29Oct2020, Model: 20k Atoms [ns/day, More Is Better] gpu3Multicore1: 10.92 (SE +/- 0.03, N = 3). (CXX) g++ options: -O3 -pthread -lm
libgav1 2019-10-05, Video Input: Chimera 1080p 10-bit [FPS, More Is Better] gpu3Multicore1: 15.29 (SE +/- 0.15, N = 3). (CXX) g++ options: -O3 -lpthread
Blender 2.90, Blend File: Barbershop - Compute: NVIDIA OptiX [Seconds, Fewer Is Better] gpu3Multicore1: 585.55 (SE +/- 2.13, N = 3)
Blender 2.90, Blend File: Barbershop - Compute: OpenCL [Seconds, Fewer Is Better] gpu3Multicore1: 575.76 (SE +/- 2.24, N = 3)
Blender 2.90, Blend File: Barbershop - Compute: CPU-Only [Seconds, Fewer Is Better] gpu3Multicore1: 449.85 (SE +/- 0.88, N = 3)
Timed LLVM Compilation 10.0, Time To Compile [Seconds, Fewer Is Better] gpu3Multicore1: 429.22 (SE +/- 2.48, N = 3)
OpenVKL 0.9, Benchmark: vklBenchmark [Items / Sec, More Is Better] gpu3Multicore1: 179 (MIN: 1 / MAX: 587)
Blender 2.90, Blend File: Pabellon Barcelona - Compute: CPU-Only [Seconds, Fewer Is Better] gpu3Multicore1: 339.34 (SE +/- 0.86, N = 3)
Rodinia 3.1, Test: OpenMP LavaMD [Seconds, Fewer Is Better] gpu3Multicore1: 330.71 (SE +/- 0.71, N = 3). (CXX) g++ options: -m64 -lm -lcuda -lcudart -lcudadevrt -lcudart_static -lrt -lpthread -ldl
Blender 2.90, Blend File: Classroom - Compute: OpenCL [Seconds, Fewer Is Better] gpu3Multicore1: 324.61 (SE +/- 1.77, N = 3)
Blender 2.90, Blend File: Classroom - Compute: NVIDIA OptiX [Seconds, Fewer Is Better] gpu3Multicore1: 321.26 (SE +/- 0.41, N = 3)
Blender 2.90, Blend File: BMW27 - Compute: NVIDIA OptiX [Seconds, Fewer Is Better] gpu3Multicore1: 316.48 (SE +/- 0.62, N = 3)
Blender 2.90, Blend File: BMW27 - Compute: OpenCL [Seconds, Fewer Is Better] gpu3Multicore1: 316.48 (SE +/- 0.57, N = 3)
Blender 2.90, Blend File: Classroom - Compute: CPU-Only [Seconds, Fewer Is Better] gpu3Multicore1: 290.41 (SE +/- 0.30, N = 3)
Radiance Benchmark 5.0, Test: Serial [Seconds, Fewer Is Better] gpu3Multicore1: 758.36
libgav1 2019-10-05, Video Input: Chimera 1080p [FPS, More Is Better] gpu3Multicore1: 37.25 (SE +/- 0.05, N = 3). (CXX) g++ options: -O3 -lpthread
Blender 2.90, Blend File: Barbershop - Compute: CUDA [Seconds, Fewer Is Better] gpu3Multicore1: 236.16 (SE +/- 1.46, N = 3)
MariaDB 10.5.2, Clients: 16 [Queries Per Second, More Is Better] gpu3Multicore1: 945 (SE +/- 1.66, N = 3). (CXX) g++ options: -pie -fPIC -fstack-protector -O2 -lpthread -lbz2 -lnuma -lcrypt -lz -lm -lssl -lcrypto -ldl
Intel MPI Benchmarks 2019.3, Test: IMB-MPI1 Exchange [Average usec, Fewer Is Better] gpu3Multicore1: 575.42 (SE +/- 17.24, N = 12; MIN: 0.3 / MAX: 17330.78). (CXX) g++ options: -O0 -pedantic -fopenmp -pthread -lmpi_cxx -lmpi
Intel MPI Benchmarks 2019.3, Test: IMB-MPI1 Exchange [Average Mbytes/sec, More Is Better] gpu3Multicore1: 1701.77 (SE +/- 136.55, N = 12; MAX: 18194.89). (CXX) g++ options: -O0 -pedantic -fopenmp -pthread -lmpi_cxx -lmpi
NAS Parallel Benchmarks 3.4, Test / Class: EP.D [Total Mop/s, More Is Better] gpu3Multicore1: 621.18 (SE +/- 0.35, N = 3). (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi; Open MPI 1.10.2
MariaDB 10.5.2, Clients: 8 [Queries Per Second, More Is Better] gpu3Multicore1: 1018 (SE +/- 3.01, N = 3). (CXX) g++ options: -pie -fPIC -fstack-protector -O2 -lpthread -lbz2 -lnuma -lcrypt -lz -lm -lssl -lcrypto -ldl
libgav1 2019-10-05, Video Input: Summer Nature 4K [FPS, More Is Better] gpu3Multicore1: 17.03 (SE +/- 0.01, N = 3). (CXX) g++ options: -O3 -lpthread
High Performance Conjugate Gradient 3.1 [GFLOP/s, More Is Better] gpu3Multicore1: 7.02881 (SE +/- 0.01315, N = 3). (CXX) g++ options: -O3 -ffast-math -ftree-vectorize -pthread -lmpi_cxx -lmpi
MariaDB 10.5.2, Clients: 4 [Queries Per Second, More Is Better] gpu3Multicore1: 1073 (SE +/- 2.39, N = 3). (CXX) g++ options: -pie -fPIC -fstack-protector -O2 -lpthread -lbz2 -lnuma -lcrypt -lz -lm -lssl -lcrypto -ldl
Blender 2.90, Blend File: Pabellon Barcelona - Compute: CUDA [Seconds, Fewer Is Better] gpu3Multicore1: 192.15 (SE +/- 1.07, N = 3)
OSPray 1.8.5, Demo: San Miguel - Renderer: Path Tracer [FPS, More Is Better] gpu3Multicore1: 1.33 (SE +/- 0.00, N = 3; MIN: 1.32 / MAX: 1.34)
asmFish 2018-07-23, 1024 Hash Memory, 26 Depth [Nodes/second, More Is Better] gpu3Multicore1: 39041533 (SE +/- 41169.03, N = 3)
Parboil 2.5, Test: OpenMP MRI Gridding [Seconds, Fewer Is Better] gpu3Multicore1: 170.79 (SE +/- 0.80, N = 3). (CXX) g++ options: -lm -lpthread -lgomp -O3 -ffast-math -fopenmp
Intel MPI Benchmarks 2019.3, Test: IMB-MPI1 Sendrecv [Average usec, Fewer Is Better] gpu3Multicore1: 270.90 (SE +/- 5.97, N = 15; MIN: 0.16 / MAX: 7916.44). (CXX) g++ options: -O0 -pedantic -fopenmp -pthread -lmpi_cxx -lmpi
Intel MPI Benchmarks 2019.3, Test: IMB-MPI1 Sendrecv [Average Mbytes/sec, More Is Better] gpu3Multicore1: 2001.41 (SE +/- 147.96, N = 15; MAX: 19536.02). (CXX) g++ options: -O0 -pedantic -fopenmp -pthread -lmpi_cxx -lmpi
SVT-AV1 0.8, Encoder Mode: Enc Mode 0 - Input: 1080p [Frames Per Second, More Is Better] gpu3Multicore1: 0.126 (SE +/- 0.000, N = 3). (CXX) g++ options: -O3 -fcommon -fPIE -fPIC -pie
GROMACS 2020.3, Water Benchmark [Ns Per Day, More Is Better] gpu3Multicore1: 1.130 (SE +/- 0.002, N = 3). (CXX) g++ options: -O3 -pthread -lrt -lpthread -lm
Blender 2.90, Blend File: Fishy Cat - Compute: CPU-Only [Seconds, Fewer Is Better] gpu3Multicore1: 143.83 (SE +/- 0.04, N = 3)
ACES DGEMM 1.0, Sustained Floating-Point Rate [GFLOP/s, More Is Better] gpu3Multicore1: 1.610873 (SE +/- 0.007697, N = 3). (CC) gcc options: -O3 -march=native -fopenmp
ASKAP 1.0, Test: tConvolve MT - Degridding [Million Grid Points Per Second, More Is Better] gpu3Multicore1: 2273.88 (SE +/- 2.64, N = 3). (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp
ASKAP 1.0, Test: tConvolve MT - Gridding [Million Grid Points Per Second, More Is Better] gpu3Multicore1: 1575.19 (SE +/- 0.61, N = 3). (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp
OSPray 1.8.5, Demo: XFrog Forest - Renderer: Path Tracer [FPS, More Is Better] gpu3Multicore1: 1.58 (SE +/- 0.00, N = 3; MIN: 1.56 / MAX: 1.61)
MariaDB 10.5.2, Clients: 1 [Queries Per Second, More Is Better] gpu3Multicore1: 1813 (SE +/- 7.16, N = 3). (CXX) g++ options: -pie -fPIC -fstack-protector -O2 -lpthread -lbz2 -lnuma -lcrypt -lz -lm -lssl -lcrypto -ldl
x265 3.4, Video Input: Bosphorus 4K [Frames Per Second, More Is Better] gpu3Multicore1: 5.07 (SE +/- 0.06, N = 3). (CXX) g++ options: -O3 -rdynamic -lpthread -lrt -ldl -lnuma
Rodinia 3.1, Test: OpenMP Leukocyte [Seconds, Fewer Is Better] gpu3Multicore1: 108.62 (SE +/- 0.57, N = 3). (CXX) g++ options: -m64 -lm -lcuda -lcudart -lcudadevrt -lcudart_static -lrt -lpthread -ldl
VP9 libvpx Encoding 1.8.2, Speed: Speed 0 [Frames Per Second, More Is Better] gpu3Multicore1: 5.93 (SE +/- 0.01, N = 3). (CXX) g++ options: -m64 -lm -lpthread -O3 -fPIC -U_FORTIFY_SOURCE -std=c++11
Blender 2.90, Blend File: BMW27 - Compute: CPU-Only [Seconds, Fewer Is Better] gpu3Multicore1: 99.46 (SE +/- 0.23, N = 3)
ebizzy 0.3 [Records/s, More Is Better] gpu3Multicore1: 444948 (SE +/- 5403.48, N = 15). (CC) gcc options: -pthread -lpthread -O3 -march=native
Rodinia 3.1, Test: OpenMP Streamcluster [Seconds, Fewer Is Better] gpu3Multicore1: 19.51 (SE +/- 0.23, N = 15). (CXX) g++ options: -m64 -lm -lcuda -lcudart -lcudadevrt -lcudart_static -lrt -lpthread -ldl
Rodinia 3.1, Test: OpenMP HotSpot3D [Seconds, Fewer Is Better] gpu3Multicore1: 97.44 (SE +/- 0.83, N = 3). (CXX) g++ options: -m64 -lm -lcuda -lcudart -lcudadevrt -lcudart_static -lrt -lpthread -ldl
Build2 0.13, Time To Compile [Seconds, Fewer Is Better] gpu3Multicore1: 90.05 (SE +/- 0.16, N = 3)
Kvazaar 2.0, Video Input: Bosphorus 4K - Video Preset: Slow [Frames Per Second, More Is Better] gpu3Multicore1: 6.94 (SE +/- 0.02, N = 3). (CC) gcc options: -pthread -ftree-vectorize -fvisibility=hidden -O2 -lpthread -lm -lrt
Kvazaar 2.0, Video Input: Bosphorus 4K - Video Preset: Medium [Frames Per Second, More Is Better] gpu3Multicore1: 7.06 (SE +/- 0.01, N = 3). (CC) gcc options: -pthread -ftree-vectorize -fvisibility=hidden -O2 -lpthread -lm -lrt
oneDNN 2.0, Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU [ms, Fewer Is Better] gpu3Multicore1: 4055.95 (SE +/- 0.97, N = 3; MIN: 4045.37). (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0, Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU [ms, Fewer Is Better] gpu3Multicore1: 4043.52 (SE +/- 6.67, N = 3; MIN: 4021.89). (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0, Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU [ms, Fewer Is Better] gpu3Multicore1: 4043.69 (SE +/- 10.83, N = 3; MIN: 4015.01). (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
NAMD 2.14, ATPase Simulation - 327,506 Atoms [days/ns, Fewer Is Better] gpu3Multicore1: 1.30977 (SE +/- 0.00223, N = 3)
Radiance Benchmark 5.0, Test: SMP Parallel [Seconds, Fewer Is Better] gpu3Multicore1: 235.27
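The Radiance Benchmark 5.0 results above give one time for the Serial test and one for the SMP Parallel test, so a parallel speedup can be worked out directly. The sketch below does the arithmetic; it assumes both tests render the same workload and that the SMP run could use all 16 physical cores of the Threadripper 2950X, neither of which is stated in this export.

```python
# Wall-clock seconds copied from the two Radiance Benchmark 5.0 results above.
serial_seconds = 758.36
smp_seconds = 235.27

# Assumption: the SMP run had all 16 physical cores available.
physical_cores = 16

# Speedup is how many times faster the parallel run finished.
speedup = serial_seconds / smp_seconds

# Efficiency is the fraction of ideal linear scaling that was achieved.
efficiency = speedup / physical_cores

print(f"speedup: {speedup:.2f}x")
print(f"efficiency over {physical_cores} cores: {efficiency:.1%}")
```

Under those assumptions the SMP run is a little over three times faster than the serial one, which is about a fifth of ideal 16-core scaling.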
oneDNN 2.0, Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU [ms, Fewer Is Better] gpu3Multicore1: 2153.61 (SE +/- 1.16, N = 3; MIN: 2144.87). (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0, Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU [ms, Fewer Is Better] gpu3Multicore1: 2153.94 (SE +/- 2.49, N = 3; MIN: 2140.59). (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0, Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU [ms, Fewer Is Better] gpu3Multicore1: 2150.86 (SE +/- 1.08, N = 3; MIN: 2141.22). (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
NAS Parallel Benchmarks 3.4, Test / Class: BT.C [Total Mop/s, More Is Better] gpu3Multicore1: 37405.63 (SE +/- 37.34, N = 3). (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi; Open MPI 1.10.2
Parboil 2.5, Test: OpenMP LBM [Seconds, Fewer Is Better] gpu3Multicore1: 71.68 (SE +/- 0.02, N = 3). (CXX) g++ options: -lm -lpthread -lgomp -O3 -ffast-math -fopenmp
OSPray 1.8.5, Demo: XFrog Forest - Renderer: SciVis [FPS, More Is Better] gpu3Multicore1: 3.05 (SE +/- 0.00, N = 3; MIN: 3.01 / MAX: 3.09)
Blender 2.90, Blend File: Fishy Cat - Compute: CUDA [Seconds, Fewer Is Better] gpu3Multicore1: 70.47 (SE +/- 0.72, N = 3)
libgav1 2019-10-05, Video Input: Summer Nature 1080p [FPS, More Is Better] gpu3Multicore1: 53.71 (SE +/- 0.04, N = 3). (CXX) g++ options: -O3 -lpthread
Blender 2.90, Blend File: Classroom - Compute: CUDA [Seconds, Fewer Is Better] gpu3Multicore1: 67.35 (SE +/- 0.53, N = 3)
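By this point the Blender 2.90 results cover four scenes (Pabellon Barcelona, Fishy Cat, Barbershop, Classroom) on all four compute backends. Since the unit is seconds and fewer is better, dividing the CPU-Only time by each other backend's time gives a speedup relative to the CPU, where a ratio below 1 means that backend was slower than CPU-Only in this run. The sketch below is only that arithmetic on the reported means; the names `render_seconds` and `speedup_vs_cpu` are made up for this example.

```python
# Render times in seconds, copied from the Blender 2.90 results above.
render_seconds = {
    "Pabellon Barcelona": {"CPU-Only": 339.34, "CUDA": 192.15, "OpenCL": 957.47, "NVIDIA OptiX": 954.94},
    "Fishy Cat": {"CPU-Only": 143.83, "CUDA": 70.47, "OpenCL": 854.84, "NVIDIA OptiX": 852.90},
    "Barbershop": {"CPU-Only": 449.85, "CUDA": 236.16, "OpenCL": 575.76, "NVIDIA OptiX": 585.55},
    "Classroom": {"CPU-Only": 290.41, "CUDA": 67.35, "OpenCL": 324.61, "NVIDIA OptiX": 321.26},
}


def speedup_vs_cpu(times):
    """Map each non-CPU backend to the CPU-Only time divided by its time."""
    cpu = times["CPU-Only"]
    return {name: cpu / secs for name, secs in times.items() if name != "CPU-Only"}


fastest = {}
for scene, times in render_seconds.items():
    ratios = speedup_vs_cpu(times)
    # The fastest backend is the one with the smallest render time.
    fastest[scene] = min(times, key=times.get)
    summary = ", ".join(f"{name} {ratio:.2f}x" for name, ratio in ratios.items())
    print(f"{scene}: {summary} (fastest: {fastest[scene]})")
```

On these numbers CUDA is the fastest backend for all four scenes, while the OpenCL and NVIDIA OptiX runs all come out slower than CPU-Only.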
OpenVKL 0.9, Benchmark: vklBenchmarkVdbVolume [Items / Sec, More Is Better] gpu3Multicore1: 16110776 (SE +/- 189330.67, N = 3; MIN: 463741 / MAX: 96050880)
Zstd Compression 1.4.5, Compression Level: 19 [MB/s, More Is Better] gpu3Multicore1: 45.4 (SE +/- 0.03, N = 3). (CC) gcc options: -O3 -pthread -lz
Timed Eigen Compilation 3.3.9, Time To Compile [Seconds, Fewer Is Better] gpu3Multicore1: 63.58 (SE +/- 0.06, N = 3)
John The Ripper 1.9.0-jumbo-1, Test: MD5 [Real C/S, More Is Better] gpu3Multicore1: 1052667 (SE +/- 6489.31, N = 3). (CC) gcc options: -m64 -lssl -lcrypto -fopenmp -pthread -lm -lz -ldl -lcrypt -lbz2
OpenVKL 0.9, Benchmark: vklBenchmarkStructuredVolume [Items / Sec, More Is Better] gpu3Multicore1: 65718519 (SE +/- 249259.82, N = 3; MIN: 429600 / MAX: 792670464)
rav1e 0.4, Speed: 1 [Frames Per Second, More Is Better] gpu3Multicore1: 0.347 (SE +/- 0.001, N = 3)
rav1e 0.4, Speed: 5 [Frames Per Second, More Is Better] gpu3Multicore1: 1.042 (SE +/- 0.001, N = 3)
Tachyon 0.99b6, Total Time [Seconds, Fewer Is Better] gpu3Multicore1: 55.46 (SE +/- 0.15, N = 3). (CC) gcc options: -m64 -O3 -fomit-frame-pointer -ffast-math -ltachyon -lm -lpthread
Pennant 1.0.1, Test: sedovbig [Hydro Cycle Time - Seconds, Fewer Is Better] gpu3Multicore1: 51.03 (SE +/- 0.04, N = 3). (CXX) g++ options: -fopenmp -pthread -lmpi_cxx -lmpi
OSPray 1.8.5, Demo: NASA Streamlines - Renderer: Path Tracer [FPS, More Is Better] gpu3Multicore1: 4.42 (SE +/- 0.01, N = 3; MIN: 4.35 / MAX: 4.55)
Timed Linux Kernel Compilation 5.4, Time To Compile [Seconds, Fewer Is Better] gpu3Multicore1: 49.81 (SE +/- 0.53, N = 3)
NAS Parallel Benchmarks 3.4, Test / Class: LU.C [Total Mop/s, More Is Better] gpu3Multicore1: 42621.39 (SE +/- 122.06, N = 3). (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi; Open MPI 1.10.2
NAS Parallel Benchmarks 3.4, Test / Class: IS.D [Total Mop/s, More Is Better] gpu3Multicore1: 877.87 (SE +/- 0.89, N = 3). (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi; Open MPI 1.10.2
7-Zip Compression 16.02, Compress Speed Test [MIPS, More Is Better] gpu3Multicore1: 73186 (SE +/- 211.29, N = 3). (CXX) g++ options: -pipe -lpthread
AOBench, Size: 2048 x 2048 - Total Time [Seconds, Fewer Is Better] gpu3Multicore1: 44.50 (SE +/- 0.14, N = 3). (CC) gcc options: -lm -O3
Intel MPI Benchmarks Test: IMB-MPI1 PingPong OpenBenchmarking.org Average Mbytes/sec, More Is Better Intel MPI Benchmarks 2019.3 Test: IMB-MPI1 PingPong gpu3Multicore1 500 1000 1500 2000 2500 SE +/- 442.13, N = 12 2338.83 MIN: 3.77 / MAX: 11785.09 1. (CXX) g++ options: -O0 -pedantic -fopenmp -pthread -lmpi_cxx -lmpi
rav1e Speed: 6 OpenBenchmarking.org Frames Per Second, More Is Better rav1e 0.4 Speed: 6 gpu3Multicore1 0.3085 0.617 0.9255 1.234 1.5425 SE +/- 0.002, N = 3 1.371
Intel MPI Benchmarks Test: IMB-P2P PingPong OpenBenchmarking.org Average Msg/sec, More Is Better Intel MPI Benchmarks 2019.3 Test: IMB-P2P PingPong gpu3Multicore1 1.4M 2.8M 4.2M 5.6M 7M SE +/- 35072.49, N = 3 6329420 MIN: 1185 / MAX: 15350414 1. (CXX) g++ options: -O0 -pedantic -fopenmp -pthread -lmpi_cxx -lmpi
x265 Video Input: Bosphorus 1080p OpenBenchmarking.org Frames Per Second, More Is Better x265 3.4 Video Input: Bosphorus 1080p gpu3Multicore1 4 8 12 16 20 SE +/- 0.06, N = 3 14.65 1. (CXX) g++ options: -O3 -rdynamic -lpthread -lrt -ldl -lnuma
AOM AV1 Encoder Mode: Speed 6 Realtime OpenBenchmarking.org Frames Per Second, More Is Better AOM AV1 2.0 Encoder Mode: Speed 6 Realtime gpu3Multicore1 4 8 12 16 20 SE +/- 0.02, N = 3 15.06 1. (CXX) g++ options: -O3 -std=c++11 -U_FORTIFY_SOURCE -lm -lpthread
Timed FFmpeg Compilation Time To Compile OpenBenchmarking.org Seconds, Fewer Is Better Timed FFmpeg Compilation 4.2.2 Time To Compile gpu3Multicore1 9 18 27 36 45 SE +/- 0.09, N = 3 39.72
AOM AV1 Encoder Mode: Speed 0 Two-Pass OpenBenchmarking.org Frames Per Second, More Is Better AOM AV1 2.0 Encoder Mode: Speed 0 Two-Pass gpu3Multicore1 0.0585 0.117 0.1755 0.234 0.2925 SE +/- 0.00, N = 3 0.26 1. (CXX) g++ options: -O3 -std=c++11 -U_FORTIFY_SOURCE -lm -lpthread
Pennant Test: leblancbig OpenBenchmarking.org Hydro Cycle Time - Seconds, Fewer Is Better Pennant 1.0.1 Test: leblancbig gpu3Multicore1 9 18 27 36 45 SE +/- 0.03, N = 3 38.67 1. (CXX) g++ options: -fopenmp -pthread -lmpi_cxx -lmpi
OSPray 1.8.5 Demo: San Miguel - Renderer: SciVis (FPS, More Is Better): gpu3Multicore1 17.54, SE +/- 0.00, N = 3, MIN: 16.95 / MAX: 18.52
Rust Mandelbrot Time To Complete Serial/Parallel Mandelbrot (Seconds, Fewer Is Better): gpu3Multicore1 38.55, SE +/- 0.01, N = 3. 1. (CC) gcc options: -m64 -pie -nodefaultlibs -ldl -lrt -lpthread -lgcc_s -lc -lm -lutil
m-queens 1.2 Time To Solve (Seconds, Fewer Is Better): gpu3Multicore1 36.96, SE +/- 0.08, N = 3. 1. (CXX) g++ options: -fopenmp -O2 -march=native
Kvazaar 2.0 Video Input: Bosphorus 4K - Video Preset: Very Fast (Frames Per Second, More Is Better): gpu3Multicore1 16.77, SE +/- 0.01, N = 3. 1. (CC) gcc options: -pthread -ftree-vectorize -fvisibility=hidden -O2 -lpthread -lm -lrt
C-Ray 1.1 Total Time - 4K, 16 Rays Per Pixel (Seconds, Fewer Is Better): gpu3Multicore1 34.58, SE +/- 0.06, N = 3. 1. (CC) gcc options: -lm -lpthread -O3
AOM AV1 2.0 Encoder Mode: Speed 6 Two-Pass (Frames Per Second, More Is Better): gpu3Multicore1 2.97, SE +/- 0.01, N = 3. 1. (CXX) g++ options: -O3 -std=c++11 -U_FORTIFY_SOURCE -lm -lpthread
VP9 libvpx Encoding 1.8.2 Speed: Speed 5 (Frames Per Second, More Is Better): gpu3Multicore1 18.27, SE +/- 0.05, N = 3. 1. (CXX) g++ options: -m64 -lm -lpthread -O3 -fPIC -U_FORTIFY_SOURCE -std=c++11
Open FMM Nero2D 2.0.2 Total Time (Seconds, Fewer Is Better): gpu3Multicore1 32.45, SE +/- 0.08, N = 3. 1. (CXX) g++ options: -O2 -lfftw3 -llapack -lblas -lgfortran -lquadmath -lm -pthread -lmpi_cxx -lmpi
Zstd Compression 1.4.5 Compression Level: 3 (MB/s, More Is Better): gpu3Multicore1 5059.1, SE +/- 3.85, N = 3. 1. (CC) gcc options: -O3 -pthread -lz
John The Ripper 1.9.0-jumbo-1 Test: Blowfish (Real C/S, More Is Better): gpu3Multicore1 34430, SE +/- 93.23, N = 3. 1. (CC) gcc options: -m64 -lssl -lcrypto -fopenmp -pthread -lm -lz -ldl -lcrypt -lbz2
Aircrack-ng 1.5.2 (k/s, More Is Better): gpu3Multicore1 28284.28, SE +/- 68.07, N = 3. 1. (CXX) g++ options: -O3 -fvisibility=hidden -masm=intel -fcommon -rdynamic -lpthread -lz -lcrypto -lhwloc -ldl -lm -pthread
rav1e 0.4 Speed: 10 (Frames Per Second, More Is Better): gpu3Multicore1 3.032, SE +/- 0.009, N = 3
Intel Open Image Denoise 1.2.0 Scene: Memorial (Images / Sec, More Is Better): gpu3Multicore1 6.93, SE +/- 0.05, N = 3
ASKAP 1.0 Test: Hogbom Clean OpenMP (Iterations Per Second, More Is Better): gpu3Multicore1 268.34, SE +/- 0.24, N = 3. 1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp
Blender 2.90 Blend File: BMW27 - Compute: CUDA (Seconds, Fewer Is Better): gpu3Multicore1 27.40, SE +/- 0.02, N = 3
Tungsten Renderer 0.2.2 Scene: Water Caustic (Seconds, Fewer Is Better): gpu3Multicore1 26.42, SE +/- 0.09, N = 3. 1. (CXX) g++ options: -std=c++0x -march=broadwell -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msse4a -mfma -mbmi2 -mno-avx -mno-avx2 -mno-xop -mno-fma4 -mno-avx512f -mno-avx512vl -mno-avx512pf -mno-avx512er -mno-avx512cd -mno-avx512dq -mno-avx512bw -mno-avx512ifma -mno-avx512vbmi -fstrict-aliasing -O3 -rdynamic -lpthread -ldl
NAS Parallel Benchmarks 3.4 Test / Class: SP.B (Total Mop/s, More Is Better): gpu3Multicore1 13344.88, SE +/- 29.92, N = 3. 1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi 2. Open MPI 1.10.2
AOM AV1 2.0 Encoder Mode: Speed 4 Two-Pass (Frames Per Second, More Is Better): gpu3Multicore1 1.87, SE +/- 0.01, N = 3. 1. (CXX) g++ options: -O3 -std=c++11 -U_FORTIFY_SOURCE -lm -lpthread
Timed MPlayer Compilation 1.4 Time To Compile (Seconds, Fewer Is Better): gpu3Multicore1 25.11, SE +/- 0.07, N = 3
Coremark 1.0 CoreMark Size 666 - Iterations Per Second (Iterations/Sec, More Is Better): gpu3Multicore1 490007.45, SE +/- 885.70, N = 3. 1. (CC) gcc options: -O2 -lrt -lrt
Kvazaar 2.0 Video Input: Bosphorus 1080p - Video Preset: Slow (Frames Per Second, More Is Better): gpu3Multicore1 25.99, SE +/- 0.03, N = 3. 1. (CC) gcc options: -pthread -ftree-vectorize -fvisibility=hidden -O2 -lpthread -lm -lrt
AOM AV1 2.0 Encoder Mode: Speed 8 Realtime (Frames Per Second, More Is Better): gpu3Multicore1 27.12, SE +/- 0.17, N = 3. 1. (CXX) g++ options: -O3 -std=c++11 -U_FORTIFY_SOURCE -lm -lpthread
Kvazaar 2.0 Video Input: Bosphorus 1080p - Video Preset: Medium (Frames Per Second, More Is Better): gpu3Multicore1 26.92, SE +/- 0.04, N = 3. 1. (CC) gcc options: -pthread -ftree-vectorize -fvisibility=hidden -O2 -lpthread -lm -lrt
Xsbench 2017-07-06 (Lookups/s, More Is Better): gpu3Multicore1 2979914, SE +/- 920.47, N = 3. 1. (CC) gcc options: -std=gnu99 -fopenmp -O3 -lm
NAS Parallel Benchmarks 3.4 Test / Class: FT.C (Total Mop/s, More Is Better): gpu3Multicore1 19497.22, SE +/- 18.12, N = 3. 1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi 2. Open MPI 1.10.2
oneDNN 2.0 Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better): gpu3Multicore1 8.31723, SE +/- 0.09293, N = 3, MIN: 7.94. 1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0 Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU (ms, Fewer Is Better): gpu3Multicore1 4.75016, SE +/- 0.01438, N = 3, MIN: 4.35. 1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
Timed ImageMagick Compilation 6.9.0 Time To Compile (Seconds, Fewer Is Better): gpu3Multicore1 20.80, SE +/- 0.10, N = 3
Rust Prime Benchmark Prime Number Test To 200,000,000 (Seconds, Fewer Is Better): gpu3Multicore1 20.47, SE +/- 0.01, N = 3. 1. (CC) gcc options: -m64 -pie -nodefaultlibs -ldl -lrt -lpthread -lgcc_s -lc -lm -lutil
Swet 1.5.16 Average (Operations Per Second, More Is Better): gpu3Multicore1 757710060, SE +/- 9820613.43, N = 3. 1. (CC) gcc options: -lm -lpthread -lcurses -lrt
Kvazaar 2.0 Video Input: Bosphorus 4K - Video Preset: Ultra Fast (Frames Per Second, More Is Better): gpu3Multicore1 31.16, SE +/- 0.05, N = 3. 1. (CC) gcc options: -pthread -ftree-vectorize -fvisibility=hidden -O2 -lpthread -lm -lrt
SVT-AV1 0.8 Encoder Mode: Enc Mode 4 - Input: 1080p (Frames Per Second, More Is Better): gpu3Multicore1 4.703, SE +/- 0.015, N = 3. 1. (CXX) g++ options: -O3 -fcommon -fPIE -fPIC -pie
NAS Parallel Benchmarks 3.4 Test / Class: CG.C (Total Mop/s, More Is Better): gpu3Multicore1 8001.35, SE +/- 8.06, N = 3. 1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi 2. Open MPI 1.10.2
Rodinia 3.1 Test: OpenMP CFD Solver (Seconds, Fewer Is Better): gpu3Multicore1 17.99, SE +/- 0.06, N = 3. 1. (CXX) g++ options: -m64 -lm -lcuda -lcudart -lcudadevrt -lcudart_static -lrt -lpthread -ldl
OSPray 1.8.5 Demo: Magnetic Reconnection - Renderer: SciVis (FPS, More Is Better): gpu3Multicore1 12.66, SE +/- 0.00, N = 3, MIN: 12.5 / MAX: 12.82
Tungsten Renderer 0.2.2 Scene: Hair (Seconds, Fewer Is Better): gpu3Multicore1 16.82, SE +/- 0.03, N = 3. 1. (CXX) g++ options: -std=c++0x -march=broadwell -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msse4a -mfma -mbmi2 -mno-avx -mno-avx2 -mno-xop -mno-fma4 -mno-avx512f -mno-avx512vl -mno-avx512pf -mno-avx512er -mno-avx512cd -mno-avx512dq -mno-avx512bw -mno-avx512ifma -mno-avx512vbmi -fstrict-aliasing -O3 -rdynamic -lpthread -ldl
ArrayFire 3.7 Test: BLAS CPU (GFLOPS, More Is Better): gpu3Multicore1 410.33, SE +/- 0.49, N = 3. 1. (CXX) g++ options: -rdynamic
ASKAP 1.0 Test: tConvolve OpenMP - Degridding (Million Grid Points Per Second, More Is Better): gpu3Multicore1 2716.9, SE +/- 0.00, N = 3. 1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp
ASKAP 1.0 Test: tConvolve OpenMP - Gridding (Million Grid Points Per Second, More Is Better): gpu3Multicore1 1857.74, SE +/- 11.39, N = 3. 1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp
oneDNN 2.0 Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU (ms, Fewer Is Better): gpu3Multicore1 4.10359, SE +/- 0.01135, N = 3, MIN: 3.82. 1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0 Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better): gpu3Multicore1 2.93945, SE +/- 0.00193, N = 3, MIN: 2.79. 1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
NAS Parallel Benchmarks 3.4 Test / Class: EP.C (Total Mop/s, More Is Better): gpu3Multicore1 631.90, SE +/- 0.54, N = 3. 1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi 2. Open MPI 1.10.2
Primesieve 7.4 1e12 Prime Number Generation (Seconds, Fewer Is Better): gpu3Multicore1 13.64, SE +/- 0.03, N = 3. 1. (CXX) g++ options: -O3 -lpthread
oneDNN 2.0 Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU (ms, Fewer Is Better): gpu3Multicore1 2.59347, SE +/- 0.00134, N = 3, MIN: 2.53. 1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0 Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better): gpu3Multicore1 2.45445, SE +/- 0.00098, N = 3, MIN: 2.29. 1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
OSPray 1.8.5 Demo: NASA Streamlines - Renderer: SciVis (FPS, More Is Better): gpu3Multicore1 22.22, SE +/- 0.00, N = 3, MIN: 21.74 / MAX: 22.73
LAMMPS Molecular Dynamics Simulator 29Oct2020 Model: Rhodopsin Protein (ns/day, More Is Better): gpu3Multicore1 10.02, SE +/- 0.11, N = 15. 1. (CXX) g++ options: -O3 -pthread -lm
SVT-AV1 0.8 Encoder Mode: Enc Mode 8 - Input: 1080p (Frames Per Second, More Is Better): gpu3Multicore1 35.69, SE +/- 0.12, N = 3. 1. (CXX) g++ options: -O3 -fcommon -fPIE -fPIC -pie
NAS Parallel Benchmarks 3.4 Test / Class: MG.C (Total Mop/s, More Is Better): gpu3Multicore1 17236.60, SE +/- 8.84, N = 3. 1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi 2. Open MPI 1.10.2
Tungsten Renderer 0.2.2 Scene: Volumetric Caustic (Seconds, Fewer Is Better): gpu3Multicore1 10.15, SE +/- 0.01, N = 3. 1. (CXX) g++ options: -std=c++0x -march=broadwell -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msse4a -mfma -mbmi2 -mno-avx -mno-avx2 -mno-xop -mno-fma4 -mno-avx512f -mno-avx512vl -mno-avx512pf -mno-avx512er -mno-avx512cd -mno-avx512dq -mno-avx512bw -mno-avx512ifma -mno-avx512vbmi -fstrict-aliasing -O3 -rdynamic -lpthread -ldl
Tungsten Renderer 0.2.2 Scene: Non-Exponential (Seconds, Fewer Is Better): gpu3Multicore1 10.10, SE +/- 0.04, N = 3. 1. (CXX) g++ options: -std=c++0x -march=broadwell -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msse4a -mfma -mbmi2 -mno-avx -mno-avx2 -mno-xop -mno-fma4 -mno-avx512f -mno-avx512vl -mno-avx512pf -mno-avx512er -mno-avx512cd -mno-avx512dq -mno-avx512bw -mno-avx512ifma -mno-avx512vbmi -fstrict-aliasing -O3 -rdynamic -lpthread -ldl
Parboil 2.5 Test: OpenMP Stencil (Seconds, Fewer Is Better): gpu3Multicore1 8.251748, SE +/- 0.029315, N = 3. 1. (CXX) g++ options: -lm -lpthread -lgomp -O3 -ffast-math -fopenmp
Kvazaar 2.0 Video Input: Bosphorus 1080p - Video Preset: Very Fast (Frames Per Second, More Is Better): gpu3Multicore1 59.84, SE +/- 0.04, N = 3. 1. (CC) gcc options: -pthread -ftree-vectorize -fvisibility=hidden -O2 -lpthread -lm -lrt
Sysbench 2018-07-28 Test: CPU (Events Per Second, More Is Better): gpu3Multicore1 34334.37, SE +/- 2.71, N = 3. 1. (CC) gcc options: -pthread -O3 -funroll-loops -ggdb3 -march=amdfam10 -rdynamic -ldl -lm
Sysbench 2018-07-28 Test: Memory (Events Per Second, More Is Better): gpu3Multicore1 7093621.19, SE +/- 3413.58, N = 3. 1. (CC) gcc options: -pthread -O3 -funroll-loops -ggdb3 -march=amdfam10 -rdynamic -ldl -lm
oneDNN 2.0 Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better): gpu3Multicore1 1.50128, SE +/- 0.00670, N = 3, MIN: 1.41. 1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0 Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU (ms, Fewer Is Better): gpu3Multicore1 5.03966, SE +/- 0.00243, N = 3, MIN: 4.98. 1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
SVT-HEVC 1.4.1 1080p 8-bit YUV To HEVC Video Encode (Frames Per Second, More Is Better): gpu3Multicore1 72.35, SE +/- 0.05, N = 3. 1. (CC) gcc options: -fPIE -fPIC -O3 -O2 -pie -rdynamic -lpthread -lrt
N-Queens 1.0 Elapsed Time (Seconds, Fewer Is Better): gpu3Multicore1 8.010, SE +/- 0.001, N = 3. 1. (CC) gcc options: -static -fopenmp -O3 -march=native
FFmpeg 4.0.2 H.264 HD To NTSC DV (Seconds, Fewer Is Better): gpu3Multicore1 7.688, SE +/- 0.050, N = 3. 1. (CC) gcc options: -lavdevice -lavfilter -lavformat -lavcodec -lswresample -lswscale -lavutil -lm -lxcb -lxcb-shape -lxcb-xfixes -lxcb-render -pthread -lbz2 -std=c11 -fomit-frame-pointer -O3 -fno-math-errno -fno-signed-zeros -fno-tree-vectorize -MMD -MF -MT
Smallpt 1.0 Global Illumination Renderer; 128 Samples (Seconds, Fewer Is Better): gpu3Multicore1 6.618, SE +/- 0.008, N = 3. 1. (CXX) g++ options: -fopenmp -O3
Parboil 2.5 Test: OpenMP MRI-Q (Seconds, Fewer Is Better): gpu3Multicore1 6.089053, SE +/- 0.001530, N = 3. 1. (CXX) g++ options: -lm -lpthread -lgomp -O3 -ffast-math -fopenmp
oneDNN 2.0 Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better): gpu3Multicore1 13.43, SE +/- 0.01, N = 3, MIN: 13.21. 1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0 Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU (ms, Fewer Is Better): gpu3Multicore1 10.04, SE +/- 0.01, N = 3, MIN: 9.89. 1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
Kvazaar 2.0 Video Input: Bosphorus 1080p - Video Preset: Ultra Fast (Frames Per Second, More Is Better): gpu3Multicore1 106.81, SE +/- 0.08, N = 3. 1. (CC) gcc options: -pthread -ftree-vectorize -fvisibility=hidden -O2 -lpthread -lm -lrt
oneDNN 2.0 Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better): gpu3Multicore1 5.31369, SE +/- 0.00230, N = 3, MIN: 5.08. 1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
oneDNN 2.0 Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU (ms, Fewer Is Better): gpu3Multicore1 7.52129, SE +/- 0.00364, N = 3, MIN: 6.83. 1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
OSPray 1.8.5 Demo: Magnetic Reconnection - Renderer: Path Tracer (FPS, More Is Better): gpu3Multicore1 166.67, SE +/- 0.00, N = 3, MIN: 125 / MAX: 200
Parallel BZIP2 Compression 1.1.12 256MB File Compression (Seconds, Fewer Is Better): gpu3Multicore1 2.430, SE +/- 0.009, N = 3. 1. (CXX) g++ options: -O2 -pthread -lbz2 -lpthread
Parboil 2.5 Test: OpenMP CUTCP (Seconds, Fewer Is Better): gpu3Multicore1 2.122072, SE +/- 0.020422, N = 3. 1. (CXX) g++ options: -lm -lpthread -lgomp -O3 -ffast-math -fopenmp
Phoronix Test Suite v10.8.4