xeon auggy

Tests for a future article. 2 x Intel Xeon Platinum 8380 testing with an Intel M50CYP2SB2U (SE5C6200.86B.0022.D08.2103221623 BIOS) and ASPEED on Ubuntu 22.10 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2308065-NE-XEONAUGGY78&grt&rdt
System configuration (runs a and b)

Processor: 2 x Intel Xeon Platinum 8380 @ 3.40GHz (80 Cores / 160 Threads)
Motherboard: Intel M50CYP2SB2U (SE5C6200.86B.0022.D08.2103221623 BIOS)
Chipset: Intel Ice Lake IEH
Memory: 512GB
Disk: 7682GB INTEL SSDPF2KX076TZ
Graphics: ASPEED
Monitor: VE228
Network: 2 x Intel X710 for 10GBASE-T + 2 x Intel E810-C for QSFP
OS: Ubuntu 22.10
Kernel: 6.2.0-rc5-phx-dodt (x86_64)
Desktop: GNOME Shell 43.0
Display Server: X Server 1.21.1.3
Vulkan: 1.3.224
Compiler: GCC 12.2.0
File-System: ext4
Screen Resolution: 1920x1080

Kernel Details - Transparent Huge Pages: madvise
Compiler Details - --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-12-U8K4Qv/gcc-12-12.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-12-U8K4Qv/gcc-12-12.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Details - Scaling Governor: intel_pstate performance (EPP: performance) - CPU Microcode: 0xd000389
Java Details - OpenJDK Runtime Environment (build 11.0.19+7-post-Ubuntu-0ubuntu122.10.1)
Python Details - Python 3.10.7
Security Details - dodt: Mitigation of DOITM + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Mitigation of Clear buffers; SMT vulnerable + retbleed: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced IBRS IBPB: conditional RSB filling PBRSB-eIBRS: SW sequence + srbds: Not affected + tsx_async_abort: Not affected
Result overview

| Test | a | b |
|---|---|---|
| couchdb: 100 - 1000 - 30 | 94.834 | |
| couchdb: 300 - 1000 - 30 | 152.456 | |
| couchdb: 500 - 1000 - 30 | 1090.424 | |
| apache-iotdb: 100 - 1 - 200 (point/sec) | 638644.35 | 628202.55 |
| apache-iotdb: 100 - 1 - 200 (Average Latency) | 17.54 | 18.04 |
| apache-iotdb: 100 - 1 - 500 (point/sec) | 995259.68 | 992909.69 |
| apache-iotdb: 100 - 1 - 500 (Average Latency) | 35.99 | 36.16 |
| apache-iotdb: 200 - 1 - 200 (point/sec) | 904320.6 | 920435.77 |
| apache-iotdb: 200 - 1 - 200 (Average Latency) | 14.84 | 14.42 |
| apache-iotdb: 200 - 1 - 500 (point/sec) | 1134736.54 | 1137612.61 |
| apache-iotdb: 200 - 1 - 500 (Average Latency) | 36.74 | 36.82 |
| apache-iotdb: 500 - 1 - 200 (point/sec) | 1199743.22 | 1141859.25 |
| apache-iotdb: 500 - 1 - 200 (Average Latency) | 13.25 | 14.12 |
| apache-iotdb: 500 - 1 - 500 (point/sec) | 1343156.56 | 1372429.58 |
| apache-iotdb: 500 - 1 - 500 (Average Latency) | 33.38 | 32.75 |
| apache-iotdb: 100 - 100 - 200 (point/sec) | 34266143.85 | 34807016.85 |
| apache-iotdb: 100 - 100 - 200 (Average Latency) | 42.79 | 42.19 |
| apache-iotdb: 100 - 100 - 500 (point/sec) | 39562245.22 | 43021501.4 |
| apache-iotdb: 100 - 100 - 500 (Average Latency) | 109.4 | 96.18 |
| blender: BMW27 - CPU-Only | 23.69 | 23.62 |
| blender: Classroom - CPU-Only | 62.35 | 62.51 |
| blender: Fishy Cat - CPU-Only | 30.59 | 30.77 |
| blender: Barbershop - CPU-Only | 239.55 | 239.03 |
| dav1d: Chimera 1080p | 516.17 | 516.50 |
| dav1d: Summer Nature 4K | 282.53 | 282.65 |
| dav1d: Summer Nature 1080p | 699.97 | 699.09 |
| dav1d: Chimera 1080p 10-bit | 476.82 | 476.77 |
| embree: Pathtracer - Crown | 72.0419 | 70.7905 |
| embree: Pathtracer ISPC - Crown | 87.9306 | 87.9319 |
| embree: Pathtracer - Asian Dragon | 85.2423 | 85.1315 |
| embree: Pathtracer - Asian Dragon Obj | 76.9608 | 77.2633 |
| embree: Pathtracer ISPC - Asian Dragon | 104.4148 | 104.5539 |
| embree: Pathtracer ISPC - Asian Dragon Obj | 89.8447 | 90.0132 |
| gpaw: Carbon Nanotube | 45.824 | 45.636 |
| heffte: c2c - FFTW - float - 128 | 159.344 | 158.711 |
| heffte: c2c - FFTW - float - 256 | 102.278 | 98.8315 |
| heffte: r2c - FFTW - float - 128 | 199.103 | 199.333 |
| heffte: r2c - FFTW - float - 256 | 222.215 | 230.541 |
| heffte: c2c - FFTW - double - 128 | 94.4544 | 91.9039 |
| heffte: c2c - FFTW - double - 256 | 45.8509 | 46.4607 |
| heffte: c2c - Stock - float - 128 | 107.452 | 107.952 |
| heffte: c2c - Stock - float - 256 | 101.8 | 104.345 |
| heffte: r2c - FFTW - double - 128 | 156.224 | 148.177 |
| heffte: r2c - FFTW - double - 256 | 93.0098 | 92.8161 |
| heffte: r2c - Stock - float - 128 | 185.453 | 182.03 |
| heffte: r2c - Stock - float - 256 | 236.666 | 236.119 |
| heffte: c2c - Stock - double - 128 | 69.4816 | 67.9483 |
| heffte: c2c - Stock - double - 256 | 46.6636 | 46.5052 |
| heffte: r2c - Stock - double - 128 | 117.006 | 116.839 |
| heffte: r2c - Stock - double - 256 | 101.938 | 102.426 |
| heffte: c2c - FFTW - float - 512 | 94.8348 | 94.3442 |
| heffte: r2c - FFTW - float - 512 | 170.906 | 171.134 |
| heffte: c2c - FFTW - double - 512 | 49.4363 | 48.7210 |
| heffte: c2c - Stock - float - 512 | 93.3349 | 92.2568 |
| heffte: r2c - FFTW - double - 512 | 90.5745 | 90.2388 |
| heffte: r2c - Stock - float - 512 | 176.630 | 174.114 |
| heffte: c2c - Stock - double - 512 | 47.2801 | 47.3856 |
| heffte: r2c - Stock - double - 512 | 94.2637 | 92.7173 |
| oidn: RT.hdr_alb_nrm.3840x2160 - CPU-Only | 3.04 | 3.01 |
| oidn: RT.ldr_alb_nrm.3840x2160 - CPU-Only | 3.05 | 3.03 |
| oidn: RTLightmap.hdr.4096x4096 - CPU-Only | 1.47 | 1.46 |
| libxsmm: 128 | 1055.3 | 1946.7 |
| libxsmm: 256 | 599.8 | 592.5 |
| libxsmm: 32 | 633.2 | 639.3 |
| libxsmm: 64 | 1219.9 | 1216.0 |
| liquid-dsp: 32 - 256 - 32 | 992540000 | 993445000 |
| liquid-dsp: 32 - 256 - 57 | 1197700000 | 1185500000 |
| liquid-dsp: 64 - 256 - 32 | 1805000000 | 1825450000 |
| liquid-dsp: 64 - 256 - 57 | 2069200000 | 2076650000 |
| liquid-dsp: 128 - 256 - 32 | 2961100000 | 2945150000 |
| liquid-dsp: 128 - 256 - 57 | 2519200000 | 2426350000 |
| liquid-dsp: 160 - 256 - 32 | 3390700000 | 3381950000 |
| liquid-dsp: 160 - 256 - 57 | 2602300000 | 2636450000 |
| liquid-dsp: 32 - 256 - 512 | 400730000 | 396265000 |
| liquid-dsp: 64 - 256 - 512 | 725840000 | 730310000 |
| liquid-dsp: 128 - 256 - 512 | 949400000 | 945190000 |
| liquid-dsp: 160 - 256 - 512 | 1013200000 | 1011200000 |
| liquid-dsp: 1 - 256 - 32 | 32338000 | 32267000 |
| liquid-dsp: 1 - 256 - 57 | 53918500 | 53926500 |
| liquid-dsp: 1 - 256 - 512 | 13323000 | 13291000 |
| liquid-dsp: 16 - 256 - 32 | 493660000 | 498410000 |
| liquid-dsp: 16 - 256 - 57 | 615105000 | 623535000 |
| liquid-dsp: 16 - 256 - 512 | 201615000 | 198790000 |
| ncnn: CPU - mobilenet | 16.06 | 15.90 |
| ncnn: CPU-v2-v2 - mobilenet-v2 | 7.91 | 7.94 |
| ncnn: CPU-v3-v3 - mobilenet-v3 | 8.71 | 8.77 |
| ncnn: CPU - shufflenet-v2 | 9.82 | 9.75 |
| ncnn: CPU - mnasnet | 7.61 | 7.43 |
| ncnn: CPU - efficientnet-b0 | 11.48 | 11.64 |
| ncnn: CPU - blazeface | 4.49 | 4.61 |
| ncnn: CPU - googlenet | 17.06 | 16.72 |
| ncnn: CPU - vgg16 | 26.27 | 26.19 |
| ncnn: CPU - resnet18 | 10.31 | 9.63 |
| ncnn: CPU - alexnet | 5.65 | 5.44 |
| ncnn: CPU - resnet50 | 17.57 | 18.15 |
| ncnn: CPU - yolov4-tiny | 24.77 | 24.48 |
| ncnn: CPU - squeezenet_ssd | 15.78 | 16.13 |
| ncnn: CPU - regnety_400m | 38.20 | 39.37 |
| ncnn: CPU - vision_transformer | 46.50 | 45.56 |
| ncnn: CPU - FastestDet | 9.62 | 10.01 |
| encode-opus: WAV To Opus Encode | 36.736 | 36.726 |
| ospray: particle_volume/ao/real_time | 24.637 | 24.6207 |
| ospray: particle_volume/scivis/real_time | 24.3592 | 24.7827 |
| ospray: particle_volume/pathtracer/real_time | 151.138 | 151.273 |
| ospray: gravity_spheres_volume/dim_512/ao/real_time | 21.2056 | 20.9459 |
| ospray: gravity_spheres_volume/dim_512/scivis/real_time | 20.811 | 20.4818 |
| ospray: gravity_spheres_volume/dim_512/pathtracer/real_time | 22.6977 | 22.5752 |
| quantlib: | 2622.9 | 2607.6 |
| remhos: Sample Remap Example | 12.195 | 12.365 |
| srsran: Downlink Processor Benchmark | 556.5 | 556.8 |
| srsran: PUSCH Processor Benchmark, Throughput Total | 9800.5 | 9756.7 |
| srsran: PUSCH Processor Benchmark, Throughput Thread | 164.8 | 164.7 |
| stress-ng: Pipe | 40500166.81 | 49325396.97 |
| stress-ng: Zlib | 6879.86 | 6880.22 |
| stress-ng: Cloning | 16195.03 | 13172.81 |
| stress-ng: Pthread | 92131.54 | 90361.7 |
| stress-ng: AVL Tree | 610.69 | 610.83 |
| stress-ng: Floating Point | 21134.81 | 21133.02 |
| stress-ng: Matrix 3D Math | 12743.81 | 12742.70 |
| stress-ng: Vector Shuffle | 48054.48 | 48076.78 |
| stress-ng: Wide Vector Math | 2195391.41 | 2196242.21 |
| stress-ng: Fused Multiply-Add | 181083180.47 | 181314757.42 |
| stress-ng: Vector Floating Point | 132479.08 | 131100.25 |
| build-gcc: Time To Compile | 957.946 | 956.127 |
| vvenc: Bosphorus 4K - Fast | 5.672 | 5.717 |
| vvenc: Bosphorus 4K - Faster | 10.284 | 10.430 |
| vvenc: Bosphorus 1080p - Fast | 15.708 | 15.723 |
| vvenc: Bosphorus 1080p - Faster | 29.077 | 29.176 |
| z3: 1.smt2 | 25.713 | 25.251 |
| z3: 2.smt2 | 87.998 | 87.178 |
Apache CouchDB 3.3.2 (Seconds, Fewer Is Better)

| Bulk Size | Inserts | Rounds | a |
|---|---|---|---|
| 100 | 1000 | 30 | 94.83 |
| 300 | 1000 | 30 | 152.46 |
| 500 | 1000 | 30 | 1090.42 |

1. (CXX) g++ options: -std=c++17 -lmozjs-78 -lm -lei -fPIC -MMD
Apache IoTDB 1.1.2 (point/sec, More Is Better)

| Device Count | Batch Size Per Write | Sensor Count | a | b |
|---|---|---|---|---|
| 100 | 1 | 200 | 638644.35 | 628202.55 |
| 100 | 1 | 500 | 995259.68 | 992909.69 |
| 200 | 1 | 200 | 904320.60 | 920435.77 |
| 200 | 1 | 500 | 1134736.54 | 1137612.61 |
| 500 | 1 | 200 | 1199743.22 | 1141859.25 |
| 500 | 1 | 500 | 1343156.56 | 1372429.58 |
| 100 | 100 | 200 | 34266143.85 | 34807016.85 |
| 100 | 100 | 500 | 39562245.22 | 43021501.40 |

Apache IoTDB 1.1.2 (Average Latency, Fewer Is Better)

| Device Count | Batch Size Per Write | Sensor Count | a | b | MAX, as listed |
|---|---|---|---|---|---|
| 100 | 1 | 200 | 17.54 | 18.04 | 680.16; 597.99 |
| 100 | 1 | 500 | 35.99 | 36.16 | 724.8; 769.5 |
| 200 | 1 | 200 | 14.84 | 14.42 | 605.55; 596.84 |
| 200 | 1 | 500 | 36.74 | 36.82 | 793.88; 691.5 |
| 500 | 1 | 200 | 13.25 | 14.12 | 896.77; 878.17 |
| 500 | 1 | 500 | 33.38 | 32.75 | 934.86; 992.49 |
| 100 | 100 | 200 | 42.79 | 42.19 | 855.16; 784.56 |
| 100 | 100 | 500 | 109.40 | 96.18 | 2142.92; 1249.92 |
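The IoTDB results mix a "More Is Better" metric (point/sec) with a "Fewer Is Better" metric (Average Latency), so a raw difference between runs a and b needs the direction attached before it can be read as a gain or a loss. The following is a minimal sketch of that arithmetic, using four values copied from the IoTDB results above; the function names `percent_change` and `b_is_better` are illustrative only and are not part of the Phoronix Test Suite.

```python
# Values copied from the Apache IoTDB results above.
# Each entry: (label, run a, run b, more_is_better).
# Labels read "device count / batch size per write / sensor count".
ROWS = [
    ("100 / 100 / 500 point/sec", 39562245.22, 43021501.40, True),
    ("100 / 100 / 500 average latency", 109.40, 96.18, False),
    ("500 / 1 / 200 point/sec", 1199743.22, 1141859.25, True),
    ("500 / 1 / 200 average latency", 13.25, 14.12, False),
]


def percent_change(a: float, b: float) -> float:
    """Signed change of run b relative to run a, in percent of a."""
    return (b - a) / a * 100.0


def b_is_better(a: float, b: float, more_is_better: bool) -> bool:
    """True when run b beats run a for the metric's stated direction."""
    return b > a if more_is_better else b < a


for label, a, b, more_is_better in ROWS:
    verdict = "b better" if b_is_better(a, b, more_is_better) else "a better or equal"
    print(f"{label}: {percent_change(a, b):+.2f}% ({verdict})")
```

For the 100 / 100 / 500 configuration this works out to roughly 8.7% more points per second and roughly 12.1% lower average latency for run b, while the 500 / 1 / 200 configuration moves the other way on both metrics.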
Blender 3.6, Compute: CPU-Only (Seconds, Fewer Is Better)

| Blend File | a | b | SE +/- (N = 2), as listed |
|---|---|---|---|
| BMW27 | 23.69 | 23.62 | 0.09; 0.06 |
| Classroom | 62.35 | 62.51 | 0.04; 0.19 |
| Fishy Cat | 30.59 | 30.77 | 0.01; 0.02 |
| Barbershop | 239.55 | 239.03 | 0.32; 1.32 |
dav1d 1.2.1 (FPS, More Is Better)

| Video Input | a | b | SE +/- (N = 2), as listed |
|---|---|---|---|
| Chimera 1080p | 516.17 | 516.50 | 0.06 |
| Summer Nature 4K | 282.53 | 282.65 | 0.08 |
| Summer Nature 1080p | 699.97 | 699.09 | 0.50 |
| Chimera 1080p 10-bit | 476.82 | 476.77 | 0.41 |

1. (CC) gcc options: -pthread -lm
Embree 4.1 (Frames Per Second, More Is Better)

| Binary | Model | a | b | SE +/- (N = 2), as listed | MIN / MAX, as listed |
|---|---|---|---|---|---|
| Pathtracer | Crown | 72.04 | 70.79 | 0.12 | 68.2 / 79.55; 67 / 79.71 |
| Pathtracer ISPC | Crown | 87.93 | 87.93 | 0.10 | 85.27 / 92.58; 84.73 / 92.37 |
| Pathtracer | Asian Dragon | 85.24 | 85.13 | 0.04; 0.14 | 83.75 / 89.99; 83.65 / 90.45 |
| Pathtracer | Asian Dragon Obj | 76.96 | 77.26 | 0.03; 0.03 | 75.53 / 82.14; 75.78 / 81.08 |
| Pathtracer ISPC | Asian Dragon | 104.41 | 104.55 | 0.32; 0.24 | 101.88 / 109.22; 102.2 / 108.91 |
| Pathtracer ISPC | Asian Dragon Obj | 89.84 | 90.01 | 0.06; 0.05 | 87.68 / 94.71; 87.6 / 94.43 |
GPAW 23.6 (Seconds, Fewer Is Better)

| Input | a | b | SE +/- (N = 2), as listed |
|---|---|---|---|
| Carbon Nanotube | 45.82 | 45.64 | 0.02; 0.03 |

1. (CC) gcc options: -shared -fwrapv -O2 -lxc -lblas -lmpi
HeFFTe - Highly Efficient FFT for Exascale 2.3 (GFLOP/s, More Is Better)

| Test | Backend | Precision | X Y Z | a | b | SE +/- (N = 2), as listed |
|---|---|---|---|---|---|---|
| c2c | FFTW | float | 128 | 159.34 | 158.71 | 0.21 |
| c2c | FFTW | float | 256 | 102.28 | 98.83 | 0.00 |
| r2c | FFTW | float | 128 | 199.10 | 199.33 | 0.39 |
| r2c | FFTW | float | 256 | 222.22 | 230.54 | 3.31 |
| c2c | FFTW | double | 128 | 94.45 | 91.90 | 1.46 |
| c2c | FFTW | double | 256 | 45.85 | 46.46 | 0.55 |
| c2c | Stock | float | 128 | 107.45 | 107.95 | 1.54 |
| c2c | Stock | float | 256 | 101.80 | 104.35 | 0.34 |
| r2c | FFTW | double | 128 | 156.22 | 148.18 | 2.45 |
| r2c | FFTW | double | 256 | 93.01 | 92.82 | 1.52 |
| r2c | Stock | float | 128 | 185.45 | 182.03 | 0.86 |
| r2c | Stock | float | 256 | 236.67 | 236.12 | 2.75 |
| c2c | Stock | double | 128 | 69.48 | 67.95 | 1.01 |
| c2c | Stock | double | 256 | 46.66 | 46.51 | 0.08 |
| r2c | Stock | double | 128 | 117.01 | 116.84 | 4.04 |
| r2c | Stock | double | 256 | 101.94 | 102.43 | 0.72 |
| c2c | FFTW | float | 512 | 94.83 | 94.34 | 1.11; 0.95 |
| r2c | FFTW | float | 512 | 170.91 | 171.13 | 1.25; 1.39 |
| c2c | FFTW | double | 512 | 49.44 | 48.72 | 0.29; 0.39 |
| c2c | Stock | float | 512 | 93.33 | 92.26 | 0.40; 0.24 |
| r2c | FFTW | double | 512 | 90.57 | 90.24 | 1.08; 1.17 |
| r2c | Stock | float | 512 | 176.63 | 174.11 | 0.68; 0.12 |
| c2c | Stock | double | 512 | 47.28 | 47.39 | 0.14; 0.07 |
| r2c | Stock | double | 512 | 94.26 | 92.72 | 0.13; 0.73 |

1. (CXX) g++ options: -O3
Intel Open Image Denoise 2.0, Device: CPU-Only (Images / Sec, More Is Better)

| Run | a | b | SE +/- (N = 2), as listed |
|---|---|---|---|
| RT.hdr_alb_nrm.3840x2160 | 3.04 | 3.01 | 0.00 |
| RT.ldr_alb_nrm.3840x2160 | 3.05 | 3.03 | 0.01 |
| RTLightmap.hdr.4096x4096 | 1.47 | 1.46 | 0.00 |
libxsmm 2-1.17-3645 (GFLOPS/s, More Is Better)

| M N K | a | b | SE +/- (N = 2), as listed |
|---|---|---|---|
| 128 | 1055.3 | 1946.7 | 54.55 |
| 256 | 599.8 | 592.5 | 2.65 |
| 32 | 633.2 | 639.3 | 2.35 |
| 64 | 1219.9 | 1216.0 | 1.25 |

1. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -msse4.2
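Most libxsmm results agree closely between runs a and b, but the M N K: 128 pair does not. One rough way to surface such pairs is to compute the gap between the two runs as a percentage of the lower value and flag anything above a cutoff. The sketch below does that for the four libxsmm results; the 5% cutoff and the names `THRESHOLD_PCT` and `spread_pct` are assumptions made for illustration, not something the result page defines.

```python
# libxsmm GFLOPS/s from the results above, keyed by M N K: (run a, run b).
RESULTS = {
    128: (1055.3, 1946.7),
    256: (599.8, 592.5),
    32: (633.2, 639.3),
    64: (1219.9, 1216.0),
}

# Hypothetical cutoff: gaps larger than this are worth a re-run or a closer look.
THRESHOLD_PCT = 5.0


def spread_pct(a: float, b: float) -> float:
    """Gap between two runs as a percentage of the lower result."""
    return abs(a - b) / min(a, b) * 100.0


# Keep only the sizes whose run-to-run gap exceeds the cutoff.
flagged = {
    size: round(spread_pct(a, b), 1)
    for size, (a, b) in RESULTS.items()
    if spread_pct(a, b) > THRESHOLD_PCT
}
print(flagged)
```

With these inputs only the 128 case is flagged; the other three sizes differ by less than 2% between runs. The check says nothing about which of the two 128 results is the representative one.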
Liquid-DSP Threads: 32 - Buffer Length: 256 - Filter Length: 32 OpenBenchmarking.org samples/s, More Is Better Liquid-DSP 1.6 Threads: 32 - Buffer Length: 256 - Filter Length: 32 a b 200M 400M 600M 800M 1000M SE +/- 2085000.00, N = 2 992540000 993445000 1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP Threads: 32 - Buffer Length: 256 - Filter Length: 57 OpenBenchmarking.org samples/s, More Is Better Liquid-DSP 1.6 Threads: 32 - Buffer Length: 256 - Filter Length: 57 a b 300M 600M 900M 1200M 1500M SE +/- 20600000.00, N = 2 1197700000 1185500000 1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP Threads: 64 - Buffer Length: 256 - Filter Length: 32 OpenBenchmarking.org samples/s, More Is Better Liquid-DSP 1.6 Threads: 64 - Buffer Length: 256 - Filter Length: 32 a b 400M 800M 1200M 1600M 2000M SE +/- 2750000.00, N = 2 1805000000 1825450000 1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP Threads: 64 - Buffer Length: 256 - Filter Length: 57 OpenBenchmarking.org samples/s, More Is Better Liquid-DSP 1.6 Threads: 64 - Buffer Length: 256 - Filter Length: 57 a b 400M 800M 1200M 1600M 2000M SE +/- 13650000.00, N = 2 2069200000 2076650000 1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP Threads: 128 - Buffer Length: 256 - Filter Length: 32 OpenBenchmarking.org samples/s, More Is Better Liquid-DSP 1.6 Threads: 128 - Buffer Length: 256 - Filter Length: 32 a b 600M 1200M 1800M 2400M 3000M SE +/- 6350000.00, N = 2 2961100000 2945150000 1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP Threads: 128 - Buffer Length: 256 - Filter Length: 57 OpenBenchmarking.org samples/s, More Is Better Liquid-DSP 1.6 Threads: 128 - Buffer Length: 256 - Filter Length: 57 a b 500M 1000M 1500M 2000M 2500M SE +/- 13350000.00, N = 2 2519200000 2426350000 1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP Threads: 160 - Buffer Length: 256 - Filter Length: 32 OpenBenchmarking.org samples/s, More Is Better Liquid-DSP 1.6 Threads: 160 - Buffer Length: 256 - Filter Length: 32 a b 700M 1400M 2100M 2800M 3500M SE +/- 9150000.00, N = 2 3390700000 3381950000 1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP Threads: 160 - Buffer Length: 256 - Filter Length: 57 OpenBenchmarking.org samples/s, More Is Better Liquid-DSP 1.6 Threads: 160 - Buffer Length: 256 - Filter Length: 57 a b 600M 1200M 1800M 2400M 3000M SE +/- 17250000.00, N = 2 2602300000 2636450000 1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP 1.6 - Threads: 32 - Buffer Length: 256 - Filter Length: 512 (samples/s, More Is Better): a = 400730000, b = 396265000. SE +/- 2775000.00, N = 2. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP 1.6 - Threads: 64 - Buffer Length: 256 - Filter Length: 512 (samples/s, More Is Better): a = 725840000, b = 730310000. SE +/- 1250000.00, N = 2. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP 1.6 - Threads: 128 - Buffer Length: 256 - Filter Length: 512 (samples/s, More Is Better): a = 949400000, b = 945190000. SE +/- 2180000.00, N = 2. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP 1.6 - Threads: 160 - Buffer Length: 256 - Filter Length: 512 (samples/s, More Is Better): a = 1013200000, b = 1011200000. SE +/- 1800000.00, N = 2. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP 1.6 - Threads: 1 - Buffer Length: 256 - Filter Length: 32 (samples/s, More Is Better): a = 32338000, b = 32267000. SE +/- 0.00, N = 2; SE +/- 0.00, N = 2. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP 1.6 - Threads: 1 - Buffer Length: 256 - Filter Length: 57 (samples/s, More Is Better): a = 53918500, b = 53926500. SE +/- 500.00, N = 2; SE +/- 1500.00, N = 2. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP 1.6 - Threads: 1 - Buffer Length: 256 - Filter Length: 512 (samples/s, More Is Better): a = 13323000, b = 13291000. SE +/- 1000.00, N = 2; SE +/- 34000.00, N = 2. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP 1.6 - Threads: 16 - Buffer Length: 256 - Filter Length: 32 (samples/s, More Is Better): a = 493660000, b = 498410000. SE +/- 1500000.00, N = 2; SE +/- 830000.00, N = 2. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP 1.6 - Threads: 16 - Buffer Length: 256 - Filter Length: 57 (samples/s, More Is Better): a = 615105000, b = 623535000. SE +/- 3155000.00, N = 2; SE +/- 11305000.00, N = 2. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP 1.6 - Threads: 16 - Buffer Length: 256 - Filter Length: 512 (samples/s, More Is Better): a = 201615000, b = 198790000. SE +/- 795000.00, N = 2; SE +/- 650000.00, N = 2. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
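The Liquid-DSP results above at Filter Length 512 cover 1, 16, 32, 64, 128 and 160 threads, so they can be read as a thread-scaling curve. The following is a minimal sketch of that arithmetic, using only the run "a" values listed above; the helper name `scaling_efficiency` and the choice of the 1-thread result as the baseline are assumptions made for illustration, not part of the Phoronix Test Suite.

```python
# Run "a" throughput in samples/s for Buffer Length 256, Filter Length 512,
# copied from the Liquid-DSP results listed above.
throughput_a = {
    1: 13_323_000,
    16: 201_615_000,
    32: 400_730_000,
    64: 725_840_000,
    128: 949_400_000,
    160: 1_013_200_000,
}


def scaling_efficiency(results):
    """Return measured throughput divided by ideal linear scaling.

    Hypothetical helper: "ideal" is assumed to be the 1-thread
    throughput multiplied by the thread count.
    """
    base = results[1]
    return {threads: value / (threads * base) for threads, value in results.items()}


efficiency = scaling_efficiency(throughput_a)
for threads, eff in efficiency.items():
    print(f"{threads:>3} threads: {eff:.0%} of ideal linear scaling")
```

Under these assumptions the efficiency stays close to the ideal through 32 threads and falls off at 128 and 160 threads, which is past the 80 physical cores listed in the system table, where SMT siblings are presumably in use.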
NCNN 20230517 - Target: CPU - Model: mobilenet (ms, Fewer Is Better): a = 16.06, b = 15.90. SE +/- 0.82, N = 2; SE +/- 0.28, N = 2. MIN: 14.92 / MAX: 25.43; MIN: 15.2 / MAX: 39.56. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20230517 - Target: CPU-v2-v2 - Model: mobilenet-v2 (ms, Fewer Is Better): a = 7.91, b = 7.94. SE +/- 0.13, N = 2; SE +/- 0.02, N = 2. MIN: 7.68 / MAX: 9.6; MIN: 7.81 / MAX: 10.99. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20230517 - Target: CPU-v3-v3 - Model: mobilenet-v3 (ms, Fewer Is Better): a = 8.71, b = 8.77. SE +/- 0.14, N = 2; SE +/- 0.03, N = 2. MIN: 8.43 / MAX: 9.8; MIN: 8.59 / MAX: 32.78. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20230517 - Target: CPU - Model: shufflenet-v2 (ms, Fewer Is Better): a = 9.82, b = 9.75. SE +/- 0.03, N = 2; SE +/- 0.06, N = 2. MIN: 9.6 / MAX: 12.61; MIN: 9.56 / MAX: 13.68. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20230517 - Target: CPU - Model: mnasnet (ms, Fewer Is Better): a = 7.61, b = 7.43. SE +/- 0.07, N = 2; SE +/- 0.01, N = 2. MIN: 7.33 / MAX: 43.29; MIN: 7.16 / MAX: 15.37. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20230517 - Target: CPU - Model: efficientnet-b0 (ms, Fewer Is Better): a = 11.48, b = 11.64. SE +/- 0.23, N = 2; SE +/- 0.26, N = 2. MIN: 10.9 / MAX: 56.34; MIN: 10.85 / MAX: 37.54. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20230517 - Target: CPU - Model: blazeface (ms, Fewer Is Better): a = 4.49, b = 4.61. SE +/- 0.09, N = 2; SE +/- 0.01, N = 2. MIN: 4.31 / MAX: 5.13; MIN: 4.49 / MAX: 5.33. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20230517 - Target: CPU - Model: googlenet (ms, Fewer Is Better): a = 17.06, b = 16.72. SE +/- 1.05, N = 2; SE +/- 0.37, N = 2. MIN: 15.5 / MAX: 66.12; MIN: 15.67 / MAX: 100.92. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20230517 - Target: CPU - Model: vgg16 (ms, Fewer Is Better): a = 26.27, b = 26.19. SE +/- 0.84, N = 2; SE +/- 0.34, N = 2. MIN: 24.05 / MAX: 301.35; MIN: 24.19 / MAX: 341.6. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20230517 - Target: CPU - Model: resnet18 (ms, Fewer Is Better): a = 10.31, b = 9.63. SE +/- 1.04, N = 2; SE +/- 0.31, N = 2. MIN: 9.03 / MAX: 33.3; MIN: 9.16 / MAX: 26.03. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20230517 - Target: CPU - Model: alexnet (ms, Fewer Is Better): a = 5.65, b = 5.44. SE +/- 0.43, N = 2; SE +/- 0.22, N = 2. MIN: 5.03 / MAX: 6.71; MIN: 5.08 / MAX: 7.52. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20230517 - Target: CPU - Model: resnet50 (ms, Fewer Is Better): a = 17.57, b = 18.15. SE +/- 0.36, N = 2; SE +/- 0.83, N = 2. MIN: 16.92 / MAX: 18.88; MIN: 16.98 / MAX: 42.81. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20230517 - Target: CPU - Model: yolov4-tiny (ms, Fewer Is Better): a = 24.77, b = 24.48. SE +/- 0.65, N = 2; SE +/- 0.64, N = 2. MIN: 22.68 / MAX: 208.18; MIN: 22.66 / MAX: 47.54. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20230517 - Target: CPU - Model: squeezenet_ssd (ms, Fewer Is Better): a = 15.78, b = 16.13. SE +/- 0.07, N = 2; SE +/- 0.41, N = 2. MIN: 15.4 / MAX: 43.08; MIN: 15.35 / MAX: 39.36. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20230517 - Target: CPU - Model: regnety_400m (ms, Fewer Is Better): a = 38.20, b = 39.37. SE +/- 0.86, N = 2; SE +/- 1.10, N = 2. MIN: 36.18 / MAX: 62.76; MIN: 37.07 / MAX: 103.97. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20230517 - Target: CPU - Model: vision_transformer (ms, Fewer Is Better): a = 46.50, b = 45.56. SE +/- 2.46, N = 2; SE +/- 1.19, N = 2. MIN: 42.6 / MAX: 72.28; MIN: 43.24 / MAX: 70.59. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20230517 - Target: CPU - Model: FastestDet (ms, Fewer Is Better): a = 9.62, b = 10.01. SE +/- 0.05, N = 2; SE +/- 0.28, N = 2. MIN: 9.35 / MAX: 10.52; MIN: 9.4 / MAX: 59.43. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
Opus Codec Encoding 1.4 - WAV To Opus Encode (Seconds, Fewer Is Better): a = 36.74, b = 36.73. SE +/- 0.01, N = 2. (CXX) g++ options: -O3 -fvisibility=hidden -logg -lm
OSPRay 2.12 - Benchmark: particle_volume/ao/real_time (Items Per Second, More Is Better): a = 24.64, b = 24.62. SE +/- 0.09, N = 2.
OSPRay 2.12 - Benchmark: particle_volume/scivis/real_time (Items Per Second, More Is Better): a = 24.36, b = 24.78. SE +/- 0.03, N = 2.
OSPRay 2.12 - Benchmark: particle_volume/pathtracer/real_time (Items Per Second, More Is Better): a = 151.14, b = 151.27. SE +/- 0.63, N = 2.
OSPRay 2.12 - Benchmark: gravity_spheres_volume/dim_512/ao/real_time (Items Per Second, More Is Better): a = 21.21, b = 20.95. SE +/- 0.22, N = 2.
OSPRay 2.12 - Benchmark: gravity_spheres_volume/dim_512/scivis/real_time (Items Per Second, More Is Better): a = 20.81, b = 20.48. SE +/- 0.05, N = 2.
OSPRay 2.12 - Benchmark: gravity_spheres_volume/dim_512/pathtracer/real_time (Items Per Second, More Is Better): a = 22.70, b = 22.58. SE +/- 0.00, N = 2.
QuantLib 1.30 (MFLOPS, More Is Better): a = 2622.9, b = 2607.6. SE +/- 2.00, N = 2; SE +/- 1.80, N = 2. (CXX) g++ options: -O3 -march=native -fPIE -pie
Remhos 1.0 - Test: Sample Remap Example (Seconds, Fewer Is Better): a = 12.20, b = 12.37. SE +/- 0.10, N = 2. (CXX) g++ options: -O3 -std=c++11 -lmfem -lHYPRE -lmetis -lrt -lmpi_cxx -lmpi
srsRAN Project 23.5 - Test: Downlink Processor Benchmark (Mbps, More Is Better): a = 556.5, b = 556.8. SE +/- 0.70, N = 2; SE +/- 1.25, N = 2. (CXX) g++ options: -march=native -mfma -O3 -fno-trapping-math -fno-math-errno -lgtest
srsRAN Project 23.5 - Test: PUSCH Processor Benchmark, Throughput Total (Mbps, More Is Better): a = 9800.5, b = 9756.7. SE +/- 47.35, N = 2; SE +/- 33.95, N = 2. (CXX) g++ options: -march=native -mfma -O3 -fno-trapping-math -fno-math-errno -lgtest
srsRAN Project 23.5 - Test: PUSCH Processor Benchmark, Throughput Thread (Mbps, More Is Better): a = 164.8, b = 164.7. SE +/- 1.70, N = 2; SE +/- 0.90, N = 2. (CXX) g++ options: -march=native -mfma -O3 -fno-trapping-math -fno-math-errno -lgtest
Stress-NG 0.15.10 - Test: Pipe (Bogo Ops/s, More Is Better): a = 40500166.81, b = 49325396.97. SE +/- 2523742.74, N = 2; SE +/- 6369572.71, N = 2. (CXX) g++ options: -O2 -std=gnu99 -lc
Stress-NG 0.15.10 - Test: Zlib (Bogo Ops/s, More Is Better): a = 6879.86, b = 6880.22. SE +/- 8.83, N = 2; SE +/- 4.39, N = 2. (CXX) g++ options: -O2 -std=gnu99 -lc
Stress-NG 0.15.10 - Test: Cloning (Bogo Ops/s, More Is Better): a = 16195.03, b = 13172.81. SE +/- 3270.70, N = 2; SE +/- 654.66, N = 2. (CXX) g++ options: -O2 -std=gnu99 -lc
Stress-NG 0.15.10 - Test: Pthread (Bogo Ops/s, More Is Better): a = 92131.54, b = 90361.70. SE +/- 279.15, N = 2; SE +/- 894.90, N = 2. (CXX) g++ options: -O2 -std=gnu99 -lc
Stress-NG 0.15.10 - Test: AVL Tree (Bogo Ops/s, More Is Better): a = 610.69, b = 610.83. SE +/- 0.08, N = 2; SE +/- 0.44, N = 2. (CXX) g++ options: -O2 -std=gnu99 -lc
Stress-NG 0.15.10 - Test: Floating Point (Bogo Ops/s, More Is Better): a = 21134.81, b = 21133.02. SE +/- 6.86, N = 2; SE +/- 9.14, N = 2. (CXX) g++ options: -O2 -std=gnu99 -lc
Stress-NG 0.15.10 - Test: Matrix 3D Math (Bogo Ops/s, More Is Better): a = 12743.81, b = 12742.70. SE +/- 9.80, N = 2; SE +/- 5.06, N = 2. (CXX) g++ options: -O2 -std=gnu99 -lc
Stress-NG 0.15.10 - Test: Vector Shuffle (Bogo Ops/s, More Is Better): a = 48054.48, b = 48076.78. SE +/- 1.26, N = 2; SE +/- 42.22, N = 2. (CXX) g++ options: -O2 -std=gnu99 -lc
Stress-NG 0.15.10 - Test: Wide Vector Math (Bogo Ops/s, More Is Better): a = 2195391.41, b = 2196242.21. SE +/- 1200.16, N = 2; SE +/- 497.69, N = 2. (CXX) g++ options: -O2 -std=gnu99 -lc
Stress-NG 0.15.10 - Test: Fused Multiply-Add (Bogo Ops/s, More Is Better): a = 181083180.47, b = 181314757.42. SE +/- 118686.48, N = 2; SE +/- 92010.25, N = 2. (CXX) g++ options: -O2 -std=gnu99 -lc
Stress-NG 0.15.10 - Test: Vector Floating Point (Bogo Ops/s, More Is Better): a = 132479.08, b = 131100.25. SE +/- 872.22, N = 2; SE +/- 235.00, N = 2. (CXX) g++ options: -O2 -std=gnu99 -lc
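Most Stress-NG results above are nearly identical between runs a and b, but Pipe and Cloning differ widely and also report large standard errors. The sketch below shows one rough way to put those gaps in context: it computes the percent change from a to b and divides the absolute difference by the combined standard error. It assumes the first SE listed for each test belongs to run a and the second to run b, which the export does not state explicitly, and the helper name `compare` is hypothetical. With N = 2 this is only an informal sanity check, not a significance test.

```python
import math

# (value_a, first_se, value_b, second_se) copied from the Stress-NG results above.
# Assumption: the first SE listed belongs to run "a" and the second to run "b".
results = {
    "Pipe": (40500166.81, 2523742.74, 49325396.97, 6369572.71),
    "Cloning": (16195.03, 3270.70, 13172.81, 654.66),
    "Zlib": (6879.86, 8.83, 6880.22, 4.39),
}


def compare(a, se_a, b, se_b):
    """Return (percent change from a to b, |difference| / combined SE)."""
    diff = b - a
    combined_se = math.sqrt(se_a ** 2 + se_b ** 2)
    return 100.0 * diff / a, abs(diff) / combined_se


summary = {name: compare(*values) for name, values in results.items()}
for name, (pct, ratio) in summary.items():
    print(f"{name:<8} b vs a: {pct:+7.2f}%  |diff| / combined SE = {ratio:.2f}")
```

Under these assumptions the Pipe and Cloning gaps (roughly +22% and -19%) are each on the order of one combined standard error, so they may reflect run-to-run variance rather than a real difference between a and b.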
Timed GCC Compilation 13.2 - Time To Compile (Seconds, Fewer Is Better): a = 957.95, b = 956.13. SE +/- 1.97, N = 2.
VVenC 1.9 - Video Input: Bosphorus 4K - Video Preset: Fast (Frames Per Second, More Is Better): a = 5.672, b = 5.717. SE +/- 0.019, N = 2; SE +/- 0.063, N = 2. (CXX) g++ options: -O3 -flto=auto -fno-fat-lto-objects
VVenC 1.9 - Video Input: Bosphorus 4K - Video Preset: Faster (Frames Per Second, More Is Better): a = 10.28, b = 10.43. SE +/- 0.03, N = 2; SE +/- 0.10, N = 2. (CXX) g++ options: -O3 -flto=auto -fno-fat-lto-objects
VVenC 1.9 - Video Input: Bosphorus 1080p - Video Preset: Fast (Frames Per Second, More Is Better): a = 15.71, b = 15.72. SE +/- 0.06, N = 2; SE +/- 0.04, N = 2. (CXX) g++ options: -O3 -flto=auto -fno-fat-lto-objects
VVenC 1.9 - Video Input: Bosphorus 1080p - Video Preset: Faster (Frames Per Second, More Is Better): a = 29.08, b = 29.18. SE +/- 0.37, N = 2; SE +/- 0.23, N = 2. (CXX) g++ options: -O3 -flto=auto -fno-fat-lto-objects
Z3 Theorem Prover 4.12.1 - SMT File: 1.smt2 (Seconds, Fewer Is Better): a = 25.71, b = 25.25. SE +/- 0.03, N = 2; SE +/- 0.07, N = 2. (CXX) g++ options: -lpthread -std=c++17 -fvisibility=hidden -mfpmath=sse -msse -msse2 -O3 -fPIC
Z3 Theorem Prover 4.12.1 - SMT File: 2.smt2 (Seconds, Fewer Is Better): a = 88.00, b = 87.18. SE +/- 0.02, N = 2; SE +/- 0.05, N = 2. (CXX) g++ options: -lpthread -std=c++17 -fvisibility=hidden -mfpmath=sse -msse -msse2 -O3 -fPIC
Phoronix Test Suite v10.8.5