eoy2024 Tests for a future article. AMD EPYC 4564P 16-Core testing with a Supermicro AS-3015A-I H13SAE-MF v1.00 (2.1 BIOS) and ASPEED on Ubuntu 24.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2412068-NE-EOY20244373&grs&sro .
eoy2024 system configuration (runs a, b, c)
Processor: AMD EPYC 4564P 16-Core @ 5.88GHz (16 Cores / 32 Threads)
Motherboard: Supermicro AS-3015A-I H13SAE-MF v1.00 (2.1 BIOS)
Chipset: AMD Device 14d8
Memory: 2 x 32GB DRAM-4800MT/s Micron MTC20C2085S1EC48BA1 BC
Disk: 3201GB Micron_7450_MTFDKCC3T2TFS + 960GB SAMSUNG MZ1L2960HCJR-00A07
Graphics: ASPEED
Audio: AMD Rembrandt Radeon HD Audio
Monitor: VA2431
Network: 2 x Intel I210
OS: Ubuntu 24.04
Kernel: 6.8.0-11-generic (x86_64)
Desktop: GNOME Shell 45.3
Display Server: X Server 1.21.1.11
Compiler: GCC 13.2.0
File-System: ext4
Screen Resolution: 1024x768
Kernel Details - Transparent Huge Pages: madvise
Compiler Details - --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-fxIygj/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-fxIygj/gcc-13-13.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Details - Scaling Governor: amd-pstate-epp performance (EPP: performance) - CPU Microcode: 0xa601209
Java Details - OpenJDK Runtime Environment (build 21.0.2+13-Ubuntu-2)
Python Details - Python 3.12.3
Security Details:
  gather_data_sampling: Not affected
  itlb_multihit: Not affected
  l1tf: Not affected
  mds: Not affected
  meltdown: Not affected
  mmio_stale_data: Not affected
  retbleed: Not affected
  spec_rstack_overflow: Mitigation of Safe RET
  spec_store_bypass: Mitigation of SSB disabled via prctl
  spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
  spectre_v2: Mitigation of Enhanced / Automatic IBRS IBPB: conditional STIBP: always-on RSB filling PBRSB-eIBRS: Not affected
  srbds: Not affected
  tsx_async_abort: Not affected
eoy2024 litert: Quantized COCO SSD MobileNet v1 litert: NASNet Mobile litert: DeepLab V3 litert: Mobilenet Quant cp2k: Fayalite-FIST llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 512 stockfish: Chess Benchmark relion: Basic - CPU litert: Inception V4 llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 1024 renaissance: Apache Spark Bayes llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 512 llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 1024 litert: Mobilenet Float renaissance: In-Memory Database Shootout renaissance: Scala Dotty cp2k: H20-256 renaissance: Rand Forest llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 2048 rustls: handshake - TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384 simdjson: LargeRand gcrypt: xnnpack: FP16MobileNetV2 onednn: Deconvolution Batch shapes_1d - CPU litert: Inception ResNet V2 xnnpack: FP32MobileNetV2 simdjson: Kostya pyperformance: asyncio_tcp_ssl llamafile: Llama-3.2-3B-Instruct.Q6_K - Text Generation 16 xnnpack: FP32MobileNetV3Large litert: SqueezeNet llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 2048 renaissance: Genetic Algorithm Using Jenetics + Futures xnnpack: FP16MobileNetV3Large simdjson: TopTweet llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Text Generation 128 xnnpack: FP32MobileNetV1 renaissance: Gaussian Mixture Model xnnpack: FP16MobileNetV1 xnnpack: FP32MobileNetV3Small renaissance: Savina Reactors.IO renaissance: Akka Unbalanced Cobwebbed Tree svt-av1: Preset 8 - Bosphorus 1080p svt-av1: Preset 8 - Bosphorus 4K onednn: IP Shapes 3D - CPU renaissance: Finagle HTTP Requests onednn: IP Shapes 1D - CPU llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 2048 cp2k: H20-64 onednn: Convolution Batch Shapes Auto - CPU x265: Bosphorus 4K svt-av1: Preset 13 - Bosphorus 1080p svt-av1: Preset 5 - Beauty 4K 10-bit build-eigen: Time To Compile onednn: Deconvolution 
Batch shapes_3d - CPU rustls: handshake-resume - TLS13_CHACHA20_POLY1305_SHA256 whisper-cpp: ggml-small.en - 2016 State of the Union rustls: handshake-ticket - TLS13_CHACHA20_POLY1305_SHA256 onnx: bertsquad-12 - CPU - Standard llamafile: TinyLlama-1.1B-Chat-v1.0.BF16 - Text Generation 128 rustls: handshake - TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 rustls: handshake-resume - TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 stockfish: Chess Benchmark povray: Trace Time llamafile: mistral-7b-instruct-v0.2.Q5_K_M - Text Generation 128 llamafile: Llama-3.2-3B-Instruct.Q6_K - Text Generation 128 renaissance: ALS Movie Lens svt-av1: Preset 13 - Bosphorus 4K onednn: Recurrent Neural Network Inference - CPU onnx: ZFNet-512 - CPU - Standard x265: Bosphorus 1080p onnx: fcn-resnet101-11 - CPU - Standard simdjson: PartialTweets whisperfile: Small onnx: Faster R-CNN R-50-FPN-int8 - CPU - Standard namd: ATPase with 327,506 Atoms onnx: ResNet101_DUC_HDC-12 - CPU - Standard couchdb: 100 - 3000 - 30 numpy: pyperformance: chaos mt-dgemm: Sustained Floating-Point Rate svt-av1: Preset 5 - Bosphorus 1080p pyperformance: float xnnpack: FP16MobileNetV3Small rustls: handshake-ticket - TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 xnnpack: QS8MobileNetV2 whisperfile: Tiny onnx: ArcFace ResNet-100 - CPU - Standard renaissance: Apache Spark PageRank svt-av1: Preset 8 - Beauty 4K 10-bit pyperformance: raytrace rustls: handshake-ticket - TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384 financebench: Bonds OpenMP onnx: CaffeNet 12-int8 - CPU - Standard couchdb: 500 - 3000 - 30 onnx: ResNet50 v1-12-int8 - CPU - Standard pyperformance: go svt-av1: Preset 3 - Bosphorus 4K couchdb: 300 - 1000 - 30 llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 512 blender: Junkshop - CPU-Only svt-av1: Preset 5 - Bosphorus 4K ospray: gravity_spheres_volume/dim_512/ao/real_time onnx: yolov4 - CPU - Standard compress-7zip: Decompression Rating onednn: Recurrent Neural Network Training - CPU openvino-genai: 
Gemma-7b-int4-ov - CPU svt-av1: Preset 13 - Beauty 4K 10-bit ospray: gravity_spheres_volume/dim_512/scivis/real_time svt-av1: Preset 3 - Beauty 4K 10-bit gromacs: water_GMX50_bare simdjson: DistinctUserID blender: Classroom - CPU-Only pyperformance: pathlib llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Text Generation 128 astcenc: Exhaustive astcenc: Thorough etcpak: Multi-Threaded - ETC2 pyperformance: nbody svt-av1: Preset 3 - Bosphorus 1080p llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 1024 couchdb: 500 - 1000 - 30 astcenc: Medium blender: Barbershop - CPU-Only pyperformance: pickle_pure_python astcenc: Very Thorough pyperformance: gc_collect ospray: particle_volume/scivis/real_time primesieve: 1e13 pyperformance: regex_compile llamafile: wizardcoder-python-34b-v1.0.Q6_K - Text Generation 16 ospray: particle_volume/pathtracer/real_time pyperformance: async_tree_io llamafile: wizardcoder-python-34b-v1.0.Q6_K - Text Generation 128 blender: Fishy Cat - CPU-Only primesieve: 1e12 rustls: handshake - TLS13_CHACHA20_POLY1305_SHA256 financebench: Repo OpenMP pyperformance: django_template ospray: particle_volume/ao/real_time byte: Dhrystone 2 quantlib: XXS y-cruncher: 1B llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Text Generation 128 openvino-genai: Phi-3-mini-128k-instruct-int4-ov - CPU llamafile: TinyLlama-1.1B-Chat-v1.0.BF16 - Text Generation 16 whisperfile: Medium byte: Pipe llamafile: mistral-7b-instruct-v0.2.Q5_K_M - Text Generation 16 blender: BMW27 - CPU-Only openssl: AES-128-GCM openssl: AES-256-GCM pyperformance: python_startup ospray: gravity_spheres_volume/dim_512/pathtracer/real_time whisper-cpp: ggml-medium.en - 2016 State of the Union pyperformance: asyncio_websockets openvino-genai: Falcon-7b-instruct-int4-ov - CPU quantlib: S pyperformance: xml_etree compress-7zip: Compression Rating couchdb: 100 - 1000 - 30 build2: Time To Compile byte: System Call y-cruncher: 500M whisper-cpp: ggml-base.en - 2016 State of the 
Union onnx: T5 Encoder - CPU - Standard pyperformance: crypto_pyaes namd: STMV with 1,066,628 Atoms couchdb: 300 - 3000 - 30 onnx: GPT-2 - CPU - Standard openssl: ChaCha20-Poly1305 openssl: ChaCha20 byte: Whetstone Double onnx: super-resolution-10 - CPU - Standard blender: Pabellon Barcelona - CPU-Only astcenc: Fast rustls: handshake-resume - TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384 cassandra: Writes llamafile: wizardcoder-python-34b-v1.0.Q6_K - Prompt Processing 2048 llamafile: wizardcoder-python-34b-v1.0.Q6_K - Prompt Processing 1024 llamafile: wizardcoder-python-34b-v1.0.Q6_K - Prompt Processing 512 llamafile: wizardcoder-python-34b-v1.0.Q6_K - Prompt Processing 256 llamafile: mistral-7b-instruct-v0.2.Q5_K_M - Prompt Processing 2048 llamafile: mistral-7b-instruct-v0.2.Q5_K_M - Prompt Processing 1024 llamafile: mistral-7b-instruct-v0.2.Q5_K_M - Prompt Processing 512 llamafile: mistral-7b-instruct-v0.2.Q5_K_M - Prompt Processing 256 llamafile: TinyLlama-1.1B-Chat-v1.0.BF16 - Prompt Processing 2048 llamafile: TinyLlama-1.1B-Chat-v1.0.BF16 - Prompt Processing 1024 llamafile: TinyLlama-1.1B-Chat-v1.0.BF16 - Prompt Processing 512 llamafile: TinyLlama-1.1B-Chat-v1.0.BF16 - Prompt Processing 256 llamafile: Llama-3.2-3B-Instruct.Q6_K - Prompt Processing 2048 llamafile: Llama-3.2-3B-Instruct.Q6_K - Prompt Processing 1024 llamafile: Llama-3.2-3B-Instruct.Q6_K - Prompt Processing 512 llamafile: Llama-3.2-3B-Instruct.Q6_K - Prompt Processing 256 pyperformance: json_loads openvino-genai: Phi-3-mini-128k-instruct-int4-ov - CPU - Time Per Output Token openvino-genai: Phi-3-mini-128k-instruct-int4-ov - CPU - Time To First Token openvino-genai: Falcon-7b-instruct-int4-ov - CPU - Time Per Output Token openvino-genai: Falcon-7b-instruct-int4-ov - CPU - Time To First Token openvino-genai: Gemma-7b-int4-ov - CPU - Time Per Output Token openvino-genai: Gemma-7b-int4-ov - CPU - Time To First Token onnx: Faster R-CNN R-50-FPN-int8 - CPU - Standard onnx: ResNet101_DUC_HDC-12 - CPU - 
Standard onnx: super-resolution-10 - CPU - Standard onnx: ResNet50 v1-12-int8 - CPU - Standard onnx: ArcFace ResNet-100 - CPU - Standard onnx: fcn-resnet101-11 - CPU - Standard onnx: CaffeNet 12-int8 - CPU - Standard onnx: bertsquad-12 - CPU - Standard onnx: T5 Encoder - CPU - Standard onnx: ZFNet-512 - CPU - Standard onnx: yolov4 - CPU - Standard onnx: GPT-2 - CPU - Standard renaissance: Apache Spark ALS a b c 2129.52 16936 3579.67 823.17 94.032 68.4 54752796 944.27 21477.8 355.09 490.0 70.76 70.85 1211.48 3256.1 477.0 592.857 414.4 63.09 423535.68 1.83 162.125 1190 2.97612 19530.2 1495 5.97 645 19.03 1810 1794.11 62.97 732.8 1498 10.46 47.72 1252 3399.5 1143 979 3506.4 4403.8 339.023 102.005 4.058 2319.4 1.12573 279.04 58.191 6.67287 32.57 842.558 6.504 58.655 2.41294 388077.69 245.07838 404263.45 15.5899 26.28 80462.6 3563852.57 46507038 18.542 10.47 20.13 9805.7 212.52 700.859 102.331 114.45 3.2167 9.76 195.41642 47.0691 2.79632 1.54196 232.188 775.75 38.2 1141.194104 101.971 50.7 920 2620332 844 41.70935 42.4537 2412.2 12.468 175 1553632.14 33061.21875 636.318 511.775 390.597 77.8 9.59 106.13 327.3 73.56 34.538 7.63944 11.0552 165916 1372.03 9.83 18.588 7.58789 1.422 1.692 10.46 143.36 14.2 7.24 1.6844 20.3025 577.817 59 29.573 69.26 148.049 156.2217 506.2 165 2.741 677 8.98486 78.498 69.8 1.78 236.245 755 1.99 71.35 6.347 76454.45 21418.445312 20.7 9.00917 1866536062.7 13.432 18.485 6.88 19.28 24.59 534.919 48806257.1 10.22 53.55 104784522170 97172751700 5.77 8.82093 700.91 315 12.93 12.7476 35.8 163859 69.929 92.053 49140426.6 8.772 87.48973 156.453 41.7 0.75656 367.83 134.596 92393529340 130588495050 343491.9 141.117 166.12 396.6495 1820810.21 271333 12288 6144 3072 1536 32768 16384 8192 4096 32768 16384 8192 4096 32768 16384 8192 4096 12.1 51.86 55.93 77.34 86.06 101.72 106.62 21.2429 648.522 7.08601 2.55898 23.553 310.875 1.57084 64.141 6.39112 9.76985 90.4523 7.42776 2958.48 21468.7 4287.06 933.176 102.418 61.62 59130265 867.315 23265.4 328.47 529.5 
75.96 66 1295.51 3081.5 447.0 629.557 398.1 59.84 402625.06 1.81 154.53 1247 3.1126 20375.7 1559 5.93 672 18.32 1877 1860.35 65.27 744.3 1549 10.8 46.28 1290 3494.8 1174 1005 3594.3 4439.9 330.87 99.554 4.15682 2264.7 1.15274 285.71 57.347 6.81754 32.04 824.808 6.371 59.873 2.46279 380493.86 240.59909 397022.4 15.3182 25.83 79085.8 3504511.31 45751747 18.846 10.64 19.81 9958.3 209.773 711.433 103.862 112.85 3.1705 9.82 192.67808 47.7293 2.79025 1.52109 235.345 765.35 38.7 1137.394602 100.893 50.1 931 2589637.92 854 42.20049 41.9639 2439.9 12.611 177 1536355.9 33432.636719 629.655 517.149 386.577 77 9.554 107.182 324.21 74.26 34.225 7.57408 10.9603 166843 1383.64 9.91 18.442 7.52875 1.415 1.679 10.38 144.41 14.3 7.19 1.6728 20.1644 575.02 59.4 29.375 68.8 149.028 155.2665 509.3 166 2.7248 681 8.93245 78.949 70.2 1.79 235.325 759 2 71.7 6.378 76083.73 21522.066406 20.8 8.96632 1857795366.1 13.4292 18.403 6.85 19.2 24.69 532.80744 48718087.1 10.26 53.75 104404347840 96821737060 5.79 8.79096 703.22188 316 12.97 12.7098 35.7 164050 70.114 92.292 49062324.1 8.794 87.27256 156.833 41.8 0.75634 368.664 134.311 92216350580 130359884190 343113 141.003 166.25 396.4261 1821261.88 271373 12288 6144 3072 1536 32768 16384 8192 4096 32768 16384 8192 4096 32768 16384 8192 4096 12.1 52.1 56.26 77.13 84.39 100.94 107.03 20.9488 657.42 7.09172 2.58589 23.8276 315.405 1.58746 65.2787 6.37566 9.6259 91.2359 7.4434 105.221 53623108 939.897 500.3 3046.8 458.5 624.571 420.8 1.74 5.73 719.1 10.79 3472.4 3567.8 4331.7 338.653 100.893 2296.6 58.647 32.73 838.168 6.374 9907.4 212.945 114.52 9.68 2.82925 1127.270287 102.109 2439.2 12.597 9.495 34.448 7.64282 167321 18.556 7.55791 1.411 10.43 573.914 29.465 8.97005 234.971 8.98586 1862548305.4 13.491 48613927.9 8.81199 12.7242 164313 49016743.6 0.75813 343187 OpenBenchmarking.org
LiteRT 2024-10-15 - Model: Quantized COCO SSD MobileNet v1 (Microseconds, Fewer Is Better): a = 2129.52, b = 2958.48
LiteRT 2024-10-15 - Model: NASNet Mobile (Microseconds, Fewer Is Better): a = 16936.0, b = 21468.7
LiteRT 2024-10-15 - Model: DeepLab V3 (Microseconds, Fewer Is Better): a = 3579.67, b = 4287.06
LiteRT 2024-10-15 - Model: Mobilenet Quant (Microseconds, Fewer Is Better): a = 823.17, b = 933.18
CP2K Molecular Dynamics 2024.3 - Input: Fayalite-FIST (Seconds, Fewer Is Better): a = 94.03, b = 102.42, c = 105.22. Note: (F9X) gfortran options: -fopenmp -march=native -mtune=native -O3 -funroll-loops -fbacktrace -ffree-form -fimplicit-none -std=f2008 -lcp2kstart -lcp2kmc -lcp2kswarm -lcp2kmotion -lcp2kthermostat -lcp2kemd -lcp2ktmc -lcp2kmain -lcp2kdbt -lcp2ktas -lcp2kgrid -lcp2kgriddgemm -lcp2kgridcpu -lcp2kgridref -lcp2kgridcommon -ldbcsrarnoldi -ldbcsrx -lcp2kdbx -lcp2kdbm -lcp2kshg_int -lcp2keri_mme -lcp2kminimax -lcp2khfxbase -lcp2ksubsys -lcp2kxc -lcp2kao -lcp2kpw_env -lcp2kinput -lcp2kpw -lcp2kgpu -lcp2kfft -lcp2kfpga -lcp2kfm -lcp2kcommon -lcp2koffload -lcp2kmpiwrap -lcp2kbase -ldbcsr -lsirius -lspla -lspfft -lsymspg -lvdwxc -l:libhdf5_fortran.a -l:libhdf5.a -lz -lgsl -lelpa_openmp -lcosma -lcosta -lscalapack -lxsmmf -lxsmm -ldl -lpthread -llibgrpp -lxcf03 -lxc -lint2 -lfftw3_mpi -lfftw3 -lfftw3_omp -lmpi_cxx -lmpi -l:libopenblas.a -lvori -lstdc++ -lmpi_usempif08 -lmpi_mpifh -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm
Llama.cpp b4154 - Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 512 (Tokens Per Second, More Is Better): a = 68.40, b = 61.62. Note: (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas
Stockfish 17 - Chess Benchmark (Nodes Per Second, More Is Better): a = 54752796, b = 59130265, c = 53623108. Note: (CXX) g++ options: -lgcov -m64 -lpthread -fno-exceptions -std=c++17 -fno-peel-loops -fno-tracer -pedantic -O3 -funroll-loops -msse -msse3 -mpopcnt -mavx2 -mbmi -mavx512f -mavx512bw -mavx512vnni -mavx512dq -mavx512vl -msse4.1 -mssse3 -msse2 -mbmi2 -flto -flto-partition=one -flto=jobserver
RELION 5.0 - Test: Basic - Device: CPU (Seconds, Fewer Is Better): a = 944.27, b = 867.32, c = 939.90. Note: (CXX) g++ options: -fPIC -std=c++14 -fopenmp -O3 -rdynamic -lfftw3f -lfftw3 -ldl -ltiff -lpng -ljpeg -lmpi_cxx -lmpi
LiteRT 2024-10-15 - Model: Inception V4 (Microseconds, Fewer Is Better): a = 21477.8, b = 23265.4
Llama.cpp b4154 - Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 1024 (Tokens Per Second, More Is Better): a = 355.09, b = 328.47. Note: (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas
Renaissance 0.16 - Test: Apache Spark Bayes (ms, Fewer Is Better): a = 490.0 (MIN: 459.29 / MAX: 580.9), b = 529.5 (MIN: 458.39 / MAX: 562.09), c = 500.3 (MIN: 460.66 / MAX: 542.36)
Llama.cpp b4154 - Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 512 (Tokens Per Second, More Is Better): a = 70.76, b = 75.96. Note: (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas
Llama.cpp b4154 - Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 1024 (Tokens Per Second, More Is Better): a = 70.85, b = 66.00. Note: (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas
LiteRT 2024-10-15 - Model: Mobilenet Float (Microseconds, Fewer Is Better): a = 1211.48, b = 1295.51
Renaissance 0.16 - Test: In-Memory Database Shootout (ms, Fewer Is Better): a = 3256.1 (MIN: 3019.89 / MAX: 3599.5), b = 3081.5 (MIN: 2836.52 / MAX: 3397.02), c = 3046.8 (MIN: 2814.66 / MAX: 3304.16)
Renaissance 0.16 - Test: Scala Dotty (ms, Fewer Is Better): a = 477.0 (MIN: 371.54 / MAX: 736.5), b = 447.0 (MIN: 402.95 / MAX: 718.21), c = 458.5 (MIN: 406.93 / MAX: 746.39)
CP2K Molecular Dynamics 2024.3 - Input: H20-256 (Seconds, Fewer Is Better): a = 592.86, b = 629.56, c = 624.57. Note: (F9X) gfortran options are the same as those listed for Input: Fayalite-FIST.
Renaissance 0.16 - Test: Random Forest (ms, Fewer Is Better): a = 414.4 (MIN: 322.79 / MAX: 466.1), b = 398.1 (MIN: 343.09 / MAX: 475.62), c = 420.8 (MIN: 316.29 / MAX: 556.39)
Llama.cpp b4154 - Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 2048 (Tokens Per Second, More Is Better): a = 63.09, b = 59.84. Note: (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas
Rustls 0.23.17 - Benchmark: handshake - Suite: TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384 (handshakes/s, More Is Better): a = 423535.68, b = 402625.06. Note: (CC) gcc options: -m64 -lgcc_s -lutil -lrt -lpthread -lm -ldl -lc -pie -nodefaultlibs
simdjson 3.10 - Throughput Test: LargeRandom (GB/s, More Is Better): a = 1.83, b = 1.81, c = 1.74. Note: (CXX) g++ options: -O3 -lrt
Gcrypt Library 1.10.3 (Seconds, Fewer Is Better): a = 162.13, b = 154.53. Note: (CC) gcc options: -O2 -fvisibility=hidden
XNNPACK b7b048 - Model: FP16MobileNetV2 (us, Fewer Is Better): a = 1190, b = 1247. Note: (CXX) g++ options: -O3 -lrt -lm
oneDNN 3.6 - Harness: Deconvolution Batch shapes_1d - Engine: CPU (ms, Fewer Is Better): a = 2.97612 (MIN: 2.42), b = 3.11260 (MIN: 2.4). Note: (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl
LiteRT 2024-10-15 - Model: Inception ResNet V2 (Microseconds, Fewer Is Better): a = 19530.2, b = 20375.7
XNNPACK b7b048 - Model: FP32MobileNetV2 (us, Fewer Is Better): a = 1495, b = 1559. Note: (CXX) g++ options: -O3 -lrt -lm
simdjson 3.10 - Throughput Test: Kostya (GB/s, More Is Better): a = 5.97, b = 5.93, c = 5.73. Note: (CXX) g++ options: -O3 -lrt
PyPerformance 1.11 - Benchmark: asyncio_tcp_ssl (Milliseconds, Fewer Is Better): a = 645, b = 672
Llamafile 0.8.16 - Model: Llama-3.2-3B-Instruct.Q6_K - Test: Text Generation 16 (Tokens Per Second, More Is Better): a = 19.03, b = 18.32
XNNPACK b7b048 - Model: FP32MobileNetV3Large (us, Fewer Is Better): a = 1810, b = 1877. Note: (CXX) g++ options: -O3 -lrt -lm
LiteRT 2024-10-15 - Model: SqueezeNet (Microseconds, Fewer Is Better): a = 1794.11, b = 1860.35
Llama.cpp b4154 - Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 2048 (Tokens Per Second, More Is Better): a = 62.97, b = 65.27. Note: (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas
Renaissance 0.16 - Test: Genetic Algorithm Using Jenetics + Futures (ms, Fewer Is Better): a = 732.8 (MIN: 713.67 / MAX: 813.49), b = 744.3 (MIN: 714.12 / MAX: 802.66), c = 719.1 (MIN: 670.9 / MAX: 764.9)
XNNPACK b7b048 - Model: FP16MobileNetV3Large (us, Fewer Is Better): a = 1498, b = 1549. Note: (CXX) g++ options: -O3 -lrt -lm
simdjson 3.10 - Throughput Test: TopTweet (GB/s, More Is Better): a = 10.46, b = 10.80, c = 10.79. Note: (CXX) g++ options: -O3 -lrt
Llama.cpp b4154 - Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Text Generation 128 (Tokens Per Second, More Is Better): a = 47.72, b = 46.28. Note: (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas
XNNPACK b7b048 - Model: FP32MobileNetV1 (us, Fewer Is Better): a = 1252, b = 1290. Note: (CXX) g++ options: -O3 -lrt -lm
Renaissance 0.16 - Test: Gaussian Mixture Model (ms, Fewer Is Better): a = 3399.5 (MIN: 2471.52), b = 3494.8 (MIN: 2520.23), c = 3472.4 (MIN: 2469.6)
XNNPACK b7b048 - Model: FP16MobileNetV1 (us, Fewer Is Better): a = 1143, b = 1174. Note: (CXX) g++ options: -O3 -lrt -lm
XNNPACK b7b048 - Model: FP32MobileNetV3Small (us, Fewer Is Better): a = 979, b = 1005. Note: (CXX) g++ options: -O3 -lrt -lm
Renaissance 0.16 - Test: Savina Reactors.IO (ms, Fewer Is Better): a = 3506.4 (MIN: 3506.38 / MAX: 4329.37), b = 3594.3 (MIN: 3594.26 / MAX: 4599.09), c = 3567.8 (MAX: 5162.74)
Renaissance 0.16 - Test: Akka Unbalanced Cobwebbed Tree (ms, Fewer Is Better): a = 4403.8 (MAX: 5719.11), b = 4439.9 (MAX: 5696.46), c = 4331.7 (MIN: 4331.69 / MAX: 5601.8)
SVT-AV1 2.3 - Encoder Mode: Preset 8 - Input: Bosphorus 1080p (Frames Per Second, More Is Better): a = 339.02, b = 330.87, c = 338.65. Note: (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
SVT-AV1 2.3 - Encoder Mode: Preset 8 - Input: Bosphorus 4K (Frames Per Second, More Is Better): a = 102.01, b = 99.55, c = 100.89. Note: (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
oneDNN 3.6 - Harness: IP Shapes 3D - Engine: CPU (ms, Fewer Is Better): a = 4.05800 (MIN: 3.75), b = 4.15682 (MIN: 3.75). Note: (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl
Renaissance 0.16 - Test: Finagle HTTP Requests (ms, Fewer Is Better): a = 2319.4 (MIN: 1832.84), b = 2264.7 (MIN: 1788.41 / MAX: 2264.71), c = 2296.6 (MIN: 1805.17)
oneDNN 3.6 - Harness: IP Shapes 1D - Engine: CPU (ms, Fewer Is Better): a = 1.12573 (MIN: 1.03), b = 1.15274 (MIN: 1.03). Note: (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl
Llama.cpp b4154 - Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 2048 (Tokens Per Second, More Is Better): a = 279.04, b = 285.71. Note: (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas
CP2K Molecular Dynamics 2024.3 - Input: H20-64 (Seconds, Fewer Is Better): a = 58.19, b = 57.35, c = 58.65. Note: (F9X) gfortran options are the same as those listed for Input: Fayalite-FIST.
oneDNN 3.6 - Harness: Convolution Batch Shapes Auto - Engine: CPU (ms, Fewer Is Better): a = 6.67287 (MIN: 6.2), b = 6.81754 (MIN: 6.2). Note: (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl
x265 - Video Input: Bosphorus 4K (Frames Per Second, More Is Better): a = 32.57, b = 32.04, c = 32.73. Note: x265 [info]: HEVC encoder version 3.5+1-f0c1022b6
SVT-AV1 2.3 - Encoder Mode: Preset 13 - Input: Bosphorus 1080p (Frames Per Second, More Is Better): a = 842.56, b = 824.81, c = 838.17. Note: (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
SVT-AV1 2.3 - Encoder Mode: Preset 5 - Input: Beauty 4K 10-bit (Frames Per Second, More Is Better): a = 6.504, b = 6.371, c = 6.374. Note: (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
Timed Eigen Compilation 3.4.0 - Time To Compile (Seconds, Fewer Is Better): a = 58.66, b = 59.87
oneDNN 3.6 - Harness: Deconvolution Batch shapes_3d - Engine: CPU (ms, Fewer Is Better): a = 2.41294 (MIN: 2.34), b = 2.46279 (MIN: 2.35). Note: (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl
Rustls 0.23.17 - Benchmark: handshake-resume - Suite: TLS13_CHACHA20_POLY1305_SHA256 (handshakes/s, More Is Better): a = 388077.69, b = 380493.86. Note: (CC) gcc options: -m64 -lgcc_s -lutil -lrt -lpthread -lm -ldl -lc -pie -nodefaultlibs
Whisper.cpp 1.6.2 - Model: ggml-small.en - Input: 2016 State of the Union (Seconds, Fewer Is Better): a = 245.08, b = 240.60. Note: (CXX) g++ options: -O3 -std=c++11 -fPIC -pthread -msse3 -mssse3 -mavx -mf16c -mfma -mavx2 -mavx512f -mavx512cd -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512vnni
Rustls 0.23.17 - Benchmark: handshake-ticket - Suite: TLS13_CHACHA20_POLY1305_SHA256 (handshakes/s, More Is Better): a = 404263.45, b = 397022.40. Note: (CC) gcc options: -m64 -lgcc_s -lutil -lrt -lpthread -lm -ldl -lc -pie -nodefaultlibs
ONNX Runtime Model: bertsquad-12 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: bertsquad-12 - Device: CPU - Executor: Standard a b 4 8 12 16 20 15.59 15.32 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
Llamafile Model: TinyLlama-1.1B-Chat-v1.0.BF16 - Test: Text Generation 128 OpenBenchmarking.org Tokens Per Second, More Is Better Llamafile 0.8.16 Model: TinyLlama-1.1B-Chat-v1.0.BF16 - Test: Text Generation 128 a b 6 12 18 24 30 26.28 25.83
Rustls Benchmark: handshake - Suite: TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 OpenBenchmarking.org handshakes/s, More Is Better Rustls 0.23.17 Benchmark: handshake - Suite: TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 a b 20K 40K 60K 80K 100K 80462.6 79085.8 1. (CC) gcc options: -m64 -lgcc_s -lutil -lrt -lpthread -lm -ldl -lc -pie -nodefaultlibs
Rustls Benchmark: handshake-resume - Suite: TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 OpenBenchmarking.org handshakes/s, More Is Better Rustls 0.23.17 Benchmark: handshake-resume - Suite: TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 a b 800K 1600K 2400K 3200K 4000K 3563852.57 3504511.31 1. (CC) gcc options: -m64 -lgcc_s -lutil -lrt -lpthread -lm -ldl -lc -pie -nodefaultlibs
Stockfish Chess Benchmark OpenBenchmarking.org Nodes Per Second, More Is Better Stockfish Chess Benchmark a b 10M 20M 30M 40M 50M 46507038 45751747 1. Stockfish 16 by the Stockfish developers (see AUTHORS file)
POV-Ray Trace Time OpenBenchmarking.org Seconds, Fewer Is Better POV-Ray Trace Time a b 5 10 15 20 25 18.54 18.85 1. POV-Ray 3.7.0.10.unofficial
Llamafile Model: mistral-7b-instruct-v0.2.Q5_K_M - Test: Text Generation 128 OpenBenchmarking.org Tokens Per Second, More Is Better Llamafile 0.8.16 Model: mistral-7b-instruct-v0.2.Q5_K_M - Test: Text Generation 128 a b 3 6 9 12 15 10.47 10.64
Llamafile Model: Llama-3.2-3B-Instruct.Q6_K - Test: Text Generation 128 OpenBenchmarking.org Tokens Per Second, More Is Better Llamafile 0.8.16 Model: Llama-3.2-3B-Instruct.Q6_K - Test: Text Generation 128 a b 5 10 15 20 25 20.13 19.81
Renaissance 0.16, Test: ALS Movie Lens (ms, Fewer Is Better): a: 9805.7 (MIN: 9253.4 / MAX: 10057.61), b: 9958.3 (MIN: 9305.94 / MAX: 10040.58), c: 9907.4 (MIN: 9393.64 / MAX: 10087.8).
SVT-AV1 2.3, Encoder Mode: Preset 13 - Input: Bosphorus 4K (Frames Per Second, More Is Better): a: 212.52, b: 209.77, c: 212.95. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
oneDNN 3.6, Harness: Recurrent Neural Network Inference - Engine: CPU (ms, Fewer Is Better): a: 700.86 (MIN: 679.89), b: 711.43 (MIN: 684.03). (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl
ONNX Runtime 1.19, Model: ZFNet-512 - Device: CPU - Executor: Standard (Inferences Per Second, More Is Better): a: 102.33, b: 103.86. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
x265, Video Input: Bosphorus 1080p (Frames Per Second, More Is Better): a: 114.45, b: 112.85, c: 114.52. x265 [info]: HEVC encoder version 3.5+1-f0c1022b6
ONNX Runtime 1.19, Model: fcn-resnet101-11 - Device: CPU - Executor: Standard (Inferences Per Second, More Is Better): a: 3.2167, b: 3.1705. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
simdjson 3.10, Throughput Test: PartialTweets (GB/s, More Is Better): a: 9.76, b: 9.82, c: 9.68. (CXX) g++ options: -O3 -lrt
Whisperfile 20Aug24, Model Size: Small (Seconds, Fewer Is Better): a: 195.42, b: 192.68.
ONNX Runtime 1.19, Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard (Inferences Per Second, More Is Better): a: 47.07, b: 47.73. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
NAMD 3.0, Input: ATPase with 327,506 Atoms (ns/day, More Is Better): a: 2.79632, b: 2.79025, c: 2.82925.
ONNX Runtime 1.19, Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Standard (Inferences Per Second, More Is Better): a: 1.54196, b: 1.52109. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
Apache CouchDB 3.4.1, Bulk Size: 100 - Inserts: 3000 - Rounds: 30 (Seconds, Fewer Is Better): a: 232.19, b: 235.35. (CXX) g++ options: -flto -lstdc++ -shared -lei
Numpy Benchmark (Score, More Is Better): a: 775.75, b: 765.35.
PyPerformance 1.11, Benchmark: chaos (Milliseconds, Fewer Is Better): a: 38.2, b: 38.7.
ACES DGEMM 1.0, Sustained Floating-Point Rate (GFLOP/s, More Is Better): a: 1141.19, b: 1137.39, c: 1127.27. (CC) gcc options: -ffast-math -mavx2 -O3 -fopenmp -lopenblas
SVT-AV1 2.3, Encoder Mode: Preset 5 - Input: Bosphorus 1080p (Frames Per Second, More Is Better): a: 101.97, b: 100.89, c: 102.11. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
PyPerformance 1.11, Benchmark: float (Milliseconds, Fewer Is Better): a: 50.7, b: 50.1.
XNNPACK b7b048, Model: FP16MobileNetV3Small (us, Fewer Is Better): a: 920, b: 931. (CXX) g++ options: -O3 -lrt -lm
Rustls 0.23.17, Benchmark: handshake-ticket - Suite: TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (handshakes/s, More Is Better): a: 2620332.00, b: 2589637.92. (CC) gcc options: -m64 -lgcc_s -lutil -lrt -lpthread -lm -ldl -lc -pie -nodefaultlibs
XNNPACK b7b048, Model: QS8MobileNetV2 (us, Fewer Is Better): a: 844, b: 854. (CXX) g++ options: -O3 -lrt -lm
Whisperfile 20Aug24, Model Size: Tiny (Seconds, Fewer Is Better): a: 41.71, b: 42.20.
ONNX Runtime 1.19, Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard (Inferences Per Second, More Is Better): a: 42.45, b: 41.96. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
Renaissance 0.16, Test: Apache Spark PageRank (ms, Fewer Is Better): a: 2412.2 (MIN: 1691.04), b: 2439.9 (MIN: 1684.02 / MAX: 2439.95), c: 2439.2 (MIN: 1679.36 / MAX: 2439.21).
SVT-AV1 2.3, Encoder Mode: Preset 8 - Input: Beauty 4K 10-bit (Frames Per Second, More Is Better): a: 12.47, b: 12.61, c: 12.60. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
PyPerformance 1.11, Benchmark: raytrace (Milliseconds, Fewer Is Better): a: 175, b: 177.
Rustls 0.23.17, Benchmark: handshake-ticket - Suite: TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384 (handshakes/s, More Is Better): a: 1553632.14, b: 1536355.90. (CC) gcc options: -m64 -lgcc_s -lutil -lrt -lpthread -lm -ldl -lc -pie -nodefaultlibs
FinanceBench 2016-07-25, Benchmark: Bonds OpenMP (ms, Fewer Is Better): a: 33061.22, b: 33432.64. (CXX) g++ options: -O3 -march=native -fopenmp
ONNX Runtime 1.19, Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard (Inferences Per Second, More Is Better): a: 636.32, b: 629.66. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
Apache CouchDB 3.4.1, Bulk Size: 500 - Inserts: 3000 - Rounds: 30 (Seconds, Fewer Is Better): a: 511.78, b: 517.15. (CXX) g++ options: -flto -lstdc++ -shared -lei
ONNX Runtime 1.19, Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard (Inferences Per Second, More Is Better): a: 390.60, b: 386.58. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
PyPerformance 1.11, Benchmark: go (Milliseconds, Fewer Is Better): a: 77.8, b: 77.0.
SVT-AV1 2.3, Encoder Mode: Preset 3 - Input: Bosphorus 4K (Frames Per Second, More Is Better): a: 9.590, b: 9.554, c: 9.495. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
Apache CouchDB 3.4.1, Bulk Size: 300 - Inserts: 1000 - Rounds: 30 (Seconds, Fewer Is Better): a: 106.13, b: 107.18. (CXX) g++ options: -flto -lstdc++ -shared -lei
Llama.cpp b4154, Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 512 (Tokens Per Second, More Is Better): a: 327.30, b: 324.21. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas
Blender 4.3, Blend File: Junkshop - Compute: CPU-Only (Seconds, Fewer Is Better): a: 73.56, b: 74.26.
SVT-AV1 2.3, Encoder Mode: Preset 5 - Input: Bosphorus 4K (Frames Per Second, More Is Better): a: 34.54, b: 34.23, c: 34.45. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
OSPRay 3.2, Benchmark: gravity_spheres_volume/dim_512/ao/real_time (Items Per Second, More Is Better): a: 7.63944, b: 7.57408, c: 7.64282.
ONNX Runtime 1.19, Model: yolov4 - Device: CPU - Executor: Standard (Inferences Per Second, More Is Better): a: 11.06, b: 10.96. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
7-Zip Compression, Test: Decompression Rating (MIPS, More Is Better): a: 165916, b: 166843, c: 167321. 7-Zip 23.01 (x64) : Copyright (c) 1999-2023 Igor Pavlov : 2023-06-20
oneDNN 3.6, Harness: Recurrent Neural Network Training - Engine: CPU (ms, Fewer Is Better): a: 1372.03 (MIN: 1342.06), b: 1383.64 (MIN: 1333.57). (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl
OpenVINO GenAI 2024.5, Model: Gemma-7b-int4-ov - Device: CPU (tokens/s, More Is Better): a: 9.83, b: 9.91.
SVT-AV1 2.3, Encoder Mode: Preset 13 - Input: Beauty 4K 10-bit (Frames Per Second, More Is Better): a: 18.59, b: 18.44, c: 18.56. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
OSPRay 3.2, Benchmark: gravity_spheres_volume/dim_512/scivis/real_time (Items Per Second, More Is Better): a: 7.58789, b: 7.52875, c: 7.55791.
SVT-AV1 2.3, Encoder Mode: Preset 3 - Input: Beauty 4K 10-bit (Frames Per Second, More Is Better): a: 1.422, b: 1.415, c: 1.411. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
GROMACS, Input: water_GMX50_bare (Ns Per Day, More Is Better): a: 1.692, b: 1.679. GROMACS version: 2023.3-Ubuntu_2023.3_1ubuntu3
simdjson 3.10, Throughput Test: DistinctUserID (GB/s, More Is Better): a: 10.46, b: 10.38, c: 10.43. (CXX) g++ options: -O3 -lrt
Blender 4.3, Blend File: Classroom - Compute: CPU-Only (Seconds, Fewer Is Better): a: 143.36, b: 144.41.
PyPerformance 1.11, Benchmark: pathlib (Milliseconds, Fewer Is Better): a: 14.2, b: 14.3.
Llama.cpp b4154, Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Text Generation 128 (Tokens Per Second, More Is Better): a: 7.24, b: 7.19. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas
ASTC Encoder 5.0, Preset: Exhaustive (MT/s, More Is Better): a: 1.6844, b: 1.6728. (CXX) g++ options: -O3 -flto -pthread
ASTC Encoder 5.0, Preset: Thorough (MT/s, More Is Better): a: 20.30, b: 20.16. (CXX) g++ options: -O3 -flto -pthread
Etcpak 2.0, Benchmark: Multi-Threaded - Configuration: ETC2 (Mpx/s, More Is Better): a: 577.82, b: 575.02, c: 573.91. (CXX) g++ options: -flto -pthread
PyPerformance 1.11, Benchmark: nbody (Milliseconds, Fewer Is Better): a: 59.0, b: 59.4.
SVT-AV1 2.3, Encoder Mode: Preset 3 - Input: Bosphorus 1080p (Frames Per Second, More Is Better): a: 29.57, b: 29.38, c: 29.47. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
Llama.cpp b4154, Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 1024 (Tokens Per Second, More Is Better): a: 69.26, b: 68.80. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas
Apache CouchDB 3.4.1, Bulk Size: 500 - Inserts: 1000 - Rounds: 30 (Seconds, Fewer Is Better): a: 148.05, b: 149.03. (CXX) g++ options: -flto -lstdc++ -shared -lei
ASTC Encoder 5.0, Preset: Medium (MT/s, More Is Better): a: 156.22, b: 155.27. (CXX) g++ options: -O3 -flto -pthread
Blender 4.3, Blend File: Barbershop - Compute: CPU-Only (Seconds, Fewer Is Better): a: 506.2, b: 509.3.
PyPerformance 1.11, Benchmark: pickle_pure_python (Milliseconds, Fewer Is Better): a: 165, b: 166.
ASTC Encoder 5.0, Preset: Very Thorough (MT/s, More Is Better): a: 2.7410, b: 2.7248. (CXX) g++ options: -O3 -flto -pthread
PyPerformance 1.11, Benchmark: gc_collect (Milliseconds, Fewer Is Better): a: 677, b: 681.
OSPRay 3.2, Benchmark: particle_volume/scivis/real_time (Items Per Second, More Is Better): a: 8.98486, b: 8.93245, c: 8.97005.
Primesieve 12.6, Length: 1e13 (Seconds, Fewer Is Better): a: 78.50, b: 78.95. (CXX) g++ options: -O3
PyPerformance 1.11, Benchmark: regex_compile (Milliseconds, Fewer Is Better): a: 69.8, b: 70.2.
Llamafile 0.8.16, Model: wizardcoder-python-34b-v1.0.Q6_K - Test: Text Generation 16 (Tokens Per Second, More Is Better): a: 1.78, b: 1.79.
OSPRay 3.2, Benchmark: particle_volume/pathtracer/real_time (Items Per Second, More Is Better): a: 236.25, b: 235.33, c: 234.97.
PyPerformance 1.11, Benchmark: async_tree_io (Milliseconds, Fewer Is Better): a: 755, b: 759.
Llamafile 0.8.16, Model: wizardcoder-python-34b-v1.0.Q6_K - Test: Text Generation 128 (Tokens Per Second, More Is Better): a: 1.99, b: 2.00.
Blender 4.3, Blend File: Fishy Cat - Compute: CPU-Only (Seconds, Fewer Is Better): a: 71.35, b: 71.70.
Primesieve 12.6, Length: 1e12 (Seconds, Fewer Is Better): a: 6.347, b: 6.378. (CXX) g++ options: -O3
Rustls 0.23.17, Benchmark: handshake - Suite: TLS13_CHACHA20_POLY1305_SHA256 (handshakes/s, More Is Better): a: 76454.45, b: 76083.73. (CC) gcc options: -m64 -lgcc_s -lutil -lrt -lpthread -lm -ldl -lc -pie -nodefaultlibs
FinanceBench 2016-07-25, Benchmark: Repo OpenMP (ms, Fewer Is Better): a: 21418.45, b: 21522.07. (CXX) g++ options: -O3 -march=native -fopenmp
PyPerformance 1.11, Benchmark: django_template (Milliseconds, Fewer Is Better): a: 20.7, b: 20.8.
OSPRay 3.2, Benchmark: particle_volume/ao/real_time (Items Per Second, More Is Better): a: 9.00917, b: 8.96632, c: 8.98586.
BYTE Unix Benchmark 5.1.3-git, Computational Test: Dhrystone 2 (LPS, More Is Better): a: 1866536062.7, b: 1857795366.1, c: 1862548305.4. (CC) gcc options: -pedantic -O3 -ffast-math -march=native -mtune=native -lm
QuantLib 1.35-dev, Size: XXS (tasks/s, More Is Better): a: 13.43, b: 13.43, c: 13.49. (CXX) g++ options: -O3 -march=native -fPIE -pie
Y-Cruncher 0.8.5, Pi Digits To Calculate: 1B (Seconds, Fewer Is Better): a: 18.49, b: 18.40.
Llama.cpp b4154, Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Text Generation 128 (Tokens Per Second, More Is Better): a: 6.88, b: 6.85. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas
OpenVINO GenAI 2024.5, Model: Phi-3-mini-128k-instruct-int4-ov - Device: CPU (tokens/s, More Is Better): a: 19.28, b: 19.20.
Llamafile 0.8.16, Model: TinyLlama-1.1B-Chat-v1.0.BF16 - Test: Text Generation 16 (Tokens Per Second, More Is Better): a: 24.59, b: 24.69.
Whisperfile 20Aug24, Model Size: Medium (Seconds, Fewer Is Better): a: 534.92, b: 532.81.
BYTE Unix Benchmark 5.1.3-git, Computational Test: Pipe (LPS, More Is Better): a: 48806257.1, b: 48718087.1, c: 48613927.9. (CC) gcc options: -pedantic -O3 -ffast-math -march=native -mtune=native -lm
Llamafile 0.8.16, Model: mistral-7b-instruct-v0.2.Q5_K_M - Test: Text Generation 16 (Tokens Per Second, More Is Better): a: 10.22, b: 10.26.
Blender 4.3, Blend File: BMW27 - Compute: CPU-Only (Seconds, Fewer Is Better): a: 53.55, b: 53.75.
OpenSSL, Algorithm: AES-128-GCM (byte/s, More Is Better): a: 104784522170, b: 104404347840. OpenSSL 3.0.13 30 Jan 2024 (Library: OpenSSL 3.0.13 30 Jan 2024) - Additional Parameters: -engine qatengine -async_jobs 8
OpenSSL, Algorithm: AES-256-GCM (byte/s, More Is Better): a: 97172751700, b: 96821737060. OpenSSL 3.0.13 30 Jan 2024 (Library: OpenSSL 3.0.13 30 Jan 2024) - Additional Parameters: -engine qatengine -async_jobs 8
PyPerformance 1.11, Benchmark: python_startup (Milliseconds, Fewer Is Better): a: 5.77, b: 5.79.
OSPRay 3.2, Benchmark: gravity_spheres_volume/dim_512/pathtracer/real_time (Items Per Second, More Is Better): a: 8.82093, b: 8.79096, c: 8.81199.
Whisper.cpp 1.6.2, Model: ggml-medium.en - Input: 2016 State of the Union (Seconds, Fewer Is Better): a: 700.91, b: 703.22. (CXX) g++ options: -O3 -std=c++11 -fPIC -pthread -msse3 -mssse3 -mavx -mf16c -mfma -mavx2 -mavx512f -mavx512cd -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512vnni
PyPerformance 1.11, Benchmark: asyncio_websockets (Milliseconds, Fewer Is Better): a: 315, b: 316.
OpenVINO GenAI 2024.5, Model: Falcon-7b-instruct-int4-ov - Device: CPU (tokens/s, More Is Better): a: 12.93, b: 12.97.
QuantLib 1.35-dev, Size: S (tasks/s, More Is Better): a: 12.75, b: 12.71, c: 12.72. (CXX) g++ options: -O3 -march=native -fPIE -pie
PyPerformance 1.11, Benchmark: xml_etree (Milliseconds, Fewer Is Better): a: 35.8, b: 35.7.
7-Zip Compression, Test: Compression Rating (MIPS, More Is Better): a: 163859, b: 164050, c: 164313. 7-Zip 23.01 (x64) : Copyright (c) 1999-2023 Igor Pavlov : 2023-06-20
Apache CouchDB 3.4.1, Bulk Size: 100 - Inserts: 1000 - Rounds: 30 (Seconds, Fewer Is Better): a: 69.93, b: 70.11. (CXX) g++ options: -flto -lstdc++ -shared -lei
Build2 0.17, Time To Compile (Seconds, Fewer Is Better): a: 92.05, b: 92.29.
BYTE Unix Benchmark 5.1.3-git, Computational Test: System Call (LPS, More Is Better): a: 49140426.6, b: 49062324.1, c: 49016743.6. (CC) gcc options: -pedantic -O3 -ffast-math -march=native -mtune=native -lm
Y-Cruncher 0.8.5, Pi Digits To Calculate: 500M (Seconds, Fewer Is Better): a: 8.772, b: 8.794.
Whisper.cpp 1.6.2, Model: ggml-base.en - Input: 2016 State of the Union (Seconds, Fewer Is Better): a: 87.49, b: 87.27. (CXX) g++ options: -O3 -std=c++11 -fPIC -pthread -msse3 -mssse3 -mavx -mf16c -mfma -mavx2 -mavx512f -mavx512cd -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512vnni
ONNX Runtime 1.19, Model: T5 Encoder - Device: CPU - Executor: Standard (Inferences Per Second, More Is Better): a: 156.45, b: 156.83. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
PyPerformance 1.11, Benchmark: crypto_pyaes (Milliseconds, Fewer Is Better): a: 41.7, b: 41.8.
NAMD 3.0, Input: STMV with 1,066,628 Atoms (ns/day, More Is Better): a: 0.75656, b: 0.75634, c: 0.75813.
Apache CouchDB 3.4.1, Bulk Size: 300 - Inserts: 3000 - Rounds: 30 (Seconds, Fewer Is Better): a: 367.83, b: 368.66. (CXX) g++ options: -flto -lstdc++ -shared -lei
ONNX Runtime 1.19, Model: GPT-2 - Device: CPU - Executor: Standard (Inferences Per Second, More Is Better): a: 134.60, b: 134.31. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
OpenSSL, Algorithm: ChaCha20-Poly1305 (byte/s, More Is Better): a: 92393529340, b: 92216350580. OpenSSL 3.0.13 30 Jan 2024 (Library: OpenSSL 3.0.13 30 Jan 2024) - Additional Parameters: -engine qatengine -async_jobs 8
OpenSSL, Algorithm: ChaCha20 (byte/s, More Is Better): a: 130588495050, b: 130359884190. OpenSSL 3.0.13 30 Jan 2024 (Library: OpenSSL 3.0.13 30 Jan 2024) - Additional Parameters: -engine qatengine -async_jobs 8
BYTE Unix Benchmark 5.1.3-git, Computational Test: Whetstone Double (MWIPS, More Is Better): a: 343491.9, b: 343113.0, c: 343187.0. (CC) gcc options: -pedantic -O3 -ffast-math -march=native -mtune=native -lm
ONNX Runtime 1.19, Model: super-resolution-10 - Device: CPU - Executor: Standard (Inferences Per Second, More Is Better): a: 141.12, b: 141.00. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
Blender 4.3, Blend File: Pabellon Barcelona - Compute: CPU-Only (Seconds, Fewer Is Better): a: 166.12, b: 166.25.
ASTC Encoder 5.0, Preset: Fast (MT/s, More Is Better): a: 396.65, b: 396.43. (CXX) g++ options: -O3 -flto -pthread
Rustls 0.23.17, Benchmark: handshake-resume - Suite: TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384 (handshakes/s, More Is Better): a: 1820810.21, b: 1821261.88. (CC) gcc options: -m64 -lgcc_s -lutil -lrt -lpthread -lm -ldl -lc -pie -nodefaultlibs
Apache Cassandra 5.0, Test: Writes (Op/s, More Is Better): a: 271333, b: 271373.
Llamafile 0.8.16, Model: wizardcoder-python-34b-v1.0.Q6_K - Test: Prompt Processing 2048 (Tokens Per Second, More Is Better): a: 12288, b: 12288.
Llamafile 0.8.16, Model: wizardcoder-python-34b-v1.0.Q6_K - Test: Prompt Processing 1024 (Tokens Per Second, More Is Better): a: 6144, b: 6144.
Llamafile 0.8.16, Model: wizardcoder-python-34b-v1.0.Q6_K - Test: Prompt Processing 512 (Tokens Per Second, More Is Better): a: 3072, b: 3072.
Llamafile 0.8.16, Model: wizardcoder-python-34b-v1.0.Q6_K - Test: Prompt Processing 256 (Tokens Per Second, More Is Better): a: 1536, b: 1536.
Llamafile 0.8.16, Model: mistral-7b-instruct-v0.2.Q5_K_M - Test: Prompt Processing 2048 (Tokens Per Second, More Is Better): a: 32768, b: 32768.
Llamafile 0.8.16, Model: mistral-7b-instruct-v0.2.Q5_K_M - Test: Prompt Processing 1024 (Tokens Per Second, More Is Better): a: 16384, b: 16384.
Llamafile 0.8.16, Model: mistral-7b-instruct-v0.2.Q5_K_M - Test: Prompt Processing 512 (Tokens Per Second, More Is Better): a: 8192, b: 8192.
Llamafile 0.8.16, Model: mistral-7b-instruct-v0.2.Q5_K_M - Test: Prompt Processing 256 (Tokens Per Second, More Is Better): a: 4096, b: 4096.
Llamafile 0.8.16, Model: TinyLlama-1.1B-Chat-v1.0.BF16 - Test: Prompt Processing 2048 (Tokens Per Second, More Is Better): a: 32768, b: 32768.
Llamafile 0.8.16, Model: TinyLlama-1.1B-Chat-v1.0.BF16 - Test: Prompt Processing 1024 (Tokens Per Second, More Is Better): a: 16384, b: 16384.
Llamafile 0.8.16, Model: TinyLlama-1.1B-Chat-v1.0.BF16 - Test: Prompt Processing 512 (Tokens Per Second, More Is Better): a: 8192, b: 8192.
Llamafile 0.8.16, Model: TinyLlama-1.1B-Chat-v1.0.BF16 - Test: Prompt Processing 256 (Tokens Per Second, More Is Better): a: 4096, b: 4096.
Llamafile 0.8.16, Model: Llama-3.2-3B-Instruct.Q6_K - Test: Prompt Processing 2048 (Tokens Per Second, More Is Better): a: 32768, b: 32768.
Llamafile 0.8.16, Model: Llama-3.2-3B-Instruct.Q6_K - Test: Prompt Processing 1024 (Tokens Per Second, More Is Better): a: 16384, b: 16384.
Llamafile 0.8.16, Model: Llama-3.2-3B-Instruct.Q6_K - Test: Prompt Processing 512 (Tokens Per Second, More Is Better): a: 8192, b: 8192.
Llamafile 0.8.16, Model: Llama-3.2-3B-Instruct.Q6_K - Test: Prompt Processing 256 (Tokens Per Second, More Is Better): a: 4096, b: 4096.
PyPerformance 1.11 - Benchmark: json_loads - Milliseconds, Fewer Is Better - a: 12.1, b: 12.1
OpenVINO GenAI 2024.5 - Model: Phi-3-mini-128k-instruct-int4-ov - Device: CPU - Time Per Output Token - ms, Fewer Is Better - a: 51.86, b: 52.10
OpenVINO GenAI 2024.5 - Model: Phi-3-mini-128k-instruct-int4-ov - Device: CPU - Time To First Token - ms, Fewer Is Better - a: 55.93, b: 56.26
OpenVINO GenAI 2024.5 - Model: Falcon-7b-instruct-int4-ov - Device: CPU - Time Per Output Token - ms, Fewer Is Better - a: 77.34, b: 77.13
OpenVINO GenAI 2024.5 - Model: Falcon-7b-instruct-int4-ov - Device: CPU - Time To First Token - ms, Fewer Is Better - a: 86.06, b: 84.39
OpenVINO GenAI 2024.5 - Model: Gemma-7b-int4-ov - Device: CPU - Time Per Output Token - ms, Fewer Is Better - a: 101.72, b: 100.94
OpenVINO GenAI 2024.5 - Model: Gemma-7b-int4-ov - Device: CPU - Time To First Token - ms, Fewer Is Better - a: 106.62, b: 107.03
ONNX Runtime 1.19 - Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard - Inference Time Cost (ms), Fewer Is Better - a: 21.24, b: 20.95
ONNX Runtime 1.19 - Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Standard - Inference Time Cost (ms), Fewer Is Better - a: 648.52, b: 657.42
ONNX Runtime 1.19 - Model: super-resolution-10 - Device: CPU - Executor: Standard - Inference Time Cost (ms), Fewer Is Better - a: 7.08601, b: 7.09172
ONNX Runtime 1.19 - Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard - Inference Time Cost (ms), Fewer Is Better - a: 2.55898, b: 2.58589
ONNX Runtime 1.19 - Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard - Inference Time Cost (ms), Fewer Is Better - a: 23.55, b: 23.83
ONNX Runtime 1.19 - Model: fcn-resnet101-11 - Device: CPU - Executor: Standard - Inference Time Cost (ms), Fewer Is Better - a: 310.88, b: 315.41
ONNX Runtime 1.19 - Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard - Inference Time Cost (ms), Fewer Is Better - a: 1.57084, b: 1.58746
ONNX Runtime 1.19 - Model: bertsquad-12 - Device: CPU - Executor: Standard - Inference Time Cost (ms), Fewer Is Better - a: 64.14, b: 65.28
ONNX Runtime 1.19 - Model: T5 Encoder - Device: CPU - Executor: Standard - Inference Time Cost (ms), Fewer Is Better - a: 6.39112, b: 6.37566
ONNX Runtime 1.19 - Model: ZFNet-512 - Device: CPU - Executor: Standard - Inference Time Cost (ms), Fewer Is Better - a: 9.76985, b: 9.62590
ONNX Runtime 1.19 - Model: yolov4 - Device: CPU - Executor: Standard - Inference Time Cost (ms), Fewer Is Better - a: 90.45, b: 91.24
ONNX Runtime 1.19 - Model: GPT-2 - Device: CPU - Executor: Standard - Inference Time Cost (ms), Fewer Is Better - a: 7.42776, b: 7.44340
1. (CXX) g++ options (all ONNX Runtime results): -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
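The a and b columns above are separate runs of the same test, so the size of the run-to-run difference is easier to read as a percentage. The following is a minimal sketch, not part of the Phoronix Test Suite: the helper names `percent_change` and `summarize` are hypothetical, and the four sample entries are copied from the ONNX Runtime 1.19 results above (inference time in ms, where fewer is better, so a negative change means run b was faster).

```python
# Inference time cost in ms (fewer is better), copied from the
# ONNX Runtime 1.19 results above as (run a, run b) pairs.
onnx_ms = {
    "Faster R-CNN R-50-FPN-int8": (21.24, 20.95),
    "ResNet101_DUC_HDC-12": (648.52, 657.42),
    "ZFNet-512": (9.76985, 9.62590),
    "bertsquad-12": (64.14, 65.28),
}


def percent_change(a, b):
    """Return the change from run a to run b as a percentage of run a."""
    return (b - a) / a * 100.0


def summarize(results):
    """Return (model, percent change) pairs in the order they were given."""
    return [(model, percent_change(a, b)) for model, (a, b) in results.items()]


if __name__ == "__main__":
    for model, delta in summarize(onnx_ms):
        # For a fewer-is-better metric, a negative delta means b was faster.
        print(f"{model}: {delta:+.2f}%")
```

For a more-is-better metric such as the Llamafile tokens-per-second results, the same function applies, but a positive change would then mean run b was faster.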
Phoronix Test Suite v10.8.5