eoy2024 Benchmarks for a future article. AMD EPYC 4484PX 12-Core testing with a Supermicro AS-3015A-I H13SAE-MF v1.00 (2.1 BIOS) and ASPEED on Ubuntu 24.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2412083-NE-EOY20246055&rdt&grs .
eoy2024 System Details

Configuration "a" (AMD EPYC 4564P):
Processor: AMD EPYC 4564P 16-Core @ 5.88GHz (16 Cores / 32 Threads)
Motherboard: Supermicro AS-3015A-I H13SAE-MF v1.00 (2.1 BIOS)
Chipset: AMD Device 14d8
Memory: 2 x 32GB DRAM-4800MT/s Micron MTC20C2085S1EC48BA1 BC
Disk: 3201GB Micron_7450_MTFDKCC3T2TFS + 960GB SAMSUNG MZ1L2960HCJR-00A07
Graphics: ASPEED
Audio: AMD Rembrandt Radeon HD Audio
Monitor: VA2431
Network: 2 x Intel I210
OS: Ubuntu 24.04
Kernel: 6.8.0-11-generic (x86_64)
Desktop: GNOME Shell 45.3
Display Server: X Server 1.21.1.11
Compiler: GCC 13.2.0
File-System: ext4
Screen Resolution: 1024x768

Configurations "4484PX" and "px" (differences from "a"):
Processor: AMD EPYC 4484PX 12-Core @ 5.66GHz (12 Cores / 24 Threads)
Kernel: 6.12.2-061202-generic (x86_64)

Kernel Details - Transparent Huge Pages: madvise

Compiler Details - --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-fxIygj/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-fxIygj/gcc-13-13.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v

Processor Details
- a: Scaling Governor: amd-pstate-epp performance (EPP: performance) - CPU Microcode: 0xa601209
- 4484PX: Scaling Governor: amd-pstate-epp performance (Boost: Enabled EPP: performance) - CPU Microcode: 0xa601209
- px: Scaling Governor: amd-pstate-epp performance (Boost: Enabled EPP: performance) - CPU Microcode: 0xa601209

Java Details - OpenJDK Runtime Environment (build 21.0.2+13-Ubuntu-2)

Python Details - Python 3.12.3

Security Details
- a: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_rstack_overflow: Mitigation of Safe RET + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; STIBP: always-on; RSB filling; PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
- 4484PX: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + reg_file_data_sampling: Not affected + retbleed: Not affected + spec_rstack_overflow: Mitigation of Safe RET + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; STIBP: always-on; RSB filling; PBRSB-eIBRS: Not affected; BHI: Not affected + srbds: Not affected + tsx_async_abort: Not affected
- px: identical to 4484PX
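The per-configuration Security Details entries are single strings with mitigation fields joined by " + ". A minimal sketch of splitting such a string into a dict makes it easy to diff what two kernels report; `parse_security_details` is a hypothetical helper for illustration, not part of the Phoronix Test Suite:

```python
# Hypothetical helper (not part of the Phoronix Test Suite) that splits a
# PTS "Security Details" string like
#   "itlb_multihit: Not affected + l1tf: Not affected + ..."
# into a {field: status} dict so two configurations can be diffed.

def parse_security_details(blob):
    entries = {}
    for item in blob.split(" + "):
        # partition on the first ": " so values containing colons survive
        key, _, value = item.partition(": ")
        entries[key.strip()] = value.strip()
    return entries

# Abbreviated samples from the "a" (kernel 6.8) and "px" (kernel 6.12) rows:
a = parse_security_details(
    "gather_data_sampling: Not affected + itlb_multihit: Not affected + "
    "spec_rstack_overflow: Mitigation of Safe RET")
px = parse_security_details(
    "gather_data_sampling: Not affected + itlb_multihit: Not affected + "
    "reg_file_data_sampling: Not affected + "
    "spec_rstack_overflow: Mitigation of Safe RET")

# Fields reported for px but not for a (newer kernels report more fields):
print(sorted(set(px) - set(a)))  # → ['reg_file_data_sampling']
```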
[Composite results matrix: the export's side-by-side table of every test result for configurations a, 4484PX, and px, flattened beyond recovery in this text export. The same results are presented per-test in the entries that follow.]
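With results spanning both "More Is Better" and "Fewer Is Better" metrics, one rough way to summarize the gap between the 16-core 4564P ("a") and the 12-core 4484PX is to normalize each comparison into a speedup ratio and take the geometric mean. The sketch below is purely illustrative, using three hand-picked results quoted in the per-test entries (not the article's methodology, and far too small a sample to be conclusive):

```python
import math

# (a, 4484PX, higher_is_better) triples taken from the per-test results:
results = [
    (16936.00, 8057.56, False),   # LiteRT NASNet Mobile, microseconds
    (271333, 174960, True),       # Apache Cassandra Writes, op/s
    (53.55, 74.08, False),        # Blender BMW27, seconds
]

# Ratio of "a" over "4484PX", inverted for time-based results so that
# a value > 1 always means "a" was faster on that test.
ratios = [(x / y) if hib else (y / x) for x, y, hib in results]

# Geometric mean of the per-test ratios.
geomean = math.prod(ratios) ** (1 / len(ratios))
print(round(geomean, 2))
```

On this tiny sample the two chips land close to parity because the NASNet Mobile outlier (where "a" is much slower) offsets its wins elsewhere; a real summary would use the full result set.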
LiteRT 2024-10-15 - Model: NASNet Mobile (Microseconds, Fewer Is Better): a: 16936.00; 4484PX: 8057.56; px: 7931.64
oneDNN 3.6 - Harness: IP Shapes 1D - Engine: CPU (ms, Fewer Is Better): a: 1.12573; 4484PX: 1.93806; px: 1.93913 (MIN: 1.03 / 1.92 / 1.91) | 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl
oneDNN 3.6 - Harness: Convolution Batch Shapes Auto - Engine: CPU (ms, Fewer Is Better): a: 6.67287; 4484PX: 4.11551; px: 4.13321 (MIN: 6.2 / 4.05 / 4.07) | 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl
BYTE Unix Benchmark 5.1.3-git - Computational Test: System Call (LPS, More Is Better): a: 49140426.6; 4484PX: 30761218.9; px: 30701622.8 | 1. (CC) gcc options: -pedantic -O3 -ffast-math -march=native -mtune=native -lm
Apache Cassandra 5.0 - Test: Writes (Op/s, More Is Better): a: 271333; 4484PX: 174960; px: 173946
Llama.cpp b4154 - Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 1024 (Tokens Per Second, More Is Better): a: 355.09; 4484PX: 232.26; px: 244.77 | 1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas
LiteRT 2024-10-15 - Model: DeepLab V3 (Microseconds, Fewer Is Better): a: 3579.67; 4484PX: 2343.38; px: 2359.99
LiteRT 2024-10-15 - Model: Quantized COCO SSD MobileNet v1 (Microseconds, Fewer Is Better): a: 2129.52; 4484PX: 1420.15; px: 1417.35
oneDNN 3.6 - Harness: IP Shapes 3D - Engine: CPU (ms, Fewer Is Better): a: 4.05800; 4484PX: 2.73072; px: 2.72942 (MIN: 3.75 / 2.7 / 2.7) | 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl
ONNX Runtime 1.19 - Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard (Inferences Per Second, More Is Better): a: 636.32; 4484PX: 941.40; px: 937.78 | 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
BYTE Unix Benchmark 5.1.3-git - Computational Test: Pipe (LPS, More Is Better): a: 48806257.1; 4484PX: 33443359.2; px: 33381363.1 | 1. (CC) gcc options: -pedantic -O3 -ffast-math -march=native -mtune=native -lm
oneDNN 3.6 - Harness: Deconvolution Batch shapes_3d - Engine: CPU (ms, Fewer Is Better): a: 2.41294; 4484PX: 3.50840; px: 3.51243 (MIN: 2.34 / 3.46 / 3.47) | 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl
Primesieve 12.6 - Length: 1e12 (Seconds, Fewer Is Better): a: 6.347; 4484PX: 9.116; px: 9.147 | 1. (CXX) g++ options: -O3
ASTC Encoder 5.0 - Preset: Thorough (MT/s, More Is Better): a: 20.30; 4484PX: 14.17; px: 14.15 | 1. (CXX) g++ options: -O3 -flto -pthread
ASTC Encoder 5.0 - Preset: Medium (MT/s, More Is Better): a: 156.22; 4484PX: 109.03; px: 108.86 | 1. (CXX) g++ options: -O3 -flto -pthread
ASTC Encoder 5.0 - Preset: Fast (MT/s, More Is Better): a: 396.65; 4484PX: 278.24; px: 277.30 | 1. (CXX) g++ options: -O3 -flto -pthread
ASTC Encoder 5.0 - Preset: Exhaustive (MT/s, More Is Better): a: 1.6844; 4484PX: 1.1887; px: 1.1862 | 1. (CXX) g++ options: -O3 -flto -pthread
ASTC Encoder 5.0 - Preset: Very Thorough (MT/s, More Is Better): a: 2.7410; 4484PX: 1.9412; px: 1.9391 | 1. (CXX) g++ options: -O3 -flto -pthread
Primesieve 12.6 - Length: 1e13 (Seconds, Fewer Is Better): a: 78.50; 4484PX: 110.61; px: 110.71 | 1. (CXX) g++ options: -O3
Etcpak 2.0 - Benchmark: Multi-Threaded - Configuration: ETC2 (Mpx/s, More Is Better): a: 577.82; 4484PX: 410.73; px: 409.88 | 1. (CXX) g++ options: -flto -pthread
BYTE Unix Benchmark 5.1.3-git - Computational Test: Whetstone Double (MWIPS, More Is Better): a: 343491.9; 4484PX: 244075.3; px: 244131.0 | 1. (CC) gcc options: -pedantic -O3 -ffast-math -march=native -mtune=native -lm
Llama.cpp b4154 - Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 512 (Tokens Per Second, More Is Better): a: 327.30; 4484PX: 243.14; px: 232.86 | 1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas
OSPRay 3.2 - Benchmark: particle_volume/scivis/real_time (Items Per Second, More Is Better): a: 8.98486; 4484PX: 6.44913; px: 6.52304
Rustls 0.23.17 - Benchmark: handshake - Suite: TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384 (handshakes/s, More Is Better): a: 423535.68; 4484PX: 306153.20; px: 304060.28 | 1. (CC) gcc options: -m64 -lgcc_s -lutil -lrt -lpthread -lm -ldl -lc -pie -nodefaultlibs
BYTE Unix Benchmark 5.1.3-git - Computational Test: Dhrystone 2 (LPS, More Is Better): a: 1866536062.7; 4484PX: 1346521770.3; px: 1340340196.6 | 1. (CC) gcc options: -pedantic -O3 -ffast-math -march=native -mtune=native -lm
oneDNN 3.6 - Harness: Recurrent Neural Network Training - Engine: CPU (ms, Fewer Is Better): a: 1372.03; 4484PX: 1898.36; px: 1895.68 (MIN: 1342.06 / 1894.26 / 1892.59) | 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl
Blender 4.3 - Blend File: BMW27 - Compute: CPU-Only (Seconds, Fewer Is Better): a: 53.55; 4484PX: 74.08; px: 73.16
OSPRay 3.2 - Benchmark: particle_volume/ao/real_time (Items Per Second, More Is Better): a: 9.00917; 4484PX: 6.52776; px: 6.52206
Stockfish Chess Benchmark (Nodes Per Second, More Is Better): a: 46507038; 4484PX: 33702298; px: 33871595 | 1. Stockfish 16 by the Stockfish developers (see AUTHORS file)
oneDNN 3.6 - Harness: Recurrent Neural Network Inference - Engine: CPU (ms, Fewer Is Better): a: 700.86; 4484PX: 965.02; px: 966.01 (MIN: 679.89 / 963.27 / 963.43) | 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl
Blender 4.3 - Blend File: Classroom - Compute: CPU-Only (Seconds, Fewer Is Better): a: 143.36; 4484PX: 197.20; px: 197.53
OSPRay 3.2 - Benchmark: gravity_spheres_volume/dim_512/pathtracer/real_time (Items Per Second, More Is Better): a: 8.82093; 4484PX: 6.41198; px: 6.40740
OpenSSL - Algorithm: AES-128-GCM (byte/s, More Is Better): a: 104784522170; 4484PX: 76496336760; px: 76184405610 | 1. OpenSSL 3.0.13 30 Jan 2024 (Library: OpenSSL 3.0.13 30 Jan 2024) - Additional Parameters: -engine qatengine -async_jobs 8
OpenSSL - Algorithm: AES-256-GCM (byte/s, More Is Better): a: 97172751700; 4484PX: 71160291870; px: 70902656480 | 1. OpenSSL 3.0.13 30 Jan 2024 (Library: OpenSSL 3.0.13 30 Jan 2024) - Additional Parameters: -engine qatengine -async_jobs 8
OSPRay 3.2 - Benchmark: gravity_spheres_volume/dim_512/scivis/real_time (Items Per Second, More Is Better): a: 7.58789; 4484PX: 5.54888; px: 5.61470
POV-Ray - Trace Time (Seconds, Fewer Is Better): a: 18.54; 4484PX: 25.26; px: 25.33 | 1. POV-Ray 3.7.0.10.unofficial
Blender 4.3 - Blend File: Pabellon Barcelona - Compute: CPU-Only (Seconds, Fewer Is Better): a: 166.12; 4484PX: 226.34; px: 224.64
Blender 4.3 - Blend File: Fishy Cat - Compute: CPU-Only (Seconds, Fewer Is Better): a: 71.35; 4484PX: 96.67; px: 97.09
Rustls 0.23.17 - Benchmark: handshake - Suite: TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (handshakes/s, More Is Better): a: 80462.60; 4484PX: 59308.75; px: 59206.34 | 1. (CC) gcc options: -m64 -lgcc_s -lutil -lrt -lpthread -lm -ldl -lc -pie -nodefaultlibs
OSPRay 3.2 - Benchmark: gravity_spheres_volume/dim_512/ao/real_time (Items Per Second, More Is Better): a: 7.63944; 4484PX: 5.63122; px: 5.71084
ACES DGEMM 1.0 - Sustained Floating-Point Rate (GFLOP/s, More Is Better): a: 1141.19; 4484PX: 842.73; px: 842.01 | 1. (CC) gcc options: -ffast-math -mavx2 -O3 -fopenmp -lopenblas
OpenSSL - Algorithm: ChaCha20 (byte/s, More Is Better): a: 130588495050; 4484PX: 97105235690; px: 97019897450 | 1. OpenSSL 3.0.13 30 Jan 2024 (Library: OpenSSL 3.0.13 30 Jan 2024) - Additional Parameters: -engine qatengine -async_jobs 8
OpenSSL - Algorithm: ChaCha20-Poly1305 (byte/s, More Is Better): a: 92393529340; 4484PX: 68816544020; px: 68678955550 | 1. OpenSSL 3.0.13 30 Jan 2024 (Library: OpenSSL 3.0.13 30 Jan 2024) - Additional Parameters: -engine qatengine -async_jobs 8
Blender 4.3 - Blend File: Barbershop - Compute: CPU-Only (Seconds, Fewer Is Better): a: 506.20; 4484PX: 679.34; px: 678.40
Llama.cpp b4154 - Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 2048 (Tokens Per Second, More Is Better): a: 279.04; 4484PX: 222.75; px: 208.99 | 1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas
ONNX Runtime 1.19 - Model: T5 Encoder - Device: CPU - Executor: Standard (Inferences Per Second, More Is Better): a: 156.45; 4484PX: 208.17; px: 206.09 | 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
Rustls 0.23.17 - Benchmark: handshake - Suite: TLS13_CHACHA20_POLY1305_SHA256 (handshakes/s, More Is Better): a: 76454.45; 4484PX: 57716.64; px: 57688.08 | 1. (CC) gcc options: -m64 -lgcc_s -lutil -lrt -lpthread -lm -ldl -lc -pie -nodefaultlibs
7-Zip Compression - Test: Decompression Rating (MIPS, More Is Better): a: 165916; 4484PX: 125698; px: 125605 | 1. 7-Zip 23.01 (x64) : Copyright (c) 1999-2023 Igor Pavlov : 2023-06-20
Blender 4.3 - Blend File: Junkshop - Compute: CPU-Only (Seconds, Fewer Is Better): a: 73.56; 4484PX: 97.01; px: 97.10
ONNX Runtime 1.19 - Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Standard (Inferences Per Second, More Is Better): a: 1.54196; 4484PX: 1.17627; px: 1.17050 | 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
RELION 5.0 - Test: Basic - Device: CPU (Seconds, Fewer Is Better): a: 944.27; 4484PX: 729.40; px: 733.02 | 1. (CXX) g++ options: -fPIC -std=c++14 -fopenmp -O3 -rdynamic -lfftw3f -lfftw3 -ldl -ltiff -lpng -ljpeg -lmpi_cxx -lmpi
Stockfish 17 - Chess Benchmark (Nodes Per Second, More Is Better): a: 54752796; 4484PX: 45267546; px: 42973396 | 1. (CXX) g++ options: -lgcov -m64 -lpthread -fno-exceptions -std=c++17 -fno-peel-loops -fno-tracer -pedantic -O3 -funroll-loops -msse -msse3 -mpopcnt -mavx2 -mbmi -mavx512f -mavx512bw -mavx512vnni -mavx512dq -mavx512vl -msse4.1 -mssse3 -msse2 -mbmi2 -flto -flto-partition=one -flto=jobserver
Renaissance 0.16 - Test: Genetic Algorithm Using Jenetics + Futures (ms, Fewer Is Better): a: 732.8 (MIN: 713.67 / MAX: 813.49); 4484PX: 904.0 (MIN: 886.83 / MAX: 919.31); px: 920.7 (MIN: 888.75 / MAX: 934.44)
SVT-AV1 2.3 - Encoder Mode: Preset 3 - Input: Bosphorus 4K (Frames Per Second, More Is Better): a: 9.590; 4484PX: 7.684; px: 7.646 | 1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
Build2 0.17 - Time To Compile (Seconds, Fewer Is Better): a: 92.05; 4484PX: 111.65; px: 113.78
XNNPACK Model: FP16MobileNetV1 OpenBenchmarking.org us, Fewer Is Better XNNPACK b7b048 Model: FP16MobileNetV1 a 4484PX px 300 600 900 1200 1500 1143 1383 1386 1. (CXX) g++ options: -O3 -lrt -lm
XNNPACK Model: FP32MobileNetV3Small OpenBenchmarking.org us, Fewer Is Better XNNPACK b7b048 Model: FP32MobileNetV3Small a 4484PX px 200 400 600 800 1000 979 809 837 1. (CXX) g++ options: -O3 -lrt -lm
simdjson Throughput Test: PartialTweets OpenBenchmarking.org GB/s, More Is Better simdjson 3.10 Throughput Test: PartialTweets a 4484PX px 3 6 9 12 15 9.76 10.10 8.35 1. (CXX) g++ options: -O3 -lrt
x265 Video Input: Bosphorus 4K OpenBenchmarking.org Frames Per Second, More Is Better x265 Video Input: Bosphorus 4K a 4484PX px 8 16 24 32 40 32.57 27.16 26.94 1. x265 [info]: HEVC encoder version 3.5+1-f0c1022b6
SVT-AV1 Encoder Mode: Preset 3 - Input: Beauty 4K 10-bit OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 2.3 Encoder Mode: Preset 3 - Input: Beauty 4K 10-bit a 4484PX px 0.32 0.64 0.96 1.28 1.6 1.422 1.188 1.184 1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
SVT-AV1 Encoder Mode: Preset 8 - Input: Bosphorus 4K OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 2.3 Encoder Mode: Preset 8 - Input: Bosphorus 4K a 4484PX px 20 40 60 80 100 102.01 85.20 85.00 1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
simdjson Throughput Test: DistinctUserID OpenBenchmarking.org GB/s, More Is Better simdjson 3.10 Throughput Test: DistinctUserID a 4484PX px 3 6 9 12 15 10.46 10.76 8.97 1. (CXX) g++ options: -O3 -lrt
SVT-AV1 Encoder Mode: Preset 5 - Input: Bosphorus 4K OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 2.3 Encoder Mode: Preset 5 - Input: Bosphorus 4K a 4484PX px 8 16 24 32 40 34.54 29.09 28.82 1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
OSPRay Benchmark: particle_volume/pathtracer/real_time OpenBenchmarking.org Items Per Second, More Is Better OSPRay 3.2 Benchmark: particle_volume/pathtracer/real_time a 4484PX px 50 100 150 200 250 236.25 199.02 197.20
XNNPACK Model: FP32MobileNetV3Large OpenBenchmarking.org us, Fewer Is Better XNNPACK b7b048 Model: FP32MobileNetV3Large a 4484PX px 400 800 1200 1600 2000 1810 1515 1574 1. (CXX) g++ options: -O3 -lrt -lm
NAMD Input: ATPase with 327,506 Atoms OpenBenchmarking.org ns/day, More Is Better NAMD 3.0 Input: ATPase with 327,506 Atoms a 4484PX px 0.6292 1.2584 1.8876 2.5168 3.146 2.79632 2.38124 2.35379
ONNX Runtime Model: GPT-2 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: GPT-2 - Device: CPU - Executor: Standard a 4484PX px 40 80 120 160 200 134.60 159.71 157.89 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
SVT-AV1 Encoder Mode: Preset 8 - Input: Bosphorus 1080p OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 2.3 Encoder Mode: Preset 8 - Input: Bosphorus 1080p a 4484PX px 70 140 210 280 350 339.02 287.05 286.96 1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
XNNPACK Model: FP16MobileNetV3Small OpenBenchmarking.org us, Fewer Is Better XNNPACK b7b048 Model: FP16MobileNetV3Small a 4484PX px 200 400 600 800 1000 920 779 798 1. (CXX) g++ options: -O3 -lrt -lm
Rustls Benchmark: handshake-ticket - Suite: TLS13_CHACHA20_POLY1305_SHA256 OpenBenchmarking.org handshakes/s, More Is Better Rustls 0.23.17 Benchmark: handshake-ticket - Suite: TLS13_CHACHA20_POLY1305_SHA256 a 4484PX px 90K 180K 270K 360K 450K 404263.45 344296.24 342775.29 1. (CC) gcc options: -m64 -lgcc_s -lutil -lrt -lpthread -lm -ldl -lc -pie -nodefaultlibs
XNNPACK Model: QS8MobileNetV2 OpenBenchmarking.org us, Fewer Is Better XNNPACK b7b048 Model: QS8MobileNetV2 a 4484PX px 200 400 600 800 1000 844 717 723 1. (CXX) g++ options: -O3 -lrt -lm
Rustls Benchmark: handshake-resume - Suite: TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 OpenBenchmarking.org handshakes/s, More Is Better Rustls 0.23.17 Benchmark: handshake-resume - Suite: TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 a 4484PX px 800K 1600K 2400K 3200K 4000K 3563852.57 3035330.21 3038723.48 1. (CC) gcc options: -m64 -lgcc_s -lutil -lrt -lpthread -lm -ldl -lc -pie -nodefaultlibs
ONNX Runtime Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard a 4484PX px 11 22 33 44 55 47.07 40.09 43.36 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
SVT-AV1 Encoder Mode: Preset 5 - Input: Beauty 4K 10-bit OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 2.3 Encoder Mode: Preset 5 - Input: Beauty 4K 10-bit a 4484PX px 2 4 6 8 10 6.504 5.602 5.551 1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
Rustls Benchmark: handshake-ticket - Suite: TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384 OpenBenchmarking.org handshakes/s, More Is Better Rustls 0.23.17 Benchmark: handshake-ticket - Suite: TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384 a 4484PX px 300K 600K 900K 1200K 1500K 1553632.14 1329363.10 1340712.85 1. (CC) gcc options: -m64 -lgcc_s -lutil -lrt -lpthread -lm -ldl -lc -pie -nodefaultlibs
Whisperfile Model Size: Small OpenBenchmarking.org Seconds, Fewer Is Better Whisperfile 20Aug24 Model Size: Small a 4484PX px 40 80 120 160 200 195.42 173.38 167.89
Rustls Benchmark: handshake-resume - Suite: TLS13_CHACHA20_POLY1305_SHA256 OpenBenchmarking.org handshakes/s, More Is Better Rustls 0.23.17 Benchmark: handshake-resume - Suite: TLS13_CHACHA20_POLY1305_SHA256 a 4484PX px 80K 160K 240K 320K 400K 388077.69 333882.92 333574.30 1. (CC) gcc options: -m64 -lgcc_s -lutil -lrt -lpthread -lm -ldl -lc -pie -nodefaultlibs
SVT-AV1 Encoder Mode: Preset 3 - Input: Bosphorus 1080p OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 2.3 Encoder Mode: Preset 3 - Input: Bosphorus 1080p a 4484PX px 7 14 21 28 35 29.57 25.45 25.45 1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
NAMD Input: STMV with 1,066,628 Atoms OpenBenchmarking.org ns/day, More Is Better NAMD 3.0 Input: STMV with 1,066,628 Atoms a 4484PX px 0.1702 0.3404 0.5106 0.6808 0.851 0.75656 0.65119 0.65448
7-Zip Compression Test: Compression Rating OpenBenchmarking.org MIPS, More Is Better 7-Zip Compression Test: Compression Rating a 4484PX px 40K 80K 120K 160K 200K 163859 141263 142213 1. 7-Zip 23.01 (x64) : Copyright (c) 1999-2023 Igor Pavlov : 2023-06-20
Rustls Benchmark: handshake-resume - Suite: TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384 OpenBenchmarking.org handshakes/s, More Is Better Rustls 0.23.17 Benchmark: handshake-resume - Suite: TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384 a 4484PX px 400K 800K 1200K 1600K 2000K 1820810.21 1586292.42 1572010.68 1. (CC) gcc options: -m64 -lgcc_s -lutil -lrt -lpthread -lm -ldl -lc -pie -nodefaultlibs
Whisper.cpp Model: ggml-medium.en - Input: 2016 State of the Union OpenBenchmarking.org Seconds, Fewer Is Better Whisper.cpp 1.6.2 Model: ggml-medium.en - Input: 2016 State of the Union a 4484PX px 200 400 600 800 1000 700.91 809.79 809.49 1. (CXX) g++ options: -O3 -std=c++11 -fPIC -pthread -msse3 -mssse3 -mavx -mf16c -mfma -mavx2 -mavx512f -mavx512cd -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512vnni
SVT-AV1 Encoder Mode: Preset 5 - Input: Bosphorus 1080p OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 2.3 Encoder Mode: Preset 5 - Input: Bosphorus 1080p a 4484PX px 20 40 60 80 100 101.97 88.42 88.27 1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
PyPerformance Benchmark: async_tree_io OpenBenchmarking.org Milliseconds, Fewer Is Better PyPerformance 1.11 Benchmark: async_tree_io a 4484PX px 160 320 480 640 800 755 666 656
ONNX Runtime Model: fcn-resnet101-11 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: fcn-resnet101-11 - Device: CPU - Executor: Standard a 4484PX px 0.7238 1.4476 2.1714 2.8952 3.619 3.21670 2.81093 2.79638 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
SVT-AV1 Encoder Mode: Preset 8 - Input: Beauty 4K 10-bit OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 2.3 Encoder Mode: Preset 8 - Input: Beauty 4K 10-bit a 4484PX px 3 6 9 12 15 12.47 10.97 10.86 1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
Timed Eigen Compilation Time To Compile OpenBenchmarking.org Seconds, Fewer Is Better Timed Eigen Compilation 3.4.0 Time To Compile a 4484PX px 15 30 45 60 75 58.66 67.36 67.08
Rustls Benchmark: handshake-ticket - Suite: TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 OpenBenchmarking.org handshakes/s, More Is Better Rustls 0.23.17 Benchmark: handshake-ticket - Suite: TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 a 4484PX px 600K 1200K 1800K 2400K 3000K 2620332.00 2282729.64 2292879.44 1. (CC) gcc options: -m64 -lgcc_s -lutil -lrt -lpthread -lm -ldl -lc -pie -nodefaultlibs
oneDNN Harness: Deconvolution Batch shapes_1d - Engine: CPU OpenBenchmarking.org ms, Fewer Is Better oneDNN 3.6 Harness: Deconvolution Batch shapes_1d - Engine: CPU a 4484PX px 0.7664 1.5328 2.2992 3.0656 3.832 2.97612 3.40293 3.40628 MIN: 2.42 MIN: 3.03 MIN: 3.03 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl
ONNX Runtime Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard a 4484PX px 10 20 30 40 50 42.45 37.38 37.10 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
Renaissance Test: Gaussian Mixture Model OpenBenchmarking.org ms, Fewer Is Better Renaissance 0.16 Test: Gaussian Mixture Model a 4484PX px 800 1600 2400 3200 4000 3399.5 3860.6 3815.2 MIN: 2471.52 MIN: 2758.89 / MAX: 3860.61 MIN: 2749.56 / MAX: 3815.24
x265 Video Input: Bosphorus 1080p OpenBenchmarking.org Frames Per Second, More Is Better x265 Video Input: Bosphorus 1080p a 4484PX px 30 60 90 120 150 114.45 101.37 101.25 1. x265 [info]: HEVC encoder version 3.5+1-f0c1022b6
Whisperfile Model Size: Medium OpenBenchmarking.org Seconds, Fewer Is Better Whisperfile 20Aug24 Model Size: Medium a 4484PX px 120 240 360 480 600 534.92 473.55 475.51
ONNX Runtime Model: super-resolution-10 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: super-resolution-10 - Device: CPU - Executor: Standard a 4484PX px 30 60 90 120 150 141.12 125.17 125.08 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
Renaissance Test: Apache Spark PageRank OpenBenchmarking.org ms, Fewer Is Better Renaissance 0.16 Test: Apache Spark PageRank a 4484PX px 500 1000 1500 2000 2500 2412.2 2138.1 2229.7 MIN: 1691.04 MIN: 1499.64 MIN: 1612.96 / MAX: 2229.74
Apache CouchDB Bulk Size: 300 - Inserts: 1000 - Rounds: 30 OpenBenchmarking.org Seconds, Fewer Is Better Apache CouchDB 3.4.1 Bulk Size: 300 - Inserts: 1000 - Rounds: 30 a 4484PX px 30 60 90 120 150 106.13 117.57 119.35 1. (CXX) g++ options: -flto -lstdc++ -shared -lei
Whisperfile Model Size: Tiny OpenBenchmarking.org Seconds, Fewer Is Better Whisperfile 20Aug24 Model Size: Tiny a 4484PX px 10 20 30 40 50 41.71 37.13 38.72
simdjson Throughput Test: Kostya OpenBenchmarking.org GB/s, More Is Better simdjson 3.10 Throughput Test: Kostya a 4484PX px 2 4 6 8 10 5.97 6.11 5.45 1. (CXX) g++ options: -O3 -lrt
Numpy Benchmark OpenBenchmarking.org Score, More Is Better Numpy Benchmark a 4484PX px 200 400 600 800 1000 775.75 745.59 831.42
Apache CouchDB Bulk Size: 500 - Inserts: 1000 - Rounds: 30 OpenBenchmarking.org Seconds, Fewer Is Better Apache CouchDB 3.4.1 Bulk Size: 500 - Inserts: 1000 - Rounds: 30 a 4484PX px 40 80 120 160 200 148.05 164.47 164.81 1. (CXX) g++ options: -flto -lstdc++ -shared -lei
Renaissance Test: Scala Dotty OpenBenchmarking.org ms, Fewer Is Better Renaissance 0.16 Test: Scala Dotty a 4484PX px 100 200 300 400 500 477.0 428.6 436.2 MIN: 371.54 / MAX: 736.5 MIN: 378.22 / MAX: 628.77 MIN: 380.62 / MAX: 721.56
Apache CouchDB Bulk Size: 300 - Inserts: 3000 - Rounds: 30 OpenBenchmarking.org Seconds, Fewer Is Better Apache CouchDB 3.4.1 Bulk Size: 300 - Inserts: 3000 - Rounds: 30 a 4484PX px 90 180 270 360 450 367.83 406.12 408.48 1. (CXX) g++ options: -flto -lstdc++ -shared -lei
QuantLib Size: XXS OpenBenchmarking.org tasks/s, More Is Better QuantLib 1.35-dev Size: XXS a 4484PX px 3 6 9 12 15 13.43 12.12 12.11 1. (CXX) g++ options: -O3 -march=native -fPIE -pie
CP2K Molecular Dynamics Input: H20-64 OpenBenchmarking.org Seconds, Fewer Is Better CP2K Molecular Dynamics 2024.3 Input: H20-64 a 4484PX px 13 26 39 52 65 58.19 53.01 52.72 1. (F9X) gfortran options: -fopenmp -march=native -mtune=native -O3 -funroll-loops -fbacktrace -ffree-form -fimplicit-none -std=f2008 -lcp2kstart -lcp2kmc -lcp2kswarm -lcp2kmotion -lcp2kthermostat -lcp2kemd -lcp2ktmc -lcp2kmain -lcp2kdbt -lcp2ktas -lcp2kgrid -lcp2kgriddgemm -lcp2kgridcpu -lcp2kgridref -lcp2kgridcommon -ldbcsrarnoldi -ldbcsrx -lcp2kdbx -lcp2kdbm -lcp2kshg_int -lcp2keri_mme -lcp2kminimax -lcp2khfxbase -lcp2ksubsys -lcp2kxc -lcp2kao -lcp2kpw_env -lcp2kinput -lcp2kpw -lcp2kgpu -lcp2kfft -lcp2kfpga -lcp2kfm -lcp2kcommon -lcp2koffload -lcp2kmpiwrap -lcp2kbase -ldbcsr -lsirius -lspla -lspfft -lsymspg -lvdwxc -l:libhdf5_fortran.a -l:libhdf5.a -lz -lgsl -lelpa_openmp -lcosma -lcosta -lscalapack -lxsmmf -lxsmm -ldl -lpthread -llibgrpp -lxcf03 -lxc -lint2 -lfftw3_mpi -lfftw3 -lfftw3_omp -lmpi_cxx -lmpi -l:libopenblas.a -lvori -lstdc++ -lmpi_usempif08 -lmpi_mpifh -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm
Renaissance Test: Akka Unbalanced Cobwebbed Tree OpenBenchmarking.org ms, Fewer Is Better Renaissance 0.16 Test: Akka Unbalanced Cobwebbed Tree a 4484PX px 900 1800 2700 3600 4500 4403.8 4038.4 4002.3 MAX: 5719.11 MIN: 4038.36 / MAX: 5089.28 MIN: 4002.27 / MAX: 4983.72
Llama.cpp Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Text Generation 128 OpenBenchmarking.org Tokens Per Second, More Is Better Llama.cpp b4154 Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Text Generation 128 a 4484PX px 12 24 36 48 60 47.72 52.30 52.37 1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas
Apache CouchDB Bulk Size: 100 - Inserts: 3000 - Rounds: 30 OpenBenchmarking.org Seconds, Fewer Is Better Apache CouchDB 3.4.1 Bulk Size: 100 - Inserts: 3000 - Rounds: 30 a 4484PX px 60 120 180 240 300 232.19 253.99 254.73 1. (CXX) g++ options: -flto -lstdc++ -shared -lei
ONNX Runtime Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard a 4484PX px 80 160 240 320 400 390.60 356.41 356.19 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
Apache CouchDB Bulk Size: 500 - Inserts: 3000 - Rounds: 30 OpenBenchmarking.org Seconds, Fewer Is Better Apache CouchDB 3.4.1 Bulk Size: 500 - Inserts: 3000 - Rounds: 30 a 4484PX px 120 240 360 480 600 511.78 559.35 560.70 1. (CXX) g++ options: -flto -lstdc++ -shared -lei
SVT-AV1 Encoder Mode: Preset 13 - Input: Bosphorus 4K OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 2.3 Encoder Mode: Preset 13 - Input: Bosphorus 4K a 4484PX px 50 100 150 200 250 212.52 198.11 194.02 1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
XNNPACK Model: FP32MobileNetV2 OpenBenchmarking.org us, Fewer Is Better XNNPACK b7b048 Model: FP32MobileNetV2 a 4484PX px 300 600 900 1200 1500 1495 1365 1368 1. (CXX) g++ options: -O3 -lrt -lm
Whisper.cpp Model: ggml-small.en - Input: 2016 State of the Union OpenBenchmarking.org Seconds, Fewer Is Better Whisper.cpp 1.6.2 Model: ggml-small.en - Input: 2016 State of the Union a 4484PX px 60 120 180 240 300 245.08 268.24 266.81 1. (CXX) g++ options: -O3 -std=c++11 -fPIC -pthread -msse3 -mssse3 -mavx -mf16c -mfma -mavx2 -mavx512f -mavx512cd -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512vnni
SVT-AV1 Encoder Mode: Preset 13 - Input: Bosphorus 1080p OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 2.3 Encoder Mode: Preset 13 - Input: Bosphorus 1080p a 4484PX px 200 400 600 800 1000 842.56 776.12 769.82 1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
Renaissance Test: Random Forest OpenBenchmarking.org ms, Fewer Is Better Renaissance 0.16 Test: Random Forest a 4484PX px 100 200 300 400 500 414.4 422.0 453.2 MIN: 322.79 / MAX: 466.1 MIN: 357.91 / MAX: 497.55 MIN: 352.31 / MAX: 513.31
PyPerformance Benchmark: asyncio_tcp_ssl OpenBenchmarking.org Milliseconds, Fewer Is Better PyPerformance 1.11 Benchmark: asyncio_tcp_ssl a 4484PX px 140 280 420 560 700 645 590 590
Apache CouchDB Bulk Size: 100 - Inserts: 1000 - Rounds: 30 OpenBenchmarking.org Seconds, Fewer Is Better Apache CouchDB 3.4.1 Bulk Size: 100 - Inserts: 1000 - Rounds: 30 a 4484PX px 20 40 60 80 100 69.93 75.90 76.39 1. (CXX) g++ options: -flto -lstdc++ -shared -lei
ONNX Runtime Model: ZFNet-512 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: ZFNet-512 - Device: CPU - Executor: Standard a 4484PX px 20 40 60 80 100 102.33 110.94 110.89 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
Renaissance Test: Apache Spark Bayes OpenBenchmarking.org ms, Fewer Is Better Renaissance 0.16 Test: Apache Spark Bayes a 4484PX px 110 220 330 440 550 490.0 513.2 474.9 MIN: 459.29 / MAX: 580.9 MIN: 453.66 / MAX: 554.7 MIN: 454.77 / MAX: 514.32
QuantLib Size: S OpenBenchmarking.org tasks/s, More Is Better QuantLib 1.35-dev Size: S a 4484PX px 3 6 9 12 15 12.75 11.86 11.84 1. (CXX) g++ options: -O3 -march=native -fPIE -pie
Renaissance Test: Finagle HTTP Requests OpenBenchmarking.org ms, Fewer Is Better Renaissance 0.16 Test: Finagle HTTP Requests a 4484PX px 500 1000 1500 2000 2500 2319.4 2492.2 2483.1 MIN: 1832.84 MIN: 1947.63 MIN: 1933.43
GROMACS Input: water_GMX50_bare OpenBenchmarking.org Ns Per Day, More Is Better GROMACS Input: water_GMX50_bare a 4484PX px 0.3807 0.7614 1.1421 1.5228 1.9035 1.692 1.577 1.575 1. GROMACS version: 2023.3-Ubuntu_2023.3_1ubuntu3
ONNX Runtime Model: bertsquad-12 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: bertsquad-12 - Device: CPU - Executor: Standard a 4484PX px 4 8 12 16 20 15.59 14.51 14.57 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
SVT-AV1 Encoder Mode: Preset 13 - Input: Beauty 4K 10-bit OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 2.3 Encoder Mode: Preset 13 - Input: Beauty 4K 10-bit a 4484PX px 5 10 15 20 25 18.59 17.41 17.36 1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
Whisper.cpp Model: ggml-base.en - Input: 2016 State of the Union OpenBenchmarking.org Seconds, Fewer Is Better Whisper.cpp 1.6.2 Model: ggml-base.en - Input: 2016 State of the Union a 4484PX px 20 40 60 80 100 87.49 92.71 93.45 1. (CXX) g++ options: -O3 -std=c++11 -fPIC -pthread -msse3 -mssse3 -mavx -mf16c -mfma -mavx2 -mavx512f -mavx512cd -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512vnni
Llama.cpp Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 1024 OpenBenchmarking.org Tokens Per Second, More Is Better Llama.cpp b4154 Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 1024 a 4484PX px 16 32 48 64 80 70.85 66.57 66.35 1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas
CP2K Molecular Dynamics Input: H20-256 OpenBenchmarking.org Seconds, Fewer Is Better CP2K Molecular Dynamics 2024.3 Input: H20-256 a 4484PX px 140 280 420 560 700 592.86 628.10 631.31 1. (F9X) gfortran options: -fopenmp -march=native -mtune=native -O3 -funroll-loops -fbacktrace -ffree-form -fimplicit-none -std=f2008 -lcp2kstart -lcp2kmc -lcp2kswarm -lcp2kmotion -lcp2kthermostat -lcp2kemd -lcp2ktmc -lcp2kmain -lcp2kdbt -lcp2ktas -lcp2kgrid -lcp2kgriddgemm -lcp2kgridcpu -lcp2kgridref -lcp2kgridcommon -ldbcsrarnoldi -ldbcsrx -lcp2kdbx -lcp2kdbm -lcp2kshg_int -lcp2keri_mme -lcp2kminimax -lcp2khfxbase -lcp2ksubsys -lcp2kxc -lcp2kao -lcp2kpw_env -lcp2kinput -lcp2kpw -lcp2kgpu -lcp2kfft -lcp2kfpga -lcp2kfm -lcp2kcommon -lcp2koffload -lcp2kmpiwrap -lcp2kbase -ldbcsr -lsirius -lspla -lspfft -lsymspg -lvdwxc -l:libhdf5_fortran.a -l:libhdf5.a -lz -lgsl -lelpa_openmp -lcosma -lcosta -lscalapack -lxsmmf -lxsmm -ldl -lpthread -llibgrpp -lxcf03 -lxc -lint2 -lfftw3_mpi -lfftw3 -lfftw3_omp -lmpi_cxx -lmpi -l:libopenblas.a -lvori -lstdc++ -lmpi_usempif08 -lmpi_mpifh -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm
LiteRT Model: Inception V4 OpenBenchmarking.org Microseconds, Fewer Is Better LiteRT 2024-10-15 Model: Inception V4 a 4484PX px 5K 10K 15K 20K 25K 21477.8 22083.3 22752.4
Llamafile Model: TinyLlama-1.1B-Chat-v1.0.BF16 - Test: Text Generation 128 OpenBenchmarking.org Tokens Per Second, More Is Better Llamafile 0.8.16 Model: TinyLlama-1.1B-Chat-v1.0.BF16 - Test: Text Generation 128 a 4484PX px 7 14 21 28 35 26.28 27.59 27.80
Renaissance Test: ALS Movie Lens OpenBenchmarking.org ms, Fewer Is Better Renaissance 0.16 Test: ALS Movie Lens a 4484PX px 2K 4K 6K 8K 10K 9805.7 9378.8 9275.7 MIN: 9253.4 / MAX: 10057.61 MIN: 8718.36 / MAX: 9413.7 MIN: 8821.09 / MAX: 9495.91
FinanceBench Benchmark: Bonds OpenMP OpenBenchmarking.org ms, Fewer Is Better FinanceBench 2016-07-25 Benchmark: Bonds OpenMP a 4484PX px 7K 14K 21K 28K 35K 33061.22 34600.77 34896.84 1. (CXX) g++ options: -O3 -march=native -fopenmp
PyPerformance Benchmark: python_startup OpenBenchmarking.org Milliseconds, Fewer Is Better PyPerformance 1.11 Benchmark: python_startup a 4484PX px 2 4 6 8 10 5.77 6.08 6.09
Llamafile Model: TinyLlama-1.1B-Chat-v1.0.BF16 - Test: Text Generation 16 OpenBenchmarking.org Tokens Per Second, More Is Better Llamafile 0.8.16 Model: TinyLlama-1.1B-Chat-v1.0.BF16 - Test: Text Generation 16 a 4484PX px 6 12 18 24 30 24.59 25.86 25.94
Gcrypt Library OpenBenchmarking.org Seconds, Fewer Is Better Gcrypt Library 1.10.3 a 4484PX px 40 80 120 160 200 162.13 171.02 163.84 1. (CC) gcc options: -O2 -fvisibility=hidden
OpenVINO GenAI Model: Phi-3-mini-128k-instruct-int4-ov - Device: CPU OpenBenchmarking.org tokens/s, More Is Better OpenVINO GenAI 2024.5 Model: Phi-3-mini-128k-instruct-int4-ov - Device: CPU a 4484PX px 5 10 15 20 25 19.28 20.28 20.29
XNNPACK Model: FP16MobileNetV2 OpenBenchmarking.org us, Fewer Is Better XNNPACK b7b048 Model: FP16MobileNetV2 a 4484PX px 300 600 900 1200 1500 1190 1217 1248 1. (CXX) g++ options: -O3 -lrt -lm
Renaissance Test: Savina Reactors.IO OpenBenchmarking.org ms, Fewer Is Better Renaissance 0.16 Test: Savina Reactors.IO a 4484PX px 800 1600 2400 3200 4000 3506.4 3655.8 3676.0 MIN: 3506.38 / MAX: 4329.37 MIN: 3655.76 / MAX: 4484.97 MAX: 4536.84
Llamafile Model: mistral-7b-instruct-v0.2.Q5_K_M - Test: Text Generation 128 OpenBenchmarking.org Tokens Per Second, More Is Better Llamafile 0.8.16 Model: mistral-7b-instruct-v0.2.Q5_K_M - Test: Text Generation 128 a 4484PX px 3 6 9 12 15 10.47 10.91 10.93
PyPerformance Benchmark: gc_collect OpenBenchmarking.org Milliseconds, Fewer Is Better PyPerformance 1.11 Benchmark: gc_collect a 4484PX px 150 300 450 600 750 677 699 706
FinanceBench Benchmark: Repo OpenMP OpenBenchmarking.org ms, Fewer Is Better FinanceBench 2016-07-25 Benchmark: Repo OpenMP a 4484PX px 5K 10K 15K 20K 25K 21418.45 22320.33 22318.74 1. (CXX) g++ options: -O3 -march=native -fopenmp
OpenVINO GenAI Model: Gemma-7b-int4-ov - Device: CPU OpenBenchmarking.org tokens/s, More Is Better OpenVINO GenAI 2024.5 Model: Gemma-7b-int4-ov - Device: CPU a 4484PX px 3 6 9 12 15 9.83 10.23 10.24
Llama.cpp Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 512 OpenBenchmarking.org Tokens Per Second, More Is Better Llama.cpp b4154 Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 512 a 4484PX px 16 32 48 64 80 70.76 69.11 67.95 1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas
Llama.cpp Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 1024 OpenBenchmarking.org Tokens Per Second, More Is Better Llama.cpp b4154 Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 1024 a 4484PX px 15 30 45 60 75 69.26 66.85 66.52 1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas
XNNPACK Model: FP16MobileNetV3Large OpenBenchmarking.org us, Fewer Is Better XNNPACK b7b048 Model: FP16MobileNetV3Large a 4484PX px 300 600 900 1200 1500 1498 1467 1527 1. (CXX) g++ options: -O3 -lrt -lm
PyPerformance Benchmark: raytrace OpenBenchmarking.org Milliseconds, Fewer Is Better PyPerformance 1.11 Benchmark: raytrace a 4484PX px 40 80 120 160 200 175 182 182
PyPerformance Benchmark: chaos OpenBenchmarking.org Milliseconds, Fewer Is Better PyPerformance 1.11 Benchmark: chaos a 4484PX px 9 18 27 36 45 38.2 39.7 39.4
PyPerformance Benchmark: regex_compile OpenBenchmarking.org Milliseconds, Fewer Is Better PyPerformance 1.11 Benchmark: regex_compile a 4484PX px 16 32 48 64 80 69.8 71.7 72.5
PyPerformance Benchmark: crypto_pyaes OpenBenchmarking.org Milliseconds, Fewer Is Better PyPerformance 1.11 Benchmark: crypto_pyaes a 4484PX px 10 20 30 40 50 41.7 43.1 43.3
OpenVINO GenAI Model: Falcon-7b-instruct-int4-ov - Device: CPU OpenBenchmarking.org tokens/s, More Is Better OpenVINO GenAI 2024.5 Model: Falcon-7b-instruct-int4-ov - Device: CPU a 4484PX px 3 6 9 12 15 12.93 13.40 13.41
Llama.cpp Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Text Generation 128 OpenBenchmarking.org Tokens Per Second, More Is Better Llama.cpp b4154 Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Text Generation 128 a 4484PX px 2 4 6 8 10 6.88 7.11 7.12 1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas
simdjson Throughput Test: TopTweet OpenBenchmarking.org GB/s, More Is Better simdjson 3.10 Throughput Test: TopTweet a 4484PX px 3 6 9 12 15 10.46 10.82 10.51 1. (CXX) g++ options: -O3 -lrt
Llamafile Model: wizardcoder-python-34b-v1.0.Q6_K - Test: Text Generation 16 OpenBenchmarking.org Tokens Per Second, More Is Better Llamafile 0.8.16 Model: wizardcoder-python-34b-v1.0.Q6_K - Test: Text Generation 16 a 4484PX px 0.414 0.828 1.242 1.656 2.07 1.78 1.83 1.84
PyPerformance Benchmark: json_loads OpenBenchmarking.org Milliseconds, Fewer Is Better PyPerformance 1.11 Benchmark: json_loads a 4484PX px 3 6 9 12 15 12.1 12.4 12.5
ONNX Runtime Model: yolov4 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: yolov4 - Device: CPU - Executor: Standard a 4484PX px 3 6 9 12 15 11.06 10.73 10.71 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
LiteRT Model: Mobilenet Quant OpenBenchmarking.org Microseconds, Fewer Is Better LiteRT 2024-10-15 Model: Mobilenet Quant a 4484PX px 200 400 600 800 1000 823.17 848.94 849.21
Llamafile Model: wizardcoder-python-34b-v1.0.Q6_K - Test: Text Generation 128 OpenBenchmarking.org Tokens Per Second, More Is Better Llamafile 0.8.16 Model: wizardcoder-python-34b-v1.0.Q6_K - Test: Text Generation 128 a 4484PX px 0.4613 0.9226 1.3839 1.8452 2.3065 1.99 2.05 2.05
CP2K Molecular Dynamics Input: Fayalite-FIST OpenBenchmarking.org Seconds, Fewer Is Better CP2K Molecular Dynamics 2024.3 Input: Fayalite-FIST a 4484PX px 20 40 60 80 100 94.03 92.21 94.90 1. (F9X) gfortran options: -fopenmp -march=native -mtune=native -O3 -funroll-loops -fbacktrace -ffree-form -fimplicit-none -std=f2008 -lcp2kstart -lcp2kmc -lcp2kswarm -lcp2kmotion -lcp2kthermostat -lcp2kemd -lcp2ktmc -lcp2kmain -lcp2kdbt -lcp2ktas -lcp2kgrid -lcp2kgriddgemm -lcp2kgridcpu -lcp2kgridref -lcp2kgridcommon -ldbcsrarnoldi -ldbcsrx -lcp2kdbx -lcp2kdbm -lcp2kshg_int -lcp2keri_mme -lcp2kminimax -lcp2khfxbase -lcp2ksubsys -lcp2kxc -lcp2kao -lcp2kpw_env -lcp2kinput -lcp2kpw -lcp2kgpu -lcp2kfft -lcp2kfpga -lcp2kfm -lcp2kcommon -lcp2koffload -lcp2kmpiwrap -lcp2kbase -ldbcsr -lsirius -lspla -lspfft -lsymspg -lvdwxc -l:libhdf5_fortran.a -l:libhdf5.a -lz -lgsl -lelpa_openmp -lcosma -lcosta -lscalapack -lxsmmf -lxsmm -ldl -lpthread -llibgrpp -lxcf03 -lxc -lint2 -lfftw3_mpi -lfftw3 -lfftw3_omp -lmpi_cxx -lmpi -l:libopenblas.a -lvori -lstdc++ -lmpi_usempif08 -lmpi_mpifh -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm
PyPerformance Benchmark: xml_etree OpenBenchmarking.org Milliseconds, Fewer Is Better PyPerformance 1.11 Benchmark: xml_etree a 4484PX px 8 16 24 32 40 35.8 36.8 36.5
Llama.cpp Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Text Generation 128 OpenBenchmarking.org Tokens Per Second, More Is Better Llama.cpp b4154 Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Text Generation 128 a 4484PX px 2 4 6 8 10 7.24 7.41 7.44 1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas
LiteRT 2024-10-15 - Model: Mobilenet Float (Microseconds, Fewer Is Better): a: 1211.48 | 4484PX: 1244.70 | px: 1244.51
Renaissance 0.16 - Test: In-Memory Database Shootout (ms, Fewer Is Better): a: 3256.1 (MIN: 3019.89 / MAX: 3599.5) | 4484PX: 3241.5 (MIN: 3037.03 / MAX: 3491.91) | px: 3175.6 (MIN: 2896.06 / MAX: 3367.44)
Llamafile 0.8.16 - Model: Llama-3.2-3B-Instruct.Q6_K - Test: Text Generation 16 (Tokens Per Second, More Is Better): a: 19.03 | 4484PX: 19.49 | px: 19.50
PyPerformance 1.11 - Benchmark: pickle_pure_python (Milliseconds, Fewer Is Better): a: 165 | 4484PX: 169 | px: 168
PyPerformance 1.11 - Benchmark: django_template (Milliseconds, Fewer Is Better): a: 20.7 | 4484PX: 21.0 | px: 21.2
Llamafile 0.8.16 - Model: mistral-7b-instruct-v0.2.Q5_K_M - Test: Text Generation 16 (Tokens Per Second, More Is Better): a: 10.22 | 4484PX: 10.45 | px: 10.45
PyPerformance 1.11 - Benchmark: asyncio_websockets (Milliseconds, Fewer Is Better): a: 315 | 4484PX: 321 | px: 322
PyPerformance 1.11 - Benchmark: go (Milliseconds, Fewer Is Better): a: 77.8 | 4484PX: 78.6 | px: 79.4
Llamafile 0.8.16 - Model: Llama-3.2-3B-Instruct.Q6_K - Test: Text Generation 128 (Tokens Per Second, More Is Better): a: 20.13 | 4484PX: 20.39 | px: 20.51
Y-Cruncher 0.8.5 - Pi Digits To Calculate: 500M (Seconds, Fewer Is Better): a: 8.772 | 4484PX: 8.688 | px: 8.623
XNNPACK b7b048 - Model: FP32MobileNetV1 (us, Fewer Is Better): a: 1252 | 4484PX: 1257 | px: 1272
  1. (CXX) g++ options: -O3 -lrt -lm
LiteRT 2024-10-15 - Model: SqueezeNet (Microseconds, Fewer Is Better): a: 1794.11 | 4484PX: 1809.18 | px: 1821.35
PyPerformance 1.11 - Benchmark: pathlib (Milliseconds, Fewer Is Better): a: 14.2 | 4484PX: 14.4 | px: 14.4
PyPerformance 1.11 - Benchmark: float (Milliseconds, Fewer Is Better): a: 50.7 | 4484PX: 51.3 | px: 50.8
Llama.cpp b4154 - Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 2048 (Tokens Per Second, More Is Better): a: 63.09 | 4484PX: 63.80 | px: 63.79
Llama.cpp b4154 - Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 2048 (Tokens Per Second, More Is Better): a: 62.97 | 4484PX: 63.61 | px: 63.41
Llama.cpp b4154 - Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 512 (Tokens Per Second, More Is Better): a: 68.40 | 4484PX: 68.20 | px: 68.81
  1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas
PyPerformance 1.11 - Benchmark: nbody (Milliseconds, Fewer Is Better): a: 59.0 | 4484PX: 59.5 | px: 59.2
Y-Cruncher 0.8.5 - Pi Digits To Calculate: 1B (Seconds, Fewer Is Better): a: 18.49 | 4484PX: 18.38 | px: 18.37
simdjson 3.10 - Throughput Test: LargeRandom (GB/s, More Is Better): a: 1.83 | 4484PX: 1.84 | px: 1.84
  1. (CXX) g++ options: -O3 -lrt
LiteRT 2024-10-15 - Model: Inception ResNet V2 (Microseconds, Fewer Is Better): a: 19530.2 | 4484PX: 19477.8 | px: 19490.7
Llamafile 0.8.16 - Model: wizardcoder-python-34b-v1.0.Q6_K - Test: Prompt Processing 2048 (Tokens Per Second, More Is Better): a: 12288 | 4484PX: 12288 | px: 12288
Llamafile 0.8.16 - Model: wizardcoder-python-34b-v1.0.Q6_K - Test: Prompt Processing 1024 (Tokens Per Second, More Is Better): a: 6144 | 4484PX: 6144 | px: 6144
Llamafile 0.8.16 - Model: wizardcoder-python-34b-v1.0.Q6_K - Test: Prompt Processing 512 (Tokens Per Second, More Is Better): a: 3072 | 4484PX: 3072 | px: 3072
Llamafile 0.8.16 - Model: wizardcoder-python-34b-v1.0.Q6_K - Test: Prompt Processing 256 (Tokens Per Second, More Is Better): a: 1536 | 4484PX: 1536 | px: 1536
Llamafile 0.8.16 - Model: mistral-7b-instruct-v0.2.Q5_K_M - Test: Prompt Processing 2048 (Tokens Per Second, More Is Better): a: 32768 | 4484PX: 32768 | px: 32768
Llamafile 0.8.16 - Model: mistral-7b-instruct-v0.2.Q5_K_M - Test: Prompt Processing 1024 (Tokens Per Second, More Is Better): a: 16384 | 4484PX: 16384 | px: 16384
Llamafile 0.8.16 - Model: mistral-7b-instruct-v0.2.Q5_K_M - Test: Prompt Processing 512 (Tokens Per Second, More Is Better): a: 8192 | 4484PX: 8192 | px: 8192
Llamafile 0.8.16 - Model: mistral-7b-instruct-v0.2.Q5_K_M - Test: Prompt Processing 256 (Tokens Per Second, More Is Better): a: 4096 | 4484PX: 4096 | px: 4096
Llamafile 0.8.16 - Model: TinyLlama-1.1B-Chat-v1.0.BF16 - Test: Prompt Processing 2048 (Tokens Per Second, More Is Better): a: 32768 | 4484PX: 32768 | px: 32768
Llamafile 0.8.16 - Model: TinyLlama-1.1B-Chat-v1.0.BF16 - Test: Prompt Processing 1024 (Tokens Per Second, More Is Better): a: 16384 | 4484PX: 16384 | px: 16384
Llamafile 0.8.16 - Model: TinyLlama-1.1B-Chat-v1.0.BF16 - Test: Prompt Processing 512 (Tokens Per Second, More Is Better): a: 8192 | 4484PX: 8192 | px: 8192
Llamafile 0.8.16 - Model: TinyLlama-1.1B-Chat-v1.0.BF16 - Test: Prompt Processing 256 (Tokens Per Second, More Is Better): a: 4096 | 4484PX: 4096 | px: 4096
Llamafile 0.8.16 - Model: Llama-3.2-3B-Instruct.Q6_K - Test: Prompt Processing 2048 (Tokens Per Second, More Is Better): a: 32768 | 4484PX: 32768 | px: 32768
Llamafile 0.8.16 - Model: Llama-3.2-3B-Instruct.Q6_K - Test: Prompt Processing 1024 (Tokens Per Second, More Is Better): a: 16384 | 4484PX: 16384 | px: 16384
Llamafile 0.8.16 - Model: Llama-3.2-3B-Instruct.Q6_K - Test: Prompt Processing 512 (Tokens Per Second, More Is Better): a: 8192 | 4484PX: 8192 | px: 8192
Llamafile 0.8.16 - Model: Llama-3.2-3B-Instruct.Q6_K - Test: Prompt Processing 256 (Tokens Per Second, More Is Better): a: 4096 | 4484PX: 4096 | px: 4096
OpenVINO GenAI 2024.5 - Model: Phi-3-mini-128k-instruct-int4-ov - Device: CPU - Time Per Output Token (ms, Fewer Is Better): a: 51.86 | 4484PX: 49.31 | px: 49.28
OpenVINO GenAI 2024.5 - Model: Phi-3-mini-128k-instruct-int4-ov - Device: CPU - Time To First Token (ms, Fewer Is Better): a: 55.93 | 4484PX: 58.91 | px: 58.86
OpenVINO GenAI 2024.5 - Model: Falcon-7b-instruct-int4-ov - Device: CPU - Time Per Output Token (ms, Fewer Is Better): a: 77.34 | 4484PX: 74.65 | px: 74.54
OpenVINO GenAI 2024.5 - Model: Falcon-7b-instruct-int4-ov - Device: CPU - Time To First Token (ms, Fewer Is Better): a: 86.06 | 4484PX: 93.01 | px: 93.00
OpenVINO GenAI 2024.5 - Model: Gemma-7b-int4-ov - Device: CPU - Time Per Output Token (ms, Fewer Is Better): a: 101.72 | 4484PX: 97.79 | px: 97.61
OpenVINO GenAI 2024.5 - Model: Gemma-7b-int4-ov - Device: CPU - Time To First Token (ms, Fewer Is Better): a: 106.62 | 4484PX: 121.48 | px: 122.30
ONNX Runtime 1.19 - Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard (Inference Time Cost (ms), Fewer Is Better): a: 21.24 | 4484PX: 24.94 | px: 23.06
ONNX Runtime 1.19 - Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Standard (Inference Time Cost (ms), Fewer Is Better): a: 648.52 | 4484PX: 850.14 | px: 854.33
ONNX Runtime 1.19 - Model: super-resolution-10 - Device: CPU - Executor: Standard (Inference Time Cost (ms), Fewer Is Better): a: 7.08601 | 4484PX: 7.98873 | px: 7.99486
ONNX Runtime 1.19 - Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard (Inference Time Cost (ms), Fewer Is Better): a: 2.55898 | 4484PX: 2.80544 | px: 2.80695
ONNX Runtime 1.19 - Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard (Inference Time Cost (ms), Fewer Is Better): a: 23.55 | 4484PX: 26.75 | px: 26.95
ONNX Runtime 1.19 - Model: fcn-resnet101-11 - Device: CPU - Executor: Standard (Inference Time Cost (ms), Fewer Is Better): a: 310.88 | 4484PX: 355.75 | px: 357.60
ONNX Runtime 1.19 - Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard (Inference Time Cost (ms), Fewer Is Better): a: 1.57084 | 4484PX: 1.06188 | px: 1.06600
ONNX Runtime 1.19 - Model: bertsquad-12 - Device: CPU - Executor: Standard (Inference Time Cost (ms), Fewer Is Better): a: 64.14 | 4484PX: 68.91 | px: 68.61
ONNX Runtime 1.19 - Model: T5 Encoder - Device: CPU - Executor: Standard (Inference Time Cost (ms), Fewer Is Better): a: 6.39112 | 4484PX: 4.80287 | px: 4.85142
ONNX Runtime 1.19 - Model: ZFNet-512 - Device: CPU - Executor: Standard (Inference Time Cost (ms), Fewer Is Better): a: 9.76985 | 4484PX: 9.01322 | px: 9.01687
ONNX Runtime 1.19 - Model: yolov4 - Device: CPU - Executor: Standard (Inference Time Cost (ms), Fewer Is Better): a: 90.45 | 4484PX: 93.16 | px: 93.34
ONNX Runtime 1.19 - Model: GPT-2 - Device: CPU - Executor: Standard (Inference Time Cost (ms), Fewer Is Better): a: 7.42776 | 4484PX: 6.25815 | px: 6.33034
  1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
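For the future article, the spread between the three runs can be summarized as a percent delta against the fastest run of each lower-is-better result. Below is a minimal sketch of that calculation, hard-coding two values taken verbatim from this page (LiteRT Mobilenet Float and ONNX Runtime T5 Encoder); the dictionary layout and names are illustrative, not part of the Phoronix Test Suite export.

```python
# Sketch: percent difference of each run vs. the fastest run,
# for two lower-is-better results copied from this result page.
results = {
    "LiteRT Mobilenet Float (us)": {"a": 1211.48, "4484PX": 1244.70, "px": 1244.51},
    "ONNX Runtime T5 Encoder (ms)": {"a": 6.39112, "4484PX": 4.80287, "px": 4.85142},
}

for name, runs in results.items():
    best = min(runs.values())  # lower is better for both metrics here
    deltas = {system: 100.0 * (value - best) / best for system, value in runs.items()}
    summary = ", ".join(f"{s}: +{d:.1f}%" for s, d in deltas.items())
    print(f"{name}: {summary}")
```

The fastest run prints as +0.0%; everything else is its slowdown relative to that run, which is a convenient way to phrase the gaps in prose (e.g. the 4484PX runs trail "a" by under 3% on Mobilenet Float but lead it by a wide margin on T5 Encoder).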
Phoronix Test Suite v10.8.5