eoy2024 Benchmarks for a future article. AMD EPYC 4484PX 12-Core testing with a Supermicro AS-3015A-I H13SAE-MF v1.00 (2.1 BIOS) and ASPEED on Ubuntu 24.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2412086-NE-EOY20243255&grr&sor .
eoy2024 system details
Configurations: a, 4484PX, px

Configuration a:
  Processor: AMD EPYC 4564P 16-Core @ 5.88GHz (16 Cores / 32 Threads)
  Motherboard: Supermicro AS-3015A-I H13SAE-MF v1.00 (2.1 BIOS)
  Chipset: AMD Device 14d8
  Memory: 2 x 32GB DRAM-4800MT/s Micron MTC20C2085S1EC48BA1 BC
  Disk: 3201GB Micron_7450_MTFDKCC3T2TFS + 960GB SAMSUNG MZ1L2960HCJR-00A07
  Graphics: ASPEED
  Audio: AMD Rembrandt Radeon HD Audio
  Monitor: VA2431
  Network: 2 x Intel I210
  OS: Ubuntu 24.04
  Kernel: 6.8.0-11-generic (x86_64)
  Desktop: GNOME Shell 45.3
  Display Server: X Server 1.21.1.11
  Compiler: GCC 13.2.0
  File-System: ext4
  Screen Resolution: 1024x768

Differing values listed for the other configurations (4484PX, px):
  Processor: AMD EPYC 4484PX 12-Core @ 5.66GHz (12 Cores / 24 Threads)
  Kernel: 6.12.2-061202-generic (x86_64)

Kernel Details: Transparent Huge Pages: madvise

Compiler Details: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-fxIygj/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-fxIygj/gcc-13-13.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v

Processor Details:
  a: Scaling Governor: amd-pstate-epp performance (EPP: performance) - CPU Microcode: 0xa601209
  4484PX: Scaling Governor: amd-pstate-epp performance (Boost: Enabled EPP: performance) - CPU Microcode: 0xa601209
  px: Scaling Governor: amd-pstate-epp performance (Boost: Enabled EPP: performance) - CPU Microcode: 0xa601209

Java Details: OpenJDK Runtime Environment (build 21.0.2+13-Ubuntu-2)

Python Details: Python 3.12.3

Security Details:
  a: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_rstack_overflow: Mitigation of Safe RET + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS IBPB: conditional STIBP: always-on RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
  4484PX: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + reg_file_data_sampling: Not affected + retbleed: Not affected + spec_rstack_overflow: Mitigation of Safe RET + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; STIBP: always-on; RSB filling; PBRSB-eIBRS: Not affected; BHI: Not affected + srbds: Not affected + tsx_async_abort: Not affected
  px: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + reg_file_data_sampling: Not affected + retbleed: Not affected + spec_rstack_overflow: Mitigation of Safe RET + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; STIBP: always-on; RSB filling; PBRSB-eIBRS: Not affected; BHI: Not affected + srbds: Not affected + tsx_async_abort: Not affected
eoy2024 quantlib: S svt-av1: Preset 3 - Beauty 4K 10-bit relion: Basic - CPU whisper-cpp: ggml-medium.en - 2016 State of the Union blender: Barbershop - CPU-Only cp2k: H20-256 couchdb: 500 - 3000 - 30 whisperfile: Medium llamafile: wizardcoder-python-34b-v1.0.Q6_K - Prompt Processing 2048 quantlib: XXS couchdb: 300 - 3000 - 30 byte: Whetstone Double llamafile: wizardcoder-python-34b-v1.0.Q6_K - Text Generation 128 svt-av1: Preset 3 - Bosphorus 4K llamafile: mistral-7b-instruct-v0.2.Q5_K_M - Prompt Processing 2048 byte: Pipe byte: Dhrystone 2 byte: System Call whisper-cpp: ggml-small.en - 2016 State of the Union couchdb: 100 - 3000 - 30 svt-av1: Preset 5 - Beauty 4K 10-bit blender: Pabellon Barcelona - CPU-Only xnnpack: QS8MobileNetV2 xnnpack: FP16MobileNetV3Small xnnpack: FP16MobileNetV3Large xnnpack: FP16MobileNetV2 xnnpack: FP16MobileNetV1 xnnpack: FP32MobileNetV3Small xnnpack: FP32MobileNetV3Large xnnpack: FP32MobileNetV2 xnnpack: FP32MobileNetV1 llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 2048 llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 2048 openssl: ChaCha20 openssl: ChaCha20-Poly1305 openssl: AES-256-GCM openssl: AES-128-GCM blender: Classroom - CPU-Only whisperfile: Small llamafile: mistral-7b-instruct-v0.2.Q5_K_M - Text Generation 128 llamafile: wizardcoder-python-34b-v1.0.Q6_K - Prompt Processing 1024 rustls: handshake-ticket - TLS13_CHACHA20_POLY1305_SHA256 rustls: handshake-resume - TLS13_CHACHA20_POLY1305_SHA256 gcrypt: ospray: particle_volume/scivis/real_time couchdb: 500 - 1000 - 30 rustls: handshake-ticket - TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384 rustls: handshake-resume - TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384 ospray: particle_volume/pathtracer/real_time svt-av1: Preset 3 - Bosphorus 1080p cassandra: Writes pyperformance: async_tree_io openvino-genai: Gemma-7b-int4-ov - CPU - Time Per Output Token openvino-genai: Gemma-7b-int4-ov - CPU - Time To First Token openvino-genai: Gemma-7b-int4-ov - CPU 
astcenc: Very Thorough couchdb: 300 - 1000 - 30 astcenc: Exhaustive ospray: particle_volume/ao/real_time gromacs: water_GMX50_bare pyperformance: xml_etree llamafile: mistral-7b-instruct-v0.2.Q5_K_M - Prompt Processing 1024 svt-av1: Preset 8 - Beauty 4K 10-bit build2: Time To Compile pyperformance: asyncio_tcp_ssl numpy: primesieve: 1e13 simdjson: Kostya llamafile: wizardcoder-python-34b-v1.0.Q6_K - Prompt Processing 512 llamafile: Llama-3.2-3B-Instruct.Q6_K - Prompt Processing 2048 pyperformance: python_startup llamafile: Llama-3.2-3B-Instruct.Q6_K - Text Generation 128 llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Text Generation 128 cp2k: Fayalite-FIST llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 1024 llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 1024 whisper-cpp: ggml-base.en - 2016 State of the Union llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Text Generation 128 blender: Junkshop - CPU-Only blender: Fishy Cat - CPU-Only openvino-genai: Falcon-7b-instruct-int4-ov - CPU - Time Per Output Token openvino-genai: Falcon-7b-instruct-int4-ov - CPU - Time To First Token openvino-genai: Falcon-7b-instruct-int4-ov - CPU namd: STMV with 1,066,628 Atoms rustls: handshake - TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384 simdjson: LargeRand renaissance: ALS Movie Lens svt-av1: Preset 5 - Bosphorus 4K stockfish: Chess Benchmark onednn: Recurrent Neural Network Training - CPU onednn: Recurrent Neural Network Inference - CPU couchdb: 100 - 1000 - 30 simdjson: DistinctUserID llamafile: TinyLlama-1.1B-Chat-v1.0.BF16 - Text Generation 128 simdjson: TopTweet renaissance: In-Memory Database Shootout simdjson: PartialTweets svt-av1: Preset 13 - Beauty 4K 10-bit renaissance: Akka Unbalanced Cobwebbed Tree renaissance: Apache Spark PageRank blender: BMW27 - CPU-Only renaissance: Gaussian Mixture Model stockfish: Chess Benchmark pyperformance: gc_collect renaissance: Savina Reactors.IO llamafile: mistral-7b-instruct-v0.2.Q5_K_M - 
Prompt Processing 512 ospray: gravity_spheres_volume/dim_512/scivis/real_time ospray: gravity_spheres_volume/dim_512/ao/real_time renaissance: Apache Spark Bayes build-eigen: Time To Compile renaissance: Finagle HTTP Requests renaissance: Rand Forest ospray: gravity_spheres_volume/dim_512/pathtracer/real_time renaissance: Scala Dotty onnx: ResNet101_DUC_HDC-12 - CPU - Standard onnx: ResNet101_DUC_HDC-12 - CPU - Standard renaissance: Genetic Algorithm Using Jenetics + Futures onnx: GPT-2 - CPU - Standard onnx: GPT-2 - CPU - Standard onnx: fcn-resnet101-11 - CPU - Standard onnx: fcn-resnet101-11 - CPU - Standard onnx: ZFNet-512 - CPU - Standard onnx: ZFNet-512 - CPU - Standard onnx: bertsquad-12 - CPU - Standard onnx: bertsquad-12 - CPU - Standard onnx: T5 Encoder - CPU - Standard onnx: T5 Encoder - CPU - Standard onnx: yolov4 - CPU - Standard onnx: yolov4 - CPU - Standard onnx: ArcFace ResNet-100 - CPU - Standard onnx: ArcFace ResNet-100 - CPU - Standard onnx: Faster R-CNN R-50-FPN-int8 - CPU - Standard onnx: Faster R-CNN R-50-FPN-int8 - CPU - Standard onnx: CaffeNet 12-int8 - CPU - Standard onnx: CaffeNet 12-int8 - CPU - Standard onnx: ResNet50 v1-12-int8 - CPU - Standard onnx: ResNet50 v1-12-int8 - CPU - Standard onnx: super-resolution-10 - CPU - Standard onnx: super-resolution-10 - CPU - Standard pyperformance: asyncio_websockets cp2k: H20-64 mt-dgemm: Sustained Floating-Point Rate llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 2048 llamafile: Llama-3.2-3B-Instruct.Q6_K - Prompt Processing 1024 llamafile: TinyLlama-1.1B-Chat-v1.0.BF16 - Prompt Processing 2048 litert: Inception V4 litert: Inception ResNet V2 litert: NASNet Mobile litert: DeepLab V3 litert: Mobilenet Float litert: SqueezeNet litert: Quantized COCO SSD MobileNet v1 litert: Mobilenet Quant rustls: handshake-ticket - TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 llamafile: wizardcoder-python-34b-v1.0.Q6_K - Prompt Processing 256 rustls: handshake-resume - 
TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 512 llamafile: wizardcoder-python-34b-v1.0.Q6_K - Text Generation 16 llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 512 financebench: Bonds OpenMP namd: ATPase with 327,506 Atoms svt-av1: Preset 5 - Bosphorus 1080p whisperfile: Tiny pyperformance: django_template astcenc: Thorough openvino-genai: Phi-3-mini-128k-instruct-int4-ov - CPU - Time Per Output Token openvino-genai: Phi-3-mini-128k-instruct-int4-ov - CPU - Time To First Token openvino-genai: Phi-3-mini-128k-instruct-int4-ov - CPU etcpak: Multi-Threaded - ETC2 pyperformance: raytrace pyperformance: crypto_pyaes pyperformance: float llamafile: mistral-7b-instruct-v0.2.Q5_K_M - Prompt Processing 256 pyperformance: go financebench: Repo OpenMP svt-av1: Preset 8 - Bosphorus 4K pyperformance: chaos pyperformance: regex_compile rustls: handshake - TLS13_CHACHA20_POLY1305_SHA256 rustls: handshake - TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 pyperformance: pickle_pure_python llamafile: Llama-3.2-3B-Instruct.Q6_K - Prompt Processing 512 pyperformance: pathlib povray: Trace Time onednn: Deconvolution Batch shapes_1d - CPU llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 1024 llamafile: mistral-7b-instruct-v0.2.Q5_K_M - Text Generation 16 pyperformance: json_loads llamafile: TinyLlama-1.1B-Chat-v1.0.BF16 - Prompt Processing 1024 pyperformance: nbody y-cruncher: 1B x265: Bosphorus 4K compress-7zip: Decompression Rating compress-7zip: Compression Rating astcenc: Fast onednn: IP Shapes 1D - CPU llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Text Generation 128 svt-av1: Preset 13 - Bosphorus 4K llamafile: Llama-3.2-3B-Instruct.Q6_K - Prompt Processing 256 llamafile: Llama-3.2-3B-Instruct.Q6_K - Text Generation 16 svt-av1: Preset 8 - Bosphorus 1080p llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 512 astcenc: Medium y-cruncher: 500M 
llamafile: TinyLlama-1.1B-Chat-v1.0.BF16 - Text Generation 16 onednn: IP Shapes 3D - CPU llamafile: TinyLlama-1.1B-Chat-v1.0.BF16 - Prompt Processing 512 primesieve: 1e12 onednn: Convolution Batch Shapes Auto - CPU x265: Bosphorus 1080p svt-av1: Preset 13 - Bosphorus 1080p llamafile: TinyLlama-1.1B-Chat-v1.0.BF16 - Prompt Processing 256 onednn: Deconvolution Batch shapes_3d - CPU openssl: SHA256 a 4484PX px 12.7476 1.422 944.27 700.91 506.2 592.857 511.775 534.919 12288 13.432 367.83 343491.9 1.99 9.59 32768 48806257.1 1866536062.7 49140426.6 245.07838 232.188 6.504 166.12 844 920 1498 1190 1143 979 1810 1495 1252 62.97 63.09 130588495050 92393529340 97172751700 104784522170 143.36 195.41642 10.47 6144 404263.45 388077.69 162.125 8.98486 148.049 1553632.14 1820810.21 236.245 29.573 271333 755 101.72 106.62 9.83 2.741 106.13 1.6844 9.00917 1.692 35.8 16384 12.468 92.053 645 775.75 78.498 5.97 3072 32768 5.77 20.13 6.88 94.032 70.85 69.26 87.48973 7.24 73.56 71.35 77.34 86.06 12.93 0.75656 423535.68 1.83 9805.7 34.538 54752796 1372.03 700.859 69.929 10.46 26.28 10.46 3256.1 9.76 18.588 4403.8 2412.2 53.55 3399.5 46507038 677 3506.4 8192 7.58789 7.63944 490.0 58.655 2319.4 414.4 8.82093 477.0 648.522 1.54196 732.8 7.42776 134.596 310.875 3.2167 9.76985 102.331 64.141 15.5899 6.39112 156.453 90.4523 11.0552 23.553 42.4537 21.2429 47.0691 1.57084 636.318 2.55898 390.597 7.08601 141.117 315 58.191 1141.194104 279.04 16384 32768 21477.8 19530.2 16936 3579.67 1211.48 1794.11 2129.52 823.17 2620332 1536 3563852.57 70.76 1.78 68.4 33061.21875 2.79632 101.971 41.70935 20.7 20.3025 51.86 55.93 19.28 577.817 175 41.7 50.7 4096 77.8 21418.445312 102.005 38.2 69.8 76454.45 80462.6 165 8192 14.2 18.542 2.97612 355.09 10.22 12.1 16384 59 18.485 32.57 165916 163859 396.6495 1.12573 47.72 212.52 4096 19.03 339.023 327.3 156.2217 8.772 24.59 4.058 8192 6.347 6.67287 114.45 842.558 4096 2.41294 11.8647 1.188 729.4 809.78969 679.34 628.104 559.346 473.55091 12288 12.1169 406.12 244075.3 
2.05 7.684 32768 33443359.2 1346521770.3 30761218.9 268.23891 253.99 5.602 226.34 717 779 1467 1217 1383 809 1515 1365 1257 63.61 63.8 97105235690 68816544020 71160291870 76496336760 197.2 173.38197 10.91 6144 344296.24 333882.92 171.023 6.44913 164.468 1329363.1 1586292.42 199.023 25.446 174960 666 97.79 121.48 10.23 1.9412 117.566 1.1887 6.52776 1.577 36.8 16384 10.967 111.651 590 745.59 110.608 6.11 3072 32768 6.08 20.39 7.11 92.21 66.57 66.85 92.70933 7.41 97.01 96.67 74.65 93.01 13.4 0.65119 306153.2 1.84 9378.8 29.094 45267546 1898.36 965.015 75.901 10.76 27.59 10.82 3241.5 10.1 17.406 4038.4 2138.1 74.08 3860.6 33702298 699 3655.8 8192 5.54888 5.63122 513.2 67.364 2492.2 422.0 6.41198 428.6 850.141 1.17627 904.0 6.25815 159.71 355.751 2.81093 9.01322 110.94 68.9051 14.5122 4.80287 208.174 93.1605 10.7338 26.7478 37.3832 24.9402 40.0935 1.06188 941.401 2.80544 356.409 7.98873 125.172 321 53.005 842.730642 222.75 16384 32768 22083.3 19477.8 8057.56 2343.38 1244.7 1809.18 1420.15 848.943 2282729.64 1536 3035330.21 69.11 1.83 68.2 34600.773438 2.38124 88.415 37.13462 21 14.17 49.31 58.91 20.28 410.726 182 43.1 51.3 4096 78.6 22320.332031 85.201 39.7 71.7 57716.64 59308.75 169 8192 14.4 25.264 3.40293 232.26 10.45 12.4 16384 59.5 18.379 27.16 125698 141263 278.2445 1.93806 52.3 198.112 4096 19.49 287.047 243.14 109.0265 8.688 25.86 2.73072 8192 9.116 4.11551 101.37 776.115 4096 3.5084 11.839 1.184 733.02 809.489 678.4 631.31 560.7 475.51084 12288 12.1057 408.483 244131 2.05 7.646 32768 33381363.1 1340340196.6 30701622.8 266.81425 254.733 5.551 224.64 723 798 1527 1248 1386 837 1574 1368 1272 63.41 63.79 97019897450 68678955550 70902656480 76184405610 197.53 167.89219 10.93 6144 342775.29 333574.3 163.839 6.52304 164.812 1340712.85 1572010.68 197.2 25.447 173946 656 97.61 122.3 10.24 1.9391 119.349 1.1862 6.52206 1.575 36.5 16384 10.855 113.78 590 831.42 110.709 5.45 3072 32768 6.09 20.51 7.12 94.896 66.35 66.52 93.45463 7.44 97.1 97.09 74.54 93 13.41 0.65448 
304060.28 1.84 9275.7 28.824 42973396 1895.68 966.013 76.389 8.97 27.8 10.51 3175.6 8.35 17.355 4002.3 2229.7 73.16 3815.2 33871595 706 3676.0 8192 5.6147 5.71084 474.9 67.076 2483.1 453.2 6.4074 436.2 854.334 1.1705 920.7 6.33034 157.893 357.602 2.79638 9.01687 110.892 68.6104 14.5747 4.85142 206.091 93.3441 10.7127 26.9485 37.1048 23.0604 43.362 1.066 937.778 2.80695 356.194 7.99486 125.076 322 52.724 842.012831 208.99 16384 32768 22752.4 19490.7 7931.64 2359.99 1244.51 1821.35 1417.35 849.209 2292879.44 1536 3038723.48 67.95 1.84 68.81 34896.835938 2.35379 88.27 38.71828 21.2 14.1464 49.28 58.86 20.29 409.875 182 43.3 50.8 4096 79.4 22318.738281 84.998 39.4 72.5 57688.08 59206.34 168 8192 14.4 25.328 3.40628 244.77 10.45 12.5 16384 59.2 18.365 26.94 125605 142213 277.2994 1.93913 52.37 194.024 4096 19.5 286.962 232.86 108.8588 8.623 25.94 2.72942 8192 9.147 4.13321 101.25 769.818 4096 3.51243 OpenBenchmarking.org
QuantLib 1.35-dev - Size: S (tasks/s, More Is Better)
  a: 12.75 | 4484PX: 11.86 | px: 11.84
  (CXX) g++ options: -O3 -march=native -fPIE -pie

SVT-AV1 2.3 - Encoder Mode: Preset 3 - Input: Beauty 4K 10-bit (Frames Per Second, More Is Better)
  a: 1.422 | 4484PX: 1.188 | px: 1.184
  (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq

RELION 5.0 - Test: Basic - Device: CPU (Seconds, Fewer Is Better)
  4484PX: 729.40 | px: 733.02 | a: 944.27
  (CXX) g++ options: -fPIC -std=c++14 -fopenmp -O3 -rdynamic -lfftw3f -lfftw3 -ldl -ltiff -lpng -ljpeg -lmpi_cxx -lmpi

Whisper.cpp 1.6.2 - Model: ggml-medium.en - Input: 2016 State of the Union (Seconds, Fewer Is Better)
  a: 700.91 | px: 809.49 | 4484PX: 809.79
  (CXX) g++ options: -O3 -std=c++11 -fPIC -pthread -msse3 -mssse3 -mavx -mf16c -mfma -mavx2 -mavx512f -mavx512cd -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512vnni

Blender 4.3 - Blend File: Barbershop - Compute: CPU-Only (Seconds, Fewer Is Better)
  a: 506.20 | px: 678.40 | 4484PX: 679.34
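
The results mix "More Is Better" and "Fewer Is Better" metrics, so a ratio between two configurations has to be oriented before results can be compared across tests. The following is a minimal sketch of that arithmetic; the function name `speedup` is hypothetical and not part of the Phoronix Test Suite, and the numbers are copied from the QuantLib Size: S and Blender Barbershop results.

```python
def speedup(candidate: float, baseline: float, higher_is_better: bool) -> float:
    """Return how many times better `candidate` is than `baseline`.

    For throughput metrics (More Is Better) this is candidate / baseline.
    For time metrics (Fewer Is Better) the ratio is inverted.
    """
    if candidate <= 0 or baseline <= 0:
        raise ValueError("results must be positive")
    return candidate / baseline if higher_is_better else baseline / candidate


# Configuration a compared against configuration 4484PX.
quantlib_s = speedup(12.75, 11.86, higher_is_better=True)      # tasks/s
barbershop = speedup(506.20, 679.34, higher_is_better=False)   # seconds

print(f"QuantLib Size S:    a = {quantlib_s:.3f}x 4484PX")
print(f"Blender Barbershop: a = {barbershop:.3f}x 4484PX")
```

A value above 1.0 means the first configuration did better on that test, regardless of the metric's direction.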

CP2K Molecular Dynamics 2024.3 - Input: H20-256 (Seconds, Fewer Is Better)
  a: 592.86 | 4484PX: 628.10 | px: 631.31
  (F9X) gfortran options: -fopenmp -march=native -mtune=native -O3 -funroll-loops -fbacktrace -ffree-form -fimplicit-none -std=f2008 -lcp2kstart -lcp2kmc -lcp2kswarm -lcp2kmotion -lcp2kthermostat -lcp2kemd -lcp2ktmc -lcp2kmain -lcp2kdbt -lcp2ktas -lcp2kgrid -lcp2kgriddgemm -lcp2kgridcpu -lcp2kgridref -lcp2kgridcommon -ldbcsrarnoldi -ldbcsrx -lcp2kdbx -lcp2kdbm -lcp2kshg_int -lcp2keri_mme -lcp2kminimax -lcp2khfxbase -lcp2ksubsys -lcp2kxc -lcp2kao -lcp2kpw_env -lcp2kinput -lcp2kpw -lcp2kgpu -lcp2kfft -lcp2kfpga -lcp2kfm -lcp2kcommon -lcp2koffload -lcp2kmpiwrap -lcp2kbase -ldbcsr -lsirius -lspla -lspfft -lsymspg -lvdwxc -l:libhdf5_fortran.a -l:libhdf5.a -lz -lgsl -lelpa_openmp -lcosma -lcosta -lscalapack -lxsmmf -lxsmm -ldl -lpthread -llibgrpp -lxcf03 -lxc -lint2 -lfftw3_mpi -lfftw3 -lfftw3_omp -lmpi_cxx -lmpi -l:libopenblas.a -lvori -lstdc++ -lmpi_usempif08 -lmpi_mpifh -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm

Apache CouchDB 3.4.1 - Bulk Size: 500 - Inserts: 3000 - Rounds: 30 (Seconds, Fewer Is Better)
  a: 511.78 | 4484PX: 559.35 | px: 560.70
  (CXX) g++ options: -flto -lstdc++ -shared -lei

Whisperfile 20Aug24 - Model Size: Medium (Seconds, Fewer Is Better)
  4484PX: 473.55 | px: 475.51 | a: 534.92

Llamafile 0.8.16 - Model: wizardcoder-python-34b-v1.0.Q6_K - Test: Prompt Processing 2048 (Tokens Per Second, More Is Better)
  px: 12288 | 4484PX: 12288 | a: 12288

QuantLib 1.35-dev - Size: XXS (tasks/s, More Is Better)
  a: 13.43 | 4484PX: 12.12 | px: 12.11
  (CXX) g++ options: -O3 -march=native -fPIE -pie

Apache CouchDB 3.4.1 - Bulk Size: 300 - Inserts: 3000 - Rounds: 30 (Seconds, Fewer Is Better)
  a: 367.83 | 4484PX: 406.12 | px: 408.48
  (CXX) g++ options: -flto -lstdc++ -shared -lei

BYTE Unix Benchmark 5.1.3-git - Computational Test: Whetstone Double (MWIPS, More Is Better)
  a: 343491.9 | px: 244131.0 | 4484PX: 244075.3
  (CC) gcc options: -pedantic -O3 -ffast-math -march=native -mtune=native -lm

Llamafile 0.8.16 - Model: wizardcoder-python-34b-v1.0.Q6_K - Test: Text Generation 128 (Tokens Per Second, More Is Better)
  px: 2.05 | 4484PX: 2.05 | a: 1.99

SVT-AV1 2.3 - Encoder Mode: Preset 3 - Input: Bosphorus 4K (Frames Per Second, More Is Better)
  a: 9.590 | 4484PX: 7.684 | px: 7.646
  (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq

Llamafile 0.8.16 - Model: mistral-7b-instruct-v0.2.Q5_K_M - Test: Prompt Processing 2048 (Tokens Per Second, More Is Better)
  px: 32768 | 4484PX: 32768 | a: 32768

BYTE Unix Benchmark 5.1.3-git - Computational Test: Pipe (LPS, More Is Better)
  a: 48806257.1 | 4484PX: 33443359.2 | px: 33381363.1
  (CC) gcc options: -pedantic -O3 -ffast-math -march=native -mtune=native -lm

BYTE Unix Benchmark 5.1.3-git - Computational Test: Dhrystone 2 (LPS, More Is Better)
  a: 1866536062.7 | 4484PX: 1346521770.3 | px: 1340340196.6
  (CC) gcc options: -pedantic -O3 -ffast-math -march=native -mtune=native -lm

BYTE Unix Benchmark 5.1.3-git - Computational Test: System Call (LPS, More Is Better)
  a: 49140426.6 | 4484PX: 30761218.9 | px: 30701622.8
  (CC) gcc options: -pedantic -O3 -ffast-math -march=native -mtune=native -lm
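
Configuration a has 16 cores and the 4484PX configuration has 12, so part of any throughput gap is simply core count. The sketch below divides the Dhrystone 2 results by the core counts from the system table. It assumes the benchmark ran one copy per core and scaled evenly, which the export does not confirm, and it ignores SMT and clock-speed differences; the helper name `per_core` is hypothetical.

```python
# Core counts from the system table; Dhrystone 2 results in loops per second (LPS).
CORES = {"a": 16, "4484PX": 12}
DHRYSTONE_LPS = {"a": 1866536062.7, "4484PX": 1346521770.3}


def per_core(results: dict, cores: dict) -> dict:
    """Divide each configuration's throughput by its physical core count."""
    return {name: results[name] / cores[name] for name in results}


per_core_lps = per_core(DHRYSTONE_LPS, CORES)
raw_ratio = DHRYSTONE_LPS["a"] / DHRYSTONE_LPS["4484PX"]
core_ratio = CORES["a"] / CORES["4484PX"]

print(f"raw throughput ratio a/4484PX: {raw_ratio:.3f}")
print(f"core count ratio a/4484PX:     {core_ratio:.3f}")
for name, lps in per_core_lps.items():
    print(f"{name}: {lps / 1e6:.1f} M LPS per core")
```

Under those assumptions, a raw ratio close to the core-count ratio would suggest the difference comes mostly from the four extra cores rather than from per-core speed.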

Whisper.cpp 1.6.2 - Model: ggml-small.en - Input: 2016 State of the Union (Seconds, Fewer Is Better)
  a: 245.08 | px: 266.81 | 4484PX: 268.24
  (CXX) g++ options: -O3 -std=c++11 -fPIC -pthread -msse3 -mssse3 -mavx -mf16c -mfma -mavx2 -mavx512f -mavx512cd -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512vnni

Apache CouchDB 3.4.1 - Bulk Size: 100 - Inserts: 3000 - Rounds: 30 (Seconds, Fewer Is Better)
  a: 232.19 | 4484PX: 253.99 | px: 254.73
  (CXX) g++ options: -flto -lstdc++ -shared -lei

SVT-AV1 2.3 - Encoder Mode: Preset 5 - Input: Beauty 4K 10-bit (Frames Per Second, More Is Better)
  a: 6.504 | 4484PX: 5.602 | px: 5.551
  (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq

Blender 4.3 - Blend File: Pabellon Barcelona - Compute: CPU-Only (Seconds, Fewer Is Better)
  a: 166.12 | px: 224.64 | 4484PX: 226.34

XNNPACK b7b048 - Model: QS8MobileNetV2 (us, Fewer Is Better)
  4484PX: 717 | px: 723 | a: 844
  (CXX) g++ options: -O3 -lrt -lm

XNNPACK b7b048 - Model: FP16MobileNetV3Small (us, Fewer Is Better)
  4484PX: 779 | px: 798 | a: 920
  (CXX) g++ options: -O3 -lrt -lm

XNNPACK b7b048 - Model: FP16MobileNetV3Large (us, Fewer Is Better)
  4484PX: 1467 | a: 1498 | px: 1527
  (CXX) g++ options: -O3 -lrt -lm

XNNPACK b7b048 - Model: FP16MobileNetV2 (us, Fewer Is Better)
  a: 1190 | 4484PX: 1217 | px: 1248
  (CXX) g++ options: -O3 -lrt -lm

XNNPACK b7b048 - Model: FP16MobileNetV1 (us, Fewer Is Better)
  a: 1143 | 4484PX: 1383 | px: 1386
  (CXX) g++ options: -O3 -lrt -lm

XNNPACK b7b048 - Model: FP32MobileNetV3Small (us, Fewer Is Better)
  4484PX: 809 | px: 837 | a: 979
  (CXX) g++ options: -O3 -lrt -lm

XNNPACK b7b048 - Model: FP32MobileNetV3Large (us, Fewer Is Better)
  4484PX: 1515 | px: 1574 | a: 1810
  (CXX) g++ options: -O3 -lrt -lm

XNNPACK b7b048 - Model: FP32MobileNetV2 (us, Fewer Is Better)
  4484PX: 1365 | px: 1368 | a: 1495
  (CXX) g++ options: -O3 -lrt -lm

XNNPACK b7b048 - Model: FP32MobileNetV1 (us, Fewer Is Better)
  a: 1252 | 4484PX: 1257 | px: 1272
  (CXX) g++ options: -O3 -lrt -lm
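
The nine XNNPACK models do not all favor the same configuration, so a single summary figure can help. One common way to aggregate benchmark ratios is the geometric mean; the sketch below applies it to the a and 4484PX latencies listed above. The dictionary name and the choice of the geometric mean are illustrative assumptions and do not describe how OpenBenchmarking.org computes its own summaries.

```python
from statistics import geometric_mean

# Latency in microseconds (Fewer Is Better): model -> (a, 4484PX).
XNNPACK_US = {
    "QS8MobileNetV2": (844, 717),
    "FP16MobileNetV3Small": (920, 779),
    "FP16MobileNetV3Large": (1498, 1467),
    "FP16MobileNetV2": (1190, 1217),
    "FP16MobileNetV1": (1143, 1383),
    "FP32MobileNetV3Small": (979, 809),
    "FP32MobileNetV3Large": (1810, 1515),
    "FP32MobileNetV2": (1495, 1365),
    "FP32MobileNetV1": (1252, 1257),
}

# For a time metric, a_time / px_time > 1 means the 4484PX run was faster.
ratios = {model: a / px for model, (a, px) in XNNPACK_US.items()}
summary = geometric_mean(ratios.values())

for model, ratio in ratios.items():
    print(f"{model:22s} 4484PX vs a: {ratio:.3f}x")
print(f"geometric mean: {summary:.3f}x")
```

The geometric mean is used instead of the arithmetic mean because it treats a 2x gain and a 2x loss as cancelling out, which suits ratios.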

Llama.cpp b4154 - Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 2048 (Tokens Per Second, More Is Better)
  4484PX: 63.61 | px: 63.41 | a: 62.97
  (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas

Llama.cpp b4154 - Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 2048 (Tokens Per Second, More Is Better)
  4484PX: 63.80 | px: 63.79 | a: 63.09
  (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas

OpenSSL - Algorithm: ChaCha20 (byte/s, More Is Better)
  a: 130588495050 | 4484PX: 97105235690 | px: 97019897450
  OpenSSL 3.0.13 30 Jan 2024 (Library: OpenSSL 3.0.13 30 Jan 2024) - Additional Parameters: -engine qatengine -async_jobs 8

OpenSSL - Algorithm: ChaCha20-Poly1305 (byte/s, More Is Better)
  a: 92393529340 | 4484PX: 68816544020 | px: 68678955550
  OpenSSL 3.0.13 30 Jan 2024 (Library: OpenSSL 3.0.13 30 Jan 2024) - Additional Parameters: -engine qatengine -async_jobs 8

OpenSSL - Algorithm: AES-256-GCM (byte/s, More Is Better)
  a: 97172751700 | 4484PX: 71160291870 | px: 70902656480
  OpenSSL 3.0.13 30 Jan 2024 (Library: OpenSSL 3.0.13 30 Jan 2024) - Additional Parameters: -engine qatengine -async_jobs 8

OpenSSL - Algorithm: AES-128-GCM (byte/s, More Is Better)
  a: 104784522170 | 4484PX: 76496336760 | px: 76184405610
  OpenSSL 3.0.13 30 Jan 2024 (Library: OpenSSL 3.0.13 30 Jan 2024) - Additional Parameters: -engine qatengine -async_jobs 8
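
The OpenSSL figures are reported in raw bytes per second, which is hard to read at eleven or twelve digits. As a small convenience sketch, the following converts the values above to decimal gigabytes per second (1 GB = 10^9 bytes); the names `OPENSSL_BYTES_PER_S` and `to_gb_per_s` are hypothetical.

```python
# Throughput in byte/s, copied from the OpenSSL results (More Is Better).
OPENSSL_BYTES_PER_S = {
    "ChaCha20": {"a": 130588495050, "4484PX": 97105235690, "px": 97019897450},
    "ChaCha20-Poly1305": {"a": 92393529340, "4484PX": 68816544020, "px": 68678955550},
    "AES-256-GCM": {"a": 97172751700, "4484PX": 71160291870, "px": 70902656480},
    "AES-128-GCM": {"a": 104784522170, "4484PX": 76496336760, "px": 76184405610},
}


def to_gb_per_s(bytes_per_s: float) -> float:
    """Convert byte/s to decimal GB/s (10**9 bytes)."""
    return bytes_per_s / 1e9


for algorithm, results in OPENSSL_BYTES_PER_S.items():
    row = " | ".join(f"{cfg}: {to_gb_per_s(v):.2f} GB/s" for cfg, v in results.items())
    print(f"{algorithm:18s} {row}")
```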
Blender Blend File: Classroom - Compute: CPU-Only OpenBenchmarking.org Seconds, Fewer Is Better Blender 4.3 Blend File: Classroom - Compute: CPU-Only a 4484PX px 40 80 120 160 200 143.36 197.20 197.53
Whisperfile Model Size: Small OpenBenchmarking.org Seconds, Fewer Is Better Whisperfile 20Aug24 Model Size: Small px 4484PX a 40 80 120 160 200 167.89 173.38 195.42
Llamafile Model: mistral-7b-instruct-v0.2.Q5_K_M - Test: Text Generation 128 OpenBenchmarking.org Tokens Per Second, More Is Better Llamafile 0.8.16 Model: mistral-7b-instruct-v0.2.Q5_K_M - Test: Text Generation 128 px 4484PX a 3 6 9 12 15 10.93 10.91 10.47
Llamafile Model: wizardcoder-python-34b-v1.0.Q6_K - Test: Prompt Processing 1024 OpenBenchmarking.org Tokens Per Second, More Is Better Llamafile 0.8.16 Model: wizardcoder-python-34b-v1.0.Q6_K - Test: Prompt Processing 1024 px 4484PX a 1300 2600 3900 5200 6500 6144 6144 6144
Rustls Benchmark: handshake-ticket - Suite: TLS13_CHACHA20_POLY1305_SHA256 OpenBenchmarking.org handshakes/s, More Is Better Rustls 0.23.17 Benchmark: handshake-ticket - Suite: TLS13_CHACHA20_POLY1305_SHA256 a 4484PX px 90K 180K 270K 360K 450K 404263.45 344296.24 342775.29 1. (CC) gcc options: -m64 -lgcc_s -lutil -lrt -lpthread -lm -ldl -lc -pie -nodefaultlibs
Rustls Benchmark: handshake-resume - Suite: TLS13_CHACHA20_POLY1305_SHA256 OpenBenchmarking.org handshakes/s, More Is Better Rustls 0.23.17 Benchmark: handshake-resume - Suite: TLS13_CHACHA20_POLY1305_SHA256 a 4484PX px 80K 160K 240K 320K 400K 388077.69 333882.92 333574.30 1. (CC) gcc options: -m64 -lgcc_s -lutil -lrt -lpthread -lm -ldl -lc -pie -nodefaultlibs
Gcrypt Library OpenBenchmarking.org Seconds, Fewer Is Better Gcrypt Library 1.10.3 a px 4484PX 40 80 120 160 200 162.13 163.84 171.02 1. (CC) gcc options: -O2 -fvisibility=hidden
OSPRay Benchmark: particle_volume/scivis/real_time OpenBenchmarking.org Items Per Second, More Is Better OSPRay 3.2 Benchmark: particle_volume/scivis/real_time a px 4484PX 3 6 9 12 15 8.98486 6.52304 6.44913
Apache CouchDB Bulk Size: 500 - Inserts: 1000 - Rounds: 30 OpenBenchmarking.org Seconds, Fewer Is Better Apache CouchDB 3.4.1 Bulk Size: 500 - Inserts: 1000 - Rounds: 30 a 4484PX px 40 80 120 160 200 148.05 164.47 164.81 1. (CXX) g++ options: -flto -lstdc++ -shared -lei
Rustls Benchmark: handshake-ticket - Suite: TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384 OpenBenchmarking.org handshakes/s, More Is Better Rustls 0.23.17 Benchmark: handshake-ticket - Suite: TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384 a px 4484PX 300K 600K 900K 1200K 1500K 1553632.14 1340712.85 1329363.10 1. (CC) gcc options: -m64 -lgcc_s -lutil -lrt -lpthread -lm -ldl -lc -pie -nodefaultlibs
Rustls Benchmark: handshake-resume - Suite: TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384 OpenBenchmarking.org handshakes/s, More Is Better Rustls 0.23.17 Benchmark: handshake-resume - Suite: TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384 a 4484PX px 400K 800K 1200K 1600K 2000K 1820810.21 1586292.42 1572010.68 1. (CC) gcc options: -m64 -lgcc_s -lutil -lrt -lpthread -lm -ldl -lc -pie -nodefaultlibs
OSPRay Benchmark: particle_volume/pathtracer/real_time OpenBenchmarking.org Items Per Second, More Is Better OSPRay 3.2 Benchmark: particle_volume/pathtracer/real_time a 4484PX px 50 100 150 200 250 236.25 199.02 197.20
SVT-AV1 Encoder Mode: Preset 3 - Input: Bosphorus 1080p OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 2.3 Encoder Mode: Preset 3 - Input: Bosphorus 1080p a px 4484PX 7 14 21 28 35 29.57 25.45 25.45 1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
Apache Cassandra Test: Writes OpenBenchmarking.org Op/s, More Is Better Apache Cassandra 5.0 Test: Writes a 4484PX px 60K 120K 180K 240K 300K 271333 174960 173946
PyPerformance Benchmark: async_tree_io OpenBenchmarking.org Milliseconds, Fewer Is Better PyPerformance 1.11 Benchmark: async_tree_io px 4484PX a 160 320 480 640 800 656 666 755
OpenVINO GenAI Model: Gemma-7b-int4-ov - Device: CPU - Time Per Output Token OpenBenchmarking.org ms, Fewer Is Better OpenVINO GenAI 2024.5 Model: Gemma-7b-int4-ov - Device: CPU - Time Per Output Token px 4484PX a 20 40 60 80 100 97.61 97.79 101.72
OpenVINO GenAI 2024.5 - Model: Gemma-7b-int4-ov - Device: CPU - Time To First Token (ms, fewer is better): a: 106.62; 4484PX: 121.48; px: 122.30.
OpenVINO GenAI 2024.5 - Model: Gemma-7b-int4-ov - Device: CPU (tokens/s, more is better): px: 10.24; 4484PX: 10.23; a: 9.83.
ASTC Encoder 5.0 - Preset: Very Thorough (MT/s, more is better): a: 2.7410; 4484PX: 1.9412; px: 1.9391. Compiler: (CXX) g++ options: -O3 -flto -pthread
Apache CouchDB 3.4.1 - Bulk Size: 300 - Inserts: 1000 - Rounds: 30 (Seconds, fewer is better): a: 106.13; 4484PX: 117.57; px: 119.35. Compiler: (CXX) g++ options: -flto -lstdc++ -shared -lei
ASTC Encoder 5.0 - Preset: Exhaustive (MT/s, more is better): a: 1.6844; 4484PX: 1.1887; px: 1.1862. Compiler: (CXX) g++ options: -O3 -flto -pthread
OSPRay 3.2 - Benchmark: particle_volume/ao/real_time (Items Per Second, more is better): a: 9.00917; 4484PX: 6.52776; px: 6.52206.
GROMACS - Input: water_GMX50_bare (Ns Per Day, more is better): a: 1.692; 4484PX: 1.577; px: 1.575. Note: GROMACS version: 2023.3-Ubuntu_2023.3_1ubuntu3
PyPerformance 1.11 - Benchmark: xml_etree (Milliseconds, fewer is better): a: 35.8; px: 36.5; 4484PX: 36.8.
Llamafile 0.8.16 - Model: mistral-7b-instruct-v0.2.Q5_K_M - Test: Prompt Processing 1024 (Tokens Per Second, more is better): px: 16384; 4484PX: 16384; a: 16384.
SVT-AV1 2.3 - Encoder Mode: Preset 8 - Input: Beauty 4K 10-bit (Frames Per Second, more is better): a: 12.47; 4484PX: 10.97; px: 10.86. Compiler: (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
Build2 0.17 - Time To Compile (Seconds, fewer is better): a: 92.05; 4484PX: 111.65; px: 113.78.
PyPerformance 1.11 - Benchmark: asyncio_tcp_ssl (Milliseconds, fewer is better): 4484PX: 590; px: 590; a: 645.
Numpy Benchmark (Score, more is better): px: 831.42; a: 775.75; 4484PX: 745.59.
Primesieve 12.6 - Length: 1e13 (Seconds, fewer is better): a: 78.50; 4484PX: 110.61; px: 110.71. Compiler: (CXX) g++ options: -O3
simdjson 3.10 - Throughput Test: Kostya (GB/s, more is better): 4484PX: 6.11; a: 5.97; px: 5.45. Compiler: (CXX) g++ options: -O3 -lrt
Llamafile 0.8.16 - Model: wizardcoder-python-34b-v1.0.Q6_K - Test: Prompt Processing 512 (Tokens Per Second, more is better): px: 3072; 4484PX: 3072; a: 3072.
Llamafile 0.8.16 - Model: Llama-3.2-3B-Instruct.Q6_K - Test: Prompt Processing 2048 (Tokens Per Second, more is better): px: 32768; 4484PX: 32768; a: 32768.
PyPerformance 1.11 - Benchmark: python_startup (Milliseconds, fewer is better): a: 5.77; 4484PX: 6.08; px: 6.09.
Llamafile 0.8.16 - Model: Llama-3.2-3B-Instruct.Q6_K - Test: Text Generation 128 (Tokens Per Second, more is better): px: 20.51; 4484PX: 20.39; a: 20.13.
Llama.cpp b4154 - Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Text Generation 128 (Tokens Per Second, more is better): px: 7.12; 4484PX: 7.11; a: 6.88. Compiler: (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas
CP2K Molecular Dynamics 2024.3 - Input: Fayalite-FIST (Seconds, fewer is better): 4484PX: 92.21; a: 94.03; px: 94.90. Compiler: (F9X) gfortran options: -fopenmp -march=native -mtune=native -O3 -funroll-loops -fbacktrace -ffree-form -fimplicit-none -std=f2008 -lcp2kstart -lcp2kmc -lcp2kswarm -lcp2kmotion -lcp2kthermostat -lcp2kemd -lcp2ktmc -lcp2kmain -lcp2kdbt -lcp2ktas -lcp2kgrid -lcp2kgriddgemm -lcp2kgridcpu -lcp2kgridref -lcp2kgridcommon -ldbcsrarnoldi -ldbcsrx -lcp2kdbx -lcp2kdbm -lcp2kshg_int -lcp2keri_mme -lcp2kminimax -lcp2khfxbase -lcp2ksubsys -lcp2kxc -lcp2kao -lcp2kpw_env -lcp2kinput -lcp2kpw -lcp2kgpu -lcp2kfft -lcp2kfpga -lcp2kfm -lcp2kcommon -lcp2koffload -lcp2kmpiwrap -lcp2kbase -ldbcsr -lsirius -lspla -lspfft -lsymspg -lvdwxc -l:libhdf5_fortran.a -l:libhdf5.a -lz -lgsl -lelpa_openmp -lcosma -lcosta -lscalapack -lxsmmf -lxsmm -ldl -lpthread -llibgrpp -lxcf03 -lxc -lint2 -lfftw3_mpi -lfftw3 -lfftw3_omp -lmpi_cxx -lmpi -l:libopenblas.a -lvori -lstdc++ -lmpi_usempif08 -lmpi_mpifh -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm
Llama.cpp b4154 - Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 1024 (Tokens Per Second, more is better): a: 70.85; 4484PX: 66.57; px: 66.35. Compiler: (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas
Llama.cpp b4154 - Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 1024 (Tokens Per Second, more is better): a: 69.26; 4484PX: 66.85; px: 66.52. Compiler: (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas
Whisper.cpp 1.6.2 - Model: ggml-base.en - Input: 2016 State of the Union (Seconds, fewer is better): a: 87.49; 4484PX: 92.71; px: 93.45. Compiler: (CXX) g++ options: -O3 -std=c++11 -fPIC -pthread -msse3 -mssse3 -mavx -mf16c -mfma -mavx2 -mavx512f -mavx512cd -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512vnni
Llama.cpp b4154 - Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Text Generation 128 (Tokens Per Second, more is better): px: 7.44; 4484PX: 7.41; a: 7.24. Compiler: (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas
Blender 4.3 - Blend File: Junkshop - Compute: CPU-Only (Seconds, fewer is better): a: 73.56; 4484PX: 97.01; px: 97.10.
Blender 4.3 - Blend File: Fishy Cat - Compute: CPU-Only (Seconds, fewer is better): a: 71.35; 4484PX: 96.67; px: 97.09.
OpenVINO GenAI 2024.5 - Model: Falcon-7b-instruct-int4-ov - Device: CPU - Time Per Output Token (ms, fewer is better): px: 74.54; 4484PX: 74.65; a: 77.34.
OpenVINO GenAI 2024.5 - Model: Falcon-7b-instruct-int4-ov - Device: CPU - Time To First Token (ms, fewer is better): a: 86.06; px: 93.00; 4484PX: 93.01.
OpenVINO GenAI 2024.5 - Model: Falcon-7b-instruct-int4-ov - Device: CPU (tokens/s, more is better): px: 13.41; 4484PX: 13.40; a: 12.93.
NAMD 3.0 - Input: STMV with 1,066,628 Atoms (ns/day, more is better): a: 0.75656; px: 0.65448; 4484PX: 0.65119.
Rustls 0.23.17 - Benchmark: handshake - Suite: TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384 (handshakes/s, more is better): a: 423535.68; 4484PX: 306153.20; px: 304060.28. Compiler: (CC) gcc options: -m64 -lgcc_s -lutil -lrt -lpthread -lm -ldl -lc -pie -nodefaultlibs
simdjson 3.10 - Throughput Test: LargeRandom (GB/s, more is better): px: 1.84; 4484PX: 1.84; a: 1.83. Compiler: (CXX) g++ options: -O3 -lrt
Renaissance 0.16 - Test: ALS Movie Lens (ms, fewer is better): px: 9275.7 (MIN: 8821.09 / MAX: 9495.91); 4484PX: 9378.8 (MIN: 8718.36 / MAX: 9413.7); a: 9805.7 (MIN: 9253.4 / MAX: 10057.61).
SVT-AV1 2.3 - Encoder Mode: Preset 5 - Input: Bosphorus 4K (Frames Per Second, more is better): a: 34.54; 4484PX: 29.09; px: 28.82. Compiler: (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
Stockfish 17 - Chess Benchmark (Nodes Per Second, more is better): a: 54752796; 4484PX: 45267546; px: 42973396. Compiler: (CXX) g++ options: -lgcov -m64 -lpthread -fno-exceptions -std=c++17 -fno-peel-loops -fno-tracer -pedantic -O3 -funroll-loops -msse -msse3 -mpopcnt -mavx2 -mbmi -mavx512f -mavx512bw -mavx512vnni -mavx512dq -mavx512vl -msse4.1 -mssse3 -msse2 -mbmi2 -flto -flto-partition=one -flto=jobserver
oneDNN 3.6 - Harness: Recurrent Neural Network Training - Engine: CPU (ms, fewer is better): a: 1372.03 (MIN: 1342.06); px: 1895.68 (MIN: 1892.59); 4484PX: 1898.36 (MIN: 1894.26). Compiler: (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl
oneDNN 3.6 - Harness: Recurrent Neural Network Inference - Engine: CPU (ms, fewer is better): a: 700.86 (MIN: 679.89); 4484PX: 965.02 (MIN: 963.27); px: 966.01 (MIN: 963.43). Compiler: (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl
Apache CouchDB 3.4.1 - Bulk Size: 100 - Inserts: 1000 - Rounds: 30 (Seconds, fewer is better): a: 69.93; 4484PX: 75.90; px: 76.39. Compiler: (CXX) g++ options: -flto -lstdc++ -shared -lei
simdjson 3.10 - Throughput Test: DistinctUserID (GB/s, more is better): 4484PX: 10.76; a: 10.46; px: 8.97. Compiler: (CXX) g++ options: -O3 -lrt
Llamafile 0.8.16 - Model: TinyLlama-1.1B-Chat-v1.0.BF16 - Test: Text Generation 128 (Tokens Per Second, more is better): px: 27.80; 4484PX: 27.59; a: 26.28.
simdjson 3.10 - Throughput Test: TopTweet (GB/s, more is better): 4484PX: 10.82; px: 10.51; a: 10.46. Compiler: (CXX) g++ options: -O3 -lrt
Renaissance 0.16 - Test: In-Memory Database Shootout (ms, fewer is better): px: 3175.6 (MIN: 2896.06 / MAX: 3367.44); 4484PX: 3241.5 (MIN: 3037.03 / MAX: 3491.91); a: 3256.1 (MIN: 3019.89 / MAX: 3599.5).
simdjson 3.10 - Throughput Test: PartialTweets (GB/s, more is better): 4484PX: 10.10; a: 9.76; px: 8.35. Compiler: (CXX) g++ options: -O3 -lrt
SVT-AV1 2.3 - Encoder Mode: Preset 13 - Input: Beauty 4K 10-bit (Frames Per Second, more is better): a: 18.59; 4484PX: 17.41; px: 17.36. Compiler: (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
Renaissance 0.16 - Test: Akka Unbalanced Cobwebbed Tree (ms, fewer is better): px: 4002.3 (MIN: 4002.27 / MAX: 4983.72); 4484PX: 4038.4 (MIN: 4038.36 / MAX: 5089.28); a: 4403.8 (MAX: 5719.11).
Renaissance 0.16 - Test: Apache Spark PageRank (ms, fewer is better): 4484PX: 2138.1 (MIN: 1499.64); px: 2229.7 (MIN: 1612.96 / MAX: 2229.74); a: 2412.2 (MIN: 1691.04).
Blender 4.3 - Blend File: BMW27 - Compute: CPU-Only (Seconds, fewer is better): a: 53.55; px: 73.16; 4484PX: 74.08.
Renaissance 0.16 - Test: Gaussian Mixture Model (ms, fewer is better): a: 3399.5 (MIN: 2471.52); px: 3815.2 (MIN: 2749.56 / MAX: 3815.24); 4484PX: 3860.6 (MIN: 2758.89 / MAX: 3860.61).
Stockfish - Chess Benchmark (Nodes Per Second, more is better): a: 46507038; px: 33871595; 4484PX: 33702298. Note: Stockfish 16 by the Stockfish developers (see AUTHORS file)
PyPerformance 1.11 - Benchmark: gc_collect (Milliseconds, fewer is better): a: 677; 4484PX: 699; px: 706.
Renaissance 0.16 - Test: Savina Reactors.IO (ms, fewer is better): a: 3506.4 (MIN: 3506.38 / MAX: 4329.37); 4484PX: 3655.8 (MIN: 3655.76 / MAX: 4484.97); px: 3676.0 (MAX: 4536.84).
Llamafile 0.8.16 - Model: mistral-7b-instruct-v0.2.Q5_K_M - Test: Prompt Processing 512 (Tokens Per Second, more is better): px: 8192; 4484PX: 8192; a: 8192.
OSPRay 3.2 - Benchmark: gravity_spheres_volume/dim_512/scivis/real_time (Items Per Second, more is better): a: 7.58789; px: 5.61470; 4484PX: 5.54888.
OSPRay 3.2 - Benchmark: gravity_spheres_volume/dim_512/ao/real_time (Items Per Second, more is better): a: 7.63944; px: 5.71084; 4484PX: 5.63122.
Renaissance 0.16 - Test: Apache Spark Bayes (ms, fewer is better): px: 474.9 (MIN: 454.77 / MAX: 514.32); a: 490.0 (MIN: 459.29 / MAX: 580.9); 4484PX: 513.2 (MIN: 453.66 / MAX: 554.7).
Timed Eigen Compilation 3.4.0 - Time To Compile (Seconds, fewer is better): a: 58.66; px: 67.08; 4484PX: 67.36.
Renaissance 0.16 - Test: Finagle HTTP Requests (ms, fewer is better): a: 2319.4 (MIN: 1832.84); px: 2483.1 (MIN: 1933.43); 4484PX: 2492.2 (MIN: 1947.63).
Renaissance 0.16 - Test: Random Forest (ms, fewer is better): a: 414.4 (MIN: 322.79 / MAX: 466.1); 4484PX: 422.0 (MIN: 357.91 / MAX: 497.55); px: 453.2 (MIN: 352.31 / MAX: 513.31).
OSPRay 3.2 - Benchmark: gravity_spheres_volume/dim_512/pathtracer/real_time (Items Per Second, more is better): a: 8.82093; 4484PX: 6.41198; px: 6.40740.
Renaissance 0.16 - Test: Scala Dotty (ms, fewer is better): 4484PX: 428.6 (MIN: 378.22 / MAX: 628.77); px: 436.2 (MIN: 380.62 / MAX: 721.56); a: 477.0 (MIN: 371.54 / MAX: 736.5).
ONNX Runtime 1.19 - Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Standard (Inference Time Cost in ms, fewer is better): a: 648.52; 4484PX: 850.14; px: 854.33. Compiler: (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Standard (Inferences Per Second, more is better): a: 1.54196; 4484PX: 1.17627; px: 1.17050. Compiler: (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
Renaissance 0.16 - Test: Genetic Algorithm Using Jenetics + Futures (ms, fewer is better): a: 732.8 (MIN: 713.67 / MAX: 813.49); 4484PX: 904.0 (MIN: 886.83 / MAX: 919.31); px: 920.7 (MIN: 888.75 / MAX: 934.44).
ONNX Runtime 1.19 - Model: GPT-2 - Device: CPU - Executor: Standard (Inference Time Cost in ms, fewer is better): 4484PX: 6.25815; px: 6.33034; a: 7.42776. Compiler: (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: GPT-2 - Device: CPU - Executor: Standard (Inferences Per Second, more is better): 4484PX: 159.71; px: 157.89; a: 134.60. Compiler: (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: fcn-resnet101-11 - Device: CPU - Executor: Standard (Inference Time Cost in ms, fewer is better): a: 310.88; 4484PX: 355.75; px: 357.60. Compiler: (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: fcn-resnet101-11 - Device: CPU - Executor: Standard (Inferences Per Second, more is better): a: 3.21670; 4484PX: 2.81093; px: 2.79638. Compiler: (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: ZFNet-512 - Device: CPU - Executor: Standard (Inference Time Cost in ms, fewer is better): 4484PX: 9.01322; px: 9.01687; a: 9.76985. Compiler: (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: ZFNet-512 - Device: CPU - Executor: Standard (Inferences Per Second, more is better): 4484PX: 110.94; px: 110.89; a: 102.33. Compiler: (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: bertsquad-12 - Device: CPU - Executor: Standard (Inference Time Cost in ms, fewer is better): a: 64.14; px: 68.61; 4484PX: 68.91. Compiler: (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: bertsquad-12 - Device: CPU - Executor: Standard (Inferences Per Second, more is better): a: 15.59; px: 14.57; 4484PX: 14.51. Compiler: (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: T5 Encoder - Device: CPU - Executor: Standard (Inference Time Cost in ms, fewer is better): 4484PX: 4.80287; px: 4.85142; a: 6.39112. Compiler: (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: T5 Encoder - Device: CPU - Executor: Standard (Inferences Per Second, more is better): 4484PX: 208.17; px: 206.09; a: 156.45. Compiler: (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: yolov4 - Device: CPU - Executor: Standard (Inference Time Cost in ms, fewer is better): a: 90.45; 4484PX: 93.16; px: 93.34. Compiler: (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: yolov4 - Device: CPU - Executor: Standard (Inferences Per Second, more is better): a: 11.06; 4484PX: 10.73; px: 10.71. Compiler: (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard (Inference Time Cost in ms, fewer is better): a: 23.55; 4484PX: 26.75; px: 26.95. Compiler: (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard (Inferences Per Second, more is better): a: 42.45; 4484PX: 37.38; px: 37.10. Compiler: (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard (Inference Time Cost in ms, fewer is better): a: 21.24; px: 23.06; 4484PX: 24.94. Compiler: (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard (Inferences Per Second, more is better): a: 47.07; px: 43.36; 4484PX: 40.09. Compiler: (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard (Inference Time Cost in ms, fewer is better): 4484PX: 1.06188; px: 1.06600; a: 1.57084. Compiler: (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard (Inferences Per Second, more is better): 4484PX: 941.40; px: 937.78; a: 636.32. Compiler: (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard (Inference Time Cost in ms, fewer is better): a: 2.55898; 4484PX: 2.80544; px: 2.80695. Compiler: (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard (Inferences Per Second, more is better): a: 390.60; 4484PX: 356.41; px: 356.19. Compiler: (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: super-resolution-10 - Device: CPU - Executor: Standard (Inference Time Cost in ms, fewer is better): a: 7.08601; 4484PX: 7.98873; px: 7.99486. Compiler: (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: super-resolution-10 - Device: CPU - Executor: Standard (Inferences Per Second, more is better): a: 141.12; 4484PX: 125.17; px: 125.08. Compiler: (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
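The ONNX Runtime results above report each model twice, once as inference time cost (ms) and once as inferences per second. A minimal sketch follows, assuming the two metrics are roughly reciprocal (throughput ≈ 1000 / latency in ms); that relationship is an assumption inferred from the numbers, not something the export states. The helper name `implied_throughput` and the selection of sample pairs are illustrative only, and small deviations are expected if each metric is averaged separately.

```python
# Sample (latency_ms, inferences_per_second) pairs copied from the
# ONNX Runtime 1.19 results listed above. The keys are "model / run label".
pairs = {
    "ResNet101_DUC_HDC-12 / a": (648.52, 1.54196),
    "GPT-2 / 4484PX": (6.25815, 159.71),
    "T5 Encoder / a": (6.39112, 156.45),
    "CaffeNet 12-int8 / px": (1.06600, 937.78),
}


def implied_throughput(latency_ms: float) -> float:
    """Throughput implied by a latency, assuming one inference at a time."""
    return 1000.0 / latency_ms


for name, (latency_ms, reported_ips) in pairs.items():
    implied = implied_throughput(latency_ms)
    # Relative gap between the implied and the reported throughput.
    rel_gap = abs(implied - reported_ips) / reported_ips
    print(f"{name}: implied {implied:.3f}/s, reported {reported_ips}/s, "
          f"gap {rel_gap:.3%}")
```

For these samples the implied and reported throughputs agree to well under one percent, so either metric alone is enough to compare the three runs.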
PyPerformance 1.11 - Benchmark: asyncio_websockets (Milliseconds, fewer is better): a: 315; 4484PX: 321; px: 322.
CP2K Molecular Dynamics 2024.3 - Input: H20-64 (Seconds, fewer is better): px: 52.72; 4484PX: 53.01; a: 58.19. Compiler: (F9X) gfortran options: -fopenmp -march=native -mtune=native -O3 -funroll-loops -fbacktrace -ffree-form -fimplicit-none -std=f2008 -lcp2kstart -lcp2kmc -lcp2kswarm -lcp2kmotion -lcp2kthermostat -lcp2kemd -lcp2ktmc -lcp2kmain -lcp2kdbt -lcp2ktas -lcp2kgrid -lcp2kgriddgemm -lcp2kgridcpu -lcp2kgridref -lcp2kgridcommon -ldbcsrarnoldi -ldbcsrx -lcp2kdbx -lcp2kdbm -lcp2kshg_int -lcp2keri_mme -lcp2kminimax -lcp2khfxbase -lcp2ksubsys -lcp2kxc -lcp2kao -lcp2kpw_env -lcp2kinput -lcp2kpw -lcp2kgpu -lcp2kfft -lcp2kfpga -lcp2kfm -lcp2kcommon -lcp2koffload -lcp2kmpiwrap -lcp2kbase -ldbcsr -lsirius -lspla -lspfft -lsymspg -lvdwxc -l:libhdf5_fortran.a -l:libhdf5.a -lz -lgsl -lelpa_openmp -lcosma -lcosta -lscalapack -lxsmmf -lxsmm -ldl -lpthread -llibgrpp -lxcf03 -lxc -lint2 -lfftw3_mpi -lfftw3 -lfftw3_omp -lmpi_cxx -lmpi -l:libopenblas.a -lvori -lstdc++ -lmpi_usempif08 -lmpi_mpifh -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm
ACES DGEMM 1.0 - Sustained Floating-Point Rate (GFLOP/s, more is better): a: 1141.19; 4484PX: 842.73; px: 842.01. Compiler: (CC) gcc options: -ffast-math -mavx2 -O3 -fopenmp -lopenblas
Llama.cpp b4154 - Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 2048 (Tokens Per Second, more is better): a: 279.04; 4484PX: 222.75; px: 208.99. Compiler: (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas
Llamafile 0.8.16 - Model: Llama-3.2-3B-Instruct.Q6_K - Test: Prompt Processing 1024 (Tokens Per Second, more is better): px: 16384; 4484PX: 16384; a: 16384.
Llamafile 0.8.16 - Model: TinyLlama-1.1B-Chat-v1.0.BF16 - Test: Prompt Processing 2048 (Tokens Per Second, more is better): px: 32768; 4484PX: 32768; a: 32768.
LiteRT 2024-10-15 - Model: Inception V4 (Microseconds, fewer is better): a: 21477.8; 4484PX: 22083.3; px: 22752.4.
LiteRT 2024-10-15 - Model: Inception ResNet V2 (Microseconds, fewer is better): 4484PX: 19477.8; px: 19490.7; a: 19530.2.
LiteRT 2024-10-15 - Model: NASNet Mobile (Microseconds, fewer is better): px: 7931.64; 4484PX: 8057.56; a: 16936.00.
LiteRT 2024-10-15 - Model: DeepLab V3 (Microseconds, fewer is better): 4484PX: 2343.38; px: 2359.99; a: 3579.67.
LiteRT 2024-10-15 - Model: Mobilenet Float (Microseconds, fewer is better): a: 1211.48; px: 1244.51; 4484PX: 1244.70.
LiteRT 2024-10-15 - Model: SqueezeNet (Microseconds, fewer is better): a: 1794.11; 4484PX: 1809.18; px: 1821.35.
LiteRT 2024-10-15 - Model: Quantized COCO SSD MobileNet v1 (Microseconds, fewer is better): px: 1417.35; 4484PX: 1420.15; a: 2129.52.
LiteRT 2024-10-15 - Model: Mobilenet Quant (Microseconds, fewer is better): a: 823.17; 4484PX: 848.94; px: 849.21.
Rustls 0.23.17 - Benchmark: handshake-ticket - Suite: TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (handshakes/s, more is better): a: 2620332.00; px: 2292879.44; 4484PX: 2282729.64. Compiler: (CC) gcc options: -m64 -lgcc_s -lutil -lrt -lpthread -lm -ldl -lc -pie -nodefaultlibs
Llamafile 0.8.16 - Model: wizardcoder-python-34b-v1.0.Q6_K - Test: Prompt Processing 256 (Tokens Per Second, more is better): px: 1536; 4484PX: 1536; a: 1536.
Rustls 0.23.17 - Benchmark: handshake-resume - Suite: TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (handshakes/s, more is better): a: 3563852.57; px: 3038723.48; 4484PX: 3035330.21. Compiler: (CC) gcc options: -m64 -lgcc_s -lutil -lrt -lpthread -lm -ldl -lc -pie -nodefaultlibs
Llama.cpp b4154 - Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 512 (Tokens Per Second, more is better): a: 70.76; 4484PX: 69.11; px: 67.95. Compiler: (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas
Llamafile 0.8.16 - Model: wizardcoder-python-34b-v1.0.Q6_K - Test: Text Generation 16 (Tokens Per Second, more is better): px: 1.84; 4484PX: 1.83; a: 1.78.
Llama.cpp b4154 - Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 512 (Tokens Per Second, more is better): px: 68.81; a: 68.40; 4484PX: 68.20. Compiler: (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas
FinanceBench 2016-07-25 - Benchmark: Bonds OpenMP (ms, fewer is better): a: 33061.22; 4484PX: 34600.77; px: 34896.84. Compiler: (CXX) g++ options: -O3 -march=native -fopenmp
NAMD 3.0 Input: ATPase with 327,506 Atoms (ns/day, More Is Better): a: 2.79632; 4484PX: 2.38124; px: 2.35379.
SVT-AV1 2.3 Encoder Mode: Preset 5 - Input: Bosphorus 1080p (Frames Per Second, More Is Better): a: 101.97; 4484PX: 88.42; px: 88.27. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
Whisperfile 20Aug24 Model Size: Tiny (Seconds, Fewer Is Better): 4484PX: 37.13; px: 38.72; a: 41.71.
PyPerformance 1.11 Benchmark: django_template (Milliseconds, Fewer Is Better): a: 20.7; 4484PX: 21.0; px: 21.2.
ASTC Encoder 5.0 Preset: Thorough (MT/s, More Is Better): a: 20.30; 4484PX: 14.17; px: 14.15. (CXX) g++ options: -O3 -flto -pthread
OpenVINO GenAI 2024.5 Model: Phi-3-mini-128k-instruct-int4-ov - Device: CPU - Time Per Output Token (ms, Fewer Is Better): px: 49.28; 4484PX: 49.31; a: 51.86.
OpenVINO GenAI 2024.5 Model: Phi-3-mini-128k-instruct-int4-ov - Device: CPU - Time To First Token (ms, Fewer Is Better): a: 55.93; px: 58.86; 4484PX: 58.91.
OpenVINO GenAI 2024.5 Model: Phi-3-mini-128k-instruct-int4-ov - Device: CPU (tokens/s, More Is Better): px: 20.29; 4484PX: 20.28; a: 19.28.
Etcpak 2.0 Benchmark: Multi-Threaded - Configuration: ETC2 (Mpx/s, More Is Better): a: 577.82; 4484PX: 410.73; px: 409.88. (CXX) g++ options: -flto -pthread
PyPerformance 1.11 Benchmark: raytrace (Milliseconds, Fewer Is Better): a: 175; 4484PX: 182; px: 182.
PyPerformance 1.11 Benchmark: crypto_pyaes (Milliseconds, Fewer Is Better): a: 41.7; 4484PX: 43.1; px: 43.3.
PyPerformance 1.11 Benchmark: float (Milliseconds, Fewer Is Better): a: 50.7; px: 50.8; 4484PX: 51.3.
Llamafile 0.8.16 Model: mistral-7b-instruct-v0.2.Q5_K_M - Test: Prompt Processing 256 (Tokens Per Second, More Is Better): px: 4096; 4484PX: 4096; a: 4096.
PyPerformance 1.11 Benchmark: go (Milliseconds, Fewer Is Better): a: 77.8; 4484PX: 78.6; px: 79.4.
FinanceBench 2016-07-25 Benchmark: Repo OpenMP (ms, Fewer Is Better): a: 21418.45; px: 22318.74; 4484PX: 22320.33. (CXX) g++ options: -O3 -march=native -fopenmp
SVT-AV1 2.3 Encoder Mode: Preset 8 - Input: Bosphorus 4K (Frames Per Second, More Is Better): a: 102.01; 4484PX: 85.20; px: 85.00. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
PyPerformance 1.11 Benchmark: chaos (Milliseconds, Fewer Is Better): a: 38.2; px: 39.4; 4484PX: 39.7.
PyPerformance 1.11 Benchmark: regex_compile (Milliseconds, Fewer Is Better): a: 69.8; 4484PX: 71.7; px: 72.5.
Rustls 0.23.17 Benchmark: handshake - Suite: TLS13_CHACHA20_POLY1305_SHA256 (handshakes/s, More Is Better): a: 76454.45; 4484PX: 57716.64; px: 57688.08. (CC) gcc options: -m64 -lgcc_s -lutil -lrt -lpthread -lm -ldl -lc -pie -nodefaultlibs
Rustls 0.23.17 Benchmark: handshake - Suite: TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (handshakes/s, More Is Better): a: 80462.60; 4484PX: 59308.75; px: 59206.34. (CC) gcc options: -m64 -lgcc_s -lutil -lrt -lpthread -lm -ldl -lc -pie -nodefaultlibs
PyPerformance 1.11 Benchmark: pickle_pure_python (Milliseconds, Fewer Is Better): a: 165; px: 168; 4484PX: 169.
Llamafile 0.8.16 Model: Llama-3.2-3B-Instruct.Q6_K - Test: Prompt Processing 512 (Tokens Per Second, More Is Better): px: 8192; 4484PX: 8192; a: 8192.
PyPerformance 1.11 Benchmark: pathlib (Milliseconds, Fewer Is Better): a: 14.2; 4484PX: 14.4; px: 14.4.
POV-Ray Trace Time (Seconds, Fewer Is Better): a: 18.54; 4484PX: 25.26; px: 25.33. POV-Ray 3.7.0.10.unofficial
oneDNN 3.6 Harness: Deconvolution Batch shapes_1d - Engine: CPU (ms, Fewer Is Better): a: 2.97612 (MIN: 2.42); 4484PX: 3.40293 (MIN: 3.03); px: 3.40628 (MIN: 3.03). (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl
Llama.cpp b4154 Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 1024 (Tokens Per Second, More Is Better): a: 355.09; px: 244.77; 4484PX: 232.26. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas
Llamafile 0.8.16 Model: mistral-7b-instruct-v0.2.Q5_K_M - Test: Text Generation 16 (Tokens Per Second, More Is Better): px: 10.45; 4484PX: 10.45; a: 10.22.
PyPerformance 1.11 Benchmark: json_loads (Milliseconds, Fewer Is Better): a: 12.1; 4484PX: 12.4; px: 12.5.
Llamafile 0.8.16 Model: TinyLlama-1.1B-Chat-v1.0.BF16 - Test: Prompt Processing 1024 (Tokens Per Second, More Is Better): px: 16384; 4484PX: 16384; a: 16384.
PyPerformance 1.11 Benchmark: nbody (Milliseconds, Fewer Is Better): a: 59.0; px: 59.2; 4484PX: 59.5.
Y-Cruncher 0.8.5 Pi Digits To Calculate: 1B (Seconds, Fewer Is Better): px: 18.37; 4484PX: 18.38; a: 18.49.
x265 Video Input: Bosphorus 4K (Frames Per Second, More Is Better): a: 32.57; 4484PX: 27.16; px: 26.94. x265 [info]: HEVC encoder version 3.5+1-f0c1022b6
7-Zip Compression Test: Decompression Rating (MIPS, More Is Better): a: 165916; 4484PX: 125698; px: 125605. 7-Zip 23.01 (x64) : Copyright (c) 1999-2023 Igor Pavlov : 2023-06-20
7-Zip Compression Test: Compression Rating (MIPS, More Is Better): a: 163859; px: 142213; 4484PX: 141263. 7-Zip 23.01 (x64) : Copyright (c) 1999-2023 Igor Pavlov : 2023-06-20
ASTC Encoder 5.0 Preset: Fast (MT/s, More Is Better): a: 396.65; 4484PX: 278.24; px: 277.30. (CXX) g++ options: -O3 -flto -pthread
oneDNN 3.6 Harness: IP Shapes 1D - Engine: CPU (ms, Fewer Is Better): a: 1.12573 (MIN: 1.03); 4484PX: 1.93806 (MIN: 1.92); px: 1.93913 (MIN: 1.91). (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl
Llama.cpp b4154 Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Text Generation 128 (Tokens Per Second, More Is Better): px: 52.37; 4484PX: 52.30; a: 47.72. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas
SVT-AV1 2.3 Encoder Mode: Preset 13 - Input: Bosphorus 4K (Frames Per Second, More Is Better): a: 212.52; 4484PX: 198.11; px: 194.02. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
Llamafile 0.8.16 Model: Llama-3.2-3B-Instruct.Q6_K - Test: Prompt Processing 256 (Tokens Per Second, More Is Better): px: 4096; 4484PX: 4096; a: 4096.
Llamafile 0.8.16 Model: Llama-3.2-3B-Instruct.Q6_K - Test: Text Generation 16 (Tokens Per Second, More Is Better): px: 19.50; 4484PX: 19.49; a: 19.03.
SVT-AV1 2.3 Encoder Mode: Preset 8 - Input: Bosphorus 1080p (Frames Per Second, More Is Better): a: 339.02; 4484PX: 287.05; px: 286.96. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
Llama.cpp b4154 Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 512 (Tokens Per Second, More Is Better): a: 327.30; 4484PX: 243.14; px: 232.86. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas
ASTC Encoder 5.0 Preset: Medium (MT/s, More Is Better): a: 156.22; 4484PX: 109.03; px: 108.86. (CXX) g++ options: -O3 -flto -pthread
Y-Cruncher 0.8.5 Pi Digits To Calculate: 500M (Seconds, Fewer Is Better): px: 8.623; 4484PX: 8.688; a: 8.772.
Llamafile 0.8.16 Model: TinyLlama-1.1B-Chat-v1.0.BF16 - Test: Text Generation 16 (Tokens Per Second, More Is Better): px: 25.94; 4484PX: 25.86; a: 24.59.
oneDNN 3.6 Harness: IP Shapes 3D - Engine: CPU (ms, Fewer Is Better): px: 2.72942 (MIN: 2.7); 4484PX: 2.73072 (MIN: 2.7); a: 4.05800 (MIN: 3.75). (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl
Llamafile 0.8.16 Model: TinyLlama-1.1B-Chat-v1.0.BF16 - Test: Prompt Processing 512 (Tokens Per Second, More Is Better): px: 8192; 4484PX: 8192; a: 8192.
Primesieve 12.6 Length: 1e12 (Seconds, Fewer Is Better): a: 6.347; 4484PX: 9.116; px: 9.147. (CXX) g++ options: -O3
oneDNN 3.6 Harness: Convolution Batch Shapes Auto - Engine: CPU (ms, Fewer Is Better): 4484PX: 4.11551 (MIN: 4.05); px: 4.13321 (MIN: 4.07); a: 6.67287 (MIN: 6.2). (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl
x265 Video Input: Bosphorus 1080p (Frames Per Second, More Is Better): a: 114.45; 4484PX: 101.37; px: 101.25. x265 [info]: HEVC encoder version 3.5+1-f0c1022b6
SVT-AV1 2.3 Encoder Mode: Preset 13 - Input: Bosphorus 1080p (Frames Per Second, More Is Better): a: 842.56; 4484PX: 776.12; px: 769.82. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
Llamafile 0.8.16 Model: TinyLlama-1.1B-Chat-v1.0.BF16 - Test: Prompt Processing 256 (Tokens Per Second, More Is Better): px: 4096; 4484PX: 4096; a: 4096.
oneDNN 3.6 Harness: Deconvolution Batch shapes_3d - Engine: CPU (ms, Fewer Is Better): a: 2.41294 (MIN: 2.34); 4484PX: 3.50840 (MIN: 3.46); px: 3.51243 (MIN: 3.47). (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl
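Because the results mix "More Is Better" and "Fewer Is Better" metrics, comparing two configurations requires flipping the ratio for time-based tests. The following is a minimal Python sketch of that arithmetic, not part of the Phoronix Test Suite: the helper name `relative_advantage` is hypothetical, the three sample values are copied from the results above, and treating 4484PX as the baseline and `a` as the candidate is an assumption made only for illustration.

```python
def relative_advantage(baseline: float, candidate: float, more_is_better: bool) -> float:
    """Return the candidate's advantage over the baseline as a percentage.

    For "More Is Better" metrics the ratio is candidate / baseline.
    For "Fewer Is Better" metrics it is inverted (baseline / candidate),
    so a positive result always means the candidate performed better.
    """
    ratio = candidate / baseline if more_is_better else baseline / candidate
    return (ratio - 1.0) * 100.0


# (test name, 4484PX value, a value, more_is_better), copied from the results above.
samples = [
    ("Rustls handshake, TLS13_CHACHA20_POLY1305_SHA256", 57716.64, 76454.45, True),
    ("POV-Ray Trace Time", 25.26, 18.54, False),
    ("oneDNN IP Shapes 3D", 2.73072, 4.05800, False),
]

for name, baseline, candidate, more_is_better in samples:
    pct = relative_advantage(baseline, candidate, more_is_better)
    print(f"{name}: a vs 4484PX = {pct:+.1f}%")
```

A negative percentage, as for the oneDNN IP Shapes 3D sample, indicates that the baseline configuration was the faster one in that test.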
Phoronix Test Suite v10.8.5