eoy2024

Benchmarks for a future article. AMD EPYC 4484PX 12-Core testing with a Supermicro AS-3015A-I H13SAE-MF v1.00 (2.1 BIOS) and ASPEED on Ubuntu 24.04 via the Phoronix Test Suite.

HTML result view exported from: https://openbenchmarking.org/result/2412086-NE-EOY20243255&grs&rdt.

eoy2024 system details

System a:
  Processor: AMD EPYC 4564P 16-Core @ 5.88GHz (16 Cores / 32 Threads)
  Motherboard: Supermicro AS-3015A-I H13SAE-MF v1.00 (2.1 BIOS)
  Chipset: AMD Device 14d8
  Memory: 2 x 32GB DRAM-4800MT/s Micron MTC20C2085S1EC48BA1 BC
  Disk: 3201GB Micron_7450_MTFDKCC3T2TFS + 960GB SAMSUNG MZ1L2960HCJR-00A07
  Graphics: ASPEED
  Audio: AMD Rembrandt Radeon HD Audio
  Monitor: VA2431
  Network: 2 x Intel I210
  OS: Ubuntu 24.04
  Kernel: 6.8.0-11-generic (x86_64)
  Desktop: GNOME Shell 45.3
  Display Server: X Server 1.21.1.11
  Compiler: GCC 13.2.0
  File-System: ext4
  Screen Resolution: 1024x768

System 4484PX (only the fields that differ from a are listed in the export):
  Processor: AMD EPYC 4484PX 12-Core @ 5.66GHz (12 Cores / 24 Threads)
  Kernel: 6.12.2-061202-generic (x86_64)

System px: no further differing fields are listed in the export.

Kernel Details
  Transparent Huge Pages: madvise

Compiler Details
  --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-fxIygj/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-fxIygj/gcc-13-13.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v

Processor Details
  a: Scaling Governor: amd-pstate-epp performance (EPP: performance) - CPU Microcode: 0xa601209
  4484PX: Scaling Governor: amd-pstate-epp performance (Boost: Enabled EPP: performance) - CPU Microcode: 0xa601209
  px: Scaling Governor: amd-pstate-epp performance (Boost: Enabled EPP: performance) - CPU Microcode: 0xa601209

Java Details
  OpenJDK Runtime Environment (build 21.0.2+13-Ubuntu-2)

Python Details
  Python 3.12.3

Security Details
  a: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_rstack_overflow: Mitigation of Safe RET + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS IBPB: conditional STIBP: always-on RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
  4484PX: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + reg_file_data_sampling: Not affected + retbleed: Not affected + spec_rstack_overflow: Mitigation of Safe RET + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; STIBP: always-on; RSB filling; PBRSB-eIBRS: Not affected; BHI: Not affected + srbds: Not affected + tsx_async_abort: Not affected
  px: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + reg_file_data_sampling: Not affected + retbleed: Not affected + spec_rstack_overflow: Mitigation of Safe RET + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; STIBP: always-on; RSB filling; PBRSB-eIBRS: Not affected; BHI: Not affected + srbds: Not affected + tsx_async_abort: Not affected

eoy2024 result overview: combined table of every test for systems a, 4484PX, and px. The individual results follow, one test per section.

LiteRT

Model: NASNet Mobile

Microseconds, Fewer Is Better (LiteRT 2024-10-15)
a: 16936.00
4484PX: 8057.56
px: 7931.64
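
Because some results in this file are "Fewer Is Better" and others are "More Is Better", comparing systems means flipping the ratio depending on the direction. The following is a minimal sketch of that arithmetic using the NASNet Mobile figures above; the helper name `relative_performance` and the choice of system a as the baseline are assumptions made for illustration and are not part of the Phoronix Test Suite.

```python
def relative_performance(baseline: float, candidate: float, fewer_is_better: bool) -> float:
    """Return how many times better `candidate` is than `baseline` (> 1 means better)."""
    if baseline <= 0 or candidate <= 0:
        raise ValueError("benchmark results must be positive")
    # For time-like metrics a smaller number wins, so the ratio is inverted.
    return baseline / candidate if fewer_is_better else candidate / baseline


# LiteRT NASNet Mobile results in microseconds (fewer is better), copied from the result above.
results_us = {"a": 16936.00, "4484PX": 8057.56, "px": 7931.64}

# Hypothetical choice: treat system "a" as the baseline for every comparison.
speedups = {
    name: relative_performance(results_us["a"], value, fewer_is_better=True)
    for name, value in results_us.items()
}

for name, speedup in speedups.items():
    print(f"{name}: {speedup:.2f}x relative to a")
```

The same helper applies to throughput results by passing `fewer_is_better=False`.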

oneDNN

Harness: IP Shapes 1D - Engine: CPU

ms, Fewer Is Better (oneDNN 3.6)
a: 1.12573 (MIN: 1.03)
4484PX: 1.93806 (MIN: 1.92)
px: 1.93913 (MIN: 1.91)
(CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl

oneDNN

Harness: Convolution Batch Shapes Auto - Engine: CPU

ms, Fewer Is Better (oneDNN 3.6)
a: 6.67287 (MIN: 6.2)
4484PX: 4.11551 (MIN: 4.05)
px: 4.13321 (MIN: 4.07)
(CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl

BYTE Unix Benchmark

Computational Test: System Call

LPS, More Is Better (BYTE Unix Benchmark 5.1.3-git)
a: 49140426.6
4484PX: 30761218.9
px: 30701622.8
(CC) gcc options: -pedantic -O3 -ffast-math -march=native -mtune=native -lm

Apache Cassandra

Test: Writes

Op/s, More Is Better (Apache Cassandra 5.0)
a: 271333
4484PX: 174960
px: 173946

Llama.cpp

Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 1024

Tokens Per Second, More Is Better (Llama.cpp b4154)
a: 355.09
4484PX: 232.26
px: 244.77
(CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas

LiteRT

Model: DeepLab V3

Microseconds, Fewer Is Better (LiteRT 2024-10-15)
a: 3579.67
4484PX: 2343.38
px: 2359.99

LiteRT

Model: Quantized COCO SSD MobileNet v1

Microseconds, Fewer Is Better (LiteRT 2024-10-15)
a: 2129.52
4484PX: 1420.15
px: 1417.35

oneDNN

Harness: IP Shapes 3D - Engine: CPU

ms, Fewer Is Better (oneDNN 3.6)
a: 4.05800 (MIN: 3.75)
4484PX: 2.73072 (MIN: 2.7)
px: 2.72942 (MIN: 2.7)
(CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl

ONNX Runtime

Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard

Inferences Per Second, More Is Better (ONNX Runtime 1.19)
a: 636.32
4484PX: 941.40
px: 937.78
(CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

BYTE Unix Benchmark

Computational Test: Pipe

LPS, More Is Better (BYTE Unix Benchmark 5.1.3-git)
a: 48806257.1
4484PX: 33443359.2
px: 33381363.1
(CC) gcc options: -pedantic -O3 -ffast-math -march=native -mtune=native -lm

oneDNN

Harness: Deconvolution Batch shapes_3d - Engine: CPU

ms, Fewer Is Better (oneDNN 3.6)
a: 2.41294 (MIN: 2.34)
4484PX: 3.50840 (MIN: 3.46)
px: 3.51243 (MIN: 3.47)
(CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl

Primesieve

Length: 1e12

Seconds, Fewer Is Better (Primesieve 12.6)
a: 6.347
4484PX: 9.116
px: 9.147
(CXX) g++ options: -O3

ASTC Encoder

Preset: Thorough

MT/s, More Is Better (ASTC Encoder 5.0)
a: 20.30
4484PX: 14.17
px: 14.15
(CXX) g++ options: -O3 -flto -pthread

ASTC Encoder

Preset: Medium

MT/s, More Is Better (ASTC Encoder 5.0)
a: 156.22
4484PX: 109.03
px: 108.86
(CXX) g++ options: -O3 -flto -pthread

ASTC Encoder

Preset: Fast

MT/s, More Is Better (ASTC Encoder 5.0)
a: 396.65
4484PX: 278.24
px: 277.30
(CXX) g++ options: -O3 -flto -pthread

ASTC Encoder

Preset: Exhaustive

MT/s, More Is Better (ASTC Encoder 5.0)
a: 1.6844
4484PX: 1.1887
px: 1.1862
(CXX) g++ options: -O3 -flto -pthread

ASTC Encoder

Preset: Very Thorough

MT/s, More Is Better (ASTC Encoder 5.0)
a: 2.7410
4484PX: 1.9412
px: 1.9391
(CXX) g++ options: -O3 -flto -pthread
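
One common way to condense several related results into a single figure is the geometric mean of per-test ratios. The sketch below applies it to the five ASTC Encoder presets listed above, comparing system a against 4484PX. The dictionary layout and variable names are illustrative assumptions; the export itself does not state an aggregate for these tests.

```python
from statistics import geometric_mean

# ASTC Encoder 5.0 throughput in MT/s (more is better), as (a, 4484PX) pairs
# copied from the five preset results above.
astc_mt_per_s = {
    "Thorough": (20.30, 14.17),
    "Medium": (156.22, 109.03),
    "Fast": (396.65, 278.24),
    "Exhaustive": (1.6844, 1.1887),
    "Very Thorough": (2.7410, 1.9412),
}

# Throughput metric, so a ratio above 1 means system a is faster than 4484PX.
ratios = [a / epyc_4484px for a, epyc_4484px in astc_mt_per_s.values()]

# The geometric mean is used because it treats ratios symmetrically,
# unlike an arithmetic mean of the raw MT/s values.
astc_geomean = geometric_mean(ratios)

print(f"a vs 4484PX across ASTC presets: {astc_geomean:.3f}x")
```

For "Fewer Is Better" tests the ratio would need to be inverted before averaging.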

Primesieve

Length: 1e13

Seconds, Fewer Is Better (Primesieve 12.6)
a: 78.50
4484PX: 110.61
px: 110.71
(CXX) g++ options: -O3

Etcpak

Benchmark: Multi-Threaded - Configuration: ETC2

Mpx/s, More Is Better (Etcpak 2.0)
a: 577.82
4484PX: 410.73
px: 409.88
(CXX) g++ options: -flto -pthread

BYTE Unix Benchmark

Computational Test: Whetstone Double

MWIPS, More Is Better (BYTE Unix Benchmark 5.1.3-git)
a: 343491.9
4484PX: 244075.3
px: 244131.0
(CC) gcc options: -pedantic -O3 -ffast-math -march=native -mtune=native -lm

Llama.cpp

Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 512

Tokens Per Second, More Is Better (Llama.cpp b4154)
a: 327.30
4484PX: 243.14
px: 232.86
(CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas

OSPRay

Benchmark: particle_volume/scivis/real_time

Items Per Second, More Is Better (OSPRay 3.2)
a: 8.98486
4484PX: 6.44913
px: 6.52304

Rustls

Benchmark: handshake - Suite: TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384

handshakes/s, More Is Better (Rustls 0.23.17)
a: 423535.68
4484PX: 306153.20
px: 304060.28
(CC) gcc options: -m64 -lgcc_s -lutil -lrt -lpthread -lm -ldl -lc -pie -nodefaultlibs

BYTE Unix Benchmark

Computational Test: Dhrystone 2

LPS, More Is Better (BYTE Unix Benchmark 5.1.3-git)
a: 1866536062.7
4484PX: 1346521770.3
px: 1340340196.6
(CC) gcc options: -pedantic -O3 -ffast-math -march=native -mtune=native -lm

oneDNN

Harness: Recurrent Neural Network Training - Engine: CPU

ms, Fewer Is Better (oneDNN 3.6)
a: 1372.03 (MIN: 1342.06)
4484PX: 1898.36 (MIN: 1894.26)
px: 1895.68 (MIN: 1892.59)
(CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl

Blender

Blend File: BMW27 - Compute: CPU-Only

Seconds, Fewer Is Better (Blender 4.3)
a: 53.55
4484PX: 74.08
px: 73.16

OSPRay

Benchmark: particle_volume/ao/real_time

Items Per Second, More Is Better (OSPRay 3.2)
a: 9.00917
4484PX: 6.52776
px: 6.52206

Stockfish

Chess Benchmark

Nodes Per Second, More Is Better (Stockfish)
a: 46507038
4484PX: 33702298
px: 33871595
Stockfish 16 by the Stockfish developers (see AUTHORS file)

oneDNN

Harness: Recurrent Neural Network Inference - Engine: CPU

ms, Fewer Is Better (oneDNN 3.6)
a: 700.86 (MIN: 679.89)
4484PX: 965.02 (MIN: 963.27)
px: 966.01 (MIN: 963.43)
(CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl

Blender

Blend File: Classroom - Compute: CPU-Only

Seconds, Fewer Is Better (Blender 4.3)
a: 143.36
4484PX: 197.20
px: 197.53

OSPRay

Benchmark: gravity_spheres_volume/dim_512/pathtracer/real_time

Items Per Second, More Is Better (OSPRay 3.2)
a: 8.82093
4484PX: 6.41198
px: 6.40740

OpenSSL

Algorithm: AES-128-GCM

byte/s, More Is Better (OpenSSL)
a: 104784522170
4484PX: 76496336760
px: 76184405610
OpenSSL 3.0.13 30 Jan 2024 (Library: OpenSSL 3.0.13 30 Jan 2024) - Additional Parameters: -engine qatengine -async_jobs 8

OpenSSL

Algorithm: AES-256-GCM

byte/s, More Is Better (OpenSSL)
a: 97172751700
4484PX: 71160291870
px: 70902656480
OpenSSL 3.0.13 30 Jan 2024 (Library: OpenSSL 3.0.13 30 Jan 2024) - Additional Parameters: -engine qatengine -async_jobs 8

OSPRay

Benchmark: gravity_spheres_volume/dim_512/scivis/real_time

Items Per Second, More Is Better (OSPRay 3.2)
a: 7.58789
4484PX: 5.54888
px: 5.61470

POV-Ray

Trace Time

Seconds, Fewer Is Better (POV-Ray)
a: 18.54
4484PX: 25.26
px: 25.33
POV-Ray 3.7.0.10.unofficial

Blender

Blend File: Pabellon Barcelona - Compute: CPU-Only

Seconds, Fewer Is Better (Blender 4.3)
a: 166.12
4484PX: 226.34
px: 224.64

Blender

Blend File: Fishy Cat - Compute: CPU-Only

Seconds, Fewer Is Better (Blender 4.3)
a: 71.35
4484PX: 96.67
px: 97.09

Rustls

Benchmark: handshake - Suite: TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256

handshakes/s, More Is Better (Rustls 0.23.17)
a: 80462.60
4484PX: 59308.75
px: 59206.34
(CC) gcc options: -m64 -lgcc_s -lutil -lrt -lpthread -lm -ldl -lc -pie -nodefaultlibs

OSPRay

Benchmark: gravity_spheres_volume/dim_512/ao/real_time

Items Per Second, More Is Better (OSPRay 3.2)
a: 7.63944
4484PX: 5.63122
px: 5.71084

ACES DGEMM

Sustained Floating-Point Rate

GFLOP/s, More Is Better (ACES DGEMM 1.0)
a: 1141.19
4484PX: 842.73
px: 842.01
(CC) gcc options: -ffast-math -mavx2 -O3 -fopenmp -lopenblas

OpenSSL

Algorithm: ChaCha20

byte/s, More Is Better (OpenSSL)
a: 130588495050
4484PX: 97105235690
px: 97019897450
OpenSSL 3.0.13 30 Jan 2024 (Library: OpenSSL 3.0.13 30 Jan 2024) - Additional Parameters: -engine qatengine -async_jobs 8

OpenSSL

Algorithm: ChaCha20-Poly1305

byte/s, More Is Better (OpenSSL)
a: 92393529340
4484PX: 68816544020
px: 68678955550
OpenSSL 3.0.13 30 Jan 2024 (Library: OpenSSL 3.0.13 30 Jan 2024) - Additional Parameters: -engine qatengine -async_jobs 8

Blender

Blend File: Barbershop - Compute: CPU-Only

Seconds, Fewer Is Better (Blender 4.3)
a: 506.20
4484PX: 679.34
px: 678.40

Llama.cpp

Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 2048

Tokens Per Second, More Is Better (Llama.cpp b4154)
a: 279.04
4484PX: 222.75
px: 208.99
(CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas

ONNX Runtime

Model: T5 Encoder - Device: CPU - Executor: Standard

Inferences Per Second, More Is Better (ONNX Runtime 1.19)
a: 156.45
4484PX: 208.17
px: 206.09
(CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

Rustls

Benchmark: handshake - Suite: TLS13_CHACHA20_POLY1305_SHA256

handshakes/s, More Is Better (Rustls 0.23.17)
a: 76454.45
4484PX: 57716.64
px: 57688.08
(CC) gcc options: -m64 -lgcc_s -lutil -lrt -lpthread -lm -ldl -lc -pie -nodefaultlibs

7-Zip Compression

Test: Decompression Rating

MIPS, More Is Better (7-Zip Compression)
a: 165916
4484PX: 125698
px: 125605
7-Zip 23.01 (x64) : Copyright (c) 1999-2023 Igor Pavlov : 2023-06-20

Blender

Blend File: Junkshop - Compute: CPU-Only

Seconds, Fewer Is Better (Blender 4.3)
a: 73.56
4484PX: 97.01
px: 97.10

ONNX Runtime

Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Standard

Inferences Per Second, More Is Better (ONNX Runtime 1.19)
a: 1.54196
4484PX: 1.17627
px: 1.17050
(CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

RELION

Test: Basic - Device: CPU

Seconds, Fewer Is Better (RELION 5.0)
a: 944.27
4484PX: 729.40
px: 733.02
(CXX) g++ options: -fPIC -std=c++14 -fopenmp -O3 -rdynamic -lfftw3f -lfftw3 -ldl -ltiff -lpng -ljpeg -lmpi_cxx -lmpi

Stockfish

Chess Benchmark

Nodes Per Second, More Is Better (Stockfish 17)
a: 54752796
4484PX: 45267546
px: 42973396
(CXX) g++ options: -lgcov -m64 -lpthread -fno-exceptions -std=c++17 -fno-peel-loops -fno-tracer -pedantic -O3 -funroll-loops -msse -msse3 -mpopcnt -mavx2 -mbmi -mavx512f -mavx512bw -mavx512vnni -mavx512dq -mavx512vl -msse4.1 -mssse3 -msse2 -mbmi2 -flto -flto-partition=one -flto=jobserver

Renaissance

Test: Genetic Algorithm Using Jenetics + Futures

ms, Fewer Is Better (Renaissance 0.16)
a: 732.8 (MIN: 713.67 / MAX: 813.49)
4484PX: 904.0 (MIN: 886.83 / MAX: 919.31)
px: 920.7 (MIN: 888.75 / MAX: 934.44)

SVT-AV1

Encoder Mode: Preset 3 - Input: Bosphorus 4K

Frames Per Second, More Is Better (SVT-AV1 2.3)
a: 9.590
4484PX: 7.684
px: 7.646
(CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq

Build2

Time To Compile

Seconds, Fewer Is Better (Build2 0.17)
a: 92.05
4484PX: 111.65
px: 113.78

XNNPACK

Model: FP16MobileNetV1

us, Fewer Is Better (XNNPACK b7b048)
a: 1143
4484PX: 1383
px: 1386
(CXX) g++ options: -O3 -lrt -lm

XNNPACK

Model: FP32MobileNetV3Small

us, Fewer Is Better (XNNPACK b7b048)
a: 979
4484PX: 809
px: 837
(CXX) g++ options: -O3 -lrt -lm

simdjson

Throughput Test: PartialTweets

GB/s, More Is Better (simdjson 3.10)
a: 9.76
4484PX: 10.10
px: 8.35
(CXX) g++ options: -O3 -lrt

x265

Video Input: Bosphorus 4K

Frames Per Second, More Is Better (x265)
a: 32.57
4484PX: 27.16
px: 26.94
x265 [info]: HEVC encoder version 3.5+1-f0c1022b6

SVT-AV1

Encoder Mode: Preset 3 - Input: Beauty 4K 10-bit

Frames Per Second, More Is Better (SVT-AV1 2.3)
a: 1.422
4484PX: 1.188
px: 1.184
(CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq

SVT-AV1

Encoder Mode: Preset 8 - Input: Bosphorus 4K

Frames Per Second, More Is Better (SVT-AV1 2.3)
a: 102.01
4484PX: 85.20
px: 85.00
(CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq

simdjson

Throughput Test: DistinctUserID

simdjson 3.10 (GB/s, more is better): a = 10.46; 4484PX = 10.76; px = 8.97
Note: (CXX) g++ options: -O3 -lrt

SVT-AV1

Encoder Mode: Preset 5 - Input: Bosphorus 4K

SVT-AV1 2.3 (Frames Per Second, more is better): a = 34.54; 4484PX = 29.09; px = 28.82
Note: (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq

OSPRay

Benchmark: particle_volume/pathtracer/real_time

OSPRay 3.2 (Items Per Second, more is better): a = 236.25; 4484PX = 199.02; px = 197.20

XNNPACK

Model: FP32MobileNetV3Large

XNNPACK b7b048 (us, fewer is better): a = 1810; 4484PX = 1515; px = 1574
Note: (CXX) g++ options: -O3 -lrt -lm

NAMD

Input: ATPase with 327,506 Atoms

NAMD 3.0 (ns/day, more is better): a = 2.79632; 4484PX = 2.38124; px = 2.35379

ONNX Runtime

Model: GPT-2 - Device: CPU - Executor: Standard

ONNX Runtime 1.19 (Inferences Per Second, more is better): a = 134.60; 4484PX = 159.71; px = 157.89
Note: (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

SVT-AV1

Encoder Mode: Preset 8 - Input: Bosphorus 1080p

SVT-AV1 2.3 (Frames Per Second, more is better): a = 339.02; 4484PX = 287.05; px = 286.96
Note: (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq

XNNPACK

Model: FP16MobileNetV3Small

XNNPACK b7b048 (us, fewer is better): a = 920; 4484PX = 779; px = 798
Note: (CXX) g++ options: -O3 -lrt -lm

Rustls

Benchmark: handshake-ticket - Suite: TLS13_CHACHA20_POLY1305_SHA256

Rustls 0.23.17 (handshakes/s, more is better): a = 404263.45; 4484PX = 344296.24; px = 342775.29
Note: (CC) gcc options: -m64 -lgcc_s -lutil -lrt -lpthread -lm -ldl -lc -pie -nodefaultlibs

XNNPACK

Model: QS8MobileNetV2

XNNPACK b7b048 (us, fewer is better): a = 844; 4484PX = 717; px = 723
Note: (CXX) g++ options: -O3 -lrt -lm

Rustls

Benchmark: handshake-resume - Suite: TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256

Rustls 0.23.17 (handshakes/s, more is better): a = 3563852.57; 4484PX = 3035330.21; px = 3038723.48
Note: (CC) gcc options: -m64 -lgcc_s -lutil -lrt -lpthread -lm -ldl -lc -pie -nodefaultlibs

ONNX Runtime

Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard

ONNX Runtime 1.19 (Inferences Per Second, more is better): a = 47.07; 4484PX = 40.09; px = 43.36
Note: (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

SVT-AV1

Encoder Mode: Preset 5 - Input: Beauty 4K 10-bit

SVT-AV1 2.3 (Frames Per Second, more is better): a = 6.504; 4484PX = 5.602; px = 5.551
Note: (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq

Rustls

Benchmark: handshake-ticket - Suite: TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384

Rustls 0.23.17 (handshakes/s, more is better): a = 1553632.14; 4484PX = 1329363.10; px = 1340712.85
Note: (CC) gcc options: -m64 -lgcc_s -lutil -lrt -lpthread -lm -ldl -lc -pie -nodefaultlibs

Whisperfile

Model Size: Small

Whisperfile 20Aug24 (Seconds, fewer is better): a = 195.42; 4484PX = 173.38; px = 167.89

Rustls

Benchmark: handshake-resume - Suite: TLS13_CHACHA20_POLY1305_SHA256

Rustls 0.23.17 (handshakes/s, more is better): a = 388077.69; 4484PX = 333882.92; px = 333574.30
Note: (CC) gcc options: -m64 -lgcc_s -lutil -lrt -lpthread -lm -ldl -lc -pie -nodefaultlibs

SVT-AV1

Encoder Mode: Preset 3 - Input: Bosphorus 1080p

SVT-AV1 2.3 (Frames Per Second, more is better): a = 29.57; 4484PX = 25.45; px = 25.45
Note: (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq

NAMD

Input: STMV with 1,066,628 Atoms

NAMD 3.0 (ns/day, more is better): a = 0.75656; 4484PX = 0.65119; px = 0.65448

7-Zip Compression

Test: Compression Rating

7-Zip Compression (MIPS, more is better): a = 163859; 4484PX = 141263; px = 142213
Note: 7-Zip 23.01 (x64) : Copyright (c) 1999-2023 Igor Pavlov : 2023-06-20

Rustls

Benchmark: handshake-resume - Suite: TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384

Rustls 0.23.17 (handshakes/s, more is better): a = 1820810.21; 4484PX = 1586292.42; px = 1572010.68
Note: (CC) gcc options: -m64 -lgcc_s -lutil -lrt -lpthread -lm -ldl -lc -pie -nodefaultlibs

Whisper.cpp

Model: ggml-medium.en - Input: 2016 State of the Union

Whisper.cpp 1.6.2 (Seconds, fewer is better): a = 700.91; 4484PX = 809.79; px = 809.49
Note: (CXX) g++ options: -O3 -std=c++11 -fPIC -pthread -msse3 -mssse3 -mavx -mf16c -mfma -mavx2 -mavx512f -mavx512cd -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512vnni

SVT-AV1

Encoder Mode: Preset 5 - Input: Bosphorus 1080p

SVT-AV1 2.3 (Frames Per Second, more is better): a = 101.97; 4484PX = 88.42; px = 88.27
Note: (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq

PyPerformance

Benchmark: async_tree_io

PyPerformance 1.11 (Milliseconds, fewer is better): a = 755; 4484PX = 666; px = 656

ONNX Runtime

Model: fcn-resnet101-11 - Device: CPU - Executor: Standard

ONNX Runtime 1.19 (Inferences Per Second, more is better): a = 3.21670; 4484PX = 2.81093; px = 2.79638
Note: (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

SVT-AV1

Encoder Mode: Preset 8 - Input: Beauty 4K 10-bit

SVT-AV1 2.3 (Frames Per Second, more is better): a = 12.47; 4484PX = 10.97; px = 10.86
Note: (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq

Timed Eigen Compilation

Time To Compile

Timed Eigen Compilation 3.4.0 (Seconds, fewer is better): a = 58.66; 4484PX = 67.36; px = 67.08

Rustls

Benchmark: handshake-ticket - Suite: TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256

Rustls 0.23.17 (handshakes/s, more is better): a = 2620332.00; 4484PX = 2282729.64; px = 2292879.44
Note: (CC) gcc options: -m64 -lgcc_s -lutil -lrt -lpthread -lm -ldl -lc -pie -nodefaultlibs

oneDNN

Harness: Deconvolution Batch shapes_1d - Engine: CPU

oneDNN 3.6 (ms, fewer is better): a = 2.97612 (MIN: 2.42); 4484PX = 3.40293 (MIN: 3.03); px = 3.40628 (MIN: 3.03)
Note: (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl

ONNX Runtime

Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard

ONNX Runtime 1.19 (Inferences Per Second, more is better): a = 42.45; 4484PX = 37.38; px = 37.10
Note: (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

Renaissance

Test: Gaussian Mixture Model

Renaissance 0.16 (ms, fewer is better): a = 3399.5 (MIN: 2471.52); 4484PX = 3860.6 (MIN: 2758.89 / MAX: 3860.61); px = 3815.2 (MIN: 2749.56 / MAX: 3815.24)

x265

Video Input: Bosphorus 1080p

x265 (Frames Per Second, more is better): a = 114.45; 4484PX = 101.37; px = 101.25
Note: x265 [info]: HEVC encoder version 3.5+1-f0c1022b6

Whisperfile

Model Size: Medium

Whisperfile 20Aug24 (Seconds, fewer is better): a = 534.92; 4484PX = 473.55; px = 475.51

ONNX Runtime

Model: super-resolution-10 - Device: CPU - Executor: Standard

ONNX Runtime 1.19 (Inferences Per Second, more is better): a = 141.12; 4484PX = 125.17; px = 125.08
Note: (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

Renaissance

Test: Apache Spark PageRank

Renaissance 0.16 (ms, fewer is better): a = 2412.2 (MIN: 1691.04); 4484PX = 2138.1 (MIN: 1499.64); px = 2229.7 (MIN: 1612.96 / MAX: 2229.74)

Apache CouchDB

Bulk Size: 300 - Inserts: 1000 - Rounds: 30

Apache CouchDB 3.4.1 (Seconds, fewer is better): a = 106.13; 4484PX = 117.57; px = 119.35
Note: (CXX) g++ options: -flto -lstdc++ -shared -lei

Whisperfile

Model Size: Tiny

Whisperfile 20Aug24 (Seconds, fewer is better): a = 41.71; 4484PX = 37.13; px = 38.72

simdjson

Throughput Test: Kostya

simdjson 3.10 (GB/s, more is better): a = 5.97; 4484PX = 6.11; px = 5.45
Note: (CXX) g++ options: -O3 -lrt

Numpy Benchmark

Numpy Benchmark (Score, more is better): a = 775.75; 4484PX = 745.59; px = 831.42

Apache CouchDB

Bulk Size: 500 - Inserts: 1000 - Rounds: 30

Apache CouchDB 3.4.1 (Seconds, fewer is better): a = 148.05; 4484PX = 164.47; px = 164.81
Note: (CXX) g++ options: -flto -lstdc++ -shared -lei

Renaissance

Test: Scala Dotty

Renaissance 0.16 (ms, fewer is better): a = 477.0 (MIN: 371.54 / MAX: 736.5); 4484PX = 428.6 (MIN: 378.22 / MAX: 628.77); px = 436.2 (MIN: 380.62 / MAX: 721.56)

Apache CouchDB

Bulk Size: 300 - Inserts: 3000 - Rounds: 30

Apache CouchDB 3.4.1 (Seconds, fewer is better): a = 367.83; 4484PX = 406.12; px = 408.48
Note: (CXX) g++ options: -flto -lstdc++ -shared -lei

QuantLib

Size: XXS

QuantLib 1.35-dev (tasks/s, more is better): a = 13.43; 4484PX = 12.12; px = 12.11
Note: (CXX) g++ options: -O3 -march=native -fPIE -pie

CP2K Molecular Dynamics

Input: H20-64

CP2K Molecular Dynamics 2024.3 (Seconds, fewer is better): a = 58.19; 4484PX = 53.01; px = 52.72
Note: (F9X) gfortran options: -fopenmp -march=native -mtune=native -O3 -funroll-loops -fbacktrace -ffree-form -fimplicit-none -std=f2008 -lcp2kstart -lcp2kmc -lcp2kswarm -lcp2kmotion -lcp2kthermostat -lcp2kemd -lcp2ktmc -lcp2kmain -lcp2kdbt -lcp2ktas -lcp2kgrid -lcp2kgriddgemm -lcp2kgridcpu -lcp2kgridref -lcp2kgridcommon -ldbcsrarnoldi -ldbcsrx -lcp2kdbx -lcp2kdbm -lcp2kshg_int -lcp2keri_mme -lcp2kminimax -lcp2khfxbase -lcp2ksubsys -lcp2kxc -lcp2kao -lcp2kpw_env -lcp2kinput -lcp2kpw -lcp2kgpu -lcp2kfft -lcp2kfpga -lcp2kfm -lcp2kcommon -lcp2koffload -lcp2kmpiwrap -lcp2kbase -ldbcsr -lsirius -lspla -lspfft -lsymspg -lvdwxc -l:libhdf5_fortran.a -l:libhdf5.a -lz -lgsl -lelpa_openmp -lcosma -lcosta -lscalapack -lxsmmf -lxsmm -ldl -lpthread -llibgrpp -lxcf03 -lxc -lint2 -lfftw3_mpi -lfftw3 -lfftw3_omp -lmpi_cxx -lmpi -l:libopenblas.a -lvori -lstdc++ -lmpi_usempif08 -lmpi_mpifh -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm

Renaissance

Test: Akka Unbalanced Cobwebbed Tree

Renaissance 0.16 (ms, fewer is better): a = 4403.8 (MAX: 5719.11); 4484PX = 4038.4 (MIN: 4038.36 / MAX: 5089.28); px = 4002.3 (MIN: 4002.27 / MAX: 4983.72)

Llama.cpp

Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Text Generation 128

Llama.cpp b4154 (Tokens Per Second, more is better): a = 47.72; 4484PX = 52.30; px = 52.37
Note: (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas

Apache CouchDB

Bulk Size: 100 - Inserts: 3000 - Rounds: 30

Apache CouchDB 3.4.1 (Seconds, fewer is better): a = 232.19; 4484PX = 253.99; px = 254.73
Note: (CXX) g++ options: -flto -lstdc++ -shared -lei

ONNX Runtime

Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard

ONNX Runtime 1.19 (Inferences Per Second, more is better): a = 390.60; 4484PX = 356.41; px = 356.19
Note: (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

Apache CouchDB

Bulk Size: 500 - Inserts: 3000 - Rounds: 30

Apache CouchDB 3.4.1 (Seconds, fewer is better): a = 511.78; 4484PX = 559.35; px = 560.70
Note: (CXX) g++ options: -flto -lstdc++ -shared -lei

SVT-AV1

Encoder Mode: Preset 13 - Input: Bosphorus 4K

SVT-AV1 2.3 (Frames Per Second, more is better): a = 212.52; 4484PX = 198.11; px = 194.02
Note: (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq

XNNPACK

Model: FP32MobileNetV2

XNNPACK b7b048 (us, fewer is better): a = 1495; 4484PX = 1365; px = 1368
Note: (CXX) g++ options: -O3 -lrt -lm

Whisper.cpp

Model: ggml-small.en - Input: 2016 State of the Union

Whisper.cpp 1.6.2 (Seconds, fewer is better): a = 245.08; 4484PX = 268.24; px = 266.81
Note: (CXX) g++ options: -O3 -std=c++11 -fPIC -pthread -msse3 -mssse3 -mavx -mf16c -mfma -mavx2 -mavx512f -mavx512cd -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512vnni

SVT-AV1

Encoder Mode: Preset 13 - Input: Bosphorus 1080p

SVT-AV1 2.3 (Frames Per Second, more is better): a = 842.56; 4484PX = 776.12; px = 769.82
Note: (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq

Renaissance

Test: Random Forest

Renaissance 0.16 (ms, fewer is better): a = 414.4 (MIN: 322.79 / MAX: 466.1); 4484PX = 422.0 (MIN: 357.91 / MAX: 497.55); px = 453.2 (MIN: 352.31 / MAX: 513.31)

PyPerformance

Benchmark: asyncio_tcp_ssl

PyPerformance 1.11 (Milliseconds, fewer is better): a = 645; 4484PX = 590; px = 590

Apache CouchDB

Bulk Size: 100 - Inserts: 1000 - Rounds: 30

Apache CouchDB 3.4.1 (Seconds, fewer is better): a = 69.93; 4484PX = 75.90; px = 76.39
Note: (CXX) g++ options: -flto -lstdc++ -shared -lei

ONNX Runtime

Model: ZFNet-512 - Device: CPU - Executor: Standard

ONNX Runtime 1.19 (Inferences Per Second, more is better): a = 102.33; 4484PX = 110.94; px = 110.89
Note: (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

Renaissance

Test: Apache Spark Bayes

Renaissance 0.16 (ms, fewer is better): a = 490.0 (MIN: 459.29 / MAX: 580.9); 4484PX = 513.2 (MIN: 453.66 / MAX: 554.7); px = 474.9 (MIN: 454.77 / MAX: 514.32)

QuantLib

Size: S

QuantLib 1.35-dev (tasks/s, more is better): a = 12.75; 4484PX = 11.86; px = 11.84
Note: (CXX) g++ options: -O3 -march=native -fPIE -pie

Renaissance

Test: Finagle HTTP Requests

Renaissance 0.16 (ms, fewer is better): a = 2319.4 (MIN: 1832.84); 4484PX = 2492.2 (MIN: 1947.63); px = 2483.1 (MIN: 1933.43)

GROMACS

Input: water_GMX50_bare

GROMACS (Ns Per Day, more is better): a = 1.692; 4484PX = 1.577; px = 1.575
Note: GROMACS version: 2023.3-Ubuntu_2023.3_1ubuntu3

ONNX Runtime

Model: bertsquad-12 - Device: CPU - Executor: Standard

ONNX Runtime 1.19 (Inferences Per Second, more is better): a = 15.59; 4484PX = 14.51; px = 14.57
Note: (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

SVT-AV1

Encoder Mode: Preset 13 - Input: Beauty 4K 10-bit

SVT-AV1 2.3 (Frames Per Second, more is better): a = 18.59; 4484PX = 17.41; px = 17.36
Note: (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq

Whisper.cpp

Model: ggml-base.en - Input: 2016 State of the Union

Whisper.cpp 1.6.2 (Seconds, fewer is better): a = 87.49; 4484PX = 92.71; px = 93.45
Note: (CXX) g++ options: -O3 -std=c++11 -fPIC -pthread -msse3 -mssse3 -mavx -mf16c -mfma -mavx2 -mavx512f -mavx512cd -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512vnni

Llama.cpp

Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 1024

Llama.cpp b4154 (Tokens Per Second, more is better): a = 70.85; 4484PX = 66.57; px = 66.35
Note: (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas

CP2K Molecular Dynamics

Input: H20-256

CP2K Molecular Dynamics 2024.3 (Seconds, fewer is better): a = 592.86; 4484PX = 628.10; px = 631.31
Note: (F9X) gfortran options: -fopenmp -march=native -mtune=native -O3 -funroll-loops -fbacktrace -ffree-form -fimplicit-none -std=f2008 -lcp2kstart -lcp2kmc -lcp2kswarm -lcp2kmotion -lcp2kthermostat -lcp2kemd -lcp2ktmc -lcp2kmain -lcp2kdbt -lcp2ktas -lcp2kgrid -lcp2kgriddgemm -lcp2kgridcpu -lcp2kgridref -lcp2kgridcommon -ldbcsrarnoldi -ldbcsrx -lcp2kdbx -lcp2kdbm -lcp2kshg_int -lcp2keri_mme -lcp2kminimax -lcp2khfxbase -lcp2ksubsys -lcp2kxc -lcp2kao -lcp2kpw_env -lcp2kinput -lcp2kpw -lcp2kgpu -lcp2kfft -lcp2kfpga -lcp2kfm -lcp2kcommon -lcp2koffload -lcp2kmpiwrap -lcp2kbase -ldbcsr -lsirius -lspla -lspfft -lsymspg -lvdwxc -l:libhdf5_fortran.a -l:libhdf5.a -lz -lgsl -lelpa_openmp -lcosma -lcosta -lscalapack -lxsmmf -lxsmm -ldl -lpthread -llibgrpp -lxcf03 -lxc -lint2 -lfftw3_mpi -lfftw3 -lfftw3_omp -lmpi_cxx -lmpi -l:libopenblas.a -lvori -lstdc++ -lmpi_usempif08 -lmpi_mpifh -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm

LiteRT

Model: Inception V4

LiteRT 2024-10-15 (Microseconds, fewer is better): a = 21477.8; 4484PX = 22083.3; px = 22752.4

Llamafile

Model: TinyLlama-1.1B-Chat-v1.0.BF16 - Test: Text Generation 128

Llamafile 0.8.16 (Tokens Per Second, more is better): a = 26.28; 4484PX = 27.59; px = 27.80

Renaissance

Test: ALS Movie Lens

Renaissance 0.16 (ms, fewer is better): a = 9805.7 (MIN: 9253.4 / MAX: 10057.61); 4484PX = 9378.8 (MIN: 8718.36 / MAX: 9413.7); px = 9275.7 (MIN: 8821.09 / MAX: 9495.91)

FinanceBench

Benchmark: Bonds OpenMP

FinanceBench 2016-07-25 (ms, fewer is better): a = 33061.22; 4484PX = 34600.77; px = 34896.84
Note: (CXX) g++ options: -O3 -march=native -fopenmp

PyPerformance

Benchmark: python_startup

PyPerformance 1.11 (Milliseconds, fewer is better): a = 5.77; 4484PX = 6.08; px = 6.09

Llamafile

Model: TinyLlama-1.1B-Chat-v1.0.BF16 - Test: Text Generation 16

Llamafile 0.8.16 (Tokens Per Second, more is better): a = 24.59; 4484PX = 25.86; px = 25.94

Gcrypt Library

Gcrypt Library 1.10.3 (Seconds, fewer is better): a = 162.13; 4484PX = 171.02; px = 163.84
Note: (CC) gcc options: -O2 -fvisibility=hidden

OpenVINO GenAI

Model: Phi-3-mini-128k-instruct-int4-ov - Device: CPU

OpenVINO GenAI 2024.5 (tokens/s, more is better): a = 19.28; 4484PX = 20.28; px = 20.29

XNNPACK

Model: FP16MobileNetV2

XNNPACK b7b048 (us, fewer is better): a = 1190; 4484PX = 1217; px = 1248
Note: (CXX) g++ options: -O3 -lrt -lm

Renaissance

Test: Savina Reactors.IO

Renaissance 0.16 (ms, fewer is better): a = 3506.4 (MIN: 3506.38 / MAX: 4329.37); 4484PX = 3655.8 (MIN: 3655.76 / MAX: 4484.97); px = 3676.0 (MAX: 4536.84)

Llamafile

Model: mistral-7b-instruct-v0.2.Q5_K_M - Test: Text Generation 128

Llamafile 0.8.16 (Tokens Per Second, more is better): a = 10.47; 4484PX = 10.91; px = 10.93

PyPerformance

Benchmark: gc_collect

PyPerformance 1.11 (Milliseconds, fewer is better): a = 677; 4484PX = 699; px = 706

FinanceBench

Benchmark: Repo OpenMP

FinanceBench 2016-07-25 (ms, fewer is better): a = 21418.45; 4484PX = 22320.33; px = 22318.74
Note: (CXX) g++ options: -O3 -march=native -fopenmp

OpenVINO GenAI

Model: Gemma-7b-int4-ov - Device: CPU

OpenVINO GenAI 2024.5 (tokens/s, more is better): a = 9.83; 4484PX = 10.23; px = 10.24

Llama.cpp

Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 512

Llama.cpp b4154 (Tokens Per Second, more is better): a = 70.76; 4484PX = 69.11; px = 67.95
Note: (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas

Llama.cpp

Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 1024

Llama.cpp b4154 (Tokens Per Second, more is better): a = 69.26; 4484PX = 66.85; px = 66.52
Note: (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas

XNNPACK

Model: FP16MobileNetV3Large

XNNPACK b7b048 (us, fewer is better): a = 1498; 4484PX = 1467; px = 1527
Note: (CXX) g++ options: -O3 -lrt -lm

PyPerformance

Benchmark: raytrace

PyPerformance 1.11 (Milliseconds, fewer is better): a = 175; 4484PX = 182; px = 182

PyPerformance

Benchmark: chaos

PyPerformance 1.11 (Milliseconds, fewer is better): a = 38.2; 4484PX = 39.7; px = 39.4

PyPerformance

Benchmark: regex_compile

PyPerformance 1.11 (Milliseconds, fewer is better): a = 69.8; 4484PX = 71.7; px = 72.5

PyPerformance

Benchmark: crypto_pyaes

PyPerformance 1.11 (Milliseconds, fewer is better): a = 41.7; 4484PX = 43.1; px = 43.3

OpenVINO GenAI

Model: Falcon-7b-instruct-int4-ov - Device: CPU

OpenVINO GenAI 2024.5 (tokens/s, more is better): a = 12.93; 4484PX = 13.40; px = 13.41

Llama.cpp

Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Text Generation 128

Llama.cpp b4154 (Tokens Per Second, more is better): a = 6.88; 4484PX = 7.11; px = 7.12
Note: (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas

simdjson

Throughput Test: TopTweet

simdjson 3.10 (GB/s, more is better): a = 10.46; 4484PX = 10.82; px = 10.51
Note: (CXX) g++ options: -O3 -lrt

Llamafile

Model: wizardcoder-python-34b-v1.0.Q6_K - Test: Text Generation 16

Llamafile 0.8.16 (Tokens Per Second, more is better): a = 1.78; 4484PX = 1.83; px = 1.84

PyPerformance

Benchmark: json_loads

PyPerformance 1.11 (Milliseconds, fewer is better): a = 12.1; 4484PX = 12.4; px = 12.5

ONNX Runtime

Model: yolov4 - Device: CPU - Executor: Standard

ONNX Runtime 1.19 (Inferences Per Second, more is better): a = 11.06; 4484PX = 10.73; px = 10.71
Note: (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

LiteRT

Model: Mobilenet Quant

LiteRT 2024-10-15 (Microseconds, fewer is better): a = 823.17; 4484PX = 848.94; px = 849.21

Llamafile

Model: wizardcoder-python-34b-v1.0.Q6_K - Test: Text Generation 128

Llamafile 0.8.16 (Tokens Per Second, more is better): a = 1.99; 4484PX = 2.05; px = 2.05

CP2K Molecular Dynamics

Input: Fayalite-FIST

CP2K Molecular Dynamics 2024.3 (Seconds, fewer is better): a = 94.03; 4484PX = 92.21; px = 94.90
Note: (F9X) gfortran options: -fopenmp -march=native -mtune=native -O3 -funroll-loops -fbacktrace -ffree-form -fimplicit-none -std=f2008 -lcp2kstart -lcp2kmc -lcp2kswarm -lcp2kmotion -lcp2kthermostat -lcp2kemd -lcp2ktmc -lcp2kmain -lcp2kdbt -lcp2ktas -lcp2kgrid -lcp2kgriddgemm -lcp2kgridcpu -lcp2kgridref -lcp2kgridcommon -ldbcsrarnoldi -ldbcsrx -lcp2kdbx -lcp2kdbm -lcp2kshg_int -lcp2keri_mme -lcp2kminimax -lcp2khfxbase -lcp2ksubsys -lcp2kxc -lcp2kao -lcp2kpw_env -lcp2kinput -lcp2kpw -lcp2kgpu -lcp2kfft -lcp2kfpga -lcp2kfm -lcp2kcommon -lcp2koffload -lcp2kmpiwrap -lcp2kbase -ldbcsr -lsirius -lspla -lspfft -lsymspg -lvdwxc -l:libhdf5_fortran.a -l:libhdf5.a -lz -lgsl -lelpa_openmp -lcosma -lcosta -lscalapack -lxsmmf -lxsmm -ldl -lpthread -llibgrpp -lxcf03 -lxc -lint2 -lfftw3_mpi -lfftw3 -lfftw3_omp -lmpi_cxx -lmpi -l:libopenblas.a -lvori -lstdc++ -lmpi_usempif08 -lmpi_mpifh -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm

PyPerformance

Benchmark: xml_etree

PyPerformance 1.11 (Milliseconds, fewer is better): a = 35.8; 4484PX = 36.8; px = 36.5

Llama.cpp

Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Text Generation 128

Llama.cpp b4154 (Tokens Per Second, more is better): a = 7.24; 4484PX = 7.41; px = 7.44
Note: (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas

LiteRT

Model: Mobilenet Float

LiteRT 2024-10-15 (Microseconds, fewer is better): a = 1211.48; 4484PX = 1244.70; px = 1244.51

Renaissance

Test: In-Memory Database Shootout

Renaissance 0.16 (ms, fewer is better): a = 3256.1 (MIN: 3019.89 / MAX: 3599.5); 4484PX = 3241.5 (MIN: 3037.03 / MAX: 3491.91); px = 3175.6 (MIN: 2896.06 / MAX: 3367.44)

Llamafile

Model: Llama-3.2-3B-Instruct.Q6_K - Test: Text Generation 16

Llamafile 0.8.16 (Tokens Per Second, more is better): a = 19.03; 4484PX = 19.49; px = 19.50

PyPerformance

Benchmark: pickle_pure_python

PyPerformance 1.11 (Milliseconds, fewer is better): a = 165; 4484PX = 169; px = 168

PyPerformance

Benchmark: django_template

PyPerformance 1.11 (Milliseconds, fewer is better): a = 20.7; 4484PX = 21.0; px = 21.2

Llamafile

Model: mistral-7b-instruct-v0.2.Q5_K_M - Test: Text Generation 16

Llamafile 0.8.16 (Tokens Per Second, more is better): a = 10.22; 4484PX = 10.45; px = 10.45

PyPerformance

Benchmark: asyncio_websockets

PyPerformance 1.11 (Milliseconds, fewer is better): a = 315; 4484PX = 321; px = 322

PyPerformance

Benchmark: go

PyPerformance 1.11 (Milliseconds, fewer is better): a = 77.8; 4484PX = 78.6; px = 79.4

Llamafile

Model: Llama-3.2-3B-Instruct.Q6_K - Test: Text Generation 128

Llamafile 0.8.16 (Tokens Per Second, more is better): a = 20.13; 4484PX = 20.39; px = 20.51

Y-Cruncher

Pi Digits To Calculate: 500M

Y-Cruncher 0.8.5 (Seconds, fewer is better): a = 8.772; 4484PX = 8.688; px = 8.623

XNNPACK

Model: FP32MobileNetV1

XNNPACK b7b048 (us, fewer is better): a = 1252; 4484PX = 1257; px = 1272
Note: (CXX) g++ options: -O3 -lrt -lm

LiteRT

Model: SqueezeNet

LiteRT 2024-10-15 (Microseconds, fewer is better): a = 1794.11; 4484PX = 1809.18; px = 1821.35

PyPerformance

Benchmark: pathlib

PyPerformance 1.11 (Milliseconds, fewer is better): a = 14.2; 4484PX = 14.4; px = 14.4

PyPerformance

Benchmark: float

PyPerformance 1.11 (Milliseconds, fewer is better): a = 50.7; 4484PX = 51.3; px = 50.8

Llama.cpp

Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 2048

Llama.cpp b4154 (Tokens Per Second, more is better): a = 63.09; 4484PX = 63.80; px = 63.79
Note: (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas

Llama.cpp

Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 2048

Llama.cpp b4154 (Tokens Per Second, more is better): a = 62.97; 4484PX = 63.61; px = 63.41
Note: (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas

Llama.cpp

Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 512

Llama.cpp b4154 (Tokens Per Second, more is better): a = 68.40; 4484PX = 68.20; px = 68.81
Note: (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas

PyPerformance

Benchmark: nbody

PyPerformance 1.11 (Milliseconds, fewer is better): a = 59.0; 4484PX = 59.5; px = 59.2

Y-Cruncher

Pi Digits To Calculate: 1B

Y-Cruncher 0.8.5 (Seconds, fewer is better): a = 18.49; 4484PX = 18.38; px = 18.37

simdjson

Throughput Test: LargeRandom

simdjson 3.10 (GB/s, more is better): a = 1.83; 4484PX = 1.84; px = 1.84
Note: (CXX) g++ options: -O3 -lrt

LiteRT

Model: Inception ResNet V2

LiteRT 2024-10-15 (Microseconds, fewer is better): a = 19530.2; 4484PX = 19477.8; px = 19490.7

Llamafile

Model: wizardcoder-python-34b-v1.0.Q6_K - Test: Prompt Processing 2048

Llamafile 0.8.16 (Tokens Per Second, more is better): a = 12288; 4484PX = 12288; px = 12288

Llamafile

Model: wizardcoder-python-34b-v1.0.Q6_K - Test: Prompt Processing 1024

Llamafile 0.8.16 | Tokens Per Second, More Is Better | a: 6144 | 4484PX: 6144 | px: 6144

Llamafile

Model: wizardcoder-python-34b-v1.0.Q6_K - Test: Prompt Processing 512

Llamafile 0.8.16 | Tokens Per Second, More Is Better | a: 3072 | 4484PX: 3072 | px: 3072

Llamafile

Model: wizardcoder-python-34b-v1.0.Q6_K - Test: Prompt Processing 256

Llamafile 0.8.16 | Tokens Per Second, More Is Better | a: 1536 | 4484PX: 1536 | px: 1536

Llamafile

Model: mistral-7b-instruct-v0.2.Q5_K_M - Test: Prompt Processing 2048

Llamafile 0.8.16 | Tokens Per Second, More Is Better | a: 32768 | 4484PX: 32768 | px: 32768

Llamafile

Model: mistral-7b-instruct-v0.2.Q5_K_M - Test: Prompt Processing 1024

Llamafile 0.8.16 | Tokens Per Second, More Is Better | a: 16384 | 4484PX: 16384 | px: 16384

Llamafile

Model: mistral-7b-instruct-v0.2.Q5_K_M - Test: Prompt Processing 512

Llamafile 0.8.16 | Tokens Per Second, More Is Better | a: 8192 | 4484PX: 8192 | px: 8192

Llamafile

Model: mistral-7b-instruct-v0.2.Q5_K_M - Test: Prompt Processing 256

Llamafile 0.8.16 | Tokens Per Second, More Is Better | a: 4096 | 4484PX: 4096 | px: 4096

Llamafile

Model: TinyLlama-1.1B-Chat-v1.0.BF16 - Test: Prompt Processing 2048

Llamafile 0.8.16 | Tokens Per Second, More Is Better | a: 32768 | 4484PX: 32768 | px: 32768

Llamafile

Model: TinyLlama-1.1B-Chat-v1.0.BF16 - Test: Prompt Processing 1024

Llamafile 0.8.16 | Tokens Per Second, More Is Better | a: 16384 | 4484PX: 16384 | px: 16384

Llamafile

Model: TinyLlama-1.1B-Chat-v1.0.BF16 - Test: Prompt Processing 512

Llamafile 0.8.16 | Tokens Per Second, More Is Better | a: 8192 | 4484PX: 8192 | px: 8192

Llamafile

Model: TinyLlama-1.1B-Chat-v1.0.BF16 - Test: Prompt Processing 256

Llamafile 0.8.16 | Tokens Per Second, More Is Better | a: 4096 | 4484PX: 4096 | px: 4096

Llamafile

Model: Llama-3.2-3B-Instruct.Q6_K - Test: Prompt Processing 2048

Llamafile 0.8.16 | Tokens Per Second, More Is Better | a: 32768 | 4484PX: 32768 | px: 32768

Llamafile

Model: Llama-3.2-3B-Instruct.Q6_K - Test: Prompt Processing 1024

Llamafile 0.8.16 | Tokens Per Second, More Is Better | a: 16384 | 4484PX: 16384 | px: 16384

Llamafile

Model: Llama-3.2-3B-Instruct.Q6_K - Test: Prompt Processing 512

Llamafile 0.8.16 | Tokens Per Second, More Is Better | a: 8192 | 4484PX: 8192 | px: 8192

Llamafile

Model: Llama-3.2-3B-Instruct.Q6_K - Test: Prompt Processing 256

Llamafile 0.8.16 | Tokens Per Second, More Is Better | a: 4096 | 4484PX: 4096 | px: 4096

OpenVINO GenAI

Model: Phi-3-mini-128k-instruct-int4-ov - Device: CPU - Time Per Output Token

OpenVINO GenAI 2024.5 | ms, Fewer Is Better | a: 51.86 | 4484PX: 49.31 | px: 49.28

OpenVINO GenAI

Model: Phi-3-mini-128k-instruct-int4-ov - Device: CPU - Time To First Token

OpenVINO GenAI 2024.5 | ms, Fewer Is Better | a: 55.93 | 4484PX: 58.91 | px: 58.86

OpenVINO GenAI

Model: Falcon-7b-instruct-int4-ov - Device: CPU - Time Per Output Token

OpenVINO GenAI 2024.5 | ms, Fewer Is Better | a: 77.34 | 4484PX: 74.65 | px: 74.54

OpenVINO GenAI

Model: Falcon-7b-instruct-int4-ov - Device: CPU - Time To First Token

OpenVINO GenAI 2024.5 | ms, Fewer Is Better | a: 86.06 | 4484PX: 93.01 | px: 93.00

OpenVINO GenAI

Model: Gemma-7b-int4-ov - Device: CPU - Time Per Output Token

OpenVINO GenAI 2024.5 | ms, Fewer Is Better | a: 101.72 | 4484PX: 97.79 | px: 97.61

OpenVINO GenAI

Model: Gemma-7b-int4-ov - Device: CPU - Time To First Token

OpenVINO GenAI 2024.5 | ms, Fewer Is Better | a: 106.62 | 4484PX: 121.48 | px: 122.30

ONNX Runtime

Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard

ONNX Runtime 1.19 | Inference Time Cost (ms), Fewer Is Better | a: 21.24 | 4484PX: 24.94 | px: 23.06
(CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Standard

ONNX Runtime 1.19 | Inference Time Cost (ms), Fewer Is Better | a: 648.52 | 4484PX: 850.14 | px: 854.33
(CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: super-resolution-10 - Device: CPU - Executor: Standard

ONNX Runtime 1.19 | Inference Time Cost (ms), Fewer Is Better | a: 7.08601 | 4484PX: 7.98873 | px: 7.99486
(CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard

ONNX Runtime 1.19 | Inference Time Cost (ms), Fewer Is Better | a: 2.55898 | 4484PX: 2.80544 | px: 2.80695
(CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard

ONNX Runtime 1.19 | Inference Time Cost (ms), Fewer Is Better | a: 23.55 | 4484PX: 26.75 | px: 26.95
(CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: fcn-resnet101-11 - Device: CPU - Executor: Standard

ONNX Runtime 1.19 | Inference Time Cost (ms), Fewer Is Better | a: 310.88 | 4484PX: 355.75 | px: 357.60
(CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard

ONNX Runtime 1.19 | Inference Time Cost (ms), Fewer Is Better | a: 1.57084 | 4484PX: 1.06188 | px: 1.06600
(CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: bertsquad-12 - Device: CPU - Executor: Standard

ONNX Runtime 1.19 | Inference Time Cost (ms), Fewer Is Better | a: 64.14 | 4484PX: 68.91 | px: 68.61
(CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: T5 Encoder - Device: CPU - Executor: Standard

ONNX Runtime 1.19 | Inference Time Cost (ms), Fewer Is Better | a: 6.39112 | 4484PX: 4.80287 | px: 4.85142
(CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: ZFNet-512 - Device: CPU - Executor: Standard

ONNX Runtime 1.19 | Inference Time Cost (ms), Fewer Is Better | a: 9.76985 | 4484PX: 9.01322 | px: 9.01687
(CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: yolov4 - Device: CPU - Executor: Standard

ONNX Runtime 1.19 | Inference Time Cost (ms), Fewer Is Better | a: 90.45 | 4484PX: 93.16 | px: 93.34
(CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: GPT-2 - Device: CPU - Executor: Standard

ONNX Runtime 1.19 | Inference Time Cost (ms), Fewer Is Better | a: 7.42776 | 4484PX: 6.25815 | px: 6.33034
(CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt


Phoronix Test Suite v10.8.5