tgl tgl

Intel Core i7-1185G7 testing with a Dell XPS 13 9310 0DXP1F (3.7.0 BIOS) and Intel Xe TGL GT2 15GB on Ubuntu 23.10 via the Phoronix Test Suite.

HTML result view exported from: https://openbenchmarking.org/result/2402202-NE-TGLTGL07808&sro&grs.

tgl tglProcessorMotherboardChipsetMemoryDiskGraphicsAudioNetworkOSKernelDesktopDisplay ServerOpenGLOpenCLCompilerFile-SystemScreen ResolutionabcIntel Core i7-1185G7 @ 4.80GHz (4 Cores / 8 Threads)Dell XPS 13 9310 0DXP1F (3.7.0 BIOS)Intel Tiger Lake-LP8 x 2GB LPDDR4-4267MT/sMicron 2300 NVMe 512GBIntel Xe TGL GT2 15GB (1350MHz)Realtek ALC289Intel Wi-Fi 6 AX201Ubuntu 23.106.7.0-060700rc5-generic (x86_64)GNOME Shell 45.1X Server + Wayland4.6 Mesa 24.0~git2312220600.68c53e~oibaf~m (git-68c53ec 2023-12-22 mantic-oibaf-ppa)OpenCL 3.0GCC 13.2.0ext41920x1200OpenBenchmarking.orgKernel Details- Transparent Huge Pages: madviseCompiler Details- --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-XYspKM/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-XYspKM/gcc-13-13.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v Processor Details- Scaling Governor: intel_pstate powersave (EPP: balance_performance) - CPU Microcode: 0xb4 - Thermald 2.5.4Python Details- Python 3.11.6Security Details- gather_data_sampling: Mitigation of Microcode + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS IBPB: conditional RSB filling PBRSB-eIBRS: SW sequence + srbds: Not affected + tsx_async_abort: Not affected

tgl tgllibplacebo: hdr_peakdetectdav1d: Summer Nature 1080ponnx: bertsquad-12 - CPU - Parallelonnx: ArcFace ResNet-100 - CPU - Parallelonnx: fcn-resnet101-11 - CPU - Parallelonnx: yolov4 - CPU - Parallelonnx: Faster R-CNN R-50-FPN-int8 - CPU - Parallelonnx: fcn-resnet101-11 - CPU - Standarddav1d: Summer Nature 4Konnx: ArcFace ResNet-100 - CPU - Standardonnx: super-resolution-10 - CPU - Standardonnx: bertsquad-12 - CPU - Standardonnx: GPT-2 - CPU - Parallelonnx: yolov4 - CPU - Standardonnx: CaffeNet 12-int8 - CPU - Standardonnx: Faster R-CNN R-50-FPN-int8 - CPU - Standardonnx: super-resolution-10 - CPU - Parallelonnx: GPT-2 - CPU - Standardnamd: ATPase with 327,506 Atomsvkfft: FFT + iFFT R2C / C2Ronnx: ResNet50 v1-12-int8 - CPU - Parallelvkfft: FFT + iFFT C2C multidimensional in single precisiongromacs: MPI CPU - water_GMX50_barevkfft: FFT + iFFT C2C 1D batched in half precisiononnx: T5 Encoder - CPU - Paralleldav1d: Chimera 1080pcompress-lz4: 3 - Compression Speedcompress-lz4: 9 - Compression Speedcachebench: Read / Modify / Writeonnx: CaffeNet 12-int8 - CPU - Parallelllama-cpp: llama-2-7b.Q4_0.ggufllamafile: llava-v1.5-7b-q4 - CPUonnx: ResNet50 v1-12-int8 - CPU - Standardcompress-lz4: 1 - Decompression Speedonnx: T5 Encoder - CPU - Standardcompress-lz4: 3 - Decompression Speedlibplacebo: av1_grain_lapllamafile: mistral-7b-instruct-v0.2.Q8_0 - CPUvkfft: FFT + iFFT C2C Bluestein in single precisioncompress-lz4: 1 - Compression Speeddav1d: Chimera 1080p 10-bitcompress-lz4: 9 - Decompression Speedcachebench: Writelibplacebo: deband_heavylibplacebo: hdr_lutnamd: STMV with 1,066,628 Atomslibplacebo: polar_nocomputevkfft: FFT + iFFT C2C 1D batched in single precision, no reshufflingvkfft: FFT + iFFT C2C 1D batched in single precisionlibplacebo: gaussiancachebench: Readllamafile: wizardcoder-python-34b-v1.0.Q6_K - CPUllama-cpp: llama-2-70b-chat.Q5_0.ggufllama-cpp: llama-2-13b.Q4_0.ggufoidn: RTLightmap.hdr.4096x4096 - CPU-Onlyoidn: RT.ldr_alb_nrm.3840x2160 - CPU-Onlyoidn: RT.hdr_alb_nrm.3840x2160 - CPU-Onlyonnx: Faster R-CNN R-50-FPN-int8 - CPU - Standardonnx: Faster R-CNN R-50-FPN-int8 - CPU - Parallelonnx: super-resolution-10 - CPU - Standardonnx: super-resolution-10 - CPU - Parallelonnx: ResNet50 v1-12-int8 - CPU - Standardonnx: ResNet50 v1-12-int8 - CPU - Parallelonnx: ArcFace ResNet-100 - CPU - Standardonnx: ArcFace ResNet-100 - CPU - Parallelonnx: fcn-resnet101-11 - CPU - Standardonnx: fcn-resnet101-11 - CPU - Parallelonnx: CaffeNet 12-int8 - CPU - Standardonnx: CaffeNet 12-int8 - CPU - Parallelonnx: bertsquad-12 - CPU - Standardonnx: bertsquad-12 - CPU - Parallelonnx: T5 Encoder - CPU - Standardonnx: T5 Encoder - CPU - Parallelonnx: yolov4 - CPU - Standardonnx: yolov4 - CPU - Parallelonnx: GPT-2 - CPU - Standardonnx: GPT-2 - CPU - Parallelvkfft: FFT + iFFT C2C 1D batched in double precisionabc336.77325.64.259339.986910.4710172.8964320.91790.68614792.9214.91945.32756.9002581.61055.06166333.92437.047826.964105.4860.54640631099.278558690.5731370290.9185403.43118.7141.13103769.795772239.5857.377.07129.1674114.5146.8613850.6405.164.261112755.41296.564072.4109630.16598778.22371.630.16793151.3885107500370.738945.2798850.050.033.840.070.160.1526.989247.803322.058237.08447.7399710.070967.0264100.131457.412123.062.993154.17219144.917234.7756.8060410.9972197.56345.2499.4730912.2437254.33409.894.83258.980480.4156423.2542921.60110.753919102.6516.30947.03966.3854777.20224.72968350.137.459528.3989101.5970.55272614698.487759520.571392389.8118405.66119.0441.16104513.363289238.1647.417.09129.0574123.5146.7163870.7405.84.281116754.76296.794077109813.14575478.16371.860.16810151.4285037495370.698946.1602490.050.033.840.070.160.1526.692646.290821.255135.21027.7465110.151161.3135111.3511326.42405.912.854824.19665156.599206.9296.813411.1326211.427307.2839.8357112.9464274.71381.243.97710.19480.4623953.1246123.19440.68088694.6914.800642.81066.8384383.35384.91617328.36335.551428.2518104.9160.53465619897.33358430.5631374889.5594401.11117.8740.77104746.114943240.2217.437.12128.3784139.1147.4963857.8403.94.271111752.43297.274080109693.87005678.1371.350.16812151.2785067496370.588946.2891830.050.033.840.070.160.1528.125543.111323.355435.39347.7871310.272467.562298.08721468.672162.653.043574.16112146.226251.4426.7773711.1641203.406320.0369.5237111.987OpenBenchmarking.org

Libplacebo

Test: hdr_peakdetect

OpenBenchmarking.orgFPS, More Is BetterLibplacebo 6.338.2Test: hdr_peakdetectabc70140210280350336.77254.33274.711. (CXX) g++ options: -fvisibility=hidden -std=c++20 -O2 -fno-math-errno -fPIC -pthread -MD -MQ -MF

dav1d

Video Input: Summer Nature 1080p

OpenBenchmarking.orgFPS, More Is Betterdav1d 1.4Video Input: Summer Nature 1080pabc90180270360450325.60409.89381.241. (CC) gcc options: -pthread

ONNX Runtime

Model: bertsquad-12 - Device: CPU - Executor: Parallel

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: bertsquad-12 - Device: CPU - Executor: Parallelabc1.08732.17463.26194.34925.43654.259334.832503.977001. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallel

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallelabc36912159.986918.9804810.194801. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: fcn-resnet101-11 - Device: CPU - Executor: Parallelabc0.1060.2120.3180.4240.530.4710170.4156420.4623951. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: yolov4 - Device: CPU - Executor: Parallel

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: yolov4 - Device: CPU - Executor: Parallelabc0.73221.46442.19662.92883.6612.896433.254293.124611. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallel

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallelabc61218243020.9221.6023.191. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: fcn-resnet101-11 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: fcn-resnet101-11 - Device: CPU - Executor: Standardabc0.16960.33920.50880.67840.8480.6861470.7539190.6808861. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

dav1d

Video Input: Summer Nature 4K

OpenBenchmarking.orgFPS, More Is Betterdav1d 1.4Video Input: Summer Nature 4Kabc2040608010092.92102.6594.691. (CC) gcc options: -pthread

ONNX Runtime

Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: ArcFace ResNet-100 - Device: CPU - Executor: Standardabc4812162014.9216.3114.801. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: super-resolution-10 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: super-resolution-10 - Device: CPU - Executor: Standardabc112233445545.3347.0442.811. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: bertsquad-12 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: bertsquad-12 - Device: CPU - Executor: Standardabc2468106.900256.385476.838431. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: GPT-2 - Device: CPU - Executor: Parallel

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: GPT-2 - Device: CPU - Executor: Parallelabc2040608010081.6177.2083.351. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: yolov4 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: yolov4 - Device: CPU - Executor: Standardabc1.13892.27783.41674.55565.69455.061664.729684.916171. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: CaffeNet 12-int8 - Device: CPU - Executor: Standardabc80160240320400333.92350.10328.361. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standardabc91827364537.0537.4635.551. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: super-resolution-10 - Device: CPU - Executor: Parallel

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: super-resolution-10 - Device: CPU - Executor: Parallelabc71421283526.9628.4028.251. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: GPT-2 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: GPT-2 - Device: CPU - Executor: Standardabc20406080100105.49101.60104.921. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

NAMD

Input: ATPase with 327,506 Atoms

OpenBenchmarking.orgns/day, More Is BetterNAMD 3.0b6Input: ATPase with 327,506 Atomsabc0.12440.24880.37320.49760.6220.546400.552720.53465

VkFFT

Test: FFT + iFFT R2C / C2R

OpenBenchmarking.orgBenchmark Score, More Is BetterVkFFT 1.3.4Test: FFT + iFFT R2C / C2Rabc140028004200560070006310614661981. (CXX) g++ options: -O3

ONNX Runtime

Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallelabc2040608010099.2898.4997.331. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

VkFFT

Test: FFT + iFFT C2C multidimensional in single precision

OpenBenchmarking.orgBenchmark Score, More Is BetterVkFFT 1.3.4Test: FFT + iFFT C2C multidimensional in single precisionabc130026003900520065005869595258431. (CXX) g++ options: -O3

GROMACS

Implementation: MPI CPU - Input: water_GMX50_bare

OpenBenchmarking.orgNs Per Day, More Is BetterGROMACS 2024Implementation: MPI CPU - Input: water_GMX50_bareabc0.12890.25780.38670.51560.64450.5730.5700.5631. (CXX) g++ options: -O3 -lm

VkFFT

Test: FFT + iFFT C2C 1D batched in half precision

OpenBenchmarking.orgBenchmark Score, More Is BetterVkFFT 1.3.4Test: FFT + iFFT C2C 1D batched in half precisionabc3K6K9K12K15K1370213923137481. (CXX) g++ options: -O3

ONNX Runtime

Model: T5 Encoder - Device: CPU - Executor: Parallel

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: T5 Encoder - Device: CPU - Executor: Parallelabc2040608010090.9289.8189.561. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

dav1d

Video Input: Chimera 1080p

OpenBenchmarking.orgFPS, More Is Betterdav1d 1.4Video Input: Chimera 1080pabc90180270360450403.43405.66401.111. (CC) gcc options: -pthread

LZ4 Compression

Compression Level: 3 - Compression Speed

OpenBenchmarking.orgMB/s, More Is BetterLZ4 Compression 1.9.4Compression Level: 3 - Compression Speedabc306090120150118.71119.04117.871. (CC) gcc options: -O3

LZ4 Compression

Compression Level: 9 - Compression Speed

OpenBenchmarking.orgMB/s, More Is BetterLZ4 Compression 1.9.4Compression Level: 9 - Compression Speedabc91827364541.1341.1640.771. (CC) gcc options: -O3

CacheBench

Test: Read / Modify / Write

OpenBenchmarking.orgMB/s, More Is BetterCacheBenchTest: Read / Modify / Writeabc20K40K60K80K100K103769.80104513.36104746.11MIN: 79762.44 / MAX: 113377.99MIN: 83065.75 / MAX: 113865.39MIN: 80807.87 / MAX: 113668.221. (CC) gcc options: -O3 -lrt

ONNX Runtime

Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallel

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallelabc50100150200250239.59238.16240.221. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

Llama.cpp

Model: llama-2-7b.Q4_0.gguf

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b1808Model: llama-2-7b.Q4_0.ggufabc2468107.377.417.431. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -march=native -mtune=native -lopenblas

Llamafile

Test: llava-v1.5-7b-q4 - Acceleration: CPU

OpenBenchmarking.orgTokens Per Second, More Is BetterLlamafile 0.6Test: llava-v1.5-7b-q4 - Acceleration: CPUabc2468107.077.097.12

ONNX Runtime

Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standardabc306090120150129.17129.06128.381. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

LZ4 Compression

Compression Level: 1 - Decompression Speed

OpenBenchmarking.orgMB/s, More Is BetterLZ4 Compression 1.9.4Compression Level: 1 - Decompression Speedabc90018002700360045004114.54123.54139.11. (CC) gcc options: -O3

ONNX Runtime

Model: T5 Encoder - Device: CPU - Executor: Standard

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: T5 Encoder - Device: CPU - Executor: Standardabc306090120150146.86146.72147.501. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

LZ4 Compression

Compression Level: 3 - Decompression Speed

OpenBenchmarking.orgMB/s, More Is BetterLZ4 Compression 1.9.4Compression Level: 3 - Decompression Speedabc80016002400320040003850.63870.73857.81. (CC) gcc options: -O3

Libplacebo

Test: av1_grain_lap

OpenBenchmarking.orgFPS, More Is BetterLibplacebo 6.338.2Test: av1_grain_lapabc90180270360450405.16405.80403.901. (CXX) g++ options: -fvisibility=hidden -std=c++20 -O2 -fno-math-errno -fPIC -pthread -MD -MQ -MF

Llamafile

Test: mistral-7b-instruct-v0.2.Q8_0 - Acceleration: CPU

OpenBenchmarking.orgTokens Per Second, More Is BetterLlamafile 0.6Test: mistral-7b-instruct-v0.2.Q8_0 - Acceleration: CPUabc0.9631.9262.8893.8524.8154.264.284.27

VkFFT

Test: FFT + iFFT C2C Bluestein in single precision

OpenBenchmarking.orgBenchmark Score, More Is BetterVkFFT 1.3.4Test: FFT + iFFT C2C Bluestein in single precisionabc20040060080010001112111611111. (CXX) g++ options: -O3

LZ4 Compression

Compression Level: 1 - Compression Speed

OpenBenchmarking.orgMB/s, More Is BetterLZ4 Compression 1.9.4Compression Level: 1 - Compression Speedabc160320480640800755.41754.76752.431. (CC) gcc options: -O3

dav1d

Video Input: Chimera 1080p 10-bit

OpenBenchmarking.orgFPS, More Is Betterdav1d 1.4Video Input: Chimera 1080p 10-bitabc60120180240300296.56296.79297.271. (CC) gcc options: -pthread

LZ4 Compression

Compression Level: 9 - Decompression Speed

OpenBenchmarking.orgMB/s, More Is BetterLZ4 Compression 1.9.4Compression Level: 9 - Decompression Speedabc90018002700360045004072.44077.04080.01. (CC) gcc options: -O3

CacheBench

Test: Write

OpenBenchmarking.orgMB/s, More Is BetterCacheBenchTest: Writeabc20K40K60K80K100K109630.17109813.15109693.87MIN: 78349.8 / MAX: 127046.33MIN: 78789.09 / MAX: 126929.16MIN: 78492.75 / MAX: 127175.971. (CC) gcc options: -O3 -lrt

Libplacebo

Test: deband_heavy

OpenBenchmarking.orgFPS, More Is BetterLibplacebo 6.338.2Test: deband_heavyabc2040608010078.2278.1678.101. (CXX) g++ options: -fvisibility=hidden -std=c++20 -O2 -fno-math-errno -fPIC -pthread -MD -MQ -MF

Libplacebo

Test: hdr_lut

OpenBenchmarking.orgFPS, More Is BetterLibplacebo 6.338.2Test: hdr_lutabc80160240320400371.63371.86371.351. (CXX) g++ options: -fvisibility=hidden -std=c++20 -O2 -fno-math-errno -fPIC -pthread -MD -MQ -MF

NAMD

Input: STMV with 1,066,628 Atoms

OpenBenchmarking.orgns/day, More Is BetterNAMD 3.0b6Input: STMV with 1,066,628 Atomsabc0.03780.07560.11340.15120.1890.167930.168100.16812

Libplacebo

Test: polar_nocompute

OpenBenchmarking.orgFPS, More Is BetterLibplacebo 6.338.2Test: polar_nocomputeabc306090120150151.38151.42151.271. (CXX) g++ options: -fvisibility=hidden -std=c++20 -O2 -fno-math-errno -fPIC -pthread -MD -MQ -MF

VkFFT

Test: FFT + iFFT C2C 1D batched in single precision, no reshuffling

OpenBenchmarking.orgBenchmark Score, More Is BetterVkFFT 1.3.4Test: FFT + iFFT C2C 1D batched in single precision, no reshufflingabc2K4K6K8K10K8510850385061. (CXX) g++ options: -O3

VkFFT

Test: FFT + iFFT C2C 1D batched in single precision

OpenBenchmarking.orgBenchmark Score, More Is BetterVkFFT 1.3.4Test: FFT + iFFT C2C 1D batched in single precisionabc160032004800640080007500749574961. (CXX) g++ options: -O3

Libplacebo

Test: gaussian

OpenBenchmarking.orgFPS, More Is BetterLibplacebo 6.338.2Test: gaussianabc80160240320400370.73370.69370.581. (CXX) g++ options: -fvisibility=hidden -std=c++20 -O2 -fno-math-errno -fPIC -pthread -MD -MQ -MF

CacheBench

Test: Read

OpenBenchmarking.orgMB/s, More Is BetterCacheBenchTest: Readabc2K4K6K8K10K8945.288946.168946.29MIN: 8942.72 / MAX: 8947.89MIN: 8944.76 / MAX: 8946.98MIN: 8939.89 / MAX: 8948.191. (CC) gcc options: -O3 -lrt

Llamafile

Test: wizardcoder-python-34b-v1.0.Q6_K - Acceleration: CPU

OpenBenchmarking.orgTokens Per Second, More Is BetterLlamafile 0.6Test: wizardcoder-python-34b-v1.0.Q6_K - Acceleration: CPUabc0.01130.02260.03390.04520.05650.050.050.05

Llama.cpp

Model: llama-2-70b-chat.Q5_0.gguf

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b1808Model: llama-2-70b-chat.Q5_0.ggufabc0.00680.01360.02040.02720.0340.030.030.031. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -march=native -mtune=native -lopenblas

Llama.cpp

Model: llama-2-13b.Q4_0.gguf

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b1808Model: llama-2-13b.Q4_0.ggufabc0.8641.7282.5923.4564.323.843.843.841. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -march=native -mtune=native -lopenblas

Intel Open Image Denoise

Run: RTLightmap.hdr.4096x4096 - Device: CPU-Only

OpenBenchmarking.orgImages / Sec, More Is BetterIntel Open Image Denoise 2.2Run: RTLightmap.hdr.4096x4096 - Device: CPU-Onlyabc0.01580.03160.04740.06320.0790.070.070.07

Intel Open Image Denoise

Run: RT.ldr_alb_nrm.3840x2160 - Device: CPU-Only

OpenBenchmarking.orgImages / Sec, More Is BetterIntel Open Image Denoise 2.2Run: RT.ldr_alb_nrm.3840x2160 - Device: CPU-Onlyabc0.0360.0720.1080.1440.180.160.160.16

Intel Open Image Denoise

Run: RT.hdr_alb_nrm.3840x2160 - Device: CPU-Only

OpenBenchmarking.orgImages / Sec, More Is BetterIntel Open Image Denoise 2.2Run: RT.hdr_alb_nrm.3840x2160 - Device: CPU-Onlyabc0.03380.06760.10140.13520.1690.150.150.15

ONNX Runtime

Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInference Time Cost (ms), Fewer Is BetterONNX Runtime 1.17Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standardabc71421283526.9926.6928.131. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallel

OpenBenchmarking.orgInference Time Cost (ms), Fewer Is BetterONNX Runtime 1.17Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallelabc112233445547.8046.2943.111. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: super-resolution-10 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInference Time Cost (ms), Fewer Is BetterONNX Runtime 1.17Model: super-resolution-10 - Device: CPU - Executor: Standardabc61218243022.0621.2623.361. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: super-resolution-10 - Device: CPU - Executor: Parallel

OpenBenchmarking.orgInference Time Cost (ms), Fewer Is BetterONNX Runtime 1.17Model: super-resolution-10 - Device: CPU - Executor: Parallelabc91827364537.0835.2135.391. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInference Time Cost (ms), Fewer Is BetterONNX Runtime 1.17Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standardabc2468107.739977.746517.787131. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel

OpenBenchmarking.orgInference Time Cost (ms), Fewer Is BetterONNX Runtime 1.17Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallelabc369121510.0710.1510.271. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInference Time Cost (ms), Fewer Is BetterONNX Runtime 1.17Model: ArcFace ResNet-100 - Device: CPU - Executor: Standardabc153045607567.0361.3167.561. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallel

OpenBenchmarking.orgInference Time Cost (ms), Fewer Is BetterONNX Runtime 1.17Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallelabc20406080100100.13111.3598.091. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: fcn-resnet101-11 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInference Time Cost (ms), Fewer Is BetterONNX Runtime 1.17Model: fcn-resnet101-11 - Device: CPU - Executor: Standardabc300600900120015001457.411326.401468.671. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel

OpenBenchmarking.orgInference Time Cost (ms), Fewer Is BetterONNX Runtime 1.17Model: fcn-resnet101-11 - Device: CPU - Executor: Parallelabc50010001500200025002123.062405.912162.651. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInference Time Cost (ms), Fewer Is BetterONNX Runtime 1.17Model: CaffeNet 12-int8 - Device: CPU - Executor: Standardabc0.68481.36962.05442.73923.4242.993152.854823.043571. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallel

OpenBenchmarking.orgInference Time Cost (ms), Fewer Is BetterONNX Runtime 1.17Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallelabc0.94421.88842.83263.77684.7214.172194.196654.161121. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: bertsquad-12 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInference Time Cost (ms), Fewer Is BetterONNX Runtime 1.17Model: bertsquad-12 - Device: CPU - Executor: Standardabc306090120150144.92156.60146.231. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: bertsquad-12 - Device: CPU - Executor: Parallel

OpenBenchmarking.orgInference Time Cost (ms), Fewer Is BetterONNX Runtime 1.17Model: bertsquad-12 - Device: CPU - Executor: Parallelabc50100150200250234.78206.93251.441. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: T5 Encoder - Device: CPU - Executor: Standard

OpenBenchmarking.orgInference Time Cost (ms), Fewer Is BetterONNX Runtime 1.17Model: T5 Encoder - Device: CPU - Executor: Standardabc2468106.806046.813406.777371. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: T5 Encoder - Device: CPU - Executor: Parallel

OpenBenchmarking.orgInference Time Cost (ms), Fewer Is BetterONNX Runtime 1.17Model: T5 Encoder - Device: CPU - Executor: Parallelabc369121511.0011.1311.161. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: yolov4 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInference Time Cost (ms), Fewer Is BetterONNX Runtime 1.17Model: yolov4 - Device: CPU - Executor: Standardabc50100150200250197.56211.43203.411. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: yolov4 - Device: CPU - Executor: Parallel

OpenBenchmarking.orgInference Time Cost (ms), Fewer Is BetterONNX Runtime 1.17Model: yolov4 - Device: CPU - Executor: Parallelabc80160240320400345.25307.28320.041. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: GPT-2 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInference Time Cost (ms), Fewer Is BetterONNX Runtime 1.17Model: GPT-2 - Device: CPU - Executor: Standardabc36912159.473099.835719.523711. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: GPT-2 - Device: CPU - Executor: Parallel

OpenBenchmarking.orgInference Time Cost (ms), Fewer Is BetterONNX Runtime 1.17Model: GPT-2 - Device: CPU - Executor: Parallelabc369121512.2412.9511.991. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt


Phoronix Test Suite v10.8.5