AMD INVLPGB Linux Patch Performance
Benchmarks for a future article on the Linux kernel patches that use AMD's INVLPGB instruction for broadcast TLB invalidation, which is available on AMD Zen 3 and newer processors.
HTML result view exported from: https://openbenchmarking.org/result/2412250-NE-AMDINVLPG56&grr
AMD INVLPGB Linux Patch Performance: System Details

Configurations compared: Linux 6.13 Git, INVLPGB Patched

Processor: AMD EPYC 9655P 96-Core @ 2.60GHz (96 Cores / 192 Threads)
Motherboard: Supermicro Super Server H13SSL-N v1.01 (3.0 BIOS)
Chipset: AMD 1Ah
Memory: 12 x 64GB DDR5-6000MT/s Micron MTC40F2046S1RC64BDY QSFF
Disk: 3201GB Micron_7450_MTFDKCB3T2TFS + 257GB Flash Drive
Graphics: ASPEED
Network: 2 x Broadcom NetXtreme BCM5720 PCIe
OS: Ubuntu 24.10
Kernel (Linux 6.13 Git): 6.13.0-rc4-phx-stock (x86_64)
Kernel (INVLPGB Patched): 6.13.0-rc4-phx-broadcast-tlb (x86_64)
Desktop: GNOME Shell 47.0
Display Server: X Server
Compiler: GCC 14.2.0
File-System: ext4
Screen Resolution: 1024x768

Kernel Details - Transparent Huge Pages: madvise
Compiler Details - --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2,rust --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-14-zdkDXv/gcc-14-14.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-14-zdkDXv/gcc-14-14.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Disk Details - NONE / relatime,rw,stripe=64 / Block Size: 4096
Processor Details - Scaling Governor: acpi-cpufreq performance (Boost: Enabled) - CPU Microcode: 0xb002116
Java Details - OpenJDK Runtime Environment (build 21.0.5+11-Ubuntu-1ubuntu124.10)
Python Details - Python 3.12.7
Security Details:
  gather_data_sampling: Not affected
  itlb_multihit: Not affected
  l1tf: Not affected
  mds: Not affected
  meltdown: Not affected
  mmio_stale_data: Not affected
  reg_file_data_sampling: Not affected
  retbleed: Not affected
  spec_rstack_overflow: Not affected
  spec_store_bypass: Mitigation of SSB disabled via prctl
  spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
  spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; STIBP: always-on; RSB filling; PBRSB-eIBRS: Not affected; BHI: Not affected
  srbds: Not affected
  tsx_async_abort: Not affected
AMD INVLPGB Linux Patch Performance: Results Overview

Test | Linux 6.13 Git | INVLPGB Patched
relion: Basic - CPU | 214.442 | 211.380
whisper-cpp: ggml-medium.en - 2016 State of the Union | 460.62129 | 453.06996
mariadb: oltp_update_index - 256 | 113658 | 117679
mariadb: oltp_read_only - 256 | 37688 | 37868
mariadb: oltp_read_write - 256 | 172615 | 173765
xmrig: GhostRider - 1M | 13706.9 | 16061.4
whisper-cpp: ggml-small.en - 2016 State of the Union | 225.41952 | 219.87375
quicksilver: CORAL2 P2 | 22056667 | 22056667
rocksdb: Read While Writing | 14641889 | 14737550
build-llvm: Unix Makefiles | 160.666 | 161.060
apache-iotdb: 800 - 100 - 800 - 400 (Average Latency) | 212.98 | 206.65
apache-iotdb: 800 - 100 - 800 - 400 (point/sec) | 141171394 | 144092202
clickhouse: 100M Rows Hits Dataset, Third Run | 751.84 | 759.44
clickhouse: 100M Rows Hits Dataset, Second Run | 764.71 | 749.62
clickhouse: 100M Rows Hits Dataset, First Run / Cold Cache | 738.99 | 735.32
apache-iotdb: 800 - 100 - 500 - 400 (point/sec) | 121237884 | 120726010
cassandra: Writes | 438419 | 452025
blender: Barbershop - CPU-Only | 124.03 | 124.48
apache-iotdb: 500 - 100 - 800 - 400 (Average Latency) | 211.81 | 214.48
apache-iotdb: 500 - 100 - 800 - 400 (point/sec) | 125423055 | 124631354
apache-iotdb: 800 - 100 - 500 - 100 (Average Latency) | 40.40 | 40.04
apache-iotdb: 800 - 100 - 500 - 100 (point/sec) | 117887210 | 118890170
apache-iotdb: 500 - 100 - 800 - 100 (Average Latency) | 61.49 | 60.93
apache-iotdb: 500 - 100 - 800 - 100 (point/sec) | 120738449 | 121681953
renaissance: Savina Reactors.IO | 5722.9 | 5651.4
openfoam: drivaerFastback, Medium Mesh Size - Mesh Time | 109.86392 | 106.20884
apache-iotdb: 500 - 100 - 500 - 400 (Average Latency) | 168.36 | 167.95
apache-iotdb: 500 - 100 - 500 - 400 (point/sec) | 99520170 | 100250585
apache-iotdb: 500 - 100 - 500 - 100 (Average Latency) | 47.85 | 47.03
apache-iotdb: 500 - 100 - 500 - 100 (point/sec) | 96806809 | 97751377
renaissance: ALS Movie Lens | 18214.8 | 18368.8
build-llvm: Ninja | 90.415 | 90.775
nginx: 500 | 503771.76 | 517615.54
renaissance: In-Memory Database Shootout | 4511.6 | 4512.9
build-godot: Time To Compile | 80.687 | 80.831
renaissance: Apache Spark Bayes | 180.1 | 181.7
renaissance: Apache Spark PageRank | 2261.3 | 2283.3
memcached: 1:5 | 3569817.66 | 3583145.46
build-python: Released Build, PGO + LTO Optimized | 190.912 | 191.972
speedb: Update Rand | 519867 | 522238
speedb: Read Rand Write Rand | 3054826 | 3106319
speedb: Rand Read | 622084083 | 627286939
dacapobench: Jython | 4320 | 4158
quicksilver: CORAL2 P1 | 34086667 | 34996667
npb: LU.C | 312554.83 | 311722.61
rodinia: OpenMP Leukocyte | 31.674 | 31.298
dacapobench: jMonkeyEngine | 6808 | 6806
dacapobench: Apache Kafka | 5046 | 5050
namd: STMV with 1,066,628 Atoms | 4.21510 | 4.20973
dacapobench: BioJava Biological Data Framework | 4519 | 4547
npb: BT.C | 353192.16 | 356244.91
dacapobench: GraphChi | 2363 | 2323
npb: SP.C | 154232.79 | 155286.47
npb: IS.D | 7001.42 | 6984.65
dacapobench: Avrora AVR Simulation Framework | 2585 | 2556
npb: MG.C | 145177.79 | 147724.66
namd: ATPase with 327,506 Atoms | 12.22443 | 12.22125
npb: EP.C | 11180.11 | 11450.17
dacapobench: Apache Xalan XSLT | 816 | 812
build-python: Default | 13.067 | 12.984
dacapobench: Zxing 1D/2D Barcode Image Processing | 648 | 619
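The overview mixes metrics where fewer is better (seconds, latency, ms) with metrics where more is better (queries per second, op/s, H/s), so raw differences are not directly comparable. The following is a minimal sketch, not part of the exported result: the helper name `percent_improvement` is hypothetical, and the four rows are copied by hand from the overview above. It normalizes each pair so that a positive number always means the patched kernel did better.

```python
def percent_improvement(stock: float, patched: float, higher_is_better: bool) -> float:
    """Return the patched kernel's gain over stock in percent (positive = better)."""
    if higher_is_better:
        return (patched / stock - 1.0) * 100.0
    # For time and latency metrics a smaller patched value is the improvement.
    return (stock / patched - 1.0) * 100.0


# (stock, patched, higher_is_better) rows copied from the overview.
rows = {
    "xmrig: GhostRider - 1M (H/s)": (13706.9, 16061.4, True),
    "mariadb: oltp_update_index - 256 (QPS)": (113658.0, 117679.0, True),
    "relion: Basic - CPU (seconds)": (214.442, 211.380, False),
    "clickhouse: Second Run (queries/min)": (764.71, 749.62, True),
}

for name, (stock, patched, higher_is_better) in rows.items():
    change = percent_improvement(stock, patched, higher_is_better)
    print(f"{name}: {change:+.2f}%")
```

A single signed percentage per test makes it easier to scan the list for outliers, but it says nothing about run-to-run noise, which the per-test standard errors cover.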
RELION 5.0 - Test: Basic - Device: CPU (Seconds, Fewer Is Better)
  Linux 6.13 Git: 214.44 (SE +/- 3.72, N = 12)
  INVLPGB Patched: 211.38 (SE +/- 4.55, N = 12)
  (CXX) g++ options: -fPIC -std=c++14 -fopenmp -O3 -rdynamic -ldl -ltiff -lfftw3f -lfftw3 -lpng -ljpeg -lmpi_cxx -lmpi
Whisper.cpp 1.6.2 - Model: ggml-medium.en - Input: 2016 State of the Union (Seconds, Fewer Is Better)
  Linux 6.13 Git: 460.62 (SE +/- 3.15, N = 3)
  INVLPGB Patched: 453.07 (SE +/- 0.73, N = 3)
  (CXX) g++ options: -O3 -std=c++11 -fPIC -pthread -msse3 -mssse3 -mavx -mf16c -mfma -mavx2 -mavx512f -mavx512cd -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512vnni
MariaDB 11.5 - Test: oltp_update_index - Threads: 256 (Queries Per Second, More Is Better)
  Linux 6.13 Git: 113658 (SE +/- 190.75, N = 3)
  INVLPGB Patched: 117679 (SE +/- 200.13, N = 3)
  (CXX) g++ options: -pie -fPIC -fstack-protector -O3 -lnuma -lpcre2-8 -lcrypt -laio -lz -lm -lssl -lcrypto -lpthread -ldl
MariaDB 11.5 - Test: oltp_read_only - Threads: 256 (Queries Per Second, More Is Better)
  Linux 6.13 Git: 37688 (SE +/- 17.06, N = 3)
  INVLPGB Patched: 37868 (SE +/- 112.63, N = 3)
  (CXX) g++ options: -pie -fPIC -fstack-protector -O3 -lnuma -lpcre2-8 -lcrypt -laio -lz -lm -lssl -lcrypto -lpthread -ldl
MariaDB 11.5 - Test: oltp_read_write - Threads: 256 (Queries Per Second, More Is Better)
  Linux 6.13 Git: 172615 (SE +/- 73.62, N = 3)
  INVLPGB Patched: 173765 (SE +/- 93.11, N = 3)
  (CXX) g++ options: -pie -fPIC -fstack-protector -O3 -lnuma -lpcre2-8 -lcrypt -laio -lz -lm -lssl -lcrypto -lpthread -ldl
Xmrig 6.21 - Variant: GhostRider - Hash Count: 1M (H/s, More Is Better)
  Linux 6.13 Git: 13706.9 (SE +/- 668.22, N = 15)
  INVLPGB Patched: 16061.4 (SE +/- 32.37, N = 3)
  (CXX) g++ options: -fexceptions -fno-rtti -maes -O3 -Ofast -static-libgcc -static-libstdc++ -rdynamic -lssl -lcrypto -luv -lpthread -lrt -ldl -lhwloc
Whisper.cpp 1.6.2 - Model: ggml-small.en - Input: 2016 State of the Union (Seconds, Fewer Is Better)
  Linux 6.13 Git: 225.42 (SE +/- 0.39, N = 3)
  INVLPGB Patched: 219.87 (SE +/- 1.60, N = 3)
  (CXX) g++ options: -O3 -std=c++11 -fPIC -pthread -msse3 -mssse3 -mavx -mf16c -mfma -mavx2 -mavx512f -mavx512cd -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512vnni
Quicksilver 20230818 - Input: CORAL2 P2 (Figure Of Merit, More Is Better)
  Linux 6.13 Git: 22056667 (SE +/- 12018.50, N = 3)
  INVLPGB Patched: 22056667 (SE +/- 18559.21, N = 3)
  (CXX) g++ options: -fopenmp -O3 -march=native
RocksDB 9.0 - Test: Read While Writing (Op/s, More Is Better)
  Linux 6.13 Git: 14641889 (SE +/- 157207.18, N = 3)
  INVLPGB Patched: 14737550 (SE +/- 155717.24, N = 15)
  (CXX) g++ options: -O3 -march=native -pthread -fno-builtin-memcmp -fno-rtti
Timed LLVM Compilation 16.0 - Build System: Unix Makefiles (Seconds, Fewer Is Better)
  Linux 6.13 Git: 160.67 (SE +/- 0.32, N = 3)
  INVLPGB Patched: 161.06 (SE +/- 0.59, N = 3)
Apache IoTDB 1.2 - Device Count: 800 - Batch Size Per Write: 100 - Sensor Count: 800 - Client Number: 400 (Average Latency, Fewer Is Better)
  Linux 6.13 Git: 212.98 (SE +/- 2.18, N = 3; MAX: 26561.37)
  INVLPGB Patched: 206.65 (SE +/- 3.85, N = 3; MAX: 26883.6)
Apache IoTDB 1.2 - Device Count: 800 - Batch Size Per Write: 100 - Sensor Count: 800 - Client Number: 400 (point/sec, More Is Better)
  Linux 6.13 Git: 141171394 (SE +/- 870384.94, N = 3)
  INVLPGB Patched: 144092202 (SE +/- 1488492.23, N = 3)
ClickHouse 22.12.3.5 - 100M Rows Hits Dataset, Third Run (Queries Per Minute, Geo Mean, More Is Better)
  Linux 6.13 Git: 751.84 (SE +/- 0.95, N = 3; MIN: 70.34 / MAX: 6666.67)
  INVLPGB Patched: 759.44 (SE +/- 3.07, N = 3; MIN: 69.04 / MAX: 8571.43)
ClickHouse 22.12.3.5 - 100M Rows Hits Dataset, Second Run (Queries Per Minute, Geo Mean, More Is Better)
  Linux 6.13 Git: 764.71 (SE +/- 4.01, N = 3; MIN: 69.85 / MAX: 8571.43)
  INVLPGB Patched: 749.62 (SE +/- 1.96, N = 3; MIN: 69.28 / MAX: 6666.67)
ClickHouse 22.12.3.5 - 100M Rows Hits Dataset, First Run / Cold Cache (Queries Per Minute, Geo Mean, More Is Better)
  Linux 6.13 Git: 738.99 (SE +/- 6.88, N = 3; MIN: 69.85 / MAX: 7500)
  INVLPGB Patched: 735.32 (SE +/- 4.99, N = 3; MIN: 68.03 / MAX: 7500)
Apache IoTDB 1.2 - Device Count: 800 - Batch Size Per Write: 100 - Sensor Count: 500 - Client Number: 400 (point/sec, More Is Better)
  Linux 6.13 Git: 121237884 (SE +/- 875042.28, N = 3)
  INVLPGB Patched: 120726010 (SE +/- 1093112.93, N = 3)
Apache Cassandra 5.0 - Test: Writes (Op/s, More Is Better)
  Linux 6.13 Git: 438419 (SE +/- 5248.18, N = 3)
  INVLPGB Patched: 452025 (SE +/- 2371.91, N = 3)
Blender 4.3 - Blend File: Barbershop - Compute: CPU-Only (Seconds, Fewer Is Better)
  Linux 6.13 Git: 124.03 (SE +/- 0.13, N = 3)
  INVLPGB Patched: 124.48 (SE +/- 0.12, N = 3)
Apache IoTDB 1.2 - Device Count: 500 - Batch Size Per Write: 100 - Sensor Count: 800 - Client Number: 400 (Average Latency, Fewer Is Better)
  Linux 6.13 Git: 211.81 (SE +/- 2.05, N = 3; MAX: 26536.58)
  INVLPGB Patched: 214.48 (SE +/- 0.65, N = 3; MAX: 26489.9)
Apache IoTDB 1.2 - Device Count: 500 - Batch Size Per Write: 100 - Sensor Count: 800 - Client Number: 400 (point/sec, More Is Better)
  Linux 6.13 Git: 125423055 (SE +/- 352556.17, N = 3)
  INVLPGB Patched: 124631354 (SE +/- 332877.17, N = 3)
Apache IoTDB 1.2 - Device Count: 800 - Batch Size Per Write: 100 - Sensor Count: 500 - Client Number: 100 (Average Latency, Fewer Is Better)
  Linux 6.13 Git: 40.40 (SE +/- 0.37, N = 3; MAX: 23837.32)
  INVLPGB Patched: 40.04 (SE +/- 0.04, N = 3; MAX: 23821.12)
Apache IoTDB 1.2 - Device Count: 800 - Batch Size Per Write: 100 - Sensor Count: 500 - Client Number: 100 (point/sec, More Is Better)
  Linux 6.13 Git: 117887210 (SE +/- 1087988.19, N = 3)
  INVLPGB Patched: 118890170 (SE +/- 226557.07, N = 3)
Apache IoTDB 1.2 - Device Count: 500 - Batch Size Per Write: 100 - Sensor Count: 800 - Client Number: 100 (Average Latency, Fewer Is Better)
  Linux 6.13 Git: 61.49 (SE +/- 0.36, N = 3; MAX: 11313.55)
  INVLPGB Patched: 60.93 (SE +/- 0.25, N = 3; MAX: 11306.84)
Apache IoTDB 1.2 - Device Count: 500 - Batch Size Per Write: 100 - Sensor Count: 800 - Client Number: 100 (point/sec, More Is Better)
  Linux 6.13 Git: 120738449 (SE +/- 58310.64, N = 3)
  INVLPGB Patched: 121681953 (SE +/- 67599.70, N = 3)
Renaissance 0.16 - Test: Savina Reactors.IO (ms, Fewer Is Better)
  Linux 6.13 Git: 5722.9 (SE +/- 75.13, N = 3; MIN: 5596.11 / MAX: 9196.98)
  INVLPGB Patched: 5651.4 (SE +/- 52.24, N = 7; MIN: 5405.73 / MAX: 9193.8)
OpenFOAM 10 - Input: drivaerFastback, Medium Mesh Size - Mesh Time (Seconds, Fewer Is Better)
  Linux 6.13 Git: 109.86
  INVLPGB Patched: 106.21
  (CXX) g++ options: -std=c++14 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -lfiniteVolume -lmeshTools -lparallel -llagrangian -lregionModels -lgenericPatchFields -lOpenFOAM -ldl -lm
Apache IoTDB 1.2 - Device Count: 500 - Batch Size Per Write: 100 - Sensor Count: 500 - Client Number: 400 (Average Latency, Fewer Is Better)
  Linux 6.13 Git: 168.36 (SE +/- 0.88, N = 3; MAX: 26450.24)
  INVLPGB Patched: 167.95 (SE +/- 0.71, N = 3; MAX: 26468.05)
Apache IoTDB 1.2 - Device Count: 500 - Batch Size Per Write: 100 - Sensor Count: 500 - Client Number: 400 (point/sec, More Is Better)
  Linux 6.13 Git: 99520170 (SE +/- 603733.82, N = 3)
  INVLPGB Patched: 100250585 (SE +/- 97805.09, N = 3)
Apache IoTDB 1.2 - Device Count: 500 - Batch Size Per Write: 100 - Sensor Count: 500 - Client Number: 100 (Average Latency, Fewer Is Better)
  Linux 6.13 Git: 47.85 (SE +/- 0.55, N = 3; MAX: 12576.32)
  INVLPGB Patched: 47.03 (SE +/- 0.36, N = 3; MAX: 12578.34)
Apache IoTDB 1.2 - Device Count: 500 - Batch Size Per Write: 100 - Sensor Count: 500 - Client Number: 100 (point/sec, More Is Better)
  Linux 6.13 Git: 96806809 (SE +/- 1081476.60, N = 3)
  INVLPGB Patched: 97751377 (SE +/- 291260.63, N = 3)
Renaissance 0.16 - Test: ALS Movie Lens (ms, Fewer Is Better)
  Linux 6.13 Git: 18214.8 (SE +/- 45.86, N = 3; MIN: 17691.46 / MAX: 18303.77)
  INVLPGB Patched: 18368.8 (SE +/- 114.89, N = 3; MIN: 17674.24 / MAX: 18580.57)
Timed LLVM Compilation 16.0 - Build System: Ninja (Seconds, Fewer Is Better)
  Linux 6.13 Git: 90.42 (SE +/- 0.24, N = 3)
  INVLPGB Patched: 90.78 (SE +/- 0.11, N = 3)
nginx 1.23.2 - Connections: 500 (Requests Per Second, More Is Better)
  Linux 6.13 Git: 503771.76 (SE +/- 1734.77, N = 3)
  INVLPGB Patched: 517615.54 (SE +/- 1146.82, N = 3)
  (CC) gcc options: -lluajit-5.1 -lm -lssl -lcrypto -lpthread -ldl -std=c99 -O2
Renaissance 0.16 - Test: In-Memory Database Shootout (ms, Fewer Is Better)
  Linux 6.13 Git: 4511.6 (SE +/- 55.34, N = 4; MIN: 4295.26 / MAX: 5024.51)
  INVLPGB Patched: 4512.9 (SE +/- 34.36, N = 3; MIN: 4347.69 / MAX: 5072.64)
Timed Godot Game Engine Compilation 4.0 - Time To Compile (Seconds, Fewer Is Better)
  Linux 6.13 Git: 80.69 (SE +/- 0.05, N = 3)
  INVLPGB Patched: 80.83 (SE +/- 0.16, N = 3)
Renaissance 0.16 - Test: Apache Spark Bayes (ms, Fewer Is Better)
  Linux 6.13 Git: 180.1 (SE +/- 0.85, N = 3; MIN: 160.23 / MAX: 297.68)
  INVLPGB Patched: 181.7 (SE +/- 1.59, N = 3; MIN: 164.62 / MAX: 284.35)
Renaissance 0.16 - Test: Apache Spark PageRank (ms, Fewer Is Better)
  Linux 6.13 Git: 2261.3 (SE +/- 9.21, N = 3; MIN: 1572.27 / MAX: 2279.05)
  INVLPGB Patched: 2283.3 (SE +/- 22.38, N = 3; MIN: 1561.41 / MAX: 2328.07)
Memcached 1.6.19 - Set To Get Ratio: 1:5 (Ops/sec, More Is Better)
  Linux 6.13 Git: 3569817.66 (SE +/- 2641.20, N = 3)
  INVLPGB Patched: 3583145.46 (SE +/- 7706.14, N = 3)
  (CXX) g++ options: -O2 -levent_openssl -levent -lcrypto -lssl -lpthread -lz -lpcre
Timed CPython Compilation 3.10.6 - Build Configuration: Released Build, PGO + LTO Optimized (Seconds, Fewer Is Better)
  Linux 6.13 Git: 190.91
  INVLPGB Patched: 191.97
Speedb 2.7 - Test: Update Random (Op/s, More Is Better)
  Linux 6.13 Git: 519867 (SE +/- 210.08, N = 3)
  INVLPGB Patched: 522238 (SE +/- 726.96, N = 3)
  (CXX) g++ options: -O3 -march=native -pthread -fno-builtin-memcmp -fno-rtti
Speedb 2.7 - Test: Read Random Write Random (Op/s, More Is Better)
  Linux 6.13 Git: 3054826 (SE +/- 5973.22, N = 3)
  INVLPGB Patched: 3106319 (SE +/- 22974.88, N = 3)
  (CXX) g++ options: -O3 -march=native -pthread -fno-builtin-memcmp -fno-rtti
Speedb 2.7 - Test: Random Read (Op/s, More Is Better)
  Linux 6.13 Git: 622084083 (SE +/- 7230810.31, N = 3)
  INVLPGB Patched: 627286939 (SE +/- 1540848.02, N = 3)
  (CXX) g++ options: -O3 -march=native -pthread -fno-builtin-memcmp -fno-rtti
DaCapo Benchmark 23.11 - Java Test: Jython (msec, Fewer Is Better)
  Linux 6.13 Git: 4320 (SE +/- 37.01, N = 15)
  INVLPGB Patched: 4158 (SE +/- 10.53, N = 3)
Quicksilver 20230818 - Input: CORAL2 P1 (Figure Of Merit, More Is Better)
  Linux 6.13 Git: 34086667 (SE +/- 46666.67, N = 3)
  INVLPGB Patched: 34996667 (SE +/- 98713.95, N = 3)
  (CXX) g++ options: -fopenmp -O3 -march=native
NAS Parallel Benchmarks 3.4 - Test / Class: LU.C (Total Mop/s, More Is Better)
  Linux 6.13 Git: 312554.83 (SE +/- 2270.42, N = 15)
  INVLPGB Patched: 311722.61 (SE +/- 2363.19, N = 10)
  (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
  Open MPI 4.1.6
Rodinia 3.1 - Test: OpenMP Leukocyte (Seconds, Fewer Is Better)
  Linux 6.13 Git: 31.67 (SE +/- 0.33, N = 3)
  INVLPGB Patched: 31.30 (SE +/- 0.16, N = 3)
  (CXX) g++ options: -O2 -lOpenCL
DaCapo Benchmark 23.11 - Java Test: jMonkeyEngine (msec, Fewer Is Better)
  Linux 6.13 Git: 6808 (SE +/- 1.67, N = 3)
  INVLPGB Patched: 6806 (SE +/- 1.67, N = 3)
DaCapo Benchmark 23.11 - Java Test: Apache Kafka (msec, Fewer Is Better)
  Linux 6.13 Git: 5046 (SE +/- 4.26, N = 3)
  INVLPGB Patched: 5050 (SE +/- 6.08, N = 3)
NAMD 3.0 - Input: STMV with 1,066,628 Atoms (ns/day, More Is Better)
  Linux 6.13 Git: 4.21510 (SE +/- 0.00531, N = 3)
  INVLPGB Patched: 4.20973 (SE +/- 0.00334, N = 3)
DaCapo Benchmark 23.11 - Java Test: BioJava Biological Data Framework (msec, Fewer Is Better)
  Linux 6.13 Git: 4519 (SE +/- 20.17, N = 3)
  INVLPGB Patched: 4547 (SE +/- 17.52, N = 3)
NAS Parallel Benchmarks 3.4 - Test / Class: BT.C (Total Mop/s, More Is Better)
  Linux 6.13 Git: 353192.16 (SE +/- 3855.67, N = 3)
  INVLPGB Patched: 356244.91 (SE +/- 3962.64, N = 5)
  (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
  Open MPI 4.1.6
DaCapo Benchmark 23.11 - Java Test: GraphChi (msec, Fewer Is Better)
  Linux 6.13 Git: 2363 (SE +/- 8.96, N = 3)
  INVLPGB Patched: 2323 (SE +/- 15.84, N = 3)
NAS Parallel Benchmarks 3.4 - Test / Class: SP.C (Total Mop/s, More Is Better)
  Linux 6.13 Git: 154232.79 (SE +/- 1469.84, N = 3)
  INVLPGB Patched: 155286.47 (SE +/- 279.05, N = 3)
  (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
  Open MPI 4.1.6
NAS Parallel Benchmarks 3.4 - Test / Class: IS.D (Total Mop/s, More Is Better)
  Linux 6.13 Git: 7001.42 (SE +/- 60.97, N = 3)
  INVLPGB Patched: 6984.65 (SE +/- 68.35, N = 5)
  (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
  Open MPI 4.1.6
DaCapo Benchmark 23.11 - Java Test: Avrora AVR Simulation Framework (msec, Fewer Is Better)
  Linux 6.13 Git: 2585 (SE +/- 8.50, N = 3)
  INVLPGB Patched: 2556 (SE +/- 19.37, N = 3)
NAS Parallel Benchmarks 3.4 - Test / Class: MG.C (Total Mop/s, More Is Better)
  Linux 6.13 Git: 145177.79 (SE +/- 2956.05, N = 15)
  INVLPGB Patched: 147724.66 (SE +/- 1764.21, N = 3)
  (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
  Open MPI 4.1.6
NAMD 3.0 - Input: ATPase with 327,506 Atoms (ns/day, More Is Better)
  Linux 6.13 Git: 12.22 (SE +/- 0.03, N = 3)
  INVLPGB Patched: 12.22 (SE +/- 0.03, N = 3)
NAS Parallel Benchmarks 3.4 - Test / Class: EP.C (Total Mop/s, More Is Better)
  Linux 6.13 Git: 11180.11 (SE +/- 27.23, N = 3)
  INVLPGB Patched: 11450.17 (SE +/- 98.82, N = 15)
  (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
  Open MPI 4.1.6
DaCapo Benchmark 23.11 - Java Test: Apache Xalan XSLT (msec, Fewer Is Better)
  Linux 6.13 Git: 816 (SE +/- 7.60, N = 6)
  INVLPGB Patched: 812 (SE +/- 2.00, N = 3)
Timed CPython Compilation 3.10.6 - Build Configuration: Default (Seconds, Fewer Is Better)
  Linux 6.13 Git: 13.07
  INVLPGB Patched: 12.98
DaCapo Benchmark 23.11 - Java Test: Zxing 1D/2D Barcode Image Processing (msec, Fewer Is Better)
  Linux 6.13 Git: 648 (SE +/- 4.04, N = 3)
  INVLPGB Patched: 619 (SE +/- 6.43, N = 3)
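Most results above report a standard error (SE) and run count (N) per kernel. One rough way to judge whether a delta stands out from run-to-run noise is to divide the difference of the two means by the combined standard error, sqrt(SE_a^2 + SE_b^2). The sketch below is an illustration only: the helper name `noise_ratio` is hypothetical, it assumes the runs are independent, and with N as low as 3 it is a coarse screen rather than a formal significance test.

```python
import math


def noise_ratio(mean_a: float, se_a: float, mean_b: float, se_b: float) -> float:
    """Absolute difference of two means in units of their combined standard error."""
    combined_se = math.hypot(se_a, se_b)  # sqrt(se_a**2 + se_b**2)
    if combined_se == 0.0:
        raise ValueError("both standard errors are zero; ratio is undefined")
    return abs(mean_b - mean_a) / combined_se


# Values copied from the nginx and Renaissance In-Memory Database Shootout results.
nginx = noise_ratio(503771.76, 1734.77, 517615.54, 1146.82)
shootout = noise_ratio(4511.6, 55.34, 4512.9, 34.36)

print(f"nginx, 500 connections: {nginx:.2f} combined SEs apart")
print(f"In-Memory Database Shootout: {shootout:.2f} combined SEs apart")
```

A ratio of several combined standard errors, as for nginx, suggests the gap is unlikely to be noise alone, while a ratio well below 1, as for the database shootout, means the two kernels are indistinguishable on that test.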
Phoronix Test Suite v10.8.5