Add LIBXML_NOBLANKS to dom_load() XML parse options #304

Open
jordikroon wants to merge 1 commit into php:master from jordikroon:performance-improvements

Conversation

jordikroon commented Jun 17, 2026

Member

Drops whitespace-only text nodes, which shouldn't be harmful. The result is a smaller DOM that is faster to traverse and needs less memory.

Benchmarked configure.php --with-lang=en, 5 runs, PHP 8.5:

# /usr/bin/time -l (BSD/macOS) also reports peak memory and other resource usage on stderr
for i in 1 2 3 4 5; do
  /usr/bin/time -l php configure.php --with-lang=en >/dev/null
done 2>benchmark.txt

Benchmark:

| | Before | After |
|---|---|---|
| Time | 8.94 s | 6.28 s |
| Peak memory | 1687 MB | 1263 MB |
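
The summary figures coincide with the medians of the five runs in the raw output below. As a hedged sketch of how they and the relative savings can be reproduced, the snippet copies the "real" times and "peak memory footprint" values from that raw output; the variable names and the `reduction` helper are illustrative and not part of the PR.

```python
from statistics import median

# Wall-clock ("real") seconds and peak memory footprint in bytes,
# copied from the raw /usr/bin/time output of the ten runs.
baseline_real = [8.94, 8.36, 9.61, 6.83, 10.23]
patched_real = [6.40, 5.64, 9.44, 6.28, 5.81]
baseline_peak = [1687130560, 1687310784, 1687376320, 1687343552, 1687310784]
patched_peak = [1263193728, 1263554176, 1263423104, 1263324800, 1263062656]


def reduction(before, after):
    """Relative reduction of `after` versus `before`, in percent."""
    return (before - after) / before * 100


time_before, time_after = median(baseline_real), median(patched_real)
mem_before, mem_after = median(baseline_peak), median(patched_peak)

print(f"time:   {time_before:.2f} s -> {time_after:.2f} s "
      f"({reduction(time_before, time_after):.1f}% less)")
print(f"memory: {mem_before / 1e6:.0f} MB -> {mem_after / 1e6:.0f} MB "
      f"({reduction(mem_before, mem_after):.1f}% less)")
```

With these inputs the medians are 8.94 s and 6.28 s (about 30% less wall-clock time) and 1687 MB and 1263 MB (about 25% less peak memory). The wall-clock times vary considerably between runs (5.64 s to 9.44 s after the patch), so the time saving is a rougher estimate than the memory saving.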

Raw files:

Before patch (baseline)
=== baseline run 1 ===
        8.94 real         5.79 user         2.19 sys
          1703395328  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
              178083  page reclaims
                2166  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                   0  signals received
               23136  voluntary context switches
               10176  involuntary context switches
         49464100831  instructions retired
         15879556244  cycles elapsed
          1687130560  peak memory footprint
=== baseline run 2 ===
        8.36 real         5.91 user         3.10 sys
          1745633280  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
              137481  page reclaims
                 410  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                   0  signals received
                2017  voluntary context switches
               11477  involuntary context switches
         50933601605  instructions retired
         18544221387  cycles elapsed
          1687310784  peak memory footprint
=== baseline run 3 ===
        9.61 real         6.00 user         3.05 sys
          1650540544  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
              226507  page reclaims
                2176  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                   0  signals received
               13407  voluntary context switches
               17884  involuntary context switches
         50502793793  instructions retired
         16581835963  cycles elapsed
          1687376320  peak memory footprint
=== baseline run 4 ===
        6.83 real         5.81 user         1.75 sys
          1761476608  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
              137900  page reclaims
                 391  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                   0  signals received
                 271  voluntary context switches
                6098  involuntary context switches
         48201588467  instructions retired
         14541027505  cycles elapsed
          1687343552  peak memory footprint
=== baseline run 5 ===
       10.23 real         6.30 user         3.08 sys
          1748795392  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
              138801  page reclaims
                2065  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                   0  signals received
               15892  voluntary context switches
               25486  involuntary context switches
         51353051988  instructions retired
         18729227016  cycles elapsed
          1687310784  peak memory footprint
After patch
=== patched run 1 ===
        6.40 real         5.35 user         1.83 sys
          1306656768  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
              111176  page reclaims
                 486  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                   0  signals received
                 967  voluntary context switches
                7422  involuntary context switches
         42787245602  instructions retired
         13223610026  cycles elapsed
          1263193728  peak memory footprint
=== patched run 2 ===
        5.64 real         5.08 user         1.57 sys
          1313062912  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
              111184  page reclaims
                 371  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                   0  signals received
                 241  voluntary context switches
                3070  involuntary context switches
         42706120047  instructions retired
         12344963394  cycles elapsed
          1263554176  peak memory footprint
=== patched run 3 ===
        9.44 real         5.80 user         3.54 sys
          1300905984  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
              111008  page reclaims
                1975  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                   0  signals received
                8799  voluntary context switches
               16634  involuntary context switches
         45579012902  instructions retired
         16727081777  cycles elapsed
          1263423104  peak memory footprint
=== patched run 4 ===
        6.28 real         5.24 user         1.75 sys
          1243627520  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
              131634  page reclaims
                 477  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                   0  signals received
                1024  voluntary context switches
                8285  involuntary context switches
         42888761617  instructions retired
         13168907040  cycles elapsed
          1263324800  peak memory footprint
=== patched run 5 ===
        5.81 real         5.22 user         1.61 sys
          1309032448  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
              111131  page reclaims
                 396  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                   0  signals received
                 168  voluntary context switches
                3945  involuntary context switches
         42648784251  instructions retired
         12410583670  cycles elapsed
          1263062656  peak memory footprint

jordikroon requested review from Girgias and alfsb on Jun 17, 2026 20:04

alfsb commented Jun 17, 2026

Member

As I mentioned on Discord, I know little to nothing about PhD in this respect.

Why PhD? Because some parts of DocBook specify that whitespace must be preserved during rendering, so some whitespace may end up being relevant in PhD or in the final output.

For better performance and lower memory use, removing XML comments may have a similar impact, and comments are unspecified in DocBook.
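
A minimal sketch of that alternative, assuming the comments are removed from the tree after parsing. It uses Python's `xml.dom.minidom` purely as a stand-in for PHP's DOM; the `drop_comments` helper and the sample fragment are hypothetical and not taken from doc-base.

```python
from xml.dom import minidom


def drop_comments(node):
    """Recursively remove comment nodes; text nodes, including whitespace, are untouched."""
    for child in list(node.childNodes):
        if child.nodeType == child.COMMENT_NODE:
            node.removeChild(child)
        else:
            drop_comments(child)


# Hypothetical fragment: a comment next to two inline elements separated by a space.
source = "<para><!-- editor note --><acronym>ASCII</acronym> <acronym>NUL</acronym></para>"

doc = minidom.parseString(source)
drop_comments(doc.documentElement)
result = doc.documentElement.toxml()
print(result)
```

The space between the two `<acronym>` elements survives, so this kind of pruning does not change the rendered text. How much time or memory it would save on the real manual is not measured here.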

alfsb commented Jun 18, 2026

Member

Philip O gives an example where this causes a problem: the rendering of ltrim() breaks, because two consecutive <acronym> elements separated by a space are transformed in the source, and then rendered, as two <acronym> elements glued together.
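
The effect can be illustrated with a small sketch. It does not call libxml2 and does not reproduce the exact rules behind LIBXML_NOBLANKS; it naively removes every whitespace-only text node from a tree built with Python's `xml.dom.minidom`. The helper names and the fragment are hypothetical.

```python
from xml.dom import minidom


def drop_blank_text(node):
    """Recursively remove text nodes that contain only whitespace."""
    for child in list(node.childNodes):
        if child.nodeType == child.TEXT_NODE and child.data.strip() == "":
            node.removeChild(child)
        else:
            drop_blank_text(child)


def text_of(node):
    """Concatenate all text beneath a node, roughly what a renderer would emit."""
    if node.nodeType == node.TEXT_NODE:
        return node.data
    return "".join(text_of(child) for child in node.childNodes)


# Hypothetical fragment: two inline elements separated by a single space.
source = "<para><acronym>ASCII</acronym> <acronym>NUL</acronym></para>"

doc = minidom.parseString(source)
before = text_of(doc.documentElement)
drop_blank_text(doc.documentElement)
after = text_of(doc.documentElement)

print(repr(before))  # 'ASCII NUL'
print(repr(after))   # 'ASCIINUL'
```

The only text node between the two elements is the single space, so removing blank nodes joins the two words.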

The problem is more general. Whitespace in DocBook is complicated. In some elements it is completely irrelevant and could be trimmed, but in other contexts it should be coalesced (as in HTML), and in others it should be fully preserved.

There is a hint in old libxml docs:

The 2.x and later version will switch to the XML standard way and ignorableWhitespace() are only generated when running the parser in validating mode and when the current element doesn't allow CDATA or mixed content.

So it may be possible to get the "correct" DocBook behaviour from libxml if it is called in validating mode.
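
The rule in the quoted passage, dropping whitespace only where the parent element does not allow mixed content, can be sketched as follows. This is an emulation in Python's `xml.dom.minidom`, not libxml2's validating mode. The `ELEMENT_ONLY` set is a hand-written assumption for illustration; a real implementation would take the content models from the DocBook schema.

```python
from xml.dom import minidom

# Assumption for illustration: elements with element-only content, where
# whitespace between children carries no meaning. Not derived from a schema.
ELEMENT_ONLY = {"itemizedlist", "listitem"}


def drop_ignorable_whitespace(node):
    """Remove whitespace-only text nodes, but only under element-only parents."""
    for child in list(node.childNodes):
        is_blank_text = child.nodeType == child.TEXT_NODE and not child.data.strip()
        if is_blank_text and node.nodeName in ELEMENT_ONLY:
            node.removeChild(child)
        else:
            drop_ignorable_whitespace(child)


# Hypothetical fragment: indentation between block elements, plus a
# meaningful space between two inline elements inside <para>.
source = (
    "<itemizedlist>\n"
    " <listitem>\n"
    "  <para><acronym>ASCII</acronym> <acronym>NUL</acronym></para>\n"
    " </listitem>\n"
    "</itemizedlist>"
)

doc = minidom.parseString(source)
drop_ignorable_whitespace(doc.documentElement)
result = doc.documentElement.toxml()
print(result)
```

The indentation inside `<itemizedlist>` and `<listitem>` is removed, while the space inside `<para>`, which has mixed content, is kept. Elements that require whitespace to be preserved verbatim would simply be left out of the set.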
