perf: FxHash node-cache + direct attr-value reads (output-identical, ~28-30% on node-heavy docs) by dginev · Pull Request #205 · KWARC/rust-libxml

dginev · 2026-07-01T00:41:20Z

Two dependency-free, output-identical performance improvements to the hot paths
exercised by node- and attribute-heavy XML processing (measured downstream in
latexml-oxide, a Rust port of LaTeXML).
Both commits are based on the latest `main`.

1. `perf(node-cache)`: FxHash-style hasher for the `xmlNodePtr -> Node` map

The internal node-bookkeeping map is probed on every `Node::wrap`/lookup and
previously used the std SipHash `RandomState`. The keys are allocator-chosen `xmlNodePtr`
addresses (not adversarial input), so HashDoS resistance is irrelevant and SipHash
is pure wasted work. Swapping in a dependency-free FxHash-style multiply-rotate
hasher on the pointer key cuts ~28-30% of wall-clock time on math/node-heavy documents
(latexml-oxide: 1510.03361 19.6s→14.1s, a tikz-cd paper 22.4s→15.7s, across both
conversion phases).
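
A minimal sketch of what an FxHash-style pointer hasher can look like using only `std::hash`. The names `FxLikeHasher`, `FxLikeBuildHasher` and `NodeCache` are hypothetical, the cache is keyed by the node address as a `usize` rather than by `xmlNodePtr` directly, and the constants (the 64-bit multiplier of the original rustc `FxHasher` plus a final rotate) are illustrative choices, not necessarily the ones this PR uses.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

// Hypothetical FxHash-style hasher: fold each word in with rotate, xor,
// multiply. Fast but NOT HashDoS-resistant; fine for allocator-chosen
// pointer keys, wrong for attacker-controlled input.
#[derive(Default, Clone, Copy)]
pub struct FxLikeHasher {
    hash: u64,
}

// 64-bit multiplier of the original rustc FxHasher (odd, so x -> x * K is a bijection).
const K: u64 = 0x517c_c1b7_2722_0a95;

impl FxLikeHasher {
    #[inline]
    fn add_word(&mut self, word: u64) {
        self.hash = (self.hash.rotate_left(5) ^ word).wrapping_mul(K);
    }
}

impl Hasher for FxLikeHasher {
    fn write(&mut self, bytes: &[u8]) {
        // Generic fallback: fold the input 8 bytes at a time (zero-padded tail).
        for chunk in bytes.chunks(8) {
            let mut buf = [0u8; 8];
            buf[..chunk.len()].copy_from_slice(chunk);
            self.add_word(u64::from_le_bytes(buf));
        }
    }

    // Fast path for pointer-sized integer keys such as node addresses.
    fn write_usize(&mut self, n: usize) {
        self.add_word(n as u64);
    }

    fn write_u64(&mut self, n: u64) {
        self.add_word(n);
    }

    fn finish(&self) -> u64 {
        // Move the well-mixed high bits of the product down into the low bits,
        // which HashMap uses for bucket selection; aligned pointers would
        // otherwise leave those low bits zero.
        self.hash.rotate_left(26)
    }
}

pub type FxLikeBuildHasher = BuildHasherDefault<FxLikeHasher>;

// Stand-in for the node cache: address of the xmlNode (as usize) -> wrapper.
pub type NodeCache<V> = HashMap<usize, V, FxLikeBuildHasher>;
```

Because the multiplier is odd, multiplying by it is a bijection on `u64`, and the final rotate is one too, so two distinct addresses can never share a full 64-bit hash (they can still share a bucket once the table masks the hash down). The rotate matters for pointer keys: nodes are typically 8- or 16-byte aligned, and the low bits of a product depend only on the low bits of its inputs, so without it those zero bits would flow straight into the bucket index that std's hashbrown-based `HashMap` currently derives from the low bits.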

Zero new dependencies (std::hash only).
The map is never iterated, so losing per-run randomization is a no-op.
Output-identical.
Includes a determinism + no-collision regression test.

2. `perf(attrs)`: `get_properties` reads values directly from the attr node

`get_properties` / `get_properties_ns` (and the `get_attributes` aliases) looped
over the attribute list but then re-resolved each value by name via
`get_property` → `xmlGetProp`, which re-scans the whole attribute list
(one `xmlStrEqual` per entry) and allocates a fresh `CString` for the name on every
attribute. That is quadratic in the attribute count, and pure overhead since the
loop already holds the attribute node pointer.

This reads the value straight from that node with `xmlNodeGetContent` (the same
call `get_content` already uses), via a small `attr_node_value` helper. The
returned map is unchanged; libxml's own attribute-accessor tests pass.
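
The cost difference can be seen in a toy model of the attribute list. This is a pure-Rust stand-in, not the crate's FFI code: `Attr`, `lookup_by_name`, `properties_by_name` and `properties_direct` are hypothetical, a slice replaces libxml2's linked `xmlAttr` list, and the model counts name comparisons but ignores the per-lookup `CString` allocation.

```rust
use std::collections::HashMap;

// Toy stand-in for one entry of libxml2's attribute list (hypothetical type).
pub struct Attr {
    pub name: String,
    pub value: String,
}

// Models the by-name lookup (xmlGetProp-style): scan from the head of the list,
// comparing names, and count every comparison made.
fn lookup_by_name<'a>(attrs: &'a [Attr], name: &str, cmps: &mut usize) -> Option<&'a str> {
    for a in attrs {
        *cmps += 1;
        if a.name == name {
            return Some(a.value.as_str());
        }
    }
    None
}

// Old shape: walk the list, then re-resolve each value by name (O(n^2) comparisons).
pub fn properties_by_name(attrs: &[Attr]) -> (HashMap<String, String>, usize) {
    let mut cmps = 0;
    let mut out = HashMap::new();
    for a in attrs {
        let value = lookup_by_name(attrs, &a.name, &mut cmps).unwrap_or_default();
        out.insert(a.name.clone(), value.to_string());
    }
    (out, cmps)
}

// New shape: read the value from the attribute already in hand (O(n)),
// the role the PR gives to attr_node_value / xmlNodeGetContent.
pub fn properties_direct(attrs: &[Attr]) -> HashMap<String, String> {
    attrs
        .iter()
        .map(|a| (a.name.clone(), a.value.clone()))
        .collect()
}
```

For n attributes with distinct names, the re-resolving version finds attribute i (counting from 0) after i + 1 comparisons, so it makes n(n + 1)/2 comparisons in total (5,050 for n = 100), while the direct version makes none and returns the same map.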

Validation

`cargo test --release`: all test binaries pass (0 failed), including the new
node-cache determinism test.

The internal node bookkeeping map (probed on every Node::wrap/lookup) used the std SipHash RandomState. Keys are allocator-chosen xmlNodePtr addresses (not adversarial), so HashDoS resistance is irrelevant and SipHash is wasted work. A dependency-free FxHash-style multiply-rotate hasher on the pointer key cuts ~28-30% of wall on math/node-heavy documents (measured downstream in latexml-oxide: 1510.03361 19.6s->14.1s, tikz-cd 22.4s->15.7s, both phases). Zero new dependencies (std::hash only); the map is never iterated so losing per-run randomization is a no-op; output-identical. Includes a determinism + no-collision regression test. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

get_properties / get_properties_ns (and the get_attributes aliases) looped over the attribute list but then re-resolved each value by name via get_property -> xmlGetProp, which re-scans the whole attribute list (xmlStrEqual per entry) and allocates a fresh CString for the name on every attribute. That is quadratic in the attribute count and pure overhead, since the loop already holds the attribute node pointer. Read the value straight from that node with xmlNodeGetContent (the same call get_content already uses), via a small attr_node_value helper. Returned map is unchanged; libxml's own attribute-accessor tests pass. Also documents the prior FxHash node-cache change under [Unreleased].

dginev and others added 2 commits June 30, 2026 20:40

dginev wants to merge 2 commits into `main` from `perf/node-cache-and-attrs`
