Remove the internal TOMLChar wrapper #492

Merged
frostming merged 1 commit into python-poetry:master from AstekGroup:perf/remove-tomlchar
Jun 11, 2026

@tfoutrein

Contributor

Stacked on #489, #490 and #491 — the capstone of that series; best reviewed/merged after them.
This also supersedes #488 (interning TOMLChar): per @dimbleby's suggestion on that PR, removing the wrapper entirely is the better end-state, so I'd close #488 in favour of this.

What

After the bulk run-scans (#490/#491), the parser only constructs a TOMLChar (a str subclass) at run boundaries and uses a handful of its is_*() helpers. This removes the class entirely:

  • Source yields plain str characters; inc() / advance_* read self[i] directly.
  • End-of-input is detected positionally (_idx >= len / Source.end()) instead of an identity sentinel.
  • The remaining character-class checks use module-level frozensets.
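The three points above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `MiniSource`, `advance_while`, `SPACES` and `BARE_KEY_CHARS` are hypothetical names, and only `_idx`, `inc()` and `end()` echo identifiers mentioned in the description.

```python
# Hypothetical character classes; the real module-level names may differ.
SPACES = frozenset(" \t")
BARE_KEY_CHARS = frozenset(
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_"
)


class MiniSource(str):
    """Toy stand-in for a str-based source with an integer cursor."""

    def __init__(self, _: str) -> None:
        super().__init__()
        self._idx = 0

    def end(self) -> bool:
        # Positional EOF: no sentinel character is ever materialised.
        return self._idx >= len(self)

    def inc(self) -> bool:
        # Advance one character; report False if the cursor was already at the end.
        if self.end():
            return False
        self._idx += 1
        return True

    def advance_while(self, chars: frozenset) -> str:
        # Bulk-scan a run of characters, indexing the string directly.
        start = self._idx
        while self._idx < len(self) and self[self._idx] in chars:
            self._idx += 1
        return self[start:self._idx]
```

Indexing or slicing a `str` subclass returns plain `str` objects, so no wrapper instance is created per character in this sketch.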

A real NUL byte is still rejected as an invalid control char and is never mistaken for end-of-input, since EOF is now positional rather than a value/identity comparison.
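To illustrate why a positional check cannot confuse a real NUL with end-of-input, here is a small sketch. The value-sentinel variant is a deliberately naive strawman written for contrast; it is not a claim about how master behaved, and both helper names are hypothetical.

```python
NUL = "\x00"


def at_end_by_value(text: str, idx: int) -> bool:
    # Strawman: past-the-end reads are reported as NUL, then compared by value.
    current = text[idx] if idx < len(text) else NUL
    return current == NUL


def at_end_by_position(text: str, idx: int) -> bool:
    # Positional check: only the cursor and the length are consulted.
    return idx >= len(text)


doc = "a = 1\x00 # trailing"
nul_at = doc.index(NUL)

# The value-based check reports EOF in the middle of the document.
print(at_end_by_value(doc, nul_at))
# The positional check does not, so the NUL can be rejected as a control char.
print(at_end_by_position(doc, nul_at))
```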

Benchmarks

Median, interleaved A/B vs master (includes #489–#491):

| Document | Speedup |
|---|---|
| large flat, single-line strings (~90 KB) | 5.8× |
| poetry.lock-like (~64 KB) | 2.4× |
| pyproject.toml | 1.9× |
| typical mixed (~4 KB) | 1.6× |

The removal itself contributes roughly 1.1–1.18× on top of #491, with no regression on any document shape.
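For readers unfamiliar with the method, one way to take a "median, interleaved A/B" measurement is sketched below. This is an assumption about the general technique, not the harness used for the numbers above; `interleaved_medians` and `speedup` are hypothetical helpers.

```python
import statistics
import time
from typing import Any, Callable, Tuple


def interleaved_medians(
    a: Callable[[Any], Any],
    b: Callable[[Any], Any],
    payload: Any,
    rounds: int = 15,
) -> Tuple[float, float]:
    """Time a and b alternately so drift (thermal, GC, noise) hits both alike."""
    times_a, times_b = [], []
    for _ in range(rounds):
        for fn, bucket in ((a, times_a), (b, times_b)):
            start = time.perf_counter()
            fn(payload)
            bucket.append(time.perf_counter() - start)
    return statistics.median(times_a), statistics.median(times_b)


def speedup(baseline, candidate, payload, rounds: int = 15) -> float:
    """Ratio of median baseline time to median candidate time."""
    base, cand = interleaved_medians(baseline, candidate, payload, rounds)
    return base / cand if cand > 0 else float("inf")
```

In practice `baseline` and `candidate` would be the parse entry points of the two builds under comparison, and `payload` one of the documents in the table.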

Tests

The full suite passes (972 tests, including the toml-test conformance submodule). In addition, an adversarial differential run over 11.5k inputs (EOF/truncation at every prefix length, a real NUL at every position, empty/whitespace/BOM inputs, and structural fuzz) matches master exactly in both output bytes and exception type. No public API change (TOMLChar was not exported).
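A differential check of this kind can be sketched as below. It is a simplified illustration under stated assumptions: `reference` and `candidate` stand for the parse functions of the two builds, and every helper name here is hypothetical rather than taken from the PR's test harness.

```python
from typing import Any, Callable, Iterable, Iterator, List, Tuple


def outcome(parse: Callable[[str], Any], text: str) -> Tuple[str, Any]:
    """Normalise a parse attempt to either its result or its exception type."""
    try:
        return ("ok", parse(text))
    except Exception as exc:
        return ("error", type(exc).__name__)


def prefixes(text: str) -> Iterator[str]:
    # Truncation at every prefix length, including the empty input.
    for n in range(len(text) + 1):
        yield text[:n]


def with_nul_everywhere(text: str) -> Iterator[str]:
    # A real NUL character inserted at every position.
    for n in range(len(text) + 1):
        yield text[:n] + "\x00" + text[n:]


def differential(
    reference: Callable[[str], Any],
    candidate: Callable[[str], Any],
    inputs: Iterable[str],
) -> List[str]:
    """Return the inputs on which the two parsers disagree."""
    return [t for t in inputs if outcome(reference, t) != outcome(candidate, t)]
```

An empty list from `differential` means both implementations produced the same result, or raised the same exception type, on every input.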

@tfoutrein

Contributor Author

Rebased onto master now that #491 has merged — single commit on top of current master, no conflicts (composes cleanly with the recent escape/bare-key fixes #493/#497/#501). CI is green. This removes TOMLChar entirely, so it supersedes #488 (interning) — I'll close #488 once this lands. Ready for squash-merge. Thanks!

Comment thread tomlkit/parser.py Outdated
After the bulk run-scans, the parser only built a `TOMLChar` (a `str`
subclass) at run boundaries and used a handful of its `is_*()` helpers.
Drop the class entirely: `Source` now yields plain `str` characters and
detects end-of-input positionally (`_idx >= len` / `Source.end()`) instead
of an identity sentinel, and the remaining character-class checks use
module-level frozensets.

A real NUL byte is still rejected as an invalid control char and is never
mistaken for end-of-input, since EOF is now positional rather than a
sentinel comparison.

No behaviour change (972 tests incl. the toml-test conformance submodule;
plus an 11.5k-input adversarial differential over EOF/truncation, real-NUL
placement, empty/whitespace and structural fuzz — output and error-type
byte-identical to master). Removes the per-character object construction
and method dispatch (~1.1-1.18x over the previous step).
@tfoutrein tfoutrein force-pushed the perf/remove-tomlchar branch from d92e0a0 to 01860b1 Compare June 11, 2026 08:54
@tfoutrein

Contributor Author

Done — dropped the duplicate raw-string constants and kept only frozensets, renamed without the _SET suffix (your preferred option). The two consume(...) calls that need a plain string now pass a literal. No behaviour change; the full test suite + conformance still pass.
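As background on the string-versus-frozenset distinction, the two behave differently for membership tests and are not interchangeable where an API requires a `str`. The snippet below shows only standard Python semantics; `WS_STR` and `WS` are hypothetical names, and this is not a statement of the reviewer's reasoning.

```python
WS_STR = " \t"            # raw-string form of a character class
WS = frozenset(WS_STR)    # frozenset form of the same class

# `in` on a str is a substring test, so the empty string and multi-character
# runs both "match". A frozenset only matches exact single elements.
str_hits = [c in WS_STR for c in ("", " ", " \t")]
set_hits = [c in WS for c in ("", " ", " \t")]

# APIs that expect a str reject a frozenset, which is why a call site that
# needs a plain string would pass a literal instead.
try:
    "  x".lstrip(WS)
    lstrip_accepts_frozenset = True
except TypeError:
    lstrip_accepts_frozenset = False
```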

Note on CI: all unit-test jobs (every OS × Python) + pre-commit + the poetry integration are green. The two poetry-core integration jobs failed at poetry install with a vendoring/lockfile resolution error ("This is likely not a Poetry issue…") — unrelated to this PR (it passed on master and on the Windows poetry-core job). I don't have permission to re-run them on this repo, but a re-run should clear it.

@frostming frostming merged commit eaa897d into python-poetry:master Jun 11, 2026
29 of 31 checks passed