fix: decode numeric references to surrogate code points as U+FFFD#102

Open
spokodev wants to merge 1 commit into
mdevils:main from
spokodev:fix/surrogate-numeric-references

@spokodev

decode('&#xD800;'), and any numeric character reference in the surrogate range U+D800..U+DFFF, returns a lone surrogate instead of the replacement character:

decode('&#xD800;') // => '\uD800' (lone surrogate); should be '\uFFFD'

The WHATWG numeric character reference end state says: "If the number is a surrogate, then ... set the character reference code to 0xFFFD." A lone surrogate is not well-formed UTF-16, so the current output breaks round-trips through UTF-8 and produces invalid data in JSON.stringify and Buffer.
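
To illustrate the UTF-8 round-trip problem, here is a minimal sketch using the standard `TextEncoder` and `TextDecoder` globals (assumed to be available, as in current Node.js and browsers). It does not touch this library; it only shows what happens to a lone surrogate once it is encoded.

```javascript
// A lone surrogate has no UTF-8 encoding, so the encoder substitutes U+FFFD.
const lone = '\uD800';

const bytes = new TextEncoder().encode(lone);
const roundTripped = new TextDecoder().decode(bytes);

// The string that comes back is not the string that went in.
const survivedRoundTrip = roundTripped === lone;
console.log(Array.from(bytes), survivedRoundTrip);
```

The encoder silently replaces the surrogate, so the data changes without any error being raised at the point of encoding.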

The numeric branch passed code points <= 0xFFFF straight to fromCharCode, which yields the lone surrogate. This adds the surrogate range to the existing out-of-bounds check so those references decode to U+FFFD, matching the spec (and the entities and he packages).
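
The shape of the check can be sketched as follows. This is an illustration only, not the code in this PR: `decodeCodePoint` and `decodeNumericReference` are hypothetical names, and the sketch leaves out the C1 Windows-1252 mappings and the other special cases that the library handles.

```javascript
// Hypothetical helper: map a parsed code point to its decoded string.
function decodeCodePoint(codePoint) {
  const isOutOfRange = codePoint > 0x10FFFF;
  const isSurrogate = codePoint >= 0xD800 && codePoint <= 0xDFFF;
  // Surrogates are treated the same way as out-of-range values.
  if (isOutOfRange || isSurrogate) {
    return '\uFFFD';
  }
  return String.fromCodePoint(codePoint);
}

// Hypothetical helper: decode one complete reference such as '&#xD800;' or '&#55296;'.
function decodeNumericReference(ref) {
  const match = /^&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));$/.exec(ref);
  if (!match) {
    return ref; // not a numeric reference, leave it untouched
  }
  const codePoint = match[1] ? parseInt(match[1], 16) : parseInt(match[2], 10);
  return decodeCodePoint(codePoint);
}

console.log(decodeNumericReference('&#xD800;') === '\uFFFD'); // true
console.log(decodeNumericReference('&#x41;')); // A
```

The neighbours of the surrogate range, U+D7FF and U+E000, still decode to themselves in this sketch, so only U+D800..U+DFFF is affected.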

Verified across the full surrogate range in both &#x..; and &#..; forms; non-surrogate references (C1 Windows-1252 mappings, astral code points, named references) are unaffected. Added a regression test.

`decode('&#xD800;')`, and any numeric character reference in the surrogate
range U+D800..U+DFFF, returned a lone surrogate instead of the replacement
character. The WHATWG numeric character reference end state requires a
surrogate code point to be set to 0xFFFD; emitting a lone surrogate produces
strings that are not well-formed UTF-16 and break round-trips through UTF-8
and JSON.stringify.

The numeric branch passed code points <= 0xFFFF straight to fromCharCode,
which yields the lone surrogate. Add the surrogate range to the existing
out-of-bounds check. Non-surrogate references (C1 Windows-1252 mappings,
astral code points, named references) are unaffected.