
Cover text decoding; consolidate text tests into text_test.go #16

Merged
pgundlach merged 1 commit into main from claude/text-decoding-coverage on Jun 23, 2026

fank (Collaborator) commented Jun 22, 2026

Raises text.go coverage and moves the misplaced text tests to their proper home.

Tests (independent oracles, not characterization)

  • TestDecodeTextStringUTF16RoundTrip — round-trips via utf16.Encode; the 😀 forces a surrogate pair, and both byte orders dispatch off their BOM.
  • TestDecodeUTF16OddLengthNoPanic — a dangling odd byte is dropped (BE and LE), no out-of-range read.
  • TestDecodeTextStringDispatch — UTF-8 BOM (PDF 2.0) and PDFDocEncoding default.
  • TestDecodePDFDocEncoding — table spot-checks (breve, bullet, euro, Latin-1 é) incl. undefined slots → U+FFFD.
  • TestParseDate — valid/leap/year-only/no-prefix/malformed/month-13/tz-offset, each checked against an independently constructed time.Time.
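For readers new to the UTF-16 path, here is a minimal Go sketch of the behaviour the UTF-16 and dispatch tests pin down, assuming the byte order is chosen purely from the BOM. `decodeUTF16`, `decodeText` and `encodeUTF16` are hypothetical names for illustration, not the functions in text.go, and the PDFDocEncoding fallback is deliberately left out. The oracle side builds its bytes with `utf16.Encode`, so it shares no code with the decoder.

```go
package main

import (
	"bytes"
	"fmt"
	"unicode/utf16"
)

// decodeUTF16 is a hypothetical stand-in for decodeUTF16BE/decodeUTF16LE.
// It pairs bytes into 16-bit code units, drops a dangling odd byte, and
// lets utf16.Decode combine surrogate pairs (e.g. U+1F600).
func decodeUTF16(b []byte, bigEndian bool) string {
	n := len(b) / 2 // integer division discards an odd trailing byte
	units := make([]uint16, n)
	for i := 0; i < n; i++ {
		hi, lo := b[2*i], b[2*i+1]
		if !bigEndian {
			hi, lo = lo, hi
		}
		units[i] = uint16(hi)<<8 | uint16(lo)
	}
	return string(utf16.Decode(units))
}

// decodeText is a hypothetical dispatcher that picks a decoder from the BOM.
// The boolean reports whether a BOM was recognised; input without a BOM
// would go to PDFDocEncoding, which this sketch does not implement.
func decodeText(b []byte) (string, bool) {
	switch {
	case bytes.HasPrefix(b, []byte{0xFE, 0xFF}):
		return decodeUTF16(b[2:], true), true
	case bytes.HasPrefix(b, []byte{0xFF, 0xFE}):
		return decodeUTF16(b[2:], false), true
	case bytes.HasPrefix(b, []byte{0xEF, 0xBB, 0xBF}): // UTF-8 BOM (PDF 2.0)
		return string(b[3:]), true
	}
	return "", false
}

// encodeUTF16 is the independent oracle: utf16.Encode produces the code
// units, which are serialised after a BOM in the requested byte order.
func encodeUTF16(s string, bigEndian bool) []byte {
	out := []byte{0xFF, 0xFE}
	if bigEndian {
		out = []byte{0xFE, 0xFF}
	}
	for _, u := range utf16.Encode([]rune(s)) {
		if bigEndian {
			out = append(out, byte(u>>8), byte(u))
		} else {
			out = append(out, byte(u), byte(u>>8))
		}
	}
	return out
}

func main() {
	const want = "Grüße 😀" // the emoji needs a surrogate pair
	for _, be := range []bool{true, false} {
		got, _ := decodeText(encodeUTF16(want, be))
		fmt.Println("big-endian:", be, "round-trip ok:", got == want)
	}
	// BOM + "A" + one stray byte: the stray byte is ignored.
	got, _ := decodeText([]byte{0xFE, 0xFF, 0x00, 0x41, 0x00})
	fmt.Printf("%q\n", got)
}
```

Integer division on `len(b)` is what drops the odd byte without an out-of-range read; `utf16.Decode` joins the surrogate pair and maps any unpaired surrogate to U+FFFD.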

Cleanup

  • The three text tests that lived in reader_test.go (TestTextDecodingUTF16BE, TestTextDecodingPDFDocEncoding, TestParseDate) are superseded and moved here, matching the text.go → text_test.go convention.
  • parseDate's /D: strip simplified to strings.TrimPrefix (staticcheck S1017).
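The S1017 rewrite should be behaviour-preserving. A small Go sketch comparing the flagged HasPrefix-and-slice pattern with `strings.TrimPrefix`; the helper names are hypothetical, and the sketch strips a literal `D:` (the marker PDF date strings start with).

```go
package main

import (
	"fmt"
	"strings"
)

// stripManual is the shape staticcheck S1017 flags: a HasPrefix check
// followed by slicing the prefix off by hand.
func stripManual(s string) string {
	if strings.HasPrefix(s, "D:") {
		s = s[len("D:"):]
	}
	return s
}

// stripTrim is the equivalent one-liner. TrimPrefix returns s unchanged
// when the prefix is absent, so input without a prefix needs no special case.
func stripTrim(s string) string {
	return strings.TrimPrefix(s, "D:")
}

func main() {
	for _, s := range []string{"D:20240229", "20240229", "D:", ""} {
		fmt.Printf("%q -> %q (same as manual: %v)\n", s, stripTrim(s), stripManual(s) == stripTrim(s))
	}
}
```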

parseDate's leniency (e.g. hour=99 rolls over via time.Date) is left as-is: this is best-effort date parsing, and the rollover does not crash. The tests cover valid input and check that clearly malformed dates are rejected, rather than enshrining the rollover.

Coverage: decodeTextString/decodeUTF16BE/decodePDFDocEncoding → 100%; decodeUTF16LE 0% → 100%; parseDate → 98%.

Add UTF-16 LE/BE round-trips through decodeTextString (a surrogate-pair
emoji exercises the hard path), odd-length no-panic checks for both byte
orders, PDFDocEncoding table spot-checks including undefined slots, and
parseDate cases verified against independently built time.Time values.
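For reference, a partial Go sketch of the PDFDocEncoding lookup those spot-checks exercise. The code points are taken from the PDFDocEncoding table in ISO 32000-1 Annex D (0x18 to 0x1F diacritics, 0x80 to 0xA0 typographic characters and the euro, with 0x7F, 0x9F and 0xAD undefined). Passing every other byte, including the low control range, through as its Latin-1 code point is a simplification of this sketch, and `decodePDFDoc` is a hypothetical name rather than text.go's decodePDFDocEncoding.

```go
package main

import "fmt"

// high maps bytes 0x80..0xA0 to PDFDocEncoding code points.
// 0x9F is undefined and is represented by U+FFFD.
var high = [...]rune{
	0x2022, 0x2020, 0x2021, 0x2026, 0x2014, 0x2013, 0x0192, 0x2044, // 0x80-0x87
	0x2039, 0x203A, 0x2212, 0x2030, 0x201E, 0x201C, 0x201D, 0x2018, // 0x88-0x8F
	0x2019, 0x201A, 0x2122, 0xFB01, 0xFB02, 0x0141, 0x0152, 0x0160, // 0x90-0x97
	0x0178, 0x017D, 0x0131, 0x0142, 0x0153, 0x0161, 0x017E, 0xFFFD, // 0x98-0x9F
	0x20AC, // 0xA0: euro
}

// diacritics maps 0x18..0x1F: breve, caron, circumflex, dot accent,
// hungarumlaut, ogonek, ring, tilde.
var diacritics = [...]rune{0x02D8, 0x02C7, 0x02C6, 0x02D9, 0x02DD, 0x02DB, 0x02DA, 0x02DC}

// decodePDFDoc decodes PDFDocEncoded bytes, mapping undefined slots to U+FFFD.
func decodePDFDoc(b []byte) string {
	out := make([]rune, 0, len(b))
	for _, c := range b {
		switch {
		case c >= 0x18 && c <= 0x1F:
			out = append(out, diacritics[c-0x18])
		case c >= 0x80 && c <= 0xA0:
			out = append(out, high[c-0x80])
		case c == 0x7F || c == 0xAD:
			out = append(out, '\uFFFD') // undefined in PDFDocEncoding
		default:
			out = append(out, rune(c)) // Latin-1 identity (simplified)
		}
	}
	return string(out)
}

func main() {
	fmt.Printf("%q\n", decodePDFDoc([]byte{0x18, 0x80, 0xA0, 0xE9})) // breve, bullet, euro, é
	fmt.Printf("%q\n", decodePDFDoc([]byte{0x7F, 0x9F, 0xAD}))       // undefined slots
}
```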

Move the three text tests that lived in reader_test.go (UTF-16BE,
PDFDocEncoding, parseDate) here, superseding them. Simplify the /D: prefix
strip to strings.TrimPrefix.

Coverage in text.go: decodeTextString/decodeUTF16BE/decodePDFDocEncoding
to 100%, decodeUTF16LE 0% -> 100%, parseDate to 98%.

fank marked this pull request as ready for review June 22, 2026 22:06
fank requested a review from pgundlach June 22, 2026 22:06
pgundlach merged commit 87f8b9a into main Jun 23, 2026
1 check passed
pgundlach deleted the claude/text-decoding-coverage branch June 23, 2026 05:19