Cover text decoding; consolidate text tests into text_test.go by fank · Pull Request #16 · speedata/pdfdisassembler

fank · 2026-06-22T21:03:44Z

Raises text.go coverage and moves the misplaced text tests to their proper home.

Tests (independent oracles, not characterization)

TestDecodeTextStringUTF16RoundTrip — round-trips via utf16.Encode; the 😀 forces a surrogate pair, and both byte orders dispatch off their BOM.
TestDecodeUTF16OddLengthNoPanic — a dangling odd byte is dropped (BE and LE), no out-of-range read.
TestDecodeTextStringDispatch — UTF-8 BOM (PDF 2.0) and PDFDocEncoding default.
TestDecodePDFDocEncoding — table spot-checks (breve, bullet, euro, Latin-1 é) incl. undefined slots → U+FFFD.
TestParseDate — valid/leap/year-only/no-prefix/malformed/month-13/tz-offset, each checked against an independently constructed time.Time.
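The round-trip and odd-length checks above can be sketched roughly as follows. This is a minimal illustration, not the code in text.go: `encodeUTF16` and `decodeUTF16` are hypothetical names, and the decoder is a stand-in that only shows BOM dispatch and the dropped trailing byte. The oracle side uses `utf16.Encode` from the standard library, so the expected bytes are built independently of the decoder under test.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"unicode/utf16"
)

// encodeUTF16 builds the oracle input: a BOM followed by the UTF-16 code
// units of s in the requested byte order. Hypothetical test helper.
func encodeUTF16(s string, bigEndian bool) []byte {
	var order binary.ByteOrder = binary.LittleEndian
	if bigEndian {
		order = binary.BigEndian
	}
	units := append([]uint16{0xFEFF}, utf16.Encode([]rune(s))...)
	out := make([]byte, 2*len(units))
	for i, u := range units {
		order.PutUint16(out[2*i:], u)
	}
	return out
}

// decodeUTF16 is an illustrative stand-in for a BOM-dispatching decoder.
// It reports false when b does not start with a UTF-16 BOM.
func decodeUTF16(b []byte) (string, bool) {
	var order binary.ByteOrder
	switch {
	case len(b) >= 2 && b[0] == 0xFE && b[1] == 0xFF:
		order = binary.BigEndian
	case len(b) >= 2 && b[0] == 0xFF && b[1] == 0xFE:
		order = binary.LittleEndian
	default:
		return "", false
	}
	b = b[2:]
	units := make([]uint16, 0, len(b)/2)
	for i := 0; i+1 < len(b); i += 2 { // a dangling odd byte is dropped
		units = append(units, order.Uint16(b[i:]))
	}
	return string(utf16.Decode(units)), true
}

func main() {
	in := "a😀z" // the emoji encodes as a surrogate pair
	for _, bigEndian := range []bool{true, false} {
		got, ok := decodeUTF16(encodeUTF16(in, bigEndian))
		fmt.Println(bigEndian, ok, got == in)
	}
}
```

Appending a single stray byte to either encoding should leave the decoded string unchanged, which is the property the odd-length test pins down.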

Cleanup

The three text tests that lived in reader_test.go (TestTextDecodingUTF16BE, TestTextDecodingPDFDocEncoding, TestParseDate) are superseded by the tests above and moved to text_test.go, matching the text.go → text_test.go convention.
parseDate's /D: strip simplified to strings.TrimPrefix (staticcheck S1017).
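For readers unfamiliar with S1017, the shape of that simplification is roughly the following. This is a sketch only: `stripDatePrefix` is a hypothetical helper, and the literal `"D:"` is an assumption based on the prefix PDF date strings carry, not a copy of the line in parseDate.

```go
package main

import (
	"fmt"
	"strings"
)

// stripDatePrefix is a hypothetical helper showing the S1017 rewrite.
//
// Before (the pattern staticcheck flags):
//
//	if strings.HasPrefix(s, "D:") {
//		s = s[len("D:"):]
//	}
//
// After: TrimPrefix returns s unchanged when the prefix is absent, so the
// guard is unnecessary.
func stripDatePrefix(s string) string {
	return strings.TrimPrefix(s, "D:") // "D:" is an assumed literal
}

func main() {
	fmt.Println(stripDatePrefix("D:20240229120000Z")) // 20240229120000Z
	fmt.Println(stripDatePrefix("20240229120000Z"))   // 20240229120000Z
}
```

Both forms behave the same for inputs with and without the prefix, which is why the no-prefix case in TestParseDate keeps passing.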

parseDate's leniency (e.g. hour=99 rolls over via time.Date) is left as-is — best-effort date parsing, not a crash; the tests cover valid input and reject clearly-malformed dates rather than enshrining the rollover.
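The rollover mentioned here is standard `time.Date` behaviour: out-of-range fields are normalized rather than rejected. A small sketch, using only the standard library, shows what that looks like and how an independently built `time.Time` serves as the oracle. `buildUTC` and `rollsOver` are hypothetical helpers for illustration; `rollsOver` is one way a stricter check could be written and is not something this PR adds.

```go
package main

import (
	"fmt"
	"time"
)

// buildUTC constructs the expected value directly with time.Date, the way an
// independent oracle would. Out-of-range fields are normalized, not rejected.
func buildUTC(year, month, day, hour int) time.Time {
	return time.Date(year, time.Month(month), day, hour, 0, 0, 0, time.UTC)
}

// rollsOver reports whether time.Date had to normalize any field.
// Hypothetical strictness check, not part of parseDate.
func rollsOver(year, month, day, hour int) bool {
	t := buildUTC(year, month, day, hour)
	return t.Year() != year || int(t.Month()) != month ||
		t.Day() != day || t.Hour() != hour
}

func main() {
	// hour=99 is 4 days and 3 hours past midnight on January 1.
	fmt.Println(buildUTC(2024, 1, 1, 99).Format(time.RFC3339)) // 2024-01-05T03:00:00Z

	fmt.Println(rollsOver(2024, 2, 29, 0)) // false: 2024 is a leap year
	fmt.Println(rollsOver(2023, 2, 29, 0)) // true: normalized to March 1
	fmt.Println(rollsOver(2024, 13, 1, 0)) // true: normalized to January 2025
}
```

Comparing the parsed result against a value built this way with `Equal` keeps the test independent of the parser's own arithmetic.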

Coverage: decodeTextString/decodeUTF16BE/decodePDFDocEncoding → 100%; decodeUTF16LE 0% → 100%; parseDate → 98%.

Add UTF-16 LE/BE round-trips through decodeTextString (a surrogate-pair emoji exercises the hard path), odd-length no-panic checks for both byte orders, PDFDocEncoding table spot-checks including undefined slots, and parseDate cases verified against independently built time.Time values. Move the three text tests that lived in reader_test.go (UTF-16BE, PDFDocEncoding, parseDate) into text_test.go, superseding them. Simplify the /D: prefix strip to strings.TrimPrefix. Coverage in text.go: decodeTextString, decodeUTF16BE, and decodePDFDocEncoding reach 100%; decodeUTF16LE goes from 0% to 100%; parseDate reaches 98%.

fank marked this pull request as ready for review June 22, 2026 22:06

fank requested a review from pgundlach June 22, 2026 22:06

pgundlach merged commit 87f8b9a into main Jun 23, 2026
1 check passed

pgundlach deleted the claude/text-decoding-coverage branch June 23, 2026 05:19

pgundlach merged 1 commit into main from claude/text-decoding-coverage
