Honour LZWDecode /EarlyChange 0; expand filter test coverage#14

Merged
pgundlach merged 1 commit into main from claude/lzw-earlychange-and-filter-coverage
Jun 23, 2026

@fank fank commented Jun 22, 2026

Collaborator

Bug

decodeLZW interpreted the early-change flag as:

early := p.EarlyChange
if early == 0 { early = 1 }   // <- explicit /EarlyChange 0 becomes 1

/EarlyChange defaults to 1 but may be set to 0. Because the field's zero value (0) doubled as "unset", a stream that explicitly declared /EarlyChange 0 was silently decoded with early change on, producing garbage past the first 9→10-bit code-width boundary (~511 dict entries).

Fix

Replace Params.EarlyChange int with Params.NoEarlyChange bool:

  • zero value → early change on (the PDF default), correct for the common case;
  • explicit /EarlyChange 0 → NoEarlyChange = true.

As a bonus, this clamps the effective flag to {0,1}, so a hostile /EarlyChange value (e.g. a huge int) can no longer drive the width threshold, (1<<width) - early, negative.
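A minimal sketch of the mapping described above. The `Params` type here is a stand-in with only the one field, and `noEarlyChange`, `early` and `widthThreshold` are hypothetical helper names chosen for illustration; treating every non-zero value as "early change on" is also an assumption, not a statement about how paramsFromDict behaves.

```go
package main

import "fmt"

// Params is a stand-in for the filter parameters; only the field
// discussed in this PR is modelled.
type Params struct {
	NoEarlyChange bool // zero value (false) keeps the PDF default: early change on
}

// noEarlyChange maps an optional /EarlyChange entry to the bool field.
// present is false when the key is absent. Assumption: any value other
// than an explicit 0 leaves early change on.
func noEarlyChange(value int, present bool) bool {
	return present && value == 0
}

// early returns the effective flag, which can only ever be 0 or 1.
func (p Params) early() int {
	if p.NoEarlyChange {
		return 0
	}
	return 1
}

// widthThreshold is the table size at which the code width grows.
func widthThreshold(width uint, p Params) int {
	return (1 << width) - p.early()
}

func main() {
	unset := Params{}
	explicitZero := Params{NoEarlyChange: noEarlyChange(0, true)}
	hostile := Params{NoEarlyChange: noEarlyChange(1<<30, true)}

	fmt.Println(widthThreshold(9, unset))        // 511
	fmt.Println(widthThreshold(9, explicitZero)) // 512
	fmt.Println(widthThreshold(9, hostile))      // 511
}
```

Because the bool can only select between 0 and 1, the subtraction never sees the raw dictionary value.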

How the oracle works

Go's compress/lzw (MSB) turned out to use the non-early convention, so its output round-trips through this decoder only with NoEarlyChange: true. That makes it a sound independent oracle for the early=0 path and the shared width-growth / clear / KwKwK machinery. The red→green transition was verified by temporarily reverting the fix (the /EarlyChange 0 stream then fails with invalid code … at width 9).

Tests

  • TestLZWRoundTripStdlib — stdlib-encoded streams (incl. a 64 KiB varied buffer) exercise dict reuse, KwKwK, 9→12-bit growth and the dictionary-full reset.
  • TestLZWEarlyChangeHonored / TestLZWStreamEarlyChangeZero — the flag is honoured, the latter at the Open()/Content() surface (also covers paramsFromDict + streamFilterChain).
  • TestLZWTruncatedNoPanic — a stream cut mid-code must not panic (readBits zero-padding).
  • TestASCII85RoundTripStdlib / TestASCII85EdgeCases — partial groups, z, whitespace, <~, invalid byte.
  • TestStreamFilterChainArray — a chained [/ASCII85Decode /FlateDecode] filter.

Also removes the dead errors import (var _ = errors.New) from lzw.go.

Coverage: internal/filter 80.7% → 93.2%; decodeLZW, readBits, decodeASCII85 → 100%; paramsFromDict 0% → 67%; streamFilterChain 42% → 67%.

decodeLZW defaulted EarlyChange to 1 whenever the value was 0, conflating
"unset" (PDF default 1) with an explicit /EarlyChange 0, so streams that
set it to 0 were decoded with the wrong code-width timing. Replace the
EarlyChange int field with NoEarlyChange bool: the zero value keeps the
default early change and an explicit 0 is now honoured. This also clamps
the flag to {0,1}, so a hostile /EarlyChange can no longer distort the
width threshold.

Adds filter tests: stdlib-LZW round-trips exercising dictionary reuse,
KwKwK, 9->12-bit growth and the dictionary-full reset; an EarlyChange
regression at the Open()/Content() surface; a truncated-stream no-panic
check; ASCII85 round-trips and edge cases (z, whitespace, invalid byte,
partial groups); and a chained /Filter array. Removes the dead errors
import from lzw.go.

internal/filter 80.7% -> 93.2%; decodeLZW/readBits/decodeASCII85 to 100%.
@fank fank force-pushed the claude/lzw-earlychange-and-filter-coverage branch from d04da68 to a6d484f Compare June 22, 2026 21:21
@fank fank marked this pull request as ready for review June 22, 2026 22:06
@fank fank requested a review from pgundlach June 22, 2026 22:06
@pgundlach pgundlach merged commit 893cbe3 into main Jun 23, 2026
1 check passed
@pgundlach pgundlach deleted the claude/lzw-earlychange-and-filter-coverage branch June 23, 2026 05:24