perf: pre-size reads and skip redundant copy to cut sync memory by reflog · Pull Request #9 · getlantern/keepcurrent

reflog · 2026-06-16T08:20:00Z

Problem

Several 1 GB proxy/VPS hosts were tripping the SigNoz "Host memory usage high" alert (10 m avg system.memory.usage > 85%). Heap pprof from both http-proxy and lantern-box binaries pointed at the same culprit — the dominant live allocation was io.ReadAll, and every byte of it came from keepcurrent:

io.ReadAll
io/ioutil.ReadAll (inline)
github.com/getlantern/keepcurrent.(*byteChannel).UpdateFrom
github.com/getlantern/keepcurrent.(*Runner).syncOnce
github.com/getlantern/keepcurrent.(*Runner).Start.func2

http-proxy: 94.9 MB / 146 MB heap (65%)
lantern-box: 75 MB / 173 MB heap (44%)

The traces showed multiple large retained generations (75 + 60 + 48 + 38 + 30 + 24 MB …).

The source is getlantern/geo, which syncs the MaxMind GeoLite2 mmdb (~75 MB) via keepcurrent.FromTarGz(keepcurrent.FromWeb(dbURL), name) into two sinks (ToChannel(chDB) + ToFile(path)). Per sync that payload was buffered with io.ReadAll up to three times:

extracting the file from the tarball (tarGzSource.Fetch),
in syncOnce (data, _ = ioutil.ReadAll(rc)),
again in byteChannel.UpdateFrom, re-reading the bytes.Reader the Runner already handed it into a brand-new slice.

io.ReadAll grows its buffer by repeated reallocation, so each read churns through a sequence of ever-larger backing arrays (the 75/60/48/38… generations) before settling — and step 3 is a wholly redundant full copy. On a 1 GB box this transient churn dominates RSS.

Fix

No public API change.

readAll() (new read.go) — reads into a buffer pre-sized from the reader's known length, turning a multi-realloc read into a single allocation. Recognizes:
- sizedReadCloser (HTTP Content-Length, archive entry size — constructed internally),
- standard in-memory readers via Len() (*bytes.Reader, *bytes.Buffer, *strings.Reader).
- Falls back to io.ReadAll when size is unknown.
Size-aware sources — webSource.Fetch surfaces Content-Length; tarGzSource.Fetch pre-sizes extraction from the archive entry's uncompressed size and returns a size-aware reader, so syncOnce's read is pre-sized too.
bytesConsumer fast-path — byteChannel receives the payload the Runner already buffered directly (updateFromBytes) instead of reading it into a second copy. The slice is read-only and shared across sinks (documented on the interface). This removes copy #3 entirely.

Net effect for the geo case: three doubling-realloc reads of a 75 MB payload → one extraction read + one pre-sized syncOnce read, with the channel sink reusing that buffer.

Test

go build ./..., go vet ./... clean.
Existing suite passes; added read_test.go covering pre-sizing for sized readers, fallback for unsized readers, and knownSize.

Rollout

Downstream bumps to follow once merged: getlantern/lantern-box and getlantern/http-proxy-lantern (both pin keepcurrent indirectly).

Syncing a large source (e.g. the ~75MB MaxMind GeoLite2 mmdb that getlantern/geo pulls through FromTarGz(FromWeb(...)) into a ToChannel + ToFile pair) was buffering the whole payload with io.ReadAll up to three times per sync — once extracting the tarball, once in syncOnce, and once more in the byteChannel sink. io.ReadAll grows its buffer by repeated reallocation, so each of those reads churns through a sequence of ever-larger backing arrays before settling, and the byte-channel read was an entirely redundant second full copy of data the Runner already held. On 1GB proxy/VPS hosts this transient churn dominated RSS and tripped the "host memory >85%" alert. Two changes: - readAll(): reads into a buffer pre-sized from the reader's known length (HTTP Content-Length, archive entry size, or an in-memory reader's Len()), turning a multi-realloc read into a single allocation. webSource and tarGzSource now return size-aware readers so syncOnce benefits too. - bytesConsumer fast-path: byteChannel takes the payload the Runner already buffered directly instead of reading it into a second copy. The slice is read-only and shared across sinks (documented on bytesConsumer). No public API change. Adds read.go + tests.

Copilot

Pull request overview

This PR reduces sync-time memory churn by pre-sizing large reads when a reader’s remaining size is known and by avoiding a redundant full-buffer copy when sending the already-buffered payload to channel sinks.

Changes:

Add readAll()/knownSize() helpers to pre-size buffers and introduce an internal bytesConsumer fast-path for sinks.
Surface size information from webSource (HTTP Content-Length) and tarGzSource (archive entry size) so Runner reads can be pre-sized.
Update byteChannel to consume the already-buffered payload directly and add tests around the new read helper behavior.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
source.go	Wrap HTTP bodies / extracted archive entries in size-aware readers to enable pre-sized reads.
sink.go	Route `byteChannel` through `readAll()` and add a no-copy `bytesConsumer` path.
read.go	Introduce `readAll()`, `knownSize()`, and internal `bytesConsumer` optimization.
read_test.go	Add tests for `readAll()` pre-sizing and `knownSize()` behavior.
keepcurrent.go	Switch Runner to `readAll()` and use `bytesConsumer` when available to skip redundant copies.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

…/comment fixes - readAll: cap pre-allocation at 256 MiB and convert to int, so a bogus size can't panic make() (falls back to io.ReadAll). The original n+bytes.MinRead did compile — bytes.MinRead is an untyped constant — but the guard hardens against hostile/garbage sizes. - tarGzSource: reuse readAll(sizedReadCloser{...}) instead of a bespoke bytes.Buffer, inheriting the same pre-size + overflow handling. - bytesConsumer doc: state plainly that the slice is read-only (Runner may share it across sinks); consumers must copy to retain/mutate. - read_test: fix stale countingReader comment on unsizedReader.

Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

…rnal, test cap fallback - webSource.Fetch now closes resp.Body on the 304 and non-200 early returns so connections are released for keep-alive reuse. - bytesConsumer doc: state it's an internal optimization for keepcurrent's own sinks; the unexported method means external sinks can't implement it and always use Sink.UpdateFrom. - Add TestReadAllFallsBackWhenSizeExceedsCap covering a reader that reports a size > maxPreAlloc while delivering a small payload.

Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

- Remove the bytesConsumer fast-path entirely. It forwarded the Runner's shared buffer to an external channel, dropping ToChannel's independent- ownership guarantee (a receiver mutating or concurrently reading the slice could corrupt other sinks). byteChannel.UpdateFrom now copies again — but via readAll, so it's a single pre-sized allocation. The memory spike came from io.ReadAll's doubling-realloc churn, which pre-sizing fixes; the copy-elimination was marginal and not worth the semantic change. - webSource.Fetch: drain bodies to EOF before closing on the 304 / non-200 paths (via drainClose) so net/http can actually reuse the connection. Comment now matches behavior.

Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.

Comments suppressed due to low confidence (1)

source.go:55

The 304 / non-200 early-return paths only call resp.Body.Close(), but the comment claims “Drain+close so the connection can be reused for keep-alive”. In Go’s net/http transport, connections are generally only eligible for reuse when the response body is read to EOF and closed. Either drain the body before closing (to match the comment and enable keep-alive reuse) or adjust the comment to avoid implying reuse.

	}
	if s.getETag() != "" {
		req.Header.Add("If-None-Match", s.etag)
	}
	resp, err := s.client.Do(req)
	if err != nil {

Pulls in getlantern/keepcurrent#9, which pre-sizes the reads used to sync the MaxMind GeoLite2 mmdb (~75MB) through geo's FromTarGz(FromWeb(...)) pipeline. The old io.ReadAll grew its buffer by repeated reallocation, churning through a chain of ever-larger backing arrays per sync; on 1GB hosts that transient spike dominated RSS and tripped the host-memory>85% alert. http-proxy's keepcurrent was pinned to a 2022 commit, so the bump also carries the transitive archiver/v3 -> mholt/archives migration and the minor golang.org/x/* upgrades it brings in. keepcurrent stays indirect; no first-party code changes. Co-authored-by: Ilya Yakelzon <reflog@getlantern.org>

Pulls in getlantern/keepcurrent#9, which pre-sizes the reads used to sync the MaxMind GeoLite2 mmdb (~75MB) through geo's FromTarGz(FromWeb(...)) pipeline. The old io.ReadAll grew its buffer by repeated reallocation, churning through a chain of ever-larger backing arrays per sync; on 1GB VPS that transient spike dominated RSS and tripped the host-memory>85% alert. No code changes — keepcurrent stays an indirect dep. Co-authored-by: Ilya Yakelzon <reflog@getlantern.org>

…#10) * perf: pre-size fileSource reads too (close the InitFrom churn) Follow-up to #9. That PR made webSource and tarGzSource size-aware, but fileSource was left returning a bare *os.File, so readAll's knownSize() returned false for file reads and fell back to io.ReadAll's doubling growth. geo loads its cached ~75MB MaxMind mmdb from disk via Runner.InitFrom(FromFile(path)) at startup, so that read churned through the 75->60->48->38->30->24MB realloc staircase on every boot (confirmed via alloc_space pprof on a lantern-box v0.0.91 box: io.ReadAll 373MB under keepcurrent.readAll <- syncOnce <- InitFrom <- geo.FromWeb). Make fileSource.Fetch return a sizedReadCloser carrying the stat size when there's no preprocessor, so the read is a single pre-sized allocation. The preprocessor path keeps the io.ReadAll fallback (a preprocessor can change the byte count). Also close the file on the early-return/error paths, which previously leaked the fd. go build/vet/test green; adds TestFileSourceIsSizeAware. * review: fix inverted fileSource modified-since check, close-couple preprocessor stream, hermetic web test - fileSource.Fetch returned ErrUnmodified exactly when the file WAS newer than the cutoff (and fetched when unchanged). Invert to match the Source.Fetch contract and byteSource.Fetch. - couple a preprocessor's returned stream's Close to the underlying *os.File so io.NopCloser-style preprocessors don't leak the descriptor. - add TestFileSourceUnmodified covering the ErrUnmodified path. - make TestUpdateFromWeb hermetic via httptest; the live httpbin.org dependency intermittently hung and timed out CI. --------- Co-authored-by: Ilya Yakelzon <reflog@getlantern.org>

reflog requested a review from Copilot June 16, 2026 08:28

Copilot started reviewing on behalf of reflog June 16, 2026 08:29 View session

Copilot AI reviewed Jun 16, 2026

View reviewed changes

Comment thread read.go

Comment thread source.go Outdated

Comment thread read.go Outdated

Comment thread read_test.go Outdated

reflog requested a review from Copilot June 16, 2026 08:47

Copilot started reviewing on behalf of reflog June 16, 2026 08:48 View session

Copilot AI reviewed Jun 16, 2026

View reviewed changes

Comment thread source.go

Comment thread read.go Outdated

Comment thread read_test.go

reflog requested a review from Copilot June 16, 2026 09:02

Copilot started reviewing on behalf of reflog June 16, 2026 09:02 View session

Copilot AI reviewed Jun 16, 2026

View reviewed changes

Comment thread sink.go Outdated

Comment thread source.go Outdated

Comment thread source.go

Comment thread read.go Outdated

reflog requested a review from Copilot June 16, 2026 09:11

Copilot started reviewing on behalf of reflog June 16, 2026 09:11 View session

Copilot AI reviewed Jun 16, 2026

View reviewed changes

reflog merged commit 0364abd into main Jun 16, 2026
1 check passed

This was referenced Jun 16, 2026

build: bump keepcurrent for sync memory fix getlantern/lantern-box#275

Merged

build: bump keepcurrent for sync memory fix getlantern/http-proxy#671

Merged

reflog mentioned this pull request Jun 16, 2026

perf: pre-size fileSource reads (close the InitFrom io.ReadAll churn) #10

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: pre-size reads and skip redundant copy to cut sync memory#9

perf: pre-size reads and skip redundant copy to cut sync memory#9
reflog merged 4 commits into
mainfrom
reflog/presize-readall-reduce-memory

reflog commented Jun 16, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

reflog commented Jun 16, 2026

Problem

Fix

Test

Rollout

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants