Skip to content

perf: pre-size fileSource reads (close the InitFrom io.ReadAll churn)#10

Merged
reflog merged 2 commits into
mainfrom
reflog/presize-file-source
Jun 16, 2026
Merged

perf: pre-size fileSource reads (close the InitFrom io.ReadAll churn)#10
reflog merged 2 commits into
mainfrom
reflog/presize-file-source

Conversation

@reflog

@reflog reflog commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Follow-up to #9.

Gap

#9 made webSource (HTTP Content-Length) and tarGzSource (archive entry size) size-aware, so readAll pre-sizes those reads. But fileSource.Fetch still returned a bare *os.File, which exposes no size — so readAll's knownSize() returns false for file reads and falls back to io.ReadAll's doubling growth.

getlantern/geo loads its cached ~75MB MaxMind mmdb from disk via Runner.InitFrom(FromFile(path)) at startup, so that read still churns through the 75→60→48→38→30→24MB realloc staircase on every boot.

Confirmed via alloc_space pprof on a production lantern-box v0.0.91 box (already carrying #9):

io.ReadAll  373.82MB  (under)
  keepcurrent.readAll
  keepcurrent.(*Runner).syncOnce
  keepcurrent.(*Runner).InitFrom
  geo.FromWeb

Live/inuse heap was already clean (the web-sync path from #9 pre-sizes correctly); this is the remaining transient spike, and it lands at startup on the smallest (1GB) boxes.

Fix

fileSource.Fetch now returns sizedReadCloser{f, fi.Size()} when there's no preprocessor, so the read is a single pre-sized allocation. The preprocessor path keeps the io.ReadAll fallback (a preprocessor can change the byte count, so the file size isn't the payload size). Also closes the file on the early-return (ErrUnmodified) and error paths, which previously leaked the fd.

go build/vet/test green; adds TestFileSourceIsSizeAware.

Rollout

Downstream bumps follow (http-proxy-lantern, lantern-box) once this merges.

Follow-up to #9. That PR made webSource and tarGzSource size-aware, but
fileSource was left returning a bare *os.File, so readAll's knownSize()
returned false for file reads and fell back to io.ReadAll's doubling growth.

geo loads its cached ~75MB MaxMind mmdb from disk via
Runner.InitFrom(FromFile(path)) at startup, so that read churned through the
75->60->48->38->30->24MB realloc staircase on every boot (confirmed via
alloc_space pprof on a lantern-box v0.0.91 box: io.ReadAll 373MB under
keepcurrent.readAll <- syncOnce <- InitFrom <- geo.FromWeb).

Make fileSource.Fetch return a sizedReadCloser carrying the stat size when
there's no preprocessor, so the read is a single pre-sized allocation. The
preprocessor path keeps the io.ReadAll fallback (a preprocessor can change
the byte count). Also close the file on the early-return/error paths, which
previously leaked the fd.

go build/vet/test green; adds TestFileSourceIsSizeAware.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR is a follow-up performance improvement to reduce transient allocation churn when initializing/syncing from disk-backed sources by making FromFile(...).Fetch(...) size-aware (so readAll can pre-size) and tightening file-handle cleanup paths.

Changes:

  • Update fileSource.Fetch to surface file size via sizedReadCloser when no preprocessor is used.
  • Close the opened file on early-return (ErrUnmodified) and error paths.
  • Add a unit test asserting FromFile produces a size-aware reader and that readAll performs a single pre-sized allocation.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
source.go Makes FromFile fetches size-aware and adds explicit file closes on error / unmodified paths.
read_test.go Adds a test to validate file sources report size and that readAll pre-sizes the buffer.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread source.go
Comment thread read_test.go
…eprocessor stream, hermetic web test

- fileSource.Fetch returned ErrUnmodified exactly when the file WAS newer
  than the cutoff (and fetched when unchanged). Invert to match the
  Source.Fetch contract and byteSource.Fetch.
- couple a preprocessor's returned stream's Close to the underlying *os.File
  so io.NopCloser-style preprocessors don't leak the descriptor.
- add TestFileSourceUnmodified covering the ErrUnmodified path.
- make TestUpdateFromWeb hermetic via httptest; the live httpbin.org
  dependency intermittently hung and timed out CI.
@reflog reflog merged commit f204338 into main Jun 16, 2026
1 check passed
reflog added a commit to getlantern/http-proxy that referenced this pull request Jun 16, 2026
* build: bump keepcurrent for fileSource pre-size fix

Pulls in getlantern/keepcurrent#10, which makes fileSource size-aware so
geo's startup InitFrom(FromFile) read of the cached ~75MB mmdb stops
churning through io.ReadAll's realloc staircase. Completes the memory fix
started in #671 (which covered the web-sync path).

NOTE: pinned to the keepcurrent PR-head commit; re-pin to the squashed main
commit after keepcurrent#10 merges (go get keepcurrent@<main> && go mod tidy).

* build: re-pin keepcurrent to merged main (fileSource pre-size fix #10)

---------

Co-authored-by: Ilya Yakelzon <reflog@getlantern.org>
reflog added a commit to getlantern/lantern-box that referenced this pull request Jun 16, 2026
* build: bump keepcurrent for fileSource pre-size fix

Pulls in getlantern/keepcurrent#10, which makes fileSource size-aware so
geo's startup InitFrom(FromFile) read of the cached ~75MB mmdb stops
churning through io.ReadAll's realloc staircase. Completes the memory fix
started in #275 (which covered the web-sync path).

NOTE: pinned to the keepcurrent PR-head commit; re-pin to the squashed main
commit after keepcurrent#10 merges (go get keepcurrent@<main> && go mod tidy).

* build: re-pin keepcurrent to merged main (fileSource pre-size fix #10)

---------

Co-authored-by: Ilya Yakelzon <reflog@getlantern.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants