perf: pre-size fileSource reads (close the InitFrom io.ReadAll churn)#10
Merged
Conversation
Follow-up to #9. That PR made webSource and tarGzSource size-aware, but fileSource was left returning a bare *os.File, so readAll's knownSize() returned false for file reads and fell back to io.ReadAll's doubling growth. geo loads its cached ~75MB MaxMind mmdb from disk via Runner.InitFrom(FromFile(path)) at startup, so that read churned through the 75->60->48->38->30->24MB realloc staircase on every boot (confirmed via alloc_space pprof on a lantern-box v0.0.91 box: io.ReadAll 373MB under keepcurrent.readAll <- syncOnce <- InitFrom <- geo.FromWeb). Make fileSource.Fetch return a sizedReadCloser carrying the stat size when there's no preprocessor, so the read is a single pre-sized allocation. The preprocessor path keeps the io.ReadAll fallback (a preprocessor can change the byte count). Also close the file on the early-return/error paths, which previously leaked the fd. go build/vet/test green; adds TestFileSourceIsSizeAware.
This was referenced Jun 16, 2026
There was a problem hiding this comment.
Pull request overview
This PR is a follow-up performance improvement to reduce transient allocation churn when initializing/syncing from disk-backed sources by making FromFile(...).Fetch(...) size-aware (so readAll can pre-size) and tightening file-handle cleanup paths.
Changes:
- Update
fileSource.Fetchto surface file size viasizedReadCloserwhen no preprocessor is used. - Close the opened file on early-return (
ErrUnmodified) and error paths. - Add a unit test asserting
FromFileproduces a size-aware reader and thatreadAllperforms a single pre-sized allocation.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| source.go | Makes FromFile fetches size-aware and adds explicit file closes on error / unmodified paths. |
| read_test.go | Adds a test to validate file sources report size and that readAll pre-sizes the buffer. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…eprocessor stream, hermetic web test - fileSource.Fetch returned ErrUnmodified exactly when the file WAS newer than the cutoff (and fetched when unchanged). Invert to match the Source.Fetch contract and byteSource.Fetch. - couple a preprocessor's returned stream's Close to the underlying *os.File so io.NopCloser-style preprocessors don't leak the descriptor. - add TestFileSourceUnmodified covering the ErrUnmodified path. - make TestUpdateFromWeb hermetic via httptest; the live httpbin.org dependency intermittently hung and timed out CI.
reflog
added a commit
to getlantern/http-proxy
that referenced
this pull request
Jun 16, 2026
* build: bump keepcurrent for fileSource pre-size fix Pulls in getlantern/keepcurrent#10, which makes fileSource size-aware so geo's startup InitFrom(FromFile) read of the cached ~75MB mmdb stops churning through io.ReadAll's realloc staircase. Completes the memory fix started in #671 (which covered the web-sync path). NOTE: pinned to the keepcurrent PR-head commit; re-pin to the squashed main commit after keepcurrent#10 merges (go get keepcurrent@<main> && go mod tidy). * build: re-pin keepcurrent to merged main (fileSource pre-size fix #10) --------- Co-authored-by: Ilya Yakelzon <reflog@getlantern.org>
reflog
added a commit
to getlantern/lantern-box
that referenced
this pull request
Jun 16, 2026
* build: bump keepcurrent for fileSource pre-size fix Pulls in getlantern/keepcurrent#10, which makes fileSource size-aware so geo's startup InitFrom(FromFile) read of the cached ~75MB mmdb stops churning through io.ReadAll's realloc staircase. Completes the memory fix started in #275 (which covered the web-sync path). NOTE: pinned to the keepcurrent PR-head commit; re-pin to the squashed main commit after keepcurrent#10 merges (go get keepcurrent@<main> && go mod tidy). * build: re-pin keepcurrent to merged main (fileSource pre-size fix #10) --------- Co-authored-by: Ilya Yakelzon <reflog@getlantern.org>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Follow-up to #9.
Gap
#9 made
webSource(HTTP Content-Length) andtarGzSource(archive entry size) size-aware, soreadAllpre-sizes those reads. ButfileSource.Fetchstill returned a bare*os.File, which exposes no size — soreadAll'sknownSize()returns false for file reads and falls back toio.ReadAll's doubling growth.getlantern/geoloads its cached ~75MB MaxMind mmdb from disk viaRunner.InitFrom(FromFile(path))at startup, so that read still churns through the75→60→48→38→30→24MBrealloc staircase on every boot.Confirmed via
alloc_spacepprof on a production lantern-boxv0.0.91box (already carrying #9):Live/inuse heap was already clean (the web-sync path from #9 pre-sizes correctly); this is the remaining transient spike, and it lands at startup on the smallest (1GB) boxes.
Fix
fileSource.Fetchnow returnssizedReadCloser{f, fi.Size()}when there's no preprocessor, so the read is a single pre-sized allocation. The preprocessor path keeps theio.ReadAllfallback (a preprocessor can change the byte count, so the file size isn't the payload size). Also closes the file on the early-return (ErrUnmodified) and error paths, which previously leaked the fd.go build/vet/testgreen; addsTestFileSourceIsSizeAware.Rollout
Downstream bumps follow (http-proxy-lantern, lantern-box) once this merges.