stream: reduce allocations on WHATWG streams hot paths#63876
Conversation
Pure-JavaScript optimizations to lib/internal/webstreams/*: - Reuse the pull and write promise-reaction closures per controller instead of allocating two fresh closures per chunk. They are created lazily on the first pull/write so construction-only workloads never pay for them. - Add the buffered fast path to the ReadableStream async iterator, mirroring the one ReadableStreamDefaultReader.read() already has: when data is queued in a default controller, dequeue directly and skip the read request, the deferred promise, and the promise chaining on the following call. - Run the post-start step through queueMicrotask() when the start algorithm result is not an object (fulfillment is guaranteed and no .then lookup is observable), instead of wrapping it in new Promise((r) => r(result)) plus a two-closure reaction. Object and thenable results keep the promise path since their adoption timing and .then lookups are observable. - Specialize the promise-callback wrappers for user algorithms by arity (0/1/2), replacing the rest-parameter + ReflectApply form that allocated an arguments array per invocation. The exact number of arguments each user callback observes is preserved. - Share immutable nil records for the writable stream closeRequest, inFlightWriteRequest, inFlightCloseRequest and pendingAbortRequest resets; these records are only ever replaced wholesale. Push the PromiseWithResolvers() record directly as the write request rather than rebuilding an identical object. - Remove dead per-instance allocations: the never-read close record in the writable stream state, the placeholder close/ready records that reader/writer setup unconditionally replaces, the per-stream () => 1 size algorithm closures, and the kControllerErrorFunction placeholder plus bound function (now a prototype method; byte streams keep their historical no-op behavior there). Benchmark results vs main, same-day builds, benchmark/compare.js --runs 10 (statistically significant rows): creation WritableStream (n=500k) +97.1% *** creation TransformStream (n=500k) +50.3% *** creation ReadableStream (n=500k) +13.9% *** creation ReadableStreamBYOBReader (n=500k) +13.0% *** creation ReadableStreamDefaultReader (n=500k) +9.8% ** readable-async-iterator +38.1% *** pipe-to (16 hwm configs) +2.6..+4.5% (all positive) js_transfer WS / TS / RS +7.7% / +6.3% / +3.0% readable-read normal/byob, read-buffered parity (n.s.) WPT streams/compression/encoding results are identical to main (1403/338/3822 subtests passed, same 8 expected failures by name), and all webstreams-related parallel tests pass. Assisted-by: Claude Fable 5 <noreply@anthropic.com>
MattiasBuelens
left a comment
There was a problem hiding this comment.
Looks pretty sensible to me! 👍
| // No read is in flight. Mirror the buffered fast path of | ||
| // ReadableStreamDefaultReader.read(): when data is already queued | ||
| // in a default controller, resolve immediately without allocating | ||
| // a read request. The result settles synchronously, so leaving | ||
| // state.current undefined matches the state the slow path reaches | ||
| // once its read request callbacks have settled. |
There was a problem hiding this comment.
I wonder if we always have to inline this... 🤔
I ported some of your previous optimizations to web-streams-polyfill, but instead of copying the code around I made defaultReader.read() create a different kind of ReadRequest if it knows that it will be resolved synchronously.
Perhaps we can do the same here, and have nextSteps create a different kind of AsyncIteratorReadRequest if it can be resolved synchronously? (I forgot to do that in my polyfill, it seems. 😅 ) Or would that risk turning readableStreamDefaultReaderRead megamorphic? (Or maybe it already is?)
| queueMicrotask(() => { | ||
| controller[kState].started = true; | ||
| assert(!controller[kState].pulling); | ||
| assert(!controller[kState].pullAgain); | ||
| readableStreamDefaultControllerCallPullIfNeeded(controller); | ||
| }); |
There was a problem hiding this comment.
Nit-pick: maybe pull this callback into a const, so we can reuse it for the PromisePrototypeThen below?
Pure-JavaScript allocation reductions on the WHATWG streams hot paths partially based on the findings of #63872 : reused promise-reaction closures per controller (pull/write), a buffered fast path in the async iterator,
queueMicrotask()for non-thenable start results, arity-specialized algorithm wrappers, shared nil state records, and removal of several dead per-instance allocations. No observable behavior change: WPT streams/compression/encoding results are identical tomain(same subtests passing, same 8 expected failures by name).Benchmark results (
benchmark/compare.js --runs 10, both binaries built the same day from the same toolchain):The
creation.jsrows at the stockn=50000measure a 20-40ms window and are unreliable; re-run at--set n=500000: