
fix(cli): close litellm async clients after each compile/index (CLOSE-WAIT/FD leak)#91

Merged
KylinMountain merged 1 commit into VectifyAI:main from cnndabbler:fix/litellm-async-client-leak
Jun 12, 2026

Conversation

@cnndabbler

Contributor

Problem

add_single_file compiles each document in a fresh asyncio.run() event loop. LiteLLM caches its async (aiohttp) clients per event loop, so when each loop ends the previous doc's clients are abandoned without being closed. Their HTTP connections sit in CLOSE-WAIT and accumulate sockets/file descriptors across a long ingest.

Observed on a 165-document ingest against a remote API: the process held 200+ sockets in CLOSE-WAIT, climbing per doc. (On a box with a low ulimit -n this would eventually exhaust FDs and start failing compilations.)
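The PR does not say how the CLOSE-WAIT count was taken (tools such as `ss` or `lsof` are common choices). As a hypothetical, Linux-only sketch, a process can count its own CLOSE-WAIT sockets by matching the socket inodes under `/proc/<pid>/fd` against the rows of `/proc/<pid>/net/tcp` and `tcp6`, where state code `08` means CLOSE_WAIT. The function names are illustrative, not part of the project.

```python
import os
from typing import Optional, Set

CLOSE_WAIT = "08"  # TCP state code for CLOSE_WAIT in /proc/net/tcp and tcp6


def count_close_wait(table_text: str, inodes: Optional[Set[str]] = None) -> int:
    """Count CLOSE_WAIT rows in a /proc/net/tcp-style table.

    If `inodes` is given, only rows whose socket inode is in that set count.
    """
    count = 0
    for line in table_text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 10:
            continue
        state, inode = fields[3], fields[9]  # "st" and "inode" columns
        if state == CLOSE_WAIT and (inodes is None or inode in inodes):
            count += 1
    return count


def socket_inodes(pid: str = "self") -> Set[str]:
    """Collect inode numbers of the sockets held open by a process."""
    fd_dir = f"/proc/{pid}/fd"
    inodes: Set[str] = set()
    for fd in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, fd))
        except OSError:  # fd closed between listdir and readlink
            continue
        if target.startswith("socket:[") and target.endswith("]"):
            inodes.add(target[len("socket:["):-1])
    return inodes


def process_close_wait(pid: str = "self") -> int:
    """Number of CLOSE_WAIT TCP sockets (IPv4 + IPv6) owned by `pid`."""
    inodes = socket_inodes(pid)
    total = 0
    for path in (f"/proc/{pid}/net/tcp", f"/proc/{pid}/net/tcp6"):
        try:
            with open(path) as table:
                total += count_close_wait(table.read(), inodes)
        except FileNotFoundError:  # e.g. IPv6 disabled
            continue
    return total
```

Sampling `process_close_wait()` after each document gives a per-doc trend; a number that keeps rising is the symptom reported here.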

Fix

Add a best-effort _close_litellm_async_clients() (calls litellm's own close_litellm_async_clients(), never raises) and invoke it in try/finally around the three async entry points in add_single_file:

  • index_long_document(...)
  • asyncio.run(compile_long_doc(...))
  • asyncio.run(compile_short_doc(...))

As a result, the cached clients are closed after each doc, whether the doc succeeds or fails.
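The try/finally structure of this fix can be sketched as follows. `close_clients_best_effort`, `run_doc`, and the `closer` hook are hypothetical names; the hook stands in for the litellm cleanup call and is assumed here to be a coroutine, consistent with the cleanup being driven through `asyncio.run`. Like #91, the sketch runs the cleanup in its own `asyncio.run()` after the compile loop has finished; the #95 commit message further down explains why that ordering is a problem.

```python
import asyncio
import logging
from typing import Any, Callable, Coroutine, TypeVar

T = TypeVar("T")
log = logging.getLogger(__name__)


def close_clients_best_effort(closer: Callable[[], Coroutine[Any, Any, None]]) -> None:
    """Run an async cleanup hook in its own event loop; never raises."""
    try:
        asyncio.run(closer())
    except Exception:
        log.debug("async client cleanup failed", exc_info=True)


def run_doc(compile_coro: Callable[[], Coroutine[Any, Any, T]],
            closer: Callable[[], Coroutine[Any, Any, None]]) -> T:
    """Compile one document, then attempt cleanup whether it succeeded or failed."""
    try:
        return asyncio.run(compile_coro())
    finally:
        close_clients_best_effort(closer)
```

Because the cleanup sits in `finally`, it runs whether the compile coroutine returns or raises, and the broad `except` keeps a failing cleanup from masking the document's own result or error.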

Verification

Added a doc end-to-end after the change: CLOSE-WAIT returns to ~0 after each doc instead of accumulating. Also updated test_add_short_doc_runs_compiler: the compile path now drives asyncio.run for both the compile and the cleanup, so the test asserts that the compile_short_doc coroutine was run rather than that asyncio.run was called exactly once.
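Why the test had to change: once the cleanup also goes through `asyncio.run`, an `assert_called_once()` on `asyncio.run` no longer holds, so the test instead checks that the compile coroutine itself was awaited. A hypothetical sketch with `unittest.mock`; `add_short_doc`, `compile_short_doc`, and `close_clients` are stand-ins, not the project's code or its actual test.

```python
import asyncio
import unittest
from unittest import mock


async def compile_short_doc(path: str) -> str:
    """Hypothetical stand-in for the real compiler coroutine."""
    return f"compiled {path}"


async def close_clients() -> None:
    """Hypothetical stand-in for the async client cleanup."""


def add_short_doc(path: str) -> str:
    # Two asyncio.run() calls per document: one for the compile, one for cleanup.
    try:
        return asyncio.run(compile_short_doc(path))
    finally:
        asyncio.run(close_clients())


class AddShortDocTest(unittest.TestCase):
    def test_runs_compiler(self) -> None:
        compile_mock = mock.AsyncMock(return_value="ok")
        # patch.dict swaps the module-level name that add_short_doc looks up;
        # wraps= keeps the real asyncio.run behaviour while counting calls.
        with mock.patch.dict(globals(), {"compile_short_doc": compile_mock}), \
                mock.patch("asyncio.run", wraps=asyncio.run) as run_mock:
            self.assertEqual(add_short_doc("doc.md"), "ok")
        # Robust assertion: the compile coroutine was actually awaited.
        compile_mock.assert_awaited_once_with("doc.md")
        # An assert_called_once() on asyncio.run would fail here.
        self.assertEqual(run_mock.call_count, 2)
```

Run it with `python -m unittest` against the module that contains it.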

Relation to #44

#44 carried this same intent but is now ~23 commits behind main and conflicts with the current indexer.py/cli.py. This is a minimal reimplementation on current main, so it supersedes #44.

litellm caches aiohttp clients per event loop. add_single_file runs each doc
via a fresh asyncio.run() loop, so the previous loop's clients are abandoned
and their HTTP connections linger in CLOSE-WAIT, accumulating sockets/FDs over
a long ingest (observed 200+ against a remote API on a 165-doc run).

Add _close_litellm_async_clients() (best-effort, never raises) and call it in
try/finally around index_long_document and both compile_short_doc /
compile_long_doc calls. Verified: CLOSE-WAIT returns to ~0 after each doc.

Supersedes the now-stale VectifyAI#44 (which carried the same intent on an old base).

@KylinMountain KylinMountain left a comment

Collaborator

LGTM. Thank you so much for your contribution — this is a great fix. I'll go ahead and merge your PR.

@KylinMountain KylinMountain merged commit 933fd12 into VectifyAI:main Jun 12, 2026
KylinMountain added a commit that referenced this pull request Jun 12, 2026
…e loop (#95)

#91 closed litellm's cached async clients from the CLI layer in a separate
asyncio.run(), i.e. after the loop that created them was already torn down,
with the error swallowed by a bare except. Close them in the same loop that
created them instead, by moving the cleanup into the compile coroutines.

- add compiler._close_async_llm_clients() (best-effort, logs at debug)
- call it from compile_short_doc / compile_long_doc finally blocks
- drop cli._close_litellm_async_clients() and the three try/finally wraps
- drop the no-op cleanup around index_long_document (indexer uses no litellm)
- revert the test_add_command assertion to assert_called_once()