Skip to content

improvement(mothership): table speed parity — keyset reads, limit bounds, background import/delete jobs#5012

Open
TheodoreSpeaks wants to merge 5 commits into
mainfrom
improvement/table-mothership-speed
Open

improvement(mothership): table speed parity — keyset reads, limit bounds, background import/delete jobs#5012
TheodoreSpeaks wants to merge 5 commits into
mainfrom
improvement/table-mothership-speed

Conversation

@TheodoreSpeaks

Copy link
Copy Markdown
Collaborator

Context

Audit of Mothership's table operations against the tables feature's recent perf work (tenant-scoped queries, keyset pagination, background table_jobs). The core user_table row CRUD was already on the fast paths; these were the gaps.

Changes

function_execute table mounts were silently truncated to 100 rows. inputTables mounted via queryRows(table, {}, …) — default limit 100, plus a COUNT(*) and an executions load the CSV never used. Sandbox code computed over truncated data. Mounts now drain selectExportRowPage (keyset, tenant-bounded, delete-mask-aware) in 5k pages, remap stored column-id keys back to display names (previously headers were names but cells were id-keyed, producing empty columns on id-keyed tables), run per-table mounts in parallel, and count toward the 50MB mount budget.

query_rows gets contract-parity limits and keyset paging. limit was passed through unclamped (the model could pull an entire table into one tool result) and only offset paging existed. Now: limitMAX_QUERY_LIMIT (1000) rejected with the contract's message, withExecutions: false, after: {orderKey, id} accepted (rejected with sort, like the HTTP contract), and nextCursor returned when a default-order page fills.

Imports and unbounded filter-deletes get full async-job parity with the UI routes.

  • import_file / create_from_file: CSV/TSV ≥ CSV_ASYNC_IMPORT_THRESHOLD_BYTES (8MB) dispatch tableImportTask (claim slot → dispatch → release-on-failure, mirroring the import-async routes; create mode uses the placeholder-table pattern from the workspace-level route). Smaller/other-format files stay inline but now claim and release the table's one-write-job slot, so a Mothership import can no longer interleave with a running background import/delete.
  • delete_rows_by_filter: explicit limit ≤ 1000 stays inline; with no limit, the match count is measured first (tenant-bounded count) and >1000 matches dispatch tableDeleteTask with the standard {filter, cutoff, doomedCount} payload — doomed rows hidden immediately via the existing pendingDeleteMask.
  • New get_job operation returns the table's derived job state ({id, type, status, error, rowsProcessed}) for polling.

TableImportPayload.deleteSourceFile (default true). The import worker deletes its source object when terminal — correct for the UI's single-use direct uploads, destructive for the persistent workspace files Mothership imports from. Mothership passes false.

Generated artifacts (tool-catalog-v1.ts, tool-schemas-v1.ts) regenerated from the companion contract PR simstudioai/copilot#289 (get_job, after param, limit maximums, background-behavior docs). Sim-side behavior is live regardless; the model sees the new schema once the copilot PR deploys.

Tests

  • user-table.test.ts: limit clamps, keyset after/nextCursor (+ after+sort rejection), inline slot claim/release, slot-held rejection, background import/delete dispatch payloads, get_job (33 passing)
  • function-execute.test.ts (new): full keyset drain across pages (5002-row table), id→name header mapping, cross-workspace rejection
  • import-runner.test.ts (new): source-file cleanup default vs deleteSourceFile: false
  • bunx vitest run lib/copilot lib/table (78 files, 721 tests), bun run lint:check, bun run check:api-validation, tsc --noEmit all green

Related: #5011 (cross-tenant outputTable fix, split out), simstudioai/copilot#289 (contract).

🤖 Generated with Claude Code

TheodoreSpeaks and others added 5 commits June 12, 2026 03:03
… + tenant-scoped query performance (#4915)

* feat(tables): paginated background row-delete jobs via table_jobs

* fix(tables): address review on async row-delete (filtered count, scoped optimistic clear, Cmd+A select-all, hide delete from tray)

* improvement(tables): filter-aware select-all runs, delete-job read mask, keyset index + autovacuum tuning

* feat(tables): run import/delete/export/backfill jobs on trigger.dev with in-process fallback

* improvement(tables): raise delete page to 10k and export batch to 5k

* improvement(tables): raise CSV import batch to 5k rows (param-cap bounded)

* feat(tables): surface export jobs in the header tray with progress, cancel, and download

* improvement(tables): surface exports as derived tables-scoped toasts instead of the import tray

* Revert "improvement(tables): surface exports as derived tables-scoped toasts instead of the import tray"

This reverts commit 1ea5871.

* fix(tables): preserve export storage key (NoSuchKey) and unify jobs in one spinner tray

* improvement(tables): jobs tray icon reflects aggregate state (spinner/check/alert)

* fix(tables): restore jobs tray on the tables list (dropped in staging merge)

* improvement(tables): keyset-paginate export row reads (offset paging was O(n^2) over large tables)

* perf(tables): keyset pagination for grid infinite scroll

Default-order row pages now cursor on (order_key, id) instead of OFFSET —
each page is an index seek on tableOrderKeyIdx, where OFFSET re-scans and
discards every prior row (O(N²) across deep scrolls and full drains like
select-all/export-to-clipboard). Sorted views keep offset paging; the
contract refines after+sort as mutually exclusive. v1 public rows API is
unchanged (extends the unrefined base, omits after).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(tables): show export in job tray immediately on kickoff

The export-jobs query's poll only self-sustains once a running job is
already in the cache, so a freshly kicked export stayed invisible until
an SSE event or page refresh. Invalidate the tray query on kickoff
success so the icon appears right away.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(tables): surface real row/column write errors in toasts

Drizzle wraps DB errors in DrizzleQueryError whose message is the failed
SQL — the real cause (e.g. the row-limit trigger's RAISE) sits on .cause,
so the routes' substring classification never matched and everything fell
through to generic 500s ("Failed to insert row"). Add rootErrorMessage
(cause-chain unwrap) and a shared rowWriteErrorResponse classifier that
consolidates the per-route pattern lists and rewrites the trigger message
into a friendly "Row limit exceeded — capped at N rows". Applied across
the app and v1 row-write routes and the columns route.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* perf(tables): tenant-bound filtered row counts (12.7s -> 0.6s)

JSONB filter predicates (->> ILIKE / range casts) are opaque to the
planner: it estimates a handful of matches and picks a parallel seq scan
over the entire shared user_table_rows relation — every tenant's rows —
for the page-0 COUNT(*), so any non-equality filter on a large table cost
10s+ regardless of how few rows matched. Run filtered counts in a
transaction with SET LOCAL enable_seqscan = off, forcing the
tenant-bounded bitmap plan. Unfiltered counts keep their index-only scan.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* perf(tables): tenant-bound Cmd+F search and stream its window (75s -> 2s)

Same planner trap as the filtered count, compounded: the lateral
jsonb_each_text ILIKE is unestimatable, so findRowMatches on a 1M-row
table seq-scanned the whole 12M-row shared relation and disk-sorted
~120MB of window input (75s measured). SET LOCAL enable_seqscan=off
bounds the scan to the tenant; on the default order, additionally
penalizing bitmap/sort/parallel steers the planner onto the already-
sorted (table_id, order_key, id) index walk so row_number() streams
with no sort at all (2s measured). Flags only penalize plan shapes —
a custom sort still sorts.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* perf(tables): tenant-bound sorted pages and filtered write selections

Extends the seqscan fix to every remaining jsonb-predicate path, all
measured on a 1M-row table in a 12M-row shared relation:
- sorted page query (ORDER BY data->>'col'): 9.7s/page -> 0.76s, and deep
  pages stop spilling ~130MB sorts to disk
- updateRowsByFilter / deleteRowsByFilter row selection: 14.4s -> bounded
- delete-job worker selectRowIdPage with a filter: 12.6s/page -> bounded
- dispatcher filtered-scope window walk: same shape, same fix

Shared withSeqscanOff helper moves to lib/table/planner.ts (service +
dispatcher both consume it; dispatcher can't import service).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* perf(tables): tenant-scoped containment index (migration 0232)

The plain GIN on user_table_rows.data matched @> candidates across every
tenant sharing the relation — a hot value in someone else's table
inflated everyone's equality filters (1.07M candidates fetched for a
33k-row match, lossy bitmap, 1.1s). Replace it with btree_gin
(table_id, data jsonb_path_ops): the tenant intersection happens inside
the index and paths are single hashed entries. Rare-equality probe
326ms -> 17ms with zero wasted candidates; unique-constraint checks and
upsert conflict lookups ride the same index. The new index is smaller
than the one it replaces (529MB vs 694MB on the 12M-row dev relation).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* perf(tables): tenant-bound unique-constraint checks (3.5s -> <1s per write)

The unique check runs lower(data->>'col') = $1 LIMIT 1 on every insert
and cell edit touching a unique column. The predicate is unestimatable
and a unique (non-conflicting) value never exits early, so the planner
seq-scanned all 12.3M shared-relation rows per check — 3.5s measured.
Tenant-bound both the single and batch variants; the batch path sets the
flag on the caller's transaction when one is supplied (SET LOCAL dies at
its commit, and the statements that follow are tenant-scoped writes).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* perf(tables): tenant-bound upsert conflict lookup

Same unestimatable data->>key predicate as the unique checks; an
insert-path upsert has no existing match so the lookup can't exit early
and seq-scans the whole shared relation. The upsert already runs in a
transaction — set the planner flag on it.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* refactor(tables): consolidate executor types onto planner exports

service.ts kept a private DbTransaction alias and two inline
typeof db | DbTransaction unions after planner.ts began exporting the
canonical DbTransaction/DbExecutor — import those instead. From the
/simplify review of the perf series; no behavior change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(tests): drop narrow schema mock override in process-contents test

The local vi.mock('@sim/db/schema') stubbed only document/knowledgeBase,
but the file's import graph reaches lib/table/service whose module scope
now references tableJobs. The global schema mock already covers all of
it — rely on it per the testing rules instead of re-mocking.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(tables): scope cancels and counts to the filtered selection (review)

Addresses the open Bugbot/Greptile findings on filtered select-all:
- Filtered runs no longer cancel the whole table: cancelWorkflowGroupRuns
  takes a filter — it stops only dispatches with that exact filter scope
  and only in-flight cells on matching rows (semi-join); whole-table and
  differently-scoped dispatches keep running, their cancelled cells
  skipped via cancelledAt > requestedAt.
- Stop on a filtered select-all sends the filter through cancel-runs
  (contract + route + mutation) instead of a table-wide cancel.
- runColumnBodySchema rejects rowIds + filter together (mirrors
  deleteTableRowsBodySchema).
- Select-all delete clears the selection in onSuccess, not at click, so
  a failed kickoff restores both rows and selection.
- Clipboard copy/cut estimates use the filter-aware total (rowTotal)
  instead of the whole-table rowCount.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* chore: retrigger CI (Actions dropped the previous push events)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* chore: bump api-validation route baseline to 807 (staging route + merge)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(tables): release job claim when trigger.dev dispatch fails

If tasks.trigger (or its dynamic imports) throws after
markTableJobRunning, the ghost running row held the table's
one-write-job slot until the stale-job janitor fired (~15-20 min of
409s). All four kickoff routes now release the claim and rethrow; the
backfill runner releases and warns (a failed backfill never fails the
schema change). Greptile P1.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* chore(db): squash 0232 into 0231 (one migration for the PR)

Both are branch-only — no environment has applied them through the
migration ledger yet — so the tenant-scoped GIN (btree_gin extension,
index swap) folds into 0231_table_jobs_and_keyset. Snapshot chain
re-pointed; drizzle-kit generate confirms zero drift.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(tables): context-menu delete label shows the true select-all count

Under select-all the context menu counted only the loaded page ("Delete
1000 rows" on a 999k-row table) while the action correctly deletes every
matching row via the background job. Delete now gets its own count from
the filter-aware total minus deselections; the run-action labels keep the
loaded-row count since those actions act on loaded rows only.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(tables): context-menu bulk actions act on the full select-all scope

Follow-up to the label fix: under select-all the context menu's Run /
Re-run / Stop only acted on the loaded page of rows. They now route
through the same scopes as the action bar — runs dispatch by filter
(whole table when unfiltered), Stop uses the filter-scoped cancel — and
all labels share one true count (filter-aware total minus deselections,
locale-formatted). Like the action bar, filter-scoped runs ignore
deselections (the run API has no exclusion set).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* feat(tables): exclusion set for select-all runs and stops

Select-all minus deselected rows now means exactly that for every bulk
action, not just delete. runColumnBodySchema and cancelTableRunsBodySchema
accept excludeRowIds (bounded by MAX_EXCLUDE_ROW_IDS, select-all scope
only); the dispatch scope persists it and the dispatcher window walk,
eager bulk-clear, pre-run cancel, and filter/table-scoped cancel all skip
excluded rows. Client threads exclusions from the selection through the
action bar and the grid context menu, including the optimistic stamps.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(tables): spare excluded-row dispatches on Stop; no orphan placeholder table

Two Bugbot findings on the exclusion work:
- Select-all-minus-deselections Stop (no filter) cancelled every active
  dispatch table-wide, killing row-scoped runs on deselected rows.
  markActiveDispatchesCancelled now spares dispatches whose scope.rowIds
  are fully contained in the exclusion set (coalesce(false) keeps
  table-wide dispatches cancellable).
- Create-mode import: a failed trigger.dev dispatch released the job
  claim but left the just-created placeholder table in the workspace.
  Archive it on the failure path (no hard-delete surface exists).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(tables): row counts reflect a running delete everywhere

A mid-delete refresh resurrected the old counts: the optimistic update
stripped cached rows but left page-0 totalCount (footer / select-all
label) at the old total, and list/detail counts reported raw row_count
including doomed-but-not-yet-deleted rows.

- onMutate now sets the active view's totalCount to the kept rows and
  decrements the cached detail rowCount by the doomed estimate
- the kickoff persists that estimate on the job (payload.doomedCount,
  clamped server-side); getTableById/listTables subtract the
  not-yet-deleted remainder (doomedCount - rows_processed) while the
  delete runs, so refetched counts match the read path's delete mask

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* copy(tables): drop background mention from delete confirm

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(tables): clear select-all immediately when a delete kicks off

The header checkbox lingered as a minus over the optimistically-emptied
grid: rowSelectionCoversAll treats zero rows as not-covered, and the
selection clear waited for the kickoff's onSuccess. Clear at click
(failed kickoffs visibly restore rows + toast; re-selecting is cheap)
and render an empty grid's header checkbox unchecked regardless — a
selection over zero rows is vacuous.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(tables): export takes the async path while a delete job runs

The sync/async export choice reads rowCount, which is a doomed-estimate-
adjusted number during a running delete (and the estimate is client-
supplied) — an overstated estimate could route a still-large masked set
through the synchronous stream. Mid-delete exports now always run as a
job: safe at any size, and exports bypass the one-job-per-table gate.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(build): stop uploads setup from sweeping the project into route graphs

next build (Turbopack) failed with "Two or more assets with different
content were emitted to the same output path" on the server-root chunk.
Root cause: setup.server.ts's unscoped path.resolve(process.cwd()) made
node-file-tracing sweep the entire project — next.config.ts included —
into every route graph reaching lib/uploads (the files/upload route and,
since the export job, the export-async path). Two producers emitted the
swept config into same-named chunks; staging's latest commits made their
contents diverge and the names collided. Annotate the path derivation
with turbopackIgnore per the NFT warning's own remediation — the build
passes and all ~390 "unexpected file in NFT list" warnings disappear.

Also inline the releaseJobClaim dynamic imports in the kickoff routes to
plain static imports — service is already statically imported there.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
…post-index ANALYZE guard (#4997)

* fix(tables): per-batch delete-job commits, real trigger.dev retries, post-index ANALYZE guard

* fix(tables): resume job progress across retries, rethrow root cause for clean failure messages
… up tables feature (#4995)

* improvement(tables): migrate inputs to emcn chip components and clean up tables feature

* fix(tables): address review feedback — stale filter column label, shared FieldError in enrichment config

* improvement(tables): scope create-table callback to stable mutateAsync
…me/drop prompt

`drizzle-kit push --force` only suppresses the data-loss confirm, not the
rename-vs-drop disambiguation prompt. That prompt fires whenever a diff both
adds and drops tables/columns at once (e.g. migration 0231 created
sim_trigger_state while dropping the workspace_notification_* tables), and in
CI it crashes with a bare "Interactive prompts require a TTY" stack trace.

Catch that specific failure in the dev push step and emit a GitHub error
annotation explaining the cause and the fix (drop the stale objects on the dev
DB to match schema.ts — the same DROPs the versioned migration already applied
to staging/prod), instead of leaving an opaque trace. Exit status is preserved
either way.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…nds, background import/delete jobs

Mothership's table operations lagged the fast paths the tables feature
already has:

- function_execute mounted inputTables via queryRows with defaults,
  silently truncating the sandbox CSV to 100 rows and paying for a count
  and execution metadata it never used. Mounts now drain the keyset
  export reader page by page, remap stored column-id keys to display
  names (headers previously didn't match id-keyed cell data), mount
  tables in parallel, and count toward the 50MB sandbox mount budget.
- user_table query_rows accepted unbounded limits (whole table into one
  tool result) and only offset paging. It now enforces the contracts'
  MAX_QUERY_LIMIT, skips execution metadata, accepts the keyset `after`
  cursor, and returns `nextCursor` when a default-order page fills.
- import_file / create_from_file / delete_rows_by_filter mutated without
  claiming the per-table job slot, racing running background jobs, and
  ran whole imports inline in the chat request. CSV/TSV imports ≥8MB and
  unbounded filter-deletes matching >1000 rows now dispatch the same
  trigger.dev jobs the UI routes use (slot claim, release-on-dispatch-
  failure, delete mask); inline imports claim and release the slot. A new
  get_job operation surfaces the table's derived job state for polling.
- TableImportPayload grows deleteSourceFile (default true) so Mothership
  imports of persistent workspace files survive the worker's single-use
  source cleanup.

Generated tool catalog artifacts regenerated from simstudioai/copilot#289.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@TheodoreSpeaks

Copy link
Copy Markdown
Collaborator Author

@greptile review

@vercel

vercel Bot commented Jun 12, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
docs Skipped Skipped Jun 12, 2026 11:28pm

Request Review

@cursor

cursor Bot commented Jun 12, 2026

Copy link
Copy Markdown

PR Summary

High Risk
Introduces background delete/export paths, job-slot concurrency, and schema/API field renames on core table data; mistakes could cause wrong-row deletes, ghost jobs, or client/server drift.

Overview
Replaces per-table import fields and routes with a shared table_jobs model (jobStatus, jobId, jobType, …) and generic job cancel at /job/cancel. Import, filter-based delete, and large exports are kicked off through new async APIs (optional Trigger.dev dispatch with releaseJobClaim on failed triggers), with the stale-execution cron failing stuck jobs, pruning old terminal rows, and deleting orphaned export files.

Bulk operations now accept filter and excludeRowIds on run, cancel-runs, and delete-async (with upfront tableFilterError validation). Row listing adds after keyset paging; table APIs centralize client-facing write errors via rootErrorMessage / rowWriteErrorResponse.

The workspace grid treats select-all as filter-scoped totals (including unloaded rows): exclusions survive toggles, Cmd+A toggles whole-table selection, delete-all goes through a background job instead of loading every id, and run/stop/copy counts use the filter-scoped total. SSE handles job events (including auto-download for exports). Smaller UI passes swap sidebars/filters to Chip components and shared sidebar field helpers.

Dev CI db:push now surfaces a clear workflow error when Drizzle hits interactive rename/drop prompts without a TTY.

Reviewed by Cursor Bugbot for commit 7310c06. Bugbot is set up for automated code reviews on this repo. Configure here.

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 7310c06. Configure here.

sort: validated.sort ? wire.sortIn(validated.sort) : undefined,
limit: validated.limit,
offset: validated.offset,
after: validated.after,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HTTP rows missing next cursor

Medium Severity

The GET rows handler now accepts an after keyset cursor and passes it to queryRows, but the JSON response never includes a nextCursor when a full page is returned. REST clients cannot advance past the first default-order page without guessing offsets or reusing Mothership-only tooling.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 7310c06. Configure here.

@greptile-apps

greptile-apps Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR brings Mothership's table operations to parity with the UI's recent performance work: full-table keyset CSV mounts replacing the silently-truncated 100-row queryRows path, query_rows limit enforcement and after/nextCursor cursor support, and background import/delete job dispatch for large CSV imports and unbounded filter deletes.

  • Schema migration (0233) introduces a table_jobs table with a partial unique index enforcing one active write-job per table, migrates existing import state, replaces the cross-tenant GIN index with a tenant-scoped one, and adds a keyset index on (table_id, id).
  • function-execute now drains the full table via selectExportRowPage in 5k-row pages, remaps column-id keys to display names, and runs per-table mounts in parallel against the 50MB budget — however the entire CSV for every table is built before the budget is checked, so the budget cap is not enforced pre-load.
  • user-table (Mothership tool) adds get_job for polling, enforces MAX_QUERY_LIMIT (1000) on query_rows, dispatches large CSV imports and unbounded filter deletes to background workers, and passes deleteSourceFile: false to preserve persistent workspace files.

Confidence Score: 3/5

The job infrastructure, migration, and background worker paths are solid, but the sandbox table mount builds every table CSV fully in memory before the budget check can reject it — for large enterprise tables this can spike the web container's heap before the error is thrown.

The core job-slot enforcement (partial unique index, markTableJobRunning/releaseJobClaim, stale-job janitor) is well-structured and the import/delete runner paths handle failure and supersession correctly. The one concrete defect is in buildTableCsvForMount: tables are drained in parallel via Promise.all before the 50MB budget loop runs, so all CSVs land in memory simultaneously regardless of size — the budget check comes too late to prevent the allocation. Everything else in the PR (keyset pagination, limit clamping, deleteSourceFile flag, migration) looks correct and tested.

apps/sim/lib/copilot/tools/handlers/function-execute.ts — the parallel CSV build before budget enforcement is the main concern; the rest of the changed files look safe to merge.

Important Files Changed

Filename Overview
apps/sim/lib/copilot/tools/handlers/function-execute.ts Replaces truncated 100-row queryRows mount with full keyset drain via selectExportRowPage; all table CSVs are built in parallel before the 50MB budget check runs, risking OOM for large tables.
apps/sim/lib/copilot/tools/server/table/user-table.ts Adds keyset pagination, limit enforcement, background import/delete dispatch, and get_job operation; doomedCount in delete_rows_by_filter can overestimate if rows are inserted between count and dispatch.
apps/sim/lib/table/import-runner.ts Adds deleteSourceFile flag (defaults true) to skip deletion of persistent workspace files; existing streaming import logic unchanged.
apps/sim/lib/table/delete-runner.ts New background delete worker using keyset pagination on id; ownership-gated per page; correctly calls markTableDeleteFailed on unexpected errors.
apps/sim/lib/table/service.ts Migrates import-status columns to table_jobs table; adds markTableJobRunning, releaseJobClaim, selectExportRowPage, and latestJobForTable helpers; rowCount now subtracts pendingDeleteRemaining for mid-delete accuracy.
packages/db/migrations/0233_table_jobs_and_keyset.sql Creates table_jobs with partial unique index for one active write-job per table; migrates existing import_status data; adds tenant-scoped GIN index and drops old cross-tenant one; all DDL is idempotent.
packages/db/migrations/0234_analyze_user_table_rows.sql Runs ANALYZE on user_table_rows to fix reltuples=Infinity corruption from concurrent index builds; idempotent and safe.
apps/sim/app/api/cron/cleanup-stale-executions/route.ts Updated stale-job janitor to operate on table_jobs instead of import columns; adds pruning of terminal jobs older than 24h and cleans up orphaned export storage objects.
apps/sim/app/api/table/[tableId]/import-async/route.ts Switches from markTableImporting to markTableJobRunning; adds trigger.dev dispatch path with proper releaseJobClaim on dispatch failure.

Sequence Diagram

sequenceDiagram
    participant MB as Mothership Tool
    participant SVC as table/service
    participant JOB as table_jobs DB
    participant WRK as Background Worker
    participant SSE as SSE / get_job

    Note over MB,SSE: import_file (large CSV / trigger.dev path)
    MB->>SVC: markTableJobRunning(tableId, importId, 'import')
    SVC->>JOB: INSERT table_jobs ON CONFLICT DO NOTHING
    JOB-->>SVC: "claimed=true"
    MB->>WRK: tasks.trigger('table-import', payload)
    WRK-->>MB: dispatched
    MB-->>MB: return jobId

    Note over MB,SSE: delete_rows_by_filter (unbounded >1000 matches)
    MB->>SVC: queryRows(filter, limit:1) totalCount
    MB->>SVC: markTableJobRunning(tableId, jobId, 'delete', payload)
    SVC->>JOB: "INSERT status=running payload={filter,cutoff,doomedCount}"
    MB->>WRK: tasks.trigger('table-delete')
    WRK->>SVC: selectRowIdPage + deletePageByIds keyset loop
    WRK->>SVC: updateJobProgress / markJobReady
    SVC->>JOB: "UPDATE status=ready"
    WRK->>SSE: "appendTableEvent status=ready"

    Note over MB,SSE: get_job polling
    MB->>SVC: getTableById(tableId)
    SVC->>JOB: SELECT latest non-export job
    JOB-->>SVC: status rowsProcessed error
    SVC-->>MB: jobStatus jobRowsProcessed
Loading

Reviews (1): Last reviewed commit: "improvement(mothership): table speed par..." | Re-trigger Greptile

Comment on lines +291 to 326
const tableMounts = await Promise.all(
inputTables.map(async (tableRef) => {
const tableId =
typeof tableRef === 'string'
? tableRef
: tableRef && typeof tableRef === 'object'
? (tableRef as CanonicalTableInput).tableId || (tableRef as CanonicalTableInput).path
: undefined
if (!tableId) return null
const table = await resolveTableRef(tableId, tablePathLookup)
if (!table || table.workspaceId !== workspaceId) {
throw new Error(
`Input table not found: "${tableId}". Pass the table id (tbl_...) from tables/{name}/meta.json, or a tables/{name}/meta.json path.`
)
}
const csvContent = await buildTableCsvForMount(table)
const sandboxPath =
typeof tableRef === 'object' && tableRef !== null
? (tableRef as CanonicalTableInput).sandboxPath
: undefined
if (!tableId) continue
const table = await resolveTableRef(tableId, tablePathLookup)
if (!table || table.workspaceId !== workspaceId) {
throw new Error(
`Input table not found: "${tableId}". Pass the table id (tbl_...) from tables/{name}/meta.json, or a tables/{name}/meta.json path.`
)
}
const rows = await queryRows(table, {}, 'copilot-fn-exec')

const allKeys = new Set(table.schema.columns.map((column) => column.name))
for (const row of rows.rows ?? []) {
if (row.data && typeof row.data === 'object') {
for (const key of Object.keys(row.data as Record<string, unknown>)) {
allKeys.add(key)
}
return {
path: sandboxPath || `/home/user/tables/${table.id}.csv`,
content: csvContent,
}
}
const headers = Array.from(allKeys)
const csvLines = [headers.join(',')]
for (const row of rows.rows ?? []) {
const data = (row.data || {}) as Record<string, unknown>
csvLines.push(
headers
.map((h) => {
const val = data[h]
const str = val === null || val === undefined ? '' : String(val)
return str.includes(',') || str.includes('"') || str.includes('\n')
? `"${str.replace(/"/g, '""')}"`
: str
})
.join(',')
})
)
for (const mount of tableMounts) {
if (!mount) continue
if (totalSize + mount.content.length > MAX_TOTAL_SIZE) {
throw new Error(
`Mounting table "${mount.path}" would exceed the ${MAX_TOTAL_SIZE / 1024 / 1024}MB total mount limit. Mount fewer or smaller tables.`
)
}
const csvContent = csvLines.join('\n')
const sandboxPath =
typeof tableRef === 'object' && tableRef !== null
? (tableRef as CanonicalTableInput).sandboxPath
: undefined
sandboxFiles.push({
path: sandboxPath || `/home/user/tables/${table.id}.csv`,
content: csvContent,
})
totalSize += mount.content.length
sandboxFiles.push(mount)
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Full table CSVs built before budget enforcement

All input tables are serialized in parallel via Promise.all before any size check runs. For each table the buildTableCsvForMount loop fetches every row page-by-page and accumulates the entire CSV in memory — no size gate inside the loop. The outer budget check (totalSize + mount.content.length > MAX_TOTAL_SIZE) only fires after every table has been fully loaded.

Concrete failure: a user mounts two enterprise-plan tables each at ~40MB of rows. Both CSVs are built (80MB in memory) before the loop sees that the second one exceeds the 50MB cap and throws. File and directory mounts check record.size before fetching; tables lack that pre-flight guard. Under traffic, several concurrent requests doing this can exhaust the web container's heap.

Comment on lines +785 to +818
if (args.limit === undefined) {
const { totalCount } = await queryRows(
table,
{ filter: idFilter, limit: 1, withExecutions: false },
requestId
)
const matchCount = totalCount ?? 0
if (matchCount > TABLE_LIMITS.MAX_BULK_OPERATION_SIZE) {
const doomedCount = Math.min(matchCount, table.rowCount)
const cutoff = new Date()
const jobId = generateId()
const payload: TableDeleteJobPayload = {
filter: idFilter,
cutoff: cutoff.toISOString(),
doomedCount,
}
assertNotAborted()
const claimed = await markTableJobRunning(table.id, jobId, 'delete', payload)
if (!claimed) {
return { success: false, message: 'A job is already in progress for this table' }
}
await dispatchDeleteJob({
jobId,
tableId: table.id,
workspaceId,
filter: idFilter,
cutoff,
})
return {
success: true,
message: `Started background delete of ${doomedCount} matching rows (job ${jobId}). The rows are hidden from reads immediately; call get_job to track progress.`,
data: { jobId, doomedCount },
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 doomedCount can overestimate when rows are inserted between count and dispatch

matchCount comes from queryRows(..., { limit: 1, withExecutions: false }) and reflects the delete-mask-filtered row count. Between that count query and markTableJobRunning, rows inserted after the cutoff timestamp could match the filter, making matchCount stale. The cutoff passed to the background worker correctly caps new-row inclusion, but doomedCount stored in the job payload — and surfaced in the get_job response message — would over-report the number of rows being removed. Not a correctness issue for the worker, but the user-visible estimate is an upper bound, not an exact figure.

@greptile-apps

greptile-apps Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR closes three performance gaps in Mothership's table operations: sandbox table mounts now drain all rows via keyset pagination instead of silently truncating to 100, query_rows gains a 1000-row limit cap and keyset cursor support, and large CSV/TSV imports plus unbounded filter-deletes are handed off to the existing background-job infrastructure (with a new get_job poll operation). The underlying data model also migrates: per-import state columns on user_table_definitions are replaced by a new table_jobs table with a partial-unique index enforcing one running write-job per table.

  • function-execute mounts: buildTableCsvForMount pages through selectExportRowPage in 5k-row chunks, remaps column-id keys to display names, and counts toward the 50 MB budget — fixing empty-column mounts on id-keyed tables and the silent 100-row truncation.
  • query_rows contract parity: limit now rejected above 1000, withExecutions: false, and keyset after/nextCursor accepted (rejected alongside sort); inline imports/deletes claim the one-write-job slot to block interleaving.
  • Background import/delete in Mothership: CSV/TSV ≥ 8 MB dispatches tableImportTask; unbounded filter-deletes above 1000 rows dispatch tableDeleteTask with a pendingDeleteMask that hides doomed rows immediately from all read paths.

Confidence Score: 3/5

Safe to merge for most workloads; enterprise tables with hundreds of thousands of rows could exhaust Vercel function memory during a sandbox mount before the 50 MB budget check fires.

The core job-slot migration, keyset pagination, background import/delete dispatch, and schema change are all well-structured and tested. The one concern worth resolving before a wide enterprise rollout is buildTableCsvForMount: all table CSVs are fully assembled in parallel memory before the budget check, so a user who mounts a 500k-row enterprise table gets a function OOM rather than a clean rejection.

apps/sim/lib/copilot/tools/handlers/function-execute.ts — the buildTableCsvForMount loop and the parallel Promise.all mount assembly need a per-page budget check to avoid OOM on large enterprise tables.

Important Files Changed

Filename Overview
apps/sim/lib/copilot/tools/handlers/function-execute.ts Replaces the truncated 100-row queryRows mount with a keyset-draining buildTableCsvForMount that reads all rows and remaps id-keyed cells back to display names. Has a memory safety issue: the full CSV is materialized in memory before the 50 MB budget check runs, and pendingDeleteMask is called once per page (N+1 pattern).
apps/sim/lib/table/service.ts Major overhaul: migrates import-status fields from user_table_definitions to table_jobs, adds selectExportRowPage (position-keyset export drain), pendingDeleteMask for delete-job visibility, selectRowIdPage/deletePageByIds for async delete workers, and latestJobForTable/latestJobsForTables for the new job fields. pendingDeleteMask now runs on every queryRows call regardless of table state.
apps/sim/lib/copilot/tools/server/table/user-table.ts Adds query_rows limit clamping, keyset cursor (after/nextCursor), background delete dispatch for unbounded deletes, background import dispatch for large CSV/TSV files, get_job polling operation, and job-slot claim/release for inline imports. Logic is well-structured with proper guards and rollback paths.
apps/sim/lib/table/import-runner.ts Renames import-specific service calls to generic job variants (markImportFailed→markJobFailed, etc.), adds deleteSourceFile flag (defaults true for UI single-use uploads; pass false for Mothership persistent workspace files), and updates SSE events to the new job event kind.
packages/db/migrations/0233_table_jobs_and_keyset.sql Creates table_jobs with partial unique index (one running write-job per table), migrates existing import_status rows idempotently via ON CONFLICT DO NOTHING, tunes autovacuum on user_table_rows, and adds two CONCURRENT indexes (table_id+id keyset, tenant-scoped btree_gin) while dropping the old cross-tenant GIN index.
apps/sim/lib/table/delete-runner.ts New async delete worker: walks rows in keyset pages (selectRowIdPage → deletePageByIds), checks job ownership on each page, applies cutoff/filter/exclusion scope, emits SSE events, and marks the job terminal or failed.
packages/db/schema.ts Adds tableJobs table with unique partial index, replaces the old cross-tenant GIN index on user_table_rows with a tenant-scoped btree_gin index, adds table_id+id keyset index, and removes import_* columns from userTableDefinitions.
apps/sim/lib/table/dispatcher.ts Extends DispatchScope with filter/excludeRowIds for select-all-under-filter and select-all-minus-deselections, wraps the dispatcherStep page query in withSeqscanOff for jsonb-filtered scopes, and adds scope-aware markActiveDispatchesCancelled with JSONB scope comparison.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Mothership: import_file / create_from_file] --> B{CSV/TSV 8 MB?}
    B -- No --> C[markTableJobRunning claim slot]
    C --> D[Inline import parseFileRows + batchInsert]
    D --> E[releaseJobClaim]
    B -- Yes --> F[createTable with jobStatus=running or markTableJobRunning]
    F --> G{isTriggerDevEnabled?}
    G -- Yes --> H[tasks.trigger tableImportTask]
    G -- No --> I[runDetached runTableImport]
    H -- dispatch error --> J[releaseJobClaim / deleteTable]
    I --> K[runTableImport streaming keyset import]
    H --> K

    L[Mothership: delete_rows_by_filter] --> M{limit === undefined?}
    M -- Yes --> N[queryRows count with pendingDeleteMask]
    N --> O{matchCount > 1000?}
    O -- No --> P[deleteRowsByFilter inline]
    O -- Yes --> Q[markTableJobRunning claim]
    Q --> R[dispatchDeleteJob]
    R --> S[runTableDelete keyset page walk]
    M -- No --> P

    T[Mothership: query_rows] --> U{limit > 1000?}
    U -- Yes --> V[return error]
    U -- No --> W[queryRows with pendingDeleteMask + after cursor]
    W --> X[return rows + nextCursor]

    Y[Mothership: get_job] --> Z[getTableById latestJobForTable]
    Z --> AA[return job status / rowCount]
Loading

Reviews (2): Last reviewed commit: "improvement(mothership): table speed par..." | Re-trigger Greptile

Comment on lines +73 to 108
* would corrupt values.
*/
function formatMountCsvValue(value: unknown): string {
if (value === null || value === undefined) return ''
if (value instanceof Date) return value.toISOString()
if (typeof value === 'object') return JSON.stringify(value)
return String(value)
}

/**
* Serializes a full table to CSV for a sandbox mount. Walks the keyset export
* reader page by page so every row is included (`queryRows` with defaults
* silently truncated mounts to its 100-row page and paid for a count and
* execution metadata the CSV never used), and remaps stored column-id keys
* back to display names so headers line up with cell values.
*/
async function buildTableCsvForMount(table: TableDefinition): Promise<string> {
const nameById = buildNameById(table.schema)
const headers = table.schema.columns.map((c) => c.name)
const lines = [toCsvRow(headers)]
let after: { position: number; id: string } | null = null
while (true) {
const page = await selectExportRowPage(table, after, TABLE_MOUNT_PAGE_SIZE)
for (const row of page) {
const data = rowDataIdToName(row.data, nameById)
lines.push(toCsvRow(headers.map((header) => formatMountCsvValue(data[header]))))
}
if (page.length < TABLE_MOUNT_PAGE_SIZE) return lines.join('\n')
const last = page[page.length - 1]
after = { position: last.position, id: last.id }
}
}

async function resolveInputFiles(
workspaceId: string,
inputFiles?: unknown[],

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Full CSV materialized in memory before the 50 MB budget check

buildTableCsvForMount drains every row into a lines[] array and only returns after the full table is assembled. The budget check in the caller happens after all table CSVs are built via Promise.all. For an enterprise table with 100k+ rows, the CSV string can be hundreds of megabytes or more before the check ever fires — the budget guard prevents the file from reaching the sandbox, but the memory damage is already done (and in the parallel case, all tables are materialized simultaneously).

The old queryRows path implicitly bounded this to 100 rows. A straightforward fix is to track a running byte count inside the drain loop and throw early once the caller's budget would be exceeded — or pass maxBytes as a parameter and abort the page walk.

Comment on lines +73 to 108
* would corrupt values.
*/
function formatMountCsvValue(value: unknown): string {
if (value === null || value === undefined) return ''
if (value instanceof Date) return value.toISOString()
if (typeof value === 'object') return JSON.stringify(value)
return String(value)
}

/**
* Serializes a full table to CSV for a sandbox mount. Walks the keyset export
* reader page by page so every row is included (`queryRows` with defaults
* silently truncated mounts to its 100-row page and paid for a count and
* execution metadata the CSV never used), and remaps stored column-id keys
* back to display names so headers line up with cell values.
*/
async function buildTableCsvForMount(table: TableDefinition): Promise<string> {
const nameById = buildNameById(table.schema)
const headers = table.schema.columns.map((c) => c.name)
const lines = [toCsvRow(headers)]
let after: { position: number; id: string } | null = null
while (true) {
const page = await selectExportRowPage(table, after, TABLE_MOUNT_PAGE_SIZE)
for (const row of page) {
const data = rowDataIdToName(row.data, nameById)
lines.push(toCsvRow(headers.map((header) => formatMountCsvValue(data[header]))))
}
if (page.length < TABLE_MOUNT_PAGE_SIZE) return lines.join('\n')
const last = page[page.length - 1]
after = { position: last.position, id: last.id }
}
}

async function resolveInputFiles(
workspaceId: string,
inputFiles?: unknown[],

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 pendingDeleteMask fetched on every page — N+1 DB round-trips during drain

selectExportRowPage calls pendingDeleteMask(table) unconditionally, so a 100k-row table drained in 20 pages makes 20 extra DB round-trips just to check whether a delete job is running. The mask is derived from table.id which is constant, and the delete job status won't change meaningfully mid-drain. Fetching it once before the loop and threading it through would eliminate the repeated queries.

Comment on lines 2857 to +2870
* @param requestId - Request ID for logging
* @returns Query result with rows and pagination info
*/
/**
* Visibility mask for a running delete job: returns a clause keeping only rows the job will NOT
* delete, or `undefined` when no delete job is running. The job's persisted scope
* ({@link TableDeleteJobPayload}) defines the doomed set — `matches(filter) AND created_at <=
* cutoff AND id NOT IN excludeRowIds` — exactly what the worker's `selectRowIdPage` selects, so
* mid-job reads (refresh, other clients, exports) are consistent with the eventual result. The
* mask lifts automatically when the job leaves `running` (done, failed, or canceled).
*
* `(doomed) IS NOT TRUE` rather than `NOT (doomed)`: JSONB predicates evaluate to NULL on missing
* cells, and those rows are NOT selected for deletion (NULL ≠ TRUE) — they must stay visible.
*/

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 pendingDeleteMask adds an extra DB query to every queryRows call

pendingDeleteMask is now called at the top of queryRows on every invocation — adding a round-trip to every read path (grid page loads, Mothership query_rows, count refreshes, etc.). The vast majority of tables will never have a running delete job, so this query returns nothing almost all the time. A lightweight fast-path could skip the query if the table's derived jobStatus (already available on TableDefinition) isn't 'running'/'delete', or the mask could be pre-fetched where the caller already holds a recent TableDefinition.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants