sql: collapse FROM-less correlated existence subqueries to a filter#37442

Draft
antiguru wants to merge 1 commit into
MaterializeInc:mainfrom
antiguru:hir-subquery-simplify-2613-2969

@antiguru antiguru commented Jul 4, 2026

Member

A FROM-less correlated existence subquery is a pure predicate on the outer row.
Decorrelation lowers it into a semijoin or antijoin that the MIR transforms do not collapse back to a filter.
This leaves avoidable joins in the plan for the shapes reported in database-issues#2613 (`1 IN (SELECT 1 WHERE p)`) and database-issues#2969 (`NOT EXISTS (SELECT 1 WHERE p)`).

Add an HIR simplification pass, `simplify_from_less_existence_subqueries`, that runs after `try_simplify_quantified_comparisons` (which already normalizes both `IN` and `EXISTS` shapes to an `Exists` node).
When the `Exists` body is a FROM-less correlated chain of `Map`/`Project`/`Filter` over a single-row constant, it rewrites `EXISTS(chain, preds)` to `(preds) IS TRUE`, inlining inner column references and shifting outer ones down one level as the predicate leaves the subquery.
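The rewrite described above can be sketched on a toy expression tree. This is a minimal illustration under stated assumptions, not the pass itself: the `Col`, `Lit`, `Call`, `Constant`, `Get`, `Map`, `Project`, and `Filter` classes and the `lift`, `walk`, and `collapse_exists` helpers are all hypothetical stand-ins for the real HIR types, and the sketch does not check that the body is actually correlated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Col:
    level: int  # 0 = column produced inside the subquery, 1+ = enclosing scopes
    index: int

@dataclass(frozen=True)
class Lit:
    value: object

@dataclass(frozen=True)
class Call:
    func: str
    args: tuple

@dataclass(frozen=True)
class Constant:
    """Stand-in for the single-row, zero-column input of a FROM-less SELECT."""

@dataclass(frozen=True)
class Get:
    name: str  # a real relation, i.e. the subquery has a FROM clause

@dataclass(frozen=True)
class Map:
    input: object
    scalars: tuple

@dataclass(frozen=True)
class Project:
    input: object
    outputs: tuple

@dataclass(frozen=True)
class Filter:
    input: object
    predicates: tuple


def lift(expr, cols):
    """Make expr valid outside the subquery: inline inner columns, shift outer ones down."""
    if isinstance(expr, Col):
        if expr.level == 0:
            return cols[expr.index]  # already lifted, so no second shift
        return Col(expr.level - 1, expr.index)
    if isinstance(expr, Call):
        return Call(expr.func, tuple(lift(a, cols) for a in expr.args))
    return expr


def walk(rel):
    """Return (lifted column definitions, lifted predicates), or None if the guard fails."""
    if isinstance(rel, Constant):
        return [], []
    if not isinstance(rel, (Map, Project, Filter)):
        return None  # e.g. Get: a genuine semijoin/antijoin, leave it alone
    inner = walk(rel.input)
    if inner is None:
        return None
    cols, preds = inner
    if isinstance(rel, Map):
        for scalar in rel.scalars:
            cols = cols + [lift(scalar, cols)]
        return cols, preds
    if isinstance(rel, Project):
        return [cols[i] for i in rel.outputs], preds
    return cols, preds + [lift(p, cols) for p in rel.predicates]


def collapse_exists(body):
    """Replacement for EXISTS(body) as a scalar predicate, or None to keep the subquery."""
    walked = walk(body)
    if walked is None:
        return None
    _, preds = walked
    conj = Lit(True)
    if preds:
        conj = preds[0]
        for p in preds[1:]:
            conj = Call("and", (conj, p))
    return Call("is_true", (conj,))


# EXISTS (SELECT 1 WHERE outer.a > 5): the outer column is at level 1 inside the body.
body = Filter(Map(Constant(), (Lit(1),)), (Call("gt", (Col(1, 0), Lit(5))),))
print(collapse_exists(body))
```

In this sketch the outer reference `Col(1, 0)` comes out as `Col(0, 0)`, because the predicate now lives in the outer scope, and any body that bottoms out in a `Get` is returned untouched.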

The `IS TRUE` wrapper is load-bearing for null safety.
`NOT EXISTS` keeps the outer row when the subquery is empty, which is when the predicate is FALSE or NULL.
`NOT ((p) IS TRUE)` is equivalent to `p IS NOT TRUE`, which is true in both cases and so matches `NOT EXISTS`.
A plain `NOT p` would evaluate to NULL for a NULL predicate and drop that row.
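The null-safety argument can be checked with a small truth table. The sketch below models SQL's three-valued booleans with Python's `True`, `False`, and `None`; all function names are illustrative and not part of the PR.

```python
from typing import Optional

# SQL three-valued boolean: True, False, or None standing in for NULL.
Tri = Optional[bool]


def not_exists_select_where(p: Tri) -> bool:
    """NOT EXISTS (SELECT 1 WHERE p): the FROM-less subquery yields a row
    only when p is TRUE, so it is empty for both FALSE and NULL."""
    subquery_has_row = p is True
    return not subquery_has_row


def not_is_true(p: Tri) -> bool:
    """NOT ((p) IS TRUE), i.e. p IS NOT TRUE. IS TRUE never returns NULL."""
    return not (p is True)


def plain_not(p: Tri) -> Tri:
    """NOT p under SQL semantics: NULL stays NULL."""
    return None if p is None else (not p)


def filter_keeps_row(cond: Tri) -> bool:
    """A WHERE clause keeps a row only when its condition is TRUE."""
    return cond is True


# Columns: p, NOT EXISTS keeps row, IS TRUE rewrite keeps row, plain NOT keeps row.
for p in (True, False, None):
    print(
        p,
        not_exists_select_where(p),
        filter_keeps_row(not_is_true(p)),
        filter_keeps_row(plain_not(p)),
    )
```

The first two result columns agree on every input, while the plain `NOT p` column differs on the NULL row, which is the case the explicit NULL-row test targets.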

The guard fires only on the FROM-less correlated pure-existence shape, so subqueries with a FROM clause remain genuine anti/semi-joins.

The pass is gated behind the `enable_simplify_from_less_existence` feature flag, default off in production and on in CI and tests.

Tests: subquery.slt covers both issues with EXPLAIN before/after in the flag-off and flag-on states, plus an explicit NULL-row test for the antijoin rewrite.

Fixes database-issues#2613

Motivation

Fixes a plan-quality gap: FROM-less correlated existence subqueries leave avoidable semi/antijoins.

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog entry.

@antiguru antiguru force-pushed the hir-subquery-simplify-2613-2969 branch from 0ef6d98 to 75f4564 Compare July 4, 2026 12:15
A FROM-less correlated existence subquery is a pure predicate on the
outer row, but decorrelation lowers it into a semijoin or antijoin that
the MIR transforms do not collapse back to a filter. This leaves
avoidable joins in the plan for the shapes reported in
database-issues#2613 (`1 IN (SELECT 1 WHERE p)`) and
database-issues#2969 (`NOT EXISTS (SELECT 1 WHERE p)`).

Add an HIR simplification pass, `simplify_from_less_existence_subqueries`,
that runs after `try_simplify_quantified_comparisons` (which already
normalizes both `IN` and `EXISTS` shapes to an `Exists` node). When the
`Exists` body is a FROM-less correlated chain of `Map`/`Project`/`Filter`
over a single-row constant, it rewrites `EXISTS(chain, preds)` to
`(preds) IS TRUE`, inlining inner column references and shifting outer
ones down one level as the predicate leaves the subquery.

The `IS TRUE` wrapper is load-bearing for null safety. `NOT EXISTS`
keeps the outer row when the subquery is empty, which is when the
predicate is FALSE or NULL. `NOT ((p) IS TRUE)` is equivalent to
`p IS NOT TRUE`, which is true in both cases and so matches
`NOT EXISTS`. A plain `NOT p` would evaluate to NULL for a NULL
predicate and drop that row.

The pass is gated behind the `enable_simplify_from_less_existence`
feature flag, default off in production and on in CI and tests. The
guard fires only on the FROM-less correlated pure-existence shape, so
subqueries with a FROM clause remain genuine anti/semi-joins.

Tests: subquery.slt covers both issues with EXPLAIN before/after in the
flag-off and flag-on states, plus an explicit NULL-row test for the
antijoin rewrite. not-null-propagation.slt is unchanged with the flag
defaulting off.

Fixes database-issues#2613

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@antiguru antiguru force-pushed the hir-subquery-simplify-2613-2969 branch from 75f4564 to 6df045a Compare July 4, 2026 12:38

1 participant