
Fix single-node labeled pattern expressions not filtering by label (#2443) #2444

Open
gregfelice wants to merge 1 commit into apache:master from gregfelice:fix/single-node-label-pattern-filter

@gregfelice

Contributor

Problem

A single-node labeled pattern used as a boolean expression — e.g. WHERE (a:Person) or WHERE EXISTS((a:Person)) — is accepted but does not test the bound vertex's label. It evaluates as trivially true, so the predicate matches every row.

Repro (on master):

SELECT * FROM cypher('g', $$ CREATE (:Person {name:'Alice'}), (:Animal {name:'Rex'}) $$) AS (r agtype);

-- returns BOTH Alice and Rex; should return only Alice
SELECT * FROM cypher('g', $$ MATCH (a) WHERE (a:Person) RETURN a.name $$) AS (n agtype);

Root cause

A single-node pattern expression desugars to an EXISTS sub-pattern. make_path_join_quals() returns early for vertex-only patterns (list_length(entities) < 3), emitting no quals. With no edge to carry a correlation, the sub-pattern references nothing from the enclosing query, so the planner produces an uncorrelated one-time InitPlan that is true whenever any vertex of that label exists — independent of the outer row.

Relationship patterns are unaffected: the edge-driven join (start_id = a.id) correlates them to the outer variable, which is why WHERE (a)-[:R]->(b) works correctly.

EXPLAIN before the fix — uncorrelated InitPlan with a One-Time Filter:

InitPlan 1
  ->  Result
One-Time Filter: ((InitPlan 1).col1)::agtype

After the fix — correlated SubPlan referencing the outer a_1:

SubPlan 1
  ->  Result
        One-Time Filter: ((_extract_label_id(a_1.id))::integer = 3)

Fix

In make_path_join_quals(), for a vertex-only pattern whose vertex carries a non-default label and whose variable is declared in an ancestor parse state (a correlated reference), emit an explicit label-id filter. make_qual() builds a name-based id reference that resolves to the outer variable, so the filter both correlates the sub-pattern to that variable and enforces the label.

Freshly scanned, non-correlated vertices (no ancestor binding) are untouched, so MATCH (a:Person) and "does any X exist" EXISTS checks behave exactly as before.

Testing

Added regression coverage to pattern_expression against a graph containing a non-Person vertex:

  • WHERE (a:Person) → only the :Person vertices
  • WHERE NOT (a:Person) → only the non-Person vertex
  • WHERE EXISTS((a:Company)) → only the :Company vertex

All 41 regression tests pass (make installcheck).

Note

RETURN (a:Label) in a projection on an unlabeled bound variable still errors with "multiple labels for variable" — that is a separate, pre-existing guard, orthogonal to this fix, and is intentionally left unchanged.

Fixes #2443

…pache#2443)

A single-node labeled pattern used as a boolean expression -- e.g.
`WHERE (a:Person)`, `WHERE EXISTS((a:Person))` -- was accepted but did not
test the bound vertex's label. It desugars to an EXISTS sub-pattern, and
make_path_join_quals() returned early for vertex-only patterns
(list_length(entities) < 3), emitting no quals. With no edge to carry a
correlation, the sub-pattern referenced nothing from the enclosing query,
so the planner produced an uncorrelated one-time InitPlan that was trivially
true whenever any vertex of that label existed -- the predicate matched every
outer row.

Emit an explicit label-id filter for a vertex-only pattern whose vertex
carries a non-default label and whose variable is declared in an ancestor
parse state (i.e. a correlated reference). make_qual() builds a name-based id
reference that resolves to the outer variable, so the filter both correlates
the sub-pattern to that variable and enforces the label. Freshly scanned,
non-correlated vertices (no ancestor binding) are untouched, so plain
MATCH (a:Person) and "does any X exist" EXISTS checks are unaffected.

Add regression coverage in pattern_expression: WHERE (a:Person),
WHERE NOT (a:Person), and EXISTS((a:Company)) against a graph with a
non-Person vertex. All 41 regression tests pass.
@gregfelice force-pushed the fix/single-node-label-pattern-filter branch from 464ed50 to 9c2c441 on June 22, 2026 21:19
@gregfelice

Contributor Author

Ready for review.

  • Fix verified in a clean PG18 build: WHERE (a:Person) / EXISTS((a:Person)) now filter correctly (correlated SubPlan on _extract_label_id(a.id) = <label_id>) instead of producing a trivially-true uncorrelated one-time InitPlan.
  • Full regression suite green (41/41), regression.diffs empty, including the new pattern_expression cases.
  • Scope kept tight: only correlated bound vertices with a non-default label are affected; plain MATCH (a:Person) and "does any X exist" EXISTS checks are unchanged.
  • The pre-existing single-node-pattern NOTE in pattern_expression.sql was updated to reflect the corrected behavior (and the orthogonal, still-present "multiple labels" projection limitation).

Fixes #2443.

Development

Successfully merging this pull request may close these issues.

Single-node labeled pattern (a:Label) does not filter by label (vertex-only EXISTS skips label quals)
