cypher_with: add ORDER BY to non-deterministic RETURN queries#2436
Conversation
4e14eae to
9cb8621
Compare
9cb8621 to
05d5c71
Compare
Several cypher_with regression queries RETURN multiple rows without an ORDER BY, so their row order depends on heap/scan order and can vary between runs, build types, and platforms. Add ORDER BY ASC to those queries so the expected output is stable. Ordering keys use id() (a single int64 that bypasses the locale-sensitive string comparison path and is reproducible from the test's deterministic setup order), or the projected path/scalar where that is what the query returns. Where the underlying vertex/edge was dropped by a WITH projection, its id is threaded through as an alias rather than reordering the projection. Full audit of cypher_with: all 23 multi-row result blocks were checked. After this change, every multi-row, non-EXPLAIN RETURN is deterministically ordered. The two remaining unordered multi-row blocks are left as-is: - "RETURN lbl" returns two identical "Person" rows, so order cannot drift; - the 13 EXPLAIN (VERBOSE, COSTS OFF) plan blocks emit a fixed serial plan (no parallel/gather nodes), so their row order is already deterministic. This is a test-only change (regress/sql/cypher_with.sql and regress/expected/cypher_with.out); no extension C code or SQL is modified. Row counts are unchanged (pure reordering). All 37 regression tests pass (installcheck) on PostgreSQL 18.3. Co-authored-by: GitHub Copilot <noreply@github.com> modified: regress/expected/cypher_with.out modified: regress/sql/cypher_with.sql
05d5c71 to
ad7b673
Compare
gregfelice
left a comment
There was a problem hiding this comment.
Reviewed the diff and reproduced locally.
Confirmed every modified block in cypher_with.out is a pure reordering — same row multiset and same output columns before and after. Threading id(e) AS eid through the WITH to get a stable ORDER BY key without altering the RETURN projection is a clean approach.
Reproduced the full suite green on a cassert build (PG18 --enable-cassert -O0, AGE compiled with -Werror), so the updated expected output regenerates exactly:
ok 15 - cypher_with 104 ms
...
# All 40 tests passed.
One non-blocking question on the LIMIT 5 block:
... WITH x LIMIT 5 RETURN x ORDER BY xThe LIMIT 5 is applied in the WITH, before the RETURN ... ORDER BY, so the new ORDER BY only sorts whichever 5 rows survive the (still unordered) LIMIT. If that source ever yields more than 5 candidate rows, the selection — and therefore the output — could still vary with heap order. The current dataset may keep it deterministic; an ORDER BY ... LIMIT inside the WITH would make it robust to future data changes. Happy to leave as-is if you've confirmed the source is ≤5 rows.
Otherwise a clean, well-audited determinism fix. +1.
Several cypher_with regression queries RETURN multiple rows without an ORDER BY, so their row order depends on heap/scan order and can vary between runs, build types, and platforms. Add ORDER BY ASC to those queries so the expected output is stable. Ordering keys use id() (a single int64 that bypasses the locale-sensitive string comparison path and is reproducible from the test's deterministic setup order), or the projected path/scalar where that is what the query returns. Where the underlying vertex/edge was dropped by a WITH projection, its id is threaded through as an alias rather than reordering the projection.
Full audit of cypher_with: all 23 multi-row result blocks were checked. After this change, every multi-row, non-EXPLAIN RETURN is deterministically ordered. The two remaining unordered multi-row blocks are left as-is:
This is a test-only change (regress/sql/cypher_with.sql and regress/expected/cypher_with.out); no extension C code or SQL is modified. Row counts are unchanged (pure reordering).
All 37 regression tests pass (installcheck) on PostgreSQL 18.3.
Co-authored-by: GitHub Copilot noreply@github.com
modified: regress/expected/cypher_with.out
modified: regress/sql/cypher_with.sql