Skip to content

Add reduce() list folding function#2435

Open
jrgemignani wants to merge 1 commit into
apache:masterfrom
jrgemignani:add-reduce-list-folding-function
Open

Add reduce() list folding function#2435
jrgemignani wants to merge 1 commit into
apache:masterfrom
jrgemignani:add-reduce-list-folding-function

Conversation

@jrgemignani

Copy link
Copy Markdown
Contributor

Implement the openCypher reduce(acc = init, var IN list | body) expression, which folds an arbitrary expression over a list, threading an accumulator across the elements in list order. This closes a long-standing gap (reduce() was previously unsupported) and works both at the SQL top level and inside a cypher() RETURN/WHERE.

Implementation

reduce() is desugared, at transform time, into a correlated scalar subquery over a new ordered aggregate rather than a new executor node, so no PostgreSQL core changes are required:

CASE WHEN list IS NULL THEN NULL
     ELSE COALESCE((SELECT ag_catalog.age_reduce(
                        <init>, '<serialized-body>'::text,
                        r.elem ORDER BY r.ord)
                    FROM unnest(<list>) WITH ORDINALITY AS r(elem, ord)),
                   <init>)
END
  • A new cypher_reduce extensible node carries the accumulator/element names and the init/list/body expressions (grammar production, keyword, and the copy/out/read serialization plumbing).
  • The fold body is transformed against a throwaway two-column agtype namespace, its accumulator and element Var references are rewritten to PARAM_EXEC params 0 and 1, and it is serialized with nodeToString() into a text argument.
  • age_reduce_transfn (a custom agtype aggregate transition function) deserializes and compiles the body once per group with ExecInitExpr, then evaluates it per element with ExecEvalExpr, rebinding the two params. The body is normalized to agtype at transform time so a boolean or other non-agtype result cannot be misread as a by-reference Datum.

Semantics

  • List order is preserved (unnest WITH ORDINALITY + aggregate ORDER BY).
  • An empty list yields the initial value; a NULL list yields NULL.
  • The list and initial value may reference outer-query variables (e.g. reduce(total = 0, n IN nodes(p) | total + n.age)); the body may reference only the accumulator and element.
  • Arithmetic, string, list-building, boolean/comparison (AND/OR/=/>), CASE, and element property-access bodies are all supported.
  • Outer-variable, query-parameter, nested-reduce, and aggregate references inside the body raise a clean ERRCODE_FEATURE_NOT_SUPPORTED error.
  • reduce is registered as a safe keyword so it remains usable as a property or map key, preserving backward compatibility.

Tests

Adds the age_reduce regression test (registered in the install SQL and the upgrade template so age_upgrade passes), covering: arithmetic/product/string folds; order sensitivity; empty/NULL list; NULL element and NULL init; list-building and CASE bodies; boolean and comparison bodies; element property access; multiple and nested (in list/init) reduce(); reduce() in boolean expressions, WHERE, and list comprehensions; folds over collected nodes and node list properties; the not-supported rejections; and reduce as a map key. All multi-row results are ordered. 38/38 installcheck pass.

Future work

The body restriction (accumulator and element only) is a property of the standalone expression evaluation and can be relaxed without core changes:

  • Allow loop-invariant outer-variable and cypher $parameter references in the body by capturing them as additional eager aggregate arguments bound to extra param slots.
  • Support a nested reduce() inside the body via an SPI-based evaluation fallback for subquery-bearing bodies. Aggregates inside the body remain intentionally unsupported, matching the openCypher specification.

Co-authored-by: Copilot copilot@github.com

modified: Makefile
modified: age--1.7.0--y.y.y.sql
new file: regress/expected/age_reduce.out
new file: regress/sql/age_reduce.sql
modified: sql/age_aggregate.sql
modified: src/backend/nodes/ag_nodes.c
modified: src/backend/nodes/cypher_copyfuncs.c
modified: src/backend/nodes/cypher_outfuncs.c
modified: src/backend/nodes/cypher_readfuncs.c
modified: src/backend/parser/cypher_analyze.c
modified: src/backend/parser/cypher_clause.c
modified: src/backend/parser/cypher_gram.y
modified: src/backend/utils/adt/agtype.c
modified: src/include/nodes/ag_nodes.h
modified: src/include/nodes/cypher_copyfuncs.h
modified: src/include/nodes/cypher_nodes.h
modified: src/include/nodes/cypher_outfuncs.h
modified: src/include/nodes/cypher_readfuncs.h
modified: src/include/parser/cypher_kwlist.h

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds openCypher reduce(acc = init, var IN list | body) support by desugaring the expression at transform time into a correlated scalar subquery that folds over unnest(list) WITH ORDINALITY using a new ordered aggregate (ag_catalog.age_reduce) and a serialized standalone fold-body expression evaluated inside the aggregate transition function.

Changes:

  • Adds a new cypher_reduce ExtensibleNode plus grammar/keyword support for parsing reduce(...).
  • Implements transform_cypher_reduce() to rewrite reduce() into an ordered aggregate over unnest(... ) WITH ORDINALITY, serializing the fold body with accumulator/element rewritten to PARAM_EXEC slots.
  • Adds ag_catalog.age_reduce + age_reduce_transfn (C) and a comprehensive regression test suite (age_reduce).

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
Makefile Registers age_reduce in the regression test list.
age--1.7.0--y.y.y.sql Adds upgrade-time creation of age_reduce_transfn and age_reduce aggregate.
regress/expected/age_reduce.out Expected output for the new reduce() regression coverage.
regress/sql/age_reduce.sql New regression tests covering semantics, order, null/empty behavior, and unsupported constructs.
sql/age_aggregate.sql Defines age_reduce_transfn and age_reduce aggregate in install SQL.
src/backend/nodes/ag_nodes.c Registers cypher_reduce in node name/method tables.
src/backend/nodes/cypher_copyfuncs.c Adds copy support for cypher_reduce.
src/backend/nodes/cypher_outfuncs.c Adds serialization support for cypher_reduce.
src/backend/nodes/cypher_readfuncs.c Adds deserialization support for cypher_reduce.
src/backend/parser/cypher_analyze.c Extends raw-expression walker to traverse cypher_reduce fields.
src/backend/parser/cypher_clause.c Implements transform_cypher_reduce() desugaring and fold-body param rewrite/validation.
src/backend/parser/cypher_gram.y Adds reduce(...) grammar production and marks REDUCE as safe keyword for map keys.
src/backend/utils/adt/agtype.c Implements age_reduce_transfn (aggregate transition fn) that deserializes/compiles and evaluates the fold body.
src/include/nodes/ag_nodes.h Adds cypher_reduce_t tag.
src/include/nodes/cypher_copyfuncs.h Declares copy_cypher_reduce.
src/include/nodes/cypher_nodes.h Defines the cypher_reduce ExtensibleNode struct.
src/include/nodes/cypher_outfuncs.h Declares out_cypher_reduce.
src/include/nodes/cypher_readfuncs.h Declares read_cypher_reduce.
src/include/parser/cypher_kwlist.h Adds reduce keyword token (REDUCE).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/backend/utils/adt/agtype.c
Comment thread src/backend/parser/cypher_clause.c
@jrgemignani jrgemignani force-pushed the add-reduce-list-folding-function branch 2 times, most recently from 36162be to 6c68874 Compare June 11, 2026 14:04
@jrgemignani jrgemignani force-pushed the add-reduce-list-folding-function branch from 6c68874 to 80841fd Compare June 12, 2026 16:15
Implement the openCypher reduce(acc = init, var IN list | body) expression,
which folds an arbitrary expression over a list, threading an accumulator
across the elements in list order. This closes a long-standing gap (reduce()
was previously unsupported) and works both at the SQL top level and inside a
cypher() RETURN/WHERE.

Implementation
--------------
reduce() is desugared, at transform time, into a correlated scalar subquery
over a new ordered aggregate rather than a new executor node, so no PostgreSQL
core changes are required:

    CASE WHEN list IS NULL THEN NULL
         ELSE COALESCE((SELECT ag_catalog.age_reduce(
                            <init>, '<serialized-body>'::text,
                            r.elem ORDER BY r.ord)
                        FROM unnest(<list>) WITH ORDINALITY AS r(elem, ord)),
                       <init>)
    END

  - A new cypher_reduce extensible node carries the accumulator/element names
    and the init/list/body expressions (grammar production, keyword, and the
    copy/out/read serialization plumbing).
  - The fold body is transformed against a throwaway two-column agtype
    namespace, its accumulator and element Var references are rewritten to
    PARAM_EXEC params 0 and 1, and it is serialized with nodeToString() into a
    text argument.
  - age_reduce_transfn (a custom agtype aggregate transition function)
    deserializes and compiles the body once per group with ExecInitExpr, then
    evaluates it per element with ExecEvalExpr, rebinding the two params. The
    body is normalized to agtype at transform time so a boolean or other
    non-agtype result cannot be misread as a by-reference Datum.

Semantics
---------
  - List order is preserved (unnest WITH ORDINALITY + aggregate ORDER BY).
  - An empty list yields the initial value; a NULL list yields NULL.
  - The list and initial value may reference outer-query variables (e.g.
    reduce(total = 0, n IN nodes(p) | total + n.age)); the body may reference
    only the accumulator and element.
  - Arithmetic, string, list-building, boolean/comparison (AND/OR/=/>), CASE,
    and element property-access bodies are all supported.
  - Outer-variable, query-parameter, nested-reduce, and aggregate references
    inside the body raise a clean ERRCODE_FEATURE_NOT_SUPPORTED error.
  - reduce is registered as a safe keyword so it remains usable as a property
    or map key, preserving backward compatibility.

Tests
-----
Adds the age_reduce regression test (registered in the install SQL and the
upgrade template so age_upgrade passes), covering: arithmetic/product/string
folds; order sensitivity; empty/NULL list; NULL element and NULL init;
list-building and CASE bodies; boolean and comparison bodies; element property
access; multiple and nested (in list/init) reduce(); reduce() in boolean
expressions, WHERE, and list comprehensions; folds over collected nodes and
node list properties; the not-supported rejections; and reduce as a map key.
All multi-row results are ordered. 38/38 installcheck pass.

Future work
-----------
The body restriction (accumulator and element only) is a property of the
standalone expression evaluation and can be relaxed without core changes:
  - Allow loop-invariant outer-variable and cypher $parameter references in
    the body by capturing them as additional eager aggregate arguments bound
    to extra param slots.
  - Support a nested reduce() inside the body via an SPI-based evaluation
    fallback for subquery-bearing bodies.
Aggregates inside the body remain intentionally unsupported, matching the
openCypher specification.

Co-authored-by: Copilot <copilot@github.com>

modified:   Makefile
modified:   age--1.7.0--y.y.y.sql
new file:   regress/expected/age_reduce.out
new file:   regress/sql/age_reduce.sql
modified:   sql/age_aggregate.sql
modified:   src/backend/nodes/ag_nodes.c
modified:   src/backend/nodes/cypher_copyfuncs.c
modified:   src/backend/nodes/cypher_outfuncs.c
modified:   src/backend/nodes/cypher_readfuncs.c
modified:   src/backend/parser/cypher_analyze.c
modified:   src/backend/parser/cypher_clause.c
modified:   src/backend/parser/cypher_gram.y
modified:   src/backend/utils/adt/agtype.c
modified:   src/include/nodes/ag_nodes.h
modified:   src/include/nodes/cypher_copyfuncs.h
modified:   src/include/nodes/cypher_nodes.h
modified:   src/include/nodes/cypher_outfuncs.h
modified:   src/include/nodes/cypher_readfuncs.h
modified:   src/include/parser/cypher_kwlist.h

Resolved Conflicts: age--1.7.0--y.y.y.sql
@jrgemignani jrgemignani force-pushed the add-reduce-list-folding-function branch from 80841fd to 56fb282 Compare June 21, 2026 04:23
@jrgemignani jrgemignani requested a review from gregfelice June 21, 2026 04:30
@gregfelice

Copy link
Copy Markdown
Contributor

Reviewed the full diff and the changed files in context, with two deep passes on the executor/planner mechanics. I independently verified the two subtle correctness traps this kind of desugaring is prone to — the COALESCE-NULL ambiguity and PARAM_EXEC isolation — and both are genuinely handled. Solid work; no blockers or majors. (Source-level review; I could not run make here — the 38/38 installcheck is the author's.)

Verified correct

  • The empty-list vs. fold-to-null ambiguity is correctly resolved — the trap I most expected here. age_reduce_transfn never returns SQL NULL: a null fold result is normalized to an agtype 'null' varlena (agtype.c:11730-11733), and the accumulator is seeded the same way. So the aggregate returns SQL NULL only over zero rows. The desugaring CASE WHEN list IS NULL THEN NULL ELSE COALESCE(<agg>, init) END therefore reads correctly: empty list -> SQL NULL -> init; a legitimate fold-to-null -> agtype 'null' (not SQL NULL) -> passes through COALESCE. The [1, null, 3] -> null test (age_reduce.sql:186) exercises exactly this. The classic bug is not present.
  • By-value Datum safety holds. The body is normalized to agtype at transform time, and the transfn independently re-checks exprType(body_node) == AGTYPEOID at runtime (agtype.c:11680) because age_reduce is SQL-callable, so datumCopy(result, false, -1) can never misread a by-value bool/int as a varlena. Good defense in depth.
  • Accumulator survives transitionsdatumCopy'd into aggcontext, not the per-tuple context; PG_ARGISNULL(0) is a reliable first-element signal precisely because the state is never SQL NULL.
  • PARAM_EXEC 0/1 cannot collide with planner params. The body is evaluated in a private standalone ExprContext whose ecxt_param_exec_vals is the transfn's own 2-element array; the outer planner never sees ids 0/1, and no node type that would need a parent (Aggref/SubLink/WindowFunc) can reach the body — all are rejected before serialization.
  • Nested-reduce / outer-var / param rejection is enforced at transform time (a nested reduce lowers to a SubLink, which reduce_body_check_walker rejects; the varlevelsup == 0 guard prevents an outer RTE sharing a varno from being rewritten into the accumulator param). The three rejection tests confirm it.
  • Init-evaluated-once is correct. Wrapping init in CASE WHEN reduce_ordinality = 1 THEN <init> ELSE NULL::agtype END is sound because the aggregate-level ORDER BY delivers ordinality 1 first and the transfn only reads init when the state is NULL; PARALLEL UNSAFE + no combinefn prevents any partial-aggregate reordering.
  • Serialization plumbing is complete and symmetriccypher_reduce is wired into the tag enum, node_names[]/node_methods[], all five fields in copy/out/read with matching headers, and the raw-expr walker in cypher_analyze.c descends all three expr fields. No field missed.
  • Backward compat preservedreduce is added to the safe_keywords grammar production (like any/none/order), and {reduce: 1} still works as a map key.

Test coverage gaps (cheap to close, no bug implied)

The mechanisms below are correct by inspection, but the suite doesn't lock in the exact semantics that make them correct:

  1. A fold that legitimately produces null mid-way and then recovers (e.g. a coalesce/CASE body that climbs back out of null). Only "null propagates to a null result" is tested; the recovery path through the agtype-'null' state is the load-bearing one and isn't directly exercised.
  2. Empty list with a NULL init (reduce(s = null, x IN [] | ...)) — the COALESCE(NULL, null) path is unverified.
  3. A type error / runtime failure in the body (e.g. a non-coercible or arithmetic-invalid body) — the coerce_to_common_type(..., "reduce") and runtime error paths are unexercised.

(Large-list / interrupt-path stress and deeper nesting are nice-to-haves; the SubLink check makes nesting depth irrelevant.)

Adding #1#3 would be a few lines each and would pin down precisely the behavior this implementation gets right. Nice piece of work overall — the agtype-'null' state trick and the standalone-ExprContext param isolation are the right calls.

@gregfelice

Copy link
Copy Markdown
Contributor

Design check before this lands — desugar-to-aggregate as the foundation for reduce()

Flagging the core design decision for lazy consensus, since it'll likely become the template for future folding/comprehension work. reduce() here is implemented entirely at transform time — desugared into a correlated scalar subquery over a new ordered aggregate (age_reduce), with the fold body serialized via nodeToString() and re-evaluated per element through a standalone ExecInitExpr/ExecEvalExpr bound to PARAM_EXEC 0/1 — so no PostgreSQL core changes are needed. List order comes from the aggregate's ORDER BY; init is evaluated once via CASE WHEN ordinality = 1.

I've reviewed it closely and it's correct (the empty-list-vs-fold-to-null ambiguity and the param-isolation traps are both handled) and green under cassert PG18. Two things worth a second committer's eyes before merge:

  1. Is desugar-to-aggregate the foundation we want for reduce() (and as a precedent for reduce-adjacent features), versus a dedicated executor node?
  2. The body-scope restriction (accumulator + element only; outer vars / $params / nested reduce / aggregates rejected with FEATURE_NOT_SUPPORTED) — acceptable as a first cut given the author's documented path to relax it?

No objection from me on either; raising for lazy consensus rather than landing a design-setting change on a lone review.

@jrgemignani

jrgemignani commented Jun 21, 2026

Copy link
Copy Markdown
Contributor Author

@gregfelice I will review this Monday to see what can be adjusted. BTW, it is -> my opinion <- that reduce is closer to an aggregate than to anything else.

Also, this is part 1 of 2, maybe 3 for reduce. Those features not supported are covered in the second PR. It was just going to be too big to create just one and review. This is also true for shortest path.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants