Skip to content

Add shortest_path / all_shortest_paths SRFs#2430

Merged
muhammadshoaib merged 1 commit into
apache:masterfrom
jrgemignani:shortest_path_1
Jun 21, 2026
Merged

Add shortest_path / all_shortest_paths SRFs#2430
muhammadshoaib merged 1 commit into
apache:masterfrom
jrgemignani:shortest_path_1

Conversation

@jrgemignani

Copy link
Copy Markdown
Contributor

Add two C set-returning functions that compute unweighted (hop-count) shortest paths over the cached global graph adjacency via BFS, callable both at the SQL top level and inside a cypher() RETURN:

  • age_shortest_path(...) -> the single shortest path (0 or 1 rows)
  • age_all_shortest_paths(...) -> every shortest path, one per row

The signature follows the natural Cypher argument order (graph, start, end, edge_types, direction, min_hops, max_hops), registered in sql/agtype_typecast.sql (install) and age--1.7.0--y.y.y.sql (upgrade). Unimplemented parameters fail loudly: multiple relationship types and a non-zero min_hops raise ERRCODE_FEATURE_NOT_SUPPORTED. A single edge type (string or one-element array) is honored, and a NULL endpoint yields no rows per Cypher null semantics (wrong-typed endpoints / NULL graph still error).

To call the SRFs inside a cypher() RETURN, transform_cypher_return now sets query->hasTargetSRFs (it was the only results-producing clause that didn't, so the planner never added a ProjectSet node), and transform_FuncCall auto-prepends the graph name for snake_case shortest_path / all_shortest_paths. camelCase names are reserved for the future native grammar.

Robustness:

  • BFS guards against non-existent endpoints (returns 0 rows instead of crashing) and honors CHECK_FOR_INTERRUPTS.
  • An unknown edge label now matches no edges instead of silently traversing all of them (get_label_relation returns InvalidOid).
  • manage_GRAPH_global_contexts no longer self-deadlocks: its process-local mutex is released via PG_TRY/PG_CATCH if the critical section errors.

Adds the age_shortest_path regression test (directed/undirected, label filtering, parallel edges, self-loops, max_hops, the not-supported stubs, NULL and non-existent endpoint/graph guards). 37/37 installcheck pass.

Co-authored-by: Claude noreply@anthropic.com

modified: Makefile
modified: age--1.7.0--y.y.y.sql
new file: regress/expected/age_shortest_path.out
new file: regress/sql/age_shortest_path.sql
modified: sql/agtype_typecast.sql
modified: src/backend/parser/cypher_clause.c
modified: src/backend/parser/cypher_expr.c
modified: src/backend/utils/adt/age_global_graph.c
modified: src/backend/utils/adt/age_vle.c

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@jrgemignani jrgemignani force-pushed the shortest_path_1 branch 4 times, most recently from 0e591ad to 06ccd1e Compare June 11, 2026 14:07
Add two C set-returning functions that compute unweighted (hop-count)
shortest paths over the cached global graph adjacency via BFS, callable
both at the SQL top level and inside a cypher() RETURN:

  - age_shortest_path(...)       -> the single shortest path (0 or 1 rows)
  - age_all_shortest_paths(...)  -> every shortest path, one per row

The signature follows the natural Cypher argument order
(graph, start, end, edge_types, direction, min_hops, max_hops), registered
in sql/agtype_typecast.sql (install) and age--1.7.0--y.y.y.sql (upgrade).
Unimplemented parameters fail loudly: multiple relationship types and a
non-zero min_hops raise ERRCODE_FEATURE_NOT_SUPPORTED. A single edge type
(string or one-element array) is honored, and a NULL endpoint yields no
rows per Cypher null semantics (wrong-typed endpoints / NULL graph still
error).

To call the SRFs inside a cypher() RETURN, transform_cypher_return now sets
query->hasTargetSRFs (it was the only results-producing clause that didn't,
so the planner never added a ProjectSet node), and transform_FuncCall
auto-prepends the graph name for snake_case shortest_path /
all_shortest_paths. camelCase names are reserved for the future native
grammar.

Robustness:
  - BFS guards against non-existent endpoints (returns 0 rows instead of
    crashing) and honors CHECK_FOR_INTERRUPTS.
  - An unknown edge label now matches no edges instead of silently
    traversing all of them (get_label_relation returns InvalidOid).

Adds the age_shortest_path regression test (directed/undirected, label
filtering, parallel edges, self-loops, max_hops, the not-supported stubs,
NULL and non-existent endpoint/graph guards).

38/38 installcheck pass.

Co-authored-by: Copilot <copilot@github.com>

modified:   Makefile
modified:   age--1.7.0--y.y.y.sql
modified:   sql/agtype_typecast.sql
modified:   src/backend/parser/cypher_clause.c
modified:   src/backend/parser/cypher_expr.c
modified:   src/backend/utils/adt/age_vle.c
new file:   regress/expected/age_shortest_path.out
new file:   regress/sql/age_shortest_path.sql
@gregfelice

Copy link
Copy Markdown
Contributor

Reviewed the full diff — traced the BFS/enumeration against the regression cases and confirmed the supporting helpers. This is solid, well-tested work; I'd merge it after a couple of minor cleanups. (I read the code carefully but did not rebuild locally — the 37/37 installcheck result is the author's.)

Strengths

  • Algorithm is correct. BFS pops in non-decreasing depth order, so single-path mode's break-on-first-pop yields the true shortest length, and all-paths mode correctly keeps expanding, prunes at du >= target_depth, and records every equal-depth predecessor. I checked that this does not double-count parallel edges or produce spurious back-edge predecessors in undirected mode. The "every emitted path is simple because BFS depth strictly increases" reasoning holds.
  • The non-existent-endpoint guard is a genuine crash fix (previously source == end on a missing vertex matched at depth 0 and dereferenced a missing vertex), and it has dedicated regression coverage.
  • Coverage is thorough: directed/undirected, label filtering, parallel edges, self-loops, max_hops, zero-length, unreachable, the not-supported stubs, NULL/non-existent/empty-graph guards, a 120-node large graph, and the cypher() RETURN integration path that exercises the hasTargetSRFs + transform_FuncCall changes.
  • CHECK_FOR_INTERRUPTS() is correctly placed in the BFS, the recursive enumerator, and both DFS loops.

Suggestions (non-blocking)

  1. Error messages hardcode "age_shortest_path". Since age_all_shortest_paths shares sp_compute_paths / sp_agtype_to_*, a bad argument to the all-paths function reports an age_shortest_path: prefix. Consider threading the real function name through (or dropping the prefix).
  2. No result-count bound on age_all_shortest_paths. The shortest-path DAG can yield exponentially many paths, all materialized up front in multi_call_memory_ctx. Only CHECK_FOR_INTERRUPTS bounds it. Worth documenting, ideally a configurable cap.
  3. Scratch-memory lifetime. visited, the preds lists/sp_pred nodes, and the sp_queue buffer live in multi_call_memory_ctx for the whole SRF though they're only needed during computation. A short-lived child context reset after building the path Datums would bound peak memory.
  4. PR description scope. The body describes age_global_graph.c changes (get_label_relation->InvalidOid, the manage_GRAPH_global_contexts self-deadlock fix), but those aren't in this diff — they already landed on master. Trimming the description to the 8 files actually touched would help reviewers.

Nice work — the crash guard and the test matrix in particular are well done.

@gregfelice gregfelice left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving. Source-level review posted above; the implementation is correct (BFS shortest-depth, all-paths predecessor DAG, the non-existent-endpoint crash guard) and well-tested.

Verified the cassert PG18 gate locally (CI-mirror: REL_18_STABLE --enable-cassert -O0, -Werror clean build, EXTRA_TESTS="pgvector fuzzystrmatch pg_trgm"): All 42 tests passed, including the new age_shortest_path test, no regression.diffs.

The four notes in my review (error-message prefix on the all-paths function, unbounded result count for age_all_shortest_paths, scratch-memory lifetime, and trimming the stale parts of the description) are all non-blocking — fine as follow-ups.

@muhammadshoaib muhammadshoaib merged commit 5abb02f into apache:master Jun 21, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants