shortest_path: Support relationship-type filters and a minimum hop count #2442

Open
jrgemignani wants to merge 1 commit into apache:master from jrgemignani:shortest_path_2

@jrgemignani (Contributor)

Support relationship-type filters and a minimum hop count in shortest_path SRFs

age_shortest_path / age_all_shortest_paths gain two related capabilities, both following openCypher / Neo4j semantics.

Relationship-type filtering: the edge_types argument now accepts an array of types; an edge matches when its label is any one of the requested types. A bare string or a one-element array keeps the single-type behaviour, an empty string/array or NULL means no filter, and an unknown type matches nothing. sp_run_bfs takes a set of Oids rather than a single Oid, and sp_compute_paths resolves the argument into that set.
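The matching rules above can be illustrated with a minimal, self-contained sketch. It is not the code in age_vle.c: LabelFilter, label_filter_matches, and the use of plain unsigned integers in place of PostgreSQL Oids are all hypothetical names chosen for illustration, and the sketch assumes that an unknown type resolves to "no id" while the filter itself stays active.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a resolved edge_types argument. */
typedef struct LabelFilter
{
    bool active;              /* false: NULL or empty argument, no filtering */
    const unsigned int *ids;  /* ids of the requested types that exist */
    size_t count;             /* may be 0 while active: only unknown types */
} LabelFilter;

/* Returns true when an edge with the given label passes the filter. */
bool label_filter_matches(const LabelFilter *f, unsigned int edge_label)
{
    size_t i;

    assert(f != NULL);

    /* no filter requested: every edge qualifies */
    if (!f->active)
    {
        return true;
    }

    /* an edge matches when its label is any one of the requested types */
    for (i = 0; i < f->count; i++)
    {
        if (f->ids[i] == edge_label)
        {
            return true;
        }
    }

    /* active filter with no hit, including the unknown-type case */
    return false;
}
```

A linear scan is enough here because a type list is normally short; a larger set could use a hash table instead.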

Minimum hop count: the new min_hops argument is a lower bound on the path length. When it does not exceed the true shortest distance it imposes no constraint, so the normal BFS shortest-path result is returned. When it exceeds the shortest distance (the "hard" regime), BFS cannot produce a qualifying path, so the search falls back to the variable-length-edge (VLE) depth-first engine (sp_minhops_fallback), which enumerates edge-distinct paths (relationship-uniqueness / trail semantics) and returns the shortest path(s) whose length is at least min_hops. This regime permits revisiting a vertex and closed walks back to the start, but never reusing an edge. A private memory context confines the search's memory, and a cost guard caps the number of examined paths, raising PROGRAM_LIMIT_EXCEEDED (with a hint to bound the search with a maximum hop count) when the cap is exceeded. The hard regime combined with multiple relationship types is unsupported, because the VLE engine matches a single label; that case raises FEATURE_NOT_SUPPORTED.
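To make the trail semantics concrete, here is a small standalone sketch of a depth-first search that never reuses an edge, may revisit vertices, and stops once an examination budget is spent. It is an illustration under stated assumptions, not sp_minhops_fallback: every name (SketchEdge, TrailSearch, shortest_trail_at_least, the SKETCH_* constants) is hypothetical, edges are directed only, and the return codes stand in for the real error reports.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_EDGES 64       /* illustrative bound, not from the patch */
#define SKETCH_NO_PATH (-1)
#define SKETCH_CAP_EXCEEDED (-2)  /* stands in for PROGRAM_LIMIT_EXCEEDED */

typedef struct SketchEdge
{
    int from;
    int to;
} SketchEdge;

typedef struct TrailSearch
{
    const SketchEdge *edges;
    int nedges;
    int target;
    int min_hops;
    int max_hops;
    long budget;                  /* partial paths we may still examine */
    bool used[SKETCH_MAX_EDGES];  /* edges on the current path */
    int best;                     /* shortest qualifying length so far */
    bool exceeded;
} TrailSearch;

void trail_dfs(TrailSearch *s, int vertex, int depth)
{
    int i;

    if (s->exceeded)
    {
        return;
    }
    if (s->budget <= 0)
    {
        s->exceeded = true;
        return;
    }
    s->budget--;

    /* a path qualifies once it ends at the target with enough hops */
    if (vertex == s->target && depth >= s->min_hops &&
        (s->best == SKETCH_NO_PATH || depth < s->best))
    {
        s->best = depth;
    }

    /* stop at max_hops, or when one more hop cannot beat the best */
    if (depth >= s->max_hops ||
        (s->best != SKETCH_NO_PATH && depth + 1 >= s->best))
    {
        return;
    }

    for (i = 0; i < s->nedges; i++)
    {
        /* vertices may repeat, but an edge is used at most once */
        if (s->edges[i].from != vertex || s->used[i])
        {
            continue;
        }
        s->used[i] = true;
        trail_dfs(s, s->edges[i].to, depth + 1);
        s->used[i] = false;
    }
}

/* Length of the shortest trail with at least min_hops edges. */
int shortest_trail_at_least(const SketchEdge *edges, int nedges, int source,
                            int target, int min_hops, int max_hops,
                            long max_examined)
{
    TrailSearch s = {0};

    assert(nedges >= 0 && nedges <= SKETCH_MAX_EDGES);
    s.edges = edges;
    s.nedges = nedges;
    s.target = target;
    s.min_hops = min_hops;
    s.max_hops = max_hops;
    s.budget = max_examined;
    s.best = SKETCH_NO_PATH;

    trail_dfs(&s, source, 0);
    return s.exceeded ? SKETCH_CAP_EXCEEDED : s.best;
}
```

For the edges 0->1, 1->2, 1->3, 3->1, a search from 0 to 2 with min_hops = 3 finds the length-4 trail 0->1->3->1->2, which revisits vertex 1 without reusing an edge. On a directed triangle, a search from a vertex back to itself with min_hops = 1 finds the length-3 closed walk.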

Regression coverage spans single- and multi-type filters, directed and undirected reachability, multiplicity of equal-length paths, max_hops bounds, NULL and non-existent endpoints, and both min_hops regimes, including a vertex-revisiting longer path (sp_revisit) and a closed-walk cycle back to the start (sp_tri). The in-cypher() Tier 1 call forms are exercised as well.

Review feedback addressed:

  1. Error messages now report the function actually called. age_shortest_path and age_all_shortest_paths share their argument-resolution helpers, which hard-coded an "age_shortest_path" prefix regardless of the caller; the caller's name is now threaded through so each function reports its own (this also corrects a mislabeled multi-type min_hops error). A new regression case (sp_errname) pins the behaviour for both functions.

  2. age_all_shortest_paths now bounds the number of materialized result paths. The shortest-path DAG can contain exponentially many equal-length paths, all built up front before the first row streams; enumeration is capped at SP_MAX_RESULT_PATHS (1,000,000), raising PROGRAM_LIMIT_EXCEEDED with a hint to narrow the search, mirroring the existing min-hops candidate cap.

  3. The BFS search state (visited table, frontier queue, predecessor multiset, and intermediate path arrays) now lives in a private scratch memory context that is deleted once the surviving result Datums are built in the SRF context, rather than persisting in multi_call_memory_ctx for the life of the SRF. This bounds peak memory to the result set plus one search and matches the pattern sp_minhops_fallback already used.
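The result cap described in item 2 can be sketched in isolation. The snippet below walks a predecessor DAG from the target back to the source and gives up as soon as the number of complete paths would pass a limit. It is a simplified illustration, not the patch's code: PredDag, enumerate_capped, and count_shortest_paths_capped are hypothetical names, the DAG is assumed to be stored as offset and predecessor arrays, and the sketch only counts paths where the real function materializes them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical predecessor DAG left behind by a BFS (CSR layout). */
typedef struct PredDag
{
    int nvertices;
    const int *pred_start;  /* nvertices + 1 offsets into pred */
    const int *pred;        /* predecessor vertex ids */
} PredDag;

/*
 * Walk back from vertex to source. Each arrival at the source is one
 * complete shortest path. Returns false once the cap would be passed.
 */
bool enumerate_capped(const PredDag *dag, int vertex, int source,
                      long cap, long *count)
{
    int i;

    if (vertex == source)
    {
        if (*count >= cap)
        {
            return false;
        }
        (*count)++;
        return true;
    }

    for (i = dag->pred_start[vertex]; i < dag->pred_start[vertex + 1]; i++)
    {
        if (!enumerate_capped(dag, dag->pred[i], source, cap, count))
        {
            return false;
        }
    }
    return true;
}

/* Number of shortest paths, or -1 when there are more than cap. */
long count_shortest_paths_capped(const PredDag *dag, int source, int target,
                                 long cap)
{
    long count = 0;

    assert(dag != NULL);
    assert(target >= 0 && target < dag->nvertices);

    if (!enumerate_capped(dag, target, source, cap, &count))
    {
        return -1;
    }
    return count;
}
```

A chain of k "diamonds" (two parallel two-hop routes between consecutive junctions) has 2^k shortest paths, which is why checking the cap during enumeration matters more than checking it afterwards.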

installcheck: 41/41 tests pass.

Co-authored-by: Copilot <copilot@github.com>

modified: regress/expected/age_shortest_path.out
modified: regress/sql/age_shortest_path.sql
modified: src/backend/utils/adt/age_vle.c

Copilot AI left a comment

Pull request overview

Adds openCypher/Neo4j-aligned enhancements to AGE’s shortest-path SRFs by supporting relationship-type filtering with multiple types and introducing a minimum hop-count constraint (including a DFS/VLE fallback when the minimum exceeds the BFS shortest distance).

Changes:

  • Extend edge_types handling to accept an array of relationship types and match edges whose label is in the requested set.
  • Add min_hops support, with a VLE DFS fallback for the “hard” regime (min_hops > true shortest distance) and guardrails to cap exhaustive enumeration.
  • Expand regression coverage for multi-type filters, min-hops regimes, error-prefix correctness, and in-cypher Tier 1 forms.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| src/backend/utils/adt/age_vle.c | Implements multi-type label filtering, min_hops fallback search, result/materialization caps, and scratch memory context usage. |
| regress/sql/age_shortest_path.sql | Adds regression queries covering new semantics (multi-type filters, min_hops behavior, error-name prefixing, Tier 1 calls). |
| regress/expected/age_shortest_path.out | Updates expected outputs to match new behavior and added regression cases. |

Comment on lines 3520 to 3523

    if (start_agt == NULL || end_agt == NULL)
    {
        return NULL;
    }

Comment on lines 3630 to 3635

    /* build / fetch the global graph cache for this graph */
    ggctx = manage_GRAPH_global_contexts(graph_name, graph_oid);
    if (ggctx == NULL)
    {
        return NULL;
    }

Comment on lines 3654 to 3659

    if (!found)
    {
        hash_destroy(visited);
        MemoryContextSwitchTo(oldctx);
        MemoryContextDelete(scratch);
        return NULL;
    }

Comment on lines +3693 to +3695

    return sp_minhops_fallback(ggctx, graph_oid, graph_name, fname, source,
                               target, fallback_label_oid, dir, min_hops,
                               max_hops, collect_all, out_count);

Comment on lines +3754 to 3757

    /* results are copied out; drop the BFS/enumeration scratch */
    MemoryContextSwitchTo(oldctx);
    MemoryContextDelete(scratch);
    return paths;