shortest_path: Support relationship-type filters and a minimum hop count #2442
Open
jrgemignani wants to merge 1 commit into
Pull request overview
Adds openCypher/Neo4j-aligned enhancements to AGE’s shortest-path SRFs by supporting relationship-type filtering with multiple types and introducing a minimum hop-count constraint (including a DFS/VLE fallback when the minimum exceeds the BFS shortest distance).
Changes:
- Extend `edge_types` handling to accept an array of relationship types and match edges whose label is in the requested set.
- Add `min_hops` support, with a VLE DFS fallback for the “hard” regime (`min_hops` > true shortest distance) and guardrails to cap exhaustive enumeration.
- Expand regression coverage for multi-type filters, min-hops regimes, error-prefix correctness, and in-cypher Tier 1 forms.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| src/backend/utils/adt/age_vle.c | Implements multi-type label filtering, min_hops fallback search, result/materialization caps, and scratch memory context usage. |
| regress/sql/age_shortest_path.sql | Adds regression queries covering new semantics (multi-type filters, min_hops behavior, error-name prefixing, Tier 1 calls). |
| regress/expected/age_shortest_path.out | Updates expected outputs to match new behavior and added regression cases. |
aba432f to 4ff702b
Comment on lines 3520 to 3523

```c
if (start_agt == NULL || end_agt == NULL)
{
    return NULL;
}
```
Comment on lines 3630 to 3635

```c
/* build / fetch the global graph cache for this graph */
ggctx = manage_GRAPH_global_contexts(graph_name, graph_oid);
if (ggctx == NULL)
{
    return NULL;
}
```
Comment on lines 3654 to 3659

```c
if (!found)
{
    hash_destroy(visited);
    MemoryContextSwitchTo(oldctx);
    MemoryContextDelete(scratch);
    return NULL;
}
```
Comment on lines +3693 to +3695

```c
return sp_minhops_fallback(ggctx, graph_oid, graph_name, fname, source,
                           target, fallback_label_oid, dir, min_hops,
                           max_hops, collect_all, out_count);
```
Comment on lines +3754 to 3757

```c
/* results are copied out; drop the BFS/enumeration scratch */
MemoryContextSwitchTo(oldctx);
MemoryContextDelete(scratch);
return paths;
```
Support relationship-type filters and a minimum hop count in shortest_path SRFs
age_shortest_path / age_all_shortest_paths gain two related capabilities, both following openCypher / Neo4j semantics.
Relationship-type filtering: the edge_types argument now accepts an array of types; an edge matches when its label is any one of the requested types. A bare string or a one-element array keeps the single-type behaviour, an empty string/array or NULL means no filter, and an unknown type matches nothing. sp_run_bfs takes a set of Oids rather than a single Oid, and sp_compute_paths resolves the argument into that set.
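A minimal sketch of the matching rule, assuming label ids are plain unsigned integers standing in for Oid and that type names are resolved through a small in-memory catalog. The names `label_entry`, `type_filter`, `resolve_types` and `edge_matches` are hypothetical, not AGE's helpers. The point it illustrates is the difference between "no filter" (NULL or empty input) and "an active filter whose resolved set is empty" (only unknown types), which must match nothing.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef unsigned int label_id;     /* hypothetical stand-in for PostgreSQL's Oid */

typedef struct
{
    const char *name;
    label_id    id;
} label_entry;                     /* hypothetical label catalog entry */

typedef struct
{
    bool            active;        /* false: NULL/empty argument, no filtering */
    size_t          n;             /* number of resolved label ids */
    const label_id *ids;           /* resolved ids; unknown names add none */
} type_filter;

static bool lookup_label(const label_entry *catalog, size_t ncat,
                         const char *name, label_id *out)
{
    for (size_t i = 0; i < ncat; i++)
    {
        if (strcmp(catalog[i].name, name) == 0)
        {
            *out = catalog[i].id;
            return true;
        }
    }
    return false;
}

/* Resolve requested type names into a label set; buf must hold nnames ids. */
static type_filter resolve_types(const label_entry *catalog, size_t ncat,
                                 const char *const *names, size_t nnames,
                                 label_id *buf)
{
    type_filter f = { false, 0, buf };

    for (size_t i = 0; i < nnames; i++)
    {
        label_id id;

        if (names[i] == NULL || names[i][0] == '\0')
            continue;              /* empty entries impose no filter */
        f.active = true;           /* a real type name was requested */
        if (lookup_label(catalog, ncat, names[i], &id))
            buf[f.n++] = id;       /* unknown names contribute nothing */
    }
    return f;
}

/* An edge qualifies when there is no filter or its label is in the set. */
static bool edge_matches(const type_filter *f, label_id edge_label)
{
    if (!f->active)
        return true;
    for (size_t i = 0; i < f->n; i++)
    {
        if (f->ids[i] == edge_label)
            return true;
    }
    return false;                  /* includes the all-unknown case (n == 0) */
}
```

For example, resolving `{"KNOWS", "LIKES"}` yields a two-element set that accepts edges with either label, while `{"NOPE"}` yields an active filter with no ids, so every edge is rejected.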
Minimum hop count: the new min_hops argument is a lower bound on the path length. When it does not exceed the true shortest distance, it imposes no constraint and the normal BFS shortest-path result is returned. When it exceeds the shortest distance (the "hard" regime), BFS cannot produce a qualifying path, so the search falls back to the variable-length-edge (VLE) depth-first engine (sp_minhops_fallback). That engine enumerates edge-distinct paths (relationship-uniqueness / trail semantics) and returns the shortest path(s) whose length is at least min_hops. This regime permits revisiting a vertex and closed walks back to the start, but never reusing an edge. A private memory context bounds the search's memory use, and a cost guard caps the number of examined paths, raising PROGRAM_LIMIT_EXCEEDED (with a hint to bound the search with a maximum hop count) when the cap is exceeded. The hard regime combined with multiple relationship types is unsupported because the VLE engine matches a single label; that case raises FEATURE_NOT_SUPPORTED.
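The two min_hops regimes can be illustrated with a small directed-graph sketch. It assumes an edge-list graph, reports only the length of the qualifying path instead of materializing paths, and uses a simple counter of visited search states in place of the real cost guard. `bfs_distance`, `trail_dfs` and `shortest_with_min_hops` are hypothetical names, not functions from age_vle.c. BFS decides the easy regime; otherwise a depth-first search over edge-distinct trails (vertices may repeat, edges may not) looks for the shortest trail of length at least min_hops, with max_hops bounding the depth.

```c
#include <stdbool.h>

#define MAX_V 64
#define MAX_E 64

/* Hypothetical directed edge-list graph (not AGE's graph cache). */
typedef struct { int src, dst; } edge;
typedef struct { int nv, ne; edge e[MAX_E]; } graph;
typedef enum { SP_FOUND, SP_NONE, SP_LIMIT } sp_status;

/* Easy regime: BFS hop distance from s to t, or -1 if unreachable. */
static int bfs_distance(const graph *g, int s, int t)
{
    int dist[MAX_V], queue[MAX_V], head = 0, tail = 0;

    for (int v = 0; v < g->nv; v++)
        dist[v] = -1;
    dist[s] = 0;
    queue[tail++] = s;
    while (head < tail)
    {
        int u = queue[head++];

        if (u == t)
            return dist[u];
        for (int i = 0; i < g->ne; i++)
        {
            if (g->e[i].src == u && dist[g->e[i].dst] < 0)
            {
                dist[g->e[i].dst] = dist[u] + 1;
                queue[tail++] = g->e[i].dst;  /* each vertex enqueued once */
            }
        }
    }
    return -1;
}

/* Hard-regime state: DFS over edge-distinct trails. */
typedef struct
{
    const graph *g;
    int          target, min_hops, max_hops;
    long         examined, budget;  /* cost guard on visited search states */
    int          best;              /* shortest qualifying length, -1 if none */
    bool         over_budget;
    bool         used[MAX_E];       /* edges already on the current trail */
} trail_search;

static void trail_dfs(trail_search *ts, int u, int depth)
{
    if (ts->over_budget)
        return;
    if (++ts->examined > ts->budget)
    {
        ts->over_budget = true;
        return;
    }
    if (u == ts->target && depth >= ts->min_hops)
    {
        if (ts->best < 0 || depth < ts->best)
            ts->best = depth;
        return;                     /* any extension would be longer */
    }
    if ((ts->best >= 0 && depth + 1 >= ts->best) || depth >= ts->max_hops)
        return;                     /* cannot improve, or would exceed max_hops */
    for (int i = 0; i < ts->g->ne; i++)
    {
        if (ts->used[i] || ts->g->e[i].src != u)
            continue;
        ts->used[i] = true;         /* relationship uniqueness */
        trail_dfs(ts, ts->g->e[i].dst, depth + 1);
        ts->used[i] = false;
    }
}

/* max_hops < 0 means unbounded (a trail can use at most ne edges). */
static sp_status shortest_with_min_hops(const graph *g, int s, int t,
                                        int min_hops, int max_hops,
                                        long budget, int *len_out)
{
    int d = bfs_distance(g, s, t);

    if (d < 0)
        return SP_NONE;             /* unreachable at any length */
    if (min_hops <= d)              /* min_hops imposes no constraint */
    {
        if (max_hops >= 0 && d > max_hops)
            return SP_NONE;
        *len_out = d;
        return SP_FOUND;
    }
    trail_search ts = { .g = g, .target = t, .min_hops = min_hops,
                        .max_hops = max_hops < 0 ? g->ne : max_hops,
                        .budget = budget, .best = -1 };
    trail_dfs(&ts, s, 0);
    if (ts.over_budget)
        return SP_LIMIT;            /* cf. PROGRAM_LIMIT_EXCEEDED */
    if (ts.best < 0)
        return SP_NONE;
    *len_out = ts.best;
    return SP_FOUND;
}
```

On the triangle 0->1->2->0, asking for a path from 0 back to 0 with min_hops = 1 takes the hard regime and finds the closed walk of length 3, analogous to the sp_tri case. Pruning as soon as a trail cannot beat the best length found so far is what keeps the search from enumerating every longer trail.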
Regression coverage spans single- and multi-type filters, directed and undirected reachability, multiplicity of equal-length paths, max_hops bounds, NULL and non-existent endpoints, and both min_hops regimes, including a vertex-revisiting longer path (sp_revisit) and a closed-walk cycle back to the start (sp_tri). The in-cypher() Tier 1 call forms are exercised as well.
Review feedback addressed:
Error messages now report the function actually called. age_shortest_path and age_all_shortest_paths share their argument-resolution helpers, which hard-coded an "age_shortest_path" prefix regardless of the caller; the caller's name is now threaded through so each function reports its own (this also corrects a mislabeled multi-type min_hops error). A new regression case (sp_errname) pins the behaviour for both functions.
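A sketch of the shape of this fix, assuming a shared validation helper that receives the caller's name instead of hard-coding "age_shortest_path". The helper, the two wrapper functions and the message wording are hypothetical; in AGE the error would be raised with ereport(ERROR, ...), and snprintf is used here only to keep the example standalone.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical shared helper: rejects multiple relationship types in the
 * hard min_hops regime, prefixing the message with the caller's name.
 */
static bool check_hard_regime_types(const char *fname, size_t ntypes,
                                    bool hard_regime,
                                    char *errbuf, size_t errlen)
{
    if (hard_regime && ntypes > 1)
    {
        snprintf(errbuf, errlen,
                 "%s: multiple relationship types are not supported "
                 "when min_hops exceeds the shortest distance", fname);
        return false;
    }
    errbuf[0] = '\0';
    return true;
}

/* Each SQL-callable entry point threads its own name down. */
static bool shortest_path_entry(size_t ntypes, bool hard,
                                char *buf, size_t len)
{
    return check_hard_regime_types("age_shortest_path", ntypes, hard,
                                   buf, len);
}

static bool all_shortest_paths_entry(size_t ntypes, bool hard,
                                     char *buf, size_t len)
{
    return check_hard_regime_types("age_all_shortest_paths", ntypes, hard,
                                   buf, len);
}
```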
age_all_shortest_paths now bounds the number of materialized result paths. The shortest-path DAG can contain exponentially many equal-length paths, all built up front before the first row streams; enumeration is capped at SP_MAX_RESULT_PATHS (1,000,000), raising PROGRAM_LIMIT_EXCEEDED with a hint to narrow the search, mirroring the existing min-hops candidate cap.
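One cheap way to enforce a cap like SP_MAX_RESULT_PATHS before building any paths is to count source-to-target paths in the predecessor DAG with a saturating dynamic program. This is an illustrative technique, not necessarily what the PR does (it may count during enumeration instead). `dag_node` and `count_paths_capped` are hypothetical, and the sketch assumes vertices are numbered in BFS order so that every predecessor precedes its successor.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_PRED 8

/* Hypothetical BFS predecessor-DAG node; predecessors have smaller indices. */
typedef struct
{
    size_t npred;
    size_t pred[SKETCH_MAX_PRED];
} dag_node;

/*
 * Count shortest source->target paths without materializing them.
 * Counts saturate at cap + 1 (cap must stay well below ULLONG_MAX / 2),
 * so exponential DAGs cannot overflow. cnt is caller scratch of length n.
 * Returns true when the number of paths is within cap.
 */
static bool count_paths_capped(const dag_node *dag, size_t n,
                               size_t source, size_t target,
                               unsigned long long cap,
                               unsigned long long *cnt,
                               unsigned long long *count_out)
{
    for (size_t v = 0; v < n; v++)
    {
        unsigned long long c = (v == source) ? 1 : 0;

        for (size_t i = 0; i < dag[v].npred; i++)
        {
            c += cnt[dag[v].pred[i]];
            if (c > cap)
            {
                c = cap + 1;        /* "more than cap" is all we need */
                break;
            }
        }
        cnt[v] = c;
    }
    *count_out = cnt[target];
    return cnt[target] <= cap;
}
```

A chain of k diamonds has 2^k equal-length shortest paths, which is exactly the exponential growth the cap guards against; saturating at cap + 1 keeps the counter from overflowing on such graphs.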
The BFS search state (visited table, frontier queue, predecessor multiset, and intermediate path arrays) now lives in a private scratch memory context that is deleted once the surviving result Datums are built in the SRF context, rather than persisting in multi_call_memory_ctx for the life of the SRF. This bounds peak memory to the result set plus one search and matches the pattern sp_minhops_fallback already used.
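The lifetime pattern (allocate all search state in a scratch context, copy only the survivors into longer-lived memory, then delete the scratch context in one call) can be shown with a toy bump arena standing in for a PostgreSQL MemoryContext. Everything here is hypothetical: `arena`, `arena_alloc`, `arena_delete` and `keep_shortest` only mimic what MemoryContextSwitchTo / MemoryContextDelete achieve, and malloc stands in for the SRF's multi_call_memory_ctx.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Header kept maximally aligned so the payload after it is usable. */
typedef union arena_hdr
{
    union arena_hdr *next;
    max_align_t      align;
} arena_hdr;

typedef struct { arena_hdr *head; } arena;   /* toy scratch "context" */

static void *arena_alloc(arena *a, size_t size)
{
    arena_hdr *h = malloc(sizeof *h + size);

    if (h == NULL)
        return NULL;
    h->next = a->head;              /* remember it for bulk release */
    a->head = h;
    return h + 1;
}

/* Free every scratch allocation at once; returns how many were freed. */
static size_t arena_delete(arena *a)
{
    size_t freed = 0;

    while (a->head != NULL)
    {
        arena_hdr *next = a->head->next;

        free(a->head);
        a->head = next;
        freed++;
    }
    return freed;
}

/* Toy search: return indices of the shortest candidate lengths. */
static int *keep_shortest(const int *lens, size_t n, size_t *nout,
                          size_t *scratch_freed)
{
    arena  scratch = { NULL };
    int   *result = NULL;
    size_t k = 0;

    *nout = 0;
    /* 1. working state lives only in the scratch arena */
    int *work = arena_alloc(&scratch, n * sizeof *work);
    int *hits = arena_alloc(&scratch, n * sizeof *hits);

    if (n > 0 && work != NULL && hits != NULL)
    {
        memcpy(work, lens, n * sizeof *work);
        int best = work[0];

        for (size_t i = 1; i < n; i++)
            if (work[i] < best)
                best = work[i];
        for (size_t i = 0; i < n; i++)
            if (work[i] == best)
                hits[k++] = (int) i;
        /* 2. copy only the survivors into long-lived memory */
        result = malloc(k * sizeof *result);
        if (result != NULL)
        {
            memcpy(result, hits, k * sizeof *result);
            *nout = k;
        }
    }
    /* 3. drop all search state in one call */
    *scratch_freed = arena_delete(&scratch);
    return result;
}
```

Peak memory is then the result plus one search's worth of scratch, rather than every search's intermediates accumulating for the life of the caller.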
41/41 installcheck.
Co-authored-by: Copilot <copilot@github.com>
modified: regress/expected/age_shortest_path.out
modified: regress/sql/age_shortest_path.sql
modified: src/backend/utils/adt/age_vle.c