Skip to content

[FEATURE] Provide a possibility to store and load an empty graph#685

Open
reta wants to merge 1 commit into
datastax:mainfrom
reta:issue-684
Open

[FEATURE] Provide a possibility to store and load an empty graph#685
reta wants to merge 1 commit into
datastax:mainfrom
reta:issue-684

Conversation

@reta

@reta reta commented Jun 19, 2026

Copy link
Copy Markdown

Provide a possibility to store and load an empty graph, see please #684 for motivation and background.

Closes #684

@reta reta force-pushed the issue-684 branch 3 times, most recently from c9c2ef2 to a2b357f Compare June 19, 2026 17:58
Signed-off-by: Andriy Redko <drreta@gmail.com>
@reta reta marked this pull request as ready for review June 19, 2026 18:43

@ashkrisk ashkrisk left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@reta thanks for this PR. I've left a few comments. Additionally, I tried running a test by saving and loading an empty heap graph to an OnDiskGraphIndex, but this failed an assertion while loading the graph. The test in this PR only covers OnHeapGraphIndex::save and OnHeapGraphIndex::load.

I'll try and take a look at this later. In the meantime, if you manage to figure out the issue, please do update the PR.

Failure details

The error:

java.lang.AssertionError: Node ID -1419795886 out of bounds for layer 1

The sample program is below, please run with assertions enabled.

import io.github.jbellis.jvector.disk.ReaderSupplierFactory;
import io.github.jbellis.jvector.graph.GraphIndexBuilder;
import io.github.jbellis.jvector.graph.ListRandomAccessVectorValues;
import io.github.jbellis.jvector.graph.disk.GraphIndexWriter;
import io.github.jbellis.jvector.graph.disk.GraphIndexWriterTypes;
import io.github.jbellis.jvector.graph.disk.OnDiskGraphIndex;
import io.github.jbellis.jvector.graph.disk.feature.FeatureId;
import io.github.jbellis.jvector.graph.disk.feature.InlineVectors;
import io.github.jbellis.jvector.graph.similarity.BuildScoreProvider;
import io.github.jbellis.jvector.vector.VectorSimilarityFunction;

public class Test {
    public static void main(String[] forwardArgs) throws IOException {
        
        var dim = 1024;
        var vsf = VectorSimilarityFunction.DOT_PRODUCT;

        var M = 32;
        var ef = 100;
        var nOv = 1.2f;
        var alpha = 1.2f;
        var hier = true;

        var ravv = new ListRandomAccessVectorValues(List.of(), dim);
        var bsp = BuildScoreProvider.randomAccessScoreProvider(ravv, vsf);
        var graphPath = Path.of("./local/tmp.jvgraph");

        try (
            var builder = new GraphIndexBuilder(bsp, dim, M, ef, nOv, alpha, hier);
            var graph = builder.build(ravv);
            var writer = GraphIndexWriter.getBuilderFor(GraphIndexWriterTypes.RANDOM_ACCESS_PARALLEL, graph, graphPath)
                .with(new InlineVectors(dim))
                .build();
        ) {
            writer.write(Map.of(FeatureId.INLINE_VECTORS, i -> new InlineVectors.State(ravv.getVector(i))));
            System.out.println("OHG max level = " + graph.getMaxLevel());
        }

        try (
            var readerSupplier = ReaderSupplierFactory.open(graphPath);
            var graph = OnDiskGraphIndex.load(readerSupplier);
        ) {
            var count = 0;
            var niter = graph.getNodes(0);
            while (niter.hasNext()) {
                count++;
                System.out.println(">>> " + niter.next());
            }
            System.out.println(String.format("Counted %d nodes", count));
            graph.getDegree(0);
        }
    }
}

var graph = TestUtil.buildSequentially(builder, ravv);

try (var out = TestUtil.openDataOutputStream(indexDataPath)) {
((OnHeapGraphIndex) graph).setAllMutationsCompleted();

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: TestUtil.buildSequentially already calls setAllMutationsCompleted (transitively)

* in a View that should be created per accessing thread.
*/
public interface ImmutableGraphIndex extends AutoCloseable, Accountable {
int OMITTED = OrdinalMapper.OMITTED; // same as OrdinalMapper, since OrdinalMapper::oldToNew may return it

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest defining a new constant, say ENTRY_NODE_ABSENT, rather than reusing OrdinalMapper.OMITTED. I don't think there is any advantage in reusing the constant (expand the widget for more details).

More details

The description for OrdinalMapper.OMITTED mentions that this value may be returned newToOld, but no mention is made of oldToNew. In fact, the default OrdinalMapper implementation (AbstractGraphIndexWriter::sequentialRenumbering) appears to handle holes in the "original" ordinals by mapping them to -1 (OrdinalMapper.OMITTED is currently defined as Integer.MIN_VALUE).

/**
* Used by newToOld to indicate that the new ordinal is a "hole" that has no corresponding old ordinal.
*/
int OMITTED = Integer.MIN_VALUE;

public static Map<Integer, Integer> sequentialRenumbering(ImmutableGraphIndex graph) {
try (var view = graph.getView()) {
Int2IntHashMap oldToNewMap = new Int2IntHashMap(-1);

@Override
public int getDegree(int level) {
return layerInfo.get(level).degree;
return layerInfo.isEmpty() ? 0 : layerInfo.get(level).degree;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's valid to have an empty layerInfo, even for an empty graph. Maybe the default should be removed, I'd say it's desirable to keep the existing behaviour which throws an error in case the layerInfo is somehow empty rather potentially causing a logic error somewhere down the line by returning zero. Similarly for all the other locations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE] Provide a possibility to store and load an empty graph

2 participants