[FEATURE] Provide a possibility to store and load an empty graph#685
[FEATURE] Provide a possibility to store and load an empty graph#685reta wants to merge 1 commit into
Conversation
c9c2ef2 to
a2b357f
Compare
Signed-off-by: Andriy Redko <drreta@gmail.com>
ashkrisk
left a comment
There was a problem hiding this comment.
@reta thanks for this PR. I've left a few comments. Additionally, I tried running a test by saving and loading an empty heap graph to an OnDiskGraphIndex, but this failed an assertion while loading the graph. The test in this PR only covers OnHeapGraphIndex::save and OnHeapGraphIndex::load.
I'll try and take a look at this later. In the meantime, if you manage to figure out the issue, please do update the PR.
Failure details
The error:
java.lang.AssertionError: Node ID -1419795886 out of bounds for layer 1
The sample program is below, please run with assertions enabled.
import io.github.jbellis.jvector.disk.ReaderSupplierFactory;
import io.github.jbellis.jvector.graph.GraphIndexBuilder;
import io.github.jbellis.jvector.graph.ListRandomAccessVectorValues;
import io.github.jbellis.jvector.graph.disk.GraphIndexWriter;
import io.github.jbellis.jvector.graph.disk.GraphIndexWriterTypes;
import io.github.jbellis.jvector.graph.disk.OnDiskGraphIndex;
import io.github.jbellis.jvector.graph.disk.feature.FeatureId;
import io.github.jbellis.jvector.graph.disk.feature.InlineVectors;
import io.github.jbellis.jvector.graph.similarity.BuildScoreProvider;
import io.github.jbellis.jvector.vector.VectorSimilarityFunction;
public class Test {
public static void main(String[] forwardArgs) throws IOException {
var dim = 1024;
var vsf = VectorSimilarityFunction.DOT_PRODUCT;
var M = 32;
var ef = 100;
var nOv = 1.2f;
var alpha = 1.2f;
var hier = true;
var ravv = new ListRandomAccessVectorValues(List.of(), dim);
var bsp = BuildScoreProvider.randomAccessScoreProvider(ravv, vsf);
var graphPath = Path.of("./local/tmp.jvgraph");
try (
var builder = new GraphIndexBuilder(bsp, dim, M, ef, nOv, alpha, hier);
var graph = builder.build(ravv);
var writer = GraphIndexWriter.getBuilderFor(GraphIndexWriterTypes.RANDOM_ACCESS_PARALLEL, graph, graphPath)
.with(new InlineVectors(dim))
.build();
) {
writer.write(Map.of(FeatureId.INLINE_VECTORS, i -> new InlineVectors.State(ravv.getVector(i))));
System.out.println("OHG max level = " + graph.getMaxLevel());
}
try (
var readerSupplier = ReaderSupplierFactory.open(graphPath);
var graph = OnDiskGraphIndex.load(readerSupplier);
) {
var count = 0;
var niter = graph.getNodes(0);
while (niter.hasNext()) {
count++;
System.out.println(">>> " + niter.next());
}
System.out.println(String.format("Counted %d nodes", count));
graph.getDegree(0);
}
}
}| var graph = TestUtil.buildSequentially(builder, ravv); | ||
|
|
||
| try (var out = TestUtil.openDataOutputStream(indexDataPath)) { | ||
| ((OnHeapGraphIndex) graph).setAllMutationsCompleted(); |
There was a problem hiding this comment.
nit: TestUtil.buildSequentially already calls setAllMutationsCompleted (transitively)
| * in a View that should be created per accessing thread. | ||
| */ | ||
| public interface ImmutableGraphIndex extends AutoCloseable, Accountable { | ||
| int OMITTED = OrdinalMapper.OMITTED; // same as OrdinalMapper, since OrdinalMapper::oldToNew may return it |
There was a problem hiding this comment.
I'd suggest defining a new constant, say ENTRY_NODE_ABSENT, rather than reusing OrdinalMapper.OMITTED. I don't think there is any advantage in reusing the constant (expand the widget for more details).
More details
The description for OrdinalMapper.OMITTED mentions that this value may be returned newToOld, but no mention is made of oldToNew. In fact, the default OrdinalMapper implementation (AbstractGraphIndexWriter::sequentialRenumbering) appears to handle holes in the "original" ordinals by mapping them to -1 (OrdinalMapper.OMITTED is currently defined as Integer.MIN_VALUE).
| @Override | ||
| public int getDegree(int level) { | ||
| return layerInfo.get(level).degree; | ||
| return layerInfo.isEmpty() ? 0 : layerInfo.get(level).degree; |
There was a problem hiding this comment.
I don't think it's valid to have an empty layerInfo, even for an empty graph. Maybe the default should be removed, I'd say it's desirable to keep the existing behaviour which throws an error in case the layerInfo is somehow empty rather potentially causing a logic error somewhere down the line by returning zero. Similarly for all the other locations.
Provide a possibility to store and load an empty graph, see please #684 for motivation and background.
Closes #684