A retrieval engine that reasons over document structure — not embeddings.
Parse a document into a tree, let an LLM navigate it, return whole sections with citations.
Quick start · How it works · Why not vector RAG · API · SDKs · Benchmarks · Config
One container — engine, a bundled Postgres, and a web UI. The only thing you bring is an LLM key.
docker run -p 8080:8080 -p 7654:7654 \
-e VLE_LLM_ANTHROPIC_API_KEY=<your key> \
halleluyaholudele/vectorless
# UI → http://localhost:8080
# API → http://localhost:7654Open http://localhost:8080, drop in a PDF, and ask a question. You can also set your API key from the dashboard (gear icon) instead of -e — it stays in your browser and is sent per request (BYOK).
Works with GLM / Z.ai, Anthropic, OpenAI, and Gemini. The image defaults to GLM (
glm-4.6) via the Anthropic-compatible gateway; override with-e VLE_LLM_*.
Run from source (Go 1.25+ · Postgres)
git clone https://github.com/hallelx2/vectorless-engine.git && cd vectorless-engine
docker compose up -d postgres
export VLE_LLM_ANTHROPIC_API_KEY=<your key>
go run ./cmd/engine --local # zero-config local mode on :7654At ingest the engine parses a document into a hierarchical tree of real sections — titles, page ranges, structure — and persists it. There is no chunking and nothing is embedded. At query time an LLM navigates that tree the way a person flips to the right page, reads the relevant sections in full, and answers with citations.
- No embeddings — nothing to recompute when you swap models, nothing to drift.
- No vector database — Postgres + object storage is enough.
- No top-K tuning — the model reads 1 section or 8, as the question needs.
- Citations for free — every answer carries page ranges and verbatim quotes.
Vector RAG works until you hit the parts where it doesn't: chunks lose structure, top-K is a guess, embeddings drift, and you maintain a second database to do approximate similarity over fragments you cut out of context.
The engine boots without a provider key and accepts credentials per request, so a self-hosted or Docker user configures the key from the dashboard — never baked into the server.
curl -X POST http://localhost:7654/v1/answer/treewalk \
-H 'Content-Type: application/json' \
-H 'X-LLM-Api-Key: <your key>' \
-d '{"document_id":"doc_…","query":"What were FY2018 capital expenditures?"}'Headers: X-LLM-Api-Key (required), X-LLM-Provider · X-LLM-Base-Url · X-LLM-Model (optional; inherit the server defaults).
Routes are versioned under /v1 from day one.
| Method | Path | Purpose |
|---|---|---|
GET |
/v1/health · /v1/version |
Liveness / build |
POST |
/v1/documents |
Ingest a document (multipart; async, 202) |
GET |
/v1/documents · /v1/documents/{id} |
List / get status |
GET |
/v1/documents/{id}/tree |
Structured section tree |
GET |
/v1/documents/{id}/source |
Stream the original bytes |
GET |
/v1/sections/{id} |
One section, full content |
POST |
/v1/answer/treewalk |
Ask — cited answer in one round-trip |
POST |
/v1/query |
Retrieve relevant sections |
POST |
/v1/replay |
Replay any answer byte-for-byte (trace_token) |
Official clients for TypeScript, Python, and Go (vectorless-sdk). Point them at the local engine:
from vectorless import VectorlessClient
client = VectorlessClient(base_url="http://localhost:7654")
doc = client.wait_for_ready(client.ingest_document("10-K.pdf").document_id)
ans = client.answer_treewalk(doc.id, "What were FY2018 capital expenditures?",
llm_key="<your key>") # BYOK
print(ans.answer, ans.citations)We benchmark on FinanceBench — SEC filings whose answers live in dense financial tables, the hard case for retrieval. The harness (vectorless-bench) scores page/section-grounded recall of the evidence and reports quality alongside cost and latency — because quality is meaningless without its price. It runs treewalk against a BM25 lexical floor and the upstream PageIndex library on equal footing (same model, same hardware, cold cache), and uses rank-based statistics across tasks rather than cherry-picked wins.
# reproduce — points the harness at any running engine
VECTORLESS_BASE_URL=http://localhost:7654 \
vlbench run --config configs/financebench_glm_fast.yaml
vlbench report runs/<stamp> --k 5Every run writes a manifest stamped with model, hardware, library versions, and git commit, so numbers are reproducible and audit-able. Headline results are published with the launch.
A single Go binary with four pluggable boundaries:
| Boundary | Implementations |
|---|---|
| Storage | Local filesystem · S3-compatible (R2, MinIO, B2, Spaces) |
| Queue | River (Postgres) · Asynq (Redis) · QStash (serverless) |
| LLM | Anthropic · OpenAI · Gemini — and any Anthropic-compatible gateway (GLM/Z.ai) |
| Retrieval | treewalk (page-based agentic — the default) · single-pass · chunked-tree |
Config layers, in increasing priority: --config <yaml> → VLE_* env vars → CLI flags. See config.example.yaml for the full reference. Minimal:
database: { url: "postgres://vectorless:vectorless@localhost:5432/vectorless?sslmode=disable" }
storage: { driver: local, local: { root: "./data/documents" } }
queue: { driver: river }
llm:
driver: anthropic
anthropic: { api_key: "${VLE_LLM_ANTHROPIC_API_KEY}", base_url: "https://api.z.ai/api/anthropic/v1", model: "glm-4.6" }Anthropic-compatible gateways (GLM/Z.ai):
base_urlmust include/v1— the client posts to${base_url}/messages.
Windows + local storage: Windows Defender real-time protection scans each freshly written file and briefly hides it from
os.Stat/os.Open, which under heavy concurrent ingestion can surface as transientobject not founderrors. The local backend rides through this window with a short internal retry, but for large bulk loads add a Defender exclusion for your storage root (local.root/VLE_STORAGE_LOCAL_ROOT):Add-MpPreference -ExclusionPath "C:\path\to\data\documents". Linux has no such scan-hold and needs no exclusion.
PDF (positioned text + tables via pdftable) · Markdown · HTML · DOCX · Text.
vectorless-sdk— TS / Python / Go clientsvectorless-bench— the benchmark harnessllmgate— provider clients, retries, pricing


