- A retrieval engine that reasons over document structure — not embeddings.
- No chunking. No top-K. No vector database. Just a tree, an LLM, and full sections.
+ A retrieval engine that reasons over document structure — not embeddings.
+ Parse a document into a tree, let an LLM navigate it, return whole sections with citations.
---
-## Why vectorless
+## Quick start
-Vector RAG works — until you hit the parts where it doesn't. Chunks lose structure. Top-K is a guess. Embeddings drift. You maintain a second database just to do approximate similarity on bits of text you cut out of context.
+One container — engine, a bundled Postgres, and a web UI. The only thing you bring is an LLM key.
-**vectorless-engine takes a different path**: at ingest, it builds a structured tree of the document (titles, summaries, metadata) — essentially an LLM-friendly table of contents. At query time, an LLM reads that tree and picks the exact section IDs it needs. The engine returns those sections in full, with their narrative intact.
+```bash
+docker run -p 8080:8080 -p 7654:7654 \
+ -e VLE_LLM_ANTHROPIC_API_KEY="<your-api-key>" \
+ halleluyaholudele/vectorless
+# UI → http://localhost:8080
+# API → http://localhost:7654
+```
-
+
-**What you get:**
-
-- **No embeddings** — nothing to recompute when you swap models, nothing to drift.
-- **No vector database** — Postgres + object storage is enough.
-- **No top-K tuning** — the model picks 1 section or 8, as needed.
-- **Full context preserved** — sections are returned whole, not as fragments.
-- **Citations for free** — every returned section has a stable ID.
-- **Provider-agnostic** — Anthropic, OpenAI, Gemini all plug in behind the same interface.
+Open **http://localhost:8080**, drop in a PDF, and ask a question. You can also set your API key from the dashboard (gear icon) instead of `-e` — it stays in your browser and is sent per request ([BYOK](#bring-your-own-key)).
-## How it works
+
+
+
-### 1. Ingest — build a structured tree
+> Works with **GLM / Z.ai**, **Anthropic**, **OpenAI**, and **Gemini**. The image defaults to GLM (`glm-4.6`) via the Anthropic-compatible gateway; override with `-e VLE_LLM_*`.
-Upload a document; the engine parses it, splits it along semantic section boundaries (not blind chunks), summarizes each section with a cheap model, and persists the tree. All asynchronous, driven by the queue of your choice.
+
+**Run from source** (Go 1.25+ · Postgres):
-
-
-
+```bash
+git clone https://github.com/hallelx2/vectorless-engine.git && cd vectorless-engine
+docker compose up -d postgres
+export VLE_LLM_ANTHROPIC_API_KEY="<your-api-key>"
+go run ./cmd/engine --local # zero-config local mode on :7654
+```
+
-### 2. Query — the LLM reasons over the tree
+## How it works
-For small documents the whole tree fits in one prompt. For large documents the engine **splits the tree into budget-sized slices, fires parallel LLM calls, and merges the results** — so you're never bottlenecked on a single model's context window.
+At **ingest** the engine parses a document into a hierarchical tree of real sections — titles, page ranges, structure — and persists it. There is **no chunking** and **nothing is embedded**. At **query** time an LLM navigates that tree the way a person flips to the right page, reads the relevant sections in full, and answers with citations.
-
+
-### 3. Return — full sections, not fragments
+- **No embeddings** — nothing to recompute when you swap models, nothing to drift.
+- **No vector database** — Postgres + object storage is enough.
+- **No top-K tuning** — the model reads 1 section or 8, as the question needs.
+- **Citations for free** — every answer carries page ranges and verbatim quotes.
-The engine fetches the selected sections from object storage and returns them intact. Your downstream model (or agent) gets coherent, cite-able text — not a bag of chunks.
+## Why not vector RAG
-## Architecture
+Vector RAG works until you hit the parts where it doesn't: chunks lose structure, top-K is a guess, embeddings drift, and you maintain a second database to do approximate similarity over fragments you cut out of context.
-
+
-The engine is a **single Go binary** with four pluggable boundaries:
+## Bring your own key
-| Boundary | Implementations shipped |
-|------------------|---------------------------------------------------------------------------------|
-| **Storage** | Local filesystem · S3-compatible (AWS S3, Cloudflare R2, MinIO, Backblaze B2, DigitalOcean Spaces) — GCS / Azure planned |
-| **Queue** | [QStash](https://upstash.com/docs/qstash) (serverless) · [River](https://riverqueue.com) (Postgres) · [Asynq](https://github.com/hibiken/asynq) (Redis) |
-| **LLM** | Anthropic · OpenAI · Gemini — all behind one `llm.Client` interface |
-| **Retrieval** | `single-pass` (small trees, one call) · `chunked-tree` (big trees, parallel map-reduce) |
+The engine boots **without** a provider key and accepts credentials per request, so a self-hosted or Docker user can set the key from the dashboard instead of baking it into the server.
-Everything else — the control plane, dashboard, MCP server, SDKs — lives outside this repo and talks to the engine over its HTTP API. Run the engine standalone; run it behind your own control plane; embed it in your product. Up to you.
+```bash
+curl -X POST http://localhost:7654/v1/answer/treewalk \
+ -H 'Content-Type: application/json' \
+ -H 'X-LLM-Api-Key: <your-llm-key>' \
+ -d '{"document_id":"doc_…","query":"What were FY2018 capital expenditures?"}'
+```
-## Quick start
+Headers: `X-LLM-Api-Key` (required); `X-LLM-Provider`, `X-LLM-Base-Url`, and `X-LLM-Model` (optional; each falls back to the server default when omitted).
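As a rough Go equivalent of the curl call above, the sketch below builds the same request with the per-request key header. The helper name `newTreewalkRequest`, the `example-key` value, and the `doc_123` ID are hypothetical; only the path, the header names, and the JSON fields `document_id` and `query` come from the README. The response shape is not assumed, so the sketch stops at building the request.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// newTreewalkRequest is a hypothetical helper: it builds a POST to
// /v1/answer/treewalk that carries the caller's LLM key in X-LLM-Api-Key.
func newTreewalkRequest(baseURL, apiKey, documentID, query string) (*http.Request, error) {
	body, err := json.Marshal(map[string]string{
		"document_id": documentID,
		"query":       query,
	})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/v1/answer/treewalk", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("X-LLM-Api-Key", apiKey) // required header, sent with each request
	return req, nil
}

func main() {
	req, err := newTreewalkRequest("http://localhost:7654", "example-key", "doc_123",
		"What were FY2018 capital expenditures?")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
	// To send it against a running engine: http.DefaultClient.Do(req)
}
```

The optional `X-LLM-Provider`, `X-LLM-Base-Url`, and `X-LLM-Model` headers could be set the same way with `req.Header.Set`.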
-### Prerequisites
+## HTTP API
-- Go 1.25+
-- Postgres 15+ (for the job queue + document metadata)
-- An API key from Anthropic, OpenAI, or Google
+Routes are versioned under `/v1` from day one.
-### Run locally
+| Method | Path | Purpose |
+|--------|------|---------|
+| `GET` | `/v1/health` · `/v1/version` | Liveness / build |
+| `POST` | `/v1/documents` | Ingest a document (multipart; async, 202) |
+| `GET` | `/v1/documents` · `/v1/documents/{id}` | List / get status |
+| `GET` | `/v1/documents/{id}/tree` | Structured section tree |
+| `GET` | `/v1/documents/{id}/source` | Stream the original bytes |
+| `GET` | `/v1/sections/{id}` | One section, full content |
+| `POST` | `/v1/answer/treewalk` | **Ask** — cited answer in one round-trip |
+| `POST` | `/v1/query` | Retrieve relevant sections |
+| `POST` | `/v1/replay` | Replay any answer byte-for-byte (`trace_token`) |
-```bash
-git clone https://github.com/hallelx2/vectorless-engine.git
-cd vectorless-engine
+## SDKs
-cp config.example.yaml config.yaml
-# edit config.yaml — set your LLM API key and database URL
+Official clients for **TypeScript**, **Python**, and **Go** ([`vectorless-sdk`](https://github.com/hallelx2/vectorless-sdk)). Point them at the local engine:
-docker compose up -d postgres
-go run ./cmd/engine --config config.yaml
+```python
+from vectorless import VectorlessClient
+client = VectorlessClient(base_url="http://localhost:7654")
+doc = client.wait_for_ready(client.ingest_document("10-K.pdf").document_id)
+ans = client.answer_treewalk(doc.id, "What were FY2018 capital expenditures?",
+ llm_key="<your-llm-key>") # BYOK
+print(ans.answer, ans.citations)
```
-Or run the whole stack containerised:
-
-```bash
-export ANTHROPIC_API_KEY=sk-ant-...
-docker compose --profile engine up --build
-# engine → http://localhost:8080
-```
+## Benchmarks
-The engine listens on `:8080` by default:
+We benchmark on **FinanceBench** — SEC filings whose answers live in dense financial tables, the hard case for retrieval. The harness ([`vectorless-bench`](https://github.com/hallelx2/vectorless-bench)) scores **page/section-grounded recall of the evidence** and reports **quality alongside cost and latency** — because quality is meaningless without its price. It runs `treewalk` against a BM25 lexical floor and the upstream **PageIndex** library on equal footing (same model, same hardware, cold cache), and uses rank-based statistics across tasks rather than cherry-picked wins.
```bash
-curl http://localhost:8080/v1/health
-# {"status":"ok"}
+# reproduce — points the harness at any running engine
+VECTORLESS_BASE_URL=http://localhost:7654 \
+ vlbench run --config configs/financebench_glm_fast.yaml
+vlbench report runs/ --k 5
```
-### Ingest and query
-
-```bash
-# upload a document
-curl -X POST http://localhost:8080/v1/documents \
- -F "file=@whitepaper.pdf"
-# → {"document_id":"doc_01H...","status":"pending"}
-
-# poll until ready (status: ready)
-curl http://localhost:8080/v1/documents/doc_01H...
-
-# query it
-curl -X POST http://localhost:8080/v1/query \
- -H "Content-Type: application/json" \
- -d '{
- "document_id": "doc_01H...",
- "query": "what are the API stability guarantees?"
- }'
-# → {"sections": [{"id":"sec_...","title":"...","content":"..."}]}
-```
+Every run writes a manifest stamped with model, hardware, library versions, and git commit, so numbers are reproducible and auditable. Headline results are published with the launch.
-## HTTP API (v1)
+## Architecture
-| Method | Path | Purpose |
-|--------|-------------------------------|----------------------------------------|
-| GET | `/v1/health` | Liveness probe |
-| GET | `/v1/version` | Engine version |
-| GET | `/v1/documents` | List documents (paginated; `?status`, `?limit`, `?cursor`) |
-| POST | `/v1/documents` | Ingest a document (async, returns 202) |
-| GET | `/v1/documents/{id}` | Document metadata + status |
-| DELETE | `/v1/documents/{id}` | Delete a document |
-| GET | `/v1/documents/{id}/tree` | Full structured tree |
-| POST | `/v1/query` | Query — returns relevant sections |
-| GET | `/v1/sections/{id}` | Fetch a single section in full |
+A single Go binary with four pluggable boundaries:
-Routes are versioned under `/v1` from day one. Breaking changes ship under `/v2` with a deprecation window.
+| Boundary | Implementations |
+|----------|-----------------|
+| **Storage** | Local filesystem · S3-compatible (R2, MinIO, B2, Spaces) |
+| **Queue** | River (Postgres) · Asynq (Redis) · QStash (serverless) |
+| **LLM** | Anthropic · OpenAI · Gemini — and any Anthropic-compatible gateway (GLM/Z.ai) |
+| **Retrieval** | `treewalk` (page-based agentic — the default) · `single-pass` · `chunked-tree` |
## Configuration
-The engine reads config from `--config .yaml`, then overlays environment variables prefixed with `VLE_`. Environment variables always win.
-
-Minimal `config.yaml`:
+Config layers, in increasing priority: `--config <file>` → `VLE_*` env vars → CLI flags. See [`config.example.yaml`](config.example.yaml) for the full reference. Minimal:
```yaml
-server:
- addr: ":8080"
-
-database:
- url: "postgres://vle:vle@localhost:5432/vectorless?sslmode=disable"
-
-storage:
- driver: local # local | s3
- local:
- root: "./data"
-
-queue:
- driver: river # qstash | river | asynq
- river:
- num_workers: 8
-
+database: { url: "postgres://vectorless:vectorless@localhost:5432/vectorless?sslmode=disable" }
+storage: { driver: local, local: { root: "./data/documents" } }
+queue: { driver: river }
llm:
- driver: anthropic # anthropic | openai | gemini
- anthropic:
- api_key: "${ANTHROPIC_API_KEY}"
- model: "claude-sonnet-4-5"
- reasoning_model: "claude-opus-4-5"
-
-retrieval:
- strategy: chunked-tree # single-pass | chunked-tree
-
-log:
- level: info # debug | info | warn | error
- format: json # json | console
-```
-
-See [`config.example.yaml`](config.example.yaml) for the full reference.
-
-### TLS
-
-The engine is **plaintext HTTP by default** — the recommended production setup is to terminate TLS at a reverse proxy (Caddy, nginx, an ALB, a Kubernetes ingress, Cloudflare) so cert rotation lives outside the binary. For single-node / homelab / direct-to-internet deployments you can opt into direct TLS:
-
-```yaml
-server:
- addr: ":8443"
- tls:
- cert_file: "/etc/vectorless/cert.pem"
- key_file: "/etc/vectorless/key.pem"
- min_version: "1.2" # "1.2" | "1.3"
-```
-
-Or via environment variables: `VLE_TLS_CERT_FILE`, `VLE_TLS_KEY_FILE`.
-
-### Supported document formats
-
-| Format | Parser | Notes |
-|---|---|---|
-| Markdown | `goldmark` | ATX + Setext headings become section boundaries |
-| HTML | `golang.org/x/net/html` | Prefers ``/``; skips nav/footer/script |
-| DOCX | stdlib `archive/zip` + `encoding/xml` | `Heading 1…9` styles become section boundaries |
-| PDF | `hallelx2/pdftable` + `ledongthuc/pdf` | pdftable extracts positioned words + ruled / borderless tables (Markdown-rendered, `Metadata["table"]="true"`); font-size heuristic recovers headings; ledongthuc supplies `/Outlines` when present |
-| Text | stdlib | Single-section fallback |
-
-New parsers drop in behind a one-method `Parser` interface — see [`pkg/parser/`](pkg/parser/).
-
-## Features
-
-- ✅ Structured tree retrieval — no embeddings, no ANN index
-- ✅ Pluggable LLM providers (Anthropic, OpenAI, Gemini)
-- ✅ Pluggable queue backends (QStash, River, Asynq)
-- ✅ Pluggable storage (Local, S3-compatible)
-- ✅ Parallel map-reduce over big trees (context-budget-aware)
-- ✅ Versioned HTTP API (`/v1`) with OpenAPI spec (coming)
-- ✅ Graceful shutdown, structured logging, request IDs
-- ✅ Postgres schema + embedded migrations (pgx v5)
-- ✅ Document parsers: **Markdown · HTML · DOCX · PDF · Text**
-- ✅ Optional direct TLS (opt-in; default is plaintext behind a reverse proxy)
-- 🚧 Official SDKs — TypeScript, Python, Go (separate repos)
-- 🚧 Dockerfile + Helm chart
-- 🚧 Benchmarks vs. traditional RAG
-
-## Roadmap
-
-
-
-
-
-- **Phase 0 — scaffold** ✅ — interfaces, HTTP layer, local + QStash + Anthropic stubs
-- **Phase 1 — ingest** ✅ — parsers (MD/HTML/DOCX/PDF/TXT), tree builder, summarizer, Postgres migrations, TLS, docker
-- **Phase 2 — retrieval** 🚧 — `single-pass` and `chunked-tree` live, real LLM clients, benchmarks
-- **Phase 3 — ecosystem** ⏭ — River + Asynq live, S3 live, SDKs, Helm, goreleaser
-- **Phase 4 — scale** ⏭ — multi-document queries, streaming, caching, tree compaction
-
-**→ See [`ROADMAP.md`](ROADMAP.md) for the full task list with subtasks and checkboxes.**
-
-Track progress in [GitHub Issues](https://github.com/hallelx2/vectorless-engine/issues) and [Projects](https://github.com/hallelx2/vectorless-engine/projects).
-
-## Project layout
-
+ driver: anthropic
+ anthropic: { api_key: "${VLE_LLM_ANTHROPIC_API_KEY}", base_url: "https://api.z.ai/api/anthropic/v1", model: "glm-4.6" }
```
-cmd/engine/ # main binary entry point
-internal/
- api/ # chi HTTP router, v1 routes (private to the binary)
-pkg/
- config/ # YAML + env config with validation
- db/ # pgx pool, embedded migrations, CRUD helpers
- ingest/ # parse → persist → summarize pipeline
- parser/ # Parser interface + MD / HTML / DOCX / PDF / TXT drivers
- queue/ # Queue interface + QStash / River / Asynq drivers
- retrieval/ # Strategy interface + single-pass / chunked-tree
- storage/ # Storage interface + local / S3 drivers
- tree/ # core tree / section data model
-docs/ # API spec, architecture notes, images
-```
-
-LLM provider access lives in a separate module, [`llmgate`](https://github.com/hallelx2/llmgate),
-which the engine imports as `github.com/hallelx2/llmgate`. That's
-where Anthropic / OpenAI / Gemini clients, retry / budget / cache
-middleware, and the cost table live.
-
-## Contributing
-
-Contributions are very welcome — especially parsers, benchmarks, and new LLM / storage drivers. Please open an issue first for anything non-trivial so we can align on the design.
-- Run tests: `go test ./...`
-- Build binary: `go build -o engine ./cmd/engine`
-- Lint: `go vet ./...`
+> **Anthropic-compatible gateways (GLM/Z.ai):** `base_url` **must include `/v1`** — the client posts to `${base_url}/messages`.
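To see why the `/v1` suffix matters, the Go sketch below applies the join described in the note. The function name `messagesURL` is hypothetical and this is not the engine's or `llmgate`'s actual client code; trimming a trailing slash is an extra convenience added here, not something the note promises.

```go
package main

import (
	"fmt"
	"strings"
)

// messagesURL mirrors the rule in the note: the request goes to
// base_url + "/messages". A trailing slash on base_url is trimmed first.
func messagesURL(baseURL string) string {
	return strings.TrimRight(baseURL, "/") + "/messages"
}

func main() {
	// With /v1 in base_url the path ends in /v1/messages.
	fmt.Println(messagesURL("https://api.z.ai/api/anthropic/v1"))
	// Without it, the /v1 segment is simply missing from the request path.
	fmt.Println(messagesURL("https://api.z.ai/api/anthropic"))
}
```

Nothing in the join adds the version segment, so it has to be part of the configured `base_url`.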
-## Related projects
+### Supported formats
-- **vectorless-dashboard** *(private)* — web UI + control plane built on top of this engine
-- **vectorless-mcp** *(private)* — Model Context Protocol server for agents
-- **@vectorless/sdk-\*** *(open source, coming soon)* — TS / Python / Go SDKs
+PDF (positioned text + tables via [`pdftable`](https://github.com/hallelx2/pdftable)) · Markdown · HTML · DOCX · Text.
-## Acknowledgements
+## Related
-Inspired by prior work on tree-structured retrieval ([RAPTOR](https://arxiv.org/abs/2401.18059)), the [`llms.txt`](https://llmstxt.org) proposal, and the broader movement toward reasoning-native retrieval.
+- [`vectorless-sdk`](https://github.com/hallelx2/vectorless-sdk) — TS / Python / Go clients
+- [`vectorless-bench`](https://github.com/hallelx2/vectorless-bench) — the benchmark harness
+- [`llmgate`](https://github.com/hallelx2/llmgate) — provider clients, retries, pricing
## License
-Licensed under the [Apache License, Version 2.0](LICENSE).
+[Apache 2.0](LICENSE).
diff --git a/docs/images/banner.png b/docs/images/banner.png
new file mode 100644
index 0000000..38cf531
Binary files /dev/null and b/docs/images/banner.png differ
diff --git a/docs/images/how-it-works.svg b/docs/images/how-it-works.svg
new file mode 100644
index 0000000..f76526c
--- /dev/null
+++ b/docs/images/how-it-works.svg
@@ -0,0 +1,63 @@
+
diff --git a/docs/images/screenshot-dashboard.png b/docs/images/screenshot-dashboard.png
new file mode 100644
index 0000000..3166cf5
Binary files /dev/null and b/docs/images/screenshot-dashboard.png differ
diff --git a/docs/images/screenshot-install.png b/docs/images/screenshot-install.png
new file mode 100644
index 0000000..0033fe2
Binary files /dev/null and b/docs/images/screenshot-install.png differ
diff --git a/docs/images/vs-rag.svg b/docs/images/vs-rag.svg
new file mode 100644
index 0000000..0890b71
--- /dev/null
+++ b/docs/images/vs-rag.svg
@@ -0,0 +1,39 @@
+
diff --git a/localapp/index.html b/localapp/index.html
index dc4197c..7778af6 100644
--- a/localapp/index.html
+++ b/localapp/index.html
@@ -375,14 +375,21 @@
Model & API key
document.getElementById("q").addEventListener("keydown",e=>{ if((e.metaKey||e.ctrlKey)&&e.key==="Enter") ask(); });
async function ask(){
if(!activeDoc) return; const q=document.getElementById("q").value.trim(); if(!q) return;
- if(!getSettings().apiKey){ openSettings(); document.getElementById("setStatus").innerHTML='Set an API key to ask questions.'; return; }
const out=document.getElementById("result"), btn=document.getElementById("ask"); btn.disabled=true; pdfDoc=null;
out.innerHTML=`
`; }
finally{ btn.disabled=false; }
diff --git a/pkg/ingest/ingest.go b/pkg/ingest/ingest.go
index 373affe..78647a4 100644
--- a/pkg/ingest/ingest.go
+++ b/pkg/ingest/ingest.go
@@ -556,7 +556,11 @@ func (p *Pipeline) parse(ctx context.Context, parsers *parser.Registry, pl Paylo
// retried with short backoff rather than failing the whole document.
// Any non-ErrNotFound error returns immediately.
func getSourceWithRetry(ctx context.Context, s storage.Storage, key string) (io.ReadCloser, storage.Metadata, error) {
- const attempts = 6
+ // Up to ~16s of incremental backoff. A large source (multi-MB) written
+ // under heavy concurrent ingestion on a busy/low-disk filesystem can take
+ // several seconds to become visible to this worker; a too-short window
+ // turns that transient into a hard "object not found" failure.
+ const attempts = 16
var lastErr error
for i := 0; i < attempts; i++ {
rc, meta, err := s.Get(ctx, key)
@@ -570,7 +574,7 @@ func getSourceWithRetry(ctx context.Context, s storage.Storage, key string) (io.
select {
case <-ctx.Done():
return nil, storage.Metadata{}, ctx.Err()
- case <-time.After(time.Duration(i+1) * 150 * time.Millisecond):
+ case <-time.After(time.Duration(i+1) * 125 * time.Millisecond):
}
}
return nil, storage.Metadata{}, lastErr