diff --git a/README.md b/README.md
index d1fc30e..6f04b1a 100644
--- a/README.md
+++ b/README.md
@@ -1,305 +1,179 @@

- vectorless-engine — reason over document structure, not embeddings
+ Vectorless — reasoning-based document retrieval

-

vectorless-engine

-

- A retrieval engine that reasons over document structure — not embeddings.
- No chunking. No top-K. No vector database. Just a tree, an LLM, and full sections.
+ A retrieval engine that reasons over document structure — not embeddings.
+ Parse a document into a tree, let an LLM navigate it, return whole sections with citations.

License: Apache 2.0 Go version
- Go Report Card
- CI
- Docker
+ Docker Hub
+ CI Stars

- Why ·
- How it works ·
 Quick start ·
- Architecture ·
- Configuration ·
- Roadmap
+ How it works ·
+ Why not vector RAG ·
+ API ·
+ SDKs ·
+ Benchmarks ·
+ Config

---
-## Why vectorless
+## Quick start
-Vector RAG works — until you hit the parts where it doesn't. Chunks lose structure. Top-K is a guess. Embeddings drift. You maintain a second database just to do approximate similarity on bits of text you cut out of context.
+One container — engine, a bundled Postgres, and a web UI. The only thing you bring is an LLM key.
-**vectorless-engine takes a different path**: at ingest, it builds a structured tree of the document (titles, summaries, metadata) — essentially an LLM-friendly table of contents. At query time, an LLM reads that tree and picks the exact section IDs it needs. The engine returns those sections in full, with their narrative intact.
+```bash
+docker run -p 8080:8080 -p 7654:7654 \
+  -e VLE_LLM_ANTHROPIC_API_KEY= \
+  halleluyaholudele/vectorless
+# UI → http://localhost:8080
+# API → http://localhost:7654
+```

- Traditional RAG vs. vectorless
+ docker run halleluyaholudele/vectorless

-**What you get:**
-
-- **No embeddings** — nothing to recompute when you swap models, nothing to drift.
-- **No vector database** — Postgres + object storage is enough.
-- **No top-K tuning** — the model picks 1 section or 8, as needed.
-- **Full context preserved** — sections are returned whole, not as fragments.
-- **Citations for free** — every returned section has a stable ID.
-- **Provider-agnostic** — Anthropic, OpenAI, Gemini all plug in behind the same interface.
+Open **http://localhost:8080**, drop in a PDF, and ask a question. You can also set your API key from the dashboard (gear icon) instead of `-e` — it stays in your browser and is sent per request ([BYOK](#bring-your-own-key)).
-## How it works
+

+ Vectorless dashboard — cited answer with source page preview
+

-### 1. Ingest — build a structured tree
-Upload a document; the engine parses it, splits it along semantic section boundaries (not blind chunks), summarizes each section with a cheap model, and persists the tree. All asynchronous, driven by the queue of your choice.
+> Works with **GLM / Z.ai**, **Anthropic**, **OpenAI**, and **Gemini**. The image defaults to GLM (`glm-4.6`) via the Anthropic-compatible gateway; override with `-e VLE_LLM_*`.
+
+Run from source (Go 1.25+ · Postgres)
-

- Ingest pipeline
-

+```bash
+git clone https://github.com/hallelx2/vectorless-engine.git && cd vectorless-engine
+docker compose up -d postgres
+export VLE_LLM_ANTHROPIC_API_KEY=
+go run ./cmd/engine --local # zero-config local mode on :7654
+```
+
-### 2. Query — the LLM reasons over the tree
+## How it works
-For small documents the whole tree fits in one prompt. For large documents the engine **splits the tree into budget-sized slices, fires parallel LLM calls, and merges the results** — so you're never bottlenecked on a single model's context window.
+At **ingest** the engine parses a document into a hierarchical tree of real sections — titles, page ranges, structure — and persists it. There is **no chunking** and **nothing is embedded**. At **query** time an LLM navigates that tree the way a person flips to the right page, reads the relevant sections in full, and answers with citations.

- Chunked-tree parallel reasoning
+ Ingest → navigate → read → cited answer

-### 3. Return — full sections, not fragments
+- **No embeddings** — nothing to recompute when you swap models, nothing to drift.
+- **No vector database** — Postgres + object storage is enough.
+- **No top-K tuning** — the model reads 1 section or 8, as the question needs.
+- **Citations for free** — every answer carries page ranges and verbatim quotes.
-The engine fetches the selected sections from object storage and returns them intact. Your downstream model (or agent) gets coherent, cite-able text — not a bag of chunks.
+## Why not vector RAG
-## Architecture
+Vector RAG works until you hit the parts where it doesn't: chunks lose structure, top-K is a guess, embeddings drift, and you maintain a second database to do approximate similarity over fragments you cut out of context.

- vectorless-engine architecture
+ Traditional vector RAG vs Vectorless

-The engine is a **single Go binary** with four pluggable boundaries:
+## Bring your own key
-| Boundary | Implementations shipped |
-|------------------|---------------------------------------------------------------------------------|
-| **Storage** | Local filesystem · S3-compatible (AWS S3, Cloudflare R2, MinIO, Backblaze B2, DigitalOcean Spaces) — GCS / Azure planned |
-| **Queue** | [QStash](https://upstash.com/docs/qstash) (serverless) · [River](https://riverqueue.com) (Postgres) · [Asynq](https://github.com/hibiken/asynq) (Redis) |
-| **LLM** | Anthropic · OpenAI · Gemini — all behind one `llm.Client` interface |
-| **Retrieval** | `single-pass` (small trees, one call) · `chunked-tree` (big trees, parallel map-reduce) |
+The engine boots **without** a provider key and accepts credentials per request, so a self-hosted or Docker user configures the key from the dashboard — never baked into the server.
-Everything else — the control plane, dashboard, MCP server, SDKs — lives outside this repo and talks to the engine over its HTTP API. Run the engine standalone; run it behind your own control plane; embed it in your product. Up to you.
+```bash
+curl -X POST http://localhost:7654/v1/answer/treewalk \
+  -H 'Content-Type: application/json' \
+  -H 'X-LLM-Api-Key: ' \
+  -d '{"document_id":"doc_…","query":"What were FY2018 capital expenditures?"}'
+```
-## Quick start
+Headers: `X-LLM-Api-Key` (required), `X-LLM-Provider` · `X-LLM-Base-Url` · `X-LLM-Model` (optional; inherit the server defaults).
-### Prerequisites
+## HTTP API
-- Go 1.25+
-- Postgres 15+ (for the job queue + document metadata)
-- An API key from Anthropic, OpenAI, or Google
+Routes are versioned under `/v1` from day one.
-### Run locally
+| Method | Path | Purpose |
+|--------|------|---------|
+| `GET` | `/v1/health` · `/v1/version` | Liveness / build |
+| `POST` | `/v1/documents` | Ingest a document (multipart; async, 202) |
+| `GET` | `/v1/documents` · `/v1/documents/{id}` | List / get status |
+| `GET` | `/v1/documents/{id}/tree` | Structured section tree |
+| `GET` | `/v1/documents/{id}/source` | Stream the original bytes |
+| `GET` | `/v1/sections/{id}` | One section, full content |
+| `POST` | `/v1/answer/treewalk` | **Ask** — cited answer in one round-trip |
+| `POST` | `/v1/query` | Retrieve relevant sections |
+| `POST` | `/v1/replay` | Replay any answer byte-for-byte (`trace_token`) |
-```bash
-git clone https://github.com/hallelx2/vectorless-engine.git
-cd vectorless-engine
+## SDKs
-cp config.example.yaml config.yaml
-# edit config.yaml — set your LLM API key and database URL
+Official clients for **TypeScript**, **Python**, and **Go** ([`vectorless-sdk`](https://github.com/hallelx2/vectorless-sdk)). Point them at the local engine:
-docker compose up -d postgres
-go run ./cmd/engine --config config.yaml
+```python
+from vectorless import VectorlessClient
+client = VectorlessClient(base_url="http://localhost:7654")
+doc = client.wait_for_ready(client.ingest_document("10-K.pdf").document_id)
+ans = client.answer_treewalk(doc.id, "What were FY2018 capital expenditures?",
+                             llm_key="") # BYOK
+print(ans.answer, ans.citations)
 ```
-Or run the whole stack containerised:
-
-```bash
-export ANTHROPIC_API_KEY=sk-ant-...
-docker compose --profile engine up --build
-# engine → http://localhost:8080
-```
+## Benchmarks
-The engine listens on `:8080` by default:
+We benchmark on **FinanceBench** — SEC filings whose answers live in dense financial tables, the hard case for retrieval.
The harness ([`vectorless-bench`](https://github.com/hallelx2/vectorless-bench)) scores **page/section-grounded recall of the evidence** and reports **quality alongside cost and latency** — because quality is meaningless without its price. It runs `treewalk` against a BM25 lexical floor and the upstream **PageIndex** library on equal footing (same model, same hardware, cold cache), and uses rank-based statistics across tasks rather than cherry-picked wins.
 ```bash
-curl http://localhost:8080/v1/health
-# {"status":"ok"}
+# reproduce — points the harness at any running engine
+VECTORLESS_BASE_URL=http://localhost:7654 \
+  vlbench run --config configs/financebench_glm_fast.yaml
+vlbench report runs/ --k 5
 ```
-### Ingest and query
-
-```bash
-# upload a document
-curl -X POST http://localhost:8080/v1/documents \
-  -F "file=@whitepaper.pdf"
-# → {"document_id":"doc_01H...","status":"pending"}
-
-# poll until ready (status: ready)
-curl http://localhost:8080/v1/documents/doc_01H...
-
-# query it
-curl -X POST http://localhost:8080/v1/query \
-  -H "Content-Type: application/json" \
-  -d '{
-    "document_id": "doc_01H...",
-    "query": "what are the API stability guarantees?"
-  }'
-# → {"sections": [{"id":"sec_...","title":"...","content":"..."}]}
-```
+Every run writes a manifest stamped with model, hardware, library versions, and git commit, so numbers are reproducible and audit-able. Headline results are published with the launch.
-## HTTP API (v1)
+## Architecture
-| Method | Path | Purpose |
-|--------|-------------------------------|----------------------------------------|
-| GET | `/v1/health` | Liveness probe |
-| GET | `/v1/version` | Engine version |
-| GET | `/v1/documents` | List documents (paginated; `?status`, `?limit`, `?cursor`) |
-| POST | `/v1/documents` | Ingest a document (async, returns 202) |
-| GET | `/v1/documents/{id}` | Document metadata + status |
-| DELETE | `/v1/documents/{id}` | Delete a document |
-| GET | `/v1/documents/{id}/tree` | Full structured tree |
-| POST | `/v1/query` | Query — returns relevant sections |
-| GET | `/v1/sections/{id}` | Fetch a single section in full |
+A single Go binary with four pluggable boundaries:
-Routes are versioned under `/v1` from day one. Breaking changes ship under `/v2` with a deprecation window.
+| Boundary | Implementations |
+|----------|-----------------|
+| **Storage** | Local filesystem · S3-compatible (R2, MinIO, B2, Spaces) |
+| **Queue** | River (Postgres) · Asynq (Redis) · QStash (serverless) |
+| **LLM** | Anthropic · OpenAI · Gemini — and any Anthropic-compatible gateway (GLM/Z.ai) |
+| **Retrieval** | `treewalk` (page-based agentic — the default) · `single-pass` · `chunked-tree` |
 ## Configuration
-The engine reads config from `--config .yaml`, then overlays environment variables prefixed with `VLE_`. Environment variables always win.
-
-Minimal `config.yaml`:
+Config layers, in increasing priority: `--config ` → `VLE_*` env vars → CLI flags. See [`config.example.yaml`](config.example.yaml) for the full reference.
Minimal:
 ```yaml
-server:
-  addr: ":8080"
-
-database:
-  url: "postgres://vle:vle@localhost:5432/vectorless?sslmode=disable"
-
-storage:
-  driver: local # local | s3
-  local:
-    root: "./data"
-
-queue:
-  driver: river # qstash | river | asynq
-  river:
-    num_workers: 8
-
+database: { url: "postgres://vectorless:vectorless@localhost:5432/vectorless?sslmode=disable" }
+storage: { driver: local, local: { root: "./data/documents" } }
+queue: { driver: river }
 llm:
-  driver: anthropic # anthropic | openai | gemini
-  anthropic:
-    api_key: "${ANTHROPIC_API_KEY}"
-    model: "claude-sonnet-4-5"
-    reasoning_model: "claude-opus-4-5"
-
-retrieval:
-  strategy: chunked-tree # single-pass | chunked-tree
-
-log:
-  level: info # debug | info | warn | error
-  format: json # json | console
-```
-
-See [`config.example.yaml`](config.example.yaml) for the full reference.
-
-### TLS
-
-The engine is **plaintext HTTP by default** — the recommended production setup is to terminate TLS at a reverse proxy (Caddy, nginx, an ALB, a Kubernetes ingress, Cloudflare) so cert rotation lives outside the binary. For single-node / homelab / direct-to-internet deployments you can opt into direct TLS:
-
-```yaml
-server:
-  addr: ":8443"
-  tls:
-    cert_file: "/etc/vectorless/cert.pem"
-    key_file: "/etc/vectorless/key.pem"
-    min_version: "1.2" # "1.2" | "1.3"
-```
-
-Or via environment variables: `VLE_TLS_CERT_FILE`, `VLE_TLS_KEY_FILE`.
-
-### Supported document formats
-
-| Format | Parser | Notes |
-|---|---|---|
-| Markdown | `goldmark` | ATX + Setext headings become section boundaries |
-| HTML | `golang.org/x/net/html` | Prefers `
`/`
`; skips nav/footer/script |
-| DOCX | stdlib `archive/zip` + `encoding/xml` | `Heading 1…9` styles become section boundaries |
-| PDF | `hallelx2/pdftable` + `ledongthuc/pdf` | pdftable extracts positioned words + ruled / borderless tables (Markdown-rendered, `Metadata["table"]="true"`); font-size heuristic recovers headings; ledongthuc supplies `/Outlines` when present |
-| Text | stdlib | Single-section fallback |
-
-New parsers drop in behind a one-method `Parser` interface — see [`pkg/parser/`](pkg/parser/).
-
-## Features
-
-- ✅ Structured tree retrieval — no embeddings, no ANN index
-- ✅ Pluggable LLM providers (Anthropic, OpenAI, Gemini)
-- ✅ Pluggable queue backends (QStash, River, Asynq)
-- ✅ Pluggable storage (Local, S3-compatible)
-- ✅ Parallel map-reduce over big trees (context-budget-aware)
-- ✅ Versioned HTTP API (`/v1`) with OpenAPI spec (coming)
-- ✅ Graceful shutdown, structured logging, request IDs
-- ✅ Postgres schema + embedded migrations (pgx v5)
-- ✅ Document parsers: **Markdown · HTML · DOCX · PDF · Text**
-- ✅ Optional direct TLS (opt-in; default is plaintext behind a reverse proxy)
-- 🚧 Official SDKs — TypeScript, Python, Go (separate repos)
-- 🚧 Dockerfile + Helm chart
-- 🚧 Benchmarks vs. traditional RAG
-
-## Roadmap
-

- Roadmap
-

-- **Phase 0 — scaffold** ✅ — interfaces, HTTP layer, local + QStash + Anthropic stubs
-- **Phase 1 — ingest** ✅ — parsers (MD/HTML/DOCX/PDF/TXT), tree builder, summarizer, Postgres migrations, TLS, docker
-- **Phase 2 — retrieval** 🚧 — `single-pass` and `chunked-tree` live, real LLM clients, benchmarks
-- **Phase 3 — ecosystem** ⏭ — River + Asynq live, S3 live, SDKs, Helm, goreleaser
-- **Phase 4 — scale** ⏭ — multi-document queries, streaming, caching, tree compaction
-
-**→ See [`ROADMAP.md`](ROADMAP.md) for the full task list with subtasks and checkboxes.**
-
-Track progress in [GitHub Issues](https://github.com/hallelx2/vectorless-engine/issues) and [Projects](https://github.com/hallelx2/vectorless-engine/projects).
-
-## Project layout
-
+  driver: anthropic
+  anthropic: { api_key: "${VLE_LLM_ANTHROPIC_API_KEY}", base_url: "https://api.z.ai/api/anthropic/v1", model: "glm-4.6" }
 ```
-cmd/engine/ # main binary entry point
-internal/
-  api/ # chi HTTP router, v1 routes (private to the binary)
-pkg/
-  config/ # YAML + env config with validation
-  db/ # pgx pool, embedded migrations, CRUD helpers
-  ingest/ # parse → persist → summarize pipeline
-  parser/ # Parser interface + MD / HTML / DOCX / PDF / TXT drivers
-  queue/ # Queue interface + QStash / River / Asynq drivers
-  retrieval/ # Strategy interface + single-pass / chunked-tree
-  storage/ # Storage interface + local / S3 drivers
-  tree/ # core tree / section data model
-docs/ # API spec, architecture notes, images
-```
-
-LLM provider access lives in a separate module, [`llmgate`](https://github.com/hallelx2/llmgate),
-which the engine imports as `github.com/hallelx2/llmgate`. That's
-where Anthropic / OpenAI / Gemini clients, retry / budget / cache
-middleware, and the cost table live.
-
-## Contributing
-
-Contributions are very welcome — especially parsers, benchmarks, and new LLM / storage drivers. Please open an issue first for anything non-trivial so we can align on the design.
-- Run tests: `go test ./...`
-- Build binary: `go build -o engine ./cmd/engine`
-- Lint: `go vet ./...`
+> **Anthropic-compatible gateways (GLM/Z.ai):** `base_url` **must include `/v1`** — the client posts to `${base_url}/messages`.
-## Related projects
+### Supported formats
-- **vectorless-dashboard** *(private)* — web UI + control plane built on top of this engine
-- **vectorless-mcp** *(private)* — Model Context Protocol server for agents
-- **@vectorless/sdk-\*** *(open source, coming soon)* — TS / Python / Go SDKs
+PDF (positioned text + tables via [`pdftable`](https://github.com/hallelx2/pdftable)) · Markdown · HTML · DOCX · Text.
-## Acknowledgements
+## Related
-Inspired by prior work on tree-structured retrieval ([RAPTOR](https://arxiv.org/abs/2401.18059)), the [`llms.txt`](https://llmstxt.org) proposal, and the broader movement toward reasoning-native retrieval.
+- [`vectorless-sdk`](https://github.com/hallelx2/vectorless-sdk) — TS / Python / Go clients
+- [`vectorless-bench`](https://github.com/hallelx2/vectorless-bench) — the benchmark harness
+- [`llmgate`](https://github.com/hallelx2/llmgate) — provider clients, retries, pricing
 ## License
-Licensed under the [Apache License, Version 2.0](LICENSE).
+[Apache 2.0](LICENSE).
diff --git a/docs/images/banner.png b/docs/images/banner.png
new file mode 100644
index 0000000..38cf531
Binary files /dev/null and b/docs/images/banner.png differ
diff --git a/docs/images/how-it-works.svg b/docs/images/how-it-works.svg
new file mode 100644
index 0000000..f76526c
--- /dev/null
+++ b/docs/images/how-it-works.svg
@@ -0,0 +1,63 @@
[SVG markup not preserved. Figure text: "HOW IT WORKS: Reason over structure, not embeddings"; four panels (01 · INGEST, Parse to a tree; 02 · NAVIGATE, LLM walks it; 03 · READ, Full sections; 04 · ANSWER, Cited answer); footer "Postgres + object storage. No vector database, no ANN index, nothing to recompute when you swap models."]
diff --git a/docs/images/screenshot-dashboard.png b/docs/images/screenshot-dashboard.png
new file mode 100644
index 0000000..3166cf5
Binary files /dev/null and b/docs/images/screenshot-dashboard.png differ
diff --git a/docs/images/screenshot-install.png b/docs/images/screenshot-install.png
new file mode 100644
index 0000000..0033fe2
Binary files /dev/null and b/docs/images/screenshot-install.png differ
diff --git a/docs/images/vs-rag.svg b/docs/images/vs-rag.svg
new file mode 100644
index 0000000..0890b71
--- /dev/null
+++ b/docs/images/vs-rag.svg
@@ -0,0 +1,39 @@
[SVG markup not preserved. Figure text: two columns, "TRADITIONAL VECTOR RAG: Cut it up, hope for the best" (footer "Lossy · approximate · a second database to run") and "VECTORLESS: Keep the structure, reason over it" (footer "Structure-preserving · exact · just Postgres + storage").]
diff --git a/localapp/index.html b/localapp/index.html
index dc4197c..7778af6 100644
--- a/localapp/index.html
+++ b/localapp/index.html
@@ -375,14 +375,21 @@

Model & API key

document.getElementById("q").addEventListener("keydown",e=>{ if((e.metaKey||e.ctrlKey)&&e.key==="Enter") ask(); }); async function ask(){ if(!activeDoc) return; const q=document.getElementById("q").value.trim(); if(!q) return;
- if(!getSettings().apiKey){ openSettings(); document.getElementById("setStatus").innerHTML='Set an API key to ask questions.'; return; }
const out=document.getElementById("result"), btn=document.getElementById("ask"); btn.disabled=true; pdfDoc=null; out.innerHTML=`
Navigating the document…
`; const t0=performance.now(); try{ const r=await fetch(E("/v1/answer/treewalk"),{method:"POST",headers:{"Content-Type":"application/json",...llmHeaders()},body:JSON.stringify({document_id:activeDoc.id,query:q})}); const d=await r.json();
- if(!r.ok){ out.innerHTML=`
Error: ${esc(d.error||JSON.stringify(d))}
`; return; }
+ if(!r.ok){
+   const msg=d.error||JSON.stringify(d);
+   if(/no LLM credentials|X-LLM-Api-Key/i.test(msg)){
+     out.innerHTML=""; openSettings();
+     document.getElementById("setStatus").innerHTML='Set your API key to ask questions.';
+     return;
+   }
+   out.innerHTML=`
Error: ${esc(msg)}
`; return;
+ }
renderResult(d,Math.round(performance.now()-t0)); }catch(e){ out.innerHTML=`
${esc(String(e))}
`; } finally{ btn.disabled=false; }
diff --git a/pkg/ingest/ingest.go b/pkg/ingest/ingest.go
index 373affe..78647a4 100644
--- a/pkg/ingest/ingest.go
+++ b/pkg/ingest/ingest.go
@@ -556,7 +556,11 @@ func (p *Pipeline) parse(ctx context.Context, parsers *parser.Registry, pl Paylo
 // retried with short backoff rather than failing the whole document.
 // Any non-ErrNotFound error returns immediately.
 func getSourceWithRetry(ctx context.Context, s storage.Storage, key string) (io.ReadCloser, storage.Metadata, error) {
-	const attempts = 6
+	// Up to ~16s of incremental backoff. A large source (multi-MB) written
+	// under heavy concurrent ingestion on a busy/low-disk filesystem can take
+	// several seconds to become visible to this worker; a too-short window
+	// turns that transient into a hard "object not found" failure.
+	const attempts = 16
 	var lastErr error
 	for i := 0; i < attempts; i++ {
 		rc, meta, err := s.Get(ctx, key)
@@ -570,7 +574,7 @@ func getSourceWithRetry(ctx context.Context, s storage.Storage, key string) (io.
 		select {
 		case <-ctx.Done():
 			return nil, storage.Metadata{}, ctx.Err()
-		case <-time.After(time.Duration(i+1) * 150 * time.Millisecond):
+		case <-time.After(time.Duration(i+1) * 125 * time.Millisecond):
 		}
 	}
 	return nil, storage.Metadata{}, lastErr