GitHub - hallelx2/vectorless-engine: A retrieval engine that reasons over document structure — not embeddings. No chunking, no top-K, no vector DB.

A retrieval engine that reasons over document structure — not embeddings.
Parse a document into a tree, let an LLM navigate it, return whole sections with citations.

Quick start · How it works · Why not vector RAG · API · SDKs · Benchmarks · Config

Quick start

One container — engine, a bundled Postgres, and a web UI. The only thing you bring is an LLM key.

docker run -p 8080:8080 -p 7654:7654 \
  -e VLE_LLM_ANTHROPIC_API_KEY=<your key> \
  halleluyaholudele/vectorless
# UI  → http://localhost:8080
# API → http://localhost:7654

Open http://localhost:8080, drop in a PDF, and ask a question. You can also set your API key from the dashboard (gear icon) instead of -e — it stays in your browser and is sent per request (BYOK).

Works with GLM / Z.ai, Anthropic, OpenAI, and Gemini. The image defaults to GLM (glm-4.6) via the Anthropic-compatible gateway; override with -e VLE_LLM_*.

Run from source (Go 1.25+ · Postgres)

git clone https://github.com/hallelx2/vectorless-engine.git && cd vectorless-engine
docker compose up -d postgres
export VLE_LLM_ANTHROPIC_API_KEY=<your key>
go run ./cmd/engine --local      # zero-config local mode on :7654

How it works

At ingest the engine parses a document into a hierarchical tree of real sections — titles, page ranges, structure — and persists it. There is no chunking and nothing is embedded. At query time an LLM navigates that tree the way a person flips to the right page, reads the relevant sections in full, and answers with citations.

No embeddings — nothing to recompute when you swap models, nothing to drift.
No vector database — Postgres + object storage is enough.
No top-K tuning — the model reads 1 section or 8, as the question needs.
Citations for free — every answer carries page ranges and verbatim quotes.

Why not vector RAG

Vector RAG works until you hit the parts where it doesn't: chunks lose structure, top-K is a guess, embeddings drift, and you maintain a second database to do approximate similarity over fragments you cut out of context.

Bring your own key

The engine boots without a provider key and accepts credentials per request, so a self-hosted or Docker user configures the key from the dashboard — never baked into the server.

curl -X POST http://localhost:7654/v1/answer/treewalk \
  -H 'Content-Type: application/json' \
  -H 'X-LLM-Api-Key: <your key>' \
  -d '{"document_id":"doc_…","query":"What were FY2018 capital expenditures?"}'

Headers: X-LLM-Api-Key (required), X-LLM-Provider · X-LLM-Base-Url · X-LLM-Model (optional; inherit the server defaults).

HTTP API

Routes are versioned under /v1 from day one.

Method	Path	Purpose
`GET`	`/v1/health` · `/v1/version`	Liveness / build
`POST`	`/v1/documents`	Ingest a document (multipart; async, 202)
`GET`	`/v1/documents` · `/v1/documents/{id}`	List / get status
`GET`	`/v1/documents/{id}/tree`	Structured section tree
`GET`	`/v1/documents/{id}/source`	Stream the original bytes
`GET`	`/v1/sections/{id}`	One section, full content
`POST`	`/v1/answer/treewalk`	Ask — cited answer in one round-trip
`POST`	`/v1/query`	Retrieve relevant sections
`POST`	`/v1/replay`	Replay any answer byte-for-byte (`trace_token`)

SDKs

Official clients for TypeScript, Python, and Go (vectorless-sdk). Point them at the local engine:

from vectorless import VectorlessClient
client = VectorlessClient(base_url="http://localhost:7654")
doc = client.wait_for_ready(client.ingest_document("10-K.pdf").document_id)
ans = client.answer_treewalk(doc.id, "What were FY2018 capital expenditures?",
                             llm_key="<your key>")   # BYOK
print(ans.answer, ans.citations)

Benchmarks

We benchmark on FinanceBench — SEC filings whose answers live in dense financial tables, the hard case for retrieval. The harness (vectorless-bench) scores page/section-grounded recall of the evidence and reports quality alongside cost and latency — because quality is meaningless without its price. It runs treewalk against a BM25 lexical floor and the upstream PageIndex library on equal footing (same model, same hardware, cold cache), and uses rank-based statistics across tasks rather than cherry-picked wins.

# reproduce — points the harness at any running engine
VECTORLESS_BASE_URL=http://localhost:7654 \
  vlbench run --config configs/financebench_glm_fast.yaml
vlbench report runs/<stamp> --k 5

Every run writes a manifest stamped with model, hardware, library versions, and git commit, so numbers are reproducible and audit-able. Headline results are published with the launch.

Architecture

A single Go binary with four pluggable boundaries:

Boundary	Implementations
Storage	Local filesystem · S3-compatible (R2, MinIO, B2, Spaces)
Queue	River (Postgres) · Asynq (Redis) · QStash (serverless)
LLM	Anthropic · OpenAI · Gemini — and any Anthropic-compatible gateway (GLM/Z.ai)
Retrieval	`treewalk` (page-based agentic — the default) · `single-pass` · `chunked-tree`

Configuration

Config layers, in increasing priority: --config <yaml> → VLE_* env vars → CLI flags. See config.example.yaml for the full reference. Minimal:

database: { url: "postgres://vectorless:vectorless@localhost:5432/vectorless?sslmode=disable" }
storage:  { driver: local, local: { root: "./data/documents" } }
queue:    { driver: river }
llm:
  driver: anthropic
  anthropic: { api_key: "${VLE_LLM_ANTHROPIC_API_KEY}", base_url: "https://api.z.ai/api/anthropic/v1", model: "glm-4.6" }

Anthropic-compatible gateways (GLM/Z.ai): base_url must include /v1 — the client posts to ${base_url}/messages.

Windows + local storage: Windows Defender real-time protection scans each freshly written file and briefly hides it from os.Stat/os.Open, which under heavy concurrent ingestion can surface as transient object not found errors. The local backend rides through this window with a short internal retry, but for large bulk loads add a Defender exclusion for your storage root (local.root / VLE_STORAGE_LOCAL_ROOT): Add-MpPreference -ExclusionPath "C:\path\to\data\documents". Linux has no such scan-hold and needs no exclusion.

Supported formats

PDF (positioned text + tables via pdftable) · Markdown · HTML · DOCX · Text.

License

Apache 2.0.

Name		Name	Last commit message	Last commit date
Latest commit History 127 Commits
.github		.github
charts/vectorless-engine		charts/vectorless-engine
cmd		cmd
deploy/allinone		deploy/allinone
docs		docs
gen/vectorless/v1		gen/vectorless/v1
internal		internal
localapp		localapp
pkg		pkg
proto/vectorless/v1		proto/vectorless/v1
scripts		scripts
testdata		testdata
.dockerignore		.dockerignore
.gitattributes		.gitattributes
.gitignore		.gitignore
.golangci.yml		.golangci.yml
.goreleaser.yaml		.goreleaser.yaml
CONTRIBUTING.md		CONTRIBUTING.md
Dockerfile		Dockerfile
Dockerfile.allinone		Dockerfile.allinone
Dockerfile.server		Dockerfile.server
LICENSE		LICENSE
README.md		README.md
ROADMAP.md		ROADMAP.md
buf.gen.yaml		buf.gen.yaml
buf.yaml		buf.yaml
config.example.yaml		config.example.yaml
config.server.example.yaml		config.server.example.yaml
docker-compose.yml		docker-compose.yml
go.mod		go.mod
go.sum		go.sum
openapi.yaml		openapi.yaml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Quick start

How it works

Why not vector RAG

Bring your own key

HTTP API

SDKs

Benchmarks

Architecture

Configuration

Supported formats

Related

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Quick start

How it works

Why not vector RAG

Bring your own key

HTTP API

SDKs

Benchmarks

Architecture

Configuration

Supported formats

Related

License

About

Topics

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages