GraphAnything

Turn anything into a navigable knowledge graph. Markdown vaults, OpenAPI specs, contracts, meeting notes, chat transcripts — all converge on the same {nodes, edges} schema with full provenance, versioning, federation, and quality reporting.

LLM-driven extraction goes through GraphAnything's built-in OpenAI-compatible client, so you can point it at any chat.completions-shaped endpoint:

Local vLLM serve, llama.cpp, Ollama, LM Studio
OpenAI itself, or any commercial OpenAI-compatible host

Chinese version (中文版) → README.zh.md

At a glance


Schema presets	10 (chat-log / codebase / contracts / db-schema / fstree / meeting / obsidian-vault / openapi / papers / pr-review)
Extractors	8 (markdown / json-yaml / openapi / fstree / chatlog / llm-entity / vlm (stub) / noop)
Render formats	9 (mermaid / html / svg / cypher / graphml / ascii / json / canvas / timeline)
MCP tools	17
CLI sub-commands	19
LLM backend	any OpenAI-compatible endpoint (HTTP)
External API	optional — read-only / rule-based paths need no LLM

Three drivers, one core

              ┌──────────────────────────────────────────┐
              │   GraphAnything (core)                   │
              │   Session state machine + 8 extractors   │
              │   + 10 schema presets + 9 viz formats    │
              │   + temporal + federate + ask + quality. │
              └──────────────────────────────────────────┘
                  ▲              ▲              ▲
                  │              │              │
        ┌─────────┴────┐  ┌──────┴────┐  ┌──────┴──────────┐
        │  CLI / REPL  │  │  Skill    │  │  llm_client.py  │
        │ graphanything│  │ /graphany.│  │ (OpenAI-compat) │
        └──────────────┘  └───────────┘  └─────────────────┘

Install

cd GraphAnything
pip install -e .

This registers the graphanything console-script. As a fallback, python -m GraphAnything.cli ... always works.

Optional extras:

pip install -e ".[mcp]"     # MCP stdio server (Claude Code / Cursor / Gemini CLI)
pip install -e ".[neo4j]"   # direct push to a running Neo4j instance
pip install -e ".[svg]"     # SVG renderer (matplotlib)
pip install -e ".[repl]"    # nicer REPL (history + completion)
pip install -e ".[all]"     # everything above

For LLM-gated commands (refine --llm, sample --extractor llm-entity, ask --llm, eval --llm), point the client at any OpenAI-compatible chat-completions endpoint:

# vLLM serve / llama.cpp / Ollama / LM Studio / OpenAI / …
export GA_API_BASE=http://localhost:8000/v1     # default
export GA_MODEL=Qwen3-32B-Instruct              # required
export GA_API_KEY=local                         # optional; many local servers
                                                # accept any string

# Legacy upstream env vars are also honoured:
#   OPENAI_API_BASE / OPENAI_API_KEY / OPENAI_MODEL
#   API_BASE        / API_KEY        / SUMMARY_MODEL_NAME

Rule-based extractors (markdown, json-yaml, openapi, fstree, chatlog) and all read-only commands (render, explain, ask without --llm, versions, diff, federate, eval without --llm) need no LLM at all.

CLI quickstart

# One-shot end-to-end
graphanything new ./vault/ --preset obsidian-vault --auto

# Step-by-step (recommended for non-trivial corpora)
graphanything new ./contracts --preset contracts
graphanything sample --n 5
graphanything review --merge ABC_Corp,abc_corp,ABC公司
graphanything refine "add GoverningLaw entity"
graphanything run

# Query
graphanything ask "all clauses with amount > 100k"
graphanything explain ep_api_get_user

# Versioning (incremental re-extract on changed files only)
graphanything update                # re-hash all inputs; redo only changed
graphanything versions              # list snapshots written so far
graphanything diff 1 2              # what changed between v1 and v2

# Federation
graphanything federate g1.json g2.json --out universe.json --fuzzy

# Quality
graphanything eval --out-dir graphanything-out/quality --llm --judge-n 20

# Render (9 formats)
graphanything render --fmt mermaid                       # for Claude / chat
graphanything render --fmt cypher  --out g.cypher        # → Neo4j
graphanything render --fmt graphml --out g.graphml       # → Gephi
graphanything render --fmt html    --out g.html          # standalone, force-directed
graphanything render --fmt json    --out g.json          # NetworkX JSON
graphanything render --fmt timeline --out timeline.html  # X = year, Y = community
graphanything render --fmt canvas  --out g.canvas        # Obsidian Canvas
graphanything render --fmt ascii                         # piped into terminal
graphanything render --fmt svg     --out g.svg
graphanything render --fmt mermaid --budget-tokens 4000  # PageRank-prune to fit
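
The last render example mentions PageRank-pruning a graph to fit a token budget. How GraphAnything scores and trims nodes is not documented here, so the following is only a minimal sketch of the general idea, assuming a plain power-iteration PageRank and a crude characters-per-token estimate. The names `pagerank` and `prune_to_budget` and the cost heuristic are hypothetical, not the project's API.

```python
def pagerank(nodes, edges, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over directed (source, target) pairs."""
    n = len(nodes)
    if n == 0:
        return {}
    out_links = {node: [] for node in nodes}
    for src, tgt in edges:
        if src in out_links and tgt in out_links:
            out_links[src].append(tgt)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing edges is spread evenly.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for src in nodes:
            if out_links[src]:
                share = damping * rank[src] / len(out_links[src])
                for tgt in out_links[src]:
                    new_rank[tgt] += share
        rank = new_rank
    return rank


def prune_to_budget(nodes, edges, budget_tokens, chars_per_token=4):
    """Keep the highest-ranked nodes whose labels fit the assumed token budget."""
    rank = pagerank(nodes, edges)
    kept, used = [], 0
    for node in sorted(nodes, key=lambda v: rank[v], reverse=True):
        cost = max(1, len(node) // chars_per_token)  # rough estimate, not a tokenizer
        if used + cost > budget_tokens:
            break
        kept.append(node)
        used += cost
    kept_set = set(kept)
    kept_edges = [(s, t) for s, t in edges if s in kept_set and t in kept_set]
    return kept, kept_edges


nodes = ["hub", "a", "b", "c"]
edges = [("a", "hub"), ("b", "hub"), ("c", "hub")]
kept, kept_edges = prune_to_budget(nodes, edges, budget_tokens=2)
```

In this toy graph every edge points at `hub`, so it ranks first and survives any budget that admits at least one node.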

All 19 CLI sub-commands

Sub-command	Purpose
`new <inputs>`	Open a session; `--preset NAME`, `--extractor NAME`, `--auto`, budget caps
`propose [--n N] [--llm]`	Suggest an initial schema (rule-derived if possible, else generic / LLM)
`refine "<instruction>" [--llm]`	Edit the schema (regex first, LLM fallback)
`sample [--n N] [--extractor NAME]`	Extract from N inputs into `pending`, propose merges
`review`	`--accept-all` / `--accept ID...` / `--reject ID... [--reason ...]` / `--merge a,b[,c]`
`run [--out DIR] [--extractor NAME]`	Lock schema → run all inputs → write `graph.json` + snapshot
`update [--out DIR] [--extractor NAME]`	Re-extract only inputs whose `source_hash` changed; new snapshot
`versions [--out-root R]`	List snapshots written by `run` / `update`
`diff <v_old> <v_new>`	Diff two snapshots (added / removed / modified nodes & edges)
`ask "<question>" [--llm]`	NL query → graph traversal (regex first, LLM fallback)
`explain <id\|label\|"src → rel → tgt">`	Provenance for a node or edge
`render --fmt FMT [--out PATH] [--graph G] [--budget-tokens N]`	9 formats (see below)
`federate g1 g2... --out U [--fuzzy] [--fuzzy-threshold T] [--llm]`	Merge multiple graphs into one universe
`eval [--out-dir D] [--llm] [--judge-n N] [--graph G]`	Coverage / dedup / per-extractor / sampled LLM-judge
`presets`	List the 10 built-in schema presets
`extractors`	List the 8 registered extractors
`sessions`	List sessions in `graphanything-out/sessions/`
`use <session_id>`	Switch the active session pointer
`repl [<inputs>]`	Interactive shell (history + completion via `prompt_toolkit` if installed)
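
The `diff` row above reports added, removed, and modified nodes and edges. As a rough illustration of that kind of comparison, here is a minimal sketch that assumes each snapshot is a dict with `nodes` keyed by `id` and `edges` identified by a `source` / `relation` / `target` triple. Those edge field names and the `diff_graphs` helper are assumptions for illustration, not the project's implementation.

```python
def diff_graphs(old, new):
    """Compare two {nodes, edges} snapshots and report what changed."""
    def index_nodes(graph):
        return {n["id"]: n for n in graph.get("nodes", [])}

    def index_edges(graph):
        # Assumed edge identity: (source, relation, target).
        return {(e["source"], e["relation"], e["target"]): e
                for e in graph.get("edges", [])}

    def compare(before, after):
        return {
            "added": sorted(k for k in after if k not in before),
            "removed": sorted(k for k in before if k not in after),
            "modified": sorted(k for k in after
                               if k in before and after[k] != before[k]),
        }

    return {
        "nodes": compare(index_nodes(old), index_nodes(new)),
        "edges": compare(index_edges(old), index_edges(new)),
    }


v1 = {"nodes": [{"id": "a", "label": "A"}, {"id": "b", "label": "B"}],
      "edges": [{"source": "a", "relation": "links_to", "target": "b"}]}
v2 = {"nodes": [{"id": "a", "label": "A (renamed)"}, {"id": "c", "label": "C"}],
      "edges": []}
result = diff_graphs(v1, v2)
```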

Top-level flags (apply to every sub-command):

--sessions-dir PATH — where session JSONs live (default graphanything-out/sessions/).
--session ID — override the "current session" pointer for one call.

new and repl accept budget soft caps:

--max-tokens N — total token ceiling
--max-dollars D — total $ ceiling
--max-api-calls N — total LLM-call ceiling

run / sample stop early when any cap is exceeded; remaining inputs are listed in the result notes.
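
To make the soft-cap behaviour concrete, here is a small sketch of a loop that stops once any ceiling has been exceeded and records the inputs it skipped. The `Budget` class, `run_with_budget`, and the shape of the notes are hypothetical; the real Session may track usage differently.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    max_tokens: float = float("inf")
    max_dollars: float = float("inf")
    max_api_calls: float = float("inf")
    tokens: int = 0
    dollars: float = 0.0
    api_calls: int = 0

    def exceeded(self):
        return (self.tokens > self.max_tokens
                or self.dollars > self.max_dollars
                or self.api_calls > self.max_api_calls)


def run_with_budget(inputs, extract, budget):
    """Process inputs in order; stop once any cap has been exceeded."""
    done, notes = [], []
    for i, item in enumerate(inputs):
        if budget.exceeded():
            notes.append("budget exceeded; skipped: " + ", ".join(inputs[i:]))
            break
        tokens, dollars = extract(item)  # extract reports its own usage
        budget.tokens += tokens
        budget.dollars += dollars
        budget.api_calls += 1
        done.append(item)
    return done, notes


# Stand-in extractor that "costs" 600 tokens and $0.01 per input.
done, notes = run_with_budget(
    ["a.md", "b.md", "c.md", "d.md"],
    lambda item: (600, 0.01),
    Budget(max_tokens=1000),
)
```

Because the check happens before each input, the cap is soft: the second input pushes usage to 1200 tokens, and only then does the loop stop.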

Skill / MCP server (Claude Code, Cursor, Gemini, …)

The same Session core, accessed through 17 MCP tools:

Tool	Purpose
`graphanything_open_session`	Start a session over inputs
`graphanything_list_presets`	10 built-in schema templates
`graphanything_list_extractors`	8 extractors (rule + LLM + VLM stub)
`graphanything_propose_schema`	Suggest a starting schema
`graphanything_refine_schema`	Edit schema (regex + LLM fallback)
`graphanything_sample`	Extract from N inputs into `pending`
`graphanything_review`	Apply `accept_all` / `accept` / `reject` / `merge` / `rule` actions
`graphanything_run`	Full extraction → `graph.json` + snapshot
`graphanything_status`	Counts + cost + schema
`graphanything_ask`	Natural-language query
`graphanything_explain`	Provenance for one node / edge
`graphanything_update`	Incremental re-extract on changed files
`graphanything_versions`	List graph snapshots
`graphanything_diff`	Diff two snapshots
`graphanything_federate`	Combine multiple graphs
`graphanything_eval`	Coverage / dedup / quality report
`graphanything_render`	Mermaid / HTML / SVG / Cypher / GraphML / ASCII / JSON / Canvas / Timeline

Start the server:

python -m GraphAnything.serve

Wire into Claude Code / Cursor / Gemini CLI by registering the same process in your MCP config (~/.claude.json / .mcp.json / equivalent):

{
  "mcpServers": {
    "graphanything": {
      "command": "python",
      "args": ["-m", "GraphAnything.serve"],
      "env": {
        "GA_API_BASE": "http://localhost:8000/v1",
        "GA_MODEL": "Qwen3-32B-Instruct",
        "GA_API_KEY": "local"
      }
    }
  }
}

10 schema presets

graphanything presets lists them; graphanything new --preset NAME applies one. Drop your own YAML in GraphAnything/schemas/<name>.yaml to register a new one.

Preset	Domain
`chat-log`	Slack / Claude Code `.jsonl` / Discord → user / message / tool
`codebase`	Source repo → module / file / class / function / import / call
`contracts`	Legal contracts → party / clause / date / amount / governing law
`db-schema`	DDL / migrations / ORM → table / column / FK / index
`fstree`	Plain filesystem → directory / file / symlink
`meeting`	Meeting notes → person / topic / decision / action item
`obsidian-vault`	Obsidian / Notion vault → note / tag / wikilink / backlink
`openapi`	OpenAPI 2.x/3.x spec → endpoint / schema / ref / security
`papers`	Generic LLM-driven paper extraction
`pr-review`	GitHub PR trail → file / function / reviewer / concern

8 built-in extractors

graphanything extractors lists them. A suffix-based dispatcher picks one per input unless --extractor NAME overrides it.

Extractor	LLM?	Handles	Notes
`markdown`	❌	`.md`, `.markdown`	Note / Heading / Tag / WikiLink
`json-yaml`	❌	`.json`, `.ndjson`, `.yaml`, `.yml`, `.toml`	Generic config tree + `$ref`
`openapi`	❌	`.yaml`, `.yml`, `.json`	API / Endpoint / Schema / Parameter (force via `--extractor openapi`)
`fstree`	❌	directories	Directory / File / Symlink
`chatlog`	❌	`.jsonl`, `.txt`, `.log`	Channel / User / Message / Tool / ToolCall
`llm-entity`	✅	`*` (any text)	Generic entity / relation, with `evidence_span` + `rationale`
`vlm`	✅	`.pdf`, `.png`, `.jpg`, `.jpeg`	Stub — install a plugin to enable
`noop`	❌	`*`	Empty graph; for tests
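
The dispatch rule can be pictured as a suffix lookup with an explicit override. The sketch below copies the suffixes from the "Handles" column and makes two assumptions that the table does not confirm: `.yaml` / `.yml` / `.json` default to `json-yaml` (since `openapi` has to be forced), and unknown suffixes fall back to `llm-entity`. `pick_extractor` is a hypothetical helper, not the project's dispatcher.

```python
from pathlib import Path

# Suffix -> extractor name, taken from the "Handles" column.
SUFFIX_TO_EXTRACTOR = {
    ".md": "markdown", ".markdown": "markdown",
    ".json": "json-yaml", ".ndjson": "json-yaml",
    ".yaml": "json-yaml", ".yml": "json-yaml", ".toml": "json-yaml",
    ".jsonl": "chatlog", ".txt": "chatlog", ".log": "chatlog",
    ".pdf": "vlm", ".png": "vlm", ".jpg": "vlm", ".jpeg": "vlm",
}


def pick_extractor(path, override=None, fallback="llm-entity"):
    """Return the extractor name for a path; an explicit override always wins."""
    if override:
        return override
    p = Path(path)
    if p.is_dir():
        return "fstree"
    # The fallback for unknown suffixes is an assumption.
    return SUFFIX_TO_EXTRACTOR.get(p.suffix.lower(), fallback)
```

For example, `pick_extractor("api.yaml")` yields `json-yaml` under these assumptions, while `pick_extractor("api.yaml", override="openapi")` mirrors `--extractor openapi`.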

Adding a new extractor (Python plugin):

from GraphAnything import register_extractor

def extract_my_format(path, **_):
    return {
        "nodes": [{"id": "x", "label": "X", "file_type": "document",
                   "source_file": str(path)}],
        "edges": [],
    }

register_extractor(
    "my-format", extract_my_format,
    version="0.1.0", handles=(".myext",),
    description="My custom format extractor",
)

run_extractor() automatically stamps provenance (extractor_id, extractor_version, extraction_time, source_hash).
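
For intuition, stamping those four fields could look roughly like the sketch below. It assumes SHA-256 for `source_hash`, an ISO 8601 UTC timestamp for `extraction_time`, and that the fields sit directly on each node and edge; none of that is confirmed here, and `stamp_provenance` is a hypothetical stand-in rather than what `run_extractor()` actually does.

```python
import hashlib
from datetime import datetime, timezone


def stamp_provenance(graph, extractor_id, extractor_version, source_bytes):
    """Attach the four provenance fields to every node and edge in place."""
    stamp = {
        "extractor_id": extractor_id,
        "extractor_version": extractor_version,
        "extraction_time": datetime.now(timezone.utc).isoformat(),
        "source_hash": hashlib.sha256(source_bytes).hexdigest(),  # assumed algorithm
    }
    for item in graph.get("nodes", []) + graph.get("edges", []):
        item.update(stamp)
    return graph


graph = {"nodes": [{"id": "x", "label": "X"}], "edges": []}
stamped = stamp_provenance(graph, "my-format", "0.1.0", b"file contents")
```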

To replace the VLM stub with a real model:

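from GraphAnything import register_extractor

# my_real_impl: your own callable, same shape as extract_my_format above
# (takes a path, returns a {"nodes": [...], "edges": [...]} dict).
register_extractor(
    "vlm", my_real_impl, version="1.0.0",
    handles=(".pdf", ".png", ".jpg"),
    needs_llm=True, overwrite=True,
)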

Versioning, federation, quality

Incremental updates. graphanything update rehashes every input. Unchanged files keep their previous nodes / edges verbatim; changed files are re-extracted. The result is normalised and snapshotted as the next versions/v<N>.json, and diff then works between any two versions.
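
The hash comparison behind an incremental update can be sketched in a few lines. This assumes SHA-256 over file bytes and a simple name-to-digest mapping; `hash_inputs` and `plan_update` are hypothetical helpers, and how removed inputs are handled is an assumption.

```python
import hashlib


def hash_inputs(contents):
    """Map each input name to the SHA-256 of its bytes."""
    return {name: hashlib.sha256(data).hexdigest() for name, data in contents.items()}


def plan_update(previous_hashes, current_hashes):
    """Split inputs into those to re-extract, those to reuse, and those removed."""
    changed = sorted(name for name, digest in current_hashes.items()
                     if previous_hashes.get(name) != digest)
    unchanged = sorted(name for name, digest in current_hashes.items()
                       if previous_hashes.get(name) == digest)
    removed = sorted(name for name in previous_hashes if name not in current_hashes)
    return {"changed": changed, "unchanged": unchanged, "removed": removed}


before = hash_inputs({"a.md": b"alpha", "b.md": b"beta", "old.md": b"gone"})
after = hash_inputs({"a.md": b"alpha", "b.md": b"beta v2", "c.md": b"new"})
plan = plan_update(before, after)
```

Only the inputs under `changed` (here an edited file and a new one) would need re-extraction; everything under `unchanged` can reuse its earlier nodes and edges.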

Federation. graphanything federate g1 g2 ... --out universe.json merges several graphs into one universe. Same-label entities of the same type collapse exactly; with --fuzzy --fuzzy-threshold 0.7 it also proposes same_as edges via Jaccard token overlap, optionally with --llm tie-breaking on borderline pairs.
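
Jaccard token overlap is the size of the intersection of two token sets divided by the size of their union. A minimal sketch of proposing `same_as` candidates at a 0.7 threshold follows; the tokenisation rule (lower-cased alphanumeric runs) and the helper names are assumptions, and the real federate step may tokenise or filter differently.

```python
import re
from itertools import combinations


def tokens(label):
    """Lower-case a label and split it into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", label.lower()))


def jaccard(a, b):
    """Jaccard overlap of two labels' token sets: |A & B| / |A | B|."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def propose_same_as(labels, threshold=0.7):
    """Return (label_a, label_b, score) for pairs at or above the threshold."""
    pairs = []
    for a, b in combinations(labels, 2):
        score = jaccard(a, b)
        if score >= threshold:
            pairs.append((a, b, round(score, 2)))
    return pairs


candidates = propose_same_as(
    ["ABC Corp Holdings Ltd", "ABC Corp Holdings Limited Ltd", "XYZ Inc"]
)
```

The first two labels share four of five distinct tokens, giving a score of 0.8, so they are the only pair proposed.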

Quality eval. graphanything eval --out-dir graphanything-out/quality writes QUALITY_REPORT.md: coverage by node type, dedup density, per-extractor stats, and (with --llm --judge-n 20) an LLM verdict on 20 sampled edges against their evidence_span.

REPL mode

graphanything repl ./contracts --preset contracts

Inside, every CLI sub-command is also a REPL command (schema, propose [N], refine "...", sample [N], review accept-all|accept ID...|reject ID...|merge a,b[,c], run [DIR], render FMT [PATH], explain TARGET, status, cost, llm on|off, presets, extractors, help, quit).

If prompt_toolkit is installed (pip install -e ".[repl]"), you get history + completion; otherwise the REPL falls back to bare input().

Session state on disk

graphanything-out/
├── sessions/
│   ├── .current               ← active session pointer
│   ├── sess_a1b2c3d4.json     ← one file per session
│   └── sess_...json
├── graph.json                 ← latest run/update output
├── versions/
│   ├── v1.json
│   ├── v2.json
│   └── manifest.json          ← schema_version + source_hashes per snapshot
└── quality/
    └── QUALITY_REPORT.md

The Session JSON is the source of truth: accepted / pending / rejected graph fragments, schema (with version), feedback log, running cost log, normalize rules, last source hashes for incremental update. Delete the file → the session is gone; copy it elsewhere → it relocates intact.

Configuration reference (env vars)

GraphAnything reads env vars only on demand. Every setting except the model name has a default. Names are listed in priority order; the first one that is set wins.

Setting	Env vars	Default
Chat-completions URL	`GA_API_BASE` / `OPENAI_API_BASE` / `OPENAI_BASE_URL` / `API_BASE`	`http://localhost:8000/v1`
Model name	`GA_MODEL` / `OPENAI_MODEL` / `SUMMARY_MODEL_NAME`	(required for LLM ops)
Bearer token	`GA_API_KEY` / `OPENAI_API_KEY` / `API_KEY`	empty (many local servers accept any string)
HTTP timeout (s)	`GA_HTTP_TIMEOUT`	`600`
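
The "first one that is set wins" rule from the table can be written as a short lookup. The sketch below copies the names and defaults from the table; the `resolve` helper and the choice to treat an empty string as unset are assumptions, not necessarily how `llm_client.py` behaves.

```python
import os

# Priority order copied from the table; the first name that is set wins.
SETTINGS = {
    "api_base": (("GA_API_BASE", "OPENAI_API_BASE", "OPENAI_BASE_URL", "API_BASE"),
                 "http://localhost:8000/v1"),
    "model": (("GA_MODEL", "OPENAI_MODEL", "SUMMARY_MODEL_NAME"), None),
    "api_key": (("GA_API_KEY", "OPENAI_API_KEY", "API_KEY"), ""),
    "timeout": (("GA_HTTP_TIMEOUT",), "600"),
}


def resolve(setting, env=None):
    """Return the value of the first env var that is set, else the default."""
    env = os.environ if env is None else env
    names, default = SETTINGS[setting]
    for name in names:
        value = env.get(name)
        if value:  # treating an empty string as "not set" is an assumption
            return value
    return default
```

Passing a plain dict as `env` makes the rule easy to check without touching the real environment, for example `resolve("model", {"OPENAI_MODEL": "m1"})`.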

Each LLM call sends a standard chat.completions POST to {API_BASE}/chat/completions with model, messages, temperature, max_tokens, and (for chat_json) response_format: {type: "json_object"}. If the server rejects response_format, the call retries without it and GraphAnything regex-extracts JSON from the answer (so reasoning models emitting <think>...</think> blocks also work).
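
One way to recover JSON from a free-form answer, including answers that carry `<think>...</think>` blocks, is sketched below. It strips the reasoning blocks with a regex and then uses the standard library's `json.JSONDecoder.raw_decode` to parse the first valid object; this is an illustrative approach under those assumptions, not necessarily the extraction GraphAnything performs, and `extract_json` is a hypothetical name.

```python
import json
import re


def extract_json(answer):
    """Pull the first JSON object out of a free-form model answer."""
    # Drop reasoning blocks so braces inside them cannot confuse the search.
    text = re.sub(r"<think>.*?</think>", "", answer, flags=re.DOTALL)
    decoder = json.JSONDecoder()
    for match in re.finditer(r"\{", text):
        try:
            obj, _ = decoder.raw_decode(text, match.start())
        except json.JSONDecodeError:
            continue  # this brace did not start valid JSON; try the next one
        return obj
    raise ValueError("no JSON object found in answer")


answer = '<think>maybe {not json}</think> Sure: {"nodes": [], "edges": []} Done.'
graph = extract_json(answer)
```

Trying each opening brace in turn keeps the helper tolerant of stray braces in surrounding prose.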

Programmatic use

from GraphAnything import open_session
from GraphAnything.llm_client import make_client

llm = make_client()                          # reads env vars

sess = open_session(["./vault/"], preset="obsidian-vault")
sess.propose(auto_accept=True, llm=llm)
sess.run(llm=llm, out_dir="graphanything-out")

print(sess.accepted["nodes"][:3])

make_client() accepts overrides:

llm = make_client(
    api_base="http://h-100:8000/v1",
    model="Qwen3-32B-Instruct",
    api_key="local",
    timeout=900,
)

License

MIT.
