Add interactive CLI explorer, YAML config, and global transform option #72
Merged
martinv13 merged 30 commits on Jun 23, 2026
- Add `config.py` with `ModelConfig`, `TableConfig`, `FieldConfig` TypedDicts
and `load_config(path)` / `parse_yaml_config(text)` YAML helpers; callable-only
keys (hooks, custom hash fn) raise `DataModelConfigError` when present in YAML
- Add `cli.py` with two subcommands:
  - `xml2db render <xsd> [--config] [--format erd|target-tree|source-tree]`
  - `xml2db serve <xsd> [--config] [--port]`: launches a browser-based
    ERD explorer with in-page YAML editing, Apply/Save buttons, and SSE-driven
    live reload when the XSD changes on disk

  All plumbing uses stdlib only (http.server, threading, SSE); Mermaid.js is
  loaded from CDN
- Export new public symbols from `xml2db.__init__`
- Add PyYAML ≥6.0 as a runtime dependency; register `xml2db` console script
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01B94Mo7W3K5YiT4MuTDgg5c
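
A minimal sketch of the callable-only key check described in the first bullet, working on an already-parsed mapping so that it needs only the standard library. The key names in `CALLABLE_ONLY_KEYS`, the helper name `reject_callable_keys`, and the handling of an empty document are assumptions for illustration; the commit only says "hooks, custom hash fn", and the real `DataModelConfigError` may be defined differently.

```python
class DataModelConfigError(ValueError):
    """Stand-in for the error type named in the commit; the real class may differ."""


# Hypothetical key names: the commit only mentions "hooks, custom hash fn".
CALLABLE_ONLY_KEYS = {"document_tree_hook", "record_hash_constructor"}


def reject_callable_keys(config, source="YAML"):
    """Raise if a config mapping loaded from text contains callable-only keys."""
    if config is None:
        # Assumption: an empty document is treated as an empty config.
        return {}
    if not isinstance(config, dict):
        raise DataModelConfigError(f"{source} config must be a mapping")
    offending = sorted(CALLABLE_ONLY_KEYS.intersection(config))
    if offending:
        raise DataModelConfigError(
            f"{', '.join(offending)} can only be set from Python, not {source}"
        )
    return config
```

Text formats cannot carry Python callables, so failing early with a named error is clearer than silently ignoring the key.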
…alidation
- Add resolve_sa_type() to config.py: resolves "String(100)", "Integer",
"DateTime(timezone=True)" etc. to SQLAlchemy type instances; non-string
values are passed through unchanged so Python dict configs are unaffected
- field.type in YAML config is now resolved via resolve_sa_type() at column
build time (column.py); supports all common SQLAlchemy types with args
- metadata_columns[*].type is now resolved via resolve_sa_type() at table
build time (reused_table.py)
- Add IndexConfig TypedDict and accept dict-form extra_args entries
{"name": ..., "columns": [...], "unique": ...} in YAML config; converted
to sqlalchemy.Index objects in DataModelTable._validate_config() (table.py)
- Add MetadataColumnConfig TypedDict with typed name/type and common Column
kwargs (nullable, default, server_default, comment, index, unique)
- Extend YAML validation in parse_yaml_config() to cover all config levels:
metadata_columns[*].type and tables[*].fields[*].type must be strings;
model-level callable-only keys already checked
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01B94Mo7W3K5YiT4MuTDgg5c
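
The string-to-type resolution can be illustrated with a stdlib-only sketch of its parsing half. `parse_type_spec` is a hypothetical helper, not the actual `resolve_sa_type()`; it only splits a string into a type name and literal arguments, and assumes arguments are plain Python literals.

```python
import ast


def parse_type_spec(spec):
    """Split 'String(100)' into ('String', [100], {}); pass non-strings through."""
    if not isinstance(spec, str):
        # Python dict configs may already hold type objects: leave them alone.
        return spec
    try:
        node = ast.parse(spec.strip(), mode="eval").body
    except SyntaxError as exc:
        raise ValueError(f"malformed type spec: {spec!r}") from exc
    if isinstance(node, ast.Name):
        # Bare name such as "Integer": no arguments.
        return node.id, [], {}
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
        # literal_eval accepts only literals, so no arbitrary code is evaluated.
        args = [ast.literal_eval(arg) for arg in node.args]
        kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
        return node.func.id, args, kwargs
    raise ValueError(f"unsupported type spec: {spec!r}")
```

A resolver built on this would then look the name up among the SQLAlchemy types and call it with the parsed arguments, rejecting unknown names.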
Adds DDL SQL output (CREATE TABLE + CREATE INDEX) to the render subcommand,
matching the dialect-specific output already tested in snapshot .sql files.
A new --db-type option (postgresql, mssql, mysql, …) selects the SQLAlchemy
dialect used for compilation; defaults to generic if omitted.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01B94Mo7W3K5YiT4MuTDgg5c
… state
- Removes XSD file watching and SSE/EventSource (XSD is immutable)
- Removes the Apply button; textarea changes now trigger a debounced rebuild (500 ms)
- Adds four output tabs: ERD, Target tree, Source tree, DDL
- On build error: shows error in red below editor, keeps last successful output visible
- Save button writes to --config path or ./model_config.yml if none was given
- Adds --db-type option to serve (for DDL tab dialect)
- All outputs (erd, target_tree, source_tree, ddl) computed in one rebuild pass
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01B94Mo7W3K5YiT4MuTDgg5c
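
The "keep last successful output visible" behaviour can be sketched as a small state holder. `RebuildState` and the `build` callable are hypothetical names; the sketch assumes `build` returns a dict of outputs and raises on an invalid config.

```python
class RebuildState:
    """Holds the last successful outputs and the most recent error, if any."""

    def __init__(self):
        self.outputs = {}  # e.g. {"erd": ..., "target_tree": ..., "ddl": ...}
        self.error = None

    def rebuild(self, yaml_text, build):
        """Run build(yaml_text); on failure keep the previous outputs."""
        try:
            new_outputs = build(yaml_text)
        except Exception as exc:  # any build failure is reported, not raised
            self.error = str(exc)
        else:
            self.outputs = new_outputs
            self.error = None
        return self.outputs, self.error
```

Because all outputs are replaced together only on success, the tabs never show a mix of old and new results.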
Parses a single XML file and loads it into a database in one step:
xml2db import <xml_file> <xsd_file> --connection-string <dsn>
Options: --config (YAML model config), --short-name, --db-schema,
--metadata KEY=VALUE (for metadata_columns), --validate, --no-iterparse,
--recover.
Prints inserted/existing row counts and per-phase timings on completion.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01B94Mo7W3K5YiT4MuTDgg5c
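
As an illustration of the option surface listed above, here is a hedged argparse sketch. It assumes argparse is used, that --connection-string is required, and that --metadata may be repeated; `build_import_parser` and `parse_metadata` are hypothetical names, and the real `cli.py` may be organised differently.

```python
import argparse


def parse_metadata(item):
    """Turn 'KEY=VALUE' into a (key, value) pair; the value may contain '='."""
    key, sep, value = item.partition("=")
    if not sep or not key:
        raise argparse.ArgumentTypeError(f"expected KEY=VALUE, got {item!r}")
    return key, value


def build_import_parser():
    """Build a parser mirroring the options named in the commit message."""
    parser = argparse.ArgumentParser(prog="xml2db import")
    parser.add_argument("xml_file")
    parser.add_argument("xsd_file")
    parser.add_argument("--connection-string", required=True)  # assumed required
    parser.add_argument("--config")
    parser.add_argument("--short-name")
    parser.add_argument("--db-schema")
    # Assumed repeatable: each occurrence adds one metadata column value.
    parser.add_argument("--metadata", action="append", type=parse_metadata,
                        default=[], metavar="KEY=VALUE")
    parser.add_argument("--validate", action="store_true")
    parser.add_argument("--no-iterparse", action="store_true")
    parser.add_argument("--recover", action="store_true")
    return parser
```

Calling `dict(args.metadata)` on the parsed result gives a mapping that could be passed on as metadata column values.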
Replaces the plain textarea with a CodeMirror 6 editor (loaded from
esm.sh CDN — no new Python deps) featuring:
- YAML syntax highlighting via @codemirror/lang-yaml
- Context-aware autocompletion using indent-level heuristics:
  - Top-level model_config keys
  - Table names from the parsed XSD (injected as SCHEMA_INFO)
  - Table config keys (reuse, as_columnstore, choice_transform, …)
  - Field names per table from the parsed XSD
  - Field config keys (type, rename, transform)
  - metadata_columns and extra_args keys
  - Value completions: true/false for booleans, SQLAlchemy type names
    for 'type', transform options for 'transform'
SCHEMA_INFO is extracted from model.tables on each rebuild and injected
into the page as JSON, so completions always reflect the actual XSD schema.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01B94Mo7W3K5YiT4MuTDgg5c
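
The indent-level heuristic can be sketched in Python, although the editor code itself runs in the browser. `yaml_path_at` is a hypothetical helper that assumes space-only indentation and block-style mappings; it walks upward from the cursor line and collects each enclosing key.

```python
def yaml_path_at(lines, cursor_line):
    """Return the list of parent keys enclosing the given 0-based line."""

    def indent_of(text):
        return len(text) - len(text.lstrip(" "))

    current = indent_of(lines[cursor_line])
    path = []
    for text in reversed(lines[:cursor_line]):
        stripped = text.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comments never open a mapping
        indent = indent_of(text)
        if indent < current:
            if stripped.endswith(":"):
                path.append(stripped[:-1])  # a less-indented "key:" is a parent
            current = indent
    return list(reversed(path))
```

A path of `["tables", "<table>", "fields"]` would then select field-name completions, while an empty path selects the top-level keys.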
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01B94Mo7W3K5YiT4MuTDgg5c
- getting_started.md: leads with xml2db serve / render / import CLI commands;
  Python API moved to a secondary section
- configuring.md: new sections for interactive CLI config and YAML file format
  at the top; Python dict section follows as the alternative for advanced use
- index.md: homepage now opens with CLI quickstart commands instead of a
  Python snippet
Python API snippets are preserved in all three files.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01B94Mo7W3K5YiT4MuTDgg5c
Covers comments and TypedDict annotations in src/xml2db/, architecture notes
in CLAUDE.md, and comments in tests/.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01B94Mo7W3K5YiT4MuTDgg5c
esm.sh resolves codemirror@6 to 6.65.7, which is a CodeMirror 5 CJS bundle
with no ESM named exports. Pinning to 6.0.2 gets the actual CM6 bundle that
exports both basicSetup and EditorView.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Kc7NkAzxyFPd4sWfCXcvZp
Instead of loading CodeMirror from esm.sh in the browser (which breaks
behind corporate proxies that strip CORS headers), the serve command now
serves the bundle from /static/editor.js at the same origin.
The bundle is excluded from the wheel to keep package size small. In a
dev (editable) install it is read directly from src/xml2db/static/. In
a regular install it is downloaded once from GitHub on first serve and
cached in ~/.cache/xml2db/.
The bundle itself is built with esbuild from codemirror@6.0.2,
@codemirror/lang-yaml, and @codemirror/autocomplete. Rebuild it with:
npm install --prefix /tmp/cm6build \
codemirror@6.0.2 @codemirror/lang-yaml @codemirror/autocomplete
npx --prefix /tmp/cm6build esbuild \
--bundle --format=esm --minify \
--outfile=src/xml2db/static/editor.js \
/tmp/cm6build/entry.js # exports basicSetup, EditorView, yaml, autocompletion
Increment _EDITOR_BUNDLE_VERSION in cli.py after rebuilding to invalidate
user caches.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Kc7NkAzxyFPd4sWfCXcvZp
basicSetup deliberately excludes Tab for accessibility. Without an explicit
binding, the browser captures Tab and moves focus away from the editor.
Add a keymap that:
- accepts the active autocomplete completion if one is open (Tab)
- indents otherwise (indentWithTab)
Rebuilt the editor bundle (v2) to export keymap, indentWithTab (from
@codemirror/commands), and acceptCompletion (from @codemirror/autocomplete).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Kc7NkAzxyFPd4sWfCXcvZp
When use_db_names=True, column types now show the compiled SQL type (VARCHAR,
INTEGER, BIGINT, etc.) instead of the XSD-derived logical type (string,
integer, etc.).
The sa_dialect is threaded through DataModel and DataModelTable so
dialect-specific types (e.g. NVARCHAR on MSSQL vs VARCHAR on PostgreSQL) are
shown when a db_type is configured. Falls back to generic SQL types when no
dialect is set.
Length specifiers are stripped (VARCHAR(255) -> VARCHAR) for mermaid ERD
cleanliness.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Kc7NkAzxyFPd4sWfCXcvZp
- Show full SQL type including size (VARCHAR(255) not just VARCHAR), since
  the size is meaningful in DB context
- Drop the -N suffix in DB mode: it represents multi-value CSV storage, a
  logical concept with no DB equivalent
- Rename radio label to "Names & types" to reflect that both change
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Kc7NkAzxyFPd4sWfCXcvZp
Two different namespaces apply to fields.<name> in model_config:
- transform uses the source (XSD/pre-simplification) name
- type and rename use the target (post-elevation/logical) name
This distinction is invisible when a field is not elevated (same name
in both), but matters whenever elevation collapses a child relation
into prefixed columns in the parent (e.g. orderperson -> orderperson_*).
Docs (configuring.md):
- Add "Source names vs target names" section with a table and example
- Add "Uses source/target name" note to each sub-section
serve autocomplete:
- schema_info now carries both {source: [...], target: [...]} per table
- source fields come from model.fields_transforms keyed by type_name
- Field completions at tables.<t>.fields.<f> show source-only fields
labelled "source" and target-only fields labelled "target"; fields
present in both (non-elevated) have no label
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Kc7NkAzxyFPd4sWfCXcvZp
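
The labelling rule for field completions can be sketched as follows. `label_field_completions` is a hypothetical helper and the field names in the usage note are made up; the only behaviour taken from the commit is that source-only fields are labelled "source", target-only fields "target", and shared fields get no label.

```python
def label_field_completions(source_fields, target_fields):
    """Return (name, label) pairs; label is 'source', 'target', or None."""
    source, target = set(source_fields), set(target_fields)
    completions = []
    for name in sorted(source | target):
        if name in source and name in target:
            label = None  # not elevated: same name on both sides
        elif name in source:
            label = "source"  # valid for transform
        else:
            label = "target"  # valid for type and rename
        completions.append((name, label))
    return completions
```

For an elevated child such as orderperson, the source side would list `orderperson` while the target side lists the prefixed columns, so each gets its own label.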
Removes the pre-built esbuild bundle (editor.js) from the repo and all
download/cache machinery from the Python server. Instead the HTML now
carries a <script type="importmap"> that maps every CodeMirror 6 /
lezer package name to a version-pinned jsDelivr URL, and the module
script uses bare specifiers ("codemirror", "@codemirror/view", etc.).
All 17 transitive packages are pinned to the exact versions resolved
by npm install codemirror @codemirror/lang-yaml today, so module
sharing is guaranteed (the browser sees identical URLs from every
import path). All requests go to cdn.jsdelivr.net, the same CDN
already used for mermaid, which works through the corporate proxy.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Kc7NkAzxyFPd4sWfCXcvZp
jsDelivr's +esm suffix rewrites all transitive bare imports to versioned
jsDelivr URLs server-side, so no HTML import map is needed. Three direct URL
imports replace the 17-entry import map.
@codemirror/view and @codemirror/lang-yaml are pinned to the versions that
codemirror@6.0.2 resolves to (6.43.1 and 6.1.3) so the browser module cache
deduplicates shared deps like @codemirror/state correctly.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Kc7NkAzxyFPd4sWfCXcvZp
jsDelivr's +esm bundles each package standalone (including its deps),
so importing from multiple +esm URLs loads multiple copies of
@codemirror/state and triggers the "multiple instances" error.
The correct approach is an HTML import map pointing to the raw npm
dist/index.js files: those files use bare import specifiers, which the
browser resolves through the map to the same versioned URLs for every
package, ensuring a single shared @codemirror/state instance.
All 17 transitive packages are pinned to the exact versions resolved
by npm today. The module script now uses bare specifiers ("codemirror",
"@codemirror/view", etc.) resolved by the import map at runtime.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Kc7NkAzxyFPd4sWfCXcvZp
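
A hedged sketch of how the server could generate such an import map from a table of pinned versions. Only the three versions quoted in this PR are listed; the real map pins all 17 transitive packages, and `PINNED`, `build_import_map`, and `import_map_tag` are hypothetical names.

```python
import json

# Versions quoted in this PR; the remaining transitive packages are omitted here.
PINNED = {
    "codemirror": "6.0.2",
    "@codemirror/view": "6.43.1",
    "@codemirror/lang-yaml": "6.1.3",
}


def build_import_map(pinned):
    """Map each bare specifier to its raw dist/index.js on jsDelivr."""
    imports = {
        name: f"https://cdn.jsdelivr.net/npm/{name}@{version}/dist/index.js"
        for name, version in pinned.items()
    }
    return json.dumps({"imports": imports}, indent=2)


def import_map_tag(pinned):
    """Wrap the JSON in the script tag the browser reads before module scripts."""
    return f'<script type="importmap">\n{build_import_map(pinned)}\n</script>'
```

Since every package resolves a given bare specifier through the same map entry, shared dependencies load from one URL and are instantiated once.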
When set to false, disables all automatic field transformations globally: no
join (multi-value columns stay as relations), no elevation of 1-1 child
tables, no choice-group collapsing. The model stays as close to the raw XSD
structure as possible.
Per-field transform and per-table choice_transform config still override the
global setting, so individual exceptions remain possible.
The default "auto" preserves all existing behaviour. The Python API also
accepts True as a synonym for "auto".
CLI autocomplete distinguishes the top-level transform key (false/auto) from
the field-level one (false/skip/elevate_wo_prefix) by checking the path depth
before returning value options.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Kc7NkAzxyFPd4sWfCXcvZp
"auto" is now a valid value for both field-level transform and table-level
choice_transform. It is identical to omitting the key (applies the default
automatic rules) and makes intent explicit, which is useful when partially
overriding a global transform: false setting or when documenting that a field
deliberately keeps the default.
CLI autocomplete now offers auto/false for choice_transform instead of
true/false, and prepends auto to the field-level transform options.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Kc7NkAzxyFPd4sWfCXcvZp
Replace em-dash with comma in the source vs target names explanation.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Kc7NkAzxyFPd4sWfCXcvZp
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Kc7NkAzxyFPd4sWfCXcvZp
- Rewrite transform option description: remove "possible and beneficial" and
  other verbose constructions; lead with the useful case (false)
- Remove bold emphasis from source/target names intro sentence
- Replace "It is not currently possible to" with a direct statement
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Kc7NkAzxyFPd4sWfCXcvZp
- Require python -m mkdocs build after any docs change
- Extend writing style rule with examples of AI-sounding phrasing to avoid
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Kc7NkAzxyFPd4sWfCXcvZp
- index.md: replace the detailed "How to get started" workflow with a single
  pointer to Getting started; keep the ERD diagram as a pitch
- configuring.md: remove the "Interactive configuration with the CLI" section
  (redundant with Getting started); fold the tip into the existing admonition
- how_it_works.md: demote "Caveats" from H2 to H3/H4 under "Building a data
  model" so it does not interrupt the loading-process narrative
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Kc7NkAzxyFPd4sWfCXcvZp
Removed the 'Data model visualization' section from the documentation.
Added sections for supported backends and visualizing data models.
… bug
Tests added (tests/test_config.py):
- resolve_sa_type: simple types, parameterised types, passthrough,
  unknown/malformed inputs
- parse_yaml_config: empty input, valid config, non-mapping rejection,
  callable-key rejection, metadata_columns and field type validation
- Global transform: false/auto/true behaviour and invalid value error
- Field-level transform: "auto" overriding global false, equivalence with
  omitting the key
- choice_transform: "auto" equivalence with omitting the key
Bug fix (transformed_table.py): field-level transform: "auto" was not
overriding global transform: false because the auto-detection block was gated
solely on the global flag. An explicit field-level "auto" now always runs
auto-detection regardless of the global setting, consistent with the
documented precedence (per-field > global > auto-detect).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Kc7NkAzxyFPd4sWfCXcvZp
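
The precedence rule behind this fix (per-field > global > auto-detect) can be sketched as a single predicate. `runs_auto_detection` is a hypothetical name, not the code in transformed_table.py; it only decides whether the automatic rules apply to a field, and treats any explicit non-"auto" field value as an override.

```python
def runs_auto_detection(global_transform="auto", field_transform=None):
    """Decide whether the automatic transform rules apply to one field."""
    if global_transform not in ("auto", True, False):
        raise ValueError(f"invalid global transform: {global_transform!r}")
    if field_transform == "auto":
        # Explicit field-level "auto" wins, even under a global False.
        return True
    if field_transform is not None:
        # Any other explicit field value (False, "skip", ...) is applied as-is.
        return False
    # No field-level setting: fall back to the global option.
    return global_transform in ("auto", True)
```

The bug described above corresponds to checking only the last line and skipping the first field-level branch.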
griffe warned about the missing annotation on get_entity_rel_diagram, which
mkdocs build --strict treats as an error in CI.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Kc7NkAzxyFPd4sWfCXcvZp
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Kc7NkAzxyFPd4sWfCXcvZp
- xml2db serve: new interactive browser explorer with a live-updating ERD, source/target tree views, DDL tab, and a CodeMirror YAML editor with schema-aware autocomplete for table names, field names, and all config keys. CodeMirror and Mermaid load from jsDelivr via an HTML import map.
- xml2db render: new --format ddl/erd/target-tree/source-tree and --db-names flags; ERD DB-names mode shows physical identifiers and SQL column types.
- xml2db import: new CLI command wrapping DataModel.parse_xml + insert_into_target_tables.
- load_config() / parse_yaml_config() with full validation; SQLAlchemy type strings (String(100), DateTime(timezone=True), …) parsed from YAML; IndexConfig TypedDict for extra_args.
- transform: false/auto: new top-level model config option to disable all automatic field transformations globally; field-level transform: "auto" and table-level choice_transform: "auto" added as explicit opt-in to default rules.