Add interactive CLI explorer, YAML config, and global transform option by martinv13 · Pull Request #72 · cre-dev/xml2db

martinv13 · 2026-06-23T16:04:18Z

xml2db serve: new interactive browser explorer with a live-updating ERD, source/target tree views, DDL tab, and a CodeMirror YAML editor with schema-aware autocomplete for table names, field names, and all config keys. CodeMirror and Mermaid load from jsDelivr via an HTML import map.
xml2db render: new --format ddl/erd/target-tree/source-tree and --db-names flags; ERD DB-names mode shows physical identifiers and SQL column types.
xml2db import: new CLI command wrapping DataModel.parse_xml + insert_into_target_tables.
YAML config: load_config() / parse_yaml_config() with full validation; SQLAlchemy type strings (String(100), DateTime(timezone=True), …) parsed from YAML; IndexConfig TypedDict for extra_args.
transform: false/auto: new top-level model config option to disable all automatic field transformations globally; field-level transform: "auto" and table-level choice_transform: "auto" added as explicit opt-in to default rules.
Autocomplete: source vs target field name distinction; "auto" offered for transform and choice_transform values.
Docs: restructured to reduce overlap between index and Getting started; caveats moved under "Building a data model"; writing style rules added to CLAUDE.md.
Tests: 27 new tests covering resolve_sa_type, parse_yaml_config, and all transform option variants.

- Add `config.py` with `ModelConfig`, `TableConfig`, `FieldConfig` TypedDicts and `load_config(path)` / `parse_yaml_config(text)` YAML helpers; callable-only keys (hooks, custom hash fn) raise `DataModelConfigError` when present in YAML
- Add `cli.py` with two subcommands:
  - `xml2db render <xsd> [--config] [--format erd|target-tree|source-tree]`
  - `xml2db serve <xsd> [--config] [--port]`: launches a browser-based ERD explorer with in-page YAML editing, Apply/Save buttons, and SSE-driven live reload when the XSD changes on disk

  All plumbing uses stdlib only (http.server, threading, SSE); Mermaid.js loaded from CDN
- Export new public symbols from `xml2db.__init__`
- Add PyYAML ≥6.0 as a runtime dependency; register `xml2db` console script

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01B94Mo7W3K5YiT4MuTDgg5c
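The rejection of callable-only keys described above can be sketched roughly as follows. This is a minimal illustration, not the code in `config.py`: it works on an already-parsed YAML document, the exception class is a local stand-in for `DataModelConfigError`, the key names in `CALLABLE_ONLY_KEYS` are assumptions, and treating empty input as an empty config is also an assumption.

```python
from collections.abc import Mapping


class DataModelConfigError(Exception):
    """Local stand-in for the error class named in the commit message."""


# Assumed key names, for illustration only; the real list lives in config.py.
CALLABLE_ONLY_KEYS = {"document_tree_hook", "record_hash_constructor"}


def check_parsed_config(data):
    """Validate an already-parsed YAML document (any Python object)."""
    if data is None:
        return {}  # assumption: an empty file is treated as an empty config
    if not isinstance(data, Mapping):
        raise DataModelConfigError("top level of the config must be a mapping")
    forbidden = sorted(CALLABLE_ONLY_KEYS & set(data))
    if forbidden:
        raise DataModelConfigError(
            f"keys {forbidden} need Python callables and cannot be set from YAML"
        )
    return dict(data)
```

Hooks and hash functions are Python callables, which a YAML document cannot express, so failing early with a clear error is probably friendlier than passing a string where a function is expected.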

…alidation

- Add resolve_sa_type() to config.py: resolves "String(100)", "Integer", "DateTime(timezone=True)" etc. to SQLAlchemy type instances; non-string values are passed through unchanged so Python dict configs are unaffected
- field.type in YAML config is now resolved via resolve_sa_type() at column build time (column.py); supports all common SQLAlchemy types with args
- metadata_columns[*].type is now resolved via resolve_sa_type() at table build time (reused_table.py)
- Add IndexConfig TypedDict and accept dict-form extra_args entries {"name": ..., "columns": [...], "unique": ...} in YAML config; converted to sqlalchemy.Index objects in DataModelTable._validate_config() (table.py)
- Add MetadataColumnConfig TypedDict with typed name/type and common Column kwargs (nullable, default, server_default, comment, index, unique)
- Extend YAML validation in parse_yaml_config() to cover all config levels: metadata_columns[*].type and tables[*].fields[*].type must be strings; model-level callable-only keys already checked

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01B94Mo7W3K5YiT4MuTDgg5c
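One way a type string such as "String(100)" could be turned into a type instance without calling `eval` is sketched below. This is not the actual `resolve_sa_type()`: the function names `parse_type_string` and `resolve_type` are hypothetical, and the `namespace` argument stands in for whatever set of SQLAlchemy classes the real helper allows.

```python
import ast


def parse_type_string(spec):
    """Split 'String(100)' into ('String', [100], {}) without using eval."""
    if not isinstance(spec, str):
        raise TypeError("spec must be a string")
    try:
        node = ast.parse(spec.strip(), mode="eval").body
    except SyntaxError:
        raise ValueError(f"malformed type string: {spec!r}") from None
    if isinstance(node, ast.Name):  # bare name such as "Integer"
        return node.id, [], {}
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
        # literal_eval only accepts literals, so arbitrary code is rejected
        args = [ast.literal_eval(a) for a in node.args]
        kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
        return node.func.id, args, kwargs
    raise ValueError(f"unsupported type string: {spec!r}")


def resolve_type(spec, namespace):
    """Instantiate a type from a namespace; non-strings pass through unchanged."""
    if not isinstance(spec, str):
        return spec
    name, args, kwargs = parse_type_string(spec)
    try:
        cls = namespace[name]
    except KeyError:
        raise ValueError(f"unknown type name: {name!r}") from None
    return cls(*args, **kwargs)
```

In practice the namespace would presumably be a whitelist mapping names like "String" or "DateTime" to the corresponding SQLAlchemy classes; the pass-through branch mirrors the statement that Python dict configs holding real type objects are unaffected.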

Adds DDL SQL output (CREATE TABLE + CREATE INDEX) to the render subcommand, matching the dialect-specific output already tested in snapshot .sql files. A new --db-type option (postgresql, mssql, mysql, …) selects the SQLAlchemy dialect used for compilation; defaults to generic if omitted. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01B94Mo7W3K5YiT4MuTDgg5c

… state

- Removes XSD file watching and SSE/EventSource (XSD is immutable)
- Removes the Apply button; textarea changes now trigger a debounced rebuild (500 ms)
- Adds four output tabs: ERD, Target tree, Source tree, DDL
- On build error: shows error in red below editor, keeps last successful output visible
- Save button writes to --config path or ./model_config.yml if none was given
- Adds --db-type option to serve (for DDL tab dialect)
- All outputs (erd, target_tree, source_tree, ddl) computed in one rebuild pass

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01B94Mo7W3K5YiT4MuTDgg5c
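The "keep last successful output on build error" behaviour could look something like the sketch below. The class name `RebuildState` and the `build_all` callback are hypothetical; the real server code may be organised quite differently.

```python
class RebuildState:
    """Holds the last successful outputs and the latest error, if any."""

    def __init__(self):
        self.outputs = {}  # e.g. {"erd": ..., "target_tree": ..., "ddl": ...}
        self.error = None

    def rebuild(self, build_all, config_text):
        """Run one rebuild pass; on failure keep the previous outputs."""
        try:
            new_outputs = build_all(config_text)
        except Exception as exc:  # any build error is reported, not raised
            self.error = str(exc)
        else:
            self.outputs = new_outputs
            self.error = None
        return self.outputs, self.error
```

Because all four outputs come from a single pass, a failed pass leaves every tab showing a consistent (if stale) view alongside the error message.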

Parses a single XML file and loads it into a database in one step:

    xml2db import <xml_file> <xsd_file> --connection-string <dsn>

Options: --config (YAML model config), --short-name, --db-schema, --metadata KEY=VALUE (for metadata_columns), --validate, --no-iterparse, --recover.

Prints inserted/existing row counts and per-phase timings on completion.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01B94Mo7W3K5YiT4MuTDgg5c
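A command line with this shape could be declared with `argparse` roughly as below. This is a sketch, not the code in `cli.py`: the option names come from the commit message, but treating `--metadata` as repeatable, the three boolean flags as simple switches, and `--connection-string` as required are all assumptions.

```python
import argparse


def key_value(text):
    """Parse one KEY=VALUE argument into a (key, value) tuple."""
    key, sep, value = text.partition("=")
    if not sep or not key:
        raise argparse.ArgumentTypeError(f"expected KEY=VALUE, got {text!r}")
    return key, value


def build_parser():
    parser = argparse.ArgumentParser(prog="xml2db import")
    parser.add_argument("xml_file")
    parser.add_argument("xsd_file")
    parser.add_argument("--connection-string", required=True)
    parser.add_argument("--config")
    parser.add_argument("--short-name")
    parser.add_argument("--db-schema")
    # Assumption: the flag may be repeated, one KEY=VALUE pair per occurrence.
    parser.add_argument("--metadata", action="append", default=[],
                        type=key_value, metavar="KEY=VALUE")
    parser.add_argument("--validate", action="store_true")
    parser.add_argument("--no-iterparse", action="store_true")
    parser.add_argument("--recover", action="store_true")
    return parser
```

With this layout, `dict(args.metadata)` yields a mapping that could be passed along as values for the configured `metadata_columns`.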

Replaces the plain textarea with a CodeMirror 6 editor (loaded from esm.sh CDN, no new Python deps) featuring:

- YAML syntax highlighting via @codemirror/lang-yaml
- Context-aware autocompletion using indent-level heuristics:
  - Top-level model_config keys
  - Table names from the parsed XSD (injected as SCHEMA_INFO)
  - Table config keys (reuse, as_columnstore, choice_transform, …)
  - Field names per table from the parsed XSD
  - Field config keys (type, rename, transform)
  - metadata_columns and extra_args keys
  - Value completions: true/false for booleans, SQLAlchemy type names for 'type', transform options for 'transform'

SCHEMA_INFO is extracted from model.tables on each rebuild and injected into the page as JSON, so completions always reflect the actual XSD schema.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01B94Mo7W3K5YiT4MuTDgg5c

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01B94Mo7W3K5YiT4MuTDgg5c

- getting_started.md: leads with xml2db serve / render / import CLI commands; Python API moved to a secondary section
- configuring.md: new sections for interactive CLI config and YAML file format at the top; Python dict section follows as the alternative for advanced use
- index.md: homepage now opens with CLI quickstart commands instead of a Python snippet

Python API snippets are preserved in all three files.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01B94Mo7W3K5YiT4MuTDgg5c

Covers comments and TypedDict annotations in src/xml2db/, architecture notes in CLAUDE.md, and comments in tests/. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01B94Mo7W3K5YiT4MuTDgg5c

esm.sh resolves codemirror@6 to 6.65.7, which is a CodeMirror 5 CJS bundle with no ESM named exports. Pinning to 6.0.2 gets the actual CM6 bundle that exports both basicSetup and EditorView. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01Kc7NkAzxyFPd4sWfCXcvZp

Instead of loading CodeMirror from esm.sh in the browser (which breaks behind corporate proxies that strip CORS headers), the serve command now serves the bundle from /static/editor.js at the same origin.

The bundle is excluded from the wheel to keep package size small. In a dev (editable) install it is read directly from src/xml2db/static/. In a regular install it is downloaded once from GitHub on first serve and cached in ~/.cache/xml2db/.

The bundle itself is built with esbuild from codemirror@6.0.2, @codemirror/lang-yaml, and @codemirror/autocomplete. Rebuild it with:

    npm install --prefix /tmp/cm6build \
        codemirror@6.0.2 @codemirror/lang-yaml @codemirror/autocomplete
    npx --prefix /tmp/cm6build esbuild \
        --bundle --format=esm --minify \
        --outfile=src/xml2db/static/editor.js \
        /tmp/cm6build/entry.js   # exports basicSetup, EditorView, yaml, autocompletion

Increment _EDITOR_BUNDLE_VERSION in cli.py after rebuilding to invalidate user caches.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01Kc7NkAzxyFPd4sWfCXcvZp

basicSetup deliberately excludes Tab for accessibility. Without an explicit binding, the browser captures Tab and moves focus away from the editor. Add a keymap that:

- accepts the active autocomplete completion if one is open (Tab)
- indents otherwise (indentWithTab)

Rebuilt the editor bundle (v2) to export keymap, indentWithTab (from @codemirror/commands), and acceptCompletion (from @codemirror/autocomplete).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01Kc7NkAzxyFPd4sWfCXcvZp

When use_db_names=True, column types now show the compiled SQL type (VARCHAR, INTEGER, BIGINT, etc.) instead of the XSD-derived logical type (string, integer, etc.). The sa_dialect is threaded through DataModel and DataModelTable so dialect-specific types (e.g. NVARCHAR on MSSQL vs VARCHAR on PostgreSQL) are shown when a db_type is configured. Falls back to generic SQL types when no dialect is set. Length specifiers are stripped (VARCHAR(255) -> VARCHAR) for mermaid ERD cleanliness. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01Kc7NkAzxyFPd4sWfCXcvZp

- Show full SQL type including size (VARCHAR(255) not just VARCHAR), since the size is meaningful in DB context
- Drop the -N suffix in DB mode: it represents multi-value CSV storage, a logical concept with no DB equivalent
- Rename radio label to "Names & types" to reflect that both change

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01Kc7NkAzxyFPd4sWfCXcvZp

Two different name spaces apply to fields.<name> in model_config:

- transform uses the source (XSD/pre-simplification) name
- type and rename use the target (post-elevation/logical) name

This distinction is invisible when a field is not elevated (same name in both), but matters whenever elevation collapses a child relation into prefixed columns in the parent (e.g. orderperson -> orderperson_*).

Docs (configuring.md):

- Add "Source names vs target names" section with a table and example
- Add "Uses source/target name" note to each sub-section

serve autocomplete:

- schema_info now carries both {source: [...], target: [...]} per table
- source fields come from model.fields_transforms keyed by type_name
- Field completions at tables.<t>.fields.<f> show source-only fields labelled "source" and target-only fields labelled "target"; fields present in both (non-elevated) have no label

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01Kc7NkAzxyFPd4sWfCXcvZp
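As an illustration of the two name spaces, a Python dict config might look like the fragment below. The table name `shiporder` and the column name `orderperson_name` are hypothetical, chosen to follow the orderperson example; the use of a type string in a dict config is also an assumption.

```python
# Hypothetical table and field names, loosely following the orderperson example.
model_config = {
    "tables": {
        "shiporder": {
            "fields": {
                # transform is keyed by the source (XSD) name of the child element
                "orderperson": {"transform": "auto"},
                # type and rename are keyed by the target (post-elevation) column name
                "orderperson_name": {
                    "type": "String(100)",
                    "rename": "customer_name",
                },
            }
        }
    }
}
```

For a field that is not elevated, both settings would sit under the same key, since its source and target names coincide.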

Removes the pre-built esbuild bundle (editor.js) from the repo and all download/cache machinery from the Python server. Instead the HTML now carries a <script type="importmap"> that maps every CodeMirror 6 / lezer package name to a version-pinned jsDelivr URL, and the module script uses bare specifiers ("codemirror", "@codemirror/view", etc.). All 17 transitive packages are pinned to the exact versions resolved by npm install codemirror @codemirror/lang-yaml today, so module sharing is guaranteed (the browser sees identical URLs from every import path). All requests go to cdn.jsdelivr.net, the same CDN already used for mermaid, which works through the corporate proxy. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01Kc7NkAzxyFPd4sWfCXcvZp

jsDelivr's +esm suffix rewrites all transitive bare imports to versioned jsDelivr URLs server-side, so no HTML import map is needed. Three direct URL imports replace the 17-entry import map. @codemirror/view and @codemirror/lang-yaml are pinned to the versions that codemirror@6.0.2 resolves to (6.43.1 and 6.1.3) so the browser module cache deduplicates shared deps like @codemirror/state correctly. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01Kc7NkAzxyFPd4sWfCXcvZp

jsDelivr's +esm bundles each package standalone (including its deps), so importing from multiple +esm URLs loads multiple copies of @codemirror/state and triggers the "multiple instances" error. The correct approach is an HTML import map pointing to the raw npm dist/index.js files: those files use bare import specifiers, which the browser resolves through the map to the same versioned URLs for every package, ensuring a single shared @codemirror/state instance. All 17 transitive packages are pinned to the exact versions resolved by npm today. The module script now uses bare specifiers ("codemirror", "@codemirror/view", etc.) resolved by the import map at runtime. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01Kc7NkAzxyFPd4sWfCXcvZp
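A version-pinned import map of this kind could be generated on the Python side roughly as follows. This is a sketch only: the function names are hypothetical, just the three versions quoted in these commit messages are listed, and the remaining transitive packages are left out.

```python
import json

# Versions quoted in the commit messages; the other transitive packages are omitted.
PINNED = {
    "codemirror": "6.0.2",
    "@codemirror/view": "6.43.1",
    "@codemirror/lang-yaml": "6.1.3",
}


def build_import_map(pinned):
    """Map each bare specifier to the raw dist/index.js file on jsDelivr."""
    imports = {
        name: f"https://cdn.jsdelivr.net/npm/{name}@{version}/dist/index.js"
        for name, version in sorted(pinned.items())
    }
    return {"imports": imports}


def import_map_tag(pinned):
    """Render the import map as an HTML script element."""
    body = json.dumps(build_import_map(pinned), indent=2)
    return f'<script type="importmap">\n{body}\n</script>'
```

Since every package imports its dependencies by bare specifier, each specifier resolves to exactly one URL, which is what keeps a single shared @codemirror/state instance in the page.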

When set to false, disables all automatic field transformations globally: no join (multi-value columns stay as relations), no elevation of 1-1 child tables, no choice-group collapsing. The model stays as close to the raw XSD structure as possible. Per-field transform and per-table choice_transform config still override the global setting, so individual exceptions remain possible. The default "auto" preserves all existing behaviour. The Python API also accepts True as a synonym for "auto". CLI autocomplete distinguishes the top-level transform key (false/auto) from the field-level one (false/skip/elevate_wo_prefix) by checking the path depth before returning value options. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01Kc7NkAzxyFPd4sWfCXcvZp

"auto" is now a valid value for both field-level transform and table-level choice_transform. It is identical to omitting the key (applies the default automatic rules) and makes intent explicit, which is useful when partially overriding a global transform: false setting or when documenting that a field deliberately keeps the default. CLI autocomplete now offers auto/false for choice_transform instead of true/false, and prepends auto to the field-level transform options. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01Kc7NkAzxyFPd4sWfCXcvZp

Replace em-dash with comma in the source vs target names explanation. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01Kc7NkAzxyFPd4sWfCXcvZp

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01Kc7NkAzxyFPd4sWfCXcvZp

- Rewrite transform option description: remove "possible and beneficial" and other verbose constructions; lead with the useful case (false)
- Remove bold emphasis from source/target names intro sentence
- Replace "It is not currently possible to" with a direct statement

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01Kc7NkAzxyFPd4sWfCXcvZp

- Require python -m mkdocs build after any docs change
- Extend writing style rule with examples of AI-sounding phrasing to avoid

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01Kc7NkAzxyFPd4sWfCXcvZp

- index.md: replace the detailed "How to get started" workflow with a single pointer to Getting started; keep the ERD diagram as a pitch
- configuring.md: remove the "Interactive configuration with the CLI" section (redundant with Getting started); fold the tip into the existing admonition
- how_it_works.md: demote "Caveats" from H2 to H3/H4 under "Building a data model" so it does not interrupt the loading-process narrative

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01Kc7NkAzxyFPd4sWfCXcvZp

Removed the 'Data model visualization' section from the documentation.

Added sections for supported backends and visualizing data models.

… bug

Tests added (tests/test_config.py):

- resolve_sa_type: simple types, parameterised types, passthrough, unknown/malformed inputs
- parse_yaml_config: empty input, valid config, non-mapping rejection, callable-key rejection, metadata_columns and field type validation
- Global transform: false/auto/true behaviour and invalid value error
- Field-level transform: "auto" overriding global false, equivalence with omitting the key
- choice_transform: "auto" equivalence with omitting the key

Bug fix (transformed_table.py): field-level transform: "auto" was not overriding global transform: false because the auto-detection block was gated solely on the global flag. An explicit field-level "auto" now always runs auto-detection regardless of the global setting, consistent with the documented precedence (per-field > global > auto-detect).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01Kc7NkAzxyFPd4sWfCXcvZp
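The precedence rule (per-field > global > auto-detect) can be illustrated with a small decision function. This is a sketch under stated assumptions rather than the logic in transformed_table.py: both function names are hypothetical, and `ValueError` stands in for whatever error the library actually raises on an invalid global value.

```python
def normalize_global_transform(value="auto"):
    """Return True when automatic transforms are enabled model-wide."""
    if value is True or value == "auto":
        return True  # True is accepted as a synonym for "auto"
    if value is False:
        return False
    raise ValueError(f"invalid global transform value: {value!r}")


def runs_auto_detection(global_transform="auto", field_transform=None):
    """Decide whether the automatic rules apply to one field."""
    enabled = normalize_global_transform(global_transform)
    if field_transform == "auto":
        return True  # explicit opt-in wins over a global false
    if field_transform is not None:
        return False  # an explicit per-field value replaces auto-detection
    return enabled  # no per-field setting: fall back to the global flag
```

The bug described above corresponds to returning `enabled` before checking for a field-level "auto", so the explicit opt-in was never reached when the global flag was false.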

griffe warned about the missing annotation on get_entity_rel_diagram, which mkdocs build --strict treats as an error in CI. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01Kc7NkAzxyFPd4sWfCXcvZp

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01Kc7NkAzxyFPd4sWfCXcvZp

claude and others added 30 commits June 22, 2026 18:16

Add writing style rules to CLAUDE.md

142672e

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01B94Mo7W3K5YiT4MuTDgg5c

Remove em-dash from docs

45f5a71

Replace em-dash with comma in the source vs target names explanation. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01Kc7NkAzxyFPd4sWfCXcvZp

Remove AI-sounding "Why this matters:" heading from docs

144a64c

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01Kc7NkAzxyFPd4sWfCXcvZp

Remove 'Data model visualization' section

d9a2a63

Removed the 'Data model visualization' section from the documentation.

Enhance documentation with new sections

5d6c909

Added sections for supported backends and visualizing data models.

Use --strict in docs build rule in CLAUDE.md to match CI

6ff9d40

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01Kc7NkAzxyFPd4sWfCXcvZp

martinv13 merged commit aab9b86 into cre-dev:main Jun 23, 2026
10 checks passed

martinv13 deleted the claude/model-config-cli-tool-xo5pbp branch June 23, 2026 16:17
