Add interactive CLI explorer, YAML config, and global transform option #72

Merged
martinv13 merged 30 commits into cre-dev:main from martinv13:claude/model-config-cli-tool-xo5pbp
Jun 23, 2026
@martinv13

Collaborator
  • xml2db serve: new interactive browser explorer with a live-updating ERD, source/target tree views, DDL tab, and a CodeMirror YAML editor with schema-aware autocomplete for table names, field names, and all config keys. CodeMirror and Mermaid load from jsDelivr via an HTML import map.
  • xml2db render: new --format ddl/erd/target-tree/source-tree and --db-names flags; ERD DB-names mode shows physical identifiers and SQL column types.
  • xml2db import: new CLI command wrapping DataModel.parse_xml + insert_into_target_tables.
  • YAML config: load_config() / parse_yaml_config() with full validation; SQLAlchemy type strings (String(100), DateTime(timezone=True), …) parsed from YAML; IndexConfig TypedDict for extra_args.
  • transform: false/auto: new top-level model config option to disable all automatic field transformations globally; field-level transform: "auto" and table-level choice_transform: "auto" added as explicit opt-in to default rules.
  • Autocomplete: source vs target field name distinction; "auto" offered for transform and choice_transform values.
  • Docs: restructured to reduce overlap between index and Getting started; caveats moved under "Building a data model"; writing style rules added to CLAUDE.md.
  • Tests: 27 new tests covering resolve_sa_type, parse_yaml_config, and all transform option variants.

claude and others added 30 commits June 22, 2026 18:16
- Add `config.py` with `ModelConfig`, `TableConfig`, `FieldConfig` TypedDicts
  and `load_config(path)` / `parse_yaml_config(text)` YAML helpers; callable-only
  keys (hooks, custom hash fn) raise `DataModelConfigError` when present in YAML
- Add `cli.py` with two subcommands:
    - `xml2db render <xsd> [--config] [--format erd|target-tree|source-tree]`
    - `xml2db serve  <xsd> [--config] [--port]` — launches a browser-based
      ERD explorer with in-page YAML editing, Apply/Save buttons, and SSE-driven
      live reload when the XSD changes on disk
  All plumbing uses the standard library only (http.server and threading, with
  SSE implemented on top); Mermaid.js is loaded from a CDN
- Export new public symbols from `xml2db.__init__`
- Add PyYAML ≥6.0 as a runtime dependency; register `xml2db` console script

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01B94Mo7W3K5YiT4MuTDgg5c
…alidation

- Add resolve_sa_type() to config.py: resolves "String(100)", "Integer",
  "DateTime(timezone=True)" etc. to SQLAlchemy type instances; non-string
  values are passed through unchanged so Python dict configs are unaffected
- field.type in YAML config is now resolved via resolve_sa_type() at column
  build time (column.py); supports all common SQLAlchemy types with args
- metadata_columns[*].type is now resolved via resolve_sa_type() at table
  build time (reused_table.py)
- Add IndexConfig TypedDict and accept dict-form extra_args entries
  {"name": ..., "columns": [...], "unique": ...} in YAML config; converted
  to sqlalchemy.Index objects in DataModelTable._validate_config() (table.py)
- Add MetadataColumnConfig TypedDict with typed name/type and common Column
  kwargs (nullable, default, server_default, comment, index, unique)
- Extend YAML validation in parse_yaml_config() to cover all config levels:
  metadata_columns[*].type and tables[*].fields[*].type must be strings;
  model-level callable-only keys already checked
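
The string-to-type resolution described above can be sketched without SQLAlchemy. This is a minimal illustration and not the actual `resolve_sa_type()`: the function name `parse_type_string`, the `ALLOWED_TYPES` allow-list, and the `(name, args, kwargs)` return shape are assumptions made for the example, whereas the real helper is described as returning SQLAlchemy type instances. The sketch parses the string with the `ast` module so that only a bare name or a call with literal arguments is accepted.

```python
import ast

# Hypothetical allow-list for this sketch; the real helper may accept more names.
ALLOWED_TYPES = {"String", "Integer", "DateTime", "Numeric", "Boolean"}


def parse_type_string(value):
    """Split 'String(100)' into (name, args, kwargs).

    Non-string values are returned unchanged, mirroring the passthrough
    behaviour described for Python dict configs.
    """
    if not isinstance(value, str):
        return value
    try:
        node = ast.parse(value.strip(), mode="eval").body
    except SyntaxError as exc:
        raise ValueError(f"malformed type string: {value!r}") from exc

    if isinstance(node, ast.Name):
        # Bare type name such as "Integer".
        name, args, kwargs = node.id, (), {}
    elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
        # Call form such as "String(100)" or "DateTime(timezone=True)".
        name = node.func.id
        if any(kw.arg is None for kw in node.keywords):
            raise ValueError(f"'**' unpacking is not allowed: {value!r}")
        try:
            # literal_eval only accepts literals, so nested calls are rejected.
            args = tuple(ast.literal_eval(arg) for arg in node.args)
            kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
        except ValueError as exc:
            raise ValueError(f"non-literal argument in {value!r}") from exc
    else:
        raise ValueError(f"unsupported type expression: {value!r}")

    if name not in ALLOWED_TYPES:
        raise ValueError(f"unknown type name: {name!r}")
    return name, args, kwargs


print(parse_type_string("String(100)"))
print(parse_type_string("DateTime(timezone=True)"))
```

A real implementation would presumably look the name up on the `sqlalchemy` module and call it with the parsed arguments; the allow-list keeps arbitrary expressions in a YAML file from being evaluated.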

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01B94Mo7W3K5YiT4MuTDgg5c
Adds DDL SQL output (CREATE TABLE + CREATE INDEX) to the render subcommand,
matching the dialect-specific output already tested in snapshot .sql files.
A new --db-type option (postgresql, mssql, mysql, …) selects the SQLAlchemy
dialect used for compilation; defaults to generic if omitted.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01B94Mo7W3K5YiT4MuTDgg5c
… state

- Removes XSD file watching and SSE/EventSource (XSD is immutable)
- Removes the Apply button; textarea changes now trigger a debounced rebuild (500 ms)
- Adds four output tabs: ERD, Target tree, Source tree, DDL
- On build error: shows error in red below editor, keeps last successful output visible
- Save button writes to --config path or ./model_config.yml if none was given
- Adds --db-type option to serve (for DDL tab dialect)
- All outputs (erd, target_tree, source_tree, ddl) computed in one rebuild pass

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01B94Mo7W3K5YiT4MuTDgg5c
Parses a single XML file and loads it into a database in one step:

  xml2db import <xml_file> <xsd_file> --connection-string <dsn>

Options: --config (YAML model config), --short-name, --db-schema,
--metadata KEY=VALUE (for metadata_columns), --validate,
--no-iterparse, --recover. Prints inserted/existing row counts and
per-phase timings on completion.
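
The option list above can be mirrored with `argparse` as a rough sketch. This is not the code in `cli.py`: the helper names `build_parser` and `parse_metadata` are hypothetical, and it is an assumption that `--metadata` may be repeated and that `--connection-string` is required. Only the flag names come from the commit message.

```python
import argparse


def parse_metadata(pairs):
    """Turn ['source=feed_a', 'batch=7'] into {'source': 'feed_a', 'batch': '7'}."""
    result = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected KEY=VALUE, got {pair!r}")
        result[key] = value
    return result


def build_parser():
    """Build a parser with an 'import' subcommand carrying the listed options."""
    parser = argparse.ArgumentParser(prog="xml2db")
    sub = parser.add_subparsers(dest="command", required=True)
    imp = sub.add_parser("import")
    imp.add_argument("xml_file")
    imp.add_argument("xsd_file")
    imp.add_argument("--connection-string", required=True)  # assumed required
    imp.add_argument("--config")
    imp.add_argument("--short-name")
    imp.add_argument("--db-schema")
    # Assumed repeatable: each occurrence appends one KEY=VALUE string.
    imp.add_argument("--metadata", action="append", default=[], metavar="KEY=VALUE")
    imp.add_argument("--validate", action="store_true")
    imp.add_argument("--no-iterparse", action="store_true")
    imp.add_argument("--recover", action="store_true")
    return parser


# Example invocation with an explicit argument list (file names are made up).
args = build_parser().parse_args([
    "import", "orders.xml", "orders.xsd",
    "--connection-string", "sqlite:///:memory:",
    "--metadata", "source=feed_a",
    "--metadata", "batch=7",
])
print(parse_metadata(args.metadata))
```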

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01B94Mo7W3K5YiT4MuTDgg5c
Replaces the plain textarea with a CodeMirror 6 editor (loaded from
esm.sh CDN — no new Python deps) featuring:
- YAML syntax highlighting via @codemirror/lang-yaml
- Context-aware autocompletion using indent-level heuristics:
  - Top-level model_config keys
  - Table names from the parsed XSD (injected as SCHEMA_INFO)
  - Table config keys (reuse, as_columnstore, choice_transform, …)
  - Field names per table from the parsed XSD
  - Field config keys (type, rename, transform)
  - metadata_columns and extra_args keys
  - Value completions: true/false for booleans, SQLAlchemy type names
    for 'type', transform options for 'transform'

SCHEMA_INFO is extracted from model.tables on each rebuild and injected
into the page as JSON, so completions always reflect the actual XSD schema.
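
The indent-level heuristic mentioned above can be illustrated in Python, although the editor itself runs in the browser. This is a sketch under assumptions, not the shipped completion source: `key_path_at` is a hypothetical name, and the table name `shiporder` in the example is made up. The idea is to walk upward from the cursor line and treat each line with a strictly smaller indent as a parent key, which is enough to decide whether table names, field names, or config keys should be offered.

```python
def key_path_at(lines, line_no):
    """Return the chain of YAML mapping keys enclosing lines[line_no].

    Uses indentation only, so it tolerates documents that are
    temporarily invalid while the user is typing.
    """
    def indent_of(text):
        return len(text) - len(text.lstrip(" "))

    path = []
    current = indent_of(lines[line_no])
    # Walk upwards; each line with a strictly smaller indent is a parent key.
    for text in reversed(lines[:line_no]):
        stripped = text.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comments carry no structure
        indent = indent_of(text)
        if indent < current and ":" in stripped:
            key = stripped.lstrip("- ").split(":", 1)[0].strip()
            path.append(key)
            current = indent
    return list(reversed(path))


doc = [
    "tables:",
    "  shiporder:",
    "    fields:",
    "      orderperson:",
    "        transform: ",
]
print(key_path_at(doc, 4))
```

With the path in hand, a completion source could look up `SCHEMA_INFO` for the table named at the second position and offer that table's field names under `fields`.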

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01B94Mo7W3K5YiT4MuTDgg5c
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01B94Mo7W3K5YiT4MuTDgg5c
- getting_started.md: leads with xml2db serve / render / import CLI commands;
  Python API moved to a secondary section
- configuring.md: new sections for interactive CLI config and YAML file format
  at the top; Python dict section follows as the alternative for advanced use
- index.md: homepage now opens with CLI quickstart commands instead of a Python snippet

Python API snippets are preserved in all three files.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01B94Mo7W3K5YiT4MuTDgg5c
Covers comments and TypedDict annotations in src/xml2db/,
architecture notes in CLAUDE.md, and comments in tests/.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01B94Mo7W3K5YiT4MuTDgg5c
esm.sh resolves codemirror@6 to 6.65.7, which is a CodeMirror 5 CJS
bundle with no ESM named exports. Pinning to 6.0.2 gets the actual
CM6 bundle that exports both basicSetup and EditorView.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Kc7NkAzxyFPd4sWfCXcvZp
Instead of loading CodeMirror from esm.sh in the browser (which breaks
behind corporate proxies that strip CORS headers), the serve command now
serves the bundle from /static/editor.js at the same origin.

The bundle is excluded from the wheel to keep package size small. In a
dev (editable) install it is read directly from src/xml2db/static/. In
a regular install it is downloaded once from GitHub on first serve and
cached in ~/.cache/xml2db/.

The bundle itself is built with esbuild from codemirror@6.0.2,
@codemirror/lang-yaml, and @codemirror/autocomplete. Rebuild it with:

  npm install --prefix /tmp/cm6build \
    codemirror@6.0.2 @codemirror/lang-yaml @codemirror/autocomplete
  npx --prefix /tmp/cm6build esbuild \
    --bundle --format=esm --minify \
    --outfile=src/xml2db/static/editor.js \
    /tmp/cm6build/entry.js  # exports basicSetup, EditorView, yaml, autocompletion

Increment _EDITOR_BUNDLE_VERSION in cli.py after rebuilding to invalidate
user caches.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Kc7NkAzxyFPd4sWfCXcvZp
basicSetup deliberately excludes Tab for accessibility. Without an
explicit binding, the browser captures Tab and moves focus away from
the editor. Add a keymap that:
- accepts the active autocomplete completion if one is open (Tab)
- indents otherwise (indentWithTab)

Rebuilt the editor bundle (v2) to export keymap, indentWithTab
(from @codemirror/commands), and acceptCompletion
(from @codemirror/autocomplete).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Kc7NkAzxyFPd4sWfCXcvZp
When use_db_names=True, column types now show the compiled SQL type
(VARCHAR, INTEGER, BIGINT, etc.) instead of the XSD-derived logical
type (string, integer, etc.).

The sa_dialect is threaded through DataModel and DataModelTable so
dialect-specific types (e.g. NVARCHAR on MSSQL vs VARCHAR on
PostgreSQL) are shown when a db_type is configured. Falls back to
generic SQL types when no dialect is set.

Length specifiers are stripped (VARCHAR(255) -> VARCHAR) for mermaid
ERD cleanliness.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Kc7NkAzxyFPd4sWfCXcvZp
- Show full SQL type including size (VARCHAR(255) not just VARCHAR),
  since the size is meaningful in DB context
- Drop the -N suffix in DB mode: it represents multi-value CSV storage,
  a logical concept with no DB equivalent
- Rename radio label to "Names & types" to reflect that both change

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Kc7NkAzxyFPd4sWfCXcvZp
Two different name spaces apply to fields.<name> in model_config:
- transform uses the source (XSD/pre-simplification) name
- type and rename use the target (post-elevation/logical) name

This distinction is invisible when a field is not elevated (same name
in both), but matters whenever elevation collapses a child relation
into prefixed columns in the parent (e.g. orderperson -> orderperson_*).

Docs (configuring.md):
- Add "Source names vs target names" section with a table and example
- Add "Uses source/target name" note to each sub-section

serve autocomplete:
- schema_info now carries both {source: [...], target: [...]} per table
- source fields come from model.fields_transforms keyed by type_name
- Field completions at tables.<t>.fields.<f> show source-only fields
  labelled "source" and target-only fields labelled "target"; fields
  present in both (non-elevated) have no label
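
The labelling rule for field completions can be sketched as a small function. The names here are assumptions for illustration: `label_fields` is hypothetical, as are the field names other than `orderperson`, and the actual completion logic runs in the page's JavaScript.

```python
def label_fields(source, target):
    """Label each field name as 'source', 'target', or None when in both lists."""
    source_set, target_set = set(source), set(target)
    completions = []
    for name in sorted(source_set | target_set):
        if name in source_set and name in target_set:
            label = None  # not elevated: same name before and after
        elif name in source_set:
            label = "source"  # only valid for keys that use source names
        else:
            label = "target"  # only valid for keys that use target names
        completions.append({"name": name, "label": label})
    return completions


# Hypothetical per-table entry shaped like {source: [...], target: [...]}.
schema_entry = {
    "source": ["orderid", "orderperson"],
    "target": ["orderid", "orderperson_name", "orderperson_address"],
}
for item in label_fields(schema_entry["source"], schema_entry["target"]):
    print(item)
```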

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Kc7NkAzxyFPd4sWfCXcvZp
Removes the pre-built esbuild bundle (editor.js) from the repo and all
download/cache machinery from the Python server. Instead the HTML now
carries a <script type="importmap"> that maps every CodeMirror 6 /
lezer package name to a version-pinned jsDelivr URL, and the module
script uses bare specifiers ("codemirror", "@codemirror/view", etc.).

All 17 transitive packages are pinned to the exact versions resolved
by npm install codemirror @codemirror/lang-yaml today, so module
sharing is guaranteed (the browser sees identical URLs from every
import path). All requests go to cdn.jsdelivr.net, the same CDN
already used for mermaid, which works through the corporate proxy.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Kc7NkAzxyFPd4sWfCXcvZp
jsDelivr's +esm suffix rewrites all transitive bare imports to versioned
jsDelivr URLs server-side, so no HTML import map is needed. Three direct
URL imports replace the 17-entry import map.

@codemirror/view and @codemirror/lang-yaml are pinned to the versions
that codemirror@6.0.2 resolves to (6.43.1 and 6.1.3) so the browser
module cache deduplicates shared deps like @codemirror/state correctly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Kc7NkAzxyFPd4sWfCXcvZp
jsDelivr's +esm bundles each package standalone (including its deps),
so importing from multiple +esm URLs loads multiple copies of
@codemirror/state and triggers the "multiple instances" error.

The correct approach is an HTML import map pointing to the raw npm
dist/index.js files: those files use bare import specifiers, which the
browser resolves through the map to the same versioned URLs for every
package, ensuring a single shared @codemirror/state instance.
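
As a rough sketch of this approach, the import map can be generated from a table of pinned versions. The helper names `build_import_map` and `render_script_tag` are hypothetical, only three of the 17 packages are shown, and the versions are the ones quoted in an earlier commit message rather than a verified lock file. It is also an assumption that every package exposes its entry point at `dist/index.js`; some dependencies may use a different path.

```python
import json

CDN = "https://cdn.jsdelivr.net/npm"


def build_import_map(pins):
    """Map each bare specifier to one version-pinned URL.

    Every import path resolves through the same map, so the browser
    fetches each package, including shared dependencies, exactly once.
    """
    imports = {
        name: f"{CDN}/{name}@{version}/dist/index.js"  # assumed entry path
        for name, version in sorted(pins.items())
    }
    return json.dumps({"imports": imports}, indent=2)


def render_script_tag(pins):
    """Wrap the JSON in the script element that browsers recognise."""
    return f'<script type="importmap">\n{build_import_map(pins)}\n</script>'


# Illustrative subset only; the real map pins all transitive packages.
PINS = {
    "codemirror": "6.0.2",
    "@codemirror/view": "6.43.1",
    "@codemirror/lang-yaml": "6.1.3",
}
print(render_script_tag(PINS))
```

The import map has to appear in the HTML before the module script that uses the bare specifiers.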

All 17 transitive packages are pinned to the exact versions resolved
by npm today. The module script now uses bare specifiers ("codemirror",
"@codemirror/view", etc.) resolved by the import map at runtime.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Kc7NkAzxyFPd4sWfCXcvZp
When set to false, disables all automatic field transformations globally:
no join (multi-value columns stay as relations), no elevation of 1-1
child tables, no choice-group collapsing. The model stays as close to
the raw XSD structure as possible.

Per-field transform and per-table choice_transform config still override
the global setting, so individual exceptions remain possible.

The default "auto" preserves all existing behaviour. The Python API
also accepts True as a synonym for "auto".

CLI autocomplete distinguishes the top-level transform key (false/auto)
from the field-level one (false/skip/elevate_wo_prefix) by checking the
path depth before returning value options.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Kc7NkAzxyFPd4sWfCXcvZp
"auto" is now a valid value for both field-level transform and
table-level choice_transform. It is identical to omitting the key
(applies the default automatic rules) and makes intent explicit,
which is useful when partially overriding a global transform: false
setting or when documenting that a field deliberately keeps the default.

CLI autocomplete now offers auto/false for choice_transform instead
of true/false, and prepends auto to the field-level transform options.
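
The value completions after this change can be summarised in a short sketch. It is written in Python for illustration, while the real logic lives in the page's JavaScript; the function name `value_options` and the tuple-shaped path are assumptions. The option lists themselves are the ones named in the commit messages.

```python
def value_options(path):
    """Return value completions for the key at the end of a config path.

    `path` is a tuple of keys from the document root, for example
    ("tables", "shiporder", "fields", "orderperson", "transform").
    """
    if not path:
        return []
    key = path[-1]
    if key == "transform":
        if len(path) == 1:
            # Top-level model option.
            return ["false", "auto"]
        if len(path) == 5 and path[0] == "tables" and path[2] == "fields":
            # Field-level option, with "auto" prepended.
            return ["auto", "false", "skip", "elevate_wo_prefix"]
    if key == "choice_transform" and len(path) == 3 and path[0] == "tables":
        # Table-level option: auto/false instead of true/false.
        return ["auto", "false"]
    return []


print(value_options(("transform",)))
print(value_options(("tables", "shiporder", "fields", "orderperson", "transform")))
print(value_options(("tables", "shiporder", "choice_transform")))
```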

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Kc7NkAzxyFPd4sWfCXcvZp
Replace em-dash with comma in the source vs target names explanation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Kc7NkAzxyFPd4sWfCXcvZp
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Kc7NkAzxyFPd4sWfCXcvZp
- Rewrite transform option description: remove "possible and beneficial"
  and other verbose constructions; lead with the useful case (false)
- Remove bold emphasis from source/target names intro sentence
- Replace "It is not currently possible to" with a direct statement

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Kc7NkAzxyFPd4sWfCXcvZp
- Require python -m mkdocs build after any docs change
- Extend writing style rule with examples of AI-sounding phrasing to avoid

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Kc7NkAzxyFPd4sWfCXcvZp
- index.md: replace the detailed "How to get started" workflow with a
  single pointer to Getting started; keep the ERD diagram as a pitch
- configuring.md: remove the "Interactive configuration with the CLI"
  section (redundant with Getting started); fold the tip into the
  existing admonition
- how_it_works.md: demote "Caveats" from H2 to H3/H4 under "Building
  a data model" so it does not interrupt the loading-process narrative

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Kc7NkAzxyFPd4sWfCXcvZp
Removed the 'Data model visualization' section from the documentation.
Added sections for supported backends and visualizing data models.
… bug

Tests added (tests/test_config.py):
- resolve_sa_type: simple types, parameterised types, passthrough,
  unknown/malformed inputs
- parse_yaml_config: empty input, valid config, non-mapping rejection,
  callable-key rejection, metadata_columns and field type validation
- Global transform: false/auto/true behaviour and invalid value error
- Field-level transform: "auto" overriding global false, equivalence
  with omitting the key
- choice_transform: "auto" equivalence with omitting the key

Bug fix (transformed_table.py): field-level transform: "auto" was not
overriding global transform: false because the auto-detection block was
gated solely on the global flag. An explicit field-level "auto" now
always runs auto-detection regardless of the global setting, consistent
with the documented precedence (per-field > global > auto-detect).
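
The precedence after the fix can be written down as a small decision function. This is a sketch of the rule as stated, not the code in `transformed_table.py`: `effective_transform` is a hypothetical name, and `detected` stands in for whatever the automatic rules would pick for the field.

```python
def effective_transform(global_transform, field_transform, detected):
    """Resolve the transform for one field: per-field > global > auto-detect.

    global_transform: "auto", True (synonym for "auto"), or False.
    field_transform:  None when the key is omitted, otherwise the field value.
    detected:         result the automatic rules would produce.
    """
    if (global_transform is not True and global_transform is not False
            and global_transform != "auto"):
        raise ValueError(f"invalid global transform: {global_transform!r}")
    if field_transform == "auto":
        # Explicit opt-in always runs auto-detection, even under global false.
        return detected
    if field_transform is not None:
        # Any other explicit field setting wins over the global option.
        return field_transform
    if global_transform is False:
        # No field override and transformations disabled globally.
        return False
    return detected


# The case the bug fix addresses: field-level "auto" under global false.
print(effective_transform(False, "auto", "join"))
# Without a field override, global false disables the transformation.
print(effective_transform(False, None, "join"))
```

Before the fix, the first call would have behaved like the second, because the auto-detection branch was reached only when the global flag allowed it.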

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Kc7NkAzxyFPd4sWfCXcvZp
griffe warned about the missing annotation on get_entity_rel_diagram,
which mkdocs build --strict treats as an error in CI.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Kc7NkAzxyFPd4sWfCXcvZp
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Kc7NkAzxyFPd4sWfCXcvZp
@martinv13 martinv13 merged commit aab9b86 into cre-dev:main Jun 23, 2026
10 checks passed
@martinv13 martinv13 deleted the claude/model-config-cli-tool-xo5pbp branch June 23, 2026 16:17


2 participants