#290/#293: cross-filtered facet counts (facet_tree_cross_filter cube) + multi-tree filter collapse #298

Merged: rdhyee merged 3 commits into isamplesorg:main from rdhyee:promote/293-tree-cross-filter-cube on Jun 18, 2026

rdhyee (Contributor) commented Jun 18, 2026

Closes #296 (facet values/counts now recompute under a filter at global view).

What

Promotes two #293-track changes, verified green on rdhyee staging:

  1. facet_tree_cross_filter cube (#290, "explorer: live viewport/cross-filtered counts
    for the Material facet tree (#281 follow-up)"): a precomputed single-active-filter
    cross-filter COUNT cube over the 3 SKOS trees (material/context/object_type,
    concept_uri keys, subtree semantics via membership) + the flat source dim.
    The explorer reads it at global view for a single active filter, so selecting
    e.g. Specimen Type "Artifact" now updates the other facets' counts instantly
    instead of leaving them at the unfiltered baseline. This is exactly the
    pre-cache-single-value-per-facet design Eric proposed (2026-04-08), with
    on-the-fly fallback for uncached (multi-value/zoomed/search) cases.

    • Builder: build_facet_tree_cross_filter() (self-join of membership ∪ source).
    • Validator: AI-free cross-file gate — re-derives the cube from the written
      membership + facets and diffs symmetrically; grain-uniqueness gate; baseline
      == tree_summaries.
    • Tests: tree fixture asserting explicit known cube counts + gate-bites-on-
      corruption + --only orchestration. 19 pass.
    • Data: isamples_202608_facet_tree_cross_filter.parquet already published to
      R2 (1,018 rows, validated == live formula on the deployed data).
  2. Multi-tree filter collapse (#293, "explorer: multi-tree facet filtering does N
    membership scans (slow in WASM at scale) - combine into one scan / cube"):
    collapse N AND-ed membership subqueries into ONE scan (relational division:
    OR within the scan, then GROUP BY pid HAVING COUNT(DISTINCT facet_type)=N
    to enforce AND across dims). Identical semantics; helps narrow
    multi-selections. Broad multi-tree map filtering still hits the WASM data-scale
    wall; the real fix there (a map/h3 cross-filter artifact) is a separate phase.
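
For readers unfamiliar with the relational-division pattern in item 2, here is a minimal sketch. It is not the explorer's code: it uses Python's stdlib sqlite3 in place of DuckDB-WASM over parquet, and the `membership` table layout, the sample rows, and the `pids_matching_all` helper are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical stand-in for the membership parquet file: one row per
# (sample pid, tree dim, concept URI the sample falls under).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE membership (pid TEXT, facet_type TEXT, concept_uri TEXT)")
conn.executemany(
    "INSERT INTO membership VALUES (?, ?, ?)",
    [
        ("s1", "material", "mat:rock"),
        ("s1", "context", "ctx:marine"),
        ("s2", "material", "mat:rock"),
        ("s2", "context", "ctx:land"),
        ("s3", "material", "mat:soil"),
        ("s3", "context", "ctx:marine"),
    ],
)


def pids_matching_all(conn, selections):
    """One scan: OR within the WHERE clause, AND across dims via HAVING.

    selections maps facet_type -> list of selected concept URIs.
    """
    if not selections:
        raise ValueError("at least one active tree dim is required")
    clauses, params = [], []
    for facet_type, uris in selections.items():
        placeholders = ", ".join("?" for _ in uris)
        clauses.append(f"(facet_type = ? AND concept_uri IN ({placeholders}))")
        params.append(facet_type)
        params.extend(uris)
    sql = (
        "SELECT pid FROM membership WHERE "
        + " OR ".join(clauses)
        + " GROUP BY pid HAVING COUNT(DISTINCT facet_type) = ? ORDER BY pid"
    )
    params.append(len(selections))  # N = number of active tree dims
    return [row[0] for row in conn.execute(sql, params)]


both = pids_matching_all(conn, {"material": ["mat:rock"], "context": ["ctx:marine"]})
print(both)
```

A pid survives the HAVING clause only if it matched at least one selected value in every active dim, which is the same result as intersecting N per-dim subqueries, but with a single pass over the membership rows.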

Scope / honesty

  • The cube covers ONE value per dim at global view (matches Eric's stated design);
    multi-value, viewport-scoped, and active-search cases fall through to the
    existing live path (correct, just not precomputed).
  • The explorer fast-path is defensive: any cube miss/error falls through to the
    prior baseline/flat-cube/slow paths (no regression; flat mode ?facets=flat
    explicitly deferred).
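
The fall-through behaviour described in this section can be sketched as below. The explorer itself is not written in Python, and every name here (`single_active_filter`, `facet_counts`, the cube's dict layout, the `live_path` callback) is hypothetical; the sketch only illustrates the "cube for one active filter at global view, otherwise the existing path" decision.

```python
def single_active_filter(selections):
    """Return (dim, value) if exactly one value is selected overall, else None."""
    picked = [(dim, v) for dim, values in selections.items() for v in values]
    return picked[0] if len(picked) == 1 else None


def facet_counts(selections, cube, live_path, *, global_view, search_active):
    """Try the precomputed cube first; on any miss or error use the live path."""
    if global_view and not search_active:
        key = single_active_filter(selections)
        if key is not None:
            try:
                return cube[key]  # precomputed counts for the other dims
            except Exception:
                pass  # cube miss/error: fall through, never fail the UI
    return live_path(selections)


# Hypothetical cube: (filter dim, filter value) -> counts for other facets.
demo_cube = {("object_type", "Artifact"): {("material", "mat:rock"): 5}}
demo_live = lambda selections: {"computed": "live"}

hit = facet_counts({"object_type": ["Artifact"]}, demo_cube, demo_live,
                   global_view=True, search_active=False)
multi = facet_counts({"object_type": ["Artifact", "Fossil"]}, demo_cube, demo_live,
                     global_view=True, search_active=False)
```

The point of the shape is that the cube is strictly an accelerator: multi-value, zoomed, and search cases never consult it, and a missing cube behaves the same as no cube at all.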

Verification

🤖 Generated with Claude Code

rdhyee and others added 3 commits June 18, 2026 11:58
Replace N AND-ed membership subqueries (one per active tree dim) with a
single read_parquet scan using the relational-division pattern:
OR within the scan, then GROUP BY pid HAVING COUNT(DISTINCT facet_type)
= <#active tree dims> to enforce AND across dims.

Semantics are identical (parts are AND-ed at the call site); single-dim
collapses to the same one scan as before. Helps narrow/specific
multi-selections; broad multi-tree selections still hit the WASM
data-scale wall (real fix is the precomputed facet_tree_cross_filter
cube, tracked for isamplesorg#293/isamplesorg#290).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…l cross-filtered tree counts

Precompute a single-active-filter cross-filter COUNT cube over the 3 SKOS
trees (material/context/object_type, concept_uri keys, subtree semantics via
membership) + the flat source dim. For every single active filter it stores
COUNT(DISTINCT pid) for every OTHER dim's node/value, plus a baseline. Schema
mirrors facet_cross_filter (~1k rows).
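
A rough sketch of what such a cube computes, under stated assumptions: stdlib sqlite3 stands in for DuckDB, the `membership` and `samples` tables and their rows are invented, membership is assumed to already contain ancestor rows (that is what gives subtree semantics), and the long-form output columns are a simplification rather than the actual facet_cross_filter schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE membership (pid TEXT, facet_type TEXT, concept_uri TEXT);
CREATE TABLE samples (pid TEXT, source TEXT);
-- Each sample is listed under its own node AND its ancestors.
INSERT INTO membership VALUES
  ('s1','material','mat:root'), ('s1','material','mat:rock'),
  ('s2','material','mat:root'), ('s2','material','mat:rock'),
  ('s3','material','mat:root'), ('s3','material','mat:soil'),
  ('s1','context','ctx:marine'), ('s2','context','ctx:land'),
  ('s3','context','ctx:marine');
INSERT INTO samples VALUES ('s1','A'), ('s2','B'), ('s3','A');
""")

CUBE_SQL = """
WITH dims AS (
  SELECT pid, facet_type AS dim, concept_uri AS value FROM membership
  UNION
  SELECT pid, 'source' AS dim, source AS value FROM samples
)
-- Self-join: for each single active filter f, count pids per OTHER dim's value.
SELECT f.dim AS filter_dim, f.value AS filter_value,
       o.dim AS facet_type, o.value AS facet_value,
       COUNT(DISTINCT o.pid) AS n
FROM dims AS f JOIN dims AS o ON o.pid = f.pid AND o.dim <> f.dim
GROUP BY f.dim, f.value, o.dim, o.value
UNION ALL
-- Baseline rows: no active filter.
SELECT NULL, NULL, dim, value, COUNT(DISTINCT pid) FROM dims GROUP BY dim, value
"""

cube = {(r[0], r[1], r[2], r[3]): r[4] for r in conn.execute(CUBE_SQL)}
print(cube[("source", "A", "material", "mat:root")])
```

Because the ancestor rows are part of membership, filtering on `mat:root` counts every sample in the subtree, and a lookup keyed on the single active filter replaces the live membership scan.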

Fixes isamplesorg#290's global cross-filtered tree counts: describeCrossFilters zeroes
tree-dim selections at global view (to avoid the live membership near-full
scan that hits the DuckDB-WASM data-scale wall, isamplesorg#293). The explorer now reads
the precomputed cube for a single active filter at global view instead — the
effective single filter is read directly from the controls (so it sees tree
nodes even when zeroed), ahead of the baseline early-return. Any miss/error
(incl. cube not yet published on R2) falls through to existing paths unchanged.

Builder: build_facet_tree_cross_filter() (self-join of membership ∪ source).
Validator: AI-free cross-file gate — re-derives the cube from the written
  membership + facets and diffs symmetrically; baseline == tree_summaries.
Tests: tree fixture (vocab + samples) asserting explicit known cube counts
  (catches builder-logic bugs) + validator-gate-bites-on-corruption. 16 pass.

Verified against deployed isamples_202608 data: cube == live isamplesorg#290 formula
(exact), 1,018 rows, validator green, corruption caught.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…cube

P1 (regression): the explorer cube fast-path now runs ONLY when all tree dims
  are rendered as trees (TREE_DIM_KEYS.every(treeActive)). In flat mode
  (?facets=flat) the cube's subtree-membership semantics are wrong for a flat
  dim and a flat-mode selection isn't representable — so defer to the
  flat-cube/slow paths entirely (no flat-count regression).
P2 (builder): --only facet_tree_cross_filter silently built nothing (hierarchy
  guard omitted it). Added to the build guard + explicit-vocab requirement set;
  tests for both --only success and the no-vocab loud failure.
P3 (validator): EXCEPT is set-semantics so a DOUBLED cube passed. Added a
  grain/uniqueness gate over (all filter cols, facet_type, facet_value); test
  proves it bites on a doubled cube.
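
The P3 failure mode is easy to reproduce in miniature. This is an illustrative sketch only: it uses stdlib sqlite3 rather than DuckDB, and the `expected`/`written` tables and the long-form grain columns are assumptions, not the validator's real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE expected (filter_dim TEXT, filter_value TEXT,
                       facet_type TEXT, facet_value TEXT, n INTEGER);
CREATE TABLE written  (filter_dim TEXT, filter_value TEXT,
                       facet_type TEXT, facet_value TEXT, n INTEGER);
INSERT INTO expected VALUES
  ('material','mat:rock','source','A',1),
  ('material','mat:rock','source','B',1);
-- Simulate a DOUBLED cube: every row written twice.
INSERT INTO written SELECT * FROM expected;
INSERT INTO written SELECT * FROM expected;
""")


def symmetric_diff_rows(conn):
    """Rows in one table but not the other, using set-semantics EXCEPT."""
    sql = """
    SELECT COUNT(*) FROM (
      SELECT * FROM (SELECT * FROM expected EXCEPT SELECT * FROM written)
      UNION ALL
      SELECT * FROM (SELECT * FROM written EXCEPT SELECT * FROM expected)
    )
    """
    return conn.execute(sql).fetchone()[0]


def duplicate_grains(conn):
    """Number of grain keys that appear more than once in the written cube."""
    sql = """
    SELECT COUNT(*) FROM (
      SELECT 1 FROM written
      GROUP BY filter_dim, filter_value, facet_type, facet_value
      HAVING COUNT(*) > 1
    )
    """
    return conn.execute(sql).fetchone()[0]


diff = symmetric_diff_rows(conn)   # the EXCEPT diff sees nothing wrong
dupes = duplicate_grains(conn)     # the uniqueness gate catches the doubling
print(diff, dupes)
```

EXCEPT deduplicates both sides before comparing, so duplicated rows are invisible to it; a separate GROUP BY ... HAVING COUNT(*) > 1 check over the grain is what makes the doubled cube fail.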

19 tests pass; all 3 cube gates green on deployed isamples_202608 data.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Development

Successfully merging this pull request may close these issues.

Facet values and counts need to update with filters
