#290/#293: cross-filtered facet counts (facet_tree_cross_filter cube) + multi-tree filter collapse #298
Merged
rdhyee merged 3 commits on Jun 18, 2026
Replace N AND-ed membership subqueries (one per active tree dim) with a single read_parquet scan using the relational-division pattern: OR within the scan, then GROUP BY pid HAVING COUNT(DISTINCT facet_type) = <#active tree dims> to enforce AND across dims.

Semantics are identical (parts are AND-ed at the call site); the single-dim case collapses to the same one scan as before.

Helps narrow/specific multi-selections; broad multi-tree selections still hit the WASM data-scale wall (the real fix is the precomputed facet_tree_cross_filter cube, tracked for isamplesorg#293/isamplesorg#290).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
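The relational-division pattern described in this commit can be sketched as follows. This is a minimal illustration, not the explorer's code: it uses Python's stdlib `sqlite3` in place of DuckDB-WASM and `read_parquet`, and the `membership` table layout, the sample rows, and the `matching_pids` helper are all assumptions made for the example.

```python
import sqlite3

# Hypothetical stand-in for the membership parquet: one row per
# (pid, facet_type, concept_uri). The real column names may differ.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE membership (pid TEXT, facet_type TEXT, concept_uri TEXT)")
conn.executemany(
    "INSERT INTO membership VALUES (?, ?, ?)",
    [
        ("s1", "material", "mat:rock"),
        ("s1", "context", "ctx:marine"),
        ("s2", "material", "mat:rock"),
        ("s2", "context", "ctx:lake"),
        ("s3", "material", "mat:soil"),
        ("s3", "context", "ctx:marine"),
    ],
)


def matching_pids(conn, selections):
    """Return pids matching every active dim, using one scan of membership.

    selections maps facet_type -> list of selected concept URIs.
    Values are OR-ed within the scan; the HAVING clause enforces AND
    across dims by requiring one matched facet_type per active dim.
    """
    active = {dim: uris for dim, uris in selections.items() if uris}
    if not active:
        raise ValueError("no active tree dimension")
    clauses, params = [], []
    for dim, uris in active.items():
        placeholders = ", ".join("?" for _ in uris)
        clauses.append(f"(facet_type = ? AND concept_uri IN ({placeholders}))")
        params.extend([dim, *uris])
    sql = (
        "SELECT pid FROM membership WHERE "
        + " OR ".join(clauses)
        + " GROUP BY pid HAVING COUNT(DISTINCT facet_type) = ? ORDER BY pid"
    )
    params.append(len(active))
    return [row[0] for row in conn.execute(sql, params)]
```

With a single active dim the HAVING clause compares against 1, so the query degenerates to the plain one-dim scan, which matches the commit's note that the single-dim case is unchanged.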
…l cross-filtered tree counts

Precompute a single-active-filter cross-filter COUNT cube over the 3 SKOS trees (material/context/object_type, concept_uri keys, subtree semantics via membership) + the flat source dim. For every single active filter it stores COUNT(DISTINCT pid) for every OTHER dim's node/value, plus a baseline. Schema mirrors facet_cross_filter (~1k rows).

Fixes isamplesorg#290's global cross-filtered tree counts: describeCrossFilters zeroes tree-dim selections at global view (to avoid the live membership near-full scan that hits the DuckDB-WASM data-scale wall, isamplesorg#293). The explorer now reads the precomputed cube for a single active filter at global view instead: the effective single filter is read directly from the controls (so it sees tree nodes even when zeroed), ahead of the baseline early-return. Any miss/error (incl. cube not yet published on R2) falls through to existing paths unchanged.

Builder: build_facet_tree_cross_filter() (self-join of membership ∪ source).

Validator: AI-free cross-file gate that re-derives the cube from the written membership + facets and diffs symmetrically; baseline == tree_summaries.

Tests: tree fixture (vocab + samples) asserting explicit known cube counts (catches builder-logic bugs) + validator-gate-bites-on-corruption. 16 pass.

Verified against deployed isamples_202608 data: cube == live isamplesorg#290 formula (exact), 1,018 rows, validator green, corruption caught.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
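As a rough illustration of the self-join this commit describes, the sketch below builds a tiny single-active-filter cube with stdlib `sqlite3`. It is not `build_facet_tree_cross_filter()` itself: the `filter_type`/`filter_value` columns, the sample rows, and the assumption that the flat source dim has already been unioned into `membership` as `facet_type = 'source'` are all hypothetical simplifications.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE membership (pid TEXT, facet_type TEXT, facet_value TEXT);
-- Subtree semantics: s1 is a member of mat:rock and of its ancestor mat:root.
INSERT INTO membership VALUES
  ('s1', 'material', 'mat:root'), ('s1', 'material', 'mat:rock'),
  ('s2', 'material', 'mat:root'),
  ('s1', 'source', 'SESAR'), ('s2', 'source', 'GEOME');
""")

CUBE_SQL = """
-- Baseline rows: no active filter.
SELECT NULL AS filter_type, NULL AS filter_value,
       facet_type, facet_value, COUNT(DISTINCT pid) AS n
FROM membership
GROUP BY facet_type, facet_value
UNION ALL
-- One row per (single active filter, node/value of every OTHER dim).
SELECT f.facet_type, f.facet_value,
       t.facet_type, t.facet_value, COUNT(DISTINCT t.pid)
FROM membership AS f
JOIN membership AS t
  ON t.pid = f.pid AND t.facet_type <> f.facet_type
GROUP BY f.facet_type, f.facet_value, t.facet_type, t.facet_value
"""

# Key: (filter_type, filter_value, facet_type, facet_value) -> distinct pid count.
cube = {row[:4]: row[4] for row in conn.execute(CUBE_SQL)}
```

Combinations with no matching samples simply produce no row, so a reader of the cube has to treat a missing key as a count of zero.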
…cube

P1 (regression): the explorer cube fast-path now runs ONLY when all tree dims are rendered as trees (TREE_DIM_KEYS.every(treeActive)). In flat mode (?facets=flat) the cube's subtree-membership semantics are wrong for a flat dim and a flat-mode selection isn't representable, so defer to the flat-cube/slow paths entirely (no flat-count regression).

P2 (builder): --only facet_tree_cross_filter silently built nothing (hierarchy guard omitted it). Added to the build guard + explicit-vocab requirement set; tests for both --only success and the no-vocab loud failure.

P3 (validator): EXCEPT is set-semantics so a DOUBLED cube passed. Added a grain/uniqueness gate over (all filter cols, facet_type, facet_value); test proves it bites on a doubled cube.

19 tests pass; all 3 cube gates green on deployed isamples_202608 data.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
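The P3 point can be reproduced in a few lines. The sketch below, using stdlib `sqlite3` rather than the project's validator, shows a symmetric `EXCEPT` diff reporting no difference for a doubled cube while a grain-uniqueness check catches it. The table names, the single `filter_key` column (standing in for "all filter cols"), and both helper functions are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE expected (filter_key TEXT, facet_type TEXT, facet_value TEXT, n INTEGER);
CREATE TABLE cube     (filter_key TEXT, facet_type TEXT, facet_value TEXT, n INTEGER);
INSERT INTO expected VALUES ('source=SESAR', 'material', 'mat:rock', 1);
-- Doubled cube: the same row written twice.
INSERT INTO cube VALUES ('source=SESAR', 'material', 'mat:rock', 1);
INSERT INTO cube VALUES ('source=SESAR', 'material', 'mat:rock', 1);
""")


def symmetric_diff_rows(conn):
    """Count rows present in one table but not the other (set semantics)."""
    sql = """
    SELECT COUNT(*) FROM (
      SELECT * FROM (SELECT * FROM cube EXCEPT SELECT * FROM expected)
      UNION ALL
      SELECT * FROM (SELECT * FROM expected EXCEPT SELECT * FROM cube)
    )
    """
    return conn.execute(sql).fetchone()[0]


def duplicate_grains(conn):
    """Count grain keys that appear more than once in the cube."""
    sql = """
    SELECT COUNT(*) FROM (
      SELECT filter_key, facet_type, facet_value
      FROM cube
      GROUP BY filter_key, facet_type, facet_value
      HAVING COUNT(*) > 1
    )
    """
    return conn.execute(sql).fetchone()[0]
```

Here `symmetric_diff_rows(conn)` returns 0 because `EXCEPT` deduplicates before comparing, while `duplicate_grains(conn)` returns 1, which is why the two gates are complementary rather than redundant.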
Closes #296 (facet values/counts now recompute under a filter at global view).
What
Promotes two #293-track changes, verified green on rdhyee staging:
1. `facet_tree_cross_filter` cube (#290, "explorer: live viewport/cross-filtered counts for the Material facet tree (#281 follow-up)"): a precomputed single-active-filter cross-filter COUNT cube over the 3 SKOS trees (material/context/object_type, concept_uri keys, subtree semantics via membership) + the flat `source` dim. The explorer reads it at global view for a single active filter, so selecting e.g. Specimen Type "Artifact" now updates the other facets' counts instantly instead of leaving them at the unfiltered baseline. This is exactly the pre-cache-single-value-per-facet design Eric proposed (2026-04-08), with on-the-fly fallback for uncached (multi-value/zoomed/search) cases.
   - Builder: `build_facet_tree_cross_filter()` (self-join of membership ∪ source).
   - Validator: re-derives the cube from the written membership + facets and diffs symmetrically; grain-uniqueness gate; baseline == tree_summaries.
   - Tests: explicit known cube counts, validator bites on corruption + `--only` orchestration. 19 pass.
   - `isamples_202608_facet_tree_cross_filter.parquet` already published to R2 (1,018 rows, validated == live formula on the deployed data).
2. Multi-tree filter collapse (#293, explorer: multi-tree facet filtering does N membership scans (slow in WASM at scale); combine into one scan / cube): collapse N AND-ed membership subqueries into ONE scan (relational division: OR within, `GROUP BY pid HAVING COUNT(DISTINCT facet_type)=N` across). Identical semantics; helps narrow multi-selections. Broad multi-tree map filtering still hits the WASM data-scale wall; the real fix there (a map/h3 cross-filter artifact) is a separate phase.
Scope / honesty
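- The cube covers only a single active filter at global view; multi-value, viewport-scoped, and active-search cases fall through to the existing live path (correct, just not precomputed).
- Any cube miss or error falls through to the prior baseline/flat-cube/slow paths (no regression; flat mode `?facets=flat` explicitly deferred).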
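The scoping rules above amount to a small eligibility check followed by a fall-through. The Python sketch below restates that decision logic for illustration only; the explorer itself is not written in Python, and `ViewState`, `single_active_filter`, `cross_filtered_counts`, and the `cube_lookup`/`live_counts` callables are hypothetical names. Only `TREE_DIM_KEYS` and the conditions themselves come from the PR text.

```python
from dataclasses import dataclass

TREE_DIM_KEYS = ("material", "context", "object_type")


@dataclass
class ViewState:
    selections: dict          # dim -> list of selected values
    tree_active: dict         # dim -> True when the dim is rendered as a tree
    global_view: bool = True  # False when the view is viewport-scoped (zoomed)
    search_active: bool = False


def single_active_filter(state):
    """Return (dim, value) when the cube can answer, otherwise None."""
    if not state.global_view or state.search_active:
        return None
    if not all(state.tree_active.get(dim, False) for dim in TREE_DIM_KEYS):
        return None  # flat mode: defer to the flat-cube/slow paths
    active = [(dim, v) for dim, vals in state.selections.items() for v in vals]
    return active[0] if len(active) == 1 else None


def cross_filtered_counts(state, cube_lookup, live_counts):
    """Try the precomputed cube first; any miss or error uses the live path."""
    key = single_active_filter(state)
    if key is not None:
        try:
            hit = cube_lookup(key)
        except Exception:
            hit = None  # e.g. cube not yet published
        if hit is not None:
            return hit
    return live_counts(state)
```

Keeping the eligibility test separate from the lookup makes the fall-through explicit: an ineligible state never touches the cube, and an eligible state that misses still gets a correct answer from the live path.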
Verification
🤖 Generated with Claude Code