
#300: filtered H3 clusters at world zoom (don't force point mode for broad facets) #302

Merged. rdhyee merged 9 commits into isamplesorg:main from rdhyee:feat/300-filtered-clusters on Jun 19, 2026.

@rdhyee (Contributor) commented Jun 19, 2026

Closes #300. Builds on PR4c #301 (merged) — the centralized computeTargetMode/filtersForcePoint seam.

What

When a facet filter is active and the camera is zoomed out (above EXIT_POINT_ALT), the map now renders an h3-clustered view of the filtered set instead of promoting to capped raw point mode (#267). Zooming in still drops to individual dots. Search keeps point mode (out of scope). Foundation: the #293 masks make filtered h3 aggregation fast.

How

  • Build (build_frontend_derived.py): add h3_res4/h3_res6 to samples_map_lite (+ validator + fixtures, 23/23).
  • loadRes is filter-aware: with a facet active + masks ready, it aggregates the FILTERED set off samples_map_lite (filteredClusterSQL, same grain/columns as the summary parquet) instead of loading the facet-blind pre-aggregated summary. A semantic cluster signature (_clusterFilterSig) drives stale-reload.
  • computeTargetMode: facets use the normal ENTER/EXIT altitude hysteresis once filtered clusters are ready (gated on filteredClustersReady + masks); search still forces point.
  • Coherence: deep-link/back-forward restore, cold boot, filtered cluster-click hydration (fetchClusterByH3), and the #facetNote apology were all made filter-aware.
  • Boot-deadlock fix (important): the heavy filtered query, issued during boot's concurrent query storm, deadlocked DuckDB-WASM (non-threaded MVP build) — the identical query runs in ~2.5s once idle. Fix: serialize db.query through a FIFO chain (all 45 data calls use db.query; verified). See the db cell comment.
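For orientation, a minimal sketch of the kind of SQL a builder like filteredClusterSQL produces: GROUP BY the res-appropriate h3 column over the mask-filtered lite rows, with counts cast to INTEGER and a deterministic dominant-source tie-break. Column names other than h3_res4/h3_res6 (latitude, longitude, source, and the output names h3_cell, sample_count, center_lat, center_lng, dominant_source) and the maskPredicate argument are assumptions, not the page's actual schema.

```javascript
// Hypothetical sketch of a filter-aware cluster query builder (not the page's code).
// Assumed lite columns: h3_res4 / h3_res6 (added by this PR), latitude, longitude, source.
// `maskPredicate` stands in for the #293 mask-backed facet predicate.
const H3_COLUMN_BY_RES = { 4: "h3_res4", 6: "h3_res6" };

function filteredClusterSQL(res, liteUrl, maskPredicate) {
  const h3col = H3_COLUMN_BY_RES[res];
  if (!h3col) throw new Error(`no lite h3 column for resolution ${res}`);
  // liteUrl is trusted page config here; user input should never be interpolated.
  return `
    WITH filtered AS (
      SELECT ${h3col} AS h3_cell, latitude, longitude, source
      FROM read_parquet('${liteUrl}')
      WHERE ${maskPredicate}
    ),
    per_source AS (
      SELECT h3_cell, source, COUNT(*) AS n
      FROM filtered
      GROUP BY h3_cell, source
    ),
    ranked AS (
      -- dominant source per cell: highest count, ties broken by source name
      SELECT h3_cell, source,
             ROW_NUMBER() OVER (PARTITION BY h3_cell ORDER BY n DESC, source) AS rk
      FROM per_source
    )
    SELECT f.h3_cell,
           CAST(COUNT(*) AS INTEGER) AS sample_count,
           AVG(f.latitude)  AS center_lat,
           AVG(f.longitude) AS center_lng,
           r.source         AS dominant_source
    FROM filtered f
    JOIN ranked r ON r.h3_cell = f.h3_cell AND r.rk = 1
    GROUP BY f.h3_cell, r.source`;
}
```

The INTEGER cast presumably keeps counts as plain JS numbers rather than BigInt. Ordering ties by source name makes the winner deterministic, which a single-cell query such as fetchClusterByH3 needs in order to agree with the bulk result.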

Activation / deploy (ORDER MATTERS)

lite_url now points at isamples_202608_samples_map_lite_v2.parquet (a cache-bust filename that keeps the immutable-cache contract). This _v2 file must be uploaded to R2 (isamples-ry) BEFORE this PR merges: lite_url is load-bearing (point mode, deep links, filtered clusters all read it), so a missing _v2 would 404 the explorer's lite entirely. The file is built and validated (it reproduces the shipped h3 summaries exactly).

The feature itself gates on the res4/res6 columns: if the lite at lite_url lacks them, filtered clusters stay dormant and the explorer behaves as before #300, so there's no half-state. The explorer does, however, need some lite file to exist at lite_url.
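A rough sketch of that column gate as a readiness probe, assuming `db.query(sql)` resolves to an array of row objects and using DuckDB's DESCRIBE to list the parquet's columns. probeFilteredClustersReady is a hypothetical name; in the page, the filteredClustersReady cell sets window.__filteredClustersReady.

```javascript
// Hypothetical sketch of the readiness preflight: the feature turns on only if
// the lite parquet at liteUrl actually carries the res4/res6 columns.
// `db` is assumed to expose an async query(sql) resolving to an array of row objects.
const REQUIRED_COLUMNS = ["h3_res4", "h3_res6"];

async function probeFilteredClustersReady(db, liteUrl) {
  try {
    // DuckDB's DESCRIBE returns one row per column, each with a column_name field.
    const rows = await db.query(`DESCRIBE SELECT * FROM read_parquet('${liteUrl}')`);
    const names = new Set(rows.map((row) => row.column_name));
    return REQUIRED_COLUMNS.every((col) => names.has(col));
  } catch {
    // Unreadable lite or failed probe: leave the feature dormant (pre-#300 behavior).
    return false;
  }
}
```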

Watch-item

db.query serialization ships to all users and makes boot queries sequential. Local boot is fast (9–14s in verification runs); real-network boot latency is worth eyeballing. Fallback if too slow: defer only the heavy filtered query until the connection is idle (keep other queries concurrent).

Verification

  • tests/playwright/filtered-clusters-300.spec.js [data] (against a local res46 lite): broad facet (anyanthropogenicmaterial) at world zoom → filtered clusters (kind:filtered, not point), 81 res4 cells, count conservation (cluster sum == masks-backed COUNT(*)); zoom-in → point. 2/2 pass.
  • Build/validator/fixtures 23/23. Offline SQL count-conservation at every res.
  • Regression (explorer-characterization + url-roundtrip, production data, feature dormant) clean in isolation; serialization introduces no regression. (The (e) facet-hydration failure is a pre-existing cold-cache flake; the listings.json 404 is tracked in #295, "listings.json gets a 404 error".)
  • Codex reviewed design, implementation, and the serialization fix; all findings addressed.
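The count-conservation check can be expressed as a small helper: if the filtered clusters partition the filtered set, the per-cell sample_count values must add up to an independent COUNT(*) over the same mask predicate (for example `SELECT COUNT(*) FROM read_parquet(lite_url) WHERE <mask predicate>`). The helper name and the sample_count field are assumptions for illustration.

```javascript
// Hypothetical helper: do the filtered clusters conserve the filtered sample count?
// `cells` are cluster rows with a sample_count field; `independentCount` is the
// result of a separate masks-backed COUNT(*) over the same filter.
function checkCountConservation(cells, independentCount) {
  // Number() normalizes BigInt counts, which DuckDB-WASM may return for BIGINT columns.
  const sum = cells.reduce((acc, cell) => acc + Number(cell.sample_count), 0);
  const expected = Number(independentCount);
  return { sum, expected, ok: sum === expected };
}
```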

🤖 Generated with Claude Code

rdhyee added 9 commits June 19, 2026 05:18
…er filtered clusters

The explorer will aggregate filtered H3 clusters on the fly off samples_map_lite
(GROUP BY the res-appropriate h3 column + the isamplesorg#293 mask predicate) so a broad
facet filter at world zoom renders as fast filtered clusters instead of capped
raw points. samp_geo already computes h3_res4/h3_res6/h3_res8; lite carried only
res8 (point-mode cell lookups). Add res4/res6 (UBIGINT) — they dictionary-
compress well, so the size delta is small.

- build_samples_map_lite: emit h3_res4, h3_res6 alongside h3_res8
- validate: map_lite re-derivation now covers res4/res6
- header doc + corruption-test schema updated to match
- fixture tests: 23/23

Republish of the 202608 lite to R2 follows as a separate data step; the browser
feature gates on the columns being present (falls back to today's point-mode
behavior otherwise), so this is safe to ship before the republish.
…t + cluster sig

Dormant infrastructure for filtered clusters — no behavior change yet because
computeTargetMode still forces point mode when a facet is active (relaxed in C2).

- filteredClustersReady preflight cell: probes samples_map_lite for h3_res4/res6;
  sets window.__filteredClustersReady. Hard requirement is only the columns
  (masks readiness is orthogonal — facetFilterSQL self-falls-back to membership).
  Safe before the lite republish: flag false → today's point-mode behavior.
- Top-level helpers: wantFilteredClusters(), desiredClusterSig() (semantic, not
  SQL text — kind + sources + tree selections), filteredClusterSQL(res) (masks-
  backed lite aggregation; INTEGER casts; same columns/grain as build_h3_summary).
- loadRes: when wantFilteredClusters(), query filtered lite instead of the summary
  parquet. Snapshot the sig BEFORE the await; discard on
  `gen !== loadResGen || sig !== desiredClusterSig()` (filters toggled mid-query).
  Stamp viewer._clusterFilterSig on success.
- phase1: seed viewer._clusterFilterSig from the initial summary load.

Render OK. Per Codex design review (P0.1 casts, P0.2/P1.3 snapshot signature).
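A sketch of the snapshot-then-compare guard in loadRes, under stated assumptions: the `state` object stands in for the page's module-level variables, runQuery for the actual cluster query, and the signature is built from hypothetical `sources`/`facets` fields. The signature is computed before the await and compared after it, so a filter toggle during a slow query discards the result instead of painting stale clusters.

```javascript
// Hypothetical stand-in for the page's loadResGen, filter state and viewer._clusterFilterSig.
const state = { loadResGen: 0, filters: { sources: [], facets: {} }, clusterFilterSig: null };

// Semantic signature: what the clusters MEAN (kind + sources + facet selections),
// not the SQL text, so equivalent selections compare equal regardless of order.
function desiredClusterSig() {
  const { sources, facets } = state.filters;
  const active = sources.length > 0 || Object.values(facets).some((v) => v.length > 0);
  const sortedFacets = Object.keys(facets).sort().map((k) => [k, [...facets[k]].sort()]);
  return JSON.stringify({
    kind: active ? "filtered" : "summary",
    sources: [...sources].sort(),
    facets: sortedFacets,
  });
}

async function loadRes(res, runQuery) {
  const gen = ++state.loadResGen;
  const sig = desiredClusterSig(); // snapshot BEFORE the await
  const rows = await runQuery(res);
  // A newer load started, or filters changed while we awaited: discard.
  if (gen !== state.loadResGen || sig !== desiredClusterSig()) return null;
  state.clusterFilterSig = sig; // stamp only on success
  return rows;
}
```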
…orld zoom

Relax the isamplesorg#267 force-point rule: with a facet active above EXIT_POINT_ALT (and
filtered clusters ready), the map now shows FILTERED h3 clusters instead of
capped raw points. Zoom-in still drops to individual dots. Search stays
point-latched (out of scope).

- computeTargetMode(alt, latch=getMode()): search→point; facet&&!ready→point
  (pre-republish fallback); else ENTER/EXIT altitude hysteresis. latch param lets
  a URL restore resolve the band against the saved mode (Codex P1.7).
- reconcileGlobeForFilters(): shared transition for both filter-change handlers.
  Point branch invalidates in-flight cluster loads (loadResGen++) so a stale
  loadRes can't paint under point mode (P1.4). Cluster branch loads filtered
  clusters into the hidden layer FIRST, then exits point only if applied — no
  stale-cluster flash on a failed/superseded load (P1.10) — then chases
  tryEnterPointModeIfNeeded() (supersession invariant, P1.4).
- handleFacetFilterChange / applySearchFilterChange: route through the reconcile.
- camera.changed cluster branches: reload when resolution OR filter signature
  changed (a facet toggle in cluster mode refreshes the filtered cells); point→
  cluster uses load-first-then-exit (P1.5/P1.10).
- moveEnd gate (was: exit only when no filter): now computeTargetMode-driven, so
  a sub-10% zoom-out with a facet active loads filtered clusters; listener made
  async (return value unused) (P1.5).
- Readiness→reconcile hook (window.__onFilteredClustersReady) for late preflight
  / republished lite (P1.9).

Render OK. C3 (deep-link/boot restore, filtered click hydration, facet note) next.
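The mode decision described above can be sketched as a pure function. The threshold values are placeholders (the real ENTER_POINT_ALT/EXIT_POINT_ALT live in the page), and the flags object is a hypothetical stand-in for the page's filter state. Passing `latch` explicitly is what lets a URL restore resolve the in-band case against the saved mode rather than the current one.

```javascript
// Placeholder thresholds (metres); the only property that matters is ENTER < EXIT.
const ENTER_POINT_ALT = 150_000; // below this, switch to point mode
const EXIT_POINT_ALT = 300_000;  // above this, switch back to clusters

// Hypothetical sketch of computeTargetMode with altitude hysteresis.
function computeTargetMode(alt, latch, { searchActive, facetActive, filteredClustersReady }) {
  if (searchActive) return "point";                          // search stays point-latched
  if (facetActive && !filteredClustersReady) return "point"; // pre-republish fallback (#267 rule)
  if (alt < ENTER_POINT_ALT) return "point";
  if (alt > EXIT_POINT_ALT) return "cluster";
  return latch; // inside the band: keep the current (or restored) mode
}
```

The gap between the two thresholds keeps small camera jitters near one altitude from flipping the mode back and forth.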
…k hydration, facet note

- Deep-link/back-forward restore (hashchange): resolve mode via
  computeTargetMode(restoredAlt, latch=s.mode) so a facet at world zoom restores
  to FILTERED clusters, not point. Load clusters first then exit point; isStale()
  before mutation and after the load await; suppress-hash released first as before
  (Codex P1.6/P1.7).
- Cold boot mode hydration: same computeTargetMode(latch) treatment; when a facet
  is active at cluster altitude, reload the FILTERED clusters over phase1's
  unfiltered summary load (P1.7). Still enters point for the isamplesorg#203 alt<ENTER loophole.
- fetchClusterByH3: filter-aware single-cell aggregation off lite (count / center /
  dominant_source over the filtered subset, same tie-break as filteredClusterSQL)
  so a clicked or deep-linked filtered cell shows filtered numbers and a
  filter-excluded cell resolves to null (P1.8).
- handleFacetFilterChange: revalidate the selected cluster card after a facet
  change (clear if the cell emptied, else re-hydrate), guarded by a freshness
  token — mirrors the source-filter handler (P1.8).
- syncFacetNote: hide the "filter only at neighborhood zoom" apology once filtered
  clusters are ready — cluster mode is now filter-aware (P1.10).

Render OK. Completes the isamplesorg#300 browser implementation (C1+C2+C3).

- P0: delete the dedicated _urlHasFacets boot force-point block — it ran AFTER
  the new filtered-cluster boot load and switched straight back to points,
  negating isamplesorg#300 for cold-boot facet deep links. computeTargetMode (via bootTarget)
  now owns that decision.
- P1.2: invalidateClusterLoads() = ++loadResGen + loading=false. A bare
  loadResGen++ left `loading` stuck true (the superseded load's finally only
  clears it when its gen is current), wedging every later reload guard. Used in
  reconcile's point branch and at hashchange entry.
- P1.3: clusterSig(kind); phase1 labels its load clusterSig('summary') explicitly
  so a reconcile can't mistake facet-blind summary clusters for filtered data.
- P1.4: readiness fallback reconcile also checks _clusterFilterSig (at world zoom
  the mode is already 'cluster' though the layer is still summary).
- P1.5: boot 'point' uses direct enterPointMode for forced/saved cases
  (search / facet-not-ready / explicit mode=point) since tryEnterPointModeIfNeeded
  refuses at alt >= ENTER_POINT_ALT; gentle helper only for altitude-driven entry.
- P1.6: moveEnd chases tryEnterPointModeIfNeeded after exitPointMode (an
  overlapping settle can drop below ENTER during the load await).
- P1.7: hashchange invalidates cluster loads at ENTRY so a prior restore
  callback's loadRes discards instead of replacing data before the late isStale().
- P1.8: handleFacetFilterChange captures the freshness token at entry (a second
  toggle during the reconcile await must invalidate the first's revalidation).

Codex verified correct: integer casts, post-await sig TOCTOU, load-first ordering,
filtered fetchClusterByH3, build tests 23/23. Render OK.
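A reduced model of the P1.2 wedge, with hypothetical names: the reload guard `loading` is released only by the current load's finally block, so bumping loadResGen alone strands it at true. invalidateClusterLoads() releases it explicitly.

```javascript
let loadResGen = 0;
let loading = false;

// Hypothetical loader: `loading` guards against overlapping loads.
async function loadClusters(fetchRows) {
  if (loading) return "skipped"; // reload guard
  loading = true;
  const gen = ++loadResGen;
  try {
    const rows = await fetchRows();
    return gen === loadResGen ? rows : "discarded";
  } finally {
    if (gen === loadResGen) loading = false; // only the current load releases the guard
  }
}

// Invalidation must supersede in-flight loads AND release the guard.
function invalidateClusterLoads() {
  ++loadResGen;
  loading = false;
}
```

Replacing invalidateClusterLoads() with a bare `loadResGen++` leaves `loading` true after the superseded load settles, and every later loadClusters call returns "skipped".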
…ered clusters

THE BUG: with a facet active at world zoom, filtered clusters never loaded — the
loadRes filtered query, issued during boot's concurrent query storm, NEVER
resolved. The identical query completes in ~2.5s once the connection is idle, and
even two concurrent post-boot queries are fine — but DuckDB-WASM (the non-threaded
MVP build this page loads) DEADLOCKS when the heavy filtered aggregation
(samples_map_lite + sample_facet_masks) runs amid boot's other in-flight queries.

THE FIX: serialize every db.query through a FIFO chain (wrap DuckDBClient.query in
the `db` cell), so at most one query runs at a time. SQL queries are atomic (none
awaits another mid-execution), so chaining can't deadlock; the latency cost is
small. Single-point fix; without it isamplesorg#300's filtered clusters never appear.
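A minimal sketch of such a FIFO wrapper, assuming only that the client exposes an async `query(sql)` method; serializeQueries is a hypothetical name. Each call chains onto the previous call's settlement, and the internal chain swallows rejections so one failed query does not block the rest, while each caller still sees its own rejection. Since `.sql` calls `.query`, replacing the instance method covers it as well.

```javascript
// Hypothetical sketch: serialize client.query through a FIFO promise chain.
function serializeQueries(client) {
  const original = client.query.bind(client);
  let tail = Promise.resolve();
  client.query = (...args) => {
    // Start this query only after every previously issued one has settled.
    const result = tail.then(() => original(...args));
    // Keep the chain alive even if this query rejects.
    tail = result.catch(() => {});
    return result;
  };
  return client;
}
```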

Also (found while debugging the double-fire):
- !loading guards on the readiness reconcile triggers (setTimeout0 + onReady hook)
  so boot's filtered load isn't redundantly issued twice.

Verification (tests/playwright/filtered-clusters-300.spec.js, [data], against a
local res46 lite served by dev_server.py):
- broad facet (anyanthropogenicmaterial) at world zoom → _clusterFilterSig
  kind:filtered, cluster mode (not forced point), 81 res4 cells; cluster
  sample_count sum == independent masks-backed COUNT(*) (count conservation).
- zoom-in below ENTER_POINT_ALT → point mode.

scripts/regen_lite_res46.py: derive h3_res4/res6 onto the existing 202608 lite via
the h3 extension (no wide needed); validated against the shipped h3 summaries.
DESIGN_300.md: design + Codex-review record.
…ading guards, dedup)

- P1: drop the `!loading` guards on the readiness reconcile triggers. They were
  added to avoid the boot double-fire, but db.query serialization already prevents
  the deadlock, and the guards are LOSSY — readiness arriving while an older
  unfiltered loadRes is in flight would skip the only reconcile signal permanently
  (Codex serialization-review P1).
- Add a sig-dedup at the top of reconcileGlobeForFilters' cluster branch: if
  already in cluster mode with the matching filter signature, no-op. Keeps the
  now-unguarded redundant reconciles (boot + hook + setTimeout0) cheap instead of
  re-running the heavy filtered aggregation.
- P2: narrow the `db` serialization comment — it wraps `.query` (and `.sql`, which
  calls `.query`); it does NOT cover queryStream/queryRow/raw connect. All 45 data
  calls in the page use db.query (verified), so coverage is complete today.
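A sketch of the cluster branch after this change, combining the signature dedup with the load-first-then-exit ordering from the earlier commits. `view` and its methods are hypothetical stand-ins for the page's globe state and helpers (the loadRes path, exitPointMode, tryEnterPointModeIfNeeded).

```javascript
// Hypothetical sketch of reconcileGlobeForFilters' cluster branch.
async function reconcileClusterBranch(view, desiredSig) {
  // Dedup: redundant triggers (boot, readiness hook, setTimeout 0) become no-ops.
  if (view.mode === "cluster" && view.clusterFilterSig === desiredSig) return "noop";
  // Load into the hidden cluster layer FIRST; false means failed or superseded.
  const applied = await view.loadFilteredClusters(desiredSig);
  if (!applied) return "kept"; // no stale-cluster flash: stay in the current mode
  if (view.mode === "point") view.exitPointMode();
  // The camera may have dropped below ENTER while the load was awaited.
  view.tryEnterPointModeIfNeeded();
  return "applied";
}
```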
…he-bust)

Activates filtered clusters in production: the _v2 lite carries h3_res4/h3_res6.
A new filename (not overwriting the original) preserves the immutable-cache
contract (isamples_YYYYMM_*.parquet is served immutable/1-yr) — every visitor
fetches fresh data, no Cloudflare purge needed. One-off retrofit for 202608; the
next generation builds res4/res6 into the canonical name natively.

REQUIRES isamples_202608_samples_map_lite_v2.parquet uploaded to R2 (bucket
isamples-ry) BEFORE this merges — lite_url is load-bearing (point mode, deep
links, filtered clusters all read it), so a missing _v2 would 404 the explorer.
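To illustrate why a new filename avoids a purge: if every name matching isamples_YYYYMM_*.parquet is served as immutable for a year, changed bytes must ship under a name no browser or edge cache has seen yet. The regex, header values, and the non-matching fallback below are assumptions sketching that contract, not the actual R2/Cloudflare configuration.

```javascript
// Hypothetical model of the immutable-cache contract for versioned parquet files.
const IMMUTABLE_PATTERN = /^isamples_\d{6}_[A-Za-z0-9_]+\.parquet$/;
const IMMUTABLE_CACHE_CONTROL = "public, max-age=31536000, immutable"; // 365 days in seconds

function cacheControlFor(filename) {
  // Fallback for other files is an assumption for illustration only.
  return IMMUTABLE_PATTERN.test(filename) ? IMMUTABLE_CACHE_CONTROL : "no-cache";
}
```

Both the original and the _v2 lite match the pattern, so both stay immutable; visitors pick up the new data because lite_url now names a file their caches have never stored.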
…the filtered load

The full db.query FIFO serialization (boot-deadlock fix) queued interactive
queries behind boot's whole storm — the pre-deploy smoke gate's "pottery" search
exceeded its 90s budget on a cold CI runner, blocking the staging deploy.

Surgical replacement: keep all queries CONCURRENT (fast boot + search, smoke gate
happy) and gate ONLY the heavy filtered-cluster aggregation on an idle connection.
- db cell: in-flight COUNTER (non-serializing) instead of the FIFO chain; exposes
  instance._inFlight().
- loadRes: when filtered, `await whenConnectionIdle()` before issuing — waits out
  boot's concurrent query storm (the deadlock trigger), no-op post-boot. Re-checks
  supersession after the wait. The light summary path is unchanged (safe concurrent).

The deadlock only occurs with a facet active at boot; the smoke test (text search,
no facet) never hit that path — its failure was purely the serialization slowing
search. Verified: filtered-clusters-300 2/2 (idle-gate avoids deadlock); smoke
passes ~28s (was ~40s serialized).
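A sketch of the counter-plus-idle-gate approach, assuming a client with an async `query(sql)`. trackInFlight and the polling interval are hypothetical; `_inFlight()` and whenConnectionIdle() follow the names in the commit message. Queries still run concurrently; only a caller that explicitly awaits whenConnectionIdle() is held back.

```javascript
// Hypothetical sketch: count in-flight queries without serializing them.
function trackInFlight(client) {
  const original = client.query.bind(client);
  let inFlight = 0;
  client.query = async (...args) => {
    inFlight++;
    try {
      return await original(...args);
    } finally {
      inFlight--;
    }
  };
  client._inFlight = () => inFlight;
  return client;
}

// Poll until no query is running; resolves on the first check when already idle.
async function whenConnectionIdle(client, pollMs = 50) {
  while (client._inFlight() > 0) {
    await new Promise((resolve) => setTimeout(resolve, pollMs));
  }
}
```

In loadRes, the filtered branch would `await whenConnectionIdle(db)` and then re-check its generation and filter signature before issuing the heavy query, since filters or the camera may have changed during the wait. The gate is not a lock: another query can start right after it resolves, presumably acceptable because the deadlock was only observed amid boot's query storm.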
rdhyee merged commit 6f96df2 into isamplesorg:main on Jun 19, 2026 (3 checks passed).

Closes issue #300: explorer: filtered cluster/heatmap view at world zoom (don't promote to raw point mode for broad filters)
