feat: Reverse Translation Mapping Support #108

Draft
bencap wants to merge 5 commits into feature/bencap/vrs-correctness from feature/bencap/target-variant-projection

@bencap bencap commented Jun 17, 2026

Collaborator

No description provided.

bencap added 3 commits June 9, 2026 23:58
Add MappingOutcome so every (variant, level) record carries a typed outcome.
This distinguishes a benign absence (intronic, no protein consequence) from a
genuine failure, which error_message alone cannot convey. Derive the preferred
(authoritative) layer from the target's assay level rather than always
preferring genomic.
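
A minimal sketch of the idea in this commit message, not the PR's actual code. Only the name `MappingOutcome` and the `error_message` field come from the message above; the enum members, the `Layer` and `MappingRecord` types, and the assay-level rule in `preferred_layer` are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class MappingOutcome(Enum):
    """Hypothetical members; the real enum's values are not shown in the PR."""
    MAPPED = "mapped"
    NOT_APPLICABLE = "not_applicable"  # benign absence, e.g. intronic at protein level
    FAILED = "failed"                  # genuine failure


class Layer(Enum):
    GENOMIC = "genomic"
    CDNA = "cdna"
    PROTEIN = "protein"


@dataclass(frozen=True)
class MappingRecord:
    """One record per (variant, level) pair, always carrying a typed outcome."""
    variant_id: str
    layer: Layer
    outcome: MappingOutcome
    error_message: Optional[str] = None


def preferred_layer(assay_level: str) -> Layer:
    """Assumed rule: a protein-level assay prefers the protein layer,
    anything else prefers genomic."""
    return Layer.PROTEIN if assay_level == "protein" else Layer.GENOMIC
```

With a typed outcome, a consumer can treat `NOT_APPLICABLE` as an expected gap and reserve error handling for `FAILED`, without parsing `error_message`.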
…c projection

Select a coding transcript for NC_ protein-coding targets via Ensembl locus
overlap plus MANE, so they are no longer silently skipped by reverse
translation. Project each measured variant onto its deterministically
reachable layers (g<->c, nucleotide->p), emitted as typed-outcome records and
routed by preferred_layer_only.
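
The reachability rule described above (g<->c, nucleotide->p) could be sketched roughly as follows. The table, the function names, and the reading of `preferred_layer_only` as a filter are assumptions for illustration; the PR does not show how routing is implemented.

```python
from typing import Dict, Tuple

# Assumed reachability table: genomic and cDNA convert to each other and both
# project forward to protein; a protein variant has no deterministic
# nucleotide-level equivalent, so nothing is reachable from it.
REACHABLE: Dict[str, Tuple[str, ...]] = {
    "genomic": ("cdna", "protein"),
    "cdna": ("genomic", "protein"),
    "protein": (),
}


def reachable_layers(measured_layer: str) -> Tuple[str, ...]:
    """Layers a variant measured at `measured_layer` can be projected onto."""
    return REACHABLE[measured_layer]


def layers_to_emit(
    measured_layer: str, preferred_layer: str, preferred_layer_only: bool
) -> Tuple[str, ...]:
    """Measured layer plus its projections, optionally restricted to the
    preferred layer (a hypothetical reading of preferred_layer_only)."""
    candidates = (measured_layer, *reachable_layers(measured_layer))
    if preferred_layer_only:
        return tuple(layer for layer in candidates if layer == preferred_layer)
    return candidates
```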
…ferred-layer records

- Track represented variant IDs at the preferred layer; re-attribute
  only variants that have no preferred-layer record, avoiding duplicate
  mapped_scores for variants with both a dead genomic attempt and a
  measured protein record (e.g. codon-optimised targets).
- Synthesize a preferred-layer failure for variants that mapped only at
  a non-preferred layer (e.g. wild-type p.= on a genomic-preferred
  target) so every input variant always has exactly one output record.
- Extract _map_protein_layer in vrs_map to return (mapping, reason)
  instead of an ad-hoc error MappedScore; a row that maps at no layer
  is failed once, layer-agnostically, carrying the detailed reason.
- Add TestNullFailureDedup and TestMapProteinLayerReason test coverage.
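
The bookkeeping in the first two bullets can be illustrated with a small sketch. The dict-based record shape and the function name `ensure_one_preferred_record` are hypothetical and stand in for the mapper's real types; only the invariant (one preferred-layer record per input variant) is taken from the commit message.

```python
from typing import Dict, Iterable, List


def ensure_one_preferred_record(
    variant_ids: Iterable[str],
    records: List[Dict],
    preferred_layer: str,
) -> List[Dict]:
    """Return the preferred-layer records, adding a synthesized failure for
    every input variant that has none."""
    out = [r for r in records if r["layer"] == preferred_layer]
    # Track which variants are already represented at the preferred layer.
    represented = {r["variant_id"] for r in out}
    for vid in variant_ids:
        if vid in represented:
            continue  # already has a record; adding another would duplicate it
        out.append(
            {
                "variant_id": vid,
                "layer": preferred_layer,
                "mapped": False,
                "reason": f"no {preferred_layer}-layer mapping",
            }
        )
        represented.add(vid)
    return out
```

For example, a variant that mapped only at the protein layer on a genomic-preferred target receives a single synthesized genomic failure, while a variant that already has a genomic record is left alone.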
bencap added 2 commits June 29, 2026 16:06
…ignments

  For protein-vs-DNA BLAT alignments, qcoords always increase (protein
  reads N→C regardless of genome strand), so they cannot be used to
  detect strand. Switch to tcoords direction for protein queries.

  Also normalise hit_subranges and hit_range entries with min/max so
  they are always in ascending order, which they are not when the target
  gene sits on the minus strand.

  - Use tcoords direction (not qcoords) for strand detection when
    -q=prot is in blat_params
  - Wrap hit_subrange and hit_range endpoints in min/max in both
    _get_best_match and align_target_to_protein
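
A rough sketch of the strand logic described above, under stated assumptions: coordinates are given as lists of (start, end) block tuples, `blat_params` is a plain string, and "direction" means comparing the first coordinate of the first block with the last coordinate of the last block. The function names are hypothetical and are not the ones in the PR.

```python
from typing import List, Tuple

Block = Tuple[int, int]


def detect_strand(qcoords: List[Block], tcoords: List[Block], blat_params: str) -> str:
    """Return '+' or '-' from the direction of the informative coordinates.

    Protein query coordinates always increase, so for protein queries
    (-q=prot) only the target coordinates reveal the strand.
    """
    coords = tcoords if "-q=prot" in blat_params else qcoords
    first = coords[0][0]
    last = coords[-1][-1]
    return "+" if first <= last else "-"


def normalize_range(start: int, end: int) -> Tuple[int, int]:
    """Return the endpoints in ascending order, whatever the strand."""
    return (min(start, end), max(start, end))
```

Normalizing with min/max keeps downstream code free of strand-specific branches when it only needs the covered interval.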
  _get_mapped_reference_sequence had no CDNA branch, so it fell through
  to the genomic chromosome lookup and returned the NC_ accession as the
  post_mapped reference for the cdna layer. This caused target_genes
  post_mapped_metadata to carry NC_000017.11 under the "cdna" key
  instead of the NM transcript.

  Add a CDNA branch that resolves the NM/ENST accession from
  tx_output.nm (preferred, covers NC_/sequence-based targets) or from
  the target's own accession when it is already an NM_/ENST (cdna-source
  targets). Return None rather than a chromosome accession when no NM is
  resolvable. Also add unit tests for all three layer paths.
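
The resolution order described for the new CDNA branch might look like the sketch below. The function name and its two plain-string parameters are assumptions standing in for `tx_output.nm` and the target's accession; the real `_get_mapped_reference_sequence` signature is not shown in the PR.

```python
from typing import Optional


def resolve_cdna_reference(
    tx_nm: Optional[str], target_accession: Optional[str]
) -> Optional[str]:
    """Pick the transcript accession to report for the cdna layer.

    Prefer the selected transcript, fall back to the target's own accession
    when it is already a transcript, and otherwise return None instead of
    a chromosome accession.
    """
    if tx_nm:
        return tx_nm
    if target_accession and target_accession.startswith(("NM_", "ENST")):
        return target_accession
    return None
```

Called with no transcript and an `NC_` accession, this returns `None`, so the "cdna" key never carries a chromosome accession.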

Development

Successfully merging this pull request may close these issues.

feat: Redesign mapper output to produce MappingRecord + Allele rows at all levels