fix(train): Handle subscription-only models in recipe selection #5948

Closed
haardm wants to merge 1 commit into aws:master from haardm:fix/subscription-only-recipe-selection

@haardm haardm commented Jun 15, 2026

Contributor

Description

Fix recipe selection for models whose only recipes are flagged IsSubscriptionModel (e.g., nova-textgeneration-micro-v2). Previously, the primary recipe filter required not IsSubscriptionModel, so these models matched nothing and recipe selection raised ValueError: No recipes found.
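The selection behavior described here can be illustrated with a minimal sketch. It is not the SDK's actual code: the function name `select_recipes`, the dict-shaped recipes, and the `Name` key are assumptions made for illustration; only the `IsSubscriptionModel` flag and the `No recipes found` error come from this PR.

```python
from typing import Optional


def select_recipes(recipes: list[dict]) -> tuple[dict, Optional[dict]]:
    """Pick a primary recipe and an optional subscription recipe to merge in.

    Hypothetical helper: recipe dicts and every key except
    IsSubscriptionModel are illustrative assumptions.
    """
    standard = [r for r in recipes if not r.get("IsSubscriptionModel")]
    subscription = [r for r in recipes if r.get("IsSubscriptionModel")]

    if not standard and not subscription:
        raise ValueError("No recipes found")

    if standard:
        # Mixed or standard-only model: the standard recipe stays primary.
        primary = standard[0]
        override = subscription[0] if subscription else None
    else:
        # Subscription-only model: fall back to the subscription recipe.
        # It is already the primary, so there is nothing to merge into it.
        primary = subscription[0]
        override = None

    return primary, override
```

With a subscription-only recipe list, the old filter would have produced an empty candidate list; in this sketch the subscription recipe is returned as primary and the override slot stays `None`, which is what lets a caller skip the merge step.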

Fixes

  • Fall back to the subscription recipe as primary when no standard one exists
  • Handle access point ARN URIs in the primary recipe download path (subscription recipes use the s3://arn:aws:s3:... format)
  • Guard against self-merge: when the primary IS the subscription recipe, skip the merge step
  • Resolve {customer_id} conditionally: only call STS when the URI contains the placeholder
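The URI handling in the list above could look roughly like the sketch below. The helper names `split_s3_uri` and `resolve_customer_id` are hypothetical, and the sketch assumes that the access point ARN occupies the first two slash-separated segments of the URI and that `{customer_id}` resolves to an account ID. The account lookup is injected as a callable so the example runs without AWS credentials; in real code it might wrap an STS `get_caller_identity` call.

```python
from typing import Callable


def split_s3_uri(uri: str) -> tuple[str, str]:
    """Split an S3 URI into (bucket_or_access_point_arn, key).

    Hypothetical helper. Accepts both s3://bucket/key and
    s3://arn:aws:s3:<region>:<account>:accesspoint/<name>/key.
    """
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"Not an S3 URI: {uri}")
    path = uri[len(prefix):]

    if path.startswith("arn:"):
        # The ARN itself contains one slash ("...:accesspoint/<name>"),
        # so the object key starts after the second slash.
        parts = path.split("/", 2)
        if len(parts) < 3:
            raise ValueError(f"Access point URI has no object key: {uri}")
        return f"{parts[0]}/{parts[1]}", parts[2]

    bucket, _, key = path.partition("/")
    return bucket, key


def resolve_customer_id(uri: str, get_account_id: Callable[[], str]) -> str:
    """Substitute {customer_id} only if present, so the lookup is skipped otherwise."""
    placeholder = "{customer_id}"
    if placeholder not in uri:
        return uri  # no placeholder: no STS round trip needed
    return uri.replace(placeholder, get_account_id())
```

Passing the lookup in as a callable keeps the "only call STS when needed" property easy to check in a unit test, since a fake can record whether it was invoked.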

Testing

Unit tests (75 passed, 0 failed)

PYTHONPATH=src:../sagemaker-core/src pytest tests/unit/train/common_utils/test_finetune_utils.py

New test cases:

  • test__get_fine_tuning_options_subscription_only_model_lora — micro-v2 LoRA
  • test__get_fine_tuning_options_subscription_only_model_full — micro-v2 full-rank
  • test__get_fine_tuning_options_mixed_recipes_still_prefers_standard — lite-v2 regression

E2E validation (prod IAD, account 551952248621)

| Test | Job ARN | Result |
| --- | --- | --- |
| Micro v2 LoRA datamix | ...training-job/nova-textgeneration-micro-v2-sft-20260615191820 | ✅ Created + Downloading |
| Micro v2 FULL datamix | ...training-job/nova-textgeneration-micro-v2-sft-20260615191835 | ✅ Created + Downloading |
| Lite v2 LoRA datamix (regression) | ...training-job/nova-textgeneration-lite-v2-sft-20260615191850 | ✅ Created + Downloading |
| Lite v2 standard SFT (regression) | ...training-job/nova-textgeneration-lite-v2-sft-20260615191906 | ✅ Created (standard recipe selected) |
| Non-existent model | N/A | ✅ Correctly raises ResourceNotFound |

Related

Models like nova-textgeneration-micro-v2 have only IsSubscriptionModel
recipes. The primary recipe filter required not IsSubscriptionModel,
causing ValueError when no standard recipe exists.

Fix:
- Fallback to subscription recipe as primary when no standard one exists
- Handle access point ARN URIs in primary recipe download path
- Guard against merging subscription override_params into itself
- Only resolve {customer_id} placeholder when present in URI

Tests:
- subscription_only_model_lora: Micro v2 LoRA case
- subscription_only_model_full: Micro v2 full-rank case
- mixed_recipes_still_prefers_standard: Lite v2 regression guard

Fixes: V2248468914
@haardm haardm temporarily deployed to manual-approval June 15, 2026 19:25 — with GitHub Actions Inactive
@haardm haardm deployed to manual-approval June 15, 2026 19:25 — with GitHub Actions Active