
Add RFC-0055 Out-of-Tree Platform Build and Distribution #97

Open
afrittoli wants to merge 1 commit into pytorch:master from afrittoli:rfc0051
Conversation

@afrittoli


No description provided.

Signed-off-by: Andrea Frittoli <andrea.frittoli@uk.ibm.com>
@meta-cla meta-cla Bot added the cla signed label Jun 22, 2026
@groenenboomj


Red Hat is also very interested in supporting a nightly runner signal.

@albanD

albanD commented Jun 22, 2026

Contributor

Thanks for sending this RFC. I expect we'll finish the current CRCR for testing, focus on onboarding projects there, and make sure we get benefits from that before we build more pieces there.
But happy to take a closer look at this one after that!

@afrittoli afrittoli changed the title Add RFC-0051 Out-of-Tree Platform Build and Distribution Add RFC-0055 Out-of-Tree Platform Build and Distribution Jun 22, 2026
@afrittoli

Author

> Thanks for sending this rfc, I expect we'll finish the current CRCR for testing and focus on onboarding projects there and make sure we get benefits from that before we build more pieces there. But happy to take a closer look at this one after that!

Thanks @albanD, feedback would be welcome. I believe there's plenty of design and prototyping work that I can look into in parallel with the current work on CRCR.


Each platform operates in an isolated lane:

- **Credential isolation**: Each platform has a dedicated IAM role that can only write to that platform's storage prefix. OIDC trust policies scope the role to the specific vendor repo. A compromised vendor repo cannot access another platform's storage or the main PyTorch artifact space.

Contributor


This implies AWS S3 infra. It also implies centralized management of the IAM roles, presumably by the RelEng team (which is very Meta-heavy atm).
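For readers following the thread, the credential isolation in the quoted RFC text could look roughly like the sketch below, assuming AWS IAM with GitHub's OIDC provider (which is the assumption this comment calls out). The account ID, bucket name, repo name, and prefix are hypothetical placeholders, not values from the RFC; the `aud` and `sub` condition keys follow the documented GitHub-to-AWS OIDC setup.

```python
# Hypothetical placeholders for illustration; none of these come from the RFC.
ACCOUNT_ID = "123456789012"
BUCKET = "example-artifacts"
OIDC_HOST = "token.actions.githubusercontent.com"


def trust_policy(vendor_repo: str) -> dict:
    """Trust policy that lets only workflows from `vendor_repo` assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {
                "Federated": f"arn:aws:iam::{ACCOUNT_ID}:oidc-provider/{OIDC_HOST}"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                # Token must be minted for AWS STS ...
                "StringEquals": {f"{OIDC_HOST}:aud": "sts.amazonaws.com"},
                # ... and must come from the specific vendor repo.
                "StringLike": {f"{OIDC_HOST}:sub": f"repo:{vendor_repo}:*"},
            },
        }],
    }


def write_policy(platform_prefix: str) -> dict:
    """Permission policy that allows writes only under one platform's prefix."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": "s3:PutObject",
            "Resource": f"arn:aws:s3:::{BUCKET}/{platform_prefix}/*",
        }],
    }
```

One role per platform would pair `trust_policy("example-vendor/example-backend")` with `write_policy("whl/example-backend")`. Someone still has to create and maintain those roles centrally, which is the operational cost raised here.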

Each platform operates in an isolated lane:

- **Credential isolation**: Each platform has a dedicated IAM role that can only write to that platform's storage prefix. OIDC trust policies scope the role to the specific vendor repo. A compromised vendor repo cannot access another platform's storage or the main PyTorch artifact space.
- **Upload workflow isolation**: Uploads go through the official `_binary_upload.yml` workflow, which enforces naming conventions before writing to S3. Once [Stage 3](#implementation-plan) is complete, this workflow also generates provenance attestations. If the `job_workflow_ref` dual-gate can be confirmed (see [Credentials and Publishing Access](#credentials-and-publishing-access)), vendors cannot bypass this workflow even with valid OIDC credentials.

Contributor


I don't think `_binary_upload.yml` can really be used to enforce anything. Enforcement must be done at the IAM level, which is hard and implies a lot of heavy lifting from the RelEng team.
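The "dual-gate" mentioned in the quoted RFC text can be read as two independent checks on the GitHub OIDC token claims: the `repository` claim must match the vendor repo, and the `job_workflow_ref` claim must point at the official upload workflow. The sketch below shows only that logic on already-decoded claims. The workflow path and repo names are hypothetical, token signature verification is omitted, and where such a check would actually be enforced (IAM or elsewhere) is exactly the open question in this comment.

```python
# Hypothetical location of the official reusable workflow; the RFC does not
# state the repository that would host it.
OFFICIAL_WORKFLOW = "example-org/example-infra/.github/workflows/_binary_upload.yml@"


def passes_dual_gate(claims: dict, vendor_repo: str) -> bool:
    """Return True only if both gates hold for the decoded OIDC claims.

    Gate 1: the token was issued for a run in the expected vendor repo.
    Gate 2: the job is running inside the official upload workflow.
    """
    repo_ok = claims.get("repository") == vendor_repo
    workflow_ok = str(claims.get("job_workflow_ref", "")).startswith(OFFICIAL_WORKFLOW)
    return repo_ok and workflow_ok
```

A token from the right repo but issued to a vendor-authored workflow fails the second gate, which is the bypass the RFC text hopes to rule out.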

Platform vendors are responsible for security vulnerabilities in their platform-specific code. When a vulnerability affects packages hosted at `download.pytorch.org`, the following process applies:

1. Vendor discloses the vulnerability to the PyTorch security team at security@pytorch.org (or equivalent) within 7 days of discovery.
2. PyTorch infra can yank (remove from the CDN index without deleting) the affected artifacts while a fix is prepared.

Contributor


Why without deleting? What if the affected artifact distributes malicious content?
