Add example of publishing PyPI package to Local Python repository #22

Merged: 5 commits, Jul 21, 2023

Conversation

jvroberts
Contributor

An example of how to make PyPI packages available from an offline Package Manager instance, until Package Manager supports air-gapped/offline PyPI repositories.

Adds an example bash script that will:

  • Download a specified package's source tarball and its associated binary wheels from PyPI.
  • Upload those package files to a local Package Manager instance using twine.

Contributor

@jmwoliver jmwoliver left a comment

This looks great! I tested it out and made a note of a few things I noticed.

What is the likelihood of an air-gapped customer being able to use this? Would they have pypi.org blocked on their network, making them unable to pull down packages like this?

pypi-to-local-python/add-pypi-package.sh Outdated Show resolved Hide resolved

shepherdjerred commented Jul 20, 2023

Looks reasonable to me once the issues Jacob brought up are addressed.

@jvroberts
Contributor Author

What is the likelihood of an air-gapped customer being able to use this? Would they have pypi.org blocked on their network, making them unable to pull down packages like this?

That's a good question. Technically, you could run this from a machine in a DMZ that has access to both PyPI and the internal PPM, and upload the packages from there. You could also adapt the example to pull the packages you want separately, then copy them over manually and push them. It's a starting point at least!

Contributor

@jonyoder jonyoder left a comment

This looks really neat! The only suggestion I'd have is to write an alternative script that would let you do the same thing for an entire repo; then a customer could follow this workflow:

  1. Install a temporary PPM instance with a trial license (that has access to the internet).
  2. Create a Curated-PyPI source/repo and populate it with the packages they need.
  3. Run the script against their curated repo to download all the repo packages and upload them to their real PPM server.
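
A rough sketch of what such a batch wrapper could look like. The thread does not say how to enumerate the contents of a curated repo, so this assumes the package list has already been exported to a plain-text file with one `name [version]` pair per line. The function name `add_packages_from_list` and the `ADD_SCRIPT` override are hypothetical, introduced only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: call the single-package script once per listed package.
# ADD_SCRIPT is an assumed override, handy for dry runs (e.g. ADD_SCRIPT=echo).
add_packages_from_list() {
  local list_file=$1
  local add_script=${ADD_SCRIPT:-./add-pypi-package.sh}
  local name version failed=0
  # The extra test keeps a final line that lacks a trailing newline.
  while read -r name version || [[ -n "$name" ]]; do
    # Skip blank lines and comment lines.
    if [[ -z "$name" || "$name" == \#* ]]; then
      continue
    fi
    # ${version:+...} passes a second argument only when a version was given.
    # stdin is redirected so the called script cannot consume the list.
    if ! "$add_script" "$name" ${version:+"$version"} </dev/null; then
      echo "Failed to add $name" >&2
      failed=1
    fi
  done < "$list_file"
  return "$failed"
}
```

Running it with `ADD_SCRIPT=echo` prints the arguments that would be passed for each package without downloading or uploading anything.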

Contributor

@jonyoder jonyoder left a comment

Here's a round of feedback.

echo Downloading package files for "$PACKAGE" from PyPI...
echo

curl https://pypi.org/pypi/$PACKAGE/json | jq ".releases[$VERSION][] | .url" | xargs -n1 curl --retry 2 -O --output-dir $PKGDIR

Contributor

Is it a bit sad that we're not defaulting to P3M here? I realize it has a different API, but we could consider supporting both.

Contributor Author

Yeah, I originally planned to do that but then the PyPI API was easier. I'd welcome an adaptation to use P3M instead.

Comment on lines 41 to 46
if [ "$2" = "" ]
then
VERSION=.info.version
else
VERSION=\"$2\"
fi

Contributor

You should be able to shorten this entire block to:

VERSION=${2:-.info.version}

Contributor Author

I need a bash wizard on retainer... :)

Contributor

@jonyoder jonyoder Jul 21, 2023

My fees are reasonable. 🤣 (although I wouldn't say I'm a wizard)

Contributor Author

Except if $2 is provided, I need it to be double-quoted, but if using the default .info.version I need it to be unquoted. What's the cleanest way to make that happen?

Contributor

I'm sure a real bash wizard would know how, but the smallest I'm coming up with is:

VERSION=$([[ -z $2 ]] && echo ".info.version" || echo "\"$2\"")

Your original version is more readable!

Contributor

Actually, here's one idea that's not so bad:

VERSION=${2:+\"$2\"}
VERSION=${VERSION:-.info.version}

Contributor Author

That works nicely!
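
For readers following the thread, here is a small sketch of how the two-step expansion behaves. The function name `version_filter` is hypothetical; it wraps the same two expansions so they can be tried in isolation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn an optional version argument into a jq key expression.
version_filter() {
  local version
  # ${1:+...} yields the double-quoted form only when $1 is set and non-empty.
  version=${1:+\"$1\"}
  # ${version:-...} falls back to the jq path when no version was supplied.
  version=${version:-.info.version}
  printf '%s\n' "$version"
}

version_filter          # prints .info.version
version_filter 1.4.4    # prints "1.4.4" (including the double quotes)
```

The quoted form is what jq needs to look up a literal key in `.releases`, while the unquoted `.info.version` is evaluated by jq as a path into the same JSON document.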

KEEP_PACKAGES=false

# name of the Package Manager python local source to add packages to
PACKAGEMANAGER_SOURCE=python

Contributor

Here, and for the two other options above, you should consider making these optional defaults, like this:

# Caller can set TEMPDIR, or default to $TMPDIR or /tmp
TEMPDIR=${TEMPDIR:-${TMPDIR:-/tmp}}
KEEP_PACKAGES=${KEEP_PACKAGES:-false}
PACKAGEMANAGER_SOURCE=${PACKAGEMANAGER_SOURCE:-python}

The $TMPDIR default will automatically honor any environments that use a custom $TMPDIR setting.

With that change, users don't need to edit the script, but if they want something other than the default values, they can simply set env vars for the script, like:

KEEP_PACKAGES=true ./add-pypi-package.sh appdirs 3.0.4
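
A quick way to see which values the defaults resolve to is a sketch like the following. The function name `show_settings` and its local variable names are hypothetical; the expansions and default values are the ones suggested in this review.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the effective settings after applying defaults.
show_settings() {
  # Each setting keeps the caller's value if set and non-empty, else the default.
  local tempdir=${TEMPDIR:-${TMPDIR:-/tmp}}
  local keep=${KEEP_PACKAGES:-false}
  local src=${PACKAGEMANAGER_SOURCE:-python}
  local address=${PACKAGEMANAGER_ADDRESS:-http://localhost:4242}
  printf 'TEMPDIR=%s KEEP_PACKAGES=%s SOURCE=%s ADDRESS=%s\n' \
    "$tempdir" "$keep" "$src" "$address"
}

# Example: override one setting for a single call.
# KEEP_PACKAGES=true show_settings
```

Note that `${VAR:-default}` also applies the default when the variable is set but empty, which is usually what you want for settings like these.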

PACKAGEMANAGER_SOURCE=python

# Package Manager address and API token with permission to upload to source
PACKAGEMANAGER_ADDRESS=http://localhost:4242

Contributor

Same as above; here we could have this be a default address but allow an env var to override it.


# Package Manager address and API token with permission to upload to source
PACKAGEMANAGER_ADDRESS=http://localhost:4242
PACKAGEMANAGER_TOKEN=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJhdWQiOiJwYWNrYWdlbWFuYWdlciIsImp0aSI6ImQwOTIxZmJhLTcwNTUtNDU4Ni1iNTkwLWNkZDJiODJjMWI0NiIsImlhdCI6MTY4OTg3MjgzNCwiaXNzIjoicGFja2FnZW1hbmFnZXIiLCJzY29wZXMiOnsic291cmNlcyI6IjUzYmZlNGQyLTExYTUtNGI5Yi1iM2Q3LTc2NjU5YjExYWVlMiJ9fQ.KKTmNw32JM6IM30XCeJbadJSxGw3z6bNW0BqMwSqdus

Contributor

We should probably discourage setting a token in this script since it's a security risk. Instead, I'd recommend simply checking that this env var is set:

if [[ "$PACKAGEMANAGER_TOKEN" == "" ]]; then
  echo "You should define the PACKAGEMANAGER_TOKEN environment variable before using this script."
fi

If someone really wants the token in this script, they can still modify it and add it, of course.
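
The check above only prints a reminder and lets the script carry on. If a hard stop is preferred, a stricter variant might look like this sketch. The function name `require_token` is hypothetical and not part of the PR.

```shell
#!/usr/bin/env bash
# Hypothetical stricter variant: report on stderr and return non-zero.
require_token() {
  # ${PACKAGEMANAGER_TOKEN:-} avoids an "unbound variable" error under set -u.
  if [[ -z "${PACKAGEMANAGER_TOKEN:-}" ]]; then
    echo "PACKAGEMANAGER_TOKEN is not set; export it before running this script." >&2
    return 1
  fi
}

# In a script, stop before any download or upload is attempted:
# require_token || exit 1
```

Returning a status instead of calling `exit` inside the function leaves the decision to the caller, which also makes the check easy to try in an interactive shell.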

Comment on lines 75 to 78
if [ "$KEEP_PACKAGES" = "false" ]
then
rm -rf $PKGDIR
fi

Contributor

@jonyoder jonyoder Jul 21, 2023

Cleanup works best as a trap. I'd recommend adding this up on line 50, right below where PKGDIR is set:

PKGDIR=$TEMPDIR/$PACKAGE
mkdir -p "$PKGDIR"

# Remove the download directory on exit unless the caller asked to keep it
cleanup () {
  if [ "$KEEP_PACKAGES" = "false" ]; then
    rm -rf "$PKGDIR"
  fi
}
trap cleanup EXIT

This will clean things up even if the script fails due to a network error, 404, etc.
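
To illustrate that behavior, here is a self-contained sketch that simulates a failure partway through and reports whether the directory survived. The function name `demo_cleanup`, the temporary directory, and the stand-in file are all hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical demo: fail inside a subshell and see whether the EXIT trap
# removed the working directory.
demo_cleanup() {
  local keep=$1 workdir
  workdir=$(mktemp -d)
  (
    cleanup() {
      if [ "$keep" = "false" ]; then
        rm -rf "$workdir"
      fi
    }
    trap cleanup EXIT
    touch "$workdir/pkg.tar.gz"   # stand-in for a downloaded file
    exit 1                        # simulate a network error or a 404
  ) || true                       # ignore the simulated failure
  if [ -d "$workdir" ]; then
    echo "kept"
    rm -rf "$workdir"             # tidy up after the demo itself
  else
    echo "removed"
  fi
}

demo_cleanup false   # prints removed
demo_cleanup true    # prints kept
```

The subshell stands in for the whole script: the trap fires when it exits, regardless of the exit status.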

echo Downloading package files for "$PACKAGE" from PyPI...
echo

curl https://pypi.org/pypi/$PACKAGE/json | jq ".releases[$VERSION][] | .url" | xargs -n1 curl --retry 2 -O --output-dir $PKGDIR

Contributor

What happens if you specify a bad package name? You'll want to verify that the curl call was successful before you continue. Same with the twine upload command below.

Contributor

@jonyoder jonyoder Jul 21, 2023

I'd recommend using the -f flag to make curl fail if the command isn't successful. Then you can check the exit code, like:

# Get the JSON data from PyPI
url=https://pypi.org/pypi/$PACKAGE/json
json=$(curl -sf "$url")
if [[ "$?" -ne 0 ]]; then
  echo "Error downloading package JSON metadata for $PACKAGE from $url" >&2
  exit 1
fi

# Download the files
echo "$json" | jq ".releases[$VERSION][] | .url" | xargs -n1 curl --retry 2 -O --output-dir "$PKGDIR"
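
One related detail: in the original one-line pipeline, bash reports only the exit status of the last stage, so a failed metadata request would go unnoticed even with `-f`. The sketch below shows the difference `set -o pipefail` makes. The functions `fetch_metadata` and `extract_urls` are hypothetical stand-ins for the curl and jq stages, so it runs without network access.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins for the real pipeline stages.
fetch_metadata() { return 22; }   # 22 is the status curl -f returns on an HTTP error
extract_urls()   { cat; }         # placeholder for the jq filter

# Without pipefail, the pipeline's status is that of its last stage.
status_without_pipefail() {
  ( set +o pipefail; rc=0
    fetch_metadata | extract_urls >/dev/null || rc=$?
    echo "$rc" )
}

# With pipefail, a failure in any stage makes the whole pipeline fail.
status_with_pipefail() {
  ( set -o pipefail; rc=0
    fetch_metadata | extract_urls >/dev/null || rc=$?
    echo "$rc" )
}

status_without_pipefail   # prints 0
status_with_pipefail      # prints 22
```

Splitting the request and the download into separate steps, as suggested above, avoids the problem as well; `pipefail` is an option for scripts that keep the single pipeline.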

@jvroberts jvroberts merged commit bf72bcf into main Jul 21, 2023
72 checks passed
@jvroberts jvroberts deleted the pypi-to-local branch July 21, 2023 17:56
4 participants