Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Docker runner assumes the Docker Hub (docker.io) registry #279

Open
tsibley opened this issue May 9, 2023 · 4 comments
Open

Docker runner assumes the Docker Hub (docker.io) registry #279

tsibley opened this issue May 9, 2023 · 4 comments
Assignees
Labels
bug Something isn't working

Comments

@tsibley
Copy link
Member

tsibley commented May 9, 2023

Current Behavior

During update, the Docker runner assumes all images come from the Docker Hub (docker.io) registry.

Despite that, Docker images from other registries still mostly work unless a tag-less image name is given or a build-* tag is given. Either case triggers a different code path:

if not tag or is_build_tag(tag):
build_tags = sorted(filter(is_build_tag, tags(repository)))
if build_tags:
return repository + ":" + build_tags[-1]
return image_name

and nextstrain.cli.runner.docker._update ends up trying to enumerate the build-* tags of the non-docker.io image by making requests to the docker.io registry. This understandably results in a 400 Bad Request from the latter. Oops.

Originally discovered in nextstrain/docker-base#148 (comment).

Expected behavior

Ideally the update code would fetch the tags available from the correct registry, but this is often complicated by forbidding of anonymous access and other authz issues.

As a temporary workaround, perhaps we could simply not support updates from other registries for now. This would involve detecting the registry correctly and excluding those which aren't docker.io from the build-* tag handling.

How to reproduce

The manylinux image here is used only as an example. Any image from a registry other than docker.io will have the same issue.

$ NEXTSTRAIN_HOME=$(mktemp -dt) NEXTSTRAIN_DOCKER_IMAGE=quay.io/pypa/manylinux_2_28_x86_64 nextstrain setup docker
Setting up docker…
Traceback (most recent call last):
  File "/home/tom/nextstrain/cli/.venv/bin/nextstrain", line 8, in <module>
    sys.exit(main())
  File "/home/tom/nextstrain/cli/nextstrain/cli/__main__.py", line 19, in main
    return cli.run( argv[1:] )
  File "/home/tom/nextstrain/cli/nextstrain/cli/__init__.py", line 36, in run
    return opts.__command__.run(opts)
  File "/home/tom/nextstrain/cli/nextstrain/cli/console.py", line 36, in decorated
    return f(*args, **kwargs)
  File "/home/tom/nextstrain/cli/nextstrain/cli/command/setup.py", line 59, in run
    setup_ok = opts.runner.setup(dry_run = opts.dry_run, force = opts.force)
  File "/home/tom/nextstrain/cli/nextstrain/cli/runner/docker.py", line 166, in setup
    if not setup_image(dry_run, force):
  File "/home/tom/nextstrain/cli/nextstrain/cli/runner/docker.py", line 186, in setup_image
    update_ok = _update(dry_run)
  File "/home/tom/nextstrain/cli/nextstrain/cli/runner/docker.py", line 308, in _update
    latest_image  = latest_build_image(current_image)
  File "/home/tom/nextstrain/cli/nextstrain/cli/runner/docker.py", line 436, in latest_build_image
    build_tags = sorted(filter(is_build_tag, tags(repository)))
  File "/home/tom/nextstrain/cli/nextstrain/cli/runner/docker.py", line 428, in tags
    "Authorization": "Bearer %s" % auth_token(repository),
  File "/home/tom/nextstrain/cli/nextstrain/cli/runner/docker.py", line 423, in auth_token
    return GET(url, params = params).json().get("token")
  File "/home/tom/nextstrain/cli/nextstrain/cli/runner/docker.py", line 415, in GET
    response.raise_for_status()
  File "/home/tom/nextstrain/cli/.venv/lib/python3.8/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: https://auth.docker.io/token?scope=repository%3Aquay.io%2Fpypa%2Fmanylinux_2_28_x86_64%3Apull&service=registry.docker.io
@tsibley
Copy link
Member Author

tsibley commented May 9, 2023

Ah, this issue extends to any image name with an explicit registry in it, including docker.io when explicitly specified, because we treat it as part of the image name when constructing the API requests.

@tsibley
Copy link
Member Author

tsibley commented May 9, 2023

The best thing to do here might be to (finally) switch to using docker-py instead of directly making the API requests (and potentially down the road, instead of invoking docker directly). The docker-py library will handle registries correctly by interfacing with the Docker daemon and existing config (I believe).

@tsibley
Copy link
Member Author

tsibley commented May 9, 2023

Oh, hmm. Not quite with docker-py, as it doesn't have much in the way of support for interacting with registries. It has great support for interacting with the Docker daemon, which in turn can interact with registries, but not do things like list the tags for an image repository. On the command line, I'd use something like Skopeo, e.g.:

docker run \
  --rm \
  --interactive \
  --tty \
  --network host \
  -v "$HOME"/.docker/config.json:/config.json:ro \
    quay.io/skopeo/stable:v1 list-tags \
      --authfile /config.json \
      docker://docker.io/nextstrain/base

Ugh. I guess we could shell out to Skopeo inside a Docker container to get the tags?

tsibley added a commit to nextstrain/docker-base that referenced this issue May 9, 2023
… ghcr.io

This should fix the currently-broken test-pathogen-repo-ci jobs when our
CI runs on master² (and they should continue to work on branches too).
It requires a little rearranging of jobs in the workflow, with a little
additional and unfortunate complexity due to conditionals.

This works around a Nextstrain CLI bug with registries other than
docker.io¹ during `nextstrain update docker` (and `nextstrain setup
docker`), which is run as part of our setup-nextstrain-cli action used
by this workflow.  Ideally we'll fix that bug and then be able to revert
this, especially since we might actually want to condition the pushing
to docker.io on the outcome of these test jobs in the future.

¹ <nextstrain/cli#279>
² <#148 (comment)>
@tsibley tsibley self-assigned this May 11, 2023
@tsibley
Copy link
Member Author

tsibley commented May 12, 2023

Preliminary patch for this which I'll turn into a proper PR eventually.

diff --git a/nextstrain/cli/runner/docker.py b/nextstrain/cli/runner/docker.py
index 0ffe971..b0f4733 100644
--- a/nextstrain/cli/runner/docker.py
+++ b/nextstrain/cli/runner/docker.py
@@ -409,6 +409,15 @@ def latest_build_image(image_name: str) -> str:
 
     >>> latest_build_image("nextstrain/base")
     'nextstrain/base:build-...'
+
+    When a registry other than docker.io (Docker Hub) is present, *image_name*
+    is passed through unchanged.
+
+    >>> latest_build_image("ghcr.io/nextstrain/base")
+    'ghcr.io/nextstrain/base'
+
+    >>> latest_build_image("ghcr.io/nextstrain/base:build-20220215T000459Z")
+    'ghcr.io/nextstrain/base:build-20220215T000459Z'
     """
     def GET(url, **kwargs):
         response = requests.get(url, **kwargs)
@@ -429,13 +438,17 @@ def latest_build_image(image_name: str) -> str:
         }
         return GET(url, headers = headers).json().get("tags", [])
 
-    repository, tag = split_image_name(image_name, implicit_latest = False)
+    registry, repository, tag = split_image_name(image_name, implicit_latest = False, split_registry = True)
 
     if not tag or is_build_tag(tag):
-        build_tags = sorted(filter(is_build_tag, tags(repository)))
+        if registry == "docker.io":
+            build_tags = sorted(filter(is_build_tag, tags(repository)))
 
-        if build_tags:
-            return repository + ":" + build_tags[-1]
+            if build_tags:
+                # Intentionally omit the default registry
+                return repository + ":" + build_tags[-1]
+        else:
+            warn(f"Image registry is not docker.io (Docker Hub); checking for newer 'build-…' tags is unsupported.")
 
     return image_name
 
diff --git a/nextstrain/cli/util.py b/nextstrain/cli/util.py
index 8e09c54..2b20c28 100644
--- a/nextstrain/cli/util.py
+++ b/nextstrain/cli/util.py
@@ -500,14 +500,14 @@ def byte_quantity(quantity: str) -> int:
 
 
 @overload
-def split_image_name(name: str, implicit_latest: Literal[True] = True) -> Tuple[str, str]:
+def split_image_name(name: str, implicit_latest: Literal[True] = True, split_registry: Literal[False] = False) -> Tuple[str, str]:
     ...
 
 @overload
-def split_image_name(name: str, implicit_latest: Literal[False]) -> Tuple[str, Optional[str]]:
+def split_image_name(name: str, implicit_latest: Literal[False], split_registry: Literal[True]) -> Tuple[str, str, Optional[str]]:
     ...
 
-def split_image_name(name: str, implicit_latest: bool = True) -> Tuple[str, Optional[str]]:
+def split_image_name(name: str, implicit_latest: bool = True, split_registry: bool = False) -> Union[Tuple[str, Optional[str]], Tuple[str, str, Optional[str]]]:
     """
     Split the Docker image *name* into a (repository, tag) tuple.
 
@@ -522,13 +522,49 @@ def split_image_name(name: str, implicit_latest: bool = True) -> Tuple[str, Opti
 
     >>> split_image_name("nextstrain/base:latest", implicit_latest = False)
     ('nextstrain/base', 'latest')
+
+    Or optionally into a (registry, repository, tag) tuple.
+
+    >>> split_image_name("nextstrain/base", split_registry = True)
+    ('docker.io', 'nextstrain/base', 'latest')
+
+    >>> split_image_name("ghcr.io/nextstrain/base", split_registry = True)
+    ('ghcr.io', 'nextstrain/base', 'latest')
+
+    >>> split_image_name("ghcr.io/nextstrain/base", split_registry = True, implicit_latest = False)
+    ('ghcr.io', 'nextstrain/base', None)
     """
     if ":" in name:
         repository, tag = name.split(":", maxsplit = 2)
     else:
         repository, tag = name, "latest" if implicit_latest else None
 
-    return (repository, tag)
+    if not split_registry:
+        return (repository, tag)
+
+    # Default registry is Docker Hub, which has an older alias.  While
+    # registry-1.docker.io and registry.hub.docker.com are also working
+    # domains, they aren't treated by Docker as equivalent to the default.
+    default_registry = "docker.io"
+    default_aliases = {"index.docker.io"}
+
+    registry = default_registry
+
+    # Rules for determining if the first part is a registry domain or not.
+    # <https://github.com/distribution/distribution/blob/f7717b78/reference/normalize.go#L123-L140>
+    if "/" in repository:
+        parts = repository.split("/", 1)
+
+        if ("." in parts[0]
+        or ":" in parts[0]
+        or parts[0] == "localhost"
+        or parts[0] != parts[0].lower()):
+            registry, repository = parts
+
+            if registry in default_aliases:
+                registry = default_registry
+
+    return (registry, repository, tag)
 
 
 def glob_matcher(patterns: Sequence[str]) -> Callable[[Union[str, Path]], bool]:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
No open projects
Status: In Progress
Development

No branches or pull requests

1 participant