Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cached entries are not invalidated on renewal #50

Closed
NetForce1 opened this issue Mar 19, 2024 · 8 comments
Closed

Cached entries are not invalidated on renewal #50

NetForce1 opened this issue Mar 19, 2024 · 8 comments

Comments

@NetForce1
Copy link

NetForce1 commented Mar 19, 2024

Describe the bug

When a certificate is renewed, the cached entries are not invalidated, so the old certificate is still used. And, in the case where a domain name is added to the server block, that old certificate is also used for the new domain name.

To reproduce

Steps to reproduce the behavior:

  1. Have a certificate issued and cached
  2. Add a second domain name to the server block (and $njs_acme_server_names)
  3. Reload (not restart) nginx
  4. Invoke the auto renew endpoint
  5. Visit the newly added domain
  6. The certificate used does not contain the new domain name

Expected behavior

The newly issued certificate should be used immediately for both domains.

Your environment

@zsteinkamp
Copy link
Contributor

Thanks @NetForce1 for the report. I'm looking into this now. Could you post your nginx.conf so I could have a look? If you are using env vars instead of nginx vars to configure the package, could you include those as well?

@NetForce1
Copy link
Author

NetForce1 commented Mar 19, 2024

I justed tested with the container setup in this repository.

This is my reproduction:

  1. Add proxy1.nginx.com as an extra alias to the nginx service in docker-compose.yml
  2. Add an extra location to nginx.conf to force the auto mode (using js_content acme.clientAutoModeHTTP;
  3. Started the dev container
  4. Call the auto-endpoint, or wait a minute for the js_periodic to kick in
  5. Retrieve https://proxy.nginx.com:8443
  6. Add proxy1.nginx.com to $njs_acme_server_names in nginx.conf
  7. Run bash in the nginx container (docker compose exec nginx /bin/bash)
  8. Reload nginx (nginx -s reload)
  9. Call the auto-endpoint, or wait a minute for the js_periodic to kick in
  10. Observe that a new certificate is created (and written to /etc/nginx/njs-acme/proxy.nginx.com.crt) that includes proxy1.nginx.com
  11. Retrieve https://proxy1.nginx.com:8443 using openssl s_client
  12. Decode the returned certificate and observe that it does not include proxy1.nginx.com

I also don't see any invalidation in the code. I would expect it to happen in clientAutoModeInternal in index.ts after the certificate is succesfully written to disk (just before the final return maybe). The only reference to ngx.shared is in readCertOrKey where the items are read from and written to the cache.

edit:
Or, instead of adding proxy1.nginx.com you could remove proxy2.nginx.com from $njs_acme_server_names, and re-add it later.

@zsteinkamp
Copy link
Contributor

zsteinkamp commented Mar 19, 2024

Ahh ok. Thanks for the additional clarification. I was recreating the container (docker compose up -d --force-recreate nginx) after changing the config. Will try to repro here soon, and I agree with your intuition around the lack of updating the js_shared_dict_zone. Will report back. Thanks!

@NetForce1
Copy link
Author

Adding this at the end of clientAutoModeInternal fixes it:

  const zone = acmeZoneName(r)
  const cache = zone && ngx.shared && ngx.shared[zone]
  if (cache) {
    cache.delete('acme:' + pkeyPath)
    cache.delete('acme:' + certPath)
  }

(and import acmeZoneName of course.

But, I'm not sure this is the cleanest approach, and I also don't know much about how concurrency works in njs. Maybe this can cause a situation where an old certificate is used with a new private key or vice versa.

@NetForce1
Copy link
Author

NetForce1 commented Mar 19, 2024

We could just clear the cache, but that needs njs 0.8.3: nginx/njs#690.

edit:
Hm, it looks like clear has another issue. In my tests it hangs when called on an empty dict.

@zsteinkamp
Copy link
Contributor

Thanks @NetForce1 - This should be fixed with the merge above. I have another PR that is adjacent to this, so you may want to wait for that to be merged before taking the time to update your system. We will do a version bump on njs-acme after that's merged.

@NetForce1
Copy link
Author

Awesome, thanks!

@zsteinkamp
Copy link
Contributor

@NetForce1 could you email me at z.steinkamp@f5.com? We have some thanks we'd like to send to you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants