-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Liveness probe failing with 400 #12462
Comments
Hi @Shashankft9 - could you share details of how /health/liveness and /health/readiness are implemented. In particular are these endpoints working if accessed directly, is there any delay before they start reporting success etc, and do they listen on the same port as other endpoints in the user container? Also do you have Istio mesh enabled in this environment? If you could share an (ideally minimal) image that reproduces the problem that would be super helpful too. Thanks! |
Hey,
and then these 2 handlers are just doing You can try reproducing this issue through this yaml:
couple of dumb questions, might not be related but:
thanks for helping and apologies for the noise if any. |
assuming the fact you have a separate healthz and readyz implies they both do something more interesting than return 200 in all cases, are you able to share the code behind them (or an outline of what they do)? If you replace them with an endpoint that does always return 200, does anything change? Also could you confirm whether Istio mesh is enabled or not? Thanks! The port should be automatically set by knative so you shouldn't need to specify one (so setting 0 will result in the correct port being added). The readiness and liveness probes are rewritten so that the readiness probe goes via the Queue Proxy (so that a single endpoint can report readiness for both, which is needed by some optimisations in Activator). |
oh sorry, I thought I mentioned about mesh in the previous message - no I dont have istio mesh enabled.
|
@julz I am able to reproduce it on minikube with a trivial service. Here is the probe log entries at the queue proxy side.
Code and service def here. |
The error I see is the following (used the original service and set --v=4 at the kubelet side):
It seems that the header value is not ok. Error comes from here.
causes this. wdyth @julz, just returning if |
interesting, thanks for digging in to this @skonto! What I don't quite follow is if I'm reading it right previously if ph was "" it would call next.ServeHTTP but next (the removed final parameter) was nil before the PR 🤔. Do you understand how/why that works? (Or what I'm missing 😅) |
right, I understand why it fails, I just don't think I quite follow why the old code worked in this case. Previously in this case ph was == to "" so we called next.ServeHTTP, but wasn't |
Yeah sorry misread, also deleted my comment not fast enough I guess. My understanding is that we dont need the handler next since we can stop at the healthcheck side of the drainer, it is a probe and can be handled there by just returning, the drainer will either return here or serve the next handler here. In the past though we didnt have the logic of the drainer and we directly passed the composedHandler in that probeHandler here. |
ahhh ok I think I see now: suspect this was broken by #12033 not #12318. As of #12033 we could end up in drainer with Do you want to PR the fix and I'll review? (or can do other way around, just let me know!) |
I am off for the rest of the week so feel free, have limited access :) |
cc @nader-ziada |
/triage accepted |
/assign |
What version of Knative?
v1.1.0
Expected Behavior
I have defined liveness and readiness probe on the ksvc, looks like this:
I am handling these paths in the go code. I was expecting that it should work out like any other pod.
Actual Behavior
It all works in v0.25 knative, but when I am testing this on the newer v1.1.0, the ksvc pod looks fine for first 5-10 seconds, but after that the liveness probe fails and then the readiness probe as well, with below events:
maybe the timing of the occurence has something to do with
PeriodSeconds
?I tried to compare the pod yaml of the ksvc deployed with v0.25 and v1.1.0, and I can see this bit missing in the pod with the v1.1.0:
Is this somehow related?
Steps to Reproduce the Problem
I first noticed this error while deploying functions using the knfunc cli, but then I was able to reproduce the issue with a sample ksvc with probe configuration added and deploying it in the latest release of knative.
The text was updated successfully, but these errors were encountered: