-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
user pkg drain for draining in queue proxy #12033
Conversation
ef48e91
to
214f8c6
Compare
Codecov Report
@@ Coverage Diff @@
## main #12033 +/- ##
==========================================
- Coverage 87.36% 87.30% -0.07%
==========================================
Files 196 195 -1
Lines 9611 9569 -42
==========================================
- Hits 8397 8354 -43
Misses 935 935
- Partials 279 280 +1
Continue to review full report at Codecov.
|
/retest |
f9ca3e1
to
f505e5a
Compare
seeing this flaker here as well #12064 |
/assign @julz @markusthoemmes |
will work on comments and update PR |
I think the PR makes a little more sense now if anyone has the time to take a look, thanks |
pkg/queue/health/health_state.go
Outdated
adminMux := http.NewServeMux() | ||
handler := state.DrainHandlerFunc() | ||
adminMux.HandleFunc(queue.RequestQueueDrainPath, func(w http.ResponseWriter, r *http.Request) { | ||
handler(w, r) | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pulling the admin server in here sorta conflates the responsibilities of the health.State
I think you can leave the admin server as it was in queue proxy's main and simply just call Drain
681b95f
to
ec3d057
Compare
I think this PR is ready now |
cmd/queue/main.go
Outdated
composedHandler = network.NewProbeHandler(composedHandler) | ||
// We might sometimes want to capture the probes/healthchecks in the request | ||
// logs. Hence we need to have RequestLogHandler to be the first one. | ||
if env.ServingEnableRequestLog { | ||
composedHandler = requestLogHandler(logger, composedHandler, env) | ||
} | ||
|
||
return pkgnet.NewServer(":"+env.QueueServingPort, composedHandler) | ||
drainer.Inner = composedHandler | ||
drainer.HealthCheck = composedHandler.ServeHTTP |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to convince myself this is correct. It seems like this makes the health check run the whole handler chain for a health check - is that what we want? I guess is works because the chain presumably still ends up in the health check path due to the header being there, but it seems a bit duplicative/unexpected that the health check be the whole handler chain? Wonder if we can have the health check be just the health check here
I think there may also be a mismatch between the queue behaviour and how pkg/drain acts. Specifically in the existing code when a knative prober header is passed (as opposed to the Kubernetes probe header) the queue only responds with queue.Name when the health check passes, whereas in drain serveKProbe always returns the queue.Name, even if the health check isn't passing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Specifically in the existing code when a knative prober header is passed (as opposed to the Kubernetes probe header) the queue only responds with queue.Name when the health check passes
The link here is actually the queue proxy health check code and not the kprobe logic.
KProbe logic is wired in prior earlier in the handler chain:
Lines 314 to 315 in 70c7fc0
composedHandler = health.ProbeHandler(healthState, probeContainer, tracingEnabled, composedHandler) | |
composedHandler = network.NewProbeHandler(composedHandler) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm, good point - I think you're right, but I also tbh kind of think this highlights that the flow gets a bit confusing with the whole chain being passed as both Inner and HealthCheck (at minimum I'm finding it quite difficult to follow now!). Example: IIUC right now we have network.NewProbeHandler in the composedHandler chain (as linked above) and also we handle kprobes in pkg/drain - I think they both do the same thing so it's probably OK, but isn't one of them redundant? (And same question wrt the non-kprobe health checks, is health.ProbeHandler ever being called other than via pkg/drain calling its HealthCheck field?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I brought that up #12033 (review) - because it was unclear to me why the drainer has kprobe logic when it exists in knative.dev/networking
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(And same question wrt the non-kprobe health checks, is health.ProbeHandler ever being called other than via pkg/drain calling its HealthCheck field?)
I would lump this into the figure out kprobe handling and then re-organize the handlers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the health.ProbeHandler
is being called by the drainer when it calls healthcheck here which now has this line io.WriteString(w, queue.Name)
the TestRequestLogs
would fail otherwsie
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
created an issue in the pkg repo knative/pkg#2324 to tackle the duplication, but will need to do that a separate PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC the above issue would help with the duplication of kprobe handling. Do we also have/need a plan to deal with the other health check being duplicated via both pkgdrain's HealthCheck field and also accessible by the regular chain? TBCH even now with the context in my head about what drainer does I don't completely follow which paths the health check might be called by (both? only via drainer.healthcheck?), so I'd really like us to have a plan at least to clear that up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@julz I've made a change to use a different handler for the health check that only has the necessary handlers, does that make more sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, this is starting to look pretty good to me - thanks! Left a couple extra comments where I think (if my understanding is right) we may be able to "complete the refactor" a bit by dropping the health check from the existing path too
cmd/queue/main.go
Outdated
@@ -311,15 +305,24 @@ func buildServer(ctx context.Context, env config, healthState *health.State, pro | |||
composedHandler = tracing.HTTPSpanMiddleware(composedHandler) | |||
} | |||
|
|||
composedHandler = health.ProbeHandler(healthState, probeContainer, tracingEnabled, composedHandler) | |||
composedHandler = health.ProbeHandler(probeContainer, tracingEnabled, composedHandler) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this still needed? (Is there a case where we need to do the health check that drainer doesn't intercept?). If not it'd be great if we could drop this so there's only one path the healthcheck can get called from
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when I try to remove this, I get failures in the tests, for example
stream.go:254: W 13:56:38.802 activator-669f74ccf4-lg59r [activator] [serving-tests/request-logs-fxrcfgjb-00001] Failed probing pods err=unexpected body: want "queue", got "Hello World! How about some tasty noodles?\n"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah interesting, looks like that's because the activator sometimes does the probing directly as an optimisation to try to detect readiness faster than kubernetes can, and when it does it passes Activator as the user agent but drainer only calls healthcheck if the UA is kube-probe. So that path will be directly hitting the health probe (not via drainer) and since this PR drops healthState from the main path I think that means it'll continue to see 200s while we're shutting down, which may not be what we want 🤔. Wonder if there's any nice way to get pkgdrain to answer activator's probes as well as kubelets.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just add a check in the pkg/drain.go to also check if activator probe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems reasonable to me, yeah, it's a bit knative specific, but it is in knative/pkg
so maybe that's fine :) If not I guess we could add a list of user agents we'll accept health checks from, but I'd go for the simple thing if no-one hates that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sgtm, will make a PR in pkg
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think activator specific logic should be in pkg/drain
- that package is used in eventing
as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not really activator specific logic, just checking if header has a specific value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cmd/queue/main.go
Outdated
var healthcheckHandler http.Handler = httpProxy | ||
healthcheckHandler = health.ProbeHandler(probeContainer, tracingEnabled, healthcheckHandler) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there might be no case where we'd want to delegate down to the proxy from the health check now (drain is doing the switching between the healthcheck/inner paths), so maybe this could become:
var healthcheckHandler http.Handler = httpProxy | |
healthcheckHandler = health.ProbeHandler(probeContainer, tracingEnabled, healthcheckHandler) | |
var healthcheckHandler healthcheckHandler = health.ProbeHandler(probeContainer, tracingEnabled) |
(just trying to make it so there's two independent paths, to the extent possible)
/hold |
b159476
to
0a4481b
Compare
/test pull-knative-serving-istio-stable-no-mesh |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good to me bar the typo! thanks for putting up with all the back and forth on this one :)
cmd/queue/main.go
Outdated
// Add Activator probe header to the drainer so it can handle probes directly dfrom activator | ||
drainer.HealthCheckUAPrefixes = []string{network.ActivatorUserAgent} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not very important, but I'm wondering why we do this here rather than when we create the drainer struct
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no reason, moved it to be set in struct creation similar to the QuietPeriod
cmd/queue/main.go
Outdated
} | ||
|
||
return pkgnet.NewServer(":"+env.QueueServingPort, composedHandler) | ||
drainer.Inner = composedHandler | ||
// Add Activator probe header to the drainer so it can handle probes directly dfrom activator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Add Activator probe header to the drainer so it can handle probes directly dfrom activator | |
// Add Activator probe header to the drainer so it can handle probes directly from activator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: julz, nader-ziada The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
if env.ServingEnableRequestLog { | ||
composedHandler = requestLogHandler(logger, composedHandler, env) | ||
healthcheckHandler = requestLogHandler(logger, healthcheckHandler, env) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
composedHandler needs to be wrapped with the requestLogHandler here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
*in addition to the healthcheckHandler
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
its required in both? the RequestLogTest
is passing without it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah we should - otherwise normal requests to the user container aren't logged
Fixes #10783
Proposed Changes
Release Note