
Add port spec in probes #8288

Closed
ghost opened this issue Jun 10, 2020 · 33 comments · Fixed by #12225 or #12606
Labels
area/API API objects and controllers kind/feature Well-understood/specified features, ready for coding.

@ghost

ghost commented Jun 10, 2020

Describe the feature

Currently we can set host, httpHeaders, path, and scheme under HTTPGetAction. If possible, can we also add port? Something like this:

livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
        httpHeaders:
        - name: Custom-Header
          value: Awesome
      initialDelaySeconds: 3
      periodSeconds: 3
@ghost ghost added the kind/feature Well-understood/specified features, ready for coding. label Jun 10, 2020
@ghost ghost changed the title Add port in probes Add port spec in probes Jun 10, 2020
@JRBANCEL
Contributor

/area API
Why would the probe port be different than the inbound port?
See the runtime contract:

The core.v1.Container object allows specifying both a readinessProbe and a livenessProbe. If not provided, container startup and listening on the declared HTTP socket is considered sufficient to declare the container "ready" and "live" (see the probe definition below). If specified, liveness and readiness probes are REQUIRED to be of the httpGet or tcpSocket types, and MUST target the inbound container port; platform providers SHOULD disallow other probe methods.
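
For reference, a minimal sketch of a container spec that satisfies this contract (the image name and port number here are illustrative assumptions, not from this thread): the probe omits port, so it can only target the declared inbound port.

containers:
- image: example.com/app       # hypothetical image
  ports:
  - containerPort: 8080        # the single inbound port
  readinessProbe:
    httpGet:
      path: /healthz           # no port field; the probe targets the inbound port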

@knative-prow-robot knative-prow-robot added the area/API API objects and controllers label Jun 12, 2020
@anishpravin

livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
    httpHeaders:
    - name: Custom-Header
      value: Awesome
  initialDelaySeconds: 3
  periodSeconds: 3

As the above is the default Kubernetes liveness-probe setting, I am looking to have exactly the same thing in Knative. Is that possible? @JRBANCEL

@emaildanwilson
Contributor

How is this supposed to work with multi-container if only one container can specify a port?

Without being able to specify a port in the probe configuration, it seems the only current option is to perform no readiness/liveness probes on containers that are not directly serving traffic.

Am I missing some other way to define this?

@JRBANCEL
Contributor

JRBANCEL commented Aug 6, 2020

The container serving traffic could probe the other containers.
This doesn't seem right, though.
@markusthoemmes do you have any thoughts on this issue regarding multi-container?

@mattmoor
Member

mattmoor commented Aug 6, 2020

We rewrite the port for network probes of the user container regardless to test the full network path (and ensure the queue-proxy is up). cc @tcnghia

Let's open something else for multi-container.
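
To illustrate (a hedged sketch; the mechanism described in the comments is a simplification, not the authoritative implementation):

# What the user writes on the Service (no port, per the validation at the time):
readinessProbe:
  httpGet:
    path: /healthz

# Conceptually, in the deployed Pod the probe is rewritten so that the
# kubelet probes the queue-proxy, and the queue-proxy in turn probes the
# user container on its declared containerPort; a successful probe thus
# exercises the full network path and confirms the queue-proxy is up.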

@github-actions

github-actions bot commented Nov 5, 2020

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 5, 2020
@github-actions github-actions bot closed this as completed Dec 5, 2020
@evankanderson
Member

/reopen

I'm going to necro this based on some feedback from the Convention Service team at Tanzu.

It's not clear why the port parameter on httpGet and tcpSocket probes is forbidden; for good or ill, these fields are separate from the ports specification of the container:

Today you have to write:

ports:
- containerPort: 8080
  protocol: TCP
readinessProbe:
  httpGet:
    path: /actuator/health/readiness
    port: 0
    scheme: HTTP
  successThreshold: 1

And the following produces an error:

ports:
- containerPort: 8080
  protocol: TCP
readinessProbe:
  httpGet:
    path: /actuator/health/readiness
    port: 8080
    scheme: HTTP
  successThreshold: 1

Furthermore, it's not clear why a user-container might not want to open two separate HTTP ports -- one for healthcheck probing, and one for the data path. That might look like:

ports:
- containerPort: 8080
  protocol: TCP
readinessProbe:
  httpGet:
    path: /actuator/health/readiness
    port: 8081
    scheme: HTTP
  successThreshold: 1

In this way, the queue-proxy can probe the /actuator/health/readiness endpoint but an end user cannot, which can simplify or remove the need for application control permissions in some cases.

/assign @dprotaso @julz

To tell me if my reasoning is good or bad.

Strangely, the Knative API spec doesn't mention the port field at all in the schema, which means that this would technically be an API extension by the OSS implementation (which is fine and allowed).

@knative-prow-robot
Contributor

@evankanderson: Reopened this issue.


@evankanderson
Member

CC @scothis, because he'd complained about it before as well.

@evankanderson
Member

/remove-lifecycle stale

@knative-prow-robot knative-prow-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 5, 2021
@dprotaso dprotaso added this to the Ice Box milestone Oct 25, 2021
evankanderson pushed a commit to evankanderson/serving that referenced this issue Nov 4, 2021
knative-prow-robot pushed a commit that referenced this issue Nov 8, 2021
…12225)

* Allow probes to explicitly set the port to the containerPort (#8288)

* Address julz feedback
knative-prow-robot pushed a commit to knative-prow-robot/serving that referenced this issue Nov 12, 2021
knative-prow-robot added a commit that referenced this issue Nov 12, 2021
…rPort (#8288) (#12270)

* Allow probes to explicitly set the port to the containerPort (#8288)

* Address julz feedback

Co-authored-by: Evan Anderson <evana@vmware.com>
@dprotaso dprotaso modified the milestones: Icebox, v1.1.0 Jan 4, 2022
@odinnordico

I have a special use case related to this: I need to run health checks on a different port than the serving port, but per the validation added by #12225, probes are only allowed on the same port as serving.

@evankanderson
Member

/reopen
/good-first-issue

@evankanderson
Member

The notion here seems to be that you don't want to expose (for example) the health or liveness probes to external application consumers, but you do want to be able to have that communication channel between the infrastructure and the application. The way you would express that would look something like (showing only PodSpec fields):

spec:
  containers:
  - image: foo
    ports:
    - containerPort: 8080
    readinessProbe:
      httpGet:
        path: /health
        port: 5000

With the intent that GET /health on the main server port might be an actual application request or a 404, but that the request on port 5000 (only available to infrastructure) would be a health check.

@mattmoor
Member

mattmoor commented Jan 6, 2022

If the user specifies a probe on a separate port, then I think a tcp probe on the main port (via the QP) should still be defaulted. For the portless option, we rewrite it to test the QP path.

@evankanderson
Member

What if it is a tcpProbe on a separate port?

I think the PodSpec only allows a single readiness / liveness probe, though we could obviously manage things differently in the queue-proxy.

@mattmoor
Member

mattmoor commented Jan 6, 2022

Yes, that's exactly what I mean. I think the single probe thing is from our validation, not K8s. Worth confirming, and probably relaxing.

(on a completely unrelated note, I'm very excited for the GRPC probing stuff 🤩 )

@odinnordico

What would happen if the probes are in a management context of the application, different from the main (serving) context, and each context is exposed on a different port? Besides the probes, the management context has other app settings, so the port should be available for operators to change those management settings. I would need at least two containerPorts: one for app serving and the other for probes and operator management.

@mattmoor
Member

mattmoor commented Jan 6, 2022

Right, that is why I think we should relax things (assuming K8s allows it, which seems likely given it is a list).

@evankanderson
Member

@mattmoor / @odinnordico

Container has LivenessProbe, ReadinessProbe and StartupProbe fields, which are all of type *Probe (which means only a single probe). The Probe type embeds a ProbeHandler, which is intended to be a discriminated union (only one value set), which means that each Container should have zero or one probes of any type.

This means that we can't add a tcpSocket probe on the serving port if the container defines another health check.

However, we could have the queue-proxy fail its readiness probes if the serving port isn't opened in a reasonable time.

@dprotaso dprotaso modified the milestones: v1.1.0, Icebox Jan 11, 2022
@mattmoor
Member

I think I was (incorrectly) remembering it as a list field (like ports!), so my bad.

@dprotaso
Member

dprotaso commented Jan 11, 2022

FYI - there's been some queue-proxy probe related changes since this issue was originally filed: #11956

I added this to the Icebox for me to revisit

@odinnordico

Well, my intent, going basic, is to have multiple ports in the serving container; with that I can address my probes/management scenarios.

@evankanderson
Member

Do you actually want to serve end-user traffic on multiple ports, or simply bind() and listen() on those ports?

Knative doesn't prevent the second, but feels pretty strongly that a pod has exactly one port which receives outside traffic for the purposes of scaling and concurrency (the first case).

@scothis
Contributor

scothis commented Jan 13, 2022

Do you actually want to serve end-user traffic on multiple ports

No, only the primary port should be exposed to ingress traffic and autoscaled for requests. The second port is for management actions that will connect to the container via other means (like a metrics endpoint). The particular use case is for a Spring Boot workload, which hosts the liveness and readiness endpoints with the other management endpoints that we would rather not expose to ingress. Adding an additional entry in the ports list is useful for avoiding conflicts, but not strictly required.
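
For concreteness, a hedged sketch of that Spring Boot shape (assuming the application sets management.server.port to 8081; all port numbers are illustrative):

ports:
- containerPort: 8080                # primary port: exposed to ingress, autoscaled
readinessProbe:
  httpGet:
    path: /actuator/health/readiness
    port: 8081                       # management port: probes and metrics only
livenessProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8081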

@odinnordico

Wanted to clarify the outcomes we need to realize here.
We need to address security concerns...

The final pod will have:

  1. a serving port for web traffic that is/can be exposed to ingress traffic
  2. a separate port for management actions, to which ingress traffic can be blocked. Traffic to that port needs to go all the way to the service container.

@evankanderson
Member

For point 2, I think that you're asking that traffic to the second port is only exposed within the Pod, but if you're asking something different (for example, to directly address individual Pods from another node on the cluster), then loosening the probe restrictions won't be sufficient.

The reason that supporting probes on a different TCP port is more difficult than you might expect is that Knative actually rewrites the TCP and HTTP probes to be executed by the queue-proxy, which allows us to probe for container readiness at a faster rate than allowed by the kubelet (which has a minimum 1s period). During cold-start, the activator eagerly probes the queue-proxy, and the queue-proxy eagerly probes the user container, allowing the request to be delivered to application code before Kubernetes has recognized that the Pod is Ready. Unfortunately, since we've written a bunch of code there, it's quite possible that changing those assumptions will introduce new bugs, as in #12462 .

#8471 has one use case for exposing multiple ports on the container (Prometheus metrics scraping), but "a container that provides service on multiple ports" is a larger API change to Knative and will probably need a bit more thought. For Prometheus, it turned out that an annotation worked in place of specific containerPorts, but I'm not sure if I'm parsing request 2 correctly.

@odinnordico

Yes, the traffic on the 2nd port would stay within the pod/deployment; it does not need to be opened outside, since the second port is for management matters. What we wanted to say in 2 is: if, after the deployment, an operator wants to do some management of the app, they can connect to the pod on that port and the request would go to the app container (it can be through the sidecar).

@evankanderson
Member

I'm assuming that you don't mean a human operator ("he" or "she") would be running in some sort of sidecar (I'm not sure if you mean queue-proxy or something else when you say "through the sidecar"). Given that the pods may be deleted at any moment and replaced with other pods, I'm guessing that you mean "some sort of control system elsewhere in the cluster would be periodically connecting to the Pods as they are created and deleted, and hopefully the Pods would work properly before the management happens (so that requests sent to them would not result in errors)". That suggests that you do need the ports to be reachable outside the Pod, but not outside the cluster ("do not need to be opened outside" is vague -- outside of what?).

Again, a bit more specifics will help. I'm guessing that you may be referring to Actuator endpoints for Spring Boot as the application functionality you want exposed on a different port?

@odinnordico

Yes @evankanderson, you are right, and yes, in general all this is about the Actuator endpoints that you linked. By the sidecar I mean the queue-proxy. The management can actually be done by a control system in the cluster, as you mentioned, or by a human operator making a specific change during an emergency debug, for example, regardless of whether the pod is being deleted or not. The actuator endpoints in the pod should then be accessible, as you said, inside the cluster only, and they are going to be exposed on a different port than the normal HTTP traffic.

@evankanderson
Member

Is it expected that interacting with the management port will change the state of the running application, or is it for observability (including debugging) only?

I'm concerned about the pattern of "start a Pod, need some sort of poke on a management port before it can be Ready / serving correctly" having bad interactions with Knative's lifecycle management. (As one example, if the management service doesn't get to a Pod in time during Revision first-start, the entire Revision might get abandoned as "failed".)

@odinnordico

It is expected to be for observability only; the interaction with management is not expected to change the app state in a way that affects the lifecycle.

@evankanderson
Member

Whoops, just realized that we should continue this discussion in #8471 in terms of supporting containers providing off-Pod endpoints on multiple ports.

The end resolution here is that we should loosen the probing restrictions to allow probing a different port than the serving port for health checks, with the acknowledged risk that the main serving port may actually be malfunctioning while the health-check port is working correctly.
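
With the restriction loosened, a spec like this hedged sketch (names and ports are illustrative assumptions) should validate, keeping the declared containerPort for traffic while probing a separate port:

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: example                  # hypothetical name
spec:
  template:
    spec:
      containers:
      - image: example.com/app   # hypothetical image
        ports:
        - containerPort: 8080    # serving port: receives ingress traffic
        readinessProbe:
          httpGet:
            path: /healthz
            port: 8081           # health-check port: probed, not exposed to ingress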

@dprotaso dprotaso modified the milestones: Icebox, v1.3.0 Feb 17, 2022
@xtreme-sameer-vohra xtreme-sameer-vohra changed the title Add port spec in probes Add port spec in probes (PR https://github.com/knative/serving/pull/12606) Feb 22, 2022
@xtreme-sameer-vohra xtreme-sameer-vohra changed the title Add port spec in probes (PR https://github.com/knative/serving/pull/12606) Add port spec in probes Feb 22, 2022