Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update (rewrite) eventing specs #25

Merged
merged 65 commits into from
Jul 22, 2021

Conversation

evankanderson
Copy link
Member

This builds on #24

Changes

  • Remove helper.md and sources.md, as it's not clear how much here needs to be standardized.
  • Combine remaining control-plane documents (broker.md, channel.md, and interfaces.md) into a single control-plane.md file, similar to serving.
  • All the changes from Update (rewrite) overview, motivation, and data plane #24
  • Introduce a bunch of MAY/MUST language in addition to SHOULD.

This should probably be considered a rewrite; there are a number of places where I either clarified or rewrote APIs (in particular, status and deadLetterSink, which were not handled consistently).

/kind cleanup api-change enhancement

Fixes knative/eventing#4595

Release Note

Rewrite eventing control plane and event routing spec.

Docs

This may need to search & replace a few links in knative/docs. Assuming that the update looks reasonable, I'll add a PR here to do that in the next day or two.

@knative-prow-robot knative-prow-robot added kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/enhancement needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels May 9, 2021
@google-cla google-cla bot added the cla: yes Indicates the PR's author has signed the CLA. label May 9, 2021
@knative-prow-robot knative-prow-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label May 9, 2021
@evankanderson
Copy link
Member Author

/assign @vaikas @n3wscott @lionelvillard

Copy link
Contributor

@vaikas vaikas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!!

specs/eventing/control-plane.md Outdated Show resolved Hide resolved
specs/eventing/control-plane.md Outdated Show resolved Hide resolved
specs/eventing/control-plane.md Outdated Show resolved Hide resolved
specs/eventing/control-plane.md Outdated Show resolved Hide resolved
the `spec.subscriber.ref` field points to a resource which does not exist or
cannot be resolved via [Addressable resolution](#addressable-resolution), the
Trigger MUST set the `Ready` condition to `false`, and at least one condition
should indicate the reason for the error.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just jotting this down so I don't forget, but we should also mention about the Dependency Annotation errors.
Specifying the annotation: knative.dev/dependency

This never made it to spec :(

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you point to either code or doc for this?

I'm guessing this is knative/eventing#4486 ?

How critical is it to have this in the spec, vs it being a documented addition by the OSS software? i.e. is this a foundational part of the interface, or a make-things-easier for the current implementation?

[metav1.ObjectMeta](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.21/#objectmeta-v1-meta)
resource.

### Spec:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's also the dependency annotation that I think I mentioned in the other PR or somewhere else here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

knative/eventing#4486

(which is still open)

If it's important, I'm willing to enforce it on all specs.

specs/eventing/control-plane.md Show resolved Hide resolved
## duckv1.Destination

Destination is used to indicate the destination for event delivery. This is an
exclusive union; excatly one field MUST be set.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, typo: excatly => exactly

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice eyes!

## duckv1.Destination

Destination is used to indicate the destination for event delivery. This is an
exclusive union; excatly one field MUST be set.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, thanks!

<tr>
<td><code>attributes</code></td>
<td>map[string]string</td>
<td>Event filter using exact match on event context attributes. Each key in the map is compared with the equivalent key in the event context. An event passes the filter if the event attribute values are equal to the specified values.</td>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there's an empty string, it means match like '*'
knative/eventing#4847

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, done.


- Knative Eventing aims to enable highly-reliable event processing workflows. As
such, it prefers duplicate delivery to discarded events. The CloudEvents spec
does not take a stance here.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found this a bit too restrictive. In particular, the IMC-based broker offers no such guarantees.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does prefer them :) They are just not persisted anywhere, but point taken :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The in-memory-controller SHOULD do the best it can to prefer duplicate delivery to discarded events. This means that it should implement retry, dead-letter, etc to the best of its ability.

Of course, its durability is limited by the backing storage, but that's true of other platforms like Kafka, too (i.e. if you run Kafka with default.replication.factor=1 and lose a drive, there's also data loss).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And it's even harder to have guarantees if you have to worry about flooding, meteor strikes, etc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are use cases for at-most-once delivery guarantee, which prefer losing events to duplicate delivery.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are such use cases, but Knative eventing doesn't particularly design for that use case, which is different.


Event recipients MUST use the HTTP response code to indicate acceptance of an
event. The recipient MUST NOT return a response accepting the event until it has
handled event (processed the event or stored in stable storage). The following
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know how this can be enforced and IMO it's too restrictive.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it might be hard to prove that an implementation was playing fast and loose. I can switch this to SHOULD if you'd prefer.

I think the objection about IMC is a red herring; there are a lot of other decisions (retry, timeout, etc) which could have a higher impact on failure rate than the pod lifetime, depending on the environment.

specs/eventing/data-plane.md Outdated Show resolved Hide resolved
specs/eventing/data-plane.md Show resolved Hide resolved
specs/eventing/data-plane.md Show resolved Hide resolved
specs/eventing/control-plane.md Outdated Show resolved Hide resolved
specs/eventing/control-plane.md Outdated Show resolved Hide resolved
specs/eventing/control-plane.md Outdated Show resolved Hide resolved
specs/eventing/control-plane.md Outdated Show resolved Hide resolved
specs/eventing/control-plane.md Outdated Show resolved Hide resolved
1. Attempt delivery to the `status.subscriberUri` URL following the
[data plane contract](./data-plane.md).

1. If the event delivery fails with a retriable error, it SHOULD be retried

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MUST? Should this SHOULD be a MUST?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. I'm happy to promote this to a MUST, but I wasn't sure how e.g. crashing in the middle of a delivery attempt might affect these parameters.

specs/eventing/control-plane.md Show resolved Hide resolved
@evankanderson
Copy link
Member Author

I just discovered that I (somehow) flubbed duckv1.KReference; it had somehow had an address field instead of name. More importantly, it turns out a KReference does include a namespace field, so I'll go back through the comments where I asserted that there weren't cross-namespace references and correct and reopen those comments.

@evankanderson
Copy link
Member Author

I updated the language in control-plane.md to highlight that these requirements were for server-side implementation.

@aslom
Copy link
Member

aslom commented Jul 22, 2021

I opened issue #28 about supporting future upgrades, version mixing and version detection.

It seems it is not supported today? What happens in future when Knative is upgraded and event processor were built to support 1.0 (sources, cloud apps that act sinks etc.) - can Knative adapt to mixed 1.0 and newer event processors?

@evankanderson
Copy link
Member Author

I owe cleaning up (per KTC notes):

Outstanding big discussions

  • Subscription.reply magic
  • Last column of detailed resource tables
  • Generation… fields

@duglin
Copy link
Contributor

duglin commented Jul 22, 2021

On the TM call today we agreed to merge the PR (once Evan gives me the "ok") and then work on the outstanding items in follow-on PRs.

Evan (speaking for Brenda) gave VMware's approval
Spencer gave Google's approval
Doug gave IBM's approval

@duglin
Copy link
Contributor

duglin commented Jul 22, 2021

HUGE thanks to @evankanderson for all his work and patience in this process!!!!

@evankanderson
Copy link
Member Author

I opened issue #28 about supporting future upgrades, version mixing and version detection.

It seems it is not supported today? What happens in future when Knative is upgraded and event processor were built to support 1.0 (sources, cloud apps that act sinks etc.) - can Knative adapt to mixed 1.0 and newer event processors?

I think the 1.1 spec needs to take 1.0 compatibility into account; I don't think there's anything we can do today to ensure that any 1.1 would be compatible with this 1.0.

@evankanderson
Copy link
Member Author

I updated:

  • Last column of detailed resource tables
    So I have "Remove Subscription.reply magic" and "Document generation/observedGeneration" left for separate PRs.

@duglin
Copy link
Contributor

duglin commented Jul 22, 2021

/lgtm
/approve

@knative-prow-robot knative-prow-robot added the lgtm Indicates that a PR is ready to be merged. label Jul 22, 2021
@knative-prow-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: duglin, evankanderson

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [duglin,evankanderson]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@knative-prow-robot knative-prow-robot merged commit c348f50 into knative:main Jul 22, 2021
Copy link

@devguyio devguyio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know I am late to the party and this looks like a very late review so I'm gonna leave my comments fwiw. Late better than never. I only reached 1/4th of the control plane. Will keep adding my comments in batches.

example_ of the minimal profile rather than a requirement. This Role is
sufficient to develop, deploy, and manage event routing for an application
within a single namespace. Knative Conformance tests against "MUST", "MUST NOT",
"SHALL", "SHALL NOT", and "REQUIRED" conditions are expected to pass when using

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does expected here mean? is it a "SHOULD" ? Would this affect conformance? i.e. if the RBAC profile needed is different or stricter, would it still be compliant with this spec? specially given that it states above that this is an example

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kubernetes RBAC is not a requirement for Knative Eventing. The conformance requirement is that there is a cluster and a user set up (with a KUBECONFIG file for auth) which the tests execute aginst.

It would be possible (for example) to have a conformant cluster which implemented no access control, only resource validation. On the other hand, a cluster where Channel / Subscription are forbidden to all users can't pass the conformance tests and so isn't conformant, even if it's running code which might otherwise be conformant.

and MUST attempt to forward the received events for filtering to each associated
Trigger whose `Ready` condition is `true`. As described in the
[Trigger Lifecycle](#trigger-lifecycle) section, a Broker MAY forward events to
an associated Trigger destination which which does not currently have a `true`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
an associated Trigger destination which which does not currently have a `true`
an associated Trigger destination which does not currently have a `true`

`Ready` condition, including events received by the Broker before the Trigger
was created.

The annotation `eventing.knative.dev/broker.class` SHOULD be used to select a
Copy link

@devguyio devguyio Jul 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does that mean that a platform which ignores the annotation and forces users to use a specific Broker regardless of other installed Knative brokers is considered Knative compliant? Is there a reason this is not a MUST ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

In particular, there's been discussion at KTC about alternate implementations which might or might not use the OSS code. You can imagine an implementation which only offers a single Broker type and ignores the annotation.

It's not clear how, from a conformance-testing point of view, we'd enforce a MUST here.

`spec.delivery.deadLetterSink` as described in
[Destination resolution](#destination-resolution) before setting the `Ready`
condition to `true`. If any of the addressable fields fails resolution, the
Subscription MUST set the `Ready` condition to `false`, and at least one

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason here false is used while on 219:220 MUST NOT be true is used?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

219-220 says:

if the channel does not exist or its Ready condition is not true, the Subscription's Ready condition MUST NOT be true.

There are three values for a condition: true, false, and "" (unknown). If a Channel's Ready status is unknown, a Subscription's Ready status could also reasonably be unknown.

Here, we definitively know that there is a problem (resolution failed, i.e. it's not in progress anymore), so we should affirmatively set Ready to false, rather than allowing it to be unknown.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question, my the way -- this tri-state nature of conditions is subtle!

`status.physicalSubscription` URIs to the empty string if the corresponding
`spec` reference cannot be resolved.

At least one of `spec.subscriber` and `spec.reply` MUST be set; if only

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the behavior is equivalent to setting spec.subscriber

Does that mean that it's a MUST? i.e. the channel controller MUST append to the subscribers the one specified in reply? if so, is there a reason MUST is not used here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #30; Doug pointed out that this would be simpler to require spec.subscriber all the time, and Lionel / Ville agreed on Slack.

We can revisit in #30 if we think there's a good reason to allow subscriber to be unset and reply set instead.

Once created, the Subscription's `spec.channel` MUST NOT permit updates; to
change the `spec.channel`, the Subscription MUST be deleted and re-created. This
pattern is chosen to make it clear that changing the `spec.channel` is not an
atomic operation, as it might span multiple storage systems. Changes to

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I may ask, what do you mean by it might span multiple storage systems here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you change:

spec:
  channel:
    apiVersion: messaging.knative.dev/v1
    kind: Channel
    name: foo-kafka
  ...

to

spec:
  channel:
    apiVersion: messaging.knative.dev/v1
    kind: Channel
    name: foo-in-memory
  ...

Even if the same events are delivered to foo-kafka and foo-in-memory, changing the Subscription from one to the other may cause duplicate or missed events from the logical stream.

We could cut this justification, since there's nothing required in that block.

the Subscription or the Channel), the Subscription MUST only set the `Ready`
condition to `true` after the Channel has been configured to send all future
events to the Subscription's `spec.subscriber`. The Channel MAY send some events
to the Subscription before prior to the Subscription's `Ready` condition being

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to the Subscription before prior to the Subscription's Ready condition being

Typo, delete prior or before.

@evankanderson
Copy link
Member Author

I know I am late to the party and this looks like a very late review so I'm gonna leave my comments fwiw. Late better than never. I only reached 1/4th of the control plane. Will keep adding my comments in batches.

No problem, I'm happy to make fix-up PRs (or you can). This started as a clearnup, but I discovered that both the content (all SHOULD requirements) and the structure (mixed control and data plane requirements, schemas by reference).

I'm hopeful that all the PRs going forward can be smaller or more focused. 😁

Copy link

@devguyio devguyio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@evankanderson fwiw, I realized that I had these two questions pending and were not submitted.

Broker can retry event delivery beyond the duration of receiving the event).

For each event received by the Broker, the Broker MUST evaluate each associated
Trigger **exactly once** (where "associated" means a Trigger with a

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason behind this exactly once ? What about cases where the broker introduces some trigger filters optimization and/or caches the filters state? I can imagine for some cases the Broker would choose to not even evaluate any associated Trigger. Am I missing something?


### Event Delivery

Once a Trigger or Subscription has decided to deliver an event, it MUST do the

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason behind being explicit that:

  1. The trigger is the deciding component? Can't it be the broker itself in some designs based on the filters?
  2. That step 1. needs to take place each time a delivery decision is taken? For instance what if object's status are cached and not read for every event delivery?

odacremolbap pushed a commit to odacremolbap/specs that referenced this pull request Jun 19, 2022
Signed-off-by: Alfusainey Jallow <alf.jallow@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cla: yes Indicates the PR's author has signed the CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. kind/enhancement lgtm Indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Validate and update Specs for correctness
9 participants