Combining with JSON Schema specification for validation? #1052
I see that the JSON Schema being used to validate the JSON format is a draft-07 schema. Just in case folks aren't aware, the more recent draft 2020-12 (core, validation), which has also been adopted by OpenAPI 3.1, has the concept of extension vocabularies. If JSON Schema covers some but not all of your needs, extension vocabularies could be a way to bridge the gap. OpenAPI 3.1 has an extension vocabulary for this purpose. I mention OpenAPI both to bring up their use of an extension vocabulary, and to note that their adoption means that there will be a larger demand driving tooling support for 2020-12 than for any other JSON Schema draft since draft-04.
I found some change logs between draft-07 and 2020-12 that make it easier to see what has changed.
There's also a tool that's being developed to perform this transition.
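For context, a 2020-12 extension vocabulary is declared in a dialect meta-schema via the `$vocabulary` keyword; the sketch below shows the general shape (the custom vocabulary URI is hypothetical, the others are the standard 2020-12 vocabularies):

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://example.com/dialect/cloudevents",
  "$vocabulary": {
    "https://json-schema.org/draft/2020-12/vocab/core": true,
    "https://json-schema.org/draft/2020-12/vocab/applicator": true,
    "https://json-schema.org/draft/2020-12/vocab/validation": true,
    "https://example.com/vocab/cloudevents-extensions": false
  },
  "$dynamicAnchor": "meta"
}
```

A `true` value means implementations must understand that vocabulary to process schemas written against this dialect; `false` means they may ignore its keywords.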
I'd like to know more about what's being proposed. I would expect the … As an aside, I would prefer not to end up with anything in the 1.0 spec which is still draft - is there any notion of JSON Schema ever actually getting to "1.0"?
In regard to a schema for cloud events, the …
We (JSON Schema) have decided to split from our involvement with IETF, and are pursuing other publication methods. As such, we're exploring what that looks like. That said, draft 7 is widely used in production environments, and implementations are steadily starting to support draft 2020-12 (the latest). We're getting close to a "1.0," but we're not quite there yet. Even so, this shouldn't hinder adoption.
@jskeet You can read the discussion of our standards approach, and do note that we are still working with the IETF HTTPAPI working group to register the JSON Schema media types. It is the rest of the specification for which we are looking at different publication approaches.
What is the intent behind the …? I'm interested in a mechanism to distinguish between (1) validation of the payload, and (2) validation of the envelope. Another area where I can see this functionality being helpful is when certain CloudEvents extensions are required for a particular implementation. If this envelope validation needs to be implemented with an extension of CloudEvents, we lose a bit of interoperability.
Also, @jskeet I saw some comments that versioning was out of scope of CloudEvents, so please let me know if this falls into that category, but I'd like a way to distinguish between the version of the envelope (which is currently done nicely by `specversion`) and the version of the payload. I could potentially put this payload version into some kind of metadata context object that would exist in every schema, but then we need to enforce that it exists in every payload across every protocol, which gets complicated. So, it seems like that field would fit better in the envelope.
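To illustrate the distinction (all attribute values below are hypothetical): `specversion` versions the envelope, while the payload version could ride along in the `dataschema` URI:

```json
{
  "specversion": "1.0",
  "type": "com.example.person.updated",
  "source": "/registry/people",
  "id": "a1b2c3",
  "datacontenttype": "application/json",
  "dataschema": "https://example.com/schemas/person/v2",
  "data": { "firstName": "Ada" }
}
```

A consumer can then treat a change in `specversion` and a change in the payload schema version as independent events.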
@devinbost: We have some guidance on that in https://github.com/cloudevents/spec/blob/main/cloudevents/primer.md#versioning-of-cloudevents EDIT: Whoops - just seen that's the note you referred to.
@jskeet I went back through that doc more carefully and noticed some commentary about how the version could be included in the URI of the `dataschema` attribute.
@devinbost: I think it's really up to providers, to be honest. I'm not entirely sure what the request/proposal is here - and I may well not be the best person to comment on that request/proposal anyway. (It's also unclear whether the "we" in your messages is a general internet "we" or a specific organization with specific needs - if you could clarify that, it would be useful.)
Maybe we can create an extension attribute called `atterschema`.
We talked on the call about the idea of creating such an attribute. My personal opinion: this idea is very complex and is not worth it.
Can you elaborate on this? I'm unfamiliar with extensions with regard to CE.
@gregsdennis each intermediary MAY add or remove optional attributes because nothing prohibits it from doing so.
I think it's actually pretty simple to achieve. So let's say the schema uses `$dynamicRef`/`$dynamicAnchor`:

```json
// generic cloud event (unknown content)
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://cloudevents.io/schema/envelope",
  "$defs": {
    "data": {
      "$dynamicAnchor": "data",
      "not": true
    }
  },
  "type": "object",
  "properties": {
    "data": { "$dynamicRef": "#data" },
    ...
  }
}
```

```json
// cloud event for "person updated"
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://myserver.com/schema/person",
  "$defs": {
    "person": {
      "$dynamicAnchor": "data",
      "properties": {
        "firstName": { "type": "string" },
        ...
      }
    }
  },
  "allOf": [
    { "$ref": "https://cloudevents.io/schema/envelope" }
  ]
}
```

NOTE I changed the above from the blog post to put the …

Someone receiving a "person updated" event could validate the entire event (envelope and payload) using the person schema.

So let's say you're an intermediary, and you want to add an attribute to the data. Even if you wanted to add validation for that attribute (say, `myAttr`), you could:

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://myserver.com/schema/person",
  "$defs": {
    "person": {
      "$dynamicAnchor": "data",
      "properties": {
        "firstName": { "type": "string" },
        ...
      }
    }
  },
  "allOf": [
    { "$ref": "https://cloudevents.io/schema/envelope" },
    {
      "properties": {
        "data": {
          "properties": {
            "myAttr": { "type": "string" }
          }
        }
      }
    }
  ]
}
```

Each intermediary can add to the `allOf` in this way.

NOTE You should be aware that changing the schema in-flight should probably require a new …
@sasha-tkachev I apologize for missing the call. I'll be there at the next one.
This is exactly the scenario I'm concerned about. What we were observing is actually the opposite. Let me explain with the following scenarios.

SCENARIO 1: An event contract is changed by a producer upstream. Since there's no way to enforce validation on the producers, it triggers an entire cascade of breaks across multiple consumer flows. Consider this flow: …

SCENARIO 1.1: Producer P1 made a change in unversioned code or forgot to increment the version, so there was no way for consumers to know some events had a different contract. This is more of an issue on the producer side, so I won't say much here other than that an additional validation layer could have caught the mistake.

SCENARIO 1.2: Consumers didn't know that they needed to validate messages against the version in the URI, or didn't know how to parse the URI to extract the version to make a check. Only when their code starts throwing exceptions do they inspect incoming messages and notice there has been a version change. Not all companies have analytics that allow them to observe which downstream teams are consuming which versions of which upstream events, so it can be hard (or even impossible) for producers to know which teams they need to communicate with to ensure downstream teams can handle breaking changes. Someone could say that implementations should be obvious, but that's not always true.

SCENARIO 2: We also saw cases where a break occurred due to an intermediate dependency. For example, a change in P1 results in a change to the behavior of B1, but the exception is thrown farther down the flow in B3, in code owned by a different team. Now there's a communication problem as the owners of B3 struggle to find the root cause since -- as far as they're concerned -- the messages' contracts shouldn't have changed upstream, right? (These are the cases that have caused some of the most severe production outages, since it took significant time for teams to track down the cause.)
SCENARIO 3: P1 cuts a new minor version. Teams owning apps B, C, and D knew about a coming change but incorrectly didn't think they would be impacted. Whether or not they would be impacted is a concern that should be addressed through validation, like a JSON Schema, tied to the version. If a message with a new version validates successfully against the schema for the prior version, then consumers can trust that their implementations will still successfully process the message. So, we have a mechanism to check backward compatibility if we use JSON Schema to validate the envelope. The schema in this case can inform consumers whether a message is still valid, based on what their expectations are. (Keep in mind that consumers can have stronger validation via JSON Schema if needed based on features they're implementing, and standardizing on JSON Schema makes that validation easier to maintain in general.)

CONCRETE EXAMPLE 1: …

ADDITIONAL BUSINESS CASES: …
I'm sure I can think of more cases if I search my memory, but hopefully, this is a good start.
One other important case I forgot to mention: SCENARIO 4: P1 needs to cut a new version but wants to remain backward-compatible, so it starts emitting both new and old versions of events. Consumers need a way to filter (by version) to only the messages of interest and perform different validation depending on the version they're interested in, but let's assume they can do this by version information in the URI or type. One advantage of supporting JSON Schema validation of the envelope here is support for automatic upgrades: if a new version passes existing validation for the consumer in question, then the consumer can automatically switch to consuming the new event version.
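A minimal sketch of that consumer-side filtering, assuming (hypothetically) that the version is the last path segment of the `dataschema` URI; the handler names and URI convention are illustrative only:

```python
from urllib.parse import urlparse

def payload_version(dataschema: str) -> str:
    """Extract the version from a dataschema URI like
    https://example.com/schemas/person/v2 (hypothetical convention)."""
    return urlparse(dataschema).path.rstrip("/").rsplit("/", 1)[-1]

# Hypothetical per-version handlers the consumer has registered.
handlers = {
    "v1": lambda data: f"v1 handler: {data}",
    "v2": lambda data: f"v2 handler: {data}",
}

def dispatch(event: dict) -> str:
    """Route an event to the handler for its payload version."""
    version = payload_version(event["dataschema"])
    handler = handlers.get(version)
    if handler is None:
        raise ValueError(f"unsupported payload version: {version}")
    return handler(event["data"])

event = {
    "dataschema": "https://example.com/schemas/person/v2",
    "data": {"firstName": "Ada"},
}
print(payload_version(event["dataschema"]))  # v2
```

An automatic-upgrade policy would slot in where `handlers.get` fails: before raising, the consumer could validate the message against the schema of the newest version it does support.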
@gregsdennis So you are saying that the schema MAY validate only some of the attributes, correct? However, the explanation @devinbost gave is very detailed and I have changed my opinion on the usability of such an extension. In addition to the definition of the …
Hi @sasha-tkachev, regarding your comment:
What do you mean by "canonical attributes representation format"? I suppose there's an open question of how an implementation should interpret the URI provided in `atterschema`. JSON Schema seems to be the obvious choice for JSON messages, but due to a lack of standards for schema validation for non-JSON types, consumers may need more information to know how to interpret the `atterschema` URI if the event is non-JSON. Is that what your comment is about?

Also, I reviewed the recording from the last working group meeting. I can clarify some points raised in that meeting.

@jskeet I apologize if I was confusing/unclear at the start of this thread! Do my examples above (and in the comment below that one) help you understand my intent for this? I hope they make the intent clearer.

@JemDay @duglin @clemensv, regarding your concern that subsequent events may need to add attributes: if I'm understanding the question correctly, in that case I'd think that whatever app is responsible for adding those attributes should be updating the schema to ensure those attributes are supported, or at least they must ensure that the event they pass downstream can be validated according to the …
Thanks @devinbost for the detailed explanation of the use cases. I have the impression that this is not so much about a schema for a specific event type, though. To me the examples with URN formats or IDs being used look more like a constraint that is defined as an additional contract or convention in or between organizations. If we introduced a more general concept around this idea of constraints (others talk about contracts, conventions or characteristics), there could be pointers (URIs) to JSON Schema for those who prefer this. For others a constraint could just point to a GitHub page or a test description that explained what additional constraints were in place. Something like "our events always have trace IDs", or "…".

If you add this as an extension attribute to each event, there is the challenge that collections are not supported in CloudEvent attributes. I could also imagine having this kind of information only in the catalog/discovery service.
@deissnerk We talked about the implementation in the call two weeks ago. The constraint idea is interesting, but I think a simple schema for each attribute is enough.
@sasha-tkachev I think we have to discuss a bit more in the next call. Relying on the schema format from the discovery spec sounds good to me. If we specify the attribute in a way that it can also point to an actual event definition in the discovery service, we might both get what we want. Perhaps the notion of constraints/contracts/characteristics is something we can pick up there. An intermediary can then even enrich the discovery information if needed. Sorry for being a bit late to the discussion. I had to leave the call two weeks ago very early because of an unplanned, urgent matter.
@deissnerk @devinbost …
@sasha-tkachev (or anyone else) what's the status of this issue?
My proposal was rejected.
This issue is stale because it has been open for 30 days with no activity.
Is this one still under discussion or should we close it?
This issue is stale because it has been open for 30 days with no activity.
EDIT: For anyone first reading this, please see this comment that introduces the business case: #1052 (comment)
I don't know if this is the right place to start this discussion, but I've noticed that there is some overlap between the JSON side of CloudEvents and the features of json-schema specifically around validation. In large part, it seems like they were developed to solve different problems, but I'm wondering if I could get folks from both sides to start a discussion on the feasibility of allowing json-schema to be used within events that conform to the CloudEvents specification. If we could write bindings for CloudEvents and then use json-schema to perform validation, that would open doors for us. But, maybe there are other solutions in the industry that would solve this problem.
I'd like to hear some thoughts on this.