Create a command line utility (probably Python) to calculate the SAIDs in an OCA Bundle #146

Open
swcurran opened this issue Oct 18, 2024 · 6 comments

Comments

@swcurran
Contributor

Background: The OCA Bundles that we generate and use are not fully compliant with OCA because we don't always maintain the digest attribute in all of the overlays. The branding credential never gets a digest, and if we manually update the OCA Bundle (vs. using the Excel), the digests don't get updated. The task is to write a command line utility that, given an OCA Bundle, (re)calculates all of the digests in the bundle. An option on the command will verify (but not update) the digests in an OCA Bundle file.

An OCA digest is populated through the calculation of a SAID (Self Addressing IDentifier). A SAID is generated by hashing the content of the object (in this case, a JSON object) in a formally specified way, and then inserting the resulting identifier into the object itself. That allows a verifier who finds a reference to a SAID to recalculate it (e.g., hash the same data again in the same way) and make sure that the content has not changed. More background on SAIDs can be found here: https://kentbull.com/2024/09/22/keri-series-understanding-self-addressing-identifiers-said/ by @kentbull. For the purpose of this task, the most important part of the document is the section on the seven steps to creating a SAID.
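As a rough illustration of the idea, here is a minimal Python sketch of a SAID calculation: fill the field with a placeholder, serialize, hash, pad, Base64URL-encode, and swap the leading character for a derivation code. Several things here are assumptions rather than anything confirmed in this issue: the function names `compute_said` and `saidify` are made up, the serialization is compact JSON in insertion order, and SHA2-256 (CESR code `I`) stands in for the hash so the sketch runs with the standard library only. Bundles whose digests start with `E` would need Blake3-256 from a third-party package instead, and the blog post remains the reference for the exact steps.

```python
import base64
import hashlib
import json


def compute_said(obj, field="digest", code="I", hash_fn=hashlib.sha256):
    """Return the SAID of `obj` without modifying it (illustrative sketch only)."""
    digest_size = hash_fn(b"").digest_size      # 32 bytes for SHA2-256
    pad = (3 - digest_size % 3) % 3             # bytes needed to reach a 24-bit boundary
    assert len(code) == pad, "sketch assumes the code length equals the pad size"
    said_len = (digest_size + pad) * 4 // 3     # 44 characters for a 32-byte digest

    # Fill the SAID field with a placeholder of the final length.
    draft = dict(obj)                           # keeps the field's existing position
    draft[field] = "#" * said_len

    # Serialize compactly, in insertion order (an assumption of this sketch).
    raw = json.dumps(draft, separators=(",", ":"), ensure_ascii=False).encode("utf-8")

    # Hash, prepend zero pad bytes, and Base64URL-encode.
    padded = b"\x00" * pad + hash_fn(raw).digest()
    b64 = base64.urlsafe_b64encode(padded).decode("ascii")

    # The zero padding shows up as a leading "A"; replace it with the code.
    return code + b64[len(code):]


def saidify(obj, field="digest", **kwargs):
    """Return a copy of `obj` with its SAID written into `field`."""
    out = dict(obj)
    out[field] = compute_said(obj, field=field, **kwargs)
    return out
```

Because the field is overwritten with the placeholder before hashing, recalculating the SAID of an already stamped object gives the same value, which is what makes verification possible.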

This task is to write a command line utility that implements (or borrows) the "seven steps" described in the blog post to generate a SAID, and then use that to process an OCA File. Given an OCA File (such as this), the steps of the process are:

  1. Iterate over the outer array. Typically with Aries OCA Bundles, we only have one OCA Bundle in the array, but assume there could be multiple. For each:
    1. Calculate the SAID for the capture_base object, and update its digest property with the SAID value.
    2. Iterate through the list of overlays, and for each:
      1. Update the value of the capture_base property with the SAID calculated for the capture_base object.
      2. Calculate the SAID for the Overlay and update its digest property with the SAID value.
    3. Find, or if not present add, a property d to the root of the OCABundle (beside capture_base) that will be a SAID, generate its value over the entire bundle and set its value to be the calculated SAID.
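The steps above could be sketched in Python roughly as follows. This is only an outline under stated assumptions: `compute_said`, `update_bundle` and `update_bundles` are hypothetical names, the SAID helper is a simplified stand-in (SHA2-256 with CESR code `I`, compact JSON in insertion order) so the block runs on its own, and `overlays` is assumed to be a JSON array as described in the list.

```python
import base64
import hashlib
import json


def compute_said(obj, field="digest", code="I"):
    """Simplified stand-in for the SAID calculation (assumption: SHA2-256)."""
    draft = dict(obj)
    draft[field] = "#" * 44                      # placeholder of the final SAID length
    raw = json.dumps(draft, separators=(",", ":"), ensure_ascii=False).encode("utf-8")
    b64 = base64.urlsafe_b64encode(b"\x00" + hashlib.sha256(raw).digest()).decode("ascii")
    return code + b64[1:]


def update_bundle(bundle):
    """Recalculate every SAID in one OCA Bundle, in place."""
    # Step 1: the capture_base SAID.
    capture_base = bundle["capture_base"]
    capture_base["digest"] = compute_said(capture_base)

    # Step 2: each overlay points at the capture base, then gets its own SAID.
    for overlay in bundle.get("overlays", []):
        overlay["capture_base"] = capture_base["digest"]
        overlay["digest"] = compute_said(overlay)

    # Step 3: the bundle-level "d" SAID, added at the end if it is missing.
    bundle["d"] = compute_said(bundle, field="d")
    return bundle


def update_bundles(bundles):
    """Process the outer array, which may hold more than one bundle."""
    return [update_bundle(bundle) for bundle in bundles]
```

The order matters: the capture base SAID has to be final before the overlays are hashed, and the bundle-level `d` has to be computed last because it covers everything else.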

Add an option to verify the SAIDs of an OCA Bundle. Given an OCA Bundle, iterate through the Bundle and make sure that all of the SAIDs are accurate. Allow for the OCA Bundle level d SAID to NOT be present -- don't error if it is missing.
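A verify pass might look something like the sketch below, which recomputes each SAID and collects mismatches instead of updating anything. As before, the names (`compute_said`, `verify_bundle`) and the simplified SHA2-256 based SAID helper are assumptions made so the example is self-contained; they are not taken from any existing tool.

```python
import base64
import hashlib
import json


def compute_said(obj, field="digest", code="I"):
    """Simplified stand-in for the SAID calculation (assumption: SHA2-256)."""
    draft = dict(obj)
    draft[field] = "#" * 44                      # placeholder of the final SAID length
    raw = json.dumps(draft, separators=(",", ":"), ensure_ascii=False).encode("utf-8")
    b64 = base64.urlsafe_b64encode(b"\x00" + hashlib.sha256(raw).digest()).decode("ascii")
    return code + b64[1:]


def verify_bundle(bundle):
    """Return a list of problems found; an empty list means the bundle verifies."""
    errors = []

    capture_base = bundle.get("capture_base", {})
    expected_cb = compute_said(capture_base)
    if capture_base.get("digest") != expected_cb:
        errors.append("capture_base: digest does not match its content")

    for index, overlay in enumerate(bundle.get("overlays", [])):
        if overlay.get("capture_base") != expected_cb:
            errors.append(f"overlay {index}: capture_base is not the capture base SAID")
        if overlay.get("digest") != compute_said(overlay):
            errors.append(f"overlay {index}: digest does not match its content")

    # The bundle-level "d" is optional: check it only when it is present.
    if "d" in bundle and bundle["d"] != compute_said(bundle, field="d"):
        errors.append("bundle: d does not match its content")

    return errors
```

Returning a list of messages, rather than raising on the first mismatch, lets a command line wrapper report every stale digest in one run and choose its own exit code.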

@swcurran
Contributor Author

Once you have something sort of working, we’ll find test cases for you to try. Anything produced from processing an OCA Bundle Excel file should have the right SAIDs, and so is a valid test case for you. For example, our bundles should have the right digests in the capture base and core bundles.

@blelump

blelump commented Nov 7, 2024

@swcurran, look at the recent proposals published within the OCA spec repo, particularly the-human-colossus-foundation/oca-spec#74. Our goal is to enable the community to create and exchange overlays via a standard format: the OCA Bundle, whether stored in an OCA Repo or not. In essence, we want to cover creating, serializing, and deserializing the Bundle in the most popular environments so that, as a consumer, you get the integrity and other details like SAIDs covered by the tooling.

@swcurran
Contributor Author

swcurran commented Nov 8, 2024

@blelump, I read over the issue you pointed to. I can’t really tell from that what exactly is being proposed. It mainly talks about tooling (execution) and governance (provenance), which in my opinion are (and must be) outside the specification. As such, I’m happy for anyone to build tooling and to define governance, as long as it is outside of the specification, and that such tooling can be created by anyone based on what is stated in the specification.

I think the goal of constraining OCA such that EVERY consumer can understand EVERY OCA Bundle is extremely constraining and pretty much impossible unless there is only one implementation. Why should our use of OCA for displaying credentials have to understand how OCA is being used to describe scientific data sets, or Pharma industry overlays? All of the producers and consumers in our ecosystem need to have a common understanding, but that should not have to include how OCA is used by ecosystems that we have nothing to do with.

In my opinion, the DSL talked about in issue 68 (the OCAFile) is tooling and so not part of the specification. I’m happy for it to be a separate specification that references the OCA spec, as a mechanism such that OCA Bundle producers can create compliant OCA Bundles. But it should not be in the spec, since it is a Producer-only convenience tool, and the Consumer of an OCA Bundle need not know anything about it.

I still very much want to see a tightened up specification that describes what we are doing today, especially around the calculation of SAIDs. We had a good session at IIW (at the pub in the evening, no less) with a group coding and verifying the SAID calculation. Turns out it is a single line (with multiple procedure calls) in JavaScript. It would be great to have the spec define where the SAID should exist and how to calculate it.

@blelump

blelump commented Nov 9, 2024

The proposal aims to strengthen the OCA ecosystem by enabling participants to integrate more efficiently rather than considering a separate implementation. We've also considered your approach with the Branding overlay and included it in the proposal. Essentially, it enables any ~~consumer~~producer to create a Bundle with any overlays she wants, and it must be cheaper than building another implementation. At the same time, we leave the interpretation of the Branding overlay's meaning to consumers. The Bundle doesn't know it, and neither does the basic tooling. Therefore, the concise summary of our proposal is as follows: use our tooling to create the OCA Bundles and our libs to serialize/deserialize them within your environment. We take care of SAIDs and other mechanical details while leaving the bundle interpretation up to the consumer. We aim to create a unified OCA Bundle that spans various use cases and ecosystems, serving as the focal point of the entire concept.

In the 2nd paragraph above, you ask, “Why should our use of OCA for displaying credentials have to understand how OCA is being used to describe scientific data sets, or Pharma industry overlays?” The concise answer is: you don't. What we propose is the standard exchange platform: the OCA Bundle. This also enables various representation formats, i.e., if your case benefits more from RFC 8785, it's a matter of writing an adapter to the Bundle. What the Bundle carries in one use case or another is up to the interpretation of that use case's participants. By design, we therefore assume each use case takes care of interpreting its use case-specific overlays and provides proper tooling for it, as discussed under the Execution section of the-human-colossus-foundation/oca-spec#74.

@swcurran
Contributor Author

I agree with the majority of what you say, except for a couple of points:

  • The specification should be complete so as not to rely on any implementation. We can choose to use your tooling/libs, but the spec. should allow others to create their own tooling and libraries. That is my biggest struggle with OCA as it exists. The “spec” is in the HCF implementation, not clearly documented.
  • I’m a bit confused about the use of “consumer” in your 3rd sentence, 1st paragraph. In my view, a “producer” that wants others to understand their data should define an OCA Bundle, and a “consumer” uses the OCA Bundle to gain understanding about the data to which it is to be applied. Do I have that right?

@blelump

blelump commented Nov 12, 2024

  1. Yes, this is correct. The current spec isn't exhaustive and needs some mechanical clarifications. It is not intended to make adoption or creating another implementation more difficult by having a vague spec.
  2. I made a typo and added a correction. Thanks for clarifying this, and I apologize for the ambiguity.
