Create super-types for relationships #4724

cmgrote · 2021-02-23T17:11:50Z

At the moment none of the RelationshipDefs have a super type defined, and I was wondering if there was some rationale behind this: for example, would it mean that the ends of the relationship are fixed by the supertype (?)

I raise it because there is significant duplication in the properties of certain relationships, particular in the area of GlossaryTerms, where all of these relationships have almost exactly the same properties defined (description, expression, status, steward, and source):

Antonym
ISARelationship
PreferredTerm
RelatedTerm
ReplacementTerm
SemanticAssignment (has one additional property confidence beyond those above)
Synonym
TermHASARelationship (though this is missing expression compared to the others)
TermISATypeOFRelationship (also missing expression compared to the others)
TermTYPEDBYRelationship (also missing expression)
Translation
UsedInContext
ValidValue

(There are ~10 more that have steward defined, and a few other overlaps.)

Aside from the question of consistency (see minor inconsistencies above), there is also the question of the performance impact of this. Per #4647 we agreed that we should not have a single "namespace" for all properties; however, in most (if not all?) of the examples above, I would expect that these 5 properties all have the same meaning and representation.

In other words, that they'd be relationship subtypes of a more abstract relationship type like GlossaryTermRelationship.

The impact of having so many identically-named properties across types is that when a consumer runs a search through our find... methods there is no mandatory qualification of the type against which they want results. For an underlying repository (both the graph repository and the Crux repository), this means that we need to do an OR-based query across ~20 different underlying properties (treating each type as a separate property) to find all that match -- combined with the fact that since many of these properties are string values, a consumer could be sending their request with an arbitrary regular expression -- and this therefore ends up being very expensive (slow).

If we could use a supertype, at least the property would only be registered against a single type in the underlying repositories, and then we could do a single query for the property rather than what ends up being the equivalent of ~20 distinct queries whose results are all combined.

The text was updated successfully, but these errors were encountered:

cmgrote · 2021-02-23T17:20:14Z

To illustrate, this query on the findRelationshipsByProperty method:

{
    "class": "PropertyMatchFindRequest",
    "matchProperties": {
        "class": "InstanceProperties",
        "instanceProperties": {
            "steward": {
                "class": "PrimitivePropertyValue",
                "instancePropertyCategory": "PRIMITIVE",
                "primitiveDefCategory": "OM_PRIMITIVE_TYPE_STRING",
                "primitiveValue": "someone"
            }
        }
    },
    "pageSize": 2,
    "matchCriteria": "ALL"
}

currently must turns into something like this query on the back-end:

{
 :find [e],
 :where [(or [(text-search :relationshipProperties/Antonym.steward.value "/someone/") [[e _]]]
             [(text-search :relationshipProperties/DataClassAssignment.steward.value "/someone/") [[e _]]]
             [(text-search :relationshipProperties/ForeignKey.steward.value "/someone/") [[e _]]]
             [(text-search :relationshipProperties/ISARelationship.steward.value "/someone/") [[e _]]]
             [(text-search :relationshipProperties/KnownDuplicateLink.steward.value "/someone/") [[e _]]]
             [(text-search :relationshipProperties/LibraryCategoryReference.steward.value "/someone/") [[e _]]]
             [(text-search :relationshipProperties/LibraryTermReference.steward.value "/someone/") [[e _]]]
             [(text-search :relationshipProperties/PreferredTerm.steward.value "/someone/") [[e _]]]
             [(text-search :relationshipProperties/ReferenceValueAssignment.steward.value "/someone/") [[e _]]]
             [(text-search :relationshipProperties/RelatedTerm.steward.value "/someone/") [[e _]]]
             [(text-search :relationshipProperties/ReplacementTerm.steward.value "/someone/") [[e _]]]
             [(text-search :relationshipProperties/SemanticAssignment.steward.value "/someone/") [[e _]]]
             [(text-search :relationshipProperties/Synonym.steward.value "/someone/") [[e _]]]
             [(text-search :relationshipProperties/TermHASARelationship.steward.value "/someone/") [[e _]]]
             [(text-search :relationshipProperties/TermISATypeOFRelationship.steward.value "/someone/") [[e _]]]
             [(text-search :relationshipProperties/TermTYPEDBYRelationship.steward.value "/someone/") [[e _]]]
             [(text-search :relationshipProperties/Translation.steward.value "/someone/") [[e _]]]
             [(text-search :relationshipProperties/UsedInContext.steward.value "/someone/") [[e _]]]
             [(text-search :relationshipProperties/ValidValue.steward.value "/someone/") [[e _]]]
             [(text-search :relationshipProperties/ValidValuesMapping.steward.value "/someone/") [[e _]]])
        [e :type.category 8]],
 :order-by [[e :asc]]
}

... when it could be as simple as this (if there were a supertype with these common properties):

{
 :find [e],
 :where [(text-search :relationshipProperties/GlossaryTermRelationship.steward.value "/someone/") [[e _]]]
        [e :type.category 8],
 :order-by [[e :asc]]
}```

cmgrote · 2021-02-23T17:45:43Z

Actually I have similar questions / concerns, though maybe somewhat different, with regards to the name and displayName properties that are distinctly defined across ~30-40 different EntityDefs. Given it would be optional anyway, it would seem to me that anything we must assign a qualifiedName (every Referenceable, since it's mandatory) we might reasonably expect could (since it would be optional) be assigned a name for display purposes?

That way we'd have greater consistency across the types, and a vastly reduced search space for the underlying repositories when someone is searching for something based just on its name.

davidradl · 2021-02-24T09:35:15Z

@cmgrote some thoughts on this. I can see that finds for relationship properties would be simpler if there were super types. Early on we decided to not have super types for relationships (back in the Altas days). Many repositories do not have support for top level relationships, requiring them to support inheritance as well would seem like a very high barrier to overcome. Another consideration is that an OMAS could expose inherited relationships. In fact subject area does this with the relationships extending Line. I don't think it exposes a find on a Line - you can raise this as an enhancement if you like.

cmgrote · 2021-02-25T10:42:40Z

Discussed on 25/02 and we all seem to agree this would be a good idea, but we all expect that it will cause various unforeseen breakages -- so needs to be introduced early in a release cycle for thorough testing.

Ideal approach to attempt initially:

Replace the existing type (rather than patching), so we don't end up with deprecated attributes at the lowest-level that are actually inherited (and should not be deprecated) from a higher level
Ensure that relationship types that have a supertype have ends that only ever narrow (further subtype) their parent's end types

Note: this is different from an OO-interpretation of "supertype" and "subtype" -- where a subtype would need to be capable of doing everything (and more) than its supertype, these constructions would make it such that the subtype will do less (be more restrictive than) the supertype.

planetf1 · 2021-03-03T12:00:21Z

In the call we discussed setting up a feature branch to ease integration testing, to avoid breakage in master whilst we
work through all the consequential changes

I propose to create a new feature branch 'egeria-feature-4724' but open to other name suggestions.. (4724 being linked to this id). Or we could do 'egeria-feature-superrel' or similar. Best not to be too long. cc: @mandy-chessell @davidradl @grahamwallis . I'll check the build pipelines are ok, and propose to do this tomorrow after the status call unless there is followup discussion here/keenness to get started sooner

cmgrote · 2021-03-03T12:07:59Z

I think that should be fine -- we need to do the usual cut-over of the OpenMetadataTypesArchive class post-release for master first, anyway (see #4849).

planetf1 · 2021-03-03T12:35:13Z

Ok - add a comment here when ready and I'll setup & check

cmgrote · 2021-03-05T12:13:47Z

@planetf1 ok, I think we're ready for the new branch now -- I'd be fine with feature-relationship-supertypes as a name (not sure why everything needs to be prefixed with egeria- as it seems a bit redundant?)

planetf1 · 2021-03-05T13:02:37Z

@cmgrote just convention really -- the original release started as egeria-release-xx and we took it from there. I started with egeria-feature-xx when we looked at OLS and we discussed/agreed about the feature proposal there as per . It's good to have a release or feature to help with wildcards in the github actions, but agreed egeria is redundant and is only there historically.

A few formatting errors - but this was the documented process: https://github.com/odpi/egeria/blob/master/developer-resources/Feature-Branch.md

I do think egeria-feature-relationship-supertypes is quite cumbersome. Technically I have little concerns about changing it - beyond ALSO updating the other cicd files (before the branch), which means a little more testing & also should gather more support for a proposal to change the policy & update the guidelines accordingly?

cmgrote · 2021-03-05T16:47:39Z

Fine for me to gather more support and update the process accordingly... If we want to limit the length of the names, it's a bit odd to me to waste 7-15 characters in that limited length every time by forcing every name to begin with the same 7-15 character string 🤷

In fact, taking this a step further, if we use the first x- to give the branch some decipherable meaning, then I'd make this part the short bit (e.g. just f- for a feature branch) rather than restricting the actually-meaningfully-distinguishing characteristics of the branch itself the part that gets squashed.

So rather than something like egeria-feature-superrel which forces me to me to ponder why we're mis-spelling the word "squirrel" and what this branch could possibly be about, it would make far more sense to me to call it f-relationship-supertypes (which is about the same number of characters overall, but far more descriptive).

planetf1 · 2021-03-08T10:41:53Z

I suggest we discuss and agree with the team on the call tomorrow.
Technically we just want a prefix
Reduction of length was purely a human interaction concern
I agree egeria-* is redundant (but let's not change existing - ok for new)
For anyone landing on our repo 'f-' may be a little crypic

cmgrote · 2021-03-08T19:57:06Z

To discuss and agree on 09/03/2021 call:

Naming of the feature branches
Bubbling up of name / displayName (see 3rd comment in this stream)

mandy-chessell · 2021-03-09T07:18:11Z

On the question of name vs displayName - in most cases I can think of, the distinction between name and display name is important and we make use of it. I would be cautious about loosing the distinction.

Typically "name" is used for technical assets - you can see it occurring frequently in Area 0. The name is typically the technical name used in the digital landscape. It is extracted automatically and is not what someone has chosen to be a meaningful display name. In these entities the qualified name is manufactured by the metadata extraction process often based on the name.

The displayable (human authored) names for these entities are added later using a GlossaryTerm linked with the SupplementaryProperties relationship. So for example, in the Asset Manager OMAS interface, Assets have both technical names and business names. The technical names go into the Asset entity and the business names go in the attached GlossaryTerm entity. Using two entities for this is important since the Asset entity is often read-only to the open metadata ecosystem.

The displayName property typically appears in metadata elements that are human authored - such as glossary terms, governance zone definitions, ... The author may or may not control the qualified name, but they can choose the display name.

It would be interesting to understand if we see inconsistencies in the use of these two property names in the types.

mandy-chessell · 2021-03-09T08:29:06Z

I did a quick review of the types and while most use of name vs displayName is appropriate there are some notable exceptions:

SchemaElement uses displayName rather than name (0501)
ProcessVariable uses displayName - it could be argued that it needs name too? (0525)
Port uses displayName rather than name (0217)

cmgrote · 2021-03-09T09:18:16Z

I also just did some quick counts (rough numbers that may be off by a few each):

name is defined on 24 different type defs
displayName is defined on 34 different type defs
description is defined on more than 100 separate type defs (!)

cmgrote · 2021-03-09T10:10:13Z

I like the distinction between technical and non-technical, but could this not simply be a characteristic of the instance type in which the property has a value? An area 0 type with a name populated is a "technical name" because area 0 has technical types, while a glossary type with a name populated would be a "human authored" one since the glossary is human-authored (?)

Taking the example where we have both (Asset Manager OMAS): the name from the technical Asset is then clearly a "technical name" (because the Asset is technical) while the name from its associated GlossaryTerm is a "business name" (since the GlossaryTerm is human-authored).

(Then we don't even have the risk of things becoming inconsistent per the examples above with technical things using displayName...)

cmgrote · 2021-03-09T10:30:57Z

On the branch naming point, agreed on 09/03 call to use feature- as the prefix going forward.

Signed-off-by: Christopher Grote <chris@thegrotes.net>

mandy-chessell · 2021-03-09T11:09:36Z

I think it is more helpful to maintain the different property names since the values have different expectations on their content. I like the idea of updating SchemaElement and Port to use name and deprecate displayName - and add name to ProcessVariables.

cmgrote · 2021-03-09T11:37:59Z

What about introducing 2 new entity typedefs in-between that distinguish between these "technical" and "human-authored" types?

For example (purely theoretical names for the types):

Technical, which has Referenceable as a supertype, with properties: name
HumanAuthored which has Referenceable as a supertype, with properties: displayName, description, (steward?), etc
Asset, SchemaElement, etc have Technical as their supertype
Glossary, GlossaryTerm, etc have HumanAuthored as their supertype

This would:

make the distinction between name and displayName more obvious
enforce consistency in their usage by any subtypes
significantly reduce the current "duplication" of these properties being defined across each type individually

mandy-chessell · 2021-03-09T12:50:51Z

I think it works for the Technical Entity - but not for the human authored because we have human authored entities that do not have display name

#4724 Changes feature branch naming guidance as agreed

mandy-chessell · 2021-03-09T12:53:09Z

The way the model is today is that there are these sorts of supertypes for each major grouping of entity types. Also there are entities that do not inherit from referenceable that have name or display name.

planetf1 · 2021-03-09T17:52:14Z

Have added PR #4887 to update the cicd actions for egeria - sometimes the scripts have assumptions which can lead to clashes of docker images, snapshots etc since we have multiple branches with same pom version etc. fortunately we were ok with the current definitions. But they are needed to allow the branch checks to be done. (if we want those tests)

planetf1 · 2021-03-09T18:06:13Z

I've also updated the release patterns (annoyingly we have a revert branch - so it's an addition) and I think the pipelines as above are working. Should be ok to use now as long as #4890 builds ok. The checks did result in master versions of docker and website being overwritten but the next master merge overwrote so should be good now. (those actions may benefit from additional verification, they currently have assumptions)

Signed-off-by: Christopher Grote <chris@thegrotes.net>

github-actions · 2021-05-09T00:12:21Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 20 days if no further activity occurs. Thank you for your contributions.

github-actions · 2021-07-25T00:18:06Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 20 days if no further activity occurs. Thank you for your contributions.

cmgrote added question Not a bug, asking a question functionality-call To discuss and agree during weekly functionality call(s) labels Feb 23, 2021

cmgrote self-assigned this Feb 25, 2021

cmgrote mentioned this issue Feb 25, 2021

Add differentiation between primary and secondary categories odpi/egeria-connector-ibm-information-server#491

Closed

cmgrote removed functionality-call To discuss and agree during weekly functionality call(s) question Not a bug, asking a question labels Feb 26, 2021

cmgrote mentioned this issue Feb 26, 2021

findEntities with multiple conditions does not return entities odpi/egeria-connector-ibm-information-server#490

Open

cmgrote changed the title ~~Rationale for relationships never having a supertype?~~ Create super-types for relationships Mar 1, 2021

cmgrote added consumability Makes the software easier to use or understand. Includes docs, error messages, logs, API definitions optimization Non-breaking, but could be more efficient performance Performance & Scalability/reliability related labels Mar 1, 2021

cmgrote added the functionality-call To discuss and agree during weekly functionality call(s) label Mar 8, 2021

cmgrote added a commit to cmgrote/egeria that referenced this issue Mar 9, 2021

odpi#4724 Changes feature branch naming guidance as agreed

66a3c5c

Signed-off-by: Christopher Grote <chris@thegrotes.net>

cmgrote added a commit that referenced this issue Mar 9, 2021

Merge pull request #4883 from cmgrote/docs

fbeb4e6

#4724 Changes feature branch naming guidance as agreed

cmgrote added a commit to cmgrote/egeria that referenced this issue Mar 10, 2021

odpi#4724 retroactive changes for supertypes on relationships

a505c17

Signed-off-by: Christopher Grote <chris@thegrotes.net>

github-actions bot added the no-issue-activity Issues automatically marked as stale because they have not had recent activity. label May 9, 2021

mandy-chessell removed the no-issue-activity Issues automatically marked as stale because they have not had recent activity. label May 25, 2021

github-actions bot added the no-issue-activity Issues automatically marked as stale because they have not had recent activity. label Jul 25, 2021

github-actions bot closed this as completed Aug 15, 2021

mandy-chessell reopened this Aug 16, 2021

mandy-chessell added pinned Keep open (do not time out) and removed no-issue-activity Issues automatically marked as stale because they have not had recent activity. labels Aug 16, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Create super-types for relationships #4724

Create super-types for relationships #4724

cmgrote commented Feb 23, 2021

cmgrote commented Feb 23, 2021

cmgrote commented Feb 23, 2021

davidradl commented Feb 24, 2021

cmgrote commented Feb 25, 2021 •

edited

Loading

planetf1 commented Mar 3, 2021

cmgrote commented Mar 3, 2021 •

edited

Loading

planetf1 commented Mar 3, 2021

cmgrote commented Mar 5, 2021

planetf1 commented Mar 5, 2021

cmgrote commented Mar 5, 2021 •

edited

Loading

planetf1 commented Mar 8, 2021

cmgrote commented Mar 8, 2021 •

edited

Loading

mandy-chessell commented Mar 9, 2021 •

edited

Loading

mandy-chessell commented Mar 9, 2021

cmgrote commented Mar 9, 2021

cmgrote commented Mar 9, 2021 •

edited

Loading

cmgrote commented Mar 9, 2021 •

edited

Loading

mandy-chessell commented Mar 9, 2021

cmgrote commented Mar 9, 2021

mandy-chessell commented Mar 9, 2021

mandy-chessell commented Mar 9, 2021

planetf1 commented Mar 9, 2021 •

edited

Loading

planetf1 commented Mar 9, 2021 •

edited

Loading

github-actions bot commented May 9, 2021

github-actions bot commented Jul 25, 2021

Create super-types for relationships #4724

Create super-types for relationships #4724

Comments

cmgrote commented Feb 23, 2021

cmgrote commented Feb 23, 2021

cmgrote commented Feb 23, 2021

davidradl commented Feb 24, 2021

cmgrote commented Feb 25, 2021 • edited Loading

planetf1 commented Mar 3, 2021

cmgrote commented Mar 3, 2021 • edited Loading

planetf1 commented Mar 3, 2021

cmgrote commented Mar 5, 2021

planetf1 commented Mar 5, 2021

cmgrote commented Mar 5, 2021 • edited Loading

planetf1 commented Mar 8, 2021

cmgrote commented Mar 8, 2021 • edited Loading

mandy-chessell commented Mar 9, 2021 • edited Loading

mandy-chessell commented Mar 9, 2021

cmgrote commented Mar 9, 2021

cmgrote commented Mar 9, 2021 • edited Loading

cmgrote commented Mar 9, 2021 • edited Loading

mandy-chessell commented Mar 9, 2021

cmgrote commented Mar 9, 2021

mandy-chessell commented Mar 9, 2021

mandy-chessell commented Mar 9, 2021

planetf1 commented Mar 9, 2021 • edited Loading

planetf1 commented Mar 9, 2021 • edited Loading

github-actions bot commented May 9, 2021

github-actions bot commented Jul 25, 2021

cmgrote commented Feb 25, 2021 •

edited

Loading

cmgrote commented Mar 3, 2021 •

edited

Loading

cmgrote commented Mar 5, 2021 •

edited

Loading

cmgrote commented Mar 8, 2021 •

edited

Loading

mandy-chessell commented Mar 9, 2021 •

edited

Loading

cmgrote commented Mar 9, 2021 •

edited

Loading

cmgrote commented Mar 9, 2021 •

edited

Loading

planetf1 commented Mar 9, 2021 •

edited

Loading

planetf1 commented Mar 9, 2021 •

edited

Loading