Asset 504: bad answer ID for thrombin, duplicate of 503? #84

Open
colleenXu opened this issue Jun 28, 2024 · 6 comments
Labels: Bad Asset Curie (The given asset curie is not correct)

Comments

@colleenXu

Asset 504 currently uses CHEMBL.COMPOUND:CHEMBL2108110 as the expected answer ID for thrombin.

However, this is a chemical ID, and the query appears to be looking for genes whose activity/abundance is decreased by the input chemical Lepirudin.

I imagine the gene/protein ID should be used instead, but then this would be the same as Asset 503 (which has its own issue, #83).

Should this test be removed as a duplicate?
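
To illustrate the mismatch, here is a minimal sketch that flags an expected answer ID whose CURIE prefix suggests a chemical when the test expects a gene or protein. The prefix table and function names are assumptions made up for this example; they are not part of the test harness or any Translator tool. A prefix is only a hint about what an ID refers to, so a real check would go through node normalization instead.

```python
# Hypothetical prefix-to-kind table (an assumption for illustration only).
PREFIX_KIND = {
    "CHEMBL.COMPOUND": "chemical",
    "CHEBI": "chemical",
    "UNIPROTKB": "protein",
    "NCBIGene": "gene",
}


def curie_prefix(curie: str) -> str:
    """Return the part of a CURIE before the first colon."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError(f"not a CURIE: {curie!r}")
    return prefix


def kind_of(curie: str) -> str:
    """Guess the kind of entity from the prefix; 'unknown' if not in the table."""
    return PREFIX_KIND.get(curie_prefix(curie), "unknown")


def answer_kind_matches(curie: str, expected_kinds: set) -> bool:
    """True if the guessed kind of the expected answer is one the query can return."""
    return kind_of(curie) in expected_kinds


if __name__ == "__main__":
    # Asset 504's current expected answer vs. what the query appears to return.
    print(answer_kind_matches("CHEMBL.COMPOUND:CHEMBL2108110", {"gene", "protein"}))
```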

@colleenXu
Author

colleenXu commented Jun 28, 2024

This means there's some ambiguity around what thrombin is (the F2 gene, one of multiple proteins, or a chemical entity).

Alpha-thrombin UMLS:C0002313 is mentioned in #62 as a related thing. It's a ChemicalEntity in NodeNorm, but a Protein in ARAX's response.

@maximusunc
Collaborator

Running thrombin through Name Resolver gives CHEBI:9574, which doesn't seem to be a duplicate of any existing asset. Should this asset just be updated to use that ID?

maximusunc added the "Bad Asset Curie" (The given asset curie is not correct) label on Jul 12, 2024

@colleenXu
Author

colleenXu commented Jul 16, 2024

I don't think this query will return chemicals as "answers".

The query seems to be an MVP2 query with the input chemical (Lepirudin), so the output would be genes with decreased activity/abundance.
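
For reference, a query of this kind might look roughly like the sketch below. The actual query behind Asset 504 is not shown in this thread, so this is an assumption based on the general TRAPI query-graph shape with qualifier constraints. The function name and node keys are made up for illustration, and Lepirudin's CURIE is not given here, so it is passed in as a parameter.

```python
import json


def build_mvp2_query(chemical_curie: str) -> dict:
    """Build a TRAPI-style query: which genes have decreased
    activity or abundance due to the given chemical?"""
    return {
        "message": {
            "query_graph": {
                "nodes": {
                    # Pinned input node: the chemical (e.g. Lepirudin).
                    "chemical": {
                        "ids": [chemical_curie],
                        "categories": ["biolink:ChemicalEntity"],
                    },
                    # Unpinned output node: answers are expected to be genes.
                    "gene": {"categories": ["biolink:Gene"]},
                },
                "edges": {
                    "e0": {
                        "subject": "chemical",
                        "object": "gene",
                        "predicates": ["biolink:affects"],
                        "qualifier_constraints": [
                            {
                                "qualifier_set": [
                                    {
                                        "qualifier_type_id": "biolink:object_aspect_qualifier",
                                        "qualifier_value": "activity_or_abundance",
                                    },
                                    {
                                        "qualifier_type_id": "biolink:object_direction_qualifier",
                                        "qualifier_value": "decreased",
                                    },
                                ]
                            }
                        ],
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    # "EXAMPLE:0000" is a placeholder, not Lepirudin's real identifier.
    print(json.dumps(build_mvp2_query("EXAMPLE:0000"), indent=2))
```

With the output node constrained to genes like this, a chemical ID such as CHEBI:9574 would not be expected among the answers.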

@sandrine-muller-research

Sorry, I am just coming back to this (I was away for quite some time). For all the assets I have done, the ID reported was the ID output by the UI. I did this on purpose because issues in normalization need to be addressed as well. So to me, these cases should lead to separate tests for normalization issues. If we fix them by hand here, will we still be able to come back and list all the normalization issues for another suite?

@sandrine-muller-research

Assets 503 and 504 are not duplicates; they are "unconflated" results. Because of conflation, they all appear as top answers, but in reality they are different entities:
- UNIPROTKB:P00734 is prothrombin, the protein encoded by the F2 gene
- this protein is further cleaved into thrombin (CHEMBL.COMPOUND:CHEMBL2108110) via a metabolic reaction

There is an issue with output normalization that leads to 2 different results (from different ARAs) that are both true; they just appear in 2 different cliques. Please note that thrombin is a serine protease, so it is not a compound but a protein, and with conflation on, F2/thrombin/prothrombin are all true.

@sandrine-muller-research

Please assign this email address/GitHub account in the future; I do not receive any notifications with the other one. Thanks!
