Give better chemical labels to returned responses #462

jh111 · 2023-08-08T00:34:59Z

Search What drug may treat Multiple Sclerosis.
https://ui.test.transltr.io/results?l=Multiple%20Sclerosis&i=MONDO:0005301&t=0&q=bf9d0342-0966-4cec-8122-8d87187b1ef3

One of the answers that comes up is Monoclonal antibody an100226.

This is the early name/number for natalizumab. It will be much more helpful for users to have this normalized to the current name, natalizumab.

Options:

A search in PubChem for "Monoclonal antibody an100226" brings up natalizumab, along with a list of depositor-supplied synonyms: https://pubchem.ncbi.nlm.nih.gov/substance/481101759
The evidence for "treats" includes many papers from PubMed that clearly have natalizumab in the title. Perhaps there's a way to get SemMedDB results to provide natalizumab as an answer, or to map SemMedDB answers to it.

sierra-moxon · 2023-08-08T23:16:34Z

@gaurav - is this something for Name Resolver?

sandrine-m · 2023-08-14T21:34:40Z

Tagging the ace team David and Gaurav.

gprice1129 · 2023-09-27T21:49:39Z

It isn't clear what the UI team can do about this issue. Is the idea of a "canonical name" available in the attribute server? @newgene

sandrine-m · 2023-09-27T22:19:29Z

I think Jenn means that the returned results were not normalized properly. I used Jenn's PK to reload the results in the ARAX CI UI (note this is an "old" query and the ARAs are failing validation):

I found that BTE was responsible for this result:

I RETESTED on test today and the unusual name is still popping up:

and it appears twice (once with the MeSH ID and once with the UMLS ID). Apparently both BTE and RTX-KG2 are returning that result.

I looked up monoclonal antibody AN100226 in the RENCI Name Resolver and found that the two identifiers are properly pooled together.

Natalizumab is among the synonyms but is not the label. I do not know what the rule is for deciding the drug label, but my guess is that the label is decided at the NodeNorm stage, so is this a NodeNorm issue?

EDIT:
So I think there are two issues here:

@gprice1129 : there are two results for the same compound. Is it the UI that normalizes/groups results for the same compound together?
@gaurav : Got feedback from Chris B.: NodeNorm is where the labels get decided, and the team is discussing a possible future update.

gprice1129 · 2023-09-28T21:37:35Z

@sandrine-m the UI does not do any normalization, we use the normalization the ARS provides. The ARS relies on the node normalizer so most likely it is an issue with that service @gaurav @cbizon

sandrine-muller-research · 2023-10-04T20:49:26Z

From a conversation on Slack:
@cbizon : The label is probably coming from NodeNorm, which is where we are choosing the 'best' label. We currently have an approach that has not always been well received.
@gaurav (Gaurav Vaidya, SRI):
I've added "Investigate strategies for improved preferred labels for cliques." to our priorities. I know we have some tickets with individual examples we can start working on, but if people have ideas about improving this at scale -- if a particular chemical provider has really good labels, say -- please let us know!

gprice1129 · 2024-02-01T23:55:30Z

I think we should move away from tickets with open-ended definitions of success. "Give better labels" is way too broad and basically can never be finished. It would be better to create tickets with a finite set of items that should be corrected.

jh111 · 2024-02-02T02:46:54Z

@gprice1129 If I understand correctly, I think what you're pointing out is that we can't implement this until we define what output is expected, and whether it's possible to do it.

What is the relative importance of improving this user experience? Can we decide now, or do we need time to get user input?
How much do users care whether the name is familiar to them? Does it need to be on top, or would it be OK to be able to click to get a list of synonyms?
Is there one canonical name for biomedical chemicals that are already used as drug ingredients, or do different users have different expectations?
What is the technical feasibility of improving this user experience? How will we test it?

sandrine-muller · 2024-02-02T13:12:20Z

Re: deciding on the label in NodeNorm: my understanding was that sometimes the label NodeNorm chooses is not the one users prefer. Although this issue cannot be fixed right away (it is a long-term issue, perhaps needing some user surveys, as Jenn points out), I started a test asset sheet for testing chemical names, based on a few searches I made using the system. Please note that this sheet was put together back in November 2023, I think, so the system may have changed since then. The MolePro team was particularly interested in looking at chemical labels that are chosen differently by MolePro and NodeNorm, to see how we can improve our system.

gprice1129 · 2024-02-02T15:57:38Z

@jh111 Having a definition for "better chemical labels" would definitely be a good idea. However, even if we had a perfect definition for "chemical label", it's still unclear when the ticket can be closed: are we talking about all the chemical labels in the system right now, or all of them for all time? In my opinion it would be better if we constrained tickets of this nature to some finite set of chemical labels so whoever is working on it can have a clear goal.

jh111 · 2024-02-02T16:12:29Z

I have given the ticket a better title, to reflect the problem/opportunity with the experience for specific users and the fact that different users might want different names. There are several different technical options for how this could be addressed.

For the INN, I think the RxNorm ingredient would be a fine level of detail. For example, inFLIXimab, as opposed to inFLIXimab-abda. I don't think we need to use the uppercase (which is designed for prescription safety).

Genomewide · 2024-02-05T19:51:39Z

I think this is a NodeNorm issue. We display whatever the canonical name is. So, @gaurav can you tell us what the rules are for this? Then maybe @jh111 can see if there are examples that are not optimized, and whether optimizing those would break other terms? So, the rubric could change. However, I don't think this is a UI issue.

cbizon · 2024-02-15T18:02:06Z

Another example of suboptimal labeling is using the name "Activated Charcoal" for carbon:

https://nodenorm.test.transltr.io/1.4/get_normalized_nodes?curie=PUBCHEM.COMPOUND%3A5462310&conflate=true&drug_chemical_conflate=false&description=false

The rule that's being applied is to get the name from each source and then rank them by the same source priority as used in biolink to pick which curie is the best one.
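
A minimal sketch of the rule as described above, for illustration only. The prefix order, the identifier suffix `EXAMPLE`, and the per-source labels are assumptions made up for this example; the real order comes from Biolink and the real labels from each source.

```python
# Illustrative prefix order only; the real order is defined in Biolink.
PREFIX_PRIORITY = ["CHEMBL.COMPOUND", "CHEBI", "PUBCHEM.COMPOUND"]


def pick_label(clique, priority=PREFIX_PRIORITY):
    """Return the label attached to the highest-priority identifier.

    `clique` is a list of (curie, label) pairs; identifiers without a
    label are skipped, and unknown prefixes rank after all known ones.
    """
    rank = {prefix: i for i, prefix in enumerate(priority)}
    labelled = [(curie, label) for curie, label in clique if label]
    if not labelled:
        return None
    best = min(labelled, key=lambda pair: rank.get(pair[0].split(":", 1)[0], len(rank)))
    return best[1]


# Hypothetical clique: the labels per source are invented for the example.
clique = [
    ("PUBCHEM.COMPOUND:5462310", "carbon"),
    ("CHEMBL.COMPOUND:EXAMPLE", "Activated Charcoal"),
    ("CHEBI:EXAMPLE", "carbon"),
]

default_label = pick_label(clique)
reordered_label = pick_label(clique, ["CHEBI", "CHEMBL.COMPOUND", "PUBCHEM.COMPOUND"])
```

With the order assumed here, `default_label` is "Activated Charcoal"; moving a different prefix to the front changes the result to "carbon", which is why the choice of source order matters so much.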

sandrine-muller · 2024-02-16T08:50:36Z

When you say source, do you mean original sources or each team within Translator? Would it be useful then to collect the name that each source provides and learn a rule (=set of weights) that best predicts what users prefer (=the desired result in the test asset sheet)? The idea being that some sources have more user-friendly naming strategies than others (=higher weights).
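
One very rough way to picture this idea: score each source by how often its label agrees with the desired label in the test assets. This is a plain agreement rate, not a trained model, and the source names and rows below are hypothetical placeholders, not entries from the actual sheet.

```python
from collections import defaultdict


def source_weights(assets):
    """Compute, per source, the fraction of assets where its label matches.

    `assets` is a list of (expected_label, {source: label}) pairs.
    Comparison is case-insensitive.
    """
    hits = defaultdict(int)
    seen = defaultdict(int)
    for expected, labels in assets:
        for source, label in labels.items():
            seen[source] += 1
            if label.casefold() == expected.casefold():
                hits[source] += 1
    return {source: hits[source] / seen[source] for source in seen}


# Hypothetical rows: SOURCE_A and SOURCE_B are placeholders.
assets = [
    ("natalizumab", {"SOURCE_A": "Monoclonal antibody an100226", "SOURCE_B": "Natalizumab"}),
    ("carbon", {"SOURCE_A": "Activated Charcoal", "SOURCE_B": "carbon"}),
    ("infliximab", {"SOURCE_A": "infliximab", "SOURCE_B": "inFLIXimab"}),
]

weights = source_weights(assets)
```

Sources with a higher agreement rate could then be moved up in the label priority, and the same sheet could be reused to check that the change does not make other labels worse.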

gaurav · 2024-07-12T16:37:12Z

To deal with the simpler issue first, CHEBI:27594 "CHARCOAL, ACTIVATED" still has the wrong label (should be "carbon"). This is because we prefer CHEMBL.COMPOUND labels over others. I think I've seen other examples of CHEMBL labels being suboptimal; I wonder if we should promote CHEBI above it and see if that improves this situation (it should definitely fix this bug). I'm going to look for other reports of this before deciding whether to try this.
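
To re-check the label for a given CURIE after any change, the NodeNorm request can be rebuilt from the URL posted earlier in this thread. This sketch only constructs the URL; the function name is made up, and the parameter defaults simply mirror that one URL rather than any documented defaults.

```python
from urllib.parse import urlencode

# Base URL copied from the NodeNorm Test link earlier in this thread.
NODENORM_URL = "https://nodenorm.test.transltr.io/1.4/get_normalized_nodes"


def normalized_nodes_url(curie, conflate=True, drug_chemical_conflate=False, description=False):
    """Build a get_normalized_nodes URL for a single CURIE (no request is sent)."""
    params = {
        "curie": curie,
        "conflate": str(conflate).lower(),
        "drug_chemical_conflate": str(drug_chemical_conflate).lower(),
        "description": str(description).lower(),
    }
    return f"{NODENORM_URL}?{urlencode(params)}"


charcoal_url = normalized_nodes_url("CHEBI:27594")
```

Opening `charcoal_url` in a browser should show which label NodeNorm Test currently returns for that identifier.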

Now for the more complex issue: UMLS:C0665297 is present twice in NodeNorm Test -- once in a UMLS-only Protein clique, and once in a UMLS+MESH ChemicalEntity clique. These should really be merged into a single clique, but proteins and chemicals are currently produced by independent modules, so there isn't any way to merge those cliques given how NodeNorm is currently architected.

I don't think fixing this is reasonable to do within this round of Translator funding, as we'll need to rethink how Babel works.
However, I would like to see how often this happens, which shouldn't take too much effort. I'm going to schedule that part of this work for Hammerhead, but I'll see if I can do it any sooner than that.
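
Counting how often this happens could look roughly like the sketch below. The input shape (a mapping from clique ID to type and member CURIEs) is an assumption for illustration and is not Babel's actual output format; the clique IDs and the `MESH:EXAMPLE` identifier are placeholders.

```python
from collections import defaultdict


def ids_in_multiple_cliques(cliques):
    """Return {curie: sorted clique ids} for identifiers found in more than one clique.

    `cliques` maps a clique id to (biolink_type, list_of_curies).
    """
    membership = defaultdict(set)
    for clique_id, (_biolink_type, curies) in cliques.items():
        for curie in curies:
            membership[curie].add(clique_id)
    return {curie: sorted(ids) for curie, ids in membership.items() if len(ids) > 1}


# Hypothetical data modelled on the case described above.
cliques = {
    "protein-1": ("biolink:Protein", ["UMLS:C0665297"]),
    "chemical-1": ("biolink:ChemicalEntity", ["UMLS:C0665297", "MESH:EXAMPLE"]),
}

duplicates = ids_in_multiple_cliques(cliques)
```

Here `duplicates` contains only `UMLS:C0665297`, listed under both cliques; the size of that dictionary over a full build would give the count.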

Genomewide · 2024-07-13T23:37:53Z

Is there a way we can gather all of the examples together to look at the flavors we are talking about?
"Charcoal, activated" is wrong for different reasons than "A synthetic peptide of 20 amino acids, comprising D-Phe, Pro, Arg, Pro, Gly, Gly, Gly, Gly, Asn, Gly, Asp, Phe, Glu, Glu, Ile, Pro, Glu, Glu, Tyr, and Leu in sequence. A congener of hirudin (a naturally occurring drug found in the saliva of the medicinal leech), it a specific and reversible inhibitor of thrombin, and is used as an anticoagulant."

@gaurav Do you have a dart board or a stress ball (or some other place) where you keep all of our complaints? I would be interested in seeing how to break these down and then look at some examples from each group.

sandrine-muller-research · 2024-07-22T09:52:07Z

@Genomewide I started this sheet on my side (perhaps to become a set of tests for @gaurav in the future). It does not contain all examples, and surely Gaurav has a lot more.

Genomewide · 2024-07-25T00:34:18Z

How do I find what to put for MolePro? I added asset #25.

sandrine-muller-research · 2024-07-25T19:08:46Z

Thank you for adding a row to the sheet.
Here is how you can query MolePro, where you put ["CID:75007581"] as the input (MolePro internally uses a different set of CURIEs).
However, MolePro does not know about this ID (we are tracking down why at the moment) but does know about collagenase. To query by name, use the "by_name" endpoint.
I do see on the PubChem page that it was modified at the beginning of July (2024-07-20), so that is perhaps a change of ID. We are investigating. I'll keep you posted.

gaurav · 2024-07-26T09:46:29Z

> @Genomewide I started this sheet on my side (to become perhaps a set of tests in future for @gaurav ) it does not contain all examples and surely Gaurav has a lot more

Thanks, @sandrine-muller-research! My list is actually much shorter :) I'll start moving your entries over in Hammerhead.

colleenXu · 2024-08-12T21:02:49Z

Just putting these here in case people are unaware of other convos:

sandrine-muller-research · 2024-08-16T14:28:05Z

Thank you Colleen!
Putting this query here as it has a good number of extremely long names. I will need to see whether we have better chemical names for them, and update the test asset sheet. Will come back to this.

sandrine-m added the UI - display confusion on or overlooking information label Aug 14, 2023

sandrine-m assigned dnsmith124 and gaurav Aug 14, 2023

sandrine-m added this to the D: Fall - 2023 milestone Aug 14, 2023

sandrine-muller-research unassigned dnsmith124 Oct 4, 2023

sandrine-muller-research changed the title ~~Translator should map early drug numbers to final names.~~ Give better labels to returned responses Oct 4, 2023

sandrine-muller-research added the response labels task around getting better labels to show label Oct 4, 2023

sandrine-muller-research mentioned this issue Oct 4, 2023

imProving Agent returning results with drug names like "Pubchem.compound:6710690" instead of "Pharmakon1600-01504273" with lots of examples like this. I.e. CURIEs being returned, not names #568

Open

sandrine-muller-research changed the title ~~Give better labels to returned responses~~ Give better chemical labels to returned responses Oct 4, 2023

gaurav mentioned this issue Oct 19, 2023

IDs for Ivermectin #469

Closed

jh111 added needs review this ticket needs a broad group of people to review and assign next steps because it crosses teams and removed needs review this ticket needs a broad group of people to review and assign next steps because it crosses teams labels Feb 2, 2024

jh111 added the needs review this ticket needs a broad group of people to review and assign next steps because it crosses teams label Feb 2, 2024

sierra-moxon added the group1 label Feb 9, 2024

sierra-moxon removed needs review this ticket needs a broad group of people to review and assign next steps because it crosses teams group1 labels Mar 1, 2024

gaurav mentioned this issue Jul 12, 2024

Carbon has a preferred label of "activated charcoal" TranslatorSRI/Babel#304

Open

gaurav mentioned this issue Jul 12, 2024

Label chosen as per preferred label prefixes for chemicals doesn't always return the best name TranslatorSRI/Babel#306

Open

sandrine-muller-research self-assigned this Jul 25, 2024

sstemann removed this from the D: Fall - 2023 milestone Aug 6, 2024
