Robert Hoehndorf edited this page Jan 12, 2016 · 9 revisions

UniProtToOWL

Model

Given protein XYZ, we generate

  • class XYZ (each instance is a single protein)
  • class XYZ_all (its only instance is the set of all XYZ proteins in the universe)
  • class XYZ_isoform for all isoforms of XYZ
  • class XYZ_generic (the 'generic' form of the protein, i.e. the orthology group)

We also generate the following axioms:

  • XYZ SubClassOf: XYZ_generic
  • XYZ_isoform SubClassOf: XYZ
  • XYZ_isoform SubClassOf: isoform-of some XYZ
  • XYZ SubClassOf: member-of some XYZ_all
  • XYZ_all SubClassOf: { xyz } [XYZ_all is a singleton class]
  • XYZ_all SubClassOf: has-member only XYZ [XYZ_all is homogeneous; this axiom is not in OWL-EL!]
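
As a rough illustration of the axiom list above, the following sketch emits the six axioms as plain strings for a given protein name. It is not the conversion code of this project: the function name `protein_axioms`, the `include_non_el` flag, and the convention of lower-casing the protein name to obtain the individual in the singleton class are all assumptions made for this example.

```python
def protein_axioms(name, include_non_el=True):
    """Return the axioms listed above for one protein, as Manchester-style strings.

    `name` is the protein class name (e.g. "XYZ"). The derived class names
    follow the _generic / _isoform / _all suffixes used on this page.
    """
    generic = f"{name}_generic"
    isoform = f"{name}_isoform"
    all_set = f"{name}_all"
    # Assumption: the single member of the singleton class is named by
    # lower-casing the protein name, mirroring "{ xyz }" for XYZ.
    individual = name.lower()

    axioms = [
        f"{name} SubClassOf: {generic}",
        f"{isoform} SubClassOf: {name}",
        f"{isoform} SubClassOf: isoform-of some {name}",
        f"{name} SubClassOf: member-of some {all_set}",
        f"{all_set} SubClassOf: {{ {individual} }}",
    ]
    if include_non_el:
        # The universal restriction is the one axiom flagged as not in EL.
        axioms.append(f"{all_set} SubClassOf: has-member only {name}")
    return axioms


if __name__ == "__main__":
    for axiom in protein_axioms("XYZ"):
        print(axiom)
```

Calling `protein_axioms("XYZ", include_non_el=False)` leaves out the `has-member only` axiom, which may be useful when the output is meant for an EL reasoner.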

Why?

With this model, we can distinguish the following statements:

  • Every isoform of XYZ has function GO:123: isoform-of some XYZ SubClassOf: has-function some GO:123
  • Some isoform of XYZ has function GO:123: XYZ SubClassOf: has-isoform some (has-function some GO:123)
  • All generic XYZ proteins have function GO:123: XYZ_generic SubClassOf: has-function some GO:123
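
The three readings above differ only in which class carries the function annotation. A minimal sketch of that choice, assuming a hypothetical helper `function_axiom` and hypothetical scope labels (`every_isoform`, `some_isoform`, `generic`) that are not part of the existing code, and using the hyphenated property name `isoform-of` from the axiom list:

```python
def function_axiom(protein, go_term, scope):
    """Return the axiom for one of the three readings distinguished above.

    scope is one of (hypothetical labels):
      "every_isoform": every isoform of the protein has the function
      "some_isoform":  at least one isoform of the protein has the function
      "generic":       all generic forms of the protein have the function
    """
    has_function = f"has-function some {go_term}"
    if scope == "every_isoform":
        return f"isoform-of some {protein} SubClassOf: {has_function}"
    if scope == "some_isoform":
        return f"{protein} SubClassOf: has-isoform some ({has_function})"
    if scope == "generic":
        return f"{protein}_generic SubClassOf: {has_function}"
    raise ValueError(f"unknown scope: {scope!r}")


if __name__ == "__main__":
    for scope in ("every_isoform", "some_isoform", "generic"):
        print(function_axiom("XYZ", "GO:123", scope))
```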

How to talk about possibilities:

  • Current/standard approach: XYZ SubClassOf: has-function some (realized-by only (located-in some GO:321)), or, for processes: XYZ SubClassOf: has-function some (realized-by only GO:321)
  • The problem: the universal restriction (only) is not in OWL-EL, so this will never, ever, scale to the size of UniProt!
  • Suggested solution: Transform the statement into "at least one member of XYZ_all is located in GO:321" (e.g. a membrane), or "at least one member participates in process GO:321": XYZ_all SubClassOf: has-member some (located-in some GO:321), or, for processes: XYZ_all SubClassOf: has-member some (participates-in some GO:321)
  • This is in OWL-EL!
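
To make the difference between the two encodings concrete, the sketch below generates both forms for the same annotation. The function names are hypothetical, and `uses_universal_restriction` is only a crude string check for the `only` keyword, not a real OWL-EL profile check.

```python
def standard_axiom(protein, go_term, is_process=False):
    """Current/standard encoding: a function that is realized only in/by go_term."""
    filler = go_term if is_process else f"(located-in some {go_term})"
    return f"{protein} SubClassOf: has-function some (realized-by only {filler})"


def set_based_axiom(protein, go_term, is_process=False):
    """Suggested encoding: at least one member of the _all set has the property."""
    relation = "participates-in" if is_process else "located-in"
    return f"{protein}_all SubClassOf: has-member some ({relation} some {go_term})"


def uses_universal_restriction(axiom):
    """Crude check (assumption: 'only' appears as a separate token in the string)."""
    return "only" in axiom.split()


if __name__ == "__main__":
    for is_process in (False, True):
        old = standard_axiom("XYZ", "GO:321", is_process)
        new = set_based_axiom("XYZ", "GO:321", is_process)
        print(old, "| uses 'only':", uses_universal_restriction(old))
        print(new, "| uses 'only':", uses_universal_restriction(new))
```

The set-based form uses only existential restrictions, which is why it stays within EL, at the cost of saying something weaker: it asserts that some member of the set has the property rather than constraining how the function may be realized.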

Open questions

  • Given a DL query Q, how can we decide whether to rewrite the query (to use the set representation of proteins) or to use the class for individual proteins?

Example and code

Todo

  • complete conversion:
    • build a complete data model, able to capture all of UniProt
    • decide on annotation properties vs. object properties
    • test conversion of SwissProt and measure reasoner performance
    • convert all of UniProt and test reasoner performance [maybe in the next 10 years]