
RDF / LOD (data)

Standardization of RDF data, metadata, ontologies and provenance

Day 1

FAIR

Participants: Michel Dumontier, Mark Wilkinson, Mark Thompson, Nick Juty

Progress:

  • Discussed and refined all FAIR principles (see document)

Next steps:

  • replace each occurrence of "data" with "meta(data)", "metadata", or "data", as appropriate
  • compare to published FAIR principles
  • initiate discussion with FAIR principles stewards and Barend Mons

Day 2

FAIR

Progress:

  • Prepared a complete first draft of the FAIR principles (see document)

Next steps:

  • continue discussions with FAIR principles stewards and Barend Mons

Server and Client for Triple Pattern Fragments (Perl) - Mark

Pre-Hackathon:

  • implementation of FAIR for non-RDF data sources
  • prototype used the EU Huntington Disease Network exemplar dataset (part of RD Connect project)
  • Constraints: rare disease data are EXTREMELY sensitive - highly identifiable. Therefore, we need extremely fine-grained access control over not only the data, but also the metadata
  • Prototype solution uses Linked Data Platform (W3C)
    • REST interface - the first URL returns repository-level metadata and a list of URLs representing meta-records (see the sketch after this list)
    • REST interface - meta-record URLs return metadata about individual records and (possibly) links to the actual record data
    • everything is 100% under the control of the data owner. Everything is Linked Data.
    • FINDABLE: Everything is identified by a URL that resolves to RDF
    • ACCESSIBLE: URLs resolve, and contain information about the license and access protocol
    • INTEROPERABLE: RDF is nicely formatted Linked Data with rich link-outs to other data following open ontologies.
    • RE-USABLE: Metadata is as "maximal" as the data provider can supply; all retrievals carry licensing information
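
To make the shape of that REST interface concrete, here is a minimal sketch of retrieving the repository-level metadata and listing the meta-record URLs, written against LWP::UserAgent and RDF::Trine. The repository URL and the use of ldp:contains for the containment links are assumptions for illustration, not details of the actual prototype.

    #!/usr/bin/env perl
    # Minimal sketch of walking the LDP-style metadata interface described above:
    # fetch the repository URL, then list the meta-record URLs it links to.
    # The repository URL and the ldp:contains predicate are assumptions.
    use strict;
    use warnings;
    use LWP::UserAgent;
    use RDF::Trine;

    my $repo_url = 'http://example.org/fair/repository';   # hypothetical entry point

    my $ua  = LWP::UserAgent->new;
    my $res = $ua->get($repo_url, 'Accept' => 'text/turtle');
    die 'Could not retrieve repository metadata: ', $res->status_line
        unless $res->is_success;

    # Parse the repository-level metadata into an in-memory model.
    my $model  = RDF::Trine::Model->temporary_model;
    my $parser = RDF::Trine::Parser->new('turtle');
    $parser->parse_into_model($repo_url, $res->decoded_content, $model);

    # List the URLs of the meta-records contained in the repository.
    my $contains = RDF::Trine::Node::Resource->new('http://www.w3.org/ns/ldp#contains');
    my $iter     = $model->get_statements(undef, $contains, undef);
    while (my $st = $iter->next) {
        print 'meta-record: ', $st->object->uri_value, "\n";
    }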

So, prior to the hackathon, we had a FAIR solution for METADATA over any kind of repository (RDF repositories or other!)

But... we still want a FAIR solution for DATA within those repositories. For this, we think that Triple Pattern Fragments is a good solution. The idea is that a repository responds to requests for very simple fragments of the data - those triples that match a given pattern of ?s ?p ?o.
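
As a rough illustration of what such a request looks like from the client side, the sketch below asks a hypothetical fragment URL for all triples matching the pattern ?s rdf:type ?o. A real TPF client discovers the fragment URL and its parameter names from the server's hypermedia controls rather than hard-coding them.

    #!/usr/bin/env perl
    # Rough sketch of requesting one fragment from a Triple Pattern Fragments
    # server: all triples matching the pattern  ?s rdf:type ?o.
    # The fragment URL is hypothetical.
    use strict;
    use warnings;
    use LWP::UserAgent;
    use URI;

    my $fragment = URI->new('http://example.org/fragments/dataset');   # hypothetical
    $fragment->query_form(
        predicate => 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type',
        # subject and object are left unbound, i.e. they act as ?s and ?o
    );

    my $ua  = LWP::UserAgent->new;
    my $res = $ua->get($fragment, 'Accept' => 'text/turtle');
    die 'Fragment request failed: ', $res->status_line unless $res->is_success;

    # The response holds the matching data triples plus metadata (e.g. an
    # estimated total count) and controls for paging to further fragments.
    print $res->decoded_content;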

Progress:

  • discussions with Ruben Verborgh, Kjetil Kjernsmo, and Patrick Hochstenbach regarding what has already been done, and what needs to be done.
  • Client-side has a reasonably good implementation
  • The server side depends on a pre-existing triplestore; my use case presumes that we start from something other than a triplestore
  • Advised to extend RDF::Trine::Store with a new type (e.g. CSV) and implement the get_statements method to dynamically generate triples.
  • PROBLEM: the "smart" way to do this would be to use a tool like Tarql; however, Tarql currently won't build via Maven :-P
  • hacking a custom "solution" for the moment (a rough sketch of the idea follows below)
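
For reference, the gist of that custom "solution" is sketched below: read the CSV with Text::CSV and turn each row into RDF::Trine statements on the fly, in the shape that get_statements is expected to return. The column layout, namespace, and predicate are made up for illustration.

    # Sketch: generate triples on the fly from a CSV file, without Tarql.
    # The column layout ("id", "name"), base URI, and predicate are illustrative.
    use strict;
    use warnings;
    use Text::CSV;
    use RDF::Trine;
    use RDF::Trine::Iterator::Graph;

    my $base = 'http://example.org/record/';   # hypothetical namespace
    my $name = RDF::Trine::Node::Resource->new('http://xmlns.com/foaf/0.1/name');

    my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 });
    open my $fh, '<:encoding(utf8)', 'records.csv' or die "records.csv: $!";
    $csv->getline($fh);   # discard the header row

    my @statements;
    while (my $row = $csv->getline($fh)) {
        my ($id, $label) = @$row;
        push @statements, RDF::Trine::Statement->new(
            RDF::Trine::Node::Resource->new($base . $id),
            $name,
            RDF::Trine::Node::Literal->new($label),
        );
    }
    close $fh;

    # The same shape of result that RDF::Trine::Store::get_statements returns.
    my $iterator = RDF::Trine::Iterator::Graph->new(\@statements);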

Day 3

Server and Client for Triple Pattern Fragments (Perl) - Mark

  • Solved the problem with Maven --> now have a functional Tarql. Tarql (https://github.com/tarql) is "SPARQL for Tables" - a way to convert tabular data into RDF using SPARQL CONSTRUCT. It works nicely on my CSV files!
    • for CSV entries with spaces, you need to use the SPARQL 'ENCODE_FOR_URI' string function to get the %20 (and the same for any other unusual characters)
  • I want to use Plack::App::RDF::LinkedData to serve my newly transformed CSV data
    • This library requires an RDF::Trine::Store object, but I want to dynamically transform my data, so none of the existing Trine::Stores are useful to me (they all require a pre-existing datastore)
    • I created a new RDF::Trine::Store::CSV object. NOTE: I will probably change the namespace for these modules because they don't implement the full range of RDF::Trine::Store functionality - they are read-only, for example...
      • I originally created this in Moose, but that was a waste of time because (for some odd reason) RDF::Trine::Store child objects are not created by calling ->new on RDF::Trine::Store::CHILD, but rather by calling ->new on RDF::Trine::Store("CHILD"). Therefore my CHILD Moose object's constructor was never called, and all of the lovely Mooseyness was lost. So... now it just re-implements the required subroutines from RDF::Trine::Store.
    • RDF::Trine::Store::CSV implements the two methods that are required by RDF::LinkedData - "get_statements" and "count_statements". Both produce output that is dynamically generated from an IPC call to Tarql (see the sketch after this list)
    • According to the Triple Pattern Fragments spec, the incoming URL has three parameters - subject, predicate, and object (e.g. thing?subject=this;predicate=that)
    • Problem: It appears that RDF::LinkedData (or the Plack App) isn't correctly parsing the incoming URL - everything is passed to my subroutine in a single parameter - $subject
      • I have contacted the authors to ask for their advice.
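
A condensed sketch of how such a read-only store can hang together is shown below. It is not the actual module from the hackathon: the constructor, file names, and the Tarql invocation are placeholders, and a fuller implementation would also treat RDF::Trine::Node::Variable arguments as unbound and avoid re-running Tarql on every call.

    package RDF::Trine::Store::CSV;
    # Sketch of a read-only store that answers triple-pattern queries by shelling
    # out to Tarql and filtering the resulting triples in memory.
    use strict;
    use warnings;
    use RDF::Trine;
    use RDF::Trine::Iterator::Graph;

    sub new {
        my ($class, %args) = @_;
        # $args{csv} and $args{mapping} name the CSV file and the Tarql
        # (SPARQL CONSTRUCT) mapping used to convert it to RDF. As noted above,
        # real RDF::Trine wiring instantiates stores via RDF::Trine::Store instead.
        return bless { %args }, $class;
    }

    # Run Tarql over the CSV and parse its N-Triples output into statements.
    sub _all_statements {
        my $self = shift;
        open my $tarql, '-|', 'tarql', '--ntriples', $self->{mapping}, $self->{csv}
            or die "cannot run tarql: $!";
        my $data = do { local $/; <$tarql> };
        close $tarql;

        my @statements;
        my $parser = RDF::Trine::Parser->new('ntriples');
        $parser->parse('http://example.org/', $data, sub { push @statements, shift });
        return @statements;
    }

    # Return a graph iterator over statements matching the (possibly unbound) pattern.
    sub get_statements {
        my ($self, $s, $p, $o) = @_;
        my @matches = grep {
            (!defined $s || $_->subject->equal($s)) &&
            (!defined $p || $_->predicate->equal($p)) &&
            (!defined $o || $_->object->equal($o))
        } $self->_all_statements;
        return RDF::Trine::Iterator::Graph->new(\@matches);
    }

    # In this naive sketch, the count is just the size of the matching set.
    sub count_statements {
        my ($self, $s, $p, $o) = @_;
        my @matches = $self->get_statements($s, $p, $o)->get_all;
        return scalar @matches;
    }

    1;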

Day 4

Server for Triple Pattern Fragments (Perl) - Mark

  • For the moment I am parsing the URL parameters in my own code, so that I can move forward (see the sketch after this list).
  • the triple-pattern matching is working, and I return an iterator to the RDF::LinkedData code. It seems that the triple counting and the (non-redundant) iteration over the triples are both working properly.
  • moved the Triple Pattern Fragments server from localhost to my public server so that I could test it using Ruben's node.js-based client.
    • My output is rejected by the TPF client as invalid. :-P So... I got a bit more granular, using the s/p/o-patterned URLs to query my server, and compared the output to that from DBpedia.
  • when I compare my output to the output from DBpedia's TPF server, it is very different! :-( Mine lacks the various control elements and metadata elements that I had expected to be added by the Plack TPF server.
  • I have contacted the authors for advice.
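
The interim workaround looks roughly like the sketch below: pull subject, predicate, and object out of the query string with Plack::Request and turn them into RDF::Trine nodes before calling get_statements. The quoting convention for literals and the treatment of ?var placeholders are assumptions, not something the TPF spec mandates.

    # Sketch of the interim workaround: parse ?subject=...&predicate=...&object=...
    # from the PSGI environment myself and build RDF::Trine nodes from the values.
    use strict;
    use warnings;
    use Plack::Request;
    use RDF::Trine;

    sub pattern_from_request {
        my ($env)  = @_;
        my $req    = Plack::Request->new($env);
        my $params = $req->query_parameters;   # Hash::MultiValue of the query string

        my @nodes;
        for my $key (qw(subject predicate object)) {
            my $value = $params->get($key);
            if (!defined $value || $value eq '' || $value =~ /^\?/) {
                push @nodes, undef;                                    # unbound position
            } elsif ($value =~ /^"(.*)"$/) {
                push @nodes, RDF::Trine::Node::Literal->new($1);       # quoted literal
            } else {
                push @nodes, RDF::Trine::Node::Resource->new($value);  # assume a URI
            }
        }
        return @nodes;   # ($subject, $predicate, $object), each a node or undef
    }

The returned triple of nodes can then be handed straight to get_statements / count_statements on the store.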

Revising the OpenLifeData2SADI Services

  • Michel has combined all Bio2RDF data into a single endpoint. To compensate for this, I need to revise all of the (32,000+) OpenLifeData SADI services.
    • I am re-querying the endpoints now - one of the consequences of having all the data in the same endpoint is that it is easier to discover the precise type relations between one dataset and another (they used to be e.g. 'uniprot:Resource', but will now be something more specific like 'uniprot:Gene'); a sketch of this kind of re-indexing query follows below
    • work in-progress...
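
To give a flavour of the re-indexing, a sketch of the kind of query involved is shown below, run through RDF::Query::Client. The endpoint URL is a placeholder, and the actual indexing queries behind the 32,000+ services are considerably more involved.

    #!/usr/bin/env perl
    # Sketch: for each predicate in the combined endpoint, list the specific
    # types of its subjects and objects (e.g. uniprot:Gene rather than
    # uniprot:Resource). The endpoint URL is a placeholder.
    use strict;
    use warnings;
    use RDF::Query::Client;

    my $endpoint = 'http://example.org/sparql';   # placeholder for the combined endpoint

    my $sparql = q{
      SELECT DISTINCT ?predicate ?subjectType ?objectType
      WHERE {
        ?s ?predicate ?o .
        ?s a ?subjectType .
        ?o a ?objectType .
      }
      LIMIT 100
    };

    my $query    = RDF::Query::Client->new($sparql);
    my $iterator = $query->execute($endpoint) or die "SPARQL query failed\n";

    while (my $row = $iterator->next) {
        printf "%s : %s -> %s\n",
            $row->{predicate}->uri_value,
            $row->{subjectType}->uri_value,
            $row->{objectType}->uri_value;
    }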

Making Connections between Ensembl and PubMedCentral

  • Jee-hyub Kim and Kieron Taylor developed queries of the form:

    PREFIX obo: <http://purl.obolibrary.org/obo/>
    ...
    SELECT ?prefix ?exact ?postfix ?section ?source
    WHERE {
      ?annotation oa:hasBody ?xref .
      ?annotation oa:hasTarget ?target .
      ?target oa:hasSource ?source .
      ?target oa:hasSelector ?selector .
      ?target dcterms:isPartOf ?section .
      ?selector oa:prefix ?prefix .
      ?selector oa:exact ?exact .
      ?selector oa:postfix ?postfix .
      VALUES ?xref { <http://purl.uniprot.org/uniparc/UPI0000DA7DCA> ... }
    }

Kieron wishes to establish the extent of crossover between the two resources.

  • Script running all Ensembl IDs against PubMed discovered 115 matches
  • Made a module to transform Ensembl "xrefs" into LOD URIs (see the sketch after this list)
  • Queries still running against PubMed, hopefully more hits to come.
  • Next step: try a federated query with the Ensembl RDF
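
The mapping module boils down to something like the sketch below. The module name, database list, and URI patterns shown are illustrative, not the module's actual lookup table; the UniParc pattern matches the purl.uniprot.org URI used in the query above.

    package Ensembl::XrefToLOD;   # hypothetical module name
    # Sketch of the xref-to-LOD-URI idea: map an Ensembl external reference
    # (source database name plus accession) onto a Linked Data URI.
    use strict;
    use warnings;

    # Illustrative patterns only; the real table covers many more sources.
    my %uri_pattern = (
        'UniProt' => 'http://purl.uniprot.org/uniprot/%s',
        'UniParc' => 'http://purl.uniprot.org/uniparc/%s',
        'ChEMBL'  => 'http://identifiers.org/chembl.compound/%s',
    );

    # Returns a LOD URI for an xref, or undef if the source database is unknown.
    # e.g. xref_to_uri('UniParc', 'UPI0000DA7DCA') gives the UniParc URI seen
    # in the PubMedCentral query above.
    sub xref_to_uri {
        my ($dbname, $accession) = @_;
        my $pattern = $uri_pattern{$dbname} or return;
        return sprintf $pattern, $accession;
    }

    1;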

Ensembl Variation data via SPARQL

Raoul, Jerven, Arto, Kieron, others

  • Brainstormed methods that avoid creating an RDF dump of the entire Ensembl Variation dataset
  • SPARQL over VCF - Functional, but multiple reads of the file per query make it ultra-slow without indexing. VCF does not contain all the information Ensembl provides.
  • SPARQL over SQL - No prototype, SQL schema very complex, needs Ensembl Variation experts.
  • SPARQL over Variation Graph (ga4gh) - No annotation, but this can be fetched from other sources, interesting.
  • Dump a subset of RDF - all rsIDs with simple allele information would be more manageable than the full schema. Still useful in conjunction with the core Ensembl RDF for location awareness.
  • Dump ALL RDF - Possible, but very very big.

Conclusion? Not sure. Demand high, difficulty also high.

wwPDB/RDF SPARQL examples (AR Kinjo)

Making example queries for the NBDC endpoint.

Day 5

Server for Triple Pattern Fragments (Perl) - Mark

  • no significant progress from yesterday. Discussions with the authors of the Triple Pattern Fragments / Plack LDF server code are ongoing.

OpenLifeData2SADI (Mark & Michel)

  • new OpenLifeData (OLD) endpoint indexed and a config file for SADI written.
    • just waiting for the SPARQL endpoint to come up, and then I will re-register these SADI services.