TechnionTDK/jbs-project

The Jewish Bookshelf Project (JBS)

In the JBS project we are building a Linked Dataset for the Jewish Bookshelf (JBS-LD). The project consists of the following major efforts:

  • Defining a JBS ontology (classes and properties).
  • Representing the structure of various Jewish texts in RDF format, based on the defined ontology.
  • Conducting text analysis tasks and representing the results in RDF. The current main task is detecting quotations of Bible verses (psukim) in a given text.

Your comments and suggestions are welcome and should be directed to Oren Mishali (omishali at cs.technion.ac.il).

The JBS ontology

The current (and evolving) version of the ontology is available in ttl format. You may also be interested in exploring the JBS ontology (and the whole dataset) using eLinda, a visualization tool that we have developed.

About the ontology

PREFIX jbo: <http://jbs.technion.ac.il/ontology/>
PREFIX jbr: <http://jbs.technion.ac.il/resource/>
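The two PREFIX declarations let prefixed names stand in for full URIs, so jbr:book-tanach is shorthand for http://jbs.technion.ac.il/resource/book-tanach. A minimal Python sketch of that expansion, and of the reverse shortening, could look like this (the helper names expand and compact are illustrative and not part of any JBS tool):

```python
# Prefix table copied from the two PREFIX declarations above.
PREFIXES = {
    "jbo": "http://jbs.technion.ac.il/ontology/",
    "jbr": "http://jbs.technion.ac.il/resource/",
}


def expand(curie: str) -> str:
    """Expand a prefixed name such as 'jbr:book-tanach' into a full URI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown prefix in {curie!r}")
    return PREFIXES[prefix] + local


def compact(uri: str) -> str:
    """Inverse of expand(): shorten a full URI using the known prefixes."""
    for prefix, namespace in PREFIXES.items():
        if uri.startswith(namespace):
            return f"{prefix}:{uri[len(namespace):]}"
    return f"<{uri}>"  # no matching prefix: keep the full URI in angle brackets


print(expand("jbr:book-tanach"))
# http://jbs.technion.ac.il/resource/book-tanach
print(compact("http://jbs.technion.ac.il/ontology/Book"))
# jbo:Book
```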

We view the Jewish Bookshelf as a collection of books. Each book inserted into the JBS-LD is given a unique URI (e.g., jbr:book-tanach) and the class jbo:Book (via the rdf:type property). We view each book as a tree hierarchy whose "leaves" (the lowest level) are its text elements, i.e., the text found within a chapter or a section. For example, in the Tanach (Bible) each text element is a single verse (pasuk). Each text element is given a URI and the class jbo:Text. Elements of class jbo:Section represent the chapters or sections that contain the text elements. Note that text elements are the only elements that point to the actual text content (via the jbo:text property). Other useful properties of text elements are jbo:position (numerical ordering within the book), jbo:book (a pointer to the containing book), and rdfs:label (a short human-readable description).
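To make the book-as-tree view concrete, here is a minimal Python sketch built on a hypothetical in-memory Node type: inner nodes play the role of jbo:Section elements, leaves play the role of jbo:Text elements, and a depth-first walk assigns each leaf a book-wide position in reading order. The URI scheme jbr:text-<book>-<path>, the label format, and numbering from 1 are assumptions for illustration; the actual JBS-LD URIs and position values may differ.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of a book's tree: leaves carry text, inner nodes are sections."""
    name: str                    # e.g. "Genesis" or "1" (illustrative)
    text: str | None = None      # set only on leaves (jbo:Text elements)
    children: list[Node] = field(default_factory=list)


def text_elements(book_id: str, root: Node) -> list[tuple[str, int, str, str]]:
    """Walk the tree depth-first and number the leaves in reading order.

    Returns (uri, position, label, text) tuples. The URI scheme
    "jbr:text-<book>-<path>" is a hypothetical choice for illustration.
    """
    results: list[tuple[str, int, str, str]] = []

    def walk(node: Node, path: list[str]) -> None:
        if node.text is not None:          # a leaf: one jbo:Text element
            position = len(results) + 1    # book-wide ordering, starting at 1
            uri = f"jbr:text-{book_id}-" + "-".join(path)
            label = " ".join(path)
            results.append((uri, position, label, node.text))
            return
        for child in node.children:        # an inner node: a jbo:Section
            walk(child, path + [child.name])

    walk(root, [])
    return results


# Usage: a tiny two-chapter book with placeholder text.
book = Node("tanach", children=[
    Node("Genesis", children=[
        Node("1", children=[Node("1", text="..."), Node("2", text="...")]),
        Node("2", children=[Node("1", text="...")]),
    ]),
])
for uri, position, label, _ in text_elements("tanach", book):
    print(position, uri, label)
```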

The basic workflow

The JBS workflow begins with raw Jewish texts and ends with RDF triples ready for consumption. The key workflow components are:

  • jbs-raw is the repository where the raw texts reside. The raw texts are extracted from the following open-licensed web sources:
  • The raw texts are then analyzed and turned into structured JSON files. The analysis is performed using a library we have developed called text2json, and the output JSON files reside in the jbs-text repository.
  • The JSON files in jbs-text are converted to RDF (ttl) format using a tool we have developed called json23plet.
  • The ttl files are loaded into a Virtuoso server. The Virtuoso endpoint is available at http://tdk3.csf.technion.ac.il:8890/sparql.
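The JSON-to-RDF step can be illustrated with a small converter. This is not json23plet: the input shape used here (a book id plus a list of text entries with id, position, label and text) is a hypothetical stand-in for the jbs-text JSON files, whose actual schema is not described in this document. The sketch emits Turtle that uses only the classes and properties named in the ontology section.

```python
import json

HEADER = (
    "@prefix jbo: <http://jbs.technion.ac.il/ontology/> .\n"
    "@prefix jbr: <http://jbs.technion.ac.il/resource/> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
)


def turtle_string(value: str) -> str:
    """Quote a Python string as a Turtle string literal."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
                    .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def json_to_turtle(doc: str) -> str:
    """Convert one (hypothetical) structured-text JSON document into Turtle."""
    data = json.loads(doc)
    book = f"jbr:book-{data['book']}"
    lines = [HEADER, f"{book} a jbo:Book .", ""]
    for t in data["texts"]:
        # One jbo:Text element per entry, linked back to its book.
        lines.append(f"jbr:{t['id']} a jbo:Text ;")
        lines.append(f"    jbo:book {book} ;")
        lines.append(f"    jbo:position {int(t['position'])} ;")
        lines.append(f"    rdfs:label {turtle_string(t['label'])} ;")
        lines.append(f"    jbo:text {turtle_string(t['text'])} .")
        lines.append("")
    return "\n".join(lines)
```

The resulting .ttl text is the kind of file that would then be bulk-loaded into the Virtuoso server.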

When querying the endpoint via SPARQL, remember to set the graph name to http://jbs.technion.ac.il. The following example query returns every triple whose subject is a book:

PREFIX jbo: <http://jbs.technion.ac.il/ontology/>
PREFIX jbr: <http://jbs.technion.ac.il/resource/>
# FROM names the JBS graph directly in the query
SELECT *
FROM <http://jbs.technion.ac.il>
WHERE {
  ?s ?p ?o.
  ?s a jbo:Book.
}
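The same query can also be sent programmatically. The sketch below uses only the Python standard library and the standard SPARQL 1.1 Protocol: the query travels as a GET parameter, the default-graph-uri parameter plays the role of setting the graph name, and the Accept header asks for SPARQL JSON results. The function names are illustrative, and run_query needs network access to the endpoint.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "http://tdk3.csf.technion.ac.il:8890/sparql"
GRAPH = "http://jbs.technion.ac.il"

QUERY = """PREFIX jbo: <http://jbs.technion.ac.il/ontology/>
SELECT * WHERE {
  ?s ?p ?o.
  ?s a jbo:Book.
}"""


def build_request(query: str) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol GET request that targets the JBS graph."""
    params = urllib.parse.urlencode({"query": query, "default-graph-uri": GRAPH})
    return urllib.request.Request(
        f"{ENDPOINT}?{params}",
        headers={"Accept": "application/sparql-results+json"},
    )


def parse_bindings(body: str) -> list:
    """Flatten SPARQL JSON results into one {variable: value} dict per row."""
    results = json.loads(body)
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in results["results"]["bindings"]
    ]


def run_query(query: str, timeout: float = 30.0) -> list:
    """Send the query and return the result rows (requires network access)."""
    with urllib.request.urlopen(build_request(query), timeout=timeout) as resp:
        return parse_bindings(resp.read().decode("utf-8"))


# Usage (needs network access):
#   for row in run_query(QUERY)[:10]:
#       print(row["s"], row["p"], row["o"])
```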

Text analysis

We are developing an algorithm for detecting quotations of Bible verses within a given text. The algorithm, which is written in Java, is executed against our entire text corpus, and the results are represented in RDF and integrated into JBS-LD. The results can be accessed via SPARQL queries and through Sulamot, a web frontend that we are developing.
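The project's detection algorithm is written in Java and is not described in this document. Purely to illustrate the shape of the problem, the Python sketch below uses a different, deliberately simple technique: it strips vowel points and cantillation marks, indexes every 3-word window of each verse, and counts how many windows of the input text match each verse. The window size, the verse ids (such as gen-1-1) and the scoring are assumptions; a real detector also has to cope with paraphrase, spelling variants and partial quotations.

```python
import unicodedata
from collections import Counter, defaultdict


def tokenize(text: str) -> list:
    """Normalize Hebrew text and split it into words.

    Vowel points and cantillation marks (Unicode category Mn) are dropped,
    and any non-letter character (spaces, maqaf, punctuation) separates words.
    """
    decomposed = unicodedata.normalize("NFD", text)
    letters = "".join(
        ch if ch.isalpha() else " "
        for ch in decomposed
        if unicodedata.category(ch) != "Mn"
    )
    return letters.split()


def build_index(verses: dict, n: int = 3) -> dict:
    """Map every n-word window of every verse to the ids of verses containing it."""
    index = defaultdict(set)
    for verse_id, verse_text in verses.items():
        words = tokenize(verse_text)
        for i in range(len(words) - n + 1):
            index[tuple(words[i:i + n])].add(verse_id)
    return index


def find_quotations(text: str, index: dict, n: int = 3) -> Counter:
    """Count, per verse id, how many n-word windows of `text` it shares."""
    words = tokenize(text)
    hits = Counter()
    for i in range(len(words) - n + 1):
        for verse_id in index.get(tuple(words[i:i + n]), ()):
            hits[verse_id] += 1
    return hits
```

Verses with a high count are quotation candidates; a threshold on the count, or on the fraction of the verse covered, would be one way to decide which ones to report.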
