lambdamusic/openalex-hacks

OpenAlex hacks

Notebooks and code snippets for learning about and exploring the OpenAlex datasets.

The first exploration focuses on Topics and Keywords.

Topics and Keywords

Background

The topics classification in OpenAlex consists of several thousand categories organised into a 4-level hierarchy.

The gist of it is:

Topics

Works in OpenAlex are tagged with Topics using an automated system that takes into account the available information about the work, including title, abstract, source (journal) name, and citations. There are around 4,500 Topics. Topics are grouped into subfields, which are grouped into fields, which are grouped into top-level domains. This is shown in the diagram below, along with the counts for each.
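A minimal sketch of the four-level hierarchy described above, assuming each topic record carries its subfield, field and domain as nested objects with a display_name. The sample records and the helper names (build_hierarchy, count_levels) are hypothetical and only illustrate the shape of the data; they are not taken from the notebooks.

```python
from collections import defaultdict

# Hypothetical sample records (illustrative values, not real OpenAlex data).
# Assumption: each topic names its parent subfield, field and domain.
SAMPLE_TOPICS = [
    {"display_name": "Topic A", "subfield": {"display_name": "Subfield 1"},
     "field": {"display_name": "Field X"}, "domain": {"display_name": "Domain I"}},
    {"display_name": "Topic B", "subfield": {"display_name": "Subfield 1"},
     "field": {"display_name": "Field X"}, "domain": {"display_name": "Domain I"}},
    {"display_name": "Topic C", "subfield": {"display_name": "Subfield 2"},
     "field": {"display_name": "Field X"}, "domain": {"display_name": "Domain I"}},
    {"display_name": "Topic D", "subfield": {"display_name": "Subfield 3"},
     "field": {"display_name": "Field Y"}, "domain": {"display_name": "Domain II"}},
]


def build_hierarchy(topics):
    """Nest topics as domain -> field -> subfield -> [topic names]."""
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for t in topics:
        domain = t["domain"]["display_name"]
        field = t["field"]["display_name"]
        subfield = t["subfield"]["display_name"]
        tree[domain][field][subfield].append(t["display_name"])
    return tree


def count_levels(tree):
    """Count the number of entries at each of the four levels."""
    fields = [f for d in tree.values() for f in d.values()]
    subfields = [s for f in fields for s in f.values()]
    return {
        "domains": len(tree),
        "fields": len(fields),
        "subfields": len(subfields),
        "topics": sum(len(names) for names in subfields),
    }


hierarchy = build_hierarchy(SAMPLE_TOPICS)
counts = count_levels(hierarchy)
```

Run against the full topics dataset, the same counting step should reproduce the per-level counts the quoted passage refers to.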

Keywords

Our team put together a new implementation of keywords based on our Topics. There are currently over 26,000 keywords and we expect to add more as time goes on. [...] With our new topics system that was developed in coordination with CWTS, we came out with a list of 10 keywords for each topic. In order to assign keywords to works, we took the topics assigned to that work (at most 3 topics), pulled the keywords associated with those topics (at most 30 keywords, for now) and then determined the similarity of the keyword to the title/abstract using embeddings (and the BGE M3-Embedding model).
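The keyword assignment described above (collect the keywords of at most 3 topics, then rank them by similarity to the title/abstract) can be sketched as follows. This is not the OpenAlex implementation: the embed function here is a toy bag-of-words stand-in for the BGE M3-Embedding model, and the function names and sample data are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector.
    Assumption: the real system uses dense BGE M3 embeddings instead."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_keywords(title_abstract, topic_keywords, max_topics=3):
    """Rank candidate keywords by similarity to the work's text.

    topic_keywords holds one keyword list per topic assigned to the work;
    only the first max_topics topics contribute candidates.
    """
    candidates = []
    for keywords in topic_keywords[:max_topics]:
        for kw in keywords:
            if kw not in candidates:  # drop duplicates shared across topics
                candidates.append(kw)
    doc_vec = embed(title_abstract)
    scored = [(kw, cosine(doc_vec, embed(kw))) for kw in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranked = rank_keywords(
    "Deep learning for protein structure prediction",
    [["protein structure", "gene expression"], ["deep learning", "graph theory"]],
)
```

With 10 keywords per topic and at most 3 topics, the candidate list never exceeds 30 entries, which matches the upper bound stated in the quote.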

For more details, see

FoamTree visualization

The notebook 2024-09-topics-explore.ipynb pulls the topics dataset and turns it into a FoamTree visualization.

Figure: sample FoamTree visualization (foam-tree-sample.jpg)
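As a rough illustration of the conversion step, the sketch below turns a nested topics hierarchy into the nested label/weight/groups structure that FoamTree accepts as its data object. The sample hierarchy, the to_groups helper and the choice of weight (number of topics under each node) are assumptions made for this example; the notebook may weight groups differently, for instance by works count.

```python
import json

# Hypothetical hierarchy: domain -> field -> subfield -> [topic names].
HIERARCHY = {
    "Domain I": {
        "Field X": {
            "Subfield 1": ["Topic A", "Topic B"],
            "Subfield 2": ["Topic C"],
        }
    },
    "Domain II": {"Field Y": {"Subfield 3": ["Topic D"]}},
}


def to_groups(tree):
    """Recursively convert the nested dict into FoamTree-style groups.

    Leaves (lists of topic names) become groups of weight 1; every parent
    gets the sum of its children's weights.
    """
    if isinstance(tree, list):
        return [{"label": name, "weight": 1} for name in tree]
    groups = []
    for label, subtree in tree.items():
        children = to_groups(subtree)
        groups.append({
            "label": label,
            "weight": sum(child["weight"] for child in children),
            "groups": children,
        })
    return groups


data_object = {"groups": to_groups(HIERARCHY)}
foamtree_json = json.dumps(data_object, indent=2)
```

The resulting JSON can then be embedded in an HTML page and passed to the FoamTree widget as its data object.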

SKOS data model

SKOS provides a standard way to represent knowledge organization systems using the Resource Description Framework (RDF). Encoding this information in RDF allows it to be processed with the various tools developed for Knowledge Graph applications.

The notebook 2024-09-skos.ipynb loads the topics dataset and generates a SKOS ontology: openalex-topics-rdf.ttl.
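To give an idea of what such an encoding can look like, here is a dependency-free sketch that writes a few hierarchy nodes as SKOS concepts in Turtle. It is not the notebook's code: the example.org namespace, the slug-based identifiers, and the choice of skos:prefLabel and skos:broader are assumptions for illustration, and a library such as rdflib would normally be used rather than string formatting.

```python
import re

SKOS_NS = "http://www.w3.org/2004/02/skos/core#"
BASE_NS = "https://example.org/openalex-topics/"  # hypothetical namespace


def slug(label):
    """Turn a label into a simple identifier (assumed naming scheme)."""
    return re.sub(r"[^a-z0-9]+", "-", label.lower()).strip("-")


def escape(text):
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def concept(label, broader=None):
    """Serialize one node as a skos:Concept, optionally linked to its parent."""
    lines = [
        f"ex:{slug(label)} a skos:Concept ;",
        f'    skos:prefLabel "{escape(label)}"@en',
    ]
    if broader is not None:
        lines[-1] += " ;"
        lines.append(f"    skos:broader ex:{slug(broader)}")
    lines[-1] += " ."
    return "\n".join(lines)


def to_turtle(rows):
    """rows: iterable of (label, parent_label_or_None) pairs."""
    header = f"@prefix skos: <{SKOS_NS}> .\n@prefix ex: <{BASE_NS}> .\n"
    return header + "\n" + "\n\n".join(concept(l, b) for l, b in rows) + "\n"


ttl = to_turtle([
    ("Domain I", None),
    ("Field X", "Domain I"),
    ("Subfield 1", "Field X"),
    ("Topic A", "Subfield 1"),
])
```

Each level of the hierarchy points to its parent through skos:broader, so the four levels form a single concept tree.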

Three sample visualizations of the ontology have been generated (using Ontospy):

  • Single page HTML documentation - link
  • Multi page HTML documentation - link
  • D3 bubble chart - link

The command used is: ontospy gendocs src/data/openalex-topics-rdf.ttl --preflabel label --theme united

Credits

See also

Development

Disclaimer

This project is mainly a hack and can contain errors.

I am not affiliated with the OpenAlex project.