cord19q: COVID-19 Open Research Dataset (CORD-19) Analysis

This repository is an archive of work done for the CORD-19 challenge in 2020. If you'd like to programmatically process medical literature, see paperai.

COVID-19 Open Research Dataset (CORD-19) is a free resource of scholarly articles, aggregated by a coalition of leading research groups, covering COVID-19 and the coronavirus family of viruses. The dataset can be found on Semantic Scholar and Kaggle.

The cord19q project builds an index over the CORD-19 dataset to assist with analysis and data discovery. A series of COVID-19 related research topics were explored to identify relevant articles and help find answers to key scientific questions.

Tasks

A full list of Kaggle CORD-19 Challenge tasks can be found in this notebook. This notebook and corresponding report notebooks won 🏆 7 awards 🏆 in the Kaggle CORD-19 Challenge.

The latest tasks are also stored in the cord19q repository.

Installation

cord19q can be installed directly from GitHub using pip. Using a Python Virtual Environment is recommended.

pip install git+https://github.com/neuml/cord19q

Python 3.6+ is supported.
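
Before installing, it can help to confirm that the interpreter meets the 3.6+ requirement and that a virtual environment is active. The following is a minimal sketch of such a check; the helper names is_supported and in_virtualenv are illustrative and are not part of cord19q.

```python
import sys

# Minimum version taken from the requirement stated above.
MIN_VERSION = (3, 6)


def is_supported(version=None):
    """Return True if the given (major, minor, ...) tuple is 3.6 or newer."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= MIN_VERSION


def in_virtualenv():
    """Return True when running inside an environment created by venv."""
    # venv points sys.prefix at the environment while sys.base_prefix keeps
    # the original installation; the two are equal outside an environment.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("Python version supported:", is_supported())
    print("Virtual environment active:", in_virtualenv())
```

If the second line reports False, creating and activating an environment first keeps the installation isolated from system packages.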

Building a model

cord19q relies on paperetl to parse and load the CORD-19 dataset into a SQLite database. paperai is then used to run an AI-Powered Literature Review over the CORD-19 dataset for a list of query tasks.

The following links show how to parse, load and index CORD-19.

The model will be stored in the ~/.cord19 directory.
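
To confirm that a build produced output, the model directory can be inspected after indexing. This is a minimal sketch that assumes only the ~/.cord19 location stated above; the helper name list_model_files is hypothetical, and no particular file names inside the directory are assumed.

```python
from pathlib import Path


def list_model_files(model_dir=None):
    """Return sorted relative paths of all files under the model directory.

    Defaults to ~/.cord19. Returns an empty list when the directory does
    not exist, for example before a model has been built.
    """
    root = Path(model_dir) if model_dir is not None else Path.home() / ".cord19"
    if not root.is_dir():
        return []
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*")
        if path.is_file()
    )


if __name__ == "__main__":
    files = list_model_files()
    if files:
        print("\n".join(files))
    else:
        print("No model found in ~/.cord19")
```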

Building a report file

A report file is a Markdown file created from a list of queries. For example:

python -m paperai.report tasks/risk-factors.yml

Once complete, a file named tasks/risk-factors.md will be created.
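
When several task files need reports, the same command can be run in a loop. The sketch below assumes that task files live in the tasks directory with a .yml extension and that each report is written next to its task file with an .md extension, as in the example above. The function names are illustrative and are not part of paperai.

```python
import subprocess
import sys
from pathlib import Path


def report_command(task_file):
    """Build the command shown above for a single task file."""
    return [sys.executable, "-m", "paperai.report", str(task_file)]


def expected_output(task_file):
    """Return the report path: same location and name, .md extension."""
    return Path(task_file).with_suffix(".md")


def build_reports(tasks_dir="tasks"):
    """Run the report command for every .yml file found in tasks_dir."""
    outputs = []
    for task_file in sorted(Path(tasks_dir).glob("*.yml")):
        # check=True stops the loop if a report fails to build.
        subprocess.run(report_command(task_file), check=True)
        outputs.append(expected_output(task_file))
    return outputs


if __name__ == "__main__":
    for path in build_reports():
        print("created" if path.exists() else "missing", path)
```

Using sys.executable keeps the reports running under the same interpreter, and therefore the same virtual environment, as the script itself.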

Running queries

The fastest way to run queries is to start a paperai shell:

paperai

A prompt will come up. Queries can be typed directly into the console.
