
A dbt project that transforms messy public provider datasets into usable data for the Tuva Project.


tuva-health/provider


Apache License dbt logo and version

Provider

🧰 What does this project do?

The Tuva Provider project combines and transforms messy public provider datasets into usable data. This project contains the transformations we use to create the clean datasets for users of the Tuva Project. We have made this project public to share our methodology and code.

You can easily load the cleaned provider data into your data warehouse by using the terminology seeds from The Tuva Project package.

🔌 Database Support

  • Snowflake

✅ How to get started

Pre-requisites

  1. You have dbt installed and configured (i.e., connected to your data warehouse). If you have not installed dbt, follow the installation instructions in the dbt documentation.
  2. You have created a database in your data warehouse to hold the output of this project.
  3. You have downloaded the following source data and loaded it into your data warehouse:
    • NPI Data from NPPES
    • Provider Taxonomy from NUCC
    • Medicare Specialty Crosswalk from CMS

Getting Started

Complete the following steps to configure the project to run in your environment.

  1. Clone this repo to your local machine or environment.
  2. Update the dbt_project.yml file:
    1. Add the dbt profile connected to your data warehouse.
    2. Set the variable provider_database to the database you created for this project (default: "nppes").
  3. Update the models/_sources.yml file:
    1. Set the database where your source data has been loaded (default: "nppes").
    2. Set the schema where your source data has been loaded (default: "raw_data").
    3. If your source tables are named differently, add the identifier property to each affected table.
  4. Run dbt build.
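
The configuration steps above can be sketched roughly as follows. This is an illustrative excerpt, not a copy of the files in this repo: the profile name, source name, table name, and identifier value are hypothetical placeholders, and any other keys already present in your copies of the files should be left unchanged. Only profile, vars, database, schema, and identifier are the settings the steps refer to.

```yaml
# dbt_project.yml (excerpt, step 2)
# "my_snowflake_profile" is a hypothetical profile name; use the profile
# from your own profiles.yml that connects to your data warehouse.
profile: my_snowflake_profile

vars:
  # Database that receives this project's output (documented default: nppes).
  provider_database: nppes
```

A similar sketch for models/_sources.yml, again with assumed names for the source and table entries:

```yaml
# models/_sources.yml (excerpt, step 3)
version: 2

sources:
  - name: nppes                 # hypothetical source name; keep the one in the repo
    database: nppes             # database holding the raw source data (documented default)
    schema: raw_data            # schema holding the raw source data (documented default)
    tables:
      - name: npi               # hypothetical table name as referenced by the models
        identifier: my_npi_raw  # hypothetical: the actual table name in your warehouse
```

The identifier property is only needed when the table name in your warehouse differs from the name the models reference. Once both files are updated, running dbt build from the project root builds the models against these settings.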

🙋🏻‍♀️ How is this project maintained and can I contribute?

Project Maintenance

The Tuva Project team maintains only the latest version of this project. We strongly recommend staying on the latest version.

Contributions

Have an opinion on the mappings? Noticed a bug when installing or running the project? We welcome your feedback! While we work on a formal process on GitHub, you can reach us in our Slack community.

🤝 Community

Join our growing community of healthcare data practitioners on Slack!