Note: Follow the instructions step by step to extract the data from the sources. If you just want to try the notebooks, go straight there (you'll still need to load data from somewhere, though).
The state of California provides an enormous database containing years of traffic sensor data. This repo contains code to:
- Extract weather data from Dark Sky
- Extract traffic data from PeMS
- Extract weather data from NOAA
- Transform the data and load it into OmniSci
But the best way to start is to go through the Jupyter notebooks!
- Preferably, use Python 3.6
- Install the requirements in requirements.txt:
pip install -r requirements.txt
- Create accounts at the appropriate places to be able to download the data.
- Fill in the fields in config.ini. The code reads critical information, such as your Caltrans login, from this file. You will not be able to extract data without creating a free account.
- Download the correct HTML files with the appropriate links for data extraction (see [Extracting traffic data from Caltrans](#extracting-traffic-data-from-caltrans) below)
Once everything is ready, you only need to run the files in bin/ to extract the data and load it into OmniSci. Run them in this order:
python bin/extract.py
python bin/extract_darksky_weather.py
python bin/transform_traffic_data_load_omnisci.py
The data is provided by the California Department of Transportation (Caltrans) and found in their Performance Measurement System (PeMS) database.
Caltrans collects data in real time from around 40,000 sensors!
To extract Caltrans traffic data, follow these steps:
- Follow the setup steps
- Set up the login info, paths, etc. in config.ini
- Go to the Caltrans PeMS website (http://pems.dot.ca.gov/) and log in.
- Once on the website, navigate to the Data Clearinghouse (http://pems.dot.ca.gov/?dnode=Clearinghouse)
- The Data Clearinghouse has the data you need. Unfortunately, Scrapy hasn't been implemented yet for this project, so you'll need to download the HTML page for the desired traffic data type and district from the website and place it in ./html_files/. Some sample files are already in there.
- Also important: make sure to download the meta files for your district. These are necessary because they contain metadata about the stations. When transforming/loading to OmniSci, the code reads all meta files in the folder and joins them together. All meta files for district 04 from 2015 to 2019 can be found in data/meta/.
- You're ready to run:
python bin/extract.py
- Follow the setup steps
- Set up the login info, paths, etc. in config.ini
- Create an API key at Dark Sky and add it to config.ini.
- Open bin/extract_darksky_weather.py and configure the location, dates, etc.
- You're ready to run:
python bin/extract_darksky_weather.py
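A historical Dark Sky request is a "Time Machine" call of the form https://api.darksky.net/forecast/KEY/LAT,LON,TIME. A minimal sketch of building such a URL (the function name is hypothetical, and the key would come from config.ini):

```python
# Sketch: build a Dark Sky "Time Machine" URL for one historical day.
# darksky_url is a hypothetical helper, not a function from this repo;
# the API key would come from config.ini.
from datetime import datetime, timezone

def darksky_url(api_key, lat, lon, day):
    # Dark Sky expects a UNIX timestamp for historical requests.
    ts = int(day.replace(tzinfo=timezone.utc).timestamp())
    return f"https://api.darksky.net/forecast/{api_key}/{lat},{lon},{ts}"

# e.g. darksky_url(key, 37.77, -122.42, datetime(2019, 1, 1))
```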
Note: Data from NOAA is already included in data/weather_noaa. The script to download this data is also included, but it still has some bugs.
To load the data, make sure OmniSci is running and that you have entered your OmniSci credentials in config.ini.
- Make sure you have all the data correctly downloaded and ready.
- Open transform_traffic_data_load_omnisci.py and set the table name and other input parameters.
- Run:
python bin/transform_traffic_data_load_omnisci.py
The data should now be in OmniSci and ready to visualize!
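The load step can be sketched with pymapd. This is a minimal sketch, not the repo's actual implementation: the table name, column names, and connection parameters (OmniSci's default local credentials) are assumptions, and the real values should come from config.ini.

```python
# Sketch of the OmniSci load step with pymapd. Table/column names and the
# connection parameters (OmniSci's local defaults) are assumptions; read the
# real values from config.ini.

def create_table_sql(table, columns):
    """Build a CREATE TABLE statement from (name, omnisci_type) pairs."""
    cols = ", ".join(f"{name} {ctype}" for name, ctype in columns)
    return f"CREATE TABLE IF NOT EXISTS {table} ({cols})"

def load_dataframe(df, table="traffic_demo"):
    import pymapd  # pip install pymapd
    con = pymapd.connect(user="admin", password="HyperInteractive",
                         host="localhost", dbname="omnisci")
    con.execute(create_table_sql(table, [("ts", "TIMESTAMP"),
                                         ("station", "INTEGER"),
                                         ("flow", "INTEGER")]))
    con.load_table(table, df)  # df: a pandas DataFrame with matching columns
```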
The notebooks all require reading from OmniSci. Check them out to see how we created models to:
- Forecast traffic: notebooks/Train_Models.ipynb and notebooks/Prediction.ipynb
- Identify the severity of an accident:
notebooks/IncidentClassification.ipynb
Try them out and also try some new ideas with the data!
If you want to check out some of the insights we've found from the traffic data, you can read the blog posts here:
Feel free to contact me for any questions or to get in touch with OmniSci.