These scripts began as a way to load voter data into a PANDA instance via its API.
Along the way, the end-of-life of Python 2 appeared on the horizon, and the wonderful PANDA project has fallen a bit out of date.
We're not giving up on PANDA, but for now, the voter script has been updated to run in Python 3.6+ and to be able to create a Postgres database of voter data as an alternative to feeding the data to a PANDA instance.
If you use the Postgres option, you'll get an indexed database ready to be plugged into a Django project, if desired. This repo is not a working Django project, but a Django model that can be used to hook up the Postgres voter db is included in /voters/models.py
.
First, install the Python requirements:
pip install -r requirements/base.txt
The voter script is tailored to Florida but could be adapted. It requires some source data and a date value:
- County voter registration files, which in Florida are available from the state's Division of Elections.
- A VOTER_DATA_DATE value in YYYY-MM-DD form, provided by an environment variable of that name or by manually editing the script's global variable. This is used to name the database, to provide a default source_date in the
voters_voter
table and to provide a YEAR value for accessing the right data directory.
The load_county_voters
script uses some local directories as it transforms the raw data.
The assumed directory structure, all expected to live under a directory named for the data year (such as 2019/
), includes these folders:
Folder | Use |
---|---|
VoterDetail | Put raw county voter files here |
load | Prepped voter files |
loaded | Voter files that were sent to PANDA |
prep | Processing folder |
temp | Processing folder |
The load_county_voters
script will look for a local environment variable, PANDA_LOADERS_BASE_DIR
, to use as the base directory for the above folders, and will default to /tmp
as the base directory if no environment var is found.
With raw voter files in place, you can run the voter script with python load_county_voters.py
plus one of four arguments:
[RAW FILE]
: You can pass in a file name for a raw county voter file, such asBAY_20190312.txt
. This will prep a single raw file for loading to a database or to PANDA, if the raw file is in /VoterDetail/.prep_files
: This preps all raw county files found in /VoterDetail/ and makes them ready for export to a database or to PANDA.load_to_postgres
: After files are prepped, this will create, load and index a Postgres voter database. If you put all 67 Florida county files in the /VoterDetail/ directory and prep them, this will create a statewide database of Florida voters.export_to_panda
: To export all prepped county files to a PANDA instance, creating one dataset per county.
If you use the script to load data into a PANDA instance, the process uses PANDA's API, which is slow. But the API method has some advantages over PANDA's manual data upload process.
- It sidesteps memory issues that you can encounter in PANDA's loading GUI.
- It only uses PANDA index space, rather than index space + file storage space.
- It results in a dataset with external_id values, which makes rows editable via the API.