Tools for fetching shapefiles from the Census FTP site, then extracting data from them.
- Clone this repository:
git@github.com:censusreporter/census-shapefile-utils.git
- Enter the
census-shapefile-utils
directory. - If using Python 3 or
parse_shapefiles.py
, then install dependencies:pip install -r requirements.txt
- Note: package
gdal
requires non-Python library,libgdal
. Follow OS-specific installation to obtain this library.
- Note: package
This script will download TIGER data shapefiles from the Census FTP site.
It can be used to download a set of geographies defined in GEO_TYPES_LIST
,
or can be used to fetch files for a single state and/or single geography type.
Pass an -s argument to limit by state, pass a -g argument to limit
to a single geography type, and/or pass a -y argument to change the year
from 2012 to something else (e.g. 2015).
>> python fetch_shapefiles.py
>> python fetch_shapefiles.py -s WA
>> python fetch_shapefiles.py -g place
>> python fetch_shapefiles.py -y 2015
>> python fetch_shapefiles.py -s WA -g place -y 2015
If you use the -s argument to fetch files for a single state, the script will also download the national county, state and congressional district files that include data for your chosen state.
The script will create DOWNLOAD_DIR
and EXTRACT_DIR
directories
if necessary, fetch a zipfile or set of zipfiles from the Census website,
then extract the shapefiles from each zipfile retrieved.
DISABLE_AUTO_DOWNLOADS
will prevent certain geography types from being
automatically downloaded if no -g argument is passed to fetch_shapefiles.py
.
This may be useful because certain files, such as those for Zip Code
Tabulation Areas, are extremely large. You can still target any geography
in GEO_TYPES_LIST
specifically, however. So to fetch the ZCTA data:
>> python fetch_shapefiles.py -g zcta5
After you run fetch_shapefiles.py
, this script will generate a csv file
from the extracted data. These files will have a normalized set of headers,
so varying geography types can be included in the same csv. Each row also gets
some useful additional fields not directly found in the shapefiles, Including:
FULL_GEOID
: Concatenated Census summary level code and Census GEOIDFULL_NAME
: Human-friendly name for the geography. So city names, for instance, also include the state name, e.g. "Spokane, Washington"SUMLEV
: Census summary level codeGEO_TYPE
: Name of the geography type, e.g. "state"REGION
: Where applicable, a Census Region code. Shapefiles for states include this code; this script infers the value based on state for other geography types.REGION_NAME
: Name of the Census Region, e.g. "West"DIVISION
: Where applicable, a Census Division code. Shapefiles for states include this code; this script infers the value based on state for other geography types.DIVISION_NAME
: Name of the Census Division, e.g. "Pacific"
This script will search all directories inside EXTRACT_DIR
for shapefiles.
Pass an -s argument to limit by state, and/or pass a -g argument to limit
to a single geography type.
>> python parse_shapefiles.py
>> python parse_shapefiles.py -s WA
>> python parse_shapefiles.py -g place
>> python parse_shapefiles.py -s WA -g place
This script will generate a single csv file with your chosen data, and write
it to CSV_DIR
. Headers are pulled from helpers/csv_helpers.py
. The methods
for building rows specific to each geography type are also in csv_helpers
.
You can choose whether the generated csv should include polygon geometries, which can significantly increase the size of the output file. Include geometries by passing a -p flag.
>> python parse_shapefiles.py -s WA -p
Geometry data for certain geography types can be very large. The zcta5
geometries, for instance, will add about 1.1 Gb of data to your csv.
These assume you have already used fetch_shapefiles.py
to download
the shapefiles you want to get data from.
>> python parse_shapefiles.py -g place
will make place.csv
, which includes
data from all records at the Census place level.
>> python parse_shapefiles.py -s WA
will make all_geographies_WA.csv
,
which includes all geographies in Washington state, from the state record
all the way down to cities (places) and school districts. It will not include
polygon geometries.
>> python parse_shapefiles.py -s WA -g county -p
will make county_WA.csv
,
which includes data from all counties in Washington state. Because the -p flag
was passed, it will also include polygon geometries for each record.
>> python parse_shapefiles.py
will make all_geographies.csv
, which
includes data from all geography levels and all states. If you've downloaded
shapefiles for all levels, including for Zip Code Tabulation Areas, the csv
file should be about 19 Mb.
>> python parse_shapefiles.py -p
will make the same file as above, but
including geometries. This file takes about 17 minutes to build locally on my
Macbook Air, and is about 2.45 Gb.