Skip to content

Geocode postcodes, addresses, LLSOAs or Constituencies using the Code Point Open database, ONS data or GMaps API

License

Notifications You must be signed in to change notification settings

SheffieldSolar/Geocode

Repository files navigation

Geocode

Geocode various geographical entities including postcodes and LLSOAs. Reverse-geocode to LLSOA or GSP/GNode.

Latest Version: 0.12.1

What is this repository for?

  • Use Code Point Open and/or Google Maps API to geocode postcode and/or address into lat/lon co-ordinates.
  • Use ONS & NRS LLSOA Population Weighted Centroids to geocode Lower Layer Super Output Areas.
  • Use GIS data from data.gov.uk to geocode GB constituencies based on geospatial centroid.
  • Use GIS boundaries data from ONS and NRS to reverse-geocode lat/lon to LLSOA.
  • Use GIS data from National Grid ESO's data portal to reverse-geocode to GSP / GNode.
  • Use GIS boudnaries from the Europa/Eurostats API to reverse-geocode to NUTS regions.

Benefits

  • Prioritises Code Point Open for postcode lookup to save expensive GMaps API bills.
  • Caches GMaps API queries locally so that repeated queries can be fulfilled without a new API request.
  • Fetches data automatically at runtime from public APIs where possible.

How do I get set up?

Developed and tested with Python 3.9, should work for 3.7+.

Make sure you have Git installed - Download Git

Run pip install geocode-ss

or pip install git+https://github.com/SheffieldSolar/Geocode/

Check that the installation was successful by running the following command from terminal / command-line:

>> geocode -h

This will print the helper for the limited command line interface which provides tools to help get set up and to clear the cache when needed:

usage: geocode.py [-h] [--clear-cache] [--debug] [--setup SETUP [SETUP ...]]
                  [--load-cpo-zip </path/to/zip-file>] [--load-gmaps-key <gmaps-api-key>]

This is a command line interface (CLI) for the Geocode module version 0.12.1.

optional arguments:
  -h, --help            show this help message and exit
  --clear-cache         Specify to delete the cache files.
  --debug               Geocode some sample postcodes/addresses/LLSOAs.
  --setup SETUP [SETUP ...]
                        Force download all datasets to local cache (useful if running
                        inside a Docker container i.e. run this as part of image build).
                        Possible values are 'ngeso', 'cpo', 'ons', 'eurostat' or 'all'.
  --load-cpo-zip </path/to/zip-file>
                        Load the Code Point Open data from a local zip file.
  --load-gmaps-key <gmaps-api-key>
                        Load a Google Maps API key.

Jamie Taylor & Ethan Jones, 2019-10-08

No additional set up is needed at this stage - the required datasets will be downloaded (or extracted from the packaged data) the first time you use the associated method. If you want to force the Geocode library to download/extract all data, you can run the following command:

>> geocode --setup

This is especially useful if you are installing / running the library inside a container - using the above command you can download the data once during the image build rather than have to re-download every time the container is destroyed.

Important

Note that this library makes use of the Shapely library from PyPi, which often does not install correctly on Windows machines due to some missing dependencies. If using Windows and you see an error like OSError: [WinError 126] The specified module could not be found, you should install Shapely from one of the unofficial binaries here e.g.

>> pip install https://download.lfd.uci.edu/pythonlibs/s2jqpv5t/Shapely-1.7.0-cp37-cp37m-win_amd64.whl

All data required by this library is either packaged with the code or is downloaded at runtime from public APIs. Some data is subect to licenses and/or you may wish to manually update certain datasets (e.g. OS Code Point Open) - see appendix.

Usage

Within a Python script

Within your Python code, I recommend using the context manager so that GMaps cache will be automatically flushed on exit. See example.py:

import os
import logging

from geocode import Geocoder

def main():
    with Geocoder() as geocoder:
        # Geocode some postcodes / addresses...
        print("GEOCODE POSTCODES / ADDRESSES:")
        postcodes = ["S3 7RH", "S3 7", "S3", None, None, "S3 7RH"]
        addresses = [None, None, None, "Hicks Building, Sheffield", "Hicks", "Hicks Building"]
        results = geocoder.geocode(postcodes, "postcode", address=addresses)
        for postcode, address, (lat, lon, status) in zip(postcodes, addresses, results):
            print(f"    Postcode + Address: `{postcode}` + `{address}`  ->  {lat:.3f}, {lon:.3f} "
                  f"({geocoder.status_codes[status]})")
        # Geocode some LLSOAs...
        print("GEOCODE LLSOAs:")
        llsoas = ["E01033264", "E01033262"]
        results = geocoder.geocode_llsoa(llsoas)
        for llsoa, (lat, lon) in zip(llsoas, results):
            print(f"    LLSOA: `{llsoa}`  ->  {lat:.3f}, {lon:.3f}")
        # Geocode some Constituencies...
        print("GEOCODE CONSTITUENCIES:")
        constituencies = ["Sheffield Central", "Sheffield Hallam"]
        results = geocoder.geocode_constituency(constituencies)
        for constituency, (lat, lon) in zip(constituencies, results):
            print(f"    Constituency: `{constituency}`  ->  {lat:.3f}, {lon:.3f}")
        # Reverse-geocode some lat/lons to LLSOAs...
        print("REVERSE-GEOCODE TO LLSOA:")
        latlons = [(53.384, -1.467), (53.388, -1.470)]
        results = geocoder.reverse_geocode_llsoa(latlons)
        for llsoa, (lat, lon) in zip(results, latlons):
            print(f"    LATLON: {lat:.3f}, {lon:.3f}  ->  `{llsoa}`")
        # Reverse-geocode some lat/lons to GSP...
        print("REVERSE-GEOCODE TO GSP:")
        latlons = [(53.384, -1.467), (53.388, -1.470)]
        results = geocoder.reverse_geocode_gsp(latlons)
        for (lat, lon), region_id in zip(latlons, results):
            print(f"    LATLON: {lat:.3f}, {lon:.3f}  ->  {region_id}")
        # Reverse-geocode some lat/lons to 2021 NUTS2...
        print("REVERSE-GEOCODE TO NUTS2:")
        latlons = [(51.3259, -1.9613), (47.9995, 0.2335), (50.8356, 8.7343)]
        results = geocoder.reverse_geocode_nuts(latlons, year=2021, level=2)
        for (lat, lon), nuts2 in zip(latlons, results):
            print(f"    LATLON: {lat:.3f}, {lon:.3f}  ->  {nuts2}")

if __name__ == "__main__":
    log_fmt = "%(asctime)s [%(levelname)s] [%(filename)s:%(funcName)s] - %(message)s"
    fmt = os.environ.get("GEOCODE_LOGGING_FMT", log_fmt)
    datefmt = os.environ.get("GEOCODE_LOGGING_DATEFMT", "%Y-%m-%dT%H:%M:%SZ")
    logging.basicConfig(format=fmt, datefmt=datefmt, level=os.environ.get("LOGLEVEL", "WARNING"))
    main()
>> python example.py
GEOCODE POSTCODES / ADDRESSES:
    Postcode + Address: `S3 7RH` + `None`  ->  53.381, -1.486 (Full match with GMaps)
    Postcode + Address: `S3 7` + `None`  ->  nan, nan (Failed)
    Postcode + Address: `S3` + `None`  ->  nan, nan (Failed)
    Postcode + Address: `None` + `Hicks Building, Sheffield`  ->  53.381, -1.486 (Full match with GMaps)
    Postcode + Address: `None` + `Hicks`  ->  nan, nan (Failed)
    Postcode + Address: `S3 7RH` + `Hicks Building`  ->  53.381, -1.486 (Full match with GMaps)
GEOCODE LLSOAs:
    LLSOA: `E01033264`  ->  53.384, -1.467
    LLSOA: `E01033262`  ->  53.388, -1.470
GEOCODE CONSTITUENCIES:
    Constituency: `Sheffield Central`  ->  53.376, -1.464
    Constituency: `Sheffield Hallam`  ->  53.396, -1.604
REVERSE-GEOCODE TO LLSOA:
    LATLON: 53.384, -1.467  ->  E01033264
    LATLON: 53.388, -1.470  ->  E01033262
REVERSE-GEOCODE TO GSP:
    LATLON: 53.384, -1.467  ->  ('PITS_3', '_M')
    LATLON: 53.388, -1.470  ->  ('NEEP_3', '_M')
REVERSE-GEOCODE TO NUTS2:
    LATLON: 51.326, -1.961  ->  UKK1
    LATLON: 47.999, 0.234  ->  FRG0
    LATLON: 50.836, 8.734  ->  DE72

In the above example, postcodes and addresses are lists of strings, but it should be fine to use any iterator such as Numpy arrays or Pandas DataFrame columns, although the geocode() method will still return a list of tuples.

When reverse-geocoding to GSP, the reverse_geocode_gsp() method returns both a list of Region IDs and a corresponding list of GSP / GNodes etc. Since the relationship between Region:GSP:GNode is theoretically MANY:MANY:MANY, the second object returned is a list of lists of dicts. This is rather clunky and will likely be refined in a future release. An alternative use case could disregard this second return object and instead make use of the Geocoder.gsp_lookup instance attribute - this is a Pandas DataFrame giving the full lookup between Regions / GSPs / GNodes / DNO License Areas (i.e. this dataset on the ESO Data Portal). In testing, the reverse_geocode_gsp() method was able to allocate ~1 million random lat/lons to the correct GSP in average wall-clock time of around 300 seconds.

Command Line Utilities

latlons2llsoa

This utility can be used to load a CSV file containing latitudes and longitudes and to reverse-geocode them to LLSOAs (optionally switching to Data Zones in Scotland):

>> latlons2llsoa -h
usage: latlons2llsoa [-h] -f </path/to/file> -o </path/to/file> [--datazones]

This is a command line interface (CLI) for the latlons2llsoa.py module

optional arguments:
  -h, --help            show this help message and exit
  -f </path/to/file>, --input-file </path/to/file>
                        Specify a CSV file containing a list of latitudes and
                        longitudes to be reverse-geocoded. The file must
                        contain the columns 'latitude' and 'longitude' (it can
                        contain others, all of which will be kept).
  -o </path/to/file>, --output-file </path/to/file>
                        Specify an output file.
  --datazones           Specify to use Data Zones in Scotland.

Jamie Taylor, 2020-04-16

latlons2gsp

This utility can be used to load a CSV file containing latitudes and longitudes and to reverse-geocode them to GSPs/GNodes.

>> latlons2gsp -h
usage: latlons2gsp.py [-h] -f </path/to/file> -o </path/to/file>

This is a command line interface (CLI) for the latlons2gsp.py module

optional arguments:
  -h, --help            show this help message and exit
  -f </path/to/file>, --input-file </path/to/file>
                        Specify a CSV file containing a list of latitudes and
                        longitudes to be reverse-geocoded. The file must
                        contain the columns 'latitude' and 'longitude' (it can
                        contain others, all of which will be kept).
  -o </path/to/file>, --output-file </path/to/file>
                        Specify an output file.

Jamie Taylor, 2020-05-29

postcodes2latlon

This utility can be used to load a CSV file containing postcodes and to geocode them to lat/lons.

>> postcodes2latlon -h
usage: postcodes2latlon.py [-h] -f </path/to/file> -o </path/to/file>

This is a command line interface (CLI) for the postcodes2latlon.py module

optional arguments:
  -h, --help            show this help message and exit
  -f </path/to/file>, --input-file </path/to/file>
                        Specify a CSV file containing a list of postcodes to
                        be geocoded. The file must contain the column
                        'postcode' (it can contain others, all of which will
                        be kept).
  -o </path/to/file>, --output-file </path/to/file>
                        Specify an output file.

Jamie Taylor, 2020-06-12

bng2latlon

This utility can be used to load a CSV file containing eastings and northings and to convert them to lat/lons.

>> bng2latlon -h
usage: bng2latlon.py [-h] -f </path/to/file> -o </path/to/file>

This is a command line interface (CLI) for the bng2latlon.py module

optional arguments:
  -h, --help            show this help message and exit
  -f </path/to/file>, --input-file </path/to/file>
                        Specify a CSV file containing a list of eastings and
                        northings to be converted. The file must contain the
                        columns 'eastings' and 'northings' (it can contain
                        others, all of which will be kept).
  -o </path/to/file>, --output-file </path/to/file>
                        Specify an output file.

Jamie Taylor, 2020-06-12

Documentation

See here.

How do I update?

Run pip install --upgrade git+https://github.com/SheffieldSolar/Geocode/.

You may also wish to clear all locally cached data by running the following command from terminal/command-prompt:

geocode --clear-cache

Developers

Running Tests

>> cd <root-dir>
>> python -m unittest Tests.test_geocode

Appendix

Code Point Open

As of version 0.4.15, the Code Point Open data is packaged with this repository, however the following instructions can be used to update the local Code Point Open dataset without re-installing the Geocode library.

Contains OS data © Crown copyright and database right 2018

Contains Royal Mail data © Royal Mail copyright and Database right 2018

Contains National Statistics data © Crown copyright and database right 2018

Updating Code Point Open

Download the (free) Code Point Open data here. Run the following command:

>> geocode --load-cpo-zip "<path-to-zip-file>"

For example, if you saved the code point open data to 'C:\Users\jamie\Downloads\codepo_gb.zip', you would run:

>> geocode --load-cpo-zip "C:\Users\jamie\Downloads\codepo_gb.zip"

N.B. If you encounter problems using the --load-cpo-zip utility, you might like to manually copy/paste the zip file codepo_gb.zip into the geocode/code_point_open subdirectory of the geocode module. (Do not extract the contents of the zipfile)

The CPO data will be extracted and stored locally in geocode/code_point_open/code_point_open_<version>.p.

GMaps

If you want to make use of the Google Maps API, you can load an API key using the command:

>> geocode --load-gmaps-key "<your-api-key>"

N.B. The API key will be stored in plaintext in the geocode/google_maps/key.txt file within your local installation. This is not secure and only appropriate when using the Geocode library for development purposes.

See here for help getting an API key.

Any queries you make to the GMaps API will be cached to geocode/google_maps/gmaps_cache_<version>.p so that repeated queries are faster and cheaper.

ONS

The Geocode library makes use of data from the Office for National Statistics in order to geocode Lower Layer Super Output Areas (LLSOAs) in England and Wales. The first time you make use of the Geocoder.geocode_llsoa() method, the LLSOA (December 2011) Population Weighted Centroids data will be downloaded from the ONS API and cached locally in geocode/ons_nrs/llsoa_centroids_<version>.p. The first time you make use of the Geocoder.reverse_geocode_llsoa() method, the LLSOA (December 2011) Boundaries EW data will be downloaded from the ONS API and cached locally in geocode/ons_nrs/llsoa_boundaries_<version>.p. More information here and here.

NRS

The Geocode library makes use of data from National Records of Scotland in order to geocode Lower Layer Super Output Areas (LLSOAs) in Scotland. The raw population-weighted centroids and boundary data is available from the NRS website, but for convenience and performance reasons the data has been re-formatted and re-projected and is packaged with this library. More information here.

Contains NRS data © Crown copyright and database right 2020

Contains Ordnance Survey data © Crown copyright and database right 2020

data.gov.uk

The Geocode library makes use of the Westminster Parliamentary Constituencies (December 2015) Generalised Clipped Boundaries in Great Britain† from data.gov.uk in order to geocode constituencies according to their (unweighted) geospatial centroid. More information here. The geospatial centroids have been calculated using ESRI ArcMap's 'Feature To Point' toolbox and the derived lookup data is packaged within this respository as a pipe-separated variable file: geocode/gov/constituency_centroids.psv. The first time you make use of the Geocoder.geocode_constituency() method, this file will be loaded and cached back to geocode/gov/constituency_centroids_<version>.p for faster loading next time.

†Contains public sector information licensed under the Open Government Licence v3.0.

NGESO Data Portal

The Geocode library makes use of GSP/GNode GIS boundaries developed by Sheffield Solar. In May 2020, these region definitions were uploaded to National Grid ESO's Data Portal - see here. The first time you make use of the Geocoder.reverse_geocode_gsp() method, the GIS data is downloaded from the Data Portal API at runtime.

Supported by National Grid ESO Open Data

Subject to NGESO Open Licence

To update your locally cached boundary definitions, clear your local cache:

geocode --clear-cache

Eurostat

The Geocode library makes use of the eurostat API to download boundaries for NUTS (Nomenclature of territorial units for statistics) regions. For more information, see here. For license conditions, see the FAQ section of the Europa website here.