Aim: Identify commercial centers using Points of Interest (POI) data of Jaipur city.
A lot of open data is available about the demographics and geography of the planet. However, this information is not necessarily organized in a structure from which insights can be drawn directly.
This project clusters the points of interest of Jaipur into distinct commercial centers or markets.
Points of interest (POI) data gives the coordinates (latitude and longitude) of a physical entity together with a tag describing its type, such as commercial building, school, hospital, restaurant, or type of outlet.
Objectives:
- Get Points of Interest from open data sources like OpenStreetMap (OSM).
- Understand how spatial location data works.
- Understand spatial vector data types and how to manipulate them using your language of choice.
- Understand necessary GIS concepts like projections, spatial clustering, etc.
- Figure out a way of clustering these points into commercial centers/markets. You can use standard size polygons also to cluster the points.
- Find and label the most significant clusters, statistically and intuitively.
- Visualize the resultant commercial centres/markets.
- IdentifyCommercialCenters.ipynb is the Jupyter notebook containing all the analysis and code for clustering and identifying the commercial centers of Jaipur.
- poi_data_json.py is the Python script that fetches the POI data of Jaipur and saves it as a JSON file.
- spatial_data.json contains the POI data in JSON format.
- shapefiles/polygon.shp is the polygon ESRI shapefile containing the POI data.
- requirements.txt lists all the dependencies (Python modules) required to run this project.
Commercial centers form the heart of the city and contain most of its amenities, such as restaurants, commercial shopping complexes, malls, and hospitals. We therefore considered 39 such amenities and collected POI spatial data of the city using overpy. Overpy is a Python module for extracting open data from OpenStreetMap (OSM) using the Overpass QL query language. POI data mainly contains:
- Node: Represents amenities with their latitudes, longitudes and tags providing information like type of amenity, name and their address.
- Way: An ordered list of nodes, representing a linear feature such as a road or the outline of an area such as a building.
- Relation: A group of elements used to model logical or geographical relationships between objects.
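The kind of query used to collect such elements can be sketched as follows. This is a minimal illustration, not the project's actual script: the function names `build_poi_query` and `fetch_pois`, the two example amenities, and the bounding box around Jaipur are all assumptions made for the example. Only `overpy.Overpass().query()` and the `nodes` attribute of its result come from the overpy library itself.

```python
def build_poi_query(amenities, bbox, timeout=60):
    """Build an Overpass QL query for nodes and ways tagged with the given amenities.

    bbox is (south, west, north, east) in degrees; the values used below
    are an assumed box around Jaipur, not the one used by the project.
    """
    south, west, north, east = bbox
    pattern = "|".join(amenities)
    area = f"({south},{west},{north},{east})"
    return (
        f"[out:json][timeout:{timeout}];\n"
        "(\n"
        f'  node["amenity"~"^({pattern})$"]{area};\n'
        f'  way["amenity"~"^({pattern})$"]{area};\n'
        ");\n"
        "(._;>;);\n"  # also pull in the member nodes of each matched way
        "out body;"
    )


def fetch_pois(query):
    """Run the query against the public Overpass API (needs network access)."""
    import overpy  # third-party; imported here so the query builder works without it

    result = overpy.Overpass().query(query)
    # Keep only nodes that carry an amenity tag; way member nodes have none.
    return [
        {"id": node.id, "lat": float(node.lat), "lon": float(node.lon),
         "tags": dict(node.tags)}
        for node in result.nodes
        if "amenity" in node.tags
    ]


if __name__ == "__main__":
    # Hypothetical subset of amenities and an assumed bounding box.
    print(build_poi_query(["restaurant", "hospital"], (26.75, 75.65, 27.05, 75.95)))
```

Calling `fetch_pois(build_poi_query(...))` would return one dictionary per tagged node, which can then be dumped to a JSON file.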
The POI data of Jaipur contains 422 nodes and 72 ways. Below is the scatterplot showing the nodes (POIs) of Jaipur, plotted with longitude on the x-axis and latitude on the y-axis:
The densely clustered points (amenities) mark the heart of the city, its main commercial area, while the scattered far-away points reflect the small number of amenities in the secluded, outlying, and underdeveloped areas of Jaipur.
The POI data is written into an ESRI shapefile using pyshp, a Python module for reading, writing, and modifying shapefiles.
For spatial clustering of the points, we used both Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and the K-means algorithm to identify the commercial centers. The premise is that an area containing many more amenities than its surroundings is a commercial center.
DBSCAN separates high-density areas (those with a large number of amenities) from low-density areas, and so separates the commercial areas from the rest of the city. We use DBSCAN because it is robust to outliers and does not require the number of clusters to be specified in advance.
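A minimal sketch of this step, assuming scikit-learn is used (the text above does not name the library). The function name `dbscan_pois` and the parameter values `eps_km=0.5` and `min_samples=5` are illustrative assumptions, not the settings of the notebook. The haversine metric is chosen here so that the neighbourhood radius can be stated in kilometres instead of degrees.

```python
import numpy as np
from sklearn.cluster import DBSCAN

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def dbscan_pois(lat_lon_deg, eps_km=0.5, min_samples=5):
    """Return one cluster label per (lat, lon) point; -1 marks noise.

    eps_km and min_samples are illustrative defaults, not tuned values.
    """
    # scikit-learn's haversine metric expects [lat, lon] in radians.
    coords_rad = np.radians(np.asarray(lat_lon_deg, dtype=float))
    model = DBSCAN(
        eps=eps_km / EARTH_RADIUS_KM,  # convert the radius from km to radians
        min_samples=min_samples,
        metric="haversine",
        algorithm="ball_tree",
    )
    return model.fit_predict(coords_rad)
```

Points that end up with label -1 are the isolated amenities that DBSCAN treats as outliers; every other label identifies one dense area.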
From the scatterplot, we can observe that most of the points (amenities) lie at the center, surrounded by far-away points, so the central area must be the main commercial area of the city. Within this large cluster there may be smaller clusters as well. We zoom into this cluster to analyse it in the following sections.
The points are clustered with the K-means algorithm, using the distance between their longitude and latitude values.
Here, the number of clusters is intentionally set high (8) so that we get smaller clusters within the large central cluster. As seen in the figure, the central cluster contains 4 smaller clusters, shown in yellow, parrot green, purple, and dark green.
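The K-means step can be sketched as below, again assuming scikit-learn. The function name `kmeans_pois` and the fixed seed are assumptions for the example; only the cluster count of 8 comes from the text above. Note that K-means here measures plain Euclidean distance on raw degrees, which is only an approximation of ground distance but is usually acceptable at city scale.

```python
import numpy as np
from sklearn.cluster import KMeans


def kmeans_pois(lon_lat, n_clusters=8, seed=0):
    """Cluster (lon, lat) points and return (labels, cluster centers)."""
    points = np.asarray(lon_lat, dtype=float)
    # A fixed random_state makes the assignment reproducible between runs.
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = model.fit_predict(points)
    return labels, model.cluster_centers_
```

The returned centers are in the same (lon, lat) order as the input and can be plotted on top of the scatterplot to mark each cluster.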
This is a zoomed-in scatterplot of the large center cluster. We can clearly see some smaller clusters here. Let's cluster them using both DBSCAN and K-Means Clustering.
From the above scatterplot, we can intuitively say that the purple cluster has the highest density, meaning it contains the largest number of amenities in Jaipur. It is therefore one of the commercial centers of Jaipur. The other, smaller clusters are separate commercial areas that provide fewer amenities to the population residing there.
The above plot shows K-means clustering of the POI points. We can clearly see many sub-clusters denoting separate commercial centers. The blue cluster is the largest among them.
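Cluster sizes like the ones compared visually above can also be ranked programmatically. The sketch below uses the number of amenities per cluster as a simple proxy for significance, which is an assumption of this example; `rank_clusters` is a hypothetical helper, and it accepts the label list produced by either clustering algorithm.

```python
from collections import Counter


def rank_clusters(labels, noise_label=-1):
    """Return (label, point count) pairs sorted from largest to smallest cluster.

    Points carrying noise_label (DBSCAN's marker for outliers) are ignored.
    """
    counts = Counter(label for label in labels if label != noise_label)
    return counts.most_common()


if __name__ == "__main__":
    # Toy labels: cluster 1 has three points, cluster 0 two, cluster 2 one.
    print(rank_clusters([0, 0, 1, -1, 1, 1, 2]))
```

The first entry of the result is the cluster with the most amenities, the natural candidate for the main commercial center.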
- Clone this repository by running the following command in a terminal: git clone https://github.com/vibhor98/Identify-Commercial-Centers.git
- Run: pip install -r requirements.txt to install all the required dependencies.
- Navigate to the directory containing the IdentifyCommercialClusters.ipynb notebook and run: jupyter notebook to open the analysis file.