We polled the NextBus API for real-time bus arrival predictions and observed that the predicted arrival time tended to change as a bus approached a stop. We estimated when each bus arrived at each stop by requesting data every minute until the prediction disappeared. This estimated arrival was compared to the predicted arrival, defined as the arrival time the service reported when the bus was 20 minutes away from the stop. Stop locations are from the NextBus API.
A bus was considered “late” if its actual arrival time was more than five minutes later than the predicted time. The NextBus service predicts when a bus will “depart” from a stop. Because bus arrival and departure times are usually very close, we use the terms interchangeably in this article.
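The comparison described above can be sketched roughly as follows. This is an illustration only, not the project's code: the `Poll` record, the function names, the use of the first poll at or under 20 minutes as the reference prediction, and the use of the last prediction seen as the estimated arrival are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional

LATE_THRESHOLD_SEC = 5 * 60    # "late" means more than five minutes behind
REFERENCE_LEAD_SEC = 20 * 60   # reference prediction: bus about 20 minutes away


@dataclass
class Poll:
    """One observation of a single bus at a single stop (hypothetical layout)."""
    polled_at: int     # Unix time when the feed was requested
    predicted_at: int  # Unix time at which the feed said the bus would arrive


def reference_prediction(polls: List[Poll]) -> Optional[int]:
    """Prediction from the first poll taken when the bus was at most 20 minutes away."""
    for poll in sorted(polls, key=lambda p: p.polled_at):
        if poll.predicted_at - poll.polled_at <= REFERENCE_LEAD_SEC:
            return poll.predicted_at
    return None


def estimated_arrival(polls: List[Poll]) -> Optional[int]:
    """Assumption: the last prediction seen before it disappeared approximates arrival."""
    if not polls:
        return None
    return max(polls, key=lambda p: p.polled_at).predicted_at


def is_late(polls: List[Poll]) -> Optional[bool]:
    """True if the estimated arrival is more than five minutes after the reference."""
    predicted = reference_prediction(polls)
    arrived = estimated_arrival(polls)
    if predicted is None or arrived is None:
        return None
    return arrived - predicted > LATE_THRESHOLD_SEC
```

With one-minute polling, the estimated arrival is only accurate to about a minute, which is small relative to the five-minute threshold.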
The route explorer was built with OpenLayers, an open-source library for creating dynamic maps. To draw bus routes, we used Google Maps to search for bus paths, which we converted into GPX data. The base map is from Carto.
NextBus is a company that partners with municipal transit agencies to provide information on bus locations and predictions. AC Transit, which serves the Alameda-Contra Costa area, provides access to "Real-Time Departures" for its buses to the public via its website, app, and an API. The data is provided in XML format.
First, we needed data on the bus routes. See the datafunc/getRoutes function in datafunc/getroutes.py, and datafunc/routes.csv. This returned a simple list of all the bus routes that AC Transit serves. However, the routes of interest in the UC Berkeley vicinity are the 6, 7, 12, 18, 36, 51B, 52, 65, 67, 79, 88, and F.
Next, we collected the stops along those routes, in both directions. See routeConfig in datafunc/getroutes.py. Over a thousand stops were stored in datafunc/stops.csv. To narrow the set to a more specific area around campus, we set arbitrary boundaries and kept only the stops inside them. Those stops (about a hundred) were stored in datafunc/localstops.csv, and as stops along each of the routes in the datafunc/routes/ directory, under the naming convention [ROUTE]_[DIRECTION NAME].csv.
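As a rough sketch of the stop-filtering step, the example below parses a routeConfig-style XML response and keeps only stops inside a bounding box. The sample XML, stop IDs, and coordinates are invented for illustration; the attribute names reflect my understanding of the NextBus public XML feed and should be checked against a live response. The bounding box values are hypothetical, since the project's actual boundaries are not stated here.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Illustrative routeConfig-style response; real feeds carry more attributes and stops.
SAMPLE_ROUTE_CONFIG = """
<body>
  <route tag="51B" title="51B">
    <stop tag="0300010" title="Example St &amp; 1st Av" lat="37.8700" lon="-122.2680" stopId="50001"/>
    <stop tag="0300020" title="Example St &amp; 9th Av" lat="37.8000" lon="-122.2700" stopId="50002"/>
  </route>
</body>
"""

# Hypothetical bounding box around campus (not the project's real limits).
LAT_MIN, LAT_MAX = 37.85, 37.89
LON_MIN, LON_MAX = -122.28, -122.24


def parse_stops(xml_text: str) -> List[Dict[str, object]]:
    """Return one record per <stop> that is a direct child of a <route>."""
    root = ET.fromstring(xml_text.strip())
    stops = []
    for route in root.iter("route"):
        # Direct children only, so nested elements without coordinates are skipped.
        for stop in route.findall("stop"):
            stops.append({
                "route": route.get("tag"),
                "stop_id": stop.get("stopId"),
                "title": stop.get("title"),
                "lat": float(stop.get("lat")),
                "lon": float(stop.get("lon")),
            })
    return stops


def in_local_area(stop: Dict[str, object]) -> bool:
    """True if the stop falls inside the (hypothetical) campus bounding box."""
    return LAT_MIN <= stop["lat"] <= LAT_MAX and LON_MIN <= stop["lon"] <= LON_MAX


local_stops = [s for s in parse_stops(SAMPLE_ROUTE_CONFIG) if in_local_area(s)]
```

A rectangular box is the simplest filter to reason about; a polygon or a distance-from-campus test would be reasonable alternatives.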
datafunc/getpredictions.py polls and parses XML feeds retrieved for each stop (specifically by stop ID) in datafunc/localstops.csv. Stop predictions along each route for both directions are recorded in the datafunc/predictions/ directory, under the naming convention [STOPID]_[ROUTE]_[DIRECTION NAME].csv.
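Parsing a predictions feed might look something like the sketch below. It works on an inline sample rather than a live request, and the sample values are made up. The element and attribute names (`predictions`, `direction`, `prediction`, `epochTime`, `minutes`, `vehicle`) are written from my understanding of the NextBus XML format and are assumptions to verify; `parse_predictions` is a hypothetical name, not a function from datafunc/getpredictions.py.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Illustrative predictions-style response with invented values.
SAMPLE_PREDICTIONS = """
<body>
  <predictions routeTag="51B" stopTag="0300010" stopTitle="Example St &amp; 1st Av">
    <direction title="To Example Terminal">
      <prediction epochTime="1504630800000" seconds="540" minutes="9" vehicle="1234"/>
      <prediction epochTime="1504631700000" seconds="1440" minutes="24" vehicle="5678"/>
    </direction>
  </predictions>
</body>
"""


def parse_predictions(xml_text: str) -> List[Dict[str, object]]:
    """Flatten the nested feed into one row per predicted bus."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for preds in root.iter("predictions"):
        for direction in preds.iter("direction"):
            for pred in direction.iter("prediction"):
                rows.append({
                    "route": preds.get("routeTag"),
                    "direction": direction.get("title"),
                    "vehicle": pred.get("vehicle"),
                    "epoch_ms": int(pred.get("epochTime")),  # predicted time, in milliseconds
                    "minutes": int(pred.get("minutes")),     # minutes until predicted arrival
                })
    return rows


rows = parse_predictions(SAMPLE_PREDICTIONS)
```

Keeping the vehicle identifier in each row is what makes it possible to follow one bus across successive polls.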
datafunc/getpredictions.py should be run as a cron job, ideally every minute, so that prediction updates can be observed closely. Each prediction was appended as a new line to the appropriate CSV file. For this story, the script was run every minute from September 5, 2017 to October 17, 2017. This resulted in a predictions/ directory that was well over 1 GB, so the data is not stored in this repository.
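The append-on-every-poll pattern could be sketched as below. The file naming follows the [STOPID]_[ROUTE]_[DIRECTION NAME].csv convention described earlier, but the column layout (poll time, vehicle, predicted time) and both function names are assumptions made for this example, not the project's actual schema.

```python
import csv
import os
import time
from typing import Optional


def prediction_path(base_dir: str, stop_id: str, route: str, direction_name: str) -> str:
    """Build the path [base_dir]/[STOPID]_[ROUTE]_[DIRECTION NAME].csv."""
    return os.path.join(base_dir, f"{stop_id}_{route}_{direction_name}.csv")


def append_prediction(base_dir: str, stop_id: str, route: str, direction_name: str,
                      vehicle: str, epoch_ms: int, polled_at: Optional[int] = None) -> str:
    """Append one prediction as a new CSV row and return the file path."""
    os.makedirs(base_dir, exist_ok=True)
    path = prediction_path(base_dir, stop_id, route, direction_name)
    if polled_at is None:
        polled_at = int(time.time())  # record when this poll happened
    # Mode "a" adds to the end of the file, so earlier polls are never overwritten.
    with open(path, "a", newline="") as handle:
        csv.writer(handle).writerow([polled_at, vehicle, epoch_ms])
    return path
```

Storing the poll time alongside the predicted time is what later allows a prediction to be matched to how far away the bus was when it was made. In a crontab, a schedule of `* * * * *` runs a command once per minute.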
To draw routes and stops on a JavaScript map, the data needed to be converted to GPX format. For stops, this was straightforward and done in datafunc/toGPX.py. For routes, it was trickier. I manually looked up bus routes on Google Maps, then used an online resource that takes Google Maps directions links as input and outputs GPX files. This was not ideal, but because of the relatively small number of routes, it was not very difficult. That data is stored in src/data/stops and src/data/gpx.
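For the stop conversion, a minimal sketch is to write each stop as a GPX waypoint (`wpt`) with its latitude, longitude, and name. This is not the code in datafunc/toGPX.py; the function name, the input record layout, and the `creator` string are assumptions for the example.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

GPX_NS = "http://www.topografix.com/GPX/1/1"  # GPX 1.1 namespace


def stops_to_gpx(stops: List[Dict[str, object]]) -> str:
    """Serialize stops (dicts with 'title', 'lat', 'lon') as GPX waypoints."""
    gpx = ET.Element("gpx", {
        "version": "1.1",
        "creator": "stops-to-gpx-sketch",  # hypothetical creator label
        "xmlns": GPX_NS,
    })
    for stop in stops:
        wpt = ET.SubElement(gpx, "wpt", {
            "lat": f"{stop['lat']:.6f}",
            "lon": f"{stop['lon']:.6f}",
        })
        ET.SubElement(wpt, "name").text = str(stop["title"])
    return ET.tostring(gpx, encoding="unicode")


gpx_text = stops_to_gpx([{"title": "Example St & 1st Av", "lat": 37.87, "lon": -122.268}])
```

Routes would instead be tracks or routes (`trk` or `rte` elements) made of many ordered points, which is part of why they were harder to produce than stops.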
datafunc/delay.py takes the CSVs generated by datafunc/getpredictions.py and calculates delays (and other potentially useful statistics) for each stop, line, and line-specific stop. Because several lines can serve the same stop, the third category is especially important.
This data was then repackaged into a JSON file, data.json, which can be found in the /src/data directory.
Title | actransit-delays |
---|---|
Developer | Seokhyeon Ryu |
Link | http://projects.dailycal.org/2017/bus-delays/ |
========================
©2017 The Daily Californian