Link to citibike Analysis
- Technologies
- Objective
- Data Cleaning
- Data Aggregation
- Visualizations
- Analysis
- Tableau Story
- Resources
- Contact
- Tableau
- Jupyter Notebook
- Pandas
citibike NYC Rider and Station Analysis: 2019 vs 2020
citibike Ridership: Pre-COVID-19 vs. During COVID-19
- Has the customer base changed?
- Have the top station locations changed due to the WFH lifestyle?
- Have the trip totals gone up or down due to the COVID-19 pandemic?
The following visualizations illustrate citibike ridership data from August, September, and October 2019 compared with August, September, and October 2020.
These months were selected because the weather is pleasant, which makes bike riding one of the preferred modes of transportation. The data is also representative of tourism in August and of student populations in September.
I collected the data from Citi Bike Data. I used Citi Bike trip history CSV files from August, September, and October of 2019 and August, September, and October of 2020. The files are very large and include trip and rider data for every station trip for the entire month. I used pandas in a Jupyter Notebook to clean the data, and I used the concat function to combine all the CSV files into one DataFrame.
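A minimal sketch of the combining step, assuming each monthly file shares the same columns. The two in-memory CSV snippets and their column names are hypothetical stand-ins for the real monthly downloads, which would normally be passed to `pd.read_csv` as file paths.

```python
import io
import pandas as pd

# Hypothetical stand-ins for two monthly trip-history CSV files.
aug_2019 = io.StringIO(
    "start date,start station name\n"
    "2019-08-01 08:00:00,Station A\n"
    "2019-08-01 09:30:00,Station B\n"
)
aug_2020 = io.StringIO(
    "start date,start station name\n"
    "2020-08-01 10:15:00,Station A\n"
)

# Read each file into its own DataFrame, then stack them row-wise.
# ignore_index=True gives the combined frame a fresh 0..n-1 index.
frames = [pd.read_csv(f) for f in (aug_2019, aug_2020)]
trips = pd.concat(frames, ignore_index=True)
```

Reading the files in a list comprehension keeps the approach the same whether there are two files or six.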
Then I separated the ‘year’ and ‘month’ information from the ‘start date’ column. This helped clearly visualize the dates in my Tableau story.
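One way this split could look in pandas, assuming the ‘start date’ column holds timestamp strings. The sample rows are made up for illustration; the output column names follow the ‘Trip Year’ and ‘Trip Month’ names used later in this write-up.

```python
import pandas as pd

# Made-up sample rows; the real column comes from the trip-history files.
trips = pd.DataFrame(
    {"start date": ["2019-08-01 08:00:00", "2020-10-15 17:45:00"]}
)

# Parse the strings once, then pull the year and month out of the timestamps.
start = pd.to_datetime(trips["start date"])
trips["Trip Year"] = start.dt.year
trips["Trip Month"] = start.dt.month
```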
Rider gender was represented by numeric values in the original data set, so I replaced the numbers with ‘male’ and ‘female’ labels to make the field more meaningful.
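A short sketch of the relabeling, assuming the coding 0 = unknown, 1 = male, 2 = female and a source column named `gender`. Both are assumptions to check against the data dictionary for the files being used.

```python
import pandas as pd

# Made-up sample of the numeric gender codes.
trips = pd.DataFrame({"gender": [1, 2, 0, 1]})

# Assumed coding: 0 = unknown, 1 = male, 2 = female.
gender_labels = {0: "unknown", 1: "male", 2: "female"}

# Series.map replaces each code with its label; unmapped codes become NaN.
trips["Rider Gender"] = trips["gender"].map(gender_labels)
```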
To display age in my visualizations, I calculated the rider age by subtracting the rider’s ‘birth year’ from the ‘Trip Year’. I created a new column for ‘Rider Age’.
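The age calculation is a single column subtraction. A minimal sketch with made-up rows:

```python
import pandas as pd

# Made-up sample rows with the two columns the calculation needs.
trips = pd.DataFrame(
    {"Trip Year": [2019, 2020], "birth year": [1990, 1985]}
)

# Age at the time of the trip: trip year minus birth year.
trips["Rider Age"] = trips["Trip Year"] - trips["birth year"]
```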
I included the ‘unknown’ genders and outlying rider ages in my data sets, but I filtered them out of the final visualizations for clarity.
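One possible way to keep those rows in the data while making them easy to exclude later is a boolean flag column. This is only a sketch: the flag name `Include In Charts` and the age bounds are hypothetical, not values taken from this project.

```python
import pandas as pd

# Made-up sample rows, including an unknown gender and an implausible age.
trips = pd.DataFrame(
    {
        "Rider Gender": ["male", "unknown", "female"],
        "Rider Age": [34, 51, 134],
    }
)

# Hypothetical bounds for a plausible rider age (inclusive on both ends).
MIN_AGE, MAX_AGE = 16, 90

# True only for rows with a known gender and an age inside the bounds.
trips["Include In Charts"] = (
    trips["Rider Gender"].ne("unknown")
    & trips["Rider Age"].between(MIN_AGE, MAX_AGE)
)
```

A flag like this can then be used as a filter in the visualization tool without deleting anything from the data set.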
The data from citibike was exceptionally large and was too big to use in Tableau in its original form. I created different aggregations of the data sets to make smaller data frames that would work in Tableau Public. The smaller data frames also made the visualizations easier to display.
To create the total citibike trips per year and month, I used the .groupby function to group the data by ‘Trip Year’ and ‘Trip Month’ and count the total trips.
month_df = clean_df3.groupby(['Trip Year','Trip Month']).count()
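A small illustration of what this call returns, using made-up rows. Note that `.count()` reports the number of non-null values in each remaining column, so columns with missing data can show a lower number than the true row count; `.size()` is an alternative that counts rows directly. The `trips_per_month` name is hypothetical.

```python
import pandas as pd

# Made-up sample: two trips in Aug 2019 (one missing a birth year), one in Aug 2020.
clean_df3 = pd.DataFrame(
    {
        "Trip Year": [2019, 2019, 2020],
        "Trip Month": [8, 8, 8],
        "birth year": [1990, None, 1985],
    }
)

# .count(): non-null values per remaining column, per group.
month_df = clean_df3.groupby(["Trip Year", "Trip Month"]).count()

# .size(): number of rows per group, regardless of missing values.
trips_per_month = clean_df3.groupby(["Trip Year", "Trip Month"]).size()
```

With this sample, the 2019 group has two rows but only one non-null ‘birth year’, which is why the two results differ.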
To create the user data frame, I used the .groupby function and grouped the data by ‘Trip Year’, ‘Trip Month’, ‘Rider Gender’, ‘Rider Age’, and ‘User Type’. I added .count() to count the trips in each group.
user_df1 = user_df.groupby(['Trip Year', 'Trip Month', 'Rider Gender', 'Rider Age', 'User Type']).count()
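A grouped result has the grouping columns in its index, which is not always convenient for a tool that expects a flat table. The sketch below shows one way to flatten the result and export it as CSV text. The sample rows and the `Trip Count` column name are hypothetical.

```python
import pandas as pd

# Made-up sample rows with the five grouping columns.
user_df = pd.DataFrame(
    {
        "Trip Year": [2019, 2019, 2020],
        "Trip Month": [8, 8, 8],
        "Rider Gender": ["male", "male", "female"],
        "Rider Age": [30, 30, 25],
        "User Type": ["Subscriber", "Subscriber", "Customer"],
    }
)

group_cols = ["Trip Year", "Trip Month", "Rider Gender", "Rider Age", "User Type"]

# Count rows per group, then move the group keys back into ordinary columns.
user_counts = user_df.groupby(group_cols).size().reset_index(name="Trip Count")

# Flat CSV text; passing a file path to to_csv would write a file instead.
csv_text = user_counts.to_csv(index=False)
```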
To create visualizations in Tableau, I imported my data sets and joined them on common fields such as ‘station name’ and ‘longitude’ and ‘latitude’.
I used year, gender, and age as filters in my visualizations. The main purpose of my story was to compare ridership and station data from 2019 and 2020. I used a filter for ‘Trip Year’ to create duplicate charts for each year.
As part of the storytelling process, I experimented with different versions of the visualizations displaying the same data to see which version was clearer and more impactful. Below you can see two versions of ‘Ridership by Age and Gender’. The bar chart displays more specific data, but its overall look is overwhelming.
The line chart shows less detail but is clear and clean as a visualization.
For the map visualizations, I used ‘Longitude’ as the column value and ‘Latitude’ as the row value. I then plotted the points as ‘sum’ of station total trips.
I used color to show the value of the map points: blue represents fewer trips and red represents more trips. I also added specific tooltips to display all relevant data points associated with station locations.
To add the zip code layer, I used Map Layers.
I used the ‘create set’ calculation when creating my visualizations showing the Top 10 trip stations.
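For comparison, a similar top-N selection can be approximated in pandas before the data reaches Tableau. This is a pandas analogue, not the Tableau set feature itself; the sample values, the column names, and the `In Top N` flag are hypothetical.

```python
import pandas as pd

# Made-up station totals.
station_df = pd.DataFrame(
    {
        "station name": ["A", "B", "C", "D"],
        "total trips": [120, 300, 45, 210],
    }
)

TOP_N = 2  # the write-up uses 10; 2 keeps the sample small

# Rows with the largest trip totals, in descending order.
top_stations = station_df.nlargest(TOP_N, "total trips")

# Flag membership, similar in spirit to an in/out set.
station_df["In Top N"] = station_df["station name"].isin(
    top_stations["station name"]
)
```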
After reviewing the visualizations, I concluded the following:
Below is the final Tableau Story. You can also view it on the Tableau Public site: citibike Analysis
CitiBike Data Sources: