CRISP-DM Project of Udacity Data Scientist Nanodegree
No libraries beyond the Anaconda distribution of Python are needed to run the code here. The code should run without issues on any Python 3.* version.
You will need to download the IPL dataset from Kaggle. The data is available for download here.
The dataset has two files:
1. matches.csv, with the details of every match from 2008 to 2019, and
2. deliveries.csv, with ball-by-ball details for every match.
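As a rough illustration of how the two files relate, here is a minimal sketch using only the standard library. The column names (`id`, `match_id`, `total_runs`, `venue`) and the tiny inline rows are assumptions made for the example, not taken from the actual files; check the CSV headers after downloading and adjust as needed.

```python
import csv
import io
from collections import defaultdict

# Tiny inline stand-ins for matches.csv and deliveries.csv.
# Column names and values are assumed for illustration only.
MATCHES_CSV = """id,season,team1,team2,venue,toss_winner,toss_decision,winner
1,2008,Team A,Team B,Ground X,Team A,field,Team A
2,2008,Team B,Team C,Ground Y,Team C,bat,Team B
"""

DELIVERIES_CSV = """match_id,inning,over,ball,batting_team,total_runs
1,1,1,1,Team B,4
1,1,1,2,Team B,1
2,1,1,1,Team C,6
"""


def read_rows(handle):
    """Read a CSV file handle into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle))


def runs_per_match(deliveries):
    """Sum the runs of every delivery, grouped by match identifier."""
    totals = defaultdict(int)
    for row in deliveries:
        totals[row["match_id"]] += int(row["total_runs"])
    return dict(totals)


matches = read_rows(io.StringIO(MATCHES_CSV))
deliveries = read_rows(io.StringIO(DELIVERIES_CSV))
totals = runs_per_match(deliveries)

# Join the ball-by-ball totals back onto the match-level rows.
for match in matches:
    print(match["id"], match["venue"], totals.get(match["id"], 0))
```

With the real files, `io.StringIO(...)` would be replaced by `open("matches.csv", newline="")` and `open("deliveries.csv", newline="")`.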
This is a Udacity Nanodegree project. For this project, I was interested in using the IPL dataset from 2008-2019 to better understand team statistics, venue statistics and winning statistics:
- What is the probability of winning the game at a particular venue based on the decision to field or bat first after winning the toss?
- Which wicketkeeper has the most dismissals?
- Does home ground advantage have any effect on the result of the game?
- Different ML models to predict the winning team with features:
  - Team 1 Name
  - Team 2 Name
  - Venue
  - Toss Winner
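The toss question in the list above can be approached by grouping matches by venue and toss decision and counting how often the toss winner also won the match. The sketch below is one possible way to do this and is not necessarily how the notebook does it; the field names (`venue`, `toss_winner`, `toss_decision`, `winner`) and the sample rows are assumptions made for illustration.

```python
from collections import defaultdict


def toss_win_rate(matches):
    """Return the share of matches won by the toss winner,
    keyed by (venue, toss_decision)."""
    counts = defaultdict(lambda: [0, 0])  # [toss-winner wins, matches played]
    for match in matches:
        key = (match["venue"], match["toss_decision"])
        counts[key][1] += 1
        if match["winner"] == match["toss_winner"]:
            counts[key][0] += 1
    return {key: wins / played for key, (wins, played) in counts.items()}


# Hypothetical rows standing in for records read from matches.csv.
sample_matches = [
    {"venue": "Ground X", "toss_winner": "Team A", "toss_decision": "field", "winner": "Team A"},
    {"venue": "Ground X", "toss_winner": "Team B", "toss_decision": "field", "winner": "Team A"},
    {"venue": "Ground X", "toss_winner": "Team A", "toss_decision": "bat", "winner": "Team A"},
    {"venue": "Ground Y", "toss_winner": "Team C", "toss_decision": "bat", "winner": "Team B"},
]

rates = toss_win_rate(sample_matches)
for (venue, decision), rate in sorted(rates.items()):
    print(f"{venue} / {decision} first: {rate:.2f}")
```

These are empirical frequencies, so venues with only a handful of matches will give noisy estimates; a minimum match count per venue may be worth applying before reading much into the numbers.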
IPL_Predictive_Analytics.ipynb: Notebook containing the data analysis and modelling.
matches.csv: Details of every match from 2008-2019.
deliveries.csv: Ball-by-ball details of every match from 2008-2019.
The main findings of the code can be found at the post available here.
Credit goes to Kaggle and Navaneesh Kumar for the data. You can find the licensing for the data and other descriptive information at the Kaggle link available here. Otherwise, feel free to use the code here as you would like!