akshit3797/Write-A-Data-Science-Blog-Post

Write-A-Data-Science-Blog-Post

CRISP-DM Project of Udacity Data Scientist Nanodegree

Table of Contents

  1. Installation
  2. Project Motivation
  3. File Descriptions
  4. Results
  5. Licensing, Authors, and Acknowledgements

Installation

No libraries beyond the Anaconda distribution of Python should be needed to run the code. The code should run without issues on any Python 3.* version.

You will need to download the IPL dataset from Kaggle. You can find the data to download here.

The dataset has two files:
1. matches.csv, with the details of every match from 2008 to 2019
2. deliveries.csv, with ball-by-ball details for every match
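As a quick check that the download worked, the two files can be read and inspected before opening the notebook. The snippet below is a minimal sketch using only the standard library; the helper names `read_csv_rows` and `column_names` are hypothetical, and it assumes both CSV files sit in the current working directory. The notebook itself may load the data differently.

```python
import csv
import os


def read_csv_rows(path):
    """Read a CSV file into a list of dicts keyed by its header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def column_names(rows):
    """Return the column names of the first row, or an empty list if there are no rows."""
    return list(rows[0].keys()) if rows else []


if __name__ == "__main__":
    # Assumption: the Kaggle files were saved next to this script.
    for filename in ("matches.csv", "deliveries.csv"):
        if os.path.exists(filename):
            rows = read_csv_rows(filename)
            print(filename, len(rows), "rows;", "columns:", column_names(rows))
        else:
            print(filename, "not found; download it from Kaggle first")
```

If either file is reported as missing, place it in the same folder as the notebook before running the analysis.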

Project Motivation

This is a Udacity Nanodegree project. I was interested in using the IPL dataset from 2008-2019 to better understand team, venue, and winning statistics:

  1. What is the probability of winning a game at a particular venue, given the toss winner's decision to field or bat first?
  2. Which wicketkeeper has the most dismissals?
  3. Does home ground advantage have any effect on the result of the game?
  4. How do different ML models perform at predicting the winning team, given these features:
    • Team 1 Name
    • Team 2 Name
    • Venue
    • Toss Winner
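The first question can be approximated with a simple empirical frequency: for each venue and toss decision, the share of matches that the toss winner went on to win. The sketch below illustrates that idea and is not necessarily how the notebook computes it. It assumes each match row is a dict with the columns `venue`, `toss_decision`, `toss_winner`, and `winner`; check these names against the header of matches.csv. The function name `toss_win_rates` is hypothetical.

```python
from collections import defaultdict


def toss_win_rates(match_rows):
    """Estimate P(toss winner wins the match) for each (venue, toss decision) pair.

    Assumed columns (verify against matches.csv): venue, toss_decision,
    toss_winner, winner. Matches without a recorded winner are skipped.
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [toss-winner wins, matches played]
    for row in match_rows:
        winner = (row.get("winner") or "").strip()
        if not winner:
            continue  # no result recorded, so the match cannot count either way
        key = (row["venue"], row["toss_decision"])
        counts[key][1] += 1
        if winner == row["toss_winner"]:
            counts[key][0] += 1
    return {key: wins / total for key, (wins, total) in counts.items()}


# Tiny made-up sample, for illustration only (not real IPL results).
sample = [
    {"venue": "Ground A", "toss_decision": "field", "toss_winner": "Team X", "winner": "Team X"},
    {"venue": "Ground A", "toss_decision": "field", "toss_winner": "Team Y", "winner": "Team X"},
    {"venue": "Ground A", "toss_decision": "bat", "toss_winner": "Team X", "winner": "Team X"},
    {"venue": "Ground B", "toss_decision": "bat", "toss_winner": "Team Z", "winner": ""},
]
rates = toss_win_rates(sample)
```

Venues with only a handful of matches will give noisy estimates, so it is worth looking at the match count behind each rate before drawing conclusions.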

File Descriptions

IPL_Predictive_Analytics.ipynb: Notebook containing the data analysis and modelling.
matches.csv: Details of every match from 2008-2019.
deliveries.csv: Ball-by-ball details of every match from 2008-2019.

Results

The main findings of the code can be found at the post available here.

Licensing, Authors, and Acknowledgements

Credit goes to Kaggle and Navaneesh Kumar for the data. The licensing for the data and other descriptive information can be found at the Kaggle link available here. Otherwise, feel free to use the code here as you would like!