An exploratory data analysis (EDA) and various statistical tests performed on a dataset focused on stroke prediction. The analysis includes linear and logistic regression models, univariate descriptive analysis, ANOVA, and chi-square tests, among others.

ankitlehra/Stroke-Prediction-Dataset---Exploratory-Data-Analysis

Exploratory Data Analysis on Stroke Prediction Dataset

This repository contains an exploratory data analysis (EDA) and a set of statistical tests performed on a stroke prediction dataset. The analysis includes linear and logistic regression models, univariate descriptive analysis, ANOVA, and chi-square tests, among others. The dataset covers factors such as age, work type, smoking status, hypertension, and marital status, and the analysis examines how each may be associated with the likelihood of stroke.

Files in this Repository

  • Data_Documentation.docx: Detailed documentation of the dataset and methodology followed.

  • Graphs & Models:

    • Linear Regression: Analysis of the relationship between age and average glucose levels.
    • Logistic Regression: Binary outcome predictions of stroke occurrence.
    • Univariate Descriptive Analysis: Summary statistics for key variables.
    • ANOVA Test: Comparison of group means using analysis of variance.
    • Chi-Square, Odds Ratio (OR), and Relative Risk (RR) Tests: Relationships between categorical variables.
    • Forecast Model: A model predicting glucose levels based on age.
  • Screenshots: Visual representations of the various graphs and statistical tests conducted.
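
For readers who want to reproduce the linear regression and forecast steps outside Excel, here is a minimal sketch in Python. It fits an ordinary least-squares line and uses it to predict a glucose level from an age. The function names `fit_line` and `forecast` are hypothetical, and the sample values are made up for illustration only. They are not taken from the stroke dataset or from the workbook.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def forecast(slope, intercept, x):
    """Predict y for a new x using the fitted line."""
    return intercept + slope * x


# Illustrative, made-up values (NOT from the stroke dataset).
ages = [20, 30, 40, 50, 60]
glucose = [85.0, 90.0, 95.0, 100.0, 105.0]

slope, intercept = fit_line(ages, glucose)
predicted = forecast(slope, intercept, 70)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, forecast at age 70={predicted:.1f}")
```

This is the same calculation that Excel's SLOPE, INTERCEPT, and FORECAST worksheet functions perform, although the workbook may build its models differently.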

How to Use

  1. Download the project files from the repository.
  2. Open the Excel workbook and navigate through the sheets to review each analysis.
  3. Refer to the Data_Documentation.docx for a detailed explanation of each test and analysis conducted.

Tools Used

  • Microsoft Excel for analysis and statistical testing.
  • Various statistical methods including regression, ANOVA, and chi-square tests.
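
As a rough illustration of the chi-square, odds ratio, and relative risk calculations, the sketch below computes all three from a single 2x2 table in plain Python. The function name `two_by_two_stats` and the cell counts are hypothetical and chosen only for illustration. The chi-square value is the Pearson statistic without continuity correction, which may differ from the variant used in the workbook.

```python
def two_by_two_stats(a, b, c, d):
    """Summary statistics for a 2x2 table.

    a: exposed, outcome present     b: exposed, outcome absent
    c: unexposed, outcome present   d: unexposed, outcome absent
    """
    n = a + b + c + d
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    relative_risk = risk_exposed / risk_unexposed
    odds_ratio = (a * d) / (b * c)
    # Pearson chi-square for a 2x2 table (1 degree of freedom, no correction).
    chi_square = n * (a * d - b * c) ** 2 / (
        (a + b) * (c + d) * (a + c) * (b + d)
    )
    return {
        "relative_risk": relative_risk,
        "odds_ratio": odds_ratio,
        "chi_square": chi_square,
    }


# Made-up counts for illustration, e.g. hypertension (exposure) vs. stroke (outcome).
stats = two_by_two_stats(a=20, b=80, c=10, d=90)
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

A relative risk or odds ratio above 1 suggests the outcome is more common in the exposed group. The chi-square statistic is then compared against the chi-square distribution with one degree of freedom to judge whether the association is statistically significant.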

Visual Representation

The visual representation of the analysis can be found in the screenshots folder, which includes the following:

  • Linear Regression Analysis
  • Logistic Regression Model
  • Forecast Model
  • ANOVA, Chi-Square, and RR Tests

Dataset

The dataset used in this analysis focuses on stroke prediction based on medical and lifestyle attributes like age, hypertension, work type, smoking status, and more. You can access the dataset on Kaggle: Stroke Prediction Dataset
