Algorithm in python to determine how many people had stroke by age using linear regression and answering other questions.
Linear Regression is a supervised learning algorithm in machine learning that supports finding the linear correlation among variables. The model consists of the relationship between a dependent (Y) and an independent variable (X). The dependent variable is predicted by using the independent variable using a fitted equation model. linear equation is defined by y = a + bx.
Coefficient of determination, in statistics, R2 (or r2), a measure that assesses the ability of a model to predict or explain an outcome in the linear regression setting. More specifically, R2 indicates the proportion of the variance in the dependent variable (Y) that is predicted or explained by linear regression and the predictor variable (X, also known as the independent variable). The closer the r-squared (R2) value is to 1, the better the fit.
The resulting equation of the line was Y = 14.70962255*X -97.58113729998729 (R2 = 0.81), for effect of age of people that suffered stroke.
In this code different ways of using some statistics like Student's t-Test and confidence interval were presented and discussed. The Rpy2 module was even used, which allows the use of the R programming language within Python programming language.