The first file, StatisticalLearning.r, contains five questions, which cover factor selection, initial analysis, and statistical learning on both qualitative and quantitative data.
Demonstrate best subset selection, forward selection, and backward elimination to identify a good subset of predictors in Auto to include in a statistical model. Auto is a dataset available at http://www-bcf.usc.edu/~gareth/ISL/Auto.data
Explore how to do an initial analysis using the Boston dataset, which is also a dataset in ISLR.
How many columns?
What is the range of each quantitative variable?
Is there any relationship between pairs of variables?
How many observations satisfy given constraints?
Train classifiers on the Weekly dataset, which is also a dataset in ISLR, using logistic regression, linear discriminant analysis, quadratic discriminant analysis, and K-nearest neighbors.
Create a function that applies linear regression to binary outcomes.
Train, test, and predict sales, the response, from the other factors in ads using linear regression. ads is a dataset available at http://www-bcf.usc.edu/~gareth/ISL/Advertising.csv
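The last task above (train, test, and predict with linear regression) can be sketched roughly as follows. This is a minimal illustration, not the code in StatisticalLearning.r: it uses a synthetic data frame instead of downloading Advertising.csv, and the column names, coefficients, and 70/30 split are all assumptions made so the example runs on its own.

```r
# Synthetic stand-in for the ads data (assumption: the real CSV is not read here).
set.seed(1)
n <- 200
ads <- data.frame(
  TV        = runif(n, 0, 300),
  radio     = runif(n, 0, 50),
  newspaper = runif(n, 0, 100)
)
# Hypothetical data-generating process, chosen only for illustration.
ads$sales <- 3 + 0.045 * ads$TV + 0.19 * ads$radio + rnorm(n, sd = 1.5)

# Hold out 30% of the rows as a test set (the split ratio is an assumption).
train_idx <- sample(seq_len(n), size = floor(0.7 * n))
train <- ads[train_idx, ]
test  <- ads[-train_idx, ]

# Fit on the training rows: sales is the response, all other columns are predictors.
fit <- lm(sales ~ ., data = train)

# Predict on the held-out rows and compute the test mean squared error.
pred <- predict(fit, newdata = test)
test_mse <- mean((test$sales - pred)^2)
print(test_mse)
```

With the real data, the synthetic block would be replaced by reading the CSV into ads; the fitting and prediction steps stay the same.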
The second file, NonlinearRegressions.r, contains five questions, which cover cross-validation, generalized additive models, classification trees, random forests, gradient boosted machines, and regularized generalized linear models on quantitative data.
Compare errors across polynomial terms of different degrees using leave-one-out cross-validation.
Model out-of-state tuition against the other factors in the College dataset using gam. College is a dataset in ISLR.
Perform cross-validation to choose the optimal number of cuts on Age, a factor in the Wage dataset, and then fit a regression to predict wage.
In the lab, a classification tree was applied to the Carseats data set after converting Sales into a qualitative response variable. Now we will seek to predict Sales using regression trees and related approaches, treating the response as a quantitative variable.
Use boosting (and bagging) to predict Salary in Hitters, a dataset in ISLR. Also, compare the mean squared errors of linear regression and ridge regression.
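As a rough illustration of the first task in this list (comparing leave-one-out cross-validation errors across polynomial degrees), the sketch below uses only base R. It is not the code in NonlinearRegressions.r: the built-in mtcars data, the mpg ~ hp model, and the helper name loocv_mse are assumptions chosen so the example is self-contained. It relies on the leverage shortcut, which gives the exact LOOCV error for least-squares fits without refitting the model n times.

```r
# LOOCV error for a least-squares fit via the leverage shortcut:
#   CV = mean(((y_i - yhat_i) / (1 - h_i))^2)
# This identity is exact for linear models fitted by least squares.
loocv_mse <- function(fit) {
  h <- hatvalues(fit)                  # leverage of each observation
  mean((residuals(fit) / (1 - h))^2)
}

# Fit polynomials of degree 1 to 5 and record the LOOCV error of each.
degrees <- 1:5
cv_errors <- sapply(degrees, function(d) {
  fit <- lm(mpg ~ poly(hp, d), data = mtcars)
  loocv_mse(fit)
})
names(cv_errors) <- paste0("degree_", degrees)
print(cv_errors)

# Degree with the smallest estimated test error.
best_degree <- degrees[which.min(cv_errors)]
print(best_degree)
```

For models other than least-squares regression the shortcut does not apply, and each observation has to be held out and the model refitted explicitly.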
The folder, ExploratoryDataAnalysis, has an rmd file, a pdf file, and a dataset folder. The pdf file was knitted from the rmd file. First, I performed exploratory data analysis on the datasets provided for Bronx, Brooklyn, Manhattan, Queens, and Staten Island to visualize and compare residential buildings. Next, I performed exploratory data analysis on the provided datasets nyt1.csv, nyt2.csv, and nyt3.csv to visualize some metrics and distributions over time.
Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani. An Introduction to Statistical Learning with Applications in R. Springer, 2014. (1) This book is available free from the authors' web site at http://www-bcf.usc.edu/~gareth/ISL/ISLR%20Fourth%20Printing.pdf
Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Second Edition. Springer, 2013 (Tenth Printing). (2) This book is available free from the authors' web site at http://statweb.stanford.edu/~tibs/ElemStatLearn/printings/ESLII_print10.pdf