Early assessments (courses 1-4) were mostly completed on DataCamp. Once productivity tools such as RStudio and GitHub were introduced in course 5, assignments were completed as .R scripts.
- Basic R syntax, data types, vector arithmetic, indexing, sorting (including with `dplyr`), and basic plotting.
- Data visualization principles, creating custom plots with `ggplot2`, and studying the advantages and pitfalls of widely used plots.
- Probability theory concepts, including random variables, independence, and the central limit theorem; performing Monte Carlo simulations; and computing expected values and standard errors.
- Defining population parameters, estimates, standard errors, and margins of error in order to make predictions from data. Modeling data aggregated from different sources, Bayesian statistics, and predictive modeling.
- Introduction to the command line and file system, version control with Git, and the productivity tools in RStudio.
- Importing data from different file formats, web scraping, tidy data with `tidyverse`, processing strings with regex, wrangling data with `dplyr`, handling date and time formats, and text mining.
- Deriving linear regression mathematically, detecting and explaining confounding, and implementing linear regression to understand the relationships between variables.
- Machine learning basics, cross-validation to avoid overtraining, popular machine learning algorithms from the `caret` package, and regularization where appropriate.
- Applying the skills learned throughout the series to a real-world problem through an independent data analysis project. See the README file in the 9-Capstone folder for project descriptions.