Learning R: 2019 Summer Workshop for NYU CDSC Lab

Before we begin, this workshop pulls from resources written by a lot of amazing people and they deserve credit for it!

A number of the book chapters and other resources we are reading were written by Hadley Wickham, Danielle Navarro, Jenny Bryan, Jim Hester, Kieran Healy, and Andy Field. Several of the tutorials we are working through are from a course that was taught by Dale Barr and Lisa DeBruine.

Getting your data ready for statistical analysis

Week 0: Installing R

Downloading R: Download the appropriate version for your operating system (Mac or Windows)
Downloading RStudio: RStudio makes it much easier to code in R

Week 1: Intro to R

Reading + exercises: Learning basics about R (Part 1)
Reading + exercises: Learning basics about R (Part 2)
Reading: More about packages
Reading: More about variables
Reading: More about vectors
Resource: Cheat sheet on how to use RStudio
Resource: Cheat sheet on basic R functions

Week 2: Data Visualization

Reading: What makes a good plot?
Notes + exercises: Making plots
Reading + exercises: Getting a better understanding of the code used to make plots (Chapter 3, especially 3.3-3.10)
Resource: Examples of plots with corresponding R code
Resource: Guide to help you select the best way to visualize your data
Resource: R Graphics Cookbook: A practical guide to help you build graphs in R
Resource: Cheat sheet on data visualization

Week 3: Data Wrangling Part 1: Learning the fundamentals of tidyr

Reading: Tidy Data
Reading: Using pipes to tidy data
Notes + exercises: Learning tidyr
Reading (optional): Manipulating your data using tidyr (this reading covers similar material to the other readings, but may be useful if the other materials weren't clear)
Resource: Cheat sheet on importing and tidying data
Resource: Cheat sheet on processing dates using lubridate

Week 4: Data Wrangling Part 2: Learning the fundamentals of dplyr

Reading: Describing data
Reading + exercises: Data transformation
Notes + exercises: Learning the main 6 dplyr verbs
Resource: Cheat sheet on data transformation with dplyr

Week 5: Joining Data: More fundamentals of dplyr

Reading + exercises: Relational data
Notes + exercises: Joining data using dplyr's join verbs
Resource: Cheat sheet on data transformation with dplyr

Week 6: Iterations, Loops, Branches, and Functions

Reading + exercises: Using loops in R
Reading + exercises: Using branches in R
Reading + exercises: Creating your own functions in R
Notes + exercises: Iterating and more practice creating your own functions in R
Reading (optional): More about loops and iterating in R
Reading (optional): More about writing your own functions in R

Week 7: Reproducible Workflow and Other Best Practices

Reading: Using R Markdown
Notes + exercises: Creating reproducible code in R
Reading: How to properly set paths
Slides: How to name files
Reading: How to debug your R code
Resource: Cheat sheet on R Markdown

Week 8: Version Control

Reading: Why GitHub?
Reading + exercises: Installing Git (Read Chapters 4-7; 8 is optional)
Reading + exercises: Connecting GitHub and RStudio (Chapters 9 & 12; 14 is helpful if you are having problems connecting!)
Reading + exercises: Using GitHub to store R code (Read through Chapter 15; 16 and 17 are for your reference for future projects)
Reading + exercises: Basics of Git (Chapter 20; Chapters 21-23 for more advanced stuff)

Moving forward: Other things you should learn about R:

Introductory statistics

Reading: Learning statistics with R: A tutorial for psychology students: Great book for learning how to do the stats from an introductory stats course in R
Chapter 1: Why do we learn statistics?
Chapter 2: Introduction to research design
Chapter 5: Descriptive statistics
Chapters 9-11: Statistical theory: Introduction to probability, Estimating unknown quantities from a sample, and Hypothesis testing
Chapters 12-16: Statistical tools: Categorical data analysis, Comparing two means, Comparing several means, Linear regression, and Factorial ANOVA
Chapter 17: Bayesian statistics
Book: Discovering statistics using R: Covers similar content to Learning statistics with R, plus much more
Exercises: Interactive tutorials for learning how to do statistics in R (companion to An Adventure in Statistics, a book by Andy Field. The book is not free, but the tutorials are!)

Advanced statistics

Frequentist approaches:
Reading + exercises: Mixed models in R: Tutorial on how to build mixed models in R (best for people who already have some understanding of what mixed models are)
Exercises: Interactive tutorials on statistics in R (includes tutorials on multilevel models and growth models)
Reading: Psychometrics in R
Reading: Exploring interactions using the `interactions` package: Great package for exploring interactions, particularly interactions with one or more continuous variables

Bayesian approaches:
Book: Bayesian statistics in R (plus the solutions for the book's exercises)

Best coding practices

Reading: Style Guide: How to write code that is easy for you and others to read
Chapter 2: General syntax style guide
Chapter 4: Style guide for using pipes

Advanced topics in R

Improve your programming skills and gain a deep understanding of the R language:
Book: Advanced R: Book on advanced topics in R, including a more in-depth discussion of the foundations of R plus chapters on functional programming, metaprogramming, and performant code. If you can understand the concepts in this book, you will have a strong foundation for learning any programming language.

Manipulating strings and pattern matching in R using regular expressions:
Reading + exercises: Book chapter on manipulating strings
Reading + video tutorial: Base R functions for working with regular expressions
Reading: Using stringr (a tidyverse package) for regular expressions
Resource: Cheat sheet for basic regular expressions in R

data.table: An alternative approach for wrangling data

You may be asking: if I can already wrangle my data using tidyverse, why learn another approach? While I personally prefer to use tidyverse or base R functions for manipulating data (I find these approaches to be more readable and user-friendly), data.table is more efficient. That is, code using data.table runs more quickly, which is particularly useful when working with large datasets. Many psychology researchers wouldn't notice any gains in speed by implementing their code using data.table (our datasets just aren't that big). But! If you work with big datasets (like hundreds of thousands to millions of rows of data), data.table can help speed up data processing.

Reading: Introduction to data.table: Vignette on data.table's syntax and how to perform actions comparable to those in tidyverse's dplyr and tidyr packages

Simulating data using R

When pre-registering your study, one best practice is to also pre-register all the R code you will use for your analyses. How do you write code without data? One way: simulate a dataset and use that data as you work through your analyses.
Reading: Getting started simulating data in R: Blog post with a good introduction to some of the functions you will use when simulating a dataset.
Reading + exercises: Lab to practice simulating data using R
Slides + exercises: Slides on simulating data using R: These slides include a series of exercises to work through as you go
Book: Introduction to Scientific Programming and Simulation Using R: This book assumes no prior experience in programming or probability. Section 3 on Probability and Section 4 on Simulation are the most relevant for those who already have experience programming in R (i.e., you are familiar with the earlier programming topics discussed in this syllabus).
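
To make the idea of writing analysis code before you have data more concrete, here is a minimal sketch in base R. Everything in it is an assumption made for illustration: the two-group design, the variable names (`sim_data`, `condition`, `score`), the sample size, and the means and standard deviation are hypothetical and do not come from any of the readings above.

```r
# Hypothetical two-group study; all names and parameter values are illustrative assumptions.
set.seed(2019)  # fix the random seed so the simulated data are reproducible

n_per_group <- 50

sim_data <- data.frame(
  id = seq_len(2 * n_per_group),
  condition = rep(c("control", "treatment"), each = n_per_group),
  score = c(
    rnorm(n_per_group, mean = 100, sd = 15),  # control group
    rnorm(n_per_group, mean = 105, sd = 15)   # treatment group (assumed 5-point difference)
  )
)

# Write and check the planned analysis against the simulated data.
t_result <- t.test(score ~ condition, data = sim_data)
print(t_result)
```

Once the real data are collected, the same analysis code can be run on them by swapping `sim_data` for the real data frame, provided the column names match.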

General Resources

Cheat sheets

Various cheat sheets on a range of topics, including dplyr, ggplot2, R Markdown, and more!

Books / Tutorials

psyTeachR: Great resource that provides a number of interactive books and tutorials for doing reproducible research in R. This website covers a broad range of topics on data cleaning, visualization, reproducible workflows, and more. From their website: "Our curriculum now emphasizes essential ‘data science’ graduate skills that have been overlooked in traditional approaches to teaching, including programming skills, data visualisation, data wrangling and reproducible reports. Students learn about probability and inference through data simulation as well as by working with real datasets."