case study and key finalized
sbkellogg committed Jul 16, 2023
1 parent 0cf6880 commit 7b0cbbe
Showing 3 changed files with 283 additions and 248 deletions.
96 changes: 55 additions & 41 deletions laser-lab-case-study-key.Rmd
@@ -306,35 +306,33 @@ which is part of the {tidyverse} suite of packages.

### Selecting variables

Let's practice selecting a few variables using a very powerful operator
called a pipe.

> **Pipes** are a powerful tool for combining a sequence of functions or
> processes.
The original pipe operator, `%>%`, comes from the
{[magrittr](https://magrittr.tidyverse.org)} package but all packages in
the tidyverse load `%>%` for you automatically, so you don't usually
load magrittr explicitly. The pipe has become such a useful and much
used operator in R that it is now baked into R using the new and simpler
native pipe `|>` operator. You can use both fairly interchangeably, but
there are a few [differences between pipe
operators](https://www.tidyverse.org/blog/2023/04/base-vs-magrittr-pipe/).
Recall from our Prepare section that we are interested in the relationship
between the time students spend on a course and their final course
grade.

Let's practice selecting a few variables by introducing a very powerful
`|>` operator called a **pipe**. Pipes are a powerful tool for combining
a sequence of functions or processes.

Run the following code chunk to "pipe" our `sci_data` to the `select()`
function and include the following two variables as arguments:

- `FinalGradeCEMS` (i.e., students' final grades on a 0-100 point
scale)

Run the following code chunk to "pipe" our `sci_data` to the `select()`
function with the `student_id` and `course_id` variables as arguments:
- `TimeSpent` (i.e., the number of minutes they spent in the course's
learning management system)

```{r}
sci_data |>
select(student_id, course_id)
select(FinalGradeCEMS, TimeSpent)
```

Notice how the number of columns (variables) is now different.
Notice how the number of columns (variables) is now different!

Let's *include one additional variable* in your select function.
`FinalGradeCEMS` (i.e., students' final grades on a 0-100 point scale),
and `TimeSpent` (i.e., the number of minutes they spent in the course's
learning management system).
Let's *include one additional variable* in the select function that you
think might be a predictor of students' final course grade or useful in
addressing our research question.

First, we need to figure out what variables exist in our dataset (or be
reminded of this - it's very common in R to be continually checking and
@@ -356,12 +354,21 @@ code.

```{r}
sci_data |>
select(student_id, total_points_possible, total_points_earned)
select(FinalGradeCEMS, TimeSpent)
```

Once added, the output should be different than in the code above -
there should now be an additional variable included in the print-out.

**A quick footnote about pipes**: The original pipe operator, `%>%`,
comes from the {[magrittr](https://magrittr.tidyverse.org)} package but
all packages in the tidyverse load `%>%` for you automatically, so you
don't usually load magrittr explicitly. The pipe has become such a
useful and much used operator in R that it is now baked into R using the
new and simpler native pipe `|>` operator. You can use both fairly
interchangeably but there are a few [differences between pipe
operators](https://www.tidyverse.org/blog/2023/04/base-vs-magrittr-pipe/).
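To make the idea of a pipe concrete, here is a minimal sketch that uses only base R, so it does not depend on `sci_data` or any package. The `values` vector is made up for illustration, and the native pipe assumes R 4.1 or later. It compares a nested call with the same computation written as a pipeline.

```r
# Hypothetical numbers, used only to illustrate the pipe.
values <- c(1, 4, 9)

# Nested call: read from the inside out.
nested <- sum(sqrt(values))

# Piped call: read from left to right. Each step passes its
# result to the first argument of the next function.
piped <- values |> sqrt() |> sum()

# Both approaches compute the same thing.
identical(nested, piped)
```

One of the differences mentioned above: the native pipe needs a function call with parentheses on its right-hand side (`values |> sqrt()`), whereas `%>%` also accepts a bare function name.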

### Filtering variables

Next, let's explore filtering variables. Check out and run the next
@@ -392,15 +399,14 @@ a thought (or more) below:

### Arrange

The last function we'll use for preparing tables is arrange.

We'll combine this `arrange()` function with a function we used
already - `select()`. We do this so we can view only the student ID and
their final grade.
The last function we'll use for preparing tables is `arrange()`. We'll again
use the `|>` operator to combine `arrange()` with a function we
used already - `select()`. We do this so we can view only time spent and
final grades.

```{r}
sci_data |>
select(student_id, FinalGradeCEMS) |>
select(FinalGradeCEMS, TimeSpent) |>
arrange(FinalGradeCEMS)
```

@@ -410,10 +416,14 @@ as an argument with arrange, like the following:

```{r}
sci_data |>
select(student_id, FinalGradeCEMS) |>
select(FinalGradeCEMS, TimeSpent) |>
arrange(desc(FinalGradeCEMS))
```

From a quick glance at our two variables, it does appear that
students with higher grades also tend to have spent more time in the
online course.

#### **👉 Your Turn**

In the code chunk below, replace `FinalGradeCEMS` that is used with both
@@ -423,7 +433,7 @@ glimpsed at the names of all of the variables.

```{r}
sci_data |>
select(student_id, FinalGradeCEMS) |>
select(TimeSpent, FinalGradeCEMS) |>
arrange(desc(FinalGradeCEMS))
```

@@ -432,12 +442,11 @@ Can you compose a series of functions that include the `select()`,
output from one function to the next as when we used `select()` and
`arrange()` together in the code chunk above.

*This is not required/necessary to complete; it's just for those who
wish to do a bit more with these functions at this time (we'll do more
in our learning labs, too!)*

```{r}
# YOUR CODE HERE
sci_data |>
select(TimeSpent, FinalGradeCEMS) |>
filter(FinalGradeCEMS > 70) |>
arrange(FinalGradeCEMS)
```
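If you would like to check how `select()`, `filter()`, and `arrange()` behave together before running them on `sci_data`, the following sketch applies the same chain to a tiny data frame. The `toy_data` object and its values are hypothetical and made up for illustration. Only the column names are borrowed from the lab, and the {dplyr} package is assumed to be installed.

```r
library(dplyr)

# Hypothetical toy data; these values are invented and do not come from sci_data.
toy_data <- data.frame(
  student_id = 1:5,
  TimeSpent = c(120, 300, 45, 510, 260),
  FinalGradeCEMS = c(65, 82, 50, 95, 74)
)

result <- toy_data |>
  select(TimeSpent, FinalGradeCEMS) |>  # keep only these two columns
  filter(FinalGradeCEMS > 70) |>        # keep rows with a grade above 70
  arrange(desc(FinalGradeCEMS))         # sort from highest to lowest grade

result
```

With a table this small you can confirm by eye that the two students with grades of 70 or below are dropped and that the remaining rows are sorted from the highest grade down.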

## 3. EXPLORE
@@ -448,10 +457,12 @@ deviations of numeric variables, or counting the frequency of
categorical variables) and, often, visualizing your data. As we'll learn
in later labs, the explore phase can also involve the process of
"feature engineering," or creating new variables within a dataset
[@krumm2018]. In this section, we'll quickly pull together some basic
stats using a handy function from the {skimr} package, and introduce you
to a basic data visualization "code template" for the {ggplot2} package
from the tidyverse.
[@krumm2018].

In this section, we'll quickly pull together some basic stats using a
handy function from the {skimr} package, and introduce you to a basic
data visualization "code template" for the {ggplot2} package from the
tidyverse.

### Summary Statistics

@@ -461,7 +472,7 @@ a few variables and quickly gather some descriptive stats using the

```{r}
sci_data |>
select(course_id, FinalGradeCEMS) |>
select(TimeSpent, FinalGradeCEMS) |>
skim()
```

@@ -575,6 +586,9 @@ ggplot(sci_data) +
geom_histogram(aes(x = TimeSpent), fill = "green")
```

**Tip:** There is no shame in copying and pasting code from above.
Remember, reproducible research is also intended to help you save time!

### Scatterplots

Finally, let's create a scatter plot for the relationship between these