---
title: "Neural Networks"
author: "Gian Zlupko"
date: "1/28/2018"
output: html_document
---
## Part I - Introduction to Using Neural Nets
In the attached data sets attention1.csv and attention2.csv, you will find data that describe features associated with webcam images of 100 students' faces as they participate in an online discussion. The variables are:
- eyes - student has their eyes open (1 = yes, 0 = no)
- face.forward - student is facing the camera (1 = yes, 0 = no)
- chin.up - student's chin is raised above 45 degrees (1 = yes, 0 = no)
- squint - eyes are squinting
- hunch - shoulders are hunched over
- mouth1 - mouth is smiling
- mouth2 - mouth is frowning
- mouth3 - mouth is open
- attention - whether the student was paying attention when asked (1 = yes, 0 = no)
We will use the webcam data to build a neural net to predict whether or not a student is attending.
First, load the neuralnet package and the other libraries used:
```{r}
library(neuralnet)
library(readr)
library(dplyr)
```
Load the data:
```{r}
D1 <- read_csv("attention1.csv")
D2 <- read_csv("attention2.csv")
```
Now you can build a neural net that predicts attention based on the webcam data. The command "neuralnet" sets up the model. It takes four basic arguments:
- A formula that describes the inputs and outputs of the neural net (attention is our output)
- The data frame that the model will use
- How many nodes are in the hidden layer
- A threshold that tells the model when to stop adjusting weights to find a better fit: if the error does not change by more than the threshold from one iteration to the next, the algorithm stops. (We will use the default of 0.01, so if the prediction error changes by less than 0.01 between iterations, the algorithm will halt.)
```{r}
nn <- neuralnet(attention == 1 ~ eyes + face.forward + chin.up + squint + hunch + mouth1 + mouth2 + mouth3, D1, hidden = c(2,2), learningrate = 0.2)
plot(nn)
#The option "hidden" sets the number of hidden layers and the number of nodes within them: hidden = c(1,1) gives two hidden layers with one node each, hidden = 0 gives zero hidden layers, etc.
#The option "learningrate" alters the size of the steps the model takes each time it adjusts the weights. (Note: neuralnet only uses learningrate when algorithm = "backprop"; the default "rprop+" ignores it.)
#Change the hidden and learningrate options and check both the predictions and the accuracy
```
You have now trained a neural network! The plot shows the layers of your network as black nodes and edges, with the calculated weight on each edge. The blue nodes and edges are the bias/threshold terms. It is a little confusing that they are drawn as nodes; they are not nodes in the sense that the black nodes are. The bias anchors the activation function: the weights change the shape of the activation function, while the bias changes its overall position. If you have used linear regression, the bias is like the intercept of the regression equation: it shifts the trend line up and down the y-axis, while the other parameters change the angle of the line. The plot also reports the final error and the number of iterations ("steps") it took to reach these weights.
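To make the bias analogy concrete, here is a minimal base-R sketch (not part of the assignment code) of the logistic activation that neuralnet uses by default: the weight changes the steepness of the curve, while the bias shifts it left or right.
```{r}
# logistic activation with an adjustable weight w and bias b
sigmoid <- function(x, w = 1, b = 0) 1 / (1 + exp(-(w * x + b)))

x <- seq(-6, 6, length.out = 200)
plot(x, sigmoid(x), type = "l", ylab = "activation",
     main = "Weight changes shape; bias shifts position")
lines(x, sigmoid(x, w = 3), lty = 2)   # larger weight: steeper curve
lines(x, sigmoid(x, b = 2), lty = 3)   # nonzero bias: curve shifted
legend("topleft", legend = c("w=1, b=0", "w=3, b=0", "w=1, b=2"),
       lty = 1:3, bty = "n")
```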
What happens if you increase the number of hidden layers in the neural net? Build a second neural net with more or fewer layers and determine whether this improves your predictions. How can you tell if your new neural network is doing a better job than your first?
Now use your preferred neural net to predict the second data set. To use this command, you will need to create a new data frame (D3) that only includes the input variables.
```{r}
D3 <- subset(D2, select = c("eyes", "face.forward", "chin.up", "squint", "hunch", "mouth1", "mouth2", "mouth3"))
```
Now you can create predictions using your neural net
```{r}
#The code below will use your model to predict the outcome using D3 data
pred <- predict(nn, D3)
#The code below will tell you how accurate your model is at predicting the unseen data
table(D2$attention == 1, pred[, 1] > 0.5)
#Adjust both the hidden layer and learning rate and see if that has an impact on error, steps and prediction accuracy
nn2 <- neuralnet(attention == 1 ~ eyes + face.forward + chin.up + squint + hunch + mouth1 + mouth2 + mouth3, D1, hidden = c(4,2), learningrate = 0.1)
nn3 <- neuralnet(attention == 1 ~ eyes + face.forward + chin.up + squint + hunch + mouth1 + mouth2 + mouth3, D1, hidden = c(1,1), learningrate = 0.5)
# create custom function to calculate model accuracy based on confusion matrix
mod_acc <- function(model) {
  pred <- predict(model, D3)
  x <- table(D2$attention == 1, pred[, 1] > 0.5)
  true_neg <- x[1, 1]
  true_pos <- x[2, 2]
  false_pos <- x[1, 2]
  false_neg <- x[2, 1]
  correct <- (true_neg + true_pos) / (true_neg + true_pos + false_pos + false_neg)
  correct
}
mod_acc(nn)
mod_acc(nn2)
mod_acc(nn3)
# create list object to store model accuracies for comparison
fits <- list()
fits$nn <- mod_acc(nn)
fits$nn2 <- mod_acc(nn2)
fits$nn3 <- mod_acc(nn3)
fits
```
## Please answer the following questions:
1. How accurate is your neural net? How can you tell?
The three neural network models that I created - nn, nn2, and nn3 - are 97%, 98%, and 90% accurate, respectively. Each model's accuracy was calculated from its confusion matrix: accuracy is the sum of the true positives and true negatives that the model predicted, divided by the total number of predictions that the model made.
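As an illustration of that calculation (with made-up counts, not actual model output):
```{r}
# hypothetical 2x2 confusion matrix: rows = actual, columns = predicted
conf <- matrix(c(45, 3,
                 2, 50), nrow = 2, byrow = TRUE,
               dimnames = list(actual = c("FALSE", "TRUE"),
                               predicted = c("FALSE", "TRUE")))
# accuracy = (true negatives + true positives) / all predictions
(conf[1, 1] + conf[2, 2]) / sum(conf)   # (45 + 50) / 100 = 0.95
```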
2. How would you explain your model to the students whose behavior you are predicting?
I would explain that this model is able to predict whether a student is paying attention based on 8 observable features (eyes open, facing the camera, smiling, frowning, etc.).
3. This is a very simple example of a neural network. Real facial recognition is very complex though. Would a neural network be a good solution for predicting real facial movements? Why, why not?
A simple perceptron-style model like this would not be a good algorithm for facial recognition, but other neural network architectures, such as convolutional neural networks, can achieve high accuracy on recognition tasks. Neural networks could also be ensembled with appearance-based approaches like PCA (e.g., eigenfaces).
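For a sense of what a convolutional architecture looks like, here is a hypothetical sketch using the keras R package (an assumption: keras and a TensorFlow backend are installed, which is why the chunk is not evaluated). The input shape and layer sizes are illustrative, not tuned for any real dataset.
```{r, eval = FALSE}
library(keras)

# a tiny CNN for binary classification of small grayscale images
cnn <- keras_model_sequential() %>%
  layer_conv_2d(filters = 16, kernel_size = c(3, 3), activation = "relu",
                input_shape = c(48, 48, 1)) %>%  # 48x48 grayscale (assumed)
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_flatten() %>%
  layer_dense(units = 32, activation = "relu") %>%
  layer_dense(units = 1, activation = "sigmoid")  # binary output, like 'attention'

cnn %>% compile(optimizer = "adam", loss = "binary_crossentropy",
                metrics = "accuracy")
summary(cnn)
```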
## Repeat with your own data
The project below examines recruitment data from a fictional company and walks through two main approaches to analyzing recruitment data - prediction and explanation. The data used in this project come from Keith McNulty's peopleanalyticsdata package (available on CRAN), which accompanies the Handbook of Regression Modeling in People Analytics.
The first half of this notebook uses a neural network model to predict the likelihood that an individual was hired from the candidate attributes in the data (SAT, GPA, aptitude-test score, gender, and three interview scores). The second half uses logistic regression to predict hiring likelihood from the same variables. While the regression model was slightly less accurate 'off the shelf', regression allows a closer examination of the impact that each variable has on the overall likelihood of a candidate being hired. Both models are useful; which to use depends on the business case that needs to be solved.
Load, clean, and normalize data:
```{r}
library(peopleanalyticsdata)
data(recruiting)
# clean recruiting data
recruiting[recruiting == "F"] <- 1
recruiting[recruiting == "M"] <- 2
#summary(recruiting$gpa)
#sapply(recruiting, summary)
# data normalization ('feature scaling')
normalize <- function(x) {
  return((x - min(x)) / (max(x) - min(x)))
}
recruiting_numeric <- as.data.frame(lapply(recruiting, as.numeric))
recruiting_norm <- as.data.frame(lapply(recruiting_numeric, normalize))
```
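As a quick sanity check (a small addition to the original workflow), every column should now lie in [0, 1]:
```{r}
# min and max of each column after min-max scaling; all should be 0 and 1
sapply(recruiting_norm, range)
```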
Data Split - training and testing sets:
```{r}
# data split: 50/50 into training and testing sets (483 rows in each)
train_size <- floor(nrow(recruiting_norm) * 0.5)
test_size <- nrow(recruiting_norm) - train_size
set.seed(777)
selected <- sample(seq_len(nrow(recruiting_norm)),size = train_size)
train_set <- recruiting_norm[selected, ]
test_set <- recruiting_norm[-selected, ]
train_set <- as.data.frame(lapply(train_set, as.numeric))
test_set <- as.data.frame(lapply(test_set, as.numeric))
str(train_set)
str(test_set)
```
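A quick check (an optional addition) that the outcome is similarly balanced across the two halves:
```{r}
# proportion hired vs. not hired in each split
prop.table(table(train_set$hired))
prop.table(table(test_set$hired))
```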
Generate the first neural network model to predict the likelihood that an individual is hired based on their SAT score, GPA, and performance on an aptitude test.
```{r}
# hiring predictions with neural network
hiring_mod_1 <- neuralnet(hired == 1 ~ sat + gpa + apttest, train_set, hidden = c(2,2), learningrate = 0.2)
test_mod_1 <- subset(test_set, select = c("sat", "gpa", "apttest"))
pred <- predict(hiring_mod_1, test_mod_1)
table(test_set$hired == 1, pred[, 1] > 0.5)
```
The last line of code produces a confusion matrix for the predictions made by the first neural network model. To calculate the overall accuracy of the model's predictions, the following code chunk builds a custom function that can be reused throughout the subsequent iterations of model testing in this project.
Custom function to calculate model accuracy:
```{r}
conf_matrix <- table(test_set$hired == 1, pred[, 1] > 0.5)
# the function takes a confusion matrix table as input
model_accuracy <- function(x) {
  true_neg <- x[1, 1]
  true_pos <- x[2, 2]
  false_pos <- x[1, 2]
  false_neg <- x[2, 1]
  correct <- (true_neg + true_pos) / (true_neg + true_pos + false_pos + false_neg)
  correct
}
# test function output
model_accuracy(conf_matrix)
```
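As an aside, for a 2x2 table an equivalent one-liner sums the diagonal, since correct predictions sit on the diagonal of the confusion matrix:
```{r}
# same result as model_accuracy(conf_matrix)
sum(diag(conf_matrix)) / sum(conf_matrix)
```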
In the following code chunks I sequentially add inputs and layers to the neural network model. I also adjust the learning rate to try to improve predictive performance.
```{r}
# add remaining variables to the hiring model
hiring_mod_2 <- neuralnet(hired == 1 ~ sat + gpa + apttest + gender + int1 + int2 + int3, train_set, hidden = c(2,2), learningrate = 0.2)
test_mod_2 <- subset(test_set, select = c("sat", "gpa", "apttest", "gender", "int1", "int2", "int3"))
pred2 <- predict(hiring_mod_2, test_mod_2)
# generate confusion matrix
table(test_set$hired == 1, pred2[, 1] > 0.5)
# assess model accuracy
conf_matrix_2 <- table(test_set$hired == 1, pred2[, 1] > 0.5)
model_accuracy(conf_matrix_2)
plot(hiring_mod_2)
```
By adding the four remaining predictor variables, the neural network's accuracy in predicting who would be hired by the company increased from 79% to 93%. While this is a substantial improvement, the model may be improved further by adjusting its hidden layers, repetitions, or learning rate.
```{r}
# learning rate was adjusted to 0.1
hiring_mod_3 <- neuralnet(hired == 1 ~ sat + gpa + apttest + gender + int1 + int2 + int3, train_set, hidden = c(2,2), learningrate = 0.1)
test_mod_3 <- subset(test_set, select = c("sat", "gpa", "gender", "apttest", "int1", "int2", "int3"))
pred3 <- predict(hiring_mod_3, test_mod_3)
# generate confusion matrix
table(test_set$hired == 1, pred3[, 1] > 0.5)
# assess model accuracy
conf_matrix_3 <- table(test_set$hired == 1, pred3[, 1] > 0.5)
model_accuracy(conf_matrix_3)
```
Adding additional hidden layers
```{r}
# add additional hidden layers
hiring_mod_4 <- neuralnet(hired == 1 ~ sat + gpa + apttest + gender + int1 + int2 + int3, train_set, hidden = c(4,3,2,1), learningrate = 0.2)
test_mod_4 <- subset(test_set, select = c("sat", "gpa", "apttest", "gender", "int1", "int2", "int3"))
pred4 <- predict(hiring_mod_4, test_mod_4)
# generate confusion matrix
table(test_set$hired == 1, pred4[, 1] > 0.5)
# assess model accuracy
conf_matrix_4 <- table(test_set$hired == 1, pred4[, 1] > 0.5)
model_accuracy(conf_matrix_4)
# plot nn model 4
plot(hiring_mod_4)
```
While the neural networks predict the likelihood that a candidate will be hired with good accuracy, there are business cases in which it is helpful to understand the relationships between the predictor variables and the outcome. The following code chunks implement logistic regression and compare its accuracy to the neural network models. Then the components of the logistic regression model are inspected to better understand each variable's individual contribution to the likelihood that a candidate is hired.
Logistic Regression
```{r}
log_mod_1 <- glm(hired ~ sat + gpa + apttest + int1 + int2 + int3 , data = train_set, family = "binomial")
test_mod_5 <- subset(test_set, select = c("sat", "gpa", "apttest", "gender", "int1", "int2", "int3"))
pred5 <- predict(log_mod_1, test_mod_5, type = "response")
pred5 <- as.data.frame(pred5)
# assess model accuracy
conf_matrix_5 <- table(test_set$hired == 1, pred5[, 1] > 0.5)
model_accuracy(conf_matrix_5)
# view model fit
summary(log_mod_1)
```
A summary of the logistic regression model shows that SAT and GPA do not contribute significantly to the prediction. A second logistic regression model is created and tested below without SAT and GPA.
```{r}
log_mod_2 <- glm(hired ~ apttest + int1 + int2 + int3 , data = train_set, family = "binomial")
test_mod_6 <- subset(test_set, select = c("sat", "gpa", "apttest", "gender", "int1", "int2", "int3"))
pred6 <- predict(log_mod_2, test_mod_6, type = "response")
pred6 <- as.data.frame(pred6)
# assess model accuracy
conf_matrix_6 <- table(test_set$hired == 1, pred6[, 1] > 0.5)
model_accuracy(conf_matrix_6)
# view model fit
summary(log_mod_2)
# compare the models
anova(log_mod_1, log_mod_2, test = "Chisq")
```
The second model obtained a lower AIC value, which is an indication that it fits the data better relative to its complexity. However, a chi-squared test did not find that the two models were significantly different.
In any case, the use of logistic regression in this example, as a follow-up to the neural network predictions, shows the utility of regression analysis: it allows a closer examination of the individual contributions of the variables. While the neural network performs better without any parameter tuning, the logistic regression aids interpretability, which would be essential if we were designing interventions to improve this company's recruiting function.
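One way to make that interpretability concrete (a small addition, not part of the original analysis): exponentiating the logistic coefficients gives odds ratios. Because the predictors were min-max scaled, each odds ratio here reflects moving a predictor across its full observed range rather than by one raw unit.
```{r}
# odds ratios with profile-likelihood confidence intervals
exp(cbind(OR = coef(log_mod_2), confint(log_mod_2)))
```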
An accompaniment to this Rmd file can be found on Kaggle at https://www.kaggle.com/gianzlupko/recruiting-decisions-predicting-vs-explaining?scriptVersionId=57300078.