---
title: "Neural Networks"
author: "Gian Zlupko"
date: "1/28/2018"
output: html_document
---
## Part I - Introduction to Using Neural Nets
In the attached data sets attention1.csv and attention2.csv, you will find data that describe features associated with webcam images of 100 students' faces as they participate in an online discussion. The variables are:
- eyes - student has their eyes open (1 = yes, 0 = no)
- face.forward - student is facing the camera (1 = yes, 0 = no)
- chin.up - student's chin is raised above 45 degrees (1 = yes, 0 = no)
- squint - eyes are squinting
- hunch - shoulders are hunched over
- mouth1 - mouth is smiling
- mouth2 - mouth is frowning
- mouth3 - mouth is open
- attention - whether the student was paying attention when asked (1 = yes, 0 = no)
We will use the webcam data to build a neural net to predict whether or not a student is attending.
First, load the neuralnet package and the other libraries used:
```{r}
library(neuralnet)
library(readr)
library(dplyr)
```
Load the data:
```{r}
D1 <- read_csv("attention1.csv")
D2 <- read_csv("attention2.csv")
```
Now you can build a neural net that predicts attention based on the webcam data. The command "neuralnet" sets up the model. It takes four basic arguments:
- A formula that describes the inputs and outputs of the neural net (attention is our output)
- The data frame that the model will use
- How many nodes are in the hidden layer
- A threshold that tells the model when to stop adjusting weights to find a better fit: if the error does not change by more than the threshold from one iteration to the next, the algorithm stops. (We will use the default of 0.01, so if the prediction error changes by less than 0.01 between iterations, the algorithm will halt.)
```{r}
nn <- neuralnet(attention == 1 ~ eyes + face.forward + chin.up + squint + hunch + mouth1 + mouth2 + mouth3, D1, hidden = c(2,2), learningrate = 0.2)
plot(nn)
#The option "hidden" sets the number of hidden layers and the number of nodes within them: hidden = c(1,1) gives two hidden layers with one node each, hidden = 0 gives zero hidden layers, etc.
#The option "learningrate" alters the size of the steps the model takes each time it adjusts the weights. (Note: neuralnet only uses learningrate when algorithm = "backprop"; the default "rprop+" ignores it.)
#Change the hidden and learningrate options and check both the predictions and the accuracy
```
You have now trained a neural network! The plot shows the layers of your network as black nodes and edges, with the calculated weight on each edge. The blue nodes and edges are the bias/threshold terms. It is a little confusing that they are drawn as nodes; they are not nodes in the sense that the black nodes are. The bias anchors the activation function: the weights change the shape of the activation function, while the bias changes its overall position. If you have used linear regression, the bias is like the intercept of the regression equation: it shifts the trend line up and down the y-axis, while the other parameters change the angle of the line. The plot also reports the final error and the number of iterations ("steps") it took to reach these weights.
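To make the bias analogy concrete, here is a minimal base-R sketch (not part of the assignment code) of the logistic activation that neuralnet uses by default: the weight changes the steepness of the curve, while the bias shifts it left or right.
```{r}
# logistic activation with an adjustable weight w and bias b
sigmoid <- function(x, w = 1, b = 0) 1 / (1 + exp(-(w * x + b)))

x <- seq(-6, 6, length.out = 200)
plot(x, sigmoid(x), type = "l", ylab = "activation",
     main = "Weight changes shape; bias shifts position")
lines(x, sigmoid(x, w = 3), lty = 2)   # larger weight: steeper curve
lines(x, sigmoid(x, b = 2), lty = 3)   # nonzero bias: curve shifted
legend("topleft", legend = c("w=1, b=0", "w=3, b=0", "w=1, b=2"),
       lty = 1:3, bty = "n")
```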
What happens if you increase the number of hidden layers in the neural net? Build a second neural net with more or fewer layers and determine whether this improves your predictions. How can you tell if your new neural network is doing a better job than your first?
Now use your preferred neural net to predict the second data set. To use this command, you will need to create a new data frame (D3) that only includes the input variables.
```{r}
D3 <- subset(D2, select = c("eyes", "face.forward", "chin.up", "squint", "hunch", "mouth1", "mouth2", "mouth3"))
```
Now you can create predictions using your neural net
```{r}
#The code below will use your model to predict the outcome using D3 data
pred <- predict(nn, D3)
#The code below will tell you how accurate your model is at predicting the unseen data
table(D2$attention == 1, pred[, 1] > 0.5)
#Adjust both the hidden layer and learning rate and see if that has an impact on error, steps and prediction accuracy
nn2 <- neuralnet(attention == 1 ~ eyes + face.forward + chin.up + squint + hunch + mouth1 + mouth2 + mouth3, D1, hidden = c(4,2), learningrate = 0.1)
nn3 <- neuralnet(attention == 1 ~ eyes + face.forward + chin.up + squint + hunch + mouth1 + mouth2 + mouth3, D1, hidden = c(1,1), learningrate = 0.5)
# create custom function to calculate model accuracy based on confusion matrix
mod_acc <- function(model) {
  pred <- predict(model, D3)
  x <- table(D2$attention == 1, pred[, 1] > 0.5)
  true_neg <- x[1, 1]
  true_pos <- x[2, 2]
  false_pos <- x[1, 2]
  false_neg <- x[2, 1]
  correct <- (true_neg + true_pos) / (true_neg + true_pos + false_pos + false_neg)
  correct
}
mod_acc(nn)
mod_acc(nn2)
mod_acc(nn3)
# create list object to store model accuracies for comparison
fits <- list()
fits$nn <- mod_acc(nn)
fits$nn2 <- mod_acc(nn2)
fits$nn3 <- mod_acc(nn3)
fits
```
## Please answer the following questions:
1. How accurate is your neural net? How can you tell?
The three neural network models that I created - nn, nn2, and nn3 - are 97%, 98%, and 90% accurate, respectively. Each model's accuracy was calculated from its confusion matrix: accuracy is the sum of the true positives and true negatives that the model predicted, divided by the total number of predictions that the model made.
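As an illustration of that calculation (with made-up counts, not actual model output):
```{r}
# hypothetical 2x2 confusion matrix: rows = actual, columns = predicted
conf <- matrix(c(45, 3,
                 2, 50), nrow = 2, byrow = TRUE,
               dimnames = list(actual = c("FALSE", "TRUE"),
                               predicted = c("FALSE", "TRUE")))
# accuracy = (true negatives + true positives) / all predictions
(conf[1, 1] + conf[2, 2]) / sum(conf)   # (45 + 50) / 100 = 0.95
```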
2. How would you explain your model to the students whose behavior you are predicting?
I would explain that this model is able to predict whether a student is paying attention based on 8 observable features (eyes open, facing the camera, smiling, frowning, etc.).
3. This is a very simple example of a neural network. Real facial recognition is very complex though. Would a neural network be a good solution for predicting real facial movements? Why, why not?
A simple perceptron-style model like this would not be a good algorithm for facial recognition, but other neural network architectures, such as convolutional neural networks, can achieve high accuracy on recognition tasks. Neural networks could also be ensembled with appearance-based approaches like PCA (e.g., eigenfaces).
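For a sense of what a convolutional architecture looks like, here is a hypothetical sketch using the keras R package (an assumption: keras and a TensorFlow backend are installed, which is why the chunk is not evaluated). The input shape and layer sizes are illustrative, not tuned for any real dataset.
```{r, eval = FALSE}
library(keras)

# a tiny CNN for binary classification of small grayscale images
cnn <- keras_model_sequential() %>%
  layer_conv_2d(filters = 16, kernel_size = c(3, 3), activation = "relu",
                input_shape = c(48, 48, 1)) %>%  # 48x48 grayscale (assumed)
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_flatten() %>%
  layer_dense(units = 32, activation = "relu") %>%
  layer_dense(units = 1, activation = "sigmoid")  # binary output, like 'attention'

cnn %>% compile(optimizer = "adam", loss = "binary_crossentropy",
                metrics = "accuracy")
summary(cnn)
```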
## Repeat with your own data
The project below examines recruitment data from a fictional company and walks through two main approaches to analyzing recruitment data - prediction and explanation. The data used in this project come from Keith McNulty's peopleanalyticsdata package (available on CRAN), which accompanies the Handbook of Regression Modeling in People Analytics.
The first half of this notebook uses a neural network model to predict the likelihood that an individual was hired from the candidate attributes in the data (SAT, GPA, aptitude-test score, gender, and three interview scores). The second half uses logistic regression to predict hiring likelihood from the same variables. While the regression model was slightly less accurate 'off the shelf', regression allows a closer examination of the impact that each variable has on the overall likelihood of a candidate being hired. Both models are useful; which to use depends on the business case that needs to be solved.
Load, clean, and normalize data:
```{r}
library(peopleanalyticsdata)
data(recruiting)
# clean recruiting data
recruiting[recruiting == "F"] <- 1
recruiting[recruiting == "M"] <- 2
#summary(recruiting$gpa)
#sapply(recruiting, summary)
# data normalization ('feature scaling')
normalize <- function(x) {
  return((x - min(x)) / (max(x) - min(x)))
}
recruiting_numeric <- as.data.frame(lapply(recruiting, as.numeric))
recruiting_norm <- as.data.frame(lapply(recruiting_numeric, normalize))
```
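As a quick sanity check (a small addition to the original workflow), every column should now lie in [0, 1]:
```{r}
# min and max of each column after min-max scaling; all should be 0 and 1
sapply(recruiting_norm, range)
```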
Data Split - training and testing sets:
```{r}
# data split: 50/50 into training and testing sets (483 rows in each)
train_size <- floor(nrow(recruiting_norm) * 0.5)
test_size <- nrow(recruiting_norm) - train_size
set.seed(777)
selected <- sample(seq_len(nrow(recruiting_norm)),size = train_size)
train_set <- recruiting_norm[selected, ]
test_set <- recruiting_norm[-selected, ]
train_set <- as.data.frame(lapply(train_set, as.numeric))
test_set <- as.data.frame(lapply(test_set, as.numeric))
str(train_set)
str(test_set)
```
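A quick check (an optional addition) that the outcome is similarly balanced across the two halves:
```{r}
# proportion hired vs. not hired in each split
prop.table(table(train_set$hired))
prop.table(table(test_set$hired))
```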
Generate the first neural network model to predict the likelihood that an individual is hired based on their SAT score, GPA, and performance on an aptitude test.
```{r}
# hiring predictions with neural network
hiring_mod_1 <- neuralnet(hired == 1 ~ sat + gpa + apttest, train_set, hidden = c(2,2), learningrate = 0.2)
test_mod_1 <- subset(test_set, select = c("sat", "gpa", "apttest"))
pred <- predict(hiring_mod_1, test_mod_1)
table(test_set$hired == 1, pred[, 1] > 0.5)
```
The last line of code produces a confusion matrix for the predictions made by the first neural network model. To calculate the overall accuracy of the model's predictions, the following code chunk builds a custom function that can be reused throughout the subsequent iterations of model testing in this project.
Custom function to calculate model accuracy:
```{r}
conf_matrix <- table(test_set$hired == 1, pred[, 1] > 0.5)
# the function takes a confusion matrix table as input
model_accuracy <- function(x) {
  true_neg <- x[1, 1]
  true_pos <- x[2, 2]
  false_pos <- x[1, 2]
  false_neg <- x[2, 1]
  correct <- (true_neg + true_pos) / (true_neg + true_pos + false_pos + false_neg)
  correct
}
# test function output
model_accuracy(conf_matrix)
```
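As an aside, for a 2x2 table an equivalent one-liner sums the diagonal, since correct predictions sit on the diagonal of the confusion matrix:
```{r}
# same result as model_accuracy(conf_matrix)
sum(diag(conf_matrix)) / sum(conf_matrix)
```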
In the following code chunks I sequentially add inputs and layers to the neural network model. I also adjust the learning rate to try to improve predictive performance.
```{r}
# add remaining variables to the hiring model
hiring_mod_2 <- neuralnet(hired == 1 ~ sat + gpa + apttest + gender + int1 + int2 + int3, train_set, hidden = c(2,2), learningrate = 0.2)
test_mod_2 <- subset(test_set, select = c("sat", "gpa", "apttest", "gender", "int1", "int2", "int3"))
pred2 <- predict(hiring_mod_2, test_mod_2)
# generate confusion matrix
table(test_set$hired == 1, pred2[, 1] > 0.5)
# assess model accuracy
conf_matrix_2 <- table(test_set$hired == 1, pred2[, 1] > 0.5)
model_accuracy(conf_matrix_2)
plot(hiring_mod_2)
```
By adding the four remaining predictor variables, the neural network's accuracy in predicting who would be hired by the company increased from 79% to 93%. While this is a substantial improvement, the model may be improved further by adjusting its hidden layers, repetitions, or learning rate.
```{r}
# learning rate was adjusted to 0.1
hiring_mod_3 <- neuralnet(hired == 1 ~ sat + gpa + apttest + gender + int1 + int2 + int3, train_set, hidden = c(2,2), learningrate = 0.1)
test_mod_3 <- subset(test_set, select = c("sat", "gpa", "gender", "apttest", "int1", "int2", "int3"))
pred3 <- predict(hiring_mod_3, test_mod_3)
# generate confusion matrix
table(test_set$hired == 1, pred3[, 1] > 0.5)
# assess model accuracy
conf_matrix_3 <- table(test_set$hired == 1, pred3[, 1] > 0.5)
model_accuracy(conf_matrix_3)
```
Adding additional hidden layers
```{r}
# add additional hidden layers
hiring_mod_4 <- neuralnet(hired == 1 ~ sat + gpa + apttest + gender + int1 + int2 + int3, train_set, hidden = c(4,3,2,1), learningrate = 0.2)
test_mod_4 <- subset(test_set, select = c("sat", "gpa", "apttest", "gender", "int1", "int2", "int3"))
pred4 <- predict(hiring_mod_4, test_mod_4)
# generate confusion matrix
table(test_set$hired == 1, pred4[, 1] > 0.5)
# assess model accuracy
conf_matrix_4 <- table(test_set$hired == 1, pred4[, 1] > 0.5)
model_accuracy(conf_matrix_4)
# plot nn model 4
plot(hiring_mod_4)
```
While the neural networks predict the likelihood that a candidate will be hired with good accuracy, there are business cases in which it is helpful to understand the relationships between the predictor variables and the outcome. The following code chunks implement logistic regression and compare its accuracy to the neural network models. Then the components of the logistic regression model are inspected to better understand each variable's individual contribution to the likelihood that a candidate is hired.
Logistic Regression
```{r}
log_mod_1 <- glm(hired ~ sat + gpa + apttest + int1 + int2 + int3 , data = train_set, family = "binomial")
test_mod_5 <- subset(test_set, select = c("sat", "gpa", "apttest", "gender", "int1", "int2", "int3"))
pred5 <- predict(log_mod_1, test_mod_5, type = "response")
pred5 <- as.data.frame(pred5)
# assess model accuracy
conf_matrix_5 <- table(test_set$hired == 1, pred5[, 1] > 0.5)
model_accuracy(conf_matrix_5)
# view model fit
summary(log_mod_1)
```
A summary of the logistic regression model shows that SAT and GPA do not contribute significantly to the prediction. A second logistic regression model is created and tested below without SAT and GPA.
```{r}
log_mod_2 <- glm(hired ~ apttest + int1 + int2 + int3 , data = train_set, family = "binomial")
test_mod_6 <- subset(test_set, select = c("sat", "gpa", "apttest", "gender", "int1", "int2", "int3"))
pred6 <- predict(log_mod_2, test_mod_6, type = "response")
pred6 <- as.data.frame(pred6)
# assess model accuracy
conf_matrix_6 <- table(test_set$hired == 1, pred6[, 1] > 0.5)
model_accuracy(conf_matrix_6)
# view model fit
summary(log_mod_2)
# compare the models
anova(log_mod_1, log_mod_2, test = "Chisq")
```
The second model obtained a lower AIC value, which is an indication that it fits the data better relative to its complexity. However, a chi-squared test did not find that the two models were significantly different.
In any case, the use of logistic regression in this example, as a follow-up to the neural network predictions, shows the utility of regression analysis: it allows a closer examination of the individual contributions of the variables. While the neural network performs better without any parameter tuning, the logistic regression aids interpretability, which would be essential if we were designing interventions to improve this company's recruiting function.
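One way to make that interpretability concrete (a small addition, not part of the original analysis): exponentiating the logistic coefficients gives odds ratios. Because the predictors were min-max scaled, each odds ratio here reflects moving a predictor across its full observed range rather than by one raw unit.
```{r}
# odds ratios with profile-likelihood confidence intervals
exp(cbind(OR = coef(log_mod_2), confint(log_mod_2)))
```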
An accompaniment to this Rmd file can be found on Kaggle at https://www.kaggle.com/gianzlupko/recruiting-decisions-predicting-vs-explaining?scriptVersionId=57300078.