-
Notifications
You must be signed in to change notification settings - Fork 1
/
LoanAmount.Rmd
68 lines (53 loc) · 2.82 KB
/
LoanAmount.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
---
title: "Loan Amounts"
bibliography: bibliographies.bib
csl: journal-of-finance.csl
---
```{r, echo=F}
options(scipen = 6)
knitr::opts_chunk$set(cache=FALSE, fig.height=3, fig.width = 7, comment=NULL, eval=T, tidy=F, width=80, message= FALSE, echo= FALSE, warning = FALSE)
```
```{r, child="_preamble.Rmd"}
```
```{r, child="_nofilter.Rmd"}
```
## Analyzing the Loan Amount
```{r}
ggplot(LoanData) +
aes(x=loan_amnt, y=grade, fill=grade) +
geom_joy() +
labs(title= "Distribution of loan amount by grades",
x= "Loan Amount") +
theme_LC()
```
>**QUESTION:**
>Why is there a peak at $24,000 and not at $25,000? Is there a cutoff in the risk scoring model?
> **Answer:**
>In the past, Lending Club had adjusted the loan grades for the loan amounts. Their model started with a base interest rate and was then adjusted up based on their FICO score. The rates were then further adjusted for the loan amount. Other factors influenced the score such as the number of inquiries and the number of accounts. [@indLendAcad1] Posit: If this was widely known, then borrowers would apply for less than the breakpoint to avoid the extra interest charges. This is a plausible explanation for why there was more loans at 24,000 than 25,000 if that was one of the cutoffs.
```{r, fig.height=6}
g<- ggplot(LoanData)
g<- g + aes(loan_amnt, group=Class, colour=Class)
g<- g + geom_density(size=.75)
g<- g + labs(x= "Loan Amount")
g<- g + facet_grid(grade~.)
g<- g + theme_LC()
g<- g + labs(title= "Distribution of loan amount by default status")
g
```
Looking at grades D through G, the distribution of the defaulted and is fairly consitent with the distribution of current loans. On it's own, this suggests that loan amount won't have a predictive value over default rates. The exception is there seems to be a slight bump at the $35,000 level which is the maximum Lending Club offers. Loans at the maximum amount default at a disproportionately low rate. There may be an advantage in investing in loans in these grades at the max loan amount.
There isn't the same hump at the $35,000 level when looking at grade A & B loans. Those loans are more highly concentrated at the lower loan amount. Perhaps LC's credit rating status punishes the borrower for a higher loan amount.
>**QUESTION:**
>Why does the $35,000 loan amount have lower proportionate defaults in grades C though E? Is this related to the apparent cutoff noted above? Take a look at $35K loans and the relationship with other variables.
```{r, fig.height=6}
g<- ggplot(LoanData)
g<- g + aes(loan_amnt, group=Class, colour=Class)
g<- g + geom_density(size=.75)
g<- g + labs(x= "Loan Amount")
g<- g + facet_grid(format(LoanData[,"issue_d"],'%Y')~.)
g<- g + theme_LC()
g<- g + labs(title= "Distribution of loan amount by vintage")
g
```
```{r, child="_DataVersion.Rmd"}
```
### References