The applications data was analyzed to develop a supervised identity fraud detection model that flags applications likely to be fraudulent. To build this model, the fraud label was assessed in relation to the linkage of five personally identifiable fields: SSN, address, phone number, date of birth, and zip code. I created time-window variables from these fields using the sqldf package in R because it is efficient and easy to understand.
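As a rough illustration of what a time-window variable can look like, the sketch below counts, for each application, how many earlier applications shared the same SSN within the previous 3 days. It is not taken from the actual script: the toy data frame `apps`, the column names (`app_id`, `app_day`, `ssn`), and the 3-day window are all assumptions made for this example.

```r
library(sqldf)

# Toy data; all column names and values here are hypothetical.
apps <- data.frame(
  app_id = 1:5,
  # Store the application date as an integer day number so that the
  # window arithmetic in SQL is plain integer arithmetic.
  app_day = as.integer(as.Date(c("2016-01-01", "2016-01-02", "2016-01-03",
                                 "2016-01-04", "2016-01-11"))),
  ssn = c("111", "111", "222", "111", "222"),
  stringsAsFactors = FALSE
)

# Self-join: for each application a, count earlier applications b
# with the same SSN whose day falls within the previous 3 days.
ssn_3d <- sqldf("
  SELECT a.app_id, COUNT(b.app_id) AS ssn_count_3d
  FROM apps a
  LEFT JOIN apps b
    ON a.ssn = b.ssn
   AND b.app_day BETWEEN a.app_day - 3 AND a.app_day
   AND b.app_id < a.app_id
  GROUP BY a.app_id
  ORDER BY a.app_id
")

print(ssn_3d)
```

With this toy data the counts come out as 0, 1, 0, 2, 0. The same pattern could presumably be repeated for other fields (address, phone number, and so on) and other window lengths by changing the join column and the `- 3` offset.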
- Download both the dataset file and the R code into the same folder
- Run Creating time-window variables with sqldf.R in RStudio
- Dataset file: applications.csv
- R code: Creating time-window variables with sqldf.R
- Explanations: Variable Creation.pdf
- Ian Chi