Income Prediction
Dataset: https://archive.ics.uci.edu/ml/datasets/Adult
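Below is a minimal, dependency-free sketch of reading the raw UCI files (adult.data / adult.test). It assumes the standard layout: 14 comma-separated attributes followed by the income label, missing values written as "?", and test-set labels that carry a trailing period (">50K."). The names COLUMNS, NUMERIC and parse_adult_line are hypothetical and are not taken from this repo's scripts.

```python
from typing import Optional

# Column order of the UCI Adult files: 14 attributes followed by the income label.
COLUMNS = [
    "age", "workclass", "fnlwgt", "education", "education-num",
    "marital-status", "occupation", "relationship", "race", "sex",
    "capital-gain", "capital-loss", "hours-per-week", "native-country", "income",
]
NUMERIC = {"age", "fnlwgt", "education-num", "capital-gain", "capital-loss", "hours-per-week"}


def parse_adult_line(line: str) -> Optional[dict]:
    """Parse one Adult record; return None for blank lines or '|' header lines."""
    line = line.strip()
    if not line or line.startswith("|"):  # e.g. the header line at the top of adult.test
        return None
    fields = [f.strip() for f in line.split(",")]
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    record = {}
    for name, value in zip(COLUMNS[:-1], fields[:-1]):
        if value == "?":              # missing values are encoded as "?"
            record[name] = None
        elif name in NUMERIC:
            record[name] = int(value)
        else:
            record[name] = value
    # Binary target: 1 if income > 50K; strip the trailing "." used in adult.test.
    record["income"] = 1 if fields[-1].rstrip(".") == ">50K" else 0
    return record
```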
Task 1: binary classification (predict whether annual income exceeds $50K)
- prediction_baselines.py --- baseline classification models built with scikit-learn
- bert_clf.py --- an attempt to perform the classification task with a BERT model. TODO: find a larger dataset to address the under-fitting observed with BERT; improve preprocessing; try more models.
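When comparing the scikit-learn baselines with BERT, a majority-class reference point helps: a model whose accuracy is close to it, or whose predictions are almost all one class, is likely under-fitting. Here is a minimal sketch with hypothetical helper names; scikit-learn's DummyClassifier(strategy="most_frequent") provides the same baseline.

```python
from collections import Counter
from typing import Sequence


def majority_baseline_accuracy(y_train: Sequence[int], y_test: Sequence[int]) -> float:
    """Accuracy of always predicting the most frequent training label."""
    if not y_train or not y_test:
        raise ValueError("label sequences must be non-empty")
    majority = Counter(y_train).most_common(1)[0][0]
    return sum(1 for y in y_test if y == majority) / len(y_test)


def majority_class_fraction(y_pred: Sequence[int]) -> float:
    """Share of predictions equal to the most common predicted label.

    A value near 1.0 means the model has collapsed onto a single class.
    """
    if not y_pred:
        raise ValueError("prediction sequence must be non-empty")
    return Counter(y_pred).most_common(1)[0][1] / len(y_pred)
```

In use, a model's test accuracy would be reported next to majority_baseline_accuracy(y_train, y_test), together with majority_class_fraction of its predictions.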
Task 2: clustering
- clustering.py --- finds clusters among individuals earning >50K. TODO: find the center of each cluster for better interpretation.
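For the TODO above, k-means already yields one center per cluster. Each center is the mean feature vector of its members, which is what makes it readable as a profile of that cluster. The sketch below is a dependency-free Lloyd's k-means. It assumes each person is a tuple of numeric features; the example choice of age and hours-per-week is hypothetical. If clustering.py uses scikit-learn's KMeans, the fitted cluster_centers_ attribute gives the same information. Features on very different scales (e.g. capital-gain vs age) are usually standardized first, and in that case the centers need to be mapped back to original units before they are interpreted.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(points: Sequence[Point]) -> Point:
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def _assign(points: Sequence[Point], centers: List[Point]) -> List[int]:
    # Assignment step: each point joins its nearest center.
    return [min(range(len(centers)), key=lambda c: _sq_dist(p, centers[c])) for p in points]


def kmeans(points: Sequence[Point], k: int, iters: int = 100,
           seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Lloyd's k-means; returns (cluster centers, cluster label of each point)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    rng = random.Random(seed)
    centers = [tuple(p) for p in rng.sample(list(points), k)]  # random initial centers
    for _ in range(iters):
        labels = _assign(points, centers)
        # Update step: move each center to the mean of its members (keep it if empty).
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new_centers.append(_mean(members) if members else centers[c])
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return centers, _assign(points, centers)
```

Each returned center can be read directly as the "typical" member of its cluster, e.g. (mean age, mean hours-per-week).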
Task 3: multivariable analysis (TODO)