Credit Card Fraud Detection using Logistic Regression on the credit card dataset
Because this is a binary classification problem, we train a Logistic Regression model. The workflow is:
- Collection of data
- Data Preprocessing
- Splitting the data into training and test sets
- Model Training
- Model Evaluation
- Prediction System
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# load the transactions from a CSV file on a Colab-mounted Google Drive
transaction_dataset = pd.read_csv("/content/drive/MyDrive/google_collab/creditcard.csv")
# preview the first 10 rows
transaction_dataset.head(10)
- shape
- info()
- describe()
- isnull()
- value_counts()
- dtypes
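The inspection calls listed above can be sketched on a small stand-in frame. The `demo` DataFrame and its columns are hypothetical and only mimic the shape of the real data (a few numeric features plus a binary `Class` label); on the actual notebook the same calls would be applied to `transaction_dataset`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for creditcard.csv: two features plus a binary Class label.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "V1": rng.normal(size=100),
    "Amount": rng.uniform(1, 500, size=100),
    "Class": [0] * 95 + [1] * 5,   # heavily imbalanced, like fraud data
})

print(demo.shape)                    # (rows, columns)
demo.info()                          # column names, non-null counts, dtypes
print(demo.describe())               # summary statistics per numeric column
print(demo.isnull().sum())           # missing values per column
print(demo["Class"].value_counts())  # number of transactions per class
print(demo.dtypes)                   # data type of each column
```

Note that `value_counts()` is called on a single column here, which is the usual way to see how imbalanced the two classes are.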
- 0 : Normal transaction
- 1 : Fraudulent transaction
legit = transaction_dataset[transaction_dataset.Class == 0]
fraud = transaction_dataset[transaction_dataset.Class == 1]
Comparing the samples
# compare the mean of every feature for normal and fraudulent transactions
transaction_dataset.groupby('Class').mean()
- Build a sample dataset with a similar number of normal and fraudulent transactions.
- The number of fraudulent transactions is 492.
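One common way to build such a balanced sample is random undersampling of the majority class. The sketch below is an assumption about how a frame like `new_transaction_dataset2` could be produced, not the notebook's confirmed code; the synthetic data and the names `legit_sample` and `balanced` are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n_legit, n_fraud = 1000, 20   # hypothetical counts; the real data has 492 frauds
data = pd.DataFrame({
    "Amount": rng.uniform(1, 500, size=n_legit + n_fraud),
    "Class": [0] * n_legit + [1] * n_fraud,
})

legit = data[data.Class == 0]
fraud = data[data.Class == 1]

# draw as many normal transactions as there are fraudulent ones
legit_sample = legit.sample(n=len(fraud), random_state=1)

# stack the two groups into one balanced dataset
balanced = pd.concat([legit_sample, fraud], axis=0)
print(balanced["Class"].value_counts())
```

On the real dataset the sample size would be 492, so the balanced frame would hold 984 rows. Undersampling discards most normal transactions, which is a trade-off worth keeping in mind when reading the accuracy figures.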
# set the figure size (roughly a 16:9 ratio)
plt.figure(figsize=(20, 11))
# plot the feature correlation matrix with annotated cell values
sns.heatmap(new_transaction_dataset2.corr(), annot=True, cmap='coolwarm')
plt.title("Correlation heatmap for the credit card data", fontsize=22)
plt.show()
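# separate the features (X) from the target label (Y)
X = new_transaction_dataset2.drop(columns='Class')
Y = new_transaction_dataset2['Class']
# hold out 20% for testing; stratify=Y keeps the class ratio the same in both splits
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, stratify=Y, random_state=2)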
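The effect of `stratify` in the split above can be checked on toy labels. This is a minimal sketch with hypothetical arrays `X_demo` and `y_demo`; it only illustrates that both splits keep the original class proportions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical balanced labels: 50 normal (0) and 50 fraudulent (1)
X_demo = np.arange(200).reshape(100, 2)
y_demo = np.array([0] * 50 + [1] * 50)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=2
)

# the fraction of class 1 is the same in the training and test labels
print(y_tr.mean(), y_te.mean())
```

Without `stratify`, a random split of a small sample can leave one class over-represented in the test set, which makes the test accuracy harder to interpret.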
model = LogisticRegression()
model.fit(X_train, Y_train)
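training_data_accuracy = accuracy_score(Y_train, model.predict(X_train))
test_data_accuracy = accuracy_score(Y_test, model.predict(X_test))
print("\nAccuracy on Training data ", training_data_accuracy, "\n")
print("Accuracy on Test data ", test_data_accuracy)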
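The last step in the workflow, a prediction system, could look roughly like the sketch below. It is not taken from the notebook: the data comes from scikit-learn's `make_classification`, and `clf` and `predict_transaction` are hypothetical names. In the notebook, the trained `model` and a row with the same columns as `X` would take their place.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Hypothetical stand-in data; the notebook would use X_train / Y_train instead
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=2)
clf = LogisticRegression(max_iter=1000).fit(X_demo, y_demo)

def predict_transaction(features):
    """Return a readable label for one transaction's feature values."""
    # reshape to (1, n_features) because predict expects a 2-D array
    row = np.asarray(features, dtype=float).reshape(1, -1)
    label = clf.predict(row)[0]
    return "Fraudulent transaction" if label == 1 else "Normal transaction"

print(predict_transaction(X_demo[0]))
```

The labels follow the encoding used earlier: 0 for a normal transaction and 1 for a fraudulent one.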