Vikrant Deshpande edited this page Apr 28, 2023 · 7 revisions

Welcome aboard!

These are the wiki pages for my Credit Card Fraud Detection Machine Learning pipeline on GCP!

It is crucial for credit-card companies to recognize fraudulent activity so that customers are not charged for items they didn't purchase.

  1. False positives (FP) - Companies lose around 33% of their loyal customers due to blocked legitimate transactions. An excellent article by Stripe here.
  2. False negatives (FN) - On the other hand, fraudulent transactions that go unnoticed can lead to lawsuits, operational dispute fees, etc.

A fine line needs to be drawn between the two. My main focus here is to reduce the number of blocked transactions for legitimate customers while still blocking the truly fraudulent ones.
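To make the trade-off concrete, here is a minimal sketch of how FP and FN counts shift as the blocking threshold moves. The labels, scores, and thresholds are made-up illustrative values, and `confusion_counts` is a hypothetical helper, not a function from this repo.

```python
def confusion_counts(labels, scores, threshold):
    """Count blocked-legitimate (FP) and missed-fraud (FN) transactions.

    labels: 1 for fraud, 0 for legitimate.
    scores: model fraud scores in [0, 1].
    A transaction is blocked when its score is >= threshold.
    """
    fp = fn = 0
    for label, score in zip(labels, scores):
        blocked = score >= threshold
        if blocked and label == 0:
            fp += 1  # legitimate customer blocked
        elif not blocked and label == 1:
            fn += 1  # fraud slipped through
    return fp, fn


# Illustrative data only (assumption, not taken from the project's dataset).
labels = [0, 0, 1, 1, 0]
scores = [0.10, 0.60, 0.40, 0.90, 0.30]

for threshold in (0.35, 0.50, 0.70):
    print(threshold, confusion_counts(labels, scores, threshold))
```

Raising the threshold blocks fewer legitimate customers but lets more fraud through, which is the balance described above.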

If you're a financial analyst looking to quickly identify fraudulent transactions in your dataset, look no further.

The framework provided in this repo should let you easily create your own GCP pipeline that triggers an Airflow pipeline whenever a source file is created in the GCS bucket. The Airflow pipeline itself trains, deploys, and analyzes the Random Forest model.

Here's a quick demo of what's supposed to happen when you upload a Train/Test file to the configured GCS bucket.

Demonstration-GIF

Essentially:

You upload the batch Train/Test file onto GCS --> 
   A Google Cloud Function monitors these Creation events --> 
     The function triggers an Airflow DAG -->
       The Airflow DAG trains and registers a model, dataset-reports, etc. onto GCS itself -->
         Ingests these dataset-reports into BigQuery --> 
           Archives the source files on GCS again to enable monitoring of new-file creation events.
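The hand-off from the Cloud Function to Airflow can be sketched roughly as follows. This is only an illustration under stated assumptions: it assumes the function receives a GCS object event with `bucket` and `name` fields and that the DAG is triggered through the Airflow 2 stable REST API (`POST /api/v1/dags/{dag_id}/dagRuns`). The base URL, DAG id, and the `build_dag_run_request` helper are hypothetical, authentication is omitted, and the repo's actual trigger mechanism may differ.

```python
import json


def build_dag_run_request(event, airflow_base_url, dag_id):
    """Build the URL and JSON body for triggering a DAG run.

    event: dict from the GCS creation event; only "bucket" and "name" are used.
    airflow_base_url, dag_id: hypothetical values supplied by the caller.
    No network call is made here; sending the request (with credentials)
    is left to the Cloud Function's HTTP client of choice.
    """
    url = f"{airflow_base_url.rstrip('/')}/api/v1/dags/{dag_id}/dagRuns"
    # Pass the uploaded file's location to the DAG through its run "conf".
    body = {"conf": {"bucket": event["bucket"], "object": event["name"]}}
    return url, json.dumps(body)


# Example event and names are placeholders, not values from this project.
example_event = {"bucket": "my-fraud-bucket", "name": "incoming/train.csv"}
url, payload = build_dag_run_request(
    example_event, "https://airflow.example.com/", "fraud_detection_dag"
)
print(url)
print(payload)
```

Inside the DAG, the bucket and object name would then be available from the run's `conf`, so the training task knows which file to read.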

image
