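Hence, common machine learning classification algorithms such as logistic regression would not work. We used a multivariate Gaussian algorithm to detect the anomalies in the data. The expression for the univariate Gaussian is given by:

$$p(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$

where $\mu$ is the mean and $\sigma^2$ is the variance of the feature.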
The `norm` function in Anomaly Detection.py is defined to calculate the univariate Gaussian for any feature.
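As a rough illustration of the univariate Gaussian calculation, here is a minimal sketch using only the standard library. The names `univariate_gaussian` and `fit_feature` are hypothetical and are not taken from Anomaly Detection.py; the actual `norm` function may be written differently (for example with NumPy).

```python
import math


def fit_feature(values):
    """Estimate the mean and (population) variance of one feature."""
    mu = sum(values) / len(values)
    sigma2 = sum((v - mu) ** 2 for v in values) / len(values)
    return mu, sigma2


def univariate_gaussian(x, mu, sigma2):
    """Gaussian density with mean mu and variance sigma2, evaluated at x."""
    coeff = 1.0 / math.sqrt(2.0 * math.pi * sigma2)
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma2))


# Hypothetical feature values, for illustration only.
amounts = [1.0, 2.0, 3.0]
mu, sigma2 = fit_feature(amounts)
density = univariate_gaussian(2.0, mu, sigma2)
```

The mean and variance are estimated from the data first, and the density is then evaluated per value of that feature.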
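For the multivariate Gaussian, the univariate Gaussian probabilities of all the features are calculated and multiplied together. This product is a multivariate Gaussian distribution that treats the features as independent, and can be expressed as:

$$p(x) = \prod_{j=1}^{n} p(x_j; \mu_j, \sigma_j^2)$$

where $n$ is the number of features and $\mu_j$ and $\sigma_j^2$ are the mean and variance of feature $j$.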
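The product over features can be sketched as follows. This is an illustrative example under the assumption that each feature has its own mean and variance; `gaussian_pdf` and `product_density` are hypothetical names, not functions from the repository.

```python
import math


def gaussian_pdf(x, mu, sigma2):
    """Univariate Gaussian density with mean mu and variance sigma2."""
    coeff = 1.0 / math.sqrt(2.0 * math.pi * sigma2)
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma2))


def product_density(row, mus, sigma2s):
    """Multiply the per-feature densities of one data point (one row)."""
    p = 1.0
    for x, mu, sigma2 in zip(row, mus, sigma2s):
        p *= gaussian_pdf(x, mu, sigma2)
    return p


# Hypothetical two-feature example with zero means and unit variances.
p = product_density([0.0, 0.0], [0.0, 0.0], [1.0, 1.0])
```

With many features the product can become very small, so summing the logarithms of the per-feature densities is a common alternative to multiplying them directly.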
We use a confusion matrix to measure the effectiveness of the model. A fraud detection system should capture as many True Positive cases as possible and must avoid False Negatives. We measure the performance of the model by calculating its Recall and Precision for various values of the probability threshold. Transactions whose probability is less than the threshold are considered anomalous, i.e. fraudulent.
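The threshold sweep can be sketched like this. The function name `precision_recall`, the label convention (1 = fraud, 0 = normal), and the toy numbers are all assumptions made for illustration; they are not taken from the repository or its dataset.

```python
def precision_recall(probabilities, labels, threshold):
    """Flag a transaction as fraud when its probability is below threshold.

    labels: 1 for a known fraudulent transaction, 0 for a normal one.
    Returns (precision, recall); each is 0.0 when its denominator is zero.
    """
    tp = fp = fn = 0
    for p, y in zip(probabilities, labels):
        flagged = p < threshold
        if flagged and y == 1:
            tp += 1  # fraud correctly flagged
        elif flagged and y == 0:
            fp += 1  # normal transaction wrongly flagged
        elif not flagged and y == 1:
            fn += 1  # fraud that was missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical probabilities and labels, for illustration only.
probs = [0.20, 0.15, 0.001, 0.0005, 0.05, 0.002]
labels = [0, 0, 1, 1, 0, 0]
for threshold in (0.001, 0.01, 0.1):
    precision, recall = precision_recall(probs, labels, threshold)
    print(threshold, precision, recall)
```

Raising the threshold flags more transactions, which tends to increase Recall at the cost of Precision, so the threshold is chosen by comparing both across the sweep.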
The code can be used for anomaly detection on any data. You need to modify the dimensions to match your dataset and choose an appropriate threshold probability, below which a data point is treated as anomalous.

https://www.coursera.org/lecture/machine-learning/multivariate-gaussian-distribution-Cf8DF