Skip to content
This repository has been archived by the owner on Dec 19, 2023. It is now read-only.

Latest commit

 

History

History
59 lines (30 loc) · 3.3 KB

README.md

File metadata and controls

59 lines (30 loc) · 3.3 KB

Boosted-KDE

Boosted-KDE is a package for boosting the kernel density estimate (KDE) of numerical data. The notion of boosting the KDE has been proposed by Prof. Marco Di Marzio and Prof. Charles Taylor. The aim of their original paper was to create a new classification algorithm based on KDE and boosting, named BoostKDC. Here, I implemented the algorithm outlined in Ref. [1] for KDE boosting to assign a weight to each observation in a dataset with the aim of detecting outliers or anomalies.

This algorithm rests on the idea of comparing the KDE computed with all samples with the KDE recomputed on all samples except the one of interest. In other words, the algorithm outputs a weight for each sample, which is equal to the log odds ratio between the full KDE and the correspoding leave-one-out estimate.

Intuitevely, this algorithm has great potential in the context of outlier / anomaly detection, because there will be lower relative loss where observations are more frequent (the density is higher), whereas a major change in relative loss will occur where samples are more rare (the density is lower).

The "boosting" part of the algorithm was originally implemented as part of an AdaBoost classifier. However, in the context of outlier detection, which is unsupervised, it rarely shows benefits to run the algorithm more than once.

Implementation

The python class KDEBoosting encapsulates the data on which to calculate the weights for each observation. It allows the user to:

  • Run boosting iterations not consecutively, which means that the algorithm can pick up the boosting process from the last iteration without having to restart from scratch
  • Plot outcome and report diagnostic information [coming soon]

KDE in highly dimensional feature spaces becomes quickly unfeasible with parametric methods. Therefore, this class computes the KDE with a non-parametric FFT method, as implemented in KDEpy.

Dependencies

The class KDEBoosting depends on:

Usage

Simply import the class and pass your data to it:

from boosted_KDE import KDEBoosting
bKD = KDEBoosting(data)

Weights can be accessed via the attributes:

.normalized_weights
.weights    

The theory notebook shows how the algorithm can give more or less weight to outliers.

The tutorial notebook shows how the algorithm can be used on real world data in the context of (unsupervised) outlier / anomaly detection to improve the performance of an unsupervised one-class SVM classification model up to and over that of a (supervised) SVM classifier.

License

Boosted-KDE is distributed under MIT license.

References

[1] Di Marzio and Taylor, Biometrika 2004