Machine Learning

Anomaly Detection

What are outliers/anomalies?

Variants of Anomaly detection problems

Types of Anomalies

Global/Point Anomaly

Contextual/Conditional Anomaly

Collective Anomaly

Anomaly Detection Paradigms

Supervised

Semi-Supervised

Unsupervised Anomaly Detection

Unsupervised Anomaly Detection Approaches

Statistical Anomaly Detection

Pros

Cons

Graphical Approaches

Univariate data

\[\hat\mu = \frac{1}{n}\sum_{i=1}^{n}x_i\]

\[\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\hat\mu)^2\]
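
One way these estimates might be used is to standardize each value and flag those far from the mean. The sketch below assumes a z-score rule; the function name `zscore_outliers` and the cutoff of 3 are illustrative choices, not something these notes prescribe.

```python
import numpy as np

def zscore_outliers(x, threshold=3.0):
    """Flag values whose |z-score| exceeds `threshold` (an assumed cutoff)."""
    x = np.asarray(x, dtype=float)
    mu = x.mean()      # sample mean: (1/n) * sum(x_i)
    sigma = x.std()    # sqrt of (1/n) * sum((x_i - mu)^2), i.e. ddof=0
    if sigma == 0:
        # All values identical: nothing can be flagged.
        return np.zeros(x.shape, dtype=bool)
    z = (x - mu) / sigma
    return np.abs(z) > threshold

# Hypothetical data: twenty ordinary readings and one extreme value.
flags = zscore_outliers([10.0] * 20 + [100.0])
```

Because the mean and variance are themselves pulled toward extreme values, this rule can miss outliers in small samples.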

Multivariate data

Mahalanobis

\(y^2 = (x-\bar x)' S^{-1} (x-\bar x)\)
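
The distance above can be sketched in NumPy as follows. This is a minimal illustration that assumes the sample covariance matrix \(S\) is invertible; the function name `mahalanobis_sq` and the synthetic data are hypothetical.

```python
import numpy as np

def mahalanobis_sq(X):
    """Squared Mahalanobis distance of each row of X from the sample mean."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)             # \bar{x}
    S = np.cov(X, rowvar=False)       # sample covariance S (ddof=1)
    S_inv = np.linalg.inv(S)          # assumes S is non-singular
    diff = X - mean
    # Row-wise quadratic form (x - mean)' S^{-1} (x - mean)
    return np.einsum("ij,jk,ik->i", diff, S_inv, diff)

# Synthetic example: 2-D Gaussian noise with one planted outlier in row 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
X[0] = [8.0, -8.0]
d2 = mahalanobis_sq(X)
```

Points with the largest values of `d2` are the candidates for multivariate outliers; how large is "large" depends on the distributional assumptions made about the data.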

Likelihood Approach

Proximity-based Anomaly Detection

Pros

Cons

Density-based Anomaly Detection

\[\text{density}(x, k) = \left(\frac{1}{k}\sum_{y\in N(x,k)} \text{distance}(x,y)\right)^{-1}\]

Relative Density Outlier Score

\[\text{relative density}(x,k) = \frac{\text{density}(x,k)}{\frac{1}{k}\sum_{y\in N(x,k)}\text{density}(y,k)}\]
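
The two formulas above can be computed directly from pairwise distances. The sketch below assumes Euclidean distance and a brute-force neighbor search, which is only practical for small data sets; the function names `knn_density` and `relative_density` are illustrative.

```python
import numpy as np

def knn_density(X, k):
    """Return (density, neighbor_indices) from each point's k nearest neighbors."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances (assumed metric; O(n^2) memory).
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)            # a point is not its own neighbor
    nbrs = np.argsort(dists, axis=1)[:, :k]    # N(x, k)
    mean_dist = np.take_along_axis(dists, nbrs, axis=1).mean(axis=1)
    # Inverse of the average neighbor distance; duplicate points give inf.
    return 1.0 / mean_dist, nbrs

def relative_density(X, k):
    """Density of each point divided by the average density of its neighbors."""
    density, nbrs = knn_density(X, k)
    return density / density[nbrs].mean(axis=1)

# Hypothetical data: a tight group of five points and one distant point.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [10, 10]], dtype=float)
scores = relative_density(X, k=2)
```

A relative density well below 1 means the point sits in a sparser region than its neighbors, which marks it as a candidate outlier.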

Density-based outlier detection

Pros

Cons

Cluster-based Outlier Detection

Degree to which object belongs to any cluster

\[\frac{\text{distance}(x, \text{centroid}_c)}{\text{median}(\{\text{distance}(x',\text{centroid}_c) \mid x'\in c\})}\]
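
A minimal sketch of this relative-distance score follows. It assumes that cluster labels and centroids have already been produced by some clustering step (for example K-means) and that distance is Euclidean; the function name `relative_centroid_distance` and the toy data are hypothetical.

```python
import numpy as np

def relative_centroid_distance(X, labels, centroids):
    """Distance to own centroid divided by the cluster's median member distance."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    centroids = np.asarray(centroids, dtype=float)
    # Distance from each point to the centroid of the cluster it belongs to.
    dist = np.linalg.norm(X - centroids[labels], axis=1)
    scores = np.empty_like(dist)
    for c in np.unique(labels):
        members = labels == c
        # Assumes the median distance within each cluster is non-zero.
        scores[members] = dist[members] / np.median(dist[members])
    return scores

# Toy data: four points around (1, 1) plus one distant point, all in cluster 0.
X = np.array([[0, 0], [0, 2], [2, 0], [2, 2], [1, 9]], dtype=float)
labels = np.zeros(5, dtype=int)
centroids = np.array([[1.0, 1.0]])   # assumed centroid from a prior clustering
scores = relative_centroid_distance(X, labels, centroids)
```

Dividing by the median makes scores comparable across clusters of different spread: a score near 1 is typical for the cluster, while a much larger score suggests an outlier.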

Eliminate objects to improve objective function

Discard small clusters far from other clusters

Pros

Cons

