
Semi-supervised active learning anomaly detection

Student: Henrique Serafim Santos


Abstract
The analysis of time series data is a growing field of study. Data is being collected from a wide variety of sensors at an ever-increasing rate, which produces an overload of information that must be analyzed to reach the most accurate conclusions possible. Because most of this data is unlabeled, and identifying abnormal behavior by hand is infeasible given the high time and monetary costs, the automated detection and labeling of anomalies in time series is an active area of research. This research investigates the ability of a semi-supervised active learning algorithm to identify outlier-type anomalies in univariate time series. To maximize the algorithm's performance, we first propose an initial pool of features, from which those with the best classification power are selected to build the algorithm. For the semi-supervised learning stage of the process, several classifiers are compared. For the active learning stage, various query strategies are proposed to increase the informativeness of the observations chosen for manual labeling, so that the time spent labeling anomalies can be reduced without greatly affecting the model's performance. First, we show that the designed feature pool identifies anomalies better than features selected by a fully automated process. Furthermore, we show that a query strategy that selects the most informative observations for expert labeling, based on their utility and uncertainty, outperforms random selection of the observations to be labeled, improving the model's performance without infeasible time and cost spent identifying the anomalous behavior.
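The abstract names two mechanisms without defining them: a pool of designed features for outlier-type anomalies, and a query strategy that ranks unlabeled observations by utility and uncertainty. The Python sketch below illustrates generic versions of both under explicit assumptions. The two features in `window_features`, the uncertainty formula, the per-point `utility` weight, and all function names are hypothetical illustrations, not the thesis's actual features or query strategy; the anomaly probabilities would come from whichever semi-supervised classifier is used.

```python
from typing import List, Optional, Sequence, Set, Tuple
import random
import statistics


def window_features(series: Sequence[float], window: int = 5) -> List[Tuple[float, float]]:
    """Hypothetical two-feature pool for point outliers in a univariate series.

    For each point: (1) its absolute z-score against the preceding `window`
    points and (2) its absolute jump from the previous point.
    """
    feats: List[Tuple[float, float]] = []
    for i, x in enumerate(series):
        past = series[max(0, i - window):i]
        z = 0.0
        if len(past) >= 2:  # stdev needs at least two points
            sd = statistics.stdev(past)
            if sd > 0:
                z = abs(x - statistics.fmean(past)) / sd
        jump = abs(x - series[i - 1]) if i > 0 else 0.0
        feats.append((z, jump))
    return feats


def uncertainty_query(
    probs: Sequence[float],
    labelled: Set[int],
    k: int,
    utility: Optional[Sequence[float]] = None,
) -> List[int]:
    """Return up to k unlabeled indices ranked by uncertainty x utility.

    Uncertainty is 1 - 2|p - 0.5|: 1 at p = 0.5, 0 at p = 0 or 1.
    `utility` is an assumed per-point weight (all 1.0 if omitted).
    """
    scores = []
    for i, p in enumerate(probs):
        if i in labelled:
            continue  # already labeled by the expert
        u = 1.0 - 2.0 * abs(p - 0.5)
        w = 1.0 if utility is None else utility[i]
        scores.append((u * w, i))
    # Highest score first; ties broken by the smaller index.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [i for _, i in scores[:k]]


def random_query(n: int, labelled: Set[int], k: int, seed: int = 0) -> List[int]:
    """Baseline: draw up to k unlabeled indices uniformly at random."""
    pool = [i for i in range(n) if i not in labelled]
    return random.Random(seed).sample(pool, min(k, len(pool)))


if __name__ == "__main__":
    feats = window_features([1.0, 2.0, 1.0, 2.0, 1.0, 10.0])
    print(max(range(len(feats)), key=lambda i: feats[i][0]))  # 5 (the spike)
    probs = [0.1, 0.55, 0.9, 0.48, 0.3]  # made-up classifier outputs
    print(uncertainty_query(probs, labelled={1}, k=2))  # [3, 4]
```

Multiplying uncertainty by utility is only one plausible way to combine the two criteria; `random_query` plays the role of the random selection baseline the abstract compares against.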


Master's Final Work