Scalable Infrastructure and Workflow for Anomaly Detection in an Automotive Industry


Anomalies are unexpected instances that deviate significantly from the normal patterns formed by the majority of a dataset. The more an observation deviates from the normal pattern, the more likely it is to be an anomaly. The continuous increase in the number of car models and configuration possibilities has steadily increased the complexity of the logistics supply chain and production. Consequently, the IT landscape as a whole has become difficult to manage, and a small anomaly or failure somewhere in the system can lead to a large financial loss. It is therefore highly important to identify, and ultimately resolve, problems in such a system quickly. This paper addresses the challenge of identifying anomalies in a scalable way. Newly collected data lacks labels for training. The developed solution addresses this by running multiple unsupervised algorithms and reporting as anomalies only those observations that all of the algorithms flag. The solution also handles the heterogeneity and large volume of the data by using Spark for scalable data processing. Scalability tests show an 80% reduction in the training time for 100 transactions when using 10 cores instead of 1 core. The results also indicate that increasing the number of cores does not necessarily reduce the overall execution time; other factors, such as communication between cores and non-Spark processing tasks, can also influence it.
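The consensus rule described in the abstract (report an observation only when every unsupervised detector flags it) can be illustrated with a minimal sketch. The abstract does not name the algorithms used, so the three detectors below (z-score, interquartile range, and median absolute deviation) are stand-ins chosen for simplicity, and the function names, thresholds, and sample values are all hypothetical. The sketch runs on a single machine in plain Python and does not reproduce the Spark-based pipeline.

```python
from statistics import mean, median, pstdev


def zscore_flags(values, threshold=2.0):
    """Indices whose distance from the mean exceeds `threshold` standard deviations."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return set()
    return {i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold}


def iqr_flags(values, k=1.5):
    """Indices outside [Q1 - k*IQR, Q3 + k*IQR], with quartiles taken as medians of the halves."""
    ordered = sorted(values)
    n = len(ordered)
    q1 = median(ordered[: n // 2])
    q3 = median(ordered[(n + 1) // 2:])
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return {i for i, v in enumerate(values) if v < low or v > high}


def mad_flags(values, threshold=3.5):
    """Indices whose modified z-score (based on the median absolute deviation) exceeds `threshold`."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return set()
    return {i for i, v in enumerate(values) if 0.6745 * abs(v - med) / mad > threshold}


def consensus_anomalies(values, detectors):
    """Keep only the indices that every detector flags (set intersection)."""
    flagged = [detect(values) for detect in detectors]
    return set.intersection(*flagged) if flagged else set()


# Hypothetical transaction durations (seconds); not data from the paper.
durations = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 55.0, 10.1, 9.7, 10.0, 13.0, 10.2]
detectors = [zscore_flags, iqr_flags, mad_flags]

for detect in detectors:
    print(detect.__name__, sorted(detect(durations)))
print("consensus", sorted(consensus_anomalies(durations, detectors)))
```

On this sample, the IQR and MAD detectors flag both 55.0 and 13.0, while the z-score detector flags only 55.0, so the consensus keeps just the 55.0 observation. Requiring agreement from all detectors trades recall for fewer false alarms, which is one plausible reason to prefer it when no labels are available to tune any single detector.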

IEEE International Conference on Innovative Trends in Information Technology (ICITIIT)

My research interests include cloud computing, specifically focussing on serverless computing for heterogeneous systems, edge computing, and AIOps.