Presented during DataMass Summit 2017.
http://summit2017.datamass.io/
https://www.youtube.com/watch?v=eGJfhHPdhuo
Data center workloads produce a significant amount of log data which has to be analyzed in order to discover any potential issues. We present an automated text mining approach for workload monitoring and data analytics, which is a combination of machine learning and big data processing. This session provides an overview of a data pipeline based on key components such as Apache Kafka, Apache Spark and generalized version of k-means algorithm.