Be the first to like this
Online Data Mining: PCA and K-Means: Algorithms for data mining, unsupervised machine learning and scientific computing were traditionally designed to minimize running time for the batch setting (random access to memory).
In recent years, a significant amount of research is devoted to producing scaleable algorithms for the same problems. A scaleable solution assumes some limitation on data access and/or compute model. Some well known models include map reduce, message passing, local computation, pass efficient, streaming and others. In this talk we argue for the need to consider the online model in data mining tasks. In an online setting, the algorithm receives data points one by one and must make some decision immediately (without examining the rest of the input). The quality of the algorithm’s decisions is compared to the best possible in hindsight. While practitioners are well aware of the need for such algorithms, this setting was mostly overlooked by the academic community. Here, we will review new results on online k-means clustering and online Principal Component Analysis (PCA).