Big Data is more than just hype. The vast quantities of data now available have led to two important challenges that are fundamentally changing the way we develop data-intensive systems. The first is at the data management level, where we are finally moving beyond vanilla MapReduce towards infrastructure that allows for more flexible data processing pipelines. The second is the transition from quantity to quality: distilling genuine knowledge from the raw data. For this, we still need innovative algorithms that facilitate data cleaning, unsupervised and semi-supervised learning, knowledge harvesting, and knowledge integration. Examples include work on data integration, large-scale knowledge bases such as UWN/MENTA, and collections of commonsense knowledge such as WebChild.