Monitoring social media and news media in general is kind of building a search engine - a specific search engine called news agent. Where classical search engines are about to have all kind of content, newsagent focus on news and social media only. This is where content freshness matters - it has to be near real time.
In order to process several million news with hundreds of Megabytes per day one has to choose a system architecture that is reliable on one hand and massive scalable at the other hand. Those requirements leads to a distributed architecture, a NoSQL approach.
This talk will give you a brief overview about the Media Monitoring use case, the system architecture based on Hadoop and HBase, challenges and lessons learned.