The document discusses the history and future of Hadoop. It describes how Hadoop was originally developed from Google's papers as an implementation of MapReduce and HDFS. However, the speaker argues that the future of Hadoop lies beyond implementing more Google papers: it will involve real-time processing, integration with both traditional IT and new technologies, and fast, flexible computation. The document gives examples of how MapR's technology has cut processing times for recommendation workloads and scaled to handle the Twitter firehose. The speaker urges attendees to get involved with Apache Drill and MapR.
Take all of Twitter: 400 x 10^6 tweets per day, < 400 GB per day, < 40 MB/s.
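To sanity-check those bounds (assuming an average payload of about 1 KB per tweet, which is what < 400 GB/day over 400 million tweets implies), the sustained rate works out to:

    400 x 10^6 tweets/day x 1 KB/tweet = 400 GB/day
    400 GB / 86,400 s/day ≈ 4.6 MB/s

So the firehose averages only a few MB/s, comfortably under the 40 MB/s bound quoted above.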
Kafka is a message-queuing system.
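For concreteness, here is a minimal sketch of publishing to Kafka with the standard Java producer client; the broker address and the "tweets" topic name are illustrative assumptions, not details from the talk.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class TweetPublisher {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
            props.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");

            // Publish one message to a hypothetical "tweets" topic;
            // a real catcher process would loop over the firehose.
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("tweets", "example tweet payload"));
            }
        }
    }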
Catcher is a processor. All of these systems can be run out of Hadoop, and Warden can be configured to run Storm as well. It is a simple architecture, all from one platform. The green blocks in the architecture diagram are data that is available for other analytics.
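As a rough illustration of how a Storm topology could consume the Kafka stream in such a setup, here is a sketch against the classic Storm 0.9-era storm-kafka API; the topic name, ZooKeeper address, and the trivial PrintBolt are hypothetical stand-ins, not the configuration shown in the talk.

    import backtype.storm.Config;
    import backtype.storm.LocalCluster;
    import backtype.storm.spout.SchemeAsMultiScheme;
    import backtype.storm.topology.BasicOutputCollector;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.TopologyBuilder;
    import backtype.storm.topology.base.BaseBasicBolt;
    import backtype.storm.tuple.Tuple;
    import storm.kafka.KafkaSpout;
    import storm.kafka.SpoutConfig;
    import storm.kafka.StringScheme;
    import storm.kafka.ZkHosts;

    public class FirehoseTopology {
        // Trivial bolt that prints each tweet; stands in for real analytics.
        public static class PrintBolt extends BaseBasicBolt {
            @Override
            public void execute(Tuple tuple, BasicOutputCollector collector) {
                System.out.println(tuple.getString(0));
            }
            @Override
            public void declareOutputFields(OutputFieldsDeclarer declarer) {
                // Terminal bolt: emits nothing downstream.
            }
        }

        public static void main(String[] args) {
            // ZooKeeper address, topic, zkRoot, and consumer id are assumptions.
            SpoutConfig spoutConfig = new SpoutConfig(
                    new ZkHosts("localhost:2181"), "tweets", "/tweets", "tweet-reader");
            spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme());

            TopologyBuilder builder = new TopologyBuilder();
            builder.setSpout("kafka-spout", new KafkaSpout(spoutConfig), 1);
            builder.setBolt("printer", new PrintBolt(), 2).shuffleGrouping("kafka-spout");

            new LocalCluster().submitTopology("firehose", new Config(), builder.createTopology());
        }
    }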