Storage and computation is getting cheaper AND easily accessible on demand in the cloud. We now collect and store some really large data sets Eg: user activity logs, genome sequencing, sensory data etc. Hadoop and the ecosystem of projects built around it present simple and easy to use tools for storing and analyzing such large data collections on commodity hardware.
* The Hadoop architecture.
* Thinking in MapReduce.
* Run some sample MapReduce Jobs (using Hadoop Streaming).
* Introduce PigLatin, a easy to use data processing language.
Speaker Profile: Mahesh Reddy is an Entrepreneur, chasing dreams. Works on large scale crawl and extraction of structured data from the web. He is a graduate frm IIT Kanpur(2000-05) and previously worked at Yahoo! Labs as Research Engineer/Tech Lead on Search and Advertising products.