Hadoop Tutorial with @techmilind



Apache Hadoop has emerged as the storage and processing platform of choice for Big Data. In this tutorial, I will give an overview of Apache Hadoop and its ecosystem, with specific use cases. I will explain the MapReduce programming framework in detail and outline how it interacts with the Hadoop Distributed File System (HDFS). While Hadoop itself is written in Java, MapReduce applications can be written in a variety of languages through a framework called Hadoop Streaming. I will give several examples of MapReduce applications that use Hadoop Streaming.
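To make the Hadoop Streaming idea concrete, here is a minimal word-count sketch in Python. It is not taken from the slides. It assumes the usual streaming contract: a mapper emits tab-separated key/value lines, the framework sorts them by key, and a reducer consumes the sorted lines. The names `mapper`, `reducer`, and `run_local` are hypothetical, and `run_local` only imitates the framework's sort step on one machine.

```python
from itertools import groupby


def record_key(record):
    """Return the key portion of a tab-separated 'key<TAB>value' record."""
    return record.split("\t", 1)[0]


def mapper(lines):
    """Emit one 'word<TAB>1' record per word, as a streaming mapper would print."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_records):
    """Sum the counts for each key; assumes records arrive sorted by key."""
    for word, group in groupby(sorted_records, key=record_key):
        total = sum(int(record.split("\t", 1)[1]) for record in group)
        yield f"{word}\t{total}"


def run_local(lines):
    """Hypothetical local stand-in for the framework: map, sort by key, reduce."""
    return list(reducer(sorted(mapper(lines), key=record_key)))


if __name__ == "__main__":
    for record in run_local(["Hadoop stores data", "hadoop processes data"]):
        print(record)
```

On a real cluster the mapper and reducer would typically be two separate scripts that read `sys.stdin` and print to standard output, passed to the Hadoop Streaming jar through its `-mapper` and `-reducer` options together with `-input` and `-output` paths. The location of that jar depends on the installation.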

Published in: Technology


  1. Hadoop Overview & Architecture. Milind Bhandarkar, Chief Scientist, Machine Learning Platforms, Greenplum, A Division of EMC (Twitter: @techmilind)
  2. About Me
     • http://www.linkedin.com/in/milindb
     • Founding member of Hadoop team at Yahoo! [2005-2010]
     • Contributor to Apache Hadoop since v0.1
     • Built and led Grid Solutions Team at Yahoo! [2007-2010]
     • Parallel Programming Paradigms [1989-today] (PhD cs.illinois.edu)
     • Center for Development of Advanced Computing (C-DAC), National Center for Supercomputing Applications (NCSA), Center for Simulation of Advanced Rockets, Siebel Systems, Pathscale Inc. (acquired by QLogic), Yahoo!, LinkedIn, and EMC-Greenplum
  3. Agenda
     • Motivation
     • Hadoop
     • Map-Reduce
     • Distributed File System
     • Hadoop Architecture
     • Next Generation MapReduce
     • Q & A
  4. Hadoop At Scale