Hadoop introduction

Rabindra Nath Nandi
Software Engineer (Big Data), IPvision Canada Inc

Outline
● A brief history of Hadoop
● Why Hadoop
● Hadoop Fundamentals
● HDFS
● MapReduce
● A MapReduce Program - WordCount Problem
● Installation
● Resources

A brief history of Hadoop
Hadoop traces its origins to Google's Google File System paper (2003)
Google followed it with a second paper, MapReduce: Simplified Data Processing
on Large Clusters (2004)
Hadoop began inside the Apache Nutch project and became its own project in 2006
Doug Cutting, then at Yahoo, initially led the project
Yahoo was the main early contributor

Why Hadoop
Every day, millions of pieces of content are uploaded to or generated on
services such as Facebook and Google
This data needs to be stored and processed on demand
Modern hardware offers plentiful capacity, so raw storage space alone is not
the bottleneck
What is needed is fast, low-cost storage and processing of that data

Why Hadoop
Ability to store and process huge amounts of any kind of data, quickly
Computing power
Fault tolerance
Flexibility
Low cost
Scalability

Hadoop Fundamentals
● Hadoop provides both data storage and data processing
● HDFS - Hadoop Distributed File System, the storage layer
● MapReduce - a distributed data processing engine

HDFS Fundamentals
● File systems that manage storage across a network of machines are called
distributed file systems
Two Types of Nodes
● NameNode (master): holds the metadata and keeps track of where each block is
located on the DataNodes
● DataNode (slave): stores and retrieves data blocks. DataNodes periodically
report to the NameNode the list of blocks they are storing
● Each file is split into 128 MB blocks (default)
● Each block is replicated to 3 DataNodes (default)
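
The block and replication defaults above translate into simple arithmetic. The sketch below is plain Python, not an HDFS API; the function name hdfs_footprint and the 1000 MB example file are assumptions made for illustration.

```python
import math

BLOCK_SIZE_MB = 128  # default block size quoted above
REPLICATION = 3      # default replication factor quoted above


def hdfs_footprint(file_size_mb, block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Return (blocks, block_replicas, raw_storage_mb) for a single file."""
    blocks = math.ceil(file_size_mb / block_size_mb)
    # The final block only occupies as much space as the data it holds,
    # so raw storage is file size * replication, not blocks * block size.
    return blocks, blocks * replication, file_size_mb * replication


print(hdfs_footprint(1000))  # (8, 24, 3000)
```

A 1000 MB file therefore becomes 8 blocks, stored as 24 block replicas spread over the DataNodes, and consumes about 3000 MB of raw disk.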

MapReduce Fundamentals
Map() Function:
Processes a key/value pair to generate a set of intermediate key/value pairs
Reduce() Function:
Merges all intermediate values associated with the same intermediate key
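
The two functions are often summarised by their type signatures, in the form used by the MapReduce paper. The symbols k1, v1, k2 and v2 are only labels for the input and intermediate key and value types:

```latex
\begin{align*}
\text{map}    &: (k_1, v_1) \rightarrow \text{list}(k_2, v_2) \\
\text{reduce} &: (k_2, \text{list}(v_2)) \rightarrow \text{list}(v_2)
\end{align*}
```

For word counting, a typical choice is k1 = position of a line, v1 = the line's text, k2 = a word and v2 = a count.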

A MapReduce Program - WordCount Problem
Mapper Example
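A minimal sketch of a word-count mapper, written in plain Python rather than Hadoop's Java API. The function name map_words, the regular-expression tokenizer and the lowercasing are assumptions made for illustration.

```python
import re


def map_words(offset, line):
    """Map step: emit (word, 1) for every word in one line of input."""
    # offset is the input key (position of the line); word count ignores it.
    for word in re.findall(r"[a-z0-9']+", line.lower()):
        yield word, 1


print(list(map_words(0, "Hello hadoop, hello HDFS")))
# [('hello', 1), ('hadoop', 1), ('hello', 1), ('hdfs', 1)]
```

The mapper does no counting itself. It only emits one pair per occurrence and leaves the summing to the reduce step.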

Reducer Example
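A minimal sketch of a word-count reducer in plain Python, not Hadoop's Java API. It assumes the framework has already grouped every count emitted for one word into a single iterable; the name reduce_counts is hypothetical.

```python
def reduce_counts(word, counts):
    """Reduce step: merge all counts emitted for one word into a total."""
    yield word, sum(counts)


print(list(reduce_counts("hello", [1, 1, 1])))  # [('hello', 3)]
```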

Driver Class
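In Hadoop the driver is the class that wires a mapper and a reducer into a job and submits it to the cluster. The sketch below only mimics that flow in memory with plain Python: run_job, mapper and reducer are hypothetical names, the dictionary stands in for the shuffle phase, and the line index stands in for the input key.

```python
from collections import defaultdict


def run_job(lines, mapper, reducer):
    """Hypothetical in-memory driver: map, group by key, then reduce."""
    groups = defaultdict(list)
    for offset, line in enumerate(lines):
        for key, value in mapper(offset, line):
            groups[key].append(value)  # stand-in for the shuffle phase
    results = []
    for key in sorted(groups):  # hand keys to the reducer in sorted order
        results.extend(reducer(key, groups[key]))
    return results


# Compact stand-ins for a word-count mapper and reducer.
def mapper(offset, line):
    return ((word, 1) for word in line.lower().split())


def reducer(word, counts):
    return [(word, sum(counts))]


print(run_job(["hadoop stores data", "hadoop processes data"], mapper, reducer))
# [('data', 2), ('hadoop', 2), ('processes', 1), ('stores', 1)]
```

On a real cluster the map and reduce calls run in parallel on many machines, and the input and output live in HDFS rather than in Python lists.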

Installation
http://www.tutorialspoint.com/hadoop/hadoop_enviornment_setup.htm
https://www.digitalocean.com/community/tutorials/how-to-install-hadoop-on-ubuntu-13-10

Resources
https://hadoop.apache.org
Hadoop: The Definitive Guide, 3rd Edition (Storage and Analysis at Internet
Scale), by Tom White
