2. Outline
• Big Data
• Hadoop
• Hadoop Cluster
• Hadoop Ecosystem
• HDFS
• MapReduce
• Demo
3. Big Data
• There’s no single definition of ‘big data’; it’s a very subjective term.
4. Big Data
• Most people would consider a data set of terabytes or more to be ‘big data’,
but many people use Hadoop with great success on smaller data sets than
that.
• One reasonable definition is that it’s data which can’t comfortably be
processed on a single machine.
5. The 3 V’s of Big Data
• Volume refers to the size of the data you’re dealing with.
• Variety refers to the fact that the data often comes from many different
sources and in many different formats.
• Velocity refers to the speed at which the data is being generated.
6. Hadoop
• The name and logo come from a toy elephant belonging to Doug Cutting’s son.
• Started as part of Nutch, a search engine project begun in 2003 by Doug
Cutting and Mike Cafarella.
• Implemented ideas from Google’s white papers on the Google File System
and MapReduce.
• Backed by Yahoo from 2006, when Hadoop became its own open-source project.
• Also in 2006, Hadoop 0.1.0 was released.
7. Hadoop Cluster
The core Hadoop project consists of a way to store data, known as the
Hadoop Distributed File System, or HDFS, and a way to process the data,
called MapReduce. The key concept is that we split the data up and store it
across a collection of machines, known as a cluster. Then, when we want to
process the data, we process it where it’s actually stored: rather than
retrieving the data from a central server, we process it in place on the
cluster.
[Diagram: Store in HDFS → Process with MapReduce]