Hadoop

Apache Hadoop
• Developer(s): Apache Software Foundation
• Type: Distributed File System
• License: Apache License 2.0
• Written in: Java
• OS: Cross platform
• Created by: Doug Cutting (2005)
• Inspired by: Google’s MapReduce, GFS

Sub projects
• HDFS
– Distributed, scalable, and portable file system
– Stores large data sets
– Copes with hardware failure
– Runs on top of each node's existing (native) file system

HDFS - Replication
• Data blocks are replicated to multiple nodes
• Allows for node failure without data loss
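
The replication idea can be sketched in a few lines of Python. This is a toy model, not HDFS code: the node and block names are made up, the round-robin placement is a simplification (real HDFS placement is rack-aware), and the replication factor of 3 is an assumption taken from HDFS's usual default rather than from these slides.

```python
REPLICATION_FACTOR = 3  # assumption: common HDFS default; the slides give no number


def place_replicas(block_ids, nodes, replication=REPLICATION_FACTOR):
    """Assign each block to `replication` distinct nodes, round-robin.

    Simplified placement for illustration only.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for i, block in enumerate(block_ids):
        placement[block] = [nodes[(i + k) % len(nodes)] for k in range(replication)]
    return placement


def readable_blocks(placement, failed_nodes):
    """Return the blocks that still have at least one live replica."""
    failed = set(failed_nodes)
    return {
        block
        for block, holders in placement.items()
        if any(node not in failed for node in holders)
    }


# Hypothetical cluster of four nodes holding five blocks.
nodes = ["node1", "node2", "node3", "node4"]
blocks = ["blk_0", "blk_1", "blk_2", "blk_3", "blk_4"]
placement = place_replicas(blocks, nodes)

# With three replicas per block, losing one node loses no data.
survivors = readable_blocks(placement, failed_nodes=["node2"])
print(survivors == set(blocks))
```

In this sketch a block only becomes unreadable once every node holding one of its replicas has failed.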

Sub projects
• MapReduce
– Programming model originating at Google
– Hadoop's core model for processing data in parallel
– Built from user-supplied Map and Reduce functions
– Useful in a wide range of applications
• Distributed pattern-based searching, distributed sorting, web link-graph reversal, machine learning, statistical machine translation
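
To make the Map and Reduce functions concrete, here is a minimal single-process word count in Python. It only imitates the programming model: the function names are illustrative, the `shuffle` step stands in for the grouping the framework performs between the two phases, and nothing here uses Hadoop's actual API or runs in a distributed fashion.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a list of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["Hadoop stores data", "Hadoop processes data"])
print(counts)
```

Because each call to `map_phase` looks at one record and each call to `reduce_phase` looks at one key, both phases could in principle be spread across many machines, which is the property the model relies on.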

Hadoop cluster (Terminology)

Types of Nodes
• HDFS nodes
– NameNode (Master)
– DataNodes (Slaves)

• MapReduce nodes
– JobTracker (Master)
– TaskTrackers (Slaves)
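
The master/slave split on the HDFS side can be illustrated with a toy model, assuming the usual division of labor: the NameNode keeps only metadata, and the DataNodes hold the block contents. The class and function names below, the file path, and the 4-byte block size are all hypothetical and chosen for readability; replication is left out to keep the sketch short.

```python
class DataNode:
    """Worker that stores the actual block contents."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block id -> bytes

    def store(self, block_id, data):
        self.blocks[block_id] = data

    def read(self, block_id):
        return self.blocks[block_id]


class NameNode:
    """Master that keeps only metadata: a file's blocks and where they live."""

    def __init__(self):
        self.file_blocks = {}      # file path -> ordered list of block ids
        self.block_locations = {}  # block id -> DataNode holding it

    def register(self, path, block_id, datanode):
        self.file_blocks.setdefault(path, []).append(block_id)
        self.block_locations[block_id] = datanode

    def locate(self, path):
        return [(b, self.block_locations[b]) for b in self.file_blocks[path]]


def write_file(namenode, datanodes, path, data, block_size=4):
    """Split data into blocks and spread them over the workers."""
    # Tiny block size so the split is visible; real HDFS blocks are far larger.
    for offset in range(0, len(data), block_size):
        index = offset // block_size
        block_id = f"{path}#blk{index}"
        node = datanodes[index % len(datanodes)]
        node.store(block_id, data[offset:offset + block_size])
        namenode.register(path, block_id, node)


def read_file(namenode, path):
    """Ask the master where the blocks are, then read them from the workers."""
    return b"".join(node.read(block_id) for block_id, node in namenode.locate(path))


namenode = NameNode()
datanodes = [DataNode("dn1"), DataNode("dn2")]
write_file(namenode, datanodes, "/logs/a.txt", b"hello hadoop")
print(read_file(namenode, "/logs/a.txt"))
```

The MapReduce side follows the same pattern in spirit: one master tracks the job while the workers run the individual tasks.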

Sub projects
• Hive
– Provides data summarization, query, and analysis
– Initially developed by Facebook

• HBase
– Open source, non-relational, distributed database
– Provides capabilities modeled on Google's Bigtable

Sub projects
• ZooKeeper
– Distributed configuration service, synchronization service, notification system, and naming registry for large distributed systems

• Pig
– A language and compiler for generating Hadoop programs
– Originally developed at Yahoo!

How does Hadoop work?
• How HDFS works

How does Hadoop work?
• How MapReduce works

How does Hadoop work?
• How MapReduce works

How does Hadoop work?
• Managing Hadoop jobs

Applications
• Marketing analytics
• Machine learning (e.g., spam filters)
• Image processing
• Processing of XML messages

• World's largest Hadoop production application
• ~20,000 machines running Hadoop

• The largest Hadoop cluster in the world, with 100 PB of storage
• 1,200 machines with 8 cores each + 800 machines with 16 cores each
• 32 GB of RAM per machine
• 65 million files in HDFS
• 12 TB of compressed data added per day
