Hadoop

Presented by
NIKHIL P L
Apache Hadoop
• Developer(s): Apache Software Foundation
• Type: Distributed File System
• License: Apache License 2.0
• Written in: Java
• OS: Cross-platform
• Created by: Doug Cutting (2005)
• Inspired by: Google's MapReduce, GFS
Sub projects
• HDFS
– distributed, scalable, and portable file system
– stores large data sets
– copes with hardware failure
– runs on top of the existing OS file system
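
A minimal sketch of writing a file through the HDFS Java API (org.apache.hadoop.fs.FileSystem); the NameNode address and file path here are hypothetical, and in practice the address is picked up from core-site.xml:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // hypothetical NameNode address; normally read from core-site.xml
        conf.set("fs.defaultFS", "hdfs://namenode:9000");
        FileSystem fs = FileSystem.get(conf);

        // write a small file; HDFS splits large files into blocks behind the scenes
        Path file = new Path("/user/demo/hello.txt");
        try (FSDataOutputStream out = fs.create(file)) {
            out.writeUTF("Hello, HDFS!");
        }
        System.out.println("exists: " + fs.exists(file));
    }
}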

HDFS - Replication
• Data blocks are replicated across multiple nodes
• Allows for node failure without data loss
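
A minimal sketch of controlling the replication factor from the same Java API; dfs.replication and FileSystem.setReplication are standard HDFS knobs, while the file path is hypothetical:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("dfs.replication", "3");  // default replica count for new files
        FileSystem fs = FileSystem.get(conf);
        // raise the replication factor of an existing (hypothetical) file to 3
        fs.setReplication(new Path("/user/demo/hello.txt"), (short) 3);
    }
}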

Sub projects .
• MapReduce
– technology from Google
– Hadoop's core data processing model
– Map and Reduce functions
– useful in a wide range of applications
• distributed pattern-based searching, distributed
sorting, web link-graph reversal, machine learning,
statistical machine translation
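
As a sketch of the Map and Reduce functions, the classic word-count pair written against the standard org.apache.hadoop.mapreduce API; the map step emits (word, 1) pairs and the reduce step sums them per word:

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Map: emit (word, 1) for every token in a line of input
public class WordCountMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(Object key, Text value, Context ctx)
            throws IOException, InterruptedException {
        StringTokenizer it = new StringTokenizer(value.toString());
        while (it.hasMoreTokens()) {
            word.set(it.nextToken());
            ctx.write(word, ONE);
        }
    }
}

// Reduce: sum the counts gathered for each word after the shuffle
class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) sum += v.get();
        ctx.write(key, new IntWritable(sum));
    }
}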

MapReduce - Workflow
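
A minimal driver sketch of the workflow: input files are split, mapped, shuffled and sorted by key, then reduced into output files. The class names come from the word-count sketch above and the input/output paths are hypothetical:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(WordCountMapper.class);     // map phase
        job.setCombinerClass(WordCountReducer.class);  // optional local pre-aggregation
        job.setReducerClass(WordCountReducer.class);   // reduce phase, after shuffle/sort
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path("/user/demo/in"));    // hypothetical
        FileOutputFormat.setOutputPath(job, new Path("/user/demo/out")); // hypothetical
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}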

Hadoop cluster (Terminology)

Types of Nodes
• HDFS nodes
– NameNode (Master)
– DataNode (Slaves)

• MapReduce nodes
– JobTracker (Master)
– TaskTracker (Slaves)
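
A minimal sketch of the NameNode's master role: clients ask it where a file's blocks live before reading from the DataNodes. FileSystem.getFileBlockLocations is the standard call; the file path is hypothetical:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocationsExample {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        FileStatus st = fs.getFileStatus(new Path("/user/demo/hello.txt")); // hypothetical
        // the NameNode answers with block -> DataNode placement metadata
        for (BlockLocation loc : fs.getFileBlockLocations(st, 0, st.getLen())) {
            System.out.println(loc); // offset, length, and hosting DataNodes
        }
    }
}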

Types of Nodes .

Sub projects ..
• Hive
– provides data summarization, query, and analysis
– initially developed by Facebook

• HBase
– open-source, non-relational, distributed database
– provides Google BigTable-like database capabilities
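
A minimal sketch of storing and reading one cell with the HBase Java client API; the table, column family, and values here are hypothetical:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("users"))) { // hypothetical table
            // store one cell: row "row1", column family "info", qualifier "name"
            Put put = new Put(Bytes.toBytes("row1"));
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Ada"));
            table.put(put);
            // read the cell back
            Result r = table.get(new Get(Bytes.toBytes("row1")));
            System.out.println(Bytes.toString(
                    r.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"))));
        }
    }
}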

Sub projects …
• ZooKeeper
– distributed configuration service, synchronization
service, notification system, and naming registry for
large distributed systems (see the sketch after this list)

• Pig
– A language and compiler to generate Hadoop
programs
– Originally developed at Yahoo!
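
A minimal ZooKeeper sketch: publishing a piece of shared configuration under a znode and reading it back. The ensemble address, znode path, and payload are hypothetical:

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ZkExample {
    public static void main(String[] args) throws Exception {
        // connect to a (hypothetical) ZooKeeper ensemble; 3 s session timeout, no watcher
        ZooKeeper zk = new ZooKeeper("zk1:2181", 3000, null);
        // publish shared configuration under a well-known znode path
        zk.create("/app/config", "v1".getBytes(),
                  ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        System.out.println(new String(zk.getData("/app/config", false, null)));
        zk.close();
    }
}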

How does Hadoop work? .
• HDFS Works

How does Hadoop work? ..
• MapReduce Works

How does Hadoop work? …
• MapReduce Works

How does Hadoop work? ….
• Managing Hadoop Jobs

Applications
• Marketing analytics
• Machine learning (e.g., spam filters)
• Image processing
• Processing of XML messages

Yahoo!
• World's largest Hadoop production application
• ~20,000 machines running Hadoop

Facebook
• The largest Hadoop cluster in the world, with
100 PB of storage
• 1,200 machines with 8 cores each + 800
machines with 16 cores each
• 32 GB of RAM per machine
• 65 million files in HDFS
• 12 TB of compressed data added per day

Other Users

Thanks
