Hadoop..

Apache Hadoop Seminar (Published in: Technology)
Transcript of "Hadoop.."

  1. Presented by NIKHIL P L
  2. Apache Hadoop
     • Developer(s): Apache Software Foundation
     • Type: Distributed File System
     • License: Apache License 2.0
     • Written in: Java
     • OS: Cross platform
     • Created by: Doug Cutting (2005)
     • Inspired by: Google's MapReduce, GFS
  3. Sub projects
     • HDFS
       – distributed, scalable, and portable file system
       – stores large data sets
       – copes with hardware failure
       – runs on top of the existing file system
  4. HDFS - Replication
     • Blocks with data are replicated to multiple nodes
     • Allows for node failure without data loss
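The replication idea on this slide can be sketched in a few lines. This is a toy round-robin placement (the function name `place_replicas` and the node labels are illustrative, and real HDFS uses a rack-aware placement policy, not round-robin):

```python
# Toy illustration of HDFS-style block replication (NOT the real HDFS
# placement policy): each block is copied to `replication` distinct
# nodes, so any single node can fail without losing data.
def place_replicas(blocks, nodes, replication=3):
    placement = {}
    for i, block in enumerate(blocks):
        # Round-robin over the nodes; real HDFS also considers racks.
        chosen = [nodes[(i + r) % len(nodes)] for r in range(replication)]
        placement[block] = chosen
    return placement

placement = place_replicas(["blk_1", "blk_2"],
                           ["node1", "node2", "node3", "node4"])
# Every block lives on 3 distinct nodes, so losing any one node
# still leaves two copies of each block.
for block, replicas in placement.items():
    assert len(set(replicas)) == 3
```

In real HDFS the NameNode makes this decision per block (default replication factor 3), and re-replicates blocks when a DataNode is reported dead.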
  5. Sub projects (contd.)
     • MapReduce
       – Technology from Google
       – Hadoop's fundamental data processing model
       – Map and Reduce functions
       – Useful in a wide range of applications
         • distributed pattern-based searching, distributed sorting, web link-graph reversal, machine learning, statistical machine translation
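The Map and Reduce functions named on this slide can be illustrated without Hadoop at all. The sketch below is the programming model in plain Python (word count), not Hadoop's own API; the function names are illustrative:

```python
# Minimal sketch of the MapReduce programming model in plain Python
# (no Hadoop involved): map emits key/value pairs, the pairs are
# grouped by key (the "shuffle"), and reduce folds each group.
from collections import defaultdict

def map_fn(line):                 # map: one input line -> (word, 1) pairs
    for word in line.split():
        yield word.lower(), 1

def reduce_fn(word, counts):      # reduce: sum the 1s for one word
    return word, sum(counts)

def mapreduce(lines):
    groups = defaultdict(list)    # shuffle: group emitted values by key
    for line in lines:
        for key, value in map_fn(line):
            groups[key].append(value)
    return dict(reduce_fn(k, v) for k, v in groups.items())

result = mapreduce(["Hadoop stores data", "Hadoop processes data"])
# result == {"hadoop": 2, "stores": 1, "data": 2, "processes": 1}
```

Real Hadoop jobs express the same two functions through the Java MapReduce API (or Hadoop Streaming, which accepts scripts much like these); the grouping step is handled by the framework across the cluster.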
  6. MapReduce - Workflow (diagram)
  7. Hadoop cluster (Terminology) (diagram)
  8. Types of Nodes
     • HDFS nodes
       – NameNode (Master)
       – DataNode (Slaves)
     • MapReduce nodes
       – JobTracker (Master)
       – TaskTracker (Slaves)
  9. Types of Nodes (contd.) (diagram)
  10. Sub projects (contd.)
      • Hive
        – provides data summarization, query, and analysis
        – initially developed by Facebook
      • HBase
        – open source, non-relational, distributed database
        – provides capabilities modeled on Google's BigTable
  11. Sub projects (contd.)
      • ZooKeeper
        – distributed configuration service, synchronization services, notification system, and naming registry for large distributed systems
      • Pig
        – A language and compiler to generate Hadoop programs
        – Originally developed at Yahoo!
  12. How does Hadoop work?
      • How HDFS works (diagram)
  13. How does Hadoop work? (contd.)
      • How MapReduce works (diagram)
  14. How does Hadoop work? (contd.)
      • How MapReduce works (diagram)
  15. How does Hadoop work? (contd.)
      • Managing Hadoop jobs (diagram)
  16. Applications
      • Marketing analytics
      • Machine learning (e.g. spam filters)
      • Image processing
      • Processing of XML messages
  17. (logo slide)
      • world's largest Hadoop production application
      • ~20,000 machines running Hadoop
  18. (logo slide)
      • the largest Hadoop cluster in the world, with 100 PB of storage
      • 1200 machines with 8 cores each + 800 machines with 16 cores each
      • 32 GB of RAM per machine
      • 65 million files in HDFS
      • 12 TB of compressed data added per day
  19. Other Users (diagram)
  20. Thanks