Hadoop..

Apache Hadoop Seminar

Published in: Technology

Transcript

  • 1. Presented by NIKHIL P L
  • 2. Apache Hadoop
      • Developer(s): Apache Software Foundation
      • Type: Distributed File System
      • License: Apache License 2.0
      • Written in: Java
      • OS: Cross platform
      • Created by: Doug Cutting (2005)
      • Inspired by: Google’s MapReduce, GFS
  • 3. Sub projects
      • HDFS
          – Distributed, scalable, and portable file system
          – Stores large data sets
          – Copes with hardware failure
          – Runs on top of the existing (native) file system
  • 4. HDFS - Replication
      • Blocks of data are replicated to multiple nodes
      • This allows a node to fail without data loss
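The replication idea on this slide can be illustrated with a toy simulation. This is only a sketch, not how HDFS is implemented. The node names, the helper functions `place_blocks` and `readable_blocks`, and the random placement policy are all assumptions made for illustration. Real HDFS uses a rack-aware placement policy, and the replication factor of 3 used here matches its usual default.

```python
import random

def place_blocks(num_blocks, nodes, replication=3, seed=0):
    """Assign each block to `replication` distinct nodes.

    Toy policy: replicas are picked at random (HDFS itself is rack-aware).
    """
    rng = random.Random(seed)
    return {block: set(rng.sample(nodes, replication)) for block in range(num_blocks)}

def readable_blocks(placement, failed_nodes):
    """Return the blocks that still have at least one replica on a live node."""
    failed = set(failed_nodes)
    return {block for block, holders in placement.items() if holders - failed}

# Hypothetical five-node cluster holding eight blocks.
nodes = ["node1", "node2", "node3", "node4", "node5"]
placement = place_blocks(num_blocks=8, nodes=nodes, replication=3)

# With 3 replicas per block, losing any 2 nodes cannot remove every copy.
survivors = readable_blocks(placement, failed_nodes=["node1", "node2"])
print("all blocks still readable:", len(survivors) == len(placement))
```

Because each block lives on three distinct nodes, at least one replica always survives the loss of two nodes. Losing all five nodes would leave no block readable.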
  • 5. Sub projects (continued)
      • MapReduce
          – Technology from Google
          – Hadoop's fundamental data processing algorithm
          – Built on Map and Reduce functions
          – Useful in a wide range of applications: distributed pattern-based searching, distributed sorting, web link-graph reversal, machine learning, statistical machine translation
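To make the Map and Reduce functions concrete, here is a minimal single-process word count in plain Python. It does not use Hadoop's API. The function names `map_phase`, `shuffle`, `reduce_phase`, and `run_job` are hypothetical and only mimic the shape of a MapReduce job: map emits key/value pairs, a shuffle step groups values by key, and reduce combines each group.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

def run_job(documents):
    """Run map, shuffle, and reduce in sequence inside one process."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())

counts = run_job(["Hadoop stores data", "Hadoop processes data"])
print(counts)
```

In a real cluster the map calls run in parallel on many nodes and the shuffle moves data over the network. The sketch keeps everything local so that the data flow stays visible.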
  • 6. MapReduce - Workflow
  • 7. Hadoop cluster (Terminology)
  • 8. Types of Nodes
      • HDFS nodes
          – NameNode (Master)
          – DataNode (Slaves)
      • MapReduce nodes
          – JobTracker (Master)
          – TaskTracker (Slaves)
  • 9. Types of Nodes (continued)
  • 10. Sub projects (continued)
      • Hive
          – Provides data summarization, query, and analysis
          – Initially developed by Facebook
      • HBase
          – Open source, non-relational, distributed database
          – Provides database capabilities modeled on Google's Bigtable
  • 11. Sub projects (continued)
      • ZooKeeper
          – Distributed configuration service, synchronization service, notification system, and naming registry for large distributed systems
      • Pig
          – A language and compiler to generate Hadoop programs
          – Originally developed at Yahoo!
  • 12. How does Hadoop work?
      • How HDFS works
  • 13. How does Hadoop work? (continued)
      • How MapReduce works
  • 14. How does Hadoop work? (continued)
      • How MapReduce works (continued)
  • 15. How does Hadoop work? (continued)
      • Managing Hadoop jobs
  • 16. Applications
      • Marketing analytics
      • Machine learning (e.g., spam filters)
      • Image processing
      • Processing of XML messages
  • 17.
      • World's largest Hadoop production application
      • ~20,000 machines running Hadoop
  • 18.
      • The largest Hadoop cluster in the world, with 100 PB of storage
      • 1200 machines with 8 cores each + 800 machines with 16 cores each
      • 32 GB of RAM per machine
      • 65 million files in HDFS
      • 12 TB of compressed data added per day
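The per-machine figures on this slide can be combined into cluster-wide totals with a little arithmetic. The sketch below uses only the numbers quoted on the slide. The variable names are illustrative, and the storage-per-machine figure assumes decimal units (1 PB = 1000 TB). The slide does not say whether the 100 PB is raw capacity or capacity after replication.

```python
# Figures quoted on the slide.
machines_8_core = 1200
machines_16_core = 800
ram_per_machine_gb = 32
storage_pb = 100

total_machines = machines_8_core + machines_16_core
total_cores = machines_8_core * 8 + machines_16_core * 16
total_ram_gb = total_machines * ram_per_machine_gb

# Assumption: decimal units, 1 PB = 1000 TB.
avg_storage_per_machine_tb = storage_pb * 1000 / total_machines

print(total_machines, "machines")
print(total_cores, "cores")
print(total_ram_gb, "GB of RAM in total")
print(avg_storage_per_machine_tb, "TB of storage per machine on average")
```

Under these assumptions the cluster has 2,000 machines, 22,400 cores, 64,000 GB of RAM, and an average of 50 TB of storage per machine.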
  • 19. Other Users
  • 20. Thanks