Hadoop

Apache Hadoop Seminar

Published in: Technology

  1. Presented by NIKHIL P L
  2. Apache Hadoop
       • Developer(s): Apache Software Foundation
       • Type: Distributed File System
       • License: Apache License 2.0
       • Written in: Java
       • OS: Cross platform
       • Created by: Doug Cutting (2005)
       • Inspired by: Google’s MapReduce, GFS
  3. Sub projects
       • HDFS
           – Distributed, scalable, and portable file system
           – Stores large data sets
           – Copes with hardware failure
           – Runs on top of the existing system
  4. HDFS - Replication
       • Blocks of data are replicated to multiple nodes
       • Allows for node failure without data loss
  5. Sub projects (continued)
       • MapReduce
           – Technology from Google
           – Hadoop's fundamental data filtering algorithm
           – Map and Reduce functions
           – Useful in a wide range of applications: distributed pattern-based searching, distributed sorting, web link-graph reversal, machine learning, statistical machine translation
  6. MapReduce - Workflow
  7. Hadoop cluster (Terminology)
  8. Types of Nodes
       • HDFS nodes
           – NameNode (Master)
           – DataNodes (Slaves)
       • MapReduce nodes
           – JobTracker (Master)
           – TaskTrackers (Slaves)
  9. Types of Nodes (continued)
  10. Sub projects (continued)
       • Hive
           – Provides data summarization, query, and analysis
           – Initially developed by Facebook
       • HBase
           – Open source, non-relational, distributed database
           – Provides database capabilities modeled on Google's Bigtable
  11. Sub projects (continued)
       • ZooKeeper
           – Distributed configuration service, synchronization service, notification system, and naming registry for large distributed systems
       • Pig
           – A language and compiler to generate Hadoop programs
           – Originally developed at Yahoo!
  12. How does Hadoop work?
       • How HDFS works
  13. How does Hadoop work? (continued)
       • How MapReduce works
  14. How does Hadoop work? (continued)
       • How MapReduce works
  15. How does Hadoop work? (continued)
       • Managing Hadoop jobs
  16. Applications
       • Marketing analytics
       • Machine learning (e.g. spam filters)
       • Image processing
       • Processing of XML messages
  17.
       • World's largest Hadoop production application
       • ~20,000 machines running Hadoop
  18.
       • The largest Hadoop cluster in the world, with 100 PB of storage
       • 1200 machines with 8 cores each + 800 machines with 16 cores each
       • 32 GB of RAM per machine
       • 65 million files in HDFS
       • 12 TB of compressed data added per day
  19. Other Users
  20. Thanks
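
Slide 4 says that replicating blocks to multiple nodes allows for node failure without data loss. The idea can be sketched as a toy simulation in Python. This is not Hadoop code: the function names, the node names, and the random placement are assumptions made for illustration (real HDFS placement is rack-aware), and the replication factor of 3 mirrors the usual HDFS default.

```python
import random


def place_blocks(num_blocks, nodes, replication=3, seed=0):
    """Assign each block to `replication` distinct nodes.

    Hypothetical helper: placement here is random, which is a
    simplification of what a real NameNode does.
    """
    rng = random.Random(seed)
    return {block: set(rng.sample(nodes, replication))
            for block in range(num_blocks)}


def surviving_blocks(placement, failed_nodes):
    """Return the blocks that still have a replica on a live node."""
    failed = set(failed_nodes)
    return {block for block, holders in placement.items()
            if holders - failed}


# Illustrative DataNode names, not taken from the slides.
nodes = ["dn1", "dn2", "dn3", "dn4", "dn5"]
placement = place_blocks(num_blocks=8, nodes=nodes, replication=3)

# With 3 replicas per block, losing any 2 nodes cannot remove
# every copy of a block, so all blocks remain readable.
alive = surviving_blocks(placement, failed_nodes=["dn1", "dn2"])
print(len(alive) == len(placement))
```

With a replication factor of r, any r - 1 simultaneous node failures leave at least one copy of every block, which is the property the slide describes.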
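
Slide 5 describes MapReduce in terms of Map and Reduce functions. A minimal single-process sketch of that model, using word counting as the example, might look like the following. It does not use the Hadoop API; the function names and the input lines are assumptions chosen for illustration, and a real job would run the map and reduce steps in parallel across the cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values)
                for key, values in shuffle(pairs).items())


counts = word_count(["Hadoop stores data", "Hadoop processes data"])
print(counts)
```

Because each call to `map_phase` looks at one line only and each call to `reduce_phase` looks at one key only, both steps can be spread over many machines, which is what makes the model suit the distributed searching and sorting tasks listed on the slide.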