Hadoop..

Apache Hadoop Seminar

Hadoop.. Presentation Transcript

  • 1. Presented by NIKHIL P L
  • 2. Apache Hadoop
      • Developer(s): Apache Software Foundation
      • Type: Distributed File System
      • License: Apache License 2.0
      • Written in: Java
      • OS: Cross platform
      • Created by: Doug Cutting (2005)
      • Inspired by: Google’s MapReduce, GFS
  • 3. Sub projects
      • HDFS
          – Distributed, scalable, and portable file system
          – Stores large data sets
          – Copes with hardware failure
          – Runs on top of the existing file system
  • 4. HDFS - Replication
      • Blocks of data are replicated to multiple nodes
      • This allows a node to fail without data loss
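
The replication idea on slide 4 can be illustrated with a toy model in Python. This is a sketch, not HDFS code: the names `place_blocks` and `readable_blocks`, the round-robin placement, and the replication factor of 3 are assumptions made for illustration (real HDFS placement is rack-aware and managed by the NameNode).

```python
from itertools import cycle

def place_blocks(block_ids, nodes, replication=3):
    """Assign each block to `replication` distinct nodes (toy round-robin placement)."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    placement = {}
    ring = cycle(range(len(nodes)))
    for block in block_ids:
        start = next(ring)
        # Consecutive positions on the ring give distinct nodes.
        placement[block] = {nodes[(start + i) % len(nodes)] for i in range(replication)}
    return placement

def readable_blocks(placement, failed_nodes):
    """Return the blocks that still have at least one replica on a live node."""
    failed = set(failed_nodes)
    return {block for block, holders in placement.items() if holders - failed}

nodes = ["dn1", "dn2", "dn3", "dn4"]  # hypothetical DataNode names
placement = place_blocks(["b0", "b1", "b2", "b3", "b4"], nodes)

# With three replicas per block, losing any two nodes leaves every block readable.
survivors = readable_blocks(placement, {"dn1", "dn2"})
print(survivors == set(placement))
```

With three replicas, a block becomes unreadable only if all three of its holders fail at the same time, which is why the slide describes replication as protection against node failure.
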
  • 5. Sub projects (continued)
      • MapReduce
          – Technology from Google
          – Hadoop's fundamental data processing model
          – Built from Map and Reduce functions
          – Useful in a wide range of applications
              • distributed pattern-based searching, distributed sorting, web link-graph reversal, machine learning, statistical machine translation
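
The Map and Reduce functions named on slide 5 can be sketched with the classic word-count example. This runs in a single Python process and only imitates the model; the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are illustrative and are not part of Hadoop's API.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1

def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)

def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())

counts = word_count(["hadoop stores data", "hadoop processes data"])
print(counts)
```

In a real cluster the map calls run in parallel on the nodes that hold the input blocks, and the grouping step moves data across the network; the sketch keeps only the shape of the computation.
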
  • 6. MapReduce - Workflow
  • 7. Hadoop cluster (Terminology)
  • 8. Types of Nodes
      • HDFS nodes
          – NameNode (Master)
          – DataNode (Slaves)
      • MapReduce nodes
          – Job Tracker (Master)
          – Task Tracker (Slaves)
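
The master and slave roles on slide 8 can be made concrete with a toy scheduler. The sketch assumes the master knows which nodes hold each block (information the NameNode tracks) and how many free task slots each Task Tracker has. The function `assign_tasks`, the greedy strategy, and the node names are hypothetical and much simpler than Hadoop's actual scheduler.

```python
def assign_tasks(block_locations, free_slots):
    """Toy Job Tracker: hand each block's map task to a tracker with a free slot,
    preferring trackers on nodes that already hold the block."""
    slots = dict(free_slots)  # tracker -> remaining slots (copy, caller's dict untouched)
    assignment = {}
    for block, holders in block_locations.items():
        local = [t for t in sorted(holders) if slots.get(t, 0) > 0]
        remote = [t for t in sorted(slots) if slots[t] > 0]
        candidates = local or remote
        if not candidates:
            break  # no capacity left; remaining tasks would have to wait
        chosen = candidates[0]
        slots[chosen] -= 1
        assignment[block] = chosen
    return assignment

block_locations = {"b0": {"node1", "node2"}, "b1": {"node1"}, "b2": {"node3"}}
free_slots = {"node1": 1, "node2": 1, "node3": 0, "node4": 1}
print(assign_tasks(block_locations, free_slots))
```

Here `b0` runs locally on `node1`, while `b1` and `b2` fall back to remote trackers because their local nodes have no free slot. A greedy pass like this is not optimal; it only shows the division of labor between one coordinating master and many workers.
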
  • 9. Types of Nodes (continued)
  • 10. Sub projects (continued)
      • Hive
          – Provides data summarization, query, and analysis
          – Initially developed by Facebook
      • HBase
          – Open source, non-relational, distributed database
          – Provides database capabilities modeled on Google's BigTable
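
The BigTable model that slide 10 mentions for HBase is commonly described as a sparse, sorted map from (row key, column, timestamp) to a value. A minimal in-memory sketch of that idea follows; the class `ToyTable` and its methods are hypothetical and do not correspond to the HBase client API.

```python
class ToyTable:
    """Toy BigTable-style store: (row key, column) -> timestamped versions."""

    def __init__(self):
        self._cells = {}

    def put(self, row, column, value, timestamp):
        """Store one version of a cell."""
        self._cells.setdefault((row, column), {})[timestamp] = value

    def get(self, row, column):
        """Return the newest version of a cell, or None if it was never written."""
        versions = self._cells.get((row, column))
        return versions[max(versions)] if versions else None

    def scan(self, start_row, stop_row):
        """Yield cells in row-key order for start_row <= row < stop_row."""
        for row, column in sorted(self._cells):
            if start_row <= row < stop_row:
                yield row, column, self.get(row, column)

table = ToyTable()
table.put("user1", "info:email", "old@example.com", timestamp=1)
table.put("user1", "info:email", "new@example.com", timestamp=2)
table.put("user2", "info:name", "Asha", timestamp=1)

print(table.get("user1", "info:email"))
print(list(table.scan("user1", "user2")))
```

Keeping rows sorted by key is what makes range scans cheap in this model, and keeping several timestamped versions per cell is what distinguishes it from a plain key-value dictionary.
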
  • 11. Sub projects (continued)
      • ZooKeeper
          – Distributed configuration service, synchronization service, notification system, and naming registry for large distributed systems
      • Pig
          – A language and compiler to generate Hadoop programs
          – Originally developed at Yahoo!
  • 12. How does Hadoop work?
      • How HDFS works
  • 13. How does Hadoop work? (continued)
      • How MapReduce works
  • 14. How does Hadoop work? (continued)
      • How MapReduce works
  • 15. How does Hadoop work? (continued)
      • Managing Hadoop jobs
  • 16. Applications
      • Marketing analytics
      • Machine learning (e.g. spam filters)
      • Image processing
      • Processing of XML messages
  • 17.
      • World's largest Hadoop production application
      • ~20,000 machines running Hadoop
  • 18.
      • The largest Hadoop cluster in the world, with 100 PB of storage
      • 1200 machines with 8 cores each + 800 machines with 16 cores each
      • 32 GB of RAM per machine
      • 65 million files in HDFS
      • 12 TB of compressed data added per day
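
The figures on slide 18 can be combined into cluster-wide totals with a few lines of arithmetic. The sketch assumes the 32 GB of RAM applies to every machine in both groups and that data arrives at a steady 12 TB per day over a 365-day year; neither assumption is stated on the slide.

```python
# (machine count, cores per machine) for the two groups listed on the slide
groups = [(1200, 8), (800, 16)]
ram_per_machine_gb = 32
daily_ingest_tb = 12

machines = sum(count for count, _ in groups)
cores = sum(count * per_machine for count, per_machine in groups)
total_ram_gb = machines * ram_per_machine_gb
yearly_ingest_tb = daily_ingest_tb * 365

print(f"machines: {machines}")
print(f"cores: {cores}")
print(f"total RAM: {total_ram_gb} GB")
print(f"compressed data per year: {yearly_ingest_tb} TB")
```

Under those assumptions the cluster has 2000 machines, 22,400 cores, and 64,000 GB of RAM, and it takes in about 4,380 TB of compressed data per year.
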
  • 19. Other Users
  • 20. Thanks