Introduction to Apache Hadoop

A short introduction to Apache Hadoop and its related products.

Published in: Technology, Education

Transcript

  • 1. Apache Hadoop ● What is it? ● Architecture ● Related Projects ● Large users
  • 2. Hadoop – What is it? ● An open source system developed in Java ● Supports very large data sets ● Supports large clusters of servers ● Designed to run on pre-existing, low-cost hardware ● Allows work to be split across the cluster ● Allows storage to be split across the cluster ● Provides resilience via automatic failure handling
  • 3. Hadoop – Architecture Hadoop consists of: ● Hadoop Common: common utilities that support the other Hadoop modules ● Hadoop MapReduce: parallel processing of Hadoop data ● Hadoop YARN: scheduler and resource manager ● Hadoop Distributed File System (HDFS): a master/slave file system that spreads the Hadoop data over a very large cluster of slave data nodes controlled by a single name node
  • 4. Hadoop – Related Projects
  • 5. Hadoop – Related Projects ● Pig – for analysing large data sets ● Hive – data warehouse system for Hadoop ● Mahout – machine learning and data mining ● Avro – a data serialization system ● ZooKeeper – helps build distributed applications ● Chukwa – data collection and analysis
  • 6. Hadoop – Related Projects ● Hue – Hadoop user interface ● Oozie – workflow scheduler ● Hama – bulk synchronous parallel framework for massive scientific computations ● Nutch – web crawler ● HBase – non-relational database
  • 7. Hadoop – Large Users ● Yahoo – 10,000-core Linux cluster ● Facebook – 100 petabytes, growing at 0.5 petabytes a day ● Amazon – it is possible to run Hadoop on Amazon's EC2 and S3
  • 8. Contact Us ● Feel free to contact us at – www.semtech-solutions.co.nz – info@semtech-solutions.co.nz ● We offer IT project consultancy ● We are happy to hear about your problems ● You pay only for the hours you need to solve your problems
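
Slide 3 describes Hadoop MapReduce as parallel processing of Hadoop data. The following is a minimal sketch of that idea in plain Python, not Hadoop's actual Java API. The function names (map_phase, shuffle, reduce_phase, word_count) and the use of a thread pool to stand in for cluster workers are assumptions made for illustration only.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_chunks):
    """Group all emitted values by key, between the map and reduce phases."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for word, count in pairs:
            groups[word].append(count)
    return groups


def reduce_phase(word, counts):
    """Combine all values for one key into a single result."""
    return word, sum(counts)


def word_count(chunks, workers=4):
    # Each chunk stands in for one input split handled by a separate worker.
    # A real cluster would run these on different machines, not threads.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, chunks))
    grouped = shuffle(mapped)
    return dict(reduce_phase(w, c) for w, c in grouped.items())


if __name__ == "__main__":
    chunks = ["Hadoop stores data", "Hadoop processes data in parallel"]
    print(word_count(chunks))
```

The map step never needs to see more than its own chunk, which is what lets the work be split across the cluster as slide 2 describes. Only the grouping step has to bring results from different chunks together.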

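Slides 2 and 3 say that storage is split across the cluster, that a single name node controls many data nodes, and that failures are handled automatically. The sketch below is a toy model of that arrangement, assuming a round-robin placement policy and small illustrative numbers. The class ToyNameNode, its methods, and the node names are hypothetical and are not part of Hadoop; real HDFS placement is more involved.

```python
import itertools


def split_into_blocks(file_size, block_size):
    """Return the sizes of the blocks a file of file_size is cut into."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


class ToyNameNode:
    """Tracks which data nodes hold each block; stores no file data itself."""

    def __init__(self, data_nodes, replication=3):
        self.live = list(data_nodes)
        self.replication = replication
        self.block_map = {}  # (file name, block index) -> set of node names

    def add_file(self, name, size, block_size):
        # Assumed policy: hand out copies round-robin over the live nodes.
        ring = itertools.cycle(self.live)
        copies = min(self.replication, len(self.live))
        for index, _ in enumerate(split_into_blocks(size, block_size)):
            self.block_map[(name, index)] = {next(ring) for _ in range(copies)}

    def fail_node(self, node):
        """Drop a dead node and re-create its lost copies on other nodes."""
        self.live.remove(node)
        for holders in self.block_map.values():
            holders.discard(node)
            spare = [n for n in self.live if n not in holders]
            while len(holders) < self.replication and spare:
                holders.add(spare.pop(0))


if __name__ == "__main__":
    name_node = ToyNameNode(["dn1", "dn2", "dn3", "dn4"])
    name_node.add_file("weblog", size=300, block_size=128)
    name_node.fail_node("dn1")
    for block, holders in sorted(name_node.block_map.items()):
        print(block, sorted(holders))
```

Because every block lives on several data nodes, losing one node loses no data; the name node only has to notice the failure and schedule new copies elsewhere.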