Introduction to Apache Hadoop


A short introduction to Apache Hadoop and its related products.

Published in: Technology, Education

Transcript of "Introduction to Apache Hadoop"

  1. Apache Hadoop
     ● What is it?
     ● Architecture
     ● Related Projects
     ● Large users
  2. Hadoop – What is it?
     ● An open source system developed in Java
     ● Supports very large data sets
     ● Supports large clusters of servers
     ● Designed to run on pre-existing, low-cost hardware
     ● Allows work to be split across the cluster
     ● Allows storage to be split across the cluster
     ● Provides resilience via automatic failure handling
  3. Hadoop – Architecture
     Hadoop consists of
     ● Hadoop Common – common utilities that support the other Hadoop modules
     ● Hadoop MapReduce – parallel processing of Hadoop data
     ● Hadoop YARN – scheduler and resource manager
     ● Hadoop Distributed File System (HDFS) – a master/slave file system that spreads Hadoop data over a very large cluster of slave data nodes controlled by a single name node
  4. Hadoop – Related Projects
  5. Hadoop – Related Projects
     ● Pig – for analysing large data sets
     ● Hive – data warehouse system for Hadoop
     ● Mahout – machine learning and data mining
     ● Avro – a data serialization system
     ● ZooKeeper – helps build distributed applications
     ● Chukwa – data collection and analysis
  6. Hadoop – Related Projects
     ● Hue – Hadoop user interface
     ● Oozie – workflow scheduler
     ● Hama – bulk synchronous parallel framework
       – For massive scientific computations
     ● Nutch – web crawler
     ● HBase – non-relational database
  7. Hadoop – Large Users
     ● Yahoo
       – 10,000 core Linux cluster
     ● Facebook
       – 100 petabytes, growing at 0.5 petabytes a day
     ● Amazon
       – It is possible to run Hadoop on Amazon's EC2 and S3
  8. Contact Us
     ● Feel free to contact us at
       – www.semtech-solutions.co.nz
       – info@semtech-solutions.co.nz
     ● We offer IT project consultancy
     ● We are happy to hear about your problems
     ● You can pay for just the hours you need to solve your problems
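
The slides say that storage is split across the cluster, that failures are handled automatically, and that HDFS spreads data over data nodes controlled by a single name node. The sketch below illustrates that idea in plain Python and does not use any Hadoop API. The function names, the node names and the round-robin placement rule are all assumptions made for illustration. Real HDFS uses a more elaborate placement policy and much larger blocks.

```python
# Toy model of HDFS-style storage. Everything here is a hypothetical
# simplification and not the real HDFS implementation.

def split_into_blocks(data: bytes, block_size: int) -> list:
    """Cut a file into fixed-size blocks (the last block may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks: int, data_nodes: list, replication: int) -> dict:
    """Play the name node's role: record which data nodes hold each block.

    Assumed placement rule: block b is copied to `replication` consecutive
    nodes, starting at node b and wrapping around the node list.
    """
    if replication > len(data_nodes):
        raise ValueError("replication factor exceeds number of data nodes")
    return {
        b: [data_nodes[(b + r) % len(data_nodes)] for r in range(replication)]
        for b in range(num_blocks)
    }


def readable_after_failure(placement: dict, failed: set) -> bool:
    """A file stays readable if every block has at least one live replica."""
    return all(
        any(node not in failed for node in nodes)
        for nodes in placement.values()
    )


blocks = split_into_blocks(b"abcdefghij", block_size=4)
placement = place_blocks(len(blocks), ["dn1", "dn2", "dn3", "dn4"], replication=2)
```

With two replicas per block, this toy layout survives the loss of any single data node, but losing two adjacent nodes can make a block unreadable. That is why the replication factor is a tunable setting.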
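
Slide 3 describes Hadoop MapReduce as parallel processing of Hadoop data. As a rough illustration of the programming model, here is a word count written as map, shuffle and reduce steps in plain Python. It runs in a single process and does not use Hadoop's actual Java API. The function names are hypothetical, and on a real cluster the map and reduce calls would run on many machines at once.

```python
from collections import defaultdict

# Single-process sketch of the MapReduce model (hypothetical helper names).

def map_phase(line: str):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list):
    """Reduce: collapse the values for one key into a single total."""
    return key, sum(values)


def word_count(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["Hadoop stores data", "Hadoop processes data"])
```

Each map call depends only on its own input line, and each reduce call only on the values for its own key. That independence is what lets a framework spread the work over a cluster.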