Introduction to Apache Hadoop

A short introduction to Apache Hadoop and its related products.

Transcript

  • 1. Apache Hadoop ● What is it? ● Architecture ● Related Projects ● Large users
  • 2. Hadoop – What is it? ● An open source system developed in Java ● Supports very large data sets ● Supports large clusters of servers ● Designed to run on pre-existing, low-cost hardware ● Allows work to be split across the cluster ● Allows storage to be split across the cluster ● Provides resilience via automatic failure handling
  • 3. Hadoop – Architecture Hadoop consists of: ● Hadoop Common – common utilities that support the other Hadoop modules ● Hadoop MapReduce – parallel processing of Hadoop data ● Hadoop YARN – scheduler and resource manager ● Hadoop Distributed File System (HDFS) – a master/slave file system that spreads Hadoop data over a very large cluster of slave data nodes controlled by a single name node
  • 4. Hadoop – Related Projects
  • 5. Hadoop – Related Projects ● Pig – for analysing large data sets ● Hive – data warehouse system for Hadoop ● Mahout – machine learning and data mining ● Avro – a data serialization system ● ZooKeeper – helps build distributed applications ● Chukwa – data collection and analysis
  • 6. Hadoop – Related Projects ● Hue – Hadoop user interface ● Oozie – workflow scheduler ● Hama – bulk synchronous parallel framework for massive scientific computations ● Nutch – web crawler ● HBase – non-relational database
  • 7. Hadoop – Large Users ● Yahoo – 10,000-core Linux cluster ● Facebook – 100 petabytes, growing at 0.5 petabytes a day ● Amazon – it's possible to run Hadoop on Amazon's EC2 and S3
  • 8. Contact Us ● Feel free to contact us at – www.semtech-solutions.co.nz – info@semtech-solutions.co.nz ● We offer IT project consultancy ● We are happy to hear about your problems ● You pay only for the hours you need to solve your problems
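
Slides 2 and 3 describe storage and work being split across a cluster, with failures handled automatically. The following is a minimal single-process sketch of that idea in plain Python. It does not use any Hadoop API. The function names (`split_into_blocks`, `place_blocks`, `map_block`, `reduce_pairs`, `word_count`), the node names, the round-robin placement and the replication factor of 2 are all assumptions made for illustration, and real HDFS and MapReduce behave in far more sophisticated ways.

```python
from collections import defaultdict


def split_into_blocks(lines, block_size):
    """Group input lines into fixed-size blocks (a toy stand-in for file blocks)."""
    return [lines[i:i + block_size] for i in range(0, len(lines), block_size)]


def place_blocks(num_blocks, data_nodes, replication):
    """Toy 'name node' table: block id -> data nodes holding a copy.

    Round-robin placement is an assumption of this sketch, and it only
    gives distinct replicas when replication <= len(data_nodes).
    """
    n = len(data_nodes)
    return {
        block_id: [data_nodes[(block_id + r) % n] for r in range(replication)]
        for block_id in range(num_blocks)
    }


def map_block(block):
    """Map phase: emit a (word, 1) pair for every word in one block."""
    return [(word.lower(), 1) for line in block for word in line.split()]


def reduce_pairs(pairs):
    """Reduce phase: sum the counts emitted for each word."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


def word_count(lines, data_nodes, block_size=2, replication=2, failed=()):
    """Count words block by block, skipping replicas on failed nodes."""
    blocks = split_into_blocks(lines, block_size)
    placement = place_blocks(len(blocks), data_nodes, replication)
    pairs = []
    for block_id, block in enumerate(blocks):
        # A block stays readable as long as one replica is on a live node.
        live = [node for node in placement[block_id] if node not in failed]
        if not live:
            raise RuntimeError(f"block {block_id} lost: all replicas failed")
        pairs.extend(map_block(block))
    return reduce_pairs(pairs)


if __name__ == "__main__":
    text = ["big data big cluster", "data node", "name node"]
    nodes = ["dn1", "dn2", "dn3"]  # hypothetical data node names
    print(word_count(text, nodes))
    # Losing one node does not change the result, because each block has a second copy.
    print(word_count(text, nodes, failed={"dn1"}))
```

In this sketch, losing a single node leaves every block with a surviving copy, so the word counts are unchanged. The job only fails when every replica of some block sits on a failed node.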