Introdution to Apache Hadoop
Upcoming SlideShare
Loading in...5
×
 

Introdution to Apache Hadoop

on

  • 716 views

A short introduction to Apache Hadoop and it's related products.

A short introduction to Apache Hadoop and it's related products.

Statistics

Views

Total Views
716
Views on SlideShare
716
Embed Views
0

Actions

Likes
1
Downloads
36
Comments
0

0 Embeds 0

No embeds

Accessibility

Categories

Upload Details

Uploaded via as OpenOffice

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

    Introdution to Apache Hadoop Introdution to Apache Hadoop Presentation Transcript

    • Apache Hadoop ● What is it ? ● Architecture ● Related Projects ● Large users
    • Hadoop – What is it ? ● An open source system developed using Java ● Supports very large data sets ● Supports large clusters of servers ● Designed to run on pre existing low cost hardware ● Allows for fragmentation of work over cluster ● Allows for fragmentation of storage over cluster ● Provides resiliance via automatic failure handling
    • Hadoop - Architecture Hadoop consists of ● Hadoop Common Common utilities for Hadoop module support ● Hadoop MapReduce Parallel processing of Hadoop data ● Hadoop Yarn Scheduler and resource manager ● Hadoop Distributed File System (HDFS) A Master/Slave file system which spreads the Hadoop data over a very large cluster of slave data nodes controlled by a single name node.
    • Hadoop – Related Projects
    • Hadoop – Related Projects ● Pig - for analysing large data sets ● Hive – data warehouse system for Hadoop ● Mahout – machine learning and data mining ● Avro – a data serialization system ● Zoo Keeper – helps build distributed applications ● Chukwa – data collection and analysis
    • Hadoop – Related Projects ● Hue – Hadoop user interface ● Oozie – work flow scheduler ● Hama – bulk synchronous parallel framework – For massive scientific computations ● Nutch – web crawler ● Hbase – Non relational database
    • Hadoop – Large Users ● Yahoo – 10,000 core Linux cluster ● Facebook – 100 Petabytes, growing at .5 Petabytes a day ● Amazon – Its possible to run Hadoop on Amazon's EC2 and S3
    • Contact Us ● Feel free to contact us at – www.semtech-solutions.co.nz – info@semtech-solutions.co.nz ● We offer IT project consultancy ● We are happy to hear about your problems ● You can just pay for those hours that you need ● To solve your problems