Hadoop Ecosystem: First Class

Transcript

  • 1. Introduction to Distributed Programming
    › Sequential Programming
    › Asynchronous Programming
    › Concurrent Programming
    › Distributed Programming
    › Sequential Programming vs Asynchronous Programming
    › Concurrent Programming vs Distributed Programming
  • 2.
    › Open-source framework for writing and running distributed applications.
    › Suited to applications that process large amounts of data.
    › Accessible: runs on clusters of commodity hardware or on cloud services such as Amazon EC2.
    › Robust: designed on the assumption that hardware fails, so it recovers from most hardware failures gracefully.
    › Scalable: scales linearly to handle more data by adding more nodes.
    › Simple: lets users quickly write efficient parallel code.
    › Used in data-intensive applications in areas such as telecom and finance, and for features such as account overview pages.
    › SCALE-OUT instead of SCALE-UP: add more machines rather than move to a bigger single machine.
  • 3.
    › SCALE-OUT vs SCALE-UP
    › Key-Value Pairs instead of a Relational Database
    › Functional Programming (MapReduce) instead of Declarative SQL Statements
    › Offline Batch Processing vs Online Transactions
  • 4. How Hadoop Works
    › Cluster of Nodes
    › Types of Nodes
      - Computation Nodes: JobTracker, TaskTracker
      - Storage Nodes: NameNode, DataNodes, Secondary NameNode
  • 5. Understanding MapReduce
    › Scaling a Simple Program Manually
      - Example: Word Count over a Single Document
      - Scaling Word Count to Multiple Documents
      - Front End: the Map Program
      - Back End: the Reduce Program
    › How Hadoop Helps
      - One Central Storage Server vs Distributed Storage
      - Phase 2: Distributed Processing
  • 6. Installing Hadoop
    › Setting up Environment Variables
    › Hadoop Usage
    › Running the Sample WordCount Program on Hadoop
    › Setting up the Cluster
      - Local Mode
      - Pseudo-Distributed Mode
      - Fully Distributed Mode
    › Monitoring the Output
      - Web-based Cluster UI
  • 7. Working with Files in HDFS
    › Basic File Commands
      - Adding Files and Directories
      - Removing Files and Directories
    › Reading and Writing to HDFS Programmatically
      - Sample Program
    › Anatomy of a MapReduce Program
      - Hadoop Data Types
      - Mapper
      - Reducer
      - Partitioner
      - Combiner (Local Reduce)
  • 8. Working with Files in HDFS
    › Reading and Writing
      - InputFormat
      - TextInputFormat
      - KeyValueTextInputFormat
      - Creating a Custom InputFormat
      - InputSplit
      - RecordReader
      - OutputFormat
      - Types of OutputFormat
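Slide 1 contrasts sequential, asynchronous, concurrent and distributed programming. As a minimal single-machine sketch (plain Java, not Hadoop; the class name `SumSketch` and the chunk count are hypothetical), the same sum can be computed sequentially, or concurrently by splitting the array into chunks whose partial sums run on a thread pool. Submitting a task and getting back a `Future` is the asynchronous part; splitting the work and merging partial results is the idea that distributed frameworks such as Hadoop stretch across many machines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration: one machine, sequential vs concurrent summation.
class SumSketch {

    // Sequential: a single thread walks the whole array in order.
    static long sequentialSum(int[] data) {
        long total = 0;
        for (int x : data) {
            total += x;
        }
        return total;
    }

    // Concurrent: split the array into chunks, sum each chunk on a worker
    // thread (asynchronously, via Futures), then combine the partial sums.
    static long concurrentSum(int[] data, int chunks) {
        ExecutorService pool = Executors.newFixedThreadPool(chunks);
        try {
            List<Future<Long>> partials = new ArrayList<>();
            int size = (data.length + chunks - 1) / chunks; // ceiling division
            for (int start = 0; start < data.length; start += size) {
                final int from = start;
                final int to = Math.min(start + size, data.length);
                partials.add(pool.submit(() -> {
                    long s = 0;
                    for (int i = from; i < to; i++) {
                        s += data[i];
                    }
                    return s;
                }));
            }
            long total = 0;
            for (Future<Long> f : partials) {
                total += f.get(); // blocks until that chunk has finished
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("a chunk failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Both methods return the same total; the concurrent version only pays off when each chunk carries enough work to outweigh the thread-management overhead.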
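Slide 5 builds word count from a map step (front end) and a reduce step (back end). The in-memory sketch below uses plain Java with hypothetical names and no Hadoop classes, and it makes simplifying assumptions: each input line is one record, words are split on whitespace, and a sorted `TreeMap` stands in for Hadoop's shuffle-and-sort phase that groups all values for a key before reduce runs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, single-process imitation of the word-count data flow:
// map -> group by key ("shuffle") -> reduce.
class WordCountSketch {

    // Map: one input line -> a list of (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : line.split("\\s+")) {
            if (!token.isEmpty()) { // leading whitespace yields an empty token
                out.add(Map.entry(token, 1));
            }
        }
        return out;
    }

    // Reduce: one word plus all of its counts -> the total count.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    // Driver: map every line, group values by key in sorted key order,
    // then call reduce once per distinct key.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> kv : map(line)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }
}
```

In a real Hadoop job, the map and reduce functions run as separate tasks on different nodes, and the framework moves the intermediate (word, 1) pairs between them; only `map` and `reduce` correspond to code the user writes.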
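Slide 7 lists the Partitioner and the Combiner ("local reduce") among the parts of a MapReduce program. The plain-Java sketch below (hypothetical names, no Hadoop classes) imitates both. `partition` uses the formula of Hadoop's default `HashPartitioner`, `(hashCode & Integer.MAX_VALUE) % numReduceTasks`, but applies it to `String.hashCode()` rather than to the hash of a Hadoop `Text` key, so the reducer numbers will not match a real job. `combine` pre-sums word counts on the map side; this is valid only because addition is associative and commutative.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of map-side combining and hash partitioning.
class PartitionCombineSketch {

    // Same formula as Hadoop's default HashPartitioner, applied here to
    // String.hashCode(): clear the sign bit, then take the remainder.
    static int partition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Combiner ("local reduce"): collapse one mapper's repeated words into
    // (word, count) pairs so fewer pairs cross the network in the shuffle.
    static Map<String, Integer> combine(List<String> mapOutputWords) {
        Map<String, Integer> local = new HashMap<>();
        for (String word : mapOutputWords) {
            local.merge(word, 1, Integer::sum);
        }
        return local;
    }

    // Route each combined pair to the bucket of the reducer that owns its key.
    static List<Map<String, Integer>> route(Map<String, Integer> combined, int numReduceTasks) {
        List<Map<String, Integer>> buckets = new ArrayList<>();
        for (int i = 0; i < numReduceTasks; i++) {
            buckets.add(new HashMap<>());
        }
        for (Map.Entry<String, Integer> e : combined.entrySet()) {
            buckets.get(partition(e.getKey(), numReduceTasks)).put(e.getKey(), e.getValue());
        }
        return buckets;
    }
}
```

Because the partition depends only on the key, every occurrence of a word, from every mapper, lands on the same reducer, which is what lets that reducer produce the word's final total.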
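Slide 8 covers InputFormats and RecordReaders. In Hadoop, `TextInputFormat` gives the mapper one record per line, keyed by the byte offset at which that line starts, while `KeyValueTextInputFormat` splits each line at the first separator (a tab by default) into key and value, using the whole line as the key when no separator is present. The plain-Java sketch below imitates both record shapes; its names are hypothetical, and it assumes UTF-8 text with `\n` line endings and ignores input splits entirely.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical imitation of the records two Hadoop InputFormats produce.
class RecordReaderSketch {

    // TextInputFormat-style: key = byte offset where the line starts, value = the line.
    static List<Map.Entry<Long, String>> textRecords(String fileContents) {
        List<Map.Entry<Long, String>> records = new ArrayList<>();
        int start = 0;
        long offset = 0;
        while (start < fileContents.length()) {
            int end = fileContents.indexOf('\n', start);
            if (end < 0) {
                end = fileContents.length(); // last line without a trailing newline
            }
            String line = fileContents.substring(start, end);
            records.add(Map.entry(offset, line));
            // Offsets count bytes (UTF-8 here), plus one byte for the '\n'.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
            start = end + 1;
        }
        return records;
    }

    // KeyValueTextInputFormat-style: split at the first separator only.
    static Map.Entry<String, String> keyValue(String line, char separator) {
        int pos = line.indexOf(separator);
        if (pos < 0) {
            return Map.entry(line, ""); // no separator: whole line is the key
        }
        return Map.entry(line.substring(0, pos), line.substring(pos + 1));
    }
}
```

Splitting only at the first separator means any later tabs stay inside the value, so a line such as `user42<TAB>clicked<TAB>link` yields the key `user42` and a value that still contains a tab.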