Hadoop Ecosystem: First Class

Transcript

  • 1. Introduction to Distributed Programming
      › Sequential Programming
      › Asynchronous Programming
      › Concurrent Programming
      › Distributed Programming
      › Sequential Programming vs Asynchronous Programming
      › Concurrent Programming vs Distributed Programming
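A minimal sketch of the sequential vs concurrent distinction from this slide, in Python. The `count_words` function and the sample `documents` are made up for illustration. Distributed programming would go one step further and run the workers on separate machines, which this single-process sketch does not attempt.

```python
from concurrent.futures import ThreadPoolExecutor


def count_words(text):
    """Count whitespace-separated words in one piece of text."""
    return len(text.split())


# Hypothetical sample inputs, one string per "document".
documents = ["a b c", "d e", "f g h i"]

# Sequential: one document after another, in a single thread.
sequential_counts = [count_words(doc) for doc in documents]

# Concurrent: the same function handed to a small pool of worker threads.
# pool.map returns results in input order, so the two lists are comparable.
with ThreadPoolExecutor(max_workers=3) as pool:
    concurrent_counts = list(pool.map(count_words, documents))

print(sequential_counts, concurrent_counts)
```

Both versions compute the same answer; only the way the work is scheduled differs.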
  • 2.
      › Open-source framework for writing and running distributed applications.
      › Suited for applications that process large amounts of data.
      › Accessible - e.g. runs on the EC2 cloud or on commodity hardware.
      › Robust - Easy to recover from hardware failures.
      › Scalable - Scales linearly to handle larger data by adding more nodes.
      › Simple - Lets you quickly write efficient parallel code.
      › Used in data-intensive applications such as telecom, finance, and account overview pages.
      › SCALE-OUT instead of SCALE-UP.
  • 3.
      › SCALE-OUT vs SCALE-UP
      › Key-Value Pairs instead of a relational DB
      › Functional Programming instead of declarative SQL statements
      › Offline Batch Processing vs Online Transactions
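To make the "key-value pairs and functional programming instead of declarative SQL" contrast concrete, here is a small sketch in plain Python. The `records` data and the `calls` table named in the comment are hypothetical. The point is only that the grouping and summing are spelled out as functions over (key, value) pairs rather than declared in a query.

```python
from functools import reduce
from itertools import groupby
from operator import itemgetter

# Hypothetical call records as (key, value) pairs: (customer, minutes).
records = [("alice", 5), ("bob", 3), ("alice", 7), ("bob", 1), ("carol", 4)]

# Roughly what SQL would declare as:
#   SELECT customer, SUM(minutes) FROM calls GROUP BY customer;
# Here the same result is built from plain functions over key-value pairs.
by_key = sorted(records, key=itemgetter(0))  # bring equal keys together
totals = {
    key: reduce(lambda acc, pair: acc + pair[1], group, 0)
    for key, group in groupby(by_key, key=itemgetter(0))
}

print(totals)
```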
  • 4. How Hadoop Works
      › Cluster of Nodes
      › Types of Nodes
          › Computation Nodes
              › Job Tracker
              › Task Tracker
          › Storage Nodes
              › Name Node
              › Data Nodes
              › Secondary Name Node
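The storage side can be pictured with a toy model: a file is cut into fixed-size blocks, each block is copied to several Data Nodes, and the Name Node keeps the map of which block lives where. The sketch below is an illustration only. The function names, the tiny sizes, and the round-robin placement are assumptions made for the example; real HDFS uses much larger blocks and a rack-aware placement policy.

```python
def split_into_blocks(file_size, block_size):
    """Return the size of each block a file of file_size bytes is cut into."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_blocks(num_blocks, data_nodes, replication):
    """Toy placement: spread each block's replicas round-robin over the nodes.

    The returned dict plays the role of the Name Node's block map.
    """
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            data_nodes[(block_id + r) % len(data_nodes)]
            for r in range(replication)
        ]
    return placement


# Hypothetical numbers, chosen small so the result is easy to read.
blocks = split_into_blocks(file_size=250, block_size=64)
block_map = place_blocks(len(blocks), ["dn1", "dn2", "dn3", "dn4"], replication=3)

for block_id, nodes in block_map.items():
    print(block_id, blocks[block_id], nodes)
```

Losing any single Data Node in this model still leaves two copies of every block, which is the idea behind the "robust" claim earlier in the deck.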
  • 5. Understanding MapReduce
      › Scaling a simple program manually
          › Example – Word Count – a single document
          › Scaling Word Count for multiple documents
          › Front End – Map Program
          › Back End – Reduce Program
      › How Hadoop Helps
          › One Central Storage Server vs Distributed Storage
          › Phase 2 distributed processing
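The Word Count example from this slide can be sketched without Hadoop at all. The version below is a single-process Python illustration, not the deck's own program: `map_words` stands in for the front-end Map program, `reduce_counts` for the back-end Reduce program, and `shuffle` for the grouping step a framework performs between the two. All three names and the sample documents are assumptions.

```python
from collections import defaultdict


def map_words(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group values by key, as happens between the map and reduce steps."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(word, counts):
    """Reduce step: sum all the counts emitted for one word."""
    return word, sum(counts)


# Hypothetical input: each string stands for one document.
documents = ["the quick brown fox", "the lazy dog", "the fox"]

# Each document could be mapped on a different machine; here it is a loop.
mapped = [pair for doc in documents for pair in map_words(doc)]
word_counts = dict(reduce_counts(w, c) for w, c in shuffle(mapped).items())

print(word_counts)
```

Because `map_words` looks at one document at a time and `reduce_counts` at one word at a time, both steps can be spread over many nodes without changing the functions themselves.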
  • 6. Installing Hadoop
      › Setting up Environment Variables
      › Hadoop Usage
      › Execution of Sample WordCount program on Hadoop
      › Setting up the Cluster
          › Local Mode
          › Pseudo-Distributed Mode
          › Fully-Distributed Mode
      › Monitoring the output
          › Web-based Cluster UI
  • 7. Working with Files in HDFS
      › Basic File Commands
          › Adding Files and Directories
          › Removing Files and Directories
      › Reading and Writing to HDFS programmatically
          › Sample program
      › Anatomy of a MapReduce Program
          › Hadoop Data Types
          › Mapper
          › Reducer
          › Partitioner
          › Combiner - Local Reduce
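Two of the pieces listed under the anatomy of a MapReduce program, the Combiner and the Partitioner, are easy to picture with a small simulation. This is a sketch in plain Python under stated assumptions, not Hadoop's API: `combine` and `partition` are hypothetical names, and `zlib.crc32` is used only as a stand-in for a stable hash of the key.

```python
import zlib
from collections import Counter


def combine(pairs):
    """Combiner: a local reduce over one mapper's output before it is sent on."""
    local = Counter()
    for key, value in pairs:
        local[key] += value
    return sorted(local.items())


def partition(key, num_reducers):
    """Partitioner: choose the reducer for a key from a stable hash of it."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


# Hypothetical output of a single mapper in a word-count job.
mapper_output = [("the", 1), ("fox", 1), ("the", 1), ("dog", 1), ("the", 1)]

# Five pairs shrink to three before leaving the mapper's node.
combined = combine(mapper_output)

# Every pair with the same key lands in the same reducer's bucket.
num_reducers = 2
buckets = {r: [] for r in range(num_reducers)}
for key, value in combined:
    buckets[partition(key, num_reducers)].append((key, value))

print(combined)
print(buckets)
```

The combiner is safe here because addition gives the same total whether it is applied once at the reducer or partially on each mapper first.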
  • 8. Working with Files in HDFS
      › Reading and Writing
          › InputFormat
              › TextInputFormat
              › KeyValueTextInputFormat
              › Creating a custom InputFormat
              › InputSplit
              › RecordReader
          › OutputFormat
              › Types of OutputFormat
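The difference between the two input formats named on this slide comes down to what key and value each record carries. TextInputFormat hands the mapper the byte offset of a line as the key and the line itself as the value, while KeyValueTextInputFormat splits each line at a separator (a tab by default). The Python sketch below imitates that record-reading behaviour on an in-memory byte string. The function names are hypothetical, and input splits are not modelled.

```python
def text_records(data):
    """Yield (byte_offset, line) pairs, in the style of TextInputFormat."""
    offset = 0
    for raw_line in data.splitlines(keepends=True):
        # The key is where the line starts; the value drops the line ending.
        yield offset, raw_line.rstrip(b"\r\n").decode("utf-8")
        offset += len(raw_line)


def key_value_records(data, separator="\t"):
    """Yield (key, value) pairs split at the first separator in each line.

    A line with no separator becomes the key, with an empty value.
    """
    for _, line in text_records(data):
        key, _, value = line.partition(separator)
        yield key, value


# Hypothetical file contents: two tab-separated lines.
data = b"user1\tlogin\nuser2\tlogout\n"

print(list(text_records(data)))
print(list(key_value_records(data)))
```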