Hadoop eco system-first class
Published in: Technology

  1. Introduction to Distributed Programming
     › Sequential Programming
     › Asynchronous Programming
     › Concurrent Programming
     › Distributed Programming
     › Sequential Programming vs Asynchronous Programming
     › Concurrent Programming vs Distributed Programming
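As a rough illustration of the sequential vs concurrent contrast on this slide, the sketch below runs the same function first in a plain loop and then on a thread pool. The function `count_words` and the sample documents are made up for the example; a distributed version would run the same work on separate machines, which is not shown here.

```python
from concurrent.futures import ThreadPoolExecutor


def count_words(text):
    """Return the number of whitespace-separated words in text."""
    return len(text.split())


# Hypothetical sample input, not taken from the slides.
documents = ["a b c", "d e", "f"]

# Sequential: each document is processed only after the previous one finishes.
sequential_counts = [count_words(doc) for doc in documents]

# Concurrent: the documents are handed to a pool of worker threads.
# Executor.map returns results in input order, so the two lists are comparable.
with ThreadPoolExecutor(max_workers=3) as pool:
    concurrent_counts = list(pool.map(count_words, documents))
```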
  2. Hadoop
     › Open-source framework for writing and running distributed applications.
     › Suited for applications that process large amounts of data.
     › Accessible - runs on commodity hardware or in the cloud (e.g. EC2).
     › Robust - recovers easily from hardware failures.
     › Scalable - scales linearly to handle larger data by adding more nodes.
     › Simple - lets you quickly write efficient parallel code.
     › Used in data-intensive applications such as telecom, finance, and account overview pages.
     › SCALE-OUT instead of SCALE-UP.
  3. › SCALE-OUT vs SCALE-UP
     › Key-value pairs instead of a relational DB.
     › Functional programming instead of declarative SQL statements.
     › Offline batch processing vs online transactions
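To make the key-value and functional-programming points concrete, here is a minimal sketch that does with plain functions what a declarative query such as `SELECT customer, SUM(amount) ... GROUP BY customer` would do in SQL. The records and the names `purchases` and `add_to_totals` are invented for illustration only.

```python
from functools import reduce

# Hypothetical data: each record is a (key, value) pair, not a table row.
purchases = [("alice", 30), ("bob", 20), ("alice", 5), ("bob", 1)]


def add_to_totals(totals, pair):
    """Fold one (key, value) pair into a dict of running totals per key."""
    key, value = pair
    updated = dict(totals)  # copy, so the fold does not mutate its input
    updated[key] = updated.get(key, 0) + value
    return updated


# The whole aggregation is a single fold over the key-value pairs.
totals = reduce(add_to_totals, purchases, {})
```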
  4. How Hadoop Works
     › Cluster of Nodes
     › Types of Nodes
        › Computation Nodes
           › Job Tracker
           › Task Tracker
        › Storage Nodes
           › Name Node
           › Data Nodes
           › Secondary Name Node
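The split between a Name Node (metadata) and Data Nodes (actual blocks) can be pictured with a toy model. Everything below is an assumption made for illustration: the class `ToyNameNode`, the tiny block size, the replication factor, and the round-robin placement are not Hadoop's real API or placement policy.

```python
class ToyNameNode:
    """Toy stand-in for a Name Node: it only records where blocks live."""

    def __init__(self, data_nodes, block_size=4, replication=2):
        self.data_nodes = list(data_nodes)  # names of the storage nodes
        self.block_size = block_size        # characters per block (toy value)
        self.replication = replication      # copies kept of each block
        self.block_map = {}                 # (filename, block index) -> node names

    def add_file(self, filename, content):
        """Split content into blocks and record which nodes hold each block."""
        blocks = [
            content[start:start + self.block_size]
            for start in range(0, len(content), self.block_size)
        ]
        node_count = len(self.data_nodes)
        for index in range(len(blocks)):
            # Simplified round-robin placement, chosen only for this sketch.
            self.block_map[(filename, index)] = [
                self.data_nodes[(index + copy) % node_count]
                for copy in range(self.replication)
            ]
        return blocks


name_node = ToyNameNode(["dn1", "dn2", "dn3"])
blocks = name_node.add_file("f.txt", "abcdefghij")
```

Losing one data node in this toy setup still leaves a copy of every block on another node, which is the idea behind the "robust" claim.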
  5. Understanding MapReduce
     › Scaling a simple program manually
        › Example – Word Count – a single document
        › Scaling Word Count for multiple documents
        › Front end – Map program
        › Back end – Reduce program
     › How Hadoop Helps
        › One central storage server vs distributed storage
        › Phase 2 distributed processing
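The word-count example on this slide can be sketched in a few lines. This is a single-process imitation of the map, group, and reduce steps, not Hadoop code; the function names `map_words`, `shuffle`, and `reduce_counts` are chosen for the sketch.

```python
from collections import defaultdict


def map_words(document):
    """Front end (map): emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield (word, 1)


def shuffle(pairs):
    """Group all emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(word, counts):
    """Back end (reduce): sum the counts collected for one word."""
    return (word, sum(counts))


def word_count(documents):
    """Run map over every document, then reduce each group of counts."""
    pairs = [pair for doc in documents for pair in map_words(doc)]
    return dict(reduce_counts(word, counts) for word, counts in shuffle(pairs).items())


result = word_count(["the cat", "the dog the"])
```

Because `map_words` only ever looks at one document, the map step could run on many machines at once; only the grouping step needs to bring matching keys together.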
  6. › Installing Hadoop
     › Setting up Environment Variables
     › Hadoop Usage
     › Execution of the sample WordCount program on Hadoop.
     › Setting up the Cluster
        › Local Mode
        › Pseudo-Distributed Mode
        › Fully-Distributed Mode
     › Monitoring the output
        › Web-based Cluster UI
  7. Working with Files in HDFS
     › Basic File Commands
        › Adding Files and Directories
        › Removing Files and Directories
     › Reading and Writing to HDFS programmatically
        › Sample program
     › Anatomy of a Map-Reduce Program
        › Hadoop Data-Types
        › Mapper
        › Reducer
        › Partitioner
        › Combiner - Local Reduce
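The Partitioner and the Combiner ("local reduce") are the two less obvious parts of the anatomy above, so here is a small sketch of both. It is plain Python under stated assumptions, not Hadoop's classes: `combine`, `partition`, and `route` are names made up here, and CRC32 is used only to get a hash that is stable across runs.

```python
import zlib
from collections import Counter


def combine(pairs):
    """Combiner: pre-sum the (word, count) pairs produced on one node."""
    local = Counter()
    for word, count in pairs:
        local[word] += count
    return sorted(local.items())


def partition(key, num_reducers):
    """Partitioner: choose a reducer index from a hash of the key."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def route(combined_pairs, num_reducers):
    """Send each combined pair to the bucket of the reducer chosen for its key."""
    buckets = [[] for _ in range(num_reducers)]
    for key, count in combined_pairs:
        buckets[partition(key, num_reducers)].append((key, count))
    return buckets


mapper_output = [("a", 1), ("b", 1), ("a", 1)]
combined = combine(mapper_output)
buckets = route(combined, 3)
```

The combiner shrinks what has to travel over the network, and the partitioner guarantees that every pair with the same key reaches the same reducer.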
  8. Working with Files in HDFS
     › Reading and Writing
        › InputFormat
        › TextInputFormat
        › KeyValueTextInputFormat
        › Creating a custom InputFormat
        › InputSplit
        › RecordReader
        › OutputFormat
        › Types of OutputFormat
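The difference between TextInputFormat and KeyValueTextInputFormat is easiest to see in the records each one hands to the mapper. The sketch below imitates that behaviour in plain Python: the first reader yields (byte offset, line) pairs, the second splits each line at the first tab. The function names `text_records` and `key_value_records` are invented for this illustration and are not part of Hadoop.

```python
def text_records(data):
    """Imitate TextInputFormat: key is the line's byte offset, value is the line."""
    offset = 0
    for raw_line in data.splitlines(keepends=True):
        yield offset, raw_line.rstrip(b"\r\n").decode("utf-8")
        offset += len(raw_line)  # advance past the line and its terminator


def key_value_records(data, separator="\t"):
    """Imitate KeyValueTextInputFormat: split each line at the first separator."""
    for _, line in text_records(data):
        key, _, value = line.partition(separator)
        # A line with no separator becomes the key, with an empty value.
        yield key, value


plain = list(text_records(b"a b\ncd\n"))
keyed = list(key_value_records(b"k1\tv1\nk2\n"))
```

A custom InputFormat follows the same pattern: decide how the input is split, then decide what key and value each record reader emits.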