Introduction to Distributed Programming with Hadoop
2. Introduction to Distributed Programming
› Sequential Programming
› Asynchronous Programming
› Concurrent Programming
› Distributed Programming
› Sequential Programming vs Asynchronous Programming
› Concurrent Programming vs Distributed Programming
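The first contrast on this slide can be made concrete with a small Java sketch. It is an illustration and not material from the original deck. The same tasks are first run one after another. They are then submitted asynchronously to a thread pool, so they run concurrently inside one process. The class name, method names, the `square` stand-in task and the pool size of 4 are arbitrary, hypothetical choices. Distributed programming goes one step further and spreads such tasks across machines, which a single-JVM example cannot show.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

// Illustrative sketch: identical work done sequentially vs. concurrently.
class ExecutionStyles {
    // Hypothetical stand-in for a unit of work.
    static int square(int x) {
        return x * x;
    }

    // Sequential: each task starts only after the previous one has finished.
    static int sequentialSum(List<Integer> inputs) {
        int sum = 0;
        for (int x : inputs) {
            sum += square(x);
        }
        return sum;
    }

    // Asynchronous submission, concurrent execution: all tasks are handed to a
    // thread pool without waiting, and the results are joined at the end.
    static int concurrentSum(List<Integer> inputs) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<CompletableFuture<Integer>> futures = inputs.stream()
                    .map(x -> CompletableFuture.supplyAsync(() -> square(x), pool))
                    .collect(Collectors.toList());
            return futures.stream().mapToInt(CompletableFuture::join).sum();
        } finally {
            pool.shutdown(); // release the worker threads
        }
    }

    public static void main(String[] args) {
        List<Integer> inputs = List.of(1, 2, 3, 4);
        System.out.println(sequentialSum(inputs)); // 30
        System.out.println(concurrentSum(inputs)); // 30
    }
}
```

Both styles produce the same answer. Only the scheduling of the work differs.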
5. What Is Hadoop?
› Open-source framework for writing and running distributed applications.
› Suited to applications that process large amounts of data.
› Accessible: runs on clusters of commodity hardware or in the cloud (e.g., Amazon EC2).
› Robust: designed to recover gracefully from hardware failures.
› Scalable: scales linearly to handle more data by adding more nodes.
› Simple: lets users quickly write efficient parallel code.
› Used in data-intensive applications such as telecom, finance, and account overview pages.
› Scale-out instead of scale-up: add more machines rather than buying a bigger machine.
9. Scale-Out vs. Scale-Up
› Key/value pairs instead of a relational database.
› Functional programming (MapReduce) instead of declarative SQL statements.
› Offline batch processing instead of online transactions.
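The key/value and functional points can be illustrated in plain Java without Hadoop. In this sketch, data is held as key/value pairs. It is aggregated with functional stream operations where SQL would use a declarative `GROUP BY`. The class name and sample data are hypothetical, and this is not Hadoop's API.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Illustrative only: key/value records aggregated functionally.
class KeyValueStyle {
    // Roughly what SQL expresses as: SELECT key, SUM(value) FROM t GROUP BY key
    static Map<String, Integer> sumByKey(List<Map.Entry<String, Integer>> pairs) {
        return pairs.stream()
                .collect(Collectors.groupingBy(
                        Map.Entry::getKey,                            // group on the key
                        TreeMap::new,                                 // keep keys sorted
                        Collectors.summingInt(Map.Entry::getValue))); // sum each group
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> input =
                List.of(Map.entry("a", 1), Map.entry("b", 2), Map.entry("a", 3));
        System.out.println(sumByKey(input)); // {a=4, b=2}
    }
}
```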
10. How Hadoop Works
› Cluster of Nodes
› Types of Nodes
Computation Nodes: JobTracker, TaskTracker
Storage Nodes: NameNode, DataNodes, Secondary NameNode
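The storage side of the cluster can be pictured with a toy model of the metadata a NameNode keeps. That metadata records which blocks make up a file and which DataNodes hold a replica of each block. This is a hypothetical sketch and not Hadoop code. Real HDFS uses rack-aware replica placement, whereas this model assumes simple round-robin. The JobTracker, TaskTracker and Secondary NameNode are not modelled. As an example configuration, 64 MB blocks with 3 replicas were the Hadoop 1.x defaults. The sizes below are in arbitrary units.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model (not HDFS code): file -> blocks -> DataNodes holding replicas.
class ToyNameNode {
    private final List<String> dataNodes;
    private final long blockSize;
    private final int replication;
    private int cursor = 0; // round-robin start position for the next block

    ToyNameNode(List<String> dataNodes, long blockSize, int replication) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        if (replication < 1 || replication > dataNodes.size()) {
            throw new IllegalArgumentException("replication must be between 1 and the number of DataNodes");
        }
        this.dataNodes = List.copyOf(dataNodes);
        this.blockSize = blockSize;
        this.replication = replication;
    }

    // Splits a file into fixed-size blocks and returns
    // block index -> DataNodes that store a replica of that block.
    Map<Integer, List<String>> addFile(long fileSize) {
        Map<Integer, List<String>> placement = new LinkedHashMap<>();
        long numBlocks = (fileSize + blockSize - 1) / blockSize; // ceiling division
        for (int block = 0; block < numBlocks; block++) {
            List<String> replicas = new ArrayList<>();
            for (int r = 0; r < replication; r++) {
                replicas.add(dataNodes.get((cursor + r) % dataNodes.size()));
            }
            cursor = (cursor + 1) % dataNodes.size();
            placement.put(block, replicas);
        }
        return placement;
    }

    public static void main(String[] args) {
        ToyNameNode nameNode = new ToyNameNode(List.of("dn1", "dn2", "dn3", "dn4"), 64, 3);
        // A 150-unit file with 64-unit blocks needs 3 blocks.
        System.out.println(nameNode.addFile(150));
        // {0=[dn1, dn2, dn3], 1=[dn2, dn3, dn4], 2=[dn3, dn4, dn1]}
    }
}
```

Because every block lives on several DataNodes, losing one node does not lose data. This is the property behind the "Robust" claim.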
33. Understanding MapReduce
› Scaling a Simple Program Manually
Example: Word Count for a Single Document
Scaling Word Count to Multiple Documents
Front End: Map Program
Back End: Reduce Program
› How Hadoop Helps
One Central Storage Server vs. Distributed Storage
Phase 2: Distributing the Reduce Processing
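The manual scaling of word count and the distribution of phase 2 can be sketched together in plain Java. In phase 1 (map), each document is counted independently. In phase 2 (reduce), partial counts are routed to a fixed number of partitions by key. Every count for a given word therefore lands in one partition, and the partitions can be summed on different machines. The sketch uses the same non-negative hash-modulo idea as Hadoop's default `HashPartitioner`, except that Hadoop hashes the Writable key rather than a Java `String`. Method names and whitespace tokenization are assumptions, and everything runs in one JVM.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch (no Hadoop) of two-phase, hand-scaled word count.
class ManualWordCount {
    // Phase 1 (map): word counts for a single document.
    static Map<String, Integer> countWords(String document) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : document.split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Picks the partition (reducer) responsible for a word.
    static int partitionFor(String word, int numReducers) {
        return (word.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Phase 2 (reduce): each partial count is sent to its word's partition and
    // summed there; no word ever appears in two partitions.
    static List<Map<String, Integer>> reduce(List<Map<String, Integer>> partials, int numReducers) {
        List<Map<String, Integer>> partitions = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            partitions.add(new TreeMap<>());
        }
        for (Map<String, Integer> partial : partials) {
            partial.forEach((word, n) ->
                    partitions.get(partitionFor(word, numReducers)).merge(word, n, Integer::sum));
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> partials = new ArrayList<>();
        for (String doc : List.of("do as I say", "not as I do")) {
            partials.add(countWords(doc)); // each call could run on a separate machine
        }
        Map<String, Integer> total = new TreeMap<>();
        for (Map<String, Integer> partition : reduce(partials, 2)) {
            total.putAll(partition); // safe because partitions hold disjoint words
        }
        System.out.println(total); // {I=2, as=2, do=2, not=1, say=1}
    }
}
```

A single reducer would again become a bottleneck as data grows. Partitioning by key is what lets the second phase scale out like the first.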
34. Installing Hadoop
Setting up Environment Variables
Hadoop Usage
Running the Sample WordCount Program on Hadoop
Setting up the Cluster
› Local Mode
› Pseudo-Distributed Mode
› Fully-Distributed Mode
Monitoring the output
› Web-based Cluster UI
35. Working with Files in HDFS
› Basic File Commands
Adding Files and Directories
Removing Files and Directories
› Reading from and Writing to HDFS Programmatically
Sample program
› Anatomy of a MapReduce Program
Hadoop Data Types
Mapper
Reducer
Partitioner
Combiner: Local Reduce
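The combiner's role as a local reduce can be simulated in plain Java. This is not Hadoop's `Mapper`/`Reducer` API. The method names are hypothetical, and each input string stands in for one mapper's input split. The mapper emits `(word, 1)` pairs. The optional combiner sums them per mapper before they are shuffled, and the reducer sums everything it receives. Reusing the reducer's logic as the combiner is only safe here because addition is associative and commutative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simulation of mapper -> (combiner) -> reducer for word count.
class CombinerDemo {
    // Mapper: emit (word, 1) for every word in one input split.
    static List<Map.Entry<String, Integer>> map(String split) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : split.split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(Map.entry(word, 1));
            }
        }
        return out;
    }

    // Summing logic shared by the combiner (per mapper) and the reducer (global).
    static List<Map.Entry<String, Integer>> sum(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return new ArrayList<>(totals.entrySet());
    }

    // Returns the pairs that would travel over the network to the reducers.
    static List<Map.Entry<String, Integer>> shuffle(List<String> splits, boolean useCombiner) {
        List<Map.Entry<String, Integer>> sent = new ArrayList<>();
        for (String split : splits) {
            List<Map.Entry<String, Integer>> mapped = map(split);
            sent.addAll(useCombiner ? sum(mapped) : mapped);
        }
        return sent;
    }

    public static void main(String[] args) {
        List<String> splits = List.of("a b a a", "b b a");
        System.out.println(shuffle(splits, false).size()); // 7
        System.out.println(shuffle(splits, true).size());  // 4
        System.out.println(sum(shuffle(splits, true)));    // [a=4, b=3]
    }
}
```

The final counts are identical with or without the combiner. Only the number of pairs shuffled between nodes shrinks.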
36. Working with Files in HDFS (continued)
› Reading and Writing
InputFormat
TextInputFormat
KeyValueTextInputFormat
Creating a custom InputFormat
InputSplit
RecordReader
OutputFormat
Types of OutputFormat
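The key/value records that two built-in InputFormats hand to a mapper can be imitated in memory. With TextInputFormat, the key is the byte offset of each line and the value is the line itself. With KeyValueTextInputFormat, each line is split at the first separator, which is a tab by default. A line without a separator becomes the key, with an empty value. In Hadoop, a RecordReader produces these pairs from an InputSplit. The class below is a hypothetical simplification that assumes UTF-8 text with `\n` line endings. It ignores splits, and trailing blank lines are dropped.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Simplified imitation (not Hadoop code) of two RecordReader behaviours.
class RecordReaderDemo {
    // TextInputFormat-style records: (byte offset of line, line text).
    static List<Map.Entry<Long, String>> textRecords(String contents) {
        List<Map.Entry<Long, String>> records = new ArrayList<>();
        if (contents.isEmpty()) {
            return records;
        }
        long offset = 0;
        for (String line : contents.split("\n")) {
            records.add(Map.entry(offset, line));
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1; // +1 for '\n'
        }
        return records;
    }

    // KeyValueTextInputFormat-style record: split at the first separator.
    static Map.Entry<String, String> keyValueRecord(String line, char separator) {
        int pos = line.indexOf(separator);
        if (pos < 0) {
            return Map.entry(line, ""); // no separator: whole line is the key
        }
        return Map.entry(line.substring(0, pos), line.substring(pos + 1));
    }

    public static void main(String[] args) {
        System.out.println(textRecords("hello world\nbye\n"));          // [0=hello world, 12=bye]
        System.out.println(keyValueRecord("user42\tclicked", '\t'));   // user42=clicked
    }
}
```

A custom InputFormat follows the same idea. It decides how input is split and supplies a RecordReader that turns each split into key/value pairs for the mapper.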