Hadoop Ecosystem - First Class

• Introduction to Distributed Programming
  › Sequential Programming
  › Asynchronous Programming
  › Concurrent Programming
  › Distributed Programming
  › Sequential Programming vs Asynchronous Programming
  › Concurrent Programming vs Distributed Programming
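The execution styles listed above can be contrasted inside a single JVM. The following is a minimal plain-Java sketch (JDK only, no Hadoop). It assumes a trivial task, counting the words in a few in-memory documents, and the class name `ExecutionStylesSketch` and the sample data are hypothetical. The same work runs three ways: sequentially on one thread, asynchronously with `CompletableFuture`, and concurrently on a fixed thread pool. Distributed programming goes one step further by spreading the work across separate machines, which this sketch does not attempt.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration class; not part of Hadoop.
final class ExecutionStylesSketch {

    // Counts whitespace-separated words in one document.
    static int countWords(String doc) {
        String trimmed = doc.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    // Sequential: one document after another on the calling thread.
    static int sequential(List<String> docs) {
        int total = 0;
        for (String doc : docs) {
            total += countWords(doc);
        }
        return total;
    }

    // Asynchronous: start every task without waiting, then join the results.
    static int asynchronous(List<String> docs) {
        List<CompletableFuture<Integer>> futures = new ArrayList<>();
        for (String doc : docs) {
            futures.add(CompletableFuture.supplyAsync(() -> countWords(doc)));
        }
        int total = 0;
        for (CompletableFuture<Integer> f : futures) {
            total += f.join();
        }
        return total;
    }

    // Concurrent: a pool of worker threads in one process shares the documents.
    static int concurrent(List<String> docs, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (String doc : docs) {
                futures.add(pool.submit(() -> countWords(doc)));
            }
            int total = 0;
            for (Future<Integer> f : futures) {
                total += f.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("hello hadoop", "hello world", "map reduce map");
        System.out.println(sequential(docs));    // 7
        System.out.println(asynchronous(docs));  // 7
        System.out.println(concurrent(docs, 2)); // 7
    }
}
```

All three methods return the same total; only the way the work is scheduled differs.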

› Open-source framework for writing and running distributed applications.
› Suited to applications that process large amounts of data.
› Accessible - runs on commodity hardware or on cloud services such as Amazon EC2.
› Robust - recovers easily from hardware failures.
› Scalable - scales linearly to handle larger data by adding more nodes.
› Simple - lets developers quickly write efficient parallel code.
› Used in data-intensive applications such as telecom and finance systems and account overview pages.
› SCALE-OUT instead of SCALE-UP.

• SCALE-OUT vs SCALE-UP
• Key-value pairs instead of relational databases
• Functional programming instead of declarative SQL statements
• Offline batch processing vs online transactions
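The contrast between declarative SQL and the functional, key-value style can be shown on a small aggregation. In this plain-Java sketch (JDK streams, not the Hadoop API), the table `transactions(account, amount)` and the class `KeyValueStyleSketch` are hypothetical. The SQL query appears only as a comment; the stream pipeline computes the same per-account totals by grouping (key, value) pairs by key and summing each group's values.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical illustration class; not part of Hadoop.
final class KeyValueStyleSketch {

    // Declarative SQL over an assumed table transactions(account, amount):
    //   SELECT account, SUM(amount) FROM transactions GROUP BY account;
    //
    // Functional, key-value equivalent: each record is a (key, value) pair;
    // group the pairs by key and reduce each group's values with a sum.
    static Map<String, Integer> totalPerAccount(List<Map.Entry<String, Integer>> records) {
        return records.stream()
                .collect(Collectors.groupingBy(
                        Map.Entry::getKey,
                        TreeMap::new, // sorted keys for predictable output
                        Collectors.summingInt(Map.Entry::getValue)));
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> records = Arrays.asList(
                Map.entry("acct-1", 100),
                Map.entry("acct-2", 50),
                Map.entry("acct-1", 25));
        System.out.println(totalPerAccount(records)); // {acct-1=125, acct-2=50}
    }
}
```

The pipeline states how to transform the data step by step (group, then reduce), whereas the SQL states only the desired result; MapReduce follows the first style.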

• How Hadoop Works
  › Cluster of Nodes
  › Types of Nodes
    • Computation Nodes
      - Job Tracker
      - Task Tracker
    • Storage Nodes
      - Name Node
      - Data Nodes
      - Secondary Name Node
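The division of labor between storage nodes can be illustrated with a toy model: a stand-in for the Name Node that keeps only metadata, recording how a file is cut into fixed-size blocks and which Data Nodes hold each block's replicas. This is a plain-Java sketch, not HDFS code; the class name `BlockPlacementSketch`, the tiny byte-sized blocks, and the simple round-robin placement are assumptions for illustration. Real HDFS uses a much larger configurable block size (64 MB by default in Hadoop 1.x), a configurable replication factor (3 by default), and a rack-aware placement policy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of Name Node metadata; hypothetical, not the HDFS implementation.
final class BlockPlacementSketch {

    // Returns, for each block of the file, the data nodes holding a replica of it.
    static List<List<String>> placeBlocks(long fileSize, long blockSize,
                                          List<String> dataNodes, int replication) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("block size must be positive");
        }
        if (replication > dataNodes.size()) {
            throw new IllegalArgumentException("not enough data nodes for the replication factor");
        }
        // Number of blocks = ceil(fileSize / blockSize); the last block may be partial.
        long numBlocks = (fileSize + blockSize - 1) / blockSize;
        List<List<String>> placement = new ArrayList<>();
        int next = 0;
        for (long b = 0; b < numBlocks; b++) {
            List<String> replicas = new ArrayList<>();
            // Round-robin over consecutive nodes, so one block's replicas land on distinct nodes.
            for (int r = 0; r < replication; r++) {
                replicas.add(dataNodes.get((next + r) % dataNodes.size()));
            }
            placement.add(replicas);
            next = (next + 1) % dataNodes.size();
        }
        return placement;
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("dn1", "dn2", "dn3", "dn4");
        List<List<String>> blockMap = placeBlocks(200, 64, nodes, 3);
        for (int i = 0; i < blockMap.size(); i++) {
            System.out.println("block " + i + " -> " + blockMap.get(i));
        }
    }
}
```

With four data nodes and a 200-byte file cut into 64-byte blocks, the sketch reports four blocks (the last one holding only 8 bytes), each stored on three distinct nodes, so losing any single node still leaves two copies of every block.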

• Understanding MapReduce
  › Scaling a simple program manually
    • Example: Word Count on a single document
    • Scaling Word Count to multiple documents
    • Front end: the Map program
    • Back end: the Reduce program
  › How Hadoop Helps
    • One central storage server vs distributed storage
    • Phase 2: distributed processing
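The manual scaling of Word Count, with a Map program on the front end and a Reduce program on the back end, can be simulated in a single JVM. This is a plain-Java sketch, not Hadoop's `Mapper`/`Reducer` API; the class `WordCountSimulation` and its method names are hypothetical, and the tokenization (lowercasing and splitting on non-word characters) is an assumption. Each document is mapped to (word, 1) pairs, the pairs are grouped by word (the shuffle), and each group is reduced to a total.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-JVM simulation of MapReduce Word Count.
final class WordCountSimulation {

    // Map phase: one document in, a list of (word, 1) pairs out.
    static List<Map.Entry<String, Integer>> map(String document) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : document.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                pairs.add(new SimpleEntry<>(token, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: gather every value emitted for the same key into one list.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: one key and all of its values in, a single total out.
    // The key is unused here but mirrors the (key, values) shape a reducer receives.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    // Driver: map every document, shuffle all pairs, then reduce each key.
    static Map<String, Integer> wordCount(List<String> documents) {
        List<Map.Entry<String, Integer>> allPairs = new ArrayList<>();
        for (String doc : documents) {
            allPairs.addAll(map(doc)); // on a cluster, map tasks would run on many nodes
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(allPairs).entrySet()) {
            result.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("Hello Hadoop", "hello world, hello MapReduce");
        System.out.println(wordCount(docs)); // {hadoop=1, hello=3, mapreduce=1, world=1}
    }
}
```

Because `map` looks at one document at a time and `reduce` looks at one key at a time, both phases can be split across machines; the shuffle is the step Hadoop performs between them.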

• Installing Hadoop
• Setting up Environment Variables
• Hadoop Usage
• Running the Sample WordCount Program on Hadoop
• Setting up the Cluster
  › Local Mode
  › Pseudo-Distributed Mode
  › Fully Distributed Mode
• Monitoring the Output
  › Web-based Cluster UI

• Working with Files in HDFS
  › Basic File Commands
    • Adding Files and Directories
    • Removing Files and Directories
  › Reading from and Writing to HDFS Programmatically
    • Sample program
  › Anatomy of a MapReduce Program
    • Hadoop Data Types
    • Mapper
    • Reducer
    • Partitioner
    • Combiner (local reduce)
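Two of the components listed above, the Partitioner and the Combiner, can be illustrated without a cluster. This plain-Java sketch assumes Word Count style output from a single mapper; the class `PartitionCombineSketch` and its methods are hypothetical. `partition` applies the formula used by Hadoop's default hash partitioner, `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`, to decide which reducer receives a key; because it is applied here to `String.hashCode()`, the concrete partition numbers will not match those Hadoop computes for `Text` keys. `combine` acts as a local reduce on one mapper's output so that fewer pairs cross the network.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class; not Hadoop's Partitioner or Reducer API.
final class PartitionCombineSketch {

    // Hash-partitioning formula; masking the sign bit keeps the result
    // non-negative even when hashCode() is negative.
    static int partition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Combiner ("local reduce"): sums the counts for each word emitted by ONE mapper,
    // before anything is sent to the reducers.
    static Map<String, Integer> combine(List<String> mapperOutputWords) {
        Map<String, Integer> local = new TreeMap<>();
        for (String word : mapperOutputWords) {
            local.merge(word, 1, Integer::sum);
        }
        return local;
    }

    // Routes each combined (word, count) pair to the reducer chosen by the partitioner.
    static List<Map<String, Integer>> route(Map<String, Integer> combined, int numReduceTasks) {
        List<Map<String, Integer>> reducers = new ArrayList<>();
        for (int i = 0; i < numReduceTasks; i++) {
            reducers.add(new TreeMap<>());
        }
        for (Map.Entry<String, Integer> e : combined.entrySet()) {
            reducers.get(partition(e.getKey(), numReduceTasks)).put(e.getKey(), e.getValue());
        }
        return reducers;
    }

    public static void main(String[] args) {
        // Output of one hypothetical mapper: five (word, 1) pairs.
        List<String> words = Arrays.asList("map", "reduce", "map", "map", "hadoop");
        Map<String, Integer> combined = combine(words);
        System.out.println(combined); // {hadoop=1, map=3, reduce=1}: 3 pairs instead of 5
        List<Map<String, Integer>> reducers = route(combined, 2);
        for (int i = 0; i < reducers.size(); i++) {
            System.out.println("reducer " + i + " <- " + reducers.get(i));
        }
    }
}
```

Since every occurrence of a key hashes to the same partition, all counts for one word reach the same reducer, which is what makes the final per-word sum correct.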

• Working with Files in HDFS (continued)
  › Reading and Writing
    • InputFormat
      - TextInputFormat
      - KeyValueTextInputFormat
      - Creating a custom InputFormat
      - InputSplit
      - RecordReader
    • OutputFormat
      - Types of OutputFormat
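What a RecordReader hands to the mapper depends on the InputFormat. TextInputFormat produces one record per line, whose key is the byte offset of the line in the file and whose value is the line text; KeyValueTextInputFormat splits each line at the first separator (a tab by default) into key and value, and a line with no separator becomes a key with an empty value. The plain-Java sketch below imitates both on an in-memory string. It assumes UTF-8 text with `\n` line endings, ignores InputSplit boundaries, and is not Hadoop's RecordReader code; the class `InputFormatSketch` and its methods are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical imitation of TextInputFormat / KeyValueTextInputFormat records.
final class InputFormatSketch {

    // TextInputFormat-style records: (byte offset of line start, line text).
    static List<Map.Entry<Long, String>> textRecords(String text) {
        List<Map.Entry<Long, String>> records = new ArrayList<>();
        long offset = 0;
        int start = 0;
        while (start < text.length()) {
            int newline = text.indexOf('\n', start);
            int end = (newline == -1) ? text.length() : newline;
            String line = text.substring(start, end);
            records.add(new SimpleEntry<Long, String>(offset, line));
            // Advance by the line's UTF-8 byte length plus one byte for '\n' (if present).
            offset += line.getBytes(StandardCharsets.UTF_8).length + (newline == -1 ? 0 : 1);
            start = end + 1;
        }
        return records;
    }

    // KeyValueTextInputFormat-style split at the FIRST separator only.
    static Map.Entry<String, String> keyValue(String line, char separator) {
        int pos = line.indexOf(separator);
        if (pos == -1) {
            return new SimpleEntry<String, String>(line, ""); // no separator: whole line is the key
        }
        return new SimpleEntry<String, String>(line.substring(0, pos), line.substring(pos + 1));
    }

    public static void main(String[] args) {
        String file = "a b\nc\td\n";
        for (Map.Entry<Long, String> r : textRecords(file)) {
            Map.Entry<String, String> kv = keyValue(r.getValue(), '\t');
            System.out.println("offset " + r.getKey() + ": key=[" + kv.getKey() + "] value=[" + kv.getValue() + "]");
        }
    }
}
```

Offsets are counted in bytes rather than characters, which is why the sketch converts each line to UTF-8 before advancing.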
