Hadoop eco system-first class

372 views
304 views

Published on

Published in: Technology
1 Comment
1 Like
Statistics
Notes
No Downloads
Views
Total views
372
On SlideShare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
10
Comments
1
Likes
1
Embeds 0
No embeds

No notes for slide

Hadoop eco system-first class

  1. 1.  Introduction to Distributed Programming › Sequential Programming › Asynchronous Programming › Concurrent Programming › Distributed Programming › Sequential Programming vs Asynchronous Programming › Concurrent Programming vs Distributed Programming
  2. 2. › Open Source Framework for writing and running distributed applications. › Suited for applications that process large amounts of data. › Accessible - eg; EC2 cloud OR commodity hardware › Robust - Easy to recover from hardware failures. › Scalable - Scales linearly to handle larger data by adding more nodes. › Simple - Enables to quickly write efficient parallel code. › Used in Data-Intensive applications such as telecom , finance , account overview pages. › SCALE-OUT instead of SCALE-UP.
  3. 3.  SCALE-OUT Vs SCALE-UP  Key-Value Pair instead of relational DB.  Functional Programming – instead of Declarative SQL statements.  Offline Batch Processing Vs Online Transactions
  4. 4.  How Hadoop Works › Cluster of Nodes › Type of Nodes  Computation Nodes  Job Tracker  Task Tracker  Storage Nodes  Name Node  Data Nodes  Secondary Name Node
  5. 5.  UnderStanding MapReduce › Scaling a simple program Manually  Example – Word Count – A single document  Scaling Word Count for multiple documents  Front End - Map Program  Back End – Reduce Program › How Hadoop Helps  One Central Storage Server vs Distributed Storage  Phase 2 distributed processing
  6. 6.  Installing Hadoop  Setting up Environment Variables  Hadoop Usage  Execution of Sample WordCount program on Hadoop.  Setting up the Cluster › Local Mode › Pseudo-Distributed Mode › Fully-Distributed Mode  Monitoring the output › Web-based Cluster UI
  7. 7.  Working with Files in HDFS › Basic File Commands  Adding Files and Directories  Removing Files and Directories › Reading and Writing to HDFS programmatically  Sample program › Anatomy of a Map-Reduce Program  Hadoop Data-Types  Mapper  Reducer  Partitioner  Combiner - Local Reduce
  8. 8.  Working with Files in HDFS › Reading and Writing  InputFormat  TextInputFormat  KeyValueTextInputFormat  Creating a custom InputFormat  InputSplit  RecordReader  OutputFormat  Types of OutputFormat

×