
HADOOP AND HDFS presented by Vijay Pratap Singh

Uploaded on Aug 19, 2013

Hadoop is an open-source software framework.
The Hadoop framework consists of two main layers:
Distributed file system (HDFS)
Execution engine (MapReduce)
It supports data-intensive distributed applications.
It is licensed under the Apache v2 license.
It enables applications to work with thousands of computation-independent computers and petabytes of data.

Hadoop is a popular open-source implementation of MapReduce.
MapReduce is a programming model for processing large data sets.
MapReduce is typically used for distributed computing on clusters of computers.
MapReduce can take advantage of data locality, processing data on or near the nodes that store it to reduce data transfer.
The model is inspired by the map and reduce functions commonly used in functional programming.
"Map" step: The master node takes the input, divides it into smaller sub-problems, and distributes them to slave nodes. Each slave node processes its sub-problem and passes the answer back to the master node.
"Reduce" step: The master node then collects the answers to all the sub-problems and combines them to form the final output.
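The map and reduce steps described above can be sketched in a few lines of plain Python. This is only a single-process illustration using word counting as the example problem; it does not use the Hadoop API, and the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical names chosen for this sketch.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group intermediate values by key before they are reduced."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values collected for one key into one result."""
    return key, sum(values)


def word_count(chunks):
    # Each chunk stands in for a sub-problem handed to one worker node.
    intermediate = []
    for chunk in chunks:
        intermediate.extend(map_phase(chunk))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["hadoop stores data", "hadoop processes data"]))
```

In a real cluster the chunks would be processed on different machines in parallel; here they are processed in a loop only to keep the sketch runnable on one machine.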

Highly scalable file system
6K nodes and 120 PB
Add commodity servers and disks to scale storage and I/O bandwidth
Supports parallel reading and processing of data
Optimized for streaming reads/writes of large files
Bandwidth scales linearly with the number of nodes and disks
Fault tolerant and easy to manage
Built-in redundancy
Tolerates disk and node failures
Automatically manages addition/removal of nodes
One operator per 3K nodes
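To make the redundancy point concrete, here is a minimal sketch of how keeping several copies of each block lets a file survive node failures. It assumes a replication factor of 3 and a simple round-robin placement; both are assumptions made for illustration, and the helper names (place_blocks, readable_after_failure) are hypothetical. It is not HDFS's actual placement policy.

```python
def place_blocks(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin.

    Round-robin placement is an assumption of this sketch, chosen for simplicity.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    placement = {}
    for block in range(num_blocks):
        placement[block] = [
            nodes[(block + i) % len(nodes)] for i in range(replication)
        ]
    return placement


def readable_after_failure(placement, failed_nodes):
    """A file stays readable if every block keeps at least one live replica."""
    failed = set(failed_nodes)
    return all(
        any(node not in failed for node in replicas)
        for replicas in placement.values()
    )


if __name__ == "__main__":
    nodes = ["n1", "n2", "n3", "n4", "n5"]
    placement = place_blocks(4, nodes)
    print(placement)
    print(readable_after_failure(placement, ["n1", "n2"]))
    print(readable_after_failure(placement, ["n1", "n2", "n3"]))
```

With three replicas per block, any two nodes can fail without losing data in this sketch; losing all three nodes that hold the same block makes that block unreadable.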

Very Large Distributed File System
10K nodes, 100 million files, 10 PB
Assumes Commodity Hardware
Files are replicated to handle hardware failure
Detect failures and recover from them
Optimized for Batch Processing
Data locations exposed so that computations can move to where data resides
Provides very high aggregate bandwidth
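The idea of moving computation to where the data resides can be illustrated with a small greedy scheduler. This is a sketch under stated assumptions: block locations are given as a plain dictionary, each node has a fixed number of free task slots, and the function name assign_tasks is hypothetical. Hadoop's real scheduler is more involved.

```python
def assign_tasks(block_locations, free_slots):
    """Assign each block's task to a node, preferring nodes that hold a replica.

    block_locations: dict of block id -> list of node names holding a replica
    free_slots: dict of node name -> number of tasks the node can still accept
    Returns a dict of block id -> (node, is_local).
    """
    slots = dict(free_slots)  # copy so the caller's dict is not modified
    assignment = {}
    for block, replicas in block_locations.items():
        local = [node for node in replicas if slots.get(node, 0) > 0]
        if local:
            # Data-local: the task reads the block from the node's own disk.
            node, is_local = local[0], True
        else:
            # No replica holder is free, so the block must cross the network.
            candidates = [node for node, free in slots.items() if free > 0]
            if not candidates:
                raise RuntimeError("no free task slots")
            node, is_local = candidates[0], False
        slots[node] -= 1
        assignment[block] = (node, is_local)
    return assignment


if __name__ == "__main__":
    locations = {"b1": ["n1", "n2"], "b2": ["n1"], "b3": ["n1"]}
    print(assign_tasks(locations, {"n1": 1, "n2": 1, "n3": 1}))
```

Because the file system exposes which nodes hold each block, a scheduler like this can keep most reads on local disks, which is what lets aggregate bandwidth grow with the number of nodes.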

HDFS provides a reliable, scalable, and manageable solution for working with huge amounts of data.
Future-proof
HDFS has been deployed in clusters of 10 to 4K datanodes.
Used in production at companies such as Yahoo!, Facebook, Twitter, and eBay.
Many enterprises, including financial companies, use Hadoop.
