Apache Hadoop - A Deep Dive (Part 1 - HDFS)
This is our next tech talk in the series, where we dive deep into the Apache Hadoop framework. Hadoop is undoubtedly the current industry leader in Big Data implementation. This tech talk covers core Hadoop and how it works. This is Part 1, which explains HDFS; the next tech talk will be Part 2, explaining MapReduce.

Transcript of "Apache Hadoop - A Deep Dive (Part 1 - HDFS)"

1. Debarchan Sarkar, Sunil Kumar Chakrapani. The call will start soon; please stay on mute. Thanks for your time and patience.
2. Agenda:
 Recap - What is Big Data?
 Problems introduced
 Traditional architecture
 Cluster architecture
 Where it all started
 How does it work: a 50,000-foot overview
 How does it work, parts 1 & 2
 Hadoop distributed architecture
 HDFS architecture
3. [Slide graphic: the Big Data landscape. Sources: Internet of Things (sensors / RFID / devices, spatial & GPS coordinates), audio/video, log files, text/images, social sentiment, data market feeds, eGov feeds, weather, wikis/blogs, click streams, Web 2.0 (mobile advertising, collaboration, eCommerce, digital marketing, search marketing, web logs, recommendations), and ERP/CRM (sales pipeline, payables, payroll, inventory, contacts, deal tracking). Volume grows from gigabytes (10^9) through terabytes (10^12) and petabytes (10^15) to exabytes (10^18), along axes of velocity, variety, and variability. Storage cost per GB: $190,000 (1980), $9,000 (1990), $15 (2000), $0.07 (2010).]
4. 1990 vs. 2010: a 1990 drive stores 1,370 MB and reads at a 4.4 MB/s transfer rate, so scanning it takes about 5 minutes. By 2010, 1 TB is the norm, read at a 100 MB/s transfer rate, and scanning takes about 2.5 hours.
5. 1 machine vs. 10 machines:
 1 machine: 4 I/O channels, each channel at 100 MB/s - reading 1 TB takes ~45 minutes.
 10 machines: 4 I/O channels each, 100 MB/s per channel - the same 1 TB is read in ~4.5 minutes.
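To sanity-check those figures with back-of-the-envelope arithmetic (taking 1 TB as 1,000,000 MB): 1,370 MB / 4.4 MB/s ≈ 311 s, about 5 minutes; 1,000,000 MB / 100 MB/s = 10,000 s, about 2.8 hours; with 4 channels at 100 MB/s, 1,000,000 MB / 400 MB/s = 2,500 s, about 42 minutes; and spreading the same terabyte across 10 such machines cuts that to ~250 s, roughly 4.2 minutes.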
6. A common way of avoiding data loss is replication.
7. [Slide graphic: the traditional architecture - servers attached to SAN storage.]
8. [Slide graphic: the cluster architecture - a rack of ten 1U commodity servers.]
9. Where it all started:
 Google File System
 MapReduce
Hadoop's implementations of the same ideas:
 HDFS: Hadoop Distributed File System
 MapReduce
10. // MapReduce word-count functions in JavaScript
var map = function (key, value, context) {
    // split the input value into words on any non-alphabetic character
    var words = value.split(/[^a-zA-Z]/);
    for (var i = 0; i < words.length; i++) {
        if (words[i] !== "") {
            // emit (word, 1) for every word found
            context.write(words[i].toLowerCase(), 1);
        }
    }
};
var reduce = function (key, values, context) {
    // add up the 1s emitted for this word
    var sum = 0;
    while (values.hasNext()) {
        sum += parseInt(values.next(), 10);
    }
    context.write(key, sum);
};
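To trace the flow on a hypothetical one-line input "Hello hadoop hello": map emits (hello, 1), (hadoop, 1), (hello, 1); the framework then groups the emitted values by key, so reduce receives ("hello", [1, 1]) and ("hadoop", [1]) and writes out (hello, 2) and (hadoop, 1).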
11. [Slide graphic: HDFS block placement across two racks of DataNodes (RACK 1 and RACK 2), with blocks 1-6 spread over the nodes.]
File metadata held by the NameNode:
/user/kc/data01.txt – Block 1,2,3,4
/user/apb/data02.txt – Block 5,6
Block-to-DataNode mapping:
Block 1: R1DN01, R1DN02, R2DN01
Block 2: R1DN01, R1DN02, R2DN03
Block 3: R1DN02, R1DN03, R2DN01
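As a minimal Java sketch, a client can ask the NameNode for this block-to-node mapping through the public FileSystem API (the class name ListBlocks is made up; the path is the example from the slide, and the cluster configuration is assumed to be on the classpath):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListBlocks {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // picks up core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/user/kc/data01.txt");
        FileStatus status = fs.getFileStatus(file);
        // ask the NameNode which DataNodes hold each block of the file
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            System.out.println(block);              // offset, length, and hosts per block
        }
        fs.close();
    }
}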
12. <property>
      <name>dfs.block.size</name>
      <value>134217728</value>
    </property>
    <property>
      <name>dfs.replication</name>
      <value>3</value>
    </property>
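For scale: 134,217,728 bytes is 128 x 1024 x 1024, i.e. a 128 MB block size, and dfs.replication = 3 means each block is kept on three DataNodes - which matches the three locations listed per block on slide 11.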
13. NameNode:
• A client application creates a new file in HDFS.
• The NameNode logs that transaction in the edits file.
Secondary NameNode:
• Reads the fsimage and edits files.
• Transactions in edits are merged with fsimage, and edits is emptied.
Checkpoint:
• The Secondary NameNode periodically creates checkpoints of the namespace.
• It downloads fsimage and edits from the active NameNode.
• Merges fsimage and edits locally.
• Uploads the new image back to the active NameNode.
• The checkpoint schedule is controlled by fs.checkpoint.period and fs.checkpoint.size.
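As a sketch of how those two knobs are set, mirroring the property format on slide 12 (the values below are the commonly cited Hadoop 1.x defaults - one hour and 64 MB - and should be treated as assumptions):

<property>
  <name>fs.checkpoint.period</name>
  <value>3600</value> <!-- seconds between checkpoints -->
</property>
<property>
  <name>fs.checkpoint.size</name>
  <value>67108864</value> <!-- edits size in bytes that triggers a checkpoint regardless of the period -->
</property>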
14.  During start-up, the NameNode loads the file system state from the fsimage and the edits log file.
 It then waits for DataNodes to report their blocks.
 During this time the NameNode stays in Safemode.
 Safemode is essentially a read-only mode for the HDFS cluster: it does not allow any modifications to the file system or its blocks.
 Normally the NameNode leaves Safemode automatically after the DataNodes have reported that most file system blocks are available.
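For reference, an operator can inspect or override this state with the stock admin tool: hadoop dfsadmin -safemode get reports whether Safemode is on, and -safemode enter / -safemode leave force it on or off manually.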
15. [Slide graphic: the five-step HDFS write path between the client, NameNode, and DataNodes. Step 1: the HDFS client caches the file data into a temporary local file. Steps 2-5 appear only in the diagram.]
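A minimal Java sketch of the client's side of this write path, using the public FileSystem API (the class name WriteFile, the target path, and the text written are made up for illustration):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WriteFile {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();    // reads core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/user/kc/hello.txt");  // hypothetical target path
        // create() asks the NameNode to allocate the file; the returned stream
        // buffers data on the client and ships it to DataNodes block by block
        try (FSDataOutputStream out = fs.create(file)) {
            out.writeBytes("hello, hdfs\n");
        }
        fs.close();
    }
}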
16. Support Team’s blog: http://blogs.msdn.com/b/bigdatasupport/
Facebook Page: https://www.facebook.com/MicrosoftBigData
Facebook Group: https://www.facebook.com/groups/bigdatalearnings/
Twitter: @debarchans
Read more:
http://en.wikipedia.org/wiki/Hadoop
http://en.wikipedia.org/wiki/Big_data
Next Session: Apache Hadoop – MapReduce
