HDFS presented by VIJAY

  1. Hadoop Distributed File System (HDFS)
     SEMINAR GUIDE: Mr. PRAMOD PAVITHRAN, HEAD OF DIVISION, COMPUTER SCIENCE & ENGINEERING, SCHOOL OF ENGINEERING, CUSAT
     PRESENTED BY: VIJAY PRATAP SINGH, REG NO: 12110083, S7, CS-B, ROLL NO: 81
  2. CONTENTS
     o WHAT IS HADOOP
     o PROJECT COMPONENTS IN HADOOP
     o MAP/REDUCE
     o HDFS
     o ARCHITECTURE
     o WRITE & READ IN HDFS
     o GOALS OF HADOOP
     o COMPARISON WITH OTHER SYSTEMS
     o CONCLUSION
     o REFERENCES
  3. WHAT IS HADOOP ?
  4. WHAT IS HADOOP ?
  5. WHAT IS HADOOP ?
  6. WHAT IS HADOOP ?
     o Hadoop is an open-source software framework.
     o The Hadoop framework consists of two main layers:
       ● Distributed file system (HDFS)
       ● Execution engine (MapReduce)
     o Supports data-intensive distributed applications.
     o Licensed under the Apache v2 license.
     o It enables applications to work with thousands of computationally independent computers and petabytes of data.
  7. WHY HADOOP ?
  8. PROJECT COMPONENTS IN HADOOP
  9. MAP/REDUCE
     o Hadoop is a popular open-source implementation of MapReduce.
     o MapReduce is a programming model for processing large data sets.
     o MapReduce is typically used for distributed computing on clusters of computers.
     o MapReduce can take advantage of data locality, processing data on or near the storage assets to reduce the amount of data transmitted.
     o The model is inspired by the map and reduce functions of functional programming.
     o "Map" step: The master node takes the input, divides it into smaller sub-problems, and distributes them to slave nodes. Each slave node processes its smaller problem and passes the answer back to the master node.
     o "Reduce" step: The master node then collects the answers to all the sub-problems and combines them in some way to form the final output.
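The "Map" and "Reduce" steps described on the MAP/REDUCE slide can be illustrated with a word count. This is a minimal single-process sketch in plain Python, not Hadoop's API: the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical, and each input line stands in for one sub-problem that a real cluster would hand to a separate node.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, the hand-off between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map on every line, group by word, then reduce each group."""
    pairs = []
    for line in lines:  # each line plays the role of one sub-problem
        pairs.extend(map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["Hadoop stores data", "Hadoop processes data"]))
```

Because every call to `map_phase` depends only on its own line, the map calls could run on different machines in any order; only the grouping step needs to see all the pairs.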
  10. MAP REDUCE ENGINE
  11. HDFS
      Highly scalable file system
      ◦ 6K nodes and 120PB
      ◦ Add commodity servers and disks to scale storage and IO bandwidth
      Supports parallel reading & processing of data
      ◦ Optimized for streaming reads/writes of large files
      ◦ Bandwidth scales linearly with the number of nodes and disks
      Fault tolerant & easy management
      ◦ Built-in redundancy
      ◦ Tolerates disk and node failure
      ◦ Automatically manages addition/removal of nodes
      ◦ One operator per 3K nodes
      Scalable, Reliable & Manageable
  12. LIMITATIONS OF EXISTING DATA ANALYTICS ARCHITECTURE
  13. BIG DATA
  14. INCREASING BIG DATA
  15. HADOOP'S APPROACH
  16. HADOOP'S APPROACH
  17. HADOOP'S APPROACH
  18. ARCHITECTURE OF HADOOP
  19. HADOOP MASTER/SLAVE ARCHITECTURE
  20. ARCHITECTURE OF HDFS
  21. ARCHITECTURE OF HDFS
  22. CLIENT INTERACTION TO HADOOP
  23. HDFS WRITE
      [Diagram: a Client, a Name Node, and DataNodes 1, 7 and 9 in racks labelled Rack 1 and Rack 5, each behind a Switch connected to a Core Switch; file.txt is split into blocks A, B, C.]
      o Client to Name Node: "I want to write file.txt, Block A"
      o Name Node to Client: "OK, write to DataNodes [1, 7, 9]"
      o Rack Awareness: Rack 1: DN 1; Rack 2: DN 7, 9
      o Readiness messages passed along the pipeline: "Ready DN 7, 9", "Ready DN 9", "Ready"
  24. PIPELINED WRITE
      [Diagram: same cluster as slide 23; a copy of Block A now appears on each of DataNodes 1, 7 and 9.]
  25. PIPELINED WRITE
      [Diagram: same cluster as slide 23, with Block A on DataNodes 1, 7 and 9.]
      o Messages: "Block Received", "Success"
      o Name Node MetaData: File.txt = Block A: DN 1, 7, 9
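The pipelined write shown on the write slides (the client sends Block A to DataNode 1, which forwards it to DataNode 7, which forwards it to DataNode 9) can be modelled in a few lines. This is a toy simulation under stated assumptions, not HDFS code: `pipelined_write` is a hypothetical helper, the dictionaries stand in for DataNode disks and Name Node metadata, and acknowledgements are assumed to travel back from the last node in the pipeline to the first.

```python
def pipelined_write(block, pipeline, storage):
    """Store `block` on each node in pipeline order.

    Returns the node names in the order their acknowledgements would reach
    the client, assuming acks flow back from the tail of the pipeline.
    """
    for node in pipeline:  # each node stores its copy, then forwards it
        storage.setdefault(node, []).append(block)
    return list(reversed(pipeline))  # tail acknowledges first


if __name__ == "__main__":
    storage = {}    # stand-in for the DataNodes' local disks
    metadata = {}   # stand-in for the Name Node's block map
    pipeline = ["DN1", "DN7", "DN9"]  # target list from the slides

    acks = pipelined_write("A", pipeline, storage)
    if set(acks) == set(pipeline):  # record locations once all nodes acked
        metadata["file.txt"] = {"A": pipeline}

    print(acks)      # ['DN9', 'DN7', 'DN1']
    print(metadata)  # {'file.txt': {'A': ['DN1', 'DN7', 'DN9']}}
```

The point of the pipeline is that the client uploads each block once; the DataNodes themselves carry the cost of producing the second and third replicas.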
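  26. HDFS READ
      [Diagram: a Client, a Name Node, and DataNodes 1, 7 and 9 in racks labelled Rack 1 and Rack 5, each behind a Switch connected to a Core Switch; Block A is stored on all three DataNodes.]
      o Client to Name Node: "I want to read file.txt"
      o Name Node to Client: "Block A available at DataNodes [1, 7, 9]"
      o Name Node MetaData: File.txt = Block A: DN 1, 7, 9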
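The read path on the HDFS READ slide has two parts: the client asks the Name Node where Block A lives, then fetches it from one of the listed DataNodes. The sketch below models that lookup in plain Python. It is illustrative only: `METADATA`, `RACK_OF`, `choose_replica`, and `read_plan` are hypothetical names, the rack assignments are copied from the Rack Awareness labels on the write slides, and the rule "prefer a replica in the client's own rack, otherwise take the first one listed" is a simplifying assumption.

```python
# Name Node metadata as shown on the slide: file.txt, Block A on DN 1, 7, 9.
METADATA = {"file.txt": {"A": ["DN1", "DN7", "DN9"]}}

# Rack layout taken from the Rack Awareness labels (assumed for this sketch).
RACK_OF = {"DN1": "rack1", "DN7": "rack2", "DN9": "rack2"}


def choose_replica(locations, client_rack, rack_of=RACK_OF):
    """Pick a same-rack replica if one exists, else the first listed."""
    for node in locations:
        if rack_of[node] == client_rack:
            return node
    return locations[0]


def read_plan(filename, client_rack):
    """Map each block of `filename` to the DataNode the client would read."""
    blocks = METADATA[filename]  # the Name Node lookup
    return {
        block: choose_replica(nodes, client_rack)
        for block, nodes in blocks.items()
    }


if __name__ == "__main__":
    print(read_plan("file.txt", "rack2"))  # {'A': 'DN7'}
    print(read_plan("file.txt", "rack9"))  # {'A': 'DN1'}
```

Note that the Name Node only answers the metadata question; the block data itself flows directly between the client and the chosen DataNode.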
  27. HDFS SHELL COMMANDS
      ● bin/hadoop fs -ls
      ● bin/hadoop fs -mkdir
      ● bin/hadoop fs -copyFromLocal
      ● bin/hadoop fs -copyToLocal
      ● bin/hadoop fs -moveToLocal
      ● bin/hadoop fs -rm
      ● bin/hadoop fs -tail
      ● bin/hadoop fs -chmod
      ● bin/hadoop fs -setrep -R -w 4 /dir1/s-dir/
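When the shell commands above are driven from a script, it helps to build the argument list in one place. The following is a small sketch, not part of Hadoop: `hadoop_fs` is a hypothetical helper, the `bin/hadoop` launcher path is taken from the slide, and the paths in the usage lines are made up for illustration. The helper only constructs the command; running it would additionally need a Hadoop installation and a call such as `subprocess.run`.

```python
import shlex


def hadoop_fs(action, *args, launcher="bin/hadoop"):
    """Build the argument list for `bin/hadoop fs -<action> <args...>`."""
    return [launcher, "fs", f"-{action}", *map(str, args)]


if __name__ == "__main__":
    # Hypothetical paths, shown only to illustrate the command shapes.
    commands = [
        hadoop_fs("mkdir", "/user/demo"),
        hadoop_fs("copyFromLocal", "report.txt", "/user/demo/report.txt"),
        hadoop_fs("ls", "/user/demo"),
    ]
    for command in commands:
        print(shlex.join(command))  # shell-quoted form, safe to copy/paste
```

Keeping the command as a list rather than a single string avoids quoting problems when a path contains spaces.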
  28. GOALS OF HDFS
      Very Large Distributed File System
      ◦ 10K nodes, 100 million files, 10PB
      Assumes Commodity Hardware
      ◦ Files are replicated to handle hardware failure
      ◦ Detect failures and recover from them
      Optimized for Batch Processing
      ◦ Data locations exposed so that computations can move to where data resides
      ◦ Provides very high aggregate bandwidth
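Replication has a direct storage cost, which a short calculation makes concrete. The sketch below assumes a 128 MiB block size and a replication factor of 3; neither figure appears on the slides (the factor 3 matches the three DataNodes in the write diagrams), and both are configurable in a real cluster. The helper names `block_count` and `raw_storage_bytes` are hypothetical.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB

BLOCK_SIZE = 128 * MIB  # assumed block size, not stated on the slides
REPLICATION = 3         # assumed replication factor


def block_count(file_size, block_size=BLOCK_SIZE):
    """Number of blocks needed for a file (the last block may be partial)."""
    return -(-file_size // block_size)  # ceiling division on integers


def raw_storage_bytes(file_size, replication=REPLICATION):
    """Total bytes consumed across the cluster for one file."""
    return file_size * replication


if __name__ == "__main__":
    size = 1 * GIB
    blocks = block_count(size)
    print(blocks)                           # 8 blocks
    print(blocks * REPLICATION)             # 24 block replicas to place
    print(raw_storage_bytes(size) // GIB)   # 3 GiB of raw disk
```

A partial final block occupies only as much disk as the data it holds, which is why the raw storage is the file size times the replication factor rather than a whole number of full blocks.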
  29. SCALABILITY OF HADOOP
  30. EASE TO PROGRAMMERS
  31. HADOOP VS. OTHER SYSTEMS
  32. HADOOP USERS
  33. TO LEARN MORE
      Source code
      ◦ http://hadoop.apache.org/version_control.html
      ◦ http://svn.apache.org/viewvc/hadoop/common/trunk/
      Hadoop releases
      ◦ http://hadoop.apache.org/releases.html
      Contribute to it
      ◦ http://wiki.apache.org/hadoop/HowToContribute
  34. CONCLUSION
      HDFS provides a reliable, scalable and manageable solution for working with huge amounts of data
      Future secure
      HDFS has been deployed in clusters of 10 to 4K DataNodes
      ◦ Used in production at companies such as Yahoo!, Facebook, Twitter and eBay
      ◦ Many enterprises, including financial companies, use Hadoop
  35. REFERENCES
      [1] M. Zukowski, S. Heman, N. Nes, and P. Boncz. Cooperative Scans: Dynamic Bandwidth Sharing in a DBMS. In VLDB ’07: Proceedings of the 33rd International Conference on Very Large Data Bases, pages 23–34, 2007.
      [2] Tom White. Hadoop: The Definitive Guide. O’Reilly Media, Third Edition, May 2012.
      [3] Jeffrey Shafer, Scott Rixner, and Alan L. Cox. The Hadoop Distributed Filesystem: Balancing Portability and Performance. Rice University, Houston, TX.
      [4] Konstantin Shvachko, Hairong Kuang, Sanjay Radia, and Robert Chansler. The Hadoop Distributed File System. Yahoo!, Sunnyvale, California, USA.
      [5] Jens Dittrich and Jorge-Arnulfo Quiané-Ruiz. Efficient Big Data Processing in Hadoop MapReduce. Information Systems Group, Saarland University.
  36. Thank you.
  37. Queries