Transcript of "Hota hadoop"

  1. File Systems for Cloud Computing
     Chittaranjan Hota, PhD
     Faculty Incharge, Information Processing Division
     Birla Institute of Technology & Science-Pilani, Hyderabad Campus
     Jawahar Nagar, Shameerpet, Ranga Reddy District, Hyderabad, AP, India
     hota@hyderabad.bits-pilani.ac.in
     16th March 2013, Computer Sc Dept., Utkal University, Vani Vihar, Bhubaneswar
  2. Growth of the Internet
     Source: Cisco VNI Global Forecast, 2011-2016
     Source: Internet world stats
  3. Golden era in Computing
     Cloud Futures 2011, Redmond
  4. Cloud computing: Is it a hype?
     • from $41 billion in 2011 to $241 billion in 2020
  5. Scaling up…
     SETI
  6. What is Cloud Computing?
  7. Files
     • Permanent storage
     • Information sharing
     • Files have data and attributes
  8. What a Distributed File System Provides
     • Provides access to data stored at servers using file system interfaces
     • What are the file system interfaces?
       o Open a file, check status on a file, close a file
       o Read data from a file
       o Write data to a file
       o Lock a file or part of a file
       o List files in a directory, delete a directory
       o Delete a file, rename a file, add a symbolic link to a file, etc.
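The interface operations on slide 8 correspond closely to ordinary POSIX-style calls. The following is a minimal sketch in Python, assuming a local temporary directory stands in for a mounted distributed file system. The file and directory names are hypothetical, and locking and symbolic links are left out because their behaviour is platform-specific.

```python
import os
import tempfile


def exercise_file_interface(root):
    """Run the slide's basic operations inside directory `root` and log results."""
    log = []
    path = os.path.join(root, "notes.txt")  # hypothetical file name

    # Open a file, write data to it, close it.
    with open(path, "w") as f:
        f.write("hello dfs")

    # Check status on a file (here: its size in bytes).
    log.append(("stat", os.stat(path).st_size))

    # Open a file, read data from it, close it.
    with open(path) as f:
        log.append(("read", f.read()))

    # Rename a file, then list the files in the directory.
    renamed = os.path.join(root, "renamed.txt")
    os.rename(path, renamed)
    log.append(("list", sorted(os.listdir(root))))

    # Delete a file.
    os.remove(renamed)

    # Create and delete a directory.
    sub = os.path.join(root, "sub")
    os.mkdir(sub)
    os.rmdir(sub)
    log.append(("list", sorted(os.listdir(root))))
    return log


with tempfile.TemporaryDirectory() as root:
    result = exercise_file_interface(root)
    for entry in result:
        print(entry)
```

A distributed file system aims to keep this interface unchanged while the data itself lives on remote servers.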
  9. DFS Design Issues
     • Mounting
     • Caching
     • Hints
     • Bulk Data Transfer
     • Replica management
     • Writing policies
  10. NFS architecture
      [Figure labels. Client computer: application programs, UNIX system calls, UNIX kernel, virtual file system, UNIX file system, PCDOS, NFS client; operations on local files, operations on remote files. Server computer: UNIX kernel, virtual file system, NFS server, UNIX file system. Between client and server: network, RPC for (remote operations).]
  11. Google File System
      Metadata: namespace, access control, mapping of files to chunks, and current location of chunks
  12. HDFS Design
      • Files stored as blocks
        o Default 64 MB
      • Reliability through replication
        o Replicated across 3+ DataNodes
      • Single NameNode coordinates access, metadata
        o Centralized management
      • No data caching
        o Little benefit due to large data sets, streaming reads
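The block size and replication figures on slide 12 determine how many blocks the NameNode has to track and how much raw disk a file consumes. A small worked example in Python, assuming the slide's 64 MB default and a replication factor of 3 (the function name `block_layout` is made up for this illustration):

```python
BLOCK_SIZE = 64 * 1024 * 1024  # default block size from the slide (64 MB)
REPLICATION = 3                # minimum replica count from the slide


def block_layout(file_size, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Return (blocks, block replicas, raw bytes stored) for a file of file_size bytes."""
    blocks = -(-file_size // block_size)  # integer ceiling division
    # The last block only occupies as many bytes as the file actually needs,
    # so raw storage is the file size times the replication factor.
    return blocks, blocks * replication, file_size * replication


one_gib = 1024 ** 3
print(block_layout(one_gib))  # (16, 48, 3221225472)
```

A 1 GiB file therefore becomes 16 blocks and 48 block replicas spread over the DataNodes, while the NameNode keeps only the metadata for them.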
  13. Commodity Hardware
  14. HDFS Architecture
      [Figure labels: HDFS-aware application; POSIX API; HDFS API; regular VFS with local and NFS-supported files; specific drivers; separate HDFS view; network stack; HDFS NameNode; HDFS DataNodes.]
  15. HDFS Architecture
      [Figure labels: Namenode holding metadata (name, replicas, …); clients; metadata ops; block ops; read; write; replication; blocks on Datanodes in Rack1 and Rack2.]
  16. HDFS File Read
      [Figure labels. Client node: HDFS client, DistributedFileSystem, FSDataInputStream. Cluster: one NameNode, three DataNodes. Steps: 1: open; 2: get block location; 3: read; 4: read; 5: read; 6: close.]
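The read sequence on slide 16 can be sketched as a toy in-memory simulation. This is not the Hadoop API: `ToyNameNode`, `ToyDataNode` and `read_file` are hypothetical names, and the fallback to another replica when a DataNode is unreachable is an assumption added to show why block locations list several holders.

```python
class ToyNameNode:
    """Holds metadata only: which blocks make up a file and where they live."""

    def __init__(self):
        self.block_map = {}  # path -> [(block_id, [datanode names])]

    def get_block_locations(self, path):
        return self.block_map[path]


class ToyDataNode:
    """Holds block contents; never sees file names."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block_id -> bytes
        self.alive = True

    def read_block(self, block_id):
        if not self.alive:
            raise ConnectionError(f"{self.name} is unreachable")
        return self.blocks[block_id]


def read_file(namenode, datanodes, path):
    """Ask the NameNode for locations, then stream each block from a DataNode."""
    data = bytearray()
    for block_id, holders in namenode.get_block_locations(path):  # step 2
        for name in holders:
            try:
                data += datanodes[name].read_block(block_id)  # steps 4 and 5
                break
            except ConnectionError:
                continue  # assumed behaviour: try the next replica
        else:
            raise IOError(f"no live replica for block {block_id}")
    return bytes(data)


# Hypothetical cluster: three DataNodes, one file split into two blocks.
datanodes = {n: ToyDataNode(n) for n in ("dn1", "dn2", "dn3")}
datanodes["dn1"].blocks["b0"] = datanodes["dn2"].blocks["b0"] = b"hello "
datanodes["dn2"].blocks["b1"] = datanodes["dn3"].blocks["b1"] = b"world"

namenode = ToyNameNode()
namenode.block_map["/logs/a.txt"] = [("b0", ["dn1", "dn2"]), ("b1", ["dn2", "dn3"])]

datanodes["dn1"].alive = False  # simulate a failed DataNode
content = read_file(namenode, datanodes, "/logs/a.txt")
print(content)
```

The point of the split is that file data never passes through the NameNode; it only answers the location query.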
  17. Hadoop Clusters
  18. Rack Awareness
      [Figure labels: tree of data centers (d1, d2), racks (r1, r2) and nodes (n1, n2), with example distances d=0, d=2, d=4 and d=6.]
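The distances on slide 18 are consistent with counting hops from each node up to the closest common ancestor in the data center / rack / node tree. A minimal sketch of that calculation, assuming locations are written as paths such as `/d1/r1/n1` (the path format and the function name `network_distance` are illustrative assumptions, not taken from the slide):

```python
def network_distance(a, b):
    """Sum of hops from each location up to their closest common ancestor."""
    pa = a.strip("/").split("/")
    pb = b.strip("/").split("/")
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break  # stop at the first level where the paths diverge
        common += 1
    return (len(pa) - common) + (len(pb) - common)


print(network_distance("/d1/r1/n1", "/d1/r1/n1"))  # 0: same node
print(network_distance("/d1/r1/n1", "/d1/r1/n2"))  # 2: same rack, different node
print(network_distance("/d1/r1/n1", "/d1/r2/n1"))  # 4: same data center, different rack
print(network_distance("/d1/r1/n1", "/d2/r1/n1"))  # 6: different data center
```

Smaller distances mean less network to cross, so a reader would prefer the replica with the lowest distance.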
  19. HDFS Write
      [Figure labels. Client node: HDFS client, DistributedFileSystem, FSDataOutputStream. Cluster: one NameNode, a pipeline of three DataNodes. Steps: 1: create; 2: create; 3: write; 4: write packet; 5: ack packet; 6: close; 7: complete.]
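Steps 4 and 5 of slide 19 (write packet, ack packet) can be illustrated with a toy pipeline in which each DataNode stores a packet, forwards it downstream, and passes acknowledgements back towards the client. This sketch is synchronous and in-memory; the names `ToyPipelineNode` and `write_file` are hypothetical, and a real pipeline would send packets asynchronously.

```python
class ToyPipelineNode:
    """One DataNode in a write pipeline; `downstream` is the next node or None."""

    def __init__(self, name, downstream=None):
        self.name = name
        self.downstream = downstream
        self.stored = []

    def write_packet(self, packet):
        """Store the packet, forward it, and return acks ordered tail-first."""
        self.stored.append(packet)
        acks = self.downstream.write_packet(packet) if self.downstream else []
        return acks + [self.name]


def write_file(data, packet_size, head):
    """Split data into packets and push each through the pipeline head."""
    ack_log = []
    for i in range(0, len(data), packet_size):
        ack_log.append(head.write_packet(data[i:i + packet_size]))
    return ack_log


# Hypothetical three-node pipeline: client -> dn1 -> dn2 -> dn3.
dn3 = ToyPipelineNode("dn3")
dn2 = ToyPipelineNode("dn2", dn3)
dn1 = ToyPipelineNode("dn1", dn2)

acks = write_file(b"abcdefgh", 3, dn1)
print(acks)        # each packet is acknowledged by dn3, then dn2, then dn1
print(dn3.stored)  # every node ends up holding the same packets
```

The client sends each packet only once; replication happens inside the pipeline rather than through three separate uploads.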
  20. Replica Placement
      [Figure labels: Data Center, Rack, Node]
  21. Computational Grids
      [Source: IBM TJ Watson Research Center]
  22. Load Distribution
  23. Map/Reduce
  24. SLURM
  25. Crowd Sourcing
  26. Foxtrot: Associating audio with locations
  27. Search for Extra Terrestrial Intelligence
      • Allen Telescope Array
  28. Thank You!