Hota hadoop
Published in: Technology

  • 1. File Systems for Cloud Computing. Chittaranjan Hota, PhD, Faculty Incharge, Information Processing Division, Birla Institute of Technology & Science-Pilani, Hyderabad Campus, Jawahar Nagar, Shameerpet, Ranga Reddy District, Hyderabad, AP, India. 16th March 2013, Computer Sc Dept., Utkal University, Vani Vihar, Bhubaneswar
  • 2. Growth of the Internet. Source: Cisco VNI Global Forecast, 2011-2016. Source: Internet world stats
  • 3. Golden era in Computing. Cloud Futures 2011, Redmond
  • 4. Cloud computing: Is it a hype? From $41 billion in 2011 to $241 billion in 2020
  • 5. Scaling up… SETI
  • 6. What is Cloud Computing?
  • 7. Files • Permanent storage • Information sharing • Files have data and attributes
  • 8. What a Distributed File System Provides
      • Provides access to data stored at servers using file system interfaces
      • What are the file system interfaces?
          o Open a file, check the status of a file, close a file
          o Read data from a file
          o Write data to a file
          o Lock a file or part of a file
          o List files in a directory, delete a directory
          o Delete a file, rename a file, add a symbolic link to a file, etc.
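The interface operations listed on this slide can be tried out on a local file system. The sketch below is only an illustration using Python's standard `os` module on a temporary directory; a distributed file system would expose the same kinds of calls against remote servers. The function name `exercise_file_interface` and the file names are hypothetical, and locking and symbolic links are left out because their behaviour varies by platform.

```python
import os
import tempfile


def exercise_file_interface(root):
    """Run a few of the listed file operations inside directory `root`."""
    path = os.path.join(root, "notes.txt")

    # Open a file, write data to it, and close it (the `with` block closes it).
    with open(path, "w") as f:
        f.write("hello dfs")

    # Check status on a file: here we only look at its size in bytes.
    size = os.stat(path).st_size

    # Read data from a file.
    with open(path) as f:
        data = f.read()

    # Rename a file, then list the files in the directory.
    renamed = os.path.join(root, "renamed.txt")
    os.rename(path, renamed)
    listing = sorted(os.listdir(root))

    # Delete a file and list the directory again.
    os.remove(renamed)
    remaining = os.listdir(root)
    return size, data, listing, remaining


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(exercise_file_interface(tmp))
```

The point of the sketch is that applications see ordinary file calls; whether the bytes live locally or on a server is hidden behind the interface.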
  • 9. DFS Design Issues • Mounting • Caching • Hints • Bulk data transfer • Replica management • Writing policies
  • 10. NFS architecture [Diagram labels: client computer (application programs, UNIX system calls, UNIX kernel, virtual file system, UNIX file system, PCDOS, NFS client); server computer (UNIX kernel, virtual file system, NFS server, UNIX file system); operations on local files; operations on remote files; RPC for remote operations; network]
  • 11. Google File System. Metadata: namespace, access control, mapping of files to chunks, and current location of chunks [Diagram with steps 1 to 4]
  • 12. HDFS Design
      • Files stored as blocks
          o Default 64 MB
      • Reliability through replication
          o Replicated across 3+ DataNodes
      • Single NameNode coordinates access, metadata
          o Centralized management
      • No data caching
          o Little benefit due to large data sets, streaming reads
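To make the block and replication figures on this slide concrete, here is a small arithmetic sketch. It uses the slide's 64 MB default block size and a replication factor of 3; the function name `block_layout` is hypothetical, and the sketch assumes the final block only occupies as many bytes as the file actually needs.

```python
MB = 1024 * 1024
BLOCK_SIZE = 64 * MB   # default block size from the slide
REPLICATION = 3        # replication factor from the slide


def block_layout(file_size, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Return (number of blocks, size of last block, raw bytes stored)."""
    if file_size == 0:
        return 0, 0, 0
    # Ceiling division: a partial final block still counts as a block.
    num_blocks = -(-file_size // block_size)
    last_block = file_size - (num_blocks - 1) * block_size
    # Assumption: every block, including the short last one, is replicated
    # in full, so raw storage is the file size times the replication factor.
    raw_bytes = file_size * replication
    return num_blocks, last_block, raw_bytes


if __name__ == "__main__":
    blocks, last, raw = block_layout(200 * MB)
    print(blocks, last // MB, raw // MB)
```

For a 200 MB file this gives four blocks (three full ones and one of 8 MB) and 600 MB of raw storage across the cluster.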
  • 13. Commodity Hardware
  • 14. HDFS Architecture [Diagram labels: HDFS-aware application; POSIX API; HDFS API; regular VFS with local and NFS-supported files; specific drivers; separate HDFS view; network stack; HDFS NameNode; HDFS DataNodes]
  • 15. HDFS Architecture [Diagram labels: Namenode; Metadata (Name, replicas, …); Metadata ops; Client; Block ops; Read; Write; Replication; Blocks; Datanodes on Rack1 and Rack2]
  • 16. HDFS File Read [Diagram: HDFS client on the client node, using DistributedFileSystem and FSDataInputStream, with one NameNode and three DataNodes]
      1: open
      2: get block location (from the NameNode)
      3: read
      4: read (from a DataNode)
      5: read (from a DataNode)
      6: close
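The read sequence on this slide can be mimicked with a toy in-memory model. This is not the Hadoop API: `ToyNameNode`, `ToyDataNode`, `read_file` and `build_demo_cluster` are hypothetical names, and the fallback to another replica when a DataNode is down is an assumption added for illustration.

```python
class ToyNameNode:
    """Holds metadata only: which blocks make up a file and where they live."""

    def __init__(self):
        self.block_map = {}  # path -> list of (block_id, [datanode names])

    def get_block_locations(self, path):
        return self.block_map[path]


class ToyDataNode:
    """Holds block contents only."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block_id -> bytes
        self.alive = True

    def read_block(self, block_id):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return self.blocks[block_id]


def read_file(namenode, datanodes, path):
    chunks = []
    # Steps 1-2: open the file and ask the NameNode for block locations.
    for block_id, locations in namenode.get_block_locations(path):
        # Steps 3-5: read each block directly from a DataNode holding it.
        for name in locations:
            try:
                chunks.append(datanodes[name].read_block(block_id))
                break
            except ConnectionError:
                continue  # assumed behaviour: try the next replica
        else:
            raise IOError(f"no live replica for block {block_id}")
    # Step 6: close (nothing to release in this toy model).
    return b"".join(chunks)


def build_demo_cluster():
    datanodes = {n: ToyDataNode(n) for n in ("dn1", "dn2", "dn3")}
    datanodes["dn1"].blocks["b1"] = datanodes["dn2"].blocks["b1"] = b"hello "
    datanodes["dn2"].blocks["b2"] = datanodes["dn3"].blocks["b2"] = b"world"
    namenode = ToyNameNode()
    namenode.block_map["/demo.txt"] = [("b1", ["dn1", "dn2"]),
                                       ("b2", ["dn2", "dn3"])]
    return namenode, datanodes


if __name__ == "__main__":
    nn, dns = build_demo_cluster()
    print(read_file(nn, dns, "/demo.txt"))
```

Note how the NameNode is consulted only for locations, while the block data itself flows from the DataNodes to the client.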
  • 17. Hadoop Clusters
  • 18. Rack Awareness [Diagram: network distance between nodes n1, n2 on racks r1, r2 in data centers d1, d2, with distances d=0, d=2, d=4 and d=6]
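The distances d=0, d=2, d=4 and d=6 on this slide are consistent with counting hops to the closest common ancestor in a data center / rack / node tree. The sketch below assumes nodes are named by paths such as `/d1/r1/n1`; the function name `network_distance` and the path format are illustrative choices, not taken from the slides.

```python
def network_distance(a, b):
    """Hops from each node up to their closest common ancestor, summed."""
    pa = a.strip("/").split("/")
    pb = b.strip("/").split("/")
    # Count the leading path components the two nodes share.
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    return (len(pa) - common) + (len(pb) - common)


if __name__ == "__main__":
    here = "/d1/r1/n1"
    for other in ("/d1/r1/n1", "/d1/r1/n2", "/d1/r2/n3", "/d2/r3/n4"):
        print(other, network_distance(here, other))
```

Under this model the same node is at distance 0, another node on the same rack at 2, a node on another rack in the same data center at 4, and a node in a different data center at 6.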
  • 19. HDFS Write [Diagram: HDFS client on the client node, using DistributedFileSystem and FSDataOutputStream, with one NameNode and a pipeline of three DataNodes]
      1: create
      2: create (at the NameNode)
      3: write
      4: write packet (forwarded along the DataNode pipeline)
      5: ack packet (returned back along the pipeline)
      6: close
      7: complete (at the NameNode)
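The packet and acknowledgement steps (4 and 5) of this write path can be sketched as a toy pipeline. This is a simplified, sequential model under stated assumptions: each DataNode is just a Python list of packets, the tiny `packet_size` is chosen for readability, and `write_through_pipeline` is a hypothetical name rather than a Hadoop call.

```python
def write_through_pipeline(data, pipeline, packet_size=4):
    """Send `data` packet by packet through `pipeline`; return acked packet numbers.

    `pipeline` is a list of lists, one per DataNode, in pipeline order.
    """
    packets = [data[i:i + packet_size] for i in range(0, len(data), packet_size)]
    acked = []
    for seq, packet in enumerate(packets):
        # Step 4: the packet is stored on the first DataNode and forwarded
        # to each following DataNode in turn.
        for store in pipeline:
            store.append(packet)
        # Step 5: acks travel back from the last DataNode to the client.
        acks = [len(store) == seq + 1 for store in reversed(pipeline)]
        if all(acks):
            acked.append(seq)
    return acked


if __name__ == "__main__":
    nodes = [[], [], []]  # three DataNodes, matching the diagram
    print(write_through_pipeline(b"hello hdfs!", nodes))
    print([b"".join(store) for store in nodes])
```

After the call, every DataNode in the pipeline holds the same sequence of packets, which is what gives the file its replicas.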
  • 20. Replica Placement [Diagram labels: data center, rack, node]
  • 21. Computational Grids [Source: IBM TJ Watson Research Center]
  • 22. Load Distribution
  • 23. Map/Reduce
  • 25. Crowd Sourcing
  • 26. Foxtrot: Associating audio with locations
  • 27. Allen Telescope Array: Search for Extra Terrestrial Intelligence
  • 28. Thank You!