Hota hadoop
 

    Hota hadoop: Presentation Transcript

    • File Systems for Cloud Computing. Chittaranjan Hota, PhD, Faculty Incharge, Information Processing Division, Birla Institute of Technology & Science-Pilani, Hyderabad Campus, Jawahar Nagar, Shameerpet, Ranga Reddy District, Hyderabad, AP, India. hota@hyderabad.bits-pilani.ac.in. 16th March 2013, Computer Sc Dept., Utkal University, Vani Vihar, Bhubaneswar
    • Growth of the Internet. Sources: Cisco VNI Global Forecast, 2011-2016; Internet World Stats
    • Golden era in Computing. Cloud Futures 2011, Redmond
    • Cloud computing: Is it a hype? From $41 billion in 2011 to $241 billion in 2020
    • Scaling up… SETI
    • What is Cloud Computing?
    • Files • Permanent storage • Information sharing • Files have data and attributes
    • What a Distributed File System Provides • Access to data stored at servers through file system interfaces • What are the file system interfaces? o Open a file, check the status of a file, close a file o Read data from a file o Write data to a file o Lock a file or part of a file o List files in a directory, delete a directory o Delete a file, rename a file, add a symbolic link to a file, etc.
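The file system interfaces listed on this slide can be tried out with a short Python sketch. It is an illustration only: it runs against a local temporary directory as a stand-in for a distributed file system, and the function name `exercise_file_interface` and the file names are made up for this example. Locking and symbolic links are left out because they are platform dependent.

```python
import os
import tempfile


def exercise_file_interface(root):
    """Run the basic file operations in `root` and report what was observed."""
    path = os.path.join(root, "notes.txt")

    # Open a file, write data to it, close it (the `with` block closes it).
    with open(path, "w") as f:
        f.write("hello dfs")

    # Check status on a file.
    size = os.stat(path).st_size

    # Open the file again and read its data.
    with open(path) as f:
        data = f.read()

    # List files in a directory.
    listing = sorted(os.listdir(root))

    # Rename a file.
    renamed = os.path.join(root, "renamed.txt")
    os.rename(path, renamed)
    after_rename = sorted(os.listdir(root))

    # Delete a file.
    os.remove(renamed)
    after_delete = os.listdir(root)

    return {
        "size": size,
        "data": data,
        "listing": listing,
        "after_rename": after_rename,
        "after_delete": after_delete,
    }


# Hypothetical usage: the temporary directory plays the role of a mounted file system.
with tempfile.TemporaryDirectory() as tmp:
    result = exercise_file_interface(tmp)
```

A distributed file system aims to offer these same calls while the data actually lives on remote servers.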
    • DFS Design Issues • Mounting • Caching • Hints • Bulk data transfer • Replica management • Writing policies
    • NFS architecture (diagram labels): Client computer: application programs, UNIX kernel system calls, virtual file system, UNIX file system, PCDOS, NFS client. Server computer: UNIX kernel, virtual file system, NFS server, UNIX file system. Operations on local files; operations on remote files; RPC (for remote operations) over the network
    • Google File System. Metadata: namespace, access control, mapping of files to chunks, and current location of chunks. (Diagram with numbered steps 1 to 4)
    • HDFS Design • Files stored as blocks o Default 64 MB • Reliability through replication o Replicated across 3+ DataNodes • Single NameNode coordinates access, metadata o Centralized management • No data caching o Little benefit due to large data sets, streaming reads
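The block and replication figures on this slide can be made concrete with a small Python sketch. It assumes the 64 MB default block size and a replication factor of 3 from the slide. The helper names and the round-robin assignment are illustrative assumptions; they are not how HDFS actually chooses DataNodes.

```python
MB = 1024 * 1024


def split_into_blocks(file_size, block_size=64 * MB):
    """Return (offset, length) pairs covering a file of `file_size` bytes."""
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks


def assign_replicas(num_blocks, datanodes, replication=3):
    """Assign each block to `replication` distinct DataNodes.

    Round-robin is a stand-in policy chosen for simplicity, not the HDFS policy.
    """
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    placement = []
    for i in range(num_blocks):
        holders = [datanodes[(i + r) % len(datanodes)] for r in range(replication)]
        placement.append(holders)
    return placement


# Hypothetical example: a 200 MB file on a four-node cluster.
blocks = split_into_blocks(200 * MB)
placement = assign_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"])
```

Under these assumptions a 200 MB file becomes three full 64 MB blocks plus one 8 MB block, and each block is held by three different nodes.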
    • Commodity Hardware
    • HDFS Architecture (diagram labels): HDFS-aware application; POSIX API; HDFS API; regular VFS with local and NFS-supported files; specific drivers; separate HDFS view; network stack; HDFS NameNode; HDFS DataNodes
    • HDFS Architecture (diagram labels): NameNode with metadata (name, replicas, …); clients; metadata ops; block ops; read; write; replication; blocks; DataNodes in Rack1 and Rack2
    • HDFS File Read (diagram): HDFS client on the client node, DistributedFileSystem, FSDataInputStream, NameNode, three DataNodes. Steps: 1: open; 2: get block location; 3: read; 4: read; 5: read; 6: close
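The read steps on this slide can be traced with a toy in-memory model. The classes `NameNode` and `DataNode` and the function `read_file` below are hypothetical stand-ins written for this sketch, not the Hadoop API. The fallback to another replica when a node is missing is an added assumption.

```python
class NameNode:
    """Holds only metadata: which blocks make up a file and who stores them."""

    def __init__(self):
        self.block_map = {}  # file name -> list of (block_id, [datanode names])

    def get_block_locations(self, name):
        return self.block_map[name]


class DataNode:
    """Holds block contents keyed by block id."""

    def __init__(self):
        self.blocks = {}

    def read_block(self, block_id):
        return self.blocks[block_id]


def read_file(name, namenode, datanodes):
    # Steps 1-2: open the file and ask the NameNode for block locations.
    locations = namenode.get_block_locations(name)
    chunks = []
    # Steps 3-5: read each block in order from a DataNode that holds it.
    for block_id, holders in locations:
        for holder in holders:
            node = datanodes.get(holder)
            if node is not None and block_id in node.blocks:
                chunks.append(node.read_block(block_id))
                break
        else:
            raise IOError("no reachable replica for block %s" % block_id)
    # Step 6: close (nothing to release in this toy model).
    return b"".join(chunks)


# Hypothetical cluster: two blocks, each stored on two of three DataNodes.
nn = NameNode()
nn.block_map["/logs/a.txt"] = [("b1", ["dn1", "dn2"]), ("b2", ["dn2", "dn3"])]
dns = {"dn1": DataNode(), "dn2": DataNode(), "dn3": DataNode()}
dns["dn1"].blocks["b1"] = b"hello "
dns["dn2"].blocks["b1"] = b"hello "
dns["dn2"].blocks["b2"] = b"world"
dns["dn3"].blocks["b2"] = b"world"

content = read_file("/logs/a.txt", nn, dns)
```

In this model the file data never passes through the NameNode, which only answers the location query.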
    • Hadoop Clusters
    • Rack Awareness (diagram): data centers d1, d2; racks r1, r2; nodes n1, n2. Distances shown from n1: d=0, d=2, d=4, d=6
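The distances 0, 2, 4 and 6 on this slide are consistent with counting hops in a tree of data center, rack and node. The sketch below assumes that reading: each node is named by a path such as `/d1/r1/n1`, and the distance is the number of steps from each node up to their closest common ancestor. The path format and the function name `network_distance` are assumptions made for this example.

```python
def network_distance(a, b):
    """Tree distance between two nodes named like '/datacenter/rack/node'."""
    path_a = a.strip("/").split("/")
    path_b = b.strip("/").split("/")

    # Count how many leading path components the two nodes share.
    common = 0
    for x, y in zip(path_a, path_b):
        if x != y:
            break
        common += 1

    # Hops from each node up to the closest common ancestor.
    return (len(path_a) - common) + (len(path_b) - common)


same_node = network_distance("/d1/r1/n1", "/d1/r1/n1")
same_rack = network_distance("/d1/r1/n1", "/d1/r1/n2")
same_data_center = network_distance("/d1/r1/n1", "/d1/r2/n3")
other_data_center = network_distance("/d1/r1/n1", "/d2/r3/n4")
```

Under this assumption, a smaller distance means a closer replica.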
    • HDFS Write (diagram): HDFS client on the client node, DistributedFileSystem, FSDataOutputStream, NameNode, pipeline of three DataNodes. Steps: 1: create; 2: create; 3: write; 4: write packet (along the pipeline); 5: ack packet (back along the pipeline); 6: close; 7: complete
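The packet and ack flow in this diagram can be sketched in a few lines of Python. This is a toy model under stated assumptions: the function `write_through_pipeline`, the 4-byte packet size and the node names are invented for illustration, and NameNode interaction (steps 1, 2 and 7) is left out.

```python
def write_through_pipeline(data, pipeline, packet_size=4):
    """Send `data` packet by packet through an ordered list of DataNode names.

    Returns what each node stored and, per packet, the order in which acks
    travelled back to the client.
    """
    stores = {name: bytearray() for name in pipeline}
    acks = []
    for index, start in enumerate(range(0, len(data), packet_size)):
        packet = data[start:start + packet_size]
        # Step 4: the packet is forwarded from node to node down the pipeline.
        for name in pipeline:
            stores[name].extend(packet)
        # Step 5: acks flow back in the reverse direction, last node first.
        acks.append((index, list(reversed(pipeline))))
    return stores, acks


# Hypothetical example: ten bytes written through a three-node pipeline.
payload = b"abcdefghij"
stores, acks = write_through_pipeline(payload, ["dn1", "dn2", "dn3"])
```

With this payload the data travels as three packets, and every node in the pipeline ends up with a full copy.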
    • Replica Placement (diagram labels): data center, rack, node
    • Computational Grids [Source: IBM TJ Watson Research Center]
    • Load Distribution
    • Map/Reduce
    • SLURM
    • Crowd Sourcing
    • Foxtrot: Associating audio with locations
    • Allen Telescope Array: Search for Extra Terrestrial Intelligence
    • Thank You!