Understanding HDFS


  • Yahoo! adopted Hadoop for internal use at the end of 2006, so the data is a little out of date.
  • Why is MapReduce intermediate data not stored in HDFS? 1. HDFS reads and writes are expensive. 2. If intermediate data is lost, the computation can be re-run on the corresponding data node to regenerate it, since it is deterministic.
  • For row two: failure is no longer uncommon because, for example, if one machine in a thousand dies, then with ten thousand machines roughly ten may die. That is a lot, so a replication factor of at least two is necessary.
  • Ops/s means operations per second.
  • Again, out of date.

    1. HDFS (Hadoop Distributed File System) Thiru
    2. Agenda: Typical workflow  Writing a file to HDFS  Reading a file from HDFS  Rack awareness  Planning for a cluster  Q&A
    3. Hadoop server roles: Client  Masters: HDFS {Name Node, Secondary Name Node} and MapReduce {Job Tracker}  Slaves: Data Nodes, each also running a Task Tracker
    4. Hadoop cluster layout: the masters (Name Node, Job Tracker, Secondary NN) and the Hadoop clients, plus racks of slave machines, each running DN + TT (Data Node + Task Tracker)
    5. Sample HDFS workflow: Write data into the cluster (HDFS)  Analyze the data (MapReduce)  Store the result into the cluster (HDFS)  Read the result from the cluster (HDFS). Sample scenario: How many times did customers call customer care enquiring about a recently launched product? Compare it against the ad campaign on television, correlate both, and find the best time to run the ad. Data path: CRM data entry  SQOOP  HDFS  Map Reduce  Result
    6. Write data into the cluster (HDFS): Client: "I want to write a 200 MB file." Name Node: "OK! Block size is 64 MB. Split the file into 64 MB blocks and write the first block to nodes 1, 4, 5." The client consults the Name Node and writes each block to one data node; that data node replicates the block as per the replication factor and intimates the Name Node; the cycle repeats for every block.
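The split-and-pipeline flow above can be sketched in Python. This is an illustration of the idea, not Hadoop's actual client code; the numbers come from the slide (200 MB file, 64 MB blocks), and the node lists are made up. Note that a 200 MB file actually yields 4 blocks (3 × 64 MB + 8 MB), not 3.

```python
# Sketch: how a file is cut into blocks before writing (illustrative only).
BLOCK_SIZE_MB = 64

def split_into_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Return the size of each block the file is split into."""
    full, tail = divmod(file_size_mb, block_size_mb)
    return [block_size_mb] * full + ([tail] if tail else [])

def write_file(file_size_mb, pipelines):
    # One pipeline (ordered data-node list) per block: the client sends each
    # block to the first node only; that node forwards it down the chain.
    blocks = split_into_blocks(file_size_mb)
    return [(size, nodes[0], nodes[1:])
            for size, nodes in zip(blocks, pipelines)]

# (block size, node the client writes to, nodes that receive forwarded copies)
plan = write_file(200, [[1, 4, 5], [2, 6, 7], [3, 8, 9], [1, 5, 6]])
```

The client only ever talks to the first data node of each pipeline; replication to the remaining nodes happens node-to-node, which keeps the client's upload bandwidth at one copy per block.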
    7. Rack awareness: the Name Node keeps a rack map (e.g. Rack 1: data nodes 1, 2; Rack 2: data node 5; ...) so that replicas of each block (A, B, C) are spread across racks.  Never lose data when a rack is down  Keep bulky flows within a rack when possible  Assumption: within a rack there is higher bandwidth and lower latency
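A minimal sketch of rack-aware replica placement, assuming the common HDFS default policy for replication factor 3 (first replica on the writer's node, second and third on two different nodes of another rack). The topology dictionary and node names here are invented for illustration.

```python
import random

# Illustrative topology: rack name -> data nodes on that rack (made up).
topology = {
    "rack1": ["dn1", "dn2", "dn3", "dn4"],
    "rack2": ["dn5", "dn6", "dn7", "dn8"],
}

def place_replicas(writer, topology):
    """Pick 3 replica locations: writer's node, then 2 nodes on another rack."""
    rack_of = {dn: rack for rack, dns in topology.items() for dn in dns}
    first = writer
    # Any rack other than the writer's keeps the data safe from a rack failure.
    remote_rack = next(r for r in topology if r != rack_of[writer])
    second, third = random.sample(topology[remote_rack], 2)
    return [first, second, third]

replicas = place_replicas("dn1", topology)
```

With this layout, losing the writer's whole rack still leaves two replicas, and two of the three copies share a rack so the bulk of replication traffic stays on one rack switch.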
    8. Multi-block replication: for File.txt (200 MB), the client asks the Name Node for each block's targets (A, B, C); the Name Node answers, e.g. "replicate A in 3, 8", and the copies of each block end up spread across data nodes on different racks.
    9. Name Node:  Data nodes send heartbeats  Every 10th heartbeat is a block report  The Name Node builds its metadata from block reports  If the Name Node is down, HDFS is down  Missing heartbeats signify lost nodes  The Name Node consults its metadata and finds the affected data  The Name Node consults the rack awareness script  The Name Node tells data nodes to replicate
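The heartbeat-driven failure handling above can be sketched as two small functions. This is a simplified model, not HDFS internals; the timeout value and node names are illustrative (real HDFS waits on the order of ten minutes before declaring a node dead).

```python
# Illustrative timeout; real HDFS uses a much longer window (~10 minutes).
HEARTBEAT_TIMEOUT = 30  # seconds

def find_dead_nodes(last_heartbeat, now, timeout=HEARTBEAT_TIMEOUT):
    """Nodes whose last heartbeat is older than the timeout are declared dead."""
    return {dn for dn, t in last_heartbeat.items() if now - t > timeout}

def under_replicated(block_map, dead, target=3):
    """For each block, how many new replicas are needed once dead nodes
    are excluded. block_map: block id -> set of data nodes holding it."""
    return {blk: target - len(nodes - dead)
            for blk, nodes in block_map.items()
            if len(nodes - dead) < target}

last = {"dn1": 100.0, "dn2": 95.0, "dn3": 40.0}
dead = find_dead_nodes(last, now=110.0)           # dn3 missed heartbeats
need = under_replicated({"A": {"dn1", "dn2", "dn3"}}, dead)
```

The Name Node would then use the rack awareness map to choose where the missing replicas are re-created.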
    10. Name Node & Secondary Name Node: file system metadata, e.g. File.txt = A0 {1,5,7}, A1 {1,7,9}, A2 {5,10,15}. The Secondary Name Node asks the primary: "It's been 1 hr, give me your data."  Not a hot standby for the Name Node* (that needs ZooKeeper)  Connects to the Name Node every one hour* (configurable)  Does housekeeping and backup of the Name Node metadata  The saved metadata can be used to rebuild the Name Node
    11. Understanding Secondary Name Node housekeeping: the primary Name Node holds an edits log and an fsimage. At checkpoint time the Secondary Name Node fetches both and merges the edits into the fsimage to produce fsimage.ckpt, while the primary starts a fresh edits-new log; fsimage.ckpt is then copied back to the primary to replace the old fsimage, and edits-new becomes the new edits.
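The checkpoint merge can be sketched as replaying an edits log into the last fsimage. This is a toy model: real edits records and the fsimage format are binary and far richer, and the `(op, path, value)` tuples here are invented for illustration.

```python
def apply_edits(fsimage, edits):
    """Replay edit records onto a copy of the fsimage; the result plays the
    role of fsimage.ckpt. The running image is never mutated."""
    image = dict(fsimage)
    for op, path, value in edits:
        if op == "add":
            image[path] = value        # e.g. path -> list of block ids
        elif op == "delete":
            image.pop(path, None)
    return image

fsimage = {"/file.txt": ["A0", "A1", "A2"]}
edits = [("add", "/new.txt", ["B0"]), ("delete", "/file.txt", None)]
ckpt = apply_edits(fsimage, edits)     # the merged checkpoint image
```

Because the merge works on a copy, the primary can keep serving requests (appending to a fresh edits-new log) while the checkpoint is built.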
    12. Reading data from the HDFS cluster: Client: "I want to read file file.txt." Name Node: "OK! File.txt = block A {1,5,6}, block B {8,1,2}, block C {5,8,9}."  The client consults the Name Node  The client receives a data node list for each block  The client picks the first node of each list  The client reads the blocks sequentially
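The read path above can be sketched as follows. The metadata and the `fetch` callback are stand-ins for the Name Node RPC and the data-node block transfer; block contents here are fake.

```python
def read_file(name_node_meta, fetch):
    """name_node_meta: ordered list of (block_id, [replica data nodes]).
    The client picks the first (closest) node of each list and reads the
    blocks in order, concatenating them into the file contents."""
    data = b""
    for block_id, nodes in name_node_meta:
        chosen = nodes[0]              # first node of the replica list
        data += fetch(chosen, block_id)
    return data

# Replica lists taken from the slide; block contents are invented.
meta = [("a", ["dn1", "dn5", "dn6"]),
        ("b", ["dn8", "dn1", "dn2"]),
        ("c", ["dn5", "dn8", "dn9"])]
fake_store = {("dn1", "a"): b"AA", ("dn8", "b"): b"BB", ("dn5", "c"): b"CC"}
content = read_file(meta, lambda dn, blk: fake_store[(dn, blk)])
```

A real client would fall back to the next node in the list if the first one fails mid-read; that retry loop is omitted here for brevity.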
    13. Choosing the right hardware: Master node  Single point of failure, so no commodity hardware  Dual power supply for redundancy  Regular data backup  RAM thumb rule: 1 GB per million blocks of data. Slave nodes  Tasks per node: 1 core can run 1.5 mappers or reducers
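The two rules of thumb on this slide are easy to turn into a small sizing calculator. Both are approximations from the slide, not exact formulas, and the example inputs are illustrative.

```python
import math

def namenode_ram_gb(num_blocks):
    """Thumb rule from the slide: ~1 GB of Name Node RAM per million blocks."""
    return math.ceil(num_blocks / 1_000_000)

def max_tasks(cores):
    """Thumb rule from the slide: 1 core can run ~1.5 map/reduce tasks."""
    return int(cores * 1.5)

ram = namenode_ram_gb(9_800_000)   # cluster holding 9.8 million blocks
tasks = max_tasks(8)               # a dual quad-core slave node
```

So a cluster with about 9.8 million blocks wants roughly 10 GB of Name Node heap, and an 8-core slave is configured for about 12 concurrent task slots.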
    14. 14. Practice at Yahoo!
    15. Practice at Yahoo!:  HDFS clusters at Yahoo! include about 3500 nodes  A typical cluster node has:  2 quad-core Xeon processors @ 2.5 GHz  Red Hat Enterprise Linux Server Release 5.1  Sun Java JDK 1.6.0_13-b03  4 directly attached SATA drives (one terabyte each)  16 GB RAM  1-gigabit Ethernet
    16. Practice at Yahoo!:  70 percent of the disk space is allocated to HDFS. The remainder is reserved for the operating system (Red Hat Linux), logs, and space to spill the output of map tasks. (MapReduce intermediate data are not stored in HDFS.)  For each cluster, the NameNode and the BackupNode hosts are specially provisioned with up to 64 GB RAM; application tasks are never assigned to those hosts.  In total, a cluster of 3500 nodes has 9.8 PB of storage available as blocks that are replicated three times, yielding a net 3.3 PB of storage for user applications. As a convenient approximation, one thousand nodes represent one PB of application storage.
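The slide's capacity figures follow directly from the hardware on the previous slide; the arithmetic can be checked in a few lines (the 70% HDFS allocation and replication factor 3 come from the text above).

```python
# 3500 nodes, each with 4 x 1 TB SATA drives, 70% of disk given to HDFS,
# and every block replicated three times.
nodes = 3500
raw_tb = nodes * 4 * 1.0                     # 14,000 TB of raw disk
hdfs_pb = raw_tb * 0.70 / 1000               # 9.8 PB available as blocks
user_pb = hdfs_pb / 3                        # ~3.27 PB for user applications
per_1000_nodes_pb = user_pb / nodes * 1000   # ~0.93 PB per thousand nodes
```

The last line shows why "one thousand nodes represent one PB of application storage" is a convenient approximation: the exact figure is about 0.93 PB.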
    17. Practice at Yahoo!  Durability of data:  Uncorrelated node failures: replicating data three times is a robust guard against loss of data due to uncorrelated node failures.  Correlated node failures (the failure of a rack or core switch): HDFS can tolerate losing a rack switch, since each block has a replica on some other rack.  Loss of electrical power to the cluster: a large cluster will lose a handful of blocks during a power-on restart.
    18. Practice at Yahoo!  Benchmarks
    19. Practice at Yahoo!  Benchmarks: NameNode throughput benchmark
    20. Future work:  Automated failover plan: ZooKeeper, Yahoo!'s distributed consensus technology, will be used to build an automated failover solution.  Scalability of the NameNode. Solution: the near-term solution to scalability is to allow multiple namespaces (and NameNodes) to share the physical storage within a cluster. Drawback: the main drawback of multiple independent namespaces is the cost of managing them.
    21. 21. Thank you