Hadoop architecture meetup


  1. Hadoop Architecture
  2. Agenda
     • Different Hadoop daemons and their roles
     • What does a Hadoop cluster look like
     • Under the hood: how it writes a file
     • Under the hood: how it reads a file
     • Under the hood: how it replicates a file
     • Under the hood: how it runs a job
     • How to balance an unbalanced Hadoop cluster
  3. Hadoop – a bit of background
     • An open source project
     • Based on two technical papers published by Google
     • A well-known platform for distributed applications
     • Easy to scale out
     • Works well with commodity hardware (not entirely true)
     • Very good for background applications
  4. Hadoop Architecture
     • Two primary components:
       – Hadoop Distributed File System (HDFS): handles file operations such as read, write, and delete
       – MapReduce Engine: handles parallel computation
  5. Hadoop Distributed File System
     • Runs on top of the existing file system
     • A file is broken into pre-defined, equal-sized blocks that are stored individually
     • Designed to handle very large files
     • Not good for huge numbers of small files
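The block-splitting rule above can be sketched in a few lines of Python. The block size here is an illustrative assumption (classic HDFS defaulted to 64 MB; newer releases use 128 MB); the function names are made up for this sketch.

```python
import math

BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB -- illustrative default, not a fixed constant

def split_into_blocks(file_size_bytes):
    """Return (number of blocks, size of the last block).

    Every block except possibly the last is exactly BLOCK_SIZE;
    the last block holds whatever remains."""
    if file_size_bytes == 0:
        return 0, 0
    n_blocks = math.ceil(file_size_bytes / BLOCK_SIZE)
    last = file_size_bytes - (n_blocks - 1) * BLOCK_SIZE
    return n_blocks, last

# A 1 GB file becomes 16 full 64 MB blocks.
print(split_into_blocks(1024 * 1024 * 1024))  # (16, 67108864)
# A tiny file still occupies its own (short) block -- each one costs
# the Name Node a metadata entry, which is why HDFS handles huge
# numbers of small files poorly.
print(split_into_blocks(5 * 1024))            # (1, 5120)
```

This also makes the "not good for small files" bullet concrete: a million 5 KB files mean a million blocks of metadata, while the same data in one file would need only a handful of blocks.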
  6. MapReduce Engine
     • A MapReduce program consists of map and reduce functions
     • A MapReduce job is broken into tasks that run in parallel
     • Prefers local processing when possible
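The map/reduce model itself can be sketched in plain Python. This is not the Hadoop API, just the shape of it: map emits (key, value) pairs per input record, the framework groups values by key (the "shuffle"), and reduce folds each group.

```python
from collections import defaultdict

def map_fn(line):
    # Emit one (word, 1) pair per word in the input line.
    for word in line.split():
        yield word.lower(), 1

def reduce_fn(word, counts):
    # Fold all values for one key into a single result.
    return word, sum(counts)

def run_job(lines):
    groups = defaultdict(list)            # shuffle: group values by key
    for line in lines:                    # map phase, one call per record
        for key, value in map_fn(line):
            groups[key].append(value)
    return dict(reduce_fn(k, v) for k, v in groups.items())

result = run_job(["fraud alert", "no fraud here"])
print(result)  # {'fraud': 2, 'alert': 1, 'no': 1, 'here': 1}
```

In real Hadoop the map calls run on the Data Nodes holding the input blocks, and the shuffle moves data over the network, but the data flow is the same.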
  7. Hadoop Server Roles
     • Clients
     • Masters: Name Node and Secondary Name Node (HDFS); Job Tracker (MapReduce)
     • Slaves: a Data Node and Task Tracker on each worker machine
     • MapReduce provides distributed data analytics; HDFS provides distributed data storage
     (diagram credit: BRAD HEDLUND .com)
  8. Hadoop Cluster
     (diagram: racks of Data Node + Task Tracker slaves, each rack with its own switch; the Name Node, Job Tracker, Secondary Name Node, and Client machines sit in separate racks; rack switches uplink through core switches to the outside world)
  9. Typical Workflow
     • Load data into the cluster (HDFS writes)
     • Analyze the data (MapReduce)
     • Store results in the cluster (HDFS writes)
     • Read the results from the cluster (HDFS reads)
     Sample scenario: how many times did our customers type the word "Fraud" into emails sent to customer service? File.txt is a huge file containing all such emails.
  10. Writing files to HDFS
     • Client consults the Name Node ("I want to write blocks A, B, C of File.txt" – "OK, write to Data Nodes 1, 5, 6")
     • Client writes each block directly to one Data Node
     • The Data Nodes replicate the block
     • The cycle repeats for the next block
  11. Preparing HDFS writes
     • Client asks to write Block A of File.txt; the Name Node replies with Data Nodes 1, 5, and 6
     • Rack awareness: the Name Node picks two nodes in the same rack and one node in a different rack
     • Data Node 1 confirms that Data Nodes 5 and 6 are ready before the write begins
     • This placement gives both data protection and locality for MapReduce
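The placement rule on this slide can be sketched as a function: pick two Data Nodes in one rack and a third in a different rack. This is a simplification of the real Name Node logic; rack and node names here are made up for illustration.

```python
import random

def place_replicas(racks):
    """racks: dict mapping rack name -> list of Data Node names.
    Returns three nodes: two from one rack, one from another."""
    # Pick a rack with at least two nodes for the co-located pair...
    rack_a = random.choice([r for r, nodes in racks.items() if len(nodes) >= 2])
    # ...and any other non-empty rack for the off-rack replica.
    rack_b = random.choice([r for r in racks if r != rack_a and racks[r]])
    pair = random.sample(racks[rack_a], 2)
    return pair + [random.choice(racks[rack_b])]

racks = {"rack1": ["dn1", "dn2", "dn3"],
         "rack5": ["dn5", "dn6", "dn7"]}
print(place_replicas(racks))  # e.g. ['dn5', 'dn6', 'dn2']
```

Losing one rack can then cost at most two of the three replicas, which is the "data protection" bullet above in concrete form.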
  12. Pipelined write
     • The client sends Block A only once; the Data Nodes in the pipeline pass the data along as it is received
     • Transfers run over TCP port 50010
  13. Pipelined write (completion)
     • Each Data Node reports "Block received" back up the pipeline, and the client reports success to the Name Node
     • The Name Node records the block's locations in its metadata: File.txt, Blk A → DN1, DN2, DN3
  14. Multi-block replication pipeline
     • Each of blocks A, B, and C of File.txt flows through its own pipeline of three Data Nodes across racks
     • With replication factor 3, a 1 TB file consumes 3 TB of storage and generates 3 TB of network traffic
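The storage and traffic figures above are simple arithmetic, sketched here so the replication-factor relationship is explicit (one transfer per replica: client to the first Data Node, then two pipeline hops):

```python
def write_cost(file_tb, replication=3):
    """Disk and network cost (in TB) of writing a file into HDFS."""
    storage = file_tb * replication  # one full copy per replica
    network = file_tb * replication  # one transfer per replica
    return storage, network

print(write_cost(1))                 # (3, 3) -- the slide's 1 TB example
print(write_cost(1, replication=2))  # (2, 2) -- see the quiz at the end
```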
  15. Client writes span the HDFS cluster
     • Blocks of a single file end up spread across racks and Data Nodes throughout the cluster
     • Factors: block size and file size – more blocks means a wider spread
  16. Hadoop rack awareness – why?
     • Never lose all copies of the data if an entire rack fails
     • Keep bulky flows in-rack when possible
     • Assumes in-rack links have higher bandwidth and lower latency
     • The Name Node's metadata maps each block to its Data Nodes, e.g. File.txt: Blk A → DN1, DN5, DN6; Blk B → DN7, DN1, DN2; Blk C → DN5, DN8, DN9
  17. Name Node
     • Each Data Node sends a heartbeat over TCP every 3 seconds ("I'm alive! I have blocks A, C")
     • Every 10th heartbeat is a block report
     • The Name Node builds its metadata (the file-to-block-to-node map) from block reports
     • If the Name Node is down, HDFS is down
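The heartbeat cadence on this slide works out to roughly one block report per node every 30 seconds, which the sketch below makes explicit (the intervals are the slide's figures; the function is illustrative, not Hadoop code):

```python
HEARTBEAT_INTERVAL_S = 3   # heartbeat every 3 seconds (from the slide)
BLOCK_REPORT_EVERY = 10    # every 10th heartbeat carries a block report

def heartbeat_schedule(n_heartbeats):
    """Yield (time_in_seconds, is_block_report) for each heartbeat."""
    for i in range(1, n_heartbeats + 1):
        yield i * HEARTBEAT_INTERVAL_S, i % BLOCK_REPORT_EVERY == 0

events = list(heartbeat_schedule(20))
reports = [t for t, is_report in events if is_report]
print(reports)  # [30, 60] -- a block report every 30 seconds
```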
  18. Re-replicating missing replicas
     • Missing heartbeats signify lost nodes
     • The Name Node consults its metadata to find the affected blocks
     • The Name Node consults the rack awareness script
     • The Name Node tells a surviving Data Node to re-replicate (e.g. "copy blocks A and C to Node 8")
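The Name Node's side of those steps can be sketched as: strike the dead node from the block map, then flag every block that dropped below its target replica count. Block and node names are made up; this is the decision logic only, not the actual copy.

```python
REPLICATION = 3  # target replica count per block

def blocks_to_rereplicate(block_map, dead_node):
    """block_map: block id -> list of Data Nodes holding a replica.
    Returns the under-replicated blocks and their surviving sources."""
    fixes = {}
    for block, nodes in block_map.items():
        survivors = [n for n in nodes if n != dead_node]
        if len(survivors) < REPLICATION:
            fixes[block] = survivors  # any survivor can source the new copy
    return fixes

block_map = {"A": ["dn1", "dn2", "dn3"],
             "C": ["dn1", "dn2", "dn3"]}
print(blocks_to_rereplicate(block_map, "dn3"))
# {'A': ['dn1', 'dn2'], 'C': ['dn1', 'dn2']}
```

The real Name Node would then pick the re-replication target with the rack awareness script, as the slide describes.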
  19. Secondary Name Node
     • Not a hot standby for the Name Node
     • Connects to the Name Node every hour ("It's been an hour, give me your metadata")
     • Performs housekeeping and backs up the Name Node metadata
     • The saved metadata can be used to rebuild a failed Name Node
  20. Client reading files from HDFS
     • Client asks the Name Node for the block locations of Results.txt and receives a Data Node list for each block
     • Client picks the first Data Node in the list for each block
     • Client reads the blocks sequentially
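The read path in those bullets can be sketched as a loop over the Name Node's answer. The `fetch` parameter is a stand-in for the actual block transfer from a Data Node; the locations mirror the File.txt example used throughout the deck.

```python
def read_file(block_locations, fetch):
    """block_locations: ordered list of (block_id, [data nodes]).
    Reads blocks sequentially, each from the first listed node."""
    data = []
    for block_id, nodes in block_locations:
        data.append(fetch(block_id, nodes[0]))  # first node = preferred node
    return b"".join(data)

locations = [("A", ["dn1", "dn5", "dn6"]),
             ("B", ["dn7", "dn1", "dn2"]),
             ("C", ["dn5", "dn8", "dn9"])]

def fake_fetch(block_id, node):
    # Stand-in transfer: just return the block id as bytes.
    return block_id.encode()

print(read_file(locations, fake_fetch))  # b'ABC'
```

In the real client the node list is already sorted by proximity, so "pick the first" means "pick the closest", as the next slides on rack awareness explain.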
  21. Data processing: Map
     • Map: "run this computation on your local data"
     • The Job Tracker delivers Java code to the nodes that hold the data locally
     • Example: "How many times does 'Fraud' appear in File.txt?" – map tasks on Data Nodes 1, 5, and 9 count the word in their local blocks (A, B, C), returning Fraud = 3, 0, and 11
  22. What if data isn't local?
     • The Job Tracker tries to select a node in the same rack as the data
     • It relies on the Name Node's rack awareness
     • A Task Tracker with no local copy fetches the block over the network ("I need block A")
  23. Data Node reading files from HDFS
     • When a Data Node needs a block ("tell me the locations of Block A of File.txt"), the Name Node lists rack-local nodes first
     • This leverages in-rack bandwidth and keeps the transfer to a single hop
  24. Data processing: Reduce
     • Reduce: "run this computation across the Map results"
     • Map tasks deliver their output data over the network
     • The Reduce task's output (Results.txt) is written to and read from HDFS
     • Example: the Reduce task sums the "Fraud" counts from the map tasks: 3 + 0 + 11 = 14
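The "Fraud" scenario from the last few slides fits in a few lines: one map task counts the word in its local block, and the single reduce task sums the partial counts. The block contents are made up to reproduce the slide's numbers (3, 0, 11).

```python
def map_task(block_text):
    # Count occurrences of "fraud" in this node's local block.
    return block_text.lower().split().count("fraud")

def reduce_task(partial_counts):
    # Sum the per-block counts delivered over the network.
    return sum(partial_counts)

blocks = {"A": "fraud fraud possible fraud",
          "B": "all quiet today",
          "C": "fraud " * 11}

partials = [map_task(text) for text in blocks.values()]  # [3, 0, 11]
print(reduce_task(partials))  # 14
```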
  25. Unbalanced cluster
     • Hadoop prefers local processing when possible
     • Newly added servers sit underutilized for MapReduce and HDFS ("I'm bored!")
     • You may see more network bandwidth usage and slower job times ("I was assigned a Map task but don't have the block – guess I need to go get it")
  26. Cluster balancing
     • The balancer utility (if used) runs in the background
     • It does not interfere with MapReduce or HDFS
     • Default speed limit: 1 MB/s
     brad@cloudera-1:~$ hadoop balancer
  27. Quiz
     • If you wrote a 1 TB file into HDFS with replication factor 2, how much space does HDFS actually need to store it?
     • True/False: even if the Name Node goes down, I will still be able to read files from HDFS.
  28. Quiz
     • True/False: in a Hadoop cluster, we can have a secondary Job Tracker to enhance fault tolerance.
     • True/False: if the Job Tracker goes down, you will not be able to write any file into HDFS.
  29. Quiz
     • True/False: the Name Node stores the actual data itself.
     • True/False: the Name Node can be rebuilt using the Secondary Name Node.
     • True/False: if a Data Node goes down, Hadoop takes care of re-replicating the affected data blocks.
  30. Quiz
     • In which scenario does one Data Node try to read data from another Data Node?
     • What are the benefits of the Name Node's rack awareness?
     • True/False: HDFS is well suited for applications that write huge numbers of small files.
  31. Quiz
     • True/False: Hadoop takes care of balancing the cluster automatically.
     • True/False: the output of Map tasks is written to an HDFS file.
     • True/False: the output of Reduce tasks is written to an HDFS file.
  32. Quiz
     • True/False: in a production cluster, commodity hardware can be used to set up the Name Node.
     Thank you!
