Hadoop Inside


  1. Hadoop Inside (TC Data Platform Office, GFIS Team, Eunjo Lee)
  2. What is Hadoop
      Hadoop is a framework and system for parallel processing of large amounts of data in a distributed computing environment
      (http://searchbusinessintelligence.techtarget.in/tutorial/Apache-Hadoop-FAQ-for-BI-professionals)
      Apache project
      - open source
      - Java based
      - Google system clone: GFS -> HDFS, MapReduce -> MapReduce
  3. Distributed Processing System
      How to process data in a distributed environment
      - how to read/write data
      - how to control nodes
      - load balancing
      Monitoring
      - node status
      - task status
      Fault tolerance
      - error detection: process error, network error, hardware error, ...
      - error handling
        - temporary error: retry -> duplication, data corruption, ...
        - permanent error: fail over (which one?)
        - process hang: timeout & retry
          - too long -> long response time
          - too short -> infinite loop
  4. Hadoop System Architecture (HDFS + MapReduce)
      [Diagram: NameNode, Secondary NameNode, and JobTracker as master processes; each worker node runs a TaskTracker and a DataNode; heartbeat links go to the masters, data read/write links go between nodes]
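      Client programs find the NameNode and JobTracker from the diagram through configuration; a minimal Hadoop 1.x style sketch (host names and ports are placeholders, not from the slides):

      import org.apache.hadoop.conf.Configuration;

      public class ClusterConf {
          public static Configuration create() {
              Configuration conf = new Configuration();
              // HDFS master (NameNode); host and port are placeholders
              conf.set("fs.default.name", "hdfs://namenode.example.com:9000");
              // MapReduce master (JobTracker); host and port are placeholders
              conf.set("mapred.job.tracker", "jobtracker.example.com:9001");
              return conf;
          }
      }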
  5. HDFS vs. Filesystem
      Local filesystem vs. HDFS
      - inode – namespace
      - cylinder / track – data node
      - blocks (bytes) – blocks (MBytes)
      Features
      - very large files
      - write once, read many times
      - support for usual file system operations: ls, cp, mv, rm, chmod, chown, put, cat, ...
      - no support for multiple writers or arbitrary modifications
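      The usual operations listed above are also available programmatically through the Java FileSystem API; a minimal sketch (the paths are placeholders, not from the slides):

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FileStatus;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;

      public class HdfsOps {
          public static void main(String[] args) throws Exception {
              FileSystem fs = FileSystem.get(new Configuration());

              // "ls": list a directory
              for (FileStatus status : fs.listStatus(new Path("/user/demo"))) {
                  System.out.println(status.getPath());
              }

              // "put": copy a local file into HDFS
              fs.copyFromLocalFile(new Path("/tmp/local.txt"), new Path("/user/demo/data.txt"));

              // "mv": rename within HDFS
              fs.rename(new Path("/user/demo/data.txt"), new Path("/user/demo/renamed.txt"));

              // "rm": delete (recursive flag is for directories)
              fs.delete(new Path("/user/demo/renamed.txt"), false);
          }
      }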
  6. Block Replication & Rack Awareness
      [Diagram: a file split into blocks 1-4; each block is replicated on several servers placed across different racks]
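      The replication factor shown in the diagram is configurable per cluster and per file; a small sketch (the path and values are placeholders):

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;

      public class ReplicationExample {
          public static void main(String[] args) throws Exception {
              Configuration conf = new Configuration();
              // default replication factor for newly created files
              conf.set("dfs.replication", "3");
              FileSystem fs = FileSystem.get(conf);

              // change the replication factor of an existing file
              fs.setReplication(new Path("/user/demo/data.txt"), (short) 2);
          }
      }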
  7. HDFS – Read
      [Diagram: data read flow]
      1. client sends a read request to the NameNode
      2. NameNode responds with the block locations
      3. client sends read requests to the DataNodes
      4. client reads the data blocks from the DataNodes
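      Seen from the client API, this whole read flow sits behind FileSystem.open; a minimal sketch (the path is a placeholder):

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FSDataInputStream;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.io.IOUtils;

      public class HdfsRead {
          public static void main(String[] args) throws Exception {
              FileSystem fs = FileSystem.get(new Configuration());
              // open() asks the NameNode for block locations;
              // the returned stream reads the blocks from the DataNodes
              FSDataInputStream in = fs.open(new Path("/user/demo/input.txt"));
              try {
                  IOUtils.copyBytes(in, System.out, 4096, false);
              } finally {
                  IOUtils.closeStream(in);
              }
          }
      }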
  8. HDFS – Write
      [Diagram: data write flow]
      1. client sends a write request to the NameNode
      2. NameNode responds with the target DataNodes
      3. client writes data to the first DataNode
      4. the DataNode forwards the data to the replica DataNodes
      5. write done
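      Seen from the client API, the write pipeline sits behind FileSystem.create; a minimal sketch (path and content are placeholders):

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FSDataOutputStream;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;

      public class HdfsWrite {
          public static void main(String[] args) throws Exception {
              FileSystem fs = FileSystem.get(new Configuration());
              // create() asks the NameNode for target DataNodes;
              // the stream then pipelines the data to them and their replicas
              FSDataOutputStream out = fs.create(new Path("/user/demo/output.txt"));
              try {
                  out.writeBytes("hello hdfs\n");
              } finally {
                  out.close();
              }
          }
      }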
  9. HDFS – Write (Failure)
      [Diagram: same write flow, but one DataNode in the replication pipeline fails while the data is being written]
  10. HDFS – Write (Failure)
      [Diagram: recovery after the failure; the partial block on the failed DataNode is deleted and the NameNode arranges a new replica on another DataNode]
  11. MapReduce
      Definition
      - map: (+1) [ 1, 2, 3, 4, ..., 10 ] -> [ 2, 3, 4, 5, ..., 11 ]
      - reduce: (+) [ 2, 3, 4, 5, ..., 11 ] -> 65
      Programming model for processing data sets in Hadoop
      - projection, filter -> map task
      - aggregation, join -> reduce task
      - sort -> partitioning
      Job Tracker & Task Trackers
      - master / slave
      - job = many tasks
      - # of map tasks = # of file splits (default: # of blocks)
      - # of reduce tasks = user configuration
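      The (+1)/(+) example can be checked in plain Java, independent of Hadoop: map applies (+1) to every element and reduce folds (+) over the results.

      import java.util.stream.IntStream;

      public class MapReduceIdea {
          public static void main(String[] args) {
              // map: apply (+1) to every element of [1..10] -> [2..11]
              int[] mapped = IntStream.rangeClosed(1, 10).map(x -> x + 1).toArray();

              // reduce: fold (+) over the mapped values -> 2 + 3 + ... + 11 = 65
              int reduced = IntStream.of(mapped).reduce(0, Integer::sum);

              System.out.println(reduced); // prints 65
          }
      }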
  12. MapReduce – Map / Reduce Task (slides 12-18 build up the same diagram step by step)
      [Diagram: input data records on the distributed file system are divided into splits; each split feeds a map task; map output records (key/value pairs) are assigned to partitions, then shuffled & sorted; each partition feeds a reduce task, whose output records (key/value pairs) are written back to the distributed file system]
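      A concrete map/reduce pair that fits this diagram is the classic word count: the map task emits one (word, 1) record per token, shuffling & sorting groups the records by word, and the reduce task sums the counts. A sketch using the org.apache.hadoop.mapreduce API (class names are ours):

      import java.io.IOException;
      import java.util.StringTokenizer;

      import org.apache.hadoop.io.IntWritable;
      import org.apache.hadoop.io.LongWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Mapper;
      import org.apache.hadoop.mapreduce.Reducer;

      public class WordCount {
          // map task: one output record (word, 1) per token of the input split
          public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
              private static final IntWritable ONE = new IntWritable(1);
              private final Text word = new Text();

              @Override
              protected void map(LongWritable key, Text value, Context context)
                      throws IOException, InterruptedException {
                  StringTokenizer tokens = new StringTokenizer(value.toString());
                  while (tokens.hasMoreTokens()) {
                      word.set(tokens.nextToken());
                      context.write(word, ONE);
                  }
              }
          }

          // reduce task: after shuffling & sorting, all counts for one word arrive together
          public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
              @Override
              protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                      throws IOException, InterruptedException {
                  int sum = 0;
                  for (IntWritable v : values) {
                      sum += v.get();
                  }
                  context.write(key, new IntWritable(sum));
              }
          }
      }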
  19. Mapper – Partitioning
      Double-indexed structure
      - output buffer: key/value records (default: 100 MB)
      - 1st index: (partition, key offset, value offset) per record
      - 2nd index: record offsets used for sorting
      Spill thread
      - data sorting: on the 2nd index (quick sort)
      - spill file generating: spill data file & index file
      Flush
      - merge sort (by key) per partition
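      The partition that decides which reducer a map output record goes to is, by default, just a hash of the key; Hadoop's HashPartitioner computes it essentially like the sketch below (the class name is ours):

      // default-style partitioning: the key's hash, masked to be non-negative,
      // modulo the number of reduce tasks gives the target partition
      public final class SimplePartitioner {
          public static int partition(Object key, int numReduceTasks) {
              return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
          }
      }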
  20. Reducer – Fetching
      GetMapEventsThread
      - map completion event listener
      MapOutputCopier
      - fetches data from completed mappers over HTTP (GET)
      - several copier threads run concurrently
      Merger
      - key sorting (heap sort)
      [Diagram: the JobTracker forwards map completion events to the TaskTracker running the reduce task; its copiers pull map output via HTTP GET from the TaskTrackers that ran the map tasks]
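      The merger's heap-based key sorting is a k-way merge of runs that are already sorted; a self-contained sketch of that idea (not the actual Hadoop Merger classes):

      import java.util.ArrayList;
      import java.util.List;
      import java.util.PriorityQueue;

      public class KWayMerge {
          // one entry per sorted run: the run itself plus a cursor into it
          private static final class Head {
              final List<String> run;
              int pos;
              Head(List<String> run) { this.run = run; }
              String value() { return run.get(pos); }
          }

          // merge several individually sorted runs into one sorted list
          // using a min-heap keyed on the current head of each run
          public static List<String> merge(List<List<String>> sortedRuns) {
              PriorityQueue<Head> heap =
                      new PriorityQueue<>((a, b) -> a.value().compareTo(b.value()));
              for (List<String> run : sortedRuns) {
                  if (!run.isEmpty()) heap.add(new Head(run));
              }

              List<String> merged = new ArrayList<>();
              while (!heap.isEmpty()) {
                  Head smallest = heap.poll();
                  merged.add(smallest.value());
                  smallest.pos++;
                  if (smallest.pos < smallest.run.size()) heap.add(smallest);
              }
              return merged;
          }
      }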
  21. Job Flow
      1. runJob (MapReduce program -> JobClient)
      2. copy job resources to the shared file system
      3. submit job (JobClient -> JobTracker)
      4. retrieve input splits
      5. add job to the job queue
      6. heartbeat (TaskTracker -> JobTracker)
      7. assign task
      8. retrieve job resources (TaskTracker)
      9. launch child JVM
      10. run map/reduce task
      11. read data / write result
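      Step 1 (runJob) is what the driver program triggers; a minimal driver sketch that submits the word-count classes sketched earlier (the paths are placeholders, and Job.getInstance is from the newer mapreduce API; under Hadoop 1.x the JobConf/JobClient pair plays the same role):

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.io.IntWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Job;
      import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
      import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

      public class WordCountDriver {
          public static void main(String[] args) throws Exception {
              Job job = Job.getInstance(new Configuration(), "word count");
              job.setJarByClass(WordCountDriver.class);
              job.setMapperClass(WordCount.TokenMapper.class);
              job.setReducerClass(WordCount.SumReducer.class);
              job.setOutputKeyClass(Text.class);
              job.setOutputValueClass(IntWritable.class);

              // input splits are computed from these paths (step 4 in the flow)
              FileInputFormat.addInputPath(job, new Path("/user/demo/input"));
              FileOutputFormat.setOutputPath(job, new Path("/user/demo/output"));

              // submits the job and polls until completion (steps 3-11)
              System.exit(job.waitForCompletion(true) ? 0 : 1);
          }
      }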
  22. Monitoring
      Heartbeat
      - task tracker status checking
      - task request / assignment
      - other commands (restart, shutdown, kill task, ...)
      Cluster status
      Job / task status
      - JobInProgress
      - TaskInProgress
      Reporter & metrics
      Black list
  23. Monitoring (Summary)
      (same items as slide 22)
  24. Monitoring (Cluster Info)
  25. Monitoring (Job Info)
  26. Monitoring (Task Info)
  27. Task Scheduler
      Job queue
      - red-black tree (java.util.TreeMap)
      - sorted by priority & job id (request time)
      Load factor
      - remaining tasks / capacity
      Task assignment order
      - high priority first
      - new task > speculative execution task > dummy splits task
      - map task (local) > map task (non-local) > reduce task
      Padding
      - padding = MIN(total tasks * pad fraction, task capacity)
      - reserved for speculative execution
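      Both the padding formula and the queue ordering can be written down directly; a sketch under those rules (class and field names are ours, not Hadoop's, and the priority convention below is an assumption):

      import java.util.Comparator;
      import java.util.TreeMap;

      public class SchedulerSketch {
          // padding reserved for speculative execution:
          // padding = min(total tasks * pad fraction, task capacity)
          public static int padding(int totalTasks, double padFraction, int taskCapacity) {
              return (int) Math.min(totalTasks * padFraction, taskCapacity);
          }

          // the job queue is a red-black tree (java.util.TreeMap) ordered by
          // priority first, then by job id (i.e. submission time)
          public static TreeMap<JobKey, String> newJobQueue() {
              Comparator<JobKey> order = Comparator
                      .comparingInt((JobKey k) -> k.priority)   // smaller value = more urgent (assumption)
                      .thenComparingLong(k -> k.jobId);          // earlier jobs sort first
              return new TreeMap<>(order);
          }

          public static final class JobKey {
              final int priority;
              final long jobId;
              public JobKey(int priority, long jobId) {
                  this.priority = priority;
                  this.jobId = jobId;
              }
          }
      }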
  28. Error Handling
      Retry
      - configurable (default: 4 times)
      Timeout
      - configurable
      Speculative execution (a task becomes a candidate when both conditions hold)
      - current time – start time >= 1 minute
      - average progress – task progress > 20%
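      The two thresholds combine into one predicate; a small sketch with the slide's numbers (1 minute, 20 percentage points) hard-coded (names are ours):

      public final class SpeculationCheck {
          private static final long MIN_RUNTIME_MS = 60_000L;   // task must have run >= 1 minute
          private static final double PROGRESS_GAP = 0.20;      // and lag the average by > 20%

          // progress values are fractions in [0, 1]
          public static boolean shouldSpeculate(long nowMs, long startMs,
                                                double taskProgress, double averageProgress) {
              return (nowMs - startMs >= MIN_RUNTIME_MS)
                      && (averageProgress - taskProgress > PROGRESS_GAP);
          }
      }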
  29. Distributed Processing System (recap)
      (same outline as slide 3: how to process data, monitoring, fault tolerance; the next three slides map Hadoop's mechanisms onto it)
  30. Distributed Processing System – How to process data
      - read/write data: HDFS client
      - control nodes: master / slave
      - load balancing: replication / rack awareness, job scheduler
  31. Distributed Processing System – Monitoring
      - node status: heartbeat
      - task status: job/task status
      - reporter / metrics
  32. Distributed Processing System – Fault tolerance
      - black list
      - timeout & retry
      - speculative execution
  33. Limitations
      map -> reduce network overhead
      - iterative processing
      - full (or theta) join
      - data with small size but many splits
      Low latency (optimized for throughput instead)
      - polling & pulling
      - job initializing
      - job scheduling
      - data access
  34. Q&A
