
[214] Flexible and Extensible Big Data Processing


  1. 1. Onyx: A Flexible and Extensible Data Processing System 전병곤, 김주연, 송원욱 Software Platform Lab Joint work with 양영석, 이산하, 서장호, 어정윤, 이계원, 엄태건, 이우연, 이윤성, 정주성, 하현민, 정은지, 김수정, 유경인, 신동진 1
  2. 2. Data Processing from 10,000 Feet 2 Data Processing Application Data Processing Framework Resource Environment Spark, Flink, Hadoop MR, Dryad, Tez, ...
  3. 3. Data Processing from 10,000 Feet 3 Data Processing Application Data Processing Framework Resource Environment Spark, Flink, Hadoop MR, Dryad, Tez, ... Existing frameworks perform poorly in new resource environments (e.g., disaggregation, transient resources)
  4. 4. Disaggregation 4 Compute Storage (Ref. OpenCompute) Intermediate data generated from compute nodes should be written to and read from storage nodes.
  5. 5. Transient Resources 5 Preemption! Task preemption can cause expensive recomputation.
  6. 6. Cross Datacenter 6 Wide-area network bandwidth is scarce and expensive
  7. 7. Data Processing from 10,000 Feet 7 Data Processing Application Data Processing Framework Resource Environment Spark, Flink, Hadoop MR, Dryad, Tez, ... It is hard to add new application optimization features to existing frameworks.
  8. 8. Dynamic Optimization Dynamic skew handling Optimizing job execution based on its characteristics Adapting execution to resource elasticity 8
  9. 9. Key Observation Current data processing frameworks are not flexible and extensible. 9 => A new flexible and extensible data processing system
  10. 10. Onyx Architecture 10 Dataflow Program → Onyx Compiler → Onyx Runtime → Cluster
  11. 11. Onyx Compiler 11 Beam Program / Spark Program → Beam Frontend / Spark Frontend → IR DAG → Onyx Backend → Physical Execution Plan
  12. 12. IR (Intermediate Representation) DAG: Program-agnostic DAG with Annotations 12 Vertex Labels: Type (Operator/Loop), Placement (GPUNode/ReservedNode/TransientNode/Any), Parallelism. Edge Labels: Type (1:1/Broadcast/Shuffle), Mode (Push/Pull), Storage (Memory/Disk/RemoteDisk)
  13. 13. MapReduce Example 13 Shuffle,Pull,Disk Classical MapReduce Small-scale MapReduce Shuffle,Push,Memory
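The two MapReduce variants above differ only in their edge annotations. As a rough illustration, the following Python sketch models an annotated IR DAG with hypothetical `IRVertex`/`IREdge` classes and property enums (these names are illustrative, not the actual Onyx API) and builds both configurations by changing nothing but the edge labels.

```python
# Minimal sketch of an annotated IR DAG (hypothetical names, not the real Onyx API).
from dataclasses import dataclass, field
from enum import Enum

class CommPattern(Enum):      # edge Type
    ONE_TO_ONE = 1
    BROADCAST = 2
    SHUFFLE = 3

class DataFlowModel(Enum):    # edge Mode
    PUSH = 1
    PULL = 2

class DataStore(Enum):        # edge Storage
    MEMORY = 1
    LOCAL_DISK = 2
    REMOTE_DISK = 3

@dataclass
class IRVertex:
    name: str
    properties: dict = field(default_factory=dict)  # e.g. Placement, Parallelism

@dataclass
class IREdge:
    src: IRVertex
    dst: IRVertex
    properties: dict = field(default_factory=dict)  # e.g. CommPattern, DataFlowModel, DataStore

def mapreduce_dag(flow: DataFlowModel, store: DataStore):
    """Build a two-vertex MapReduce IR DAG whose behaviour is decided only by edge annotations."""
    m = IRVertex("Map", {"Placement": "Compute", "Parallelism": 100})
    r = IRVertex("Reduce", {"Placement": "Compute", "Parallelism": 50})
    e = IREdge(m, r, {"CommPattern": CommPattern.SHUFFLE,
                      "DataFlowModel": flow, "DataStore": store})
    return [m, r], [e]

# Classical MapReduce: shuffle data is written to disk and pulled by reducers.
classical = mapreduce_dag(DataFlowModel.PULL, DataStore.LOCAL_DISK)
# Small-scale MapReduce: shuffle data is kept in memory and pushed to reducers.
small_scale = mapreduce_dag(DataFlowModel.PUSH, DataStore.MEMORY)
print(classical[1][0].properties)
print(small_scale[1][0].properties)
```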
  14. 14. Compiler Passes Transform an IR DAG into an optimized IR DAG after a series of “passes” Compile-Time Annotation Pass examples: ● Parallelism Pass ● Executor Placement Pass ● Data Flow Model Pass ● Stage Partitioning Pass 14 ● Transient Resource EP Pass ● Transient Resource DFM Pass ● Resource Disaggregation EP Pass ● Resource Disaggregation DFM Pass Variations
  15. 15. Transform an IR DAG into an optimized IR DAG after a series of “passes” Compile-Time Annotation Pass examples: ● Parallelism Pass ● Executor Placement Pass ● Data Flow Model Pass ● Stage Partitioning Pass ● Transient Resource EP Pass ● Transient Resource DFM Pass ● Resource Disaggregation EP Pass ● Resource Disaggregation DFM Pass Compiler Passes 15 Common Specialized Specialized Variations
  16. 16. Compiler Passes Transform an IR DAG into an optimized IR DAG after a series of “passes” Compile-Time Reshaping Pass examples: ● Loop Extraction Pass ● Loop Fusion Pass (Loop Optimization) ● Common Subexpression Elimination Pass ● Data Skew Reshaping Pass Runtime Pass example: ● Data Skew Runtime Pass 16
  17. 17. Compiler Passes Transform an IR DAG into an optimized IR DAG after a series of “passes” Compile-Time Reshaping Pass examples: ● Loop Extraction Pass ● Loop Fusion Pass (Loop Optimization) ● Common Subexpression Elimination Pass ● Data Skew Reshaping Pass Runtime Pass example: ● Data Skew Runtime Pass 17 Specialized
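Since every compiler pass maps an IR DAG to an IR DAG, an optimization policy is just an ordered list of passes. The sketch below illustrates this with a hypothetical `Pass` type, a simplified dict-based DAG, and two toy passes (a common parallelism pass and a specialized transient-resource executor placement pass); it mirrors the structure described on the slides rather than the real pass implementations.

```python
# Sketch of composable compiler passes over a simplified IR DAG (illustrative names only).
from typing import Callable, Dict, List, Tuple

Vertex = Dict[str, object]                      # property map, e.g. {"name": "Map", ...}
Edge = Tuple[str, str, Dict[str, object]]       # (src, dst, property map)
IRDag = Tuple[List[Vertex], List[Edge]]
Pass = Callable[[IRDag], IRDag]                 # every pass maps an IR DAG to an IR DAG

def parallelism_pass(dag: IRDag) -> IRDag:
    """Common annotation pass: give every vertex a default parallelism."""
    vertices, edges = dag
    for v in vertices:
        v.setdefault("Parallelism", 50)
    return vertices, edges

def transient_resource_ep_pass(dag: IRDag) -> IRDag:
    """Specialized annotation pass: place vertices fed by a shuffle on reserved nodes,
    everything else on cheap transient nodes."""
    vertices, edges = dag
    shuffled_into = {dst for (_, dst, p) in edges if p.get("CommPattern") == "Shuffle"}
    for v in vertices:
        v["Placement"] = "Reserved" if v["name"] in shuffled_into else "Transient"
    return vertices, edges

def run_passes(dag: IRDag, passes: List[Pass]) -> IRDag:
    for p in passes:            # an optimization policy is an ordered list of passes
        dag = p(dag)
    return dag

dag: IRDag = ([{"name": "Map"}, {"name": "Reduce"}],
              [("Map", "Reduce", {"CommPattern": "Shuffle"})])
optimized = run_passes(dag, [parallelism_pass, transient_resource_ep_pass])
print(optimized[0])   # annotated vertices
```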
  18. 18. Compiler to Runtime 18 Optimized IR DAG: Map Stage (Type: "Map" Operator, Placement: "Compute" Node, Parallelism: 100) → Shuffle,Pull,Disk → Reduce Stage (Type: "Reduce" Operator, Placement: "Compute" Node, Parallelism: 50)
  19. 19. Compiler to Runtime 19 Physical DAG: PhysicalStage of "Map" Tasks x 100 → PhysicalStage of "Reduce" Tasks x 50, with I/O channels for intermediate data flow between tasks
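Converting the optimized IR DAG into the physical DAG expands each stage into `Parallelism`-many tasks and materializes the I/O channels between them. A minimal sketch with a hypothetical `expand_to_physical` helper, using the 100-way map / 50-way reduce plan from the slide:

```python
# Sketch: expand a logical stage plan into physical tasks plus shuffle channels (illustrative only).
from itertools import product

def expand_to_physical(stages, edges):
    """stages: {stage_name: parallelism}; edges: [(src_stage, dst_stage, comm_pattern)]."""
    tasks = {s: [f"{s}-task-{i}" for i in range(p)] for s, p in stages.items()}
    channels = []
    for src, dst, pattern in edges:
        if pattern == "Shuffle":
            # every source task may send a partition to every destination task
            channels += list(product(tasks[src], tasks[dst]))
        elif pattern == "OneToOne":
            channels += list(zip(tasks[src], tasks[dst]))
    return tasks, channels

tasks, channels = expand_to_physical({"Map": 100, "Reduce": 50},
                                     [("Map", "Reduce", "Shuffle")])
print(len(tasks["Map"]), len(tasks["Reduce"]), len(channels))   # 100 50 5000
```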
  20. 20. Distributed Execution in Onyx Runtime Stage 20 Executor Executor Executor Executor Master
  21. 21. Distributed Execution in Onyx Runtime Master Stage 21 Executor Executor Executor Executor TaskGroup(Tasks)
  22. 22. Distributed Execution in Onyx Runtime Master Stage 22 Executor Executor Executor Executor
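The runtime pictures above reduce to a master dispatching each stage's task groups to executors and waiting for them to finish before scheduling the next stage. The toy sketch below (hypothetical `Master`/`Executor` classes, synchronous round-robin dispatch) shows only this control flow; the real runtime schedules asynchronously over the network.

```python
# Toy sketch of stage-by-stage scheduling of task groups onto executors (hypothetical names).
from collections import defaultdict

class Executor:
    def __init__(self, name):
        self.name = name
    def run(self, task_group):
        return f"{task_group} done on {self.name}"   # stand-in for real task execution

class Master:
    def __init__(self, executors):
        self.executors = executors
    def run_stage(self, stage_name, num_task_groups):
        """Dispatch the stage's task groups round-robin and collect their results."""
        results = defaultdict(list)
        for i in range(num_task_groups):
            executor = self.executors[i % len(self.executors)]
            results[executor.name].append(executor.run(f"{stage_name}-taskgroup-{i}"))
        return results   # a stage finishes only when every task group has completed

master = Master([Executor("executor-1"), Executor("executor-2")])
for stage, parallelism in [("Map", 4), ("Reduce", 2)]:   # stages run one after another
    print(stage, dict(master.run_stage(stage, parallelism)))
```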
  23. 23. Onyx In Action 23
  24. 24. Onyx in Action ● Onyx compiler and runtime components ● Onyx job execution: MR, ALS ● Onyx runtime optimization: dynamic skew handling ● Harnessing transient resources with Onyx Omitted other optimizations due to time constraints! 24
  25. 25. Key Components (Compiler) 25
  26. 26. Key Components (Runtime) 26
  27. 27. Key Components (Runtime) 27
  28. 28. Job Execution Demo 28
  29. 29. MapReduce ● We will show two executions of MapReduce using different settings: ○ Intermediate data is saved on disk and pulled by the reducers ○ Intermediate data is saved in memory and pushed to the reducers ● To vary the settings, we go through the following passes: ○ A data store pass ○ A data flow model pass ○ All of these are "Annotation" passes 29
  30. 30. Demo Map Data in Disk, Pulled 30 Type: “Map” Operator Placement: “Compute” Node Shuffle,Pull,Disk Type: “Reduce” Operator Placement: “Compute” Node Reduce Stage Map Stage
  31. 31. Demo Map Data in Memory, Pushed 31 Type: “Map” Operator Placement: “Compute” Node Shuffle,Push,Memory Type: “Reduce” Operator Placement: “Compute” Node Reduce Stage Map Stage
  32. 32. Alternating Least Squares Example ● Alternating Least Squares (ALS) is an ML algorithm commonly used in recommendation systems. ● Most ML algorithms are iterative processes => ALS is one of them! ● But how is this expressed in terms of a DAG? (Acyclic!) 32
  33. 33. Alternating Least Squares Example Naively… 33 (Read input data) . . . . . . . . . . . . (Write output). . . . . . . Iteration 1 Iteration 2 Iteration N But what if we want to decide this “N” according to some condition? (ex. model convergence in ML) A set of operators that executes the ALS algorithm
  34. 34. Alternating Least Squares Example Something special we have for the ALS example: Loops! 34 (Read input data) . . . . . . . . . . . . (Write output) LoopVertex with termination condition (Read input data) . . . . . . . . . (Write output). . . . . . Iteration 1 Iteration NIteration 2
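A LoopVertex keeps the DAG acyclic by wrapping the loop body and unrolling one iteration at a time until a termination condition, such as model convergence, is met. A minimal sketch under that assumption (hypothetical `LoopVertex` class, with a dummy update in place of the real ALS body):

```python
# Sketch of a LoopVertex: unroll an acyclic iteration body until a termination condition holds.
from typing import Callable, TypeVar

T = TypeVar("T")

class LoopVertex:
    def __init__(self, body: Callable[[T], T], terminate: Callable[[T, int], bool],
                 max_iterations: int):
        self.body = body                  # one acyclic iteration of, e.g., the ALS update
        self.terminate = terminate        # e.g., "the model has converged"
        self.max_iterations = max_iterations
    def unroll(self, state: T) -> T:
        for i in range(self.max_iterations):
            state = self.body(state)
            if self.terminate(state, i + 1):   # decide "N" dynamically instead of fixing it
                break
        return state

# Dummy ALS-like body: halve the "error" each iteration, stop once it is small enough.
loop = LoopVertex(body=lambda err: err / 2.0,
                  terminate=lambda err, i: err < 1e-3,
                  max_iterations=100)
print(loop.unroll(1.0))   # converges after ~10 unrolled iterations
```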
  35. 35. Demo ALS 35
  36. 36. Dynamic Data Partitioning Example ● What happens if there is a data skew while executing a job? ● How do we detect such a data skew and partition data appropriately? 36 Onyx Compiler Onyx Runtime AnnotationPass(es) and ReshapingPass(es) IR DAG
  37. 37. Dynamic Data Partitioning Example ● What happens if there is a data skew while executing a job? ● How do we detect such a data skew and partition data appropriately? 37 Onyx Compiler Onyx Runtime Physical DAG Conversion Shuffle,Pull,Disk StageStage Optimized IR DAG
  38. 38. Dynamic Data Partitioning Example 38 Onyx Compiler Onyx Runtime PhysicalStage PhysicalStage Physical DAG Physical DAG Conversion ● What happens if there is a data skew while executing a job? ● How do we detect such a data skew and partition data appropriately?
  39. 39. Dynamic Data Partitioning Example 39 Onyx Compiler Onyx Runtime Execute! PhysicalStage PhysicalStage Physical DAG ● What happens if there is a data skew while executing a job? ● How do we detect such a data skew and partition data appropriately?
  40. 40. Dynamic Data Partitioning Example 40 Onyx Compiler Onyx Runtime Data Size Metric Physical DAG Executing... ● What happens if there is a data skew while executing a job? ● How do we detect such a data skew and partition data appropriately?
  41. 41. Dynamic Data Partitioning Example 41 Onyx Compiler Onyx Runtime New DAG RuntimePass(es) ● What happens if there is a data skew while executing a job? ● How do we detect such a data skew and partition data appropriately?
  42. 42. Dynamic Data Partitioning Example 42 Onyx Compiler Onyx Runtime Execute! New DAG ● What happens if there is a data skew while executing a job? ● How do we detect such a data skew and partition data appropriately?
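The data-size metric collected at runtime lets a runtime pass re-plan the reduce-side partitioning before the skewed data is consumed. The sketch below uses a hypothetical `rebalance_partitions` helper that greedily groups hash buckets so each reducer receives a roughly even number of bytes; the actual runtime pass may use a different balancing strategy.

```python
# Sketch of a skew-handling runtime pass: turn per-bucket size metrics into balanced reducer ranges.
def rebalance_partitions(bucket_sizes, num_reducers):
    """Greedily assign consecutive hash buckets to reducers so byte counts stay close to even."""
    total = sum(bucket_sizes)
    assignment, current, acc, used = [], [], 0, 0
    for bucket, size in enumerate(bucket_sizes):
        current.append(bucket)
        acc += size
        remaining_reducers = num_reducers - len(assignment)
        # close this reducer's range once it holds its fair share of the remaining bytes
        if len(assignment) < num_reducers - 1 and acc >= (total - used) / remaining_reducers:
            assignment.append(current)
            used += acc
            current, acc = [], 0
    assignment.append(current)            # the last reducer takes the remaining buckets
    return assignment

# Skewed metric: bucket 0 is far larger than the rest.
sizes = [900, 50, 50, 50, 50, 50, 50, 50]
print(rebalance_partitions(sizes, 4))
# [[0], [1, 2, 3], [4, 5], [6, 7]] -> the heavy bucket gets a reducer of its own
```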
  43. 43. Demo Dynamic Data Partitioning 43
  44. 44. Harnessing Transient Resources with Onyx 44
  45. 45. Harnessing Transient Resources with Onyx 45 Using the techniques introduced in "Pado: A Data Processing Engine for Harnessing Transient Resources in Datacenters" (EuroSys 2017)
  46. 46. Batch Engine 46 MapReduce Flume Spark ... Transient Resources ?
  47. 47. 47 Transient Resources Resources borrowed from over-provisioned latency-critical jobs (search service, online mall, etc.)
  48. 48. Data Analytics with Transient Resources 48 .... Dataflow Program Transient
  49. 49. Data Analytics with Transient Resources 49 .... Dataflow Program Execute! Transient Tasks Tasks Tasks Tasks Tasks Tasks Tasks Tasks Tasks Tasks Tasks Tasks
  50. 50. Data Analytics with Transient Resources 50 .... Dataflow Program Execute! Transient Tasks Tasks Tasks Tasks Tasks Tasks Tasks Tasks Tasks Tasks Tasks Tasks
  51. 51. Data Analytics with Transient Resources 51 .... Dataflow Program Execute! Transient Data Data Data
  52. 52. Solution 52 .... Dataflow Program Transient
  53. 53. Solution 53 .... Dataflow Program Transient Analyze
  54. 54. Solution 54 .... Dataflow Program Other Computations Valuable Computations Reserved Transient Analyze
  55. Our definition of Valuable computations ● Not so valuable: One-to-One, One-to-Many ● Valuable: Many-to-One, Many-to-Many
  56. Our definition of Valuable computations (dependency patterns illustrated) ● Not so valuable: One-to-One, One-to-Many ● Valuable: Many-to-One, Many-to-Many
  57. 57. Map-Reduce with Transient Containers (Case #1) Batch Engines (e.g., Spark) (Case #2) Our Approach 57 Many-to-Many Map Reduce
  58. 58. Batch Engines (e.g., Spark) 2 Transient, 1 Reserved Containers 58 Our Approach ReservedTransient
  59. 59. Batch Engines (e.g., Spark) Map, Reduce tasks on each container 59 ReservedTransient Our Approach Map1 Map2 Map3 Reduce1 Reduce2 Reduce3
  60. 60. 60 No dependency Many-to-Many Map Reduce Many-to-Many Map Reduce
  61. 61. 61 No dependency ⇒ Not so valuable ⇒ Transient Many-to-Many ⇒ Valuable ⇒ Reserved Map Reduce Many-to-Many Map Reduce
  62. 62. Batch Engines (e.g., Spark) Map tasks on Transient and Reduce task on Reserved 62 Our Approach Map1 Map2 Map3 Reduce1 Reduce2 Reduce3 Reduce1 Map1 Map2 ReservedTransient
  63. 63. Batch Engines (e.g., Spark) 63 Our Approach Map1 Map2 Map3 Maintain Map Outputs on Local Disks ReservedTransient
  64. 64. Batch Engines (e.g., Spark) 64 Our Approach Map1 Map2 Map3 Map1 Map2 Push Map Outputs to Destination Reserved Containers ReservedTransient
  65. 65. Batch Engines (e.g., Spark) 65 Our Approach Reduce1 Reduce2 Reduce3 Pull Map Outputs Map1 Map2 ReservedTransient
  66. 66. Batch Engines (e.g., Spark) 66 Our Approach Reduce1 Reduce2 Reduce3 ReservedTransient Reduce1 Read Input Data from Local Reserved Containers
  67. 67. Batch Engines (e.g., Spark) 67 Our Approach Reduce1 Reduce2 Reduce3 Eviction of Transient Containers → Map Outputs Destroyed ReservedTransient Reduce1
  68. 68. Batch Engines (e.g., Spark) 68 Our Approach Reduce1 Reduce2 Reduce3 ReservedTransient Reduce1 Eviction of Transient Containers → Map Outputs Not Destroyed
  69. 69. Batch Engines (e.g., Spark) 69 Our Approach Reduce1 Reduce2 Reduce3 Map1 Map2 Map3 Cascading Recomputation of 5 Tasks ReservedTransient Reduce1 No Recomputation
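The contrast above can be made concrete: on eviction, a pull-based engine must transitively recompute every finished task whose output lived only on the evicted container, while outputs already pushed to reserved containers survive. A small sketch under those assumptions (hypothetical task and container names, 3 maps feeding 3 reducers over a many-to-many shuffle):

```python
# Sketch: which finished tasks must be recomputed after a container eviction?
def recompute_set(deps, output_location, running, evicted):
    """deps: task -> upstream tasks; output_location: finished task -> container holding its output;
    running: tasks still executing; evicted: the container that was preempted."""
    lost = {t for t, c in output_location.items() if c == evicted}
    needed = set()
    def require(task):                     # walk the lineage of data the running tasks still need
        if task in needed:
            return
        if task in output_location and task not in lost:
            return                         # this output survived on another container
        needed.add(task)
        for parent in deps.get(task, []):
            require(parent)
    for task in running:
        for parent in deps.get(task, []):
            require(parent)
    return needed

deps = {f"Reduce{i}": ["Map1", "Map2", "Map3"] for i in (1, 2, 3)}   # many-to-many shuffle
running = {"Reduce1", "Reduce2", "Reduce3"}

# Pull-based engine: map outputs stayed on the transient container that produced them.
pull = {"Map1": "transient-A", "Map2": "transient-A", "Map3": "reserved-1"}
print(recompute_set(deps, pull, running, evicted="transient-A"))   # Map1 and Map2 must be redone

# Push-based approach: map outputs were already pushed to a reserved container.
push = {"Map1": "reserved-1", "Map2": "reserved-1", "Map3": "reserved-1"}
print(recompute_set(deps, push, running, evicted="transient-A"))   # set() -> no recomputation
```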
  70. 70. Step 1: Transient/Reserved Executor Placement Pass 70
  71. 71. Operator Placement Example with the Transient Resource Policy Multinomial Logistic Regression (MLR): a machine learning application for classifying inputs, e.g., tumors as malignant or benign, or ad clicks as profitable or not. Gradients are used to update the regression model, which is then used for prediction. 71
  72. 72. Executor Placement Example Create 1st Model Compute Gradient Aggr Gradient Compute 2nd Model Read Training Data .... 72 One-to-One One-to-Many Many-to-One Costly!
  73. 73. Create 1st Model Compute Gradient Aggr Gradient Compute 2nd Model Read Training Data .... Reserved TransientNo Dependency No Dependency 73 Many-to-One Costly! One-to-One One-to-Many Executor Placement Example
  74. 74. Create 1st Model Compute Gradient Aggr Gradient Compute 2nd Model Read Training Data .... Reserved Transient 74 Many-to-One Costly! No Costly Dependency with Parents One-to-One One-to-Many Executor Placement Example
  75. 75. Compute Gradient Aggr Gradient Compute 2nd Model Read Training Data .... Reserved TransientCostly Dependency with Parent 75 Many-to-One Costly! One-to-One One-to-Many Costly Dependency with Parent, Pipelined Executor Placement Example Create 1st Model
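Step 1 can be read as a rule over incoming dependency types: vertices fed by costly many-to-one or many-to-many edges (or pipelined behind such vertices over a one-to-one edge) are protected on reserved resources, while vertices with no dependency or only cheap parents run on transient resources. The sketch below encodes that simplified rule on an MLR-like DAG; names and edge types are illustrative, and the real pass may weigh additional factors.

```python
# Sketch of a Transient/Reserved executor placement pass over an MLR-like IR DAG.
COSTLY = {"ManyToOne", "ManyToMany"}       # expensive to regenerate if an input is lost

def executor_placement_pass(vertices, edges):
    """vertices in topological order; edges: (src, dst, dependency_type)."""
    placement = {}
    for v in vertices:
        incoming = [(src, dep) for (src, dst, dep) in edges if dst == v]
        costly_parent = any(dep in COSTLY for (_, dep) in incoming)
        # pipelined with a reserved parent over a cheap one-to-one edge -> keep it reserved too
        pipelined = any(dep == "OneToOne" and placement.get(src) == "Reserved"
                        for (src, dep) in incoming)
        placement[v] = "Reserved" if (costly_parent or pipelined) else "Transient"
    return placement

# MLR-like DAG from the slides: gradients are aggregated many-to-one into the next model.
vertices = ["ReadTrainingData", "Create1stModel", "ComputeGradient",
            "AggrGradient", "Compute2ndModel"]
edges = [("ReadTrainingData", "ComputeGradient", "OneToOne"),
         ("Create1stModel", "ComputeGradient", "OneToMany"),     # broadcast the model
         ("ComputeGradient", "AggrGradient", "ManyToOne"),       # costly aggregation
         ("AggrGradient", "Compute2ndModel", "OneToOne")]

print(executor_placement_pass(vertices, edges))
# ReadTrainingData, Create1stModel, ComputeGradient -> Transient; AggrGradient, Compute2ndModel -> Reserved
```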
  76. 76. Step 2: Data Flow Model Pass 76
  77. 77. Compute Gradient Aggr Gradient Compute 2nd Model Read Training Data .... Reserved Transient 77 Recall.. Safe! Prone to evictions :( Create 1st Model
  78. 78. Compute Gradient Aggr Gradient Compute 2nd Model Read Training Data .... Reserved Transient 78 Must evacuate data out of transient executors ASAP Create 1st Model
  79. 79. Compute Gradient Aggr Gradient Compute 2nd Model Read Training Data .... Reserved Transient 79 Push data out as soon as it is ready! Push Push Push Create 1st Model Push
  80. 80. Compute Gradient Aggr Gradient Compute 2nd Model Read Training Data .... Reserved Transient 80 No need to hurry for data in Reserved containers Pull Pull Push Push Push Create 1st Model Push
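Step 2 follows directly from the placements: data produced on transient executors is pushed to its consumers as soon as it is ready, while data already on reserved executors can be pulled lazily. A minimal sketch of that annotation rule, reusing placements like those from Step 1 (names are illustrative):

```python
# Sketch of a data flow model pass: push data off transient executors, pull elsewhere.
def data_flow_model_pass(edges, placement):
    """edges: (src, dst) pairs; placement: vertex -> 'Transient' | 'Reserved'."""
    return {(src, dst): ("Push" if placement[src] == "Transient" else "Pull")
            for (src, dst) in edges}

placement = {"ReadTrainingData": "Transient", "Create1stModel": "Transient",
             "ComputeGradient": "Transient", "AggrGradient": "Reserved",
             "Compute2ndModel": "Reserved"}
edges = [("ReadTrainingData", "ComputeGradient"), ("Create1stModel", "ComputeGradient"),
         ("ComputeGradient", "AggrGradient"), ("AggrGradient", "Compute2ndModel")]

for edge, mode in data_flow_model_pass(edges, placement).items():
    print(edge, "->", mode)
# Edges leaving transient vertices become Push; the reserved-to-reserved edge stays Pull.
```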
  81. 81. Step 3: Stage Partitioning Pass 81
  82. 82. Stage Partitioning in Compiler 82 Execute subgraph-by-subgraph ⇒ Partition into subgraphs ⇒ Good abstraction for handling evictions/faults
  83. 83. Compute Gradient Aggr Gradient Compute 2nd Model Read Training Data .... Reserved Transient 83 Stage Partitioning Example Create 1st Model
  84. 84. Compute Gradient Aggr Gradient Compute 2nd Model Read Training Data .... Stage 1 Reserved Transient 84 Stage Partitioning Example Create 1st Model
  85. 85. Compute Gradient Aggr Gradient Compute 2nd Model Read Training Data .... Stage 1 Stage 2 Reserved Transient 85 Stage Partitioning Example Create 1st Model
  86. 86. Compute Gradient Aggr Gradient Compute 2nd Model Read Training Data .... Stage 1 Stage 2 Reserved Transient 86 Stage Partitioning Example Stage 3 Create 1st Model
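Step 3 groups vertices into stages that are scheduled and recovered as a unit. One simple partitioning rule, sketched below with illustrative names, fuses vertices connected by one-to-one edges whose endpoints share a placement and cuts a stage boundary everywhere else (at shuffles or at transient-to-reserved edges); the exact rule used by Onyx may differ.

```python
# Sketch of a stage partitioning pass: fuse cheap same-placement edges, cut everything else.
def stage_partitioning_pass(vertices, edges, placement):
    """edges: (src, dst, dependency_type). Return vertex -> stage id."""
    parent = {v: v for v in vertices}
    def find(v):                       # union-find over fusable edges
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v
    for src, dst, dep in edges:
        # fuse only cheap, pipelinable edges between executors of the same kind
        if dep == "OneToOne" and placement[src] == placement[dst]:
            parent[find(src)] = find(dst)
    roots = {}
    return {v: roots.setdefault(find(v), len(roots) + 1) for v in vertices}

vertices = ["ReadTrainingData", "Create1stModel", "ComputeGradient",
            "AggrGradient", "Compute2ndModel"]
edges = [("ReadTrainingData", "ComputeGradient", "OneToOne"),
         ("Create1stModel", "ComputeGradient", "OneToMany"),
         ("ComputeGradient", "AggrGradient", "ManyToOne"),
         ("AggrGradient", "Compute2ndModel", "OneToOne")]
placement = {"ReadTrainingData": "Transient", "Create1stModel": "Transient",
             "ComputeGradient": "Transient", "AggrGradient": "Reserved",
             "Compute2ndModel": "Reserved"}

print(stage_partitioning_pass(vertices, edges, placement))
# ReadTrainingData+ComputeGradient form one stage; Create1stModel its own; AggrGradient+Compute2ndModel another
```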
  87. 87. Demo Executor Placement Pass DataFlowModel Pass Stage Partitioning Pass with MLR example 87
  88. 88. Batch Engines 88 Spark 2.0.0 Onyx with suggested optimizations VS
  89. 89. Containers ● Amazon EC2 instances (with local SSDs) as containers ● 40 Transient Containers, 5 Reserved Containers ● All containers used for computation 89
  90. 90. Workloads ● Alternating Least Squares: Yahoo! Music User Ratings of Songs with Artist, Album, and Genre Meta Information, v. 1.0. https://webscope.sandbox.yahoo.com/catalog.php?datatype=r ● Multinomial Logistic Regression: Synthetic ● Map-Reduce: Page view statistics for Wikimedia projects. https://dumps.wikimedia.org/other/pagecounts-raw 90
  91. 91. Job Completion Time (Lower is Better) 91 4.13x 3.52x 5.15x
  92. 92. Summary ● Introduces a new data processing system that is flexible and extensible ○ Compiler that represents various execution policies ○ Runtime that is modular and reconfigurable ● Adapts data processing seamlessly to new deployment and application requirements 92
  93. 93. 93 We are working on creating an Apache incubator project. We look forward to contributions from many developers! We are hiring software developers! Contact: onyx@spl.snu.ac.kr Software platform lab site: http://spl.snu.ac.kr
  94. 94. Onyx: A Flexible and Extensible Data Processing System 전병곤, 김주연, 송원욱 Software Platform Lab Joint work with 양영석, 이산하, 서장호, 어정윤, 이계원, 엄태건, 이우연, 이윤성, 정주성, 하현민, 정은지, 김수정, 유경인, 신동진 94
