Dryad Paper Review and System Analysis

About Dryad Paper and Dryad System.

Published in: Technology

  1. 1. Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks EuroSys'07 by Microsoft Research https://twitter.com/nourlcn
  2. 2. What's Dryad? <ul><li>Definition 1 (from the Dryad paper) </li></ul><ul><ul><li>Dryad is a general-purpose distributed execution engine for coarse-grain data-parallel applications. </li></ul></ul><ul><li>Definition 2 (from the Microsoft website) </li></ul><ul><ul><li>The Dryad Project is investigating programming models for writing parallel and distributed programs to scale from a small cluster to a large data-center. </li></ul></ul>
  3. 3. Outline <ul><li>Motivation </li></ul><ul><li>Dryad System </li></ul><ul><li>Example (SQL Query) </li></ul><ul><li>Evaluation </li></ul><ul><li>Contribution </li></ul><ul><li>Thought </li></ul>
  4. 4. Motivation <ul><li>Inspired by: </li></ul><ul><ul><li>Shader languages for GPUs </li></ul></ul><ul><ul><li>MapReduce </li></ul></ul><ul><ul><li>Parallel databases </li></ul></ul><ul><li>Motivation </li></ul><ul><ul><li>How can we make it easier for developers to write efficient parallel and distributed applications? </li></ul></ul><ul><li>Simplicity of the programming model </li></ul><ul><li>Reliability, efficiency, and scalability of the applications </li></ul>
  5. 5. Outline <ul><li>Motivation </li></ul><ul><li>Dryad System </li></ul><ul><li>Example (SQL Query) </li></ul><ul><li>Evaluation </li></ul><ul><li>Contribution </li></ul><ul><li>Thought </li></ul>
  6. 6. Why Dryad? <ul><li>Two assumptions: </li></ul><ul><ul><li>The emergence of large-scale internet services </li></ul></ul><ul><ul><li>Future advances in computing power will come from increasing the number of cores on a chip rather than improving the speed of a single core. </li></ul></ul>
  7. 7. Dryad's Assumptions <ul><li>Assumptions </li></ul><ul><ul><li>All resources are in a single administrative domain </li></ul></ul><ul><ul><li>A known, high-performance communication topology </li></ul></ul><ul><ul><li>Under centralized management and control </li></ul></ul><ul><li>Under these assumptions, many problems are sidestepped: </li></ul><ul><ul><li>High latency </li></ul></ul><ul><ul><li>Unreliable networks </li></ul></ul><ul><ul><li>Resource usage by federated or competing entities </li></ul></ul><ul><ul><li>Access control / authentication for resources across different entities </li></ul></ul>
  8. 8. System Architecture <ul><li>What's a Dryad Job? </li></ul><ul><li>A directed acyclic graph; </li></ul><ul><li>Each vertex is a program; </li></ul><ul><li>Edges represent data channels. </li></ul>
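As a rough illustration of the job model on this slide, the sketch below represents a job as a mapping from each vertex to the upstream vertices it reads from, and checks that the graph is acyclic. The vertex names and the `execution_order` helper are hypothetical and are not part of Dryad; the standard-library `graphlib` module does the topological sort.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical job: each key is a vertex (a program) and each value is the set
# of upstream vertices it reads from over a channel. Names are illustrative.
job = {
    "read_a": set(),
    "read_b": set(),
    "join": {"read_a", "read_b"},
    "aggregate": {"join"},
}


def execution_order(graph):
    """Return one valid run order, or raise ValueError if the graph has a cycle."""
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        raise ValueError("a job graph must be acyclic") from exc


order = execution_order(job)
```

Any order returned this way runs every vertex after all of its inputs, which is the property a job manager needs before it can start scheduling.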
  9. 9. Dryad Details Outline <ul><li>System Responsibility </li></ul><ul><li>Dryad Application </li></ul><ul><li>Job Manager </li></ul><ul><li>Name Server </li></ul><ul><li>Vertex Program </li></ul><ul><li>Channel </li></ul><ul><li>Run-time Graph Refinement </li></ul><ul><li>Fault Tolerance </li></ul>
  10. 10. Dryad Responsibility <ul><li>Manage concurrent processes </li></ul><ul><li>Schedule vertices to run </li></ul><ul><ul><li>on multiple computers </li></ul></ul><ul><ul><li>or on multiple cores within a computer </li></ul></ul><ul><li>Schedule the use of computers and their CPUs </li></ul><ul><li>Handle all the difficult problems (fault tolerance) </li></ul><ul><li>Recover from communication or computer failures </li></ul><ul><li>Transfer data between vertices </li></ul><ul><li>... </li></ul>
  11. 11. Dryad Application <ul><li>What's a Dryad application? </li></ul><ul><li>It's a dataflow graph, composed of: </li></ul><ul><ul><li>Computational “vertices” </li></ul></ul><ul><ul><li>Communication “channels” </li></ul></ul><ul><li>It can: </li></ul><ul><ul><li>discover data size/placement </li></ul></ul><ul><ul><li>modify the graph </li></ul></ul>
  12. 12. Overview again
  13. 13. Job Manager <ul><li>Assumptions </li></ul><ul><ul><li>Only one job runs on the cluster </li></ul></ul><ul><ul><li>The JM manages one job, i.e. one communication graph </li></ul></ul><ul><li>Consults the name server (NS) to find the available computers </li></ul><ul><li>Maintains the job graph </li></ul><ul><li>Schedules running vertices </li></ul><ul><li>Applies a greedy scheduling policy </li></ul>
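One way to picture the greedy policy is the toy scheduler below, which repeatedly takes every vertex whose inputs are complete and places as many of them as there are machines. It is only a sketch under simplifying assumptions: `greedy_schedule`, the round-based execution, and the machine names are all hypothetical, and the real job manager also accounts for data locality and running state.

```python
def greedy_schedule(graph, computers):
    """Run the job in rounds; each round places every ready vertex it can.

    graph: dict mapping vertex -> set of upstream vertices.
    computers: list of machine names, as a name server might report them.
    Returns a list of rounds, each a dict mapping vertex -> computer.
    """
    if not computers:
        raise ValueError("no computers available")
    done = set()
    rounds = []
    while len(done) < len(graph):
        ready = sorted(v for v, deps in graph.items()
                       if v not in done and deps <= done)
        if not ready:
            raise ValueError("no runnable vertex: the graph has a cycle")
        # Greedy: use as many machines as there are ready vertices,
        # assuming this is the only job on the cluster.
        placement = dict(zip(ready, computers))
        rounds.append(placement)
        done.update(placement)
    return rounds


job = {"read_a": set(), "read_b": set(), "join": {"read_a", "read_b"}}
rounds = greedy_schedule(job, ["m1", "m2"])
```

With two machines the two readers run together and the join follows; with a single machine the same job simply takes more rounds.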
  14. 14. Name Server <ul><li>Reports the available computers </li></ul><ul><li>Records the location of each computer </li></ul>
  15. 15. Vertices <ul><li>A daemon acts as a proxy, exchanging status and data with the JM. </li></ul><ul><li>Vertices are controlled by the daemon. </li></ul><ul><li>They communicate with the JM and receive commands from it through the daemon. </li></ul><ul><li>They are killed by the daemon if the JM crashes. </li></ul><ul><li>They exchange data through temporary files, TCP pipes, or shared memory. </li></ul>
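  16. 16. Channel 1. When defining edges, the protocol to use can be specified. 2. Using TCP pipes can lead to deadlock, because a TCP pipe requires the vertices at both ends to run at the same time, and one side may fail.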
  17. 17. Vertex Program <ul><li>Developed by the user; </li></ul><ul><li>Sequential code </li></ul><ul><ul><li>Usually does not create processes / threads </li></ul></ul><ul><ul><li>No locks </li></ul></ul><ul><li>Event-based programming style </li></ul><ul><li>Dryad provides: </li></ul><ul><ul><li>“process wrapper” library </li></ul></ul><ul><ul><li>standard Dryad vertex class </li></ul></ul><ul><ul><li>Channel interface: read, write, serialization, deserialization </li></ul></ul><ul><li>The user provides: </li></ul><ul><ul><li>Classes that inherit from the predefined vertex class </li></ul></ul><ul><li>(merge, sort, map, join) </li></ul>
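The division of labour on this slide (the runtime supplies a base vertex class and channel access, the user supplies sequential event handlers) might look roughly like the sketch below. Everything here is hypothetical: `Vertex`, `SortVertex`, `on_item`, `on_done`, and `run_vertex` are illustrative names, not Dryad's actual interface.

```python
class Vertex:
    """Hypothetical base class: the runtime feeds it items and collects output."""

    def __init__(self):
        self.output = []  # stands in for an output channel

    def write(self, item):
        self.output.append(item)

    def on_item(self, channel, item):
        """Called once per item arriving on an input channel."""
        raise NotImplementedError

    def on_done(self):
        """Called after all input channels are exhausted."""


class SortVertex(Vertex):
    """User code: plain sequential logic, no threads and no locks."""

    def __init__(self):
        super().__init__()
        self.buffer = []

    def on_item(self, channel, item):
        self.buffer.append(item)

    def on_done(self):
        for item in sorted(self.buffer):
            self.write(item)


def run_vertex(vertex, inputs):
    """Toy runtime: deliver every item from every input channel, then finish."""
    for channel, items in enumerate(inputs):
        for item in items:
            vertex.on_item(channel, item)
    vertex.on_done()
    return vertex.output


result = run_vertex(SortVertex(), [[3, 1], [2]])
```

The user-written class never touches scheduling or transport, which is the point of keeping vertex code sequential.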
  18. 18. Run-time
  19. 19. Run-time graph refinement <ul><li>If an internal vertex performs a data reduction, network traffic is reduced. </li></ul>
  20. 20. <ul><li>Why </li></ul><ul><ul><li>When scheduling, the JM follows a greedy policy and places as many vertices as possible on one machine / rack </li></ul></ul><ul><ul><li>When a set of inputs feeds a single downstream vertex, copying the data can easily congest the network </li></ul></ul><ul><ul><li>The location and size of the data are not known until run time </li></ul></ul>
  21. 21. <ul><li>How to: </li></ul><ul><ul><li>Group the inputs into subsets that are close in the network topology </li></ul></ul><ul><ul><li>Insert an internal vertex for each subset </li></ul></ul><ul><ul><li>Each internal vertex reads data from one subset of the inputs </li></ul></ul><ul><ul><li>Partition the inputs by location and output the union </li></ul></ul><ul><ul><li>Implement an aggregation manager </li></ul></ul><ul><li>Benefit: </li></ul><ul><ul><li>Saves network bandwidth </li></ul></ul>
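The grouping step can be sketched as a small graph rewrite: inputs are bucketed by rack, and one internal vertex per rack is spliced in between them and the downstream vertex. This is an illustrative sketch only; `insert_aggregators`, the `agg_` naming, and the rack map are assumptions rather than anything specified in the slides.

```python
from collections import defaultdict


def insert_aggregators(inputs, rack_of, downstream):
    """Return edges (src, dst) after adding one internal vertex per rack.

    inputs: list of input vertex names.
    rack_of: dict mapping input -> rack name (hypothetical topology data).
    downstream: name of the single vertex that consumed all the inputs.
    """
    by_rack = defaultdict(list)
    for v in inputs:
        by_rack[rack_of[v]].append(v)

    edges = []
    for rack, members in sorted(by_rack.items()):
        agg = f"agg_{rack}"
        edges.extend((v, agg) for v in members)  # traffic stays inside the rack
        edges.append((agg, downstream))          # one cross-rack edge per rack
    return edges


edges = insert_aggregators(
    ["a", "b", "c"], {"a": "r1", "b": "r1", "c": "r2"}, "out"
)
```

After the rewrite the downstream vertex has one incoming edge per rack instead of one per input, so a reducing aggregator cuts the data crossing rack boundaries.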
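  22. 22. <ul><li>1. When the inputs are large, heuristic grouping can ensure that a downstream vertex has no more than a set number of input channels. 2. k-fold parallelism </li></ul>For most applications there is a class of equivalent graphs that produce the same result.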
  23. 23. Fault Tolerance <ul><li>Three assumptions </li></ul><ul><li>Three managers </li></ul><ul><ul><li>JM, Stage Manager, Aggregation Manager </li></ul></ul><ul><li>DAG (directed acyclic graph) </li></ul><ul><ul><li>Inputs determine outputs, so a vertex can be re-run </li></ul></ul><ul><li>On any failure, the JM is informed. </li></ul><ul><ul><li>The vertex reports to the JM </li></ul></ul><ul><ul><li>The daemon notifies the JM </li></ul></ul><ul><ul><li>If the daemon fails, the JM sees a heartbeat timeout </li></ul></ul><ul><ul><li>On a read/write error on an input/output channel, the channel is treated as failed and the vertex is re-executed. </li></ul></ul><ul><li>Stage Manager Object </li></ul><ul><ul><li>Receives a callback on every state transition </li></ul></ul><ul><ul><li>The stage manager object holds a global lock on the JM data structures and can implement complex behaviors. </li></ul></ul><ul><ul><li>e.g. heuristically detect whether a vertex is slower than its peers; if so, schedule a duplicate execution. </li></ul></ul>
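The slow-vertex heuristic mentioned on this slide could be approximated as follows: compare how long each running vertex has taken against the median completion time of peers in the same stage. The function name `find_stragglers` and the `slack` multiplier are assumptions for illustration; the slides do not say which statistic or threshold the stage manager uses.

```python
from statistics import median


def find_stragglers(elapsed, finished, slack=2.0):
    """Return running vertices whose elapsed time exceeds slack * median.

    elapsed: dict mapping running vertex -> seconds it has run so far.
    finished: list of completion times (seconds) of peers in the same stage.
    slack: hypothetical threshold multiplier, not taken from the paper.
    """
    if not finished:
        return []  # no baseline yet, so nothing can be judged slow
    limit = slack * median(finished)
    return sorted(v for v, t in elapsed.items() if t > limit)


slow = find_stragglers({"v1": 50, "v2": 9}, [10, 12, 11])
```

Because vertices are deterministic, a duplicate of a flagged vertex can be started safely and whichever copy finishes first can be used.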
  24. 24. Outline <ul><li>Motivation </li></ul><ul><li>Dryad System </li></ul><ul><li>Example (SQL Query) </li></ul><ul><li>Evaluation </li></ul><ul><li>Contribution </li></ul><ul><li>Thought </li></ul>
  25. 25. An Example
  26. 26. SQL Query <ul><li>photoObjAll: </li></ul><ul><ul><li>354,254,163 records </li></ul></ul><ul><ul><li>unique identifier: objID </li></ul></ul><ul><li>neighbors: </li></ul><ul><ul><li>2,803,165,372 records </li></ul></ul><ul><li>Output </li></ul><ul><ul><li>Join “X”: 932,820,679 </li></ul></ul><ul><ul><li>Join “Y”: 83,798 </li></ul></ul><ul><ul><li>The final hash emits 83,050 records. </li></ul></ul>
  27. 27. <ul><li>Build an index for each of the two tables </li></ul><ul><li>Extract the indexes into two binary files </li></ul><ul><li>u.bin: 11.8 GB </li></ul><ul><li>n.bin: 41.8 GB </li></ul><ul><li>X = 31.3 GB </li></ul><ul><li>Y = 655 KB </li></ul><ul><li>final = 649 KB </li></ul><ul><li>Then start the run! </li></ul>
  28. 28. Outline <ul><li>Motivation </li></ul><ul><li>Dryad System </li></ul><ul><li>Example (SQL Query) </li></ul><ul><li>Evaluation </li></ul><ul><li>Contribution </li></ul><ul><li>Thought </li></ul>
  29. 29. Evaluation <ul><li>Variety of applications: </li></ul><ul><ul><li>Relational queries </li></ul></ul><ul><ul><li>Large-scale matrix computations </li></ul></ul><ul><ul><li>Text-processing tasks </li></ul></ul><ul><li>Two experiments: </li></ul><ul><ul><li>SQL query </li></ul></ul><ul><ul><ul><li>10 computers </li></ul></ul></ul><ul><ul><ul><li>2 dual-core Opteron processors, 2 GHz </li></ul></ul></ul><ul><ul><ul><li>8 GB DRAM and a 1.4 TB NTFS volume per computer </li></ul></ul></ul><ul><ul><ul><li>1 Gbit/sec Ethernet </li></ul></ul></ul><ul><ul><li>MapReduce-style data mining </li></ul></ul><ul><ul><ul><li>10.2 TBytes on 1800 computers </li></ul></ul></ul><ul><ul><ul><li>Windows Server 2003 x64 SP1 </li></ul></ul></ul>
  30. 30. Evaluation <ul><li>SQL Query Result: </li></ul>In common: 1. Communication from node S to node Y uses in-memory channels. 2. The programs are almost identical.
  31. 31. Outline <ul><li>Motivation </li></ul><ul><li>Dryad System </li></ul><ul><li>Example (SQL Query) </li></ul><ul><li>Evaluation </li></ul><ul><li>Contribution </li></ul><ul><li>Thought </li></ul>
  32. 32. Topics Skipped <ul><li>Describing a Dryad Graph </li></ul><ul><ul><li>A simple language designed to specify the communication graph </li></ul></ul><ul><li>MapReduce-Style Data Mining </li></ul><ul><li>Building on Dryad </li></ul><ul><ul><li>Nebula scripting language </li></ul></ul><ul><ul><li>Integration with SSIS </li></ul></ul><ul><ul><li>Distributed SQL queries </li></ul></ul>
  33. 33. Contribution <ul><li>Built a general-purpose, high-performance distributed execution engine. </li></ul><ul><ul><li>create / schedule / manage / recover / deliver </li></ul></ul><ul><li>Demonstrated excellent performance of Dryad from a single multi-core PC up to clusters </li></ul><ul><li>Programmability of Dryad </li></ul><ul><ul><li>Designed a simple graph description language </li></ul></ul><ul><ul><li>Built simpler, higher-level programming abstractions for specific application domains on top of Dryad </li></ul></ul>
  34. 34. Assessment - Strengths <ul><li>Developers can easily control </li></ul><ul><ul><li>the communication graph </li></ul></ul><ul><ul><li>the subroutines on the vertices </li></ul></ul><ul><ul><li>an arbitrary number of inputs and outputs on one vertex (vs. MapReduce: a single input/output set) </li></ul></ul><ul><li>Draws on MapReduce; developers have as much freedom as possible in using the relevant interfaces </li></ul>
  35. 35. Assessment - Weaknesses <ul><li>From the paper: </li></ul><ul><li>Dryad is a lower-level programming model than SQL or DirectX; the developer must understand </li></ul><ul><ul><li>the structure of the computation </li></ul></ul><ul><ul><li>the organization of the system resources </li></ul></ul><ul><ul><li>the properties of the system resources </li></ul></ul><ul><li>If any vertex is re-run more than a set number of times, the entire job fails. </li></ul><ul><li>My thoughts: </li></ul><ul><li>The performance of the whole system </li></ul><ul><ul><li>depends on the structure and definition of the graph </li></ul></ul><ul><ul><li>depends on the communication among vertices </li></ul></ul>
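The re-run limit noted on this slide can be illustrated with a small retry wrapper: a deterministic vertex is re-executed after each failure, and the whole job is abandoned once the limit is reached. This is a sketch under assumptions; `JobFailed`, `run_with_retries`, and the default of three attempts are hypothetical and do not reflect Dryad's actual settings.

```python
class JobFailed(Exception):
    """Raised when one vertex exhausts its allowed attempts."""


def run_with_retries(vertex_fn, max_attempts=3):
    """Re-run a deterministic vertex until it succeeds or the limit is hit.

    Returns (result, attempts_used). max_attempts is a hypothetical default.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return vertex_fn(), attempt
        except Exception:
            continue  # same inputs, so simply running it again is safe
    raise JobFailed(f"vertex failed {max_attempts} times")


calls = {"n": 0}


def flaky():
    """Toy vertex that fails twice (simulated channel error), then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise IOError("channel read error")
    return "ok"


outcome = run_with_retries(flaky, max_attempts=3)
```

A transient failure is absorbed by the retries, while a vertex that fails every time takes the entire job down with it, which is the weakness the slide points out.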
  36. 36. <ul><li>Finally done... </li></ul><ul><li>Thank you! </li></ul>
