
Topic 8: Enhancements and Alternative Architectures

Cloud Computing Workshop 2013, ITU

Published in: Technology

Transcript

  • 1. 8: Enhancements and Alternative Architectures. Zubair Nabi (zubair.nabi@itu.edu.pk), April 19, 2013
  • 2. Outline
      1. Major shortcomings
      2. Pig Latin
      3. Dryad
      4. CIEL
      5. Naiad
  • 3. Outline, section 1: Major shortcomings
  • 6. Focusing on some
      • Low-level programming interface
      • Iterative and recursive applications
      • Incremental computations
  • 7. Outline, section 2: Pig Latin
  • 11. Introduction
      • MapReduce is too low-level and rigid, and leads to a lot of custom user code
      • Pig Latin is a high-level dataflow language atop MapReduce, designed at Yahoo!
      • It finds the sweet spot between the declarative style of SQL and the low-level interface of MapReduce
      • The Pig system compiles Pig Latin queries into physical plans that are executed atop Hadoop
  • 12. SQL query to find average pagerank for each large category of URLs
        SELECT category, AVG(pagerank)
        FROM urls WHERE pagerank > 0.2
        GROUP BY category HAVING COUNT(*) > 10^6
  • 13. Equivalent Pig query
        good_urls = FILTER urls BY pagerank > 0.2;
        groups = GROUP good_urls BY category;
        big_groups = FILTER groups BY COUNT(good_urls) > 10^6;
        output = FOREACH big_groups GENERATE
                     category, AVG(good_urls.pagerank);
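The step-by-step shape of the Pig query can be mimicked in plain Python. This is only an illustrative sketch and does not use Pig itself: the sample `urls` records are made up, and `size_threshold` (1 here) is an assumed stand-in for the 10^6 threshold on the slide.

```python
from collections import defaultdict

# Hypothetical sample data: (url, category, pagerank) records.
urls = [
    ("a.com", "news", 0.9),
    ("b.com", "news", 0.5),
    ("c.com", "news", 0.1),
    ("d.com", "sports", 0.7),
    ("e.com", "blogs", 0.3),
    ("f.com", "blogs", 0.4),
]
size_threshold = 1  # assumed stand-in for the 10^6 threshold on the slide

# Step 1: FILTER urls BY pagerank > 0.2
good_urls = [u for u in urls if u[2] > 0.2]

# Step 2: GROUP good_urls BY category
groups = defaultdict(list)
for record in good_urls:
    groups[record[1]].append(record)

# Step 3: FILTER groups BY COUNT(good_urls) > threshold
big_groups = {c: bag for c, bag in groups.items() if len(bag) > size_threshold}

# Step 4: FOREACH big_groups GENERATE category, AVG(good_urls.pagerank)
result = {c: sum(p for _, _, p in bag) / len(bag) for c, bag in big_groups.items()}
```

Each intermediate name holds the output of exactly one transformation, which is the property the Pig version has and the single SQL statement does not.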
  • 18. Pig Interface
      • A Pig Latin program is a sequence of steps, reminiscent of traditional programming languages
      • In contrast, SQL consists of declarative constraints that collectively define the result
      • Each step carries out a single data transformation
      • Writing a Pig Latin program is similar to specifying a query execution plan, i.e., a dataflow graph
      • Due to this dataflow model, it is easier for programmers to understand and control how their data processing task is executed
  • 22. Features
      • Support for a fully nested data model with complex data types
      • Extensive support for user-defined functions
      • Ability to operate over plain, schema-less input files
      • Open-source Apache project
  • 26. Interoperability
      • Queries can be performed directly atop raw data dumps
      • The user needs to provide a function to parse the content of the file into tuples
      • Similarly, the user needs to provide a function to convert tuples into a byte sequence
      • Datasets can span diverse data storage sources and applications
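The pair of user-supplied functions described above can be sketched in Python. The names `parse_line` and `to_bytes` and the tab-separated layout are assumptions made for illustration; they are not part of Pig's API.

```python
def parse_line(line: bytes) -> tuple:
    """Hypothetical load function: turn one tab-separated line into a tuple."""
    url, category, pagerank = line.decode("utf-8").rstrip("\n").split("\t")
    return (url, category, float(pagerank))


def to_bytes(record: tuple) -> bytes:
    """Hypothetical store function: turn a tuple back into a byte sequence."""
    url, category, pagerank = record
    return f"{url}\t{category}\t{pagerank}\n".encode("utf-8")


# A made-up raw data dump with two records.
raw_dump = b"a.com\tnews\t0.9\nb.com\tsports\t0.25\n"

records = [parse_line(line) for line in raw_dump.splitlines()]
round_trip = b"".join(to_bytes(r) for r in records)
```

Because the file format lives entirely in these two functions, the query logic in between only ever sees tuples.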
  • 31. UDFs as first-class citizens
      • A significant part of large-scale data analysis relies on custom processing
      • For instance, the user may be interested in figuring out whether a particular website is spam
      • All aspects of processing in Pig Latin, including grouping, filtering, joining, and per-tuple processing, can be customized via UDFs
      • UDFs can take non-atomic parameters as input and produce non-atomic values as output
      • UDFs are defined in Java
        groups = GROUP urls BY category;
        output = FOREACH groups GENERATE
                     category, top10(urls);
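To make the "non-atomic in, non-atomic out" point concrete, here is a Python analogue of the `top10` example. Pig UDFs are written in Java, so this is a sketch only; `top_n` is a hypothetical stand-in for `top10`, and the sample data is made up.

```python
import heapq
from collections import defaultdict


def top_n(bag, n=10):
    """Hypothetical UDF: takes a bag of (url, category, pagerank) tuples
    and returns a smaller bag holding the n highest-pagerank tuples."""
    return heapq.nlargest(n, bag, key=lambda t: t[2])


urls = [
    ("a.com", "news", 0.9),
    ("b.com", "news", 0.5),
    ("c.com", "news", 0.1),
    ("d.com", "sports", 0.7),
]

# groups = GROUP urls BY category
groups = defaultdict(list)
for record in urls:
    groups[record[1]].append(record)

# output = FOREACH groups GENERATE category, top_n(urls)
output = {category: top_n(bag, n=2) for category, bag in groups.items()}
```

The function receives a whole bag per group and returns a bag, which is what distinguishes it from a scalar function applied field by field.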
  • 35. Data Model
    Pig has four data types:
      1. Atom: A single atomic value such as a string or an integer
      2. Tuple: A sequence of values, each with possibly a different data type
      3. Bag: A collection of tuples
      4. Map: A collection of data items, each with an associated key
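As a rough analogy, the four types map onto built-in Python values as follows. The correspondence is an assumption made for illustration (a Python list stands in for a bag, a dict for a map), and the sample values are made up.

```python
# Atom: a single atomic value.
atom = "alice"

# Tuple: a sequence of fields, each possibly of a different type.
record = ("alice", 27, 0.8)

# Bag: a collection of tuples; a list is used here so duplicates are kept.
bag = [("alice", 27, 0.8), ("bob", 31, 0.4), ("alice", 27, 0.8)]

# Map: data items of any type, each looked up by its key.
profile = {"name": atom, "age": 27, "visits": bag}

# Nesting: a tuple whose second field is itself a bag.
nested = ("alice", bag)
```

The last two values show the "fully nested" aspect: a bag can sit inside a tuple or a map, rather than every field being flat.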
  • 40. Commands
      • LOAD: Load and deserialize an input file
      • FOREACH: Process each tuple of a dataset
      • FILTER: Filter a dataset based on some condition or UDF
      • COGROUP: Group together tuples which are related in some way from one or more datasets
      • STORE: Materialize the output of a Pig Latin expression to a file
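Of these commands, COGROUP is the least familiar, so here is a minimal Python sketch of its behaviour for two inputs. The function name `cogroup`, its signature, and the sample datasets are assumptions for illustration, not Pig's implementation.

```python
from collections import defaultdict


def cogroup(left, right, key_left, key_right):
    """Group tuples from two datasets by key.

    Each key maps to a pair of bags: the matching tuples from `left`
    and the matching tuples from `right`. Either bag may be empty.
    """
    grouped = defaultdict(lambda: ([], []))
    for t in left:
        grouped[key_left(t)][0].append(t)
    for t in right:
        grouped[key_right(t)][1].append(t)
    return dict(grouped)


# Made-up datasets keyed by a query string in field 0.
results = [("lakers", "nba.com"), ("lakers", "espn.com"), ("kings", "nhl.com")]
revenue = [("lakers", "top", 50), ("kings", "side", 20), ("iphone", "top", 30)]

grouped = cogroup(results, revenue, lambda t: t[0], lambda t: t[0])
```

Unlike a join, the matching tuples are kept in separate bags per input rather than being flattened into a cross product.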
  • 41. Outline, section 3: Dryad
  • 43. Introduction
      • MapReduce is strictly two-stage, with a single input set and a single output set
      • This makes it an awkward architecture for multi-stage computation
  • 44. MapReduce: Architecture [figure]
  • 48. Dryad
      • Dryad allows computations structured as a directed acyclic graph (DAG)
      • Each vertex in the graph is a computation, while each edge is a communication channel
      • Each computation can take multiple files as input and produce multiple outputs
      • Developed by Microsoft Research
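The DAG model can be sketched in a few lines of Python: vertices are functions, edges say whose output feeds whom, and a vertex runs once all of its upstream vertices have finished. This is a single-process toy, not Dryad's API; the vertex names and the `run` helper are hypothetical, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Each vertex is a computation over the outputs of its upstream vertices.
vertices = {
    "read_a": lambda: [3, 1, 2],
    "read_b": lambda: [6, 5, 4],
    "sort_a": lambda xs: sorted(xs),
    "sort_b": lambda xs: sorted(xs),
    "merge": lambda a, b: sorted(a + b),  # a vertex with two inputs
}

# Edges (channels), written as vertex -> list of upstream vertices.
inputs = {
    "read_a": [],
    "read_b": [],
    "sort_a": ["read_a"],
    "sort_b": ["read_b"],
    "merge": ["sort_a", "sort_b"],
}


def run(vertices, inputs):
    """Execute every vertex in an order that respects the edges."""
    outputs = {}
    for name in TopologicalSorter(inputs).static_order():
        args = [outputs[upstream] for upstream in inputs[name]]
        outputs[name] = vertices[name](*args)
    return outputs


outputs = run(vertices, inputs)
```

The `merge` vertex is the part a two-stage MapReduce job cannot express directly: it consumes two separately produced inputs.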
  • 49. Dryad: Architecture [figure]
  • 50. Dryad: Architecture (2) [figure labels: job manager; cluster with NS, PD, and V nodes; control plane (job schedule); data plane (files, TCP, FIFO, network)]
  • 51. Dryad: Job
      • Job = directed acyclic graph
      • [figure labels: inputs, processing vertices, channels (file, pipe, shared memory), outputs]
  • 54. Channel types and job inputs and outputs
      • Channel types: File, TCP pipe, shared-memory FIFO
      • Encapsulation: Convert a graph into a single vertex, and run it within the same process
      • Job inputs and outputs: Can be logically concatenated
  • 59. Vertices
      • Programming in C++/C#
      • Runtime library sets up and executes vertices
      • Map and Reduce classes
      • Process wrapper: To support legacy executables
      • Supports event-based programming
  • 60. Dryad: Example
        select distinct p.objID
        from photoObjAll p
        join neighbors n -- call this join "X"
          on p.objID = n.objID
          and n.objID < n.neighborObjID
          and p.mode = 1
        join photoObjAll l -- call this join "Y"
          on l.objid = n.neighborObjID
          and l.mode = 1
          and abs((p.u-p.g)-(l.u-l.g))<0.05
          and abs((p.g-p.r)-(l.g-l.r))<0.05
          and abs((p.r-p.i)-(l.r-l.i))<0.05
          and abs((p.i-p.z)-(l.i-l.z))<0.05
      [figure: Dryad graph for this query, with vertex stages labeled D, M, S, Y, H, X, U, N]
  • 63. Operations
      • Create vertices using a C++ base class
      • Add edges
      • Merge two graphs
  • 64. Operations (2) [figure]
  • 70. Job Execution
      • A vertex can specify a “hard constraint” or a “preference” for the computers it runs on
      • The job manager runs a greedy scheduling algorithm that assumes it is the only job running
      • Simple graph visualizer: Shows the state of each vertex and channel for small jobs
      • Web-based interface: Regularly updated statistics
      • Fault tolerance: Vertices are deterministic, so a failed vertex is simply re-scheduled
      • Speculative execution within stages
  • 72. Run-time Graph Refinement
      • Aggregation tree: Distributed combiner
      • Applies to associative and commutative computations
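Why the computation must be associative and commutative can be seen in a small Python sketch: partial results are combined a few at a time, level by level, and the answer has to match a flat reduction regardless of grouping. The helper `aggregate_tree` and its `fan_in` parameter are hypothetical names; the sketch assumes a non-empty input and `fan_in >= 2`.

```python
from functools import reduce
import operator


def aggregate_tree(partials, combine, fan_in=2):
    """Combine partial results level by level, `fan_in` at a time.

    Assumes `partials` is non-empty, `fan_in >= 2`, and `combine` is
    associative and commutative, so the grouping does not affect the result.
    """
    level = list(partials)
    while len(level) > 1:
        level = [
            reduce(combine, level[i:i + fan_in])
            for i in range(0, len(level), fan_in)
        ]
    return level[0]


# Made-up partial sums, e.g. one per upstream vertex.
partial_sums = [4, 8, 15, 16, 23, 42]

tree_total = aggregate_tree(partial_sums, operator.add, fan_in=2)
flat_total = reduce(operator.add, partial_sums)
```

In a cluster, each inner combination could run close to the data it combines, which is what makes the tree a distributed combiner rather than a single bottleneck vertex.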
  • 73. Outline, section 4: CIEL
  • 76. Introduction
      • MapReduce and Dryad are not amenable to iterative and recursive applications
      • Most machine learning and data mining applications are iterative in nature
      • These applications require a data-dependent control flow
          • The ability to spawn new tasks on the fly based on previous computations
  • 80. CIEL
      1. Data-centric execution engine from the University of Cambridge: the goal of a CIEL job is to produce one or more output objects
      2. A reference to an object can be obtained without materializing its full contents, reminiscent of C pointers
          • A reference to an object whose contents are not yet available is a future reference; otherwise it is a concrete reference
      3. A job makes progress by executing tasks
  • 85. Tasks
      1. Each task depends on one or more objects via references, and it starts executing once all of its references become concrete
      2. The purpose of each task is to produce objects
          1. A task can publish one or more objects by creating a concrete reference for them
          2. A task can also spawn new tasks and delegate the creation of output to them
      3. The dynamic task graph stores the relation between tasks and objects
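The task rules above can be sketched as a toy single-process scheduler. This is not CIEL's API: the `Scheduler` class, `halve`, and `make_loop` are hypothetical names, an object counts as concrete once its name appears in `objects`, and a task that returns `None` is taken to have delegated its output to a task it spawned.

```python
class Scheduler:
    """Toy stand-in for a dynamic task graph (illustrative only)."""

    def __init__(self):
        self.objects = {}  # name -> value; present means the reference is concrete
        self.pending = []  # (func, dependency names, output name)

    def spawn(self, func, deps, output):
        self.pending.append((func, deps, output))

    def run(self):
        while self.pending:
            ready = [t for t in self.pending if all(d in self.objects for d in t[1])]
            if not ready:
                raise RuntimeError("remaining tasks depend on missing objects")
            for task in ready:
                self.pending.remove(task)
                func, deps, output = task
                result = func(self, *[self.objects[d] for d in deps])
                if result is not None:
                    self.objects[output] = result  # publish a concrete object


def halve(sched, x):
    return x / 2


def make_loop(step):
    """Data-dependent control flow: keep halving until the value drops below 1."""
    def loop(sched, x):
        if x < 1:
            return x  # publish the job's final output
        nxt = f"x{step + 1}"
        sched.spawn(halve, [f"x{step}"], nxt)               # produce the next value
        sched.spawn(make_loop(step + 1), [nxt], "result")   # delegate the output
        return None
    return loop


sched = Scheduler()
sched.objects["x0"] = 10.0
sched.spawn(make_loop(0), ["x0"], "result")
sched.run()
```

The number of tasks is not known when the job starts; it depends on the data, which is what a static MapReduce or Dryad graph cannot express.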
  • 86. CIEL: Architecture [figure]
  • 87. CIEL: Dynamic Task Graph [figure]
  • 90. Executors
      1. CIEL decouples tasks from the underlying framework through the concept of executors
      2. Each programming language has a corresponding executor
      3. As a result, a task can be written in any programming language, such as Java, Python, or shell script, as well as in CIEL's native Skywriting language
  • 93. Skywriting
      1. Scripting language for expressing task-level parallelism atop CIEL
      2. Contains data-dependent control flow constructs such as while loops and recursive functions
      3. Ability to spawn new tasks in the middle of execution
  • 98. Skywriting constructs
      1. ref(url): Returns a reference to the object located at url, whether local or remote
      2. spawn(f, [arg, ...]): Spawns a task to evaluate f
      3. exec(executor, args, n): Runs the given executor to evaluate args
      4. spawn_exec(executor, args, n): Spawns a new task to run the given executor
      5. * (unary): De-references the given reference
  • 99. Example: Skywriting
        function process_chunk(chunk, prev_result) {
            return spawn_exec(...);
        }
        function is_converged(curr_result, prev_result) {
            return spawn_exec(...)[0];
        }
        input_data = [ref("ciel://host137/chunk0"),
                      ref("ciel://host223/chunk1"), ...];
        curr = ...; // Initial guess at the result.
        do {
            prev = curr;
            curr = [];
            for (chunk in input_data) {
                curr += process_chunk(chunk, prev);
            }
        } while (!*is_converged(curr, prev));
        return curr;
  • 100. Outline, section 5: Naiad
  • 103. Introduction
      • A class of applications requires support for both iterative and incremental computation
      • For instance, maintaining in real time the strongly connected component structure of the graph induced by Twitter mentions
      • MapReduce itself has no support for either iterative or incremental computation
  • 108. Naiad
      • Data-intensive computing framework from Microsoft Research that supports both incremental and iterative computation by leveraging differential computation
      • Differential computation introduces two novel ideas:
          1. The state of the computation varies according to a partially ordered set of versions rather than a total ordering
          2. The set of updates required to reconstruct the state at any version is retained in an indexed data structure
      • The state and the updates to that state are associated with a multi-dimensional logical timestamp (called a version)
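The two ideas can be illustrated with a small Python sketch in which versions are (epoch, iteration) pairs ordered component-wise, and the state at a version is the sum of all differences stored at versions less than or equal to it. The two-dimensional version, the `leq` and `state_at` helpers, and the sample differences are assumptions for illustration; a plain dict stands in for the indexed data structure.

```python
from collections import Counter


def leq(s, t):
    """Product partial order on (epoch, iteration) versions."""
    return s[0] <= t[0] and s[1] <= t[1]


# Differences indexed by version: record -> signed change in multiplicity.
differences = {
    (0, 0): Counter({"a": 1, "b": 1}),
    (0, 1): Counter({"b": -1, "c": 1}),
    (1, 0): Counter({"d": 1}),
}


def state_at(version):
    """Reconstruct the collection at `version` from the stored differences."""
    total = Counter()
    for s, delta in differences.items():
        if leq(s, version):
            for record, change in delta.items():
                total[record] += change
    return {record: n for record, n in total.items() if n != 0}


iteration_only = state_at((0, 1))   # later iteration, first epoch
epoch_only = state_at((1, 0))       # later epoch, first iteration
both = state_at((1, 1))             # sees both sets of updates
```

Versions (0, 1) and (1, 0) are incomparable, so neither update is applied "after" the other; that is the practical meaning of a partial rather than total order.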
  • 112. Programming environment
      • Declarative query language based on .NET Language Integrated Query (LINQ)
      • LINQ extends C# with declarative operators, such as Select, Where, Join, and GroupBy, among others
      • Naiad adds two more operators:
          1. FixedPoint: takes a source collection and a function that maps a collection to another collection of the same type, and applies the function until it reaches a fixed point
          2. PrioritizedFP: additionally takes a priority function to apply to every record in the source collection
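The idea behind a fixed-point operator can be sketched in Python. This is not Naiad's LINQ operator: `fixed_point` is a hypothetical helper that naively recomputes the whole collection on each round instead of working incrementally, and the edge set is made up.

```python
def fixed_point(source, step):
    """Apply `step` repeatedly until the collection stops changing.

    Terminates only if `step` eventually maps some collection to itself.
    """
    current = frozenset(source)
    while True:
        nxt = frozenset(step(current))
        if nxt == current:
            return current
        current = nxt


# Made-up directed edges.
edges = {(1, 2), (2, 3), (3, 4), (5, 6)}


def expand(reachable):
    """One round of reachability: follow every edge leaving the reached set."""
    return reachable | {dst for (src, dst) in edges if src in reachable}


reachable_from_1 = fixed_point({1}, expand)
```

The function passed in has the same input and output type (a set of nodes), which is the constraint the slide states for FixedPoint.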
  • 115. Runtime
      • The Naiad runtime transforms declarative queries into a dataflow graph
      • The user program can insert differences into the input collections and register callbacks to be invoked when differences are received at the output collection
      • The runtime transparently distributes the execution of the dataflow graph (similar to Dryad) across several cores and nodes
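The "differences in, callbacks out" interaction can be sketched with a toy incremental word count in Python. The class `IncrementalCount` and its methods `on_output` and `insert` are hypothetical names chosen for this sketch, not Naiad's interface.

```python
from collections import Counter


class IncrementalCount:
    """Toy incremental operator: counts words and reports only what changed."""

    def __init__(self):
        self.counts = Counter()
        self.callbacks = []

    def on_output(self, callback):
        """Register a callback invoked with each batch of output differences."""
        self.callbacks.append(callback)

    def insert(self, differences):
        """`differences` is a list of (word, signed multiplicity change) pairs."""
        before = {}
        for word, change in differences:
            before.setdefault(word, self.counts[word])
            self.counts[word] += change
        output = []
        for word, old in before.items():
            new = self.counts[word]
            if new == old:
                continue
            if old:
                output.append(((word, old), -1))  # retract the stale count
            if new:
                output.append(((word, new), +1))  # assert the updated count
        for callback in self.callbacks:
            callback(output)


received = []
counter = IncrementalCount()
counter.on_output(received.append)
counter.insert([("cat", +1), ("dog", +1), ("cat", +1)])
counter.insert([("dog", -1), ("emu", +1)])
```

The second batch produces output only for `dog` and `emu`; the unchanged count for `cat` is never recomputed or re-sent.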
  • 116. References
      1. Christopher Olston, Benjamin Reed, Utkarsh Srivastava, Ravi Kumar, and Andrew Tomkins. 2008. Pig Latin: a not-so-foreign language for data processing. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data (SIGMOD '08). ACM, New York, NY, USA, 1099-1110.
      2. Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, and Dennis Fetterly. 2007. Dryad: distributed data-parallel programs from sequential building blocks. In Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007 (EuroSys '07). ACM, New York, NY, USA, 59-72.
      3. Derek G. Murray, Malte Schwarzkopf, Christopher Smowton, Steven Smith, Anil Madhavapeddy, and Steven Hand. 2011. CIEL: a universal execution engine for distributed data-flow computing. In Proceedings of the 8th USENIX Conference on Networked Systems Design and Implementation (NSDI '11). USENIX Association, Berkeley, CA, USA.
  • 117. References (2)
      4. Frank McSherry, Derek G. Murray, Rebecca Isaacs, and Michael Isard. 2013. Differential dataflow. In Conference on Innovative Data Systems Research (CIDR), 2013.
