Confluence
Upcoming SlideShare
Loading in...5
×
 

Confluence

on

  • 397 views

 

Statistics

Views

Total Views
397
Views on SlideShare
397
Embed Views
0

Actions

Likes
0
Downloads
0
Comments
0

0 Embeds 0

No embeds

Accessibility

Categories

Upload Details

Uploaded via as Microsoft PowerPoint

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment
  • Support provenance: which is keeping track of processing and intermediate results
  • Unfortunately current workflows cannot support the new data types. What I’m referring to is data produced by stock price updates, sensors etcand also the current workflow model cannot integrate DSMSIf we rely on the existing model of polling data sources, we are missing updates and have blocking operations which are insufficient for this type of computation.What we envision is a workflow model where streaming sources feed the workflow with data in various input points.
  • Our approach to supporting data stream processing for enablingmonitoring and collaborative applications is [click] CONFLUENCEInterestingly this project first introduced in collaborate com in 2008 where I unveiled our model, right here in Orlando.Since then we have implemented confluence by developing new workflow constructs, a new model of computation and have demoed our prototype in SIGMOD 2011.-- And in fact in this presentation I will share with you my experience in implementing this prototype.
  • Following, I will describe 3 key constructs of the continuous workflow modelSome implementation detailsAnd present two representative apps from the business and scientific domains
  • Additionally a group-by clause can be defined on a windowed queue. Besides support of the standard group-by on simple data types we also support complexIn this case you need to specify what needs to be used as grouping element.
  • Finally a key requirement in enabling stream processing is the ability for workflows to receive push updates.We enable it using three techniques.Firstly from inside the workflow going out.
  • And now I’ll present details of how the key continuous workflow constructs were implemented.
  • In summary, confluence is built in Java on top of Kepler.We chose Kepler because of openness and wide deployment.It’s based upon PtelemyIIAnd it provides a wide range of basic as well as specialized actors.And workflows are composed using a high level visual language.
  • Each task in Kepler is modeled as an actor, defined with input and output ports to consume and produce data tokens.Effectively this is the visual interface.
  • Actors are interconnected in the workflow using channels.
  • And some complex actors can be composed by other actors, and are classified as sub-workflows forming hierarchies.
  • The execution and communication semantics are facilitated by an entity called the director.It executes the workflow using a scheduleand since directors define the communication and execution semantics, an actor configuration can be reused with a different director thus exhibiting different behavior.
  • Kepler provides a variety of directors but none of them meets the requirements of the continuous workflow model.
  • …so we have developed our own director which enables the continuous execution of actors by running each one in a separate thread (much like the PN director), --adds timestamps to events-- and adds the window operator to the input queues of actors as a new type of receiver.
  • The receivers are objects contained inside the input ports of any actor.We have also made modifications to the port configuration dialogue box to define the size and step of windows, delete_used_token, as well as the the group-by flag and expression.
  • Supporting push communications meant implementing new workflow data source actors:JSON:Javascript object notation
  • Two representative applications from the business and science domains
  • Here is the Kepler interface which defines the workflow.Only implemented the source actors. Everything else is off-the-shelf actors.
  • Our second example is from the scientific domain. It’s development was driven by real astronomer’s.
  • Astroshelf is the source of events and the feedback panel. Annotations engine records the meta-data.Confluence completes the feedback loop by processing the events and producing new meta-data.At this point our collaborators are considering to test this prototype in a classroom setting.Details of these interactions are described in the paper.
  • -- turns our Kepler has proven to be a good choice, that allowed us to realize our model within a reasonable time and without unnecessary complexity.--that allowed us to show the usability of our model
  • TODO: add pictures and animateB2B enables dynamic interactions by interpolating internal and external applicationsEstablish Virtual EnterprisesMiddleware infrastructure: “the Grid”Seamlessly bring together the power of resources to the desktop.

Confluence Confluence Presentation Transcript

  • CONtinuous workFLow ExeCution Engine Panayiotis (Panickos) Neophytou Panos K. Chrysanthis Alexandros Labrinidis CollaborateCom 2011 Advanced Data Management Technologies Lab Computer Science Department University of Pittsburgh
  • Workflows are GREAT! Ability to automate processes Integrate and orchestrate resources (including humans) seamlessly and effectively.  Service composition. Process large data static sets Keep track of things (provenance) Re-usable Easy to program (Visual Languages) CONFLuEnCE: Implementation and Application Design 2
  • High data rates – Push model New type of data sources (proactive): (unsupported)  Stock price ticker, twitter stream, DSMS tuple streams.  Polling: blocking, miss updates.  Data items participate in multiple interleaving WF invocations. CONFLuEnCE: Implementation and Application Design 3
  • Our approach Goal: Enable monitoring and collaborative applications that involve processing and integration of continuous streams of data. CONFLuEnCE: Continuous Workflow Execution Engine  Define the model. [CollaborateCom 2008]  Develop the new constructs.  Window semantics, event waves, support backwards workflow compatibility, enable push  Develop the new model of computation  Continuously running workflow activities.  Deadline driven scheduling.  Implement prototype. [Demo SIGMOD 2011] CONFLuEnCE: Implementation and Application Design 4
  • Overview Motivation Continuous Workflow Model  Waves  Window Operator  Push communication CONFLuEnCE CWf Application Scenarios Conclusions CONFLuEnCE: Implementation and Application Design 5
  • Continuous Workflow Model Includes all existing workflow constructs. Waves of events to distinguish between event contexts. Window operators on queues. Continuously running activities. Ability to support push communications. CONFLuEnCE: Implementation and Application Design 6
  • Wave of events Distinguish events between multiple invocations of an activity. Waves expose provenance during design/execution. Allows synchronization of events of the same lineage.  E.g., Customer order: multiple items, multiple handlers CONFLuEnCE: Implementation and Application Design 7
  • Window Operator Apply flexible bounds on unbounded stream of events  Size – Token, Time, Wave, Semantics  Step (period of recalculation) - Token, Time, Wave, Semantics  Delete_used_events flag (after activity has finished executing) Triggers activities in combination with preconditions.  Window definition  Size=5min  Activity preconditions  Step=1min if (window.length >= 2)  Delete_used_events=true fire activityOut-of-stockevents 10 11 9 80 4 6 7 3 2 5 1 ∅ BD C B A Notify D C B A 11 8 6 0 Manager Fired: ✔ ✘ Expired If 2 events occur between 5 min A events of each other, then notify the manager. CONFLuEnCE: Implementation and Application Design 8
  • Group-By Scalar: (int, float, String, decimal, etc.) Complex: (array, record, matrix…)  E.g.: /entities/hashtag “ghost” /entities/hashtag “vampire” “vampire” “zombies” “ghost” Window spec: Size=2 token Step=2 token Delete_used_events=false “zombies” CONFLuEnCE: Implementation and Application Design 9
  • Push Communication Push communication patterns:  Broadcast  Publish/Subscribe In->Out  Out->In Port WF WFProducer Producer input input  Hybrid Port WF Producer input Producer Mediator Producer CONFLuEnCE: Implementation and Application Design 10
  • Overview Motivation Continuous Workflow Model CONFLuEnCE  Kepler’s Actor Oriented Modeling  Continuous Workflow Director  Windowed Operator  Push Communication CWf Application Scenarios Conclusions CONFLuEnCE: Implementation and Application Design 11
  • CONFLuEnCE: CONtinuousworkFLow ExeCution Engine Implements our Continuous Workflow model, in Java, as a module in Kepler Kepler’s benefits  Open-source scientific workflow system  Actor-based workflow modeling  Built on top of PtolemyII (modeling, simulating, designing concurrent, real-time systems)  Well defined models of computation – extendible, pluggable  Large number of basic and specialized actors (task components)  High-level visual language CONFLuEnCE: Implementation and Application Design 12
  • Kepler’s Actor Oriented ModelingPorts  each actor has a set of input and output ports  produce/consume data (a.k.a. tokens) CONFLuEnCE: Implementation and Application Design 13
  • Kepler’s Actor Oriented ModelingDataflow Connections  unidirectional actor “communication” channels  connect output ports with input ports CONFLuEnCE: Implementation and Application Design 14
  • Kepler’s Actor Oriented ModelingSub-workflows / Composite Actors  composite actors “wrap” sub-workflows  hierarchical workflows (arbitrary nesting levels) CONFLuEnCE: Implementation and Application Design 15
  • Kepler’s Actor Oriented Modeling PN DirectorDirectors SDF Director  defines the execution and communication semantics of workflow graphs  executes workflow graph (some schedule)  sub-workflows may have different directors  promotes reusability CONFLuEnCE: Implementation and Application Design 16
  • Kepler Directors – Models ofComputationDirectors separate the concerns of orchestration and scheduling from conceptual design  Synchronous Dataflow (SDF)  Process Networks (PN)  Dynamic Data Flow (DDF)  Continuous Time (CT)  Discrete Event (DE)  … CONFLuEnCE: Implementation and Application Design 17
  • Continuous Workflow Director CWfs require continuous execution of the actors Stream data are events in time. Require timestamps CWf director:  Extends the PN director  Add timestamps on events using TimeKeeper on each actor.  Add Window Operators on buffer queues (receivers) CONFLuEnCE: Implementation and Application Design 18
  • Windowed Receiver Kepler extension to support window semantics CWF Director I/O Ports Producer Consumer windowed receiver CONFLuEnCE: Implementation and Application Design 19
  • Push Communication Implemented JSON WebSocket Server Actor (Out->In)  Listens to predefined port  Converts JSON objects to RecordToken(s)  Enables continuous connectivity with web-browsers Implemented HTTP Socket Stream Source Actor (In->Out)  Connects directly to an HTTP stream source (e.g., twitter) and receives data continuously Implemented the hybrid approach using PubSubHubbub [http://code.google.com/apis/pubsubhubbub/] CONFLuEnCE: Implementation and Application Design 20
  • Overview Motivation Continuous Workflow Model CONFLuEnCE CWf Application Scenarios  Supply Chain Management  Astroshelf’s collaboration Backend Conclusions CONFLuEnCE: Implementation and Application Design 21
  • Supply Chain Management Real-time monitoring of a supply chain 4 User Roles:  Customer, Warehouse Mgr, Company Mgr, Admin 22
  • Supply Chain Management CWF Director CONFLuEnCE: Implementation and Application Design 23
  • Astroshelf A collaboration platform for astrophysicists  Annotate sky objects and events. CONFLuEnCE: Live annotations & Integration. Astroshelf team: • Liz Marai • Timothy Luciani • Rebecca Hachey • Roxana Gheorghiu • Boyu Sun Astronomers: • Arthur Kosowsky • Jeffrey Newman • Michael Wood- Vasley • Brian Cherinca CONFLuEnCE: Implementation and Application Design • Anja Weyant 24
  • Astroshelf CONFLuEnCE: Implementation and Application Design 25
  • Conclusions The Continuous Workflow model Foundation for CONFLuEnCE  CONtinuous workFLow ExeCution Engine  Built on top of Kepler  Includes a new director, windowed receiver and, source actors enabling Push communication. Two Monitoring and Collaborative Application implementations. Future: Design a director, which implements scheduling, sensitive to QoS requirements. CONFLuEnCE: Implementation and Application Design 26
  • Supported by NSF grants: IIS-0534531 and OIA-1028162http://db.cs.pitt.edu/group/projects/confluencehttp://db.cs.pitt.edu/group/projects/astroshelf Special thanks to: Astroshelf team: Astronomy collaborators: • Liz Marai, • Arthur Kosowsky, • Timothy Luciani, • Jeffrey Newman, • Rebecca Hachey, • Michael Wood- • Roxana Gheorghiu Vasley, • Brian Cherinca, • Anja Weyant. CONFLuEnCE: Implementation and Application Design 27
  • Conclusions The Continuous Workflow model Foundation for CONFLuEnCE  CONtinuous workFLow ExeCution Engine  Built on top of Kepler  Includes a new director, windowed receiver and, source actors enabling Push communication. Two Monitoring and Collaborative Application implementations. Future: Design a director, which implements scheduling, sensitive to QoS requirements. http://db.cs.pitt.edu/group/projects/confluence http://db.cs.pitt.edu/group/projects/astroshelf 28
  • Workflows vs. DSMS vs. CWfs DSMS CWfs WFs Staticconfiguration Flexibility QoS/QoD General purpose driven Declarative & Stream Procedural processing Human integration Declarative Feedback Loops CONFLuEnCE: Implementation and Application Design 29