Support provenance, that is, keeping track of processing and intermediate results.
Unfortunately, current workflows cannot support the new data types. What I’m referring to is data produced by stock price updates, sensors, etc. The current workflow model also cannot integrate DSMSs. If we rely on the existing model of polling data sources, we miss updates and have blocking operations, which makes it inadequate for this type of computation. What we envision is a workflow model where streaming sources feed the workflow with data at various input points.
Our approach to supporting data stream processing for enabling monitoring and collaborative applications is [click] CONFLuEnCE. Interestingly, this project was first introduced at CollaborateCom in 2008, where I unveiled our model, right here in Orlando. Since then we have implemented CONFLuEnCE by developing new workflow constructs and a new model of computation, and we demoed our prototype at SIGMOD 2011. In fact, in this presentation I will share with you my experience in implementing this prototype.
Next, I will describe three key constructs of the continuous workflow model, give some implementation details, and present two representative applications from the business and scientific domains.
Additionally, a group-by clause can be defined on a windowed queue. Besides supporting the standard group-by on simple data types, we also support complex data types. In this case you need to specify what is to be used as the grouping element.
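To make the group-by idea concrete, here is a minimal Python sketch, not the CONFLuEnCE implementation (which is in Java inside Kepler). The function `group_window`, the record fields, and the use of plain dictionaries for complex events are all assumptions made for illustration. It shows the one point from the talk: for complex events the caller names the grouping element explicitly.

```python
from collections import defaultdict


def group_window(events, key):
    """Partition the events of one window into groups, keeping arrival order.

    `key` extracts the grouping element from an event (hypothetical helper).
    """
    groups = defaultdict(list)
    for event in events:
        groups[key(event)].append(event)
    return dict(groups)


# Hypothetical complex (record-like) events in one window.
window = [
    {"item": "bolt", "warehouse": "A"},
    {"item": "nut", "warehouse": "B"},
    {"item": "bolt", "warehouse": "C"},
]

# The grouping element is the "item" field of each record.
by_item = group_window(window, key=lambda e: e["item"])
```

For simple data types the key can be the identity function, which gives the standard group-by.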
Finally, a key requirement in enabling stream processing is the ability for workflows to receive push updates. We enable this using three techniques. Firstly, from inside the workflow going out.
And now I’ll present details of how the key continuous workflow constructs were implemented.
In summary, CONFLuEnCE is built in Java on top of Kepler. We chose Kepler because of its openness and wide deployment. It is based upon PtolemyII, and it provides a wide range of basic as well as specialized actors. Workflows are composed using a high-level visual language.
Each task in Kepler is modeled as an actor, defined with input and output ports to consume and produce data tokens. Effectively this is the visual interface.
Actors are interconnected in the workflow using channels.
Some complex actors can be composed of other actors; these are classified as sub-workflows and form hierarchies.
The execution and communication semantics are facilitated by an entity called the director. It executes the workflow using a schedule, and since directors define the communication and execution semantics, an actor configuration can be reused with a different director, thus exhibiting different behavior.
Kepler provides a variety of directors but none of them meets the requirements of the continuous workflow model.
So we have developed our own director, which enables the continuous execution of actors by running each one in a separate thread (much like the PN director), adds timestamps to events, and adds the window operator to the input queues of actors as a new type of receiver.
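The thread-per-actor execution with timestamped events can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `source`, `actor`, `run`, and the `STOP` sentinel are hypothetical, plain queues stand in for receivers, and a real continuous workflow would keep running instead of shutting down.

```python
import queue
import threading
import time

STOP = object()  # sentinel used only to end this finite example


def source(out_q, values):
    """Stamp each event with its entry time and push it downstream."""
    for v in values:
        out_q.put((time.time(), v))
    out_q.put(STOP)


def actor(fn, in_q, out_q):
    """Run continuously: block on the input queue, apply fn, forward the result."""
    while True:
        item = in_q.get()
        if item is STOP:
            out_q.put(STOP)
            return
        ts, value = item
        out_q.put((ts, fn(value)))  # the timestamp travels with the event


def run(values, fns):
    """Chain the actors with queues and run each one in its own thread."""
    queues = [queue.Queue() for _ in range(len(fns) + 1)]
    threads = [threading.Thread(target=source, args=(queues[0], values))]
    for i, fn in enumerate(fns):
        threads.append(
            threading.Thread(target=actor, args=(fn, queues[i], queues[i + 1]))
        )
    for t in threads:
        t.start()
    results = []
    while True:
        item = queues[-1].get()
        if item is STOP:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results


results = run([1, 2, 3], [lambda x: x + 1, lambda x: x * 10])
```

Because every actor blocks on its own input queue, no central schedule decides who fires next; events flow as soon as they arrive.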
The receivers are objects contained inside the input ports of any actor. We have also modified the port configuration dialog box to define the size and step of windows, delete_used_token, and the group-by flag and expression.
Two representative applications from the business and science domains
Here is the Kepler interface which defines the workflow. We only implemented the source actors. Everything else is off-the-shelf actors.
Our second example is from the scientific domain. Its development was driven by real astronomers.
Astroshelf is the source of events and the feedback panel. The annotations engine records the metadata. CONFLuEnCE completes the feedback loop by processing the events and producing new metadata. At this point our collaborators are considering testing this prototype in a classroom setting. Details of these interactions are described in the paper.
It turns out Kepler has proven to be a good choice: it allowed us to realize our model within a reasonable time and without unnecessary complexity, and it allowed us to show the usability of our model.
TODO: add pictures and animate. B2B enables dynamic interactions by interpolating internal and external applications. Establish Virtual Enterprises. Middleware infrastructure: “the Grid”. Seamlessly bring together the power of resources to the desktop.
CONtinuous workFLow ExeCution Engine
Panayiotis (Panickos) Neophytou, Panos K. Chrysanthis, Alexandros Labrinidis
CollaborateCom 2011
Advanced Data Management Technologies Lab, Computer Science Department, University of Pittsburgh
Workflows are GREAT!
• Ability to automate processes
• Integrate and orchestrate resources (including humans) seamlessly and effectively
• Service composition
• Process large static data sets
• Keep track of things (provenance)
• Re-usable
• Easy to program (visual languages)
CONFLuEnCE: Implementation and Application Design
High data rates – Push model
• New type of data sources (proactive, unsupported): stock price ticker, Twitter stream, DSMS tuple streams
• Polling: blocking, misses updates
• Data items participate in multiple interleaving WF invocations
Our approach
Goal: Enable monitoring and collaborative applications that involve processing and integration of continuous streams of data.
CONFLuEnCE: Continuous Workflow Execution Engine
• Define the model. [CollaborateCom 2008]
• Develop the new constructs: window semantics, event waves, backwards workflow compatibility, push support.
• Develop the new model of computation: continuously running workflow activities, deadline-driven scheduling.
• Implement prototype. [Demo SIGMOD 2011]
Continuous Workflow Model
• Includes all existing workflow constructs.
• Waves of events to distinguish between event contexts.
• Window operators on queues.
• Continuously running activities.
• Ability to support push communications.
Wave of events
• Distinguish events between multiple invocations of an activity.
• Waves expose provenance during design/execution.
• Allows synchronization of events of the same lineage.
• E.g., customer order: multiple items, multiple handlers.
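The customer-order example can be sketched as follows. This is a hypothetical illustration, not the wave-tag format CONFLuEnCE uses: the `Event` fields (`wave`, `index`, `total`), `split_order`, and `WaveSynchronizer` are all assumed names. The sketch shows only the idea that events of the same lineage carry a shared tag, so a downstream activity can synchronize them even when several orders interleave.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    wave: int     # id of the external event that started this lineage (assumed)
    index: int    # position of this event inside the wave (assumed)
    total: int    # number of events that make up the wave (assumed)
    payload: str


def split_order(wave, items):
    """One customer order fans out into one event per item, all in one wave."""
    return [Event(wave, i, len(items), item) for i, item in enumerate(items)]


class WaveSynchronizer:
    """Collects events per wave and releases a wave once it is complete."""

    def __init__(self):
        self._pending = defaultdict(dict)

    def offer(self, event):
        """Return the wave's payloads in index order when complete, else None."""
        seen = self._pending[event.wave]
        seen[event.index] = event.payload
        if len(seen) < event.total:
            return None
        del self._pending[event.wave]
        return [seen[i] for i in range(event.total)]


# Two orders whose item events arrive interleaved and out of order.
order1 = split_order(1, ["bolt", "nut"])
order2 = split_order(2, ["gear"])
sync = WaveSynchronizer()
first = sync.offer(order1[1])   # order 1 is still incomplete
second = sync.offer(order2[0])  # order 2 completes on its own
third = sync.offer(order1[0])   # order 1 completes now
```

Keeping the tag on every event is also what lets the lineage be inspected later, which is how waves relate to provenance.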
Window Operator
• Apply flexible bounds on an unbounded stream of events
• Size: Token, Time, Wave, Semantics
• Step (period of recalculation): Token, Time, Wave, Semantics
• Delete_used_events flag (after activity has finished executing)
• Triggers activities in combination with preconditions
Window definition: Size=5min, Step=1min, Delete_used_events=true
Activity preconditions: if (window.length >= 2) fire activity
Example: if 2 out-of-stock events occur within 5 min of each other, then notify the manager.
[Figure: timeline of out-of-stock events A to D entering the window, with fired and expired windows marked and the output going to Notify Manager]
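The window definition on this slide (Size=5min, Step=1min, Delete_used_events=true, fire when the window holds at least 2 events) can be simulated with a short Python sketch. The class `TimeWindow` and its methods are hypothetical names, times are plain numbers in minutes, and events are assumed to be inserted in timestamp order; the real operator lives inside a Kepler receiver.

```python
from collections import deque


class TimeWindow:
    """Sliding time window over (timestamp, payload) events; times in minutes."""

    def __init__(self, size, step, delete_used_events=False):
        self.size = size
        self.step = step
        self.delete_used_events = delete_used_events
        self._events = deque()   # assumed to be in timestamp order
        self._next_eval = step   # time of the first evaluation

    def put(self, timestamp, payload):
        self._events.append((timestamp, payload))

    def advance(self, now, precondition, fire):
        """Re-evaluate the window at every step boundary up to `now`."""
        while self._next_eval <= now:
            t = self._next_eval
            self._next_eval += self.step
            # Expire events that fell out of the interval (t - size, t].
            while self._events and self._events[0][0] <= t - self.size:
                self._events.popleft()
            window = [p for ts, p in self._events if ts <= t]
            if precondition(window):
                fire(t, window)
                if self.delete_used_events:
                    # Drop the events the activity has just consumed.
                    self._events = deque(e for e in self._events if e[0] > t)


def run_example(delete_used_events):
    """Out-of-stock events A, B, C at minutes 0.5, 3.5 and 10.5."""
    fired = []
    w = TimeWindow(size=5, step=1, delete_used_events=delete_used_events)
    for ts, name in [(0.5, "A"), (3.5, "B"), (10.5, "C")]:
        w.put(ts, name)
    w.advance(12, lambda win: len(win) >= 2, lambda t, win: fired.append((t, win)))
    return fired
```

With `delete_used_events=True` the manager is notified once, at minute 4, for A and B. With the flag off, the same pair is still in the window at minute 5, so the activity fires a second time before A expires.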
CONFLuEnCE: CONtinuous workFLow ExeCution Engine
• Implements our Continuous Workflow model, in Java, as a module in Kepler
Kepler’s benefits
• Open-source scientific workflow system
• Actor-based workflow modeling
• Built on top of PtolemyII (modeling, simulating, designing concurrent, real-time systems)
• Well-defined models of computation: extensible, pluggable
• Large number of basic and specialized actors (task components)
• High-level visual language
Kepler’s Actor Oriented Modeling: Ports
• Each actor has a set of input and output ports
• Ports produce/consume data (a.k.a. tokens)
Kepler’s Actor Oriented Modeling: Dataflow Connections
• Unidirectional actor “communication” channels
• Connect output ports with input ports
Kepler’s Actor Oriented Modeling: Directors
• Defines the execution and communication semantics of workflow graphs
• Executes the workflow graph (some schedule)
• Sub-workflows may have different directors
• Promotes reusability
[Figure labels: PN Director, SDF Director]
Kepler Directors – Models of Computation
Directors separate the concerns of orchestration and scheduling from conceptual design
• Synchronous Dataflow (SDF)
• Process Networks (PN)
• Dynamic Data Flow (DDF)
• Continuous Time (CT)
• Discrete Event (DE)
• …
Continuous Workflow Director
• CWfs require continuous execution of the actors
• Stream data are events in time and require timestamps
CWf director:
• Extends the PN director
• Adds timestamps on events using TimeKeeper on each actor
• Adds Window Operators on buffer queues (receivers)
Windowed Receiver
• Kepler extension to support window semantics
[Figure labels: CWF Director, I/O Ports, Producer, Consumer, windowed receiver]
Push Communication
• Implemented JSON WebSocket Server Actor (Out->In): listens on a predefined port; converts JSON objects to RecordToken(s); enables continuous connectivity with web browsers
• Implemented HTTP Socket Stream Source Actor (In->Out): connects directly to an HTTP stream source (e.g., Twitter) and receives data continuously
• Implemented the hybrid approach using PubSubHubbub [http://code.google.com/apis/pubsubhubbub/]
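The conversion step shared by these source actors, turning incoming JSON objects into records that are pushed into the workflow, can be sketched with the Python standard library alone. Several things here are assumptions for illustration: the function name `push_json_stream`, newline-delimited framing of the stream, plain dictionaries standing in for Kepler RecordTokens, and an in-memory `StringIO` standing in for the network connection.

```python
import io
import json
import queue


def push_json_stream(stream, out_q):
    """Read newline-delimited JSON and push one record per JSON object.

    Returns the number of records pushed. Blank keep-alive lines,
    malformed lines, and non-object values are skipped.
    """
    pushed = 0
    for line in stream:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(record, dict):
            out_q.put(record)  # the workflow side consumes from this queue
            pushed += 1
    return pushed


# Simulated stream: two valid objects, a keep-alive line, garbage, and an array.
fake_stream = io.StringIO(
    '{"user": "a", "text": "hi"}\n'
    "\n"
    "not json\n"
    "[1, 2]\n"
    '{"user": "b", "text": "yo"}\n'
)
records = queue.Queue()
count = push_json_stream(fake_stream, records)
```

The source never polls: it simply iterates over whatever the connection delivers, so the workflow sees each update as soon as it arrives.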
Supply Chain Management
• Real-time monitoring of a supply chain
• 4 user roles: Customer, Warehouse Mgr, Company Mgr, Admin
Supply Chain Management
[Figure label: CWF Director]
Astroshelf
• A collaboration platform for astrophysicists
• Annotate sky objects and events
• CONFLuEnCE: live annotations & integration
Astroshelf team: Liz Marai, Timothy Luciani, Rebecca Hachey, Roxana Gheorghiu, Boyu Sun
Astronomers: Arthur Kosowsky, Jeffrey Newman, Michael Wood-Vasley, Brian Cherinca, Anja Weyant
Astroshelf
Conclusions
• The Continuous Workflow model: foundation for CONFLuEnCE
• CONFLuEnCE (CONtinuous workFLow ExeCution Engine): built on top of Kepler; includes a new director, a windowed receiver, and source actors enabling push communication
• Two monitoring and collaborative application implementations
• Future: design a director that implements scheduling sensitive to QoS requirements
Supported by NSF grants IIS-0534531 and OIA-1028162
http://db.cs.pitt.edu/group/projects/confluence
http://db.cs.pitt.edu/group/projects/astroshelf
Special thanks to:
Astroshelf team: Liz Marai, Timothy Luciani, Rebecca Hachey, Roxana Gheorghiu
Astronomy collaborators: Arthur Kosowsky, Jeffrey Newman, Michael Wood-Vasley, Brian Cherinca, Anja Weyant
Workflows vs. DSMS vs. CWfs
[Figure: diagram comparing DSMS, CWfs, and WFs. Labels: static configuration, flexibility, QoS/QoD driven, general purpose, declarative & procedural, stream processing, human integration, declarative, feedback loops]