Escience2013-Continuous Data Flow Update Strategies for Mission-Critical Applications

Continuous dataflows complement scientific workflows by allowing the composition of real-time data ingest and analytics pipelines to process data streams from pervasive sensors and “always-on” scientific instruments. Such dataflows are mission-critical applications that cannot suffer downtime, need to operate consistently, and are long running, but may need to be updated to fix bugs or add features. This poses the problem: how do we update a continuous dataflow application with minimal disruption? In this paper, we formalize different types of dataflow update models for continuous dataflow applications, and identify the qualitative and quantitative metrics to be considered when choosing an update strategy. We propose five dataflow update strategies and analytically characterize their performance trade-offs. We validate one of these consistent, low-latency update strategies using the Floe dataflow engine for an eEngineering application from the Smart Power Grid domain, and show its

  1. http://chinesemilitaryreview.blogspot.com/2012/01/plaafs-j-10a-refueling-from-h-6u-badger.html
  2. • Motivation
     • Data flow update types
     • Quantitative/qualitative metrics
     • Update strategies
     • Evaluation
     • Conclusion
     http://www.gilaberttax.com/2013/04/16/targetedpartnership-tax-allocations/
  3. • Gartner, “Big Data,” http://www.gartner.com/itglossary/big-data/
     • Twitter Storm
  4. • http://smartgrid.usc.edu
     • “The Smart Grid Explained, The Hype and The Promise”, WESCO, FMEA/FMPA Conference 2009
  5. • Mission-critical data flows cannot suffer downtime
       – How can continuous dataflow applications be updated with minimal disruption?
     • Evaluating dynamic updates
       – Performance impact: throughput, latency
       – Consistency: data loss, reproducibility
     http://www.ipandora.net/2009/08/09/pray-steadfastly/
  6. • Formalize the different types of data flow update needs
     • Identify the qualitative and quantitative metrics to consider when designing update strategies
     • Introduce five data flow update strategies and analytically characterize their performance metrics
     • Implement a consistent, low-latency update strategy in the Floe continuous dataflow engine and evaluate it against a simple update strategy for a motivating application from the Los Angeles power grid project
     http://www.flickr.com/photos/dhammakaya/7095451689/
  7. • A continuous data flow τ(P, C) is a directed graph
       – P: set of processors
       – C: set of directed edges (channels) connecting processors
  8. • Processor update
     • Channel update
     • Independent sub-graph update
     • Connected sub-graph update
  9. [Figure: data flow with processors P1 to P6; P2 is updated to P2++]
     • Updates to one or more processors
     • |P| remains constant
     • C remains constant
  10. [Figure: data flow with processors P1 to P6]
      • Change in the number of channels or in their connectivity
      • No changes to processors
  11. [Figure: data flow with processors P1 to P6; P2 is updated to P2++]
      • Updates to one or more processors and channels
        – No change in the number of processors
        – Channel connectivity changes; channels added or removed
  12. [Figure: data flow with processors P1 to P6; a connected sub-graph is replaced by new processors labelled P2++]
      • A connected sub-graph in the data flow is replaced by another connected sub-graph
  13. • Quantitative
        – Refresh latency
        – Lag latency
        – Throughput
        – Message loss
      • Qualitative
        – Consistency
        – Interleaved vs. delineated
      http://theculturevulture.co.uk/blog/reviews/what-happened-at-whats-next/
  14. • Refresh latency
        – Time between the start of the update and the first message from the new workflow component
      • Lag latency
        – Time between the start of the update and the time at which the last message from the old workflow is emitted
      • Throughput
        – Drop in message throughput at update time
      • Message loss
        – Are messages lost? If so, how many?
  15. • Consistency
        – Is each message processed consistently through a single version of the data flow?
      • Interleaved vs. delineated
        – Let t_f be the time at which the first message processed by τ_{s+1} is emitted, and t_l the time at which the last message processed by τ_s is emitted
        – The update is delineated if t_f > t_l, and interleaved otherwise
  16. • Naïve Consistent Lossy update (NCL)
      • Naïve Consistent High-latency update (NCH)
      • High-Throughput Inconsistent update (HTI)
      • Message Versioned Consistent update (MVC)
      • Path Versioned Consistent update (PVC)
      https://www.ubat.com/blog/do-i-really-need-a-business-plan/
  17. [Figure: data flow P1 to P6 with the input stream paused]
      • Pause the input stream, terminate the data flow, deploy the new data flow, resume the workflow
        – Consistent
        – Delineated
        – Lag latency = 0
        – Refresh latency = deployment time + min(wave head time)
        – Throughput = 0, starting at the update start time, for a duration equal to the refresh latency
  18. [Figure: data flow P1 to P6 with the input stream paused and in-flight messages flushed]
      • Pause the input stream, flush in-flight messages (TTL_old), terminate the data flow, deploy the new data flow, resume the workflow
        – Consistent
        – Delineated
        – Refresh latency = DT (deployment time) + TTL_old + min(wave head time)
        – Lag latency = TTL_old
        – No message loss
        – Throughput drops to 0
  19. [Figure: data flow P1 to P6]
      • Perform in-place updates upon request
        – Inconsistent
        – Interleaved messages
        – Low latencies (bounds are derived per update type)
  20. [Figure: data flow P1 to P6 with the current version being updated]
      • Tags messages with a version at the sources
      • Message versions are used to find the correct processor/channel/sub-graph
        – Consistent
        – Interleaved
  21. • Extension of MVC
      • Each message is tagged with the path it has taken so far
      • A message is dispatched to the new version if it has already been processed by the new version, or if it has been processed only by components present in both the new and old versions of the workflow
        – Consistent
        – Interleaved
  22. • Implemented MVC in the Floe [1] continuous data flow engine
      • Compared MVC against the Naïve Consistent Lossy update (NCL)
      • Used the message context as the carrier of the data flow version
      [Figure: Floe message structure: Key, Properties<K,V>, Payload]
      [1] https://github.com/usc-cloud/floe
  23. • Update processor “Parse” to “Parse++”
  24. [Figure]
  25. • Online updates to mission-critical continuous data flows are an important problem space
      • Formalized and analyzed
        – Update models
        – Evaluation metrics
        – Update strategies and their trade-offs
      • Empirically evaluated the MVC and NCL update strategies
  26. http://thesciencepresenter.wordpress.com/category/behaviour-management/
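The four update types listed on slides 8 to 12 can be told apart by comparing the old and new graphs τ(P, C). The following is a minimal Python sketch, not the authors' implementation. It assumes that each processor has a stable id plus a version label, and both the representation and the function names are hypothetical. It also treats any change to the set of processor ids as a connected sub-graph update. This is a simplification, because the slides define that case structurally.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Dataflow:
    """A continuous dataflow tau(P, C) modelled as a directed graph.

    processors: processor id -> version label (hypothetical representation)
    channels:   set of directed (source id, sink id) edges
    """
    processors: dict[str, str]
    channels: frozenset[tuple[str, str]]


def classify_update(old: Dataflow, new: Dataflow) -> str:
    """Map an (old, new) dataflow pair to one of the four update types."""
    if old.processors.keys() != new.processors.keys():
        # Processors were added, removed, or swapped for new ones. This sketch
        # treats that as a connected sub-graph replacement (an assumption).
        return "connected sub-graph update"
    updated = {p for p, v in old.processors.items() if new.processors[p] != v}
    rewired = old.channels != new.channels
    if updated and not rewired:
        return "processor update"  # |P| and C unchanged
    if rewired and not updated:
        return "channel update"  # only channels or connectivity change
    if updated and rewired:
        return "independent sub-graph update"
    return "no update"


edges = frozenset({("P1", "P2"), ("P2", "P3")})
v1 = Dataflow({"P1": "1", "P2": "1", "P3": "1"}, edges)
v2 = Dataflow({"P1": "1", "P2": "2", "P3": "1"}, edges)
print(classify_update(v1, v2))  # processor update
```

Separating the two sub-graph cases properly would also require checking whether the changed processors form a connected sub-graph. This sketch does not perform that check.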
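The quantitative metrics on slide 14 can be computed from a log of messages leaving the dataflow. This is a minimal sketch that assumes each emitted message records its id, its emission time, and the dataflow version that produced it. The `Emission` record and all function names are hypothetical. Throughput is measured over a window the caller chooses, so the drop at update time is the difference between a window before the update starts and a window after it.

```python
from dataclasses import dataclass


@dataclass
class Emission:
    """A message leaving the dataflow's sink (hypothetical log record)."""
    msg_id: int
    time: float    # seconds
    version: int   # dataflow version that produced the message


def update_metrics(emissions, update_start, old_version, new_version, sent_ids):
    """Refresh latency, lag latency, and message loss for one update."""
    new_times = [e.time for e in emissions if e.version == new_version]
    old_times = [e.time for e in emissions if e.version == old_version]
    # Refresh latency: update start -> first message from the new version.
    refresh = min(new_times) - update_start if new_times else float("inf")
    # Lag latency: update start -> last message from the old version
    # (0 if the old version emitted nothing after the update began).
    lag = max(0.0, max(old_times) - update_start) if old_times else 0.0
    # Message loss: messages that entered the dataflow but never came out.
    lost = len(set(sent_ids) - {e.msg_id for e in emissions})
    return {"refresh_latency": refresh, "lag_latency": lag, "message_loss": lost}


def throughput(emissions, start, end):
    """Messages per second emitted in the window [start, end)."""
    return sum(start <= e.time < end for e in emissions) / (end - start)


log = [Emission(1, 9.0, 1), Emission(2, 10.5, 1), Emission(4, 12.0, 2)]
print(update_metrics(log, 10.0, 1, 2, sent_ids=[1, 2, 3, 4]))
# {'refresh_latency': 2.0, 'lag_latency': 0.5, 'message_loss': 1}
```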
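The consistency and interleaving criteria on slide 15 translate directly into two checks. The sketch below makes two assumptions. First, each processor instance belongs to exactly one dataflow version, and a message's trace lists those versions. Second, sink emissions are (time, version) pairs. Both representations are hypothetical.

```python
def is_consistent(versions_seen):
    """True if every processor that handled a message came from one dataflow version."""
    return len(set(versions_seen)) <= 1


def ordering(emissions, old_version, new_version):
    """Classify an update as delineated or interleaved.

    emissions: (time, version) pairs observed at the sink.
    t_f: first emission from the new version tau_{s+1}
    t_l: last emission from the old version tau_s
    """
    t_f = min((t for t, v in emissions if v == new_version), default=float("inf"))
    t_l = max((t for t, v in emissions if v == old_version), default=float("-inf"))
    return "delineated" if t_f > t_l else "interleaved"


print(ordering([(1.0, 1), (2.0, 1), (3.0, 2)], 1, 2))  # delineated
print(ordering([(1.0, 1), (3.0, 1), (2.0, 2)], 1, 2))  # interleaved
```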
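The latency formulas for NCL and NCH on slides 17 and 18 are simple sums, sketched here in Python. The slides do not define "wave head time," so this sketch passes it in as an opaque number. DT on slide 18 is read here as the deployment time from slide 17. The function names and the example inputs are hypothetical.

```python
def ncl_metrics(deploy_time, min_wave_head_time):
    """Naive Consistent Lossy: pause, terminate, redeploy, resume."""
    refresh = deploy_time + min_wave_head_time
    return {
        "refresh_latency": refresh,
        "lag_latency": 0.0,                  # as stated on the NCL slide
        "zero_throughput_duration": refresh,  # throughput is 0 for the refresh latency
    }


def nch_metrics(deploy_time, ttl_old, min_wave_head_time):
    """Naive Consistent High-latency: as NCL, but flush in-flight messages first."""
    return {
        "refresh_latency": deploy_time + ttl_old + min_wave_head_time,
        "lag_latency": ttl_old,  # old messages keep draining for TTL_old
    }


# Hypothetical inputs, in seconds.
print(ncl_metrics(4.0, 0.5))
# {'refresh_latency': 4.5, 'lag_latency': 0.0, 'zero_throughput_duration': 4.5}
print(nch_metrics(4.0, 1.5, 0.5))
# {'refresh_latency': 6.0, 'lag_latency': 1.5}
```

The comparison shows the trade-off the names suggest. NCH avoids message loss by paying TTL_old in both refresh latency and lag latency.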
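Slide 19 lists HTI as inconsistent. A toy model shows why. If updated processors are swapped in place while messages are in flight, a message can pass one updated stage before the swap and another after it. The model rests on several assumptions that do not come from the slides: a linear pipeline, one tick per stage, no queueing, and all updated stages swapped at the same tick. The function names are hypothetical.

```python
def hti_versions_seen(updated_stages, inject_ticks, update_tick):
    """Versions of the updated processors that each message passes through.

    Assumes a linear pipeline where a message injected at tick i reaches
    stage s at tick i + s, and HTI swaps every updated stage in place at
    update_tick (old version 1 before, new version 2 from then on).
    """
    return {
        i: [2 if i + s >= update_tick else 1 for s in sorted(updated_stages)]
        for i in inject_ticks
    }


def inconsistent_messages(seen):
    """Messages that met both versions on their way through the dataflow."""
    return sorted(i for i, versions in seen.items() if len(set(versions)) > 1)


seen = hti_versions_seen(updated_stages={1, 3}, inject_ticks=range(6), update_tick=5)
print(inconsistent_messages(seen))  # [2, 3]
```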
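The message versioning on slide 20 can be sketched as a source that stamps each message with the current dataflow version, plus processors that keep one implementation per version. The property key `"version"` and the class names are hypothetical illustrations. So is the rule that an unchanged processor serves later versions with its latest implementation. Slide 22 says only that Floe carries the version in the message context, and this is not Floe's API. The names echo the “Parse” to “Parse++” update on slide 23.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    payload: object
    # Context carried with the message; the "version" key is a hypothetical name.
    properties: dict = field(default_factory=dict)


class VersionedSource:
    """Tags each message with the dataflow version current at emission time."""

    def __init__(self):
        self.version = 1

    def emit(self, payload):
        return Message(payload, {"version": self.version})

    def start_update(self):
        self.version += 1  # messages emitted from now on belong to the new version


class VersionedProcessor:
    """Keeps every live implementation of one processor, keyed by version."""

    def __init__(self, fn):
        self.impls = {1: fn}

    def add_version(self, version, fn):
        self.impls[version] = fn

    def process(self, msg):
        v = msg.properties["version"]
        # Latest implementation not newer than the message's version; a
        # processor untouched by an update keeps serving later versions.
        fn = self.impls[max(k for k in self.impls if k <= v)]
        return Message(fn(msg.payload), dict(msg.properties))


source = VersionedSource()
parse = VersionedProcessor(str.strip)
old_msg = source.emit("  a  ")           # tagged version 1, still in flight
parse.add_version(2, lambda s: s.strip().upper())
source.start_update()
new_msg = source.emit("  b  ")           # tagged version 2
print(parse.process(old_msg).payload)    # a
print(parse.process(new_msg).payload)    # B
```

Version-1 messages still in flight go through the old implementation, while new messages go through Parse++. As a result, outputs of both versions can reach the sink at overlapping times, which matches the slide's "consistent but interleaved" classification. The sketch does not retire an old implementation once no messages tagged with that version remain.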
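The PVC dispatch rule on slide 21 can be written as a small decision function over the path a message has taken. This sketch assumes that components are identified by string ids and that each message carries its path as a list; both representations are hypothetical. A tag fixed at the source cannot change, but under this rule a message that has passed only through shared components can move to the new version mid-flight.

```python
def choose_version(path, old_components, new_components):
    """Decide which dataflow version a message should continue in.

    path: ids of the components the message has already passed through.
    """
    visited = set(path)
    new_only = new_components - old_components
    shared = old_components & new_components
    if visited & new_only:
        return "new"  # already processed by the new version
    if visited <= shared:
        return "new"  # only seen components common to both versions
    return "old"      # touched an old-only component: stay on the old version


old = {"P1", "P2", "P3"}
new = {"P1", "P2++", "P3"}
print(choose_version(["P1"], old, new))          # new
print(choose_version(["P1", "P2"], old, new))    # old
print(choose_version(["P1", "P2++"], old, new))  # new
```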
