Dependable Operations

DevOpsDay Downunder 2013 talk

Published in: Technology

  1. Dependable Operation
     Dr. Liming Zhu
     Software Systems Research Group, NICTA (National ICT Australia) & University of New South Wales
     DevOps Days Downunder, 2013
     Liming.Zhu@nicta.com.au
     slideshare.net/LimingZhu/
     NICTA Copyright 2012. From imagination to impact.
  2. Motivation
     • Applications fail due to operation issues
       – Gartner report: 80% of outages are caused by people/process issues
     • Sporadic activities: replication/failover, auto-scaling, upgrade…
       – The point is not that dependability issues may trigger mitigating operations, but the converse: dependability is affected, often unexpectedly, by these mitigating activities and other sporadic activities
       – Lessons from our own cloud disaster recovery product: Yuruware.com
     • Complex interleaving “sporadic” processes/activities
       – Scripts, tools, humans
       – Activities auto-triggered by policies, monitoring and analysis
       – Logs/events often lack the “process context”
  3. Our Process-Oriented Approach
     • Existing artifact-oriented and state-based research
       – Log analysis linking back to issues in source code
       – Static configuration analysis and constraint checking
       – State-based system-level models
     • We treat an operation as a set of steps
       – Executed by fault-prone agents (scripts/tools/humans)
       – Requiring various fault-prone resources (computing/nodes/environ)
       – Faults at one step may surface later at another step
       – Exception handling: error diagnosis, undo/redo, fixing, tolerating…
  4. What We Are Working On
     • Undo Framework and Undo-ability of Operations
       – AWS Cloud API wrapper to allow undo
       – Use AI planning to check undo-ability and plan an undo path
     • Model, Monitor and Simulate Operations
       – Post-condition verification and monitoring of steps
       – Use monitored process context for error diagnosis and recovery
       – Simulate large-scale operations: probability/time of successful completion, bottlenecks and problems
     • Process Mining from Logs
       – Mine a process from existing log files
       – Detect deviations early or help error diagnosis
     Tell us the right problems and approaches!
     Liming.Zhu@nicta.com.au
     slideshare.net/LimingZhu/
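
The process-oriented view in the slides (an operation as a set of steps, each with a post-condition check and an undo action) can be illustrated with a small sketch. This is not the NICTA framework or its AWS API wrapper; the names `Step`, `run_operation`, `launch` and `register` are hypothetical, the "cloud" is just a dictionary, and undo actions are assumed to be idempotent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, object]  # stand-in for real cloud resources (assumption)


@dataclass
class Step:
    """One step of an operation: an action, its undo, and a post-condition."""
    name: str
    do: Callable[[State], None]
    undo: Callable[[State], None]          # assumed to be idempotent
    postcondition: Callable[[State], bool]


def run_operation(steps: List[Step], state: State) -> List[str]:
    """Run steps in order. If a step raises or fails its post-condition,
    undo that step and all completed steps in reverse order."""
    log: List[str] = []
    done: List[Step] = []
    for step in steps:
        try:
            step.do(state)
            ok = step.postcondition(state)
            log.append(f"{step.name}: {'ok' if ok else 'postcondition failed'}")
        except Exception as exc:
            ok = False
            log.append(f"{step.name}: error ({exc})")
        if not ok:
            for prev in reversed(done + [step]):
                prev.undo(state)
                log.append(f"{prev.name}: undone")
            return log
        done.append(step)
    return log


# Hypothetical example: launch an instance, then register it.
def launch(state: State) -> None:
    state["instances"] = int(state["instances"]) + 1


def unlaunch(state: State) -> None:
    state["instances"] = int(state["instances"]) - 1


def register(state: State) -> None:
    pass  # simulates a fault-prone agent that silently does nothing


def unregister(state: State) -> None:
    state.pop("registered", None)


state: State = {"instances": 2}
steps = [
    Step("launch", launch, unlaunch, lambda s: s["instances"] == 3),
    Step("register", register, unregister, lambda s: s.get("registered") is True),
]
log = run_operation(steps, state)
for line in log:
    print(line)
```

Here the silent failure in `register` is caught by its post-condition rather than by an exception, and the log records which step the problem surfaced at, which is the kind of process context the slides say raw logs often lack.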
