Dependable Operations

NICTA Copyright 2012 From imagination to impact
Dependable Operations
Dr. Liming Zhu
Software Systems Research Group
NICTA (National ICT Australia) &
University of New South Wales
DevOps Days Downunder, 2013
Liming.Zhu@nicta.com.au slideshare.net/LimingZhu/

Motivation
• Applications fail due to operation issues
– Gartner report: 80% of outages are caused by people/process issues
• Sporadic activities: replication/failover, auto-scaling, upgrade…
– The point is not that dependability issues trigger mitigating operations,
but the converse:
• dependability is affected, often unexpectedly, by these mitigating
activities and other sporadic activities
– Lessons from our own cloud disaster recovery product:
Yuruware.com
• Complex interleaving “sporadic” processes/activities
– Scripts, tools, humans
– Activities auto-triggered by policies, monitoring and analysis
– Logs/Events often lack the “process-context”

Our Process-Oriented Approach
• Existing artifact-oriented and state-based research
– Log analysis linking back to issues in source code
– Static configuration analysis and constraint checking
– State-based system-level models
• We treat an operation as a set of steps
– Executed by fault-prone agents (scripts/tools/humans)
– Requiring various fault-prone resources (computing/nodes/environment)
– Faults at one step may surface later at another step
– Exception handling: error diagnosis, undo/redo, fixing, tolerating…
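The step-based view above can be sketched in a few lines. This is only an illustrative sketch, not the framework from the talk: the `Step` class, `run_operation`, and the "launch"/"attach" steps are hypothetical names, and the in-memory `state` dictionary stands in for real cloud resources. Each step carries an action, a post-condition check, and an undo action. A step that fails silently is caught by its post-condition, and completed steps are then undone in reverse order.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, object]  # hypothetical stand-in for real system state


@dataclass
class Step:
    name: str
    action: Callable[[State], object]        # does the work; may raise
    postcondition: Callable[[State], bool]   # verifies the step's effect
    undo: Callable[[State], object]          # compensating action (idempotent)


def run_operation(steps: List[Step], state: State) -> List[str]:
    """Run steps in order; on failure, undo in reverse order and stop."""
    log: List[str] = []
    done: List[Step] = []
    for step in steps:
        try:
            step.action(state)
            if not step.postcondition(state):
                raise RuntimeError(f"postcondition failed: {step.name}")
        except Exception as exc:
            log.append(f"FAILED {step.name}: {exc}")
            # The failed step may have had partial effects, so undo it too.
            for prev in reversed(done + [step]):
                prev.undo(state)
                log.append(f"UNDONE {prev.name}")
            return log
        done.append(step)
        log.append(f"OK {step.name}")
    return log


# Hypothetical two-step operation: the second step "succeeds" silently
# without attaching anything, which only the post-condition detects.
steps = [
    Step("launch",
         action=lambda s: s.update(instance="i-demo"),
         postcondition=lambda s: "instance" in s,
         undo=lambda s: s.pop("instance", None)),
    Step("attach",
         action=lambda s: None,
         postcondition=lambda s: "volume" in s,
         undo=lambda s: s.pop("volume", None)),
]
state: State = {}
log = run_operation(steps, state)
```

The undo actions here are written to be idempotent, so undoing a step that never took effect is harmless. A real undo path would also need to handle undo actions that themselves fail.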

What We Are Working On
• Undo Framework and Undo-ability of Operations
– AWS Cloud API wrapper to allow undo
– Use AI Planning to check undo-ability and plan undo path
• Model, Monitor and Simulate Operations
– Post-condition verification and monitoring of steps
– Use monitored process context for error diagnosis and recovery
– Simulate large-scale operations: probability/time of successful
completion, bottlenecks and problems
• Process Mining from Logs
– Mine a process from existing log files
– Detect deviation early or help error diagnosis
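As a rough illustration of the process-mining idea, the sketch below learns which activity may directly follow which from past traces, then reports the first transition in a running trace that was never observed. It is a deliberately simple directly-follows model, not the mining algorithm used in this work, and the activity names (a made-up rolling-upgrade log) and function names are assumptions for the example.

```python
from typing import Iterable, List, Optional, Set, Tuple

START, END = "<start>", "<end>"
Transition = Tuple[str, str]


def mine_directly_follows(traces: Iterable[List[str]]) -> Set[Transition]:
    """Collect every (previous, next) activity pair seen in the traces."""
    allowed: Set[Transition] = set()
    for trace in traces:
        path = [START, *trace, END]
        allowed.update(zip(path, path[1:]))
    return allowed


def first_deviation(allowed: Set[Transition],
                    trace: List[str]) -> Optional[Tuple[int, str, str]]:
    """Return (index, previous, current) for the first unseen transition.

    The trace may still be in progress, so the END marker is not checked.
    """
    prev = START
    for i, activity in enumerate(trace):
        if (prev, activity) not in allowed:
            return (i, prev, activity)
        prev = activity
    return None


# Hypothetical activity sequences extracted from past upgrade logs.
history = [
    ["deregister", "terminate", "launch", "register"],
    ["deregister", "terminate", "launch", "health_check", "register"],
]
model = mine_directly_follows(history)

conforming = first_deviation(model, ["deregister", "terminate", "launch"])
deviating = first_deviation(model, ["deregister", "launch"])
```

Because the check runs on a prefix of the trace, a deviation such as launching before terminating can be flagged as soon as the offending log line appears, and the reported pair gives the process context for diagnosis.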
Tell us the right problems and approaches!
