A Large-Scale Industrial Case Study on Architecture-based Software Reliability Analysis

Talk from ISSRE 2010

  • Why is this done? Benefits:
    - Determine the components contributing most to the software architecture
    - Allocate testing efforts, set goals for testing units
    - Evaluate design alternatives, improve the architecture
    - More reliable system, quantitative numbers
  • Report on experiences and methods used
    Lessons learned
    What needs to be improved (from our perspective)
  • 3 MLOC C++, COM, ATL
    9 subsystems, >100 components
    Managing industrial processes (e.g., power generation, paper production, oil and gas refining)
    Distributed system: controllers, servers, networks, field devices
    Operator workplace for controlling the process: monitoring sensor readings, manipulating actuators
  • - also agenda of the rest of the talk
  • Larger font, less text
  • Selected the Littlewood/Verrall model from IEEE Std. 1633
    Industry affinity (SCADA), good fit in initial tests
    Time between failures is exponentially distributed
    Repair may introduce new faults, repair time = 0
    The failure rate is a random variable with a Gamma distribution
    We were able to fit the whole dataset without filtering data at the 5% significance level with the quadratic Littlewood/Verrall model (LV-Q)
    Failure reports are often not mapped to components in bug tracking systems
    Difficult to select a model: too many models available, statistical validity hard to establish
  • Failure data from the bug tracker, filtered for critical/high severity bugs
    Quadratic model: programmers have good intentions in fixing the code
    Done for each subsystem; result: 9 failure probabilities
  • Installed and configured the system
    Defined 2 load profiles, configured load drivers
    Configured ABB tool to log subsystem transitions
    Executed load drivers for each profile (2 days)
    Processed logs (2 GB) with a script: added initial and final state, calculated transition probabilities
    Validated the model: compared with architectural documentation, interviewed PCS experts
  • Q: transition probability matrix (obtained by eliminating the failure state)
    S: steady state probabilities
    R: system reliability (probability of reaching the success state)
  • Units obfuscated for confidentiality reasons
    Subsystem 8 has the highest failure probability
    Subsystem 1 has the highest sensitivity to system reliability
    Subsystem 6 is used by many subsystems, but makes only a limited contribution to system reliability
  • Explain the distribution
    Many variation points, limited step-by-step guidance
    Time-consuming data collection for non-experts
    Best for small changes to existing systems
    Needs to be tailored to available data
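
The log-processing steps in the notes above (add an initial and a final state, then calculate transition probabilities) can be sketched roughly as follows. This is a minimal Python illustration, not the script used in the study: the trace format, the `START`/`END` state names, the subsystem names `S1` to `S3`, and the function name `estimate_transition_probabilities` are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def estimate_transition_probabilities(traces, initial="START", final="END"):
    """Estimate first-order transition probabilities from logged traces.

    Each trace is a list of subsystem names in the order control passed
    through them. A hypothetical initial and final state is added to
    every trace before counting transitions.
    """
    counts = defaultdict(Counter)
    for trace in traces:
        path = [initial, *trace, final]
        # Count every observed hand-over from one state to the next.
        for src, dst in zip(path, path[1:]):
            counts[src][dst] += 1

    # Normalise each row so the outgoing probabilities of a state sum to 1.
    probabilities = {}
    for src, outgoing in counts.items():
        total = sum(outgoing.values())
        probabilities[src] = {dst: n / total for dst, n in outgoing.items()}
    return probabilities


# Made-up traces for illustration; the real logs are not part of the talk.
traces = [
    ["S1", "S2", "S1", "S3"],
    ["S1", "S3"],
]
P = estimate_transition_probabilities(traces)
for state, row in P.items():
    print(state, row)
```

Relative frequencies are only one possible estimator; with sparse logs, rarely used transitions may be missed entirely, which is one reason the notes mention validating the model against architectural documentation and expert interviews.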

Presentation Transcript

  • © ABB Group
    November 3, 2010 | Slide 1
    A Large-Scale Industrial Case Study on Architecture-based Software Reliability Analysis
    Heiko Koziolek, Bastian Schlich, Carlos Bilich, ABB Corporate Research, 2010-11-01
  • Architecture-based Software Reliability Analysis (ABSRA): What?
    Typical questions of software architects concerning reliability
    "What is the reliability (probability of failures) in my system?"
    "How do individual components contribute to the system reliability?"
    "Which architectural alternative is best for reliability?"
    "Where shall I introduce fault-tolerance mechanisms?"
    "How to distribute my limited testing efforts among components?"
    Additional questions by ABB
    "How much more reliable is a new architecture than a former one?"
    "Does ABSRA work on large-scale systems?"
  • Architecture-based Software Reliability Analysis (ABSRA): How?
    [Figure: workflow with the steps combine, transform, solve, and improve. Software (components, control flow, reliabilities; example component values R=0.995, R=0.982, R=0.937), Markov model, Markov model solution, predicted system reliability (R = 0.9923).]
  • Related work: Existing empirical studies
    "… very little effort has been devoted to the validation of architecture-based software reliability techniques." [Gokhale2007, IEEE Transactions on Dependable and Secure Computing, Vol. 4, No. 1]
  • System under study: Process control system
  • System under study: Process control system (topology)
    [Figure: system topology, top to bottom: Remote Workplaces, Internet, Firewall, Plant / Office Network (with Remote Workplaces), Network Isolation Device, Workplaces and Servers, Redundant Network, Controllers, Fieldbus, Remote I/O and Field devices.]
  • System under study: Process control system (subsystems within the servers)
  • Which steps are required for ABSRA?
  • Estimate component failure probabilities: Existing methods
  • Reliability growth modeling: General principle
    Littlewood/Verrall Model
  • Reliability growth modeling: Using the Littlewood/Verrall model on one subsystem
    Filtered subsystem bug list
    Release dates
    Curve fitting in CASRE 3.0
    http://www.openchannelsoftware.com/projects/CASRE_3.0/
  • Reliability growth modeling: Result
    R1= ...
    R7= ...
    R8= ...
    R6= ...
    R4= ...
    R5= ...
    R2= ...
    R3= ...
  • Which steps are required for ABSRA?
  • Estimate component transition probabilities: Existing methods
  • Estimate component transition probabilities: Profiling with proprietary tools
    Set up and ran the system
    Self-coded script
    Example trace from profiling
  • Which steps are required for ABSRA?
  • Construct the Markov model: Existing state-based methods
    [Goseva-Popstojanova2001]
  • Cheung model: Adding failure & end states, compute reliability
    [Cheung1980]
  • Which steps are required for ABSRA?
  • Exploit the results: Possibilities
  • Sensitivity analysis: Impact of varying subsystem failure rates
    http://www.prismmodelchecker.org/
  • Evaluation: Cost estimations in person hours (best/worst case)
  • Conclusions: Lessons learned
    Getting failure and transition probabilities is hard
    Time consuming, error-prone, limited automation
    => Main obstacle for ABSRA is data collection
    Currently rather simple models
    No technologies, concurrency, hardware
    Difficult to evaluate architecture alternatives
    => Limited decision support from the predictions
    Lack of empirical studies in literature
    Predominantly small systems
    Often dubious techniques for estimating failure rates
    => Replicated case studies needed
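
The Cheung model and sensitivity analysis slides above can be illustrated with a small, self-contained Python sketch. It assumes the usual formulation attributed to [Cheung1980]: control moves from component i to component j with probability R_i * P_ij, the last component is the only exit, and system reliability is the probability of reaching the success state from the first component. The matrix `p`, the reliabilities `r`, and the function names are made up for the example; they are not data or tooling from the case study, and the sketch does not use PRISM, whose URL appears on the sensitivity analysis slide.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(m[k][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for k in range(col + 1, n):
            factor = m[k][col] / m[col][col]
            for c in range(col, n + 1):
                m[k][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def system_reliability(p, r):
    """Probability of reaching the success state, starting in component 0.

    p[i][j] is the probability that control passes from component i to j,
    r[i] is the reliability of component i, and the last component is
    assumed to be the only exit (its row in p is all zeros).
    """
    n = len(r)
    # Build (I - Q) with Q[i][j] = r[i] * p[i][j].
    a = [[(1.0 if i == j else 0.0) - r[i] * p[i][j] for j in range(n)]
         for i in range(n)]
    # Only the last component can reach success directly, with probability r[-1].
    b = [0.0] * (n - 1) + [r[-1]]
    return solve(a, b)[0]


def sensitivities(p, r, delta=1e-4):
    """Finite-difference estimate of d(system reliability) / d(r[i])."""
    base = system_reliability(p, r)
    result = []
    for i in range(len(r)):
        perturbed = r[:]
        perturbed[i] -= delta
        result.append((base - system_reliability(p, perturbed)) / delta)
    return result


# Hypothetical three-component system; all numbers are illustrative only.
p = [
    [0.0, 0.6, 0.4],
    [0.2, 0.0, 0.8],
    [0.0, 0.0, 0.0],
]
r = [0.99, 0.98, 0.995]
print("system reliability:", system_reliability(p, r))
print("sensitivities:", sensitivities(p, r))
```

A component with a large sensitivity value is one where a small drop in its own reliability costs the most system reliability, which is the kind of ranking the talk uses to distinguish, for example, the subsystem with the highest failure probability from the one with the highest impact.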