The quality attribute of upgradability

Published in Technology, Business
Transcript

  • 1. The Quality Attribute of Upgradability
    Len Bass with Hiroshi Wada, Ingo Weber, Liming Zhu, Ross Jeffery
    NICTA Copyright 2012. From imagination to impact
  • 2. About NICTA
    National ICT Australia
    • Federal and state funded research company established in 2002
    • Largest ICT research resource in Australia
    • National impact is an important success metric
    • ~700 staff/students working in 5 labs across major capital cities
    • 7 university partners
    • Providing R&D services, knowledge transfer to Australian (and global) ICT industry
    NICTA technology is in over 1 billion mobile phones
  • 3. Consider the following sequence.
    • You have prepared an upgrade to an existing large enterprise system
      – You have coded it
      – You have tested it
      – It is ready!!
    • Alternatively, the IT department (or you) get a package from a third party (a vendor or open source) that has been coded and tested.
    • What happens then?
  • 4. Consider the following sequence.
    • You have prepared an upgrade to an existing large enterprise system
      – You have coded it
      – You have tested it
      – It is ready!!
    • Alternatively, the IT department (or you) get a package from a third party (a vendor or open source) that has been coded and tested.
    • What happens then?
      – ~10% of the time the upgrade will fail.
  • 5. This is the upgradability problem
    • How do we make upgrading a system less problematic?
    • Talk outline
      – Characteristics of the upgrade problem
      – FMEA analysis
        • Possible causes of failure
        • Failure prevention, detection, and recovery
      – Relation to existing product and process quality work
  • 6. Upgrades to enterprise systems are a very common occurrence
    Upgrade frequency of some common systems (application: average release interval)
      – Facebook (platform): < 7 days
      – Google Docs: < 50 days
      – MediaWiki: 21 days
      – Joomla: 30 days
    This frequency suggests it is important to get the upgrades correct
  • 7. Unfortunately, Upgrades Fail Often
    • 4.6-10 component failures each month in three large-scale Internet services, mostly during regular maintenance
    • Average and maximum failure rates from a survey of systems administrators are 8.6% and 50%.
    • Some claim that user-visible failures from upgrades outweigh user-visible failures from software errors.
  • 8. Why is this?
    • Installation is complicated.
      – Installation guides for SAS 9.3 Intelligence, IBM i, Oracle 11g for Linux are ~250 pages each
      – Apache description of addresses and ports (one out of 16 descriptions) has the following elements:
        • Choosing and specifying ports for the server to listen to
        • IPv4 and IPv6
        • Protocols
        • Virtual Hosts
      – The number of configuration options that must be set can be large
        • Hadoop has 206 options
        • HBase has 64
      – Many dependencies are not visible until execution
  • 9. Provides Research Agenda
    • Indeed, the surprise is not that upgrades fail 8.6% of the time but that they are successful 91.4% of the time.
    • Rich area for research.
  • 10. What kind of problem is this - product?
    • ISO 25010 provides
      – A quality in use model composed of five characteristics (some of which are further subdivided into subcharacteristics) that relate to the outcome of interaction when a product is used in a particular context of use.
      – I.e. is upgradability a quality of the system being upgraded?
    • The answer is yes.
  • 11. What kind of problem is this - process?
    • ITIL (Information Technology Infrastructure Library)
      – Change Management aims to ensure that standardised methods and procedures are used for efficient handling of all changes.
    • SPICE (ISO 15504)
      – Process assessment provides the means of characterizing the current practice within an organizational unit in terms of the capability of the selected processes.
    • Is upgradability a quality of the process used to manage information technology?
    • The answer is yes.
  • 12. Upgradability is a hybrid quality problem
    • A hybrid quality problem is one in which improvement involves both product and process and in which the product has process awareness.
    • Many product centered conferences
      – Dependability
      – Security
      – …
    • Some process centered conferences
      – Software Process Improvement
      – SPICE
      – SPEG
      – …
  • 13. Hybrid quality improvement is not well served by the academic community
    • Hybrid quality improvement, as we shall see, involves close interaction between product, process and tools to support the process.
    • Venues that should emphasize this interaction include
      – Profes (Product focused Software Development and Process Improvement)
      – ASQ (Conference on Quality and Improvement)
    • Yet an examination of the CFPs and proceedings for these conferences shows a distinction between process activities and product characteristics
    • We will present the results of an FMEA (Failure Mode and Effects Analysis) style analysis for upgradability and then return to the hybrid quality issue
  • 14. FMEA
    • Failure Mode and Effects Analysis is an inductive analysis of failure modes.
    • FMEA involves describing
      – Potential failure modes
      – The severity and likelihood of these failures.
    • We will focus on the first portion and generate the potential failure modes as well as potential prevention, detection, and recovery from these failures.
    • I.e. we are performing an FMEA style analysis, not an FMEA per se.
  • 15. Scenario for Upgradability
    • We are concerned with the following scenario
      – Version N+1 of an enterprise system is available for deployment.
        • Version N+1 can be deployed by developers
        • Version N+1 can be deployed by the Information Technology Department (the Release Manager if there is one).
      – Version N+1 is completely coded and tested by its developers.
    • Measures can include
      – Downtime
      – Resources (hardware or personnel) required to perform the upgrade
      – Number of failed attempts to install the upgrade
  • 16. Fundamental goals during upgrade
    • The literature identifies four fundamental goals while an upgrade is occurring.
      – Efficiently manage resources
      – Completely and correctly specify configurations
      – Manage multiple versions to avoid problems with version mismatch.
      – Maintain consistency of persistent data.
    • Failures are caused by the violation of one of these fundamental goals.
      – Our FMEA analysis will look at potential causes for violations of one of these goals.
  • 17. Activities during an upgrade of a system
    • Make the upgrade available.
    • Prepare the environment. Ensure that there are sufficient resources available for installation and that assumed software is available.
    • Configuration
    • Deployment
    • Activation
  • 18. Organization of next portion of the presentation
    • For each activity
      – Potential fault (a fault is a failure in waiting)
      – Prevention of the fault
      – Detection of the fault
      – Correction of the fault
    • Research opportunity
      – Blank cell
      – Cell with only partial coverage
  • 19. Make Upgrade available
    Columns: Fault possibility | Prevention | Detection | Recovery
    – Element omitted/included incorrectly in installing software: manifest, bill of lading; recreate distribution
    – System corrupted during movement: hash code, checksum; retransmit
    – Source of distribution from an untrusted site: digital signature
    – Forgotten/misplaced credentials: separate secret; independent channel for new credentials
    – Credential verifier unavailable: codify acceptable credentials in distribution
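The manifest and checksum cells of this slide can be illustrated with a short sketch. It is not taken from the talk: the function names `sha256_of`, `verify_package` and `manifest_diff` are hypothetical, and SHA-256 is an assumed choice of hash. The sketch only shows how a corrupted or incomplete distribution might be detected before installation starts.

```python
import hashlib
import hmac


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of a distribution's bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_package(data: bytes, expected_digest: str) -> bool:
    """Detect corruption during movement by comparing digests.

    expected_digest is assumed to arrive over a separate, trusted channel.
    """
    return hmac.compare_digest(sha256_of(data), expected_digest)


def manifest_diff(manifest: set[str], delivered: set[str]) -> tuple[set[str], set[str]]:
    """Return (omitted, unexpected) elements relative to the manifest."""
    return manifest - delivered, delivered - manifest
```

If `verify_package` returns False, the recovery named on the slide is to retransmit; a non-empty result from `manifest_diff` points to recreating the distribution.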
  • 20. Prepare environment
    Columns: Fault possibility | Prevention | Detection | Recovery
    – Incorrect versions of support libraries: include version number in APIs; encode hash of specification; utilize services to announce incompatibilities
    – Multiple versions of support libraries simultaneously required: include version number in name; libraries expose version numbers; linkers version aware
    – Insufficient resources: rolling upgrade
    – Schema modification on database: convert data to new schema prior to upgrade
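As a rough illustration of "libraries expose version numbers", the following sketch checks installed support libraries against minimum required versions during environment preparation. The helper names and the dotted numeric version format are assumptions made for the example, not something the slides specify.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Parse a dotted numeric version such as '1.10.0' (assumed format)."""
    return tuple(int(part) for part in text.split("."))


def is_older(found: str, minimum: str) -> bool:
    """Compare versions after padding with zeros, so '2.0' equals '2.0.0'."""
    a, b = parse_version(found), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a < b


def find_incompatible(required: dict[str, str], installed: dict[str, str]) -> dict[str, str]:
    """Map each problematic library name to a short description."""
    problems = {}
    for name, minimum in required.items():
        found = installed.get(name)
        if found is None:
            problems[name] = "missing"
        elif is_older(found, minimum):
            problems[name] = f"found {found}, need >= {minimum}"
    return problems
```

Running such a check before deployment turns a fault that would otherwise surface at execution time into one detected during preparation.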
  • 21. Configuration
    Columns: Fault possibility | Prevention | Detection | Recovery
    – Missing parameter: parameter database; parameter built into tool; static analysis of code
    – Incorrectly specified: abstract specification; check parameter syntax; validate against a specification
    – Inconsistent parameters: constraint checker
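The three configuration faults on this slide (missing, incorrectly specified, inconsistent) can be sketched as one small checker. Everything here is hypothetical: the parameter names, the regular expressions in `SPEC`, and the single cross-parameter constraint are invented for illustration and do not describe any real product's configuration.

```python
import re

# Hypothetical specification: parameter name -> required syntax.
SPEC = {
    "listen_port": re.compile(r"\d{1,5}"),
    "admin_port": re.compile(r"\d{1,5}"),
    "bind_address": re.compile(r"[0-9a-fA-F:.]+"),
}


def check_config(config: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the config passed."""
    errors = []
    for name, pattern in SPEC.items():
        if name not in config:
            errors.append(f"missing parameter: {name}")
        elif not pattern.fullmatch(config[name]):
            errors.append(f"bad syntax: {name}")
    # Constraint check across parameters, only once each one is individually valid.
    if not errors and config["listen_port"] == config["admin_port"]:
        errors.append("inconsistent parameters: listen_port and admin_port must differ")
    return errors
```

With options numbering in the hundreds, as cited earlier for Hadoop, a machine-readable specification of this kind is what makes the detection column practical.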
  • 22. Deployment
    Columns: Fault possibility | Prevention | Detection | Recovery
    – Insufficient resources: pre-allocate during preparation; rolling upgrade
    – Inconsistent hardware: verify during preparation
    – Operator error: undo mechanism
  • 23. Activation
    Columns: Fault possibility | Prevention | Detection | Recovery
    – Discovered hidden dependency: monitoring; recovery block
    – Multiple simultaneous versions: separation; Dynamic Software Update; version aware code and data
    – Automatic translation of data when old schema is used
    – Version aware load balancer
  • 24. Our activities in this space so far (green cells)
    • Mixed version race condition solution
    • Operator undo
  • 25. What is the “mixed version race condition”?
    • Common practice when pushing an upgrade to a large number of servers is to upgrade one (or several) servers at a time
    • This means that version N+1 (the new version) will be available on some servers and version N (the old version) will be available on other servers.
    • Suppose version N+1 has functionality not available in version N
  • 26. Now consider the following sequence
    1. A client (browser) issues a request that is routed by the load balancer to an instance of version N+1
    2. Version N+1 sends JavaScript assuming new functionality back to the client.
    3. Client sends an AJAX request that utilizes new functionality and the load balancer routes it to an instance of version N.
    4. Error because version N does not have the new functionality.
  • 27. Mixed Version Race Condition
    [Sequence diagram between client (browser) and server: 1. Start rolling upgrade. 2. Initial request. 3. HTTP reply with embedded JavaScript (new version). 4. AJAX callback (old version). 5. ERROR]
  • 28. What does the solution involve?
    1. Label communication between instances and the client with version information
    2. Modify load balancer so that messages are routed to an appropriate version
    3. Modify load balancer so that messages are balanced across all child instances.
  • 29. Why is this a hard problem?
    • Large installations have multiple distributed load balancers that must be kept in sync, i.e. some load balancers may know about the new version and some may not
    • Not enough to put the version number in the message
      – Suppose the second request goes to a load balancer that does not yet know about version N+1.
    • Must keep messages balanced so that all servers handle roughly the same number of requests.
    [Diagram labels: /service, /service, /service/vN, /service/vN+1, /service/vN, and six servers]
  • 30. Operator undo
    • After performing an operation in AWS, may want to go back to the original state, i.e. undo the operation
    • Not always that straightforward:
      – Attaching a volume is no problem while the instance is running, detaching might be problematic
      – Creating / changing auto-scaling rules has an effect on the number of running instances
        • Cannot terminate additional instances, as the rule would create new ones!
      – Deleted / terminated / released resources are gone!
  • 31. Undo for System Operators
    [Diagram: Administrator timeline: begin-transaction, do, do, do, rollback; plus commit and pseudo-delete]
  • 32. Approach
    [Diagram: Administrator timeline: begin-transaction, do, do, do, rollback. Undo System: sense cloud resources states (shown twice)]
  • 33. Approach
    [Diagram: Administrator timeline: begin-transaction, do, do, do, rollback. Undo System: sense cloud resources states (shown twice); goal state; initial state]
  • 34. Approach
    [Diagram: Administrator timeline: begin-transaction, do, do, do, rollback. Undo System: sense cloud resources states (shown twice); goal state; initial state; set of actions; generate plan; execute code]
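The diagram's pipeline (sense states, derive goal and initial state, generate a plan, execute) can be approximated with a very small sketch. It makes several assumptions that the slides do not confirm: the state captured at begin-transaction is treated as the goal, the state sensed at rollback as the starting point, resources are plain dictionaries keyed by hypothetical IDs, and a simple dictionary diff stands in for whatever planner the Undo System actually uses.

```python
import copy


def plan_undo(current: dict[str, dict], goal: dict[str, dict]) -> list[tuple]:
    """Generate actions that turn the current resource state into the goal state."""
    actions: list[tuple] = []
    for rid in sorted(current.keys() - goal.keys()):
        actions.append(("delete", rid))            # resource created since the snapshot
    for rid in sorted(goal.keys() - current.keys()):
        actions.append(("create", rid, goal[rid]))  # resource removed since the snapshot
    for rid in sorted(current.keys() & goal.keys()):
        if current[rid] != goal[rid]:
            actions.append(("update", rid, goal[rid]))
    return actions


def apply_plan(state: dict[str, dict], actions: list[tuple]) -> dict[str, dict]:
    """Execute a plan against an in-memory copy of the state (no cloud calls)."""
    result = copy.deepcopy(state)
    for action in actions:
        if action[0] == "delete":
            result.pop(action[1])
        else:  # "create" or "update"
            result[action[1]] = copy.deepcopy(action[2])
    return result
```

A "create" action only works in this in-memory model; against a real cloud, deleted resources are gone, which is presumably why the earlier slide pairs undo with pseudo-delete. Ordering constraints, such as removing an auto-scaling rule before terminating the instances it created, are also outside what a plain diff can express.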
  • 35. Upgradability as a process & product quality
    • Architecture of the system being upgraded can affect the process of installation
      – Suppose the system checks for version information from dependent libraries. Then the process must encompass descriptions of what to do if an error condition occurs.
    • Process of upgrade can affect the architecture of the product.
      – Suppose the process is supported by a tool that checks the health of the installation of version N+1. Then the system must make visible the information used by this tool.
  • 36. Summary
    • Upgrade is an important problem
      – Upgrade failures affect user satisfaction
      – Upgrade failures happen frequently
    • Upgrade involves the interaction of product and process quality issues.
      – Communities are focussed on improving the quality of the process or the product, not the joint process/product quality.
    • Multiple opportunities for research exist.
  • 37. Q&A
    Thank You!
    Research study opportunities in dependable cloud computing:
    • Software Architecture
    • Data Management
    • Performance Engineering
    • Autonomic Computing
    To find out more, send your CV and undergraduate details to students@nicta.com.au