Trustworthy Systems for Today & Tomorrow Larry Bernstein (lbernste@stevens.edu)  Stevens Institute of Technology Castle Point, Hoboken, NJ 07030
Trustworthy Software is: Safe: Does no harm Reliable: No crash or hang. Secure: No Hacking Possible
Airbus A320 Troubles In three crashes each pilot claimed the plane was higher than indicated. Altitude read 67 ft before the wheels had even left the ground! The fly-by-wire system could ignore pilot actions.
Poor Designs in A320 Programmed landing maneuvers with a bug in altitude calculation Warning system alerts only seconds before accident with no time to react Flight path angle and vertical speed indicator have the same display format; confuses pilots.
Untrustworthy  Software Buggy software  Pilots either frantic or  bored. Error and warning messages are often numerous or indecipherable, so pilots ignore them.
Two Software Aging Processes Development Shelf-life Multiple releases Environmental changes Execution Launch-to-Crash State & Data focus
Software ages like fish,  not like wine
Trustworthy Characteristics Quantitative Specifications Designed for reliability, safety and security Bounded execution domains Certified against requirements Certified against problem Reliability tested Stress tested Diabolically tested Defined Development and Config. Process Trusted
Software Yesterday and Today Software execution is chaotic and always repeatable but may not be stable. Professionals untrained in analysis Debugging is detective work Software products are often complex and untrustworthy Crashes and hangs expected Bugs or Glitches are common
Execution-Aging Latent faults cause gradual deterioration of software with respect to the use of some resource, resulting in a crash or hang.
Conditions That Cause Instability Memory Leaks Poor Algorithms Missing Deadlines Roundoff Error Amplification Broken Pointers Register Misuse: Extinct bug reappears in multi-core machines
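Roundoff error amplification is the easiest of these conditions to show in a few lines. The sketch below is an illustration only, not code from the talk: the function names and the 0.1-second tick are assumptions. It compares a running clock built by repeated floating-point addition with a compensated sum from Python's standard `math.fsum`.

```python
import math


def naive_elapsed(ticks, step=0.1):
    """Accumulate elapsed time by repeated addition (hypothetical clock)."""
    total = 0.0
    for _ in range(ticks):
        total += step  # each addition can introduce a tiny rounding error
    return total


def compensated_elapsed(ticks, step=0.1):
    """Same sum, but math.fsum tracks the lost low-order bits."""
    return math.fsum(step for _ in range(ticks))


if __name__ == "__main__":
    for ticks in (10, 1_000_000):
        exact = ticks / 10
        print(ticks, abs(naive_elapsed(ticks) - exact),
              abs(compensated_elapsed(ticks) - exact))
```

The longer the naive clock runs, the further it drifts from the true value, which is the "launch-to-crash" flavor of aging: nothing is wrong at start-up, and the error only matters after enough execution time.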
Rejuvenation Concept Periodic preemptive rollback prevents future failures. Gracefully terminating an application at a known point allows restarting at a fixed and carefully tested initial state.
Pragmatic Software Rejuvenation Instead of running a system for a year, run it one day 365 times. Prevents many future failures. Restarts at a known and clean internal state.
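The "one day, 365 times" idea can be sketched with a toy service that has a latent resource leak. Everything here is hypothetical: `LeakyService`, its capacity of 100, and the `run` helper are stand-ins chosen for illustration, not the systems described in the talk.

```python
class LeakyService:
    """Hypothetical service with a latent fault: every request leaks one unit."""

    CAPACITY = 100  # assumed resource limit before the service crashes

    def __init__(self):
        self.leaked = 0  # the known, clean initial state

    def handle(self, request):
        self.leaked += 1
        if self.leaked > self.CAPACITY:
            raise MemoryError("resource exhausted")
        return request * 2


def run(requests, rejuvenate_every=None):
    """Process requests, optionally restarting the service every N requests."""
    service = LeakyService()
    results = []
    for count, request in enumerate(requests, start=1):
        results.append(service.handle(request))
        if rejuvenate_every and count % rejuvenate_every == 0:
            # Graceful restart at a known point: discard the aged instance
            # and continue from the tested initial state.
            service = LeakyService()
    return results


if __name__ == "__main__":
    print(len(run(range(365), rejuvenate_every=50)))  # completes all requests
    try:
        run(range(365))  # no rejuvenation: the leak eventually crashes it
    except MemoryError as exc:
        print("crashed:", exc)
```

The leak is never repaired in this sketch. Rejuvenation only keeps the service from running long enough for the latent fault to matter.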
History 1960s: Anti-missile “On-Interrupt” Code 1970s: Store and Forward 1980s: Billing Data Collector
1980s: Billing Data Collector Snapshots of switch call records 600 switches/system Daily rejuvenation Crash-free & Hang-free
History 1960s: Anti-missile “On-Interrupt” Code 1970s: Store and Forward 1980s: Billing Data Collector 1990s: R&D emerges from Bell Labs and Duke with industrial use State-of-the-Practice: Scenario, regression and automatic testing State-of-the-Art: Agility & Software Failure Prevention
Case Study: Network Management System System monitors network equipment. Messages are triggered by network events. I/O Buffer Sharing reduces memory required.
Customer Complaint It crashed again!!!
Failure Analysis Latent Fault in Buffer Flow Control. Software does not return ‘buffer full’ signal. Messages are written to full buffers. Messages are accepted, acknowledged and then partially dropped. Application waits and waits for a complete message.
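The fault pattern described above can be reproduced in miniature. This is a sketch under assumptions: the `SharedBuffer` class, its byte-oriented storage, and the method names are invented for illustration and are not the network management system's actual code.

```python
class SharedBuffer:
    """Hypothetical fixed-size I/O buffer shared by incoming messages."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = bytearray()

    def free(self):
        return self.capacity - len(self.data)

    def write_buggy(self, message: bytes) -> bool:
        """Latent fault: stores whatever fits, then acknowledges anyway."""
        self.data += message[: self.free()]  # the tail is silently dropped
        return True  # never signals 'buffer full'

    def write_checked(self, message: bytes) -> bool:
        """Fixed version: refuses the message and reports 'buffer full'."""
        if len(message) > self.free():
            return False
        self.data += message
        return True


if __name__ == "__main__":
    buf = SharedBuffer(capacity=8)
    print(buf.write_buggy(b"ALARM-1"), buf.write_buggy(b"ALARM-2"))
    print(bytes(buf.data))  # second message is truncated but was acknowledged
```

With the buggy write, the sender believes both messages were delivered, while the reader sees only a fragment of the second one and keeps waiting for the rest.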
Solution Fix the bug by returning an appropriate indicator, or Rejuvenate: Re-launch message handler and avoid the problem: When buffers are half full. Periodically. After hang is detected.
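The three rejuvenation triggers listed above can be combined into one small policy check. This is a minimal sketch: the function name, the daily period, and the 30-second hang timeout are assumed values for illustration, and only the half-full threshold comes from the slide.

```python
from typing import Optional


def rejuvenation_reason(
    fill_ratio: float,
    uptime_s: float,
    idle_s: float,
    max_uptime_s: float = 86_400,  # assumed: rejuvenate at least once a day
    hang_timeout_s: float = 30,    # assumed: no progress for 30 s means a hang
) -> Optional[str]:
    """Return why the message handler should be re-launched, or None."""
    if idle_s >= hang_timeout_s:
        return "hang detected"
    if fill_ratio >= 0.5:
        return "buffers half full"
    if uptime_s >= max_uptime_s:
        return "periodic"
    return None


if __name__ == "__main__":
    print(rejuvenation_reason(fill_ratio=0.1, uptime_s=10, idle_s=0))
    print(rejuvenation_reason(fill_ratio=0.6, uptime_s=10, idle_s=0))
```

A supervisor would call a check like this on a timer and re-launch the handler whenever it returns a reason. Checking the hang condition first means a stuck handler is restarted even when its buffers look healthy.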
Programmer Resistance to Rejuvenation  Culture? Skepticism? Management? “Is the limited use of known good methods of software engineering a consequence of poor training?”  National Research Council 1997
Perspectives on the Profession Software Engineering is the 4th Fastest Growing Occupation in 2009 Top-notch players remained employed Software Engineering/Project Managers continue to get raises Beware of ‘off-shoring’, a risky way to reduce costs by 20-30%.
Ethics from IEEE & ACM Understand the problem  Analyze safety and risks Humanize tasks Read & study  Respect property rights and privacy Ship systems that work crash-free and hang-free Stand up and be counted.
Re-engineer Your Career with a Stevens Software Engineering Graduate Certificate One year 4 courses (www.stevens.edu) Modular Format or WebCampus March 9-13: SSW533 with Laird May 14-15, 18-20: SSW540 with Cohen August 24-28: SSW564/SYS625 with Barrese Jan 4-8: SSW565 with Cohen

Web Ex2 28 Jan09
