Dvorak.dan

Published in: Technology, Education

  1. 1. Cleared for unlimited release: CL#08-3913
     NASA Study: Flight Software Complexity
     Sponsor: NASA OCE Technical Excellence Program
     JPL Task Lead: Dan Dvorak
     GSFC POC: Lou Hallock
     JSC POC: Pedro Martinez, Brian Butcher
     MSFC POC: Cathy White, Helen Housch
     APL POC: Steve Williams
     HQ Sponsor: Adam West
     NASA Advisors: John Kelly, Tim Crumbley
     2/11/2009 Flight Software Complexity 1
  2. 2. Task Overview: Flight Software Complexity
     Origin
     • Chief engineers identified cross-cutting issues warranting further study
     • Brought software complexity issue to Baseline Performance Review
     Charter
     • Bring forward deployable technical and managerial strategies to effectively address risks from growth in size and complexity of flight software
     Areas of Interest
     1. Clear exposé of growth in NASA FSW size and complexity
     2. Ways to reduce/manage complexity in general
     3. Ways to reduce/manage complexity of fault protection systems
     4. Methods of testing complex logic for safety and fault protection provisions
     Initiators / Reviewers
     • Ken Ledbetter, SMD Chief Engineer
     • Stan Fishkind, SOMD Chief Engineer
     • Frank Bauer, ESMD Chief Engineer
     • George Xenofos, ESMD Dep. Chief Engineer
     [Figure: Growth in Code Size for Robotic and Human Missions; NCSL (log scale) vs. year of mission, 1968–2004]
  3. 3. Growth Trends in NASA Flight Software
     [Figure: Growth in Code Size for Robotic and Human Missions; NCSL (log scale, 1 to 10,000,000) vs. year of mission, 1968–2004, with exponential trend lines for robotic (unmanned) and human (manned) missions. Note log scale.]
     NCSL = Non-Comment Source Lines
     Robotic missions: 1969 Mariner-6 (30), 1975 Viking (5K), 1977 Voyager (3K), 1989 Galileo (8K), 1990 Cassini (120K), 1997 Pathfinder (175K), 1999 DS1 (349K), 2003 SIRTF/Spitzer (554K), 2004 MER (555K), 2005 MRO (545K)
     Human missions: 1968 Apollo (8.5K), 1980 Shuttle (470K), 1989 ISS (1.5M)
     The ‘year’ used in this plot for a mission is typically the year of launch, or of completion of the primary software. Line counts are either from the best available source or direct line counts (e.g., for the JPL and LMA missions). The line count for Shuttle software is from Michael King, Space Flight Operations Contract Software Process Owner, April 2005.
     Note well: This shows exponential growth, ~10X growth every 10 years
     Source: Gerard Holzmann, JPL
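The "~10X every 10 years" reading of the plot can be checked against the robotic-mission data points printed on the slide. The following is a minimal sketch, assuming an ordinary least-squares fit of log10(NCSL) against year; the function and variable names are illustrative and not part of the study.

```python
import math

# Robotic-mission data points as printed on the slide: (year, NCSL).
ROBOTIC = [
    (1969, 30), (1975, 5_000), (1977, 3_000), (1989, 8_000),
    (1990, 120_000), (1997, 175_000), (1999, 349_000),
    (2003, 554_000), (2004, 555_000), (2005, 545_000),
]


def log_linear_slope(points):
    """Least-squares slope of log10(size) against year (decades of growth per year)."""
    xs = [year for year, _ in points]
    ys = [math.log10(size) for _, size in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


slope = log_linear_slope(ROBOTIC)
growth_per_decade = 10 ** (10 * slope)        # multiplicative growth over 10 years
doubling_time_years = math.log10(2) / slope   # years for the fitted size to double

print(f"growth per decade: {growth_per_decade:.1f}x")
print(f"doubling time: {doubling_time_years:.1f} years")
```

With these ten points the fitted growth comes out close to a factor of ten per decade, which corresponds to a doubling time of roughly three years.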
  4. 4. Software Growth in Human Spaceflight (JSC data)
     [Figure: Growth in Software Size; K SLOC by flight vehicle]
     Flight vehicle        K SLOC
     Apollo 1968           8.5 (8500 lines)
     Space Shuttle         650
     Orion (est.)          1244
     • The Orion (CEV) numbers are current estimates.
     • To make Space Shuttle and Orion comparable, neither one includes backup flight software, since that figure for Orion is TBD.
     Space Shuttle and ISS estimates dated Dec. 2007
     Source: Pedro Martinez, JSC
  5. 5. How Big is a Million Lines of Code?
     • A novel has ~500K characters (~100K words × ~5 characters/word)
     • A million-line program has ~20M characters (1M lines × ~20 characters/line), or about 40 novels
     Source: Les Hatton, University of Kent, Encyclopedia of Software Engineering, John Marciniak, editor in chief
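The comparison above is simple arithmetic, and writing it out makes it easy to reuse for other code sizes. This is a small sketch using only the slide's rounded figures; the constant and function names are illustrative.

```python
# Rounded figures from the slide.
WORDS_PER_NOVEL = 100_000
CHARS_PER_WORD = 5
CHARS_PER_LINE = 20


def novels_equivalent(lines_of_code: int) -> float:
    """Number of novels holding as many characters as the given program."""
    novel_chars = WORDS_PER_NOVEL * CHARS_PER_WORD    # ~500K characters
    program_chars = lines_of_code * CHARS_PER_LINE    # ~20M characters for 1M lines
    return program_chars / novel_chars


print(novels_equivalent(1_000_000))  # 40.0
```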
  6. 6. Size Comparisons of Embedded Software
     System                                      Lines of Code
     Mars Reconnaissance Orbiter                 545K
     Orion Primary Flight Sys.                   1.2M
     F-22 Raptor                                 1.7M
     Seawolf Submarine Combat System AN/BSY-2    3.6M
     Boeing 777                                  4M
     Boeing 787                                  6.5M
     F-35 Joint Strike Fighter                   5.7M
     Typical GM car in 2010                      100M (Yes, really. 100 Million)
     NASA flight s/w is not among the largest embedded software systems
  7. 7. NSF Concerned About Complexity
     “As the complexity of current systems has grown, the time needed to develop them has increased exponentially, and the effort needed to certify them has risen to account for more than half the total system cost.”
     NSF solicitation on cyber-physical systems (Jan. 2009)
  8. 8. Complex interactions and high coupling raise risk of design defects and operational errors
     COUPLING (Urgency)   Linear INTERACTIONS                                              Complex INTERACTIONS
     High                 Dams, power grids, marine transport, rail transport, airways     Nuclear plant, aircraft, chemical plants, space missions, military early-warning (high-risk systems)
     Low                  Junior college, trade schools, most manufacturing, Post Office   Military actions, mining, R&D firms, universities
     Source: Charles Perrow, “Normal Accidents: Living with High-Risk Technologies”, 1984.
  9. 9. Reasons for Growth in Size and Complexity
  10. 10. Why is Flight Software Growing?
      “The demand for complex hardware/software systems has increased more rapidly than the ability to design, implement, test, and maintain them. …
      “It is the integrating potential of software that has allowed designers to contemplate more ambitious systems encompassing a broader and more multidisciplinary scope ...”
      Michael Lyu, Handbook of Software Reliability Engineering, 1996
  11. 11. Software Growth in Military Aircraft
      • Flight software is growing because it is providing an increasing percentage of system functionality
      • With the newest F-22 in 2000, software controls 80% of everything the pilot does
      • Designers put functionality in software or firmware because it is easier and/or cheaper than hardware
      [Figure: Software in Military Aircraft; percent of functionality provided by software (0–90) vs. year of introduction: 1960 (F-4), 1964 (A-7), 1970 (F-111), 1975 (F-15), 1982 (F-16), 1990 (B-2), 2000 (F-22)]
      “Crouching Dragon, Hidden Software: Software in DoD Weapon Systems”, Jack Ferguson, IEEE Software, vol. 18, no. 4, pp. 105-107, Jul/Aug 2001.
  12. 12. NASA Missions: Factors that Increase Software Complexity
      • Human-rated missions
        – May require architecture redundancy and associated complexity
      • Fault Detection, Diagnostics, and Recovery (FDDR)
        – FDDR requirements may result in complex logic and numerous potential paths of execution
      • Requirements to control/monitor increasing number of system components
        – Greater computer processing, memory, and input/output capability enables control and monitoring of more hardware components
      • Multi-threads of execution
        – Virtually impossible to test every path and associated timing constraints
      • Increased security requirements
        – Using commercial network protocols may introduce vulnerabilities
      • Including features that exceed requirements
        – Commercial Off the Shelf (COTS) products or re-used code may provide capability that exceeds needs or may have complex interactions
      Source: Cathy White, MSFC
  13. 13. About Complexity
      • But what is complexity?
      • Where does it appear?
      • Why is it getting bigger?
      10/09/2008
  14. 14. Definition: What is Complexity?
      • Complexity is a measure of how hard something is to understand or achieve
        – Components: How many kinds of things are there to be aware of?
        – Connections: How many relationships are there to track?
        – Patterns: Can the design be understood in terms of well-defined patterns?
        – Requirements: Timing, precision, algorithms
      • Two kinds of complexity:
        – Essential Complexity – How complex is the underlying problem?
        – Incidental Complexity – What extraneous complexity have we added?
      • Complexity appears in at least four key areas:
        – Complexity in requirements
        – Complexity of the software itself
        – Complexity of testing the system
        – Complexity of operating the system
      “Complexity is a total system issue, not just a software issue.” – Orlando Figueroa
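The "components" and "connections" questions in the definition above can be made concrete by counting them in a dependency graph. The sketch below is only an illustration under stated assumptions: the module names and dependencies are hypothetical, and raw counts like these are a crude proxy for complexity, not a metric proposed by the study.

```python
# Hypothetical flight software modules and the modules each one depends on.
DEPENDS_ON = {
    "sequencer": {"telemetry", "attitude_control"},
    "attitude_control": {"telemetry"},
    "fault_protection": {"sequencer", "attitude_control", "telemetry"},
    "telemetry": set(),
}


def complexity_counts(depends_on):
    """Return (components, connections, density) for a dependency graph.

    components:  how many kinds of things there are to be aware of
    connections: how many relationships there are to track
    density:     connections as a fraction of all possible directed pairs
    """
    components = len(depends_on)
    connections = sum(len(targets) for targets in depends_on.values())
    possible = components * (components - 1)
    density = connections / possible if possible else 0.0
    return components, connections, density


print(complexity_counts(DEPENDS_ON))  # (4, 6, 0.5)
```

Tracking how the connection count grows relative to the component count is one simple way to notice incidental complexity creeping into a design.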
  15. 15. Causes of Software Growth: Expanding Functionality
      “Flight software is a system’s complexity sponge.”
      Functions arranged on the slide along a Past / Planned / Future axis:
      • Command sequencing; telemetry collection & formatting; attitude and velocity control; aperture & array pointing; payload management; fault detection and diagnosis; safing and fault recovery; critical event sequencing; momentum management; aero-braking; fine guidance pointing; guided descent and landing; data priority management; event-driven sequencing; surface sample acquisition & handling; surface mobility and hazard avoidance; relay communications; science event detection; automated planning and scheduling; operation on or near small bodies; star identification; robot arm control; and many others …
      • Dynamic resource management; long distance traversal; landing hazard avoidance; model-based reasoning; plan repair; guided ascent; rendezvous and docking; guided atmospheric entry; formation flying; tethered system soft landing; opportunistic science; interferometer control; and more to come . . .
      Source: Bob Rasmussen, JPL
  16. 16. Scope, Findings, Observations
      Requirements Complexity
      • Challenging requirements raise downstream complexity (unavoidable)
      • Lack of requirements rationale permits unnecessary requirements
      System-Level Analysis & Design
      • Engineering trade studies not done: a missed opportunity
      • Architectural thinking/review needed at level of systems and software
      Flight Software Complexity
      • Inadequate software architecture and lack of design patterns
      • Coding guidelines help reduce defects and improve static analysis
      • Descopes often shift complexity to operations
      Verification & Validation Complexity
      • Growth in testing complexity seen at all centers
      • More software components and interactions to test
      • COTS software is a mixed blessing
      Operations Complexity
      • Shortsighted FSW decisions make operations unnecessarily complex
      • Numerous “operational workarounds” raise risk of command errors
  17. 17. Categorized Recommendations
      Architecture
      • R4 More up-front analysis and architecting
      • R5 Software architecture review board
      • R9 Invest in a reference architecture
      • R6 Grow and promote software architects
      Project Management
      • R2 Emphasize requirements rationale
      • R3 Serious attention to trade studies
      • R10 Technical kickoff for projects
      • R16 Use software metrics
      • R7 Involve operations engineers early and often
      Verification
      • R11 Use static analysis tools
      Fault Management
      • R12 Standardize fault management terminology
      • R13 Conduct fault management reviews
      • R14 Develop fault management education
      • R15 Research s/w fault containment techniques
      Complexity Awareness
      • R1 Educate about downstream effects of decisions
  18. 18. Category: Architecture, Recommendation 4: More Up-Front Analysis & Architecting
      Finding: Clear trends of increasing complexity in NASA missions
        – Complexity is evident in requirements, FSW, testing, and ops
        – We can reduce incidental complexity through better architecture
      “Point of view is worth 80 IQ points.” – Alan Kay, 1982 (famous computer scientist)
      Recommendation: Spend more time up front in requirements analysis and architecture to really understand the job and its solution
        – Architecture is an essential systems engineering responsibility, and the architecture of behavior largely falls to software
        – Cheaper to deal with complexity early in analysis and architecture
        – Integration & testing becomes easier with well-defined interfaces and well-understood interactions
        – Be aware of Conway’s Law (software reflects the organizational structure that produced it)
  19. 19. Architecture Investment “Sweet Spot”
      Predictions from COCOMO II model for software cost estimation
      [Figure: Fraction of budget spent on rework + architecture (0–120%) vs. fraction of budget spent on architecture (0–70%), with one curve each for 10K, 100K, 1M, and 10M SLOC. Equations from Reinholtz ArchSweetSpotV1.nb.]
      Lesson: Projects that allocate adequately for architecture do better
      Trend: The bigger the software, the bigger the fraction to spend on architecture
      Note: Prior investment in a reference architecture pays dividends (R9)
      Source: Kirk Reinholtz, JPL
  20. 20. Category: Architecture, Recommendation 5: Software Architecture Review Board
      Finding: In the 1990s AT&T had a standing Architecture Review Board that examined proposed software architectures for projects, in depth, and pointed out problem areas for rework
        – The board members were experts in architecture & system analysis
        – They could spot common problems a mile away
        – The review was invited and the board provided constructive feedback
        – It helped immensely to avoid big problems
      Recommendation: Create a professional architecture review board and add architecture reviews as a best practice. Maybe similar to Navigation Advisory Group (NAG)
      Options:
      1. Strengthen NPR 7123 regarding when to assess s/w architecture
      2. Tune AT&T’s architecture review process for NASA
      3. Leverage existing checklists for architecture reviews [8]
      4. Consider reviewers from academia and industry for very large projects
  21. 21. Category: Architecture, Recommendation 9: Invest in Reference Architecture & Core Assets
      • Finding: Although each mission is unique, they must all address common problems: attitude control, navigation, data management, fault protection, command handling, telemetry, uplink, downlink, etc. Establishment of uniform patterns for such functionality, across projects, saves time and mission-specific training. This requires investment, but project managers have no incentive to “wear the big hat”.
      • Recommendation: Earmark funds for development of a reference architecture (a predefined architectural pattern) and core assets, at each center, to be led and sustained by the appropriate technical line organization, with senior management support
      • Key: A reference architecture embodies a huge set of lessons learned, best practices, architectural principles, design patterns, etc.
      • Options:
        1. Create a separate fund for reference architecture (infrastructure investment)
        2. Keep a list of planned improvements that projects can select from as their intended contribution
      See backup slide on reference architecture
  22. 22. Category: Project Mgmt., Recommendation 2: Emphasize Requirements Rationale
      Finding: Unsubstantiated requirements have caused unnecessary complexity. Rationale for requirements is often missing, superficial, or misused.
      Recommendation: Require rationales at Levels 2 and 3
        – Rationale explains why a requirement exists
        – Numerical values require strong justification (e.g., “99% data completeness”, “20 msec response”). Why that value rather than an easier value?
      Notes:
        – Work with systems engineering to provide guidance on rationale from a software complexity perspective.
        – NPR 7123, NASA System Engineering Requirements, specifies in an appendix of “best typical practices” that requirements include rationale, but offers no guidance on how to write a good rationale or check it.
        – NASA Systems Engineering Handbook provides some guidance (p. 48).
  23. 23. Category: Project Mgmt., Recommendation 3: Serious Attention to Trade Studies
      Finding: Engineering trade studies are often not done, done superficially, or done too late
        – Kinds of trade studies: flight vs. ground, hardware vs. software vs. firmware (including FPGAs), FSW vs. mission ops and ops tools
        – Possible reasons: schedule pressure, unclear ownership, culture
      Recommendation: Ensure that trade studies are properly staffed, funded, and done early enough. This is unsatisfying because it says “Just do what you’re supposed to do”
      Options:
      1. Mandate trade studies via NASA Procedural Requirement
      2. For a trade study between x and y, make it the responsibility of the manager that holds the funds for both x and y
      3. Encourage informal-but-frequent trade studies via co-location (co-location universally praised by those who experienced it)
      “As the line between systems and software engineering blurs, multidisciplinary approaches and teams are becoming imperative.” – Jack Ferguson, Director of Software Intensive Systems, DoD, IEEE Software, July/August 2001
  24. 24. Cautionary Note
      Some recommendations are common sense, but aren’t common practice. Why not? Some reasons below.
      • Cost and schedule pressure
        – Some recommendations require time and training, and the benefits are hard to quantify up front
      • Lack of enforcement
        – Some ideas already exist in NASA requirements and local practices, but aren’t followed because of and because nobody checks for them
      • Pressure to inherit from previous mission
        – Inheritance can be a very good thing, but “inheritance mentality” inhibits new ideas, tools, and methodologies
      • No incentive to “wear the big hat”
        – Project managers focus on point solutions for their missions, with no infrastructure investment for the future
  25. 25. Summary: Big-Picture Take-Away Message
      • Flight software growth is exponential, and will continue
        – Driven by ambitious requirements
        – Accommodates new functions more easily
        – Accommodates evolving understanding (easier to modify)
      • Complexity is better managed/reduced through …
        – Well-chosen architectural patterns, design patterns, and coding guidelines
        – Fault management that is dyed into the design, not painted on
        – Substantiated, unambiguous, testable requirements
        – Awareness of downstream effects of engineering decisions
        – Faster processors and larger memories (timing and memory margin)
      • Architecture addresses complexity directly
        – Confront complexity at the start (can’t test away complexity)
        – Architecture reviews (follow AT&T’s example)
        – Need more architectural thinkers (education, career path)
        – See “Thinking Outside the Box” for how to think architecturally
  26. 26. Reserve Slides
      • Other Findings and Recommendations
      • Software Size and Growth
      • Reasons for Growth
      • About Complexity
      • Software Defects and Verification
      • Observations on NASA Software Practices
      • Historical Perspective
      • Architecture and Architecting
      • Software Complexity Metrics
      • Miscellaneous
  27. 27. Other Findings & Recommendations
      • R1 Downstream effects of decisions
      • R6 Grow and promote software architects
      • R7 Involve operations engineers early and often
      • R10 Technical kickoff for projects
      • R11 Use static analysis tools
      • R12 Standardize fault protection terminology
      • R13 Conduct fault protection reviews
      • R14 Develop fault protection education
      • R15 Research in software fault containment techniques
      • R16 Use software metrics
  28. 28. Category: Awareness, Recommendation 1: Education about “effect of x on complexity”
      Finding: Engineers and scientists often don’t realize the downstream complexity entailed by their decisions
        – Seemingly simple science “requirements” and avionics designs can have a large impact on software complexity, and software decisions can have a large impact on operational complexity
      Recommendations:
        – Educate engineers about the kinds of decisions that affect complexity
          • Intended for systems engineers, subsystem engineers, instrument designers, scientists, flight and ground software engineers, and operations engineers
        – Include complexity analysis as part of reviews
      Options:
      1. Create a “Complexity Primer” on a NASA-internal web site
      2. Populate NASA Lessons Learned with complexity lessons
      3. Publish a paper about common causes of complexity
  29. 29. Category: Architecture, Recommendation 6: Grow and Promote Software Architects
      Finding: Software architecture is vitally important in reducing incidental complexity, but architecture skills are uncommon and need to be nurtured
      Recommendation: Increase the ranks of software architects and put them in positions of authority. Analogous to the Systems Engineering Leadership Development Program
      Options:
      1. Target experienced software architects for strategic hiring
      2. Nurture budding architects through education and mentoring (think in terms of a 2-year Master’s program)
      3. Expand APPEL course offerings: help systems engineers to think architecturally. The architecture of behavior largely falls to software, and systems engineers must understand how to analyze control flow, data flow, resource management, and other cross-cutting issues
  30. 30. Category: Project Mgmt., Recommendation 7: Involve Operations Engineers Early & Often
      Findings that increase ops complexity:
        – Flight/ground trades and subsequent FSW descope decisions often lack operator input
        – Shortsighted decisions about telemetry design, sequencer features, data management, autonomy, and testability
        – A large stack of “operational workarounds” raises the risk of command errors and distracts operators from vigilant monitoring
      Findings are from a “gripe session on ops complexity” held at JPL
      Recommendations:
        – Include experienced operators in flight/ground trades and FSW descope decisions
        – Treat operational workarounds as something that drives up cost and risk; quantify their cost
        – Design FSW to allow tests to start at several well-known states (shouldn’t have to “launch” spacecraft for each test!)
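The last recommendation, letting tests start from several well-known states, can be sketched as a small catalog of named starting states that a test harness loads directly. Everything below is hypothetical: the state fields, the state names, and the `enter_safe_mode` behavior are placeholders chosen for illustration, not a description of any NASA flight software.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SpacecraftState:
    """Hypothetical, heavily simplified snapshot of flight software state."""
    phase: str
    attitude_acquired: bool
    solar_arrays_deployed: bool


# Hypothetical catalog of well-known states a test may start from,
# so that no test has to replay launch and deployment first.
KNOWN_STATES = {
    "pre_launch": SpacecraftState("pre_launch", False, False),
    "post_separation": SpacecraftState("post_separation", False, False),
    "cruise": SpacecraftState("cruise", True, True),
}


def start_from(name: str) -> SpacecraftState:
    """Return the named well-known state (raises KeyError if unknown)."""
    return KNOWN_STATES[name]


def enter_safe_mode(state: SpacecraftState) -> SpacecraftState:
    """Placeholder for the behavior under test: switch phase, keep hardware status."""
    return replace(state, phase="safe_mode")


def test_safing_from_cruise():
    # Start directly in cruise instead of simulating launch.
    state = enter_safe_mode(start_from("cruise"))
    assert state.phase == "safe_mode"
    assert state.solar_arrays_deployed


test_safing_from_cruise()
```

The design choice being illustrated is that the starting states are data, so adding a new starting point for tests does not require new simulation time.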
  31. 31. Category: Project Mgmt., Recommendation 10: Formalize a ‘Technical Kickoff’ for Projects
      Finding: Flight project engineers move from project to project, often with little time to catch up on technology advances, so they tend to use the same old stuff
      Recommendation:
        – Option 1: Hold ‘technical kickoff meetings’ for projects as a way to infuse new ideas and best practices, and create champions within the project (Michael Aguilar, NESC, is a strong proponent)
          • Inspire rather than mandate
          • Introduces new architectures, processes, tools, and lessons
          • Supports technical growth of engineers
        – Option 2: Provide a 4-month “sabbatical” for project engineers to learn a TRL 6 software technology, experiment with it, give feedback for improvements, and then infuse it
      Steps:
      1. Outline a structure and a technical agenda for a kickoff meeting
      2. Create a well-structured web site with kickoff materials
      3. Pilot a technical kickoff on a selected mission
  32. 32. Category: Verification, Recommendation 11: Static Analysis for Software
      • Finding: Commercial tools for static analysis of source code are mature and effective at detecting many kinds of software defects, but are not widely used
        – Example tools: Coverity, Klocwork, CodeSonar
      • Recommendation: Provide funds for: (a) site licenses of source code analyzers at flight centers, and (b) local guidance and support
      • Notes:
        1. Poll experts within NASA and industry regarding best tools for C, C++, and Java
        2. JPL provides site licenses for Coverity and Klocwork
        3. Continue funding for OCE Tool Shed, expand use of common tools
  33. 33. Category: Fault Management, Recommendation 12: Fault Management Reference Standardization
      • Finding: Terminology for fault management is inconsistent among NASA centers and their contractors, and there is a lack of reference material against which to assess the suitability of fault management approaches to mission objectives.
        – Example terminology: Fault, Failure, Fault Protection, Fault Tolerance, Monitor, Response.
      • Recommendation: Publish a NASA Fault Management Handbook or Standards Document that provides:
        – An approved lexicon for fault management.
        – A set of principles and features that characterize software architectures used for fault management.
        – For existing and past software architectures, a catalog of recurring design patterns with assessments of their relevance and adherence to the identified principles and features.
      Findings from NASA Planetary Spacecraft Fault Management Workshop
      Source: Kevin Barltrop, JPL
  34. 34. Category: Fault Management, Recommendation 13: Fault Management Proposal Review
      • Finding: The proposal review process does not assess in a consistent manner the risk entailed by a mismatch between mission requirements and the proposed fault management approach.
      • Recommendation: For each mission proposal, generate an explicit assessment of the match between mission scope and fault management architecture. Penalize proposals or require follow-up for cases where the proposed architecture would be insufficient to support the fault coverage scope.
        – Example: Dawn recognized the fault coverage scope problem, but did not appreciate the difficulty of expanding fault coverage using the existing architecture.
        – The handbook or standards document can be used as a reference to aid in the assessment and provide some consistency.
      Findings from NASA Planetary Spacecraft Fault Management Workshop
      Source: Kevin Barltrop, JPL
  35. 35. Category: Fault Management, Recommendation 14: Develop Fault Management Education
      • Finding: Fault management and autonomy receive little attention within university curricula, especially within engineering programs. This hinders the development of a consistent fault management culture needed to foster the ready exchange of ideas.
      • Recommendation: Sponsor or facilitate the addition of a fault management and autonomy course within a university program, such as a Controls program.
        – Example: University of Michigan could add a “Fault Management and Autonomy Course.”
      Findings from NASA Planetary Spacecraft Fault Management Workshop
      Source: Kevin Barltrop, JPL
  36. 36. Category: Fault Management, Recommendation 15: Do Research on Software Fault Containment
      • Finding: Given growth trends in flight software, and given current achievable defect rates, the odds of a mission-ending failure are increasing
        – A mission with 1 million lines of flight code and a low residual defect ratio of 1 per 1000 lines of code has about 1000 residual software defects, which translates into 900 benign, 90 medium, and 9 potentially fatal defects (i.e., these are defects that will happen, not those that could happen)
        – Bottom line: As more functionality is done in software, the probability of mission-ending software defects increases (until we get smarter)
      • Recommendation: Extend the concept of onboard fault protection to cover software failures. Develop and test techniques to detect software faults at run-time and contain their effects
        – One technique: upon fault detection, fall back to a simpler-but-more-verifiable version of the failed software module
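Both parts of this slide lend themselves to a short illustration: the defect arithmetic, and the fallback technique named in the recommendation. The sketch below is a minimal illustration under stated assumptions, not the study's design. `tuned_gain`, `simple_gain`, and the plausibility range are hypothetical, and "fault detection" is reduced here to catching an exception or rejecting an implausible result.

```python
def residual_defects(lines_of_code: int, defects_per_kloc: float = 1.0) -> float:
    """Expected residual defects at the given density (slide: 1 per 1000 lines)."""
    return lines_of_code / 1000 * defects_per_kloc


def with_fallback(primary, fallback, is_plausible):
    """Wrap `primary` so that a detected fault falls back to a simpler routine."""
    def guarded(*args):
        try:
            result = primary(*args)
            if is_plausible(result):
                return result
        except Exception:
            pass  # an exception counts as a detected fault
        return fallback(*args)
    return guarded


def tuned_gain(error_deg: float) -> float:
    """Hypothetical complex module with a latent defect: divides by zero at 0."""
    return 10.0 / error_deg


def simple_gain(error_deg: float) -> float:
    """Hypothetical simpler, easier-to-verify version: a fixed conservative gain."""
    return 1.0


guarded_gain = with_fallback(tuned_gain, simple_gain, lambda g: 0.0 < g <= 5.0)

print(residual_defects(1_000_000))  # 1000.0
print(guarded_gain(4.0))            # 2.5 (primary result accepted)
print(guarded_gain(0.0))            # 1.0 (exception, fallback used)
print(guarded_gain(1.0))            # 1.0 (implausible result, fallback used)
```

Note that the slide's 900 / 90 / 9 split adds up to 999 of the roughly 1000 residual defects, so it reads as a rounded illustration rather than an exact partition.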
  37. 37. Category: Project Mgmt., Recommendation 16: Apply Software Metrics
      • Finding: No consistency in flight software metrics
        – No consistency in how to measure and categorize software size
        – Hard to assess amount and areas of FSW growth, even within a center
        – NPR 7150.2 Section 5.3.1 (Software Metrics Report) requires measures of software progress, functionality, quality, and requirements volatility
      • Recommendations: Development organizations should …
        – Seek measures of complexity at code level and architecture level
        – Add ‘complexity’ as a new software metrics category in NPR 7150.2
        – Compare to historical size & complexity for planning and monitoring
        – Save flight software from each mission in a repository for undefined future analyses (software archeology, SARP study)
      • Non-Recommendation: Don’t attempt NASA-wide metrics. Better to drive local center efforts.
      “The 777 marks the first time The Boeing Company has applied software metrics uniformly across a new commercial-airplane programme. This was done to ensure simple, consistent communication of information pertinent to software schedules among Boeing, its software suppliers, and its customers, at all engineering and management levels. In the short term, uniform application of software metrics has resulted in improved visibility and reduced risk for 777 on-board software.”
      Robert Lytz, “Software metrics for the Boeing 777: a case study”, Software Quality Journal, Springer Netherlands
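The finding that there is no consistency in how software size is measured is easier to appreciate with a concrete counting rule in hand. The following is a simplified sketch of a non-comment source line (NCSL) counter for C-style code; it is an illustration only, and it assumes that comment markers never appear inside string literals, which a production counter would have to handle.

```python
import re

# C-style comments: /* ... */ blocks (possibly multi-line) and // to end of line.
BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)
LINE_COMMENT = re.compile(r"//[^\n]*")


def count_ncsl(source: str) -> int:
    """Count lines that still contain code after comments are removed."""
    # Keep the newlines inside block comments so surrounding lines stay separate.
    without_blocks = BLOCK_COMMENT.sub(
        lambda m: "\n" * m.group().count("\n") or " ", source
    )
    without_comments = LINE_COMMENT.sub("", without_blocks)
    return sum(1 for line in without_comments.splitlines() if line.strip())


SAMPLE = """\
/* Attitude control
   loop */
int main(void) {
    // read sensors
    int rate = 0;  /* deg/s */

    return rate;
}
"""

print(count_ncsl(SAMPLE))  # 4
```

Even this small example involves choices (blank lines, lines holding only a brace, trailing comments) that two centers could reasonably make differently, which is the inconsistency the finding describes.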
  38. 38. Category: Verification, Observation: Analyze COTS for Testing Complexity
      COTS software is a mixed blessing
      Finding: COTS software provides valuable functionality, but often comes with numerous other features that are not needed. However, the unneeded features often entail extra testing to check for undesired interactions.
      Recommendation: In make/buy decisions, analyze COTS software for separability of its components and features, and thus their effect on testing complexity
        – Weigh the cost of testing unwanted features against the cost of implementing only the desired features
  39. 39. Software Size and Growth
      • Software Growth in Military Aircraft
      • Size Comparison of Embedded Software
      • Growth in Automobile Software at GM
      • FSW Growth Trend in JPL Missions
      • MSFC Flight Software Sizes
      • GSFC Flight Software Sizes
      • APL Flight Software Sizes
  40. 40. Flight Software Growth Trend: JPL Missions
      With a vertical axis of size × speed, this chart shows growth keeping pace with Moore’s Law.
      [Figure: log-scale plot of size × speed (bytes × MIPS), from 10^3 to 10^9, against launch year, 1970 to 2010. Missions plotted in ascending order: Viking, VGR, GLL, Magellan, MO, Cassini, Pathfinder, MGS, DS1…, MER, MSL.]
      • Doubling time < 2 years
      • Consistent with Moore’s Law (i.e., bounded by capability)
      Source: Bob Rasmussen, JPL
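The "doubling time < 2 years" claim can be checked with a small calculation. The sketch below assumes pure exponential growth between two endpoints read roughly off the chart (about 10^3 bytes × MIPS for Viking near 1975 and about 10^9 for MSL near 2010); those endpoint values are approximations for illustration, not data from the study, and the function name is hypothetical.

```python
import math

def doubling_time_years(start_year, start_value, end_year, end_value):
    """Average doubling time implied by exponential growth between two points."""
    doublings = math.log2(end_value / start_value)
    return (end_year - start_year) / doublings

# Rough endpoints read off the chart (assumed, not exact data):
# about 1e3 bytes x MIPS near 1975 (Viking), about 1e9 near 2010 (MSL).
t = doubling_time_years(1975, 1e3, 2010, 1e9)
print(f"doubling time ~ {t:.2f} years")
```

Under these assumed endpoints the result lands a little under two years, in line with the slide's statement.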
  41. 41. MSFC Flight Software Organization (no trend)
      • SSME – Space Shuttle Main Engine: ~30K SLOC, C/assembly (1980s to 2007)
      • LCT – Low Cost Technology (FASTRAC engine): ~30K SLOC, C/Ada (1990s)
      • SSFF – Space Station Furnace Facility: ~22K SLOC, C (cancelled 1997)
      • MSRR – Microgravity Science Research Rack: ~60K SLOC, C (2001 to 2007)
      • UPA – Urine Processor Assembly: ~30K SLOC, C (2001 to 2007)
      • AVGS DART – Advanced Video Guidance System for Demonstration of Automated Rendezvous Technology: ~18K SLOC, C (2002 to 2004)
      • AVGS OE – AVGS for Orbital Express: ~16K SLOC, C (2004 to 2006)
      • SSME AHMS – Space Shuttle Main Engine Advanced Health Management System: ~42.5K SLOC, C/assembly (2006 flight)
      • FC – Ares Flight Computer: estimated ~60K SLOC, TBD language (2007 SRR)
      • CTC – Ares Command and Telemetry Computer: estimated ~30K SLOC, TBD language (2007 SRR)
      • Ares J-2X engine: initial estimate ~15K SLOC, TBD language (2007 SRR)
      [Figure: bar chart "Source Line of Code (SLOC) History", K SLOC (0 to 70) by project: SSME, SSFF, UPA, AVGS OE, Ares FC, Ares J-2X.]
      Source: Cathy White, MSFC
  42. 42. GSFC Flight Software Sizes (no trend)
      [Figure: bar chart "FSW Size for GSFC Missions", NCSL (0 to 160,000) by year and mission: 1997 TRMM, 2001 MAP, 2006 ST-5, 2009 SDO, 2009 LRO.]
      Note: LISA expected to be much larger
      Source: David McComas, GSFC
  43. 43. APL Flight Software Sizes (no trend)
      [Figure: chart of flight software size in lines (0 to 160,000) against launch date, 1995 to 2007. Missions plotted: MSX, ACE, NEAR, TIMED, Contour, Messenger, Stereo, New Horizons.]
      Source: Steve Williams, APL
  44. 44. Software Defects and Verification
      • Residual Defects in Software
      • Software Development Process
      • Defects, latent defects, residual defects
      • Is there a limit to software size?
  45. 45. Technical Reference: Residual Defects in Software
      • Each lifecycle phase involves human effort and therefore inserts some defects
      • Each phase also has reviews and checks and therefore also removes defects
      • Difference between the insertion and removal rates determines defect propagation rate
      • The propagation rate at the far right determines the residual defect rate
      • For a good industry-standard software process, residual defect rate is typically 1-10 per KNCSL
      • For an exceptionally good process (e.g., Shuttle) it can be as low as 0.1 per KNCSL
      • It is currently unrealistic to assume that it could be zero…
      [Figure: "Propagation of residual defects" across the phases reqs, design, coding, testing. Defect insertion rate per phase: 6, 23, 46, 1. Defect removal rate per phase: 4, 20, 26, 24. Defects propagated out of each phase: 2, 5, 25, and 2 residual defects after testing (anomalies).]
      S.G. Eick, C.R. Loader et al., Estimating software fault content before coding, Proc. 15th Int. Conf. on Software Eng., Melbourne, Australia, 1992, pp. 59-65
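The figure's numbers follow from simple bookkeeping: the defects leaving a phase equal the defects carried in, plus those inserted, minus those removed. The sketch below replays that arithmetic with the insertion and removal rates from the slide; the function name is hypothetical, and the units (presumably defects per KNCSL) are an assumption, since the figure does not state them.

```python
def propagate_defects(inserted, removed):
    """Return the defects carried out of each phase: carried in + inserted - removed."""
    carried = 0
    leaving = []
    for ins, rem in zip(inserted, removed):
        carried = carried + ins - rem
        leaving.append(carried)
    return leaving

phases = ["reqs", "design", "coding", "testing"]
inserted = [6, 23, 46, 1]   # defect insertion rate per phase (from the slide)
removed = [4, 20, 26, 24]   # defect removal rate per phase (from the slide)

leaving = propagate_defects(inserted, removed)
for phase, n in zip(phases, leaving):
    print(f"{phase}: {n}")
```

The last value is the residual defect rate after testing, which is why improving either side of the ledger (less insertion, more removal) in any phase lowers it.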
  46. 46. Software Development Process for Safety- & Mission-Critical Code
      Lifecycle phases: requirements → design → coding → testing
      Three strategies:
      • 1: reduce defect insertion rates
      • 2: increase effectiveness of defect removal with tool-based techniques
      • 3: reduce risk from residual software defects
      Techniques by phase:
      • Requirements: requirements capture and analysis tools
      • Design: model-based design, prototyping / formal verification techniques, logic model checking, code synthesis methods
      • Coding: static source code analysis, increased assertion density, NASA standard for Reliable C, verifiable coding guidelines, compliance checking tools
      • Testing: run-time monitoring techniques, property-based testing techniques, sw fault containment strategies
      • Across phases: test-case generation from requirements / traceability
      Source: Gerard Holzmann, JPL
  47. 47. How good are state-of-the-art software testing methods?
      • Most estimates put the number of residual defects for a good software process at 1 to 10 per KNCSL
        – A residual software defect is a defect missed in testing that shows up in mission operations
        – A larger, but unknowable, class of defects is known as latent software defects: these are all defects present in the code after testing that could strike, only some of which reveal themselves as residual defects in a given interval of time
      • Residual defects occur in any severity category
        – A rule of thumb is to assume that the severity ratios drop off by powers of ten: if we use 3 severity categories with 3 being least and 1 most damaging, then 90% of the residual defects will be category 3, 9% category 2, and 1% category 1 (potentially fatal)
        – For a mission with 1 million lines of flight code and a low residual defect ratio of 1 per KNCSL, that translates into 900 benign defects, 90 medium, and 9 potentially fatal residual software defects (i.e., these are defects that will happen, not those that could happen)
      [Figure: nested breakdown for 1 million lines of code. Defects caught in unit & integration testing (99%); latent defects, i.e. software defects missed in testing (1%); residual defects, i.e. defects that occur in flight (0.1%), conservatively 100-1,000; severity 1 defects, potentially fatal (0.001%), conservatively 1-10.]
      Source: Gerard Holzmann, JPL
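The 900 / 90 / 9 split can be reproduced with a short calculation. The sketch below reads "drop off by powers of ten" as relative weights of 1 : 10 : 100 for severities 1, 2, and 3, and truncates to whole defects; both choices are assumptions made to match the slide's figures, and the function name is hypothetical.

```python
def residual_defects_by_severity(ncsl, defects_per_kncsl=1.0, weights=(1, 10, 100)):
    """Split expected residual defects across severities 1..3 by powers of ten.

    weights[0] is severity 1 (most damaging); weights[-1] is severity 3 (benign).
    """
    total = ncsl / 1000 * defects_per_kncsl
    scale = total / sum(weights)
    # Truncate to whole defects (an assumption chosen to match the slide).
    return {severity: int(weight * scale)
            for severity, weight in enumerate(weights, start=1)}

split = residual_defects_by_severity(1_000_000)
print(split)
```

With 1 million lines at 1 residual defect per KNCSL, this yields 9 severity-1, 90 severity-2, and 900 severity-3 defects, the same figures the slide quotes.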
  48. 48. Thought Experiment: Is there a limit to software size?
      Assumptions:
      • 1 residual defect per 1,000 lines of code (industry average)
      • 1 in every 100 residual defects occur in the 1st year of operation
      • 1 in every 1000 residual defects can lead to mission failure
      • System/software methods are at current state of the practice (2008)
      [Figure: probability of system failure (0.0 to 1.0) against code size in NCSL, with spacecraft software and commercial software marked along the size axis. At 50M NCSL the probability reaches 0.5: beyond this size, code is more likely to fail than to work. At 100M NCSL it reaches 1.0: certainty of failure beyond this size. A time arrow shows the long-term trend: increasing code size with each new mission.]
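The chart's 50M and 100M thresholds follow directly from multiplying the three assumptions together. The sketch below treats them as independent factors and reads the expected number of mission-ending defects in the first year as a failure probability capped at 1.0; that linear reading is an interpretation of the chart, and the function and parameter names are hypothetical.

```python
def expected_fatal_defects_first_year(ncsl, lines_per_defect=1000,
                                      one_in_first_year=100, one_in_fatal=1000):
    """Expected mission-ending defects that surface in the first year of operation."""
    return ncsl / (lines_per_defect * one_in_first_year * one_in_fatal)

def failure_probability(ncsl):
    """Linear reading of the chart: expected count, capped at certainty."""
    return min(1.0, expected_fatal_defects_first_year(ncsl))

for size in (1_000_000, 50_000_000, 100_000_000, 200_000_000):
    print(f"{size:>11,} NCSL -> {failure_probability(size):.2f}")
```

Under these assumptions a 1M-line flight system sits at 0.01, while 50M lines crosses 0.5 and 100M lines reaches 1.0, matching the two labels on the chart.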
  49. 49. Observations about NASA Software Practices
  50. 50. Impediments to Software Architecture within NASA
      • Inappropriate modeling techniques
        – “Software architecture is just boxes and lines”
        – “Software architecture is just code modules”
        – “A layered diagram says it all”
      • Misunderstanding about role of architecture in product lines and architectural reuse
        – “A product line is just a reuse library”
      • Impoverished culture of architecture design
        – No standards for arch description and analysis
        – Architecture reviews are not productive
        – Architecture is limited to one or two phases
        – Lack of architecture education among engineers
      • Failure to take architecture seriously
        – “We always do it that way. It’s cheaper/easier/less risky to do it the way we did it last time.”
        – “They do it a certain way ‘out there’ so we should too.”
        – “We need to reengineer it from scratch because the mission is different from all others.”
      As presented by Prof. David Garlan (CMU) at NASA Planetary Spacecraft Fault Management Workshop, 4/15/08
  51. 51. Observations: Poor Software Practices within NASA
      • No formal documentation of requirements
      • Little to no user involvement during requirements definition
      • Rushing to start design & code before requirements are understood
      • Wildly optimistic beliefs in re-use (especially when it comes to costing and planning)
      • Planning to use new compilers, operating systems, languages, computers for the first time as if they were proven entities
      • Poor configuration management (CM)
      • Inadequate ICDs
      • User interfaces left up to software designers rather than prototyping and baselining as part of the requirements
      • Big Bang Theory: All software from all developers comes together at end and miraculously works
      • Planning that software will work with little or no errors found in every test phase
      • Poor integration planning (both SW-to-SW and SW-to-HW) (e.g., no early interface/integration testing)
      • No pass/fail criteria at milestones (not that software is unique in this). Holding reviews when artifacts are not ready.
      • Software too far down the program management hierarchy to have visibility into its progress
      • Little to no life-cycle documentation
      • Inadequate to no developmental metrics collected/analyzed
      • No knowledgeable NASA oversight
      An illustrative but incomplete list of poor software practices observed in NASA. John Hinkle, LaRC
  52. 52. Historical Perspective
  53. 53. History: NATO Software Engineering Conference 1968
      • This landmark conference, which introduced the term “software engineering”, was called to address “the software crisis”.
      • Discussions of wide interest:
        – problems of achieving sufficient reliability in software systems
        – difficulties of schedules and specifications on large software projects
        – education of software engineers
      Quotes from the 1968 report:
      “There is a widening gap between ambitions and achievements in software engineering.”
      “Particularly alarming is the seemingly unavoidable fallibility of large software, since a malfunction in an advanced hardware-software system can be a matter of life and death …”
      “I am concerned about the current growth of systems, and what I expect is probably an exponential growth of errors. Should we have systems of this size and complexity?”
      “The general admission of the existence of the software failure in this group of responsible people is the most refreshing experience I have had in a number of years, because the admission of shortcomings is the primary condition for improvement.”
  54. 54. Epilogue
      • Angst about software complexity in 2008 is the same as in 1968 (See NATO 1968 report, slide)
        – We build systems to the limit of our ability
        – In 1968, 10K lines of code was complex
        – Now, 1M lines of code is complex, for the same price
      “While technology can change quickly, getting your people to change takes a great deal longer. That is why the people-intensive job of developing software has had essentially the same problems for over 40 years. It is also why, unless you do something, the situation won’t improve by itself. In fact, current trends suggest that your future products will use more software and be more complex than those of today. This means that more of your people will work on software and that their work will be harder to track and more difficult to manage. Unless you make some changes in the way your software work is done, your current problems will likely get much worse.”
      Winning with Software: An Executive Strategy, 2001. Watts Humphrey, Fellow, Software Engineering Institute, and Recipient of 2003 National Medal of Technology
  55. 55. Architecture and Architecting
  56. 56. What is Architecture?
      • Architecture is an essential systems engineering responsibility, which deals with the fundamental organization of a system, as embodied in its components and their relationships to each other and to the environment
        – Architecture addresses the structure, not only of the system, but also of its functions, the environment within which it will work, and the process by which it will be built and operated
      • Just as importantly, however, architecture also deals with the principles guiding the design and evolution of a system
        – It is through the application and formal evaluation of architectural principles that complexity, uncertainty, and ambiguity in the design of complicated systems may be reduced to workable concepts
        – In the best practice of architecture, this aspect of architecture must not be understated or neglected
      Source: Bob Rasmussen, JPL
  57. 57. Architecture: Some Essential Ideas
      • Architecture is focused on fundamentals
        – An architecture that must regularly change as issues arise provides little guidance
        – Architecture and design are not the same thing
      • Guidance isn’t possible if the original concepts have little structural integrity to begin with
        – Choices must be grounded in essential need and solid principles
        – Otherwise, any migration away from the original high level design is easy to justify
      • Even if the structural integrity is there, it can be lost if it is poorly communicated or poorly stewarded
        – The result is generally ever more inflexible and brittle
      Source: Bob Rasmussen, JPL
  58. 58. Reference: What is Software Architecture?
      • “The software architecture of a program or computing system is the structure or structures of the system, which comprise software elements, the externally visible properties of those elements, and the relationships among them.” Software Architecture in Practice, 2nd edition, Bass, Clements, Kazman, 2003, Addison-Wesley.
      • Noteworthy points:
        – Architecture is an abstraction of a system that suppresses some details
        – Architecture is concerned with the public interfaces of elements and how they interact at runtime
        – Systems comprise more than one structure, e.g., runtime processes, synchronization relations, work breakdown, etc. No single structure is adequate.
        – Every software system has an architecture, whether or not documented, hence the importance of architecture documentation
        – The externally visible behavior of each element is part of the architecture, but not the internal implementation details
        – The definition is indifferent as to whether the architecture is good or bad, hence the importance of architecture evaluation
  59. 59. What is an Architect?
      • An architect defines, documents, maintains, improves, and certifies proper implementation of an architecture, both its structure and the principles that guide it
        – An architect ensures through continual attention that the elements of a system come together in a coherent whole
        – Therefore, in meeting these obligations the role of architect is naturally concerned with leadership of the design effort throughout the development lifecycle
      • An architect must ensure that…
        – The architecture (elements, relationships, principles) reflects fundamental, stable concepts
        – The architecture is capable of providing sound guidance throughout the whole process
        – The concept and principles of the architecture are never lost or compromised
      Source: Bob Rasmussen, JPL
  60. 60. Architect: Essential Activities
      • Understand what a system must do
      • Define a system concept that will accomplish this
      • Render that concept in a form that allows the work to be shared
      • Communicate the resulting architecture to others
      • Ensure throughout development, implementation, and testing that the design follows the concepts and comes together as envisioned
      • Refine ideas and carry them forward to the next generation of systems
      Source: Bob Rasmussen, JPL
  61. 61. Architectural Activities in More Detail (1)
      • Function
        – Help formulate the overall system objectives
        – Help stakeholders express what they care about in an actionable form
        – Capture in scenarios where and how the system will be used, and the nature of its targets and environment
        – Define the scope of the architecture, including external relationships
      • Definition
        – Select and refine concepts on which the architecture might be based
        – Define essential properties concepts must satisfy, and the means by which they will be analyzed and demonstrated
        – Perform trades and assess options against essential properties, both to choose the best concept and to help refine objectives
      • Articulation
        – Render selected concepts in elements that can be developed further
        – Choose carefully the structure and relationships among the elements
        – Identify the principles that will guide the evolution of the design
        – Express these ideas in requirements for the elements and their relationships that are complete, but preserve flexibility
      Source: Bob Rasmussen, JPL
