Challenge to Endeavour Discovery of Atlantis in Columbia
On hardware and software used in NASA Space Shuttle Program
HP Service Virtualization, Prague, September 23rd 2015, Martin Dvorak
Change of the motivation, value and complexity
Why?
• NASA survival and funding
• Space Task Group
• 1969: Space Transportation System Program (STS)
– Permanent space station: 6 > 120 men at the top of LEO
– LEO shuttle
– Inter orbit space tug
– LEO to solar system NERVA engine shuttle
• 1972: Space Shuttle Program
– STS program de-scope and cost reduction
– NASA & DoD
– Reusability, cost, … and many more promises
– Use cases for the shuttle
James Fletcher & Richard Nixon
Space Shuttle Program approval
1972
Shuttle concepts - early 1970s
VP Spiro Agnew
Space Task Group
1969
1972 - 2011 (+ DoD)
Space Shuttle Program
• Space shuttle purpose
– LEO van (satellites, telescopes, Earth atmosphere research, …)
– Hubble (559km): service missions (STS-31, STS-61, STS-125, …)
• Space shuttle = orbiter + external tank + solid rocket boosters
– 7 astronauts, 1-2 week missions
– 135* missions (1981 - 2011)
– 2.000t full w/ 32t payload capacity to LEO
• Orbiter
– 4-6 million parts; 90-day check
– $1.700.000.000 base price + $450.000.000/mission
– 5 orbiters: Atlantis, Challenger, Columbia, Discovery, Endeavour
• Enterprise prototype
GPCs redundancy, IOPs, Data Buses (24) and MMUs > engines, boosters, tank, …
Onboard Hardware
B. J. Thomas
Manager Apollo/Saturn and
Shuttle HW
IBM
Lynn Killingbeck
Senior System Analyst
(HW redundancy)
IBM
GPC = CPU + IOP
Hardware
General Purpose Computer
5x GPC + 2x MMU located below the cockpit
Main engine controller
From AGC/PGNCS to GPC
Hardware
• IBM AP-101
– IBM mainframe architecture w/ unique IOP & bus system
– 2x8 32b registers, 154 instructions, 550W, 29kg, MTBF 10.000h
– US Air Force: B-52, B-1B (8 units), F-15 … (JOVIAL/Ada)
– Advanced self HW/SW test
• Integrity: 95% of HW failures detected; 5% of SW failures via redundancy
– No HDD - tape cartridges instead (MMU) as SW didn’t fit
• GPCs for the shuttle: IBM AP-101B/S (IOP+bus)
– 1st generation (1981-1989): 424kB of magnetic core
memory (Apollo AGC), 400.000 instructions/s
– 2nd generation (1990-2011): 1MB, 1.200.000
instructions/s (3x space & time); semiconductor
memory w/ backup battery
• Onboard: 5x GPC = 4x PASS @ lockstep + 1x BFS
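The generation figures above can be sanity-checked with a little arithmetic. The sketch below is illustrative only: it assumes 1MB = 1024kB, takes the 700kB PASS size from the onboard software slide, and uses made-up variable names.

```python
# Figures quoted on the slides (kB and instructions per second).
GEN1_MEMORY_KB = 424      # AP-101B, magnetic core memory
GEN2_MEMORY_KB = 1024     # AP-101S, assuming 1MB = 1024kB
GEN1_IPS = 400_000
GEN2_IPS = 1_200_000
PASS_SIZE_KB = 700        # PASS size quoted on the onboard software slide

speed_ratio = GEN2_IPS / GEN1_IPS
memory_ratio = GEN2_MEMORY_KB / GEN1_MEMORY_KB
fits_gen1 = PASS_SIZE_KB <= GEN1_MEMORY_KB

print(f"speed: {speed_ratio:.1f}x, memory: {memory_ratio:.1f}x")
print(f"whole PASS fits in 1st generation memory: {fits_gen1}")
```

Under these assumptions the speed-up is exactly 3x while memory grows by roughly 2.4x, so "3x space" is best read as a rounded figure. The last line shows why PASS had to be split: 700kB of software does not fit into 424kB of core memory.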
IBM AP-101B … 1st generation
IBM AP-101S … 2nd generation
Core memory page
Semiconductor memory board
RAM
Software: Space Shuttle Mission Sequence
SW driven mission sequence
PASS, HAL/S and OPS
Onboard Software
• PASS: Primary Avionics Software System
– System Software
• Flight Computer OS (FCOS) w/ redundancy ctrl
• UI
• System Control Programs
– Application Software
• Guidance & Navigation & Control
• (Orbit) Systems Management
• Payload & Checkout
• PASS Functions ~ Mission Sequence
– Pre-flight > Ascent > On-orbit > Descent
• PASS Development
– 420.000 lines in HAL/S (IBM Federal Systems…)
– 700kB (didn’t fit into GPC RAM > split into OPS)
• HAL/S (High-order Assembly Language/Shuttle)
– Intermetrics: language (spec) and compiler
• Apollo veterans + Arra Avakian (linker, HP OpenView)
– Reliability + real-time environments support
– Free form language: modules, functions, vector arithmetic,
multilines, …
• Operation Sequences (OPS)
– OPSs implement PASS functions
– OPS = SPECs (ctrl by human) + DISPs (UI)
– OPS code loaded from MMU (data kept: vectors, …)
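The OPS idea (code swapped in from the MMU per mission phase, data kept in memory) can be sketched as a simple overlay model. This is a hedged illustration, not the actual FCOS mechanism: the class names, the phase names and the overlay sizes are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class MassMemoryUnit:
    """Hypothetical stand-in for the tape-based MMU: OPS name -> code size in kB."""
    overlays: Dict[str, int]

    def load(self, ops_name: str) -> int:
        # Raises KeyError for an unknown OPS, which is fine for a sketch.
        return self.overlays[ops_name]


@dataclass
class Gpc:
    """Hypothetical GPC model: OPS code is replaced, resident data survives."""
    mmu: MassMemoryUnit
    current_ops: Optional[str] = None
    code_kb: int = 0
    data: dict = field(default_factory=dict)  # e.g. state vectors kept across OPS

    def transition(self, ops_name: str) -> None:
        # Only the code overlay is reloaded; self.data is deliberately untouched.
        self.code_kb = self.mmu.load(ops_name)
        self.current_ops = ops_name


# Made-up overlay sizes, one overlay per mission phase named on the slide.
mmu = MassMemoryUnit({"ascent": 100, "on_orbit": 120, "descent": 90})
gpc = Gpc(mmu)
gpc.transition("ascent")
gpc.data["state_vector"] = (1.0, 2.0, 3.0)
gpc.transition("on_orbit")
print(gpc.current_ops, gpc.code_kb, gpc.data)
```

The point of the design is that only one phase's code has to fit in memory at a time, while data such as vectors carries over from one OPS to the next.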
OPS overview: mission sequence like structure
Reliability via Redundancy and Quality
Software: Redundancy
• Hardware/Software redundancy (deployment)
• PASS running on 4 GPCs in lockstep
– On PASS GPC inconsistency/failure: GPCs vote to deselect the failed one
– FCOS driven redundancy scheme solved by
NASA/Rockwell/IBM in 1975
– Lockstep synced GPCs every 3-4ms on I/Os
– OS redesigned as a priority-driven, two-level (40ms & 960ms) task
scheduler
- recalls Margaret Hamilton’s PGNCS software and the Moon
landing overload
– On total failure of the PASS GPCs, the BFC takes control
• Backup Flight Computer runs independently w/ different SW
• Never used
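The vote-and-deselect scheme above can be sketched as a majority vote over the outputs of the active PASS GPCs, with a fall-back to the backup when no majority exists. This is a simplified illustration under stated assumptions: the function names are hypothetical, outputs are compared for exact equality, and the hand-over to the backup is automatic here, which may not match the real procedure.

```python
from collections import Counter


def vote(outputs):
    """Majority vote over {gpc_id: output}.

    Returns (value, failed_ids); value is None when there is no strict majority.
    """
    if not outputs:
        return None, []
    value, votes = Counter(outputs.values()).most_common(1)[0]
    if votes * 2 <= len(outputs):
        return None, sorted(outputs)
    failed = sorted(gpc for gpc, out in outputs.items() if out != value)
    return value, failed


def control_step(active, compute, backup):
    """Run one voting cycle; deselect disagreeing GPCs or fall back to the backup."""
    outputs = {gpc: compute(gpc) for gpc in active}
    value, failed = vote(outputs)
    if value is None:
        # No usable majority among the PASS GPCs: hand over to the backup.
        return backup(), set()
    return value, active - set(failed)


# Usage: four GPCs in the redundant set, GPC 3 produces a wrong result.
faulty = {3}
command, active = control_step(
    {1, 2, 3, 4},
    compute=lambda gpc: 41 if gpc in faulty else 42,
    backup=lambda: 0,
)
print(command, sorted(active))  # 42 [1, 2, 4]
```

With four machines, a single disagreeing GPC is outvoted three to one and dropped from the set, and the remaining three keep voting on later cycles.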
Annunciator (warning panel) Display Unit
Process and statistical analysis driven software development
Software: Development & Quality
• PASS development
– Started ’74 (Apollo + new hires), 1st flight in ’81, released every 6 - 9 months
– 2.000 requirements
– 420.000 lines of code
• … and 1.400.000 lines of code to build/test/develop/simulate/configure
– 275 people (’95)
• Strategy to achieve high quality
– Process
• manage+control+measure+analyze software via (meta)data collected to
perform (statistical) analysis (30+ years of statistics, process
improvements, experience and lessons learned… 25 year old bugs ;)
– Resources
• enough people - highly skilled peers cooperate on a small portion of code
• enough time
• infrequent/tiny changes
• heavy weight (7 level) testing
• relatively small amount of code in contrast to commercial avionics SW
James Orr
Chief Engineer
(PASS)
United Space Alliance
Tony Macina
Manager Flight Operations
(Test Team)
IBM
Small things make a huge difference
Lessons Learned
• Quality (Meta)data Creation
– Commit messages, bug tracking system descriptions, review reports, …
– Analytics, metrics, statistics, …
• Incremental Process Improvement
– Chronicle of systematic incremental improvements w/ analytics
– Defect elimination process (+ analogous process improvement)
• Core Features Investment
– Key parts/components of software to be built according to well known quality
principles w/ enough resources
– People, time, reviews, changes, testing, code…
Anyone who sits on top of the largest hydrogen-oxygen fueled system
in the world; knowing they're going to light the bottom — and doesn't
get a little worried — does not fully understand the situation.
— John Young, after making the first Space Shuttle flight.
