HAB Software Woes

My talk from the UKHAS 2012 conference about problems in HAB software.

Transcript

  • 1. HAB Software Woes
    Or “My capsule didn’t crash but my software did”
    John Graham-Cumming, September 2012
  • 2. Background
    • More than 30 years of programming experience
    • One HAB flight
      ◦ GAGA-1
    http://blog.jgc.org/2011/04/gaga-1-flight.html
    https://github.com/jgrahamc/gaga
  • 3. Where’s your flight’s complexity?
    • Example: GAGA-1
      ◦ One balloon, parachute, polystyrene box
      ◦ Many metres of cord attached with knots
      ◦ An off-the-shelf camera
      ◦ 2,836 lines of code
      ◦ Common to see defect rates of 2 to 4 per KLOC
      ◦ So GAGA-1 likely has 5 to 10 errors in it
  • 4. Real Stuff Seen on HAB flights
    • Complete computer crash
    • Altitude going negative
    • Latitude and longitude garbled
    • Cutdown triggered in back of car
    • Long periods of no transmission
    • Not setting the GPS up before launch
    • Not turning the camera on
    • Running out of camera disk space
    • Altitude jumping around rhythmically
  • 5. The Curse and Joy of Determinism
    • Computers do what you tell them to
      ◦ Precisely what you tell them to
      ◦ Not what you think you told them to do
    • A Curse
      ◦ Will do things you don’t expect
      ◦ Will process bogus input without complaint
    • The Joy
      ◦ Easy to test that it does what’s expected
  • 6. HAB Is A Harsh Environment
    • Cold
    • Vibration
    • Stuff breaks in flight
    • Software needs to be able to cope with failing hardware
    • Very important to think about failure modes
    • YOUR CODE IS ON ITS OWN OUT THERE
  • 7. Deadly Sins
    • The “It works!” Fallacy
    • The Last Minute Change
    • Being Far Too Clever
    • Overlooking Odd Behaviour
    • Copying Other People’s Code
    • Assuming Finding A Bug Solves The Problem
  • 8. The “It works!” Fallacy
    • If you’re an inexperienced (and sometimes experienced) programmer…
      ◦ You hack some code together
      ◦ It works once
      ◦ You assume it will always work
    • Only solution to this is
      ◦ Testing
      ◦ Paranoia
  • 9. The Last Minute Change
    • Never, ever change anything in code at the last minute, no matter how simple.
    • Example: HABE 1
      ◦ Complete camera failure
      ◦ Maximum integer size in uBASIC on CHDK is 999,999
      ◦ Last minute change of integer from 600,000 to 1,000,000 caused total failure
  • 10. Being Far Too Clever
    • Example: GAGA-1
      ◦ Entered the wrong value of 2 * pi in code to do GPS position conversion from radians to degrees
      ◦ Caught before flight because I verified the location of my own back garden
      ◦ Note to self: 2 * pi != 6.2818.
    https://github.com/jgrahamc/gaga/blob/master/gaga-1/flight/gaga1/gps.cpp#L113
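The slide above suggests two habits: take constants from the library instead of typing digits, and check a conversion against a position you already know. A minimal Python sketch follows. The function names, the `two_pi` parameter and the reference latitude are assumptions made for illustration, and the slide does not show how the constant was used in the real flight code.

```python
import math

# The mistyped constant quoted on the slide, kept here only to show its effect.
WRONG_TWO_PI = 6.2818

# Hypothetical reference latitude (an assumption, not a location from the talk).
KNOWN_LATITUDE_DEG = 51.5


def radians_to_degrees(radians: float, two_pi: float = 2.0 * math.pi) -> float:
    """Convert radians to degrees; two_pi is a parameter only so the bug can be replayed."""
    return radians * 360.0 / two_pi


def conversion_is_sane(known_deg: float, tolerance_deg: float = 1e-6) -> bool:
    """Round-trip a known angle through the conversion and compare with the expected value."""
    return abs(radians_to_degrees(math.radians(known_deg)) - known_deg) <= tolerance_deg


# Size of the error the wrong constant would introduce at the reference latitude.
error_deg = abs(
    radians_to_degrees(math.radians(KNOWN_LATITUDE_DEG), WRONG_TWO_PI) - KNOWN_LATITUDE_DEG
)
```

With these assumptions the wrong constant shifts the reference latitude by about 0.011 degrees, which is on the order of a kilometre on the ground: small enough to miss in a quick look, large enough to show up when compared against a known spot.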
  • 11. Overlooking Odd Behaviour
    • Example: GAGA-1
      ◦ In tests RTTY output was fine some of the time, garbled at other times
      ◦ Turned out to be interrupts from the GPS messing up the RTTY timing
      ◦ Solution: disable GPS serial interface while sending RTTY string
    • ALWAYS BE HONEST WITH YOURSELF ABOUT YOUR CODE
    • EXPECT THE SPANISH INQUISITION!
    https://github.com/jgrahamc/gaga/blob/master/gaga-1/flight/gaga1/tsip.cpp#L229
  • 12. Copying Other People’s Code
    • Don’t do this: you have no idea what you are copying or who they copied it from
    • Better practice is to look at other people’s code and…
      ◦ Write your own version
      ◦ That you understand
      ◦ That you are able to test
      ◦ Example: GAGA-1
        Read lots of people’s RTTY code, wrote my own
    https://github.com/jgrahamc/gaga/blob/master/gaga-
  • 13. APRS Tracker using copied code
    • If the altitude in metres contained an 8 or a 9, the altitude reported would be wrong
    http://sharon.esrac.ele.tue.nl/users/pe1rxq/aprstracker/aprstracker.html
  • 14. Assuming Finding The Bug Solves The Problem
    • Just because you’ve found A bug doesn’t mean it was THE bug
    • Lots of research in computer science shows bugs tend to cluster
    • Example: CLOUD1, CLOUD2
      ◦ Three bugs in printing latitude, longitude and altitude
      ◦ One fixed on CLOUD1, …
  • 15. “The One Thing I Didn’t Test”
    http://ukhas.org.uk/guides:common_coding_errors_payload_testing
  • 16. Common problems with uC (microcontrollers)
    • Lack of floating point support
    • Small integers
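To make the “small integers” point concrete, here is a small Python sketch that imitates a signed 16-bit variable. Python integers never overflow, so the wraparound is simulated; real C code formally leaves signed overflow undefined, and two's-complement wrapping is only what many microcontroller toolchains produce in practice. The 10-bit ADC with a 3300 mV reference is a hypothetical example, not something from the talk.

```python
def wrap_int16(value: int) -> int:
    """Return what a signed 16-bit two's-complement variable would hold after storing value."""
    return (value + 0x8000) % 0x10000 - 0x8000


def adc_to_millivolts(raw: int) -> int:
    """Integer-only scaling for a hypothetical 10-bit ADC with a 3300 mV reference.

    Multiplying before dividing keeps precision without floating point,
    but the intermediate product needs more than 16 bits.
    """
    return raw * 3300 // 1023


# An altitude above 32767 does not fit: it reads back as a negative number.
stored_altitude = wrap_int16(33000)

# The intermediate product for a mid-scale reading would also overflow 16 bits.
wrapped_product = wrap_int16(512 * 3300)
```

Here `stored_altitude` comes out as -32536 and `wrapped_product` as -14336, while `adc_to_millivolts(512)` gives 1651 when the intermediate value is allowed enough bits. The design choice being illustrated is to pick integer widths from the largest value the calculation can reach, intermediates included.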
  • 17. You might never be a great programmer…
    … but you can be a paranoid tester!
  • 18. Good Things To Do
    • No infinite loops
    • Self-Checking
    • Unexpected Error Handling
    • Handle Exceptions
    • Simulation
    • Simplify, Simplify, Simplify
    • Unit Test
    • Write Log Files
  • 19. No Infinite Loops
    • Never sit in a loop waiting forever
    • Example: ATLAS 3

        while (1) {
          // Make sure data is available to read
          if (Serial.available()) {
            b = Serial.read();
            if (bytePos == 8) {
              navmode = b;
              return true;
            }
            bytePos++;
          }
          // Timeout if no valid response in 3 seconds
          if (millis() - startTime > 3000) {
            navmode = 0;
            return false;
          }
        }
        }  // closes the enclosing function, whose start is not shown on the slide

    https://github.com/jamescoxon/Atlas-Flight-Computer/blob/master/Atlas3/Atlas3_3.pde#L
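The same pattern, a wait loop that always has a way out, can be sketched in Python. This is an illustration under assumptions, not a port of the ATLAS 3 code: `read_byte` is a hypothetical callable that returns the next byte or `None` when nothing is available, and the clock is injectable so the timeout can be tested without waiting.

```python
import time
from typing import Callable, Optional


def read_nav_mode(
    read_byte: Callable[[], Optional[int]],
    timeout_s: float = 3.0,
    clock: Callable[[], float] = time.monotonic,
    mode_index: int = 8,
) -> Optional[int]:
    """Return the byte at position mode_index, or None if the timeout expires first."""
    start = clock()
    position = 0
    # The loop condition is the timeout itself, so the loop cannot run forever.
    while clock() - start <= timeout_s:
        value = read_byte()
        if value is None:
            continue  # nothing to read yet; go round and re-check the clock
        if position == mode_index:
            return value
        position += 1
    return None
```

Returning `None` on timeout lets the caller decide what to do (retry, reset the GPS, carry on without the value) instead of hanging for the rest of the flight.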
  • 20. Self-Checking

        -- Now enter a self-check of the manual mode settings
        log( "Self-check started" )
        assert_prop( 49, -32764, "Not in manual mode" )
        assert_prop( 5, 0, "AF Assist Beam should be Off" )
        assert_prop( 6, 0, "Focus Mode should be Normal" )
        assert_prop( 8, 0, "AiAF Mode should be On" )
        assert_prop( 21, 0, "Auto Rotate should be Off" )
        assert_prop( 29, 0, "Bracket Mode should be None" )
        assert_prop( 57, 0, "Picture Mode should be Superfine" )
        assert_prop( 66, 0, "Date Stamp should be Off" )
        assert_prop( 95, 0, "Digital Zoom should be None" )
        assert_prop( 102, 0, "Drive Mode should be Single" )
        assert_prop( 133, 0, "Manual Focus Mode should be Off" )
        assert_prop( 143, 2, "Flash Mode should be Off" )
        assert_prop( 149, 100, "ISO Mode should be 100" )
        assert_prop( 218, 0, "Picture Size should be L" )
        assert_prop( 268, 0, "White Balance Mode should be Auto" )
        assert_gt( get_time("Y"), 2009, "Unexpected year" )
        assert_gt( get_time("h"), 6, "Hour appears too early" )
        assert_lt( get_time("h"), 20, "Hour appears too late" )
        assert_gt( get_vbatt(), 3000, "Batteries seem low" )
        assert_gt( get_jpg_count(), ns, "Insufficient card space" )

    https://github.com/jgrahamc/gaga/blob/master/gaga-1/camera/gaga-1.lua#L96
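A pre-flight self-check of this kind can be sketched in Python as a small helper that records every failed expectation instead of stopping at the first. The class, its method names and the `preflight_check` parameters are hypothetical. The battery and hour thresholds are borrowed from the listing above, while the "at least one free image" rule is a stand-in assumption for the slide's `ns` value.

```python
from typing import Callable, List


class SelfCheck:
    """Collects failed expectations so the whole checklist always runs."""

    def __init__(self, log: Callable[[str], None] = print) -> None:
        self._log = log
        self.failures: List[str] = []

    def _record(self, passed: bool, message: str) -> bool:
        if not passed:
            self.failures.append(message)
            self._log("SELF-CHECK FAILED: " + message)
        return passed

    def expect_equal(self, actual, expected, message: str) -> bool:
        return self._record(actual == expected, message)

    def expect_greater(self, actual, limit, message: str) -> bool:
        return self._record(actual > limit, message)

    def expect_less(self, actual, limit, message: str) -> bool:
        return self._record(actual < limit, message)

    @property
    def passed(self) -> bool:
        return not self.failures


def preflight_check(battery_mv: int, free_images: int, hour: int,
                    log: Callable[[str], None] = print) -> SelfCheck:
    """Run a few illustrative checks and return the collected result."""
    check = SelfCheck(log)
    check.expect_greater(battery_mv, 3000, "Batteries seem low")
    check.expect_greater(free_images, 0, "Insufficient card space")
    check.expect_greater(hour, 6, "Hour appears too early")
    check.expect_less(hour, 20, "Hour appears too late")
    return check
```

Collecting all failures in one pass means a single pre-launch run reports everything that is wrong, so nothing is hidden behind the first failed check.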
  • 21. Self-Checking
    • Example: ATLAS 3
    • Makes sure uBlox GPS will work at high altitude; fixes it if not

        if ((count % 10) == 0) {
          digitalWrite(6, LOW);
          checkNAV();
          delay(1000);
          if (navmode != 6) {
            setupGPS();
            delay(1000);
          }
          checkNAV();
          delay(1000);
          digitalWrite(6, HIGH);
        }

    https://github.com/jamescoxon/Atlas-Flight-Computer/blob/master/Atlas3/Atlas3_3.pde#L3
  • 22. Unexpected Error Handling

        def temperature():
            t = at.cmd( 'AT#TEMPMON=1' )

            # Command returns something like:
            #
            # #TEMPMEAS: 0,28
            #
            # OK
            #
            # So split on whitespace first to isolate the temperature 0,28
            # and then split on comma to get the temperature

            w = t.split()
            if len(w) < 2:
                logger.log( "Temperature read returned %s" % t )
                return -1000

            m = w[1].split(',')
            if len(m) != 2:
                logger.log( "Temperature read returned %s" % t )
                return -1000
            else:
                return int(m[1])

    https://github.com/jgrahamc/gaga/blob/master/gaga-1/recovery/util.py
  • 23. Handle Exceptions
    • If your language can generate exceptions then you’d better handle them!
    • Example: GAGA-1
      ◦ Recovery computer used Python
      ◦ Exception could have killed it
      ◦ Global exception handler

        except:
            logger.log( "Caught exception in main loop: %s" % sys.exc_info()[1] )

    • Bonus: What’s wrong with that code?
    https://github.com/jgrahamc/gaga/blob/master/gaga-1/recovery/gaga-1.py#L144
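The slide leaves the bonus question open. Two concerns that apply to any handler of this shape (not necessarily the answer the talk had in mind) are that a bare `except:` also catches `KeyboardInterrupt` and `SystemExit`, and that the logging call inside the handler can itself raise. A minimal sketch of a main loop that guards against both follows; `step`, `log` and `run_main_loop` are hypothetical names, not part of the GAGA-1 code.

```python
import traceback
from typing import Callable


def run_main_loop(step: Callable[[], None],
                  log: Callable[[str], None],
                  iterations: int) -> int:
    """Call step() repeatedly; log and survive ordinary exceptions. Returns the failure count."""
    failures = 0
    for _ in range(iterations):
        try:
            step()
        except Exception:  # leaves KeyboardInterrupt and SystemExit alone
            failures += 1
            try:
                # Log the full traceback, not just the exception message.
                log("Caught exception in main loop:\n" + traceback.format_exc())
            except Exception:
                pass  # a failing logger must not take the loop down with it
    return failures
```

A real flight loop would run indefinitely; the `iterations` parameter exists here only so the sketch terminates and can be tested.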
  • 24. Simulation
    • Simulate a flight
    • Example: UKHAS wiki has example of using a PC as a fake GPS
      http://www.ukhas.org.uk/guides:common_coding_errors_payload_testing
    • Example: GAGA-1
      ◦ To test the embedded Telit module, wrote modules that faked the entire Telit Python interface.
    https://github.com/jgrahamc/gaga/blob/master/gaga-1/recovery/GPS.py
    https://github.com/jgrahamc/gaga/blob/master/gaga-1/recovery/MDM.py
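The idea of a fake GPS can be sketched in a few lines of Python: a stand-in object replays a scripted flight, and the code under test reads from it exactly as it would from the real device. Everything here is hypothetical (the `Fix` tuple, the `FakeGPS.read_fix` interface, the flight profile and the `highest_altitude` function under test); it does not reproduce the Telit interface or the GAGA-1 fake modules.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class Fix(NamedTuple):
    lat: float
    lon: float
    alt_m: int


class FakeGPS:
    """Stands in for a GPS driver by replaying a prepared sequence of fixes."""

    def __init__(self, fixes: Iterable[Fix]) -> None:
        self._fixes = iter(fixes)

    def read_fix(self) -> Optional[Fix]:
        # None signals that the scripted flight is over.
        return next(self._fixes, None)


def simulated_flight(peak_m: int = 30000, step_m: int = 5000) -> Iterator[Fix]:
    """Yield an ascent to peak_m followed by a descent back to the ground."""
    ascent = list(range(0, peak_m + 1, step_m))
    descent = list(range(peak_m - step_m, -1, -step_m))
    for altitude in ascent + descent:
        yield Fix(52.0, 0.0, altitude)


def highest_altitude(gps: FakeGPS) -> int:
    """Example code under test: track the maximum altitude seen during the flight."""
    best = 0
    fix = gps.read_fix()
    while fix is not None:
        best = max(best, fix.alt_m)
        fix = gps.read_fix()
    return best
```

Because the flight profile is just data, the same harness can replay awkward cases (garbled fixes, long gaps, altitudes above 32767) that are hard to produce with real hardware on a desk.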
  • 25. Simplify, Simplify, Simplify
    • Make your code as simple as possible
    • Never have ‘duplicated’ or ‘copy and paste’ code
    • Break it up into small functions that you understand
    • Make sure you understand the limitations of the functions you call
  • 26. Unit Test
    • Break your program up into small, separate functions
    • Write tests that call each function and make sure it does what you expect.
    • Lots of ways to do this
      ◦ Use something like cpptest
      ◦ ArduinoUnit
      ◦ Write your own test program
  • 27. Unit Test Example
    • In the bad APRS program
    • Turn metres to feet code into a separate function: int m_to_f(int m)

        assertEquals(m_to_f(1000),3300)
        assertEquals(m_to_f(2000),6600)
        assertEquals(m_to_f(3000),9900)
        assertEquals(m_to_f(4000),13200)
        assertEquals(m_to_f(5000),16500)
        assertEquals(m_to_f(6000),19800)
        assertEquals(m_to_f(7000),23100)
        assertEquals(m_to_f(8000),26400)
        assertEquals(m_to_f(9000),29700)
        assertEquals(m_to_f(10000),33000)
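The same test can be written with Python's standard `unittest` module. This is a sketch, not the APRS tracker's code: the `m_to_f` body is an assumption that uses the rounded factor of 3.3 feet per metre implied by the expected values on the slide (a more precise factor is about 3.28084).

```python
import unittest


def m_to_f(metres: int) -> int:
    """Convert metres to feet using the slide's rounded factor of 3.3, in integer arithmetic."""
    return metres * 33 // 10


class MetresToFeetTest(unittest.TestCase):
    def test_thousands(self) -> None:
        # Covers inputs containing 8 and 9, the digits that broke the copied code.
        for thousands in range(1, 11):
            metres = thousands * 1000
            with self.subTest(metres=metres):
                self.assertEqual(m_to_f(metres), thousands * 3300)
```

Saved in a file, the test case can be run with `python -m unittest`. A failure on the 8000 or 9000 row would have flagged the altitude bug before flight.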
  • 28. Write Log Files
    • Write detailed log files to non-volatile memory for post-flight debugging
    • Data sent via RTTY or APRS is limited
    • Log exceptions and errors in detail
    • Make sure you have a timestamp
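On a payload that runs full Python, the standard `logging` module covers most of these points. The sketch below is one possible setup under assumptions: the function name, logger name and log format are illustrative choices, and an embedded Python such as the one on the Telit module may not ship `logging` at all.

```python
import logging


def make_flight_logger(path: str, name: str = "flight") -> logging.Logger:
    """Return a logger that appends timestamped lines to the file at path.

    Call this once per logger name; calling it again adds a second handler
    and every message would then be written twice.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    handler = logging.FileHandler(path)
    # asctime supplies the timestamp; levelname separates errors from routine lines.
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

Inside an `except` block, `logger.exception("message")` writes the message together with the full traceback, which covers the "log exceptions and errors in detail" point. The handler flushes after each record, but that does not force the operating system to commit the data to the storage device.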
  • 29. Perform system testing
    • Test your entire system before flight
      ◦ Put your tracker in the garden
      ◦ Get a GPS lock
      ◦ Listen to the RTTY on your radio
      ◦ Look at the decoded RTTY on your computer
      ◦ Test uploaded data on the tracker*
      ◦ *I didn’t do that step; on the day people had to fix the tracker for me.