Why Do Computational Scientists Trust Their Software?


A very informal talk I gave to Hausi Müller's group at UVic in June 2009.

I have included, without permission, slides from Daniel Hook's excellent presentation at SE-CSE 2009 (http://www.cs.ua.edu/~SECSE09/schedule.htm).


1. Why do climate modellers trust their software? Jon Pipitone. Advisor: Steve Easterbrook, University of Toronto. UVic, June 2009
2. This presentation
   - Quick and dirty: I'd prefer discussion over me just blabbering
   - Tell a good story
   - Get feedback from you
     - Is the approach good?
     - What am I missing?
5. "If we knew what we were doing, it wouldn't be called research, would it?" – attributed to Albert Einstein
6. What is climate modelling? A kind of computational science.
7. What is computational science?
   - "A scientific computing approach to gain understanding, mainly through the analysis of mathematical models implemented on computers" – Wikipedia, "computational science"
8. What is computational science?
   - Computers and software are the lab equipment: virtual laboratories.
   - Program outputs are the results of the experiment.
10. Scientific software development (my focus)
11. What is climate modelling?
   - Climatologists build computer models of the climate to try to understand climate processes.
12. What is climate modelling? Source: Easterbrook, CUSEC'09 (Source: IPCC AR4, 2007)
13. What is climate modelling? Source: Easterbrook, CUSEC'09 (Source: IPCC AR4, 2007)
14. General Circulation Models. Source: Easterbrook, CUSEC'09 © Crown Copyright
15. Scientific software development
16. Scientific software development
17. Verification and Validation
   - Desk checking: informal unit tests, some use of debuggers
   - Science review and code review
     - Science review by project managers
     - Code review by designated code owners
   - Continuous testing as science experiments
     - Automated test harness on the main trunk (JP: physical constraints)
     - Bit reproducibility (a strong constraint)
   - Model intercomparisons
   Source: Easterbrook, CUSEC'09
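The "bit reproducibility" check mentioned above boils down to a bit-for-bit comparison of two runs' outputs. A minimal sketch, assuming run outputs are plain files on disk; the file names here are invented for illustration:

```python
import hashlib

def file_digest(path):
    """SHA-256 digest of a file's raw bytes."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()

def runs_bit_compare(path_a, path_b):
    """True iff two run outputs are bit-for-bit identical."""
    return file_digest(path_a) == file_digest(path_b)

# Demonstration with throwaway files standing in for model output dumps.
with open("run_a.out", "wb") as f:
    f.write(b"\x00\x01" * 1000)
with open("run_b.out", "wb") as f:
    f.write(b"\x00\x01" * 1000)
identical = runs_bit_compare("run_a.out", "run_b.out")  # True
```

Hashing keeps memory use constant for the multi-gigabyte outputs a climate run can produce, at the cost of not reporting *where* the runs diverge.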
20. Basic Validation Steps
   - Simulate a stable climate (with no forcings)
     - Models can produce climates with tiny changes in mean temperature but with seasonal and regional variation that mimics real weather.
   - Reproduce past climate change
     - When 20th-century forcings are added, the model should match observations.
   - Reproduce pre-historic climates
     - Models can reproduce the last ice age and the advance of the Sahara desert.
   Source: Easterbrook, CUSEC'09
21. Validation Notes. Source: Easterbrook, CUSEC'09 © Crown Copyright
22. Core problems with V&V
23. Validation notes; bit reproducibility; core problems with V&V
24. In other words…
   - This is science: it's difficult to specify concrete requirements beforehand.
   - What does this mean for software quality?
   - (Note: we're not talking about model quality!)
   - How do we judge quality in scientific software?
28. Software Quality
   - Software quality is a big concept with many facets, or "-ilities" (e.g. reliability, modularity, customisability, …)
   - We're used to thinking of the quality of software as how well it is designed and how well it matches our requirements.
29. Measuring Quality: Defect Density
   - Defect density = # defects / LOC (usually reported per KLOC)
   - Can we benchmark quality using defect density? (It is the most common rough quality measure, from what I've seen.)
   - Preliminary observation: defect density for climate models is lower than for comparably-sized industrial projects.
31. Hadley "defect rates"
   Some comparisons:
   - NASA Space Shuttle: 0.1 failures/KLOC
   - Best military systems: 5 faults/KLOC
   - Worst military systems: 55 faults/KLOC
   - Apache: 0.5 faults/KLOC
   - XP: 1.4 faults/KLOC
   Hadley's Unified Model:
   - avg. of 24 "bug fixes" per release
   - avg. of 50,000 lines edited per release → 2 defects/KLOC make it through to released code (?)
   - expected defect density in current version: 24 / 830,000 ≈ 0.03 faults/KLOC
   Source: Easterbrook, CUSEC'09
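The slide's back-of-envelope figure is easy to reproduce. A sketch of the arithmetic, using only the counts quoted on the slide (24 bug fixes against an ~830,000-line code base):

```python
def defect_density_per_kloc(defects, loc):
    """Defects per thousand lines of code."""
    return defects / (loc / 1000.0)

# Figures quoted on the slide for the Hadley Centre's Unified Model.
hadley = defect_density_per_kloc(24, 830_000)
print(round(hadley, 2))  # ≈ 0.03 faults/KLOC
```

On this crude measure the Unified Model sits between the Space Shuttle's 0.1 failures/KLOC and zero, which is what motivates the "is it really that good?" question on the next slide.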
32. Few Defects Post-release
   - Obvious errors:
     - Model won't compile / won't run
     - Model crashes during a run
     - Model runs, but variables drift out of tolerance
     - Runs don't bit-compare (when they should)
   - Subtle errors (model runs appear "valid"):
     - Model does not simulate the physical processes as intended (e.g. some equations/parameters are incorrect)
     - The right results for the "wrong reasons" (e.g. over-tuning)
     - Expected improvement not achieved
   Source: Easterbrook, CUSEC'09
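The "variables drift out of tolerance" check above is essentially a range assertion over a run's time series. A minimal sketch; the variable, values, and tolerance band are invented for illustration:

```python
def first_out_of_tolerance(series, low, high):
    """Return the first timestep at which a variable leaves [low, high], or None."""
    for step, value in enumerate(series):
        if not (low <= value <= high):
            return step
    return None

# e.g. global mean surface temperature (K) over a short control run
temps = [287.1, 287.3, 287.0, 291.9, 287.2]
bad_step = first_out_of_tolerance(temps, low=286.0, high=289.0)  # → 3
```

Reporting the first offending timestep (rather than just pass/fail) matters in practice, since it tells the modeller where in the run to start looking.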
38. Measuring Quality: Defect Density. So, is climate modelling software really that good?
39. On Benchmarking
   - Comparing defect rates is very subjective:
     - It ultimately depends on testing strategy
     - When are we counting: pre- or post-release?
     - How do we factor in severity? (Pareto law: 20% of the bugs cause 80% of the problems)
     - Bug type: a "bug" is not a "bug" across projects.
     - There are no standards in the literature.
43. What are we measuring?
   - "Absolute values suck; we don't know what they mean" – participant at the Software Quality workshop at ICSE
   - We don't have an underlying theory of software quality yet
     - i.e. how do all these "-ilities" relate and correspond to the world?
44. We could ask…
   - What are the important aspects of quality for computational scientists?
45. We could ask…
   - What makes a piece of software good?
   - What makes a piece of software bad?
   - How do you know when you're done?
   - How do you train newcomers?
   or…
   - When have you had to delay a release because of a bug? Why?
   - Tell me the story behind these and other bugs…
52. Questions to ask…
   From "Software Quality Measurement: A Framework for Counting Problems and Defects" (Florac, CMU/SEI-92-TR-022):
   - Finding activity: what activity discovered the problem or defect?
   - Finding mode: how was the problem or defect found?
   - Problem type: what is the nature of the problem? If a defect, what kind?
   - Criticality: how critical or severe is the problem or defect?
   - Related changes: what are the prerequisite changes?
   - …
   And:
   - Why did the bug go unnoticed?
   - Why was it important to have fixed this bug then?
   - How was the bug fixed? Why is the fix appropriate?
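Florac's counting framework amounts to recording a fixed set of attributes per problem report. A sketch of that record as a data structure; the field names paraphrase the framework's questions, and the example values are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class DefectRecord:
    finding_activity: str   # what activity discovered the problem or defect
    finding_mode: str       # how the problem or defect was found
    problem_type: str       # nature of the problem; if a defect, what kind
    criticality: str        # how critical or severe it is
    related_changes: str    # prerequisite changes, if any

# A hypothetical report, in the terms used elsewhere in this deck:
record = DefectRecord(
    finding_activity="automated test harness",
    finding_mode="runs failed to bit-compare",
    problem_type="logic error in a parameterisation",
    criticality="major",
    related_changes="none",
)
```

Fixing the schema up front is what makes defect counts comparable across releases, and eventually across modelling centres.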
53. Why study climate modellers?
   - Socially relevant
   - We already have connections with climate modelling groups
   - Preliminary data suggests the quality of their models is high:
     - What can we learn from them?
     - What can we teach them?
   - A good example of well-established computational science.
57. My study
   - Why do climate modellers trust their code?
   - What do climate modellers do when coding to guarantee correctness?
   - What are their notions of quality with respect to code?
   - How can we benchmark computational scientists' code quality?
61. My study: detailed analysis of defect density
   - Pre- and post-release defect counts
     - Discovered through bug reports and version-control comments (e.g. check-in comments containing "fixed", "bug #", etc.)
   - Defect density over releases (trends)
   - Breakdown by defect types (but what are they?)
   - Maybe: static fault density using an automated tool
   - Examine several climate models (>3?)
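The check-in-comment heuristic above can be sketched as a simple pattern match. The keyword list is an assumption (real mining studies tune it per project), and the example comments are invented:

```python
import re

# Flag version-control check-in comments that look like defect fixes
# ("fixed", "bug #1234", "fault", ...). Assumed keyword list; tune per project.
BUGFIX_PATTERN = re.compile(r"\b(fix(ed|es)?|bug\s*#?\d*|defect|fault)\b",
                            re.IGNORECASE)

def is_bugfix(comment):
    """True if a check-in comment looks like it records a defect fix."""
    return bool(BUGFIX_PATTERN.search(comment))

comments = [
    "Fixed overflow in radiation scheme",        # flagged
    "bug #142: wrong sign on latent heat flux",  # flagged
    "Add new aerosol diagnostics",               # not flagged
]
flagged = [c for c in comments if is_bugfix(c)]
```

Keyword matching over-counts (e.g. "fix typo in comment") and under-counts (fixes with terse messages), which is one reason the study pairs the counts with qualitative defect stories.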
66. My study: qualitative investigation
   - Semi-structured interviews of climate modellers
   - Use the questions given previously to guide the conversation
   - Investigate the story of a defect and the judgement calls that were made
   - ~5 defect stories per climate modelling centre
   - Cross-case analysis
71. Outcomes
   - Towards a theory of code quality for climate modelling (computational science?) software
     - Empirical basis
     - Future: a relevant quality benchmark
   - Benchmarking statistics for climate modelling code
     - Useful for climate modelling groups
   - Learn from CS; where can we help?
73. Questions?
   - How well did I present the background to the study?
   - … the objective of the study?
   - Issues with the study itself?
     - No direct investigation of code quality, only of problems
       - Addressed to some extent through fault analysis
     - What would I look for?
   - Others?