UMich CI Days: Scaling a code in the human dimension

430 views

Published on

Presentation from 2013 UMich CyberInfrastructure Days.

Abstract: As scientists’ needs for computational techniques and tools grow, they cease to be supportable by software developed in isolation. In many cases, these needs are being met by communities of practice, where software is developed by domain scientists to reach pragmatic goals and satisfy distinct and enumerable scientific goals. We describe challenges encountered by communities of scientist developers, as well present techniques that have been successful in growing and engaging communities of practice, specifically in the yt and Enzo communities of computational astrophysics.

Published in: Technology
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
430
On SlideShare
0
From Embeds
0
Number of Embeds
6
Actions
Shares
0
Downloads
5
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

UMich CI Days: Scaling a code in the human dimension

  1. 1. Scaling a Code in the Human Dimension Matthew Turk Columbia University
  2. 2. Barracuda
  3. 3. Barracuda Battery
  4. 4. Barracuda Cobra Battery
  5. 5. Barracuda Battery Cobra Quiver
  6. 6. Barracuda Battery Cobra Quiver Kangaroo
  7. 7. Barracuda Battery Cobra Quiver Kangaroo Mob
  8. 8. Barracuda Battery Cobra Quiver Kangaroo Mob Ancedote
  9. 9. What is yt?
  10. 10. astro-ph/1011.3514 astro-ph/1112.4482 yt-project.org
  11. 11. There are many simulation codes.
  12. 12. There are many simulation codes, but there is only one sky.
  13. 13. Fully-Supported Semi-Supported In-Progress Enzo FLASH Orion Nyx Raw Data Piernik Chombo Athena ART RAMSES GDF Cactus Gadget GAMER PENCIL
  14. 14. yt is designed to address physical, not computational, entities and questions.
  15. 15. yt is supposed to get out of the way.
  16. 16. from yt.mods import * pf = load("galaxy0030/galaxy0030") p = SlicePlot(pf, 2, "Density", "c", (200, "kpc")) p.show()
  17. 17. from yt.mods import * pf = load("galaxy0030/galaxy0030") p = SlicePlot(pf, 2, "Density", "c", (200, "kpc")) p.show() Load from disk, determine IO format, parse parameters, set up mesh, initialize IO, and create geometric objects.
  18. 18. from yt.mods import * pf = load("galaxy0030/galaxy0030") p = SlicePlot(pf, 2, "Density", "c", (200, "kpc")) p.show() Identify appropriate subregions of data, mask out overlapping data, convert to CGS, concatenate, pixelize, and return plot.
  19. 19. 10-12 g/cc
  20. 20. Low-Level Data Handling Objects and Physical Quantities Canned Analysis Tasks
  21. 21. (advanced) Low-Level Data Handling (intermediate) Objects and Physical Quantities (beginner) Canned Analysis Tasks
  22. 22. "Community"?
  23. 23. Traditional View of Scientific Development "Users" "Developers"
  24. 24. Most Scientific Development "Users" "Developers"
  25. 25. Community of Practice "Devusers"
  26. 26. "Developers" Inspection and verification Tracking modifications Sharing information Doing new and interesting things
  27. 27. "Users" Uncritical acceptance of code?
  28. 28. "Users" "These are the people we give the code to that don't care how it works."
  29. 29. Challenges
  30. 30. Academic Reward Structure
  31. 31. Academic Reward Structure de facto & de jure
  32. 32. de facto & de jure ○ Utilization of developed tools ○ Respect from community ○ Project involvements ○ Invitations and opportunities to speak
  33. 33. de facto & de jure ○ Funding ○ Publications ○ Citation count ○ Influence
  34. 34. Traditional astrophysics does not favor tool builders.
  35. 35. Chores Documentation, testing, outreach, infrastructure development.
  36. 36. Chores Tasks not fully-aligned with reward structure present great motivational challenges.
  37. 37. Co-opetition ( ○ Funding ○ Publications ○ Citation count ○ Influence )
  38. 38. The "citation economy" for community codes is broken, and this disproportionately impacts new and junior contributors.
  39. 39. The "citation economy" for community codes is broken, and this disproportionately impacts new and junior contributors. (It's bad for us, but even worse for infrastructure.)
  40. 40. How developer community engagement, cohesion, excitement and energy is affected by funded improvements remains unclear.
  41. 41. Feelings of “ownership” are tricky in academia, particularly for young researchers.
  42. 42. Feelings of “ownership” are tricky in academia, particularly for young researchers. How can progress, credit, and intellectual priority be balanced?
  43. 43. How do the relationships between distributed software developers interact with academic relationships?
  44. 44. Strategies
  45. 45. Design the community you want.
  46. 46. Design the community you want. Diversity. Tone. Enthusiasm. Congeniality.
  47. 47. Design the community you want. This is an investment.
  48. 48. Design the community you want. Diversity. Tone. Enthusiasm. Congeniality.
  49. 49. Growing diversity in scientific software development is a key challenge.
  50. 50. Technical & Social
  51. 51. Technical & Social
  52. 52. Reduce barrier to entry Test on every push Review every changeset
  53. 53. Reduce barrier to entry Test on every push Review every changeset Everything comes in the box: version control, extensions, sample data, dependencies, and tutorials.
  54. 54. CVCS DVCS Repo Repository Repo Repo Repo Individuals Repo Repo
  55. 55. Reduce barrier to entry Test on every push Review every changeset Shining Panda for unit tests & small data answer tests, ReadTheDocs.org, and an auto-deployed ReST blog.
  56. 56. Reduce barrier to entry Test on every push Review every changeset Pull requests and mentoring of new developers, through IRC, mailing list, and code comments.
  57. 57. An upstream path...
  58. 58. Happy Application Itch-Scratching Pull Request Submission Code Review & Mentoring Participation
  59. 59. Happy Application Itch-Scratching Pull Request Submission Code Review & Mentoring Participation
  60. 60. Accept contributions of data, scripts, images, projects
  61. 61. Communication All project business is conducted openly.
  62. 62. Immediate
  63. 63. Immediate
  64. 64. Immediate
  65. 65. Low-Latency
  66. 66. High-Latency
  67. 67. Technical & Social
  68. 68. Culture self-propagates. So, it must be seeded directly.
  69. 69. The traditional "Open Source" community is not always the right role model.
  70. 70. The traditional "Open Source" community is not always the right role model. Diversity. Tone. Enthusiasm. Congeniality.
  71. 71. Foster a community of peers, not a community of elites.
  72. 72. H R T
  73. 73. H Fitzpatrick & Collins-Sussman R T
  74. 74. Humility R T
  75. 75. Humility Respect T
  76. 76. Humility Respect Trust
  77. 77. Humility
  78. 78. I think there might be a bug in ... It's like that for a good reason. Don't touch it.
  79. 79. I think there might be a bug in ... It behaves that way because ...
  80. 80. Respect
  81. 81. I've noticed something is acting strangely with ... You're probably doing it wrong.
  82. 82. I've noticed something is acting strangely with ... Can you tell us how you'd expect it to act?
  83. 83. Trust
  84. 84. Letting go...
  85. 85. By emphasizing pride over ownership, we've tried to help projects move between people without smothering through control.
  86. 86. By emphasizing pride over ownership, we've tried to help projects move between people without smothering through control. This is really difficult.
  87. 87. Successes
  88. 88. Developed by working astrophysicists.
  89. 89. ! Developed by working astrophysicists.
  90. 90. Usage on XSEDE Nautilus Szczepanski et al, 2012
  91. 91. Moving beyond computational astrophysics.
  92. 92. Moving beyond astrophysics.
  93. 93. Our first major litmus test, a 3.0 release with major infrastructure overhauls, approaches.
  94. 94. "... it seems likely that significant software contributions to existing scientific software projects are not likely to be rewarded through the traditional reputation economy of science. Together these factors provide a reason to expect the over-production of independent scientific software packages, and the underproduction of collaborative projects in which later academics build on the work of earlier ones." Howison & Herbsleb (2011)
  95. 95. First Workshop on Sustainable Software for Science: Practices and Experiences SC13: This Sunday! wssspe.researchcomputing.org.uk
  96. 96. Thank you.
  97. 97. (Short) Bibliography "What Makes Computational Open Source Libraries Successful?" by Bangerth & Heister "The Art of Community" by Jono Bacon "Producing Open Source Software" by Karl Fogel "Team Geek" by Brian Fitzpatrick & Ben Collins-Sussman "Organizing Simulation Code Collectives" by Mikaela Sundberg "Scientific Software Production" by James Howison & James Herbsleb "Your Community is your Best Feature" by Gina Trapani "The Proof of the Pudding" by John Allsopp "Standing Out in the Crowd" by Skud "Why you should write buggy software" by Brian Granger
  98. 98. Tom Abel Kenza Arraki David Collins Brian Crosby Andrew Cunningham Hilary Egan Nathan Goldbaum William Grey Markus Haider Cameron Hummels Christian Karch Steffen Klemer Kacper Kowalik Mike Kuhlen Eve Lee Sam Leitner Yuan Li Chris Malone Josh Moloney Chris Moody Andrew Myers Jill Naiman Kaylea Nelson Jeff Oishi Jean-Claude Passy Mark Richardson Thomas Robitaille Anna Rosen Doug Rudd Anthony Scopatz Devin Silvia Sam Skillman Stephen Skory Britton Smith Geoffrey So Casey Stark Elizabeth Tasker Stephanie Tonnesen Sebastian Trujillo-Gomez Matthew Turk Rick Wagner Andrew Wetzel John Wise John ZuHone

×