How to sustain a tool building community-driven effort


  1. 1. How to sustain a tool building community-driven effort. Experiences from the modeling trenches. Jordi Cabot, @JordiCabot – jordicabot.com
  2. 2. WHO AM I?
  3. 3. SOM Research Lab Software runs the world. Models run the software
  4. 4. Our mission We are interested in the broad area of systems and software engineering, especially promoting the rigorous use of software models and engineering principles in all software engineering tasks while keeping an eye on the most unpredictable element in any project: the people involved in it. Flickr/clement127
  5. 5. WHY AM I HERE?
  6. 6. Ask the organizers!
  7. 7. Building [modeling] tools since 2002
  8. 8. Why the f*** are people not using our tools? Why the f*** is nobody helping me to maintain them?
  9. 9. Open source tools
  10. 10. Being honest with your tools
  11. 11. KEY SUSTAINABILITY DIMENSIONS
  12. 12. Guess where Papyrus is at ...
  13. 13. Careful -> Side effects: If there’s no tool, the language is invisible. If the tool gets abandoned, the language follows suit (ATL vs QVT)
  14. 14. UML with a drawing tool
  15. 15. Successful OSS tool/repo/spec Governing Optimizing Onboarding
  16. 16. Code Community
  17. 17. ONBOARDING
  18. 18. Onboarding users
  19. 19. Researchers vs Practitioners. Researchers: happy with whatever you throw at them; “infinite” time (PhD students); used to low-quality tools. Practitioners: need documentation, support, nice UIs…; they are “spoiled”; they don’t get why it’s so complicated for us to build good tools; THEY WILL NOT PAY/HELP
  20. 20. Force researchers to use your tool • If the tool/lang/repo is a community effort you can force people to use it • Teaching is a good starting point (if you provide the teaching material) • Ask authors to provide the models as a replicability package or for artefact evaluation
  21. 21. Onboarding contributors (and keeping them!)
  22. 22. OSS is a matching market (markets where money is not the main factor)
  23. 23. OSS is not an exception
  24. 24. Facilitate onboarding: Importance of first impressions
  25. 25. Facilitating onboarding is just a first step. You need to be more proactive
  26. 26. What’s in it for me? Learning Community recognition Citations Credibility Clients / Money
  27. 27. Onboarding contributors from volunteers
  28. 28. Goal vs Reality
  29. 29. Unpaid / Junior programmers • Should not be assigned any critical task • Need a lot of supervision • May end up generating useless code • Disappear shortly after (IP issues!) • Great for them, not so much for “you” E.g. in Xatkit interns do not contribute anything to the main repo. They work on “lab repos”
  30. 30. 1 organization – N repos • Research repos (“labs”) can eventually be part of the main tool (after rework) • They can also stay and be used as labs for the people taking the risk • Feedback from the tool can generate labs to come up with the right solution
  31. 31. Onboarding contributors from industry Unpopular opinion (1): You cannot get industrial users without a mature tool (docs, UI, support,…) Unpopular opinion (2): You cannot get a mature tool without an industrial contributor
  32. 32. Research-Industry collaboration models • Direct Transfer contracts • Industrial PhDs • Large Consortium projects (e.g. EU ECSEL) • Co-production models • Industrial Research Labs • … They all work (sometimes)
  33. 33. Industrialization triangle
  34. 34. A common problem
  35. 35. Our solution: entrepreneurship path. If you can’t find them, join them!
  36. 36. Commercial open source business model: (1) Release prototype as OSS; (2) Improve it to be used in a “kind of” real environment; (3) Kickstart a community; (4) Create commercial services / extensions. Maybe some of you want to try this path?
  37. 37. Example: Xatkit.com (OSS chatbot dev platform). Language versions shown: External DSL – Tree based; External DSL – State machine; Internal DSL. Annotations: Two DSLs; First paper. Feedback: Real chatbots are not trees, and replies are not just text; users don’t want to learn a new tool; plus much other useful feedback, e.g. target platforms, monitoring, trolls, bot generators
  38. 38. When to stop
  39. 39. Is this model really viable? • Evaluation of researchers should be based more on impact than on “bean counting” – All universities signed the DORA declaration but… • Researchers need to be taught business, marketing and financial skills – And must want to invest time in acquiring these skills • Not easy! (but none of the other collab models is that easy either)
  40. 40. Governance
  41. 41. Governance in every project MUST be explicit (TRANSPARENCY)
  42. 42. Benevolent Dictator for Life
  43. 43. ******
  44. 44. © Apple Records Power to the people
  45. 45. Governance should also be more democratic
  46. 46. The many names of Democracy
  47. 47. The many names of Democracy
  48. 48. The many aspects of governance Analysis and modeling of the governance in general programming languages. Canovas, Cabot. SLE 2019: 179-183
  49. 49. ******
  50. 50. A DSL for governance rules Enabling the Definition and Enforcement of Governance Rules in Open Source Systems. Javier Luis Cánovas Izquierdo, Jordi Cabot: ICSE (2) 2015: 505-514
  51. 51. Project myProject {
        Roles: Committers
        Deadlines: myDeadline : 7 days
        Rules:
          myMajorityRule : Majority {
            applied to Task
            when TaskReview
            people Committers
            range Present
            minVotes 3
            deadline myDeadline
          }
      }
      Verbalization: All the proposals for new development tasks will be accepted or rejected in 7 days by the committers of the project.
  52. 52. Optimization
  53. 53. Software Analysis
  54. 54. Community Health
  55. 55. Understanding Community = Graph Analysis • Many types of graphs (e.g. Bipartite graphs) • Many types of properties – Micro-view (local properties) – Macro-view (global properties) – Meso-level (emerging properties) • Analysis at different levels and on different dimensions (e.g. non-code contributors!)
  56. 56. Build the right graph for your purpose
  57. 57. Label Usage
  58. 58. User Involvement
  59. 59. Bus Factor “Number of key developers who would need to be incapacitated (hit by a bus) to send the project into such disarray that it would not be able to proceed”
  60. 60. 64.43% 12.58%
  61. 61. Betweenness & co. Useful to identify subcommunities and increase communication
  62. 62. Nestedness -> Occasional contributors focus on the most frequently modified files. You still need to “force” people to work on the rest (typically: backend or legacy or “not cool” parts of the project). Online division of labour: emergent structures in Open Source Software. Palazzi, Cabot, Cánovas, Solé-Ribalta & Borge-Holthoefer. Scientific Reports volume 9, Article number: 13890 (2019)
  63. 63. http://matt.might.net/articles/phd-school-in-pictures And there’s much more... rich club ordering, small world behaviour, modularity…
  64. 64. FINAL THOUGHTS
  65. 65. Should I stay or should I go? • Little chance of success, but GO for it • Worst-case scenario: many people from the community will learn a lot
  66. 66. jordi.cabot@icrea.cat @JordiCabot jordicabot.com
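
As an illustration of the governance rule on slide 51, here is a minimal Python sketch of how such a majority rule might be evaluated. It is not the DSL's actual implementation: the class name MajorityRule, the decide method and the reading of "range Present" as "count only the votes actually cast" are all assumptions made for this example, as is the choice to keep a task pending until the deadline passes.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class MajorityRule:
    """Hypothetical stand-in for the slide's `myMajorityRule`."""
    min_votes: int       # corresponds to `minVotes 3`
    deadline: timedelta  # corresponds to `myDeadline : 7 days`

    def decide(self, votes, opened_at, now):
        """Return 'pending', 'accepted' or 'rejected' for one task review.

        `votes` maps a committer name to True (in favour) or False (against).
        Assumption: only votes actually cast are counted ("range Present").
        """
        # Assumption: nothing is decided before the deadline has passed.
        if now < opened_at + self.deadline:
            return "pending"
        # Too few votes by the deadline: treat the proposal as rejected.
        if len(votes) < self.min_votes:
            return "rejected"
        in_favour = sum(1 for vote in votes.values() if vote)
        # Strict majority of the votes cast; a tie counts as a rejection.
        return "accepted" if in_favour > len(votes) / 2 else "rejected"


rule = MajorityRule(min_votes=3, deadline=timedelta(days=7))
opened = datetime(2020, 1, 1)
outcome = rule.decide(
    {"ana": True, "bob": True, "eve": False},
    opened_at=opened,
    now=opened + timedelta(days=7),
)
print(outcome)
```

Writing the rule down in an executable form like this is one way to make a project's governance explicit, which is the point the slides make about transparency.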

Editor's Notes

  • Thank you for the invitation. Happy to be here!
  • My team at som-research.uoc.edu
  • We have a soft side and this has helped us in trying also to understand the users of our tools
  • I’m one of those that only use feature models when writing a survey paper
  • Modeling tools we have developed
  • And as soon as we started building those tools, we started wondering about these two questions

    “Nobody” means:
    Not the university
    Even less the evaluation agencies. Have you seen a “tool impact factor” section in any evaluation form? A good tool is equivalent to how many journal papers? <- the question nobody is answering

  • The legend is because we’re forced to abandon many of them
  • Let’s now get into what I’ve learned in the process and what I can recommend.

    But let’s take this presentation as an open discussion. Let’s all imagine we’re in the middle of a real coffee break and feel free to participate at any time

    This is not a recipe of actions but a collection of discussion points that I hope you’ll find interesting
  • In 2016 I gave a talk on the sustainability of Papyrus, trying to prevent Papyrus from dying. Unfortunately, it is more in a zombie state right now, mostly due to the lack of industrial support but also because they were not listening to the users

  • ATL was THE model transformation language. Now only researchers still use it

    The opposite is also true: if nobody cares about the language then nobody cares about the tool...
  • UML situation is a combination of a complex language with a complex tool

    UML is not going to disappear (too big to fail), but clearly it’s not going in the right direction
  • The three aspects are interrelated and just a way to try to decompose the problem.

    They apply to any other OSS artefact, be it a tool, a repository or a language spec

    By the way this is a great opportunity for interdisciplinary research!!!!
    But we cannot do it alone, we need powerful friends from political science, social science and ecology / complex systems.
    Complex systems study how parts of a system give rise to the collective behaviors of the system, and how the system interacts with its environment

    I’ll focus mostly on the onboarding one
  • People don’t choose a tool based on the quality of its code alone. They also choose it based on the quality of the community (e.g. to get support)

  • Having industrial users is of course very interesting for all of us but they are challenging ones. So this is a decision you’ll need to make (and “pay the price”). Who are you targeting??

    They are very demanding but will hardly ever help you (we often get feature requests from people who have interesting domain names in their email addresses, but we have hardly ever got anything useful from them)


  • The topic is complex. It’s even the core work of one of the latest Nobel Prize winners
  • Do not be naïve. You must be proactive.
  • Yes you also need to cover the basics

    If a project doesn’t make a good first impression, newcomers may wait a long time before giving it a second chance. Importance of a good impression!
    Up-for-grabs and good-first-bugs are curated tasks specifically for new contributors

  • This is also a discussion
  • State that we don’t propose to stop all these models but to propose a new one
  • There is a cost in integrating something in the main tool!!! Be sure you need it

    WordPress uses this model -> they create plugins to experiment with new features they want to incorporate. Then they decide whether the plugin stays as a plugin or gets merged into the core

    We do the same at Xatkit

    Governance (to be discussed later) is key to decide each arrow
  • You could get some people from Innovation departments but not the real users
  • State that we don’t propose to stop all these models but to propose a new one
  • We still get feedback but it’s an indirect one
  • All these models share a common problem: they need to find the right company to work with
  • 1) Release the prototype as OSS.
    2) Improve it to make it usable in real environments.
    3) Aim to get free users to kickstart a community.
    4) Try to get paying users by creating a commercial extension or services on top of the open-source core.

    Learning speeds up in steps 3 and 4



    In this journey you will evaluate product-market fit, talk to users, test the product under realistic conditions, …

    Of course, new problems then pop up (intellectual property?)

  • Huge gap between our first paper and the current version of Xatkit thanks to the real feedback

    Also in your case, whatever you think is a good language could improve a lot if you manage to attract users (beyond your core community)
  • You can stop once you reach the plateau of diminishing returns (unless you actually want to go all the way to the end and create the company)
  • It’s not for everybody
  • The typical reaction when I say this
  • And our second proposal is to have a closer look at the democracy models and see which ones could work best for open source. We have over 500 variants of democracy
  • Of course, once you choose one, you’ll need tool support to implement your democratic model

    Democracy doesn’t mean there are no clear responsibilities, or that you cannot operate in an effective way
  • So what are we proposing?
    Two things. First, to address transparency -> add to each project a governance.md file that makes the governance rules of the project explicit, so that people know what to expect
  • If so, the management of the project could even be automated or assisted. In fact, we have an Eclipse plugin that, via a tool called Mylyn, connects to several issue and bug trackers to extract this information, applies the rules and updates the issues.
  • Software Analysis is the tool we are going to use to understand what makes a project succeed. The key is to have the project itself as our target of study, to learn what works and what doesn’t <- New research field of software mining thanks to GitHub and its more than 30 M projects
  • I’ll show you some examples.
  • A bipartite graph (or bigraph) is a graph whose vertices can be divided into two disjoint sets U and V (that is, U and V are each independent sets) such that every edge connects a vertex in U to one in V. Vertex sets U and V are usually called the parts of the graph
  • Still, looking at raw community data is a mess. A good community analysis is not trivial to do.
  • Comment the meaning of the size of
  • The importance not only of the code but also of the discussion around the project, for example which labels are used most
  • And who takes care of commenting on / closing them (a kind of “bus factor”, but for the interaction with users).
  • I will now show you three metrics that can be calculated on top of these graphs of data.
    1- Bus factor
    Helps to assess the employee turnover risk
    Identify the key developers
    Measure the concentration of information
  • Positive evolution of WordPress vs Papyrus bus factor
  • Number of shortest paths that pass through a node. The more paths, the higher the betweenness centrality. It says that if we lose nodes with high betweenness we fragment (or “delay”) the community
    Related to clustering / subcommunities / modular classes algorithms
    https://wiki.cs.umd.edu/cmsc734_09/index.php?title=Music_Artist_Collaborations_from_MusicBrainz
  • Only core people tackle the files nobody wants. Drive people to the files that nobody wants to modify
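
To make the bus factor metric from the notes above more concrete, here is a rough Python sketch. It uses a deliberately simplified heuristic, namely the smallest number of authors whose commits together exceed half of all commits, which is an assumption for illustration and not necessarily the method behind the figures on the slides. The function name bus_factor, its threshold parameter and the sample author names are all hypothetical.

```python
from collections import Counter


def bus_factor(commit_authors, threshold=0.5):
    """Smallest number of authors whose commits exceed `threshold` of the total.

    `commit_authors` holds one author name per commit. This is a simplified
    heuristic chosen for illustration; real analyses often look at file-level
    knowledge rather than raw commit counts.
    """
    counts = Counter(commit_authors)
    total = sum(counts.values())
    if total == 0:
        return 0
    covered = 0
    # Walk through authors from most to least active, accumulating commits.
    for rank, (_author, n_commits) in enumerate(counts.most_common(), start=1):
        covered += n_commits
        if covered / total > threshold:
            return rank
    # The threshold was never exceeded (e.g. threshold=1.0): everyone is needed.
    return len(counts)


# Hypothetical commit log: one very active developer and two occasional ones.
history = ["ana"] * 6 + ["bob"] * 3 + ["eve"]
print(bus_factor(history))
```

A low value signals that knowledge is concentrated in very few people, which is the turnover risk the notes refer to.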
