Crowdsourcing Documentation in Software Engineering
Margaret-Anne (Peggy) Storey
ICSE 2014 1st International Workshop on
Crowdsourcing in Software Engineering
Fernando Figueira Filho
Chris Parnin, Georgia Tech
Ohad Barzilay, Tel-Aviv University, Israel
Arie van Deursen, TU Delft, the Netherlands
Li-Te Cheng, IBM Research
Ian Bull, Eclipsesource
“Documentation is the castor oil of programming…”
Gerald Weinberg, The Psychology of Computer Programming
Documentation to capture…
Scenarios of use
Examples of use
To replace communication
To specify a contract with partners
To provide organizational memory
To seek feedback
For the public good! [Wasko et al.]
Documentation challenges
Audience and “fit for purpose”
Consistent use of terminology
Explicit versus tacit knowledge
Lack of good examples
“…obtaining needed services, ideas, or content by soliciting
contributions from a large group of people, and especially from
an online community, rather than from traditional employees or
suppliers… the work comes from an undefined public rather
than being commissioned from a specific, named group…
Explicit crowdsourcing lets users work together to evaluate, share
and build different specific tasks, while implicit crowdsourcing
means that users solve a problem as a side effect of something
else they are doing.” [Wikipedia, June 1, 2014]
Community versus crowd
Individual or team contributions
(e.g. design documents, podcasts)
Community contributions: created by a few
(e.g. translation efforts)
Crowdsourcing contributions: many small
contributions that add value
(e.g. views, likes, comments, tags, votes)
Social production [Yochai Benkler]
Industrial revolution, high costs to access broadcast media
Low cost distributed small contributions at scale
Not just turning levers but adding wisdom, creativity
Not a fad!
Critical long term shift caused by the internet
Social media as a disruptive force:
an enabler for crowdsourcing
Enhancing the participatory culture in
software development and in software
Storey, M.-A., L. Singer, F. Figueira Filho, B. Cleary and A. Zagalsky,
The (R)evolutionary Role of Social Media in Software Engineering,
ICSE 2014 (Future of Software Engineering Track), Hyderabad, 2014.
Social Media Channels for
Outline of the rest of this talk
Some insights on how social media channels
can support “crowdsourced”
documentation in software development
Wikis and software documentation
Used extensively (requirements, design,
planning), integrated with many tools
Lack of authoritativeness
[Dagenais and Robillard FSE 2010]
Designed by Ward Cunningham in 1994
How does tagging help with crowdsourced documentation?
TagSEA: Tagging Waypoints
in source code and gathering into Tours
M.-A. Storey, J. Ryall, J. Singer, D. Myers, L.-T. Cheng, M. Muller.
How Software Developers Use Tagging to Support Reminding and Refinding. IEEE
Transactions on Software Engineering (TSE), 2009.
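To make the idea of waypoints and tours concrete, here is a minimal Python sketch; it is not TagSEA's implementation. It assumes a hypothetical comment syntax, `// @tag <tags> : <description>`, scans source text for such waypoints, and groups every waypoint sharing a tag into a list ordered by file and line, as a crude stand-in for a tour. The names `find_waypoints`, `build_tours`, and `Waypoint` are illustrative only.

```python
import re
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical waypoint syntax (an assumption, loosely TagSEA-style):
#   // @tag tag1 tag2 : optional description
WAYPOINT = re.compile(r"//\s*@tag\s+([\w.\-]+(?:\s+[\w.\-]+)*)\s*(?::\s*(.*))?$")


@dataclass(frozen=True)
class Waypoint:
    file: str
    line: int
    tags: tuple
    description: str


def find_waypoints(file_name, source):
    """Return every waypoint comment found in one source file."""
    waypoints = []
    for line_no, text in enumerate(source.splitlines(), start=1):
        match = WAYPOINT.search(text)
        if match:
            tags = tuple(match.group(1).split())
            description = (match.group(2) or "").strip()
            waypoints.append(Waypoint(file_name, line_no, tags, description))
    return waypoints


def build_tours(files):
    """Group waypoints from all files by tag; each group, sorted by location, acts as a tour."""
    tours = defaultdict(list)
    for file_name, source in files.items():
        for wp in find_waypoints(file_name, source):
            for tag in wp.tags:
                tours[tag].append(wp)
    return {tag: sorted(wps, key=lambda w: (w.file, w.line)) for tag, wps in tours.items()}
```

For example, `build_tours({"Lexer.java": "// @tag parser\nclass Lexer {}\n"})` yields one tour under the key `"parser"`. Grouping by tag is a simplification: a real tour is usually curated by a person, who chooses which waypoints to include and in what order.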
Studied the introduction and adoption of tags for
work items by several teams
C. Treude and M.-A. Storey. Work Item Tagging: Communicating Concerns in
Collaborative Software Development. IEEE Transactions on Software Engineering 38(1),
January/February 2012, pp. 19-34.
– Categorization (cross-cutting concerns; see also
Martin Robillard’s FEAT tool)
– Finding and refinding
Treude, C., and M.-A. Storey, Concernlines: A timeline view of co-occurring concerns,
formal research demonstration, IEEE ICSE’09.
Software engineers actively tweet (share) facts about
software engineering topics and technology
G. Bougie, J. Starke, M.-A. Storey and D. German. Towards Understanding Twitter Use in Software
Engineering: Preliminary Findings, Ongoing Challenges and Future Questions. In Proceedings of the
2nd International Workshop on Web 2.0 for Software Engineering. 2011.
“It was evolving way faster than I was
able to keep up with it. And the only
way to keep up was to follow some
Node.js people on Twitter.”
Leif Singer, Fernando Figueira Filho, Margaret-Anne Storey.
Software Engineering at the Speed of Light: How Developers Stay Current Using Twitter. ICSE 2014.
Determining requirements through blogs
[Park and Maurer, CHASE 2009]
How developers blog: high-level concept
discussion and requirements
[Pagano and Maalej, MSR 2011]
Blogs play a role in documenting APIs
[Treude and Parnin, Web2SE 2011]
Is there potential to increase the size of the
blogging crowd for software documentation?
Question and Answer
What role do Question and Answer websites
play in documentation?
Over 92% of the questions on
Stack Overflow are answered, and for those
questions the median answer time is 11 minutes
L. Mamykina, B. Manoim, M. Mittal, G. Hripcsak, and B. Hartmann.
Design lessons from the fastest Q&A site in the west. CHI 2011.
How-to questions are prevalent and frequently used
C. Treude, O. Barzilay and M.-A. Storey. How do Programmers Ask and Answer
Questions on the Web? NIER/ICSE 2011.
Linking Stack Overflow data with API documentation
C. Parnin, C. Treude, L. Grammel and M.-A. Storey.
Crowd Documentation: Exploring the Coverage and the Dynamics of API Discussions on Stack
Overflow. Under submission; blogged (50,000 hits) at http://blog.ninlabs.com/2012/05/crowd-documentation/,
May 2012.
Stack Overflow as Crowd Documentation
Coverage of API documentation: 77% of the
Java API classes and 87% of the Android API classes
Speed of coverage:
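The coverage figures above can be read as ratios: the number of API classes discussed in at least one Stack Overflow thread divided by the total number of classes in the API. The Python sketch below computes such a ratio under a deliberately naive assumption, namely that a class counts as covered when its simple name appears as a whole word in some post. The function name `api_coverage` is hypothetical, and this is not the linking method used in the cited study.

```python
import re


def api_coverage(api_classes, posts):
    """Fraction of API classes mentioned (by simple name) in at least one post.

    Naive assumption: a whole-word match on the simple class name counts as
    coverage. Real linking must also handle ambiguous names, code blocks,
    and fully qualified names.
    """
    if not api_classes:
        return 0.0
    covered = set()
    for name in api_classes:
        # \b ensures "HashMap" does not match inside "HashMapper".
        pattern = re.compile(r"\b" + re.escape(name) + r"\b")
        if any(pattern.search(post) for post in posts):
            covered.add(name)
    return len(covered) / len(api_classes)
```

With the classes `ArrayList`, `HashMap`, `Activity`, and `Intent`, and posts mentioning only the first, third, and fourth, the sketch reports a coverage of 0.75. Tracking the same ratio over post creation dates would give a rough view of the speed of coverage.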
Documentation! But also …
Reputation: Improves their online persona
Dedication to helping others
“What I wish I had known when I started”
“Throw it up on the internet and forget about it”
Many projects use videos to support documentation
and onboarding (e.g. MSDN) so…
How can they be improved for the recipient?
How effective are videos at sharing tacit knowledge?
Tool enhancements? Integration with IDE?
Cheng, L.-T., M. Desmond and M.-A. Storey, “Presentations by Programmers for
Programmers”, ICSE 2007, IEEE 29th International Conference on Software Engineering.
Is this crowdsourcing?
Are code walkthroughs on YouTube effective?
How much do the social features matter?
A social platform for crowd input for video
Stores code and project resources
Provides version control
Hosts web pages
Links to communication tools
C. Treude and M.-A. Storey. Effective Communication of Software Development
Knowledge Through Community Portals. ESEC/FSE ’11.
Implications of different media
Content on wikis is often stale, but useful for
posting information quickly
Blog posts create more buzz or fanfare
Official product documentation is trusted
(review it carefully or rely on the crowd?)
Have an updating process (or crowdsource it?)
Have mechanisms to solicit feedback
(e.g. commenting, blog posts, voting)
Social Media Channels to
support Software Documentation
Documentation challenges revisited
Recommenders to aid in discoverability
Keeping up: leverage the crowd
Incentive: participatory culture
Video and podcasts for tacit knowledge
Mining of social media can point to code
examples (implicit mechanism)
When does a community become a crowd?
Gaps and nichification?
Study other portals, hubs?
Do these mechanisms translate to industry?
What do you see as challenges, opportunities for
involving the crowd?
Funded by NSERC/DRDC/IBM