Building crowdsourcing applications
Simon Willison - simonwillison.net - @simonw
@media - 9th June 2010
Crowdsourcing?
Let me just cop to the fact that
“crowdsourcing” is a stupid buzzword. But
like “blog” before it, sometimes it’s the stupid
term that sticks. For my purposes, it means
collaborating with the people who used to be the
silent audience to make something better than you
could make alone. - Derek Powazek

http://powazek.com/posts/2443
Crowdsourcing
Accuracy, game mechanics, psychology, real-time, copywriting, usability,
visual design, incentives, moderation, ethics, write-heavy, competition,
statistics, legal liability
Examples
OpenStreetMap
Google Image Labeler
ScenicOrNot
XKCD colour survey
Crowdsourcing at
  the Guardian
The Blair
Rich Project
MPs’ expenses v1
http://mps-expenses.guardian.co.uk/
Background

June 2009

450,000 pages of expenses documents released

“Transparency” = dodgy scanned PDFs

One week’s notice - so one week to build it!
Stuff that worked

The progress bar

Photos of the MPs

Releasing a small group of documents at first

Score boards (once we finally added them)

  Especially the “top in last 48 hours” one
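
A minimal sketch of how a "top in last 48 hours" score board could be computed. The (username, timestamp) vote records and the function name are assumptions for illustration, not the Guardian's actual schema or code. The sample data also hints at one plausible reason a recent-activity board helps: a newcomer can reach the top even when long-standing users dominate the all-time totals.

```python
from collections import Counter
from datetime import datetime, timedelta


def top_contributors(votes, now, window=timedelta(hours=48), limit=10):
    """Rank users by the number of votes cast inside the trailing window.

    `votes` is an iterable of (username, cast_at) pairs (hypothetical shape).
    """
    cutoff = now - window
    counts = Counter(user for user, cast_at in votes if cast_at >= cutoff)
    return counts.most_common(limit)


now = datetime(2009, 6, 20, 12, 0)
votes = [
    ("alice", now - timedelta(hours=1)),
    ("alice", now - timedelta(hours=30)),
    ("bob", now - timedelta(hours=2)),
    # carol leads the all-time count, but all her votes are older than 48 hours
    ("carol", now - timedelta(days=5)),
    ("carol", now - timedelta(days=6)),
    ("carol", now - timedelta(days=7)),
]

print(top_contributors(votes, now))  # [('alice', 2), ('bob', 1)]
```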
Stuff that didn't

Releasing everything else at once

Asking the wrong questions

  Line items!

Too much time fighting scalability fires

Reporting tools were 24 hours too late
Contributors
[Chart: total users by date]
Votes per user
[Chart: number of users by number of votes cast]
MPs’ expenses v2
http://mps-expenses2.guardian.co.uk/
Background

December 2009

Smaller number of documents

One week’s notice (again)

Opportunity to learn from our earlier mistakes
Goals
Find stuff our journalists cared about

Less boring data entry

Data coming out again from the start

Visible rewards for contributors

More digestible tasks

Better sense of activity by other people
Lessons learned

Use Redis for random selections, not MySQL

Assignments made a huge improvement

The most important logic in a crowdsourcing
system is the “next thing to review” button

“Oldest first” pagination is critical
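
A rough sketch of the random-selection idea behind the "next thing to review" button: keep the IDs of pages that still need review in a set and pick one at random, rather than asking the relational database to shuffle rows (for example with ORDER BY RAND(), which gets slow on large tables). The key name, function names and the in-memory stand-in class are assumptions for illustration. The stand-in mimics only the Redis set commands SADD, SRANDMEMBER and SREM so the example runs without a server; a redis-py client exposes methods with the same names.

```python
import random


class InMemorySets:
    """Tiny stand-in for the Redis set commands used below
    (SADD, SRANDMEMBER, SREM), so the sketch runs without a server."""

    def __init__(self):
        self._sets = {}

    def sadd(self, key, *members):
        self._sets.setdefault(key, set()).update(members)

    def srandmember(self, key):
        members = self._sets.get(key)
        return random.choice(sorted(members)) if members else None

    def srem(self, key, *members):
        self._sets.get(key, set()).difference_update(members)


UNREVIEWED = "pages:unreviewed"  # hypothetical key name


def next_page(store):
    """Return a random page ID that still needs review, or None if done."""
    return store.srandmember(UNREVIEWED)


def mark_reviewed(store, page_id):
    """Drop a page from the pool once it has been reviewed."""
    store.srem(UNREVIEWED, page_id)


store = InMemorySets()
store.sadd(UNREVIEWED, 101, 102, 103)

page = next_page(store)      # one of 101, 102, 103
mark_reviewed(store, page)   # that page is never handed out again
```

The same two functions could sit behind assignments as well, for instance by keeping one set per MP or per document type; that layout is a guess at one possible design rather than a description of the system in the talk.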
WildlifeNearYou.com
/dev/fort
Where’s my
nearest llama?
Lessons learned

Be flexible: your users may not share your
precise goals

Optimise for the fat head of your user base

Expose recent activity to site staff

Users will do almost anything for a medal!
Final thoughts

Don’t be afraid: even flawed crowdsourcing
systems produce fascinating results

Think hard about the questions you ask

Have a minimal barrier to entry

Get the next task logic right. Seriously.
Thank you


http://simonwillison.net/

http://twitter.com/simonw

http://simonwillison.net/tags/crowdsourcing/
