The Anatomy of a Large-Scale Human-Computation Engine

Shailesh Kochhar, Stefano Mazzocchi, Praveen Paritosh



Freebase A...
1: Freebase & Human Computation
           2: Example – Stanford Library
                      3: RABJ
                   ...
Freebase
               Structured database
     12 MM entities, 300 MM triples/facts




Aug 18, 2010         Freebase Mee...
Where does the data come from?




Community contributions
                  Mass Data Loads




Human Judgments Improve Both




Community
    Simplify contribution through games




http://typewriter.freebaseapps.com/
Community
    Simplify contribution through games
           Enable QA for Gridworks loads




Mass Data Loads
        Precision: QA for >99% accuracy




Book Edition QA




Mass Data Loads
        Precision: QA for >99% accuracy
         Coverage: Manual reconciliation




matchmaker




               http://matchmaker2.freebaseapps.com/
1: Freebase & Human Computation
           2: Example – Stanford Library
                      3: RABJ
                   ...
Reconcile Stanford Library Catalog
               with freebase.com




Stanford Library Catalog
                  4.4MM book editions
               1.3MM English book editions
                ...
For freebase, identity is key
               match books, match authors




Automatic matching insufficient
  Trained judges needed to decide hard
                     cases




How to get this?




RABJ
       Redundant Array of Brains in a Jar




What?
                      Abstraction
               Powers human judgment (HJ)
                      applications
     ...
Provides primitive elements for more
               sophisticated applications




Questions
               Judgments
                Queues
                Agents



Design Constraints




Content-agnostic
                Dynamic data
                 Low latency




Architecture




Questions contain metadata, pointers
                  to dynamic content

               Questions added to queues

     ...
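
The question and queue model described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the RABJ API: the class names `Question` and `Queue`, the `content_ref` field, the `select` method, and the example paths and metadata keys are all hypothetical. It shows a question carrying metadata plus a pointer to content that is fetched later, and a queue that can be filtered ("sliced") by metadata.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Question:
    """A unit of work: metadata plus a pointer to content resolved at render time."""
    question_id: str
    content_ref: str  # hypothetical pointer to dynamic content (not the content itself)
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class Queue:
    """A named collection of questions (hypothetical model, not the RABJ schema)."""
    name: str
    questions: List[Question] = field(default_factory=list)

    def add(self, question: Question) -> None:
        self.questions.append(question)

    def select(self, **criteria: str) -> List[Question]:
        """Return the questions whose metadata matches every given key/value pair."""
        return [
            q for q in self.questions
            if all(q.metadata.get(k) == v for k, v in criteria.items())
        ]


# Usage: two questions in one queue, sliced by a metadata key.
queue = Queue("book-edition-qa")
queue.add(Question("q1", "/hypothetical/edition/1", {"language": "en", "source": "catalog"}))
queue.add(Question("q2", "/hypothetical/edition/2", {"language": "fr", "source": "catalog"}))
english = queue.select(language="en")
```

Storing a pointer rather than the content keeps the engine content-agnostic: the application that renders the question resolves `content_ref` when a judge actually sees it.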
Acre applications pull questions from
                           RABJ

  RABJ matches judge to available tasks

          ...
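
The pull-based flow on this slide (an application asks for work, the engine picks a task for the judge, the judgment is sent back) might look roughly like the sketch below. Everything here is an assumption made for illustration: the `TaskPool` class, its method names, and the matching rule (a judge is never offered a question they have already judged) are hypothetical and not taken from RABJ.

```python
from typing import Dict, List, Optional, Set, Tuple


class TaskPool:
    """Hypothetical stand-in for the engine side of the pull loop."""

    def __init__(self, question_ids: List[str]) -> None:
        self._open: List[str] = list(question_ids)
        self._seen: Dict[str, Set[str]] = {}  # judge id -> question ids already judged
        self.judgments: List[Tuple[str, str, str]] = []

    def next_question(self, judge_id: str) -> Optional[str]:
        """Return the first open question this judge has not judged, or None."""
        seen = self._seen.get(judge_id, set())
        for qid in self._open:
            if qid not in seen:
                return qid
        return None

    def submit(self, judge_id: str, question_id: str, answer: str) -> None:
        """Record a judgment sent back by the rendering application."""
        self._seen.setdefault(judge_id, set()).add(question_id)
        self.judgments.append((judge_id, question_id, answer))


# Usage: one judge works through the pool; a second judge starts from the top.
pool = TaskPool(["q1", "q2"])
first = pool.next_question("alice")
pool.submit("alice", first, "yes")
second = pool.next_question("alice")
```

Keeping the matching logic on the engine side means each application only has to render a question and post the answer.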
Declarative consensus
 Yes: 3, No: 3, Skip: 4, Invalid: 3, Max: 6

   RABJ notifies agents when consensus
                ...
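
One plausible reading of the rule "Yes: 3, No: 3, Skip: 4, Invalid: 3, Max: 6" is sketched below in Python. The interpretation is an assumption, not something the slides spell out: a question reaches consensus as soon as any answer collects its threshold number of votes, and it is closed without consensus once the maximum number of judgments is reached. The names `RULE`, `MAX_JUDGMENTS`, and `evaluate` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

# Thresholds copied from the slide; the lowercase keys are an assumption.
RULE: Dict[str, int] = {"yes": 3, "no": 3, "skip": 4, "invalid": 3}
MAX_JUDGMENTS = 6


def evaluate(
    judgments: Iterable[str],
    thresholds: Dict[str, int] = RULE,
    max_judgments: int = MAX_JUDGMENTS,
) -> Tuple[str, Optional[str]]:
    """Replay judgments in arrival order and report the question's state.

    Returns ("consensus", answer), ("no_consensus", None), or ("open", None).
    """
    counts: Counter = Counter()
    for n, answer in enumerate(judgments, start=1):
        counts[answer] += 1
        # Answers without a declared threshold can never produce consensus.
        if counts[answer] >= thresholds.get(answer, float("inf")):
            return ("consensus", answer)
        if n >= max_judgments:
            return ("no_consensus", None)
    return ("open", None)


# Usage: three "yes" votes out of four judgments settle the question.
result = evaluate(["yes", "no", "yes", "yes"])
```

Under this reading, a question with six evenly split judgments ends without consensus, which is the kind of leftover that would need separate handling.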
Scale




2.3 MM questions
               3.1 MM judgments
                 500+ queues
               20+ applications



1: Freebase & Human Computation
           2: Example – Stanford Library
                      3: RABJ
                   ...
Always have leftovers




Perfect Consensus? Not!
Evaluating QAers




Explore
         http://rabj.freebaseapps.com/explorer
                        Create
     http://wiki.freebase.com/wiki/R...
Questions?




Rabj freebase all

  1. The Anatomy of a Large-Scale Human-Computation Engine. Shailesh Kochhar, Stefano Mazzocchi, Praveen Paritosh. Freebase August Meetup
  2. 1: Freebase & Human Computation; 2: Example – Stanford Library; 3: RABJ; 4: Consensus. Aug 18, 2010 Freebase Meetup
  3. Freebase: Structured database; 12 MM entities, 300 MM triples/facts
  4. Where does the data come from?
  5. Community contributions; Mass Data Loads
  6. Human Judgments Improve Both
  7. Community: Simplify contribution through games
  8. http://typewriter.freebaseapps.com/
  9. Community: Simplify contribution through games; Enable QA for Gridworks loads
  10. Aug 18, 2010 Freebase Meetup
  11. Mass Data Loads: Precision: QA for >99% accuracy
  12. Book Edition QA
  13. Mass Data Loads: Precision: QA for >99% accuracy; Coverage: Manual reconciliation
  14. matchmaker: http://matchmaker2.freebaseapps.com/
  15. 1: Freebase & Human Computation; 2: Example – Stanford Library; 3: RABJ; 4: Consensus
  16. Reconcile Stanford Library Catalog with freebase.com
  17. Stanford Library Catalog: 4.4MM book editions; 1.3MM English book editions; 1.2MM English books; 600K authors
  18. For freebase, identity is key: match books, match authors
  19. Automatic matching insufficient. Trained judges needed to decide hard cases
  20. How to get this?
  21. RABJ: Redundant Array of Brains in a Jar
  22. What? Abstraction. Powers human judgment (HJ) applications. 3.1MM judgments in 16 months
  23. Provides primitive elements for more sophisticated applications
  24. Questions, Judgments, Queues, Agents
  25. Design Constraints
  26. Content-agnostic; Dynamic data; Low latency
  27. Architecture
  28. Questions contain metadata, pointers to dynamic content. Questions added to queues. Metadata allows slicing and dicing
  29. Acre applications pull questions from RABJ. RABJ matches judge to available tasks. Acre renders question, sends judgment back
  30. Declarative consensus: Yes: 3, No: 3, Skip: 4, Invalid: 3, Max: 6. RABJ notifies agents when consensus is reached
  31. Scale
  32. 2.3 MM questions; 3.1 MM judgments; 500+ queues; 20+ applications
  33. 1: Freebase & Human Computation; 2: Example – Stanford Library; 3: RABJ; 4: Consensus
  34. Always have leftovers
  35. Perfect Consensus? Not!
  36. Evaluating QAers
  37. Explore: http://rabj.freebaseapps.com/explorer. Create: http://wiki.freebase.com/wiki/RABJ_Tutorial. Reference: http://wiki.freebase.com/wiki/RABJ_API/
  38. Questions?