Improving Family Search Indexing Efficiency and Quality

RootsTech workshop presentation

  • The goal of FamilySearch is to help people find their ancestors. It is a freely available resource that compiles information from databases around the world. The LDS Church sponsors it, but anyone can use it for free.
  • FamilySearch Indexing’s role is to transcribe text from scanned images into a machine-readable format that can be searched. This is done by hundreds of thousands of indexers. [Would be nice to include some background slides on FamilySearch Indexing.] This is likely one of the largest crowdsourcing projects in the world.
  • The current quality control mechanism is called A-B-Arbitrate (or just A-B-ARB for short). In this process, A and B index the document independently, and an experienced arbitrator (ARB) reviews any discrepancies between the two.
  • Documents are being scanned at an increasing rate. If we are to benefit from these new resources, indexing efforts will need to keep pace with scanning.
  • A new approach based on peer review instead of independent indexing would likely improve efficiency, but its effect on quality is unknown. Anecdotal evidence suggests that peer reviewing may be twice as fast as indexing from scratch.
  • The model could include arbitration (ARB), or that step could be skipped if A-R results in high enough quality on its own.
  • Data is currently being collected for R and ARB. It should be done in a few weeks.
  • Combining humans and algorithms into the same process would allow FamilySearch to continue to improve machine learning algorithms based on millions of records.
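
The A-B-ARB process described in the notes above can be sketched roughly as follows. This is an illustration only, not FamilySearch's implementation: the function name `a_b_arbitrate`, the dictionary record shape, the exact-match comparison, and the `arbitrate` callback are all assumptions made for the example.

```python
from typing import Callable, Dict

# Hypothetical record shape: field name -> transcribed value.
Record = Dict[str, str]


def a_b_arbitrate(a: Record, b: Record,
                  arbitrate: Callable[[str, str, str], str]) -> Record:
    """Merge two independent transcriptions; only disagreements go to ARB."""
    final: Record = {}
    for field in a:
        b_value = b.get(field, "")
        if a[field] == b_value:
            # A and B agree (exact match is an assumption): accept the value.
            final[field] = a[field]
        else:
            # A and B disagree: the arbitrator decides the final value.
            final[field] = arbitrate(field, a[field], b_value)
    return final


def prefer_b(field: str, a_value: str, b_value: str) -> str:
    """Toy arbitrator that always sides with B, for demonstration only."""
    return b_value


result = a_b_arbitrate(
    {"given_name": "Jon", "surname": "Smith"},
    {"given_name": "John", "surname": "Smith"},
    prefer_b,
)
```

In this sketch the arbitrator sees only the `given_name` field, because the two indexers agree on `surname`.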

    1. IMPROVING INDEXING EFFICIENCY & QUALITY: COMPARING A-B-ARBITRATE AND PEER REVIEW. FAMILY HISTORY TECHNOLOGY WORKSHOP, FEBRUARY 3, 2012. DEREK HANSEN, JAKE GEHRING, PATRICK SCHONE, AND MATTHEW REID
    2. FAMILYSEARCH
    3. FAMILYSEARCH INDEXING
    4. A-B-ARBITRATE PROCESS (A-B-ARB) [diagram labels: A, B, ARB]
    5. THE PROBLEM [chart labels: Scanned Documents, Amount, Time]
    6. OUR APPROACH
        • Historical Data Analysis
        • Field Experiment comparing quality control models
    7. HISTORICAL DATA ANALYSIS
        • Quality (estimated based on A-B agreement)
            • Measures difficulty more than actual quality
            • Underestimates quality, since an experienced Arbitrator reviews all A-B disagreements
            • Good at capturing differences across people, fields, and projects
        • Time (calculated using keystroke-logging data)
            • Idle time is tracked separately, making actual time measurements more accurate
            • Outliers removed
    8. A-B AGREEMENT BY FIELD
    9. A-B AGREEMENT BY LANGUAGE (1871 Canadian Census)
        • English Language: Given Name 79.8%, Surname 66.4%
        • French Language: Given Name 62.7%, Surname 48.8%
    10. A-B AGREEMENT BY EXPERIENCE [chart: Birth Place, All U.S. Censuses; axes: A (novice ↔ expert), B (novice ↔ expert)]
    11. A-B AGREEMENT BY EXPERIENCE [chart: Given Name, All U.S. Censuses; axes: A (novice ↔ expert), B (novice ↔ expert)]
    12. A-B AGREEMENT BY EXPERIENCE [chart: Surname, All U.S. Censuses; axes: A (novice ↔ expert), B (novice ↔ expert)]
    13. A-B AGREEMENT BY EXPERIENCE [chart: Gender, All U.S. Censuses; axes: A (novice ↔ expert), B (novice ↔ expert)]
    14. A-B AGREEMENT BY EXPERIENCE [chart panels: U.S. - English, Canada - English, Mexico - Spanish, Canada - French]
    15. TIME & KEYSTROKE BY EXPERIENCE
    16. TIME & KEYSTROKE OF ARB
    17. A NEW APPROACH? (A-R-ARB)
        • Peer review model
        • Efficiency ++
        • Quality ?
    18. PEER REVIEW PROCESS (A-R-ARB) [diagram labels: A, R, ARB, Already Filled In, Optional?]
    19. FIELD EXPERIMENT
        • Develop Truth Set of 2,000 1930 Census images
        • Use historical A-B-ARB data
        • Create new A-R-ARB dataset by having new indexers review and arbitrate
        • Compare quality & efficiency
        • Qualitatively identify types of errors
    20. DISCUSSION
        IMPLICATIONS
        • Transition users from novice to expert
        • Recruit foreign language indexers
        • Intelligent matching based on expertise (in A-B-ARB &/or A-R-ARB)
        FUTURE POSSIBILITIES
        • Peer review by algorithms?
        • Initial indexing by algorithms?
    21. QUESTIONS
        • Derek Hansen (dlhansen@byu.edu)
        • Jake Gehring (GehringJG@familysearch.org)
        • Patrick Schone (BoiseBound@aol.com)
        • Matthew Reid (matthewreid007@gmail.com)
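
The slides use A-B agreement per field as a proxy for quality. One way such a rate might be computed is sketched below. The function name `agreement_by_field`, the record layout, the exact-match rule, and the sample records are all assumptions invented for illustration; they are not the presenters' data or method.

```python
from collections import defaultdict


def agreement_by_field(pairs):
    """Return {field: percent of record pairs where A and B match exactly}.

    `pairs` is an iterable of (record_a, record_b) dicts keyed by field name.
    Only fields present in both records are counted.
    """
    agree = defaultdict(int)
    total = defaultdict(int)
    for a, b in pairs:
        for field in a.keys() & b.keys():
            total[field] += 1
            if a[field] == b[field]:
                agree[field] += 1
    return {field: 100.0 * agree[field] / total[field] for field in total}


# Made-up sample: two record pairs, each keyed independently by A and B.
sample = [
    ({"given_name": "Marie", "surname": "Tremblay"},
     {"given_name": "Marie", "surname": "Tremblay"}),
    ({"given_name": "Jean", "surname": "Gagnon"},
     {"given_name": "Jean", "surname": "Gagnan"}),
]
rates = agreement_by_field(sample)
```

As the slides caution, a rate like this reflects how hard a field is to read as much as how accurate the final record is, because disagreements are still reviewed by an arbitrator.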
