This document summarizes a study of quality control mechanisms for crowdsourcing at FamilySearch Indexing. It finds that experienced workers are faster and more accurate than novices, and that peer review is nearly as effective as arbitration at maintaining quality while being more efficient. Some fields require contextual knowledge or language skills. Overall, peer review with expert routing shows promise as an effective quality control method.
1. QUALITY CONTROL MECHANISMS FOR CROWDSOURCING: PEER REVIEW, ARBITRATION, & EXPERTISE AT FAMILYSEARCH INDEXING
CSCW, San Antonio, TX
Feb 26, 2013
Derek Hansen, Patrick Schone, Douglas Corey, Matthew Reid, & Jake Gehring
5. FSI in Broader Landscape
• Crowdsourcing Project: aggregates discrete tasks completed by volunteers who replace professionals (Howe, 2006; Doan, et al., 2011)
• Human Computation System: humans use a computational system to work on a problem that may someday be solvable by computers (Quinn & Bederson, 2011)
• Lightweight Peer Production: largely anonymous contributors independently completing discrete, repetitive tasks provided by authorities (Haythornthwaite, 2009)
7. Quality Control Mechanisms
• 9 types of quality control for human computation systems (Quinn & Bederson, 2011)
• Redundancy (see the sketch after this list)
• Multi-level review
• Find-Fix-Verify pattern (Bernstein, et al., 2010)
• Weight proposed solutions by reputation of contributor (McCann, et al., 2003)
• Peer or expert oversight (Cosley, et al., 2005)
• Tournament selection approach (Sun, et al., 2011)
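To make the redundancy mechanism concrete, here is a minimal sketch (our illustration, not FSI's implementation) of aggregating redundant transcriptions of a single field by majority vote, routing disagreements to arbitration; the normalization choices are assumptions:

```python
from collections import Counter
from typing import Optional

def majority_vote(transcriptions: list[str]) -> Optional[str]:
    """Return the value a strict majority of transcribers agree on,
    or None to signal the field should be routed to an arbitrator."""
    # Normalize lightly so trivial differences don't block agreement
    counts = Counter(t.strip().lower() for t in transcriptions)
    value, votes = counts.most_common(1)[0]
    return value if votes > len(transcriptions) / 2 else None

print(majority_vote(["Smith", "smith", "Smyth"]))    # -> 'smith'
print(majority_vote(["Smith", "Smyth", "Schmidt"]))  # -> None (arbitrate)
```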
9. Peer review process (A-R-RARB)
[Diagram: A (already filled in) → R (proposed mechanism) → RARB (optional)]
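As a rough illustration of that flow (a sketch under our own assumptions about the data model, not the production system), A's values arrive already filled in, R reviews and may edit each field, and the optional RARB step arbitrates only where R changed A's value:

```python
from typing import Callable, Optional

def peer_review(a_values: dict[str, str],
                review: Callable[[str, str], str],
                arbitrate: Optional[Callable[[str, str, str], str]] = None) -> dict[str, str]:
    """A-R-RARB sketch: R edits A's pre-filled values; RARB (optional)
    settles only the fields where A and R disagree."""
    r_values = {f: review(f, v) for f, v in a_values.items()}
    if arbitrate is None:  # the RARB step can be skipped entirely
        return r_values
    return {f: arbitrate(f, a_values[f], r_values[f])
            if r_values[f] != a_values[f] else r_values[f]
            for f in a_values}
```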
10. Two Act Play
Act I: Experience. What is the role of experience on quality and efficiency? Historical data analysis using full US and Canadian Census records from 1920 and earlier.
Act II: Quality Control. Is peer review or arbitration better in terms of quality and efficiency? Field experiment using 2,000 images from the 1930 US Census and a corresponding truth set.
11. Act I: Experience
Quality is estimated based on A-B agreement (no truth set)
Efficiency is calculated using keystroke-logging data with idle time and outliers removed
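For instance, the A-B agreement proxy could be computed as below (a hypothetical sketch; the case and whitespace normalization are our assumptions, not the paper's exact matching rule):

```python
def ab_agreement(a_values: list[str], b_values: list[str]) -> float:
    """Fraction of records where independent transcribers A and B agree,
    used here as a proxy for quality when no truth set exists."""
    assert len(a_values) == len(b_values)
    matches = sum(a.strip().lower() == b.strip().lower()
                  for a, b in zip(a_values, b_values))
    return matches / len(a_values)

print(ab_agreement(["John", "Mary"], ["John", "Marie"]))  # -> 0.5
```

The per-field percentages on the following slides are this kind of ratio computed over entire census projects.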
13. A-B agreement by language (1871 Canadian Census)
English Language: Given Name 79.8%, Surname 66.4%
French Language: Given Name 62.7%, Surname 48.8%
14. A-B agreement by experience
Birth Place: All U.S. Censuses [chart: agreement by A and B experience levels, novice ↔ experienced]
15. A-B agreement by experience
Given Name: All U.S. Censuses [chart: agreement by A and B experience levels, novice ↔ experienced]
16. A-B agreement by experience
Surname: All U.S. Censuses [chart: agreement by A and B experience levels, novice ↔ experienced]
17. A-B agreement by experience
Gender: All U.S. Censuses [chart: agreement by A and B experience levels, novice ↔ experienced]
18. A-B agreement by experience
Birthplace: English-speaking Canadian Census [chart: agreement by A and B experience levels, novice ↔ experienced]
20. Summary & Implications of Act I
Experienced workers are faster and more
accurate, gains which continue even at high levels
- Focus on retention
- Encourage both novices & experts to do more
- Develop interventions to speed up experience
gains (e.g., send users common mistakes made
by people at their experience level)
21. Summary & Implications of Act I
Contextual knowledge (e.g., Canadian placenames)
and specialized skills (e.g., French language fluency)
is needed for some tasks
- Recruit people with existing knowledge & skills
- Provide contextual information when possible
(e.g., Canadian placename prompts)
- Don’t remove context (e.g., captcha)
- Allow users to specialize?
22. Act II: Quality Control
A-B-ARB data from original transcribers (Feb 2011)
A-R-RARB data includes original A data and newly collected R and RARB data from people new to this method (Jan-Feb 2012)
Truth set data from a company, with an independent audit by FSI experts
Statistical test: mixed-model logistic regression (accurate or not) with random effects, controlling for expertise
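The analysis might be set up roughly as follows; this is a sketch only, using statsmodels' Bayesian mixed GLM, and the column names, grouping variable, and file are our assumptions rather than the paper's actual code:

```python
import pandas as pd
from statsmodels.genmod.bayes_mixed_glm import BinomialBayesMixedGLM

# Hypothetical layout: one row per transcribed field, with accurate (0/1
# against the truth set), method (A-B-ARB vs. A-R), experience level,
# and a batch id grouping the 50 lines a transcriber indexed per image.
df = pd.read_csv("truth_set_results.csv")

model = BinomialBayesMixedGLM.from_formula(
    "accurate ~ C(method) + experience",  # fixed effects, controlling for expertise
    {"batch": "0 + C(batch)"},            # random effect for repeated measures per batch
    df)
result = model.fit_vb()                   # variational Bayes fit
print(result.summary())
```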
23. Limitations
• Experience levels of R and RARB were lower than expected, though we did statistically control for this
• Original B data used in A-B-ARB for certain fields was transcribed in a non-standard manner requiring adjustment
24. No Need for RARB
• No gains in quality from extra arbitration of peer-reviewed data (A-R = A-R-RARB)
• RARB takes time, so it is better to skip it
25. Quality Comparison
• Both methods were statistically better than A alone
• A-B-ARB had slightly lower error rates than A-R
• R "missed" more errors, but also introduced fewer errors
27. Summary & Implications of Act II
Peer Review shows considerable efficiency
gains with nearly as good quality as Arbitration
- Prime reviewers to find errors (e.g., prompt
them with expected # of errors on a page)
- Highlight potential problems (e.g., let A flag
tough fields)
- Route difficult pages to experts
- Consider an A-R1-R2 process when high quality
is critical
28. Summary & Implications of Act II
Reviewing reviewers isn’t always worth the time
- At least in some contexts, Find-Fix may not
need Verify
Quality of different fields varies dramatically
- Use different quality control mechanisms for
harder or easier fields
Integrate human and algorithmic transcription
- Use algorithms on easy fields & integrate into
review process so machine learning can occur
29. Questions
• Derek Hansen (dlhansen@byu.edu)
• Patrick Schone (BoiseBound@aol.com)
• Douglas Corey (corey@mathed.byu.edu)
• Matthew Reid (matthewreid007@gmail.com)
• Jake Gehring (GehringJG@familysearch.org)
Editor's Notes
The goal of FamilySearch.org is to help people find their ancestors. It is a freely available resource that compiles information from databases from around the world. The Church of Jesus Christ of Latter-Day Saints sponsors it, but it can be used by anyone for free.
FamilySearch Indexing’s role is to transcribe text from scanned images so it is in a machine-readable format that can be searched. This is done by hundreds of thousands of indexers, making it the world’s largest document transcription service. Documents include census records, vital records (e.g., birth, death, marriage, burial), church records (e.g., christening), military records, legal records, cemetery records, and migration records from countries around the globe.
As you can see, transcribing names from hand-written documents is not a trivial task, though a wide range of people are capable of learning to do it and no specialized equipment is needed. Nearly 400,000 contributors have transcribed records, with over 500 new volunteers signing up each day in the recent past. The challenges of transcription work make quality control mechanisms essential to the success of the project, and also underscore the importance of understanding expertise and how it develops over time.
Documents are being scanned at an increasing rate. If we are to benefit from these new resources, we'll need to keep pace with the indexing efforts. Thus, the goals of FSI are to (a) index as many documents as possible, while (b) assuring a certain level of quality.
And there are others for more complex tasks that require coordination, such as those occurring on Wikipedia (e.g., Kittur & Kraut, 2008). Note that some of these are not mutually exclusive. Many have only been tested in research prototypes, not at scale, and others were not designed with efficiency in mind.
The current quality control mechanism is called A-B-Arbitrate (or just A-B-ARB or "arbitration" for short). In this process, person A and person B index the document independently, and an experienced arbitrator (ARB) reviews any discrepancies between the two.
This is a proposed model that has not been tested until this study. The model could include arbitration (ARB), or that step could be skipped if A-R results in high enough quality on its own (see findings).
In Act I, quality is measured as agreement between independent coders. This is not true quality, but it is highly correlated with it. In Act II, quality is measured against a truth set created by a company that assured 99.9% accuracy and was independently audited by expert FSI transcribers. Efficiency is measured in terms of "active" time spent indexing (after "idle time" was removed) and keystrokes as captured by the indexing program.
Quality (estimated based on A-B agreement): measures difficulty more than actual quality; underestimates quality, since an experienced arbitrator reviews all A-B disagreements; good at capturing differences across people, fields, and projects. Time (calculated using keystroke-logging data): idle time is tracked separately, making actual time measurements more accurate; outliers removed.
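A minimal sketch of the "active time" calculation described here, assuming timestamped keystrokes and an idle cutoff (the 60-second threshold is our placeholder, not FSI's actual setting):

```python
IDLE_THRESHOLD = 60.0  # seconds; assumed cutoff, not FSI's actual value

def active_time(keystroke_times: list[float]) -> float:
    """Sum the gaps between consecutive keystrokes, excluding any gap
    long enough to count as idle time."""
    gaps = (b - a for a, b in zip(keystroke_times, keystroke_times[1:]))
    return sum(g for g in gaps if g < IDLE_THRESHOLD)

# A 300 s pause mid-batch is treated as idle and excluded:
print(active_time([0.0, 1.2, 2.0, 302.0, 303.1]))  # -> ~3.1
```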
Notice the high variation in agreement depending on how many possible values a field has (e.g., gender has only a couple of options, while surname has many).
This finding is likely due to the fact that most transcribers are English speaking, which suggests the need to recruit contributors who are native speakers of other languages.
Experience is based on EL(U) = round(log5(N(U))), where U represents the transcriber, N(U) is the number of images that U has transcribed, and EL(U) is the experience level of U. Rank thresholds (number of images transcribed): rank 0 = 1; rank 1 = 5; rank 2 = 25; rank 3 = 125; rank 4 = 625; rank 5 = 3,125; rank 6 = 15,625; rank 7 = 78,125; rank 8 = 390,625.
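Translated directly into code (a minimal sketch of the formula as stated):

```python
import math

def experience_level(n_images: int) -> int:
    """EL(U) = round(log5(N(U))) for a transcriber with n_images >= 1."""
    return round(math.log(n_images, 5))

# Reproduces the rank table: 1 -> 0, 5 -> 1, 25 -> 2, ..., 390625 -> 8
for rank, n in enumerate((1, 5, 25, 125, 625, 3125, 15625, 78125, 390625)):
    assert experience_level(n) == rank
```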
There isn’t much improvement, since it’s an “easy” field to agree on. In other words, even novices are good.
Here there isn't much improvement, but the overall agreement is low. This suggests that even experts are not good, likely because of unfamiliarity with Canadian placenames given the predominantly US indexing population. Remember that expertise is based on all contributions, not just those in this category.
More experienced transcribers are much faster (up to 4 times faster) than inexperienced users. They also use fewer keystrokes (e.g., using help functionality; fixing mistakes). Though not shown here, the paper shows that experienced indexers' work also requires less time to arbitrate and fewer keystrokes. Furthermore, the English-speaking 1871 Canadian Census was transcribed 2.68 seconds faster per line than the French version, even though the French version required more keystrokes. Again, this is likely due to the fact that most transcribers are native English speakers.
2,000 random images including many fields (e.g., surname, county of origin, gender, age) for each of the 50 lines of data (which include a single row for each individual). Note that this is repeated-measures data, since the same transcriber transcribes all 50 rows of an image in a "batch" and some people transcribe more than one page. We use a mixed model to account for this. Because people performing R were new to this method and the system was not tuned to the needs of reviewers, the A-R-RARB data should be considered a baseline, i.e., a lower bound on how well A-R-RARB can do.
A new approach based on peer review instead of independent indexing would likely improve efficiency, but its effect on quality is unknown. Anecdotal evidence suggests that peer reviewing may be twice as fast as indexing from scratch.
This is likely due to the fact that most R edits fix problems; they rarely introduce new ones. However, RARB arbitrators don't know who A or R is, and they erroneously agree with A too often, which is why there is no gain from RARB, and in fact some small losses in quality due to RARB.
There are clear gains in time for the A-R model, because reviewing takes about half as much time as transcribing from scratch.
Remember, in our study Peer Review was a new method for those performing it and the system hadn’t been customized to support it well, so it may do as well as A-B-ARB with some minor improvements and training.