The Rich Picture: A Tool For Reasoning About Work Context (guestc990b6)
This document discusses rich pictures, which are cartoon-like representations that identify stakeholders, their concerns, and the structure underlying a work context. Rich pictures originated in Soft Systems Methodology as a tool for reasoning about multiple viewpoints in a work situation. They typically depict the key stakeholders, their relationships and concerns through diagrams and thought bubbles. The document explains how rich pictures can be used in participatory design and lightweight usability engineering to capture a work context from stakeholders' perspectives and identify tensions between different stakeholders. It provides examples of rich pictures and guidelines for making them effective representations.
The Visual Representation of Complexity: Sixteen Key Characteristics of Compl... (RSD7 Symposium)
The document describes research into visually representing key characteristics of complex systems. It involved collecting surveys at a conference to gather ideas from experts in complexity science, systems mapping, and design. Sixteen key characteristics of complex systems were identified: feedback, emergence, self-organization, levers/hubs, non-linearity, domains of stability, adaptation, path dependency, tipping points, change over time, open systems, unpredictability, unknowns, distributed control, and multiple scales/levels. Workshops were held to develop images, definitions, examples, and learning points for each characteristic. The goal is to develop a visual language of complexity that can aid communication and decision-making across different fields.
- The document proposes a heuristic to determine whether a system dynamics project should use a qualitative or quantitative approach.
- The heuristic involves asking either a "QMAP question" or "QMOD question" to classify the problem as qualitative or quantitative. QMAP is for problems focusing on causal loop structures, while QMOD is for complex dynamic problems.
- By guiding the problem articulation with these questions, the heuristic aims to avoid debates over whether to map or model and instead matches the right methodology to the problem type. Case studies demonstrate applying the QMAP and QMOD questions.
The document discusses how crowdsourcing can be used to gather natural language processing ground truth data by capturing and exploiting disagreement among annotators on events, participants, temporal and spatial aspects. It proposes an approach that tolerates, captures and understands disagreement in order to score machine outputs based on the space of annotation possibilities. Examples are provided that show disagreement among annotators for event participants, temporal aspects, and spatial aspects of events.
3-step Parallel Corpus Cleaning using Monolingual Crowd Workers (Toshiaki Nakazawa)
A high-quality parallel corpus must be created manually to achieve good machine translation in domains that lack sufficient existing resources. Although corpus quality can be improved to some extent by hiring professional translators, mistakes cannot be avoided entirely. In this paper, we propose a framework for cleaning an existing professionally translated parallel corpus quickly and cheaply. The proposed method uses a 3-step crowdsourcing procedure to efficiently detect and edit translation flaws while guaranteeing the reliability of the edits. Experiments on a fashion-domain e-commerce-site (EC-site) parallel corpus show the effectiveness of the proposed method for parallel corpus cleaning.
Crowdsourcing: From Aggregation to Search Engine Evaluation (Matthew Lease)
This document provides an overview of statistical crowdsourcing and its applications. It discusses crowdsourcing platforms like Amazon Mechanical Turk and how they have enabled large-scale data labeling for tasks in areas like natural language processing. It also summarizes research on using crowdsourcing to evaluate search engines and benchmarks different statistical consensus methods for aggregating judgments from crowds. Finally, it presents work on using psychometrics and crowdsourcing to model multidimensional relevance through structured surveys and factor analysis.
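The simplest of the statistical consensus methods benchmarked in work like this is majority voting over redundant crowd judgments. A minimal Python sketch (the item names, labels, and `majority_vote` helper are illustrative, not from the original slides):

```python
from collections import Counter

def majority_vote(labels_per_item):
    """Aggregate redundant crowd judgments per item by majority vote,
    the simplest statistical consensus method (ties broken by first seen)."""
    return {item: Counter(votes).most_common(1)[0][0]
            for item, votes in labels_per_item.items()}

# Three workers judged each document's relevance to a query.
judgments = {
    "doc1": ["relevant", "relevant", "not_relevant"],
    "doc2": ["not_relevant", "not_relevant", "relevant"],
}
print(majority_vote(judgments))
# {'doc1': 'relevant', 'doc2': 'not_relevant'}
```

More sophisticated consensus methods weight workers by estimated reliability rather than counting each vote equally.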
Mix and Match: Collaborative Expert-Crowd Judging for Building Test Collectio... (Matthew Lease)
Presentation at the 1st Biannual Conference on Design of Experimental Search & Information Retrieval Systems (DESIRES 2018). August 30, 2018. Paper: https://www.ischool.utexas.edu/~ml/papers/kutlu-desires18.pdf
A Query Routing Model to Rank Expert Candidates on Twitter (Jonathas Magalhães)
The document proposes a query routing model to rank expert candidates on Twitter to answer questions. It evaluates knowledge, trust, and activity criteria to determine the best person to direct a question to. An evaluation of the model on 160 questions showed it achieved over 90% accuracy in predicting the ideal expert ranking, outperforming using individual criteria. This demonstrates the model is effective at query routing on Twitter to connect questions with suitable answers.
On Quality Control and Machine Learning in Crowdsourcing (Matthew Lease)
Talk at "Wisdom of the Crowd" AAAI 2012 Spring Symposium workshop (http://users.wpi.edu/~soniac/WisdomOfTheCrowd/WoCSchedule.htm) on 2011 AAAI-HComp paper by the same title.
The document discusses machine learning and data science concepts. It begins with an introduction to machine learning and the machine learning process. It then provides an overview of select machine learning algorithms and concepts like bias/variance, generalization, underfitting and overfitting. It also discusses ensemble methods. The document then shifts to discussing time series, functions for manipulating time series, and laying the foundation for time series prediction and forecasting. It provides examples of applying techniques like median filtering to smooth time series data. Overall, the document provides a high-level introduction and overview of key machine learning and time series concepts.
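Median filtering, mentioned above as a smoothing technique, replaces each point with the median of a window around it, which removes isolated spikes without blurring edges the way a moving average does. A minimal sketch (the `median_filter` helper, window size, and data are illustrative):

```python
from statistics import median

def median_filter(series, window=3):
    """Smooth a time series by replacing each point with the median
    of a centered window (edges use a truncated window)."""
    half = window // 2
    return [
        median(series[max(0, i - half): i + half + 1])
        for i in range(len(series))
    ]

# A noisy series with one outlier spike at index 2.
noisy = [1.0, 1.1, 9.0, 1.2, 1.0]
smoothed = median_filter(noisy)
print(smoothed)  # the 9.0 spike is replaced by a neighboring value
```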
The document summarizes an Analytics Vidhya meetup event. It discusses that the meetups will occur once a month, with the next one on May 24th. It aims to provide networking and learning around data science, big data, machine learning and IoT. It introduces the volunteer organizers and outlines the agenda, which includes an introduction, discussing the model building lifecycle, data exploration techniques, and modeling techniques like logistic regression, decision trees, random forests, and SVMs. It provides details on practicing these techniques by predicting survival on the Titanic dataset.
Statistics in the age of data science, issues you can not ignore (Turi, Inc.)
This document discusses issues in statistics that data scientists can and cannot ignore when working with large datasets. It begins by outlining the talk and defining key terms in data science. It then explains that model assessment, such as estimating model performance on new data, becomes easier with more data as statistical adjustments are not needed. However, more data and variables are not always better, as noise, collinearity, and overfitting can still occur. Several examples are given where common machine learning algorithms can be fooled into achieving high accuracy on training data even when the target variable is random. The conclusion emphasizes that data science, statistics, and domain expertise each provide unique perspectives, and effective teams need to understand all views.
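The effect described above, where a flexible model reaches high training accuracy even though the target is pure noise, is easy to reproduce. A minimal sketch using a 1-nearest-neighbor classifier built from the standard library (all names and data are illustrative):

```python
import random

def nearest_neighbor_predict(train_x, train_y, x):
    """1-NN: predict the label of the closest training point."""
    idx = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[idx]

random.seed(0)
X = [random.random() for _ in range(50)]
y = [random.choice([0, 1]) for _ in range(50)]  # the target is random noise

# Training accuracy is perfect: each point's nearest neighbor is itself,
# so the model simply memorizes the noise.
train_acc = sum(nearest_neighbor_predict(X, y, x) == t
                for x, t in zip(X, y)) / len(X)

# On fresh random data, accuracy falls back to roughly chance (~0.5).
X_new = [random.random() for _ in range(50)]
y_new = [random.choice([0, 1]) for _ in range(50)]
test_acc = sum(nearest_neighbor_predict(X, y, x) == t
               for x, t in zip(X_new, y_new)) / len(X_new)
print(train_acc, test_acc)
```

This is exactly why training accuracy alone is never evidence of a real signal; held-out evaluation is what exposes the overfit.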
The slides cover the following points:
1. Introduction to Machine Learning
2. What are the challenges in acceptance of Machine Learning in Banks
3. How to overcome the challenges in adoption of Machine Learning in Banks
4. How to find new use cases of Machine Learning
5. Few current interesting use cases of Machine Learning
Please contact me (shekup@gmail.com) or connect with me on LinkedIn (https://www.linkedin.com/in/shekup/) for more explanation on ML and how it may help your business.
The slides are inspired by:
Surveys and interviews I conducted with bankers and technology professionals
Presentation from Google NEXT 2017
Presentation by DATUM on Youtube
Royal Society Machine Learning
Big Data & Social Analytics Course from MIT & GetSmarter
Beyond Mechanical Turk: An Analysis of Paid Crowd Work Platforms (Matthew Lease)
The document summarizes a presentation analyzing paid crowd work platforms beyond Mechanical Turk. It discusses how Mechanical Turk has dominated research on paid crowdsourcing due to its early popularity, despite its limitations. The presentation reports a qualitative study of 7 alternative crowd work platforms that identifies distinguishing capabilities not found on MTurk, such as different payment models, richer worker profiles, and support for confidential tasks. It aims to increase awareness of other platforms to further inform crowdsourcing practice and research.
In this presentation I review various data science techniques and discuss their usefulness to pricing actuaries working in general insurance.
This presentation was originally given at the TIGI webinar in 2020.
https://www.actuaries.org.uk/learn-develop/attend-event/tigi-2020-technical-issues-general-insurance
Classification is a data analysis technique used to predict class membership for new observations based on a training set of previously labeled examples. It involves building a classification model during a training phase using an algorithm, then testing the model on new data to estimate accuracy. Some common classification algorithms include decision trees, Bayesian networks, neural networks, and support vector machines. Classification has applications in domains like medicine, retail, and entertainment.
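The train-then-test workflow described above can be illustrated with a deliberately simple model, a one-feature decision stump that learns a single threshold (the `train_stump` helper and data are illustrative, not from the original document):

```python
def train_stump(X, y):
    """Training phase: pick the threshold on a single feature that
    maximizes accuracy on the labeled training set."""
    best = None
    for t in X:  # candidate thresholds taken from the data itself
        acc = sum((x >= t) == bool(label) for x, label in zip(X, y)) / len(X)
        if best is None or acc > best[1]:
            best = (t, acc)
    return best[0]

# Training phase: learn the model from previously labeled examples.
X_train = [1.0, 2.0, 3.0, 8.0, 9.0, 10.0]
y_train = [0, 0, 0, 1, 1, 1]
threshold = train_stump(X_train, y_train)

# Testing phase: estimate accuracy on new, held-out observations.
X_test, y_test = [2.5, 8.5], [0, 1]
test_acc = sum((x >= threshold) == bool(t)
               for x, t in zip(X_test, y_test)) / len(X_test)
print(threshold, test_acc)
```

The decision trees, Bayesian networks, and other algorithms named above follow the same two-phase pattern; they differ only in the form of model built during training.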
This document summarizes a talk on practical machine learning issues. It discusses identifying the right machine learning scenario for a given task, such as classification, regression, clustering, or reinforcement learning. It also addresses common reasons why machine learning models may fail, such as using the wrong evaluation metrics, not having enough labeled training data, or not performing proper feature engineering. The document emphasizes the importance of choosing the appropriate machine learning model, having sufficient high-quality data, and selecting useful features.
Fairness in Search & RecSys, Naver Search Colloquium (Jin Young Kim)
As search and recommender systems take on a growing social role, the fairness of their results has recently become a topic of concern. This talk covers fairness issues in search and recommender systems and their remedies: various ways to define fair search and recommendation results, the resource-allocation and stereotyping harms caused by a lack of fairness, and the solutions available at each stage of search and recommender system development, with a focus on recent research. It closes with practical considerations for building fair systems in practice.
Machine Learning On Big Data: Opportunities And Challenges- Future Research D... (PhD Assistance)
Machine Learning (ML) is increasingly used in a wide variety of applications. It has risen to prominence in recent years, owing in part to the emergence of big data. When it comes to big data, ML algorithms have never been more promising. Big data allows machine learning algorithms to discover finer-grained patterns and make more timely and precise predictions than ever before; however, it also poses significant challenges to machine learning, such as model scalability and distributed computing.
Learn More: https://bit.ly/2RB1buD
Contact Us:
Website: https://www.phdassistance.com/
UK No: +44-1143520021
India No: +91-4448137070
WhatsApp No: +91 91769 66446
Email: info@phdassistance.com
Machine Learning for automated diagnosis of distributed ... (AEbutest)
The document discusses challenges in using machine learning for automated diagnosis of performance issues in distributed systems. It describes 4 key challenges: 1) transforming large amounts of metrics data into useful information, 2) adapting models to changing systems, 3) leveraging historical diagnosis to retrieve similar issues, and 4) combining metrics data with unstructured log data from multiple sources. The author proposes approaches for each challenge including Bayesian network classifiers, adaptive ensembles of models, defining issue signatures, and information extraction from logs.
This document provides an agenda for a meetup on data science topics. The meetup will be held once a month, with the next one on June 14th. It aims to provide the best networking and learning platform in Bangalore for areas like data science, big data, machine learning. The agenda includes introductions, an overview of the model building lifecycle, data exploration and feature engineering techniques, and modeling techniques like logistic regression, decision trees, random forests, and SVM. Teams will be formed to predict whether bids are from humans or robots using these techniques. Resources for implementing the techniques in Python and R are also provided.
Recently, in the fields of Business Intelligence and Data Management, everybody is talking about data science, machine learning, predictive analytics, and many other "clever" terms that promise to turn your data into gold. In these slides, we present the big picture of data science and machine learning. First, we define the context for data mining from a BI perspective and try to clarify the various buzzwords in this field. Then we give an overview of the machine learning paradigms. After that, we discuss, at a high level, the various data mining tasks, techniques, and applications. Next, we take a quick tour through the Knowledge Discovery Process. Screenshots from demos are shown, and we conclude with some takeaway points.
Automated Models for Quantifying Centrality of Survey Responses (Matthew Lease)
Research talk presented at "Innovations in Online Research" (October 1, 2021)
Event URL: https://web.cvent.com/event/d063e447-1f16-4f70-a375-5d6978b3feea/websitePage:b8d4ce12-3d02-4d24-897d-fd469ca4808a.
Similar to Crowdsourcing for Information Retrieval: From Statistics to Ethics (20)
Explainable Fact Checking with Humans in-the-loop (Matthew Lease)
Invited Keynote at KDD 2021 TrueFact Workshop: Making a Credible Web for Tomorrow, August 15, 2021.
https://www.microsoft.com/en-us/research/event/kdd-2021-truefact-workshop-making-a-credible-web-for-tomorrow/#!program-schedule
Talk given at Delft University speaker series on "Crowd Computing & Human-Centered AI" (https://www.academicfringe.org/). November 23, 2020. Covers two 2020 works:
(1) Anubrata Das, Brandon Dang, and Matthew Lease. Fast, Accurate, and Healthier: Interactive Blurring Helps Moderators Reduce Exposure to Harmful Content. In Proceedings of the 8th AAAI Conference on Human Computation and Crowdsourcing (HCOMP), 2020.
(2) Alexander Braylan and Matthew Lease. Modeling and Aggregation of Complex Annotations via Annotation Distances. In Proceedings of the Web Conference, pages 1807-1818, 2020.
AI & Work, with Transparency & the Crowd (Matthew Lease)
The document discusses designing human-AI partnerships and the role of crowdsourcing in AI systems. It summarizes work on designing AI assistants to work with humans, using crowds to help fact-check information, and explores challenges around protecting crowd workers who review harmful content or do "dirty jobs". It advocates for more research on ethics in AI and using crowds to help check work for ethical issues.
Designing Human-AI Partnerships to Combat Misinformation (Matthew Lease)
The document discusses designing human-AI partnerships to combat misinformation. It describes a prototype partnership where a human and AI work together to fact-check claims. The partnership aims to make the AI more transparent and address user bias by allowing the user to adjust the perceived reliability of news sources, which then changes the AI's political leaning analysis and fact checking results. The discussion wraps up by noting challenges like avoiding echo chambers and assessing potential harms, as well as opportunities for AI to reduce bias and increase trust through explainable, interactive systems.
Designing at the Intersection of HCI & AI: Misinformation & Crowdsourced Anno... (Matthew Lease)
This document summarizes a presentation about designing human-AI partnerships for fact-checking misinformation. It discusses using crowdsourced rationales to improve the accuracy and cost-efficiency of annotation tasks. It also addresses challenges in designing interfaces for automatic fact-checking models, such as integrating human knowledge and reasoning to correct errors and account for bias. The goal is to develop mixed-initiative systems where humans and AI can jointly reason and personalize fact-checking.
Presentation given at the Linguistic Data Consortium (LDC), University of Pennsylvania, April 2019. Based on presentations at the 6th ACM Collective Intelligence Conference, 2018 and the 6th AAAI Conference on Human Computation & Crowdsourcing (HCOMP), 2018. Blog post: https://blog.humancomputation.com/?p=9932.
Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact... (Matthew Lease)
Presented at the 31st ACM User Interface Software and Technology Symposium (UIST), 2018. Paper: https://www.ischool.utexas.edu/~ml/papers/nguyen-uist18.pdf
Talk given August 29, 2018 at the 1st Biannual Conference on Design of Experimental Search & Information Retrieval Systems (DESIRES 2018). Paper: https://www.ischool.utexas.edu/~ml/papers/lease-desires18.pdf
Your Behavior Signals Your Reliability: Modeling Crowd Behavioral Traces to E... (Matthew Lease)
Presentation at the 6th AAAI Conference on Human Computation and Crowdsourcing (HCOMP), July 7, 2018. Work by Tanya Goyal, Tyler McDonnell, Mucahid Kutlu, Tamer Elsayed, and Matthew Lease. Pages 41-49 in conference proceedings. Online version of paper includes corrections to official version in proceedings: https://www.ischool.utexas.edu/~ml/papers/goyal-hcomp18
What Can Machine Learning & Crowdsourcing Do for You? Exploring New Tools for... (Matthew Lease)
Invited talk at the ACM JCDL 2018 Workshop on Cyberinfrastructure and Machine Learning for Digital Libraries and Archives. https://www.tacc.utexas.edu/conference/jcdl18
Deep Learning for Information Retrieval: Models, Progress, & OpportunitiesMatthew Lease
Talk given at the 8th Forum for Information Retrieval Evaluation (FIRE, http://fire.irsi.res.in/fire/2016/), December 10, 2016, and at the Qatar Computing Research Institute (QCRI), December 15, 2016.
Systematic Review is e-Discovery in Doctor’s ClothingMatthew Lease
This document discusses opportunities for collaboration between researchers working in systematic reviews and electronic discovery (e-discovery). It notes similarities in the challenges both fields face, including the need for high recall with bounded costs and reliance on multi-stage review pipelines. The document proposes that technologies developed for semi-automated citation screening and crowdsourcing could help address current limitations. It concludes by encouraging information retrieval researchers to investigate open problems in systematic reviews as opportunities to advance technologies beyond other tasks and help bring together interested parties through forums like the TREC Total Recall track.
Crowd computing utilizes both crowdsourcing and human computation to solve problems. Crowdsourcing enables more efficient and scalable data collection and processing by outsourcing tasks to a large, undefined group of people. Human computation allows software developers to incorporate human intelligence and judgment into applications to provide capabilities beyond current artificial intelligence. Examples discussed include Amazon Mechanical Turk, various crowd-powered applications, and how crowdsourcing has helped label large datasets to train machine learning models.
The Rise of Crowd Computing (December 2015)Matthew Lease
Crowd computing is rising with two waves - the first using crowds to label large amounts of data for artificial intelligence applications. The second wave delivers applications that go beyond AI abilities by incorporating human computation. Open problems remain around ensuring high quality outputs, task design, understanding the worker context and experience, and addressing ethics concerns around opaque platforms and working conditions. The future holds potential for empowering crowd work but also risks like digital sweatshops if worker freedoms and conditions are not considered.
Toward Effective and Sustainable Online Crowd WorkMatthew Lease
New forms of online crowd work enabled by technology present both opportunities for innovation and risks of harm that require careful consideration. This document discusses three main issues. First, some crowd work tasks may enable illegal or unethical goals. Second, the lack of regulation means crowd work practices sometimes exploit vulnerable workers by not ensuring informed consent. Third, multi-stakeholder discussions are needed to develop win-win solutions that balance costs, quality, and what is fair for all parties in a global context. The goal is to learn from each other and find ways to encourage ethical practices.
Talk at AAAI Human Computation 2013 Workshop on Scaling Speech, Language Understanding and Dialogue through Crowdsourcing (November 9, 2013): http://faculty.washington.edu/mtjalve/HCOMP2013.Workshop.html
Taking AI to the Next Level in Manufacturing.pdfssuserfac0301
Read Taking AI to the Next Level in Manufacturing to gain insights on AI adoption in the manufacturing industry, such as:
1. How quickly AI is being implemented in manufacturing.
2. Which barriers stand in the way of AI adoption.
3. How data quality and governance form the backbone of AI.
4. Organizational processes and structures that may inhibit effective AI adoption.
6. Ideas and approaches to help build your organization's AI strategy.
Trusted Execution Environment for Decentralized Process MiningLucaBarbaro3
Presentation of the paper "Trusted Execution Environment for Decentralized Process Mining" given during the CAiSE 2024 Conference in Cyprus on June 7, 2024.
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...Tatiana Kojar
Skybuffer AI, built on the robust SAP Business Technology Platform (SAP BTP), is the latest and most advanced version of our AI development, reaffirming our commitment to delivering top-tier AI solutions. Skybuffer AI harnesses all the innovative capabilities of the SAP BTP in the AI domain, from Conversational AI to cutting-edge Generative AI and Retrieval-Augmented Generation (RAG). It also helps SAP customers safeguard their investments into SAP Conversational AI and ensure a seamless, one-click transition to SAP Business AI.
With Skybuffer AI, various AI models can be integrated into a single communication channel such as Microsoft Teams. This integration empowers business users with insights drawn from SAP backend systems, enterprise documents, and the expansive knowledge of Generative AI. And the best part of it is that it is all managed through our intuitive no-code Action Server interface, requiring no extensive coding knowledge and making the advanced AI accessible to more users.
HCL Notes and Domino License Cost Reduction in the World of DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/
The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Don’t worry, we can help with all of this!
We’ll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. We’ll provide examples and solutions for those as well. And naturally we’ll explain the new licensing model.
Join HCL Ambassador Marc Thomas in this webinar with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing. You will be able lower your cost through an optimized configuration and keep it low going forward.
These topics will be covered
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, like team mailboxes, functional/test users, etc
- Practical examples and best practices to implement right away
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slackshyamraj55
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxSitimaJohn
Ocean Lotus cyber threat actors represent a sophisticated, persistent, and politically motivated group that poses a significant risk to organizations and individuals in the Southeast Asian region. Their continuous evolution and adaptability underscore the need for robust cybersecurity measures and international cooperation to identify and mitigate the threats posed by such advanced persistent threat groups.
Generating privacy-protected synthetic data using Secludy and MilvusZilliz
During this demo, the founders of Secludy will demonstrate how their system utilizes Milvus to store and manipulate embeddings for generating privacy-protected synthetic data. Their approach not only maintains the confidentiality of the original data but also enhances the utility and scalability of LLMs under privacy constraints. Attendees, including machine learning engineers, data scientists, and data managers, will witness first-hand how Secludy's integration with Milvus empowers organizations to harness the power of LLMs securely and efficiently.
This presentation provides valuable insights into effective cost-saving techniques on AWS. Learn how to optimize your AWS resources by rightsizing, increasing elasticity, picking the right storage class, and choosing the best pricing model. Additionally, discover essential governance mechanisms to ensure continuous cost efficiency. Whether you are new to AWS or an experienced user, this presentation provides clear and practical tips to help you reduce your cloud costs and get the most out of your budget.
Skybuffer SAM4U tool for SAP license adoptionTatiana Kojar
Manage and optimize your license adoption and consumption with SAM4U, an SAP free customer software asset management tool.
SAM4U, an SAP complimentary software asset management tool for customers, delivers a detailed and well-structured overview of license inventory and usage with a user-friendly interface. We offer a hosted, cost-effective, and performance-optimized SAM4U setup in the Skybuffer Cloud environment. You retain ownership of the system and data, while we manage the ABAP 7.58 infrastructure, ensuring fixed Total Cost of Ownership (TCO) and exceptional services through the SAP Fiori interface.
Your One-Stop Shop for Python Success: Top 10 US Python Development Providersakankshawande
Simplify your search for a reliable Python development partner! This list presents the top 10 trusted US providers offering comprehensive Python development services, ensuring your project's success from conception to completion.
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Crowdsourcing for Information Retrieval: From Statistics to Ethics
1. Crowdsourcing for Information Retrieval:
From Statistics to Ethics
Matt Lease
School of Information
University of Texas at Austin
@mattlease
ml@utexas.edu
2. Roadmap
• Scalability Challenges in IR Evaluation (brief)
• Benchmarking Statistical Consensus Methods
• Task Routing via Matrix Factorization
• Toward Ethical Crowdsourcing
Matt Lease <ml@utexas.edu>
2
3. Roadmap
• Scalability Challenges in IR Evaluation (brief)
• Benchmarking Statistical Consensus Methods
• Task Routing via Matrix Factorization
• Toward Ethical Crowdsourcing
4. Why Evaluation at Scale?
• Evaluation should closely
mirror real use conditions
• The best algorithm at
small scale may not be
best at larger scales
– Banko and Brill (2001)
– Halevy et al. (2009)
• IR systems should be evaluated on the scale of
data which users will search in practice
5. Why is Evaluation at Scale Hard?
• Multiple ways to evaluate; consider Cranfield
– Given a document collection and set of user queries
– Label documents for relevance to each query
– Evaluate search algorithms on these queries & documents
• Labeling data is slow/expensive/difficult
• Approach 1: label less data (e.g. active learning)
– Pooling, metrics robust to sparse data (e.g., BPref)
– Measure only relative performance (e.g., statAP, MTC)
• Approach 2: label data more efficiently
– Crowdsourcing (e.g., Amazon’s Mechanical Turk)
7. Crowdsourcing for IR Evaluation
• Origin: Alonso et al. (SIGIR Forum 2008)
– Continuing active area of research
• Primary concern: ensuring reliable data
– Reliable data provides foundation for evaluation
– If QA inefficient, overhead could reduce any savings
– Common strategy: ask multiple people to judge
relevance, then aggregate their answers (consensus)
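The common strategy above (ask multiple workers to judge relevance, then aggregate) can be sketched with the simplest aggregator, majority voting. This is an illustrative snippet, not code from the talk; the function name `majority_vote` and its tie-breaking rule are my own choices:

```python
from collections import Counter

def majority_vote(judgments):
    """Aggregate per-document relevance judgments by majority vote.

    judgments: dict mapping (query, doc) -> list of worker labels.
    Returns a dict mapping (query, doc) -> consensus label,
    breaking ties toward the smallest label for determinism.
    """
    consensus = {}
    for key, labels in judgments.items():
        counts = Counter(labels)
        # Iterate labels in sorted order so max() resolves ties deterministically.
        consensus[key] = max(sorted(counts), key=lambda lab: counts[lab])
    return consensus
```

More sophisticated consensus methods (next section) replace this uniform vote with per-worker reliability estimates.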
8. Roadmap
• Scalability Challenges in Evaluating IR Systems
• Benchmarking Statistical Consensus Methods
• Task Routing via Matrix Factorization
• Toward Ethical Crowdsourcing
9. SQUARE: A Benchmark for Research
on Computing Crowd Consensus
Aashish Sheshadri and M. Lease, HCOMP’13
ir.ischool.utexas.edu/square (open source)
10. Background
• How do we resolve disagreement among multiple
people’s answers to arrive at consensus?
• Simple baseline: majority voting
• Long history pre-dating crowdsourcing
– Dawid and Skene’79, Smyth et al., ’95
– Recent focus on quality assurance with crowds
• Many more methods, active research topic
– Across many areas: ML, Vision, NLP, IR, DB, …
11. Why Benchmark?
• Drive field innovation by clear challenge tasks
– e.g., David Tse’s FIST 2012 Keynote (Comp. Biology)
• Many other things we can learn
– How do methods compare?
• Qualitatively & quantitatively?
– What is the state-of-the-art today?
– What works, what doesn’t, and why?
• Where is further research most needed?
– How has field progressed over time?
12. Comparing Consensus Methods (Method = Model + Training + Inference)
• MV – majority voting
– Pros: simple, fast, no training; task-independent
– Cons: most limited model; cannot be supervised; no confusion matrix
• ZC (Demartini’12) – worker reliability parameters
– Pros: task-independent; can be supervised; allows priors on worker reliability & class distribution
– Cons: no confusion matrix; classification only; space prop. to num classes
• GLAD (Whitehill et al.’09) – worker reliability & task difficulty parameters
– Pros: task-independent; can be supervised; prior on class distribution
– Cons: no confusion matrix; no worker priors; classification only; space prop. to num classes
• Naïve Bayes (NB) (Snow et al.’08) – the D&S model, fully supervised
– Pros: supports multi-class tasks; models worker confusion; simple maximum-likelihood
– Cons: no worker priors; classification only; space prop. to num classes
• Dawid & Skene’79 (DS) – class priors & worker confusion matrices
– Pros: supports multi-class tasks; models worker confusion; unsupervised, semi-supervised, or fully supervised
– Cons: no worker priors; classification only
• Raykar et al.’10 (RY) – worker confusion, sensitivity, specificity; (optional) automatic classifier
– Pros: classifier not required; priors on worker confusion and class distribution; multi-class support; can be supervised
– Cons: automatic classifier requires feature representation; classification only
• Welinder et al.’10 (CUBAM) – worker reliability and confusion, annotation noise, task difficulty; more complex
– Pros: detailed model of the annotation process; can identify worker clusters; multi-class support
– Cons: complex, with many hyper-parameters; unclear how to supervise
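As a concrete companion to the comparison on slide 12, here is a minimal EM implementation in the style of Dawid & Skene (1979): estimate class priors and per-worker confusion matrices from soft item posteriors (M-step), then recompute the posteriors (E-step). This is an illustrative sketch, not the SQUARE code; the function name, smoothing constant, and iteration count are my own choices:

```python
def dawid_skene(labels, n_classes, n_iter=20):
    """EM consensus in the style of Dawid & Skene (1979).

    labels: iterable of (worker, item, label) triples, labels in 0..n_classes-1.
    Returns a dict mapping item -> posterior class-probability list.
    """
    labels = list(labels)
    workers = sorted({w for w, _, _ in labels})
    items = sorted({i for _, i, _ in labels})
    by_item = {i: [] for i in items}
    for w, i, l in labels:
        by_item[i].append((w, l))

    # Initialize posteriors with per-item vote fractions (soft majority vote).
    post = {}
    for i in items:
        counts = [0.0] * n_classes
        for _, l in by_item[i]:
            counts[l] += 1.0
        post[i] = [c / len(by_item[i]) for c in counts]

    for _ in range(n_iter):
        # M-step: class priors and per-worker confusion matrices.
        priors = [sum(post[i][k] for i in items) / len(items)
                  for k in range(n_classes)]
        conf = {w: [[1e-6] * n_classes for _ in range(n_classes)]
                for w in workers}
        for w, i, l in labels:
            for k in range(n_classes):
                conf[w][k][l] += post[i][k]
        for w in workers:
            for k in range(n_classes):
                s = sum(conf[w][k])
                conf[w][k] = [v / s for v in conf[w][k]]
        # E-step: recompute item posteriors given priors and confusions.
        for i in items:
            probs = list(priors)
            for w, l in by_item[i]:
                for k in range(n_classes):
                    probs[k] *= conf[w][k][l]
            z = sum(probs)
            post[i] = [p / z for p in probs]
    return post
```

RY (Raykar et al.’10) can be seen as this model plus Bayesian priors on the confusion matrices and class distribution, which is why the findings below note that "RY adds priors" to DS.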
16. Findings
• Majority voting never best, rarely much worse
• Each method often best for some condition
– E.g., on the dataset it was originally designed for
• DS & RY tend to perform best (RY adds priors)
• No method performs far beyond others
– Of course, contributions aren’t just empirical…
17. Why Don’t We See Bigger Gains?
• Gold is too noisy to detect improvement?
– Cormack & Kolcz’09, Klebanov & Beigman’10
• Limited tasks / scenarios considered?
– e.g., we exclude hybrid methods & worker filtering
• Might we see greater differences from
– Better benchmark tests?
– Better tuning of methods?
– Additional methods?
• We invite community contributions!
18. Roadmap
• Scalability Challenges in Evaluating IR Systems
• Benchmarking Statistical Consensus Methods
• Task Routing via Matrix Factorization
• Toward Ethical Crowdsourcing
19. Crowdsourced Task Routing via
Matrix Factorization
HyunJoon Jung and M. Lease
arXiv 1310.5142, under review
21. Task Routing: Background
• Selection vs. recommendation vs. assignment
– Potential to improve work quality & satisfaction
– task search time has latency & is uncompensated
– Tradeoffs in push vs. pull, varying models
• Many matching criteria one could consider
– Preferences, Experience, Skills, Job constraints, …
• References
– Law and von Ahn, 2011 (Ch. 4)
– Chilton et al., 2010
• MTurk “free” selection constrained by search interface
22. Matrix Factorization Approach
• Collaborative filtering-based recommendation
• Intuition: workers achieve similar accuracy on similar tasks
– Notion is more general: e.g. preference, expertise, etc.
[Figure: pipeline diagram. (1) Accumulate repeated crowdsourced data as a binary worker–example matrix for each task; (2) tabularize a worker–task relational model, i.e., a comprehensive worker–task accuracy matrix; (3) apply MF to infer missing values; (4) select the best-predicted workers for a target task.]
23. Matrix Factorization
• Automatically induce latent features
– Task-independent
• Popular due to robustness to sparsity
– SVD is sensitive to matrix density; PMF is much more robust
• Model: approximate the worker–task accuracy matrix R (M workers, N tasks, M >> N) as R ≈ Wᵀ T, where W ∈ ℝ^(D×M) holds worker features, T ∈ ℝ^(D×N) holds task features, with D = N−1 latent dimensions:
– R_ij ≈ W_iᵀ T_j = Σ_k W_ik T_jk
– e.g., the rating of user i for movie j
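A minimal sketch of this idea, assuming a PMF-style squared-error objective trained by SGD with L2 regularization; the paper's exact model and hyper-parameters may differ, and `factorize` plus all constants here are illustrative:

```python
import random

def factorize(R, d=2, steps=4000, lr=0.02, reg=0.02, seed=0):
    """Factorize a sparse worker-task accuracy matrix.

    R: dict mapping (worker, task) -> observed accuracy in [0, 1];
    missing pairs are simply absent. Learns d-dimensional worker and
    task feature vectors by SGD on squared error with L2 regularization,
    then predicts any (worker, task) accuracy as a dot product.
    """
    rng = random.Random(seed)
    workers = sorted({w for w, _ in R})
    tasks = sorted({t for _, t in R})
    # Small random init breaks the symmetry of the all-zeros saddle point.
    W = {w: [rng.gauss(0, 0.1) for _ in range(d)] for w in workers}
    T = {t: [rng.gauss(0, 0.1) for _ in range(d)] for t in tasks}
    obs = list(R.items())
    for _ in range(steps):
        (w, t), r = rng.choice(obs)
        pred = sum(W[w][k] * T[t][k] for k in range(d))
        err = r - pred
        for k in range(d):
            wk, tk = W[w][k], T[t][k]
            W[w][k] += lr * (err * tk - reg * wk)
            T[t][k] += lr * (err * wk - reg * tk)
    def predict(w, t):
        return sum(W[w][k] * T[t][k] for k in range(d))
    return predict
```

Predicted accuracies for unobserved (worker, task) pairs then drive task routing: rank workers by predicted accuracy on the target task.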
25. Baselines
• Random assignment
– no accuracy prediction; just for task routing
• Simple average
– Average worker’s accuracies across past tasks
• Weighted average
– weight each task in average by similarity to target task
• task similarity must be estimated from data
26. Estimating Task Similarity
• Define by Pearson correlation over per-task
accuracies of workers who perform both
– Ignore any workers doing only one of the tasks
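A sketch of this similarity estimate (illustrative code, not from the paper; the function name and zero-variance convention are my own): Pearson correlation computed only over workers observed on both tasks.

```python
from math import sqrt

def task_similarity(acc_a, acc_b):
    """Pearson correlation between two tasks' per-worker accuracies,
    computed only over workers who performed both tasks; workers who
    did only one of the tasks are ignored.

    acc_a, acc_b: dict mapping worker -> accuracy on that task.
    Returns 0.0 with fewer than two shared workers or zero variance.
    """
    shared = sorted(set(acc_a) & set(acc_b))
    if len(shared) < 2:
        return 0.0
    xs = [acc_a[w] for w in shared]
    ys = [acc_b[w] for w in shared]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sqrt(sum((x - mx) ** 2 for x in xs))
    vy = sqrt(sum((y - my) ** 2 for y in ys))
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy)
```

The weighted-average baseline of slide 25 would then weight each past task's accuracy by this similarity to the target task when predicting a worker's accuracy.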
27. Results – RMSE & Mean Acc. (MTurk data)
Average over tasks
k = 1 to 20 workers
Per-task & Average
k=10 workers
28. Findings
• How does MF prediction accuracy vary given
task similarity, matrix size, & matrix density?
– Feasible, PMF beats SVD, more data = better…
• MF task routing vs. baselines?
– Much better than random; baselines are fine in the
most sparse conditions; MF improves beyond that
29. Open Questions
• Other ways to infer task similarity (e.g. textual)
• Under “Big Data” conditions?
• When integrating target task observations?
• How to better model crowd & spam?
• How to address live task routing challenges?
30. Roadmap
• Scalability Challenges in Evaluating IR Systems
• Benchmarking Statistical Consensus Methods
• Task Routing via Matrix Factorization
• Toward Ethical Crowdsourcing
31. A Few Moral Dilemmas
• A “fair” price for online work in a global economy?
– Is it better to pay nothing (i.e., volunteers, gamification)
rather than pay something small for valuable work?
• Are we obligated to inform people how their
participation / work products will be used?
– If my IRB doesn’t require me to obtain informed consent,
is there some other moral obligation to do so?
• A worker finds his ID posted in a researcher’s online
source code and asks that it be removed. This can’t
be done without recreating the repo, which many
people use. What should be done?
32. Mechanical Turk is Not Anonymous
Matthew Lease, Jessica Hullman, Jeffrey P. Bigham, Michael S. Bernstein, Juho Kim,
Walter S. Lasecki, Saeideh Bakhshi, Tanushree Mitra, and Robert C. Miller.
Online: Social Science Research Network, March 6, 2013
ssrn.com/abstract=2190946
33.
• Amazon profile page URLs use the same IDs as used on MTurk
• How do we respond when we learn we’ve exposed people to risk?
34. Ethical Crowdsourcing
• Assume researchers have good intentions, and
so issues of gross negligence are rare
– Withholding promised pay after work performed
– Not obtaining or complying with IRB oversight
• Instead, the great challenge is how to recognize our
impacts and take appropriate actions in a complex world
– Educating ourselves takes time & effort
– Failing to educate ourselves could cause harm to others
• How can we strike a reasonable balance between
complete apathy vs. being overly alarmist?
35. CACM August, 2013
Paul Hyman. Communications of the ACM, Vol. 56 No. 8, Pages 19-21, August 2013.
36. • Contribute to society and human well-being
• Avoid harm to others
• Be honest and trustworthy
• Be fair and take action not to discriminate
• Respect the privacy of others
COMPLIANCE WITH THE CODE. As an ACM member I will
– Uphold and promote the principles of this Code
– Treat violations of this code as inconsistent with
membership in the ACM
37. CS2008 Curriculum Update (ACM, IEEE)
There is reasonably wide agreement that this topic of legal, social,
professional and ethical issues should feature in all computing degrees.
…financial and economic imperatives …Which approaches are less
expensive and is this sensible? With the advent of outsourcing and
off-shoring these matters become more complex and take on new
dimensions …there are often related ethical issues concerning
exploitation… Such matters ought to feature in courses on legal,
ethical and professional practice.
if ethical considerations are covered only in the standalone course and
not “in context,” it will reinforce the false notion that technical processes
are void of ethical issues. Thus it is important that several traditional
courses include modules that analyze ethical considerations in the
context of the technical subject matter … It would be explicitly against
the spirit of the recommendations to have only a standalone course.
38. “Contribute to society and human
well-being; avoid harm to others”
• Do we have a moral obligation to try to ascertain
conditions under which work is performed? Or the
impact we have upon those performing the work?
• Do we feel differently when work is performed by
– Political refugees? Children? Prisoners? Disabled?
• How do we know who is doing the work, or if a
decision to work (for a given price) is freely made?
– Does it matter why someone accepts offered work?
40. Who are
the workers?
• A. Baio, November 2008. The Faces of Mechanical Turk.
• P. Ipeirotis. March 2010. The New Demographics of
Mechanical Turk
• J. Ross, et al. Who are the Crowdworkers? CHI 2010.
41. Some Notable Prior Research
• Silberman, Irani, and Ross (2010)
– “How should we… conceptualize the role of these people
who we ask to power our computing?”
– “abstraction hides detail” – some details may be worth
keeping conspicuously present (Jessica Hullman)
• Irani and Silberman (2013)
– “…AMT helps employers see themselves as builders of
innovative technologies, rather than employers unconcerned
with working conditions.”
– “…human computation currently relies on worker invisibility.”
• Fort, Adda, and Cohen (2011)
– “…opportunities for our community to deliberately value
ethics above cost savings.”
42. Power Asymmetry on MTurk
• Mistakes happen, such as wrongly rejecting work – e.g., error by
new student, software bug, poor instructions, noisy gold, etc.
• How do we balance the harm caused by our mistakes to workers
(our liability) vs. our cost/effort of preventing such mistakes?
43. Task Decomposition
• By minimizing context, greater task efficiency &
accuracy can often be achieved in practice
– e.g. “Can you name who is in this photo?”
• Much research on ways to streamline work
and decompose complex tasks
44. Context & Informed Consent
• Assume we wish to obtain informed consent
• Without context, consent cannot be informed
– Zittrain, Ubiquitous human computing (2008)
45. Independent Contractors vs. Employees
• Wolfson & Lease, ASIS&T’11
• Many platforms classify workers as independent
contractors (piece-work, not hourly)
– Legislators/courts must ultimately decide
• Different work classifications yield different legal
rights/protections & responsibilities
– Domestic vs. international workers
– Employment taxes
– Litigation can both cause or redress harm
• Law aside, to what extent do moral principles
underlying current laws apply to online work?
46. Consequences of Human Computation
as a Panacea where AI Falls Short
•
•
•
•
The Googler who Looked at the Worst of the Internet
Policing the Web’s Lurid Precincts
Facebook content moderation
The dirty job of keeping Facebook clean
• Even linguistic annotators report stress &
nightmares from reading news articles!
47. What about Freedom?
• Crowdsourcing vision: empowering freedom
– work whenever you want for whomever you want
• Risk: people compelled to perform work
– Chinese prisoners farming gold online
– Digital sweat shops? Digital slaves?
– We know relatively little today about work conditions
– How might we monitor and mitigate risk/growth of
crowd work inflicting harm to at-risk populations?
– Traction? Human Trafficking at MSR Summit’12
49. Join the conversation!
Crowdwork-ethics, by Six Silberman
http://crowdwork-ethics.wtf.tw
an informal, occasional blog for researchers
interested in ethical issues in crowd work
50. The Future of Crowd Work, CSCW’13
Kittur, Nickerson, Bernstein, Gerber,
Shaw, Zimmerman, Lease, and Horton
51. Additional References
• Irani, Lilly C. The Ideological Work of Microwork. In preparation,
draft available online.
• Adda, Gilles, et al. Crowdsourcing for language resource
development: Critical analysis of amazon mechanical turk
overpowering use. Proceedings of the 5th Language and Technology
Conference (LTC). 2011.
• Adda, Gilles, and Joseph J. Mariani. Economic, Legal and Ethical
analysis of Crowdsourcing for Speech Processing. (2013).
• Harris, Christopher G., and Padmini Srinivasan. Crowdsourcing and
Ethics. Security and Privacy in Social Networks. 67-83. 2013.
• Harris, Christopher G. Dirty Deeds Done Dirt Cheap: A Darker Side
to Crowdsourcing. IEEE 3rd conference on social computing
(socialcom). 2011.
• Horton, John J. The condition of the Turking class: Are online
employers fair and honest?. Economics Letters 111.1 (2011): 10-12.
52. Additional References (2)
• Bederson, B. B., & Quinn, A. J. Web workers unite! addressing challenges
of online laborers. In CHI 2011 Human Computation Workshop, 97-106.
• Bederson, B. B., & Quinn, A. J. Participation in Human Computation. In
CHI 2011 Human Computation Workshop.
• Felstiner, Alek. Working the Crowd: Employment and Labor Law in the
Crowdsourcing Industry. Berkeley J. Employment & Labor Law 32.1 2011
• Felstiner, Alek. Sweatshop or Paper Route?: Child Labor Laws and In-Game Work. CrowdConf (2010).
• Larson, Martha. Toward Responsible and Sustainable Crowdsourcing.
Blog post + Slides from Dagstuhl, September 2013.
• Vili Lehdonvirta and Paul Mezier. Identity and Self-Organization in
Unstructured Work. Unpublished working paper. 16 October 2013.
• Zittrain, Jonathan. Minds for Sale. YouTube.
53. Thank You!
See also: SIAM’13 Tutorial
Slides: www.slideshare.net/mattlease
ir.ischool.utexas.edu