Presented Nov. 16, 2011 at the National Institute of Standards and Technology (NIST) Text REtrieval Conference (TREC). Track organized with Gabriella Kazai, with assistance from Hyun Joon Jung.
3. What is Crowdsourcing?
• A collection of mechanisms and associated methodologies for scaling and directing crowd activities to achieve some goal(s)
• Enabled by internet-connectivity
• Many related concepts
– Collective intelligence
– Social computing
– People services
– Human computation
4. Why Crowdsourcing? Potential…
• Scalability (e.g. cost, time, effort)
– e.g. scale to greater pool sizes
• Quality (by getting more eyes on the data)
– More diverse judgments
– More accurate judgments (“wisdom of crowds”)
• And more!
– New datasets, new tasks, interaction, on-demand evaluation, hybrid search systems
5. Track Goals (for Year 1)
• Promote IR community awareness of, investigation of, and experience with crowdsourcing mechanisms and methods
• Improve understanding of best practices
• Establish shared, reusable benchmarks
• Assess state-of-the-art of the field
• Attract experience from outside IR community
6. Crowdsourcing in 2011
• AAAI-HCOMP: 3rd Human Computation Workshop (Aug. 8)
• ACIS: Crowdsourcing, Value Co-Creation, & Digital Economy Innovation (Nov. 30 – Dec. 2)
• Crowdsourcing Technologies for Language and Cognition Studies (July 27)
• CHI-CHC: Crowdsourcing and Human Computation (May 8)
• CIKM: BooksOnline (Oct. 24, “crowdsourcing … online books”)
• CrowdConf 2011 -- 2nd Conf. on the Future of Distributed Work (Nov. 1-2)
• Crowdsourcing: Improving … Scientific Data Through Social Networking (June 13)
• EC: Workshop on Social Computing and User Generated Content (June 5)
• ICWE: 2nd International Workshop on Enterprise Crowdsourcing (June 20)
• Interspeech: Crowdsourcing for speech processing (August)
• NIPS: Second Workshop on Computational Social Science and the Wisdom of Crowds (Dec. TBD)
• SIGIR-CIR: Workshop on Crowdsourcing for Information Retrieval (July 28)
• TREC-Crowd: Year 1 of TREC Crowdsourcing Track (Nov. 16-18)
• UbiComp: 2nd Workshop on Ubiquitous Crowdsourcing (Sep. 18)
• WSDM-CSDM: Crowdsourcing for Search and Data Mining (Feb. 9)
7. Two Questions, Two Tasks
• Task 1: Assessment (human factors)
– How can we obtain quality relevance judgments from individual (crowd) participants?
• Task 2: Aggregation (statistics)
– How can we derive a quality relevance judgment from multiple (crowd) judgments?
9. Task 2: Aggregation (statistics)
• “Wisdom of crowds” computing
• Typical assumption: noisy input labels
– But not always (cf. Yang et al., SIGIR’10)
• Many statistical methods have been proposed
– Common baseline: majority vote
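As a concrete illustration of the majority-vote baseline named above, here is a minimal Python sketch (the data layout and identifiers are illustrative, not the track's submission format): each topic-document pair gets the label most workers gave it.

```python
from collections import Counter

def majority_vote(labels_by_example):
    """Aggregate multiple binary worker labels into one label per topic-document pair.

    labels_by_example: dict mapping (topic, doc) -> list of 0/1 worker labels.
    Ties are broken toward non-relevant here; the track's own tie handling may differ.
    """
    consensus = {}
    for example, labels in labels_by_example.items():
        counts = Counter(labels)
        consensus[example] = 1 if counts[1] > counts[0] else 0
    return consensus

# Illustrative usage with made-up worker labels
judgments = {
    ("topic-1", "doc-A"): [1, 1, 0],
    ("topic-1", "doc-B"): [0, 0, 1, 0],
}
print(majority_vote(judgments))  # {('topic-1', 'doc-A'): 1, ('topic-1', 'doc-B'): 0}
```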
10. Crowdsourcing, Noise & Uncertainty
Broadly two approaches
1. Alchemy: turn noisy data into gold
– Once we have gold, we can go on training and evaluating as before (separation of concerns)
– Assume we can mostly clean it up and ignore any remaining error (even gold is rarely 100% pure)
2. Model & propagate uncertainty
– Let it “spill over” into training and evaluation
11. Test Collection: ClueWeb09 subset
• Collection: 19K pages rendered by Waterloo
– Task 1: teams judge (a subset)
– Task 2: teams aggregate judgments we provide
• Topics: taken from past Million Query (MQ) and Relevance Feedback (RF) tracks
• Gold: Roughly 3K prior NIST judgments
– Remaining 16K pages have no “gold” judgments
12. What to Predict?
• Teams submit classification and/or ranking labels
– Classification supports traditional absolute relevance judging
– Rank labels support pair-wise preference or list-wise judging
• Classification labels in [0,1]
– Probability of relevance (assessor/system uncertainty)
– Simple generalization of binary relevance
– If probabilities submitted but no ranking, rank labels induced
• Ranking as [1..N]
– Task 1: rank 5 documents per set
• Same worker had to label all 5 examples in a given set (challenge)
– Task 2: rank all documents per topic
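Since rank labels are induced when a team submits probabilities without a ranking, the following sketch shows one straightforward way to do that (illustrative only; the official conversion, e.g. its tie handling, may differ).

```python
def induce_ranking(prob_labels):
    """Map {doc_id: probability of relevance in [0,1]} for one topic to rank labels 1..N,
    with rank 1 going to the document judged most probably relevant."""
    ordered = sorted(prob_labels, key=prob_labels.get, reverse=True)
    return {doc: rank for rank, doc in enumerate(ordered, start=1)}

# Example: three documents for a single topic
print(induce_ranking({"docA": 0.91, "docB": 0.15, "docC": 0.67}))
# {'docA': 1, 'docC': 2, 'docB': 3}
```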
13. Metrics
• Classification
– Binary ground truth: P, R, Accuracy, Sensitivity, LogLoss
– Probabilistic ground truth: KL, RMSE
• Ranking
– Mean Average Precision (MAP)
– Normalized Discounted Cumulative Gain (NDCG)
• Ternary NIST judgments conflated to binary
• Could explore mapping [0,1] consensus to ternary categories
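For reference, a minimal single-topic sketch of the two ranking metrics with binary gain (MAP and mean NDCG are then the per-topic means of these values); this assumes the standard textbook definitions and is not the track's evaluation script.

```python
import math

def average_precision(ranked_docs, relevant):
    """ranked_docs: list of doc ids, best first; relevant: set of docs judged relevant."""
    hits, precision_sum = 0, 0.0
    for i, doc in enumerate(ranked_docs, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / i
    return precision_sum / len(relevant) if relevant else 0.0

def ndcg(ranked_docs, relevant):
    """NDCG with binary gains and a log2(rank + 1) discount."""
    dcg = sum(1.0 / math.log2(i + 1)
              for i, doc in enumerate(ranked_docs, start=1) if doc in relevant)
    ideal = sum(1.0 / math.log2(i + 1)
                for i in range(1, min(len(relevant), len(ranked_docs)) + 1))
    return dcg / ideal if ideal > 0 else 0.0
```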
15. Classification Metrics (cont’d)
• Classification – Binary ground truth (cont’d)
• Classification – Probabilistic ground truth
– Root Mean Squared Error (RMSE)
• Notes
– To avoid log(0) = infinity, replace 0 with 10^-15
– Revision: compute average per-example logloss and KL so error does not grow with sample size (particularly with varying team coverage)
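A sketch of the probabilistic classification metrics as revised above (per-example averages, with probabilities clipped at 10^-15 to avoid log(0)); this is one reading of the slide, not the official evaluation code.

```python
import math

EPS = 1e-15  # clip probabilities away from 0 and 1 to avoid log(0), per the note above

def _clip(p):
    return min(max(p, EPS), 1.0 - EPS)

def avg_log_loss(pred, truth):
    """pred and truth map example -> probability of relevance; truth may be 0/1 (gold)."""
    shared = pred.keys() & truth.keys()
    return sum(-(truth[e] * math.log(_clip(pred[e]))
                 + (1 - truth[e]) * math.log(1 - _clip(pred[e])))
               for e in shared) / len(shared)

def avg_kl(pred, truth):
    """Average per-example KL divergence from a (possibly probabilistic) ground truth."""
    shared = pred.keys() & truth.keys()
    total = 0.0
    for e in shared:
        p, t = _clip(pred[e]), _clip(truth[e])
        total += t * math.log(t / p) + (1 - t) * math.log((1 - t) / (1 - p))
    return total / len(shared)

def rmse(pred, truth):
    """Root mean squared error between predicted and reference probabilities."""
    shared = pred.keys() & truth.keys()
    return math.sqrt(sum((pred[e] - truth[e]) ** 2 for e in shared) / len(shared))
```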
16. Ground Truth: Three Versions
• Gold: NIST Judgments
– only available for a subset of the test collection
• Consensus: generated by aggregating team labels (automatic)
– full coverage
• Team-based (Task 2 only)
– use each team’s labels as truth to evaluate all other teams
– Inspect variance in team rankings over alternative ground truths
– Coverage varies
Three primary evaluation conditions
1. Over examples having gold labels (evaluate vs. gold labels)
2. Over examples having gold labels (evaluate vs. consensus labels)
3. Over all examples (evaluate vs. consensus labels)
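To make the three conditions concrete, a small sketch (hypothetical function and variable names) of how a team's submitted labels would be paired with the two reference label sets:

```python
def evaluation_conditions(team_labels, gold, consensus):
    """Return (condition name, examples to score, reference labels) triples.

    team_labels, gold, consensus: dicts mapping example -> label;
    gold covers only a subset of examples, consensus covers all of them.
    """
    gold_examples = team_labels.keys() & gold.keys()
    all_examples = team_labels.keys() & consensus.keys()
    return [
        ("gold-labeled examples vs. gold", gold_examples, gold),
        ("gold-labeled examples vs. consensus", gold_examples, consensus),
        ("all examples vs. consensus", all_examples, consensus),
    ]
```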
17. Consensus
• Goal: Infer single consensus label from multiple input labels
• Methodological Goals: unbiased, transparent, simple
• Method: simple average, rounded when metrics require
– Task 2: input = example labels from each team
– Task 1: input = per-example average of worker labels from each team
• Details
– Classification labels only; no rank fusion
– Using primary runs only
– Task 1: each team gets 1 vote regardless of worker count (prevent bias)
– Exclude any examples where
• only one team submitted a label (bias)
• consensus would yield a tie (binary metrics only)
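The consensus procedure above can be sketched as follows (a simplified illustration, not the organizers' script); for Task 1, each team's per-example label is first taken to be the average of its workers' labels, so every team still casts a single vote.

```python
def consensus_labels(labels_by_team):
    """labels_by_team: dict mapping team -> {example: label in [0, 1]} from its primary run.

    Returns (probabilistic consensus, rounded binary consensus).
    Examples labeled by only one team are excluded (bias), and ties (an average of
    exactly 0.5) are excluded from the binary version only.
    """
    pooled = {}
    for team_labels in labels_by_team.values():
        for example, label in team_labels.items():
            pooled.setdefault(example, []).append(label)

    probabilistic, binary = {}, {}
    for example, labels in pooled.items():
        if len(labels) < 2:            # only one team submitted a label
            continue
        avg = sum(labels) / len(labels)
        probabilistic[example] = avg   # used directly by probabilistic metrics
        if avg != 0.5:                 # rounded when binary metrics require it
            binary[example] = 1 if avg > 0.5 else 0
    return probabilistic, binary
```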
20. Task 1: Data
• Option 1: Use Waterloo rendered pages
– Available as images, PDFs, and plain text (+html)
– Many page images fetched from CMU server
– Protect workers from malicious scripting
• Option 2: Use some other format
– Any team creating some other format was asked to provide that data or conversion tool to others
– Avoid comparison based on different rendering
21. Task 1: Data
• Topics: 270 (240 development, 30 test)
• Test Effort: ~2200 topic-document pairs for each team to judge
– Shared sets: judged by all teams
• Test: 1655 topic-document pairs (331 sets) over 20 topics
– Assigned sets: judged by a subset of teams
• Test: 1545 topic-document pairs (309 sets) over 15 topics in total
• ~ 500 assigned to each team (~ 30 rel, 20 non-rel, 450 unknown)
– Split intended to let organizers measure any worker-training effects
• Increased track complexity, decreased useful redundancy & gold …
• Gold: 395 topic-document pairs for test
– made available to teams for cross-validation (not blind)
22. Task 1: Cost & Sponsorship
• Paid crowd labor only one form of crowdsourcing
– Other models: directed gaming, citizen science, virtual pay
– Incentives: socialize with others, recognition, social good, learn, etc.
• Nonetheless, paid models continue to dominate
– e.g. Amazon Mechanical Turk (MTurk), CrowdFlower
• Risk: cost of crowd labor being barrier to track participation
• Risk Mitigation: sponsorship
– CrowdFlower: $100 free credit to interested teams
– Amazon: ~ $300 reimbursement to teams using MTurk (expected)
23. Task 1: Participants
1. Beijing University of Posts and Telecommunications (BUPT)
– CrowdFlower qualification, MTurk judging
2. Delft University of Technology – Vuurens (TUD_DMIR): MTurk
3. Delft University of Technology & University of Iowa (GeAnn)
– Game, recruit via CrowdFlower
4. Glasgow – Terrier (uogTr): MTurk
5. Microsoft (MSRC): MTurk
6. RMIT University (RMIT): CrowdFlower
7. University Carlos III of Madrid (uc3m): MTurk
8. University of Waterloo (UWaterlooMDS): in-house judging
5 used MTurk, 3 used CrowdFlower, 1 used in-house judging
24. Task 1: Evaluation method
• Average per-worker performance
– Average weighted by number of labels per worker
– Primary evaluation includes rejected work
• Additional metric: Coverage
– What % of examples were labeled by the team?
• Cost & time to be self-reported by teams
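A sketch of the two Task 1 measurements described above, assuming per-worker scores (e.g. accuracy against gold) have already been computed; names are illustrative.

```python
def weighted_worker_average(score_by_worker, label_count_by_worker):
    """Average per-worker performance, weighted by how many labels each worker gave."""
    total = sum(label_count_by_worker[w] for w in score_by_worker)
    return sum(score_by_worker[w] * label_count_by_worker[w]
               for w in score_by_worker) / total

def coverage(labeled_examples, test_examples):
    """Coverage: fraction of the test examples the team actually labeled."""
    return len(set(labeled_examples) & set(test_examples)) / len(test_examples)
```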
25. ¼ most productive workers do ¾ of the work
Workers (by productivity)   # of labels   % of labels
Top 25%                     44,917        76.77%
Top 50%                     53,444        91.34%
Top 75%                     56,558.5      96.66%
Total                       58,510        100%
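The productivity breakdown above can be reproduced from raw per-worker label counts with a short calculation like the following (a sketch; quantile rounding choices may differ slightly from the table).

```python
def productivity_shares(labels_per_worker, fractions=(0.25, 0.50, 0.75, 1.0)):
    """Share of all labels contributed by the most productive fraction of workers.

    labels_per_worker: dict mapping worker id -> number of labels contributed.
    """
    counts = sorted(labels_per_worker.values(), reverse=True)
    total = sum(counts)
    shares = {}
    for f in fractions:
        top_k = max(1, round(len(counts) * f))
        shares[f] = sum(counts[:top_k]) / total
    return shares
```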
26. Same worker, multiple teams
[Chart: number of examples labeled per anonymized worker ID]
Teams a worker belongs to   # of workers   Avg. # of examples
1                           947            56.21
2                           35             146.65
3                           2              72.25
28. Task 2: Data
• Input: judgments provided by organizers
– 19,033 topic-document pairs
– 89,624 binary judgments from 762 workers
• Evaluation: average per-topic performance
• Gold: 3275 labels
– 2275 for training (1275 relevant, 1000 non-relevant)
• Excluded from evaluation
– 1000 for blind test (balanced 500/500)
29. Task 2: Participants
1. Beijing University of Posts and Telecommunications (BUPT)
2. Delft University of Technology – Vuurens (TUD_DMIR)
3. Delft University of Technology & University of Iowa (GeAnn)
4. Glasgow – Terrier (uogTr)
5. Glasgow – Zuccon (qirdcsuog)
6. LingPipe
7. Microsoft (MSRC)
8. University Carlos III of Madrid (uc3m)
9. University of Texas at Austin (UTAustin)
10. University of Waterloo (UWaterlooMDS)
30. Discussion
• Consensus Labels as ground-truth
– Consensus Algorithm for Label Generation?
– Probabilistic or Rounded Binary Consensus Labels?
• Proper scoring rules
• Changes for 2012?
– Which document collection? Request NIST judging?
– Drop the two-task format? Pre-suppose crowdsourced solution?
– Broaden sponsorship? Narrow scope?
– Additional organizer?
– Details
• Focus on worker training effects
• Treatment of rejected work
31. Conclusion
• Interesting first year of track
– Some insights about what worked well and less well in track design
– Participants will tell us about methods developed
– More analysis still needed for evaluation
• Track will run again in 2012
– Help shape it with feedback (planning session, hallway, or email)
• Acknowledgments
– Hyun Joon Jung (UT Austin)
– Mark Smucker (U Waterloo)
– Ellen Voorhees & Ian Soboroff (NIST)
• Sponsors
– Amazon
– CrowdFlower