This document summarizes a talk on human computation and crowdsourcing from an industrial perspective. It discusses how platforms such as Mechanical Turk can supply large amounts of inexpensive labeled data, but notes that obtaining high-quality labels requires careful task design, appropriate payment, quality-control methods, and attention to issues such as worker experience and task content. Current trends include algorithms for optimizing human-machine workflows and for routing tasks to crowds based on their expertise.
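The quality-control theme above is commonly implemented with redundant judgments plus embedded gold-standard questions. The sketch below is illustrative, not from the talk: `aggregate_labels` and `worker_accuracy` are hypothetical helper names, and majority voting is only one of several aggregation schemes mentioned in the crowdsourcing literature.

```python
from collections import Counter

def aggregate_labels(judgments):
    """Majority-vote aggregation.

    judgments maps each item to the list of labels collected
    from different workers; the most frequent label wins.
    """
    return {item: Counter(labels).most_common(1)[0][0]
            for item, labels in judgments.items()}

def worker_accuracy(worker_answers, gold):
    """Score one worker against embedded gold-standard questions.

    Returns the fraction of gold questions the worker answered
    correctly, or None if the worker saw no gold questions.
    """
    scored = [worker_answers[q] == a
              for q, a in gold.items() if q in worker_answers]
    return sum(scored) / len(scored) if scored else None

# Example: three workers label two images; one gold question is embedded.
judgments = {"img1": ["cat", "cat", "dog"], "img2": ["dog", "dog", "dog"]}
print(aggregate_labels(judgments))            # {'img1': 'cat', 'img2': 'dog'}
print(worker_accuracy({"g1": "cat", "g2": "dog"}, {"g1": "cat", "g2": "cat"}))  # 0.5
```

In practice workers whose gold accuracy falls below a threshold are filtered out before aggregation, which is one of the simplest forms of the quality control the talk refers to.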
2. Disclaimer
The views, opinions, positions, or strategies expressed in
this talk are mine and do not necessarily reflect the
official policy or position of Microsoft.
3. Introduction
• Crowdsourcing is hot
• Lots of interest in the research community
– Articles showing good results
– Journals special issues (IR, IEEE Internet Computing, etc.)
– Workshops and tutorials (SIGIR, NAACL, WSDM, WWW, CHI,
RecSys, VLDB, etc.)
– HCOMP
– CrowdConf
• Large companies leveraging crowdsourcing
• Big data
• Start-ups
• Venture capital investment
4. Crowdsourcing
• Crowdsourcing is the act of taking a
job traditionally performed by a
designated agent (usually an
employee) and outsourcing it to an
undefined, generally large group of
people in the form of an open call.
• The application of Open Source
principles to fields outside of
software.
• Most successful story: Wikipedia
8. Human computation
• Not a new idea
• Computers before
computers
• You are a human
computer
9. Some definitions
• Human computation is a computation
that is performed by a human
• Human computation system is a system
that organizes human efforts to carry
out computation
• Crowdsourcing is a tool that a human
computation system can use to
distribute tasks.
Edith Law and Luis von Ahn. Human Computation. Morgan & Claypool Publishers, 2011.
10. More examples
• ESP game
• CAPTCHA: 200M every day
• reCAPTCHA: 750M to date
11. Data is king
• Massive free Web data
changed how we train
learning systems
• Crowds provide new access
to cheap & labeled big data
• But quality also matters
M. Banko and E. Brill. “Scaling to Very Very Large Corpora for Natural Language Disambiguation”, ACL 2001.
A. Halevy, P. Norvig, and F. Pereira. “The Unreasonable Effectiveness of Data”, IEEE Intelligent Systems 2009.
12. Traditional Data Collection
• Setup data collection software / harness
• Recruit participants / annotators / assessors
• Pay a flat fee for experiment or hourly wage
• Characteristics
– Slow
– Expensive
– Difficult and/or Tedious
– Sample Bias…
13. Natural Language Processing
• MTurk annotation for 5 NLP tasks
• 22K labels for US $26
• High agreement between consensus labels and
gold-standard labels
• Workers as good as experts
R. Snow, B. O’Connor, D. Jurafsky, and A. Y. Ng. “Cheap and Fast But is it Good? Evaluating Non-Expert
Annotations for Natural Language Tasks”. EMNLP-2008.
14. Machine Translation
• Manual evaluation
on translation quality
is slow and expensive
• High agreement
between non-experts
and experts
• $0.10 to translate a
sentence
C. Callison-Burch. “Fast, Cheap, and Creative: Evaluating Translation Quality
Using Amazon’s Mechanical Turk”, EMNLP 2009.
15. Soylent
M. Bernstein et al. “Soylent: A Word Processor with a Crowd Inside”, UIST 2010
16. Mechanical Turk
• Amazon Mechanical Turk
(AMT, MTurk,
www.mturk.com)
• Crowdsourcing platform
• On-demand workforce
• “Artificial artificial
intelligence”: get humans to
do hard part
• Named after faux automaton
of 18th C.
22. Flip a coin
• Please flip a coin and report the results
• Two questions
1. Coin type?
2. Head or tails?
• Results
– Head or tails: head 57, tail 43 (total 100)
– Coin type: Dollar 56, Euro 11, Other 30, blank 3 (total 100)
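The head/tail split above can be sanity-checked with an exact binomial test; a minimal sketch, assuming the counts reported on the slide (the function name and tolerance are mine, not from the talk):

```python
from math import comb

def two_sided_binom_p(k, n, p=0.5):
    """Two-sided exact binomial p-value: total probability of outcomes
    at least as unlikely as observing k heads out of n fair flips."""
    pk = comb(n, k) * p**k * (1 - p)**(n - k)
    return sum(comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(n + 1)
               if comb(n, i) * p**i * (1 - p)**(n - i) <= pk + 1e-12)

# Reported counts from the slide: 57 heads, 43 tails out of 100.
p_value = two_sided_binom_p(57, 100)
print(f"p = {p_value:.3f}")  # well above 0.05: consistent with fair coins
```

A result this close to 50/50 is weak evidence that workers actually flipped coins rather than answering arbitrarily, which is exactly the kind of cheap sanity check the experiment was designed for.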
23. Why is this interesting?
• Easy to prototype and test new experiments
• Cheap and fast
• No need to setup infrastructure
• Introduce experimentation early in the cycle
• For new ideas, this is very helpful
24. Caveats and clarifications
• Trust and reliability
• Wisdom of the crowd, revisited
• Adjust expectations
• Crowdsourcing is another data point for your
analysis
• Complementary to other experiments
25. Why now?
• The Web
• Use humans as processors in a distributed
system
• Address problems that computers aren’t good at
• Scale
• Reach
26. Who are the workers?
• A. Baio, November 2008. The Faces of Mechanical Turk.
• P. Ipeirotis. March 2010. The New Demographics of Mechanical Turk
• J. Ross, et al. Who are the Crowdworkers? CHI 2010.
31. Careful with That Axe Data, Eugene
• In the area of big data and machine learning:
– labels -> features -> predictive model -> optimization
• Labeling/experimentation perceived as boring
• Don’t rush labeling
– Human and machine
• Label quality is very important
– Don’t outsource it
– Own it end to end
– Large scale
32. More on label quality
• Data gathering is not a free lunch
• Labels for the machine != labels for humans
• Emphasis on algorithms,
models/optimizations and mining from labels
• Not so much on algorithms for ensuring high
quality labels
• Training sets
36. Motivating Example: Relevance Judging
• Relevance of search results is difficult to judge
– Highly subjective
– Expensive to measure
• Professional editors commonly used
• Potential benefits of crowdsourcing
– Scalability (time and cost)
– Diversity of judgments
39. Results for {idiot}
February 2011: 5/7 (R), 2/7 (NR)
Relevant
1. Most of the time those TV reality stars have absolutely no talent. They do whatever they
can to make a quick dollar. Most of the time the reality tv stars don not have a mind of
their own. R
2. Most are just celebrity wannabees. Many have little or no talent, they just want fame. R
3. Have you seen the knuckledraggers on reality television? They should be required to change
their names to idiot after appearing on the show. You could put numbers after the word
idiot so we can tell them apart. R
4. Although I have not followed too many of these shows, those that I have encountered have
for a great part a very common property. That property is that most of the participants
involved exhibit a shallow self-serving personality that borders on social pathological
behavior. To perform or act in such an abysmal way could only be an act of an idiot. R
5. I can see this one going both ways. A particular sort of reality star comes to mind,
though, one who was voted off Survivor because he chose not to use his immunity necklace.
Sometimes the label fits, but sometimes it might be unfair. R
Not Relevant
1. Just because someone else thinks they are an "idiot", doesn't mean that is what the word
means. I don't like to think that any one person's photo would be used to describe a
certain term. NR
2. While some reality-television stars are genuinely stupid (or cultivate an image of
stupidity), that does not mean they can or should be classified as "idiots." Some simply
act that way to increase their TV exposure and potential earnings. Other reality-television
stars are really intelligent people, and may be considered as idiots by people who don't
like them or agree with them. It is too subjective an issue to be a good result for a
search engine. NR
40. You have a new idea
• Novel IR technique
• Don’t have access to click data
• Can’t hire editors
• How to test new ideas?
41. Crowdsourcing and relevance evaluation
• Subject pool access: no need to come into the
lab
• Diversity
• Low cost
• Agile
42. Pedal to the metal
• You read the papers
• You tell your boss (or advisor) that
crowdsourcing is the way to go
• You now need to produce hundreds of
thousands of labels per month
• Easy, right?
43. Ask the right questions
• Instructions are key
• Workers are not IR experts so don’t assume
the same understanding in terms of
terminology
• Show examples
• Hire a technical writer
• Prepare to iterate
44. How not to do things
• Lots of work for a few cents
• Go here, go there, copy, enter, count …
45. UX design
• Time to apply all those usability concepts
• Need to grab attention
• Generic tips
– Experiment should be self-contained.
– Keep it short and simple.
– Be very clear with the task.
– Engage with the worker. Avoid boring stuff.
– Always ask for feedback (open-ended question) in an
input box.
• Localization
46. Payments
• How much is a HIT?
• Delicate balance
– Too little, no interest
– Too much, attract spammers
• Heuristics
– Start with something and wait to see if there is
interest or feedback (“I’ll do this for X amount”)
– Payment based on user effort. Example: $0.04 (2 cents
to answer a yes/no question, 2 cents if you provide
feedback that is not mandatory)
• Bonus
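The payment heuristics above can be grounded with a quick back-of-the-envelope check; this sketch converts a per-HIT payment and a median completion time into an implied hourly rate (the $0.04 / 60-second numbers are illustrative, not from the talk):

```python
def effective_hourly_rate(pay_per_hit, median_seconds):
    """Rough effective hourly wage implied by a per-HIT payment
    and the median time workers take to complete one HIT."""
    return pay_per_hit * 3600 / median_seconds

# Hypothetical task: $0.04 per HIT, median completion time 60 seconds.
rate = effective_hourly_rate(0.04, 60)
print(f"${rate:.2f}/hour")  # $2.40/hour
```

Watching this number as you iterate on the task design helps keep the balance right: too low and nobody takes the HIT, too high and you attract spammers.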
48. Quality control
• Extremely important part of the experiment
• Approach it as “overall” quality – not just for
workers
• Bi-directional channel
– You may think the worker is doing a bad job.
– The same worker may think you are a lousy
requester.
• Test with a gold standard
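A gold-standard test can be sketched as scoring each worker against the gold HITs they happened to answer; the data layout here (`judgments` tuples, `gold` dict) is an assumption for illustration, not a platform API:

```python
from collections import defaultdict

def worker_accuracy(judgments, gold):
    """Accuracy per worker on gold-standard items only.
    judgments: list of (worker_id, item_id, label); gold: {item_id: label}."""
    correct = defaultdict(int)
    seen = defaultdict(int)
    for worker, item, label in judgments:
        if item in gold:
            seen[worker] += 1
            correct[worker] += int(label == gold[item])
    return {w: correct[w] / seen[w] for w in seen}

judgments = [
    ("w1", "q1", "R"), ("w1", "q2", "NR"),
    ("w2", "q1", "NR"), ("w2", "q2", "NR"),
]
gold = {"q1": "R", "q2": "NR"}
print(worker_accuracy(judgments, gold))  # {'w1': 1.0, 'w2': 0.5}
```

The same scores feed all three timing strategies on the next slide: screen workers beforehand, spot-check them during the task, or filter their labels afterwards.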
49. When to assess work quality?
• Beforehand (prior to main task activity)
– How: “qualification tests” or similar mechanism
– Purpose: screening, selection, recruiting, training
• During
– How: assess labels as worker produces them
– Like random checks on a manufacturing line
– Purpose: calibrate, reward/penalize, weight
• After
– How: compute accuracy metrics post-hoc
– Purpose: filter, calibrate, weight, retain
50. How do we measure work quality?
• Compare worker’s label vs.
– Known (correct, trusted) label
– Other workers’ labels
– Model predictions of workers and labels
• Verify worker’s label
– Yourself
– Tiered approach (e.g. Find-Fix-Verify)
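Comparing a worker's label against other workers' labels usually starts with simple majority voting; a minimal sketch that also flags exact ties as gray areas needing another judge or a second-tier review:

```python
from collections import Counter

def majority_vote(labels):
    """Consensus label from redundant judgments on one item.
    Returns None on an exact tie, marking the item as a gray area."""
    (top, top_n), *rest = Counter(labels).most_common()
    if rest and rest[0][1] == top_n:
        return None
    return top

print(majority_vote(["R", "R", "NR"]))  # R
print(majority_vote(["R", "NR"]))       # None (tie)
```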
51. Methods for measuring agreement
• Inter-agreement level
– Agreement between judges
– Agreement between judges and the gold set
• Some statistics
– Cohen’s kappa (2 raters)
– Fleiss’ kappa (any number of raters)
– Krippendorff’s alpha
• Gray areas
– 2 workers say “relevant” and 3 say “not relevant”
– 2-tier system
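Cohen's kappa for two raters can be computed directly from the two label sequences: observed agreement corrected for the agreement expected by chance. A small self-contained sketch (the example labels are made up):

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa for two raters judging the same items,
    assuming neither rater is perfectly constant (pe < 1)."""
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n        # observed agreement
    ca, cb = Counter(a), Counter(b)
    pe = sum(ca[k] * cb[k] for k in ca) / (n * n)     # chance agreement
    return (po - pe) / (1 - pe)

rater1 = ["R", "R", "NR", "R", "NR", "NR"]
rater2 = ["R", "NR", "NR", "R", "NR", "R"]
print(round(cohens_kappa(rater1, rater2), 3))  # 0.333
```

Fleiss' kappa and Krippendorff's alpha generalize the same idea to more raters and to missing judgments, which is why the production examples later report both k and alpha.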
52. Content quality
• People like to work on things that they like
• Content and judgments according to modern
times
– TREC data set: airport security docs are pre 9/11
• Document length
• Randomize content
• Avoid worker fatigue
– Judging 100 documents on the same subject can be
tiring, leading to decreasing quality
53. Was the task difficult?
Ask workers to rate difficulty of a search topic
50 topics; 5 workers, $0.01 per task
54. So far …
• One may say “this is all good but looks like a
ton of work”
• The original goal: data is king
• Data quality and experimental designs are
preconditions to make sure we get the right
stuff
• Don’t cut corners
55. Pause
• Crowdsourcing works
– Fast turnaround, easy to experiment, few dollars to test
– But: you have to design experiments carefully, quality,
platform limitations
• Crowdsourcing in production
– Large scale data sets
– Continuous execution
– Difficult to debug
• How do you know the experiment is working?
• Goal: framework for ensuring reliability on
crowdsourcing tasks
O. Alonso, C. Marshall and M. Najork. “Crowdsourcing a subjective labeling task: A human centered framework to ensure reliable
results” http://research.microsoft.com/apps/pubs/default.aspx?id=219755.
56. Labeling tweets – an example of a task
• Is this tweet interesting?
• Subjective activity
• Not focused on specific events
• Findings
– Difficult problem, low inter-rater agreement
– Tested many designs, number of workers, platforms
(MTurk and others)
• Multiple contingent factors
– Worker performance
– Work
– Task design
O. Alonso, C. Marshall and M. Najork. “Are some tweets more interesting than others? #hardquestion”. HCIR 2013.
57. Designs that include in-task CAPTCHA
• Borrowed idea from reCAPTCHA -> use of
control term
• HIDDEN
• Adapt your labeling task
• 2 more questions as control
– 1 algorithmic
– 1 semantic
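One way to use the two control questions is to discard any assignment that fails either of them before the main-question label is trusted; the field names below are illustrative, not the actual task schema:

```python
def filter_by_controls(assignments):
    """Keep only main-question labels from assignments where both
    control questions (one algorithmic, one semantic) were correct."""
    return [a["label"] for a in assignments
            if a["algorithmic_ok"] and a["semantic_ok"]]

assignments = [
    {"algorithmic_ok": True,  "semantic_ok": True,  "label": "interesting"},
    {"algorithmic_ok": True,  "semantic_ok": False, "label": "interesting"},
    {"algorithmic_ok": False, "semantic_ok": True,  "label": "not interesting"},
]
print(filter_by_controls(assignments))  # ['interesting']
```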
58. Production example #1
Q1 (k = 0.91, alpha = 0.91)
Q2 (k = 0.771, alpha = 0.771)
Q3 (k = 0.033, alpha = 0.035)
Tweet de-branded
In-task captcha
The main question
59. Production example #2
Q1 (k = 0.907, alpha = 0.907)
Q2 (k = 0.728, alpha = 0.728)
• Q3 Worthless (alpha = 0.033)
• Q3 Trivial (alpha = 0.043)
• Q3 Funny (alpha = -0.016)
• Q3 Makes me curious (alpha = 0.026)
• Q3 Contains useful info (alpha = 0.048)
• Q3 Important news (alpha = 0.207)
Tweet de-branded
In-task captcha
Breakdown by categories to get better signal
60. Once we get here
• High quality labels
• Data will later be used for rankers, ML
models, evaluations, etc.
• Training sets
• Scalability and repeatability
62. Algorithms
• Bandit problems; explore-exploit
• Optimizing amount of work by workers
– Humans have limited throughput
– Harder to scale than machines
• Selecting the right crowds
• Stopping rule
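An explore-exploit policy over workers can be as simple as epsilon-greedy; this sketch assumes per-worker (correct, attempts) statistics, which is my simplification of the bandit setting mentioned above:

```python
import random

def epsilon_greedy_pick(stats, eps=0.1):
    """With probability eps, explore a random worker; otherwise exploit
    the worker with the best observed accuracy so far.
    stats: {worker_id: (correct, attempts)}."""
    if random.random() < eps:
        return random.choice(list(stats))
    return max(stats, key=lambda w: stats[w][0] / max(stats[w][1], 1))

stats = {"w1": (9, 10), "w2": (5, 10), "w3": (0, 0)}
print(epsilon_greedy_pick(stats, eps=0.0))  # w1 (best observed accuracy)
```

The exploration term matters because human throughput is limited: over-assigning work to the current best worker both fatigues them and starves the estimates for everyone else.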
63. Humans in the loop
• Computation loops that mix humans and
machines
• A kind of active learning
• Double goal:
– Human checking on the machine
– Machine checking on humans
• Example: classifiers for social data
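A classifier-with-humans loop often routes low-confidence predictions to judges, whose answers can later retrain the model; a minimal sketch (the 0.8 threshold and tuple layout are assumptions for illustration):

```python
def route_predictions(predictions, threshold=0.8):
    """Split classifier outputs: confident ones are auto-accepted,
    the rest go to human judges for verification.
    predictions: list of (item_id, label, confidence)."""
    auto, to_humans = [], []
    for item, label, conf in predictions:
        (auto if conf >= threshold else to_humans).append((item, label))
    return auto, to_humans

preds = [("t1", "spam", 0.95), ("t2", "ham", 0.55), ("t3", "spam", 0.81)]
auto, to_humans = route_predictions(preds)
print(auto)       # [('t1', 'spam'), ('t3', 'spam')]
print(to_humans)  # [('t2', 'ham')]
```

This captures the double goal on the slide: humans check the machine on hard cases, while the machine's confidence scores decide which human effort is actually needed.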
64. Routing
• Expertise detection and routing
• Social load balancing
• When to switch between machines and
humans
• CrowdSTAR
B. Nushi, O. Alonso, M. Hentschel, V. Kandylas. “CrowdSTAR: A Social Task Routing
Framework for Online Communities”, 2014. http://arxiv.org/abs/1407.6714
65. Social Task Routing
[Diagram: tasks A and B are routed across crowds (Crowd 1: Twitter, Crowd 2: Quora) using crowd summaries C1 and C2, and then routed within each crowd]
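Routing across crowds can be sketched as picking the crowd with the highest expertise score for the task's topic; the scores below are invented for illustration, while in a real router they would come from observed answer quality per topic:

```python
def route_task(task_topic, crowd_expertise):
    """Pick the crowd with the best expertise score for this topic.
    crowd_expertise: {crowd_name: {topic: score}}; unknown topics score 0."""
    return max(crowd_expertise,
               key=lambda c: crowd_expertise[c].get(task_topic, 0.0))

crowds = {
    "Twitter": {"news": 0.9, "programming": 0.3},
    "Quora":   {"news": 0.5, "programming": 0.8},
}
print(route_task("programming", crowds))  # Quora
print(route_task("news", crowds))         # Twitter
```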
67. Conclusions
• Crowdsourcing at scale works but requires a solid
framework
• Fast turnaround, easy to experiment, few dollars to test
• But you have to design the experiments carefully
• Usability considerations
• Lots of opportunities to improve current platforms
• Three aspects that need attention: workers, work and
task design
• Labeling social data is hard
68. Conclusions – II
• Important to know your limitations and be
ready to collaborate
• Lots of different skills and expertise required
– Social/behavioral science
– Human factors
– Algorithms
– Economics
– Distributed systems
– Statistics