The document discusses the role of humans in an era of big data and machine learning. It outlines that humans are needed to tag data to help machines understand it, and that crowdsourcing is one way to obtain tagged data at scale. The presentation also covers how the human-in-the-loop paradigm involves humans actively training machine learning models through techniques like active learning.
BigData Republic teamed up with VodafoneZiggo to host a meetup on churn prediction.
Telecom companies like VodafoneZiggo have long benefited from the fine art/science of predicting churn. In the booming age of subscription-based business models (e.g. Netflix, Spotify, HelloFresh), the importance of predicting churn has become widespread. During this event, VodafoneZiggo shared some of its wisdom with the public, after which BDR Data Scientist Tom de Ruijter presented an overview of the modeling tools at hand, covering both classical and novel approaches. Finally, the participants engaged in a hands-on session showcasing the implementation of different approaches.
PART 1 — Churn Prediction in Practice by Florian Maas
At VodafoneZiggo we are incredibly excited about Advanced Analytics and its enormous potential for progress and innovation. In our state-of-the-art open source platform we store the tremendous amount of data that is generated every single second in our mobile and fixed networks. This means that we have a vast body of rich information which, if unlocked, can lead to something very special. As a company with a primarily subscription-based service model, churn plays a vital role in the daily business. Not only is the churn rate a good indicator of customer (dis)satisfaction, it is also one of the two factors that determine the steady-state level of active customers. During this talk, we will show how data science provides added value in the process of churn prevention at VodafoneZiggo. We will talk about the data and the modeling approach we use, and the pitfalls and shortcomings that we have encountered while building the model. We will also briefly discuss potential improvements to the current approach, which brings us to talk #2.
PART 2 — The Churn Prediction Toolbox by Tom de Ruijter
The second talk will show you the fine intricacies of predicting churn through different approaches. We’ll start off with an overview of different modeling strategies for describing the problem of churn, both as a classification problem and as a regression problem. Secondly, Tom will give you insight into how to evaluate a churn model so that business stakeholders know how to act upon the model results. Finally, we’ll work towards the hands-on session demonstrating different modeling approaches for churn prediction, ranging from classical time series prediction to recurrent neural networks.
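As a toy illustration of the classification framing mentioned above, the sketch below fits a hand-rolled logistic regression churn model. The features (normalized tenure and support-call count) and all data are invented for this example and are not VodafoneZiggo’s actual inputs:

```python
import math
import random

# Illustrative sketch only: churn framed as binary classification, fitted
# with logistic regression via batch gradient descent on synthetic data.
random.seed(0)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Synthetic customers: churn is more likely with many support calls
# and short tenure. Both features are scaled to [0, 1].
X, y = [], []
for _ in range(200):
    tenure = random.random()   # normalized customer tenure
    calls = random.random()    # normalized support-call count
    churned = 1 if 2.0 * calls - 2.0 * tenure + random.gauss(0, 0.3) > 0 else 0
    X.append((tenure, calls))
    y.append(churned)

# Batch gradient descent on the logistic loss.
w_tenure, w_calls, b = 0.0, 0.0, 0.0
lr, n = 0.5, len(X)
for _ in range(2000):
    g_t = g_c = g_b = 0.0
    for (tenure, calls), label in zip(X, y):
        err = sigmoid(w_tenure * tenure + w_calls * calls + b) - label
        g_t += err * tenure
        g_c += err * calls
        g_b += err
    w_tenure -= lr * g_t / n
    w_calls -= lr * g_c / n
    b -= lr * g_b / n

def churn_probability(tenure, calls):
    """Predicted churn probability for normalized inputs in [0, 1]."""
    return sigmoid(w_tenure * tenure + w_calls * calls + b)
```

In practice one would reach for a library implementation (e.g. scikit-learn) and far richer features; the point is only that a churn label plus customer features yields an ordinary classification problem.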
Ethics in Data Science and Machine Learning - HJ van Veen
Introduction and overview on ethics in data science and machine learning, variations and examples of algorithmic bias, and a call-to-action for self-regulation. Given by Thierry Silbermann as part of the Sao Paulo Machine Learning Meetup, theme: "Ethics".
https://www.linkedin.com/in/thierrysilbermann
https://twitter.com/silbermannt
https://github.com/thierry-silbermann
Large amounts of heterogeneous medical data have become available in various healthcare organizations (payers, providers, pharmaceuticals). Those data could be an enabling resource for deriving insights for improving care delivery and reducing waste. The enormity and complexity of these datasets present great challenges in analyses and subsequent applications to a practical clinical environment. More details are available here http://dmkd.cs.wayne.edu/TUTORIAL/Healthcare/
Big Data Analytics | What Is Big Data Analytics? | Big Data Analytics For Beg... - Simplilearn
This presentation about Big Data Analytics will help you understand why Big Data analytics is required, what Big Data analytics is, the lifecycle of Big Data analytics, types of Big Data analytics, tools used in Big Data analytics, and a few Big Data application domains. Also, we'll see a use case on how Spotify uses Big Data analytics. Big Data analytics is a process to extract meaningful insights from Big Data, such as hidden patterns, unknown correlations, market trends, and customer preferences. One essential benefit of Big Data analytics is its use in product development and innovation. Now, let us get started and understand Big Data analytics in detail.
The following topics are explained in this Big Data analytics tutorial:
1. Why Big Data analytics?
2. What is Big Data analytics?
3. Lifecycle of Big Data analytics
4. Types of Big Data analytics
5. Tools used in Big Data analytics
6. Big Data application domains
What is this Big Data Hadoop training course about?
The Big Data Hadoop and Spark developer course has been designed to impart in-depth knowledge of Big Data processing using Hadoop and Spark. The course is packed with real-life projects and case studies to be executed in the CloudLab.
What are the course objectives?
This course will enable you to:
1. Understand the different components of the Hadoop ecosystem such as Hadoop 2.7, Yarn, MapReduce, Pig, Hive, Impala, HBase, Sqoop, Flume, and Apache Spark
2. Understand Hadoop Distributed File System (HDFS) and YARN as well as their architecture, and learn how to work with them for storage and resource management
3. Understand MapReduce and its characteristics, and assimilate some advanced MapReduce concepts
4. Get an overview of Sqoop and Flume and describe how to ingest data using them
5. Create database and tables in Hive and Impala, understand HBase, and use Hive and Impala for partitioning
6. Understand different types of file formats, Avro Schema, using Avro with Hive and Sqoop, and schema evolution
7. Understand Flume, Flume architecture, sources, Flume sinks, channels, and Flume configurations
8. Understand HBase, its architecture, data storage, and working with HBase. You will also understand the difference between HBase and RDBMS
9. Gain a working knowledge of Pig and its components
10. Do functional programming in Spark
11. Understand resilient distributed datasets (RDDs) in detail
12. Implement and build Spark applications
13. Gain an in-depth understanding of parallel processing in Spark and Spark RDD optimization techniques
14. Understand the common use-cases of Spark and the various interactive algorithms
15. Learn Spark SQL, including creating, transforming, and querying DataFrames
Learn more at https://www.simplilearn.com/big-data-and-analytics/big-data-and-hadoop-training
Sales forecasting is a valuable tool for any growing business. This presentation explores sales forecasting for Walmart stores, along with a causal analysis of several factors affecting future sales, such as temperature and fuel price. The forecasting is done for the point of sales of the departments “Outdoor living” and “Stationery”. In the time-series analysis we built ARIMA models to forecast future sales, taking the weekly sales as input. The best model is chosen by comparing the results of the white noise test, the unit root test, and the forecasts.
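As a minimal stand-in for the ARIMA modelling described above, the sketch below fits an AR(1) model (the special case ARIMA(1,0,0)) to synthetic weekly sales by least squares. The data and coefficients are invented; real work would use a full ARIMA implementation such as the one in statsmodels:

```python
import random

# Synthetic weekly sales following y_t = 20 + 0.7 * y_{t-1} + noise.
random.seed(2)
series = [60.0]
for _ in range(300):
    series.append(20.0 + 0.7 * series[-1] + random.gauss(0, 3))

# Least-squares fit of y_t = c + phi * y_{t-1} on lagged values.
x = series[:-1]
y = series[1:]
n = len(x)
mx, my = sum(x) / n, sum(y) / n
phi = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
       / sum((xi - mx) ** 2 for xi in x))
c = my - phi * mx

def forecast(last, steps):
    """Iterate the fitted recursion to forecast `steps` weeks ahead."""
    out = []
    for _ in range(steps):
        last = c + phi * last
        out.append(last)
    return out
```

Multi-step forecasts from an AR(1) decay geometrically toward the long-run mean c / (1 - phi), which is why such models are mainly useful at short horizons.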
For the linear regression, the input variables are a combination of continuous and categorical variables. The continuous variables include temperature, fuel price, CPI (Consumer Price Index), and the unemployment rate, while the categorical variables include store size and IsHoliday.
On performing this analysis, we found that temperature and CPI have the highest impact on sales, with temperature having a positive impact and CPI a negative one.
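The regression setup above can be sketched as follows. All numbers, the coefficients, and the 0/1 encoding of IsHoliday are invented for illustration; only the technique (ordinary least squares with a continuous and a one-hot categorical regressor) mirrors the text:

```python
import random

random.seed(1)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

# Synthetic ground truth: sales = 50 - 0.8*temperature + 12*is_holiday + noise.
rows = []
for _ in range(300):
    t = random.uniform(-5, 35)
    h = random.choice([0, 1])          # IsHoliday, one-hot encoded as 0/1
    sales = 50.0 - 0.8 * t + 12.0 * h + random.gauss(0, 2)
    rows.append(([1.0, t, float(h)], sales))   # leading 1.0 = intercept

# Ordinary least squares via the normal equations: (X^T X) beta = X^T y.
k = 3
XtX = [[sum(x[i] * x[j] for x, _ in rows) for j in range(k)] for i in range(k)]
Xty = [sum(x[i] * y for x, y in rows) for i in range(k)]
beta = solve(XtX, Xty)   # [intercept, temperature coef, holiday coef]
```

The fitted `beta` recovers the planted coefficients, showing how the sign and size of each coefficient support statements like “temperature has a positive impact”.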
Marketing analytics a practical guide to improving consumer insights using da... - MarketingForum
Who is most likely to buy and what is the best way to target them? How can businesses improve strategy without identifying the key influencing factors? The second edition of Marketing Analytics enables marketers and business analysts to leverage predictive techniques to measure and improve marketing performance. By exploring real-world marketing challenges, it provides clear, jargon-free explanations on how to apply different analytical models for each purpose. From targeted list creation and data segmentation, to testing campaign effectiveness, pricing structures and forecasting demand, this book offers a welcome handbook on how statistics, consumer analytics and modelling can be put to optimal use.
The fully revised second edition of Marketing Analytics includes three new chapters on big data analytics, insights and panel regression, including how to collect, separate and analyze big data. All of the advanced tools and techniques for predictive analytics have been updated, translating models such as tobit analysis for customer lifetime value into everyday use. Whether an experienced practitioner or having no prior knowledge, methodologies are simplified to ensure the more complex aspects of data and analytics are fully accessible for any level of application. Complete with downloadable data sets and test bank resources, this book supplies a concrete foundation to optimize marketing analytics for day-to-day business advantage.
COMEX2017 Smart Talks by Amjid Ali, Muscat, Oman. Covering an introduction to big data, Big Data definitions, the Big Data revolution, the Big Data timeline, Hadoop and MapReduce, the importance of storage and DNA, Oceanstore 9000, Microsoft R, and Spark.
Big data is a term that describes the large volume of data, both structured and unstructured, that inundates a business on a day-to-day basis. But it’s not the amount of data that’s important; it’s what organizations do with the data that matters.
Presentation for Taxonomy Bootcamp 2015 by Naomi Oorbeck & Jessica DuVerneay. Covers how taxonomy improved digital products, when to use a lightweight approach, planning & scoping lightweight work, and an overview of key skills and approaches to taxonomy development.
Rio Cloud Computing Meetup 25/01/2017 - AWS re:Invent 2016 announcements - Filipe Barretto
Talk given at the Rio Cloud Computing Meetup, presenting the main announcements from AWS re:Invent 2016, made during the keynotes by Andy Jassy, CEO of AWS, and Werner Vogels, CTO of Amazon.
High Availability Architecture for Legacy Stuff - a 10.000 feet overview - Marco Amado
An overview of the tools and tricks you could use to turn a monolithic big pile of... Apache, PHP, and MariaDB into an awesome high-availability, load balanced, shiny new pile of... Apache, PHP, and MariaDB. Zero, or almost zero changes to the codebase.
This overview is not intended to be a business case for data science. It is expected that you are already familiar with the value proposition. However, a reference to several case study examples has been included at the end of this document as a reminder of the broad applicability of the subject at hand.
The intent of this document is to set in motion the discussion for the creation of a startup in South Africa that is focused on data science.
One key area of Oracle OpenWorld 2016 was data in various shapes: Big Data, streaming data, and traditional transactional data. The power of SQL to access and unleash all data, even data in NoSQL databases. The advent of the citizen data scientist. Streaming data analysis in real time on fast and vast data, and data discovery. And the new Oracle Database 12cR2 release. Forms, APEX, SQL and PL/SQL.
Harmonizing Multi-tenant HBase Clusters for Managing Workload Diversity - HBaseCon
Speakers: Dheeraj Kapur, Rajiv Chittajallu & Anish Mathew (Yahoo!)
In early 2013, Yahoo! introduced multi-tenancy to HBase to offer it as a platform service for all Hadoop users. A certain degree of customization per tenant (a user or a project) was achieved through RegionServer groups, namespaces, and customized configs for each tenant. This talk covers how to accommodate the diverse needs of individual tenants on the cluster, as well as operational tips and techniques that allow Yahoo! to automate the management of multi-tenant clusters at petabyte scale without errors.
Channel partners: Get ready for future trends in client solutions - Dell World
Client solutions technology is ever-changing and Dell’s Channel partners are challenged with aligning future trends with customer needs. Learn how to adapt your business model to emerging technology and drive positive impact for customers in a candid round table discussion and Q&A with key Dell executives and Channel partners. Panelists will share insights on mobility and security trends, thoughts on competitive landscape and best practices for sharing technology improvements with customers.
Natural Intelligence: the human factor in AI - Bill Liu
Presented at AI NEXTCon Seattle 1/17-20, 2018
http://aisea18.xnextcon.com
Join our free online AI group with 50,000+ tech engineers to learn and practice AI technology, including the latest AI news, tech articles and blogs, tech talks, tutorial videos, and hands-on workshops/codelabs on machine learning, deep learning, data science, and more.
Measuring Impact: Towards a data citation metric - Edward Baker
How the ViBRANT and eMonocot projects are building tools, including a modified implementation of Bourne and Fink's 'Scholar Factor', the Biodiversity Data Journal, and Scratchpad's user metrics and statistics modules.
Understanding big data and data analytics: big data - Seta Wicaksana
Big Data helps companies to generate valuable insights. Companies use Big Data to refine their marketing campaigns and techniques. Companies use it in machine learning projects to train machines, predictive modeling, and other advanced analytics applications.
The Hive Think Tank: Machine Learning at Pinterest by Jure Leskovec - The Hive
Machine learning is at the core of Pinterest. Pinterest personalizes and ranks 1B+ pins, 700+ million boards for 100M+ users all over the world, using data gathered from collaborative filtering, user curation, web crawling, and more. At Pinterest we model relationships between pins, handle cold-start problems and deal with real-time recommendations.
In this presentation Jure gave an overview of the problems and effective solutions developed at Pinterest. He focused on systems and effective engineering choices made to enable productive machine learning development and to let multiple engineers effectively develop, test, and deploy machine-learned models.
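As a textbook illustration of the collaborative filtering mentioned above (not Pinterest’s actual system), the sketch below recommends pins by item-item cosine similarity over a made-up user/pin save matrix:

```python
import math

# Toy user -> saved-pins data; all names are invented for illustration.
saves = {
    "u1": {"p1", "p2", "p3"},
    "u2": {"p1", "p2"},
    "u3": {"p2", "p3", "p4"},
    "u4": {"p3", "p4"},
}

def cosine(a, b):
    """Cosine similarity between two pins over their co-save patterns."""
    users_a = {u for u, pins in saves.items() if a in pins}
    users_b = {u for u, pins in saves.items() if b in pins}
    if not users_a or not users_b:
        return 0.0
    return len(users_a & users_b) / math.sqrt(len(users_a) * len(users_b))

def recommend(user, top_n=2):
    """Rank unseen pins by summed similarity to the user's saved pins."""
    seen = saves[user]
    candidates = {p for pins in saves.values() for p in pins} - seen
    scored = {p: sum(cosine(p, s) for s in seen) for p in candidates}
    return sorted(scored, key=scored.get, reverse=True)[:top_n]
```

At Pinterest’s scale this naive pairwise computation would of course be replaced by distributed nearest-neighbour machinery, but the ranking idea is the same.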
Understanding big data and data analytics: Business Intelligence - Seta Wicaksana
Faster and more accurate reporting, analysis and planning; better business decisions; improved employee satisfaction; and improved data quality top the list. The benefits achieved least frequently include reducing costs and increasing revenues.
How can we mine, analyse and visualise the Social Web?
In this lecture, you will learn about mining social web data for analysis, preparing your data, and gathering basic statistics on it.
My presentation given at the Association of Subscription Agents annual conference, Feb 2013.
It was titled Understanding how researchers and practitioners use STM information, but the specific theme was how to design information products and services for researchers and practitioners against a background of information abundance (aka information overload).
We provide real-time big data training in Chennai by industry experts with real-time scenarios.
Our advanced topics take students to a high level of knowledge in Big Data technology.
For more info, reach our Big Data technical team at +91 96677211551/56
Training is delivered by an experienced team of Big Data training experts.
www.thecreatingexperts.com
SAP BEST INSTITUTES IN CHENNAI
http://www.youtube.com/watch?v=UpWthI0P-7g
Lecture 5: Mining, Analysis and Visualisation - Marieke van Erp
This is the fourth lecture in the Social Web course at the VU University Amsterdam
Visit the website for more information: Social Web 2012
Introduction to Enterprise Search. A two-hour class to introduce Enterprise Search. It covers:
The problems enterprise search can solve
History of (web) search
How do we search and find?
Current state of Enterprise Search + stats
Technical concept
Information quality
Feedback cycle
Five dimensions of Findability
Centric - Jaap house prices, GTST, The Bold, IKEA and IENS. Just a few applicati... - BigDataExpo
During this presentation it becomes clear how you can apply machine learning in everyday life. Think of buying a house, watching Goede Tijden Slechte Tijden, shopping at IKEA, and visiting restaurants.
In this session we'll dive into the journey that Google chose to take in order to focus on AI: what was the mindset, what were the challenges, and what is the direction for the future.
Pacmed - Machine Learning in health care: opportunities and challenges in pra... - BigDataExpo
The potential of personalized medicine based on machine learning is huge, but big challenges must be overcome to implement this technology in practice. Hidde will discuss both sides of the story, including a case study on the intensive care unit.
The Toekomst Verkenner is an award-winning innovation by PGGM, which is rapidly being developed further into a platform.
In his presentation, Mladen Sančanin will explain how PGGM used real-time data and algorithms to build this platform, and how PGGM supports innovations from its ‘Big Data Lab’.
In half an hour, many experiences will be shared about setting up innovation projects that use data, and about establishing a data lab in a corporate environment.
Universiteit Utrecht & gghdc - What are the health effects of environment and... - BigDataExpo
The GGHDC investigates the health effects of environment and lifestyle in relation to people’s daily lives. The research centre is built around a shared data infrastructure of Utrecht University and the University Medical Center Utrecht (UMCU).
Rob van Kranenburg - Can we imagine a social credit system such as in the... - BigDataExpo
IoT, Big Data and AI create a new situation with regard to decision-making by policymakers. Yet little is shifting in our democratic system, while our data are in the hands of GAFA, China, and other new forms of governance that are still emerging in the digital transition. We, in Europe, are standing still.
OrangeNXT - High accuracy mapping from videos for efficient fiber optic cable... - BigDataExpo
Construction companies such as BAM Infra Telecom rely on accurate, up-to-date maps. Google Maps isn’t enough, but doing on-site surveys is expensive and time-consuming. However, driving through and recording 360° video from a car is cheap and easy. Using machine learning, we turn videos into highly accurate maps.
Dynniq & GoDataDriven - Shaping the future of traffic with IoT and AI - BigDataExpo
Dynniq is a high-tech, innovative company offering smart mobility solutions and services internationally. We will present advanced IoT use cases Dynniq is working on, and share how GoDataDriven helps set up an AI capability. We will share our learnings, and show what makes data science in the mobility domain unique.
Teleperformance - Smart personalized service through the use of Data Science - BigDataExpo
At Teleperformance we help clients add value to the customer journey. We use Data Science for our omnichannel customer interactions to predict customer needs, so that we can give the best possible answer.
FunXtion - Interactive Digital Fitness with Data Analytics - BigDataExpo
Digital is the new Personal. FunXtion Interactive is an interactive training experience for both inside and outside the gym. FunXtion is revolutionary in the fitness industry and fully data driven, by design. FunXtion shows how it uses real-time data to support decisions, automate processes, and drive personalization and product innovation.
fashionTrade - We used to call that Big Data - BigDataExpo
Big Data used to be the umbrella term for everything you were not yet doing, but that Google or Amazon had already invented. By now we do all of those things, so product recommendations are simply called product recommendations again, fraud detection is fraud detection again, and speech recognition is still speech recognition; no Big Data. That’s fine, because now there is AI. This keynote explains whether that is any different, and why.
BigData Republic - Industrializing data science: a view from the trenches - BigDataExpo
What does it take to bring machine learning algorithms to production and start delivering business value? How can teams of data scientists and engineers effectively collaborate on a single product, integrate with existing IT systems and keep business stakeholders involved? Using real-life examples, we discuss the challenges and best practices.
Bicos - Hear how a top sportswear company produced cutting-edge data infrastr... - BigDataExpo
Industry expert Dave Vanhoudt will set out his vision for the future of data infrastructure. Dave will highlight the key role automation must play in any data infrastructure strategy today, drawing on his current role with Medtronic, and past experiences at AB Inbev, Baxter, BMW and Nike.
Endrse - Next-level online collaborations between personalities and brands with ... - BigDataExpo
Digitally, almost everything is measurable. But analyzing the impact of collaborations between influencers (top athletes) and companies is often a challenge. Start-up Endrse uses AI to analyze social media content, so that content from influencers and companies fits together better. That is how you make an impact on the audience!
Bovag - Refine-IT - Process optimization in the automotive sector - BigDataExpo
Developments in the automotive sector are moving fast: electric cars, the rapid growth of private lease, over-the-air connectivity, services on demand, and advanced driver assistance are just a sample of these developments. They are examples of (big) data developments that have a major influence on automotive retail. The transition to a new revenue model challenges the sector to collaborate and to optimize processes in a data-driven way.
Wilco Schellevis, director of Refine-IT, and Renate Weggemans, manager of strategy and policy at BOVAG Autodealers, take you through the Dely-App case: a fine piece of collaboration and data-driven process optimization in automotive retail, captured in a single app.
Schiphol - Optimale doorstroom van passagiers op Schiphol dankzij slimme data...BigDataExpo
Schiphol is Europa’s best connected airport en verwerkt op piekdagen tot 235.000 passagiers. Om deze soepel door de processen te leiden is een betrouwbare prognose van de drukte noodzakelijk. Schiphol laat zien hoe zij datatoepassingen ontwikkelt om het aantal reizigers zo accuraat mogelijk te voorspellen en hiermee processen in te richten.
Veco - Big Data in de Supply Chain: Hoe Process Mining kan helpen kosten te r...BigDataExpo
Veco is marktleider op het gebied van het ontwerpen en vervaardigen van precisie delen middels electroformeren. In deze presentatie zal uitgeleverd worden hoe Veco succesvol Process Mining heeft ingezet in de productie om doorlooptijd te reduceren en new business te creëren. Tevens wordt uitgelegd wat Process Mining is.
Rabobank - There is something about DataBigDataExpo
Technologische mogelijkheden en GDPR, een continue clash? En hoe staat het met de het ethisch (her)gebruik van data? Leer in deze sessie van Rabobank’s Big Data journey en krijg inzicht in: organisatorische keuzes, data Lab technologie visie & data strategie, als enabler en accelerator van digitale innovatie en transformatie.
VU Amsterdam - Big data en datagedreven waardecreatie: valt er nog iets te ki...BigDataExpo
In zijn presentatie gaat Frans Feldberg in op het ‘Waarom, Wat, en Hoe’ van big data en datagedreven business model innovation. Hoe is de wereld, als het om data gaat, de laatste jaren veranderd? Waarom zijn big data, business analytics en kunstmatige intelligentie belangrijke digitale innovaties die hoog op menig managementagenda staat en waarom investeren organisaties aanzienlijk in big data en data science? Hoe kunnen organisaties waarde met data creëren door zowel het verbeteren van het bestaande business model als door nieuwe data-gedreven business modellen te ontwikkelen. Dit zijn vragen die in zijn presentatie beantwoord zullen worden.
Booking.com - Data science and experimentation at Booking.com: a data-driven ...BigDataExpo
At Booking.com we have experienced what a data driven organisation means for creating business impact. And what looks it like, when experimentation is part of your company culture.
During this session we will share our experiences and learnings on how data science and experimentation go hand in go.
Opendatabay - Open Data Marketplace.pptxOpendatabay
Opendatabay.com unlocks the power of data for everyone. Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
First ever open hub for data enthusiasts to collaborate and innovate. A platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. Leverage cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay AI-driven features streamline the data workflow. Finding the data you need shouldn't be a complex. Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay breaks new ground with a dedicated, AI-generated, synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay. Marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. for more details visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
2. About Me
• Former member of the Search team at @WalmartLabs
• Former Head of the Metrics & Measurements team
• I also led the Human Evaluation team
• About the Metrics and Measurements team
• A team of engineers, analysts and scientists in charge of providing accurate and exhaustive measurements
• We also had an auditing role towards adjacent teams
• What do we measure?
• Engineering metrics related to model and data quality
• Business metrics (revenue, etc.)
• More exotic customer-centric metrics (customer value, customer satisfaction, model impact, etc.)
• Currently Head of Data Science at Atlassian
• In charge of the Search & Smarts team
6. Outline
q Humans & Big Data
• The role of human beings in the era of Big Data
• Why do we need to tag data?
• How to get tagged data?
q The Era of Crowdsourcing
• What is Crowdsourcing?
• Use cases and details about Crowdsourcing
• Traditional crowds vs. curated crowds
q The Human-in-the-Loop Paradigm
• Definition and details about Human-in-the-Loop ML
• Introduction to Active Learning
9. Humans & Big Data: The Role of Human Beings in the Era of Machine Learning
10. The Era of Very Big Data
q VOLUME
• More data was created from 2013 to 2015 than in the entire previous history of the human race
• By 2020, accumulated data will reach 44 trillion gigabytes
q VELOCITY
• By 2020, ~1.7 MB of new data / second / human being
• 1.2 trillion search queries on Google per year
q VARIETY
• 31 million messages and 2.8 million videos per minute on Facebook
• Up to 300 hours of video / minute are uploaded to YouTube
• In 2015, 1 trillion photos were taken; billions shared online
(image: a data center at Google)
14. Supervised vs. Unsupervised Machine Learning
Supervised ML: requires tagged data
• Classification: problem where the output variable is a category
(examples: SVM, random forest, Bayesian classifiers)
• Regression: problem where the output variable is a real value
(examples: linear regression, random forest)
• Applications: image recognition, speech recognition
Unsupervised ML: doesn’t require tagged data
• Clustering: discovery of inherent groupings in the data
(examples: k-means, k-nearest neighbors)
• Association rules: discovery of rules describing the data
(example: Apriori algorithm)
• Applications: feature learning, autoencoders
The Case of Deep Learning: both supervised and unsupervised applications
NB: Deep Learning algorithms are data-greedy…
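The split above can be sketched in a few lines of Python. This is a deliberately minimal illustration, not any production algorithm: the 1-D points, labels, and cluster count are made up, the supervised side is a 1-nearest-neighbor classifier (it needs tagged pairs), and the unsupervised side is a naive 1-D k-means (it discovers groups from raw points alone).

```python
def nearest_neighbor_classify(train, query):
    """Supervised: needs (point, label) pairs, i.e. tagged data."""
    point, label = min(train, key=lambda pl: abs(pl[0] - query))
    return label

def kmeans_1d(points, k=2, iters=10):
    """Unsupervised: discovers groupings without any labels."""
    centroids = sorted(points)[:k]  # naive initialization for the sketch
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda j: abs(p - centroids[j]))
            clusters[i].append(p)
        # recompute each centroid; keep the old one if its cluster is empty
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids

tagged = [(1.0, "cheap"), (1.2, "cheap"), (9.8, "premium"), (10.1, "premium")]
print(nearest_neighbor_classify(tagged, 9.0))  # supervised prediction: "premium"
print(kmeans_1d([1.0, 1.2, 9.8, 10.1]))        # unsupervised centroids
```

Note that the classifier is useless without the tags, while k-means never sees them: that asymmetry is exactly why supervised ML creates the demand for tagged data discussed next.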
17. Tagged Data
• Gathering quality tagged training data is a common bottleneck in ML
• Expensive
• Quality control is hard and requires a second human pass
• Hardly scalable → heavy use of sampling strategies
• How do companies doing Machine Learning get tagged data?
• Implicit tagging: customer engagement
• Explicit tagging: manual labor
• A few strategies to get tagged data for cheap/free:
• Games (Google Quick, Draw!: https://quickdraw.withgoogle.com/)
• Incentivization (extra lives or bonuses in games)
20. The Wisdom from the Crowd
Why human input matters: the use case of image colorization
• A colorization model is trained on a tagged training data set produced by image recognition (watermelon, grapes, bananas, pineapple, orange)
• “Bananas are generally …” is ‘general’ knowledge:
• obvious for human beings
• tedious for machines
→ Colorization is straightforward for humans because they can ‘tap’ into their general knowledge
24. Crowdsourcing
What is Crowdsourcing?
the process of getting labor or funding, usually online, from a crowd of people
Ø Crowdsourcing = 'crowd' + 'outsourcing'
Ø The act of taking a function once performed by employees and outsourcing it to an undefined (generally large) network of people in the form of an open call
History of Crowdsourcing
• The term was first used in 2005 by the editors at Wired
• Official definition published in the Wired article “The Rise of Crowdsourcing”, June 2006
• Describes how businesses were using the Internet to “outsource work to the crowd”
What Crowdsourcing helps with:
• Scale → peer-production (for jobs to be performed collaboratively)
• Reach → connect with a large network of potential laborers (if tasks are undertaken by single individuals)
28. The Nature of Crowdsourcing
Microtasks
• Data generation: user-generated content such as reviews, pictures, translations, etc.
• Data validation: validation of translations, etc.
• Data tagging: image tagging, product categorization, etc.
• Data curation: curation of news feeds, etc.
Macrotasks
• Solution development: algorithm improvement, etc.
• Crowd contests: design competitions, algorithmic competitions, etc.
Funding
32. Some Cool Crowdsourcing Applications
Mapping
• Photo Sphere
• Google Maps crowdsources info for wheelchair-accessible places
Traffic
• Google Traffic
• Waze: traffic reporting app
Translation
• Google Translate
Epidemiology
• Flu tracking applications
36. Companies Based on Crowdsourcing
• Quora is a question-and-answer site where questions are asked, answered, edited and organized by its community of users.
• Waze is a community-based traffic and navigation app where drivers share real-time traffic and road info.
• Kaggle is a platform for predictive modelling competitions in which companies post data and data miners compete to produce the best models.
• Stack Overflow is a platform for users to ask and answer questions, and to vote questions and answers up or down and edit them.
• Flickr is an image and video hosting website that is widely used by bloggers to host images that they embed in social media.
38. The Challenges of Crowdsourcing
Reliability
• Retail: absence of emotional involvement (judges are not actually spending money on items)
• Waze: locals were sending fake information to limit traffic in their area
Relevance of knowledge
• Retail: judges might not have appropriate knowledge of the items they are evaluating
Subjectivity
• Search: relevance scores vary depending on profile and personal preferences
Speed & cost
• Human evaluations take time and can only be performed sporadically and on samples
• Not practical for measurement purposes
42. Crowdsourcing vs. Curated Crowds
Traditional Crowdsourcing Model
+ Speed: many hands make light work
+ Lower cost: typically a few pennies per task
- No quality control
- Lack of control: little to no incentive to deliver on time
- High maintenance: clear instructions needed, plus automated understanding checks
- Lower reliability: high overlap required
- Lack of confidentiality: anyone can see your tasks
Curated Crowd
+ Quality control: judges are held to quality metrics and removed if they don’t deliver the required quality
+ Better quality: very little overlap needed
+ Expertise: judges become experts at the required task
+ Constraints on the crowd: judges are less likely to drop out
- More expensive: typically the primary source of income for the judges
- Consistency required: frequent tasks are needed to keep skills sharp
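The “overlap” trade-off above is worth making concrete: in the traditional model, lower per-judge reliability is compensated by having several judges answer the same task and aggregating, for example by majority vote. The sketch below (with made-up judgments) also reports the agreement rate, which a pipeline could use to decide when a second human pass is needed.

```python
from collections import Counter

def majority_vote(judgments):
    """Aggregate overlapping judgments for one task; return (label, agreement)."""
    counts = Counter(judgments)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(judgments)

# Three-judge overlap on a hypothetical product-tagging task.
label, agreement = majority_vote(["shoe", "shoe", "boot"])
print(label, round(agreement, 2))  # "shoe" wins with 2/3 agreement
```

With a curated crowd of trusted experts, the same pipeline can run with overlap close to 1, which is where the cost savings on the curated side come from.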
43. Crowdsourcing Applications in e-Commerce
Catalog Curation
• Product Description Curation
• Product Tagging & Categorization
• Product Deduplication
• Taxonomy Testing
Search Relevance Evaluation
• Relevance scores (query-item pair scores)
• Engine comparison (ranking-to-ranking)
Review Moderation
• Removal/flagging of obscene reviews
Mystery Shopping
• Analysis and discovery of new trends
• Evaluation of new products
• Competitive analysis
(figure: the example of Product Tagging)
48. Use Case: Evaluation of Search Engine Relevance
Side-by-Side Engine Comparison (Ranking A vs. Ranking B)
• Judge 1: prefers ranking A
• Judge 2: prefers ranking A
• Judge 3: prefers ranking B
→ Human evaluation makes it possible to measure the intangible with little risk
49. Use Case: Evaluation of Search Engine Relevance
Query-Item Relevance Scoring for Measurement of Ranking Quality
(figure: query-item pairs with graded relevance scores such as 5/5, 4/5, 3/5, 2/5)
Discounted cumulative gain:
$DCG_p = \sum_{i=1}^{p} \frac{rel_i}{\log_2(i+1)}$
$IDCG_p = \sum_{i=1}^{|REL|} \frac{2^{rel_i} - 1}{\log_2(i+1)}$
$nDCG_p = \frac{DCG_p}{IDCG_p}$
where $rel_i$ is the graded relevance of the item at position i
51. The Dream of Automation
The 4 Industrial Revolutions
• FIRST REVOLUTION (1784): mechanical production, railroads, steam power
• SECOND REVOLUTION (1870): mass production, electrical power, assembly lines
• THIRD REVOLUTION (1969): automated production, electronics, computers
• FOURTH REVOLUTION (ongoing): artificial intelligence, big data
→ Automation is not a new idea
Automation: the use of various control systems for operating equipment such as machinery and processes with minimal or reduced human intervention.
Why?
• Automate boring/repetitive tasks
• Perform tasks at scale
• Perform tasks with enhanced precision
• Deliver consistent products
• Use machines where they outperform humans
55. When Full Automation can’t be Achieved… Human-in-the-Loop
Human-in-the-loop (HITL) is defined as a model or a system that requires human interaction.
The idea of using human beings to enhance the machine is not new: we have been doing Human-in-the-Loop all along.
• Example: autopilot technology for planes
Human intervention/presence is useful:
• To handle corner cases (outlier management)
• To “keep an eye” on the system (sanity check)
• To correct unwanted behavior (refinement)
• To validate appropriate behavior (validation)
58. Human-in-the-Loop Paradigm
Pareto Principle (aka the 80/20 rule, the law of the vital few, or the principle of factor sparsity): states that, for many events, roughly 80% of the effects come from 20% of the causes.
ML version of the Pareto Principle:
• Evidence suggests that some of the most accurate ML systems to date need:
• 80% computer/AI-driven work
• 19% human input
• 1% unknown randomness to balance things out
• The combination of machine and human intervention achieves maximum machine accuracy
How can human knowledge be incorporated into ML models?
A. Helping label the original dataset that will be fed into a ML model
B. Helping correct inaccurate predictions that arise as the system goes live
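Strategy B is where active learning (introduced in the outline) comes in: rather than sending every prediction to a human, the model routes only the items it is least sure about. The sketch below shows least-confidence sampling; the item IDs and class-probability vectors come from a hypothetical classifier and are made up for illustration.

```python
def least_confident(predictions, budget=2):
    """Pick the items whose top predicted class probability is lowest."""
    ranked = sorted(predictions, key=lambda item: max(item[1]))
    return [item_id for item_id, _ in ranked[:budget]]

# (item_id, class-probability vector) pairs from a hypothetical classifier
preds = [
    ("img_1", [0.98, 0.02]),  # confident -> keep automated
    ("img_2", [0.51, 0.49]),  # uncertain -> route to a human judge
    ("img_3", [0.60, 0.40]),  # uncertain -> route to a human judge
    ("img_4", [0.90, 0.10]),
]
print(least_confident(preds))  # -> ['img_2', 'img_3']
```

Spending the human-labeling budget on exactly these items is what lets the ~19% of human input buy the most accuracy per label.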
63. Human-In-The-Loop Use Case #1
An example of the HITL approach: face recognition
(figure: tagged faces, e.g. Mary, Roberto, Victoria, Laura, Sebastian, Cecelia)
Accuracy
• Facebook's DeepFace software reaches 97.25% accuracy
HITL as a feedback loop
• When the confidence is below a certain threshold, the system:
• suggests a label
• asks the uploader to validate/approve or correct the suggestion
• The new data is used to improve the accuracy of the algorithm
66. Human-In-The-Loop Use Case #2
An example of HITL approach: autonomous vehicles
Teaching the machine
• Driving systems were trained using a human to oversee the process
Accuracy considerations
• The Autopilot system is now over 99% accurate
• However, 99% accuracy means that people can die 1% of the time (!!)
• Though we have seen huge advances in the accuracy of pure machine-driven systems, they tend to fall short of acceptable accuracy rates
Corner cases
• Fun fact: Volvo's self-driving cars fail in Australia because of kangaroos ("Volvo's driverless cars 'confused' by kangaroos")
• Reaching 100% is hard because of corner cases
• A HITL approach helps get the accuracy to ~100%
69. The Success of Human-In-The-Loop
The Example of Chess
70. The Human vs. the Machine
• In 1997, Chess Master Garry Kasparov is beaten by IBM supercomputer Deep Blue
[photo: Garry Kasparov]
Freestyle or "Advanced" Chess
• Advanced: A human chess master works with a computer to find the best possible move
• Freestyle: A team can be made of any combination of human beings + computers
• In 2005, Steven Cramton, Zackary Stephen and their 3 computers win the Freestyle Chess Tournament
Why it works
• Computers are great at reading tough tactical situations
• But humans are better at understanding long-term strategy
• Humans use computers to limit "blunders" while using their intuition to force the opponent into board states that confuse the computer(s)
74. Active Learning
a special case of semi-supervised ML in which a learning algorithm can interactively query the user (oracle) to obtain the desired outputs at new data points, maximizing validity and relevance
75. Active Learning
General Strategy
If D is the entire data set, at each iteration i, D is broken up into three subsets:
1. D_K,i: data points where the label is known
2. D_U,i: data points where the label is unknown
3. D_Q,i: data points for which the label is queried (sometimes, even when the label is known)
Benefits
• Query labels only when necessary (lower cost)
Next Generation Algorithms
• Proactive learning:
• relaxes the assumption that the oracle is always right
• casts the problem as an optimization problem w/ a budget constraint
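The general strategy above (splitting D into known, unknown, and queried subsets each iteration) can be sketched as a single round. This is an illustrative toy, not a library API: the random query rule and the parity oracle are assumptions standing in for a real query strategy and a human annotator.

```python
# One round of the active learning loop: pick D_Q from the unlabeled
# pool, ask the oracle for labels, and move the points into D_K.
import random

def active_learning_round(d_known, d_unknown, oracle, query_size=2):
    # D_Q: points whose label will be queried. Here chosen at random;
    # a real system would use a strategy such as uncertainty sampling.
    d_query = random.sample(d_unknown, min(query_size, len(d_unknown)))
    for x in d_query:
        y = oracle(x)           # the human oracle provides the label
        d_known.append((x, y))  # point moves into D_K ...
        d_unknown.remove(x)     # ... and out of D_U
    return d_known, d_unknown

# Usage with a toy oracle that labels integers by parity:
known, unknown = [], [1, 2, 3, 4]
known, unknown = active_learning_round(known, unknown, oracle=lambda x: x % 2)
```

Repeating this round until the labeling budget is spent is what delivers the "query labels only when necessary" benefit listed above.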
79. Active Learning: How does it Work?
Machine Learning needs
• Logic (algorithm)
• Data
• Optimization
• Feedback ← Human-in-the-Loop
Active Learning = a Machine Learning algorithm using an "oracle" to reduce mistakes/uncertainty
Query Strategy: labels are queried for:
• Data points for which model uncertainty is high (uncertainty sampling)
• Data points for which the different models of an ensemble method disagree the most (query by committee)
• Data points causing the most changes on the model (expected model change)
• Data points causing overall variance to be high (variance reduction)
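The first two query strategies in the list can be sketched directly. These are simplified illustrations under assumed inputs (per-point class probabilities and per-point ensemble votes); the function names are not from any library, and real implementations typically use entropy or margin scores rather than these minimal versions.

```python
# Two query strategies from the slide, in miniature.

def uncertainty_sampling(probs_per_point):
    """Pick the point whose top class probability is lowest,
    i.e. the point the model is least confident about."""
    return min(range(len(probs_per_point)),
               key=lambda i: max(probs_per_point[i]))

def query_by_committee(votes_per_point):
    """Pick the point on which the ensemble's members disagree most,
    measured here simply as the number of distinct votes."""
    return max(range(len(votes_per_point)),
               key=lambda i: len(set(votes_per_point[i])))

# Point 1 is least confident (top prob 0.55 vs 0.9);
# point 0 draws the most committee disagreement (3 distinct votes vs 1).
probs = [[0.9, 0.1], [0.55, 0.45]]
votes = [["cat", "dog", "bird"], ["cat", "cat", "cat"]]
```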
80. Active Learning: How does it Work?
[diagram: the active learning loop. The Active Learning Algorithm selects/removes a single example from the Unlabeled Data; the Oracle (Human) provides the correct label; the labeled example is added to the Labeled Data, which updates the Classifier]
82. Active Learning: How does it Work?
[flowchart: Machine Learning Classifier → is the confidence level high? YES → Output; NO → Annotation by Human Oracle, feeding back into the Human-in-the-Loop Active Learning cycle]
By adding a human feedback loop, we allow the system to:
• actively learn
• correct itself where it got it wrong
• improve the algorithm over iterations
84. 3 Use Cases using Active Learning in the context of Search/Retail
Active Learning at Walmart e-Commerce
85. Active Learning at Walmart e-Commerce
❑ Machine Learning Lifecycle Management (Programming by Feedback)
• Automatic monitoring of input and output values for the ML algorithm
• An algorithm detects failures and outliers in real time and suggests an action
• A human validates the action, creating tagged data for full automation
❑ Diagnosis of Catalog Data Issues (Reinforcement Learning)
• Algorithm uncovers demoted items and suggests the most likely reason for the demotion
• Engineer manually confirms/corrects the suggestion, generating training data for full automation
❑ Refinement of Query Tagging Algorithm (Optimization)
• Human evaluation team manually measures the accuracy of the query tagging model
• Mistagged queries are used to discover patterns specific to problematic queries, which are reported to engineers
• Sample is enriched with problematic queries (evaluation team can diagnose problems with algorithms)
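The first use case (monitor model outputs, flag outliers, suggest an action, let a human validate it) can be sketched as follows. This is not Walmart's system: the z-score rule, the cutoff, the "discard" suggestion, and all names are illustrative assumptions.

```python
# Sketch of programming-by-feedback lifecycle monitoring: flag outlier
# output values, suggest an action, and record the human's decision as
# tagged data toward full automation.
import statistics

def detect_outliers(values, z_cutoff=1.5):
    """Flag values more than z_cutoff sample standard deviations from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    return [v for v in values if abs(v - mean) > z_cutoff * stdev]

def review_outliers(values, validate):
    tagged = []
    for v in detect_outliers(values):
        suggestion = "discard"            # suggested action for the outlier
        action = validate(v, suggestion)  # human confirms or corrects it
        tagged.append((v, action))        # tagged data for later automation
    return tagged

# Usage: a human (here a lambda that simply approves) reviews flagged values.
flagged = review_outliers([10, 11, 9, 10, 100], validate=lambda v, s: s)
```

Once enough (value, action) pairs accumulate, a model can be trained to take the action automatically, which is the "full automation" step on the slide.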
[example: the query "red t-shirt Size M" tagged as color ("red"), product type ("t-shirt"), and size ("M")]
88. Conclusion and Takeaways
• Why do humans and machines complement each other?
• Human beings are memory-constrained
• Computers are knowledge-constrained
• Tagged data is more important than ever
• But getting quality data is challenging given the volume of data
• Crowdsourcing offers more flexibility to tag data at scale
• Human-in-the-Loop paradigm
• Improves the accuracy of machine learning algorithms (classifiers)
• Many examples of successful endeavors using "Augmented Intelligence"
• Active Learning is a booming area of ML/AI