SlideShare a Scribd company logo
1 of 21
Conal Sathi
Data Alchemist at Slice Technologies
@aDataAlchemist
a human touch in machine learning
1
computer systems that can learn from training data and make
predictions for new (unlabeled) data
machine learning
2
who do you trust?
3
who do you trust?
4
can they work together?
founded in 2010, Slice has amassed a 4.4-million person panel of
US online shoppers to date
5
Slice: who we are
• data set: e-receipts in e-mail
• over 1 billion consumer purchases
processed
• $100 billion in purchases
Slice: your smart shopping assistant
6
1. track shipments automatically
2. store shopping receipts
3. save money with price drop alerts
4. stay safe with recall alerts
Unroll.me: clean up your inbox
7
1. unsubscribe with one click
2. combine favorite subscriptions into one
email
3. read what you want, when you want
8
a treasure trove of data sits in e-receipts
field extraction data sciencereceipt discovery
the secret sauce to unique purchase insights
men’s clothing
women’s clothing
item-level consumer purchase data
9
package out for delivery
Nike Kobe X, 9.5, wide,
blue lagoon, $144.97
across all merchants, channels, and time
E-COMMERCE
TRAVEL
Los Angeles, CA
check in: Fri, April 15, 2016
check out: Sun, April 17, 2016
PAYMENTS
payment confirmation
large non-fat mocha to La Stacione Cafe
OFFLINE RETAIL
TICKETS
reservation for Varekai
Sat, March 12, 2016 @7:15pm
purchased Wed, March 9, 2016
cashmere blend scarf, $98.00
Panasonic35-100mmf/2.8X OIS
machine learning for product semantics
10
SamsungGalaxyS4SPH -16GB-Purple
Panasonic35-100mmf/ 2.8X OIS
machine learning for product semantics
11
SamsungGalaxyS4SPH -16GB-Purple
electronics & accessories
cameras & photo
camera lenses
brand: Panasonic
focal length: 35 - 100mm
aperture: 2.8
electronics & accessories
telephones & mobiles
mobile phones
brand: Samsung
sub-brand: Galaxy
model: S4
memory: 16GB
color: purple
traditional approach to categorization
12
engine
training set
learning
algorithm
unlabeled
data
categories
need to continuously evolve
long tail of edge cases
• continuous improvement difficult
issues
13
use machines and humans together to create a continuously
improving system
to do this, we need to scale humans to big data!
so what do we do?
14
humans can directly improve the engine by:
• adding training data
• creating rules
why rules?
• can quickly bring up the accuracy (and in a deterministic fashion)
• helpful when training data is sparse/not representative for a particular
category
• effective and easy way to handle corner cases
what do humans do?
15
types of humans involved in our system:
• machine learning engineers
• analysts (domain experience)
add more human labor with:
• crowdsourced workers (very elastic supply, micro-tasks)
• outsourced workers (less elastic, more involved tasks)
types of workers
16
get data/opinions/services from a large online community
Amazon mechanical turk
• initially invented in Amazon for internal data cleaning
employs over 500,000 people worldwide
can create tasks on demand
crowdsourcing
17
18
the downside of crowdsourcing is that you have to give micro-tasks (not very
involved tasks) and you do not work closely with the workers
this is why we use outsourcing as well
• can give much more involved tasks (allow them to filter/search to find
misclassifications)
• we can work closely with them to teach them the taxonomy and how to
use our tools effectively
• they have much more practice on these tasks, so they can be very
productive and eventually become domain experts
outsourcing
19
with our system, we have created:
• continuous monitoring of the quality of our engine
• mechanism to quickly improve quality when issue is detected
• created large training data set (16 million items)
• extensive rules (12K)
in conclusion….
20
Slice Intelligence confidential. Do not copy or distribute without prior written consent.
questions?
visit sliceintelligence.com or email conal@slice.com
or follow us on Twitter: @aDataAlchemist | @sliceintel

More Related Content

Similar to A Human Touch in Machine Learning

Data Economy: Lessons learned and the Road ahead!
Data Economy: Lessons learned and the Road ahead!Data Economy: Lessons learned and the Road ahead!
Data Economy: Lessons learned and the Road ahead!Ahmet Bulut
 
Soccnx10 Man versus Machine – A Story About Embracing Innovation
Soccnx10 Man versus Machine – A Story About Embracing Innovation Soccnx10 Man versus Machine – A Story About Embracing Innovation
Soccnx10 Man versus Machine – A Story About Embracing Innovation Femke Goedhart
 
BIG DATA AND MACHINE LEARNING
BIG DATA AND MACHINE LEARNINGBIG DATA AND MACHINE LEARNING
BIG DATA AND MACHINE LEARNINGUmair Shafique
 
Management by data
Management by dataManagement by data
Management by dataLuca Foresti
 
Unit 1-ML (1) (1).pptx
Unit 1-ML (1) (1).pptxUnit 1-ML (1) (1).pptx
Unit 1-ML (1) (1).pptxChitrachitrap
 
Unit 1 (DSBDA) PD.pptx
Unit 1 (DSBDA)  PD.pptxUnit 1 (DSBDA)  PD.pptx
Unit 1 (DSBDA) PD.pptxSamiksha880257
 
Industrial revolution 4.0
Industrial revolution 4.0 Industrial revolution 4.0
Industrial revolution 4.0 Aditya Randika
 
Digital Disruptions and Transformation
Digital Disruptions and TransformationDigital Disruptions and Transformation
Digital Disruptions and TransformationMohammad Faiz
 
Voice of the customer requirements overview
Voice of the customer requirements overviewVoice of the customer requirements overview
Voice of the customer requirements overviewMichael Dattilio
 
Class 2 -- Knowledge & Knowledge Management at Accenture.ppt
Class 2 -- Knowledge & Knowledge Management at Accenture.pptClass 2 -- Knowledge & Knowledge Management at Accenture.ppt
Class 2 -- Knowledge & Knowledge Management at Accenture.pptVaishnaviG18CA248
 
Use of data science for startups_Sept 2021
Use of data science for startups_Sept 2021Use of data science for startups_Sept 2021
Use of data science for startups_Sept 2021Bohitesh Misra, PMP
 
Moyez Dreamforce 2017 presentation on Large Data Volumes in Salesforce
Moyez Dreamforce 2017 presentation on Large Data Volumes in SalesforceMoyez Dreamforce 2017 presentation on Large Data Volumes in Salesforce
Moyez Dreamforce 2017 presentation on Large Data Volumes in SalesforceMoyez Thanawalla
 
[DSC Europe 22] On the Aspects of Artificial Intelligence and Robotic Autonom...
[DSC Europe 22] On the Aspects of Artificial Intelligence and Robotic Autonom...[DSC Europe 22] On the Aspects of Artificial Intelligence and Robotic Autonom...
[DSC Europe 22] On the Aspects of Artificial Intelligence and Robotic Autonom...DataScienceConferenc1
 
Scaling Training Data for AI Applications
Scaling Training Data for AI ApplicationsScaling Training Data for AI Applications
Scaling Training Data for AI ApplicationsApplause
 
Harness the power of data
Harness the power of dataHarness the power of data
Harness the power of dataHarsha MV
 
Data Driven Sales: Building AI That Searches, Learns, and Sells
Data Driven Sales: Building AI That Searches, Learns, and SellsData Driven Sales: Building AI That Searches, Learns, and Sells
Data Driven Sales: Building AI That Searches, Learns, and SellsLeadGenius
 
DutchMLSchool. ML Business Perspective
DutchMLSchool. ML Business PerspectiveDutchMLSchool. ML Business Perspective
DutchMLSchool. ML Business PerspectiveBigML, Inc
 
Robotisation of Knowledge and Service Work
Robotisation of Knowledge and Service WorkRobotisation of Knowledge and Service Work
Robotisation of Knowledge and Service WorkDr. Crispin Coombs
 
POWRR Tools: Lessons learned from an IMLS National Leadership Grant
POWRR Tools: Lessons learned from an IMLS National Leadership GrantPOWRR Tools: Lessons learned from an IMLS National Leadership Grant
POWRR Tools: Lessons learned from an IMLS National Leadership GrantLynne Thomas
 

Similar to A Human Touch in Machine Learning (20)

Data Economy: Lessons learned and the Road ahead!
Data Economy: Lessons learned and the Road ahead!Data Economy: Lessons learned and the Road ahead!
Data Economy: Lessons learned and the Road ahead!
 
Soccnx10 Man versus Machine – A Story About Embracing Innovation
Soccnx10 Man versus Machine – A Story About Embracing Innovation Soccnx10 Man versus Machine – A Story About Embracing Innovation
Soccnx10 Man versus Machine – A Story About Embracing Innovation
 
BIG DATA AND MACHINE LEARNING
BIG DATA AND MACHINE LEARNINGBIG DATA AND MACHINE LEARNING
BIG DATA AND MACHINE LEARNING
 
Management by data
Management by dataManagement by data
Management by data
 
Unit 1-ML (1) (1).pptx
Unit 1-ML (1) (1).pptxUnit 1-ML (1) (1).pptx
Unit 1-ML (1) (1).pptx
 
Unit 1 (DSBDA) PD.pptx
Unit 1 (DSBDA)  PD.pptxUnit 1 (DSBDA)  PD.pptx
Unit 1 (DSBDA) PD.pptx
 
Industrial revolution 4.0
Industrial revolution 4.0 Industrial revolution 4.0
Industrial revolution 4.0
 
Digital Disruptions and Transformation
Digital Disruptions and TransformationDigital Disruptions and Transformation
Digital Disruptions and Transformation
 
Voice of the customer requirements overview
Voice of the customer requirements overviewVoice of the customer requirements overview
Voice of the customer requirements overview
 
Class 2 -- Knowledge & Knowledge Management at Accenture.ppt
Class 2 -- Knowledge & Knowledge Management at Accenture.pptClass 2 -- Knowledge & Knowledge Management at Accenture.ppt
Class 2 -- Knowledge & Knowledge Management at Accenture.ppt
 
Machine learning in Banks
Machine learning in BanksMachine learning in Banks
Machine learning in Banks
 
Use of data science for startups_Sept 2021
Use of data science for startups_Sept 2021Use of data science for startups_Sept 2021
Use of data science for startups_Sept 2021
 
Moyez Dreamforce 2017 presentation on Large Data Volumes in Salesforce
Moyez Dreamforce 2017 presentation on Large Data Volumes in SalesforceMoyez Dreamforce 2017 presentation on Large Data Volumes in Salesforce
Moyez Dreamforce 2017 presentation on Large Data Volumes in Salesforce
 
[DSC Europe 22] On the Aspects of Artificial Intelligence and Robotic Autonom...
[DSC Europe 22] On the Aspects of Artificial Intelligence and Robotic Autonom...[DSC Europe 22] On the Aspects of Artificial Intelligence and Robotic Autonom...
[DSC Europe 22] On the Aspects of Artificial Intelligence and Robotic Autonom...
 
Scaling Training Data for AI Applications
Scaling Training Data for AI ApplicationsScaling Training Data for AI Applications
Scaling Training Data for AI Applications
 
Harness the power of data
Harness the power of dataHarness the power of data
Harness the power of data
 
Data Driven Sales: Building AI That Searches, Learns, and Sells
Data Driven Sales: Building AI That Searches, Learns, and SellsData Driven Sales: Building AI That Searches, Learns, and Sells
Data Driven Sales: Building AI That Searches, Learns, and Sells
 
DutchMLSchool. ML Business Perspective
DutchMLSchool. ML Business PerspectiveDutchMLSchool. ML Business Perspective
DutchMLSchool. ML Business Perspective
 
Robotisation of Knowledge and Service Work
Robotisation of Knowledge and Service WorkRobotisation of Knowledge and Service Work
Robotisation of Knowledge and Service Work
 
POWRR Tools: Lessons learned from an IMLS National Leadership Grant
POWRR Tools: Lessons learned from an IMLS National Leadership GrantPOWRR Tools: Lessons learned from an IMLS National Leadership Grant
POWRR Tools: Lessons learned from an IMLS National Leadership Grant
 

Recently uploaded

Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...shivangimorya083
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一ffjhghh
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystSamantha Rae Coolbeth
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 

Recently uploaded (20)

Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 

A Human Touch in Machine Learning

  • 1. Conal Sathi Data Alchemist at Slice Technologies @aDataAlchemist a human touch in machine learning 1
  • 2. computer systems that can learn from training data and make predictions for new (unlabeled) data machine learning 2
  • 3. who do you trust? 3
  • 4. who do you trust? 4 can they work together?
  • 5. founded in 2010, Slice has amassed a 4.4-million person panel of US online shoppers to date 5 Slice: who we are • data set: e-receipts in e-mail • over 1 billion consumer purchases processed • $100 billion in purchases
  • 6. Slice: your smart shopping assistant 6 1. track shipments automatically 2. store shopping receipts 3. save money with price drop alerts 4. stay safe with recall alerts
  • 7. Unroll.me: clean up your inbox 7 1. unsubscribe with one click 2. combine favorite subscriptions into one email 3. read what you want, when you want
  • 8. 8 a treasure trove of data sits in e-receipts field extraction data sciencereceipt discovery the secret sauce to unique purchase insights men’s clothing women’s clothing
  • 9. item-level consumer purchase data 9 package out for delivery Nike Kobe X, 9.5, wide, blue lagoon, $144.97 across all merchants, channels, and time E-COMMERCE TRAVEL Los Angeles, CA check in: Fri, April 15, 2016 check out: Sun, April 17, 2016 PAYMENTS payment confirmation large non-fat mocha to La Stacione Cafe OFFLINE RETAIL TICKETS reservation for Varekai Sat, March 12, 2016 @7:15pm purchased Wed, March 9, 2016 cashmere blend scarf, $98.00
  • 10. Panasonic35-100mmf/2.8X OIS machine learning for product semantics 10 SamsungGalaxyS4SPH -16GB-Purple
  • 11. Panasonic35-100mmf/ 2.8X OIS machine learning for product semantics 11 SamsungGalaxyS4SPH -16GB-Purple electronics & accessories cameras & photo camera lenses brand: Panasonic focal length: 35 - 100mm aperture: 2.8 electronics & accessories telephones & mobiles mobile phones brand: Samsung sub-brand: Galaxy model: S4 memory: 16GB color: purple
  • 12. traditional approach to categorization 12 engine training set learning algorithm unlabeled data categories
  • 13. need to continuously evolve long tail of edge cases • continuous improvement difficult issues 13
  • 14. use machines and humans together to create a continuously improving system to do this, we need to scale humans to big data! so what do we do? 14
  • 15. humans can directly improve the engine by: • adding training data • creating rules why rules? • can quickly bring up the accuracy (and in a deterministic fashion) • helpful when training data is sparse/not representative for a particular category • effective and easy way to handle corner cases what do humans do? 15
  • 16. types of humans involved in our system: • machine learning engineers • analysts (domain experience) add more human labor with: • crowdsourced workers (very elastic supply, micro-tasks) • outsourced workers (less elastic, more involved tasks) types of workers 16
  • 17. get data/opinions/services from a large online community Amazon mechanical turk • initially invented in Amazon for internal data cleaning employs over 500,000 people worldwide can create tasks on demand crowdsourcing 17
  • 18. 18
  • 19. the downside of crowdsourcing is that you have to give micro-tasks (not very involved tasks) and you do not work closely with the workers this is why we use outsourcing as well • can give much more involved tasks (allow them to filter/search to find misclassifications) • we can work closely with them to teach them the taxonomy and how to use our tools effectively • they have much more practice on these tasks, so they can be very productive and eventually become domain experts outsourcing 19
  • 20. with our system, we have created: • continuous monitoring of the quality of our engine • mechanism to quickly improve quality when issue is detected • created large training data set (16 million items) • extensive rules (12K) in conclusion…. 20
  • 21. Slice Intelligence confidential. Do not copy or distribute without prior written consent. questions? visit sliceintelligence.com or email conal@slice.com or follow us on Twitter: @aDataAlchemist | @sliceintel

Editor's Notes

  1. 4.429M user panel as of 10/5/16