Learning to Rank with Deep
Visual Semantic Features
@ Etsy
AI with the Best
Kamelia Aryafar, Senior Data Scientist, @karyafar
Search Sciences, Etsy
September 2016
Etsy
Etsy is a global marketplace where people around the world connect,
both online and offline, to make, sell and buy unique goods.
By the Numbers
1.6M
active sellers
AS OF DECEMBER 31, 2015
24M
active buyers
AS OF DECEMBER 31, 2015
$2.39B
annual GMS
IN 2015
35+M
items for sale
AS OF DECEMBER 31, 2015
Photo by Kirsty-Lyn Jameson
DISCLAIMER
The statistics included
on the following slides
are updated quarterly.
819
employees around
the world
AS OF DECEMBER 31, 2015
9
offices in
7 countries
AS OF DECEMBER 31, 2015
Photo by Emily Andrews
Work and Culture
Large and Unique Seller Base
1.6M
active sellers
AS OF SEPTEMBER 30, 2015
95%
of sellers run their
Etsy shop from home
2014 ETSY SELLER SURVEY
76%
consider their
shop a business
2014 ETSY SELLER SURVEY
Photo by Moira K. Lime
Etsy Made in Canada
Photo by Jean-Michael Seminaro
24M
active buyers
AS OF DECEMBER 31, 2015
92%
of buyers agree
Etsy
offers products
they can't
find elsewhere
2014 ETSY BUYER SURVEY
Learning To Rank
Approaches to Learning to Rank
• Pointwise
- For an item, predict its grade (implicit ordering)
- Labels come from interactions with items
- Possible class imbalance
• Pairwise
- Ranking transformed to pairwise classification or regression
- Labels depend on ordering of item pair
- Ability to create balanced classes
• Listwise
- Input is entire set of documents associated with query
- Output is their ranked list
- E.g., the loss measures the distance between the ranking generated by the model and the perfect ranking
for the set of documents
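To make the contrast between the three approaches concrete, here is a minimal sketch that computes one loss of each kind for a toy query. The specific loss choices (squared error, hinge, and a ListNet-style softmax cross-entropy), the function names, and the toy grades are assumptions for illustration; the slides do not say which losses were used.

```python
import math

def pointwise_loss(scores, grades):
    """Mean squared error between each predicted score and its grade."""
    return sum((s - g) ** 2 for s, g in zip(scores, grades)) / len(scores)

def pairwise_loss(scores, grades, margin=1.0):
    """Average hinge loss over all ordered pairs where item i outranks item j."""
    losses = []
    for i in range(len(scores)):
        for j in range(len(scores)):
            if grades[i] > grades[j]:
                losses.append(max(0.0, margin - (scores[i] - scores[j])))
    return sum(losses) / len(losses) if losses else 0.0

def _softmax(values):
    shift = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def listwise_loss(scores, grades):
    """Cross-entropy between softmax(grades) and softmax(scores) over the whole list."""
    target = _softmax(grades)
    predicted = _softmax(scores)
    return -sum(t * math.log(p) for t, p in zip(target, predicted))

# Toy query with three items (hypothetical grades and model scores).
grades = [2.0, 0.0, 1.0]
good_scores = [2.5, 0.1, 1.2]   # agrees with the grade ordering
bad_scores = [0.1, 2.5, 1.2]    # ranks the worst item first
```

All three losses are lower for `good_scores` than for `bad_scores`; they differ in whether the unit of supervision is a single item, an item pair, or the full result list.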
Pairwise Learning
Each training instance represents a pair of items from the same set of search
results in your logs.
<item1, item2>
Learner must learn to order item1 and item2 correctly, with respect to user
preference decisions found in your logs.
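One way such pairs could be pulled from logs is sketched below. The record layout (`results`, `engaged`) and the rule "an engaged item is preferred over every non-engaged item in the same result set" are assumptions for illustration, not the logging format or preference rule used at Etsy.

```python
from itertools import product

def make_pairs(search_log):
    """Return (preferred_item, other_item) pairs, one result set at a time."""
    pairs = []
    for record in search_log:
        engaged = set(record["engaged"])
        preferred = [item for item in record["results"] if item in engaged]
        others = [item for item in record["results"] if item not in engaged]
        # Pairs never cross result sets: both items come from the same search.
        pairs.extend(product(preferred, others))
    return pairs

# Hypothetical log records.
log = [
    {"query": "housewarming gift", "results": ["a", "b", "c"], "engaged": ["b"]},
    {"query": "ceramic tile", "results": ["d", "e"], "engaged": []},
]
pairs = make_pairs(log)
```

Result sets with no engagement contribute no pairs, since they carry no preference signal.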
Features
Label Creation for Pairwise Features
{housewarming, gift, photo} - {housewarming, gift, ceramic, tile} → +1
{housewarming, gift, ceramic, tile } - {housewarming, gift, photo} → -1
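The two lines above can be read as feature-vector differences. The sketch below assumes binary bag-of-words features over a small hypothetical vocabulary and assumes the first item in the slide's example is the preferred one; emitting both orderings of each pair is what yields balanced classes.

```python
def to_vector(tokens, vocabulary):
    """Binary bag-of-words vector: 1 if the term is present, else 0."""
    token_set = set(tokens)
    return [1 if term in token_set else 0 for term in vocabulary]

def pairwise_examples(preferred_tokens, other_tokens, vocabulary):
    """Return the (difference vector, label) examples for both pair orderings."""
    a = to_vector(preferred_tokens, vocabulary)
    b = to_vector(other_tokens, vocabulary)
    forward = [x - y for x, y in zip(a, b)]   # preferred - other -> +1
    backward = [-d for d in forward]          # other - preferred -> -1
    return [(forward, +1), (backward, -1)]

# Hypothetical vocabulary built from the slide's example tokens.
vocabulary = ["housewarming", "gift", "photo", "ceramic", "tile"]
examples = pairwise_examples(
    ["housewarming", "gift", "photo"],
    ["housewarming", "gift", "ceramic", "tile"],
    vocabulary,
)
```

Shared terms ("housewarming", "gift") cancel to zero, so only the features that distinguish the two items carry signal.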
Train Classifier (SVM)
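A minimal sketch of this step, assuming a linear SVM trained on difference vectors by plain subgradient descent on the L2-regularized hinge loss. The hyperparameters and the tiny training set are hypothetical, and a production system would more likely use an existing SVM library.

```python
def train_linear_svm(examples, epochs=50, learning_rate=0.1, reg=0.01):
    """Fit weights by subgradient descent on L2-regularized hinge loss."""
    dim = len(examples[0][0])
    w = [0.0] * dim
    for _ in range(epochs):
        for x, y in examples:
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            for k in range(dim):
                grad = reg * w[k]
                if margin < 1.0:          # example violates the margin
                    grad -= y * x[k]
                w[k] -= learning_rate * grad
    return w

def score(w, features):
    """Linear score of a single item; higher means ranked earlier."""
    return sum(wi * xi for wi, xi in zip(w, features))

# Difference vectors over the hypothetical vocabulary
# [housewarming, gift, photo, ceramic, tile].
examples = [
    ([0, 0, 1, -1, -1], +1),
    ([0, 0, -1, 1, 1], -1),
]
w = train_linear_svm(examples)
photo_item = [1, 1, 1, 0, 0]
tile_item = [1, 1, 0, 1, 1]
```

Because the model is linear, a classifier trained on differences doubles as a scoring function for single items: `score(w, a) > score(w, b)` exactly when the difference `a - b` is classified as +1.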
Learning to Rank Pipeline
Multimodal Learning to Rank
Image vs. Text Features
Texture
Shape
Color
Title
Tags
Extracting Image Features
Feature Engineering, Deep Learning
ImageNet
Photo from: http://www.image-net.org/
Convolutional Neural Nets (CNNs)
Photo from: http://cs231n.stanford.edu/
VERY DEEP CONVOLUTIONAL NETWORKS FOR LARGE-SCALE IMAGE RECOGNITION
Karen Simonyan & Andrew Zisserman
Images Don’t Lie: Transferring Deep Visual Semantic Features to Large-Scale Multimodal Learning to Rank
Corey Lynch, Kamelia Aryafar & Josh Attenberg, KDD ‘16
Model Specs: VGGnet
Transfer Learning
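In a multimodal ranker, transferred image features can sit alongside text features in one vector. The sketch below assumes the image embedding has already been extracted from a pretrained CNN; the 4-dimensional embedding and 3 text features are hypothetical stand-ins, and the extraction step itself is not shown.

```python
def multimodal_vector(text_features, image_embedding):
    """Concatenate text features with a precomputed image embedding."""
    return list(text_features) + list(image_embedding)

def pairwise_difference(preferred, other):
    """Element-wise difference used as input to a pairwise classifier."""
    return [p - o for p, o in zip(preferred, other)]

# Hypothetical items: 3 text features followed by a 4-dimensional image embedding.
item_a = multimodal_vector([1, 1, 0], [0.5, 0.0, 0.25, 1.0])
item_b = multimodal_vector([1, 0, 1], [0.0, 0.5, 0.25, 0.0])
diff = pairwise_difference(item_a, item_b)
```

With concatenation, the downstream pairwise learner needs no changes: it simply sees a longer feature vector.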
Production Pipeline
Collecting Item Pairs with Labels
Strategies for Model Application
        Real Time                          Offline
Pros    Can handle unseen items            No cost to feature complexity
Cons    Latency cost ∝ complexity          Computations compound as
        Query time feature computations    considerations increase
Real time model evaluation
Fetch Model
(cache, key-value store)
Apply Ranking
(ranking or reranking pass)
[Diagram: User Query → Index → Top-k results → Ranking Model → Reranked top-k results]
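The fetch-then-rerank flow can be sketched as follows. The dictionary standing in for the cache or key-value store, the model key, and the linear scoring model are all assumptions for illustration.

```python
def fetch_model(store, key):
    """Look up model weights; `store` stands in for a cache or key-value store."""
    return store[key]

def rerank(top_k, weights):
    """Sort the retrieved items by linear model score, highest first."""
    def model_score(item):
        return sum(w * x for w, x in zip(weights, item["features"]))
    return sorted(top_k, key=model_score, reverse=True)

# Hypothetical model store and top-k results returned by the index.
model_store = {"ranker-v1": [0.2, -0.1, 0.7]}
top_k = [
    {"id": "item1", "features": [1.0, 0.0, 0.0]},
    {"id": "item2", "features": [0.0, 1.0, 1.0]},
    {"id": "item3", "features": [1.0, 1.0, 0.0]},
]
reranked = rerank(top_k, fetch_model(model_store, "ranker-v1"))
```

Scoring only the top-k results keeps the query-time cost proportional to k times the feature count rather than to the size of the index.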
Gaining Confidence
Performance Replays
Ranking Replays
Offline Evaluation Metrics
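The slide does not name the metrics used. As one common example, the sketch below computes NDCG for a single ranked list; the choice of NDCG and the exponential gain formula are assumptions, not something taken from the talk.

```python
import math

def dcg(grades):
    """Discounted cumulative gain with exponential gain 2^grade - 1."""
    return sum((2 ** g - 1) / math.log2(rank + 2) for rank, g in enumerate(grades))

def ndcg(grades_in_ranked_order, k=None):
    """DCG of the given ranking divided by the DCG of the ideal ordering."""
    ranked = grades_in_ranked_order[:k]
    ideal = sorted(grades_in_ranked_order, reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0
```

For instance, `ndcg([2, 1, 0])` is 1.0 because the list is already in ideal order, while `ndcg([0, 1, 2])` is lower because the best item is ranked last.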
Model Understanding: Side by Side
Custom Queries & Explain Logs
The future…
Further Reading
etsy.com/careers
Thanks!
