GrapeVin
Anasuya Das
Insight Data Science
Can wine recommendations be crowdsourced?
130,000 reviews
82,342 unique wines
655 unique users
Images from:
http://GrapeVin.us
Content-based recommender
Deep ruby color leads to a nose of currant and cedar chest. The complex body contains plum, black cherry and just the right amount of oak.
Deep ruby color, currant, dark cherry, vanilla, cedar, full-bodied, oak
Get the most similar wines in latent semantic space
[Figure: reviews plotted on latent components 2 and 3, with clusters labeled Pinot noir, Champagne and Bordeaux]
Wine reviews cluster by the type of wine
[Figure: research poster]
Preserved retinotopic organization in early visual cortex following …: implications for visual function?
Anasuya Das¹,², Elisha P. Merriam³, David J. Heeger³, Krystel R. …
¹Flaum Eye Institute and ²Centre for Visual Science, University of Rochester; ³Center for N…
Questions: How does V1 damage affect the retinotopic organization of spared visual cortex? How is this related to spared visual function?
Background: Damage to the primary visual cortex (V1) or its afferents produces a severe loss of vision in the contralateral visual hemifield (VF), called cortical blindness (CB). Studies in non-human primates with lesions to V1 observe reduced but organized retinotopic activity in the lesioned visual cortex (Schmid, 2009 & 2010). These and other studies suggest that residual visual processing is mediated by either spared V1 or extra-geniculo-calcarine input to extrastriate visual areas (reviewed in Das and Huxlin, 2010). Human fMRI studies of CB have examined single subjects, and the lesion characteristics of these subjects have varied across studies. The retinotopic organization of the damaged visual cortex in humans is not well characterized.
Group 1: Retinotopic organization is preserved around the lesion, but extrastriate cortex has a greater representation of the central field than spared V1 (CB8).
[Panels: intact hemisphere, damaged hemisphere, damaged hemisphere; labeled areas V1, V2d, V2v, V3d, V3v, V3a, V4, hMT+, LO1/LO2, calcarine, pole]
The story so far …
Questions?
Cross-validation: How accurately can the star rating of a recommended wine be predicted?
red, spicy, oak → Train classifier → Get similar wines → Predict ratings
Train a Naive Bayes classifier to learn each user's preferences.
Test on the reviews written by other users for the recommended wines.
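A minimal sketch of a per-user rating model, assuming each review has already been reduced to a token list and that "accuracy" means the fraction of held-out reviews whose star rating is predicted exactly. `UserRatingNB` and `rating_accuracy` are hypothetical names; this is a plain multinomial Naive Bayes with add-alpha smoothing written with the standard library, not the deck's actual code.

```python
import math
from collections import Counter, defaultdict


class UserRatingNB:
    """Multinomial Naive Bayes mapping one user's review tokens to star ratings."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha  # add-alpha (Laplace) smoothing constant

    def fit(self, reviews: list[list[str]], stars: list[int]) -> "UserRatingNB":
        self.class_counts = Counter(stars)       # how often the user gave each rating
        self.word_counts = defaultdict(Counter)  # token counts per rating
        for tokens, star in zip(reviews, stars):
            self.word_counts[star].update(tokens)
        self.vocab = {t for tokens in reviews for t in tokens}
        self.totals = {s: sum(c.values()) for s, c in self.word_counts.items()}
        return self

    def predict(self, tokens: list[str]) -> int:
        n_reviews = sum(self.class_counts.values())
        v = len(self.vocab)
        best_star, best_logp = None, -math.inf
        for star, count in self.class_counts.items():
            logp = math.log(count / n_reviews)  # log prior
            for t in tokens:
                if t in self.vocab:  # words this user never used carry no evidence
                    num = self.word_counts[star][t] + self.alpha
                    logp += math.log(num / (self.totals[star] + self.alpha * v))
            if logp > best_logp:
                best_star, best_logp = star, logp
        return best_star


def rating_accuracy(model: UserRatingNB, reviews: list[list[str]], stars: list[int]) -> float:
    """Fraction of held-out reviews whose star rating is predicted exactly."""
    hits = sum(model.predict(toks) == star for toks, star in zip(reviews, stars))
    return hits / len(stars)


user = UserRatingNB().fit(
    [["red", "spicy", "oak"], ["spicy", "oak", "tannic"], ["sweet", "light"], ["light", "floral", "sweet"]],
    [5, 5, 2, 2],
)
print(user.predict(["spicy", "oak"]))  # 5
```

Working in log space avoids underflow when a review has many tokens; fitting one model per user keeps each user's vocabulary and rating habits separate.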
[Figure: histogram of rating prediction accuracy (x-axis) vs. number of users (y-axis)]
ALGORITHM
Remove stop words
Lemmatize using WordNet
Use synsets to convert adverbs to adjectives
Detect language and filter
Unicode and HTML scrubbing
Tf-idf matrix $X$: n reviews (rows) × m terms (columns); here 132,000 reviews × 20,000 words
Truncated SVD, a.k.a. Latent Semantic Indexing: $X_{n \times m} \approx U_r \Sigma_r V_r^{\top}$, so each review is represented by its row of the n × r component matrix $U_r \Sigma_r$
Cosine similarity in the lower-dimensional space
Recommend the top 10 most similar and highest-rated wines
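A minimal, dependency-free sketch of the tf-idf → truncated SVD → cosine similarity steps, assuming reviews are already token lists. The helper names (`tfidf_matrix`, `lsi_coordinates`, `most_similar`) are hypothetical, and the SVD here is computed by power iteration with deflation on XᵀX, which is only practical for toy matrices; at the deck's scale, scikit-learn's `TfidfVectorizer` and `TruncatedSVD` are the usual tools. How similarity is combined with star ratings for the final top 10 is not shown.

```python
import math
from collections import Counter


def tfidf_matrix(docs: list[list[str]]) -> list[list[float]]:
    """n x m tf-idf matrix: one row per review, one column per vocabulary term."""
    vocab = sorted({t for d in docs for t in d})
    col = {t: j for j, t in enumerate(vocab)}
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))  # document frequency per term
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}  # smoothed idf
    X = []
    for d in docs:
        row = [0.0] * len(vocab)
        for t, tf in Counter(d).items():
            row[col[t]] = tf * idf[t]
        X.append(row)
    return X


def lsi_coordinates(X: list[list[float]], r: int, iters: int = 200) -> list[list[float]]:
    """Rows of U_r * Sigma_r, found by power iteration with deflation on X^T X."""
    m = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(m)] for i in range(m)]
    coords = [[] for _ in X]
    for _component in range(r):
        v = [1.0 / math.sqrt(m)] * m
        lam = 0.0
        for _step in range(iters):
            w = [sum(a * b for a, b in zip(a_row, v)) for a_row in A]
            lam = math.sqrt(sum(x * x for x in w))  # ||A v|| -> top eigenvalue (sigma^2)
            if lam == 0.0:
                break
            v = [x / lam for x in w]
        for row, c in zip(X, coords):
            # X v_k = sigma_k u_k; use 0 once the remaining spectrum is empty
            c.append(sum(a * b for a, b in zip(row, v)) if lam else 0.0)
        # Deflate so the next pass converges to the next singular direction
        A = [[A[i][j] - lam * v[i] * v[j] for j in range(m)] for i in range(m)]
    return coords


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def most_similar(coords: list[list[float]], query: int, k: int = 10) -> list[int]:
    """Indices of the k reviews closest to `query` by cosine similarity in LSI space."""
    scored = sorted(
        ((cosine(coords[query], c), i) for i, c in enumerate(coords) if i != query),
        reverse=True,
    )
    return [i for _, i in scored[:k]]


docs = [["pinot", "cherry", "earthy"], ["pinot", "cherry", "silky", "earthy"],
        ["champagne", "bubbles"], ["champagne", "bubbles", "citrus"]]
coords = lsi_coordinates(tfidf_matrix(docs), r=2)
print(most_similar(coords, 0, k=1))  # [1]
```

Comparing reviews in the r-dimensional space rather than on raw term counts lets two reviews score as similar when they use words that tend to co-occur, which is the point of projecting into latent semantic space.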
Cross-validation: What is the probability of recommending a wine that the user has already reviewed?
[Figure: histogram of the probability of recommending an already-reviewed wine (x-axis) vs. number of users (y-axis)]
Data:
SuperUsers: the top 100 users with the most reviews
Wines: the top 100 wines reviewed by SuperUsers
Method: For each wine reviewed by a SuperUser, recommend the 20 most similar wines based on the remaining 99 users.
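A sketch of the per-SuperUser metric, assuming that "probability of recommending an already-reviewed wine" means the fraction of all recommendations (20 per seed wine) that the SuperUser had in fact reviewed; computing it for each SuperUser gives one value per user for the histogram. `already_reviewed_rate` and the `recommend` callback are hypothetical; the callback is assumed to be built only from the other 99 users' reviews.

```python
from typing import Callable


def already_reviewed_rate(
    user_wines: set[str],
    recommend: Callable[[str, int], list[str]],
    k: int = 20,
) -> float:
    """Fraction of recommendations that land on wines the user already reviewed.

    `recommend(seed, k)` is a hypothetical callable returning up to k wine ids,
    assumed to be built without this user's reviews (leave-one-user-out).
    """
    hits = total = 0
    for seed in user_wines:
        recs = [w for w in recommend(seed, k) if w != seed]  # never credit the seed itself
        hits += sum(w in user_wines for w in recs)
        total += len(recs)
    return hits / total if total else 0.0


recs = {"a": ["b", "x"], "b": ["y", "z"], "c": ["a", "b"]}
print(already_reviewed_rate({"a", "b", "c"}, lambda w, k: recs[w][:k], k=2))  # 0.5
```

A higher value means the recommender, trained without the SuperUser, keeps pointing at wines that user actually chose to review.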
Text processing steps
1. Convert to lowercase
2. Remove stop words, punctuation and HTML code
3. Repair broken Unicode characters and replace them with plain text
4. Detect language and keep only English reviews
5. Lemmatize using WordNet (works only on nouns and adjectives)
6. Use synsets and pertainyms to convert adverbs to adjectives
7. Use bigrams
8. Vectorize using term frequency-inverse document frequency (tf-idf)
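A standard-library sketch of steps 1 to 3 and 7, with hypothetical helper names (`clean_review`, `add_bigrams`) and a deliberately tiny, hypothetical stop-word list. Steps 4 to 6 (language detection, WordNet lemmatization, synsets and pertainyms) need external models or data, and step 8 is the tf-idf weighting, so none of those are shown.

```python
import html
import re
import unicodedata

# Hypothetical, deliberately small stop-word list; a real pipeline would use a full list.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "with", "this", "that", "for", "but"}
TAG_RE = re.compile(r"<[^>]+>")      # HTML tags such as <br/> or <b>
NON_WORD_RE = re.compile(r"[^a-z\s]")  # punctuation and digits after lowercasing


def clean_review(text: str) -> list[str]:
    """Unescape and strip HTML, fold Unicode to ASCII, lowercase, drop punctuation and stop words."""
    text = html.unescape(text)                                # "&amp;" -> "&"
    text = TAG_RE.sub(" ", text)                              # remove HTML tags
    text = unicodedata.normalize("NFKD", text)                # split accents from letters
    text = text.encode("ascii", "ignore").decode("ascii")     # "rosé" -> "rose"
    text = NON_WORD_RE.sub(" ", text.lower())                 # lowercase, then drop non-letters
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def add_bigrams(tokens: list[str]) -> list[str]:
    """Append adjacent-token bigrams such as 'black_cherry'."""
    return tokens + [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]


print(add_bigrams(clean_review("Plum, black cherry &amp; oak<br/>")))
# ['plum', 'black', 'cherry', 'oak', 'plum_black', 'black_cherry', 'cherry_oak']
```

Because bigrams are formed after stop words are removed, words separated only by a stop word (for example "cherry and oak") end up paired; forming bigrams before filtering is the alternative design.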
Selecting k components
[Figure: explained variance (%) vs. number of components (500 to 3,000)]
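One simple way to turn the explained-variance curve into a choice of k, sketched under the assumption that a target fraction of explained variance is chosen in advance (the deck does not state one): take per-component ratios, such as scikit-learn's `TruncatedSVD.explained_variance_ratio_`, and return the smallest k whose cumulative sum reaches the target. `smallest_k` is a hypothetical helper.

```python
from itertools import accumulate
from typing import Optional


def smallest_k(ratios: list[float], target: float) -> Optional[int]:
    """Smallest number of leading components whose cumulative explained variance >= target."""
    for k, cumulative in enumerate(accumulate(ratios), start=1):
        if cumulative >= target:
            return k
    return None  # target not reachable with the components computed


print(smallest_k([0.5, 0.3, 0.1, 0.1], 0.75))  # 2
```

The trade-off is the one the curve shows: each extra component adds less variance, while similarity search and storage cost grow linearly with k.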
Future directions
1. Incorporate keyword search
2. Scale up and increase inventory
3. Scrape wine pricing information and local availability
4. Analyze tasting notes by vineyard or geographical region: do soil and climate really affect how wines taste?