DescribeYourBar
1. Describe Your Bar
Context Sensitive Natural Language Search
and Exploration through Topic Space
Eric Carlson
2. “I want to eat oysters with a view of the bay”
Yelp
We don’t write or think in keywords
Semantic structure is lost
We want to find interesting
neighborhoods and their
relevant topics
Explore
Search
“the bay” vs “bay oyster”
4. Yelp: “No results in San Francisco”
“I want a club with neon
hoola-hoopers and a dj”
DescribeYourBar Top Result: “Mighty”
5. 500k Raw Yelp
Reviews
Noun Chunked
Tokenized
Lemmatized
Bar & neighborhood topics
extracted from review text
Fast search via
cosine similarity
Doc2Vec
Dense
Capture specific meaning
and word ordering
LDA
Sparse
Capture Interpretable Topics
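The search step on this slide could look roughly like the sketch below. It assumes each bar already has a topic vector (for example, LDA topic weights) and that the query has been mapped into the same space. The bar names, the vectors, and the function names `cosine_similarity` and `search` are hypothetical and are not taken from the project.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def search(query_vec, bar_vecs, top_k=10):
    """Return the top_k (bar, score) pairs, most similar first."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in bar_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Hypothetical 3-topic vectors, e.g. (seafood, dance club, dive bar).
bar_vecs = {
    "bar_a": [0.9, 0.0, 0.1],
    "bar_b": [0.0, 0.8, 0.2],
    "bar_c": [0.1, 0.1, 0.8],
}
query_vec = [1.0, 0.0, 0.0]  # a query that is purely about the first topic
results = search(query_vec, bar_vecs, top_k=2)
for name, score in results:
    print(name, round(score, 3))
```

With roughly a thousand bars, a linear scan like this is already fast; stacking the bar vectors into a matrix would be the natural next step if it were not.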
6. Validation
Search using held-out reviews -> find rank of the real business
Correct business is in top 10 results 60% of the time
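The validation metric can be sketched as follows: for each held-out review, run the search, find the 1-based rank of the business the review was actually written about, and report the fraction of reviews whose true business lands in the top k. The data and the names `rank_of_true_business` and `top_k_hit_rate` are hypothetical; the real search results are not reproduced here.

```python
def rank_of_true_business(ranked_ids, true_id):
    """1-based position of true_id in the ranked results, or None if it is absent."""
    for position, business_id in enumerate(ranked_ids, start=1):
        if business_id == true_id:
            return position
    return None

def top_k_hit_rate(ranks, k=10):
    """Fraction of held-out reviews whose true business ranked in the top k."""
    if not ranks:
        return 0.0
    hits = sum(1 for r in ranks if r is not None and r <= k)
    return hits / len(ranks)

# Hypothetical search output for four held-out reviews: (ranked result ids, true id).
held_out = [
    (["b1", "b2", "b3"], "b1"),
    (["b2", "b3", "b1"], "b1"),
    (["b3", "b2"], "b1"),
    (["b2", "b1"], "b2"),
]
ranks = [rank_of_true_business(ranked, true_id) for ranked, true_id in held_out]
hit_rate = top_k_hit_rate(ranks, k=2)
print(ranks, hit_rate)
```

Reviews whose true business never appears in the results count as misses, which keeps the hit rate from being inflated.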
10. Choosing LDA Parameters
Leveling off of perplexity implies
additional topics don’t help much
[Plot: # bits to predict a word vs. number of topics]
Dirichlet(𝛼) ~ Beta(𝛼, 𝛽 = 1)
𝛼 controls sparsity of topics
Tune to have ~5 topics/bar
[Plot: Topic 1 vs. Topic 2]
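To illustrate how 𝛼 controls sparsity, the sketch below draws topic mixtures from a symmetric Dirichlet prior and counts how many topics per bar carry non-negligible weight. It is an illustration of the prior only, not the LDA fit itself. The number of topics (20), the weight threshold (0.01), and the two 𝛼 values are assumptions chosen for the example.

```python
import random

def sample_dirichlet(alpha, num_topics, rng):
    """Draw one topic mixture from a symmetric Dirichlet(alpha) via normalized Gamma draws."""
    while True:
        draws = [rng.gammavariate(alpha, 1.0) for _ in range(num_topics)]
        total = sum(draws)
        if total > 0.0:  # guard against underflow for very small alpha
            return [d / total for d in draws]

def mean_active_topics(alpha, num_topics=20, threshold=0.01, num_samples=500, seed=0):
    """Average number of topics per sampled mixture with weight above the threshold."""
    rng = random.Random(seed)
    active = 0
    for _ in range(num_samples):
        mixture = sample_dirichlet(alpha, num_topics, rng)
        active += sum(1 for weight in mixture if weight > threshold)
    return active / num_samples

sparse_avg = mean_active_topics(alpha=0.1)   # small alpha: a few dominant topics
dense_avg = mean_active_topics(alpha=10.0)   # large alpha: weight spread across topics
print(f"alpha=0.1  -> {sparse_avg:.1f} active topics on average")
print(f"alpha=10.0 -> {dense_avg:.1f} active topics on average")
```

Sweeping 𝛼 and reading off the average active-topic count is one way to aim for a target such as ~5 topics per bar.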
11.
12. Validation
N>1000 bars in SF.
Validate on held-out reviews (some very short).
50% of the time, true bar is in top 10!
13. Validation
Search rank of true business for each held-out review
50% of the time, true business is in top 10!
14.
15. Extensions for the next week
1. Tune LDA sparsity for 4-5 topics/bar
2. Incorporate doc2vec similarity with some weight;
add sentiment analysis
3. Display results on map
4. Validate by predicting business of held-out
reviews
5. Segment users via LDA+doc2vec trained on user
concatenated reviews
6. User rating prediction.
Pick out bars that are more interesting to a
given user than the average user
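Extension 2 could be prototyped as a simple linear blend of the two similarity scores. This is only one possible reading of "with some weight". The function names `combined_score` and `rerank`, the default weight, and the scores are hypothetical.

```python
def combined_score(lda_sim, doc2vec_sim, doc2vec_weight=0.3):
    """Linear blend of the two similarities; doc2vec_weight must lie in [0, 1]."""
    if not 0.0 <= doc2vec_weight <= 1.0:
        raise ValueError("doc2vec_weight must be between 0 and 1")
    return (1.0 - doc2vec_weight) * lda_sim + doc2vec_weight * doc2vec_sim

def rerank(lda_sims, doc2vec_sims, doc2vec_weight=0.3):
    """Rank bars that have both scores by their blended score, best first."""
    shared = set(lda_sims) & set(doc2vec_sims)
    scored = [
        (bar, combined_score(lda_sims[bar], doc2vec_sims[bar], doc2vec_weight))
        for bar in shared
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored

# Hypothetical per-bar similarities to one query.
lda_sims = {"bar_a": 0.9, "bar_b": 0.6}
doc2vec_sims = {"bar_a": 0.2, "bar_b": 0.9}

lda_only = rerank(lda_sims, doc2vec_sims, doc2vec_weight=0.0)
blended = rerank(lda_sims, doc2vec_sims, doc2vec_weight=0.5)
print(lda_only[0][0], blended[0][0])
```

The weight itself could be chosen by rerunning the held-out-review validation for several values and keeping the one with the best top-10 hit rate.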