Penguins in Sweaters,
or Serendipitous Entity Search
on User-generated Content
Ilaria Bordino, Yelena Mejova, and Mounia Lalmas
(Yahoo Labs)
ACM International Conference on Information and Knowledge Management (CIKM 2013)
October 29th, 2013
Why/when do penguins wear sweaters?

Serendipity
Entity
Search

Finding something good or useful while not specifically looking for it.
Serendipitous search systems provide relevant and interesting results.
We build an entity-driven serendipitous search system based on enriched entity networks extracted from Wikipedia and Yahoo! Answers.

2
1. What connections between
entities do web community
knowledge portals offer?

2. How do they contribute to an
interesting, serendipitous
browsing experience?

WHAT

WHY

3
Yahoo Answers vs Wikipedia

Yahoo Answers
community-driven question & answer portal
• 67M questions & 262M answers
• 2 years [2010/2011]
• English-language
minimally curated
opinions, gossip, personal info
variety of points of view

Wikipedia
community-driven encyclopedia
• 3,795,865 articles
• from end of December 2011
• English Wikipedia
curated
high-quality knowledge
variety of niche topics
4
Entity & Relationship Extraction
• Entity: any concept having a Wikipedia page
  Use an internal tool to
  (1) identify surface forms,
  (2) resolve to Wikipedia entities,
  (3) rank entities using aboutness score
• Relationship: cosine similarity of tf-idf vectors
  (each vector is built from the concatenation of documents where the entity appears)
W. Zhao, J. Jiang, J. Weng, J. He, E.P. Lim, H. Yan, and X. Li. Comparing twitter and traditional media using topic models. ECIR 2011.
D. Paranjpe. Learning document aboutness from implicit user feedback and document structure. CIKM 2009.
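
The relationship weight described above can be sketched as follows. This is a minimal illustration, not the internal tool used in the talk: the function names `tfidf_vectors` and `cosine`, the idf form log(N/df), and the toy token lists are all assumptions made for the example.

```python
import math
from collections import Counter


def tfidf_vectors(entity_docs):
    """Build one tf-idf vector per entity.

    entity_docs maps an entity name to a list of tokenized documents
    that mention it. All documents of an entity are concatenated into
    a single pseudo-document before weighting.
    """
    pseudo = {
        entity: Counter(tok for doc in docs for tok in doc)
        for entity, docs in entity_docs.items()
    }
    n = len(pseudo)
    # Document frequency: in how many pseudo-documents a token occurs.
    df = Counter(tok for counts in pseudo.values() for tok in counts)
    return {
        entity: {tok: tf * math.log(n / df[tok]) for tok, tf in counts.items()}
        for entity, counts in pseudo.items()
    }


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(tok, 0.0) for tok, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy data (invented for illustration only).
toy_docs = {
    "Oil spill": [["oil", "spill", "penguins", "rescue"], ["oil", "cleanup"]],
    "Sweaters for Penguins": [["penguins", "sweaters", "rescue"]],
    "Tennis": [["tennis", "match", "serve"]],
}
vectors = tfidf_vectors(toy_docs)
print(cosine(vectors["Oil spill"], vectors["Sweaters for Penguins"]))
print(cosine(vectors["Oil spill"], vectors["Tennis"]))
```

Entities that share no vocabulary get a weight of zero, which is one way such a graph could end up with isolated nodes.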
5
Dataset Features

Dataset          # Nodes     # Edges       # Isolated
Yahoo! Answers   896,799     112,595,138   69,856
Wikipedia        1,754,069   237,058,218   82,381

• Sentiment
  › using SentiStrength, compute positive & negative scores
  › compute attitude and sentimentality [Kucuktunc’12]
  › entity-level scores
• Topical Category
  – Yahoo Content Taxonomy
• Quality
  › Flesch Reading Ease score

Attitude (Polarity)
Sentimentality (Strength)
Readability
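
A rough sketch of how these three features could be computed. The slides do not spell out the formulas, so the following are assumptions: SentiStrength is taken to return a positive score in 1..5 and a negative score in -5..-1; attitude is taken as their sum and sentimentality as their combined strength minus the neutral baseline of 2. The Flesch Reading Ease formula is the standard one, but the syllable counter is a crude vowel-group heuristic, and all function names are hypothetical.

```python
import re


def attitude(pos, neg):
    """Polarity: sum of positive (1..5) and negative (-5..-1) scores (assumed form)."""
    return pos + neg


def sentimentality(pos, neg):
    """Strength: 0 for neutral text (pos=1, neg=-1), up to 8 (assumed form)."""
    return pos - neg - 2


def count_syllables(word):
    """Crude heuristic: one syllable per group of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_reading_ease(text):
    """Flesch Reading Ease: higher scores mean easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))


print(attitude(4, -2), sentimentality(4, -2))
print(flesch_reading_ease("The cat sat. The dog ran."))
```

Entity-level scores could then be obtained by averaging over the documents in which an entity appears, though the slides do not say which aggregation was used.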

6
[Figure: panels for Wikipedia and Yahoo Answers]
7
Retrieval
• Algorithm: lazy random walk with restart
Justin Bieber, Nicki Minaj, Katy Perry, Shakira, Eminem, Lady Gaga,
Jose Mourinho, Selena Gomez, Kim Kardashian, Miley Cyrus, Robert
Pattinson, Adele (singer), Steve Jobs, Osama bin Laden, Ron Paul,
Twitter, Facebook, Netflix, IPad, IPhone, Touchpad, Kindle, Olympic
Games, Cricket, FIFA, Tennis, Mount Everest, Eiffel Tower, Oxford
Street, Nürburgring, Haiti, Chile, Libya, Egypt, Middle East,
Earthquake, Oil spill, Tsunami, Subprime mortgage crisis, Bailout,
Terrorism, Asperger syndrome, McDonald's, Vitamin D, Appendicitis,
Cholera, Influenza, Pertussis, Vaccine, Childbirth
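
The slide names the algorithm but not its parameters. Below is a minimal sketch of a lazy random walk with restart over a weighted entity graph. The restart probability (0.15), the laziness (0.5), the function name `lazy_rwr`, and the edge weights in the toy graph are all illustrative assumptions, not values from the talk.

```python
def lazy_rwr(graph, source, restart=0.15, laziness=0.5, iters=100):
    """Score nodes by a lazy random walk with restart from `source`.

    graph maps each node to a dict {neighbor: edge weight}; every
    neighbor must itself be a key of graph. At each step the walker
    jumps back to `source` with probability `restart`; otherwise it
    stays put with probability `laziness` or moves to a neighbor
    chosen in proportion to edge weight.
    """
    scores = {node: 0.0 for node in graph}
    scores[source] = 1.0
    for _ in range(iters):
        nxt = {node: 0.0 for node in graph}
        for node, mass in scores.items():
            if mass == 0.0:
                continue
            nxt[source] += restart * mass
            walk = (1.0 - restart) * mass
            total = sum(graph[node].values())
            if total == 0.0:
                # Isolated node: the walking mass has nowhere to go.
                nxt[node] += walk
                continue
            nxt[node] += laziness * walk
            for neighbor, weight in graph[node].items():
                nxt[neighbor] += (1.0 - laziness) * walk * weight / total
        scores = nxt
    return scores


# Toy graph with made-up similarity weights.
toy_graph = {
    "Steve Jobs": {"Steve Wozniak": 0.9, "Timothy Cook": 0.6},
    "Steve Wozniak": {"Steve Jobs": 0.9, "Kane Kramer": 0.2},
    "Timothy Cook": {"Steve Jobs": 0.6},
    "Kane Kramer": {"Steve Wozniak": 0.2},
}
scores = lazy_rwr(toy_graph, "Steve Jobs")
for entity in sorted(scores, key=scores.get, reverse=True):
    print(entity, round(scores[entity], 3))
```

The top-scoring nodes other than the query entity would be returned as results; a larger restart probability keeps the walk closer to the query.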

3 labels per query-result pair

                Wikipedia   Yahoo! Answers   Combined
Precision @ 5   0.668       0.724            0.744
MAP             0.716       0.762            0.782

• Annotator agreement (overlap): 0.85
• Average overlap in top 5 results: 12%
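
For reference, the two metrics in the table can be computed as in the sketch below. It assumes binary relevance labels per ranked result (the slides do not say how the three labels per pair were aggregated) and normalizes average precision by the number of relevant results retrieved; the function names are illustrative.

```python
def precision_at_k(relevance, k):
    """Fraction of the top-k results labeled relevant (labels are 0 or 1)."""
    return sum(relevance[:k]) / k


def average_precision(relevance):
    """Mean of precision values at the ranks where a relevant result occurs."""
    hits = 0
    total = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mean_average_precision(runs):
    """MAP: average precision averaged over all queries."""
    return sum(average_precision(r) for r in runs) / len(runs)


# Hypothetical labels for the top 5 results of two queries.
runs = [[1, 0, 1, 1, 0], [1, 1, 0, 0, 1]]
print(precision_at_k(runs[0], 5))
print(mean_average_precision(runs))
```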

Steve Jobs
Yahoo! Answers
Jon Rubinstein
Timothy Cook
Kane Kramer
Steve Wozniak
Jerry York

Wikipedia
System 7
PowerPC G4
SuperDrive
Power Macintosh
Power Computing Corp.
8
Serendipity
“making fortunate discoveries by accident”
Serendipity = unexpectedness + relevance
“Expected” result baselines from web search
Serendipity = interestingness + relevance
Result interestingness given the query
Personal interest in result

M. Ge, C. Delgado-Battenfeld, and D. Jannach. Beyond accuracy: evaluating recommender systems
by coverage and serendipity. RecSys 2010.
P. Andre, J. Teevan, and S. T. Dumais. From x-rays to silly putty via uranus: Serendipity and its role in
web search. SIGCHI 2009.
9
Baselines
• Top: 5 entities that occur most frequently in the top 5 search results provided by Bing and Google
• Top –WP: same as above, but excluding the Wikipedia page from the set of results
• Rel: top 5 entities in the related query suggestions provided by Bing and Google
• Rel + Top: union of Top and Rel

Baseline    Data   General       High Read.
Top         WP     0.63 (0.58)   0.56 (0.53)
            YA     0.69 (0.63)   0.71 (0.65)
            Comb   0.70 (0.61)   0.68 (0.61)
Top –WP     WP     0.63 (0.58)   0.56 (0.54)
            YA     0.70 (0.64)   0.71 (0.66)
            Comb   0.71 (0.64)   0.68 (0.63)
Rel         WP     0.64 (0.61)   0.57 (0.56)
            YA     0.70 (0.65)   0.71 (0.66)
            Comb   0.72 (0.67)   0.69 (0.65)
Rel + Top   WP     0.61 (0.54)   0.55 (0.51)
            YA     0.68 (0.57)   0.69 (0.59)
            Comb   0.68 (0.55)   0.66 (0.56)

| relevant & unexpected | / | unexpected |
number of serendipitous results out of all of the unexpected results retrieved

| relevant & unexpected | / | retrieved |
serendipitous out of all retrieved
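
The two ratios above can be computed as in this sketch, where a result counts as unexpected when it is absent from the baseline set. The function name, the baseline set, and the relevance labels in the example are hypothetical; only the retrieved list is taken from the Steve Jobs example in the slides.

```python
def serendipity_scores(retrieved, relevant, expected):
    """Return (|rel & unexp| / |unexp|, |rel & unexp| / |retrieved|).

    retrieved: ranked results of the system under test
    relevant:  set of results labeled relevant
    expected:  set of results produced by a baseline such as Top or Rel
    """
    retrieved = list(retrieved)
    unexpected = [r for r in retrieved if r not in expected]
    serendipitous = [r for r in unexpected if r in relevant]
    per_unexpected = len(serendipitous) / len(unexpected) if unexpected else 0.0
    per_retrieved = len(serendipitous) / len(retrieved) if retrieved else 0.0
    return per_unexpected, per_retrieved


retrieved = ["Jon Rubinstein", "Timothy Cook", "Kane Kramer",
             "Steve Wozniak", "Jerry York"]
expected = {"Steve Wozniak"}  # hypothetical baseline output
relevant = {"Jon Rubinstein", "Timothy Cook",
            "Steve Wozniak", "Jerry York"}  # hypothetical labels
print(serendipity_scores(retrieved, relevant, expected))
```

Because the unexpected results are a subset of the retrieved ones, the second ratio can never exceed the first, which matches the pattern of the parenthesized values in the table.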
10
User-perceived Quality

1. Which result is more relevant to the query?
2. If someone is interested in the query, would they
also be interested in these results?
3. Even if you are not interested in the query, are
these results interesting to you personally?
4. Would you learn anything new about the query?
11
Interestingness
• Labelers provide pairwise comparisons between results;
combine into a reference ranking and compare result ranking
to optimal (Kendall’s tau-b)
Agreement:

Relevance (83%), Query interest (81%),
Personal interest (76%), Learning something new (81%)

• Interesting > Relevant
Oil Spill → Sweaters for Penguins

Robert Pattinson → Water for Elephants

WP

• Relevant > Interesting

Egypt → Ptolemaic Kingdom

WP

WP & YA

Egypt → Cairo Conference (WP)
Netflix → Blu-ray Disc (YA)

J. Arguello, F. Diaz, J. Callan, and B. Carterette. A methodology for evaluating
aggregated search results. ECIR 2011.
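
A small sketch of the comparison step: pairwise preferences are turned into a reference scoring, and a system ranking is compared against it with Kendall's tau-b. Counting wins is a simple stand-in chosen for this example, not necessarily the aggregation used in the study; the function names and the toy data are hypothetical.

```python
from itertools import combinations
from math import sqrt


def win_counts(items, preferences):
    """Score each item by how many pairwise comparisons it won."""
    wins = {item: 0 for item in items}
    for winner, _loser in preferences:
        wins[winner] += 1
    return wins


def kendall_tau_b(x, y):
    """Kendall's tau-b between two score lists over the same items."""
    concordant = discordant = ties_x = ties_y = 0
    for i, j in combinations(range(len(x)), 2):
        dx = x[i] - x[j]
        dy = y[i] - y[j]
        if dx == 0 and dy == 0:
            continue  # tied in both lists: ignored
        if dx == 0:
            ties_x += 1
        elif dy == 0:
            ties_y += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    pairs = concordant + discordant
    denom = sqrt((pairs + ties_x) * (pairs + ties_y))
    return (concordant - discordant) / denom if denom else 0.0


items = ["a", "b", "c", "d"]
# Hypothetical labeler preferences as (preferred, other) pairs.
prefs = [("a", "b"), ("a", "c"), ("a", "d"),
         ("b", "c"), ("b", "d"), ("c", "d")]
reference = win_counts(items, prefs)
# Hypothetical system scores (higher is better); the system swaps a and b.
system = {"a": 3, "b": 4, "c": 2, "d": 1}
print(kendall_tau_b([reference[i] for i in items], [system[i] for i in items]))
```

A value of 1 means the system ranking matches the reference exactly, 0 means no association, and -1 means the order is fully reversed.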

12
Question                                                       Data   General   +Topic
Which result is more relevant to the query?                    WP     0.162     0.194
                                                               YA     0.336     0.374
                                                               Comb   0.201     0.222
If someone is interested in the query, would they also         WP     0.162     0.176
be interested in the result?                                   YA     0.312     0.343
                                                               Comb   0.184     0.222
Even if you are not interested in the query, is the result     WP     0.139     0.144
interesting to you personally?                                 YA     0.324     0.359
                                                               Comb   0.168     0.198
Would you learn anything new about the query from              WP     0.167     0.164
this result?                                                   YA     0.307     0.346
                                                               Comb   0.184     0.203

Similarity (Kendall’s tau-b) between result sets and reference ranking

• Topical category constraint promotes results of the same topic as the query entity
• Sentiment and Readability constraints hurt performance

13
What did we learn?

1. What connections between entities
do web community knowledge
portals offer?

2. How do they contribute to
an interesting, serendipitous
browsing experience?

≠
ANSWERS

>
ANSWERS

14
Penguins in Sweaters, or Serendipitous Entity Search on User-generated Content

  • 1. Penguins in Sweaters, or Serendipitous Entity Search on User-generated Content. Ilaria Bordino, Yelena Mejova, and Mounia Lalmas (Yahoo Labs). ACM International Conference on Information and Knowledge Management (CIKM 2013), October 29th, 2013
  • 2. Why/when do penguins wear sweaters? Serendipity: finding something good or useful while not specifically looking for it; serendipitous search systems provide relevant and interesting results. Entity Search: we build an entity-driven serendipitous search system based on enriched entity networks extracted from Wikipedia and Yahoo! Answers.
  • 3. (1) WHAT: What connections between entities do web community knowledge portals offer? (2) WHY: How do they contribute to an interesting, serendipitous browsing experience?
  • 4. Yahoo Answers vs Wikipedia
    Yahoo Answers: community-driven question & answer portal; 67M questions & 262M answers; 2 years [2010/2011]; English-language; minimally curated; opinions, gossip, personal info; variety of points of view.
    Wikipedia: community-driven encyclopedia; 3,795,865 articles; from end of December 2011; English Wikipedia; curated; high-quality knowledge; variety of niche topics.
  • 5. Entity & Relationship Extraction
    Entity: any concept having a Wikipedia page. Use an internal tool to (1) identify surface forms, (2) resolve to Wikipedia entities, (3) rank entities using aboutness score.
    Relationship: cosine similarity of tf/idf vectors (concatenation of documents where entity appears).
    W. Zhao, J. Jiang, J. Weng, J. He, E.P. Lim, H. Yan, and X. Li. Comparing Twitter and traditional media using topic models. ECIR 2011.
    D. Paranjpe. Learning document aboutness from implicit user feedback and document structure. CIKM 2009.
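The relationship weight on slide 5 (cosine similarity of tf/idf vectors built from the concatenated documents that mention each entity) can be sketched as follows. This is a minimal illustration, not the authors' internal tool: the tokenization, the exact tf and idf weighting (raw counts and log(N/df) here), and the toy token lists are all assumptions.

```python
import math
from collections import Counter


def tfidf_vectors(entity_tokens):
    """Map each entity to a sparse tf-idf vector (dict of term -> weight).

    entity_tokens: dict of entity name -> token list, where the token list
    stands for the concatenation of all documents mentioning that entity.
    The weighting (raw tf times log(N / df)) is an assumed, common variant.
    """
    n_entities = len(entity_tokens)
    doc_freq = Counter()
    for tokens in entity_tokens.values():
        doc_freq.update(set(tokens))  # count each term once per entity
    vectors = {}
    for entity, tokens in entity_tokens.items():
        term_freq = Counter(tokens)
        vectors[entity] = {
            term: count * math.log(n_entities / doc_freq[term])
            for term, count in term_freq.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all-zero."""
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy, made-up token lists (hypothetical; not data from the talk).
toy = {
    "Steve Jobs": ["apple", "iphone", "ceo", "apple"],
    "Steve Wozniak": ["apple", "computer", "engineer"],
    "Penguin": ["bird", "antarctica", "sweater"],
}
vecs = tfidf_vectors(toy)
print(cosine(vecs["Steve Jobs"], vecs["Steve Wozniak"]))  # positive: shared term
print(cosine(vecs["Steve Jobs"], vecs["Penguin"]))        # zero: no shared terms
```

In an entity network, each nonzero similarity would become a weighted edge between the two entities; any thresholding of weak edges is left out of this sketch.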
  • 6. Dataset Features
    Dataset: # Nodes / # Edges / # Isolated
    Yahoo! Answers: 896,799 / 112,595,138 / 69,856
    Wikipedia: 1,754,069 / 237,058,218 / 82,381
    Sentiment: using SentiStrength, compute positive & negative scores; compute attitude and sentimentality [Kucuktunc’12]; entity-level scores.
    Topical Category: Yahoo Content Taxonomy.
    Quality: Flesch Reading Ease score.
    Figure panels: Attitude (Polarity), Sentimentality (Strength), Readability.
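For the quality feature on slide 6, the standard Flesch Reading Ease formula can be written as a short function. This sketch takes word, sentence, and syllable counts as inputs; how the talk's pipeline tokenized text or counted syllables is not stated, so that part is deliberately left out.

```python
def flesch_reading_ease(n_words, n_sentences, n_syllables):
    """Standard Flesch Reading Ease score; higher means easier to read.

    The counts are assumed to come from some upstream tokenizer and
    syllable counter, which this sketch does not implement.
    """
    if n_words <= 0 or n_sentences <= 0:
        raise ValueError("need at least one word and one sentence")
    words_per_sentence = n_words / n_sentences
    syllables_per_word = n_syllables / n_words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


# Hypothetical counts: 100 words, 5 sentences, 150 syllables.
print(round(flesch_reading_ease(100, 5, 150), 3))  # 59.635
```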
  • 8. Retrieval
    Algorithm: lazy random walk with restart.
    Queries: Justin Bieber, Nicki Minaj, Katy Perry, Shakira, Eminem, Lady Gaga, Jose Mourinho, Selena Gomez, Kim Kardashian, Miley Cyrus, Robert Pattinson, Adele (singer), Steve Jobs, Osama bin Laden, Ron Paul, Twitter, Facebook, Netflix, IPad, IPhone, Touchpad, Kindle, Olympic Games, Cricket, FIFA, Tennis, Mount Everest, Eiffel Tower, Oxford Street, Nürburgring, Haiti, Chile, Libya, Egypt, Middle East, Earthquake, Oil spill, Tsunami, Subprime mortgage crisis, Bailout, Terrorism, Asperger syndrome, McDonald's, Vitamin D, Appendicitis, Cholera, Influenza, Pertussis, Vaccine, Childbirth.
    3 labels per query-result pair.
    Results (Wikipedia / Yahoo! Answers / Combined): Precision @ 5: 0.668 / 0.724 / 0.744; MAP: 0.716 / 0.762 / 0.782.
    Annotator agreement (overlap): 0.85. Average overlap in top 5 results: 12%.
    Example, Steve Jobs. Yahoo! Answers: Jon Rubinstein, Timothy Cook, Kane Kramer, Steve Wozniak, Jerry York. Wikipedia: System 7, PowerPC G4, SuperDrive, Power Macintosh, Power Computing Corp.
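Slide 8 names a lazy random walk with restart as the retrieval algorithm but gives no parameters. The sketch below is one common formulation, computed by power iteration on a small weighted graph: at each step the walker restarts at the query entity with probability `restart`, otherwise stays put with probability `laziness` or follows an edge in proportion to its weight. The parameter values, the graph format, and the toy graph are assumptions for illustration only.

```python
def lazy_rwr(graph, source, restart=0.15, laziness=0.5, iters=100):
    """Approximate lazy random-walk-with-restart scores from `source`.

    graph: dict node -> dict neighbor -> positive edge weight. Every
    neighbor must also appear as a key (assumed input format).
    restart, laziness, iters: hypothetical defaults, not from the talk.
    """
    scores = {node: 0.0 for node in graph}
    scores[source] = 1.0
    for _ in range(iters):
        nxt = {node: 0.0 for node in graph}
        for node, mass in scores.items():
            if mass == 0.0:
                continue
            neighbors = graph[node]
            total = sum(neighbors.values())
            # A node without out-edges keeps all of its non-restart mass.
            stay = laziness if total > 0 else 1.0
            nxt[node] += (1 - restart) * stay * mass
            for nbr, weight in neighbors.items():
                nxt[nbr] += (1 - restart) * (1 - stay) * mass * weight / total
        nxt[source] += restart  # restart mass always returns to the query
        scores = nxt
    return scores


# Toy graph (made up): "a" links to "b" and "c"; "d" is isolated.
toy_graph = {
    "a": {"b": 1.0, "c": 1.0},
    "b": {"a": 1.0},
    "c": {"a": 1.0},
    "d": {},
}
ranked = sorted(lazy_rwr(toy_graph, "a").items(), key=lambda kv: -kv[1])
print(ranked)
```

Ranking the non-query nodes by score gives a result list for the query entity; isolated nodes such as "d" are never reached and score zero.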
  • 9. Serendipity: “making fortunate discoveries by accident”
    Serendipity = unexpectedness + relevance: “expected” result baselines from web search.
    Serendipity = interestingness + relevance: result interestingness given the query; personal interest in result.
    M. Ge, C. Delgado-Battenfeld, and D. Jannach. Beyond accuracy: evaluating recommender systems by coverage and serendipity. RecSys 2010.
    P. Andre, J. Teevan, and S. T. Dumais. From x-rays to silly putty via Uranus: Serendipity and its role in web search. SIGCHI 2009.
  • 10. Table columns: Baseline, Data, General, High Read. Each row below gives General / High Read. per dataset.
    Top (the 5 entities that occur most frequently in the top 5 search results provided by Bing and Google): WP 0.63 (0.58) / 0.56 (0.53); YA 0.69 (0.63) / 0.71 (0.65); Comb 0.70 (0.61) / 0.68 (0.61).
    Top –WP (same as above, but excluding the Wikipedia page from the set of results): WP 0.63 (0.58) / 0.56 (0.54); YA 0.70 (0.64) / 0.71 (0.66); Comb 0.71 (0.64) / 0.68 (0.63).
    Rel (top 5 entities in the related query suggestions provided by Bing and Google): WP 0.64 (0.61) / 0.57 (0.56); YA 0.70 (0.65) / 0.71 (0.66); Comb 0.72 (0.67) / 0.69 (0.65).
    Rel + Top (union of Top and Rel): WP 0.61 (0.54) / 0.55 (0.51); YA 0.68 (0.57) / 0.69 (0.59); Comb 0.68 (0.55) / 0.66 (0.56).
    | relevant & unexpected | / | unexpected |: number of serendipitous results out of all of the unexpected results retrieved.
    | relevant & unexpected | / | retrieved |: serendipitous results out of all retrieved.
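The two ratios at the bottom of slide 10 can be computed with plain set operations. In this sketch, "unexpected" is taken to mean a retrieved result that does not appear in the chosen baseline set (Top, Top –WP, Rel, or Rel + Top); that reading, the function name, and the toy data are assumptions.

```python
def serendipity_scores(retrieved, relevant, baseline):
    """Return (|rel & unexp| / |unexp|, |rel & unexp| / |retrieved|).

    retrieved: results returned by the system for one query.
    relevant:  results judged relevant by annotators.
    baseline:  "expected" results, e.g. entities from web search.
    A ratio is reported as 0.0 when its denominator is empty.
    """
    retrieved = set(retrieved)
    unexpected = retrieved - set(baseline)
    serendipitous = unexpected & set(relevant)
    over_unexpected = len(serendipitous) / len(unexpected) if unexpected else 0.0
    over_retrieved = len(serendipitous) / len(retrieved) if retrieved else 0.0
    return over_unexpected, over_retrieved


# Toy example (made-up labels): A and B are expected; C and D are relevant surprises.
print(serendipity_scores(
    retrieved=["A", "B", "C", "D", "E"],
    relevant={"A", "C", "D"},
    baseline={"A", "B"},
))
```

The first ratio rewards systems whose surprises are mostly relevant; the second also penalizes systems that return few unexpected results at all.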
  • 11. User-perceived Quality
    (1) Which result is more relevant to the query?
    (2) If someone is interested in the query, would they also be interested in these results?
    (3) Even if you are not interested in the query, are these results interesting to you personally?
    (4) Would you learn anything new about the query?
  • 12. Interestingness
    Labelers provide pairwise comparisons between results; combine into a reference ranking and compare result ranking to optimal (Kendall’s tau-b).
    Agreement: Relevance (83%), Query interest (81%), Personal interest (76%), Learning something new (81%).
    Interesting > Relevant: Oil Spill → Sweaters for Penguins, Robert Pattinson → Water for Elephants [WP].
    Relevant > Interesting: Egypt → Ptolemaic Kingdom [WP] [WP & YA], Egypt → Cairo Conference [WP], Netflix → Blu-ray Disc [YA].
    J. Arguello, F. Diaz, J. Callan, and B. Carterette. A methodology for evaluating aggregated search results. ECIR 2011.
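Slide 12 compares each system ranking with the reference ranking using Kendall's tau-b. A direct O(n^2) sketch of the tie-corrected coefficient is given below; it takes two equal-length lists of scores or ranks for the same items. How the pairwise labels were combined into the reference ranking is not covered here, and the example inputs are made up.

```python
import math


def kendall_tau_b(x, y):
    """Kendall's tau-b between two equal-length score (or rank) lists.

    tau_b = (C - D) / sqrt((C + D + Ty) * (C + D + Tx)), where C and D are
    the concordant and discordant pairs, Tx counts pairs tied only in x,
    and Ty counts pairs tied only in y.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = tied_x_only = tied_y_only = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both lists: excluded from both factors
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif (dx > 0) == (dy > 0):
                concordant += 1
            else:
                discordant += 1
    denom = math.sqrt(
        (concordant + discordant + tied_y_only)
        * (concordant + discordant + tied_x_only)
    )
    if denom == 0:
        return 0.0  # undefined when one list is constant; 0.0 by convention here
    return (concordant - discordant) / denom


# Made-up rankings of four results: one adjacent swap relative to the reference.
print(kendall_tau_b([1, 2, 3, 4], [1, 3, 2, 4]))
```

A value of 1 means the system ordering matches the reference exactly, and -1 means it is fully reversed.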
  • 13. Similarity (Kendall’s tau-b) between result sets and reference ranking. Table columns: question, Data, General, +Topic. Each row below gives General / +Topic per dataset.
    Which result is more relevant to the query? WP 0.162 / 0.194; YA 0.336 / 0.374; Comb 0.201 / 0.222.
    If someone is interested in the query, would they also be interested in the result? WP 0.162 / 0.176; YA 0.312 / 0.343; Comb 0.184 / 0.222.
    Even if you are not interested in the query, is the result interesting to you personally? WP 0.139 / 0.144; YA 0.324 / 0.359; Comb 0.168 / 0.198.
    Would you learn anything new about the query from this result? WP 0.167 / 0.164; YA 0.307 / 0.346; Comb 0.184 / 0.203.
    The topical category constraint promotes results of the same topic as the query entity.
    Sentiment and Readability constraints hurt performance.
  • 14. What did we learn? (1) What connections between entities do web community knowledge portals offer? (2) How do they contribute to an interesting, serendipitous browsing experience? ≠ ANSWERS > ANSWERS