Searching for Interestingness in Wikipedia and Yahoo! Answers

Yelena Mejova (1), Ilaria Bordino (2), Mounia Lalmas (3), Aristides Gionis (4)
(1, 2, 3) Yahoo! Research Barcelona, Spain
(4) Aalto University, Finland
{(1) ymejova, (2) bordino, (3) mounia}

ABSTRACT
In many cases, when browsing the Web, users are searching for specific information. Sometimes, though, users are also looking for something interesting, surprising, or entertaining. Serendipitous search puts interestingness on par with relevance. We investigate how interesting are the results one can obtain via serendipitous search, and what makes them so, by comparing entity networks extracted from two prominent social media sites, Wikipedia and Yahoo! Answers.

Categories and Subject Descriptors
H.4 [Information Systems Applications]: Miscellaneous

Keywords
Serendipity, Exploratory search

1. INTRODUCTION
   Serendipitous search occurs when a user with no a priori or totally unrelated intentions interacts with a system and acquires useful information [4]. A system supporting such exploratory capabilities must provide results that are relevant to the user's current interest, and yet interesting, to encourage the user to continue the exploration.
   In this work, we describe an entity-driven exploratory and serendipitous search system, based on enriched entity networks that are explored through random-walk computations to retrieve search results for a given query entity. We extract entity networks from two datasets, Wikipedia, a curated, collaborative online encyclopedia, and Yahoo! Answers, a more unconstrained question/answering forum, where the freedom of conversation may present advantages such as opinions, rumors, and social interest and approval.
   We compare the networks extracted from the two media by performing user studies in which we juxtapose interestingness of the results retrieved for a query entity, with relevance. We investigate whether interestingness depends on (i) the curated/uncurated nature of the dataset, and/or on (ii) additional characteristics of the results, such as sentiment, content quality, and popularity.

2. ENTITY NETWORKS
   We extract entity networks from (i) a dump of the English Wikipedia from December 2011 consisting of 3 795 865 articles, and (ii) a sample of the English Yahoo! Answers dataset from 2010/2011, containing 67 336 144 questions and 261 770 047 answers. We use state-of-the-art methods [3, 5] to extract entities from the documents in each dataset.
   Next we draw an arc between any two entities e1 and e2 that co-occur in one or more documents. We assign the arc a weight w1(e1, e2) = DF(e1, e2) equal to the number of such documents (the document frequency (DF) of the entity pair).
   This weighting scheme tends to favor popular entities. To mitigate this effect, we measure the rarity of any entity e in a dataset by computing its inverse document frequency IDF(e) = log(N) − log(DF(e)), where N is the size of the collection, and DF(e) is the document frequency of entity e. We set a threshold on IDF to drop the arcs that involve the most popular entities. We also rescale the arc weights according to the alternative scheme w2(e1 → e2) = DF(e1, e2) · IDF(e2).
   We use Personalized PageRank (PPR) [1] to extract the top n entities related to a query entity. We consider two scoring methods. When using the w2 weighting scheme, we simply use the PPR scores (we dub this method IDF). When using the simpler scheme w1, we normalize the PPR scores by the global PageRank scores (with no personalization) to penalize popular entities. We dub this method PN.
   We enrich our entity networks with metadata regarding sentiment and quality of the documents. Using SentiStrength, we extract sentiment scores for each document. We calculate attitude and sentimentality metrics [2] to measure polarity and strength of the sentiment. Regarding quality, for Yahoo! Answers documents we count the number of points assigned by the system to the users, as indication of expertise and thus good quality. For Wikipedia, we count the number of dispute messages inserted by editors to require revisions, as indication of bad quality. We derive sentiment and quality scores for any entity by averaging over all the documents in which the entity appears. We use Wikimedia statistics to estimate the popularity of entities.
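   The network construction and the two scoring methods of Section 2 can be sketched roughly as follows. This is a minimal illustration, not the system's implementation: it assumes each document has already been reduced to a set of entity identifiers, and the function names, the damping factor 0.85, the fixed iteration count, the handling of dangling nodes, the reading of the IDF threshold (an arc is dropped when either endpoint falls at or below it), and the toy documents are all assumptions introduced here.

```python
import math
from collections import defaultdict
from itertools import combinations


def build_network(docs, idf_threshold=0.0):
    """Return (w1, w2, idf) for a list of documents, each a set of entities."""
    n_docs = len(docs)
    df = defaultdict(int)       # DF(e): number of documents mentioning e
    pair_df = defaultdict(int)  # DF(e1, e2): documents mentioning both
    for entities in docs:
        for e in entities:
            df[e] += 1
        for a, b in combinations(sorted(entities), 2):
            pair_df[(a, b)] += 1

    # IDF(e) = log(N) - log(DF(e))
    idf = {e: math.log(n_docs) - math.log(c) for e, c in df.items()}

    w1 = defaultdict(dict)  # symmetric: w1(e1, e2) = DF(e1, e2)
    w2 = defaultdict(dict)  # directed:  w2(e1 -> e2) = DF(e1, e2) * IDF(e2)
    for (a, b), c in pair_df.items():
        w1[a][b] = w1[b][a] = c
        # Assumed sparsification rule: keep the arc only if both endpoints
        # are rare enough (IDF above the threshold).
        if idf[a] > idf_threshold and idf[b] > idf_threshold:
            w2[a][b] = c * idf[b]
            w2[b][a] = c * idf[a]
    return w1, w2, idf


def pagerank(weights, nodes, restart, alpha=0.85, iters=100):
    """Power iteration; `restart` is the teleport distribution over nodes."""
    rank = dict(restart)
    for _ in range(iters):
        nxt = {v: (1.0 - alpha) * restart.get(v, 0.0) for v in nodes}
        for u in nodes:
            mass = alpha * rank.get(u, 0.0)
            out = weights.get(u, {})
            total = sum(out.values())
            if total == 0:
                # Dangling node: send its mass back to the restart set.
                for v, p in restart.items():
                    nxt[v] += mass * p
            else:
                for v, w in out.items():
                    nxt[v] += mass * w / total
        rank = nxt
    return rank


def top_related(query, weights, nodes, n=5, normalize=False):
    """Top-n entities for a query entity by PPR (optionally PN-normalized)."""
    scores = pagerank(weights, nodes, {query: 1.0})
    if normalize:
        # PN: divide PPR by the global (non-personalized) PageRank.
        uniform = {v: 1.0 / len(nodes) for v in nodes}
        glob = pagerank(weights, nodes, uniform)
        scores = {v: scores[v] / glob[v] for v in nodes}
    ranked = sorted((v for v in nodes if v != query), key=lambda v: -scores[v])
    return ranked[:n]


# Toy collection (hypothetical data, for illustration only).
docs = [
    {"flu", "vaccine"},
    {"flu", "vaccine", "usa"},
    {"flu", "usa"},
    {"usa", "obama"},
    {"usa", "hollywood"},
    {"vaccine", "who"},
]
w1, w2, idf = build_network(docs)
nodes = sorted(idf)
print("PN :", top_related("flu", w1, nodes, n=3, normalize=True))
print("IDF:", top_related("flu", w2, nodes, n=3))
```

   Note that w1 is symmetric while w2 is directed, since the weight of an arc e1 → e2 depends on the rarity of its target e2. Popular entities are therefore penalized in different places: PN corrects the scores after the random walk, whereas IDF changes the network the walk runs on.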

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
WWW 2013 Companion, May 13–17, 2013, Rio de Janeiro, Brazil.
Copyright 2013 ACM 978-1-4503-2038-2/13/05 ...$15.00.

3. EXPLORATORY SEARCH
   We test our system using a set of 37 queries originating from 2010 and 2011 Google Zeitgeist ( zeitgeist) and having sufficient coverage in both datasets. Using one of the two algorithms (PN or IDF), we retrieve the top five entities from each dataset (YA or WP) for each query. For comparison, we consider setups consisting of 5 random entities. Note that unlike for conventional retrieval, a random baseline is feasible for a browsing task.
   We recruit four editors to annotate the retrieved results, asking them to evaluate each result entity for relevance, interestingness to the query, and interestingness regardless of the query, with responses falling on a scale from 1 to 4 (Figure 1(a)). Both of our retrieval methods outperform the random baseline (at p < 0.01). The gain in interestingness to the user regardless of the query suggests that randomly viewed information is not intrinsically interesting to the user.

Figure 1: Performance: (a) and (b) scale range from 1 to 4, (c) correlation range from 0 to 1.
   (a) Average query/result pair labels (int. to query, int. to user, relevant) for the runs ya r, ya pn, ya idf, wp r, wp pn, wp idf.
   (b) Average query set labels (diverse, frustrating, interesting, learn new) for the runs ya r, ya pn, ya idf, wp r, wp pn, wp idf.
   (c) Corr. with learn smth new (int. to query, int. to user, relevant) for the runs ya pn, ya idf, wp pn, wp idf.

   Whereas performance improves from PN to IDF for YA, the interestingness to the user is hurt significantly (at p < 0.01) for WP (the other measures remain statistically the same). Note that PN uses the weighting scheme w1, while IDF operates on the networks sparsified and weighted according to function w2. The frequency-based approach applied by IDF moderates the mention of popular entities in a non-curated dataset like YA, but it fails to capture the importance of entities in a domain with restricted authorship.
   Next we ask the editors to look at the five results as a whole, measuring diversity, frustration, interestingness, and the ability of the user to learn something new about the query. Figure 1(b) shows that the two random runs are highly diverse but provoke the most frustration. The most diverse and the least frustrating result sets are provided by the YA IDF run. The WP PN run also shows high diversity, but it falls with the IDF constraint. The YA IDF run gives better diversity and interestingness scores at p < 0.01 than the WP IDF run, while otherwise performing statistically the same.
   To examine the relationship with the serendipity level of the content, we compute the correlation between the learn something new label (LSN) and the others. Figure 1(c) shows the LSN label to be the least correlated with the interests of the user in the WP IDF run, and the most in the YA IDF run. Especially in the WP IDF run, relevance is highly associated with the LSN label. We are witnessing two different searching experiences: in the YA IDF setup the results are diverse and popular, whereas in the WP IDF setup the results are less diverse, and the user may be less interested in the relevant content, but it will be just as educational.
   Finally we analyze the metadata collected for the entities in any query-result pair: Attitude (A), Sentimentality (S), Quality (Q), Popularity (V), and Context (T). For each pair, we calculate the difference between query and result in these dimensions. For Context we compute the cosine similarity between the TF/IDF vectors of the entities. In aggregate, the best connections are between result popularity and relevance (0.234), as well as interestingness of the result to the user (0.227), followed by contextual similarity of result and query (0.214), and quality of the result entity (0.201). These features point to important aspects of a retrieval strategy which would lead to a successful serendipitous search.

Table 1: Retrieval result examples

   YA query: Kim Kardashian    Attitude   Sentimentality   Quality   Pageviews
   Perry Williams                 0           0               0             85
   Eva Longoria Parker          −0.602       2.018            6      1 450 814

   WP query: H1N1 pandemic     Attitude   Sentimentality   Quality   Pageviews
   Phaungbyin                     2           2               1            706
   2009 US flu pandemic           1           1               1         21 981

4. DISCUSSION & CONCLUSION
   Beyond the aggregate measures of the previous section, the peculiarities of Yahoo! Answers and Wikipedia as social media present unique advantages and challenges for serendipitous search. For example, Table 1 shows potential YA search results for the American socialite Kim Kardashian: the actress Eva Longoria Parker (whose Wikipedia page has over a million visits in two years), and the footballer Perry Williams (who played his last game in 1993). Note the difference in attitude and sentimentality. Yahoo! Answers provides a wider spread of emotion. This data may be of use when searching for potentially serendipitous entities.
   Table 1 also shows potential WP results for the query H1N1 Pandemic: a town in Burma called Phaungbyin, and the 2009 flu pandemic in the United States. We may expect a pandemic to be associated with negative sentiment, but the documents in Wikipedia do not display it.
   It is our intuition that the two datasets provide a complementary view of the entities and their relations, and that a hybrid system exploiting both resources would provide the best user experience. We leave this for future work.

5. ACKNOWLEDGEMENTS
   This work was partially funded by the European Union Linguistically Motivated Semantic Aggregation Engines (LiMoSINe) project.

References
[1] G. Jeh and J. Widom. Scaling personalized web search. In WWW '03, pages 271–279. ACM, 2003.
[2] O. Kucuktunc, B. Cambazoglu, I. Weber, and H. Ferhatosmanoglu. A large-scale sentiment analysis for Yahoo! Answers. In WSDM '12, pages 633–642. ACM, 2012.
[3] D. Paranjpe. Learning document aboutness from implicit user feedback and document structure. In CIKM, 2009.
[4] E. G. Toms. Serendipitous information retrieval. In DELOS, 2000.
[5] Y. Zhou, L. Nie, O. Rouhani-Kalleh, F. Vasile, and S. Gaffney. Resolving surface forms to Wikipedia topics. In COLING, pages 1335–1343, 2010.
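
   As a supplementary illustration of the Context (T) feature from Section 3, the cosine similarity between the TF/IDF vectors of two entities might be computed as sketched below. The paper does not say how an entity's vector is assembled, so this sketch assumes each entity is represented by the tokens of the documents that mention it. The function names, the toy tokens, and the document frequencies are hypothetical.

```python
import math
from collections import Counter


def tfidf_vector(tokens, doc_freq, n_docs):
    """Sparse TF-IDF vector, with IDF(t) = log(N) - log(DF(t))."""
    tf = Counter(tokens)
    return {
        t: count * (math.log(n_docs) - math.log(doc_freq[t]))
        for t, count in tf.items()
    }


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty or all-zero vector is similar to nothing
    return dot / (norm_u * norm_v)


# Hypothetical document frequencies over a collection of 4 documents.
doc_freq = {"flu": 2, "pandemic": 1, "vaccine": 2, "town": 1, "burma": 1}
query_vec = tfidf_vector(["flu", "pandemic", "flu"], doc_freq, 4)
related_vec = tfidf_vector(["flu", "vaccine"], doc_freq, 4)
unrelated_vec = tfidf_vector(["town", "burma"], doc_freq, 4)

print(cosine(query_vec, related_vec))
print(cosine(query_vec, unrelated_vec))
```

   Entities whose contexts share no terms score 0, and the score grows as more rare terms are shared, which is the sense in which Context measures the contextual similarity of result and query.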

Como a cultura maker vai mudar o modo de produção globalComo a cultura maker vai mudar o modo de produção global
Como a cultura maker vai mudar o modo de produção global
Cidadãos como protagonistas das transformações sociais
Cidadãos como protagonistas das transformações sociaisCidadãos como protagonistas das transformações sociais
Cidadãos como protagonistas das transformações sociais
Inovação digital
Inovação digital Inovação digital
Inovação digital
Movimento Maker e Educação
Movimento Maker e EducaçãoMovimento Maker e Educação
Movimento Maker e Educação
Cultura digital - Aula 4
Cultura digital - Aula 4Cultura digital - Aula 4
Cultura digital - Aula 4
Cultura Digital- aula 3
Cultura Digital- aula 3Cultura Digital- aula 3
Cultura Digital- aula 3
Cultura Digital- aula 2
Cultura Digital- aula 2Cultura Digital- aula 2
Cultura Digital- aula 2
Diversidade cultural gilberto gil
Diversidade cultural gilberto gilDiversidade cultural gilberto gil
Diversidade cultural gilberto gil
Social Entrepreneurship - International School of Law and Technology
Social Entrepreneurship - International School of Law and TechnologySocial Entrepreneurship - International School of Law and Technology
Social Entrepreneurship - International School of Law and Technology
A tecnologia pode salvar a gente? | A gente pode salvar a tecnologia?
A tecnologia pode salvar a gente? | A gente pode salvar a tecnologia?A tecnologia pode salvar a gente? | A gente pode salvar a tecnologia?
A tecnologia pode salvar a gente? | A gente pode salvar a tecnologia?
Makersfor Global Good Report
Makersfor Global Good ReportMakersfor Global Good Report
Makersfor Global Good Report
Apresentação olabi institucional interna - abril 17
Apresentação olabi institucional interna - abril 17Apresentação olabi institucional interna - abril 17
Apresentação olabi institucional interna - abril 17
7 Forum Nacional de Museus
7 Forum Nacional de Museus7 Forum Nacional de Museus
7 Forum Nacional de Museus
Apresentacao metashop
Apresentacao metashopApresentacao metashop
Apresentacao metashop
Pretalab- apresentação institucional
Pretalab- apresentação institucionalPretalab- apresentação institucional
Pretalab- apresentação institucional
Cultura e tecnologia - aula2
Cultura e tecnologia - aula2Cultura e tecnologia - aula2
Cultura e tecnologia - aula2
Cultura e tecnologia - aula1
Cultura e tecnologia - aula1Cultura e tecnologia - aula1
Cultura e tecnologia - aula1
Global Innovation Gathering featured in Make Magazine Germany
Global Innovation Gathering featured in Make Magazine GermanyGlobal Innovation Gathering featured in Make Magazine Germany
Global Innovation Gathering featured in Make Magazine Germany
Inovação de baixo para cima e o poder dos cidadãos
Inovação de baixo para cima e o poder dos cidadãos Inovação de baixo para cima e o poder dos cidadãos
Inovação de baixo para cima e o poder dos cidadãos
Makerspaces e hubs de inovação
Makerspaces e hubs de inovaçãoMakerspaces e hubs de inovação
Makerspaces e hubs de inovação

Recently uploaded

“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
Edge AI and Vision Alliance
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxOcean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
Matthew Sinclair
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
Pixlogix Infotech
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
Daiki Mogmet Ito
名前 です男
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Malak Abu Hammad
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Tosin Akinosho
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
Claudio Di Ciccio
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
UI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentationUI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentation
Wouter Lemaire
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
Things to Consider When Choosing a Website Developer for your Website | FODUU
Things to Consider When Choosing a Website Developer for your Website | FODUUThings to Consider When Choosing a Website Developer for your Website | FODUU
Things to Consider When Choosing a Website Developer for your Website | FODUU

Recently uploaded (20)

“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxOcean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
UI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentationUI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentation
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
Things to Consider When Choosing a Website Developer for your Website | FODUU
Things to Consider When Choosing a Website Developer for your Website | FODUUThings to Consider When Choosing a Website Developer for your Website | FODUU
Things to Consider When Choosing a Website Developer for your Website | FODUU

Searching for Interestingness in Wikipedia and Yahoo! Answers

Yelena Mejova¹, Ilaria Bordino², Mounia Lalmas³, Aristides Gionis⁴
¹ ² ³ Yahoo! Research Barcelona, Spain
⁴ Aalto University, Finland
{¹ymejova, ²bordino, ³mounia} ⁴

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
WWW 2013 Companion, May 13–17, 2013, Rio de Janeiro, Brazil.
Copyright 2013 ACM 978-1-4503-2038-2/13/05 ...$15.00.

ABSTRACT
In many cases, when browsing the Web, users are searching for specific information. Sometimes, though, users are also looking for something interesting, surprising, or entertaining. Serendipitous search puts interestingness on par with relevance. We investigate how interesting the results obtained via serendipitous search are, and what makes them so, by comparing entity networks extracted from two prominent social media sites, Wikipedia and Yahoo! Answers.

Categories and Subject Descriptors
H.4 [Information Systems Applications]: Miscellaneous

Keywords
Serendipity, Exploratory search

1. INTRODUCTION
Serendipitous search occurs when a user with no a priori or totally unrelated intentions interacts with a system and acquires useful information [4]. A system supporting such exploratory capabilities must provide results that are relevant to the user's current interest, and yet interesting, to encourage the user to continue the exploration.

In this work, we describe an entity-driven exploratory and serendipitous search system, based on enriched entity networks that are explored through random-walk computations to retrieve search results for a given query entity. We extract entity networks from two datasets: Wikipedia, a curated, collaborative online encyclopedia, and Yahoo! Answers, a more unconstrained question/answering forum, where the freedom of conversation may present advantages such as opinions, rumors, and social interest and approval.

We compare the networks extracted from the two media by performing user studies in which we juxtapose interestingness of the results retrieved for a query entity with relevance. We investigate whether interestingness depends on (i) the curated/uncurated nature of the dataset, and/or on (ii) additional characteristics of the results, such as sentiment, content quality, and popularity.

2. ENTITY NETWORKS
We extract entity networks from (i) a dump of the English Wikipedia from December 2011 consisting of 3 795 865 articles, and (ii) a sample of the English Yahoo! Answers dataset from 2010/2011, containing 67 336 144 questions and 261 770 047 answers. We use state-of-the-art methods [3, 5] to extract entities from the documents in each dataset.

Next we draw an arc between any two entities e_1 and e_2 that co-occur in one or more documents. We assign the arc a weight w_1(e_1, e_2) = DF(e_1, e_2) equal to the number of such documents (the document frequency (DF) of the entity pair). This weighting scheme tends to favor popular entities. To mitigate this effect, we measure the rarity of any entity e in a dataset by computing its inverse document frequency IDF(e) = log(N) − log(DF(e)), where N is the size of the collection, and DF(e) is the document frequency of entity e. We set a threshold on IDF to drop the arcs that involve the most popular entities. We also rescale the arc weights according to the alternative scheme w_2(e_1 → e_2) = DF(e_1, e_2) · IDF(e_2).

We use Personalized PageRank (PPR) [1] to extract the top n entities related to a query entity. We consider two scoring methods. When using the w_2 weighting scheme, we simply use the PPR scores (we dub this method IDF). When using the simpler scheme w_1, we normalize the PPR scores by the global PageRank scores (with no personalization) to penalize popular entities. We dub this method PN.

We enrich our entity networks with metadata regarding sentiment and quality of the documents. Using SentiStrength¹, we extract sentiment scores for each document. We calculate attitude and sentimentality metrics [2] to measure polarity and strength of the sentiment. Regarding quality, for Yahoo! Answers documents we count the number of points assigned by the system to the users, as an indication of expertise and thus good quality. For Wikipedia, we count the number of dispute messages inserted by editors to require revisions, as an indication of bad quality. We derive sentiment and quality scores for any entity by averaging over all the documents in which the entity appears. We use Wikimedia² statistics to estimate the popularity of entities.

3. EXPLORATORY SEARCH
We test our system using a set of 37 queries originating from 2010 and 2011 Google Zeitgeist ( zeitgeist) and having sufficient coverage in both datasets. Using one of the two algorithms (PN or IDF), we retrieve the top five entities from each dataset (YA or WP) for each query. For comparison, we consider setups consisting of 5 random entities. Note that unlike for conventional retrieval, a random baseline is feasible for a browsing task.

Figure 1: Performance. Panels (a) and (b) use a scale from 1 to 4; panel (c) shows correlation in the range 0 to 1.
  (a) Average query/result pair labels (int. to query, int. to user, relevant) for the runs YA R, YA PN, YA IDF, WP R, WP PN, WP IDF.
  (b) Average query set labels (diverse, frustrating, interesting, learn new) for the same six runs.
  (c) Correlation with the "learn something new" label (int. to query, int. to user, relevant) for the runs YA PN, YA IDF, WP PN, WP IDF.

We recruit four editors to annotate the retrieved results, asking them to evaluate each result entity for relevance, interestingness to the query, and interestingness regardless of the query, with responses falling on a scale from 1 to 4 (Figure 1(a)). Both of our retrieval methods outperform the random baseline (at p < 0.01). The gain in interestingness to the user despite the query suggests that randomly viewed information is not intrinsically interesting to the user.

Whereas performance improves from PN to IDF for YA, the interestingness to the user is hurt significantly (at p < 0.01) for WP (the other measures remain statistically the same). Note that PN uses the weighting scheme w_1, while IDF operates on the networks sparsified and weighted according to function w_2. The frequency-based approach applied by IDF mediates the mentioning of popular entities in a non-curated dataset like YA, but it fails to capture the importance of entities in a domain with restricted authorship.

Next we ask the editors to look at the five results as a whole, measuring diversity, frustration, interestingness, and the ability of the user to learn something new about the query. Figure 1(b) shows that the two random runs are highly diverse but provoke the most frustration. The most diverse and the least frustrating result sets are provided by the YA IDF run. The WP PN run also shows high diversity, but it falls with the IDF constraint. The YA IDF run gives better diversity and interestingness scores at p < 0.01 than the WP IDF run, while performing statistically the same.

To examine the relationship with the serendipity level of the content, we compute the correlation between the learn something new (LSN) label and the others. Figure 1(c) shows the LSN label to be the least correlated with interests of the user in the WP IDF run, and the most for the YA IDF run. Especially in the WP IDF run, the relevance is highly associated with the LSN label. We are witnessing two different searching experiences: in the YA IDF setup the results are diverse and popular, whereas in the WP IDF setup the results are less diverse, and the user may be less interested in the relevant content, but it will be just as educational.

Finally we analyze the metadata collected for the entities in any query-result pair: Attitude (A), Sentimentality (S), Quality (Q), Popularity (V), and Context (T). For each pair, we calculate the difference between query and result in these dimensions. For Context we compute the cosine similarity between the TF/IDF vectors of the entities. In aggregate, the best connections are between result popularity and relevance (0.234), as well as interestingness of the result to the user (0.227), followed by contextual similarity of result and query (0.214), and quality of the result entity (0.201). These features point to important aspects of a retrieval strategy which would lead to a successful serendipitous search.

Table 1: Retrieval result examples

  YA query: Kim Kardashian    Attitude   Sentiment.   Quality   Pageviews
  Perry Williams                     0            0         0          85
  Eva Longoria Parker           −0.602        2.018         6   1 450 814

  WP query: H1N1 pandemic     Attitude   Sentiment.   Quality   Pageviews
  Phaungbyin                         2            2         1         706
  2009 US flu pandemic               1            1         1      21 981

4. DISCUSSION & CONCLUSION
Beyond the aggregate measures of the previous section, the peculiarities of Yahoo! Answers and Wikipedia as social media present unique advantages and challenges for serendipitous search. For example, Table 1 shows potential YA search results for the American socialite Kim Kardashian: the actress Eva Longoria Parker (whose Wikipedia page has over a million visits in two years), and the footballer Perry Williams (who played his last game in 1993). Note the difference in attitude and sentimentality. Yahoo! Answers provides a wider spread of emotion. This data may be of use when searching for potentially serendipitous entities.

Table 1 also shows potential WP results for the query H1N1 Pandemic: a town in Burma called Phaungbyin, and the 2009 flu pandemic in the United States. We may expect a pandemic to be associated with negative sentiment, but the documents in Wikipedia do not display it.

It is our intuition that the two datasets provide a complementary view of the entities and their relations, and that a hybrid system exploiting both resources would provide the best user experience. We leave this for future work.

5. ACKNOWLEDGEMENTS
This work was partially funded by the European Union Linguistically Motivated Semantic Aggregation Engines (LiMoSINe) project³.

References
[1] G. Jeh and J. Widom. Scaling personalized web search. In WWW '03, pages 271–279. ACM, 2003.
[2] O. Kucuktunc, B. Cambazoglu, I. Weber, and H. Ferhatosmanoglu. A large-scale sentiment analysis for Yahoo! Answers. In WSDM '12, pages 633–642. ACM, 2012.
[3] D. Paranjpe. Learning document aboutness from implicit user feedback and document structure. In CIKM, 2009.
[4] E. G. Toms. Serendipitous information retrieval. In DELOS, 2000.
[5] Y. Zhou, L. Nie, O. Rouhani-Kalleh, F. Vasile, and S. Gaffney. Resolving surface forms to Wikipedia topics. In COLING, pages 1335–1343, 2010.
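The arc weighting and the two scoring methods described in Section 2 can be illustrated with a small sketch. This is not the authors' implementation: the function names, the damping factor 0.85, the fixed iteration count, the treatment of dangling nodes, and the reading of the IDF threshold as "drop an arc if either endpoint falls below it" are all assumptions made for illustration. Documents are modelled simply as sets of already-extracted entities.

```python
import math
from collections import defaultdict
from itertools import combinations


def build_network(docs):
    """Count document frequencies of entity pairs and of single entities."""
    pair_df = defaultdict(int)
    entity_df = defaultdict(int)
    for entities in docs:
        for e in entities:
            entity_df[e] += 1
        for a, b in combinations(sorted(entities), 2):
            pair_df[(a, b)] += 1
    return pair_df, entity_df


def idf(entity_df, n_docs):
    """IDF(e) = log(N) - log(DF(e))."""
    return {e: math.log(n_docs) - math.log(df) for e, df in entity_df.items()}


def weighted_arcs(pair_df, idf_scores=None, min_idf=0.0):
    """Directed arcs: w1 = DF(e1, e2) if idf_scores is None,
    otherwise w2(e1 -> e2) = DF(e1, e2) * IDF(e2) on the thresholded network."""
    arcs = defaultdict(dict)
    for (a, b), df in pair_df.items():
        for src, dst in ((a, b), (b, a)):
            if idf_scores is None:
                arcs[src][dst] = float(df)
            elif idf_scores[src] >= min_idf and idf_scores[dst] >= min_idf:
                w = df * idf_scores[dst]
                if w > 0:  # zero-weight arcs carry no random-walk mass
                    arcs[src][dst] = w
    return arcs


def pagerank(arcs, nodes, restart, alpha=0.85, iters=50):
    """Power iteration; `restart` is the teleport distribution (sums to 1)."""
    score = {v: restart.get(v, 0.0) for v in nodes}
    for _ in range(iters):
        nxt = {v: (1 - alpha) * restart.get(v, 0.0) for v in nodes}
        for u in nodes:
            out = arcs.get(u, {})
            total = sum(out.values())
            if total == 0:
                # Assumption: dangling nodes return their mass to the restart set.
                for v, p in restart.items():
                    nxt[v] += alpha * score[u] * p
            else:
                for v, w in out.items():
                    nxt[v] += alpha * score[u] * w / total
        score = nxt
    return score


def top_related(arcs, nodes, query, n=5, normalize=False):
    """Top-n entities by PPR from `query`; normalize=True divides by global PageRank (PN)."""
    ppr = pagerank(arcs, nodes, {query: 1.0})
    if normalize:
        uniform = {v: 1.0 / len(nodes) for v in nodes}
        glob = pagerank(arcs, nodes, uniform)
        ppr = {v: ppr[v] / glob[v] for v in nodes}
    ranked = sorted((v for v in nodes if v != query), key=lambda v: (-ppr[v], v))
    return ranked[:n]


# Toy collection of five "documents", each reduced to its set of entities.
toy_docs = [{"a", "b"}, {"a", "b"}, {"a", "c"}, {"a", "d"}, {"b", "c"}]
toy_pairs, toy_df = build_network(toy_docs)
toy_nodes = sorted(toy_df)
toy_idf = idf(toy_df, len(toy_docs))
pn_results = top_related(weighted_arcs(toy_pairs), toy_nodes, "b", normalize=True)
idf_results = top_related(weighted_arcs(toy_pairs, toy_idf), toy_nodes, "b")
```

In this sketch the PN method corresponds to `normalize=True` on the w_1 network, and the IDF method to plain PPR scores on the w_2 network. On collections the size of those in the paper, a sparse-matrix or distributed PageRank implementation would be needed instead of dictionary-based power iteration.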