Amélie Marian – Rutgers University, 09/30/2013
Searching Web Forums
Amélie Marian, Rutgers University
Joint work with Gayatree Ganu
Forum Popularity and Search
• Forums with the most traffic
[http://rankings.big-boards.com]
- BMW
  - 50K unique visitors/day
  - 25M posts
  - 0.6M members
- Filipino Community
- Subaru Impreza Owners
- Rome Total War
- …
- Pakistan Cricket Fan Site
- Prison Talk
- Online Money making
Despite their popularity, forums lack good search capabilities.
Patient Emotion and stRucture Search USer tool (PERSEUS) - Outline
Multi-Granularity Search
Challenges
- Unstructured text
- Background information omitted
- Discussion digression
Contributions
Return each result at varying focus levels, allowing more or less context (CIKM 2013).
Egocentric Search
Challenges
- Multiple interpersonal relations
with varying importance
Contributions
Proposed a multidimensional user similarity measure.
Used authorship to improve personalized and keyword search.
Hierarchical Model
• Hierarchy over objects at three searchable levels
– pertinent sentences, larger posts, entire discussions or threads
• Hierarchy
– captures strength of association, containment relationship
• Lower levels hold smaller objects
• Edges represent containment
• An edge weight of 2 indicates that the child's text is repeated in the parent's text
[Figure: example hierarchy over a dataset: Threads 1-2, Posts 1-4, Sentences 1-6, and Words 1-4 at the leaves (Word 1 occurs twice); three edges carry weight 2]
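A minimal Python sketch of the containment hierarchy described above. Everything concrete here is an assumption for illustration: the input is a nested dict of threads, posts, and sentence strings; tokens come from whitespace splitting; and a word-to-sentence edge weight is the word's occurrence count, so a word used twice in a sentence gets weight 2. Word nodes are shared, so one word can have several parents and the structure is a DAG rather than a tree. `Node` and `build_hierarchy` are hypothetical names.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    level: str  # "dataset", "thread", "post", "sentence" or "word"
    children: dict[str, int] = field(default_factory=dict)  # child id -> edge weight


def build_hierarchy(threads: dict[str, dict[str, list[str]]]) -> dict[str, Node]:
    """Build Dataset -> threads -> posts -> sentences -> words.

    `threads` maps thread id -> {post id -> [sentence text, ...]}.
    Assumption: a word's edge weight is its occurrence count in the sentence.
    """
    nodes = {"dataset": Node("dataset", "dataset")}
    for thread_id, posts in threads.items():
        nodes[thread_id] = Node(thread_id, "thread")
        nodes["dataset"].children[thread_id] = 1
        for post_id, sentences in posts.items():
            nodes[post_id] = Node(post_id, "post")
            nodes[thread_id].children[post_id] = 1
            for i, text in enumerate(sentences, start=1):
                sent_id = f"{post_id}/s{i}"
                nodes[sent_id] = Node(sent_id, "sentence")
                nodes[post_id].children[sent_id] = 1
                for word, count in Counter(text.lower().split()).items():
                    word_id = f"w:{word}"  # prefix avoids clashes with thread/post ids
                    nodes.setdefault(word_id, Node(word_id, "word"))  # shared leaf
                    nodes[sent_id].children[word_id] = count
    return nodes


h = build_hierarchy({"t1": {"p1": ["hair loss hair", "scarf"], "p2": ["hair"]}})
print(h["p1/s1"].children)
```

Here the word "hair" hangs under two sentences, once with weight 2, which is the kind of multi-parent, weighted structure the scoring function has to handle.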
Alternate Scoring Functions
Example Textual Results.
Query: hair loss
Top-4 Results
Post1: (A) Aromasin certainly caused my hair loss and the hair started falling 14 days after the
chemo. However, I bought myself a rather fashionable scarf to hide the baldness. I wear it everyday,
even at home. (B) Onc was shocked by my hair loss so I guess it is unusual on Aromasin. I had no
other side effects from Aromasin, no hot flashes, no stomach aches or muscle pains, no headaches or
nausea and none of the chemo brain.
Post2: (C) Probably everyone is sick of the hair loss questions, but I need help with this falling hair. I
had my first cemotherapy on 16th September, so due in one week for the 2nd treatment. (D) Surely
the hair loss can’t be starting this fast..or can it?. I was running my fingers at the nape of my neck
and about five came out in my fingers. Would love to hear from anyone else have AC done
(Doxorubicin and Cyclophosphamide) only as I am not due to have the 3rd drug (whatever that is - 12
weekly sessions) after the 4 sessions of AC. Doctor said that different people have different side
effects, so I wanted to know what you all went through. (E) Have n’t noticed hair loss elsewhere, just
the top hair and mainly at the back of my neck. (F) I thought the hair would start thining out
between 2nd and 3rd treatment, not weeks after the 1st one. I have very curly long ringlets past my
shoulders and am wondering if it would be better to just cut it short or completely shave it off. I am
willing to try anything to make this stop, does anyone have a good recommendation for a shampoo,
vitamins or supplements and (sadly) a good wig shop in downtown LA.
Post3: My suggestion is, don’t focus so much on organic. Things can be organic and very unhealthy. I
believe it when I read that nothing here is truly organic. They’re allowed a certain percentage. I think
5% of the food can not be organic and it still can carry the organic label. What you want is
nonprocessed, traditional foods. Food that comes from a farm or a farmer’s market. Small farmers are
not organic just because it is too much trouble to get the certification. Their produce is probably better
than most of the industrial organic stuff. (G) Sorry Jennifer, chemotherapy and treatment followed
by hair loss is extremely depressing and you cannot prepare enough for falling hair, especially hair
in clumps. (H) I am on femara and hair loss is non-stop, I had full head of thick hair.
• tf*idf: Sent (E) 4.742, Sent (A) 4.711, Sent (C) 4.696, Sent (G) 4.689
• BM25: Sent (D) 10.570, Sent (B) 10.458, Sent (H) 10.362, Sent (E) 10.175
• HScore: Post2 0.131, Sent (G) 0.093, Post1 0.092, Sent (H) 0.089
$$
\mathrm{Score}_{tf*idf}(t,d) = \left(1 + \log \mathrm{tf}_{t,d}\right) \times \log\frac{N}{\mathrm{df}_t} \times \frac{1}{\mathrm{CharLength}(d)}
$$
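The size-normalized tf*idf formula can be sketched in Python as below. This is a minimal sketch assuming natural logarithms, whitespace tokenization, and CharLength taken as the character length of the object's raw text; summing over query terms is a hypothetical choice, and the document-frequency counts are made up. Dividing by character length favors short objects, which is consistent with tf*idf returning only sentences in the list above.

```python
import math


def tfidf_score(term, tokens, text, doc_freq, num_objects):
    """(1 + log tf) * log(N / df) * 1 / CharLength for one term and one object.

    tokens: the object's tokens; text: its raw text (CharLength = len(text)).
    doc_freq: term -> number of objects containing it; num_objects: N.
    Assumes natural logarithms.
    """
    tf = tokens.count(term)
    df = doc_freq.get(term, 0)
    if tf == 0 or df == 0:
        return 0.0  # the term does not occur, so it contributes nothing
    return (1 + math.log(tf)) * math.log(num_objects / df) / len(text)


def query_score(terms, text, doc_freq, num_objects):
    # Hypothetical: a multi-term query scores as the sum over its terms.
    tokens = text.lower().split()
    return sum(tfidf_score(t, tokens, text, doc_freq, num_objects) for t in terms)


doc_freq = {"hair": 10, "loss": 20}  # made-up counts for illustration
short_text = "hair loss is non-stop"
long_text = short_text + " and I had a full head of thick hair"
print(query_score(["hair", "loss"], short_text, doc_freq, 100))
print(query_score(["hair", "loss"], long_text, doc_freq, 100))
```

Even though the longer text mentions "hair" twice, the shorter one scores higher because the character-length factor dominates.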
Scoring Multi-Granularity Results
Goal: Unified scoring for objects at multiple granularity levels
– largely varying sizes
– with inherent containment relationship
Hierarchical Scoring Function (HScore)
Score for node i with respect to search term t, where i has children j:
$$
\mathrm{HScore}(i,t) = \begin{cases} \ldots & \text{if } i \text{ is a non-leaf node} \\ 1 & \text{if } i \text{ is a leaf node containing } t \\ 0 & \text{if } i \text{ is a leaf node not containing } t \end{cases}
$$
• ew_ij = edge weight between parent i and child j
• P(j) = number of parents of j
• C(i) = number of children of i
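The leaf rule and the quantities the non-leaf case is defined over (ew_ij, P(j), C(i)) can be collected with a short Python sketch. The non-leaf aggregation itself is not reproduced here, so this only prepares its inputs; the dict-of-dicts graph encoding and the name `hscore_inputs` are hypothetical.

```python
from collections import defaultdict


def hscore_inputs(edges, leaf_contents, term):
    """Collect the inputs HScore is defined over.

    edges: parent id -> {child id: ew_ij}  (containment DAG)
    leaf_contents: leaf id -> set of terms in that leaf
    Returns (C, P, leaf_scores):
      C[i] = number of children of i
      P[j] = number of parents of j
      leaf_scores[leaf] = 1 if the leaf contains `term`, else 0
    """
    C = {i: len(children) for i, children in edges.items()}
    P = defaultdict(int)
    for children in edges.values():
        for j in children:
            P[j] += 1
    leaf_scores = {leaf: int(term in terms) for leaf, terms in leaf_contents.items()}
    return C, dict(P), leaf_scores


edges = {"post1": {"sent1": 1, "sent2": 1},
         "sent1": {"hair": 2, "loss": 1},
         "sent2": {"hair": 1, "scarf": 1}}
leaves = {"hair": {"hair"}, "loss": {"loss"}, "scarf": {"scarf"}}
C, P, leaf_scores = hscore_inputs(edges, leaves, "hair")
print(C, P, leaf_scores)
```

P(j) matters because a leaf such as "hair" can sit under several sentences, so its contribution has to be shared among parents rather than counted once per parent.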
Effect of Size Weighting
Parameter α on HScore
• Parameter α controls the intermixing of granularities
[Figure: number of threads, posts, and sentences in the top-20 list for HScore with size parameter α from 0 to 0.5, compared with BM25]
Multi-Granularity Result Generation
Sorted Ordering:
Post3(2.5), Post1(2.1), Post2(2), Sent1(1.6), Sent2(1.5), Sent3(1.4), Sent4(1.3),
Sent6(0.4), Sent5(0.1), Post4(0.1), Thread1(0.1), Thread2(0.1)
For result size k=4, optimizing for the sum of scores:
• Overlap: {Post3, Post1, Post2, Sent1} Sum Score = 8.2, but Sent1 is contained in Post1, so its 1.6 is counted twice
• Greedy: {Post3, Post1, Post2, Sent6} Sum Score = 7.0
• Best: {Post3, Post2, Sent1, Sent2} Sum Score = 7.6
In 33% of sample queries, at least 3 of the top-10 results overlapped.
[Figure: example hierarchy with node scores: Thread1 (0.1), Thread2 (0.1); Post1 (2.1), Post2 (2), Post3 (2.5), Post4 (0.1); Sent1 (1.6), Sent2 (1.5), Sent3 (1.4), Sent4 (1.3), Sent5 (0.1), Sent6 (0.4)]
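The overlap and greedy sums above can be checked with a small Python sketch. The figure's edges are not recoverable from this transcript, so the containment below (Sent1-2 in Post1, Sent3-4 in Post2, Sent5 in Post3, Sent6 in Post4, Posts 1-2 in Thread1, Posts 3-4 in Thread2) is an assumption chosen to be consistent with the set sums on the slide. Two results are treated as overlapping when one contains the other.

```python
# Node scores from the sorted ordering above.
SCORES = {"Post3": 2.5, "Post1": 2.1, "Post2": 2.0, "Sent1": 1.6, "Sent2": 1.5,
          "Sent3": 1.4, "Sent4": 1.3, "Sent6": 0.4, "Sent5": 0.1, "Post4": 0.1,
          "Thread1": 0.1, "Thread2": 0.1}
# Assumed containment (child -> parent); not given explicitly in the text.
PARENT = {"Post1": "Thread1", "Post2": "Thread1", "Post3": "Thread2", "Post4": "Thread2",
          "Sent1": "Post1", "Sent2": "Post1", "Sent3": "Post2", "Sent4": "Post2",
          "Sent5": "Post3", "Sent6": "Post4"}


def ancestors(node):
    """All nodes that contain `node`, walking up the hierarchy."""
    found = set()
    while node in PARENT:
        node = PARENT[node]
        found.add(node)
    return found


def overlaps(a, b):
    # Two results overlap when one contains the other.
    return a in ancestors(b) or b in ancestors(a)


def ranked():
    # Decreasing score; Python's sort is stable, so ties keep the listed order.
    return sorted(SCORES, key=SCORES.get, reverse=True)


def naive_top_k(k):
    return ranked()[:k]


def greedy_top_k(k):
    chosen = []
    for node in ranked():
        if not any(overlaps(node, c) for c in chosen):
            chosen.append(node)
            if len(chosen) == k:
                break
    return chosen


for name, picks in (("overlap", naive_top_k(4)), ("greedy", greedy_top_k(4))):
    print(name, picks, round(sum(SCORES[n] for n in picks), 2))
```

Greedy commits to Post1 and Post2 early, which blocks all four high-scoring sentences and leaves only Sent6 for the last slot.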
Multi-Granularity Result Generation
Goal: generate a non-overlapping result set that maximizes “quality”
• Quality = sum of the scores of all results in the set
• Maximum-weight independent set problem (NP-hard)
• Existing algorithm: Lexicographic All Independent Sets (LAIS) outputs maximal independent sets with polynomial delay, in a specific order
Optimal Algorithm for k-set (OAKS)
• Fix the node ordering by decreasing score
• Efficient OAKS algorithm (typically k << n):
– Start with the first k-sized independent set, i.e., the greedy one
– Branch from nodes preceding the k-th node of the set; check if the result is maximal
– Find new k-sized maximal sets and save them in a priority queue
– Reject sets from the priority queue whose starting node occurs after the current best set's k-th node
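For comparison, the best k-sized non-overlapping set on the running example can be found with a plain branch-and-bound search in Python. This is not the OAKS algorithm itself: it only shares the idea of scanning nodes in decreasing-score order and discarding branches that cannot beat the current best set. The containment is the same assumption used for the greedy check (Sent1-2 in Post1, Sent3-4 in Post2, Sent5 in Post3, Sent6 in Post4, Posts 1-2 in Thread1, Posts 3-4 in Thread2), and `best_k_non_overlapping` is a hypothetical name.

```python
SCORES = {"Post3": 2.5, "Post1": 2.1, "Post2": 2.0, "Sent1": 1.6, "Sent2": 1.5,
          "Sent3": 1.4, "Sent4": 1.3, "Sent6": 0.4, "Sent5": 0.1, "Post4": 0.1,
          "Thread1": 0.1, "Thread2": 0.1}
# Assumed containment (child -> parent), consistent with the slide's set sums.
PARENT = {"Post1": "Thread1", "Post2": "Thread1", "Post3": "Thread2", "Post4": "Thread2",
          "Sent1": "Post1", "Sent2": "Post1", "Sent3": "Post2", "Sent4": "Post2",
          "Sent5": "Post3", "Sent6": "Post4"}


def ancestors(node):
    found = set()
    while node in PARENT:
        node = PARENT[node]
        found.add(node)
    return found


def overlaps(a, b):
    return a in ancestors(b) or b in ancestors(a)


def best_k_non_overlapping(k):
    """Exact maximum-sum k-set of pairwise non-overlapping nodes (branch and bound)."""
    order = sorted(SCORES, key=SCORES.get, reverse=True)
    vals = [SCORES[n] for n in order]
    best_total, best_set = float("-inf"), []

    def search(idx, chosen, total):
        nonlocal best_total, best_set
        need = k - len(chosen)
        if need == 0:
            if total > best_total:
                best_total, best_set = total, list(chosen)
            return
        if idx == len(order):
            return
        # Upper bound: add the `need` largest remaining scores, ignoring overlap.
        if total + sum(vals[idx:idx + need]) <= best_total:
            return
        node = order[idx]
        if not any(overlaps(node, c) for c in chosen):
            chosen.append(node)          # branch 1: take this node
            search(idx + 1, chosen, total + vals[idx])
            chosen.pop()
        search(idx + 1, chosen, total)   # branch 2: skip this node

    search(0, [], 0.0)
    return best_total, best_set


total, chosen = best_k_non_overlapping(4)
print(chosen, round(total, 2))
```

The greedy set (7.0) is found first and then serves as the pruning threshold; the search later reaches {Post3, Post2, Sent1, Sent2} by skipping Post1.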
OAKS
Sorted Ordering:
Post3(2.5), Post1(2.1), Post2(2), Sent1(1.6), Sent2(1.5), Sent3(1.4), Sent4(1.3),
Sent6(0.4), Sent5(0.1), Post4(0.1), Thread1(0.1), Thread2(0.1)
For k=4, Greedy = {Post3, Post1, Post2, Sent6} SumScore=7.0
In the 1st iteration, OAKS finds:
{Post3, Post2, Sent1, Sent2} SumScore = 7.6
{Post3, Post1, Sent3, Sent4} SumScore = 7.3
It branches from the nodes before Sent6, i.e., Sent1, Sent2, Sent3, Sent4.
Branch from Sent1: removing all nodes adjacent to Sent1 → {Post3, Post2, Sent1}
Maximal on the first 4 nodes? YES!
Then complete to size k and insert into the queue → {Post3, Post2, Sent1, Sent2}
[Figure: the same example hierarchy and node scores]
Evaluating OAKS Algorithm
Comparing OAKS Runtime
Small overhead for practical k (k = 20):
• Scoring time = 0.96 sec
• OAKS result set generation time = 0.09 sec
| Word frequency | Sets evaluated (LAIS) | Sets evaluated (OAKS) | Run time, sec (LAIS) | Run time, sec (OAKS) |
|---|---|---|---|---|
| 20-30 | 57.59 | 8.12 | 0.78 | 0.12 |
| 30-40 | 102.07 | 5.06 | 7.88 | 0.01 |
| 40-50 | 158.80 | 5.88 | 26.94 | 0.01 |
| 50-60 | 410.18 | 6.30 | 82.20 | 0.02 |
| 60-70 | 716.40 | 5.26 | 77.61 | 0.01 |
| 70-80 | 896.59 | 8.30 | 143.33 | 0.04 |
Comparing LAIS and OAKS
– 100 relatively infrequent queries with corpus frequency in the ranges 20-30, 30-40, …
– OAKS is very efficient; the time it requires depends on k
OAKS improves over the greedy SumScore in 31% of queries @top20.
Dataset and Evaluation Setting
• Data collected from breastcancer.org
– 31K threads, 301K posts, 1.8M unique sentences, 46K keywords
• 18 Sample Queries
– e.g., broccoli, herceptin side effects, emotional meltdown, scarf or
wig, shampoo recommendation …
• Experimental Search Strategies – top20 results
- Mixed-Hierarchy: Optimal mixed-granularity result.
- Posts-Hierarchy: Hierarchical scoring of posts only.
- Posts-tf*idf: Existing traditional search.
- Mixed-BM25
Evaluating Perceived Relevance
Graded Relevance Scale
Exactly relevant answer,
Relevant but too broad,
Relevant but too narrow,
Partially relevant answer,
Not Relevant
Crowdsourced relevance using Mechanical Turk
- Over 7 annotations
- Quality control: honey-pot questions
- EM algorithm for consensus
Mixed-Hierarchy results for the query "shampoo recommendation":
| Rank | α = 0.1 | α = 0.2 | α = 0.3 | α = 0.4 |
|---|---|---|---|---|
| 1 | Rel Broad | Rel Broad | Rel Broad | Partial |
| 2 | Rel Broad | Rel Broad | Rel Broad | Partial |
| 3 | Rel Broad | Rel Broad | Rel Broad | Partial |
| 4 | Rel Broad | Rel Broad | Exactly Rel | Rel Broad |
| 5 | Rel Broad | Rel Broad | Exactly Rel | Partial |
| 6 | Exactly Rel | Exactly Rel | Rel Narrow | Rel Narrow |
| 7 | Rel Broad | Exactly Rel | Rel Narrow | Not Rel |
| 8 | Rel Broad | Rel Broad | Not Rel | Partial |
| 9 | Rel Broad | Rel Narrow | Rel Broad | Partial |
| 10 | Exactly Rel | Rel Narrow | Partial | Rel Narrow |
| 11 | Rel Broad | Rel Broad | Exactly Rel | Not Rel |
| 12 | Rel Broad | Rel Broad | Exactly Rel | Not Rel |
| 13 | Rel Broad | Exactly Rel | Partial | Not Rel |
| 14 | Not Rel | Exactly Rel | Rel Narrow | Partial |
| 15 | Not Rel | Exactly Rel | Not Rel | Rel Broad |
| 16 | Not Rel | Rel Broad | Rel Narrow | Not Rel |
| 17 | Exactly Rel | Rel Broad | Exactly Rel | Not Rel |
| 18 | Exactly Rel | Exactly Rel | Partial | Partial |
| 19 | Not Rel | Rel Broad | Rel Narrow | Not Rel |
| 20 | Not Rel | Exactly Rel | Partial | Not Rel |
Evaluating Perceived Relevance
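Mean Average Precision (MAP)
| Search system | MAP@ | α = 0.1 | α = 0.2 | α = 0.3 | α = 0.4 |
|---|---|---|---|---|---|
| Mixed-Hierarchy | 10 | 0.98 | 0.98 | 0.90 | 0.70 |
| Mixed-Hierarchy | 20 | 0.97 | 0.95 | 0.85 | 0.66 |
| Posts-Hierarchy | 10 | 0.76 | 0.75 | 0.77 | 0.78 |
| Posts-Hierarchy | 20 | 0.72 | 0.71 | 0.73 | 0.75 |
| Posts-tf*idf | 10 | 0.76 | 0.73 | 0.76 | 0.76 |
| Posts-tf*idf | 20 | 0.74 | 0.72 | 0.72 | 0.73 |
Mixed-BM25 (b = 0.75, k1 = 1.2): MAP@10 = 0.55, MAP@20 = 0.54
Mixed-Hierarchy clearly outperforms the post-only methods.
Users perceive higher relevance for mixed-granularity results.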
[Figure: Discounted Cumulative Gain @20 (0 to 35) for α = 0.1 to 0.4 and for BM25 (b = 0.75, k1 = 1.2); series: Mixed-Hierarchy, Posts-Hierarchy, Posts-tf*idf, Mixed-BM25]
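The MAP and DCG@20 numbers above come from graded crowd labels. A minimal Python sketch of both metrics follows. The gain assigned to each grade and the rule that every grade except "Not Rel" counts as relevant for AP are hypothetical choices, not the study's; AP here is normalized by the relevant results found in the top k because the total number of relevant results per query is not given. The sketch illustrates the metrics and is not meant to reproduce the reported values.

```python
import math

# Hypothetical gains for the five-point scale; the study's own mapping is not given.
GAIN = {"Exactly Rel": 3, "Rel Broad": 2, "Rel Narrow": 2, "Partial": 1, "Not Rel": 0}


def dcg_at_k(labels, k):
    """DCG@k = sum over ranks i = 1..k of gain_i / log2(i + 1)."""
    return sum(GAIN[label] / math.log2(rank + 1)
               for rank, label in enumerate(labels[:k], start=1))


def average_precision_at_k(labels, k):
    """AP@k with binary relevance (assumption: every grade except 'Not Rel' is relevant),
    normalized by the number of relevant results found in the top k."""
    hits, precision_sum = 0, 0.0
    for rank, label in enumerate(labels[:k], start=1):
        if label != "Not Rel":
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(label_lists, k):
    """MAP@k: mean of AP@k over queries."""
    return sum(average_precision_at_k(labels, k) for labels in label_lists) / len(label_lists)


labels = ["Exactly Rel", "Not Rel", "Partial"]
print(dcg_at_k(labels, 3), average_precision_at_k(labels, 3))
```

Feeding one column of the relevance table (one α setting) per query into `mean_average_precision` gives a MAP@20 under these assumed rules.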
EgoCentric Search
• The previous technique does not take the authorship of posts into account
• Some forum participants are similar: they share topics of interest or have the same needs, though not necessarily at the same time
– Rank similar authors’ posts higher for personalized search
• Some forum participants are experts: prolific and knowledgeable
– Expert opinions carry more weight in keyword search
• Use an author score to enhance personalized and keyword search
Author Score
• Forum participants have several reasons to be linked
• Build a multidimensional heterogeneous graph over authors
incorporating many relations
• However, users assign different importance to different relations
[Figure: heterogeneous graph linking authors to topics (weights W(a,t)), queries to topics (W(q,t)), and authors to each other (W(a1,a2)); author-author links come from user profiles (location, age, cancer stage, treatment, …), co-participation, and explicit references]
Contributions
Critical problem for leveraging authorship for search:
Incorporating multiple user relations with varying importance
learned egocentrically from user behavior
Outline:
• Author score computation using multidimensional graph
• Personalized predictions of user interactions: authors most
likely to provide answers
• Re-ranking results of keyword search using author expertise
Multi-Dimensional Random Walks (MRW)
• Random Walks (RW) for finding the most influential users
– P(t+1) = M × P(t), iterated until convergence
– M = α(A + D) + (1 − α)E, with relation matrix A, matrix D for dangling nodes, and uniform matrix E; α is usually set to 0.85
• Rooted RW for node similarity
– Teleport back to the root node with probability (1 − α)
– Computes the similarity of all nodes with respect to the root node
• Multidimensional RW for heterogeneous networks:
– Transition matrix computed as
$$
A = \theta_1 A_1 + \theta_2 A_2 + \dots + \theta_n A_n, \quad \text{where } \sum_i \theta_i = 1 \text{ and all } \theta_i \ge 0
$$
– Egocentric weights for root node r:
$$
\theta_i(r) = \frac{\sum_{m \in A_i} ew_{A_i}(r, m)}{\sum_{A_k} \sum_{j \in A_k} ew_{A_k}(r, j)}
$$
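Example: three nodes a, b, c with edge weights 2 (a to b) and 3 (b to c); c has no out-links (dangling). Rows and columns are ordered a, b, c:
$$
A = \begin{bmatrix} 0 & 0 & 0 \\ 2 & 0 & 0 \\ 0 & 3 & 0 \end{bmatrix}, \quad
D = \begin{bmatrix} 0 & 0 & 0.33 \\ 0 & 0 & 0.33 \\ 0 & 0 & 0.33 \end{bmatrix}, \quad
E = \begin{bmatrix} 0.33 & 0.33 & 0.33 \\ 0.33 & 0.33 & 0.33 \\ 0.33 & 0.33 & 0.33 \end{bmatrix}
$$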
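A pure-Python sketch of the walks on this slide, run on the three-node example. Assumptions: column j of A holds node j's out-edge weights (the convention under which D treats c as dangling); A is column-normalized before use, which the raw weights 2 and 3 imply but the slide does not show; dangling mass is spread uniformly for rooted walks too; and a root with no edges at all falls back to uniform relation weights. The function names are hypothetical, and this is an illustration of the update rule rather than the authors' implementation.

```python
def column_normalize(A):
    """Column-stochastic copy of A; all-zero (dangling) columns stay zero."""
    n = len(A)
    W = [[0.0] * n for _ in range(n)]
    for j in range(n):
        total = sum(A[i][j] for i in range(n))
        if total > 0:
            for i in range(n):
                W[i][j] = A[i][j] / total
    return W


def random_walk(A, alpha=0.85, root=None, tol=1e-12, max_iter=1000):
    """Power iteration P(t+1) = M P(t) with M = alpha (A + D) + (1 - alpha) E.

    root=None -> uniform teleport (global influence scores).
    root=r    -> teleport back to node r with probability 1 - alpha (rooted RW).
    Dangling nodes spread their mass uniformly, like the D matrix.
    """
    n = len(A)
    W = column_normalize(A)
    dangling = [j for j in range(n) if all(W[i][j] == 0 for i in range(n))]
    teleport = [1.0 / n] * n if root is None else [1.0 if i == root else 0.0 for i in range(n)]
    p = [1.0 / n] * n
    for _ in range(max_iter):
        dangling_mass = sum(p[j] for j in dangling)
        nxt = [alpha * (sum(W[i][j] * p[j] for j in range(n)) + dangling_mass / n)
               + (1 - alpha) * teleport[i]
               for i in range(n)]
        if sum(abs(x - y) for x, y in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


def egocentric_weights(relations, r):
    """Share of root r's total edge weight lying in each relation matrix."""
    totals = [sum(A_k[m][r] for m in range(len(A_k))) for A_k in relations]
    grand = sum(totals)
    if grand == 0:  # hypothetical fallback when r has no edges
        return [1.0 / len(relations)] * len(relations)
    return [t / grand for t in totals]


def combine(relations, weights):
    """A = sum_i weight_i * A_i (weights non-negative, summing to 1)."""
    n = len(relations[0])
    return [[sum(w * A_k[i][j] for w, A_k in zip(weights, relations)) for j in range(n)]
            for i in range(n)]


A_example = [[0, 0, 0],
             [2, 0, 0],
             [0, 3, 0]]
print(random_walk(A_example))          # global scores for a, b, c
print(random_walk(A_example, root=2))  # similarity to c
```

For MRW, `combine(relations, egocentric_weights(relations, r))` would produce the per-root transition matrix that is then passed to `random_walk(..., root=r)`.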
Personalized Answer Search
• Link prediction by leveraging user similarities:
– Given participant behavior, find similar users to the user asking question
– Predict who will respond to this question
• Learn similarities from the first 90% of threads, used for training
• Relations used:
– Topics covered in text, Co-participation in threads,
Signature profiles, Proximity of posts
• MRW similarity compared with baselines:
– Single relations
– PathSim:
• Existing approach for heterogeneous networks
• Predefined paths of fixed length
• No dynamic choice of path
Link prediction enables
suggesting which threads
or which users to follow
Predicting User Interactions
[Figure: MAP for link prediction (0 to 0.5) as a function of the number of top-K similar participants, K = 10 to 100]
Multidimensional RW has the best prediction performance.
Predicting User Interactions
• Leverage content of the initial post to find users who are
experts on the question
– TopicScore computed as cosine similarity between author’s history and
initial post
– UserScore = β * MRWScore + (1- β) * TopicScore
| Neighbors | β = 0 (purely topical expertise) | β = 0.1 | β = 0.2 | β = 1 (purely MRW) |
|---|---|---|---|---|
| Top 5 | 0.52 | 0.64 (8%) | 0.61 (4%) | 0.59 |
| Top 10 | 0.31 | 0.50 (8%) | 0.49 (5%) | 0.46 |
| Top 15 | 0.24 | 0.43 (8%) | 0.42 (6%) | 0.40 |
| Top 20 | 0.20 | 0.39 (6%) | 0.39 (7%) | 0.37 |
Values are MAP; percentages give the improvement over purely MRW.
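A minimal Python sketch of the UserScore combination above. Assumptions (not from the talk): term-frequency vectors from whitespace tokens, an author's history represented as the concatenation of their past posts, and MRW scores already on a 0-1 scale comparable to cosine similarity. `rank_candidate_answerers` is a hypothetical name.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_candidate_answerers(initial_post, author_history, mrw_score, beta=0.1):
    """UserScore = beta * MRWScore + (1 - beta) * TopicScore, highest first.

    author_history: author -> list of that author's past post texts.
    mrw_score: author -> MRW similarity to the asker (assumed on a 0-1 scale).
    """
    question = Counter(initial_post.lower().split())
    scores = {}
    for author, posts in author_history.items():
        topic = cosine_similarity(question, Counter(" ".join(posts).lower().split()))
        scores[author] = beta * mrw_score.get(author, 0.0) + (1 - beta) * topic
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


history = {"ann": ["shampoo for hair loss"], "bob": ["broccoli recipes"]}
mrw = {"ann": 0.1, "bob": 0.9}
print(rank_candidate_answerers("hair loss shampoo", history, mrw, beta=0.1))
```

With β = 0 the ranking is purely topical and with β = 1 it follows MRW alone, matching the two extreme columns of the table.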
Enhanced Keyword Search
• Non-rooted RW to find the most influential expert users
• Re-rank the top-k results of IR scoring using author scores
• Final score of post = ω * IR_score(λ) + (1 - ω) * Authority_score
– Posts only, tf*idf scoring with size parameter λ
[Figure: MAP@10 (0.72 to 0.81) as a function of the tradeoff parameter ω from 0 to 1, for IR scores with λ = 0.1 and λ = 0.2; annotated improvements of 4% and 5%]
Re-ranking search results with author scores yields higher MAP relevance.
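The re-ranking rule above can be sketched in Python as follows. Assumptions: results arrive as (post_id, author, ir_score) tuples from the tf*idf ranking, authority scores come from the non-rooted RW keyed by author, and both scores are already on comparable scales (the slide does not say how they are normalized). `rerank_by_authority` and the sample data are hypothetical.

```python
def rerank_by_authority(results, authority, omega=0.5, top_k=20):
    """Re-rank the top-k IR results: final = omega * IR_score + (1 - omega) * Authority_score.

    results: list of (post_id, author, ir_score).
    authority: author -> non-rooted RW score (missing authors get 0).
    """
    top = sorted(results, key=lambda r: r[2], reverse=True)[:top_k]
    rescored = [(post_id, omega * ir + (1 - omega) * authority.get(author, 0.0))
                for post_id, author, ir in top]
    return sorted(rescored, key=lambda r: r[1], reverse=True)


results = [("p1", "newcomer", 1.0), ("p2", "expert", 0.8)]
authority = {"newcomer": 0.0, "expert": 1.0}
print(rerank_by_authority(results, authority, omega=1.0))  # IR order only
print(rerank_by_authority(results, authority, omega=0.5))  # authority lifts p2
```

Only the top-k IR results are re-scored, so authority can reorder them but never pulls in a post that IR scoring missed.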
Patient Emotion and stRucture Search USer tool (PERSEUS) - Conclusions
• Designed a hierarchical model and scoring function that generate search results at several granularities of web forum objects.
• Proposed the OAKS algorithm for finding the best non-overlapping result set.
• Conducted extensive user studies showing that a mixed collection of granularities yields better relevance than post-only results.
• Combined multiple relations linking users to compute user similarities.
• Enhanced search results using multidimensional author similarity.
• Future Directions:
– Multi-granular search on web pages, blogs, emails; dynamic focus-level selection.
– Search in and out of context over dialogue, interviews, Q&A.
– Optimal result set selection for targeted advertising and result diversification.
– Time-sensitive recommendations: changing friendships, progressive search needs.
Thank you!

 
Call Girls Kanakapura Road Just Call 7001305949 Top Class Call Girl Service A...
Call Girls Kanakapura Road Just Call 7001305949 Top Class Call Girl Service A...Call Girls Kanakapura Road Just Call 7001305949 Top Class Call Girl Service A...
Call Girls Kanakapura Road Just Call 7001305949 Top Class Call Girl Service A...narwatsonia7
 
Call Girls Hosur Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Hosur Just Call 7001305949 Top Class Call Girl Service AvailableCall Girls Hosur Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Hosur Just Call 7001305949 Top Class Call Girl Service Availablenarwatsonia7
 
Low Rate Call Girls Pune Esha 9907093804 Short 1500 Night 6000 Best call girl...
Low Rate Call Girls Pune Esha 9907093804 Short 1500 Night 6000 Best call girl...Low Rate Call Girls Pune Esha 9907093804 Short 1500 Night 6000 Best call girl...
Low Rate Call Girls Pune Esha 9907093804 Short 1500 Night 6000 Best call girl...Miss joya
 

Recently uploaded (20)

Housewife Call Girls Hoskote | 7001305949 At Low Cost Cash Payment Booking
Housewife Call Girls Hoskote | 7001305949 At Low Cost Cash Payment BookingHousewife Call Girls Hoskote | 7001305949 At Low Cost Cash Payment Booking
Housewife Call Girls Hoskote | 7001305949 At Low Cost Cash Payment Booking
 
Russian Call Girls Chickpet - 7001305949 Booking and charges genuine rate for...
Russian Call Girls Chickpet - 7001305949 Booking and charges genuine rate for...Russian Call Girls Chickpet - 7001305949 Booking and charges genuine rate for...
Russian Call Girls Chickpet - 7001305949 Booking and charges genuine rate for...
 
Sonagachi Call Girls Services 9907093804 @24x7 High Class Babes Here Call Now
Sonagachi Call Girls Services 9907093804 @24x7 High Class Babes Here Call NowSonagachi Call Girls Services 9907093804 @24x7 High Class Babes Here Call Now
Sonagachi Call Girls Services 9907093804 @24x7 High Class Babes Here Call Now
 
Mumbai Call Girls Service 9910780858 Real Russian Girls Looking Models
Mumbai Call Girls Service 9910780858 Real Russian Girls Looking ModelsMumbai Call Girls Service 9910780858 Real Russian Girls Looking Models
Mumbai Call Girls Service 9910780858 Real Russian Girls Looking Models
 
Asthma Review - GINA guidelines summary 2024
Asthma Review - GINA guidelines summary 2024Asthma Review - GINA guidelines summary 2024
Asthma Review - GINA guidelines summary 2024
 
Russian Call Girl Brookfield - 7001305949 Escorts Service 50% Off with Cash O...
Russian Call Girl Brookfield - 7001305949 Escorts Service 50% Off with Cash O...Russian Call Girl Brookfield - 7001305949 Escorts Service 50% Off with Cash O...
Russian Call Girl Brookfield - 7001305949 Escorts Service 50% Off with Cash O...
 
VIP Call Girls Lucknow Nandini 7001305949 Independent Escort Service Lucknow
VIP Call Girls Lucknow Nandini 7001305949 Independent Escort Service LucknowVIP Call Girls Lucknow Nandini 7001305949 Independent Escort Service Lucknow
VIP Call Girls Lucknow Nandini 7001305949 Independent Escort Service Lucknow
 
Kolkata Call Girls Services 9907093804 @24x7 High Class Babes Here Call Now
Kolkata Call Girls Services 9907093804 @24x7 High Class Babes Here Call NowKolkata Call Girls Services 9907093804 @24x7 High Class Babes Here Call Now
Kolkata Call Girls Services 9907093804 @24x7 High Class Babes Here Call Now
 
See the 2,456 pharmacies on the National E-Pharmacy Platform
See the 2,456 pharmacies on the National E-Pharmacy PlatformSee the 2,456 pharmacies on the National E-Pharmacy Platform
See the 2,456 pharmacies on the National E-Pharmacy Platform
 
Call Girls Thane Just Call 9910780858 Get High Class Call Girls Service
Call Girls Thane Just Call 9910780858 Get High Class Call Girls ServiceCall Girls Thane Just Call 9910780858 Get High Class Call Girls Service
Call Girls Thane Just Call 9910780858 Get High Class Call Girls Service
 
Call Girl Service Bidadi - For 7001305949 Cheap & Best with original Photos
Call Girl Service Bidadi - For 7001305949 Cheap & Best with original PhotosCall Girl Service Bidadi - For 7001305949 Cheap & Best with original Photos
Call Girl Service Bidadi - For 7001305949 Cheap & Best with original Photos
 
Housewife Call Girls Bangalore - Call 7001305949 Rs-3500 with A/C Room Cash o...
Housewife Call Girls Bangalore - Call 7001305949 Rs-3500 with A/C Room Cash o...Housewife Call Girls Bangalore - Call 7001305949 Rs-3500 with A/C Room Cash o...
Housewife Call Girls Bangalore - Call 7001305949 Rs-3500 with A/C Room Cash o...
 
Aspirin presentation slides by Dr. Rewas Ali
Aspirin presentation slides by Dr. Rewas AliAspirin presentation slides by Dr. Rewas Ali
Aspirin presentation slides by Dr. Rewas Ali
 
Call Girls Jayanagar Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Jayanagar Just Call 7001305949 Top Class Call Girl Service AvailableCall Girls Jayanagar Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Jayanagar Just Call 7001305949 Top Class Call Girl Service Available
 
call girls in green park DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in green park  DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️call girls in green park  DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in green park DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
 
Call Girls Service Nandiambakkam | 7001305949 At Low Cost Cash Payment Booking
Call Girls Service Nandiambakkam | 7001305949 At Low Cost Cash Payment BookingCall Girls Service Nandiambakkam | 7001305949 At Low Cost Cash Payment Booking
Call Girls Service Nandiambakkam | 7001305949 At Low Cost Cash Payment Booking
 
Call Girl Bangalore Nandini 7001305949 Independent Escort Service Bangalore
Call Girl Bangalore Nandini 7001305949 Independent Escort Service BangaloreCall Girl Bangalore Nandini 7001305949 Independent Escort Service Bangalore
Call Girl Bangalore Nandini 7001305949 Independent Escort Service Bangalore
 
Call Girls Kanakapura Road Just Call 7001305949 Top Class Call Girl Service A...
Call Girls Kanakapura Road Just Call 7001305949 Top Class Call Girl Service A...Call Girls Kanakapura Road Just Call 7001305949 Top Class Call Girl Service A...
Call Girls Kanakapura Road Just Call 7001305949 Top Class Call Girl Service A...
 
Call Girls Hosur Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Hosur Just Call 7001305949 Top Class Call Girl Service AvailableCall Girls Hosur Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Hosur Just Call 7001305949 Top Class Call Girl Service Available
 
Low Rate Call Girls Pune Esha 9907093804 Short 1500 Night 6000 Best call girl...
Low Rate Call Girls Pune Esha 9907093804 Short 1500 Night 6000 Best call girl...Low Rate Call Girls Pune Esha 9907093804 Short 1500 Night 6000 Best call girl...
Low Rate Call Girls Pune Esha 9907093804 Short 1500 Night 6000 Best call girl...
 

Searching Web Forums

  • 1. Amélie Marian – Rutgers University, 09/30/2013
    Searching Web Forums
    Amélie Marian, Rutgers University
    Joint work with Gayatree Ganu
  • 2. Forum Popularity and Search
    • Forums with the most traffic [http://rankings.big-boards.com]:
      - BMW: 50K unique visitors/day, 25M posts, 0.6M members
      - Filipino Community
      - Subaru Impreza Owners
      - Rome Total War
      - …
      - Pakistan Cricket Fan Site
      - Prison Talk
      - Online Money Making
    • Despite their popularity, forums lack good search capabilities.
  • 3. Patient Emotion and stRucture Search USer tool (PERSEUS): Outline
    • Multi-Granularity Search
      – Challenges: unstructured text; background information omitted; discussion digression.
      – Contributions: return each result at varying focus levels, allowing more or less context (CIKM 2013).
    • Egocentric Search
      – Challenges: multiple interpersonal relations with varying importance.
      – Contributions: proposed a multidimensional user similarity measure; use authorship to improve personalized and keyword search.
  • 4. Hierarchical Model
    • Hierarchy over objects at three searchable levels: pertinent sentences, larger posts, and entire discussions or threads.
    • The hierarchy captures the strength of association and the containment relationship:
      – lower levels hold smaller objects;
      – an edge represents containment;
      – an edge weight of 2 indicates that the text in the child is repeated in the text of the parent.
    [Figure: example hierarchy over the Dataset, Threads 1 and 2, Posts 1 to 4, Sentences 1 to 6, and Words; three edges carry weight 2.]
  • 5. Alternate Scoring Functions
    Example text (query: hair loss):
    Post1: (A) Aromasin certainly caused my hair loss and the hair started falling 14 days after the chemo. However, I bought myself a rather fashionable scarf to hide the baldness. I wear it everyday, even at home. (B) Onc was shocked by my hair loss so I guess it is unusual on Aromasin. I had no other side effects from Aromasin, no hot flashes, no stomach aches or muscle pains, no headaches or nausea and none of the chemo brain.
    Post2: (C) Probably everyone is sick of the hair loss questions, but I need help with this falling hair. I had my first cemotherapy on 16th September, so due in one week for the 2nd treatment. (D) Surely the hair loss can’t be starting this fast..or can it?. I was running my fingers at the nape of my neck and about five came out in my fingers. Would love to hear from anyone else have AC done (Doxorubicin and Cyclophosphamide) only as I am not due to have the 3rd drug (whatever that is - 12 weekly sessions) after the 4 sessions of AC. Doctor said that different people have different side effects, so I wanted to know what you all went through. (E) Haven’t noticed hair loss elsewhere, just the top hair and mainly at the back of my neck. (F) I thought the hair would start thining out between 2nd and 3rd treatment, not weeks after the 1st one. I have very curly long ringlets past my shoulders and am wondering if it would be better to just cut it short or completely shave it off. I am willing to try anything to make this stop, does anyone have a good recommendation for a shampoo, vitamins or supplements and (sadly) a good wig shop in downtown LA.
    Post3: My suggestion is, don’t focus so much on organic. Things can be organic and very unhealthy. I believe it when I read that nothing here is truly organic. They’re allowed a certain percentage. I think 5% of the food can not be organic and it still can carry the organic label. What you want is nonprocessed, traditional foods. Food that comes from a farm or a farmer’s market. Small farmers are not organic just because it is too much trouble to get the certification. Their produce is probably better than most of the industrial organic stuff. (G) Sorry Jennifer, chemotherapy and treatment followed by hair loss is extremely depressing and you cannot prepare enough for falling hair, especially hair in clumps. (H) I am on femara and hair loss is non-stop, I had full head of thick hair.
    Top-4 results per scoring function (scores in parentheses):
      Rank | tf*idf           | BM25              | HScore
      1    | Sent (E) (4.742) | Sent (D) (10.570) | Post2 (0.131)
      2    | Sent (A) (4.711) | Sent (B) (10.458) | Sent (G) (0.093)
      3    | Sent (C) (4.696) | Sent (H) (10.362) | Post1 (0.092)
      4    | Sent (G) (4.689) | Sent (E) (10.175) | Sent (H) (0.089)
    $$\mathrm{Score}_{tf\cdot idf}(t,d) = \left(1 + \log tf_{t,d}\right) \cdot \log\frac{N}{df_t} \cdot \frac{1}{\mathrm{CharLength}(d)}$$
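The size-normalized tf*idf score on this slide can be computed directly. This is a minimal sketch, assuming whitespace tokenization, lowercasing, natural logarithms (the slide does not state the base), and per-term scores summed over a multi-term query; `tfidf_score` and the toy corpus statistics are hypothetical.

```python
import math
from collections import Counter

def tfidf_score(query_terms, doc_text, doc_freq, num_docs):
    """Sum over query terms of (1 + log tf) * log(N / df), divided by the
    character length of the text. Assumes lowercased whitespace tokens."""
    tf = Counter(doc_text.lower().split())
    score = 0.0
    for term in query_terms:
        if tf[term] == 0 or doc_freq.get(term, 0) == 0:
            continue  # a term absent from the text (or corpus) contributes nothing
        score += (1 + math.log(tf[term])) * math.log(num_docs / doc_freq[term])
    return score / len(doc_text)  # 1/CharLength favors shorter objects

# Hypothetical corpus statistics (not from the study).
doc_freq = {"hair": 10, "loss": 20}
short = "hair loss is hard"
long_ = "hair loss is hard and the hair keeps falling out every day"
print(tfidf_score(["hair", "loss"], short, doc_freq, 1000))
print(tfidf_score(["hair", "loss"], long_, doc_freq, 1000))
```

The 1/CharLength factor is what lets sentences outrank posts under tf*idf in the table above: the longer text repeats "hair" but still scores lower per character.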
  • 6. Scoring Multi-Granularity Results
    Goal: unified scoring for objects at multiple granularity levels
      – with widely varying sizes
      – with an inherent containment relationship
    Hierarchical Scoring Function (HScore): score of node i with respect to search term t, where i has children j:
    $$\mathrm{HScore}_t(i) = \begin{cases} \ldots & \text{if } i \text{ is a non-leaf node} \\ 1 & \text{if } i \text{ is a leaf node containing } t \\ 0 & \text{if } i \text{ is a leaf node not containing } t \end{cases}$$
    where $ew_{ij}$ is the edge weight between parent i and child j, $P(j)$ is the number of parents of j, and $C(i)$ is the number of children of i.
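The leaf cases and notation of HScore can be exercised on a toy hierarchy. The non-leaf formula did not survive extraction, so the aggregation below, the sum over children of ew_ij · score(j) / P(j) divided by C(i)^α, is an assumption made purely for illustration and should not be read as the authors' HScore. The hierarchy, texts, and `hscore` are hypothetical, and sentences stand in for leaves instead of words.

```python
from functools import lru_cache

# Hypothetical hierarchy (not from the study): parent -> {child: edge weight}.
# Edge weight 2 marks a sentence whose text is repeated in the parent post.
CHILDREN = {
    "Thread1": {"Post1": 1, "Post2": 1},
    "Post1": {"Sent1": 1, "Sent2": 2},
    "Post2": {"Sent2": 1, "Sent3": 1},
}
# Leaves are sentences here (a simplification; the slide's hierarchy goes down to words).
LEAF_TEXT = {
    "Sent1": "my hair loss started early",
    "Sent2": "hair loss on aromasin",
    "Sent3": "I bought a scarf",
}

def hscore(node, term, alpha):
    # P(j): number of parents of each node.
    num_parents = {}
    for kids in CHILDREN.values():
        for kid in kids:
            num_parents[kid] = num_parents.get(kid, 0) + 1

    @lru_cache(maxsize=None)
    def score(i):
        if i not in CHILDREN:  # leaf: 1 if it contains the term, else 0
            return 1.0 if term in LEAF_TEXT[i].split() else 0.0
        kids = CHILDREN[i]
        # ASSUMED non-leaf aggregation: children's weighted scores, shared among
        # their parents, damped by C(i) ** alpha.
        total = sum(ew * score(j) / num_parents[j] for j, ew in kids.items())
        return total / len(kids) ** alpha

    return score(node)

print(hscore("Thread1", "hair", 0.5))
```

Under this assumed form, raising α penalizes nodes with many children, so larger objects lose ground to smaller ones.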
  • 7. Effect of Size Weighting Parameter α on HScore
    • Parameter α controls the intermixing of granularities.
    [Figure: number of threads, posts, and sentences in the HScore top-20 list as the size parameter α varies from 0 to 0.5, with BM25 as a reference; y-axis 0 to 20.]
  • 8. Multi-Granularity Result Generation
    Sorted ordering: Post3 (2.5), Post1 (2.1), Post2 (2), Sent1 (1.6), Sent2 (1.5), Sent3 (1.4), Sent4 (1.3), Sent6 (0.4), Sent5 (0.1), Post4 (0.1), Thread1 (0.1), Thread2 (0.1)
    For result size k = 4, optimizing for the sum of scores:
      • Overlap: {Post3, Post1, Post2, Sent1}, Sum Score = 8.2, but Sent1 is contained in Post1, so its 1.6 is counted twice
      • Greedy: {Post3, Post1, Post2, Sent6}, Sum Score = 7.0
      • Best: {Post3, Post2, Sent1, Sent2}, Sum Score = 7.6
    33% of sample queries had overlap among at least 3 of their top-10 results.
    [Figure: the example hierarchy (Threads 1 and 2, Posts 1 to 4, Sentences 1 to 6) annotated with the scores above.]
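The k = 4 example can be checked with a small script that compares the greedy choice with an exhaustive search. This is a sketch, not OAKS: the containment edges are an assumption, chosen to be consistent with the set scores on this slide because the figure did not survive extraction; two results are treated as overlapping when one contains the other, directly or transitively; and the exhaustive search enumerates every k-subset, which does not scale to a real corpus.

```python
from itertools import combinations

# Node scores from the example.
SCORES = {
    "Post3": 2.5, "Post1": 2.1, "Post2": 2.0, "Sent1": 1.6, "Sent2": 1.5,
    "Sent3": 1.4, "Sent4": 1.3, "Sent6": 0.4, "Sent5": 0.1, "Post4": 0.1,
    "Thread1": 0.1, "Thread2": 0.1,
}
# ASSUMED containment edges (parent -> children), inferred from the set scores.
CHILDREN = {
    "Thread1": ["Post1", "Post2"], "Thread2": ["Post3", "Post4"],
    "Post1": ["Sent1", "Sent2"], "Post2": ["Sent3", "Sent4"],
    "Post3": ["Sent5"], "Post4": ["Sent6"],
}

def descendants(node):
    out = set()
    for child in CHILDREN.get(node, []):
        out |= {child} | descendants(child)
    return out

def overlaps(a, b):
    # Two results overlap if one contains the other.
    return b in descendants(a) or a in descendants(b)

def greedy(k):
    chosen = []
    for node in sorted(SCORES, key=SCORES.get, reverse=True):
        if all(not overlaps(node, c) for c in chosen):
            chosen.append(node)
        if len(chosen) == k:
            break
    return chosen

def exhaustive_best(k):
    # Exact but brute force: tries every k-subset of nodes.
    best, best_sum = None, float("-inf")
    for combo in combinations(SCORES, k):
        if any(overlaps(a, b) for a, b in combinations(combo, 2)):
            continue
        total = sum(SCORES[n] for n in combo)
        if total > best_sum:
            best, best_sum = list(combo), total
    return best, best_sum

print(greedy(4))
print(exhaustive_best(4))
```

On this toy graph the greedy set sums to 7.0, while the exhaustive search recovers the 7.6 set listed as Best.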
  • 9. Multi-Granularity Result Generation
    Goal: generate a non-overlapping result set that maximizes “quality”
      • Quality = sum of the scores of all results in the set
      • Maximum-weight independent set problem (NP-hard)
      • Existing algorithm: Lexicographic All Independent Sets (LAIS) outputs maximal independent sets with polynomial delay, in a specific order
  • 10. Optimal Algorithm for k-set (OAKS)
    • Fix the node ordering by decreasing score.
    • Efficient OAKS algorithm (typically k << n):
      – Start with the first k-sized independent set, i.e., the greedy one.
      – Branch from the nodes preceding the kth node of the set, and check whether the result is maximal.
      – Find new k-sized maximal sets and save them in a priority queue.
      – Reject sets from the priority queue whose starting node occurs after the current best set’s kth node.
  • 11. OAKS
    Sorted ordering: Post3 (2.5), Post1 (2.1), Post2 (2), Sent1 (1.6), Sent2 (1.5), Sent3 (1.4), Sent4 (1.3), Sent6 (0.4), Sent5 (0.1), Post4 (0.1), Thread1 (0.1), Thread2 (0.1)
    For k = 4, Greedy = {Post3, Post1, Post2, Sent6}, SumScore = 7.0
    In the 1st iteration, OAKS branches from the nodes before Sent6, i.e., Sent1, Sent2, Sent3, Sent4, yielding:
      – {Post3, Post2, Sent1, Sent2}, SumScore = 7.6
      – {Post3, Post1, Sent3, Sent4}, SumScore = 7.3
    Branching from Sent1 and removing all nodes adjacent to Sent1 gives {Post3, Post2, Sent1}. Is it maximal on the first 4 nodes? Yes, so complete it to size k and insert it into the queue: {Post3, Post2, Sent1, Sent2}.
    [Figure: the example hierarchy annotated with the node scores listed above.]
  • 12. Evaluating OAKS Algorithm
    Comparing OAKS runtime: small overhead for practical k (= 20)
      • Scoring time = 0.96 sec
      • OAKS result set generation time = 0.09 sec
    Comparing LAIS and OAKS: 100 relatively infrequent queries with corpus frequency in the ranges 20-30, 30-40, …
      Word frequency | Sets evaluated (LAIS) | Sets evaluated (OAKS) | Run time, sec (LAIS) | Run time, sec (OAKS)
      20-30          | 57.59                 | 8.12                  | 0.78                 | 0.12
      30-40          | 102.07                | 5.06                  | 7.88                 | 0.01
      40-50          | 158.80                | 5.88                  | 26.94                | 0.01
      50-60          | 410.18                | 6.30                  | 82.20                | 0.02
      60-70          | 716.40                | 5.26                  | 77.61                | 0.01
      70-80          | 896.59                | 8.30                  | 143.33               | 0.04
    OAKS is very efficient; the time it requires depends on k.
    OAKS improves over the greedy SumScore in 31% of queries @top-20.
  • 13. Dataset and Evaluation Setting
    • Data collected from breastcancer.org: 31K threads, 301K posts, 1.8M unique sentences, 46K keywords
    • 18 sample queries, e.g., broccoli, herceptin side effects, emotional meltdown, scarf or wig, shampoo recommendation, …
    • Experimental search strategies (top-20 results):
      - Mixed-Hierarchy: optimal mixed-granularity result
      - Posts-Hierarchy: hierarchical scoring of posts only
      - Posts-tf*idf: existing traditional search
      - Mixed-BM25
  • 14. Evaluating Perceived Relevance
    • Graded relevance scale: Exactly relevant answer; Relevant but too broad; Relevant but too narrow; Partially relevant answer; Not relevant
    • Crowdsourced relevance using Mechanical Turk:
      - over 7 annotations
      - quality control with honey pot questions
      - EM algorithm for consensus
    Mixed-Hierarchy results for query = shampoo recommendation:
      Rank | α = 0.1     | α = 0.2     | α = 0.3     | α = 0.4
      1    | Rel Broad   | Rel Broad   | Rel Broad   | Partial
      2    | Rel Broad   | Rel Broad   | Rel Broad   | Partial
      3    | Rel Broad   | Rel Broad   | Rel Broad   | Partial
      4    | Rel Broad   | Rel Broad   | Exactly Rel | Rel Broad
      5    | Rel Broad   | Rel Broad   | Exactly Rel | Partial
      6    | Exactly Rel | Exactly Rel | Rel Narrow  | Rel Narrow
      7    | Rel Broad   | Exactly Rel | Rel Narrow  | Not Rel
      8    | Rel Broad   | Rel Broad   | Not Rel     | Partial
      9    | Rel Broad   | Rel Narrow  | Rel Broad   | Partial
      10   | Exactly Rel | Rel Narrow  | Partial     | Rel Narrow
      11   | Rel Broad   | Rel Broad   | Exactly Rel | Not Rel
      12   | Rel Broad   | Rel Broad   | Exactly Rel | Not Rel
      13   | Rel Broad   | Exactly Rel | Partial     | Not Rel
      14   | Not Rel     | Exactly Rel | Rel Narrow  | Partial
      15   | Not Rel     | Exactly Rel | Not Rel     | Rel Broad
      16   | Not Rel     | Rel Broad   | Rel Narrow  | Not Rel
      17   | Exactly Rel | Rel Broad   | Exactly Rel | Not Rel
      18   | Exactly Rel | Exactly Rel | Partial     | Partial
      19   | Not Rel     | Rel Broad   | Rel Narrow  | Not Rel
      20   | Not Rel     | Exactly Rel | Partial     | Not Rel
  • 15. Evaluating Perceived Relevance
    Mean Average Precision:
      Search system   | MAP @ | α = 0.1 | α = 0.2 | α = 0.3 | α = 0.4
      Mixed-Hierarchy | 10    | 0.98    | 0.98    | 0.90    | 0.70
      Mixed-Hierarchy | 20    | 0.97    | 0.95    | 0.85    | 0.66
      Posts-Hierarchy | 10    | 0.76    | 0.75    | 0.77    | 0.78
      Posts-Hierarchy | 20    | 0.72    | 0.71    | 0.73    | 0.75
      Posts-tf*idf    | 10    | 0.76    | 0.73    | 0.76    | 0.76
      Posts-tf*idf    | 20    | 0.74    | 0.72    | 0.72    | 0.73
      Mixed-BM25 (b = 0.75, k1 = 1.2): MAP@10 = 0.55, MAP@20 = 0.54
    Mixed-Hierarchy clearly outperforms the post-only methods: users perceive mixed-granularity results as more relevant.
    [Figure: Discounted Cumulative Gain @20 (y-axis 0 to 35) for Mixed-Hierarchy, Posts-Hierarchy, and Posts-tf*idf at α = 0.1 to 0.4, and for Mixed-BM25 with b = 0.75, k1 = 1.2.]
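MAP and DCG over graded labels like those in the slide 14 table can be sketched as follows. The gain values and the binarization (every grade except "Not Rel" counts as relevant) are assumptions, since the slides do not say how grades were mapped; AP here is normalized by the number of relevant results retrieved, which is one of several common conventions.

```python
import math

# ASSUMED mapping from graded labels to numeric gains (not given in the slides).
GAIN = {"Exactly Rel": 4, "Rel Broad": 3, "Rel Narrow": 3, "Partial": 1, "Not Rel": 0}

def average_precision(relevant):
    """AP for one ranked list of booleans: mean precision at each relevant rank."""
    hits, precision_sum = 0, 0.0
    for rank, is_rel in enumerate(relevant, start=1):
        if is_rel:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0

def mean_average_precision(runs):
    """MAP: mean of AP over queries (one ranked list per query)."""
    return sum(average_precision(r) for r in runs) / len(runs)

def dcg(gains):
    """DCG = sum over ranks i of gain_i / log2(i + 1)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))

# Ranks 1-5 of the alpha = 0.3 column for "shampoo recommendation".
labels = ["Rel Broad", "Rel Broad", "Rel Broad", "Exactly Rel", "Exactly Rel"]
print(average_precision([lab != "Not Rel" for lab in labels]))
print(dcg([GAIN[lab] for lab in labels]))
```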
  • 16. EgoCentric Search
    • The previous technique did not take the authorship of posts into account.
    • Some forum participants are similar: they share the same topics of interest or have the same needs, not necessarily at the same time.
      – Rank similar authors’ posts higher for personalized search.
    • Some forum participants are experts: prolific and knowledgeable.
      – Expert opinions carry more weight in keyword search.
    • Use an author score to enhance personalized and keyword search.
  • 17. Author Score
    • Forum participants have several reasons to be linked.
    • Build a multidimensional heterogeneous graph over authors incorporating many relations.
    • But users assign different importance to different relations.
    [Figure: heterogeneous graph linking authors to topics (weights W(a,t)), queries to topics (W(q,t)), and authors to each other (W(a1,a2)); relations include user profiles (location, age, cancer stage, treatment, …), co-participation, and explicit references.]
  • 18. Contributions
    Critical problem for leveraging authorship for search: incorporating multiple user relations with varying importance, learned egocentrically from user behavior.
    Outline:
      • Author score computation using a multidimensional graph
      • Personalized predictions of user interactions: authors most likely to provide answers
      • Re-ranking keyword search results using author expertise
  • 19. Multi-Dimensional Random Walks (MRW)
    • Random walks (RW) for finding the most influential users:
      – $P_{t+1} = M \times P_t$, iterated until convergence
      – $M = \alpha(A + D) + (1 - \alpha)E$, with relation matrix $A$, matrix $D$ for dangling nodes, and uniform matrix $E$; $\alpha$ is usually set to 0.85
    • Rooted RW for node similarity:
      – teleport back to the root node with probability $(1 - \alpha)$
      – computes the similarity of all nodes with respect to the root node
    • Multidimensional RW for heterogeneous networks:
      – transition matrix computed as $A = \theta_1 A_1 + \theta_2 A_2 + \dots + \theta_n A_n$, where $\sum_i \theta_i = 1$ and all $\theta_i \ge 0$
      – egocentric weights for root node $r$:
    $$\theta_i(r) = \frac{\sum_{m \in A_i} ew_{A_i}(r, m)}{\sum_{A_k} \sum_{j \in A_k} ew_{A_k}(r, j)}$$
    Example with nodes ordered a, b, c and edge weights 2 and 3:
    $$A = \begin{pmatrix} 0 & 0 & 0 \\ 2 & 0 & 0 \\ 0 & 3 & 0 \end{pmatrix}, \quad D = \begin{pmatrix} 0 & 0 & 0.33 \\ 0 & 0 & 0.33 \\ 0 & 0 & 0.33 \end{pmatrix}, \quad E = \begin{pmatrix} 0.33 & 0.33 & 0.33 \\ 0.33 & 0.33 & 0.33 \\ 0.33 & 0.33 & 0.33 \end{pmatrix}$$
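A rooted multidimensional random walk with egocentric relation weights can be sketched in plain Python. The users, relations, and functions are hypothetical; the relation weights (`theta` in the code) follow the egocentric formula above; transition probabilities come from normalizing each node's combined outgoing weight, a standard choice the slide does not spell out; dangling nodes spread their mass uniformly as the D matrix does; and the (1 − α) restart mass returns to the root instead of following the uniform E.

```python
def undirected(edges):
    """Store each symmetric relation edge in both directions."""
    out = dict(edges)
    out.update({(v, u): w for (u, v), w in edges.items()})
    return out

# Hypothetical users and two relations (not from the study's data).
NODES = ["a", "b", "c", "d"]
RELATIONS = [
    undirected({("a", "b"): 3, ("b", "c"): 1}),  # e.g., thread co-participation
    undirected({("a", "c"): 1, ("c", "d"): 2}),  # e.g., topical overlap
]

def egocentric_weights(relations, root):
    """theta_i(root): share of the root's total edge weight that lies in relation i."""
    per_rel = [sum(w for (u, _), w in rel.items() if u == root) for rel in relations]
    total = sum(per_rel)
    return [s / total for s in per_rel] if total else [1 / len(relations)] * len(relations)

def rooted_mrw(nodes, relations, root, alpha=0.85, max_iters=200, tol=1e-12):
    theta = egocentric_weights(relations, root)
    # Combined weighted graph: A = sum_i theta_i * A_i
    out = {u: {} for u in nodes}
    for th, rel in zip(theta, relations):
        for (u, v), w in rel.items():
            out[u][v] = out[u].get(v, 0.0) + th * w
    p = {u: float(u == root) for u in nodes}
    for _ in range(max_iters):
        nxt = dict.fromkeys(nodes, 0.0)
        for u in nodes:
            total = sum(out[u].values())
            if total > 0:
                for v, w in out[u].items():
                    nxt[v] += alpha * p[u] * w / total  # follow a weighted edge
            else:
                for v in nodes:
                    nxt[v] += alpha * p[u] / len(nodes)  # dangling node (D matrix)
        nxt[root] += 1 - alpha  # restart at the root
        done = sum(abs(nxt[u] - p[u]) for u in nodes) < tol
        p = nxt
        if done:
            break
    return p

scores = rooted_mrw(NODES, RELATIONS, "a")
print(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))
```

For root "a", three quarters of its edge weight lies in the first relation, so that relation dominates the combined graph from a's point of view.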
  • 20. Personalized Answer Search
    • Link prediction by leveraging user similarities:
      – given participant behavior, find users similar to the user asking a question
      – predict who will respond to this question
    • Learn similarities from the first 90% of threads (training set)
    • Relations used: topics covered in text, co-participation in threads, signature profiles, proximity of posts
    • MRW similarity compared with baselines:
      – single relations
      – PathSim: an existing approach for heterogeneous networks, with predefined paths of fixed length and no dynamic choice of path
    Link prediction enables suggesting which threads or which users to follow.
  • 21. Predicting User Interactions
    [Figure: MAP for link prediction (y-axis 0 to 0.5) vs. top-K similar participants (K = 10 to 100).]
    Multidimensional RW has the best prediction performance.
  • 22. Predicting User Interactions
    • Leverage the content of the initial post to find users who are experts on the question:
      – TopicScore is computed as the cosine similarity between the author’s history and the initial post
      – UserScore = β * MRWScore + (1 - β) * TopicScore
    MAP (percentages: improvement over purely MRW):
      Neighbors | β = 0 (purely topical expertise) | β = 0.1   | β = 0.2   | β = 1 (purely MRW)
      Top 5     | 0.52                             | 0.64 (8%) | 0.61 (4%) | 0.59
      Top 10    | 0.31                             | 0.50 (8%) | 0.49 (5%) | 0.46
      Top 15    | 0.24                             | 0.43 (8%) | 0.42 (6%) | 0.40
      Top 20    | 0.20                             | 0.39 (6%) | 0.39 (7%) | 0.37
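The UserScore blend can be sketched as below, assuming bag-of-words term counts for the author history and the initial post (the slide does not specify the text representation). The candidate data, `cosine`, and `user_score` are hypothetical; the MRW scores would come from a rooted walk such as the one on slide 19.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def user_score(mrw_score, author_history, initial_post, beta):
    """UserScore = beta * MRWScore + (1 - beta) * TopicScore."""
    # ASSUMED representation: lowercased bag-of-words term counts.
    topic_score = cosine(Counter(author_history.lower().split()),
                         Counter(initial_post.lower().split()))
    return beta * mrw_score + (1 - beta) * topic_score

# Hypothetical candidates: author -> (MRW score w.r.t. the asker, posting history).
candidates = {
    "alice": (0.30, "wig shop recommendation scarf"),
    "bob": (0.45, "herceptin side effects"),
}
question = "any good wig shop"
ranked = sorted(candidates,
                key=lambda a: user_score(*candidates[a], question, beta=0.1),
                reverse=True)
print(ranked)
```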
  • 23. Enhanced Keyword Search
    • Non-rooted RW to find the most influential expert users
    • Re-rank the top-k results of IR scoring using author scores
    • Final score of post = ω * IR_score_λ + (1 - ω) * Authority_score
      – posts only, tf*idf scoring with size parameter λ
    Re-ranking search results with the author score yields higher MAP relevance (improvements of 4% and 5%).
    [Figure: MAP@10 (0.72 to 0.81) vs. tradeoff parameter ω (0 to 1) for IR scores with λ = 0.1 and λ = 0.2.]
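The re-ranking step can be sketched as a linear blend over the top-k IR results. This is a minimal sketch assuming both scores are already normalized to comparable ranges, which the slide does not state; `rerank` and the sample posts are hypothetical.

```python
def rerank(results, authority, omega, k):
    """Re-rank the top-k IR results by omega * ir + (1 - omega) * authority.

    results: list of (post_id, author, ir_score) tuples.
    authority: author -> authority score (e.g., from a non-rooted random walk).
    Both scores are ASSUMED to be on comparable scales (e.g., normalized to [0, 1]).
    """
    top_k = sorted(results, key=lambda r: r[2], reverse=True)[:k]

    def final(r):
        return omega * r[2] + (1 - omega) * authority.get(r[1], 0.0)

    return sorted(top_k, key=final, reverse=True)

posts = [("p1", "alice", 0.9), ("p2", "bob", 0.8), ("p3", "carol", 0.1)]
authority = {"alice": 0.0, "bob": 1.0, "carol": 1.0}
print([pid for pid, _, _ in rerank(posts, authority, omega=0.5, k=2)])
```

With ω = 0.5, the higher-authority author's post p2 moves ahead of p1, while p3 stays out because re-ranking only reorders the IR top-k.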
  • 24. Patient Emotion and stRucture Search USer tool (PERSEUS): Conclusions
    • Designed a hierarchical model and score that generate search results at several granularities of web forum objects.
    • Proposed the OAKS algorithm for the best non-overlapping result set.
    • Conducted extensive user studies showing that a mixed collection of granularities yields better relevance than post-only results.
    • Combined multiple relations linking users to compute similarities.
    • Enhanced search results using multidimensional author similarity.
    • Future directions:
      – multi-granular search on web pages, blogs, emails; dynamic focus level selection
      – search in and out of context over dialogue, interviews, Q&A
      – optimal result set selection for targeted advertising, result diversification
      – time-sensitive recommendations
      – changing friendships, progressive search needs
  • 25. Thank you!

Editor's Notes

  1. Large amount of unstructured text; background information is often omitted; digression; time sensitivity and repetitions; lacking good search capabilities.
  2. At α = 0.2, Mixed-Hierarchy improves MAP over Posts-Hierarchy by 31% @10 and 34% @20.
  3. Our multidimensional RW approach significantly improves over the single thread co-participation relation, by 10% for k = 10 neighbors and by 21% for k = 100.