3. Scientific Articles
ACL Anthology – A collection of 20,000 articles in computational linguistics
4. Faceted
Not just recommendations, but how they are related
6. Edge labelling task
• Set of Nodes
• Links between similar nodes
• Label the edges
• Analogy
• Nudge user – suggest why one should buy the combo offered in Flipkart
• Type of social ties in a friendship network
7. CHALLENGES
Quality
• High specificity & precision
• Outperforms the current system for scientific article retrieval by a high margin
Ranking
• Individual ranking per facet
• Most relevant entry comes first
• Aggregation of rank lists over content and citation network info
Accessibility
• Categorized into 4 facets
• Easy to streamline as per need and filter results
Scalable
• Random walks (with restarts)
• Independent of domain
8. Information Overload
Even for a relatively closed community like ACL
IR Tools
Rather than text-based indexing
Varying intentions
Results are streamlined by intention; entries may appear that would not appear in flat recommendations
10. Dataset
ACL Anthology Collection
Statistics                                        Full      Filtered
Number of papers                                  21,212    9,843
Average number of references (within ACL only)    5.23      6.21
Number of unique authors                          17,551    7,892
Number of unique venues                           451       280
• Computational Linguistics
• 1961 – 2013
• text data open to public
11. Form Citation Network
• Identify citation contexts and section headings using ParsCit
• Section heading to Facet Mapping
• Refinement of facets from prior works
Number of citation contexts extracted    61,051
Number of BG edges                       23,022
Number of AA edges                       10,797
Number of MD edges                       8,828
Number of CM edges                       18,404
AA – Alternative Approaches
BG – Background
CM – Comparison
MD – Method
12. Induced Subgraphs
Nodes
• Query paper
• 2-hop citations in either direction
• Highly similar papers based on cosine similarity
Edges
• Edges belonging to a particular facet
• 4 different subgraphs for each query paper
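The node and edge selection above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the `(citing, cited, facet)` edge format, and the precomputed `similar_papers` list are assumptions, and "either direction" is read here as ignoring edge direction while counting hops.

```python
from collections import deque

FACETS = ("BG", "AA", "MD", "CM")  # facet labels from the citation network slide


def two_hop_nodes(edges, query, hops=2):
    """Nodes within `hops` citation links of `query`, ignoring direction (assumption)."""
    adj = {}
    for citing, cited, _facet in edges:
        adj.setdefault(citing, set()).add(cited)
        adj.setdefault(cited, set()).add(citing)
    dist = {query: 0}
    queue = deque([query])
    while queue:
        u = queue.popleft()
        if dist[u] == hops:
            continue  # do not expand beyond the hop limit
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)


def facet_subgraph(edges, query, similar_papers, facet):
    """Induced subgraph for one facet: selected nodes plus edges carrying that facet."""
    nodes = two_hop_nodes(edges, query) | set(similar_papers)
    kept = [(u, v) for u, v, f in edges if f == facet and u in nodes and v in nodes]
    return nodes, kept


# Toy citation network (hypothetical paper ids).
edges = [("Q", "A", "BG"), ("A", "B", "MD"), ("B", "C", "BG"),
         ("D", "Q", "BG"), ("E", "D", "CM")]

# One subgraph per facet for the query paper "Q"; "C" enters only via similarity.
subgraphs = {f: facet_subgraph(edges, "Q", ["C"], f) for f in FACETS}
```

Each query paper thus yields four subgraphs that share the same node set and differ only in which facet's edges they keep.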
13. Random Walks
• Random walks with restarts
• The walker iteratively moves to a neighbouring node with probability proportional to the edge weights.
• With restart probability c = 0.4, the walker returns to the starting node i.
• Teleportation with probability 0.3
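A random walk with restarts can be sketched with plain power iteration, as below. The function name, the dict-of-dicts graph format, and the handling of nodes without outgoing edges are assumptions made for illustration. Only the restart step with c = 0.4 is modelled; the teleportation step is left out because the slides do not say how it combines with the restart.

```python
def random_walk_with_restart(weights, start, c=0.4, tol=1e-10, max_iter=1000):
    """Stationary visiting probabilities of a walker that restarts at `start`.

    weights: dict mapping node -> {neighbour: positive edge weight}.
    """
    nodes = set(weights)
    for nbrs in weights.values():
        nodes.update(nbrs)
    nodes.add(start)

    scores = {n: 0.0 for n in nodes}
    scores[start] = 1.0  # the walker begins at the start node
    for _ in range(max_iter):
        nxt = {n: 0.0 for n in nodes}
        nxt[start] = c  # restart mass (total probability is always 1)
        for u, mass in scores.items():
            nbrs = weights.get(u, {})
            total = sum(nbrs.values())
            if total <= 0:
                # Assumption: a node with no outgoing edges sends the walker home.
                nxt[start] += (1 - c) * mass
                continue
            for v, w in nbrs.items():
                # Move to a neighbour with probability proportional to edge weight.
                nxt[v] += (1 - c) * mass * w / total
        change = sum(abs(nxt[n] - scores[n]) for n in nodes)
        scores = nxt
        if change < tol:
            break
    return scores


# Toy facet subgraph: query "q" linked to "a" (weight 2) and "b" (weight 1).
graph = {"q": {"a": 2.0, "b": 1.0}, "a": {"q": 2.0}, "b": {"q": 1.0}}
scores = random_walk_with_restart(graph, "q")
ranking = sorted((n for n in scores if n != "q"), key=scores.get, reverse=True)
```

Sorting the non-query nodes by score gives a ranked list for that subgraph; running the walk once per facet subgraph would give one ranked list per facet.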
15. EXPERIMENTAL RESULTS
• The most cosine-similar paper usually lies within 1 or 2 hops
• Edge density decreases as citation count increases (due to single or few edges)
• MD subgraphs have nodes with high degree
• Average path length increases with citation count
• Clustering coefficient correlates with edge density
• 1-hop nodes contribute more to this measurement.
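For reference, the two graph measures named above can be computed as in the sketch below. The slides do not state which definitions were used, so this assumes the common undirected forms: density as the fraction of possible node pairs that are linked, and the average of local clustering coefficients. The function names are illustrative.

```python
from itertools import combinations


def edge_density(nodes, edges):
    """Fraction of possible undirected node pairs that are connected."""
    n = len(nodes)
    if n < 2:
        return 0.0
    pairs = {frozenset((u, v)) for u, v in edges if u != v}
    return 2 * len(pairs) / (n * (n - 1))


def average_clustering(nodes, edges):
    """Mean local clustering coefficient; nodes with degree < 2 count as 0."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    total = 0.0
    for n in nodes:
        k = len(adj[n])
        if k < 2:
            continue
        # Count links among the neighbours of n.
        links = sum(1 for a, b in combinations(adj[n], 2) if b in adj[a])
        total += 2 * links / (k * (k - 1))
    return total / len(nodes) if nodes else 0.0


# Toy subgraph: a triangle a-b-c with a pendant node d attached to c.
toy_nodes = ["a", "b", "c", "d"]
toy_edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
density = edge_density(toy_nodes, toy_edges)
clustering = average_clustering(toy_nodes, toy_edges)
```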
18. EVALUATION
• All systems perform better in >2 hop
• Cosine similarity: FeRoSA works in all sections, while the others perform marginally better than or equivalent to FeRoSA only in the high or mid citation buckets
• Pr: FeRoSA works in all 3 buckets, while the others suffer in the low citation buckets