The document describes a neural probabilistic model for context-based citation recommendation. It learns distributed representations of words and documents using neural networks. These representations are used to calculate the probability of citing a document given a citation context. The model is trained on over 10 million citation context-document pairs from CiteSeer. It significantly outperforms baseline methods in recommending citations based on citation context, achieving the highest MAP, recall, MRR and nDCG scores. Future work could explore combining this local context-based model with global recommendation models.
Context-Based Citation Recommendation Using a Neural Probabilistic Model (NPM)
1. CONTEXT BASED CITATION RECOMMENDATION
Which work are you referring to? Know similar works …
Guided By : Dr. Animesh Mukherjee & Dr. Pawan Goyal
2. Citation and Citation Context
• Citations – crucial for the assignment of academic credit; they help support claims in
one’s own work.
• Citation Context (c) – sequence of words that appear around a particular citation.
• Citation context contains words that describe or summarize the cited papers.
• The semantics of cited documents should be close to those of the citation context.[1]
• Citation Recommendation Engine – helps check the completeness of citations
when authoring a paper, find prior work related to the topic under investigation,
and find missing relevant citations.[1]
[1] http://www.cse.psu.edu/~zzw109/pubs/AAAI2015-NeuralProbabilisticModelCitationRecommendation.pdf
3. Citation and Citation Context
(figure from [1])
4. Motivation and Approach
• Motivation : Since all the words in the citation context are used to describe the same
citation, the semantics of these words should be similar.
• This motivates learning semantic embeddings for the words in citation contexts and
for the cited documents, and recommending citations based on semantic distance.
• Proposed to learn distributed semantic representations of words and documents.
• Using these representations, we train a neural network model that estimates the
probability of citing a paper given a citation context.[1]
• The neural network model tunes the distributed representations of words and documents
so that the semantic similarity between a citation context and its cited paper is high.[1]
5. Related work : Global Citation Recommendation
[1] McNee et al. (2002) : www-users.cs.umn.edu/~mcnee/konstan-nectar-2006.pdf
• Global recommendation suggests a list of references for an entire given manuscript.
McNee et al. (2002) used a partial list of references as the input query and
recommended additional references based on collaborative filtering.[1]
Strohman et al. (2007) assumed that the input is an incomplete paper manuscript
and recommended papers with high text similarity using bibliography similarity.
Bethard and Jurafsky (2010) introduced features such as topical similarity, author
behavioral patterns, citation count, and recency of publication to train a classifier
for global recommendation.
6. Related work : Local Citation Recommendation
• The input query is a particular context where a citation should be made, and the output
is a short list of papers that should be cited in that context.
He et al. (2010) assumed that the user has provided placeholders for citations in a
query manuscript. A probabilistic model was trained to measure the relevance
between citation contexts and the cited documents.[1]
In another work (He et al. 2011), they first segmented the document into a
sequence of disjointed candidate citation contexts, and then applied the
dependency feature model to retrieve a ranked list of papers for each context.[1]
Lu et al. (2011) used a translation model to learn the translation probabilities
between words in the citation context and words in the cited document. An IBM
translation model-1 was trained to learn the probability of citing a document
given a word.[1]
7. Citation Recommendation Model : Problem Definition
• We propose to model the citation context given the cited paper: p(c|d).
• Given a dataset of training samples with |C| pairs of citation context c^t and cited
document d^t, the objective is to maximize the log-likelihood Σ_t log p(c^t | d^t).[1]
• By assuming that the words in a citation context are mutually conditionally
independent given the cited document: p(c|d) = Π_{w_j ∈ c} p(w_j | d).
• The objective function can then be written as Σ_t Σ_{w_j ∈ c^t} log p(w_j | d^t).
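Under the independence assumption, the log-likelihood of a context factorizes into a sum over its words. A minimal sketch, where the probability table `p_w_given_d` is a hypothetical stand-in for the trained model:

```python
import math

def context_log_likelihood(context_words, p_w_given_d):
    """log p(c|d) = sum over words w in c of log p(w|d),
    under the mutual conditional-independence assumption."""
    return sum(math.log(p_w_given_d[w]) for w in context_words)

# Toy probabilities for one cited document d:
p = {"neural": 0.5, "model": 0.25}
ll = context_log_likelihood(["neural", "model"], p)  # log(0.5) + log(0.25)
```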
9. Word Representation Learning
[2] http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf
[3] http://colah.github.io/posts/2014-07-NLP-RNNs-Representations/
• Negative sampling is used to learn the distributed representations of words from
their surrounding words.[1]
• Given a pair of citation context c and cited document d, the skip-gram model goes
through all the words in the citation context using a sliding window.[1]
• For each word w_m that appears in citation context c, the words that appear within M
words before/after w_m are treated as positive samples. Suppose w_p is one of the
positive samples for w_m. The training objective is defined as:
log σ(w_p · w_m) + Σ_{i=1}^k log σ(−w_ni · w_m)
where w_ni is a negative sample randomly drawn from the noise distribution p_n(·),
and k is the number of negative samples.[1,2,3]
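The sliding-window positive sampling and the negative-sampling objective can be sketched as below; this is an illustrative toy version, not the paper's implementation, and `vec` is an assumed word-to-vector table:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def positive_pairs(context_words, M):
    """Slide a window of radius M over the context: words within M
    positions before/after the center word w_m are positive samples."""
    for i, w_m in enumerate(context_words):
        for j in range(max(0, i - M), min(len(context_words), i + M + 1)):
            if j != i:
                yield w_m, context_words[j]

def neg_sampling_objective(w_m, w_p, negatives, vec):
    """log sigma(w_p . w_m) + sum_i log sigma(-w_ni . w_m):
    reward similarity to the positive sample, penalize similarity
    to the k negative samples drawn from the noise distribution."""
    obj = math.log(sigmoid(dot(vec[w_p], vec[w_m])))
    for w_n in negatives:
        obj += math.log(sigmoid(-dot(vec[w_n], vec[w_m])))
    return obj
```

Maximizing this objective over all (w_m, w_p) pairs pushes co-occurring words toward similar vectors.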
11. Document Representation Learning (Cont …)
• Since we want to fit the neural probabilistic model p_θ(d|w) to the real
distribution p_r(d|w), we can rewrite the objective accordingly.
• Given each word w_i^t that appears in the citation context and the cited
document d^t, we compute its contribution to the log-likelihood, along with k
randomly generated noise words w_n1, …, w_nk, using the following objective
function:
log σ(w_i^t · d^t) + Σ_{j=1}^k log σ(−w_nj · d^t)
• The training objective is to learn both the word representations and the
document representations that maximize this equation.[1]
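Assuming the objective takes the usual negative-sampling form, one gradient-ascent step that tunes the document and word vectors jointly might look like this (an illustrative sketch, not the paper's training code):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sgd_step(d, w, noise, lr=0.1):
    """One gradient-ascent step on
    log sigma(d . w) + sum_j log sigma(-d . w_nj),
    updating the cited-document vector d, the context word vector w,
    and the noise word vectors in place."""
    g_pos = 1.0 - sigmoid(dot(d, w))      # coefficient for the observed word
    grad_d = [g_pos * wi for wi in w]
    for w_n in noise:
        g_neg = sigmoid(dot(d, w_n))      # coefficient for a noise word
        for i in range(len(d)):
            grad_d[i] -= g_neg * w_n[i]
            w_n[i] -= lr * g_neg * d[i]   # push noise word away from d
    for i in range(len(d)):
        w[i] += lr * g_pos * d[i]         # pull observed word toward d
        d[i] += lr * grad_d[i]
```

Repeated steps increase d · w for observed (word, document) pairs, which is exactly the "high semantic similarity" behavior described on the motivation slide.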
14. Citation Recommendation
• Using the fine-tuned word and document representations, we can compute the
normalized probability distribution p(w|d).
• The table of p(d|w) is pre-calculated using Bayes’ rule and stored as an
inverted index.
• Given a query q = [w_1, w_2, …, w_|q|], the task is to recommend a list of
documents R = [d_1, d_2, …, d_N] that should be cited.[1]
• We use term frequency–inverse context frequency (TF-ICF) to
measure p(w_j | q).[1]
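The lookup-and-score step can be sketched as follows; the exact TF-ICF formula and the inverted-index format here are illustrative assumptions, not the paper's specification:

```python
import math
from collections import defaultdict

def tf_icf(query, context_freq, n_contexts):
    """Hypothetical TF-ICF weighting for p(w|q): term frequency in the
    query times inverse context frequency over the training contexts,
    normalized to sum to one."""
    weights = {}
    for w in query:
        tf = query.count(w) / len(query)
        icf = math.log(n_contexts / (1 + context_freq.get(w, 0)))
        weights[w] = tf * icf
    norm = sum(weights.values()) or 1.0
    return {w: v / norm for w, v in weights.items()}

def recommend(query, p_d_given_w, context_freq, n_contexts, top_n=10):
    """Score documents by sum_j p(w_j|q) * p(d|w_j), reading p(d|w)
    from a precomputed inverted index (word -> {doc: prob})."""
    w_q = tf_icf(query, context_freq, n_contexts)
    scores = defaultdict(float)
    for w, pw in w_q.items():
        for d, p in p_d_given_w.get(w, {}).items():
            scores[d] += pw * p
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

Rare query words get larger ICF weights, so distinctive context terms dominate the ranking.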
15. Experiment (Data and Metrics)
• Snapshot of CiteSeer paper and citation database – Oct. 2013
• The dataset was split into two parts: papers crawled before 2011 (training data) and
papers crawled after 2011 (testing data).
• Citations were extracted along with their citation contexts (one line before and
after the sentence where the citation occurs).
• Number of citation-context/cited-document pairs: |C| = 8,992,476 (training set) and
1,628,698 (testing set).[1]
• The number of recommendations is limited to 10 per query. The dimension of the
representation vectors is set to n = 600 and the number of negative samples to k = 10.[1]
• Metrics : MAP, Recall, MRR, nDCG.
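For reference, two of the listed metrics (MRR and Recall at a cutoff) can be computed as below; this is a generic sketch, not the paper's evaluation code:

```python
def mrr(ranked_lists, relevant):
    """Mean reciprocal rank over queries: average of 1/rank of the
    first relevant document in each ranking (0 if none is found)."""
    total = 0.0
    for ranking, rel in zip(ranked_lists, relevant):
        for i, doc in enumerate(ranking, start=1):
            if doc in rel:
                total += 1.0 / i
                break
    return total / len(ranked_lists)

def recall_at_k(ranked_lists, relevant, k=10):
    """Fraction of each query's relevant documents retrieved in the
    top k, averaged over queries (assumes every query has relevant docs)."""
    total = 0.0
    for ranking, rel in zip(ranked_lists, relevant):
        total += len(set(ranking[:k]) & rel) / len(rel)
    return total / len(ranked_lists)
```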
16. Baseline Comparison of Neural Probabilistic Model (NPM)
(Results table from [1]: NPM compared against the baseline methods on MAP, Recall, MRR, and nDCG.)
19. Conclusion and Future Work
• We used distributed representations of words and documents to build a neural-
network-based citation recommendation model.
• The model learns word and document representations that are used to
calculate the probability of citing a document given a citation context.
• A comparative study on a snapshot of CiteSeer dataset with existing state-of-
the-art methods showed that the proposed model significantly improves the
quality of context-based citation recommendation.
• Since only neural probabilistic models were considered for context-based local
recommendation, one could explore combining this local model with global
recommendation models.
20. References
1. Wenyi Huang, Zhaohui Wu, Chen Liang, Prasenjit Mitra, C. Lee Giles - A Neural
Probabilistic Model for Context Based Citation Recommendation -
http://www.cse.psu.edu/~zzw109/pubs/AAAI2015NeuralProbabilisticModelCitationRecommendation.pdf
2. Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, Jeffrey Dean - Distributed
Representations of Words and Phrases and their Compositionality -
http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf
3. McNee et al. (2002) : www-users.cs.umn.edu/~mcnee/konstan-nectar-2006.pdf
4. Deep Learning, NLP, and Representations – (Posted : July 7, 2014)
http://colah.github.io/posts/2014-07-NLP-RNNs-Representations/
5. Standard IR measures :
https://en.wikipedia.org/wiki/Information_retrieval#Performance_and_correctness_measures