The document describes a neural probabilistic model for context-based citation recommendation. It learns distributed representations of words and documents using neural networks. These representations are used to calculate the probability of citing a document given a citation context. The model is trained on over 10 million citation context-document pairs from CiteSeer. It significantly outperforms baseline methods in recommending citations based on citation context, achieving the highest MAP, recall, MRR and nDCG scores. Future work could explore combining this local context-based model with global recommendation models.
Context-Based Citation Recommendation Using a Neural Probabilistic Model (NPM)
1. CONTEXT BASED CITATION RECOMMENDATION
Which work are you referring to? Know similar works …
Guided By : Dr. Animesh Mukherjee & Dr. Pawan Goyal
2. Citation and Citation Context
• Citations – crucial for the assignment of academic credit; they help support claims in
one’s own work.
• Citation Context (c) – sequence of words that appear around a particular citation.
• Citation context contains words that describe or summarize the cited papers.
• The semantics of cited documents should be close to those of the citation context.[1]
• Citation Recommendation Engine – helps check the completeness of citations
when authoring a paper, find prior work related to the topic under investigation,
and find missing relevant citations.[1]
[1] http://www.cse.psu.edu/~zzw109/pubs/AAAI2015-NeuralProbabilisticModelCitationRecommendation.pdf
3. Citation and Citation Context
(figure from [1])
4. Motivation and Approach
• Motivation : Since all the words in the citation context are used to describe the same
citation, the semantics of these words should be similar.
• This motivates learning semantic embeddings for the words in citation contexts and
for the cited documents, and recommending citations based on semantic distance.
• Proposed to learn distributed semantic representations of words and documents.
• Using these representations, we train a neural network model that estimates the
probability of citing a paper given a citation context.[1]
• The neural network model tunes the distributed representations of words and documents
so that the semantic similarity between a citation context and its cited paper is high.[1]
5. Related work : Global Citation Recommendation
[1] McNee et al. (2002) : www-users.cs.umn.edu/~mcnee/konstan-nectar-2006.pdf
• Global recommendation suggests a list of references for an entire given manuscript.
McNee et al. (2002) used a partial list of references as the input query and
recommended additional references based on collaborative filtering.[1]
Strohman et al. (2007) assumed that the input is an incomplete paper manuscript
and recommended papers with high text similarity using bibliography similarity.
Bethard and Jurafsky (2010) introduced features such as topical similarity, author
behavioral patterns, citation count, and recency of publication to train a classifier
for global recommendation.
6. Related work : Local Citation Recommendation
• The input query is a particular context where a citation should be made, and the output
is a short list of papers that should be cited in that context.
He et al. (2010) assumed that the user has provided placeholders for citations in a
query manuscript. A probabilistic model was trained to measure the relevance
between citation contexts and the cited documents.[1]
In another work (He et al. 2011), they first segmented the document into a
sequence of disjointed candidate citation contexts, and then applied the
dependency feature model to retrieve a ranked list of papers for each context.[1]
Lu et al. (2011) used a translation model to learn the translation probabilities
between words in the citation context and words in the cited document. An IBM
translation model-1 was trained to learn the probability of citing a document
given a word.[1]
7. Citation Recommendation Model : Problem Definition
• We propose to model the citation context given the cited paper: p(c|d).
• Given a dataset of training samples with |C| pairs of citation context c^t and cited
document d^t, the objective is to maximize the log-likelihood Σ_t log p(c^t | d^t).[1]
• By assuming that the words in a citation context are mutually conditionally
independent given the cited document: p(c|d) = Π_{w_j ∈ c} p(w_j | d).
• The objective function can then be written as Σ_t Σ_{w_j ∈ c^t} log p(w_j | d^t).
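Under the independence assumption, the log-likelihood of a context factorizes into a sum over its words. A minimal sketch, where the probability table `p_w_given_d` is a hypothetical stand-in for the trained model:

```python
import math

def context_log_likelihood(context_words, p_w_given_d):
    """log p(c|d) = sum over words w in c of log p(w|d),
    under the mutual conditional-independence assumption."""
    return sum(math.log(p_w_given_d[w]) for w in context_words)

# Toy probabilities for one cited document d:
p = {"neural": 0.5, "model": 0.25}
ll = context_log_likelihood(["neural", "model"], p)  # log(0.5) + log(0.25)
```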
9. Word Representation Learning
[2] http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf
[3] http://colah.github.io/posts/2014-07-NLP-RNNs-Representations/
• Negative sampling is used to learn the distributed representations of words from
their surrounding words.[1]
• Given a pair of citation context c and cited document d, the skip-gram model goes
through all the words in the citation context using a sliding window.[1]
• For each word w_m that appears in citation context c, the words that appear within M
words before/after w_m are treated as positive samples. Suppose w_p is one of the
positive samples for w_m. The training objective is defined as:
log σ(w_p · w_m) + Σ_{i=1}^k log σ(−w_ni · w_m)
where w_ni is a negative sample randomly drawn from the noise distribution p_n(·),
and k is the number of negative samples.[1,2,3]
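The sliding-window positive sampling and the negative-sampling objective can be sketched as below; this is an illustrative toy version, not the paper's implementation, and `vec` is an assumed word-to-vector table:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def positive_pairs(context_words, M):
    """Slide a window of radius M over the context: words within M
    positions before/after the center word w_m are positive samples."""
    for i, w_m in enumerate(context_words):
        for j in range(max(0, i - M), min(len(context_words), i + M + 1)):
            if j != i:
                yield w_m, context_words[j]

def neg_sampling_objective(w_m, w_p, negatives, vec):
    """log sigma(w_p . w_m) + sum_i log sigma(-w_ni . w_m):
    reward similarity to the positive sample, penalize similarity
    to the k negative samples drawn from the noise distribution."""
    obj = math.log(sigmoid(dot(vec[w_p], vec[w_m])))
    for w_n in negatives:
        obj += math.log(sigmoid(-dot(vec[w_n], vec[w_m])))
    return obj
```

Maximizing this objective over all (w_m, w_p) pairs pushes co-occurring words toward similar vectors.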
11. Document Representation Learning (Cont …)
• Since we want to fit the neural probabilistic model p_θ(d|w) to the real
distribution p_r(d|w), we can rewrite the objective accordingly.
• Given each word w_i^t that appears in the citation context and the cited
document d^t, we compute its contribution to the log-likelihood, along with k
randomly generated noise words w_n1, …, w_nk, using the following objective
function:
log σ(w_i^t · d^t) + Σ_{j=1}^k log σ(−w_nj · d^t)
• The training objective is to learn both the word representations and the
document representations that maximize this equation.[1]
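Assuming the objective takes the usual negative-sampling form, one gradient-ascent step that tunes the document and word vectors jointly might look like this (an illustrative sketch, not the paper's training code):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sgd_step(d, w, noise, lr=0.1):
    """One gradient-ascent step on
    log sigma(d . w) + sum_j log sigma(-d . w_nj),
    updating the cited-document vector d, the context word vector w,
    and the noise word vectors in place."""
    g_pos = 1.0 - sigmoid(dot(d, w))      # coefficient for the observed word
    grad_d = [g_pos * wi for wi in w]
    for w_n in noise:
        g_neg = sigmoid(dot(d, w_n))      # coefficient for a noise word
        for i in range(len(d)):
            grad_d[i] -= g_neg * w_n[i]
            w_n[i] -= lr * g_neg * d[i]   # push noise word away from d
    for i in range(len(d)):
        w[i] += lr * g_pos * d[i]         # pull observed word toward d
        d[i] += lr * grad_d[i]
```

Repeated steps increase d · w for observed (word, document) pairs, which is exactly the "high semantic similarity" behavior described on the motivation slide.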
14. Citation Recommendation
• Using the fine-tuned word and document representations, we can compute the
normalized probability distribution p(w|d).
• The table of p(d|w) is pre-calculated using Bayes’ rule and stored as an
inverted index.
• Given a query q = [w_1, w_2, …, w_|q|], the task is to recommend a list of
documents R = [d_1, d_2, …, d_N] that should be cited.[1]
• We use term frequency–inverse context frequency (TF-ICF) to
measure p(w_j | q).[1]
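The lookup-and-score step can be sketched as follows; the exact TF-ICF formula and the inverted-index format here are illustrative assumptions, not the paper's specification:

```python
import math
from collections import defaultdict

def tf_icf(query, context_freq, n_contexts):
    """Hypothetical TF-ICF weighting for p(w|q): term frequency in the
    query times inverse context frequency over the training contexts,
    normalized to sum to one."""
    weights = {}
    for w in query:
        tf = query.count(w) / len(query)
        icf = math.log(n_contexts / (1 + context_freq.get(w, 0)))
        weights[w] = tf * icf
    norm = sum(weights.values()) or 1.0
    return {w: v / norm for w, v in weights.items()}

def recommend(query, p_d_given_w, context_freq, n_contexts, top_n=10):
    """Score documents by sum_j p(w_j|q) * p(d|w_j), reading p(d|w)
    from a precomputed inverted index (word -> {doc: prob})."""
    w_q = tf_icf(query, context_freq, n_contexts)
    scores = defaultdict(float)
    for w, pw in w_q.items():
        for d, p in p_d_given_w.get(w, {}).items():
            scores[d] += pw * p
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

Rare query words get larger ICF weights, so distinctive context terms dominate the ranking.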
15. Experiment (Data and Metrics)
• Snapshot of CiteSeer paper and citation database – Oct. 2013
• The dataset was split into two parts: papers crawled before 2011 (training data) and
papers crawled after 2011 (testing data).
• Citations were extracted along with their citation contexts (one line before and
after the sentence where the citation occurs).
• Number of citation-context/cited-document pairs: |C| = 8,992,476 (training set) and
1,628,698 (testing set).[1]
• The number of recommendations is limited to 10 per query. The dimension of the
representation vectors is set to n = 600 and the number of negative samples to k = 10.[1]
• Metrics : MAP, Recall, MRR, nDCG.
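For reference, two of the listed metrics (MRR and Recall at a cutoff) can be computed as below; this is a generic sketch, not the paper's evaluation code:

```python
def mrr(ranked_lists, relevant):
    """Mean reciprocal rank over queries: average of 1/rank of the
    first relevant document in each ranking (0 if none is found)."""
    total = 0.0
    for ranking, rel in zip(ranked_lists, relevant):
        for i, doc in enumerate(ranking, start=1):
            if doc in rel:
                total += 1.0 / i
                break
    return total / len(ranked_lists)

def recall_at_k(ranked_lists, relevant, k=10):
    """Fraction of each query's relevant documents retrieved in the
    top k, averaged over queries (assumes every query has relevant docs)."""
    total = 0.0
    for ranking, rel in zip(ranked_lists, relevant):
        total += len(set(ranking[:k]) & rel) / len(rel)
    return total / len(ranked_lists)
```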
16. Baseline Comparison of Neural Probabilistic Model (NPM)
(Results table from [1]: NPM compared against the baseline methods on MAP, Recall, MRR, and nDCG.)
19. Conclusion and Future Work
• We used distributed representations of words and documents to build a neural-
network-based citation recommendation model.
• The model learns word and document representations that are used to
calculate the probability of citing a document given a citation context.
• A comparative study on a snapshot of CiteSeer dataset with existing state-of-
the-art methods showed that the proposed model significantly improves the
quality of context-based citation recommendation.
• Since only neural probabilistic models were considered for context-based local
recommendation, one could explore combining this local model with global
recommendation models.
20. References
1. Wenyi Huang, Zhaohui Wu, Chen Liang, Prasenjit Mitra, C. Lee Giles - A Neural
Probabilistic Model for Context Based Citation Recommendation -
http://www.cse.psu.edu/~zzw109/pubs/AAAI2015NeuralProbabilisticModelCitationRecommendation.pdf
2. Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, Jeffrey Dean - Distributed
Representations of Words and Phrases and their Compositionality -
http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf
3. McNee et al. (2002) : www-users.cs.umn.edu/~mcnee/konstan-nectar-2006.pdf
4. Deep Learning, NLP, and Representations – (Posted : July 7, 2014)
http://colah.github.io/posts/2014-07-NLP-RNNs-Representations/
5. Standard IR measures :
https://en.wikipedia.org/wiki/Information_retrieval#Performance_and_correctness_measures