This presentation summarizes a paper on networks and natural language processing. It describes graph-based methods and algorithms used in syntactic parsing, lexical semantics, and other applications.
1. Networks and NLP
Networks and Natural
Language Processing
Presented by: Ahmed Magdy Ezzeldin
2. Graphs in NLP
● Graphs are used in many NLP applications, such as:
- Text Summarization
- Syntactic parsing
- Word sense disambiguation
- Ontology construction
- Sentiment and subjectivity analysis
- Text clustering
● Associative or semantic networks are used to
represent the language units and their relations
where language units are the vertices (nodes) and
the relations are the edges (links).
3. Networks are Graphs
Nodes are Vertices
Links are Edges
- Nodes can represent text units: (words,
collocations, word senses, sentences,
documents)
- Graph nodes do not have to be of the same
category
- Edges can represent relations: (co-occurrence,
collocation, syntactic dependency, lexical
similarity)
4. Outline
● Syntax
1- Dependency Parsing
2- Prepositional Phrase Attachment
3- Co-reference Resolution
● Lexical Semantics
1- Lexical Networks
2- Semantic Similarity and Relatedness
3- Word Sense Disambiguation
4- Sentiment and Subjectivity Analysis
● Other Applications
1- Summarization
2- Semi-supervised Passage Retrieval
3- Keyword Extraction
6. 1- Dependency Parsing
An approach to sentence parsing
The dependency tree of a sentence is
a directed subgraph of the full
graph connecting all words in the
sentence.
This subgraph is a tree.
The root of the tree is the main
predicate, which takes arguments
that are its child nodes
7. ● (McDonald et al., 2005) built a parser that
finds the tree with the highest score using the CLE
(Chu-Liu-Edmonds) algorithm for the maximum
spanning tree (MST) in a directed graph.
● Each node picks the neighbor with the highest
score, which leads to either a spanning tree or a cycle
● CLE collapses each cycle into a single node
● CLE runs in O(n^2) time
8. ● No tree covers all nodes, so the closest 2 nodes
are collapsed
9. ● We repeat this step until
all nodes are collapsed;
then an MST is constructed
by reversing the procedure
and expanding all nodes.
● McDonald et al. achieved
excellent results on a
standard English data set
and even better results on
Czech (a free-word-order
language)
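The collapse-and-expand procedure described on slides 7 to 9 can be sketched as follows. This is a minimal recursive illustration of Chu-Liu-Edmonds, not the parser of McDonald et al. and not the O(n^2) implementation; the function names and the edge scores are made up for the example, and every non-root word is assumed to have at least one incoming edge.

```python
def find_cycle(head, root):
    """Return the nodes of one cycle in the head map, or None if it is a tree."""
    for start in head:
        path, node = [], start
        while node != root and node not in path:
            path.append(node)
            node = head[node]
        if node != root:                      # walked back onto the current path
            return path[path.index(node):]
    return None


def chu_liu_edmonds(scores, root):
    """scores maps (head, dependent) -> score. Returns {dependent: head}."""
    nodes = list(dict.fromkeys(n for edge in scores for n in edge))
    # Greedy step: every non-root node picks its highest-scoring head.
    head = {}
    for dep in nodes:
        if dep == root:
            continue
        incoming = [(s, h) for (h, d), s in scores.items() if d == dep and h != dep]
        head[dep] = max(incoming, key=lambda pair: pair[0])[1]
    cycle = find_cycle(head, root)
    if cycle is None:
        return head
    # Collapse the cycle into a single node and rescore the edges touching it.
    in_cycle, merged = set(cycle), tuple(cycle)
    new_scores, origin = {}, {}
    for (h, d), s in scores.items():
        if h in in_cycle and d in in_cycle:
            continue
        if d in in_cycle:                     # edge entering the cycle
            key, s = (h, merged), s - scores[(head[d], d)]
        elif h in in_cycle:                   # edge leaving the cycle
            key = (merged, d)
        else:
            key = (h, d)
        if key not in new_scores or s > new_scores[key]:
            new_scores[key], origin[key] = s, (h, d)
    contracted = chu_liu_edmonds(new_scores, root)
    # Expand: map contracted edges back, then keep the remaining cycle edges.
    tree = {}
    for d, h in contracted.items():
        orig_head, orig_dep = origin[(h, d)]
        tree[orig_dep] = orig_head
    for d in cycle:
        tree.setdefault(d, head[d])
    return tree


# Illustrative scores for the sentence "John saw Mary" (values are invented).
scores = {
    ("root", "saw"): 10, ("root", "John"): 9, ("root", "Mary"): 9,
    ("John", "saw"): 20, ("saw", "John"): 30, ("saw", "Mary"): 30,
    ("Mary", "saw"): 0, ("John", "Mary"): 3, ("Mary", "John"): 11,
}
tree = chu_liu_edmonds(scores, "root")
total = sum(scores[(h, d)] for d, h in tree.items())
```

In this example the greedy step first creates a cycle between "John" and "saw"; collapsing it and re-running the greedy step attaches the merged node to the root, which breaks the cycle at "saw".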
10. 2- Prepositional Phrase Attachment
● (Toutanova et al., 2004) A preposition like "with" is attached either
to the main predicate (high verbal attachment) or to the
noun phrase before it (low nominal attachment).
- “I ate pizza with olives.” (low nominal attachment)
- “I ate pizza with a knife.” (high verbal attachment)
● They proposed a semi-supervised learning process in which a
graph of nouns and verbs is constructed; 2 words that
appear in the same context are connected with an edge.
● A random walk is run on the graph until convergence
● Reached a classification accuracy of 87.54%,
which is close to human performance (88.20%)
11. 3- Co-reference Resolution
● Identifying relations between entity
references in a text
● References can be nouns or pronouns
● Approximate the correct assignment of
references to entities in a text by using a
graph-cut algorithm.
Method:
A graph is constructed for each entity
● Every entity is linked to all of its possible
co-references with weighted edges, where the weights
are the confidence of each co-reference.
● Min-cut partitioning separates each entity and its
co-references.
12. Lexical Semantics
Semantic Analysis, Machine Translation, Information
retrieval, question answering, knowledge acquisition,
word sense disambiguation, semantic role labeling,
textual entailment, lexical acquisition, semantic relations
13. 1- Lexical Networks
a- Unsupervised lexical acquisition (Widdows and
Dorow, 2002)
Goal: build semantic classes automatically from raw
corpora
Method:
● Build a co-occurrence graph from British National
Corpus where nodes are words linked by conjunction
(and/or)
● Over 100,000 nodes and over half a million edges.
● Representative nouns are manually selected and put in
a seed set.
● The word with the largest number of links to the seed set
is added to the seed set
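The seed-expansion step can be sketched as a greedy loop. This is a simplified illustration under the slide's description (add the word with the most links to the current set), not the exact procedure of Widdows and Dorow; the function names and the toy conjunction pairs are hypothetical.

```python
from collections import defaultdict


def build_graph(pairs):
    """Undirected graph from (word, word) pairs found in 'X and/or Y' patterns."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def expand_seed(graph, seed, steps):
    """Repeatedly add the neighbor with the most links into the current set."""
    cluster = list(seed)
    for _ in range(steps):
        members = set(cluster)
        candidates = {n for m in members for n in graph[m]} - members
        if not candidates:
            break
        # Sorting first makes tie-breaking deterministic (alphabetical).
        best = max(sorted(candidates), key=lambda w: len(graph[w] & members))
        cluster.append(best)
    return cluster


# Toy pairs standing in for conjunctions extracted from a corpus.
pairs = [("apple", "pear"), ("apple", "banana"), ("pear", "banana"),
         ("banana", "cherry"), ("pear", "cherry"), ("cherry", "wine"),
         ("wine", "beer")]
graph = build_graph(pairs)
fruits = expand_seed(graph, ["apple", "pear"], steps=2)
```

Here "banana" is added first because it links to both seed words, and "cherry" follows once "banana" is part of the set.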
14. Result:
Accuracy of 82%, which is
far better than earlier results.
The drawback of this
method is low coverage,
as it is limited to words in
a conjunction relation only.
15. 1- Lexical Networks [continued]
b- Lexical Network Properties (Ferrer-i-Cancho and
Sole, 2001)
Goal:
● Observe Lexical Networks properties
Method:
● Build a co-occurrence network where words are
nodes that are linked with edges if they appear in the
same sentence at a distance of at most 2 words.
● Half a million nodes with over 10 million edges
Result:
● Small-world effect: 2-3 jumps can connect any 2 words
● Distribution of node degree is scale-free
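A small sketch of how such a co-occurrence network could be built and inspected. The two sentences, the whitespace tokenization, and the function names are assumptions for illustration; a toy graph like this cannot demonstrate the small-world or scale-free properties, it only shows the measurements involved (node degree and shortest path length).

```python
from collections import Counter, defaultdict, deque


def cooccurrence_graph(sentences, max_distance=2):
    """Link two words if they occur within max_distance positions in a sentence."""
    graph = defaultdict(set)
    for sentence in sentences:
        words = sentence.lower().split()
        for i, w in enumerate(words):
            for v in words[i + 1:i + 1 + max_distance]:
                if v != w:
                    graph[w].add(v)
                    graph[v].add(w)
    return graph


def shortest_path_length(graph, source, target):
    """Number of edges on a shortest path (breadth-first search), or None."""
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


sentences = ["the cat sat on the mat", "the dog sat on the rug"]
graph = cooccurrence_graph(sentences)
# Degree distribution: how many nodes have each degree.
degree_counts = Counter(len(neighbors) for neighbors in graph.values())
hops = shortest_path_length(graph, "cat", "rug")
```

Frequent function words such as "the" become hubs, which is what keeps paths between arbitrary words short.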
16. 2- Semantic Similarity and Relatedness
● Methods include metrics calculated on existing
semantic networks like WordNet by applying shortest-path
algorithms to identify the closest semantic relation
between 2 concepts (Leacock et al. 1998)
● Random Walk algorithm (Hughes and Ramage, 2007)
● PageRank computes the stationary distribution over nodes in
WordNet, biased toward each word of an input word pair.
● The divergence between these distributions is calculated to
measure the words' relatedness.
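The random-walk idea can be sketched on a toy graph. The graph below is a hand-made stand-in for WordNet, and Jensen-Shannon divergence is used here only as a simple symmetric choice; it is an assumption of this sketch, not necessarily the measure Hughes and Ramage used. The walk assumes every node has at least one neighbor.

```python
import math
from collections import defaultdict


def personalized_pagerank(graph, seed, damping=0.85, iterations=100):
    """Stationary distribution of a walk that restarts at the seed node."""
    nodes = list(graph)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new = {n: (1.0 - damping) if n == seed else 0.0 for n in nodes}
        for n in nodes:
            share = damping * rank[n] / len(graph[n])
            for neighbor in graph[n]:
                new[neighbor] += share
        rank = new
    return rank


def js_divergence(p, q):
    """Jensen-Shannon divergence between two distributions over the same nodes."""
    total = 0.0
    for n in p:
        mean = 0.5 * (p[n] + q[n])
        for x in (p[n], q[n]):
            if x > 0:
                total += 0.5 * x * math.log(x / mean)
    return total


# Hypothetical concept graph (undirected), not real WordNet data.
edges = [("car", "vehicle"), ("truck", "vehicle"), ("vehicle", "artifact"),
         ("artifact", "entity"), ("food", "entity"), ("fruit", "food"),
         ("banana", "fruit")]
graph = defaultdict(set)
for a, b in edges:
    graph[a].add(b)
    graph[b].add(a)

dist = {w: personalized_pagerank(graph, w) for w in ("car", "truck", "banana")}
car_truck = js_divergence(dist["car"], dist["truck"])
car_banana = js_divergence(dist["car"], dist["banana"])
```

A lower divergence means the two walks visit similar parts of the graph, so "car" and "truck" come out as more related than "car" and "banana".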
17. 3- Word Sense Disambiguation
a- Label Propagation Algorithm (Niu et al. 2005)
Method:
● Construct a graph of labeled and unlabeled examples for a
given ambiguous word
● Word sense examples are the nodes, and weighted edges are
drawn using a pairwise similarity metric.
● Known labeled examples form the seed set and are assigned
their correct labels (manually)
● Labels are propagated through the graph along the weighted
edges
● Labels are assigned with a certain probability
● The propagation is repeated until the label assignment
converges.
Result: Performs better than SVM when only a small number
of labeled examples is provided.
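The propagation loop can be sketched as below. This is a generic label propagation with clamped seeds, offered as an illustration rather than the exact formulation of Niu et al.; the example identifiers, similarity weights, and sense labels for the ambiguous word "bank" are invented, and a fixed iteration count stands in for a convergence check.

```python
def symmetric(edges):
    """Build {node: {neighbor: weight}} from (a, b, weight) triples."""
    weights = {}
    for a, b, w in edges:
        weights.setdefault(a, {})[b] = w
        weights.setdefault(b, {})[a] = w
    return weights


def propagate_labels(weights, seeds, labels, iterations=50):
    """Spread label probabilities along weighted edges; seed nodes stay fixed."""
    dist = {}
    for node in weights:
        if node in seeds:
            dist[node] = {l: 1.0 if l == seeds[node] else 0.0 for l in labels}
        else:
            dist[node] = {l: 1.0 / len(labels) for l in labels}
    for _ in range(iterations):
        new = {}
        for node, neighbors in weights.items():
            if node in seeds:
                new[node] = dist[node]            # clamp the labeled examples
                continue
            total = sum(neighbors.values())
            new[node] = {l: sum(w * dist[nb][l] for nb, w in neighbors.items()) / total
                         for l in labels}
        dist = new
    # Each example receives its most probable label.
    return {node: max(labels, key=lambda l: dist[node][l]) for node in weights}


# Hypothetical examples of "bank" with pairwise similarities.
weights = symmetric([("e1", "e2", 0.9), ("e2", "e3", 0.8), ("e1", "e3", 0.7),
                     ("e3", "e5", 0.1), ("e4", "e5", 0.9)])
seeds = {"e1": "finance", "e4": "river"}
senses = propagate_labels(weights, seeds, ["finance", "river"])
```

Only two of the five examples are labeled by hand; the other three inherit the sense of the seed they are most strongly connected to.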
19. Method:
● Build a graph for a given text and all the senses of its
words as nodes
● Senses are connected on the basis of their semantic
relations (synonymy, antonymy ...)
● A random walk results in a set of scores that reflects the
importance of each word sense.
Result:
● Superior to other knowledge-based word sense
disambiguation methods that did not use graph-based representations.
Follow up work:
● Mihalcea used edges weighted by a measure of lexical
similarity instead of semantic relations
● This brought generality, as it can use any electronic dictionary,
not just a semantic network like WordNet
20. c- Comparative Evaluation of Graph Connectivity
Algorithms (Navigli and Lapata, 2007)
● Applied to word sense graphs derived from WordNet
● Found that the best measure to use is a closeness
measure
21. 4- Sentiment and Subjectivity Analysis
a- Using min-cut graph algorithm (Pang and Lee 2004)
Method:
● Draw a graph where sentences are the nodes and the
edges are drawn according to the sentences' proximity
● Each node is assigned a score showing the probability that its
sentence is subjective using a supervised subjectivity classifier
● Use min-cut algorithm to separate subjective from objective
sentences.
Results:
● Better than the supervised subjectivity classifier
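The min-cut step can be sketched with a small max-flow routine. The construction below (a source for "subjective", a sink for "objective", classifier probabilities as terminal capacities, and a constant association weight between adjacent sentences) is a simplified assumption of how such a graph might be set up; the probabilities and the weight 0.3 are invented, and the function names are hypothetical.

```python
from collections import defaultdict, deque


def min_cut_source_side(capacity, source, sink):
    """Edmonds-Karp max-flow; returns the nodes left on the source side of a min cut."""
    residual = defaultdict(lambda: defaultdict(float))
    for u in capacity:
        for v, c in capacity[u].items():
            residual[u][v] += c
            residual[v][u] += 0.0             # make sure the reverse edge exists
    while True:
        parent = {source: None}               # breadth-first search for a path
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return set(parent)                # nodes still reachable from the source
        flow, v = float("inf"), sink
        while parent[v] is not None:          # bottleneck capacity on the path
            flow = min(flow, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:          # push the flow along the path
            u = parent[v]
            residual[u][v] -= flow
            residual[v][u] += flow
            v = u


def split_subjective(p_subjective, association):
    """Label each sentence True (subjective) or False (objective) via min-cut."""
    capacity = defaultdict(dict)
    for i, p in enumerate(p_subjective):
        capacity["SUBJ"][i] = p               # cost of calling sentence i objective
        capacity[i]["OBJ"] = 1.0 - p          # cost of calling it subjective
    for i in range(len(p_subjective) - 1):    # adjacent sentences prefer the same label
        capacity[i][i + 1] = association
        capacity[i + 1][i] = association
    side = min_cut_source_side(capacity, "SUBJ", "OBJ")
    return [i in side for i in range(len(p_subjective))]


# Invented classifier outputs for four consecutive sentences.
labels = split_subjective([0.9, 0.45, 0.8, 0.1], association=0.3)
```

The second sentence has a probability just below 0.5, so a classifier with a 0.5 threshold would call it objective; the cut labels it subjective because it sits between two clearly subjective sentences.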
b- By assigning subjectivity and polarity labels (Esuli and
Sebastiani 2007)
Method:
● Random walk on a graph seeded with nodes labeled for
subjectivity and polarity.
24. 1- Summarization
a- (Salton et al. 1994, 1997)
● Draw a graph of the corpus where every node is a paragraph
● Lexically similar paragraphs are linked with edges
● A summary is retrieved by following paths defined by different
algorithms to cover as much of the content of the graph as
possible.
b- Lexical Centrality (Erkan and Radev 2004) (Mihalcea and
Tarau 2004)
Method:
● Sentences are nodes of the graph
● Random walk to define the most visited nodes as central to
the documents
● Remove duplicates or near duplicates
● Select sentences with maximal marginal relevance
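A compact sketch of centrality-based sentence selection. To keep it short it ranks sentences by degree in the similarity graph rather than by a random walk, and it uses a simple similarity cutoff to drop near duplicates in place of full maximal marginal relevance; both are simplifications chosen for this example. The thresholds, sentences, and function names are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def summarize(sentences, k=2, link_threshold=0.1, redundancy_threshold=0.7):
    """Pick up to k central sentences, skipping near duplicates of chosen ones."""
    bags = [Counter(s.lower().split()) for s in sentences]
    n = len(sentences)
    sim = [[cosine(bags[i], bags[j]) for j in range(n)] for i in range(n)]
    # Degree centrality: number of other sentences that are similar enough.
    degree = [sum(1 for j in range(n) if j != i and sim[i][j] >= link_threshold)
              for i in range(n)]
    order = sorted(range(n), key=lambda i: (-degree[i], i))
    chosen = []
    for i in order:
        if all(sim[i][j] < redundancy_threshold for j in chosen):
            chosen.append(i)
        if len(chosen) == k:
            break
    return [sentences[i] for i in sorted(chosen)]   # keep document order


sentences = [
    "graphs model relations between words",
    "graphs model relations between words",
    "random walks rank nodes in graphs",
    "the weather was sunny today",
]
summary = summarize(sentences)
```

The duplicated sentence is selected once, and the unrelated sentence is never reached because it has no links in the similarity graph.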
25. 2- Semi-supervised Passage Retrieval
● Question-Biased Passage Retrieval
(Otterbacher et al., 2005)
Answer a question from a group of documents
Method:
● Use biased random walk on a graph seeded
with positive and negative examples
● Each node is labeled according to the
percentage of random walks that end at this node
● The nodes with the highest score are central to
the document set and similar to the seed nodes.
26. 3- Keyword Extraction
● A set of terms that best describes the document
● Used in terminology extraction and construction of
domain-specific dictionaries
27. ● Mihalcea and Tarau, 2004
Method:
● Build a co-occurrence graph for the input text where
nodes are the words of the text
● Words are linked by a co-occurrence relation limited by
the distance between words.
● Random walk on the graph
● Words ranked as important and found next
to each other are collapsed into one key phrase
Result:
● Considerably better than tf.idf
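The steps above can be sketched as a single function. This is a simplified illustration in the spirit of the method, assuming whitespace tokenization and no part-of-speech filtering or stop-word removal; the function name, parameters, and sample text are hypothetical.

```python
from collections import defaultdict


def textrank_keyphrases(text, window=2, top_k=4, damping=0.85, iterations=50):
    """Rank words with a random walk, then merge adjacent top-ranked words."""
    words = text.lower().split()
    # Co-occurrence graph: link words at most `window` positions apart.
    graph = defaultdict(set)
    for i, w in enumerate(words):
        for v in words[i + 1:i + 1 + window]:
            if v != w:
                graph[w].add(v)
                graph[v].add(w)
    # PageRank-style random walk on the undirected graph.
    score = {w: 1.0 / len(graph) for w in graph}
    for _ in range(iterations):
        score = {w: (1 - damping) / len(graph)
                    + damping * sum(score[v] / len(graph[v]) for v in graph[w])
                 for w in graph}
    keywords = set(sorted(score, key=lambda w: (-score[w], w))[:top_k])
    # Collapse runs of adjacent keywords into key phrases.
    phrases, current = [], []
    for w in words + [None]:                  # None flushes the last run
        if w in keywords:
            current.append(w)
        else:
            if current and " ".join(current) not in phrases:
                phrases.append(" ".join(current))
            current = []
    return score, phrases


text = ("graph algorithms rank words in a text graph and "
        "graph algorithms extract key phrases from the text")
score, phrases = textrank_keyphrases(text)
```

Without part-of-speech filtering, function words can be ranked highly, which is why practical versions of this approach restrict the graph to selected word classes.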
28. References
Networks and Natural Language Processing
(Mihalcea and Radev 2008)
Dragomir Radev
University of Michigan
radev@umich.edu
Rada Mihalcea
University of North Texas
rada@cs.unt.edu