This presentation summarizes a paper on networks and natural language processing. It describes graph-based methods and algorithms used in syntactic parsing, lexical semantics, and other applications.
Networks and NLP Networks and Natural Language Processing Presented by: Ahmed Magdy Ezzeldin
Graphs in NLP
● Graphs are used in many NLP applications, such as:
- Text summarization
- Syntactic parsing
- Word sense disambiguation
- Ontology construction
- Sentiment and subjectivity analysis
- Text clustering
● Associative or semantic networks represent language units and their relations: the language units are the vertices (nodes) and the relations are the edges (links).
Networks are Graphs
Nodes are vertices; links are edges.
- Nodes can represent text units (words, collocations, word senses, sentences, documents)
- Graph nodes do not have to be of the same category
- Edges can represent relations (co-occurrence, collocation, syntactic dependency, lexical similarity)
Outline
● Syntax
1- Dependency Parsing
2- Prepositional Phrase Attachment
3- Co-reference Resolution
● Lexical Semantics
1- Lexical Networks
2- Semantic Similarity and Relatedness
3- Word Sense Disambiguation
4- Sentiment and Subjectivity Analysis
● Other Applications
1- Summarization
2- Semi-supervised Passage Retrieval
3- Keyword Extraction
1- Dependency Parsing
● An approach to sentence parsing
● The dependency tree of a sentence is a directed subgraph of the full graph connecting all the words in the sentence; this subgraph is a tree.
● The root of the tree is the main predicate, and the arguments it takes are its child nodes.
● (McDonald et al., 2005) built a parser that finds the tree with the highest score using the Chu-Liu-Edmonds (CLE) algorithm for the maximum spanning tree (MST) in a directed graph.
● Each node picks the neighbor with the highest score, which leads to either a spanning tree or a cycle.
● CLE collapses each cycle into a single node.
● CLE runs in O(n^2).
● No tree covers all nodes, so the closest 2 nodes are collapsed
● We repeat this step until all nodes are collapsed; then an MST is constructed by reversing the procedure and expanding all nodes.
● McDonald achieved excellent results on a standard English data set and even better results on Czech (a free-word-order language).
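The greedy step and the cycle check described on these slides can be sketched as follows. This is a minimal illustration, not the parser of McDonald et al.: the function names `best_heads` and `find_cycle` and all edge scores are made up for the example, and the collapse/expand steps of CLE are left out.

```python
def best_heads(scores, root=0):
    """Greedy step: every non-root word picks its highest-scoring head.

    scores[h][d] is an illustrative score for the edge head h -> dependent d.
    """
    n = len(scores)
    heads = {}
    for d in range(n):
        if d == root:
            continue
        candidates = [h for h in range(n) if h != d]
        heads[d] = max(candidates, key=lambda h: scores[h][d])
    return heads


def find_cycle(heads):
    """Return the nodes of one cycle in the head choices, or None if they form a tree."""
    for start in heads:
        path = []
        node = start
        while node in heads and node not in path:
            path.append(node)
            node = heads[node]
        if node in path:
            return path[path.index(node):]
    return None


# Toy sentence: 0 = ROOT, 1 = "John", 2 = "saw", 3 = "Mary" (made-up scores).
scores = [
    [0, 9, 10, 9],
    [0, 0, 20, 3],
    [0, 30, 0, 30],
    [0, 11, 0, 0],
]
heads = best_heads(scores)
cycle = find_cycle(heads)  # "John" and "saw" pick each other: CLE would collapse them
```

When `find_cycle` returns None, the greedy choices already form a spanning tree and no collapsing is needed.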
2- Prepositional Phrase Attachment
● (Toutanova et al., 2004) A preposition like "with" is attached either to the main predicate (high verbal attachment) or to the noun phrase before it (low nominal attachment).
- “I ate pizza with olives.”
- “I ate pizza with a knife.”
● They proposed a semi-supervised learning process in which a graph of nouns and verbs is constructed; if 2 words appear in the same context, they are connected with an edge.
● A random walk is run on the graph until convergence
● Reached a classification accuracy of 87.54%, which is close to human performance (88.20%)
3- Co-reference Resolution
● Identifying relations between entity references in a text
● References can be nouns or pronouns
● Approximate the correct assignment of references to entities in a text by using a graph-cut algorithm.
Method:
● A graph is constructed for each entity
● Every entity is linked to all its possible co-references with weighted edges, where the weights are the confidence of each co-reference.
● Min-cut partitioning separates each entity and its co-references.
Lexical Semantics
Semantic Analysis, Machine Translation, Information retrieval, question answering, knowledge acquisition, word sense disambiguation, semantic role labeling, textual entailment, lexical acquisition, semantic relations
1- Lexical Networks
a- Unsupervised lexical acquisition (Widdows and Dorow, 2002)
Goal: build semantic classes automatically from raw corpora
Method:
● Build a co-occurrence graph from the British National Corpus, where nodes are words linked by conjunctions (and/or)
● Over 100,000 nodes and over half a million edges.
● Representative nouns are manually selected and put in a seed set.
● The word with the largest number of links to the seed set is added to the seed set
Result:
Accuracy of 82%, which is far better than earlier methods.
The drawback of this method is low coverage, as it is limited to words in a conjunction relation.
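The seed-expansion step can be sketched on a toy graph as follows. This is a simplified reading of the slide, not the exact procedure of Widdows and Dorow: the function name `expand_seed_set`, the tie-breaking rule, and the tiny word graph are assumptions made for illustration.

```python
def expand_seed_set(graph, seeds, steps):
    """Grow a seed set by repeatedly adding the outside word with the most links into it.

    graph maps each word to the set of words it is conjoined with (undirected).
    """
    seeds = set(seeds)
    for _ in range(steps):
        link_counts = {}
        for word, neighbours in graph.items():
            if word in seeds:
                continue
            count = len(neighbours & seeds)
            if count:
                link_counts[word] = count
        if not link_counts:
            break  # nothing outside the seed set is linked to it
        # Highest link count first; ties broken alphabetically (an assumption).
        best = min(link_counts, key=lambda w: (-link_counts[w], w))
        seeds.add(best)
    return seeds


# Toy conjunction graph ("apple and banana", "orange or juice", ...).
toy_graph = {
    "apple": {"banana", "orange", "pear"},
    "banana": {"apple", "orange", "pear"},
    "orange": {"apple", "banana", "juice"},
    "pear": {"apple", "banana"},
    "juice": {"orange", "milk"},
    "milk": {"juice"},
}
fruit_class = expand_seed_set(toy_graph, {"apple", "banana"}, steps=2)
```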
1- Lexical Networks [continued]
b- Lexical Network Properties (Ferrer-i-Cancho and Sole, 2001)
Goal:
● Observe the properties of lexical networks
Method:
● Build a co-occurrence network where words are nodes, linked by an edge if they appear in the same sentence at a distance of at most 2 words.
● Half a million nodes with over 10 million edges
Result:
● Small-world effect: 2-3 jumps can connect any 2 words
● The distribution of node degree is scale-free
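A co-occurrence network of this kind can be sketched in a few lines. The sketch assumes whitespace tokenization and reads "distance of at most 2 words" as a difference of at most 2 token positions; the function names are hypothetical.

```python
from collections import Counter, defaultdict


def cooccurrence_graph(sentences, max_distance=2):
    """Link two words if they occur within max_distance positions in a sentence."""
    graph = defaultdict(set)
    for sentence in sentences:
        words = sentence.lower().split()
        for i, word in enumerate(words):
            for j in range(i + 1, min(i + max_distance + 1, len(words))):
                if words[j] != word:
                    graph[word].add(words[j])
                    graph[words[j]].add(word)
    return graph


def degree_distribution(graph):
    """Map each degree to the number of nodes that have it."""
    return dict(Counter(len(neighbours) for neighbours in graph.values()))


graph = cooccurrence_graph(["the cat sat on the mat"])
distribution = degree_distribution(graph)
```

On a large corpus, plotting `distribution` on log-log axes is the usual way to check whether the degree distribution looks scale-free.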
2- Semantic Similarity and Relatedness
● Methods include metrics calculated on existing semantic networks like WordNet, applying shortest-path algorithms to identify the closest semantic relation between 2 concepts (Leacock et al. 1998)
● Random walk algorithm (Hughes and Ramage, 2007)
● PageRank computes the stationary distribution over the nodes in WordNet, biased toward each word of an input word pair.
● The divergence between these distributions is calculated to measure the words' relatedness.
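The random-walk idea can be sketched as a PageRank whose teleport step always returns to the input word, followed by a divergence between the two resulting distributions. This is only an illustration under stated assumptions: the toy graph stands in for WordNet, the Jensen-Shannon divergence is one possible choice of divergence, and the function names and parameters are hypothetical.

```python
import math


def personalized_pagerank(graph, source, damping=0.85, iterations=100):
    """Stationary distribution of a walk that teleports back to `source`.

    Assumes every node in graph has at least one neighbour.
    """
    nodes = sorted(graph)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: 0.0 for n in nodes}
        for n in nodes:
            share = damping * rank[n] / len(graph[n])
            for m in graph[n]:
                new_rank[m] += share
        new_rank[source] += 1.0 - damping  # biased teleport
        rank = new_rank
    return rank


def js_divergence(p, q):
    """Jensen-Shannon divergence between two distributions over the same nodes."""
    total = 0.0
    for n in p:
        mean = 0.5 * (p[n] + q[n])
        for x in (p[n], q[n]):
            if x > 0:
                total += 0.5 * x * math.log(x / mean)
    return total


# Toy undirected graph standing in for a semantic network (made up).
toy_net = {
    "car": {"automobile", "vehicle"},
    "automobile": {"car", "vehicle"},
    "vehicle": {"car", "automobile", "object"},
    "object": {"vehicle", "fruit"},
    "fruit": {"object", "banana"},
    "banana": {"fruit"},
}
r_car = personalized_pagerank(toy_net, "car")
r_auto = personalized_pagerank(toy_net, "automobile")
r_banana = personalized_pagerank(toy_net, "banana")
near = js_divergence(r_car, r_auto)   # small divergence: closely related words
far = js_divergence(r_car, r_banana)  # larger divergence: weakly related words
```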
3- Word Sense Disambiguation
a- Label Propagation Algorithm (Niu et al. 2005)
Method:
● Construct a graph of labeled and unlabeled examples for a given ambiguous word
● Word sense examples are the nodes, and weighted edges are drawn using a pairwise similarity metric.
● Known labeled examples form the seed set and are assigned their correct labels (manually)
● Labels are propagated through the graph along the weighted edges
● Labels are assigned with a certain probability
● The propagation is repeated until the label assignment converges.
Result: Performs better than SVM when only a small number of examples is provided.
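The propagation loop can be sketched as follows. It is a generic label-propagation sketch with clamped seeds, not necessarily the exact formulation of Niu et al.; the function name, the fixed iteration count, the similarity weights, and the example nodes for the word "bank" are all made up.

```python
def propagate_labels(weights, seeds, labels, iterations=50):
    """Return, for each node, a probability distribution over sense labels.

    weights[n][m] is the similarity between examples n and m;
    seeds maps manually labeled examples to their sense.
    """
    nodes = list(weights)
    # Unlabeled examples start with a uniform distribution over senses.
    dist = {n: {l: 1.0 / len(labels) for l in labels} for n in nodes}
    for n, label in seeds.items():
        dist[n] = {l: 1.0 if l == label else 0.0 for l in labels}
    for _ in range(iterations):
        new_dist = {}
        for n in nodes:
            if n in seeds:
                new_dist[n] = dist[n]  # seed labels stay clamped
                continue
            total = sum(weights[n].values())
            new_dist[n] = {
                l: sum(w * dist[m][l] for m, w in weights[n].items()) / total
                for l in labels
            }
        dist = new_dist
    return dist


# Four occurrences of "bank": two labeled seeds and two unlabeled examples.
weights = {
    "river1": {"u1": 0.9},
    "money1": {"u1": 0.1, "u2": 0.8},
    "u1": {"river1": 0.9, "money1": 0.1, "u2": 0.2},
    "u2": {"money1": 0.8, "u1": 0.2},
}
seeds = {"river1": "river", "money1": "finance"}
senses = propagate_labels(weights, seeds, ["river", "finance"])
predicted = {n: max(d, key=d.get) for n, d in senses.items()}
```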
b- Knowledge-based word sense disambiguation (Mihalcea et al. 2004, Sinha and Mihalcea 2007)
Method:
● Build a graph for a given text, with all the senses of its words as nodes
● Senses are connected on the basis of their semantic relations (synonymy, antonymy ...)
● A random walk results in a set of scores that reflect the importance of each word sense.
Result:
● Superior to other knowledge-based word sense disambiguation methods that did not use graph-based representations.
Follow-up work:
● Instead of semantic relations, Mihalcea used edges weighted by a measure of lexical similarity
● This brought generality, as it can use any electronic dictionary, not just a semantic network like WordNet
c- Comparative Evaluation of Graph Connectivity Algorithms (Navigli and Lapata, 2007)
● Applied to word sense graphs derived from WordNet
● Found that the best measure to use is a closeness measure
4- Sentiment and Subjectivity Analysis
a- Using a min-cut graph algorithm (Pang and Lee 2004)
Method:
● Draw a graph where sentences are the nodes and the edges are drawn according to the sentences' proximity
● Each node is assigned a score showing the probability that its sentence is subjective, using a supervised subjectivity classifier
● Use a min-cut algorithm to separate subjective from objective sentences.
Results:
● Better than the supervised subjectivity classifier
b- By assigning subjectivity and polarity labels (Esuli and Sebastiani 2007)
Method:
● Random walk on a graph seeded with nodes labeled for subjectivity and polarity.
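The min-cut method in (a) can be sketched with a small max-flow routine. This is an illustrative sketch only: the terminal names "SUBJ" and "OBJ", the constant association weight between neighbouring sentences, the classifier probabilities, and the use of Edmonds-Karp are assumptions, not details taken from the slides.

```python
from collections import deque


def min_cut_source_side(capacity, source, sink):
    """Edmonds-Karp max flow; returns the nodes on the source side of a minimum cut."""
    # Residual capacities, with missing reverse edges initialised to 0.
    residual = {u: dict(edges) for u, edges in capacity.items()}
    for u, edges in capacity.items():
        for v in edges:
            residual.setdefault(v, {}).setdefault(u, 0.0)

    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parents = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 1e-12 and v not in parents:
                    parents[v] = u
                    queue.append(v)
        if sink not in parents:
            return set(parents)  # nodes still reachable from the source
        path, v = [], sink
        while parents[v] is not None:
            path.append((parents[v], v))
            v = parents[v]
        bottleneck = min(residual[a][b] for a, b in path)
        for a, b in path:
            residual[a][b] -= bottleneck
            residual[b][a] += bottleneck


def split_subjective(p_subjective, association):
    """Return the indices of sentences that end up on the subjective side of the cut."""
    capacity = {"SUBJ": {}, "OBJ": {}}
    for i, p in enumerate(p_subjective):
        capacity["SUBJ"][i] = p            # cost of calling sentence i objective
        capacity[i] = {"OBJ": 1.0 - p}     # cost of calling sentence i subjective
    for i in range(len(p_subjective) - 1):
        capacity[i][i + 1] = association   # cost of separating neighbours
        capacity[i + 1][i] = association
    side = min_cut_source_side(capacity, "SUBJ", "OBJ")
    return sorted(n for n in side if n != "SUBJ")


# Classifier alone would label sentence 1 objective (p = 0.4);
# proximity to its subjective neighbours pulls it to the subjective side.
subjective = split_subjective([0.9, 0.4, 0.8, 0.1], association=0.3)
```

With `association=0.0` the cut reduces to thresholding the classifier probabilities at 0.5, which makes the contribution of the proximity edges easy to see.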
1- Summarization
a- (Salton et al. 1994, 1997)
● Draw a graph of the corpus where every node is a paragraph
● Lexically similar paragraphs are linked with edges
● A summary is retrieved by following paths defined by different algorithms to cover as much of the content of the graph as possible.
b- Lexical Centrality (Erkan and Radev 2004) (Mihalcea and Tarau 2004)
Method:
● Sentences are the nodes of the graph
● A random walk identifies the most visited nodes as central to the documents
● Remove duplicates or near duplicates
● Select sentences with maximal marginal relevance
2- Semi-supervised Passage Retrieval
● Question-Biased Passage Retrieval (Otterbacher et al., 2005)
Answer a question from a group of documents
Method:
● Use a biased random walk on a graph seeded with positive and negative examples
● Each node is labeled according to the percentage of the time a random walk ends at this node
● The nodes with the highest scores are central to the document set and similar to the seed nodes.
3- Keyword Extraction
● A set of terms that best describes the document
● Used in terminology extraction and construction of domain-specific dictionaries
● Mihalcea and Tarau, 2004
Method:
● Build a co-occurrence graph for the input text where the nodes are the words of the text
● Words are linked by a co-occurrence relation limited by the distance between words.
● Random walk on the graph
● Words ranked as important and found next to each other are collapsed into one key phrase
Result:
● A lot better than tf.idf
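The steps above can be sketched as a small PageRank-style ranking over a word co-occurrence graph, followed by merging adjacent keywords. This is a simplified sketch, not the published system: it assumes the input is an already tokenized and filtered word list, and the function names, window size, and damping factor are illustrative choices.

```python
from collections import defaultdict


def rank_words(words, window=2, damping=0.85, iterations=50):
    """Score words by a random walk on their co-occurrence graph."""
    graph = defaultdict(set)
    for i, word in enumerate(words):
        for j in range(i + 1, min(i + window + 1, len(words))):
            if words[j] != word:
                graph[word].add(words[j])
                graph[words[j]].add(word)
    score = {w: 1.0 for w in graph}
    for _ in range(iterations):
        # Each word receives an equal share of the score of every neighbour.
        score = {
            w: (1 - damping) + damping * sum(score[n] / len(graph[n]) for n in graph[w])
            for w in graph
        }
    return score


def collapse_keyphrases(words, keywords):
    """Merge keywords that are adjacent in the text into single key phrases."""
    phrases, current = [], []
    for word in list(words) + [None]:  # None flushes the last phrase
        if word in keywords:
            current.append(word)
            continue
        if current:
            phrase = " ".join(current)
            if phrase not in phrases:
                phrases.append(phrase)
        current = []
    return phrases


# Star-shaped toy text: "hub" co-occurs with every other word.
star_scores = rank_words(["x", "hub", "y", "hub", "z", "hub", "w"], window=1)
top_word = max(star_scores, key=star_scores.get)

text = ["a", "random", "walk", "on", "a", "graph"]
phrases = collapse_keyphrases(text, {"random", "walk", "graph"})
```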
References
Networks and Natural Language Processing (Mihalcea and Radev 2008)
Dragomir Radev, University of Michigan, email@example.com
Rada Mihalcea, University of North Texas, firstname.lastname@example.org