This paper presents a new exemplar-based approach for word sense disambiguation (WSD) that integrates multiple knowledge sources. The authors' WSD system, called LEXAS, was tested on two datasets. On a common dataset involving the noun "interest", LEXAS achieved 87.4% accuracy, higher than previous work. LEXAS was also tested on a large dataset of 192,800 sense-tagged words, performing better than the most frequent sense heuristic on highly ambiguous words. This represents the largest test of a WSD system to date.
…each word in the sentence has been pre-tagged with its correct POS, so that the possible senses to consider for a content word w are only those associated with the particular POS of w in the sentence. For instance, given the sentence "A reduction of principal and interest is one way the problem may be solved.", since the word "interest" appears as a noun in this sentence, LEXAS will only consider the noun senses of "interest" but not its verb senses. That is, LEXAS is only concerned with disambiguating senses of a word in a given POS. Making such an assumption is reasonable since POS taggers that can achieve accuracy of 96% are readily available to assign POS to unrestricted English sentences (Brill, 1992; Cutting et al., 1992).

In addition, sense definitions are only available for root words in a dictionary. These are words that are not morphologically inflected, such as "interest" (as opposed to the plural form "interests"), "fall" (as opposed to the other inflected forms like "fell", "fallen", "falling", "falls"), etc. The sense of a morphologically inflected content word is the sense of its uninflected form. LEXAS follows this convention by first converting each word in an input sentence into its morphological root using the morphological analyzer of WordNet, before assigning the appropriate word sense to the root form.
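This root-form convention can be reproduced with any analyzer that knows WordNet's lemma inventory. Below is a minimal sketch using NLTK's interface to WordNet (wordnet.morphy) as a stand-in for the analyzer the paper refers to; NLTK and the function name morphological_root are assumptions introduced for illustration, since the paper predates that toolkit.

    from nltk.corpus import wordnet as wn

    def morphological_root(word, pos):
        # Map an inflected content word to its uninflected (root) form,
        # falling back to the lower-cased surface form when WordNet has no entry.
        root = wn.morphy(word.lower(), pos)
        return root if root is not None else word.lower()

    print(morphological_root("interests", wn.NOUN))   # -> interest
    print(morphological_root("falling", wn.VERB))     # -> fall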
3 Algorithm

LEXAS performs WSD by first learning from a training corpus of sentences in which words have been pre-tagged with their correct senses. That is, it uses supervised learning, in particular exemplar-based learning, to achieve WSD. Our approach has been fully implemented in the program LEXAS. Part of the implementation uses PEBLS (Cost and Salzberg, 1993; Rachlin and Salzberg, 1993), a public domain exemplar-based learning system.

LEXAS builds one exemplar-based classifier for each content word w. It operates in two phases: a training phase and a test phase. In the training phase, LEXAS is given a set S of sentences in the training corpus in which sense-tagged occurrences of w appear. For each training sentence with an occurrence of w, LEXAS extracts the parts of speech (POS) of words surrounding w, the morphological form of w, the words that frequently co-occur with w in the same sentence, and the local collocations containing w. For disambiguating a noun w, the verb which takes the current noun w as the object is also identified. This set of values forms the features of an example, with one training sentence contributing one training example.

Subsequently, in the test phase, LEXAS is given new, previously unseen sentences. For a new sentence containing the word w, LEXAS extracts from the new sentence the values for the same set of features, including the parts of speech of words surrounding w, the morphological form of w, the frequently co-occurring words surrounding w, the local collocations containing w, and the verb that takes w as an object (for the case when w is a noun). These values form the features of a test example.

This test example is then compared to every training example. The sense of word w in the test example is the sense of w in the closest matching training example, where there is a precise, computational definition of "closest match" as explained later.

3.1 Feature Extraction

The first step of the algorithm is to extract a set F of features such that each sentence containing an occurrence of w will form a training example supplying the necessary values for the set F of features. Specifically, LEXAS uses the following set of features to form a training example:

L3, L2, L1, R1, R2, R3, M, K1, ..., Km, C1, ..., C9, V

3.1.1 Part of Speech and Morphological Form

The value of feature Li is the part of speech (POS) of the word at the i-th position to the left of w. The value of Ri is the POS of the word at the i-th position to the right of w. Feature M denotes the morphological form of w in the sentence s. For a noun, the value of this feature is either singular or plural; for a verb, the value is one of infinitive (as in the uninflected form of a verb like "fall"), present-third-person-singular (as in "falls"), past (as in "fell"), present-participle (as in "falling") or past-participle (as in "fallen").
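To make the positional features concrete, here is a small sketch (not the authors' code) that reads the L3..L1 and R1..R3 values off a POS-tagged sentence and derives the morphological-form feature M from the tag of w. The (word, tag) input format, the Penn Treebank tag names, and the padding value are assumptions made only for illustration.

    PAD = "<None>"   # used when the window extends past the sentence boundary

    def pos_features(tagged_sentence, target_index):
        # POS of the three words to the left (L3, L2, L1) and right (R1, R2, R3) of w.
        feats = {}
        for offset in (-3, -2, -1, 1, 2, 3):
            name = ("L" if offset < 0 else "R") + str(abs(offset))
            j = target_index + offset
            feats[name] = tagged_sentence[j][1] if 0 <= j < len(tagged_sentence) else PAD
        return feats

    def morphological_form(tag):
        # Feature M, read off a Penn Treebank tag (a simplifying assumption).
        noun_forms = {"NN": "singular", "NNS": "plural"}
        verb_forms = {"VB": "infinitive", "VBZ": "present-3rd-person-singular",
                      "VBD": "past", "VBG": "present-participle", "VBN": "past-participle"}
        return noun_forms.get(tag) or verb_forms.get(tag) or "n/a"

    sentence = [("A", "DT"), ("reduction", "NN"), ("of", "IN"), ("principal", "NN"),
                ("and", "CC"), ("interest", "NN"), ("is", "VBZ"), ("one", "CD"), ("way", "NN")]
    print(pos_features(sentence, 5))           # POS tags of the words around "interest"
    print(morphological_form(sentence[5][1]))  # -> singular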
2. The keyword k must occur at least M2 times in some sense i, where M2 is some predefined minimum value.

3. Select at most M3 keywords for a given sense i if the number of keywords satisfying the first two criteria for a given sense i exceeds M3. In this case, keywords that co-occur more frequently (in terms of absolute frequency) with sense i of word w are selected over those co-occurring less frequently.

Condition 1 ensures that a selected keyword is indicative of some sense i of w since cp(i|k) is at least some minimum probability M1. Condition 2 reduces the possibility of selecting a keyword based on spurious occurrence. Condition 3 prefers keywords that co-occur more frequently if there is a large number of eligible keywords.

For example, M1 = 0.8, M2 = 5, and M3 = 5 when LEXAS was tested on the common data set reported in Section 4.1.

To illustrate, when disambiguating the noun "interest", some of the selected keywords are: expressed, acquiring, great, attracted, expressions, pursue, best, conflict, served, short, minority, rates, rate, bonds, lower, payments.
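As a concrete illustration of this selection procedure, the following Python sketch (our own illustrative code, not part of the LEXAS implementation; the function names and data layout are assumptions) counts keyword co-occurrences per sense, applies the three criteria with thresholds M1, M2, and M3, and builds the binary keyword features K1, ..., Km:

    from collections import Counter, defaultdict

    def select_keywords(tagged_sentences, m1=0.8, m2=5, m3=5):
        # tagged_sentences: list of (tokens, sense) pairs, where tokens are the
        # words of one sentence containing w (excluding w itself) and sense is
        # the sense tag of w in that sentence.
        n_k = Counter()              # Nk: sentences in which keyword k co-occurs with w
        n_ik = defaultdict(Counter)  # Ni,k: the same count, broken down by sense i of w
        for tokens, sense in tagged_sentences:
            for k in set(t.lower() for t in tokens):
                n_k[k] += 1
                n_ik[sense][k] += 1
        selected = {}
        for sense, counts in n_ik.items():
            # Criteria 1 and 2: cp(i|k) = Ni,k / Nk >= M1, and k co-occurs at
            # least M2 times with sense i.
            eligible = [k for k, c in counts.items()
                        if c >= m2 and c / n_k[k] >= m1]
            # Criterion 3: keep at most M3 keywords, preferring higher absolute
            # co-occurrence frequency with sense i.
            eligible.sort(key=lambda k: counts[k], reverse=True)
            selected[sense] = eligible[:m3]
        return selected

    def keyword_feature_vector(tokens, keywords):
        # Binary features K1, ..., Km: 1 if the keyword occurs anywhere in the
        # sentence, 0 otherwise.
        present = set(t.lower() for t in tokens)
        return [1 if k in present else 0 for k in keywords]

Under this sketch, the union of the per-sense keyword lists would supply the keyword set K1, ..., Km used as binary features for the word w.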
3.1.3 Local Collocations

Local collocations are common expressions containing the word to be disambiguated. For our purpose, the term collocation does not imply idiomatic usage, just words that are frequently adjacent to the word to be disambiguated. Examples of local collocations of the noun "interest" include "in the interest of", "principal and interest", etc. When a word to be disambiguated occurs as part of a collocation, its sense can frequently be determined very reliably. For example, the collocation "in the interest of" always implies the "advantage, advancement, favor" sense of the noun "interest". Note that the method for extraction of keywords that we described earlier will fail to find the words "in", "the", "of" as keywords, since these words will appear in many different positions in a sentence for many senses of the noun "interest". It is only when these words appear in the exact order "in the interest of" around the noun "interest" that they strongly imply the "advantage, advancement, favor" sense.

There are nine features related to collocations in an example. Table 1 lists the nine features and some collocation examples for the noun "interest". For example, the feature with left offset = -2 and right offset = 1 refers to the possible collocations beginning at the word two positions to the left of "interest" and ending at the word one position to the right of "interest". An example of such a collocation is "in the interest of".

Left Offset   Right Offset   Collocation Example
-1            -1             accrued interest
 1             1             interest rate
-2            -1             principal and interest
-1             1             national interest in
 1             2             interest and dividends
-3            -1             sale of an interest
-2             1             in the interest of
-1             2             an interest in a
 1             3             interest on the bonds

Table 1: Features for Collocations

The method for extraction of local collocations is similar to that for extraction of keywords. For each of the nine collocation features, LEXAS concatenates the words between the left and right offset positions. Using similar conditional probability criteria for the selection of keywords, collocations that are predictive of a certain sense are selected to form the possible values for a collocation feature.
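The collocation features can be pictured with a small sketch like the following (again our own illustrative Python, not the LEXAS code; whether the ambiguous word itself is kept inside the concatenated string is our assumption, while the offset pairs follow Table 1):

    # The nine (left offset, right offset) pairs of Table 1.
    OFFSET_PAIRS = [(-1, -1), (1, 1), (-2, -1), (-1, 1), (1, 2),
                    (-3, -1), (-2, 1), (-1, 2), (1, 3)]

    def collocation_features(tokens, pos):
        # tokens: the words of the sentence; pos: index of the ambiguous word.
        feats = {}
        for left, right in OFFSET_PAIRS:
            lo, hi = pos + left, pos + right
            if lo < 0 or hi >= len(tokens):
                feats[(left, right)] = None   # offset falls outside the sentence
            else:
                feats[(left, right)] = " ".join(t.lower() for t in tokens[lo:hi + 1])
        return feats

    # Example: for the sentence "in the interest of higher returns" with the
    # ambiguous word at index 2, the (-2, 1) feature value is "in the interest of".

As in the text, only collocation strings passing the same conditional probability test as the keywords would then be retained as possible feature values.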
3.1.4 Verb-Object Syntactic Relation

LEXAS also makes use of the verb-object syntactic relation as one feature V for the disambiguation of nouns. If a noun to be disambiguated is the head of a noun group, as indicated by its last position in a noun group bracketing, and if the word immediately preceding the opening noun group bracketing is a verb, LEXAS takes such a verb-noun pair to be in a verb-object syntactic relation. Again, using similar conditional probability criteria for the selection of keywords, verbs that are predictive of a certain sense of the noun to be disambiguated are selected to form the possible values for this verb-object feature V.

Since our training and test sentences come with noun group bracketing, determining the verb-object relation using the above heuristic can be readily done. In future work, we plan to incorporate more syntactic relations, including subject-verb and adjective-headnoun relations. We also plan to use verb-object and subject-verb relations to disambiguate verb senses.
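A rough rendering of this heuristic, under the assumptions that noun group bracketings are available as (start, end) token indices and that POS tags follow the Penn Treebank convention of verb tags beginning with "VB" (both assumptions are ours, not stated in the paper), might look as follows:

    def verb_object_feature(tokens, pos_tags, noun_groups, noun_index):
        # Returns the verb taking the noun at noun_index as its object, or None.
        # noun_groups: list of (start, end) token index pairs for bracketed
        # noun groups; pos_tags: one POS tag per token.
        for start, end in noun_groups:
            # The noun must be the last word of a noun group, and the word
            # immediately preceding the opening bracket must be a verb.
            if end == noun_index and start > 0 and pos_tags[start - 1].startswith("VB"):
                return tokens[start - 1].lower()
        return None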
3.2 Training and Testing

The heart of exemplar-based learning is a measure of the similarity, or distance, between two examples. If the distance between two examples is small, then the two examples are similar. We use the following definition of distance between two symbolic values v1 and v2 of a feature f:

    d(v1, v2) = sum over i = 1, ..., n of | C1,i/C1 - C2,i/C2 |

C1,i is the number of training examples with value v1 for feature f that are classified as sense i in the training corpus, and C1 is the number of training examples with value v1 for feature f in any sense. C2,i and C2 denote similar quantities for value v2 of feature f. n is the total number of senses for the word w.

This metric for measuring distance is adopted from (Cost and Salzberg, 1993), which in turn is adapted from the value difference metric of the earlier work of (Stanfill and Waltz, 1986). The distance between two examples is the sum of the distances between the values of all the features of the two examples.

During the training phase, the appropriate set of features is extracted based on the method described in Section 3.1. From the training examples formed, the distance between any two values for a feature f is computed based on the above formula.

During the test phase, a test example is compared against all the training examples. LEXAS then determines the closest matching training example as the one with the minimum distance to the test example. The sense of w in the test example is the sense of w in this closest matching training example.

If there is a tie among several training examples with the same minimum distance to the test example, LEXAS randomly selects one of these training examples as the closest matching training example in order to break the tie.
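One way to realize this distance computation and the nearest-neighbour search is sketched below (illustrative Python under our own data layout; LEXAS itself relies on PEBLS). Each example is assumed to be a mapping from feature names to symbolic values together with a sense tag, and test examples are assumed to share the same feature names:

    from collections import Counter, defaultdict

    def feature_value_stats(train_examples):
        # train_examples: list of (features, sense) pairs, features being a dict
        # mapping feature name f to its symbolic value.
        per_sense = defaultdict(lambda: defaultdict(Counter))  # C_{.,i} per feature/value
        totals = defaultdict(Counter)                          # C per feature/value
        for feats, sense in train_examples:
            for f, v in feats.items():
                per_sense[f][v][sense] += 1
                totals[f][v] += 1
        return per_sense, totals

    def value_distance(f, v1, v2, per_sense, totals, senses):
        # d(v1, v2) = sum over senses i of |C1,i/C1 - C2,i/C2|; an unseen value
        # is treated as having zero probability for every sense (our choice).
        c1, c2 = totals[f].get(v1, 0), totals[f].get(v2, 0)
        d = 0.0
        for i in senses:
            p1 = per_sense[f][v1][i] / c1 if c1 else 0.0
            p2 = per_sense[f][v2][i] / c2 if c2 else 0.0
            d += abs(p1 - p2)
        return d

    def classify(test_feats, train_examples, per_sense, totals, senses):
        # The distance between two examples is the sum of the per-feature value
        # distances; the sense of the closest training example is returned.
        # Ties are broken here by training-example order, whereas LEXAS breaks
        # them randomly.
        best_sense, best_dist = None, float("inf")
        for feats, sense in train_examples:
            dist = sum(value_distance(f, test_feats[f], feats[f],
                                      per_sense, totals, senses)
                       for f in feats)
            if dist < best_dist:
                best_dist, best_sense = dist, sense
        return best_sense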
4 Evaluation

To evaluate the performance of LEXAS, we conducted two tests, one on a common data set used in (Bruce and Wiebe, 1994), and another on a larger data set that we separately collected.

4.1 Evaluation on a Common Data Set

To our knowledge, very little of the existing work on WSD has been tested and compared on a common data set. This is in contrast to established practice in the machine learning community. This is partly because there are not many common data sets publicly available for testing WSD programs.

One exception is the sense-tagged data set used in (Bruce and Wiebe, 1994), which has been made available in the public domain by Bruce and Wiebe. This data set consists of 2369 sentences, each containing an occurrence of the noun "interest" (or its plural form "interests") with its correct sense manually tagged. The noun "interest" occurs in six different senses in this data set. Table 2 shows the distribution of sense tags in the data set that we obtained. Note that the sense definitions used in this data set are those from the Longman Dictionary of Contemporary English (LDOCE) (Procter, 1978). This does not pose any problem for LEXAS, since LEXAS only requires that there be a division of senses into different classes, regardless of how the sense classes are defined or numbered.

LDOCE sense                                              Frequency   Percent
1: readiness to give attention                           361         15%
2: quality of causing attention to be given              11          <1%
3: activity, subject, etc. which one gives time and
   attention to                                          67          3%
4: advantage, advancement, or favor                      178         8%
5: a share (in a company, business, etc.)                499         21%
6: money paid for the use of money                       1253        53%

Table 2: Distribution of Sense Tags

POS of words are given in the data set, as well as the bracketings of noun groups. These are used to determine the POS of neighboring words and the verb-object syntactic relation to form the features of examples.

In the results reported in (Bruce and Wiebe, 1994), they used a test set of 600 randomly selected sentences from the 2369 sentences. Unfortunately, in the data set made available in the public domain, there is no indication of which sentences are used as test sentences. As such, we conducted 100 random trials, and in each trial, 600 sentences were randomly selected to form the test set. LEXAS is trained on the remaining 1769 sentences, and then tested on the separate test set of sentences in each trial.
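This 100-trial protocol is straightforward to reproduce; a sketch (our own code, with a generic classify_fn standing in for the exemplar-based classifier sketched above) is:

    import random

    def random_trial_accuracy(examples, classify_fn, n_trials=100,
                              test_size=600, seed=0):
        # examples: list of (features, sense) pairs for the word being tested.
        # classify_fn(train, features) returns a predicted sense.
        rng = random.Random(seed)
        accuracies = []
        for _ in range(n_trials):
            shuffled = examples[:]
            rng.shuffle(shuffled)
            test, train = shuffled[:test_size], shuffled[test_size:]
            correct = sum(classify_fn(train, feats) == sense
                          for feats, sense in test)
            accuracies.append(correct / len(test))
        mean = sum(accuracies) / n_trials
        sd = (sum((a - mean) ** 2 for a in accuracies) / (n_trials - 1)) ** 0.5
        return mean, sd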
Note that in Bruce and Wiebe's test run, the proportion of sentences in each sense in the test set is approximately equal to their proportion in the whole data set. Since we use random selection of test sentences, the proportion of each sense in our test set is also approximately equal to their proportion in the whole data set in our random trials.

The average accuracy of LEXAS over 100 random trials is 87.4%, and the standard deviation is 1.37%. In each of our 100 random trials, the accuracy of LEXAS is always higher than the accuracy of 78% reported in (Bruce and Wiebe, 1994).

Bruce and Wiebe also performed a separate test using a subset of the "interest" data set with only 4 senses (senses 1, 4, 5, and 6), so as to compare their results with previous work on WSD (Black, 1988; Zernik, 1990; Yarowsky, 1992), which were tested on 4 senses of the noun "interest". However, the work of (Black, 1988; Zernik, 1990; Yarowsky, 1992) was not based on the present set of sentences, so the comparison is only suggestive. We reproduce in Table 3 the results of past work as well as the classification accuracy of LEXAS, which is 89.9% with a standard deviation of 1.09% over 100 random trials.

WSD research            Accuracy
Black (1988)            72%
Zernik (1990)           70%
Yarowsky (1992)         72%
Bruce & Wiebe (1994)    79%
LEXAS (1996)            89%

Table 3: Comparison with previous results

In summary, when tested on the noun "interest", LEXAS gives higher classification accuracy than previous work on WSD.

In order to evaluate the relative contribution of the knowledge sources, including (1) POS and morphological form; (2) unordered set of surrounding words; (3) local collocations; and (4) verb to the left (verb-object syntactic relation), we conducted 4 separate runs of 100 random trials each. In each run, we utilized only one knowledge source and computed the average classification accuracy and the standard deviation. The results are given in Table 4.

Knowledge Source     Mean Accuracy   Std Dev
POS & morpho         77.2%           1.44%
surrounding words    62.0%           1.82%
collocations         80.2%           1.55%
verb-object          43.5%           1.79%

Table 4: Relative Contribution of Knowledge Sources

Local collocation knowledge yields the highest accuracy, followed by POS and morphological form. Surrounding words give lower accuracy, perhaps because in our work, only the current sentence forms the surrounding context, which averages about 20 words. Previous work on using the unordered set of surrounding words has used a much larger window, such as the 100-word window of (Yarowsky, 1992) and the 2-sentence context of (Leacock et al., 1993). Verb-object syntactic relation is the weakest knowledge source.

Our experimental finding, that local collocations are the most predictive, agrees with the past observation that humans need a narrow window of only a few words to perform WSD (Choueka and Lusignan, 1985).

The processing speed of LEXAS is satisfactory. Running on an SGI Unix workstation, LEXAS can process about 15 examples per second when tested on the "interest" data set.

4.2 Evaluation on a Large Data Set

Previous research on WSD tends to be tested on only a dozen words or so, where each word frequently has either two or a few senses. To test the scalability of LEXAS, we have gathered a corpus in which 192,800 word occurrences have been manually tagged with senses from WORDNET 1.5. This data set is almost two orders of magnitude larger than the above "interest" data set. Manual tagging was done by university undergraduates majoring in Linguistics, and approximately one man-year of effort was expended in tagging our data set.

These 192,800 word occurrences consist of 121 nouns and 70 verbs which are the most frequently occurring and most ambiguous words of English. The 121 nouns are:

action activity age air area art board body book business car case center century change child church city class college community company condition cost country course day death development difference door effect effort end example experience face fact family field figure foot force form girl government ground head history home hour house information interest job land law level life light line man material matter member mind moment money month name nation need number order part party picture place plan point policy position power pressure problem process program public purpose question reason result right room school section sense service side society stage state step student study surface system table term thing time town type use value voice water way word work world

The 70 verbs are:

add appear ask become believe bring build call carry change come consider continue determine develop draw expect fall give go grow happen help hold indicate involve keep know lead leave lie like live look lose mean meet move need open pay raise read receive remember require return rise run see seem send set show sit speak stand start stop strike take talk tell think turn wait walk want work write

For this set of nouns and verbs, the average number of senses per noun is 7.8, while the average number of senses per verb is 12.0. We draw our sentences containing the occurrences of the 191 words listed above from the combined corpus of the 1 million word Brown corpus and the 2.5 million word Wall Street Journal (WSJ) corpus. For every word in the two lists, up to 1,500 sentences each containing an occurrence of the word are extracted from the combined corpus. In all, there are about 113,000 noun occurrences and about 79,800 verb occurrences. This set of 121 nouns accounts for about 20% of all occurrences of nouns that one expects to encounter in any unrestricted English text. Similarly, about 20% of all verb occurrences in any unrestricted text come from the set of 70 verbs chosen.

We estimate that there are 10-20% errors in our sense-tagged data set. To get an idea of how the sense assignments of our data set compare with those provided by the WORDNET linguists in SEMCOR, the sense-tagged subset of the Brown corpus prepared by Miller et al. (Miller et al., 1994), we compare
a subset of the occurrences that overlap. Out of 5,317 occurrences that overlap, about 57% of the sense assignments in our data set agree with those in SEMCOR. This should not be too surprising, as it is widely believed that sense tagging using the full set of refined senses found in a large dictionary like WORDNET involves making subtle human judgments (Wilks et al., 1990; Bruce and Wiebe, 1994), such that there are many genuine cases where two humans will not agree fully on the best sense assignments.

We evaluated LEXAS on this larger set of noisy, sense-tagged data. We first set aside two subsets for testing. The first test set, named BC50, consists of 7,119 occurrences of the 191 content words that occur in 50 text files of the Brown corpus. The second test set, named WSJ6, consists of 14,139 occurrences of the 191 content words that occur in 6 text files of the WSJ corpus.

We compared the classification accuracy of LEXAS against the default strategy of picking the most frequent sense. This default strategy has been advocated as the baseline performance level for comparison with WSD programs (Gale et al., 1992). There are two instantiations of this strategy in our current evaluation. Since WORDNET orders its senses such that sense 1 is the most frequent sense, one possibility is to always pick sense 1 as the best sense assignment. This assignment method does not even need to look at the training sentences. We call this method "Sense 1" in Table 5. Another assignment method is to determine the most frequently occurring sense in the training sentences, and to assign this sense to all test sentences. We call this method "Most Frequent" in Table 5. The accuracy of LEXAS on these two test sets is given in Table 5.

Test set   Sense 1   Most Frequent   LEXAS
BC50       40.5%     47.1%           54.0%
WSJ6       44.8%     63.7%           68.6%

Table 5: Evaluation on a Large Data Set
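The two baselines can be stated in a few lines; the sketch below (our own illustrative Python) treats the data as (word, sense) pairs and assumes a first_sense mapping giving the sense WORDNET lists first for each word:

    from collections import Counter, defaultdict

    def sense1_baseline(test, first_sense):
        # "Sense 1": always predict the sense WordNet lists first for the word;
        # no training data is consulted.
        return sum(first_sense[word] == sense for word, sense in test) / len(test)

    def most_frequent_baseline(train, test):
        # "Most Frequent": for each word, predict the sense occurring most often
        # in that word's training sentences.
        counts = defaultdict(Counter)
        for word, sense in train:
            counts[word][sense] += 1
        majority = {w: c.most_common(1)[0][0] for w, c in counts.items()}
        correct = sum(majority.get(word) == sense for word, sense in test)
        return correct / len(test)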
Our results indicate that exemplar-based classification of word senses scales up quite well when tested on a large set of words. The classification accuracy of LEXAS is always better than the default strategy of picking the most frequent sense. We believe that our result is significant, especially since the training data is noisy and the words are highly ambiguous, with a large number of refined sense distinctions per word.

The accuracy on the Brown corpus test files is lower than that achieved on the Wall Street Journal test files, primarily because the Brown corpus consists of texts from a wide variety of genres, including newspaper reports, newspaper editorials, biblical passages, science and mathematics articles, general fiction, romance stories, humor, etc. It is harder to disambiguate words coming from such a wide variety of texts.

5 Related Work

There is now a large body of past work on WSD. Early work on WSD, such as (Kelly and Stone, 1975; Hirst, 1987), used hand-coding of knowledge to perform WSD. The knowledge acquisition process is laborious. In contrast, LEXAS learns from tagged sentences, without human engineering of complex rules.

The recent emphasis on corpus-based NLP has resulted in much work on WSD of unconstrained real-world texts. One line of research focuses on the use of the knowledge contained in a machine-readable dictionary to perform WSD, such as (Wilks et al., 1990; Luk, 1995). In contrast, LEXAS uses supervised learning from tagged sentences, which is also the approach taken by most recent work on WSD, including (Bruce and Wiebe, 1994; Miller et al., 1994; Leacock et al., 1993; Yarowsky, 1994; Yarowsky, 1993; Yarowsky, 1992).

The work of (Miller et al., 1994; Leacock et al., 1993; Yarowsky, 1992) used only the unordered set of surrounding words to perform WSD, and they used statistical classifiers, neural networks, or IR-based techniques. The work of (Bruce and Wiebe, 1994) used parts of speech (POS) and morphological form, in addition to surrounding words. However, the POS used are abbreviated POS, and only in a window of ±2 words. No local collocation knowledge is used. A probabilistic classifier is used in (Bruce and Wiebe, 1994).

That local collocation knowledge provides important clues to WSD is pointed out in (Yarowsky, 1993), although it was demonstrated only on performing binary (or very coarse) sense disambiguation. The work of (Yarowsky, 1994) is perhaps the most similar to our present work. However, his work used decision lists to perform classification, in which only the single best disambiguating evidence that matched a target context is used. In contrast, we used exemplar-based learning, where the contributions of all features are summed up and taken into account in coming up with a classification. We also include the verb-object syntactic relation as a feature, which is not used in (Yarowsky, 1994). Although the work of (Yarowsky, 1994) can be applied to WSD, the results reported in (Yarowsky, 1994) only dealt with accent restoration, which is a much simpler problem. It is unclear how Yarowsky's method will fare on WSD of a common test data set like the one we used, nor has his method been tested on a large data set with highly ambiguous words tagged with the refined senses of WORDNET.

The work of (Miller et al., 1994) is the only prior work we know of which attempted to evaluate WSD on a large data set and using the refined sense distinctions of WORDNET. However, their results show
no improvement (in fact a slight degradation in performance) when using surrounding words to perform WSD as compared to the most frequent heuristic. They attributed this to insufficient training data in SEMCOR. In contrast, we adopt a different strategy of collecting the training data set. Instead of tagging every word in a running text, as is done in SEMCOR, we only concentrate on the set of 191 most frequently occurring and most ambiguous words, and collected large enough training data for these words only. This strategy yields better results, as indicated by the better performance of LEXAS compared with the most frequent heuristic on this set of words.

Most recently, Yarowsky used an unsupervised learning procedure to perform WSD (Yarowsky, 1995), although this was only tested on disambiguating words into binary, coarse sense distinctions. The effectiveness of unsupervised learning on disambiguating words into the refined sense distinctions of WORDNET needs to be further investigated. The work of (McRoy, 1992) pointed out that a diverse set of knowledge sources is important to achieve WSD, but no quantitative evaluation was given on the relative importance of each knowledge source. No previous work has reported any such evaluation either. The work of (Cardie, 1993) used a case-based approach that simultaneously learns part of speech, word sense, and concept activation knowledge, although the method is only tested on domain-specific texts with domain-specific word senses.

6 Conclusion

In this paper, we have presented a new approach for WSD using an exemplar-based learning algorithm. This approach integrates a diverse set of knowledge sources to disambiguate word sense. When tested on a common data set, our WSD program gives higher classification accuracy than previous work on WSD. When tested on a large, separately collected data set, our program performs better than the default strategy of picking the most frequent sense. To our knowledge, this is the first time that a WSD program has been tested on such a large scale and has yielded results better than the most frequent heuristic on highly ambiguous words with the refined senses of WORDNET.

7 Acknowledgements

We would like to thank: Dr Paul Wu for sharing the Brown Corpus and Wall Street Journal Corpus; Dr Christopher Ting for downloading and installing WORDNET and SEMCOR, and for reformatting the corpora; the 12 undergraduates from the Linguistics Program of the National University of Singapore for preparing the sense-tagged corpus; and Prof K. P. Mohanan for his support of the sense-tagging project.

References

Ezra Black. 1988. An experiment in computational discrimination of English word senses. IBM Journal of Research and Development, 32(2):185-194.

Eric Brill. 1992. A simple rule-based part of speech tagger. In Proceedings of the Third Conference on Applied Natural Language Processing, pages 152-155.

Rebecca Bruce and Janyce Wiebe. 1994. Word-sense disambiguation using decomposable models. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, Las Cruces, New Mexico.

Claire Cardie. 1993. A case-based approach to knowledge acquisition for domain-specific sentence analysis. In Proceedings of the Eleventh National Conference on Artificial Intelligence, pages 798-803, Washington, DC.

Y. Choueka and S. Lusignan. 1985. Disambiguation by short contexts. Computers and the Humanities, 19:147-157.

Scott Cost and Steven Salzberg. 1993. A weighted nearest neighbor algorithm for learning with symbolic features. Machine Learning, 10(1):57-78.

Doug Cutting, Julian Kupiec, Jan Pedersen, and Penelope Sibun. 1992. A practical part-of-speech tagger. In Proceedings of the Third Conference on Applied Natural Language Processing, pages 133-140.

William Gale, Kenneth Ward Church, and David Yarowsky. 1992. Estimating upper and lower bounds on the performance of word-sense disambiguation programs. In Proceedings of the 30th Annual Meeting of the Association for Computational Linguistics, Newark, Delaware.

Graeme Hirst. 1987. Semantic Interpretation and the Resolution of Ambiguity. Cambridge University Press, Cambridge.

Edward Kelly and Phillip Stone. 1975. Computer Recognition of English Word Senses. North-Holland, Amsterdam.

Claudia Leacock, Geoffrey Towell, and Ellen Voorhees. 1993. Corpus-based statistical sense resolution. In Proceedings of the ARPA Human Language Technology Workshop.

Alpha K. Luk. 1995. Statistical sense disambiguation with relatively small corpora using dictionary definitions. In Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics, Cambridge, Massachusetts.

Susan W. McRoy. 1992. Using multiple knowledge sources for word sense discrimination. Computational Linguistics, 18(1):1-30.
George A. Miller, Ed. 1990. WordNet: An on-line lexical database. International Journal of Lexicography, 3(4):235-312.

George A. Miller, Martin Chodorow, Shari Landes, Claudia Leacock, and Robert G. Thomas. 1994. Using a semantic concordance for sense identification. In Proceedings of the ARPA Human Language Technology Workshop.

Paul Procter et al. 1978. Longman Dictionary of Contemporary English.

John Rachlin and Steven Salzberg. 1993. PEBLS 3.0 User's Guide.

Craig Stanfill and David Waltz. 1986. Toward memory-based reasoning. Communications of the ACM, 29(12):1213-1228.

Yorick Wilks, Dan Fass, Cheng-Ming Guo, James E. McDonald, Tony Plate, and Brian M. Slator. 1990. Providing machine tractable dictionary tools. Machine Translation, 5(2):99-154.

David Yarowsky. 1992. Word-sense disambiguation using statistical models of Roget's categories trained on large corpora. In Proceedings of the Fifteenth International Conference on Computational Linguistics, pages 454-460, Nantes, France.

David Yarowsky. 1993. One sense per collocation. In Proceedings of the ARPA Human Language Technology Workshop.

David Yarowsky. 1994. Decision lists for lexical ambiguity resolution: Application to accent restoration in Spanish and French. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, Las Cruces, New Mexico.

David Yarowsky. 1995. Unsupervised word sense disambiguation rivaling supervised methods. In Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics, Cambridge, Massachusetts.

Uri Zernik. 1990. Tagging word senses in corpus: the needle in the haystack revisited. Technical Report 90CRD198, GE R&D Center.