This document discusses exploring metaphorical data in technical documents. It begins by describing how over 1,800 metaphors were collected from various websites and compiled into a CSV file. These include simple metaphors like "sea of fire" as well as more complex metaphors used by Shakespeare. The document then discusses using regular expressions and the Hadoop framework to analyze technical documents and identify whether any of the collected metaphors are present. It summarizes several articles on metaphors, identifying five common types of metaphor, and discusses rules for identifying metaphors. Finally, it discusses how metaphors are conceptual and how we structure our thinking around common metaphors like "argument is war" and concepts of time.
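The regular-expression lookup described above can be sketched as follows; the phrase list, function name, and sample sentence are illustrative stand-ins for the CSV of roughly 1,800 collected metaphors (and the Hadoop scale-out is not reproduced here):

```python
import re

# Hypothetical miniature metaphor list; the original work loads ~1,800 of
# these from a CSV file and scans documents with Hadoop at scale.
metaphors = ["sea of fire", "argument is war", "time is money"]

# Build one compiled alternation so each document is scanned in a single pass.
pattern = re.compile(
    r"\b(" + "|".join(re.escape(m) for m in metaphors) + r")\b",
    re.IGNORECASE,
)

def find_metaphors(text):
    """Return the distinct known metaphors occurring in `text`."""
    return sorted({m.group(1).lower() for m in pattern.finditer(text)})

print(find_metaphors("The city became a sea of fire; time is money, after all."))
# → ['sea of fire', 'time is money']
```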
The role of linguistic information for shallow language processing – Constantin Orasan
The document discusses shallow language processing and summarization. It argues that while deep language understanding is limited, shallow methods can be improved by adding linguistic information. As an example, it shows how term frequency, anaphora resolution, discourse cues and genetic algorithms can select extractive summaries that better match human abstracts, without requiring full text comprehension.
Lecture 2: From Semantics To Semantic-Oriented Applications – Marina Santini
From the "Natural Language Processing" LinkedIn group:
John Kontos, Professor of Artificial Intelligence
I wonder whether translating into formal logic is nothing more than transliteration which simply isolates the part of the text that can be reasoned upon using the simple inference mechanism of formal logic. The real problem, I think, lies with the part of text that CANNOT be translated on the one hand, and the part that changes its meaning due to civilization advances on the other. My own proposal is to leave NL text alone and try building inference mechanisms for the UNTRANSLATED text depending on the task requirements.
All the best
John
Concept hierarchy is the backbone of an ontology, and concept hierarchy acquisition has been a hot topic in the field of ontology learning. This paper proposes a hyponymy-extraction method for domain ontology concepts based on cascaded conditional random fields (CCRFs) and hierarchical clustering. It takes free text as the extraction object and adopts CCRFs to identify the domain concepts: first the low layer of the CCRFs identifies simple domain concepts, then the results are passed to the high layer, in which nested concepts are recognized. Hierarchical clustering is then used to identify the hyponymy relations between domain ontology concepts. The experimental results demonstrate that the proposed method is efficient.
TEXT PLAGIARISM CHECKER USING FRIENDSHIP GRAPHS – ijcsit
The paper proposes a method to check whether two documents exhibit textual plagiarism. The technique is a form of extrinsic plagiarism detection and is quite similar to the one used to grade short answers: triplets and their associated information are extracted from both texts and stored in friendship matrices. The two friendship matrices are then compared and a similarity percentage is calculated, which is used to make the plagiarism decision.
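As a rough sketch of the final comparison step (the triplet extraction and friendship-matrix construction are not reproduced), assume each document has already been reduced to a set of (subject, verb, object) triplets; the function, data, and decision threshold below are hypothetical:

```python
# Sketch of the comparison step, assuming each document has already been
# reduced to a set of (subject, verb, object) triplets.
def similarity_percentage(triplets_a, triplets_b):
    """Percentage of triplet overlap between two documents (0-100)."""
    if not triplets_a and not triplets_b:
        return 0.0
    common = triplets_a & triplets_b
    return 100.0 * len(common) / max(len(triplets_a), len(triplets_b))

doc_a = {("cat", "chase", "mouse"), ("dog", "bark", "stranger")}
doc_b = {("cat", "chase", "mouse"), ("bird", "sing", "song")}

score = similarity_percentage(doc_a, doc_b)
print(f"{score:.0f}% similar")          # → 50% similar
if score > 40:                          # hypothetical decision threshold
    print("possible plagiarism")
```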
Introduction to Distributional Semantics – Andre Freitas
This document provides an introduction to distributional semantics. It discusses how distributional semantic models (DSMs) represent word meanings as vectors based on their linguistic contexts in large corpora. The distributional hypothesis states that words that appear in similar contexts tend to have similar meanings. The document outlines how DSMs are built, important parameters like context type and weighting, and examples like latent semantic analysis. It also discusses how DSMs can support applications like semantic search. Finally, it introduces how compositional semantics explores representing the meanings of phrases and sentences compositionally, based on the meanings of their parts.
New Quantitative Methodology for Identification of Drug Abuse Based on Featur... – Carrie Wang
This project developed a new quantitative methodology using feature-based context-free grammar to analyze discourse semantics from social media discussions in order to identify potential drug abuse. The methodology was able to parse YouTube comments about recreational cough syrup use and perform anaphora resolution. This computational representation of discourse contributes to understanding human language structure and has applications in public health monitoring and clinical research.
A Distributional Semantics Approach for Selective Reasoning on Commonsense Gr... – Andre Freitas
Tasks such as question answering and semantic search depend on the ability to query and reason over large-scale commonsense knowledge bases (KBs). However, dealing with commonsense data demands coping with problems such as increased schema complexity, semantic inconsistency, incompleteness, and scalability. This paper proposes a selective graph navigation mechanism based on a distributional relational semantic model which can be applied to querying and reasoning over heterogeneous KBs. The approach can be used for approximative reasoning, querying, and associational knowledge discovery. In this paper we focus on commonsense reasoning as the main motivational scenario for the approach, which addresses the following problems: (i) providing a semantic selection mechanism for facts which are relevant and meaningful in a specific reasoning and querying context, and (ii) coping with information incompleteness in large KBs. The approach is evaluated using ConceptNet as a commonsense KB, and achieved high selectivity, high scalability, and high accuracy in the selection of meaningful navigational paths. Distributional semantics is also used as a principled mechanism to cope with information incompleteness.
Distributional semantics is a research area that uses statistical analysis of linguistic contexts to develop theories and methods for determining the semantic similarities between words and linguistic items based on their distributional properties in large text corpora. It is based on the distributional hypothesis that words with similar distributions have similar meanings. Distributional semantic models represent words as vectors in a high-dimensional semantic space based on their co-occurrence with other words, allowing semantic similarity to be measured using vector similarity methods. Common distributional semantic models include term frequency-inverse document frequency (tf-idf), latent semantic analysis (LSA), latent Dirichlet allocation (LDA), and word embeddings.
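The vector-similarity idea can be illustrated with cosine similarity over toy co-occurrence vectors; the words, context dimensions, and counts are invented for the example, and a real DSM would derive thousands of dimensions from a large corpus:

```python
import math

# Toy co-occurrence vectors over the context words (drink, engine, road);
# the counts are invented for illustration.
vectors = {
    "car":   [0, 8, 9],
    "truck": [0, 7, 8],
    "tea":   [9, 0, 1],
}

def cosine(u, v):
    """Cosine similarity: dot product normalized by vector lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

print(cosine(vectors["car"], vectors["truck"]))  # high: similar contexts
print(cosine(vectors["car"], vectors["tea"]))    # low: different contexts
```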
Presentation of the Marcu 2000 ACL paper "The rhetorical parsing of unrestricted texts - A surface-based approach" for the Discourse Parsing and Language Technology seminar.
RuleML2015 The Herbrand Manifesto - Thinking Inside the Box – RuleML
The traditional semantics for First Order Logic (sometimes called Tarskian semantics) is based on the notion of interpretations of constants. Herbrand semantics is an alternative semantics based directly on truth assignments for ground sentences rather than interpretations of constants. Herbrand semantics is simpler and more intuitive than Tarskian semantics; consequently, it is easier to teach and learn. Moreover, it is more expressive. For example, while it is not possible to finitely axiomatize integer arithmetic with Tarskian semantics, this can be done easily with Herbrand semantics. The downside is a loss of some common logical properties, such as compactness and completeness. However, there is no loss of inferential power: anything that can be proved according to Tarskian semantics can also be proved according to Herbrand semantics. In this presentation, we define Herbrand semantics, look at the implications for research on logic, rule systems, and automated reasoning, and assess the potential for popularizing logic.
The document discusses language independent methods for clustering similar contexts without using syntactic or lexical resources. It describes representing contexts as vectors of lexical features, reducing dimensionality, and clustering the vectors. Key methods include identifying unigram, bigram and co-occurrence features from corpora using frequency counts and association measures, and representing contexts in first or second order vectors based on feature presence.
The document describes language-independent methods for clustering similar contexts without using syntactic or lexical resources. It discusses representing contexts as vectors of lexical features and clustering them based on similarity. Feature selection involves identifying unigrams, bigrams, and co-occurrences based on frequency or association measures. Contexts can then be represented in first-order or second-order feature spaces and clustered. Applications include word sense discrimination, document clustering, and name discrimination.
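A minimal sketch of the first-order representation both summaries describe, with a hand-picked feature vocabulary standing in for features that the methods select by frequency or association measures:

```python
# First-order context representation: each context becomes a binary vector
# over a fixed feature vocabulary. The vocabulary here is hand-picked for
# illustration; the described methods select features from corpora.
features = ["bank", "river", "money", "water"]

def first_order_vector(context_words):
    """Mark which vocabulary features occur in the context."""
    present = set(context_words)
    return [1 if f in present else 0 for f in features]

c1 = first_order_vector("the bank raised money".split())
c2 = first_order_vector("the river bank water".split())
print(c1)  # → [1, 0, 1, 0]
print(c2)  # → [1, 1, 0, 1]
```

Vectors like these can then be fed to any clustering algorithm to group contexts by shared features.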
FCA-MERGE: Bottom-Up Merging of Ontologies – alemarrena
The document describes a new bottom-up method called FCA-MERGE for merging ontologies. It extracts instances from documents for each ontology to generate formal contexts. It then merges the contexts and computes a concept lattice using techniques from Formal Concept Analysis. This lattice provides a structural description of the merging process. The final merged ontology is then generated from the lattice with human guidance. FCA-MERGE circumvents the problem of finding instances classified in both ontologies by extracting instances from relevant documents.
An Efficient Semantic Relation Extraction Method For Arabic Texts Based On Si... – CSCJournals
The document presents a method for extracting semantic relations between concepts in Arabic texts. It constructs context vectors for concepts based on their co-occurrence with other concepts. It then uses several semantic similarity measures (Cosine, Jaccard, Lin) to calculate similarity scores between candidate concept vectors and seed concept vectors. Relations are extracted between candidates and seeds if their similarity score is above the average threshold for that seed. The method was evaluated on an Arabic corpus and achieved a precision of 83-85% for relation extraction, showing it is an effective unsupervised approach for extracting relations to construct Arabic ontologies.
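The accept/reject rule can be sketched with the Jaccard measure and an average-based threshold; the seed, candidates, and context sets below are illustrative, not drawn from the paper's Arabic corpus:

```python
# Sketch of the relation-extraction decision: Jaccard similarity between a
# candidate's and a seed's context sets, accepted when it exceeds the
# average similarity over all candidates (illustrative data).
def jaccard(a, b):
    """Set overlap divided by set union, in [0, 1]."""
    return len(a & b) / len(a | b) if a | b else 0.0

seed = {"engine", "wheel", "road"}
candidates = {
    "truck": {"engine", "wheel", "cargo"},
    "tea":   {"cup", "leaf"},
}

scores = {c: jaccard(ctx, seed) for c, ctx in candidates.items()}
threshold = sum(scores.values()) / len(scores)   # average as threshold
related = [c for c, s in scores.items() if s > threshold]
print(related)  # → ['truck']
```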
International Journal of Computational Engineering Research (IJCER) – ijceronline
International Journal of Computational Engineering Research (IJCER) is an international, monthly online journal published in English. The journal publishes original research work that contributes significantly to scientific knowledge in engineering and technology.
Comparative performance analysis of two anaphora resolution systems – ijfcstjournal
Anaphora resolution, the process of finding the referents of anaphors in a given discourse, is one of the complex tasks of linguistics. This paper presents a performance analysis of two computational models that use the Gazetteer method for resolving anaphora in the Hindi language. In the Gazetteer method, different classes (gazettes) of elements are created; these gazettes provide external knowledge to the system. Both models use Recency and Animistic factors for resolving anaphors: for the Recency factor, the first model uses the centering approach and the second the Lappin and Leass approach, while gazetteers supply the Animistic knowledge. The paper presents experimental results for both models on short Hindi stories, news articles, and biographical content from Wikipedia; the accuracy of each model is analyzed, and a conclusion is drawn about the model best suited to Hindi.
The spread and abundance of electronic documents requires automatic techniques for extracting useful information from the text they contain. The availability of conceptual taxonomies can be of great help, but manually building them is a complex and costly task. Building on previous work, we propose a technique to automatically extract conceptual graphs from text and reason with them. Since automated learning of taxonomies needs to be robust with respect to missing or partial knowledge and flexible with respect to noise, this work proposes a way to deal with these problems. The case of poor data/sparse concepts is tackled by finding generalizations among disjoint pieces of knowledge. Noise is
handled by introducing soft relationships among concepts rather than hard ones, and applying a probabilistic inferential setting. In particular, we propose to reason on the extracted graph using different kinds of relationships among concepts, where each arc/relationship is associated to a number that represents its likelihood among all possible worlds, and to face the problem of sparse knowledge by using generalizations among distant concepts as bridges between disjoint portions of knowledge.
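The soft-relationship idea can be illustrated by scoring an inferred relation as the product of the likelihoods on the arcs connecting two concepts; the graph and the numbers below are invented for the example:

```python
# Each arc carries a likelihood; the support for an inferred (indirect)
# relation is the product of arc likelihoods along the connecting path.
# Illustrative graph and values, not taken from the paper.
arcs = {
    ("dog", "is_a", "mammal"):    0.9,
    ("mammal", "is_a", "animal"): 0.95,
}

def path_likelihood(path):
    """Multiply the likelihoods of the arcs along a path."""
    prob = 1.0
    for arc in path:
        prob *= arcs.get(arc, 0.0)
    return prob

p = path_likelihood([("dog", "is_a", "mammal"), ("mammal", "is_a", "animal")])
print(round(p, 3))  # → 0.855
```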
The document presents a new ontology matching system based on a multi-agent architecture. The system takes ontologies described in XML, RDF Schema, and OWL as input. It uses multiple matchers and filtering to generate mappings between ontology entities. The mappings are then validated. The system is implemented as a multi-agent system with different agent types responsible for resources, matching, generating mappings, and filtering/validating mappings. The architecture allows for robust, flexible, and scalable ontology matching.
A Constructive Mathematics approach for NL formal grammars – Federico Gobbo
This document discusses formalizing natural language grammars through adpositional grammars (AdGrams). AdGrams are based on cognitive linguistics concepts of trajector and landmark. The document proposes that natural language structure can be expressed through a triple involving a governor, dependent, and their relation. It provides examples analyzing phrases and sentences as either dependency-based or government-based structures based on whether the dependent or governor is the trajector. The goal of AdGrams is to formalize natural language grammars in a way that is informed by cognitive linguistics concepts and can be computationally analyzed.
The document discusses the basics of ontologies, including their origin in philosophy, definitions, types, benefits and application areas. Some key points are:
- An ontology is a formal specification of a conceptualization used to help humans and programs share knowledge. It establishes a shared vocabulary for exchanging information.
- Ontologies describe domain knowledge and provide an agreed-upon understanding of a domain through concepts and relations. They help solve problems of ambiguity and enable knowledge sharing.
- Ontologies benefit applications like information retrieval, digital libraries, knowledge engineering and natural language processing by facilitating semantic search and integration of data.
This document describes a proposed concept-based mining model that aims to improve document clustering and information retrieval by extracting concepts and semantic relationships rather than just keywords. The model uses natural language processing techniques like part-of-speech tagging and parsing to extract concepts from text. It represents concepts and their relationships in a semantic network and clusters documents based on conceptual similarity rather than term frequency. The model is evaluated using singular value decomposition to increase the precision of key term and phrase extraction.
SEMI-SUPERVISED BOOTSTRAPPING APPROACH FOR NAMED ENTITY RECOGNITION – kevig
The aim of Named Entity Recognition (NER) is to identify references to named entities in unstructured documents and to classify them into pre-defined semantic categories. NER often benefits from added background knowledge in the form of gazetteers. However, such a collection does not handle name variants and cannot resolve the ambiguities involved in identifying entities in context and associating them with pre-defined categories. We present a semi-supervised NER approach that starts by identifying named entities with a small set of training data. From the identified named entities, word and context features are used to define a pattern. This pattern for each named entity category is used as a seed pattern to identify the named entities in the test set. Pattern scoring and tuple value scoring enable the generation of new patterns to identify the named entity categories. We evaluated the proposed system for English with tagged (IEER) and untagged (CoNLL 2003) named entity corpora, and for Tamil with documents from the FIRE corpus, yielding an average f-measure of 75% for both languages.
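A toy illustration of the seed-pattern idea (real patterns also use word features, pattern scoring, and iterative bootstrapping; the sentences and entities here are invented):

```python
# From seed entities, record the word immediately before and after as a
# context pattern, then use those patterns to propose new entities.
seeds = {"Paris", "London"}
train = "he flew to Paris yesterday and to London today".split()
test = "she flew to Berlin yesterday".split()

# learn (left, right) context patterns around seed entities
patterns = set()
for i, w in enumerate(train[1:-1], 1):
    if w in seeds:
        patterns.add((train[i - 1], train[i + 1]))

# apply the patterns to propose new named entities
found = [w for i, w in enumerate(test[1:-1], 1)
         if (test[i - 1], test[i + 1]) in patterns]
print(found)  # → ['Berlin']
```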
This document summarizes a survey on string similarity matching search techniques. It discusses how string similarity matching is used to find relevant information in text collections. The document reviews different algorithms for string matching, including edit distance, NR-grep, n-grams, and approaches based on hashing and locality-sensitive hashing. It analyzes techniques like pattern matching, threshold-based joins, and vector representations. The goal is to present an overview of the field and compare algorithm performance for similarity searches.
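Edit distance, the first of the surveyed measures, can be computed with the standard dynamic program (insertions, deletions, and substitutions each cost 1):

```python
# Levenshtein edit distance via dynamic programming, keeping only the
# previous row of the DP table to use O(len(t)) memory.
def edit_distance(s, t):
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(
                prev[j] + 1,                # deletion
                cur[j - 1] + 1,             # insertion
                prev[j - 1] + (cs != ct),   # substitution
            ))
        prev = cur
    return prev[-1]

print(edit_distance("kitten", "sitting"))  # → 3
```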
In this paper we present the SMalL Ontology for malicious software classification, SMalL Java Application for antivirus systems comparison and the SMalL knowledge based file format for malware related attacks. We believe that our ontology is able to aid the development of malware prevention software by offering a common knowledge base and a clear classification of the existing malicious software. The application is a prototype regarding how this ontology might be used in conjunction with known antivirus capabilities to offer a comprehensive comparison.
Analogy is one of the most studied representatives of a family of non-classical forms of reasoning working across different domains, usually taken to play a crucial role in creative thought and problem-solving. In the first part of the talk, I will shortly introduce general principles of computational analogy models (relying on a generalization-based approach to analogy-making). We will then have a closer look at Heuristic-Driven Theory Projection (HDTP) as an example for a theoretical framework and implemented system: HDTP computes analogical relations and inferences for domains which are represented using many-sorted first-order logic languages, applying a restricted form of higher-order anti-unification for finding shared structural elements common to both domains. The presentation of the framework will be followed by a few reflections on the "cognitive plausibility" of the approach motivated by theoretical complexity and tractability considerations.
In the second part of the talk I will discuss an application of HDTP to modeling essential parts of concept blending processes as current "hot topic" in Cognitive Science. Here, I will sketch an analogy-inspired formal account of concept blending —developed in the European FP7-funded Concept Invention Theory (COINVENT) project— combining HDTP with mechanisms from Case-Based Reasoning.
Topic detection by clustering and text mining – IRJET Journal
This document discusses topic detection from text documents using text mining and clustering techniques. It proposes extracting keywords from documents, representing topics as groups of keywords, and using k-means clustering on the keywords to group them into topics. The keywords are extracted based on frequency counts and preprocessed by removing stop words and stemming. The k-means clustering algorithm is used to assign keywords to topics represented by cluster centroids, and the centroids are iteratively updated until cluster assignments converge.
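The assignment/update loop of k-means can be sketched in a few lines; the 2-D keyword coordinates and fixed initial centroids below are illustrative simplifications of the keyword vectors the document describes:

```python
# Tiny k-means sketch: alternate between assigning points to their nearest
# centroid and moving each centroid to the mean of its cluster. Initial
# centroids are fixed here for reproducibility.
def kmeans(points, centroids, iters=10):
    for _ in range(iters):
        # assignment step: nearest centroid by squared distance
        clusters = [[] for _ in centroids]
        for p in points:
            i = min(range(len(centroids)),
                    key=lambda k: sum((a - b) ** 2
                                      for a, b in zip(p, centroids[k])))
            clusters[i].append(p)
        # update step: move each centroid to its cluster mean
        centroids = [
            [sum(c) / len(cl) for c in zip(*cl)] if cl else cen
            for cl, cen in zip(clusters, centroids)
        ]
    return clusters

points = [(0, 0), (0, 1), (10, 10), (10, 11)]
print(kmeans(points, [(0, 0), (10, 10)]))
# → [[(0, 0), (0, 1)], [(10, 10), (10, 11)]]
```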
This document provides an overview of basic probability concepts and statistical methods. It discusses probability as it relates to outcomes and events, and as the tool used in statistics to make inferences from samples. It then covers specific probability concepts like n-gram models, which use the previous n-1 words to predict the next word. The document also summarizes part-of-speech tagging methods, including rule-based, supervised stochastic, and unsupervised approaches. Freely available POS taggers for various languages are also listed.
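The n-gram idea can be illustrated with a bigram model that predicts the next word from maximum-likelihood counts over a toy corpus:

```python
from collections import Counter, defaultdict

# Bigram model: predict the next word from the single previous word,
# using maximum-likelihood counts over a toy corpus.
corpus = "the cat sat on the mat the cat ran".split()

bigrams = defaultdict(Counter)
for w1, w2 in zip(corpus, corpus[1:]):
    bigrams[w1][w2] += 1

def predict(word):
    """Most frequent follower of `word`, or None if unseen."""
    counts = bigrams[word]
    return counts.most_common(1)[0][0] if counts else None

print(predict("the"))  # → cat
```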
Taxonomy extraction from automotive natural language requirements using unsup... – ijnlc
In this paper we present a novel approach to semi-automatically learn concept hierarchies from natural language requirements of the automotive industry. The approach is based on the distributional hypothesis and the special characteristics of domain-specific German compounds. We extract taxonomies by using clustering techniques in combination with general thesauri. Such a taxonomy can be used to support requirements engineering in early stages by providing a common system understanding and an agreed-upon terminology. This work is part of an ontology-driven requirements engineering process, which builds on top of the taxonomy. Evaluation shows that this taxonomy extraction approach outperforms common hierarchical clustering techniques.
Name __________ Grading Rubric for a Power Point Projec.docx – roushhsiu
The document provides a grading rubric for a PowerPoint project. It evaluates various elements of the project on a scale from 1 to 5 in these categories: content, slide creation, slide transitions, pictures/images, mechanics, and technology use. For each category, it describes the criteria for meeting expectations and excelling. Overall, the rubric provides a detailed breakdown of the standards and quality levels for the PowerPoint assignment.
Natural language processing (NLP) is a subfield of artificial intelligence that aims to allow computers to understand human language. NLP involves analyzing and representing text or speech at different linguistic levels for applications like question answering or machine translation. Challenges for NLP include ambiguities in language like lexical, syntactic, semantic, and anaphoric ambiguities. Common NLP tasks include part-of-speech tagging, parsing, named entity recognition, and sentiment analysis. Applications of NLP include text processing, machine translation, speech processing, and converting text to speech.
Presentation of the Marcu 2000 ACL paper "The rhetorical parsing of unrestricted texts- A surface-based approach" for Discourse Parsing and Language Technology seminar.
RuleML2015 The Herbrand Manifesto - Thinking Inside the Box RuleML
The traditional semantics for First Order Logic (sometimes called Tarskian semantics) is based on the notion of interpretations of constants. Herbrand semantics is an alternative semantics based directly on truth assignments for ground sentences rather than interpretations of constants. Herbrand semantics is simpler and more intuitive than Tarskian semantics; and, consequently, it is easier to teach and learn. Moreover, it is more expressive. For example, while it is not possible to finitely axiomatize integer arithmetic with Tarskian semantics, this can be done easily with Herbrand Semantics. The downside is a loss of some common logical properties, such as compactness and completeness. However, there is no loss of inferential power. Anything that can be proved according to Tarskian semantics can also be proved according to Herbrand semantics. In this presentation, we define Herbrand semantics; we look at the implications for research on logic and rules systems and automated reasoning; and and we assess the potential for popularizing logic.
The document discusses language independent methods for clustering similar contexts without using syntactic or lexical resources. It describes representing contexts as vectors of lexical features, reducing dimensionality, and clustering the vectors. Key methods include identifying unigram, bigram and co-occurrence features from corpora using frequency counts and association measures, and representing contexts in first or second order vectors based on feature presence.
The document describes language-independent methods for clustering similar contexts without using syntactic or lexical resources. It discusses representing contexts as vectors of lexical features and clustering them based on similarity. Feature selection involves identifying unigrams, bigrams, and co-occurrences based on frequency or association measures. Contexts can then be represented in first-order or second-order feature spaces and clustered. Applications include word sense discrimination, document clustering, and name discrimination.
FCA-MERGE: Bottom-Up Merging of Ontologiesalemarrena
The document describes a new bottom-up method called FCA-MERGE for merging ontologies. It extracts instances from documents for each ontology to generate formal contexts. It then merges the contexts and computes a concept lattice using techniques from Formal Concept Analysis. This lattice provides a structural description of the merging process. The final merged ontology is then generated from the lattice with human guidance. FCA-MERGE circumvents the problem of finding instances classified in both ontologies by extracting instances from relevant documents.
An Efficient Semantic Relation Extraction Method For Arabic Texts Based On Si...CSCJournals
The document presents a method for extracting semantic relations between concepts in Arabic texts. It constructs context vectors for concepts based on their co-occurrence with other concepts. It then uses several semantic similarity measures (Cosine, Jaccard, Lin) to calculate similarity scores between candidate concept vectors and seed concept vectors. Relations are extracted between candidates and seeds if their similarity score is above the average threshold for that seed. The method was evaluated on an Arabic corpus and achieved a precision of 83-85% for relation extraction, showing it is an effective unsupervised approach for extracting relations to construct Arabic ontologies.
International Journal of Computational Engineering Research(IJCER) ijceronline
International Journal of Computational Engineering Research (IJCER) is an international, English-language, monthly online journal. The journal publishes original research work that contributes significantly to furthering scientific knowledge in engineering and technology.
Comparative performance analysis of two anaphora resolution systemsijfcstjournal
Anaphora resolution is the process of finding referents in a given discourse, and is one of the complex tasks of linguistics. This paper presents a performance analysis of two computational models that use the Gazetteer method for resolving anaphora in the Hindi language. In the Gazetteer method, different classes (gazettes) of elements are created; these gazettes are used to provide external knowledge to the system. The two models use recency and an animistic factor for resolving anaphors. For the recency factor, the first model uses the concept of the centering approach and the second uses the concept of the Lappin and Leass approach. Gazetteers are used to provide animistic knowledge. This paper presents the experimental results of both models. The experiments are conducted on short Hindi stories, news articles and biography content from Wikipedia. The respective accuracy of both models is analyzed, and finally a conclusion is drawn as to the most suitable model for the Hindi language.
The spread and abundance of electronic documents requires automatic techniques for extracting useful information from the text they contain. The availability of conceptual taxonomies can be of great help, but manually building them is a complex and costly task. Building on previous work, we propose a technique to automatically extract conceptual graphs from text and reason with them. Since automated learning of taxonomies needs to be robust with respect to missing or partial knowledge and flexible with respect to noise, this work proposes a way to deal with these problems. The case of poor data/sparse concepts is tackled by finding generalizations among disjoint pieces of knowledge. Noise is
handled by introducing soft relationships among concepts rather than hard ones, and applying a probabilistic inferential setting. In particular, we propose to reason on the extracted graph using different kinds of relationships among concepts, where each arc/relationship is associated to a number that represents its likelihood among all possible worlds, and to face the problem of sparse knowledge by using generalizations among distant concepts as bridges between disjoint portions of knowledge.
The document presents a new ontology matching system based on a multi-agent architecture. The system takes ontologies described in XML, RDF Schema, and OWL as input. It uses multiple matchers and filtering to generate mappings between ontology entities. The mappings are then validated. The system is implemented as a multi-agent system with different agent types responsible for resources, matching, generating mappings, and filtering/validating mappings. The architecture allows for robust, flexible, and scalable ontology matching.
A Constructive Mathematics approach for NL formal grammarsFederico Gobbo
This document discusses formalizing natural language grammars through adpositional grammars (AdGrams). AdGrams are based on cognitive linguistics concepts of trajector and landmark. The document proposes that natural language structure can be expressed through a triple involving a governor, dependent, and their relation. It provides examples analyzing phrases and sentences as either dependency-based or government-based structures based on whether the dependent or governor is the trajector. The goal of AdGrams is to formalize natural language grammars in a way that is informed by cognitive linguistics concepts and can be computationally analyzed.
The document discusses the basics of ontologies, including their origin in philosophy, definitions, types, benefits and application areas. Some key points are:
- An ontology is a formal specification of a conceptualization used to help humans and programs share knowledge. It establishes a shared vocabulary for exchanging information.
- Ontologies describe domain knowledge and provide an agreed-upon understanding of a domain through concepts and relations. They help solve problems of ambiguity and enable knowledge sharing.
- Ontologies benefit applications like information retrieval, digital libraries, knowledge engineering and natural language processing by facilitating semantic search and integration of data.
This document describes a proposed concept-based mining model that aims to improve document clustering and information retrieval by extracting concepts and semantic relationships rather than just keywords. The model uses natural language processing techniques like part-of-speech tagging and parsing to extract concepts from text. It represents concepts and their relationships in a semantic network and clusters documents based on conceptual similarity rather than term frequency. The model is evaluated using singular value decomposition to increase the precision of key term and phrase extraction.
SEMI-SUPERVISED BOOTSTRAPPING APPROACH FOR NAMED ENTITY RECOGNITIONkevig
The aim of Named Entity Recognition (NER) is to identify references of named entities in unstructured documents, and to classify them into pre-defined semantic categories. NER often aids from added background knowledge in the form of gazetteers. However using such a collection does not deal with name variants and cannot resolve ambiguities associated in identifying the entities in context and associating them with predefined categories. We present a semi-supervised NER approach that starts with identifying named entities with a small set of training data. Using the identified named entities, the word and the context features are used to define the pattern. This pattern of each named entity category is used as a seed pattern to identify the named entities in the test set. Pattern scoring and tuple value score enables the generation of the new patterns to identify the named entity categories. We have evaluated the proposed system for English language with the dataset of tagged (IEER) and untagged (CoNLL 2003) named entity corpus and for Tamil language with the documents from the FIRE corpus and yield an average f-measure of 75% for both the languages.
This document summarizes a survey on string similarity matching search techniques. It discusses how string similarity matching is used to find relevant information in text collections. The document reviews different algorithms for string matching, including edit distance, NR-grep, n-grams, and approaches based on hashing and locality-sensitive hashing. It analyzes techniques like pattern matching, threshold-based joins, and vector representations. The goal is to present an overview of the field and compare algorithm performance for similarity searches.
In this paper we present the SMalL Ontology for malicious software classification, SMalL Java Application for antivirus systems comparison and the SMalL knowledge based file format for malware related attacks. We believe that our ontology is able to aid the development of malware prevention software by offering a common knowledge base and a clear classification of the existing malicious software. The application is a prototype regarding how this ontology might be used in conjunction with known antivirus capabilities to offer a comprehensive comparison.
Analogy is one of the most studied representatives of a family of non-classical forms of reasoning working across different domains, usually taken to play a crucial role in creative thought and problem-solving. In the first part of the talk, I will shortly introduce general principles of computational analogy models (relying on a generalization-based approach to analogy-making). We will then have a closer look at Heuristic-Driven Theory Projection (HDTP) as an example for a theoretical framework and implemented system: HDTP computes analogical relations and inferences for domains which are represented using many-sorted first-order logic languages, applying a restricted form of higher-order anti-unification for finding shared structural elements common to both domains. The presentation of the framework will be followed by a few reflections on the "cognitive plausibility" of the approach motivated by theoretical complexity and tractability considerations.
In the second part of the talk I will discuss an application of HDTP to modeling essential parts of concept blending processes as current "hot topic" in Cognitive Science. Here, I will sketch an analogy-inspired formal account of concept blending —developed in the European FP7-funded Concept Invention Theory (COINVENT) project— combining HDTP with mechanisms from Case-Based Reasoning.
Topic detection by clustering and text miningIRJET Journal
This document discusses topic detection from text documents using text mining and clustering techniques. It proposes extracting keywords from documents, representing topics as groups of keywords, and using k-means clustering on the keywords to group them into topics. The keywords are extracted based on frequency counts and preprocessed by removing stop words and stemming. The k-means clustering algorithm is used to assign keywords to topics represented by cluster centroids, and the centroids are iteratively updated until cluster assignments converge.
This document provides an overview of basic probability concepts and statistical methods. It discusses probability as it relates to outcomes and events, and as the tool used in statistics to make inferences from samples. It then covers specific probability concepts like n-gram models, which use the previous n-1 words to predict the next word. The document also summarizes part-of-speech tagging methods, including rule-based, supervised stochastic, and unsupervised approaches. Freely available POS taggers for various languages are also listed.
Taxonomy extraction from automotive natural language requirements using unsup...ijnlc
In this paper we present a novel approach to semi-automatically learn concept hierarchies from natural language requirements of the automotive industry. The approach is based on the distributional hypothesis and the special characteristics of domain-specific German compounds. We extract taxonomies by using clustering techniques in combination with general thesauri. Such a taxonomy can be used to support requirements engineering in early stages by providing a common system understanding and an agreed-upon terminology. This work is part of an ontology-driven requirements engineering process, which builds on top of the taxonomy. Evaluation shows that this taxonomy extraction approach outperforms common hierarchical clustering techniques.
Name __________ Grading Rubric for a Power Point Projec.docxroushhsiu
The document provides a grading rubric for a PowerPoint project. It evaluates various elements of the project on a scale from 1 to 5 in these categories: content, slide creation, slide transitions, pictures/images, mechanics, and technology use. For each category, it describes the criteria for meeting expectations and excelling. Overall, the rubric provides a detailed breakdown of the standards and quality levels for the PowerPoint assignment.
Natural language processing (NLP) is a subfield of artificial intelligence that aims to allow computers to understand human language. NLP involves analyzing and representing text or speech at different linguistic levels for applications like question answering or machine translation. Challenges for NLP include ambiguities in language like lexical, syntactic, semantic, and anaphoric ambiguities. Common NLP tasks include part-of-speech tagging, parsing, named entity recognition, and sentiment analysis. Applications of NLP include text processing, machine translation, speech processing, and converting text to speech.
AN ONTOLOGICAL ANALYSIS AND NATURAL LANGUAGE PROCESSING OF FIGURES OF SPEECHgerogepatton
The purpose of the current paper is to present an ontological analysis to the identification of a particular type of prepositional figures of speech via the identification of inconsistencies in ontological concepts. Prepositional noun phrases are used widely in a multiplicity of domains to describe real world events and activities. However, one aspect that makes a prepositional noun phrase poetical is that the latter suggests a semantic relationship between concepts that does not exist in the real world. The current paper shows that a set of rules based on WordNet classes and an ontology representing human behaviour and properties, can be used to identify figures of speech due to the discrepancies in the semantic relations of the concepts involved. Based on this realization, the paper describes a method for determining poetic vs. non-poetic prepositional figures of speech, using WordNet class hierarchies. The paper also addresses the problem of inconsistency resulting from the assertion of figures of speech in ontological knowledge bases, identifying the problems involved in their representation. Finally, it discusses how a contextualized approach might help to resolve this problem.
Temple University Digital Scholarship: Model of the Month Club: Modeling Con...Liz Rodrigues
This document summarizes a presentation on modeling plot and conversion narratives in novels computationally. It discusses using vector space models and clustering algorithms to analyze the linguistic patterns in Augustine's Confessions as a prototype conversion narrative. A two-part model is developed to measure the linguistic distance between early and late sections of a work and the internal distances within those sections. Several novels are identified as strongly exhibiting these patterns of conversion. The presentation proposes a taxonomy of conversional narratives and associated hypotheses to test, such as measuring nature/culture vocabularies and communication themes.
This document discusses using databases and SQL to store and organize text data. It explains that arrays in PHP can be used to represent text as data structures like tables and trees, but databases provide more efficient storage and retrieval. Specifically, relational databases use SQL, which allows defining schemas to represent ontologies and then querying the data through logical operations. The document introduces MySQL as an open source relational database and phpMyAdmin as a PHP interface for managing MySQL databases.
This document presents a project report on sarcasm analysis using machine learning techniques. It discusses how sarcasm detection is a challenging task in natural language processing due to the gap between the literal and intended meaning of sarcastic texts. The report outlines a methodology to detect sarcasm in tweets by extracting features like intensifiers and interjections and training machine learning classifiers. Naive Bayes, maximum entropy, and decision tree classifiers are tested, with decision trees achieving the highest accuracy of 63%. The conclusion discusses how accuracy could be improved by incorporating better features, and future work includes adding context and detecting sarcasm in other languages.
Philosophy 7 Asian Philosophy (Fall 2019) Paper Guidelines .docxssuser562afc1
Philosophy 7: Asian Philosophy (Fall 2019)
Paper Guidelines
1
Paper #3: Chinese Philosophy
You may choose to write about either Confucianism (A) or Daoism (B).
(A) Confucianism: Kongzi (Confucius) or Mengzi (Mencius)
Choose a passage from one of the primary Confucian texts that we read: The Analects or
The Mengzi. Whatever you choose, you must confine your essay to one of our authors’
texts: either Confucius’ Analects or Mencius’ Mengzi. You may choose any passage you
like but you may only write within the context of one of the two thinkers.
and
Analyze and explain it as thoroughly and precisely as you can, staying close to the text
of the author you choose (using its terminology, following its reasoning, etc.). This point
is important: refer to, quote, paraphrase, and cite Confucius’ or Mencius’ text—his words, his
terms, his explanations, his examples, etc.—to aid your explanation of the idea. The closer you
stay to the text, the clearer your explanation will be.
(B) Daoism: Laozi or Zhuangzi
Choose a passage from one of the primary Daoist texts that we read: The Daodejing or
The Zhuangzi. Whatever you choose, you must confine your essay to one of our authors’
texts: either Laozi’s Daodejing or Zhuangzi’s Zhuangzi. You may choose any passage you
like but you may only write within the context of one of the two thinkers.
and
Analyze and explain it as thoroughly and precisely as you can, staying close to the text
of the author you choose (using its terminology, following its reasoning, etc.). This point
is important: refer to, quote, paraphrase, and cite Laozi’s or Zhuangzi’s text—his words, his
terms, his explanations, his examples, etc.—to aid your explanation of the idea. The closer you
stay to the text, the clearer your explanation will be.
Note on Daoism: Remember that these specific texts are notoriously opaque and
mysterious, and their purpose seems to be, quite explicitly in some cases, to effect an
experiential change in thinking on the part of the reader. So, if you choose this option, give
yourself time to let the text affect you and wash over you. It is common that the sense of
particular passages vacillates and shifts as one reads them again and again. So try—
without trying, of course (i.e., in a wu wei fashion)—to give yourself ample room to
maneuver within the text’s mysterious spaces, as Zhuangzi’s butcher’s blade
maneuvers freely within the heavenly contours of the ox’s carcass.
Philosophy 7: Asian Philosophy (Fall 2019)
Paper Guidelines
2
In these papers, I want you to try to capture the essence of what you choose. You might
imagine that what you are trying to do is teach someone what a passage means within the
context of Confucianism or Daoism.
I am looking for in-depth and detailed analysis/explanation.
Paper Details
Due Date
SUNDAY, May 3rd on Canvas by MIDNIGHT
Paper Length
At least 3 full pages of text (“full” beginning.
1) Artificial Intelligence research draws from many disciplines including formal logic, probability theory, linguistics and philosophy. Computational logic combines and improves upon traditional logic and decision theory.
2) The paper argues that the abductive logic programming (ALP) agent model is a powerful model of both descriptive and normative thinking. It includes production systems and is compatible with classical logic and decision theory.
3) The ALP agent model treats beliefs as describing the world and goals as describing how the world should be. Its semantics aim to generate actions and assumptions to make goals and observations true based on beliefs.
Marcelo Funes-Gallanzi - Simplish - Computational intelligence unconferenceDaniel Lewis
At the computational intelligence unconference 2014, Marcelo Funes-Gallanzi presented Simplish, a system for the conversion of text into Simple English. Here are his slides.
If you do not know how to write an introduction, body paragraphs, and conclusion for your essay, or need some help with formatting and structuring of your paper, we are happy to present our full guide on Essay Writing Tips developed by educational experts from EssayTask.com.
This document discusses different perspectives on metaphors and challenges in recognizing and representing metaphors computationally. It reviews ongoing research aiming to accomplish these goals. There is no consensus on what constitutes a metaphor. Linguistic metaphors found in examples are distinguished from cognitive metaphors, which are not linguistic phenomena. Metaphors can involve referential substitution or predication. Framing and context are important for interpretation. Metaphors may be inexhaustible or conventionalized into new meanings. Recognizing metaphors computationally is difficult as literal and figurative meanings depend on context and intention.
The document provides guidelines for writing an abstract, including what an abstract is, its purpose, length, content, and other considerations. An abstract should be a concise summary of the key points of the paper in 3-4 sentences or less, including the objectives, methods, results, and conclusions. It is important that the abstract provides enough information to allow the reader to understand the main topics and conclusions of the paper without having to read the full paper.
The document discusses the relationship between research and teaching in library and information science (LIS) curricula. It argues that research is essential to LIS teaching and that teaching should focus on asking questions rather than just transmitting established knowledge. It also notes the need for LIS terminology to change as the field evolves from focusing on catalogs to linked open data and graphs.
Exploring Metaphorical Data on Technical Documents
Lakshmi Himabindu Jonnalagadda
Department of Computer Science
University of North Carolina
Charlotte – 28262
hjonnala@uncc.edu
Abstract--A metaphor is a figure of speech in which a word or a phrase is applied to an object to which it is not literally applicable. A CSV or text file containing a sizable number of metaphors, collected from different websites, is taken as input. A large technical document is then analyzed to identify whether any of the metaphors collected earlier are present. Initially, this is implemented using the Java programming language and regular expression concepts. Later on, jobs might be run on Hadoop to identify whether technical documents contain any metaphors that were collected into the text file.
This report focuses on how exactly metaphors are identified,
problems faced, results obtained and insights gained while doing
this project.
Key words: Metaphors, Shakespeare, Hadoop, technical documents, regular expressions.
I. INTRODUCTION
Metaphors are collected from various websites on the Internet. All these metaphors are collated into a CSV or TXT file format so that they can be used for comparison later on. Over 1,800 metaphors are collected into the CSV file, including very simple ones like

“Sea of fire”

and also long ones like

“All the world's a stage, and all the men and women merely players. They have their exits and their entrances.”

“The job interview was a rope ladder dropped from heaven.”

“Words are the weapons with which we wound.” [1]

A few metaphors are common ones that we use in our everyday lives, like ‘Boiling mad’, ‘Breaking News’, ‘Brilliant Idea’, ‘Her bubbly personality’, etc.

Complex or poetic metaphors that Shakespeare used in his Sonnets and in his other works are also collected. A few popular examples of Shakespeare's metaphors are:
“Look, love, what envious streaks
Do lace the severing clouds in yonder East:
Night's candles are burnt out, and jocund day
Stands tiptoe on the misty mountain tops” [5]
“Why art thou yet so fair? Shall I believe
that unsubstantial Death is amorous;
And that the lean abhorrèd monster keeps
Thee here in dark to be his paramour?” [5]
“His two chamberlains Will I with wine and wassail so
convince, That memory, the warder of the brain, Shall
be a fume, and the receipt of reason A limbeck only.”
[5]
“Time travels in divers paces with divers persons . . .
I’ll tell you who Time ambles withal, who Time trots
withal, who Time gallops withal, and who he stands still
withal” [9]
"So, haply slander--Whose whisper o'er the world's
diameter, As level as the cannon to his blank,
Transports his poison'd shot--may miss our name,
And hit the woundless air." [5]
Shakespeare used the rose as a metaphor to symbolize a married woman, while a rose withered on the stem describes a spinster. Shakespeare used metaphors to describe many more topics such as life, time, the universe, etc. His plays are full of metaphors mainly related to birds, war, music, food, clothing, love, etc.
Initially, regular expressions are used to check whether any metaphor saved in our dataset is present in the technical documents. A regular expression is a string used to describe a search pattern. It can express a search pattern in a single line, whether the program is written in Java, C, .NET or PHP. Quantifiers are used for searching patterns; the quantifiers ‘?’, ‘*’, and ‘+’ are used most often. For example, the regular expression b*c matches the strings c, bc, bbc, bbbc, etc. ‘*’ matches zero or more occurrences of the preceding element, and ‘?’ matches zero or one occurrence of the element preceding it. A simple example of how to use regular expressions is:

A regular expression for validating a username might be ^[a-z0-9_-]{3,15}$. Here ^ indicates the start of a line. a-z0-9 represents that the username can contain lowercase letters and digits. The underscore and hyphen in the character class match any underscores or hyphens in the username. {3,15} says that the username must have a minimum length of 3 characters and a maximum of 15 characters. There are tools like GREP and PowerGREP with which regular expressions can be used more efficiently.
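The username pattern above can be exercised with Java's built-in java.util.regex package. This is a minimal sketch: the class and method names are illustrative, not part of the project's code; only the pattern itself comes from the text.

```java
import java.util.regex.Pattern;

public class UsernameCheck {
    // 3 to 15 characters, each a lowercase letter, digit,
    // underscore or hyphen, anchored to the whole string.
    static final Pattern USERNAME = Pattern.compile("^[a-z0-9_-]{3,15}$");

    static boolean isValid(String s) {
        return USERNAME.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("lakshmi_99")); // true
        System.out.println(isValid("ab"));         // false: shorter than 3
        System.out.println(isValid("Bad Name"));   // false: uppercase and space
    }
}
```

Note that matches() already anchors the pattern to the whole input, so ^ and $ are redundant here; they are kept only to mirror the pattern as written in the text.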
The Hadoop framework is used to run jobs that verify whether any metaphors are present in the technical documents. Hadoop is an open-source framework in which large amounts of data can be stored and processed making use of its distributed environment. It is highly reliable and scalable.

Natural Language Processing is a growing field related to Human-Computer Interaction and Artificial Intelligence. In comparing text between two files, we make use of a few concepts from Natural Language Processing. [7]
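The core matching step described above, scanning a document for any metaphor from the collected list, can be sketched in plain Java before moving to Hadoop. This is a simplified sketch under stated assumptions: the real project loads the metaphors from a CSV file, while here a small in-memory list and the hypothetical class name MetaphorScan stand in.

```java
import java.util.List;
import java.util.regex.Pattern;

public class MetaphorScan {
    // Stand-in for the metaphor list loaded from the CSV file.
    static final List<String> METAPHORS =
            List.of("sea of fire", "boiling mad", "breaking news");

    // Returns every metaphor from the list that occurs in the document.
    // Pattern.quote treats punctuation inside a metaphor literally,
    // and CASE_INSENSITIVE ignores capitalization differences.
    static List<String> findMetaphors(String document) {
        return METAPHORS.stream()
                .filter(m -> Pattern
                        .compile(Pattern.quote(m), Pattern.CASE_INSENSITIVE)
                        .matcher(document)
                        .find())
                .toList();
    }

    public static void main(String[] args) {
        String doc = "The demo turned the stage into a sea of fire; "
                   + "the manager was Boiling Mad.";
        System.out.println(findMetaphors(doc)); // [sea of fire, boiling mad]
    }
}
```

On Hadoop, this same findMetaphors logic would sit inside a mapper, with each map task scanning one split of the technical document in parallel.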
II. SUMMARY OF ARTICLES ON METAPHORS
Much research has been performed on finding metaphors in text automatically using Natural Language Processing concepts. Processing metaphors automatically can be divided into two tasks: metaphor recognition and metaphor interpretation. Metaphor recognition is distinguishing between literal language and metaphor in a text document. Metaphor interpretation is finding the meaning of the metaphor. Metaphors are discussed from four views:
a) Comparison View
b) Interpretation View
c) Conceptual View
d) Selectional Restrictions view
Fass made the first attempt to identify metaphors automatically in a text document. He developed a system to distinguish between literals, metaphors and anomalies, and also to interpret metaphors; this was done in a step-by-step process.
(Matter to be added here)
Results: Five types of metaphors were identified during the research process.
a) Metaphors of Space: Many metaphors are found in the area of space. The most frequent metaphor was “field”, followed by “area”. Other metaphors in this category are “byways”, “regions”, etc. Researchers refer to their work as part of a particular “field” or “area”. [6]
b) Metaphors of Travel: The word found most often in metaphors under this category is “Steps”. Other words found are “Track”, “Path”, and “Journey”, along with words like “Flow”, “Sprint”, “Wading”, and “Embark”, which indicate movement. This kind of analogy gives the reader the idea of investigation, of opening up new areas of examination, of setting off into the distance to discover new information. It suggests a sense of movement involved in research: research requires a considerable amount of activity to bring it to realization, and nothing is discovered by sitting still, only by moving into the unknown.
c) Metaphors of Action: A large number of words are found commonly in metaphors under this category, words like “Working”, “Delve”, “Reap”, and “Combing”, which refer to some action involved in conducting research.
d) Metaphors of Body: A number of words found in
metaphors relate to the human or animal body, for
example “body”, “corpus”, “grasp”, and “infancy”. It
was found that metaphor in research writing is not
limited to a specific field but is spread across various
areas.
e) Metaphors of Ordeal: Words related to ordeal, such as
“struggle”, “fighting”, “crushed”, “drown”, and
“inflict”, are used in metaphors under this context.
A. The Comparison of Metaphorical Concepts:
“The comparison of metaphorical concepts accounts for a
number of different actions and experiences. Barkfelt (2003) for
example in her study on metaphors of depression, found in
autobiographical writings, works out that some authors
experienced their illness as a light-dark contrast ("die Welt wird
zunehmend grau" - "the world is becoming increasingly grey");
others described their depression as an "Überfall" ("attack"),
which hits them unexpectedly and "niederwirft" ("knocked them
down"). The comparison of the two metaphorical concepts
points to different experiences of the illness, which manifests
itself at different speeds. The use of metaphor in terms of light
dark gives the perception of a transition, thus allowing room for
maneuvers, which is not possible when depression is perceived
as an "attack." In the latter, on the other hand, the illness is more
clearly defined as a personal and dangerous enemy than in the
first metaphorical concept. From this, Barkfelt derives a number
of different options for linguistic or therapeutic intervention. Put
more generally: The comparison of metaphorical concepts with
the models of actions they contain allows certain conclusions to
be drawn. However, these conclusions are only possible if the
context is understood fully. Barkfelt is only able to draw such
conclusions because she is able to recognize the various
implications, due to her competence in the field as a therapist, of
the metaphorical concept of depression – beyond any specialist
manual, which might simplify the process of coming to these
conclusions but which cannot produce them”.
B. Limits to the Use of Metaphor:
“In examining the question: "What is the expression-shortening,
knowledge preventing content of the metaphors used?" it is
possible to work out the "hiding" elements, the ideological and
cognitive deficits of a metaphorical concept. Which aspects does
this use of metaphor conceal? Again, making use of the
container image, it is not able to represent temporal aspects; one
is either "dicht" ("shut") or "nicht dicht" ("not shut"). The
"Verlauf" ("passing") of time is better described in the use of the
path metaphor ("im Leben weiter kommen" - "to make headway
in life", "to get ahead", to make "progress"): The image of the
container is not able to do this. Another example that we are
familiar with is the image of the "Großwetterlage" ("general
weather conditions"); used by the media to describe the
economic situation. The metamorphosis of market movements
into nature disguises the fact that one is dealing with a man-
made phenomenon. In nearly all cases, this use of metaphor can
form the basis of a discussion of advantages and disadvantages.
The deficits and resources of a metaphorical concept can be
reconstructed for the three stages in the usual individual, sub-
cultural and cultural use of metaphor. Naturally, the process of
assessment, in being able to see one aspect of a metaphor as
"highlighting" and another as "hiding," requires a subjectivity
that is able to draw on a culture that has been lived in and is
understood. It is therefore dependent on the discriminatory
ability of the person undertaking the interpretation”. [7]
C. Rules to Identify Metaphors:
Rules to identify metaphors are listed as follows:
a) “A word or phrase, strictly-speaking, can be understood
beyond the literal meaning in context of what is being
said; and
b) The literal meaning stems from an area of physical or
cultural experience (source area)
c) Which, however, is - in this context - transferred to a
second, often abstract, area (target area)”. [7]
The following examples of metaphors and their explanations are
taken from the paper titled “Systematic Metaphor Analysis as a
Method of Qualitative Research” by Rudolf Schmitt.
EXAMPLE 1:
“You just don’t experience problems like that as being so
weighty (‘gewichtig’) when you are drunk.”
“It is easier (‘leichter’) to get into a conversation with people
when you’re no longer sober.”
“It was simply less burdensome (‘unbeschwerter’) after the
second beer.”
All three quotations are related to various states of drunkenness,
which is also the target area in a current investigation entitled
Which Experiences and Expectations are related to Alcohol
Consumption?
The common source area can be formulated in terms of a
“burden”, “effort”, and “weight” – the most suitable term will
become evident upon the discovery of further metaphors.
Thus, I would offer the following as an initial formulation of the
metaphorical concept:
“Drunkenness makes difficulties easier to bear”. [7]
EXAMPLE 2:
“They met (‘getroffen’) there and got into an argument.”
“He tried to find (‘finden’) a way to reach him” (“Zugang zu
ihm”).
It could be argued that to meet (“treffen”) and find (“finden”)
require space to take place... But in this case it is a very strained
construction of a source area, which uses “space” in its most
abstract quality (i.e., it’s somehow simply being present).
Speaking metaphorically, it is an attempt to give a skinhead a
perm. Therefore, we have no common source area, no
metaphorical model here, even if the target area (interaction) is
the same. [7]
EXAMPLE 3:
“He got out of his way” (“aus dem Weg gegangen”).
“He is making progress (‘Fortschritte’) with his therapy.”
The same source area (path metaphor) but no common target
area. First interaction then individual development, therefore not
suitable for grouping in a common model. [7]
EXAMPLE 4:
“She bubbled over with life” (‘gesprudelt vor Leben’).
“She effervesced (‘gesprüht’) as she told her story.”
“And then the dams burst (‘Dämme gebrochen’) as she told her
story and wept.”
We are able to ascribe the metaphors to the same source area
(moving liquid) and the same target area (emotional exchange).
The corresponding titles might be:
Emotional vitality is running water.
Emotional vitality is overflowing water.
Emotional vitality is pressurized liquid.
A decision for one of these titles cannot yet be made; they are
provisional constructions. [7]
Experience shows that it is also too early to formulate a title
based on just three metaphors. We may well find further
metaphors to add to the image of the bursting dams.
It is true, though hardly noticed, that all of us speak in
metaphors quite often; few realize that we even live by
metaphors. George Lakoff compiled a large list of metaphors
covering different aspects of life. According to him, metaphors
shape and refine our way of thinking, make us more imaginative,
and help us structure our thoughts. An example, taken from an
online source: “Thinking of marriage as a "contract agreement,"
for example, leads to one set of expectations, while thinking of it
as "teamplay," "a negotiated settlement," "Russian roulette," "an
indissoluble merger," or "a religious sacrament" will carry
different sets of expectations. When a government thinks of its
enemies as "turkeys" or "clowns" it does not take them as serious
threats, but if they are "pawns" in the hands of the communists,
they are taken seriously indeed.” [8]
D. Concepts and metaphors we live by:
Most people think that they do not need or use metaphors in
their daily lives, but George Lakoff says that, according to his
studies, metaphors are an important part of everyone’s life even
if people fail to notice it. In his article, Lakoff takes an example
to explain how metaphors act as concepts and how they are used
in everyday life. The conceptual metaphor “Argument is War” is
taken as an example. [8] This conceptual metaphor is used in
different forms and expressions by many of us. Those are:
“Your claims are indefensible.
He attacked every weak point in my argument.
His criticisms were right on target.
I demolished his argument.
I've never won an argument with him.
You disagree? Okay, shoot!
If you use that strategy, he'll wipe you out.
He shot down all of my arguments.” [8]
Here, the “Argument is War” metaphor is one that we live by; to
be clearer, it means that we structure our actions around the
concept of arguing. People can also have different views and
thoughts about arguments, or may not consider the above
statements to be about arguments at all. This is what makes it a
conceptual metaphor.
“The essence of metaphor is understanding and experiencing
one kind of thing in terms of another” [8]
Lakoff also explains how ‘time’ is used in metaphors built from
simple English words. A few metaphors are quoted here from
Lakoff’s list of metaphors.
“You're wasting my time.
This gadget will save you hours.
I don't have the time to give you.
How do you spend your time these days?
That flat tire cost me an hour.
I've invested a lot of time in her.
I don't have enough time to spare for that.
You're running out of time.
You need to budget your time.
Put aside some time for ping-pong.
Is that worth your while?
Do you have much time left?
He's living on borrowed time.
You don't use your time profitably.
I lost a lot of time when I got sick.
Thank you for your time.
She spends her time unwisely.
The diversion should buy him some time.
Time is money.
Time heals all wounds.
Time will make you forget.
Time had made her look old.
Time had not been kind to him.
The ravages of time”. [1] [8]
A significantly more subtle example of how a metaphorical
idea can conceal a part of our experience can be seen in what
Michael Reddy has called the “conduit metaphor.” Reddy
observes that our language about language is structured roughly
by this metaphor: ideas are objects that are put into words and
sent to a person. Reddy documented many metaphors in this
category during his research. A few examples are taken from his
work:
“It's hard to get that idea across to him.
I gave you that idea.
Your reasons came through to us.
It's difficult to put my ideas into words.
When you have a good idea, try to capture it
immediately in words.
Try to pack more thought into fewer words.
You can't simply stuff ideas into a sentence any old
way.
The meaning is right there in the words.
Don't force your meanings into the wrong words.
His words carry little meaning.
The introduction has a great deal of thought content.
Your words seem hollow.
The sentence is without meaning.
The idea is buried in terribly dense paragraphs”.[8]
There are also orientational metaphors, where emotions or
concepts are described using directions or orientations. For
example, happy is up and sad is down: the direction up is used to
denote happiness because it has a positive meaning, and down is
used to show sadness. A few examples are taken from George
Lakoff’s papers and works. They are:
“I'm feeling up.
That boosted my spirits.
My spirits rose.
You’re in high spirits.
Thinking about her always gives me a lift.
I'm feeling down.
He's really low these days.
My spirits sank.” [8]
Orientation up generally carries a positive meaning, whereas
down carries a negative one. Similar examples pointing to the
directions up and down are:
The number of books printed each year keeps going up.
His draft number is high.
My income rose last year.
The amount of artistic activity in this state has gone
down in the past year.
The number of errors he made is incredibly low.
His income fell last year.
If you're too hot, turn the heat down. [8]
III. PROJECT DESCRIPTION
Metaphors are collected into a CSV or text file from various
sources. A program is written in the Java programming language
using the regex Matcher and Pattern classes. A Matcher matches
an input line against a pattern and can find multiple occurrences
of a regular expression in a text file. The Pattern class represents
a compiled regular expression; scanning text with it is called
pattern matching. Pattern.matches can be used to check whether
a text matches a regular expression as a whole, while
Pattern.compile produces a Pattern object that can be reused to
check many strings against the same expression. This is useful
when multiple files are to be compared.
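As a minimal sketch of this matching step (the phrase and the sentence here are illustrative values, not entries from the project's dataset):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexDemo {
    public static void main(String[] args) {
        // Illustrative metaphor phrase and input line.
        String metaphor = "sea of fire";
        String line = "The battlefield became a sea of fire by nightfall.";

        // Pattern.quote escapes regex metacharacters so the metaphor is
        // matched literally; Matcher.find scans for any occurrence of it.
        Pattern p = Pattern.compile(Pattern.quote(metaphor));
        Matcher m = p.matcher(line);
        System.out.println(m.find()); // prints "true"
    }
}
```

Quoting the phrase with Pattern.quote is a precaution: metaphor phrases from free text may contain characters such as `.` or `?` that would otherwise be interpreted as regex operators.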
The metaphors file is read into an ArrayList, and the technical
document is read into another. ArrayLists are used here because
they are dynamic: unlike plain arrays they do not allocate a fixed
amount of memory, so their size can change with the file being
used. This is helpful because the comparison is run on different
files whose lengths are bound to vary.
ArrayList<String> metaphor_list = new ArrayList<String>();
ArrayList<String> Shakespeare = new ArrayList<String>();
readFileIntoArray("metaphors.txt", metaphor_list);
readFileIntoArray("Shakespeare.txt", Shakespeare);
A. Comparison against Shakespeare document:
Comparison and analysis were first done on the Shakespeare
document. It contains a few poems Shakespeare wrote, a few
scenes taken from Macbeth and Hamlet, and Shakespeare’s
sonnets; on the whole, it is a collection of his works.
for(String s : Shakespeare)
{
for(String m : metaphor_list)
{
These nested loops visit every sentence of the Shakespeare file
and every entry of the metaphor list.
Pattern r = Pattern.compile(m.trim());
Matcher mr = r.matcher(s);
Here, a Pattern instance is created, and then a Matcher instance is
created from it. Any text matching an entry in the metaphor list
is retrieved and displayed on the console. If no metaphors are
found, a message is printed saying “There are no metaphors in
the file”.
FileReader and BufferedReader are used to read a file.
FileNotFoundException and IOException are handled in catch
blocks to prevent bad behavior at run time.
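The same reading step could also be written with try-with-resources, which closes the reader automatically even when an exception is thrown mid-read. This is a sketch of that alternative (class and method names here are illustrative, not the project's code, which appears in the appendix):

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class FileLines {
    // Reads every line of a file into a list. The try-with-resources
    // statement closes the BufferedReader automatically on exit.
    public static List<String> readLines(String fileName) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(new FileReader(fileName))) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException ex) {
            // FileNotFoundException is a subclass of IOException,
            // so one catch block covers both failure cases.
            System.out.println("Error reading file '" + fileName + "'");
        }
        return lines;
    }
}
```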
Fig. 1: Metaphors retrieved from Shakespeare document.
Fig. 2: Metaphors retrieved from Shakespeare document.
Results:
The Java program checks whether any metaphors saved in
our dataset are present in the Shakespeare document, and a
reasonable number of metaphors are retrieved from it.
As seen in the screenshot, twenty-two metaphors are retrieved
from the Shakespeare document. A few of those are:
But thy eternal summer shall not fade
For who's so dumb that cannot write to thee, When thou
thy self dost give invention light Be thou the tenth
Muse, ten times more in worth Than those old nine
which rhymers invocate, And he that calls on thee, let
him bring forth.
For every storm that blows.
Fell from their boughs, and left me open, bare.
In his bright radiance and collateral light Must I be
comforted, not in his sphere.
Tis a commodity will lose the gloss with lying; the
longer kept, the less worth.
Off with't while 'tis vendible; answer the time of
request.
These three world-sharers, these competitors, Are in
thy vessel.
Let me cut the cable; and when we are put off, fall to
their throats.
All there is thine.
They have their exits and their entrances.
All the world's a stage.
My gentle Puck, come hither.
Ten times thy self were happier than thou art, If ten of
thine ten times refigured thee: Then what could death
do if thou shouldst depart, Leaving thee living in
posterity?.
But myself, Who had the world as my
confectionary; The mouths, the tongues, the eyes, and
hearts of men At duty, more than I could frame
employment; That numberless upon me stuck, as
leaves Do on an oak, have with one Winter's brush
Fell from their boughs, and left me open, bare
For every storm that blows.
Here, two metaphors are retrieved twice, as they occur twice in
the Shakespeare document.
B. Comparison against Milton document:
The same Java program can be run on another poetic document,
‘Milton’, just by changing the file name in this command:
readFileIntoArray("Shakespeare.txt", Shakespeare);
Here, Shakespeare.txt is replaced by Milton.txt. Nine metaphors
were retrieved from this document as well, and a screenshot of
the results can be seen below:
All meanly wrapt in the rude manger lies.
At last surrounds their sight, A globe of circular light,
That with long beams the shame faced night arrayed
The helmed Cherubim And sworded Seraphim, Are
seen in glittering ranks with wings displaid, Harping in
loud and solemn quire, With unexpressive notes to
Heav'ns new-born Heir.
His principles being ceast, he ended strait.
They left me weary on a grassie terf.
FAIREST flower no sooner blown but blasted, Soft
silken Primrose fading timelesslie, Summers chief
honour if thou hadst outlasted Bleak winters force that
made thy blossome drie; For he being amorous on that
lovely die That did thy cheek envermeil, thought to
kiss But kill'd alas, and then bewayl'd his fatal bliss.
Fig. 3: Metaphors retrieved from Milton document.
And peace shall lull him in her flowry lap.
When I consider how my light is spent, E're half my
days, in this dark world and wide.
C. Checking on Patent Documents
Patent documents downloaded from the USPTO Patent
website are used in the next step. They were downloaded from
the Patent Grant Full Text (1976 – Present) link, and the
program was verified on patents from 2014, 2013, 2012, and
2011. The documents are in XML format; the XML tags are
removed and the analysis is done on the plain text. Only two
metaphors were found, in one document from the 2013 archive.
To continue this research, more patent documents can be
analyzed to see whether further metaphors can be retrieved.
Fig. 4: Metaphors retrieved from one of the 2013 patent documents.
No metaphors were retrieved from the 2012, 2014, and 2015
patent documents. Not all documents from these archives were
analyzed, but the ones used did not yield any metaphors.
Fig. 5: Screenshot showing no metaphors retrieved from a 2012 patent document.
IV. WORKING IN HADOOP
Firstly, Hadoop has to be installed on the machine. The
requirements to install Hadoop on Mac OS X are:
a) Java version 1.6 or later
b) Ruby and Homebrew
c) SSH and remote login
Homebrew installs Hadoop 2.3; the command to install it is
$ brew install hadoop. Homebrew is a package manager that
installs and uninstalls software, and it makes the Hadoop
installation much easier. Homebrew sets up a single-node
cluster. The commands used to run Hadoop locally on the
machine are:
$ ./start-dfs.sh
$ ./start-yarn.sh
$ jps is a command used to check how many nodes are running
and to verify whether Hadoop is running.
Access to Hadoop on the university server has to be granted;
after it is granted, check whether all the nodes are running.
Technical documents such as USPTO patent files will then be
checked for metaphors against our dataset. For further analysis,
poetic documents like Shakespeare and Milton can also be
checked.
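The per-line matching step that a Hadoop job would perform can be sketched in plain Java. The class and method names below are illustrative; a real job would place this logic inside an org.apache.hadoop.mapreduce.Mapper subclass and let reducers sum the emitted counts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class MetaphorMap {
    // Map step: for each metaphor phrase found in the input line,
    // emit a (metaphor, 1) pair encoded as a tab-separated string,
    // mirroring what a Hadoop map() call would write to its context.
    public static List<String> map(String line, List<String> metaphors) {
        List<String> emitted = new ArrayList<>();
        for (String m : metaphors) {
            // Quote the phrase so any regex metacharacters in it
            // are matched literally rather than interpreted.
            if (Pattern.compile(Pattern.quote(m.trim())).matcher(line).find()) {
                emitted.add(m.trim() + "\t1");
            }
        }
        return emitted;
    }
}
```

Splitting the work this way lets Hadoop distribute the documents across nodes, with each mapper scanning its share of lines independently.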
V. CONCLUSION
Metaphors are expressions used to describe a thing, object, or
situation, and they can be divided into various categories. For
this project, documents were compared against a couple of
thousand metaphors. Around twenty-two metaphors were
retrieved from the Shakespeare document, nine from Milton, and
a very small number from one patent document. The whole
project was done in the Java programming language, using the
concept of regular expressions.
VI. FUTURE SCOPE
The dataset of metaphors used here is small compared to the
ocean of metaphors available. The dataset can be expanded and
compared against various other technical and poetic documents.
The time taken by the program to retrieve results is somewhat
high, which can be optimized in the future.
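One likely cause of the slow run time is that the inner loop in the appendix calls Pattern.compile once for every (sentence, metaphor) pair; compiling each metaphor exactly once up front removes that repeated work. The sketch below illustrates the idea (an assumed optimization, not a change that has been measured on this project):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class PrecompiledMatch {
    // Compile each metaphor phrase exactly once; the resulting
    // Pattern objects are reused for every sentence scanned.
    public static List<Pattern> compileAll(List<String> metaphors) {
        List<Pattern> patterns = new ArrayList<>();
        for (String m : metaphors) {
            patterns.add(Pattern.compile(Pattern.quote(m.trim())));
        }
        return patterns;
    }

    // Counts how many (sentence, metaphor) matches occur in total.
    public static int countMatches(List<String> sentences, List<Pattern> patterns) {
        int count = 0;
        for (String s : sentences) {
            for (Pattern p : patterns) {
                if (p.matcher(s).find()) {
                    count++;
                }
            }
        }
        return count;
    }
}
```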
REFERENCES
[1] Index of /lakoff/metaphors, retrieved from
http://www.lang.osaka-u.ac.jp/~sugimoto/MasterMetaphorList/metaphors/
[2] Metaphor examples, E Reading Worksheets, retrieved from
http://www.ereadingworksheets.com/figurative-
language/figurative-language-examples/metaphor-examples/
[3] Metaphor examples, retrieved from
http://examples.yourdictionary.com/metaphor-examples.html
[4] Metaphor Definition, Literary Devices, retrieved from
http://literarydevices.net/metaphor/
[5] Shakespeare's Metaphors, a compilation of Shakespeare's
most powerful metaphors by Shakespearean scholar Henry
Norman Hudson. Available: http://www.shakespeare-
online.com/biography/metaphorlist.html
[6] Pitcher, Rod. "The metaphors that research students live
by." The qualitative report 18.36 (2013): 1-8.
[7] Schmitt, Rudolf. "Systematic metaphor analysis as a
method of qualitative research." The Qualitative Report 10.2
(2005): 358-394.
[8] The Systematicity of Metaphorical Concepts, retrieved from
http://theliterarylink.com/metaphors.html
[9] Henry V, William Shakespeare, retrieved from
http://www.schillerinstitut.dk/metafor240312.pdf
[10] 10 Java Regular expressions you should know, by mkyong,
retrieved from http://www.mkyong.com/regular-expressions/10-
java-regular-expression-examples-you-should-know/
[11] USPTO Patent Grant Full Text, retrieved from
https://www.google.com/googlebooks/uspto-patents-grants-
text.html#2013
[12] 200 short and sweet metaphor examples, retrieved from
http://literarydevices.net/a-huge-list-of-short-metaphor-
examples/
[13] Shakespeare's Metaphors and Similes, from Shakespeare:
His Life, Art, and Characters, Volume I. New York: Ginn and
Co. Available: http://www.shakespeare-
online.com/biography/imagery.html
[14] Shakespeare's Sonnets, by William Shakespeare,
retrieved from
http://www.sparknotes.com/shakespeare/shakesonnets/section4.rhtml
[15] Shakespeare’s most popular quotes by Shakespeare,
retrieved from
http://www.shakespearemag.com/summer03/dozen.asp
[16] Meta, Milton and Metaphor: Models of Subjective
Experience, by Penny Tompkins and James Lawley, first
published in Rapport, journal of the Association for NLP (UK),
Issue 36, August 1996. Available:
http://www.cleanlanguage.co.uk/articles/articles/2/1/Meta-
Milton-Metaphor-Models-of-Subjective-Experience/Page1.html
[17] Metaphorically Speaking Unlocking the meaning of
Shakespeare’s metaphors. Retrieved from
http://teacher.scholastic.com/lessonrepro/reproducibles/profbook
s/shakespeare.pdf
[18] Regular Expression Language - Quick Reference, retrieved
from https://msdn.microsoft.com/en-
us/library/az24scfc(v=vs.110).aspx
[19] Java Regex - Pattern (java.util.regex.Pattern), retrieved
from http://tutorials.jenkov.com/java-regex/pattern.html
[20] Methods of the Pattern Class, retrieved from
https://docs.oracle.com/javase/tutorial/essential/regex/pattern.html
[21] Java Regex - Matcher (java.util.regex.Matcher), retrieved
from http://tutorials.jenkov.com/java-regex/matcher.html
APPENDIX
Source Code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.io.*;
import java.util.ArrayList;
import java.util.StringTokenizer;
public class Test {
public static void main(String [] args) {
ArrayList<String> metaphor_list = new ArrayList<String>();
ArrayList<String> Shakespeare = new ArrayList<String>();
readFileIntoArray("metaphors.txt", metaphor_list);
readFileIntoArray("Shakespeare.txt", Shakespeare);
int j = 0;
for(String s : Shakespeare)
{
for(String m : metaphor_list)
{
Pattern r = Pattern.compile(m.trim());
Matcher mr = r.matcher(s);
if(mr.find()) {
System.out.println(m);
System.out.printf("%n");
j++;
}
//System.out.println(s);
}
}
System.out.println("Number of Metaphors found are : " + j);
if (j == 0)
    System.out.println("There are no metaphors in the file");
}
public static void readFileIntoArray(String fileName, ArrayList<String> list)
{
String line = null;
StringBuilder builder = new StringBuilder();
try {
FileReader fileReader = new FileReader(fileName);
BufferedReader bufferedReader = new BufferedReader(fileReader);
while((line = bufferedReader.readLine()) != null) {
    // Append a space so words at line breaks are not joined together.
    builder.append(line).append(" ");
}
bufferedReader.close();
}
catch(FileNotFoundException ex) {
    System.out.println("Unable to open file '" + fileName + "'");
}
catch(IOException ex) {
    System.out.println("Error reading file '" + fileName + "'");
}
catch(OutOfMemoryError ex) {
    System.out.println("File occupying too much space, unable to open '" + fileName + "'");
}
StringTokenizer stringTokenizer = new StringTokenizer(builder.toString(), ".");
while (stringTokenizer.hasMoreTokens())
{
list.add(stringTokenizer.nextToken()+".");
}
}
}