The document discusses concept-based representation and translation for multilingual information systems. It presents the MultiNet paradigm for semantic representation using multilayered extended semantic networks. Key elements include the HaGenLex computational lexicon, semantic analysis using the WOCADI parser, and representing concepts and relations between them. The goal is concept-based information retrieval and question answering across languages.
Towards Interaction Models Derived From Eye-tracking Data (jacekg)
This document discusses using eye tracking data to develop interaction models. It proposes a two-state reading model to analyze eye movement patterns and correlate them with higher-level constructs like task characteristics and user knowledge. Two user studies are described that use eye tracking to measure cognitive effort across different tasks and assess how well eye tracking data can predict a user's self-reported domain knowledge. The goal is to develop models from eye tracking data that can be used to better understand, adapt, and enable user interactions.
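The two-state idea above can be sketched as a simple classifier over fixations. This is a hypothetical illustration, not the study's actual model: the thresholds and the (duration, saccade length) features are invented for the example.

```python
# Hypothetical two-state reading classifier: a fixation is attributed to a
# "reading" state when it is long and followed by a short saccade, otherwise
# to a "scanning" state. Both thresholds are invented, not from the study.
READ_MIN_DURATION_MS = 180   # assumed threshold
READ_MAX_SACCADE_PX = 60     # assumed threshold

def classify_fixations(fixations):
    """fixations: list of (duration_ms, saccade_len_px) tuples."""
    states = []
    for duration, saccade in fixations:
        if duration >= READ_MIN_DURATION_MS and saccade <= READ_MAX_SACCADE_PX:
            states.append("reading")
        else:
            states.append("scanning")
    return states

print(classify_fixations([(220, 30), (90, 250), (200, 45)]))
# ['reading', 'scanning', 'reading']
```

Sequences of such state labels could then be correlated with task characteristics or self-reported knowledge, as the studies describe.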
DynaLearn: Problem-based learning supported by semantic techniques (Oscar Corcho)
This document describes a system that supports problem-based learning through semantic techniques. The system grounds learner models in semantic repositories to enable semantic-based feedback. It analyzes learner models and reference models to identify discrepancies in terminology, taxonomy, and qualitative reasoning structures. Suggestions are generated and filtered based on agreement across multiple reference models. The system aims to bridge gaps between learner and expert terminology and provide automated feedback to support the learning process.
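The agreement-based filtering step can be illustrated with a small sketch. This is not DynaLearn's actual code; the terms and the majority threshold are invented for the example.

```python
# Illustrative sketch: generate terminology suggestions from each reference
# model (terms the learner's model is missing), then keep only suggestions
# on which enough reference models agree.
from collections import Counter

def filter_suggestions(learner_terms, reference_models, min_agreement=2):
    suggestions = Counter()
    for ref_terms in reference_models:
        for term in ref_terms - learner_terms:  # terms missing from the learner model
            suggestions[term] += 1
    return {t for t, votes in suggestions.items() if votes >= min_agreement}

refs = [{"evaporation", "condensation"},
        {"evaporation", "precipitation"},
        {"evaporation", "condensation"}]
print(filter_suggestions({"precipitation"}, refs))
# {'evaporation', 'condensation'}
```

Filtering by cross-model agreement keeps feedback focused on discrepancies that several experts would flag, rather than idiosyncrasies of a single reference model.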
Characterising the Emergent Semantics in Twitter Lists (Oscar Corcho)
This document summarizes research analyzing the emergent semantics of lists and list names on Twitter. The researchers investigated whether related keywords can be identified from list names according to how they are used by different user roles (curators, subscribers, members). They used a dataset of over 297,000 lists to extract keywords from list names and model their relationships based on these user roles. Their experiments analyzed the semantics of related keyword pairs using techniques like WordNet searches and found that relationships identified based on members had the highest percentage of direct semantic relations like synonyms.
This presentation focuses on three main components that are relevant to implementing and achieving language competencies: the acquisition of word meaning, the formation of concepts, and the understanding of the socio-cultural meaning of language.
This document discusses concepts of equivalence and similarity in translation. It begins by defining equivalence and similarity, noting that similarity is not necessarily symmetrical, reversible, or transitive. It then examines approaches to equivalence in translation theory, including the equative view, taxonomic view, and relativist view which rejects equivalence as an identity assumption. Models of equivalence proposed by Vinay and Darbelnet, Jakobson, and Nida are outlined, noting tensions between formal correspondence and dynamic equivalence. The document emphasizes that equivalence is a complex concept that depends on context and perspective.
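The claim that similarity need not be symmetrical can be made concrete with a containment-style measure, a minimal sketch (the word sets are invented): sim(A, B) = |A ∩ B| / |A| scores how much of A is covered by B, so sim(A, B) and sim(B, A) generally differ.

```python
# Asymmetric similarity: how much of set `a` is covered by set `b`.
def containment(a, b):
    return len(a & b) / len(a)

source = {"house", "home", "dwelling"}
target = {"home"}
print(containment(target, source))  # 1.0: all of target appears in source
print(containment(source, target))  # ~0.33: little of source appears in target
```

A translation unit can thus be fully "covered" by its source while covering only a fraction of it, which is one way the equative view of equivalence breaks down.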
Natural language processing (NLP) involves analyzing and understanding human language to allow interaction between computers and humans. The document outlines key steps in NLP including morphological analysis, syntactic analysis, semantic analysis, and pragmatic analysis to convert text into structured representations. It also discusses statistical NLP and real-world applications such as machine translation, question answering, and speech recognition.
Natural language processing (NLP) is introduced, including its definition, common steps like morphological analysis and syntactic analysis, and applications like information extraction and machine translation. Statistical NLP aims to perform statistical inference for NLP tasks. Real-world applications of NLP are discussed, such as automatic summarization, information retrieval, question answering and speech recognition. A demo of a free NLP application is presented at the end.
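The staged pipeline outlined above (morphological, syntactic, semantic analysis) can be sketched as a chain of stubs. Each stage here is a toy stand-in for a real component: the crude plural stripping, flat parse, and bag-of-lemmas "meaning" are all invented simplifications.

```python
# Toy sketch of a staged NLP pipeline; each function is a placeholder for a
# real morphological / syntactic / semantic analyzer.
import re

def morphological(text):
    # tokenize and crudely strip a plural "s" as a stand-in for stemming
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t[:-1] if t.endswith("s") and len(t) > 3 else t for t in tokens]

def syntactic(tokens):
    # placeholder parse: bracket the token sequence as one flat phrase
    return ("S", tokens)

def semantic(tree):
    # placeholder semantics: a bag-of-lemmas "meaning representation"
    return set(tree[1])

def pipeline(text):
    return semantic(syntactic(morphological(text)))

print(pipeline("Computers understand human languages"))
```

Real systems replace each stub with a trained component, but the data flow (text to tokens to structure to meaning representation) is the same.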
This document proposes a systemic interpretation language to bridge the systemic and semantic spheres. It involves using patterns and pattern languages at different levels of abstraction across domains. The language uses a grammar of elementary components including dynamics, statics, and heuristics. Patterns are explored using various contexts and methodologies in an open network, with a focus on observation, relationships, and multiple perspectives. The goal is to facilitate systemic coherence while appreciating multiple solutions and organizing knowledge in a pattern repository.
HC-4016, Heterogeneous Implementation of Neural Network Algorithms, by Dmitri Yudanov (AMD Developer Central)
Presentation HC-4016, Heterogeneous Implementation of Neural Network Algorithms, by Dmitri Yudanov and Leon Reznik at the AMD Developer Summit (APU13) November 11-13, 2013.
This document presents a distributed framework for performing natural language processing (NLP) on large collections of journal articles and integrating the results with existing structured knowledge bases. The framework uses a scaled NLP pipeline to extract structured annotations from unstructured text. It provides massively parallel access to these structured annotations and integrates them with ontologies and databases in a knowledge base. This allows applications to leverage both the unstructured text and existing structured knowledge for tasks like visualization, natural language understanding, and validation of other methods.
Beyond Word2Vec: Embedding Words and Phrases in Same Vector Space (Vijay Prakash Dwivedi)
Word embeddings are commonly used in NLP tasks but embedding phrases while maintaining semantic meaning has been challenging. The authors present a novel method using Siamese neural networks to embed words and multi-word units in the same vector space. The model learns to generate phrase representations based on their semantic similarity to single words. It is trained on a dataset to predict similarity between words and phrases and outperforms previous models on phrase similarity and composition tasks.
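The Siamese model itself requires a neural framework; sketched below instead is the simple averaging baseline that such compositional models are typically compared against. The phrase vector is the mean of its word vectors, and similarity is cosine; the embeddings here are invented toy numbers, not trained vectors.

```python
# Averaging baseline for phrase representations: phrase vector = mean of
# word vectors, compared by cosine similarity. Toy 2-d embeddings.
import math

EMB = {"big": [1.0, 0.0], "large": [0.9, 0.1], "city": [0.0, 1.0]}

def phrase_vector(words):
    dim = len(next(iter(EMB.values())))
    return [sum(EMB[w][i] for w in words) / len(words) for i in range(dim)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

print(cosine(phrase_vector(["big", "city"]), phrase_vector(["large", "city"])))
```

A Siamese architecture replaces the fixed averaging with a learned composition function, trained so that phrase representations land near semantically similar single words.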
The document discusses opinion mining and sentiment analysis. It describes how opinion mining uses natural language processing techniques on user input from internet sources to understand opinions. Sentiment analysis is used to extract emotions, subjects, and the impact of opinions. The key modules of an opinion mining and sentiment analysis system include opinion retrieval, sentiment classification, and summary generation. Sentiment classification applies a semi-supervised naive Bayes classifier using linguistic features to determine the polarity of opinions. While current systems can effectively analyze sentiments, challenges remain in handling ambiguity and analyzing opinions in different languages.
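The sentiment classification step can be illustrated with a compact naive Bayes classifier. Note this sketch is fully supervised and omits the semi-supervised extension and linguistic features the summary mentions; the training examples are invented.

```python
# Compact naive Bayes polarity classifier with Laplace (add-one) smoothing.
import math
from collections import Counter, defaultdict

def train(docs):  # docs: list of (tokens, label)
    prior, counts, totals = Counter(), defaultdict(Counter), Counter()
    vocab = set()
    for tokens, label in docs:
        prior[label] += 1
        counts[label].update(tokens)
        totals[label] += len(tokens)
        vocab.update(tokens)
    return prior, counts, totals, vocab

def classify(model, tokens):
    prior, counts, totals, vocab = model
    n = sum(prior.values())
    def logp(label):
        lp = math.log(prior[label] / n)
        for t in tokens:
            lp += math.log((counts[label][t] + 1) / (totals[label] + len(vocab)))
        return lp
    return max(prior, key=logp)

model = train([(["good", "great"], "pos"), (["bad", "awful"], "neg")])
print(classify(model, ["good"]))  # pos
```

A semi-supervised variant would additionally self-train on unlabeled opinions, folding confidently classified documents back into the training counts.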
The document discusses the development of OpenWN-PT, a Brazilian Portuguese Wordnet. Key points:
- OpenWN-PT is being created as part of a joint project between CPDOC and EMAp to apply formal logical tools to Portuguese text.
- It is based on the Universal Wordnet (UWN) which projects WordNet concepts into over 200 languages using statistical methods. The UWN provides an initial automated version of a Portuguese Wordnet.
- The creators are working to improve the initial UWN-based Portuguese Wordnet by combining it with data from Princeton WordNet, UWN, MENTA, and EuroWordNet to generate a new OpenWN-PT file.
The document describes the MONK project which provides over 1400 works of English literature from the 16th-19th century tagged and stored in a database. It discusses using the data for various types of text analysis including predictive modeling, sentiment analysis, and information extraction. Specific techniques are described like named entity recognition, co-reference resolution, and semantic role analysis. Visualization of results is also mentioned.
Individual Brain Charting, a high-resolution fMRI dataset for cognitive mapping (Ana Luísa Pinho)
Linking brain systems and mental functions requires accurate descriptions of behavioral tasks and fine demarcations of brain regions. Functional Magnetic Resonance Imaging (fMRI) has contributed to the investigation of brain regions involved in a variety of cognitive processes. However, to date, no data collection has systematically addressed the functional mapping of cognitive mechanisms at a fine spatial scale. The Individual Brain Charting (IBC) project is a high-resolution multi-task fMRI dataset intended to provide the objective basis toward a comprehensive functional atlas of the human brain. The data refer to a permanent cohort performing many different tasks. The large amount of task-fMRI data on the same subjects yields a precise mapping of the underlying functions, free from both inter-subject and inter-site variability. The first release of the IBC dataset consists of data acquired from thirteen participants while performing a dozen tasks. Raw data from this release are publicly available in the OpenNeuro repository, and derived statistical maps can be found in NeuroVault. These maps reveal a successful cognitive encoding of many psychological domains in large areas of the human brain. Indeed, main findings of the original studies were replicated at higher resolution. Our results thus provide a comprehensive revision of the neural correlates underlying behavior, highlighting nonetheless the spatial variability of functional signatures between participants. In addition, this dataset supports investigations using alternative approaches to group-level analysis of task-specific studies. For instance, such a rich task-wise dataset can be applied to mega-analytic encoding models towards the development of a brain-atlasing framework, by systematically mapping functional signatures associated with the cognitive components of the tasks.
Development, distribution and use of open source software comprise a market of data (source code, bug reports, documentation, number of downloads, etc.) from projects, developers and users. This large amount of data makes it difficult for the people involved to make sense of implicit links between software projects, e.g., dependencies, patterns, licenses. This context raises the question of what techniques and mechanisms can be used to help users and developers to link related pieces of information across software projects. In this paper, we propose a framework for a marketplace enhanced using linked open data (LOD) technology for linking software artifacts within projects as well as across software projects. The marketplace provides the infrastructure for collecting and aggregating software engineering data as well as developing services for mining, statistics, analytics and visualization of software data. Based on cross-linking software artifacts and projects, the marketplace enables developers and users to understand the individual value of components and their relationship to bigger software systems. Improved understanding creates new business opportunities for software companies: users will be better able to analyze and compare projects, developers can increase the visibility of their products, and hosts may offer plug-ins and services over the data to paying customers.
This paper reports our first attempt at integrating eSPERTo's paraphrastic engine, which is based on the NooJ platform, with two application scenarios: a conversational agent and a summarization system. We briefly describe eSPERTo's base resources and the modifications to these resources that enabled the production of the paraphrases required to feed both systems. Although the improvement observed in both scenarios is not significant, we present a detailed error analysis to guide further improvement in future experiments.
The document discusses formal language theory and its applications in natural language processing (NLP). It covers two main goals in computational linguistics - theoretical interest in formally characterizing natural language and practical interest in using well-understood frameworks like finite state models to solve NLP problems. Finite state devices are widely used in NLP tasks due to their efficiency and ability to model linguistic phenomena like words through dictionaries and rules. While finite state models provide a useful approximation of language, natural languages pose challenges like ambiguity, long distance dependencies and non-regular features that require extensions to basic finite state models.
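The finite-state machinery described above can be made concrete with a tiny deterministic automaton. This is a hedged illustration (the tag set and pattern are invented): a DFA over part-of-speech tags accepting the regular pattern DET ADJ* NOUN, the kind of local phrase structure finite-state NLP tools capture well.

```python
# Tiny DFA accepting DET ADJ* NOUN over part-of-speech tags.
TRANSITIONS = {
    ("start", "DET"): "det",
    ("det", "ADJ"): "det",     # loop on any number of adjectives
    ("det", "NOUN"): "accept",
}

def accepts(tags):
    state = "start"
    for tag in tags:
        state = TRANSITIONS.get((state, tag))
        if state is None:
            return False
    return state == "accept"

print(accepts(["DET", "ADJ", "ADJ", "NOUN"]))  # True
print(accepts(["DET", "NOUN", "NOUN"]))        # False
```

Phenomena like center-embedding or long-distance agreement fall outside what such a machine can express exactly, which is why the extensions mentioned in the summary are needed.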
Tweeting beyond Facts – The Need for a Linguistic Perspective (Data Science Society)
The document discusses applying linguistic principles to natural language processing tasks. It argues that a trigger-scope approach to analyzing negation, modality, and speculative language has proven effective. The approach uses general linguistic modules as preprocessing before applying domain-specific models. Underappreciated linguistic elements like numbers, amounts, locations, and modifiers provide useful information for tasks. A suite of language-oriented preprocessing modules could improve downstream specialized processing by adapting general linguistic treatments to specific domains.
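The trigger-scope idea for negation can be sketched minimally. This is a simplified illustration in the spirit of NegEx-style rules, not the document's actual module: the trigger list and the punctuation-bounded scope are invented simplifications.

```python
# Minimal trigger-scope negation marking: a trigger opens a scope that runs
# until the next scope breaker; tokens inside the scope are marked negated.
TRIGGERS = {"not", "no", "never"}
SCOPE_BREAKERS = {".", ",", ";", "but"}

def mark_negation(tokens):
    negated, in_scope = [], False
    for tok in tokens:
        if tok in SCOPE_BREAKERS:
            in_scope = False
        elif tok in TRIGGERS:
            in_scope = True
            continue            # the trigger itself is not marked
        if in_scope:
            negated.append(tok)
    return negated

print(mark_negation("the drug is not effective , dosage unchanged".split()))
# ['effective']
```

A general-purpose module like this can run as preprocessing, with domain-specific models consuming the scope annotations downstream, as the summary suggests.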
Towards comprehensive syntactic and semantic annotations of the clinical narrative (Jinho Choi)
Objective To create annotated clinical narratives with layers of syntactic and semantic labels to facilitate advances in clinical natural language processing (NLP). To develop NLP algorithms and open source components. Methods Manual annotation of a clinical narrative corpus of 127 606 tokens following the Treebank schema for syntactic information, PropBank schema for predicate-argument structures, and the Unified Medical Language System (UMLS) schema for semantic information. NLP components were developed. Results The final corpus consists of 13 091 sentences containing 1772 distinct predicate lemmas. Of the 766 newly created PropBank frames, 74 are verbs. There are 28 539 named entity (NE) annotations spread over 15 UMLS semantic groups, one UMLS semantic type, and the Person semantic category. The most frequent annotations belong to the UMLS semantic groups of Procedures (15.71%), Disorders (14.74%), Concepts and Ideas (15.10%), Anatomy (12.80%), Chemicals and Drugs (7.49%), and the UMLS semantic type of Sign or Symptom (12.46%). Inter-annotator agreement results: Treebank (0.926), PropBank (0.891–0.931), NE (0.697–0.750). The part-of-speech tagger, constituency parser, dependency parser, and semantic role labeler are built from the corpus and released open source. A significant limitation uncovered by this project is the need for the NLP community to develop a widely agreed-upon schema for the annotation of clinical concepts and their relations. Conclusions This project takes a foundational step towards bringing the field of clinical NLP up to par with NLP in the general domain. The corpus creation and NLP components provide a resource for research and application development that would have been previously impossible.
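Agreement figures like those above are typically chance-corrected; one common such measure is Cohen's kappa, sketched here for two annotators labelling the same items (the toy labels are invented, and the abstract does not specify which agreement statistic it used).

```python
# Cohen's kappa for two annotators over the same items:
# kappa = (observed agreement - expected agreement) / (1 - expected agreement)
from collections import Counter

def cohens_kappa(a, b):
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[l] * cb[l] for l in ca) / (n * n)
    return (observed - expected) / (1 - expected)

ann1 = ["Disorder", "Drug", "Disorder", "Procedure"]
ann2 = ["Disorder", "Drug", "Drug", "Procedure"]
print(round(cohens_kappa(ann1, ann2), 3))
```

Chance correction matters here because with few semantic groups, two annotators can agree often by accident alone.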
The document summarizes research on modeling multiple sequence processing using an unsupervised neural network approach based on the Hypermap Model. Key points:
- The researcher extends previous models to handle complex sequences with repeating subsequences and multiple sequences occurring together without interference.
- Modifications include incorporating short-term memory to dynamically encode time-varying sequence context and inhibitory links to enable competitive queuing during recall.
- Experimental evaluation shows the network can correctly recall sequences using partial context and when sequences overlap.
- Future work aims to model the transition from single-word to two-word child speech and incorporate temporal processing of multimodal inputs like gestures.
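The context-dependent recall described in the points above can be illustrated with a toy sketch (this is not the Hypermap model itself): each next item is keyed by the running context, here the full prefix so far, so sequences that share a subsequence do not interfere during recall.

```python
# Toy context-keyed sequence memory: the prefix plays the role of the
# short-term-memory context, disambiguating shared subsequences.
def learn(sequences):
    memory = {}
    for seq in sequences:
        for i in range(1, len(seq)):
            memory[tuple(seq[:i])] = seq[i]
    return memory

def recall(memory, start, length):
    out = list(start)
    while len(out) < length:
        out.append(memory[tuple(out)])
    return out

mem = learn([["a", "b", "c", "d"], ["x", "b", "c", "y"]])
print(recall(mem, ["a"], 4))  # ['a', 'b', 'c', 'd']
print(recall(mem, ["x"], 4))  # ['x', 'b', 'c', 'y']
```

Both learned sequences contain the subsequence "b c", yet each is recalled correctly from its own starting cue, which is the behavior the summarized model achieves with a decaying context and competitive queuing rather than an explicit prefix table.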
Recurrent Neural Network (Mohammad Sabouri; ACRRL, Applied Control & Robotics Research Laboratory, Department of Power and Control Engineering, Shiraz University, Fars, Iran; https://sites.google.com/view/acrrl/)
Monotonic Multihead Attention (review by June-Woo Kim). Ma, Xutai, et al. "Monotonic Multihead Attention." International Conference on Learning Representations, 2020.
RNA sequencing analysis tutorial with NGS (HAMNAHAMNA8)
This document provides an overview of RNA-seq data analysis. It discusses quality control of sequencing data using tools like FastQC, mapping reads to a reference genome or transcriptome using aligners like BWA and TopHat, and summarizing reads using counting tools to obtain read counts for each gene. These counts can then be used to estimate gene expression levels and perform differential expression analysis to identify genes with different expression between samples or conditions.
Colloquium talk on modal sense classification using a convolutional neural network (Ana Marasović)
Modal sense classification (MSC) is a special case of sense disambiguation relevant for distinguishing facts from hypotheses and speculations, or apprehended, planned and desired states of affairs. Prior approaches showed that even with carefully designed semantic feature sets, the models have difficulties beating the majority sense baseline in cases of difficult sense distinctions and when applying the models to heterogeneous text genres. Another drawback of former approaches is that feature implementation heavily depends on external language-specific resources such as dependency or constituency parse trees and lexical databases such as WordNet or CELEX. To alleviate manual crafting of the features and to obtain a model which is easily portable to novel languages, we propose to cast MSC as a sentence classification task with a fixed sense inventory in a convolutional neural network (CNN) architecture. Our performance study shows that CNN is an appropriate model for MSC, and its special properties motivate us to investigate it as a formal framework for general word sense disambiguation tasks.
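The CNN forward pass used for sentence classification can be sketched without a neural framework. This is a toy illustration of the architecture, not the talk's model: the embeddings and filter weights below are invented numbers, and training is omitted.

```python
# Framework-free sketch of a CNN sentence-classification feature: slide one
# filter of width 2 over the word-vector sequence, apply ReLU, then take the
# max over all positions (max-over-time pooling).
def conv_feature(vectors, filt, width=2):
    scores = []
    for i in range(len(vectors) - width + 1):
        window = [x for vec in vectors[i:i + width] for x in vec]
        s = sum(w * x for w, x in zip(filt, window))
        scores.append(max(0.0, s))          # ReLU
    return max(scores)                      # max-over-time pooling

sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # three 2-d word vectors
filt = [0.5, -0.5, 0.5, 0.5]                      # one filter of width 2
print(conv_feature(sentence, filt))
```

A full model stacks many such filters, concatenates their pooled features, and feeds them to a softmax layer over the fixed sense inventory.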
The document summarizes research on modeling multiple sequence processing using an unsupervised neural network approach based on the Hypermap Model. Key points:
- The researcher extends previous models to handle complex sequences with repeating subsequences and multiple sequences occurring together without interference.
- Modifications include incorporating short-term memory to dynamically encode time-varying sequence context and inhibitory links to enable competitive queuing during recall.
- Experimental evaluation shows the network can correctly recall sequences using partial context and when sequences overlap.
- Future work aims to model the transition from single-word to two-word child speech and incorporate temporal processing of multimodal inputs like gestures.
Recurrent Neural Network
ACRRL
Applied Control & Robotics Research Laboratory of Shiraz University
Department of Power and Control Engineering, Shiraz University, Fars, Iran.
Mohammad Sabouri
https://sites.google.com/view/acrrl/
Monotonic Multihead Attention, Ma, Xutai, et al. "Monotonic Multihead Attention." International Conference on Learning Representations. 2020. review by June-Woo Kim
RNA sequencing analysis tutorial with NGSHAMNAHAMNA8
This document provides an overview of RNA-seq data analysis. It discusses quality control of sequencing data using tools like FastQC, mapping reads to a reference genome or transcriptome using aligners like BWA and TopHat, and summarizing reads using counting tools to obtain read counts for each gene. These counts can then be used to estimate gene expression levels and perform differential expression analysis to identify genes with different expression between samples or conditions.
Colloquium talk on modal sense classification using a convolutional neural ne...Ana Marasović
Modal sense classification (MSC) is a special case of sense disambiguation relevant for distinguishing facts from hypotheses and speculations, or apprehended, planned and desired states of affairs. Prior approaches showed that even with carefully designed semantic feature sets, the models have difficulties beating the majority sense baseline in cases of difficult sense distinctions and when applying the models to heterogeneous text genres. Another drawback of former approaches is that feature implementation heavily depends on a external language-specific resources such as dependency or constituency parse trees and lexical databases such as WordNet or CELEX. To alleviate manual crafting of the features and to obtain a model which is easily portable to novel languages, we propose to cast MSC as a sentence classification task with a fixed sense inventory in a convolutional neural network (CNN) architecture. Our performance study shows that CNN is an appropriate model for MSC and its special properties motivate us to investigate it as a formal framework for general word sense disambiguation tasks.
Semantic Analysis and Concept-based Translation for Multilingual Information Systems
1. Semantic Analysis and Concept-based Translation for Multilingual Information Systems
Johannes Leveling, Sven Hartrumpf, and Rainer Osswald
Intelligent Information and Communication Systems (IICS)
University of Hagen (FernUniversität in Hagen)
58084 Hagen, Germany
firstname.lastname@fernuni-hagen.de
GAL 2007, Hildesheim, Germany
2. Outline
1 Concept-based Representation: MultiNet
2 Three Phases for a Concept-Based Multilingual IR System
3 Concept-Based Information Systems
4 Applications
5 Conclusion and Outlook
J. Leveling, S. Hartrumpf, R. Osswald Semantic Analysis and Concept-based Translation 2 / 27
3. Motivation for Concept-Based Translation
• Example 1: Query expansion in information retrieval (IR) with elements from the same synset
  → needs word sense disambiguation (differentiation of concepts), otherwise loss of precision
• Example 2: Question answering (QA): questions on relations between concepts (situations, events, etc.)
  Example: Who killed Lee Harvey Oswald?
  → needs a semantic representation; bag-of-words information retrieval is not enough
4. The MultiNet Paradigm
• Meaning and knowledge representation: Multilayered Extended Semantic Networks (Helbig, 2001, 2006)
• Semantic network of nodes (concepts) and edges (semantic relations from a fixed set)
• In addition: semantic sorts, semantic features, layer information
• Different types of concepts: lexicalized vs. non-lexicalized
• Language independence: annotation of English/Czech sentences from the Wall Street Journal with MultiNet (Charles University, Prague)
5. Selected Semantic Relations

Relation   Description
ASSOC      association
ATTCH      attachment of object to object
CHPA       change of sorts (property → abstract object)
EXP        experiencer
MCONT      an informational process or object
OBJ        neutral object
PRED       predicative concept specifying a plurality
PROP       property relationship
PARS       meronymy
SCAR       carrier of a state
SSPE       state specifier
SUB        conceptual subordination for objects
SUBS       conceptual subordination for situations
SYNO       synonymy
TEMP       temporal restriction for a situation
ALTN1      introduction of alternatives
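The relation inventory above can be read as the edge types of a labeled graph. A minimal sketch in Python (not the actual MultiNet implementation; the node and concept names follow the SN example on a later slide):

```python
from collections import defaultdict

# Relation inventory taken from the table above (a subset of MultiNet's fixed set).
RELATIONS = {"ASSOC", "ATTCH", "CHPA", "EXP", "MCONT", "OBJ", "PRED", "PROP",
             "PARS", "SCAR", "SSPE", "SUB", "SUBS", "SYNO", "TEMP", "ALTN1"}

class SemanticNetwork:
    """Minimal semantic network: concept nodes connected by typed edges."""

    def __init__(self):
        self.out = defaultdict(set)  # node -> {(relation, target), ...}

    def add(self, source, relation, target):
        if relation not in RELATIONS:
            raise ValueError(f"unknown relation: {relation}")
        self.out[source].add((relation, target))

    def targets(self, source, relation):
        return {t for (r, t) in self.out[source] if r == relation}

# A fragment of the SN example: c1 is a 'report' situation whose
# informational content c5 is subordinate to the concept 'problem'.
net = SemanticNetwork()
net.add("c1", "SUBS", "berichten.2.2")
net.add("c1", "MCONT", "c5")
net.add("c5", "SUB", "problem.1.1")
print(net.targets("c1", "MCONT"))  # {'c5'}
```

The fixed relation set is what distinguishes this from an arbitrary graph: every edge must carry one of the predefined semantic relations.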
6. The Computational Lexicon – HaGenLex
• Semantically oriented (German) lexical resource (Hartrumpf et al., 2003)
• Consists of multiple lexicons:
  • full syntactico-semantic information (26,000 entries)
  • flat lexicon (50,000 entries)
  • compound lexicon (30,000 entries; structure and semantics)
  • name lexicons (250,000 entries)
• Support for the lexicographer: LIAplus workbench
7. Sample Concepts (German)
• essen.1.1: (Der Student) (ißt) (eine Schokolade). ['The student eats a chocolate.']
• essen.1.2: (Der Student) (ißt) sich (satt). ['The student eats his fill.']
• essen.2.1: Das Kind hat kein Essen bekommen. ['The child did not get any food.']
• essen.2.2: Das Essen am Abend dauerte 2 Stunden. ['The evening meal lasted 2 hours.']
• fressen.1.1: (Der Hund) (frißt) (einen Knochen). ['The dog eats a bone.']
• fressen.1.2: (Die Großmutter) (frißt) (einen Narren) (an den Blumen). [idiom: 'The grandmother takes a great fancy to the flowers.']
8. Lexicon Entry (German): essen.1.1

n-sign
  morph   [ base "essen", infl-para i129g ]
  syn     v-syn [ v-type main, perf-aux haben, v-control nocontr ]
  sem     [ entity nonment-action, c-id "essen.1.1" ]
  select  semsel [ rel agt,
                   syn np-syn [ cat np, agr case nom ],
                   sel sem [ entity human-object ] ]
          semsel [ rel aff,
                   syn np-syn [ cat np, agr case acc ],
                   sel sem [ entity [ sort co ] ] ]
9. Lexicon Entry (German): fressen.1.1

n-sign
  morph   [ base "fressen", infl-para i139g ]
  syn     v-syn [ v-type main, perf-aux haben, v-control nocontr ]
  sem     [ entity nonment-action, c-id "fressen.1.1" ]
  select  semsel [ rel agt,
                   syn np-syn [ cat np, agr case nom ],
                   sel sem [ entity animal-object ∨ human-object ] ]
          semsel [ rel aff,
                   syn np-syn [ cat np, agr case acc ],
                   sel sem [ entity [ sort co ] ] ]
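The two entries above differ only in their morphology, concept id, and the selectional restriction on the agent. A hedged sketch in Python, with plain dicts standing in for HaGenLex's typed feature structures (the helper `licenses_agent` is illustrative, not part of HaGenLex):

```python
# Nested dicts approximating the feature structures on the two slides above.
essen_1_1 = {
    "morph": {"base": "essen", "infl-para": "i129g"},
    "syn": {"v-type": "main", "perf-aux": "haben", "v-control": "nocontr"},
    "sem": {"entity": "nonment-action", "c-id": "essen.1.1"},
    "select": [
        {"rel": "agt", "syn": {"cat": "np", "case": "nom"},
         "sem": {"entity": {"human-object"}}},
        {"rel": "aff", "syn": {"cat": "np", "case": "acc"},
         "sem": {"entity": {"co"}}},
    ],
}
fressen_1_1 = {
    "morph": {"base": "fressen", "infl-para": "i139g"},
    "syn": {"v-type": "main", "perf-aux": "haben", "v-control": "nocontr"},
    "sem": {"entity": "nonment-action", "c-id": "fressen.1.1"},
    "select": [
        {"rel": "agt", "syn": {"cat": "np", "case": "nom"},
         # set membership stands in for the disjunction animal-object ∨ human-object
         "sem": {"entity": {"animal-object", "human-object"}}},
        {"rel": "aff", "syn": {"cat": "np", "case": "acc"},
         "sem": {"entity": {"co"}}},
    ],
}

def licenses_agent(entry, agent_sort):
    """Does the verb entry's agt slot admit an NP of the given semantic sort?"""
    agt = next(s for s in entry["select"] if s["rel"] == "agt")
    return agent_sort in agt["sem"]["entity"]

print(licenses_agent(essen_1_1, "animal-object"))    # False
print(licenses_agent(fressen_1_1, "animal-object"))  # True
```

This makes the essen/fressen contrast computable: the entries share one semantic type but impose different sortal restrictions on their agents.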
10. Semantic Analysis – The WOCADI Parser
• Produces a semantic network representation from (German) texts (Hartrumpf, 2003):
  • resolves coreferences,
  • analyzes idioms,
  • decompounds nouns and adjectives,
  • identifies metonymy,
  • resolves deictic expressions, etc.
• Applied to large corpora, including the CLEF-NEWS newspaper corpus (275,000 articles) and the German Wikipedia (500,000 articles)
11. SN Example (German)
[Figure: semantic network for the query below; concept nodes du.1.1, streß.1.1, psychisch.1.1, dokument.1.1, problem.1.1, prüfling.1.1, kandidat.1.1, prüfungskandidat.1.1, prüfung.1.1, finden.1.1, berichten.2.2, linked by relations such as SUB, SUBS, PROP, PRED, *ALTN1, EXP, OBJ, MCONT, ATTCH, SCAR, SSPE, ASSOC]
Finde Dokumente, die über psychische Probleme oder Stress von Prüfungskandidaten oder Prüflingen berichten. (GIRT topic 116)
12. SN Example (English)
[Figure: the same semantic network with English concept labels (you, stress, mental, document, problem, examinee, candidate, exam, find, report) and identical relations]
'Find documents reporting on mental problems or stress of examination candidates or examinees.' (GIRT topic 116)
13. Phase 1: Using Statistical MT and Web Services
• Employ (statistical) machine translation (MT) web services for IR experiments (translation of queries/questions): Systran, Promt, ...
• Problems:
  • translating questions: most systems are trained on declarative sentences; imperative forms are often misunderstood (Find documents ... → Fund Dokument ...)
  • named entity recognition: not reliable (Neuengland → new narrow country)
• Performance loss from off-the-shelf translation tools for QA@CLEF: 50%; further examples: Ligozat et al. (2006)
14. Phase 2: Aligning Concept-based Tools and Resources
• Morphology and syntax differ across languages
• Semantics is the same (in general)
• Our approach:
  • create lexicons for different languages; fast construction parallel to existing lexicon(s), e.g. HaGenLex → HaEnLex
  • develop parsers for different languages
  • apply methods from IR/QA on the SN representation
• General idea: replace concepts (labels) in the semantic network representation (as a form of translation)
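The "replace concepts" idea can be sketched as a relabeling over the network's triples, assuming a bilingual concept map is available (the entries below follow the German/English SN example pair; node identifiers like c1 and the relation labels are left untouched):

```python
# Illustrative concept map drawn from the two SN example slides.
CONCEPT_MAP = {
    "dokument.1.1": "document.1.1",
    "psychisch.1.1": "mental.1.1",
    "problem.1.1": "problem.1.1",
    "berichten.2.2": "report.2.2",
    "finden.1.1": "find.1.1",
}

def translate_network(edges, concept_map):
    """Relabel lexicalized concepts; unmapped labels (e.g. c-nodes) stay as-is."""
    relabel = lambda n: concept_map.get(n, n)
    return {(relabel(s), r, relabel(t)) for (s, r, t) in edges}

german = {("c1", "SUBS", "berichten.2.2"),
          ("c1", "MCONT", "c5"),
          ("c5", "SUB", "problem.1.1"),
          ("c5", "PROP", "psychisch.1.1")}
english = translate_network(german, CONCEPT_MAP)
print(("c1", "SUBS", "report.2.2") in english)  # True
```

Because only the concept labels change, the relational structure, and hence the meaning representation, is preserved across languages.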
15. Status of Alignment of Lexical Resources
• German-to-English dictionaries: about 100,000 word/phrase translations
• Mapping between HaGenLex concepts and GermaNet concepts, plus a GermaNet-to-EuroWordNet mapping: about 14,000 concept translations
• Wikipedia articles (in German and English): about 3,000 proper noun translations for cities, countries, persons, organizations, etc.
• HaEnLex (parallel English version of HaGenLex) with full morphological, syntactic, and semantic description of concepts: about 7,000 English entries
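These four resources can be viewed as concept-translation dictionaries of decreasing depth of description. A sketch of a fallback lookup chain; the resource contents below are illustrative placeholders (only essen.1.1 → eat.1.1 is attested on a later slide):

```python
# Placeholder samples standing in for the four aligned resources.
HAENLEX    = {"essen.1.1": "eat.1.1"}          # curated parallel lexicon
WORDNET    = {"hund.1.1": "dog.1.1"}           # HaGenLex->GermaNet->EuroWordNet
WIKIPEDIA  = {"münchen.1.1": "munich.1.1"}     # aligned article titles
DICTIONARY = {"schnell.1.1": "fast.1.1"}       # plain bilingual dictionary

def translate_concept(concept):
    """Try resources from richest to shallowest; None if untranslatable."""
    for resource in (HAENLEX, WORDNET, WIKIPEDIA, DICTIONARY):
        if concept in resource:
            return resource[concept]
    return None

print(translate_concept("essen.1.1"))  # eat.1.1
print(translate_concept("hund.1.1"))   # dog.1.1
```

Ordering the resources lets a fully described HaEnLex entry win over a bare dictionary translation whenever both exist.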
16. Linguistic Phenomena (1/6)
Compounds (rare in English):
• with regular semantics: Kinderernährung → nutrition of children
• with irregular semantics: Frauenzimmer → dame (?); ladies' room (?)
• borderline cases: Bankwesen → banking (system) (?)
→ a compound-less semantic representation is possible
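The "compound-less" representation can be sketched as expanding a regular compound into its head concept plus a relation to the modifier concept, so that Kinderernährung and "nutrition of children" share one structure. The relation used below (ASSOC) is purely illustrative; the actual analysis would pick a more specific MultiNet relation:

```python
def decompose(node, head, modifier, relation="ASSOC"):
    """Edges meaning: node is an instance of <head>, related to <modifier>."""
    return {(node, "SUB", head), (node, relation, modifier)}

# Same structure for the German compound and its English paraphrase:
german  = decompose("c1", "ernährung.1.1", "kind.1.1")
english = decompose("c1", "nutrition.1.1", "child.1.1")
print(sorted(german))
```

Irregular compounds (Frauenzimmer) would instead need their own lexicalized concept, since no such decomposition yields the right meaning.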
17. Linguistic Phenomena (2/6)
Idioms:
• with a corresponding idiom:
  in den Sinn kommen (DE) → to start thinking about sth.
  to come into mind (EN) → to start thinking about sth.
• without an equivalent idiom:
  to be someone's cup of tea (EN) → to like
→ semantic representation of idioms
18. Linguistic Phenomena (3/6)
Metonymy:
• with a corresponding metonymy pattern (for regular metonymy):
  The White House agreed that ... (EN) → place-for-government
  Das Weiße Haus stimmte zu, dass ... (DE) → place-for-government
• without a corresponding pattern: ?
→ no problems yet
19. Linguistic Phenomena (4/6)

Proper nouns:
• transcriptions and transliterations, historic name variants
• Böll → Boell;
  Gorbatschow → Gorbatchev, Gorbatchov
→ can be solved using aligned online resources, e.g. Wikipedia
→ treat name variants as elements of the same synset
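Treating name variants as one synset can be sketched as below; the variant lists are illustrative stand-ins for what would be harvested from aligned resources such as Wikipedia language links.

```python
# Sketch: name variants as elements of the same synset.
# Variant lists are illustrative; a real system would harvest them
# from aligned online resources (e.g. Wikipedia).
NAME_SYNSETS = [
    {"Böll", "Boell"},
    {"Gorbatschow", "Gorbatchev", "Gorbatchov"},
]

def canonical(name):
    """Map any variant to a frozenset identifying its synset."""
    for synset in NAME_SYNSETS:
        if name in synset:
            return frozenset(synset)
    return frozenset({name})
```

A query for "Gorbatschow" then matches documents using "Gorbatchev", since both variants resolve to the same synset.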
20. Linguistic Phenomena (5/6)

Semantic gaps / lexical gaps:
• Fohlen (DE) → colt (if male)
• Fohlen (DE) → filly (if female)
• Alignment of lexicon entries: morpho-syntactic and syntactic features differ across languages; semantic features generally do not, but net entries/rules/entailments may differ slightly, because they already involve other concepts (which have to be translated)
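The Fohlen example can be sketched as a feature-driven choice at translation time; the feature name `sex` and the concept IDs are assumptions, since HaGenLex encodes such semantic features in its own format.

```python
# Sketch: bridging a lexical gap via semantic features.
# Feature name "sex" and concept IDs are hypothetical.
def translate_fohlen(features):
    """German 'Fohlen' is underspecified for sex; English forces a
    choice between 'colt' and 'filly'."""
    if features.get("sex") == "male":
        return "colt.1.1"
    if features.get("sex") == "female":
        return "filly.1.1"
    return None  # stay with the underspecified interlingua concept
```

When the source text does not fix the feature, the system can keep the underspecified concept rather than committing to a wrong translation.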
21. Linguistic Phenomena (6/6)

Semantic gaps / lexical gaps:
essen.1.1 → eat.1.1 AND fressen.1.1 → eat.1.1

HaGenLex entry for "eat" (feature structure, simplified):

  n-sign
    morph:  [ base: "eat", infl-para: i20 ]
    syn:    [ v-syn, v-type: main ]
    sem:    [ entity: nonment-action, c-id: "eat.1.1" ]
    select: [ rel: agt, syn: [ np-syn, cat: np ],
              sel sem: [ entity: animal-object ∨ human-object ] ]
            [ rel: aff, syn: [ np-syn, cat: np ],
              sel sem: [ entity sort: co ] ]
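The feature structure above can be rendered as nested data with a selectional-restriction check; the attribute names follow the slide, but the encoding as Python dicts and sets is of course only a sketch of the HaGenLex representation.

```python
# Sketch: the "eat" entry as nested dicts (simplified from the
# HaGenLex feature structure shown above).
EAT = {
    "base": "eat", "infl-para": "i20", "v-type": "main",
    "sem": {"entity": "nonment-action", "c-id": "eat.1.1"},
    "select": [
        {"rel": "agt", "cat": "np",
         "sem": {"entity": {"animal-object", "human-object"}}},
        {"rel": "aff", "cat": "np",
         "sem": {"entity": {"co"}}},
    ],
}

def fills_role(entry, rel, entity):
    """Check whether an argument of the given entity sort satisfies
    the selectional restriction for role `rel`."""
    for slot in entry["select"]:
        if slot["rel"] == rel:
            return entity in slot["sem"]["entity"]
    return False
```

Both essen.1.1 and fressen.1.1 map to eat.1.1; what distinguishes them is the selectional restriction on the AGT role (human vs. animal eater), which this kind of check makes explicit.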
22. Phase 3: Towards a Concept-Based Translation

• Assumption: the same inventory of relations (about 140 relations) holds for different languages
• Natural language generation (for German)
• Possible solution: English parser, then generate natural language from the semantic network representation
23. Monolingual Concept-Based IR

• Techniques of standard IR: stemming and stopword removal
• Monolingual concept-based IR:
  • represent queries (and documents) as semantic networks
  • (translate concepts)
  • employ methods on the semantic network representation
• Advantages:
  • semantics of compounds (relation to their constituents)
  • semantics of prepositions is typically represented by a semantic relation or function (no full translation needed)
  • lemmatizing (instead of stemming)
  • query expansion with elements of synsets
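Lemmatizing plus synset-based query expansion can be sketched as follows; the lemma table and synsets are toy stand-ins for the HaGenLex resources.

```python
# Sketch: query expansion with synset members after lemmatization.
# The lemma table and synset list are toy examples.
LEMMAS = {"cars": "car", "automobiles": "automobile"}
SYNSETS = [{"car", "automobile"}]

def expand(term):
    """Lemmatize a query term, then expand it with all members
    of its synset."""
    lemma = LEMMAS.get(term, term)
    for synset in SYNSETS:
        if lemma in synset:
            return sorted(synset)
    return [lemma]
```

Unlike stemming, this keeps real word forms, so a query for "cars" also retrieves documents that only mention "automobile".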
24. Multilingual Concept-Based IR

• Three different approaches to supporting a multilingual search:
  1. translate queries into the document language
  2. translate documents into the query language
  3. translate both queries and documents into an interlingua
• Multilingual concept-based IR: same as the monolingual approach, but translate concepts (in approach 1, 2, or 3)
  → towards an interlingua
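The interlingua variant (approach 3) can be sketched with per-language concept alignment tables; the interlingua IDs and tables below are invented for illustration.

```python
# Sketch: approach 3, mapping concepts of both sides into an
# interlingua. Concept IDs and tables are hypothetical.
DE_TO_INTERLINGUA = {"haus.1.1": "C-HOUSE"}
EN_TO_INTERLINGUA = {"house.1.1": "C-HOUSE"}

def to_interlingua(concepts, table):
    """Map language-specific concept IDs to interlingua IDs;
    unknown concepts pass through unchanged."""
    return [table.get(c, c) for c in concepts]

query = to_interlingua(["haus.1.1"], DE_TO_INTERLINGUA)
doc = to_interlingua(["house.1.1"], EN_TO_INTERLINGUA)
```

Approaches 1 and 2 compose two such tables in a single direction (query→document language or vice versa) instead of meeting in the middle.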
25. Projects and Evaluations

• GeoCLEF (Leveling and Veiel, 2006): Web service for MT (query translation)
• GIRT-4 experiments (Leveling, 2004, 2006a): combined concept and word translation
• NLI-Z39.50 (Leveling, 2006b): replace terminal concepts in the SN, then treat translation alternatives as a synset for query expansion (no decision for a single reading necessary)
• QA@CLEF (Hartrumpf and Leveling, 2007): Web service for MT, then analysis; concept-based translation with a rudimentary English parser (preliminary experiments)
26. Conclusion

• General approach:
  • parse queries
  • translate concepts in the SN representation
  • operate on the SN representation
• Aims at multilingual information systems for different purposes: IR, QA
• 3 phases (currently phase 2)
27. Outlook

• Create a repository of interlingua concepts:
  allow for concept-based machine translation of text
  → natural language generation
  → MT
• Outlook for IR/QA: index semantic relations as well
References

Hartrumpf, Sven (2003). Hybrid Disambiguation in Natural Language Analysis. Osnabrück, Germany: Der Andere Verlag.

Hartrumpf, Sven; Hermann Helbig; and Rainer Osswald (2003). The semantically based computer lexicon HaGenLex – Structure and technological environment. Traitement automatique des langues, 44(2):81–105.

Hartrumpf, Sven and Johannes Leveling (2007). Interpretation and normalization of temporal expressions for question answering. In Evaluation of Multilingual and Multi-modal Information Retrieval: 7th Workshop of the Cross-Language Evaluation Forum, CLEF 2006 (edited by Peters, Carol; Paul Clough; Fredric C. Gey; Jussi Karlgren; Bernardo Magnini; Douglas W. Oard; Maarten de Rijke; and Maximilian Stempfhuber), volume 4730 of LNCS, pp. 432–439. Berlin: Springer.

Helbig, Hermann (2001). Die semantische Struktur natürlicher Sprache: Wissensrepräsentation mit MultiNet. Berlin: Springer.

Helbig, Hermann (2006). Knowledge Representation and the Semantics of Natural Language. Berlin: Springer.

Leveling, Johannes (2004). University of Hagen at CLEF 2003: Natural language access to the GIRT4 data. In Comparative Evaluation of Multilingual Information Access Systems: 4th Workshop of the Cross-Language Evaluation Forum, CLEF 2003 (edited by Peters, Carol; Julio Gonzalo; Martin Braschler; and Michael Kluck), volume 3237 of LNCS, pp. 412–424. Berlin: Springer.

Leveling, Johannes (2006a). A baseline for NLP in domain-specific information retrieval. In Accessing Multilingual Information Repositories: 6th Workshop of the Cross-Language Evaluation Forum, CLEF 2005 (edited by Peters, Carol; Fredric C. Gey; Julio Gonzalo; Gareth J. F. Jones; Michael Kluck; Bernardo Magnini; Henning Müller; and Maarten de Rijke), volume 4022 of LNCS, pp. 222–225. Berlin: Springer.

Leveling, Johannes (2006b). Formale Interpretation von Nutzeranfragen für natürlichsprachliche Interfaces zu Informationsangeboten im Internet. Tönning, Germany: Der Andere Verlag.

Leveling, Johannes and Dirk Veiel (2006). University of Hagen at GeoCLEF 2006: Experiments with metonymy recognition in documents. In Results of the CLEF 2006 Cross-Language System Evaluation Campaign, Working Notes for the CLEF 2006 Workshop (edited by Nardi, Alessandro; Carol Peters; and José Luis Vicedo). Alicante, Spain.

Ligozat, Anne-Laure; Brigitte Grau; Isabelle Robba; and Anne Vilnat (2006). Evaluation and improvement of cross-lingual question answering strategies. In Proceedings of the EACL 2006 Workshop on Multilingual Question Answering (MLQA’06), pp. 23–30. Trento, Italy.