This document discusses current research topics in terminology and ontologies. It covers trends like term variation, culture-specific semantic differences, definitions, contexts, and knowledge-rich contexts. It also discusses term extraction and mapping. Key areas of research include improving techniques for specialised domains, identifying term variants, providing richer semantic descriptions, and supporting terminological workflows and users.
16. Anne Schumann (USAAR) Terminology and Ontologies 1 (RIILP)
This document provides an overview of terminology and ontologies. It discusses why terminology is important, including for expert communication, knowledge transfer, and management. Terms are defined as linguistic symbols that represent concepts, with the relationship between terms and concepts being one-to-one in terminology. Conceptual relations between concepts are also discussed, including hierarchical relations like "is-a" that define a concept's location within a concept system. The document emphasizes that terminology work should be concept-oriented, structuring concepts into organized concept systems.
14. Michael Oakes (UoW) Natural Language Processing for Translation (RIILP)
This document discusses information retrieval and describes its three main phases: 1) asking a question to define an information need, 2) constructing an answer by matching queries to documents, and 3) assessing the relevance of the retrieved answers. It also covers several important information retrieval concepts like keywords, indexing documents, stemming words, calculating TF-IDF weights, and evaluating system performance using recall and precision.
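As a rough illustration of the retrieval concepts listed here, the following Python sketch builds TF-IDF weights over a toy three-document collection, ranks documents for a query, and computes recall and precision against a hypothetical relevance set. All documents, the query, and the relevance judgments are invented for illustration.

```python
# Minimal TF-IDF retrieval sketch (toy data; not from the summarized document).
import math
from collections import Counter

docs = {
    "d1": "terminology extraction from specialised corpora",
    "d2": "ontology alignment and ontology matching techniques",
    "d3": "term extraction and terminology management for translation",
}

def tokenize(text):
    return text.lower().split()

N = len(docs)
tf = {d: Counter(tokenize(t)) for d, t in docs.items()}
df = Counter(w for c in tf.values() for w in c)      # document frequency
idf = {w: math.log(N / df[w]) for w in df}           # inverse document frequency

def score(query, doc):
    """Sum of TF-IDF weights of the query terms in the document."""
    return sum(tf[doc][w] * idf.get(w, 0.0) for w in tokenize(query))

ranked = sorted(docs, key=lambda d: score("terminology extraction", d), reverse=True)

# Evaluation: recall = |retrieved & relevant| / |relevant|,
#             precision = |retrieved & relevant| / |retrieved|.
relevant, retrieved = {"d1", "d3"}, set(ranked[:2])
hits = len(relevant & retrieved)
print(ranked, "recall:", hits / len(relevant), "precision:", hits / len(retrieved))
```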
The document discusses ontology alignment, which is the process of finding correspondences between concepts in different ontologies to allow them to be used together. It notes that there is no single unified ontology, so alignment helps integrate overlapping conceptualizations. The key constructs for expressing alignments are relations like equivalence and subclass between concepts. Techniques discussed for finding mappings include string-based, linguistic/language-based, taxonomy comparison, and using example instances. The challenges of alignment evaluation and interpretation of results are also covered.
The document discusses ontology matching, which is the process of finding relationships between entities in different ontologies. It describes various techniques for ontology matching including basic techniques that operate at the element-level or structure-level, as well as classifications of matching techniques based on the type of input used and level of interpretation. The document also provides examples of commonly used methods for ontology matching like string-based, language-based, and structure-based techniques.
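A minimal sketch of the element-level, string-based flavour of matching mentioned above, assuming two toy class lists and an arbitrary 0.8 similarity threshold; difflib's ratio stands in for whatever string measure a real matcher would use.

```python
# Element-level, string-based matching sketch: pair up classes from two toy
# ontologies whenever their label similarity exceeds a threshold.
from difflib import SequenceMatcher

onto_a = ["Author", "Publication", "JournalArticle"]
onto_b = ["Writer", "Publications", "Journal_Article"]

def normalize(label):
    return label.lower().replace("_", "")

def similarity(a, b):
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

THRESHOLD = 0.8
alignment = [(a, b, round(similarity(a, b), 2))
             for a in onto_a for b in onto_b
             if similarity(a, b) >= THRESHOLD]
print(alignment)  # e.g. [('Publication', 'Publications', 0.96), ...]
```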
Introduction to Ontology Concepts and Terminology (Steven Miller)
The document introduces an ontology tutorial that will cover basic concepts of the Semantic Web, Linked Data, and the Resource Description Framework data model as well as the ontology languages RDFS and OWL. The tutorial is intended for information professionals who want to gain an introductory understanding of ontologies, ontology concepts, and terminology. The tutorial will explain how to model and structure data as RDF triples and create basic RDFS ontologies.
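To make the triple model concrete, here is a small rdflib sketch (the EX namespace and its classes are made-up examples, not taken from the tutorial) that asserts a two-class RDFS schema plus one instance and serializes the graph as Turtle.

```python
# Sketch of modelling data as RDF triples and a tiny RDFS ontology with
# rdflib (pip install rdflib). The EX namespace is an invented example.
from rdflib import Graph, Literal, Namespace, RDF, RDFS

EX = Namespace("http://example.org/")
g = Graph()

# Schema: Dog is a subclass of Animal.
g.add((EX.Animal, RDF.type, RDFS.Class))
g.add((EX.Dog, RDF.type, RDFS.Class))
g.add((EX.Dog, RDFS.subClassOf, EX.Animal))

# Data: one instance, described by (subject, predicate, object) triples.
g.add((EX.rex, RDF.type, EX.Dog))
g.add((EX.rex, RDFS.label, Literal("Rex")))

print(g.serialize(format="turtle"))
```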
The document summarizes a seminar on ontology mapping presented by Samhati Soor. The seminar covered the need for ontology mapping due to the proliferation of ontologies, and the purpose of mapping ontologies to achieve interoperability and sharing knowledge. It defined ontologies and ontology mapping and discussed categories of mapping including between global and local ontologies, between local ontologies, and for merging ontologies. Tools for ontology mapping discussed included GLUE and SAM. Evaluation criteria and challenges of ontology mapping were also summarized along with conclusions and references.
Language Combinatorics: A Sentence Pattern Extraction Architecture Based on C... (Waqas Tariq)
A "sentence pattern" in modern Natural Language Processing is often taken to be a contiguous sequence of words (an n-gram). However, in many branches of linguistics, such as pragmatics and corpus linguistics, it has been observed that simple n-gram patterns are not sufficient to reveal the full sophistication of grammar patterns. We present a language-independent architecture for extracting more sophisticated patterns than n-grams from sentences. In this architecture, a "sentence pattern" is an n-element ordered combination of sentence elements. Experiments showed that the method extracts significantly more frequent patterns than the usual n-gram approach.
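A minimal sketch of the contrast, assuming a "sentence pattern" is an n-element ordered combination (i.e. an ordered subsequence) of tokens rather than a contiguous n-gram; the toy sentences are our own.

```python
# Compare contiguous n-grams with n-element ordered combinations.
from collections import Counter
from itertools import combinations

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ordered_combinations(tokens, n):
    return list(combinations(tokens, n))   # preserves left-to-right order

sentences = [
    "the cat sat on the mat".split(),
    "the dog sat on the rug".split(),
]
contiguous, skip = Counter(), Counter()
for s in sentences:
    contiguous.update(ngrams(s, 3))
    skip.update(ordered_combinations(s, 3))

# Patterns like ('the', 'sat', 'on') recur in both sentences even though
# they are not contiguous n-grams in either.
print(skip.most_common(3), contiguous.most_common(3))
```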
Yang Yu is proposing research on improving machine learning based ontology mapping by automatically obtaining training samples from the web. The proposed system would parse two input ontologies to generate queries to search engines and collect documents to use as samples for each ontology class. These samples would then be used to train text classifiers, which would produce probabilistic mappings between classes in the two ontologies. The results would be evaluated by comparing to mappings from human experts. Current work involves exploring alternative text classification tools and ways to utilize the probabilistic mapping values generated by the classifiers.
The document discusses text mining, including defining it as the extraction of information from unstructured text using computational methods. It covers topics such as structured vs unstructured data, common text mining practice areas like information retrieval and document clustering, and challenges in text mining including ambiguity in language. Pre-processing techniques for text mining are also outlined, such as normalization, tokenization, stemming and removing stop words to clean and prepare text for analysis.
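The pre-processing steps named here can be sketched in a few lines of Python; the stop-word list and the suffix-stripping stemmer below are crude stand-ins for real resources such as NLTK's PorterStemmer.

```python
# Minimal preprocessing sketch: normalization, tokenization, stop-word
# removal, and a naive suffix-stripping stemmer (toy stand-ins only).
import re

STOPWORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "from"}

def preprocess(text):
    text = text.lower()                        # normalization
    tokens = re.findall(r"[a-z]+", text)       # tokenization
    tokens = [t for t in tokens if t not in STOPWORDS]
    stemmed = []
    for t in tokens:                           # naive stemming
        for suffix in ("ing", "ers", "er", "ed", "s"):
            if t.endswith(suffix) and len(t) - len(suffix) >= 3:
                t = t[: -len(suffix)]
                break
        stemmed.append(t)
    return stemmed

print(preprocess("Text mining extracts information from unstructured texts."))
# ['text', 'min', 'extract', 'information', 'unstructur', 'text']
```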
The document summarizes and compares schema matching and ontology mapping. It discusses how schema matching approaches can be applied to ontology mapping given the similarities between schemas and ontologies. The document outlines different categories of schema matching techniques (element-based, structure-based) and provides examples. It also summarizes several ontology mapping tools and approaches that utilize different matching strategies like string, structure, and semantic similarity.
Text mining seeks to extract useful information from unstructured text documents. It involves preprocessing the text, identifying features, and applying techniques from data mining, machine learning and natural language processing to discover patterns. The core operations of text mining include analyzing distributions of concepts, identifying frequent concept sets and associations between concepts. Text mining systems aim to analyze document collections over time to identify trends, ephemeral relationships and anomalous patterns.
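A sketch of the core operations named above: concept frequencies across a collection, frequent concept pairs, and a simple association (confidence) score. The documents are toy sets of already-extracted concepts, not real data.

```python
# Frequent concept sets and associations over a toy document collection.
from collections import Counter
from itertools import combinations

docs = [
    {"ontology", "matching", "evaluation"},
    {"ontology", "matching", "alignment"},
    {"terminology", "extraction"},
]

concept_freq = Counter(c for d in docs for c in d)
pair_freq = Counter(p for d in docs for p in combinations(sorted(d), 2))

def confidence(a, b):
    """P(b in doc | a in doc): support of the pair over support of a."""
    pair = tuple(sorted((a, b)))
    return pair_freq[pair] / concept_freq[a]

print(pair_freq.most_common(2))
print(confidence("ontology", "matching"))  # 2/2 = 1.0
```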
The document provides an overview of ontology and its various aspects. It discusses the origin of the term ontology, which derives from Greek words meaning "being" and "science," so ontology is the study of being. It distinguishes between scientific and philosophical ontologies. Social ontology examines social entities. Perspectives on ontology include philosophy, library and information science, artificial intelligence, linguistics, and the semantic web. The goal of ontology is to encode knowledge to make it understandable to both people and machines. It provides motivations for developing ontologies such as enabling information integration and knowledge management. The document also discusses ontology languages, uniqueness of ontologies, purposes of ontologies, and provides references.
Formal and Computational Representations
The Semantics of First-Order Logic
Event Representations
Description Logics & the Web Ontology Language
Compositionality
Lambda calculus
Corpus-based approaches:
Latent Semantic Analysis
Topic models
Distributional Semantics
Ontology and Ontology Libraries: a Critical Study (Debashisnaskar)
The concept of the digital library gained popularity with the development of networking technology. A digital library stores various kinds of documents in digitized format, enabling users smooth access to these documents at subsidized cost. More recently, a similar concept, the ontology library, has gained popularity in communities such as the semantic web, artificial intelligence, information science, philosophy, and linguistics.
This document discusses ontology mapping. It begins with an introduction to the semantic web and ontologies. Ontology mapping is important because it allows different ontologies to be aligned and related; the related operations include alignment, merging, and mapping proper. The document then surveys some popular ontology mapping techniques, including GLUE, PROMPT, and QOM, evaluating their inputs, outputs, and approaches. It concludes that semantic web research is important for advancing web technologies and realizing the goals of Web 3.0. Future work could involve developing new ontology mapping techniques and publishing research on existing mapping methods.
Lect6 - An introduction to ontologies and ontology development (Antonio Moreno)
The document provides an overview of ontologies and ontology development:
1. It defines ontologies as explicit specifications of conceptualizations in a domain that define concepts, properties, attributes, and relationships to enable knowledge sharing.
2. Ontology components include concepts, properties, restrictions, and individuals. Ontologies can range from single large ontologies to several specialized smaller ones.
3. OWL is introduced as the standard language for representing ontologies, with features like classes, properties, restrictions, and logical operators.
4. A general methodology for ontology development is outlined, including determining scope, reusing existing ontologies, enumerating terms, and defining classes, properties, and other components in an iterative process.
For efficient and innovative use of big data, it is important to integrate multiple databases across domains. For example, various public databases have been developed in the life sciences, and finding novel scientific results by combining them is an essential technique. In social and business areas, open data strategies in many countries promote the diversity of public data, so combining big data with open data is a major challenge. In short, dataset diversity is a problem that must be solved for big data.
Ontologies provide systematized knowledge for integrating multiple datasets across domains together with their semantics. Linked Data likewise provides techniques for interlinking datasets based on semantic web technologies. We consider that combining ontologies and Linked Data on the basis of ontological engineering can contribute to solving the diversity problem in big data.
In this talk, I discuss how ontological engineering could be applied to big data with some trial examples.
An ontology is a specification of a conceptualization that allows us to represent domain knowledge so that we can share a common understanding, enable reuse, make domain assumptions explicit, and separate domain knowledge from operational knowledge. Ontologies offer reasoning services like consistency checking, subsumption, and query answering that are different from those found in XML and relational databases. OWL ontologies use semantics rather than just syntax to represent knowledge about concepts, individuals, and relationships between them.
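One of those reasoning services, subsumption, can be illustrated with a toy transitive-closure check over asserted subclass axioms. Real OWL reasoners (e.g. HermiT, Pellet) do far more, including consistency checking; the class names here are invented.

```python
# Toy subsumption check: infer subclass relationships not directly asserted.
SUBCLASS_OF = {
    "JournalArticle": "Article",
    "Article": "Publication",
    "Book": "Publication",
}

def subsumes(superclass, subclass):
    """True if `subclass` is (transitively) a subclass of `superclass`."""
    while subclass in SUBCLASS_OF:
        subclass = SUBCLASS_OF[subclass]
        if subclass == superclass:
            return True
    return False

print(subsumes("Publication", "JournalArticle"))  # True: inferred, not asserted
print(subsumes("Book", "JournalArticle"))         # False
```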
This document discusses knowledge patterns, which are invariances or regularities that exist across different types of data and domains. It provides examples of knowledge patterns found in linguistic resources, data, interactions, and semantic resources. It also discusses using knowledge patterns as expertise units and how patterns can be represented at different levels of abstraction through morphisms. Finally, it discusses some examples of problems involving temporal and procedural patterns as well as anti-patterns to avoid in knowledge modeling.
Dimensions of Media Object Comprehensibility (Lawrie Hunter)
This document discusses dimensions of comprehensibility in media objects. It begins by framing the topic as a pattern language approach for machine-mediated communication (MMC). It notes insights can be drawn from second language learning, where comprehension of partially acquired languages reveals aspects of text and media nature. The document then discusses various parameters that influence the difficulty of comprehending media objects, such as document purpose, content, target behaviors, and lexical items. It provides examples of how knowledge structure maps can link descriptive information about a text. The goal is to develop a pattern language to guide machines in human-like communication by understanding factors affecting media object comprehension.
This document presents a multi-stage framework for extractive summarization of scientific papers based on keyword profiling and language modeling. The framework first identifies keywords that capture a paper's most important contributions, then creates a keyword profile language model. It ranks sentences by divergence from this model and re-ranks them based on novelty to generate a 5-sentence summary. Evaluation shows the framework achieves state-of-the-art performance on summarizing individual papers and remains effective under more stringent length limits.
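The ranking step can be sketched as follows, assuming a unigram keyword-profile language model and smoothed KL divergence; the keywords, candidate sentences, and smoothing constant are invented, and the novelty re-ranking stage is omitted.

```python
# Rank sentences by KL divergence from a keyword-profile language model.
import math
from collections import Counter

keywords = "ontology matching alignment evaluation".split()
profile = Counter(keywords)

def lm(counts, vocab, alpha=0.1):
    """Add-alpha smoothed unigram language model over a shared vocabulary."""
    total = sum(counts.values()) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def kl(p, q):
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)

sentences = [
    "we evaluate ontology matching systems",
    "the weather was pleasant in december",
]
vocab = set(keywords) | {w for s in sentences for w in s.split()}
q = lm(profile, vocab)
ranked = sorted(sentences, key=lambda s: kl(lm(Counter(s.split()), vocab), q))
print(ranked[0])  # the sentence closest to the keyword profile comes first
```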
This document provides an overview of information extraction (IE). It describes IE as the process of scanning text to extract relevant entities, relations, and events. The document outlines common IE tasks like named entity recognition and discusses approaches to IE like using cascaded finite-state transducers and learning-based methods. It also addresses challenges in IE like measuring performance and how systems are progressing towards overcoming the 60% accuracy barrier.
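A toy illustration of the cascaded, pattern-based style of IE: a first stage tags entities with regular expressions and a second stage matches a relation over them. The patterns and text are illustrative inventions, not the systems discussed above.

```python
# Two-stage, finite-state flavoured IE sketch using regular expressions.
import re

ENTITY_PATTERNS = [
    ("PERSON", r"(?:Dr|Prof)\. [A-Z][a-z]+"),
    ("ORG", r"[A-Z][a-z]+ (?:University|Corp)"),
]

def tag_entities(text):
    entities = []
    for label, pattern in ENTITY_PATTERNS:
        for m in re.finditer(pattern, text):
            entities.append((label, m.group(), m.span()))
    return entities

def extract_affiliation(text):
    # Second stage of the cascade: PERSON "works at" ORG.
    m = re.search(r"((?:Dr|Prof)\. [A-Z][a-z]+) works at "
                  r"([A-Z][a-z]+ (?:University|Corp))", text)
    return ("works_at", m.group(1), m.group(2)) if m else None

text = "Dr. Smith works at Stanford University."
print(tag_entities(text))
print(extract_affiliation(text))
```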
Application of Ontology in Semantic Information Retrieval by Prof Shahrul Azm... (Khirulnizam Abd Rahman)
Application of Ontology in Semantic Information Retrieval
by Prof Shahrul Azman from FSTM, UKM
Presentation for MyREN Seminar 2014
Berjaya Hotel, Kuala Lumpur
27 November 2014
New Quantitative Methodology for Identification of Drug Abuse Based on Featur... (Carrie Wang)
This project developed a new quantitative methodology using feature-based context-free grammar to analyze discourse semantics from social media discussions in order to identify potential drug abuse. The methodology was able to parse YouTube comments about recreational cough syrup use and perform anaphora resolution. This computational representation of discourse contributes to understanding human language structure and has applications in public health monitoring and clinical research.
Ontology Learning from Text
Ontology construction ‘Layer Cake’
Knowledge representation and knowledge management systems
Subtasks in ontology learning
Most Popular Ontology Learning Tools
The document discusses the impact of standardized terminologies and domain ontologies in multilingual information processing. It outlines how natural language processing (NLP) techniques can be used to semi-automatically populate ontologies by extracting information from text. Integrating knowledge from ontologies, NLP tools, and subject experts allows for more effective information access and management in an organization.
Tutorial - Introduction to Rule Technologies and Systems (Adrian Paschke)
Tutorial at Semantic Web Applications and Tools for the Life Sciences (SWAT4LS 2014), 9-11 Dec., Berlin, Germany
http://www.swat4ls.org/workshops/berlin2014/
5. Manuel Arcedillo & Juanjo Arevalillo (Hermes) Translation Memories (RIILP)
This document discusses translation memory (TM) tools and features. It provides an overview of the history and evolution of TM tools, including their move to the cloud. It describes key TM features like leveraging previous translations, fuzzy matching, and analysis capabilities. It also explains that while TM tools all provide similar basic functions, they analyze data and display matches differently, which can result in varying word count metrics. Weighted word counts aim to standardize metrics by assigning different values to matches based on their degree of fuzziness.
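A rough sketch of fuzzy matching and weighted word counts, assuming difflib's ratio as the similarity measure and invented discount bands. Commercial TM tools use their own proprietary metrics, which is exactly why their word counts differ.

```python
# Fuzzy matching against a toy translation memory plus weighted word counts.
from difflib import SequenceMatcher

tm = {
    "Click the Save button.": "Haga clic en el botón Guardar.",
    "Click the Cancel button.": "Haga clic en el botón Cancelar.",
}

def best_match(segment):
    scored = [(SequenceMatcher(None, segment, src).ratio(), src, tgt)
              for src, tgt in tm.items()]
    return max(scored)

def weighted_words(segment, fuzzy):
    # Example discount bands: pay full rate below 75%, less as matches improve.
    n = len(segment.split())
    if fuzzy >= 0.95: return n * 0.25
    if fuzzy >= 0.85: return n * 0.50
    if fuzzy >= 0.75: return n * 0.75
    return n

score, src, tgt = best_match("Click the Save icon.")
print(round(score, 2), tgt, weighted_words("Click the Save icon.", score))
```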
8. Qun Liu (DCU) Hybrid Solutions for Translation (RIILP)
The document provides an overview of hybrid machine translation approaches. It discusses selective machine translation which selects the best translation from multiple systems. Pipelined machine translation uses one system for pre-processing or post-processing of another system. Statistical post-editing uses statistical machine translation as a post-editor for rule-based machine translation outputs to improve the translation quality.
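A minimal sketch of the selective idea: score each system's output with a stand-in bigram language model and keep the best one. The bigram probabilities, system names, and outputs are invented for illustration.

```python
# Selective MT sketch: pick the output a toy bigram LM scores highest.
BIGRAM_LOGP = {  # toy bigram log-probabilities; unseen bigrams get -3.0
    ("the", "house"): -0.5, ("house", "is"): -0.7, ("is", "small"): -0.6,
}

def lm_score(sentence):
    words = sentence.split()
    return sum(BIGRAM_LOGP.get(bg, -3.0) for bg in zip(words, words[1:]))

outputs = {
    "sys_a": "the house very small is",
    "sys_b": "the the house is small",
    "sys_c": "the house is small",
}
best = max(outputs, key=lambda name: lm_score(outputs[name]))
print(best, "->", outputs[best])  # sys_c -> the house is small
```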
This document discusses statistical machine translation decoding. It begins with an overview of decoding objectives and challenges, such as ambiguity in possible translations. It then describes decoding phrase-based models using a linear model and dynamic programming approach, with approximations like beam search. Grammar-based decoding is also covered, including synchronous context-free grammar parsing and translation. Key challenges like search complexity and language model integration are addressed.
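A heavily simplified sketch of the phrase-based, beam-search approach described here: monotone (no reordering) decoding with a linear model over translation and language-model features. The phrase table, the stand-in bigram LM, and the feature weights are all invented for illustration.

```python
# Monotone phrase-based beam-search decoding over a toy phrase table.
PHRASES = {  # source phrase -> [(target phrase, translation log-prob), ...]
    ("das", "haus"): [("the house", -0.2), ("that house", -0.9)],
    ("das",): [("the", -0.4), ("that", -0.7)],
    ("haus",): [("house", -0.1)],
    ("ist", "klein"): [("is small", -0.3)],
    ("ist",): [("is", -0.2)],
    ("klein",): [("small", -0.3), ("little", -0.6)],
}

def lm_score(prev, word):
    # Stand-in bigram language model: mildly prefers continuing a hypothesis.
    return -0.5 if prev is not None else -1.0

def decode(source, beam_size=3, w_tm=1.0, w_lm=0.5):
    n = len(source)
    beams = [[] for _ in range(n + 1)]   # beams[i]: hypotheses covering i words
    beams[0] = [(0.0, [])]               # (model score, target words so far)
    for i in range(n):
        for score, target in beams[i]:
            for j in range(i + 1, n + 1):        # extend with source span [i, j)
                for tgt, tm in PHRASES.get(tuple(source[i:j]), []):
                    new = target + tgt.split()
                    lm = sum(lm_score(new[k - 1] if k else None, new[k])
                             for k in range(len(target), len(new)))
                    beams[j].append((score + w_tm * tm + w_lm * lm, new))
        beams[i + 1] = sorted(beams[i + 1], reverse=True)[:beam_size]  # prune
    return max(beams[n]) if beams[n] else None

print(decode("das haus ist klein".split()))
# e.g. (-1.75, ['the', 'house', 'is', 'small'])
```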
11. Manuel Leiva & Juanjo Arevalillo (Hermes) Evaluation of Machine Translation (RIILP)
The document discusses a company's evaluation of their machine translation systems. They had hoped automated metrics would correlate with productivity gains reported by post-editors, but found no correlation. Reasons for variability included different translation environments, engines, clients, post-editors, and word volumes. While some metrics indicated better translation quality, other factors like automatic terminology tools impacted productivity more. The company now combines automated metrics with time/productivity data and qualitative reviews to evaluate their machine translation performance.
The document discusses terminology in the translation industry. It outlines several benefits of using terminology, including higher translation quality, shorter turnaround times, and stronger brand identity. Higher quality is achieved through consistent translations and automated quality assessment. Turnaround times are shortened by avoiding time spent searching for terms. Brand identity is strengthened when customers use consistent terminology to affirm their product uniqueness. However, the document notes that in reality, most customers do not invest in terminology management and language service providers have limited time and resources to dedicate to it.
This document outlines the structure and deliverables for the EXPERT project. It consists of 8 work packages related to management, user perspectives, data collection, language technology, learning from translators, hybrid approaches, training, and dissemination. Each work package has 2 deliverables and deadlines for completion. The project involves early stage researchers who will receive training, complete secondments, and participate in workshops and a winter school.
1. EXPERT Winter School Partner Introductions (RIILP)
The document provides information about the University of Wolverhampton's Research Group in Computational Linguistics and Statistical Cybermetrics Research Group. It discusses the groups' expertise in various areas of natural language processing and information retrieval. Key personnel are mentioned, including Ruslan Mitkov, Constantin Orasan, and Mike Thelwall. Ongoing and past projects funded by sources like the EC and NBME are summarized.
9. Ethics - Juan Jose Arevalillo Doval (Hermes) (RIILP)
This document discusses ethics in the translation industry. It provides definitions of ethics from Webster's and Oxford dictionaries and lists key ethical values like integrity, transparency, and responsibility. It also outlines professional values for translators such as competence, confidentiality, and avoiding practices that undermine the profession. The document discusses issues in the industry like non-paid internships and accepting unrealistic translation projects. It provides examples of codes of conduct and outlines models for project outsourcing in the translation field.
9. Manuel Harranz (Pangeanic) Hybrid Solutions for Translation (RIILP)
This document discusses PangeaMT, a machine translation system, and experiences with hybridization. It provides a brief history of PangeaMT, describing its use of open-source Moses and its capabilities. It outlines features for experts, including domain adaptation and engine creation and training. The document also discusses experiences with hybridization for linguistically distant language pairs, including the challenges of word-order differences and tokenization. It compares approaches using Toshiba and MeCab for Japanese reordering, finding that MeCab produced higher accuracy. Future work is noted on morphology-rich languages like Russian and on reordering for distant languages.
2. Constantin Orasan (UoW) EXPERT Introduction (RIILP)
The document introduces the EXPERT ITN project, which aims to train young researchers on improving data-driven machine translation through empirical approaches. The project will support researchers during their training and research, with the goal of producing future leaders in the field. It describes the objectives to improve existing corpus-based translation tools by considering user needs, collecting data, incorporating linguistic processing, and developing hybrid approaches. The project consists of 12 individual research projects across 6 work packages and is led by an academic consortium with involvement from private sector partners.
10. Lucia Specia (USFD) Evaluation of Machine Translation (RIILP)
This document discusses various methods for evaluating translation quality, including manual metrics, task-based metrics, and reference-based automatic metrics. It notes that evaluating translation quality is difficult because the definition of quality depends on factors like the end user and intended purpose. Methods discussed include n-point scales for adequacy and fluency, ranking translations, and counting errors. Issues with subjective judgments, reliability, and defining what makes a translation "best" are also covered.
4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Tran... (RIILP)
This document provides an overview of example-based machine translation (EBMT). It discusses the core steps of EBMT including matching, alignment, and recombination. It also describes different varieties of EBMT such as character-based, word-based, pattern-based, syntax-based, and marker-based matching. Finally, it discusses approaches to EBMT including pure/runtime EBMT and compiled EBMT.
This document provides an overview of a tutorial on statistical machine translation given by Dr. Khalil Sima'an. The tutorial is divided into two parts, with Part I covering data and models, including word-based models, alignment, symmetrization, and phrase-based models. Part II, given by Trevor Cohn, will cover decoding and efficiency. The tutorial will examine the statistical approach to machine translation using parallel corpora and will discuss generative source-channel frameworks and challenges in estimating translation probabilities from sparse data. It will also explore how current models induce structure in translation data using alignments between source and target language structures.
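IBM Model 1 is the classic instance of the word-based models mentioned here; whether the tutorial uses exactly this formulation is not stated, so treat the sketch as generic. It learns lexical translation probabilities t(f|e) from a toy parallel corpus with EM; real systems train on millions of sentence pairs.

```python
# IBM Model 1 EM training sketch on a three-sentence toy corpus.
from collections import defaultdict

corpus = [
    ("the house".split(), "das haus".split()),
    ("the book".split(), "das buch".split()),
    ("a book".split(), "ein buch".split()),
]

# Uniform initialization of t(f|e).
e_vocab = {e for es, _ in corpus for e in es}
t = defaultdict(lambda: 1.0 / len(e_vocab))

for _ in range(10):                       # EM iterations
    count = defaultdict(float)
    total = defaultdict(float)
    for es, fs in corpus:
        for f in fs:                      # E-step: expected alignment counts
            z = sum(t[(f, e)] for e in es)
            for e in es:
                count[(f, e)] += t[(f, e)] / z
                total[e] += t[(f, e)] / z
    for (f, e), c in count.items():       # M-step: renormalize
        t[(f, e)] = c / total[e]

print(round(t[("haus", "house")], 2))     # converges towards 1.0
```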
This document discusses human translation workflow and contains three sections. Section I provides an overview of human translation workflow. Section II discusses professional translation, including market studies, emerging trends, and the translation workflow. Section III focuses on corpus-based translation, outlining guidelines for corpus creation, using corpora for translation training, and concordancing tools.
13. Constantin Orasan (UoW) Natural Language Processing for Translation (RIILP)
This document discusses how natural language processing (NLP) techniques can help improve machine translation (MT). It describes some of the linguistic challenges in MT, such as ambiguity at the lexical, syntactic, semantic and pragmatic levels. It then discusses how various NLP tasks, such as tokenization, word sense disambiguation, and handling of named entities could enhance MT systems. Several studies that have successfully integrated NLP techniques like word sense disambiguation into statistical machine translation systems are also summarized.
This document outlines the key elements that should be included in the main text of a thesis or dissertation. It recommends including an introduction that discusses background information and research questions, a literature review that summarizes relevant theories and frameworks, a methodology section that describes the research design and data sources, a results section that presents findings and discussions, and a conclusions section that discusses implications. It also recommends an hourglass structure with a funnel-shaped introduction and reverse funnel-shaped conclusion bookending chapters that discuss topics in more detail. Finally, it provides examples of how to structure paragraphs in an introduction from broad opening statements to more specific discussions of the research topic and questions.
Analysis of technical papers and of terms from technical dictionaries (e.g., the CIRP dictionary, CIRPedia). Disambiguation of technical texts is addressed through high-quality dictionaries.
This document contains definitions of terminology-related terms from various sources:
- It defines terms such as term, preferred term, synonym, deprecated term, and nested term.
- It describes terminology work practices such as term extraction, terminology planning, and standardization.
- It explains terminology resources such as terminology databases and terminological entries that contain terms and their attributes for a given subject field.
A statistical approach to term extraction.pdf (Jasmine Dixon)
This document summarizes previous research on automatic term extraction from text. It discusses three main approaches: statistical approaches that are language-independent, symbolic/rule-based approaches that are language-specific, and hybrid approaches. Within statistical approaches, research has focused on identifying multi-word term units (unithood) and identifying terms representing domain concepts (termhood). Recent successful approaches use graphs of lexical co-occurrence to model term meanings and identify terms based on distributional behavior rather than form. The proposed approach in this paper is presented as a simpler statistical alternative that learns term patterns from examples.
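The two notions can be sketched with simple statistics: pointwise mutual information as a unithood score for bigrams, and a "weirdness"-style ratio of domain to general relative frequency as a termhood score. The toy corpora are invented, and these particular measures are common choices in the literature, not necessarily those of the paper.

```python
# Unithood (PMI) and termhood (weirdness ratio) over toy corpora.
import math
from collections import Counter

domain = ("the translation memory stores translation memory segments "
          "translation memory tools reuse segments").split()
general = "the cat sat on the mat and the dog sat on the rug".split()

uni = Counter(domain)
bi = Counter(zip(domain, domain[1:]))
N = len(domain)

def pmi(w1, w2):                      # unithood: do the words stick together?
    return math.log((bi[(w1, w2)] / (N - 1)) /
                    ((uni[w1] / N) * (uni[w2] / N)))

gen = Counter(general)

def weirdness(w):                     # termhood: domain-specific prominence
    return (uni[w] / N) / ((gen[w] + 1) / (len(general) + 1))  # +1 smoothing

print(round(pmi("translation", "memory"), 2), round(weirdness("memory"), 2))
```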
The document discusses the principles and guidelines for creating subject headings in the Library of Congress Subject Headings (LCSH) system. Key points include:
- LCSH headings are created to catalog and retrieve materials on given topics within a collection.
- Headings use standardized terminology from current literature to represent subjects.
- Headings aim to be exhaustive, reflecting all topics covered in a work, while also having indexing depth through multiple assigned headings.
- A single heading is chosen to represent each topic for consistent retrieval, with references to guide users.
- Headings are revised over time to maintain currency, balancing changes against impacts to existing records.
A Cross-Language Study On Citation Practice In PhD Theses (Jasmine Dixon)
This document summarizes a study that analyzed how citation practices differ in PhD theses written in English versus Spanish. Specifically, it looked at how reporting verbs are used when citing previous work in the literature review sections. The study analyzed 10 PhD theses written in English from the University of Glasgow and 10 theses written in Spanish from the Universidad Politécnica de Valencia, all within the field of computing. It found that Spanish theses tended to be longer overall than English theses, and the average length of literature review sections was also longer in Spanish theses. The study aimed to provide a contrastive analysis of how reporting verbs and other interactional resources are deployed differently when citing sources in theses written in the two languages.
The document discusses the key aspects of thesauri including their purpose, structure, types of relationships displayed, and evaluation criteria. Specifically, it notes that a thesaurus provides a standardized vocabulary for information retrieval by displaying hierarchical (e.g. broader and narrower terms) and equivalence (e.g. synonyms) relationships between terms. It also discusses how terms are organized in a thesaurus and criteria for evaluating the effectiveness of a thesaurus.
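A toy record illustrating the relationship types described above, using the conventional thesaurus labels (BT/NT for broader/narrower terms, UF for equivalence, RT for related terms); the entries themselves are invented.

```python
# Toy thesaurus entry and a query-expansion helper built on it.
thesaurus = {
    "machine translation": {
        "BT": ["automatic language processing"],   # broader term
        "NT": ["statistical machine translation"], # narrower terms
        "UF": ["automated translation"],           # equivalence: "used for"
        "RT": ["translation memory"],              # associative: related term
    },
}

def expand_query(term):
    """Expand a search term with its equivalents and narrower terms."""
    entry = thesaurus.get(term, {})
    return [term] + entry.get("UF", []) + entry.get("NT", [])

print(expand_query("machine translation"))
```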
The document discusses the principles and parameters framework for language acquisition proposed by Chomsky and Lasnik. It explains that universal grammar consists of a finite set of principles common to all languages and a finite set of parameters that determine variation between languages. Children acquire language by learning the parameter settings of their native language based on innate linguistic principles. The document provides examples of parameters like head directionality and the pro-drop parameter. It also discusses how phrase structure rules and lexical subcategorization frames realize principles within syntactic structure.
The document discusses stepwise methodologies for building ontologies. It outlines common steps such as identifying the purpose and scope, capturing concepts and relationships, coding the ontology formally, integrating existing ontologies, evaluation, and documentation. It emphasizes starting with a middle-out approach to capture definitions and discusses reaching consensus among those involved in building the ontology. Modularization of ontologies into reusable components is also presented as an important aspect of the methodology.
Scientific and Technical Translation in English: Week 2 (Ron Martinez)
This document provides an overview of a scientific and technical translation class being taught by Dr. Ron Martinez. It includes the following:
1. Homework assignments for students, including reading articles, translating texts, and discussing translations with classmates.
2. A summary of Rudolf Jumpelt's view that in scientific and technical translation, information content takes priority over form and accuracy of transmission.
3. An outline of the general course structure, which will cover research article structure, translation tools, assignments, and group presentations.
4. Instructions for an in-class discussion of a provided research article on the challenges of scientific translation.
5. Guidance on proper research article structure based on the IMRaD format.
This document introduces a volume on corpus-based approaches in cognitive linguistics. It discusses common assumptions in cognitive linguistics and functional linguistics, such as the view that language is not autonomous and that linguistic knowledge emerges from language use. It also outlines common methods in corpus linguistics, such as analyzing naturally occurring language data from balanced corpora using frequency lists and statistical techniques. The introduction notes that while corpus methods have a long history, their use in cognitive linguistics is more recent. It identifies different levels of analysis (e.g. lemma vs. word form) and degrees of quantitativeness as parameters of corpus-based research.
This document provides guidance on various aspects of writing a research paper, including conceptual background, methodology, literature review, and structure. It discusses different sources for conducting a literature review, such as library catalogs, databases, and conference proceedings. It outlines a 5-level ranking system for evaluating literature sources. Various types of literature reviews are described, including chronological, thematic, and methodological. Tasks of a literature review like summarizing, synthesizing, critiquing and comparing sources are outlined. The document provides examples of integrating sources into a literature review through summarization, paraphrasing, and analysis. It also discusses evaluating the strengths and weaknesses of sources and comparing and contrasting different studies.
This document discusses a study that analyzed the textual organization of research articles (RAs) across three engineering sub-disciplines: civil engineering, software engineering, and biomedical engineering. The study compiled corpora of 180 high-quality RAs from each sub-discipline. Using genre analysis, the study identified the typical structure of individual RA sections for each sub-discipline. Statistical analysis was then used to identify significant variations in textual organization that distinguish one engineering sub-discipline from another. The findings provide insights into how each sub-discipline structures information differently in RAs and contributes to better professional communication in engineering.
Lecture 7 Translation techniques of scientific texts.pptx (sabinafarmonova02)
The document discusses translation techniques for scientific texts. It begins by defining scientific texts as written documents based on scientific principles and methods. It describes different types of scientific articles and papers. Scientific translation focuses on translating scholarly materials across various fields of study like medicine, life sciences, social sciences, and mathematics. When translating scientific texts, the translator must adapt the style and format while maintaining terminology, concepts, and avoiding ambiguities. Challenges include conveying technical concepts across languages and adapting to different cultural understandings. The document outlines several lexical and grammatical peculiarities of scientific texts.
Best Practices for Creating Definitions in Technical Writing and Editing (The Integral Worm)
This presentation describes best practices for creating and documenting definitions in technical writing and editing. Topics covered are the following: effective definitions, multiple meanings, defining technical nomenclature, defining symbols, formal definitions, informal definitions, and placement of definitions.
A Corpus-based Analysis of the Terminology of the Social Sciences and Humanit... (Sarah Morrow)
This document presents the results of a corpus-based analysis of terminology in the social sciences and humanities fields of legal science and administrative science in Indonesia. Keywords and word clusters were identified from specialized corpora in these fields to extract term candidates based on their linguistic components. Collocation analysis was then used to further examine two term candidates, "LINGKUNGAN HIDUP" ("environment") and "BUDAYA PERUSAHAAN" ("corporate culture"), to recognize their specific meanings based on habitually co-occurring words, fulfilling the cognitive component for terminology. This analysis demonstrated how corpus linguistics can be integrated with the communicative theory of terminology to study terminology from its linguistic and cognitive perspectives.
Article - An Annotated Translation of How to Succeed as a Freelance Translato... (Cynthia Velynne)
This document summarizes an annotated translation research study conducted by Wahyu Budi. The study translated a document on how to succeed as a freelance translator from English to Indonesian. The researcher identified 167 difficulties during translation and analyzed the 25 most difficult examples. Thirteen of 30 translation strategies and 5 of 13 translation principles were used in the analysis. The study concluded that not all strategies and principles could be employed due to analyzing a limited number of examples, but analyzing more examples may have identified use of additional strategies and principles. The implications were that translation requires mastery of both source and target languages as well as translation theories and computer software.
This document discusses the particularities of scientific translation. It begins by outlining the historic significance of scientific translation in disseminating knowledge across cultures. It then notes that while translation studies originally focused on literary works, scientific translation has grown in importance as an area of study. The document also differentiates scientific translation from other types by emphasizing the need for accuracy and consistency in terminology. It highlights challenges like translating cultural references and technical language in scientific texts.
Gabriela Gonzalez attended an expert project showcase in Rome, Italy in May 2016 where she participated in roundtable discussions on the relationship between academia, industry, and translators. She noted that while improvements are needed for translators, the main issue is whether translator needs align with industry interests. Gonzalez advocated for greater collaboration between translators, software developers, and researchers to create more user-friendly translation tools. She concluded by expressing her hope that the industry would adopt research findings and that she could be more involved in sharing experiences to improve quality assurance processes.
Pangeanic is an MT company founded in Valencia, Spain with offices in Tokyo, London, and Shanghai. Pangeanic's PangeaMT system was the first commercial application of the open-source Moses platform. It has been further developed and customized for the localization industry. Pangeanic has worked with clients such as Sony Europe to provide MT services and experiences. The company's system includes features such as monolingual training, integration with Apertium, and automated data cleaning. Pangeanic advocates for empowering translators and users in controlling MT systems and sees MT as a business opportunity to transform how translation services are provided and create new revenue streams.
Carla Parra Escartin - ER2 Hermes Traducciones (RIILP)
This document discusses a study on the productivity of translators when post-editing machine translation (MT) outputs compared to translating from scratch. The study was conducted with 10 in-house translators post-editing the output of an MT system customized for 3 years. It found that all but one translator were faster at post-editing MT outputs compared to translating from scratch. Automatic evaluation metrics like BLEU, TER and a fuzzy match score were found to correlate with productivity gains from MT. Thresholds for productivity gains were proposed based on these metrics.
Hermes Traducciones is the 15th largest translation company in Southern Europe and 154th globally. It is certified under quality standards ISO 9001 and EN 15038. The company has 25-30 permanent employees, over 150 freelance translators in its database, and translation teams in Portugal and Brazil. Hermes provides a wide range of translation and localization services, especially in technical fields like engineering and software. It also collaborates with universities on research projects evaluating machine translation and its potential to increase translation productivity and savings.
Lianet Sepulveda & Alexander Raginsky - ER 3a & ER 3b Pangeanic (RIILP)
This document describes improving hybrid translation tools using a full-text search engine approach. It discusses using natural language processing techniques and a translation memory database indexed with ElasticSearch to improve fuzzy matching. The goal is to maximize reuse of existing human translations by handling linguistic features like string transformations, part-of-speech tagging, and tokenization.
KantanMT.com is a statistical machine translation platform that is cloud-based and highly scalable. It provides automated translations at high speed and quality by fusing translation memory, machine translation, and rules. The document then discusses KantanMT's vision, some of its key features and statistics, locations it operates from including the INVENT Concept Space and School of Computing, how it obtained funding from the Commercialization Fund, and its journey from starting as a prototype to becoming widely adopted with billions of words translated.
This document describes CATaLog, a translation tool that provides:
- Incremental machine translation, automatic post-editing, and translation memory capabilities to enhance translations over time.
- Color-coded matching of source segments to translated segments to reduce cognitive load on translators.
- Online project management, translation, and review capabilities without requiring local installation.
This document discusses optimizing machine translation systems for user benefit. It outlines several ways to measure translation quality and utility, including editing time and effort. Current approaches include post-processing machine translation, learning from translator feedback, and using quality estimation to guide humans. The document advocates formalizing the task purpose and taking advantage of user context to explicitly train systems to maximize user benefit, such as optimizing interactive prediction for translation or post-editing tasks. The vision is for task-based optimization to be applied beyond machine translation to any user-agent interaction scenario.
The document summarizes the results of a survey investigating the needs and preferences of translators regarding translation technologies. The survey looked at translators' usage of computer-assisted translation (CAT) tools, machine translation, terminology management tools, and corpora. It found that while CAT tools are widely used, features like machine translation and terminology management that appear as both most useful and most disliked require further improvements to be truly useful. Respondents emphasized needing tools that are simple to use and integrate multiple resources like translation memories and corpora. The survey revealed both opportunities to better meet translators' needs and their varying attitudes towards the role of technology in translation work.
This document discusses quality estimation of machine translation using the QuEst++ framework. It summarizes that QuEst++ can predict the quality of unseen machine translated text using only the source and target texts without references, extracting features to build models that estimate metrics like post-editing effort and time from limited labeled training data. The framework extracts features at the word, sentence and document level from the source and target texts and information from the machine translation system, then trains models using those features to predict quality scores for new translations.
The document discusses evaluating terminology tools through their features. It first introduces how terminology is important for translation and natural language processing. It then explores the features of Terminology Extraction Tools and Terminology Management Tools. These include functions like term extraction, context extraction, and glossary management. The document evaluates several specific tools to compare their feature sets. It concludes by emphasizing the importance of identifying user needs and systematically testing tools to select the most appropriate one.
This document discusses combining translation memory (TM) and statistical machine translation (SMT). It summarizes that TM works best for repetitive text but SMT is more reliable when there are no close matches. It then reviews the speaker's previous work on combining TM and SMT during decoding and before decoding, and presents results showing BLEU score improvements on several language pairs.
The document discusses the differences between how ontologies are used in scientific research versus industry. In scientific research, ontologies focus on creating and extending existing generic ontologies and validating ontology induction methods, using ontologies to improve natural language processing technologies. In industry, ontologies are used as value-adding knowledge bases for various purposes like matching product reviews to categories, terminology standardization in machine translation, and matching resumes to jobs. The document argues that bridging the gap between scientific and industry usage of ontologies requires more domain-specific data and discoveries, true application focus, and open data flow.
This document discusses Acclaro's quality management program. It introduces their services and clients, then describes their quality program which uses a customized memoQ feature to track errors. It discusses two client cases, including a technical software company and an online media company. The quality assurance model and customization options are demonstrated. Benefits include quantitative measurement and issue identification. Challenges include scalability and technical bugs. The goal is more integrated quality reporting and useful statistics.
ER1 Eduard Barbu - EXPERT Summer School - Malaga 2015RIILP
The document discusses collecting and cleaning multilingual data. It describes estimating the amount of parallel data that exists in Common Crawl, testing different crawlers, and developing a machine learning approach to classify translation units as either true translations or errors. Key points include estimating that Common Crawl contains around 1 billion parallel pages, crawlers tested had low recall, and the best performing model for classifying translation units was an SVM classifier with an F1-score of 0.81.
ESR1 Anna Zaretskaya - EXPERT Summer School - Malaga 2015RIILP
This document summarizes the results of a survey on machine translation (MT) usage among professional translators. Some key findings include:
- 36% of respondents currently use MT, while 38% do not use it and do not plan to. Most saw potential benefits from high-quality MT.
- MT is used equally for resource-rich and resource-poor languages. Technical domains like ICT saw higher MT usage.
- Higher computer competence and IT training were associated with greater MT use. Translators working with agencies also used MT more.
- While MT can provide benefits, respondents noted it cannot replace humans and may threaten jobs or lower wages. Better quality is needed.
ESR2 Santanu Pal - EXPERT Summer School - Malaga 2015RIILP
The document describes a statistical automatic post-editing (APE) system that aims to improve machine translation output with minimal human effort. The system uses hierarchical phrase-based statistical machine translation trained on machine translation output and reference human translations. The system first cleans and preprocesses data, generates improved word alignments, and then performs hierarchical phrase-based SMT to output post-edits. Evaluation shows the APE system outperforms the baseline machine translation according to both automatic metrics and human evaluation, requiring less post-editing effort.
ESR3 Hernani Costa - EXPERT Summer School - Malaga 2015RIILP
This document summarizes a study that investigates using distributional similarity measures (DSMs) to assess the relatedness between documents in comparable corpora. The study uses three DSMs - number of common entities, Spearman's rank correlation coefficient, and Chi-square - on four subcorpora from the INTELITERM corpus. The results show the subcorpora generally contain highly related documents, though the smaller Spanish translated corpus shows more inconsistency. Future work could involve expanding experiments to other languages and DSMs, and using the approach to filter unrelated documents.
17. Anne Schumann (USAAR) Terminology and Ontologies 2
1. Terminology and Ontologies
Section 2: Current Research Topics
Anne-Kathrin Schumann
Saarland University
“Expert“ Winter School
Birmingham
November 13, 2013
2. Overview
Current trends in research
Term variation
Culture-specific semantic differences
Definitions, contexts, knowledge-rich contexts
Usability aspects
Term extraction and term mapping
3. Current trends in research
Controversial paper by Cabré in Terminology 5 (1), 1998/1999, pp. 5-19: Do we need an autonomous theory of terms?
“It is increasingly being accepted that Wüster‘s theoretical stance […] is proving inadequate for the different current needs of term description and processing because of its idealising and simplifying approach.“
(markup is mine)
4. Current trends in research
What have we been talking about?
terminology adopts a decompositional, structuralist approach to the description of specialised meanings
the meaning of a terminological unit (concept + term) can be described by a set of sufficient and necessary semantic invariants
no interest in the linguistic domain of the field:
“Only the designations of the concepts, the lexicon, are relevant to the terminologist. Syntax and inflection are not. For the latter, the same rules apply as in general language.“
(my translation from Wüster 1985: 2, markup as in the original)
5. Current trends in research
Terminology, then, is an exercise in reducing the complexity of reality to simpler feature structures
“[D]iscreteness is in the head and fuzziness is in the world.“
(Geeraerts 2010: 132)
6. Current trends in research
Main criticism: No account for
the multidisciplinary (denominative, cognitive and functional) nature of terms
the communicative dimension of terminology
connotational aspects in terminology
the linguistic dependence of terms on particular languages
pragmatic/functional aspects of term variation
7. Current trends in research
Small recap: term variation
is ubiquitous
is a problem for applications that use terminology
Wüster‘s solution: standardisation
counter-proposal: systematic study and handling of term variation
8. Current trends in research
Da jedoch der Massenstrom gleich bleiben muss, weitet sich bei einer frei angeströmten Windkraftanlage der Wind auf, da eben trotz der geringeren Geschwindigkeit hinter der Anlage die gleiche Menge Luft abtransportiert werden muss. Aus eben diesem Grund ist die komplette Umwandlung der Windenergie in Rotationsenergie mit einer Windkraftanlage nicht möglich: Dafür müssten die Luftmassen hinter der Windkraftanlage ruhen, könnten also nicht abtransportiert werden.
(Wikipedia)
[Translation: However, since the mass flow must remain constant, the wind stream widens around a wind turbine in free flow, because the same amount of air must be carried away despite the lower speed behind the turbine. For exactly this reason, a wind turbine cannot completely convert the wind energy into rotational energy: for that, the air masses behind the wind turbine would have to be at rest and thus could not be carried away. Note the coreference chain Windkraftanlage – Anlage – Windkraftanlage.]
-> coreference chains for text cohesion
9. Current trends in research
Term variation:
cannot be treated only prescriptively, because it is functional from a linguistic point of view
terms are reiterated in discourse for reasons of cohesion
the informativity of the term is managed by altering the form of the term (especially if it is an MWT, a multi-word term)
the whole form can normally be retrieved from context
(Collet 2004: 102)
-> term variation is influenced by text-linguistic aspects
10. Current trends in research
Other reasons for terminological variation:
dialects and geographical variation
chronological variation
social variation (e.g. academic expert vs. practitioner)
creativity, emphasis, expressiveness
language contact
conceptual imprecision, ideological reasons (e.g. “armchair linguistics“) and different points of view (ozone layer depletion, ozone layer destruction, ozone layer loss, ozone layer reduction)
(Freixa 2006)
11. Current trends in research
What is a term variant?
“… an utterance which is semantically and conceptually related to an original term.“
(Daille et al. 1996: 201)
-> an attested form found in a text
-> there is a codified (authorised) original term
-> semantically and conceptually related
12. Current trends in research
Types of variants:
graphical: missing hyphen (e.g. Windkraftanlage vs. Windkraft-Anlage) or case differences
inflectional/orthographic (e.g. conservation de produit vs. conservation de produits)
shallow syntactic:
variation of the preposition (e.g. chromatographie sur/en colonne)
optional characters (e.g. fixation de l‘azote vs. fixation d‘azote)
predicative use of the adjective
13. Current trends in research
Types of variants:
syntactic:
additional modifier
additional nominal modifier (closed list, e.g. protéine végétale vs. protéine d‘origine végétale)
expansion of the nominal head
permutations (e.g. air pressure vs. pressure of the air)
14. Current trends in research
Types of variants:
morphosyntactic:
alternation between preposition and prefix (e.g. pourrissement après récolte vs. pourrissement post-récolte)
derivations (e.g. acidité du sang vs. acidité sanguine)
paradigmatic substitution (e. g. Ehemann vs. Ehegatte)
anaphoric uses
acronyms
(Daille 2005)
15. Current trends in research
Variant recognition given a set of candidate terms:
string similarity for inflectional/orthographical variants (candidates with the same POS shape and the same length):
rule-based correction of lemmatisation errors
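A minimal Python sketch of the string-similarity route described above (illustrative only: the POS shapes, the similarity function and the 0.9 threshold are assumptions, not taken from the slides):

```python
# Flag two POS-tagged candidates as likely inflectional/orthographical
# variants: same POS shape and length, high normalised string similarity.
from difflib import SequenceMatcher

def same_shape(cand_a, cand_b):
    """Candidates are (token, POS) lists with identical POS sequences."""
    return [p for _, p in cand_a] == [p for _, p in cand_b]

def likely_variants(cand_a, cand_b, threshold=0.9):
    if not same_shape(cand_a, cand_b):
        return False
    a = " ".join(tok.lower() for tok, _ in cand_a)
    b = " ".join(tok.lower() for tok, _ in cand_b)
    return SequenceMatcher(None, a, b).ratio() >= threshold

# e.g. conservation de produit vs. conservation de produits
a = [("conservation", "N"), ("de", "P"), ("produit", "N")]
b = [("conservation", "N"), ("de", "P"), ("produits", "N")]
print(likely_variants(a, b))  # True
```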
16. Current trends in research
Variant recognition given a set of candidate terms:
term variation patterns for rule-based variant recognition
(Weller et al. 2011)
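Two such patterns, sketched in Python as a hedged illustration (the rules below are toy versions in the spirit of Daille's typology, not the pattern set of Weller et al. 2011):

```python
# Toy variant-recognition rules: a graphical (hyphenation/case) rule
# and a permutation rule for English noun compounds.
import re

def graphical_variant(a: str, b: str) -> bool:
    """Windkraftanlage vs. Windkraft-Anlage: identical up to hyphens and case."""
    norm = lambda s: s.replace("-", "").lower()
    return a != b and norm(a) == norm(b)

def permutation_variant(a: str, b: str) -> bool:
    """air pressure vs. pressure of the air."""
    m_a = re.fullmatch(r"(\w+) (\w+)", a.lower())
    m_b = re.fullmatch(r"(\w+) of (?:the )?(\w+)", b.lower())
    return bool(m_a and m_b
                and (m_a.group(1), m_a.group(2)) == (m_b.group(2), m_b.group(1)))

print(graphical_variant("Windkraftanlage", "Windkraft-Anlage"))    # True
print(permutation_variant("air pressure", "pressure of the air"))  # True
```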
17. Current trends in research
Culture-specific semantic differences
Terminology considers specialised concepts to be universal across languages
For general language, this view is outdated (pragmatics, text linguistics, cultural differences etc.)
But also for LSP, things are not that easy
18. Current trends in research
Culture-specific semantic differences
Schmitt (1999) mentions different types of semantic differences on the CONCEPTUAL level, e.g.
culture-dependent differences between conceptual hierarchies
culture-dependent semantic prototypes
19. Current trends in research
Culture-specific semantic differences
culture-dependent differences between conceptual hierarchies
e.g. different concept systems for steel in Germany and the USA
“Primary coolant system interconnecting piping is carbon steel with internal austenitic stainless steel weld deposit cladding.“
carbon steel = Kohlenstoffstahl?
20. Current trends in research
carbon steel = Baustahl
(+ term variation …)
“Most dictionaries fail to provide accurate descriptions, especially in problematic cases …“
(Schmitt 1999: 219, my translation from German)
21. Current trends in research
Culture-specific semantic differences
culture-dependent semantic prototypes
• typical “German“ hammer: nr. 1 (second from left in the hammer illustration on the slide)
• typical hammer in the UK and US: nr. 4 (first from right)
-> complicated translation strategies, e.g.
• insertion of a functional equivalent
• insertion of semantic markup (“In the US, the hammer typically used is the …“)
• adaptation of drawings etc.
22. Current trends in research
Culture-specific semantic differences
culture-dependent semantic prototypes
“Apply the parking brake firmly. Shift the automatic transaxle to Park (or manual transaxle to Neutral).“
->
„Handbremse fest anziehen. Schalthebel in Leerlaufstellung bringen (bei Automatikgetriebe Wählhebel in Stellung P bringen).“
[Back-translation: “Apply the parking brake firmly. Put the gear lever in neutral (with an automatic transmission, put the selector lever in position P).” Note that the German version reverses which transmission type is treated as the default case.]
(Schmitt 1999: 255)
23. Current trends in research
Intermediate summary
Translation is a knowledge-based activity involving deep semantic analysis, functional adaptation and the creation of discursive cohesion.
These issues affect terminological choices.
Detailed terminological descriptions are needed
to cope with lexical issues (term variation),
to constrain terminological (semantic) and, consequently, translational choices.
The quality of a translation is a matter of functional adequacy (usability in the target system and language and the intended context) rather than linguistic (surface or structural) or even semantic similarity (skopos theory).
24. Current trends in research
Intermediate summary: some research questions
How to improve (or adapt) NLP techniques (lemmatisation, spelling correction/variant detection, compound splitting) for specialised domains?
How can we identify term variants and map them to their “canonical“ counterparts?
Can we use term variants for making (automatic) translation or any other NLP task more fluent?
To what degree are variants detected by TM systems, and can we improve on that?
How can we provide richer semantic descriptions for terms?
25. Current trends in research
Definitions, contexts, knowledge-rich contexts
(ISOCat)
26. Current trends in research
Definitions, contexts, knowledge-rich contexts
Definitions are traditional parts of lexicographic entries and were “inherited“ by terminology (but few resources really provide them).
There are different kinds of definitions and different ways of using them.
Lexicographic definitions explain lexical meanings whereas terminographic definitions describe concepts.
Terminography normally requires richer descriptions than standard definitions.
27. Current trends in research
Definitions, contexts, knowledge-rich contexts
Examples of lexicographic definitions
Linguistics: The scientific study of language
Categorical: Of or belonging to the categories.
- Usually not a complete sentence
- Often only with reduced information (certainly not enough for learning the concept)
- Direct reference to specific lexical units
28. Current trends in research
Definitions, contexts, knowledge-rich contexts
Terminological definitions
Definition types
Intensional definition: relates the concept to its hypernym (class of objects, “genus proximum“) and states how it differs from other hyponyms of the genus proximum (“differentia specifica“); it describes the “intension“ of the concept.
ISO 12620/ISOCat: “A definition which describes the intension of a concept by stating the superordinate concept and the delimiting characteristics.“
Extensional definition (Wüster: “Umfangsdefinition“): enumerates all objects that fall under the category in question, i.e. the “extension“ of the concept.
ISO 12620/ISOCat: “A description of a concept by enumerating all of its subordinate concepts under one criterion of subdivision.“
29. Current trends in research
Definitions, contexts, knowledge-rich contexts
Terminological definitions
Examples
“The planets of the solar system are Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, Neptune and Pluto.“
(Bessé: “Terminological Definitions“. In Wright/Budin 1997, pp. 63-74)
„Defektivum. Wort, das im Vergleich zu anderen Vertretern seiner Klasse ‚defekt‘ ist in bezug (sic!) auf seine grammatische Verwendung, z. B. bestimmte Adjektive wie hiesig, dortig, mutmaßlich, die nur attributiv verwendet werden können.“
[Translation: “Defective word. A word that, compared with other members of its class, is ‘defective‘ with respect to its grammatical use, e.g. certain adjectives such as hiesig, dortig, mutmaßlich, which can only be used attributively.”]
(Bußmann: Lexikon der Sprachwissenschaft)
Many other classifications, see e.g. Cramer 2011
30. Current trends in research
Definitions, contexts, knowledge-rich contexts
Context
Standard category in terminological entries
Important, but under-specified
Context as usage example, e.g. „Photosynthesis takes place primarily in plant leaves, and little to none occurs in stems, etc.”
-> can provide linguistic information (selectional preferences, collocates)
Context as semantic description, e.g. „The parts of a typical leaf include the upper and lower epidermis, the mesophyll, the vascular bundle(s) (veins), and the stomates.”
-> provides semantic information, including information about conceptual relations
(examples from IATE)
31. Current trends in research
Definitions, contexts, knowledge-rich contexts
Knowledge-rich contexts (KRCs, e.g. Meyer 2001)
My take on KRCs
Sentences that provide relevant bits and pieces of information (subject to the definition of relevant semantic relations) that, taken together, can be used for building rich semantic descriptions.
(Intensional or extensional) definitions are subtypes of KRCs.
There is much more information in texts than just restricted types of definitions.
Annotating KRCs in corpora is hard:
Which is the domain?
Which is the definiendum?
Which semantic relations are relevant for (generic or domain-specific) terminological descriptions?
Annotators prefer Aristotelian statements and are biased by the lack or existence of domain knowledge (Cramer 2011, Schumann 2013).
Research results for different languages are mentioned in the references section.
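To make the extraction side concrete, here is a deliberately naive, hypothetical KRC pattern in Python (a single Hearst-style regex; real systems, such as the definition-extraction work cited in the references, use large pattern inventories, parsing and machine learning):

```python
# One "X is a/an/the Y" pattern for catching candidate knowledge-rich
# contexts; definiendum and genus are captured as named groups.
import re

KRC_PATTERN = re.compile(
    r"(?P<definiendum>[A-Z]\w+(?: \w+)*) (?:is|are) (?:a|an|the) (?P<genus>\w+(?: \w+)*)"
)

text = "Photosynthesis is a process used by plants to convert light energy."
m = KRC_PATTERN.search(text)
if m:
    print(m.group("definiendum"), "->", m.group("genus"))
# Photosynthesis -> process used by plants to convert light energy
```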
32. Current trends in research
Usability aspects
How to support terminological workflows?
For which groups of language workers is terminology relevant?
What kind of information do they look for?
Which kinds of software and formats do they use?
Survey (1782 respondents) conducted within the TAAS project (http://www.taas-project.eu/)
Information and graphics provided by KD Schmitz
38. Current trends in research
Intermediate summary
The needs of language workers are rather clear (tools, data formats, time constraints, information needs, …).
Rich terminological descriptions are needed.
Semantic (conceptual) information seems to be more important than linguistic information (a point for Wüster).
However, some linguistic issues need to be handled.
Almost all terminological resources are deficient in the most important types of information (semantic information).
39. Term extraction and term mapping
Term extraction
Standard approach (for European languages)
POS filtering
Statistical filtering against a reference corpus (filtering against a stop list, frequency threshold)
40. Term extraction and term mapping
Term extraction
Statistical scores, e.g.
Tf.idf (cf. Manning/Schütze 1999: 543)
C-value (Frantzi et al. 2000), and many others …
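A small Python sketch of both scores over a toy frequency table (the counts are invented; the nested-term handling of the C-value is simplified, and the NC-value extension with context weighting is omitted):

```python
# Toy illustration of two termhood scores over a candidate frequency table.
import math

def tf_idf(tf: int, n_docs: int, df: int) -> float:
    """tf.idf (cf. Manning/Schuetze 1999): frequency of the candidate
    weighted by the log-inverse of its document frequency."""
    return tf * math.log(n_docs / df)

def c_value(cand: str, freq: dict) -> float:
    """C-value (Frantzi et al. 2000): frequency weighted by candidate
    length; nested candidates are discounted by the average frequency
    of the longer candidates that contain them."""
    longer = [t for t in freq if t != cand and f" {cand} " in f" {t} "]
    discount = sum(freq[t] for t in longer) / len(longer) if longer else 0.0
    return math.log2(len(cand.split())) * (freq[cand] - discount)

freq = {"stainless steel": 8, "austenitic stainless steel": 3}
print(c_value("stainless steel", freq))                        # log2(2) * (8 - 3) = 5.0
print(round(c_value("austenitic stainless steel", freq), 2))   # log2(3) * 3 ≈ 4.75
print(round(tf_idf(tf=8, n_docs=100, df=5), 2))                # 8 * ln(20) ≈ 23.97
```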
41. Term extraction and term mapping
Term extraction
Statistical scores
Zhang et al. (2008) distinguish
unithood measures (mutual information, log-likelihood, t-test etc.)
termhood measures (tf.idf, weirdness, domain pertinence, domain specificity)
Combined methods (e.g. C-value)
They compare several methods
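As a reminder of what the two families look like, here are two standard formulas (textbook versions, not copied from Zhang et al. 2008):

```latex
% Unithood: pointwise mutual information of the words in a candidate pair
\mathrm{PMI}(w_1, w_2) = \log \frac{P(w_1, w_2)}{P(w_1)\,P(w_2)}

% Termhood: "weirdness", the relative frequency of t in the domain corpus
% compared to a general reference corpus
\mathrm{Weirdness}(t) = \frac{f_{\mathrm{dom}}(t)/N_{\mathrm{dom}}}{f_{\mathrm{ref}}(t)/N_{\mathrm{ref}}}
```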
42. Term extraction and term mapping
Term extraction
TermExtractor (Sclano and Velardi 2007) combines several approaches:
Domain pertinence, where D_i is the domain of interest and D_j is a document in another domain
Domain consensus, where norm_freq is a normalised frequency in a domain-specific document
43. Term extraction and term mapping
Term extraction
TermExtractor (Sclano and Velardi 2007) combines several approaches:
Lexical cohesion, where n is the number of words composing a candidate and w_j is a word in the candidate
The final score is a linear combination of the three scores
Information about structural mark-up + a set of heuristics
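The formulas themselves did not survive the conversion of the slides; the following reconstruction follows the variable glosses above and the definitions in Sclano and Velardi (2007), so the notation may differ slightly from the original slides:

```latex
% Domain pertinence of term t for the domain of interest D_i
\mathrm{DP}_{D_i}(t) = \frac{\mathit{freq}(t, D_i)}{\max_{j}\,\mathit{freq}(t, D_j)}

% Domain consensus: entropy of the normalised frequencies of t
% across the documents d of D_i
\mathrm{DC}_{D_i}(t) = -\sum_{d \in D_i} \mathit{norm\_freq}(t, d)\,\log\,\mathit{norm\_freq}(t, d)

% Lexical cohesion of a candidate of n words w_j
\mathrm{LC}_{D_i}(t) = \frac{n \cdot \mathit{freq}(t, D_i)\,\log\,\mathit{freq}(t, D_i)}{\sum_{j} \mathit{freq}(w_j, D_i)}

% Final score: a weighted linear combination of the three measures
\mathit{score}(t) = \alpha\,\mathrm{DP}_{D_i}(t) + \beta\,\mathrm{DC}_{D_i}(t) + \gamma\,\mathrm{LC}_{D_i}(t)
```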
44. Term extraction and term mapping
Term extraction
Nazar and Cabré (2012) present a supervised learning approach to term extraction
Input
A POS-tagged list of domain terms
A reference corpus of general language
45. Term extraction and term mapping
Term extraction
Nazar and Cabré (2012) present a supervised learning approach to term extraction
Algorithm
Calculate the frequency distribution of POS sequences
Calculate the frequency distribution of lexical units (word forms and lemmas)
Calculate character ngrams for each word type
Accept, in the test data, only candidates with frequent POS patterns
Rank candidates with frequent features higher than others
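A minimal Python sketch of the POS-pattern step (toy training data; the lexical and character-ngram features of the full algorithm are omitted):

```python
# Learn frequent POS patterns from a seed list of known terms, then
# filter unseen candidates by pattern frequency; data are toy examples.
from collections import Counter

train = [
    [("wind", "NN"), ("turbine", "NN")],
    [("rotor", "NN"), ("blade", "NN")],
    [("carbon", "NN"), ("steel", "NN")],
    [("austenitic", "JJ"), ("steel", "NN")],
]
pos_freq = Counter(tuple(p for _, p in term) for term in train)

def accept(candidate, min_count=2):
    """Keep a candidate only if its POS pattern was frequent in training."""
    return pos_freq[tuple(p for _, p in candidate)] >= min_count

print(accept([("mass", "NN"), ("flow", "NN")]))        # NN NN seen 3x -> True
print(accept([("stainless", "JJ"), ("steel", "NN")]))  # JJ NN seen once -> False
```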
46. Term extraction and term mapping
Term alignment
Extract term candidates from comparable multilingual corpora and map SL terms onto TL terms
Weller et al. (2011) deal only with neoclassical terms (internationalisms):
Detect candidate equivalents using string similarity
Decompose SL candidates into morphemes (rule-based) and translate the morphemes into the TL
For compounds, split the compound first
Check against the TL candidate list
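A hedged Python sketch of the morpheme route (the greedy splitter and the tiny morpheme dictionary are hypothetical stand-ins for the rule-based decomposition):

```python
# Morpheme-based mapping of neoclassical terms; MORPH is a hypothetical
# toy dictionary of SL (German) -> TL (English) morpheme translations.
MORPH = {
    "kardio": "cardio", "hydro": "hydro", "therm": "therm",
    "logie": "logy", "graphie": "graphy",
}

def decompose(term, inventory):
    """Greedy left-to-right split into known morphemes (a stand-in for
    the rule-based decomposition)."""
    parts, rest = [], term.lower()
    while rest:
        for m in sorted(inventory, key=len, reverse=True):
            if rest.startswith(m):
                parts.append(m)
                rest = rest[len(m):]
                break
        else:
            return None  # unknown material: no neoclassical analysis
    return parts

def map_term(term, tl_candidates):
    """Translate morpheme by morpheme, then check the TL candidate list."""
    parts = decompose(term, MORPH)
    if parts is None:
        return None
    guess = "".join(MORPH[p] for p in parts)
    return guess if guess in tl_candidates else None

print(map_term("Kardiologie", {"cardiology", "hydrography"}))  # cardiology
```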
47. Term extraction and term mapping
Term alignment
Pinnis (2013) presents a context-independent (knowledge-poor) method for term mapping
Pre-processing
Lowercase candidate terms
Apply simple transliteration rules for converting from other scripts to Latin
Find the top N translation equivalents from a probabilistic dictionary
Find the top M transliteration equivalents using Moses character-based MT
48. Term extraction and term mapping
Term alignment
Pinnis (2013) presents a context-independent (resource- and knowledge-poor) method for term mapping
Example of pre-processed terms
49. Term extraction and term mapping
Term alignment
Pinnis (2013) presents a context-independent (resource- and knowledge-poor) method for term mapping
Mapping
For each token in each pre-processed term, find the longest common substring in all other terms‘ constituents
Otherwise, fall back on a Levenshtein-based similarity metric
Maximise overlaps and score them
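A simplified Python sketch of this token-overlap scoring (the threshold and the combination of the two measures are illustrative; the dictionary- and transliteration-based preprocessing is stubbed out):

```python
# Score the overlap of two (lowercased, transliterated) tokens: prefer
# the longest common substring, fall back on Levenshtein similarity.
def longest_common_substring(a: str, b: str) -> int:
    best = 0
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
                best = max(best, table[i][j])
    return best

def levenshtein(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def token_overlap(src: str, tgt: str, min_ratio: float = 0.7) -> float:
    longest = longest_common_substring(src, tgt) / max(len(src), len(tgt))
    if longest >= min_ratio:
        return longest
    return 1 - levenshtein(src, tgt) / max(len(src), len(tgt))

print(round(token_overlap("energija", "energy"), 2))  # 0.62 (Levenshtein fallback)
```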
50. Conclusion of the session
To sum up: You have learned about
The role of terminology in translation and LSP
The theoretical foundations of the discipline
The structure, parts and basic principles of terminological entries
Other kinds of onomasiological resources
Some journals, conferences and other resources
The importance of terminological variation and methods for finding term variants
Semantic differences between concepts/terms that cannot yet be tackled automatically
51. Conclusion of the session
To sum up: You have learned about (continued)
Terminological definitions, contexts and knowledge-rich contexts
The need for rich terminological representations and approaches for providing them
Some practical aspects of terminological workflows
Knowledge-rich and knowledge-poor approaches to term extraction and term mapping
52. References: Literature
Bessé, Bruno de (1997): “Terminological definitions“. In: Wright, Sue Ellen / Budin, Gerhard (eds.): Handbook of Terminology Management. Vol. 1: Basic Aspects of Terminology Management. Amsterdam/Philadelphia: John Benjamins, pp. 63-74.
Bußmann, Hadumod (1990): Lexikon der Sprachwissenschaft. Stuttgart: Kröner.
Cabré, M. Teresa (1998): “Do we need an autonomous theory of terms?“. Terminology 5 (1), pp. 5-19.
Cramer, Irene (2011): Definitionen in Wörterbuch und Text: Zur manuellen Annotation, korpusgestützten Analyse und automatischen Extraktion definitorischer Textsegmente im Kontext der computergestützten Lexikographie. PhD dissertation, University of Dortmund, Germany.
Collet, Tanja (2004): “What’s a term? An attempt to define the term within the theoretical framework of text linguistics”. Linguistica Antverpiensia 3, pp. 99-111.
Daille, Béatrice (2005): “Variations and application-oriented terminology engineering“. Terminology 11 (1), pp. 181-197.
Daille, Béatrice / Habert, Benoît / Jacquemin, Christian / Royauté, Jean (1996): “Empirical observation of term variations and principles for their description“. Terminology 3 (2), pp. 197-257.
53. References: Literature
Del Gaudio, Rosa / Branco, Antonio (2007): “Automatic Extraction of Definitions in Portuguese: A Rule-Based Approach“. In: Neves, José / Santos, Manuel Filipe / Machado, José Manuel (eds): Progress in Artificial Intelligence. Berlin/Heidelberg: Springer, pp. 659-670.
Fahmi, Ismail / Bouma, Gosse (2006): “Learning to Identify Definitions using Syntactic Features“. Workshop on Learning Structured Information in Natural Language Applications at EACL 2006, Trento, Italy, April 3, pp. 64-71.
Fišer, Darja / Pollak, Senja / Vintar, Špela (2010): “Learning to Mine Definitions from Slovene Structured and Unstructured Knowledge-Rich Resources“. LREC 2010, Valletta, Malta, May 19-21, pp. 2932-2936.
Frantzi, Katerina / Ananiadou, Sophia / Mima, Hideki (2000): “Automatic Recognition of Multi-Word Terms: the C-value/NC-value Method“. International Journal on Digital Libraries 3 (2), pp. 115-130.
Freixa, Judit (2006): “Causes of denominative variation in terminology. A typology proposal”. Terminology 12 (1), pp. 51-77.
Geeraerts, Dirk (2010): Theories of Lexical Semantics. Oxford: Oxford University Press.
54. References: Literature
Manning, Christopher D. / Schütze, Hinrich (1999): Foundations of Statistical Natural Language Processing. Cambridge: MIT Press.
Meyer, Ingrid (2001): “Extracting Knowledge-Rich Contexts for Terminography: A conceptual and methodological framework”. In: Bourigault, Didier / Jacquemin, Christian / L’Homme, Marie-Claude (eds.): Recent Advances in Computational Terminology. Amsterdam/Philadelphia: John Benjamins, pp. 279-302.
Malaisé, Véronique / Zweigenbaum, Pierre / Bachimont, Bruno (2005): “Mining defining contexts to help structuring differential ontologies”. Terminology 11 (1), pp. 21-53.
Marshman, Elizabeth (2008): “Expressions of uncertainty in candidate knowledge-rich contexts”. Terminology 14 (1), pp. 124-151.
Muresan, Smaranda / Klavans, Judith (2002): “A Method for Automatically Building and Evaluating Dictionary Resources”. LREC 2002, Las Palmas, Spain, May 29-31, pp. 231-234.
Nazar, Rogelio / Cabré, Maria Teresa (2012): “Supervised Learning Algorithms Applied to Terminology Extraction“. TKE 2012, Madrid, Spain, June 19-22, pp. 209-217.
Pearson, Jennifer (1998): Terms in Context. Amsterdam/Philadelphia: John Benjamins.
Pinnis, Mārcis (2013): “Context Independent Term Mapper for European Languages“. RANLP 2013, Hissar, Bulgaria, September 7-13, pp. 562-570.
55. References: Literature
Przepiórkowski, Adam / Degórski, Łukasz / Spousta, Miroslav / Simov, Kiril / Osenova, Petya / Lemnitzer, Lothar / Kuboň, Vladislav / Wójtowicz, Beata (2007): “Towards the Automatic Extraction of Definitions in Slavic“. BSNLP workshop at ACL 2007, Prague, Czech Republic, June 29, pp. 43-50.
Sclano, Francesco / Velardi, Paola (2007): “TermExtractor: a Web Application to Learn the Shared Terminology of Emergent Web Communities“. TIA 2007, Sophia Antipolis, France, October 8-9.
Schmitt, Peter A. (1999): Translation und Technik. Tübingen: Stauffenburg.
Schumann, Anne-Kathrin (2013): “Collection, Annotation and Analysis of Gold Standard Corpora for Knowledge-Rich Context Extraction in Russian and German“. Student workshop at RANLP 2013, Hissar, Bulgaria, September 7-13, pp. 134-141.
Sierra, Gerardo / Alarcón, Rodrigo / Aguilar, César / Bach, Carme (2008): “Definitional verbal patterns for semantic relation extraction”. Terminology 14 (1), pp. 74-98.
Storrer, Angelika / Wellinghoff, Sandra (2006): “Automated detection and annotation of term definitions in German text corpora”. LREC 2006, Genoa, Italy, May 24-26, pp. 2373-2376.
56. References: Literature
Weller, Marion / Gojun, Anita / Heid, Ulrich / Daille, Béatrice / Harastani, Rima (2011): “Simple methods for dealing with term variation and term alignment“. TIA 2011, Paris, France, November 8-10, pp. 87-93.
Westerhout, Eline (2009): “Definition Extraction using Linguistic and Structural Features“. First Workshop on Definition Extraction at RANLP 2009, Borovets, Bulgaria, September 14-16, pp. 61-67.
Wüster, Eugen (1985): Einführung in die Allgemeine Terminologielehre und terminologische Lexikographie. 2nd edition. Wien: Infoterm.
Zhang, Ziqi / Iria, José / Brewster, Christopher / Ciravegna, Fabio (2008): “A Comparative Evaluation of Term Recognition Algorithms“. LREC 2008, Marrakech, Morocco, May 28-30, pp. 2108-2113.
58. Contributions to this Presentation
Prof. Klaus-Dirk Schmitz, Cologne University of Applied Sciences
Thanks to Dr. Alessandro Cattelan for backing me up!