The document discusses human translation versus machine translation. It notes that while human translation requires skills like language proficiency and cultural knowledge, machine translation relies on linguistics and computer science. The document also outlines some challenges of machine translation, such as ambiguity and complex grammar, and presents examples of how machine translation systems struggle with issues like lexical ambiguity. Resources and tools developed for machine translation are also summarized, including lexical databases and paraphrasing tools.
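Lexical ambiguity, one of the problems mentioned above, can be illustrated with a minimal sketch. It assumes a hypothetical two-sense dictionary entry for the English word "bank". It picks a Spanish translation using a simplified Lesk-style gloss-overlap heuristic. That heuristic is a swapped-in illustration, not the method of any system the document describes.

```python
import re

# Hypothetical sense inventory: each sense of the English word "bank" has a
# Spanish translation and a short gloss used for overlap scoring.
SENSES = {
    "bank": [
        {"translation": "banco", "gloss": "financial institution that accepts deposits and lends money"},
        {"translation": "orilla", "gloss": "sloping land beside a river or lake"},
    ],
}


def choose_translation(word: str, context: str) -> str:
    """Pick the sense whose gloss shares the most words with the context.

    Simplified Lesk-style heuristic; ties go to the first sense listed.
    """
    context_words = set(re.findall(r"[a-z]+", context.lower()))

    def overlap(sense: dict) -> int:
        return len(context_words & set(sense["gloss"].split()))

    return max(SENSES[word], key=overlap)["translation"]


print(choose_translation("bank", "We sat on the bank of the river."))        # orilla
print(choose_translation("bank", "She went to the bank to deposit money."))  # banco
```

A naive word-for-word system would always output the first sense. Even this crude context check avoids that. With realistic text, however, it is easily misled by shared function words and missing glosses.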
Script to Sentiment: on future of Language Technology Mysore latest (Jaganadh Gopinadhan)
The document discusses developments in the field of human language technology (HLT) and its future applications. It notes that HLT is no longer confined to academia and is becoming integrated into information and communication technology products and services in daily life. The document provides an overview of developments in text and speech processing, including machine translation systems and spell checkers. It also discusses the role of open source tools and frameworks in advancing research in HLT, particularly for Indian languages.
While the interpreting profession has had LLD interpreters for many years, these individuals have often had few options for the training and development they need to succeed. Court systems, community services, and healthcare providers have had difficulty supporting these interpreters. The reasons include a lack of awareness of the unique complexities of interpreting and unrealistic expectations. These interpreters must overcome several barriers:
- a lack of equivalents for Westernized terms
- a lack of shared medical concepts among the participants in an encounter
- difficulties in working with other interpreters when relay interpreting is required to facilitate communication among the participants

Often, neither interpreter in a relay encounter has been trained to perform relay interpreting effectively in consecutive or simultaneous mode. This presentation discusses possible strategies and solutions for overcoming these barriers. Interpreter trainers will gain awareness of the complexities these interpreters face and will be presented with practical strategies that they can include in their training programs or workshops.
This training covers concepts and practical techniques. Participants will learn to:
- Identify barriers that LLD interpreters face due to a lack of language equivalencies
- Identify barriers to effectively interpreting in situations that require relay interpreting
- Describe training strategies and solutions that prepare interpreters to overcome these barriers
This document discusses translation studies and provides an outline of key topics. It defines translation as turning a source text into another language. There are two main stages in the translation process: comprehension of the source text and expression of the target text. Direct translation is from a foreign language into your native tongue, while indirect is from your native tongue into a foreign one. Specialized translation requires expertise in a field, while non-specialized does not. Translation techniques at the morphosyntactic and semantic levels are explained, including techniques like loan words, calques, and modulation. Translation can be used as a tool in second language learning by teaching it as a separate skill from acquisition.
The document discusses research on intra-sentential and inter-sentential code-switching between Japanese (L1) and English (L2) in diary entries. The study analyzed 8,797 English words from a diary that a Japanese student kept during a year abroad in the US. It found that English words were used differently depending on whether the switch occurred within a sentence (intra-sentential) or between sentences (inter-sentential). Words in intra-sentential switches tended to be content words such as nouns. They were also longer, less frequent, and more assimilated into Japanese. Words in inter-sentential switches included more function words and were shorter, more frequent, and less assimilated. The study
This document outlines classroom guidelines for learners at the School of Continuing Education (SCE) at the American University in Cairo. It details policies on attendance, punctuality, grading, and how final grades are determined and posted. The key points are:
- Learners must attend at least 75% of class sessions or they will fail the course.
- Learners are expected to arrive on time. Any late arrival beyond the second will count as an absence.
- Grades are based on continual assessments, a final exam, and end-of-term achievement tests. Final grades are posted by student ID number, not name.
Natural language processing (NLP) provides a way for humans to interact with computers and other machines in natural language, including by voice.
Google Search by voice is a well-known example of a product that makes use of natural language processing.
The document discusses natural language processing and some of the key challenges involved. It describes how NLP systems aim to understand human language in written or spoken form by performing tasks like morphological analysis, parsing, semantic analysis, and discourse processing. It also discusses sources of ambiguity in natural language and different models and algorithms used to represent linguistic knowledge and process language, with the goal of building intelligent systems that can understand human communication.
The document describes the key steps in natural language processing including morphological analysis, part-of-speech tagging, lexical processing, syntactic processing, semantic analysis, knowledge representation, discourse analysis, and applications such as machine translation. It outlines techniques for analyzing words, assigning parts of speech, determining word meanings, parsing sentences into syntactic structures, assigning semantic meanings, representing knowledge, analyzing discourse, and translating between languages.
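The first two steps listed above, morphological analysis and part-of-speech tagging, can be sketched in a few lines. This toy version is written under loudly stated assumptions. The lexicon and suffix list are hypothetical. Suffix stripping stands in for real morphological analysis, and dictionary lookup stands in for a trained tagger.

```python
import re

# Hypothetical toy lexicon mapping lemmas to part-of-speech tags.
LEXICON = {"the": "DET", "a": "DET", "dog": "NOUN", "cat": "NOUN",
           "chase": "VERB", "see": "VERB"}
# Suffixes tried, in order, during crude morphological analysis.
SUFFIXES = ("ing", "es", "ed", "s", "d")


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z]+", text.lower())


def lemmatize(token: str) -> str:
    """Strip the first suffix that leaves a known lemma (very crude morphology)."""
    if token in LEXICON:
        return token
    for suffix in SUFFIXES:
        stem = token[: -len(suffix)]
        if token.endswith(suffix) and stem in LEXICON:
            return stem
    return token


def tag(text: str) -> list[tuple[str, str, str]]:
    """Return (token, lemma, part-of-speech) triples; unknown words get 'UNK'."""
    result = []
    for token in tokenize(text):
        lemma = lemmatize(token)
        result.append((token, lemma, LEXICON.get(lemma, "UNK")))
    return result


print(tag("The dogs chased a cat."))
# [('the', 'the', 'DET'), ('dogs', 'dog', 'NOUN'), ('chased', 'chase', 'VERB'), ('a', 'a', 'DET'), ('cat', 'cat', 'NOUN')]
```

Later stages (parsing, semantic analysis, discourse) would consume these triples. Real systems replace both lookups with statistical or neural models.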
The document provides an introduction to programming, discussing what programming is, common myths, how to get started, choosing a language, necessary tools, the importance of writing code, collaborating with other programmers, and concluding that making mistakes is how one becomes clever in programming. It outlines fundamental concepts to understand like algorithms and data structures, recommends starting with a desktop, web, or mobile language like Java, C#, PHP, or C++, and emphasizes the importance of writing and debugging code as the best way to learn.
Resources for linguistically motivated Multilingual Anaphora Resolution (Kepa J. Rodriguez)
This document outlines Kepa Joseba Rodríguez's dissertation on developing resources for linguistically motivated multilingual anaphora resolution. It proposes a new annotation scheme that addresses the limitations of previous schemes in three ways. It annotates all noun phrases, distinguishes referring from non-referring expressions, and captures ambiguity and discontinuous references. The scheme was used to manually annotate English and Italian corpora of newspaper articles, dialogues, Wikipedia pages, and blogs, totaling over 400,000 words. The annotated data is intended to provide features for developing improved anaphora resolution systems in multiple languages.
The document discusses translation services provided by CTS LanguageLink. It covers common file types like videos and eLearning that they can translate and localize. It also describes their translation process, which involves subject matter experts, localization experts, and quality assurance checks. Translation memory software and computer-assisted translation tools are used to help improve accuracy and efficiency. CTS LanguageLink can handle a wide variety of projects from simple documents to complex eLearning courses.
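Translation memory (TM) software, mentioned above, works roughly like this: it looks up a new source segment against previously translated segments and returns the closest match. The following is a minimal sketch of that lookup. It assumes a hypothetical two-entry English-to-Spanish memory and an assumed 75% match threshold. It uses Python's `difflib.SequenceMatcher.ratio()` as a stand-in for the proprietary fuzzy-match scores that commercial CAT tools compute.

```python
from difflib import SequenceMatcher

# Hypothetical translation memory: previously translated English -> Spanish segments.
MEMORY = {
    "Click the Save button.": "Haga clic en el botón Guardar.",
    "Close the window.": "Cierre la ventana.",
}


def tm_lookup(segment, memory, threshold=0.75):
    """Return (source, target, score) for the closest stored segment, or None.

    Similarity is difflib's ratio() on lowercased text (0.0 to 1.0).
    """
    best = None
    for source, target in memory.items():
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if best is None or score > best[2]:
            best = (source, target, score)
    if best is not None and best[2] >= threshold:
        return best
    return None


match = tm_lookup("Click the Cancel button.", MEMORY)
print(match)  # fuzzy match on "Click the Save button." for a translator to edit
```

A fuzzy match is offered to the human translator for editing rather than inserted automatically. This is why TM is usually described as improving efficiency and consistency, not replacing review.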
This document discusses industrial translation and identifies three key areas to consider:
1) the source and target languages;
2) the types of documents being translated, such as technical manuals;
3) the technical area or industry the documents relate to, such as engineering or manufacturing.
This document discusses language service providers and their importance. It notes that Mandarin Chinese, Hindi, and Spanish are the top languages globally. It outlines reasons to use an LSP, such as their familiarity with legal terminology and ability to minimize misunderstandings. The document also provides tips for choosing an LSP, such as considering their language proficiency, reputation, quality control procedures, accuracy, customer service, and scope of services. It distinguishes between different types of language services like interpreting, translation, and certification.
Role of language engineering to preserve endangered languages (Dr. Amit Kumar Jha)
Role of Language Engineering to Preserve Endangered Languages discusses how language engineering can help preserve endangered languages through documentation and digitization. Language engineering is the application of computer science to develop language-related software and hardware. It involves techniques like speech and text processing to develop systems that can understand, interpret, and generate human language. Documenting endangered languages through recording speech samples and collecting texts is important for preservation. Language engineering makes this documentation process easier through tools like speech-to-text, text-to-speech, and transcription tools. It also allows for digital storage of language data, which helps preserve languages for longer as digital data is more durable than other forms of storage. Developing applications that use endangered languages, like translation systems,
INTEGRATION OF PHONOTACTIC FEATURES FOR LANGUAGE IDENTIFICATION ON CODE-SWITC... (kevig)
In this paper, phoneme sequences are used as language information to perform code-switched language identification (LID). A one-pass recognition system converts the spoken input into phonetically arranged sequences of sounds. The acoustic models are robust enough to handle multiple languages when emulating multiple hidden Markov models (HMMs). To determine phoneme similarity among our target languages, we report two methods of phoneme mapping. Statistical phoneme-based bigram language models (LMs) are integrated into speech decoding to eliminate possible phone mismatches. A supervised support vector machine (SVM) learns to recognize the phonetic information of mixed-language speech from the recognized phone sequences. As the SVM makes the back-end decision, it uses the likelihood scores of segments with monolingual phone occurrence to classify language identity. The system was tested on a speech corpus of Sepedi and English, two languages that are often mixed. ASR performance and LID performance were measured separately. The systems obtained promising ASR accuracy with a data-driven phone-merging approach modelled using 16 Gaussian mixtures per state. The proposed systems achieved acceptable ASR and LID accuracy on both code-switched and monolingual speech segments.
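The phonotactic idea behind this work can be sketched as follows: different languages favor different phone bigrams. The sketch assumes made-up phone sequences and hypothetical language labels (`lang_a`, `lang_b`). It trains an add-one smoothed phone bigram model per language and picks the language with the highest log-likelihood. The paper instead uses HMM acoustic models and an SVM back-end, so the max-likelihood decision here is a simpler, swapped-in classifier.

```python
import math
from collections import Counter


def train_bigram(sequences):
    """Count phone bigrams (with boundary markers) for one language."""
    bigrams, histories, vocab = Counter(), Counter(), set()
    for seq in sequences:
        phones = ["<s>", *seq, "</s>"]
        vocab.update(phones)
        for prev, cur in zip(phones, phones[1:]):
            bigrams[(prev, cur)] += 1
            histories[prev] += 1
    return bigrams, histories, vocab


def log_prob(seq, model, vocab_size):
    """Add-one smoothed log P(seq) under a phone bigram model."""
    bigrams, histories, _ = model
    phones = ["<s>", *seq, "</s>"]
    return sum(
        math.log((bigrams[(prev, cur)] + 1) / (histories[prev] + vocab_size))
        for prev, cur in zip(phones, phones[1:])
    )


def identify(seq, models):
    """Return the language whose phone bigram model scores seq highest."""
    vocab = set(seq) | {"<s>", "</s>"}
    for _, _, model_vocab in models.values():
        vocab |= model_vocab
    return max(models, key=lambda lang: log_prob(seq, models[lang], len(vocab)))


# Made-up phone sequences standing in for recognizer output.
models = {
    "lang_a": train_bigram([["k", "a", "k", "a"], ["k", "a"]]),
    "lang_b": train_bigram([["s", "t", "r", "i"], ["s", "t", "i"]]),
}
print(identify(["k", "a", "k", "a", "k", "a"], models))  # lang_a
```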
A ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATION (kevig)
Phonetic typing of Bangla using the English alphabet has become widely popular on social media and chat services. As a result, texts that mix English and Bangla words and phrases have become increasingly common. Existing transliteration tools perform poorly on such texts. This paper proposes a robust Three-stage Hybrid Transliteration (THT) framework that satisfactorily transliterates both English words and phonetically typed Bangla words. It does so by combining dictionary-based and rule-based techniques. Experimental results show that THT significantly outperforms the benchmark transliteration tool.
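The dictionary-plus-rules idea can be illustrated with a minimal sketch: look up known words in a dictionary, and fall back to romanization rules otherwise. The summary does not describe THT's three stages, so this two-step split is an assumption. The one-entry dictionary and the heavily simplified rule table are hypothetical. Real Bangla phonetic typing also needs conjuncts, more vowel signs, and many exceptions.

```python
# Hypothetical whole-word dictionary (English or romanized Bangla -> Bangla script).
WORD_DICT = {
    "phone": "\u09AB\u09CB\u09A8",  # PHA + VOWEL SIGN O + NA
}

# Hypothetical, heavily simplified romanization rules (two-letter keys first).
CONSONANTS = {
    "kh": "\u0996", "bh": "\u09AD", "sh": "\u09B6", "ch": "\u099A",
    "k": "\u0995", "g": "\u0997", "j": "\u099C", "t": "\u09A4",
    "d": "\u09A6", "n": "\u09A8", "p": "\u09AA", "b": "\u09AC",
    "m": "\u09AE", "r": "\u09B0", "l": "\u09B2", "s": "\u09B8",
    "h": "\u09B9",
}
# Vowel -> (independent letter, dependent vowel sign used after a consonant).
VOWELS = {
    "a": ("\u0986", "\u09BE"),  # LETTER AA / VOWEL SIGN AA
    "i": ("\u0987", "\u09BF"),  # LETTER I / VOWEL SIGN I
    "u": ("\u0989", "\u09C1"),  # LETTER U / VOWEL SIGN U
    "e": ("\u098F", "\u09C7"),  # LETTER E / VOWEL SIGN E
    "o": ("\u0985", ""),        # LETTER A / inherent vowel, no sign
}


def rule_based(word):
    """Greedy longest-match conversion of one romanized word."""
    out, i, after_consonant = [], 0, False
    while i < len(word):
        two, one = word[i:i + 2], word[i]
        if two in CONSONANTS:
            out.append(CONSONANTS[two]); i += 2; after_consonant = True
        elif one in CONSONANTS:
            out.append(CONSONANTS[one]); i += 1; after_consonant = True
        elif one in VOWELS:
            independent, sign = VOWELS[one]
            out.append(sign if after_consonant else independent)
            i += 1; after_consonant = False
        else:  # leave unknown characters unchanged
            out.append(one); i += 1; after_consonant = False
    return "".join(out)


def transliterate(text):
    """Dictionary lookup first, rule-based fallback for unknown words."""
    return " ".join(WORD_DICT.get(w) or rule_based(w) for w in text.lower().split())


print(transliterate("ami phone"))  # আমি ফোন
```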
This document describes a tool to convert audio/text to Indian sign language using Python libraries. It discusses using natural language processing and machine learning algorithms to take text or audio as input and output the corresponding sign language video. The tool is being developed as a website to help deaf and hard of hearing people in India communicate. It covers related work on sign language recognition and conversion tools. It then describes the methodology which includes audio to text conversion, searching a database of sign language video clips, and combining clips to generate the output video. Screenshots of the frontend website and examples of inputs and outputs are provided. Future work discussed includes improving the UI and adding mobile apps to make the tool cross-platform.
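The clip-lookup step of this methodology can be sketched as follows: a word found in the database maps to its clip, and other words are fingerspelled letter by letter. The fingerspelling fallback, the clip paths, and the `clip_db` mapping are hypothetical assumptions, not details from the document. The sketch starts from text, so speech-to-text is assumed to have already happened. Stitching the clips into one video (for example with a video-editing library) is not shown. The sketch also maps English words one by one, ignoring the fact that ISL grammar and word order differ from English.

```python
import re


def plan_sign_clips(text, clip_db, letter_dir="clips/letters"):
    """Map each word to a stored sign clip, fingerspelling words with no clip.

    clip_db maps lowercase words to clip paths; per-letter clips under
    letter_dir are assumed to exist. Both are hypothetical.
    """
    clips = []
    for word in re.findall(r"[a-z]+", text.lower()):
        if word in clip_db:
            clips.append(clip_db[word])
        else:
            clips.extend(f"{letter_dir}/{ch}.mp4" for ch in word)
    return clips


print(plan_sign_clips("Hello, Ravi!", {"hello": "clips/hello.mp4"}))
# ['clips/hello.mp4', 'clips/letters/r.mp4', 'clips/letters/a.mp4', 'clips/letters/v.mp4', 'clips/letters/i.mp4']
```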
EXTRACTING LINGUISTIC SPEECH PATTERNS OF JAPANESE FICTIONAL CHARACTERS USING ... (kevig)
This study extracted and analyzed the linguistic speech patterns that characterize Japanese anime and game characters. Conventional morphological analyzers such as MeCab segment words with high performance. However, they cannot segment broken expressions or utterance endings that are missing from their dictionaries, and such forms often appear in the lines of anime and game characters. To overcome this challenge, we propose segmenting these characters' lines into subword units, which were proposed mainly for deep learning. We then extract frequently occurring strings to obtain expressions that characterize their utterances. We analyzed the subword units, weighted by TF-IDF, by gender, age, and individual character, and showed that they are linguistic speech patterns specific to each feature. Additionally, a classification experiment showed that the model using subword units outperformed the model using the conventional method.
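The TF-IDF weighting step can be sketched as follows, under two stated assumptions. First, character bigrams substitute for the learned subword units the study used. Second, all of one character's lines are treated as a single "document", so strings shared by every character get an IDF of zero. The example lines are made up.

```python
import math
from collections import Counter


def char_ngrams(text, n=2):
    """Character n-grams, used here as a stand-in for learned subword units."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def tfidf_by_speaker(lines_by_speaker, n=2):
    """Score each n-gram per speaker, treating a speaker's lines as one document."""
    counts = {
        speaker: Counter(g for line in lines for g in char_ngrams(line, n))
        for speaker, lines in lines_by_speaker.items()
    }
    doc_freq = Counter()
    for counter in counts.values():
        doc_freq.update(counter.keys())
    num_docs = len(counts)
    scores = {}
    for speaker, counter in counts.items():
        total = sum(counter.values())
        scores[speaker] = {
            gram: (count / total) * math.log(num_docs / doc_freq[gram])
            for gram, count in counter.items()
        }
    return scores


def top_patterns(lines_by_speaker, k=3, n=2):
    """Return the k highest-scoring n-grams for each speaker."""
    result = {}
    for speaker, gram_scores in tfidf_by_speaker(lines_by_speaker, n).items():
        ranked = sorted(gram_scores.items(), key=lambda kv: kv[1], reverse=True)
        result[speaker] = [gram for gram, _ in ranked[:k]]
    return result


# Made-up lines for two hypothetical characters.
lines = {"A": ["そうなのだ", "行くのだ"], "B": ["そうですわ", "行きますわ"]}
print(top_patterns(lines, k=1))  # {'A': ['のだ'], 'B': ['すわ']}
```

The shared string そう scores zero for both characters. Each character's repeated utterance ending rises to the top, which is the kind of character-specific pattern the study reports.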
This unit plan focuses on Philippine literature of the contemporary period for a first-year high school English class. Over the course of five lessons, students will:
- read poems and discuss poetic devices
- learn about adjectives and their functions
- complete two checkup tests
- take part in a poetry reading assessment
- engage in descriptive writing

The unit aims to help students understand contemporary Philippine literature and improve their English skills through analyzing poems, public speaking, and other activities.
This study examines how cross-linguistic influences and markedness affect second language learners' acquisition and use of derivational morphemes in English. The study analyzes essays written by TOEFL exam takers with native languages of Spanish, Italian, German, Turkish, Arabic, and Japanese. It looks at instances of the affixes -ness, -ity, -ment, -ful, -less, and -ly to identify errors that may be attributed to cross-linguistic influences or markedness. The results will add to research on how these factors hinder or help the complex process of second language word formation.
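Finding candidate instances of the six affixes in learner essays is a natural first step. This minimal sketch does it with plain suffix matching. It assumes a hypothetical minimum stem length of three characters, which is not from the study. The resulting list would still need manual checking before errors are attributed to cross-linguistic influence or markedness.

```python
import re
from collections import Counter

# The six derivational suffixes examined in the study.
SUFFIXES = ("ness", "ity", "ment", "ful", "less", "ly")


def count_suffixed_words(text, min_stem=3):
    """Count words ending in each target suffix.

    Pure string matching, so non-derived words such as "family" are still
    counted as -ly; the min_stem check only filters very short words like "only".
    """
    counts = Counter()
    for word in re.findall(r"[a-z]+", text.lower()):
        for suffix in SUFFIXES:
            if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
                counts[suffix] += 1
                break
    return counts


essay = "Her happiness and kindness were obviously helpful, but the movement felt careless."
print(dict(count_suffixed_words(essay)))
# {'ness': 2, 'ly': 1, 'ful': 1, 'ment': 1, 'less': 1}
```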
The document describes a project to develop dynamic syllabi for teaching historical languages through eLearning. It discusses the need to support localization for learners of different languages and the challenges of internationalization. It describes the user experience design for the eLearning platform, including how to introduce users to the system, provide goals and feedback, and visualize learning progress. It also discusses using games to cover different tasks involved in digital editing projects, like transcription, translation, and annotation. Finally, it explains how a graph database is used to store and query the interrelated linguistic data from digital editing projects in a scalable way that is optimized for performance.
Spotting The Difference: Machine Versus Human Translation (Ulatus)
Regardless of how much these systems have improved and eased worldwide communication, there is still no substitute for human translation. Machines can ensure grammatical accuracy, but only human speakers can achieve the semantic, linguistic, and cultural completeness of a text.
IRJET - Analysis on Code-Mixed Data for Movie Reviews (IRJET Journal)
This document presents a proposed approach for analyzing and classifying sentiment in code-mixed data, that is, text that combines languages such as Hindi and English. The approach works in four steps:
1) WordNet-based techniques separate the words in each sentence into English and non-English words.
2) The English and non-English words are processed separately to calculate polarity scores.
3) These scores indicate whether the sentiment is positive or negative.
4) The scores are combined to determine the overall sentiment of the sentence.

The proposed system aims to improve on dictionary-based approaches. It uses resources such as WordNet and Hindi SentiWordNet to analyze code-mixed movie review data and classify its overall polarity.
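The split-score-combine pipeline above can be sketched as follows. This sketch rests on stated assumptions. Tiny hypothetical lexicons stand in for WordNet, SentiWordNet-style English scores, and Hindi SentiWordNet. The Hindi is romanized, and a small English word list substitutes for the WordNet-based language separation. Scores are simply summed, which may not be how the proposed system combines them.

```python
import re

# Hypothetical toy resources standing in for WordNet / SentiWordNet-style
# English scores and Hindi SentiWordNet scores (romanized Hindi here).
ENGLISH_POLARITY = {"good": 1.0, "great": 1.0, "bad": -1.0, "boring": -1.0}
HINDI_POLARITY = {"accha": 1.0, "bekaar": -1.0}
ENGLISH_VOCAB = set(ENGLISH_POLARITY) | {
    "the", "movie", "was", "were", "and", "but", "story", "songs", "acting",
}


def sentence_polarity(sentence):
    """Split tokens into English / non-English, score each part, then combine."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    english = [t for t in tokens if t in ENGLISH_VOCAB]
    other = [t for t in tokens if t not in ENGLISH_VOCAB]
    score = (sum(ENGLISH_POLARITY.get(t, 0.0) for t in english)
             + sum(HINDI_POLARITY.get(t, 0.0) for t in other))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(sentence_polarity("Movie bahut accha tha"))  # positive
print(sentence_polarity("The songs were accha but the story was bekaar and boring"))  # negative
```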
Translation Ally: Document and Audio Translator (IRJET Journal)
This document describes a web application called Translation Ally that aims to remove language barriers in business communication by translating documents and audio files between languages. The application allows users to upload files in various formats like text, Word, Excel, and audio, select the source and target languages, and receive a translated file in the target language. The key technologies used include Python, Django, and machine translation APIs to perform the actual translation. A survey of common business document formats found that PDF, Word and Excel were the most widely used. The proposed system aims to preserve formatting of translated documents and support more language pairs compared to existing translation tools.
The document describes the key steps in natural language processing including morphological analysis, part-of-speech tagging, lexical processing, syntactic processing, semantic analysis, knowledge representation, discourse analysis, and applications such as machine translation. It outlines techniques for analyzing words, assigning parts of speech, determining word meanings, parsing sentences into syntactic structures, assigning semantic meanings, representing knowledge, analyzing discourse, and translating between languages.
The document provides an introduction to programming, discussing what programming is, common myths, how to get started, choosing a language, necessary tools, the importance of writing code, collaborating with other programmers, and concluding that making mistakes is how one becomes clever in programming. It outlines fundamental concepts to understand like algorithms and data structures, recommends starting with a desktop, web, or mobile language like Java, C#, PHP, or C++, and emphasizes the importance of writing and debugging code as the best way to learn.
Resources for linguistically motivated Multilingual Anaphora ResolutionKepa J. Rodriguez
This document outlines Kepa Joseba Rodríguez's dissertation on developing resources for linguistically motivated multilingual anaphora resolution. It proposes a new annotation scheme that addresses limitations of previous schemes by annotating all noun phrases, distinguishing referring and non-referring expressions, and capturing ambiguity and discontinuous references. The scheme was used to manually annotate English and Italian corpora containing newspaper articles, dialogues, Wikipedia pages, and blogs totaling over 400,000 words. The annotated data aims to provide features for developing improved anaphora resolution systems in multiple languages.
The document discusses translation services provided by CTS LanguageLink. It covers common file types like videos and eLearning that they can translate and localize. It also describes their translation process, which involves subject matter experts, localization experts, and quality assurance checks. Translation memory software and computer-assisted translation tools are used to help improve accuracy and efficiency. CTS LanguageLink can handle a wide variety of projects from simple documents to complex eLearning courses.
This document discusses industrial translation and provides 3 key areas to consider: 1) The target and source languages for translation. 2) The types of documents being translated such as technical manuals. 3) The technical area or industry that the documents relate to such as engineering or manufacturing.
This document discusses language service providers and their importance. It notes that Mandarin Chinese, Hindi, and Spanish are the top languages globally. It outlines reasons to use an LSP, such as their familiarity with legal terminology and ability to minimize misunderstandings. The document also provides tips for choosing an LSP, such as considering their language proficiency, reputation, quality control procedures, accuracy, customer service, and scope of services. It distinguishes between different types of language services like interpreting, translation, and certification.
Role of language engineering to preserve endangered languagesDr. Amit Kumar Jha
Role of Language Engineering to Preserve Endangered Languages discusses how language engineering can help preserve endangered languages through documentation and digitization. Language engineering is the application of computer science to develop language-related software and hardware. It involves techniques like speech and text processing to develop systems that can understand, interpret, and generate human language. Documenting endangered languages through recording speech samples and collecting texts is important for preservation. Language engineering makes this documentation process easier through tools like speech-to-text, text-to-speech, and transcription tools. It also allows for digital storage of language data, which helps preserve languages for longer as digital data is more durable than other forms of storage. Developing applications that use endangered languages, like translation systems,
INTEGRATION OF PHONOTACTIC FEATURES FOR LANGUAGE IDENTIFICATION ON CODE-SWITC...kevig
In this paper, phoneme sequences are used as language information to perform code-switched language
identification (LID). With the one-pass recognition system, the spoken sounds are converted into
phonetically arranged sequences of sounds. The acoustic models are robust enough to handle multiple
languages when emulating multiple hidden Markov models (HMMs). To determine the phoneme similarity
among our target languages, we reported two methods of phoneme mapping. Statistical phoneme-based
bigram language models (LM) are integrated into speech decoding to eliminate possible phone
mismatches. The supervised support vector machine (SVM) is used to learn to recognize the phonetic
information of mixed-language speech based on recognized phone sequences. As the back-end decision is
taken by an SVM, the likelihood scores of segments with monolingual phone occurrence are used to
classify language identity. The speech corpus was tested on Sepedi and English languages that are often
mixed. Our system is evaluated by measuring both the ASR performance and the LID performance
separately. The systems have obtained a promising ASR accuracy with data-driven phone merging
approach modelled using 16 Gaussian mixtures per state. In code-switched speech and monolingual
speech segments respectively, the proposed systems achieved an acceptable ASR and LID accuracy.
A ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATIONkevig
Phonetic typing using the English alphabet has become widely popular nowadays for social media and chat services. As a result, a text containing various English and Bangla words and phrases has become increasingly common. Existing transliteration tools display poor performance for such texts. This paper proposes a robust Three-stage Hybrid Transliteration (THT) framework that can transliterate both English words and phonetic typed Bangla words satisfactorily. This is achieved by adopting a hybrid approach of dictionary-based and rule-based techniques. Experimental results confirm superiority of THT as it significantly outperforms the benchmark transliteration tool.
This document describes a tool to convert audio/text to Indian sign language using Python libraries. It discusses using natural language processing and machine learning algorithms to take text or audio as input and output the corresponding sign language video. The tool is being developed as a website to help deaf and hard of hearing people in India communicate. It covers related work on sign language recognition and conversion tools. It then describes the methodology which includes audio to text conversion, searching a database of sign language video clips, and combining clips to generate the output video. Screenshots of the frontend website and examples of inputs and outputs are provided. Future work discussed includes improving the UI and adding mobile apps to make the tool cross-platform.
EXTRACTING LINGUISTIC SPEECH PATTERNS OF JAPANESE FICTIONAL CHARACTERS USING ...kevig
This study extracted and analyzed the linguistic speech patterns that characterize Japanese anime or game characters. Conventional morphological analyzers, such as MeCab, segment words with high performance, but they are unable to segment broken expressions or utterance endings that are not listed in the dictionary, which often appears in lines of anime or game characters. To overcome this challenge, we propose segmenting lines of Japanese anime or game characters using subword units that were proposed mainly for deep learning, and extracting frequently occurring strings to obtain expressions that characterize their utterances. We analyzed the subword units weighted by TF/IDF according to gender, age, and each anime character and show that they are linguistic speech patterns that are specific for each feature. Additionally, a classification experiment shows that the model with subword units outperformed that with the conventional method.
This unit plan focuses on Philippine literature during the contemporary period for a first year high school English class. Over the course of 5 lessons, students will read poems and discuss poetic devices, learn about adjectives and their functions, complete two checkup tests, do a poetry reading assessment, and engage in descriptive writing. The unit aims to help students understand contemporary Philippine literature and improve their English skills through analyzing poems, public speaking, and other activities.
This study examines how cross-linguistic influences and markedness affect second language learners' acquisition and use of derivational morphemes in English. The study analyzes essays written by TOEFL exam takers with native languages of Spanish, Italian, German, Turkish, Arabic, and Japanese. It looks at instances of the affixes -ness, -ity, -ment, -ful, -less, and -ly to identify errors that may be attributed to cross-linguistic influences or markedness. The results will add to research on how these factors hinder or help the complex process of second language word formation.
The document describes a project to develop dynamic syllabi for teaching historical languages through eLearning. It discusses the need to support localization for learners of different languages and the challenges of internationalization. It describes the user experience design for the eLearning platform, including how to introduce users to the system, provide goals and feedback, and visualize learning progress. It also discusses using games to cover different tasks involved in digital editing projects, like transcription, translation, and annotation. Finally, it explains how a graph database is used to store and query the interrelated linguistic data from digital editing projects in a scalable way that is optimized for performance.
Spotting The Difference–Machine Versus Human Translation (Ulatus)
Regardless of how much these systems have improved and made worldwide communication easier, there is still no alternative to human translation. Machines can only ensure grammatical accuracy; the semantic, linguistic, and cultural completeness of a text can only be achieved by human speakers.
IRJET - Analysis on Code-Mixed Data for Movie Reviews (IRJET Journal)
This document presents a proposed approach for analyzing and classifying sentiment in code-mixed data, i.e., text containing a combination of languages such as Hindi and English. The approach uses WordNet-based techniques to first separate the words in each sentence into English and non-English words. It then processes the English and non-English words separately to calculate polarity scores, which indicate whether the sentiment is positive or negative. These polarity scores are combined to determine the overall sentiment of the sentence. The proposed system aims to improve on dictionary-based approaches by leveraging resources such as WordNet and Hindi SentiWordNet to analyze code-mixed movie review data and classify its overall polarity.
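The split-then-score pipeline can be sketched as follows. This is a minimal illustration that rests on three assumptions:
- Two tiny hypothetical lexicons stand in for WordNet/SentiWordNet (English) and Hindi SentiWordNet (romanized Hindi).
- A word counts as English simply if the English lexicon contains it.
- The two polarity scores are combined by addition.

The actual system's lookups and combination rule may differ.

```python
# Hypothetical toy lexicons standing in for WordNet/SentiWordNet (English)
# and Hindi SentiWordNet (romanized Hindi); the scores are invented.
ENGLISH_POLARITY = {"the": 0.0, "movie": 0.0, "story": 0.0, "was": 0.0,
                    "good": 1.0, "great": 1.0, "bad": -1.0, "boring": -1.0}
HINDI_POLARITY = {"bahut": 0.0, "tha": 0.0, "hai": 0.0,
                  "accha": 1.0, "bekaar": -1.0}

def split_languages(sentence):
    """Treat a word as English if the English lexicon knows it; otherwise non-English."""
    english, other = [], []
    for word in sentence.lower().split():
        (english if word in ENGLISH_POLARITY else other).append(word)
    return english, other

def sentence_polarity(sentence):
    english, other = split_languages(sentence)
    # Score each language separately, then combine by simple addition (an assumption).
    en_score = sum(ENGLISH_POLARITY[w] for w in english)
    hi_score = sum(HINDI_POLARITY.get(w, 0.0) for w in other)
    total = en_score + hi_score
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"
```

For example, "movie bahut accha tha" ("the movie was very good") splits into one English word and three Hindi words, and the Hindi side carries the positive score.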
Translation Ally: Document and Audio Translator (IRJET Journal)
This document describes a web application called Translation Ally that aims to remove language barriers in business communication by translating documents and audio files between languages. The application allows users to upload files in various formats like text, Word, Excel, and audio, select the source and target languages, and receive a translated file in the target language. The key technologies used include Python, Django, and machine translation APIs to perform the actual translation. A survey of common business document formats found that PDF, Word and Excel were the most widely used. The proposed system aims to preserve formatting of translated documents and support more language pairs compared to existing translation tools.
The document provides a summary of a translator's professional experience and qualifications. The translator has 12 years of experience accurately translating between English and Japanese in various fields such as business, technology, media, and spiritual/religious documents. Recent translation work has been for GenevaWorldwide and Accurate Communications. The translator also has project management experience from their role at Hitachi America assisting with business planning, translation, travel arrangements, and event coordination. Areas of translation expertise include business strategy, websites, patents, animation, law, and audio books.
Programming language design and implementation (Ashwini Awatare)
The document discusses key topics in programming language design and implementation including:
1. The importance of studying programming languages to improve problem solving skills, learn new languages, and understand language design.
2. An overview of different programming paradigms like imperative, functional, object-oriented, and logic-based languages.
3. Factors that influence language design like software architectures, programming environments, internationalization needs, and standardization.
This document provides an overview of natural language processing (NLP). It defines NLP as a branch of artificial intelligence focused on enabling computers to understand, interpret, and generate human language. The document outlines several key NLP applications including sentiment analysis, chatbots, machine translation and text summarization. It also discusses some of the core processes in NLP like tokenization and part-of-speech tagging. Challenges in NLP including ambiguity and context understanding are presented. Recent advances like BERT and transfer learning are noted, as is the potential for improved language models and multimodal NLP in the future.
It’s getting crowded! A critical view of what crowdsourcing can do for terminology as a discipline (TERMCAT)
Barbara Inge Karsch - BIK Terminology
VII EAFT Terminology Summit. Barcelona, 27-28 November 2014
Chingju Cheng(城菁汝), Sophy Chen(陳淑君)
Program Office (計畫辦公室)
Research and Development of Digital Archives and e-Learning Technologies Project (Division1: 數位技術研發與整合計畫)
International Collaboration and Promotion of Taiwan e-Learning & Digital Archives Program (Division 8: 海外推展暨國際合作計畫)
This document describes an audio transcription and text-to-speech system that aims to help people with disabilities or difficulties typing. It outlines the problem of some users not being able to type quickly or see screens clearly. The proposed system would use speech recognition and text-to-speech to allow users to input and output text orally rather than through typing or reading. It would draw on large databases and accurate speech recognition to minimize errors. The system would benefit many users by making digital content more accessible. It provides background on similar existing technologies and discusses how the project would be developed using Python libraries for speech recognition, translation, and text-to-speech.
This document discusses key techniques used by professional translators. It begins by outlining preliminary considerations when taking on a translation project such as ensuring the text is legible and the translator is qualified. It emphasizes that computers are essential tools for translation due to benefits like word processing, storage, and collaboration. The document then details techniques like assessing reference needs, time required, and handling untranslatable text. It stresses the importance of review and proofreading translations. Overall, the document provides guidance on the translation process and techniques used by professional translators.
In this webinar, you will learn how to:
- Recognize key similarities and differences between the oral and written forms of a language and how these play out in interpreting, sight translation and document translation.
- Present techniques to incorporate translation skills into interpreter training or professional development.
- Provide hooks for addressing different levels of literacy among interpreter students or practicing professionals.
HANDLING CHALLENGES IN RULE BASED MACHINE TRANSLATION FROM MARATHI TO ENGLISH (ijnlc)
Machine translation has been researched for quite a long time. However, a flawless machine translator is still a dream, and only a small number of researchers have focused on translating Marathi text into English. Perfect machine translation systems have not yet been built, owing to the fact that languages differ syntactically as well as morphologically. The majority of researchers have opted for statistical machine translation, whereas this paper addresses the challenges of rule-based machine translation. It covers three things:
- the major divergences observed between Marathi and English;
- the many challenges encountered while attempting to build a Marathi-to-English machine translation system using a rule-based approach;
- the rules devised to handle these challenges.

As there are exceptions to the rules and limits to the feasibility of maintaining a knowledge base, practical machine translation from Marathi to English is a complex task.
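One of the syntactic divergences such a system has to handle is word order: Marathi is typically verb-final (SOV), while English is SVO. The sketch below shows a single structural-transfer rule of the kind a rule-based system applies. The role tags are assumed to come from an upstream analyzer (not shown), and the three-entry lexicon is hypothetical. Real Marathi input would also need morphological analysis.

```python
# Hypothetical three-entry transfer lexicon (Marathi -> English).
LEXICON = {"मी": "I", "पुस्तक": "book", "वाचतो": "read"}

def reorder_sov_to_svo(tagged):
    """Apply one structural-transfer rule: SUBJ OBJ VERB -> SUBJ VERB OBJ.
    `tagged` is a list of (word, role) pairs from an assumed upstream analyzer."""
    roles = {role: word for word, role in tagged}
    return [roles[r] for r in ("SUBJ", "VERB", "OBJ") if r in roles]

def translate(tagged):
    # Word-for-word lexical transfer after reordering; unknown words pass through.
    return " ".join(LEXICON.get(w, w) for w in reorder_sov_to_svo(tagged))

sentence = [("मी", "SUBJ"), ("पुस्तक", "OBJ"), ("वाचतो", "VERB")]
```

Translating `sentence` yields "I read book". The missing article shows why a practical system needs further rules beyond reordering.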
The document describes a webinar presentation on teaching sight translation given by Rachel Herring for the National Council on Interpreting in Health Care. The webinar covered an overview of sight translation and its uses, challenges of sight translation as a performance skill, approaches to teaching and practicing it, and considerations around deciding when to sight translate. The webinar provided techniques, tips, and sample scripts for teaching sight translation effectively and addressing common situations interpreters encounter.
The document discusses analyzing communicative events to effectively assess language learning needs. It suggests identifying common functions, vocabulary overlap, and skills across events to prioritize training objectives. A case study examines one learner's job duties and results in a scheme of work focusing on describing problems, processes, and past actions to improve his work performance. The analysis fits materials to learners' needs rather than fitting learners to materials.
Similar to New Tools and Resources to Support Machine Translation
This paper is the result of collaboration between two projects: Emocionário and eSPERTo.
Emocionário aims at organizing emotions in Portuguese and annotating them in corpora. eSPERTo is a paraphrasing system that uses the NooJ linguistic engine, grammars, and lexicons.
The aims of this collaboration were fivefold:
(i) From Emocionário's point of view, an emotion paraphraser would be very useful for identifying more cases of emotions in our corpora.
(ii) From eSPERTo's point of view, adding emotion paraphrases would considerably enhance its paraphrasing power.
(iii) Applying the emotion classification to a hitherto unused application domain would be a good way to evaluate Emocionário's capabilities and shortcomings.
(iv) Both projects would gain from learning more about real paraphrases of emotion in text.
(v) It would be interesting to assess how well the methodology employed to harvest emotion paraphrases from parallel text performs.
This paper presents a comparative study of alignment pairs in the European (EP) and Brazilian (BP) varieties of Portuguese. The pairs are either contrasting expressions or stylistic variants of the same expression. The alignments were collected semi-automatically using the CLUE-Aligner tool, which records in a database all pairs of paraphrastic units resulting from the alignment task. The corpus consists of the children's book "Os Livros Que Devoraram o Meu Pai" (The Books that Devoured My Father) by the Portuguese author Afonso Cruz, together with its Brazilian adaptation. The main goal of the work is to gather equivalent phrasal expressions and different syntactic constructions that convey the same meaning in EP and BP. It also aims to help optimise the editorial processes that text adaptation requires, with results suitable for any type of editorial process. This study provides a scientific basis for future computational work on editing, proofreading, and converting text to and from any variety of Portuguese. In particular, it can serve a paraphrasing system with a variety adaptation functionality, even for literary texts. We also consider cases that are “challenging” from a literary point of view, looking for alternatives that do not tamper with the rich imagery of the original version.
This study proposes a comparative analysis, linguistic but also literary and cultural, of the Portuguese and Brazilian editions of a work of children's and young adult literature: Os Livros que devoraram o meu pai, by the Portuguese author Afonso Cruz. The book appears on the suggested reading lists of the curricula of both Portugal and Brazil. The specific aim is to present and discuss a selection of lexical units, phrases, and sentence structures with an adjectival function that alternate between the two varieties. These are the author's choices in the EP variety and the corresponding solutions adopted in the BP version. The methodology centers on contrastive linguistic analysis, carried out with digital tools based on the eSPERTo project and semi-automatic alignments made with the CLUE-Aligner tool (REF). The corpus consists of the Portuguese and Brazilian editions of the work under study. The general aim of this work is to optimize the editorial processes necessarily involved in adapting texts and to survey the main difficulties of this process. Among other things, this requires awareness of the limits imposed by a literary text, such as the fine line between indispensable adaptation and excessive intervention. Building on the results obtained, we also intend to encourage research into linguistic resources for editing, proofreading, and teaching Portuguese as a first and/or foreign language, among other applications.
This document provides an introduction and welcome message from the local organizers of the 3rd annual enetCollect MC meeting being held in Lisbon, Portugal. The summary includes:
1) The organizers thank the speakers, chairs, members, volunteers, and sponsors for their contributions to the meeting.
2) They introduce the official host, Professor Isabel Trancoso, and provide details on her extensive experience and leadership roles in spoken language processing.
3) The organizers conclude by thanking everyone for their participation in the meeting in Lisbon.
This document discusses using syntactic-semantic analysis for information extraction in biomedicine. It aims to extract biomolecular events like phosphorylation from text. It uses dictionaries of entities and verbs associated with event types, and NooJ grammars to identify events. Evaluation on a shared task dataset shows average recall of 36.76% and precision of 65.58% for six event types. While results are promising, it discusses limitations like manual pattern identification and challenges with more complex event constructions.
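As a worked example, the two reported averages can be combined with the standard F1 formula, the harmonic mean of precision P and recall R. This is the F1 of the averages, which need not equal an average of per-event-type F1 scores. It is therefore only an approximation of an overall figure.

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 0.6558 \times 0.3676}{0.6558 + 0.3676}
    = \frac{0.48214}{1.0234}
    \approx 0.4711
```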
This presentation addresses the problem of translating support verb constructions (SVC), such as fazer uma operação (to make an operation). In particular, it focuses on the MT of biomedical SVC. It argues that paraphrasing can help translate these multiword expressions (MWE) with higher quality. This work is based on my PhD research, which addressed the problem of paraphrasing and translating SVC in general.
ReWriter uses linguistically based automated paraphrasing and text-editing mechanisms to help users with their writing needs by providing suggestions for customized text authoring. It also generates word and phrasal usage data to help guide decision-making. ReWriter can be used in word processing applications or for linguistic quality control of both source and target texts, and it is a useful pre-editor for machine translation. The linguistic resources behind ReWriter, the paraphrasing grammars, and the tools from which ReWriter was derived will also be described. In this particular case, we illustrate ReWriter as a tool for processing legal language.
Poster presented at the 2nd meeting of the COST Action CA16105 - enetCollect : European Network for Combining Language Learning with Crowdsourcing Techniques, which took place at Alexandru Ioan Cuza University, in Iasi, Romania.
The poster shows how chatbots can play an important role in Language Learning applications.
This paper reports our first attempt at integrating eSPERTo's paraphrastic engine, which is based on the NooJ platform, with two application scenarios: a conversational agent and a summarization system. We briefly describe eSPERTo's base resources and the modifications to these resources that were necessary to produce the paraphrases required to feed both systems. Although the improvement observed in both scenarios is not significant, we present a detailed error analysis to further improve the results in future experiments.
This paper presents the automation process of paraphrasing and converting Portuguese constructions typical of informal or spoken language into formal written language. We illustrate this process with examples extracted from the e-PACT corpus that involve the placement of clitic pronouns in verbal compound contexts. Among other constructions, our task consists of paraphrasing and normalizing "vou-lhe/posso-lhe fazer uma surpresa" into "vou/posso fazer-lhe uma surpresa". The literal glosses are 'I will/can_to him/her make a surprise' and 'I will/can make_to him/her a surprise', both meaning 'I will/can make him/her a surprise'. The clitic pronoun "lhe" migrates from an enclitic position after the first verb of the verbal compound to an enclitic position after the main verb. The main verb is the one responsible for selecting that pronominal argument. The first verb is either an auxiliary verb or a volitive verb, e.g. "querer" 'want'. This is a standard revision procedure in EP. Cases like this represent linguistic phenomena where language students and language users in general get confused or stumble. The paper focuses on the general language in which the observed phenomena occur and describes examples of interest found in the corpus. It then presents an automatic solution that normalizes the informal syntactic inadequacies found in these structures into standard formal written structures by applying very generic transformational grammars.
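The core transformation can be approximated with a regular expression. This swaps in a regex technique and is not the paper's NooJ transformational grammars. The sets of trigger verbs and clitics below are small, hypothetical, and far from complete. No proclisis, negation, or mesoclisis contexts are handled.

```python
import re

# Hypothetical, deliberately small sets: first verbs (auxiliary or volitive)
# that trigger clitic migration, and the clitics that may move.
FIRST_VERBS = {"vou", "vais", "vai", "posso", "podes", "pode", "quero", "queres", "quer"}
CLITICS = {"lhe", "lhes", "me", "te", "se", "nos", "vos"}

# first verb + hyphen + clitic, whitespace, then an infinitive ending in -ar/-er/-ir.
PATTERN = re.compile(r"\b(\w+)-(\w+)\s+(\w+[aei]r)\b")

def normalize_clitic(sentence):
    """Move an enclitic from the first verb to the infinitive (main verb)."""
    def move(m):
        verb, clitic, infinitive = m.group(1), m.group(2), m.group(3)
        if verb.lower() in FIRST_VERBS and clitic.lower() in CLITICS:
            return f"{verb} {infinitive}-{clitic}"
        return m.group(0)  # not a trigger context: leave the match unchanged
    return PATTERN.sub(move, sentence)
```

With this sketch, "posso-lhe fazer uma surpresa" becomes "posso fazer-lhe uma surpresa". A sentence with no trigger context, such as "dá-lhe um livro", is left untouched.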
This paper presents the alignment of verbal predicate constructions with the clitic pronoun "lhe" in the European (EP) and Brazilian (BP) varieties of Portuguese, such as in the sentences "Já lhe arrumaram a bagagem" | "Sua bagagem está seguramente guardada" 'His baggage is safely stowed away', where the EP dative proclisis "lhe" contrasts with the BP possessive pronoun "sua". We have selected several different paraphrastic contrasts, such as proclisis and enclisis, clitic pronouns co-occurring with relative pronouns and negation-type adverbs, among other constructions to illustrate the linguistic phenomenon. Some differences correspond to real contrasts between the two Portuguese varieties, while others purely represent stylistic choices. The contrasting variants were manually aligned in order to constitute a gold standard dataset, and a typology has been established to be further enlarged and made publicly available. The paraphrastic alignments were performed in the e-PACT corpus using the CLUE-Aligner tool. The research work was developed in the framework of the eSPERTo project.
This paper presents a detailed analysis of the alignment of Portuguese contractions, based on a previously aligned bilingual corpus. The alignment task was performed manually on a subset of the English-Portuguese CLUE4Translation Alignment Collection. The initial parallel corpus was pre-processed, and for each contraction a decision was made as to whether it should be maintained or decomposed in the alignment. Decomposition was required when the two concatenated words, i.e., the preposition and the determiner or pronoun, belong to two separate translation alignment pairs (PT- [no seio de] [a União Europeia] EN- [within] [the European Union]). Most contractions required decomposition when positioned at the end of a multiword unit. On the other hand, contractions tend to be maintained when they occur at the beginning or in the middle of the multiword unit, i.e., in its frozen part (PT- [no que diz respeito a] EN- [with regard to] or PT- [além disso] EN- [in addition]). A correct alignment of multiwords and phrasal units containing contractions is instrumental for machine translation, paraphrasing, and variety adaptation.
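The decomposition rule described above can be sketched as a function that operates at the boundary between two alignment units. The contraction table is a hypothetical subset. Ambiguous forms such as "nos" are deliberately left out because they need context to resolve ("nos" is the contraction of em + os, but also the pronoun 'us').

```python
# Hypothetical subset of Portuguese preposition + determiner contractions
# (ambiguous forms such as "nos" are omitted on purpose).
CONTRACTIONS = {
    "da": ("de", "a"), "do": ("de", "o"), "das": ("de", "as"), "dos": ("de", "os"),
    "na": ("em", "a"), "no": ("em", "o"), "nas": ("em", "as"),
    "à": ("a", "a"), "ao": ("a", "o"),
}

def split_at_boundary(left, right):
    """If the last token of the left unit is a contraction, decompose it:
    the preposition stays in the left unit and the determiner opens the
    right unit. Contractions at the start or middle of a unit are kept."""
    if left and left[-1].lower() in CONTRACTIONS:
        prep, det = CONTRACTIONS[left[-1].lower()]
        return left[:-1] + [prep], [det] + right
    return left, right
```

Applied to the paper's example, ["no", "seio", "da"] + ["União", "Europeia"] becomes ["no", "seio", "de"] + ["a", "União", "Europeia"]. The initial "no" stays intact because it sits in the frozen part of the unit.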
This paper presents a methodology to extract a paraphrase database for the European and Brazilian varieties of Portuguese. It also discusses a set of paraphrastic categories of multiwords and phrasal units that are extremely relevant to high-quality paraphrasing. Examples are the compounds toda a gente vs todo o mundo "everybody" and the gerundive constructions [estar a + V-Inf] vs [ficar + V-Ger] (e.g., estive a observar vs fiquei observando "I was observing"). The variants were manually aligned in the e-PACT corpus using the CLUE-Aligner tool. The methodology, inspired by the Logos Model, focuses on a semantico-syntactic analysis of each paraphrastic unit and constitutes a subset of the Gold-CLUE-Paraphrases. A larger dataset of paraphrastic contrasts among the distinct varieties of Portuguese is indispensable for variety adaptation, i.e., for dealing with the cultural, linguistic, and stylistic differences between them. Such a dataset makes it possible to convert texts (semi-)automatically from one variety into another, a key function in paraphrasing systems. This topic represents an interesting new line of research with valuable applications in language learning, language generation, question answering, summarization, and machine translation, among others. These paraphrastic units are the first resource of their kind for Portuguese to be made available to the scientific community for research purposes.
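To illustrate one such paraphrastic category, the sketch below rewrites the EP construction [estar a + V-Inf] into the BP variant [ficar + V-Ger] from the example above. The auxiliary table contains only the pair from that example and is otherwise hypothetical. The gerund rule covers only regular -ar/-er/-ir infinitives. The actual methodology relies on manually aligned units, not on regex rules of this kind.

```python
import re

# Only the pair from the example above; any further entries would be hypothetical.
AUX_MAP = {"estive": "fiquei"}

def gerund(infinitive):
    """Regular verbs only: -ar/-er/-ir -> -ando/-endo/-indo."""
    return infinitive[:-1] + "ndo"

# auxiliary + " a " + infinitive ending in -ar/-er/-ir
PATTERN = re.compile(r"\b(\w+) a (\w+[aei]r)\b")

def ep_to_bp(text):
    def repl(m):
        aux, inf = m.group(1), m.group(2)
        if aux.lower() in AUX_MAP:
            return f"{AUX_MAP[aux.lower()]} {gerund(inf)}"
        return m.group(0)  # auxiliary not in the table: leave unchanged
    return PATTERN.sub(repl, text)
```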
Poster presented at the 2nd meeting of the COST Action CA16105 - enetCollect : European Network for Combining Language Learning with Crowdsourcing Techniques, which took place at Alexandru Ioan Cuza University, in Iasi, Romania.
This poster shows paraphrastic suggestions in the eSPERTo paraphrasing system applied to a QA application on a virtual agent and to a summarization tool. It also shows how paraphrases can be used in language learning and the tests envisaged to make eSPERTo a Portuguese learning tool.
The document describes the eSPERTo system, which generates paraphrases for text editing and revision. The main objective of the project is to develop a system capable of identifying and generating paraphrases to improve comprehension, simplify language, and support the learning of Portuguese. The system can be useful in various settings such as education, journalism, and translation.
ReEscreve (in English, ReWriter) is a multi-purpose paraphraser that uses grammar-based paraphrasing capabilities suitable for source and target control (pre- and post-editing) and is useful for human and machine translation.
Spoken Language Systems Lab @ INESC-ID poster presented at the 1st meeting of the COST Action CA16105 - enetCollect : European Network for Combining Language Learning with Crowdsourcing Techniques, which took place at Eurac Research in Bolzano, Italy.
More from INESC-ID (Spoken Language Systems Laboratory - L2F)
AppSec PNW: Android and iOS Application Security with MobSF (Ajin Abraham)
Mobile Security Framework - MobSF is a free and open source automated mobile application security testing environment designed to help security engineers, researchers, developers, and penetration testers to identify security vulnerabilities, malicious behaviours and privacy concerns in mobile applications using static and dynamic analysis. It supports all the popular mobile application binaries and source code formats built for Android and iOS devices. In addition to automated security assessment, it also offers an interactive testing environment to build and execute scenario-based test/fuzz cases against the application.
This talk covers:
Using MobSF for static analysis of mobile applications.
Interactive dynamic security assessment of Android and iOS applications.
Solving Mobile app CTF challenges.
Reverse engineering and runtime analysis of Mobile malware.
How to shift left and integrate MobSF/mobsfscan SAST and DAST in your build pipeline.
"Choosing proper type of scaling", Olena Syrota (Fwdays)
Imagine an IoT processing system that is already quite mature and production-ready, whose client coverage is growing, and for which scaling and performance are life-and-death questions. The system uses Redis, MongoDB, and stream processing based on ksqlDB. In this talk, we will first analyze scaling approaches and then select the proper ones for our system.
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency (ScyllaDB)
Freshworks creates AI-boosted business software that helps employees work more efficiently and effectively. Managing data across multiple RDBMS and NoSQL databases was already a challenge at their current scale. To prepare for 10X growth, they knew it was time to rethink their database strategy. Learn how they architected a solution that would simplify scaling while keeping costs under control.
Generating privacy-protected synthetic data using Secludy and Milvus (Zilliz)
During this demo, the founders of Secludy will demonstrate how their system utilizes Milvus to store and manipulate embeddings for generating privacy-protected synthetic data. Their approach not only maintains the confidentiality of the original data but also enhances the utility and scalability of LLMs under privacy constraints. Attendees, including machine learning engineers, data scientists, and data managers, will witness first-hand how Secludy's integration with Milvus empowers organizations to harness the power of LLMs securely and efficiently.
Conversational agents, or chatbots, are increasingly used to access all sorts of services using natural language. While open-domain chatbots - like ChatGPT - can converse on any topic, task-oriented chatbots - the focus of this paper - are designed for specific tasks, like booking a flight, obtaining customer support, or setting an appointment. Like any other software, task-oriented chatbots need to be properly tested, usually by defining and executing test scenarios (i.e., sequences of user-chatbot interactions). However, there is currently a lack of methods to quantify the completeness and strength of such test scenarios, which can lead to low-quality tests, and hence to buggy chatbots.
To fill this gap, we propose adapting mutation testing (MuT) for task-oriented chatbots. To this end, we introduce a set of mutation operators that emulate faults in chatbot designs, an architecture that enables MuT on chatbots built using heterogeneous technologies, and a practical realisation as an Eclipse plugin. Moreover, we evaluate the applicability, effectiveness and efficiency of our approach on open-source chatbots, with promising results.
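Mutation testing works in three steps: generate mutants by applying fault-emulating operators, run the test scenarios against each mutant, and report how many mutants the scenarios detect ("kill"). The toy chatbot design below illustrates this. Everything in it is a hypothetical simplification, not the paper's operators or its Eclipse plugin:
- the dictionary-based chatbot and its exact-match intent recognition;
- the two mutation operators (deleting a training phrase, swapping two intents' responses).

```python
from copy import deepcopy

# Toy task-oriented chatbot design: intent -> training phrases and response.
BOT = {
    "greet": {"phrases": ["hi", "hello"], "response": "Hello! How can I help?"},
    "book": {"phrases": ["book a flight", "i want to fly"], "response": "Where to?"},
}

def reply(bot, utterance):
    """Exact-match intent recognition (a deliberate simplification)."""
    for intent in bot.values():
        if utterance.lower() in intent["phrases"]:
            return intent["response"]
    return "Sorry, I didn't understand."

def delete_phrase_mutants(bot):
    """Hypothetical operator: drop one training phrase."""
    for name, intent in bot.items():
        for i in range(len(intent["phrases"])):
            mutant = deepcopy(bot)
            del mutant[name]["phrases"][i]
            yield mutant

def swap_response_mutants(bot):
    """Hypothetical operator: swap the responses of two intents."""
    names = list(bot)
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            a, b = names[i], names[j]
            mutant = deepcopy(bot)
            mutant[a]["response"], mutant[b]["response"] = bot[b]["response"], bot[a]["response"]
            yield mutant

def mutation_score(bot, scenarios):
    """Return (killed, total): a mutant is killed if any scenario fails on it."""
    mutants = list(delete_phrase_mutants(bot)) + list(swap_response_mutants(bot))
    killed = sum(
        any(reply(m, said) != expected for said, expected in scenarios)
        for m in mutants
    )
    return killed, len(mutants)

SCENARIOS = [("hi", "Hello! How can I help?"), ("book a flight", "Where to?")]
```

With these scenarios, 3 of the 5 mutants are killed. The two survivors show that the phrases "hello" and "i want to fly" are never exercised, which is the kind of gap a mutation score is meant to expose.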
Driving Business Innovation: Latest Generative AI Advancements & Success Story (Safe Software)
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application... (Alex Pruden)
Folding is a recent technique for building efficient recursive SNARKs. Several elegant folding protocols have been proposed, such as Nova, Supernova, Hypernova, Protostar, and others. However, all of them rely on an additively homomorphic commitment scheme based on discrete log, and are therefore not post-quantum secure. In this work we present LatticeFold, the first lattice-based folding protocol based on the Module SIS problem. This folding protocol naturally leads to an efficient recursive lattice-based SNARK and an efficient PCD scheme. LatticeFold supports folding low-degree relations, such as R1CS, as well as high-degree relations, such as CCS. The key challenge is to construct a secure folding protocol that works with the Ajtai commitment scheme. The difficulty is ensuring that extracted witnesses remain low norm through many rounds of folding. We present a novel technique using the sumcheck protocol to ensure that extracted witnesses are always low norm, no matter how many rounds of folding are used. Our evaluation of the final proof system suggests that it is as performant as Hypernova, while providing post-quantum security.
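To see why norm control is the crux, consider a toy Ajtai-style commitment com(w) = A·w mod q. It is linear, so folding two witnesses as w1 + r·w2 yields a valid commitment to the folded witness. However, the witness norm grows with each fold. The sketch has three important limitations:
- It uses plain integer vectors (an SIS-style toy) rather than Module-SIS over polynomial rings.
- Its parameters are arbitrary and insecure.
- It does not implement LatticeFold's sumcheck-based norm checks.

```python
import random

Q = 3329     # toy modulus (arbitrary choice, no security claimed)
N, M = 4, 8  # commitment length and witness length (toy sizes)

def ajtai_matrix(seed=0):
    """Public random matrix A in Z_Q^{N x M}."""
    rng = random.Random(seed)
    return [[rng.randrange(Q) for _ in range(M)] for _ in range(N)]

def commit(A, w):
    """Ajtai-style commitment A·w mod Q; binding only holds for short w."""
    return [sum(a * x for a, x in zip(row, w)) % Q for row in A]

def fold(w1, w2, r):
    """Fold two witnesses with challenge r: w = w1 + r*w2."""
    return [x + r * y for x, y in zip(w1, w2)]

def inf_norm(w):
    return max(abs(x) for x in w)

A = ajtai_matrix()
w1 = [1, -1, 0, 1, 0, 0, 1, -1]
w2 = [0, 1, 1, -1, 0, 1, 0, 0]
```

Folding w1 and w2 with r = 2 gives a witness whose commitment equals commit(A, w1) + 2·commit(A, w2) mod Q. Its infinity norm, however, grows from 1 to 2. Repeating this over many rounds is what pushes witnesses out of the low-norm regime the commitment's binding property relies on.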
Paper Link: https://eprint.iacr.org/2024/257
Monitoring and Managing Anomaly Detection on OpenShift (Tosin Akinosho)
Overview
Dive into the world of anomaly detection on edge devices with our comprehensive hands-on tutorial. This SlideShare presentation will guide you through the entire process, from data collection and model training to edge deployment and real-time monitoring. Perfect for those looking to implement robust anomaly detection systems on resource-constrained IoT/edge devices.
Key Topics Covered
1. Introduction to Anomaly Detection
- Understand the fundamentals of anomaly detection and its importance in identifying unusual behavior or failures in systems.
2. Understanding Edge (IoT)
- Learn about edge computing and IoT, and how they enable real-time data processing and decision-making at the source.
3. What is ArgoCD?
- Discover ArgoCD, a declarative, GitOps continuous delivery tool for Kubernetes, and its role in deploying applications on edge devices.
4. Deployment Using ArgoCD for Edge Devices
- Step-by-step guide on deploying anomaly detection models on edge devices using ArgoCD.
5. Introduction to Apache Kafka and S3
- Explore Apache Kafka for real-time data streaming and Amazon S3 for scalable storage solutions.
6. Viewing Kafka Messages in the Data Lake
- Learn how to view and analyze Kafka messages stored in a data lake for better insights.
7. What is Prometheus?
- Get to know Prometheus, an open-source monitoring and alerting toolkit, and its application in monitoring edge devices.
8. Monitoring Application Metrics with Prometheus
- Detailed instructions on setting up Prometheus to monitor the performance and health of your anomaly detection system.
9. What is Camel K?
- Introduction to Camel K, a lightweight integration framework built on Apache Camel, designed for Kubernetes.
10. Configuring Camel K Integrations for Data Pipelines
- Learn how to configure Camel K for seamless data pipeline integrations in your anomaly detection workflow.
11. What is a Jupyter Notebook?
- Overview of Jupyter Notebooks, an open-source web application for creating and sharing documents with live code, equations, visualizations, and narrative text.
12. Jupyter Notebooks with Code Examples
- Hands-on examples and code snippets in Jupyter Notebooks to help you implement and test anomaly detection models.
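The tutorial starts from the fundamentals of anomaly detection, so a minimal detector may help make the idea concrete. The rolling z-score rule below, with its window size and threshold, is an assumption chosen for illustration. The presentation's actual model is not described here.

```python
import math
from collections import deque

class RollingZScoreDetector:
    """Flags a reading as anomalous when it lies more than `threshold`
    standard deviations from the mean of the previous `window` readings."""

    def __init__(self, window=5, threshold=3.0):
        self.values = deque(maxlen=window)  # oldest readings drop out automatically
        self.threshold = threshold

    def update(self, x):
        anomalous = False
        if len(self.values) == self.values.maxlen:  # wait until the window is full
            mean = sum(self.values) / len(self.values)
            var = sum((v - mean) ** 2 for v in self.values) / len(self.values)
            std = math.sqrt(var)
            anomalous = std > 0 and abs(x - mean) / std > self.threshold
        self.values.append(x)
        return anomalous

readings = [10, 11, 10, 12, 11, 10, 50, 11]
```

Fed the sample readings one by one, only the spike at 50 is flagged. A deployed version might exclude flagged readings from the window so that one spike does not inflate the statistics.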
Dandelion Hashtable: beyond billion requests per second on a commodity serverAntonios Katsarakis
This slide deck presents DLHT, a concurrent in-memory hashtable. Despite efforts to optimize hashtables, that go as far as sacrificing core functionality, state-of-the-art designs still incur multiple memory accesses per request and block request processing in three cases. First, most hashtables block while waiting for data to be retrieved from memory. Second, open-addressing designs, which represent the current state-of-the-art, either cannot free index slots on deletes or must block all requests to do so. Third, index resizes block every request until all objects are copied to the new index. Defying folklore wisdom, DLHT forgoes open-addressing and adopts a fully-featured and memory-aware closed-addressing design based on bounded cache-line-chaining. This design offers lock-free index operations and deletes that free slots instantly, (2) completes most requests with a single memory access, (3) utilizes software prefetching to hide memory latencies, and (4) employs a novel non-blocking and parallel resizing. In a commodity server and a memory-resident workload, DLHT surpasses 1.6B requests per second and provides 3.5x (12x) the throughput of the state-of-the-art closed-addressing (open-addressing) resizable hashtable on Gets (Deletes).
Fueling AI with Great Data with Airbyte WebinarZilliz
This talk will focus on how to collect data from a variety of sources, leveraging this data for RAG and other GenAI use cases, and finally charting your course to productionalization.
Main news related to the CCS TSI 2023 (2023/1695)Jakub Marek
An English 🇬🇧 translation of a presentation to the speech I gave about the main changes brought by CCS TSI 2023 at the biggest Czech conference on Communications and signalling systems on Railways, which was held in Clarion Hotel Olomouc from 7th to 9th November 2023 (konferenceszt.cz). Attended by around 500 participants and 200 on-line followers.
The original Czech 🇨🇿 version of the presentation can be found here: https://www.slideshare.net/slideshow/hlavni-novinky-souvisejici-s-ccs-tsi-2023-2023-1695/269688092 .
The videorecording (in Czech) from the presentation is available here: https://youtu.be/WzjJWm4IyPk?si=SImb06tuXGb30BEH .
Digital Banking in the Cloud: How Citizens Bank Unlocked Their MainframePrecisely
Inconsistent user experience and siloed data, high costs, and changing customer expectations – Citizens Bank was experiencing these challenges while it was attempting to deliver a superior digital banking experience for its clients. Its core banking applications run on the mainframe and Citizens was using legacy utilities to get the critical mainframe data to feed customer-facing channels, like call centers, web, and mobile. Ultimately, this led to higher operating costs (MIPS), delayed response times, and longer time to market.
Ever-changing customer expectations demand more modern digital experiences, and the bank needed to find a solution that could provide real-time data to its customer channels with low latency and operating costs. Join this session to learn how Citizens is leveraging Precisely to replicate mainframe data to its customer channels and deliver on their “modern digital bank” experiences.
The Microsoft 365 Migration Tutorial For Beginner.pptxoperationspcvita
This presentation will help you understand the power of Microsoft 365. However, we have mentioned every productivity app included in Office 365. Additionally, we have suggested the migration situation related to Office 365 and how we can help you.
You can also read: https://www.systoolsgroup.com/updates/office-365-tenant-to-tenant-migration-step-by-step-complete-guide/
Taking AI to the Next Level in Manufacturing.pdfssuserfac0301
Read Taking AI to the Next Level in Manufacturing to gain insights on AI adoption in the manufacturing industry, such as:
1. How quickly AI is being implemented in manufacturing.
2. Which barriers stand in the way of AI adoption.
3. How data quality and governance form the backbone of AI.
4. Organizational processes and structures that may inhibit effective AI adoption.
6. Ideas and approaches to help build your organization's AI strategy.
Programming Foundation Models with DSPy - Meetup SlidesZilliz
Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.
Programming Foundation Models with DSPy - Meetup Slides
New Tools and Resources to Support Machine Translation
1. Anabela Barreiro
barreiro_anabela@hotmail.com
FLUP & CLUP-Linguateca
New York University
Mestrado em Tradução Jurídica e Empresarial
Anabela Barreiro Lisboa, 8 January 2008
3. Human Translation vs Machine Translation
Human translation and machine translation must be clearly
distinguished by objective and purpose!
•They use different methods
•They apply to different types of texts
•They serve different purposes
•They face different barriers
•They are NOT in competition!
4. Human Translation
Professional translation requires:
•a profound knowledge of the source language and native
proficiency in the target language
•above-average writing skills
•insightful knowledge of the sociocultural aspects of the
source and target languages
•knowledge of the grammar of the two languages, their
writing conventions, and the situational and cultural context
•in the case of scientific and technical translation, subject
matter knowledge, including the terminology of the
field or knowledge domain.
5. Human Translation
Translation theory has long dealt with controversial
issues:
•problems related to privileging meaning over form
•the visibility or invisibility of the translator
•being faithful to the author or making the text
accessible to the reader (and to which kind of reader)
•valuing the source language culture (foreignise) or
adapting the text to the target language culture
(domesticate)
•allowing languages/cultures with more impact to
predominate over languages/cultures with less impact, or being
creative, etc.
6. Human Translation
The most important aspect of translation is defining the
purpose of each translation, which depends on the
characteristics of each text…
… and defining paraphrasing capabilities.
7. Human Translation: Types of Texts
In the translation of literary texts, a certain subjectivity and
distance from the source language text are allowed for the
sake of preserving the artistic and aesthetic aspects of the
target language text [Hermans, 1985] [Landers, 2001].
Literary translation may be considered an ART [Leighton,
1990] [Weaver, 2002], in which the translator has more freedom
of expression.
8. Human Translation: Types of Texts
Technical, commercial, and legal translators, like the
authors of the original texts, are more restrained in their use of
language, and they need to be precise and convey the exact
meaning of the original text.
Technical texts are not meant to be beautiful but rather
to be informative, instructive and explanatory. Their main
function is to be clear, so the easier they are to read, the better
they are understood.
Technical translation may be regarded as a CRAFT
[Newmark, 1988] [Biguenet & Schulte, 1989] for which both
technical and linguistic competence are essential, while creativity
and vagueness are prohibited.
9. Machine Translation
As more translation is performed by machines, new
challenges arise in the field, theoretical traditions are shaken,
and the need to rethink the status of translation becomes more
evident. Of all automated applications, machine translation
most compels us to reconsider the nature of translation.
ART and CRAFT are NOT appropriate concepts for
machine translation, because it necessarily relies on
linguistics and computer science.
10. Machine Translation
1- Automated translation of text or speech from one natural
language into another
2- An important tool that assists human translators
3- It has become available to the general public in the last few
years due to:
• sophisticated computers
• continuous development of computer software capabilities
• internet boom
12. Machine Translation Bottlenecks
1.Complexity of language
2.Ambiguity of language
3.Wordiness (related to text quality)
13. Machine Translation: Limitations
• Delivering high-quality machine translation of certain text
types and of complex linguistic phenomena is difficult
• Humour, sarcasm, and other human feelings expressed through
sophisticated linguistic means are difficult to grasp
• Extra-sentential, extra-textual, and extra-linguistic information
(problems of culture or context) is difficult to handle,
because knowledge of the world cannot be assumed
• Anaphora resolution is difficult
14. Machine Translation Linguistic Challenges
1.Homography
2.Cross-language phenomena (lexical divergences, idioms,
and cross-language syntactic transformations such as
passives)
3.Identification of named entities
4.Capacity to deal with long sentences and wordiness
5.Unusual alterations to the order of words in the target
language
6.Enhanced dictionaries and grammars to recognize and
translate multiword expressions
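Item 6 can be illustrated with a greedy longest-match lookup, a common way to let multiword dictionary entries take priority over word-by-word translation. This is a minimal sketch, not the Port4NooJ/NooJ mechanism: the multiword entries are taken from the Port4NooJ dictionary samples in this deck, while the single-word entries ("o", "está"), the whitespace tokenizer, and the names `segment` and `gloss` are hypothetical.

```python
"""Greedy longest-match lookup of multiword expressions (illustrative sketch)."""

# Multiword entries come from the Port4NooJ samples; "o" and "está" are
# hypothetical single-word entries added only for this demo.
LEXICON = {
    "a curto prazo": "in the short run",
    "fora de serviço": "out of order",
    "há muito tempo": "a long time ago",
    "cabo de vassoura": "broomstick",
    "juntamente com": "along with",
    "o": "the",
    "está": "is",
}
MAX_WORDS = max(len(key.split()) for key in LEXICON)


def segment(sentence):
    """Split a sentence into lexicon units, preferring the longest match."""
    tokens = sentence.split()  # naive whitespace tokenization (assumption)
    units, i = [], 0
    while i < len(tokens):
        # Try the widest window first; a single token is always accepted.
        for n in range(min(MAX_WORDS, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if n == 1 or candidate in LEXICON:
                units.append(candidate)
                i += n
                break
    return units


def gloss(sentence):
    """Replace each unit by its English transfer, leaving unknown words as-is."""
    return [LEXICON.get(unit, unit) for unit in segment(sentence)]


if __name__ == "__main__":
    print(gloss("o cabo de vassoura está fora de serviço"))
    # ['the', 'broomstick', 'is', 'out of order']
```

A word-by-word lookup would instead see "fora", "de", and "serviço" separately and could not produce "out of order".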
15. Machine Translation Linguistic Challenges: Examples
• Handling of ellipsis:
an advanced ambiguity problem related to anaphora
O João visitou muitos países do mundo. A Maria não visitou nenhum.
=> João has visited many countries in the world. Maria hasn’t visited any.
16. Machine Translation Linguistic Challenges: Examples
• Common-noun nuance resolution / homography
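(1) ele não quis tomar partido de ninguém ('he didn't want to take anyone's side')
(2) ele é um bom partido ('he is a good catch')
(3) ele tirou partido da situação ('he took advantage of the situation')
(4) ele pertence a esse partido (político) ('he belongs to that (political) party')
(5) o copo está partido ('the glass is broken')
(6) já esteve em melhor partido ('it has been in better shape')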
17. Machine Translation Linguistic Challenges: Examples
Portuguese source [CdP]: Francisco Vieira adianta ainda que está a fazer um esforço no sentido de tomar uma decisão ainda esta semana, definindo se avança ou não para uma candidatura à RTLRS.
Translation Engine | Translation Result (English)
FreeTranslation | Francisco Scallop advances even if is it do an effort in the sense of take a decision still this week, defined advances or not for a candidacy to the RTLRS.
WorldLingo | advances despite he is to make an effort in the direction to still take a decision this week, defining if he advances or he does not stop a candidacy to the RTLRS.

English source [Compara]: I can't make a decision about anything these days.
Translation Engine | Translation Result (Portuguese)
Google | Eu não posso fazer a uma decisão sobre qualquer coisa estes dias.
Amikai | que eu não posso fazer para uma decisão sobre qualquer coisa estes dias.
FreeTranslation | Eu não posso tomar uma decisão sobre algo estes dias.
Babelfish | Eu não posso fazer a uma decisão sobre qualquer coisa estes dias.
WorldLingo | Eu no posso fazer a uma deciso sobre qualquer coisa estes dias.
E-Translation Server | Não posso tomar uma decisão sobre qualquer coisa estes dias.
18. Multiword Expressions: Support Verb Constructions
A support verb construction (also called a predicate noun construction)
is a multiword expression containing a verb with weak semantic value
and a noun that is the predicate of the sentence.
Predicate nouns can be:
• morphologically related to a verb
fazer uma apresentação de ('make a presentation of') = apresentar ('present')
pay a visit to = to visit
• autonomous
fazer um mestrado ('do a master's degree') - *mestrar
have fun - *to fun
(* marks a non-existent verb)
19. Main Objectives
1.Build a body of lexical, syntactic and semantic knowledge
around support verb constructions
2.Apply this linguistic knowledge to paraphrasing
3.Improve machine translation
20. Outcome: Resources
Port4NooJ
•an open-source, ontology-driven Portuguese linguistic
system that integrates a bilingual extension for
Portuguese-English machine translation
DicTUM
•Dicionário de Termos e Unidades Multipalavra
•a Dictionary of Multiword Expressions
21. Outcome: Tools
ReWriter
•a monolingual paraphraser that pre-edits texts by
paraphrasing them
•Portuguese version: ReEscreve
ParaMT
•a bilingual/multilingual paraphraser to be integrated into
machine translation systems
22. Resources
Port4NooJ - Publicly available at:
http://www.nooj4nlp.net
http://www.linguateca.pt/Repositorio/Port4Nooj/
Based on:
•NooJ linguistic environment (http://www.nooj4nlp.net/)
•OpenLogos English-Portuguese dictionary (http://logos-os.dfki.de/)
OpenLogos is an open-source derivative of the Logos Machine Translation System
Data Used
•COMPARA (http://www.linguateca.pt/COMPARA)
•METRA (http://www.linguateca.pt/metra)
•Other corpora
23. Port4NooJ Dictionaries
Entry fields (callouts in the figure): Lemma, Part of Speech, Inflectional Paradigm (FLX=), Syntactic-Semantic Attributes, English Transfer (EN=)

General dictionary sample representing all PoS, variable and invariable forms:
mesa,N+FLX=CASA+CO+surf+EN=table
cair,V+FLX=ATRAIR+INMO+IntoType+EN=fall
holandês,A+FLX=INGLÊS+AN+lang+EN=Dutch
actualmente,ADV+FLX=FACILMENTE+TEMP+punc+pres+EN=nowadays
alguém,PRO+IMPERS+INDEF+EN=somebody
porque,RELINT+why+EN=why
e,CONJ+JOIN+EN=and
durante,PREP+TEMP+EN=during
cada,DET+IMPERS+INDEF+SG+EN=each
terceiro+NUM+ord+EN=one third

Sample of invariable compounds in the general dictionary:
a curto prazo,ADV+TEMP+EN=in the short run
a favor de,PREP+CAUS+EN=in favor of
cada um,PRO+INDEF+SG+EN=each one
de quem,INT+ThatType+EN=whose
quem quer que seja,REL+WhateverType+EN=whoever
além disso,CONJ+COOR+EN=besides
um quarto,NUM+frac+EN=one fourth

Sample of the dictionary of Terms and Multiword Expressions (DicTUM):
adro da igreja,N+FLX=MENINO+PL+encl+EN=churchyard
cabo de vassoura,N+FLX=MENINO+COtool+EN=broomstick
bebida alcoólica,N+FLX=CASA+MA+liqu+EN=alcoholic drink+UNAMB
bebida alcoólica,N+FLX=CASA+MA+liqu+EN=booze+slang
cor de laranja,A+NAV+Apred+EN=orange
sul-americano,A+FLX=ALTO+AN+des+EN=South American
a curto prazo,ADV+LocTime+TEMP+EN=in the short run
fora de serviço,ADV+STAT+phr+EN=out of order
há muito tempo,ADV+LocTime+TEMP+puncpast+EN=a long time ago
isto é,CONJ+COOR+EN=i.e.
já não,CONJ+COOR+EN=no longer
mesmo assim,CONJ+SUB+EN=even so
juntamente com,PREP+ASSOC+EN=along with
à direita de,PREP+Loc+AT+EN=at the right of
em conformidade com,PREP+ALOG+EN=in congruence with

Sample of the dictionary of Biomedical Terms:
HIV,N+FLX=PORTUGAL+AB+state+IMMUN+EN=HIV
doença maníaco-depressiva,N+FLX=CASA+AB+state+MH+EN=manic-depressive disorder
doença bipolar,N+FLX=CASA+AB+state+MH+EN=bipolar disorder
asma,N+FLX=CASA+AB+state+PULM+EN=asthma

Sample of the dictionary of Proper Names:
Amesterdão,N+PL+city+EN=Amsterdam
Estados Unidos da América,N+PL+coun+EN=United States of America
África,N+PL+cont+EN=Africa
Extremo Oriente,N+PL+othprop+EN=Far East
Mediterrâneo,N+FLX=ANO+PL+water+EN=Mediterranean
Alpes Peninos,N+FLX=ALPES+PL+othprop+EN=Pennine Alps
ONU,N+AN+org+EN=UN
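The entry layout annotated on this slide (lemma, part of speech, inflectional paradigm, syntactic-semantic attributes, English transfer) is regular enough to split mechanically. The sketch below is a minimal reader for lines of this shape, not NooJ's own dictionary compiler; the `Entry` class and `parse_entry` function are hypothetical names, and it assumes that the lemma contains no comma and that no attribute value contains '+'.

```python
"""Split a Port4NooJ-style dictionary line into its fields (illustrative sketch)."""
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Entry:
    lemma: str
    pos: str                                                # part of speech, e.g. N, V, ADV
    flx: Optional[str] = None                               # inflectional paradigm (FLX=...)
    attributes: List[str] = field(default_factory=list)     # bare codes, e.g. CO, surf
    features: Dict[str, str] = field(default_factory=dict)  # other KEY=value pairs

    @property
    def english(self) -> Optional[str]:
        """English transfer (EN=...), if present."""
        return self.features.get("EN")


def parse_entry(line: str) -> Entry:
    """Parse 'lemma,POS+FLX=...+ATTR+...+KEY=value' (assumed layout)."""
    lemma, sep, rest = line.strip().partition(",")
    if not sep:
        raise ValueError(f"missing lemma/POS separator: {line!r}")
    pos, *codes = rest.split("+")
    entry = Entry(lemma=lemma, pos=pos)
    for code in codes:
        key, eq, value = code.partition("=")
        if not eq:
            entry.attributes.append(code)  # syntactic-semantic attribute
        elif key == "FLX":
            entry.flx = value
        else:
            entry.features[key] = value    # EN, VSUP, DRV, ...
    return entry


if __name__ == "__main__":
    e = parse_entry("bebida alcoólica,N+FLX=CASA+MA+liqu+EN=alcoholic drink+UNAMB")
    print(e.lemma, e.pos, e.flx, e.attributes, e.english)
    # bebida alcoólica N CASA ['MA', 'liqu', 'UNAMB'] alcoholic drink
```

Lines that deviate from this layout, such as the terceiro sample, which has no comma after the lemma, are rejected with a ValueError rather than guessed at.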
24. Port4NooJ Dictionaries
Sample of terms classified as Information + Instructional/legal
26. Syntactic-Semantic Ontology
Noun supersets: concrete, mass, animate, place, information, abstract, process (intr), process (tr), measure, time, aspective

Sets and subsets of the CONCRETE noun superset:
• functionals: receptacles, bearing surfaces, links/bridges, thresholds/focal points/barriers, conduits, fasteners, devices/tools, cloth things, structural elements, concretizations of verbals, concretizations of mass nouns, undifferentiated functionals, product/brand names
• agentives: software, vehicles, meters, machines/systems, communication agents, concrete chemical agents, undifferentiated agentives
• natural things: minute flora, plants, trees, trees/wood, miscellaneous natural things
• other concrete sets*: impulses/lights, blemishes/marks, edibles (non-mass), edibles/color, classifiers, amorphous, atomistic, undifferentiated concrete things
*With one exception, these sets have no subsets.
27. Syntactic-Semantic Ontology
Category | Mnemonic | Examples in English | Examples in Portuguese
agentives | CO+undagt | see subsets | see subsets
software | CO+soft | routine | rotina, ficheiro
concrete chemical agents | CO+chem | catalyst, warhead | ácido sulfúrico
machines/systems | CO+mach | battery, camera | máquina fotográfica
vehicles | CO+vehic | truck, ship | automóvel
meters | CO+meter | clock, gauge | manómetro
communication agents | CO+comm | radio, radar | rádio
functionals | CO+undfunc | trinket, ornament | ornamento
devices/tools | CO+tool | pliers | alicate
fasteners | CO+fast | nail, tendon | prego
bearing surfaces | CO+surf | table, shelf | mesa
receptacles | CO+recp | bottle, barrel | garrafa
conduits | CO+cond | chute, artery | artéria
thresholds/focal points/barriers | CO+barr | wall, door | porta
links/bridges | CO+link | circuit, nerve | circuito
cloth things | CO+cloth | shirt, blanket | camisola
structural elements | CO+struc | spar, bone | osso
concretizations of verbals | CO+verb | threading | -
concretizations of mass nouns | CO+mass | acid lining | -
product/brand names | CO+brand | Windows NT | Windows NT
natural things | CO+nat | see subsets | see subsets
minute flora | CO+flora | algae, spore | alga
plants | CO+plant | rose, weed | erva
trees | CO+tree | apple, willow | macieira
trees/wood | CO+trwd | oak, maple | carvalho
misc. natural things | CO+mnat | pebble, iceberg | iceberg
edibles (non-mass) | CO+ednm | pork chop | costoleta
edibles/color | CO+edcol | orange, cherry | laranja
impulses/lights | CO+light | lamp, beam | lâmpada
blemishes/marks | CO+blem | scratch, freckle | sarda
classifiers | CO+class | element | elemento
amorphous | CO+amor | breeze, tide | brisa
atomistic | CO+atom | electron, atom | átomo
undifferentiated | CO+obj | trifle, curio | -
Categories of CONCRETE nouns
28. ME - MEASURE Noun Sets and Subsets
Sets and Subsets | Mnemonics (= SynSem) | Examples
abstract concepts measured by unit | ME+abs | humidity, length
discrete measurable concepts | ME+dis | sum, increment
units of measure | ME+unit | see subsets
units of weight | ME+unit+wt | ounce, pound
units of velocity | ME+unit+vel | mph, megahertz
units of volume measure | ME+unit+vol | gallon, liter
units of temperature | ME+unit+temp | degrees Celsius
units of energy/force | ME+unit+ener | watt, horsepower
measurement systems | ME+unit+sys | Fahrenheit, Kelvin
units of duration | ME+unit+dur | hour, minute, year
specialized units of measure | ME+unit+spec | oersted, ohm, phon
units of money/value | ME+unit+value | dollar, euro, forint
units of linear/area measure | ME+unit+lin | inch, yard, mile
general undifferentiated measure | ME+undif | degree, gross, share
Syntactic-Semantic Ontology: Categories of MEASURE nouns
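In both ontology tables the mnemonics grow by '+'-separated segments, and in the MEASURE table each added segment names a narrower set (ME+unit, then ME+unit+wt). Assuming that convention, a category path can be recovered by looking up every prefix of a mnemonic. The mapping below is a partial transcription of the two tables; treating "CO" and "ME" as the superset codes is an assumption based on the table titles, and `category_path` is a hypothetical name. In the CONCRETE table, subsets such as receptacles are not prefixed by their parent set (functionals), so the path only recovers what the mnemonic itself encodes.

```python
"""Decode '+'-separated SynSem mnemonics into category paths (illustrative sketch)."""

# Partial mapping transcribed from the CONCRETE and MEASURE tables.
# "CO" -> concrete and "ME" -> measure are assumed superset codes.
CATEGORIES = {
    "CO": "concrete",
    "CO+surf": "bearing surfaces",
    "CO+recp": "receptacles",
    "CO+vehic": "vehicles",
    "ME": "measure",
    "ME+abs": "abstract concepts measured by unit",
    "ME+unit": "units of measure",
    "ME+unit+wt": "units of weight",
    "ME+unit+temp": "units of temperature",
    "ME+unit+value": "units of money/value",
}


def category_path(mnemonic):
    """List the category names of every known prefix, from general to specific."""
    parts = mnemonic.split("+")
    path = []
    for end in range(1, len(parts) + 1):
        prefix = "+".join(parts[:end])
        if prefix in CATEGORIES:
            path.append(CATEGORIES[prefix])
    return path


if __name__ == "__main__":
    print(category_path("ME+unit+wt"))
    # ['measure', 'units of measure', 'units of weight']
```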
30. Paraphrasing and Translation Grammars
Translation and bilingual paraphrasing of simple sentences
(graph to translate simple sentences)
31. Verb entries:
• Identification of derivational paradigms for nominalizations
(annotation NDRV) and predicate adjectives (annotation ADRV)
• Link to the derived noun’s support verbs and to the adjective’s
copula verbs (annotations VSUP and VCOP)
adaptar,V+FLX=FALAR+Aux=1+INOP57+Subset=132+EN=adapt+VSUP=fazer+DRV=NDRV00:CANÇÃO
azedar,V+FLX=LIMPAR+Aux=1+OBJTRundif98+Subset=740+EN=sour+VCOP=estar+DRV=ADRV00:ALTO
Explicit Marking of Derivation and Support Verb
32. Adjective entries:
• Identification of derivational paradigms for adverbializations
(annotation AVDRV)
literal,A+FLX=PRINCIPAL+IN+symb+EN=literal+DRV=AVDRV00:LITERALMENTE
Autonomous predicate nouns:
• Identification of autonomous predicate nouns (annotation
Npred)
• Identification of a semantically related verb
curso,N+FLX=ANO+Npred+IN+inst+EN=course+VSUP=tirar+VRB=estudar+NPrep=de+Det=um
Explicit Marking of Derivation and Semantic Verb Association
33. ReWriter: a Monolingual Standalone Paraphraser
Recognition and monolingual paraphrasing
of support verb constructions
(support verb construction / morphologically related lexical verb)
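Recognition and paraphrasing of this kind can be sketched as pattern matching over lemmas. The sketch below derives one pattern per predicate noun from dictionary features of the kind shown for curso (VSUP=tirar, Det=um, NPrep=de, VRB=estudar). It is not ReWriter's implementation: the lemmatized input format, the apresentação record (modelled on "fazer uma apresentação de = apresentar"), and the function names are hypothetical, and re-inflecting the lexical verb to match the support verb is left out.

```python
"""Recognize support verb constructions and paraphrase them with a lexical verb."""
from typing import Dict, List, Tuple

Token = Tuple[str, str]  # (surface form, lemma), assumed to come from a lemmatizer

# "curso" copies the features of the curso entry (VSUP, Det, NPrep, VRB);
# "apresentação" is a hypothetical record added for illustration.
PREDICATE_NOUNS: Dict[str, Dict[str, str]] = {
    "curso": {"VSUP": "tirar", "Det": "um", "NPrep": "de", "VRB": "estudar"},
    "apresentação": {"VSUP": "fazer", "Det": "um", "NPrep": "de", "VRB": "apresentar"},
}


def svc_patterns(nouns: Dict[str, Dict[str, str]]) -> List[Tuple[List[str], str]]:
    """Build a lemma pattern 'support verb + determiner + noun + preposition' per noun."""
    return [([f["VSUP"], f["Det"], noun, f["NPrep"]], f["VRB"]) for noun, f in nouns.items()]


def paraphrase(tokens: List[Token], nouns=PREDICATE_NOUNS) -> List[str]:
    """Replace each recognized SVC by the lemma of its related lexical verb."""
    patterns = svc_patterns(nouns)
    lemmas = [lemma for _, lemma in tokens]
    out: List[str] = []
    i = 0
    while i < len(tokens):
        for pattern, verb in patterns:
            if lemmas[i:i + len(pattern)] == pattern:
                out.append(verb)  # lemma only: tense/person re-inflection not attempted
                i += len(pattern)
                break
        else:  # no pattern starts at position i: keep the surface form
            out.append(tokens[i][0])
            i += 1
    return out


if __name__ == "__main__":
    sentence = [("Ela", "ela"), ("tirou", "tirar"), ("um", "um"),
                ("curso", "curso"), ("de", "de"), ("direito", "direito")]
    print(paraphrase(sentence))  # ['Ela', 'estudar', 'direito']
```

Matching on lemmas rather than surface forms lets one pattern cover every inflection of the support verb (tirou, tira, tirará, and so on).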
35. ReWriter: Application - Interface
Interactive ReWriter
for word processing applications
such as text editing
36. ReWriter: Application - Interface
Interactive ReWriter
for word processing applications
such as text editing
37. ReWriter: Application - Interface
38. ReWriter: Application - Interface
39. ReWriter: Application - Interface
40. ReWriter: Extensibility
1.Applications to General Language
2.Applications to Technical Language
41. ReWriter: Extensibility - Examples
[Paraphrasing adverbials]
à volta da órbita ≡ periorbital (popular versus technical)
around the orbit of the eye ≡ periorbital
[Paraphrasing relative clauses - into adjectival past participles]
N0 que têm sido escritos ≡ N0 que foram descritos ≡ N0 escritos
N0 that have been written ≡ N0 that were described ≡ N0 written
[Paraphrasing if clauses]
se for necessário ≡ se necessário
if it is necessary ≡ if necessary
42. ReWriter: Extensibility - Examples
[Paraphrasing coordinated noun phrases - conjoining
or disjoining]
recursos linguísticos para o ensino e para a investigação
=> ?linguistic resources for teaching and for research
≡ recursos linguísticos para o ensino e a investigação
=> linguistic resources for teaching and research
[Paraphrasing subjunctive clauses - into infinitives]
pedimos o favor que confirme a sua participação
=> *we ask the favor that you confirm your attendance
≡ pedimos o favor de confirmar a sua participação
=> *we ask the favor of confirming your attendance
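Rewrites such as the if-clause and coordinated-noun-phrase examples can be approximated with pattern-based rules. This is a minimal sketch using Python regular expressions; the two rules, their restriction to a few adjectives and articles, and the name `rewrite` are hypothetical, and ReWriter's actual grammars are not reproduced here.

```python
"""Rule-based pre-editing rewrites in the spirit of ReWriter (hypothetical rules)."""
import re

# Each rule is (compiled pattern, replacement template).
RULES = [
    # "se for necessário" -> "se necessário" (only for the listed adjectives)
    (re.compile(r"\bse for (necessário|possível)\b"), r"se \1"),
    # "para o ensino e para a investigação" -> "para o ensino e a investigação"
    (re.compile(r"\bpara (o|a|os|as) (\w+) e para (o|a|os|as) (\w+)"), r"para \1 \2 e \3 \4"),
]


def rewrite(text):
    """Apply every rule in order and return the paraphrased text."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text


if __name__ == "__main__":
    print(rewrite("recursos linguísticos para o ensino e para a investigação"))
    # recursos linguísticos para o ensino e a investigação
    print(rewrite("se for necessário, contacte-nos"))
    # se necessário, contacte-nos
```

Restricting the first rule to listed adjectives keeps it from firing on sentences such as "se for ao cinema", where dropping "for" would be ungrammatical.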
43. ReWriter: Extensibility - Examples
[Paraphrasing marked-up constructions]
se a necessidade do utilizador é criar um texto em linguagem controlada
=> ?if the end-user need is to create controlled language text
≡ se o utilizador necessita de criar um texto em linguagem controlada
=> if the end-user needs to create controlled language text
[Paraphrasing vague, undefined, or null-subject sentences]
(whenever the real subject/actor is known)
[-] houve um grito na rua [N-PRON]/≡ alguém gritou na rua
=> there was shouting in the street [N-PRON]/≡ someone shouted in the street
44. ReWriter: Extensibility - Examples
[Paraphrasing passives - whenever suitable]
Esse livro foi escrito por Saramago em 2008 ≡ Saramago escreveu esse livro em 2008
That book was written by Saramago in 2008 ≡ Saramago wrote that book in 2008
Florida foi atingida por um tornado ≡ Um tornado atingiu a Florida
Florida was hit by a tornado ≡ A tornado hit Florida
O carro foi roubado ≡ Alguém roubou o carro
The car was stolen ≡ Someone stole the car
45. ParaMT: a Bilingual/Multilingual Paraphraser for MT
Recognition and bilingual paraphrasing of support verb constructions
(Portuguese support verb construction / corresponding English verb)
46. Preliminary Quantitative Results
Evaluation of recognition and paraphrasing of support verb constructions: 500 sentences, 100 for each elementary support verb
Support Verb | SVC Recognition Precision | SVC Recognition Recall | SVC Paraphrasing Precision
Pôr | 73/73 - 100% | 73/100 - 73% | 72/73 - 98.6%
Tomar | 75/75 - 100% | 75/100 - 75% | 68/73 - 93.1%
Ter | 65/65 - 100% | 65/100 - 65% | 59/65 - 90.7%
Dar | 57/60 - 95% | 57/100 - 57% | 46/51 - 90.1%
Fazer | 43/45 - 95.5% | 43/100 - 43% | 40/45 - 88.8%
Average | 62.6/63.6 - 98.4% | 62.6/100 - 62.6% | 57/61 - 93.4%
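The ratios can be recomputed directly from the counts. The sketch below assumes, as the table's denominators suggest, that recall is measured against 100 sentences per verb and that the Average row pools the counts (mean numerator over mean denominator). Under that reading it reproduces 98.4% and 62.6%. For paraphrasing, the listed denominators sum to 307 (mean 61.4), which gives about 92.8% rather than 93.4%, so the slide's 57/61 appears to use a rounded denominator; several per-verb figures also look truncated rather than rounded (43/45 is 95.56%, shown as 95.5%).

```python
"""Recompute the SVC evaluation ratios from the counts in the table."""

# (correct recognitions, SVCs recognized, correct paraphrases, paraphrases judged)
COUNTS = {
    "pôr": (73, 73, 72, 73),
    "tomar": (75, 75, 68, 73),
    "ter": (65, 65, 59, 65),
    "dar": (57, 60, 46, 51),
    "fazer": (43, 45, 40, 45),
}
SENTENCES_PER_VERB = 100  # recall denominator used in the table


def percent(numerator, denominator):
    return 100.0 * numerator / denominator


def per_verb_scores(counts):
    """(recognition precision, recognition recall, paraphrasing precision) per verb."""
    return {
        verb: (
            percent(ok_rec, found),
            percent(ok_rec, SENTENCES_PER_VERB),
            percent(ok_par, judged),
        )
        for verb, (ok_rec, found, ok_par, judged) in counts.items()
    }


def pooled_scores(counts):
    """Micro-average: sum numerators and denominators before dividing."""
    ok_rec = sum(c[0] for c in counts.values())
    found = sum(c[1] for c in counts.values())
    ok_par = sum(c[2] for c in counts.values())
    judged = sum(c[3] for c in counts.values())
    total = SENTENCES_PER_VERB * len(counts)
    return percent(ok_rec, found), percent(ok_rec, total), percent(ok_par, judged)


if __name__ == "__main__":
    for verb, scores in per_verb_scores(COUNTS).items():
        print(verb, ["%.2f" % s for s in scores])
    print("pooled", ["%.2f" % s for s in pooled_scores(COUNTS)])
    # pooled ['98.43', '62.60', '92.83']
```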
47. Conclusions
Linguistic knowledge applied to a machine
translation system improves its output quality.
Effective results from linguistically based research
on paraphrases can save substantial effort and
resources employed by machine translation systems.
48. Thank you for your attention!
Acknowledgements
This work was partly supported by grant SFRH/BD/14076/2003
from Fundação para a Ciência e a Tecnologia, co-financed by
POSI and partly by Fundação para a Computação Científica
Nacional.
Editor's Notes
Good afternoon! My name is AB, and I am a PhD student working on MT. I am affiliated with Universidade do Porto-Linguateca and New York University. My interests have centered on MT since working on a commercial MT system for over 7 years. In this presentation, I will introduce ParaMT, a paraphraser applied to machine translation, which was developed during my research work.
Outline: first, an introduction distinguishing HT from MT; then, the resources and tools developed within the scope of my PhD research.
Human translation cannot be replaced by machine translation, at least not until there are breakthroughs in artificial intelligence and in overcoming machine translation's limitation to sentence-level translation.
Some facts about machine translation: for most of human history, translation was an exclusively human activity. Until recently, machine translation was accessible only to a very restricted niche of the market, and computer-aided translation was used only by professional translators.
Despite the availability of funding and many talented researchers worldwide, most efforts to build cost-effective, industrial-strength, high-quality machine translation have fallen short of their goals since the first attempts in the 1950s. Successful machine translation has been difficult to achieve because of two major hurdles: the complexity and the ambiguity of language.
Human translators use ingenuity and skill to artfully reproduce feeling and sound. Human beings can easily and intuitively retrieve information from even distant parts of the text, or from extra-textual knowledge. This is the big advantage human translators have over machines and one of the reasons why human and machine translation do not compete. Human translation can easily solve most ambiguity problems as well as problems of culture and context that machine translation cannot. Human use of language assumes knowledge of the world or of user-centric particular worlds, something that is difficult or impossible to program into a machine. Machine translation functions best in situations where the writer cannot assume shared knowledge of the world. Domain-specific contexts and controlled language make machine translation reliable, demonstrating the advantages of a scientific approach to language.
More challenging grey areas for machine translation are anaphora resolution, common-noun nuance resolution, and the handling of ellipsis, which constitute a class of more advanced ambiguity problems. To solve them, analysis must go beyond the sentence level to the extra-sentential (discourse) level, because they involve referential associations between utterances in neighboring clauses or elsewhere in the text. These are typical problems in machine translation, where systems often produce errors.
"Bom partido" can also be considered a compound, and "tirar partido de" a fixed or semi-fixed expression.
A support verb construction is a predicate noun construction whose main verb has weak semantic value, as in "make a decision", where the meaning is carried by the noun. Support verb constructions are an area where statistics tend to “trap” systems: if a statistical system is not sensitive to these constructions, it may produce misleading translations. Linguistic knowledge about support verb constructions can supply a statistical system with special training data to correct this problem.
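As an illustration of what "linguistic knowledge about support verb constructions" can look like in practice, the sketch below rewrites a few English SVCs as their lexical-verb equivalents ("made a decision" becomes "decided"). The SVC table, the toy inflection rules and the function names are hypothetical; the pattern only covers "support verb + article + noun", and a real system would rely on a full morphological dictionary.

```python
import re

# Hypothetical SVC table: (support verb lemma, predicate noun) -> lexical verb lemma.
SVC_TO_VERB = {
    ("make", "decision"): "decide",
    ("take", "walk"): "walk",
    ("give", "presentation"): "present",
}

# Toy morphology (hypothetical): support verb form -> lemma, and form -> tense.
SUPPORT_FORMS = {
    "make": "make", "makes": "make", "made": "make",
    "take": "take", "takes": "take", "took": "take",
    "give": "give", "gives": "give", "gave": "give",
}
TENSE_OF = {"made": "past", "took": "past", "gave": "past",
            "makes": "3sg", "takes": "3sg", "gives": "3sg"}

# Support verb form, an article, then the predicate noun.
PATTERN = re.compile(r"\b(\w+) (?:a|an|the) (\w+)\b")


def inflect(lemma: str, tense: str) -> str:
    """Very rough regular inflection, sufficient for the toy verbs above."""
    if tense == "past":
        return lemma + "d" if lemma.endswith("e") else lemma + "ed"
    if tense == "3sg":
        return lemma + "s"
    return lemma


def paraphrase_svcs(text: str) -> str:
    def replace(match: re.Match) -> str:
        verb_form, noun = match.group(1).lower(), match.group(2).lower()
        target = SVC_TO_VERB.get((SUPPORT_FORMS.get(verb_form), noun))
        if target is None:
            return match.group(0)  # not a known SVC: leave text unchanged
        return inflect(target, TENSE_OF.get(verb_form, "base"))

    return PATTERN.sub(replace, text)


print(paraphrase_svcs("She made a decision."))
```

Pairs produced this way (SVC sentence, lexical-verb sentence) are one possible form of the "special training data" mentioned above.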
Motivated by this desire to see better results, I set out three main objectives (read objectives 1, 2 and 3 on the slide).
The outcome of this work was a set of new resources (the Port4NooJ system) and two new automated tools that recognize multiword expressions and generate paraphrases of them: ReWriter and ParaMT. Both paraphrasers are based on Port4NooJ resources.
In any language processing application, linguistic resources are the foundation; in machine translation especially, they are the driving force behind the translation process. Port4NooJ is built on two original sources: the NooJ linguistic environment and the OpenLogos lexical resources. Resources from Linguateca were also used.
The system includes several dictionaries. The structure of the dictionary is XXX
I will skip this slide on the inflectional and derivational descriptions.
This slide presents a local grammar for the analysis and recognition of constructions with elementary support verbs (SVCs), together with the monolingual paraphrasing that can be seen in the concordance. Side by side, we find the SVC on the left and an equivalent lexical verb on the right.
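A NooJ local grammar is a finite-state graph; the sketch below is only a rough Python analogue of one path such a graph might encode (elementary support verb, optional determiner, optional adjective, predicate noun), producing concordance rows with the SVC on the left and an equivalent lexical verb on the right. All dictionary entries and the function `concordance` are hypothetical, and the right-hand column shows the lexical verb lemma rather than an inflected form.

```python
import re

# Hypothetical toy dictionaries (a real system uses full NooJ dictionaries).
SUPPORT_LEMMA = {
    "faz": "fazer", "fez": "fazer", "fazer": "fazer", "fizeram": "fazer",
    "dá": "dar", "deu": "dar", "dar": "dar", "deram": "dar",
    "toma": "tomar", "tomou": "tomar", "tomar": "tomar",
}
DETERMINERS = {"um", "uma", "o", "a"}
ADJECTIVES = {"breve", "longo", "longa", "rápido", "rápida"}
SVC_TABLE = {
    ("fazer", "apresentação"): "apresentar",
    ("fazer", "análise"): "analisar",
    ("dar", "passeio"): "passear",
    ("tomar", "decisão"): "decidir",
}


def concordance(sentence: str) -> list[tuple[str, str]]:
    """Return (SVC, lexical verb) rows for each SVC found in the sentence."""
    tokens = re.findall(r"\w+", sentence.lower())  # \w is Unicode-aware
    rows = []
    for i, tok in enumerate(tokens):
        lemma = SUPPORT_LEMMA.get(tok)
        if lemma is None:
            continue
        j = i + 1
        if j < len(tokens) and tokens[j] in DETERMINERS:  # optional determiner
            j += 1
        if j < len(tokens) and tokens[j] in ADJECTIVES:  # optional adjective
            j += 1
        if j < len(tokens) and (lemma, tokens[j]) in SVC_TABLE:
            rows.append((" ".join(tokens[i:j + 1]), SVC_TABLE[(lemma, tokens[j])]))
    return rows


for svc, verb in concordance("O médico fez uma breve apresentação do caso"):
    print(f"{svc:<30} | {verb}")
```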
This slide shows another concordance, this time for the recognition and paraphrasing of elementary SVCs that co-occur with predicate nouns from the biomedical domain. The SVC is on the left, and on the right its paraphrases: an equivalent lexical verb or a stylistic variant of the construction. Such a variant can be built with a non-elementary support verb, such as efectuar ("carry out") or realizar ("perform"), or, when the subject of the SVC must be a patient, with a construction such as "sujeitar-se a" or "submeter-se a" ("undergo").
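The choice of variant described above depends on a property of the entry: whether the subject must be a patient. The sketch below encodes that rule. The `BiomedSVC` record, its field names and the two sample entries are hypothetical illustrations, not entries taken from Port4NooJ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BiomedSVC:
    support_verb: str            # elementary support verb (lemma)
    article: str                 # article agreeing with the noun
    noun: str                    # biomedical predicate noun
    lexical_verb: Optional[str]  # equivalent lexical verb, if any
    patient_subject: bool        # True if the SVC's subject must be a patient


def paraphrases(svc: BiomedSVC) -> list[str]:
    """Right-hand column of the concordance for one SVC."""
    np = f"{svc.article} {svc.noun}"
    variants = []
    if svc.lexical_verb:
        variants.append(svc.lexical_verb)
    if svc.patient_subject:
        # Patient subjects: "undergo"-type constructions.
        variants += [f"submeter-se a {np}", f"sujeitar-se a {np}"]
    else:
        # Otherwise: non-elementary support verbs as stylistic variants.
        variants += [f"efectuar {np}", f"realizar {np}"]
    return variants


# Hypothetical sample entries.
RADIOGRAFIA = BiomedSVC("fazer", "uma", "radiografia", "radiografar", False)
CIRURGIA = BiomedSVC("fazer", "uma", "cirurgia", None, True)

for entry in (RADIOGRAFIA, CIRURGIA):
    left = f"{entry.support_verb} {entry.article} {entry.noun}"
    print(f"{left:<25} | {', '.join(paraphrases(entry))}")
```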
One possible application of ReWriter is interactive use in word processing: paraphrases can be offered in much the same way as synonyms are offered in a text editor.
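A thesaurus-style interaction needs two operations: find the paraphrasable span under the cursor, and replace it with the suggestion the user picks. The sketch below shows that loop under simplifying assumptions: the paraphrase table is hypothetical and keyed by exact surface forms (no inflection), and `suggest_at` and `apply_suggestion` are illustrative names, not ReWriter's interface.

```python
import re
from typing import Optional

# Hypothetical paraphrase table keyed by surface form.
PARAPHRASES = {
    "fazer uma apresentação": ["apresentar"],
    "tomar uma decisão": ["decidir"],
    "dar um passeio": ["passear"],
}
# Longest keys first; \b prevents matches inside words such as "retomar".
PATTERN = re.compile(
    r"\b(?:"
    + "|".join(re.escape(k) for k in sorted(PARAPHRASES, key=len, reverse=True))
    + r")\b"
)


def suggest_at(text: str, cursor: int) -> Optional[tuple[int, int, list[str]]]:
    """Return (start, end, suggestions) for the SVC under the cursor, or None."""
    for m in PATTERN.finditer(text):
        if m.start() <= cursor < m.end():
            return m.start(), m.end(), PARAPHRASES[m.group(0)]
    return None


def apply_suggestion(text: str, start: int, end: int, replacement: str) -> str:
    """Replace text[start:end] with the chosen paraphrase."""
    return text[:start] + replacement + text[end:]


text = "Queremos tomar uma decisão hoje."
hit = suggest_at(text, text.index("decisão"))
if hit:
    start, end, options = hit
    print(apply_suggestion(text, start, end, options[0]))
```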
The concordance on this slide illustrates bilingual PT-EN recognition and paraphrasing of SVCs. On the left is the SVC in Portuguese and on the right an equivalent lexical verb in English.
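The bilingual case can be sketched as a lookup from a Portuguese SVC (support verb lemma plus predicate noun) to an English lexical verb. This assumes the SVCs have already been recognized, with the support verb as the first token and the predicate noun as the last; the table entries and the function `align` are hypothetical illustrations.

```python
# Hypothetical bilingual table: (PT support verb lemma, PT noun) -> EN lexical verb.
BILINGUAL_SVC = {
    ("fazer", "apresentação"): "present",
    ("fazer", "análise"): "analyze",
    ("tomar", "decisão"): "decide",
    ("dar", "passeio"): "stroll",
}
# Toy support verb morphology (hypothetical).
SUPPORT_LEMMA = {
    "fez": "fazer", "faz": "fazer", "fazer": "fazer",
    "tomou": "tomar", "toma": "tomar", "tomar": "tomar",
    "deu": "dar", "dá": "dar", "dar": "dar",
}


def align(phrases: list[str]) -> list[tuple[str, str]]:
    """Pair each recognized PT SVC with its EN lexical verb, if known."""
    rows = []
    for phrase in phrases:
        tokens = phrase.lower().split()
        if not tokens:
            continue
        lemma = SUPPORT_LEMMA.get(tokens[0])
        english = BILINGUAL_SVC.get((lemma, tokens[-1]))
        if english:
            rows.append((phrase, english))
    return rows


for pt, en in align(["fez uma análise", "tomou a decisão", "viu um filme"]):
    print(f"{pt:<20} | {en}")
```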