This presentation describes the integration of the lexicon-grammar of predicate nouns with the support verb "fazer" ("to do" or "to make") into Port4NooJ, the Portuguese language module for NooJ. Port4NooJ resources are used by the eSPERTo system to generate paraphrases, i.e., alternative ways to say or write the same sentence.
This paper is the result of collaboration between two projects: Emocionário and eSPERTo.
Emocionário aims at organizing emotions in Portuguese and annotating them in corpora. eSPERTo is a paraphrasing system that uses the NooJ linguistic engine, grammars, and lexicons.
The aims of this collaboration were fivefold: (i) from Emocionário’s point of view, it would be very useful to have an emotion paraphraser to help us identify more cases of emotions in our corpora; (ii) from eSPERTo’s point of view, adding emotion paraphrases would considerably enhance its paraphrasing power; (iii) applying the emotion classification to a hitherto unused application domain would be a good way to evaluate Emocionário’s capabilities and shortcomings; (iv) both projects would gain from learning more about real paraphrases of emotion in text; and (v) it is interesting to assess how good the methodology employed to harvest emotion paraphrases from parallel text is.
This paper presents a comparative study of alignment pairs, either contrasting expressions or stylistic variants of the same expression, in the European (EP) and Brazilian (BP) varieties of Portuguese. The alignments were collected semi-automatically using the CLUE-Aligner tool, which records all pairs of paraphrastic units resulting from the alignment task in a database. The corpus was the children’s book "Os Livros Que Devoraram o Meu Pai" (The Books that Devoured My Father) by the Portuguese author Afonso Cruz, together with the Brazilian adaptation of this book. The main goal of the work is to gather equivalent phrasal expressions and different syntactic constructions that convey the same meaning in EP and BP, and to contribute to the optimisation of the editorial processes required in the adaptation of texts, although the results are suitable for any type of editorial process. This study provides a scientific basis for future computational work on editing, proofreading, and converting text to and from any variety of Portuguese, in particular for use in a paraphrasing system with a variety adaptation functionality, even for literary texts. We consider cases that are “challenging” from a literary point of view, looking for alternatives that do not tamper with the rich imagery of the original version.
This study proposes a comparative analysis (linguistic, but also literary and cultural) of the Portuguese and Brazilian editions of a work of children’s and young adult literature, Os Livros que devoraram o meu pai, by the Portuguese author Afonso Cruz, which appears on the recommended reading lists of the school curricula of both Portugal and Brazil. The specific objective is to present and discuss a selection of lexical units, phrases, and sentence structures with an adjectival function that alternate between the two varieties, that is, between the author’s choices in the EP variety and the corresponding solutions adopted in the BP version. The chosen methodology centres on contrastive linguistic analysis, carried out with the help of digital tools based on the eSPERTo project, using semi-automatic alignments produced with the CLUE-Aligner tool (REF). The corpus consists of the Portuguese and Brazilian editions of the work under study. The general objective of this work is to optimise the editorial processes necessarily involved in adapting texts, as well as to survey the main difficulties of that process. This implies, among other things, an awareness of the limits imposed by a literary text, such as the fine line between indispensable adaptation and excessive intervention. Building on the results achieved, we also intend to encourage research on linguistic resources for editing, proofreading, and the teaching of Portuguese as a mother tongue and/or foreign language, among other applications.
This presentation addresses the problem of translating support verb constructions (SVC), such as fazer uma operação (to make an operation). In particular, it focuses on the machine translation (MT) of biomedical-related SVC. It argues that paraphrasing can help translate these multiword expressions (MWE) with higher quality. This work is based on my PhD research, which addressed the problem of paraphrasing and translating SVC in general.
ReWriter uses linguistically based automated paraphrasing and text-editing mechanisms to help users with their writing needs by providing suggestions for customized text authoring. It also generates word and phrasal usage data to help guide decision-making. ReWriter can be used in word processing applications or in linguistic quality control for both source and target texts, and it is a useful pre-editor for machine translation. The linguistic resources behind ReWriter, the paraphrasing grammars, and the tools from which ReWriter was derived will also be described. In this particular case, we illustrate ReWriter as a tool to process legal language.
Poster presented at the 2nd meeting of the COST Action CA16105 - enetCollect : European Network for Combining Language Learning with Crowdsourcing Techniques, which took place at Alexandru Ioan Cuza University, in Iasi, Romania.
The poster shows how chatbots can play an important role in Language Learning applications.
This paper reports our first attempt at integrating eSPERTo’s paraphrastic engine, which is based on the NooJ platform, with two application scenarios: a conversational agent and a summarization system. We briefly describe eSPERTo’s base resources and the modifications to these resources that enabled the production of the paraphrases required to feed both systems. Although the improvement observed in both scenarios is not significant, we present a detailed error analysis intended to improve the results in future experiments.
This paper presents the automation of paraphrasing and converting Portuguese constructions typical of informal or spoken language into formal written language. We illustrate this process with examples extracted from the e-PACT corpus that involve the placement of clitic pronouns in verbal compound contexts. Our task consists in paraphrasing and normalizing, among others, constructions such as "vou-lhe/posso-lhe fazer uma surpresa" into "vou/posso fazer-lhe uma surpresa" 'lit: I will/can_to him/her make a surprise / I will/can make_to him/her a surprise; I will/can make him/her a surprise', where the clitic pronoun "lhe" migrates from an enclitic position after the first verb of the verbal compound to an enclitic position after the main verb, which is the verb responsible for the selection of that pronominal argument. The first verb is either an auxiliary verb or a volitive verb, e.g. "querer" 'want'. This is a standard revision procedure in EP. Cases like this are linguistic phenomena over which language students and language users in general get confused or stumble. The paper focuses on general language, where the observed phenomena occur, describes examples of interest found in the corpus, and presents an automatic solution that normalizes the informal syntactic inadequacies found in these structures into standard formal written structures by applying very generic transformational grammars.
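The clitic migration described above can be pictured with a small Python sketch. This is only an illustration under stated assumptions, not the NooJ transformational grammars used in the paper: the verb and clitic lists, the regular expression, and the function name `normalize_clitic` are all hypothetical, and the main verb is crudely approximated as any word ending in "r".

```python
import re

# Hypothetical, deliberately tiny word lists (assumptions for illustration only).
FIRST_VERBS = ("vou", "posso", "quero")   # auxiliary or volitive first verbs
CLITICS = ("lhe", "me", "te")             # clitic pronouns

# first verb + "-" + clitic, followed by a word ending in "r" (rough infinitive test)
PATTERN = re.compile(
    r"\b(?P<aux>" + "|".join(FIRST_VERBS) + r")-(?P<clitic>" + "|".join(CLITICS) + r")"
    r"\s+(?P<main>\w+r)\b",
    flags=re.IGNORECASE,
)

def normalize_clitic(sentence: str) -> str:
    """Move the clitic from the first verb to an enclitic position on the main verb."""
    return PATTERN.sub(r"\g<aux> \g<main>-\g<clitic>", sentence)

print(normalize_clitic("vou-lhe fazer uma surpresa"))
print(normalize_clitic("posso-lhe fazer uma surpresa"))
```

A real grammar would rely on lexical and morphological information rather than a suffix test, so this sketch over-generates on any word ending in "r".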
This paper presents the alignment of verbal predicate constructions with the clitic pronoun "lhe" in the European (EP) and Brazilian (BP) varieties of Portuguese, such as in the sentences "Já lhe arrumaram a bagagem" | "Sua bagagem está seguramente guardada" 'His baggage is safely stowed away', where the EP dative proclisis "lhe" contrasts with the BP possessive pronoun "sua". We have selected several different paraphrastic contrasts, such as proclisis and enclisis, clitic pronouns co-occurring with relative pronouns and negation-type adverbs, among other constructions to illustrate the linguistic phenomenon. Some differences correspond to real contrasts between the two Portuguese varieties, while others purely represent stylistic choices. The contrasting variants were manually aligned in order to constitute a gold standard dataset, and a typology has been established to be further enlarged and made publicly available. The paraphrastic alignments were performed in the e-PACT corpus using the CLUE-Aligner tool. The research work was developed in the framework of the eSPERTo project.
This paper presents a detailed analysis of the alignment of Portuguese contractions, based on a previously aligned bilingual corpus. The alignment task was performed manually on a subset of the English-Portuguese CLUE4Translation Alignment Collection. The initial parallel corpus was pre-processed and a decision was made as to whether each contraction should be maintained or decomposed in the alignment. Decomposition was required in the cases in which the two words that have been concatenated, i.e., the preposition and the determiner or pronoun, go in two separate translation alignment pairs (PT- [no seio de] [a União Europeia] EN- [within] [the European Union]). Most contractions required decomposition in contexts where they are positioned at the end of a multiword unit. On the other hand, contractions tend to be maintained when they occur at the beginning or in the middle of the multiword unit, i.e., in the frozen part of the multiword (PT- [no que diz respeito a] EN- [with regard to] or PT- [além disso] EN- [in addition]). A correct alignment of multiwords and phrasal units containing contractions is instrumental for machine translation, paraphrasing, and variety adaptation.
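As a rough illustration of the decomposition decision described above, the following Python sketch splits a contraction that straddles two alignment units, reproducing the [no seio de] [a União Europeia] example. The contraction table, the function name `split_at_boundary`, and the manually supplied boundary index are assumptions made for this example; in the paper the decision was taken manually during alignment.

```python
# Hypothetical mini-table of contractions: surface form -> (preposition, determiner).
CONTRACTIONS = {
    "da": ("de", "a"),
    "do": ("de", "o"),
    "na": ("em", "a"),
    "no": ("em", "o"),
}

def split_at_boundary(tokens, boundary):
    """Split tokens into two alignment units.

    tokens[boundary] is a contraction whose preposition closes the first unit
    and whose determiner opens the second unit.
    """
    preposition, determiner = CONTRACTIONS[tokens[boundary].lower()]
    left = tokens[:boundary] + [preposition]
    right = [determiner] + tokens[boundary + 1:]
    return left, right

tokens = "no seio da União Europeia".split()
left, right = split_at_boundary(tokens, 2)
print(left)   # the initial "no" is kept: it sits in the frozen part of the unit
print(right)
```

Only the contraction at the unit boundary is decomposed; the initial "no" stays intact, in line with the observation that contractions at the beginning or in the middle of a multiword unit tend to be maintained.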
This paper presents a methodology to extract a paraphrase database for the European and Brazilian varieties of Portuguese, and discusses a set of paraphrastic categories of multiwords and phrasal units, such as the compounds toda a gente vs todo o mundo "everybody" or the gerundive constructions [estar a + V-Inf] vs [ficar + V-Ger] (e.g., estive a observar vs fiquei observando "I was observing"), which are extremely relevant to high quality paraphrasing. The variants were manually aligned in the e-PACT corpus, using the CLUE-Aligner tool. The methodology, inspired by the Logos Model, focuses on a semantico-syntactic analysis of each paraphrastic unit, and the resulting units constitute a subset of the Gold-CLUE-Paraphrases. The construction of a larger dataset of paraphrastic contrasts among the distinct varieties of the Portuguese language is indispensable for variety adaptation, i.e., for dealing with the cultural, linguistic and stylistic differences between them, making it possible to convert texts (semi-)automatically from one variety into another, a key function in paraphrasing systems. This topic represents an interesting new line of research with valuable applications in language learning, language generation, question-answering, summarization, and machine translation, among others. The paraphrastic units are the first resource of their kind for Portuguese to become available to the scientific community for research purposes.
Poster presented at the 2nd meeting of the COST Action CA16105 - enetCollect : European Network for Combining Language Learning with Crowdsourcing Techniques, which took place at Alexandru Ioan Cuza University, in Iasi, Romania.
This poster shows paraphrastic suggestions in the eSPERTo paraphrasing system applied to a QA application on a virtual agent and to a summarization tool. It also shows how paraphrases can be used in language learning and the tests envisaged to make eSPERTo a Portuguese learning tool.
Summary of the eSPERTo project (Sistema de Parafraseamento para Edição e Revisão de Texto, a paraphrasing system for text editing and revision), based on an interview on the Antena 2 programme Páginas de Português.
ReEscreve (in English, ReWriter) is a multi-purpose paraphraser that uses grammar-based paraphrasing capabilities suitable for source and target control (pre- and post-editing) and is useful for human and machine translation.
Spoken Language Systems Lab @ INESC-ID poster presented at the 1st meeting of the COST Action CA16105 - enetCollect : European Network for Combining Language Learning with Crowdsourcing Techniques, which took place at Eurac Research in Bolzano, Italy.
Non-adjacent linguistic phenomena such as non-contiguous multiwords and other phrasal units containing insertions, i.e., words that are not part of the unit, are difficult to process and remain a problem for NLP applications. Non-contiguous multiword units are common across languages and constitute some of the most important challenges to high quality machine translation. This paper presents an empirical analysis of non-contiguous multiwords, and highlights our use of the Logos Model and the Semtab function to deploy semantic knowledge to align non-contiguous multiword units with the goal of translating these units with high fidelity. The phrase-level manual alignments illustrated in the paper were produced with the CLUE-Aligner, a Cross-Language Unit Elicitation alignment tool.
This paper presents a set of linguistically informed and motivated multilingual alignments -- the CLUE4Translation Alignments -- covering several categories of multiwords and phrasal units, which constitute important challenges to high quality machine translation. The alignments comprise all possible word combinations between English, French, Portuguese, and Spanish parallel texts of the common test set of the Europarl corpus. The gold collection of the manually annotated alignments -- the Gold-CLUE-Translation -- consists of 400 sentences aligned according to previously proposed guidelines -- CLUE4Translation Alignment Guidelines -- for each language pair, resulting in a set of 2,400 alignments. The alignments were performed with the support of a new alignment tool -- CLUE-Aligner -- developed to facilitate the alignment of the translation units in the bitexts, including the alignment of non-contiguous multiwords and phrasal translation units. The Gold CLUE4Translation, the CLUE-Aligner, and the CLUE4Translation Alignment Guidelines are publicly available.
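The figure of 2,400 alignments follows from the four languages and the 400 sentences per language pair. The short Python sketch below reproduces the arithmetic, assuming that language pairs are counted as unordered pairs; the variable names are chosen for this example only.

```python
from itertools import combinations

LANGUAGES = ["English", "French", "Portuguese", "Spanish"]
SENTENCES_PER_PAIR = 400  # stated in the abstract

# Assumption: each pair of languages is counted once, regardless of direction.
pairs = list(combinations(LANGUAGES, 2))
total_alignments = len(pairs) * SENTENCES_PER_PAIR

for source, target in pairs:
    print(f"{source}-{target}: {SENTENCES_PER_PAIR} aligned sentences")
print(len(pairs), "language pairs,", total_alignments, "alignments in total")
```

With four languages there are six unordered pairs, and 6 × 400 gives the 2,400 alignments reported.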
This presentation describes the integration of paraphrases of human intransitive adjectives (of disease, membership, nationality and generic human adjectives) in the eSPERTo paraphrasing system, a linguistically enhanced paraphrase generator that enables the conversion of semantically equivalent phrases and sentences based on semantico-syntactic patterns and multiword units, sensitive to context. eSPERTo is meant to be a hybrid system, combining statistics and linguistic knowledge to identify and generate new and more complex paraphrases and to exploit existing paraphrasing resources. This system is integrated in an interactive application that helps users produce and revise their texts. Among other functionalities, eSPERTo’s web platform includes text-editing mechanisms that provide a variety of alternatives for each expression.
We used the Portuguese linguistic resources of Port4NooJ (the Portuguese module), enhanced with the distributional properties of human intransitive adjectives described in Lexicon-Grammar tables and applied to grammars, to generate paraphrases by invoking NooJ's linguistic engine (noojapply). The newly integrated properties made it possible to generate several new transformations, namely: (i) relations between adjective, noun and verb constructions; (ii) adjective constructions supported by different copulative verbs; (iii) constructions involving nationality and other membership relations; (iv) cross-constructions; (v) appropriate noun constructions; (vi) generic noun phrases.
This presentation addresses the impact of multiword translation errors in machine translation (MT). We have analysed translations of multiwords in the OpenLogos rule-based system (RBMT) and in the Google Translate statistical system (SMT) for the English-French, English-Italian, and English-Portuguese language pairs. Our study shows that, for distinct reasons, multiwords remain a problematic area for MT independently of the approach, and require adequate linguistic quality evaluation metrics founded on a systematic categorization of errors by MT expert linguists. We propose an empirically-driven taxonomy for multiwords, and highlight the need for the development of specific corpora for multiword evaluation. Finally, the paper presents the Logos approach to multiword processing, illustrating how semantico-syntactic rules contribute to multiword translation quality.
This presentation addresses the problem of translating SVC, such as fazer uma operação (to make an operation). In particular, it focus on the MT of biomedical-related SVC. It argues that paraphrasing can help translate these MWE with a higher quality. This work is based on my PhD research, which addressed the problem of paraphrasing and translating SVC in general.
ReWriter uses linguistically based automated paraphrasing and text-editing mechanisms to help users with their writing needs by providing suggestions for customized text authoring. It also generates word and phrasal usage data to help guide decision-making. ReWriter can be used in word processing applications or linguistic quality control for both source and target texts and it is a useful pre-editor for machine translation. The linguistic resources behind ReWriter, the paraphrasing grammars, and the tools from which ReWriter was derived will also be described, in this particular case, we illustrate ReWriter as a tool to process legal language.
Poster presented at the 2nd meeting of the COST Action CA16105 - enetCollect : European Network for Combining Language Learning with Crowdsourcing Techniques, which took place at Alexandru Ioan Cuza University, in Iasi, Romania.
The poster shows how chatbots can play an important role in Language Learning applications.
This paper reports our first attempt of integrating eSPERTo’s paraphrastic engine, which is based on NooJ platform, with two application scenarios: a conversational agent, and a summarization system. We briefly describe eSPERTo’s base resources, and the necessary modifications to these resources
that enabled the production of paraphrases required to feed both systems. Although the improvement observed in both scenarios is not significant, we present a detailed error analysis to further improve the achieved results in future experiments.
This paper presents the automation process of paraphrasing and converting Portuguese constructions typical of informal or spoken language into a formal written language. We illustrate this automation process with examples extracted from the e-PACT corpus that involve the placement of clitic pronouns in verbal compound contexts. Our task consists in paraphrasing and normalizing, among others, constructions such as "vou-lhe/posso-lhe fazer uma surpresa" into "vou/posso fazer-lhe uma surpresa" `lit: I will/can\_to him/her make a surprise / I will/can make\_to him/her a surprise; I will/can make him/her a surprise', where the clitic pronoun "lhe" migrates from an enclitic position after the first verb of the verbal compound to an enclitic position after the main verb, which is the verb responsible for the selection of that pronominal argument. The first verb is either an auxiliary verb or a volitive verb, e.g. "querer" `want'. This is a standard revision procedure in EP. Cases like this represent linguistic phenomena where in general language students and language users get confused or stumble. The paper focuses on general language where the phenomena being observed occur, describes examples of interest found in the corpus, and presents an automatic solution for the normalization of informal syntactic inadequacies found in the researched structures into standard formal writing structures through the application of very generic transformational grammars.
This paper presents the alignment of verbal predicate constructions with the clitic pronoun "lhe" in the European (EP) and Brazilian (BP) varieties of Portuguese, such as in the sentences "Já lhe} arrumaram a bagagem" | "Sua bagagem está seguramente guardada" 'His baggage is safely stowed away', where the EP dative proclisis "lhe" contrasts with the BP possessive pronoun "sua". We have selected several different paraphrastic contrasts, such as proclisis and enclisis, clitic pronouns co-occurring with relative pronouns and negation-type adverbs, among other constructions to illustrate the linguistic phenomenon. Some differences correspond to real contrasts between the two Portuguese varieties, while others purely represent stylistic choices. The contrasting variants were manually aligned in order to constitute a gold standard dataset, and a typology has been established to be further enlarged and made publicly available. The paraphrastic alignments were performed in the e-PACT corpus using the CLUE-Aligner tool. The research work was developed in the framework of the eSPERTo project.
This paper performs a detailed analysis on the alignment of Portuguese contractions, based on a previously aligned bilingual corpus. The alignment task was performed manually in a subset of the English-Portuguese CLUE4Translation Alignment Collection. The initial parallel corpus was pre-processed and a decision was made as to whether the contraction should be maintained or decomposed in the alignment. Decomposition was required in the cases in which the two words that have been concatenated, i.e., the preposition and the determiner or pronoun, go in two separate translation alignment pairs (PT- [no seio de] [a União Europeia] EN- [within] [the European Union]). Most contractions required decomposition in contexts where they are positioned at the end of a multiword unit. On the other hand, contractions tend to be maintained when they occur at the beginning or in the middle of the multiword unit, i.e., in the frozen part of the multiword (PT- [no que diz respeito a] EN- [with regard to] or PT- [além disso] EN-[in addition]. A correct alignment of multiwords and phrasal units containing contractions is instrumental for machine translation, paraphrasing, and variety adaptation.
This paper presents a methodology to extract a paraphrase database for the European and Brazilian varieties of Portuguese, and discusses a set of paraphrastic categories of multiwords and
phrasal units, such as the compounds toda a gente vs todo o mundo "everybody" or the gerundive constructions [estar a + V-Inf] vs [ficar + V-Ger] (e.g., estive a observar vs fiquei observando "I was observing"), which are extremely relevant to high quality paraphrasing. The variants were manually aligned in the e-PACT corpus, using the CLUE-Aligner tool. The methodology, inspired
in the Logos Model, focuses on a semantico-syntactic analysis of each paraphrastic unit and constitutes a subset of the Gold-CLUE-Paraphrases.1 The construction of a larger dataset of
paraphrastic contrasts among the distinct varieties of the Portuguese language is indispensable for variety adaptation, i.e., for dealing with the cultural, linguistic and stylistic differences between them, making it possible to convert texts (semi-)automatically from one variety into another, a
key function in paraphrasing systems. This topic represents an interesting new line of research with valuable applications in language learning, language generation, question-answering, summarization, and machine translation, among others. The paraphrastic units are the first resource of its kind for Portuguese to become available to the scientific community for research purposes.
Poster presented at the 2nd meeting of the COST Action CA16105 - enetCollect : European Network for Combining Language Learning with Crowdsourcing Techniques, which took place at Alexandru Ioan Cuza University, in Iasi, Romania.
This poster shows paraphrastic suggestions in the eSPERTo paraphrasing system applied to a QA application on a virtual agent and to a summarization tool. It also shows how paraphrases can be used in language learning and the tests envisaged to make eSPERTo a Portuguese learning tool.
Resumo sobre o projeto eSPERTo (Sistema de Parafraseamento para Edição e Revisão de Texto) baseado numa entrevista ao programa Páginas de Português da Antena 2.
ReEscreve (in English, ReWriter) is a multi-purpose paraphraser that uses grammar-based paraphrasing capabilities suitable for source and target control (pre- and post-editing) and is useful for human and machine translation.
Spoken Language Systems Lab @ INESC-ID poster presented at the 1st meeting of the COST Action CA16105 - enetCollect : European Network for Combining Language Learning with Crowdsourcing Techniques, which took place at Eurac Research in Bolzano, Italy.
Non-adjacent linguistic phenomena such as non-contiguous multiwords and other phrasal units containing insertions, i.e., words that are not part of the unit, are difficult to process
and remain a problem for NLP applications. Non-contiguous multiword units are common across languages and constitute some of the most important challenges to high quality machine
translation. This paper presents an empirical analysis of non-contiguous multiwords, and highlights our use of the Logos
Model and the Semtab function to deploy semantic knowledge to align non-contiguous multiword units with the goal to translate these units with high fidelity. The phrase level manual
alignments illustrated in the paper were produced with the CLUE-Aligner, a Cross-Language Unit Elicitation alignment tool.
This paper presents a set of linguistically informed and motivated multilingual alignments -- the CLUE4Translation Alignments -- covering several categories of multiwords and phrasal units, which constitute important challenges to high quality machine translation. The alignments comprise all possible word combinations between English, French, Portuguese, and Spanish parallel texts of the common test set of the Europarl corpus. The gold collection of the manually annotated alignments -- the Gold-CLUE-Translation -- is constituted of 400 sentences aligned according to previously proposed guidelines -- CLUE4Translation Alignment Guidelines -- for each language pair, resulting in a set of 2,400 alignments. The alignments were performed with the support of a new alignment tool -- CLUE-Aligner -- developed to facilitate the alignment of the translation units in the bitexts, including the alignment of non-contiguous multiwords and phrasal translation units. The Gold CLUE4Translation, the CLUE-Aligner, and the CLUE4Translation Alignment Guidelines are publicly available.
This presentation describes the integration of paraphrases of human intransitive adjectives (of disease, membership, nationality and generic human adjectives) in the eSPERTo paraphrasing system, a linguistically enhanced paraphrase generator that enables conversion of semantically equivalent phrases, and sentences based on semantico-syntactic patterns and multiword units, sensitive to context. eSPERTo is meant to be an hybrid system, combining statistics and linguistic knowledge to identify and generate new and more complex paraphrases and exploit existing paraphrasing resources. This system is integrated in an interactive application that helps users in producing and revising their texts. Among other functionalities, eSPERTo’s web platform includes text-editing mechanisms that provide a variety of alternatives for each expression.
We used the Portuguese linguistic resources of Port4NooJ (the Portuguese module) enhanced with the distributional properties of the human intransitive adjectives described in Lexicon-Grammar tables and applied to grammars to generate paraphrases, invoking NooJ's linguistic engine (noojappy). The new integrated properties allowed to generate several new transformations, namely: (i) relate adjective, noun and verb related constructions; (ii) adjective constructions supported by different copulative verbs; (iii) constructions involving nationality and other membership relations; (iv) cross-constructions; (v) appropriate noun constructions; (vi) generic noun phrases.
This presentation addresses the impact of multiword translation errors in machine translation (MT). We have analysed translations of multiwords in the OpenLogos rule-based system (RBMT) and in the Google Translate statistical system (SMT) for the English-French, English-Italian, and English-Portuguese language pairs. Our study shows that, for distinct reasons, multiwords remain a problematic area for MT independently of the approach, and require adequate linguistic quality evaluation metrics founded on a systematic categorization of errors by MT expert linguists. We propose an empirically-driven taxonomy for multiwords, and highlight the need for the development of specific corpora for multiword evaluation. Finally, the paper presents the Logos approach to multiword processing, illustrating how semantico-syntactic rules contribute to multiword translation quality.
5. Predicate nouns with Vsup fazer: Classification Criteria
[Decision tree diagram. Root: N0 fazer Det (Npred + C) W]
Criteria shown at the nodes: C =: ≠ V-n, Adj-n; Npred =: V-n, Adj-n; N0 =: Nhum; Det =: E; W =: E; W =: ≠ E; W =: (E + a Nhum); W =: Prep N; W =: Prep Que-F; W =: de N Prep N; Prep =: a; Prep =: de; Prep =: ≠ a, de
6. Predicate nouns with Vsup fazer: Classification Criteria
[Same decision tree diagram, with the resulting classes at the leaves. Root: N0 fazer Det (Npred + C) W]
Criteria shown at the nodes: C =: ≠ V-n, Adj-n; Npred =: V-n, Adj-n; W =: ≠ E; W =: Prep N; W =: de N Prep N; Prep =: ≠ a, de
Classes: FND, FNSA, FC, FCQ, FCAN, FCDN, FCPN, FCSI, FN, FNQ, FNAN, FNDN, FNPN, FNSI, FNDNAN, FNDNPN
11. – From LG tables to NooJ dictionaries
Integration of LG of PT Vsup fazer
• Representation of LG table properties
Table FN, Vsup = fazer (one column per Npred):
Property            espionagem  espuma  estardalhaço
N0=:Nhum            +           -       +
N0=:N-hum           -           +       -
N0=:Npluralobrig    -           -       -
Det=:E              +           +       +
Det=:UM-Modif       -           +       +
Det=:O              -           +       +
Vsup=:estarPrep     -           -       -
Vsup=:ter           -           +       -
Vsup=:dar           -           -       +
Vasp=:iniciar       +           -       -
Vasp=:prosseguir    +           -       -
Vasp=:concluir      +           -       -
Vestil=:cometer     -           -       -
V                   +           +       +
Adj                 +           +       -
deV0infW            -           -       -
dequeF              -           -       -
GN=:NdeN0           -           +       +
Examples:
espionagem: O homem fazia espionagem
espuma: O detergente fazia espuma
estardalhaço: O Zé fez um estardalhaço enorme
Derivation code shown on the slide: +DRV=N2V2:FALAR
Espionagem – espiar; ser espião, ia – FN
Espuma – espumar; ser espumoso, a – FN
Estardalhaço – estardalhaçar – FN
• DRV code is determined and formalized automatically by finding the radical shared by the noun and the verb or adjective, which are listed in a separate file
espuma(r) => N2V2 = r/V
espum(oso) => A2V614= <B1>oso/A
• FLX code of derived word is determined by consulting Port4NooJ
espumar,V+FLX=FALAR+Aux=1+INOPfor46+Subset=363+EN=foam…
espumoso,A+FLX=ALTO+NAV+Apred+EN=bubbly…
If the derived form does not exist, then its code is assigned automatically
+DRV=N2A14:ALTO
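The radical-finding step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `derivation_rule` is hypothetical, and it assumes the rule is simply "delete the noun characters after the shared radical, then append the remainder of the derived word", written in the `<Bn>suffix/Category` notation seen in the examples above.

```python
import os


def derivation_rule(noun: str, derived: str, category: str) -> str:
    """Build a suffix rule turning `noun` into `derived` (hypothetical helper).

    The shared radical is taken to be the longest common prefix of the two
    words. Characters of the noun after the radical are deleted with <Bn>,
    and the rest of the derived word is appended.
    """
    radical_len = len(os.path.commonprefix([noun, derived]))
    to_delete = len(noun) - radical_len      # noun characters to remove
    suffix = derived[radical_len:]           # characters to append
    backspace = f"<B{to_delete}>" if to_delete else ""
    return f"{backspace}{suffix}/{category}"


verb_rule = derivation_rule("espuma", "espumar", "V")
adj_rule = derivation_rule("espuma", "espumoso", "A")
```

For the pair espuma / espumar the radical is the whole noun, so only `r` is appended; for espuma / espumoso one character is deleted before `oso` is appended.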
12. – From LG tables to NooJ dictionaries
• Integration with eSPERTo dictionary entries
① Noun not in Port4NooJ (old or current):
✓ Create new entry:
✓ FLX code is assigned automatically given the ending of the word
✓ Entries are checked for missing FLX codes and reviewed by a linguist
✓ All other properties come from LG table
✓ Add entry to new standalone dictionary npred_vsupfazer.dic
batota,N+FLX=CASA+Npred+Vsup=fazer+Table=FN+N0Nhum+DetE+DetUMModif
+Npred+DRV=N2V2:FALAR+DRV=N2A6:ALTO+GNNdeN0
rodagem,N+FLX=ANO+Npred+Vsup=fazer+Table=FNAN+N0Nhum+DetO+Npred+Prepa
+N1Nnhum+VsupestarPrep+Vaspiniciar+Vaspprosseguir+Vaspconcluir
+DRV=N2V27:FALAR
rodagem,N+FLX=ANO+Npred+Vsup=fazer+Table=FNDNhl+N0Nhum+DetUMModif
+DetO+Npred+Prepde+ConstConversa+DRV=N2V27:FALAR
rodagem,N+FLX=ANO+Npred+Vsup=fazer+Table=FNDNhl+N0Nhum+DetUMModif
+DetO+Npred+Prepde+Vaspiniciar+Vaspprosseguir+Vaspconcluir
+DRV=N2V27:FALAR
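How a row of "+" values could be turned into such an entry is sketched below. This is an illustration under stated assumptions, not the project's actual converter: the names `COLUMNS`, `header_to_feature` and `build_entry` are hypothetical; the column order follows the FN table headers; and the naming convention (drop `=:` and hyphens, with `N-hum` written as `Nnhum`) is inferred from the entries shown in these slides. The set of "+" columns used for batota is read back from its entry above, not taken from the LG table itself.

```python
# Column headers in table order (Vsup itself is fixed to "fazer").
COLUMNS = [
    "N0=:Nhum", "N0=:N-hum", "N0=:Npluralobrig",
    "Det=:E", "Det=:UM-Modif", "Det=:O", "Npred",
    "Vsup=:estarPrep", "Vsup=:ter", "Vsup=:dar",
    "Vasp=:iniciar", "Vasp=:prosseguir", "Vasp=:concluir",
    "Vestil=:cometer", "V", "Adj", "deV0infW", "dequeF", "GN=:NdeN0",
]


def header_to_feature(header: str) -> str:
    """Map a table header to a dictionary feature (inferred convention)."""
    name, _, value = header.partition("=:")
    if value == "N-hum":          # the entries write this value as Nnhum
        value = "Nnhum"
    return (name + value).replace("-", "")


def build_entry(lemma, flx, table, plus_columns, drv):
    """Assemble one dictionary line from the columns marked '+'.

    `drv` maps the "V" / "Adj" columns to their derivation codes.
    """
    parts = [f"{lemma},N", f"FLX={flx}", "Npred", "Vsup=fazer", f"Table={table}"]
    for col in COLUMNS:
        if col == "Npred":
            parts.append("Npred")            # the noun column itself
        elif col in ("V", "Adj"):
            if col in drv:
                parts.append("DRV=" + drv[col])
        elif col in plus_columns:
            parts.append(header_to_feature(col))
    return "+".join(parts)


batota = build_entry(
    "batota", "CASA", "FN",
    plus_columns={"N0=:Nhum", "Det=:E", "Det=:UM-Modif", "GN=:NdeN0"},
    drv={"V": "N2V2:FALAR", "Adj": "N2A6:ALTO"},
)
```

Emitting the features in column order reproduces the second `+Npred` that appears in the middle of the entries above.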
13. – From LG tables to NooJ dictionaries
• Integration with eSPERTo dictionary entries
② Noun exists both in current and old Port4NooJ
A. If entries are the same do Merge 1:
✓ Blindly add additional properties as specified by the LG tables to current entries
✓ Add merged entries to npred_vsupfazer.dic
curva,N+FLX=ANO+Set=56+Subset=280+EN=curve
curva,N+FLX=ANO+Set=56+Subset=280+EN=curving
curva,N+FLX=ANO+PresPart+Set=56+Subset=280+EN=curving
+Npred
+Vsup=fazer
+Table=FN
+N0Nhum
+N0Nnhum
+DetUMModif
+DetO
+Npred
+Vsupdar
+Vaspiniciar
+Vaspprosseguir
+Vaspconcluir
+DRV=N2V2:FALAR
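Merge 1 amounts to appending the same list of LG properties to every existing entry of the noun. A minimal sketch, assuming entries are plain `+`-separated strings and using the hypothetical name `merge1`:

```python
def merge1(entries, lg_properties):
    """Append the LG table properties, unchanged, to each current entry."""
    suffix = "".join("+" + prop for prop in lg_properties)
    return [entry + suffix for entry in entries]


curva_entries = [
    "curva,N+FLX=ANO+Set=56+Subset=280+EN=curve",
    "curva,N+FLX=ANO+Set=56+Subset=280+EN=curving",
    "curva,N+FLX=ANO+PresPart+Set=56+Subset=280+EN=curving",
]
curva_lg = [
    "Npred", "Vsup=fazer", "Table=FN", "N0Nhum", "N0Nnhum", "DetUMModif",
    "DetO", "Npred", "Vsupdar", "Vaspiniciar", "Vaspprosseguir",
    "Vaspconcluir", "DRV=N2V2:FALAR",
]
curva_merged = merge1(curva_entries, curva_lg)
```

Because the properties are added "blindly", no check is made here for features an entry may already carry.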
15. – From LG tables to NooJ dictionaries
• Integration with eSPERTo dictionary entries
② Noun exists both in current and old Port4NooJ
B. If entries are not the same do Merge 2 with old entries as shown in case 3:
✓ Remove previous Npred related properties
✓ Blindly add additional properties as specified by the LG tables to current entries
✓ Add merged entries to npred_vsupfazer.dic
✓ Remove nominalization from CV
Entries in CV (current version of Port4NooJ):
cruzamento,N+FLX=ANO+AB+mot+EN=crossover
cruzamento,N+FLX=ANO+CO+recp+EN=frog
cruzamento,N+FLX=ANO+PL+nagcom+EN=crossings
Entries in OV (old version of Port4NooJ):
cruzamento,N+FLX=ANO+PresPart+Npred+Nom+Set=68+Subset=551+EN=intersecting
+VRB=cruzar
17. – From LG tables to NooJ dictionaries
• Integration with eSPERTo dictionary entries
② Noun exists both in current and old Port4NooJ
B. Merge 2, continued: the previous Npred-related properties (+Npred, +Nom, +VRB=cruzar) are removed from the old entry, and the properties from the LG table are then added:
Entries in OV:
cruzamento,N+FLX=ANO+PresPart+Set=68+Subset=551+EN=intersecting
+Npred
+Vsup=fazer
+Table=FNPN
+N0Nhum
+DetE
+DetUMModif
+DetO
+Preppara
+N1Nhum
+DRV=N2V16:FALAR
+GNNdeN0PrepN1
18. – From LG tables to NooJ dictionaries
• Integration with eSPERTo dictionary entries
② Noun exists both in current and old Port4NooJ
B. Merge 2, continued: resulting merged entry
Entries in OV:
cruzamento,N+FLX=ANO+PresPart+Set=68+Subset=551+EN=intersecting+Npred
+Vsup=fazer+Table=FNPN+N0Nhum+DetE+DetUMModif+DetO+Npred+Preppara
+N1Nhum+DRV=N2V16:FALAR+GNNdeN0PrepN1
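Merge 2 can be sketched in the same style. The function names are hypothetical, and the set of "previous Npred related properties" (`Npred`, `Nom`, and any `VRB=...` feature) is an assumption inferred from the cruzamento and protesto examples in these slides rather than a documented list.

```python
def strip_npred_properties(entry):
    """Drop the old predicate-noun features from one dictionary line."""
    head, *features = entry.split("+")
    kept = [
        feat for feat in features
        if feat not in ("Npred", "Nom") and not feat.startswith("VRB=")
    ]
    return "+".join([head] + kept)


def merge2(old_entries, lg_properties):
    """Clean each old entry, then append the LG table properties."""
    return [
        "+".join([strip_npred_properties(entry)] + list(lg_properties))
        for entry in old_entries
    ]


cruzamento_old = [
    "cruzamento,N+FLX=ANO+PresPart+Npred+Nom+Set=68+Subset=551"
    "+EN=intersecting+VRB=cruzar",
]
cruzamento_lg = [
    "Npred", "Vsup=fazer", "Table=FNPN", "N0Nhum", "DetE", "DetUMModif",
    "DetO", "Npred", "Preppara", "N1Nhum", "DRV=N2V16:FALAR", "GNNdeN0PrepN1",
]
cruzamento_merged = merge2(cruzamento_old, cruzamento_lg)
```

The cleaning step runs before the LG properties are appended, so the `Npred` features coming from the LG table are kept.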
19. – From LG tables to NooJ dictionaries
• Integration with eSPERTo dictionary entries
③ Noun exists only in old Port4NooJ
✓ Do Merge 2 with old entries as shown in Case 2-B:
✓ Remove previous Npred related properties
✓ Blindly add additional properties as specified by the LG tables to current entries
✓ Add merged entries to npred_vsupfazer.dic
✓ Remove nominalization from CV
protesto,N+FLX=ANO+AB+strvb+Npred+Nom+EN=outcry+VRB=protestar
protesto,N+FLX=ANO+PNT+Npred+Nom+Set=32+Subset=248+EN=protest
+VRB=protestar
21. – From LG tables to NooJ dictionaries
• Integration with eSPERTo dictionary entries
③ Noun exists only in old Port4NooJ
✓ Merge 2, continued: the previous Npred-related properties (+Npred, +Nom, +VRB=protestar) are removed from the old entries, and the properties from the LG table are then added:
protesto,N+FLX=ANO+AB+strvb+EN=outcry
protesto,N+FLX=ANO+PNT+Set=32+Subset=248+EN=protest
+Npred
+Vsup=fazer
+Table=FNPN
+N0Nhum
+DetUMModif
+DetO
+Npred
+Prepcontra
+N1Nhum
+N1abstr
+Vaspiniciar
+Vaspprosseguir
+Vaspconcluir
+DRV=N2V8:FALAR
+GNNdeN0PrepN1
22. – From LG tables to NooJ dictionaries
• Integration with eSPERTo dictionary entries
③ Noun exists only in old Port4NooJ
✓ Merge 2, continued: resulting merged entries
protesto,N+FLX=ANO+AB+strvb+EN=outcry+Npred+Vsup=fazer+Table=FNPN
+N0Nhum+DetUMModif+DetO+Npred+Prepcontra+N1Nhum+N1abstr
+Vaspiniciar+Vaspprosseguir+Vaspconcluir+DRV=N2V8:FALAR
+GNNdeN0PrepN1
protesto,N+FLX=ANO+PNT+Set=32+Subset=248+EN=protest+Npred
+Vsup=fazer+Table=FNPN+N0Nhum+DetUMModif+DetO+Npred+Prepcontra
+N1Nhum+N1abstr+Vaspiniciar+Vaspprosseguir+Vaspconcluir
+DRV=N2V8:FALAR+GNNdeN0PrepN1
27. Preliminary Results
• 5,236 predicate nouns with Vsup fazer (1,610 different noun lemmas)
• 332 new derivational paradigms
• Example grammars for the syntactic parser
• Most nouns already existed in Port4NooJ (63%)
→ 5% increase in Port4NooJ nominal entries and 17.5% increase in predicate nouns
Table     Example                                         In Port4NooJ  New  % In
FNDN-hl   O barco fez a abordagem do cais                 367           169  68%
FNDNa     O advogado fez a alegação de insanidade mental  208           95   69%
FN        A instituição fez uma angariação de fundos      112           127  47%
FNAN      O Zé fez um aceno ao Tó                         121           82   60%
FNPN      O Zé fez um acrescento na prova tipográfica     127           46   73%
FNDNh     A Ana faz o acolhimento dos convidados          98            42   70%
FNDNAN    O Zé fez uma adaptação do romance ao cinema     60            16   79%
FND       O Zé faz ciclismo                               17            56   23%
FNDNPN    O Tó fez o câmbio das pesetas em liras          52            13   80%
FCSI      O Zé fez um acordo com a Ana                    19            40   32%
FNSI      O Zé fez uma aliança com o Tó                   28            14   67%
FNDNPNSI  O padre fez o casamento da Ana com o Zé         21            4    84%
FCQ       O Tó fez a fineza de convidar a Ana             5             5    50%
FNQ       A Maria faz tenção de ter filhos                4             1    80%
Total                                                     1239          710  63%
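The "% In" column appears to be the share of nouns already in Port4NooJ, i.e. In / (In + New). The short check below recomputes it from the counts in the table; the names `ROWS` and `share_in` are illustrative only, and the use of rounding to the nearest integer is an assumption.

```python
# (table, already in Port4NooJ, new) as listed in the results table
ROWS = [
    ("FNDN-hl", 367, 169), ("FNDNa", 208, 95), ("FN", 112, 127),
    ("FNAN", 121, 82), ("FNPN", 127, 46), ("FNDNh", 98, 42),
    ("FNDNAN", 60, 16), ("FND", 17, 56), ("FNDNPN", 52, 13),
    ("FCSI", 19, 40), ("FNSI", 28, 14), ("FNDNPNSI", 21, 4),
    ("FCQ", 5, 5), ("FNQ", 4, 1),
]


def share_in(existing: int, new: int) -> int:
    """Percentage of nouns that already existed, rounded to an integer."""
    return round(100 * existing / (existing + new))


shares = {table: share_in(existing, new) for table, existing, new in ROWS}
total_in = sum(existing for _, existing, _ in ROWS)
total_new = sum(new for _, _, new in ROWS)
```

The per-table values and both column totals match the table. The overall share works out to about 63.6%, which the slide reports as 63%.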