This document discusses using syntactic-semantic analysis for information extraction in biomedicine. It aims to extract biomolecular events, such as phosphorylation, from text. It uses dictionaries of entities and of verbs associated with event types, together with NooJ grammars, to identify events. Evaluation on a shared-task dataset shows an average recall of 36.76% and precision of 65.58% for six event types. While the results are promising, the document discusses limitations such as manual pattern identification and challenges with more complex event constructions.
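The dictionary-plus-grammar approach described above can be illustrated with a minimal sketch. This is not the paper's actual NooJ grammars; the entity and trigger dictionaries below are toy stand-ins, and the pattern (protein, trigger verb, protein) is a deliberately simplified version of the kind of syntactic-semantic pattern the system encodes.

```python
import re

# Toy dictionaries standing in for the paper's entity and trigger-verb
# dictionaries (the real system uses curated resources and NooJ grammars).
PROTEINS = {"TRAF2", "CD40", "p53"}
TRIGGERS = {"phosphorylates": "Phosphorylation",
            "phosphorylation": "Phosphorylation",
            "binds": "Binding"}

def extract_events(sentence):
    """Match a '<Protein> <trigger> <Protein>' pattern against the dictionaries."""
    events = []
    tokens = re.findall(r"\w+", sentence)
    for i, tok in enumerate(tokens):
        if tok.lower() in TRIGGERS:
            left = [t for t in tokens[:i] if t in PROTEINS]
            right = [t for t in tokens[i + 1:] if t in PROTEINS]
            if left and right:
                # (event type, agent candidate, theme candidate)
                events.append((TRIGGERS[tok.lower()], left[-1], right[0]))
    return events

print(extract_events("TRAF2 phosphorylates CD40 in vitro."))
# [('Phosphorylation', 'TRAF2', 'CD40')]
```

The low recall reported above is unsurprising under such an approach: any event realized by a construction outside the hand-written patterns is missed, while precision stays high because matches are strongly constrained.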
Identifying Relevant Temporal Expressions for Real-world Events (Nattiya Kanhabua)
Event detection is an interesting task for many applications, for instance surveillance, scientific discovery, and Topic Detection and Tracking. Numerous works have focused on detecting events from unstructured text and determining what features constitute an event, e.g., key terms or named entities. Although most works are able to find interesting times associated with an event, there is a lack of research on determining the relevance of a time for an event. In this paper, we propose a method for automatically extracting real-world events from unstructured text documents. In addition, we propose a machine learning approach to identifying relevant times (i.e., temporal expressions) for the extracted events using three classes of features: sentence-based, document-based, and corpus-specific features. Through experiments using real-world data and 3,500 manually judged relevance pairs, we show that our proposed approach is able to identify the relevant time of events with good accuracy.
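The three feature classes named in the abstract can be sketched as small extractor functions. The specific feature names and definitions below are illustrative assumptions, not the paper's exact feature set; a classifier would then be trained on the resulting feature vectors against the manually judged relevance pairs.

```python
# Hypothetical sketch of the three feature classes; names are illustrative.

def sentence_features(sent, event_term, time_expr):
    """Sentence-based: co-occurrence and token distance within one sentence."""
    toks = sent.split()
    together = event_term in toks and time_expr in toks
    return {
        "same_sentence": together,
        "token_distance": abs(toks.index(event_term) - toks.index(time_expr))
                          if together else -1,
    }

def document_features(doc_sents, time_expr):
    """Document-based: where and how often the time expression appears."""
    first = next((i for i, s in enumerate(doc_sents) if time_expr in s), -1)
    return {"first_mention_position": first,
            "mention_count": sum(time_expr in s for s in doc_sents)}

def corpus_features(corpus_counts, time_expr):
    """Corpus-specific: global frequency of the time expression."""
    return {"corpus_frequency": corpus_counts.get(time_expr, 0)}

print(sentence_features("The election was held in 2008", "election", "2008"))
```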
CDAO presentation.
The idea of the comparative analysis ontology has been presented worldwide, including at NESCent (USA), IGBMC (France), and UFRJ (Brazil). Providing a semantic framework for evolutionary analysis in a high-throughput way, in the wake of next- and third-generation sequencing, is the way to bring evolutionary studies into genome-wide analysis. The Darwinian core of reasoning also allows CDAO to be used with other entities.
Text Mining for Biocuration of Bacterial Infectious Diseases (Dan Sullivan, Ph.D.)
Specialty gene sets, such as virulence factors and antibiotic resistance genes, are of particular interest to infectious disease researchers. Much of the information about specialty genes’ function is described in literature but unavailable as structured data in bioinformatics databases. The steadily increasing volume of literature makes it difficult to manually find relevant papers and extract assertion sentences about specialty genes. This presentation describes efforts to build an automatic classifier for such sentences. Experiments were conducted to assess the impact of the imbalance of positive and negative examples in source documents on classification; develop a support vector machine (SVM) classifier using term frequency-inverse document frequency (TF-IDF) representation of text; and assess the marginal benefit of additional training examples on the quality of the classifier. Analysis of learning curves indicates that additional training examples will not likely improve the quality of the classifier. We discuss options for other text representation schemes to investigate in order to improve the quality of the classifier as measured by F-score.
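The TF-IDF representation at the heart of this classifier can be sketched in a few lines. This is a minimal pure-Python version for illustration only; in practice one would use a library implementation (e.g., scikit-learn's `TfidfVectorizer`, whose weighting differs slightly) and feed the resulting vectors to an SVM as the presentation describes.

```python
import math
from collections import Counter

def tfidf(docs):
    """Minimal TF-IDF: term frequency weighted by inverse document frequency."""
    tokenized = [d.lower().split() for d in docs]
    df = Counter(t for toks in tokenized for t in set(toks))  # document frequency
    n = len(docs)
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Weight: (term count / doc length) * log(N / docs containing term).
        vectors.append({t: (tf[t] / len(toks)) * math.log(n / df[t]) for t in tf})
    return vectors

vecs = tfidf(["gene confers resistance",
              "gene expression assay",
              "resistance gene found"])
# 'gene' occurs in every document, so its IDF (and weight) is zero;
# rarer terms like 'confers' receive positive weight.
```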
Bioinformatics Introduction and Use of BLAST Tool (JesminBinti)
Hi, I am Jesmin, studying MCSE. I think this file will help you if you want to know the basics of Bioinformatics and the use of the BLAST tool. BLAST is a tool that matches sequences of DNA, RNA, and proteins.
Combining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs (Paul Groth)
A look at how thinking about Web Data and the sources of semantics can help drive decisions on combining latent and explicit knowledge. Examples from Elsevier and many pointers to related work.
A knowledge capture framework for domain specific search systems (ramakanz)
This is the product roll-out presentation at the AFRL on creating a focused knowledge base, search, and retrieval system for the domain of human performance and cognition.
Bioinformatics: Building the cornerstones of Sequence Homology and its use fo... (OECD Environment)
24 June 2019: This OECD seminar presented and discussed the potential use of genome sequence, bioinformatic tools and databases in a regulatory decision process for microbial pesticides.
Quantifying the content of biomedical semantic resources as a core for drug d... (Syed Muhammad Ali Hasnain)
The biomedical research community is providing large-scale data sources to enable knowledge discovery from the data alone, or from novel scientific experiments in combination with existing knowledge. Semantic Web technologies, including ontologies, triple stores, and combinations thereof, are increasingly being developed and used. The amount of data is constantly increasing, as is its complexity. Since the data sources are publicly available, their content can be quantified, giving an overview of the accessible content and also of the state of the data representation in comparison to the existing content. For a better understanding of the existing data resources, i.e., judgments on the distribution of data triples across concepts, data types, and primary providers, we have performed a comprehensive analysis which delivers an overview of the accessible content for semantic Web solutions. We find that information related to genes, proteins, and chemical entities forms the center, whereas content related to diseases and pathways forms a smaller portion. Further data relates to dietary content and to specific questions such as cancer prevention and toxicological effects of drugs.
Keynote presentation from Plant and Pathogen Bioinformatics workshop at EMBL-EBI, 8-11 July 2014
Slides and teaching material are available at https://github.com/widdowquinn/Teaching-EMBL-Plant-Path-Genomics
Presented at the 10th annual Data Harmony Users Group meeting on Tuesday, February 11, 2014 by Rachel Drysdale of PLOS. Discusses the process of building and integrating their new thesaurus into the PLOS journals workflow and publication platform. From constructing the thesaurus to creating channels for feedback and updates, through building new current awareness and discovery tools, to gathering data for article level metrics and web site analytics, follow their progress through to today’s PLOS websites and services.
This is a presentation given at the Opal Events meeting "Drug Discovery Partnerships: Filling the Pipeline". I was speaking in a session with Jean-Claude Bradley regarding "Pre-competitive Collaboration: Sharing Data to Increase Predictability". This presentation discussed some of the work we are doing on Open PHACTS. My thanks especially to Carole Goble, Lee Harland and Sean Ekins for their comments.
Workshop on Assignment 2 SCI115 Live workshop 103020.docx (dunnramage)
Workshop on Assignment 2
SCI115
Live workshop 10/30/2018
In Assignment 2, you continue with the same topic as the article you used in Assignment 1. In order to give a better sense of what we’re looking for in each component of the paper, this presentation covers a lot of concrete examples from different topics.
This slide was added after the live session.
Begin with a description of the biotechnology and what it accomplishes. Then, explain whether it involves manipulating the DNA (or RNA) of an organism, or simply utilizing the DNA (or RNA) that is already there naturally.
• For your technology, describe the applications
• Changing DNA versus Interpreting DNA
• Let’s look at some concrete examples of this to illustrate the concept
Changing DNA or Interpreting DNA
• Changing DNA
– Human gene therapy
– Gene drives
– GM crops
– GM animals
• Interpreting DNA
– Crime scene analysis
– Precision Medicine
Changing DNA – applications
• Human gene therapy
– Curing a disease
• Gene drives
– Wiping out harmful pest populations
• GM crops
– Enhancing agriculture
• GM animals
– Agriculture, special products, research
Interpreting DNA – applications
• Crime scene analysis
– Identifying criminal perpetrators
• Precision Medicine
– Informing medical decisions
Interpreting DNA – applications
• Crime scene analysis (identifying criminal perpetrators)
– CODIS System (using STRs)
– Phenotyping
• Precision Medicine (informing medical decisions)
– Medical decisions based on inherited DNA
• Prevention
• Treatment
– Custom treatment for tumors based on acquired mutations
Explain the basics of how your selected technology works.
• Focus on your biotechnology
• How does it work?
Changing DNA – how it works
• Human gene therapy (curing a disease)
– Compensating for a loss-of-function gene by adding a copy of the functional gene
– Replacing a non-functional gene with the functional gene
– Silencing a gene that is causing problems
• Gene drives (wiping out harmful pest populations)
– Creating a situation in which an organism will always get two copies of a gene
• If it only gets one copy, another is added by special enzymes
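The gene-drive inheritance rule described in the bullets above can be illustrated with a toy simulation. This is a deliberately simplified sketch (one locus, guaranteed conversion, no fitness effects), not a realistic population-genetics model.

```python
import random

def offspring_genotype(parent_a, parent_b, rng):
    """Each parent passes one random allele; a drive allele ('D') then
    converts a heterozygote into a homozygote, mimicking the
    enzyme-mediated copying described on the slide."""
    child = [rng.choice(parent_a), rng.choice(parent_b)]
    if "D" in child:          # one drive copy inherited ...
        child = ["D", "D"]    # ... the enzymes add the second copy
    return child

rng = random.Random(0)
# Cross a drive-carrying homozygote with wild type ('w'):
# every offspring inherits one 'D', so all end up D/D.
kids = [offspring_genotype(["D", "D"], ["w", "w"], rng) for _ in range(5)]
print(kids)  # every child is ['D', 'D']
```

This is why a drive allele can sweep through a pest population far faster than ordinary Mendelian inheritance would allow.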
Changing DNA – how it works
• GM crops (enhancing agriculture)
– Genes added (usually from another species) to confer some trait such as disease resistance, insect resistance, tolerance of drought/heat, enhanced nutrition
– Genes suppressed to prevent ripening too fast, etc.
Changing DNA – how it works
• GM animals (agriculture, special products, research)
– GM meats
• A gene has been added to meat animals to grow quicker, have less fat, etc.
– “Pharming”
• A gene has been added so that the animal secretes a specialized product in milk or other secretions
– Xenotransplantation
• Genes added/removed to make animal tissues more like human tissue for transplantation purposes
This paper is the result of collaboration between two projects: Emocionário and eSPERTo.
Emocionário aims at organizing emotions in Portuguese and annotating them in corpora. eSPERTo is a paraphrasing system that uses the NooJ linguistic engine, grammars, and lexicons.
The aims for this collaboration were fivefold: (i) from Emocionário’s point of view, it would be very useful to have an emotion paraphraser to help us identify more cases of emotions in our corpora; (ii) from eSPERTo’s point of view, adding emotion paraphrases would considerably enhance its paraphrasing power; (iii) applying the emotion classification to a hitherto unused application domain would be a good way to evaluate Emocionário’s capabilities and shortcomings; (iv) both projects would gain from learning more about real paraphrases of emotion in text; and finally, (v) an interesting question is to assess how good the methodology employed to harvest emotion paraphrases from parallel text is.
This paper presents a comparative study of alignment pairs, either contrasting expressions or stylistic variants of the same expression in the European (EP) and the Brazilian (BP) varieties of Portuguese. The alignments were collected semi-automatically using the CLUE-Aligner tool, which allows all pairs of paraphrastic units resulting from the alignment task to be recorded in a database. The corpus used was the children’s literature book "Os Livros Que Devoraram o Meu Pai" (The Books that Devoured My Father) by the Portuguese author Afonso Cruz and the Brazilian adaptation of this book. The main goal of the work presented here is to gather equivalent phrasal expressions and different syntactic constructions which convey the same meaning in EP and BP, and to contribute to the optimisation of editorial processes that are compulsory in the adaptation of texts but suitable for any type of editorial process. This study provides a scientific basis for future work in the area of editing, proofreading, and converting text to and from any variety of Portuguese from a computational point of view, namely for use in a paraphrasing system with a variety adaptation functionality, even in the case of a literary text. We contemplate “challenging” cases, from a literary point of view, looking for alternatives that do not tamper with the imagery richness of the original version.
This study proposes a comparative analysis (linguistic, but also literary and cultural) between the Portuguese and Brazilian editions of a work of children's and young-adult literature, Os Livros que devoraram o meu pai, by the Portuguese author Afonso Cruz, which appears in the suggested reading lists of the curricula of both Portugal and Brazil. The specific objective is to present and discuss a selection of lexical units, idioms, and phrasal structures with adjectival function that alternate between the two varieties, that is, between the author's choices in the EP variety and the corresponding solutions adopted in the BP version. The chosen methodology centres on contrastive linguistic analysis carried out with the help of digital tools based on the eSPERTo project, using semi-automatic alignments produced with the CLUE-Aligner tool (REF). The corpus consists of the Portuguese and Brazilian editions of the work under study. The general aim of this work is to optimise the editorial processes necessarily involved in adapting texts, and to survey the main difficulties of that process. This implies, among other things, an awareness of the limits imposed by a literary text, such as the fine line between indispensable adaptation and excessive intervention. Building on the results obtained, we also intend to encourage the investigation of linguistic resources for the purposes of editing, proofreading, and teaching Portuguese as a first and/or foreign language, among other applications.
This presentation addresses the problem of translating SVCs, such as fazer uma operação (to perform an operation). In particular, it focuses on the MT of biomedical-related SVCs. It argues that paraphrasing can help translate these MWEs with higher quality. This work is based on my PhD research, which addressed the problem of paraphrasing and translating SVCs in general.
ReWriter uses linguistically based automated paraphrasing and text-editing mechanisms to help users with their writing needs by providing suggestions for customized text authoring. It also generates word and phrasal usage data to help guide decision-making. ReWriter can be used in word processing applications or linguistic quality control for both source and target texts and it is a useful pre-editor for machine translation. The linguistic resources behind ReWriter, the paraphrasing grammars, and the tools from which ReWriter was derived will also be described, in this particular case, we illustrate ReWriter as a tool to process legal language.
Poster presented at the 2nd meeting of the COST Action CA16105 - enetCollect : European Network for Combining Language Learning with Crowdsourcing Techniques, which took place at Alexandru Ioan Cuza University, in Iasi, Romania.
The poster shows how chatbots can play an important role in Language Learning applications.
This paper reports our first attempt at integrating eSPERTo’s paraphrastic engine, which is based on the NooJ platform, with two application scenarios: a conversational agent and a summarization system. We briefly describe eSPERTo’s base resources and the modifications to these resources that enabled the production of the paraphrases required to feed both systems. Although the improvement observed in both scenarios is not significant, we present a detailed error analysis to further improve the achieved results in future experiments.
This paper presents the automation process of paraphrasing and converting Portuguese constructions typical of informal or spoken language into a formal written language. We illustrate this automation process with examples extracted from the e-PACT corpus that involve the placement of clitic pronouns in verbal compound contexts. Our task consists in paraphrasing and normalizing, among others, constructions such as "vou-lhe/posso-lhe fazer uma surpresa" into "vou/posso fazer-lhe uma surpresa" (lit: I will/can_to him/her make a surprise / I will/can make_to him/her a surprise; I will/can make him/her a surprise), where the clitic pronoun "lhe" migrates from an enclitic position after the first verb of the verbal compound to an enclitic position after the main verb, which is the verb responsible for the selection of that pronominal argument. The first verb is either an auxiliary verb or a volitive verb, e.g. "querer" 'want'. This is a standard revision procedure in EP. Cases like this represent linguistic phenomena where language students and language users in general get confused or stumble. The paper focuses on general language where the phenomena being observed occur, describes examples of interest found in the corpus, and presents an automatic solution for the normalization of informal syntactic inadequacies found in the researched structures into standard formal writing structures through the application of very generic transformational grammars.
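The clitic-migration normalization described above can be sketched with a single substitution rule. This toy regex covers only a few auxiliary verbs and the clitic "lhe"; the actual system uses generic NooJ transformational grammars rather than regular expressions.

```python
import re

# Illustrative auxiliary/volitive verbs; the real grammars cover the full set.
AUX = r"(vou|posso|quero|devo)"

def normalize_clitic(sentence):
    """Move enclitic 'lhe' from the auxiliary to the following main verb:
    'vou-lhe fazer ...' -> 'vou fazer-lhe ...'."""
    return re.sub(AUX + r"-lhe (\w+)", r"\1 \2-lhe", sentence)

print(normalize_clitic("vou-lhe fazer uma surpresa"))
# vou fazer-lhe uma surpresa
```

Sentences without the informal pattern pass through unchanged, which is the behavior a normalization grammar needs.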
This paper presents the alignment of verbal predicate constructions with the clitic pronoun "lhe" in the European (EP) and Brazilian (BP) varieties of Portuguese, such as in the sentences "Já lhe arrumaram a bagagem" | "Sua bagagem está seguramente guardada" 'His baggage is safely stowed away', where the EP dative proclisis "lhe" contrasts with the BP possessive pronoun "sua". We have selected several different paraphrastic contrasts, such as proclisis and enclisis, clitic pronouns co-occurring with relative pronouns and negation-type adverbs, among other constructions to illustrate the linguistic phenomenon. Some differences correspond to real contrasts between the two Portuguese varieties, while others purely represent stylistic choices. The contrasting variants were manually aligned in order to constitute a gold standard dataset, and a typology has been established to be further enlarged and made publicly available. The paraphrastic alignments were performed in the e-PACT corpus using the CLUE-Aligner tool. The research work was developed in the framework of the eSPERTo project.
This paper performs a detailed analysis on the alignment of Portuguese contractions, based on a previously aligned bilingual corpus. The alignment task was performed manually in a subset of the English-Portuguese CLUE4Translation Alignment Collection. The initial parallel corpus was pre-processed and a decision was made as to whether the contraction should be maintained or decomposed in the alignment. Decomposition was required in the cases in which the two words that have been concatenated, i.e., the preposition and the determiner or pronoun, go in two separate translation alignment pairs (PT- [no seio de] [a União Europeia] EN- [within] [the European Union]). Most contractions required decomposition in contexts where they are positioned at the end of a multiword unit. On the other hand, contractions tend to be maintained when they occur at the beginning or in the middle of the multiword unit, i.e., in the frozen part of the multiword (PT- [no que diz respeito a] EN- [with regard to] or PT- [além disso] EN- [in addition]). A correct alignment of multiwords and phrasal units containing contractions is instrumental for machine translation, paraphrasing, and variety adaptation.
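The decomposition step discussed above (splitting a contraction back into preposition plus determiner/pronoun so that each part can join a different alignment pair) can be sketched as a lookup. The table below is a toy fragment; a full resource would cover the complete preposition-determiner paradigm and the maintain-versus-decompose decision the paper describes.

```python
# Toy decomposition table for a few Portuguese contractions.
CONTRACTIONS = {"no": ["em", "o"], "na": ["em", "a"],
                "do": ["de", "o"], "da": ["de", "a"],
                "disso": ["de", "isso"]}

def decompose(tokens):
    """Replace each contraction with its component words; other tokens pass through."""
    out = []
    for t in tokens:
        out.extend(CONTRACTIONS.get(t.lower(), [t]))
    return out

print(decompose("no seio de a União Europeia".split()))
# ['em', 'o', 'seio', 'de', 'a', 'União', 'Europeia']
```

After decomposition, "em" can be aligned with the English preposition ("within") while "o" joins the noun phrase, exactly the split the alignment pairs in the example require.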
This paper presents a methodology to extract a paraphrase database for the European and Brazilian varieties of Portuguese, and discusses a set of paraphrastic categories of multiwords and phrasal units, such as the compounds toda a gente vs todo o mundo "everybody" or the gerundive constructions [estar a + V-Inf] vs [ficar + V-Ger] (e.g., estive a observar vs fiquei observando "I was observing"), which are extremely relevant to high-quality paraphrasing. The variants were manually aligned in the e-PACT corpus, using the CLUE-Aligner tool. The methodology, inspired by the Logos Model, focuses on a semantico-syntactic analysis of each paraphrastic unit and constitutes a subset of the Gold-CLUE-Paraphrases. The construction of a larger dataset of paraphrastic contrasts among the distinct varieties of the Portuguese language is indispensable for variety adaptation, i.e., for dealing with the cultural, linguistic, and stylistic differences between them, making it possible to convert texts (semi-)automatically from one variety into another, a key function in paraphrasing systems. This topic represents an interesting new line of research with valuable applications in language learning, language generation, question answering, summarization, and machine translation, among others. The paraphrastic units are the first resource of its kind for Portuguese to become available to the scientific community for research purposes.
Poster presented at the 2nd meeting of the COST Action CA16105 - enetCollect : European Network for Combining Language Learning with Crowdsourcing Techniques, which took place at Alexandru Ioan Cuza University, in Iasi, Romania.
This poster shows paraphrastic suggestions in the eSPERTo paraphrasing system applied to a QA application on a virtual agent and to a summarization tool. It also shows how paraphrases can be used in language learning and the tests envisaged to make eSPERTo a Portuguese learning tool.
Summary of the eSPERTo project (Paraphrasing System for Text Editing and Revision), based on an interview for the programme Páginas de Português on Antena 2.
ReEscreve (in English, ReWriter) is a multi-purpose paraphraser that uses grammar-based paraphrasing capabilities suitable for source and target control (pre- and post-editing) and is useful for human and machine translation.
Spoken Language Systems Lab @ INESC-ID poster presented at the 1st meeting of the COST Action CA16105 - enetCollect : European Network for Combining Language Learning with Crowdsourcing Techniques, which took place at Eurac Research in Bolzano, Italy.
This presentation describes the integration of the lexicon-grammar of predicate nouns with the support verb "fazer" ("to do" or "to make") into Port4NooJ, the Portuguese language module for NooJ. Port4NooJ resources are used by the eSPERTo system to generate paraphrases, i.e., alternative ways to say or write the same sentence.
Syntactic-semantic analysis for information extraction in biomedicine
1. Syntactic-semantic analysis for information extraction in biomedicine
Sérgio Matos1, Anabela Barreiro2
1IEETA, Universidade de Aveiro
2Centro de Linguística, Universidade do Porto
aleixomatos@ua.pt; barreiro_anabela@hotmail.com
June 2009
2. Outline
• Background
• Text Mining and Information Extraction in Biomedicine
• Objectives
• Implementation
• Results
• Conclusions
3. Background
• Genomics and Proteomics are fast-growing fields
• Literature grows exponentially
– MEDLINE/PubMed ~ 18m citations
• Researchers need to contextualize their theories and findings
– Interactions between genes/proteins
– Involvement in biological processes and in disease
– And many other factors...
• How to keep up-to-date with new knowledge in the field?
4. Background
• Manually curated biomedical databases are a good source of information
– Publications are reviewed and important information is added to DBs (e.g. protein interactions)
– Impossible to keep DBs up-to-date due to the increasing volume of publications
• Text Mining can be useful for
– Information retrieval (IR)
– Information extraction (IE)
– DB curators and end-users (researchers)
5. Text Mining and Information Extraction in Biomedicine
• Text mining deals with the automated processing of texts to derive high-quality information
• Information Extraction can be seen as one application of TM
• Different processing levels
• Entity Recognition (ER): genes, proteins, etc.
• Normalization: ATF2 – GeneID 1386; ATF-2 – UniProt P15336
• Relation extraction: PPI, gene/disease
• Event extraction: gene expression, regulation
+ semantics + domain knowledge
6. Text Mining and Information Extraction in Biomedicine
• Good results for NER, but limited to a few entity types
– 80%-90% for recognition of genes/proteins
– Need to include more entities, like chemical compounds, diseases, experimental conditions
• Relation extraction has focused mostly on PPI
• Inter-concept relations not much explored
– e.g. gene/disease, drug/target
– mostly based on co-occurrence statistics
7. Text Mining and Information Extraction in Biomedicine
• Recent interest towards extraction of events
– BioNLP shared task and BioCreaTive II.5
• ... and other entities / facts
– e.g. Experimental conditions, lab techniques, measurements
• ... Discourse analysis
– “indicating/suggesting that...”, “in contrast...”
• Full-text vs. Abstracts
– Complexity in grammar
8. Linguistic Resources for Biomedical TM
• UMLS Metathesaurus
– various terms, all linked to the same concept (e.g. ‘Hypertension’)
– semantic information provided by the UMLS Semantic Network
• BioLexicon
– Includes domain relevant verbs (localize, bind, express, …)
• Lexical resources can be created from available online DBs
– NCBI Entrez Gene for gene names
– UniProt for proteins
– OMIM for diseases
– Various ontologies
9. Objectives
• Extract phrases indicating a biomolecular event from scientific text
• Biomolecular events include various types
– Examples
• “phosphorylation of TRAF2”
• “localization of beta-catenin”
• “TRADD interacts with TES2”
• BioNLP'09 Shared Task on Event Extraction
– http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/SharedTask/
10. Objectives
• Six event types considered
– Localization, Binding, Gene expression, Transcription, Protein Catabolism, Phosphorylation
• Training data
– Annotation of genes/proteins occurring in each input text, including the text span (start and end characters)
– Annotation of the events, including the event type, the participating entities and the corresponding trigger word (with start and end characters)
• Test data
– Annotation of participating genes/proteins is given
– Create annotation of events for the given entities
11. Implementation
• General approach
– Create syntactic grammars to detect phrases that indicate events
– Grammars are based only on NEs and domain verbs (and derived names)
• Requisites
– Grammar outputs should indicate the event type
• Solution
– Event types can be associated with the trigger word using the semantic properties in NooJ dictionaries
– Event types associated with each trigger word are derived from the training data
13. Implementation
Lemma | PoS | FLX | Semantic properties | ID | TAXID
human | N | TABLE | ORGANISM | – | 9606
Homo sapiens | N | – | ORGANISM | – | 9606
Mus musculus | N | – | ORGANISM | – | 10090
Breast cancer type 1 susceptibility protein | N | – | PROTEIN | P38398 | 9606
BRCA1 | N | – | PROTEIN | P38398 | 9606
BRCA1 | N | – | PROTEIN | P48754 | 10090
BRCA1 | N | – | GENE | 672 | 9606
RNF53 | N | – | GENE | 672 | 9606
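The ambiguity in these example entries (BRCA1 can denote a human protein, a mouse protein, or a human gene) can be sketched as a simple lookup structure. This is an illustrative reconstruction, not NooJ's own dictionary format; the `Entry` class and `lookup` function are invented for the sketch, and the entries are the ones shown above.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Entry:
    lemma: str   # surface form / lemma
    sem: str     # semantic property: ORGANISM, PROTEIN, GENE
    id: str      # database ID (UniProt accession or Entrez Gene ID), "" if none
    taxid: int   # NCBI taxonomy ID (9606 = human, 10090 = mouse)

# Entries taken from the slide's example dictionary
LEXICON = defaultdict(list)
for e in [
    Entry("human",        "ORGANISM", "",       9606),
    Entry("Homo sapiens", "ORGANISM", "",       9606),
    Entry("Mus musculus", "ORGANISM", "",       10090),
    Entry("BRCA1",        "PROTEIN",  "P38398", 9606),
    Entry("BRCA1",        "PROTEIN",  "P48754", 10090),
    Entry("BRCA1",        "GENE",     "672",    9606),
    Entry("RNF53",        "GENE",     "672",    9606),
]:
    LEXICON[e.lemma].append(e)

def lookup(term):
    """Return all candidate readings of a term; ambiguity is preserved
    for a later normalization/disambiguation step."""
    return LEXICON.get(term, [])
```

For instance, `lookup("BRCA1")` returns three candidate readings (two proteins and one gene) that a normalization step would have to disambiguate, while BRCA1 and RNF53 share gene ID 672, reflecting synonymy.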
14. Implementation
• Resources
– Entity dictionary
• Create dictionary with list of entities occurring in the texts
– BioLexicon verb dictionary
• Adapted to include event type
– From the training data, extract the verbs associated with events
– Add a semantic property to the dictionary entry indicating the event type
– Example: “express,V+EventType=Gene_Expression”
• Added inflectional and derivational rules
– The inflected and derived forms inherit the verb’s semantic properties
15. Implementation
• Verb dictionary
Lemma | PoS | DRV | FLX | EventType
express | V | ION:TABLE | ABOLISH | Gene_expression
ligate | V | TION:TABLE | SMILE | Binding
stimulate | V | TION:TABLE | SMILE | Positive_regulation
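How a derived nominalization inherits the verb's event type can be sketched as follows. The suffix handling is a crude stand-in for NooJ's DRV paradigms, and the lexicon contains only the three verbs from this slide.

```python
# Verb lexicon from the slide; in NooJ this is written as e.g.
# "express,V+EventType=Gene_expression" with DRV/FLX paradigms.
VERB_EVENTS = {
    "express":   "Gene_expression",
    "ligate":    "Binding",
    "stimulate": "Positive_regulation",
}

def event_type(token):
    """Map a trigger word (verb or derived noun) to its event type.

    The suffix stripping below is a rough stand-in for NooJ's
    derivational morphology (DRV): a nominalization such as
    'expression' or 'stimulation' is reduced to its verb lemma.
    """
    token = token.lower()
    if token in VERB_EVENTS:
        return VERB_EVENTS[token]
    if token.endswith("ion"):
        stem = token[:-3]                     # 'expression' -> 'express'
        for candidate in (stem, stem + "e"):  # 'stimulat' -> 'stimulate'
            if candidate in VERB_EVENTS:
                return VERB_EVENTS[candidate]
    return None
```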
16. Implementation
• Syntactic grammars
– Sentences from the training set used to generate surface patterns
– Manual procedure
– Seven grammars created
– Example:
“stimulation of human CD4”
18. Results
• Example patterns extracted from the texts
Pattern | Concordance in text
<entity> [<entity_type>] <nominalization> | HSP gene expression
<nominalization> “of” [<entity_type>] <entity> | upregulation of Fas
<entity> [<entity_type>] <be> [“not”] [<adverb>] <verb> | IL-2R stimulation was totally inhibited
<verb> <preposition> <entity> | binding of TRAF2
<verb> <nominalization> “of” <entity> | suppressing activation of STAT6
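A minimal regex rendering of one of these patterns, <nominalization> "of" <entity>, might look like the sketch below. It assumes the entity names are already annotated (as in the shared task setup) and uses a hand-picked nominalization list; real NooJ grammars operate over dictionary-annotated tokens rather than raw strings, so this is only an approximation.

```python
import re

# Hypothetical inputs: entities would come from the shared task
# annotations, nominalizations from the adapted verb dictionary.
ENTITIES = ["TRAF2", "Fas", "STAT6"]
NOMINALIZATIONS = {
    "phosphorylation": "Phosphorylation",
    "upregulation":    "Positive_regulation",
    "activation":      "Positive_regulation",
}

entity_re = "|".join(map(re.escape, ENTITIES))
nom_re = "|".join(NOMINALIZATIONS)

# Pattern: <nominalization> "of" <entity>, e.g. "upregulation of Fas"
PATTERN = re.compile(rf"\b({nom_re})\s+of\s+({entity_re})\b")

def extract_events(sentence):
    """Return (event_type, entity) pairs found in a sentence."""
    return [(NOMINALIZATIONS[m.group(1)], m.group(2))
            for m in PATTERN.finditer(sentence)]
```

For example, `extract_events("We observed phosphorylation of TRAF2 and upregulation of Fas")` yields `[("Phosphorylation", "TRAF2"), ("Positive_regulation", "Fas")]`.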
19. Results
• Average results
Event type | Recall | Precision | F-score
Localization | 35.63 | 70.45 | 47.33
Binding | 13.54 | 34.06 | 19.38
Gene Expression | 46.40 | 78.45 | 58.31
Transcription | 33.58 | 41.07 | 36.95
Protein Catabolism | 35.71 | 62.50 | 45.45
Phosphorylation | 49.63 | 79.76 | 61.19
Average | 36.76 | 65.58 | 47.11
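The F-scores in the table are the harmonic mean of precision and recall; a quick check (not part of the original slides) reproduces the reported values:

```python
def f_score(precision, recall):
    """F-score as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Average row:      P = 65.58, R = 36.76  ->  F ~ 47.11
# Phosphorylation:  P = 79.76, R = 49.63  ->  F ~ 61.19
```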
20. Conclusions
• NooJ syntactic grammars for IE
– Simple and flexible approach
– Takes advantage of the semantic properties and the inflectional and derivational morphology in NooJ dictionaries
• Pattern identification
– Manual method is limited
– How to generate new patterns automatically?
• Gene regulatory events
– Described by complex constructions
– Can syntactic grammars be used for this type of event?
21. References and Acknowledgments
• BioLexicon was developed within the BOOTStrep project
– http://www.nactem.ac.uk/biolexicon/
– http://www.bootstrep.eu/bin/view/Extern/WebHome
• Data set from the BioNLP’09 Shared Task on Event Extraction
– http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/SharedTask/
Sérgio Matos is funded by Fundação para a Ciência e Tecnologia (FCT) under the Ciência2007 programme.
Editor's Notes
Involvement of genes in biological processes and in disease
DB curators can use text mining tools to speed up their work. They can sort the articles by relevance and find the relevant sentences. If an article seems to contain relevant information, they still need to read the full article to validate that.
Users can use TM tools for better IR. There are also some tools that present the results obtained from IE
Information retrieval - find and rank publications containing relevant information for a particular gene/protein/disease/...
Information extraction – extract information such as interaction between two proteins or the involvement of a gene in a disease
Different processing levels in TM for biomedicine require different levels of domain knowledge (semantics)
Named entity recognition – recognizing gene/protein names, diseases, etc
Normalization – Similar genes in different organisms (human, mouse, fly, etc.) usually have the same gene symbol. Also, a gene and the related protein frequently have the same name, and sometimes a gene and a related disease do too (there is high ambiguity!). Normalization means two things: disambiguation (is it a gene, protein, or disease?) and specifying which gene/protein it refers to (for example, the x gene in humans, or the x gene in mice)
Relation extraction – extract a relation between a gene and a protein (“protein A, encoded by gene Y”), a gene and a disease (“Y is involved in X”), or between proteins (protein-protein interaction, PPI)
Event extraction – similar, but may involve one, two or more entities: “gene Y was expressed”, “protein X binding with protein Z”
This and next slide not too important. Just give an insight of where the field stands and where it’s going to (our own view)
The shared tasks are evaluation contests for IR and IE
There is some interest in discourse analysis (including mine), for summarization for example, and for finding things like new research directions (“further work should be carried out to validate .....”, “this may indicate ...”) and hesitation/contradiction (“this seems to indicate...”, “author X showed... However...”)
Most work is done over abstracts (from the MEDLINE/PubMed database) but most information is in the full-texts
Abstracts have a more “restrained” grammar as compared to full-text, and that “facilitates” our approach
The BioCreaTive challenge is on full texts
The BioLexicon is for now available in an early version, as a collection of tables that can be converted into a relational database (like MySQL). In this version the linguistic information is limited to a list of verbs
The final version should be available soon and it will include inflectional and derivational forms of the verbs. It will also include information about what a specific verb/noun may indicate in the text
A trigger word is the word in the sentence that indicates the event. In “TRADD interacts with TES2”, “interacts” is the trigger word
Given this framework, we do not need to do NER, as the entities that we should worry about are given to us.
Also, using the training data, it is simple to obtain a list of trigger words for each type of event.
Example entries in the biomedical dictionaries
We use separate dictionaries: one for organisms, other for genes+proteins
Simple to add new entities like diseases or anatomy (arm, leg, heart, ...)
IDs are obtained from major databases (Uniprot for proteins, Entrez Gene for genes, OMIM for diseases, etc)
Note ambiguity in BRCA1: can be either a human or mouse protein or a human gene
Note synonyms: BRCA1 and RNF53 represent the same human gene, with the unique ID 672 (NCBI gene ID)
Note: Mus musculus scientific name for mouse
As the current BioLexicon dictionaries do not contain semantic information about the verbs (which verbs are used to indicate a “localization”, “binding”, “regulation” and each of the other event types), this had to be derived from the training data
EPIA paper: “Based on the manual linguistic annotations, we extracted the sentences corresponding to each event, and assigned the event type to the verbs found on those sentences. We then manually checked this list and selected only those verbs showing a specific link to a type of event. In case verbs were linked to more than one event type, only the most frequent event type was selected, and the remaining ones removed.“
- If “localize” is used to indicate a “localization” event 20 times in the training data and is used to indicate a “binding” event 1 time, then we only keep the “localization” type
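The frequency-based filtering described in this note can be sketched as follows; the counts below are invented for illustration.

```python
from collections import Counter

# (verb, event_type) pairs as they would be harvested from the
# annotated training sentences; the counts here are invented.
observations = (
    [("localize", "Localization")] * 20
    + [("localize", "Binding")] * 1
    + [("express", "Gene_expression")] * 15
)

def dominant_event_types(pairs):
    """Keep, for each verb, only its most frequent event type."""
    counts = Counter(pairs)
    best = {}
    for (verb, etype), n in counts.items():
        if verb not in best or n > best[verb][1]:
            best[verb] = (etype, n)
    return {verb: etype for verb, (etype, _) in best.items()}
```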
Example entries in the verb dictionary
The verbs that may indicate an event also have a ‘+FUNC’ semantic property (not shown)
This is to differentiate from all other entries in the dictionary that are more general.
The dictionary does not include nouns or adjectives; only the ones derived from these verbs are available.
Each grammar describes 1-3 syntactic patterns:
“Stimulation of human CD4” and “human CD4 stimulation”
Same grammar for both forms
Average results are good compared with other results in the BioNLP shared task and given the simple implementation
Binding events are more difficult because they usually (but not always) include two proteins or genes
We did not cover the regulation / down-regulation / up-regulation events for lack of time. These are even more complex and need more elaborate grammars
The proposed method takes advantage of the inflectional and derivational morphology and the semantic properties established in dictionaries and grammars developed with NooJ, which make it possible to associate terminological verbs and their derivations with specific event types.
Methods such as the one proposed in this paper can be used to help database curators identify the most relevant facts in the literature and speed up the annotation process. Tools based on these methods can also provide alternative ways of querying and browsing facts cited in the literature and be useful for researchers.