This document discusses techniques for text mining and natural language processing. It outlines the basic steps: sentence splitting, tokenization, part-of-speech tagging, named entity recognition, and syntactic parsing. It gives examples of these techniques and of the challenges that can arise, such as ambiguous tokens. Hidden Markov Models are also introduced as a machine learning approach to part-of-speech tagging.
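As a rough illustration of how a Hidden Markov Model can resolve an ambiguous token during part-of-speech tagging, the sketch below runs Viterbi decoding over a three-tag toy model. The tag set, the probability tables, and the smoothing constant are all hypothetical values chosen by hand for this example; they are not taken from the summarized document or estimated from any corpus.

```python
import math

# Hypothetical toy model: every number below is hand-picked for illustration.
STATES = ["DET", "NOUN", "VERB"]
START = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
TRANS = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5, "NOUN": 0.4, "VERB": 0.1},
}
EMIT = {
    "DET": {"the": 0.9, "a": 0.1},
    "NOUN": {"dog": 0.4, "walk": 0.2, "park": 0.4},
    "VERB": {"walk": 0.7, "dog": 0.1, "park": 0.2},
}
SMOOTH = 1e-6  # assumed floor probability for unseen events


def logp(table, key):
    """Log-probability lookup with a small floor for unseen keys."""
    return math.log(table.get(key, SMOOTH))


def viterbi(tokens):
    """Return the most probable tag sequence for tokens under the toy HMM."""
    if not tokens:
        return []
    # best[i][s] = (log-prob of the best path ending in state s at position i,
    #               previous state on that path)
    best = [{s: (logp(START, s) + logp(EMIT[s], tokens[0]), None) for s in STATES}]
    for tok in tokens[1:]:
        column = {}
        for s in STATES:
            prev, score = max(
                ((p, best[-1][p][0] + logp(TRANS[p], s)) for p in STATES),
                key=lambda pair: pair[1],
            )
            column[s] = (score + logp(EMIT[s], tok), prev)
        best.append(column)
    # Follow the back-pointers from the best final state.
    state = max(STATES, key=lambda s: best[-1][s][0])
    tags = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        tags.append(state)
    return list(reversed(tags))


print(viterbi(["the", "dog", "walk"]))
print(viterbi(["a", "walk"]))
```

In this toy model the token "walk" can be emitted by both NOUN and VERB; the transition probabilities from the preceding tag are what tip the decision one way or the other.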
This document discusses a pathway-guided approach to predicting the impact of mutations by integrating multiple omics data sources to understand how genes function in pathways. It predicts the functional consequences of mutations by quantifying their effects on surrounding pathways. Pathway signatures can implicate mutations in novel genes and identify critical points that distinguish subtypes. The approach uses curated pathway models and patient omics data to infer pathway activities, which are then used as predictive features for outcomes like drug response or subtypes. This reduces the number of features compared to using individual genes.
Tdp2 and Top2β are proteins involved in DNA repair. This study examines their expression patterns in developing zebrafish embryos to gain insight into their potential roles in neural development. DIG-labeled probes were designed and used to perform whole-mount in situ hybridization, which showed that tdp2 and top2β are specifically expressed in the brain at certain timepoints. Further experiments inhibiting Tdp2 are proposed to study its relationship with Top2β during neural development.
Part 2 of RNA-seq for DE analysis: Investigating raw data - Joachim Jacob
Second part of the training session 'RNA-seq for Differential expression' analysis. We explain the characteristics of RNA-seq data that allow us to detect differential expression. Interested in following this session? Please contact http://www.jakonix.be/contact.html
PPTX - Research In Educational Technology: Expanding Possibilities - butest
This document summarizes Richard Anderson's research into expanding education through technology. It discusses using video conferencing for distance education between universities. It also explores tutored video instruction, where pre-recorded lectures are shown with a facilitator, and digital study halls, which aim to provide primary education in rural India. The research found that facilitating interaction across remote sites, high video quality, and active student participation were important for successful distance education. Facilitated video instruction was also effective, especially with support for facilitators. These techniques show promise for enhancing education in low-resource environments.
Flexible and efficient Gaussian process models for machine ... - butest
This document presents a dissertation on developing computationally efficient Gaussian process models for machine learning tasks. The author develops several techniques to reduce the training cost of Gaussian processes from O(N^3) to O(NM^2), where M is much smaller than the number of training points N. This includes a sparse pseudo-input Gaussian process (SPGP) method that uses a set of M "pseudo-inputs" optimized during training. The author also combines local and global approximations in a partially independent training conditional approach. Further, variable noise models and dimensionality reduction are introduced to increase the applicability of Gaussian processes to complex datasets. Empirical results demonstrate the effectiveness of the proposed methods.
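To give a sense of scale for the reduction in training cost from cubic in N to N times M squared, the short calculation below compares the two leading terms. The values of N and M are assumptions picked for illustration only, and constant factors are ignored.

```latex
\frac{N^{3}}{N M^{2}} = \left(\frac{N}{M}\right)^{2},
\qquad
N = 10^{4},\; M = 10^{2}
\;\Rightarrow\;
\frac{N^{3}}{N M^{2}} = \frac{10^{12}}{10^{8}} = 10^{4}.
```

Under these assumed sizes the sparse approximation would need roughly ten thousand times fewer operations in the dominant term, which is why keeping M much smaller than N matters.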
The document discusses a machine learning lecture on supervised learning. It provides an overview of administrative details for the class like project requirements and deadlines. It then reviews concepts from the previous lecture like the differences between supervised, unsupervised, and reinforcement learning. The remainder of the document outlines the topics to be covered in the current lecture, including defining supervised learning, discussing hypothesis spaces, and introducing linear threshold algorithms.
The document discusses different types of machine learning, including supervised learning, unsupervised learning, and reinforcement learning. It provides examples of classification problems and shows how machine learning algorithms can build models that classify new data based on patterns learned from training data. Learning paradigms such as decision trees and artificial neural networks are also mentioned. The key benefits of machine learning, such as improving performance over time and handling unknown environments, are highlighted.
A Machine Learning Toolkit for Power Systems Security Analysis - butest
This document describes a machine learning toolkit called UMLPSE that has been developed to facilitate power systems security analysis and contingency analysis. UMLPSE has a modular structure combining independent power flow and machine learning tool packages. It incorporates a data warehouse to store operating point data and contingency analysis results. The toolkit is applied to assess contingencies on the power grid of Crete, Greece involving 61 buses and 78 lines. It uses various operating points simulated through different connectivity, load, and generation scenarios to train machine learning models to automatically rank contingencies by risk for new operating points.
Natural language processing (NLP) is a field that develops techniques to allow computers to analyze, understand, and generate human language. NLP aims to address challenges in areas like information extraction, automatic summarization, and dialogue systems. OpenNLP is an open source Java toolkit that provides common NLP tasks like sentence detection, tokenization, part-of-speech tagging, named entity recognition, and parsing.
Nltk natural language toolkit overview and application @ PyCon.tw 2012 - Jimmy Lai
These slides introduce a Python toolkit for Natural Language Processing (NLP). The author introduces several useful topics in NLTK and demonstrates them with code examples.
Text mining tools for semantically enriching scientific literature - Duncan Hull
1) Text mining tools can semantically enrich scientific literature by extracting concepts, relationships, and facts to enable more precise semantic searching beyond keywords.
2) This allows documents to be annotated with semantic metadata derived from text mining, improving information access and discovery of hidden links and associations.
3) Systems have been developed that leverage techniques such as named entity recognition, relationship extraction, and ontology population to provide semantically searchable databases of literature.
This document provides an overview of a talk on genome curation and manual annotation using the Apollo genome annotation tool. The talk aims to help scientists understand the genome curation process from assembled genome to automated and manual annotation. It will introduce Apollo and teach how to identify homologs of known genes, corroborate and modify automated gene models using evidence in Apollo. The talk will also refresh attendees on key biological concepts like the definition of a gene, central dogma, transcription, and translation to better understand manual annotation.
This document describes a pipeline for predicting long non-coding RNA (lncRNA) transcripts from comprehensive rat renal cell-type specific transcriptome libraries. The pipeline applies characteristics of lncRNAs, such as shorter open reading frames, lack of conserved domains, and cell-type specific expression, to filter transcripts from the libraries. Applying this multi-step filtering process identifies transcripts predicted to be lncRNAs based on satisfying all the characteristics. The results are stored in GTF format for further analysis and classification of different lncRNA types.
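The multi-step filtering described above can be sketched as a chain of predicates that a transcript must satisfy in full. This is only a minimal sketch: the record fields (`sequence`, `has_conserved_domain`, `expressed_in`), the thresholds, and the forward-strand-only ORF scan are hypothetical choices made for illustration, not the pipeline's actual implementation.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_codons(seq):
    """Longest forward-strand ORF, in codons from ATG up to (not including) the stop."""
    seq = seq.upper()
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, (i - start) // 3)
                start = None
    return best


def is_candidate_lncrna(transcript, max_orf_codons=100, max_cell_types=2):
    """Keep a transcript only if it satisfies every characteristic.

    Both thresholds are assumed placeholder values.
    """
    return (
        longest_orf_codons(transcript["sequence"]) < max_orf_codons  # short ORF
        and not transcript["has_conserved_domain"]                   # no conserved domain
        and 1 <= len(transcript["expressed_in"]) <= max_cell_types   # cell-type specific
    )


# Hypothetical records; the sequence is made up for the example.
transcripts = [
    {"id": "t1", "sequence": "ATGAAATTTTAA", "has_conserved_domain": False,
     "expressed_in": {"podocyte"}},
    {"id": "t2", "sequence": "ATGAAATTTTAA", "has_conserved_domain": True,
     "expressed_in": {"podocyte"}},
]
candidates = [t["id"] for t in transcripts if is_candidate_lncrna(t)]
print(candidates)
```

Expressing each characteristic as its own condition keeps the filter easy to audit, since a transcript is rejected as soon as any single criterion fails.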
1. The document outlines objectives and topics for a lesson on DNA and genetics, including describing DNA structure and function, DNA replication and protein synthesis, and implications of DNA research.
2. Key concepts covered are DNA structure, transcription and translation processes, differences between DNA, mRNA and tRNA, mutations, and medical applications of understanding DNA.
3. Students will complete activities like extracting DNA from cells and building DNA and RNA models to reinforce these concepts.
The Phenoscape Knowledgebase is a collaboration between researchers at multiple institutions that aims to semantically integrate phenotype data. It contains over 4 million asserted links between data from fish systematics publications, Zebrafish model organism databases, and ontologies. Researchers annotate evolutionary phenotype characters using Phenex and the annotations are stored in an Ontology-Based Database along with the ontologies. The knowledgebase uses reasoning to infer new links between phenotypes, genes, taxa and anatomy and provides tools for exploring and mining the integrated data.
This is an introduction to conducting manual annotation efforts using Apollo. This webinar was offered to members of the i5K Research community on 2015-10-07.
NCBI has developed a powerful suite of online biomedical and bioinformatics resources, including old friends like PubMed and OMIM and newer resources such as Genome. This collection of databases and tools is widely used by scientists and medical professionals across the world. With such a wealth of information, it is easy to get overwhelmed. Join us for an overview of NCBI resources for the information professional with an emphasis on biodata connectivity. No science degree required!
Apollo is a web-based application that supports and enables collaborative genome curation in real time, allowing teams of curators to improve on existing automated gene models through an intuitive interface. Apollo allows researchers to break down large amounts of data into manageable portions to mobilize groups of researchers with shared interests.
The i5K, an initiative to sequence the genomes of 5,000 insect and related arthropod species, is a broad and inclusive effort that seeks to involve scientists from around the world in their genome curation process, and Apollo is serving as the platform to empower this community.
This presentation is an introduction to Apollo for the members of the i5K Pilot Project on Eurytemora affinis
Comparative genome analysis requires high quality annotations of all genomic elements. Today’s sequencing projects face numerous challenges including lower coverage, more frequent assembly errors, and the lack of closely related species with well-annotated genomes. Precise elucidation of the many different biological features encoded in any genome requires careful examination and review. We need genome annotation editing tools to modify and refine the location and structure of the genome elements that predictive algorithms cannot yet resolve automatically. During the manual annotation process, curators identify elements that best represent the underlying biology and eliminate elements that reflect systemic errors of automated analyses.
Apollo is a web-based application that supports and enables collaborative genome curation in real time, analogous to Google Docs, allowing teams of curators to improve on existing automated gene models through an intuitive interface. Researchers from nearly one hundred institutions worldwide are currently using Apollo for distributed curation efforts in over sixty genome projects across the tree of life: from plants to arthropods, to fungi, to species of fish and other vertebrates including human, cattle (bovine), and dog.
Transposable elements, or transposons, are DNA sequences that can move within genomes. There are two main classes of transposons: those that encode proteins to directly move the DNA element, and retrotransposons that move via an RNA intermediate using reverse transcriptase. Barbara McClintock discovered transposons in the 1940s and 1950s through her studies of maize, where she observed "jumping genes" that caused mosaic color patterns in kernels. Transposons are found in both prokaryotes and eukaryotes and can insert into new locations in genomes, sometimes causing mutations. They have played an important role in genome evolution and can continue to induce genetic variation.
This document discusses chromatin and transposable genetic elements. It defines chromatin as a combination of DNA and proteins that condenses to form chromosomes during cell division in eukaryotes. It is located in the cell nucleus and functions to compact DNA. The document also defines transposable elements as mobile genetic sequences that can move within and between genomes. It describes different types of transposable elements found in prokaryotes and eukaryotes, such as insertion sequences, transposons, Ty elements, SINEs, and LINEs. These elements can cause genetic changes by inserting into genes or regulatory sequences.
AGRF in conjunction with EMBL Australia recently organised a workshop at Monash University Clayton. This workshop was targeted at beginners and biologists who are new to analysing Next-Gen Sequencing data. The workshop also aimed to provide users with a snapshot of bioinformatics and data analysis tips on how to begin to analyse project data. An introduction to RNA-seq data analysis was presented by AGRF Senior Bioinformatician Dr. Sonika Tyagi.
Presented: 1st August 2012
This document discusses text-to-speech (TTS) technology for the Khmer language. It describes TTS as a system that takes electronic text input in Khmer Unicode and outputs synthesized Khmer speech. The document outlines the author's concatenation-based synthesis method using diphones and the steps involved, including word segmentation, text normalization, text-to-sound conversion, syllabification, stress assignment, and sound changes. It also mentions developing a new statistical TTS system using a speech corpus and either an HMM labeler or Sphinx for automatic labeling and unit selection or parameter synthesis.
Cambridge Pre-U Biology - 1.6 Genes and Protein Synthesis PART 1 Sample - mrexham
This is a widescreen fully animated and editable PowerPoint presentation that covers the first half of section 1.6 of the Cambridge Pre-U Biology course.
It is 64 slides long and covers the following topics:
What is a gene?
How does the genetic code work?
Protein synthesis
The lac operon
Variation
Proteomics and genomics
The full PowerPoint can be downloaded from mrexham.com
Chromatin is a complex of DNA and proteins found in the nucleus of eukaryotic cells. It functions to compact DNA into chromosomes. Chromatin is made up of nucleosomes, which consist of DNA wound around histone proteins. Nucleosomes can further condense to form chromatin fibers. During cell division, chromatin condenses even further to form chromosomes. Chromatin is involved in processes like DNA replication, transcription, repair, and recombination. There are two main types: euchromatin, which decondenses during interphase, and heterochromatin, which remains condensed. Transposable elements are DNA sequences that can change positions within a genome. They are found in both prokaryotes and eukary
This document contains review questions about DNA, mutations, transcription and translation. It asks about the differences between DNA and RNA, the four bases of RNA, the types of RNA involved in translation, and components of DNA like purines and pyrimidines. Questions also cover mutations like frameshift, substitution and their effects on proteins. The central dogma of biology and components of proteins are discussed. Transcription and translation examples are provided to analyze the impact of a point mutation.
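A point-mutation exercise of the kind mentioned above can be worked through in a few lines. The sketch below is illustrative only: the DNA sequence is invented, it is treated as the coding strand (so transcription is just T to U), and the codon table is deliberately partial, covering only the codons used here.

```python
# Partial codon table (standard genetic code); only the codons this example needs.
CODON_TABLE = {
    "AUG": "Met", "GAG": "Glu", "GAA": "Glu",
    "GUG": "Val", "GUA": "Val", "CCU": "Pro",
    "UAA": "Stop", "UAG": "Stop", "UGA": "Stop",
}


def transcribe(coding_dna):
    """mRNA corresponding to a coding-strand DNA sequence (T replaced by U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """Translate from the first codon until a stop codon; unknown codons become 'Xaa'."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        residue = CODON_TABLE.get(mrna[i:i + 3], "Xaa")
        if residue == "Stop":
            break
        protein.append(residue)
    return protein


def point_mutate(dna, index, base):
    """Substitute a single base at a 0-based position."""
    return dna[:index] + base + dna[index + 1:]


wild_type = "ATGGAGCCTTAA"                  # made-up sequence
mutant = point_mutate(wild_type, 4, "T")    # GAG -> GTG in the second codon

print(translate(transcribe(wild_type)))
print(translate(transcribe(mutant)))
```

Here a single substitution changes the second codon from GAG to GTG, so the second amino acid changes from Glu to Val while the rest of the peptide is unaffected. A frameshift, by contrast, would alter every codon downstream of the change.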
This document analyzes YouTube's business model. It explains that YouTube and other online video sites represent a new business model for audiovisual content, driven by the change in consumption habits brought about by new technologies. It describes how YouTube leverages user participation to improve continuously and to attract an audience different from that of traditional media.
The defense was successful in portraying Michael Jackson favorably to the jury in several ways:
1) They dressed Jackson in ornate costumes that conveyed images of purity, innocence, and humility.
2) Jackson was shown entering the courtroom as if on a red carpet, emphasizing his celebrity status.
3) Jackson appeared vulnerable, childlike, and in declining health during the trial, eliciting sympathy from jurors.
4) Defense attorney Tom Mesereau effectively presented a coherent narrative of Jackson as a victim and portrayed Neverland as a place of refuge, undermining the prosecution's arguments.
Michael Jackson was born in 1958 in Gary, Indiana and rose to fame in the 1960s as the lead singer of The Jackson 5, topping music charts in the 1970s. As a solo artist in the 1980s, his album Thriller broke music records. In the 1990s and 2000s, Jackson faced several legal issues related to child abuse allegations while continuing to release music. He married Lisa Marie Presley and later Debbie Rowe, and had three children before his death in 2009.
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...butest
This document appears to be a list of popular books from various authors. It includes over 150 book titles across many genres such as fiction, non-fiction, memoirs, and novels. The books cover a wide range of topics from politics to cooking to autobiographies.
The prosecution lost the Michael Jackson trial due to several key mistakes and weaknesses in their case:
1) The lead prosecutor, Thomas Sneddon, was too personally invested in the case against Jackson, having pursued him for over a decade without success.
2) Sneddon's opening statement was disorganized and weak, failing to effectively outline the prosecution's case.
3) The accuser's mother was not credible and damaged the prosecution's case through her erratic testimony, history of lies and con artist behavior.
4) Many prosecution witnesses were not credible due to prior lawsuits against Jackson, debts owed to him, or having been fired by him. Several witnesses even took the Fifth Amendment.
Here are three examples of public relations from around the world:
1. The UK government's "Be Clear on Cancer" campaign which aims to raise awareness of cancer symptoms and encourage early diagnosis.
2. Samsung's global brand marketing and sponsorship activities which aim to increase brand awareness and favorability of Samsung products worldwide.
3. The Brazilian government's efforts to improve its international image and relations with other countries through strategic communication and diplomacy.
The three most important functions of public relations are:
1. Media relations because the media is how most organizations reach their key audiences. Strong media relationships are crucial.
2. Writing, because written communication is at the core of public relations and how most information is
Michael Jackson Please Wait... provides biographical information about Michael Jackson including his birthdate, birthplace, parents, height, interests, idols, favorite foods, films, and more. It discusses his background, career highlights including influential albums like Thriller, and films he appeared in such as The Wiz and Moonwalker. The document contains photos and details about Jackson's life and illustrious music career.
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazzbutest
The document discusses the process of manufacturing celebrity and its negative byproducts. It argues that celebrities are rarely the best in their individual pursuits like singing, dancing, etc. but become famous due to being products of a system controlled by wealthy elites. This system stifles opportunities for worthy artists and creates feudalism. The document also asserts that manufactured celebrities should not be viewed as role models due to behaviors like drug abuse and narcissism that result from the celebrity-making process.
Michael Jackson was a child star who rose to fame with the Jackson 5 in the late 1960s and early 1970s. As a solo artist in the 1970s and 1980s, he had immense commercial success with albums like Off the Wall, Thriller, and Bad, which featured hit singles and groundbreaking music videos. However, his career and public image were plagued by controversies related to allegations of child sexual abuse in the 1990s and 2000s. He continued recording and performing but faced ongoing media scrutiny into his private life until his death in 2009.
Social Networks: Twitter Facebook SL - Slide 1butest
The document discusses using social networking tools like Twitter and Facebook in K-12 education. Twitter allows students and teachers to share short updates and can be used to give parents a window into classroom activities. Facebook allows targeted advertising that could be used to promote educational activities. Both tools could help facilitate communication between schools and communities if used properly while managing privacy and security concerns.
Facebook has over 300 million active users who log on daily, and allows brands to create public profile pages to interact with users. Pages are for brands and organizations only, while groups can be made by any user about any topic. Pages do not show admin names and have no limits on fans, while groups display admin names and are limited to 5,000 members. Content on pages should aim to provoke action from subscribers and establish a regular posting schedule using a conversational tone.
Executive Summary Hare Chevrolet is a General Motors dealership ...butest
Hare Chevrolet is a car dealership located in Noblesville, Indiana that has successfully used social media platforms like Twitter, Facebook, and YouTube to create a positive brand image. They invest significant time interacting directly with customers online to foster a sense of community rather than overtly advertising. As a result, Hare Chevrolet has built a large, engaged audience on social media and serves as a model for how brands can use online presences strategically.
Welcome to the Dougherty County Public Library's Facebook and ...butest
This document provides instructions for signing up for Facebook and Twitter accounts. It outlines the sign up process for both platforms, including filling out forms with name, email, password and other details. It describes how the platforms will then search for friends and suggest people to connect with. It also explains how to search for and follow the Dougherty County Public Library page on both Facebook and Twitter once signed up. The document concludes by thanking participants and providing a contact for any additional questions.
Paragon Software announces the release of Paragon NTFS for Mac OS X 8.0, which provides full read and write access to NTFS partitions on Macs. It is the fastest NTFS driver on the market, achieving speeds comparable to native Mac file systems. Paragon NTFS for Mac 8.0 fully supports the latest Mac OS X Snow Leopard operating system in 64-bit mode and allows easy transfer of files between Windows and Mac partitions without additional hardware or software.
This document provides compatibility information for Olympus digital products used with Macintosh OS X. It lists various digital cameras, photo printers, voice recorders, and accessories along with their connection type and any notes on compatibility. Some products require booting into OS 9.1 for software compatibility or do not support devices that need a serial port. Drivers and software are available for download from Olympus and other websites for many products to enable use with OS X.
To use printers managed by the university's Information Technology Services (ITS), students and faculty must install the ITS Remote Printing software on their Mac OS X computer. This allows them to add network printers, log in with their ITS account credentials, and print documents while being charged per page to funds in their pre-paid ITS account. The document provides step-by-step instructions for installing the software, adding a network printer, and printing to that printer from any internet connection on or off campus. It also explains the pay-in-advance printing payment system and how to check printing charges.
The document provides an overview of the Mac OS X user interface for beginners, including descriptions of the desktop, login screen, desktop elements like the dock and hard disk, and how to perform common tasks like opening files and folders. It also addresses frequently asked questions for Windows users switching to Mac OS X, such as where documents are stored, how to save or find documents, and what the equivalent of the C: drive is in Mac OS X. The document concludes with sections on file management tasks like creating and deleting folders, organizing files within applications, using Spotlight search, and an overview of the Dashboard feature.
This document provides a checklist for securing Mac OS X version 10.5, focusing on hardening the operating system, securing user accounts and administrator accounts, enabling file encryption and permissions, implementing intrusion detection, and maintaining password security. It describes the Unix infrastructure and security framework that Mac OS X is built on, leveraging open source software and following the Common Data Security Architecture model. The checklist can be used to audit a system or harden it against security threats.
This document summarizes a course on web design that was piloted in the summer of 2003. The course was a 3 credit course that met 4 times a week for lectures and labs. It covered topics such as XHTML, CSS, JavaScript, Photoshop, and building a basic website. 18 students from various majors enrolled. Student and instructor evaluations found the course to be very successful overall, though some improvements were suggested like ensuring proper software and pairing programming/non-programming students. The document also discusses implications of incorporating web design material into existing computer science curriculums.
1. Linguistic techniques for Text Mining
NaCTeM team
www.nactem.ac.uk
Sophia Ananiadou
Chikashi Nobata
Yutaka Sasaki
Yoshimasa Tsuruoka
2. Natural Language Processing
[Figure: processing pipeline from raw (unstructured) text through part-of-speech tagging, named entity recognition and deep syntactic parsing to annotated (structured) text, supported by a lexicon and an ontology.]
Example sentence: ... Secretion of TNF was abolished by BHA in PMA-stimulated U937 cells.
Tokens: Secretion of TNF was abolished by BHA in PMA-stimulated U937 cells .
POS tags: NN IN NN VBZ VBN IN NN IN JJ NN NNS .
Phrase labels in the parse tree: S, VP, NP, PP
Entity labels, in order of appearance: protein_molecule, organic_compound, cell_line
Event label: negative regulation
3. Basic Steps of Natural Language Processing
• Sentence splitting
• Tokenization
• Part-of-speech tagging
• Shallow parsing
• Named entity recognition
• Syntactic parsing
• (Semantic Role Labeling)
4. Sentence splitting
Current immunosuppression protocols to prevent lung transplant rejection
reduce pro-inflammatory and T-helper type 1 (Th1) cytokines. However, Th1
T-cell pro-inflammatory cytokine production is important in host defense
against bacterial infection in the lungs. Excessive immunosuppression of Th1
T-cell pro-inflammatory cytokines leaves patients susceptible to infection.
Current immunosuppression protocols to prevent lung transplant rejection
reduce pro-inflammatory and T-helper type 1 (Th1) cytokines.
However, Th1 T-cell pro-inflammatory cytokine production is important in host
defense against bacterial infection in the lungs.
Excessive immunosuppression of Th1 T-cell pro-inflammatory cytokines
leaves patients susceptible to infection.
5. A heuristic rule for sentence splitting
sentence boundary
= period + space(s) + capital letter
Regular expression in Perl
s/\. +([A-Z])/.\n$1/g;
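The same heuristic can be sketched in Python. This is only an illustrative port of the rule above, not the authors' implementation; the function name `split_sentences` and the shortened sample text are assumptions made for the example.

```python
import re

def split_sentences(text):
    """Split wherever a period is followed by space(s) and a capital letter."""
    # Insert a newline after the period, keep the capital letter (group 1).
    marked = re.sub(r"\. +([A-Z])", r".\n\1", text)
    return marked.split("\n")

# Shortened version of the lung-transplant example (an assumption for brevity).
text = ("Current protocols reduce Th1 cytokines. However, Th1 cytokine "
        "production is important. Excessive immunosuppression leaves "
        "patients susceptible to infection.")
sentences = split_sentences(text)
for s in sentences:
    print(s)
```

Because the rule only looks at three characters of context, it also fires after abbreviations such as "e.g." whenever a capital letter follows.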
6. Errors
IL-33 is known to induce the production of Th2-associated
cytokines (e.g. IL-5 and IL-13).
IL-33 is known to induce the production of Th2-associated
cytokines (e.g.
IL-5 and IL-13).
• Two solutions:
– Add more rules to handle exceptions
– Machine learning
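A minimal sketch of the first solution (more rules), assuming a small hand-made abbreviation list. The list `ABBREVIATIONS`, the function name and the second sentence of the sample text are hypothetical; a practical list would be much longer.

```python
import re

# Hypothetical, deliberately tiny abbreviation list.
ABBREVIATIONS = ("e.g.", "i.e.", "et al.", "Fig.")

def split_with_exceptions(text):
    """Period + space(s) + capital letter, unless the period ends an abbreviation."""
    sentences, start = [], 0
    for m in re.finditer(r"\. +(?=[A-Z])", text):
        candidate = text[start:m.start() + 1]  # up to and including the period
        if candidate.endswith(ABBREVIATIONS):
            continue  # exception rule: not a sentence boundary
        sentences.append(candidate)
        start = m.end()
    sentences.append(text[start:])
    return sentences

text = ("IL-33 is known to induce the production of Th2-associated "
        "cytokines (e.g. IL-5 and IL-13). This was confirmed in mice.")
result = split_with_exceptions(text)
for s in result:
    print(s)
```

Each new exception needs another entry or rule, which is the maintenance cost that motivates the machine learning alternative.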
7. Tools for sentence splitting
• JASMINE
– Rule-based
– http://uvdb3.hgc.jp/ALICE/program_download.html
• Scott Piao's splitter
– Rule-based
– http://text0.mib.man.ac.uk:8080/scottpiao/sent_detector
• OpenNLP
– Maximum-entropy learning
– https://sourceforge.net/projects/opennlp/
– Needs training data
8. Tokenization
The protein is activated by IL2.
The protein is activated by IL2 .
• Convert a sentence into a sequence of tokens
• Why do we tokenize?
• Because we do not want to treat a sentence as a
sequence of characters!
9. Tokenization
The protein is activated by IL2.
The protein is activated by IL2 .
• Tokenizing general English sentences is
relatively straightforward.
• Use spaces as the boundaries
• Use some heuristics to handle exceptions
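The two bullets above can be sketched as a few lines of Python: split on spaces, then apply a heuristic that detaches punctuation and the clitic 's. The function name and the regular expression are illustrative assumptions, not the tokenizer used by the authors.

```python
import re

def tokenize(sentence):
    """Split on spaces, then detach punctuation marks and possessive 's."""
    tokens = []
    for chunk in sentence.split():
        # A clitic 's, a (possibly hyphenated) word, or one punctuation mark.
        tokens.extend(re.findall(r"'s|\w+(?:-\w+)*|[^\w\s]", chunk))
    return tokens

print(tokenize("The protein is activated by IL2."))
print(tokenize("Mary's protein is calcium-dependent."))
```

Keeping hyphenated words together is one possible design choice; it suits general English but is not always right for biomedical text.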
10. Tokenisation issues
• separate possessive endings or abbreviated forms from the preceding word:
– Mary's -> Mary 's
– Mary's -> Mary is
– Mary's -> Mary has
• separate punctuation marks and quotes from words:
– Mary. -> Mary .
– “new” -> “ new ”
11. Tokenization
• Tokenizer.sed: a simple script in sed
• http://www.cis.upenn.edu/~treebank/tokenization.html
• Undesirable tokenization
– original: “1,25(OH)2D3”
– tokenized: “1 , 25 ( OH ) 2D3”
• Tokenization for biomedical text
– Not straightforward
– Needs dictionary? Machine learning?
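One way to read the "dictionary" option is to protect known terms before falling back to punctuation splitting. The sketch below assumes a hypothetical two-entry lexicon (`PROTECTED_TERMS`) and a made-up sample sentence; it is not the method of any tool named in these slides.

```python
import re

# Hypothetical mini-lexicon of terms that must stay in one piece.
PROTECTED_TERMS = ["1,25(OH)2D3", "IL-2"]

def tokenize_with_lexicon(sentence, lexicon=PROTECTED_TERMS):
    """Match lexicon terms first, then plain words, then single punctuation marks."""
    # Longest terms first, so a longer term wins over a shorter one it contains.
    terms = sorted(lexicon, key=len, reverse=True)
    alternatives = [re.escape(t) for t in terms] + [r"\w+", r"[^\w\s]"]
    return re.findall("|".join(alternatives), sentence)

sentence = "Cells were treated with 1,25(OH)2D3."
print(tokenize_with_lexicon(sentence, lexicon=[]))  # naive splitting
print(tokenize_with_lexicon(sentence))              # lexicon-protected
```

With an empty lexicon the compound name is broken into the same pieces as the undesirable tokenization shown above; with the lexicon it survives as a single token.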
12. Tokenisation problems in Bio-text
• Commas
– 2,6-diaminohexanoic acid
– tricyclo(3.3.1.13,7)decanone
• Four kinds of hyphens
– “Syntactic:”
– Calcium-dependent
– Hsp-60
– Knocked-out gene: lush-- flies
– Negation: -fever
– Electric charge: Cl-
K. Cohen NAACL-2007
13. Tokenisation
• Tokenization: Divides the text into the smallest units (usually words), removing punctuation.
Challenge: What should be done with punctuation that has linguistic meaning?
• Negative charge (Cl-)
• Absence of symptom (-fever)
• Knocked-out gene (Ski-/-)
• Gene name (IL-2-mediated)
• Plus “syntactic” uses (insulin-dependent)
K. Cohen NAACL-2007
14. Part-of-speech tagging
The peri-kappa B site mediates human immunodeficiency
DT NN NN NN VBZ JJ NN
virus type 2 enhancer activation in monocytes …
NN NN CD NN NN IN NNS
• Assign a part-of-speech tag to each token in a
sentence.
15. Part-of-speech tags
• The Penn Treebank tagset
– http://www.cis.upenn.edu/~treebank/
– 45 tags
NN    Noun, singular or mass
NNS   Noun, plural
NNP   Proper noun, singular
NNPS  Proper noun, plural
:
VB    Verb, base form
VBD   Verb, past tense
VBG   Verb, gerund or present participle
VBN   Verb, past participle
VBZ   Verb, 3rd person singular present
:
JJ    Adjective
JJR   Adjective, comparative
JJS   Adjective, superlative
:
DT    Determiner
CD    Cardinal number
CC    Coordinating conjunction
IN    Preposition or subordinating conjunction
FW    Foreign word
:
16. Part-of-speech tagging is not easy
• Parts-of-speech are often ambiguous
I have to go to school. (“go” is a verb)
I had a go at skiing. (“go” is a noun)
• We need to look at the context
• But how?
17. Writing rules for part-of-speech tagging
I have to go to school. (“go” is a verb)
I had a go at skiing. (“go” is a noun)
• If the previous word is “to”, then it's a verb.
• If the previous word is “a”, then it's a noun.
• If the next word is …
:
Writing rules manually is impossible
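To see why hand-written rules break down, here is a small sketch that applies just the two rules above to every word. The function name and the coarse "verb"/"noun" labels are assumptions for illustration.

```python
def tag_by_rules(words):
    """Apply the two context rules; return None where no rule fires."""
    tags = []
    for i, word in enumerate(words):
        prev = words[i - 1].lower() if i > 0 else None
        if prev == "to":
            tags.append("verb")   # rule 1: previous word is "to"
        elif prev == "a":
            tags.append("noun")   # rule 2: previous word is "a"
        else:
            tags.append(None)     # no rule applies
    return tags

s1 = "I have to go to school .".split()
s2 = "I had a go at skiing .".split()
print(list(zip(s1, tag_by_rules(s1))))
print(list(zip(s2, tag_by_rules(s2))))
```

Both occurrences of "go" come out right, but rule 1 also labels "school" as a verb because it follows "to". Every such misfire calls for yet another rule.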
18. Learning from examples
Training data (annotated text):
The involvement of ion channels in B and T lymphocyte activation is
DT NN IN NN NNS IN NN CC NN NN NN VBZ
supported by many reports of changes in ion fluxes and membrane
VBN IN JJ NNS IN NNS IN NN NNS CC NN
…
[Figure: the annotated text is used to train a machine learning algorithm, which is then applied to unseen text.]
Unseen text: We demonstrate that …
Tagged output: We/PRP demonstrate/VBP that/IN …
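19. Part-of-speech tagging with Hidden Markov Models
\[ P(t_1 \dots t_n \mid w_1 \dots w_n) = \frac{P(w_1 \dots w_n \mid t_1 \dots t_n)\, P(t_1 \dots t_n)}{P(w_1 \dots w_n)} \]
\[ \propto P(w_1 \dots w_n \mid t_1 \dots t_n)\, P(t_1 \dots t_n) \]
\[ \approx \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1}) \]
t_1 ... t_n: tags; w_1 ... w_n: words
P(w_i | t_i): output probability; P(t_i | t_{i-1}): transition probability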
20. First-order Hidden Markov Models
• Training
– Estimate P(word_j | tag_x) and P(tag_y | tag_z)
– Counting (+ smoothing)
• Using the tagger
\[ \hat{t}_1 \dots \hat{t}_n = \arg\max_{t_1 \dots t_n} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1}) \]
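A compact sketch of both steps, assuming add-alpha smoothing for the counts and the Viterbi algorithm for the arg max. The two training sentences, the `<s>` start symbol and all function names are made up for illustration; a real tagger would be trained on a large annotated corpus.

```python
import math
from collections import defaultdict

def train_hmm(tagged_sentences, alpha=1.0):
    """Estimate P(word|tag) and P(tag|prev_tag) by counting, with add-alpha smoothing."""
    emit = defaultdict(lambda: defaultdict(int))
    trans = defaultdict(lambda: defaultdict(int))
    vocab, tags = set(), set()
    for sent in tagged_sentences:
        prev = "<s>"  # assumed start-of-sentence symbol
        for word, tag in sent:
            emit[tag][word] += 1
            trans[prev][tag] += 1
            vocab.add(word)
            tags.add(tag)
            prev = tag
    tags = sorted(tags)

    def log_emit(word, tag):
        total = sum(emit[tag].values())
        # "+ 1" reserves probability mass for unseen words.
        return math.log((emit[tag][word] + alpha) / (total + alpha * (len(vocab) + 1)))

    def log_trans(tag, prev):
        total = sum(trans[prev].values())
        return math.log((trans[prev][tag] + alpha) / (total + alpha * len(tags)))

    return tags, log_emit, log_trans

def viterbi(words, tags, log_emit, log_trans):
    """Return the tag sequence maximizing prod_i P(w_i|t_i) P(t_i|t_{i-1})."""
    best = [{t: (log_trans(t, "<s>") + log_emit(words[0], t), None) for t in tags}]
    for w in words[1:]:
        column = {}
        for t in tags:
            score, prev = max((best[-1][p][0] + log_trans(t, p), p) for p in tags)
            column[t] = (score + log_emit(w, t), prev)
        best.append(column)
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):  # follow the back-pointers
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]

train = [
    [("I", "PRP"), ("have", "VBP"), ("to", "TO"), ("go", "VB"), ("to", "TO"), ("school", "NN")],
    [("I", "PRP"), ("had", "VBD"), ("a", "DT"), ("go", "NN"), ("at", "IN"), ("skiing", "NN")],
]
tags, log_emit, log_trans = train_hmm(train)
print(viterbi("I have to go".split(), tags, log_emit, log_trans))
print(viterbi("I had a go".split(), tags, log_emit, log_trans))
```

Working in log space avoids numerical underflow when many small probabilities are multiplied.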
21. Machine learning using diverse features
• We want to use diverse types of
information when predicting the tag.
Example: He opened it (“opened” is a Verb)
Many clues:
– The word is “opened”
– The suffix is “ed”
– The previous word is “He”
:
22. Machine learning with log-linear models
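\[ p(y \mid x) = \frac{1}{Z(x)} \exp\left( \sum_i \lambda_i f_i(x, y) \right) \]
\[ Z(x) = \sum_y \exp\left( \sum_i \lambda_i f_i(x, y) \right) \]
f_i: feature function; \lambda_i: feature weight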
23. Machine learning with log-linear models
• Maximum likelihood estimation
– Find the parameters that maximize the
conditional log-likelihood of the training data
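\[ LL(\lambda) = \sum_{x,y} \tilde{p}(x)\, \tilde{p}(y \mid x) \log p(y \mid x) \]
• Gradient
\[ \frac{\partial LL(\lambda)}{\partial \lambda_i} = E_{\tilde{p}}[f_i] - E_{p}[f_i] \]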
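24. Computing likelihood and model expectation
• Example
– Two possible tags: “Noun” and “Verb”
– Two types of features: “word” and “suffix”
He opened it
Noun Verb Noun
\[ p(\text{Verb} \mid \text{opened}) = \frac{\exp(\lambda_{\text{tag=verb, word=opened}} + \lambda_{\text{tag=verb, suffix=ed}})}{\exp(\lambda_{\text{tag=noun, word=opened}} + \lambda_{\text{tag=noun, suffix=ed}}) + \exp(\lambda_{\text{tag=verb, word=opened}} + \lambda_{\text{tag=verb, suffix=ed}})} \]
The first term of the denominator corresponds to tag = noun, the second to tag = verb.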
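The likelihood and the two expectations can be sketched as follows, assuming the same two tags and the "word" and "suffix" feature types. The three training pairs, the step size 0.5 and the 200 iterations are arbitrary choices for illustration, and plain gradient ascent stands in for whatever optimizer a real implementation would use.

```python
import math
from collections import Counter

TAGS = ["Noun", "Verb"]

def features(word, tag):
    """Names of the binary features that fire for this (word, tag) pair."""
    return [f"tag={tag},word={word}", f"tag={tag},suffix={word[-2:]}"]

def prob(word, weights):
    """p(tag|word) = exp(sum of active feature weights) / Z(word)."""
    scores = {t: math.exp(sum(weights.get(f, 0.0) for f in features(word, t)))
              for t in TAGS}
    z = sum(scores.values())  # normalization over both tags
    return {t: s / z for t, s in scores.items()}

def gradient(data, weights):
    """Observed feature counts minus the counts expected under the model."""
    grad = Counter()
    for word, gold in data:
        for f in features(word, gold):
            grad[f] += 1.0                 # empirical term
        for tag, p in prob(word, weights).items():
            for f in features(word, tag):
                grad[f] -= p               # model expectation term
    return grad

data = [("opened", "Verb"), ("walked", "Verb"), ("bed", "Noun")]  # made-up toy data
weights = {}
for _ in range(200):                       # plain gradient ascent
    for f, g in gradient(data, weights).items():
        weights[f] = weights.get(f, 0.0) + 0.5 * g

print(prob("opened", weights))
print(prob("bed", weights))
```

The word "bed" shares the suffix "ed" with the verbs, so the suffix feature alone would mislead the model; the word feature learns to override it.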
25. Conditional Random Fields (CRFs)
• A single log-linear model on the whole sentence
\[ P(y_1 \dots y_n \mid x) = \frac{1}{Z} \exp\left( \sum_{i=1}^{F} \sum_{t=1}^{n} \lambda_i f_i(t, y_{t-1}, y_t, x) \right) \]
• The number of classes is HUGE, so it is impossible to do the estimation in a naive way.
26. Conditional Random Fields (CRFs)
• Solution
– Let's restrict the types of features
– You can then use a dynamic programming
algorithm that drastically reduces the amount of
computation
• Features you can use (in first-order CRFs)
– Features defined on the tag
– Features defined on the adjacent pair of tags
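The saving from dynamic programming can be illustrated on the normalization constant Z. The sketch below assumes features on single tags (states) and adjacent tag pairs (edges) only, with hypothetical weights, and checks the forward algorithm against brute-force enumeration of all tag sequences.

```python
import math
from itertools import product

TAGS = ["Noun", "Verb"]

def score(prev_tag, tag, word, weights):
    """Sum of the state and edge feature weights active at one position."""
    return (weights.get(("state", word, tag), 0.0)
            + weights.get(("edge", prev_tag, tag), 0.0))

def log_z_forward(words, weights):
    """log Z by the forward algorithm: O(n * |tags|^2) operations."""
    alpha = {t: score("<s>", t, words[0], weights) for t in TAGS}
    for word in words[1:]:
        alpha = {t: math.log(sum(math.exp(alpha[p] + score(p, t, word, weights))
                                 for p in TAGS))
                 for t in TAGS}
    return math.log(sum(math.exp(a) for a in alpha.values()))

def log_z_brute_force(words, weights):
    """log Z by enumerating all |tags|^n sequences (tiny inputs only)."""
    total = 0.0
    for seq in product(TAGS, repeat=len(words)):
        s, prev = 0.0, "<s>"
        for word, tag in zip(words, seq):
            s += score(prev, tag, word, weights)
            prev = tag
        total += math.exp(s)
    return math.log(total)

# Hypothetical weights, chosen only for illustration.
weights = {("state", "He", "Noun"): 1.0, ("state", "opened", "Verb"): 2.0,
           ("edge", "Noun", "Verb"): 0.5, ("edge", "Noun", "Noun"): -0.5}
words = "He has opened it".split()
print(log_z_forward(words, weights), log_z_brute_force(words, weights))
```

Both functions return the same value, but only the forward version stays tractable as the sentence length and the tagset grow.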
27. Features
• Feature weights are associated with states and edges
[Figure: tag lattice over “He has opened it”, with a Noun node and a Verb node for each word.]
– State feature example: W0 = He & Tag = Noun
– Edge feature example: Tagleft = Noun & Tagright = Noun
32. Maximum entropy learning and Conditional Random Fields
• Maximum entropy learning
– Log-linear modeling + MLE
– Parameter estimation
• Likelihood of each sample
• Model expectation of each feature
• Conditional Random Fields
– Log-linear modeling on the whole sentence
– Features are defined on states and edges
– Dynamic programming
33. POS tagging algorithms
• Performance on the Wall Street Journal corpus
Columns: Training cost, Speed, Accuracy (some cells are empty)
Dependency Net (2003) Low Low 97.2
Conditional Random Fields High High 97.1
Support vector machines (2003) 97.1
Bidirectional MEMM (2005) Low 97.1
Brill's tagger (1995) Low 96.6
HMM (2000) Very low High 96.7
35. Tagging errors made by a WSJ-trained POS tagger
… and membrane potential after mitogen binding.
CC NN NN IN NN JJ
… two factors, which bind to the same kappa B enhancers…
CD NNS WDT NN TO DT JJ NN NN NNS
… by analysing the Ag amino acid sequence.
IN VBG DT VBG JJ NN NN
… to contain more T-cell determinants than …
TO VB RBR JJ NNS IN
Stimulation of interferon beta gene transcription in vitro by
NN IN JJ JJ NN NN IN NN IN
36. Taggers for general text do not work well on biomedical text
Performance of the Brill tagger evaluated on 1,000 randomly selected MEDLINE sentences: 86.8% (Smith et al., 2004)
Accuracies of a WSJ-trained POS tagger evaluated on the GENIA corpus (Tsuruoka et al., 2005):
– Exact: 84.4%
– NNP = NN, NNPS = NNS: 90.0%
– LS = NN: 91.3%
– JJ = NN: 94.9%
37. MedPost
(Smith et al., 2004)
• Hidden Markov Models (HMMs)
• Training data
– 5700 sentences randomly selected from various
thematic subsets.
• Accuracy
– 97.43% (native tagset), 96.9% (Penn tagset)
– Evaluated on 1,000 sentences
• Available from
– ftp://ftp.ncbi.nlm.nih.gov/pub/lsmith/MedPost/medpost.tar.gz
39. Performance on new data
Relative performance evaluated on recent abstracts selected from three journals:
- Nucleic Acids Research (NAR)
- Nature Medicine (NMED)
- Journal of Clinical Investigation (JCI)
Training data               NAR   NMED   JCI   Total (Acc.)
WSJ                         109    47    102   258 (70.9%)
GENIA                       121    74    132   327 (89.8%)
PennBioIE                   129    65    122   316 (86.6%)
WSJ + GENIA                 125    74    135   334 (91.8%)
WSJ + PennBioIE             133    71    133   337 (92.6%)
GENIA + PennBioIE           128    75    135   338 (92.9%)
WSJ + GENIA + PennBioIE     133    74    139   346 (95.1%)
40. Chunking (shallow parsing)
[NP He] [VP reckons] [NP the current account deficit] [VP will narrow] [PP to] [NP only # 1.8 billion] [PP in] [NP September] .
• A chunker (shallow parser) segments a
sentence into non-recursive phrases.
43. Machine learning-based chunking
• Convert a treebank into sentences that are
annotated with chunk information.
– CoNLL-2000 data set
• http://www.cnts.ua.ac.be/conll2000/chunking/
• The conversion script is available
• Apply a sequence tagging algorithm such as
HMM, MEMM, CRF, or Semi-CRF.
• YamCha: an SVM-based chunker
– http://www.chasen.org/~taku/software/yamcha/
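Casting chunking as sequence tagging usually means giving every token a B- (begin), I- (inside) or O (outside) tag, as in the CoNLL-2000 data. The sketch below shows that encoding and its inverse; the function names are hypothetical and the chunk boundaries of the sample sentence are an assumption based on the chunk labels in the chunking example.

```python
def chunks_to_bio(chunks):
    """Convert (label, tokens) chunks into per-token B-/I-/O tags."""
    tagged = []
    for label, tokens in chunks:
        for i, token in enumerate(tokens):
            if label is None:
                tagged.append((token, "O"))  # token outside any chunk
            else:
                tagged.append((token, ("B-" if i == 0 else "I-") + label))
    return tagged

def bio_to_chunks(tagged):
    """Inverse conversion: group B-/I- tags back into labelled chunks."""
    chunks = []
    for token, tag in tagged:
        if tag == "O":
            chunks.append((None, [token]))
        elif tag.startswith("B-") or not chunks or chunks[-1][0] != tag[2:]:
            chunks.append((tag[2:], [token]))  # start a new chunk
        else:
            chunks[-1][1].append(token)        # continue the current chunk
    return chunks

chunks = [("NP", ["He"]), ("VP", ["reckons"]),
          ("NP", ["the", "current", "account", "deficit"]),
          ("VP", ["will", "narrow"]), ("PP", ["to"]),
          ("NP", ["only", "#", "1.8", "billion"]),
          ("PP", ["in"]), ("NP", ["September"]), (None, ["."])]
tagged = chunks_to_bio(chunks)
for token, tag in tagged:
    print(token, tag)
```

Once the data is in this form, any of the sequence taggers listed above can be trained on it without modification.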
44. GENIA tagger
• Algorithm: Bidirectional MEMM
• POS tagging
– Trained on WSJ, GENIA and Penn BioIE
– Accuracy: 97-98%
• Shallow parsing
– Trained on WSJ and GENIA
– Accuracy: 90-94%
• Can output base forms
• Available from
http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/tagger/
45. Named-Entity Recognition
We have shown that interleukin-1 (IL-1) and IL-2 control IL-2 receptor alpha (IL-2R alpha) gene transcription in CD4-CD8-murine T lymphocyte precursors.
Entity labels, in order of appearance: protein, protein, protein, DNA, cell_line
• Recognize named-entities in a sentence.
– Gene/protein names
– Protein, DNA, RNA, cell_line, cell_type
46. Performance of biomedical NE recognition
• Shared task data for Coling 2004 BioNLP workshop
- entity types: protein, DNA, RNA, cell_type, and cell_line
                               Recall   Precision   F-score
SVM+HMM (Zhou, 2004)           76.0     69.4        72.6
Semi-Markov CRFs (in prep.)    72.7     70.4        71.5
Two-Phase (Kim, 2005)          72.8     69.7        71.2
Sliding Window (in prep.)      71.5     70.2        70.8
CRF (Settles, 2005)            72.0     69.1        70.5
MEMM (Finkel, 2004)            71.6     68.6        70.1
:                              :        :           :
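The F-score column is consistent with the balanced F-score, i.e. the harmonic mean of recall and precision. A short check, with a function name chosen for this example:

```python
def f_score(recall, precision):
    """Balanced F-score: harmonic mean of recall and precision."""
    return 2 * recall * precision / (recall + precision)

print(round(f_score(76.0, 69.4), 1))  # SVM+HMM row: prints 72.6
print(round(f_score(72.0, 69.1), 1))  # CRF row: prints 70.5
```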
47. Features
Classification models, main features used in NLPBA (Kim, 2004)
Columns: CM lx af or sh gn gz po np sy tr ab ca do pa pre ext.
Zho SH x x x x x x x x x
Fin M x x x x x x x x x x B, W
Set C x x x x (x) (x) x (W)
Son SC x x x x x V
Zha H x x M
Classification Model (CM):
S: SVM; H: HMM; M: MEMM; C: CRF
Features
lx: lexical features; af: affix information (character n-grams); or: orthographic information;
sh: word shapes; gn: gene sequence; gz: gazetteers; po: part-of-speech tags; np: noun
phrase tags; sy: syntactic tags; tr: word triggers; ab: abbreviations; ca: cascaded entities;
do: global document information; pa: parentheses handling; pre: previously predicted entity
tags; B: British National Corpus; W: WWW; V: virtually generated corpus; M: MEDLINE
48. CFG parsing
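[Figure: phrase-structure tree with nodes S, NP, VP, NP and QP over the tagged sentence.]
VBN NN VBD DT JJ CD CD NNS .
Estimated volume was a light 2.4 million ounces .
49. Phrase structure + head information
[Figure: the same phrase-structure tree (S, NP, VP, NP, QP) with head information marked.]
VBN NN VBD DT JJ CD CD NNS .
Estimated volume was a light 2.4 million ounces .
50. Dependency relations
[Figure: dependency arcs between the words of the tagged sentence.]
VBN NN VBD DT JJ CD CD NNS .
Estimated volume was a light 2.4 million ounces .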
51. CFG parsing algorithms
• Performance on the Penn Treebank
LR: labeled recall; LP: labeled precision
                                              LR     LP     F-score
Generative model (Collins, 1999)              88.1   88.3   88.2
Maxent-inspired (Charniak, 2000)              89.6   89.5   89.5
Simple Synchrony Networks (Henderson, 2004)   89.8   90.4   90.1
Data Oriented Parsing (Bod, 2003)             90.8   90.7   90.7
Re-ranking (Johnson, 2005)                                  91.0
57. HPSG parsing
• HPSG
– A few schemas
– Many lexical entries
– Deep syntactic analysis
• Grammar
– Corpus-based grammar construction (Miyao et al 2004)
• Parser
– Beam search (Tsuruoka et al.)
[Figure: HPSG derivation of “Mary walked slowly”.]
– Lexical entries: Mary [HEAD: noun, SUBJ: <>, COMPS: <>]; walked [HEAD: verb, SUBJ: <noun>, COMPS: <>]; slowly [HEAD: adv, MOD: verb]
– Head-modifier schema: “walked” + “slowly” give [HEAD: verb, SUBJ: <noun>, COMPS: <>]
– Subject-head schema: “Mary” + “walked slowly” give [HEAD: verb, SUBJ: <>, COMPS: <>]
58. Experimental results
• Training set: Penn Treebank Section 02-21
(39,832 sentences)
• Test set: Penn Treebank Section 23 (< 40 words,
2,164 sentences)
• Accuracy of predicate argument relations (i.e.,
red arrows) is measured
Precision: 87.9%; Recall: 86.9%; F-score: 87.4%
59. Parsing MEDLINE with HPSG
• Enju
– A wide-coverage HPSG parser
– http://www-tsujii.is.s.u-tokyo.ac.jp/enju/
60. Extraction of Protein-protein Interactions: Predicate-argument relations + SVM (1)
• (Yakushiji, 2005)
CD4 protein interacts with non-polymorphic regions of MHCII .
ENTITY1 ENTITY2
Extraction patterns based on predicate-argument relations
[Figure: predicate-argument arcs (argM, arg1, arg2) over the base forms “CD4 protein interact with non-polymorphic region of MHCII”, linking ENTITY1 and ENTITY2.]
SVM learning with predicate-argument patterns
61. Text Mining for Biology
• MEDIE: An interactive intelligent IR
system retrieving events
– Performs a semantic search
• InfoPubMed: an interactive IE system and
an efficient PubMed search tool, helping
users to find information about biomedical
entities such as genes, proteins, and the
interactions between them.
62. MEDIE system overview
[Figure: system architecture. Off-line: the input textbase is processed by a deep parser and an entity recognizer to produce a semantically-annotated textbase. On-line: a region algebra search engine takes a query and returns search results.]
65. Service: extracting interactions
• Info-PubMed: interactive IE system and an
efficient PubMed search tool, helping users
to find information about biomedical entities
such as genes, proteins, and the
interactions between them.
• System components
– MEDIE
– Extraction of protein-protein interactions
– Multi-window interface on a browser
• UTokyo: NaCTeM self-funded partner
66. Info-PubMed
• helps biologists search for items of interest
– genes, proteins, their interactions, and evidence sentences
– extracted from MEDLINE (about 15 million abstracts of biomedical papers)
• uses many of the NLP techniques explained above
– in order to achieve high retrieval precision
67. Flow Chart
Input -> Output
– Gene or protein keywords (token: “TNF”) -> gene or protein entities
– Gene or protein entity (gene: “TNF”) -> interactions around the given gene
– Interaction (“TNF” and “IL6” interaction) -> evidence sentences describing the given interaction
69. Techniques (2/2)
• Extract sentences describing
protein-protein interaction
– deep parser based on HPSG syntax
• can detect semantic relations between
phrases
– domain-dependent pattern recognition
• can learn and expand source patterns
• by using the result of the deep parser, it
can extract semantically true patterns
• not affected by syntactic variations