Greene Bosc2008


Published in: Technology, Education

  1. BOSC 2008 Lightning Talk: The Enteropathogen Resource Integration Center (ERIC), A NIAID Bioinformatics Resource Center for Biodefense and Emerging/Re-emerging Infectious Disease
     D. Pot¹, J. Whitmore¹, M. Shaker¹, J. Fedorko¹, K. Joshi¹, S. Nanan¹, P. Shetty¹, J. Thangiah¹, S. Zaremba¹, G. Plunkett, III², J. Glasner², B. Anderson², D. Baumler², B. Biehl², V. Burland², E. Cabot², E. Neeno-Eckwall², B. Mau², P. Liss², M. Rusch², F. R. Blattner², N. T. Perna², J. M. Greene¹
     ¹SRA International, Inc., Rockville MD; ²University of Wisconsin, Madison WI
  2. • ERIC is a NIAID Bioinformatics Resource Center for Biodefense and Emerging/Re-emerging Disease, one of 8 such centers funded in July 2004 for 5 years.
     • ERIC primarily focuses on the integration of data from five enteropathogens as well as related reference organisms:
         • Diarrheagenic E. coli
         • Shigella spp.
         • Salmonella spp.
         • Yersinia enterocolitica
         • Yersinia pestis
     • ERIC is a partnership between personnel at the Genome Center of Wisconsin (Nicole Perna, Fred Blattner, Guy Plunkett) and SRA International’s Global Health Sector, Rockville MD.
     • Everything done under contract funding is required to be made freely available to the Scientific Community.
  3. ERIC Overview
     • Figure labels: Genomes; Annotations (ASAP); Genome Views and Comparisons (Mauve, GBrowse); Microarray Analysis (mAdb)
     • ERIC is a portal-based system using the JBoss portal.
     • ASAP (A Systematic Annotation Package for community annotation) from UW-Madison is being used to allow the scientific community to annotate genes for the five enteropathogens and for related reference organisms useful for comparative genomics.
  4. ERIC Portal Home Page
  5. Mauve: whole genome comparison
     • ERIC contains tools for comparative genomics, such as Mauve, which has the distinct advantage of allowing comparison of more than two genomes, as well as being able to handle chromosomal rearrangements. (We provide access to some other pathogenic and non-pathogenic reference genomes, particularly for E. coli.)
  6. Mauve
     • Mauve identifies and aligns regions of local collinearity called locally collinear blocks (LCBs). Each locally collinear block is a homologous region of sequence shared by two or more of the genomes under study, and does not contain any rearrangements of homologous sequence.
     • The Mauve genome alignment procedure results in a global alignment of each locally collinear block that has sequence elements conserved among all the genomes under study. Nucleotides in any given genome are aligned only once to other genomes, suggesting orthology among aligned residues. Mauve makes no attempt to align paralogous regions.
     • The remaining unaligned regions may be lineage-specific sequence or rearranged or paralogous repetitive regions and can be identified as such during subsequent processing with other tools.
     • Available at:
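The notion of a locally collinear block can be illustrated with a toy sketch. This is not Mauve's algorithm or its API; it only shows the idea of grouping matched anchors into runs that contain no rearrangement. The anchor format `(position_in_genome_a, position_in_genome_b, strand)` and the function name `collinear_blocks` are assumptions made for this example, and it handles only two genomes with distinct anchor positions.

```python
from typing import List, Tuple

# Hypothetical anchor format: (position in genome A, position in genome B, strand),
# where strand is +1 for the same orientation and -1 for an inverted match.
Anchor = Tuple[int, int, int]


def collinear_blocks(anchors: List[Anchor]) -> List[List[Anchor]]:
    """Group anchors into maximal runs that keep the same order in both genomes."""
    if not anchors:
        return []

    # Walk the anchors in genome A order.
    by_a = sorted(anchors, key=lambda anchor: anchor[0])

    # Rank every anchor by its position in genome B.
    order_b = sorted(range(len(by_a)), key=lambda i: by_a[i][1])
    rank_b = [0] * len(by_a)
    for rank, index in enumerate(order_b):
        rank_b[index] = rank

    blocks = [[by_a[0]]]
    for i in range(1, len(by_a)):
        previous, current = by_a[i - 1], by_a[i]
        same_strand = previous[2] == current[2]
        # Neighbours in A must also be neighbours in B: rank goes up by one on
        # the forward strand and down by one inside an inverted region.
        adjacent_in_b = rank_b[i] - rank_b[i - 1] == current[2]
        if same_strand and adjacent_in_b:
            blocks[-1].append(current)
        else:
            blocks.append([current])  # a rearrangement breakpoint starts a new block
    return blocks


if __name__ == "__main__":
    # Made-up coordinates: the middle two anchors sit in an inverted segment.
    example = [(100, 100, 1), (200, 200, 1), (300, 900, -1), (400, 800, -1), (500, 500, 1)]
    for block in collinear_blocks(example):
        print(block)
```

In this sketch each breakpoint, whether a change of strand or a jump in genome B order, closes one block and opens the next, which mirrors the statement that a block contains no rearrangements of homologous sequence.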
  7. Text Mining
     • SRA is an industry leader in natural language processing (NLP)-based text mining
         • Dedicated group of linguists and software engineers
         • Routinely wins Government text mining competitions (e.g. the Message Understanding Conferences (MUC))
     • Extensive experience in multilingual information extraction, text clustering, and text summarization; this is not just keyword searching.
     • Numerous commercial and government clients/applications
         • Health care organizations (fraud detection); Financial services (anti-money laundering, e-mail surveillance); Government (homeland security, e-Government, business intelligence)
     • See Poster S04, Extraction of Facts and Relationships Relevant to Molecular Mechanisms of Bacterial Pathogenesis through Natural Language Processing, for details on how this is used in ERIC!
  8. Text Mining: Current Awareness
     • Latest Articles tab: we present the mined extracts on enteropathogens for the preceding week.
  9. Text Mining: Search
     • Search tab: allows users to search across extracted data.
     • Unlike Latest Articles, this is not limited to our contract enteropathogens, and should be useful across all bacteria.
  10. Text Mining: Search Results
     • Currently, in addition to processing all new PubMed abstracts weekly, we are extracting about 4,000-5,000 abstracts per night, and have extracted all PubMed abstracts back about four years.
     • We intend to go back at least 10 years.
     • There is no reason this cannot be applied to Open Access full-length text; we will also provide Web Services access to the extracted data in the near future.
     • Search Results:
  11. Figure labels: Extracted Terms and Relationships (requirements from Biologists); Frequency of Terms and Coloration Control (requirements from IT Types); Extracted Text from PubMed
  12. What is different?
     Before:
     • Hundreds of abstracts to read (days)
     • Limited, keyword searching
     • Data handling complex (stacks of paper)
     • Slower ability to reach conclusions
     Now:
     • Quick summary provided (seconds)
     • Enhanced role searching
     • Knowledge base with links to details
     • Faster conclusions through mining of extracted data
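The contrast between keyword searching and role searching can be sketched in a few lines. The slides do not define the extracted record format, so everything here is an assumption for illustration: the `Extract` fields, the two search functions, and the placeholder records (which are made up and are not ERIC data). "Role searching" is read here as searching by the role an entity plays in an extracted relationship.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Extract:
    """Hypothetical extracted relationship; the field names are assumptions."""
    source_id: str   # placeholder identifier, not a real PubMed ID
    agent: str       # entity that acts
    relation: str    # extracted relationship
    target: str      # entity acted upon
    sentence: str    # text the relationship was extracted from


def keyword_search(extracts: List[Extract], word: str) -> List[Extract]:
    """Plain keyword search: matches the word anywhere in the sentence."""
    needle = word.lower()
    return [e for e in extracts if needle in e.sentence.lower()]


def role_search(extracts: List[Extract],
                agent: Optional[str] = None,
                relation: Optional[str] = None,
                target: Optional[str] = None) -> List[Extract]:
    """Role-based search: matches an entity only in the requested role."""
    def matches(value: str, wanted: Optional[str]) -> bool:
        return wanted is None or value.lower() == wanted.lower()

    return [e for e in extracts
            if matches(e.agent, agent)
            and matches(e.relation, relation)
            and matches(e.target, target)]


if __name__ == "__main__":
    # Made-up placeholder records, for illustration only.
    records = [
        Extract("placeholder-1", "protein A", "activates", "gene B",
                "Protein A activates gene B."),
        Extract("placeholder-2", "gene B", "represses", "protein A",
                "Gene B represses protein A."),
        Extract("placeholder-3", "protein C", "binds", "gene D",
                "Protein C binds gene D."),
    ]
    print(len(keyword_search(records, "protein A")))     # every mention
    print(len(role_search(records, agent="protein A")))  # only where it acts
```

Under these assumptions a keyword query returns every sentence that mentions the term, while a role query narrows the result to records where the term appears in the requested role.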
  13. Other ERIC Notes
     • Again, everything we do under the contract must be made freely available to the Scientific Community. All SRA’s work is available under the MIT License, and components from UW are under the GNU GPL.
     • Posters, Monday evening session:
         • I-04 on ERIC System (John Greene)
         • S-04 on Text Mining (David Pot)
     • For more information, contact [email_address].
     • ERIC is supported via NIAID contract HSN266200400040C.
  14. (Slide 16) Figure labels: NetOwl Extractor; Optimizing Manual Literature Annotation; Ontologies and patterns developed to mine text; Pattern Writers
     *Software licensed for use on ERIC bounded in red.