National Center for Biotechnology Information
Published in: Technology, Health & Medicine
  • Transcript

    • 1. NCBI: National Center for Biotechnology Information
        • Created by Public Law 100-607 in 1988 as part of the National Library of Medicine at NIH to:
            • Create automated systems for knowledge about molecular biology, biochemistry, and genetics.
            • Perform research into advanced methods of analyzing and interpreting molecular biology data.
            • Enable biotechnology researchers and medical care personnel to use the systems and methods developed.
        • Builders and providers of GenBank, Entrez, BLAST, and PubMed. Online systems host about 1.8 million users per day at peak rates of 3,200 web hits a second.
        • Center for basic research and training in computational biology.
    • 2. NCBI is the most heavily used site in biomedicine. Why?
        • [Chart: NCBI Web Traffic, 1997-2006; monthly axis from January 1998 to January 2006]
        • 722,000 unique IPs a day
        • 91 million web hits a day
        • 3,200 peak web hits a second
        • 1.5 terabytes of FTP a day
        • 1.8 million unique users a day
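The traffic figures on this slide can be put side by side with a little arithmetic. The sketch below is only a back-of-the-envelope check; it assumes the daily totals are spread evenly over a full 24-hour day, which the slide does not state.

```python
# Figures quoted on the slide.
HITS_PER_DAY = 91_000_000
PEAK_HITS_PER_SECOND = 3_200
USERS_PER_DAY = 1_800_000

# Assumption: the daily total is averaged over a full 24-hour day.
SECONDS_PER_DAY = 24 * 60 * 60


def average_hits_per_second(hits_per_day: int) -> float:
    """Average request rate implied by a daily total."""
    return hits_per_day / SECONDS_PER_DAY


def peak_to_average(peak: float, average: float) -> float:
    """How many times higher the peak rate is than the average rate."""
    return peak / average


avg = average_hits_per_second(HITS_PER_DAY)
ratio = peak_to_average(PEAK_HITS_PER_SECOND, avg)
hits_per_user = HITS_PER_DAY / USERS_PER_DAY

print(f"average hits/second:   {avg:.0f}")
print(f"peak / average:        {ratio:.1f}")
print(f"hits per user per day: {hits_per_user:.1f}")
```

On these numbers the quoted peak is roughly three times the daily average rate, and each unique user accounts for about fifty hits a day.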
    • 3. Data, the Next Intel Inside
        • [Chart: Growth of Searches and GenBank, 1991-2003; axes: Searches per Day (BLAST & Text) and GenBank (Megabases)]
    • 4. Comparative Analysis of Genes Enables “Innovation in Assembly”
        • Colon cancer gene sequence:
            Human   638 RHACVEVQDEIAFIPNDVYFEKDKQMFHIITGPNMGGKSTYIRQTGVIVLMAQIGCFVPC 697
            Yeast   657 RHPVLEMQDDISFISNDVTLESGKGDFLIITGPNMGGKSTYIRQVGVISLMAQIGCFVPC 716
            E.coli  584 RHPVVEQVLNEPFIANPLNLSPQRR-MLIITGPNMGGKSTYMRQTALIALMAYIGSYVPA 642
        • [Figure: evolutionary timeline marked 3000 Myr, 1000 Myr, 500 Myr; organisms shown: Human, Fly, Worm, Yeast, Bacteria, Mouse]
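The degree of conservation in the alignment on this slide can be quantified directly. The following is a minimal sketch, not NCBI's method: it assumes the three rows are already aligned column by column (as printed on the slide), skips any column containing a gap when computing pairwise identity, and uses hypothetical helper names `percent_identity` and `conserved_columns`.

```python
# Aligned segments copied from the slide; '-' marks an alignment gap.
ALIGNMENT = {
    "Human": "RHACVEVQDEIAFIPNDVYFEKDKQMFHIITGPNMGGKSTYIRQTGVIVLMAQIGCFVPC",
    "Yeast": "RHPVLEMQDDISFISNDVTLESGKGDFLIITGPNMGGKSTYIRQVGVISLMAQIGCFVPC",
    "E.coli": "RHPVVEQVLNEPFIANPLNLSPQRR-MLIITGPNMGGKSTYMRQTALIALMAYIGSYVPA",
}


def percent_identity(a: str, b: str) -> float:
    """Percentage of identical residues over the ungapped aligned columns."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # ignore columns where either sequence has a gap
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared


def conserved_columns(sequences: list[str]) -> str:
    """Return the residue where every sequence agrees, '.' elsewhere."""
    out = []
    for column in zip(*sequences):
        residues = set(column)
        if len(residues) == 1 and "-" not in residues:
            out.append(column[0])
        else:
            out.append(".")
    return "".join(out)


names = list(ALIGNMENT)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        pid = percent_identity(ALIGNMENT[first], ALIGNMENT[second])
        print(f"{first} vs {second}: {pid:.1f}% identity")

print(conserved_columns(list(ALIGNMENT.values())))
```

The last line prints a mask in which the run of residues shared by all three organisms stands out from the surrounding dots.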
    • 5. Ignoring the Central Dogma in Bioinformatics is Evidence of “Stupid Design”
        • [Diagram labels: Gene, Structure, Mature Peptide, Pro Peptide, mRNA, Transcript, Chromosome, Genetics, Genomes, Organisms, Function, Disease]
    • 6. It Guides “Innovative Assembly” of Separate Resources
        • Resources: GenBank, RefSeq, Human Genome, Bacterial Genome, Virus Genome, MMDB, PubMed, UniGene(s), LocusLink, OMIM, Taxonomy, GEO, PopSet, BLAST, Entrez, ePCR, Sequin
        • [Diagram labels: Gene, Structure, Mature Peptide, Pro Peptide, mRNA, Transcript, Chromosome, Genetics, Genomes, Organisms, Function, Disease]
    • 7. Entrez: Pathway to Discovery
        • Databases: MEDLINE abstracts, Nucleotide sequences, Protein sequences
        • Links: Amino acid sequence similarity, Coding region features, Nucleotide sequence similarity, Term frequency statistics, Literature citations in sequence databases
    • 8. Entrez Increases Discovery Space
        • [Diagram labels: Nucleotide sequences, Protein sequences, Taxon, Phylogeny, 3-D Structure, MMDB, PubMed abstracts, Complete Genomes, PubMed, Entrez Genomes, Publishers, Genome Centers]
    • 9. Entrez is Intrinsically Components
        • NCBI C++ Toolkit enforces common modules in internal pipelines, external applications, and web components.
        • Entrez has a common model for Booleans and Summaries, and unique models for deep data.
        • New projects can be easily added or extended.
        • Long-standing use of the “productotype” keeps NCBI agile, but (fairly) robust.
    • 10. Web Services Provide Access to Entrez
        • Eutils supports about 5 million service requests a day.
        • SOAP versions support about 38,000 service requests a day (0.8%), similar to Amazon's experience with REST and SOAP.
        • Eutils allows outside sites to recreate Entrez, and NCBI does not know who or why.
        • The current NCBI Sequence Viewer uses Eutils itself.
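For readers who have not used Eutils, a request is an ordinary HTTP GET with query parameters. The sketch below only builds request URLs and does not contact NCBI. The base URL and the `db`, `term`, `retmax`, `id`, `rettype`, and `retmode` parameters are taken from the public E-utilities interface rather than from the slide, and the helper names are hypothetical.

```python
from urllib.parse import urlencode

# Public E-utilities base URL; not given on the slide, so treat it as an assumption.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """URL for an ESearch request: find record IDs matching a Boolean query."""
    params = {"db": db, "term": term, "retmax": retmax}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def build_efetch_url(db: str, ids: list[str],
                     rettype: str = "abstract", retmode: str = "text") -> str:
    """URL for an EFetch request: retrieve the records for the given IDs."""
    params = {"db": db, "id": ",".join(ids),
              "rettype": rettype, "retmode": retmode}
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


# An Entrez Boolean query, expressed the same way as in the web interface.
search_url = build_esearch_url("pubmed", "influenza AND neuraminidase", retmax=5)
print(search_url)
```

Because every request is a plain URL like this one, an outside site can chain search and fetch calls to rebuild Entrez-style pages, which is the point the slide makes.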
    • 11. Harnessing Collective Intelligence in BioMedicine
    • 12. Bibliographic Resources
        • PubMed: citations and abstracts from publishers; MEDLINE indexing
        • PMC: PubMed Central, full-text journal articles from publishers (and NIHMS)
        • pPMC: portable mirror of PMC content
        • NIHMS: NIH Manuscript Submission System for Public Access policy
        • NLM DTD: modular DTD for bibliographic material
        • pNIHMS: portable NIHMS
        • XML Authoring System: MS Word/XML authoring
        • Bookshelf: books and monographs in XML from publishers and authors
    • 13. PubMed Central XML
        • Why XML?
            • Preserves the structure of an article
            • Lends itself to intelligent processing
            • Human readable; not dependent on technology
            • Is based on SGML, a publishing industry standard
            • Portable and migratable
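The point about preserved structure and intelligent processing can be illustrated with a few lines of code. This is a minimal sketch: the XML fragment is invented for the example, its element names are modelled on the NLM journal article tag set but it has not been validated against any DTD, and `outline` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Invented article fragment; element names follow the NLM journal article
# style (article, front, body, sec) but this is not a validated document.
ARTICLE_XML = """\
<article>
  <front>
    <article-meta>
      <title-group>
        <article-title>An Example Article</article-title>
      </title-group>
    </article-meta>
  </front>
  <body>
    <sec>
      <title>Introduction</title>
      <p>First paragraph.</p>
    </sec>
    <sec>
      <title>Methods</title>
      <p>Second paragraph.</p>
      <p>Third paragraph.</p>
    </sec>
  </body>
</article>
"""


def outline(xml_text: str) -> tuple[str, list[tuple[str, int]]]:
    """Return the article title and (section title, paragraph count) pairs."""
    root = ET.fromstring(xml_text)
    title = root.findtext("front/article-meta/title-group/article-title")
    sections = [
        (sec.findtext("title"), len(sec.findall("p")))
        for sec in root.findall("body/sec")
    ]
    return title, sections


title, sections = outline(ARTICLE_XML)
print(title)
for name, paragraph_count in sections:
    print(f"  {name}: {paragraph_count} paragraph(s)")
```

Because the sections and title are explicit elements rather than visual formatting, the same file can drive an outline, a search index, or an HTML rendering without guessing at layout.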
    • 14. PMC2
        • Content is converted to a standard XML format on ingest and then stored and rendered from the one format.
        • But what format?
    • 15. Harvard E-journal Archiving Project
        • The Mellon Foundation funded the Harvard Library to study the feasibility of using one DTD for archiving journal articles.
        • Harvard commissioned Inera, Inc. for the E-Journal Archive DTD Feasibility Study.
            • Conclusion: yes, it is feasible, but the right DTD does not exist.
        • Recommendations from the study were used in the modified PMC DTD.
        • NCBI collaborated with Harvard to broaden the scope of the new PMC DTD to accommodate journals from all disciplines (not just life sciences).
    • 16. NLM Journal Article DTDs: Establishing Standards from Practice
        • Archiving and Interchange DTD
            • Purpose is to preserve the journal's intellectual content
            • Written for ease of conversion (from other DTDs) and completeness (union of current journal DTDs)
        • Journal Publishing DTD
            • A subset of the Archiving DTD
            • Written for authoring article content, initial tagging of non-XML content, and creating consistent structures
    • 17. Adoption
        • HighWire Press
        • JSTOR's Electronic Archiving Initiative
        • Australia's Commonwealth Scientific and Industrial Research Organization
        • PLoS and other PMC contributors
        • Atypon Systems (over 150 titles) and other conversion vendors and journal service providers
        • Wiley, Nature, Blackwell common format (PXI)
    • 18. Support
        • Complete documentation for both DTDs available online
        • Established public discussion lists for user questions
        • Generic transformations to HTML and PDF forms of articles
        • Public XML validation tool
        • Working group of leaders in printing and markup industries provides advice on changes to the Tagset
    • 19. Portable PubMed Central (pPMC)
        • Provides a local mirror of PMC content
        • Updated daily from NCBI
        • Multiple site archiving
        • Provides rendering of PMC XML into HTML
        • Provides searching through NCBI EUtils
        • Provides for controlled local content in presentation
        • Provides first step toward collaborative archiving
        • Collaboration with Microsoft on support
    • 20. What's on the Bookshelf?
        • Previously published books
        • New collections
        • New content
    • 21. Diabetes
        • Health information with links to molecular data
        • NIDDK advisors on content
        • ~10,000 users per month
        • “…a truly valuable resource…” Gene Barrett, President, American Diabetes Association
        • Obesity
    • 22. Books
        • Authoring in MS Word
        • Simple mark-up based on Word styles
        • WordML to XML conversion
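Style-based mark-up means the author's choice of Word style carries the structural information. The sketch below shows the general idea only. It assumes the paragraphs have already been extracted from the Word file as (style name, text) pairs, and the style-to-element mapping, the element names, and the `paragraphs_to_xml` helper are all hypothetical, since the slides do not describe the actual conversion.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from Word paragraph styles to XML element names;
# the slides do not give the real one.
STYLE_TO_TAG = {
    "Title": "book-title",
    "Heading 1": "title",
    "Normal": "p",
}


def paragraphs_to_xml(paragraphs: list[tuple[str, str]]) -> str:
    """Convert (Word style, text) pairs into a small XML document.

    Each 'Heading 1' paragraph opens a new <sec>; following 'Normal'
    paragraphs are placed inside the most recently opened section.
    """
    root = ET.Element("book")
    current = root
    for style, text in paragraphs:
        tag = STYLE_TO_TAG.get(style)
        if tag is None:
            raise ValueError(f"no mapping for Word style {style!r}")
        if style == "Heading 1":
            current = ET.SubElement(root, "sec")
        ET.SubElement(current, tag).text = text
    return ET.tostring(root, encoding="unicode")


example = [
    ("Title", "Example Book"),
    ("Heading 1", "Overview"),
    ("Normal", "Some text."),
]
xml_out = paragraphs_to_xml(example)
print(xml_out)
```

Rejecting unmapped styles, as this sketch does, is one way to keep authors within a small, predictable set of styles; a real converter might instead fall back to a default element.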
    • 23.
    • 24. BioMedicine Moves to the Web
        • Electronic Authoring and Distribution of Articles
            • Linking and annotating factual data as a side effect
            • Ability to mine data and text together
            • Richer data “between” supported databases
        • High Throughput Biology generates large datasets stored in public repositories
            • Common factual data roadmap
            • Greater transparency
            • Greater incidental collaboration for discovery
        • New “private” sites for discussion on this armature
        • New products arise from a public infrastructure
    • 25. Influenza Anti-viral Compounds
    • 26. Influenza Anti-viral Compounds
    • 27. Influenza Anti-viral/Protein Binding
    • 28. Influenza Neuraminidase Gene
    • 29. Influenza Genome Project
    • 30. Influenza Assembly Archive
