Beyond the Tsunami: Dealing with Life Sciences Data


Microsoft E Science 2009

  • EBI storage: 5 petabytes, doubling every year

    1. Beyond the Tsunami: Developing the Infrastructure to Deal with Life Sciences Data
       Christopher Southan and Graham Cameron, EMBL-European Bioinformatics Institute (EBI), Cambridge, U.K.
    2. EBI and Sanger at Hinxton: Engaging with the Data Challenges
       • Technology for sequence data generation and reduction
       • Repositories, storage, archiving
       • Databases, entity linking, infrastructure and utility
       • Biocuration, annotation, standards, ontologies
       • Experimental biological data from research groups
       • Data exploitation, mining and visualisation
       • Biological hypothesis iteration
    3. EMBL-Bank Release 101, Aug 2009: 163 million entries, 283 billion bases
    4. 10 years of Rapid Growth
       GU057010; SV 1; linear; viral cRNA; STD; VRL; 1701 BP.
       08-OCT-2009 (Rel. 102, Created)
       08-OCT-2009 (Rel. 102, Last updated, Version 1)
       Influenza A virus (A/Chengdu/03/2009(H1N1)) segment 4 hemagglutinin (HA)
       Jiang T., Qin C., Li X., Zhao H., Yu M., Deng Y., Yu X., Han J., Qin E., Zhu Q.;
       "A community transmission of influenza A (H1N1) virus in a boarding school in China, 22-27 July 2009"
       **********
       AF177758; SV 1; linear; mRNA; STD; HUM; 1868 BP.
       10-SEP-1999 (Rel. 61, Created)
       07-OCT-2008 (Rel. 97, Last updated, Version 6)
       Homo sapiens ubiquitin specific protease 16 (USP16) mRNA, complete cds.
       PUBMED; 10786635.
       Smith T.S., Southan C.;
       "Sequencing, tissue distribution and chromosomal assignment of a novel ubiquitin-specific protease USP23";
       Biochim. Biophys. Acta 1490(1-2):184-88(2000).
       Ensembl-Gn; ENSG00000143258; Homo_sapiens.
    5. New Technology > New Data Archives: European Nucleotide Archive snapshot, March 2009
    6. Accelerating Genome Coverage: Jan 2009, 4370 projects
    7. From EBI/Sanger
    8. The 1000 Genomes Project: Cataloging Human Genetic Variation
       • Initial human genome: 10 years and 40 gigabases
       • Over the next two years, the equivalent of two human genomes will be produced every 24 hours
       • Completed dataset will be 6 trillion DNA bases, 500 TB
       • 60-fold more than 28 years of EMBL-Bank
       • Expected to cover 1200 genomes
    9. Data Exploitation: EBI Accesses
       Hit rates for web pages and web services over the last 4 years
    10. Towards a sustainable infrastructure for biological information in Europe, to support life science, translation to medicine, the environment, bio-industries and society.
        • Genomes
        • Nucleotide sequence
        • Expression
        • Proteomes
        • Protein families and domains
        • Protein structure
        • Protein interactions
        • Chemical entities
        • Pathways
        • Systems
        • Literature, ontologies
    11. Conclusions
        • The International Nucleotide Sequence Database Collaboration will exceed 300 billion bases in 2009.
        • Storage at the EBI has doubled annually and is now 5 petabytes.
        • Next-generation sequencing is increasing data production ~10-fold.
        • By 2010 the full genomic variation in over 1000 people will be revealed, and genomes from over 1000 species will be completed.
        • More data mining is needed to convert data into knowledge.
        • The European ELIXIR project and other global initiatives to enhance the sustainable infrastructure for biological databases are essential.
        • The impact of data-intensive computing on the life sciences will be profound and transformative.
        • Exploitation will bring major benefits for biology, medicine, agriculture, biofuels and environmental science.
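The two sample entries on slide 4 each open with the seven semicolon-separated fields of an EMBL flat-file ID line: accession, sequence version, topology, molecule type, data class, taxonomic division and length in base pairs. The minimal sketch below splits such a line into named fields. It assumes exactly that seven-field layout; `EmblId` and `parse_id_line` are hypothetical names used for illustration, not part of any EBI tool, and production code would more sensibly rely on an established parser such as Biopython's `SeqIO` with the "embl" format.

```python
from dataclasses import dataclass


@dataclass
class EmblId:
    """Fields of an EMBL-Bank ID line (hypothetical container for illustration)."""
    accession: str
    sequence_version: int
    topology: str
    molecule_type: str
    data_class: str
    taxonomic_division: str
    length_bp: int


def parse_id_line(line: str) -> EmblId:
    """Parse e.g. 'GU057010; SV 1; linear; viral cRNA; STD; VRL; 1701 BP.'.

    Assumes the seven-field layout shown on slide 4. A leading 'ID' line
    code, as found in real flat files, is stripped if present.
    """
    text = line.strip()
    if text.startswith("ID "):
        text = text[2:].strip()
    # Drop the trailing full stop, then split on the semicolon separators.
    fields = [f.strip() for f in text.rstrip(".").split(";")]
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}: {line!r}")
    accession, sv, topology, mol_type, data_class, division, length = fields
    if not sv.startswith("SV ") or not length.endswith(" BP"):
        raise ValueError(f"unrecognised ID line: {line!r}")
    return EmblId(
        accession=accession,
        sequence_version=int(sv[3:]),        # "SV 1" -> 1
        topology=topology,
        molecule_type=mol_type,
        data_class=data_class,
        taxonomic_division=division,
        length_bp=int(length[:-3]),          # "1701 BP" -> 1701
    )


if __name__ == "__main__":
    for line in ("GU057010; SV 1; linear; viral cRNA; STD; VRL; 1701 BP.",
                 "AF177758; SV 1; linear; mRNA; STD; HUM; 1868 BP."):
        entry = parse_id_line(line)
        print(entry.accession, entry.taxonomic_division, entry.length_bp)
```

Raising `ValueError` on anything outside the assumed layout keeps malformed lines from being silently mis-parsed.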
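The 1000 Genomes figures on slide 8 can be turned into two derived quantities: the average storage per base implied by 6 trillion bases in 500 TB, and the number of human-genome equivalents produced at two per day over two years. This is back-of-envelope arithmetic rather than figures stated on the slide; it assumes decimal terabytes (10^12 bytes) and 365-day years, and the constant and function names are illustrative.

```python
# Back-of-envelope figures derived from the 1000 Genomes slide.
# Assumption: "TB" means decimal terabytes (10**12 bytes); years have 365 days.
DATASET_BASES = 6 * 10**12    # "6 trillion DNA bases"
DATASET_BYTES = 500 * 10**12  # "500 TB"
GENOMES_PER_DAY = 2           # "two human genomes ... every 24 hours"
DAYS = 2 * 365                # "over the next two years"


def bytes_per_base(total_bytes: int, total_bases: int) -> float:
    """Average storage per sequenced base implied by the two totals."""
    return total_bytes / total_bases


def genome_equivalents(per_day: int, days: int) -> int:
    """Human-genome equivalents produced at a constant daily rate."""
    return per_day * days


if __name__ == "__main__":
    print(f"{bytes_per_base(DATASET_BYTES, DATASET_BASES):.1f} bytes per base")
    print(genome_equivalents(GENOMES_PER_DAY, DAYS), "genome equivalents")
```

The result, roughly 83 bytes per base, is far above the 2 bits needed to encode a bare A, C, G or T. A plausible reading, not a statement on the slide, is that the 500 TB covers raw reads, quality scores and derived files rather than finished sequence alone.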
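The conclusions note that EBI storage has doubled annually to reach 5 petabytes. If that doubling simply continued, which the slides do not claim, storage after t years would be S(t) = 5 · 2^t PB. The sketch below evaluates that extrapolation; the function names, the adjustable doubling time and the 1 EB (1000 PB) target are illustrative assumptions.

```python
import math

CURRENT_PB = 5.0  # "Storage at the EBI ... is now 5 petabytes" (2009)


def projected_storage_pb(current_pb: float, years: float,
                         doubling_time_years: float = 1.0) -> float:
    """Storage after `years` if it keeps doubling every `doubling_time_years`."""
    return current_pb * 2 ** (years / doubling_time_years)


def years_to_reach(current_pb: float, target_pb: float,
                   doubling_time_years: float = 1.0) -> float:
    """Years until storage reaches `target_pb` under the same doubling assumption."""
    return doubling_time_years * math.log2(target_pb / current_pb)


if __name__ == "__main__":
    for year in range(1, 6):
        print(f"+{year} yr: {projected_storage_pb(CURRENT_PB, year):.0f} PB")
    # 1 exabyte = 1000 petabytes (decimal units)
    print(f"1 EB after ~{years_to_reach(CURRENT_PB, 1000.0):.1f} years")
```

Under this assumption the store would pass 1 exabyte in under eight years; passing a longer `doubling_time_years` shows how sensitive the projection is to the growth rate.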