A New Era in Diagnostic Microbiology Pathogen Genomics. Whole Genome Sequencing
15 January 2014. The Royal College of Pathologists

A New Cloud Computing System for Massive Analysis of Reads from Metagenomics Samples


Metagenomics and cloud_computing_london_january_2014_era7_bioinformatics


Traditional microbial genome sequencing relies upon clonal cultures, but the new era of genomics faces a new challenge: metagenomic analysis. Within the next few years, metagenomics will probably be used in clinical diagnostic settings. It has the potential to revolutionize pathogen detection in public health laboratories by allowing the simultaneous detection of all microorganisms in a clinical sample. For viruses, an unbiased high-throughput sequencing approach is useful for directly detecting pathogenic viruses without prior genetic information. The use of metagenomics for virus discovery in clinical samples has opened new opportunities for understanding the aetiology of unexplained illness. For bacteria, it should be remembered that only a small fraction of the phylogenetic diversity of Bacteria and Archaea is represented by cultivated organisms. Hence, metagenomics will probably serve to identify new pathogens and new infections caused by consortia. In chronic infections, metagenomics will give us information about the relevance of biofilms and other bacterial organizations that may be important in such infections. As an example, metagenomics of Mycobacterium infections has revealed multiple, previously undetected strains in the same patient. Microbiome analysis has been one of the most important applications of metagenomics.
Two major strategies have been applied in recent years for bacterial metagenomics: 16S and shotgun metagenomics. 16S metagenomics tells us about microbial diversity and the relative abundance of species and taxa. Shotgun metagenomics is a much more massive approach that can reveal the functional profile of the genes present in the sample and can even yield assembled genomes if the sample is not very complex.
Metagenomics has brought new challenges to bioinformatics. Cloud computing can solve the problem of massive data analysis by providing scalable, real-time, on-demand computing for metagenomics data analysis. However, cloud computing infrastructure is not easy to manage, and publicly available software solutions are needed to extend the use of the cloud to the analysis of huge metagenomics data sets.
MG7 is a new system for the analysis of metagenomics reads. It uses cloud computing to parallelize the computation of the BLAST similarity on which the inference of function and the assignment of taxonomic origin are based. A distinctive feature of MG7 is its use of a non-relational database model: MG7 uses a graph database to store the results of the analysis and to facilitate querying and access to the data, organized in the hierarchical structure of the taxonomy tree. MG7 is an open source project licensed under the AGPLv3 license.

Published in: Health & Medicine

Metagenomics and cloud_computing_london_january_2014_era7_bioinformatics

  1. A New Era in Diagnostic Microbiology Pathogen Genomics. Whole Genome Sequencing. 15 January 2014. The Royal College of Pathologists. A New Cloud Computing System for Massive Analysis of Reads from Metagenomics Samples. http://ohnosequences.com www.era7bioinformatics.com
  2. A New Cloud Computing System for Massive Analysis of Reads from Metagenomics Samples
     - A bit of context:
       • What is Era7
       • What is Oh no sequences! Research group
       • Research lines / Research projects
     - Clonal cultures versus Metagenomics
     - Microbiome
     - Microbiome in health and disease
     - Metagenomics in a clinical sample
     - 16S and shotgun metagenomics
     - Metagenomics for detection of viruses
     - Metagenomics for detection of bacteria
     - The metagenomics bioinformatics challenge:
       • High computational cost
       • Binning for reducing computation
       • Reducing reference database
     - MG7
       • Cloud computing
       • MG7 algorithms and pipeline
       • Lowest Common Ancestor assignment
       • MG7 uses Graph databases
       • MG7 uses NCBI taxonomy tree
     MG7 for metagenomics analysis
  3. A bit of context
  4. What is Era7 Bioinformatics
  5. • Research-driven SME
     • Open Source
     • Cloud Computing
     • Next Generation Sequencing
  6. • Bacterial Genomics projects
     • Comparative Genomics
     • Metagenomics
     • Microbiome
     • RNA-seq (and Dual RNA-seq)
     • Cancer Genomics
     • Big Data management and integration
  7. What is Era7 Oh no sequences!
  8. A New Cloud Computing System for Massive Analysis of Reads from Metagenomics Samples
     Research Lines:
     • Algorithms for assembly
     • Methods for bacterial genome annotation
     • New Cloud Computing Architectures
     • Graph Databases for Biological data
     • Comparative genomics and bacterial evolution
     • Genome Plasticity
     • Big Data integration and visualization
     • Host Immune System and infection
     Software Research Projects (all of them are Open Source AGPLv3 projects):
     • BG7
     • Bio4j
     • Nextmicro
     • Statika
     • Nispero
     • MG7
     MG7 for metagenomics analysis
  9. Traditional microbial genome sequencing relies upon clonal cultures, but the new era of genomics is facing a new challenge: metagenomic analysis
  10. Microbiome analysis is possible through metagenomic approaches.
      • Health and Disease
      • Therapeutic Interventions
      • Transplant
      • Immune system
  11. Microbiome in Health and Disease
      • Inflammatory Bowel Disease
      • Diabetes
      • Obesity
      • Cardiovascular Disease
      • Colon Cancer
  12. Modifying the Microbiome
      • Prebiotics
      • Probiotics
      • Microbiome Transplant (Clostridium difficile)
  13. For bacteria, it should be remembered that only a small fraction of the phylogenetic diversity of Bacteria and Archaea is represented by cultivated organisms
  14. Metagenomics has the potential to revolutionize pathogen detection in public health laboratories by allowing the simultaneous detection of all microorganisms in a clinical sample
  15. Metagenomic analysis after PCR amplification of different gene regions
      Shotgun Metagenomics
  16. Metagenomic analysis after PCR amplification of different gene regions:
      • 16S rRNA
      • Gyrase
      • Ribosomal proteins
      • Elongation Factors
      • RNA Polymerase
      • ……….
      16S metagenomics tells us about microbial diversity and the relative abundance of species and taxa
  17. Shotgun Metagenomics: Shotgun metagenomics is a much more massive approach that can reveal the functional profile of the genes present in the sample and can even yield assembled genomes if the sample is not very complex
  18. Technology
      • 454 in the past
      • Illumina today (approaches overlapping paired reads)
      • Preprocessing steps are very important
  19. For viruses: An unbiased high-throughput sequencing approach is useful for directly detecting pathogenic viruses without prior genetic information. The use of metagenomics for virus discovery in clinical samples has opened new opportunities for understanding the aetiology of unexplained illness
  20. For bacteria: Metagenomics will probably serve to identify new pathogens and new infections caused by consortia. In chronic infections, metagenomics will give us information about the relevance of biofilms and other bacterial organizations that may be important in such infections. Microbiome analysis has been one of the most important applications of metagenomics.
  21. For bacteria: As an example, metagenomics of Mycobacterium infections has revealed multiple, previously undetected strains in the same patient
  22. The Bioinformatics challenge: Metagenomics has a high computational cost
      1. One approach is to reduce the need for computation
      2. The other is to be more efficient
  23. The Bioinformatics challenge: Metagenomics has a high computational cost
      1. Reducing the computation
         • Binning (clustering) the reads (16S and shotgun). Operational Taxonomic Units (OTUs) in 16S
  24. The Bioinformatics challenge: Metagenomics has a high computational cost
      1. Reducing the computation
         • Reducing the size of the reference database (shotgun): it is common to use only the complete bacterial genomes
  25. The Bioinformatics challenge: Metagenomics has a high computational cost
      2. The other is to be more efficient: MG7
  26. The Bioinformatics challenge: Cloud computing can solve the problem of massive data analysis by providing scalable, real-time, on-demand computing for metagenomics data analysis. However, cloud computing infrastructure is not easy to manage, and publicly available software solutions are needed to extend the use of the cloud to the analysis of huge metagenomics data sets.
  27. MG7
      • Based on Cloud Computing (AWS)
      • Parallel computation
      • Each read is compared with the complete database:
        • No binning, all the reads
        • All the known sequences (nt database) for shotgun
      • NCBI taxonomy
      • Graph database for analyzing the assignment results
  28. MG7: Based on Cloud Computing (AWS)
      • EC2
      • S3
      • SQS
      • SNS
      • ……
  29. MG7: Based on Cloud Computing (AWS), parallel computation
      • A Cloud Master machine creates tasks and sets up queues
      • A set (hundreds, possibly thousands) of Cloud instances (usually micro EC2 instances) is launched
      • After the parallel computation, results are modeled in a graph database. This allows further analysis
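
The master/queue/worker pattern on slide 29 can be illustrated with a small local sketch. This is not MG7 code: it assumes Python's standard `queue` and `threading` modules as stand-ins for SQS queues and EC2 instances, and the names `run_master`, `process_read` and `gc_fraction` are hypothetical. The per-read work is a placeholder (GC fraction) where a real pipeline would run a BLAST comparison.

```python
import queue
import threading
from typing import Callable, Dict, List


def gc_fraction(sequence: str) -> float:
    """Placeholder per-read work; a real worker would run BLAST here."""
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


def run_master(
    reads: Dict[str, str],
    chunk_size: int,
    n_workers: int,
    process_read: Callable[[str], float],
) -> Dict[str, float]:
    """Split reads into tasks, queue them, and let workers drain the queue."""
    tasks: "queue.Queue[List[str]]" = queue.Queue()  # local stand-in for a task queue
    results: Dict[str, float] = {}
    results_lock = threading.Lock()

    # The master creates one task per chunk of read identifiers.
    read_ids = sorted(reads)
    for start in range(0, len(read_ids), chunk_size):
        tasks.put(read_ids[start:start + chunk_size])

    def worker() -> None:
        # Each worker (stand-in for a cloud instance) pulls tasks until none remain.
        while True:
            try:
                chunk = tasks.get_nowait()
            except queue.Empty:
                return
            partial = {read_id: process_read(reads[read_id]) for read_id in chunk}
            with results_lock:
                results.update(partial)

    workers = [threading.Thread(target=worker) for _ in range(n_workers)]
    for thread in workers:
        thread.start()
    for thread in workers:
        thread.join()
    return results


if __name__ == "__main__":
    toy_reads = {"r1": "GGCC", "r2": "ATAT", "r3": "ATGC"}
    print(run_master(toy_reads, chunk_size=2, n_workers=2, process_read=gc_fraction))
```

All tasks are queued before any worker starts, so an empty queue reliably means the work is finished. In a cloud setting the queue would outlive individual workers, which is what makes it practical to scale to hundreds of instances.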
  30. https://github.com/pablopareja/MG7/wiki
  31. Data Model for the Graph Database (Neo4j). https://github.com/pablopareja/MG7/wiki
  32. MG7: Based on Cloud Computing (AWS)
      • Storage, another challenge. AWS Cloud is very useful:
        • S3 for immediate access
        • Glacier for archiving
  33. MG7: Each read is compared with the complete database:
      • Direct Assignment: Best BLAST Hit. It can be done by:
        • E value
        • Depending on similarity % and length of the hit
      • Lowest Common Ancestor
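
A minimal sketch of the best-hit idea on slide 33 follows. It is an illustration under stated assumptions, not MG7's implementation: the `Hit` record, the `best_hit` function, the default thresholds and the tie-breaking rule are all hypothetical. Hits are first filtered by E value, similarity percentage and hit length, and the remaining hit with the lowest E value is chosen.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Hit:
    """Hypothetical record for one BLAST hit of a read."""
    read_id: str
    taxon: str
    evalue: float
    identity: float  # similarity, in percent
    length: int      # length of the aligned hit


def best_hit(
    hits: Iterable[Hit],
    max_evalue: float = 1e-5,     # assumed threshold, for illustration only
    min_identity: float = 90.0,   # assumed threshold, for illustration only
    min_length: int = 50,         # assumed threshold, for illustration only
) -> Optional[Hit]:
    """Return the best hit that passes all filters, or None if none does."""
    passing = [
        hit for hit in hits
        if hit.evalue <= max_evalue
        and hit.identity >= min_identity
        and hit.length >= min_length
    ]
    if not passing:
        return None
    # Lowest E value wins; ties are broken by higher identity, then longer hit.
    return min(passing, key=lambda hit: (hit.evalue, -hit.identity, -hit.length))


if __name__ == "__main__":
    toy_hits = [
        Hit("read1", "Escherichia coli", 1e-30, 99.0, 100),
        Hit("read1", "Salmonella enterica", 1e-20, 95.0, 100),
        Hit("read1", "Bacillus subtilis", 1e-40, 80.0, 100),  # fails identity filter
    ]
    chosen = best_hit(toy_hits)
    print(chosen.taxon if chosen else "unassigned")
```

A read with no passing hit stays unassigned, which is usually preferable to forcing a weak assignment.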
  34. MG7: Lowest Common Ancestor. First step: We start from a set of nodes of arbitrary size (4 in this sample), which are spread through the taxonomy tree
  35. MG7: Lowest Common Ancestor. Second step: We then fetch the first node from the set and calculate its whole ancestor list up to the main root of the taxonomy.
  36. MG7: Lowest Common Ancestor. Third step: Now that we have the list, we take the second node of the set and check whether it is contained in it. If not, we keep going up through its ancestors until we find a marked node. Once it has been found, we get rid of the previous elements in the list (if any) so that they are not taken into account in the next iterations of the algorithm.
  37. MG7: Lowest Common Ancestor. Fourth step: We keep going through our node set, and node C also removes some elements of the list…
  38. MG7: Lowest Common Ancestor. Fifth step: Finally we reach the last node of our set, but no element is removed from our list as a result.
  39. MG7: Lowest Common Ancestor. Here we have our lowest common ancestor!
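
The five steps on slides 34 to 39 can be sketched as follows. This is a reading of the slides rather than MG7's actual code: the taxonomy is assumed to be a plain child-to-parent dictionary, the function names are hypothetical, and the toy tree is heavily simplified (it skips most taxonomic ranks).

```python
from typing import Dict, List, Optional, Sequence


def ancestor_path(node: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Return the list [node, parent, grandparent, ..., root]."""
    path: List[str] = []
    current: Optional[str] = node
    while current is not None:
        path.append(current)
        current = parent[current]
    return path


def lowest_common_ancestor(
    nodes: Sequence[str], parent: Dict[str, Optional[str]]
) -> str:
    """Lowest common ancestor of `nodes` in the tree described by `parent`."""
    if not nodes:
        raise ValueError("at least one node is required")

    # Second step: ancestor list of the first node, up to the root.
    candidates = ancestor_path(nodes[0], parent)

    # Third to fifth steps: for every other node, climb until we meet a node
    # that is still in the list, then drop the list elements below it.
    for node in nodes[1:]:
        position = {name: index for index, name in enumerate(candidates)}
        current: Optional[str] = node
        while current is not None and current not in position:
            current = parent[current]
        if current is None:
            raise ValueError("nodes do not share a root")
        candidates = candidates[position[current]:]

    # The head of the remaining list is the lowest common ancestor.
    return candidates[0]


# Toy taxonomy for illustration only (child -> parent).
TOY_PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "Bacteria": "root",
    "Proteobacteria": "Bacteria",
    "Firmicutes": "Bacteria",
    "Escherichia": "Proteobacteria",
    "Salmonella": "Proteobacteria",
    "Escherichia coli": "Escherichia",
    "Bacillus": "Firmicutes",
}

if __name__ == "__main__":
    print(lowest_common_ancestor(["Escherichia coli", "Salmonella"], TOY_PARENT))
```

Because the candidate list only ever shrinks from the front, each additional node costs at most one climb to the root, so the whole assignment is linear in the number of hits times the depth of the tree.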
  40. MG7: All the known sequences (nt database) for shotgun. The nt database is the largest nucleotide database. It contains nucleotide sequences from all organisms. This is important to detect:
      • Unexpected organisms
      • Contamination
  41. MG7: NCBI taxonomy. This taxonomy is probably the best and most comprehensive. A graph database is very appropriate for modeling a taxonomy tree
  42. Thanks for your attention! Marina Manrique, Eduardo Pareja-Tobes, Pablo Pareja-Tobes, Raquel Tobes, Eduardo Pareja. epareja@era7.com http://ohnosequences.com www.era7bioinformatics.com