Bioinformatics in Gene Research
Intended for a mixed/general audience of clinicians, business interests, and research scientists. No audio; however, the event was recorded and posted to YouTube by Genome Atlantic: http://www.youtube.com/watch?v=FLVjwOngu-Q

Published in: Health & Medicine
Transcript of "Bioinformatics in Gene Research"

  1. Bioinformatics in Genetics Research
     Genetics Noon Symposium Series
     Daniel Gaston, PhD
     Dr. Karen Bedard Lab, Department of Pathology
     November 21st, 2012
  2. IGNITE Orphan Diseases: Identifying Genes and Novel Therapeutics to Enhance Treatment
       • Identify causative genetic variations in orphan diseases with an emphasis on Atlantic Canada
       • Develop animal and cell culture models
       • Identify and develop novel therapeutics
       • igniteproject.ca
  4. Outline
       • Introduction
           • Bioinformatics in Disease Genomics
           • Next-Generation Sequencing
       • Genomics in Research and the Clinic
       • The Data Deluge and its Solutions
           • Bioinformatic Methods for Analyzing Genomic Data
       • Case Studies
       • Conclusion
  5. Bioinformatics in Disease Genomics
       • Handling and long-term storage of raw data (sequencing, gene expression, etc.)
       • Maintenance and support of computational infrastructure
       • Experimental design
       • Data analysis
       • Methods development
           • Analysis pipelines
           • Statistical analyses
           • Algorithm design
  7. ‘Next-Generation’ Sequencing and Disease Genomics
  13. Disease Genomics: Hunting Down Pathogenic Genetic Variation
       • Reference: Exon 1, Intron 1, Exon 2, with splice sites, a start codon, and a TAA stop codon; the mRNA codes for protein
       • Patient: Exon 1, Intron 1, Exon 2; the stop codon TAA reads TAC (Tyr)
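
The codon change on this slide (reference TAA, patient TAC) can be illustrated with a minimal sketch. The codon table below is deliberately partial, and the function name `classify_codon_change` is a hypothetical helper for illustration, not something from the talk.

```python
# Partial DNA codon table (coding strand): only the codons needed here.
CODON_TABLE = {
    "TAA": "Stop", "TAG": "Stop", "TGA": "Stop",
    "TAC": "Tyr", "TAT": "Tyr",
    "ATG": "Met",
}

def classify_codon_change(ref_codon, alt_codon):
    """Label the effect of replacing ref_codon with alt_codon."""
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "silent"
    if ref_aa == "Stop":
        return "stop loss"   # translation continues past the normal end
    if alt_aa == "Stop":
        return "stop gain"   # translation ends early
    return "amino acid change"

print(classify_codon_change("TAA", "TAC"))  # stop loss
```

In this reading, the patient's protein would be extended beyond its normal end because the stop codon now codes for tyrosine.
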
  15. Disease Genomics: Research vs Clinic
       • Still predominantly research oriented
           • Complex/Common disease
           • Mendelian disorders
           • Cancer genomics
       • Clinical genomics starting to gain traction
           • Cancer genomics
               • Cancer subtype identification
               • Personalized medicine and predicting outcomes
           • Mendelian disorders
               • Early diagnosis
               • Cost effectiveness
  16. Clinical Genomics
       • Children’s Mercy Hospital NICU
           • In the US, >20% of infant deaths are due to genetic disease
           • Serial sequencing of candidate genes is too slow
  18. Children’s Mercy Hospital NICU
       • 50-hour differential diagnosis of monogenic disease
           • Sample preparation and sequencing: 30.5 hours
           • Automated bioinformatics analysis: 17.5 hours
           • Previous high-throughput sequencing methods: 19 days
           • Tested on seven infants: two previously diagnosed using standard methods, five undiagnosed
       • Caveats
           • Bioinformatics portion not available outside of hospital
           • Requires thorough clinical phenotyping using a controlled vocabulary
           • Generates a large amount of data
  19. The Data Deluge
       • 4 million genetic variants
       • 2 million associated with protein-coding genes
       • 10,000 possibly of disease-causing type
       • 1,500 with <1% frequency in population
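
To make the scale of this reduction concrete, the sketch below computes what fraction of the starting variants survives each stage. The counts are the ones on the slide; the function name `fraction_remaining` is a hypothetical helper.

```python
# Variant counts at each stage, as given on the slide.
funnel = [
    ("all genetic variants", 4_000_000),
    ("associated with protein-coding genes", 2_000_000),
    ("possibly of disease-causing type", 10_000),
    ("<1% frequency in population", 1_500),
]

def fraction_remaining(stages):
    """Return (label, count, fraction of the first stage) for every stage."""
    total = stages[0][1]
    return [(label, count, count / total) for label, count in stages]

for label, count, frac in fraction_remaining(funnel):
    print(f"{count:>9,}  {frac:9.4%}  {label}")
```

Even the final stage leaves on the order of a thousand candidates, which is why further annotation and prioritization are needed.
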
  20. Surviving the Data Deluge. Reducing the Search Space: Exome Sequencing
  21. Exome Sequencing
       • Exome: portion of genome composed of protein-coding exons and functional RNA sequences
       • 1.5-2% of human genome (50 Mb)
       • >85% of monogenic diseases due to variants in exome
       • Complete exome sequencing: ~$1000/sample
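
As a quick check of the "1.5-2%" figure, the arithmetic can be sketched as follows. The 50 Mb exome size is from the slide; the genome size of roughly 3,100 Mb is an assumption added here, not a number from the talk.

```python
def exome_fraction(exome_mb, genome_mb):
    """Fraction of the genome covered by the exome (both sizes in megabases)."""
    return exome_mb / genome_mb

# 50 Mb exome (slide) against an assumed ~3,100 Mb haploid human genome.
fraction = exome_fraction(50, 3100)
print(f"{fraction:.1%}")  # about 1.6%, inside the 1.5-2% range
```
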
  22. Caveats
       • Incomplete and non-uniform coverage of exome
           • Systematic bias (GC content)
           • Random sampling
       • Not all genetic variants amenable to discovery
           • Non-coding variants
           • Structural variants
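
GC content, named above as a source of systematic coverage bias, is simple to compute. The sketch below is a minimal illustration; the function names and the window size are hypothetical choices, and real pipelines typically rely on established tools for this.

```python
def gc_content(seq):
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)

def windowed_gc(seq, window):
    """GC content of consecutive, non-overlapping windows of the given size."""
    return [gc_content(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, window)]

print(windowed_gc("GGCCATAT", 4))  # [1.0, 0.0]
```

Windows with very high or very low GC content are the ones where uneven coverage would be expected, so they are natural places to check depth before trusting variant calls.
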
  23. Surviving the Data Deluge: Bioinformatics
  24. Typical Bioinformatics Workflow
       • QC of Raw Data → Map to Reference → QC → Find Variants → QC → Annotate → Filter
  29. It sounds simple, but…
       • For every stage there are multiple programs available and published in the literature
       • For every program there is a wide variety of parameter values and options. Defaults are often “good enough”, but not always
       • Best combinations of programs and options are not well understood
       • Protocols are changing rapidly as new technologies and methods are developed
       • Different centres and groups use slightly different workflows, with similar but not identical results
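
One way to quantify "similar but not identical" is to compare the variant sets produced by two workflows. This is a minimal sketch with made-up variants; the tuple layout (chromosome, position, reference allele, alternate allele) and the function name `concordance` are assumptions for illustration.

```python
def concordance(calls_a, calls_b):
    """Summarize the overlap between two collections of variant calls."""
    a, b = set(calls_a), set(calls_b)
    shared = a & b
    union = a | b
    return {
        "shared": len(shared),
        "only_a": len(a - b),
        "only_b": len(b - a),
        # Jaccard index: shared calls divided by all distinct calls.
        "jaccard": len(shared) / len(union) if union else 1.0,
    }

# Hypothetical calls from two workflows run on the same sample.
workflow_a = {("9", 100, "A", "G"), ("9", 200, "C", "T"), ("17", 300, "G", "A")}
workflow_b = {("9", 100, "A", "G"), ("9", 200, "C", "T"), ("17", 400, "T", "C")}
print(concordance(workflow_a, workflow_b))
```

The calls unique to one workflow are the ones worth inspecting by hand, since they may reflect parameter choices rather than biology.
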
  31. Annotating Variants
  32. “If a problem cannot be solved, enlarge it.” -- Dwight D. Eisenhower
  33. Annotations Associated with Genomic Variants
       • Is variant in a known protein-coding gene?
           • What does the gene do?
           • What molecular pathways?
           • What protein-protein interactions?
           • What tissues is it expressed in?
           • When in development?
       • Has this variant been seen before?
           • What population(s)? With what frequency?
           • Has it been seen in local sequencing projects?
           • Is there any known clinical significance?
       • What is the effect of the variation?
           • Does it change the resulting protein? How?
       • Inset figure: 4 million genetic variants; 2 million associated with protein-coding genes; 10,000 possibly of disease-causing type; 1,500 with <1% frequency in population
  34. Gene Annotation Resources
  35. Variant Annotation Resources
  36. Potential Pitfalls with Annotation Sources
       • Databases often overlap and agree, but there may be disagreements
       • Source of information: predicted versus experimental
       • Incorrect and out-of-date information
       • Large-scale un-validated versus manually curated datasets
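
Disagreements between annotation sources can be surfaced explicitly rather than silently resolved. The sketch below uses two made-up dictionaries in place of real databases; the variant IDs, labels, and the function name `find_conflicts` are all hypothetical.

```python
def find_conflicts(source_a, source_b):
    """Return {variant_id: (label_a, label_b)} where both sources have the
    variant but give it different labels."""
    shared = source_a.keys() & source_b.keys()
    return {vid: (source_a[vid], source_b[vid])
            for vid in sorted(shared)
            if source_a[vid] != source_b[vid]}

# Hypothetical annotations: one manually curated, one computationally predicted.
curated = {"var1": "pathogenic", "var2": "benign", "var3": "benign"}
predicted = {"var1": "pathogenic", "var2": "pathogenic", "var4": "benign"}
print(find_conflicts(curated, predicted))
```

Variants present in only one source (var3 and var4 here) are not conflicts, but they are a reminder that coverage differs between databases.
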
  37. Bioinformatics Analyses of Genomic Variants: Combining Data Sources and Filtering
  38. IGNITE Data Pipeline and Integration (diagram)
       • Inputs: Gene Annotations; Annotated Genomic Variants; Mapped Gene Region(s) Definitions; Known Genes; Pathway and Interactions
       • Steps: Filter → Sort → Prioritize
  39. Filtering the Data: Categorization (decision-tree diagram; node labels listed by level)
       • 4 million variants
       • Intronic; Exonic; Intergenic
       • Unknown; Splice Site; Amino Acid Changing; Silent Mutation; Splice Site; Potential Disease Causing (appears twice)
       • Known Genetic Disease Variant; Amino Acid Change Likely Pathogenic; Amino Acid Change Likely Benign; Known Polymorphism in Population; Stop Loss / Stop Gain
  40. Filtering the Data: Common or Rare?
       • Variants in dbSNP: typically known polymorphisms, unlikely to be associated with rare disease
       • Variants with relatively high frequency in control populations (1000 Genomes, HapMap, EVS, 2800 Exomes)
       • Number of times variant previously seen at sequencing centre/locally
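
A frequency filter along these lines might be sketched as follows. The 1% cutoff echoes the "<1% frequency" figure earlier in the talk; the record layout, the `max_local_count` threshold, and all example values are hypothetical.

```python
def filter_rare(variants, max_frequency=0.01, max_local_count=2):
    """Keep variants that are rare in every control population that reports
    them and that have seldom been seen in local sequencing."""
    kept = []
    for v in variants:
        freqs = v.get("frequencies", {})
        rare_everywhere = all(f < max_frequency for f in freqs.values())
        rarely_seen_locally = v.get("local_count", 0) <= max_local_count
        if rare_everywhere and rarely_seen_locally:
            kept.append(v)
    return kept

# Hypothetical variant records.
variants = [
    {"id": "var1", "frequencies": {"1000 Genomes": 0.25, "EVS": 0.30}, "local_count": 40},
    {"id": "var2", "frequencies": {"1000 Genomes": 0.002}, "local_count": 0},
    {"id": "var3", "frequencies": {}, "local_count": 1},
    {"id": "var4", "frequencies": {"EVS": 0.001}, "local_count": 15},
]
print([v["id"] for v in filter_rare(variants)])  # ['var2', 'var3']
```

Note that var4 is rare in the public data but is removed because it has been seen many times locally, which is the situation the third bullet is meant to catch.
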
  44. Notes on Filtering and Variant Annotation
       • It is very important to be aware of the population when referencing the frequency of a variant. An incorrect background leads to incorrect assumptions about prevalence
       • Reasonably well-sampled local populations are better than any other reference
       • Strike a balance between hard filtering for variants of largest potential effect and being inclusive so as not to miss variants
       • Some genes frequently acquire large-effect variants (stop loss / stop gain, etc.). Some genes can be lost without causing disease
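
The first point can be illustrated by evaluating the same variant against two reference populations. The population labels, frequencies, and the function name `passes_rarity` are hypothetical; the sketch only shows how the choice of background changes the outcome.

```python
def passes_rarity(variant_freqs, population, cutoff=0.01):
    """True/False if the variant is below the cutoff in the given population,
    or None when that population has not been sampled."""
    freq = variant_freqs.get(population)
    if freq is None:
        return None  # no data: do not assume the variant is rare
    return freq < cutoff

# Hypothetical variant: rare in a global reference, common in a local population.
freqs = {"global": 0.001, "local": 0.08}
print(passes_rarity(freqs, "global"))  # True
print(passes_rarity(freqs, "local"))   # False
print(passes_rarity(freqs, "other"))   # None
```

Returning `None` for an unsampled population keeps "unknown" distinct from "rare", which matters when the patient's background is poorly represented in public databases.
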
  45. Applications to Real Data: Charcot-Marie-Tooth Disease and Cutis Laxa
  47. Charcot-Marie-Tooth: Genetic Mapping
       • Chromosome 9: 120,962,282 - 133,033,431
  48. Cutis Laxa: Genetic Mapping
       • Chromosome 17: 79,596,811 - 81,041,077
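
Restricting variants to a mapped region is a simple coordinate test. The sketch below uses the two intervals from the slides; treating both endpoints as inclusive, and the function names, are assumptions made for illustration.

```python
# Mapped intervals from the slides: (chromosome, start, end).
REGIONS = {
    "Charcot-Marie-Tooth": ("9", 120_962_282, 133_033_431),
    "Cutis Laxa": ("17", 79_596_811, 81_041_077),
}

def in_region(chrom, pos, region):
    """True if (chrom, pos) lies inside the region; endpoints assumed inclusive."""
    region_chrom, start, end = region
    return chrom == region_chrom and start <= pos <= end

def region_length(region):
    """Number of bases in the region, counting both endpoints."""
    _, start, end = region
    return end - start + 1

for name, region in REGIONS.items():
    print(f"{name}: {region_length(region):,} bp")
```
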
  49. Genes in the mapped regions
       • Charcot-Marie-Tooth
           • 143 genes in region
           • 13 known genes in genome: MPZ, PMP22, GDAP1, KIF1B, MFN2, SOX, EGR2, DNM2, RAB7, LITAF (SIMPLE), GARS, YARS, LMNA
       • Cutis Laxa
           • 52 genes in region
           • 5 known genes in genome: ATP6V0A2, ELN, FBLN5, EFEMP2, SCYL1BP1, ALDH18A1
  50. Pathway and Interaction Data
       • Charcot-Marie-Tooth: 37 pathways
           • Clathrin-derived vesicle budding
           • Lysosome vesicle biogenesis
           • Endocytosis
           • Golgi-associated vesicle biogenesis
           • Membrane trafficking
           • Trans-Golgi network vesicle budding
           • Primarily LMNA or DNM2
       • Cutis Laxa: 10 pathways
           • Phagosome
           • Collecting duct acid secretion
           • Lysosome
           • Protein digestion and absorption
           • Metabolic pathways
           • Oxidative phosphorylation
           • Arginine and proline metabolism
           • Primarily ATP6V0A2
  51. Results: Charcot-Marie-Tooth (8 genes prioritized)
       Gene       Interactions   Pathway
       LRSAM1     Multiple       Endocytosis
       DNM1       DNM2           -
       FNBP1      DNM2           -
       TOR1A      MNA            -
       STXBP1     Multiple       Five
       SH3GLB2    -              Endocytosis
       PIP5KL1    -              Endocytosis
       FAM125B    -              Endocytosis
       For more information: Guernsey et al (2010) PLoS Genetics. 6(8): e1001081
  52. Results: Cutis Laxa (10 genes prioritized)
       Gene       Interactions   Pathway
       HEXDC      Multiple       Phagosome
       HG5        -              Phagosome
       HG5        Multiple       Lysosome, Protein digestion
       SIRT7      Multiple       Metabolic Pathways
       FASN       -              Metabolic Pathways
       DCXR       -              Metabolic Pathways
       PYCR1      -              Metabolic Pathways, Arginine/Proline
       PCYT2      -              Metabolic Pathways
       ARHGDIA    -              Oxidative Phosphorylation
       For more information: Guernsey et al (2009) Am J Hum Genet. 85(1): 120-9
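
The prioritization idea behind these two results tables (rank genes in the mapped region by their links to known disease genes and their pathways) can be sketched as below. The scoring rule, one point per interacting known gene and one per shared pathway, is an assumption made for illustration and is not the IGNITE pipeline's actual method. The DNM1, FNBP1, and SH3GLB2 entries follow the Charcot-Marie-Tooth table; `UNLINKED_GENE` is a made-up placeholder.

```python
def prioritize(candidates, known_genes, known_pathways):
    """Rank candidate genes by links to known disease genes and pathways.

    Returns (gene, score, interacting known genes, shared pathways) tuples,
    highest score first; genes with no links are dropped."""
    ranked = []
    for gene, info in candidates.items():
        partners = set(info.get("interactions", ())) & known_genes
        pathways = set(info.get("pathways", ())) & known_pathways
        score = len(partners) + len(pathways)  # assumed, simplistic scoring
        if score > 0:
            ranked.append((gene, score, sorted(partners), sorted(pathways)))
    ranked.sort(key=lambda row: (-row[1], row[0]))
    return ranked

known_genes = {"DNM2", "LMNA", "MFN2"}
known_pathways = {"Endocytosis", "Membrane trafficking"}
candidates = {
    "DNM1": {"interactions": ["DNM2"]},
    "FNBP1": {"interactions": ["DNM2"]},
    "SH3GLB2": {"pathways": ["Endocytosis"]},
    "UNLINKED_GENE": {},  # hypothetical gene with no known links
}
for row in prioritize(candidates, known_genes, known_pathways):
    print(row)
```

A real ranking would likely weight evidence types differently and fold in the variant-level annotations and filters described earlier in the talk.
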
  53. Conclusions
  54. Conclusions
       • Bioinformatics is involved at every stage of genomic research, from experimental design through to final analysis
       • Standards and best practices do exist, but are rapidly evolving as new technologies and methods are developed
       • Progress is being made towards automatic generation of clinically interpretable genomics studies
       • Annotation, filtering, and prioritization of genetic variants are crucial
       • Balance between false positive calls and false negatives
  55. Where Are We Headed?
       • Integration of more data sources
           • Gene expression
           • More annotation sources
               • Controlled phenotype vocabularies
               • Gene Ontology terms
           • Predictive models
               • Recessive versus Dominant inheritance and Penetrance
       • “New” and Emerging Technologies
           • RNA-Seq (Gene Expression)
           • ChIP-Seq (Protein-DNA binding)
           • Single-Molecule Sequencing
  56. Acknowledgements
       • Dalhousie University
           • Dr. Karen Bedard
           • Dr. Chris McMaster
           • Dr. Andrew Orr
           • Dr. Conrad Fernandez
           • Dr. Marissa Leblanc
           • Dr. Sarah Dyack
           • Mat Nightingale
           • Dr. Johane Robataille
           • Bedard Lab
       • McGill/Genome Quebec
           • Dr. Jacek Majewski
           • Jeremy Schwartzentruber
       • Genome Atlantic
       • IGNITE