ASHG sequencing workshop

Transcript

  • 1. A unique targeted sequencing service providing meaningful results, not insurmountable data. Dr. Mike Evans, Chief Executive
  • 2. Outline of presentation • Delivering a unique next generation sequencing service: Dr Mike Evans, CEO • Optimised bait design for targeted sequencing: Dr Volker Brenner, Head of Computational Biology • Adding value through analysis: Dr Volker Brenner, Head of Computational Biology • Summary • Q&A
  • 3. OGT • Provides advanced clinical genetics solutions • Develops innovative molecular diagnostics • Founded by Ed Southern in 1995 • 64 people • OGT Begbroke: corporate offices and high-throughput labs • OGT Southern Centre: biomarker discovery
  • 4. OGT’s key businesses • IP Licensing: 40 licence relationships • Diagnostic Biomarkers: genomic- and protein-based diagnostics • Clinical and Genomic Solutions: cytogenetics products and genomic services • Technologies for Molecular Medicine
  • 5. Clinical and Genomic Solutions. Addressing the challenges of high-throughput, high-resolution molecular technologies: • High equipment and staff training costs • Short equipment lifespan • Complex study design and processes (e.g. platform evaluation & selection) • Vast amounts of data • Extensive computing infrastructure • Data analysis expertise and resource. The solution: Genefficiency Genomic Services
  • 6. Genefficiency™: world’s leading aCGH service. High-quality data & complete reassurance • Experimental and array design expertise • High-throughput processing (>2000 samples / week) • Applications: aCGH-CNV, methylation, miRNA, gene expression analysis • Comprehensive data analysis services • >40 QC checks on each sample to ensure high-quality data
  • 7. Independent accreditations • First Agilent High-Throughput Microarray Certified Service Provider • ISO 9001:2008 — Quality management systems FS 561156 • ISO 27001:2005 — Information security IS 561157 • ISO 17025:2005 — aCGH Laboratory services 4593
  • 8. Customer satisfaction… “In order to characterise genetic variants, reproducible performance and reliable processing of the high resolution microarrays is essential. We were pleased with OGT’s responsive approach and attention to producing high quality data to tight deadlines.” Dr Matt Hurles, Wellcome Trust Sanger Institute. 20,000 samples. 1,000 samples / week
  • 9. OGT collaborators and customers
  • 10. A world-class team. Our expert team deliver: • Excellent project management and customer service • >600 projects to date • >50,000 samples • Unparalleled expertise in study and probe design • Advanced data analysis through a dedicated team of bioinformaticians • Rapid turnaround times • A wealth of experience of clinical and translational research projects
  • 11. New Genefficiency Targeted Sequencing Services
  • 12. Delivering discovery. Genefficiency Targeted Sequencing Services, designed to be different: • Comprehensive: taking you from genomic DNA to filtered, qualified results • Rigorously designed: project and probe design expertise maximises your likelihood of discovery • Expert support: experienced team of biologists and bioinformaticians • Dedication to quality: from sample to result, delivering reliable results every time
  • 13. Delivering an integrated, comprehensive service 1. Selection of most appropriate genomic regions for enrichment 2. Capture, sample multiplexing and sequencing 3. Data analysis and advanced filtering of variants (27/10/2011)
  • 14. Delivering expert project design. Step 1: Selection of most appropriate genomic regions for your project and budget • Whole exome: pre-designed, validated whole exome capture probes; coding regions are “most likely” candidates for many disorders • Custom genomic regions: expert custom design of capture probes for your regions of interest; flexibility to focus on regions of clinical significance or GWAS regions
  • 15. Delivering class-leading technology. We have fully optimised the DNA capture and sequencing methodologies, so you don’t have to! Step 2: Performing the capture, sample multiplexing, library preparation and sequencing • Options for sample indexing and multiplexing to minimise sequencing cost • Depth of sequencing coverage to suit your samples and project • Paired-end sequencing on the industry-leading Illumina HiSeq 2000
  • 16. OGT delivers discovery, not just data. Step 3: Data analysis and advanced filtering of variants • OGT’s dedicated analysis pipeline brings you beyond data, to a filtered list of variants relevant to your study • SEQUENCE, FILTER, DISCOVER
  • 17. Genefficiency Targeted Sequencing Services. The PLATFORM • Core sequencing platform: Illumina HiSeq 2000 • Core sequence capture technology: Agilent SureSelect. The PEOPLE • Team of highly skilled molecular biologists and bioinformaticians • Core expertise in probe design • Successful development of advanced analysis solutions
  • 18. Outline of presentation • Delivering a unique next generation sequencing service: Dr Mike Evans, CEO • Optimised bait design for targeted sequencing: Dr Volker Brenner, Head of Computational Biology • Adding value through analysis: Dr Volker Brenner, Head of Computational Biology • Summary • Q&A
  • 19. Agenda • Important Definitions and Terminologies • Introduction to Targeted Enrichment • Custom Bait Design
  • 20. Definitions and terminologies • Read length: the number of bases sequenced in a fragment • Capture efficiency (figure labels: Off target, On target, Off target, Region of Interest) • Paired end sequencing (figure labels: Fragment 1, Fragment 2) • Read depth: how many times has a base been sequenced?
  • 21. Read depth required for mutation detection. Assuming no allelic bias, the theoretical read depth required to detect heterozygous variation with a given accuracy can be calculated using a binomial distribution. Calculations are based on the variation being seen in at least 2 reads • Should not be just one read, as this could be ‘noise’ • Required observations could be a percentage of reads
      Depth required   Het. call accuracy   Probability of error   Quality
      11               99%                  1:100                  Q20
      14               99.9%                1:1000                 Q30
      18               99.99%               1:10000                Q40
      25               99.999%              1:100000               Q50
    • Minimum capacity required = region of interest (ROI) x required depth • Q30 variant detection for a 15 kb ROI requires 210 kb of sequencing capacity
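The read-depth calculation on slide 21 can be sketched in a few lines of Python. This is a minimal illustration, not OGT's actual method: it assumes each read carries the variant allele independently with probability 0.5 (no allelic bias) and that a call needs at least 2 supporting reads. The function names are hypothetical.

```python
from math import comb


def miss_probability(depth: int, min_reads: int = 2, allele_fraction: float = 0.5) -> float:
    """Probability that fewer than min_reads of depth reads show the variant allele.

    Assumes a binomial model: every read independently carries the variant
    with probability allele_fraction (0.5 for an unbiased heterozygote).
    """
    return sum(
        comb(depth, k) * allele_fraction**k * (1 - allele_fraction) ** (depth - k)
        for k in range(min_reads)
    )


def required_depth(max_error: float, min_reads: int = 2, allele_fraction: float = 0.5) -> int:
    """Smallest depth whose miss probability does not exceed max_error."""
    depth = min_reads
    while miss_probability(depth, min_reads, allele_fraction) > max_error:
        depth += 1
    return depth


def required_capacity(roi_bases: int, depth: int) -> int:
    """Minimum sequencing capacity in bases: ROI size multiplied by required depth."""
    return roi_bases * depth


if __name__ == "__main__":
    for max_error in (1e-2, 1e-3, 1e-4, 1e-5):
        print(max_error, required_depth(max_error))
    print(required_capacity(15_000, 14))
```

Under these assumptions the sketch returns depths of 11, 14 and 18 for error rates of 1:100, 1:1000 and 1:10000, in line with the slide, and 210,000 bases for a 15 kb ROI at depth 14. For 1:100000 it returns 22 rather than the 25 shown, so that row presumably rests on an assumption not stated on the slide.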
  • 22. Agenda • Important Definitions and Terminologies • Introduction to Targeted Enrichment • Custom Bait Design
  • 23. Why use targeted enrichment? Flexibility in choice of genomic loci • Allows capture of specific regions of interest for SNP and Indel detection Cost Effectiveness • Ideal for clinical applications • Specific candidate genes are targeted • Fine mapping post-GWAS • Cost Benefits • Enables multiplexing to fill capacity Streamlined Data Analysis • Reduced noise due to targeted specificity
  • 24. Example of design bias: insufficient coverage. Targeted gene sequencing can lead to some targets without the required depth of coverage. Figure labels: Inadequate coverage; 14x (Q30). *Data kindly provided by C. Mattocks, National Genetics Reference Lab, Salisbury, UK
  • 25. Solution: Intelligent design to improve coverage. Option 1: • Increase coverage by increasing depth of sequencing • Coverage of all targets proportionally increased • Increased cost of sequencing overall • Some bases still missed (Q30). Option 2: • Intelligent design of capture probes increases under-represented loci • More even coverage of entire region, no loci missed (more likely to find mutations present) • No need to increase sequence depth (more cost effective)
  • 26. Agenda • Important Definitions and Terminologies • Introduction to Targeted Enrichment • Custom Bait Design
  • 27. Problems facing users • Design tools not user friendly • Design tools only good for draft design • Potential sources of bias: regions of interest too short; bait thermodynamic behaviour (GC content, melting temperature) • Risk of design errors • OGT’s extensive experience in designing probes for microarrays allows us to minimise bias and ensure evenness of coverage, giving the best chance to identify mutations
  • 28. OGT’s design pipeline: what we need from you • Regions of Interest: gene lists, chromosomal locations • Genome build version • Data file format: text, Excel, etc.; consistent, e.g. chr1: 2247628-2248537 • Pipeline steps: 1. Data 2. Draft Design 3. Singletons 4. Thermodynamics 5. Report
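A consistent region format such as the slide's "chr1: 2247628-2248537" is easy to validate automatically before a design run. The following is a minimal sketch of such a check; the `Region` type, the `parse_region` helper and the exact pattern accepted are assumptions made for illustration, not part of OGT's pipeline.

```python
import re
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int
    end: int


# Accepts "chr1: 2247628-2248537" with optional spaces and thousands separators.
_REGION_RE = re.compile(r"^\s*(chr[0-9A-Za-z_]+)\s*:\s*([\d,]+)\s*-\s*([\d,]+)\s*$")


def parse_region(text: str) -> Region:
    """Parse one 'chrN: start-end' string, raising ValueError if it is malformed."""
    match = _REGION_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognised region: {text!r}")
    chrom, start, end = match.groups()
    start_pos = int(start.replace(",", ""))
    end_pos = int(end.replace(",", ""))
    if start_pos > end_pos:
        raise ValueError(f"start is after end in region: {text!r}")
    return Region(chrom, start_pos, end_pos)


if __name__ == "__main__":
    print(parse_region("chr1: 2247628-2248537"))
```

Rejecting malformed lines up front is one way to reduce the design errors mentioned earlier in the deck; the genome build would still need to be recorded alongside the coordinates.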
  • 29. Run draft design • Assess the output: coverage, bait distribution, repeat masking • Figure labels: Region of Interest, Repeat masking • Pipeline steps: 1. Data 2. Draft Design 3. Singleton Baits 4. Bait Thermodynamics 5. Report
  • 30. Custom baits improve coverage at region boundaries. Figure panels: OGT, 1KG. OGT custom bait design gives increased read depth around edges of target regions.
  • 31. Correction for singleton baits • Review the draft design and identify any regions covered by a single bait • These regions span less than 120 bases • Add additional singleton baits to the design (figure panels: Before, After) • This ensures that small regions are captured as well as large regions • Advantage: improves evenness of capture across the design • Pipeline steps: 1. Data 2. Draft Design 3. Singleton Baits 4. Bait Thermodynamics 5. Report
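The singleton correction on slide 31 can be illustrated with a short sketch. It assumes regions are (chromosome, start, end) tuples with 1-based inclusive coordinates and that a region shorter than 120 bases is covered by a single bait; the function name and the `extra_copies` parameter are hypothetical, since the slide does not say how many additional baits are added.

```python
BAIT_LENGTH = 120  # bases; threshold quoted on the slide


def singleton_copy_multipliers(regions, extra_copies=1, bait_length=BAIT_LENGTH):
    """Map each region to a bait copy multiplier.

    Regions shorter than one bait length are treated as singleton regions and
    receive extra_copies additional baits; all other regions keep multiplier 1.
    """
    multipliers = {}
    for region in regions:
        _chrom, start, end = region
        span = end - start + 1  # assumes 1-based inclusive coordinates
        multipliers[region] = 1 + extra_copies if span < bait_length else 1
    return multipliers


if __name__ == "__main__":
    design = [("chr1", 100, 180), ("chr1", 1000, 1500)]
    print(singleton_copy_multipliers(design))
```

Here the 81-base region is flagged and boosted while the 501-base region is left alone, which mirrors the slide's aim of capturing small regions as well as large ones.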
  • 32. Custom approach ensures variant detection. Figure panels: OGT, 1KG. Even at more than 50x coverage, whole exome sequencing does not accurately identify all SNPs. OGT custom bait design compared with 1000 Genomes whole exome capture data.
  • 33. Correction for bait thermodynamics. GC content: • Calculate GC content for all baits • Identify those baits where GC content is extreme (for instance >65% or <40%) • Add additional copies of these baits. Tm: • Calculate the Tm for all baits • Identify those baits where Tm is extreme (e.g. > 75 °C) • Add additional copies of these baits • Figure labels: Region of Interest, GC extreme, Tm extreme • Pipeline steps: 1. Data 2. Draft Design 3. Singleton Baits 4. Bait Thermodynamics 5. Report
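The GC half of the thermodynamic correction on slide 33 can be sketched as follows. The 40% and 65% thresholds come from the slide; the function names and the `extra_copies` parameter are assumptions for illustration. The Tm check is left out because the slide does not state which melting-temperature formula is used.

```python
def gc_fraction(sequence: str) -> float:
    """Fraction of G and C bases in a bait sequence."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("empty bait sequence")
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


def is_gc_extreme(sequence: str, low: float = 0.40, high: float = 0.65) -> bool:
    """True when GC content falls below low or above high (slide thresholds)."""
    gc = gc_fraction(sequence)
    return gc < low or gc > high


def bait_copy_counts(baits: dict, extra_copies: int = 1) -> dict:
    """Map bait name to copy count, adding extra_copies for GC-extreme baits.

    baits maps a bait name to its sequence; extra_copies is a hypothetical
    setting, since the slide does not say how many copies are added.
    """
    return {
        name: 1 + extra_copies if is_gc_extreme(seq) else 1
        for name, seq in baits.items()
    }


if __name__ == "__main__":
    example = {"balanced": "ATGCATGC", "gc_rich": "GGGCGCAT", "at_rich": "ATATATGC"}
    print(bait_copy_counts(example))
```

Real baits would be around 120 bases long; the 8-base strings above are only there to keep the example readable.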
  • 34. OGT custom bait designs help overcome GC issues. Figure panels: OGT, SureSelect. In a region with 70% GC content, OGT custom bait design achieved a maximum read depth of 50x. The Agilent SureSelect 50Mb capture kit does not capture any reads in this region.
  • 35. OGT custom bait designs help overcome GC issues. Figure panels: OGT, SureSelect. Relative capture of targets within a single gene. Agilent coverage is 20x for the target with no GC content bias, and minimal for targets with a GC content of 65%. In contrast, OGT custom baits perform excellently in this region.
  • 36. Customer report • Design Parameters • Depth of Coverage • On target / Off target • Regions not covered, and why not • Bait Details • Singletons • GC distribution • Tm distribution • Library Design • Baits generated • Pipeline steps: 1. Data 2. Draft Design 3. Singleton Baits 4. Bait Thermodynamics 5. Report
  • 37. Summary • Custom design of regions for targeted sequencing offers significant flexibility for many applications • Expert probe design will ensure: • Better ‘evenness’ of coverage, which helps ensure no regions are missed and maximises the likelihood of variant detection • Improved overall capture efficiency and on-target performance, giving cost-effective sequencing downstream • Increased capture efficiency of SNPs and Indels, increasing the likelihood of detection • Reduction of risk and better performance
  • 38. Adding value through analysis • Introduction • NGS data analysis • Primary analysis • Mapping and assembly • Q score re-calibration • NGS sequencing QC • NGS alignment QC • Secondary analysis • SNP and Indel calling • Annotation and evaluation pipeline • SIFT and PolyPhen • Deliverables • Case study • Summary
  • 39. The analysis challenge. Diagram labels: NGS sequencer; hard drive with ~4 Gb per exome; raw data; mapping; annotation; filtering; reporting; publication
  • 40. Raw data: FASTQ (standard text representation of short reads). FASTQ uses four lines per sequence. • Line 1: @ followed by a sequence identifier • Line 2: raw sequence letters • Line 3: + (and optional sequence identifier) • Line 4: quality values for the sequence in Line 2; must contain the same number of symbols as letters in the sequence (the symbols encode Phred quality scores from 0 to 93 using ASCII 33 to 126). Example:
      @SEQ_ID
      GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
      +
      !''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65
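The four-line FASTQ layout described on slide 40 can be read with a few lines of Python. This is a minimal sketch that assumes unwrapped records (exactly four lines each) and the ASCII-33 offset quoted on the slide; `FastqRecord` and `read_fastq` are hypothetical names, and the record in the usage section is made up for illustration.

```python
import io
from typing import Iterator, List, NamedTuple


class FastqRecord(NamedTuple):
    identifier: str
    sequence: str
    qualities: List[int]


def read_fastq(handle) -> Iterator[FastqRecord]:
    """Yield records from a text handle containing four-line FASTQ entries."""
    while True:
        header = handle.readline()
        if not header:  # end of file
            return
        header = header.rstrip("\r\n")
        if not header:  # tolerate blank lines between records
            continue
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(quality) != len(sequence):
            raise ValueError("quality and sequence lengths differ")
        # Phred score = ASCII code minus 33
        scores = [ord(symbol) - 33 for symbol in quality]
        yield FastqRecord(header[1:], sequence, scores)


if __name__ == "__main__":
    demo = io.StringIO("@read1\nGATTACA\n+\nIIII5+!\n")
    for record in read_fastq(demo):
        print(record.identifier, record.sequence, record.qualities)
```

The length check enforces the rule from the slide that Line 4 must contain as many symbols as Line 2 has letters.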
  • 41. Phred quality scores • Phred is an accurate base-caller used for capillary traces (Ewing et al., Genome Research 1998) • Each called base is given a quality score Q • Quality based on simple metrics (such as peak spacing) calibrated against a database of hand-edited data • QPhred = -10 * log10(estimated probability call is wrong)
      Phred quality score   Probability of incorrect base call   Base call accuracy
      10                    1 in 10                              90 %
      20                    1 in 100                             99 %
      30                    1 in 1000                            99.9 %
      40                    1 in 10000                           99.99 %
    • Q30 often used as a threshold for useful sequence data
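The Phred formula on slide 41, Q = -10 * log10(p), and its inverse can be written directly in Python. The function names below are chosen for illustration only.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Invert Q = -10 * log10(p): the estimated probability that a call is wrong."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Phred quality score for an estimated error probability p (0 < p <= 1)."""
    if not 0 < p <= 1:
        raise ValueError("probability must be in (0, 1]")
    return -10 * math.log10(p)


def base_call_accuracy(q: float) -> float:
    """Base call accuracy implied by a Phred score."""
    return 1 - phred_to_error_probability(q)


if __name__ == "__main__":
    for q in (10, 20, 30, 40):
        print(q, phred_to_error_probability(q), base_call_accuracy(q))
```

A score of 30 corresponds to an error probability of 1 in 1000, which is the threshold the slide mentions for useful sequence data.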
  • 42. Adding value through analysis • Introduction • NGS data analysis • Primary analysis • Mapping and assembly • Q score re-calibration • NGS sequencing QC • NGS alignment QC • Secondary analysis • SNP and Indel calling • Annotation and evaluation pipeline • SIFT and PolyPhen • Deliverables • Case study • Summary
  • 43. Primary analysis: mapping and alignment. Diagram stages: Raw Sequence Files (FASTQ format); Mapping; Raw Alignment Files; Local Realignment (around InDels); Quality score re-calibration; Duplicate marking; Analysis-ready Alignment. Tool and format labels, in diagram order: BWA/Bowtie; SAM/BAM format; GATK; Picard; Picard; SAM/BAM format
  • 44. Why mark duplicates and realign around indels? 3 incorrect calls within 40 bp!
  • 45. Primary analysis: mapping and alignment. Diagram stages: Raw Sequence Files (FASTQ format); Mapping; Raw Alignment Files; Local Realignment (around InDels); Quality score re-calibration; Duplicate marking; Analysis-ready Alignment. Tool and format labels, in diagram order: BWA/Bowtie; SAM/BAM format; GATK; Picard; Picard; SAM/BAM format
  • 46. NGS variant calling methods. Option 1, hard filtering. Example: a SNP can only be called if • read depth >10 • >35% of reads carry SNP. Pros: effective filtering; transparent to user. Cons: simplistic approach; will miss high quality calls that don’t pass threshold. Option 2, statistical analysis: based on quality scores of individual base pairs, the alignment and statistical probability models. Pros: robust; optimum balance of sensitivity and specificity due to the use of statistical models; fewer false positive and false negative SNP calls. Cons: requires correctly pre-processed data with reliable quality scores
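The hard-filtering example on slide 46 reduces to two threshold checks. The sketch below uses the slide's example thresholds (read depth >10, >35% of reads carrying the SNP); the function name and argument names are hypothetical.

```python
def passes_hard_filter(
    read_depth: int,
    variant_reads: int,
    min_depth: int = 10,
    min_fraction: float = 0.35,
) -> bool:
    """Apply the slide's example rule: depth > 10 and > 35% of reads carry the SNP."""
    if read_depth <= 0 or not 0 <= variant_reads <= read_depth:
        raise ValueError("inconsistent read counts")
    return read_depth > min_depth and variant_reads / read_depth > min_fraction


if __name__ == "__main__":
    print(passes_hard_filter(20, 8))   # 40% of 20 reads
    print(passes_hard_filter(40, 12))  # 30% of 40 reads
    print(passes_hard_filter(10, 10))  # every read carries the SNP, but depth is only 10
```

The last case shows the weakness named on the slide: a site where every read supports the variant is still rejected because it sits on the depth threshold.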
  • 47. Base quality score re-calibration. Figure panels: Before Recalibration, After Recalibration. Source: The Broad Institute, http://www.broadinstitute.org/files/shared/mpg/nextgen2010/nextgen_poplin.pdf
  • 48. Primary analysis: raw data and assembly QC. Diagram stages: Raw Sequence Files (FASTQ format); Mapping; Raw Alignment Files; Local Realignment (around InDels); Quality score re-calibration; Duplicate marking; Analysis-ready Alignment. Tool and format labels, in diagram order: BWA/Bowtie; SAM/BAM format; GATK; Picard; Picard; SAM/BAM format. QC labels: Sequence QC check (FastQC), Raw data QC Report; Alignment QC check (Picard), Alignment QC Report
  • 49. Secondary analysis: SNP and Indel calling, annotation and filtering. Diagram labels: Analysis-ready alignment (SAM/BAM format); Unified Genotyper (GATK); SNPs and InDels (VCF format); Variant Evaluation (OGT); Sequence QC Report; Alignment QC Report; Comprehensive interactive OGT Report. Questions addressed: • Known variant? • Impact on gene expression? • Splicing affected? • Non-synonymous or frameshift mutation? • Impact on protein function? • How confident are we in the call? • Zygosity?
  • 50. SNP/Indel classification (standard analysis). We check and annotate every single detected SNP and Indel against all human Ensembl genes and transcripts and dbSNP. dbSNP annotation: • Is the variant known? • Obtain allele frequency. Does it affect any of the following? • Promoter region • UTR • Splice sites or intronic region • CDS: synonymous mutation; non-synonymous mutation; frameshift mutation; stop codon (truncated/elongated protein sequence); overlap with protein domain; consequence on protein function predicted (SIFT & PolyPhen)
  • 51. OGT Processing Overview. Diagram labels: Gather all detected SNP/Indels • Mapped to promoter regions • Mapped to exons, splice sites or UTRs • Not described in dbSNP • Described in dbSNP (rare RS ID variations) • Non-synonymous coding variations and protein domains • Variations with serious consequences to the protein sequence (SIFT) • Perform pairwise genome analysis • Filter out variants present in “baseline” genome (e.g. somatic tissue, healthy sibling) • Filter out variants present in any “baseline” exome AND not all “case” exomes • Additional filtering and analysis • Study-specific additional in-depth filtering. Levels: Individual Genome Analysis (Standard Level); Multi Genome Analysis, Data Gathering and Comparison (Advanced Level); Tailored analysis based on client’s individual requirements (Expert Level). Axis: Data to Information
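One way to picture the filtering cascade on slide 51 is as a chain of predicates applied to annotated variants. The sketch below is only an illustration of that idea, not OGT's pipeline: the dictionary keys (`key`, `in_dbsnp`, `consequence`), the consequence labels and the choice of which filters to combine are all assumptions, and the variants in the usage section are invented.

```python
# Hypothetical consequence labels treated as protein-altering in this sketch.
PROTEIN_ALTERING = {"non_synonymous", "frameshift", "stop_gained"}


def filter_candidates(case_variants, baseline_keys):
    """Keep case variants that pass three illustrative filters.

    1. Not present in the "baseline" genome (e.g. a healthy sibling).
    2. Not described in dbSNP.
    3. Annotated with a protein-altering consequence.
    """
    baseline = set(baseline_keys)
    kept = []
    for variant in case_variants:
        if variant["key"] in baseline:
            continue  # shared with the baseline genome
        if variant["in_dbsnp"]:
            continue  # already described in dbSNP
        if variant["consequence"] not in PROTEIN_ALTERING:
            continue  # e.g. synonymous or intronic
        kept.append(variant)
    return kept


if __name__ == "__main__":
    case = [
        {"key": ("chrA", 100, "C", "T"), "in_dbsnp": False, "consequence": "stop_gained"},
        {"key": ("chrA", 200, "G", "A"), "in_dbsnp": True, "consequence": "non_synonymous"},
        {"key": ("chrB", 300, "A", "G"), "in_dbsnp": False, "consequence": "synonymous"},
        {"key": ("chrB", 400, "T", "C"), "in_dbsnp": False, "consequence": "frameshift"},
    ]
    baseline = [("chrB", 400, "T", "C")]
    for variant in filter_candidates(case, baseline):
        print(variant["key"], variant["consequence"])
```

Each filter narrows the list, which is the movement from data to information that the slide describes; a real analysis would choose and order the filters per study.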
  • 52. NGS data delivery. Diagram labels: ship data on hard drive (or FTP); double click!; file location & share results; comprehensive HTML analysis report
  • 53. Analysis report: Summary section
  • 54. Analysis report: QC section — Read QC
  • 55. Analysis report: QC section — Read QC
  • 56. Analysis report: QC section — Alignment QC
  • 57. Analysis report: QC section — Alignment QC
  • 58. Analysis section — Overview
  • 59. The Variant Table View Data display Data export
  • 60. The Variant Table View — External links
  • 61. The Detailed Variant View
  • 62. Predicted consequences on protein function
  • 63. Alignment View of selected variant in IGV
  • 64. OGT data processing ensures detection of insertions. Detection of a 31 bp insertion
  • 65. OGT data processing ensures detection of deletions: Example 1. Detection of an 84 bp deletion
  • 66. Detection of homozygous and heterozygous deletions Homozygous deletion Heterozygous deletion No deletion (reference sequence)
  • 67. Interactive data filtering
  • 68. Customer data: Analysis of consanguineous samples. Pedigree labels: generations I and II, individuals 1 and 2. Variant: HACE1 Exon11 c.994C>T, R332X (CGA -> TGA). Data courtesy of Dr. Bernd Wollnik, Institute of Human Genetics, University Hospital of Cologne
  • 69. Confirmation by Sanger sequencing. Trace panels: Control, Mother, Father, Patient1, Patient2. Amino acid labels: X H V F R I G P. Protein domain diagram: ANK1 (69-161), ANK1 (168-258), HECT (602-909), with R332X marked. Data courtesy of Dr. Bernd Wollnik, Institute of Human Genetics, University Hospital of Cologne
  • 70. Customer feedback... Analysis of Consanguineous Samples “Just wanted to let you know that we have probably identified the causative gene and mutation in the patient sample. The mutation is located in the middle of an 18 Mb homozygous stretch and is a homozygous nonsense mutation!!! Wow, its going so nicely with your data!!!” Dr. Bernd Wollnik, Institute of Human Genetics, University Hospital of Cologne
  • 71. Summary. OGT offers fast, accurate & powerful NGS analysis. Standard Analysis • Robust statistical data analysis • Comprehensive variant annotation • Interactive filtering and prioritisation of data based on chromosomal region, allele frequency / novelty, zygosity, confidence score and read depth, and severity of mutation. Advanced Analysis • Multi-genome comparison. Bespoke analysis • Tailored to your specific requirements
  • 72. Outline of presentation • Delivering a unique next generation sequencing service: Dr Mike Evans, CEO • Optimised bait design for targeted sequencing: Dr Volker Brenner, Head of Computational Biology • Adding value through analysis: Dr Volker Brenner, Head of Computational Biology • Summary • Q&A
  • 73. Speak to one of our team or visit booth 713 to: • Book a demonstration of our interactive analysis report (hurry, limited availability) • Discuss your specific project requirements • Take part in our short survey and have your chance to win an Amazon Kindle
  • 74. Thank you. www.ogt.co.uk