ASHG sequencing workshop

Published in: Education, Technology, Business
Transcript

  1. A unique targeted sequencing service providing meaningful results, not insurmountable data. Dr. Mike Evans, Chief Executive
  2. Outline of presentation
     • Delivering a unique next generation sequencing service: Dr Mike Evans, CEO
     • Optimised bait design for targeted sequencing: Dr Volker Brenner, Head of Computational Biology
     • Adding value through analysis: Dr Volker Brenner, Head of Computational Biology
     • Summary
     • Q&A
  3. OGT provides advanced clinical genetics solutions and develops innovative molecular diagnostics
     • Founded by Ed Southern in 1995
     • 64 people
     • OGT Begbroke: corporate offices and high-throughput labs
     • OGT Southern Centre: biomarker discovery
  4. OGT’s key businesses (Technologies for Molecular Medicine)
     • IP Licensing: 40 licence relationships
     • Diagnostic Biomarkers: genomic- and protein-based diagnostics
     • Clinical and Genomic Solutions: cytogenetics products and genomic services
  5. Clinical and Genomic Solutions. Addressing the challenges of high-throughput, high-resolution molecular technologies:
     • High equipment and staff training costs
     • Short equipment lifespan
     • Complex study design and processes (e.g. platform evaluation & selection)
     • Vast amounts of data
       • Extensive computing infrastructure
       • Data analysis expertise and resource
     The solution: Genefficiency Genomic Services
  6. Genefficiency™: World’s leading aCGH service. High-quality data & complete reassurance
     • Experimental and array design expertise
     • High-throughput processing (>2000 samples / week)
     • Applications: aCGH-CNV, methylation, miRNA, gene expression analysis
     • Comprehensive data analysis services
     • >40 QC checks on each sample to ensure high-quality data
  7. Independent accreditations
     • First Agilent High-Throughput Microarray Certified Service Provider
     • ISO 9001:2008: Quality management systems (FS 561156)
     • ISO 27001:2005: Information security (IS 561157)
     • ISO 17025:2005: aCGH Laboratory services (4593)
  8. Customer satisfaction… “In order to characterise genetic variants, reproducible performance and reliable processing of the high resolution microarrays is essential. We were pleased with OGT’s responsive approach and attention to producing high quality data to tight deadlines.” Dr Matt Hurles, Wellcome Trust Sanger Institute. 20,000 samples. 1,000 samples / week
  9. OGT collaborators and customers
  10. A world-class team. Our expert team deliver:
     • Excellent project management and customer service
       • >600 projects to date
       • >50,000 samples
     • Unparalleled expertise in study and probe design
     • Advanced data analysis through a dedicated team of bioinformaticians
     • Rapid turnaround times
     • A wealth of experience of clinical and translational research projects
  11. New Genefficiency Targeted Sequencing Services
  12. Delivering discovery. Genefficiency Targeted Sequencing Services, designed to be different:
     • Comprehensive: taking you from genomic DNA to filtered, qualified results
     • Rigorously designed: project and probe design expertise maximises your likelihood of discovery
     • Expert support: experienced team of biologists and bioinformaticians
     • Dedication to quality: from sample to result, delivering reliable results every time
  13. Delivering an integrated, comprehensive service (slide dated 27/10/2011)
     1. Selection of most appropriate genomic regions for enrichment
     2. Capture, sample multiplexing and sequencing
     3. Data analysis and advanced filtering of variants
  14. Delivering expert project design. Step 1: Selection of most appropriate genomic regions for your project and budget
     • Whole exome
       • Pre-designed, validated whole exome capture probes
       • Coding regions are “most likely” candidates for many disorders
     • Custom genomic regions
       • Expert custom design of capture probes for your regions of interest
       • Flexibility to focus on regions of clinical significance or GWAS regions
  15. Delivering class-leading technology. We have fully optimised the DNA capture and sequencing methodologies, so you don’t have to! Step 2: Performing the capture, sample multiplexing, library preparation and sequencing
     • Options for sample indexing and multiplexing to minimise sequencing cost
     • Depth of sequencing coverage to suit your samples and project
     • Paired-end sequencing on the industry-leading Illumina HiSeq 2000
  16. OGT delivers discovery, not just data. Step 3: Data analysis and advanced filtering of variants
     • OGT’s dedicated analysis pipeline brings you beyond data, to a filtered list of variants relevant to your study
     • SEQUENCE, FILTER, DISCOVER
  17. Genefficiency Targeted Sequencing Services
     • The PLATFORM
       • Core sequencing platform: Illumina HiSeq 2000
       • Core sequence capture technology: Agilent SureSelect
     • The PEOPLE
       • Team of highly skilled molecular biologists and bioinformaticians
       • Core expertise in probe design
       • Successful development of advanced analysis solutions
  18. Outline of presentation
     • Delivering a unique next generation sequencing service: Dr Mike Evans, CEO
     • Optimised bait design for targeted sequencing: Dr Volker Brenner, Head of Computational Biology
     • Adding value through analysis: Dr Volker Brenner, Head of Computational Biology
     • Summary
     • Q&A
  19. Agenda
     • Important Definitions and Terminologies
     • Introduction to Targeted Enrichment
     • Custom Bait Design
  20. Definitions and terminologies
     • Read length: the number of bases sequenced in a fragment
     • Capture efficiency (diagram: on-target and off-target reads relative to the region of interest)
     • Paired end sequencing (diagram: Fragment 1, Fragment 2)
     • Read depth: how many times has a base been sequenced?
  21. Read depth required for mutation detection. Assuming no allelic bias, the theoretical read depth required to detect heterozygous variation with given accuracy can be calculated using a binomial distribution. Calculations based on variation being seen in at least 2 reads
     • Should not be just one read as this could be ‘noise’
     • Required observations could be a percentage of reads

     Depth Required   Het. Call Accuracy   Probability of Error   Quality
     11               99%                  1:100                  Q20
     14               99.9%                1:1000                 Q30
     18               99.99%               1:10000                Q40
     25               99.999%              1:100000               Q50

     • Minimum capacity required = Region of interest (ROI) x required depth
     • Q30 variant detection for 15Kb ROI requires 210Kb sequencing capacity
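The binomial reasoning on this slide can be sketched in a few lines of Python. This is an illustrative reading, not OGT's actual calculation: it assumes each read carries the variant allele with probability 0.5 (no allelic bias) and treats "error" as the chance of seeing the variant in fewer than 2 reads. The function names are hypothetical. Under these assumptions the sketch reproduces the 11, 14 and 18 rows of the table; the slide's depth of 25 for Q50 also satisfies the 1:100000 threshold, but is more conservative than the minimum this simple model yields.

```python
from math import comb


def miss_probability(depth: int, min_reads: int = 2, allele_fraction: float = 0.5) -> float:
    """Probability that a heterozygous variant appears in fewer than `min_reads` reads.

    Assumes reads sample the two alleles independently (binomial model).
    """
    return sum(
        comb(depth, k) * allele_fraction**k * (1 - allele_fraction) ** (depth - k)
        for k in range(min_reads)
    )


def min_depth(max_error: float, min_reads: int = 2) -> int:
    """Smallest read depth whose miss probability is below `max_error`."""
    depth = min_reads
    while miss_probability(depth, min_reads) >= max_error:
        depth += 1
    return depth


def required_capacity(roi_bases: int, depth: int) -> int:
    """Minimum sequencing capacity in bases: region of interest x required depth."""
    return roi_bases * depth


if __name__ == "__main__":
    for label, error in [("Q20", 1e-2), ("Q30", 1e-3), ("Q40", 1e-4)]:
        print(label, min_depth(error))
    # 15 Kb region at the Q30 depth of 14 reads
    print(required_capacity(15_000, 14))
```

The capacity figure is a lower bound, since it assumes every sequenced base lands on target and coverage is perfectly even.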
  22. Agenda
     • Important Definitions and Terminologies
     • Introduction to Targeted Enrichment
     • Custom Bait Design
  23. Why use targeted enrichment?
     • Flexibility in choice of genomic loci
       • Allows capture of specific regions of interest for SNP and Indel detection
     • Cost Effectiveness
       • Ideal for clinical applications
       • Specific candidate genes are targeted
       • Fine mapping post-GWAS
       • Cost Benefits
       • Enables multiplexing to fill capacity
     • Streamlined Data Analysis
       • Reduced noise due to targeted specificity
  24. Example of design bias: Insufficient coverage. Targeted gene sequencing can lead to some targets without the required depth of coverage. Figure labels: Inadequate Coverage, 14x (Q30). *data kindly provided by C. Mattocks, National Genetics Reference Lab, Salisbury, UK
  25. Solution: Intelligent design to improve coverage
     • Option 1: Increase coverage by increasing depth of sequencing
       • Coverage of all targets proportionally increased
       • Increased cost of sequencing overall
       • Some bases still missed (Q30)
     • Option 2: Intelligent design of capture probes increases under-represented loci
       • More even coverage of entire region, no loci missed (more likely to find mutations present)
       • No need to increase sequence depth (more cost effective)
  26. Agenda
     • Important Definitions and Terminologies
     • Introduction to Targeted Enrichment
     • Custom Bait Design
  27. Problems facing users
     • Design tools not user friendly
     • Design tools only good for draft design
     • Potential sources of bias
       • Regions of interest too short
       • Bait thermodynamic behaviour
       • GC content
       • Melting Temperature
     • Risk of Design Errors
     • OGT’s extensive experience in designing probes for microarrays allows us to minimise bias and ensure evenness of coverage, giving the best chance to identify mutations
  28. OGT’s design pipeline: what we need from you
     • Regions of Interest
       • Gene lists
       • Chromosomal locations
     • Genome build version
     • Data file format
       • Text, Excel, etc.
       • Consistent, e.g. chr1: 2247628-2248537
     Pipeline: 1. Data, 2. Draft Design, 3. Singletons, 4. Thermodynamics, 5. Report
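A consistent region format such as `chr1: 2247628-2248537` is easy to validate before a design is run. The sketch below is only an illustration of that idea: the `Region` type, the `parse_region` helper and the accepted spelling variants are assumptions made here, not part of OGT's pipeline, and the length calculation assumes 1-based inclusive coordinates.

```python
import re
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int
    end: int

    @property
    def length(self) -> int:
        # Assumes 1-based, inclusive coordinates.
        return self.end - self.start + 1


# Accepts "chr1: 2247628-2248537", with optional spaces and thousands separators.
_REGION_RE = re.compile(r"^\s*(chr[0-9A-Za-z]+)\s*:\s*([\d,]+)\s*-\s*([\d,]+)\s*$")


def parse_region(text: str) -> Region:
    """Parse one region string, raising ValueError on malformed input."""
    match = _REGION_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognised region: {text!r}")
    start = int(match.group(2).replace(",", ""))
    end = int(match.group(3).replace(",", ""))
    if start > end:
        raise ValueError(f"start is after end in region: {text!r}")
    return Region(match.group(1), start, end)


if __name__ == "__main__":
    region = parse_region("chr1: 2247628-2248537")
    print(region, region.length)
```

Rejecting malformed lines up front addresses the "risk of design errors" point: a region that fails to parse is reported rather than silently dropped from the design.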
  29. Run draft design
     • Assess the output:
       • Coverage
       • Bait distribution
       • Repeat masking
     Figure labels: Region of Interest, Repeat masking
     Pipeline: 1. Data, 2. Draft Design, 3. Singleton Baits, 4. Bait Thermodynamics, 5. Report
  30. Custom baits improve coverage at region boundaries. Figure panels: OGT, 1KG. OGT custom bait design gives increased read depth around edges of target regions.
  31. Correction for singleton baits
     • Review the draft design and identify any regions covered by a single bait
       • These regions span less than 120 bases
     • Add additional singleton baits to the design (figure panels: Before, After)
     • This ensures that small regions are captured as well as large regions
     • Advantage: improves evenness of capture across the design
     Pipeline: 1. Data, 2. Draft Design, 3. Singleton Baits, 4. Bait Thermodynamics, 5. Report
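The singleton check described on this slide amounts to a length filter over the target regions. The following is a minimal sketch under stated assumptions: regions are given as `(name, start, end)` tuples with 1-based inclusive coordinates, and the number of extra baits per singleton (`extra_copies`) is a hypothetical parameter, since the slide does not say how many are added.

```python
from typing import Dict, Iterable, List, Tuple

RegionTuple = Tuple[str, int, int]  # (name, start, end), 1-based inclusive (assumption)

SINGLE_BAIT_SPAN = 120  # bases; regions shorter than this are covered by a single bait


def find_singleton_regions(
    regions: Iterable[RegionTuple], span: int = SINGLE_BAIT_SPAN
) -> List[RegionTuple]:
    """Return the regions that span fewer than `span` bases."""
    return [r for r in regions if (r[2] - r[1] + 1) < span]


def plan_extra_baits(
    regions: Iterable[RegionTuple], extra_copies: int = 1
) -> Dict[str, int]:
    """Map each singleton region name to the number of additional baits to add."""
    return {name: extra_copies for name, _, _ in find_singleton_regions(regions)}


if __name__ == "__main__":
    targets = [("exon1", 1000, 1079), ("exon2", 5000, 5399), ("exon3", 9000, 9118)]
    print(plan_extra_baits(targets, extra_copies=2))
```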
  32. Custom approach ensures variant detection. Figure panels: OGT, 1KG. Even at more than 50x coverage, whole exome sequencing does not accurately identify all SNPs. OGT custom baits design compared with 1000 Genomes whole exome capture data.
  33. Correction for bait thermodynamics
     • GC content
       • Calculate GC content for all baits
       • Identify those baits where GC content is extreme (for instance >65% or <40%)
       • Add additional copies of these baits
     • Tm
       • Calculate the Tm for all baits
       • Identify those baits where Tm is extreme (e.g. > 75 °C)
       • Add additional copies of these baits
     Figure labels: Region of Interest, GC extreme, Tm extreme
     Pipeline: 1. Data, 2. Draft Design, 3. Singleton Baits, 4. Bait Thermodynamics, 5. Report
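The GC-content half of this correction can be sketched as follows. The 40% and 65% thresholds come from the slide; the function names and the `extra_copies` value are hypothetical. The Tm check is left out because the slide does not state which melting-temperature model is used, but it would follow the same pattern with a Tm function in place of `gc_percent`.

```python
from typing import List


def gc_percent(sequence: str) -> float:
    """Percentage of G and C bases in a bait sequence."""
    if not sequence:
        raise ValueError("empty sequence")
    seq = sequence.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def is_gc_extreme(sequence: str, low: float = 40.0, high: float = 65.0) -> bool:
    """True when GC content falls below `low` or above `high` (slide thresholds)."""
    gc = gc_percent(sequence)
    return gc < low or gc > high


def boost_extreme_baits(baits: List[str], extra_copies: int = 1) -> List[str]:
    """Return the bait list with `extra_copies` additional copies of each GC-extreme bait."""
    boosted: List[str] = []
    for bait in baits:
        copies = 1 + extra_copies if is_gc_extreme(bait) else 1
        boosted.extend([bait] * copies)
    return boosted


if __name__ == "__main__":
    print(boost_extreme_baits(["ATGCATGC", "GGGCCCGGCC", "ATATATATTA"]))
```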
  34. OGT custom bait designs help overcome GC issues. Figure panels: OGT, SureSelect. In a region with 70% GC content OGT custom bait design achieved a maximum read depth of 50x. The Agilent SureSelect 50Mb capture kit does not capture any reads in this region.
  35. OGT custom bait designs help overcome GC issues. Figure panels: OGT, SureSelect. Relative capture of targets within a single gene. Agilent coverage is 20x for the target with no GC content bias, and minimal for targets with a GC content of 65%. In contrast OGT custom baits perform excellently in this region.
  36. Customer report
     • Design Parameters
       • Depth of Coverage
       • On target / Off target
       • Regions not covered, and why not
     • Bait Details
       • Singletons
       • GC distribution
       • Tm distribution
     • Library Design
       • Baits generated
     Pipeline: 1. Data, 2. Draft Design, 3. Singleton Baits, 4. Bait Thermodynamics, 5. Report
  37. Summary
     • Custom design of regions for targeted sequencing offers significant flexibility for many applications
     • Expert probe design will ensure:
       • Better ‘evenness’ of coverage helps ensure no regions are missed and maximises the likelihood of variant detection
       • Improvement of overall capture efficiency and on-target performance equals cost effective sequencing downstream
       • Increase capture efficiency of SNPs and Indels equals an increase in the likelihood of detection
     • Reduction of risk and better performance
  38. Adding value through analysis
     • Introduction
     • NGS data analysis
       • Primary analysis
         • Mapping and assembly
         • Q score re-calibration
         • NGS sequencing QC
         • NGS alignment QC
       • Secondary analysis
         • SNP and Indel calling
         • Annotation and evaluation pipeline
         • SIFT and PolyPhen
     • Deliverables
     • Case study
     • Summary
  39. The analysis challenge. Diagram: from NGS sequencer to publication. A hard drive with ~4Gb per exome of raw data passes through mapping, annotation, filtering and reporting.
  40. Raw data: FASTQ (standard text representation of short reads). FASTQ uses four lines per sequence.
     • Line 1: @ followed by a sequence identifier
     • Line 2: raw sequence letters
     • Line 3: + (and optional sequence identifier)
     • Line 4: quality values for the sequence in Line 2. Must contain the same number of symbols as letters in the sequence. (The letters encode Phred Quality Scores from 0 to 93 using ASCII 33 to 126)
     Example:
       @SEQ_ID
       GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
       +
       !''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65
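The four-line layout described here can be read with a very small parser. This is a minimal sketch rather than a production reader: it assumes each record occupies exactly four lines (no line-wrapped sequences) and the ASCII offset of 33 stated on the slide. The record in the usage section is made up for illustration, and the names `FastqRecord`, `parse_fastq` and `phred_scores` are hypothetical.

```python
from typing import Iterable, Iterator, List, NamedTuple


class FastqRecord(NamedTuple):
    identifier: str
    sequence: str
    quality: str


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from unwrapped FASTQ text, validating the four-line layout."""
    rows = [line.rstrip("\r\n") for line in lines if line.strip()]
    if len(rows) % 4 != 0:
        raise ValueError("FASTQ input must contain four lines per record")
    for i in range(0, len(rows), 4):
        header, sequence, plus, quality = rows[i : i + 4]
        if not header.startswith("@"):
            raise ValueError(f"line 1 must start with '@': {header!r}")
        if not plus.startswith("+"):
            raise ValueError(f"line 3 must start with '+': {plus!r}")
        if len(sequence) != len(quality):
            raise ValueError("quality string and sequence differ in length")
        yield FastqRecord(header[1:], sequence, quality)


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a quality string into Phred scores (ASCII code minus the offset)."""
    return [ord(symbol) - offset for symbol in quality]


if __name__ == "__main__":
    text = "@read1\nGATT\n+\n!5I~\n"  # illustrative record, not from the slides
    for record in parse_fastq(text.splitlines()):
        print(record.identifier, record.sequence, phred_scores(record.quality))
```

The length check mirrors the rule on the slide that line 4 must contain the same number of symbols as line 2.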
  41. Phred quality scores
     • Phred is an accurate base-caller used for capillary traces (Ewing et al Genome Research 1998)
     • Each called base is given a quality score Q
     • Quality based on simple metrics (such as peak spacing) calibrated against a database of hand-edited data
     • QPhred = -10 * log10(estimated probability call is wrong)

     Phred Quality Score   Probability of incorrect base call   Base call accuracy
     10                    1 in 10                              90 %
     20                    1 in 100                             99 %
     30                    1 in 1000                            99.9 %
     40                    1 in 10000                           99.99 %

     Q30 often used as a threshold for useful sequence data
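The relationship QPhred = -10 * log10(p) and its inverse can be checked directly. The helper names below are chosen here for illustration only.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Estimated probability that a base call with Phred score `q` is wrong."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Phred score for an estimated error probability `p` (0 < p <= 1)."""
    if not 0 < p <= 1:
        raise ValueError("probability must be in (0, 1]")
    return -10 * math.log10(p)


def base_call_accuracy(q: float) -> float:
    """Base call accuracy as a fraction, i.e. one minus the error probability."""
    return 1 - phred_to_error_probability(q)


if __name__ == "__main__":
    for q in (10, 20, 30, 40):
        p = phred_to_error_probability(q)
        print(f"Q{q}: 1 in {round(1 / p)}, accuracy {base_call_accuracy(q):.2%}")
```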
  42. Adding value through analysis
     • Introduction
     • NGS data analysis
       • Primary analysis
         • Mapping and assembly
         • Q score re-calibration
         • NGS sequencing QC
         • NGS alignment QC
       • Secondary analysis
         • SNP and Indel calling
         • Annotation and evaluation pipeline
         • SIFT and PolyPhen
     • Deliverables
     • Case study
     • Summary
  43. Primary analysis: Mapping and alignment
     • Pipeline stages shown: raw sequence files, mapping, raw alignment files, local realignment (around InDels), quality score re-calibration, duplicate marking, analysis-ready alignment
     • Tools and formats shown: FASTQ format, BWA/Bowtie, SAM/BAM format, GATK, Picard
  44. Why mark duplicates and realign around indels? 3 incorrect calls within 40bp!
  45. Primary analysis: Mapping and alignment (pipeline diagram repeated from slide 43)
  46. NGS variant calling methods
     • Option 1: Hard filtering. Example: SNP can only be called if read depth >10 and >35% of reads carry SNP
       • Advantages: effective filtering; transparent to user
       • Limitations: simplistic approach; will miss high quality calls that don’t pass threshold
     • Option 2: Statistical analysis. Based on quality scores of individual basepairs, the alignment and statistical probability models
       • Advantages: robust; optimum balance of sensitivity and specificity due to the use of statistical models; fewer false positive and false negative SNP calls
       • Limitations: requires correctly pre-processed data with reliable quality scores
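The hard-filtering example (read depth >10 and >35% of reads carrying the SNP) translates into a two-condition test. The sketch below uses those two thresholds from the slide; the function name and argument names are hypothetical.

```python
def passes_hard_filter(
    read_depth: int,
    variant_reads: int,
    min_depth: int = 10,
    min_fraction: float = 0.35,
) -> bool:
    """Call a SNP only if depth exceeds `min_depth` and the variant-read fraction exceeds `min_fraction`."""
    if read_depth <= min_depth:
        return False
    return variant_reads / read_depth > min_fraction


if __name__ == "__main__":
    print(passes_hard_filter(read_depth=20, variant_reads=8))  # well supported
    print(passes_hard_filter(read_depth=20, variant_reads=6))  # too few variant reads
    print(passes_hard_filter(read_depth=10, variant_reads=9))  # depth not above 10
```

The last case shows the limitation noted on the slide: a site where 9 of 10 reads carry the variant is rejected purely because it sits on the depth threshold.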
  47. Base quality score re-calibration. Figure panels: Before Recalibration, After Recalibration. Source: The Broad Institute, http://www.broadinstitute.org/files/shared/mpg/nextgen2010/nextgen_poplin.pdf
  48. Primary analysis: Raw data and assembly QC
     • Pipeline stages shown: raw sequence files, mapping, raw alignment files, local realignment (around InDels), quality score re-calibration, duplicate marking, analysis-ready alignment
     • Tools and formats shown: FASTQ format, BWA/Bowtie, SAM/BAM format, GATK, Picard
     • Sequence QC check (FastQC), producing the raw data QC report
     • Alignment QC check (Picard), producing the alignment QC report
  49. Secondary analysis: SNP and Indel calling, annotation and filtering
     • Diagram: analysis-ready alignment (SAM/BAM format), Unified Genotyper (GATK), SNPs and InDels (VCF format), Variant Evaluation (OGT)
     • Questions asked of each variant:
       • Known variant?
       • Impact on gene expression?
       • Splicing affected?
       • Non-synonymous or frameshift mutation?
       • Impact on protein function?
       • How confident are we in the call?
       • Zygosity?
     • Outputs: Sequence QC Report, Alignment QC Report, Comprehensive interactive OGT Report
  50. SNP/Indel classification (standard analysis). We check and annotate every single detected SNP and Indel against all human Ensembl genes and transcripts and dbSNP
     • dbSNP annotation:
       • Is the variant known?
       • Obtain allele frequency
     • Does it affect any of the following
       • Promoter region
       • UTR
       • Splice sites or intronic region
       • CDS
         • Synonymous mutation
         • Non synonymous mutation
         • Frameshift mutation
         • Stop codon (truncated/elongated protein sequence)
         • Overlap with protein domain
         • Consequence on protein function predicted (SIFT & PolyPhen)
  51. OGT Processing Overview (from data to information)
     • Individual Genome Analysis (Standard Level)
       • Gather all detected SNP/Indels
       • Described in dbSNP (rare RS ID variations) or not described in dbSNP
       • Mapped to promoter regions; mapped to exons, splice sites or UTRs
       • Non-synonymous coding variations and protein domains
       • Variations with serious consequences to the protein sequence (SIFT)
     • Multi Genome Analysis, Data Gathering and Comparison (Advanced Level)
       • Perform pairwise genome analysis
       • Filter out variants present in “baseline” genome (e.g. somatic tissue, healthy sibling)
       • Filter out variants present in any “baseline” exome (e.g. somatic tissue, healthy sibling) AND not all “case” exomes
     • Tailored analysis based on client’s individual requirements (Expert Level)
       • Additional filtering and analysis
       • Study specific additional in-depth filtering
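One possible reading of the baseline filtering step is a set operation: keep variants shared by every "case" sample and drop anything seen in any "baseline" sample. The sketch below illustrates that reading only; it is not OGT's implementation. Variants are represented as `(chromosome, position, ref, alt)` tuples, a representation assumed here, and the example variants are made up.

```python
from typing import Iterable, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chromosome, position, ref allele, alt allele)


def filter_against_baselines(
    case_samples: Iterable[Iterable[Variant]],
    baseline_samples: Iterable[Iterable[Variant]],
) -> Set[Variant]:
    """Variants present in all case samples and absent from every baseline sample."""
    cases = [set(sample) for sample in case_samples]
    if not cases:
        return set()
    shared = set.intersection(*cases)
    seen_in_baseline: Set[Variant] = set()
    for sample in baseline_samples:
        seen_in_baseline.update(sample)
    return shared - seen_in_baseline


if __name__ == "__main__":
    patient1 = [("chr6", 100, "C", "T"), ("chr1", 50, "A", "G"), ("chr2", 7, "G", "A")]
    patient2 = [("chr6", 100, "C", "T"), ("chr1", 50, "A", "G")]
    healthy_sibling = [("chr1", 50, "A", "G")]
    print(filter_against_baselines([patient1, patient2], [healthy_sibling]))
```

A pairwise analysis is the special case of one case sample and one baseline sample.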
  52. NGS data delivery. Diagram labels: ship data on hard drive (or FTP); double click; file location & share results; comprehensive HTML analysis report
  53. Analysis report: Summary section
  54. Analysis report: QC section — Read QC
  55. Analysis report: QC section — Read QC
  56. Analysis report: QC section — Alignment QC
  57. Analysis report: QC section — Alignment QC
  58. Analysis section — Overview
  59. The Variant Table View: data display and data export
  60. The Variant Table View — External links
  61. The Detailed Variant View
  62. Predicted consequences on protein function
  63. Alignment View of selected variant in IGV
  64. OGT data processing ensures detection of insertions. Detection of a 31bp insertion
  65. OGT data processing ensures detection of deletions. Example 1: Detection of an 84bp deletion
  66. Detection of homozygous and heterozygous deletions. Figure panels: Homozygous deletion, Heterozygous deletion, No deletion (reference sequence)
  67. Interactive data filtering
  68. Customer data: Analysis of consanguineous samples. Figure: pedigree with generations I and II (individuals 1 and 2 in each). Variant: HACE1 Exon11, c.994C>T, R332X (CGA -> TGA). Data courtesy of Dr. Bernd Wollnik, Institute of Human Genetics, University Hospital of Cologne
  69. Confirmation by Sanger sequencing. Figure: sequence traces for Control, Mother, Father, Patient1 and Patient2, with R332X marked on a protein domain diagram (ANK1 69-161, ANK1 168-258, HECT 602-909). Data courtesy of Dr. Bernd Wollnik, Institute of Human Genetics, University Hospital of Cologne
  70. Customer feedback... Analysis of Consanguineous Samples. “Just wanted to let you know that we have probably identified the causative gene and mutation in the patient sample. The mutation is located in the middle of an 18 Mb homozygous stretch and is a homozygous nonsense mutation!!! Wow, its going so nicely with your data!!!” Dr. Bernd Wollnik, Institute of Human Genetics, University Hospital of Cologne
  71. Summary. OGT offers fast, accurate & powerful NGS analysis
     • Standard Analysis
       • Robust statistical data analysis
       • Comprehensive variant annotation
       • Interactive filtering and prioritisation of data based on
         • chromosomal region
         • allele frequency / novelty
         • zygosity
         • confidence score and read depth
         • severity of mutation
     • Advanced Analysis
       • Multi-genome comparison
     • Bespoke analysis
       • Tailored to your specific requirements
  72. Outline of presentation
     • Delivering a unique next generation sequencing service: Dr Mike Evans, CEO
     • Optimised bait design for targeted sequencing: Dr Volker Brenner, Head of Computational Biology
     • Adding value through analysis: Dr Volker Brenner, Head of Computational Biology
     • Summary
     • Q&A
  73. Speak to one of our team or visit booth 713 to:
     • Book a demonstration of our interactive analysis report (hurry, limited availability)
     • Discuss your specific project requirements
     • Take part in our short survey and have your chance to win an Amazon Kindle
  74. Thank you. www.ogt.co.uk
